Function words
Definition
Grammatical words (prepositions, conjunctions, articles, pronouns) that carry little independent content meaning but occur at high frequency in any text. Because writers use them largely without conscious thought, their distributional patterns across a corpus are more resistant to stylistic manipulation than content-word choices.
- Examples
- Articles (the, a), prepositions (in, on, for), conjunctions (and, but, or), pronouns (he, she, it)
- Frequency
- Used at high frequency across any text regardless of topic or style
- Attribution strength
- Among the strongest features available for authorship identification
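As a rough illustration of how the frequency pattern described above can be measured, the sketch below computes the relative frequency of each function word in a text. The word list is just the examples given in this entry, and the function name `function_word_profile` and the simple regex tokeniser are assumptions made for illustration, not a standard forensic tool; real attribution work typically uses much longer word lists and far larger text samples.

```python
import re
from collections import Counter

# Illustrative list taken from the examples in this entry (an assumption:
# practical studies use far longer function-word lists).
FUNCTION_WORDS = ("the", "a", "in", "on", "for", "and", "but", "or", "he", "she", "it")


def function_word_profile(text, words=FUNCTION_WORDS):
    """Return each function word's share of all tokens in `text`."""
    # Naive tokeniser: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {w: 0.0 for w in words}
    counts = Counter(tokens)
    # Dividing by the token total makes texts of different lengths comparable.
    return {w: counts[w] / len(tokens) for w in words}


profile = function_word_profile("The cat sat on the mat, and it slept in the sun.")
print(profile["the"])  # 3 of the 12 tokens are "the"
```

Profiles computed this way for a questioned text and for each candidate author's known writing could then be compared, although a single short sentence like the one here is far too small a sample to support any conclusion.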
Common questions
Why are function words useful for figuring out who wrote something?
Function words like articles, prepositions, and conjunctions are used unconsciously and at high frequency in any text. Because people don't think about using them, they're harder to fake or copy from someone else, making them strong markers of how a particular person writes.
Can someone change their function words to hide their writing style?
Not easily. Unlike content words, function words are grammatically required and largely automatic. People can deliberately choose interesting adjectives or avoid certain topics, but they can't easily change their unconscious use of words like 'the', 'and', 'but', or 'in' across a whole text.
What's the difference between function words and content words?
Function words are grammatical tools like prepositions, conjunctions, articles, and pronouns. They have little meaning on their own but are essential for structure. Content words are nouns, verbs, adjectives, and adverbs that carry the main ideas. Function words appear in every text; content words depend on the topic.
Related terms
- Idiolect
- The language variety specific to an individual, comprising their characteristic vocabulary, syntactic preferences, spelling habits, punctuation patterns, and discourse-level style. Authorship attribution...
- Closed-set attribution
- An attribution task where the true author is assumed to be one of a defined list of candidates. The system ranks candidates;...
- Corpus
- A principled, structured collection of texts or transcripts used as the basis for systematic frequency analysis. In forensic work a comparison corpus...
- Dialect
- A variety of language defined by a geographic region or social group, characterised by systematic differences in pronunciation, vocabulary, and grammar from...
- Discourse structure
- The way a text or conversation is organised above the sentence level: the sequence of moves in an argument, the turn-taking structure...
- Feature extraction
- The process of converting raw text into a numerical vector of linguistic measurements. The choice of features determines what signal the classifier...
- N-gram
- A contiguous sequence of n items (characters, words, or part-of-speech tags) extracted from text. Character n-grams and word n-grams are both standard...
- Open-set attribution
- An attribution task where the true author may or may not appear in the candidate pool. The system must both rank candidates...
- Register
- The variety of language associated with a particular situation, task, or relationship. Register varies along dimensions of formality, technicality, and interactional mode....
Explained in these topics
- Core Linguistic Concepts for Forensic Work
- Grammatical words, prepositions, conjunctions, articles, pronouns, with little independent content meaning but high frequency in any text. Because they are use...
- Authorship Attribution: Principles and Methods
- Grammatically obligatory words such as articles, prepositions, conjunctions, and pronouns. Used at high frequency and largely unconsciously, they are among the...