Feature extraction
Definition
The process of converting raw text into a numerical vector of linguistic measurements. The choice of features determines what signal the classifier can see, and is as consequential as the classifier itself.
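As a minimal sketch of the definition above, the following maps a raw string to a fixed-length vector of relative function-word frequencies. The word list `FUNCTION_WORDS`, the helper name `extract_features`, and the simple regex tokenizer are all illustrative assumptions, not a prescribed feature set.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list; practical lists are much longer.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a", "that", "it"]

def extract_features(text):
    """Map raw text to a vector of relative function-word frequencies."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    # One dimension per function word, in a fixed order.
    return [counts[w] / total for w in FUNCTION_WORDS]

vector = extract_features("The cat sat in the hat, and it slept.")
```

Every text yields a vector of the same length and ordering, so vectors are directly comparable. A classifier trained on them can only respond to the eight words measured here, which is the sense in which the feature choice bounds the signal available.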
Related terms
- Closed-set attribution
- An attribution task where the true author is assumed to be one of a defined list of candidates. The system ranks candidates;...
- Function words
- Grammatical words (prepositions, conjunctions, articles, pronouns) that carry little independent content meaning but occur with high frequency in any text. Because they are used without...
- Idiolect
- The language variety specific to an individual, comprising their characteristic vocabulary, syntactic preferences, spelling habits, punctuation patterns, and discourse-level style. Authorship attribution...
- N-gram
- A contiguous sequence of n items (characters, words, or part-of-speech tags) extracted from text. Character n-grams and word n-grams are both standard...
- Open-set attribution
- An attribution task where the true author may or may not appear in the candidate pool. The system must both rank candidates...
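The N-gram entry in the list above can be illustrated with a short sketch that counts character trigrams and word bigrams from the same string. The helper name `ngrams`, the sample sentence, and whitespace tokenization are assumptions made for the example.

```python
from collections import Counter

def ngrams(items, n):
    """Return every contiguous run of n items, in order of appearance."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

text = "to be or not to be"

# Character n-grams: slide a window of 3 over the raw characters, spaces included.
char_trigrams = Counter("".join(g) for g in ngrams(text, 3))

# Word n-grams: slide a window of 2 over whitespace-separated tokens.
word_bigrams = Counter(ngrams(text.split(), 2))
```

The same function serves both cases because it depends only on the sequence type passed in; part-of-speech n-grams would work the same way given a list of tags.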
Explained in
- Authorship Attribution: Principles and Methods