Sentence embedding
Definition
A neural representation of a sentence as a dense numeric vector, trained so that semantically similar sentences land near each other in vector space regardless of wording. Used for detecting semantic plagiarism and for comparing text across languages.
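As a rough illustration of what "near each other in vector space" means, the sketch below compares sentences by the cosine similarity of their vectors. The 4-dimensional vectors are hand-written, hypothetical values chosen for the example, not the output of any real embedding model, and the names `cosine_similarity` and `toy_embeddings` are assumptions made here. Real sentence embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors (assumption: hand-picked for illustration only).
toy_embeddings = {
    "The cat sat on the mat.": [0.9, 0.1, 0.0, 0.2],
    "A feline rested on the rug.": [0.8, 0.2, 0.1, 0.3],
    "Interest rates rose sharply.": [0.0, 0.9, 0.8, 0.1],
}

source = toy_embeddings["The cat sat on the mat."]

# Same meaning, different wording: the vectors point in nearly the same direction.
paraphrase_score = cosine_similarity(
    source, toy_embeddings["A feline rested on the rug."]
)

# Different topic: the vectors point in very different directions.
unrelated_score = cosine_similarity(
    source, toy_embeddings["Interest rates rose sharply."]
)

print(f"paraphrase: {paraphrase_score:.3f}")
print(f"unrelated:  {unrelated_score:.3f}")
```

In a detection setting, a pair whose score exceeds some chosen threshold would be flagged for closer review. The threshold depends on the model and the corpus, so it is left out of this sketch.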
Related terms
- Mosaic plagiarism: Borrowing phrases and expressions from a source and embedding them across new sentences, so no single passage is a direct copy but...
- Paraphrase plagiarism: Reproducing the ideas and structure of a source while replacing most of the wording, making character-level detection ineffective. Requires semantic rather than...
- Substantial similarity: The legal standard in copyright infringement: whether the protected expression in one work is reproduced in another to a degree that a...
- Translation plagiarism: Copying from a source in another language and rendering it in the target language. Defeats monolingual detection entirely and requires cross-lingual semantic...
- Verbatim plagiarism: Copying text word-for-word from a source without attribution. The simplest form to detect computationally because character-level matching directly finds the copied string.
Explained in
- Plagiarism and Text Reuse: Detection Methods and Evidence