Text Messaging and Digital Language as Evidence

Q: Can emoji be used to identify an author in a legal case?

Emoji can form part of an idiolect profile, since some individuals use distinctive combinations or frequencies, but they are rarely sufficient on their own. They contribute to a broader stylometric picture alongside abbreviations, punctuation habits, and response timing.

Q: What is metadata in the context of digital messages, and why does it matter forensically?

Metadata is the data about the message rather than its content: sender and recipient identifiers, timestamps, delivery receipts, and on some platforms geolocation tags. It can help establish that a message existed and when it was sent, independent of what the text says, and is usually harder to forge than the message body.

Q: Why does stylometry become less reliable with very short texts?

Stylometry depends on measuring stable features across a large enough sample. A 30-word SMS gives too few instances of any one feature to produce a reliable statistical pattern. Analysts instead look for distinctive markers that appear even in short texts, such as specific abbreviations or spelling variants.

Q: Can deleted messages be recovered and used in court?

Yes. Mobile device forensics tools can recover deleted SMS and messaging app data from device storage, SIM cards, or cloud backups, depending on how the app stores data and how quickly deletion occurred. Recovered fragments may be partial, requiring linguistic analysis of what remains to establish meaning or authorship.

Q: What happened with text evidence in the Damilola Taylor case?

The killing of Damilola Taylor in London in 2000 led to manslaughter convictions in 2006, partly supported by mobile phone evidence including call records and message data. The case illustrated how digital communication records could establish timelines and contacts when physical trace evidence was limited.

Digital communication channels from SMS to messaging apps have created a new category of linguistic evidence, raising distinct challenges around authenticity, authorship attribution in short texts, and the recovery of deleted messages.

Last updated: 19 Jun 2026

Text messages, instant messages, and social media posts now constitute one of the most common categories of linguistic evidence in criminal and civil proceedings worldwide. Their informal register, including abbreviations, emoji, and irregular orthography, does not reduce their evidential value; it is precisely what makes them individuating. Admissibility depends first on proper digital forensics (device imaging with hash verification, not screenshots), then on linguistic analysis calibrated to text length. Classical stylometry degrades sharply below 100 words, so short-text cases rely on distinctive marker comparison rather than distributional statistics.

Billions of daily conversations now leave a written record: abbreviated, emoji-laced, time-stamped fragments scattered across dozens of platforms. For most of history, spoken arguments left no trace; digital communication changed that, and courts worldwide have had to develop frameworks for what such records can establish.

Digital messages have become one of the most common categories of linguistic evidence in criminal and civil proceedings. They appear in harassment and stalking cases, fraud investigations, murder prosecutions, and child safety inquiries. The challenge is that the same features that make digital language distinctive (brevity, abbreviation, code-switching, emoji, and the absence of normal orthographic conventions) also make it hard to apply the authorship tools built for longer, more formal text.

This topic covers what digital written language actually looks like, why its authenticity and chain of custody demand specific handling, what analysts can and cannot say about authorship in short texts, and what fragment recovery from deleted messages can establish in real casework.

By the end of this topic you will be able to:

Explain why digital written language is individuating despite its informal register, and identify the platform-specific features that distinguish it from formal writing.
Describe the digital forensics steps required to authenticate a message before linguistic analysis can carry weight in court.
Apply the correct authorship analysis method for a given text length, distinguishing between distributional stylometry and distinctive marker comparison.
Interpret what deleted-message recovery tools can and cannot establish, and explain how fragment context affects pragmatic interpretation.
Identify the risks of jury misinterpretation of informal digital register and describe how expert evidence can contextualise it without invading the jury's province.

Key terms

Digital written language: Text produced in digital communication contexts (SMS, instant messaging, social media posts, email), distinguished from formal writing by its brevity, informal orthography, emoji use, and expectation of near-instant exchange.
Metadata: Structured data automatically attached to a message: sender identifier, recipient, timestamp, delivery and read receipts, device identifier, and sometimes geolocation. Often more reliable as legal evidence than the message text itself.
Chain of custody: The documented sequence of possession and handling of a piece of evidence from its collection through analysis to court presentation, establishing that it has not been altered or tampered with.
Idiolect: The unique set of language habits belonging to a single individual: their characteristic vocabulary, grammar, spelling choices, abbreviations, and punctuation preferences, which can in principle serve as a linguistic fingerprint.
Forensic stylometry: The statistical analysis of textual features (function-word frequencies, sentence length distributions, punctuation patterns) to identify or compare authors. Reliable on texts of several hundred words; degrades sharply on very short samples.
Code-switching: Alternating between two or more languages or registers within a single conversation or even a single message. Common in bilingual and multilingual communities, and a stylistic feature that can contribute to authorship analysis.

What digital language actually looks like

Linguists who study computer-mediated communication note that digital written language has developed its own conventions that differ markedly from both formal writing and transcribed speech. Some features are nearly universal: truncated words, phonetic spellings, missing punctuation, and frequent use of capital letters or repeated characters for prosodic effect ("FINE" to signal frustration, "noooo" to signal dismay). Others are platform-specific or community-specific.

Abbreviations and acronyms: forms like "lol", "tbh", "omg", and "ngl" are so widespread they function as single lexical items, but their use patterns still vary by age, community, and individual.
Emoji and emoticons: used for tone modulation, irony signalling, and topic punctuation. Individual emoji selection and combination patterns can be distinctive enough to feature in authorship analysis, though the evidential bar is high.
Response timing: the gap between a received and a sent message carries pragmatic meaning in digital conversations. An immediate reply can signal urgency; a 12-hour gap can signal disengagement or deliberation. Metadata timestamps make this measurable.
Code-switching: bilingual or multilingual users frequently switch languages mid-conversation or even mid-sentence, a pattern that can identify both the user's linguistic background and shifts in topic or register.
Platform-specific norms: Twitter/X posts have different syntactic constraints than WhatsApp messages; Reddit comments differ from SMS. An analyst needs to know the platform's conventions to distinguish idiosyncratic from generic features.
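As a rough illustration of how response timing becomes measurable once timestamps are available, the sketch below computes the gap before each reply in a short exchange. The message log, the sender labels, and the function name `reply_gaps` are all invented for the example; real extractions will have their own formats and time zones.

```python
from datetime import datetime

# Hypothetical message log: (ISO 8601 timestamp, sender). All values are invented.
messages = [
    ("2024-03-01T09:00:00", "A"),
    ("2024-03-01T09:00:40", "B"),
    ("2024-03-01T09:05:00", "A"),
    ("2024-03-01T21:05:00", "B"),
]

def reply_gaps(log):
    """Return (responder, gap in seconds) for each change of sender."""
    gaps = []
    for (prev_ts, prev_sender), (ts, sender) in zip(log, log[1:]):
        if sender != prev_sender:
            delta = datetime.fromisoformat(ts) - datetime.fromisoformat(prev_ts)
            gaps.append((sender, delta.total_seconds()))
    return gaps

gaps = reply_gaps(messages)
for responder, seconds in gaps:
    print(f"{responder} replied after {seconds / 60:.1f} minutes")
```

The numbers alone do not say what a gap means; a 12-hour silence may reflect sleep or a shift at work as easily as disengagement, so the measurement is a starting point for interpretation rather than a conclusion.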

The forensic implication is significant: the features that mark digital language as informal are the same features that make it individuating. Formal writing suppresses individuality through shared conventions; an unselfconscious text to a close contact does not.

Authenticity, metadata, and chain of custody

Before any linguistic analysis of a digital message can have weight in court, two prior questions must be answered: is this the original message, and has it been handled in a way that rules out modification? These are not linguistic questions (they belong to digital forensics), but a forensic linguist working on digital evidence must understand them because the answers constrain what conclusions language analysis can support.

Chain of custody for a digital message from seizure to court.

The standard approach in digital forensics is to image the device immediately on seizure, producing a bit-for-bit copy, and to verify the image with a cryptographic hash (typically SHA-256; MD5 and SHA-1 still appear in older workflows but are no longer collision-resistant). Every subsequent step (extraction, analysis, presentation) operates on the copy, and the hash can be re-checked at any point to confirm the data is unaltered. If this process was not followed correctly, defence counsel can argue that the message content cannot be relied upon.
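The re-checking step can be sketched in a few lines. This is a minimal illustration using Python's standard `hashlib`, not a description of any particular forensic suite; the function names and the small stand-in file are assumptions made for the example.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that a large image need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def image_is_unaltered(path, recorded_hash):
    """Compare the current hash with the one recorded at acquisition."""
    return sha256_of_file(path) == recorded_hash.strip().lower()

# Demonstration on a small stand-in file (a real image would be many gigabytes).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"stand-in for a device image")
    image_path = tmp.name

hash_at_seizure = sha256_of_file(image_path)
matches_before = image_is_unaltered(image_path, hash_at_seizure)

with open(image_path, "ab") as handle:
    handle.write(b"\x00")  # simulate a one-byte alteration
matches_after = image_is_unaltered(image_path, hash_at_seizure)

os.remove(image_path)
print(matches_before, matches_after)
```

A single appended byte is enough to change the digest, which is why a matching hash is treated as strong evidence that the working copy is identical to what was seized.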

Metadata offers a parallel layer of verification. The sender identifier, timestamp, delivery receipt, and read receipt embedded in the platform's data stores are typically harder to falsify than the message body, especially when corroborated by server-side records from the platform operator. Timestamps from multiple independent sources (the sender's device, the recipient's device, and the platform server) that all agree give a much stronger foundation than any single source alone.
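One way to picture the multi-source check is as a simple agreement test. In the sketch below, the source names, the timestamps, and the five-second tolerance are all assumptions chosen for illustration; device clocks drift, so an appropriate tolerance is a case-specific judgement.

```python
from datetime import datetime, timedelta

def timestamps_agree(sources, tolerance=timedelta(seconds=5)):
    """True if every source timestamp lies within `tolerance` of the others."""
    times = [datetime.fromisoformat(ts) for ts in sources.values()]
    return max(times) - min(times) <= tolerance

# Hypothetical records for one message, all expressed in UTC.
records = {
    "sender_device": "2024-03-01T21:05:02+00:00",
    "recipient_device": "2024-03-01T21:05:04+00:00",
    "platform_server": "2024-03-01T21:05:03+00:00",
}
consistent = timestamps_agree(records)

# The same message with one source four minutes out of step.
suspect = dict(records, recipient_device="2024-03-01T21:09:04+00:00")
inconsistent = timestamps_agree(suspect)

print(consistent, inconsistent)
```

A disagreement does not by itself show tampering; it flags a record that needs an explanation, such as a wrongly set device clock or delayed delivery.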

Authorship in short texts: why stylometry degrades

Classical forensic stylometry works by counting stable features over a large enough corpus. Function words such as "the", "of", "and", and "to" are particularly useful because writers use them unconsciously and their frequencies are hard to consciously manipulate. On texts of several hundred words, these methods can distinguish authors with high reliability. At 30 words, the sample is simply too small for the statistics to hold.

Text length	Stylometry reliability	Most useful methods
500+ words	High; statistical features are stable	Function-word frequency, PCA, Burrows's Delta
100-499 words	Moderate; some features viable	Targeted feature selection, comparison sets
30-99 words (typical SMS)	Low; most features unreliable	Distinctive markers, specific abbreviations, emoji patterns
< 30 words	Very low; single-feature only	Case-specific idiosyncratic markers if present
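To make the distributional approach concrete, here is a simplified sketch of Burrows's Delta: function-word frequencies are z-scored across a corpus, and the distance between two texts is the mean absolute difference of their z-scores. The six-word list, the toy texts, and the use of the population standard deviation are assumptions made to keep the example short; real applications use far longer word lists and far longer texts.

```python
import re
from statistics import mean, pstdev

FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in"]  # illustrative short list

def relative_freqs(text):
    """Relative frequency of each function word in one text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1
    return [tokens.count(word) / total for word in FUNCTION_WORDS]

def burrows_delta(corpus, name_a, name_b):
    """Mean absolute difference of z-scored function-word frequencies."""
    freqs = {name: relative_freqs(text) for name, text in corpus.items()}
    diffs = []
    for i in range(len(FUNCTION_WORDS)):
        column = [f[i] for f in freqs.values()]
        sd = pstdev(column)
        if sd == 0:
            continue  # this word does not vary across the corpus
        mu = mean(column)
        z_a = (freqs[name_a][i] - mu) / sd
        z_b = (freqs[name_b][i] - mu) / sd
        diffs.append(abs(z_a - z_b))
    return mean(diffs) if diffs else 0.0

# Toy texts, far too short for real inference.
corpus = {
    "known_1": "the cat sat on the mat and the dog went to the park in the rain",
    "known_2": "a walk to the shop and a trip to the bank of the river in a day",
    "questioned": "to be or not to be and to see a bit of the world",
}
delta_q1 = burrows_delta(corpus, "questioned", "known_1")
delta_q2 = burrows_delta(corpus, "questioned", "known_2")
print(round(delta_q1, 3), round(delta_q2, 3))
```

With texts this short, a single extra "the" moves the score noticeably, which is the instability the table describes for samples below 100 words.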

Short-text authorship analysis is therefore not useless, but the method changes. Rather than measuring frequencies, the analyst looks for the presence or absence of specific forms that are individually distinctive: a rare abbreviation used consistently, an unusual spelling that appears in both the questioned message and the suspect's known writing, an emoji used in a particular pragmatic context. These are point features rather than distributional features, and they require a different analytical approach: targeted comparison of discrete markers rather than statistical modelling across many variables.

The comparison corpus is equally critical. If the analyst claims that a suspect's characteristic use of "2day" matches the questioned message, they need to know how common "2day" is in the relevant population. Without base-rate data for the demographic, a feature that looks distinctive may be standard in that community. Published corpora of SMS and messaging language, such as the NUS SMS Corpus, provide reference points, though none covers all platforms, dialects, or time periods.
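The base-rate question can be illustrated with a small calculation: what share of authors in a reference sample use the form at all? The reference data and the function name below are invented for the example and are not drawn from the NUS SMS Corpus or any other real dataset.

```python
def marker_base_rate(reference, marker):
    """Share of reference authors who use `marker` at least once."""
    users = sum(
        1
        for author_messages in reference.values()
        if any(marker in msg.lower().split() for msg in author_messages)
    )
    return users / len(reference)

# Hypothetical reference sample: author -> list of messages (invented).
reference = {
    "author_1": ["c u 2day", "ok"],
    "author_2": ["see you today"],
    "author_3": ["2day is fine", "lol"],
    "author_4": ["running late tmrw"],
}

rate_2day = marker_base_rate(reference, "2day")
rate_2mo = marker_base_rate(reference, "2mo")
print(rate_2day, rate_2mo)
```

In this toy sample half the authors use "2day", so a match on that form would narrow the field very little. A rate of zero for "2mo" in a sample of four authors says almost nothing about the wider population, which is why the size and relevance of the reference corpus matter as much as the count itself.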

The Damilola Taylor case and digital trace

Damilola Taylor, a 10-year-old boy, was stabbed and killed in Peckham, south London, in November 2000. The initial prosecution collapsed in 2002 when the key witness was severely discredited. A second prosecution in 2006 resulted in the conviction of two brothers, Danny and Ricky Preddie, for manslaughter. The case is significant in the digital evidence context because it represented an early serious use of mobile phone evidence (call records and message data) to help establish the movements and communications of suspects in the period before and after the killing, at a time when digital evidence in criminal proceedings was still an emerging practice.

The case illustrates a pattern that recurs across early digital-evidence prosecutions: phone records function as a timeline even when witnesses are unavailable, intimidated, or unreliable. The metadata (who called whom, when, from which cell tower) can corroborate or contradict other accounts without any linguistic analysis of message content. When the content is also available, it adds a layer of meaning to the timeline.

Deleted messages: recovery, fragments, and what analysis can establish

When a user deletes a message on a mobile device, the operating system typically marks the storage space as available for reuse rather than overwriting the data immediately. Until new data is written to that space, the deleted message may be recoverable in whole or in part. Mobile device forensics tools, such as Cellebrite UFED and Oxygen Forensic Detective, exploit this to extract deleted SMS records, deleted WhatsApp, Signal, and Telegram conversations, and deleted email.

Recovery pathways for deleted messages from mobile devices.

Recovered fragments present a specific analytical challenge. If only part of a message thread is recovered, such as the outgoing messages but not the incoming, or the first half of a conversation before overwriting, the linguistic analyst must work with incomplete context. Pragmatic interpretation of fragments requires care: a recovered message saying "do it tonight" means something very different depending on what preceded and followed it, and if only half the thread survives, that context is missing.

Platform database structure: messaging apps store conversations in SQLite databases with specific schemas; understanding the schema tells analysts which tables to examine for deleted-record remnants.
Cloud corroboration: if the sender's device was backed up to iCloud, Google Drive, or a platform's own servers, the deleted message may exist intact in the backup, potentially recoverable with appropriate legal process.
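Recipient evidence: the other party in a conversation normally holds their own copy of each message they received. Deleting the sender's copy does not remove the recipient's copy unless the platform offers, and the sender used, a delete-for-everyone function, so evidence can often be obtained from both ends.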
End-to-end encryption: platforms like Signal encrypt messages so that the operator's servers never hold readable content; at most they handle ciphertext in transit. Recovery therefore depends on physical access to a device and its decryption keys, not on server data.
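The point about database structure can be seen in miniature with Python's built-in `sqlite3` module. The sketch below uses an invented one-table schema, not any real app's layout, and shows that deleting rows moves pages to SQLite's freelist rather than shrinking the file. Whether the freed pages still contain readable message text depends on settings such as `secure_delete`, on whether the database has been vacuumed, and on later writes, so this shows only that freed space exists, not that content is recoverable.

```python
import os
import sqlite3
import tempfile

db_dir = tempfile.mkdtemp()
db_path = os.path.join(db_dir, "messages.db")

conn = sqlite3.connect(db_path)
conn.execute("PRAGMA auto_vacuum = NONE")  # set before any table is created
# Hypothetical schema, for illustration only.
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO message (body) VALUES (?)",
    [("placeholder text " * 20,) for _ in range(500)],
)
conn.commit()

size_before = os.path.getsize(db_path)
free_before = conn.execute("PRAGMA freelist_count").fetchone()[0]

conn.execute("DELETE FROM message")  # the user "deletes the conversation"
conn.commit()

free_after = conn.execute("PRAGMA freelist_count").fetchone()[0]
rows_after = conn.execute("SELECT COUNT(*) FROM message").fetchone()[0]
size_after = os.path.getsize(db_path)
conn.close()

os.remove(db_path)
os.rmdir(db_dir)

print(f"rows left: {rows_after}, free pages: {free_before} -> {free_after}")
print(f"file size: {size_before} -> {size_after} bytes")
```

The table reports zero rows, yet the file is no smaller and now carries a set of unused pages; those pages are where an examiner would look for remnants.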

Presenting digital language evidence in court

A recurring problem in digital message cases is that jurors read informal messages through the conventions of formal written language. A grammatically fragmented, emoji-heavy message can appear aggressive, evasive, or illiterate when it is merely informal. Contextualising the register of the messages is therefore a standard part of the expert's role, prior to any opinion on authorship or meaning.

Ambiguity in short texts is genuine and persistent. Tone, irony, and pragmatic intent that would be conveyed by vocal prosody in speech must be reconstructed from context in a text message, and that reconstruction is interpretive. Courts in the United Kingdom, Australia, and the United States have grappled with whether expert evidence on the meaning of specific messages invades the jury's province. The prevailing view is that an expert may contextualise register and explain linguistic features, but opinions on the ultimate meaning of a disputed message should be carefully scoped.

Worked example

Authorship dispute in a harassment case

A defendant claims the harassing messages were sent by someone else using their phone.

A defendant is charged with sending a series of harassing messages to a former partner over three weeks. The messages were sent from the defendant's registered phone number. The defence argues that the defendant's phone was regularly used by a flatmate and that the harassing messages were sent by the flatmate, not the defendant. The prosecution asks a forensic linguist to compare the questioned messages to a sample of the defendant's known writing: texts to family members, a WhatsApp group chat, and work emails.

Building the comparison corpus. The analyst identifies 200 messages in the known sample and 34 questioned messages. The known sample is long enough to establish base rates for several features; the questioned messages are too short for full stylometric analysis but sufficient for marker comparison.
Feature selection. The analyst looks for features that appear consistently in the known sample: a specific abbreviation for "tomorrow" ("2mo"), a distinctive pattern of ending declarative statements with "innit", and a habit of omitting apostrophes in contractions ("dont", "cant"). All three appear in the questioned messages.
Base-rate consideration. The analyst checks that "innit" as a declarative tag is not universal in the relevant demographic, establishing that it narrows the population of plausible authors. "2mo" is less common than "2moro" or "tmrw" in the reference corpus used.
Opinion framing. The analyst reports that the combination of features supports the view that the questioned messages and the known sample share an author, expressed as moderate-to-strong support rather than certainty. The opinion notes that if the flatmate had extensive exposure to the defendant's texting style, limited accommodation of style is possible and cannot be excluded.
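The feature-selection step above could be supported by a simple tally of how often each marker occurs in the known and questioned sets. The regular expressions and the sample messages below are invented for illustration; an analyst would define the patterns from the actual known sample and check every match by hand.

```python
import re

# Illustrative patterns for the three markers in the worked example.
FEATURES = {
    "2mo": re.compile(r"\b2mo\b"),
    "final_innit": re.compile(r"\binnit\s*[.!?]*\s*$"),
    # Crude list of apostrophe-less contractions; "wont" and "cant" are also
    # ordinary words, so matches need manual review.
    "no_apostrophe": re.compile(r"\b(?:dont|cant|wont|didnt|isnt)\b"),
}

def feature_rates(messages):
    """Proportion of messages in which each feature occurs at least once."""
    lowered = [message.lower() for message in messages]
    return {
        name: sum(bool(pattern.search(m)) for m in lowered) / len(lowered)
        for name, pattern in FEATURES.items()
    }

# Invented stand-ins for the known and questioned samples.
known = ["cant make it 2mo", "its cold innit", "see you later", "dont be late"]
questioned = ["dont ignore me", "we need to talk 2mo innit"]

known_rates = feature_rates(known)
questioned_rates = feature_rates(questioned)
print(known_rates)
print(questioned_rates)
```

The tally only documents that the markers occur in both sets. The weight they carry still depends on the base-rate step and on the analyst's judgement about accommodation between the two possible authors.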

The metadata provides independent corroboration: the harassing messages were sent at times when other phone-usage data (app activity logs extracted from the device) show patterns consistent with the defendant's established daily routine, not the flatmate's. The combination of linguistic and metadata evidence proves stronger than either alone.

Key Takeaways

Digital messages have become a primary evidence type across many categories of criminal and civil proceedings, and their informal register (abbreviations, emoji, response timing) is a forensic feature rather than a weakness.
Authenticity requires proper digital forensics: device imaging with hash verification, not screenshots, provides the foundation for any linguistic analysis to stand up in court.
Classical stylometry degrades sharply below 100 words; forensic linguists pivot to distinctive marker analysis for short texts, with careful attention to the base rates of those markers in the relevant population.
Deleted messages are often recoverable from device flash storage, cloud backups, or the recipient's device; partial recovery requires cautious pragmatic interpretation of fragments stripped of their context.
Presenting digital evidence well in court includes educating the jury about informal register so that grammatically fragmented messages are not misread as evidence of aggression, evasion, or low intelligence.

