How next-generation sequencing, whole-chloroplast phylogenomics, microsatellite profiling for Cannabis, and environmental DNA are pushing forensic plant genomics beyond standard barcoding toward more powerful and legally tested identification methods.
For most of forensic botany's history, the question a DNA analyst could answer was: what species is this? Standard barcoding with rbcL and matK answers that question well. But investigators frequently need to know more: not just the species, but the variety, the geographic source, the cultivation batch, or whether two samples came from the same individual plant. Answering those questions requires tools that go well beyond a 550-base barcode.
The past fifteen years have brought three powerful developments to forensic plant genomics. First, next-generation sequencing (NGS) can now decode an entire chloroplast genome in a few hours, producing thousands of informative positions instead of the few available from two barcode loci. Second, microsatellite profiling for forensically important species, especially Cannabis, has matured to the point where validated multiplexes with population-frequency databases allow likelihood ratios to be calculated in the same way as in human DNA analysis. Third, environmental DNA (eDNA) extracted from soil or water now lets analysts describe the plant community at a scene without needing to find a leaf or a seed, by sequencing the mixed plant DNA in the environment itself.
This topic covers how each of these approaches works, what biological limits constrain what they can tell a court, and what validation a method must clear before it moves from research to casework. The theme running through all three is the same: power is not the same as admissibility. A method that produces beautiful results in a university laboratory still has to survive a Daubert hearing, an ISO 17025 audit, or a cross-examination by a forensic geneticist who has read the same papers.
From 550 bases to 150 kilobases: what sequencing the whole plastome adds.
The chloroplast genome contains around 120-160 kilobases of sequence, of which the standard rbcL amplicon covers about 550 bases and matK about 850 bases. Together they sample roughly one percent of the plastome. The remaining 99 percent or so contains hundreds of additional variable positions: synonymous substitutions in other genes, intergenic spacers, and structural variations in the inverted repeat regions. When an Illumina short-read run sequences the whole plastome, the analyst has not one match score but potentially thousands of informative positions to compare.
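The coverage figure can be checked with simple arithmetic. The sketch below uses only the approximate amplicon and plastome lengths quoted above; the function name is illustrative.

```python
# Approximate amplicon lengths quoted in the text (in bases).
RBCL_BASES = 550
MATK_BASES = 850


def barcode_coverage(plastome_bases: int) -> float:
    """Fraction of a plastome of the given length sampled by rbcL + matK."""
    return (RBCL_BASES + MATK_BASES) / plastome_bases


# Check both ends of the 120-160 kb range given in the text.
for size in (120_000, 160_000):
    print(f"{size} bp plastome: {barcode_coverage(size):.2%} covered by the two barcodes")
```

Across the quoted size range the two barcodes cover roughly 0.9 to 1.2 percent of the plastome, which is why the combined figure is best described as about one percent.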
The forensic gain is clearest for cases where two-locus barcoding gives only a genus-level identification. Whole-plastome phylogenomics can resolve species within groups where rbcL and matK are conserved. For tropical timber, this matters enormously. Dalbergia species (rosewoods) under CITES protection often share nearly identical barcode sequences with non-protected Dalbergia species, and customs enforcement has been limited by the species resolution of barcoding. Whole-plastome SNP phylogenies resolve these species with high confidence.
The limitation of whole-chloroplast methods is maternal inheritance. In most flowering plants, chloroplasts are passed only through the mother plant, so plants from the same maternal line, for example seedlings raised from seed of the same mother or cuttings taken from the same parent shrub, carry identical plastomes. Whole-chloroplast sequencing places a sample in a maternal clade but cannot distinguish siblings from the same mother. For individualisation, nuclear markers are required.
High forensic demand drove the development of a validated Cannabis STR system well before other plant species.
Cannabis sativa is the single most commonly submitted plant material in forensic laboratories worldwide. This created a strong practical incentive to develop nuclear marker systems for it analogous to the human STR multiplexes used for human identification. Beginning in the early 2000s, researchers identified highly polymorphic microsatellite loci in the Cannabis nuclear genome and tested them for reproducibility across laboratories. By the 2010s, several multiplexes were in use in European and North American laboratories, and population databases covering drug-type Cannabis from multiple geographic origins had been assembled.
What Cannabis SSR profiling answers that barcoding cannot: it can distinguish drug-type Cannabis from low-THC hemp cultivars (though this is not perfect, as modern cultivars show complex genetic structure), it can place a sample within a broad geographic or genetic cluster (e.g. South Asian, African, European cultivated), and in favourable cases it can link two seized samples as having come from the same cultivation batch. The last application is the most powerful for organised crime investigations.
| Question | Two-locus barcoding | Cannabis SSR multiplex |
|---|---|---|
| Is this Cannabis? | Yes (species level) | Yes, with more precision |
| Drug-type vs hemp? | No: chloroplast does not vary | Often yes: based on nuclear cluster |
| Geographic origin? | No | Broad regional cluster in some cases |
| Same batch as another seized sample? | No | Possible if same individual or clone |
| Requires population database? | No (BOLD suffices) | Yes: allele frequencies needed for LR |
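The batch-linking comparison in the table comes down to checking whether two multi-locus genotypes agree at every locus typed in both samples. The following is a minimal sketch of that comparison only. The locus names and allele sizes are hypothetical placeholders, not loci from any published Cannabis multiplex, and a real workflow would also handle allele-calling tolerances, dropout, and mixtures.

```python
from typing import Dict, Tuple

Genotype = Tuple[int, int]      # two allele sizes (bp) at one locus
Profile = Dict[str, Genotype]   # locus name -> genotype


def compare_profiles(a: Profile, b: Profile) -> Tuple[int, int]:
    """Return (loci compared, loci matching) over loci typed in both profiles.

    Allele order within a genotype is ignored, so (142, 148) matches (148, 142).
    """
    shared = sorted(set(a) & set(b))
    matching = sum(1 for locus in shared if sorted(a[locus]) == sorted(b[locus]))
    return len(shared), matching


# Hypothetical profiles from two seizures (illustrative values only).
seizure_1: Profile = {"locus_1": (142, 148), "locus_2": (201, 201), "locus_3": (176, 182)}
seizure_2: Profile = {"locus_1": (148, 142), "locus_2": (201, 201), "locus_3": (176, 188)}

compared, matched = compare_profiles(seizure_1, seizure_2)
print(f"{matched} of {compared} shared loci match")
```

A full match across all loci is only the starting point: its evidential weight still depends on how common that genotype is in the relevant population, which is why the last table row matters.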
You do not need to find a leaf. The soil carries the whole plant community as DNA.
Every plant in a location continuously sheds cells, pollen, root hairs, and other biological material into the surrounding soil and water. This material contains DNA that persists in soil, sometimes for years, long after the plant itself has died or been removed. Environmental DNA (eDNA) metabarcoding sequences all of this mixed DNA simultaneously and uses a reference database to identify which plant species are represented.
The forensic application is scene-to-suspect soil comparison at community level rather than single-species level. Instead of asking 'is this specific pollen grain from a Pinus tree?', the analyst asks 'do the ten most abundant plant species in this soil sample from the suspect's shoes match the ten most abundant species in the soil from the scene?' The community profile is much harder to explain by coincidence than a single-species match, because it reflects the specific mixture of plants growing at that exact location at that time.
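The "ten most abundant species" comparison can be sketched as an overlap score between two ranked lists. This is an illustration of the idea rather than a validated casework method: the function names are invented here, read counts are assumed to be already assigned to species, and the Jaccard index is one simple choice of overlap measure.

```python
from typing import Dict, Set


def top_species(read_counts: Dict[str, int], n: int = 10) -> Set[str]:
    """Return the n species with the highest read counts (ties broken by name)."""
    ranked = sorted(read_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {name for name, _ in ranked[:n]}


def top_n_overlap(a: Dict[str, int], b: Dict[str, int], n: int = 10) -> float:
    """Jaccard index of the top-n species sets: 1.0 = identical, 0.0 = disjoint."""
    top_a, top_b = top_species(a, n), top_species(b, n)
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0
```

Calling `top_n_overlap(shoe_counts, scene_counts)` on two dictionaries of species read counts gives a single score between 0 and 1. Because it ignores how abundant each species is, it is a coarser measure than the abundance-based dissimilarity indices used in the research literature.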
Research groups in New Zealand, the United Kingdom, and Germany have published proof-of-concept studies showing that soil eDNA community profiles are sufficiently distinct between locations to allow scene-to-exhibit comparisons, and that profiles are stable enough in preserved soil samples to survive the time between collection and analysis. The method is not yet in routine casework in most jurisdictions, but it has been used in pilot studies by national police forensic units and is approaching the validation standard needed for court use.
A method that works in a research context still has to prove itself in a forensic laboratory.
The history of forensic science is peppered with examples of powerful techniques that moved into casework too quickly, without the validation work needed to understand their limitations. The Forensic Science Regulator in England and Wales, ISO/IEC 17025 accreditation requirements, and guidelines published by organisations such as SWGMAT and OSAC in the United States all require a documented validation process before a method is used in live casework.
Cannabis SSR profiling has largely cleared these bars. Whole-chloroplast phylogenomics is in the process of doing so, with key studies on Dalbergia, Pinus, and Quercus published in the 2015-2024 literature. eDNA metabarcoding for soil comparison is at the developmental validation stage. Knowing where each method sits in this pipeline is important when an analyst receives a case that might benefit from one of these approaches: the tool may exist but may not yet be ready for court.
A match is only informative when you can say how rare the match is.
The core statistical tool for DNA match evidence is the likelihood ratio (LR), which expresses how much more probable the observed evidence is under the prosecution hypothesis (the suspect's sample came from this plant) than under the defence hypothesis (the suspect's sample came from a random, unrelated plant of the same species). For human STR profiling, population databases with allele frequencies at each locus allow LR calculations that routinely produce values in the billions.
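A worked sketch may make the calculation concrete. It rests on strong simplifying assumptions that are not claims from the text: the evidence is certain to match if the prosecution hypothesis is true, loci are independent, genotype frequencies follow Hardy-Weinberg proportions, and no correction is made for population substructure or clonal propagation. The allele frequencies and locus names are invented for illustration.

```python
from math import prod
from typing import Dict, Tuple

AlleleFreqs = Dict[int, float]  # allele size (bp) -> population frequency


def genotype_frequency(a: int, b: int, freqs: AlleleFreqs) -> float:
    """Hardy-Weinberg genotype frequency: p^2 if homozygous, 2pq if heterozygous."""
    p, q = freqs[a], freqs[b]
    return p * p if a == b else 2 * p * q


def likelihood_ratio(profile: Dict[str, Tuple[int, int]],
                     database: Dict[str, AlleleFreqs]) -> float:
    """LR = 1 / random match probability, multiplying across independent loci."""
    rmp = prod(genotype_frequency(a, b, database[locus])
               for locus, (a, b) in profile.items())
    return 1 / rmp


# Hypothetical two-locus example (frequencies are illustrative only).
database = {"locus_1": {150: 0.10}, "locus_2": {200: 0.20, 204: 0.25}}
profile = {"locus_1": (150, 150), "locus_2": (200, 204)}
print(f"LR is approximately {likelihood_ratio(profile, database):.0f}")
```

Here the genotype frequencies are 0.01 and 0.10, so the random match probability is 0.001 and the LR is about 1,000. Each additional independent locus multiplies the LR further, which is how many-locus human profiles reach values in the billions.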
For plant forensics, the calculation follows the same logic but faces a practical challenge: population databases are smaller and less comprehensive than human STR databases. Cannabis databases now exist for several geographic populations and can support LRs in the thousands for a multi-locus SSR match. For non-Cannabis species, analysts often must conduct their own population sampling as part of the investigation, as Helentjaris did in the Palo Verde case, and report the LR as a local or provisional figure.
For eDNA community profiles, the statistical framework is still being developed. Researchers have used Bray-Curtis dissimilarity and PERMANOVA to quantify how distinct two community profiles are. Converting this into a likelihood ratio that a court can use requires more work, and this remains an active research area. Until a validated statistical framework is published, an analyst presenting eDNA evidence should frame the conclusion qualitatively (the community profiles are highly similar or highly dissimilar) and discuss how similarity scores are distributed between profiles from the same location versus different locations in the study data.
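Bray-Curtis dissimilarity itself is straightforward to compute. The sketch below implements the standard formula, the sum of absolute abundance differences divided by the total abundance across both samples, in plain Python. The function name and input layout (species name mapped to read count) are assumptions made for illustration, and PERMANOVA is not shown.

```python
from typing import Dict


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0.0 = identical profiles, 1.0 = no shared species."""
    species = set(a) | set(b)
    difference = sum(abs(a.get(s, 0) - b.get(s, 0)) for s in species)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    return difference / total
```

For example, profiles `{"x": 6, "y": 4}` and `{"x": 4, "z": 6}` differ by 2 + 4 + 6 = 12 out of a total abundance of 20, giving a dissimilarity of 0.6. In practice read counts are usually normalised to relative abundances first so that sequencing depth does not dominate the score.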
Advanced genomics still has to answer the basic expert-witness question: what can I honestly say?
Every new genomic method creates a temptation to overreach. A whole-chloroplast phylogenomic analysis might place a timber sample in a specific forest in eastern Madagascar with apparent certainty, but if the reference database contains only ten trees from that forest, that certainty is overstated. An eDNA community match might be striking, but if the method's transfer and persistence properties have not been studied for the surface type in question, the analyst cannot say whether the match arose from the suspect visiting the location or from secondary transfer at a loading dock.
Why can whole-chloroplast sequencing not individualise a plant the way nuclear STR profiling individualises a human?