Plant DNA Barcoding: rbcL and matK

How two short chloroplast gene sequences, rbcL and matK, allow scientists to identify plant species from trace material and why those markers became the global standard for forensic botanical identification.

Last updated: 19 Jun 2026

Plant DNA barcoding uses two short chloroplast gene sequences, rbcL and matK, to identify plant species by matching them against a curated reference database. The Consortium for the Barcode of Life (CBOL) formally adopted these two loci as the international standard for land plants in 2009, after comparative testing showed that their combination achieves species-level identification for approximately 72 percent of vascular plant species and genus-level identification for over 90 percent. Chloroplast markers are preferred over nuclear DNA because a single plant cell contains hundreds to thousands of chloroplast genome copies, making successful PCR amplification possible even from small, degraded, or aged forensic samples. In casework, rbcL and matK together support species identification for trace evidence, controlled-substance analysis, illegal timber enforcement, and food fraud investigations, but their resolution limits must be explicitly stated in any expert report.

A leaf fragment on a suspect's boot, a seed pod stuck to a tyre tread, a stem crushed into a carpet fibre: plant material shows up at crime scenes constantly, but for most of forensic science history the only way to identify it was to find a botanist willing to peer through a microscope at it. That changed when molecular biologists realised you could read a plant's identity from two short segments of its chloroplast genome. Those segments, rbcL and matK, are now the international standard for plant DNA barcoding, and they have quietly transformed the kind of questions a forensic botanist can answer.

The concept of DNA barcoding is exactly what it sounds like: just as a supermarket barcode identifies a product by a short sequence of lines, a short, standardised DNA sequence identifies an organism by comparison to a reference database. For animals, the COI gene in mitochondrial DNA handles this job. For plants, the Consortium for the Barcode of Life (CBOL) selected two chloroplast markers in 2009 after extensive comparative testing. Why chloroplast, not nuclear DNA? Because chloroplasts are present in hundreds or thousands of copies per cell, which means even a tiny piece of degraded leaf can still yield enough intact template for PCR amplification.

This topic covers the biology that makes barcoding work, the laboratory workflow from extraction to database search, the honest limits on species resolution, and the forensic contexts where rbcL and matK evidence has changed case outcomes. By the end you should be able to explain why a positive barcode match is a strong class-level identification, what its limits are, and when a forensic scientist needs to add a third marker or shift to whole-chloroplast sequencing to answer the question a court is actually asking.

By the end of this topic you will be able to:

Explain why the chloroplast genome is the preferred source for forensic plant DNA barcoding and how copy number affects recovery from degraded material.
Describe the laboratory workflow from sample documentation through PCR amplification, Sanger sequencing, and database search, including contamination controls.
Compare BOLD and GenBank as reference databases, including their differences in curation level, voucher linkage, and implications for forensic reliability.
Identify the major plant groups where rbcL and matK fail to resolve to species level and state which additional markers or methods are used in those cases.
Construct correctly qualified expert-report language for a barcode identification, distinguishing what the sequence data shows from what the database coverage permits.

Key terms

DNA barcoding: Identification of an organism's species by sequencing a short, standardised region of its genome and matching the sequence against a curated reference database.
rbcL: The gene encoding the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), located in the chloroplast genome. One of the two CBOL-recommended barcoding loci for land plants.
matK: The maturase K gene, embedded within the trnK intron of the chloroplast genome. It evolves faster than rbcL and therefore provides finer species-level discrimination.
BOLD: Barcode of Life Data Systems: the primary curated database linking barcode sequences to voucher-verified specimens, maintained by the Centre for Biodiversity Genomics in Guelph, Canada.
Chloroplast genome (plastome): A circular, 120-160 kb genome inside plant chloroplasts, present in hundreds of copies per cell. Its high copy number makes it recoverable from degraded material where nuclear DNA has been lost.
Species resolution rate: The proportion of species-level queries for which the barcoding approach returns an unambiguous species identification rather than only a genus or family assignment.

Why the chloroplast genome, and why two loci

When a leaf dies or a seed desiccates, its nuclear DNA begins to degrade within days to weeks. The nuclear genome of a diploid cell is present in just two copies, and once those copies are damaged the information is gone. The chloroplast is different. A single photosynthetically active cell typically contains 50 to 100 chloroplasts, each holding multiple copies of its 120-160 kilobase circular genome. This means a single plant cell can hold a thousand or more intact chloroplast DNA templates, which is why PCR amplification succeeds from dried herbarium specimens decades old or from carbonised seeds recovered from fire scenes.
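The copy-number arithmetic behind that claim can be sketched in a few lines of Python. The chloroplast count per cell comes from the paragraph above; the range of 20 to 50 genome copies per chloroplast is an illustrative assumption chosen to match "a thousand or more", not a measured value.

```python
def plastome_copies_per_cell(chloroplasts: int, copies_per_chloroplast: int) -> int:
    """Total chloroplast genome copies in one cell."""
    return chloroplasts * copies_per_chloroplast


NUCLEAR_COPIES = 2  # one diploid nuclear genome per cell, as stated in the text

# 50-100 chloroplasts per cell is from the text.
# 20-50 genome copies per chloroplast is an ASSUMED, illustrative range.
low = plastome_copies_per_cell(50, 20)
high = plastome_copies_per_cell(100, 50)

print(f"plastome copies per cell: {low} to {high}")
print(f"ratio to nuclear copies: {low // NUCLEAR_COPIES}x to {high // NUCLEAR_COPIES}x")
```

Under these assumptions, each cell carries hundreds to thousands of times more chloroplast templates than nuclear ones, which is the margin that lets PCR succeed after most templates have been damaged.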

The chloroplast genome evolves slowly compared to the nuclear genome, which is a forensic double-edged sword. Slow evolution means sequences are conserved enough that universal PCR primers work across the whole plant kingdom without needing to design taxon-specific primers for every new species encountered in casework. But it also means closely related species can share identical or nearly identical rbcL sequences, because they have not yet diverged at that locus. This is the fundamental resolution ceiling.

CBOL addressed the resolution problem in part by requiring two loci together rather than one. rbcL provides reliable genus-level identification across almost all vascular plants and is easy to amplify. matK sits inside the trnK intron and evolves roughly three times faster than rbcL, providing the additional discrimination that lifts genus-level assignments to species level for many groups. Used in combination, they resolve around 72 percent of land-plant species to species level in the global BOLD database, and over 90 percent to genus level.
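A toy calculation shows why combining two loci raises the resolution rate without reaching 100 percent. The sketch below is only an illustration: the species names and the eight-base "sequences" are invented, and real barcodes are hundreds of bases long. A species counts as resolved when no other species in the reference set shares its barcode.

```python
from collections import Counter

# HYPOTHETICAL reference set: species -> (rbcL fragment, matK fragment).
# All names and sequences are invented for illustration.
REFERENCES = {
    "Genus_a species_1": ("ACGTACGT", "TTGACCGA"),
    "Genus_a species_2": ("ACGTACGT", "TTGACCGG"),  # same rbcL, different matK
    "Genus_a species_3": ("ACGTACGT", "TTGACCGG"),  # identical at both loci
    "Genus_b species_1": ("ACGTTCGT", "TAGACCGA"),
    "Genus_b species_2": ("ACGTTCGA", "TAGACCGA"),  # different rbcL, same matK
}


def resolution_rate(barcodes):
    """Fraction of species whose barcode is shared with no other species."""
    counts = Counter(barcodes.values())
    unique = sum(1 for seq in barcodes.values() if counts[seq] == 1)
    return unique / len(barcodes)


rbcl_only = {sp: loci[0] for sp, loci in REFERENCES.items()}
matk_only = {sp: loci[1] for sp, loci in REFERENCES.items()}
combined = {sp: loci[0] + "|" + loci[1] for sp, loci in REFERENCES.items()}

for label, barcodes in [("rbcL", rbcl_only), ("matK", matk_only), ("rbcL+matK", combined)]:
    print(f"{label}: {resolution_rate(barcodes):.0%} of species resolved")
```

In this toy set the combined barcode resolves more species than either locus alone, while the two species that are identical at both loci stay unresolved, mirroring the ceiling described above.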

Positions of rbcL and matK in the chloroplast genome.

The laboratory workflow from sample to sequence

A forensic botanist receiving a sample for barcoding follows a workflow that mirrors human DNA analysis in structure but differs in the specific reagents and primer sets used. Getting every step right matters. Because humans have no chloroplast DNA, the analyst's own cells cannot produce a false rbcL or matK sequence, but environmental contamination from pollen or soil bacteria can still produce misleading results.

Sample documentation and subsampling
Photograph the item in situ, note its condition, and take a sub-sample under clean conditions. A sample as small as a square centimetre of dry leaf or a single seed can be sufficient. Record the sub-sampling as part of the chain of custody record.
DNA extraction
CTAB (cetyltrimethylammonium bromide) extraction is the traditional method for plant material, effective at removing polysaccharides and polyphenols that inhibit PCR. Commercial silica-column kits (e.g. DNeasy Plant) are faster and better suited to casework throughput. For soil-rich samples, an additional bead-beating step and secondary purification may be needed.
PCR amplification
Universal primer pairs for rbcL (e.g. rbcLa-F/rbcLa-R targeting the 550 bp amplicon used by CBOL) and matK (e.g. 3F_KIM/1R_KIM or matK_390F/matK_1326R for angiosperms) are used in separate reactions. Degraded samples may require shorter, internally designed amplicons covering 150-200 bp.
Sequencing and quality trimming
Sanger sequencing of the cleaned PCR products typically reads 600-900 bp per reaction, enough to cover each barcode amplicon end to end. Bidirectional sequencing (forward and reverse primer reads) is standard for casework to confirm ambiguous bases. Quality trimming removes low-confidence read ends before the database search.
Database search and identification
The query sequence is searched against BOLD using the Identification Engine and against GenBank using BLAST. A match threshold of 99-100% identity to a single species is generally treated as a species-level identification; 97-98% identity to a single genus is a genus-level identification. Ambiguous results (multiple species within threshold) require additional markers or morphological corroboration.
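The identification thresholds in the final step can be expressed as a small decision function. This is a minimal sketch, not a validated casework tool: the `Hit` class, the function names, and the default cut-offs of 99 and 97 percent are assumptions taken from the ranges quoted above, and it assumes percent identities have already been returned by the database search.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    species: str             # binomial, e.g. "Catha edulis"
    percent_identity: float  # identity to the query, as reported by the search


def genus_of(species: str) -> str:
    """First word of the binomial."""
    return species.split()[0]


def classify(hits, species_cutoff=99.0, genus_cutoff=97.0):
    """Apply the identity thresholds described above to a list of hits."""
    species_level = {h.species for h in hits if h.percent_identity >= species_cutoff}
    if len(species_level) == 1:
        return f"species-level: {next(iter(species_level))}"
    if len(species_level) > 1:
        return "ambiguous: multiple species within threshold"
    genera = {genus_of(h.species) for h in hits if h.percent_identity >= genus_cutoff}
    if len(genera) == 1:
        return f"genus-level: {next(iter(genera))}"
    if len(genera) > 1:
        return "ambiguous: multiple genera within threshold"
    # A miss does not mean the plant is exotic; it may simply be unsequenced.
    return "no identification: taxon may be absent from the reference database"


print(classify([Hit("Catha edulis", 100.0), Hit("Maytenus sp.", 97.6)]))
```

An "ambiguous" result is the cue to add a further marker or seek morphological corroboration rather than to pick the top-ranked name.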

Reference databases: BOLD and GenBank

The usefulness of any DNA barcode depends entirely on the completeness and accuracy of the reference database. If the species in question has not been sequenced or its record contains errors, the best laboratory work will return a misleading or incomplete answer. Both BOLD and GenBank have known weaknesses that forensic practitioners must account for.

Feature	BOLD	GenBank (NCBI)
Curation level	High: sequences tied to voucher specimens	Variable: many entries lack voucher links
Plant coverage	~500,000 barcode records (2024 estimate)	Larger total, but heterogeneous quality
Access model	Open identification engine; downloadable data	Fully open download
Preferred for forensics	Species-level IDs where voucher provenance matters	Backup search; broad coverage advantage
Error rate	Lower for curated records	Higher; taxonomic errors noted in peer review

A recurring issue is geographic coverage gaps. Plant groups with the highest forensic relevance, such as Cannabis, many Solanaceae, and tropical timber species implicated in illegal logging, have been intensively sequenced. But large numbers of regional landraces, cultivated varieties, and tropical species remain absent from both databases. When a sequence fails to match anything above 95% identity, the analyst cannot conclude the plant is exotic or novel; it may simply be unsequenced. Expert testimony must be explicit about this ceiling.

The quality of a database identification also depends on accurate taxonomy in the database itself. GenBank has known instances of misidentified reference sequences deposited under incorrect species names. Cross-referencing against BOLD, which requires a specimen voucher, and against published regional flora treatments gives the analyst a defensible basis for the identification in court.
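The cross-referencing step can be sketched as a simple concordance check between the top species call from each database. The function name and the wording of the outcome labels are hypothetical; the only logic taken from the text is that a voucher-linked BOLD record carries more weight and that disagreement may signal a misidentified reference sequence.

```python
from typing import Optional


def cross_reference(bold_species: Optional[str], genbank_species: Optional[str]) -> str:
    """Compare the top species call from each database (None means no match)."""
    if bold_species is None and genbank_species is None:
        return "no match in either database"
    if bold_species is None:
        return "GenBank only: no voucher-linked record supports the call"
    if genbank_species is None:
        return "BOLD only: voucher-linked, not corroborated by GenBank"
    if bold_species == genbank_species:
        return "concordant: both databases return the same species"
    return "discordant: check for a misidentified reference sequence"


print(cross_reference("Catha edulis", "Catha edulis"))
print(cross_reference("Catha edulis", "Maytenus sp."))
```

A discordant result is a prompt to inspect the underlying records and the regional flora treatments, not an automatic win for either database.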

Species resolution rates and their forensic limits

Across the broad sweep of vascular plants, rbcL plus matK resolves about 72 percent of species queries to species level in the BOLD database, according to the foundational CBOL validation study. That number sounds good until you ask what happens with the remaining 28 percent. Some fall in groups where speciation was rapid and recent, leaving no time for the chloroplast markers to diverge between daughter species. Others are hybrids, which may carry the chloroplast of one parent species while morphologically resembling the other.

Grasses (Poaceae): a large, economically important family with substantial forensic relevance (crime-scene grass, fodder trace from livestock operations). Resolution to species by rbcL and matK alone is poor for many genera; trnH-psbA or ITS is routinely added.
Oaks (Quercus): extensive hybridisation means chloroplast haplotypes cross species boundaries. Identifying the particular oak species from wood or leaf may require nuclear markers or leaf morphology.
Cannabis: despite its high forensic profile, Cannabis sativa, Cannabis indica, and hemp cultivars are not reliably separated by chloroplast barcodes; microsatellite or whole-genome sequencing is used for variety-level questions.
Tropical timber trees: many flagship traded species (rosewood, ebony, teak), several of them under CITES protection, do resolve well at species level, but gaps in reference coverage persist for lesser-known species in the same genera.

Forensic case applications

Plant DNA barcoding is applied in several distinct forensic contexts. Understanding which context applies to a given case helps the analyst choose the right markers and frame the evidential question correctly from the start.

Trace evidence and scene linkage: leaf fragments, seeds, or pollen on clothing, footwear, or vehicles can be assigned to a plant species and compared to vegetation at a suspect location. The barcode provides a class-level link; whether that link is meaningful depends on how common or rare the species is in the area.
Controlled-substance identification: identifying plant material as Cannabis, Papaver somniferum (opium poppy), Erythroxylum coca, or Catha edulis (khat) at species level supports charges that turn on the specific controlled substance. Morphological examination is the first-line method but barcoding resolves ambiguous or heavily processed material.
Illegal wildlife and timber trade: CITES-listed timber species (e.g. Brazilian rosewood, Dalbergia nigra) can be identified from wood chips, sawdust, or small cross-sections using rbcL alone when the species is well-represented in BOLD. The US Lacey Act and EU Timber Regulation both require proof of species identity in enforcement actions.
Food fraud and poisoning cases: identifying plant species in food products, herbal medicines, or contaminated feed where morphological characters are destroyed. Substitution of Digitalis lanata for harmless herbs has caused fatalities; barcode-based identification can supply the species call that morphology no longer can.
Burial context and time-of-death: roots growing through grave fill can be identified to species, contributing to burial-interval estimates alongside entomological evidence. Species identity matters because different plants have known root-growth rates.

Forensic plant barcoding workflow: scene to expert opinion.

Admissibility and expert evidence standards

In US federal courts and in states that follow Daubert v. Merrell Dow Pharmaceuticals (1993), expert scientific testimony must clear four criteria: the theory has been tested, the error rate is known, it has been peer-reviewed, and it is generally accepted in the relevant scientific community. Plant DNA barcoding using rbcL and matK satisfies all four. The method was published in the Proceedings of the National Academy of Sciences in 2009 by the CBOL Plant Working Group, has been independently replicated across many laboratories, has documented species resolution rates, and is used routinely by herbaria, biodiversity institutes, and customs enforcement agencies worldwide.

In English and Welsh courts, a botanical expert must comply with Part 19 of the Criminal Procedure Rules, which requires the expert to help the court rather than advocate for the party that retained them, to set out the facts relied on, and to indicate where there is a range of opinion. An expert presenting barcode evidence should state the percentage identity to the best database match, the number of other records within the match threshold, and the completeness of database coverage for the relevant plant family.
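The three figures an expert should state can be pulled from a list of search hits with a short helper. This is a sketch under stated assumptions: the function names, the dictionary keys, and the sentence template are hypothetical, and the example numbers are borrowed from the khat worked example in this topic.

```python
def report_figures(hits, threshold, family_records):
    """Summarise a database search for an expert report.

    hits: list of (species, percent_identity) tuples.
    threshold: identity (%) at or above which a record counts as a match.
    family_records: number of reference records searched for the family.
    """
    if not hits:
        raise ValueError("no database hits to report")
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    best_species, best_identity = ranked[0]
    # Records from any other species that also fall within the threshold.
    rivals = [sp for sp, ident in ranked[1:]
              if ident >= threshold and sp != best_species]
    return {
        "best_match": best_species,
        "percent_identity": best_identity,
        "other_species_within_threshold": len(rivals),
        "family_records_searched": family_records,
    }


def report_sentence(f):
    """Phrase the figures as a consistency statement, not a certainty."""
    return (
        f"The sequence is consistent with {f['best_match']} "
        f"({f['percent_identity']:.1f}% identity to the best match); "
        f"{f['other_species_within_threshold']} records from other species fall "
        f"within the match threshold among {f['family_records_searched']} "
        f"reference records for the family."
    )


figures = report_figures([("Maytenus sp.", 97.6), ("Catha edulis", 100.0)], 98.0, 423)
print(report_sentence(figures))
```

Keeping the figures separate from the sentence makes it easier to show the court what the sequence data show and what the database coverage permits.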

India's courts treat plant identification evidence under the Indian Evidence Act 1872, Section 45, which allows expert opinion on science, art, foreign law, and handwriting. A botanical DNA expert qualifies as a science expert. Chain-of-custody documentation linking the sequenced sample to the scene exhibit is particularly important because Indian courts scrutinise exhibit handling carefully, and any break in the chain can lead to exclusion.

Worked example

Identifying khat in a postal seizure

A parcel, some wilted leaves, and a database search that confirmed the species identity the morphology only suggested.

A border control agency intercepts a package labelled as 'dried herbs.' The material inside is a bundle of wilted shoots with some intact leaves but no flowers or fruit, making morphological identification uncertain. An examiner notes the serrated leaves and the characteristic reddish midrib, consistent with Catha edulis (khat), but requests DNA confirmation because the controlled-substance charge depends on correct species assignment.

Sub-sampling: a 2 cm square of leaf lamina is excised from the interior of the bundle, photographed as an exhibit in its own right, and placed in a 1.5 ml tube with silica desiccant for transport to the forensic plant DNA laboratory.
Extraction: CTAB extraction with 65°C incubation yields a light-green extract. After chloroform-isoamyl alcohol purification and isopropanol precipitation, the pellet is resuspended in TE buffer. Quantification by spectrophotometry gives 18 ng/µl, adequate for PCR.
PCR: rbcLa-F/rbcLa-R amplification yields a bright band at the expected 550 bp on an agarose gel. matK primers (3F_KIM/1R_KIM) also amplify cleanly. Both PCR products are cleaned with a column kit and sent for Sanger sequencing in both directions.
Identification: the rbcL sequence returns a 100% identity match (550/550 bp) to Catha edulis in BOLD, with the next-best match being Maytenus sp. at 97.6%. The matK sequence gives 99.4% identity to Catha edulis (Celastraceae). No other species matches above 98%. The combined result is a species-level identification with high confidence.
Expert report: the analyst reports that the sequence data are consistent with Catha edulis and inconsistent with any other species in the Celastraceae currently represented in BOLD, notes the gaps in database coverage for the family, and states that the morphological and molecular evidence together support the identification as khat. The report does not state 'this is khat' as a certainty but as the most supported identification given available data.
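The percent-identity figures quoted in the identification step are simple ratios of matching to aligned positions. The sketch below assumes an ungapped alignment of equal-length sequences and uses synthetic strings in place of real reads. The source gives only the percentage for the Maytenus hit; 537 matching positions out of 550 is an assumption, being the count that rounds to 97.6% if the full 550 bp were aligned.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percent identity over an ungapped, equal-length alignment."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if not query:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for q, r in zip(query, reference) if q == r)
    return 100.0 * matches / len(query)


# Synthetic stand-ins for a 550 bp rbcL read; NOT real sequence data.
query = "A" * 550
catha_reference = "A" * 550               # 550/550 positions match
maytenus_reference = "A" * 537 + "C" * 13  # ASSUMED 537/550 positions match

print(f"Catha edulis: {percent_identity(query, catha_reference):.1f}%")
print(f"Maytenus sp.: {percent_identity(query, maytenus_reference):.1f}%")
```

Real searches report identity over a gapped alignment that may be shorter than the query, so the aligned length should be stated alongside the percentage.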

At trial, the defence does not challenge the DNA method but questions whether the BOLD Celastraceae coverage is complete enough to exclude all alternatives. The analyst responds that 423 Celastraceae barcode records are in BOLD and none matches above 98%, which the court accepts as adequate grounds for the species identification. The combination of morphological consistency and molecular data leads to conviction.

Check your understanding

Why are rbcL and matK located in the chloroplast genome preferred over nuclear loci for forensic plant identification?

Key Takeaways

rbcL and matK are the CBOL-recommended plant barcoding loci because they reside in the chloroplast genome, which is present in hundreds of copies per cell and survives in degraded forensic samples.
The two-locus combination achieves approximately 72% species-level and over 90% genus-level identification across vascular plants; grasses, oaks, and Cannabis are among the groups with lower species resolution.
BOLD is the preferred forensic database because its sequences are linked to voucher-verified specimens; GenBank offers broader coverage but at lower curation quality.
Expert reports must state findings as consistencies rather than certainties, note database coverage gaps, and separate the sequence result from the taxonomic inference it supports.
Forensic applications include trace evidence linkage, controlled-substance identification, illegal timber enforcement, food fraud, and burial-interval estimation from root-growth species identity.

Why are rbcL and matK used for plant barcoding instead of ITS?

rbcL and matK are located in the chloroplast genome, which is present in hundreds of copies per cell and is therefore recoverable from small or degraded samples. ITS (internal transcribed spacer) resides in the nuclear genome and offers higher resolution at species level for some groups, but amplification from heavily degraded material is less reliable. The two chloroplast loci were selected as the CBOL standard because they work well together across the breadth of plant diversity.

What databases do forensic botanists use to identify a plant barcode sequence?

The two primary databases are BOLD (Barcode of Life Data Systems, run by the Centre for Biodiversity Genomics in Guelph, Canada) and GenBank (operated by the NCBI). BOLD holds curated barcode records linked to verified voucher specimens, while GenBank holds a far larger but less curated pool. Forensic analysts typically search both and weight BOLD identifications more heavily when the reference sequence is tied to a physical herbarium specimen.

Can rbcL and matK reliably identify plants to species level?

Not always. Studies across large plant families consistently achieve genus-level identification in more than 90% of cases, but species-level resolution drops to 70-80% depending on the plant group. Grasses (Poaceae) and some tropical tree families are notoriously difficult to resolve. When species-level identification is needed for a forensic opinion, analysts often add a third marker such as ITS or trnH-psbA, or turn to whole-chloroplast sequencing.

How is plant DNA extracted from degraded or old material at a crime scene?

The same principles used for degraded human DNA apply: use a silica-column or CTAB extraction protocol, select short PCR amplicons (150-200 base pairs, or shorter if the sample is very degraded), and include well-documented negative controls with every extraction batch to detect contamination. Soil can inhibit PCR, so additional purification steps are needed for material recovered from burial contexts.

Is plant DNA barcoding accepted as evidence in court?

Yes, in several jurisdictions. The legal standard depends on the court system: US courts apply the Daubert test, which asks whether the method has been tested, has a known error rate, has been peer-reviewed, and is generally accepted. Plant DNA barcoding using rbcL and matK meets all four criteria. The Palo Verde case in 1992 preceded formal barcoding standards but established the principle that plant DNA can link a suspect to a scene; modern barcode methods rest on a much larger validated database and broader peer-reviewed literature.
