Non-Human DNA and Species Identification

Species identification using DNA barcoding and nuclear markers extends the same molecular toolkit used for human profiling to animal, plant, and fungal biological material recovered from crime scenes. This topic covers cytochrome b barcoding, validated reference databases, casework applications, and the legal frameworks governing wildlife and food-fraud investigations worldwide.

Non-human DNA analysis applies the same extraction, amplification, and sequencing methods used for human identification to biological material from animals, plants, and fungi. The central technique is DNA barcoding: sequencing a short, standardised genomic region, usually a mitochondrial gene such as cytochrome b or cytochrome oxidase I (COI), and comparing the result against curated reference databases to assign a species identity. This approach is used in wildlife trafficking investigations, food-fraud prosecutions, timber and plant product cases, and any casework where the species of biological material is legally relevant. Because mitochondrial DNA is present in high copy number per cell, it can often be recovered from degraded, trace, or processed materials where nuclear markers fail.

The forensic value of species identification lies in its connection to specific legal prohibitions. The Convention on International Trade in Endangered Species (CITES) restricts trade in over 38,000 listed species, and domestic legislation in most jurisdictions attaches criminal penalties to those restrictions. When a confiscated product, a vehicle, or a hunting site yields biological material of unknown origin, species identity determines whether a criminal offence has occurred. A sequence match to a CITES Appendix I taxon is, in many systems, both the proof of the crime and the basis for the charge. The same logic applies in food-fraud cases, where consumer protection statutes in the EU, the US, and elsewhere prohibit mislabelling of species in meat, fish, and processed food products.

Non-human DNA casework sits at the intersection of forensic biology, taxonomy, and wildlife law. Practitioners need to understand the molecular methods, the reference database environment, the limits of sequence-based identification for closely related taxa, and the admissibility requirements that apply to DNA evidence in the relevant jurisdiction. The scope of biological evidence topic covers the broader categories of biological material encountered in casework. Wildlife forensics as a distinct subdiscipline is treated at wildlife forensics.

Species identification workflow:

  1. Sample receipt and documentation
  2. DNA extraction
  3. PCR amplification (cytb or COI primers)
  4. Sanger sequencing (both strands)
  5. Database query: BOLD and GenBank
  6. Similarity threshold check (98 to 99%)

Two possible outcomes:

  • At or above threshold: species-level assignment confirmed
  • Below threshold: reported at genus or family level only

Figure: the species identification workflow. A sequence match at or above the 98 to 99 percent similarity threshold yields a species-level assignment; below it, the report falls back to genus or family level only.

By the end of this topic you will be able to:

  • Explain the rationale for using mitochondrial markers in species identification and distinguish cytochrome b from COI barcoding approaches.
  • Describe the workflow for non-human DNA species identification from sample receipt to database query and species assignment.
  • Evaluate the quality and relevance of reference database entries from BOLD and GenBank when interpreting a casework sequence.
  • Identify casework scenarios, including wildlife trafficking, food fraud, and plant product cases, where species identification is legally decisive.
  • Describe the legal frameworks, including CITES and its domestic equivalents, under which non-human DNA evidence is used in prosecutions.
Key terms
DNA barcoding
The sequencing of a short, standardised genetic region to assign a specimen to a species. For animals, the primary barcode markers are a 648-bp segment of COI or the cytochrome b gene. The sequence is matched against a reference database to return a species identity, usually expressed as a percentage similarity score.
Cytochrome b (cytb)
A protein-coding gene in the mitochondrial genome used widely for vertebrate species identification. Its moderate mutation rate resolves most closely related species while remaining stable enough for reliable intraspecific comparison. High mitochondrial copy number improves recovery from degraded samples.
COI (cytochrome oxidase subunit I)
The primary barcode gene for the animal kingdom under the Barcode of Life initiative. A 648-bp region within the COI gene is the official BOLD barcode. COI is less used in vertebrate forensic work than cytochrome b but is standard for invertebrates, fish, and many other taxa.
BOLD (Barcode of Life Data System)
A curated online reference database of barcode sequences maintained by the University of Guelph, Canada. Each entry includes a voucher specimen, taxonomic identification, and collection data. BOLD is the preferred database for species-level barcoding queries because of its curation standards and provenance records.
CITES
The Convention on International Trade in Endangered Species of Wild Fauna and Flora, a multilateral treaty signed in 1973 now with over 180 parties. CITES Appendix I lists species for which commercial trade is prohibited; Appendix II requires permits; Appendix III allows countries to list species requiring cooperation from other parties. DNA identification of CITES-listed taxa is central evidence in wildlife crime prosecutions.
Species identification threshold
The minimum sequence similarity percentage at which a database match is accepted as a species-level assignment. Forensic laboratories typically require 98 to 99 percent similarity to the closest reference sequence for a definitive species call. Below this threshold, the result is reported at genus or family level only.

Why mitochondrial DNA for species identification

The choice of mitochondrial DNA for species identification is not arbitrary. Each somatic cell contains hundreds to thousands of mitochondria, and each mitochondrion carries multiple copies of its circular genome. The result is that mitochondrial targets exist in far higher copy number per cell than any single-copy nuclear locus. When a sample has been processed, cooked, dried, or otherwise degraded, the chance of recovering a short mitochondrial amplicon often remains high when nuclear markers have become undetectable. This copy-number advantage is the primary reason mitochondrial markers were adopted for species work before barcoding became formalised.

The mitochondrial genome also evolves at a higher rate than most nuclear regions, accumulating synonymous substitutions at a rate roughly five to ten times faster than single-copy nuclear genes. This rate produces sufficient interspecific variation to distinguish closely related species, while the maternal inheritance pattern and lack of recombination keep intraspecific variation low enough to give consistent sequence types within a species. For most vertebrate groups, cytochrome b sequences cluster into species-level groups with clear gaps to neighbouring species, the so-called barcoding gap.

The limitation of mitochondrial markers is that they report maternal lineage only. Where hybridisation between species is common, a mitochondrial sequence assigns a hybrid to its maternal parent's species and gives no indication that it is a hybrid. Nuclear markers are needed to detect hybrid status. In casework involving closely related taxa, such as distinguishing African savanna elephant from African forest elephant, or distinguishing species within a complex where hybridisation is known, the analysis may require both mitochondrial sequencing for species assignment and nuclear microsatellite profiling for individual or population-level work.

Marker choice: cytochrome b versus COI

Two mitochondrial markers dominate non-human forensic work. Cytochrome b (cytb), a protein-coding gene of about 1140 bp in most vertebrates, has accumulated an extensive forensic validation record across mammals, birds, and reptiles. The published primer sets are numerous, allowing analysts to select short amplicons (150 to 400 bp) suited to degraded samples while still capturing diagnostic sequence. Cytochrome b has been the marker of choice in food-fraud and wildlife casework published in the forensic literature since the 1990s.

COI (cytochrome oxidase subunit I) is the official Barcode of Life marker for the animal kingdom. The standard 648-bp amplicon is well-represented in BOLD, which gives COI queries an advantage in database breadth, particularly for fish, invertebrates, and less-studied taxa. For vertebrates commonly encountered in wildlife casework, both markers perform well. The choice is partly practical: cytb primers often produce cleaner amplification from processed meat samples, while COI has better coverage in BOLD for fish species identification, which matters in fisheries fraud cases.

Feature | Cytochrome b | COI
Gene length | 1140 bp (full) | 1548 bp (full), 648 bp barcode region
Primary taxonomic use | Mammals, birds, reptiles | All animals; dominant for fish and invertebrates
Main reference database | GenBank | BOLD (primary), GenBank
Forensic validation | Extensive, since 1990s | Growing, especially post-2003 BOLD launch
Performance in degraded samples | Short amplicons available, high copy number | Short mini-barcodes available, high copy number
Hybridisation limitation | Yes, maternal lineage only | Yes, maternal lineage only

For plant identification, neither cytb nor COI is suitable: plant mitochondrial genes, these two included, generally evolve too slowly to resolve species. The accepted plant barcoding markers are two chloroplast regions, rbcL and matK, which together form the two-locus plant barcode endorsed by the Consortium for the Barcode of Life. Fungal identification typically uses the ITS (internal transcribed spacer) region of ribosomal DNA. Analysts working with plant or fungal evidence need to apply the appropriate marker set.

Laboratory workflow: from sample to species assignment

The laboratory workflow for non-human species identification follows the same general phases as any forensic DNA analysis: receipt and documentation, extraction, quantification, amplification, and interpretation. The key differences from human STR profiling lie in primer selection, PCR optimisation for the target taxon, sequencing rather than fragment analysis, and reference database querying rather than allele comparison.

  • Sample receipt and documentation. Chain of custody documentation applies identically to non-human samples. Each item is photographed, described, and assigned a laboratory identifier before packaging is opened.
  • DNA extraction. Silica-column methods (e.g. QIAGEN DNeasy) work for most tissue and blood samples. Processed food samples may require modifications to break down fats and denatured proteins. Hair with the root attached can be extracted directly; shaft-only hair requires optimisation.
  • Quantification. Real-time PCR quantification using broad-spectrum eukaryotic primers gives a DNA yield estimate before amplification. Some protocols skip this step for non-human work and proceed directly to amplification, particularly with fresh tissue.
  • PCR amplification. Published primer sets targeting cytb or COI are selected based on the suspected taxon. For unknown material, universal vertebrate or mammal primers provide a first-pass screen. Amplicon length is chosen based on sample quality: 150 to 250 bp for degraded samples, longer for fresh material.
  • Sanger sequencing. Amplicons are purified and submitted for Sanger (capillary electrophoresis) sequencing. Both strands are sequenced where possible, so that each base call in the forward read is confirmed by the reverse read. This bidirectional sequencing is required for casework reports in most laboratory accreditation frameworks.
  • Sequence editing and database query. Raw trace files are inspected and edited to produce a consensus sequence. The sequence is then submitted to BOLD and GenBank using BLAST (Basic Local Alignment Search Tool) or the BOLD Species ID engine. The analyst records the top matches, similarity scores, and the provenance of the closest reference entries.
  • Interpretation and reporting. A species assignment is made if the closest match exceeds the laboratory's validated similarity threshold (typically 98 to 99 percent) and the reference entry is of known quality and provenance. If the match falls below the threshold, the result is reported at genus or family level.
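
The interpretation step at the end of this workflow can be sketched in a few lines of Python. This is a minimal illustration, not a validated laboratory procedure: the `Hit` record, the `vouchered` flag as a stand-in for "known quality and provenance", and the default threshold of 98.0 are all assumptions made for the example, and the sequences are toy strings rather than real barcodes. Real casework would take the similarity score from the BLAST or BOLD output and apply the laboratory's own validated threshold.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One database match for a query sequence (all fields are illustrative)."""
    species: str        # binomial name, e.g. "Panthera tigris"
    similarity: float   # percent identity to the query, 0 to 100
    vouchered: bool     # assumed proxy for a reference entry of known provenance


def percent_identity(query: str, reference: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(query) != len(reference) or not query:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(query)


def reporting_level(best: Hit, threshold: float = 98.0) -> str:
    """Return the taxonomic level at which the result would be reported.

    Assumption: a match that meets the threshold but lacks a vouchered
    reference is treated like a sub-threshold match and not called to species.
    """
    if best.similarity >= threshold and best.vouchered:
        return f"species: {best.species}"
    genus = best.species.split()[0]
    return f"genus or family level only (closest genus: {genus})"


# Toy usage: a 10-base alignment with one mismatch.
score = percent_identity("ACGTACGTAC", "ACGTACGTAT")
print(score, reporting_level(Hit("Panthera tigris", score, True)))
```

The sketch uses "at or above" the threshold, matching the workflow figure; a laboratory whose validation study defines the cut-off as strictly greater would change the comparison accordingly.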

Reference databases: BOLD and GenBank

The quality of a species identification is only as good as the reference database it is compared against. Two databases are used in virtually all non-human forensic casework. BOLD (Barcode of Life Data System, barcodinglife.org) is maintained by the University of Guelph and holds over 10 million barcode records linked to voucher specimens with taxonomic and collection provenance. BOLD applies a curation model: entries must be accompanied by specimen voucher information before they are included in the identification engine. This means BOLD sequences are generally more reliable for forensic use than an unfiltered GenBank query.

GenBank (ncbi.nlm.nih.gov/genbank), maintained by the US National Center for Biotechnology Information (NCBI), accepts sequence submissions without requiring voucher specimens. Its sequence pool is larger and covers more taxa and gene regions than BOLD, making it indispensable for species outside the BOLD barcode markers or for plant and fungal ITS queries. The analyst must appraise GenBank entries critically: misidentified vouchers, sequences from laboratory strains of uncertain identity, and entries with minimal taxonomic annotation do occur and can produce false assignments.

Standard forensic practice is to query both databases, report the top hits with their accession numbers, similarity percentages, and the geographic and taxonomic provenance of the reference specimen, and flag any discordance between the two databases. Where BOLD and GenBank return the same species at high similarity, the assignment is strong. Where they disagree, or where the closest match is to a species not known to occur in the geographic region of the casework, the analyst investigates before making an assignment.
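
The cross-check described above can be expressed as a small set of review flags. The sketch below is hypothetical throughout: the `TopHit` record, the accession strings, the `regional_species` set, and the 98.0 threshold are placeholders invented for the example, and neither database is actually queried. It shows only the decision logic an analyst applies once the top hits from both databases are in hand.

```python
from typing import NamedTuple


class TopHit(NamedTuple):
    database: str      # "BOLD" or "GenBank"
    accession: str     # placeholder identifier, not a real accession
    species: str
    similarity: float  # percent similarity to the query


def review_flags(bold: TopHit, genbank: TopHit,
                 regional_species: set,
                 threshold: float = 98.0) -> list:
    """List the reasons, if any, why an assignment needs further investigation."""
    flags = []
    if bold.species != genbank.species:
        flags.append("discordant species between databases")
    for hit in (bold, genbank):
        if hit.similarity < threshold:
            flags.append(f"{hit.database} similarity below threshold")
        if hit.species not in regional_species:
            flags.append(f"{hit.database} top hit not known from casework region")
    return flags


# Toy usage with placeholder accessions.
known_locally = {"Cervus elaphus", "Capreolus capreolus"}
bold_hit = TopHit("BOLD", "BOLD-PLACEHOLDER", "Cervus elaphus", 99.4)
genbank_hit = TopHit("GenBank", "GB-PLACEHOLDER", "Cervus nippon", 99.1)
for flag in review_flags(bold_hit, genbank_hit, known_locally):
    print(flag)
```

An empty list corresponds to the strong case in the text, where both databases return the same locally plausible species at high similarity. Any flag means the analyst investigates before making an assignment.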

Criterion | BOLD | GenBank (NCBI)
Primary content | Barcode sequences (COI and others) | All DNA sequences, all gene regions
Curation | Voucher-linked, taxonomist-reviewed | Author-submitted, minimal curation
Coverage | Best for animals via COI; expanding | Broadest overall, essential for plants and fungi
Forensic preference | First choice for species ID queries | Essential supplement; appraise entries critically
Query tool | BOLD Species ID engine | BLAST (blastn, megablast)
Output | Similarity score, specimen provenance | E-value, bit score, accession metadata

Casework applications: wildlife, food fraud, and beyond

Wildlife trafficking is the most legally significant application. CITES Appendix I taxa, including tigers, great apes, most rhino and elephant populations, and many reptile and plant species, are subject to near-total trade prohibition. Seizures of ivory, horn, skin, or bone require species identification to establish that a CITES-listed species is involved before a criminal charge can be framed. The listed status of a species is determined by the CITES Appendices, which are updated at each Conference of the Parties. Analysts must check the listing against the Appendices in force at the time of the offence, not an outdated version.

National implementing legislation adds another layer. In the United States, the Lacey Act 1900 (as amended) prohibits trade in wildlife taken in violation of any law, domestic or foreign, making the CITES violation in the country of origin a predicate offence. The Endangered Species Act 1973 (ESA) establishes separate domestic protections for listed US species. In the United Kingdom, the Control of Trade in Endangered Species Regulations 2018 (COTES 2018) implements CITES and creates specific offences carrying up to five years' imprisonment. In India, the Wildlife Protection Act 1972, amended most recently in 2022, schedules protected species with penalties of up to seven years for Schedule I offences. In the European Union, Council Regulation (EC) 338/97 implements CITES and is directly applicable across member states.

Food fraud is a growing area for non-human DNA casework. Regulatory frameworks in the EU (Regulation 1169/2011 on food information to consumers), the US (under FDA jurisdiction and the Federal Food, Drug, and Cosmetic Act), and many other countries require accurate species labelling of fish, meat, and seafood products. Substitution of cheaper species for premium ones, such as labelling Nile perch as grouper, or substituting donkey meat for beef in processed products, is detected through cytb or COI sequencing of the product sample. The 2013 European horsemeat scandal, in which beef products were found to contain undeclared horse and pig DNA, accelerated investment in multi-species detection methods including metabarcoding.

Plant and timber forensics use chloroplast markers to identify wood species in illegal logging cases. CITES listings, mostly in Appendix II or III, cover many timber species including certain mahogany, rosewood, and ebony taxa. Stable isotope analysis of wood can further assign geographic origin, complementing the species identification. The forensic botany subject covers plant evidence in broader detail. For animal and insect evidence in wildlife and ecological crime, see forensic entomology.

Limitations, mixtures, and next-generation sequencing

Sanger sequencing of a single amplicon identifies one dominant sequence. This is sufficient for a sample containing biological material from one species. It fails for mixtures: if two species' DNA is present in similar proportions, the chromatogram will show double peaks at polymorphic positions and cannot be interpreted as a single sequence. This scenario arises frequently in processed foods, environmental samples, and composite products. The solution is next-generation sequencing (NGS) in metabarcoding mode, where millions of short reads are each individually assigned to their taxon of origin, giving a species inventory of the entire sample.
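
The final tally in a metabarcoding run, once every read has been assigned to a taxon, can be sketched as below. This is a simplified illustration only. The read assignments are made-up labels rather than pipeline output. The `min_fraction` noise cut-off is an assumption added for the example, not a value from any validated protocol. Unassigned reads are represented here by `None`.

```python
from collections import Counter
from typing import Iterable, Optional


def species_inventory(read_assignments: Iterable[Optional[str]],
                      min_fraction: float = 0.01) -> dict:
    """Summarise per-read taxon assignments as fractions of the assigned reads.

    Reads with no assignment (None) are ignored. Taxa below `min_fraction`
    are dropped as possible low-level noise (an assumed, adjustable cut-off).
    """
    counts = Counter(t for t in read_assignments if t is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total
            for taxon, n in counts.most_common()
            if n / total >= min_fraction}


# Toy usage: a "beef" product whose reads also map to horse.
reads = (["Bos taurus"] * 70 + ["Equus caballus"] * 29
         + ["Sus scrofa"] * 1 + [None] * 5)
for taxon, fraction in species_inventory(reads, min_fraction=0.02).items():
    print(f"{taxon}: {fraction:.2f}")
```

Read fractions are not a direct measure of tissue proportions, because mitochondrial copy number and primer efficiency differ between species. The inventory is best read as a list of taxa detected rather than a quantitative recipe.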

The barcoding gap problem is a second systematic limitation. For recently diverged species pairs, intraspecific variation and interspecific variation overlap, and a sequence-based identification cannot reliably separate the two. This is documented in several commercially important fish groups, in deer species, and in some bird families. Where the barcoding gap is absent, the forensic analyst must report an inability to discriminate and consider morphological examination, nuclear markers, or restriction enzyme profiling as supplementary approaches.
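
Whether a barcoding gap exists for a set of reference sequences can be checked by comparing the largest within-species distance with the smallest between-species distance. The sketch below uses the uncorrected p-distance (proportion of differing sites) on toy, pre-aligned sequences. Published barcoding studies often use a model-corrected distance instead, so the choice of metric here is an assumption made for simplicity, and the species labels are placeholders.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcoding_gap(references: dict) -> tuple:
    """Return (max intraspecific distance, min interspecific distance).

    `references` maps a species label to a list of aligned sequences.
    A gap exists when the second value exceeds the first.
    """
    labelled = [(sp, seq) for sp, seqs in references.items() for seq in seqs]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not inter:
        raise ValueError("need sequences from at least two species")
    return max(intra, default=0.0), min(inter)


# Toy usage: two well-separated placeholder species.
refs = {
    "species A": ["AAAAAAAAAA", "AAAAAAAAAT"],
    "species B": ["AAAAACCCCC", "AAAAACCCCT"],
}
max_intra, min_inter = barcoding_gap(refs)
print("gap present" if min_inter > max_intra else "no gap: cannot discriminate")
```

When the minimum interspecific distance does not exceed the maximum intraspecific distance, the two distributions overlap. That is the situation in which the analyst reports an inability to discriminate.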

Individual identification within a species, for example matching a seized ivory tusk to a specific elephant, or a confiscated feather to a specific bird, requires nuclear microsatellite profiling rather than barcoding. Reference populations are needed to calculate match probabilities. For many protected species, reference databases are sparse, which limits the statistical weight of individual matches. The forensic biotechnology subject covers the molecular tools relevant to this level of analysis.
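
To show why reference populations matter, the sketch below computes a random match probability for a microsatellite profile under the simplest textbook model: Hardy-Weinberg genotype frequencies at each locus, multiplied across loci on the assumption that the loci are independent. The locus names and allele frequencies are invented for the example, and the model applies no correction for population substructure or inbreeding, which casework on small or structured wildlife populations would normally need to address.

```python
from math import prod


def genotype_frequency(allele_a: str, allele_b: str, freqs: dict) -> float:
    """Hardy-Weinberg genotype frequency: p^2 for homozygotes, 2pq for heterozygotes."""
    p, q = freqs[allele_a], freqs[allele_b]
    return p * p if allele_a == allele_b else 2 * p * q


def random_match_probability(profile: dict, population_freqs: dict) -> float:
    """Multiply per-locus genotype frequencies (assumes independent loci)."""
    return prod(
        genotype_frequency(a, b, population_freqs[locus])
        for locus, (a, b) in profile.items()
    )


# Toy usage with hypothetical loci and allele frequencies.
population = {
    "locus1": {"150": 0.1, "154": 0.2, "158": 0.7},
    "locus2": {"201": 0.5, "205": 0.5},
}
profile = {"locus1": ("150", "154"), "locus2": ("201", "201")}
print(random_match_probability(profile, population))
```

Every frequency in `population` has to be estimated from sampled animals of the relevant population. Where few reference animals have been sampled, those estimates are uncertain, and so is any match probability built on them.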

Check your understanding

Why are mitochondrial markers preferred over single-copy nuclear markers for species identification in degraded or processed samples?

Key Takeaways

  • Mitochondrial markers, primarily cytochrome b for vertebrates and COI for broader animal work, are preferred for species identification because their high copy number per cell enables recovery from degraded and processed samples.
  • Species assignment requires querying both BOLD and GenBank, applying a validated similarity threshold of 98 to 99 percent, and critically appraising the provenance and quality of the closest reference entries before reporting.
  • CITES and its domestic implementing statutes, including the Lacey Act and ESA in the US, COTES 2018 in the UK, Council Regulation 338/97 in the EU, and the Wildlife Protection Act 1972 (amended 2022) in India, make species identification the evidentiary foundation of most wildlife crime prosecutions.
  • Sanger sequencing identifies a single dominant species; mixed-species samples from processed foods or composite products require next-generation sequencing metabarcoding to inventory all contributing taxa.
  • Individual-level identification within a species requires nuclear microsatellite profiling rather than barcoding, and the statistical weight of such evidence depends on the availability and quality of population reference databases for the taxon.
What is DNA barcoding and how is it used in forensic casework?
DNA barcoding is the sequencing of a short, standardised genetic marker to identify an organism's species. In forensic biology, the most widely used barcodes for animals are a 648-base-pair segment of the mitochondrial COI gene and all or part of the mitochondrial cytochrome b gene. The sequence is compared against a reference database such as BOLD or GenBank to return a species assignment. Barcoding is used in wildlife trafficking cases, food adulteration prosecutions, and any casework where species identity of biological material is in dispute.
Why is cytochrome b preferred over other markers for animal species identification?
Cytochrome b sits in the mitochondrial genome, so each cell contains hundreds to thousands of copies, making recovery from degraded or trace samples more likely than with single-copy nuclear markers. Its mutation rate is high enough to resolve closely related species but low enough to give consistent intraspecific readings. Primers and protocols have been extensively validated and published across major animal groups, so cytochrome b results from different laboratories are directly comparable.
What reference databases are used to interpret non-human DNA sequences?
The two principal databases are BOLD (Barcode of Life Data System, maintained by the University of Guelph) and GenBank (maintained by the US National Center for Biotechnology Information). BOLD is curated specifically for barcoding and reports similarity scores and specimen provenance. GenBank holds a broader sequence pool but requires more critical appraisal of entry quality. For forensic use, analysts compare their query sequence against both and require a minimum similarity threshold, typically 98 to 99 percent, before accepting a species assignment.
Which international laws govern wildlife forensic casework?
The Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES, 1973) is the primary instrument, with over 180 parties. CITES lists species in three appendices by trade restriction level. National implementing legislation varies: in the US the Lacey Act and Endangered Species Act apply; in the UK the Control of Trade in Endangered Species Regulations 2018 (COTES) implement CITES domestically; in India the Wildlife Protection Act 1972, most recently amended in 2022, is the key statute. DNA evidence is increasingly used in prosecutions under all these frameworks.
How does non-human DNA analysis differ from human STR profiling in the laboratory?
Human STR profiling targets validated short tandem repeat loci with commercially kitted reagents, and the resulting profiles are compared to population databases. Non-human species identification usually targets mitochondrial sequence markers for species assignment first, then may use nuclear microsatellites for individual identification within a species. There are no universal commercial kits for most non-human species, so analysts must select and validate primers for the taxon at issue. Reference populations are often sparse or absent, limiting the statistical weight of individual-level comparisons.
