How variation between people is encoded into the genome and harvested by forensic ID: RFLP, VNTR, STR, SNP and mtDNA as the marker classes that drove the discipline from 1985 to today, and the heredity rules (autosomal, X-linked, Y-linked, mitochondrial) every examiner uses to read a profile.
In September 1984, Alec Jeffreys at the University of Leicester produced the first DNA fingerprint. He was studying inherited variation in the human myoglobin gene when he noticed that certain regions of the genome (hypervariable minisatellites) varied so dramatically between individuals that the Southern blot pattern from one person was almost certainly unique to that person. The method rested on restriction fragment length polymorphism (RFLP) analysis of these loci. Within two years, the technique had been used to resolve an immigration case (the first forensic DNA application). In 1986 it exonerated an innocent man in the Narborough murder inquiry, and in 1987 it identified Colin Pitchfork as the actual killer; his conviction in 1988 was the first murder conviction based on DNA evidence.
The discipline has changed substantially since 1986. RFLP gave way to PCR-based short tandem repeat (STR) typing, first read on slab gels and then on capillary electrophoresis platforms, which now sit alongside next-generation sequencing and probabilistic genotyping software. But the underlying concept has not changed: forensic DNA identification works by measuring the natural variation between individual human genomes, comparing the variant profile from an unknown crime-scene sample to a known reference, and expressing the statistical weight of any match (as a random match probability or a likelihood ratio). Every analytical improvement in the past four decades has either expanded the number of variant markers measured, reduced the amount of DNA needed to measure them, or improved the statistical framework for interpreting the result.
Understanding the different categories of genomic variation, their inheritance patterns, and their discriminating power is the foundation on which all forensic DNA interpretation rests. A profile that cannot be interpreted correctly is worse than no profile at all, because an incorrect interpretation may incriminate the innocent or exonerate the guilty. Courts in the US, UK, India, and across the EU have all revisited high-profile cases where the underlying genetics was misunderstood or misrepresented at trial.
Roughly one nucleotide in every thousand differs between any two unrelated humans, and that 0.1% variation is the entire basis of forensic DNA identification.
The human genome contains approximately 3.2 billion base pairs per haploid set. A diploid somatic cell carries two complete copies, one from each parent, for a total of about 6.4 billion base pairs. Any two unrelated humans share approximately 99.9% of their DNA sequence; the remaining 0.1% amounts to about 3-4 million variant positions across the genome. These differences arise from mutations (primarily single-nucleotide substitutions, but also insertions, deletions, duplications, and rearrangements) that have accumulated in human populations over evolutionary time. A variant position is conventionally defined as a polymorphism when its less common (minor) allele has a population frequency above 1%.
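The 1% convention and the 0.1% arithmetic can be made concrete with a small sketch. It assumes a simple biallelic site and uses hypothetical genotype counts; the function names are illustrative, not from any forensic package.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Minor allele frequency at a biallelic site from genotype counts."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    ref_freq = (2 * n_hom_ref + n_het) / total_alleles
    return min(ref_freq, 1.0 - ref_freq)


def is_polymorphism(maf: float, threshold: float = 0.01) -> bool:
    """Conventional cut-off: the minor allele must exceed 1% in the population."""
    return maf > threshold


# About 0.1% of a ~3.2-billion-bp haploid genome differs between two unrelated people.
expected_differences = round(3.2e9 * 0.001)

if __name__ == "__main__":
    # Hypothetical genotype counts from 1,000 people.
    maf = minor_allele_frequency(n_hom_ref=640, n_het=320, n_hom_alt=40)
    print(expected_differences)
    print(round(maf, 3), is_polymorphism(maf))
```

The 3.2 million figure sits inside the 3-4 million range quoted above; a site whose minor allele reaches 20% is comfortably a polymorphism under the 1% rule.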
Polymorphisms are classified by their molecular character. A single-nucleotide polymorphism (SNP) is a position where a single base differs between at least two alleles in a population. An insertion/deletion polymorphism (indel) is a position where one allele carries one or more extra bases relative to the other. A variable number tandem repeat (VNTR) is a region where a short sequence motif is repeated in tandem, and the number of repeats varies between individuals. Short tandem repeats (STRs), also called microsatellites, are VNTRs with a repeat unit of 2-7 base pairs; the longer-unit VNTRs used in early forensic work are often called minisatellites. Each of these classes has been used in forensic identification at different points in the discipline's history, and each has its own discriminating power, inheritance pattern, and practical constraints.
The formal measure of a polymorphism's discriminating power is its heterozygosity (the probability that two alleles drawn at random from the population are different) or its polymorphism information content (PIC), a closely related measure originally developed for linkage mapping. A locus where every individual carries one of only two alleles has limited discriminating power. A locus with many alleles (the most polymorphic forensic STR loci have dozens) has high discriminating power. The STR loci in the CODIS and ESS core sets were selected partly on this criterion: they show high heterozygosity across all major population groups, so matching profiles at multiple such loci quickly becomes astronomically improbable by chance alone.
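Both measures follow directly from allele frequencies. The sketch below uses the standard expected-heterozygosity formula and the Botstein et al. (1980) PIC formula; the two frequency lists are hypothetical, chosen to contrast a two-allele locus with an evenly spread ten-allele locus.

```python
from itertools import combinations


def _check(freqs: list[float]) -> None:
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")


def expected_heterozygosity(freqs: list[float]) -> float:
    """H = 1 - sum(p_i^2): probability that two randomly drawn alleles differ."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs: list[float]) -> float:
    """PIC = H - sum over allele pairs i<j of 2 * p_i^2 * p_j^2 (Botstein et al., 1980)."""
    het = expected_heterozygosity(freqs)
    return het - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))


if __name__ == "__main__":
    biallelic = [0.5, 0.5]    # the best a two-allele locus can do
    ten_alleles = [0.1] * 10  # hypothetical, evenly spread STR-like locus
    for name, f in [("biallelic", biallelic), ("ten alleles", ten_alleles)]:
        print(name, round(expected_heterozygosity(f), 3),
              round(polymorphism_information_content(f), 3))
```

A two-allele locus cannot exceed H = 0.5 however its frequencies are balanced, whereas the ten-allele locus reaches H = 0.9, which is why multi-allelic STRs dominate identification panels.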
Alec Jeffreys' 1984 method was powerful enough to convict a killer, yet required a stain the size of a postage stamp and weeks of autoradiograph development, a set of constraints that drove the field toward PCR within a decade.
Restriction fragment length polymorphism (RFLP) analysis, as developed for forensic use, relied on three properties of DNA. First, restriction endonucleases cut double-stranded DNA at specific recognition sequences (typically 4-8 base pairs). Second, the human genome contains variable number tandem repeat (VNTR) loci where a core repeat unit of 10-60 base pairs is tandemly repeated, with the number of repeats varying between individuals. The alleles at a VNTR locus therefore have different lengths. Third, if restriction sites flank a VNTR locus, cutting genomic DNA with the appropriate enzyme produces fragments whose length depends on the VNTR allele present. Southern blotting and hybridisation with a probe complementary to the VNTR sequence reveal these fragment-length differences as bands on an autoradiograph.
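The three properties combine into a simple rule: the fragment that carries the VNTR grows by one repeat unit for every extra repeat. The sketch below digests a hypothetical locus in silico. HaeIII (recognition site GGCC, cut GG^CC) is used as the example enzyme; the flank and core sequences are invented for illustration and contain no other HaeIII sites.

```python
HAEIII_SITE = "GGCC"  # HaeIII recognition sequence
HAEIII_CUT = 2        # HaeIII cuts GG^CC, two bases into the site


def cut_positions(seq: str, site: str = HAEIII_SITE, offset: int = HAEIII_CUT) -> list[int]:
    """Return every cut coordinate the enzyme produces in seq."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    return cuts


def fragment_containing(seq: str, position: int) -> int:
    """Length of the restriction fragment that contains a given coordinate."""
    bounds = [0] + cut_positions(seq) + [len(seq)]
    for left, right in zip(bounds, bounds[1:]):
        if left <= position < right:
            return right - left
    raise ValueError("position outside sequence")


# Hypothetical locus: flanks carrying one HaeIII site each around a 10-bp core repeat.
LEFT_FLANK = "TTTTGGCCATATATAT"
RIGHT_FLANK = "TATATATAGGCCTTTT"
CORE = "ACTTAGTCAT"


def vntr_allele(repeats: int) -> str:
    """Build the hypothetical allele carrying the given number of core repeats."""
    return LEFT_FLANK + CORE * repeats + RIGHT_FLANK


if __name__ == "__main__":
    for n in (5, 8, 12):
        seq = vntr_allele(n)
        print(n, "repeats ->", fragment_containing(seq, len(LEFT_FLANK)), "bp fragment")
```

Alleles of 5 and 8 repeats give fragments 30 bp apart (three 10-bp units); on a real blot it is this length difference, not the sequence, that the probe reveals.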
Jeffreys' original multi-locus probes hybridised to many minisatellites at once and produced the barcode-like pattern he called a DNA fingerprint; operational casework soon moved to single-locus probes, typing four to six independent VNTR loci to build a multi-locus profile. The discriminating power of either approach was extraordinary: early calculations suggested that the probability of two unrelated individuals sharing the same multi-locus RFLP pattern was below one in a billion, though later population genetics work showed these estimates required more careful statistical grounding. The critical limitation was practical rather than theoretical: RFLP typically required 100-500 nanograms of high-molecular-weight (unfragmented) genomic DNA. Degraded DNA gave ambiguous or uninformative banding patterns. A bloodstain smaller than approximately 2 cm in diameter on a non-porous substrate, or any stain that had been exposed to significant environmental insult, often failed RFLP typing entirely.
People v. Castro (New York, 1989) is the landmark case that exposed both the power and the limitations of early RFLP typing. The prosecution's laboratory, Lifecodes Corporation, reported that DNA from a bloodstain on Castro's watch matched the victim. The defence challenged the testing conditions and the population statistics. In an unusual step during the pre-trial hearing before Justice Gerald Sheindlin, expert witnesses from both sides, including the geneticist Eric Lander, met and issued a joint statement that the testing in this case did not meet accepted scientific standards. The court concluded that while the theory of RFLP was scientifically valid, Lifecodes had not followed adequate protocols in this specific case, and the DNA match evidence was excluded from the prosecution's case. The case became a catalyst for quality standards in forensic DNA typing, adding impetus to the quality-assurance work of the FBI-convened Technical Working Group on DNA Analysis Methods (TWGDAM, formed in 1988 and later renamed SWGDAM) and to the US National Research Council's 1992 report on DNA technology in forensic science.
The shift from RFLP to STR during the 1990s was not just a technical upgrade; it was the moment forensic DNA went from a specialist technique requiring large stains to a routine operational tool that could profile the handful of cells left on a cigarette end.
Short tandem repeats (STRs) are regions of the genome where a core sequence of 2-7 base pairs is repeated in tandem. An STR locus might have alleles ranging from 5 to 20 repeats of a four-base unit (a tetranucleotide repeat), with each allele differing from adjacent alleles by four base pairs, apart from occasional partial-repeat microvariants such as TH01 allele 9.3. The amplicons are short enough (typically under 500 base pairs) to be separated efficiently by capillary electrophoresis, and the spacing between alleles is wide enough that automated fragment-analysis software (on Applied Biosystems 3130 or 3500xL series instruments or equivalent) can call alleles with high confidence.
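The repeat arithmetic behind an allele designation can be sketched as follows. This is a simplification: real fragment-analysis software calls alleles by comparison with an allelic ladder run alongside the samples, whereas this sketch assumes a known, hypothetical flanking length and a clean fragment size.

```python
def call_allele(fragment_bp: float, flank_bp: int, unit_bp: int = 4,
                tolerance_bp: float = 0.4) -> str:
    """Convert a sized STR fragment into a repeat-number allele designation.

    Assumes fragment = flanking sequence + whole repeats + any partial repeat.
    A partial repeat of k bases is written as a microvariant, e.g. '9.3'.
    """
    whole_bp = round(fragment_bp)
    if abs(fragment_bp - whole_bp) > tolerance_bp:
        raise ValueError(f"{fragment_bp} bp is too far from a whole base to call")
    repeat_region = whole_bp - flank_bp
    if repeat_region < 0:
        raise ValueError("fragment shorter than the flanking sequence")
    full_repeats, extra_bases = divmod(repeat_region, unit_bp)
    return str(full_repeats) if extra_bases == 0 else f"{full_repeats}.{extra_bases}"


if __name__ == "__main__":
    FLANK = 100  # hypothetical combined flanking length for an invented locus
    for size in (140.2, 139.1, 156.0):
        print(size, "->", call_allele(size, FLANK))
```

A fragment one base short of a full repeat boundary is reported as "9.3" rather than rounded to 10, mirroring the microvariant naming used for TH01.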
PCR amplification of STR loci requires only picogram to nanogram quantities of template DNA (compared with 100-500 ng for RFLP). Commercial multiplex kits (GlobalFiler, Investigator 24plex QS, PowerPlex Fusion, Investigator IDplex Plus) co-amplify 15-30 STR loci and the sex-typing amelogenin marker in a single reaction. Within the reaction, each locus is identified by the size range of its amplicons and by the fluorescent dye attached to one primer of its pair. The resulting electropherogram shows a peak pattern in multiple colour channels, and the allele calls for each locus define the individual's profile.
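Once a profile is called, its rarity is usually estimated locus by locus and then multiplied. The sketch below shows the basic Hardy-Weinberg and product-rule calculation with hypothetical loci and allele frequencies; casework calculations normally add a population-substructure correction, which this sketch deliberately omits.

```python
from math import prod


def genotype_frequency(a1: str, a2: str, freqs: dict[str, float]) -> float:
    """Hardy-Weinberg genotype frequency: p^2 for homozygotes, 2pq for heterozygotes."""
    p = freqs[a1]
    return p * p if a1 == a2 else 2 * p * freqs[a2]


def random_match_probability(profile: dict[str, tuple[str, str]],
                             allele_freqs: dict[str, dict[str, float]]) -> float:
    """Product rule across loci that are assumed to be independent."""
    return prod(genotype_frequency(a1, a2, allele_freqs[locus])
                for locus, (a1, a2) in profile.items())


if __name__ == "__main__":
    # Hypothetical loci and allele frequencies, for illustration only.
    freqs = {
        "locus_1": {"12": 0.20, "14": 0.10},
        "locus_2": {"9.3": 0.30},
        "locus_3": {"17": 0.25, "18": 0.20},
    }
    profile = {"locus_1": ("12", "14"), "locus_2": ("9.3", "9.3"), "locus_3": ("17", "18")}
    rmp = random_match_probability(profile, freqs)
    print(f"RMP = {rmp:.2e} (about 1 in {round(1 / rmp):,})")
```

Three modestly informative loci already give a figure in the thousands; repeating the multiplication across 20 loci is what drives full-profile estimates to the very small values quoted for autosomal STR profiles.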
National and regional locus standards define which STR loci are typed for each database. The FBI originally defined 13 CODIS (Combined DNA Index System) core loci for the US National DNA Index System (NDIS), a standard that governed US casework from 1998 to 2017. The European Standard Set (ESS) was defined by ENFSI for EU databases, initially at 7 loci (ESS7), later expanded to 12 (ESS12), and aligned with the Prüm framework. In January 2017 the FBI expanded CODIS to 20 core loci, overlapping with the ESS and including additional loci to improve discriminating power and reduce adventitious matches. India's DNA Technology (Use and Application) Regulation Bill 2019 proposed a national database panel that includes the 20 CODIS loci and additional loci for compatibility with INTERPOL cross-border exchange. The overlap between national locus sets allows a profile generated in India using an international kit to be compared against NDIS or UK National DNA Database (NDNAD) profiles in a formal case-to-case request under bilateral mutual legal assistance treaty (MLAT) provisions.
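The overlap between locus sets is easy to check mechanically. The sketch below encodes the 13 original CODIS loci, the seven added in 2017, and the 12 ESS loci as commonly published. It does not model India's proposed panel or any particular kit, and the `comparable_loci` helper is a hypothetical illustration.

```python
# Core STR loci as commonly published: the original 13 CODIS loci plus the
# seven added in 2017, and the 12-locus European Standard Set.
CODIS_13 = {"CSF1PO", "FGA", "TH01", "TPOX", "vWA", "D3S1358", "D5S818",
            "D7S820", "D8S1179", "D13S317", "D16S539", "D18S51", "D21S11"}
CODIS_20 = CODIS_13 | {"D1S1656", "D2S441", "D2S1338", "D10S1248",
                       "D12S391", "D19S433", "D22S1045"}
ESS_12 = {"D3S1358", "vWA", "D8S1179", "D21S11", "D18S51", "TH01", "FGA",
          "D1S1656", "D2S441", "D10S1248", "D12S391", "D22S1045"}


def comparable_loci(profile_a: dict[str, tuple], profile_b: dict[str, tuple]) -> set[str]:
    """Loci typed in both profiles: the only loci a cross-database comparison can use."""
    return set(profile_a) & set(profile_b)


if __name__ == "__main__":
    print("ESS loci inside CODIS 20:", len(ESS_12 & CODIS_20))
    print("CODIS-only loci:", sorted(CODIS_20 - ESS_12))
    print("ESS loci missing from the original CODIS 13:", sorted(ESS_12 - CODIS_13))
```

Every ESS locus falls inside the expanded CODIS set, while five of them were absent from the original 13. This is why the 2017 expansion made transatlantic comparisons far more informative.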
When a 20-locus STR profile still cannot be recovered because the DNA is too fragmented for any PCR amplicon over 100 base pairs, single-nucleotide polymorphism typing offers a path forward, at the cost of a different statistical framework.
Single-nucleotide polymorphisms are the most abundant class of variation in the human genome. The Human Genome Project and subsequent SNP discovery efforts have catalogued over 600 million SNPs across diverse human populations in the dbSNP and gnomAD databases. Across human populations, a common SNP occurs roughly once every 300 base pairs; this is denser than the roughly 1-in-1,000 differences between any two individuals because different pairs of people differ at different sites. SNPs are biallelic (two alleles per locus, the ancestral allele and the derived allele) in the vast majority of cases, so each SNP carries less information than a multi-allelic STR. To achieve the same discriminating power as a 20-locus STR profile, a forensic SNP panel needs approximately 50-100 carefully selected loci with high heterozygosity across major population groups.
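The 50-100 figure can be motivated with a rough calculation. The sketch assumes unlinked SNPs in Hardy-Weinberg proportions, all with the same minor allele frequency, and asks how many are needed to push the combined match probability down to 10⁻¹⁵. These are simplifying assumptions, not how real panels are designed.

```python
import math


def snp_match_probability(p: float) -> float:
    """Chance two unrelated people share a genotype at a biallelic SNP under
    Hardy-Weinberg proportions: (p^2)^2 + (2pq)^2 + (q^2)^2."""
    q = 1.0 - p
    return p**4 + (2 * p * q) ** 2 + q**4


def loci_needed(target: float, per_locus_match: float) -> int:
    """Smallest number of independent, equally informative loci whose
    combined match probability falls to the target or below."""
    return math.ceil(math.log(target) / math.log(per_locus_match))


if __name__ == "__main__":
    for p in (0.5, 0.3, 0.2):
        pm = snp_match_probability(p)
        print(f"minor allele freq {p}: match prob {pm:.3f}, "
              f"loci for 1e-15: {loci_needed(1e-15, pm)}")
```

Even in the ideal case (p = 0.5, per-locus match probability 0.375), about 36 SNPs are needed, and the count rises as minor alleles become rarer. Real panels also lose loci to population differences and linkage, which explains why published identification panels sit well above this minimum.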
The practical advantage of SNPs in forensic work is that each genotyping assay can be designed to work on very short DNA fragments, sometimes as short as 40-50 base pairs, shorter than most mini-STR amplicons. The SNPforID consortium (a European collaborative project) published a 52-SNP panel for forensic identification that produces informative results from degraded DNA samples where STR typing fails. The ForenSeq DNA Signature Prep kit (Verogen) and similar massively parallel sequencing (MPS) platforms now combine STR typing with identity, ancestry-informative, and phenotype-informative SNPs in a single run, producing simultaneous identification, ancestry, and phenotype data.
Three distinct classes of forensic SNP panel serve different purposes. Identification SNPs are selected for high heterozygosity across all populations and are used in the same way as STRs, to link an unknown sample to a known individual, but with a much smaller amplicon requirement. Ancestry-informative SNPs (AISNPs) differentiate between broad continental population groups (for example African, European, East Asian, South Asian, and Native American, with admixed individuals giving intermediate estimates) and provide the biogeographic origin estimate needed when there is no suspect or database hit. Externally-visible characteristic (EVC) SNPs, most extensively developed in the HIrisPlex-S system by Manfred Kayser's group at Erasmus MC, predict eye colour, hair colour, and skin colour from genomic DNA with validated accuracy statistics. EVC prediction has been accepted as investigative intelligence (not courtroom identification evidence) in a growing number of jurisdictions, including the Netherlands (which legislated for it in 2003), the UK (where the FSR has published specific guidance), and some US state cases. India does not yet have a formal legal framework for EVC prediction evidence.
A forensic examiner reading a kinship case does not need to derive Mendel's laws from scratch, but they do need to know which laws apply to which marker, because getting the inheritance pattern wrong changes the statistical calculation completely.
The four forensic DNA marker classes follow four different inheritance pathways, and the choice of marker in a kinship or lineage case depends on which pathway is informative for the specific question being asked.
Autosomal STR loci follow Mendelian biparental inheritance. Each person carries two alleles at each autosomal locus, one inherited from the mother and one from the father. The alleles segregate independently of alleles at other autosomal loci (Mendel's law of independent assortment holds for loci on different chromosomes; loci on the same chromosome may show linkage). In a paternity case, a child's allele at each locus must have come from either the mother or the father. If the mother's genotype is known, the child's other allele (the obligate paternal allele) must be present in the alleged father's genotype, barring mutation. The paternity index (PI) at each locus is the probability of observing the child's paternal allele if the alleged father is the true father, divided by its probability if an unrelated man from the relevant population is the true father; the combined PI is the product of the per-locus values across independent loci. This likelihood-ratio logic goes back to Erik Essen-Möller's work on the probability of paternity in the 1930s, underpins the paternity-testing recommendations of the International Society for Forensic Genetics (ISFG), and is implemented in widely used kinship software such as Familias and DNA-VIEW.
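For the simplest trio case (mother typed, obligate paternal allele unambiguous, no mutation), the per-locus PI reduces to X/Y. Here X is the chance that the alleged father transmits the obligate allele (1 if homozygous, 0.5 if heterozygous, 0 if absent), and Y is that allele's population frequency. The sketch below implements only this case, with hypothetical allele frequencies. When mother and child share the same heterozygous genotype it raises an error instead of applying the fuller formula.

```python
from math import prod


def obligate_paternal_allele(child: tuple[str, str], mother: tuple[str, str]) -> str:
    """Identify the allele the child must have received from the father."""
    if not any(a in mother for a in child):
        raise ValueError("child shares no allele with the mother at this locus")
    non_maternal = [a for a in child if a not in mother]
    if len(non_maternal) == 1:
        return non_maternal[0]
    if child[0] == child[1]:
        return child[0]  # homozygous child: one copy came from each parent
    raise ValueError("ambiguous: child and mother have the same heterozygous genotype")


def locus_pi(child, mother, alleged_father, freqs) -> float:
    """PI = X / Y: chance the alleged father transmits the obligate allele
    (1, 0.5 or 0) divided by its frequency in the reference population."""
    allele = obligate_paternal_allele(child, mother)
    x = alleged_father.count(allele) / 2
    return x / freqs[allele]


def combined_pi(loci: list[tuple]) -> float:
    """Combined PI: product of per-locus PIs over independent loci."""
    return prod(locus_pi(*locus) for locus in loci)


if __name__ == "__main__":
    cases = [
        # (child, mother, alleged father, hypothetical allele frequencies)
        (("12", "14"), ("12", "16"), ("14", "15"), {"14": 0.10}),
        (("9", "9"), ("9", "10"), ("9", "9"), {"9": 0.25}),
    ]
    print("CPI =", combined_pi(cases))
```

A PI of 0 at any locus signals an apparent exclusion; in practice laboratories consider mutation before treating a single inconsistent locus as conclusive.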
The Y chromosome is inherited paternally and without recombination (except at the pseudoautosomal regions, PAR1 and PAR2, at the tips of the chromosome). All males in a direct paternal lineage carry the same Y-STR haplotype unless a de novo mutation has occurred. This makes Y-STR typing valuable for: establishing paternal kinship when a male donor has no reference sample but paternal relatives do; separating male contributors from female-dominated mixtures in sexual-assault casework; and following paternal lineage in historical and genealogical questions. The discriminating limitation is that Y-STR cannot distinguish between a man and his father, son, brother, or paternal uncle.
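Because Y-STR loci do not recombine, they are inherited as one block, and the product rule cannot be applied across them; the whole haplotype is treated as a single allele. A commonly used approach counts how often the haplotype appears in a reference database and reports an upper confidence bound. The sketch below computes a one-sided Clopper-Pearson bound by bisection, with a hypothetical database size. For zero observations it reduces to 1 - α^(1/n).

```python
from math import exp, lgamma, log


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for k in range(x + 1):
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log(1.0 - p))
        total += exp(log_pmf)
    return total


def haplotype_frequency_upper_bound(x: int, n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) Clopper-Pearson upper bound on a haplotype's
    frequency after seeing it x times among n database haplotypes."""
    if not 0 <= x < n:
        raise ValueError("need 0 <= x < n")
    lo, hi = x / n, 1.0
    for _ in range(200):  # bisection: binom_cdf falls as p rises
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    n_db = 5000  # hypothetical database size
    for seen in (0, 1, 3):
        print(seen, "observations: count estimate", seen / n_db,
              "upper bound", round(haplotype_frequency_upper_bound(seen, n_db), 6))
```

Even an unobserved haplotype gets a non-zero upper bound, which keeps reports conservative. The weight of a Y-STR match is therefore limited by database size, unlike the per-locus multiplication used for autosomal profiles.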
X-STR markers follow sex-linked inheritance. Females carry two X chromosomes (one from each parent); males carry one X (from their mother) and one Y (from their father). X-STR markers are useful in specific kinship scenarios where autosomal STR results are ambiguous: father-daughter testing (where the father's single X-linked allele at every locus must be present in the daughter), paternal half-sisters (who must share their father's X chromosome), and paternal grandmother-granddaughter relationships. Because linked X-STR markers can be inherited together as a block (an X-haplotype), such blocks can be traced through a common female ancestor across multiple generations in a way not possible with autosomal STRs. The Investigator Argus X-12 kit and the Mentype Argus X-8 kit are commercial multiplex systems for X-STR profiling, used in DVI and complex kinship casework in Germany (BKA), the UK, the Netherlands, and increasingly in Indian CFSL practice.
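The father-daughter rule is a simple containment check: a father is hemizygous for X, so he has one allele per locus and must pass it to every daughter. The sketch below flags loci that break this rule. Locus names and allele calls are hypothetical, and a single flagged locus could reflect mutation rather than non-paternity.

```python
def father_daughter_inconsistencies(father: dict[str, str],
                                    daughter: dict[str, tuple[str, str]]) -> list[str]:
    """Return the X-STR loci at which the father's single (hemizygous) allele is
    absent from the daughter's two alleles. An empty list means no exclusion."""
    return [locus for locus, allele in father.items()
            if locus in daughter and allele not in daughter[locus]]


if __name__ == "__main__":
    # Hypothetical X-STR loci and allele calls, for illustration only.
    father = {"X_locus_1": "14", "X_locus_2": "21", "X_locus_3": "9"}
    daughter = {"X_locus_1": ("14", "15"), "X_locus_2": ("20", "22"), "X_locus_3": ("9", "9")}
    print(father_daughter_inconsistencies(father, daughter))  # ['X_locus_2']
```

The check runs one way only: the daughter's second allele at each locus comes from her mother, so alleles she carries that are absent in the father are expected and are not counted.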
Mitochondrial DNA follows strict maternal inheritance, as described in the preceding topic. All children of a mother share her mtDNA haplotype, and this haplotype can be traced backward through any number of maternal generations. In a missing-persons case, any living relative in the same maternal line (mother, sibling, maternal half-sibling, maternal aunt or uncle, or a cousin through the mother's sister) can serve as a reference for comparison. In the 2004 Indian Ocean tsunami, which killed approximately 230,000 people across 14 countries including India, Sri Lanka, Thailand, and Indonesia, mtDNA typing was used extensively for victim identification when nuclear STR profiles could not be recovered from degraded remains. INTERPOL's DVI protocols for mass disaster deployments specifically include mtDNA as a marker pathway when nuclear DNA is exhausted.
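A maternal-lineage comparison of this kind reduces to counting the positions at which two haplotypes differ. The sketch below records each haplotype as its differences from the reference sequence (position mapped to base). It then applies an assumed, simplified three-way rule of the kind used in many laboratory guidelines: 0 differences means cannot exclude, 1 is inconclusive, and 2 or more is an exclusion. Real interpretation also handles heteroplasmy and length variants, and compares only regions sequenced in both samples. The variant lists are hypothetical.

```python
def mtdna_differences(a: dict[str, str], b: dict[str, str]) -> set[str]:
    """Positions at which two haplotypes differ. Each haplotype lists only its
    differences from the reference sequence (position -> base); a position
    missing from one haplotype is taken to carry the reference base."""
    return {pos for pos in a.keys() | b.keys() if a.get(pos) != b.get(pos)}


def compare_mtdna(questioned: dict[str, str], reference: dict[str, str]) -> str:
    """Simplified three-way rule (assumed thresholds): 0 differences -> cannot
    exclude, 1 -> inconclusive, 2 or more -> exclusion."""
    n = len(mtdna_differences(questioned, reference))
    if n == 0:
        return "cannot exclude"
    return "inconclusive" if n == 1 else "exclusion"


if __name__ == "__main__":
    # Hypothetical haplotypes written as differences from the reference sequence.
    remains = {"73": "G", "263": "G", "16519": "C"}
    maternal_aunt = {"73": "G", "263": "G", "16519": "C"}
    possible_relative = {"73": "G", "263": "G"}
    unrelated = {"263": "G", "16223": "T"}
    for name, ref in [("aunt", maternal_aunt), ("possible relative", possible_relative),
                      ("unrelated", unrelated)]:
        print(name, "->", compare_mtdna(remains, ref))
```

A shared haplotype means only "cannot exclude": every relative in the same maternal line carries it too, so its weight depends on how rare the haplotype is in the population.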
| Marker class | Inheritance | Discriminating power | Distinguishes from paternal relatives? | Distinguishes from maternal relatives? |
|---|---|---|---|---|
| Autosomal STR | Biparental (Mendelian) | Very high (RMP < 10⁻¹⁵ at 20 loci) | Yes (unique to individual, except identical twins) | Yes (unique to individual, except identical twins) |
| Y-STR | Paternal (non-recombining) | Moderate (population haplotype frequency) | No (shared with all paternal-line males) | Yes |
| X-STR | Sex-linked | Moderate to high (sex-specific) | Partial (useful for father-daughter, specific pedigrees) | Partial (X-blocks shared with maternal female lineage) |
| mtDNA | Maternal (uniparental) | Low (maternal haplotype frequency) | Yes | No (shared with all maternal-line relatives) |
| SNP (identification) | Biparental (Mendelian) | High at 50–100 loci | Yes | Yes |
Once the numbers produced by these marker classes enter a legal proceeding, the question becomes whether the court in each jurisdiction will hear them, and under what conditions.
Forensic DNA marker evidence has now been admitted in virtually every developed legal system, but the gatekeeping rules and reporting standards differ in ways that matter for a practitioner working across jurisdictions.
In the US, the Daubert standard (Daubert v. Merrell Dow Pharmaceuticals, 1993; Kumho Tire v. Carmichael, 1999), applied in federal courts and many state courts, requires trial courts to examine whether a scientific method is testable, has been peer-reviewed, has a known error rate, and is generally accepted. SWGDAM (Scientific Working Group on DNA Analysis Methods, the successor to TWGDAM) publishes interpretation guidelines that US courts treat as the relevant professional consensus. The expansion of CODIS from 13 to 20 loci in 2017 was specifically designed to reduce the probability of adventitious matches in the database as NDIS grew past 10 million profiles. No US federal case has successfully excluded STR typing on scientific grounds since the mid-1990s.
In the UK, courts in England and Wales assess expert evidence under a common-law test that draws on the South Australian case R v. Bonython (1984). For DNA evidence specifically, R v. Doheny and Adams (1996) established that the forensic scientist should explain the random occurrence ratio and should not offer an opinion on the likelihood that the defendant was the source of the crime stain. The Forensic Science Regulator (FSR) publishes a Code of Practice that accredited providers are required to follow, and the literature in the journal Forensic Science International: Genetics underpins most UK court testimony. The Forensic Science Regulator Act 2021 put the FSR on a statutory footing in England and Wales, with powers to set mandatory standards.
In India, Section 39 of the Bharatiya Sakshya Adhiniyam 2023 (BSA) governs expert opinion evidence (formerly Section 45 of the Indian Evidence Act 1872). The DNA Technology (Use and Application) Regulation Bill 2019, which was withdrawn from the Lok Sabha in 2023, would have created a DNA Regulatory Board, specified accreditation requirements for DNA testing laboratories, defined a national database governed by informed-consent principles, and set conditions for the use of DNA profiles in civil and criminal proceedings. In the absence of dedicated legislation, quality assurance for DNA evidence relies on NABL laboratory accreditation, CFSL internal SOPs, and judicial discretion on the admissibility of expert testimony.
In the EU, the Prüm framework (the 2005 Prüm Convention, incorporated into EU law by Council Decision 2008/615/JHA) requires that profiles exchanged through the network meet minimum standards compatible with the European Standard Set loci. The DNA Working Group of ENFSI (the European Network of Forensic Science Institutes) publishes guidelines, including guidance on the validation and implementation of forensic DNA typing methods, which laboratories across the EU adopt as the basis for their internal validation and quality documents.