Statistical kinship at the bench: the trio paternity index, the duo (motherless) case, sibling and grandparental indices, kinship LR software (Familias, DNA-VIEW, FaSTaR, EasyDNA), and the missing-persons reference-sample design that anchors national missing-persons databases like NamUs (US), the UK Missing Persons Unit, and India's Track Child platform.
A forensic DNA laboratory receives two categories of kinship cases that share the same statistical engine but sit at opposite ends of the emotional and legal spectrum. In paternity testing, the question is whether a named man is the biological father of a named child. In missing-persons casework, the question is whether human remains, or a living person who cannot otherwise be identified, are the relative of a family whose member disappeared months or years earlier. Both problems reduce to the same mathematical operation: computing a likelihood ratio that compares the probability of the observed genotype data under a proposed biological relationship against the probability under the null hypothesis that the two parties are unrelated.
The trio paternity case, in which mother, child, and alleged father are all genotyped, is the simplest kinship problem and the one that established the field. Its extension to the duo or motherless case, where only child and alleged father are tested, is arithmetically more demanding and more susceptible to error when allele frequencies are imprecise or the population is endogamous. Beyond paternity, sibling identification, grandparental analysis, half-sibling discrimination, and the reconstruction of complex pedigrees from a set of reference samples collected during a disaster or a missing-persons investigation all extend the same LR framework into territory where software is no longer optional.
Three national programs on three continents accept DNA reference samples from families for large-scale missing-persons identification: the US National Missing and Unidentified Persons System (NamUs), the UK Missing Persons Unit coordinated through the National Crime Agency, and India's Track Child portal managed by the Ministry of Women and Child Development in partnership with NCRB. Each program has different DNA-intake protocols, different laboratory networks, and different thresholds for reporting a candidate match. Across all three, the quality of the reference-sample design (which relatives are sampled, which loci are typed, and which population database supplies the allele frequencies) determines whether a match is reportable or inconclusive.
*The paternity index number on a report is not a probability; it is a likelihood ratio, and the distinction matters enormously in court.*
The paternity index (PI) for a single STR locus is the probability of the child's paternal allele given that the alleged father (AF) is the biological father, divided by the probability of that same allele given that a random unrelated man from the relevant population is the biological father. When the mother is tested, her contribution to the child's genotype is identified first; what remains is the obligate paternal allele, the one allele the child must have received from the father. If the AF carries that allele, the numerator is his probability of transmitting it: 1 if he is homozygous for it, 0.5 if he is heterozygous. The denominator is the allele's frequency in the reference population. If the AF does not carry the obligate paternal allele, the numerator is 0 and the locus is inconsistent with paternity; because a single inconsistency can also arise from mutation, laboratories generally require inconsistencies at several loci before reporting an exclusion.
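The single-locus PI described above can be sketched as a ratio X / Y of two Mendelian probabilities: X is the probability of the child's genotype if the AF is the father, Y the same probability if a random man is. This is a minimal sketch, assuming Hardy-Weinberg proportions, no mutation, no silent alleles, and no population-substructure correction; the function names and allele frequencies are hypothetical, not any laboratory's software. Computing X and Y directly also covers the ambiguous case in which mother and child share the same heterozygous genotype, so either allele could be paternal.

```python
from itertools import product

def transmission(genotype):
    """Probability that a parent passes on each allele (1/2 per allele copy)."""
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs

def same_genotype(x, y):
    """Unordered comparison of two-allele genotypes."""
    return sorted(x) == sorted(y)

def trio_pi(mother, child, alleged_father, freqs):
    """Single-locus paternity index X / Y for a mother-child-AF trio."""
    m_tx = transmission(mother)
    af_tx = transmission(alleged_father)
    # X: probability of the child's genotype if the AF is the father
    x = sum(pm * pf
            for (m, pm), (f, pf) in product(m_tx.items(), af_tx.items())
            if same_genotype((m, f), child))
    # Y: probability of the child's genotype if a random man is the father,
    # i.e. the paternal allele is drawn from the population frequencies
    y = sum(pm * pf
            for (m, pm), (f, pf) in product(m_tx.items(), freqs.items())
            if same_genotype((m, f), child))
    if y == 0:
        raise ValueError("child's genotype is inconsistent with the mother")
    return x / y

# Hypothetical allele frequencies at one STR locus
freqs = {"P": 0.1, "Q": 0.2, "R": 0.3, "S": 0.4}
print(trio_pi(("Q", "R"), ("P", "Q"), ("P", "S"), freqs))  # heterozygous AF: 1/(2p)
print(trio_pi(("Q", "R"), ("P", "Q"), ("P", "P"), freqs))  # homozygous AF: 1/p
print(trio_pi(("Q", "R"), ("P", "Q"), ("R", "S"), freqs))  # AF lacks P: 0
print(trio_pi(("P", "Q"), ("P", "Q"), ("P", "R"), freqs))  # ambiguous: 1/(2(p+q))
```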
The combined paternity index (CPI) is the product of the individual-locus PIs across all typed loci, which treats the loci as independently inherited. With the 20 CODIS core loci or a comparable ESS-based panel and typical European or South Asian allele frequencies, a CPI of tens of millions is routine for a true father-child pair. The Probability of Paternity (POP, often written W) is derived from the CPI using Bayes' theorem with a prior of 0.5 (equal prior for paternity and non-paternity), which gives W = CPI / (CPI + 1); this is what most commercial labs report as a percentage on the final certificate. A 99.9999% Probability of Paternity therefore corresponds to a CPI of 999,999.
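The arithmetic linking per-locus PIs, the CPI and the reported probability can be sketched as follows. This is a minimal illustration with hypothetical per-locus values; the `prior` parameter makes explicit that W, unlike the CPI, depends on an assumed prior, which is why the PI itself is a likelihood ratio and not a probability.

```python
import math

def combined_pi(locus_pis):
    """CPI: product of single-locus paternity indices (assumes independent loci)."""
    return math.prod(locus_pis)

def probability_of_paternity(cpi, prior=0.5):
    """Posterior probability of paternity W via Bayes' theorem.

    posterior odds = CPI * prior odds, and W = odds / (1 + odds).
    With prior = 0.5 this reduces to W = CPI / (CPI + 1).
    """
    odds = cpi * prior / (1.0 - prior)
    return odds / (1.0 + odds)

# Hypothetical per-locus PIs for a short illustrative panel
cpi = combined_pi([5.0, 10.0, 2.5, 4.0])
print(cpi, probability_of_paternity(cpi))
print(probability_of_paternity(999_999))               # about 0.999999, i.e. 99.9999%
print(probability_of_paternity(999_999, prior=0.1))    # same CPI, different prior, lower W
```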
AABB (formerly the American Association of Blood Banks, now the Association for the Advancement of Blood & Biotherapies, US) accreditation standards require a minimum CPI of 100 before any inclusion can be reported. ENFSI DNA WG recommendations in Europe suggest a probability of paternity above 99.99% as the reporting threshold for civil paternity matters. In India, the DNA Technology (Use and Application) Regulation Bill 2019 contemplates a national standard for paternity testing laboratories, though no final regulation was in force as of the time of writing; many Indian accredited labs currently follow AABB or ENFSI guidance voluntarily.
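Because one threshold above is stated as a CPI and the other as a probability, inverting the Bayes relation puts them on the same scale. A minimal sketch, assuming the conventional prior of 0.5: the AABB floor of CPI 100 corresponds to W = 100/101, about 99.0%, while a 99.99% threshold needs a CPI of about 9,999, roughly a hundredfold stricter. The function name is hypothetical.

```python
def cpi_for_probability(w, prior=0.5):
    """Smallest CPI that yields a posterior probability of paternity `w`.

    Inverts W = odds / (1 + odds) with odds = CPI * prior / (1 - prior).
    """
    posterior_odds = w / (1.0 - w)
    prior_odds = prior / (1.0 - prior)
    return posterior_odds / prior_odds

for label, w in (("99%", 0.99), ("99.99%", 0.9999)):
    print(f"W = {label}: CPI of at least {cpi_for_probability(w):,.0f}")
```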
The choice of allele-frequency population database has a material effect on the PI at each locus and thus on the CPI. For South Asian communities, the widely used databases (NIST STRBase US Caucasian and African-American, FSS UK Caucasian) may not accurately represent allele frequencies; when a case involves an Indian or South Asian family, use of a validated South Asian population database substantially reduces the uncertainty in the reported CPI.
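The size of the database effect can be seen in the simplest configuration, where at every locus the mother fixes the obligate paternal allele and the AF is heterozygous for it, so each locus contributes PI = 0.5 / p. A minimal sketch with two hypothetical frequency tables for the same four alleles: the CPI scales inversely with the product of the frequencies used, so the ratio between the two CPIs is simply the product of the per-locus frequency ratios. Neither table is "correct" here; the point is that identical genotypes give a different CPI depending on the table chosen.

```python
import math

def cpi_het_af(paternal_allele_freqs):
    """CPI when each locus contributes PI = 0.5 / p (heterozygous AF,
    obligate paternal allele identified via the mother)."""
    return math.prod(0.5 / p for p in paternal_allele_freqs)

# Hypothetical frequencies of the same four obligate paternal alleles in two databases
db_general = [0.10, 0.20, 0.05, 0.15]
db_local = [0.05, 0.20, 0.10, 0.05]

cpi_general = cpi_het_af(db_general)
cpi_local = cpi_het_af(db_local)
print(f"CPI with general database: {cpi_general:,.0f}")
print(f"CPI with local database:   {cpi_local:,.0f}")
print(f"ratio: {cpi_local / cpi_general:.2f}")  # product of the frequency ratios
```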
*Each step away from the trio configuration multiplies the statistical uncertainty by a factor the laboratory must explicitly account for.*
The duo (motherless) case arises when the mother is unavailable for testing, most commonly in post-mortem paternity disputes, immigration cases, or cases where the mother declines to participate. Without the mother's genotype, the obligate paternal allele cannot be identified, so each locus must be evaluated with either of the child's alleles potentially paternal and the maternal allele treated as a random draw from the population. Where the child is homozygous, the duo and trio PIs are the same; where the child is heterozygous and the mother's genotype would have fixed the paternal allele, the duo PI is often roughly half the trio value, and these factors compound across 20 loci. Duo testing also loses exclusion power, because a man who shares either of the child's alleles cannot be excluded at that locus. Courts in Germany, Australia, and the United States have all addressed when a duo result alone is sufficient for a civil paternity declaration; the threshold is typically a CPI above 1,000 in the duo configuration.
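The duo calculation can be sketched by averaging over which of the AF's alleles he transmits and letting a random mother supply the other child allele. A minimal sketch, assuming Hardy-Weinberg proportions, no mutation and no substructure correction, with hypothetical frequencies; it reproduces the familiar duo results 1/(4p) for child PQ with AF PR (versus 1/(2p) in the trio) and 1/(2p) for a homozygous PP child, where duo and trio agree.

```python
def transmission(genotype):
    """Probability that a parent passes on each allele (1/2 per allele copy)."""
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs

def duo_pi(child, alleged_father, freqs):
    """Single-locus motherless PI: the maternal allele is a random population draw."""
    a, b = child

    def maternal_completion(f):
        # probability a random mother supplies the allele that completes the child
        if f == a:
            return freqs[b]
        if f == b:
            return freqs[a]
        return 0.0

    x = sum(pf * maternal_completion(f)
            for f, pf in transmission(alleged_father).items())
    # Y: Hardy-Weinberg probability of the child's genotype
    y = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
    return x / y

freqs = {"P": 0.1, "Q": 0.2, "R": 0.3}  # hypothetical
# Child PQ, AF PR: duo PI = 1/(4p); with a QR mother the trio PI would be 1/(2p)
print(duo_pi(("P", "Q"), ("P", "R"), freqs))
# Child PP, AF PR: duo PI = 1/(2p), the same as the trio value
print(duo_pi(("P", "P"), ("P", "R"), freqs))
```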
Sibling indices quantify how much more likely two people are to be full siblings rather than unrelated. A full-sibling likelihood ratio (SLR) is calculated locus by locus from the probabilities that the pair share zero, one, or two alleles identical by descent (IBD) from their parents. Unlike the paternity PI, the SLR does not exclude with certainty when alleles differ: full siblings share on average half their alleles IBD, and at roughly one locus in four they share none, so mismatches at some loci are expected. SLRs for full-sibling pairs typically fall between 100 and 10,000 for a 20-locus profile; an SLR below 1 favours non-sibship. The half-sibling configuration (sharing one parent) produces an SLR roughly midway between the unrelated and full-sibling values, and distinguishing full from half-sibling pairs is a genuinely difficult problem when populations are endogamous and shared alleles are frequent.
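The sibling calculation can be sketched with the standard IBD-coefficient formulation of a pairwise kinship LR: LR = k0 + k1·R1 + k2·R2, where (k0, k1, k2) are the probabilities of sharing 0, 1 or 2 alleles IBD and R1, R2 are the genotype-pair probabilities given 1 or 2 IBD alleles divided by the probability if unrelated. A minimal sketch, assuming unlinked loci, Hardy-Weinberg proportions, no mutation and no substructure correction, with hypothetical frequencies. It shows why a full-sib mismatch does not exclude (the LR falls to k0 = 0.25, not 0), and that half-siblings share their coefficients (1/2, 1/2, 0) with grandparent-grandchild and uncle-nephew pairs.

```python
# (k0, k1, k2): probabilities that the pair share 0, 1 or 2 alleles IBD
FULL_SIBS = (0.25, 0.5, 0.25)
HALF_SIBS = (0.5, 0.5, 0.0)    # same as grandparent-grandchild, uncle-nephew
PARENT_CHILD = (0.0, 1.0, 0.0)

def ibd_ratios(g1, g2, p):
    """(R1, R2): pair-genotype probability given 1 or 2 IBD alleles / unrelated."""
    a, b = sorted(g1)
    c, d = sorted(g2)
    shared = {a, b} & {c, d}
    if not shared:
        return 0.0, 0.0
    if a == b and c == d:                  # AA, AA
        return 1 / p[a], 1 / p[a] ** 2
    if a == b or c == d:                   # AA, AB in either order
        hom = a if a == b else c
        return 1 / (2 * p[hom]), 0.0
    if {a, b} == {c, d}:                   # AB, AB
        return (p[a] + p[b]) / (4 * p[a] * p[b]), 1 / (2 * p[a] * p[b])
    (x,) = shared                          # AB, AC
    return 1 / (4 * p[x]), 0.0

def kinship_lr(g1, g2, p, k):
    """Single-locus LR for relationship k versus unrelated."""
    k0, k1, k2 = k
    r1, r2 = ibd_ratios(g1, g2, p)
    return k0 + k1 * r1 + k2 * r2

freqs = {"A": 0.1, "B": 0.1, "C": 0.2, "D": 0.3}  # hypothetical
for name, k in (("full sibs", FULL_SIBS), ("half sibs", HALF_SIBS)):
    both = kinship_lr(("A", "B"), ("A", "B"), freqs, k)   # share both alleles
    none = kinship_lr(("A", "B"), ("C", "D"), freqs, k)   # share none: LR = k0
    print(f"{name}: share both -> {both:.2f}, share none -> {none:.2f}")
```

With k = PARENT_CHILD the same function returns the duo PI, which is a useful consistency check.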
Grandparental kinship (comparing a child to one or both grandparents when the alleged parent is deceased or unavailable) is the configuration most frequently encountered in inheritance disputes, immigration family reunification cases, and post-mortem paternity after a soldier's or accident victim's death. The grandparental LR is lower than the trio PI for the same allele set because only 25% of the child's genome is expected to derive from each grandparent. Typing additional relatives when available (aunts, uncles, the other grandparent) substantially increases the grandparental LR, and the software tools discussed in Section 3 handle these extended pedigrees.
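The grandparental case can be sketched by asking how likely the absent father is to transmit the obligate paternal allele (identified via the mother) when only his parents are typed. A minimal sketch, assuming the mother is typed and fixes the paternal allele, Hardy-Weinberg proportions, no mutation and no substructure correction; the function names and the allele frequency are hypothetical. With one grandparent typed, the index for a heterozygous carrier grandparent is 0.5 + 1/(4p), below the trio value 1/(2p) whenever p < 0.5, and a non-carrier grandparent gives 0.5 rather than an exclusion. Typing both grandparents restores the possibility of exclusion.

```python
def transmit_prob(genotype, allele):
    """Probability that a person passes `allele` to a child (0, 0.5 or 1)."""
    return genotype.count(allele) / 2

def trio_pi(af, paternal_allele, p):
    """Trio PI once the mother has identified the obligate paternal allele."""
    return transmit_prob(af, paternal_allele) / p

def grandparent_index(gp, paternal_allele, p):
    """One paternal grandparent typed. The absent father carries one allele from
    this grandparent and one random allele, and passes on either with 1/2."""
    father_passes = 0.5 * transmit_prob(gp, paternal_allele) + 0.5 * p
    return father_passes / p

def grandparents_index(gf, gm, paternal_allele, p):
    """Both paternal grandparents typed: the father's two alleles come from them."""
    father_passes = (0.5 * transmit_prob(gf, paternal_allele)
                     + 0.5 * transmit_prob(gm, paternal_allele))
    return father_passes / p

p = 0.05  # hypothetical frequency of the obligate paternal allele "P"
print(trio_pi(("P", "R"), "P", p))                           # father available
print(grandparent_index(("P", "S"), "P", p))                 # one carrier grandparent
print(grandparent_index(("S", "T"), "P", p))                 # non-carrier: 0.5, not 0
print(grandparents_index(("P", "S"), ("P", "T"), "P", p))    # two carriers
print(grandparents_index(("S", "T"), ("U", "V"), "P", p))    # neither carries: 0
```

Per locus, adding the second grandparent can raise or lower the index depending on whether that grandparent carries the allele; the gain described in the text is in expectation across many loci.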
*Manual kinship LR calculation works for a trio; for anything more complex, validated software is the only defensible approach.*
Four software platforms dominate operational kinship LR calculation in forensic and immigration testing contexts worldwide.
Familias (developed by Thore Egeland and colleagues at the Norwegian Institute of Public Health and the Oslo University Hospital, freely available) is the most widely used academic and operational kinship platform outside North America. It accepts arbitrary pedigree structures, handles up to thousands of loci simultaneously, and supports a Monte Carlo simulation module for validating the reported LR distribution under the proposed pedigree. Familias is the reference platform for ISFG (International Society for Forensic Genetics) kinship workshops and has been validated against the ENFSI DNA WG database. Swedish, Norwegian, Dutch, and Polish national forensic labs run it in production.
DNA-VIEW (Charles Brenner, Forensic Mathematics, Oakland) is the long-standing North American reference platform, first deployed in the early 1990s and used extensively for kinship calculations in immigration casework by the US Department of State and USCIS. DNA-VIEW handles complex pedigrees including those with inbreeding loops, uses Brenner's analytical PI formulations rather than Monte Carlo simulation, and outputs interpretable results that courts across the US and Australia have accepted for decades.
FaSTaR Kinship (a more recent platform developed for high-throughput DVI kinship matching, deployed in the INTERPOL sphere through the European STADNAP and IDENTIFYING DVI projects) focuses specifically on large-scale reference-sample databases where thousands of family-victim pairs must be matched simultaneously. It uses a likelihood-ratio matrix computed across the entire pedigree database and flags candidate pairs above a specified threshold for expert review.
EasyDNA (the software tier of the commercial EasyDNA laboratory group, with operations in the UK, Australia, US, Canada, India, and South Africa) is the platform most frequently encountered in civil paternity and immigration testing outside the DVI context. It is accredited under ISO 17025 in its major markets and uses population databases specific to the test population (South Asian, Afro-Caribbean, etc.) for each jurisdiction's casework.
*Which relatives you choose to swab on day one determines whether a match will be reportable when remains surface two years later.*
A missing-persons DNA case begins with a reference-sample strategy before any laboratory work is done. The optimal reference for identifying unknown remains is a direct reference sample from the missing person themselves (a stored buccal swab, a biological sample from a personal effect). When that is unavailable, the laboratory must rely on kinship references from relatives, and the kinship LR it can ultimately report depends entirely on the combination and degree of relatedness of those relatives.
The hierarchy of kinship reference value, holding the number of relatives constant, is approximately:
In practice, the reference-collection officer (a family liaison officer, a DVI officer, or a law-enforcement detective) often collects samples from whoever presents at the police station on day one. This is a critical juncture: swabbing a sibling when the missing person's parent is also available and willing is a significant missed opportunity that can reduce the eventual LR on the same remains by orders of magnitude, for example from 1,000,000 to 10,000.
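The cost of choosing a sibling over a parent as the reference can be illustrated with a small Monte Carlo simulation: generate 20-locus profile pairs under each true relationship, compute the pairwise LR against unrelatedness, and compare the average log10 LR. This is a toy sketch, not the simulation module of any named software; it assumes a single hypothetical frequency table reused at every locus, unlinked loci, no mutation and no substructure correction. Only the ordering of the two averages is the point; the absolute values depend entirely on the loci and frequencies chosen.

```python
import math
import random

PARENT_CHILD = (0.0, 1.0, 0.0)   # (k0, k1, k2) IBD coefficients
FULL_SIBS = (0.25, 0.5, 0.25)

# Hypothetical allele frequencies, reused at every locus for simplicity
FREQS = {"a": 0.20, "b": 0.15, "c": 0.12, "d": 0.11, "e": 0.10,
         "f": 0.09, "g": 0.08, "h": 0.07, "i": 0.05, "j": 0.03}

def ibd_ratios(g1, g2, p):
    """(R1, R2): pair-genotype probability given 1 or 2 IBD alleles / unrelated."""
    a, b = sorted(g1)
    c, d = sorted(g2)
    shared = {a, b} & {c, d}
    if not shared:
        return 0.0, 0.0
    if a == b and c == d:                  # AA, AA
        return 1 / p[a], 1 / p[a] ** 2
    if a == b or c == d:                   # AA, AB in either order
        hom = a if a == b else c
        return 1 / (2 * p[hom]), 0.0
    if {a, b} == {c, d}:                   # AB, AB
        return (p[a] + p[b]) / (4 * p[a] * p[b]), 1 / (2 * p[a] * p[b])
    (x,) = shared                          # AB, AC
    return 1 / (4 * p[x]), 0.0

def draw(rng):
    """One allele drawn from the population frequencies."""
    return rng.choices(list(FREQS), weights=list(FREQS.values()))[0]

def simulate_pair(rng, k):
    """One locus for a pair whose true relationship has IBD coefficients k."""
    z = rng.choices([0, 1, 2], weights=k)[0]      # number of alleles IBD
    g1 = (draw(rng), draw(rng))
    if z == 2:
        return g1, g1
    if z == 1:
        return g1, (rng.choice(g1), draw(rng))    # one allele copied, one random
    return g1, (draw(rng), draw(rng))

def mean_log10_lr(k, n_loci=20, n_sims=2000, seed=1):
    """Average log10 LR (true relationship k versus unrelated) over simulated profiles."""
    rng = random.Random(seed)
    k0, k1, k2 = k
    total = 0.0
    for _ in range(n_sims):
        for _ in range(n_loci):
            g1, g2 = simulate_pair(rng, k)
            r1, r2 = ibd_ratios(g1, g2, FREQS)
            total += math.log10(k0 + k1 * r1 + k2 * r2)
    return total / n_sims

parent_mean = mean_log10_lr(PARENT_CHILD)
sib_mean = mean_log10_lr(FULL_SIBS)
print(f"parent reference:  mean log10 LR = {parent_mean:.1f}")
print(f"sibling reference: mean log10 LR = {sib_mean:.1f}")
```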
NamUs (National Missing and Unidentified Persons System, US): Established in 2007 and fully nationalised under the Brittany Smith Act 2012, NamUs maintains two matched databases: the Unidentified Persons database (PM profiles from unidentified remains) and the Missing Persons database (AM reference profiles from biological relatives). Labs across the US submit profiles to the FBI via NDIS CODIS and cross-reference with NamUs simultaneously. As of 2023, NamUs had facilitated more than 23,000 identifications. The system accepts kinship reference samples from any first-degree relative and stores pedigree linkages for the automated kinship search.
UK Missing Persons Unit (UKMPU): Coordinated by the National Crime Agency, the UKMPU maintains the UK Missing Persons DNA database, accepting reference samples from police forces across England, Wales, Scotland, and Northern Ireland. Post-mortem profiles of unidentified remains submitted to the database are cross-matched against both direct references and kinship references. The Forensic Science Regulator's Code of Practice (now statutory under the Forensic Science Regulator Act 2021) requires accredited laboratories handling UKMPU casework to validate their kinship calculations against specific population databases (UK Caucasian, UK South Asian, Afro-Caribbean as applicable) and to report LR thresholds and uncertainties explicitly.
India's Track Child: Managed by the Ministry of Women and Child Development (MWCD) and integrated with the National Crime Records Bureau (NCRB), Track Child is a portal for reporting and tracking missing children. DNA intake into Track Child casework is handled through state forensic science laboratories and CFSL, though the integration between the portal's biographic records and DNA laboratory outputs is less automated than NamUs. The DNA Technology (Use and Application) Regulation Bill 2019 proposes a National DNA Data Bank with a Missing Persons index that would formalise this infrastructure. India's Child Welfare Committees and District Child Protection Units (constituted under the Juvenile Justice (Care and Protection of Children) Act, 2015) serve as the primary family-contact nodes for reference-sample collection.
*Consanguineous families present a specific statistical trap that standard paternity software was not designed to handle.*
Standard kinship LR software assumes that the two parties being compared are either related in the proposed manner or unrelated. In populations with significant consanguinity, common in parts of South Asia, the Middle East, North Africa, and historically isolated communities worldwide, unrelatedness is not a realistic null hypothesis, because a supposedly unrelated person may in fact share distant kinship with the child or the missing person through the endogamous community structure.
When consanguinity is a realistic possibility, the kinship LR denominator should not use the random-population allele frequency alone; it should incorporate the probability of allele sharing under the actual population inbreeding coefficient (F). All four major software platforms (Familias, DNA-VIEW, FaSTaR, EasyDNA) can incorporate a non-zero F value if it is estimated or provided, but in operational casework this correction is frequently omitted because F is difficult to estimate reliably for a specific family. The ISFG commission on missing-persons DNA identification recommends that labs servicing high-consanguinity populations run sensitivity analyses across a range of F values (typically 0.01 to 0.0625) and report the LR range, not a single point estimate.
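The recommended sensitivity analysis can be illustrated for one simple genotype configuration: a duo in which both child and AF are homozygous AA. This sketch treats the text's F as the Balding-Nichols coancestry coefficient θ (an FST-type parameter, which kinship programs commonly expose as a "theta" correction); F and θ are related but not identical, so that mapping is an assumption. It uses the Balding-Nichols sampling formula, under which already-observed copies of an allele make further copies more likely. At θ = 0 the PI reduces to the uncorrected 1/p; the allele frequency and function names are hypothetical.

```python
def bn_next_allele(n_a, n, p, theta):
    """Balding-Nichols probability that the next sampled allele is A, given that
    n_a of the n alleles already observed in the subpopulation are A."""
    return (n_a * theta + (1 - theta) * p) / (1 + (n - 1) * theta)

def duo_pi_homozygous(p, theta):
    """Duo PI when both the child and the AF are homozygous AA.

    H1 (AF is father): AF passes A with certainty; the maternal allele is A
    with probability bn_next_allele(2, 2, ...), given the AF's two A alleles.
    H2 (unrelated): both child alleles are drawn from the subpopulation,
    conditioning first on the AF's AA, then also on the child's first A.
    """
    numerator = bn_next_allele(2, 2, p, theta)
    denominator = bn_next_allele(2, 2, p, theta) * bn_next_allele(3, 3, p, theta)
    return numerator / denominator

p = 0.10  # hypothetical allele frequency
thetas = [0.0, 0.01, 0.03, 0.0625]
pis = [duo_pi_homozygous(p, t) for t in thetas]
for t, pi in zip(thetas, pis):
    print(f"theta = {t:<6}  PI = {pi:.2f}")
print(f"report as a range: {min(pis):.2f} to {max(pis):.2f}")
```

Evaluating the whole profile at each θ value in the chosen range, rather than at a single value, gives the LR range the recommendation calls for.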
A second complexity arises in pedigree reconstruction when the missing person's entire nuclear family has perished, leaving only cousins and extended relatives as references. This configuration, encountered in large-scale disasters where entire households are missing, requires dedicated software (M-FISys for DVI; Familias extended-pedigree mode) to compute an LR at all, because combining transmission probabilities through multiple meioses is impractical by hand.
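Why a single distant relative carries so little evidence can be seen from the pairwise IBD formulation: for relationships with k2 = 0 the single-locus LR is k0 + k1·R1, and k1 halves or quarters with each step outward in the pedigree. A minimal sketch, assuming genotypes AB and AC (one shared allele, hypothetical frequency p), unlinked loci, no mutation and no substructure correction. It compares single pairwise references only; extended-pedigree software combines several relatives jointly, which this sketch does not attempt.

```python
# (k0, k1, k2) IBD coefficients for relationships with k2 = 0
RELATIONSHIPS = {
    "parent-child": (0.0, 1.0, 0.0),
    "half-sib / grandparent / uncle-nephew": (0.5, 0.5, 0.0),
    "first cousins": (0.75, 0.25, 0.0),
    "second cousins": (15 / 16, 1 / 16, 0.0),
}

def lr_one_shared(p, k):
    """Pairwise LR for genotypes AB and AC (shared allele A at frequency p):
    k0 + k1 * R1 with R1 = 1 / (4p); k2 plays no role because it is 0."""
    k0, k1, _ = k
    return k0 + k1 / (4 * p)

p = 0.05  # hypothetical frequency of the shared allele
for name, k in RELATIONSHIPS.items():
    # k[0] is the LR at a locus where the pair share no allele
    print(f"{name:<38} shared: {lr_one_shared(p, k):5.2f}   none shared: {k[0]:.4f}")
```

Both columns drift towards 1 as the relationship becomes more distant, so each locus moves the LR very little in either direction.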