How forensic geologists decide whether two soil samples share a common source, using classical statistics and likelihood ratios, and why the reference population is the hardest part of that judgment.
A soil analyst holds two vials. One came from the suspect's boot. One came from the crime scene. The question is as simple as it is difficult: could these two samples have come from the same patch of ground? Getting from that question to a number that holds up in court takes statistics, and the statistics are only as good as the reference population you have built around them.
This is the core methodological challenge in forensic geology. Soil composition varies at the metre scale, which is its strength as forensic evidence. But that same variability means that deciding what counts as a meaningful match, and what counts as coincidence, requires a careful statistical framework. The field has moved from simple profile comparisons and visual colour matching toward multivariate analysis and likelihood ratios, but the reference population problem runs through all of it.
This topic works through the main statistical tools used in soil evidence: principal component analysis and linear discriminant analysis for making sense of multi-element geochemical data, Bayesian likelihood ratios for expressing the weight of a match, and the honest accounting of false-positive rates. It pays particular attention to the reference population question, because that is where most of the real scientific debate happens and where opposing experts tend to fight hardest.
The question sounds simple. The answer requires a population.
Forensic comparison always asks a two-part question: are these samples similar, and if so, is that similarity unusual or commonplace? The first part is chemistry and mineralogy. The second part is statistics, and it depends entirely on knowing how the sample compares to what else is out there.
Soil makes the second question genuinely hard. Geochemical composition can vary dramatically over a few metres where a soil boundary crosses a geological contact, a drainage channel, or a fill deposit. That variability gives soil its discriminating power: two samples taken from different locations can often be distinguished. But it also means the reference population that represents 'anywhere else' must be carefully defined and sampled. A poorly designed reference collection can either inflate or deflate the apparent rarity of a match, depending on whether it over-represents soils unlike the questioned sample or soils that resemble it.
Published studies have examined how misclassification rates change with reference-population design. Morgan and Pringle (2012) tested soil discrimination across a range of English landscapes and found that misclassification rates ranged from under 5% in geologically diverse areas to over 20% in geologically uniform ones. Pye and Blott have demonstrated similar variation across different analytical protocols. These are not theoretical concerns. They translate directly into the strength of a court opinion.
Thirty elemental concentrations per sample is too many to reason about directly.
A modern soil analysis by inductively coupled plasma mass spectrometry (ICP-MS) can measure thirty or more elements in a single sample. Each element is a dimension, and a case involving a questioned sample and a hundred reference samples is a cloud of points in thirty-dimensional space. No human can visualise that. PCA is the standard tool for collapsing it into something interpretable.
PCA computes new axes, called principal components, that are linear combinations of the original variables and are ordered by how much variance they capture. The first principal component might account for 45% of the total variation in a geochemical dataset, the second another 20%, and so on. By plotting samples along the first two or three components, an analyst can see whether the questioned sample clusters with the reference samples from the suspect's alleged location or sits apart.
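The projection described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available and that each element is standardised before decomposition; the function name `pca` and the simulated 40-sample, 6-element dataset are hypothetical and stand in for real ICP-MS results.

```python
import numpy as np

def pca(X, n_components=2):
    """Return (scores, explained_variance_ratio) for the rows of X."""
    X = np.asarray(X, dtype=float)
    # Standardise each element so high-concentration elements do not dominate
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardised matrix; rows of Vt are the principal axes
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Squared singular values are proportional to the variance on each axis
    explained = s**2 / np.sum(s**2)
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Hypothetical data: 40 samples x 6 elements, driven by two latent factors
latent = rng.normal(size=(40, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.1 * rng.normal(size=(40, 6))

scores, ratio = pca(X)
print("score matrix shape:", scores.shape)
print("explained variance ratio:", ratio.round(2))
```

Plotting the two columns of `scores` against each other gives the score plot an analyst would inspect for clustering. Whether to standardise or log-transform the concentrations first is a casework decision that this sketch does not settle.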
LDA goes one step further. Where PCA is unsupervised (it does not know which samples belong to which group), LDA is trained on labelled samples to find the axis that maximally separates known classes. If an analyst has well-characterised samples from several distinct source areas and wants to classify a questioned sample into one of them, LDA provides a principled way to make that assignment along with a posterior probability.
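As a rough illustration of that classification step, the sketch below computes LDA-style posterior probabilities with NumPy. It assumes a shared (pooled) covariance matrix and equal prior weight for every source area; the function name `lda_posteriors`, the area labels, and the simulated two-element data are all hypothetical.

```python
import numpy as np

def lda_posteriors(x, groups):
    """Posterior probability of each labelled source area for sample x."""
    labels = list(groups)
    arrays = [np.asarray(groups[k], dtype=float) for k in labels]
    means = [a.mean(axis=0) for a in arrays]
    n_total = sum(len(a) for a in arrays)
    # Pooled within-class covariance: the shared-covariance assumption of LDA
    pooled = sum((len(a) - 1) * np.cov(a, rowvar=False) for a in arrays)
    pooled = pooled / (n_total - len(arrays))
    x = np.asarray(x, dtype=float)
    # Log Gaussian density of x in each class, up to a constant shared by all
    log_scores = np.array(
        [-0.5 * (x - m) @ np.linalg.solve(pooled, x - m) for m in means]
    )
    # Normalise to probabilities (equal priors assumed)
    weights = np.exp(log_scores - log_scores.max())
    return dict(zip(labels, weights / weights.sum()))

rng = np.random.default_rng(1)
# Hypothetical reference samples from three source areas, two elements each
groups = {
    "area_A": rng.normal(loc=[0.0, 0.0], scale=1.0, size=(15, 2)),
    "area_B": rng.normal(loc=[4.0, 0.0], scale=1.0, size=(15, 2)),
    "area_C": rng.normal(loc=[0.0, 4.0], scale=1.0, size=(15, 2)),
}
post = lda_posteriors([4.0, 0.2], groups)
print({k: round(float(v), 3) for k, v in post.items()})
```

The posteriors always sum to one across the areas supplied, so a sample from a source outside the training set is still forced into one of the defined classes. That is a property of the method, and one reason the choice of reference areas matters.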
| Method | Supervised? | Output | Best use in casework |
|---|---|---|---|
| PCA | No | Score plot, explained variance | Visualising clustering, flagging outliers |
| LDA | Yes | Class assignment, posterior probability | Classifying questioned sample into defined source areas |
| Mahalanobis distance | No (uses group stats) | Distance from group centroid | Flagging whether a sample is within group scatter |
| Likelihood ratio (multivariate) | No | Numerical weight of evidence | Formal court-ready evaluation of the match strength |
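The Mahalanobis row in the table can be made concrete with a short sketch. It assumes NumPy and a reference group large enough for its covariance matrix to be invertible; the function name `mahalanobis` and the five two-element reference samples are hypothetical.

```python
import numpy as np

def mahalanobis(x, group):
    """Distance of sample x from the centroid of group, in units of group scatter."""
    group = np.asarray(group, dtype=float)
    mean = group.mean(axis=0)
    cov = np.cov(group, rowvar=False)
    diff = np.asarray(x, dtype=float) - mean
    # Solve cov @ y = diff rather than forming the inverse explicitly
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Hypothetical reference group: two positively correlated element concentrations
group = [[1, 2], [2, 3], [3, 5], [4, 4], [5, 6]]

# Two questioned samples at the same Euclidean distance from the centroid (3, 4)
d_on_trend = mahalanobis([5, 6], group)   # lies along the group's correlation
d_off_trend = mahalanobis([5, 2], group)  # lies across the group's correlation
print(round(d_on_trend, 2), round(d_off_trend, 2))
```

The off-trend sample comes out much farther from the group than the on-trend one, even though both are equally far from the centroid in raw concentration units. That is the point of using the group's own scatter as the yardstick.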
The LR separates the science from the legal decision.
The likelihood ratio is now the preferred reporting framework in forensic science across multiple disciplines, including soil comparison. Its appeal is that it makes explicit what question the science can answer and what question it cannot. The scientist answers: given the observed geochemical and mineralogical data, how much more probable is this degree of similarity under the hypothesis that both samples came from the same location than under the hypothesis that they came from different locations? The court, not the scientist, then weighs that number against the rest of the evidence.
Formally, the LR is the ratio of two probabilities. The numerator is the probability of the observed data if the prosecution's hypothesis is correct (same source). The denominator is the probability of the observed data if the defence hypothesis is correct (different source). The denominator requires sampling the reference population to estimate how often soils from unrelated locations look as similar as the two in question. This is exactly where the reference population design matters most.
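Written out, the ratio described above takes the following form. The symbols are notation assumed here for illustration, not taken from the source: $E$ is the observed data, $H_p$ the same-source hypothesis, and $H_d$ the different-source hypothesis. For continuous measurements such as elemental concentrations, the two terms are probability densities rather than probabilities.

```latex
\mathrm{LR} = \frac{P(E \mid H_p)}{P(E \mid H_d)}
```

A value above 1 supports the same-source hypothesis, a value below 1 supports the different-source hypothesis, and a value of 1 means the data do not discriminate between them.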
LR values in soil cases reported in the literature range widely, from modest values around 10-100 in geologically uniform areas to values exceeding 100,000 in cases where the questioned soil contains a highly distinctive mineralogical assemblage. The ENFSI verbal equivalence scale (discussed in the next topic) provides a way to communicate these magnitudes in plain language without misrepresenting the precision of the estimate.
The denominator is only as good as the sampling behind it.
There is no universal rule for how large a reference population must be or how it should be sampled. The answer depends on the case geography, the geological diversity of the area, and the analytical methods used. A case where the questioned site is in the middle of a geologically uniform glacial plain needs a different approach from one where the site sits on a narrow band of serpentinite surrounded by quite different geology.
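A toy calculation shows how strongly the denominator depends on the reference population. The sketch below is a deliberately simplified score-based LR for a single element, assuming zero-mean normal models for the difference between two measurements; it is not the multivariate method used in casework. The function names and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def rms(values):
    """Root-mean-square of differences, used as the spread of a zero-mean model."""
    return sqrt(sum(v * v for v in values) / len(values))

def score_lr(d, same_source_diffs, diff_source_diffs):
    """LR for an observed difference d under two zero-mean normal models."""
    numerator = NormalDist(0.0, rms(same_source_diffs)).pdf(d)
    denominator = NormalDist(0.0, rms(diff_source_diffs)).pdf(d)
    return numerator / denominator

# Hypothetical differences (mg/kg) between replicate samples from one location
same_source = [0.5, -0.8, 0.3, -0.4, 0.9, -0.6]
# Hypothetical differences between unrelated locations in two kinds of terrain
diverse_reference = [12.0, -9.5, 15.2, -7.8, 10.1, -13.4]
uniform_reference = [1.5, -2.1, 1.8, -1.2, 2.4, -1.9]

d = 0.4  # observed difference between questioned and scene samples
lr_diverse = score_lr(d, same_source, diverse_reference)
lr_uniform = score_lr(d, same_source, uniform_reference)
print(f"diverse reference: LR = {lr_diverse:.1f}")
print(f"uniform reference: LR = {lr_uniform:.1f}")
```

The same observed difference yields a smaller LR against the uniform reference set, because close agreement between unrelated locations is more common there. Choosing which reference set is the relevant one is the judgment the statistics cannot make for the analyst.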
The gap between what is ideal and what is achievable in casework is real. A forensic geologist may receive samples months after an incident, by which time seasonal changes have altered surface chemistry and the ability to sample the exact relevant area may be limited by access restrictions or development. The report must be honest about these limitations and what they mean for the precision of the LR estimate.
Error rates are empirical, not theoretical, and they vary by geology.
One of the Daubert factors for admissibility of expert scientific testimony is the technique's known or potential error rate. For soil evidence, this means false-positive rates (two samples from different sources classified as matching) and false-negative rates (two samples from the same source classified as non-matching) must be estimated from empirical testing, not asserted from first principles.
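Both rates follow directly from the outcomes of a known-source trial. The sketch below assumes each trial is recorded as a pair of booleans (ground truth, analyst's call); the function name `error_rates` and the trial counts are hypothetical and do not reproduce any published study.

```python
def error_rates(trials):
    """Return (false_positive_rate, false_negative_rate).

    trials is a list of (same_source, declared_match) boolean pairs.
    """
    n_diff = sum(1 for same, _ in trials if not same)
    n_same = sum(1 for same, _ in trials if same)
    # False positive: different sources, but declared a match
    false_pos = sum(1 for same, match in trials if not same and match)
    # False negative: same source, but declared a non-match
    false_neg = sum(1 for same, match in trials if same and not match)
    return false_pos / n_diff, false_neg / n_same

# Hypothetical blind trial: 20 different-source pairs and 10 same-source pairs
trials = (
    [(False, True)] * 2 + [(False, False)] * 18
    + [(True, True)] * 9 + [(True, False)] * 1
)
fpr, fnr = error_rates(trials)
print(f"false-positive rate: {fpr:.0%}, false-negative rate: {fnr:.0%}")
```

Each rate uses its own denominator: false positives are counted against different-source pairs only, and false negatives against same-source pairs only. With trial sets this small the estimates carry wide uncertainty, which a report should acknowledge.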
Published validation studies provide benchmarks. Pye and Blott conducted blind trials comparing colour, particle size, and geochemical methods across diverse English landscapes and reported correct classification rates of 80-95% depending on method and geology. Morgan and Pringle used a known-source soil set from English agricultural and urban environments and found that multi-element ICP-MS data with LDA gave the lowest false-positive rates, typically below 10% in geologically diverse regions.
The numbers are only useful if the court understands what they are measuring.
Statistical outputs from PCA, LDA, or LR calculations are not self-explanatory to a jury or a judge with no statistical training. The forensic geologist's job is to translate them honestly without either overstating the precision of the estimate or understating the weight of genuine evidence.
The LR framework helps because it separates the scientific assessment from the legal conclusion. The scientist says: the observed similarity is X times more probable if both samples came from the same location than if they came from different locations. The court then combines that with everything else it knows about the case. The scientist does not say 'these samples match' as though that settles the question, nor do they say 'it is probable that the suspect was at the scene,' which conflates the LR with a posterior probability.
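The difference between an LR and a posterior probability is easy to show numerically. The sketch below applies the odds form of Bayes' theorem, posterior odds = LR × prior odds. The function name, the LR of 100, and the prior values are hypothetical; in a real case the prior belongs to the court and is not supplied by the scientist.

```python
def posterior_probability(lr, prior_probability):
    """Combine a likelihood ratio with a prior probability via Bayes' theorem."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1 + posterior_odds)

lr = 100  # hypothetical weight of the soil evidence
for prior in (0.5, 0.01, 0.0001):
    post = posterior_probability(lr, prior)
    print(f"prior {prior}: posterior {post:.4f}")
```

The same LR leads to very different posterior probabilities depending on the other evidence in the case, which is why reporting the LR as though it were the probability of the hypothesis misstates what the science has shown.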