Common Distributions in Forensic Science

Forensic scientists routinely encounter data that follow recognisable probability distributions: particle counts, event frequencies, chemical concentrations, and measurement errors each have characteristic shapes. Choosing the wrong distributional model distorts every inference that follows, from likelihood ratios to population database estimates.

A probability distribution is a mathematical function that describes how likely different values of a variable are. In forensic science, the choice of distribution is not a technical formality: it is a substantive claim about how the data were generated. The four distributions most often encountered in forensic practice are the normal distribution, which describes symmetric measurement error; the binomial distribution, which models the number of successes in a fixed number of trials; the Poisson distribution, which models rare event counts; and the gamma distribution, which models right-skewed continuous quantities such as chemical concentrations. Each arises in specific forensic contexts, and fitting the wrong one to a data set produces incorrect probability statements, distorted likelihood ratios, and unreliable conclusions.

The consequences of a wrong distributional assumption range from mild to severe depending on how far the assumed model departs from the true data-generating process. A normal model applied to mildly skewed concentration data may give workable answers when the sample is large. The same normal model applied to count data that follow a Poisson distribution will produce nonsensical tail probabilities, because the normal assigns positive probability to negative values while count data cannot be negative. In courtroom settings, these errors propagate into the strength-of-evidence statements the expert communicates to the fact-finder.

Distributions are studied alongside basic probability rules and descriptive statistics, but their forensic application goes further than description. They underpin the population databases used to estimate allele frequencies, the reference intervals used to interpret chemical measurements, and the models used to calculate how many items in a seized consignment are likely to be of a particular type. Understanding where each distribution comes from, what its parameters mean, and what assumptions it requires allows the forensic practitioner to select models correctly and to scrutinise models others have applied.

By the end of this topic you will be able to:

  • Describe the shape, parameters, and key assumptions of the normal, binomial, Poisson, and gamma distributions.
  • Identify which distribution is appropriate for a given forensic data type and justify the choice on the basis of the data-generating process.
  • Explain how using a mismatched distribution affects the tail probabilities and likelihood ratios derived from forensic data.
  • Apply the Poisson model to particle count data and the normal model to repeated physical measurements at a basic level.
  • Describe the relationship between the binomial and Poisson distributions and state the conditions under which one approximates the other.
Key terms
Probability distribution
A function that assigns probabilities to each possible value (or range of values) of a random variable. It fully characterises the uncertainty about that variable before data are observed.
Parameters
The numerical constants that define the specific shape of a distribution within its family. The normal distribution has two parameters (mean and variance); the Poisson has one (rate); the binomial has two (number of trials and success probability); the gamma has two (shape and rate).
Likelihood ratio (LR)
The ratio of the probability of the observed evidence under two competing hypotheses. The distributional model chosen determines the probability calculations, so a wrong distribution produces a wrong LR.
Right skew
A distribution is right-skewed when its tail extends further to the right than to the left. Many forensic measurement quantities (trace concentrations, waiting times, particle counts) are right-skewed and cannot be modelled accurately by the symmetric normal distribution.
Rate parameter (lambda)
The single parameter of the Poisson distribution, equal to both the mean and the variance of the distribution. In forensic contexts lambda is the expected number of events (e.g., particles, rare alleles) per unit of time, area, or volume.
Shape and rate parameters
The two parameters of the gamma distribution. The shape parameter controls how peaked or spread the distribution is; the rate parameter (sometimes expressed as its inverse, the scale) controls the units. Together they allow the gamma to fit a wide range of right-skewed data.

The normal distribution: measurement error and physical properties

The normal distribution (also called the Gaussian distribution) is the bell-shaped curve that most people encounter first in statistics. It is defined by two parameters: the mean (the centre of the bell) and the variance (how wide the bell is, or equivalently its square root, the standard deviation). The distribution is symmetric around the mean, meaning values equally above and below the mean are equally probable. About 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
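The 68%, 95%, and 99.7% figures can be checked directly from the normal cumulative distribution function. The following is a minimal sketch using Python's standard library; the helper name `prob_within` is introduced here for illustration only.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

The same proportions hold for any mean and standard deviation, because the rule is stated in units of standard deviations.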

In forensic science, the normal distribution arises most naturally wherever a physical property is measured repeatedly by the same instrument or process. Repeated measurements of the refractive index of a glass fragment, the diameter of a textile fibre, the density of a paint chip, or the retention time of a compound on a chromatography column all tend to scatter normally around the true value. This happens because measurement error is the sum of many small, independent, random effects (temperature fluctuation, electronic noise, analyst technique variation), and the sum of many small independent effects converges to a normal distribution regardless of the shape of each individual component. This result is the central limit theorem.
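A small simulation can make the central limit theorem concrete. The sketch below assumes, purely for illustration, that each individual error component is strongly skewed (a centred exponential variable); the function names and the choice of 50 components are hypothetical. The skewness of the summed error should be much closer to zero than the skewness of a single component.

```python
import random
from statistics import fmean, pstdev

random.seed(1)  # fixed seed so the sketch is repeatable

def skewness(values):
    """Sample skewness: the mean of the cubed standardised deviations."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

def simulated_error(n_effects):
    """One measurement error formed as the sum of n_effects small skewed effects."""
    return sum(random.expovariate(1.0) - 1.0 for _ in range(n_effects))

single = [simulated_error(1) for _ in range(5000)]
summed = [simulated_error(50) for _ in range(5000)]

skew_single = skewness(single)  # theoretical value is 2 for an exponential
skew_summed = skewness(summed)  # theoretical value is 2 / sqrt(50), about 0.28
print(skew_single, skew_summed)
```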

The normal distribution is problematic when applied to data that cannot take negative values (counts, concentrations at or near zero) or data that are strongly skewed. A standard rule of thumb is that if the coefficient of variation (standard deviation divided by mean) exceeds about 0.3, the normal approximation begins to fail for concentration data, and a right-skewed model such as the log-normal or gamma should be considered. The log-normal distribution, where the logarithm of the variable is normally distributed, is a common choice for concentration data in toxicology and environmental forensics.
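The coefficient-of-variation rule of thumb can be applied in a few lines. This is a rough screening sketch, not a formal test; the concentration values are invented for illustration and the 0.3 threshold is the rule of thumb quoted above.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return stdev(values) / mean(values)

# Hypothetical concentration measurements (arbitrary units).
concentrations = [0.8, 1.1, 0.9, 2.7, 1.4, 0.6, 3.9, 1.0]

cv = coefficient_of_variation(concentrations)
suggest_skewed_model = cv > 0.3  # rule-of-thumb threshold from the text
print(f"CV = {cv:.2f}, consider gamma or log-normal: {suggest_skewed_model}")
```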

The binomial distribution: discrete outcomes and allele frequencies

The binomial distribution models situations where a process is repeated a fixed number of times, each repetition produces one of two outcomes (conventionally called success and failure), and the probability of success is the same on every repetition, with all repetitions independent of each other. If n is the number of trials and p is the success probability, the binomial gives the probability of observing exactly k successes.

The clearest forensic application is in population genetics. At a biallelic single-nucleotide polymorphism (SNP) locus, each copy of the chromosome either carries the variant allele (success) or does not. If the population frequency of the variant allele is p, and a diploid individual carries two independent copies, the number of variant alleles in that individual follows a binomial distribution with n = 2 and probability p. For a sample of n items drawn independently from a population, the binomial gives the probability of a specific count of items having a particular feature. This underlies random match probability calculations in DNA evidence and the analysis of population databases.
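The binomial probability of exactly k successes is C(n, k) p^k (1 - p)^(n - k). The sketch below applies it to the n = 2 case described above. The allele frequency of 0.1 is hypothetical, and treating the two copies as independent is the Hardy-Weinberg assumption.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_variant = 0.1  # hypothetical population frequency of the variant allele

# Number of variant alleles carried by one diploid individual: n = 2 trials.
genotype_probs = {k: binomial_pmf(k, 2, p_variant) for k in (0, 1, 2)}
print(genotype_probs)
```

The three values correspond to the familiar genotype proportions (1 - p)^2, 2p(1 - p), and p^2.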

The binomial also applies in sampling problems: if a seized consignment contains a fraction p of items of type A, and n items are randomly selected, the number of type A items in the sample follows a binomial distribution. This allows a forensic scientist to estimate, with appropriate uncertainty, the proportion of type-A items in the consignment after examining a sample. Guidance on this application appears in forensic sampling standards including ENFSI guidelines for sampling seized drug consignments.
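One common use of the binomial model in sampling is to ask how many items must be examined so that, if every sampled item turns out to be type A, a stated minimum proportion can be claimed with a stated confidence. The sketch below assumes the consignment is large relative to the sample (otherwise a hypergeometric model would be the natural choice) and uses frequentist reasoning. The function name and the 90% / 95% figures are illustrative, not taken from any specific guideline.

```python
from math import ceil, log

def min_sample_size(min_proportion: float, confidence: float) -> int:
    """Smallest n such that observing n type-A items out of n sampled would be
    less likely than (1 - confidence) if the true proportion were below
    min_proportion. Uses P(all n are type A | proportion p) = p**n."""
    alpha = 1.0 - confidence
    return ceil(log(alpha) / log(min_proportion))

n_needed = min_sample_size(min_proportion=0.9, confidence=0.95)
print(n_needed)
```

If any sampled item is not type A, this simple rule no longer applies and the full binomial calculation is needed.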

| Property | Binomial | Poisson |
| --- | --- | --- |
| Number of trials | Fixed (n) | Unlimited (large n, small p) |
| Number of parameters | Two (n, p) | One (lambda = np) |
| Mean | np | lambda |
| Variance | np(1-p) | lambda |
| Typical forensic use | Allele counts, sampling consignments | Rare event counts, particle densities |

The Poisson distribution: particle counts and rare events

The Poisson distribution models the number of events that occur in a fixed interval of time or space when events are rare, occur independently of each other, and occur at a roughly constant average rate. Its single parameter, lambda, is both the mean and the variance of the distribution. This equality of mean and variance is a diagnostic feature: if the observed variance in a count data set is much larger than the mean (overdispersion), the Poisson model is misspecified and a negative binomial or other overdispersed model should be considered.
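The mean-equals-variance diagnostic can be expressed as a dispersion index (sample variance divided by sample mean), which should be close to 1 for Poisson data. The counts below are invented for illustration, and the index is a quick screen rather than a formal test.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 under a Poisson model."""
    return variance(counts) / mean(counts)

# Hypothetical particle counts from ten swabs.
counts = [0, 1, 0, 2, 0, 0, 7, 1, 0, 9]

index = dispersion_index(counts)
print(f"dispersion index = {index:.2f}")  # values well above 1 suggest overdispersion
```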

In forensic science, the Poisson distribution applies wherever particles or rare objects are counted in a defined area or volume. Gunshot residue (GSR) analysis provides a direct example: the number of characteristic GSR particles found on a swab from a suspect's hand is modelled as Poisson. Studies of background GSR deposition on unexposed individuals show that the count of characteristic three-component particles (lead, barium, antimony) on a randomly selected control subject follows a Poisson distribution with a low rate parameter, while a shooter or someone in close proximity shows a higher rate. Comparing the observed count against the background Poisson distribution is the basis of the statistical inference.
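The comparison against a background distribution can be sketched as follows. The background rate, the alternative (shooter) rate, and the observed count are all hypothetical numbers chosen for illustration; real casework values would come from validated reference data.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson count with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for a Poisson count with mean lam."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

background_rate = 0.5  # hypothetical mean particle count on unexposed subjects
shooter_rate = 5.0     # hypothetical mean particle count after firing
observed = 4

p_tail = poisson_tail(observed, background_rate)
lr = poisson_pmf(observed, shooter_rate) / poisson_pmf(observed, background_rate)
print(f"P(X >= {observed} | background) = {p_tail:.5f}, LR = {lr:.1f}")
```

The tail probability describes how unusual the count is under the background model alone; the likelihood ratio compares the two competing models for the same observed count.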

The Poisson is the mathematical limit of the binomial when n is very large and p is very small, with np equal to lambda. Practically, the Poisson is a good approximation to the binomial when n is greater than about 100 and p is less than about 0.01. This matters because many forensic rare-event problems are naturally framed as binomial (each item in a large population either has the rare feature or does not) but are more conveniently calculated using the Poisson formula.
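The quality of the approximation can be checked numerically. The sketch below compares the two probability mass functions for n = 1000 and p = 0.002 (so lambda = 2); these values are chosen only to satisfy the rule of thumb above.

```python
from math import comb, exp, factorial

n, p = 1000, 0.002
lam = n * p  # matching Poisson rate

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# Largest absolute difference between the two models over k = 0..10.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
print(f"largest difference in probability: {max_gap:.6f}")
```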

The gamma distribution: concentrations and skewed continuous data

The gamma distribution is a flexible family for continuous positive-valued data that are right-skewed. It has two parameters: shape (often written alpha or k) and rate (often written beta; its inverse is the scale). When the shape parameter equals 1, the gamma becomes the exponential distribution. When the shape parameter is large, the gamma becomes approximately normal. This flexibility makes the gamma a natural first choice when data are clearly positive and skewed but a specific physical model is not available.
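Both special cases can be seen from the density and the skewness formula. The sketch below uses the shape-rate parameterisation; the function names are introduced here for illustration. The skewness of a gamma distribution is 2 divided by the square root of the shape, so it shrinks towards zero (the normal value) as the shape grows.

```python
from math import exp, gamma as gamma_function, sqrt

def gamma_pdf(x: float, shape: float, rate: float) -> float:
    """Gamma density in the shape-rate parameterisation, for x > 0."""
    return rate**shape * x ** (shape - 1) * exp(-rate * x) / gamma_function(shape)

def gamma_skewness(shape: float) -> float:
    """Skewness of a gamma distribution; it depends only on the shape."""
    return 2.0 / sqrt(shape)

# Shape 1 reproduces the exponential density rate * exp(-rate * x).
print(gamma_pdf(0.7, shape=1.0, rate=2.0), 2.0 * exp(-2.0 * 0.7))

for shape in (1, 4, 25, 100):
    print(shape, gamma_skewness(shape))
```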

Chemical concentration data in forensic toxicology are a primary application. Blood alcohol concentration in a reference population of drivers is right-skewed: most drivers who have consumed alcohol have moderate concentrations, but a small proportion have very high concentrations that stretch the distribution rightward. A normal model underestimates the probability of extreme values. A gamma model (or log-normal, which is closely related) fits the right tail more accurately, which matters when computing the probability that a randomly chosen driver from the reference population would have a concentration above the legal limit.
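The difference in the right tail can be illustrated by giving a gamma and a normal model the same mean and standard deviation and comparing the probability above a threshold. All numbers below are hypothetical and are not fitted to any real reference population. The sketch also assumes an integer shape parameter so that the gamma tail has a closed form computable with the standard library.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

def erlang_tail(x: float, shape: int, rate: float) -> float:
    """P(X > x) for a gamma distribution with integer shape (closed form)."""
    return sum(exp(-rate * x) * (rate * x) ** i / factorial(i) for i in range(shape))

shape, rate = 2, 50.0    # hypothetical reference-population parameters
mean = shape / rate      # 0.04
sd = sqrt(shape) / rate  # about 0.028
threshold = 0.15         # hypothetical threshold, same units as the mean

gamma_tail = erlang_tail(threshold, shape, rate)
normal_tail = 1.0 - NormalDist(mean, sd).cdf(threshold)
print(f"gamma: {gamma_tail:.5f}, normal: {normal_tail:.6f}")
```

With these illustrative values the normal model gives a far smaller exceedance probability than the gamma model with the same mean and variance.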

The gamma distribution also describes waiting times between events, which arise in forensic intelligence contexts where the time between criminal events of a particular type is modelled. If events follow a Poisson process with a constant rate, the time until the k-th event follows a gamma distribution with shape parameter k and rate parameter equal to the event rate. This connection between the Poisson and gamma distributions makes them a natural pair in the analysis of time-series data on criminal events.
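The link between the two distributions can be checked by simulation: the k-th event has not yet occurred by time t exactly when fewer than k events have occurred in that time. The event rate, k, and t below are hypothetical values, and the simulation is a sketch rather than a rigorous validation.

```python
import random
from math import exp, factorial

random.seed(7)  # fixed seed so the sketch is repeatable

def poisson_cdf(k: int, lam: float) -> float:
    """P(N <= k) for a Poisson count with mean lam."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

rate, k, t = 2.0, 3, 2.0  # hypothetical: 2 events per week, 3rd event, 2 weeks

# Analytic: P(waiting time to k-th event > t) = P(fewer than k events in [0, t]).
analytic = poisson_cdf(k - 1, rate * t)

# Simulation: the waiting time to the k-th event is the sum of k exponential gaps.
waits = [sum(random.expovariate(rate) for _ in range(k)) for _ in range(20000)]
simulated = sum(w > t for w in waits) / len(waits)
print(f"analytic = {analytic:.4f}, simulated = {simulated:.4f}")
```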

In Bayesian forensic statistics, the gamma distribution serves a second role as the conjugate prior for the Poisson rate parameter. If a forensic scientist has prior beliefs about the rate of rare particle deposition, those beliefs can be represented as a gamma distribution. After observing count data, the posterior distribution (updated beliefs) is also gamma. This mathematical convenience has made gamma priors standard in Bayesian analyses of count data in forensic science, including the analysis of GSR and fibre transfer data.
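The conjugate update has a simple closed form: a Gamma(shape, rate) prior combined with n observed Poisson counts gives a Gamma(shape + sum of counts, rate + n) posterior. The prior values and counts below are hypothetical, and the sketch assumes each count comes from one unit of exposure (for example, one swab).

```python
def gamma_poisson_update(prior_shape: float, prior_rate: float, counts):
    """Posterior gamma parameters for a Poisson rate after observing counts,
    assuming one unit of exposure per count."""
    return prior_shape + sum(counts), prior_rate + len(counts)

prior_shape, prior_rate = 2.0, 4.0  # hypothetical prior, mean 0.5 particles per swab
observed_counts = [0, 1, 0, 2, 0]   # hypothetical counts from five swabs

post_shape, post_rate = gamma_poisson_update(prior_shape, prior_rate, observed_counts)
posterior_mean = post_shape / post_rate
print(post_shape, post_rate, round(posterior_mean, 3))
```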

How mismatched models affect forensic inference

The practical consequence of choosing the wrong distribution depends on which direction the mismatch runs. Three patterns are most common in forensic practice.

First, applying a normal model to count data assigns positive probability to negative counts, which are impossible. More subtly, the normal has symmetric tails, while count distributions with a low mean are right-skewed. The result is that the probability of observing a high count is underestimated (making rare high counts seem even more surprising than they should), while some probability is wasted on impossible values below zero. When the background distribution is modelled this way, the likelihood ratio for a high count is inflated in favour of the prosecution hypothesis.
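The first pattern can be quantified by matching a normal model to a Poisson count with the same mean and variance. The mean of 2 and the high count of 7 are hypothetical values; a continuity correction of 0.5 is used when reading a count probability off the normal curve.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 2.0  # hypothetical mean count
normal_model = NormalDist(mu=lam, sigma=sqrt(lam))  # same mean and variance

def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for a Poisson count with mean lam."""
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

p_negative = normal_model.cdf(0.0)  # probability the normal gives to impossible values
high = 7
poisson_high = poisson_tail(high, lam)
normal_high = 1.0 - normal_model.cdf(high - 0.5)  # continuity-corrected
print(f"P(<0) = {p_negative:.3f}, Poisson tail = {poisson_high:.5f}, normal tail = {normal_high:.5f}")
```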

Second, applying a Poisson model to overdispersed data underestimates variance. The Poisson forces the variance to equal the mean, but if the true variance is higher, the model is too confident about what counts to expect. This makes unusual counts appear more extreme than they are, again inflating the apparent evidential weight. The ENFSI Guideline for Evaluative Reporting (2015) specifically addresses the need to validate distributional assumptions before computing likelihood ratios.
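The effect of ignoring overdispersion can be seen by comparing a Poisson tail with a negative binomial tail that has the same mean but a larger variance. The sketch uses the mean-dispersion parameterisation of the negative binomial (variance = mean + mean^2 / r); the mean, the dispersion value r, and the count of 8 are hypothetical.

```python
from math import exp, factorial, lgamma, log

def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) under a Poisson model with mean lam."""
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

def negbin_pmf(k: int, mu: float, r: float) -> float:
    """Negative binomial pmf with mean mu and dispersion r (variance mu + mu**2 / r)."""
    log_p = (lgamma(k + r) - lgamma(k + 1) - lgamma(r)
             + r * log(r / (r + mu)) + k * log(mu / (r + mu)))
    return exp(log_p)

def negbin_tail(k: int, mu: float, r: float) -> float:
    """P(X >= k) under the negative binomial model."""
    return 1.0 - sum(negbin_pmf(i, mu, r) for i in range(k))

mu, r, observed = 2.0, 1.0, 8  # hypothetical mean, dispersion, and observed count
pois_tail = poisson_tail(observed, mu)
nb_tail = negbin_tail(observed, mu, r)
print(f"Poisson: {pois_tail:.5f}, negative binomial: {nb_tail:.5f}")
```

With these values the Poisson model makes the observed count look far rarer than the overdispersed model does, which is the inflation of evidential weight described above.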

Third, applying a normal model to right-skewed concentration data underestimates the frequency of high concentrations in the reference population. This matters when the forensic question is whether an observed concentration is consistent with the background population or with a specific source. If the reference distribution is modelled as too thin in the right tail, extreme concentrations appear to be strong evidence for the specific-source hypothesis, when in fact they occur more often in the background population than the normal model predicts.

Selecting the right distribution: a decision framework

Choosing a distribution begins with characterising the data type. The first question is whether the variable is discrete (integer counts) or continuous (measured on a scale with decimal values). Discrete data are not properly modelled by the continuous normal or gamma, except as an approximation when counts are large. The second question, for continuous data, is whether the variable can be negative. Physical measurements (mass, concentration, fibre diameter) are bounded below by zero, so the normal is at best an approximation, adequate only when the mean lies several standard deviations above zero. The third question is whether the data are symmetric or skewed. Symmetric data with thin tails support the normal; right-skewed positive data support the gamma or log-normal.

For discrete count data, the further question is whether the number of trials is fixed. If n is fixed and p is known or estimable, the binomial applies. If n is very large and p is very small, the Poisson is computationally simpler and adequately accurate. If the observed variance substantially exceeds the mean, the negative binomial should be considered.

| Data type | Distribution | Key forensic example |
| --- | --- | --- |
| Continuous, symmetric, can be negative | Normal | Measurement error in glass RI, fibre diameter |
| Continuous, positive, right-skewed | Gamma / log-normal | Blood alcohol concentration, drug levels in tissue |
| Count, fixed trials, moderate p | Binomial | Allele counts in a sample, items of type A in a consignment |
| Count, rare events, large n small p | Poisson | GSR particles per swab, rare allele occurrences per gel lane |
| Count, overdispersed | Negative binomial | Particle counts with clustering or heterogeneous rates |
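The decision framework can be written as a short function. This is a simplified sketch of the questions above, not a substitute for examining the data; the function name and its boolean arguments are hypothetical.

```python
def suggest_distribution(discrete: bool,
                         fixed_trials: bool = False,
                         rare_events: bool = False,
                         overdispersed: bool = False,
                         right_skewed: bool = False) -> str:
    """Return a candidate distribution family from basic data characteristics."""
    if discrete:
        if overdispersed:
            return "negative binomial"
        if fixed_trials and not rare_events:
            return "binomial"
        return "Poisson"
    # Continuous data: skewed positive quantities versus symmetric ones.
    return "gamma or log-normal" if right_skewed else "normal"

print(suggest_distribution(discrete=False))                      # repeated RI measurements
print(suggest_distribution(discrete=False, right_skewed=True))   # concentrations
print(suggest_distribution(discrete=True, fixed_trials=True))    # consignment sampling
print(suggest_distribution(discrete=True, rare_events=True))     # particle counts
print(suggest_distribution(discrete=True, overdispersed=True))   # clustered counts
```

The output is only a starting candidate; it still has to be fitted and checked against reference data.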

Once a candidate distribution is selected, it should be fitted to available reference data and tested against the data graphically and with formal goodness-of-fit tests. The fitted model, the testing procedure, and the results should be documented and reported. When the forensic expert presents a likelihood ratio or a probability statement in court, the underlying distributional assumption is a material part of the method. Courts in multiple jurisdictions require that methodology be disclosed and tested; examples include the guidance on expert evidence from the Court of Appeal of England and Wales (R v Dlugosz [2013] EWCA Crim 2) and practice under the Daubert standard in US courts.
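A graphical check such as a quantile-quantile plot starts from pairs of theoretical and observed quantiles. The sketch below computes those pairs for a fitted normal model; the twelve refractive index values are invented for illustration, and the plotting positions (i + 0.5) / n are one common convention among several.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Pairs of (theoretical normal quantile, observed value) for a Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]

# Hypothetical repeated refractive index measurements.
ri_values = [1.5178, 1.5181, 1.5180, 1.5179, 1.5182, 1.5180,
             1.5177, 1.5183, 1.5181, 1.5179, 1.5180, 1.5182]

qq_points = normal_qq_points(ri_values)
for theoretical, observed in qq_points:
    print(f"{theoretical:.5f}  {observed:.4f}")
```

If the normal model is adequate, the pairs fall close to a straight line when plotted; systematic curvature in the upper tail is a typical sign of right skew.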

Check your understanding
A forensic chemist measures the refractive index of a glass fragment twelve times. The measurements cluster around 1.518 with a standard deviation of 0.0002 and appear symmetric. Which distribution is most appropriate for modelling these repeated measurements?

Key Takeaways

  • The normal distribution models symmetric continuous measurement error; the binomial models counts of successes in fixed trials; the Poisson models rare event counts with large n and small p; and the gamma models right-skewed continuous positive quantities such as chemical concentrations.
  • Applying the wrong distribution is not merely a technical error: it changes the probabilities assigned to observed data under each hypothesis and therefore changes the likelihood ratio and the evidential weight communicated to a court.
  • The Poisson distribution is the limiting case of the binomial when n is very large and p is very small; the two distributions assign nearly identical probabilities in this regime, and the Poisson is simpler to compute. Overdispersion (variance much greater than mean) invalidates the Poisson and calls for a negative binomial model.
  • Distribution selection should be based on the data-generating process, not convenience. Graphical checks (quantile-quantile plots) and formal goodness-of-fit tests should be performed and documented before any inference is reported.
  • Distributional assumptions are a material part of a forensic statistical method and should be disclosed in reports and testimony. Standards including the ENFSI Guideline for Evaluative Reporting and Daubert-framework requirements in US courts treat unexplained or untested model assumptions as a methodological weakness.
Why does the choice of probability distribution matter in forensic evidence evaluation?
Every statistical inference about forensic evidence depends on a model of how the data are generated. If the assumed distribution does not match the actual data-generating process, the resulting probabilities are wrong. A likelihood ratio computed under a normal distribution model applied to count data will be systematically biased, potentially inflating or deflating the evidential weight assigned to the findings.
When is the normal distribution appropriate for forensic data?
The normal distribution applies when measurements cluster symmetrically around a mean with tails that thin out rapidly. In forensic contexts this includes repeated measurements of a physical quantity (glass refractive index, fibre diameter, ink optical density), where random measurement error tends to be normally distributed. It is not appropriate for count data, event frequencies, or right-skewed concentration data.
What is the binomial distribution and where does it arise in forensics?
The binomial distribution models the number of successes in a fixed number of independent trials where each trial has the same probability of success. In forensics it applies to questions like: given a population frequency p for a DNA allele, what is the probability of observing exactly k copies of that allele in a sample of n items drawn from the population? It also underlies random match probability calculations for discrete genetic markers.
When should the Poisson distribution be used instead of the binomial?
The Poisson distribution is appropriate when events occur independently in time or space at a roughly constant average rate, and the number of trials is very large while the probability per trial is very small. In forensics it models particle counts on a surface (gunshot residue particles per unit area), the number of rare DNA base-pair mutations in a region, or the frequency of rare events in a large database. The Poisson is the limiting case of the binomial when n is large and p is small.
What role does the gamma distribution play in forensic statistics?
The gamma distribution models continuous positive-valued data with a right-skewed shape. It is commonly used for chemical concentration data (drug concentrations in blood or tissue), times between events, and waiting times in forensic processes. When a trace compound is present at low concentrations, its distribution across a reference population is often better described by a gamma model than a normal one, and fitting the wrong model will produce inaccurate reference interval estimates.
