Confidence Intervals and What a Sample Can Support

A confidence interval expresses how precisely a sample estimate pins down the true population value, given sampling variability. This topic explains the correct interpretation of interval coverage, shows how width depends on sample size and variance, and addresses the common misreading that a 95% interval contains the true value with 95% probability.

Last updated: 24 Jun 2026

A confidence interval is a range of values, computed from sample data, that is designed to capture an unknown population parameter with a specified long-run probability. If a forensic biologist estimates the frequency of a DNA profile from a database of 5000 individuals and reports a 95% confidence interval of 1-in-2000 to 1-in-800, this means the construction procedure used would, in the long run, bracket the true frequency in about 95 of every 100 repeated samples of the same size. The key word is procedure: the confidence level describes the method, not any probability attached to this particular interval. Once the interval is computed, the true value either falls inside it or outside it, and no further probability applies to that fixed pair of numbers.

This distinction matters in court. Expert witnesses who say the 95% interval means there is a 95% chance the true value lies inside are making a statement that frequentist statistics does not support. Judges, jurors, and opposing experts who catch this error can undermine otherwise solid forensic evidence. The same confusion has appeared in published forensic science literature, making it a known source of inferential error across disciplines from DNA typing to drug purity estimation.

Interval width is controlled by three factors: sample size, population variance, and the chosen confidence level. Wider intervals are more likely to capture the true value but give weaker guidance; narrower intervals are more precise but carry more risk of missing the true value. In forensic applications, the sample is often a reference database built over years, and the analyst inherits its size rather than choosing it. Understanding what that inherited sample size can and cannot support is a practical skill for any forensic scientist who must present quantitative conclusions.

By the end of this topic you will be able to:

State the correct frequentist interpretation of a confidence interval and identify the common misreading.
Construct a confidence interval for a mean using the standard error and a critical value, and explain each component.
Predict the direction and magnitude of change in interval width when sample size, variance, or confidence level is altered.
Distinguish a frequentist confidence interval from a Bayesian credible interval and state the additional input required for the Bayesian version.
Communicate interval-based conclusions in court language that is accurate without requiring the audience to understand sampling theory.

Key terms

Confidence interval (CI): A range computed from sample data using a procedure that, over many repetitions, would contain the true population parameter a stated percentage of the time. The percentage is the confidence level (commonly 90%, 95%, or 99%).
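Standard error (SE): The standard deviation of the sampling distribution of an estimator. For the sample mean, SE equals the population standard deviation divided by the square root of n; because the population value is usually unknown, the sample standard deviation is substituted in practice, giving the estimated standard error. A smaller SE means the estimate is more precise.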
Critical value: The value from a reference distribution (z or t) that cuts off the desired tail probability. For a 95% two-sided interval using the normal distribution, the critical value is 1.96.
Coverage probability: The true long-run proportion of intervals, from repeated sampling, that contain the parameter. If a procedure has 95% nominal coverage and its assumptions are met, coverage probability equals 0.95.
Bayesian credible interval: An interval computed from the posterior distribution of a parameter, given a prior and the observed data. Unlike a confidence interval, a credible interval carries a direct probability statement: there is a stated probability that the parameter lies in the interval.
Margin of error: Half the width of a symmetric confidence interval, equal to the critical value multiplied by the standard error. Commonly reported in survey and database studies. The full interval is the point estimate plus or minus the margin of error.

The coverage guarantee and what it does not say

The confidence level attached to an interval describes the procedure that generated it, not the particular interval in hand. Imagine repeating the same experiment 100 times, each time drawing a fresh sample of the same size and computing a 95% interval from each sample. On average, 95 of those 100 intervals will contain the true parameter and 5 will not. You do not know, for any given interval, whether it is one of the 95 or one of the 5.

This is the correct statement. Three statements that look similar but are wrong: (1) there is a 95% probability that the true value is in this interval; (2) 95% of the data falls in this interval; (3) if the experiment is repeated, the result will fall in this interval 95% of the time. Each of these confuses the parameter, the data distribution, or the interval procedure with the others.
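The repeated-sampling description of coverage can be checked by simulation. The Python sketch below is illustrative only and assumes normally distributed data with a known population standard deviation, so the z-interval has exact 95% coverage. The true mean of 10, standard deviation of 2, and sample size of 25 are arbitrary choices, not forensic data, and the function names are hypothetical. It draws 10,000 samples, builds a 95% interval from each, and counts how many of those intervals contain the true mean.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, level=0.95):
    """Two-sided z-interval for a mean, assuming the population SD sigma is known."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # 1.96 when level = 0.95
    half_width = z * sigma / len(sample) ** 0.5
    m = fmean(sample)
    return m - half_width, m + half_width


def simulated_coverage(true_mean=10.0, sigma=2.0, n=25, reps=10_000, seed=1):
    """Fraction of repeated-sample intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        hits += lo <= true_mean <= hi  # any single interval either contains it or not
    return hits / reps


print(simulated_coverage())  # close to 0.95, not exactly, because reps is finite
```

Each interval produced inside the loop either contains 10 or does not; the 95% figure emerges only as a proportion across the whole run, which is the sense in which the confidence level describes the procedure rather than any one interval.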

Why does the distinction matter practically? In legal settings, the phrase "a 95% probability that the true frequency lies between X and Y" invites the court to treat X and Y as probabilistic bounds on the true frequency. If the interval is 1-in-1000 to 1-in-500, the defence might argue that the court should adopt the larger frequency, 1-in-500 (the weaker evidence), on the ground that there is still a 2.5% chance the true frequency is even higher. The correct framing instead reports the estimate and interval as a description of estimation precision, not as a probabilistic bracket on the true value.

Constructing a confidence interval for a mean

The standard interval for a population mean uses the sample mean, the standard error, and a critical value from the t or z distribution. The formula is: interval equals sample mean plus or minus (critical value times standard error). Because the population standard deviation is rarely known, the standard error is estimated as the sample standard deviation divided by the square root of n. Strictly, an estimated standard error calls for the t critical value with n minus 1 degrees of freedom. For large samples (n above about 30) the t value is close to the z value, so z is commonly used as an approximation; for small samples the t value must be used.
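A minimal standard-library sketch of the formula above. To avoid assuming SciPy is installed, it computes the t critical value by numerically integrating the t density (Simpson's rule) and inverting the result by bisection; where SciPy is available, `scipy.stats.t.ppf((1 + level) / 2, df)` returns the same value. The function names and the measurement values are hypothetical.

```python
import math
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(level, df):
    """Two-sided critical value t with P(-t < T < t) = level, found by bisection."""
    target = (1 + level) / 2
    lo, hi = 0.0, 100.0  # wide enough for level <= 0.99 and df >= 1
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(data, level=0.95):
    """Sample mean plus or minus t * SE, where SE = sample SD / sqrt(n)."""
    n = len(data)
    se = stdev(data) / math.sqrt(n)          # estimated standard error
    margin = t_critical(level, n - 1) * se   # t with n - 1 degrees of freedom
    m = fmean(data)
    return m - margin, m + margin


# Hypothetical replicate measurements (n = 10, so df = 9).
measurements = [4.1, 3.9, 4.4, 4.0, 3.8, 4.3, 4.2, 4.0, 3.9, 4.4]
print(round(t_critical(0.95, 9), 3))  # about 2.262, compared with 1.96 for z
print(mean_ci(measurements))
```

With only 10 observations the t critical value is noticeably larger than 1.96, so the interval is wider than a z-based interval would suggest; as df grows, `t_critical(0.95, df)` approaches 1.96.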

For the normal distribution at 95% confidence, the critical value is 1.96 (commonly rounded to 2 in approximate calculations). At 99% confidence it is 2.576. At 90% confidence it is 1.645. Choosing a higher confidence level produces a wider interval for the same data, because you need more room to achieve higher coverage probability.

Confidence level	z critical value	Effect on interval width	Typical forensic use
90%	1.645	Narrowest	Screening, rapid assessment
95%	1.960	Standard	Most validation studies, database frequency estimates
99%	2.576	Widest	High-stakes determinations, accreditation standards
99.9%	3.291	Very wide	Rare; used when false inclusion risk is critical
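The z critical values in the table can be reproduced with Python's standard library, where `statistics.NormalDist().inv_cdf` is the inverse of the standard normal CDF. The helper name `z_critical` is illustrative.

```python
from statistics import NormalDist


def z_critical(level):
    """Two-sided critical value: leaves (1 - level) / 2 in each tail of N(0, 1)."""
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.90, 0.95, 0.99, 0.999):
    # For a fixed standard error, interval width is proportional to z.
    print(f"{level:.1%} confidence: z = {z_critical(level):.3f}")
```

Because width is proportional to z for a fixed standard error, moving from 95% to 99% confidence widens the interval by a factor of about 2.576 / 1.960, roughly 1.31.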

For proportions, such as the frequency of a feature in a reference database, the standard interval uses the sample proportion p-hat. The standard error is the square root of p-hat times (1 minus p-hat) divided by n. This approximation works well when both n times p-hat and n times (1 minus p-hat) are greater than 10. For rare features where the expected count is small, exact binomial intervals such as the Clopper-Pearson interval are more appropriate and give wider, more conservative bounds.
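The normal-approximation interval for a proportion, together with the rule-of-thumb check from the paragraph above, can be sketched as follows. This is an illustrative sketch with a hypothetical function name and hypothetical counts. It also clips the limits to the range 0 to 1, a common practical adjustment that is not part of the formula itself.

```python
import math
from statistics import NormalDist


def wald_interval(x, n, level=0.95):
    """Normal-approximation (Wald) interval for a proportion x / n.

    Returns (lower, upper, ok), where ok reports whether the rule of thumb
    n * p_hat > 10 and n * (1 - p_hat) > 10 is satisfied.
    """
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf((1 + level) / 2)
    ok = n * p_hat > 10 and n * (1 - p_hat) > 10
    # Clip to [0, 1]: the symmetric formula can stray outside the valid range.
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se), ok


print(wald_interval(150, 1000))  # common feature: approximation acceptable
print(wald_interval(3, 1000))    # rare feature: ok is False; use an exact interval
```

For the rare feature the unclipped lower limit would be negative, a visible sign that the symmetric normal approximation does not fit a small expected count.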

How sample size and variance control interval width

Interval width equals twice the margin of error, which equals twice the critical value times the standard error. Because the standard error is the population standard deviation divided by the square root of n, doubling n reduces the standard error, and with it the interval width, by a factor of the square root of 2 (about 1.41). To halve the interval width, n must be quadrupled. This inverse square root relationship cuts both ways for forensic reference databases: small databases give wide intervals, yet enlarging a database brings diminishing returns, because each additional individual narrows the interval by less than the one before.

Population variance has the opposite effect: higher variance produces a larger standard error and a wider interval for the same n. In forensic soil analysis, for example, elemental concentrations may vary greatly between geographic regions. A database drawn from a single narrow region will show lower variance and tighter intervals than a database drawn from a whole country. Both can be valid, but they answer different questions: the regional database estimates frequency within a region; the national database estimates frequency across a country.
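The scaling rules in the two paragraphs above follow directly from the width formula. This sketch assumes a z-interval with a known standard deviation; the values of sigma and n are arbitrary, and `ci_width` is a hypothetical helper name.

```python
from statistics import NormalDist


def ci_width(sigma, n, level=0.95):
    """Full width of a z-interval for a mean: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return 2 * z * sigma / n ** 0.5


base = ci_width(sigma=1.0, n=100)
print(f"{ci_width(1.0, 200) / base:.3f}")  # 0.707: doubling n divides width by sqrt(2)
print(f"{ci_width(1.0, 400) / base:.3f}")  # 0.500: quadrupling n halves the width
print(f"{ci_width(2.0, 100) / base:.3f}")  # 2.000: doubling sigma doubles the width
```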

The practical implication for forensic scientists: when a reference database is small, the confidence interval around a rare feature frequency may be very wide. A DNA profile frequency estimated from 500 individuals in a specific population group could have a 95% interval spanning an order of magnitude. Presenting only the point estimate and omitting the interval creates an appearance of precision the data cannot support. Courts in multiple jurisdictions have begun requiring disclosure of the database size and the interval alongside the point estimate, as discussed in Role of Statistics in Evidence Evaluation.

One-sided intervals and conservative reporting

A two-sided interval covers both tails of the sampling distribution and answers: what range of parameter values is consistent with this sample? A one-sided interval covers only one tail and answers: what is the highest (or lowest) plausible value of the parameter? Forensic applications often call for one-sided intervals because the inferential question is one-directional.

In DNA frequency estimation, the question relevant to defence is: what is the largest plausible value of the profile frequency? A large frequency means the evidence is weaker. The NRC II report in the United States and subsequent guidance from ENFSI and other bodies recommended reporting an upper confidence bound on the profile frequency, or using a conservative point estimate derived from the upper bound of the frequency of each allele, rather than a central estimate. This gives defendants the benefit of database uncertainty.

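In the normal approximation, the corresponding one-sided 95% upper bound uses the critical value 1.645 rather than 1.96, because the whole 5% of non-coverage probability is placed in a single tail. The one-sided 95% upper bound therefore lies closer to the point estimate than the upper limit of the two-sided 95% interval computed from the same data; it coincides with the upper limit of a two-sided 90% interval. This is a source of confusion when comparing reported values across studies that use different conventions.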
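The relationship between one-sided and two-sided bounds can be seen numerically. This sketch uses the normal approximation for a proportion with hypothetical counts (30 of 1000); it is not the allele-by-allele procedure used in DNA casework, and `upper_limit` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def upper_limit(x, n, level=0.95, one_sided=False):
    """Normal-approximation upper limit for a proportion x / n.

    A one-sided bound places all of (1 - level) in the upper tail;
    a two-sided interval splits it equally between both tails.
    """
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    tail = (1 - level) if one_sided else (1 - level) / 2
    return p_hat + NormalDist().inv_cdf(1 - tail) * se


x, n = 30, 1000  # hypothetical counts
one_95 = upper_limit(x, n, 0.95, one_sided=True)  # uses z = 1.645
two_95 = upper_limit(x, n, 0.95)                  # uses z = 1.96
two_90 = upper_limit(x, n, 0.90)                  # also uses z = 1.645
print(f"{one_95:.4f} {two_95:.4f} {two_90:.4f}")
```

The one-sided 95% bound and the two-sided 90% upper limit coincide, which is why a value reported as a 95% upper bound can be smaller than the upper end of a 95% interval from the same data.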

Confidence intervals versus Bayesian credible intervals

The intuitive interpretation people want for a confidence interval (there is a 95% probability the true value is in here) is actually the correct interpretation of a Bayesian credible interval. A 95% credible interval is derived from the posterior distribution of the parameter after combining the prior distribution with the likelihood from the data. It directly answers: given what I knew before and what the data show, where does 95% of the posterior probability for the parameter lie?

The price of the Bayesian interpretation is a prior. For a DNA profile frequency, the prior would encode beliefs about how common the profile is before seeing the database data. If the prior is flat (no prior knowledge), the credible interval and the confidence interval are often numerically similar. They diverge when the prior is informative, for example when it concentrates probability on small frequencies or on large ones. In casework, choosing and justifying a prior is a substantive decision that affects the conclusion and must be disclosed and defended.
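A credible interval for a frequency can be sketched with a Beta prior, which combines with binomial data to give a Beta posterior. This is a minimal standard-library sketch under stated assumptions: the prior parameters must be whole numbers (so the Beta CDF can be computed from a binomial tail), the equal-tailed interval is used, and the counts (12 of 320) and the Beta(1, 99) "informative" prior, whose mean is 0.01, are illustrative only. The helper names are hypothetical. If SciPy is available, `scipy.stats.beta.ppf` computes the same quantiles for any positive parameters.

```python
import math


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def beta_quantile(q, a, b, iters=60):
    """Quantile of Beta(a, b) for positive INTEGER a and b, by bisection.

    Uses the identity P(Beta(a, b) <= p) = P(Binomial(a + b - 1, p) >= a).
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binom_sf(a, a + b - 1, mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def credible_interval(x, n, prior_a=1, prior_b=1, level=0.95):
    """Equal-tailed credible interval for a frequency under a Beta prior."""
    a, b = prior_a + x, prior_b + n - x  # conjugate update: posterior is Beta(a, b)
    tail = (1 - level) / 2
    return beta_quantile(tail, a, b), beta_quantile(1 - tail, a, b)


flat = credible_interval(12, 320)                                # Beta(1, 1) prior
informative = credible_interval(12, 320, prior_a=1, prior_b=99)  # prior mean 0.01
print(flat)
print(informative)
```

Under the flat prior the interval is numerically similar to a confidence interval for the same counts; the informative prior pulls both limits towards smaller frequencies, which is why the prior must be disclosed and justified.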

Misidentifying which framework is being used leads to specific errors. Treating a credible interval as a confidence interval understates the role of the prior. Treating a confidence interval as a credible interval makes an unwarranted probability statement about the parameter. Both errors appear in peer review and in court, and both undermine the credibility of statistical evidence.

Communicating interval uncertainty honestly in forensic reports and court

The challenge in court is to communicate what the interval means without requiring the audience to understand sampling theory. Several standard phrasings have been developed through guidelines from bodies including the UK Forensic Science Regulator, the Scientific Working Group for DNA Analysis Methods (US), and the European Network of Forensic Science Institutes. A recommended form is: the estimated frequency is X, based on a database of N individuals; the 95% confidence interval for this estimate is [lower bound] to [upper bound]; this means the procedure used to compute this interval would, in the long run, bracket the true frequency in about 95 out of 100 comparable studies.

Under the Bharatiya Sakshya Adhiniyam 2023 (which replaced the Indian Evidence Act 1872), expert opinions on scientific matters are admissible where the expert has relevant qualifications and the underlying method is disclosed. Disclosing a confidence interval and the database size is part of disclosing the method. Under the US Daubert standard, the error rate and known standards of operation of a technique are relevant to admissibility. An interval width that is not disclosed is an undisclosed error rate. UK Crown Court guidance similarly requires that statistical evidence in forensic reports be accompanied by the assumptions and limitations of the analysis.

Two forms of honest communication are useful in practice. First, state the result and its interval together: the estimated match probability is 1 in 4000, with a 95% confidence interval of 1 in 8000 to 1 in 2200. Second, when a one-directional conclusion is needed (for example, to say the frequency is no higher than 1 in 1000), use the upper bound of a one-sided confidence interval explicitly rather than the point estimate. Both approaches prevent misuse of an estimate as if it were a known fact rather than a sample-based approximation. The connection between these principles and the broader practice of evaluative reporting is treated in Numbers in Forensic Conclusions.

Worked example

Computing and reporting a confidence interval for a soil feature frequency

A forensic geologist estimates the frequency of a distinctive clay mineral combination in a regional soil database and must report the result with an appropriate interval.

A forensic geologist has examined a soil database of 320 samples from a coastal region. Of these, 12 samples show a specific combination of illite and smectite in the proportion observed at a crime scene. The geologist must estimate the frequency of this combination and compute a 95% confidence interval to accompany the estimate in the court report.

Compute the point estimate. The sample proportion is p-hat equals 12 divided by 320, which equals 0.0375 (or 3.75%, or approximately 1 in 27).
Check the approximation condition. n times p-hat equals 12; n times (1 minus p-hat) equals 308. The first value exceeds the threshold of 10 only narrowly. Because it is close to the boundary, both the normal approximation interval and the Clopper-Pearson exact interval should be computed and compared.
Compute the standard error. SE equals the square root of (0.0375 times 0.9625 divided by 320), which equals the square root of 0.0001127, which equals 0.01062.
Apply the 95% critical value. The margin of error equals 1.96 times 0.01062, which equals 0.0208. The two-sided 95% interval is 0.0375 plus or minus 0.0208, giving approximately 0.017 to 0.058 (roughly 1 in 60 to 1 in 17).
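Cross-check with the Clopper-Pearson exact interval. For 12 successes in 320 trials, the exact 95% interval is approximately 0.019 to 0.065. Both normal-approximation bounds sit below the exact ones, and the gap is largest in the upper tail (0.058 versus 0.065): with n times p-hat only a little above 10, the sampling distribution of the proportion is still right-skewed, and a symmetric normal interval cannot reflect that skew. The exact interval is preferred in the report.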
Draft the report statement. The geologist writes: the estimated frequency of this clay mineral combination in the regional database is 3.8% (12 of 320 samples). The 95% confidence interval, computed using the Clopper-Pearson exact method, is 1.9% to 6.5%. This means the method used to compute this interval would bracket the true regional frequency in at least 95 of every 100 comparable studies conducted on samples of this size. The database covers the coastal zone only; frequencies in inland areas may differ.
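The calculations in the steps above can be reproduced in a few lines. This Python sketch is illustrative, with hypothetical helper names: it computes the Wald interval from the margin-of-error formula and the Clopper-Pearson interval by inverting exact binomial tail probabilities with bisection, which is mathematically equivalent to the Beta-quantile form used by many statistical libraries.

```python
import math
from statistics import NormalDist


def binom_pmf(i, n, p):
    """Binomial probability P(X = i), computed in log space."""
    return math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))


def _solve_increasing(f, iters=60):
    """Bisection on (0, 1) for the root of an increasing function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, level=0.95):
    """Exact (Clopper-Pearson) interval by inverting the binomial tail probabilities."""
    half_alpha = (1 - level) / 2
    # Lower limit: the p at which P(X >= x) rises to half_alpha.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: sum(binom_pmf(i, n, p) for i in range(x, n + 1)) - half_alpha)
    # Upper limit: the p at which P(X <= x) falls to half_alpha.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: half_alpha - sum(binom_pmf(i, n, p) for i in range(0, x + 1)))
    return lower, upper


def wald(x, n, level=0.95):
    """Normal-approximation interval: p_hat plus or minus z * SE."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf((1 + level) / 2)
    return p_hat - z * se, p_hat + z * se


x, n = 12, 320
print("Wald:            %.4f to %.4f" % wald(x, n))  # about 0.0167 to 0.0583
print("Clopper-Pearson: %.4f to %.4f" % clopper_pearson(x, n))
```

Printing four decimals shows the asymmetry of the exact interval around 0.0375, and rounding its limits outward to one decimal place in percent gives the 1.9% to 6.5% quoted in the report.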

Check your understanding


A toxicologist reports a 95% confidence interval of 0.08 to 0.12 g/dL for the mean blood alcohol concentration in a reference sample. Which statement is correct?

Key Takeaways

A confidence interval describes the long-run coverage of a procedure: in repeated sampling, a stated percentage of intervals constructed this way will contain the true parameter. It does not give the probability that this particular interval contains the true value.
Interval width is controlled by sample size, population variance, and the chosen confidence level. To halve the width, quadruple the sample size; to increase the confidence level, accept a wider interval for the same data.
For rare features in small databases, the normal approximation interval may be too optimistic; exact methods such as Clopper-Pearson should be used when n times p-hat is below about 10.
A Bayesian credible interval does carry a direct probability statement about the parameter, but it requires a prior distribution. Treating a frequentist confidence interval as if it were a credible interval is a documented error in forensic reports and court testimony.
Honest court reporting names the database size, the point estimate, the interval and its confidence level, and the method used to compute it; omitting the interval conceals estimation uncertainty and has been challenged under admissibility standards in the US, UK, India, and the EU.

What does a 95% confidence interval actually mean?

A 95% confidence interval means that if the same sampling procedure were repeated many times, 95% of the intervals constructed would contain the true population parameter. It does not mean there is a 95% probability that this particular interval contains the true value. Once calculated, the interval either contains the true value or it does not.

How does sample size affect the width of a confidence interval?

Larger samples produce narrower intervals because the standard error of the estimate decreases as sample size grows. Specifically, the standard error is proportional to 1 divided by the square root of n. To halve the interval width, you need to quadruple the sample size. This is why large reference databases are important in forensic statistics.

Can a confidence interval be used as a probability statement about the true value?

No. A confidence interval is a statement about the long-run behaviour of the interval construction procedure, not a probability statement about the true value. The true parameter is a fixed (if unknown) quantity, not a random variable. Using Bayesian credible intervals, which do carry probability statements about the parameter, requires a prior distribution and is a separate framework.

What is the difference between a confidence interval and a Bayesian credible interval?

A confidence interval is a frequentist construct expressing long-run coverage: 95% of intervals from repeated sampling will contain the true value. A Bayesian credible interval expresses posterior probability: given the data and a prior, there is a 95% posterior probability that the parameter lies in the interval. The two can be numerically similar, especially under a flat prior, but they rest on different philosophical foundations and require different inputs.

Why do forensic scientists need to understand confidence intervals?

Forensic scientists use sample-based estimates in many contexts, including frequency databases for DNA, fingerprint, or soil analysis, and validation studies for analytical methods. A point estimate without an interval gives a false sense of precision. Courts and opposing experts can challenge conclusions that ignore sampling uncertainty, and an honest confidence interval communicates the limits of what the data can support.
