Measurement Uncertainty and Proficiency Testing

Q: What is measurement uncertainty and why does it matter in forensic science?

Measurement uncertainty is a parameter that characterises the spread of values that could reasonably be attributed to a measured quantity. In forensic science it matters because every numerical result, whether a blood alcohol concentration, a drug purity figure, or a trace element ratio, carries inherent doubt arising from instrument noise, calibration limits, and sampling variation. Reporting a result without its uncertainty misrepresents the precision of the evidence and can mislead a court about the strength of the finding.

Q: How is uncertainty propagated through a forensic calculation?

When a result depends on several measured inputs, the uncertainty in each input contributes to the uncertainty in the final result. For a function of independent inputs, the combined standard uncertainty is the square root of the sum of the squared contributions, where each contribution is the partial derivative of the function with respect to an input multiplied by that input's standard uncertainty. In practice, forensic laboratories use sensitivity coefficients derived from the measurement equation to propagate uncertainty through steps such as dilution factors, calibration curve fitting, and density corrections.

Q: What is proficiency testing and how does it relate to declared uncertainty?

Proficiency testing (PT) is a scheme in which an external organiser distributes samples of known or consensus composition to participating laboratories, which analyse them without knowing the reference value. Each laboratory's result is compared to the assigned value and scored, most commonly using the z-score, which expresses the deviation from the assigned value in units of the standard deviation for proficiency assessment. If a laboratory's declared uncertainty is realistic, the great majority of its PT z-scores should fall within plus or minus two. Systematic bias is exposed when results consistently miss the target in the same direction, and unrealistic uncertainty claims are exposed when the observed deviations are persistently much larger or much smaller than the declared uncertainty would predict.

Q: How should uncertainty be incorporated into evaluative opinions and court reports?

An evaluative opinion should use the uncertainty interval to bound the range of values consistent with the measured result. For a substance concentration that determines guilt or innocence relative to a legal threshold, the uncertainty interval must be explicitly considered: if the threshold falls within the interval, the opinion should acknowledge that the measurement alone cannot resolve whether the threshold was crossed. Court reports in accredited laboratories are expected to state the expanded uncertainty at a specified coverage probability, typically 95 percent, alongside the reported value.

Measurement uncertainty quantifies the doubt attached to every forensic measurement, and the GUM framework provides the standard method for expressing and propagating that doubt through laboratory calculations. Proficiency testing offers an independent, external check on whether a laboratory's declared uncertainty is realistic and whether its reported results are fit for use in court.

Last updated: 24 Jun 2026

Measurement uncertainty is a quantitative statement of the doubt attached to a forensic measurement result. Every number reported by a laboratory, from blood alcohol concentration to drug purity to trace-element ratios, is subject to variation from instrument noise, calibration errors, sampling, and operator effects. The GUM (Guide to the Expression of Uncertainty in Measurement), produced by eight international metrology and standardisation bodies, provides the internationally accepted method for identifying, combining, and reporting these sources of doubt. A result reported without uncertainty is incomplete: courts and forensic evaluators cannot properly interpret what the number means without knowing the range of values consistent with it.

Uncertainty propagation is the process of carrying input uncertainties through the mathematics of a measurement procedure. When blood alcohol is calculated from a breath-test reading corrected for partition ratio and temperature, each of those correction factors carries its own uncertainty, and these combine in the final result. The GUM specifies how to propagate Type A uncertainties (estimated from repeated observations) and Type B uncertainties (estimated from calibration certificates, published data, or scientific judgement) into a single combined standard uncertainty, then expand it to cover a stated probability, usually 95 percent.

Proficiency testing (PT) provides the external check. An independent organiser distributes blind samples to participating laboratories, compares each result against the assigned reference value, and scores performance using the z-score or equivalent statistic. If a laboratory's declared uncertainty budget is realistic, its PT results should land within acceptable limits the great majority of the time. Patterns of bias or inflated uncertainty claims are exposed through PT, and accreditation bodies in the UK, US, EU, India, and elsewhere require regular satisfactory PT participation as a condition of accreditation under ISO/IEC 17025.

By the end of this topic you will be able to:

Distinguish Type A and Type B uncertainty evaluation and explain how they are combined under the GUM framework.
Apply the law of propagation of uncertainty to a two-input forensic measurement and calculate the combined standard uncertainty.
Calculate an expanded uncertainty at the 95 percent coverage level using the coverage factor k = 2.
Interpret a proficiency testing z-score and explain what satisfactory and unsatisfactory performance indicate about a laboratory's uncertainty budget.
Describe how declared uncertainty should be incorporated into an evaluative opinion and disclosed in a court report when a measured value is close to a legal threshold.

Key terms

GUM: Guide to the Expression of Uncertainty in Measurement. The international standard, maintained by eight metrology and standardisation bodies, that defines Type A and Type B uncertainty evaluation, uncertainty propagation, and expanded uncertainty. ISO/IEC 17025 requires laboratories to implement GUM-consistent uncertainty procedures.
Type A uncertainty: Uncertainty evaluated by statistical analysis of a series of observations. The standard uncertainty is the standard deviation of the mean (standard error) of the repeated measurements. Requires a sufficient number of replicates to be reliable.
Type B uncertainty: Uncertainty evaluated by means other than statistical analysis of repeated measurements. Sources include calibration certificates, instrument specifications, reference data, and expert scientific judgement. Assigned a probability distribution (rectangular, normal, or other) and converted to a standard deviation equivalent.
Combined standard uncertainty: The overall standard uncertainty of a result, obtained by combining all Type A and Type B component uncertainties in quadrature (square root of the sum of squared components weighted by sensitivity coefficients). Denoted u_c.
Expanded uncertainty: The combined standard uncertainty multiplied by a coverage factor k, chosen to achieve a stated coverage probability. For a normal distribution, k = 2 gives approximately 95 percent coverage. Reported as U = k × u_c alongside the measurement result.
z-score (proficiency testing): A performance statistic in PT defined as z = (x - X) / s_PT, where x is the laboratory's result, X is the assigned value, and s_PT is the standard deviation for proficiency assessment. |z| ≤ 2 is satisfactory, 2 < |z| < 3 is a warning, and |z| ≥ 3 is unsatisfactory under ISO 13528.

The GUM framework: sources and types of uncertainty

The GUM distinguishes between the error in a measurement (the difference between the measured value and the true value, which is never exactly known) and the uncertainty (a quantified interval that characterises the spread of values that could reasonably be attributed to the measurand). This distinction matters in court: a laboratory does not claim to know the true value, it claims that the true value lies within the uncertainty interval with a stated probability.

Type A evaluation applies when repeated measurements of the same quantity under the same conditions are available. The standard uncertainty is the standard deviation of the mean: u_A = s / sqrt(n), where s is the standard deviation of the individual observations and n is the number of replicates. Blood alcohol analysis, drug weighing, and DNA quantification all generate replicate data suitable for Type A evaluation. The reliability of u_A improves as n increases, and laboratories should document the minimum n required before a Type A estimate is considered stable.
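The Type A calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than a laboratory procedure: the function name and the replicate readings are hypothetical and were invented for the example.

```python
import math
import statistics


def type_a_standard_uncertainty(observations):
    """Return (mean, s, u_A) for replicate observations of one quantity.

    s is the sample standard deviation of the individual observations and
    u_A = s / sqrt(n) is the standard deviation of the mean.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("at least two replicates are needed")
    mean = statistics.mean(observations)
    s = statistics.stdev(observations)  # uses the n - 1 denominator
    u_a = s / math.sqrt(n)
    return mean, s, u_a


# Hypothetical replicate readings in g/L, invented for illustration only.
replicates = [0.802, 0.797, 0.805, 0.799, 0.801]
mean, s, u_a = type_a_standard_uncertainty(replicates)
print(f"mean = {mean:.4f} g/L, s = {s:.4f} g/L, u_A = {u_a:.4f} g/L")
```

Because u_A shrinks only with the square root of n, quadrupling the number of replicates halves the Type A standard uncertainty.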

Type B evaluation covers all sources that cannot be assessed by direct statistical analysis of repeated measurements. A calibration certificate for a reference standard gives an expanded uncertainty U_cert and a coverage factor k; the standard uncertainty is U_cert / k. A balance specification quoting a maximum permissible error of plus or minus 0.1 mg is treated as a rectangular distribution with half-width a = 0.1 mg and standard uncertainty a / sqrt(3). The assignment of a probability distribution to a Type B source is a professional judgement, and laboratories must document the reasoning.
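The two Type B conversions described above can be written as small helper functions. The sketch below is illustrative only: the function names are hypothetical, the balance figure is the plus or minus 0.1 mg specification from the text, and the certificate figures (U = 0.008 g/L at k = 2) are assumed for the example.

```python
import math


def standard_uncertainty_from_certificate(expanded_u, k):
    """Convert a certificate's expanded uncertainty to a standard uncertainty."""
    return expanded_u / k


def standard_uncertainty_rectangular(half_width):
    """Standard uncertainty of a rectangular distribution with half-width a."""
    return half_width / math.sqrt(3)


# Assumed certificate values for illustration: U = 0.008 g/L at k = 2.
u_cert = standard_uncertainty_from_certificate(0.008, 2)

# Balance maximum permissible error of +/- 0.1 mg, treated as rectangular.
u_balance = standard_uncertainty_rectangular(0.1)

print(f"u(certificate) = {u_cert:.4f} g/L")
print(f"u(balance)     = {u_balance:.4f} mg")
```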

Propagating uncertainty through forensic calculations

Most forensic measurements are not direct: the reported result is a function of several inputs, each with its own uncertainty. The GUM law of propagation of uncertainty states that for a result y = f(x_1, x_2, ..., x_n) where the inputs are independent, the combined standard uncertainty is:

u_c(y) = sqrt[ sum_i (df/dx_i)^2 * u(x_i)^2 ]

The partial derivative df/dx_i is called the sensitivity coefficient for input x_i. It converts the uncertainty in that input into its contribution to the output uncertainty. For a product y = x_1 * x_2, the relative combined uncertainty is sqrt[ (u_r(x_1))^2 + (u_r(x_2))^2 ], where u_r denotes relative (fractional) standard uncertainty. This additive-in-quadrature rule means that a large uncertainty in one input dominates if the others are small, and doubling precision on minor contributors has little effect.
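As a rough sketch of the propagation law for independent inputs, the function below estimates each sensitivity coefficient numerically by central differences and sums the squared contributions. The numerical-derivative approach, the function names, and the input values are assumptions made for illustration; a laboratory would more usually derive the coefficients analytically from its measurement equation.

```python
import math


def combined_standard_uncertainty(f, values, uncertainties, rel_step=1e-6):
    """Apply the law of propagation of uncertainty for independent inputs.

    Sensitivity coefficients df/dx_i are approximated by central differences.
    """
    total = 0.0
    for i, (x, u) in enumerate(zip(values, uncertainties)):
        h = abs(x) * rel_step or rel_step  # fall back to rel_step if x == 0
        up = list(values)
        down = list(values)
        up[i] = x + h
        down[i] = x - h
        c_i = (f(*up) - f(*down)) / (2 * h)  # sensitivity coefficient
        total += (c_i * u) ** 2
    return math.sqrt(total)


def product(x1, x2):
    return x1 * x2


# Hypothetical inputs: x1 = 2.0 with u = 0.02 (1%), x2 = 5.0 with u = 0.10 (2%).
values = [2.0, 5.0]
uncertainties = [0.02, 0.10]

u_c = combined_standard_uncertainty(product, values, uncertainties)
rel_from_propagation = u_c / product(*values)

# Shortcut for a product: relative uncertainties add in quadrature.
rel_from_quadrature = math.hypot(0.02 / 2.0, 0.10 / 5.0)

print(f"relative u_c (propagation) = {rel_from_propagation:.5f}")
print(f"relative u_c (quadrature)  = {rel_from_quadrature:.5f}")
```

Both routes give the same relative uncertainty for the product, and the 2 percent input contributes four times as much variance as the 1 percent input, which is why the larger component dominates.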

Measurement step	Uncertainty source	Evaluation type	Typical contribution
Reference standard concentration	Calibration certificate	Type B (normal)	0.2 to 0.5%
Weighing of sample	Balance resolution + linearity	Type B (rectangular)	0.05 to 0.2%
Volume of solvent	Volumetric flask tolerance	Type B (rectangular)	0.1 to 0.3%
Instrument signal	Repeatability of measurement	Type A	0.3 to 1.0%
Calibration curve fit	Residuals around regression line	Type A or B	0.2 to 0.8%

When inputs are correlated, cross-product terms must be added. Correlation arises when the same reference standard is used to calibrate a series of measurements, or when the same balance is used to weigh both sample and tare. In practice, forensic procedures are often designed to minimise correlation, and many laboratories assume independence as an approximation, but the approximation should be documented and its validity confirmed.
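The effect of the cross-product terms can be seen in a small sketch. It assumes the general form of the propagation law in which each pair of inputs contributes c_i * c_j * u_i * u_j * r_ij, with r_ij the correlation coefficient; the function name, the weighing figures, and the correlation value of 0.8 are hypothetical.

```python
import math


def combined_uncertainty_correlated(coeffs, uncertainties, corr):
    """Combined standard uncertainty including correlation terms.

    corr is a symmetric matrix of correlation coefficients with 1.0 on the
    diagonal, so the double sum covers both the variance and cross terms.
    """
    n = len(coeffs)
    variance = 0.0
    for i in range(n):
        for j in range(n):
            variance += (coeffs[i] * coeffs[j]
                         * uncertainties[i] * uncertainties[j] * corr[i][j])
    return math.sqrt(max(variance, 0.0))  # guard against rounding below zero


# Net mass = gross - tare, so the sensitivity coefficients are +1 and -1.
coeffs = [1.0, -1.0]
uncertainties = [0.05, 0.05]  # mg, hypothetical values

independent = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.8], [0.8, 1.0]]  # assumed correlation, same balance

u_indep = combined_uncertainty_correlated(coeffs, uncertainties, independent)
u_corr = combined_uncertainty_correlated(coeffs, uncertainties, correlated)

print(f"assuming independence: u_c = {u_indep:.4f} mg")
print(f"with r = 0.8:          u_c = {u_corr:.4f} mg")
```

In this difference measurement a positive correlation reduces the combined uncertainty; for a sum it would increase it, which is why the independence assumption needs to be checked rather than taken for granted.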

See Confidence Intervals and What a Sample Supports for a complementary treatment of how interval estimates are interpreted in evidential contexts.

Expanded uncertainty and the coverage factor

The combined standard uncertainty u_c is a one-sigma equivalent. For reporting and court purposes, laboratories report the expanded uncertainty U = k * u_c, where k is chosen to achieve a desired coverage probability. For a normal (Gaussian) output distribution, k = 2 covers approximately 95.45 percent and k = 2.576 covers 99 percent. Most accredited forensic laboratories report at k = 2 unless a higher level is required by procedure or statute.
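The link between coverage factor and coverage probability for a normal output distribution can be checked with Python's standard library. This is a minimal sketch; the function names and the example u_c value are hypothetical, and the normality of the output distribution is assumed.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def coverage_probability(k):
    """Two-sided coverage probability of +/- k standard uncertainties."""
    return 2 * _STANDARD_NORMAL.cdf(k) - 1


def coverage_factor(probability):
    """Coverage factor k giving the requested two-sided coverage."""
    return _STANDARD_NORMAL.inv_cdf((1 + probability) / 2)


def expanded_uncertainty(u_c, k=2.0):
    """Expanded uncertainty U = k * u_c."""
    return k * u_c


print(f"k = 2 covers {coverage_probability(2):.4%}")
print(f"99% coverage needs k = {coverage_factor(0.99):.3f}")
print(f"U for u_c = 0.005 at k = 2: {expanded_uncertainty(0.005):.3f}")
```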

The coverage probability is only meaningful if the output distribution is well characterised. When the combined uncertainty has few degrees of freedom (because Type A components come from small replicate sets), the effective degrees of freedom should be estimated using the Welch-Satterthwaite formula, and the coverage factor taken from the t-distribution at the corresponding degrees of freedom rather than from the normal distribution. Laboratories with n = 3 replicates face substantially larger coverage factors than those with n = 20 or more.
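A sketch of the Welch-Satterthwaite estimate follows. It takes each component's contribution to the output (sensitivity coefficient times standard uncertainty) together with its degrees of freedom, and treats Type B components as having infinite degrees of freedom, which is a common simplifying assumption rather than a rule. The function name and the component values are hypothetical.

```python
import math


def effective_degrees_of_freedom(contributions, dofs):
    """Welch-Satterthwaite: nu_eff = u_c^4 / sum(u_i^4 / nu_i).

    contributions are the component uncertainties already expressed in the
    units of the output; dofs may contain math.inf for Type B components.
    """
    u_c_squared = sum(u ** 2 for u in contributions)
    denominator = sum(u ** 4 / nu for u, nu in zip(contributions, dofs))
    if denominator == 0.0:
        return math.inf  # every component has infinite degrees of freedom
    return u_c_squared ** 2 / denominator


# Hypothetical budget: a dominant Type A component from n = 3 replicates
# (nu = 2) and a smaller Type B component (nu treated as infinite).
nu_small_n = effective_degrees_of_freedom([0.5, 0.2], [2, math.inf])

# The same Type A component estimated from n = 20 replicates (nu = 19).
nu_large_n = effective_degrees_of_freedom([0.5, 0.2], [19, math.inf])

print(f"n = 3:  nu_eff = {nu_small_n:.2f}")
print(f"n = 20: nu_eff = {nu_large_n:.2f}")
```

With fewer than three effective degrees of freedom, the coverage factor read from a t-distribution table for about 95 percent coverage is well above 2, which is the practical cost of relying on very small replicate sets.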

Different jurisdictions handle the disclosure obligation differently. Under the UK Forensic Science Regulator's Code of Practice, accredited laboratories must report expanded uncertainty at 95 percent and state the coverage factor. US federal guidelines for forensic DNA laboratories (FBI Quality Assurance Standards) require uncertainty documentation at the method-validation stage, with results expressed consistently with validated procedures. Under the Bharatiya Sakshya Adhiniyam 2023, expert opinion evidence carries the obligation of the expert to state the basis of the opinion, which includes the precision of any measurement on which the opinion rests.

Proficiency testing: design and scoring

Proficiency testing schemes are organised by external bodies: UKAS-approved providers in the UK, A2LA-affiliated providers in the US, NABL-affiliated providers in India, and bodies operating under ILAC mutual recognition agreements across the EU. An organiser prepares or sources materials of known or consensus composition, distributes them blind to participating laboratories, collects results, calculates the assigned value and the standard deviation for proficiency assessment (s_PT), and scores each result.

The z-score is the standard scoring statistic defined in ISO 13528: z = (x - X) / s_PT. The assigned value X may be the certified value of a reference material, the consensus median of participants, or a value obtained from a reference laboratory. The s_PT is typically set using a fitness-for-purpose approach: it reflects the standard deviation that would be tolerable given the end use of the measurement, not simply the average performance of participants. Setting s_PT too wide allows systematic errors to pass undetected; setting it too narrow fails well-performing laboratories.

ISO 13528 defines performance categories: |z| ≤ 2 is satisfactory, 2 < |z| < 3 gives a warning signal, and |z| ≥ 3 is unsatisfactory and gives an action signal. Two consecutive unsatisfactory results in the same analyte or method typically trigger a mandatory investigation under most accreditation body rules. Some schemes also compute the zeta-score (ζ), which replaces s_PT with the laboratory's own stated standard uncertainty combined with the standard uncertainty of the assigned value, allowing the score to reflect the realism of the laboratory's uncertainty claim.
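The scoring rules can be sketched as follows. The z-score follows the definition given in this topic. The zeta-score is written here in the form usually attributed to ISO 13528, which is an assumption of this sketch: the deviation divided by the laboratory's standard uncertainty and the standard uncertainty of the assigned value combined in quadrature. The function names and all numerical values are hypothetical.

```python
import math


def z_score(x, assigned, s_pt):
    """z = (x - X) / s_PT."""
    return (x - assigned) / s_pt


def zeta_score(x, assigned, u_lab, u_assigned):
    """Assumed form: zeta = (x - X) / sqrt(u_lab^2 + u_assigned^2)."""
    return (x - assigned) / math.sqrt(u_lab ** 2 + u_assigned ** 2)


def classify(score):
    """Classify a score: <= 2 satisfactory, 2 to 3 warning, >= 3 unsatisfactory."""
    magnitude = abs(score)
    if magnitude <= 2:
        return "satisfactory"
    if magnitude < 3:
        return "warning"
    return "unsatisfactory"


# Hypothetical PT round: result 0.83 g/L against an assigned value of 0.80 g/L.
z = z_score(0.83, 0.80, s_pt=0.02)
zeta = zeta_score(0.83, 0.80, u_lab=0.005, u_assigned=0.004)

print(f"z    = {z:.2f} ({classify(z)})")
print(f"zeta = {zeta:.2f} ({classify(zeta)})")
```

In this invented case the result is acceptable against the scheme's fitness-for-purpose criterion, yet the deviation is far larger than the laboratory's own claimed uncertainty can explain, which would suggest that the uncertainty budget is understated.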

Uncertainty in evaluative opinions and court reporting

An evaluative opinion translates a measurement result into an inference about a case proposition. The uncertainty of the measurement is not separate from that inference: it directly affects the likelihood ratio or probability assigned to competing propositions. For a finding such as 'the blood alcohol concentration was 0.095 g/100 mL', where the legal limit is 0.08 g/100 mL, an expanded uncertainty of 0.010 g/100 mL means the interval [0.085, 0.105] g/100 mL is consistent with the data at 95 percent probability.
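The comparison between an uncertainty interval and a legal limit can be expressed as a short check. This sketch uses the figures from the paragraph above for the first case and an invented second case; the function name and the wording of the outcomes are hypothetical and are not reporting language.

```python
def threshold_assessment(value, expanded_u, limit):
    """Return (lower, upper, outcome) for a result compared with a legal limit."""
    lower = value - expanded_u
    upper = value + expanded_u
    if lower > limit:
        outcome = "interval lies entirely above the limit"
    elif upper < limit:
        outcome = "interval lies entirely below the limit"
    else:
        outcome = "limit falls within the interval"
    return lower, upper, outcome


# Case from the text: 0.095 g/100 mL, U = 0.010 g/100 mL, limit 0.08 g/100 mL.
clear_case = threshold_assessment(0.095, 0.010, 0.08)

# Invented borderline case: 0.085 g/100 mL with the same U and limit.
borderline_case = threshold_assessment(0.085, 0.010, 0.08)

for lower, upper, outcome in (clear_case, borderline_case):
    print(f"[{lower:.3f}, {upper:.3f}] g/100 mL: {outcome}")
```

In the borderline case the measurement alone cannot resolve whether the limit was exceeded, and the report would need to say so.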

Some jurisdictions have adopted a worst-case convention for threshold enforcement: the reported value is the measured value minus the expanded uncertainty, so the laboratory asserts only what the data can support at the lower confidence bound. The UK Road Traffic Act enforcement practice applies this convention for evidential breath testing. Other jurisdictions report the central value and require the court to apply the uncertainty interval when drawing inferences. Neither approach changes the underlying science; the difference is where the risk of error is allocated between prosecution and defence.

The Role of Statistics in Evidence Evaluation topic covers the broader framework of how numerical evidence feeds into the court's decision. At the reporting stage, the practical requirements are: state the result, state the expanded uncertainty, state the coverage factor and probability, and explain in plain language what the interval means for the specific question the court is deciding.

Accreditation under ISO/IEC 17025 requires documented uncertainty procedures, PT participation, and uncertainty disclosure in test reports. The ILAC Policy on Measurement Uncertainty in Calibration (ILAC P14) and the EURACHEM/CITAC guide 'Quantifying Uncertainty in Analytical Measurement' (CG4) provide additional implementation guidance used in forensic laboratories across Europe, the UK, India, and countries that have adopted ISO-aligned accreditation frameworks.

Proficiency testing in accreditation and legal proceedings

National accreditation bodies mandate PT participation as a condition of ISO/IEC 17025 accreditation. In the UK, UKAS requires forensic laboratories to participate in appropriate PT schemes at a frequency commensurate with the test volume and risk level. In the US, the FBI's Quality Assurance Standards for forensic DNA testing require PT twice annually, with results reviewed by technical reviewers and reported to the laboratory director. In India, NABL (National Accreditation Board for Testing and Calibration Laboratories) requires PT participation under its accreditation criteria aligned to ISO/IEC 17043.

PT results can be disclosed in legal proceedings. Defence teams in several jurisdictions have successfully obtained PT records through discovery procedures and used patterns of unsatisfactory performance to challenge the reliability of a laboratory's results in a specific case. Courts in the US (under Daubert and related standards), the UK (under Criminal Procedure Rules), and the EU (under national procedural laws implementing Directive 2016/343) have addressed the admissibility and weight of PT evidence as part of reliability challenges to forensic evidence.

A single unsatisfactory PT result does not automatically invalidate case results produced during the same period, but it triggers an obligation to investigate whether the PT failure represents a systemic problem affecting case work, and to disclose that investigation to the court if it does. Laboratories should have documented procedures specifying the steps taken when PT performance is unsatisfactory, including retrospective review of affected case results, root cause analysis, and corrective action before the next testing round.

Worked example

Calculating combined uncertainty for a blood alcohol determination

A forensic toxicologist reports a blood ethanol concentration from a headspace GC-FID method. The calculation involves three contributing uncertainty components: the calibration standard concentration, the instrument repeatability, and the dilution factor. This example traces the GUM propagation steps.

The measurement equation is C_sample = (A_sample / A_cal) * C_cal * D, where A denotes the peak area ratio to internal standard, C_cal is the certified concentration of the calibration standard, and D is the dilution factor applied to the blood sample. Three independent uncertainty components are identified.

Calibration standard uncertainty (Type B). The certificate states C_cal = 1.000 g/L with U = 0.008 g/L at k = 2. Standard uncertainty: u(C_cal) = 0.008 / 2 = 0.004 g/L. Relative standard uncertainty: u_r(C_cal) = 0.004 / 1.000 = 0.40%.
Instrument repeatability (Type A). Ten replicate analyses of a 0.80 g/L working standard give a mean area ratio of 0.8012 with a standard deviation of 0.0045. Standard uncertainty of the mean: u_A = 0.0045 / sqrt(10) = 0.00142. Relative: u_r(instrument) = 0.00142 / 0.8012 = 0.18%.
Dilution factor (Type B). The blood is diluted 1:10 using a calibrated pipette with stated tolerance 0.5% (rectangular distribution). Standard uncertainty: u_r(D) = 0.005 / sqrt(3) = 0.29%.
Combined relative standard uncertainty. u_r(C_sample) = sqrt[ (0.40%)^2 + (0.18%)^2 + (0.29%)^2 ] = sqrt[ 0.160 + 0.032 + 0.084 ] % = sqrt(0.276) % = 0.525%. For a result of 0.095 g/100 mL (equivalent to 0.95 g/L), the combined standard uncertainty is 0.095 * 0.00525 = 0.000499 g/100 mL.
Expanded uncertainty at k = 2. U = 2 * 0.000499 = 0.001 g/100 mL (rounded to 1 significant figure). The reported result is 0.095 g/100 mL with an expanded uncertainty of 0.001 g/100 mL at a coverage probability of approximately 95 percent. The interval [0.094, 0.096] g/100 mL is consistent with the data, and the entire interval lies above the 0.08 g/100 mL legal limit. In this case the uncertainty does not affect the threshold inference, but the disclosure is still mandatory.
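The arithmetic of the worked example can be reproduced with a short script. It uses only the figures given in the steps above; the variable names are hypothetical. Because the script carries unrounded intermediate values, the combined relative uncertainty comes out at about 0.524 percent rather than the 0.525 percent obtained from the rounded components, and the expanded uncertainty still rounds to 0.001 g/100 mL.

```python
import math

# Calibration standard (Type B): U = 0.008 g/L at k = 2 on 1.000 g/L.
u_r_cal = (0.008 / 2) / 1.000

# Instrument repeatability (Type A): s = 0.0045 over 10 replicates, mean 0.8012.
u_r_rep = (0.0045 / math.sqrt(10)) / 0.8012

# Dilution factor (Type B): 0.5% tolerance, rectangular distribution.
u_r_dil = 0.005 / math.sqrt(3)

# Combine the relative standard uncertainties in quadrature.
u_r_combined = math.sqrt(u_r_cal ** 2 + u_r_rep ** 2 + u_r_dil ** 2)

result = 0.095            # g/100 mL
legal_limit = 0.08        # g/100 mL
u_c = result * u_r_combined
expanded_u = 2 * u_c      # coverage factor k = 2

print(f"u_r(C_sample) = {u_r_combined:.3%}")
print(f"u_c = {u_c:.6f} g/100 mL")
print(f"U   = {expanded_u:.3f} g/100 mL (k = 2)")
print(f"interval above limit: {result - expanded_u > legal_limit}")
```

The calibration standard accounts for well over half of the combined variance, so a better-certified standard would be the most effective single improvement to this budget.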

Check your understanding

A balance specification states a maximum permissible error of plus or minus 0.2 mg. When this is treated as a Type B uncertainty with a rectangular distribution, what is the standard uncertainty?

Key Takeaways

The GUM framework classifies uncertainty components as Type A (estimated from repeated observations) or Type B (estimated from calibration certificates, specifications, or expert judgement), combines them in quadrature using sensitivity coefficients, and expands the result with a coverage factor to achieve a stated probability, typically 95 percent.
Uncertainty propagation through a multi-step forensic calculation requires identifying all contributing inputs, assigning standard uncertainties to each, applying the partial-derivative sensitivity coefficients from the measurement equation, and summing the weighted squared contributions before taking the square root.
When a measurement result's expanded uncertainty interval spans a legal threshold, the laboratory cannot assert on which side the true value lies; court reports must explicitly state this, and the applicable jurisdictional convention (worst-case reporting or central-value with interval) must be followed.
Proficiency testing z-scores (|z| ≤ 2 satisfactory, |z| ≥ 3 unsatisfactory under ISO 13528) provide an independent external check on whether a laboratory's declared uncertainty budget is realistic; systematic bias and inflated or understated uncertainty claims are both detectable through PT performance patterns.
ISO/IEC 17025 requires that forensic test reports state the measurement uncertainty when it is relevant to the interpretation of results; PT records can be disclosed in legal proceedings and used to challenge the reliability of a laboratory's case-work results under the admissibility standards applicable in the UK, US, EU, and India.

What is the GUM framework?

The GUM (Guide to the Expression of Uncertainty in Measurement), published jointly by BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML, is the internationally accepted standard for quantifying and reporting measurement uncertainty. It defines two types of uncertainty evaluation: Type A, based on statistical analysis of repeated measurements, and Type B, based on scientific judgement using calibration certificates, published data, or instrument specifications. The combined standard uncertainty is obtained by combining all components in quadrature.
