Method Validation, Measurement Uncertainty and Proficiency Testing

Method validation, measurement uncertainty and proficiency testing are the three pillars that decide whether a forensic instrumental result holds up at trial or falls apart on cross-examination. Validation is the documented evidence that the method does what the laboratory says it does. Measurement uncertainty is the honest statement of how wide the confidence interval really is. Proficiency testing is the external check that the laboratory's number agrees with what other accredited labs get on the same blind sample. Take any one away and the chemical examiner's certificate under Bharatiya Sakshya Adhiniyam Section 63 is open to challenge.

This page covers the validation parameters codified by ISO 17025:2017 and NABL Document 141, the GUM-aligned uncertainty budget from JCGM 100:2008, the Shewhart control charts that catch instrument drift, and the PT schemes Indian laboratories participate in. The defence bar in trial courts has caught up faster than most analysts assume. "What is the measurement uncertainty on this number?" is now routine in NDPS and Section 63 cross-examinations, and "we do not calculate it" is no longer an acceptable answer.

Key terms

Method validation: Documented exercise that proves a method is fit for purpose for a defined matrix, analyte and concentration range. Required by ISO 17025:2017 clause 7.2.2 and NABL Document 141, inspected on every accreditation visit.
Measurement uncertainty (MU): A parameter characterising the dispersion of values that could reasonably be attributed to the measurand. Evaluated following the GUM (JCGM 100:2008) and reported as an expanded uncertainty U at a stated coverage factor (usually k=2), corresponding to roughly 95 percent confidence.
Accuracy and precision: Accuracy is closeness to the true value, measured as recovery from a spiked CRM against an acceptance window of 80 to 120 percent. Precision is the spread of replicates expressed as RSD: intra-day typically 1 to 5 percent for a well-tuned LC-MS/MS, inter-day 2 to 10 percent.
LOD and LOQ: Limit of detection is three times the SD of the blank response, or an S/N of 3:1. Limit of quantitation is ten times the SD of the blank response, or an S/N of 10:1. Below LOD, report 'not detected'. Between LOD and LOQ, report 'detected, below LOQ' rather than a number.
Proficiency testing (PT): Inter-laboratory comparison scored as a z-score: |z| below 2 satisfactory, 2 to 3 questionable, above 3 unsatisfactory. ISO 17025 clause 7.7.2 requires regular participation.
Control chart: Time-series plot of a QC measurement with warning limits at ±2σ and action limits at ±3σ from validation data. Patterns such as seven consecutive points on one side trigger investigation before drift reaches case work.

Method validation: what ISO 17025 and NABL 141 demand

A validation file is a defensive document, written for the day a defence counsel asks for it.

ISO 17025:2017 clause 7.2.2 makes validation mandatory for any non-standard or laboratory-developed method, which covers almost every forensic toxicology and chemistry method at an Indian SFSL or CFSL. NABL assessors inspect the file against Document 141 on every accreditation visit. The Bharatiya Sakshya Adhiniyam does not name validation in Section 63, but the courts read into the section a requirement that the method be one a competent expert would accept as reliable, and the validation file is how a laboratory documents that. The parameter list is consistent across ICH Q2(R2), the FDA bioanalytical method validation guidance together with ICH M10 (2022), AOAC, SOFT and SAMHSA, and the WADA International Standard for Laboratories.

Accuracy is percentage recovery from a spiked CRM, judged against an acceptance window of 80 to 120 percent for forensic toxicology. Precision splits into repeatability (intra-day, RSD below 5 percent), intermediate precision (inter-day with different operators, 5 to 15 percent) and reproducibility (inter-laboratory, what PT tests). Linearity needs R² above 0.995 across five to seven calibrators in triplicate, with back-calculated concentrations within ±15 percent of nominal (±20 percent at LOQ). A high R² alone is not enough; an over-fitted polynomial can show R² of 0.999 and still fall apart at the extremes.
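The back-calculation check described above can be sketched in a few lines. This is a minimal illustration, not a validated routine: the calibrator responses are invented, the function names are hypothetical, and the lowest calibrator is assumed to sit at the LOQ so that it gets the wider ±20 percent limit.

```python
def fit_line(x, y):
    """Ordinary least-squares line; returns slope, intercept and R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


def back_calc_check(nominal, response, tol=15.0, tol_loq=20.0):
    """Back-calculate every calibrator and test it against its window."""
    slope, intercept, r2 = fit_line(nominal, response)
    lowest = min(nominal)  # assumption: lowest calibrator is the LOQ level
    rows = []
    for conc, resp in zip(nominal, response):
        back = (resp - intercept) / slope
        deviation = 100.0 * (back - conc) / conc
        limit = tol_loq if conc == lowest else tol
        rows.append((conc, back, deviation, abs(deviation) <= limit))
    return r2, rows


# Illustrative data only: the lowest calibrator reads high.
nominal = [10, 25, 50, 100, 200]
response = [30, 51, 101, 201, 401]
r2, rows = back_calc_check(nominal, response)

print(f"R^2 = {r2:.4f}")
for conc, back, deviation, ok in rows:
    print(f"{conc:>5} -> {back:7.2f} ({deviation:+5.1f} %) {'pass' if ok else 'FAIL'}")
```

With this made-up data the fit clears the R² criterion while the lowest calibrator fails its back-calculation window, which is the situation the R² caveat warns about.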

LOD and LOQ come from statistics, not from eyeballing a chromatogram. The 3σ method runs ten blank-matrix injections, takes the SD of the response, and reports LOD as three times that SD, converted to concentration units through the calibration slope; LOQ is the same calculation with a factor of ten. Specificity is the demonstration that no interfering peak appears at the analyte's retention time and m/z transition in a blank matrix or with structural analogues. Robustness is the demonstration that small intentional variations (pH ±0.1, column temperature ±2 °C, flow ±5 percent) do not shift the result outside the precision window. Carry-over checks that a blank following the highest calibrator gives no peak above 20 percent of LOQ. System suitability is the pre-run check that catches an instrument drifting before the batch begins.
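As a rough sketch of the 3σ and 10σ calculation, the snippet below takes ten blank responses, divides the SD by the calibration slope to get concentration units, and applies the reporting rule for results below LOD or between LOD and LOQ. The blank responses and the slope are invented for illustration, and the function names are hypothetical.

```python
from statistics import stdev


def lod_loq(blank_responses, slope):
    """LOD = 3*SD/slope and LOQ = 10*SD/slope, in concentration units."""
    sd = stdev(blank_responses)  # sample SD of the blank response
    return 3 * sd / slope, 10 * sd / slope


def report(value, lod, loq, unit="mg/100 mL"):
    """Apply the reporting rule: no number is quoted below LOQ."""
    if value < lod:
        return "not detected"
    if value < loq:
        return "detected, below LOQ"
    return f"{value:.1f} {unit}"


# Illustrative values only: ten blank-matrix injections, response units arbitrary.
blanks = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.0]
slope = 0.5  # assumed response per mg/100 mL from the calibration curve

lod, loq = lod_loq(blanks, slope)
for measured in (0.5, 1.0, 5.0):
    print(measured, "->", report(measured, lod, loq))
```

Because both limits come from the same SD, the LOQ is always 10/3 times the LOD under this approach; a file that shows a different ratio has used different data or a different method for the two limits.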

Measurement uncertainty: the GUM approach in practice

Every forensic number has an uncertainty. Calculate it and report it.

The Guide to the Expression of Uncertainty in Measurement (GUM, JCGM 100:2008) is the international consensus on how to compute and report uncertainty. NABL has adopted it through TR-001 and ISO 17025 clause 7.6. The BSA Section 63 certificate is now expected, as best practice, to include the expanded uncertainty alongside the reported value.

GUM splits contributions into Type A (statistical evaluation of repeated measurements, the SD of the mean) and Type B (the calibration certificate of the balance, the CRM's stated uncertainty, the volumetric flask's tolerance, temperature variation, matrix inhomogeneity). Each Type B contribution becomes a standard uncertainty by dividing the tolerance by an appropriate divisor (square root of three for a rectangular distribution, two for a stated 95 percent confidence interval). The combined standard uncertainty u_c is propagated through the measurement equation; for the multiplicative model most forensic methods use, u_c/c is the square root of the sum of (u_i/x_i)² across every contribution. The expanded uncertainty U is u_c multiplied by a coverage factor k, with k=2 giving roughly 95 percent confidence. The convention is "ethanol 110 ± 5 mg/100 mL (k=2)".
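The conversion and combination steps above can be sketched as follows. This is an illustration under stated assumptions: the three contributions (a flask tolerance, a CRM certificate value and a repeatability figure) are invented numbers, and the helper names are hypothetical. Only the divisors and the quadrature sum come from the text.

```python
import math

# Divisors named in the text: sqrt(3) for a rectangular distribution,
# 2 for a value stated as a 95 percent confidence interval.
DIVISORS = {"rectangular": math.sqrt(3), "normal_95": 2.0}


def type_b_standard(half_width, distribution):
    """Convert a stated tolerance or half-interval to a standard uncertainty."""
    return half_width / DIVISORS[distribution]


def combined_relative(relative_uncertainties):
    """Quadrature sum of relative standard uncertainties (multiplicative model)."""
    return math.sqrt(sum(u ** 2 for u in relative_uncertainties))


def expanded(u_c, k=2.0):
    """Expanded uncertainty U = k * u_c."""
    return k * u_c


# Illustrative inputs only, all expressed in percent of the measured value.
u_flask = type_b_standard(0.10, "rectangular")  # flask tolerance of +/-0.10 %
u_crm = type_b_standard(0.60, "normal_95")      # CRM certificate: +/-0.60 % at 95 %
u_repeat = 1.5                                  # Type A: relative SD of the mean

u_c = combined_relative([u_flask, u_crm, u_repeat])
print(f"u_c = {u_c:.2f} %, U (k=2) = {expanded(u_c):.2f} %")
```

In this made-up budget the repeatability term dominates, which is common: squaring before summing means the largest contribution largely sets u_c, so effort spent shrinking small Type B terms rarely changes the reported U.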

Bottom-up, the strict GUM method, identifies every contribution from first principles and computes a full budget; rigorous but laborious. Top-down uses validation data, CRM bias and PT performance to estimate combined uncertainty empirically. NABL accepts top-down for routine forensic work. For a typical LC-MS/MS toxicology method, contributions are repeatability (1 to 3 percent), calibration curve (0.5 to 2 percent), reference standard (0.3 to 1 percent for Sigma, 0.1 to 0.5 percent for IPC), sample inhomogeneity (0.5 to 2 percent for whole blood) and matrix effect on ionisation (1 to 3 percent with a deuterated IS). Combined in quadrature, u_c typically lands at 2 to 4 percent and expanded U at k=2 at 4 to 8 percent.

Control charts and proficiency testing: catching drift and bias

Internal QC catches drift. PT catches bias.

A validated method is not a permanent guarantee. Instruments drift, reference standards age, columns degrade. The Shewhart control chart catches this: a time-series plot of a QC measurement (spiked blank or CRM) against date, with warning limits at ±2σ and action limits at ±3σ from validation data. A point outside the action limits triggers investigation. Trend rules catch drift that single-point limits miss: seven consecutive points on one side of the mean indicate a systematic shift even within warning limits; a steady trend of six or more points often traces to an ageing column or a dimming detector lamp. The Western Electric rules formalise these patterns. The CUSUM chart catches a 0.5σ shift in five to ten measurements where Shewhart needs twenty to thirty. CFSL Chandigarh runs NIST SRM 1577c monthly on the ICP-MS panel and tracks lead, arsenic, mercury and cadmium on individual Shewhart charts.
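A minimal sketch of the limit and run rules described above is given below. It covers only three checks (warning at ±2σ, action at ±3σ, and a run of seven on one side of the mean) and is not a full Western Electric implementation. The QC values, mean and σ are invented, and the function name is hypothetical.

```python
def shewhart_flags(values, mean, sigma, run_length=7):
    """Return (index, flag) pairs for limit breaches and one-sided runs."""
    flags = []
    run_side, run_count = 0, 0
    for i, value in enumerate(values):
        z = (value - mean) / sigma
        if abs(z) > 3:
            flags.append((i, "action"))
        elif abs(z) > 2:
            flags.append((i, "warning"))
        # Track consecutive points on the same side of the mean.
        side = (z > 0) - (z < 0)
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side, run_count = side, (1 if side != 0 else 0)
        # Flag once, at the point where the run reaches the required length.
        if run_count == run_length:
            flags.append((i, "run"))
    return flags


# Illustrative QC series: mean and sigma assumed to come from validation data.
qc = [100.4, 99.1, 101.0, 100.8, 100.9, 101.2, 100.6, 101.1, 100.7, 104.5]
for index, flag in shewhart_flags(qc, mean=100.0, sigma=2.0):
    print(f"point {index}: {flag}")
```

The run rule fires on a series whose individual points all sit well inside the warning limits, which is the kind of drift that single-point limits miss.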

Internal QC catches drift but not bias. A laboratory that consistently quantitates morphine 12 percent low because of an extraction step that strips the analyte will get tight Shewhart charts and a wrong number every time. The only way to catch systematic bias is proficiency testing.

The standard scoring statistic is the z-score: the participant's result minus the assigned value, divided by the target SD. |z| below 2 is satisfactory, 2 to 3 questionable, above 3 unsatisfactory. The target SD comes from the CRM's certified uncertainty, the consensus SD of participants (Algorithm A from ISO 13528, robust to outliers), or a fitness-for-purpose value. The En score incorporates each laboratory's stated MU. ISO 17025 clause 7.7.2 requires regular PT participation; NABL 141 expects at least annual participation for every major analyte class. An unsatisfactory result triggers investigation, root-cause analysis, CAPA, and retest where the scheme allows. Repeated unsatisfactory results without effective CAPA can cost the laboratory the affected scope of its accreditation.
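The scoring arithmetic can be sketched as below. The z-score and its bands follow the text. The En formula shown, the difference divided by the root sum of squares of the two expanded uncertainties with |En| up to 1 treated as satisfactory, is the form commonly given for ISO 13528 style schemes and is included here as an assumption, since the text does not spell it out. All numbers are invented and the function names are hypothetical.

```python
import math


def z_score(result, assigned, target_sd):
    """z = (participant result - assigned value) / target SD."""
    return (result - assigned) / target_sd


def classify_z(z):
    """Bands from the text; exact boundary handling is a scheme choice."""
    magnitude = abs(z)
    if magnitude < 2:
        return "satisfactory"
    if magnitude <= 3:
        return "questionable"
    return "unsatisfactory"


def en_score(result, assigned, U_lab, U_ref):
    """En with expanded uncertainties of the lab and of the assigned value
    (assumed form; check the scheme protocol before relying on it)."""
    return (result - assigned) / math.sqrt(U_lab ** 2 + U_ref ** 2)


# Illustrative PT round, units arbitrary.
z = z_score(result=52.0, assigned=50.0, target_sd=2.5)
print(f"z = {z:.2f} ({classify_z(z)})")
print(f"En = {en_score(52.0, 50.0, U_lab=3.0, U_ref=4.0):.2f}")
```

The practical difference is that z judges the laboratory against the scheme's target SD, while En judges it against its own claimed uncertainty, so an optimistic MU statement can produce a poor En even when z looks comfortable.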

Common pitfalls in Indian validation and PT practice

Five mistakes NABL auditors and defence counsel both look for.

Over-fitted calibration curves are the most common: R² reads 0.9999 but back-calculated concentrations at the lowest and highest calibrators deviate by 25 percent because the analyst fitted a quadratic where a linear model belonged. Skipping intermediate precision is the second; files often report intra-day RSD only across six replicates within an hour, and the inter-day RSD across three to five days never makes it into the file even though that is what case work actually experiences. LOD and LOQ from a single low-concentration injection are the third; the chromatographer eyeballs the lowest standard at S/N of around 10:1 and writes that into the LOQ field instead of running the 3σ and 10σ calculation from blank-replicate SD.

Failing to revalidate after a meaningful method change is the fourth. A new column lot, a change in mobile phase composition, or a swap of the internal standard to a deuterated isotopologue should each trigger at least a partial revalidation. A defence expert who notices the change in the SOP revision history can use this against the certificate. Missing the MU statement on the Section 63 certificate is the fifth, and the one with the most direct courtroom consequence: the validation file has the calculation, the SOP references it, but the certificate (written from a template that predates the requirement) omits it. The fix is a one-line template change.

Worked example

Blood ethanol by headspace GC-FID: validation, MU and PT package

The numbers that go on the certificate, traced through the validation file to the PT scheme.

A state SFSL validates blood ethanol by headspace GC-FID on an Agilent 7890 with a CTC PAL autosampler and an HP-INNOWax column.

Linearity: a seven-point curve at 10, 25, 50, 100, 200, 300 and 400 mg/100 mL in spiked whole blood, run in triplicate over three days. R² 0.9998; back-calculated concentrations within ±6 percent at every calibrator. LOD 2.0 mg/100 mL by the 3σ method on ten blank-blood injections; LOQ 6.7 mg/100 mL by the 10σ method on the same injections. The working range brackets the 30 mg/100 mL Motor Vehicles Act threshold.

Recovery at 80, 200 and 320 mg/100 mL: 98.5, 99.2 and 100.1 percent. Intra-day RSD 1.2 to 2.8 percent; inter-day 1.8 to 3.2 percent. Specificity verified against blanks from twenty drug-free volunteers and against blood spiked with methanol, isopropanol, acetone and acetaldehyde at 50 mg/100 mL; no interference.
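Recovery and the two RSD figures are simple to compute once the replicates are laid out by day. The sketch below uses invented replicate values at a nominal 200 mg/100 mL and hypothetical function names. The inter-day figure here is the plain RSD of all replicates pooled across days, which is a simplification; a one-way ANOVA separates the within-day and between-day components more rigorously.

```python
from statistics import mean, stdev


def recovery_percent(measured, nominal):
    """Mean measured concentration as a percentage of the spiked amount."""
    return 100.0 * mean(measured) / nominal


def rsd_percent(values):
    """Relative standard deviation (sample SD over mean), in percent."""
    return 100.0 * stdev(values) / mean(values)


# Illustrative replicates only, mg/100 mL, spiked at 200.
days = {
    "day 1": [198.0, 201.0, 199.0],
    "day 2": [203.0, 205.0, 204.0],
    "day 3": [196.0, 197.0, 198.0],
}
pooled = [value for replicates in days.values() for value in replicates]

intra_day = {day: rsd_percent(reps) for day, reps in days.items()}
inter_day = rsd_percent(pooled)
recovery = recovery_percent(pooled, nominal=200.0)

for day, rsd in intra_day.items():
    print(f"{day}: intra-day RSD {rsd:.2f} %")
print(f"inter-day RSD {inter_day:.2f} %, recovery {recovery:.1f} %")
```

In this made-up data each day is tight on its own but the day means differ, so the pooled RSD is larger than any single-day RSD. A file that reports only intra-day precision hides exactly that spread.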

MU budget, top-down: u(repeatability) 1.5 percent, u(calibration) 0.8 percent, u(reference) 0.5 percent, u(matrix) 1.0 percent. Combined in quadrature, u_c = √(1.5² + 0.8² + 0.5² + 1.0²) = √4.14 ≈ 2.0 percent; expanded U at k=2 is 4.1 percent.

A road-traffic case sample quantitates at 105 mg/100 mL. The Section 63 certificate reports "ethanol 105 mg/100 mL ± 4 mg/100 mL (k=2, approximately 95 percent confidence)" with the SOP reference, calibration date and CRM batch number. The result is well above the 30 mg/100 mL threshold and the uncertainty bracket does not bring it close.
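The arithmetic behind the certificate line can be reproduced directly from the four budget entries and the case result. This is a sketch for checking the numbers, with hypothetical variable names; the quadrature sum evaluates to about 2.0 percent, so figures quoted to one decimal may differ in the last digit depending on where rounding is applied.

```python
import math

# Relative standard uncertainties from the budget, in percent.
budget = {
    "repeatability": 1.5,
    "calibration": 0.8,
    "reference": 0.5,
    "matrix": 1.0,
}

k = 2.0             # coverage factor, roughly 95 percent confidence
result = 105.0      # case sample, mg/100 mL
threshold = 30.0    # statutory threshold quoted in the example, mg/100 mL

u_c = math.sqrt(sum(u ** 2 for u in budget.values()))  # combined, percent
U_rel = k * u_c                                         # expanded, percent
U_abs = result * U_rel / 100.0                          # expanded, mg/100 mL
lower_bound = result - U_abs

print(f"u_c = {u_c:.2f} %, U = {U_rel:.2f} % = {U_abs:.1f} mg/100 mL")
print(f"ethanol {result:.0f} mg/100 mL ± {round(U_abs)} mg/100 mL (k=2)")
print("lower bound above threshold:", lower_bound > threshold)
```

Comparing the lower bound of the interval with the threshold, rather than the central value, is the conservative reading and the one a defence expert is likely to press.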

The most recent NABL annual PT cycle distributed a blind whole-blood sample assigned at 142 mg/100 mL with target SD 8. The laboratory returned 144.4. The z-score is (144.4 − 142) / 8 = 0.3, comfortably satisfactory. Validated method, MU on the certificate, satisfactory PT, documented CRM. That is what makes the certificate hard to challenge.

Frequently asked questions

What is method validation and why does ISO 17025 require it?

Method validation is the documented exercise that proves a method is fit for purpose for a defined matrix, analyte and concentration range. ISO 17025:2017 clause 7.2.2 makes it mandatory for any non-standard or laboratory-developed method. The file documents accuracy, precision, linearity, LOD, LOQ, specificity, robustness, recovery, range and, where relevant, carry-over and system suitability. NABL assessors inspect the file against Document 141 on every accreditation visit, and the defence is entitled to inspect it under BSA Section 63.

What is the difference between accuracy and precision?

Accuracy is closeness to the true value, measured as recovery from a spiked CRM against an acceptance window of 80 to 120 percent. Precision is the spread of replicates expressed as RSD: repeatability (intra-day), intermediate precision (inter-day with different operators) and reproducibility (inter-laboratory, via PT). A method can be precise but inaccurate (consistent but biased) or accurate but imprecise (correct on average with a wide spread).

How is measurement uncertainty calculated using GUM?

GUM (JCGM 100:2008) splits contributions into Type A (statistical evaluation of repeated measurements) and Type B (calibration certificates, CRM uncertainty, environmental variation, sample inhomogeneity). Each is converted to a standard uncertainty and propagated through the measurement equation; for a multiplicative model the relative contributions are summed in quadrature to give u_c. The expanded U is u_c multiplied by k, with k=2 giving roughly 95 percent confidence. NABL accepts both bottom-up and top-down approaches.

What does a z-score above 3 mean in proficiency testing?

A |z| above 3 is unsatisfactory under ISO 13528: the result lies more than three target SDs from the assigned value, in either direction. The required action is investigation of the cause (calibration error, extraction loss, contamination, reference standard degradation, transcription error), root-cause analysis, CAPA documented in the QMS, and retest where the scheme allows. A repeated pattern without effective CAPA can cost the laboratory the affected scope of its accreditation.

PT schemes with Indian participation

Provider	Scope	Frequency	Indian participants
NABL PT scheme	Forensic toxicology, chemistry, environmental	Annual	All NABL-accredited SFSLs and CFSLs
Collaborative Testing Services (CTS, USA)	Forensic chemistry, toxicology, trace, firearms	Twice yearly	CFSL Chandigarh, CFSL Hyderabad, FSL Madhuban
UNODC International Collaborative Exercises	Seized drug identification and quantitation	Twice yearly	CFSL Chandigarh, CDSCO regional labs
GeT-RM (USA)	Forensic DNA, mitochondrial, Y-STR	Annual	CFSL DNA divisions, state DNA units
WADA EQAS	Anti-doping, threshold substances, IRMS	Three times per year	NDTL Delhi (only WADA-accredited Indian lab)
SoHT proficiency	Hair toxicology, drugs in keratinised matrices	Annual	Specialist hair-toxicology units