Method Validation and Fitness for Purpose

Method validation is the structured process of demonstrating that an analytical procedure reliably measures what it is supposed to measure, within defined limits, before it is used operationally in a forensic laboratory. Fitness for purpose ties those validation data to the specific evidentiary question, showing that the method's performance is adequate to support a likelihood ratio calculation or other evaluative conclusion.

Method validation is the formal process by which a forensic laboratory demonstrates that an analytical procedure performs reliably enough to produce evidence that can be used in court. Before any method is deployed operationally, the laboratory must characterise its key performance parameters: the lowest concentration it can detect, the range over which its measurements are linear, its precision across repeated analyses, its accuracy against known reference materials, its selectivity against likely interferents, and its behaviour when operating conditions change slightly. These parameters are not collected once and filed away. They form the evidentiary backbone for every conclusion the method subsequently generates.

Fitness for purpose is a related but distinct concept. A validated method has known performance characteristics. A fit-for-purpose method has performance characteristics that are adequate for the specific evidentiary question in the case at hand. The distinction matters because a method can be rigorously validated and still be unsuitable for a particular application: if a drug-quantitation method has a limit of detection of 10 nanograms per millilitre but the forensic question requires a conclusion about a sample that likely contains 2 nanograms per millilitre, the method is not fit for that purpose, regardless of its internal validation status.

ISO 17025:2017, the international accreditation standard for testing and calibration laboratories, sets out the framework within which forensic laboratories must conduct and document validation. Discipline-specific guidance from bodies such as ENFSI (European Network of Forensic Science Institutes), SWGMAT and OSAC (United States), and NATA (Australia) builds on this foundation. Courts in the United Kingdom, the United States, India, and the European Union have increasingly scrutinised the validation basis of forensic evidence, making competent documentation of validation data a prerequisite for admissibility, not an optional quality supplement.

By the end of this topic you will be able to:

  • Define limit of detection, limit of quantitation, selectivity, precision, reproducibility, and robustness, and explain how each is estimated during a validation study.
  • Describe what ISO 17025:2017 requires from a forensic laboratory with respect to method validation and uncertainty of measurement.
  • Explain the concept of fitness for purpose and assess whether a given validation profile is adequate for a specific evidentiary question.
  • Identify how validation data, particularly error rates and precision parameters, feed into the construction of a likelihood ratio.
  • Recognise common validation failures and explain how they affect the reliability of a forensic conclusion.
Key terms
Limit of detection (LOD)
The smallest quantity of an analyte that can be distinguished from background signal with a defined level of confidence, typically estimated as the blank mean plus three standard deviations of the blank. A result below the LOD cannot be reported as a positive finding.
Limit of quantitation (LOQ)
The smallest quantity at which a measurement can be reported with acceptable precision and accuracy, typically the blank mean plus ten standard deviations. Above the LOQ, numerical values can be attached to the measurement; between LOD and LOQ, the analyte is detected but not reliably quantified.
Selectivity
The ability of a method to measure the target analyte specifically, even when other substances that might plausibly appear in a real sample are present. Tested by spiking samples with known interferents and confirming the target signal is not distorted.
Repeatability
The closeness of agreement between repeated measurements of the same sample by the same analyst on the same instrument under identical conditions over a short time. A component of within-laboratory precision.
Reproducibility
The closeness of agreement between measurements obtained under changed conditions: different analysts, different instruments, different laboratories, or different times. A wider measure of precision that reflects how a result would transfer across forensic settings.
Robustness
The capacity of a method to remain unaffected by small, deliberate variations in its operating parameters, such as slight changes in temperature, reagent concentration, or extraction time. A method that passes robustness testing tolerates the normal variation of a working laboratory without producing unreliable results.

ISO 17025 and the regulatory framework for validation

ISO 17025:2017 is the international standard governing competence in testing and calibration laboratories. Accreditation to this standard is required for forensic laboratories in most jurisdictions that participate in mutual recognition arrangements, including those covered by ILAC (International Laboratory Accreditation Cooperation). Clause 7.2 of the standard requires laboratories to validate non-standard methods, laboratory-developed methods, and standard methods that are modified or used outside their intended scope, and to verify that they can properly perform standard methods before introducing them. In each case the laboratory must define the scope of the validation in terms of the measurement procedure and its intended application, and retain records of the validation data.

The validation parameters that ISO 17025 expects to be addressed include: selectivity, range, linearity of response, detection and quantitation limits, precision (repeatability and within-laboratory reproducibility), accuracy or trueness against certified reference materials, measurement uncertainty, and robustness. Not all parameters apply to every method type. A qualitative method that reports a binary present/absent conclusion does not need a linearity assessment, but it does need a detection limit and a false-positive and false-negative rate. A quantitative method needs all of the above.
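
For a qualitative method, the false-positive and false-negative rates mentioned above come from counting outcomes on samples of known status. The following is a minimal sketch, not a prescribed procedure: the counts are hypothetical, and the function names are illustrative. It also shows why observing zero errors in a study does not demonstrate a zero error rate, using the exact one-sided binomial bound for zero observed events.

```python
def error_rates(true_pos, false_neg, true_neg, false_pos):
    """Return (false_positive_rate, false_negative_rate) from validation counts."""
    fpr = false_pos / (false_pos + true_neg)   # known negatives called positive
    fnr = false_neg / (false_neg + true_pos)   # known positives called negative
    return fpr, fnr


def upper_bound_zero_events(n_trials, confidence=0.95):
    """Exact one-sided upper bound on a rate when 0 events occurred in n_trials."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


# Hypothetical study: 200 known-positive and 300 known-negative samples.
fpr, fnr = error_rates(true_pos=196, false_neg=4, true_neg=297, false_pos=3)
print(f"false-positive rate = {fpr:.3f}, false-negative rate = {fnr:.3f}")

# Zero false positives in 100 known negatives still allows a rate near 3%.
print(f"upper 95% bound after 0/100 errors = {upper_bound_zero_events(100):.4f}")
```

The second function is one reason validation studies for binary methods need enough known-negative samples: the smaller the study, the weaker the claim it can support about the error rate.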

Accreditation bodies in different jurisdictions implement the standard through their own schemes. In the United Kingdom, UKAS (United Kingdom Accreditation Service) accredits forensic providers under ISO 17025 and publishes forensic-specific supplementary criteria. In the United States, ANAB (ANSI National Accreditation Board) and A2LA perform similar roles. In India, NABL (National Accreditation Board for Testing and Calibration Laboratories) accredits forensic science laboratories and references ISO 17025 as the normative document. The substantive validation requirements are the same across all of these schemes because they all derive from the same international standard.

Limit of detection, limit of quantitation, and their forensic significance

The limit of detection (LOD) marks the boundary between a signal that cannot be distinguished from the blank and one that can be reported as a detectable response. It is conventionally estimated from replicate measurements of a blank (a sample known to contain no analyte) by calculating the mean blank signal and adding three standard deviations. A sample producing a signal above this threshold is said to give a detectable response. If blank signals are approximately normally distributed, a threshold of three standard deviations corresponds to a false-positive rate of roughly 1 in 740 at the detection boundary (one-sided), or about 1 in 370 if deviations in either direction are counted.
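
The false-positive rate implied by a three-standard-deviation threshold can be checked numerically. This sketch assumes the blank signal is normally distributed, which a real validation study would need to confirm; the function name is illustrative.

```python
import math


def upper_tail_probability(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Probability that a true blank exceeds mean + 3 SD (one-sided).
p = upper_tail_probability(3.0)
print(f"P(blank signal > mean + 3 SD) = {p:.5f}, about 1 in {1 / p:.0f}")
```

Under the normality assumption the one-sided probability is about 0.00135. Real blank distributions are often skewed, so the empirical false-positive rate at the LOD should be checked against blank replicates instead of being taken from this calculation alone.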

The limit of quantitation (LOQ) is a stricter threshold, typically the blank mean plus ten standard deviations. Above the LOQ, the measurement carries enough signal-to-noise that a numerical value with stated precision can be assigned. Between the LOD and the LOQ, the analyte is present but cannot be reliably quantified. Forensic reports must be explicit about which threshold a result falls above or below: a result between LOD and LOQ should not be reported as a precise concentration.

Parameter | Definition | Conventional threshold | Report language
LOD | Smallest detectable signal above blank | Mean blank + 3 SD | Detected / Not detected
LOQ | Smallest signal with acceptable precision | Mean blank + 10 SD | Quantified as X ± U
Below LOD | Signal indistinguishable from blank | < LOD threshold | Not detected
Between LOD and LOQ | Detected but not quantifiable | LOD to LOQ range | Trace detected, not quantified
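
The conventional thresholds can be computed directly from blank replicates. The sketch below uses hypothetical blank signals and illustrative function names, and it works in signal units; converting the thresholds to concentrations would require the method's calibration curve, which is not modelled here.

```python
import statistics


def detection_limits(blank_signals):
    """Return (LOD, LOQ) as mean blank + 3 SD and mean blank + 10 SD."""
    mean_blank = statistics.mean(blank_signals)
    sd_blank = statistics.stdev(blank_signals)  # sample standard deviation
    return mean_blank + 3 * sd_blank, mean_blank + 10 * sd_blank


def report_language(signal, lod, loq):
    """Map a measured signal onto the reporting categories for LOD and LOQ."""
    if signal < lod:
        return "Not detected"
    if signal < loq:
        return "Trace detected, not quantified"
    return "Quantified"


# Hypothetical replicate blank signals (arbitrary units).
blanks = [0.8, 1.2, 1.0, 0.9, 1.1, 1.0, 1.3, 0.7]
lod, loq = detection_limits(blanks)
print(f"LOD = {lod:.2f}, LOQ = {loq:.2f}")

for signal in (1.2, 2.0, 5.0):
    print(signal, "->", report_language(signal, lod, loq))
```

With these blanks the mean is 1.0 and the sample standard deviation is 0.2, so the thresholds fall at 1.6 and 3.0 signal units.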

Forensic significance turns on interpretation. A drug screening test with a high LOD may fail to detect a therapeutic dose, producing a false negative that exonerates when the drug was present. A DNA quantitation tool whose LOD lies below the amount of template needed for a complete profile may flag a sample as suitable for profiling when in reality the profile will be incomplete. Both scenarios represent fitness-for-purpose failures even when the method's own LOD is accurately characterised.

Selectivity, precision, and robustness

Selectivity testing asks whether the method responds to the target analyte specifically or whether other substances produce the same or a similar signal. In forensic toxicology, a urine immunoassay screen for amphetamines is known to cross-react with certain decongestant medications. The selectivity of the assay is therefore limited, and a positive screen must be confirmed by a method with greater selectivity, typically gas chromatography-mass spectrometry (GC-MS), before a forensic conclusion can be drawn. The validation record should document which interferents were tested, at what concentrations, and what response they produced.

Precision encompasses two distinct concepts. Repeatability is the variation obtained when the same sample is analysed repeatedly by the same analyst on the same instrument under identical conditions in a short period. Reproducibility is the variation observed when conditions change: a different analyst runs the method, a different instrument is used, the sample is analysed on a different day, or a different laboratory runs the same sample. Both matter forensically. Repeatability sets the floor for how consistent a result is under ideal conditions. Reproducibility sets the realistic uncertainty when evidence travels between laboratories or when a result needs to be replicated.
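
One common way to separate the two kinds of precision is a one-way analysis of variance on replicates grouped by condition (for example, by day or analyst). The sketch below assumes equal group sizes and uses hypothetical data; when the groups all come from one laboratory, the second figure is a within-laboratory (intermediate) reproducibility, not a between-laboratory one.

```python
import math
import statistics


def precision_components(groups):
    """Return (repeatability SD, reproducibility SD) from equal-sized groups."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    group_means = [statistics.mean(g) for g in groups]
    # Pooled within-group variance estimates the repeatability variance.
    ms_within = statistics.mean([statistics.variance(g) for g in groups])
    ms_between = n * statistics.variance(group_means)
    # Between-group variance component, floored at zero.
    var_between = max(0.0, (ms_between - ms_within) / n)
    s_repeat = math.sqrt(ms_within)
    s_reprod = math.sqrt(ms_within + var_between)
    return s_repeat, s_reprod


# Hypothetical results: three days, three replicates per day.
days = [
    [10.0, 10.2, 9.8],
    [10.5, 10.7, 10.3],
    [9.5, 9.7, 9.3],
]
s_r, s_R = precision_components(days)
print(f"repeatability SD = {s_r:.3f}, reproducibility SD = {s_R:.3f}")
```

In this example the day-to-day shifts are larger than the scatter within a day, so the reproducibility standard deviation is more than twice the repeatability standard deviation.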

Robustness is evaluated by intentionally varying method parameters within plausible ranges and observing whether performance degrades. A Plackett-Burman experimental design or a one-factor-at-a-time approach is common. Parameters tested might include extraction temperature, pH of the extraction solvent, mobile-phase composition in chromatography, or incubation time in immunoassay. A method that fails its own acceptance criteria when temperature shifts by 2 degrees Celsius has poor robustness, and that fragility must be documented in the validation record and managed operationally through tighter environmental controls.
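
As an illustration of the Plackett-Burman approach, the sketch below builds the standard 8-run, two-level design from its cyclic generator row. The first four factor names are taken from the parameters listed above; the three "dummy" columns are an assumption of this example, included only to fill the seven available columns.

```python
def plackett_burman_8():
    """Build the 8-run Plackett-Burman design for up to 7 two-level factors."""
    generator = [1, 1, 1, -1, 1, -1, -1]
    # Seven cyclic shifts of the generator row, then one row of all low levels.
    rows = [[generator[(col - shift) % 7] for col in range(7)] for shift in range(7)]
    rows.append([-1] * 7)
    return rows


factors = [
    "extraction temperature",
    "extraction solvent pH",
    "mobile-phase composition",
    "incubation time",
    "dummy 1",
    "dummy 2",
    "dummy 3",
]

design = plackett_burman_8()
for run_number, row in enumerate(design, start=1):
    levels = " ".join("+" if level == 1 else "-" for level in row)
    print(f"run {run_number}: {levels}")
```

Each column sets one factor to its high (+) or low (-) level in each run. Every column is balanced and orthogonal to every other, so the main effect of each factor can be estimated from only eight runs, at the cost of ignoring interactions.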

Measurement uncertainty and its role in conclusions

ISO 17025:2017 and the GUM (Guide to the Expression of Uncertainty in Measurement, published by the Joint Committee for Guides in Metrology and hosted by BIPM) require that every quantitative measurement be accompanied by a statement of its uncertainty. Measurement uncertainty is not the same as error. Error is the difference between a measured value and the true value. Uncertainty is the range within which the true value is expected to lie, at a stated level of confidence, given everything that is known about the measurement process.

A blood alcohol concentration reported as 82 mg/100 mL with a measurement uncertainty of plus or minus 3 mg/100 mL at 95% confidence means the true value is expected to lie between 79 and 85 mg/100 mL in nineteen cases out of twenty. In a jurisdiction where the legal limit is 80 mg/100 mL, the reported value of 82 mg/100 mL does not unambiguously place the true value above the limit. How to handle that ambiguity is a legal and policy question, but the forensic scientist must report the uncertainty so that the question can be addressed honestly.
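
The comparison in the blood alcohol example can be written out as a small check. This is a sketch using the figures from the example; the function name and the wording of the three outcomes are illustrative, and how a straddling result is treated remains a legal and policy decision.

```python
def compare_with_limit(measured, expanded_uncertainty, legal_limit):
    """Return the coverage interval and how it sits relative to a legal limit."""
    lower = measured - expanded_uncertainty
    upper = measured + expanded_uncertainty
    if lower > legal_limit:
        verdict = "interval lies entirely above the limit"
    elif upper < legal_limit:
        verdict = "interval lies entirely below the limit"
    else:
        verdict = "interval straddles the limit"
    return lower, upper, verdict


# Values from the example: 82 +/- 3 mg/100 mL against a limit of 80 mg/100 mL.
lower, upper, verdict = compare_with_limit(82, 3, 80)
print(f"{lower} to {upper} mg/100 mL: {verdict}")
```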

Uncertainty contributions are classified by how they are evaluated. Type A uncertainty is estimated from statistical analysis of repeated measurements, corresponding roughly to the standard deviation. Type B uncertainty is estimated from other sources of information, such as the uncertainty stated for a reference material, instrument calibration certificates, or published scientific knowledge about a process. A complete uncertainty budget accounts for both types and combines them using the law of propagation of uncertainty before expressing the combined uncertainty as an expanded uncertainty at a defined coverage probability.
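
A simple uncertainty budget can be sketched as follows. The component names and values are hypothetical, and the calculation assumes the contributions are uncorrelated, already expressed as standard uncertainties in the units of the result, and have sensitivity coefficients of one; under those assumptions the law of propagation reduces to a root sum of squares. The coverage factor k = 2 is the usual choice for approximately 95% coverage.

```python
import math

# Hypothetical standard uncertainties, all in mg/100 mL.
budget = {
    "repeatability (Type A)": 0.9,
    "certified reference material (Type B)": 0.8,
    "instrument calibration (Type B)": 0.6,
    "volumetric dilution (Type B)": 0.4,
}


def combined_standard_uncertainty(components):
    """Root-sum-of-squares combination of uncorrelated standard uncertainties."""
    return math.sqrt(sum(u ** 2 for u in components.values()))


u_c = combined_standard_uncertainty(budget)
k = 2  # coverage factor for roughly 95% coverage
U = k * u_c
print(f"combined standard uncertainty = {u_c:.2f} mg/100 mL")
print(f"expanded uncertainty (k = {k}) = {U:.2f} mg/100 mL")
```

Because the components add in quadrature, the largest contributions dominate the result, which is why a budget is also a guide to where method improvement would reduce uncertainty most.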

Fitness for purpose: linking validation to the evidentiary question

A method is fit for purpose when its validated performance is adequate to answer the specific evidentiary question posed in a particular case. This requires comparing the method's performance profile against the demands of the question. If the question is whether a blood sample contains more than 50 micrograms per litre of a substance and the method's LOQ is 200 micrograms per litre, the method is not fit for that question. If the question is whether two samples share the same source and the method's between-laboratory reproducibility is wide enough that two samples from different sources could easily produce the same result, the method does not discriminate well enough to support a strong likelihood ratio.

The fitness-for-purpose assessment is the bridge between the validation record and the case report. It should appear explicitly in any case where the question at issue approaches the limits of the method's performance. A method that is adequate for routine casework at high concentration may be inadequate for a specific case involving trace quantities or an unusual matrix. The analyst must make this assessment case by case, not assume that accreditation to ISO 17025 implies universal adequacy.
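
The case-by-case comparison described in this section can be sketched as a check of the validation profile against the demands of the question. The class, field names, and the two criteria shown are assumptions of this example; a real assessment would cover more of the profile, such as precision and selectivity.

```python
from dataclasses import dataclass


@dataclass
class ValidationProfile:
    loq: float                 # limit of quantitation, micrograms per litre
    validated_matrices: tuple  # matrices covered by the validation study


def fitness_issues(profile, decision_threshold, case_matrix):
    """List reasons the method may not be fit for this evidentiary question."""
    issues = []
    if profile.loq > decision_threshold:
        issues.append("LOQ is above the concentration the question turns on")
    if case_matrix not in profile.validated_matrices:
        issues.append("case matrix is outside the validation scope")
    return issues


# Figures from the example: LOQ of 200 against a threshold of 50 micrograms per litre.
profile = ValidationProfile(loq=200.0, validated_matrices=("blood",))
print(fitness_issues(profile, decision_threshold=50.0, case_matrix="blood"))
print(fitness_issues(profile, decision_threshold=500.0, case_matrix="blood"))
```

An empty list means only that none of the checked criteria failed, not that the method is adequate in every respect.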

International frameworks support this requirement. The ENFSI Guideline for Evaluative Reporting explicitly links the strength of a likelihood ratio to the quality of the underlying model, which in turn depends on validation data. The ILAC G19 guidelines for forensic science laboratories state that uncertainty and method limitations must be considered when formulating conclusions. Courts in the United Kingdom, United States, and Australia have all encountered cases where evidence was excluded or conviction overturned because the method's fitness for the specific purpose was not adequately demonstrated.

How validation data underpin the likelihood ratio

A likelihood ratio for forensic evidence requires the analyst to assign probabilities to the observed result under two competing propositions, typically that the suspect is the source of the trace and that some other unrelated person is the source. Both probabilities depend on a model of how the measurement behaves. That model has parameters: the mean and variance of measurements from same-source comparisons, the mean and variance of measurements from different-source comparisons, and the error rates of the method itself.

Validation data provide those parameters. If the method's inter-laboratory reproducibility standard deviation is large, repeated measurements of the same source will scatter widely, and that measurement scatter also broadens the distribution of different-source comparisons. The overlap between the two distributions will be large, and the LR will be modest even for genuine matches. A method with tight reproducibility produces distributions with less overlap, and a given degree of similarity between two measurements supports a larger LR. The validation study is, in statistical terms, the characterisation of the distributions that the LR calculation will use.
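
The effect of reproducibility on the LR can be illustrated with a deliberately simplified model, which is an assumption of this sketch and not a recommended casework model. Each measurement is taken to be a source value plus normally distributed measurement noise, source values are taken to vary normally across the population, and the evidence is the difference between two single measurements. The model ignores how rare the measured value is in the population.

```python
import math
from statistics import NormalDist


def likelihood_ratio(difference, measurement_sd, between_source_sd):
    """LR for an observed difference between two measurements (simplified model)."""
    # Same source: only measurement noise contributes to the difference.
    same = NormalDist(0.0, math.sqrt(2) * measurement_sd)
    # Different sources: noise plus genuine between-source variation.
    diff = NormalDist(0.0, math.sqrt(2 * measurement_sd ** 2 + 2 * between_source_sd ** 2))
    return same.pdf(difference) / diff.pdf(difference)


# Two identical readings (difference of zero), hypothetical standard deviations.
tight = likelihood_ratio(0.0, measurement_sd=1.0, between_source_sd=10.0)
loose = likelihood_ratio(0.0, measurement_sd=5.0, between_source_sd=10.0)
print(f"tight reproducibility: LR = {tight:.2f}")
print(f"wide reproducibility:  LR = {loose:.2f}")
```

Even for a perfect agreement between the two readings, the method with the wider measurement scatter supports a much smaller LR, because agreement of that kind is less surprising when the sources differ.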

Error rates require special attention. A method with a known false-positive rate of 1 in 100 cannot support an LR of 10,000 for a match: a reported match would be expected in about 1 in 100 different-source comparisons through error alone, so the supportable LR is limited to roughly 100. This is sometimes called the error rate ceiling on the LR. The validation study must include assessments with samples from known different sources so that the false-positive rate is empirically established, not assumed to be zero. SWGMAT and OSAC standards, the ENFSI evaluative reporting guideline, and the UK Forensic Science Regulator's Codes of Practice all address this point explicitly. For more on how LR values are constructed and used, see Role of Statistics in Evidence Evaluation and Numbers in Forensic Conclusions.
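
The ceiling can be made concrete with a short calculation. The sketch below assumes that a true same-source comparison is always reported as a match, and that a different-source comparison is reported as a match either by coincidence (with a random match probability) or by method error (with the false-positive rate). Both the formula's framing and the function name are simplifications introduced for illustration.

```python
def lr_for_reported_match(random_match_probability, false_positive_rate):
    """LR for a reported match when false positives are possible.

    Assumes P(reported match | same source) = 1.
    """
    p_match_if_different = (
        random_match_probability
        + false_positive_rate * (1.0 - random_match_probability)
    )
    return 1.0 / p_match_if_different


# Nominal discrimination of 1 in 10,000 with and without a 1-in-100 error rate.
no_error = lr_for_reported_match(1e-4, 0.0)
with_error = lr_for_reported_match(1e-4, 0.01)
print(f"LR ignoring error: {no_error:.0f}")
print(f"LR with 1-in-100 false-positive rate: {with_error:.1f}")
```

Once the false-positive rate is much larger than the random match probability, the LR is governed almost entirely by the error rate, which is why an unmeasured error rate leaves the reported LR without a defensible upper limit.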

Check your understanding

A blank blood sample has a mean signal equivalent to 1.0 mg/100 mL ethanol, with a standard deviation of 0.5 mg/100 mL. What is the limit of detection?

Key Takeaways

  • Method validation establishes the performance profile of an analytical procedure, including its LOD, LOQ, selectivity, precision, reproducibility, and robustness, before it is used to generate forensic evidence.
  • ISO 17025:2017 is the international accreditation standard that requires forensic laboratory methods to be validated, or in the case of standard methods verified, before use; discipline-specific guidance from ENFSI, OSAC, and national bodies specifies additional requirements for particular forensic disciplines.
  • Fitness for purpose is a case-level assessment: a well-validated method may still be inadequate for a specific evidentiary question if, for example, the concentrations of interest fall below the method's LOQ or the matrix is outside the validation scope.
  • Measurement uncertainty must be calculated and reported with every quantitative forensic result; omitting it is both a non-compliance with ISO 17025 and a failure to give the court the information it needs to evaluate the evidence.
  • Validation data directly underpin likelihood ratio calculations: the precision parameters define the distributions used in the LR model, and the false-positive rate sets an empirical ceiling on the LR values that the method can legitimately support.
What is the difference between method validation and fitness for purpose?
Method validation establishes what a method can and cannot do: its limit of detection, selectivity, precision, reproducibility, and other performance characteristics. Fitness for purpose goes one step further and asks whether those performance characteristics are adequate for the specific evidentiary question being addressed. A method may be well validated yet still not fit for a particular purpose if, for example, its limit of detection is too high to detect the concentration of interest in a given case.
What is the limit of detection in forensic method validation?
The limit of detection (LOD) is the smallest quantity of an analyte that can be reliably distinguished from background noise under defined conditions. It is typically estimated as the concentration corresponding to the mean blank signal plus three standard deviations of the blank. In forensic contexts, the LOD determines whether a negative result means the analyte is absent or simply below the method's detection capability, a distinction that can be critical for evidential interpretation.
What does ISO 17025 require for forensic laboratory method validation?
ISO 17025:2017 requires laboratories to validate non-standard, laboratory-developed, and modified methods before use, and to verify their performance of standard methods, documenting the validation parameters relevant to the method's intended scope. For quantitative methods this typically includes LOD, limit of quantitation, linearity, precision (repeatability and reproducibility), accuracy, selectivity, and measurement uncertainty. The standard also requires that validation records be retained and that the effect of any change to a validated method be assessed, with re-validation of the affected parameters.
How does method validation data underpin a likelihood ratio calculation?
A likelihood ratio (LR) for forensic evidence requires a statistical model of how the evidence is expected to behave under the prosecution and defence propositions. That model depends on the method's known error rates, precision, and the distribution of measurements across reference populations. Without validation data, the analyst cannot assign reliable probabilities to the observed result under either proposition, making the LR numerically unstable or scientifically indefensible.
What is selectivity in forensic method validation?
Selectivity is the degree to which a method produces a response specifically for the target analyte in the presence of other substances that might be expected in a real case sample. A method with poor selectivity may give a positive result for a substance other than the one being sought, or may have its measurement of the target analyte interfered with by co-occurring compounds. Selectivity testing involves deliberately spiking samples with likely interferents and confirming that the method's response is attributable to the target alone.
