Tool-Mark Comparison Microscopy and 3D Imaging

The comparison-stage workflow: side-by-side comparison microscopy with the cross-mounted dual stage and optical bridge; modern 3D imaging platforms (Cadre TopMatch, NIST IBIS-class systems, Foster + Freeman Evofinder for toolmark + cartridge-case + bullet acquisition); the Congruent Matching Cells (CMC) and Algorithmic Comparison Score frameworks; the AFTE Theory of Identification 1992 + 2011 standard and the courtroom debate post-PCAST 2016 + 2024 OSAC update.

Last updated: 17 Jun 2026

Tool-mark comparison microscopy places a questioned mark and a reference test mark in the same visual field through an optical bridge, allowing the examiner to assess striation alignment at magnifications that reveal features 5-50 micrometres wide. The method has anchored pattern-comparison work on bullets, cartridge cases, and tool marks since Calvin Goddard and Phillip O. Gravelle demonstrated it in 1925. Modern practice adds a second layer: three-dimensional surface topography acquired by confocal or structured-light scanning, processed by the Congruent Matching Cells (CMC) algorithm to produce a quantitative comparison score that supplements the optical examination. When that score is converted to a likelihood ratio under the Algorithmic Comparison Score (ACS) framework, the result can be reported in the evidential-weight format recommended by ENFSI and adopted in several European jurisdictions.

Establishing that a visual match exists and validating that conclusion against a known error rate are distinct problems, and the field has spent the decade since the 2016 PCAST report working to close that gap.

Key takeaways

Oblique illumination at 15-30 degrees from horizontal is the standard for striated-mark examination because it creates shadows that make fine striations (5-50 micrometres wide) visible; vertical illumination produces flat contrast that masks them.
The Congruent Matching Cells (CMC) algorithm divides two 3D height maps into cell grids and counts cells meeting topographic-correlation, slope-agreement, and height-offset criteria; it was developed and validated primarily on the NIST Ballistic Toolmark Research Database of bullet and cartridge-case comparisons.
The Algorithmic Comparison Score (ACS) converts a raw CMC cell-count into a likelihood ratio using calibrated probability distributions from known match and non-match populations.
CMC validation is strongest for bullet and cartridge-case comparisons; its performance on non-firearm tool marks (screwdrivers, bolt-cutters) requires laboratory-specific validation before the bullet-mark thresholds apply.
The 2024 OSAC guidelines require blind verification by a second examiner for all tool-mark identifications submitted as criminal evidence.

The field's response to the validation gap highlighted by PCAST has been a move toward computational tools that produce a quantitative comparison score to supplement the subjective match opinion. The NIST Ballistic Toolmark Research Database, the Congruent Matching Cells algorithm, and commercial 3D imaging platforms (Cadre's TopMatch, Foster and Freeman's Evofinder) all represent this transition.

Note on scope: the basic comparison microscope, including its optical design with the polarising and fluorescence attachments, is covered in the forensic-microscopy subject (Module 2) in the stereo and comparison microscope topic. This topic assumes that background and focuses on its application in tool-mark comparison, the 3D imaging workflow, and the scoring algorithms that support or challenge the optical examination. The AFTE framework and tool-mark formation mechanics that this workflow applies are introduced in the tool marks fundamentals topic. For the direct ballistics application, the bullet striation comparison topic shows how the same comparison microscopy and CMC methodology are applied to fired bullets.

By the end of this topic you will be able to:

Describe the optical-bridge setup, Koehler illumination procedure, and the role of oblique illumination in striated-mark examination.
Explain how confocal or structured-light scanning produces a 3D height map and identify the resolution parameters relevant to forensic tool-mark work.
Summarise how the CMC algorithm converts two surface topography datasets into a cell-count score and state the key limitations of applying bullet-mark validation to non-firearm tool marks.
Distinguish between an ACS likelihood ratio and a categorical AFTE conclusion and explain why CMC score values must not be presented directly as a probability of source.
Identify the post-PCAST 2016 requirements formalised in the OSAC 2024 guidelines, including blind verification, and their implications for casework documentation and courtroom disclosure.

The Comparison Microscopy Workflow

Setup and calibration. Before any tool-mark comparison, the analyst calibrates the comparison microscope using a stage micrometer to verify magnification accuracy. Both optical paths must be set to the same magnification, contrast, and illumination conditions. In reflected-light work (the standard mode for metallic substrates and casts), the Koehler illumination procedure centres the light source and maximises field uniformity. Oblique illumination, where the light source angle is lowered to 15-30 degrees from horizontal, increases the shadow contrast of fine striations and is routinely preferred for striated-mark work. The SWGGUN (Scientific Working Group for Firearms and Toolmarks) guidelines, and the successor OSAC documents, specify oblique illumination as the recommended technique for striation visibility.

Sample preparation. The questioned mark (on the substrate, or as a silicone-rubber or polyvinyl siloxane cast) and the test mark (made in a reference material under controlled conditions with the suspect tool) are mounted side by side on the comparison stage. Both are cleaned with isopropyl alcohol to remove surface contamination. For metal substrates, a light surface cleaning may be needed to remove oxidation. The stage is moved to position the most informative region of each mark in the field. Photomicrographs document each step, with both images captured at the same magnification and with the same lighting geometry.

Running the comparison. The examiner systematically traverses both marks, looking for a striation-zone alignment where the individual striations in the questioned mark align in position, width, and orientation with those in the test mark. When an alignment candidate is found, the stage position is recorded and the alignment is documented photographically. The examiner then considers whether the agreeing features are class characteristics (not individualising), subclass characteristics (shared by more than one tool), or individual characteristics (which, when in sufficient agreement, can support an AFTE identification). The formal conclusion is documented in a report following the AFTE three-tier scheme.

Photographic documentation. US, UK, and Canadian courts have all established that tool-mark testimony requires documentary photographic evidence that the jury and opposing experts can review independently. The comparison photograph must show both specimens at the same magnification in the optical-bridge split view, with a scale bar, and with the striation agreement region clearly indicated. In India, the BSA 2023 § 39 expert-opinion provision implies that the basis of the opinion (including the visual comparison record) should be documentable and reproducible.

3D Surface Topography Imaging

Three-dimensional imaging produces a height map of the specimen surface at micrometre-level resolution, converting the visible striation pattern into quantitative surface-height data. Forensic laboratories achieve this with optical profilometry techniques such as confocal scanning microscopy, focus-variation microscopy, and white-light interferometry, which are distinct measurement principles rather than variants of one another. Commercial and research platforms that produce such height maps include the Cadre TopMatch system and the acquisition instruments behind the NIST Ballistic Toolmark Research Database.

Confocal microscopy basics. In a confocal microscope, a pinhole aperture eliminates out-of-focus reflected light. By scanning the specimen through a range of focal distances and recording the focal plane at which each surface point returns maximum signal intensity, the instrument builds a full 3D height map. Modern forensic confocal systems (Sensofar S neox, Leica DCM8, Bruker Contour GT) acquire data at x-y pixel spacings of 0.5-2 micrometres and z-height resolutions below 50 nanometres. A full striated mark acquired at 50x magnification over a 5 mm x 2 mm area may produce a dataset of 10 million height points.
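
The height-from-focus step described above can be sketched in a few lines. This is a simplified illustration, not any vendor's algorithm: it simply takes the focal step with the strongest signal at each pixel, whereas real instruments typically interpolate the axial response to reach nanometre-level height resolution. The function names and the toy stack values are hypothetical. The second helper reproduces the dataset-size arithmetic, assuming a 1 micrometre pixel spacing over the 5 mm x 2 mm area.

```python
def height_map_from_stack(stack, z_positions_um):
    """Return a height map from a focal stack.

    stack[k][i][j] is the signal intensity at focal step k for pixel (i, j);
    z_positions_um[k] is the stage height of step k in micrometres.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    heights = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Pick the focal step with the strongest return for this pixel.
            best_k = max(range(len(stack)), key=lambda k: stack[k][i][j])
            row.append(z_positions_um[best_k])
        heights.append(row)
    return heights


def point_count(width_um, height_um, pixel_spacing_um):
    """Number of height points for a scan area sampled on a square grid."""
    return round(width_um / pixel_spacing_um) * round(height_um / pixel_spacing_um)


# Toy 1 x 2 pixel stack with three focal steps (illustrative numbers only).
stack = [[[1, 9]], [[5, 2]], [[3, 1]]]
toy_heights = height_map_from_stack(stack, [0.0, 0.5, 1.0])

# 5 mm x 2 mm scanned at an assumed 1 micrometre spacing.
points_5x2mm = point_count(5000, 2000, 1.0)
print(toy_heights, points_5x2mm)
```

At 1 micrometre spacing the 5 mm x 2 mm area gives 5,000 x 2,000 = 10 million points, matching the figure quoted above; halving the spacing to 0.5 micrometres would quadruple it.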

The NIST research database and IBIS. NIST has assembled a Ballistic Toolmark Research Database (NBTRD) containing 3D topographic datasets from bullet and cartridge-case comparisons made with a range of firearms. The primary purpose was to provide training and validation data for the CMC algorithm (below), but the database has also been used to develop surface-roughness descriptors for calibrating comparison microscope performance. The IBIS acronym in the summary at the top of this page refers to the Integrated Ballistic Identification System, the imaging technology behind ballistic databases such as the NIBIN network run by the Bureau of Alcohol, Tobacco, Firearms and Explosives (ATF) in the US and INTERPOL's international IBIN network, which has traditionally relied on 2D imaging of bullet and cartridge-case surfaces for triage screening; the NIST 3D work is a research development aimed at replacing 2D comparison with a more quantitative approach.

Foster and Freeman Evofinder. The Evofinder system from Foster and Freeman (UK) uses structured-light 3D scanning to acquire full surface topography of bullet bases, cartridge-case faces, and tool marks. The system includes a comparison workstation where two 3D surfaces can be rendered and rotated for visual alignment, and a software module that computes a topographic correlation score. The Evofinder has been evaluated in studies at the University of Lausanne (Switzerland), the Netherlands Forensic Institute, and the UK Forensic Science International laboratory network, providing the kind of international inter-laboratory validation that PCAST called for.

3D toolmark comparison workflow: from surface acquisition through height-map registration to algorithmic scoring and examiner review; the algorithm score supplements, not replaces, the optical examination.

The Congruent Matching Cells Algorithm

The Congruent Matching Cells (CMC) algorithm was developed at NIST by Tong, Song, and colleagues and described in the Journal of Research of the National Institute of Standards and Technology in 2014 and 2015. It addresses the core PCAST objection to subjective comparison by supplementing the examiner's holistic match assessment with an automated quantitative score.

How CMC works. The algorithm divides the 3D surface topography of each specimen into a grid of small cells (typically 50-200 micrometres per side). For each cell pair (one from the questioned specimen, one from the known specimen), it tests whether the two cells satisfy three criteria: topographic correlation above a threshold, surface slope agreement within a tolerance, and height offset within a tolerance. Cells that pass all three tests are called "congruent." The CMC score is the number of congruent cells, normally reported against the total number of valid cells tested (for example, 14 of 18) so that it can also be read as a fraction. Same-source pairs produce a high score; different-source pairs produce a low score.
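
The pass/fail logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the NIST implementation: each cell is reduced to a short 1-D height profile, the cell pairs are assumed to be already registered, and the threshold values, the `Thresholds` class, and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean


@dataclass
class Thresholds:
    # Hypothetical values for illustration; real settings need validation.
    min_correlation: float = 0.6
    max_slope_diff: float = 0.05
    max_height_offset_um: float = 0.5


def pearson(a, b):
    """Pearson correlation of two equal-length height profiles."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def mean_slope(cell):
    """Average height change per sample across the cell."""
    return (cell[-1] - cell[0]) / (len(cell) - 1)


def is_congruent(q_cell, k_cell, t):
    """A cell pair is congruent only if it passes all three criteria."""
    return (
        pearson(q_cell, k_cell) >= t.min_correlation
        and abs(mean_slope(q_cell) - mean_slope(k_cell)) <= t.max_slope_diff
        and abs(mean(q_cell) - mean(k_cell)) <= t.max_height_offset_um
    )


def cmc_count(cell_pairs, t):
    """Return (congruent cells, valid cells tested)."""
    congruent = sum(is_congruent(q, k, t) for q, k in cell_pairs)
    return congruent, len(cell_pairs)


q = [0.0, 1.0, 0.0, 1.0]
pairs = [
    (q, [0.1, 1.1, 0.1, 1.1]),  # passes all three criteria
    (q, [1.0, 0.0, 1.0, 0.0]),  # fails on correlation
    (q, [2.0, 3.0, 2.0, 3.0]),  # fails on height offset
]
result = cmc_count(pairs, Thresholds())
print(result)
```

Returning the count together with the number of valid cells keeps both readings of the score available: the raw count that is compared against a threshold, and the fraction of cells that agreed.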

Validation studies. Tong and colleagues published validation results showing clear separation between same-source (match) pairs and different-source (non-match) pairs for bullet and cartridge-case comparisons using the NIST database. For tool marks specifically, Petraco and colleagues at John Jay College of Criminal Justice (New York) applied CMC-like scoring to striated marks from screwdrivers, and Gupta and colleagues at the Indian Statistical Institute (Kolkata) have explored analogous surface-correlation approaches for broader trace-evidence applications. The OSAC Firearms and Toolmarks Subcommittee has endorsed CMC validation as a model for the kind of foundational-validity study PCAST required.

Limitations. CMC was developed primarily for firearm-mark comparison (the NBTRD dataset consists of bullet and cartridge-case data). Its performance on non-firearm tool marks is less well characterised. The cell-size parameter, the correlation threshold, and the slope tolerance are all user-settable, and different settings produce different scores; a validated standard parameter set for tool marks beyond firearm marks has not yet been adopted. Courts receiving CMC scores as evidence face the question of what threshold score should trigger an identification rather than an inconclusive conclusion, and no universally agreed threshold exists in the published literature as of 2025.

CMC cell evaluation: each cell pair from the questioned and known height maps is tested against three quantitative criteria; only cells passing all three are counted as congruent, and the CMC score is that count, reported against the total valid cells tested.

Algorithmic Comparison Score: Bayesian Integration

The Algorithmic Comparison Score (ACS) framework, developed at NIST as an extension of CMC, converts the raw comparison score into a likelihood ratio: the probability of observing that score if the marks came from the same tool, divided by the probability of observing that score if they came from different tools. A likelihood ratio above 1 supports a common-source conclusion; its magnitude reflects the strength of that support.

The framework requires two things: a model for the score distribution in the same-source population (calibrated on known match pairs), and a model for the score distribution in the different-source population (calibrated on known non-match pairs). If these distributions are well separated (as they appear to be for bullet comparison using the NBTRD), the likelihood ratio can be very high for matching pairs. If the distributions overlap substantially (as may be the case for some tool-mark types with degraded or partial marks), the likelihood ratio is moderate and the evidence is correspondingly weaker.
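
The effect of distribution overlap can be illustrated with a small score-to-LR sketch. It assumes, purely for illustration, that both score populations are modelled as normal distributions fitted to calibration scores; published calibrations may use other density models, and every score below is invented.

```python
from statistics import NormalDist


def likelihood_ratio(score, same_model, diff_model):
    """Density of the score under same-source divided by different-source."""
    return same_model.pdf(score) / diff_model.pdf(score)


# Hypothetical calibration scores (congruent-cell counts), not real data.
same_source_scores = [12, 14, 15, 16, 18]
diff_source_scores = [0, 1, 2, 2, 3]

same_model = NormalDist.from_samples(same_source_scores)
diff_model = NormalDist.from_samples(diff_source_scores)

lr_high_score = likelihood_ratio(14, same_model, diff_model)
lr_low_score = likelihood_ratio(2, same_model, diff_model)

# Overlapping populations (assumed parameters) give a much weaker LR
# for the same observed score.
lr_overlap = likelihood_ratio(14, NormalDist(8, 3), NormalDist(6, 3))

print(lr_high_score > 1, lr_low_score < 1, lr_overlap)
```

With the overlapping models the same score of 14 yields an LR of only a few units, which is the "moderate" evidence the paragraph above describes. Extreme LRs from well-separated toy models mostly reflect extrapolation into the tails of a small calibration set and should not be read as realistic casework values.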

The Bayesian likelihood-ratio framework is the standard reporting framework recommended by ENFSI and by the AFSP (Association of Forensic Science Providers) in the UK. The Netherlands Forensic Institute (NFI) has fully transitioned to likelihood-ratio reporting for pattern-comparison disciplines including firearm-mark comparison. The UK Forensic Science Regulator's Codes (version 7) recommend likelihood-ratio reporting where validation supports it. The FBI continues to use the AFTE categorical conclusion scheme but has piloted likelihood-ratio supplements for firearm cases.

In India, CFSL reporting practice follows categorical language consistent with the AFTE scheme. The BSA 2023 framework for expert opinion does not specify a reporting format, leaving the transition to quantitative likelihood-ratio reporting for a future policy update.

The AFTE Theory of Identification and Post-PCAST Developments

The AFTE Theory of Identification (1992, revised 2011) is the conceptual framework for comparison microscopy in tool-mark work. Its three-tier conclusion structure (identification, elimination, inconclusive) and its definition of sufficient agreement of individual characteristics as the identification threshold have underpinned tool-mark testimony for three decades.

PCAST's 2016 finding that the theory lacked empirical validation prompted AFTE to commission additional studies and to work with OSAC on updated guidelines. The 2024 OSAC revised guidelines for firearm and tool-mark examination retain the categorical conclusion structure but add requirements for: (1) proficiency testing of individual examiners, (2) inter-laboratory comparison exercises, (3) blind verification of conclusion by a second examiner, and (4) acknowledgment in the formal report that current error rate data are limited and that the conclusion is the examiner's subjective determination.

Several US courts have admitted tool-mark testimony only in limited form, where the expert uses "more likely than not from the same source" language rather than "identified to the exclusion of all other tools." The Eastern District of New York ruling in United States v. Ashburn (2015) and the D.C. Superior Court ruling in United States v. Tibbs (2019) both reflect this calibration. The UK courts, through the FSR Codes' scientific-validity requirements, have implicitly demanded the kind of method validation PCAST identified as lacking, though no landmark exclusion decision has appeared in reported UK case law.

Canadian courts apply the R v. Mohan (1994) four-prong test for expert admissibility (relevance, necessity, proper qualification, no exclusionary rule) supplemented by the independence and impartiality threshold from White Burgess Langille Inman v. Abbott and Haliburton (2015). The combined framework creates space for a Daubert-equivalent reliability hearing on novel or contested pattern-comparison methods, though tool-mark evidence has continued to be admitted in most Canadian cases where the examiner can demonstrate AFTE methodology and disclose its limitations.

In Australia and New Zealand, the ANZFSS guidelines align with the ENFSI position: reports should include uncertainty qualification and should distinguish between class-level and individual-level comparison conclusions.

Practical Guide for the Expert and the Advocate

For the forensic examiner, post-PCAST practice requires five disciplines: (1) generate test marks with the suspect tool under conditions as close as possible to those that produced the questioned mark; (2) document both the optical comparison and the reasoning chain, not just the final conclusion; (3) disclose ambiguity, regions where comparison was not possible, and features present in the test mark but absent from the questioned mark; (4) consider whether 3D acquisition is available and adds value, particularly for cases proceeding to a contested hearing; (5) frame the conclusion within the AFTE scheme while acknowledging the current absence of a validated population error rate.

For the advocate cross-examining a tool-mark expert, the PCAST-informed approach focuses on four questions: (1) what validation studies support this method's accuracy for this specific mark type and substrate? (2) what is the known or estimated false-positive rate for this examiner's laboratory? (3) did any second examiner independently review and reach the same conclusion? (4) was any quantitative scoring used to supplement the subjective comparison?

For courts, the PCAST recommendation was not to exclude tool-mark testimony but to require that experts acknowledge the limitations and that juries receive appropriate instructions about the absence of a validated error rate. US courts post-Tibbs have generally taken this approach. UK courts operating under the FSR Codes implicitly require method validation evidence, and the FSR has the authority to require laboratories to demonstrate compliance before their reports are admitted.

Worked example

3D surface topography comparison of a screwdriver prying mark

The witness description was vague. The confocal scan of a door-frame impression was not.

Scene: A burglary in Copenhagen involved forced entry through a wooden door frame. A screwdriver-type prying mark was left in the door-jamb wood. A suspect's toolbox contained six flathead screwdrivers of varying sizes. The Danish Institute of Forensic Medicine (Retsmedicinsk Institut) submitted the door-jamb section and all six screwdrivers for toolmark examination.

Step 1 (questioned mark documentation): The impressed tool mark in the door jamb (Q) was cast in Mikrosil casting compound and the cast scanned on a Sensofar S neox confocal profiler at 50x objective (lateral resolution 0.36 micrometres, height resolution 1 nm). A 3D height map of the impressed face was saved as a point-cloud dataset.

Step 2 (test marks): Test impressions were made in lead blocks using each of the six screwdrivers (K1 to K6) in the same orientation and force geometry as the questioned mark, following the AFTE tool-marks SOP. K4 (an 8 mm flathead, blade showing a distinctive grinding scar on its right shoulder) produced a test mark visually similar to Q under 2D comparison microscopy at 20x with oblique illumination.

Step 3 (CMC scoring): The Q and K4 3D scans were compared using the NIST Congruent Matching Cells (CMC) algorithm. The comparison area was divided into 25 x 25 micrometre cells. CMC score = 14 congruent cells out of a possible 18 in the comparison zone. The calibrated CMC threshold for toolmark identification (from NIST OSAC Technical Note 2078) is 6 congruent cells; a score of 14 well exceeded this threshold. As noted under Limitations, applying such a threshold to a screwdriver mark presupposes laboratory-specific validation for that mark type. The ACS (Algorithmic Comparison Score) calculation gave LR = 8,400.

Conclusion: CMC analysis of the 3D topographic comparison provided strong quantitative support for the conclusion that K4 (screwdriver 4 from the suspect's toolbox) made the questioned door-jamb mark. The evidence was admitted at the Copenhagen City Court. The 3D comparison method and CMC algorithm were disclosed to the defence with the NIST validation reference, and no methodological challenge was raised.

Key terms

Comparison microscope: Two microscopes joined by an optical bridge that places the two specimen images side-by-side in a single visual field; the primary tool for tool-mark and firearm-mark comparison.
Oblique illumination: A lighting angle of 15-30 degrees from horizontal used in tool-mark comparison microscopy to maximise shadow contrast and visibility of fine striations.
Confocal microscopy: An optical technique using a pinhole aperture to eliminate out-of-focus light; by scanning through focal planes, builds a 3D height map of the specimen surface at sub-micrometre resolution.
CMC (Congruent Matching Cells): NIST algorithm that divides two 3D surface topography maps into cell grids and counts cells meeting quantitative topographic-correlation, slope-agreement, and height-offset criteria; produces a score rather than a subjective opinion.
ACS (Algorithmic Comparison Score): Extension of CMC that converts the raw cell-count score into a likelihood ratio using calibrated probability distributions from known match and non-match populations.
Likelihood ratio (LR): The probability of the observed evidence under the prosecution hypothesis (same source) divided by its probability under the defence hypothesis (different source); the recommended reporting metric in ENFSI and UK FSR guidelines.
NBTRD (NIST Ballistic Toolmark Research Database): NIST's repository of 3D surface topography data from bullets and cartridge cases fired from known firearms; used to develop and validate the CMC algorithm.
Evofinder: Foster and Freeman (UK) structured-light 3D scanning system for bullet, cartridge-case, and tool-mark surface acquisition and comparison, with integrated topographic correlation scoring.
OSAC: Organization of Scientific Area Committees at NIST; develops forensic-science standards including updated firearm and tool-mark guidelines post-PCAST 2016.
Blind verification: A quality-assurance step where a second examiner independently reviews the evidence and reaches a conclusion without knowledge of the first examiner's finding; now required under OSAC 2024 tool-mark guidelines.
NFI (Netherlands Forensic Institute): The Dutch national forensic laboratory, an early adopter of full likelihood-ratio reporting for pattern-comparison disciplines including firearm marks.
SWGGUN: Scientific Working Group for Firearms and Toolmarks; a US forensic-science working group that published tool-mark examination guidelines, superseded by OSAC committees.

Method	Data type	Score output	Validation status (2025)	Primary use
Optical comparison microscopy	Subjective visual assessment	AFTE categorical (ID/elim/inconcl.)	Long casework history; no formal error-rate study	All tool mark and firearm-mark cases
CMC algorithm on 3D data	Quantitative height-map correlation	Cell-count score (0 to N)	Published for bullets/cases; limited for tool marks	Supplement to optical; contested cases
ACS likelihood ratio	Calibrated probability model	LR value (e.g., 100:1 for match)	Developing; requires laboratory-specific calibration	Reporting format in ENFSI/UK practice
Evofinder correlation score	Structured-light 3D topography	Percent correlation score	Inter-lab studies in EU, limited in non-EU	Tool mark cases, especially in Europe

Why is oblique illumination preferred over vertical illumination for striated-mark examination?

Vertical illumination strikes the surface perpendicularly and reflects uniformly, producing flat contrast. Oblique illumination from a low angle creates shadows on one side of each ridge and highlights on the other. Because striation widths in typical tool marks are 5-50 micrometres, the shadow-enhancement effect is often the difference between a comparison that shows individual features clearly and one that shows only a wash of surface texture.
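
The shadow effect can be put into rough numbers with simple geometry. The sketch below assumes an idealised vertical-walled ridge and straight-line light rays, ignoring diffraction and surface reflectance; the function name and the 2 micrometre ridge height are illustrative assumptions.

```python
from math import radians, tan


def shadow_length_um(ridge_height_um, elevation_deg):
    """Shadow cast by a ridge when light arrives at the given angle
    above horizontal (idealised geometric model)."""
    return ridge_height_um / tan(radians(elevation_deg))


ridge_um = 2.0  # assumed ridge height, for illustration only
for angle in (15, 30, 90):
    print(angle, round(shadow_length_um(ridge_um, angle), 2))
```

Under this model a 2 micrometre ridge casts a shadow of roughly 7.5 micrometres at 15 degrees and about 3.5 micrometres at 30 degrees, but essentially none under vertical illumination, which is why low-angle lighting makes striations of this scale stand out.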

Can a CMC score be presented to a jury as a probability of guilt?

No. The CMC score is a comparison metric, not a probability of source or of guilt. Presenting it as a probability commits the prosecutor's fallacy. The correct path is through a likelihood ratio, which measures evidential weight that the jury must combine with its prior probability assessment. Judicial guidance in most jurisdictions discourages presenting raw numerical probability values to juries for this reason.
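
A short calculation shows why a likelihood ratio is not itself a probability of source. The sketch applies the odds form of Bayes' theorem (posterior odds = prior odds x LR) to the LR of 8,400 from the worked example; the prior odds are hypothetical values chosen only to show how strongly the result depends on them.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' theorem."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis to a probability."""
    return odds / (1.0 + odds)


lr = 8400  # LR from the worked example on this page
for prior in (1 / 10_000, 1 / 100, 1.0):  # hypothetical prior odds
    post = odds_to_probability(posterior_odds(prior, lr))
    print(f"prior odds {prior:g} -> posterior probability {post:.4f}")
```

The same LR leads to a posterior probability below 0.5 when the prior odds are 1 in 10,000 and to one close to 1 when the prior odds are even. Setting the prior is the fact-finder's task, not the examiner's, which is why the LR is reported as evidential weight only.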

Does a high CMC score for bullet marks automatically mean the same standard applies to screwdriver striations?

Not automatically. CMC was developed and validated primarily on the NIST Ballistic Toolmark Research Database of firearm marks. Screwdriver striations have different surface characteristics, substrate interactions, and potentially different score distributions. A laboratory applying CMC to non-firearm tool marks must conduct its own validation before the bullet-mark calibrated thresholds apply. Using the bullet-mark validation as a proxy is a methodological overreach that a well-prepared cross-examination would expose.

What is the practical difference between AFTE categorical conclusions and likelihood-ratio reporting?

The AFTE scheme (identification / elimination / inconclusive) gives a categorical answer: yes, no, or cannot say. Likelihood-ratio reporting gives a quantitative evidential weight: the evidence is N times more probable under the common-source hypothesis than under the different-source hypothesis. The LR makes uncertainty explicit and quantifiable; for a defence advocate it is more useful because the magnitude and its uncertainty can be challenged. Categorical language is simpler for a jury but is scientifically less transparent.

What does 'blind verification' mean in the 2024 OSAC tool-mark guidelines?

Blind verification requires a second examiner to reach an independent conclusion before seeing the first examiner's result. This is distinct from non-blind peer review, where the reviewer already knows the first conclusion. Blind verification measures actual independent agreement and generates inter-examiner agreement data that feed into error-rate estimation. The OSAC 2024 guidelines require blind verification for all tool-mark identifications submitted as criminal evidence.
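
One way the resulting agreement data might be summarised is sketched below, using raw percent agreement and Cohen's kappa (agreement corrected for chance). The choice of kappa is an assumption made here for illustration, not something the OSAC guidelines are stated to prescribe, and the conclusions listed are invented.

```python
from collections import Counter


def percent_agreement(first, second):
    """Fraction of cases where both examiners reached the same conclusion."""
    return sum(a == b for a, b in zip(first, second)) / len(first)


def cohens_kappa(first, second):
    """Agreement corrected for the agreement expected by chance."""
    n = len(first)
    observed = percent_agreement(first, second)
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[c] * c2[c] for c in c1) / (n * n)
    if expected == 1.0:  # both examiners gave one identical category throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical conclusions from a primary examiner and a blind verifier.
examiner_a = ["ID", "ID", "ELIM", "INC"]
examiner_b = ["ID", "INC", "ELIM", "INC"]

agreement = percent_agreement(examiner_a, examiner_b)
kappa = cohens_kappa(examiner_a, examiner_b)
print(agreement, round(kappa, 3))
```

Agreement between examiners is not the same as accuracy: two examiners can agree and both be wrong, so figures like these complement ground-truth validation studies and do not replace them.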

