Audit Sampling Techniques and Working Papers

Audit sampling lets a security auditor draw conclusions about an entire population of controls or transactions by examining only a subset of items. Working papers are the documented record that connects each sample to the audit opinion and provides the evidence base for any challenge, regulatory review, or future audit cycle.

Last updated: 24 Jun 2026

Audit sampling is the practice of selecting and testing a subset of items from a larger population and using the results to form a conclusion about the whole population. In a security audit, the population might be access-review logs, firewall rule change tickets, patch deployment records, or encryption key rotation events. Because testing every item is rarely feasible in a large organisation, auditors apply either statistical methods, which let them project results with a calculable confidence level, or judgement-based methods, which rely on professional experience to target high-risk items. The choice of method determines how findings can be stated: statistical samples produce projectable conclusions while judgement samples support directional observations only.

Working papers are the audit team's documented record of every step taken from planning through conclusion. They capture the population definition, sampling rationale, test procedures, evidence obtained, exceptions noted, and the auditor's reasoning. Working papers are not optional documentation: they are the evidentiary basis for the audit opinion, the defence against challenge by a regulator or certification body, and the starting point for every future audit of the same control area. Frameworks including ISO 27001, SOC 2, and PCI DSS each specify what documentation must exist and how long it must be retained.

The discipline of sampling and working-paper management applies to every major compliance regime. A qualified security assessor testing PCI DSS controls, an ISO 27001 internal auditor reviewing ISMS effectiveness, a certified information systems auditor preparing a SOC 2 report, and an information commissioner inspecting GDPR records-management controls all follow the same underlying logic: define the population, choose a defensible selection method, test a sufficient number of items, document everything, and form a supportable conclusion. The procedural differences between regimes are variations on that shared structure.

By the end of this topic you will be able to:

Distinguish statistical from judgement sampling and state when each is appropriate in a security audit context.
Calculate or look up a sample size for attribute sampling given a confidence level, tolerable deviation rate, and expected error rate.
Identify the required sections of an audit working paper and explain why each section is necessary.
Apply the correct response when a sample produces an exception rate above the tolerable level.
State the working-paper retention requirements under ISO 27001, SOC 2, and PCI DSS, and explain how GDPR and India's DPDP Act 2023 affect retention decisions.

Key terms

Attribute sampling: A statistical sampling method that tests whether each selected item either has or lacks a specified attribute, for example whether a change ticket has an approved change-management record attached. Results are expressed as a deviation rate and compared against the tolerable deviation rate.
Confidence level: The probability that the sample result correctly reflects the true population characteristic. Expressed as a percentage, typically 90 to 95 percent in security audit work. A higher confidence level requires a larger sample for the same tolerable error rate.
Tolerable deviation rate: The maximum deviation rate the auditor is willing to accept in the population without modifying the audit conclusion. If the upper error limit calculated from the sample exceeds this threshold, the control cannot be assessed as effective.
Judgement sampling: Selection of audit items based on the auditor's professional assessment of where errors or weaknesses are most likely to exist. Results cannot be statistically projected to the full population but are useful for targeted testing of high-risk areas or small populations.
Working paper: The documented record of an audit procedure: what was tested, how items were selected, the evidence obtained, any exceptions found, and the auditor's conclusion. Working papers are the primary evidence that an audit was conducted as described and form the basis for the audit opinion.
Upper error limit (UEL): The maximum error rate that could exist in the population at the chosen confidence level, given the number of exceptions found in the sample. If the UEL exceeds the tolerable deviation rate, the auditor cannot support a clean opinion on that control.

Statistical sampling: principles and methods

Statistical sampling gives every item in a defined population a known, nonzero probability of selection. Because selection is random and the sample size is calculated from probability theory, the auditor can make a statement of the form: we are 95 percent confident that the deviation rate in the population does not exceed 6 percent. That kind of precision is required by some frameworks and expected by most certification bodies reviewing SOC 2 or ISO 27001 audit evidence.

The most common statistical method in security auditing is attribute sampling, which tests each item for the presence or absence of a control characteristic. For example, the auditor draws a random sample of 60 access-review records and tests each one: a record either has a documented supervisor approval or it does not. The number of deviations found is counted, the upper error limit is calculated from published tables, and that limit is compared to the tolerable deviation rate set during planning.

Method	Population type	Typical use in security audit	Output
Attribute sampling	Any discrete yes/no characteristic	Control operating effectiveness (does each change have an approval?)	Deviation rate + upper error limit
Variables sampling	Populations with measured values	Financial-system access amounts, data volumes processed	Mean estimate with confidence interval
Discovery sampling	Low-frequency high-risk events	Looking for at least one instance of an unencrypted backup or unapproved admin account	Whether at least one exception is likely to exist
Stratified sampling	Heterogeneous populations	Splitting access logs by user type before drawing separate random samples	Separate estimates per stratum
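The stratified row in the table can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the record layout, the `user_type` field, the per-stratum sample sizes, and the seed value are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, sizes, seed):
    """Draw an independent simple random sample from each stratum.

    records    -- list of population items (hypothetical dicts here)
    stratum_of -- function that maps a record to its stratum label
    sizes      -- dict of stratum label -> planned sample size
    seed       -- recorded in the working paper so the draw can be repeated
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    sample = {}
    for label in sorted(sizes):
        items = strata.get(label, [])
        # Never ask for more items than the stratum actually holds.
        k = min(sizes[label], len(items))
        sample[label] = rng.sample(items, k)
    return sample

# Hypothetical access-log population: every tenth entry belongs to an admin.
logs = [{"id": i, "user_type": "admin" if i % 10 == 0 else "standard"}
        for i in range(1, 501)]
picked = stratified_sample(logs, lambda r: r["user_type"],
                           {"admin": 25, "standard": 40}, seed=20260624)
print({label: len(items) for label, items in picked.items()})
# {'admin': 25, 'standard': 40}
```

Each stratum is then evaluated on its own, which is why the table lists separate estimates per stratum as the output.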

Random number generation for selection can use a random number table, a spreadsheet RAND() function, or purpose-built audit software such as ACL, IDEA, or the open-source equivalents. The selection method must be documented in the working paper. Systematic sampling (selecting every nth item) is sometimes used as a practical substitute for pure random sampling but carries a risk if the population has a periodic pattern that aligns with the interval.
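Where a scripting language is available, the same selection can be made repeatable by fixing and recording a seed. The sketch below uses Python's standard `random` module; the function names, the seed value, and the population of 4,000 sequential ticket numbers are illustrative assumptions, not part of any audit tool.

```python
import random

def simple_random_sample(record_ids, n, seed):
    """Select n record IDs at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so a reviewer can re-run the draw
    return sorted(rng.sample(record_ids, n))

def systematic_sample(record_ids, n, seed):
    """Select every k-th record after a random start inside the first interval."""
    interval = len(record_ids) // n
    start = random.Random(seed).randrange(interval)
    return [record_ids[start + i * interval] for i in range(n)]

# Hypothetical population: 4,000 sequentially numbered change tickets.
population = list(range(1, 4001))
random_pick = simple_random_sample(population, 60, seed=20260624)
systematic_pick = systematic_sample(population, 60, seed=20260624)

print(len(random_pick), len(set(random_pick)))   # 60 60
print(systematic_pick[1] - systematic_pick[0])   # 66
```

Recording the seed, the population export, and the script version in the working paper lets a reviewer regenerate exactly the same list of items.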

Judgement sampling: when and how

Judgement sampling is appropriate when the population is small enough that statistical projection adds no practical value, when testing is targeted at specific high-risk items rather than the population as a whole, or when time and budget constraints make a full statistical approach impractical. It is also the default when the auditor's goal is to find at least one example of a specific failure pattern rather than to estimate an overall error rate.

Common judgement selection criteria in security audits include: items with the highest value or sensitivity, items processed immediately after a system change or personnel change, items from a period when a known control weakness existed, and items selected to cover the full range of transaction types or system components. A firewall rule audit might use judgement to select the 30 most recently added rules, the 10 rules with the widest open ports, and all rules touching a PCI DSS in-scope segment, rather than a random sample of all 4,000 rules.
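The firewall example can be expressed as a small selection routine that also records why each rule was chosen, which is the rationale a judgement sample needs in the working paper. This is a sketch only: the field names `added_order`, `port_span`, and `pci_in_scope`, and the synthetic rule set, are hypothetical.

```python
def judgement_selection(rules, n_recent=30, n_widest=10):
    """Return {rule_id: [reasons]} for rules picked by the stated criteria."""
    reasons = {}

    def tag(rule, why):
        reasons.setdefault(rule["id"], []).append(why)

    # Most recently added rules (higher added_order means newer).
    for rule in sorted(rules, key=lambda r: r["added_order"], reverse=True)[:n_recent]:
        tag(rule, "recently added")
    # Rules that open the widest range of ports.
    for rule in sorted(rules, key=lambda r: r["port_span"], reverse=True)[:n_widest]:
        tag(rule, "wide port range")
    # Every rule touching a PCI DSS in-scope segment.
    for rule in rules:
        if rule["pci_in_scope"]:
            tag(rule, "touches PCI DSS in-scope segment")
    return reasons

# Synthetic population of 4,000 rules, for illustration only.
rules = [{"id": i,
          "added_order": i,
          "port_span": (i * 37) % 1000 + 1,
          "pci_in_scope": i % 400 == 0}
         for i in range(1, 4001)]

selection = judgement_selection(rules)
print(len(selection), selection[4000])
```

A rule that meets more than one criterion appears once with several reasons, so the selected count can be lower than the sum of the three criteria.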

Sample size determination

For attribute sampling, three inputs drive sample size: the confidence level, the tolerable deviation rate, and the expected deviation rate. A higher confidence level requires a larger sample. A lower tolerable deviation rate requires a larger sample. A higher expected deviation rate also requires a larger sample because the auditor needs more data to distinguish systematic failure from random variation. ISACA's IT Audit Framework and the AICPA's audit sampling guide both publish tables that translate these three inputs into sample sizes without requiring the auditor to compute the underlying hypergeometric or Poisson formula directly.

Confidence level	Tolerable deviation rate	Expected deviation rate	Approximate sample size
90%	10%	0%	22
95%	10%	0%	29
95%	5%	0%	59
95%	5%	1%	93
95%	3%	0%	99
99%	5%	0%	90
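For readers who want to see where figures like these come from, the sketch below derives a sample size from the binomial distribution: it looks for the smallest sample in which observing no more than the expected number of deviations would be unlikely (at the chosen risk level) if the true rate were equal to the tolerable rate. The function names and the rule for converting the expected rate into an allowed deviation count are assumptions of this sketch, and published tables may differ by an item or two because of their own rounding conventions.

```python
from math import ceil, comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def attribute_sample_size(confidence, tolerable_rate, expected_rate=0.0, max_n=5000):
    """Smallest n (and allowed deviations) that supports the planned conclusion."""
    risk = 1 - confidence
    for n in range(1, max_n + 1):
        # Deviations the plan allows for: expected rate times n, rounded up.
        allowed = ceil(round(expected_rate * n, 9))
        if binom_cdf(allowed, n, tolerable_rate) <= risk:
            return n, allowed
    raise ValueError("no sample size found below max_n")

print(attribute_sample_size(0.95, 0.05))        # (59, 0)
print(attribute_sample_size(0.95, 0.05, 0.01))  # (93, 1)
```

The second value in each result is the number of deviations the plan can absorb; finding more than that in the sample means the tolerable rate can no longer be supported at the planned confidence level.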

Population size has almost no effect on sample size once the population exceeds about 500 items. This is counterintuitive but mathematically sound: a random sample of 59 items gives essentially the same level of assurance whether the population is 1,000 or 1,000,000 items. Population size matters only when the population is small (under roughly 200 items), in which case the finite population correction factor reduces the required sample.
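The effect of population size can be checked directly with the hypergeometric model (sampling without replacement). The sketch below assumes a zero-expected-deviation plan and asks how many items must be drawn, with no exceptions found, before a population containing deviations at the tolerable rate becomes implausible. The function name and the rounding of the deviation count are assumptions of this example.

```python
from math import ceil

def zero_deviation_sample_size(population, confidence, tolerable_rate):
    """Smallest clean sample that rules out the tolerable rate, without replacement."""
    risk = 1 - confidence
    deviants = ceil(round(tolerable_rate * population, 9))
    if deviants < 1:
        raise ValueError("tolerable rate implies no deviations in this population")
    prob_clean = 1.0
    for n in range(1, population - deviants + 2):
        # Chance that the n-th draw is also clean, given all earlier draws were.
        prob_clean *= (population - deviants - (n - 1)) / (population - (n - 1))
        if prob_clean <= risk:
            return n
    return population

for size in (100, 1_000, 1_000_000):
    print(size, zero_deviation_sample_size(size, 0.95, 0.05))
# 100 45
# 1000 57
# 1000000 59
```

Under these assumptions the required sample barely moves between a thousand and a million items, while a population of 100 needs noticeably fewer.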

After selecting and testing the sample, the auditor calculates the upper error limit using published tables or software. If zero exceptions are found in a sample of 59 at 95 percent confidence and a 5 percent tolerable rate, the UEL is approximately 5 percent, which just meets the threshold. If one exception is found, the UEL rises to approximately 8 percent, exceeding the tolerable rate and requiring the auditor to expand testing or report a finding.
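When tables or audit software are not to hand, the upper error limit can be approximated numerically. The sketch below computes a one-sided exact binomial upper bound by bisection; this is one common way to define the limit, chosen here as an assumption, and published tables built on a Poisson approximation can give slightly different figures.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def upper_error_limit(sample_size, exceptions, confidence=0.95, tol=1e-9):
    """Largest population deviation rate still consistent with the sample result."""
    if exceptions >= sample_size:
        return 1.0
    risk = 1 - confidence
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The CDF falls as the rate rises; find where it crosses the risk level.
        if binom_cdf(exceptions, sample_size, mid) > risk:
            lo = mid
        else:
            hi = mid
    return hi

for found in (0, 1):
    print(found, round(upper_error_limit(59, found) * 100, 1))
```

For a sample of 59 this gives a limit just under 5 percent with no exceptions and roughly 8 percent with one exception, in line with the figures quoted above.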

Working paper structure and content

A working paper for a sampled audit procedure must contain enough information that a qualified reviewer who was not present can reconstruct exactly what was done and understand why the conclusion follows from the evidence. That standard, sometimes called the knowledgeable third party test, is the benchmark used by ISO 27001 certification bodies, AICPA peer reviewers, and PCI DSS qualified security assessors when reviewing audit documentation.

Header information: audit name, control or objective being tested, audit period, auditor name, date of testing, and reviewer name and date.
Population definition: the source of the population (which system, which log file, which date range), the total population count, and any items excluded and why.
Sampling plan: method chosen (statistical or judgement), confidence level and tolerable rate if statistical, selection rationale if judgement, and sample size with the basis for that size.
Test procedure: a step-by-step description of what was done to each selected item and what evidence was gathered.
Results: a listing of items tested with a pass/fail or exception note for each, the total exception count, and any follow-up on individual exceptions.
Conclusion: the auditor's assessment of control effectiveness based on the results, cross-referenced to the audit criteria (such as the specific ISO 27001 Annex A control or NIST CSF subcategory).
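The six sections listed above map naturally onto a structured record, which makes it easy to check that nothing has been left blank before review. The sketch below is one possible layout; every class name, field name, and sample value is hypothetical and does not reflect the schema of any particular working-paper tool.

```python
from dataclasses import dataclass, field, fields

@dataclass
class ItemResult:
    item_id: str
    passed: bool
    note: str = ""

@dataclass
class WorkingPaper:
    # Header information
    audit_name: str = ""
    control_tested: str = ""
    audit_period: str = ""
    auditor: str = ""
    test_date: str = ""
    reviewer: str = ""
    review_date: str = ""
    # Population definition
    population_source: str = ""
    population_count: int = 0
    exclusions: str = ""
    # Sampling plan
    sampling_method: str = ""
    sampling_basis: str = ""
    sample_size: int = 0
    # Test procedure
    test_steps: list = field(default_factory=list)
    # Results
    results: list = field(default_factory=list)
    # Conclusion
    conclusion: str = ""
    criteria_reference: str = ""

    def exception_count(self):
        return sum(1 for r in self.results if not r.passed)

    def missing_fields(self):
        """Names of fields still empty, so gaps are visible before review."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

wp = WorkingPaper(
    audit_name="Example ISMS internal audit",
    control_tested="Quarterly user access review",
    audit_period="2025-01-01 to 2025-12-31",
    auditor="A. Auditor",
    test_date="2026-02-10",
    population_source="Identity governance export",
    population_count=1188,
    exclusions="12 test accounts removed",
    sampling_method="statistical (attribute)",
    sampling_basis="95% confidence, 5% tolerable rate, 0% expected",
    sample_size=59,
    test_steps=["Open record", "Compare approval timestamp to review date"],
    results=[ItemResult("A-0001", True), ItemResult("A-0002", False, "late approval")],
    criteria_reference="ISO/IEC 27001 Annex A (access control)",
)
print(wp.exception_count())   # 1
print(wp.missing_fields())    # ['reviewer', 'review_date', 'conclusion']
```

The empty reviewer and conclusion fields in the example show how an unfinished paper would be flagged before it is filed.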

Each working paper must be cross-indexed to the overall audit file so that a reader can trace every finding in the report back to the specific working paper and from there to the original evidence. Electronic working paper tools such as TeamMate+, AuditBoard, or IBM OpenPages maintain this indexing automatically. Manual paper files require the auditor to apply consistent referencing codes.

Exceptions, escalation, and conclusions

An exception is any sampled item that does not meet the audit criterion. Finding an exception does not automatically mean reporting a finding: the auditor must determine whether the exception is isolated (an anomaly in an otherwise effective control) or systematic (a pattern indicating the control is not operating as designed). This determination requires investigation: reviewing the circumstances of the exception, checking whether it was already known and reported by management, and, where time permits, testing additional items from the same period.

When the upper error limit exceeds the tolerable deviation rate, the auditor's options are to expand the sample (which may reduce the UEL if no further exceptions are found), to treat the control as not effective and report accordingly, or to identify compensating controls that reduce the risk despite the failure of this specific control. The last option requires additional testing of the compensating control and careful documentation of the reliance placed on it.

Retention requirements and regulatory context

Working paper retention requirements differ by framework. ISO 27001 clause 9.2 requires the organisation to retain documented information as evidence of the audit programme and results but does not specify a minimum period. Most certification bodies expect at least three years of audit records so that surveillance audits and recertification audits can be compared against prior cycles.

Framework	Minimum retention	Authority
ISO 27001 ISMS audit	Not specified; typically 3 to 5 years in practice	ISO/IEC 27001:2022 clause 9.2
SOC 2 (AICPA)	5 years	AICPA AU-C section 230
PCI DSS v4.0	12 months of audit log history (most recent 3 months immediately available)	PCI DSS Requirement 10.5.1
HIPAA (US)	6 years from creation or last effective date	45 CFR § 164.530(j)
GDPR (EU)	Duration of processing plus applicable limitation period	GDPR Article 5(2) accountability principle

India's Digital Personal Data Protection Act 2023 (DPDP Act) requires Data Fiduciaries to implement and demonstrate reasonable security safeguards. The Act does not prescribe audit retention periods directly, but the accountability principle embedded in the Act means that organisations must be able to produce evidence of compliance on request from the Data Protection Board. In practice, organisations subject to the DPDP Act are aligning their audit retention with a minimum of three years, consistent with the anticipated enforcement cycle.

In the United Kingdom, the ICO's Accountability Framework expects organisations to maintain records of their data protection impact assessments and security reviews. In the United States, sector-specific rules apply: HIPAA requires six years, the Gramm-Leach-Bliley Act's Safeguards Rule requires organisations to retain the records of their information security programme, and the FTC has taken enforcement action against organisations that could not produce audit documentation. The general principle across jurisdictions is that the retention period should exceed the longest plausible enforcement limitation period.
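The principle of keeping records for the longest applicable period can be turned into a simple date calculation. The sketch below is illustrative only: the month counts are taken from the fixed periods in the retention table, the 36-month ISO 27001 entry is an assumed organisational policy rather than a requirement of the standard, and counting from the report date is also an assumption that should be checked against each framework.

```python
from datetime import date

# Fixed periods from the retention table, in months. The ISO 27001 value is an
# assumed internal policy, because the standard does not prescribe a period.
RETENTION_MONTHS = {
    "SOC 2": 60,
    "PCI DSS": 12,
    "HIPAA": 72,
    "ISO 27001 (policy assumption)": 36,
}

def add_months(start, months):
    """Add calendar months, clamping the day for shorter months."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    for day in range(start.day, 0, -1):
        try:
            return date(year, month, day)
        except ValueError:
            continue  # e.g. 29 February in a non-leap year

def retain_until(report_date, frameworks):
    """Earliest disposal date that satisfies every listed framework."""
    longest = max(RETENTION_MONTHS[name] for name in frameworks)
    return add_months(report_date, longest)

print(retain_until(date(2026, 6, 24), ["SOC 2", "PCI DSS"]))  # 2031-06-24
```

Frameworks without a fixed period, such as GDPR in the table, still need a documented judgement and cannot be reduced to a lookup of this kind.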

Worked example

Sampling an access-review control for an ISO 27001 surveillance audit

An internal auditor is testing whether the quarterly user access review control is operating effectively across a population of 1,200 access review records from the preceding twelve months.

The control requires that every access review record has a documented supervisor approval within five business days of the review date. The auditor plans a statistical attribute sample at 95 percent confidence with a tolerable deviation rate of 5 percent and no expected errors, producing a required sample size of 59 items. The following steps document the procedure and conclusion.

Define the population. Export all access review records created between 1 January and 31 December from the identity governance system. Verify the export contains 1,200 records. Exclude 12 records flagged as test accounts, leaving a testable population of 1,188 items. Document the exclusion rationale.
Generate the random sample. Use a spreadsheet RAND() function to assign a random number to each of the 1,188 records, paste the results as fixed values so they do not recalculate, sort ascending, and select the first 59. Record the full list of selected record IDs in the working paper.
Execute the test. For each of the 59 records, open the record in the identity governance tool, locate the supervisor approval field and the approval timestamp, and compare the timestamp to the review date. Mark the record as a pass if approval occurred within five business days, or an exception if it did not. Capture a screenshot or export each record as evidence attached to the working paper.
Evaluate exceptions. Two exceptions are found: record IDs 0447 and 1102 show approvals on days 7 and 9 respectively. The auditor investigates both: record 0447 was approved late because the supervisor was on annual leave and no backup approver was assigned; record 1102 was a system error that prevented the approval workflow from triggering on time. Both are genuine deviations from the control procedure.
Calculate the upper error limit. With 2 exceptions in 59 items at 95 percent confidence, the UEL is approximately 10.3 percent, well above the tolerable rate of 5 percent. The auditor cannot conclude that the control is operating effectively.
Expand the sample and conclude. The auditor adds a further 40 items, selected at random from the records not already tested. No additional exceptions are found. The revised UEL with 2 exceptions in 99 items is approximately 6.2 percent, still above tolerance. The auditor documents a finding: the quarterly access review approval control is not operating with sufficient consistency; the upper error limit exceeds the 5 percent tolerable threshold. The finding is reported with two documented root causes and a recommendation to establish a formal backup approver policy and fix the workflow trigger.
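The per-record test in the example (approval within five business days of the review date) is straightforward to automate once the records are exported. The sketch below is a simplified illustration: the record IDs and dates are invented, the field names are hypothetical, and public holidays are ignored, which a real test would need to handle.

```python
from datetime import date, timedelta

def business_days_between(start, end):
    """Count weekdays after `start` up to and including `end` (no holidays)."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days += 1
    return days

def evaluate_record(record, limit_days=5):
    """Return (outcome, note) for one sampled access review record."""
    approved = record["approved_on"]
    if approved is None:
        return "exception", "no approval recorded"
    elapsed = business_days_between(record["review_date"], approved)
    outcome = "pass" if elapsed <= limit_days else "exception"
    return outcome, f"approved after {elapsed} business days"

# Invented records for illustration; 3 March 2025 is a Monday.
sampled = [
    {"id": "A-0001", "review_date": date(2025, 3, 3), "approved_on": date(2025, 3, 7)},
    {"id": "A-0002", "review_date": date(2025, 3, 3), "approved_on": date(2025, 3, 12)},
    {"id": "A-0003", "review_date": date(2025, 3, 3), "approved_on": None},
]
for rec in sampled:
    print(rec["id"], *evaluate_record(rec))
# A-0001 pass approved after 4 business days
# A-0002 exception approved after 7 business days
# A-0003 exception no approval recorded
```

The outcome and note for each item can be copied straight into the results section of the working paper alongside the captured evidence.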

Check your understanding


An auditor tests a sample of 60 firewall change records and finds zero exceptions. The tolerable deviation rate was set at 5 percent and the confidence level was 95 percent. What can the auditor conclude?

Key Takeaways

Statistical sampling uses random selection and probability theory to produce a projectable conclusion about an entire population; judgement sampling targets high-risk items but its results cannot be projected and must be clearly qualified in the audit report.
Sample size for attribute sampling is set by three inputs: confidence level, tolerable deviation rate, and expected deviation rate. Population size has almost no effect once the population exceeds a few hundred items.
A working paper must satisfy the knowledgeable third-party test: a qualified reviewer who was not present must be able to reconstruct what was done and understand why the conclusion follows from the evidence.
When the upper error limit exceeds the tolerable deviation rate, the auditor cannot conclude the control is effective. The options are to expand the sample, identify compensating controls, or report a finding, each of which must be fully documented.
Retention requirements vary by framework: SOC 2 requires five years, HIPAA requires six years, PCI DSS requires twelve months with the three most recent immediately available, and ISO 27001 does not prescribe a period. India's DPDP Act 2023 and the EU's GDPR both require organisations to retain sufficient evidence to demonstrate accountability on regulatory demand.

What is the difference between statistical and judgement sampling in security audits?

Statistical sampling uses probability theory to select items and to calculate the maximum error rate that could exist in the population at a defined confidence level. Every item in the population has a known, nonzero chance of selection, so the results can be projected to the whole population. Judgement sampling relies on the auditor's professional experience to choose items considered most likely to contain errors, such as high-value transactions or recently changed configurations. Judgement sampling cannot be statistically projected but is faster and is often used when the population is small or when targeted testing of high-risk items is the goal.

How do auditors decide how many items to sample?

Sample size depends on three factors: the desired confidence level (typically 90 to 95 percent for security audits), the tolerable error rate the auditor is willing to accept without qualifying the opinion, and the expected error rate based on prior audits or risk assessment. Higher confidence and lower tolerable error both increase the required sample size. For attribute sampling of controls, published tables or formulas from AICPA or ISACA guidance translate these three inputs into a specific sample count. For a 95 percent confidence level and a tolerable deviation rate of 5 percent with no expected errors, the standard tables produce a sample size of 59 items, often rounded up to 60.

What must audit working papers contain?

Working papers must document the audit objective, the population tested and how it was defined, the sampling method used and why it was chosen, the items selected and the results of testing each item, exceptions found and how they were resolved, and the auditor's conclusion. They should be sufficiently detailed that a qualified reviewer who was not present during the audit can understand exactly what was done and why the conclusion follows from the evidence. In regulated environments such as SOC 2, ISO 27001, or PCI DSS, working papers are subject to review by the certification body or qualified security assessor.

How long must audit working papers be retained?

Retention requirements vary by framework and jurisdiction. ISO 27001 requires documented information to be retained as evidence of the ISMS audit programme but does not specify a minimum period; most organisations apply three to five years. SOC 2 engagements follow AICPA standards, which require a minimum of five years. PCI DSS requires audit log history to be retained for at least twelve months, with the most recent three months immediately available for analysis. In the European Union, GDPR-related audit records may need to be kept for the duration of any associated data processing activity plus any applicable limitation period for enforcement action. India's Digital Personal Data Protection Act 2023 does not yet prescribe specific audit retention periods, but organisations subject to it are expected to maintain records sufficient to demonstrate compliance.

What happens when a sample reveals an error rate above the tolerable level?

When the upper error limit calculated from the sample exceeds the tolerable deviation rate, the auditor cannot conclude that the control is operating effectively. The typical response is to expand the sample to get a more precise estimate, investigate the root cause of the exceptions found, and assess whether the exceptions indicate a systematic failure or isolated incidents. If expanded testing confirms the high error rate, the auditor reports the control deficiency. Depending on severity, this becomes a finding, a significant deficiency, or a material weakness in the audit report. The auditee is expected to propose a corrective action, and a follow-up test may be scheduled for the next audit cycle.
