Audit Sampling Techniques and Working Papers
Audit sampling lets a security auditor draw conclusions about an entire population of controls or transactions by examining only a subset of items. Working papers are the documented record that connects each sample to the audit opinion and provides the evidence base for any challenge, regulatory review, or future audit cycle.
Audit sampling is the practice of selecting and testing a subset of items from a larger population and using the results to form a conclusion about the whole population. In a security audit, the population might be access-review logs, firewall rule change tickets, patch deployment records, or encryption key rotation events. Because testing every item is rarely feasible in a large organisation, auditors apply either statistical methods, which let them project results with a calculable confidence level, or judgement-based methods, which rely on professional experience to target high-risk items. The choice of method determines how findings can be stated: statistical samples produce projectable conclusions while judgement samples support directional observations only.
Working papers are the audit team's documented record of every step taken from planning through conclusion. They capture the population definition, sampling rationale, test procedures, evidence obtained, exceptions noted, and the auditor's reasoning. Working papers are not optional documentation: they are the evidentiary basis for the audit opinion, the defence against challenge by a regulator or certification body, and the starting point for every future audit of the same control area. Frameworks including ISO 27001, SOC 2, and PCI DSS each specify what documentation must exist and how long it must be retained.
The discipline of sampling and working-paper management applies to every major compliance regime. A qualified security assessor testing PCI DSS controls, an ISO 27001 internal auditor reviewing ISMS effectiveness, a certified information systems auditor preparing a SOC 2 report, and an information commissioner inspecting GDPR records-management controls all follow the same underlying logic: define the population, choose a defensible selection method, test a sufficient number of items, document everything, and form a supportable conclusion. The procedural differences between regimes are variations on that shared structure.
By the end of this topic you will be able to:
- Distinguish statistical from judgement sampling and state when each is appropriate in a security audit context.
- Calculate or look up a sample size for attribute sampling given a confidence level, tolerable deviation rate, and expected error rate.
- Identify the required sections of an audit working paper and explain why each section is necessary.
- Apply the correct response when a sample produces an exception rate above the tolerable level.
- State the working-paper retention requirements under ISO 27001, SOC 2, and PCI DSS, and explain how GDPR and India's DPDP Act 2023 affect retention decisions.
Key terms:
- Attribute sampling: A statistical sampling method that tests whether each selected item either has or lacks a specified attribute, for example whether a change ticket has an approved change-management record attached. Results are expressed as a deviation rate and compared against the tolerable deviation rate.
- Confidence level: The probability that the sample result correctly reflects the true population characteristic. Expressed as a percentage, typically 90 to 95 percent in security audit work. A higher confidence level requires a larger sample for the same tolerable error rate.
- Tolerable deviation rate: The maximum error rate the auditor is willing to accept in the population without modifying the audit conclusion. If the projected error rate from the sample exceeds this threshold, the control cannot be assessed as effective.
- Judgement sampling: Selection of audit items based on the auditor's professional assessment of where errors or weaknesses are most likely to exist. Results cannot be statistically projected to the full population but are useful for targeted testing of high-risk areas or small populations.
- Working paper: The documented record of an audit procedure: what was tested, how items were selected, the evidence obtained, any exceptions found, and the auditor's conclusion. Working papers are the primary evidence that an audit was conducted as described and form the basis for the audit opinion.
- Upper error limit (UEL): The maximum error rate that could exist in the population at the chosen confidence level, given the number of exceptions found in the sample. If the UEL exceeds the tolerable deviation rate, the auditor cannot support a clean opinion on that control.
Statistical sampling: principles and methods
Statistical sampling gives every item in a defined population a known, nonzero probability of selection. Because selection is random and the sample size is calculated from probability theory, the auditor can make a statement of the form: we are 95 percent confident that the deviation rate in the population does not exceed 6 percent. That kind of precision is required by some frameworks and expected by most certification bodies reviewing SOC 2 or ISO 27001 audit evidence.
The most common statistical method in security auditing is attribute sampling, which tests each item for the presence or absence of a control characteristic. A random sample of 60 access-review records is tested: each record either has a documented supervisor approval or it does not. The number of deviations found is counted, the upper error limit is calculated from published tables, and that limit is compared to the tolerable deviation rate set during planning.
| Method | Population type | Typical use in security audit | Output |
|---|---|---|---|
| Attribute sampling | Any discrete yes/no characteristic | Control operating effectiveness (does each change have an approval?) | Deviation rate + upper error limit |
| Variables sampling | Populations with measured values | Financial-system access amounts, data volumes processed | Mean estimate with confidence interval |
| Discovery sampling | Low-frequency high-risk events | Looking for at least one instance of an unencrypted backup or unapproved admin account | Whether at least one exception is likely to exist |
| Stratified sampling | Heterogeneous populations | Splitting access logs by user type before drawing separate random samples | Separate estimates per stratum |
Random number generation for selection can use a random number table, a spreadsheet RAND() function, or purpose-built audit software such as ACL, IDEA, or the open-source equivalents. The selection method must be documented in the working paper. Systematic sampling (selecting every nth item) is sometimes used as a practical substitute for pure random sampling but carries a risk if the population has a periodic pattern that aligns with the interval.
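The two selection approaches described above can be sketched in a few lines of Python. This is a minimal illustration, not a substitute for audit software: the function names, the seed value, and the population figures are assumptions chosen for the example. Recording the seed in the working paper is one way to let a reviewer reproduce the selection.

```python
import random

def select_random(population_size, sample_size, seed):
    """Simple random selection without replacement.

    Returns sorted 1-based item numbers. The seed is hypothetical and
    would be recorded in the working paper so the draw can be repeated.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

def select_systematic(population_size, sample_size, seed):
    """Systematic selection: every nth item after a random start."""
    interval = population_size // sample_size
    if interval < 1:
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    start = rng.randint(1, interval)  # random start inside the first interval
    return [start + i * interval for i in range(sample_size)]

# Example: 60 items from a population of 4,000 change tickets.
random_picks = select_random(4000, 60, seed=20240115)
systematic_picks = select_systematic(4000, 60, seed=20240115)
```

If the population has a periodic pattern (for example, one ticket type always logged at a fixed position in each batch), the fixed interval in `select_systematic` can repeatedly hit or miss that pattern, which is the risk noted above.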
Judgement sampling: when and how
Judgement sampling is appropriate when the population is small enough that statistical projection adds no practical value, when testing is targeted at specific high-risk items rather than the population as a whole, or when time and budget constraints make a full statistical approach impractical. It is also the default when the auditor's goal is to find at least one example of a specific failure pattern rather than to estimate an overall error rate.
Common judgement selection criteria in security audits include: items with the highest value or sensitivity, items processed immediately after a system change or personnel change, items from a period when a known control weakness existed, and items selected to cover the full range of transaction types or system components. A firewall rule audit might use judgement to select the 30 most recently added rules, the 10 rules with the widest open ports, and all rules touching a PCI DSS in-scope segment, rather than a random sample of all 4,000 rules.
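The firewall example can be expressed as a small selection routine that also records why each rule was chosen, since the selection rationale has to appear in the working paper. This is a sketch under assumptions: the `FirewallRule` fields and the `judgement_select` function are hypothetical and do not correspond to any particular firewall export format.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class FirewallRule:
    # Hypothetical fields; a real rule export will look different.
    rule_id: str
    added_on: date
    open_port_count: int
    touches_pci_segment: bool

def judgement_select(rules, recent_n=30, widest_n=10):
    """Return {rule_id: [reasons]} for rules picked by judgement criteria."""
    reasons = {}
    most_recent = sorted(rules, key=lambda r: r.added_on, reverse=True)[:recent_n]
    widest = sorted(rules, key=lambda r: r.open_port_count, reverse=True)[:widest_n]
    for rule in most_recent:
        reasons.setdefault(rule.rule_id, []).append("recently added")
    for rule in widest:
        reasons.setdefault(rule.rule_id, []).append("widest open ports")
    for rule in rules:
        if rule.touches_pci_segment:
            reasons.setdefault(rule.rule_id, []).append("PCI DSS in-scope segment")
    return reasons
```

A rule that meets more than one criterion appears once with several reasons, so the sample can be smaller than the sum of the three groups. Results from this kind of selection describe the selected rules only and are not projected to the full rule base.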
Sample size determination
For attribute sampling, three inputs drive sample size: the confidence level, the tolerable deviation rate, and the expected deviation rate. A higher confidence level requires a larger sample. A lower tolerable deviation rate requires a larger sample. A higher expected deviation rate also requires a larger sample because the auditor needs more data to distinguish systematic failure from random variation. ISACA's IT Audit Framework and the AICPA's audit sampling guide both publish tables that translate these three inputs into sample sizes without requiring the auditor to compute the underlying hypergeometric or Poisson formula directly.
| Confidence level | Tolerable deviation rate | Expected deviation rate | Approximate sample size |
|---|---|---|---|
| 90% | 10% | 0% | 22 |
| 95% | 10% | 0% | 29 |
| 95% | 5% | 0% | 59 |
| 95% | 5% | 1% | 93 |
| 95% | 3% | 0% | 99 |
| 99% | 5% | 0% | 90 |
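For readers who want to see where figures like these come from, the sketch below derives sample sizes from the binomial distribution: it finds the smallest sample for which observing no more than an allowed number of deviations would be unlikely if the true rate equalled the tolerable rate. The function names are illustrative, and the mapping from an expected deviation rate to an allowed number of deviations (one deviation for the 1 percent row) is an assumption about how published tables are built. Published tables may differ slightly because of rounding or a different underlying distribution.

```python
import math

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def sample_size(confidence, tolerable_rate, allowed_deviations=0):
    """Smallest n such that finding at most `allowed_deviations` deviations
    has probability <= (1 - confidence) when the true rate is the tolerable rate."""
    risk = 1 - confidence
    n = allowed_deviations + 1
    while binomial_cdf(allowed_deviations, n, tolerable_rate) > risk:
        n += 1
    return n

print(sample_size(0.95, 0.10))     # 29
print(sample_size(0.95, 0.05))     # 59
print(sample_size(0.95, 0.05, 1))  # 93 (one deviation allowed)
print(sample_size(0.95, 0.03))     # 99
print(sample_size(0.99, 0.05))     # 90
```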
Population size has almost no effect on sample size once the population exceeds about 500 items. This is counterintuitive but mathematically sound: a random sample of 59 items gives essentially the same level of assurance whether the population is 1,000 or 1,000,000 items. Population size makes a material difference only when the population is small (under roughly 200 items), in which case the finite population correction factor reduces the required sample.
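The effect of population size can be checked with the hypergeometric distribution, which models sampling without replacement from a finite population. The sketch below covers the zero-expected-deviation case only; the function name is illustrative, and treating the tolerable rate as a whole number of deviant items (rounded up) is a simplifying assumption.

```python
import math

def sample_size_finite(population, confidence, tolerable_rate):
    """Smallest n such that a sample with zero deviations would occur with
    probability <= (1 - confidence) if the population held deviant items
    at the tolerable rate. Sampling is without replacement."""
    risk = 1 - confidence
    deviant = math.ceil(round(tolerable_rate * population, 9))
    clean = population - deviant
    p_zero = 1.0  # probability that every item drawn so far is clean
    for n in range(1, population + 1):
        # chance the nth draw is clean, given the previous n - 1 were clean
        p_zero *= (clean - (n - 1)) / (population - (n - 1))
        if p_zero <= risk:
            return n
    return population

small = sample_size_finite(100, 0.95, 0.05)
medium = sample_size_finite(1_000, 0.95, 0.05)
large = sample_size_finite(1_000_000, 0.95, 0.05)
print(small, medium, large)
```

Under these assumptions the required sample for a population of 100 is noticeably smaller than for 1,000, while the results for 1,000 and 1,000,000 differ by only a couple of items.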
After selecting and testing the sample, the auditor calculates the upper error limit using published tables or software. If zero exceptions are found in a sample of 59 at 95 percent confidence and a 5 percent tolerable rate, the UEL is approximately 5 percent, which just meets the threshold. If one exception is found, the UEL rises to approximately 8 percent, exceeding the tolerable rate and requiring the auditor to expand testing or report a finding.
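The upper error limit can also be computed directly rather than looked up. The sketch below uses an exact one-sided binomial bound found by bisection, which is one common way to obtain it; the function names are illustrative, and published tables or audit software may round differently or use another distribution.

```python
import math

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def upper_error_limit(sample_size, exceptions, confidence):
    """Largest population deviation rate still consistent with the sample:
    the rate p at which P(X <= exceptions) falls to (1 - confidence)."""
    risk = 1 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; the CDF decreases as p increases
        mid = (lo + hi) / 2
        if binomial_cdf(exceptions, sample_size, mid) > risk:
            lo = mid  # this rate is still plausible, so the limit is higher
        else:
            hi = mid
    return hi

tolerable = 0.05
for found in (0, 1):
    uel = upper_error_limit(59, found, 0.95)
    verdict = "within tolerable rate" if uel <= tolerable else "exceeds tolerable rate"
    print(f"{found} exception(s): UEL = {uel:.1%} ({verdict})")
```

For a sample of 59 at 95 percent confidence this gives roughly 5 percent with zero exceptions and roughly 8 percent with one, in line with the figures above.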
Working paper structure and content
A working paper for a sampled audit procedure must contain enough information that a qualified reviewer who was not present can reconstruct exactly what was done and understand why the conclusion follows from the evidence. That standard, sometimes called the knowledgeable third-party test, is the benchmark used by ISO 27001 certification bodies, AICPA peer reviewers, and PCI DSS qualified security assessors when reviewing audit documentation.
- Header information: audit name, control or objective being tested, audit period, auditor name, date of testing, and reviewer name and date.
- Population definition: the source of the population (which system, which log file, which date range), the total population count, and any items excluded and why.
- Sampling plan: method chosen (statistical or judgement), confidence level and tolerable rate if statistical, selection rationale if judgement, and sample size with the basis for that size.
- Test procedure: a step-by-step description of what was done to each selected item and what evidence was gathered.
- Results: a listing of items tested with a pass/fail or exception note for each, the total exception count, and any follow-up on individual exceptions.
- Conclusion: the auditor's assessment of control effectiveness based on the results, cross-referenced to the audit criteria (such as the specific ISO 27001 Annex A control or NIST CSF subcategory).
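The six sections listed above lend themselves to a simple completeness check before a working paper goes to review. The sketch below is one possible representation, not a format mandated by any framework: the `WorkingPaper` class, its field types, and the `missing_sections` helper are all hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class WorkingPaper:
    # One field per section in the list above; types are illustrative.
    header: dict                 # audit name, control, period, auditor, reviewer, dates
    population_definition: str   # source, total count, exclusions and reasons
    sampling_plan: str           # method, parameters or rationale, sample size
    test_procedure: str          # steps performed on each selected item
    results: list                # one pass/fail or exception note per item
    conclusion: str              # assessment, cross-referenced to audit criteria

def missing_sections(paper):
    """Return the names of sections that are empty or not yet filled in."""
    return [f.name for f in fields(paper) if not getattr(paper, f.name)]

draft = WorkingPaper(
    header={"audit": "Change management", "auditor": "A. Auditor"},
    population_definition="Change tickets, 1 Jan to 31 Dec, 4,000 items",
    sampling_plan="Attribute sampling, 95% confidence, 5% tolerable rate, n=59",
    test_procedure="",
    results=[],
    conclusion="",
)
print(missing_sections(draft))  # ['test_procedure', 'results', 'conclusion']
```

A check like this confirms only that each section is present. Whether the content meets the reviewer's standard still depends on the auditor's judgement.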
Each working paper must be cross-indexed to the overall audit file so that a reader can trace each finding in the report back to the specific working paper and from there to the original evidence. Electronic working paper tools such as TeamMate+, AuditBoard, or IBM OpenPages maintain this indexing automatically. Manual paper files require the auditor to apply consistent referencing codes.
Exceptions, escalation, and conclusions
An exception is any sampled item that does not meet the audit criterion. Finding an exception does not automatically mean reporting a finding: the auditor must determine whether the exception is isolated (an anomaly in an otherwise effective control) or systematic (a pattern indicating the control is not operating as designed). This determination requires investigation: reviewing the circumstances of the exception, checking whether it was already known and reported by management, and, where time permits, testing additional items from the same period.
When the upper error limit exceeds the tolerable deviation rate, the auditor has three options: expand the sample (which may reduce the UEL if no further exceptions are found), treat the control as not effective and report accordingly, or identify compensating controls that reduce the risk despite the failure of this specific control. The last option requires additional testing of the compensating control and careful documentation of the reliance placed on it.
Retention requirements and regulatory context
Working paper retention requirements differ by framework. ISO 27001 clause 9.2 requires the organisation to retain documented information as evidence of the audit programme and results but does not specify a minimum period. Most certification bodies expect at least three years of audit records so that surveillance audits and recertification audits can be compared against prior cycles.
| Framework | Minimum retention | Authority |
|---|---|---|
| ISO 27001 ISMS audit | Not specified; typically 3 to 5 years in practice | ISO/IEC 27001:2022 clause 9.2 |
| SOC 2 (AICPA) | 5 years | AICPA AU-C section 230 |
| PCI DSS v4.0 | 12 months for audit log history (most recent 3 months immediately available) | PCI DSS Requirement 10.5.1 |
| HIPAA (US) | 6 years from creation or last effective date | 45 CFR § 164.530(j) |
| GDPR (EU) | Duration of processing plus applicable limitation period | GDPR Article 5(2) accountability principle |
India's Digital Personal Data Protection Act 2023 (DPDP Act) requires Data Fiduciaries to implement and demonstrate reasonable security safeguards. The Act does not prescribe audit retention periods directly, but the accountability principle embedded in the Act means that organisations must be able to produce evidence of compliance on request from the Data Protection Board. In practice, organisations subject to the DPDP Act are aligning their audit retention with a minimum of three years, consistent with the anticipated enforcement cycle.
In the United Kingdom, the ICO's Accountability Framework expects organisations to maintain records of their data protection impact assessments and security reviews. In the United States, sector-specific rules apply: HIPAA requires six years, the Gramm-Leach-Bliley Act's Safeguards Rule requires organisations to retain the records of their information security programme, and the FTC has taken enforcement action against organisations that could not produce audit documentation. The general principle across jurisdictions is that the retention period should exceed the longest plausible enforcement limitation period.
An auditor tests a sample of 60 firewall change records and finds zero exceptions. The tolerable deviation rate was set at 5 percent and the confidence level was 95 percent. What can the auditor conclude?
Key Takeaways
- Statistical sampling uses random selection and probability theory to produce a projectable conclusion about an entire population; judgement sampling targets high-risk items but its results cannot be projected and must be clearly qualified in the audit report.
- Sample size for attribute sampling is set by three inputs: confidence level, tolerable deviation rate, and expected deviation rate. Population size has almost no effect once the population exceeds a few hundred items.
- A working paper must satisfy the knowledgeable third-party test: a qualified reviewer who was not present must be able to reconstruct what was done and understand why the conclusion follows from the evidence.
- When the upper error limit exceeds the tolerable deviation rate, the auditor cannot conclude the control is effective. The options are to expand the sample, identify compensating controls, or report a finding, each of which must be fully documented.
- Retention requirements vary by framework: SOC 2 requires five years, HIPAA requires six years, PCI DSS requires twelve months with the most recent three months immediately available, and ISO 27001 does not prescribe a period. India's DPDP Act 2023 and the EU's GDPR both require organisations to retain sufficient evidence to demonstrate accountability on regulatory demand.