Triage and Incident Prioritisation
Incident triage is the structured process of determining whether an alert represents a genuine security incident and, if so, how urgently it requires a response. This topic covers severity matrices, business-impact scoring, false-positive management, and the analyst workflows that convert raw alert data into prioritised incident queues.
Incident triage is the decision process that determines whether a detection alert represents a genuine security incident and, if confirmed, assigns it a severity level that dictates the speed and scale of the response. Without triage, every alert would demand the same level of attention, which is operationally impossible in environments that generate thousands of alerts per day. Triage applies structured criteria, including technical impact indicators, asset criticality scores, and business-impact weighting, to convert a raw alert queue into a prioritised incident list that analysts can work through systematically.
The challenge triage solves is asymmetric: the number of alerts a modern security stack generates greatly exceeds the capacity of any human team to investigate each one from scratch. Detection tools flag everything from confirmed malware infections to routine patch-management noise. A triage process that works well separates the genuine threats quickly, assigns them to the right response track, and closes or suppresses false positives in a documented way so the team is not re-investigating the same benign events every week.
Triage sits at the boundary between the detection phase and the containment phase of the incident response lifecycle. Both NIST SP 800-61 (Revision 2) and the SANS PICERL model place triage in the phase that follows preparation, which NIST calls Detection and Analysis and SANS calls Identification, and both emphasise that the output of triage must be a documented severity assignment, not just an informal analyst judgment. A well-designed triage process also feeds back into detection tuning: false positives that are properly documented reveal which detection rules are producing noise, giving the team the data it needs to improve signal quality over time.
By the end of this topic you will be able to:
- Explain the purpose of incident triage and distinguish between an alert, a true positive, and a confirmed incident.
- Describe how a severity matrix combines technical impact and business impact to produce a prioritised severity level.
- Apply asset criticality weighting to adjust the severity of an alert based on the value of the affected system.
- Identify the causes and consequences of excessive false positive rates and describe strategies for managing them without sacrificing detection coverage.
- Describe the escalation path for alerts that cannot be immediately classified and explain why unclassified alerts must not be auto-closed.
- Triage: The structured process of evaluating an alert to determine whether it is a genuine security incident and, if so, what severity level it should be assigned. The term is borrowed from emergency medicine, where it describes the sorting of patients by urgency.
- Severity matrix: A two-dimensional scoring tool that combines technical impact and business impact to assign a severity level to a confirmed incident. Outputs are typically a four-level scale: critical, high, medium, and low (or equivalent numerals). The matrix makes prioritisation consistent and defensible across different analysts.
- False positive: An alert that fires on a benign event and does not represent a real security incident. High false positive rates are a primary cause of alert fatigue. Documented false positives feed detection-rule tuning to reduce future noise.
- Asset criticality: A pre-assigned score or label that records how important a system, service, or data set is to the organisation. Used during triage to weight the severity of an incident: the same attack pattern on a critical asset receives a higher severity than the same pattern on a low-value host.
- Alert fatigue: A condition in which analysts are desensitised to alerts because the volume or false positive rate is too high to investigate thoroughly. Alert fatigue is a significant contributing factor in incidents where genuine attacks were detected but not escalated in time.
- Escalation threshold: A defined criterion, based on severity level, asset type, or indicator type, that triggers handoff of an alert from a first-tier analyst to a more senior analyst or to a specialist team. Escalation thresholds are defined in the incident response plan and should be documented rather than left to analyst discretion.
From alert to incident: the classification decision
Every detection tool, whether a SIEM correlation rule, an endpoint detection and response (EDR) agent, or a network intrusion detection system, generates alerts. An alert is not an incident. It is a signal that something matching a detection rule has occurred. The first task of triage is classification: deciding whether the alert represents a true positive (a real security event) or a false positive (a benign event that matched a detection rule it should not have).
Classification relies on enriching the alert with context. Raw alert data typically contains a timestamp, a source and destination IP or host, a rule name, and a severity score assigned by the detection tool. None of this is sufficient on its own. The analyst enriches the alert by pulling supporting evidence: relevant log entries, endpoint telemetry, threat intelligence lookups for IP addresses and file hashes, user account history, and asset inventory records for the affected systems. This enrichment step is where the analyst distinguishes a scanning probe from an authenticated intrusion, or a misconfigured application from a data exfiltration attempt.
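The enrichment step can be pictured as attaching context to the raw alert fields before a human makes the classification call. The following is a minimal sketch, not a real SIEM or EDR integration: `ASSET_INVENTORY`, `THREAT_INTEL`, the host names, and the field names are all hypothetical stand-ins for whatever inventory and intelligence sources an organisation actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical in-memory stand-ins for an asset inventory and a threat intel feed.
ASSET_INVENTORY = {"dc01": 1, "dev-ws-17": 3}            # host -> criticality tier
THREAT_INTEL = {"203.0.113.50": "known C2 server"}       # IP -> analyst note


@dataclass
class EnrichedAlert:
    rule_name: str
    host: str
    source_ip: str
    tool_severity: str                      # score assigned by the detection tool
    asset_tier: Optional[int] = None        # None means the host is not in the inventory
    intel_hits: List[str] = field(default_factory=list)


def enrich(raw: dict) -> EnrichedAlert:
    """Attach asset and threat-intelligence context to a raw alert."""
    alert = EnrichedAlert(
        rule_name=raw["rule_name"],
        host=raw["host"],
        source_ip=raw["source_ip"],
        tool_severity=raw["tool_severity"],
    )
    alert.asset_tier = ASSET_INVENTORY.get(alert.host)
    note = THREAT_INTEL.get(alert.source_ip)
    if note is not None:
        alert.intel_hits.append(f"{alert.source_ip}: {note}")
    return alert
```

The sketch only gathers context; deciding whether the enriched alert is a true or false positive remains an analyst judgment backed by the evidence collected.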
The classification output should be documented in the incident tracking system with the evidence examined and the reasoning used. This matters for two reasons. First, if the analyst's classification is wrong (a false negative, where a real incident is closed as benign), the documentation allows the error to be traced and the decision logic to be corrected. Second, closed false positives build a record that allows the detection engineering team to tune the rule that fired, reducing future noise.
Severity matrices: combining technical and business impact
Once an alert is classified as a true positive, it must be assigned a severity level that determines how quickly the team responds and which resources are mobilised. Severity assignment based on gut feel is inconsistent across analysts and shifts under pressure. A severity matrix makes the assignment explicit and repeatable.
A severity matrix has two axes. The first is technical impact: how much access or damage has the attacker achieved or could achieve from the confirmed foothold? Indicators include whether the compromise is limited to one endpoint or spans multiple systems, whether the attacker has privileged credentials, whether data has been exfiltrated or encrypted, and whether the attack is still in progress. The second axis is business impact: how important is the affected system to the organisation's operations, legal obligations, or reputation? This axis draws directly from the asset criticality register.
| Technical impact | Business impact | Severity | Target response time |
|---|---|---|---|
| High (active attacker, privileged access) | High (critical system or regulated data) | Critical (P1) | Immediate, within 15 minutes |
| High (active attacker, limited access) | Medium (important but not critical) | High (P2) | Within 1 hour |
| Medium (indicators of compromise, no confirmed access) | High (critical system) | High (P2) | Within 1 hour |
| Medium (indicators of compromise) | Low (non-critical system) | Medium (P3) | Within 4 hours |
| Low (failed attack, no access gained) | Any | Low (P4) | Within 24 hours |
The matrix is populated with target response times defined in the incident response plan. Those targets vary by organisation: a financial services firm subject to the Payment Card Industry Data Security Standard (PCI DSS) or the EU's Digital Operational Resilience Act (DORA) may have contractual and legal obligations to begin containment within specific windows. In India, the Information Technology (Amendment) Act 2008 and CERT-In's 2022 Cyber Security Directions mandate that certain categories of incident be reported to CERT-In within six hours of detection. In the US, the SEC's 2023 cybersecurity disclosure rules require publicly listed companies to disclose a cybersecurity incident within four business days of determining that it is material. These external timelines set a floor on how quickly triage and initial response must occur.
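The example matrix can be expressed as a simple lookup, which is one way to keep severity assignment repeatable across analysts. This sketch encodes only the five rows shown in the table; the names `MATRIX`, `Severity`, and `assign_severity` are illustrative, and combinations the table does not define (for example medium technical impact with medium business impact) deliberately return `None` so that they are referred to a human rather than guessed.

```python
from typing import NamedTuple, Optional


class Severity(NamedTuple):
    label: str
    priority: str
    response_minutes: int   # target response time from the example table


# Keys are (technical impact, business impact), copied from the example matrix.
MATRIX = {
    ("high", "high"): Severity("Critical", "P1", 15),
    ("high", "medium"): Severity("High", "P2", 60),
    ("medium", "high"): Severity("High", "P2", 60),
    ("medium", "low"): Severity("Medium", "P3", 4 * 60),
}

# Low technical impact maps to P4 regardless of business impact.
LOW_TECHNICAL = Severity("Low", "P4", 24 * 60)


def assign_severity(technical: str, business: str) -> Optional[Severity]:
    """Return the matrix severity, or None if the table has no matching row."""
    technical, business = technical.lower(), business.lower()
    if technical == "low":
        return LOW_TECHNICAL
    return MATRIX.get((technical, business))
```

A real matrix would define every combination and take its response targets from the organisation's own incident response plan.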
Asset criticality and business-impact scoring
Asset criticality is the bridge between a technical alert and a business-risk judgment. Consider two identical alerts for a suspicious PowerShell execution that fire on two different hosts. One host is a domain controller serving the entire organisation's authentication infrastructure. The other is a developer's test workstation with no network access to production systems. The technical alert is the same; the business impact is radically different. Without asset criticality data in the triage workflow, both alerts receive the same score and the domain controller may not be prioritised appropriately.
Asset criticality is assigned in advance, as part of forensic readiness and IR preparation, not during an active incident. The assignment process typically maps assets to a tiered scale (for example, Tier 1: mission-critical, Tier 2: important, Tier 3: standard) based on the answers to questions such as: Would a one-hour outage of this system halt revenue-generating operations? Does this system hold regulated personal data (for example, protected health information under HIPAA in the US, personal data under India's Digital Personal Data Protection Act 2023, or special-category data under the EU General Data Protection Regulation)? Is this system required for compliance evidence (audit logs, access records)?
The criticality label is stored in the asset inventory and should be surfaced automatically in the triage interface when an alert fires on a known asset. When an alert fires on a host not in the inventory, a conservative approach treats it as Tier 2 until the asset is identified, rather than Tier 3, because unmanaged assets are often higher risk than known ones.
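The conservative default for unknown hosts is easy to get wrong in tooling, so it is worth making explicit. This is a sketch under stated assumptions: the `ASSET_TIERS` dictionary and host names are hypothetical, and the mapping from tier to business impact (Tier 1 to high, Tier 2 to medium, Tier 3 to low) is an illustrative choice rather than something the tier scheme itself prescribes.

```python
from typing import Tuple

# Hypothetical asset inventory: host name -> criticality tier.
ASSET_TIERS = {"dc01": 1, "hr-db": 1, "build-agent-4": 2, "dev-ws-17": 3}

# Hosts missing from the inventory are treated as Tier 2 until identified.
UNKNOWN_ASSET_TIER = 2

# Assumed mapping from tier to the business-impact axis of a severity matrix.
TIER_TO_BUSINESS_IMPACT = {1: "high", 2: "medium", 3: "low"}


def criticality_tier(host: str) -> Tuple[int, bool]:
    """Return (tier, known); known is False when the default was applied."""
    tier = ASSET_TIERS.get(host.lower())
    if tier is None:
        return UNKNOWN_ASSET_TIER, False
    return tier, True


def business_impact(host: str) -> str:
    """Translate a host's tier into a business-impact label."""
    tier, _known = criticality_tier(host)
    return TIER_TO_BUSINESS_IMPACT[tier]
```

Returning the `known` flag alongside the tier lets the triage interface show the analyst that the score rests on a default and that the asset still needs to be identified.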
False positives: causes, costs, and management
A false positive is an alert that fires on a benign event. Some false positives are inevitable: any detection rule broad enough to catch all variants of a threat will also catch benign events that share surface features. The problem is not individual false positives but a sustained high false positive rate, which creates alert fatigue and degrades the team's ability to detect real incidents.
The causes of high false positive rates fall into a small number of patterns. Overly broad detection rules written without environment context, for example, alerting on any PowerShell execution rather than on PowerShell executions in contexts where PowerShell is not legitimate, generate noise across all hosts. Failure to suppress known-good baselines, such as a scheduled task that runs a benign script every night, causes the same alert to fire repeatedly. Threat intelligence feeds containing stale indicators of compromise (IOCs) that now point to reclaimed or shared infrastructure generate alerts on legitimate traffic.
Managing false positives without sacrificing detection coverage requires documentation before suppression. Before a detection rule is tuned or a suppression is added, the analyst should document the false positive in the incident tracking system, including the rule that fired, the evidence that showed it was benign, and the context (for example, this alert fires every Monday at 02:00 because of the weekly backup job on this host). Suppressions are then targeted and time-bounded or host-specific rather than global. Global suppression of a rule that fires frequently as a false positive may also suppress the same rule when it fires on a real incident.
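A targeted suppression can be modelled as a record that names one rule and one host, carries an expiry, and points back to the documented false positive. The sketch below is illustrative only: the `Suppression` fields, the rule name, and the ticket reference are hypothetical, and timestamps are assumed to be naive datetimes in a single consistent timezone.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Suppression:
    rule_name: str
    host: str             # host-specific, never a wildcard
    expires: datetime     # time-bounded: forces periodic review
    ticket: str           # reference to the documented false positive


def is_suppressed(rule_name: str, host: str, fired_at: datetime,
                  suppressions: Iterable[Suppression]) -> bool:
    """True only if an unexpired suppression matches both the rule and the host."""
    return any(
        s.rule_name == rule_name and s.host == host and fired_at < s.expires
        for s in suppressions
    )
```

Because the match requires both rule and host, the same rule still alerts on every other system, and once the expiry passes the alert resurfaces so the suppression is reviewed rather than forgotten.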
Escalation paths and tiered analyst workflows
Not all alerts can be classified by the first analyst who sees them. A tiered escalation model ensures that unresolved alerts move to more experienced analysts rather than being closed without adequate investigation. Most SOC structures use three analyst tiers. Tier 1 handles initial alert review, enrichment, and classification of routine events. Tier 2 handles complex or ambiguous cases referred by Tier 1, and conducts deeper technical analysis including log correlation and threat hunting. Tier 3 handles advanced persistent threat scenarios, forensic investigation, and cases that require specialist knowledge.
Escalation criteria should be defined in the IR plan rather than left to analyst judgment. Typical criteria include: the alert cannot be classified as true or false positive within a defined time window (commonly 30 to 60 minutes for Tier 1 analysts); the alert involves an asset rated Tier 1 (mission-critical) in the asset criticality register; the alert matches a known threat actor pattern listed in the threat intelligence feed; or the scope of affected systems appears to be expanding while investigation is in progress. When an alert meets escalation criteria, it moves to the next tier with the enrichment data already collected rather than requiring the receiving analyst to restart from scratch.
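Written down as code, the typical criteria become a checklist that returns every reason an alert should escalate, which also gives the receiving analyst the rationale for the handoff. This is a sketch under assumptions: `TriageState` and its fields are hypothetical, and the 60-minute window is simply the upper end of the range mentioned above, not a recommended value.

```python
from dataclasses import dataclass
from typing import List

# Assumed classification window for first-tier analysts, in minutes.
CLASSIFICATION_WINDOW_MINUTES = 60


@dataclass
class TriageState:
    minutes_open: int
    classified: bool
    asset_tier: int                # asset criticality tier, 1 = mission-critical
    matches_known_actor: bool      # threat-intel match on a known actor pattern
    affected_hosts_before: int
    affected_hosts_now: int


def escalation_reasons(state: TriageState) -> List[str]:
    """Return every escalation criterion the alert currently meets."""
    reasons = []
    if not state.classified and state.minutes_open >= CLASSIFICATION_WINDOW_MINUTES:
        reasons.append("unclassified after time window")
    if state.asset_tier == 1:
        reasons.append("mission-critical asset involved")
    if state.matches_known_actor:
        reasons.append("matches known threat actor pattern")
    if state.affected_hosts_now > state.affected_hosts_before:
        reasons.append("scope is expanding")
    return reasons
```

An empty list means the alert stays with the current analyst; any non-empty list triggers a handoff together with the enrichment data already collected.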
Time-to-escalate is a key triage metric. Under the EU's NIS2 Directive (which applies to essential and important entities across EU member states), an entity must submit an early warning to its CSIRT or competent authority within 24 hours of becoming aware of a significant incident. In the UK, the National Cyber Security Centre (NCSC) expects operators of essential services to report significant incidents promptly. These regulatory timelines mean that delays in the escalation path between Tier 1 detection and senior decision-making can create legal exposure as well as technical risk.
Triage playbooks and continuous improvement
A triage playbook is a documented decision procedure for a specific alert type or threat scenario. It specifies exactly which evidence to collect, which questions to ask, which thresholds trigger escalation, and which reference materials (threat intelligence, asset inventory, past incident records) the analyst should consult. Playbooks reduce the cognitive load on analysts under pressure and make triage consistent across different shifts and experience levels.
Effective playbooks are built from prior incident data. After each incident, the post-incident review should ask whether the triage process worked correctly: was the initial severity assignment accurate? Were any escalation steps delayed? Did false positive suppression miss the relevant context? The answers update the playbook for that alert type. Over time, this cycle produces playbooks that reflect the organisation's actual environment and threat history rather than generic templates.
Triage quality can be measured. Key metrics include mean time to triage (the average time between alert generation and classification), false positive rate by rule and by alert source, escalation rate (the proportion of Tier 1 alerts that escalate to Tier 2 or 3), and severity accuracy (the proportion of severity assignments confirmed correct after the full incident investigation). These metrics, reviewed regularly, identify which detection rules, asset categories, or analyst procedures need improvement and prevent triage from becoming a static process that degrades as the threat environment changes.
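The metrics above reduce to simple ratios over closed triage records. The following sketch assumes a hypothetical record format (a list of dictionaries with the keys shown in the comments) and computes overall figures only; breaking false positive rate down by rule or alert source would follow the same pattern with a group-by step.

```python
from statistics import mean
from typing import Dict, List, Optional


def triage_metrics(records: List[dict]) -> Dict[str, Optional[float]]:
    """Summarise triage quality.

    Each record is assumed to contain:
      minutes_to_triage (int), false_positive (bool), escalated (bool),
      initial_severity and final_severity (str, or None for false positives).
    """
    if not records:
        raise ValueError("no triage records supplied")
    total = len(records)
    confirmed = [r for r in records if not r["false_positive"]]
    correct = sum(
        1 for r in confirmed if r["initial_severity"] == r["final_severity"]
    )
    return {
        "mean_minutes_to_triage": mean(r["minutes_to_triage"] for r in records),
        "false_positive_rate": (total - len(confirmed)) / total,
        "escalation_rate": sum(1 for r in records if r["escalated"]) / total,
        # Severity accuracy is only defined over confirmed incidents.
        "severity_accuracy": correct / len(confirmed) if confirmed else None,
    }
```

Severity accuracy is reported as `None` when no incidents were confirmed in the period, which avoids presenting a division by zero as a perfect or failing score.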
An analyst receives an alert that a known patch management tool executed a PowerShell script on 200 workstations at 03:00. The behaviour matches a detection rule for suspicious PowerShell use. What is the most appropriate triage outcome?
Key Takeaways
- Triage is the structured decision process that converts raw alerts into classified incidents with assigned severity levels; it sits between detection and containment in major IR frameworks such as NIST SP 800-61 and SANS PICERL.
- A severity matrix combines technical impact (what access or damage has the attacker achieved) with business impact (how critical is the affected asset) to produce a consistent, defensible severity level that determines response speed and resource allocation.
- Asset criticality is assigned before incidents occur and stored in the asset inventory; it allows the same technical alert to receive different severity levels depending on the value of the affected system.
- High false positive rates cause alert fatigue, which is a primary contributor to missed incidents; managing false positives requires documented, targeted suppression rather than global rule changes, and every suppression should feed detection-rule tuning.
- Unclassified alerts must be escalated, not closed; tiered escalation paths with defined time windows and defined criteria ensure that genuinely ambiguous events receive experienced analysis rather than default closure.
What is the difference between an alert and an incident in triage?
What factors go into a severity matrix for incident prioritisation?
How do false positives harm incident response operations?
What is the role of asset criticality in incident triage?
How should a team handle an alert it cannot immediately classify as true or false positive?