Cyber Investigation Tools and Analytical Workflow

A structured analytical workflow transforms raw log data into reproducible, court-ready findings by moving systematically from ingestion and timeline reconstruction through hypothesis testing and evidence correlation. This topic covers the core toolset used in cyber investigations, including SIEM platforms, network forensics tools, and link-analysis software, and explains how to document each analytical step.

Last updated: 24 Jun 2026

A cyber investigation workflow is a structured sequence of analytical steps that converts raw digital evidence into reproducible, documented findings. The sequence begins with evidence acquisition and log ingestion, moves through timeline reconstruction and hypothesis generation, and concludes with evidence correlation and reporting. Each step is governed by the same principles that govern any forensic discipline: the method must be documented, the findings must be reproducible by a second examiner, and the chain of custody must remain unbroken from collection to court. The tools used, including SIEM platforms such as Splunk and IBM QRadar, network forensics platforms such as Wireshark and Zeek, and link-analysis software such as Maltego and i2 Analyst's Notebook, are instruments within this workflow, not substitutes for it.

The core challenge in cyber investigation is volume. A single enterprise network can generate hundreds of millions of log entries per day. No investigator can read that data line by line; the workflow must include filtering, normalisation, and correlation steps that reduce the data to a manageable set of hypotheses without discarding evidence that later proves significant. SIEM platforms handle the ingestion and correlation layer. Network forensics tools examine packet-level traffic. Link-analysis software maps relationships among entities, including IP addresses, domains, accounts, and cryptocurrency wallets, that may not be visible in any single log source.

The legal framework governing how findings are presented varies by jurisdiction. In the United States, the Federal Rules of Evidence (Rules 702 and 901) require that expert testimony be based on reliable methods and that digital evidence be authenticated. In the United Kingdom, the Association of Chief Police Officers (ACPO) principles set the baseline for computer evidence handling. Within the European Union, instruments such as the European Investigation Order (Directive 2014/41/EU), together with national cybercrime laws, shape how cross-border evidence may be obtained. In India, the Bharatiya Sakshya Adhiniyam 2023 governs electronic evidence admissibility. Regardless of jurisdiction, the obligation to document analytical method applies uniformly.

Each phase must clear its quality gate before the next phase begins: skipping the hash at phase 1 creates a chain-of-custody problem that cannot be fixed at phase 4.

By the end of this topic you will be able to:

Describe the four phases of a structured cyber investigation workflow and explain the purpose of each phase.
Explain how a SIEM platform ingests, normalises, and correlates log data, and identify the limitations that make SIEM output insufficient on its own.
Distinguish between full packet capture and NetFlow analysis, and select the appropriate technique given a particular investigative scenario.
Apply link-analysis concepts to map relationships among IP addresses, domains, accounts, and cryptocurrency wallets in a cyber investigation.
Describe the documentation requirements for each analytical step that make findings reproducible and defensible in court.

Key terms

SIEM (Security Information and Event Management): A platform that ingests logs from multiple sources, normalises them to a common schema, applies correlation rules to generate alerts, and provides a searchable data store for investigative queries. Examples include Splunk, IBM QRadar, Microsoft Sentinel, and the open-source Elastic SIEM.
NetFlow: A network protocol (originally Cisco, now standardised as IPFIX under RFC 7011) that records metadata about IP traffic flows: source and destination IP, port, protocol, byte count, and duration. NetFlow does not capture packet content but is far less storage-intensive than full packet capture and is adequate for many investigative queries.
Timeline reconstruction: The process of ordering digital events from multiple sources into a single chronological account. Requires normalising all timestamps to a common reference (typically UTC), identifying and accounting for clock skew, and cross-referencing entries from different log sources to build a coherent narrative.
Link analysis: A graph-based analytical technique that maps entities (IP addresses, domains, accounts, phone numbers, wallets) as nodes and relationships (communications, ownership, transactions) as edges. Used to identify shared infrastructure, trace criminal networks, and follow financial flows.
Chain of custody: The documented record of who handled a piece of evidence, when, and what was done with it at each stage, from collection through analysis to presentation in court. A broken chain of custody allows the authenticity of the evidence to be challenged.
Hypothesis testing: In digital forensics, the practice of forming a specific, falsifiable proposition about what occurred (such as 'the attacker used account X to exfiltrate data between 02:00 and 04:00 UTC') and then searching the evidence specifically to confirm or refute it. Prevents confirmation bias from driving the analysis.

The four-phase investigative workflow

A cyber investigation can be divided into four phases: acquisition and preservation, ingestion and normalisation, analysis and hypothesis testing, and reporting and court preparation. These phases are not always strictly sequential. In a live incident response, acquisition and analysis may overlap; a finding in the analysis phase may require going back to collect additional evidence. The framework is still useful because it assigns clear quality gates to each phase.

Phase	Primary activity	Quality gate
1. Acquisition and preservation	Collect logs, images, and packet captures; hash all evidence	Cryptographic hash verified; chain of custody record started
2. Ingestion and normalisation	Load data into SIEM or analysis platform; normalise timestamps and field names	All sources loaded; time-zone offsets documented; data completeness checked
3. Analysis and hypothesis testing	Run queries, reconstruct timeline, test hypotheses, apply link analysis	Each finding references a specific data source, query, and output; alternative explanations considered
4. Reporting and court preparation	Produce written findings; attach supporting artefacts; prepare for cross-examination	Second-examiner review complete; all tool versions and hashes documented

The quality gate concept is borrowed from software engineering but applies directly to forensics. Before moving from one phase to the next, the investigator confirms that the current phase's outputs are complete and documented. Skipping the quality gate at phase 1 (for example, collecting logs without hashing them) creates a chain-of-custody problem that cannot be fixed in phase 4.

SIEM platforms: ingestion, correlation, and limitations

A SIEM platform ingests log data from dozens or hundreds of sources: Windows event logs, syslog from Linux hosts, firewall logs, DNS query logs, web proxy logs, authentication logs from Active Directory or LDAP, and application logs. Each source formats its data differently. The SIEM's normalisation layer maps these diverse formats to a common schema, so that a query for 'failed authentication events between 01:00 and 03:00 UTC' returns results from Windows, Linux, and network device sources in a single result set.
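To make the normalisation step concrete, here is a minimal Python sketch that maps two source formats onto one common schema and then runs the 'failed authentication between 01:00 and 03:00 UTC' query across both. The field names loosely follow Windows Security event fields (Event ID 4625 is a failed logon, 4624 a successful one) and an sshd syslog message, but the flat dict layouts, the schema, and the sample records are hypothetical. Real SIEMs do this with their own parsers and data models, not with code like this.

```python
# Hypothetical common schema: every normalised event carries these keys.
COMMON_FIELDS = ("timestamp_utc", "host", "user", "action", "outcome")

# Windows Security log: event 4624 = successful logon, 4625 = failed logon.
WINDOWS_OUTCOME = {4624: "success", 4625: "failure"}


def from_windows(ev: dict) -> dict:
    """Map a pre-parsed Windows Security event (assumed flat dict) to the schema."""
    return {
        "timestamp_utc": ev["TimeCreated"],
        "host": ev["Computer"],
        "user": ev["TargetUserName"],
        "action": "authentication",
        "outcome": WINDOWS_OUTCOME.get(ev["EventID"], "unknown"),
    }


def from_sshd(ev: dict) -> dict:
    """Map a pre-parsed Linux sshd syslog record (assumed flat dict) to the schema."""
    msg = ev["message"]
    if msg.startswith("Accepted"):
        outcome = "success"
    elif msg.startswith("Failed password"):
        outcome = "failure"
    else:
        outcome = "unknown"
    return {
        "timestamp_utc": ev["ts"],
        "host": ev["hostname"],
        "user": ev["user"],
        "action": "authentication",
        "outcome": outcome,
    }


def failed_auth(events, start, end):
    """Failed authentications in [start, end). Timestamps are assumed to be UTC
    ISO 8601 strings in one fixed format, so string order equals time order."""
    return [e for e in events
            if e["action"] == "authentication"
            and e["outcome"] == "failure"
            and start <= e["timestamp_utc"] < end]


windows_events = [
    {"TimeCreated": "2024-03-05T01:12:00Z", "Computer": "WS-042",
     "TargetUserName": "j.smith", "EventID": 4625},
    {"TimeCreated": "2024-03-05T04:30:00Z", "Computer": "WS-042",
     "TargetUserName": "j.smith", "EventID": 4624},
]
sshd_events = [
    {"ts": "2024-03-05T02:05:30Z", "hostname": "web01", "user": "admin",
     "message": "Failed password for admin from 198.51.100.7 port 52110 ssh2"},
]

normalised = [from_windows(e) for e in windows_events] + [from_sshd(e) for e in sshd_events]
hits = failed_auth(normalised, "2024-03-05T01:00:00Z", "2024-03-05T03:00:00Z")
```

Once both sources share field names, a single filter returns the Windows and Linux failures together, which is the point of the normalisation layer.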

Correlation rules are pre-written logic applied to the normalised data stream. A typical rule might flag any account that produces more than 50 failed login attempts within 10 minutes, or any outbound connection to an IP address that appears in a threat intelligence feed. These rules generate alerts, which are the SIEM's primary operational output. In an investigation, the investigator also runs ad hoc queries directly against the historical data store: Splunk uses its proprietary Search Processing Language (SPL), Elastic uses the Kibana Query Language (KQL), and Microsoft Sentinel uses the Kusto Query Language (also abbreviated KQL).
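The two example rules can be sketched in Python to show the logic a SIEM evaluates: a per-account sliding 10-minute window for the failed-login threshold, and a simple set lookup for the threat-intelligence match. This is an illustration under stated assumptions (events arrive already sorted by time; the sample data and IP list are hypothetical), not any SIEM's rule syntax.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

THRESHOLD = 50                   # fire on MORE than 50 failures...
WINDOW = timedelta(minutes=10)   # ...inside any 10-minute window


def brute_force_alerts(failures):
    """failures: (timestamp, account) pairs for failed logins, sorted by time.
    Returns {account: timestamp at which the rule first fired}."""
    recent = defaultdict(deque)  # per-account failures still inside the window
    alerts = {}
    for ts, account in failures:
        window = recent[account]
        window.append(ts)
        # Discard failures more than 10 minutes older than the current one.
        while ts - window[0] > WINDOW:
            window.popleft()
        if len(window) > THRESHOLD and account not in alerts:
            alerts[account] = ts
    return alerts


def threat_intel_hits(connections, bad_ips):
    """connections: (host, destination IP) pairs; bad_ips: set from a TI feed."""
    return [c for c in connections if c[1] in bad_ips]


start = datetime(2024, 3, 5, 1, 0, 0)
# Hypothetical data: j.smith fails every 10 s (51 failures in 500 s);
# a.jones fails every 30 s, so no 10-minute window ever holds more than 21.
failures = sorted(
    [(start + timedelta(seconds=10 * i), "j.smith") for i in range(51)]
    + [(start + timedelta(seconds=30 * i), "a.jones") for i in range(51)]
)
alerts = brute_force_alerts(failures)
ti = threat_intel_hits([("WS-042", "203.0.113.45"), ("WS-043", "198.51.100.20")],
                       {"203.0.113.45"})
```

Both accounts fail the same number of times; only the rate differs, which is why a windowed count rather than a daily total drives this kind of rule.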

Common SIEM platforms differ in architecture and licensing model. Splunk and IBM QRadar are primarily commercial, deployed on-premises or in cloud instances. Microsoft Sentinel is cloud-native and integrates with Azure services. The Elastic Stack (Elasticsearch, Logstash, Kibana) is open-source with commercial support options. In most jurisdictions, the choice of SIEM is an operational decision rather than a legal one, provided the platform preserves log integrity and documents its processing steps.

Network forensics: packet capture and flow analysis

Network forensics tools operate at the packet level rather than the log level. A packet capture records every byte traversing a network segment, preserving the full content of communications including file transfers, email bodies, and command-and-control traffic. Wireshark is the standard tool for interactive packet analysis. Tcpdump is used for command-line capture. Zeek (formerly known as Bro) is an open-source network analysis framework that processes live or recorded traffic and generates structured log files covering connections, DNS queries, HTTP transactions, SSL certificates, and file transfers.

Full packet capture at enterprise scale is storage-intensive. A fully utilised 1 Gbps link generates approximately 450 GB of raw packet data per hour (1 Gbps is 125 MB per second, multiplied by 3,600 seconds). Most organisations retain full packet capture only at perimeter points (internet gateway, data centre boundary) and only for limited periods, typically 24 to 72 hours. For longer-term visibility, NetFlow or IPFIX records are used instead. A NetFlow record captures the metadata of each connection (source and destination IP and port, protocol, byte count, start and end time) without the payload. This is sufficient for many investigative queries: who connected to what, when, and how much data was transferred.
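The 450 GB figure and the cost of a 24 to 72 hour retention window follow from simple arithmetic, sketched below. The function assumes one direction of traffic at a stated utilisation; a full-duplex link captured in both directions can produce up to twice as much, and the 30% utilisation case is a hypothetical planning input, not a measured value.

```python
def capture_bytes(link_gbps: float, hours: float, utilisation: float = 1.0) -> float:
    """Approximate raw capture size, in bytes, for ONE direction of a link."""
    bits_per_second = link_gbps * 1e9 * utilisation
    return bits_per_second / 8 * 3600 * hours   # 8 bits per byte, 3,600 s per hour


TB = 1e12  # decimal terabyte, matching the decimal GB used in the text

per_hour_gb = capture_bytes(1, 1) / 1e9                          # 450.0
retention_tb = {h: capture_bytes(1, h) / TB for h in (24, 72)}   # 10.8 and 32.4
realistic_tb = capture_bytes(1, 72, utilisation=0.3) / TB
```

Even at partial utilisation, three days of capture on a single 1 Gbps link runs to several terabytes, which is why long-term visibility usually relies on flow records instead.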

When packet captures are available, Wireshark's display filters allow rapid isolation of specific traffic. A filter such as ip.addr == 192.168.1.10 && tcp.port == 443 isolates all traffic to or from that host on TCP port 443, which normally carries HTTPS (TLS). The 'Follow TCP Stream' function reconstructs the full content of a TCP session from individual packets, showing the complete exchange between client and server. For encrypted traffic, content reconstruction is not possible without decryption keys, but metadata remains available: connection timing, byte volume, usually the server name sent in the TLS handshake, and, in TLS 1.2 and earlier, the server certificate (TLS 1.3 encrypts the certificate). This metadata is often sufficient to establish that a connection occurred.
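One detail of that filter is easy to miss: ip.addr matches either the source or the destination address, and tcp.port matches either port, so the filter keeps both directions of the conversation. The plain-Python sketch below mimics those semantics on a hypothetical packet-summary structure; it is not Wireshark's filter engine and does not read capture files.

```python
from dataclasses import dataclass


@dataclass
class PacketSummary:
    """Hypothetical minimal summary of a decoded TCP/IP packet."""
    src: str
    dst: str
    sport: int
    dport: int


def matches(pkt: PacketSummary, host: str, port: int) -> bool:
    """Mimic `ip.addr == host && tcp.port == port`: ip.addr matches the source
    OR destination address, and tcp.port matches the source OR destination port."""
    return (pkt.src == host or pkt.dst == host) and (pkt.sport == port or pkt.dport == port)


packets = [
    PacketSummary("192.168.1.10", "198.51.100.25", 51544, 443),   # client -> server
    PacketSummary("198.51.100.25", "192.168.1.10", 443, 51544),   # server -> client
    PacketSummary("192.168.1.10", "10.0.0.5", 51545, 22),         # same host, SSH
    PacketSummary("192.168.1.20", "198.51.100.25", 40000, 443),   # other host
]
selected = [p for p in packets if matches(p, "192.168.1.10", 443)]
```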

Timeline reconstruction across multiple sources

Timeline reconstruction is the process of merging events from multiple log sources into a single chronological account. The first step is always timestamp normalisation. Logs from different systems may record time in different time zones, with different precision (second versus millisecond), and with different clock accuracy. A Windows event log may be in local time; a firewall log may be in UTC; a web server log may be in UTC with millisecond precision. Before any cross-source correlation is meaningful, all timestamps must be converted to a single reference, typically UTC.
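A minimal sketch of timestamp normalisation in Python follows. It assumes each source's timestamp format and UTC offset were documented at ingestion and that the offset is constant across the window (no daylight-saving change); the source names, offsets, and events are hypothetical. Real local-time logs that cross a DST boundary need a proper time-zone database rather than a fixed offset.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-source settings recorded at ingestion:
# (strptime format, documented UTC offset of the source's clock).
SOURCES = {
    "windows_evtx": ("%Y-%m-%d %H:%M:%S", timedelta(hours=1)),   # local time, UTC+1
    "firewall":     ("%Y-%m-%d %H:%M:%S", timedelta(0)),         # already UTC
    "web_server":   ("%Y-%m-%d %H:%M:%S.%f", timedelta(0)),      # UTC, millisecond precision
}


def to_utc(source: str, raw: str) -> datetime:
    """Parse a source-specific timestamp string and return an aware UTC datetime."""
    fmt, offset = SOURCES[source]
    local = datetime.strptime(raw, fmt).replace(tzinfo=timezone(offset))
    return local.astimezone(timezone.utc)


events = [
    ("windows_evtx", "2024-03-05 02:47:10", "logon"),
    ("firewall",     "2024-03-05 01:47:12", "inbound TCP 443"),
    ("web_server",   "2024-03-05 01:47:11.250", "GET /owa"),
]

# Sort on the normalised time, not on the raw strings.
timeline = sorted((to_utc(src, raw), src, desc) for src, raw, desc in events)
for ts, src, desc in timeline:
    print(ts.isoformat(timespec="milliseconds"), src, desc)
```

Sorted on the raw strings, the Windows logon would appear last; after normalisation it is the earliest of the three events.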

Clock skew is a common source of investigative error. If a system's clock is five minutes slow, every event on that system is stamped five minutes earlier than it actually occurred, relative to correctly synchronised systems. This can cause the investigator to conclude that an action on the skewed system preceded an enabling action on another system, when in fact the sequence was reversed. NTP (Network Time Protocol) synchronisation records, where available, provide the evidence needed to correct for clock skew. When NTP records are not available, the skew must be estimated from correlated events: a user login visible in both an authentication log and an application log should carry the same time in both, so a consistent difference across several such paired events estimates the skew (any single pair may also include logging or processing delay).
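Estimating skew from paired events can be sketched as follows, assuming the authentication log is the NTP-synchronised reference and that the same logins have been matched in both logs (the three pairs are hypothetical). Taking the median of the per-pair offsets limits the influence of one pair that happens to include extra logging delay. A positive result means the application host's clock runs ahead of the reference.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical paired observations of the same three logins, both already in UTC:
# (time in the NTP-synchronised authentication log, time in the application log).
pairs = [
    (datetime(2024, 3, 5, 9, 0, 2),    datetime(2024, 3, 5, 9, 4, 3)),
    (datetime(2024, 3, 5, 11, 15, 30), datetime(2024, 3, 5, 11, 19, 30)),
    (datetime(2024, 3, 5, 14, 42, 10), datetime(2024, 3, 5, 14, 46, 12)),
]

# Offset of the application host's clock relative to the reference, per pair.
offsets = [(app - auth).total_seconds() for auth, app in pairs]
# The median resists a single pair distorted by logging delay.
skew = timedelta(seconds=median(offsets))


def corrected(app_ts: datetime) -> datetime:
    """Shift an application-log timestamp onto the reference clock."""
    return app_ts - skew
```

The estimated skew and the method used to derive it belong in the examiner's notes alongside the corrected timeline.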

After normalisation, the investigator identifies anchor events: high-confidence timestamps that can serve as fixed reference points, such as a confirmed malware execution time from an endpoint detection system or a known-good NTP-synchronised firewall log entry. All other events are positioned relative to these anchors. Gaps in the timeline are as significant as the events themselves: a 20-minute gap in authentication logs during a period when the attacker is known to have been active may indicate log deletion.
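Gap detection can be automated with a short Python sketch: sort a source's timestamps and report every interval between consecutive events longer than a chosen threshold. The 10-minute threshold and the sample authentication times are hypothetical; in practice the threshold should reflect that source's normal event rate, and a flagged gap is a lead to investigate, not proof of deletion.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_quiet: timedelta):
    """Return (start, end, length) for every interval between consecutive
    events that is longer than max_quiet."""
    ts = sorted(timestamps)
    return [(a, b, b - a) for a, b in zip(ts, ts[1:]) if b - a > max_quiet]


base = datetime(2024, 3, 5, 1, 0)
# Hypothetical authentication-log times: an event every 2 minutes from
# 01:00 to 01:40, silence, then every 2 minutes from 02:00 to 02:10.
auth_times = ([base + timedelta(minutes=2 * i) for i in range(21)]
              + [base + timedelta(minutes=60 + 2 * i) for i in range(6)])
gaps = find_gaps(auth_times, max_quiet=timedelta(minutes=10))
```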

Link analysis and entity mapping

Link analysis treats the investigation as a graph problem. Entities are nodes: IP addresses, domain names, email addresses, user accounts, phone numbers, cryptocurrency wallet addresses, company registrations. Relationships are edges: this IP resolved to that domain, this account sent email to that address, this wallet transacted with that wallet. Building the graph makes structural patterns visible that are not apparent from linear log analysis, including shared infrastructure (multiple malicious domains resolving to the same IP), clustering of actor accounts, and the paths through which funds or data moved.
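The graph model can be sketched in plain Python without a graph library. In the sketch below, nodes are (type, value) tuples and edges are undirected. Two queries illustrate the structural patterns described above: IPs shared by several domains, and the cluster of entities connected to one account. All observations are hypothetical, using reserved example domains and documentation IP ranges.

```python
from collections import defaultdict

graph = defaultdict(set)  # undirected adjacency: node -> neighbouring nodes


def add_edge(a, b):
    graph[a].add(b)
    graph[b].add(a)


# Hypothetical observations. Nodes are (type, value) tuples so that, for
# example, an account name and a domain with the same text stay distinct.
for domain, ip in [("invoice-update.example", "203.0.113.45"),
                   ("payment-portal.example", "203.0.113.45"),
                   ("unrelated-site.example", "198.51.100.80")]:
    add_edge(("domain", domain), ("ip", ip))              # passive DNS: resolved to
add_edge(("account", "j.smith"), ("ip", "203.0.113.45"))  # login: came from


def shared_infrastructure(g):
    """IP nodes that two or more domain nodes resolve to."""
    shared = {}
    for node, nbrs in g.items():
        if node[0] == "ip":
            domains = sorted(n[1] for n in nbrs if n[0] == "domain")
            if len(domains) >= 2:
                shared[node[1]] = domains
    return shared


def connected_component(g, start):
    """All nodes reachable from start (iterative depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        for nbr in g[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return seen


shared = shared_infrastructure(graph)
cluster = connected_component(graph, ("account", "j.smith"))
```

Neither pattern is visible in the passive DNS data or the login log alone; it appears only once both are loaded into the same graph.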

Maltego is one of the most widely used commercial link-analysis platforms in cyber investigations. It automates data gathering from public and commercial sources (WHOIS, passive DNS, certificate transparency logs, social media, threat intelligence feeds) and renders the results as an interactive graph. Each data-gathering action is called a 'transform'. The investigator selects an entity, runs transforms against it, and progressively expands the graph. Gephi is an open-source alternative suited for large static graphs. i2 Analyst's Notebook, originally developed for law enforcement, is used in financial crime and national security investigations and integrates with databases not typically connected to Maltego.
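The select-entity, run-transforms, expand loop can be illustrated conceptually in plain Python. In this sketch each "transform" is an ordinary function that maps an entity to related entities, and every run is logged (including runs that return nothing) so the expansion can be documented and repeated. This is not Maltego's API: the transform functions and the offline lookup tables standing in for passive DNS and WHOIS are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical offline lookup tables standing in for live data sources
# (passive DNS, WHOIS). Nothing here calls Maltego or any real service.
PASSIVE_DNS = {"203.0.113.45": ["invoice-update.example", "payment-portal.example"]}
WHOIS_EMAIL = {"invoice-update.example": ["registrant@mail.example"],
               "payment-portal.example": ["registrant@mail.example"]}


def ip_to_domains(entity):
    kind, value = entity
    return [("domain", d) for d in PASSIVE_DNS.get(value, [])] if kind == "ip" else []


def domain_to_registrant_email(entity):
    kind, value = entity
    return [("email", e) for e in WHOIS_EMAIL.get(value, [])] if kind == "domain" else []


TRANSFORMS = [ip_to_domains, domain_to_registrant_email]


def expand(seed, hops):
    """Breadth-first expansion: run every transform on every new entity,
    logging each run (including empty results) for the audit trail."""
    edges, log, seen, frontier = set(), [], {seed}, [seed]
    for _ in range(hops):
        next_frontier = []
        for entity in frontier:
            for transform in TRANSFORMS:
                results = transform(entity)
                log.append({"transform": transform.__name__, "input": entity,
                            "results": results,
                            "run_at_utc": datetime.now(timezone.utc).isoformat()})
                for r in results:
                    edges.add((entity, r))
                    if r not in seen:
                        seen.add(r)
                        next_frontier.append(r)
        frontier = next_frontier
    return edges, log


edges, log = expand(("ip", "203.0.113.45"), hops=2)
```

After two hops, both domains converge on one registrant address. The log records which transform produced each edge, which matters later when the graph has to be explained in a report.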

Cryptocurrency tracing is a specialised form of link analysis. Transactions on public blockchains such as Bitcoin are publicly and permanently recorded, making them traceable in principle. Tools such as Chainalysis Reactor and CipherTrace map the flow of funds through wallets, identify clusters of wallets controlled by the same entity (through common-input-ownership heuristics), and flag wallets associated with known exchanges, mixing services, or sanctioned entities. Exchanges operating under anti-money-laundering regulations in the US (FinCEN), EU (AMLD5/6), and India (PMLA 2002, as amended) are required to maintain know-your-customer records that can be obtained through legal process, linking a wallet address to a real-world identity.
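The common-input-ownership heuristic assumes that all input addresses of one transaction are controlled by the same entity. Chaining that assumption across transactions is a union-find problem, sketched below. The transactions and single-letter addresses are hypothetical placeholders. Commercial tools layer many further heuristics on top, and collaborative transactions such as CoinJoin deliberately break this assumption, so clusters remain investigative leads rather than proof of common control.

```python
def cluster_by_common_inputs(transactions):
    """transactions: one list of input addresses per transaction.
    Returns address clusters (sets) under the common-input-ownership heuristic."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for inputs in transactions:
        for addr in inputs:
            find(addr)                     # register single-input addresses too
        for addr in inputs[1:]:
            union(inputs[0], addr)         # co-spent inputs share an owner

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())


# Hypothetical transactions: A and B are co-spent, then B and C, so A, B, C
# fall into one cluster even though A and C never appear together.
clusters = cluster_by_common_inputs([["A", "B"], ["B", "C"], ["D"], ["E", "F"]])
```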

Documenting the analytical process for court

Documentation is not an afterthought appended to the investigation. It is part of the investigative method. For each analytical step, the examiner records: the tool used (name and version number), the input data (identified by its cryptographic hash), the exact command or query run, the date and time of execution, and the output produced. This record allows a second examiner to repeat the step and verify the result. It also allows a court to assess whether the method is reliable.
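The record described above can be captured as structured data at the moment each step runs. The minimal Python sketch below builds one audit-trail entry containing the tool, version, SHA-256 of the input, exact command, UTC execution time, and SHA-256 of the output. The examiner field, the placeholder version string, and the sample bytes are illustrative assumptions. In practice, entries would be appended to a write-once log that is itself hashed.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest used to identify evidence and outputs."""
    return hashlib.sha256(data).hexdigest()


def record_step(tool, version, input_data, command, output_data, examiner):
    """Build one audit-trail entry for a single analytical step."""
    return {
        "tool": tool,
        "tool_version": version,
        "input_sha256": sha256_hex(input_data),
        "command": command,
        "executed_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "output_sha256": sha256_hex(output_data),
        "examiner": examiner,
    }


# Hypothetical step: the bytes stand in for an exported log and the query
# results; the version string is a placeholder for the exact build used.
entry = record_step(
    tool="Splunk",
    version="<exact version and build>",
    input_data=b"2024-03-05T01:47:00Z auth success user=j.smith src=203.0.113.45\n",
    command="index=auth user=j.smith earliest=-24h",
    output_data=b"1 event returned\n",
    examiner="Examiner A",
)
print(json.dumps(entry, indent=2))
```

A second examiner who re-runs the same command on input with the same hash should obtain output with the same hash. That comparison is what makes the step reproducible.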

In United States federal courts (and in many states), expert testimony on digital evidence is evaluated under the Daubert standard, now reflected in Rule 702. Courts consider whether the method can be and has been tested, its known or potential error rate, whether it has been subjected to peer review and publication, and whether it is generally accepted in the relevant scientific community; these factors are applied flexibly rather than as a strict checklist. Standard forensic tools that are widely used, documented, and validated (Autopsy, Wireshark, Splunk, Volatility) are routinely accepted, although the examiner must still show that the tool was applied correctly. Custom scripts written for a single investigation require more justification: the investigator must explain the logic, provide the source code, and demonstrate that it produces correct results on known test data.
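Demonstrating that a custom script "produces correct results on known test data" can be as simple as a known-answer test harness, as in the Python sketch below. Inputs with independently established correct outputs, including a malformed line that must be rejected, are run through the script, and every mismatch is reported. The log line format and the parser are hypothetical examples of a one-off, investigation-specific script.

```python
import re

# Hypothetical line format read by a one-off, investigation-specific parser.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) "
    r"(?P<event>LOGIN_OK|LOGIN_FAIL) user=(?P<user>\S+) src=(?P<src>\S+)$"
)


def parse_line(line: str):
    """Return the parsed fields as a dict, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


# Known-answer test data: inputs whose correct parse is established independently.
KNOWN_ANSWERS = [
    ("2024-03-05T01:47:00Z LOGIN_OK user=j.smith src=203.0.113.45",
     {"ts": "2024-03-05T01:47:00Z", "event": "LOGIN_OK",
      "user": "j.smith", "src": "203.0.113.45"}),
    ("2024-03-05T01:46:10Z LOGIN_FAIL user=j.smith src=203.0.113.45",
     {"ts": "2024-03-05T01:46:10Z", "event": "LOGIN_FAIL",
      "user": "j.smith", "src": "203.0.113.45"}),
    ("2024-03-05 01:46:10 LOGIN_FAIL user=j.smith", None),  # malformed: must be rejected
]


def validate(parser, cases):
    """Run the parser on every known case; return (passed, list of failures)."""
    failures = []
    for line, expected in cases:
        actual = parser(line)
        if actual != expected:
            failures.append((line, expected, actual))
    return len(cases) - len(failures), failures


passed, failures = validate(parse_line, KNOWN_ANSWERS)
```

The test cases and their results can be disclosed together with the source code, so that the opposing side's expert can re-run them.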

In the UK, the ACPO Good Practice Guide for Digital Evidence (and its successor guidance from the National Police Chiefs' Council) sets out four principles: no action should change data that may subsequently be relied upon in court; any person who must access original data must be competent to do so and able to explain the relevance and implications of their actions; an audit trail of all processes applied to the evidence must be created and preserved, so that an independent third party could repeat them and reach the same result; and the officer in charge of the case has overall responsibility for ensuring compliance with the law and these principles. These four principles have been widely cited in Commonwealth jurisdictions. In India, the Bharatiya Sakshya Adhiniyam 2023 governs electronic records; Section 63 (which replaces Section 65B of the Indian Evidence Act 1872) requires a certificate attesting to the proper operation of the device and the integrity of the record. Across all frameworks, the practical requirement is the same: document what you did, with what tool, on what data, and when.

Worked example

Tracing a business email compromise from alert to report

A financial institution's SIEM generates an alert: an employee account sent 47 emails to external addresses between 02:00 and 03:00 UTC, outside normal business hours. The following steps show how the investigation proceeds from alert to documented finding.

Business email compromise (BEC) is among the most financially damaging cybercrime categories globally: the FBI's Internet Crime Complaint Center reported over $2.9 billion in adjusted losses in 2023. The scenario below is generic but representative of the investigative workflow across jurisdictions.

Alert triage (Step 1). The SIEM alert identifies account j.smith@example.com as the source. The investigator queries the authentication logs for that account in the 24 hours before the anomalous email activity. The query (Splunk SPL: index=auth user=j.smith earliest=-24h; a relative bound such as -24h counts back from when the search runs, so a search run later needs explicit earliest and latest times around the alert) returns a successful authentication from a previously unseen IP address (203.0.113.45) at 01:47 UTC, approximately 13 minutes before the email sending began. The investigator hashes the query output and records the tool version.
Network and geo check (Step 2). The IP 203.0.113.45 is queried against the organisation's NetFlow records and against a commercial threat intelligence feed. NetFlow shows no prior connections from this IP to the organisation in the preceding 90 days. The threat intelligence query returns a flag: the IP is associated with a residential proxy service. A WHOIS lookup shows the IP registered to an ISP in a country where the employee does not work. The link-analysis graph (Maltego) adds this IP as a node connected to j.smith's account.
Email content and recipient analysis (Step 3). With legal authorisation, the email server logs are pulled. All 47 outbound emails were sent to addresses at a free webmail provider. The bodies cannot be retrieved from the SIEM (email body content was not logged), but the subject lines and attachment names are in the mail server log. Subject lines reference 'payment instructions' and 'invoice update'. The investigator adds the recipient addresses to the link-analysis graph and runs transforms to check for previous associations with known fraud infrastructure.
Timeline reconstruction (Step 4). The investigator merges the SIEM authentication log, the mail server log, and the NetFlow records into a single UTC-normalised timeline using Log2Timeline. The timeline shows: 01:47 login from 203.0.113.45, 02:00 first outbound email, 02:58 last outbound email, 03:02 logout. No other sessions for j.smith are active during this period. Badge access records supplied by physical security show that the employee did not enter the building that day.
Hypothesis test and documentation (Step 5). The hypothesis is: the account was compromised via credential theft and used by an unauthorised actor to send fraudulent payment-instruction emails. The evidence is consistent with this hypothesis. The alternative hypothesis (the employee sent the emails from home via a proxy) is tested by interviewing the employee and checking whether the employee's home IP appears in the authentication logs. It does not, and the employee denies any knowledge of the activity. Because a proxy would hide the home IP in any case, this negative result is weighed together with the rest of the evidence rather than treated as conclusive on its own. The findings are compiled into a structured report. Each finding references the source log, the query, the tool and version, and the timestamp. The report is reviewed by a second examiner before submission.
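Once normalised, the worked example's timeline reduces to simple interval arithmetic. The Python sketch below orders the four key events from the separate sources and derives the figures used in the narrative: the 13-minute lead time from login to first email, the sending window, and whether all emails fall inside the suspicious session. The calendar date is a placeholder because the scenario does not give one.

```python
from datetime import datetime, timezone

DAY = (2024, 3, 5)  # placeholder date: the scenario does not give one


def utc(hour, minute):
    return datetime(*DAY, hour, minute, tzinfo=timezone.utc)


# Events from the worked example, deliberately out of order as if read from
# separate sources; all are already normalised to UTC.
events = [
    (utc(2, 0), "mail server log", "first outbound email"),
    (utc(1, 47), "authentication log", "login j.smith from 203.0.113.45"),
    (utc(3, 2), "authentication log", "logout j.smith"),
    (utc(2, 58), "mail server log", "last outbound email"),
]
timeline = sorted(events)

login, first_mail, last_mail, logout = (ts for ts, _, _ in timeline)
lead_time = first_mail - login          # login to first email
sending_window = last_mail - first_mail
session_length = logout - login
emails_inside_session = login <= first_mail and last_mail <= logout
```

Every derived interval traces back to specific log entries, which is what allows each statement in the report to cite its source.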

Check your understanding


An investigator notices that login events appear in both an application log and an authentication log, but the timestamps differ by four minutes. What is the most likely explanation and what should the investigator do?

Key Takeaways

A cyber investigation workflow has four phases: acquisition and preservation, ingestion and normalisation, analysis and hypothesis testing, and reporting and court preparation. Each phase has a quality gate that must be cleared before progressing.
SIEM platforms collect and correlate log data from multiple sources, but their output is limited by the completeness of log ingestion and the accuracy of correlation rules. Every SIEM alert is a hypothesis that requires verification against raw log data.
Network forensics uses full packet capture (recorded with tools such as tcpdump and analysed with Wireshark or Zeek) for content-level analysis and NetFlow/IPFIX for connection metadata over longer retention periods. Correcting for clock skew is a prerequisite for accurate timeline reconstruction across sources.
Link analysis (Maltego, i2 Analyst's Notebook, Gephi) maps relationships among IP addresses, domains, accounts, and cryptocurrency wallets to reveal shared infrastructure, actor clusters, and financial flows that are not visible in any single log source.
Documenting each analytical step, including tool name and version, input data hash, exact command or query, and output, makes findings reproducible and supports admissibility under the standards applied across jurisdictions, including the US Daubert standard, the UK ACPO principles, and India's Bharatiya Sakshya Adhiniyam 2023.

What is a SIEM platform and why do cyber investigators use it?

A Security Information and Event Management (SIEM) platform collects, normalises, and correlates log data from multiple sources, including firewalls, endpoints, authentication servers, and applications. Investigators use it because it provides a single searchable data store, enables timeline reconstruction across disparate systems, and generates alerts based on correlation rules. Common platforms include Splunk, IBM QRadar, and Microsoft Sentinel.

What does timeline reconstruction mean in a cyber investigation?

Timeline reconstruction is the process of ordering digital events chronologically across multiple systems to produce a coherent account of what happened and when. It involves normalising timestamps to a common time zone (usually UTC), identifying gaps or clock-skew anomalies, and cross-referencing log entries from different sources so that an attacker's actions on one system can be correlated with activity on another.

What is network forensics and how does it differ from log analysis?

Network forensics involves the capture, preservation, and analysis of network traffic (packets) to reconstruct communications and identify malicious activity. Log analysis works from records generated by systems after the fact, which may be incomplete or manipulated by an attacker who controls those systems. A packet capture taken by an independent sensor records traffic as it crossed the network, so it cannot be altered from a compromised endpoint; once hashed and preserved, it is often the more authoritative source when available. Full packet capture at scale is, however, storage-intensive and is often replaced by NetFlow records.

What is link analysis and when is it used in cyber investigations?

Link analysis is a graph-based technique that maps relationships between entities such as IP addresses, domain names, accounts, email addresses, and transactions. It is used to identify infrastructure shared across attacks, trace threat actor networks, map botnet command-and-control structures, and follow cryptocurrency flows. Tools such as Maltego, Gephi, and i2 Analyst's Notebook are commonly used for this purpose.

Why is documentation of each analytical step legally important?

Courts generally require that digital evidence be authentic, that the methods used to obtain and analyse it be reliable, and that the chain of custody be unbroken. Documenting each analytical step, including the tool used, the version, the input data hash, the query or command run, and the output, allows a second examiner to reproduce the findings and allows the court to assess the method's validity. Without this documentation, findings may be challenged as unreliable or excluded entirely.
