Server and Application Log Analysis
Web server, email, database, and application logs each record a distinct slice of system activity that investigators can reconstruct into a timeline of user actions and intrusion events. This topic covers common log formats, parsing and correlation techniques, indicators of compromise, and the residual traces that survive anti-forensic log-clearing attempts.
Server and application log analysis is the discipline of extracting investigative value from the records that web servers, mail transfer agents, database engines, and application frameworks write as a by-product of normal operation. Each log source records a narrow slice of activity: an Apache access log shows HTTP requests; a Postfix mail log shows message routing; a MySQL general query log shows SQL statements; an authentication log shows login attempts. The forensic value of any single source is limited. Correlated across sources, aligned on a common timeline, and compared against known attacker behaviour, these records can reconstruct the sequence of an intrusion, identify the accounts and files that were accessed, and fix the boundaries of what the attacker saw.
Log formats range from unstructured plain text to structured JSON and binary formats. The Common Log Format and its extension, the Combined Log Format, are the baseline for web server access logs. Syslog (RFC 5424) carries system and application events across Unix-like platforms. Windows event logs use a proprietary binary format exposed through the Windows Event API and exportable to EVTX. Database audit logs vary by vendor. Understanding the structure of each format is a prerequisite for reliable automated parsing: a field that appears to be a client IP may contain a proxy chain, and a timestamp with no timezone label may be ambiguous by several hours.
Anti-forensic log manipulation is a recurring feature of sophisticated intrusions. Attackers who maintain persistence on a system will clear, truncate, or selectively edit logs to remove evidence of their presence. Understanding what traces survive log-clearing, which architectures make clearing harder, and how to detect the clearing event itself is part of the discipline. Real-time forwarding to a centralised log aggregator that the attacker cannot reach is the primary defensive measure. For the same reason, investigators should always ask whether a centralised syslog or SIEM infrastructure exists before concluding that the local log record is complete.
By the end of this topic you will be able to:
- Identify and parse the Common Log Format, Combined Log Format, syslog, and Windows EVTX log structures, and explain what each field records.
- Describe the forensic value of web server, email, database, and application logs and the specific indicators of compromise each source can reveal.
- Apply log correlation techniques to align multiple log sources on a single timeline and identify attacker activity sequences.
- Recognise signs of anti-forensic log manipulation and explain what residual traces survive after local log files are cleared or deleted.
- Explain the legal basis for collecting server logs as evidence under frameworks including the Computer Fraud and Abuse Act (US), the Computer Misuse Act (UK), and the Information Technology Act 2000 (India), and the chain-of-custody requirements that apply.
- Combined Log Format
- An extension of the Common Log Format that is predefined in Apache HTTP Server, is the default access log format in Nginx, and is widely used with both. Adds the referrer URL and user-agent string to the seven base fields of the Common Log Format. The standard starting point for web server access log analysis.
- Syslog (RFC 5424)
- A standard protocol and message format for transmitting log data from Unix-like systems and network devices to a centralised collector. Each message carries a facility code, severity level, timestamp, hostname, and message text. The primary transport mechanism for centralised log aggregation.
- Indicator of Compromise (IoC)
- A specific observable artifact in log data or system state that indicates a security incident has occurred or is in progress. Examples include repeated failed authentication attempts from a single IP, HTTP requests to known malicious URIs, and SQL statements that include UNION SELECT sequences.
- SIEM
- Security Information and Event Management. A platform that ingests log streams from multiple sources, normalises them to a common schema, and applies correlation rules to generate alerts. Examples include Splunk, IBM QRadar, Microsoft Sentinel, and the open-source Wazuh. The primary infrastructure for large-scale log correlation.
- Log rotation
- The scheduled process of closing the current log file, compressing it, renaming it with a date or sequence suffix, and opening a new file. On Linux systems, logrotate manages this. Rotation schedules define how long historical logs are retained locally, which directly affects how far back an investigation can reach without a centralised archive.
- Binary log (database)
- A database engine's sequential record of all committed data modification statements, used primarily for replication and point-in-time recovery. In MySQL and MariaDB, the binary log is the most complete forensic record of write activity when the general query log is not enabled. On PostgreSQL the closest counterpart is the write-ahead log (WAL), which records low-level page changes rather than SQL statements.
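The facility code and severity level carried by each syslog message are packed into a single PRI number at the start of the message, calculated as facility × 8 + severity. The following is a minimal sketch of decoding it; the function name `decode_pri` and the short severity labels are illustrative choices, not part of any library.

```python
# Severity labels for codes 0-7, in the order RFC 5424 lists them.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(pri: int) -> tuple[int, str]:
    """Split a syslog PRI value into (facility code, severity label)."""
    if not 0 <= pri <= 191:
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)  # PRI = facility * 8 + severity
    return facility, SEVERITIES[severity]


# A message beginning "<34>" carries facility 4 (security/authorisation)
# at severity 2 (critical).
print(decode_pri(34))
```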
Web server log formats and forensic fields
The Apache HTTP Server Common Log Format is the baseline for web access logging. Each line records seven fields in order: the client IP address (or the last proxy in the forwarding chain), the RFC 1413 identity (almost always a hyphen), the authenticated username if HTTP authentication is in use, the timestamp in square brackets with timezone offset, the request line in quotes, the HTTP status code, and the response body size in bytes. The Combined Log Format appends two more: the referrer URL and the user-agent string.
| Field | Example value | Forensic significance |
|---|---|---|
| Client IP | 203.0.113.42 | Source of the request; may be a proxy or Tor exit node |
| Timestamp | [24/Jun/2026:10:15:32 +0000] | Sequence and timing of requests; confirm timezone |
| Request line | "GET /admin/login HTTP/1.1" | Target URI, method, and protocol version |
| Status code | 403 | Access control response; repeated 403s suggest scanning |
| Response size | 1842 | Zero-byte responses may indicate blocked or empty resources |
| Referrer | "https://example.com/search" | Navigation path; scripted requests and direct visits often send no referrer |
| User-agent | "sqlmap/1.7.2" | Tool fingerprint; often deliberately falsified |
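The fields in the table can be pulled out of a Combined Log Format line with a regular expression. The sketch below assumes the standard field order and does not handle request lines or user-agent strings that contain escaped double quotes; the names `COMBINED` and `parse_combined` are illustrative.

```python
import re

# One named group per Combined Log Format field, in log order.
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line: str):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


sample = (
    '203.0.113.42 - - [24/Jun/2026:10:15:32 +0000] '
    '"GET /admin/login HTTP/1.1" 403 1842 '
    '"https://example.com/search" "sqlmap/1.7.2"'
)
entry = parse_combined(sample)
print(entry["ip"], entry["status"], entry["agent"])
```

Lines that return None are worth keeping aside rather than discarding, since malformed entries can themselves be evidence of injection attempts or log tampering.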
Nginx uses the same Combined Log Format by default. Microsoft IIS defaults to the W3C Extended Log File Format, which records similar fields but is space-delimited and begins with a #Fields header line that names each column. IIS W3C logs can also include the server-side port and the time taken to serve the request in milliseconds, which can help identify slow-response DoS conditions. Timestamps in IIS W3C logs are recorded in UTC rather than server local time, which matters when aligning them with other sources.
A critical limitation arises when a web application sits behind a load balancer or reverse proxy: the client IP field in the log records the proxy's address, not the original client's. The original client IP is carried in the X-Forwarded-For request header, which Nginx and Apache can be configured to log as an additional field. Without this configuration, the IP field in the log is forensically uninformative for hosted or cloud applications. Because X-Forwarded-For is an ordinary request header that a client can set to any value, only the entries appended by proxies under the organisation's control should be treated as reliable.
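When the X-Forwarded-For value has been logged, one common way to interpret it is to read the chain from right to left and stop at the first address that is not a known proxy. The sketch below assumes the investigator has a list of the organisation's proxy networks; `original_client` and its parameters are hypothetical names used for illustration.

```python
import ipaddress


def original_client(peer_ip, xff_header, trusted_proxies):
    """Return the first hop, reading right to left, that is not a trusted proxy."""
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(hop):
        try:
            address = ipaddress.ip_address(hop)
        except ValueError:
            return False  # malformed entries are never trusted
        return any(address in net for net in trusted)

    # Proxies append to the header, so the rightmost entries are the most
    # recent hops; the TCP peer itself is the final hop.
    chain = [h.strip() for h in xff_header.split(",") if h.strip()] + [peer_ip]
    for hop in reversed(chain):
        if not is_trusted(hop):
            return hop
    return chain[0]  # every hop was a trusted proxy


print(original_client(
    peer_ip="10.0.0.5",
    xff_header="198.51.100.9, 203.0.113.42, 10.0.0.7",
    trusted_proxies=["10.0.0.0/8"],
))
```

In this example the function returns 203.0.113.42. The leftmost entry, 198.51.100.9, arrived from outside the trusted proxies and may have been supplied by the client itself, so it is recorded as a lead rather than as the source.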
Email server logs and mail transfer analysis
Mail transfer agents (MTAs) generate per-message log entries at each stage of delivery: receipt, queue, forwarding, and delivery or bounce. Postfix, the most common MTA on Linux, logs to syslog with entries that include a unique queue ID, the sender address, the recipient address, the relay (next-hop server), and the delivery status. Each queue ID allows an investigator to trace a single message through multiple log entries covering its full lifecycle on that server.
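Tracing a message by queue ID amounts to grouping log lines on that ID and collecting the sender, recipient, relay, and status from each. The following sketch assumes Postfix's short hexadecimal queue IDs and uses invented sample lines in the usual Postfix layout; the names `QUEUE_LINE`, `FIELD`, and `trace_messages` are illustrative.

```python
import re
from collections import defaultdict

# "postfix/<service>[pid]: <QUEUEID>: <details>" with a short hex queue ID.
QUEUE_LINE = re.compile(
    r'postfix/(?P<service>[\w/-]+)\[\d+\]: (?P<qid>[0-9A-F]{6,}): (?P<rest>.*)'
)
# key=value pairs of interest; angle brackets around addresses are optional.
FIELD = re.compile(r'\b(from|to|relay|status)=<?([^>,\s]+)>?')


def trace_messages(lines):
    """Group Postfix log lines by queue ID and collect key delivery fields."""
    messages = defaultdict(dict)
    for line in lines:
        match = QUEUE_LINE.search(line)
        if not match:
            continue
        record = messages[match["qid"]]
        for key, value in FIELD.findall(match["rest"]):
            record.setdefault(key, value)  # keep the first value seen
    return dict(messages)


# Invented sample lines for illustration only.
sample_lines = [
    "Jun 24 10:15:30 mail postfix/smtpd[2101]: 4F1A22C0B1: client=unknown[203.0.113.42]",
    "Jun 24 10:15:31 mail postfix/qmgr[1188]: 4F1A22C0B1: from=<billing@example.com>, size=4821, nrcpt=1 (queue active)",
    "Jun 24 10:15:32 mail postfix/smtp[2107]: 4F1A22C0B1: to=<cfo@example.org>, relay=mx.example.org[198.51.100.7]:25, delay=1.2, status=sent (250 2.0.0 OK)",
    "Jun 24 10:15:32 mail postfix/qmgr[1188]: 4F1A22C0B1: removed",
]
traced = trace_messages(sample_lines)
print(traced["4F1A22C0B1"])
```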
For phishing and business email compromise (BEC) investigations, the Received header chain embedded in the message itself is a primary source. Each server that handles a message prepends a Received header recording the server's identity, the IP it accepted the message from, and the timestamp. Reading the Received chain from bottom to top traces the message's path from origin to delivery. Only the headers added by servers the investigator has reason to trust are reliable, because entries lower in the chain can be forged by the sender. Investigators compare the chain against the sending domain's SPF and DKIM records to identify forgery or authentication failures that the mail server may have logged.
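Reading the Received chain bottom to top can be done with Python's standard `email` package. The sketch below uses an invented two-hop message and assumes every Received header ends with a semicolon followed by a parseable date; real headers are less tidy, and `parsedate_to_datetime` raises an exception when the date cannot be parsed. The name `received_path` is illustrative.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def received_path(raw_message: str):
    """Return (timestamp, route) pairs ordered from origin to final delivery."""
    msg = message_from_string(raw_message)
    hops = []
    # Headers appear newest first, so reverse them to follow the message.
    for header in reversed(msg.get_all("Received", [])):
        route, _, stamp = header.rpartition(";")
        hops.append((parsedate_to_datetime(stamp.strip()), " ".join(route.split())))
    return hops


# Invented message for illustration only.
raw_message = "\n".join([
    "Received: from mx.example.org (mx.example.org [198.51.100.7])",
    " by mail.victim.example with ESMTP id 4F1A22C0B1;",
    " Wed, 24 Jun 2026 10:15:32 +0000",
    "Received: from laptop (unknown [203.0.113.42])",
    " by mx.example.org with ESMTPSA id 9B2C41;",
    " Wed, 24 Jun 2026 10:15:30 +0000",
    "From: billing@example.com",
    "Subject: Invoice",
    "",
    "Please see attached.",
])
hops = received_path(raw_message)
for when, route in hops:
    print(when.isoformat(), route)
```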
Authentication logs from the mail server's IMAP and POP3 services record login attempts with client IP, username, and success or failure. A pattern of failed authentications from geographically diverse IP addresses followed by a successful login from a new location is a common indicator of credential stuffing against webmail. Microsoft Exchange and Office 365 (Microsoft 365) generate equivalent records through their unified audit log, which is accessible via the Microsoft Purview compliance portal. Under standard licensing the default retention is 180 days for records generated since October 2023 and 90 days for older records, so the applicable period should be confirmed for the tenant under investigation.
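The failed-then-successful pattern described above can be expressed as a simple pass over time-ordered login events. This is a rough sketch under stated simplifications: it uses distinct IP addresses as a stand-in for geographic diversity, applies no time window, and the threshold and the name `flag_suspicious_logins` are illustrative assumptions.

```python
from collections import defaultdict


def flag_suspicious_logins(events, min_failed_ips=3):
    """Flag successes from a new IP that follow failures from many distinct IPs.

    events: iterable of (timestamp, user, ip, success), already sorted by time.
    """
    failed_ips = defaultdict(set)  # user -> IPs that failed since the last success
    known_ips = defaultdict(set)   # user -> IPs that have succeeded before
    flagged = []
    for when, user, ip, success in events:
        if not success:
            failed_ips[user].add(ip)
            continue
        if len(failed_ips[user]) >= min_failed_ips and ip not in known_ips[user]:
            flagged.append((when, user, ip))
        known_ips[user].add(ip)
        failed_ips[user].clear()
    return flagged


# Invented events for illustration only.
events = [
    ("10:00", "alice", "192.0.2.10", True),
    ("10:05", "alice", "198.51.100.1", False),
    ("10:05", "alice", "198.51.100.2", False),
    ("10:06", "alice", "203.0.113.9", False),
    ("10:07", "alice", "203.0.113.42", True),
    ("10:30", "bob", "192.0.2.20", False),
    ("10:31", "bob", "192.0.2.20", True),
]
flagged = flag_suspicious_logins(events)
print(flagged)
```

A flagged entry is a lead to examine, not a finding: a user who mistypes a password while travelling can produce a similar sequence.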
Database logs and query audit trails
Database engines maintain several independent log streams with different coverage and forensic value. The error log records startup, shutdown, authentication failures, and privilege errors. The general query log records every statement executed, including SELECT queries, and is the most complete record of data access. It is disabled by default on production systems because the write overhead is significant, so its absence in an investigation is expected rather than suspicious. The slow query log records statements that exceed a configured execution threshold and is more commonly enabled.
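The MySQL and MariaDB binary log is a sequential record of all committed data modification events in a binary format. Each event records the statement or row changes, the timestamp, and the server ID, and statement events also carry the connection thread ID. The executing account is generally not part of the event, so attributing a change to a user usually means matching the thread ID and time against connection records from another source. The binary log can be decoded with the mysqlbinlog utility. In SQL injection investigations where the attacker extracted data with SELECT statements, the binary log will not help (because SELECT statements are not writes), but it will show whether the attacker also modified or deleted records. On PostgreSQL, the write-ahead log (WAL) serves a comparable role, although it records low-level page changes rather than SQL statements.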
Enterprise database platforms have dedicated audit systems. Oracle Database provides the Unified Audit Trail, which captures logins, logouts, privilege changes, and configurable statement categories. Microsoft SQL Server provides the SQL Server Audit feature, which writes to the Windows Security event log or to binary audit files. Both allow retroactive analysis of which accounts accessed which tables and when, making them the primary source for data exfiltration investigations where the attacker authenticated with legitimate credentials.
Application and authentication log sources
Beyond web server and database logs, modern applications generate their own application-level logs through logging frameworks. In Java applications, Log4j and Logback are the dominant frameworks. In Python, the standard logging module is universal. In .NET, Serilog and NLog are common. These logs capture business-logic events: user logins, privilege changes, failed authorisation checks, file downloads, and custom error conditions that the web server layer cannot see. Their format is application-defined, so the first step in analysis is locating the logging configuration to understand the schema.
Authentication events are recorded in different places depending on the operating system. On Linux, the PAM subsystem writes authentication results to /var/log/auth.log (Debian-based) or /var/log/secure (Red Hat-based), including SSH logins, sudo uses, and su commands. The last and lastb commands parse the wtmp and btmp binary files respectively to list successful and failed logins with timestamps and source IPs. On Windows, Security event log events 4624 (successful login), 4625 (failed login), 4648 (explicit credential use), and 4720 (account creation) are the primary authentication audit sources.
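Counting failed SSH logins per source address is often the first pass over auth.log or secure. The sketch below matches the common OpenSSH "Failed password" wording; the exact message text and process name vary between versions and configurations, and the sample lines are invented. The names `FAILED` and `failed_ssh_by_ip` are illustrative.

```python
import re
from collections import Counter

# Typical OpenSSH failure message; "invalid user" appears for unknown accounts.
FAILED = re.compile(
    r'sshd\[\d+\]: Failed password for (?:invalid user )?(?P<user>\S+) '
    r'from (?P<ip>\S+) port \d+'
)


def failed_ssh_by_ip(lines):
    """Count failed SSH password attempts per source IP."""
    return Counter(m["ip"] for m in map(FAILED.search, lines) if m)


# Invented sample lines for illustration only.
auth_lines = [
    "Jun 24 10:15:32 web01 sshd[4121]: Failed password for invalid user admin from 203.0.113.42 port 51234 ssh2",
    "Jun 24 10:15:35 web01 sshd[4121]: Failed password for root from 203.0.113.42 port 51240 ssh2",
    "Jun 24 10:16:01 web01 sshd[4150]: Accepted password for deploy from 198.51.100.7 port 40022 ssh2",
]
counts = failed_ssh_by_ip(auth_lines)
print(counts.most_common(5))
```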
Cloud platforms add their own audit layers. AWS CloudTrail records API calls made to AWS services, with the identity of the caller, the source IP, the time, and the request parameters. Management events are recorded by default, while data events such as S3 object reads must be enabled explicitly, so the absence of data events does not show that no data was accessed. Azure Monitor and Azure Activity Log provide equivalent coverage for Microsoft Azure. Google Cloud Audit Logs cover admin activity, data access, and system events. These cloud audit trails are stored outside the compromised host by default, making them difficult for an attacker who has compromised only the application layer to tamper with.
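CloudTrail log files are JSON documents with a top-level Records array, so the caller, source IP, time, and action can be extracted with the standard `json` module. The sketch below uses a heavily trimmed, invented document that keeps only a few of the real field names; `summarise_cloudtrail` is an illustrative name.

```python
import json


def summarise_cloudtrail(document: str):
    """Return one summary row per CloudTrail record, sorted by event time."""
    rows = []
    for record in json.loads(document).get("Records", []):
        identity = record.get("userIdentity", {})
        rows.append({
            "time": record.get("eventTime"),
            "source": record.get("eventSource"),
            "action": record.get("eventName"),
            "ip": record.get("sourceIPAddress"),
            "caller": identity.get("arn") or identity.get("type"),
        })
    # ISO 8601 UTC strings sort correctly as plain text.
    return sorted(rows, key=lambda row: row["time"] or "")


# Invented, trimmed document for illustration only.
trail = json.dumps({"Records": [
    {"eventTime": "2026-06-24T10:17:02Z", "eventSource": "iam.amazonaws.com",
     "eventName": "CreateAccessKey", "sourceIPAddress": "203.0.113.42",
     "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::123456789012:user/deploy"}},
    {"eventTime": "2026-06-24T10:15:40Z", "eventSource": "s3.amazonaws.com",
     "eventName": "ListBuckets", "sourceIPAddress": "203.0.113.42",
     "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::123456789012:user/deploy"}},
]})
rows = summarise_cloudtrail(trail)
for row in rows:
    print(row["time"], row["ip"], row["action"], row["caller"])
```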
For cross-jurisdictional investigations, the legal framework for compelling disclosure of log records varies. In the United States, the Stored Communications Act (part of the Electronic Communications Privacy Act 1986) governs compelled disclosure from service providers. In the United Kingdom, the Investigatory Powers Act 2016 covers data retention and access orders. In India, Section 94 of the Bharatiya Nagarik Suraksha Sanhita 2023 (which replaced Section 91 of the CrPC) provides the mechanism for production of electronic records, with admissibility governed by the Bharatiya Sakshya Adhiniyam 2023. The EU's GDPR creates tension with data retention requirements, because logs containing IP addresses are personal data and their retention must be justified under a lawful basis.
Log correlation and timeline reconstruction
Log correlation is the process of combining records from multiple independent sources into a single unified timeline. The prerequisite is timestamp normalisation: all log sources must use the same reference timezone, typically UTC, before records can be sorted by time. Network Time Protocol (NTP) synchronisation on servers reduces clock drift between sources, but the investigator should verify NTP status and any documented drift against the server's own ntpd or chrony logs.
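Timestamp normalisation can be sketched with the standard `datetime` module. The first function below handles Apache-style timestamps that carry their own offset; the second handles timestamps with no timezone label, where the offset has to come from documentation of the server's configuration. Both function names are illustrative, and the month-name parsing assumes an English locale.

```python
from datetime import datetime, timedelta, timezone


def apache_to_utc(stamp: str) -> datetime:
    """Convert an Apache timestamp such as '24/Jun/2026:15:45:32 +0530' to UTC."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").astimezone(timezone.utc)


def naive_to_utc(stamp: str, fmt: str, utc_offset_hours: float) -> datetime:
    """Convert a timestamp with no timezone label, given a documented offset."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return datetime.strptime(stamp, fmt).replace(tzinfo=local_zone).astimezone(timezone.utc)


web = apache_to_utc("24/Jun/2026:15:45:32 +0530")
db = naive_to_utc("2026-06-24 12:15:32", "%Y-%m-%d %H:%M:%S", utc_offset_hours=2)
print(web.isoformat(), db.isoformat())
```

The two sample records look more than three hours apart as written, yet both resolve to the same UTC instant. The offset applied to an unlabelled timestamp is an assumption that should be recorded in the case notes.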
With timestamps normalised, correlation joins records by shared attributes. An IP address that appears in a web server access log at a given time can be matched to firewall connection logs, proxy logs, and IDS alerts for the same address in the same time window. A username that appears in an authentication log can be joined to database audit records showing what that user queried. A session token appearing in application logs links individual HTTP requests to the session that generated them.
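A join on a shared attribute within a time window might look like the following sketch, which pairs web requests with firewall records from the same IP address. The event tuples, the five-second window, and the name `correlate` are assumptions for illustration; timestamps are taken to be already normalised to UTC.

```python
from datetime import datetime, timedelta


def correlate(web_events, firewall_events, window=timedelta(seconds=5)):
    """Pair each web event with firewall events from the same IP within the window.

    Both inputs are iterables of (timestamp, ip, detail) tuples.
    """
    by_ip = {}
    for ts, ip, detail in firewall_events:
        by_ip.setdefault(ip, []).append((ts, detail))
    matches = []
    for ts, ip, request in web_events:
        for fw_ts, detail in by_ip.get(ip, []):
            if abs(fw_ts - ts) <= window:
                matches.append((ts, ip, request, detail))
    return sorted(matches)


# Invented events for illustration only.
web_events = [(datetime(2026, 6, 24, 10, 15, 32), "203.0.113.42", "GET /admin/login")]
firewall_events = [
    (datetime(2026, 6, 24, 10, 15, 31), "203.0.113.42", "ALLOW tcp/443"),
    (datetime(2026, 6, 24, 9, 0, 0), "203.0.113.42", "ALLOW tcp/22"),
    (datetime(2026, 6, 24, 10, 15, 32), "198.51.100.7", "ALLOW tcp/443"),
]
matches = correlate(web_events, firewall_events)
print(matches)
```

The window should be chosen with the measured clock drift between the two sources in mind: too narrow and genuine matches are missed, too wide and unrelated records are joined.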
The tools used for correlation range from simple command-line utilities to enterprise SIEM platforms. For small-scale investigations, grep, awk, sort, and join on a Unix shell can produce a correlated timeline from a handful of log files. For larger investigations, tools such as Elastic Stack (Elasticsearch, Logstash, Kibana), Splunk, or Graylog ingest logs from multiple sources, index them, and provide query and visualisation capabilities. The open-source Wazuh platform combines log ingestion with rule-based detection and is commonly used in resource-constrained environments.
A reconstructed attack timeline typically shows identifiable phases: reconnaissance (scanning requests, probing URIs that return 404), initial access (the exploit request, the successful authentication bypass, or the phishing click), execution (commands run on the server, queries executed in the database), lateral movement (connections from the compromised host to internal systems), and exfiltration (large outbound transfers, downloads of database dumps). The timeline does not need to be complete to be useful: even a partial sequence is often sufficient to identify the initial access vector and the scope of data accessed.
Anti-forensic log clearing and residual traces
Log clearing is a standard post-compromise activity for attackers who wish to extend persistence without detection. The techniques vary by platform. On Linux, an attacker with root access can truncate log files with the > operator, delete them with rm, or use the shred utility to overwrite file content before deletion. Bash history can be suppressed by unsetting the HISTFILE variable for the session, or destroyed by truncating ~/.bash_history or replacing it with a symbolic link to /dev/null. On Windows, an attacker can clear the Security, System, or Application event logs using the wevtutil cl command or through the Event Viewer GUI.
Each clearing technique leaves its own signature. A truncated log file retains its inode but has a modification timestamp that postdates the last legitimate entry. A deleted log file may leave an inode entry recoverable through file system forensic tools until the blocks are overwritten. The Linux kernel's process accounting (if enabled via psacct or acct) records the name of every executed command with its user and timestamps, independent of shell history, so a shred or rm invocation will appear in the process accounting log even after the bash history is cleared. Process accounting does not record command arguments; capturing which file was targeted requires an audit framework such as auditd with execve rules. On Windows, clearing an event log through the Windows Event API itself generates a record: Event ID 1102 in the Security log when the Security log is cleared, and Event ID 104 in the System log when logs such as System or Application are cleared. The 1102 record is written as the first entry of the freshly cleared log, so it survives the clearing and marks when it happened and which account did it. It is absent only when the attacker bypasses the API, for example by deleting or overwriting the .evtx files directly.
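Selective deletion or truncation often shows up as an unexplained silence in a log that is normally continuous. The sketch below lists gaps longer than a chosen threshold; the function name `find_gaps` and the fifteen-minute threshold are illustrative, and a realistic threshold depends on the source's normal event rate.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_silence):
    """Return (last_seen, next_seen) pairs separated by more than max_silence."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_silence
    ]


# Invented timestamps for illustration only.
stamps = [
    datetime(2026, 6, 24, 10, 0),
    datetime(2026, 6, 24, 10, 1),
    datetime(2026, 6, 24, 10, 2),
    datetime(2026, 6, 24, 13, 30),
    datetime(2026, 6, 24, 13, 31),
]
gaps = find_gaps(stamps, max_silence=timedelta(minutes=15))
print(gaps)
```

A gap is not proof of tampering, because quiet periods, service restarts, and log rotation produce them too. Gaps are most informative when another source, such as a firewall or centralised collector, shows activity for the same host during the silent interval.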
The most reliable source of logs that survive host compromise is infrastructure that the attacker cannot access. A real-time syslog forward to a separate log server, especially one in a separate administrative domain, preserves the log state at the moment of forwarding. Cloud provider audit logs (CloudTrail, Azure Activity Log) are written to infrastructure outside the compromised instance. Network-level logs, including firewall connection logs and DNS query logs maintained on separate infrastructure, record the attacker's network activity even if the application-layer logs on the compromised host are wiped.
In the Apache Combined Log Format, a request line reads: GET /wp-admin/install.php HTTP/1.1. The status code is 200 and the user-agent field contains 'sqlmap/1.7.2'. What two indicators of compromise does this single log entry suggest?
Key Takeaways
- The Apache/Nginx Combined Log Format records client IP, timestamp, request line, status code, response size, referrer, and user-agent; the X-Forwarded-For field is critical for applications behind proxies or load balancers where the IP field alone is uninformative.
- Email server logs, database general query logs, application framework logs, and OS authentication logs each capture a different slice of attacker activity; correlated against a common UTC-normalised timeline, they reveal the full intrusion sequence that no single source can show alone.
- The MySQL binary log records committed write events even when the general query log is disabled, making it the primary source for confirming data modification or deletion by an attacker who used valid credentials.
- Anti-forensic log clearing leaves detectable traces: timestamp gaps in otherwise continuous streams, process accounting records of the clearing command, Windows Event ID 1102 for Security log clearing, and the presence of the full record on any centralised syslog or cloud audit infrastructure the attacker did not reach.
- Real-time forwarding to a centralised syslog server or cloud audit service in a separate administrative domain is the primary architectural defence against evidence destruction, and the first question an investigator should ask is whether such infrastructure exists before relying on local log records.
What is the Combined Log Format and why is it important in forensic investigations?
How can investigators detect log tampering by an attacker?
What residual traces survive after an attacker deletes or clears log files?
Which database log sources are most useful in a forensic investigation?
How does log correlation work across multiple server sources?