Static Malware Analysis
Static malware analysis examines a suspicious file without executing it, using file-type identification, hashing, string extraction, and disassembly to characterise its capabilities and origin. This topic covers the analyst's toolkit and safe, repeatable workflow for conducting static examination of malware samples.
Static malware analysis is the examination of a suspicious file's content, structure, and code without ever executing it. An analyst works from a copy of the sample in an isolated environment, applying a sequence of tools: file-type identification to confirm what the binary actually is, cryptographic hashing to fingerprint it and query threat-intelligence platforms, string extraction to surface embedded URLs and API names, binary-header parsing to read compilation metadata and import tables, and disassembly or decompilation to read the code logic. Because the sample never runs, there is no risk of accidental infection, and every finding can be reproduced exactly from the same file bytes.
Static analysis is the first phase in any structured malware investigation workflow. It produces quick, low-risk intelligence: the file's hash can be submitted to VirusTotal within minutes; strings may reveal command-and-control domains before a sandbox run is even requested. The technique has limits, though. Packed or encrypted malware hides its real code until runtime, and heavily obfuscated samples resist string analysis. When static techniques hit these limits, the analyst moves to dynamic analysis. In practice, the two phases complement each other: static findings guide what to look for during execution, and dynamic observations point back to specific code regions to disassemble.
The legal authority for seizing, retaining, and analysing malware samples varies by jurisdiction. In India, the Information Technology Act 2000 (as amended 2008) and the Bharatiya Sakshya Adhiniyam 2023 govern digital evidence admissibility and require that analysis not alter the original evidence. In the United States, the Computer Fraud and Abuse Act and Federal Rules of Evidence Rule 901 set equivalent standards. In the United Kingdom, the Computer Misuse Act 1990 and the Police and Criminal Evidence Act 1984 apply. The European Union's Directive on Attacks Against Information Systems (2013/40/EU) harmonises criminal offences across member states. Regardless of jurisdiction, working from a verified hash-matched copy of the original and maintaining a documented chain of custody is mandatory for any analysis that may support a prosecution.
By the end of this topic you will be able to:
- Describe the purpose and sequence of the five core static analysis steps: file-type identification, hashing, string extraction, header parsing, and disassembly.
- Explain what information a PE (Portable Executable) import table reveals about a Windows malware sample's intended behaviour.
- Use cryptographic hashes as indicators of compromise and query threat-intelligence platforms to check whether a sample has been previously identified.
- Identify the signs of packing or obfuscation in static analysis output and explain why they limit the technique.
- Apply chain-of-custody and evidence-integrity principles to a malware examination that may support legal proceedings.
- Portable Executable (PE): The binary file format used by Windows executables (.exe), dynamic-link libraries (.dll), and drivers (.sys). The PE header contains a structured metadata block including the import table, section table, compilation timestamp, and entry point address. Parsing the PE header is a foundational step in static Windows malware analysis.
- Import Address Table (IAT): A structure referenced from the PE header that lists the external DLLs the executable links to at load time and the functions it imports from each. A malware sample's IAT hints at its intended capabilities: calls to CreateRemoteThread suggest process injection; calls to CryptEncrypt suggest ransomware behaviour; calls to InternetOpen suggest network communication.
- Cryptographic hash: A fixed-length digest produced from a file's bytes by an algorithm such as MD5 (128-bit), SHA-1 (160-bit), or SHA-256 (256-bit). Identical files always produce the same hash. Hashes are used to fingerprint malware samples, verify file integrity, and share indicators of compromise without distributing the sample itself.
- Packer / packing: A technique in which the original malware code is compressed or encrypted and wrapped in a stub loader that decompresses or decrypts it at runtime. Packed binaries frustrate static analysis because the real payload is not visible in the file bytes; only the stub is available to the disassembler until the sample is executed or the packer is reversed.
- Disassembly: The process of converting raw binary machine code back into human-readable assembly language instructions. Disassembly is almost always possible for a supported architecture, whereas decompilation can fail or produce misleading output, but it requires the analyst to understand assembly mnemonics and calling conventions for the target architecture.
- Indicator of Compromise (IoC): An observable artefact that suggests a system has been involved in a malicious event. Static analysis produces file-based IoCs: cryptographic hashes, embedded domain names or IP addresses, and specific string patterns. These are shared via threat-intelligence platforms and detection rules so other defenders can identify the same threat.
Setting up the analysis environment
Before touching a malware sample, the analyst must prepare a safe, isolated workspace. The two fundamental requirements are isolation and integrity. Isolation means the analysis machine cannot route traffic to production networks, preventing accidental command-and-control callbacks and limiting any inadvertent execution damage. Integrity means the original sample is never modified, and every tool output can be traced back to a specific, hash-verified copy of the file.
In practice, static analysis is typically performed on a dedicated forensic workstation or inside a virtual machine with networking disabled or restricted to a controlled analysis VLAN. A VM snapshot taken before any sample is introduced allows the analyst to roll back to a clean state between cases. Endpoint-detection software on the analysis machine should be either disabled for the session or configured to alert without blocking, so that neither the sample nor the analysis tools are quarantined mid-examination.
The analyst should establish a case folder before beginning, recording the date and time of receipt, the source of the sample (incident ticket, email attachment hash, network capture extraction), the chain-of-custody information, and the hash of the original file. Every tool run should be logged with the command used and the timestamp. This documentation is what allows another analyst to reproduce the findings exactly, and what supports admissibility if the case reaches a court.
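The logging habit described above can be sketched as a small helper that appends one timestamped record per tool run to a case log. This is a minimal illustration, not a prescribed format: the function name `log_tool_run`, the JSON Lines layout, and the field names are all assumptions made for the example.

```python
import json
from datetime import datetime, timezone


def log_tool_run(log_path, sample_sha256, command, note=""):
    """Append one JSON record describing a tool run to the case log.

    The record ties the command to the hash of the copy it was run
    against, so another analyst can repeat the exact step.
    """
    record = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "sample_sha256": sample_sha256,
        "command": command,
        "note": note,
    }
    # Append-only: earlier entries are never rewritten.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return record
```

Keeping the log append-only and keyed to the sample hash is a design choice that makes it easier to show, later, which output came from which copy of the file.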
File-type identification and hashing
The first substantive step is confirming what the file actually is. File extensions are controlled by the attacker and are routinely spoofed; a file named invoice.pdf may be a Windows PE executable or a JavaScript dropper. The reliable method is to read the file's magic bytes: in most binary formats, the first few bytes contain a signature that identifies the format. The Unix file command, and Windows tools such as TrID or CFF Explorer, read these bytes and report the actual format regardless of the extension. A PE executable begins with the ASCII bytes MZ (0x4D 0x5A). A PDF begins with %PDF. A ZIP archive begins with PK (0x50 0x4B).
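A minimal sketch of the magic-byte check, limited to the three signatures named above. The function names, the short labels, and the extension table are assumptions for illustration; real tools such as file recognise far more formats, and an MZ prefix alone only shows a DOS header, not a fully valid PE.

```python
import os

# (signature, short label) pairs for the formats named in the text.
MAGIC_SIGNATURES = [
    (b"MZ", "PE"),     # DOS/PE executable header
    (b"%PDF", "PDF"),  # PDF document
    (b"PK", "ZIP"),    # ZIP archive (also used by DOCX, JAR and similar)
]

# Illustrative mapping from extension to the label we expect to see.
EXPECTED_BY_EXTENSION = {
    ".exe": "PE", ".dll": "PE", ".sys": "PE",
    ".pdf": "PDF", ".zip": "ZIP",
}


def identify_magic(data):
    """Return a short format label based on the leading bytes."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return "unknown"


def is_extension_spoofed(filename, data):
    """True when the extension promises one format and the bytes show another."""
    extension = os.path.splitext(filename)[1].lower()
    expected = EXPECTED_BY_EXTENSION.get(extension)
    return expected is not None and expected != identify_magic(data)
```

For example, `is_extension_spoofed("invoice.pdf", header_bytes)` would be true when `header_bytes` starts with MZ, which is the mismatch the paragraph above warns about.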
Once the file type is confirmed, compute cryptographic hashes before doing anything else. The MD5, SHA-1, and SHA-256 hashes serve two purposes. First, they provide a permanent fingerprint of the file in its current state; any tool that modifies the file bytes will change the hash and reveal that tampering. Second, the hashes are the primary identifier for querying threat-intelligence platforms. Submitting a SHA-256 hash to VirusTotal returns the detection verdicts from over 70 antivirus engines plus any contextual reports linked to that hash, all without uploading the sample itself.
| Hash algorithm | Output length | Common use in malware analysis | Collision resistance |
|---|---|---|---|
| MD5 | 128-bit / 32 hex chars | Legacy IoC sharing; VirusTotal lookup | Weak (collision attacks practical) |
| SHA-1 | 160-bit / 40 hex chars | Some older threat-intel feeds | Broken (SHAttered, 2017) |
| SHA-256 | 256-bit / 64 hex chars | Current standard for IoC sharing | Strong |
| ssdeep (fuzzy hash) | Variable | Detecting similar variants of a sample | Not a cryptographic hash |
Fuzzy hashing tools such as ssdeep produce a hash that remains similar even when a file is slightly modified. This is useful for grouping malware variants that share a common code base despite different exact bytes. A malware author may change a single byte to defeat exact-hash detection; ssdeep will still show a high similarity score to the original sample.
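The three cryptographic hashes in the table can be computed in a single pass over the file with Python's standard hashlib module. This is a sketch; the function name `hash_file` and the chunk size are arbitrary choices, and fuzzy hashing is not covered because ssdeep is not part of the standard library.

```python
import hashlib


def hash_file(path, chunk_size=65536):
    """Return MD5, SHA-1 and SHA-256 hex digests of a file.

    The file is opened read-only and streamed in chunks, so large
    samples are never loaded into memory in one piece.
    """
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Recomputing these digests at the end of an examination and comparing them with the values recorded at intake is a simple way to show that the working copy was not altered.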
String extraction
Most binaries contain sequences of printable ASCII or Unicode characters embedded in the file bytes: URLs for command-and-control servers, file paths that the malware creates or modifies, registry key names it reads or writes, error messages used for debugging during development, and the names of Windows API functions it imports. The strings command (available on Linux and as a Sysinternals tool on Windows) scans a file for sequences of printable characters above a minimum length threshold (typically four characters) and prints them. FLOSS (FireEye Labs Obfuscated String Solver) extends this by also extracting stack strings and encoded strings that the binary would construct at runtime.
A raw string dump from a malware sample may contain thousands of entries, most of them artefacts of the compiler or runtime library. The analyst filters for strings that reveal intent: any string that matches a URL pattern (http://, .onion, or an IP address in dotted-decimal notation); any string that matches a Windows registry path (HKCU\Software\..., HKLM\System\...); any string naming a sensitive file or directory; any string that appears to be a command or shell invocation; and any string in an unusual character set that might indicate an international target or a payload encoded in Base64.
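The extraction and filtering steps above can be approximated in a few lines. This is a rough sketch, not a replacement for strings or FLOSS: it finds only plain ASCII and UTF-16LE runs, and the regular expressions and category names are simplified assumptions that will produce both false positives and misses on real samples.

```python
import re

# Simplified patterns for the categories mentioned in the text.
PATTERNS = {
    "url": re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE),
    "onion": re.compile(r"\b[a-z2-7]{16,56}\.onion\b", re.IGNORECASE),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "registry": re.compile(r"\b(?:HKCU|HKLM|HKEY_[A-Z_]+)\\[^\s\"']+", re.IGNORECASE),
}


def extract_strings(data, min_len=4):
    """Return printable ASCII and UTF-16LE runs of at least min_len characters."""
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_run.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_run.finditer(data)]
    return found


def categorise(strings):
    """Group pattern matches from the extracted strings by category."""
    hits = {name: [] for name in PATTERNS}
    for text in strings:
        for name, pattern in PATTERNS.items():
            hits[name].extend(pattern.findall(text))
    return hits
```

A typical use would be `categorise(extract_strings(open(path, "rb").read()))`, followed by manual review of each category, since a match only shows that a string looks like an indicator.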
PE header analysis and import table parsing
For Windows malware, the Portable Executable header is among the richest sources of static intelligence. Tools such as PEview, CFF Explorer, pestudio, and the Python library pefile parse the header into human-readable fields. The key fields to examine are: the compilation timestamp (can reveal when the binary was built, though it is easily forged), the section table (names, sizes, and entropy of each code and data section), and the import directory.
The Import Address Table lists every DLL the executable links to at load time and the specific functions it calls from each. This is extremely revealing: a binary that imports CreateRemoteThread, VirtualAllocEx, and WriteProcessMemory is almost certainly performing process injection. A binary importing CryptEncrypt or BCryptEncrypt alongside FindFirstFile and FindNextFile is searching the filesystem and encrypting files, the signature behaviour of ransomware. A binary importing WSAStartup, socket, connect, and send is establishing a network connection. The analyst builds a capability profile from this list before reading a single line of code.
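Building a capability profile from an import list can be sketched as a lookup against groups of API names. The groups below contain only the functions named in this section, and the function name `capability_profile` and the category labels are assumptions for illustration. The import names themselves would come from a PE parser such as pefile or pestudio; benign software also uses these APIs, so a hit is a lead and not a verdict.

```python
# API groups taken from the examples in the text above.
CAPABILITY_RULES = {
    "process injection": {"CreateRemoteThread", "VirtualAllocEx", "WriteProcessMemory"},
    "file enumeration": {"FindFirstFile", "FindNextFile"},
    "encryption": {"CryptEncrypt", "BCryptEncrypt"},
    "network communication": {"WSAStartup", "socket", "connect", "send", "InternetOpen"},
}


def capability_profile(imported_names):
    """Map imported function names to the capability groups they suggest."""
    known = set().union(*CAPABILITY_RULES.values())
    normalised = set()
    for name in imported_names:
        if name in known:
            normalised.add(name)
        # Many Windows APIs are exported with an A (ANSI) or W (wide) suffix.
        elif name[-1:] in ("A", "W") and name[:-1] in known:
            normalised.add(name[:-1])
    return {
        capability: sorted(apis & normalised)
        for capability, apis in CAPABILITY_RULES.items()
        if apis & normalised
    }
```

A sample importing both the file-enumeration and encryption groups would match the ransomware pattern described above, which is why the groups are reported separately instead of as a single score.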
Section entropy is a measure of randomness in the bytes of each section, expressed in bits per byte. Normal code sections typically have entropy in the range of 5.0 to 6.5. Packed or encrypted sections typically have entropy above 7.0, approaching the theoretical maximum of 8.0 for truly random data. High-entropy sections that are also marked as executable are a strong indicator of a packed payload. The tool pestudio flags high-entropy sections automatically and is a standard first-pass tool in many static analysis workflows.
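The entropy figure quoted above is the Shannon entropy of the byte distribution, in bits per byte. A minimal sketch follows; the function names are illustrative, and the 7.0 threshold is the rule of thumb from the text, not a hard limit.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(section_bytes, threshold=7.0):
    """Apply the rule-of-thumb threshold to one section's raw bytes."""
    return shannon_entropy(section_bytes) > threshold
```

A section filled with a single repeated byte scores 0.0, while one in which all 256 byte values occur equally often scores 8.0, which is why compressed or encrypted data sits near the top of the range.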
| PE field | What to look for | Significance of anomaly |
|---|---|---|
| Compilation timestamp | Future date, or 1970-01-01 epoch | Possibly forged to mislead attribution |
| Section names | Names like .text, .data, .rdata are normal; random strings or known packer names (.packed, UPX0) are not | Indicates packing or obfuscation |
| Section entropy | Above 7.0 on executable sections | Encrypted or compressed payload |
| Import table | Very few imports, or only LoadLibrary and GetProcAddress | Imports resolved dynamically to hide capabilities |
| Number of sections | Fewer than 3 or more than 10 for a simple binary | May indicate stripping or added shellcode sections |
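Several fields in the table can be read directly from the file bytes with the standard struct module. The sketch below follows the published PE/COFF layout (the e_lfanew pointer at offset 0x3C, a 20-byte COFF header, 40-byte section entries), but it is deliberately minimal: the function name and the returned dictionary keys are assumptions, the import directory is not parsed, and truncated or malformed files will raise an error. A maintained parser such as pefile is the safer choice for casework.

```python
import struct
from datetime import datetime, timezone

IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section characteristic flag: executable


def parse_pe_header(data):
    """Return timestamp, entry point and section table from raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")

    # COFF file header: 20 bytes immediately after the signature.
    coff = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    section_count, timestamp, optional_size = coff[1], coff[2], coff[5]

    optional_offset = pe_offset + 24
    entry_point = None
    if optional_size >= 20:
        # AddressOfEntryPoint sits 16 bytes into the optional header.
        (entry_point,) = struct.unpack_from("<I", data, optional_offset + 16)

    sections = []
    table_offset = optional_offset + optional_size
    for index in range(section_count):
        entry = struct.unpack_from("<8sIIIIIIHHI", data, table_offset + 40 * index)
        sections.append({
            "name": entry[0].rstrip(b"\x00").decode("ascii", errors="replace"),
            "raw_size": entry[3],
            "raw_offset": entry[4],
            "executable": bool(entry[9] & IMAGE_SCN_MEM_EXECUTE),
        })

    return {
        "compiled_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
        "entry_point": entry_point,
        "number_of_sections": section_count,
        "sections": sections,
    }
```

The `raw_offset` and `raw_size` of each section give the byte range to slice out of the file when measuring that section's entropy.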
Disassembly and code analysis
Disassembly converts the raw binary bytes into assembly language instructions. The dominant tool for professional static analysis is IDA Pro, which produces an interactive disassembly graph, reconstructs function boundaries, and allows the analyst to annotate the code. Ghidra, developed and released by the US National Security Agency, is a free alternative with comparable capabilities and an integrated decompiler that produces C-like pseudocode. Radare2 is a command-line framework with scripting support, preferred in automated analysis pipelines.
The analyst begins at the entry point, the address recorded in the PE header where execution starts, and follows the control flow. In a packed binary, the entry point typically leads to a short stub that decrypts or decompresses the real code into memory and then jumps to it. Recognising the unpacking stub pattern (a loop that writes to a newly allocated memory region, ending with a computed jump) tells the analyst where the real analysis begins. In an unpacked binary, the entry point leads to the main payload logic relatively quickly.
Code analysis at this level requires understanding of x86 or x86-64 assembly, Windows calling conventions (the Microsoft x64 ABI for 64-bit binaries, cdecl or stdcall for 32-bit), and common patterns in malware code. A call to GetProcAddress immediately after LoadLibraryA, with the library name and function name passed as string arguments, is how malware resolves imports at runtime to hide them from the static import table. Recognising this pattern allows the analyst to annotate what the call is actually loading.
Documenting findings and producing IoCs
Static analysis concludes with a structured report that documents every finding in a reproducible form. The report records: the file hash (SHA-256 as the primary identifier, with MD5 and SHA-1 for legacy compatibility), the file type, size, and compilation timestamp; a capability profile derived from the import table; all meaningful strings grouped by category (network indicators, filesystem targets, registry paths, encoded strings); and any assembly-level findings from disassembly, including identified functions and obfuscation techniques.
IoCs extracted from static analysis are published in a structured format so that detection tools can consume them automatically. STIX (Structured Threat Information eXpression) is the current standard for expressing threat intelligence, including file hashes, IP addresses, domain names, and URL patterns. YARA rules are used to express code-level signatures: a YARA rule can specify that a file must contain a particular string pattern, have a particular hash, or satisfy conditions on PE header fields. YARA rules run against file systems and memory captures to detect the same malware or variants.
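As an illustration of the structured format, a file-hash IoC might be expressed as a STIX indicator object along these lines. The property names and pattern syntax reflect the STIX 2.1 specification as understood here and should be checked with a STIX validator before publishing; the function name and the choice of fields are assumptions for the example.

```python
import re
import uuid
from datetime import datetime, timezone


def stix_file_hash_indicator(sha256, name):
    """Build a STIX 2.1-style indicator dict for a SHA-256 file hash."""
    if not re.fullmatch(r"[0-9a-fA-F]{64}", sha256):
        raise ValueError("expected a 64-character hexadecimal SHA-256 digest")
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        # STIX patterning: match any file object with this SHA-256 hash.
        "pattern": f"[file:hashes.'SHA-256' = '{sha256.lower()}']",
        "pattern_type": "stix",
        "valid_from": now,
    }
```

Serialising the returned dictionary with `json.dumps` yields an object that can be placed in a bundle alongside indicators for domains and IP addresses extracted from the strings.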
For investigations involving malware used in cybercrime, the findings from static analysis feed into the broader investigation workflow described in the cyber investigation process. Network-based IoCs such as command-and-control domains are passed to network investigators for traffic correlation. Attribution artefacts such as compilation metadata, language settings in string tables, and code reuse patterns are forwarded to threat-intelligence analysts. The static analysis report is not an endpoint; it is an input to the wider case.
An analyst runs file on a binary received via email and the output reads 'PE32 executable (GUI) Intel 80386'. The file was named quarterly_report.pdf. What does this finding mean?
Key Takeaways
- Static malware analysis examines a file without executing it, using file-type identification, hashing, string extraction, PE header parsing, and disassembly to build a capability and intelligence profile with no infection risk.
- Cryptographic hashes (SHA-256 as the current standard) fingerprint a sample uniquely, enable threat-intelligence lookups, and serve as the primary file-based indicator of compromise shared across defenders.
- The PE import table reveals a Windows malware sample's intended capabilities before a single line of code is read: specific API function groups map directly to process injection, encryption, persistence, or network communication.
- High section entropy (above 7.0) and a minimal import table containing only LoadLibraryA and GetProcAddress are the two clearest static indicators that a sample is packed, requiring dynamic unpacking or memory forensics to recover the real payload.
- All analysis must be performed on a hash-verified copy in an isolated environment, with every step logged, to satisfy chain-of-custody requirements under applicable law, including the Bharatiya Sakshya Adhiniyam 2023, US Federal Rules of Evidence, and the UK Police and Criminal Evidence Act 1984.
What is static malware analysis?
How does hashing help in malware analysis?
What do strings extracted from a malware sample reveal?
What is a PE header and why does it matter in static analysis?
What is the difference between disassembly and decompilation in malware analysis?