
Static Malware Analysis

Static malware analysis examines a suspicious file without executing it, using file-type identification, hashing, string extraction, and disassembly to characterise its capabilities and origin. This topic covers the analyst's toolkit and safe, repeatable workflow for conducting static examination of malware samples.


Static malware analysis is the examination of a suspicious file's content, structure, and code without ever executing it. An analyst works from a copy of the sample in an isolated environment, applying a sequence of tools: file-type identification to confirm what the binary actually is, cryptographic hashing to fingerprint it and query threat-intelligence platforms, string extraction to surface embedded URLs and API names, binary-header parsing to read compilation metadata and import tables, and disassembly or decompilation to read the code logic. Because the sample never runs, the risk of accidental infection is minimal, and every finding can be reproduced exactly from the same file bytes.

Static analysis is typically the first phase of a structured malware investigation workflow. It produces quick, low-risk intelligence: the file's hash can be submitted to VirusTotal within minutes, and strings may reveal command-and-control domains before a sandbox run is even requested. The technique has limits, though. Packed or encrypted malware hides its real code until runtime, and heavily obfuscated samples resist string analysis. When static techniques hit these limits, the analyst moves to dynamic analysis. In practice, the two phases complement each other: static findings guide what to look for during execution, and dynamic observations point back to specific code regions to disassemble.

The legal authority for seizing, retaining, and analysing malware samples varies by jurisdiction. In India, the Information Technology Act 2000 (as amended in 2008) and the Bharatiya Sakshya Adhiniyam 2023 govern the admissibility of digital evidence, which in practice requires that analysis not alter the original evidence. In the United States, the Computer Fraud and Abuse Act defines the underlying computer-intrusion offences, while Federal Rule of Evidence 901 (with Rule 902(14) for data authenticated by a process of digital identification such as hashing) governs how electronic evidence is authenticated. In the United Kingdom, the Computer Misuse Act 1990 defines the offences and the Police and Criminal Evidence Act 1984 governs how evidence is obtained and admitted. The European Union's Directive on Attacks Against Information Systems (2013/40/EU) harmonises criminal offences across member states. Regardless of jurisdiction, working from a verified hash-matched copy of the original and maintaining a documented chain of custody is essential for any analysis that may support a prosecution.

The five-step static workflow:
  • Step 1, file-type identification: read the magic bytes and confirm the format, ignoring the extension.
  • Step 2, hashing: compute SHA-256 and MD5 and query VirusTotal.
  • Step 3, string extraction: filter for URLs, registry paths, and API names.
  • Step 4, PE header parsing: examine imports, section entropy, and the timestamp. If packing is detected, pivot to dynamic analysis; if there are no packing signals, continue on the normal analysis path.
  • Step 5, disassembly: trace the entry point and identify control flow and capabilities.
  • Output: IoCs (SHA-256, C2 domains, YARA rules) and a capability profile.
The PE header step is the decision point: high section entropy (above 7.0) or a minimal import table containing only LoadLibraryA and GetProcAddress signals packing, and the analyst must pivot to dynamic analysis before disassembly is meaningful.

By the end of this topic you will be able to:

  • Describe the purpose and sequence of the five core static analysis steps: file-type identification, hashing, string extraction, header parsing, and disassembly.
  • Explain what information a PE (Portable Executable) import table reveals about a Windows malware sample's intended behaviour.
  • Use cryptographic hashes as indicators of compromise and query threat-intelligence platforms to check whether a sample has been previously identified.
  • Identify the signs of packing or obfuscation in static analysis output and explain why they limit the technique.
  • Apply chain-of-custody and evidence-integrity principles to a malware examination that may support legal proceedings.
Key terms
Portable Executable (PE)
The binary file format used by Windows executables (.exe), dynamic-link libraries (.dll), and drivers (.sys). The PE header contains a structured metadata block including the import table, section table, compilation timestamp, and entry point address. Parsing the PE header is a foundational step in static Windows malware analysis.
Import Address Table (IAT)
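A table, located through the PE header's data directories, that the Windows loader fills with the addresses of the functions an executable imports. Together with the import directory that describes it, the IAT identifies every external DLL the executable links to and the functions it calls from each. A malware sample's imports reveal its intended capabilities: CreateRemoteThread suggests process injection; CryptEncrypt suggests file encryption, a core ransomware behaviour; InternetOpen suggests network communication.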
Cryptographic hash
A fixed-length digest produced from a file's bytes by an algorithm such as MD5 (128-bit), SHA-1 (160-bit), or SHA-256 (256-bit). Identical files always produce the same hash. Hashes are used to fingerprint malware samples, verify file integrity, and share indicators of compromise without distributing the sample itself.
Packer / packing
A technique in which the original malware code is compressed or encrypted and wrapped in a stub loader that decompresses or decrypts it at runtime. Packed binaries frustrate static analysis because the real payload is not visible in the file bytes; only the stub is available to the disassembler until the sample is executed or the packer is reversed.
Disassembly
The process of converting raw binary machine code back into human-readable assembly language instructions. Unlike decompilation, which reconstructs higher-level code only approximately, disassembly stays faithful to the binary at the instruction level (although anti-disassembly tricks can still mislead tools), but it requires the analyst to understand assembly mnemonics and calling conventions for the target architecture.
Indicator of Compromise (IoC)
An observable artefact that suggests a system has been involved in a malicious event. Static analysis produces file-based IoCs: cryptographic hashes, embedded domain names or IP addresses, and specific string patterns. These are shared via threat-intelligence platforms and detection rules so other defenders can identify the same threat.

Setting up the analysis environment

Before touching a malware sample, the analyst must prepare a safe, isolated workspace. The two fundamental requirements are isolation and integrity. Isolation means the analysis machine cannot route traffic to production networks, preventing accidental command-and-control callbacks and limiting any inadvertent execution damage. Integrity means the original sample is never modified, and every tool output can be traced back to a specific, hash-verified copy of the file.

In practice, static analysis is typically performed on a dedicated forensic workstation or inside a virtual machine with networking disabled or restricted to a controlled analysis VLAN. A VM snapshot taken before any sample is introduced allows the analyst to roll back to a clean state between cases. Endpoint-detection software on the analysis machine should be disabled for the session or configured to alert without blocking, so that the sample, or an analysis tool flagged as a hacking utility, is not quarantined mid-examination.

The analyst should establish a case folder before beginning, recording the date and time of receipt, the source of the sample (incident ticket, email attachment, network capture extraction), the chain-of-custody information, and the hash of the original file. Every tool run should be logged with the command used and the timestamp. This documentation is what allows another analyst to reproduce the findings exactly, and what supports admissibility if the case reaches a court.
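The logging discipline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a forensic tool: the JSON Lines format, the field names, and the `log_tool_run` helper are hypothetical choices made for this example, and a real lab would follow its own documentation standard. Each entry ties a command and a UTC timestamp to the SHA-256 of the exact bytes analysed.

```python
import hashlib
import io
import json
from datetime import datetime, timezone
from typing import TextIO


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of an in-memory sample copy."""
    return hashlib.sha256(data).hexdigest()


def log_tool_run(log: TextIO, command: str, sample: bytes, note: str = "") -> dict:
    """Append one tool run to a case log as a JSON line (hypothetical format).

    Recording the sample hash with every entry ties each tool output to a
    specific, hash-verified copy of the file.
    """
    entry = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "sample_sha256": sha256_of_bytes(sample),
        "note": note,
    }
    log.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    # Demonstration with an in-memory log; a real case would append to a file.
    buffer = io.StringIO()
    log_tool_run(buffer, "file sample.bin", b"MZ\x90\x00", note="type check")
    print(buffer.getvalue(), end="")
```

In casework the log would be a file opened in append mode (for example a hypothetical `case_log.jsonl`), so earlier entries are never rewritten.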

File-type identification and hashing

The first substantive step is confirming what the file actually is. File extensions are controlled by the attacker and are routinely spoofed; a file named invoice.pdf may be a Windows PE executable or a JavaScript dropper. The reliable method is to read the file's magic bytes: the first few bytes of most file formats contain a signature that identifies the format. The Unix file command, and tools such as TrID or CFF Explorer on Windows, read these bytes and report the actual format regardless of the extension. A PE executable always begins with the ASCII bytes MZ (0x4D 0x5A). A PDF normally begins with %PDF. A ZIP archive begins with PK (0x50 0x4B).
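A magic-byte check can be sketched as a prefix match against a signature table. The table below is a small, hypothetical subset covering the formats mentioned above plus ELF; tools such as file and TrID use far larger signature databases and more elaborate matching, so treat this as an illustration of the principle only.

```python
# A tiny, illustrative signature table (prefix -> description). Real tools such
# as `file` and TrID ship thousands of signatures and more complex rules.
MAGIC_SIGNATURES = [
    (b"MZ", "DOS/PE executable"),
    (b"%PDF", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (local file header; also DOCX, XLSX, JAR, APK)"),
    (b"\x7fELF", "ELF executable"),
]


def identify_magic(header: bytes) -> str:
    """Return the first matching format for the leading bytes; the extension is never consulted."""
    for magic, description in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first 16 bytes of the file and classify them."""
    with open(path, "rb") as f:
        return identify_magic(f.read(16))
```

Because only the leading bytes are read, a renamed Windows binary such as a hypothetical quarterly_report.pdf that starts with MZ is reported as a DOS/PE executable regardless of its .pdf extension.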

Once the file type is confirmed, compute cryptographic hashes before doing anything else. The MD5, SHA-1, and SHA-256 hashes serve two purposes. First, they provide a permanent fingerprint of the file in its current state; any tool that modifies the file bytes will change the hash and reveal that tampering. Second, the hashes are the primary identifier for querying threat-intelligence platforms. Submitting a SHA-256 hash to VirusTotal returns the detection verdicts from over 70 antivirus engines plus any contextual reports linked to that hash, all without uploading the sample itself.
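Computing all three digests in one pass over the file can be sketched with Python's standard hashlib module. Reading in fixed-size chunks (64 KiB here, an arbitrary choice) keeps memory use flat for large samples. The helper names are illustrative, not part of any tool.

```python
import hashlib
from typing import Iterable


def hash_chunks(chunks: Iterable[bytes]) -> dict:
    """Compute MD5, SHA-1 and SHA-256 in a single pass over the data."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    for chunk in chunks:
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def hash_file(path: str, chunk_size: int = 65536) -> dict:
    """Hash a file in fixed-size chunks so large samples need not fit in memory."""
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        return hash_chunks(iter(lambda: f.read(chunk_size), b""))
```

The SHA-256 value is the one to record as the primary identifier and to search on a threat-intelligence platform; searching by hash does not upload the sample.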

Hash algorithm | Output length | Common use in malware analysis | Collision resistance
MD5 | 128-bit / 32 hex chars | Legacy IoC sharing; VirusTotal lookup | Weak (collision attacks practical)
SHA-1 | 160-bit / 40 hex chars | Some older threat-intel feeds | Broken (SHAttered, 2017)
SHA-256 | 256-bit / 64 hex chars | Current standard for IoC sharing | Strong
ssdeep (fuzzy hash) | Variable | Detecting similar variants of a sample | Not a cryptographic hash

Fuzzy hashing tools such as ssdeep produce a hash that remains similar even when a file is slightly modified. This is useful for grouping malware variants that share a common code base despite different exact bytes. A malware author may change a single byte to defeat exact-hash detection; ssdeep will still show a high similarity score to the original sample.
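ssdeep itself uses context-triggered piecewise hashing, which is too involved to reproduce here. As a simplified stand-in, the sketch below hashes fixed-size blocks and compares them position by position; the block size, function names, and scoring are hypothetical choices for illustration, and this is not ssdeep's algorithm. It shows the core idea: a one-byte change alters the whole SHA-256 digest but only one block hash.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 64) -> list:
    """Hash fixed-size blocks (a toy piecewise hash, NOT ssdeep's scheme)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def block_similarity(a: bytes, b: bytes, block_size: int = 64) -> float:
    """Fraction of position-aligned blocks whose hashes match (0.0 to 1.0)."""
    ha, hb = block_hashes(a, block_size), block_hashes(b, block_size)
    longest = max(len(ha), len(hb))
    if longest == 0:
        return 1.0
    matches = sum(1 for x, y in zip(ha, hb) if x == y)
    return matches / longest


if __name__ == "__main__":
    original = bytes(range(256)) * 4        # 1,024 bytes -> 16 blocks
    variant = bytearray(original)
    variant[100] ^= 0xFF                    # flip a single byte
    variant = bytes(variant)
    same = hashlib.sha256(original).digest() == hashlib.sha256(variant).digest()
    print(same)                                  # False
    print(block_similarity(original, variant))   # 0.9375 (15 of 16 blocks match)
```

The weakness of fixed blocks is that inserting or deleting a byte shifts every later block boundary, which collapses the score; ssdeep avoids this by choosing boundaries from the content itself with a rolling hash.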

String extraction

Most binaries contain sequences of printable ASCII or Unicode characters embedded in the file bytes: URLs for command-and-control servers, file paths that the malware creates or modifies, registry key names it reads or writes, error messages used for debugging during development, and the names of Windows API functions it imports. The strings command (available on Linux and as a Sysinternals tool on Windows) scans a file for sequences of printable characters above a minimum length threshold (typically four characters) and prints them. FLOSS (the FLARE Obfuscated String Solver, originally the FireEye Labs Obfuscated String Solver) extends this by also extracting stack strings and encoded strings that the binary would construct at runtime.
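The core of a strings-style scan can be sketched with regular expressions: find runs of printable ASCII, and runs of printable characters each followed by a zero byte (UTF-16LE, the encoding Windows uses for wide strings). The four-character minimum mirrors the common default; the function name is illustrative. Unlike FLOSS, this sketch cannot recover stack strings or encoded strings.

```python
import re

MIN_LEN = 4  # mirrors the common four-character default

# Printable ASCII runs (space through tilde, plus tab).
ASCII_RE = re.compile(rb"[\x20-\x7e\t]{%d,}" % MIN_LEN)
# UTF-16LE runs: a printable ASCII byte followed by a zero byte, repeated.
UTF16LE_RE = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> list:
    """Return ASCII and UTF-16LE strings of at least MIN_LEN characters."""
    found = [m.group().decode("ascii") for m in ASCII_RE.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16LE_RE.finditer(data)]
    return found
```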

A raw string dump from a malware sample may contain thousands of entries, most of them artefacts of the compiler or runtime library. The analyst filters for strings that reveal intent: any string that matches a URL pattern (http:// or https://, .onion, or an IP address in dotted-decimal notation); any string that matches a Windows registry path (HKCU\Software\..., HKLM\System\...); any string naming a sensitive file or directory; any string that appears to be a command or shell invocation; any string in an unusual character set, which may hint at the author's language or a targeted locale; and long runs of Base64-alphabet characters that may hide an encoded payload.
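The triage filters listed above map naturally onto regular expressions. The category names and patterns below are hypothetical and deliberately crude (the Base64 check, for example, will also flag long API names such as CreateToolhelp32Snapshot); real rule sets are larger and tuned per case. Each string is assigned to the first category that matches.

```python
import re

# Illustrative patterns only; real triage rule sets are larger and tuned per case.
CATEGORIES = {
    "network": re.compile(
        r"https?://|\.onion\b|\b(?:\d{1,3}\.){3}\d{1,3}\b", re.IGNORECASE
    ),
    "registry": re.compile(r"^(?:HKCU|HKLM|HKEY_[A-Z_]+)\\", re.IGNORECASE),
    "command": re.compile(
        r"\b(?:cmd\.exe|powershell|vssadmin|schtasks)\b", re.IGNORECASE
    ),
    "base64_candidate": re.compile(r"^[A-Za-z0-9+/]{20,}={0,2}$"),
}


def categorise(strings) -> dict:
    """Group strings by the first category whose pattern matches; others are dropped."""
    groups = {name: [] for name in CATEGORIES}
    for s in strings:
        for name, pattern in CATEGORIES.items():
            if pattern.search(s):
                groups[name].append(s)
                break
    return groups
```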

PE header analysis and import table parsing

For Windows malware, the Portable Executable header is among the richest sources of static intelligence. Tools such as PEview, CFF Explorer, pestudio, and the Python library pefile parse the header into human-readable fields. The key fields to examine are: the compilation timestamp (can reveal when the binary was built, though it is easily forged), the section table (names, sizes, and entropy of each code and data section), and the import directory.
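The first fields a PE parser reads can be sketched with Python's struct module, following the layout in Microsoft's PE format documentation: the e_lfanew offset at 0x3C, the PE\0\0 signature, the 20-byte COFF file header (machine type, section count, timestamp), and the 40-byte section headers. This is a minimal sketch with almost no validation; libraries such as pefile handle the many malformed and edge cases that real malware presents.

```python
import struct
from datetime import datetime, timezone

IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section characteristics flag


def parse_pe_header(data: bytes) -> dict:
    """Parse the COFF file header and section table of a PE file (minimal sketch)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes).
    machine, n_sections, timestamp, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4
    )
    # Section headers follow the signature, COFF header and optional header.
    section_offset = e_lfanew + 4 + 20 + opt_size
    sections = []
    for i in range(n_sections):
        fields = struct.unpack_from("<8sIIIIIIHHI", data, section_offset + 40 * i)
        sections.append({
            "name": fields[0].rstrip(b"\x00").decode("ascii", errors="replace"),
            "raw_size": fields[3],      # SizeOfRawData
            "raw_offset": fields[4],    # PointerToRawData
            "executable": bool(fields[9] & IMAGE_SCN_MEM_EXECUTE),
        })
    return {
        "machine": hex(machine),
        # Easily forged: treat as a lead, not as proof of build date.
        "timestamp_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
        "sections": sections,
    }
```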

The Import Address Table lists every DLL the executable links to at load time and the specific functions it calls from each. This is extremely revealing: a binary that imports CreateRemoteThread, VirtualAllocEx, and WriteProcessMemory is very likely performing process injection. A binary importing CryptEncrypt or BCryptEncrypt alongside FindFirstFile and FindNextFile can enumerate the filesystem and encrypt files, the signature behaviour of ransomware. A binary importing WSAStartup, socket, connect, and send can open network connections. The analyst builds a capability profile from this list before reading a single line of code.
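Turning an import list into a capability profile can be sketched as a set of rules over API names. The rules below encode only the three API groups discussed above and are illustrative, not a detection standard. The import names are assumed to have already been extracted by a PE parser, and the `normalise` helper is a hypothetical heuristic that strips the ANSI (A) or Unicode (W) suffix so that FindFirstFileA and FindFirstFileW match the same rule.

```python
# Illustrative rules built from the API groups described above. A rule fires if
# ALL of its "all_of" APIs and at least one "any_of" API (when given) are imported.
CAPABILITY_RULES = {
    "process injection": {
        "all_of": {"CreateRemoteThread", "VirtualAllocEx", "WriteProcessMemory"},
    },
    "file enumeration + encryption (ransomware-like)": {
        "all_of": {"FindFirstFile", "FindNextFile"},
        "any_of": {"CryptEncrypt", "BCryptEncrypt"},
    },
    "socket networking": {"all_of": {"WSAStartup", "socket", "connect", "send"}},
}


def normalise(api: str) -> str:
    """Strip a Win32 A/W suffix, e.g. FindFirstFileW -> FindFirstFile.

    Heuristic: only strip a trailing 'A' or 'W' that follows a lowercase letter.
    """
    if len(api) > 2 and api[-1] in "AW" and api[-2].islower():
        return api[:-1]
    return api


def capability_profile(imports) -> list:
    """Return the capabilities whose rules are satisfied by the import names."""
    names = {normalise(n) for n in imports}
    hits = []
    for capability, rule in CAPABILITY_RULES.items():
        any_of = rule.get("any_of")
        if rule["all_of"] <= names and (not any_of or any_of & names):
            hits.append(capability)
    return hits
```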

Section entropy is the Shannon entropy of the bytes in each section, a measure of their randomness expressed in bits per byte. Normal code sections typically have entropy in the range of 5.0 to 6.5. Packed or encrypted sections typically have entropy above 7.0, approaching the theoretical maximum of 8.0 bits per byte for truly random data. High-entropy sections that are also marked as executable are a strong indicator of a packed payload. The tool pestudio flags high-entropy sections automatically and is a standard first-pass tool in many static analysis workflows.
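Shannon entropy in bits per byte is H = -sum over byte values b of p(b) * log2(p(b)), where p(b) is the fraction of bytes with value b. A direct sketch follows; the 7.0 threshold comes from the text above, while the function names and the (name, bytes) input format are assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def flag_high_entropy(sections, threshold: float = 7.0) -> list:
    """sections: iterable of (name, raw_bytes) pairs; returns names above threshold."""
    return [name for name, raw in sections if shannon_entropy(raw) > threshold]
```

A section of identical bytes scores 0.0, and a section containing each of the 256 byte values exactly once scores the maximum of 8.0.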

PE field | What to look for | Significance of anomaly
Compilation timestamp | Future date, or 1970-01-01 epoch | Possibly forged to mislead attribution
Section names | Names like .text, .data, .rdata are normal; random strings or known packer names (.packed, UPX0) are not | Indicates packing or obfuscation
Section entropy | Above 7.0 on executable sections | Encrypted or compressed payload
Import table | Very few imports, or only LoadLibrary and GetProcAddress | Imports resolved dynamically to hide capabilities
Number of sections | Fewer than 3 or more than 10 for a simple binary | May indicate stripping or added shellcode sections
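The anomalies in the table can be combined into a simple triage checklist. The sketch below is hypothetical: the `Section` record, the short packer-name list, and the thresholds copied from the table are illustrative, and a hit is a reason to look closer, not proof of packing. The current time is a parameter so the future-timestamp check can be tested deterministically.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Section:
    name: str
    entropy: float
    executable: bool


# Small illustrative name sets; real tools ship much larger signature sets.
PACKER_SECTION_NAMES = {"UPX0", "UPX1", ".packed"}
RESOLVER_APIS = {"LoadLibraryA", "LoadLibraryW", "GetProcAddress"}


def packing_indicators(sections, imports, timestamp: int, now=None) -> list:
    """Return human-readable reasons why a sample may be packed or tampered with."""
    now = now or datetime.now(timezone.utc)
    reasons = []
    for s in sections:
        if s.executable and s.entropy > 7.0:
            reasons.append(f"{s.name}: executable section with entropy {s.entropy:.2f}")
        if s.name in PACKER_SECTION_NAMES:
            reasons.append(f"{s.name}: known packer section name")
    if set(imports) <= RESOLVER_APIS:
        reasons.append("imports empty or limited to LoadLibrary/GetProcAddress")
    if timestamp == 0:
        reasons.append("timestamp zeroed (1970-01-01)")
    elif datetime.fromtimestamp(timestamp, tz=timezone.utc) > now:
        reasons.append("timestamp in the future")
    if not 3 <= len(sections) <= 10:
        reasons.append(f"unusual section count: {len(sections)}")
    return reasons
```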

Disassembly and code analysis

Disassembly converts the raw binary bytes into assembly language instructions. IDA Pro is widely regarded as the industry-standard tool for professional static analysis: it produces an interactive disassembly graph, reconstructs function boundaries, and allows the analyst to annotate the code. Ghidra, developed and publicly released by the US National Security Agency, is a free alternative with comparable capabilities and an integrated decompiler that produces C-like pseudocode. Radare2 is a command-line framework with scripting support, often used in automated analysis pipelines.

The analyst begins at the entry point, the address recorded in the PE header where execution starts, and follows the control flow. In a packed binary, the entry point typically leads to a short stub that decrypts or decompresses the real code into memory and then jumps to it. Recognising the unpacking stub pattern (a loop that writes to a newly allocated memory region, ending with a computed jump) tells the analyst where the real analysis begins. In an unpacked binary, the entry point leads to the main payload logic relatively quickly.
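To make linear-sweep disassembly concrete, the toy decoder below handles only a handful of 32-bit x86 opcodes: push/pop of a register, nop, ret, the mov ebp, esp prologue, relative call/jmp, and register-indirect call/jmp, which it labels as computed targets of the kind that often end an unpacking stub. It is an illustration, not a substitute for IDA Pro, Ghidra, or Radare2; real x86 decoding must handle prefixes, ModRM/SIB addressing, and hundreds of opcodes.

```python
import struct

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def disassemble(code: bytes, base: int = 0) -> list:
    """Toy linear-sweep disassembler for a few 32-bit x86 opcodes.

    Returns (address, text) pairs and stops at the first byte it does not know.
    """
    out, i = [], 0
    while i < len(code):
        addr, op = base + i, code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if 0x50 <= op <= 0x57:                       # push r32
            text, size = f"push {REGS[op - 0x50]}", 1
        elif 0x58 <= op <= 0x5F:                     # pop r32
            text, size = f"pop {REGS[op - 0x58]}", 1
        elif op == 0x90:
            text, size = "nop", 1
        elif op == 0xC3:
            text, size = "ret", 1
        elif op == 0x8B and nxt == 0xEC:             # common prologue: mov ebp, esp
            text, size = "mov ebp, esp", 2
        elif op in (0xE8, 0xE9) and i + 5 <= len(code):
            # call/jmp rel32: target is relative to the next instruction
            (rel,) = struct.unpack_from("<i", code, i + 1)
            mnemonic = "call" if op == 0xE8 else "jmp"
            text, size = f"{mnemonic} {addr + 5 + rel:#x}", 5
        elif op == 0xEB and nxt is not None:         # jmp rel8 (signed offset)
            rel = nxt - 256 if nxt > 127 else nxt
            text, size = f"jmp short {addr + 2 + rel:#x}", 2
        elif op == 0xFF and nxt is not None and (nxt & 0xF8) in (0xD0, 0xE0):
            # FF /2 (call reg) or FF /4 (jmp reg): target is computed at runtime
            kind = "call" if (nxt & 0xF8) == 0xD0 else "jmp"
            text, size = f"{kind} {REGS[nxt & 0x07]}  ; computed target", 2
        else:
            out.append((addr, f"db {op:#04x}  ; unsupported, stopping"))
            break
        out.append((addr, text))
        i += size
    return out
```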

Code analysis at this level requires understanding of x86 or x86-64 assembly, Windows calling conventions (the Microsoft x64 ABI for 64-bit binaries, cdecl or stdcall for 32-bit), and common patterns in malware code. A call to GetProcAddress immediately after LoadLibraryA, with the library name and function name passed as string arguments, is how malware resolves imports at runtime to hide them from the static import table. Recognising this pattern allows the analyst to annotate what the call is actually loading.
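One static hint of this runtime resolution can be sketched as a set comparison: sensitive API names that appear as plain strings but are missing from the import table, in a binary that imports GetProcAddress. The watchlist and function name are hypothetical, and the check finds nothing if the names are encoded, which is where FLOSS helps.

```python
# A small, hypothetical watchlist of APIs that malware commonly hides by
# resolving them at runtime instead of listing them in the import table.
SENSITIVE_APIS = {
    "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread",
    "CryptEncrypt", "InternetOpenA", "InternetOpenW", "URLDownloadToFileA",
}


def likely_runtime_resolved(strings, imports) -> list:
    """Sensitive API names present as strings but absent from the imports.

    Only meaningful when the binary also imports GetProcAddress; otherwise the
    names may simply be unused text.
    """
    imported = set(imports)
    if "GetProcAddress" not in imported:
        return []
    return sorted((set(strings) & SENSITIVE_APIS) - imported)
```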

Documenting findings and producing IoCs

Static analysis concludes with a structured report that documents every finding in a reproducible form. The report records: the file hash (SHA-256 as the primary identifier, with MD5 and SHA-1 for legacy compatibility), the file type, size, and compilation timestamp; a capability profile derived from the import table; all meaningful strings grouped by category (network indicators, filesystem targets, registry paths, encoded strings); and any assembly-level findings from disassembly, including identified functions and obfuscation techniques.
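The report fields listed above can be captured in a simple schema so that reports are consistent across analysts and machine-readable. The dataclass and field names below are a hypothetical layout, not a standard; STIX, discussed next, is the appropriate format when the indicators themselves are shared.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StaticReport:
    """Hypothetical report schema mirroring the fields described in the text."""
    sha256: str                       # primary identifier
    md5: str                          # legacy compatibility
    sha1: str                         # legacy compatibility
    file_type: str
    size_bytes: int
    compile_timestamp: str
    capabilities: list = field(default_factory=list)
    strings: dict = field(default_factory=dict)         # category -> strings
    disassembly_notes: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise with sorted keys so reports diff cleanly between versions."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```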

IoCs extracted from static analysis are published in a structured format so that detection tools can consume them automatically. STIX (Structured Threat Information eXpression), maintained by OASIS, is the most widely adopted standard for expressing threat intelligence, including file hashes, IP addresses, domain names, and URL patterns. YARA rules are used to express code-level signatures: a YARA rule can specify that a file must contain a particular string pattern, have a particular hash, or satisfy conditions on PE header fields. YARA rules run against file systems and memory captures to detect the same malware or variants.
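Turning extracted strings into a basic YARA rule can be sketched as text generation. The rule layout (a meta block holding the SHA-256, text strings with the ascii and wide modifiers, and a condition requiring the MZ header plus any of the strings) follows YARA's documented syntax, but the helper names and rule design are hypothetical; any generated rule should be compiled and tested against known-good files before deployment to limit false positives.

```python
def yara_escape(s: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return s.replace("\\", "\\\\").replace('"', '\\"')


def make_yara_rule(name: str, strings, sha256: str) -> str:
    """Build a simple YARA rule: MZ header plus any of the given strings."""
    if not strings:
        raise ValueError("at least one string is required for 'any of them'")
    lines = [f"rule {name}", "{", "    meta:", f'        sha256 = "{sha256}"', "    strings:"]
    for i, s in enumerate(strings):
        # 'ascii wide' matches both single-byte and UTF-16LE encodings.
        lines.append(f'        $s{i} = "{yara_escape(s)}" ascii wide')
    lines += [
        "    condition:",
        "        uint16(0) == 0x5A4D and any of them",  # "MZ" read little-endian
        "}",
    ]
    return "\n".join(lines)
```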

For investigations involving malware used in cybercrime, the findings from static analysis feed into the broader investigation workflow described in the cyber investigation process. Network-based IoCs such as command-and-control domains are passed to network investigators for traffic correlation. Attribution artefacts such as compilation metadata, language settings in string tables, and code reuse patterns are forwarded to threat-intelligence analysts. The static analysis report is not an endpoint; it is an input to the wider case.


Key Takeaways

  • Static malware analysis examines a file without executing it, using file-type identification, hashing, string extraction, PE header parsing, and disassembly to build a capability and intelligence profile with minimal infection risk.
  • Cryptographic hashes (SHA-256 as the current standard) fingerprint a sample uniquely, enable threat-intelligence lookups, and serve as the primary file-based indicator of compromise shared across defenders.
  • The PE import table reveals a Windows malware sample's intended capabilities before a single line of code is read: specific API function groups map directly to process injection, encryption, persistence, or network communication.
  • High section entropy (above 7.0) and a minimal import table containing only LoadLibraryA and GetProcAddress are the two clearest static indicators that a sample is packed, requiring dynamic unpacking or memory forensics to recover the real payload.
  • All analysis must be performed on a hash-verified copy in an isolated environment, with every step logged, to satisfy chain-of-custody requirements under applicable law, including the Bharatiya Sakshya Adhiniyam 2023, US Federal Rules of Evidence, and the UK Police and Criminal Evidence Act 1984.
What is static malware analysis?
Static malware analysis examines a suspicious file without executing it. The analyst uses tools to identify the file type, compute cryptographic hashes, extract readable strings, parse binary headers, and disassemble or decompile the code. This approach is safe because the sample never runs, and the findings can be reproduced by any analyst working from the same file.
How does hashing help in malware analysis?
Cryptographic hash functions such as MD5, SHA-1, and SHA-256 produce a fixed-length fingerprint for a given file. If two files share the same SHA-256 hash, they can be treated as identical; matching MD5 or SHA-1 values are weaker evidence because collisions for those algorithms can be engineered. Analysts submit hashes to threat-intelligence platforms like VirusTotal to check whether the sample has been seen before, and share hashes as indicators of compromise so defenders can detect the same file across an estate without distributing the malware itself.
What do strings extracted from a malware sample reveal?
Printable strings embedded in a binary can expose URLs, IP addresses, registry key paths, file paths, API function names, error messages, and hard-coded credentials. Even partially obfuscated malware often leaves enough plain-text strings for an analyst to infer the malware family, its command-and-control infrastructure, or its intended target.
What is a PE header and why does it matter in static analysis?
The Portable Executable (PE) header is a structured metadata block at the start of Windows executable files. It records the compilation timestamp, the list of imported DLLs and functions (imports), exported functions, section names and their memory attributes, and the entry point. Anomalies in these fields, such as a future timestamp, suspicious import combinations, or mismatched section sizes, are strong indicators of packing, obfuscation, or malicious intent.
What is the difference between disassembly and decompilation in malware analysis?
Disassembly converts raw binary machine code into human-readable assembly language instructions. Decompilation goes further, attempting to reconstruct higher-level source code from the binary. Disassembly is always possible and accurate at the instruction level but requires knowledge of assembly. Decompilation produces more readable output but may introduce inaccuracies because the original source is not recoverable with certainty.
