Chapter 14

Cryptography & Cryptanalysis

Cryptography is the science of hiding information; cryptanalysis is the science of breaking that hiding. Both matter for digital forensics — you need to understand which algorithm protected the evidence you are trying to access, and whether it can be broken at all. This chapter covers all the cryptographic algorithms named in the FACT Elective III syllabus.

14.1 Symmetric vs Asymmetric Encryption

The fundamental choice in any cryptographic system is how many keys are involved and how they are shared. Everything else follows from that choice.

Symmetric encryption. A single key encrypts the plaintext and the identical key decrypts the ciphertext. This makes it fast and computationally cheap — ideal for bulk data. The problem is key distribution: how do two parties securely agree on and share a secret key before they can communicate? If the channel is already insecure, sending the key over it defeats the purpose.

Asymmetric (public-key) encryption. Two mathematically linked keys are generated together — a public key and a private key. The public key can be shared with anyone. A message encrypted with the public key can only be decrypted with the corresponding private key, which the owner keeps secret. This solves the key distribution problem: anyone can write a secret message to you using your public key, but only you can read it. The tradeoff is speed — asymmetric operations are typically 100–1000× slower than symmetric ones.

Hybrid encryption. Real-world systems — TLS (HTTPS), PGP email encryption, WhatsApp's Signal Protocol — combine both. Asymmetric encryption is used in a short handshake to securely exchange a fresh symmetric session key. The fast symmetric key then protects all the actual data. This gives you the security of asymmetric key exchange with the speed of symmetric bulk encryption.

Key length and security equivalence:

A 128-bit symmetric key offers roughly the same brute-force resistance as a 3072-bit RSA key. Quantum computers threaten asymmetric cryptography far more than symmetric algorithms: Shor's algorithm factors large integers and solves discrete logarithms in polynomial time, which breaks RSA and ECC outright, whereas Grover's algorithm only halves the effective key length of a symmetric cipher, so a 256-bit AES key still offers about 128 bits of security. This is why post-quantum cryptography focuses on replacing RSA/ECC rather than AES.
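To put key lengths in perspective, the arithmetic can be sketched as follows. The attacker speed of one trillion keys per second is a purely hypothetical figure chosen for illustration, and the function name is not from the syllabus.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaust_seconds(key_bits: int, keys_per_second: float) -> float:
    """Worst-case time to try every key of the given length."""
    return 2 ** key_bits / keys_per_second

RATE = 1e12  # assumed attacker speed (hypothetical): 10^12 keys per second

des_hours = exhaust_seconds(56, RATE) / 3600
aes128_years = exhaust_seconds(128, RATE) / SECONDS_PER_YEAR

# Grover's algorithm needs on the order of 2**(n/2) steps, so a 256-bit
# key keeps roughly 128 bits of strength against a quantum search.
aes256_grover_bits = 256 // 2

print(f"56-bit key:  about {des_hours:.1f} hours")
print(f"128-bit key: about {aes128_years:.2e} years")
print(f"AES-256 under Grover: about {aes256_grover_bits}-bit strength")
```

Each extra key bit doubles the search, which is why the gap between a 56-bit and a 128-bit key is the gap between hours and far longer than the age of the universe.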

Digital signatures. Asymmetric keys work in reverse for signatures. The sender computes a hash of the message and encrypts that hash with their own private key. Anyone holding the sender's public key can decrypt the hash and verify it matches the message — proving both:

Authenticity — only the private key holder could have produced this signature.
Integrity — if the message was altered after signing, the hash would no longer match.
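The sign-then-verify flow can be sketched with textbook RSA (covered in 14.4). This is a toy under loud assumptions: the two Mersenne primes are chosen only because they are easy to write down, there is no padding scheme, and the helper names are hypothetical. Real systems use vetted libraries and schemes such as RSA-PSS or ECDSA.

```python
import hashlib

# Toy key for illustration only; never pick primes like this in practice.
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """SHA-256 of the message, reduced so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """Signer: transform the hash with the PRIVATE exponent."""
    return pow(digest_as_int(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Verifier: undo the transform with the PUBLIC exponent and compare."""
    return pow(signature, e, n) == digest_as_int(message)

report = b"Image acquired 09:14, hash recorded in seizure log"
signature = sign(report)
print(verify(report, signature))                 # authentic and unmodified
print(verify(report + b" (edited)", signature))  # altered after signing
```

Only the hash is signed, not the whole message, which keeps the slow asymmetric operation small regardless of message size.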

Fig 14.1 — Symmetric vs asymmetric encryption and digital signatures — three panels showing how keys are used in each paradigm.

14.2 Classical Ciphers

Classical ciphers predate computers by centuries. They appear in the FACT syllabus because understanding them builds the vocabulary for modern cryptanalysis, and some still appear in CTF challenges and legacy forensic contexts (the Windows Registry UserAssist key, for example, uses ROT-13).

Substitution ciphers. Each letter in the plaintext is replaced by a different letter (or symbol) according to a fixed rule or alphabet.

Caesar cipher — the simplest substitution cipher. Every letter is shifted forward by N positions in the alphabet. ROT-13 (shift by 13) is a special case because it is self-inverse: applying ROT-13 twice returns the original text. The Caesar cipher is trivially broken by frequency analysis: count the most frequent letter in the ciphertext and map it to E (the most common letter in English), then test the implied shift.
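A minimal sketch of the shift, assuming only the 26 ASCII letters are shifted and everything else passes through unchanged (the function name is illustrative):

```python
def caesar(text: str, shift: int) -> str:
    """Shift each ASCII letter by `shift` positions, wrapping around at Z."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # digits, spaces and punctuation are untouched
    return "".join(out)

print(caesar("FORENSIC", 3))                  # shift forward by 3
print(caesar(caesar("UserAssist", 13), 13))   # ROT-13 applied twice
```

Decryption is the same function with a negative shift, and because 13 + 13 = 26, ROT-13 needs no separate decrypt step at all.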

Vigenère cipher — a polyalphabetic substitution using a repeating keyword. Each letter of the plaintext is shifted by the value of the corresponding letter of the keyword (A=0, B=1, ... Z=25), then the keyword repeats. This defeats simple frequency analysis because the same plaintext letter maps to different ciphertext letters depending on its position. However, it is broken by Kasiski analysis: find repeated ciphertext sequences; their positions reveal the likely keyword length; divide the ciphertext into groups at that spacing and apply frequency analysis to each group independently.
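The keyword-driven shift can be sketched like this, assuming uppercase output and that non-letters do not consume a keyword letter (both are choices made for this example, not part of any standard):

```python
def vigenere(text: str, keyword: str, decrypt: bool = False) -> str:
    """Shift the i-th letter by the i-th keyword letter (A=0 ... Z=25)."""
    shifts = [ord(k) - ord("A") for k in keyword.upper()]
    out, i = [], 0
    for ch in text.upper():
        if "A" <= ch <= "Z":
            s = shifts[i % len(shifts)]  # the keyword repeats
            if decrypt:
                s = -s
            out.append(chr((ord(ch) - ord("A") + s) % 26 + ord("A")))
            i += 1
        else:
            out.append(ch)
    return "".join(out)

ciphertext = vigenere("ATTACKATDAWN", "LEMON")
print(ciphertext)
print(vigenere(ciphertext, "LEMON", decrypt=True))
```

In the output the two plaintext T's at positions 2 and 3 encrypt to different letters, which is exactly what blunts simple frequency counting.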

Atbash — an ancient Hebrew-origin cipher that maps A↔Z, B↔Y, C↔X, and so on — the alphabet reversed against itself. It is its own inverse. Relevant forensically because it appears as a puzzle element in some cases.
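As a quick illustration, Atbash reduces to mirroring each letter's index (a sketch; the function name is illustrative):

```python
def atbash(text: str) -> str:
    """Map A<->Z, B<->Y, ... for ASCII letters; leave other characters alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(ord("Z") - (ord(ch) - ord("A"))))
        elif "a" <= ch <= "z":
            out.append(chr(ord("z") - (ord(ch) - ord("a"))))
        else:
            out.append(ch)
    return "".join(out)

print(atbash("FORENSIC"))
print(atbash(atbash("FORENSIC")))  # applying it twice restores the input
```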

Transposition ciphers. The letters of the plaintext are rearranged — not replaced. The content of each letter is preserved; only the order changes.

Rail fence cipher — write the message in a zigzag across N rails (rows), then read off each rail from left to right in sequence. For example, "FORENSICS" on 2 rails:

F R N I S
 O E S C

Reading row by row: FRNISOESC.
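The zigzag can be sketched in code by first computing which rail each position lands on. The helper names are illustrative.

```python
def rail_pattern(length: int, rails: int) -> list:
    """Rail index for each position when writing in a zigzag."""
    if rails < 2:
        return [0] * length
    pattern, row, step = [], 0, 1
    for _ in range(length):
        pattern.append(row)
        if row == 0:
            step = 1            # bounce off the top rail
        elif row == rails - 1:
            step = -1           # bounce off the bottom rail
        row += step
    return pattern

def rail_fence_encrypt(text: str, rails: int) -> str:
    pattern = rail_pattern(len(text), rails)
    return "".join(ch for r in range(rails)
                   for ch, p in zip(text, pattern) if p == r)

def rail_fence_decrypt(ciphertext: str, rails: int) -> str:
    pattern = rail_pattern(len(ciphertext), rails)
    # Positions listed in the order the ciphertext letters were read off.
    order = sorted(range(len(ciphertext)), key=pattern.__getitem__)
    plain = [""] * len(ciphertext)
    for ch, pos in zip(ciphertext, order):
        plain[pos] = ch
    return "".join(plain)

print(rail_fence_encrypt("FORENSICS", 2))
print(rail_fence_decrypt("FRNISOESC", 2))
```

Because the only secret is the number of rails, an analyst can simply try every rail count from 2 up to the message length.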

Columnar transposition — write the message into a rectangular grid row by row, then read off the columns in an order determined by a keyword. The keyword defines which column number maps to which position — breaking it requires trying all permutations of column order.
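A sketch of the irregular (unpadded) variant, assuming columns are read in alphabetical order of the keyword letters with ties broken left to right. The keyword and message below are a commonly used textbook example, not from the syllabus.

```python
def column_order(keyword: str) -> list:
    """Column indices sorted by keyword letter (ties broken left to right)."""
    return sorted(range(len(keyword)), key=lambda i: (keyword[i], i))

def columnar_encrypt(text: str, keyword: str) -> str:
    ncols = len(keyword)
    # text[c::ncols] is column c of the row-by-row grid.
    return "".join(text[c::ncols] for c in column_order(keyword))

def columnar_decrypt(ciphertext: str, keyword: str) -> str:
    ncols = len(keyword)
    full_rows, extra = divmod(len(ciphertext), ncols)
    columns, pos = [""] * ncols, 0
    for c in column_order(keyword):
        length = full_rows + (1 if c < extra else 0)  # first `extra` columns are longer
        columns[c] = ciphertext[pos:pos + length]
        pos += length
    rows = full_rows + (1 if extra else 0)
    return "".join(columns[c][r] for r in range(rows)
                   for c in range(ncols) if r < len(columns[c]))

ct = columnar_encrypt("WEAREDISCOVEREDFLEEATONCE", "ZEBRAS")
print(ct)
print(columnar_decrypt(ct, "ZEBRAS"))
```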

Frequency analysis. The primary cryptanalysis tool for any substitution cipher. In English, the letters appear with well-known relative frequencies: E is most common (~12.7%), followed by T, A, O, I, N, S, H, R. A monoalphabetic substitution preserves this distribution and only relabels the letters; for a Caesar cipher the whole distribution is simply shifted by the key. Mapping the most frequent ciphertext letter to E and testing adjacent shifts usually breaks a Caesar cipher within seconds.
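The "most frequent letter is E" heuristic can be sketched as below. It assumes an English plaintext long enough for E to dominate; on short or unusual texts the guess can be wrong, in which case the remaining shifts are tried by hand. The sample sentence is made up for the example.

```python
from collections import Counter

def caesar(text: str, shift: int) -> str:
    """Shift uppercase letters only; other characters pass through."""
    return "".join(
        chr((ord(c) - ord("A") + shift) % 26 + ord("A")) if "A" <= c <= "Z" else c
        for c in text
    )

def crack_caesar(ciphertext: str):
    """Guess the shift by assuming the most common ciphertext letter is E."""
    letters = [c for c in ciphertext if "A" <= c <= "Z"]
    most_common = Counter(letters).most_common(1)[0][0]
    shift = (ord(most_common) - ord("E")) % 26
    return shift, caesar(ciphertext, -shift)

plain = "THE EVIDENCE WAS SEIZED BEFORE THE DEVICE WAS RESET"
secret = caesar(plain, 3)
print(secret)
print(crack_caesar(secret))
```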

Fig 14.2 — Caesar cipher — plaintext alphabet on top, ciphertext alphabet shifted by 3 below. Example: FORENSIC encodes to IRUHQVLF.

14.3 Modern Symmetric Algorithms

Modern symmetric algorithms operate on fixed-size blocks of bits rather than individual letters, and use mathematical operations — bit shifts, XOR, substitution tables — that are far harder to attack than classical frequency analysis.

DES (Data Encryption Standard). Adopted in 1977 by the US National Bureau of Standards (NBS, the predecessor of NIST) as FIPS 46. Key size: 56 bits. Block size: 64 bits. Structure: a Feistel network with 16 rounds. The 56-bit key was already considered marginal by the 1990s. In 1998 the EFF built the "DES Cracker" machine for under $250,000 and broke DES in 56 hours by brute force. DES is completely broken and must not be used.

3DES (Triple DES). Applies DES three times with two or three different keys to increase effective key length. The usual construction is EDE (Encrypt with K1, Decrypt with K2, Encrypt with K3). With three independent keys the nominal key length is 168 bits, but meet-in-the-middle attacks reduce the effective security to approximately 112 bits; the two-key variant (K3 = K1) is weaker still. NIST deprecated 3DES and disallows it for encrypting new data after 2023 because it is slow and its 64-bit block size makes it vulnerable to birthday attacks on large data volumes (the Sweet32 attack). Legacy systems (some banking protocols) still use it.

AES (Advanced Encryption Standard). Standardised by NIST in 2001 as FIPS 197, after an open international competition won by the Rijndael cipher. Block size: always 128 bits. Key sizes: 128, 192, or 256 bits, giving 10, 12, or 14 rounds respectively. Structure: a substitution-permutation network (SubBytes → ShiftRows → MixColumns → AddRoundKey, with MixColumns omitted in the final round). AES has no practical attack faster than brute force. It is the current world standard for symmetric encryption and is used in:

Full-disk encryption: BitLocker (Windows), FileVault (macOS), VeraCrypt
Network protocols: TLS 1.2/1.3, WPA2, WPA3
Messaging: WhatsApp, Signal

RC4 (Rivest Cipher 4). A stream cipher: it generates a pseudorandom byte stream (keystream) and XORs it with the plaintext. RC4 is extremely fast and simple. It was used in WEP (Wired Equivalent Privacy, the original Wi-Fi encryption scheme) and early SSL/TLS. RC4 is cryptographically broken: statistical biases in its keystream output make it vulnerable to plaintext recovery attacks. WEP was broken partly because of the predictable way RC4 initialisation vectors (IVs) were reused. RC4 must not be used in any new system.
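Because an examiner may still meet RC4 in legacy evidence, a compact sketch of the keystream-and-XOR idea is useful for study. It is shown for analysis only, not for protecting anything; the key and message are a widely published test vector.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """RC4: key scheduling, then XOR the data with the generated keystream."""
    # Key-scheduling algorithm (KSA): shuffle the state using the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA): one keystream byte per data byte.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())
print(rc4(b"Key", ciphertext))  # the same call decrypts
```

Encryption and decryption are the same operation because XORing with the same keystream twice cancels out. That is also why reusing a key and IV is fatal: two ciphertexts XORed together reveal the XOR of the two plaintexts.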

Blowfish. Designed by Bruce Schneier in 1993. Block size: 64 bits. Variable key length: 32–448 bits. Structure: a Feistel network with 16 rounds (driven by 18 subkeys). Fast on 32-bit systems, licence-free, and widely deployed in older software. The 64-bit block size makes it vulnerable to birthday attacks on large amounts of data (same weakness as 3DES). Modern systems have replaced Blowfish with AES. However, bcrypt, the widely used password hashing function, is based on Blowfish's key schedule and remains relevant in forensic contexts (recovering hashed passwords).

Modes of operation. A block cipher alone only encrypts one block at a time. Modes define how to extend it to arbitrary-length data:

ECB (Electronic Codebook) — each block encrypted independently with the same key. Fatal flaw: identical plaintext blocks produce identical ciphertext blocks. The "ECB penguin" (encrypting an image of a penguin in ECB mode still produces a recognisable penguin outline in the ciphertext) demonstrates this visually. Never use ECB for real data.
CBC (Cipher Block Chaining) — each plaintext block is XORed with the previous ciphertext block before encryption. Identical plaintext blocks produce different ciphertext. Requires an initialisation vector (IV) for the first block.
CTR (Counter Mode) — encrypts a counter value and XORs the result with the plaintext; turns a block cipher into a stream cipher; highly parallelisable.
GCM (Galois/Counter Mode) — CTR mode plus a Galois-field authentication tag; provides authenticated encryption (confidentiality + integrity in one pass). Used in TLS 1.3.
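The difference between ECB and CBC can be seen with any block cipher. The sketch below builds a toy 64-bit, 16-round Feistel cipher in which HMAC-SHA256 stands in for a real round function. That construction is an assumption made so the example runs with the standard library alone; it is not DES, AES, or anything to deploy. Padding is omitted and the IV is fixed purely for repeatability, whereas a real CBC IV must be unpredictable.

```python
import hashlib
import hmac

HALF = 4            # bytes per Feistel half
BLOCK = 2 * HALF    # 64-bit block

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy Feistel cipher: 16 rounds of (L, R) -> (R, L xor F(R))."""
    left, right = block[:HALF], block[HALF:]
    for rnd in range(16):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:HALF]
        left, right = right, xor(left, f)
    return left + right

def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    # Every block is encrypted independently.
    return b"".join(encrypt_block(key, b) for b in split_blocks(data))

def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, prev = [], iv
    for b in split_blocks(data):
        prev = encrypt_block(key, xor(b, prev))  # chain in the previous ciphertext
        out.append(prev)
    return b"".join(out)

key = b"demo key"
plaintext = b"EVIDENCE" * 3  # three identical 8-byte blocks
ecb = split_blocks(ecb_encrypt(key, plaintext))
cbc = split_blocks(cbc_encrypt(key, bytes(BLOCK), plaintext))
print("distinct ECB blocks:", len(set(ecb)))
print("distinct CBC blocks:", len(set(cbc)))
```

ECB yields a single repeated ciphertext block, so the repetition in the plaintext is visible to anyone; CBC hides it because each block's input depends on everything before it.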

Algorithm	Type	Key Size	Block Size	Status
DES	Block (Feistel)	56-bit	64-bit	Broken — avoid completely
3DES	Block (Feistel ×3)	112-bit effective	64-bit	Deprecated (NIST 2023)
AES-128 / AES-256	Block (SPN)	128 / 256-bit	128-bit	Current standard
RC4	Stream	Variable	N/A	Broken — avoid completely
Blowfish	Block (Feistel)	32–448-bit	64-bit	Legacy (bcrypt uses its key schedule)

14.4 Asymmetric Algorithms

Asymmetric algorithms underpin the public-key infrastructure (PKI) that makes HTTPS, digital signatures, and encrypted email possible. Their security comes from mathematical problems believed to be computationally hard.

RSA (Rivest–Shamir–Adleman). The most widely deployed public-key algorithm. Security is based on the difficulty of factoring the product of two large prime numbers. Key generation:

Choose two large primes p and q (each hundreds of digits long in practice).
Compute n = p × q (the modulus, shared publicly).
Compute φ(n) = (p−1)(q−1) (Euler's totient — kept secret).
Choose a public exponent e (commonly 65537 — large enough to resist small-exponent attacks, small enough to be fast).
Compute the private exponent d such that e × d ≡ 1 (mod φ(n)).
Public key = (e, n). Private key = (d, n). Destroy p, q, and φ(n).

Breaking RSA requires factoring n to recover p and q, which then reveals φ(n) and d. With 2048-bit keys, no classical computer can factor n in any practical time. Minimum recommended key size today: 2048 bits. For long-term security (documents that must remain confidential for 20+ years): 4096 bits.

RSA is used in: TLS/SSL certificates, PGP email encryption, digital signatures, SSH key authentication.

RSA key relationship

e × d ≡ 1 (mod φ(n))

e = public exponent (commonly 65537) · d = private exponent · φ(n) = (p−1)(q−1) · n = p × q (modulus) · knowing d requires factoring n, which requires knowing p and q
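A worked example with deliberately tiny primes makes the relationship concrete. The values p = 61, q = 53, e = 17 are a standard classroom choice, not from the syllabus, and a modulus this small can be factored instantly, which the last lines demonstrate.

```python
from math import gcd, isqrt

# Key generation with toy-sized primes.
p, q = 61, 53
n = p * q                   # public modulus
phi = (p - 1) * (q - 1)     # Euler's totient, kept secret
e = 17                      # public exponent; must be coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)         # private exponent: e * d = 1 (mod phi); Python 3.8+

# Encrypt with the public key (e, n), decrypt with the private key (d, n).
m = 65
c = pow(m, e, n)
recovered = pow(c, d, n)
print(f"n={n} phi={phi} d={d} ciphertext={c} recovered={recovered}")

def factor(modulus: int):
    """Trial division: feasible only because the modulus is tiny."""
    for f in range(3, isqrt(modulus) + 1, 2):
        if modulus % f == 0:
            return f, modulus // f
    raise ValueError("no odd factor found")

# An attacker who factors n can rebuild the private exponent.
fp, fq = factor(n)
attacker_d = pow(e, -1, (fp - 1) * (fq - 1))
print("attacker recovered d:", attacker_d == d)
```

With a 2048-bit modulus the same `factor` loop would never finish, which is the entire basis of RSA's security.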

DSA (Digital Signature Algorithm). A NIST standard (FIPS 186), designed specifically for digital signatures: it cannot encrypt data, only sign it. Security is based on the discrete logarithm problem in a finite field. Key sizes: 1024–3072 bits (FIPS 186-4). DSA has been largely superseded by ECDSA in modern systems, and FIPS 186-5 (2023) no longer approves it for generating new signatures.

ECC (Elliptic Curve Cryptography). Based on the algebraic structure of elliptic curves over finite fields. The key advantage: much smaller keys than RSA for equivalent security. A 256-bit ECC key is approximately as hard to break as a 3072-bit RSA key. ECC operations are faster and use less power, making ECC the preferred choice for mobile devices and embedded systems.

Two main uses of ECC:

ECDH (Elliptic Curve Diffie-Hellman) — key exchange; used in TLS 1.3 handshake.
ECDSA (Elliptic Curve Digital Signature Algorithm) — digital signatures; used in TLS certificates, Bitcoin, and code signing.

Common curves: P-256 (NIST P-256, aka secp256r1) and P-384 for general use; Curve25519 (designed by Daniel Bernstein — faster, constant-time, resistant to timing attacks) for modern systems like Signal and WireGuard.

Diffie-Hellman Key Exchange (DH). Allows two parties to establish a shared secret over a completely insecure channel without transmitting the secret itself. Security is based on the discrete logarithm problem. Used in TLS (as DHE — ephemeral DH for forward secrecy) and IPSec IKE. ECDH is the elliptic curve variant used in TLS 1.3.
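The exchange can be followed with toy numbers. The parameters p = 23 and g = 5 and the private values 6 and 15 are a common classroom example, chosen so the arithmetic can be checked by hand. Real deployments use primes of 2048 bits or more, or elliptic curves.

```python
# Public parameters, visible to everyone including an eavesdropper.
p, g = 23, 5

# Private values, never transmitted.
a = 6    # Alice's secret
b = 15   # Bob's secret

# Public values exchanged over the insecure channel.
A = pow(g, a, p)   # Alice sends A to Bob
B = pow(g, b, p)   # Bob sends B to Alice

# Each side combines its own secret with the other's public value.
alice_shared = pow(B, a, p)
bob_shared = pow(A, b, p)
print(f"A={A} B={B} shared secret: {alice_shared} == {bob_shared}")

# An eavesdropper must solve the discrete logarithm g**x = A (mod p).
# Exhaustive search works here only because p is tiny.
eve_a = next(x for x in range(1, p) if pow(g, x, p) == A)
print("eavesdropper recovered Alice's secret:", eve_a)
```

Both sides arrive at the same value because (g^a)^b = (g^b)^a mod p, yet the secret itself never crosses the wire.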

Fig 14.3 — Diffie-Hellman key exchange — Alice and Bob establish a shared secret over a public channel without ever transmitting the secret directly.

14.5 Hash Functions & Message Authentication

A cryptographic hash function takes an input of any size and produces a fixed-size output (the digest or hash) with properties that make it almost impossible to reverse or manipulate.

Required properties of a cryptographic hash:

Deterministic — the same input always produces the same hash.
One-way (preimage resistance) — given the hash output, it is computationally infeasible to find any input that produces it.
Second-preimage resistance — given an input m1 and its hash, it is computationally infeasible to find a different input m2 that produces the same hash.
Collision resistance — it is computationally infeasible to find any two distinct inputs that produce the same hash.
Avalanche effect — a tiny change in the input (even flipping a single bit) produces a completely different hash output with no apparent relationship to the original.
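Determinism and the avalanche effect are easy to observe with Python's standard `hashlib` module. The sketch flips a single bit of the input and counts how many of the 256 output bits change; on average about half of them do.

```python
import hashlib

def sha256_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def bit_difference(x: bytes, y: bytes) -> int:
    """Number of differing bits between the SHA-256 digests of x and y."""
    return bin(sha256_int(x) ^ sha256_int(y)).count("1")

original = b"FORENSIC"
# Flip the lowest bit of the first byte: 'F' (0x46) becomes 'G' (0x47).
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(flipped).hexdigest())
print("bits changed out of 256:", bit_difference(original, flipped))
```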

MD5 (Message Digest 5). Designed by Ron Rivest (1991). Output: 128 bits (32 hex characters). MD5 is collision-vulnerable: since 2004 it has been possible to construct two different files that produce the same MD5 hash. Later chosen-prefix collision attacks extended this to practical targets, including a rogue CA certificate in 2008 and the forged code-signing certificate used by the Flame malware in 2012. MD5 must not be used for security applications (password storage, digital signatures, certificate fingerprints). However, MD5 is still widely used in digital forensics to verify that a file has not been altered during examination. In this context, an adversary manufacturing a collision against a specific evidence file in real time is not a realistic threat, so MD5 hashes in evidence logs are still accepted by courts, though examiners increasingly compute SHA-256 alongside MD5.

SHA-1 (Secure Hash Algorithm 1). NIST standard. Output: 160 bits (40 hex characters). Collision vulnerability demonstrated in the 2017 SHAttered attack (Google and CWI Amsterdam produced two different PDF files with identical SHA-1 hashes). SHA-1 is deprecated for all security purposes. Some older forensic tools still generate SHA-1 hashes; best practice is to also generate SHA-256 for any new examination.

SHA-2 family. The current family of NIST-standard hash functions. The most important variants:

SHA-256 — 256-bit (64 hex characters) digest. The current standard for evidence hashing in digital forensics. No practical collision known.
SHA-384 — 384-bit digest.
SHA-512 — 512-bit (128 hex characters) digest. Used in server-side password hashing and high-security applications.
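The digest sizes quoted in this section can be confirmed with `hashlib`; recognising a hash by its hex length is a handy first step when an unlabelled value turns up in a log. The sample input is arbitrary.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256", "sha384", "sha512", "sha3_256")

# Map each algorithm name to the length of its hex digest.
hex_lengths = {
    name: len(hashlib.new(name, b"FORENSIC").hexdigest())
    for name in ALGORITHMS
}

for name, length in hex_lengths.items():
    print(f"{name:9s} {length * 4:4d} bits  {length:3d} hex characters")
```

Length alone cannot distinguish SHA-256 from SHA3-256, since both produce 64 hex characters.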

SHA-3. Standardised by NIST in 2015. Uses a completely different internal construction — the Keccak sponge function — rather than the Merkle-Damgård construction used by MD5, SHA-1, and SHA-2. SHA-3 was not designed because SHA-2 was broken; SHA-3 is an independent alternative providing algorithmic diversity. If SHA-2 were someday broken, SHA-3 would remain unaffected because its design is structurally different.

HMAC (Hash-based Message Authentication Code). Combines a hash function with a secret key to produce a code that proves both that the message was not altered and that the message was produced by someone who holds the secret key. HMAC is not a hash function — it is an authentication scheme built on top of one.

HMAC construction

HMAC(K, m) = H((K ⊕ opad) ∥ H((K ⊕ ipad) ∥ m))

K = secret key (padded to block size) · m = message · H = underlying hash function (e.g. SHA-256) · ipad = inner padding constant (0x36 repeated) · opad = outer padding constant (0x5C repeated) · ⊕ = XOR · ∥ = concatenation
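The construction can be written out directly and checked against Python's built-in `hmac` module. This is a study aid with an illustrative function name and made-up key and message; production code should call `hmac.new` and compare tags with `hmac.compare_digest`.

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC(K, m) = H((K xor opad) || H((K xor ipad) || m)) with H = SHA-256."""
    block_size = 64                          # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()   # over-long keys are hashed first
    key = key.ljust(block_size, b"\x00")     # then padded to the block size
    ipad = bytes(k ^ 0x36 for k in key)
    opad = bytes(k ^ 0x5C for k in key)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()

key = b"shared-secret"
msg = b"GET /api/cases/42"
tag = hmac_sha256(key, msg)
reference = hmac.new(key, msg, hashlib.sha256).digest()
print(tag.hex())
print("matches the standard library:", hmac.compare_digest(tag, reference))
```

The nested double hash is what distinguishes HMAC from the naive H(K ∥ m), which is open to length-extension attacks on Merkle-Damgård hashes.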

HMAC is used in: TLS authentication, JWT (JSON Web Token) signatures in web APIs, API request signing (AWS Signature Version 4), IPSec data integrity.

Forensic use of hash functions:

At the moment of evidence collection, the examiner computes MD5 and SHA-256 hashes of every acquired file and records them in the seizure log. This is the baseline.
Before any analysis, the examiner recomputes the hash and compares it to the baseline. Any mismatch means the file was altered.
Forensic disk-image formats embed acquisition hashes in the container itself: E01 stores an MD5 (and optionally SHA-1) hash, while AFF4 supports SHA-256.
The NIST NSRL (National Software Reference Library) maintains a database of hash values for millions of known files: operating system files and commercial software, including some tools that may be considered malicious (such as hacking and steganography utilities). Examiners use it to triage an evidence drive by filtering out the millions of known OS and application files to focus on user files. Known-bad hashes (malware, contraband) are flagged using separate hash sets.
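The baseline-then-recheck workflow can be sketched as below. The function names and the dictionary layout of the baseline record are assumptions made for this example, not a standard log format. Reading in chunks keeps memory use flat even for multi-gigabyte images.

```python
import hashlib

def hash_file(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compute MD5 and SHA-256 of a file in one pass, reading 1 MiB at a time."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}

def verify_evidence(path: str, baseline: dict) -> bool:
    """True only if BOTH hashes still match the values recorded at acquisition."""
    return hash_file(path) == baseline
```

In use, `hash_file` would be called once at acquisition and its result written to the seizure log, and `verify_evidence` would be called before each analysis session with the logged values.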

Fig 14.4: Hash avalanche effect. A small change in the input (here only the letter case, 'FORENSIC' vs 'forensic') produces an entirely different SHA-256 hash with no visible relationship between the two outputs.

Memory hooks · Chapter 14

14.1 Symmetric vs asymmetric — symmetric = one key, fast, key distribution problem; asymmetric = public+private pair, slow, solves key distribution. Hybrid (TLS, PGP) = asymmetric handshake to agree a symmetric session key, then symmetric for data. Digital signature = private key signs, public key verifies (proves authenticity + integrity).

14.1 Key size equivalence — 128-bit AES ≈ 3072-bit RSA in brute-force resistance. Quantum computers (Shor's algorithm) threaten RSA/ECC far more than AES.

14.2 Caesar cipher — shift each letter by N; ROT-13 is self-inverse; broken instantly by frequency analysis (E is most frequent letter in English). Vigenère = polyalphabetic, broken by Kasiski analysis (repeated blocks reveal keyword length). UserAssist registry keys use ROT-13 encoding.

14.2 Substitution vs transposition — substitution = letters replaced (same order, different symbols); transposition = letters rearranged (same symbols, different order).

14.3 AES — current standard; 128-bit block; 128 or 256-bit key (10 or 14 rounds); substitution-permutation network; used in BitLocker, WPA2, TLS, WhatsApp. No practical attack faster than brute force.

14.3 DES — 56-bit key; broken by brute force 1998; do not use. 3DES = deprecated 2023. RC4 = broken stream cipher; was in WEP (IV reuse attack). Blowfish = 64-bit block, legacy; bcrypt password hash based on Blowfish key schedule.

14.3 ECB mode — same plaintext block → same ciphertext block; reveals patterns (ECB penguin). CBC, CTR, GCM are safe modes. GCM = authenticated encryption used in TLS 1.3.

14.4 RSA — security = factoring n = p×q. Key relationship: e × d ≡ 1 (mod φ(n)). Minimum 2048-bit keys; 4096-bit for long-term security. Used in TLS certificates, PGP, SSH.

14.4 ECC — 256-bit ECC ≈ 3072-bit RSA. ECDH = key exchange (TLS 1.3). ECDSA = signatures (Bitcoin, TLS certs). Curve25519 = modern, faster, timing-safe.

14.4 Diffie-Hellman — two parties establish shared secret over insecure channel; eavesdropper sees public values but cannot compute secret without solving discrete logarithm. ECDH is the ECC variant.

14.5 MD5 — 128-bit (32 hex chars); collision-vulnerable since 2004; still used in forensic evidence hashing (chain-of-custody verification, not security). SHA-1 = 160-bit; broken by SHAttered 2017; deprecated.

14.5 SHA-256 — current forensic evidence standard; 256-bit (64 hex chars); no practical collision known. SHA-3 = Keccak sponge function; different structure from SHA-2; algorithmic diversity, not a fix for SHA-2 weakness.

14.5 HMAC — hash + secret key = authentication code; proves both integrity (message unchanged) and authenticity (sender knows the key). Used in TLS, JWT, API signing.

14.5 NIST NSRL: database of hashes of known files (operating systems, commercial software); used in forensic triage to filter out known files. Known-bad hashes (malware) come from separate hash sets.
