
Data Recovery and File Carving: Deleted, Hidden, Encrypted

What 'deleted' means at the filesystem level, signature-based carving with PhotoRec and Foremost, SSD TRIM realities, BitLocker/FileVault/LUKS/VeraCrypt recovery paths, cold-boot key extraction, and the steganography detection workflow Indian CFSL teams actually run.

When a file is deleted, the operating system marks its directory entry as available for reuse but leaves the underlying data blocks intact until new data overwrites them. Recovery is therefore possible on spinning hard drives, where unallocated blocks persist until reallocated, but is frequently impossible on TRIM-enabled SSDs, where the controller erases freed NAND pages within seconds to minutes. Signature-based file carving with tools such as PhotoRec, Foremost, and Scalpel can recover files from formatted or corrupted media by matching known header and footer byte patterns, though fragmented files and encrypted volumes require distinct approaches. Encrypted volumes under BitLocker, FileVault, LUKS, and VeraCrypt are practically unbreakable without the key, so recovery focuses on key retrieval through legal process, escrowed backups, RAM extraction, or hibernation files rather than on attacking the cipher.

The phrase "deleted file" covers a stack of distinct technical scenarios, and treating them as one problem produces wrong conclusions in court. A file deleted from a Windows NTFS desktop is not the same problem as a file deleted from an Android phone's flash storage, which is not the same as a file overwritten by a paranoid suspect with shred -uvz. Each demands a different recovery technique, a different success probability, and a different way of writing it up under BSA 2023 Section 63.

Key takeaways

  • When a file is deleted, the operating system marks the directory entry as available but leaves the data blocks intact until something writes over them, so recovery is often possible before that overwrite occurs.
  • Signature-based carving with tools such as PhotoRec, Foremost, and Scalpel recovers files by matching known header and footer byte patterns even when the file system metadata is entirely gone.
  • Consumer SSDs with TRIM enabled are the worst recovery target in the lab because the controller actively scrubs freed blocks, unlike spinning hard drives where data persists in unallocated space.
  • Encrypted volumes under BitLocker, FileVault, LUKS, and VeraCrypt require key recovery first, with cold-boot key extraction and hashcat brute force as the main technical routes when the password is unavailable.
  • Mounting an evidence image read-write, even briefly, can trigger journal replay and free-space reclamation that overwrites recoverable data, so working only on verified copies is a rule with no exception.

This topic walks the recovery stack from easy to hard: filesystem-aware undelete on NTFS, ext4 and APFS; signature-based carving with PhotoRec, Foremost and Scalpel; the SSD/TRIM problem that makes consumer SSDs the worst recovery target in the lab; partition-table recovery with TestDisk; encrypted-volume recovery for BitLocker, FileVault, LUKS and VeraCrypt; cold-boot key extraction; hashcat brute force with the right mode numbers; and the steganography and hidden-partition tricks that come up in CERT-In and I4C casework. Image first, work on copies, log every step. That last rule is the only one that has no exception.

By the end of this topic you will be able to:

  • Explain what 'deletion' means at the filesystem level for NTFS, ext4, APFS, and FAT32, and predict recovery probability before attempting it.
  • Apply signature-based file carving with PhotoRec, Foremost, and Scalpel on raw disk images, and identify when fragmentation or encryption makes signature carving insufficient.
  • Assess the impact of TRIM on SSD evidence and adjust seizure, imaging, and recovery expectations accordingly.
  • Select the appropriate key-recovery path (legal process, escrow, RAM extraction, hibernation file, or hashcat brute force) for BitLocker, FileVault, LUKS, and VeraCrypt volumes.
  • Conduct a steganography detection workflow using StegExpose and stegoVeritas on a set of candidate carrier files and document findings for court.
Key terms
Deletion vs erasure
Deletion marks the file entry as free; content blocks remain until overwritten. Erasure (shred, sdelete, secure-erase) overwrites the blocks before unlinking. The difference is the recovery probability.
File carving
Recovery technique that scans raw disk blocks for known file headers and footers, ignoring filesystem metadata. The only option when the filesystem is corrupted, formatted or deliberately wiped.
TRIM
ATA command (the SCSI equivalent is UNMAP, the NVMe equivalent is Deallocate) that tells an SSD which logical blocks are no longer in use. The SSD's garbage collector can then physically erase them. TRIM is why deleted data on a modern SSD is usually unrecoverable within seconds to minutes.
BitLocker
Microsoft's full-volume encryption on Windows. Volume Master Key wrapped under TPM, password, recovery key (48 digits), or AD/MBAM/Intune escrow. Hashcat mode -m 22100.
VeraCrypt hidden volume
A second encrypted volume nested inside the free space of an outer VeraCrypt volume, accessed with a different password. Designed for plausible deniability; statistical entropy and free-space patterns are how examiners flag the presence.
Steganography
Hiding data inside other data, typically by encoding payload bits into the least significant bits of pixels or audio samples. Tools: StegHide, OutGuess, F5. Detection: StegExpose, stegoVeritas, statistical chi-square against LSB plane.

What 'deleted' actually means

When a user clicks Delete or runs rm, the operating system does not erase content. It updates a small piece of metadata that marks the file's directory entry as available for reuse. The data blocks holding the file's actual content remain on the storage device, untouched, until the filesystem allocator reuses those blocks for a new file. On a lightly used drive that can be weeks. On a heavily used drive it can be seconds.

The mechanism differs per filesystem:

  • NTFS holds every file's metadata in a Master File Table ($MFT) record. Deletion clears the record's in-use flag and marks its clusters free in $Bitmap; the record's data runs and the content clusters themselves are not overwritten. Recovery walks the MFT looking for records with the flag clear that still have valid data runs. Tools: MFTECmd, analyzeMFT, X-Ways. The $LogFile and $UsnJrnl:$J journals can preserve traces of older MFT states and help recover entries the live MFT has since reused.
  • ext4 marks the inode and its blocks free in the block-group bitmaps, but it also zeroes the extent pointers inside the inode on deletion, so the on-disk inode alone rarely leads back to the content. The journal (normally the hidden inode 8) can hold older copies of the inode with the extents still intact. Tools: extundelete, ext4magic, debugfs. ext4's delayed-allocation behaviour means recently created files might not even be on disk yet when deleted.
  • APFS is the hardest of these for traditional undelete because copy-on-write plus aggressive TRIM means the original blocks are usually freed and discarded quickly. APFS snapshots are the recovery path that actually works; see macOS Forensic Artifacts: plist, Keychain, Time Machine and Browser Traces.
  • FAT32 / exFAT still appear on Indian-investigation USB sticks and SD cards. Deletion replaces the first byte of the filename with 0xE5. The directory entry, FAT chain and clusters often remain. Recovery is straightforward with TestDisk's "undelete" view or PhotoRec.
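
The deletion markers described above can be read straight out of the raw structures. The following is a minimal sketch, assuming a standard 32-byte FAT short-name directory entry and an NTFS MFT record that starts with the FILE signature; the function names are illustrative and do not come from any of the tools named here.

```python
import struct

def parse_fat_dir_entry(entry: bytes) -> dict:
    """Decode one 32-byte FAT short-name directory entry."""
    if len(entry) != 32:
        raise ValueError("a FAT directory entry is exactly 32 bytes")
    deleted = entry[0] == 0xE5  # deletion marker replaces the first name byte
    # The first character of the name is lost on deletion; show it as '?'.
    raw_name = (b"?" + entry[1:8]) if deleted else entry[0:8]
    name = raw_name.decode("ascii", "replace").rstrip()
    ext = entry[8:11].decode("ascii", "replace").rstrip()
    cluster_hi, = struct.unpack_from("<H", entry, 20)  # high 16 bits (FAT32)
    cluster_lo, = struct.unpack_from("<H", entry, 26)  # low 16 bits
    size, = struct.unpack_from("<I", entry, 28)        # file size in bytes
    return {
        "deleted": deleted,
        "name": f"{name}.{ext}" if ext else name,
        "first_cluster": (cluster_hi << 16) | cluster_lo,
        "size": size,
    }

def mft_record_in_use(record: bytes) -> bool:
    """Return the in-use flag of an NTFS MFT record (bit 0 of the flags at 0x16)."""
    if record[0:4] != b"FILE":
        raise ValueError("not an MFT record (missing FILE signature)")
    flags, = struct.unpack_from("<H", record, 0x16)
    return bool(flags & 0x0001)
```

A deleted FAT entry still carries the starting cluster and size, which is why undelete on FAT is usually straightforward when the file was contiguous and the clusters have not been reused.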

Signature-based carving

Filesystem-aware undelete works only while metadata exists. After a full format, a corrupted partition table or a deliberate mkfs, the metadata is gone but the content blocks often remain. File carving is the recovery technique that ignores filesystem metadata entirely and scans raw disk blocks for known file headers and footers.

File-carving signature workflow. A raw disk image is scanned byte by byte for known magic-byte headers. When a header is found the carver reads forward to the matching footer (or inferred length) and extracts the candidate file. Multiple file types are carved in parallel, then de-duplicated and validated before output.

The classic header/footer pairs every examiner memorises:

| File type | Header (hex) | Footer (hex) | Notes |
|---|---|---|---|
| JPEG | FF D8 FF E0 / FF D8 FF E1 | FF D9 | JFIF (E0) or EXIF (E1); footer is the EOI marker |
| PNG | 89 50 4E 47 0D 0A 1A 0A | 49 45 4E 44 AE 42 60 82 | Header is the PNG signature; footer is the IEND chunk + CRC |
| PDF | 25 50 44 46 (%PDF) | 25 25 45 4F 46 (%%EOF) | Multiple %%EOF possible (linearized / incremental update) |
| DOCX / XLSX / PPTX / ZIP | 50 4B 03 04 (PK..) | 50 4B 05 06 (PK..) | All ZIP-family; differentiation needs central-directory parse |
| MP4 / MOV | 00 00 00 ?? 66 74 79 70 (ftyp) | none | Length-prefixed boxes; no fixed footer, parse box tree |
| GIF | 47 49 46 38 37 61 / 47 49 46 38 39 61 | 00 3B | GIF87a or GIF89a |
| SQLite | 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00 | none | Page-structured; no footer, recovery via page size header |
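
The header-to-footer scan can be sketched in a few lines. This is an illustrative toy, not how PhotoRec or Foremost are implemented: it holds the whole buffer in memory, covers only the JPEG and PNG pairs from the table, and assumes each file is contiguous. The names SIGNATURES and carve are made up for the example.

```python
from typing import Iterator, Tuple

# Header/footer pairs taken from the table above (JPEG and PNG only).
SIGNATURES = {
    "jpg": (b"\xff\xd8\xff", b"\xff\xd9"),
    "png": (b"\x89PNG\r\n\x1a\n", b"IEND\xaeB`\x82"),
}

def carve(data: bytes, max_size: int = 20 * 1024 * 1024) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (extension, offset, content) for every header..footer span found."""
    for ext, (header, footer) in SIGNATURES.items():
        pos = data.find(header)
        while pos != -1:
            # Look for the footer within max_size bytes of the header.
            end = data.find(footer, pos + len(header), pos + max_size)
            if end != -1:
                end += len(footer)
                yield ext, pos, data[pos:end]
                pos = data.find(header, end)      # continue after this file
            else:
                pos = data.find(header, pos + 1)  # no footer in range, skip header
```

Because the sketch stops at the first footer it sees, a JPEG with an embedded thumbnail would be cut short at the thumbnail's own FF D9, and a fragmented file would come out corrupted. Those are the cases where validation and filesystem-aware reassembly matter.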

The carving tools:

  • PhotoRec (CGSecurity) is the most widely used. Free, cross-platform, ships with TestDisk. Supports several hundred signatures. Strength: forgiving and recovers from severely damaged media. Weakness: filenames are lost, replaced with sequential numbering.
  • Foremost (US Air Force OSI) reads a config file (foremost.conf) that defines header/footer pairs. Compact, scriptable, predictable. The default config covers common types; custom signatures are one line each.
  • Scalpel is a Foremost fork that's faster on large images by skipping irrelevant block ranges. Same config style.
  • Bulk Extractor carves not files but content fragments: email addresses, URLs, credit-card numbers, phone numbers, BTC addresses, search-engine queries. Different problem class; very useful for cross-referencing.
  • X-Ways Forensics and ReclaiMe are commercial. X-Ways has the best filesystem-aware carving (it understands NTFS fragmentation and reassembles non-contiguous files where PhotoRec gives up).
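
Content-fragment carving of the kind Bulk Extractor performs is a different operation from file carving: it looks for recognisable strings anywhere in the raw bytes and records their offsets. The sketch below is a rough illustration using two simplified regular expressions; the patterns and the scan_features name are assumptions for the example and are far less thorough than Bulk Extractor's own scanners.

```python
import re
from typing import Iterator, Tuple

# Deliberately simple byte patterns; real scanners are stricter and also
# handle UTF-16 and compressed data.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]{1,64}@[A-Za-z0-9.-]{1,253}\.[A-Za-z]{2,24}")
URL_RE = re.compile(rb'https?://[^\s"<>\x00-\x1f\x7f-\xff]{4,2048}')

def scan_features(data: bytes) -> Iterator[Tuple[str, int, str]]:
    """Yield (kind, byte offset, text) for each email address or URL found."""
    for kind, pattern in (("email", EMAIL_RE), ("url", URL_RE)):
        for match in pattern.finditer(data):
            yield kind, match.start(), match.group().decode("ascii", "replace")
```

Recording the byte offset with each hit is what makes cross-referencing possible later, because the offset can be mapped back to a file, a slack region or unallocated space.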

SSDs, TRIM and the recovery reality

Hard-drive recovery techniques do not transfer to SSDs, and the mechanism that breaks them is TRIM.

When a file is deleted on an SSD-backed filesystem with TRIM support (Windows NTFS, macOS APFS, Linux ext4 with discard mount option or periodic fstrim), the OS issues the TRIM (ATA) or UNMAP (SCSI) command to tell the SSD which logical blocks are no longer in use. The SSD's garbage collector then erases the corresponding NAND pages physically. Within seconds to minutes, the actual content is gone from the flash cells. No carving, no journal walking, no MFT scanning will bring it back.
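
On a Linux examination workstation, whether the kernel sees a device as discard-capable can be read from sysfs. The sketch below is a hedged illustration: it assumes the standard /sys/block/<device>/queue attributes discard_max_bytes and rotational, and the function names are invented for the example. It reports what the attached device advertises, not whether the suspect's operating system actually issued TRIM, and a USB bridge or write blocker in the path may mask the values.

```python
from pathlib import Path
from typing import Optional

def discard_supported(device: str, sysfs_root: str = "/sys/block") -> bool:
    """True if the block layer reports a non-zero discard (TRIM/UNMAP) limit."""
    attr = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    try:
        return int(attr.read_text().strip()) > 0
    except (FileNotFoundError, ValueError):
        return False

def is_rotational(device: str, sysfs_root: str = "/sys/block") -> Optional[bool]:
    """True for spinning disks, False for flash, None if the attribute is missing."""
    attr = Path(sysfs_root) / device / "queue" / "rotational"
    try:
        return attr.read_text().strip() == "1"
    except FileNotFoundError:
        return None
```

Calling discard_supported("sda") and is_rotational("sda") at intake gives a quick note for the case file about the media class, which feeds directly into the recovery expectations discussed in this section.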

The practical implications for Indian casework:

  • An SSD seized hot, with the system running, may already have lost most recently-deleted content. The first responder's job is to image fast and minimise live OS activity. Pulling the plug stops TRIM operations in flight but the damage is largely already done.
  • Built-in encryption on SSDs (SED, OPAL) means the controller may secure-erase by key destruction. A factory reset on an SED takes milliseconds and produces unrecoverable media.
  • eMMC (used in Android phones, IoT devices, low-cost laptops) supports TRIM via the discard operation. Same problem class, slightly slower garbage collection.
  • Chip-off recovery from the raw NAND can sometimes find data in wear-leveling reserve pages that the FTL has remapped but not yet erased. This is a specialist procedure done by CFSL chip-off labs (Hyderabad, Chandigarh and a handful of state labs); equipment cost runs into crores and per-case turnaround is weeks.

For Windows BitLocker-encrypted SSDs the picture is even thinner: the underlying flash content is ciphertext, so even physical recovery of the bytes yields only ciphertext until the BitLocker key is provided.

  1. Acknowledge media
    Identify SSD vs HDD vs eMMC vs UFS at intake. The recovery plan depends on it. SMART data and the controller's model number are part of the chain-of-custody record.
  2. Image at the lowest defensible level
    For SSDs, image the logical drive through a write blocker. Chip-off only if the case justifies the cost and the case-officer authorises it; document the decision.
  3. Set recovery expectations early
    If the device is a TRIM-enabled SSD and the alleged deletion was more than a few hours before seizure, communicate the low recovery probability to the IO before they build the case around it.
  4. Try snapshots, cloud and backups first
    APFS snapshots, Windows Volume Shadow Copies, Time Machine, OneDrive/Google Drive history, Android cloud backup. Often higher yield than carving the device.
  5. Carve and parse with realistic SLA
    Run PhotoRec/X-Ways carving against the image with the right signatures. Treat anything recovered as bonus, not base case.

Partition tables, formatted drives and TestDisk

A drive that "looks blank" to Windows or macOS often is not. Three scenarios show up regularly in Indian investigations:

  • Quick format. Most consumer OSes default to a quick format, which writes a new filesystem header (NTFS boot sector, FAT BPB, ext superblock) and a new empty file table. The data clusters remain untouched. Recovery is high probability with PhotoRec or filesystem-specific tools.
  • Full format. When Quick Format is unticked, Windows Vista and later write zeroes (or random data on encrypted volumes) over every sector of the volume. After a full format, signature carving has nothing to find. The drive is, for practical purposes, blank.
  • Damaged or deleted partition table. The first 512 bytes (MBR) or the first 17 KiB (GPT) of a drive can be corrupted, overwritten or deliberately deleted, while leaving the partitions themselves intact further down the disk. The drive shows as unallocated in Disk Management, but the data is whole.

For the third case, TestDisk (CGSecurity, same author as PhotoRec) is the canonical tool. TestDisk scans for filesystem signatures across the disk, reconstructs a probable partition table, and writes it back. After TestDisk's "Write" step, the drive mounts normally and recovery proceeds through standard tools. The workflow for a recovered MBR/GPT case:

  1. Image the raw drive
    dd or ddrescue the physical device to a raw .img file (or acquire to .E01 with a tool such as ewfacquire or FTK Imager), behind a hardware write blocker. Hash the source and image.
  2. Run TestDisk on the image
    Use the file-backed image so the original device is never modified. Choose the partition table type (Intel for MBR, EFI GPT for modern systems), let TestDisk Analyse, then Quick Search.
  3. Validate the proposed partition table
    TestDisk highlights probable partitions. Spot-check by walking the directory listing inside TestDisk before committing the write. Do not commit to the source image; write to a working copy.
  4. Mount read-only and proceed
    Mount the recovered partition with mount -o ro,noload,noatime (noload applies to ext3/ext4 and skips journal replay) and continue analysis with Autopsy, X-Ways, FTK or your tool of choice.
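
The idea behind the partition search in step 2 can be illustrated with a short sketch. It is not TestDisk's algorithm: it only parses the four primary MBR entries and then looks at every sector boundary for an NTFS or FAT32 boot-sector signature. The function names are hypothetical and the image is assumed to fit in memory.

```python
import struct
from typing import List, Tuple

SECTOR = 512

def parse_mbr(sector0: bytes) -> List[Tuple[int, int, int]]:
    """Return (type, start_lba, sector_count) for each non-empty MBR entry."""
    if len(sector0) < SECTOR or sector0[510:512] != b"\x55\xaa":
        return []  # missing boot signature: table wiped or damaged
    entries = []
    for i in range(4):
        raw = sector0[446 + 16 * i: 462 + 16 * i]
        ptype = raw[4]
        start, count = struct.unpack_from("<II", raw, 8)
        if ptype != 0 and count != 0:
            entries.append((ptype, start, count))
    return entries

def find_boot_sectors(image: bytes) -> List[Tuple[int, str]]:
    """Scan sector boundaries for NTFS / FAT32 boot-sector signatures."""
    hits = []
    for lba in range(len(image) // SECTOR):
        s = image[lba * SECTOR:(lba + 1) * SECTOR]
        if s[510:512] != b"\x55\xaa":
            continue
        if s[3:11] == b"NTFS    ":        # OEM ID at offset 3
            hits.append((lba, "NTFS"))
        elif s[82:90] == b"FAT32   ":      # filesystem type string at offset 0x52
            hits.append((lba, "FAT32"))
    return hits
```

An empty result from parse_mbr combined with a hit from find_boot_sectors is the pattern of a wiped table over an intact partition. Backup boot sectors also match, so each hit still needs the validation described in step 3.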

In a 2023 Hyderabad financial-fraud matter, a suspect used a low-level utility to wipe only the MBR while leaving the data partition intact; TestDisk reconstructed the partition table and the prosecution recovered three months of accounting records.

Encrypted volumes: BitLocker, FileVault, LUKS, VeraCrypt

File-system slack vs unallocated space. Within an allocated cluster, the file usually ends before the cluster boundary; the leftover is file slack and may hold remnants of previous file content. Beyond the last cluster of the file come free (unallocated) clusters that the filesystem has not yet reassigned, which are the primary target for file-carving tools.

A correctly configured full-volume encryption system without the key is, to a first approximation, unbreakable. The forensic problem is therefore not "break the crypto" but "find the key, by any of the legitimate paths."

| System | Platform | Recovery paths | Hashcat mode |
|---|---|---|---|
| BitLocker | Windows 10/11 | TPM + PIN, user password, 48-digit recovery key, AD / Intune / MBAM escrow, RAM dump, hibernation file | -m 22100 |
| FileVault 2 | macOS APFS | User password, personal recovery key, institutional recovery key, iCloud-escrowed key, RAM dump (live system) | -m 16700 |
| LUKS / LUKS2 | Linux | Passphrase in any of 8 key slots, key file, TPM (LUKS2), RAM extraction of master key | -m 14600 (LUKS1), -m 29511 to -m 29543 (LUKS1 by kdf/cipher variant), LUKS2 supported from hashcat v7.1+ in the 34100 range by variant |
| VeraCrypt | Cross-platform | Password, keyfile, PIM, RAM-resident master key while volume is mounted; hidden volume needs separate password | -m 13711 to -m 13771 (per kdf and cipher) |

The legitimate paths in order of preference:

  • Ask the user. Under BNSS Section 94 production notice or with consent at seizure, the user is legally compellable to produce passwords for devices in their custody. Many Indian magistrates now accept "I have forgotten the password" as a fact question subject to cross-examination. CrPC-era cases like Selvi v. State of Karnataka shaped the limits; the BSA 2023 evidentiary frame is the active reference for new matters.
  • Recovery keys from the user's account. BitLocker keys are routinely escrowed to Microsoft accounts (consumer) or Azure AD / Intune (enterprise). FileVault keys go to iCloud or to Jamf/Kandji. LUKS keys may sit in a sysadmin's password manager. A correctly-drafted production notice to the right entity, often faster than the user themselves, recovers the key.
  • RAM extraction. A device seized in an unlocked state with the encrypted volume mounted holds the master key in RAM. The first responder's job is to capture RAM before shutdown. Tools: MAGNET RAM Capture (Windows), AVML or LiME (Linux), Recon ITR (macOS). The captured image is then scanned for the master key with Volatility plugins (bitlocker, luks, truecryptmaster) or with commercial tools from Passware or Elcomsoft.
  • Hibernation file (Windows). C:\hiberfil.sys is a compressed RAM dump produced when a Windows machine hibernates. If the suspect closed the lid on an unlocked machine and the laptop hibernated, the BitLocker master key may still be in the hibernation file, recoverable with hibr2bin and Volatility.
  • Cold-boot attack. RAM contents persist for seconds to minutes after power loss, longer when chilled. Chilling the DRAM modules (with an inverted compressed-air can or liquid nitrogen) and then rebooting into a small key-extraction OS recovers keys that have not yet faded. The technique is real and was demonstrated against laptops at the 2008 USENIX Security Symposium (Halderman et al.); later cold-boot work on Android phones was published at other academic venues. Indian state FSLs have the capability in concept; the practical execution requires the device to be intercepted in a powered state by a team prepared for cold-boot work.
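
When the escrow or printed-copy route is pursued, seized notes, exported documents and text carved from the image can be searched for strings shaped like a 48-digit BitLocker recovery key. The sketch below assumes the commonly documented format of eight six-digit groups, each divisible by 11 with a quotient below 65536; treat that rule as an assumption to confirm against Microsoft's documentation, and treat the function names as illustrative.

```python
import re
from typing import List

# Eight groups of six digits separated by hyphens, not embedded in a longer number.
CANDIDATE_RE = re.compile(r"(?<!\d)(?:\d{6}-){7}\d{6}(?!\d)", re.ASCII)

def is_plausible_recovery_key(candidate: str) -> bool:
    """Apply the per-group divisibility and range checks to one candidate."""
    groups = candidate.split("-")
    if len(groups) != 8 or any(len(g) != 6 or not g.isdigit() for g in groups):
        return False
    return all(int(g) % 11 == 0 and int(g) // 11 < 65536 for g in groups)

def find_recovery_keys(text: str) -> List[str]:
    """Return every substring of text that passes the plausibility checks."""
    return [m for m in CANDIDATE_RE.findall(text) if is_plausible_recovery_key(m)]
```

A string that passes is only a candidate. Whether it unlocks the volume is settled by trying it against a working copy of the image, never against the original.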

Brute force is the path of last resort. hashcat -m 22100 bitlocker.hash -a 0 wordlist.txt against a weak BitLocker password may work; against a 16-character random password it will not in the lifetime of the case. The realistic dictionary-attack work in Indian labs targets user-chosen passwords against rockyou.txt-style wordlists with rule-based mutation.
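
The claim about a 16-character random password can be checked with back-of-envelope arithmetic. The guess rate in the sketch is a purely hypothetical figure chosen for illustration; real BitLocker cracking speed depends on the hardware and should be measured with hashcat's benchmark mode.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_exhaust(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Years needed to try every password of the given length and alphabet."""
    keyspace = alphabet_size ** length
    return keyspace / guesses_per_second / SECONDS_PER_YEAR

# 95 printable ASCII characters, 16 positions, hypothetical 10,000 guesses/second.
full_random = years_to_exhaust(95, 16, 10_000)
print(f"16-char random password: about {full_random:.1e} years")
```

Even if the assumed rate were wrong by several orders of magnitude, the result would remain far beyond any case timeline, which is why practical effort goes into dictionaries and mutation rules rather than exhaustive search.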

Hidden data and steganography

The last category covers data that was never deleted but was placed where a routine examination would not reach.

The locations Indian I4C and CERT-In analysts check by default:

  • Slack space. The unused portion of the last cluster of every file. An NTFS file using 12 KB of a 16 KB cluster leaves 4 KB of slack. Tools: bulk_extractor, X-Ways. Useful for finding fragments of deleted-then-overwritten files, occasionally hand-placed payloads.
  • Alternate Data Streams (ADS) on NTFS. Covered in detail in Windows Artifacts: ShellBags, ADS, LNK, Hibernation and Slack. notes.txt:hidden.exe is the canonical pattern.
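  • Bytes between the MBR and the first partition. On legacy MBR layouts the first partition starts at sector 63, so sectors 1 through 62 (31 KiB) are typically unused; disks partitioned by modern tools usually start the first partition at sector 2048, leaving almost 1 MiB. The gap can host bootkits, hidden payloads or just text files. dd at the right offsets reveals them.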
  • EFI System Partition. Hidden from Windows Explorer by default. UEFI bootkits such as BlackLotus and ESPecter place persistence here. Mount it read-only with mount -o ro -t vfat /dev/sdaX /mnt and check.
  • Host Protected Area (HPA) and Device Configuration Overlay (DCO). Reserved regions at the end of an ATA drive that subtract capacity reported by the firmware. hdparm --dco-identify and hdparm -N detect them; some forensic acquisition appliances temporarily disable the HPA to image the full physical capacity.
  • Encrypted ZIP / RAR / 7z containers. Password-protected archives are common in Indian financial-fraud and CSAM matters. Hashcat modes -m 13600 (WinZip), -m 12500 (RAR3), -m 13000 (RAR5), -m 11600 (7-Zip).
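
The gap between the MBR and the first partition is easy to screen programmatically. The sketch below is a minimal illustration that assumes 512-byte sectors and a first-partition LBA supplied by the examiner; the function name is hypothetical. A non-zero sector is a lead, not a finding: on BIOS/MBR Linux systems, for example, GRUB legitimately stores part of its boot code in this region.

```python
from typing import BinaryIO, List

SECTOR = 512

def nonzero_gap_sectors(image: BinaryIO, first_partition_lba: int) -> List[int]:
    """List sectors between the MBR and the first partition holding any non-zero byte."""
    hits = []
    image.seek(SECTOR)  # skip sector 0, the MBR itself
    for lba in range(1, first_partition_lba):
        sector = image.read(SECTOR)
        if len(sector) < SECTOR:
            break  # image ends before the stated partition start
        if sector.count(0) != SECTOR:
            hits.append(lba)
    return hits
```

Opening the working copy with open(path, "rb") and passing the handle in keeps the check read-only; each flagged sector can then be exported with dd for closer inspection.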

Steganography is a separate problem class. The technique embeds payload data into a carrier file (image, audio, video) in a way that is statistically subtle. The most common scheme is LSB substitution: replace the least-significant bit of each pixel's red/green/blue channel with one bit of payload. A 1920 by 1080 24-bit image has roughly 2 million pixels and 6.2 million colour samples, enough for a payload of about 0.78 MB (777,600 bytes) at one bit per sample while staying invisible to the eye. Spatial LSB embedding survives only in lossless formats such as PNG or BMP; JPEG-oriented tools such as F5 and OutGuess embed in the DCT coefficients instead.
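
Both the capacity arithmetic and the substitution itself are simple enough to sketch. The code below works on a flat sequence of 8-bit colour samples rather than a real image file, so it illustrates the principle only; the function names are invented for the example and do not correspond to StegHide or any other tool.

```python
def lsb_capacity_bytes(width: int, height: int, channels: int = 3,
                       bits_per_sample: int = 1) -> int:
    """Payload capacity of plain LSB substitution, in whole bytes."""
    # Example: 1920 x 1080 x 3 samples at 1 bit each = 6,220,800 bits = 777,600 bytes.
    return width * height * channels * bits_per_sample // 8

def embed_lsb(samples: bytes, payload: bytes) -> bytes:
    """Overwrite the least-significant bit of successive samples with payload bits."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("payload too large for carrier")
    out = bytearray(samples)
    for idx, bit in enumerate(bits):
        out[idx] = (out[idx] & 0xFE) | bit
    return bytes(out)

def extract_lsb(samples: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of payload from the LSBs of the first 8 * n_bytes samples."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for sample in samples[8 * i: 8 * i + 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)
```

No sample changes by more than one intensity level, which is why the result is invisible to the eye and why detection has to rely on statistics.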

The detection workflow:

  1. Identify carrier files
    Bulk extract all images, audio and video from the image. Hash and dedupe against known clean-set databases (NSRL).
  2. Run statistical detectors
    StegExpose against bitmaps; stegoVeritas for a battery of checks; chi-square against the LSB plane of each colour channel. Anomalously uniform LSB distribution is the signature.
  3. Try known-tool extraction
    StegHide, OutGuess, F5 and OpenStego have distinctive signatures. Run each tool's extract mode against suspect carriers. A successful extract with the suspect's password produces the payload directly.
  4. Brute force passwords on flagged carriers
    stegcracker, stegseek and hashcat can brute-force common stegano tool passwords with reasonable dictionaries.
  5. Correlate to user activity
    If a carrier image flags, check timestamps, source application, browser download history and any local copies of stego tools. The case lives or dies on attribution, not on the bare detection.
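
The chi-square check in step 2 rests on one observation: random LSB embedding tends to equalise the counts of each value pair (2k, 2k+1). A minimal sketch of the statistic follows, assuming 8-bit samples from one colour channel of a lossless image. It returns the raw statistic only; a full test would convert it to a p-value against the chi-square distribution, which is omitted here to keep the example dependency-free. The function name is illustrative.

```python
from collections import Counter

def pov_chi_square(samples: bytes) -> float:
    """Chi-square statistic over pairs of values (2k, 2k+1) of 8-bit samples.

    A value that is low relative to clean images from the same source means the
    pair counts are nearly equal, which is what random LSB embedding produces.
    """
    hist = Counter(samples)
    chi2 = 0.0
    for k in range(128):
        even, odd = hist.get(2 * k, 0), hist.get(2 * k + 1, 0)
        expected = (even + odd) / 2
        if expected > 0:
            chi2 += (even - expected) ** 2 / expected
    return chi2
```

The statistic is only meaningful relative to a baseline, so it is best compared against known-clean images from the same camera or application rather than read as an absolute score.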

The forensic workflow underneath all of this is constant: image first, hash the source and the image, work on a verified copy, and log every step. The chain of custody from seizure through analysis to courtroom is covered in Chain of Custody and in Digital First Responder: Volatility, Seizure and Imaging, which together form the procedural backbone for everything in this topic.
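
The hashing step is simple to script. The sketch below is one way to do it with Python's standard library; the choice of SHA-256 and the function names are assumptions for the example, and a lab's SOP may require a different or an additional algorithm.

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large images never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copy(source_path: str, copy_path: str) -> bool:
    """True when the working copy hashes to the same value as the source image."""
    return sha256_file(source_path) == sha256_file(copy_path)
```

Recording both hex digests in the examination log, with timestamps, is what lets a later reader confirm that the analysis ran on an unaltered copy.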

Practice

A 200 MB MP4 video was deleted from an NTFS volume on a 7200-rpm HDD, four hours ago. The volume is 60% full. Which recovery approach is most likely to succeed?

Frequently asked questions

What is the difference between deleting a file and securely erasing it?
Deletion marks the file's directory entry as available for reuse but leaves the content blocks intact on disk until the filesystem allocator overwrites them. Secure erasure (shred, sdelete, SSD ATA Secure Erase, or self-encrypting drive key destruction) overwrites the blocks before unlinking, or destroys the encryption key that made them readable. The first is recoverable until overwrite; the second is not. The recovery probability for any given case depends on which one was performed and on the storage medium.
Is data on a TRIM-enabled SSD ever recoverable after deletion?
Rarely. Once TRIM has been issued and the SSD's garbage collector has erased the underlying NAND pages, the data is gone from the flash cells. The window between deletion and TRIM-plus-GC is typically seconds to minutes. Recovery from wear-leveling reserve pages via chip-off is sometimes possible for specific cells the FTL has remapped but not yet erased, but this is a specialist operation. Practically, examiners working SSD evidence focus on snapshots, backups and cloud-side copies rather than on-device carving.
What is file carving and when is it the right tool?
File carving recovers files by scanning raw disk blocks for known file headers and footers, ignoring filesystem metadata. It is the right tool when the filesystem is corrupted, formatted, or deliberately wiped, leaving content blocks intact but metadata gone. PhotoRec, Foremost and Scalpel are the open-source standards; X-Ways and ReclaiMe are commercial. The weakness of pure signature carving is fragmented files; filesystem-aware carving in X-Ways handles fragmentation where PhotoRec cannot.
How do examiners deal with BitLocker-encrypted Windows drives?
Four legitimate paths exist. First, ask the user (BNSS Section 94 production notice). Second, retrieve the 48-digit recovery key from the user's Microsoft account, Azure AD / Intune / MBAM escrow, or printed copy. Third, extract the BitLocker master key from a RAM image captured while the volume was mounted. Fourth, recover the key from hiberfil.sys if the laptop hibernated while unlocked. Brute force with hashcat -m 22100 is a last resort and only viable against weak user passwords, not against the 48-digit recovery key itself.
Can a VeraCrypt hidden volume be detected forensically?
Not deterministically. VeraCrypt's design fills outer-volume free space with random bytes so a hidden volume blends in by definition. Detection relies on a combination of artifacts: the VeraCrypt application's presence, recent-files entries pointing at a container, dual access-time patterns on the same container, statistical entropy or chi-square analysis of free space, and RAM artifacts from a previously-mounted hidden volume. The prosecution rarely succeeds on hidden-volume detection alone; the case is typically built around surrounding artifacts.
What is a cold boot attack and is it practical in Indian casework?
A cold boot attack exploits the fact that DRAM contents persist for seconds to minutes after power loss, longer when chilled. The attacker chills the memory modules (compressed-air cans, liquid nitrogen) and reboots the device into a small key-extraction OS to read the still-resident memory. The technique is real, demonstrated repeatedly at academic venues, and yields encryption keys for BitLocker, FileVault, LUKS and VeraCrypt. Indian state FSLs have the capability in principle and CFSL Hyderabad has executed it in selected cases; routine application is limited by the requirement that the device be intercepted in a powered state by a team prepared for the procedure.
How is steganography detected if the algorithm and password are unknown?
By statistics, not by decryption. The detection workflow runs chi-square, entropy and visual-attack analyses against the LSB plane of every candidate image, audio and video file. Tools: StegExpose, stegoVeritas, custom chi-square scripts. Anomalous distributions justify tool-specific extraction attempts with StegHide, OutGuess, F5, OpenStego and others, often combined with dictionary attacks via stegseek or stegcracker. The detection step flags candidates; the extraction step recovers the payload; the case-attribution step ties the carrier back to the suspect's activity. All three are needed before a court will weigh a steganographic claim.
