Virtual Machine and Cloud-Backed Endpoint Forensics
VMDK, VHDX, qcow2 and the snapshot chain; OneDrive Files On-Demand and iCloud placeholders; Docker overlay2; and how Indian cyber cells handle VM and cloud-stored evidence under BNSS 105.
The endpoint a forensic examiner images in 2026 is usually not what it pretends to be. A Bengaluru analyst's "desktop" may be a Windows guest running on a VMware Workstation host, with the real bits living in a 200 GB VMDK file plus three differential snapshots. A Mumbai banker's "C drive" may be a thin volume whose contents stream from OneDrive Files On-Demand, where the icon on screen represents a 0-byte placeholder that resolves to a cloud GET only on access. A Pune developer's "production server" may be a Docker container with an overlay2 file system whose layers vanish when the orchestrator schedules a new pod. The examiner who treats every device as a bare-metal Windows install will miss the bulk of the evidence and will not even know it.
The contrarian point most candidates miss is that the line between "local disk evidence" and "cloud evidence" has dissolved on modern endpoints. A file shown in File Explorer with the user's name and a modification date may have never touched the SSD; the local file system holds only a reparse point and a sync database row. A Section 63 BSA certificate for a disk image that shows that file in a listing is not a certificate for the file's contents, because the contents live in a Microsoft data centre. The examiner who imports a disk image into AXIOM, ticks "all files present", and reports the case as solved is publishing the wrong conclusion. The 2026 workflow always pairs disk imaging with placeholder identification and a separate cloud-side acquisition route.
The thing on the disk may not be the thing the user was using.
Virtual machines are now everywhere on Indian field endpoints: malware analysts run guest VMs to detonate samples in isolation, developers run Linux guests on Windows hosts for tooling parity, the average Pune startup engineer keeps two or three saved-state VMs for different projects, and state cyber cell training labs build dedicated sandbox guests for every new IOC. The forensic implication is that the examiner who images the host's physical SSD is not, by default, looking at the user's working environment. The user's working environment is a guest, and the guest is a bundle of files on the host file system: one or more virtual disk files, a configuration file, a memory file (if suspended), and potentially a snapshot chain.
The nested case is harder. A Windows 11 host can run Hyper-V; inside Hyper-V it can run a Linux guest; inside the Linux guest, KVM can run another guest. Each layer adds a virtual disk file in the layer above. The examiner faces a recursive problem: image the host, find the guest's virtual disk, mount it, find the nested guest's virtual disk inside, mount that. Tools handle one or two levels routinely; three or more is rare in Indian field practice but is appearing in sophisticated malware sandbox environments at CFSL Hyderabad. The malware-detonation case is covered in depth at Malware Forensics: Static, Dynamic, Sandbox and Memory Analysis.
The on-disk fingerprint of a VM host is the directory of virtual machine files, usually under Documents\Virtual Machines\<name>\ on VMware Workstation, C:\Users\<user>\VirtualBox VMs\<name>\ for VirtualBox, C:\ProgramData\Microsoft\Windows\Hyper-V\ for Hyper-V, and /var/lib/libvirt/images/ for KVM on Linux. The examiner who lists a Windows disk image and finds a .vmdk or .vhdx file knows immediately that a parallel acquisition path is required.
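A first-pass triage of a mounted host image can flag these artifacts by extension alone. A minimal sketch, assuming Python on the analysis host; the helper name and extension list are illustrative, not from any specific toolkit:

```python
from pathlib import Path

# Extensions that signal a hypervisor workload on the host image:
# virtual disks, hypervisor configs, suspended-state and snapshot memory files.
VM_EXTS = {".vmdk", ".vhd", ".vhdx", ".avhdx", ".vdi", ".qcow2",
           ".vmx", ".vmsn", ".vmss", ".vmem"}

def find_vm_artifacts(mount_root):
    """Walk a (read-only) mounted image and list VM-related files."""
    return sorted(p for p in Path(mount_root).rglob("*")
                  if p.suffix.lower() in VM_EXTS)
```

Any hit means a parallel acquisition path for the guest is required.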
Four virtual disk formats, four parsing toolchains.
The examiner meets four virtual disk formats with any frequency on Indian endpoints. Each comes from a different hypervisor lineage, each has a slightly different on-disk header, and each maps to a different mounting toolchain.
| Format | Hypervisor | Companion files | On-disk variants | Standard mount tooling |
|---|---|---|---|---|
| VMDK | VMware Workstation, ESXi, Fusion | VMX (config), VMSN/VMSS (snapshot), VMEM (memory) | Monolithic flat, monolithic sparse, split 2GB sparse, stream-optimised | FTK Imager, Arsenal Image Mounter, vmware-mount, libguestfs |
| VHD / VHDX | Hyper-V, Virtual PC, Windows containers | AVHD/AVHDX (differencing), XML (Hyper-V 1.0) or VMCX (2.0) config | Fixed, dynamic, differencing | Mount-VHD (PowerShell), FTK Imager, Arsenal Image Mounter, libguestfs |
| VDI | VirtualBox | VBOX (XML config), VBOX-prev (backup config) | Dynamic, fixed | VBoxManage clonemedium to VMDK then mount, FTK Imager |
| qcow2 / raw | QEMU, KVM, OpenStack, Proxmox | .xml libvirt config | qcow2 sparse, raw fixed | qemu-nbd plus losetup, libguestfs (guestfish, virt-cat), FTK Imager (raw) |
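These formats differ in their leading magic bytes, so a seized file can be identified before any mounting tool is chosen. A minimal sniffing sketch; the signatures are taken from the public format specifications, and the helper name is illustrative:

```python
# Leading magic bytes per the public format specifications. A sparse VMDK
# opens with "KDMV"; a descriptor-only VMDK is plain text starting
# "# Disk DescriptorFile"; VHDX starts with the ASCII signature "vhdxfile";
# a dynamic/differencing VHD carries a footer copy ("conectix") at offset 0;
# qcow2 starts with "QFI\xfb". Fixed VHD and raw images have no leading magic.
SIGNATURES = [
    (b"KDMV", "VMDK (sparse)"),
    (b"# Disk DescriptorFile", "VMDK (descriptor)"),
    (b"vhdxfile", "VHDX"),
    (b"conectix", "VHD (dynamic/differencing)"),
    (b"QFI\xfb", "qcow2"),
]

def sniff_vdisk(path):
    """Best-effort identification of a virtual disk file by magic bytes."""
    with open(path, "rb") as f:
        head = f.read(32)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown (fixed VHD and raw images carry no leading magic)"
```

A fixed VHD must instead be confirmed from the "conectix" signature in its final 512-byte footer.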
Every snapshot is a point-in-time; the chain is where the timeline lives.
A snapshot is a hypervisor's way of capturing a VM's state at a point in time without copying the whole disk. After a snapshot, the original VMDK or VHDX becomes read-only (the parent), and a new differencing disk is created (the child). All subsequent guest writes go to the child. Reading from the snapshot's point of view means reading the parent for any block the child has not modified, and the child for blocks it has. Take a second snapshot and a second child appears; the chain is now parent, child1, child2. Reading the "current" state means walking parent then child1 then child2 in order, with each child masking the blocks it has written.
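The copy-on-write read path described above can be modelled in a few lines. This is a toy sketch of the semantics, not any hypervisor's actual on-disk logic:

```python
class DiffDisk:
    """Toy differencing disk: a child records only blocks written after
    the snapshot; unrecorded blocks fall through to the parent."""
    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}                  # block number -> data at this layer

    def write(self, n, data):
        self.blocks[n] = data             # all guest writes land in the child

    def read(self, n):
        layer = self
        while layer is not None:          # nearest layer that wrote n wins
            if n in layer.blocks:
                return layer.blocks[n]
            layer = layer.parent
        return b"\x00"                    # unallocated blocks read as zeros

base = DiffDisk();  base.write(0, b"original")
snap1 = DiffDisk(parent=base);  snap1.write(1, b"after-snap1")
snap2 = DiffDisk(parent=snap1); snap2.write(0, b"overwritten")
# snap2 sees the live state; snap1 and base preserve the earlier states:
# snap2.read(0) == b"overwritten", snap2.read(1) == b"after-snap1",
# snap1.read(0) == b"original"
```

Mounting "snapshot 1" in a real tool is exactly a read through snap1's chain, which is also why a damaged or deleted child file destroys every state taken after it.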
For forensic work, this is gold. Each snapshot is a point-in-time copy. A malware analyst's VM with five snapshots taken at different stages of an investigation gives the examiner five distinct disk states to compare. The same chain appears on attacker-controlled VMs that the attacker used as a workspace; comparing the snapshot just before the malware was introduced with the snapshot just after gives the cleanest possible "before and after" for behaviour analysis.
Read-only mount, then carve as usual; VMEM is the RAM dump.
Mounting a virtual disk read-only is the standard examination move. Once mounted, the guest file system appears as a logical drive on the analysis host, and the entire normal toolchain (Autopsy, FTK, Magnet AXIOM, plaso, EZ Tools, Volatility for memory) operates exactly as on a physical disk image. The variation across formats is which tool does the mounting.
Practical mount recipes:
- VHDX: `Mount-VHD -Path C:\evidence\sample.vhdx -ReadOnly` exposes the disk as a logical drive; dismount with `Dismount-VHD`. Hyper-V Manager can also do this graphically.
- qcow2: `qemu-nbd --read-only --connect=/dev/nbd0 evidence.qcow2`, then `kpartx -av /dev/nbd0` to expose partitions, then `mount -o ro,noload /dev/mapper/nbd0p2 /mnt/evidence`. libguestfs (`guestfish --ro -a evidence.qcow2`) gives a higher-level shell that does not require the host kernel's NBD support.
- VDI: `VBoxManage clonemedium disk evidence.vdi evidence.vmdk --format VMDK` converts to VMDK; mount the result with FTK Imager or Arsenal Image Mounter.

VM memory acquisition is uniquely tractable. On a physical machine, capturing RAM requires running a live tool against a powered system. On a VM, the hypervisor already exposes guest RAM as a file: `.vmem` for VMware, paired with a `.vmss` (suspended state) or `.vmsn` (snapshot state) file, and `.bin` for Hyper-V saved state. A clean way to capture running-VM memory is to take a snapshot of the running VM; VMware writes the guest's RAM contents to a `.vmem` at snapshot time, producing a consistent memory image without disrupting the guest. Volatility 3 then processes the `.vmem` with a Windows or Linux profile to extract running processes, network connections, decrypted keys, and clipboard contents exactly as for a physical-machine RAM dump.
The icon on the desktop is not the file. The disk image alone misses the content.
Modern operating systems ship with cloud file sync as a first-class feature. OneDrive Files On-Demand on Windows, Google Drive for Desktop on Windows and macOS, iCloud Drive Optimised Storage on macOS, and Dropbox Smart Sync all share a common architecture: present files in the local file browser as if they live on the local disk, but actually fetch the content from the cloud on demand. The on-disk artifact is a placeholder, not the file.
| Service | Placeholder mechanism | Sync database | Forensic identification path |
|---|---|---|---|
| OneDrive Files On-Demand (Windows) | NTFS reparse point with tag IO_REPARSE_TAG_CLOUD (0x9000001A) | %LocalAppData%\Microsoft\OneDrive\settings\<id>\*.dat, Cloud State.sqlite | fsutil reparsepoint query <file> shows the tag; placeholders are 0 bytes on disk |
| Google Drive for Desktop | Virtual file system mount (drive letter on Windows, mount point on macOS) backed by FUSE-like driver | %LocalAppData%\Google\DriveFS\<profile>\metadata_sqlite_db, root_preference_sqlite.db | Drive letter or volume mount is virtual; underlying NTFS shows the FS driver, not the files |
| iCloud Drive Optimised Storage (macOS) | APFS clone with extended attribute com.apple.iCloud or com.apple.brokenAlias | ~/Library/Application Support/CloudDocs, server.db (SQLite) | ls -l@ shows the EA; mdls reports cloud status |
| Dropbox Smart Sync / online-only | NTFS reparse point on Windows, extended attribute on macOS | %AppData%\Dropbox\info.json, deleted.dbx, filecache.dbx | Sync DB enumerates which files are online-only vs locally cached |
Docker and Kubernetes are even less stable than VMs.
Container forensics is a separate and significantly harder problem than VM forensics. A container is not a complete VM; it is a process group on the host kernel, isolated by namespaces and cgroups but sharing the host's kernel, kernel memory, and a large portion of the host file system. There is no guest disk file to image; there are file system layers, a thin overlay of writeable changes, and runtime metadata.
Docker on Linux uses the overlay2 storage driver by default. The root of the storage is /var/lib/docker/overlay2/, with one subdirectory per image layer or container layer. Each container has an <id>/ directory containing: lower (a file listing the parent layers, read-only), upper/ (the writeable layer where the container's runtime changes accumulate), merged/ (the unified view the container actually sees while running), and work/ (overlay2 housekeeping). The running container's process logs and metadata live in /var/lib/docker/containers/<id>/, including config.v2.json and <id>-json.log for container stdout and stderr.
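The masking behaviour of `merged/` can be sketched as a simple layered union. This is a simplified model under stated assumptions: whiteout device nodes (overlayfs's deletion markers) are omitted, and the directory layout is illustrative:

```python
from pathlib import Path

def merged_view(upper, lowers):
    """Union of overlay layers. `lowers` is listed topmost-first, as in the
    container's `lower` file; files in upper/ mask same-named files below."""
    view = {}
    for layer in reversed(lowers):       # apply the lowest layer first
        for p in Path(layer).rglob("*"):
            if p.is_file():
                view[str(p.relative_to(layer))] = str(p)
    for p in Path(upper).rglob("*"):     # the writeable layer wins
        if p.is_file():
            view[str(p.relative_to(upper))] = str(p)
    return view
```

Examining `upper/` alone therefore yields exactly the container's runtime changes, which is why freezing it is so valuable.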
For a forensic examiner, the practical implications are sharp. First, stopping the container freezes the upper layer at its current state, which is a usable disk image of the writeable layer. Second, the image layers in /var/lib/docker/overlay2/ are themselves examinable; an attacker who built a malicious image left fingerprints in the layer manifests and in the image's manifest.json and config.json. Third, the container's namespace-isolated process memory is reachable from the host via /proc/<pid>/maps and /proc/<pid>/mem, so Volatility-style RAM forensics works on the host with the container's PID rather than as a guest.
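That host-side reach into container memory can be demonstrated against any Linux process. A minimal sketch (Linux-only) that reads the current process's own address space for safety; for a container you would substitute the container's PID as seen from the host:

```python
import os
import re

def readable_regions(pid):
    """Parse /proc/<pid>/maps into (start, end, perms, path) tuples."""
    regions = []
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            m = re.match(r"([0-9a-f]+)-([0-9a-f]+) (\S+) \S+ \S+ \S+\s*(.*)",
                         line)
            if m and m.group(3).startswith("r"):   # readable mappings only
                regions.append((int(m.group(1), 16), int(m.group(2), 16),
                                m.group(3), m.group(4)))
    return regions

def dump_region(pid, start, length):
    """Read raw bytes out of the process's address space via /proc/<pid>/mem."""
    with open(f"/proc/{pid}/mem", "rb") as mem:
        mem.seek(start)
        return mem.read(length)

regions = readable_regions(os.getpid())
sample = dump_region(os.getpid(), regions[0][0], 64)
```

Against another PID this requires root (or CAP_SYS_PTRACE) on the host, which the examiner of a seized node normally has.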
Kubernetes adds ephemerality on top. A pod can be scheduled, run for ninety seconds, and be killed by the scheduler. The container's file system layer is then garbage-collected on the node within minutes. Any forensic evidence has to be captured from the orchestration layer's logs (Fluentd, Fluent Bit, Loki, or the CRI runtime's journal) or from a persistent volume claim that the pod wrote to. The pod itself is gone.
VMDK is dominant on Indian developer laptops because VMware Workstation Pro is widely licensed at corporate scale. Monolithic VMDK is a single large file containing the entire virtual disk; split VMDK breaks it into 2 GB chunks (an artifact of the FAT32 era that persists for cross-host portability). Sparse variants allocate physical storage only for blocks that contain data, so a 200 GB sparse VMDK with 60 GB of guest data occupies roughly 60 GB on the host file system.

VHDX is the Hyper-V format and the default for new Windows Server VMs since 2012. It supports up to 64 TB, includes block-level CRC32 for corruption detection, and handles 4K-sector physical media natively. VHDX is also used by Windows Server containers in process-isolated mode, which is increasingly seen on Indian fintech infrastructure.

VDI is VirtualBox's native format. Indian students and budget developers use VirtualBox because it is free; the examiner sees VDI on home machines and on academic lab setups. VirtualBox can also use VMDK, VHD and raw, so the format on disk is not always VDI even when VirtualBox is the hypervisor.

qcow2 is the QEMU and KVM format and dominates Indian Linux server infrastructure, including CERT-In's published malware sandbox stack and most state-FSL Linux training labs. The qcow2 file is itself sophisticated: it has internal snapshot support, optional zlib or zstd compression, and optional AES or LUKS encryption applied inside the format. An encrypted qcow2 file is opaque to libguestfs unless the key is supplied.

Practice question: an examiner finds a directory on a seized Windows laptop containing Win10.vmdk, Win10-s001.vmdk through Win10-s003.vmdk, Win10.vmx, Win10.vmsn and Win10.vmem. What does the .vmem file contain, and what should the examiner do with it?

The .vmem contains the guest's RAM, written out by VMware at snapshot or suspend time; the examiner should process it with Volatility 3 exactly as a physical-machine RAM dump. For a malware-analysis VM where the user has already done the work of detonating a sample and capturing snapshots, the examiner has everything: snapshot N-1 (pre-detonation disk), snapshot N (post-detonation disk), the .vmem captured at snapshot N (guest RAM at the moment of detonation), and a textual VMX configuration showing the network setup. This is a more complete forensic picture than any physical machine produces, because the hypervisor is itself an instrumented platform.
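qcow2's internal snapshots and in-format encryption, noted above, are visible in the header without mounting the image. A minimal peek, with field offsets assumed from the published qcow2 v2/v3 header layout (verify against the current spec before relying on it in a report):

```python
import struct

CRYPT = {0: "none", 1: "AES", 2: "LUKS"}

def qcow2_header_info(path):
    """Read virtual size, encryption method and internal-snapshot count
    from the fixed portion of a qcow2 header (all fields big-endian)."""
    with open(path, "rb") as f:
        hdr = f.read(72)
    magic, version = struct.unpack(">4sI", hdr[:8])
    if magic != b"QFI\xfb":
        raise ValueError("not a qcow2 file")
    size, crypt_method = struct.unpack(">QI", hdr[24:36])
    nb_snapshots = struct.unpack(">I", hdr[60:64])[0]
    return {"version": version, "virtual_size": size,
            "crypt_method": CRYPT.get(crypt_method, crypt_method),
            "internal_snapshots": nb_snapshots}
```

A non-zero crypt_method means the image is opaque to libguestfs without the key; a non-zero snapshot count means `qemu-img snapshot -l` is worth running to enumerate the point-in-time states.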
The forensic implication is brutal in its simplicity: a file shown in Explorer or Finder with the user's name and a modification date may have never sat on the local SSD. The on-disk image captured by the examiner contains a reparse point or a clone marker plus an entry in a sync SQLite database. The actual bytes live in a Microsoft, Google, Apple or Dropbox data centre. A disk image alone does not contain the file's contents.
How to identify cloud-stored files on a local endpoint, in field-practical order:
- On Windows, `fsutil reparsepoint query <path>` reports the reparse tag; `IO_REPARSE_TAG_CLOUD` and its variants identify OneDrive placeholders. A PowerShell sweep enumerates every placeholder across a volume.
- On macOS, `xattr -l <path>` lists all extended attributes; `com.apple.iCloud`, `com.apple.brokenAlias`, and the various `com.apple.metadata:kMDItem*` cloud attributes identify iCloud Drive items.
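The Windows sweep can be scripted in a few lines. A sketch assuming Python on the examination host: it pairs the reparse-point attribute with a zero on-disk size to flag likely cloud-only content; on non-Windows platforms `st_file_attributes` is absent and the sweep simply returns nothing:

```python
import stat
from pathlib import Path

# FILE_ATTRIBUTE_REPARSE_POINT is 0x400; Python's stat module exposes it,
# but we fall back to the literal value defensively.
REPARSE = getattr(stat, "FILE_ATTRIBUTE_REPARSE_POINT", 0x400)

def sweep_placeholders(root):
    """Flag files that are reparse points with no local content (Windows)."""
    hits = []
    for p in Path(root).rglob("*"):
        try:
            st = p.lstat()                 # do not trigger a cloud hydration
        except OSError:
            continue
        attrs = getattr(st, "st_file_attributes", 0)
        if attrs & REPARSE and st.st_size == 0:
            hits.append(p)
    return hits
```

Each hit is a file whose contents must be acquired cloud-side, not from the disk image.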