Virtual Machine and Cloud-Backed Endpoint Forensics
VMDK, VHDX, qcow2 and the snapshot chain; OneDrive Files On-Demand and iCloud placeholders; Docker overlay2; and how Indian cyber cells handle VM and cloud-stored evidence under BNSS 105.
Virtual machine and cloud-backed endpoint forensics addresses the gap between what a local disk image shows and where evidence actually resides. On a virtualised endpoint, the user's working environment lives inside a VMDK, VHDX, or qcow2 file on the host, not on a bare partition; on a cloud-synced endpoint, files shown in the file browser may be 0-byte reparse-point placeholders whose content has never touched the local SSD. Forensic acquisition therefore requires parallel paths: image the host disk, mount the virtual disk chain read-only as a separate exhibit, identify cloud placeholders, and obtain cloud content through credential-based or legal-process routes. Each path yields its own hash and its own Section 63 BSA certificate.
The endpoint a forensic examiner images in 2026 is frequently not a bare-metal machine. A Bengaluru analyst's desktop may be a Windows guest running inside VMware Workstation, with the actual disk contents stored in a 200 GB VMDK file plus three differential snapshots on the host. A Mumbai banker's C drive may be a thin volume whose files stream from OneDrive Files On-Demand: the icon in File Explorer represents a 0-byte placeholder, and the content resolves from the cloud only on access. A Pune developer's production server may be a Docker container with an overlay2 file system whose writeable layer is destroyed when the orchestrator reschedules the pod. Treating any of these as a conventional bare-metal Windows install risks missing the bulk of the evidence entirely.
Key takeaways
- A disk image certified under BSA Section 63 may not cover the actual file contents if those contents are OneDrive Files On-Demand placeholders that never touched the local SSD.
- Virtual machines are common on Indian field endpoints: malware analysts, developers, and startups all run guest VMs, meaning the forensic examiner may need to examine a VMDK and its differential snapshots rather than a bare physical disk.
- Docker containers with overlay2 file systems can lose evidence layers permanently when an orchestrator reschedules a pod, so volatile container state must be captured before shutdown.
- The 2026 forensic workflow pairs disk imaging with placeholder identification and a separate cloud-side acquisition route, because local and cloud evidence have merged on modern endpoints.
- An examiner who imports a disk image and treats all listed files as present risks publishing the wrong conclusion if the files are reparse points pointing to data in a Microsoft data centre.
The contrarian point most candidates miss is that the line between "local disk evidence" and "cloud evidence" has dissolved on modern endpoints. A file shown in File Explorer with the user's name and a modification date may have never touched the SSD; the local file system holds only a reparse point and a sync database row. A Section 63 BSA certificate for a disk image that shows that file in a listing is not a certificate for the file's contents, because the contents live in a Microsoft data centre. The examiner who imports a disk image into AXIOM, ticks "all files present", and reports the case as solved is publishing the wrong conclusion. The 2026 workflow always pairs disk imaging with placeholder identification and a separate cloud-side acquisition route.
By the end of this topic you will be able to:
- Identify and mount the four principal virtual disk formats (VMDK, VHDX, VDI, qcow2) read-only using the appropriate toolchain for each hypervisor lineage.
- Reconstruct a VM snapshot chain, compute independent hashes for each snapshot view, and diff consecutive snapshots to establish a behavioural timeline.
- Enumerate cloud-only file placeholders on Windows (NTFS reparse points) and macOS (APFS extended attributes), and distinguish what a disk image can prove from what requires a separate cloud acquisition route.
- Apply the correct legal framework under BNSS Section 105 videography and Section 94 production notices when acquiring cloud-stored evidence from OneDrive, Google Drive, or iCloud.
- Locate and preserve Docker overlay2 container layers and Kubernetes pod logs before orchestrator garbage collection destroys the writeable layer.
- VMDK: VMware Virtual Machine Disk. The on-host file that holds the guest VM's disk contents, either as a monolithic file or split into 2 GB chunks. Sparse, preallocated and stream-optimised variants exist.
- VHDX: Microsoft's second-generation virtual hard disk format, used by Hyper-V and recent Windows containers. Supports up to 64 TB, protects its metadata structures with CRC-32C checksums, and is the default for new Hyper-V VMs since Windows Server 2012.
- qcow2: QEMU Copy-On-Write version 2. The native disk format for QEMU and KVM hypervisors, with built-in snapshot, compression and encryption support. Used widely on Indian Linux servers, OpenStack clouds and CERT-In sandbox infrastructure.
- Snapshot chain: A parent VMDK or VHDX plus one or more child differential disks. Each child captures changes since its parent was snapshotted. Reading a specific snapshot requires walking the chain back to the parent.
- Files On-Demand: Microsoft OneDrive feature that represents cloud-only files as 0-byte placeholders on the local NTFS volume, marked with the reparse tag IO_REPARSE_TAG_CLOUD. The content downloads transparently when the user opens the file.
- Reparse point: An NTFS attribute that redirects file access to another handler. Cloud providers use reparse tags to insert their sync drivers between the user's read and the actual content fetch.
Why VMs change the imaging question
Virtual machines are now everywhere on Indian field endpoints: malware analysts run guest VMs to detonate samples in isolation, developers run Linux guests on Windows hosts for tooling parity, the average Pune startup engineer keeps two or three saved-state VMs for different projects, and state cyber cell training labs build dedicated sandbox guests for every new IOC. The forensic implication is that the examiner who images the host's physical SSD is not, by default, looking at the user's working environment. The user's working environment is a guest, and the guest is a bundle of files on the host file system: one or more virtual disk files, a configuration file, a memory file (if suspended), and potentially a snapshot chain.
The nested case is harder. A Windows 11 host can run Hyper-V; inside Hyper-V it can run a Linux guest; inside the Linux guest, KVM can run another guest. Each layer adds a virtual disk file in the layer above. The examiner faces a recursive problem: image the host, find the guest's virtual disk, mount it, find the nested guest's virtual disk inside, mount that. Tools handle one or two levels routinely; three or more is rare in Indian field practice but is appearing in sophisticated malware sandbox environments at CFSL Hyderabad. The malware-detonation case is covered in depth at Malware Forensics: Static, Dynamic, Sandbox and Memory Analysis.
The on-disk fingerprint of a VM host is the directory of virtual machine files, usually under Documents\Virtual Machines\<name>\ on VMware Workstation, C:\Users\<user>\VirtualBox VMs\<name>\ for VirtualBox, C:\ProgramData\Microsoft\Windows\Hyper-V\ for Hyper-V, and /var/lib/libvirt/images/ for KVM on Linux. The examiner who lists a Windows disk image and finds a .vmdk or .vhdx file knows immediately that a parallel acquisition path is required.
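As a minimal sketch of that first triage pass, the following Python walks a read-only mounted image and lists files whose extensions suggest a hypervisor. The extension set and the function name `find_vm_artifacts` are illustrative assumptions, not a complete catalogue; extension matching is only a hint and should be confirmed against file headers.

```python
import os

# Extensions named in this topic; an illustrative set, not an exhaustive one.
VM_EXTENSIONS = {
    ".vmdk", ".vmx", ".vmsn", ".vmss", ".vmem", ".vmsd",  # VMware
    ".vhd", ".vhdx", ".avhd", ".avhdx", ".vmcx",           # Hyper-V
    ".vdi", ".vbox",                                        # VirtualBox
    ".qcow2",                                               # QEMU / KVM
}


def find_vm_artifacts(root):
    """Yield (path, size_bytes) for files under root with a VM-related extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext in VM_EXTENSIONS:
                full = os.path.join(dirpath, name)
                # lstat avoids following symlinks out of the mounted evidence.
                yield full, os.lstat(full).st_size
```

Run against the mount point of the host image (never the live suspect system); any hit signals that a separate virtual-disk exhibit is needed.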
VM file formats: VMDK, VHDX, VDI, qcow2
The examiner meets four virtual disk formats with any frequency on Indian endpoints. Each comes from a different hypervisor lineage, each has a slightly different on-disk header, and each maps to a different mounting toolchain.
| Format | Hypervisor | Companion files | On-disk variants | Standard mount tooling |
|---|---|---|---|---|
| VMDK | VMware Workstation, ESXi, Fusion | VMX (config), VMSN/VMSS (snapshot), VMEM (memory) | Monolithic flat, monolithic sparse, split 2GB sparse, stream-optimised | FTK Imager, Arsenal Image Mounter, vmware-mount, libguestfs |
| VHD / VHDX | Hyper-V, Virtual PC, Windows containers | AVHD/AVHDX (differencing), XML (Hyper-V 1.0) or VMCX (2.0) config | Fixed, dynamic, differencing | Mount-VHD (PowerShell), FTK Imager, Arsenal Image Mounter, libguestfs |
| VDI | VirtualBox | VBOX (XML config), VBOX-prev (backup config) | Dynamic, fixed | VBoxManage clonemedium to VMDK then mount, FTK Imager |
| qcow2 / raw | QEMU, KVM, OpenStack, Proxmox | .xml libvirt config | qcow2 sparse, raw fixed | qemu-nbd plus losetup, libguestfs (guestfish, virt-cat), FTK Imager (raw) |
VMDK is dominant on Indian developer laptops because VMware Workstation Pro is widely licensed at corporate scale. Monolithic VMDK is a single large file containing the entire virtual disk; split VMDK breaks it into 2 GB chunks (an artifact of the FAT32 era that persists for cross-host portability). Sparse variants only allocate physical storage for blocks that contain data, so a 200 GB sparse VMDK with 60 GB of guest data occupies roughly 60 GB on the host file system.
VHDX is the Hyper-V format and the default for new Windows Server VMs since 2012. It supports up to 64 TB, protects its header, region table and log with CRC-32C checksums for corruption detection, and handles 4K-sector physical media natively. VHDX is also used by Windows Server containers in process-isolated mode, which is increasingly seen on Indian fintech infrastructure.
VDI is VirtualBox's native format. Indian students and budget developers use VirtualBox because it is free; the examiner sees VDI on home machines and on academic lab setups. VirtualBox can also use VMDK, VHD and raw, so the format on disk is not always VDI even when VirtualBox is the hypervisor.
qcow2 is the QEMU and KVM format and dominates Indian Linux server infrastructure, including CERT-In's published malware sandbox stack and most state-FSL Linux training labs. The qcow2 file is itself sophisticated: it has internal snapshot support, optional zlib or zstd compression, and optional AES or LUKS encryption applied inside the format. An encrypted qcow2 file is opaque to libguestfs unless the key is supplied.
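Because a file extension can be renamed, it helps to confirm the format from the header bytes. The sketch below checks a few widely documented signatures (`QFI\xfb` for qcow, `vhdxfile` for VHDX, `KDMV` for VMDK sparse extents, the `conectix` cookie for VHD, and the VDI signature at offset 64). The function name `identify_virtual_disk` and the returned labels are assumptions made for illustration; a raw image has no signature and falls through to "unknown".

```python
import os
import struct


def identify_virtual_disk(path):
    """Return a best-effort format label based on well-known header signatures."""
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        head = fh.read(512)
        tail = b""
        if size >= 512:
            # Fixed VHD files keep their footer in the final 512 bytes.
            fh.seek(-512, os.SEEK_END)
            tail = fh.read(512)

    if head.startswith(b"QFI\xfb") and len(head) >= 8:
        version = struct.unpack(">I", head[4:8])[0]  # big-endian version field
        return "qcow2" if version >= 2 else "qcow (version 1)"
    if head.startswith(b"vhdxfile"):
        return "VHDX"
    if head.startswith(b"KDMV"):
        return "VMDK (hosted sparse extent)"
    if head.lstrip().startswith(b"# Disk DescriptorFile"):
        return "VMDK (text descriptor)"
    if head[64:68] == b"\x7f\x10\xda\xbe":  # 0xBEDA107F stored little-endian
        return "VDI"
    if head.startswith(b"conectix") or tail.startswith(b"conectix"):
        return "VHD"
    return "unknown (possibly raw)"
```

The result is a triage hint only; the mounting tool's own validation remains the authoritative check.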
Snapshot forensics and the differencing-disk chain
A snapshot is a hypervisor's way of capturing a VM's state at a point in time without copying the whole disk. After a snapshot, the original VMDK or VHDX becomes read-only (the parent), and a new differencing disk is created (the child). All subsequent guest writes go to the child. Reading from the snapshot's point of view means reading the parent for any block the child has not modified, and the child for blocks it has. Take a second snapshot and a second child appears; the chain is now parent, child1, child2. Reading the "current" state means walking parent then child1 then child2 in order, with each child masking the blocks it has written.
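The masking rule can be illustrated with a toy model in which each disk in the chain is a dictionary from block number to data. This is a conceptual sketch only: real differencing disks use grain tables or block allocation tables, and the names `read_block` and `disk_view` are hypothetical. Searching from the newest child back to the parent gives the same result as applying parent, child1, child2 in order.

```python
BLOCK_SIZE = 4  # toy block size in bytes, chosen only for readability


def read_block(chain, block_no):
    """Resolve one block as seen from the last disk in chain (parent first)."""
    for layer in reversed(chain):
        if block_no in layer:
            return layer[block_no]  # the newest layer that wrote the block wins
    return b"\x00" * BLOCK_SIZE  # never written anywhere: reads as zeros


def disk_view(chain, total_blocks):
    """Reconstruct the whole disk as seen at the end of the given chain."""
    return b"".join(read_block(chain, n) for n in range(total_blocks))


parent = {0: b"AAAA", 1: b"BBBB"}
child1 = {1: b"bbbb"}  # guest overwrote block 1 after the first snapshot
child2 = {2: b"cccc"}  # guest wrote block 2 after the second snapshot

view_at_snapshot_1 = disk_view([parent], 3)
view_at_snapshot_2 = disk_view([parent, child1], 3)
current_view = disk_view([parent, child1, child2], 3)
```

Truncating the chain before a given child is what "mounting an earlier snapshot" amounts to: the later children are simply left out of the lookup.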
For forensic work, the snapshot chain is a timeline built into the VM itself. A malware analyst's VM with five snapshots taken at different investigation stages yields five distinct disk states. The same structure appears on attacker-controlled VMs used as workspaces; comparing the snapshot taken just before malware was introduced with the one taken just after provides a precise before-and-after record of file system changes.
- Identify the chain. On VMware, the .vmsd file in the VM directory describes the snapshot tree. On Hyper-V, the .avhdx differencing disks have parent IDs in their headers. On VirtualBox, the .vbox XML describes the snapshot graph. Map the parent-child relationships before mounting.
- Copy the entire chain offline. Copy parent and every child to the analysis workstation. Do not copy only the child; without the parent, the differencing disk holds only deltas and is not a complete disk image.
- Mount each snapshot as a distinct disk view. FTK Imager and Arsenal Image Mounter accept the chain by selecting the leaf (most recent child); the tool walks the chain backward to reconstruct that snapshot's view of the disk. To examine an earlier snapshot, mount that earlier child as the leaf and let the tool walk back from there.
- Hash each mount independently. Each snapshot's reconstructed disk view has its own SHA-256. Record all hashes in the case file. A snapshot chain produces N hashes for N snapshots.
- Diff between snapshots for behaviour analysis. Mount snapshot N and snapshot N+1, run a recursive comparison (PowerShell Compare-Object, rsync --dry-run, or a forensic diff tool) to enumerate every file created, modified or deleted between the two points.
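The hashing and diffing steps above can be sketched in Python as follows, assuming both snapshot views are already mounted read-only at two directories. The function names are hypothetical, and the comparison is by file content only; a forensic diff tool would also compare timestamps, ownership and alternate data streams. `sha256_file` can equally be pointed at an exported raw image of a snapshot view to produce the per-snapshot hash.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 so large images do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_hashes(root):
    """Map each regular file's path (relative to root) to its SHA-256."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # do not follow links out of the mounted view
            result[os.path.relpath(full, root)] = sha256_file(full)
    return result


def diff_snapshots(before_root, after_root):
    """Return files created, deleted and modified between two mounted views."""
    before, after = tree_hashes(before_root), tree_hashes(after_root)
    common = set(before) & set(after)
    return {
        "created": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "modified": sorted(p for p in common if before[p] != after[p]),
    }
```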

Mounting VM disks and capturing VM memory

Mounting a virtual disk read-only is the standard examination move. Once mounted, the guest file system appears as a logical drive on the analysis host, and the entire normal toolchain (Autopsy, FTK, Magnet AXIOM, plaso, EZ Tools, Volatility for memory) operates exactly as on a physical disk image. The variation across formats is which tool does the mounting.
Practical mount recipes:
- VMDK on Windows analysis host. FTK Imager → Add Evidence Item → Image File → select the .vmdk. Arsenal Image Mounter is the second-choice tool and supports more recent VMDK variants. Both mount read-only by default. For split VMDK, point the tool at the descriptor .vmdk; the tool follows to the extents.
- VHDX on Windows. PowerShell `Mount-VHD -Path C:\evidence\sample.vhdx -ReadOnly` exposes the disk as a logical drive. Dismount with `Dismount-VHD`. Hyper-V Manager can also do this graphically.
- qcow2 on Linux. `qemu-nbd --read-only --connect=/dev/nbd0 evidence.qcow2`, then `kpartx -av /dev/nbd0` to expose partitions, then `mount -o ro,noload /dev/mapper/nbd0p2 /mnt/evidence`. libguestfs (`guestfish --ro -a evidence.qcow2`) gives a higher-level shell that does not require the host kernel's NBD support.
- VDI on any host. `VBoxManage clonemedium disk evidence.vdi evidence.vmdk --format VMDK` converts to VMDK; mount that with FTK Imager or Arsenal Image Mounter.
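Before mounting a split VMDK it is worth checking that every extent named by the descriptor was actually copied. The sketch below parses a text descriptor and lists its extents and key fields such as `parentFileNameHint`. It assumes a standalone text descriptor (monolithic sparse files embed the descriptor inside the binary instead), ignores ZERO extents that carry no file name, and the function name `parse_vmdk_descriptor` is hypothetical.

```python
import re

# Extent lines look like:  RW 4192256 SPARSE "Win10-s001.vmdk"
EXTENT_RE = re.compile(r'^(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\w+)\s+"([^"]+)"')
# Field lines look like:   parentCID=ffffffff   or   createType="twoGbMaxExtentSparse"
KEYVAL_RE = re.compile(r'^([\w.]+)\s*=\s*"?([^"]*)"?\s*$')

SECTOR_SIZE = 512  # VMDK extent sizes are expressed in 512-byte sectors


def parse_vmdk_descriptor(text):
    """Return {'fields': {...}, 'extents': [...]} from a text VMDK descriptor."""
    info = {"fields": {}, "extents": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = EXTENT_RE.match(line)
        if match:
            access, sectors, kind, filename = match.groups()
            info["extents"].append({
                "access": access,
                "size_bytes": int(sectors) * SECTOR_SIZE,
                "type": kind,
                "file": filename,
            })
            continue
        match = KEYVAL_RE.match(line)
        if match:
            info["fields"][match.group(1)] = match.group(2).strip()
    return info
```

Comparing the `file` values against the directory listing shows at once whether an extent is missing, and a `parentFileNameHint` field indicates that the disk is a child in a snapshot chain.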
VM memory acquisition is uniquely tractable. On a physical machine, capturing RAM requires running a live tool against a powered system. On a VM, the hypervisor already exposes guest RAM as a file: .vmem for VMware guest memory, with .vmss (suspended state) and .vmsn (snapshot state) as its companions, and .bin for Hyper-V saved state (.vmrs on newer Hyper-V versions). A clean way to capture running-VM memory is to take a snapshot of the running VM; VMware writes the guest's RAM contents to a .vmem at snapshot time, producing a consistent memory image without disrupting the guest. Volatility 3 then processes the .vmem using the matching Windows or Linux symbol tables (Volatility 3 has no profiles) to extract running processes, network connections, decrypted keys, and clipboard contents exactly as for a physical-machine RAM dump. Keep the companion .vmss or .vmsn in the same directory as the .vmem, because Volatility 3 reads VMware metadata from it.
Cloud-backed endpoints: the file-locator problem
Modern operating systems ship with cloud file sync as a first-class feature. OneDrive Files On-Demand on Windows, Drive for Desktop on Windows and macOS, iCloud Drive Optimised Storage on macOS, and Dropbox Smart Sync all share a common architecture: present files in the local file browser as if they live on the local disk, but actually fetch the content from the cloud on demand. The on-disk artifact is a placeholder, not the file.
| Service | Placeholder mechanism | Sync database | Forensic identification path |
|---|---|---|---|
| OneDrive Files On-Demand (Windows) | NTFS reparse point with tag IO_REPARSE_TAG_CLOUD (0x9000001A) | %LocalAppData%\Microsoft\OneDrive\settings\<id>\*.dat, Cloud State.sqlite | fsutil reparsepoint query <file> shows the tag; placeholders are 0 bytes on disk |
| Google Drive for Desktop | Virtual file system mount (drive letter on Windows, mount point on macOS) backed by FUSE-like driver | %LocalAppData%\Google\DriveFS\<profile>\metadata_sqlite_db, root_preference_sqlite.db | Drive letter or volume mount is virtual; underlying NTFS shows the FS driver, not the files |
| iCloud Drive Optimised Storage (macOS) | APFS clone with extended attribute com.apple.iCloud or com.apple.brokenAlias | ~/Library/Application Support/CloudDocs, server.db (SQLite) | ls -l@ shows the EA; mdls reports cloud status |
| Dropbox Smart Sync / online-only | NTFS reparse point on Windows, extended attribute on macOS | %AppData%\Dropbox\info.json, deleted.dbx, filecache.dbx | Sync DB enumerates which files are online-only vs locally cached |
The forensic implication is direct: a file shown in Explorer or Finder with the user's name and a modification date may have never resided on the local SSD. The on-disk image captured by the examiner contains a reparse point or a clone marker plus an entry in a sync SQLite database. The actual bytes live in a Microsoft, Google, Apple or Dropbox data centre. A disk image alone does not contain the file's contents.
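A first look at any of these sync databases can be taken without knowing its schema. The sketch below opens a working copy of an SQLite file strictly read-only and reports each table with its row count. It makes no assumption about table or column names, which vary by provider and client version; the function name `list_tables` is hypothetical, and the `immutable=1` URI parameter is used so that SQLite does not attempt to create journal or WAL side files next to the evidence copy.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: row_count} for an SQLite file opened read-only."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro&immutable=1"
    counts = {}
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (name,) in names:
            quoted = '"' + name.replace('"', '""') + '"'  # quote as an identifier
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

Always run this against a hashed working copy, and note that `immutable=1` also means uncommitted WAL content is not read; a dedicated parser is still needed to interpret the rows.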
How to identify cloud-stored files on a local endpoint, in field-practical order:
- Visual cue first. Windows 10 and 11 File Explorer shows a cloud icon on placeholders. Finder on macOS shows a downward-arrow cloud icon on iCloud Drive Optimised Storage files. On a live system, photographs or screenshots of the user's desktop taken at seizure record this.
- Reparse tag enumeration on Windows. `fsutil reparsepoint query <path>` reports the reparse tag. IO_REPARSE_TAG_CLOUD and its variants identify OneDrive placeholders. A PowerShell sweep enumerates every placeholder across a volume.
- Extended attributes on macOS. `xattr -l <path>` lists all extended attributes. `com.apple.iCloud`, `com.apple.brokenAlias`, and the various `com.apple.metadata:kMDItem*` cloud attributes identify iCloud Drive items.
- Sync database parsing. OneDriveExplorer parses Cloud State.sqlite and the OneDrive .dat files to list every cloud-only file in the user's tenant. GoogleDriveExtractor handles Drive for Desktop. These tools are increasingly used by Indian state cyber cells in fraud cases involving exfiltration to personal cloud accounts.
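The reparse-tag check can be sketched in Python as a pair of pure functions plus a Windows-only sweep. The tag and attribute values are the documented Windows constants; the mapping of attribute combinations to "cloud-only" versus "cloud-managed" is an assumption that should be validated against known test files, and the names `is_cloud_tag`, `classify` and `sweep` are hypothetical. Whether the tag is visible to a process at all can depend on how the volume is mounted, so an empty result is not proof of absence.

```python
import os

IO_REPARSE_TAG_CLOUD = 0x9000001A
CLOUD_TAG_MASK = 0xFFFF0FFF  # ignores the variant nibble (CLOUD_1 .. CLOUD_F)

FILE_ATTRIBUTE_REPARSE_POINT = 0x00000400
FILE_ATTRIBUTE_OFFLINE = 0x00001000
FILE_ATTRIBUTE_RECALL_ON_OPEN = 0x00040000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000
RECALL_FLAGS = (FILE_ATTRIBUTE_OFFLINE | FILE_ATTRIBUTE_RECALL_ON_OPEN
                | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS)


def is_cloud_tag(tag):
    """True for IO_REPARSE_TAG_CLOUD and its numbered variants."""
    return (tag & CLOUD_TAG_MASK) == IO_REPARSE_TAG_CLOUD


def classify(attrs, tag):
    """Classify one file from its attribute bits and reparse tag."""
    if not (attrs & FILE_ATTRIBUTE_REPARSE_POINT) or not is_cloud_tag(tag):
        return "local"
    if attrs & RECALL_FLAGS:
        return "cloud-only"      # assumed: content must be recalled from the cloud
    return "cloud-managed"       # assumed: sync-managed, content may be local


def sweep(root):
    """Yield (path, state) for non-local files; meaningful on Windows only."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            # These stat fields exist only on Windows; default to 0 elsewhere.
            attrs = getattr(st, "st_file_attributes", 0)
            tag = getattr(st, "st_reparse_tag", 0)
            state = classify(attrs, tag)
            if state != "local":
                yield full, state
```

Every path reported as cloud-only belongs on the list for the separate cloud-side acquisition route rather than in the disk-image findings.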
Container forensics primer: overlay2, Kubernetes ephemerality
Container forensics is a distinct and more demanding problem than VM forensics. A container is not a complete VM; it is a process group on the host kernel, isolated by namespaces and cgroups but sharing the host's kernel, kernel memory, and a large portion of the host file system. There is no guest disk file to image; there are file system layers, a thin overlay of writeable changes, and runtime metadata.
Docker on Linux uses the overlay2 storage driver by default. The root of the storage is /var/lib/docker/overlay2/, with one subdirectory per image layer or container layer. Each container has an <id>/ directory containing: lower (a file listing the parent layers, read-only), upper/ (the writeable layer where the container's runtime changes accumulate), merged/ (the unified view the container actually sees while running), and work/ (overlay2 housekeeping). The running container's process logs and metadata live in /var/lib/docker/containers/<id>/, including config.v2.json and <id>-json.log for container stdout and stderr.
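With Docker's default json-file logging driver, each line of `<id>-json.log` is a JSON object carrying `log`, `stream` and `time` keys. A minimal sketch for turning a preserved copy of that file into a timeline follows; the function name `parse_docker_json_log` is hypothetical, timestamps are kept as the original strings rather than reinterpreted, and malformed lines are counted instead of silently dropped so the examiner can report them.

```python
import json


def parse_docker_json_log(lines):
    """Return ([(time, stream, message), ...], skipped_line_count)."""
    records = []
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # truncated or corrupted line, e.g. cut off mid-write
            continue
        if not isinstance(entry, dict):
            skipped += 1
            continue
        message = str(entry.get("log", "")).rstrip("\n")
        records.append((entry.get("time", ""), entry.get("stream", ""), message))
    return records, skipped
```

Pass it an open file handle on the working copy, for example `parse_docker_json_log(open(path, encoding="utf-8", errors="replace"))`.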
For a forensic examiner, the practical implications are sharp. First, stopping the container freezes the upper layer at its current state, which is a usable disk image of the writeable layer. Second, the image layers in /var/lib/docker/overlay2/ are themselves examinable; an attacker who built a malicious image left fingerprints in the layer manifests and in the image's manifest.json and config.json. Third, the container's namespace-isolated process memory is reachable from the host via /proc/<pid>/maps and /proc/<pid>/mem, so Volatility-style RAM forensics works on the host with the container's PID rather than as a guest.
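Once the container is stopped and its layer directory copied, the upper layer can be summarised directly. In overlayfs, a file deleted inside the container appears in the upper directory as a whiteout, a character device with device number 0/0, so the sketch below separates deletions from additions and modifications. The function names are hypothetical, opaque-directory markers (the `trusted.overlay.opaque` extended attribute) are not handled, and the `lower` file is assumed to hold colon-separated layer references.

```python
import os
import stat


def parse_lower(text):
    """Split the contents of a layer's 'lower' file into parent layer references."""
    return [part for part in text.strip().split(":") if part]


def classify_upper(upper_dir):
    """Return sorted (relative_path, kind) pairs describing the writeable layer."""
    changes = []
    for dirpath, dirnames, filenames in os.walk(upper_dir):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, upper_dir)
            if stat.S_ISCHR(st.st_mode) and st.st_rdev == 0:
                kind = "deleted (whiteout)"  # file removed relative to lower layers
            elif stat.S_ISDIR(st.st_mode):
                kind = "directory"
            else:
                kind = "added-or-modified"
            changes.append((rel, kind))
    return sorted(changes)
```

Distinguishing an added file from a modified one requires checking whether the same relative path exists in any of the lower layers returned by `parse_lower`.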
Kubernetes adds ephemerality on top. A pod can be scheduled, run for ninety seconds, and be killed by the scheduler. The container's file system layer is then garbage-collected on the node within minutes. Any forensic evidence has to be captured from the orchestration layer's logs (Fluentd, Fluent Bit, Loki, or the CRI runtime's journal) or from a persistent volume claim that the pod wrote to. The pod itself is gone.
Tools and methods seen in Indian incident-response practice:
- vmkfstools for direct VMFS-level operations on VMware ESXi datastores when imaging running production VMs.
- VMware vSphere Forensics for hypervisor-side acquisition of running VM memory and disk via the management API.
- Magnet AXIOM Cloud for OneDrive, Google Drive and iCloud cloud-side acquisition once the user credential or vendor API access is in hand.
- OneDriveExplorer for parsing OneDrive sync databases offline from a disk image.
- GoogleDriveExtractor for the Drive for Desktop sync metadata.
- docker inspect, docker diff and docker save for read-only inspection of stopped containers and their images in incident response.
An examiner finds a directory on a seized Windows laptop containing Win10.vmdk, Win10-s001.vmdk through Win10-s003.vmdk, Win10.vmx, Win10.vmsn and Win10.vmem. What does the .vmem file contain, and what should the examiner do with it?
Frequently asked questions
If a suspect's workstation is a VMware guest, do I need to image both the host and the guest?
Can I parse OneDrive cloud-only files from a disk image without the user's credential?
How does a snapshot differ from a backup in VM forensics?
What is the safest way to capture memory from a running VMware VM without disrupting it?
Do iCloud Drive Optimised Storage placeholders look the same as OneDrive placeholders on disk?
Where does Docker store the writeable layer of a running container, and is it imageable?
Does BNSS Section 105 videography apply to cloud-stored evidence I download from OneDrive during seizure?