Virtual Machine and Cloud-Backed Endpoint Forensics

Your journey to becoming a forensic professional starts here.

Practice with national-level exam (FACT, FACT Plus, NET, CUET, etc.) mocks, learn from structured notes, and get your doubts solved in one place.

Start Free Mock Test Create Your Account

Virtual Machine and Cloud-Backed Endpoint Forensics | ForensicSpot

The endpoint a forensic examiner images in 2026 is usually not what it pretends to be. A Bengaluru analyst's "desktop" may be a Windows guest running on a VMware Workstation host, with the real bits living in a 200 GB VMDK file plus three differential snapshots. A Mumbai banker's "C drive" may be a thin volume whose contents stream from OneDrive Files On-Demand, where the icon on screen represents a 0-byte placeholder that resolves to a cloud GET only on access. A Pune developer's "production server" may be a Docker container with an overlay2 file system whose layers vanish when the orchestrator schedules a new pod. The examiner who treats every device as a bare-metal Windows install will miss the bulk of the evidence and will not even know it.

The contrarian point most candidates miss is that the line between "local disk evidence" and "cloud evidence" has dissolved on modern endpoints. A file shown in File Explorer with the user's name and a modification date may have never touched the SSD; the local file system holds only a reparse point and a sync database row. A Section 63 BSA certificate for a disk image that shows that file in a listing is not a certificate for the file's contents, because the contents live in a Microsoft data centre. The examiner who imports a disk image into AXIOM, ticks "all files present", and reports the case as solved is publishing the wrong conclusion. The 2026 workflow always pairs disk imaging with placeholder identification and a separate cloud-side acquisition route.

Key terms

VMDK: VMware Virtual Machine Disk. The on-host file that holds the guest VM's disk contents, either as a monolithic file or split into 2 GB chunks. Sparse, preallocated and stream-optimised variants exist.
VHDX: Microsoft's second-generation virtual hard disk format, used by Hyper-V and recent Windows containers. Supports up to 64 TB, has block-level CRC, and is the default for new Hyper-V VMs since Windows Server 2012.
qcow2: QEMU Copy-On-Write version 2. The native disk format for QEMU and KVM hypervisors, with built-in snapshot, compression and encryption support. Used widely on Indian Linux servers, OpenStack clouds and CERT-In sandbox infrastructure.
Snapshot chain: A parent VMDK or VHDX plus one or more child differential disks. Each child captures changes since its parent was snapshotted. Reading a specific snapshot requires walking the chain back to the parent.
Files On-Demand: Microsoft OneDrive feature that represents cloud-only files as 0-byte placeholders on the local NTFS volume, marked with a reparse tag IO_REPARSE_TAG_CLOUD. The content downloads transparently when the user opens the file.
Reparse point: An NTFS attribute that redirects file access to another handler. Cloud providers use reparse tags to insert their sync drivers between the user's read and the actual content fetch.

Why VMs change the imaging question

The thing on the disk may not be the thing the user was using.

Virtual machines are now everywhere on Indian field endpoints: malware analysts run guest VMs to detonate samples in isolation, developers run Linux guests on Windows hosts for tooling parity, the average Pune startup engineer keeps two or three saved-state VMs for different projects, and state cyber cell training labs build dedicated sandbox guests for every new IOC. The forensic implication is that the examiner who images the host's physical SSD is not, by default, looking at the user's working environment. The user's working environment is a guest, and the guest is a bundle of files on the host file system: one or more virtual disk files, a configuration file, a memory file (if suspended), and potentially a snapshot chain.

The nested case is harder. A Windows 11 host can run Hyper-V; inside Hyper-V it can run a Linux guest; inside the Linux guest, KVM can run another guest. Each layer adds a virtual disk file in the layer above. The examiner faces a recursive problem: image the host, find the guest's virtual disk, mount it, find the nested guest's virtual disk inside, mount that. Tools handle one or two levels routinely; three or more is rare in Indian field practice but is appearing in sophisticated malware sandbox environments at CFSL Hyderabad. The malware-detonation case is covered in depth at Malware Forensics: Static, Dynamic, Sandbox and Memory Analysis.

The on-disk fingerprint of a VM host is the directory of virtual machine files, usually under Documents\Virtual Machines\<name>\ on VMware Workstation, C:\Users\<user>\VirtualBox VMs\<name>\ for VirtualBox, C:\ProgramData\Microsoft\Windows\Hyper-V\ for Hyper-V, and /var/lib/libvirt/images/ for KVM on Linux. The examiner who lists a Windows disk image and finds a .vmdk or .vhdx file knows immediately that a parallel acquisition path is required.

VM file formats: VMDK, VHDX, VDI, qcow2

Four virtual disk formats, four parsing toolchains.

The examiner meets four virtual disk formats with any frequency on Indian endpoints. Each comes from a different hypervisor lineage, each has a slightly different on-disk header, and each maps to a different mounting toolchain.

Format	Hypervisor	Companion files	On-disk variants	Standard mount tooling
VMDK	VMware Workstation, ESXi, Fusion	VMX (config), VMSN/VMSS (snapshot), VMEM (memory)	Monolithic flat, monolithic sparse, split 2GB sparse, stream-optimised	FTK Imager, Arsenal Image Mounter, vmware-mount, libguestfs
VHD / VHDX	Hyper-V, Virtual PC, Windows containers	AVHD/AVHDX (differencing), XML (Hyper-V 1.0) or VMCX (2.0) config	Fixed, dynamic, differencing	MountVHD (PowerShell), FTK Imager, Arsenal Image Mounter, libguestfs
VDI	VirtualBox	VBOX (XML config), VBOX-prev (backup config)	Dynamic, fixed	VBoxManage clonemedium to VMDK then mount, FTK Imager

Snapshot forensics and the differencing-disk chain

Every snapshot is a point-in-time; the chain is where the timeline lives.

A snapshot is a hypervisor's way of capturing a VM's state at a point in time without copying the whole disk. After a snapshot, the original VMDK or VHDX becomes read-only (the parent), and a new differencing disk is created (the child). All subsequent guest writes go to the child. Reading from the snapshot's point of view means reading the parent for any block the child has not modified, and the child for blocks it has. Take a second snapshot and a second child appears; the chain is now parent, child1, child2. Reading the "current" state means walking parent then child1 then child2 in order, with each child masking the blocks it has written.

For forensic work, this is gold. Each snapshot is a point-in-time copy. A malware analyst's VM with five snapshots taken at different stages of an investigation gives the examiner five distinct disk states to compare. The same chain appears on attacker-controlled VMs that the attacker used as a workspace; comparing the snapshot just before the malware was introduced with the snapshot just after gives the cleanest possible "before and after" for behaviour analysis.

Identify the chain
On VMware, the .vmsd file in the VM directory describes the snapshot tree. On Hyper-V, the .avhdx differencing disks have parent IDs in their headers. On VirtualBox, the .vbox XML describes the snapshot graph. Map the parent-child relationships before mounting.
Copy the entire chain offline
Copy parent and every child to the analysis workstation. Do not copy only the child; without the parent, the differencing disk holds only deltas and is not a complete disk image.

Mounting VM disks and capturing VM memory

Read-only mount, then carve as usual; VMEM is the RAM dump.

Mounting a virtual disk read-only is the standard examination move. Once mounted, the guest file system appears as a logical drive on the analysis host, and the entire normal toolchain (Autopsy, FTK, Magnet AXIOM, plaso, EZ Tools, Volatility for memory) operates exactly as on a physical disk image. The variation across formats is which tool does the mounting.

Practical mount recipes:

VMDK on Windows analysis host. FTK Imager → Add Evidence Item → Image File → select the .vmdk. Arsenal Image Mounter is the second-choice tool and supports more recent VMDK variants. Both mount read-only by default. For split VMDK, point the tool at the descriptor .vmdk; the tool follows to the extents.
VHDX on Windows. PowerShell Mount-VHD -Path C:\evidence\sample.vhdx -ReadOnly exposes the disk as a logical drive. Dismount with Dismount-VHD. Hyper-V Manager can also do this graphically.
qcow2 on Linux. qemu-nbd --read-only --connect=/dev/nbd0 evidence.qcow2, then kpartx -av /dev/nbd0 to expose partitions, then mount -o ro,noload /dev/mapper/nbd0p2 /mnt/evidence. libguestfs (guestfish --ro -a evidence.qcow2) gives a higher-level shell that does not require the host kernel's NBD support.
VDI on any host. VBoxManage clonemedium disk evidence.vdi evidence.vmdk --format VMDK converts to VMDK; mount that with FTK Imager or Arsenal Image Mounter.

VM memory acquisition is uniquely tractable. On a physical machine, capturing RAM requires running a live tool against a powered system. On a VM, the hypervisor already exposes guest RAM as a file: .vmem for VMware, .vmss and .vmsn for VMware suspended state, .bin for Hyper-V saved state. A clean way to capture running-VM memory is to take a snapshot of the running VM; VMware writes the guest's RAM contents to a .vmem at snapshot time, producing a consistent memory image without disrupting the guest. Volatility 3 then processes the .vmem with a Windows or Linux profile to extract running processes, network connections, decrypted keys, and clipboard contents exactly as for a physical-machine RAM dump.

Cloud-backed endpoints: the file-locator problem

The icon on the desktop is not the file. The disk image alone misses the content.

Modern operating systems ship with cloud file sync as a first-class feature. OneDrive Files On-Demand on Windows, Drive for Desktop on Windows and macOS, iCloud Drive Optimised Storage on macOS, and Dropbox Smart Sync all share a common architecture: present files in the local file browser as if they live on the local disk, but actually fetch the content from the cloud on demand. The on-disk artifact is a placeholder, not the file.

Service	Placeholder mechanism	Sync database	Forensic identification path
OneDrive Files On-Demand (Windows)	NTFS reparse point with tag IO_REPARSE_TAG_CLOUD (0x9000001A)	%LocalAppData%\Microsoft\OneDrive\settings\<id>\*.dat, Cloud State.sqlite	fsutil reparsepoint query <file> shows the tag; placeholders are 0 bytes on disk
Google Drive for Desktop	Virtual file system mount (drive letter on Windows, mount point on macOS) backed by FUSE-like driver	%LocalAppData%\Google\DriveFS\<profile>\metadata_sqlite_db, root_preference_sqlite.db	Drive letter or volume mount is virtual; underlying NTFS shows the FS driver, not the files
iCloud Drive Optimised Storage (macOS)	APFS clone with extended attribute com.apple.iCloud or com.apple.brokenAlias

Container forensics primer: overlay2, Kubernetes ephemerality

Docker and Kubernetes are even less stable than VMs.

Container forensics is a separate and significantly harder problem than VM forensics. A container is not a complete VM; it is a process group on the host kernel, isolated by namespaces and cgroups but sharing the host's kernel, kernel memory, and a large portion of the host file system. There is no guest disk file to image; there are file system layers, a thin overlay of writeable changes, and runtime metadata.

Docker on Linux uses the overlay2 storage driver by default. The root of the storage is /var/lib/docker/overlay2/, with one subdirectory per image layer or container layer. Each container has an <id>/ directory containing: lower (a file listing the parent layers, read-only), upper/ (the writeable layer where the container's runtime changes accumulate), merged/ (the unified view the container actually sees while running), and work/ (overlay2 housekeeping). The running container's process logs and metadata live in /var/lib/docker/containers/<id>/, including config.v2.json and <id>-json.log for container stdout and stderr.

For a forensic examiner, the practical implications are sharp. First, stopping the container freezes the upper layer at its current state, which is a usable disk image of the writeable layer. Second, the image layers in /var/lib/docker/overlay2/ are themselves examinable; an attacker who built a malicious image left fingerprints in the layer manifests and in the image's manifest.json and config.json. Third, the container's namespace-isolated process memory is reachable from the host via /proc/<pid>/maps and /proc/<pid>/mem, so Volatility-style RAM forensics works on the host with the container's PID rather than as a guest.

Kubernetes adds ephemerality on top. A pod can be scheduled, run for ninety seconds, and be killed by the scheduler. The container's file system layer is then garbage-collected on the node within minutes. Any forensic evidence has to be captured from the orchestration layer's logs (Fluentd, Fluent Bit, Loki, or the CRI runtime's journal) or from a persistent volume claim that the pod wrote to. The pod itself is gone.

Tools and methods seen in Indian incident-response practice:

vmkfstools for direct VMFS-level operations on VMware ESXi datastores when imaging running production VMs.

Worked example

A Bengaluru fintech breach: VMware host, OneDrive endpoint, and a Docker exfil container

Three acquisition routes on one investigation.

A Bengaluru fintech reports unauthorised transfers totalling Rs 4.1 crore over a weekend. The state cyber cell traces the lateral movement to an analyst's laptop, which on inspection turns out to be a Windows 11 host running VMware Workstation Pro with a Windows 10 guest where the actual work happened. The host C drive has 230 GB free, an unremarkable Windows installation, and a Documents\Virtual Machines\Analyst-Win10\ directory holding Analyst-Win10.vmdk (sparse, currently 180 GB on disk), two .vmsn snapshot files, a .vmem from a recent suspend, and a .vmx config file.

The examiner images the host SSD behind a NVMe write-blocker, computes SHA-256 of the image, and lists the VMware Workstation files. She then mounts the .vmdk leaf snapshot read-only in Arsenal Image Mounter and treats the guest as a second exhibit with its own hash and Section 63 certificate. Inside the guest, File Explorer on the Desktop shows about 90 files, of which 64 carry the cloud-only icon for OneDrive Files On-Demand. A fsutil reparsepoint sweep confirms the IO_REPARSE_TAG_CLOUD tag on each; OneDriveExplorer parses the guest's Cloud State.sqlite and produces a list of the cloud-only filenames, the OneDrive sync IDs, and the last-modified timestamps from the Microsoft tenant.

In parallel, the investigator obtains the analyst's OneDrive credential under a BNSS 105 videographed acquisition session at the cyber-cell premises, with the user's consent recorded. Magnet AXIOM Cloud fetches the actual file contents from Microsoft 365 against that credential. The downloaded files reveal a CSV of customer account numbers that matches the unauthorised transfer list.

The exfiltration path is then traced to a Docker container running on a forgotten staging VM on the company's vSphere cluster. The IR team takes a vSphere snapshot of the running staging VM, exporting the .vmem and the running disk state without halting the workload. On the resulting disk image, /var/lib/docker/containers/ has an entry for a container named "report-gen" whose -json.log contains outbound HTTP POST entries to a Tor exit node. The container's overlay2 upper layer holds the staged CSV files and a Python script that uploaded them. The container has been deleted, but the layer in /var/lib/docker/overlay2/ is still present because Docker's garbage collection had not yet run.

Exhibit pack: host disk image, guest VMDK leaf-snapshot image, guest .vmem, OneDrive cloud return from Microsoft 365 (with vendor Section 63 certificate), vSphere snapshot disk image, vSphere .vmem, Docker container overlay2 directory image. Seven hashes, three Section 63 certificates (cyber-cell examiner for the host and guest, Microsoft India for the OneDrive cloud return, fintech IT manager for the vSphere snapshot), and a BNSS 105 videography file covering the OneDrive acquisition. The case file is unusually complete; the defence focus shifts to authorisation rather than authenticity.

Practice

Question 1 of 5· 0 answered

An examiner finds a directory on a seized Windows laptop containing Win10.vmdk, Win10-s001.vmdk through Win10-s003.vmdk, Win10.vmx, Win10.vmsn and Win10.vmem. What does the .vmem file contain, and what should the examiner do with it?

Frequently asked questions

If a suspect's workstation is a VMware guest, do I need to image both the host and the guest?

Yes. The host image proves the existence of the VM and captures the .vmx config, the snapshot chain, any .vmem files and the surrounding artifact context (recent files, browser history, USB device history) on the host. The guest image (mounted from the .vmdk leaf snapshot) holds the user's actual working environment. Each is a separate exhibit with its own SHA-256 and its own Section 63 certificate.

Can I parse OneDrive cloud-only files from a disk image without the user's credential?

You can enumerate them. OneDriveExplorer and similar tools parse the local sync database to list every cloud-only file's name, path, timestamps and sync ID. You cannot retrieve the contents from the local image; the bytes live in Microsoft 365. To obtain contents you need either the user's credential (captured under BNSS 105 videography) or a BNSS Section 94 production notice to Microsoft Corporation (India) Pvt Ltd's nodal officer.

How does a snapshot differ from a backup in VM forensics?

A snapshot is a hypervisor-managed differencing disk that captures changes since the snapshot was taken; the original disk becomes read-only and a new child accumulates writes. A backup is an out-of-band complete copy of the disk file as it existed at backup time. For forensics, snapshots are richer because the chain itself records a timeline of states, but each snapshot depends on its parent and is meaningless without it; a backup is self-contained.

What is the safest way to capture memory from a running VMware VM without disrupting it?

Take a snapshot of the running VM with the 'snapshot memory' option enabled. VMware writes the guest's current RAM contents to a .vmem file as part of the snapshot. The guest does not pause for the user, and the resulting .vmem is a consistent memory image suitable for Volatility 3 analysis. This is significantly less disruptive than running an in-guest memory acquisition tool.