Disk-Backed HAL
Overview
Cocos uses a disk-backed HAL flow for CVMs. Instead of running the full guest environment entirely from an in-memory initramfs, the CVM boots from a disk image that contains:
- an EFI system partition with GRUB
- a kernel and initramfs
- a read-only root filesystem partition protected by dm-verity
- a separate dm-verity hash partition
- a separate writable
/cocospartition
The early initramfs is still part of the boot chain, it mounts the real root
filesystem read-only, provisions an encrypted /cocos volume, prepares a few
runtime mounts, and then switches into the installed system.
This complements the RAM-only HAL described in On-premises. The RAM-only path keeps everything in memory, while the disk-backed path gives the guest a richer OS image plus an encrypted writable data area.
High-Level Architecture
The current disk-backed HAL implementation is composed of these pieces:
- Buildroot external tree: The HAL lives under
cocos/hal/diskand builds the bootable image layout. - EFI boot image: GRUB loads the kernel and initramfs from the EFI system partition.
- Early initramfs: The initramfs opens the dm-verity root mapping,
provisions the encrypted
/cocospartition, rewrites/etc/fstab, and runsswitch_root. - Runtime system: The guest OS continues booting from the read-only root
partition while using
/cocosfor writable state. - Manager/QEMU integration: The Manager can either boot a kernel and initramfs directly or boot from a disk image when disk mode is enabled.
Disk Layout
The current HAL image produces a GPT disk with four main partitions:
- EFI partition: FAT filesystem containing GRUB, the kernel, and the initramfs
- Root partition: ext4 root filesystem used as the dm-verity data device
- Verity partition: dm-verity hash tree for the root filesystem
- Cocos partition: writable partition provisioned at boot as LUKS2 and
mounted at
/cocos
At runtime, the initramfs expects the boot disk as /dev/sda and uses:
/dev/sda2as the root filesystem partition/dev/sda3as the verity hash partition/dev/sda4as the writable/cocospartition
Boot Flow
The disk-backed initramfs runs as PID 1 and performs the early boot sequence.
1. Early System Setup
The initramfs initializes:
/dev,/proc,/sys,/run, and/tmpdevtmpfsanddevptsconfigfswhen available
2. Root Filesystem Verification And Mount
The initramfs:
- assumes the main disk is
/dev/sda - resolves the root partition as
/dev/sda2 - resolves the verity hash partition as
/dev/sda3 - reads the
roothash=value from the kernel command line - opens
/dev/mapper/root_veritywithveritysetup - mounts
/dev/mapper/root_verityread-only at/root
The root filesystem is not copied from elsewhere during boot. The VM boots directly from the image already attached to QEMU, and dm-verity protects the read-only root mount. The GRUB command line also disables late systemd verity auto-setup because the initramfs has already opened the root mapping.
3. /cocos Provisioning
The initramfs then provisions the writable data partition:
- resolves the writable partition as
/dev/sda4 - generates a fresh ephemeral key
- formats
/dev/sda4as a LUKS2 container - opens it as
/dev/mapper/cocos_crypt - formats that mapper as ext4
- mounts it at
/root/cocos
It also prepares the expected directory structure on /cocos, including:
.cache/ocidatasetsdockercocos_init
4. Writable Runtime Mounts
Because the root filesystem is intentionally read-only, the initramfs mounts:
tmpfson/tmptmpfson/var
It then redirects selected writable paths:
/var/lib/dockeris bind-mounted from/cocos/docker/cocos_initis backed by/cocos/cocos_init
5. Configuration Handoff
Before switching root, the initramfs rewrites /etc/fstab in the mounted root
filesystem so the live runtime view is consistent.
It also preserves or adds 9P mounts for:
certs_share->/etc/certsenv_share->/etc/cocos
6. Switch Root
After provisioning is complete, the initramfs securely wipes the temporary key file and executes:
switch_root /root /sbin/initThe installed system then continues normal boot from the already mounted read-only root.
Runtime Filesystem Model
The running system is split into:
- read-only root on
/ - encrypted writable storage on
/cocos tmpfson/tmptmpfson/var
This means service state is intentionally redirected away from the root filesystem. In particular:
- Docker data lives under
/cocos/docker - agent working data lives under
/cocos - runtime helper scripts use
/cocos_init
From the guest’s point of view, /cocos behaves like a normal directory tree,
but it is backed by an encrypted mapper created during early boot.
Security Model
The current design has a few important properties:
- the root filesystem is mounted read-only
- the root filesystem is protected by dm-verity for integrity
/cocosis protected by LUKS2 using an ephemeral per-boot key- that key is wiped before
switch_root /etc/certsand/etc/cocosare supplied separately through 9P mounts
This means the protections are split by purpose:
- Root filesystem: integrity-protected and read-only
/cocos: encrypted and writable/tmpand/var: volatiletmpfs
This is not a persistent disk unlock scheme. The writable /cocos partition is
provisioned fresh on each boot using a key that is not retained for later
reboots.
Manager Integration
The Manager participates in this flow in two different ways:
- Direct kernel/initramfs boot: the Manager passes
-kerneland-initrdto QEMU for RAM-only or direct-boot workflows - Disk boot: when
MANAGER_QEMU_ENABLE_DISK=true, the Manager boots from an attached disk image instead of passing-kerneland-initrd
In disk mode, the disk artifact must already contain the EFI boot chain, kernel, initramfs, and root filesystem expected by this HAL flow.
See the Manager page for the Manager-side environment variables, disk boot settings, and launch examples.
Build Artifacts
The HAL build produces:
disk.img: the bootable GPT disk imagerootfs.ext4: the root filesystem imagerootfs.verity: the dm-verity hash image for the root filesystemrootfs.roothash: the dm-verity root hash embedded into the GRUB command linerootfs.cpio.gz: the early initramfs used during boot
These artifacts come from the disk HAL Buildroot workflow under cocos/hal/disk
rather than the older cloud-init-based source-image copy flow.
Build Instructions
The disk-backed HAL is implemented as a Buildroot external tree rooted at
cocos/hal/disk.
To build the image:
- clone or open a Buildroot checkout
- point Buildroot at the Cocos disk HAL external tree
- load the
cocos_defconfig - build the image
Steps:
git clone https://github.com/ultravioletrs/cocos.git
git clone https://github.com/buildroot/buildroot.git
cd buildroot
git checkout 2025.11
make BR2_EXTERNAL=../cocos/hal/disk cocos_defconfig
# Execute 'make menuconfig' only if you want to make additional configuration changes to Buildroot.
make menuconfig
makeThe main output is:
/path/to/buildroot/output/images/disk.imgAdditional generated artifacts include:
/path/to/buildroot/output/images/rootfs.ext4
/path/to/buildroot/output/images/rootfs.verity
/path/to/buildroot/output/images/rootfs.roothash
/path/to/buildroot/output/images/rootfs.cpio.gzLaunch Examples
The disk-backed HAL boots from the disk image through firmware:
- firmware loads GRUB from the EFI partition
- GRUB loads the kernel and initramfs
- the initramfs verifies the root filesystem with dm-verity and provisions
/cocos
The following examples mirror the style used in On-premises, but
they use disk boot instead of direct -kernel / -initrd boot.
AMD SEV-SNP
DISK=/path/to/buildroot/output/images/disk.img
IGVM=/path/to/coconut-qemu.igvm
ENV_PATH=/path/to/env_directory
CERT_PATH=/path/to/cert_directory
sudo qemu-system-x86_64 \
-enable-kvm \
-cpu EPYC-v4 \
-machine q35 \
-smp 16,sockets=1,threads=1 \
-m 8G,slots=5,maxmem=40G \
-netdev user,id=vmnic,hostfwd=tcp::7020-:7002 \
-device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= \
-machine confidential-guest-support=sev0,memory-backend=ram1,igvm-cfg=igvm0 \
-object memory-backend-memfd,id=ram1,size=8G,share=true,prealloc=false,reserve=false \
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
-drive file=$DISK,if=none,id=disk0,format=raw \
-device virtio-scsi-pci,id=scsi0,disable-legacy=on,iommu_platform=true \
-device scsi-hd,drive=disk0,bus=scsi0.0 \
-object igvm-cfg,id=igvm0,file=$IGVM \
-nographic \
-monitor pty \
-monitor unix:monitor,server,nowait \
-fsdev local,id=env_fs,path=$ENV_PATH,security_model=mapped \
-device virtio-9p-pci,disable-legacy=on,iommu_platform=true,fsdev=env_fs,mount_tag=env_share,romfile= \
-fsdev local,id=cert_fs,path=$CERT_PATH,security_model=mapped \
-device virtio-9p-pci,disable-legacy=on,iommu_platform=true,fsdev=cert_fs,mount_tag=certs_share,romfile=Intel TDX
DISK=/path/to/buildroot/output/images/disk.img
OVMF=/path/to/OVMF.fd
ENV_PATH=/path/to/env_directory
CERT_PATH=/path/to/cert_directory
sudo qemu-system-x86_64 \
-enable-kvm \
-m 8G \
-smp cores=16,sockets=1,threads=1 \
-cpu host \
-object '{"qom-type":"tdx-guest","id":"tdx","quote-generation-socket":{"type":"vsock","cid":"2","port":"4050"}}' \
-machine q35,kernel_irqchip=split,confidential-guest-support=tdx,memory-backend=mem0,hpet=off \
-bios $OVMF \
-nographic \
-nodefaults \
-no-reboot \
-serial mon:stdio \
-device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=nic0_td,romfile= \
-netdev user,id=nic0_td,hostfwd=tcp::7020-:7002 \
-object memory-backend-memfd,id=mem0,size=8G \
-drive file=$DISK,if=none,id=disk0,format=raw \
-device virtio-scsi-pci,id=scsi0,disable-legacy=on,iommu_platform=true \
-device scsi-hd,drive=disk0,bus=scsi0.0 \
-device vhost-vsock-pci,guest-cid=6 \
-monitor pty \
-monitor unix:monitor,server,nowait \
-fsdev local,id=env_fs,path=$ENV_PATH,security_model=mapped \
-device virtio-9p-pci,disable-legacy=on,iommu_platform=true,fsdev=env_fs,mount_tag=env_share,romfile= \
-fsdev local,id=cert_fs,path=$CERT_PATH,security_model=mapped \
-device virtio-9p-pci,disable-legacy=on,iommu_platform=true,fsdev=cert_fs,mount_tag=certs_share,romfile=