436 lines
18 KiB
Markdown
436 lines
18 KiB
Markdown
# Protected Virtual Machine Firmware
|
||
|
||
In the context of the [Android Virtualization Framework][AVF], a hypervisor
|
||
(_e.g._ [pKVM]) enforces full memory isolation between its virtual machines
|
||
(VMs) and the host. As a result, the host is only allowed to access memory that
|
||
has been explicitly shared back by a VM. Such _protected VMs_ (“pVMs”) are
|
||
therefore able to manipulate secrets without being at risk of an attacker
|
||
stealing them by compromising the Android host.
|
||
|
||
As pVMs are started dynamically by a _virtual machine manager_ (“VMM”) running
|
||
as a host process and as pVMs must not trust the host (see [_Why
|
||
AVF?_][why-avf]), the virtual machine it configures can't be trusted either.
|
||
Furthermore, even though the isolation mentioned above allows pVMs to protect
|
||
their secrets from the host, it does not help with provisioning them during
|
||
boot. In particular, the threat model would prohibit the host from ever having
|
||
access to those secrets, preventing the VMM from passing them to the pVM.
|
||
|
||
To address these concerns the hypervisor securely loads the pVM firmware
|
||
(“pvmfw”) in the pVM from a protected memory region (this prevents the host or
|
||
any pVM from tampering with it), setting it as the entry point of the virtual
|
||
machine. As a result, pvmfw becomes the very first code that gets executed in
|
||
the pVM, allowing it to validate the environment and abort the boot sequence if
|
||
necessary. This process takes place whenever the VMM places a VM in protected
|
||
mode and can’t be prevented by the host.
|
||
|
||
Given the threat model, pvmfw is not allowed to trust the devices or device
|
||
layout provided by the virtual platform it is running on as those are configured
|
||
by the VMM. Instead, it performs all the necessary checks to ensure that the pVM
|
||
was set up as expected. For functional purposes, the interface with the
|
||
hypervisor, although trusted, is also validated.
|
||
|
||
Once it has been determined that the platform can be trusted, pvmfw derives
|
||
unique secrets for the guest through the [_Boot Certificate Chain_][BCC]
|
||
("BCC", see [Open Profile for DICE][open-dice]) that can be used to prove the
|
||
identity of the pVM to local and remote actors. If any operation or check fails,
|
||
or in case of a missing prerequisite, pvmfw will abort the boot process of the
|
||
pVM, effectively preventing non-compliant pVMs and/or guests from running.
|
||
Otherwise, it hands over the pVM to the guest kernel by jumping to its first
|
||
instruction, similarly to a bootloader.
|
||
|
||
pvmfw currently only supports AArch64.
|
||
|
||
[AVF]: https://source.android.com/docs/core/virtualization
|
||
[why-avf]: https://source.android.com/docs/core/virtualization/whyavf
|
||
[BCC]: https://pigweed.googlesource.com/open-dice/+/master/src/android/README.md
|
||
[pKVM]: https://source.android.com/docs/core/virtualization/architecture#hypervisor
|
||
[open-dice]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md
|
||
|
||
## Integration
|
||
|
||
### pvmfw Loading
|
||
|
||
When running pKVM, the physical memory from which the hypervisor loads pvmfw
|
||
into guest address space is not initially populated by the hypervisor itself.
|
||
Instead, it receives a pre-loaded memory region from a trusted pvmfw loader and
|
||
only then becomes responsible for protecting it. As a result, the hypervisor is
|
||
kept generic (beyond AVF) and small as it is not expected (nor necessary) for it
|
||
to know how to interpret or obtain the content of that region.
|
||
|
||
#### Android Bootloader (ABL) Support
|
||
|
||
Starting in Android T, the `PRODUCT_BUILD_PVMFW_IMAGE` build variable controls
|
||
the generation of `pvmfw.img`, a new [ABL partition][ABL-part] containing the
|
||
pvmfw binary (sometimes called "`pvmfw.bin`") and following the internal format
|
||
of the [`boot`][boot-img] partition, intended to be verified and loaded by ABL
|
||
on AVF-compatible devices.
|
||
|
||
Once ABL has verified the `pvmfw.img` chained static partition, the contained
|
||
[`boot.img` header][boot-img] may be used to obtain the size of the `pvmfw.bin`
|
||
image (recorded in the `kernel_size` field), as it already does for the kernel
|
||
itself. In accordance with the header format, the `kernel_size` bytes of the
|
||
partition following the header will be the `pvmfw.bin` image.
|
||
|
||
Note that when it gets executed in the context of a pVM, `pvmfw` expects to have
|
||
been loaded at 4KiB-aligned intermediate physical address (IPA) so if ABL loads
|
||
the `pvmfw.bin` image without respecting this alignment, it is the
|
||
responsibility of the hypervisor to either reject the image or copy it into
|
||
guest address space with the right alignment.
|
||
|
||
To support pKVM, ABL is expected to describe the region using a reserved memory
|
||
device tree node where both address and size have been properly aligned to the
|
||
page size used by the hypervisor. This single region must include both the pvmfw
|
||
binary image and its configuration data (see below). For example, the following
|
||
node describes a region of size `0x40000` at address `0x80000000`:
|
||
```
|
||
reserved-memory {
|
||
...
|
||
pkvm_guest_firmware {
|
||
compatible = "linux,pkvm-guest-firmware-memory";
|
||
reg = <0x0 0x80000000 0x40000>;
|
||
no-map;
|
||
}
|
||
}
|
||
```
|
||
|
||
[ABL-part]: https://source.android.com/docs/core/architecture/bootloader/partitions
|
||
[boot-img]: https://source.android.com/docs/core/architecture/bootloader/boot-image-header
|
||
|
||
### Configuration Data
|
||
|
||
As part of the process of loading pvmfw, the loader (typically the Android
|
||
Bootloader, "ABL") is expected to pass device-specific pvmfw configuration data
|
||
by appending it to the pvmfw binary and including it in the region passed to the
|
||
hypervisor. As a result, the hypervisor will give the same protection to this
|
||
data as it does to pvmfw and will transparently load it in guest memory, making
|
||
it available to pvmfw at runtime. This enables pvmfw to be kept device-agnostic,
|
||
simplifying its adoption and distribution as a centralized signed binary, while
|
||
also being able to support device-specific details.
|
||
|
||
The configuration data will be read by pvmfw at the next 4KiB boundary from the
|
||
end of its loaded binary. Even if the pvmfw is position-independent, it will be
|
||
expected for it to also have been loaded at a 4-KiB boundary. As a result, the
|
||
location of the configuration data is implicitly passed to pvmfw and known to it
|
||
at build time.
|
||
|
||
#### Configuration Data Format
|
||
|
||
The configuration data is described using the following [header]:
|
||
|
||
```
|
||
+===============================+
|
||
| pvmfw.bin |
|
||
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
|
||
| (Padding to 4KiB alignment) |
|
||
+===============================+ <-- HEAD
|
||
| Magic (= 0x666d7670) |
|
||
+-------------------------------+
|
||
| Version |
|
||
+-------------------------------+
|
||
| Total Size = (TAIL - HEAD) |
|
||
+-------------------------------+
|
||
| Flags |
|
||
+-------------------------------+
|
||
| [Entry 0] |
|
||
| offset = (FIRST - HEAD) |
|
||
| size = (FIRST_END - FIRST) |
|
||
+-------------------------------+
|
||
| [Entry 1] |
|
||
| offset = (SECOND - HEAD) |
|
||
| size = (SECOND_END - SECOND) |
|
||
+-------------------------------+
|
||
| ... |
|
||
+-------------------------------+
|
||
| [Entry n] |
|
||
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
|
||
| (Padding to 8-byte alignment) |
|
||
+===============================+ <-- FIRST
|
||
| {First blob: BCC} |
|
||
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- FIRST_END
|
||
| (Padding to 8-byte alignment) |
|
||
+===============================+ <-- SECOND
|
||
| {Second blob: DP} |
|
||
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- SECOND_END
|
||
| (Padding to 8-byte alignment) |
|
||
+===============================+
|
||
| ... |
|
||
+===============================+ <-- TAIL
|
||
```
|
||
|
||
Where the version number is encoded using a "`major.minor`" as follows
|
||
|
||
```
|
||
((major << 16) | (minor & 0xffff))
|
||
```
|
||
|
||
and defines the format of the header (which may change between major versions),
|
||
its size and, in particular, the expected number of appended blobs. Each blob is
|
||
referred to by its offset in the entry array and may be mandatory or optional
|
||
(as defined by this specification), where missing entries are denoted by a zero
|
||
size. It is therefore not allowed to trim missing optional entries from the end
|
||
of the array. The header uses the endianness of the virtual machine.
|
||
|
||
The header format itself is agnostic of the internal format of the individual
|
||
blos it refers to. In version 1.0, it describes two blobs:
|
||
|
||
- entry 0 must point to a valid BCC Handover (see below)
|
||
- entry 1 may point to a [DTBO] to be applied to the pVM device tree
|
||
|
||
[header]: src/config.rs
|
||
[DTBO]: https://android.googlesource.com/platform/external/dtc/+/refs/heads/master/Documentation/dt-object-internal.txt
|
||
|
||
#### Virtual Platform Boot Certificate Chain Handover
|
||
|
||
The format of the BCC entry mentioned above, compatible with the
|
||
[`BccHandover`][BccHandover] defined by the Open Profile for DICE reference
|
||
implementation, is described by the following [CDDL][CDDL]:
|
||
```
|
||
PvmfwBccHandover = {
|
||
1 : bstr .size 32, ; CDI_Attest
|
||
2 : bstr .size 32, ; CDI_Seal
|
||
3 : Bcc, ; Certificate chain
|
||
}
|
||
```
|
||
|
||
and contains the _Compound Device Identifiers_ ("CDIs"), used to derive the
|
||
next-stage secret, and a certificate chain, intended for pVM attestation. Note
|
||
that it differs from the `BccHandover` defined by the specification in that its
|
||
`Bcc` field is mandatory (while optional in the original).
|
||
|
||
Devices that fully implement DICE should provide a certificate rooted at the
|
||
Unique Device Secret (UDS) in a boot stage preceding the pvmfw loader (typically
|
||
ABL), in such a way that it would receive a valid `BccHandover`, that can be
|
||
passed to [`BccHandoverMainFlow`][BccHandoverMainFlow] along with the inputs
|
||
described below.
|
||
|
||
Otherwise, as an intermediate step towards supporting DICE throughout the
|
||
software stack of the device, incomplete implementations may root the BCC at the
|
||
pvmfw loader, using an arbitrary constant as initial CDI. The pvmfw loader can
|
||
easily do so by:
|
||
|
||
1. Building a BCC-less `BccHandover` using CBOR operations
|
||
([example][Trusty-BCC]) and containing the constant CDIs
|
||
1. Passing the resulting `BccHandover` to `BccHandoverMainFlow` as described
|
||
above
|
||
|
||
The recommended DICE inputs at this stage are:
|
||
|
||
- **Code**: hash of the pvmfw image, hypervisor (`boot.img`), and other target
|
||
code relevant to the secure execution of pvmfw (_e.g._ `vendor_boot.img`)
|
||
- **Configuration Data**: any extra input relevant to pvmfw security
|
||
- **Authority Data**: must cover all the public keys used to sign and verify the
|
||
code contributing to the **Code** input
|
||
- **Mode Decision**: Set according to the [specification][dice-mode]. In
|
||
particular, should only be `Normal` if secure boot is being properly enforced
|
||
(_e.g._ locked device in [Android Verified Boot][AVB])
|
||
- **Hidden Inputs**: Factory Reset Secret (FRS, stored in a tamper evident
|
||
storage and changes during every factory reset) or similar that changes as
|
||
part of the device lifecycle (_e.g._ reset)
|
||
|
||
The resulting `BccHandover` is then used by pvmfw in a similar way to derive
|
||
another [DICE layer][Layering], passed to the guest through a `/reserved-memory`
|
||
device tree node marked as [`compatible=”google,open-dice”`][dice-dt].
|
||
|
||
[AVB]: https://source.android.com/docs/security/features/verifiedboot/boot-flow
|
||
[BccHandover]: https://pigweed.googlesource.com/open-dice/+/825e3beb6c/src/android/bcc.c#260
|
||
[BccHandoverMainFlow]: https://pigweed.googlesource.com/open-dice/+/825e3beb6c/src/android/bcc.c#199
|
||
[CDDL]: https://datatracker.ietf.org/doc/rfc8610
|
||
[dice-mode]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md#Mode-Value-Details
|
||
[dice-dt]: https://www.kernel.org/doc/Documentation/devicetree/bindings/reserved-memory/google%2Copen-dice.yaml
|
||
[Layering]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md#layering-details
|
||
[Trusty-BCC]: https://android.googlesource.com/trusty/lib/+/1696be0a8f3a7103/lib/hwbcc/common/swbcc.c#554
|
||
|
||
#### pVM Device Tree Overlay
|
||
|
||
Config header can provide a DTBO to be overlaid on top of the baseline device
|
||
tree from crosvm.
|
||
|
||
The DTBO may contain debug policies. Debug policies MUST NOT be provided for
|
||
locked devices for security reasons.
|
||
|
||
Here are an example of DTBO.
|
||
|
||
```
|
||
/ {
|
||
fragment@avf {
|
||
target-path = "/";
|
||
|
||
__overlay__ {
|
||
avf {
|
||
/* your debug policy here */
|
||
};
|
||
};
|
||
};
|
||
}; /* end of avf */
|
||
```
|
||
|
||
For specifying DTBO, host bootloader should apply the DTBO to both host
|
||
OS's device tree and config header of `pvmfw`. Both `virtualizationmanager` and
|
||
`pvmfw` will prepare for debugging features.
|
||
|
||
For details about device tree properties for debug policies, see
|
||
[microdroid's debugging policy guide](../microdroid/README.md#option-1-running-microdroid-on-avf-debug-policy-configured-device).
|
||
|
||
### Platform Requirements
|
||
|
||
pvmfw is intended to run in a virtualized environment according to the `crosvm`
|
||
[memory layout][crosvm-mem] for protected VMs and so it expects to have been
|
||
loaded at address `0x7fc0_0000` and uses the 2MiB region at address
|
||
`0x7fe0_0000` as scratch memory. It makes use of the virtual PCI bus to obtain a
|
||
virtio interface to the host and prints its logs through the 16550 UART (address
|
||
`0x3f8`).
|
||
|
||
At boot, pvmfw discovers the running hypervisor in order to select the
|
||
appropriate hypervisor calls to share/unshare memory, mark IPA regions as MMIO,
|
||
obtain trusted true entropy, and reboot the virtual machine. In particular, it
|
||
makes use of the following hypervisor calls:
|
||
|
||
- Arm [SMC Calling Convention][smccc] v1.1 or above:
|
||
|
||
- `SMCCC_VERSION`
|
||
- Vendor Specific Hypervisor Service Call UID Query
|
||
|
||
- Arm [Power State Coordination Interface][psci] v1.0 or above:
|
||
|
||
- `PSCI_VERSION`
|
||
- `PSCI_FEATURES`
|
||
- `PSCI_SYSTEM_RESET`
|
||
- `PSCI_SYSTEM_SHUTDOWN`
|
||
|
||
- Arm [True Random Number Generator Firmware Interface][smccc-trng] v1.0:
|
||
|
||
- `TRNG_VERSION`
|
||
- `TRNG_FEATURES`
|
||
- `TRNG_RND`
|
||
|
||
- When running under KVM, the pKVM-specific hypervisor interface must provide:
|
||
|
||
- `MEMINFO` (function ID `0xc6000002`)
|
||
- `MEM_SHARE` (function ID `0xc6000003`)
|
||
- `MEM_UNSHARE` (function ID `0xc6000004`)
|
||
- `MMIO_GUARD_INFO` (function ID `0xc6000005`)
|
||
- `MMIO_GUARD_ENROLL` (function ID `0xc6000006`)
|
||
- `MMIO_GUARD_MAP` (function ID `0xc6000007`)
|
||
- `MMIO_GUARD_UNMAP` (function ID `0xc6000008`)
|
||
|
||
[crosvm-mem]: https://crosvm.dev/book/appendix/memory_layout.html
|
||
[psci]: https://developer.arm.com/documentation/den0022
|
||
[smccc]: https://developer.arm.com/documentation/den0028
|
||
[smccc-trng]: https://developer.arm.com/documentation/den0098
|
||
|
||
## Booting Protected Virtual Machines
|
||
|
||
### Boot Protocol
|
||
|
||
As the hypervisor makes pvmfw the entry point of the VM, the initial value of
|
||
the registers it receives is configured by the VMM and is expected to follow the
|
||
[Linux ABI] _i.e._
|
||
|
||
- x0 = physical address of device tree blob (dtb) in system RAM.
|
||
- x1 = 0 (reserved for future use)
|
||
- x2 = 0 (reserved for future use)
|
||
- x3 = 0 (reserved for future use)
|
||
|
||
Images to be verified, which have been loaded to guest memory by the VMM prior
|
||
to booting the VM, are described to pvmfw using the device tree (x0):
|
||
|
||
- the kernel in the `/config` DT node _e.g._
|
||
|
||
```
|
||
/ {
|
||
config {
|
||
kernel-address = <0x80200000>;
|
||
kernel-size = <0x1000000>;
|
||
};
|
||
};
|
||
````
|
||
|
||
- the (optional) ramdisk in the standard `/chosen` node _e.g._
|
||
|
||
```
|
||
/ {
|
||
chosen {
|
||
linux,initrd-start = <0x82000000>;
|
||
linux,initrd-end = <0x82800000>;
|
||
};
|
||
};
|
||
```
|
||
|
||
[Linux ABI]: https://www.kernel.org/doc/Documentation/arm64/booting.txt
|
||
|
||
### Handover ABI
|
||
|
||
After verifying the guest kernel, pvmfw boots it using the Linux ABI described
|
||
above. It uses the device tree to pass the following:
|
||
|
||
- a reserved memory node containing the produced BCC:
|
||
|
||
```
|
||
/ {
|
||
reserved-memory {
|
||
#address-cells = <0x02>;
|
||
#size-cells = <0x02>;
|
||
ranges;
|
||
dice {
|
||
compatible = "google,open-dice";
|
||
no-map;
|
||
reg = <0x0 0x7fe0000>, <0x0 0x1000>;
|
||
};
|
||
};
|
||
};
|
||
```
|
||
|
||
- the `/chosen/avf,new-instance` flag, set when pvmfw generated a new secret
|
||
(_i.e._ the pVM instance was booted for the first time). This should be used
|
||
by the next stages to ensure that an attacker isn't trying to force new
|
||
secrets to be generated by one stage, in isolation;
|
||
|
||
- the `/chosen/avf,strict-boot` flag, always set and can be used by guests to
|
||
enable extra validation
|
||
|
||
### Guest Image Signing
|
||
|
||
pvmfw verifies the guest kernel image (loaded by the VMM) by re-using tools and
|
||
formats introduced by the Android Verified Boot. In particular, it expects the
|
||
kernel region (see `/config/kernel-{address,size}` described above) to contain
|
||
an appended VBMeta structure, which can be generated as follows:
|
||
|
||
```
|
||
avbtool add_hash_footer --image <kernel.bin> \
|
||
--partition_name boot \
|
||
--dynamic_partition_size \
|
||
--key $KEY
|
||
```
|
||
|
||
In cases where a ramdisk is required by the guest, pvmfw must also verify it. To
|
||
do so, it must be covered by a hash descriptor in the VBMeta of the kernel:
|
||
|
||
```
|
||
cp <initrd.bin> /tmp/
|
||
avbtool add_hash_footer --image /tmp/<initrd.bin> \
|
||
--partition_name $INITRD_NAME \
|
||
--dynamic_partition_size \
|
||
--key $KEY
|
||
avbtool add_hash_footer --image <kernel.bin> \
|
||
--partition_name boot \
|
||
--dynamic_partition_size \
|
||
--include_descriptor_from_image /tmp/<initrd.bin> \
|
||
--key $KEY
|
||
```
|
||
|
||
Note that the `/tmp/<initrd.bin>` file is only created to temporarily hold the
|
||
hash descriptor to be added to the kernel footer and that the unsigned
|
||
`<initrd.bin>` should be passed to the VMM when booting a pVM.
|
||
|
||
The name of the AVB "partition" for the ramdisk (`$INITRD_NAME`) can be used by
|
||
the signer to specify if pvmfw must consider the guest to be debuggable
|
||
(`initrd_debug`) or not (`initrd_normal`), which will be reflected in the
|
||
certificate of the guest and will affect the secrets being provisioned.
|
||
|
||
If pVM guest kernels are built and/or packaged using the Android Build system,
|
||
the signing described above is recommended to be done through an
|
||
`avb_add_hash_footer` Soong module (see [how we sign the Microdroid
|
||
kernel][soong-udroid]).
|
||
|
||
[soong-udroid]: https://cs.android.com/android/platform/superproject/+/master:packages/modules/Virtualization/microdroid/Android.bp;l=427;drc=ca0049be4d84897b8c9956924cfae506773103eb
|