Shedding Light on Static Partitioning Hypervisors
for Arm-based Mixed-Criticality Systems
José Martins, Centro ALGORITMI/LASI, Universidade do Minho ([email protected])
Sandro Pinto, Centro ALGORITMI/LASI, Universidade do Minho ([email protected])

Abstract—In this paper, we aim to understand the properties and guarantees of static partitioning hypervisors (SPH) for Arm-based mixed-criticality systems (MCS). To this end, we performed a comprehensive empirical evaluation of popular open-source SPH, i.e., Jailhouse, Xen (Dom0-less), Bao, and seL4 CAmkES VMM, focusing on two key requirements of modern MCS: real-time and safety. The goal of this study is twofold. Firstly, to empower industrial practitioners with hard data to reason about the different trade-offs of SPH. Secondly, we aim to raise awareness of the research and open-source communities to the still open problems in SPH by unveiling new insights regarding lingering weaknesses. All artifacts will be open-sourced to enable independent validation of results and encourage further exploration of SPH.

Index Terms—Virtualization, Static Partitioning, Hypervisor, Mixed-Criticality, Arm.

I. INTRODUCTION

The explosion in the number of functional requirements in industries such as automotive has led to a trend for centralized architectures that consolidate heterogeneous software stacks in high-performance platforms [1], [2]. These typically take the form of mixed-criticality systems (MCSs) [3], [4] as they often integrate safety- or mission-critical workloads with real-time requirements, alongside Unix-like operating systems (OSs) providing rich functionality. Virtualization technology is the de facto enabler for these architectures as, by definition, it allows for consolidation with strong fault encapsulation. In this context, hypervisor design must balance minimality for safety and security on one side, and feature-richness and efficient sharing of resources on the other. While traditional hypervisors were optimized for the latter [5], [6], on the opposite end of the spectrum we have static partitioning hypervisors (SPH) specifically designed for MCS [7], [8]. Besides statically assigning system resources (e.g., CPUs, memory, or devices) to virtual machines (VMs), SPH must provide latency and isolation guarantees at the microarchitectural level to comply with the freedom-from-interference requirements of industry safety standards such as ISO 26262 [1], [9]–[11].

In this paper, we shed light on open-source SPH for Arm-based MCS. Despite the existence of multiple reports, research papers, and public artifacts, information on these systems tends to be scattered or to focus on a single hypervisor or metric, while, in some cases, empirical evidence is non-existent. Thus, it is difficult to obtain a comprehensive understanding of the overall properties and guarantees of these systems in the context of MCS. To fill this gap, we conduct a leveled playing-field evaluation of four open-source static partitioning virtualization solutions, i.e., Jailhouse, Xen (Dom0-less), Bao, and the seL4 CAmkES virtual machine monitor (VMM). We drive our study based on two key requirements of modern MCS, i.e., real-time and safety, focusing on (i) performance, (ii) interrupt latency, (iii) inter-VM communication, (iv) boot time, and (v) code size. For each metric, we assess the effectiveness of the cache coloring technique, pervasive in SPH, for inter-VM interference mitigation.

The goal of this study is twofold. Firstly, we aim at empowering industrial practitioners with hard data to understand the trade-offs and limits of modern Arm-based SPH, as well as how best to configure these systems for their use case and requirements. For example, the use of superpages significantly decreases the number of TLB misses, resulting in negligible performance overhead (<1% without interference), but it is precluded by enabling page coloring, a widely adopted cache partitioning technique in these hypervisors. Also, experiments demonstrated that coloring, per se, can impact performance by up to 20% and that it cannot fully mitigate interference, where overhead can still reach up to 60%. Regarding inter-VM communication, we show that for bulk data transfers, buffer size choice is crucial to maximize throughput while avoiding degradation due to inter-VM interference.

Secondly, with the collected empirical data, we aim at raising awareness of the research and open-source communities to the still open problems in SPH, by highlighting both new and previously known weaknesses of these SPH, which seem to be mostly due to interrupt virtualization issues. Prominent examples include: (i) the need for implementing state-of-the-art mechanisms to fully mitigate inter-VM interference (e.g., memory throttling) in mainstream SPH; (ii) the extent of the impact of interference on interrupt latency, which can increase by several orders of magnitude; (iii) the lack of support for correctly handling and delivering interrupts in priority order; (iv) the absence of mechanisms that prioritize the boot of a critical VM; and (v) the lack of plasticity of the SPH architecture, which might hinder achieving its own goal of allowing full IO passthrough. To address the observed shortcomings, we discuss potential solutions and research directions.

We made all artifacts openly available [12] to provide practitioners and researchers with the methods and materials to (i) independently replicate all experiments and corroborate the assessed results, as well as (ii) encourage and facilitate additional experiments and further exploration of SPH.

In summary, this paper makes the following contributions: (1) presents the most comprehensive empirical study to date
on popular open-source SPH focusing on a set of key metrics for modern MCS; (2) provides hard empirical data to empower industrial practitioners with the knowledge to understand the limits and trade-offs of SPH; (3) raises awareness of the research and open-source communities to the open problems of SPH by shedding light on their shortcomings; and finally, (4) opens all artifacts to enable independent validation of results and facilitate further research.

II. BACKGROUND

In this section, we start by overviewing Armv8-A virtualization support. We then explain the concept of static partitioning virtualization, including key techniques implemented in SPH. Finally, we describe Xen, Jailhouse, Bao, and seL4 CAmkES.

A. Arm Virtualization
CPU & Memory. Given the widespread proliferation of virtualization in the last decades, Arm has implemented hardware support since version 7 of the ISA. The most recent version of the architecture, i.e., Armv8/9-A, extends the privileged architecture with a dedicated hypervisor privilege mode (EL2), which sits between the secure firmware mode (EL3) and the kernel/user modes (EL1/EL0) [13] where guests execute. A hypervisor running at EL2 has fine-grained control over which CPU resources are directly accessible by guests (e.g., control registers). Access to a denied functionality by a guest OS results in a trap to the hypervisor. It is possible to route specific guest exceptions and system interrupts to EL2. Other resources that can be managed by the hypervisor include the CPU-private generic timer and the performance monitor unit (PMU). EL1/EL0 memory accesses are subject to a second stage of translation which is in full control of the hypervisor [13]. Any guest access to a memory region not mapped in the second stage of translation will result in a precise trap to EL2. Arm provides multiple "translation granules", resulting in pages of different sizes: 4 KiB, 16 KiB, and 64 KiB. For each page size it is also possible to map large contiguous memory regions. These are known as superpages (or hugepages), which reduce TLB pressure. The more commonly used 4 KiB granule allows for 1 GiB and 2 MiB superpages. Arm also defines the System Memory Management Unit (SMMU), which extends memory virtualization mechanisms from the CPU to the bus, to restrict VM-originated direct-memory accesses (DMAs).

Interrupts. Arm virtualization acceleration spans the full platform, including the Generic Interrupt Controller (GIC). The GICv2 standard has two main components: a central distributor and a per-core interface. All interrupts are routed first to the distributor, which then forwards them to the interfaces. The distributor allows the configuration of interrupt parameters (e.g., priority, target CPU) and the monitoring of interrupt state, while the interface enables the core management of interrupts. GICv2 provides virtualization support only on the interface; there is a fully virtual interface with which the guests can directly interact without VM exits. The distributor, however, must be fully emulated. Furthermore, all interrupts must first be handled by the hypervisor, which can then inject them in the VM by writing to GIC list registers (LRs). These registers essentially take the place of the distributor for the virtual interface: when a given interrupt (along with metadata such as priority or state) is present on a register, it is forwarded to the virtual interface. The GICv2 spec limits the number of LRs to a maximum of 16. GICv3 and GICv4 provide support for direct delivery of hardware interrupts to VMs; however, this feature is only implemented for inter-processor interrupts (IPIs) and message-signaled interrupts (MSIs), i.e., interrupts implemented as write operations to special interrupt controller registers and propagated via the system interconnect. Standard wired interrupts, propagated by dedicated side-band signals, are still subject to the mentioned limitation, i.e., hypervisor interrupt injection through the list register.
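To make the list-register mechanism concrete, the following is a minimal sketch, not taken from any of the assessed hypervisors, of how an EL2 hypervisor injects a hardware interrupt into the running VM by programming a GICv2 list register. The field layout follows the GICv2 architecture specification; the GICH base address, the fixed count of four LRs, and the function name are illustrative assumptions.

```c
/*
 * Sketch: injecting a hardware interrupt into a guest via a GICv2 list
 * register (LR). Field layout per the GICv2 spec; base address assumed.
 */
#include <stdint.h>

#define GICH_BASE        0xF9040000u                 /* assumed GICH base   */
#define GICH_LR(n)       (*(volatile uint32_t *)(GICH_BASE + 0x100 + 4 * (n)))
#define NUM_LRS          4                           /* GIC-400 provides 4  */

#define LR_VID(id)       ((uint32_t)(id) & 0x3ffu)          /* virtual INTID  */
#define LR_PID(id)       (((uint32_t)(id) & 0x3ffu) << 10)  /* physical INTID */
#define LR_PRIO(p)       (((uint32_t)(p) & 0x1fu) << 23)
#define LR_STATE_MASK    (3u << 28)
#define LR_STATE_PENDING (1u << 28)
#define LR_HW            (1u << 31)                         /* hardware IRQ   */

/* Returns 0 on success; -1 if all LRs are busy and the interrupt must be
 * queued in software until the guest drains a list register. */
int vgic_inject_hw_irq(uint32_t intid, uint8_t prio)
{
    for (int i = 0; i < NUM_LRS; i++) {
        if ((GICH_LR(i) & LR_STATE_MASK) == 0) {   /* LR invalid, i.e., free */
            GICH_LR(i) = LR_HW | LR_STATE_PENDING | LR_PRIO(prio) |
                         LR_PID(intid) | LR_VID(intid);
            return 0;
        }
    }
    return -1;
}
```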
B. Static Partitioning Virtualization (SPV)

Static partitioning is the practice of, either at build or initialization time, distributing all platform resources to different subsystems. This can be materialized in many shapes and forms, depending on the hardware primitives. Virtualization is a natural enabler for the static partitioning architecture, due to the strong encapsulation guarantees and flexible resource assignment. Hypervisors designed for the static partitioning use case (or providing such a configuration) have three fundamental properties: (i) exclusive assignment of virtual CPUs to physical CPUs (i.e., no scheduler); (ii) static allocation, assignment, and mapping of all hypervisor and VM memory at build or initialization time; and (iii) direct assignment of devices to VMs (passthrough) and exclusive allocation of their interrupts to the same VM. To implement this efficiently, these hypervisors are highly dependent on virtualization hardware support both at the CPU and platform level (e.g., SMMU). SPH also have non-functional requirements centered around minimizing interrupt latency and inter-VM interference. Thus, over the past few years, there have been efforts to enhance SPH with mechanisms to address these requirements. These include cache coloring and, analogously to what has been done for x86 [14], direct injection in Arm processors. Furthermore, it is important for the code base to be minimal and to follow industry coding standards (e.g., MISRA); this eases functional safety (FuSa) certification efforts.

Cache Coloring. In SPH, VMs still share microarchitectural resources such as the last-level cache (LLC). The behavior and memory access pattern of one VM might result in the eviction of another VM's cache lines, impacting the latter's hit rate and consequently its execution time. Thus, there is the need to partition shared caches, assigning each partition to a different VM. While in the past Armv7 processors provided hardware means to apply this partitioning by way of per-master cache locking, modern-day Arm CPUs do not provide those facilities. A solution is cache coloring, a software technique for index-based cache partitioning [15]. Cache coloring explores the intersection of the virtual addresses' cache index and page number when creating virtual-to-physical memory mappings. Each color is a specific bit pattern in this intersection that maps only to specific cache sets. Thus, hypervisors can control
which cache sets are assigned to a given VM by selecting was initially designed for servers and desktops, but has found
which physical pages are mapped to it. By exclusively assign- also adoption on embedded applications. For embedded and
ing a cache partition (i.e., group of cache sets or colors) to a automotive applications, Xilinx has led the implementation
given VM, cache coloring fully eliminates the conflict misses of Xen Dom0-less. With this novel approach, it is possible
resulting from inter-VM contention. Cache coloring can also to have a Xen deployment without any Dom0, booting all
be applied to the hypervisor itself by assigning it one or more guests directly from the hypervisor and statically partitioning
specific colors. the system. A patch for guest and hypervisor cache coloring
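As a concrete illustration of the index/page-number intersection just described (a sketch under assumptions, not code from the evaluated hypervisors): on this paper's platform the shared L2 is 1 MiB, 16-way, with 64 B lines, so it has 1024 sets indexed by PA[15:6]; with 4 KiB pages the bits shared between the set index and the page number are PA[15:12], i.e., 16 possible colors (the evaluation restricts itself to eight of them to avoid partitioning the L1). The helper names and the free-page lookup below are hypothetical.

```c
/* Sketch: page-color arithmetic and color-restricted stage-2 mapping. */
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT  12
#define NUM_COLORS  16   /* PA[15:12] on a 1 MiB / 16-way / 64 B-line LLC */

static inline unsigned page_color(uint64_t pa)
{
    return (unsigned)((pa >> PAGE_SHIFT) & (NUM_COLORS - 1));
}

/* Hypothetical helpers provided by the hypervisor's memory allocator. */
extern bool page_is_free(uint64_t pa);
extern void s2_map_4k(uint64_t ipa, uint64_t pa);

/* Back one guest-physical (IPA) page only with host pages whose color is in
 * the VM's color mask. Note the side effect discussed in the text: coloring
 * forces 4 KiB mappings, so 2 MiB/1 GiB superpages can no longer be used. */
int map_colored_page(uint64_t ipa, uint16_t vm_color_mask,
                     uint64_t pool_base, uint64_t pool_size)
{
    for (uint64_t pa = pool_base; pa < pool_base + pool_size;
         pa += (1ull << PAGE_SHIFT)) {
        if ((vm_color_mask & (1u << page_color(pa))) && page_is_free(pa)) {
            s2_map_4k(ipa, pa);
            return 0;
        }
    }
    return -1;   /* no free page of an allowed color */
}
```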
Direct Interrupt Injection. Direct interrupt injection is a new support [23] is available. There is also a SIG working to-
technique implemented in Arm-based SPH to eliminate the wards facilitating downstream FuSa certifications by fostering
need of the hypervisor mediating interrupt injection. With this multiple initiatives within the community including MISRA
technique, the hypervisor passes through the physical GIC refactoring, or providing the option of running Zephyr [24] as
CPU interface and routes all interrupts directly to the VM Dom0. Besides Armv8-A, Xen also supports x86, and Armv8-
by configuring the CPU to trigger interrupt traps directly at R and RISC-V ports are underway.
EL1, i.e., kernel mode. The hypervisor must still emulate Bao Hypervisor. Bao [8] is an open-source static partitioning
the shared distributor to ensure isolation between VMs, i.e., hypervisor that was made publicly available in 2020. It imple-
prevent misconfiguration of a given VM’s interrupts by another ments the pure static partitioning architecture, i.e., a minimal,
VM. This allows physical interrupts to be directly delivered thin-layer of privileged software which leverages the existing
to the VM with no hypervisor intervention, reducing latency ISA virtualization primitives to partition the hardware. Bao
to native execution levels. The forfeiting of interrupts should has no scheduler and does not rely on any external libraries
not be a major issue as SPH do not directly manage devices. or privileged VM (e.g., Linux), consisting on a standalone
However, SPH still need to communicate internally using IPIs. component which depends only on standard firmware to
Direct interrupt injection implementations address this issue initialize the system and perform platform-specific tasks such
by leveraging standard software-delegated exception interface as power management. Bao originally targeted Armv8-A [8].
(SDEI) [16] events instead of directly using IPIs. SDEI is The mainline now includes support for RISC-V [25], Armv7-
implemented by firmware, allowing the hypervisor to register A, and Armv8-R ports are in the making. Bao was specifically
an event during initialization. The hypervisor can then trigger designed to provide strong real-time and safety guarantees.
the event by issuing a system call to firmware (via a secure It implements hardware partitioning mechanisms to guarantee
monitor call instruction, SMC), which will result in diverting true freedom from interference, i.e., cache coloring (VM and
execution to a predefined hypervisor handler, similarly to Unix hypervisor), and direct interrupt injection. There are ongoing
signals. In reality, firmware maps these events to its own efforts to implement memory throttling.
secure reserved IPIs since, as part of TrustZone [16], the GIC
provides further facilities to reserve interrupts to EL3. seL4 CAmkES VMM. seL4 is a formally verified microkernel
[26]. Its design model revolves around the use of capabilities.
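The following sketch condenses the idea behind direct injection on one physical CPU; it is our illustration, not the Jailhouse or Bao implementation. Physical IRQs are routed to EL1 by clearing HCR_EL2.IMO, and the real GIC CPU interface is mapped into the guest's stage-2 where the guest expects a CPU interface. Register and bit names follow the Armv8-A manual, while the stage-2 mapping helper and the addresses passed to it are assumptions.

```c
/* Sketch: enabling direct interrupt injection for the currently running VM. */
#include <stdint.h>

#define HCR_IMO   (1ull << 4)   /* when set, physical IRQs are taken to EL2 */

static inline uint64_t read_hcr_el2(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, hcr_el2" : "=r"(v));
    return v;
}

static inline void write_hcr_el2(uint64_t v)
{
    __asm__ volatile("msr hcr_el2, %0" : : "r"(v));
    __asm__ volatile("isb");
}

/* Hypothetical stage-2 mapping helper. */
extern void s2_map_device(uint64_t ipa, uint64_t pa, uint64_t size);

void enable_direct_irq_injection(uint64_t guest_cpuif_ipa, uint64_t phys_gicc_pa)
{
    /* Give the guest the real GIC CPU interface instead of the virtual one. */
    s2_map_device(guest_cpuif_ipa, phys_gicc_pa, 0x2000);

    /* Stop routing physical IRQs to EL2: they now land directly in the
     * guest's EL1 vector table, with no hypervisor trap. The distributor
     * stays emulated, and the hypervisor's own cross-CPU signalling must
     * move to SDEI events delivered by EL3 firmware, as described above. */
    write_hcr_el2(read_hcr_el2() & ~HCR_IMO);
}
```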
C. Static Partitioning Hypervisors (SPH)
When used as a hypervisor, seL4 executes in hypervisor mode
Jailhouse Hypervisor. Jailhouse [7], [17] is an open-source (e.g, EL2) and exposes extra capabilities and APIs to manage
hypervisor developed by Siemens. Unlike traditional baremetal virtualization functionality [27]. A user-level VMM uses its
hypervisors, Jailhouse leverages the Linux kernel to boot and resource capabilities to create VMs. As of this writing, only
initialize the system and uses a kernel module to install the the seL4 CAmkES VMM [28], [29] code is open-source.
hypervisor. Once Jailhouse is activated, it runs as a baremetal Each CAmkES VMM manages a single VM. One current
component, taking full control over the hardware. Jailhouse issue of the CAmkES VMM is that, although it supports
has no scheduler and only leverages the ISA virtualization multicore VMs, each VMM runs as a single thread pinned to a
primitives to partition hardware resources across multiple single CPU. seL4 supports x86, Armv7/8-A and RISC-V, but
isolated domains, a.k.a. “cells”. Guest OSes or baremetal the latter is not supported by CAmkES VMM. In CAmkES,
applications running inside cells are called “inmates”. The resources are statically allocated to each component using
mainline includes support for x86 and Armv7/8-A, and a work- capabilities. Originally, seL4 provided only a priority-based
in-progress RISC-V port [18]. The research community has preemptive scheduler. The newest MCS kernel extends it with
been actively contributing with mechanisms to enhance pre- scheduling context capabilities, allowing time management
dictability, namely: cache coloring, DRAM bank partitioning policies to be defined in user space [30]. Cache coloring has
[19], memory throttling, and device quality of service (QoS) also been implemented in seL4 [31], not only at the user/VM
regulation [20]. An unofficial fork including these features is level, but also for the kernel, but it was not publicly available at
available [21]. Direct injection [22] was also implemented. the time of writing. seL4 has formal proofs for its specification,
Xen (Dom0-less) Hypervisor. Xen [5] is an open-source hy- implementation from C to binary, and security properties [32],
pervisor widely used in a broad range of application domains. [33]. There are also ongoing efforts to extend the formal
A key distinct feature of Xen is its dependency on a privi- verification to prove the absence of covert timing channels
leged VM (Dom0) that typically runs Linux, to manage non- [34]. Finally, CAmkES is being deprecated in the near future
privileged VMs (DomUs) and interface with peripherals. Xen in favor of the seL4 Core Platform (seL4CP) [35]. seL4CP
Fig. 1: Architectural overview of the assessed hypervisors: Jailhouse, Xen (Dom0-less), Bao and seL4 CAmkES VMM
will also provide support for per-VM user-mode VMMs1 while a quad-core Cortex-A53 running at 1.2 GHz, a GIC-400
promising to alleviate the performance overhead of CAmkES. (GICv2) featuring 4 list registers, and an MMU-500 (SM-
MUv2). Cores have private 32KiB separate L1 instruction
III. METHODOLOGY AND EXPERIMENTAL SETUP
and data caches, and share a L2 1MiB unified cache. It also
A. Methodology includes a programmable logic (PL) component (i.e., FPGA).
Selected Hypervisors. We have selected four open-source
SPH (Fig.1). Jailhouse and Bao were designed for the static Hypervisors configuration. We made an effort to use the latest
partitioning use case; both are open-source and target Arm versions of each SPH. Still, we applied a few patches to
platforms. Xen Dom0-less is a novel deployment that al- Jailhouse, Xen, and Bao to include features such as coloring
lows directly booting multiple VMs (bypassing Dom0) and or direct injection, which are not yet fully merged. Further,
passthrough of peripherals to VMs. Finally, seL4 is a well- we had to make small adjustments to all SPH to enable
established open-source microkernel, which can be used as a homogeneous configurations (e.g., uniforming VM memory
hypervisor in combination with a user-level VMM. The seL4 map), allow direct guest access to PMUs, or instrumenting
CAmkES VMM is an open-source reference VMM implemen- hypervisors for specific experiments. For each SPH, we lever-
tation with static allocation of resources. These systems are ac- aged the default configuration for the target SoC, with some
tively maintained, adopted for commercial purposes, and there tweaked options such as disabling debug and logging features.
is a fair amount of information about them. We have excluded There were, however, specific adjustments that were made on a
other open-source SPH that do not support Armv8-A such as per-hypervisor basis. For example, to remove or minimize the
the SPH architecture pioneer Quest-V [36]–[38], and ACRN invocation of a scheduler in Xen, we used the null scheduler
[39], as well as open-source hypervisors that don’t explicitly and disabled trapping of wait-for-interrupt (WFI) instructions;
target static partitioning (e.g., KVM [6], Xvisor [40]). We have in seL4, since it was not possible to disable the timer tick,
excluded microkernels such as NOVA [41] due to the lack of we configured the tick with a period of about 5 seconds.
availability of an open-source reference user-space VMM, and We compiled all hypervisors with GCC 11.2, with the default
because we believe seL4 serves as a faithful representative of optimization level defined by each hypervisor’s build system.
the microkernel architecture. TrustZone-assisted hypervisors All these SPH configurations and modifications are available
[42]–[44] were left out due to multicore scalability issues and clearly discernible in the provided artifact [12].
and lack of active maintenance. Finally, we have excluded
commercial products (e.g., PikeOS, LynxSecure) as these often VM configuration. VM configurations are as similar as
require licenses the authors did not have access to, and that possible, mainly w.r.t. number of vCPUs and memory. For
would limit wide access to the study artifacts. Jailhouse and seL4-VMM, where memory must be manually
allocated, we set memory regions aligned to 2 MiB. The
Empirical Evaluation. The evaluation focuses on perfor-
only device assigned to each VM is a UART. We evaluated
mance, interrupt latency, inter-VM communication latency and
two different classes of VMs: (i) large VMs running Linux
bandwidth, boot time, and code size. We also assess the effect
(v5.14), as representative of rich, Unix-like OSs; and (ii) small
of interference and of the available mitigation mechanism
VMs running baremetal applications or FreeRTOS (v10.4), as
(i.e., cache coloring). Although we consider virtual device
representative of critical workloads. When cache coloring is
performance, IO interference, and applied security techniques
enabled, we assign half of the colors (four out of eight2 ) to the
such as stack canaries or guards, data execution prevention or
VM executing the benchmark, three colors to the interference
control-flow integrity very relevant, these are out of scope of
application, and one color to the hypervisor (just supported in
this work. We advocate for a follow up study as future work.
Bao and Xen). Note that color assignment configuration can
B. Experimental setup significantly impact the final measurements for all metrics. In
Hardware Platform. Experiments were carried out on a Xil- real deployments, the color assignment should be carefully
inx ZCU104, featuring a Zynq Ultrascale+ SoC. It includes defined based on the profile of the final system.
1 Only after the bulk of this work was carried out, virtualization support in seL4CP was made openly available. At the time of writing, it still appears to be in a beta stage and not as mature as CAmkES.

2 We consider only eight cache colors while, in truth, the target platform allows for 16. We do this to avoid color assignment configurations that would partition the L1 cache.
Interference Workload. When evaluating memory hierarchy benchmark, while large operates over a considerable input data
interference, we use a custom baremetal guest which continu- set, emulating a real-world application scenario.
ously writes a buffer with the size of the LLC (1MiB). Unless
Base Performance Overhead. Fig. 2 presents the relative per-
noted otherwise, this interference guest runs on a VM with
formance degradation for the MiBench AICS. For each bench-
two vCPUs. We stress that although parameterized to cause a
mark, below the plotted bars, we present the average absolute
significant level of interference, the observed effects caused by
execution time for the native execution. The first observation
the interference workload do not necessarily reflect the worst
is that, independently of the hypervisor, different benchmarks
case that could be achieved if further fine-tuned.
are affected to different degrees. Secondly, Jailhouse, Xen, and
Measurement tools. We use the Arm PMU to collect microar- Bao incur a negligible performance penalty, i.e., less than 1%
chitectural events on benchmark execution. The selected events across all benchmarks. Although seL4 CAmkES-VMM also
include instruction count, TLB accesses and refills, cache presents a small overhead for most benchmarks, the overhead
access and refills, number of exceptions taken, and number of can reach up to 7%.
interrupts triggered; we register the exception level on which
For a virtualized system configured with a single guest
these events occur. For the Linux VMs, we use the perf tool
VM, there are two main possible sources of overhead. The
[45] to measure the time and to collect microarchitectural
first source is the increase in TLB miss penalty due to
events. For baremetal or RTOS VMs, we use the Arm Generic
the second stage of translation, since it can, in the worst
Timer, with a resolution of 10 ns, and a custom PMU driver.
case, increase the number of memory accesses in a page-
C. Threats to validity walk by a factor of four. Second, the overhead of trapping
Experiments were independently conducted by two re- to the hypervisor and performing interrupt injection, e.g.,
searchers. Each used a different ZCU104 platform and pre- timer tick interrupt. Additionally, the pollution of caches and
agreed VM configurations (cross-checked). We have contacted TLBs by the hypervisor might also affect guest performance.
key individuals and/or maintainers as representatives of each To further understand the behavior of the benchmarks, in
SPH community. We have received replies from all of them, particular the larger overhead of the CAmkES-VMM, we have
which led to a few iterations and repetition of some exper- collected a number of microarchitectural events. Fig. 3 shows
iments. Overall, the comments and issues raised by these them normalized to the number of executed instructions. We
individuals are reflected in the presented ideas and results. highlight two events whose increase is highly correlated with
Despite all efforts, these experiments may still be subject to the degradation observed: hypervisor L2 cache refills (Fig.
latent inaccuracies. We will open source all artifacts to enable 3a) and guest TLB misses (Fig. 3b), with Pearson correlation
independent validation of the results. This study may also coefficients of up to 0.94 and 0.96, respectively.
include limitations on the generalization to other platforms. An important hypervisor feature to minimize the impact of
For the hardware platform, we argue both the SoC (Zynq two-stage translation is to leverage superpages. By inspecting
Ultrascale+) and the Cortex-A53 are representative of others hypervisor code, we concluded that only CAmkES-VMM does
used in automotive and industrial settings (e.g., NXP i.MX8 or not have support for 2MiB superpages. This justifies the higher
Renesas R-Car M3). To corroborate this, we have also carried number of TLB misses. Notwithstanding, to corroborate this
out the performance and interrupt latency experiments for the argument, we have configured the other SPH to preclude the
Bao hypervisor in an NXP i.MX8QM, which features the GIC- use of superpages. As expected, we observed an increase
500 (GICv3). The obtained results are fully consistent with in the performance degradation (and TLB misses) similar to
those presented in Sections IV and V. Furthermore, we argue CAmkES-VMM (Fig. 4). We still observed a gap of up to 2%
next generation platforms, such as i.MX9 featuring Cortex- between CAmkES-VMM and the other SPH; this is related to
A55 CPUs, implement very similar microarchitectures. the aforementioned interrupt handling and injection overheads,
i.e., a consequence of the microkernel design: more costly
IV. SPH: P ERFORMANCE switches between VM and VMM and a high number of VMM
We start by assessing the performance degradation3 of a to microkernel calls for managing and inject the interrupts.
single-core Linux VM atop each SPH. The main results are This is confirmed by Figures 3c and 3d, which show the
depicted in Figures 2, 3, and 4. We then evaluate the system hypervisor-to-guest executed instruction ratio and the number
under interference to understand the effectiveness of microar- of exceptions taken by the hypervisor, respectively. For these
chitectural isolation mechanisms available in each SPH. events, seL4 has a higher ratio when compared to the other
Selected Benchmark. We use the MiBench Embedded Bench- SPH. We further investigate interrupt injection in Section V.
marks’ Automotive and Industrial Control Suite (AICS) [46].
These benchmarks are intended to emulate the environment of Takeaway 1. SPH do not incur in meaningful performance
embedded applications such as airbag controllers and sensor impacts due to: (i) modern hardware virtualization support;
systems. Each test has two variants: small operates in a (ii) 1-to-1 mapping between virtual and physical CPUs; and
reduced input data set representing a lightweight use of the (iii) minimal traps. However, one key aspect is that SPH must
have support for / make use of superpages to minimize TLB misses and page-table walk overheads.

3 Performance degradation is the ratio between the total execution time of the benchmark running atop the hypervisors and native execution.
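Spelling out footnote 3, the relative degradation percentages plotted in Fig. 2 can be written as (our reading of the definition; the exact formula is not given in the text):

```latex
\text{degradation}(\%) = \left(\frac{T_{\mathrm{virtualized}}}{T_{\mathrm{native}}} - 1\right) \times 100
```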
Fig. 2: Relative performance degradation for the MiBench Automotive and Industrial Control Suite. (Average native execution times, small/large inputs: qsort 22.24/219.46 ms; susanc 4.74/18.40 ms; susane 5.14/33.46 ms; susans 23.45/297.08 ms; bitcount 20.72/252.75 ms; basicmath 100.47/1496.70 ms.)
Fig. 3: MiBench AICS microarchitectural events: (a) hypervisor L2 cache misses per instruction; (b) guest iTLB misses per instruction; (c) hypervisor/guest instruction ratio; (d) hypervisor exceptions per instruction.

Fig. 4: MiBench AICS without the use of superpages on second-stage translation: (a) % performance degradation; (b) guest iTLB misses per instruction.

coloring can only reduce interference but not completely mitigate it. In these experiments, the interference workload runs continuously; however, in a more realistic scenario, it might be intermittent. The improvement in predictability achieved by coloring is reflected in the difference between the base experiment results (bars in Fig. 2 and +interf in Fig. 5) and the respective variants with coloring enabled (+col in Fig. 5). The lower the difference, the higher the predictability. For example, in the case of susanc-small, we observed that without coloring, the variation can go up to 105 percentage points (pp), while when coloring is enabled, the observed overhead is around 58%, which corresponds to a variation of 38 pp compared to the configuration with coloring enabled but without interference. Nevertheless, we observed that cache misses are essentially reduced to the same level as when coloring is enabled but without interference. Clearly, the observed interference is not only due to cache-line contention. There are points of contention at deeper levels of the memory hierarchy, e.g., buses and the memory controller [47], or even in internal LLC structures [48]. Finally, results on Xen and Bao demonstrate that hypervisor coloring has no substantial benefit, as it only reduces performance degradation due to interference by at most 1% (omitted due to lack of space).
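For context, the interference workload used throughout these experiments (Section III) boils down to a loop of the following shape; this is an illustrative sketch, not the exact benchmark code, with the buffer size matching the 1 MiB LLC of the target platform.

```c
/* Sketch of the interference workload: a baremetal guest that continuously
 * writes a buffer the size of the LLC (1 MiB), evicting the victim VM's
 * cache lines. Loop structure and names are illustrative. */
#include <stdint.h>
#include <stddef.h>

#define LLC_SIZE   (1u << 20)   /* 1 MiB shared L2 on the Zynq UltraScale+ */
#define LINE_SIZE  64u

static volatile uint8_t buffer[LLC_SIZE] __attribute__((aligned(LINE_SIZE)));

void interference_main(void)
{
    for (;;) {
        /* touch one byte per cache line so every pass streams through the
         * whole LLC and keeps evicting the other VM's lines */
        for (size_t i = 0; i < LLC_SIZE; i += LINE_SIZE) {
            buffer[i] = (uint8_t)i;
        }
    }
}
```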

Performance under interference. We also evaluate inter- Takeaway 2. Multicore memory hierarchy interference sig-
VM interference and the effectiveness of cache coloring at nificantly affects guests’ performance. Cache partitioning via
both guest and hypervisor levels. Fig. 5 plots the results page coloring is not a silver bullet as despite fully elimi-
under interference (+interf ), with coloring enabled (+col), nating inter-core conflict misses, it does not fully mitigate
and with interference and coloring enabled (+interf+col). seL4 interference (up to 38 pp increase in relative overhead).
CAmkES VMM shows no results for coloring enabled as this
feature is not openly available yet. V. SPH: I NTERRUPT L ATENCY
There are four conclusions to be drawn. Firstly, interference As discussed in Section II-A, the existing GIC virtualization
significantly affects the benchmark execution over all hyper- support is not ideal for MCS: hypervisors have to handle and
visors. As expected, this is explained by a significant increase inject all interrupts and must actively manage list registers
in L2 cache misses. On Jailhouse, Xen, and Bao performance when the number of pending interrupts is larger than the phys-
is degraded by a similar factor, i.e., to a maximum of about ical list registers. This is of particular importance to guarantee
105%; seL4-VMM is more susceptible to interference, reach- the correct interrupt priority order which might be critical for
ing up to 125% in the worst case. This pertains to the fact an RTOS [49]. In this section, we investigate the overhead
that, given that seL4-VMM executes a much higher number of each SPH in the interrupt latency, their susceptibility to
of instructions, the interference also impacts the execution interference, and the effectiveness of cache coloring. Then, we
of the hypervisor. Secondly, coloring, per se, significantly evaluate the direct injection technique and analyze interrupt
impacts performance (up to about 20%). This seems logical priority support as well as virtual IPI latencies.
given that coloring (i) forces the hypervisor to use 4KiB Methodology. To measure interrupt latency, we used a custom
pages, reducing TLB reach, and (ii) reduces the available lightweight baremetal benchmark, which measures the latency
cache space, which for working sets larger than LLC increases of a periodic interrupt triggered by the Arm Generic Timer.
memory system pressure (i.e., L2 cache misses). Thirdly, The timer is programmed in auto-reload mode, to continuously
Fig. 5: Performance degradation and L2 cache misses per instruction for the MiBench AICS under interference and coloring.

VMM; (ii) a system call from the VMM to inject the interrupt in the VM (i.e., write the list register); (iii) another to "reply" to the exception, resuming the VM; and (iv) a final one where the VMM waits for a message signaling a new VM event or interrupt, resulting in a final context-switch back to the VM. We have also concluded that seL4 does not use a GIC feature that would allow guests to directly deactivate4 the physical interrupt, resulting in an extra trap.
Fig. 6: Base interrupt latency.

trigger an interrupt at each 10 ms. The interrupt handler reads the value of the timer, i.e., it measures the time elapsed since the interrupt was triggered. Each measurement is carried out with cold L1 caches. To achieve this, after each measurement, we flush the instruction cache. During the 10 ms, we also prime the L1 data cache with useless data.

Takeaway 3. Due to the lack of efficient hardware support for directly delivering interrupts to guests in Arm platforms, all SPH increase the interrupt latency by at least one order of magnitude. However, by design, SPH such as Jailhouse and Bao are able to achieve the lowest latencies as they provide an optimized path for hardware interrupt injection.
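The measurement loop described above can be sketched as follows; this is an illustration of the method, not the actual benchmark code. It uses the EL1 physical timer registers defined by the Armv8-A generic timer, and record_sample() is a hypothetical logging helper.

```c
#include <stdint.h>

#define PERIOD_MS 10

extern void record_sample(uint64_t latency_ns);   /* hypothetical */

static inline uint64_t cntpct(void)   { uint64_t v; __asm__ volatile("mrs %0, cntpct_el0"    : "=r"(v)); return v; }
static inline uint64_t cntfrq(void)   { uint64_t v; __asm__ volatile("mrs %0, cntfrq_el0"    : "=r"(v)); return v; }
static inline uint64_t get_cval(void) { uint64_t v; __asm__ volatile("mrs %0, cntp_cval_el0" : "=r"(v)); return v; }
static inline void set_cval(uint64_t v) { __asm__ volatile("msr cntp_cval_el0, %0" : : "r"(v)); }

static uint64_t period_ticks;

/* Called from the IRQ vector for the EL1 physical timer (PPI 30). */
void timer_irq_handler(void)
{
    /* ticks elapsed since the programmed deadline = interrupt latency */
    uint64_t latency_ticks = cntpct() - get_cval();
    record_sample((latency_ticks * 1000000000ull) / cntfrq());

    set_cval(get_cval() + period_ticks);   /* "auto-reload": arm next period */
    /* between interrupts the benchmark flushes the I-cache and primes the
     * L1 D-cache so that every measurement starts from cold caches */
}

void timer_init(void)
{
    period_ticks = (cntfrq() / 1000u) * PERIOD_MS;
    set_cval(cntpct() + period_ticks);
    uint64_t ctl = 1;   /* ENABLE = 1, IMASK = 0 */
    __asm__ volatile("msr cntp_ctl_el0, %0" : : "r"(ctl));
}
```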
Base Latency. Fig. 6 depicts the violin plots for the custom interrupt latency under interference, including the baseline
benchmark running atop each SPH. From the baseline of about results of Fig. 6 for relative comparison as solo. Analyzing
200 ns, Bao and Jailhouse incur the smallest increase, albeit the effects of VM interference on interrupt latency (interf ),
significant, to an interrupt latency of about 4x (840 ns) and 5x we observed that Bao latency increases to an average of
(1090ns), respectively. Xen shows an increase of about 14x 7260 ns, Jailhouse to 7730 ns, Xen to 23000 ns, and seL4-
(2800 ns). The variance observed in these three systems is VMM to 85940 ns. It corresponds to an increase of 36x, 38x,
negligible. The difference observed between Jailhouse/Bao and 115x, and 430x, respectively, compared to the base latency.
Xen is justified by the interrupt injection path being highly It is also worth noting that the variance also increases. When
optimized in the former, while more generic in Xen. We enabling coloring (col), we measured no significant difference
confirmed this by studying the source code and assessing the in interrupt latency compared to the base case. However,
number of instructions executed by each hypervisor on the when enabling cache coloring in the presence of inter-VM
interrupt handling and injection path: while Jailhouse and Bao interference (interf+col), there is a visible improvement in
execute around 200 instructions, Xen executes about 1050. average latency and variance. However, note that the observed
seL4-VMM presents the largest interrupt latency (47x, 9400 variance does not constitute a measure of predictability. As
ns), an order of magnitude higher than Jailhouse and Bao. explained in Section IV, predictability is reflected in the differ-
The variance of the latency is also affected. This can be ence between the interf and interf+col results and respective
explained by the interrupt handling and injection mechanism baselines, i.e., solo and col. Finally, by applying coloring also
of a microkernel architecture. In the other SPH, each interrupt to the hypervisor (interf+col+hypcol), Bao latency is reduced
results in a single exception taken at EL2, where the interrupt to almost no-interference levels with negligible variance. Xen
is handled and injected in the VM; virtualization support is latency also drops considerably to an average of 6300 ns.
leveraged such that no further traps occur. In CAmkES VMM The observed interrupt latency under interference can be
it results in four traps to the microkernel: (i) the first due to 4 Deactivating an interrupt in the GIC means marking it as handled, enabling
the interrupt that results in forwarding it as a message to the the distributor to forward it to the CPU when it occurs again.
Fig. 7: Interrupt latency under interference and cache coloring.

Fig. 8: L2 cache misses for the interrupt latency benchmark: (a) guest L2 cache misses; (b) hypervisor L2 cache misses.

Fig. 9: Interrupt latency with direct injection enabled.

Fig. 10: Time between the handling of different priority interrupts triggered simultaneously, i.e., for interrupt N in the X-axis, the time between the arrival of interrupts N-1 and N.
Takeaway 5. The direct injection technique is effective in
mostly explained by L2 cache misses. Fig. 8 shows the L2 addressing the shortcomings of GIC interrupt virtualization,
cache misses for both guest and hypervisor during interrupt as results clearly demonstrated interrupt latency overhead is
latency measurement. We can see that interference increases reduced to near native latencies.
guest L2 cache misses, but that cache coloring can lower them
back to the base case values. However, this is not the case Priority Handling. For studying the support of SPH delivering
for hypervisor L2 cache misses. For the base case, there are interrupts in the correct priority order, we have implemented
no cache misses for the hypervisor, which increases substan- a PL device which can be used to trigger up to 16 simulta-
tially under interference. Despite VM coloring contributing neous interrupts, and a custom benchmark that assigns each
to reduce hypervisor L2 cache misses, only by coloring the a different priority. It starts by triggering the eight lowest
hypervisor level, it is possible to minimize L2 cache misses for priority interrupts. When handling the first, it triggers the eight
the hypervisor. On Bao, L2 cache misses are fully eliminated, highest priority interrupts. This would force the hypervisor
but not on Xen5 , which might explain why latency does not to refill the four available LRs with the new higher priority
reduce to non-interference levels. interrupts, and refill them in priority order as LRs become
available. The benchmark verifies if the priority order was
Takeaway 4. Interrupt latency increases tenfold under the kept and measures the arrival interval between each interrupt.
interference workload. Applying cache coloring to VMs We verified that only Xen and Bao guarantee the delivery of
proves very beneficial, but for it to be fully effective, it is interrupts in the correct priority order. By inspecting the code,
imperative to reserve a color for the hypervisor itself. we have confirmed that both seL4-VMM and Jailhouse fill
the GIC LRs following a FIFO policy. Furthermore, the seL4-
Direct Injection. We evaluate the effectiveness of the direct
VMM does not even commit the interrupt priorities configured
injection technique, implemented only in Jailhouse and Bao.
in the virtual GIC distributor to hardware, precluding the
Fig. 9 depicts the results. The first conclusion is that for the
arrival of physical interrupts in the correct priority order. Fig.
base case, i.e., no interference, the interrupt latency is near
10 shows that across all hypervisors if multiple interrupts
to native (about 210 ns). Indeed, we have confirmed that
are delivered simultaneously, there is an increase by several
during the execution of the benchmark, there are no traps to
orders of magnitude in the arrival time of the first and second
the hypervisor. Next, we observed that interference somewhat
interrupts, which is less than 700 ns for the baremetal case.
increases latency and its variance, but much less than in the
This larger increase is justified by the fact that the hypervisors
previous experiments. Finally, we concluded that by enabling
must handle all interrupts before the guest starts handling the
coloring, it is possible to lower the average latency to near
first interrupt. Another observation is that there is a periodic
native (243 and 232 ns for Bao and Jailhouse, respectively),
increase in the interval of arrival. We have concluded this is
however, there is still some variance due to the interference.
the point at which there are no pending interrupts left in the
5 At the time of writing, Xen’s coloring patch was still under review. Thus,
LRs, which triggers the hypervisor to refill these registers with
the assessed implementation may contain some imprecisions that are likely
to be fixed by the time the patch is merged. previously spilled pending interrupts.
Fig. 11: Average cost for each send IPI operation component (trap, emulation, IPI).

Takeaway 6. Only Xen and Bao respect interrupt priority Fig. 12: Inter-VM notification latencies.
order. Additionally, we observe that for all SPH, if multiple
interrupts are triggered simultaneously, there is a partial VI. SPH: I NTER -VM COMMUNICATION
priority inversion as lower priority interrupts take precedence For inter-VM communication, SPH typically only provide
due to the need for the hypervisor to handle and inject them. statically allocated shared memory. This is usually coupled
Inter-Processor Interrupts. IPIs (SGIs) are critical for multi- with an asynchronous notification mechanism signaled as an
core VM performance. For a vCPU to send an SGI, the guest interrupt. All four SPH provide such mechanisms. Next, we
must write a virtual GIC distributor register. This will trap analyze inter-VM notification latency and transfer throughput.
to the hypervisor that must emulate the access and forward Inter-VM latency. Fig. 12 shows the inter-VM notification
the event to the target core, where the SGI is injected via list latency, reflecting the time since the notification is issued until
registers. We use a custom baremetal benchmark to measure the execution of the handler in the destination VM. The relative
IPI latency. It works by measuring the time between when the differences between the latencies for each SPH are similar to
source vCPU writes the distributor register and when the final those observed for passthrough interrupts and IPIs. Jailhouse
IPI handler starts executing. It also measures the overhead of achieves the lower latency (1500 ns), followed by Bao (1900
the trap. We instrument the SPH to sample the time the IPI is ns). Xen shows an intermediate value of 4600 ns, while seL4
forwarded internally; this signals the end of the emulation and CAmkES VMM is significantly larger than others (average
translates the overhead of injecting the interrupt in the target. 18000 ns). Studying the internals of the implementations,
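For reference, the guest-side operation that triggers this trap-and-emulate path on GICv2 is a single store to the distributor's GICD_SGIR register (offset and field layout per the GICv2 specification); the sketch below uses an assumed distributor base address and helper name.

```c
/* Sketch: sending an SGI from a guest vCPU on GICv2. The write to the
 * (virtualized) distributor traps to the hypervisor, which emulates it and
 * injects the SGI on the target core via a list register. */
#include <stdint.h>

#define GICD_BASE        0xF9010000u          /* assumed distributor base */
#define GICD_SGIR        (*(volatile uint32_t *)(GICD_BASE + 0xF00))

#define SGIR_TARGET(cpu) (1u << (16 + (cpu)))  /* CPUTargetList, one bit per core */

static inline void send_sgi(unsigned target_cpu, unsigned sgi_id /* 0..15 */)
{
    __asm__ volatile("dsb ishst" ::: "memory");  /* make prior stores visible first */
    GICD_SGIR = SGIR_TARGET(target_cpu) | (sgi_id & 0xfu);
}
```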
Figure 11 shows that IPI latency increases significantly for we note that while most hypervisors synthesize and inject
all SPH. While the baremetal IPI latency is around 260 ns, the virtual interrupts, Jailhouse uses non-allocated physical
it reaches 2258 ns for Jailhouse, 4157 ns for Xen, 2711 ns interrupts for these notifications. Thus, to send one, Jailhouse
for Bao, and 10868 ns for the CAmkES VMM. However, the only sets the interrupt pending in the GIC distributor. This is
costs of the register access emulation and interrupt injection significantly advantageous when combined with direct injec-
are not proportional across all SPH. For example, Bao has the tion. Note that enabling direct injection in Bao would preclude
lowest emulation and event forwarding times, but the overall the use of this mechanism. For seL4, we highlight the impact
IPI latency is higher than Jailhouse’s. This means that the of the microkernel architecture since atop VM/VMM context
interrupt injection path on Bao is slower than on Jailhouse. By switches, we observe additional overheads due to inter-VMM
inspecting the source of both hypervisors, we have observed communication. Lastly, we see that interference increases all
that Bao immediately forwards the SGI event to the target core, latencies accordingly and that coloring can mitigate it.
performing all interrupt injection operations in the target core. Inter-VM throughput. In Fig. 13, we evaluate the throughput
Jailhouse, in turn, manages the interrupt injection structures of bulk data transfers via a shared memory buffer. The bench-
at the source core and only then signals the target vCPU by mark transmits 16 MiB of random data through a shared buffer
writing the list register. Xen follows the same approach as with varying sizes. When the source VM finishes writing
Jailhouse, but presents higher overhead. The CAmkES VMM the buffer, it either signals the destination VM via a shared
has the highest overhead due to the large number of system memory flag or via an asynchronous notification, and waits for
calls the VMM issues to the microkernel (in total, 7). Four are a signal back to start writing the next chunk. For the polling
issued before the event forwarding, and the rest only after the scenario, the obtained throughput is very similar across all
SGI is forward to the target core. All in all, the access to the hypervisors; this confirms that are no significant differences
virtual distributor is more expensive than the IPI itself. in how they allocate and map memory or configure memory
attributes. Throughput is stable (1500 MiB/s) until the buffer
Takeaway 7. IPI latency reflects the same overheads of size surpasses the LLC size (1 MiB), dropping to about
external interrupts. Future Arm platforms might reduce them 1300 MiB/s. For the asynchronous scenario, throughput is
with GICv4.1 [50]. In the short term, direct injection might significantly impacted when using smaller buffer sizes, given
alleviate this issue. However, both approaches fall short the high number of synchronization points that reflects the
of achieving native latency as they still pay the price of observed interrupt overheads. Finally, we note that interference
emulating the write to the “IPI send” register. has no significant effect as long as the buffer size is kept
Fig. 13: Inter-VM communication throughput. (Throughput in MiB/s vs. buffer size in KiB; polling and interrupt-based synchronization; with and without interference.)

Fig. 14: Boot time for each stage by VM image region size. (Time in ms vs. VM image size in MiB; stages: FSBL, ATF, U-boot or ELF loader, hypervisor/root cell.)
this macro perspective, the other hypervisors add an almost
below about half the size of LLC. Beyond that, throughput is
constant offset to U-boot’s boot time, the largest being seL4-
reduced from 1300 to 850 MiB/s. Although not shown due to
VMM’s. We observe this overhead is not on the microkernel,
lack of space, using coloring does not prove beneficial, as the
but at user level, which nevertheless heavily interacts with the
throughput illustrated in Fig. 13 remains virtually unchanged.
microkernel to setup capabilities and kernel objects. We can
Takeaway 8. Inter-VM notification latencies are significant conclude that VM boot time has its bottleneck by the loading
and, as is the case for hardware interrupts, very susceptible to of guest images to memory, not the hypervisor logic.
the effects of interference. However, for bulk data transfers it FreeRTOS and Linux Boot Times. We also measure the boot
does not seem to significantly affect throughput if the shared time of (i) a small VM running FreeRTOS with a 90 KiB
buffer size is chosen on a range of about one-fourth to half image and (ii) a large VM with a Linux guest (built-in ramfs)
the LLC size (i.e., 256 KiB to 512 KiB). totaling 59 MiB of image size. For Jailhouse, the Linux VM
VII. SPH: BOOT TIME

System boot time is a crucial metric in industries such as automotive [51], [52], as critical components have strict timing requirements for becoming fully operational.

Platform's Boot Flow. The platform's boot flow [53] starts by executing ROM code, which loads the first-stage bootloader (FSBL) and enables the main cores. These initial boot stages set up the platform's basic infrastructure (e.g., clocks, DRAM) and load the TF-A and U-boot. U-boot then loads the hypervisor and, except for Jailhouse, the guest images. Bao and Xen directly boot guests after initialization. Jailhouse starts with the boot of the Linux root cell, which installs the hypervisor that then loads the guests. seL4's execution starts with an ELF loader, which loads all the images, initializes secondary cores, and sets up an initial set of page tables for the microkernel. The microkernel then initializes and hands control over to user space.
Total VM Boot Time. The hypervisor boot time is heavily dependent on the VM and how it is configured. We observed that the VM image size is one of the parameters with the highest impact on the hypervisor boot time, so we measure boot time as a function of VM image size. To understand the overhead of the hypervisor in the context of the complete boot flow, in Fig. 14 we plot the cumulative time for each boot stage. Here, we can confirm that in all hypervisors but Jailhouse, the bulk of the boot time is spent by U-boot. For Jailhouse, U-boot run time is constant, albeit large, as it always loads only the root cell's image; Jailhouse execution time then increases steeply while loading the VM image. From this macro perspective, the other hypervisors add an almost constant offset to U-boot's boot time, the largest being seL4-VMM's. We observe this overhead lies not in the microkernel, but at user level, which nevertheless heavily interacts with the microkernel to set up capabilities and kernel objects. We can conclude that the VM boot time bottleneck is the loading of guest images to memory, not the hypervisor logic.

FreeRTOS and Linux Boot Times. We also measure the boot time of (i) a small VM running FreeRTOS with a 90 KiB image and (ii) a large VM with a Linux guest (built-in ramfs) totaling 59 MiB of image size. For Jailhouse, the Linux VM is a non-root cell. In Table I, we present results for a single-guest and a dual-guest system. For the latter, both VMs boot simultaneously; thus, we did not run experiments for dual-guest with Jailhouse, because it launches VMs sequentially. Table I presents the absolute boot time for the guest's native and virtualized execution, highlighting the relative percentage increase compared to native execution. For the single-guest FreeRTOS VM, all hypervisors but Bao cause a non-negligible increase in boot time. The same happens with the single-guest Linux VM. For the dual-guest configuration, we concluded that the small VM is heavily affected for all hypervisors. Surprisingly, we observe that although the cost of booting a single FreeRTOS in Bao is negligible, this is not true for a dual-guest configuration: booting it alongside a Linux VM significantly increases its boot time, reaching overheads similar to those observed in Jailhouse's sequential boot.
TABLE I: Total boot time (ms) and relative increase compared to the baremetal case, for FreeRTOS and Linux VMs.

Guest      Config   Baremetal   Jailhouse            Xen                  Bao                  seL4-VMM
FreeRTOS   Single   1670.89     6242.18 / 173.58%    2338.24 / 39.94%     1716.23 / 2.71%      3496.19 / 109.24%
FreeRTOS   Dual     1670.89     N/A                  6887.88 / 312.23%    5734.04 / 143.17%    9291.02 / 456.05%
Linux      Single   7665.14     12284.92 / 60.27%    8533.88 / 11.33%     7805.54 / 1.83%      12629.79 / 64.77%
Linux      Dual     7665.14     N/A                  8707.15 / 13.59%     7895.95 / 3.01%      13086.86 / 70.73%
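The relative increase in Table I is computed against the native (baremetal) execution of the same guest; for instance, for the single-guest FreeRTOS VM on Bao, (1716.23 - 1670.89) / 1670.89 ≈ 2.71%, and for the single-guest Linux VM on Xen, (8533.88 - 7665.14) / 7665.14 ≈ 11.33%.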
Takeaway 9. The major bottleneck for the VM boot time is caused by the bootloader, not the hypervisors. Notwithstanding, the hypervisor can significantly increase the boot time of a critical VM (small RTOS) when booting it alongside a larger VM (e.g., in a dual-OS Linux+RTOS configuration).

VIII. SPH: CODE SIZE AND TCB

In MCS, the size of the hypervisor code, measured in source lines of code (SLoC), is critical. It should be minimal, as it is part of the trusted computing base (TCB) of all VMs. In this paper, we consider that a VM's TCB encompasses any component with sufficient privileges that, if compromised or malfunctioning, might be able to affect the safety and/or security properties of the VM. As well understood in the literature, a larger TCB typically has a higher number of bugs and a wider attack surface [54], resulting in a higher probability of vulnerabilities. It is important to understand that each VM has its own TCB; thus, the CAmkES VMM is only considered part of the managed VM's TCB, not the others'. Further, large code bases are impractical for certification, both from a technical and an economic perspective. To qualify a component assigned a safety integrity level (SIL), all components on which it depends must also be qualified to the same or higher SIL [4].
Methodology. We measured SLoC for the target configurations using cloc [55]. The Xen build system offers a make target to assess the SLoC for a specific configuration. However, it does not count header files, which we believe must be accounted for since they provide function-like macros and inline functions; we have modified the Xen makefile to measure headers. We have also extended the Jailhouse and Bao build systems with the same functionality. For seL4, we used the fully unified and pre-processed kernel source file to assess the microkernel code base. For the CAmkES VMM, given that its source code is scattered throughout multiple seL4 project libraries, we were not able to list its source code files from the build system. Instead, we used debug information from the final executable and inspected each source file to assess the included header files.

Code Size. Looking at Table II, we see that Bao and Jailhouse have the smallest code bases, of about 8400 and 9900 SLoC, respectively. Bao is implemented as a standalone component with no external dependencies. However, since part of Jailhouse's functionality is implemented as a Linux kernel module, we also account for it in the code base: it adds about 2180 SLoC, bringing Jailhouse's total code base to 12 KSLoC. For Xen we use a custom config with almost all features disabled, except a few such as coloring and static shared memory. It features the largest code base, with around 67 KSLoC. Finally, the seL4 microkernel has 14.5 KSLoC, while the CAmkES VMM can go up to 40K, i.e., almost 55 KSLoC in total. The visible difference between Bao and Jailhouse, and the seL4 microkernel and, especially, Xen, lies in the fact that the former were designed specifically for the static partitioning use case, while the latter aim at being more generic and adaptable. These differences are reflected in the binary size of each hypervisor.

TABLE II: Hypervisor SLoC count and binary code size.

                          C (.c)   C (.h)   Asm    Total (SLoC)   .text (KiB)
jailhouse   hypervisor    7308     2279     342    9929           79.3
            driver        2041     139      N/A    2180           20.1
xen                       57360    8127     1765   67342          451.5
bao                       5046     2840     537    8423           57.9
seL4        microkernel   14569    N/A      189    14758          224.7
            CAmkES VMM    20932    19291    N/A    40223          724.3
TCB. The hypervisor SLoC does not directly reflect the VM TCB. Although by design SPH such as Bao have a smaller SLoC count, the seL4-VMM is vastly superior from a security perspective: the shared TCB is limited only to the formally verified microkernel, because each VM is managed by a fully isolated VMM. From a FuSa certification standpoint, however, the VMM would still need to be considered. Moreover, seL4's formal proofs are limited to a set of kernel configurations, currently not including multicore. Regarding Jailhouse, despite its small size, the root cell is a privileged component of the system. It executes part of all VM management logic, being in the critical path for booting all other VMs; it is arguably part of all VMs' TCB, increasing it significantly [54]. Analogously, Xen must depart from true Dom0-less to leverage richer features (e.g., PV drivers, dynamic VM creation). Recently, the Xen community has ignited efforts to use a smaller OS, such as Zephyr [24], as Dom0, to refactor Xen to MISRA C, and to provide extensive requirements and test documentation [56].

Takeaway 10. Hypervisors specifically targeting static partitioning have the smallest code bases. Despite facilitating certification, none of the evaluated SPH provide other artifacts (e.g., requirements specification, coding standards). Xen is the first to take steps in this direction; nevertheless, seL4's formal proofs provide the most comprehensive guarantees.

IX. DISCUSSION AND FUTURE DIRECTIONS

In this section, we discuss some of the open issues and potential research directions to improve the guarantees of SPH.

Interference Mitigation Techniques. Cache coloring does not fully mitigate the effects of inter-core interference. Furthermore, coloring has inherent inefficiencies such as (i) precluding the use of superpages, (ii) increasing memory pressure, which affects performance and predictability, and (iii) internal fragmentation (exclusively assigning 1 out of N colors implicitly allocates 1/Nth of physical memory, a portion of which may remain unused for small RTOSs or the SPH). While the latter could be solved by employing cache bleaching [57] in heterogeneous platforms, to further minimize coloring bottlenecks, we advocate for SPH to adopt other proven, widely applicable contention mitigation mechanisms, e.g., bandwidth regulation implemented via PMU-based CPU throttling [58], [59]. We also stress the importance of including support for hardware extensions such as Arm's Memory Partitioning and Monitoring (MPAM) [11], [60], which provide flexible hardware means for partitioning cache space and memory bandwidth, and call for platform designers to include such facilities in their upcoming designs targeting MCS. Finally, we stress the need for instrumentation, analysis, and profiling tools [20], [61] that integrate with these hypervisors to help system designers understand the trade-offs and fine-tune these mechanisms (e.g., through automation).
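To make the bandwidth-regulation suggestion more tangible, the C sketch below outlines the core idea behind MemGuard-style PMU-based throttling [58]: give each core a per-period budget of memory transactions, preload the PMU counter so it overflows once the budget is spent, and park the core until the period is replenished. The event selection, register accessors, and timer/idle hooks are hypothetical placeholders, not the API of any of the evaluated SPH.

/*
 * Sketch of MemGuard-style memory-bandwidth regulation via PMU-based CPU
 * throttling [58]. All pmu_*/timer_* helpers, the event number, and
 * cpu_idle() are hypothetical placeholders for a hypervisor port layer;
 * it assumes the period timer interrupt can still be taken while parked.
 */
#include <stdint.h>
#include <stdbool.h>

#define REGULATION_PERIOD_US 1000u        /* e.g., a 1 ms regulation period */

static uint32_t budget_events;            /* per-core, per-period budget */
static volatile bool throttled;           /* set while the core is parked */

/* Hypothetical helpers provided by the hypervisor/platform layer. */
void pmu_setup_event(uint32_t event);     /* e.g., 0x17: L2D_CACHE_REFILL */
void pmu_write_counter(uint32_t value);
void pmu_enable_overflow_irq(void);
void timer_start_periodic_us(uint32_t period_us);
void cpu_idle(void);                      /* e.g., WFI until next interrupt */

void regulation_start(uint32_t events_per_period)
{
    budget_events = events_per_period;
    pmu_setup_event(0x17);
    /* Preload so the counter overflows after `budget_events` more events. */
    pmu_write_counter(UINT32_MAX - budget_events);
    pmu_enable_overflow_irq();
    timer_start_periodic_us(REGULATION_PERIOD_US);
}

/* PMU overflow: the core exhausted its budget before the period ended. */
void pmu_overflow_handler(void)
{
    throttled = true;
    while (throttled)
        cpu_idle();                       /* park until the period timer fires */
}

/* Period timer: replenish the budget and unpark the core. */
void period_timer_handler(void)
{
    pmu_write_counter(UINT32_MAX - budget_events);
    throttled = false;
}

Unlike coloring, such regulation requires no reshaping of guest memory layouts and can be applied per core, at the cost of throttling best-effort cores, so budgets must be sized per use case.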
Platform-Level Contention and Mitigation. None of the studied SPH manages traffic from peripheral DMAs. We advocate that SPH must provide contention mitigation mechanisms at the platform level, e.g., (i) leveraging QoS hardware [20], [62] available on the bus and (ii) controlling interference from DMA-capable devices or accelerators. Furthermore, since DMA masters still share SMMU structures (e.g., TLBs [63]), we hypothesize that bandwidth regulation techniques may fall short of efficiently mitigating interference at this level.

Interrupt Injection Optimization. Arm-based SPH's interrupt latency is mainly due to inadequate support in GICv2/3. GICv4 will provide direct interrupt injection support, but only for IPIs and MSIs. We want to raise awareness among Arm silicon makers and designers of the need for additional hardware support at the GIC level for direct injection of wired interrupts; the same holds for RISC-V [25]. Besides hardware support, we observed that simple SPH provide optimized interrupt injection paths. It is also possible to optimize this path in larger SPH (e.g., Xen) and in microkernels (e.g., by moving injection logic to the microkernel). Finally, Bao and Jailhouse implement direct interrupt injection; however, we must stress that using this technique severely hinders the ability of the SPH to manage devices or implement any functionality dependent on interrupts. A plausible research direction would be a hybrid approach, i.e., selectively enabling direct injection only on specific cores running critical guests while providing the more complex functionality on cores running non-critical guests.
Interrupt Priority Inversion Fix. As discussed in Section V, the studied SPH suffer from partial interrupt priority inversion because all currently pending interrupts are handled by the hypervisor and injected into the guest before it can service the highest-priority one. We advocate for a lightweight solution that dynamically sets the interrupt priority mask based on the priority of the last injected interrupt. This approach ensures the hypervisor only receives the next interrupt once the guest has handled the highest-priority one.
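A minimal sketch of this idea follows, assuming a GICv3-like priority model in which lower numerical values mean higher priority and the running priority can be masked (e.g., through ICC_PMR_EL1); the helper names and structures are hypothetical and not taken from any of the evaluated hypervisors.

/*
 * Sketch of the proposed priority-inversion mitigation: after injecting a
 * virtual interrupt, raise the physical priority mask to that interrupt's
 * priority, so only strictly higher-priority interrupts keep trapping to
 * the hypervisor until the guest signals completion. The gic_*/vgic_*
 * helpers and structures are hypothetical.
 */
#include <stdint.h>

#define GIC_IDLE_PRIORITY 0xFFu  /* mask value that lets every priority through */

struct vcpu;                      /* opaque per-virtual-CPU state */

/* Hypothetical low-level accessors. */
void gic_set_priority_mask(uint8_t prio);   /* e.g., writes ICC_PMR_EL1 */
void vgic_inject(struct vcpu *vcpu, uint32_t intid, uint8_t prio);

/* Hypervisor path that injects a pending physical interrupt into the guest. */
void inject_guest_interrupt(struct vcpu *vcpu, uint32_t intid, uint8_t prio)
{
    vgic_inject(vcpu, intid, prio);
    /*
     * Masking at `prio` blocks same- or lower-priority physical interrupts
     * from trapping again before the guest services the injected one.
     */
    gic_set_priority_mask(prio);
}

/* Invoked when the guest completes the injected interrupt (e.g., via the
 * GIC maintenance interrupt). A full implementation would instead restore
 * the mask to the priority of the highest remaining injected interrupt. */
void on_guest_eoi(struct vcpu *vcpu, uint32_t intid)
{
    (void)vcpu;
    (void)intid;
    gic_set_priority_mask(GIC_IDLE_PRIORITY);
}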
Critical VM Boot Priority. Section VII highlights the issue of critical VM boot time overhead when booted under a dual-OS configuration. We advocate for the development of boot mechanisms that prioritize the boot of small critical VMs. However, as noted in Jumpstart [52], such a mechanism must encompass the full boot flow and be optimized across stages and components, since the boot time bottleneck lies in the image loading process performed by the bootloader, not the hypervisor.

Per-Partition Hypervisor Replica. Memory contention highly affects interrupt latency but can be minimized by assigning different colors to the VMs and the hypervisor. Notwithstanding, coloring the hypervisor may prove wasteful and insufficient to address other interference channels internal to the hypervisor. We advocate for à la multikernel [64] implementations such as the one implemented in seL4, where the hypervisor image is replicated per cache partition [31], fully closing internal channels. For SPH with a small enough footprint, the memory consumption or boot time costs should not be prohibitive.
Architecture Flexibility. Purely monolithic SPH (e.g., Jailhouse or Bao) have smaller code bases at the cost of feature richness and flexibility. The same holds for Xen, i.e., many widely-used rich features are absent when it is configured as an SPH (to minimize code size). On the other hand, the seL4 microkernel architecture is much more flexible, as it allows for an isolated user-space VMM per guest, providing more robust isolation and customization; however, this comes at the cost of non-negligible latencies. We advocate for novel architectures that combine microkernels' flexibility and strong fault encapsulation with SPH's simplicity and minimalist latencies by hosting per-partition VMMs directly at the hypervisor privilege level. Such a design could arguably be achieved by combining multikernel-like architectures [64] and per-core memory protection mechanisms (e.g., Armv9 RME's GPT [65], or RISC-V PMP [66]) statically configured by firmware.

Full IO Passthrough. Pure static partitioning supports only passthrough IO. However, as highlighted by [7], there is a critical problem in providing full IO passthrough when controls over IO resources such as clocks, resets, power, or pin-muxes cannot be securely partitioned or shared, e.g., if their MMIO registers reside on the same frame or if they are configured via platform management co-processors oblivious of the SPH's VMs. Thus, SPH should provide controlled guest access to these resources by emulation or through standard interfaces such as SCMI [67]. Nevertheless, this would require including drivers in the hypervisor, increasing its code base. Again, we urge hardware designers to provide hardware primitives that enable SPH to pass through IO resource controls.
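As an illustration of what controlled guest access by emulation could look like, the C sketch below traps guest writes to a shared clock-control register frame and lets each VM flip only the bits of the clocks assigned to it; the register layout, trap-handler signature, and per-VM masks are hypothetical and purely illustrative.

/*
 * Sketch of mediated ("emulated") guest access to a shared clock-control
 * MMIO frame. When gate bits for several VMs' devices live in one 4 KiB
 * page, the page cannot be passed through; the hypervisor traps writes and
 * applies a per-VM ownership mask instead. Register layout, handler
 * signature, and helpers are hypothetical; locking is omitted for brevity.
 */
#include <stdbool.h>
#include <stdint.h>

#define CLK_CTRL_REG_OFF 0x100u  /* offset of the shared clock-gate register */

struct vm {
    uint32_t clk_owned_mask;     /* CLK_CTRL bits this VM is allowed to toggle */
};

/* Hypothetical physical-register accessors. */
uint32_t mmio_read32(uintptr_t addr);
void mmio_write32(uintptr_t addr, uint32_t val);

/* Trap handler for guest writes to the clock-controller frame. */
bool clk_ctrl_emulate_write(struct vm *vm, uintptr_t base, uintptr_t off,
                            uint32_t guest_val)
{
    if (off != CLK_CTRL_REG_OFF)
        return false;            /* not emulated here: inject a fault or ignore */

    uint32_t cur = mmio_read32(base + off);
    /* Keep bits the VM does not own; take owned bits from the guest value. */
    uint32_t merged = (cur & ~vm->clk_owned_mask) |
                      (guest_val & vm->clk_owned_mask);
    mmio_write32(base + off, merged);
    return true;
}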
X. RELATED WORK

There are several hypervisor analyses in the context of embedded and MCSs, but none provides a cross-sectional analysis and comparison of SPH. Some works focus on a single hypervisor, while others evaluate a single metric or feature. In [40], the authors compare the performance of Xvisor with Xen and KVM. Others have evaluated the effectiveness of cache coloring and bandwidth reservations in Xvisor [59]. Similarly, in [19], the authors evaluate cache and DRAM bank coloring in Jailhouse. Other works have evaluated Jailhouse interrupt latency [68] or VM interference [69]. There are also studies on the feasibility of using Xen and KVM as real-time hypervisors [70], but mainly for x86. Little has been published regarding the new Xen Dom0-less and cache coloring features, but results can be found in [71]. Evaluation of the seL4 CAmkES VMM has also been done for performance and interrupt latency [29]. There have been works providing a qualitative analysis of MCS hypervisors, contrasting architectural approaches and highlighting future trends [72], while others lay out guidelines on how to choose such a hypervisor in industrial settings [51].

XI. CONCLUSION

We have conducted the most comprehensive empirical evaluation of open-source SPH to date, focusing on key metrics for MCS. With it, we drew a set of observations that (i) will help industrial practitioners understand the trade-offs of SPH and (ii) raise awareness in the research and open-source communities of the still open problems in SPH. We are opening all artifacts to enable independent validation of results and to encourage further exploration of SPH.
XII. ACKNOWLEDGMENTS

We would like to express our gratitude to the reviewers for their valuable feedback and suggestions, as well as to our friendly shepherd for guiding us in making final improvements. Additionally, we appreciate the time and thoughtful input from all the representatives of SPH, namely Ralf Ramsauer (Jailhouse), Stefano Stabellini (Xen), and Gernot Heiser (seL4/CAmkES-VMM). José Martins was supported by FCT grant SFRH/BD/138660/2018. This work is supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope UIDB/00319/2020, and the European Union's Horizon Europe research and innovation program under grant agreement No 101070537, project CROSSCON (Cross-platform Open Security Stack for Connected Devices).

REFERENCES

[1] J. Cerrolaza et al., "Multi-Core Devices for Safety-Critical Systems: A Survey," ACM Computing Surveys, 2020.
[2] M. Staron, Contemporary Software Architectures: Federated and Centralized. Springer International Publishing, 2021.
[3] A. Burns and R. Davis, "A Survey of Research into Mixed Criticality Systems," ACM Computing Surveys, 2017.
[4] A. Esper et al., "An industrial view on the common academic understanding of mixed-criticality systems," Real-Time Systems, 2018.
[5] J. Hwang et al., "Xen on ARM: System Virtualization Using Xen Hypervisor for ARM-Based Secure Mobile Phones," in Proc. of Consumer Communications and Networking Conference, 2008.
[6] C. Dall and J. Nieh, "KVM/ARM: The Design and Implementation of the Linux ARM Hypervisor," ACM SIGARCH Computer Architecture News, 2014.
[7] R. Ramsauer et al., "A Novel Software Architecture for Mixed Criticality Systems," in Digital Transformation in Semiconductor Manufacturing, 2020.
[8] J. Martins et al., "Bao: A Lightweight Static Partitioning Hypervisor for Modern Multi-Core Embedded Systems," in Proc. of Workshop on Next Generation Real-Time Embedded Systems (NG-RES), 2020.
[9] S. VanderLeest and D. White, "MPSoC hypervisor: The safe & secure future of avionics," in Proc. of Digital Avionics Systems Conference (DASC), 2015.
[10] P. Burgio et al., "A software stack for next-generation automotive systems on many-core heterogeneous platforms," Microprocessors and Microsystems, 2017.
[11] F. Rehm et al., "The Road towards Predictable Automotive High-Performance Platforms," in Proc. of Design, Automation and Test in Europe Conference (DATE), 2021.
[12] J. Martins, "ESRGv3/shedding-light-static-partitioning-hypervisors: v0.1.0," 2023. [Online]. Available: https://doi.org/10.5281/zenodo.7696937
[13] Arm, "Learn the architecture: AArch64 Virtualization," https://developer.arm.com/documentation/den0125/latest, 2022.
[14] A. Gordon et al., "ELI: Bare-Metal Performance for I/O Virtualization," SIGPLAN Notices, 2012.
[15] G. Gracioli et al., "A Survey on Cache Management Mechanisms for Real-Time Embedded Systems," ACM Computing Surveys, 2015.
[16] Arm, "Software Delegated Exception Interface (SDEI)," https://developer.arm.com/documentation/den0054/latest, 2021.
[17] R. Ramsauer et al., "Look Mum, no VM Exits! (Almost)," in Proc. of Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT), 2017.
[18] ——, "Static Hardware Partitioning on RISC-V – Shortcomings, Limitations, and Prospects," in Proc. of IEEE World Forum on Internet of Things, 2022.
[19] T. Kloda et al., "Deterministic Memory Hierarchy and Virtualization for Modern Multi-Core Embedded Systems," in Proc. of Real-Time and Embedded Technology and Applications Symposium (RTAS), 2019.
[20] P. Sohal et al., "E-WarP: A System-wide Framework for Memory Bandwidth Profiling and Management," in Proc. of Real-Time Systems Symposium (RTSS), 2020.
[21] "jailhouse-rt: BU-maintained version of the jailhouse partitioning hypervisor with real-time features." [Online]. Available: https://github.com/rntmancuso/jailhouse-rt
[22] A. Biondi et al., "SPHERE: A Multi-SoC Architecture for Next-Generation Cyber-Physical Systems Based on Heterogeneous Platforms," IEEE Access, 2021.
[23] G. Corradi, "Xen on Arm: Real-Time Virtualization with Cache Coloring," in Proc. of Embedded World Conference, 2020.
[24] "Zephyr project," Feb 2023. [Online]. Available: https://www.zephyrproject.org/
[25] B. Sa et al., "A First Look at RISC-V Virtualization from an Embedded Systems Perspective," IEEE Transactions on Computers, 2021.
[26] G. Klein et al., "seL4: Formal Verification of an OS Kernel," in Proc. of ACM Symposium on Operating Systems Principles (SOSP), 2009.
[27] G. Heiser, "The seL4 Microkernel: An Introduction," The seL4 Foundation, 2020.
[28] G. Klein et al., "Formally Verified Software in the Real World," Communications of the ACM, 2018.
[29] J. Millwood et al., "Performance Impacts from the seL4 Hypervisor," in Proc. of the Ground Vehicle Systems Engineering and Technology Symposium, 2020.
[30] A. Lyons et al., "Scheduling-Context Capabilities: A Principled, Light-Weight Operating-System Mechanism for Managing Time," in Proc. of European Conference on Computer Systems (EuroSys), 2018.
[31] Q. Ge et al., "Time Protection: The Missing OS Abstraction," in Proc. of European Conference on Computer Systems (EuroSys), 2019.
[32] T. Murray et al., "seL4: From General Purpose to a Proof of Information Flow Enforcement," in Proc. of IEEE Symposium on Security and Privacy (S&P), 2013.
[33] G. Klein et al., "Comprehensive Formal Verification of an OS Microkernel," ACM Transactions on Computer Systems, 2014.
[34] G. Heiser et al., "Towards Provable Timing-Channel Prevention," ACM SIGOPS Operating Systems Review, 2020.
[35] ——, "Can We Put the "S" Into IoT?" in Proc. of IEEE World Forum on Internet of Things, 2022.
[36] Y. Li et al., "A Virtualized Separation Kernel for Mixed Criticality Systems," SIGPLAN Notices, 2014.
[37] R. West et al., "A Virtualized Separation Kernel for Mixed-Criticality Systems," ACM Transactions on Computer Systems, 2016.
[38] S. Sinha and R. West, "Towards an integrated vehicle management system in DriveOS," ACM Transactions on Computer Systems, 2021.
[39] H. Li et al., "ACRN: A Big Little Hypervisor for IoT Development," in Proc. of International Conference on Virtual Execution Environments (VEE), 2019.
[40] A. Patel et al., "Embedded Hypervisor Xvisor: A Comparative Analysis," in Proc. of Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2015.
[41] U. Steinberg and B. Kauer, "NOVA: A Microhypervisor-Based Secure Virtualization Architecture," in Proc. of European Conference on Computer Systems, 2010.
[42] S. Pinto et al., "LTZVisor: TrustZone is the Key," in Proc. of Euromicro Conference on Real-Time Systems (ECRTS), 2017.
[43] J. Martins et al., "µRTZVisor: A Secure and Safe Real-Time Hypervisor," Electronics, 2017.
[44] S. Pinto and N. Santos, "Demystifying Arm TrustZone: A Comprehensive Survey," ACM Computing Surveys, 2019.
[45] "perf: Linux profiling with performance counters." [Online]. Available: https://perf.wiki.kernel.org/index.php/Main_Page
[46] M. Guthaus et al., "MiBench: A free, commercially representative embedded benchmark suite," in Proc. of International Workshop on Workload Characterization (WWC), 2001.
[47] T. Moscibroda and O. Mutlu, "Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems," in Proc. of USENIX Security Symposium, 2007.
[48] P. Valsan et al., "Taming Non-Blocking Caches to Improve Isolation in Multicore Real-Time Systems," in Proc. of Real-Time and Embedded Technology and Applications Symposium (RTAS), 2016.
[49] W. Hofer et al., "Sloth: Threads as Interrupts," in Proc. of Real-Time Systems Symposium (RTSS), 2009.
[50] Arm Ltd., "Arm Generic Interrupt Controller v3 and v4 - Virtualization," 2022.
[51] E. Hamelin et al., "Selection and evaluation of an embedded hypervisor: Application to an automotive platform," in Proc. of European Congress of Embedded Real Time Software and Systems, 2020.
[52] A. Golchin and R. West, "Jumpstart: Fast Critical Service Resumption for a Partitioning Hypervisor in Embedded Systems," in Proc. of Real-Time and Embedded Technology and Applications Symposium (RTAS), 2022.
[53] Xilinx, "Zynq UltraScale+ Device: Technical Reference Manual," https://docs.xilinx.com/v/u/en-US/ug1085-zynq-ultrascale-trm, 2020.
[54] S. Biggs et al., "The Jury Is In: Monolithic OS Design Is Flawed: Microkernel-Based Designs Improve Security," in Proc. of Asia-Pacific Workshop on Systems, 2018.
[55] Al Danial, "cloc - count lines of code," https://github.com/AlDanial/cloc.
[56] A. Mygaiev and S. Stabellini, "Xen FuSa SIG update," in Xen Project Developer and Design Summit, 2021. [Online]. Available: https://www.youtube.com/watch?v=XMNaIWZ-2sU
[57] S. Roozkhosh and R. Mancuso, "The potential of programmable logic in the middle: Cache bleaching," in Real-Time and Embedded Technology and Applications Symposium (RTAS), 2020.
[58] H. Yun et al., "MemGuard: Memory bandwidth reservation system for efficient performance isolation in multi-core platforms," in Proc. of Real-Time and Embedded Technology and Applications Symposium (RTAS), 2013.
[59] P. Modica et al., "Supporting temporal and spatial isolation in a hypervisor for ARM multicore platforms," in Proc. of International Conference on Industrial Technology (ICIT), 2018.
[60] Arm Ltd., "Arm Architecture Reference Manual Supplement - Memory System Resource Partitioning and Monitoring (MPAM), for A-profile architecture," 2022.
[61] G. Ghaemi et al., "Governing with Insights: Towards Profile-Driven Cache Management of Black-Box Applications," in Proc. of Euromicro Conference on Real-Time Systems (ECRTS), 2021.
[62] M. Zini et al., "Profiling and controlling I/O-related memory contention in COTS heterogeneous platforms," Software: Practice and Experience, 2022.
[63] A. Panchamukhi and F. Mueller, "Providing task isolation via TLB coloring," in Real-Time and Embedded Technology and Applications Symposium (RTAS), 2015.
[64] A. Baumann et al., "The multikernel: A new OS architecture for scalable multicore systems," in Proc. of ACM Symposium on Operating Systems Principles (SOSP), 2009.
[65] X. Li et al., "Design and verification of the Arm confidential compute architecture," in USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2022.
[66] D. Lee et al., "Keystone: An open framework for architecting trusted execution environments," in Proc. of European Conference on Computer Systems (EuroSys), 2020.
[67] Arm Ltd., "Arm System Control and Management Interface - Platform Design Document, Version 3.1," 2022.
[68] I. Pavic and H. Dzapo, "Virtualization in multicore real-time embedded systems for improvement of interrupt latency," in Proc. of International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2018.
[69] J. Danielsson et al., "Testing Performance-Isolation in Multi-core Systems," in Proc. of Annual Computer Software and Applications Conference (COMPSAC), 2019.
[70] L. Abeni and D. Faggioli, "Using Xen and KVM as real-time hypervisors," Journal of Systems Architecture, 2020.
[71] S. Stabellini, "Xen Cache-Coloring: Interference Free Real-Time Systems," in Open Source Summit (North America), 2020. [Online]. Available: https://www.youtube.com/watch?v=9cA0QK2CdwQ
[72] M. Cinque et al., "Virtualizing mixed-criticality systems: A survey on industrial trends and issues," Future Generation Computer Systems, 2022.
