KVM Forum 2013 Nested Virtualization Shadow Turtles

This document summarizes a presentation on nested virtualization. It discusses the shadow virtual machine control structure (shadow VMCS), which lets a guest hypervisor (L1) access its VMCS without causing exits to the host hypervisor (L0), improving the performance of nested virtualization. It covers new features for nested VMX such as nested EPT and the VMX preemption timer. The document also explains what a VMCS is, how nested VMX works, and how the shadow VMCS implementation eliminates exits when L1 accesses the VMCS.


Nested virtualization: shadow turtles
Orit Wasserman
Red Hat

KVM forum 2013


Agenda

• Nested virtualization (the turtles project) overview
• What's new in nested VMX?
• What is VMCS?
• VMCS and nested virtualization
• Shadow VMCS implementation

What is nested virtualization?
• L0 – hypervisor running directly on the hardware (host hypervisor)
• L1 – guest hypervisor or nested hypervisor
• L2 – nested guest
• VMCSx→y – VMCS used by Lx to run Ly (VMCSxy for short)
• The scope of the talk is limited to Intel x86 architecture.
[Diagram: two L2 guests running on L1, which runs on L0]

What is nested virtualization?
• Running multiple unmodified hypervisors, with their associated unmodified guest VMs, simultaneously on the x86 hardware
• x86 supports a single level of virtualization
• Does not support nesting in hardware (mainframe does)

Why?
• Operating systems may have built-in hypervisors (Windows 2008R2/2012R2, Linux/KVM)
• To be able to run another hypervisor in the cloud
• Security (e.g. Hypervisor level toolkit)
• Co-design of x86 hardware and system software
• Testing, demonstrating and debugging hypervisors
• Live migration of hypervisors
How?
• L0 multiplexes the hardware between L1 and L2, running
both as guests of L0 – without either being aware of it

How?
1. L0 runs L1 with VMCS0→1
2. L1 prepares VMCS1→2 and executes vmlaunch
3. vmlaunch traps to L0
4. L0 merges VMCS0→1 with VMCS1→2 into VMCS0→2
5. L0 launches L2
6. L2 causes a trap
7. L0 handles trap itself or forwards it to L1
8. ...
9. Eventually, L0 resumes L2
10. Repeat
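
The merge in step 4 and the routing decision in step 7 can be sketched as a toy model. This is only an illustration under simplifying assumptions: the `Vmcs` class, the `intercepts` set and the function names are hypothetical and not taken from KVM, and a real merge handles many more fields and control bits.

```python
from dataclasses import dataclass, field


@dataclass
class Vmcs:
    """Toy VMCS with only the parts needed for the sketch (hypothetical layout)."""
    guest_state: dict = field(default_factory=dict)
    host_state: dict = field(default_factory=dict)
    intercepts: set = field(default_factory=set)


def merge_vmcs(vmcs01: Vmcs, vmcs12: Vmcs) -> Vmcs:
    """Build VMCS0->2 from VMCS0->1 and VMCS1->2 (step 4).

    Assumptions of this sketch: the guest state is what L1 prepared for L2,
    the host state is L0's own (every exit lands in L0 first), and L2 must
    exit on anything that either L0 or L1 wants to intercept.
    """
    return Vmcs(
        guest_state=dict(vmcs12.guest_state),
        host_state=dict(vmcs01.host_state),
        intercepts=vmcs01.intercepts | vmcs12.intercepts,
    )


def route_exit(reason: str, vmcs12: Vmcs) -> str:
    """Decide who handles an L2 exit (step 7): L1 if it asked for it, else L0."""
    return "L1" if reason in vmcs12.intercepts else "L0"
```

In this model an exit that only L0 cares about (for example a host interrupt) never reaches L1, which is what keeps L1 unaware that it is being multiplexed.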

Nested VMX cost

• To handle a single L2 exit, L1 does many things: read and write the VMCS, disable interrupts, etc.
• Those operations can trap, leading to exit multiplication
• Exit multiplication: a single L2 exit can cause 40-50 L1 exits!
• Optimize: handle each exit faster and reduce the frequency of exits
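
A rough way to see where exit multiplication comes from is to count which operations in an L1 exit handler trap to L0. The sketch below is a toy model: the operation names and the handler contents are made-up illustrations chosen to land in the 40-50 range quoted above, not measurements.

```python
# Operations that trap to L0 when L1 executes them (assumed sets for this sketch).
TRAPS_BY_DEFAULT = {"vmread", "vmwrite", "vmresume"}
TRAPS_IF_VMCS_ACCESS_IS_FREE = {"vmresume"}


def l1_exits(handler_ops, trapping_ops):
    """Count how many L1 exits one run of the L1 exit handler causes."""
    return sum(1 for op in handler_ops if op in trapping_ops)


# Hypothetical handler: 30 VMCS reads, 12 VMCS writes, then resume L2.
handler = ["vmread"] * 30 + ["vmwrite"] * 12 + ["vmresume"]

exits_default = l1_exits(handler, TRAPS_BY_DEFAULT)
exits_without_vmcs_traps = l1_exits(handler, TRAPS_IF_VMCS_ACCESS_IS_FREE)
```

With these assumed numbers the handler causes 43 L1 exits per L2 exit, and only one if VMCS accesses stop trapping, which is why the VMCS access path is the natural place to optimize.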

What's new in nested VMX?
• Many bug fixes :)
• Lots of new tests in kvm-unit-tests (Arthur Chunqi Li)
• Nested EPT (Nadav Har'El/Gleb Natapov) – Gleb will talk about it next, don't miss it!

“Unrestricted guest” support for nested VMX

• The "unrestricted guest" feature was added to the VMX specification starting with Intel Westmere
• It allows KVM guests to run real-mode and unpaged-mode code natively in VMX mode when EPT is turned on
• With unrestricted guest there is no need to emulate the guest real-mode code in the vm86 container or in the emulator
• The guest big real mode code runs as it would natively
• By Jan Kiszka

VMX Preemption timer for nested VMX
• Enables setting a timer that runs while the VM is executing. When the timer expires, a vmexit occurs
• The timer is set to the VM's time slice
• Used to improve virtual machine scheduling because the VM won't need to exit on every timer interrupt (fewer exits)
• By Arthur Chunqi Li
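
As a minimal sketch of how a time slice could be turned into a timer value: on Intel hardware the VMX-preemption timer counts down once every 2^X TSC ticks, where X is reported in bits 4:0 of the IA32_VMX_MISC MSR, and the timer value is a 32-bit field. The function name and parameters below are hypothetical and not taken from KVM.

```python
def preemption_timer_value(slice_ns: int, tsc_khz: int, rate_shift: int) -> int:
    """Convert a time slice in nanoseconds to VMX-preemption timer ticks.

    tsc_khz is the TSC frequency in kHz (TSC ticks per millisecond) and
    rate_shift is the X value from IA32_VMX_MISC bits 4:0.
    """
    tsc_ticks = slice_ns * tsc_khz // 1_000_000  # 1 ms = 1_000_000 ns
    return min(tsc_ticks >> rate_shift, 0xFFFF_FFFF)  # clamp to the 32-bit field


# A 1 ms slice on an assumed 2.4 GHz TSC with X = 5.
ticks = preemption_timer_value(1_000_000, 2_400_000, 5)
```

Under these assumed inputs the timer is programmed with 75000 ticks, and the guest exits once when the slice ends rather than on every host timer interrupt.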

What is VMCS - Virtual Machine Control Structure

• Each vCPU has a structure/block for storing its state and the information needed for running it.
• VMCS state may be kept in on-chip memory
• Special VMX instructions to access it: VMREAD and VMWRITE
• It is divided into 4 sections:
  • Guest state
  • Host state
  • Control – fields to control VMExit/VMEntry behavior
  • Read Only – usually contains VMExit information
What is VMCS - Virtual Machine Control Structure (cont.)

• Fields are addressed through a special encoding, so their location can change between processor versions

Bits:   31-15           14-13   12   11-10   9-1     0
Field:  reserved (=0)   W       0    T       INDEX   A

Legend:
W (width of field): 00=16-bit, 01=64-bit, 10=32-bit, 11=natural-width
T (Type of field): 00=control, 01=read-only, 10=guest-state, 11=host-state
A (Access-type): 0=full, 1=high
(NOTE: Access-type must be 'full' for 16-bit, 32-bit, and 'natural' widths)
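
The encoding above can be decoded with a few shifts and masks. The following is a small sketch; the function and type names are hypothetical, and the sample encodings in the usage note (0x681E for guest RIP, 0x4402 for the VM-exit reason) come from the Intel SDM field list rather than from the slides.

```python
from typing import NamedTuple

WIDTHS = ("16-bit", "64-bit", "32-bit", "natural-width")      # W, bits 14:13
KINDS = ("control", "read-only", "guest-state", "host-state")  # T, bits 11:10


class FieldEncoding(NamedTuple):
    width: str
    kind: str
    index: int
    access: str


def decode_field(encoding: int) -> FieldEncoding:
    """Split a 32-bit VMCS field encoding into W, T, INDEX and A."""
    if encoding >> 15 or (encoding >> 12) & 1:
        raise ValueError("reserved bits 31:15 and bit 12 must be zero")
    access = "high" if encoding & 1 else "full"   # A, bit 0
    index = (encoding >> 1) & 0x1FF               # INDEX, bits 9:1
    kind = KINDS[(encoding >> 10) & 0x3]
    width = WIDTHS[(encoding >> 13) & 0x3]
    if access == "high" and width != "64-bit":
        raise ValueError("access type 'high' is only valid for 64-bit fields")
    return FieldEncoding(width, kind, index, access)
```

For example, `decode_field(0x681E)` yields a natural-width guest-state field with index 15, and `decode_field(0x4402)` yields a 32-bit read-only field with index 1.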
Nested VMX and VMCS accesses
• Every time the L1 hypervisor accesses VMCS1→2 it causes an exit
• Need to eliminate those exits
• One solution is to use a para-virtual nested hypervisor, as is done in the turtles project
• Binary patching – could be complicated, as VMREAD and VMWRITE are short instructions

Shadow VMCS
• Allows the L0 hypervisor to define a shadow VMCS
• This VMCS can be accessed without a vmexit in guest mode
• Removes the extra exit penalty for nested virtualization
• Was added in the Haswell architecture

Shadow VMCS implementation (1/3)
• The shadow VMCS format is processor-dependent, so L0 and L1 must access it only through the VMREAD and VMWRITE instructions
• To avoid hardware dependencies:
  • The software-defined VMCS1→2 format is part of the L1 address space
  • The processor-specific shadow VMCS format is part of the L0 address space
• L0 synchronizes the shadow VMCS content with the software-controlled VMCS1→2 format
• This design simplifies live migration of L1, which does not depend on the shadow VMCS layout

Shadow VMCS implementation (2/3)
• Sync process:
  • Before running L2 after switching from L1, all the changes L1 made must be copied from the shadow VMCS to VMCS1→2:
    • Load the shadow VMCS into the processor using VMPTRLD
    • Read each VMCS field using the VMREAD instruction
  • Before switching back to L1 after running L2, VMCS1→2 must be synced to the shadow VMCS:
    • Load the shadow VMCS into the processor using VMPTRLD
    • Write each VMCS field using the VMWRITE instruction
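
The two sync directions can be sketched with a stand-in for the processor. This is a toy model under stated assumptions: `FakeCpu`, its dictionary-backed VMCS regions and the field names are hypothetical, and only the instruction names VMPTRLD, VMREAD and VMWRITE come from the slides.

```python
class FakeCpu:
    """Stand-in for the processor: VMCS regions keyed by address, one current."""

    def __init__(self):
        self.regions = {}
        self.current = None

    def vmptrld(self, addr):
        # Make the VMCS at addr the current one.
        self.regions.setdefault(addr, {})
        self.current = addr

    def vmread(self, field_name):
        return self.regions[self.current].get(field_name, 0)

    def vmwrite(self, field_name, value):
        self.regions[self.current][field_name] = value


def sync_shadow_to_vmcs12(cpu, shadow_addr, vmcs12, fields):
    """Before running L2: pull what L1 wrote into the software VMCS1->2."""
    cpu.vmptrld(shadow_addr)
    for name in fields:
        vmcs12[name] = cpu.vmread(name)


def sync_vmcs12_to_shadow(cpu, shadow_addr, vmcs12, fields):
    """Before resuming L1: push the software VMCS1->2 into the shadow VMCS."""
    cpu.vmptrld(shadow_addr)
    for name in fields:
        cpu.vmwrite(name, vmcs12.get(name, 0))
```

Both directions cost one VMREAD or VMWRITE per field, so the cost of a sync grows with the number of fields copied.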
Shadow VMCS implementation (3/3)
• Reducing syncing cost:
  • Shadow only the necessary fields
  • Use a bitmap for fields that are shadowed for read
    • A field is synced in the first scenario (shadow VMCS to VMCS1→2) only if its bit is set
  • Use a bitmap for fields that are shadowed for write
    • A field is synced in the second scenario (VMCS1→2 to shadow VMCS) only if its bit is set
  • Use a flag to indicate that VMCS1→2 was changed by L0; this reduces how often the second scenario occurs
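
The bitmaps and the dirty flag can be sketched as follows. This is an illustrative model only: the class, the small integer field numbers and the method names are hypothetical, and the mapping of the read bitmap to the first scenario and the write bitmap to the second follows the wording of the bullets above rather than any particular implementation.

```python
class ShadowSync:
    """Toy model of bitmap-filtered syncing with a dirty flag."""

    def __init__(self, read_bitmap: int, write_bitmap: int):
        self.read_bitmap = read_bitmap    # bit n set: field n synced in scenario 1
        self.write_bitmap = write_bitmap  # bit n set: field n synced in scenario 2
        self.shadow = {}                  # stands in for the shadow VMCS
        self.vmcs12 = {}                  # software VMCS1->2
        self.vmcs12_dirty = False         # set when L0 changes VMCS1->2

    def l0_update_vmcs12(self, field_no, value):
        self.vmcs12[field_no] = value
        self.vmcs12_dirty = True

    def shadow_to_vmcs12(self) -> int:
        """First scenario: copy only fields whose read-bitmap bit is set."""
        copied = [f for f in self.shadow if (self.read_bitmap >> f) & 1]
        for f in copied:
            self.vmcs12[f] = self.shadow[f]
        return len(copied)

    def vmcs12_to_shadow(self) -> int:
        """Second scenario: skipped entirely unless L0 changed VMCS1->2."""
        if not self.vmcs12_dirty:
            return 0
        copied = [f for f in self.vmcs12 if (self.write_bitmap >> f) & 1]
        for f in copied:
            self.shadow[f] = self.vmcs12[f]
        self.vmcs12_dirty = False
        return len(copied)
```

Each method returns the number of fields it copied, which makes the saving visible: a clean VMCS1→2 costs no writes at all on the way back to L1.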

Results
• By Abel Gordon
• From the turtles paper (DRW stands for direct read/write):

What is still missing for nested VMX?
• Stability
• Nested VT-d, to allow device assignment in nested guests and improve I/O performance
• Running other hypervisors as L1 (nested hypervisor). ESX requires the "acknowledge interrupt on exit" control.
• Live migration
