linux/Documentation/virtual/kvm/ppc-pv.txt
The PPC KVM paravirtual interface
=================================

The basic execution principle by which KVM on PowerPC works is to run all kernel
space code in PR=1, which is user space. This way we trap all privileged
instructions and can emulate them accordingly.

Unfortunately that is also the downfall. There are quite a few privileged
instructions that needlessly return us to the hypervisor even though they
could be handled differently.

This is what the PPC PV interface helps with. It takes privileged instructions
and transforms them into unprivileged ones with some help from the hypervisor.
This cuts down virtualization costs by about 50% on some of my benchmarks.

The code for that interface can be found in arch/powerpc/kernel/kvm*

Querying for existence
======================

To find out if we're running on KVM or not, we leverage the device tree. When
Linux is running on KVM, a node /hypervisor exists. That node contains a
compatible property with the value "linux,kvm".
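
As a rough illustration of the check described above, the sketch below scans
the raw bytes of a "compatible" property for the string "linux,kvm". It relies
on the device tree convention that such a property is a sequence of
NUL-terminated strings. The helper name compatible_has() is hypothetical and
not a kernel API; how the property bytes are obtained (firmware, /proc, or an
in-kernel device tree accessor) is left open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the raw bytes of a "compatible" property contain the
 * string `want`. compatible_has() is a hypothetical helper for this sketch.
 */
bool compatible_has(const char *prop, size_t len, const char *want)
{
	size_t want_len = strlen(want);
	size_t pos = 0;

	while (pos < len) {
		/* Find the end of the current entry without leaving the property. */
		const char *nul = memchr(prop + pos, '\0', len - pos);
		size_t n = nul ? (size_t)(nul - (prop + pos)) : len - pos;

		if (n == want_len && memcmp(prop + pos, want, n) == 0)
			return true;
		pos += n + 1;	/* skip the entry and its terminating NUL */
	}
	return false;
}
```

A guest would call this with the contents of /hypervisor's compatible property
and only proceed to the PV setup if it returns true.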

Once you have determined that you're running under a PV capable KVM, you can
use hypercalls as described below.

KVM hypercalls
==============

Inside the device tree's /hypervisor node there's a property called
'hypercall-instructions'. This property contains at most 4 opcodes that make
up the hypercall. To issue a hypercall, just execute these instructions.

The parameters are as follows:

        Register        IN                      OUT

        r0              -                       volatile
        r3              1st parameter           Return code
        r4              2nd parameter           1st output value
        r5              3rd parameter           2nd output value
        r6              4th parameter           3rd output value
        r7              5th parameter           4th output value
        r8              6th parameter           5th output value
        r9              7th parameter           6th output value
        r10             8th parameter           7th output value
        r11             hypercall number        8th output value
        r12             -                       volatile

Hypercall definitions are shared in generic code, so the same hypercall numbers
apply for x86 and powerpc alike with the exception that each KVM hypercall
also needs to be ORed with the KVM vendor code which is (42 << 16).
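
To make the numbering rule concrete, here is a minimal sketch of how the value
loaded into r11 could be formed. The vendor code 42 comes from the text above;
the macro and function names are illustrative only, and the value 4 for
KVM_HC_PPC_MAP_MAGIC_PAGE is an assumption that should be checked against the
kernel headers in use.

```c
#include <stdint.h>

/* Vendor code from the text above; the PV_* names are illustrative. */
#define PV_KVM_VENDOR_CODE	42u
#define PV_HCALL_TOKEN(nr)	((PV_KVM_VENDOR_CODE << 16) | (uint32_t)(nr))

/* Assumed value of KVM_HC_PPC_MAP_MAGIC_PAGE; verify in your headers. */
#define PV_HC_MAP_MAGIC_PAGE	4u

/* Value to place in r11 for the generic hypercall number `nr`. */
uint32_t pv_hcall_token(uint32_t nr)
{
	return PV_HCALL_TOKEN(nr);
}
```

Under these assumptions, pv_hcall_token(PV_HC_MAP_MAGIC_PAGE) yields 0x2a0004.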

Return codes can be as follows:

        Code            Meaning

        0               Success
        12              Hypercall not implemented
        <0              Error
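
A caller might fold the table above into a small classifier like the one
sketched here. The enum and function names are hypothetical. Positive values
other than 12 are not described by the table, so the sketch reports them
separately rather than guessing at their meaning.

```c
/* Hypothetical status values for this sketch. */
enum pv_hcall_status {
	PV_HC_OK,
	PV_HC_UNIMPLEMENTED,
	PV_HC_ERROR,
	PV_HC_OTHER
};

/* Classify the return code found in r3 after a hypercall. */
enum pv_hcall_status pv_hcall_classify(long r3)
{
	if (r3 == 0)
		return PV_HC_OK;
	if (r3 == 12)
		return PV_HC_UNIMPLEMENTED;
	if (r3 < 0)
		return PV_HC_ERROR;
	return PV_HC_OTHER;	/* not covered by the table above */
}
```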

The magic page
==============

To enable communication between the hypervisor and guest there is a new shared
page that contains parts of supervisor visible register state. The guest can
map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.

With this hypercall issued the guest always gets the magic page mapped at the
desired location. The first parameter indicates the effective address when the
MMU is enabled. The second parameter indicates the address in real mode, if
applicable to the target. For now, we always map the page to -4096. This way we
can access it using absolute load and store instructions. The following
instruction reads the first field of the magic page:

        ld      rX, -4096(0)
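
The choice of -4096 can be checked with a little arithmetic. With a base of 0,
the displacement alone forms the effective address, so -4096 names the topmost
4 KiB page, and every byte of that page stays within a signed 16-bit
displacement. The sketch below only demonstrates this arithmetic; all names in
it are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAGIC_PAGE_DISP	(-4096)	/* displacement used in the text above */

/* Effective address of -4096(0) on a 64-bit target: the topmost 4 KiB page. */
uint64_t magic_page_ea64(void)
{
	return (uint64_t)(int64_t)MAGIC_PAGE_DISP;
}

/* Displacement of the field at byte offset `off` inside the page. */
int32_t magic_field_disp(uint32_t off)
{
	return MAGIC_PAGE_DISP + (int32_t)off;
}

/* True if `disp` fits a signed 16-bit load/store displacement. */
bool fits_d16(int32_t disp)
{
	return disp >= -32768 && disp <= 32767;
}
```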

The interface is designed to be extensible should there be need later to add
additional registers to the magic page. If you add fields to the magic page,
also define a new hypercall feature to indicate that the host can give you more
registers. Make use of the additional features only if the host supports them.

The magic page layout is described by struct kvm_vcpu_arch_shared
in arch/powerpc/include/asm/kvm_para.h.

Magic page features
===================

When mapping the magic page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE,
a second return value is passed to the guest. This second return value contains
a bitmap of available features inside the magic page.

The following enhancements to the magic page are currently available:

  KVM_MAGIC_FEAT_SR             Maps SR registers r/w in the magic page

For enhanced features in the magic page, please check for the existence of the
feature before using it!
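
Checking a feature amounts to a bit test on that bitmap, as sketched below.
The bit position used for PV_MAGIC_FEAT_SR is an assumption made for this
example; the real KVM_MAGIC_FEAT_SR definition should be taken from the kernel
headers. The bitmap is presumably delivered in r4, the first output value in
the register table of the hypercall section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position for this sketch; use the kernel's definition. */
#define PV_MAGIC_FEAT_SR	(1u << 0)

/* `features` is the bitmap returned when mapping the magic page. */
bool pv_magic_has(uint64_t features, uint64_t feat)
{
	return (features & feat) == feat;
}
```

A guest would then patch the SR related instructions only when
pv_magic_has(features, PV_MAGIC_FEAT_SR) is true.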

MSR bits
========

The MSR contains bits that require hypervisor intervention and bits that do
not require direct hypervisor intervention because they only get interpreted
when entering the guest or don't have any impact on the hypervisor's behavior.

The following bits are safe to be set inside the guest:

  MSR_EE
  MSR_RI

If any other bit changes in the MSR, please still use mtmsr(d).
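
The rule above can be expressed as a mask test: an MSR update may stay inside
the guest only if the old and new values differ in nothing but MSR_EE and
MSR_RI. The following sketch assumes the bit values Linux uses on Book3S
(EE = bit 15, RI = bit 1, counting from the least significant bit); the PV_*
names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values (Book3S); the names are local to this sketch. */
#define PV_MSR_EE	(1ull << 15)	/* external interrupt enable */
#define PV_MSR_RI	(1ull << 1)	/* recoverable interrupt */

/* True if going from old_msr to new_msr touches only EE and/or RI. */
bool pv_msr_change_is_safe(uint64_t old_msr, uint64_t new_msr)
{
	uint64_t changed = old_msr ^ new_msr;

	return (changed & ~(PV_MSR_EE | PV_MSR_RI)) == 0;
}
```

When the function returns false, the guest falls back to a real mtmsr(d) and
lets the hypervisor handle the trap.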

Patched instructions
====================

The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
respectively on 32 bit systems with an added offset of 4 to account for big
endianness.
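
The offset of 4 follows from the byte order: in a big-endian 64-bit field the
low 32 bits occupy bytes 4 to 7. The sketch below reproduces this on any host
by spelling out the byte order by hand; be64_store() and be32_load() are
illustrative helpers standing in for "std" and "lwz".

```c
#include <stdint.h>

/* Store a 64-bit value in big-endian byte order, as "std" would. */
void be64_store(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Load a 32-bit big-endian word, as "lwz" would. */
uint32_t be32_load(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Reading at offset 0 instead would return the upper half of the field, which is
not what a 32-bit guest wants.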

The following is a list of mappings the Linux kernel performs when running as
guest. Implementing any of those mappings is optional, as the instruction traps
also act on the shared page. So calling privileged instructions still works as
before.

From                    To
====                    ==

mfmsr   rX              ld      rX, magic_page->msr
mfsprg  rX, 0           ld      rX, magic_page->sprg0
mfsprg  rX, 1           ld      rX, magic_page->sprg1
mfsprg  rX, 2           ld      rX, magic_page->sprg2
mfsprg  rX, 3           ld      rX, magic_page->sprg3
mfsrr0  rX              ld      rX, magic_page->srr0
mfsrr1  rX              ld      rX, magic_page->srr1
mfdar   rX              ld      rX, magic_page->dar
mfdsisr rX              lwz     rX, magic_page->dsisr

mtmsr   rX              std     rX, magic_page->msr
mtsprg  0, rX           std     rX, magic_page->sprg0
mtsprg  1, rX           std     rX, magic_page->sprg1
mtsprg  2, rX           std     rX, magic_page->sprg2
mtsprg  3, rX           std     rX, magic_page->sprg3
mtsrr0  rX              std     rX, magic_page->srr0
mtsrr1  rX              std     rX, magic_page->srr1
mtdar   rX              std     rX, magic_page->dar
mtdsisr rX              stw     rX, magic_page->dsisr

tlbsync                 nop

mtmsrd  rX, 0           b       <special mtmsr section>
mtmsr   rX              b       <special mtmsr section>

mtmsrd  rX, 1           b       <special mtmsrd section>

[Book3S only]
mtsrin  rX, rY          b       <special mtsrin section>

[BookE only]
wrteei  [0|1]           b       <special wrteei section>


Some instructions require more logic to determine what's going on than a load
or store instruction can deliver. To enable patching of those, we keep some
RAM around into which we can live translate instructions. What happens is the
following:

        1) copy emulation code to memory
        2) patch that code to fit the emulated instruction
        3) patch that code to return to the original pc + 4
        4) patch the original instruction to branch to the new code

That way we can inject an arbitrary amount of code as replacement for a single
instruction. This allows us to check for pending interrupts when setting EE=1
for example.
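
Steps 3) and 4) both come down to writing a relative branch. As a sketch of
what such a patch has to compute, the function below encodes a PowerPC "b"
instruction (primary opcode 18, relative, no link) from the address of the
instruction and its target. The names are illustrative and this is not the
kernel's actual patching code; note that the reach of "b" is limited to
roughly +/- 32 MB, so the translated code has to live close enough to the
patched instruction.

```c
#include <stdint.h>

#define PPC_INST_B	0x48000000u	/* opcode 18, AA=0, LK=0 */
#define PPC_B_LI_MASK	0x03fffffcu	/* 24-bit word offset, shifted left by 2 */

/*
 * Encode "b to" for an instruction located at `from`.
 * Returns 0 if the target is misaligned or out of range; 0 is not a
 * valid branch encoding, so it can serve as an error value in this sketch.
 */
uint32_t ppc_encode_b(uint64_t from, uint64_t to)
{
	int64_t off = (int64_t)(to - from);

	if ((off & 3) || off < -0x02000000 || off > 0x01fffffc)
		return 0;
	return PPC_INST_B | ((uint32_t)off & PPC_B_LI_MASK);
}
```

For step 3) the call would be ppc_encode_b(end_of_copy, pc + 4), for step 4)
ppc_encode_b(pc, start_of_copy), where pc is the address of the original
instruction and the other two names stand for hypothetical addresses in the
translation area.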

Hypercall ABIs in KVM on PowerPC
================================
1) KVM hypercalls (ePAPR)

These are ePAPR compliant hypercall implementations (mentioned above). Even
generic hypercalls are implemented here, like the ePAPR idle hcall. These are
available on all targets.

2) PAPR hypercalls

PAPR hypercalls are needed to run server PowerPC PAPR guests (-M pseries in QEMU).
These are the same hypercalls that pHyp, the POWER hypervisor, implements. Some of
them are handled in the kernel, some are handled in user space. This is only
available on book3s_64.

3) OSI hypercalls

Mac-on-Linux is another user of KVM on PowerPC, which has its own hypercalls
(defined long before KVM). This is supported to maintain compatibility. All these
hypercalls get forwarded to user space. This is only useful on book3s_32, but can
be used with book3s_64 as well.