                      The MSI Driver Guide HOWTO
                 Tom L Nguyen tom.l.nguyen@intel.com
                              10/03/2003

1. About this guide

This guide describes the basics of Message Signaled Interrupts (MSI),
the advantages of using MSI over traditional interrupt mechanisms, and
how to enable your driver to use MSI or MSI-X. A Frequently Asked
Questions (FAQ) section is also included.

2. Copyright 2003 Intel Corporation

3. What is MSI/MSI-X?

Message Signaled Interrupt (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, is an optional feature for PCI
devices and a required feature for PCI Express devices. MSI enables a
device function to request service by sending an Inbound Memory Write
on its PCI bus to the FSB as a Message Signaled Interrupt transaction.
Because MSI is generated in the form of a Memory Write, all transaction
conditions, such as a Retry, Master-Abort, Target-Abort or normal
completion, are supported.

A PCI device that supports MSI must also support the pin IRQ assertion
interrupt mechanism to provide backward compatibility for systems that
do not support MSI. In systems that support MSI, the bus driver is
responsible for initializing the message address and message data of
the device function's MSI/MSI-X capability structure during initial
device configuration.

An MSI-capable device function indicates MSI support by implementing
the MSI/MSI-X capability structure in its PCI capability list. The
device function may implement both the MSI capability structure and
the MSI-X capability structure; however, the bus driver should not
enable both, but instead enable only the MSI-X capability structure.

The MSI capability structure contains the Message Control register,
the Message Address register and the Message Data register. These
registers provide the bus driver control over MSI. The Message Control
register indicates the MSI capability supported by the device.
The Message Address register specifies the target address and the
Message Data register specifies the characteristics of the message.
To request service, the device function writes the content of the
Message Data register to the target address. The device and its
software driver are prohibited from writing to these registers.

The MSI-X capability structure is an optional extension to MSI. It
uses an independent and separate capability structure. There are
some key advantages to implementing the MSI-X capability structure
over the MSI capability structure, as described below.

    - Support a larger maximum number of vectors per function.

    - Provide the ability for system software to configure
      each vector with an independent message address and message
      data, specified by a table that resides in Memory Space.

    - MSI and MSI-X both support per-vector masking. Per-vector
      masking is an optional extension of MSI but a required
      feature for MSI-X. Per-vector masking gives the kernel
      the ability to mask/unmask MSI while servicing its
      interrupt service routine. If per-vector masking is
      not supported, then the device driver should provide the
      hardware/software synchronization to ensure that the device
      generates MSI when the driver wants it to do so.

4. Why use MSI?

MSI simplifies board design by allowing board designers to remove
out-of-band interrupt routing. MSI is another step towards a
legacy-free environment.

Due to increasing pressure on chipset and processor packages to
reduce pin count, the need for interrupt pins is expected to
diminish over time. Devices, due to pin constraints, may implement
messages to increase performance.

PCI Express endpoints use INTx emulation (in-band messages) instead
of IRQ pin assertion.
Using INTx emulation requires interrupt sharing among devices
connected to the same node (PCI bridge), while MSI is unique
(non-shared) and does not require BIOS configuration support. As a
result, the PCI Express technology requires MSI support for better
interrupt performance.

Using MSI enables the device functions to support two or more
vectors, which can be configured to target different CPUs to
increase scalability.

5. Configuring a driver to use MSI/MSI-X

Installing the patch does not, by default, enable MSI/MSI-X on every
device that supports this capability. A kernel configuration option
must be selected to enable MSI/MSI-X support.

5.1 Including MSI support in the kernel

To include MSI support in the kernel, users must apply the
VECTOR-base patch first and then the MSI patch, because MSI support
needs the VECTOR-based scheme. Once these patches are installed,
setting CONFIG_PCI_USE_VECTOR enables the VECTOR-based scheme and
the option for MSI-capable device drivers to selectively enable MSI
(using pci_enable_msi as described below).

Since the target of the inbound message is the local APIC,
CONFIG_PCI_USE_VECTOR depends on CONFIG_X86_LOCAL_APIC being
enabled.

int pci_enable_msi(struct pci_dev *)

With this new API, any existing device driver that would like to have
MSI enabled on its device function must call this function
explicitly. A successful call will initialize the MSI/MSI-X
capability structure with ONE vector, regardless of whether the
device function is capable of supporting multiple messages. This
vector replaces the pre-assigned dev->irq with a new MSI vector. To
avoid a conflict between the newly assigned vector and the existing
pre-assigned vector, the device driver must call this API before
calling request_irq(...).
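
The ordering requirement above can be sketched with a small user-space
model. This is only an illustration under stated assumptions:
mock_pci_enable_msi(), mock_request_irq() and the msi_capable flag are
hypothetical stand-ins, not the kernel's functions (the real
request_irq() takes more arguments), and vector 193 is an arbitrary
example value. The sketch shows that the handler ends up bound to
whatever dev->irq holds at the time of the request, which is why MSI
has to be enabled first.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct pci_dev (assumption). */
struct pci_dev {
    unsigned int irq;  /* pre-assigned pin-IRQ vector */
    int msi_capable;   /* hypothetical flag: MSI capability present */
};

#define MOCK_MSI_VECTOR 193u  /* arbitrary example vector */

/* Vector the handler was last bound to (recorded by the mock below). */
unsigned int bound_irq;

/* Mock of pci_enable_msi(): on success returns 0 and replaces dev->irq. */
int mock_pci_enable_msi(struct pci_dev *dev)
{
    assert(dev != NULL);
    if (!dev->msi_capable)
        return -1;  /* device stays in pin-IRQ assertion mode */
    dev->irq = MOCK_MSI_VECTOR;
    return 0;
}

/* Mock of request_irq(): only records the vector it was given. */
int mock_request_irq(unsigned int irq)
{
    bound_irq = irq;
    return 0;
}

/* Hypothetical driver init: enable MSI first, then bind the handler. */
int mock_driver_init(struct pci_dev *dev)
{
    (void)mock_pci_enable_msi(dev);  /* on failure dev->irq is unchanged */
    return mock_request_irq(dev->irq);
}
```

If the two calls in mock_driver_init() were swapped, the handler would
be bound to the old pin-IRQ vector while the device signals on the new
MSI vector.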

The diagram below shows the events that switch the interrupt
mode on the MSI-capable device function between MSI mode and
PIN-IRQ assertion mode.

 ------------   pci_enable_msi   ------------------------
|            | <=============== |                        |
| MSI MODE   |                  | PIN-IRQ ASSERTION MODE |
|            | ===============> |                        |
 ------------      free_irq      ------------------------

5.2 Configuring for MSI support

Because the existing Linux kernel assigns vectors in a
non-contiguous fashion, this patch does not support multiple
messages, regardless of whether the device function is capable of
supporting more than one vector. The bus driver initializes only
entry 0 of this capability if pci_enable_msi(...) is called
successfully by the device driver.

5.3 Configuring for MSI-X support

Both the MSI capability structure and the MSI-X capability structure
share the semantics described above; however, because system
software can configure each vector of the MSI-X capability structure
with an independent message address and message data, the
non-contiguous vector assignment of the existing Linux kernel has no
impact on supporting multiple messages on an MSI-X capable device
function. By default, as mentioned above, ONE vector should always
be allocated to the MSI-X capability structure at entry 0. The bus
driver does not initialize other entries of the MSI-X table.

Note that the PCI subsystem should have full control of an MSI-X
table that resides in Memory Space. The software device driver
should not access this table.

To request additional vectors, the device software driver should
call the function msi_alloc_vectors(). It is recommended that the
software driver call this function once during the initialization
phase of the device driver.
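
As a rough illustration of the allocation contract detailed in the
rest of this section (either all requested vectors are granted or none
are), the following user-space sketch models a small vector pool. The
names mock_alloc_vectors() and mock_free_vectors(), the pool size and
the vector numbering are all hypothetical; the struct pci_dev argument
of the functions described in this guide is omitted for brevity, and
the PCI subsystem's actual bookkeeping is not shown in this guide.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_POOL_SIZE   8    /* hypothetical number of spare vectors */
#define MOCK_BASE_VECTOR 100  /* arbitrary first vector number */

int mock_pool_used[MOCK_POOL_SIZE];  /* 1 = slot handed out */

/* Number of vectors still available in the mock pool. */
int mock_pool_free_count(void)
{
    int i, n = 0;
    for (i = 0; i < MOCK_POOL_SIZE; i++)
        if (!mock_pool_used[i])
            n++;
    return n;
}

/* Model of the msi_alloc_vectors() contract: returns 0 and fills
 * vector[0..nvec-1] if all nvec vectors are available; otherwise
 * returns nonzero and allocates nothing. */
int mock_alloc_vectors(int *vector, int nvec)
{
    int i, n = 0;
    assert(vector != NULL);
    if (nvec <= 0 || nvec > mock_pool_free_count())
        return -1;
    for (i = 0; i < MOCK_POOL_SIZE && n < nvec; i++) {
        if (!mock_pool_used[i]) {
            mock_pool_used[i] = 1;
            vector[n++] = MOCK_BASE_VECTOR + i;
        }
    }
    return 0;
}

/* Model of msi_free_vectors(): checks every entry, then releases all. */
int mock_free_vectors(const int *vector, int nvec)
{
    int i;
    assert(vector != NULL);
    for (i = 0; i < nvec; i++) {
        int slot = vector[i] - MOCK_BASE_VECTOR;
        if (slot < 0 || slot >= MOCK_POOL_SIZE || !mock_pool_used[slot])
            return -1;
    }
    for (i = 0; i < nvec; i++)
        mock_pool_used[vector[i] - MOCK_BASE_VECTOR] = 0;
    return 0;
}
```

In this model a request for 5 vectors fails outright when only 4 are
free, leaving the pool untouched, which mirrors the all-or-nothing
behaviour.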

The function msi_alloc_vectors(), once invoked, enables either
all or nothing, depending on the current availability of vector
resources. If no vector resources are available, the device function
still works with ONE vector. If the vector resources are available
for the number of vectors requested by the driver, this function
will reconfigure the MSI-X capability structure of the device with
additional messages, starting from entry 1. For example, the device
may be capable of supporting a maximum of 32 vectors while its
software driver may typically request only 4 vectors.

For each vector, after this successful call, the device driver is
responsible for calling other functions like request_irq(),
enable_irq(), etc. to enable this vector with its corresponding
interrupt service handler. It is the device driver's choice to have
all vectors share the same interrupt service handler or to give each
vector a unique interrupt service handler.

In addition to the function msi_alloc_vectors(), another function
msi_free_vectors() is provided to allow the software driver to
release a number of vectors back to the vector resources. Once
invoked, the PCI subsystem disables (masks) each vector released.
These vectors are no longer valid for the hardware device and its
software driver to use. As with free_irq, it is recommended that the
device driver also call msi_free_vectors to release all additional
vectors previously requested.

int msi_alloc_vectors(struct pci_dev *dev, int *vector, int nvec)

This API enables the software driver to request additional messages
from the PCI subsystem. Depending on the number of vectors
available, the PCI subsystem enables either all or nothing.

Argument dev points to the device (pci_dev) structure.
Argument vector is a pointer of integer type. The number of
elements is indicated in argument nvec.
Argument nvec is an integer indicating the number of messages
requested.
A return of zero indicates that the requested number of vectors was
successfully allocated. Otherwise, it indicates that resources are
not available.

int msi_free_vectors(struct pci_dev *dev, int *vector, int nvec)

This API enables the software driver to inform the PCI subsystem
that it is willing to release a number of vectors back to the
MSI resource pool. Once invoked, the PCI subsystem disables each
MSI-X entry associated with each vector stored in argument 2
(vector). These vectors are no longer valid for the hardware device
and its software driver to use.

Argument dev points to the device (pci_dev) structure.
Argument vector is a pointer of integer type. The number of
elements is indicated in argument nvec.
Argument nvec is an integer indicating the number of messages
released.
A return of zero indicates that the given number of allocated
vectors was successfully released. Otherwise, it indicates a
failure.

5.4 Hardware requirements for MSI support

MSI support requires support from both system hardware and
individual hardware device functions.

5.4.1 System hardware support

Since the target of the MSI address is the local APIC of the CPU,
enabling MSI support in the Linux kernel depends on whether the
existing system hardware supports the local APIC. Users should
verify that their system runs with CONFIG_X86_LOCAL_APIC=y.

In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
however, in a UP environment, users must manually set
CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
CONFIG_PCI_USE_VECTOR enables the VECTOR-based scheme and
the option for MSI-capable device drivers to selectively enable
MSI (using pci_enable_msi as described in section 5.1).
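
For a uniprocessor build, the resulting kernel .config would be
expected to contain both options. This is a fragment only, assuming
the VECTOR-base and MSI patches are already applied:

```
CONFIG_X86_LOCAL_APIC=y
CONFIG_PCI_USE_VECTOR=y
```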

Note that the CONFIG_X86_IO_APIC setting is irrelevant because MSI
vectors are newly allocated at runtime and MSI support does not
depend on BIOS support. This key independence enables MSI support
on future IOxAPIC-free platforms.

5.4.2 Device hardware support

The hardware device function supports MSI by indicating the
MSI/MSI-X capability structure on its PCI capability list. By
default, this capability structure will not be initialized by
the kernel to enable MSI during system boot. In other words,
the device function runs in its default pin assertion mode.
Note that in many cases hardware supporting MSI has bugs,
which may result in a system hang. The software driver of specific
MSI-capable hardware is responsible for deciding whether to call
pci_enable_msi or not. A return of zero indicates that the kernel
successfully initialized the MSI/MSI-X capability structure of the
device function. The device function is then running in MSI mode.

5.5 How to tell whether MSI is enabled on a device function

At the driver level, a return of zero from pci_enable_msi(...)
indicates to the device driver that its device function is
initialized successfully and ready to run in MSI mode.

At the user level, users can use the command 'cat /proc/interrupts'
to display the vector allocated for the device and its interrupt
mode, as shown below.

           CPU0       CPU1
  0:     324639          0    IO-APIC-edge  timer
  1:       1186          0    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
 12:       2797          0    IO-APIC-edge  i8042
 14:       6543          0    IO-APIC-edge  ide0
 15:          1          0    IO-APIC-edge  ide1
169:          0          0   IO-APIC-level  uhci-hcd
185:          0          0   IO-APIC-level  uhci-hcd
193:        138         10         PCI MSI  aic79xx
201:         30          0         PCI MSI  aic79xx
225:         30          0   IO-APIC-level  aic7xxx
233:         30          0   IO-APIC-level  aic7xxx
NMI:          0          0
LOC:     324553     325068
ERR:          0
MIS:          0

6. FAQ

Q1. Are there any limitations on using MSI?

A1.
If the PCI device supports MSI and conforms to the
specification and the platform supports the APIC local bus,
then using MSI should work.

Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
AMD processors)? In P3, IPIs are transmitted on the APIC local
bus and in P4 and Xeon they are transmitted on the system
bus. Are there any implications with this?

A2. MSI support enables a PCI device to send an inbound
memory write (0xfeexxxxx as target address) on its PCI bus
directly to the FSB. Since the message address has the
redirection hint bit cleared, it should work.

Q3. The target address 0xfeexxxxx will be translated by the
Host Bridge into an interrupt message. Are there any
limitations on the chipsets such as Intel 8xx, Intel e7xxx,
or VIA?

A3. If these chipsets support an inbound memory write with
the target address set as 0xfeexxxxx, in conformance with PCI
specification 2.3 or later, then it should work.

Q4. From the driver's point of view, if the MSI is lost because
errors occur during the inbound memory write, then the driver
may wait forever. Is there a mechanism for it to recover?

A4. Since the transaction is an inbound memory write, all
transaction termination conditions (Retry, Master-Abort,
Target-Abort, or normal completion) are supported. A device
sending an MSI must abide by all the PCI rules and conditions
regarding that inbound memory write. So, if a retry is signaled
it must retry, etc. We believe that the recommendation for
Abort is also a retry (refer to PCI specification 2.3 or later).
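
The 0xfeexxxxx target address mentioned in A2 and A3 can be
illustrated with a short sketch. The field layout used here
(destination APIC ID in bits 19:12, redirection hint in bit 3) is an
assumption taken from Intel's architecture documentation, not from
this guide, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_MSI_ADDR_BASE  0xfee00000u  /* fixed upper 12 bits: 0xfee */
#define MOCK_MSI_DEST_SHIFT 12           /* destination APIC ID field */
#define MOCK_MSI_RH_BIT     (1u << 3)    /* redirection hint */

/* Compose a message address for the given destination APIC ID. */
uint32_t mock_msi_address(unsigned int dest_apic_id, int redirection_hint)
{
    uint32_t addr;

    assert(dest_apic_id <= 0xffu);  /* the field is 8 bits wide */
    addr = MOCK_MSI_ADDR_BASE | ((uint32_t)dest_apic_id << MOCK_MSI_DEST_SHIFT);
    if (redirection_hint)
        addr |= MOCK_MSI_RH_BIT;
    return addr;
}

/* True if addr falls in the 0xfeexxxxx range described in the FAQ. */
int mock_is_msi_address(uint32_t addr)
{
    return (addr & 0xfff00000u) == MOCK_MSI_ADDR_BASE;
}
```

With the redirection hint cleared, as A2 describes, the address for
destination APIC ID 1 in this model is 0xfee01000.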

