                          The Linux IPMI Driver
                          ---------------------
                              Corey Minyard

The Intelligent Platform Management Interface, or IPMI, is a
standard for controlling intelligent devices that monitor a system.
It provides for dynamic discovery of sensors in the system and the
ability to monitor the sensors and be informed when the sensors'
values change or go outside certain boundaries.  It also has a
standardized database for field-replaceable units (FRUs) and a watchdog
timer.

To use this, you need an interface to an IPMI controller in your
system (called a Baseboard Management Controller, or BMC) and
management software that can use the IPMI system.

This document describes how to use the IPMI driver for Linux.  If you
are not familiar with IPMI itself, see the web site at
  IPMI is a big
subject and I can't cover it all here!

The Linux IPMI driver is modular, which means you have to pick several
things to have it work right depending on your hardware.  Most of
these are available in the 'Character Devices' menu then the IPMI
menu.

No matter what, you must pick 'IPMI top-level message handler' to use
IPMI.  What you do beyond that depends on your needs and hardware.

The message handler does not provide any user-level interfaces.
Kernel code (like the watchdog) can still use it.  If you need access
from userland through a device driver, select 'Device interface for
IPMI'.

The driver interface depends on your hardware.  If your system
properly provides the SMBIOS info for IPMI, the driver will detect it
and just work.  If you have a board with a standard interface (these
will generally be either "KCS", "SMIC", or "BT"; consult your hardware
manual), choose the 'IPMI SI handler' option.

You should generally enable ACPI on your system, as systems with IPMI
can have ACPI tables describing them.

If you have a standard interface and the board manufacturer has done
their job correctly, the IPMI controller should be automatically
detected (via ACPI or SMBIOS tables) and should just work.  Sadly,
many boards do not have this information.  The driver attempts
standard defaults, but they may not work.  If you fall into this
situation, you need to read the section below named 'The SI Driver'.

IPMI defines a standard watchdog timer.  You can enable this with the
'IPMI Watchdog Timer' config option.  If you compile the driver into
the kernel, then via a kernel command-line option you can have the
watchdog timer start as soon as it initializes.  It also has a lot
of other options; see the 'Watchdog' section below for more details.
Note that you can also have the watchdog continue to run if it is
closed (by default it is disabled on close).  Go into the 'Watchdog
Cards' menu, enable 'Watchdog Timer Support', and enable the option
'Disable watchdog shutdown on close'.

IPMI systems can often be powered off using IPMI commands.  Select
'IPMI Poweroff' to do this.  The driver will auto-detect if the system
can be powered off by IPMI.  It is safe to enable this even if your
system doesn't support this option.  This works on ATCA systems, the
Radisys CPI1 card, and any IPMI system that supports standard chassis
management commands.

If you want the driver to put an event into the event log on a panic,
enable the 'Generate a panic event to all BMCs on a panic' option.  If
you want the whole panic string put into the event log using OEM
events, enable the 'Generate OEM events containing the panic string'
option.

Basic Design
------------

The Linux IPMI driver is designed to be very modular and flexible; you
only need to take the pieces you need and you can use it in many
different ways.  Because of that, it's broken into many chunks of
code.  These chunks (by module name) are:

ipmi_msghandler - This is the central piece of software for the IPMI
system.  It handles all messages, message timing, and responses.  The
IPMI users tie into this, and the IPMI physical interfaces (called
System Management Interfaces, or SMIs) also tie in here.  This
provides the kernelland interface for IPMI, but does not provide an
interface for use by application processes.

ipmi_devintf - This provides a userland IOCTL interface for the IPMI
driver; each open file for this device ties in to the message handler
as an IPMI user.

ipmi_si - A driver for various system interfaces.  This supports KCS,
SMIC, and BT interfaces.

ipmi_watchdog - IPMI requires systems to have a very capable watchdog
timer.  This driver implements the standard Linux watchdog timer
interface on top of the IPMI message handler.

ipmi_poweroff - Some systems support the ability to be turned off via
IPMI commands.

These are all individually selectable via configuration options.

Note that the KCS-only interface has been removed.  The af_ipmi driver
is no longer supported and has been removed because it was impossible
to do 32-bit emulation on 64-bit kernels with it.

Much documentation for the interface is in the include files.  The
IPMI include files are:

net/af_ipmi.h - Contains the socket interface.

linux/ipmi.h - Contains the user interface and IOCTL interface for IPMI.

linux/ipmi_smi.h - Contains the interface for system management interfaces
(things that interface to IPMI controllers) to use.

linux/ipmi_msgdefs.h - General definitions for base IPMI messaging.

The IPMI addressing works much like IP addresses: you have an overlay
to handle the different address types.  The overlay is:

  struct ipmi_addr
  {
        int   addr_type;
        short channel;
        char  data[IPMI_MAX_ADDR_SIZE];
  };

The addr_type determines what the address really is.  The driver
currently understands two different types of addresses.

"System Interface" addresses are defined as:

  struct ipmi_system_interface_addr
  {
        int   addr_type;
        short channel;
  };

and the type is IPMI_SYSTEM_INTERFACE_ADDR_TYPE.  This is used for talking
straight to the BMC on the current card.  The channel must be

Messages that are destined to go out on the IPMB bus use the
IPMI_IPMB_ADDR_TYPE address type.  The format is

  struct ipmi_ipmb_addr
  {
        int           addr_type;
        short         channel;
        unsigned char slave_addr;
        unsigned char lun;
  };

The "channel" here is generally zero, but some devices support more
than one channel; it corresponds to the channel as defined in the IPMI
spec.

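As a rough sketch of how the overlay can be used, the code below fills a
generic address with an IPMB address and reads it back after checking
addr_type.  The structures mirror the ones shown above; the numeric values
of the constants and the helper names fill_ipmb_addr() and
ipmb_slave_addr() are assumptions made for this illustration, and real
code should take the definitions from linux/ipmi.h.

```c
#include <assert.h>
#include <string.h>

/* Placeholder values for illustration only; real code should use the
   definitions from linux/ipmi.h instead of these. */
#define IPMI_MAX_ADDR_SIZE              32
#define IPMI_SYSTEM_INTERFACE_ADDR_TYPE 0x0c
#define IPMI_IPMB_ADDR_TYPE             0x01

struct ipmi_addr {
        int   addr_type;
        short channel;
        char  data[IPMI_MAX_ADDR_SIZE];
};

struct ipmi_ipmb_addr {
        int           addr_type;
        short         channel;
        unsigned char slave_addr;
        unsigned char lun;
};

/* Hypothetical helper: store an IPMB address in the generic overlay. */
static void fill_ipmb_addr(struct ipmi_addr *out, short channel,
                           unsigned char slave_addr, unsigned char lun)
{
        struct ipmi_ipmb_addr ipmb;

        /* The specific layout must fit inside the overlay. */
        assert(sizeof(ipmb) <= sizeof(*out));

        memset(out, 0, sizeof(*out));
        memset(&ipmb, 0, sizeof(ipmb));
        ipmb.addr_type  = IPMI_IPMB_ADDR_TYPE;
        ipmb.channel    = channel;
        ipmb.slave_addr = slave_addr;
        ipmb.lun        = lun;
        memcpy(out, &ipmb, sizeof(ipmb));
}

/* Hypothetical helper: return the slave address, or -1 if the overlay
   does not hold an IPMB address. */
static int ipmb_slave_addr(const struct ipmi_addr *addr)
{
        struct ipmi_ipmb_addr ipmb;

        if (addr->addr_type != IPMI_IPMB_ADDR_TYPE)
                return -1;
        memcpy(&ipmb, addr, sizeof(ipmb));
        return ipmb.slave_addr;
}
```

The point is that addr_type and channel sit at the same offsets in every
address type, so code can look at addr_type first and only then decide how
to interpret the rest.
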
Messages are defined as:

  struct ipmi_msg
  {
        unsigned char netfn;
        unsigned char lun;
        unsigned char cmd;
        unsigned char *data;
        int           data_len;
  };

The driver takes care of adding/stripping the header information.  The
data portion is just the data to be sent (do NOT put addressing info
here) or the response.  Note that the completion code of a response is
the first item in "data"; it is not stripped out because that is how
all the messages are defined in the spec (and thus makes counting the
offsets a little easier :-).

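Since the completion code stays in the data, response handling usually
starts by peeling off the first byte.  The following is a minimal sketch
of that, using a local copy of the structure shown above; the helper names
ipmi_completion_code() and ipmi_response_payload() are made up for this
example and are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the message layout described in this document. */
struct ipmi_msg {
        unsigned char netfn;
        unsigned char lun;
        unsigned char cmd;
        unsigned char *data;
        int           data_len;
};

/* Hypothetical helper: completion code (0..255), or -1 if the
   response carries no data at all. */
static int ipmi_completion_code(const struct ipmi_msg *rsp)
{
        assert(rsp != NULL);
        if (rsp->data == NULL || rsp->data_len < 1)
                return -1;
        return rsp->data[0];
}

/* Hypothetical helper: point *payload at the bytes that follow the
   completion code and return how many there are. */
static int ipmi_response_payload(const struct ipmi_msg *rsp,
                                 const unsigned char **payload)
{
        if (ipmi_completion_code(rsp) < 0) {
                *payload = NULL;
                return 0;
        }
        *payload = rsp->data + 1;
        return rsp->data_len - 1;
}
```

A completion code of zero means the command completed normally; the
offsets in the spec's response tables count from the completion code, which
is why data[0] is kept.
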
When using the IOCTL interface from userland, you must provide a block
of data for "data", fill it, and set data_len to the length of the
block of data, even when receiving messages.  Otherwise the driver
will have no place to put the message.

Messages coming up from the message handler in kernelland will come in
as:

  struct ipmi_recv_msg
  {
        struct list_head link;

        /* The type of message as defined in the "Receive Types"
           defines above. */
        int         recv_type;

        ipmi_user_t      *user;
        struct ipmi_addr addr;
        long             msgid;
        struct ipmi_msg  msg;

        /* Call this when done with the message.  It will presumably free
           the message and do any other necessary cleanup. */
        void (*done)(struct ipmi_recv_msg *msg);

        /* Place-holder for the data, don't make any assumptions about
           the size or existence of this, since it may change. */
        unsigned char   msg_data[IPMI_MAX_MSG_LENGTH];
  };

You should look at the receive type and handle the message
appropriately.

The Upper Layer Interface (Message Handler)
-------------------------------------------

The upper layer of the interface provides the users with a consistent
view of the IPMI interfaces.  It allows multiple SMI interfaces to be
addressed (because some boards actually have multiple BMCs on them)
and the user should not have to care what type of SMI is below them.

Creating the User

To use the message handler, you must first create a user using
ipmi_create_user.  The interface number specifies which SMI you want
to connect to, and you must supply callback functions to be called
when data comes in.  The callback function can run at interrupt level,
so be careful using the callbacks.  This also allows you to pass in a
piece of data, the handler_data, that will be passed back to you on
all calls.

Once you are done, call ipmi_destroy_user() to get rid of the user.

From userland, opening the device automatically creates a user, and
closing the device automatically destroys the user.

To send a message from kernel-land, the ipmi_request() call does
pretty much all message handling.  Most of the parameters are
self-explanatory.  However, it takes a "msgid" parameter.  This is NOT
the sequence number of messages.  It is simply a long value that is
passed back when the response for the message is returned.  You may
use it for anything you like.

Responses come back in the function pointed to by the ipmi_recv_hndl
field of the "handler" that you passed in to ipmi_create_user().
Remember again, these may be running at interrupt level.  Remember to
look at the receive type, too.

From userland, you fill out an ipmi_req_t structure and use the
IPMICTL_SEND_COMMAND ioctl.  For incoming stuff, you can use select()
or poll() to wait for messages to come in.  However, you cannot use
read() to get them; you must call the IPMICTL_RECEIVE_MSG with the
ipmi_recv_t structure to actually get the message.  Remember that you
must supply a pointer to a block of data in the message's "data" field,
and you must fill in the msg.data_len field with the size of the data.
This gives the receiver a place to actually put the message.

If the message cannot fit into the data you provide, you will get an
EMSGSIZE error and the driver will leave the data in the receive
queue.  If you want to get it and have it truncate the message, us

When you send a command (which is defined by the lowest-order bit of
the netfn per the IPMI spec) on the IPMB bus, the driver will
automatically assign the sequence number to the command and save the
command.  If the response is not received in the IPMI-specified 5
seconds, it will generate a response automatically saying the command
timed out.  If an unsolicited response comes in (if it was after 5
seconds, for instance), that response will be ignored.

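The "lowest-order bit of the netfn" rule is easy to get backwards, so
here is a small sketch of it.  In the IPMI spec, even netfn values are
requests (commands) and the matching response uses the next odd value.
The function names are invented for this example.

```c
#include <assert.h>

/* Non-zero if the netfn denotes a command (request), i.e. its
   lowest-order bit is clear. */
static int ipmi_netfn_is_command(unsigned char netfn)
{
        return (netfn & 1) == 0;
}

/* The netfn a response to the given command netfn will carry. */
static unsigned char ipmi_response_netfn(unsigned char netfn)
{
        assert(ipmi_netfn_is_command(netfn));
        return netfn | 1;
}
```

For instance, the application netfn 0x06 is a command and its responses
arrive with netfn 0x07.
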
In kernelland, after you receive a message and are done with it, you
MUST call ipmi_free_recv_msg() on it, or you will leak messages.  Note
that you should NEVER mess with the "done" field of a message; it is
required to properly clean up the message.

Note that when sending, there is an ipmi_request_supply_msgs() call
that lets you supply the smi and receive message.  This is useful for
pieces of code that need to work even if the system is out of buffers
(the watchdog timer uses this, for instance).  You supply your own
buffer and own free routines.  This is not recommended for normal use,
though, since it is tricky to manage your own buffers.

Events and Incoming Commands

The driver takes care of polling for IPMI events and receiving
commands (commands are messages that are not responses; they are
commands that other things on the IPMB bus have sent you).  To receive
these, you must register for them; they will not automatically be sent
to you.

To receive events, you must call ipmi_set_gets_events() and set the
"val" to non-zero.  Any events that have been received by the driver
since startup will immediately be delivered to the first user that
registers for events.  After that, if multiple users are registered
for events, they will all receive all events that come in.

For receiving commands, you have to individually register commands you
want to receive.  Call ipmi_register_for_cmd() and supply the netfn
and command name for each command you want to receive.  You also
specify a bitmask of the channels you want to receive the command from
(or use IPMI_CHAN_ALL for all channels if you don't care).  Only one
user may be registered for each netfn/cmd/channel, but different users
may register for different commands, or the same command if the
channel bitmasks do not overlap.

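The registration rule can be summed up in a few lines of code.  This is
only an illustration of the rule as stated above, not the driver's actual
bookkeeping; struct cmd_registration and registrations_conflict() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one command registration. */
struct cmd_registration {
        unsigned char netfn;
        unsigned char cmd;
        unsigned int  chans;    /* bitmask of channels */
};

/* Two registrations conflict only if they name the same netfn and
   command AND their channel bitmasks share at least one channel. */
static int registrations_conflict(const struct cmd_registration *a,
                                  const struct cmd_registration *b)
{
        assert(a != NULL && b != NULL);
        return a->netfn == b->netfn &&
               a->cmd == b->cmd &&
               (a->chans & b->chans) != 0;
}
```

So one user can take a command on channel 0 while another takes the same
command on channel 1, but neither can then register it for all channels.
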
From userland, equivalent IOCTLs are provided to do these functions.

The Lower Layer (SMI) Interface
-------------------------------

As mentioned before, multiple SMI interfaces may be registered to the
message handler; each of these is assigned an interface number when
it registers with the message handler.  They are generally assigned
in the order they register, although if an SMI unregisters and then
another one registers, all bets are off.

ipmi_smi.h defines the interface for management interfaces; see
it for more details.

The SI Driver
-------------

The SI driver allows up to 4 KCS or SMIC interfaces to be configured
in the system.  By default, the driver scans the ACPI tables for
interfaces, and if it doesn't find any it will attempt to register one
KCS interface at the spec-specified I/O port 0xca2 without interrupts.
You can change this at module load time (for a module) with:

  modprobe ipmi_si type=<type1>,<type2>....
       ports=<port1>,<port2>... addrs=<addr1>,<addr2>...
       irqs=<irq1>,<irq2>... trydefaults=[0|1]
       regspacings=<sp1>,<sp2>,... regsizes=<size1>,<size2>,...
       regshifts=<shift1>,<shift2>,...
       slave_addrs=<addr1>,<addr2>,...
       force_kipmid=<enable1>,<enable2>,...
       kipmid_max_busy_us=<ustime1>,<ustime2>,...
       unload_when_empty=[0|1]

Each of these except trydefaults and unload_when_empty is a list, the
first item for the first interface, second item for the second
interface, etc.

The type may be either "kcs", "smic", or "bt".  If you leave it blank, it
defaults to "kcs".

If you specify addrs as non-zero for an interface, the driver will
use the memory address given as the address of the device.  This
overrides ports.

If you specify ports as non-zero for an interface, the driver will
use the I/O port given as the device address.

If you specify irqs as non-zero for an interface, the driver will
attempt to use the given interrupt for the device.

trydefaults sets whether the standard IPMI interface at 0xca2 and
any interfaces specified by ACPI are tried.  By default, the driver
tries them; set this value to zero to turn this off.

The next three parameters have to do with register layout.  The
registers used by the interfaces may not appear at successive
locations and they may not be in 8-bit registers.  These parameters
allow the layout of the data in the registers to be more precisely
specified.

The regspacings parameter gives the number of bytes between successive
register start addresses.  For instance, if the regspacing is set to 4
and the start address is 0xca2, then the address for the second
register would be 0xca6.  This defaults to 1.

The regsizes parameter gives the size of a register, in bytes.  The
data used by IPMI is 8 bits wide, but it may be inside a larger
register.  This parameter allows the read and write type to be
specified.  It may be 1, 2, 4, or 8.  The default is 1.

Since the register size may be larger than 8 bits, the IPMI data may not
be in the lower 8 bits.  The regshifts parameter gives the amount to shift
the data to get to the actual IPMI data.

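To make the three layout parameters concrete, here is a small sketch of
the arithmetic they describe.  It is not the driver's code; struct
si_reg_layout and the two helpers are hypothetical, and the only facts
used are the ones stated above (register n starts regspacing bytes after
register n-1, and the 8-bit IPMI value sits regshift bits up inside the
register).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one interface's register layout. */
struct si_reg_layout {
        unsigned long base;       /* address of the first register */
        unsigned int  regspacing; /* bytes between register starts */
        unsigned int  regsize;    /* register width in bytes: 1, 2, 4, 8 */
        unsigned int  regshift;   /* bits to shift to reach the IPMI data */
};

/* Start address of register number n (n = 0 is the first register). */
static unsigned long si_reg_addr(const struct si_reg_layout *l,
                                 unsigned int n)
{
        return l->base + (unsigned long)n * l->regspacing;
}

/* Extract the 8-bit IPMI value from a raw register read. */
static uint8_t si_reg_data(const struct si_reg_layout *l, uint64_t raw)
{
        assert(l->regsize == 1 || l->regsize == 2 ||
               l->regsize == 4 || l->regsize == 8);
        assert(l->regshift < l->regsize * 8);
        return (uint8_t)((raw >> l->regshift) & 0xff);
}
```

With base 0xca2 and regspacing 4, si_reg_addr() gives 0xca2 for the first
register and 0xca6 for the second, matching the example in the text.
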
The slave_addrs parameter specifies the IPMI address of the local BMC.
This is usually 0x20 and the driver defaults to that, but in case it's
not, it can be specified when the driver starts up.

The force_kipmid parameter forcefully enables (if set to 1) or disables
(if set to 0) the kernel IPMI daemon.  Normally this is auto-detected
by the driver, but systems with broken interrupts might need an enable,
or users that don't want the daemon (don't need the performance, don't
want the CPU hit) can disable it.

If unload_when_empty is set to 1, the driver will be unloaded if it
doesn't find any interfaces or all the interfaces fail to work.  The
default is one.  Setting to 0 is useful with the hotmod, but is
obviously only useful for modules.

When compiled into the kernel, the parameters can be specified on the
kernel command line as:

  ipmi_si.type=<type1>,<type2>...
       ipmi_si.ports=<port1>,<port2>... ipmi_si.addrs=<addr1>,<addr2>...
       ipmi_si.irqs=<irq1>,<irq2>... ipmi_si.trydefaults=[0|1]
       ipmi_si.regspacings=<sp1>,<sp2>,...
       ipmi_si.regsizes=<size1>,<size2>,...
       ipmi_si.regshifts=<shift1>,<shift2>,...
       ipmi_si.slave_addrs=<addr1>,<addr2>,...
       ipmi_si.force_kipmid=<enable1>,<enable2>,...
       ipmi_si.kipmid_max_busy_us=<ustime1>,<ustime2>,...

It works the same as the module parameters of the same names.

By default, the driver will attempt to detect any device specified by
ACPI, and if none of those is found, a KCS device at the spec-specified
0xca2.  If you want to turn this off, set the "trydefaults" option to
zero.

If your IPMI interface does not support interrupts and is a KCS or
SMIC interface, the IPMI driver will start a kernel thread for the
interface to help speed things up.  This is a low-priority kernel
thread that constantly polls the IPMI driver while an IPMI operation
is in progress.  The force_kipmid module parameter will allow the user
to force this thread on or off.  If you force it off and don't have
interrupts, the driver will run VERY slowly.  Don't blame me,
these interfaces suck.

Unfortunately, this thread can use a lot of CPU depending on the
interface's performance.  This can waste a lot of CPU and cause
various issues with detecting idle CPU and using extra power.  To
avoid this, the kipmid_max_busy_us parameter sets the maximum amount of
time, in microseconds, that kipmid will spin before sleeping for a tick.
This value sets a balance between performance and CPU waste and needs to
be tuned to your needs.  Maybe, someday, auto-tuning will be added, but
that's not a simple thing and even the auto-tuning would need to be
tuned to the user's desired performance.

The driver supports hot add and remove of interfaces.  This way,
interfaces can be added or removed after the kernel is up and running.
This is done using /sys/module/ipmi_si/parameters/hotmod, which is a
write-only parameter.  You write a string to this interface.  The string
has the format:
   <op1>[:op2[:op3...]]
The "op"s are:
   add|remove,kcs|bt|smic,mem|i/o,<address>[,<opt1>[,<opt2>[,...]]]
You can specify more than one interface on the line.  The "opt"s are:
   rsp=<regspacing>
   rsi=<regsize>
   rsh=<regshift>
   irq=<irq>
   ipmb=<ipmb slave addr>
and these have the same meanings as discussed above.  Note that you
can also use this on the kernel command line for a more compact format
for specifying an interface.  Note that when removing an interface,
only the first three parameters (si type, address type, and address)
are used for the comparison.  Any options are ignored for removing.

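As an illustration of the hotmod string format, the sketch below builds
a single "op" with one option.  The helper hotmod_format() is made up for
this example, and writing the address with a 0x prefix is an assumption
about what the driver accepts; check your kernel before relying on it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one hotmod "op" with an rsp= option.
   Returns what snprintf() returns, i.e. the length the full string
   needs, so the caller can detect truncation. */
static int hotmod_format(char *buf, size_t len,
                         const char *op,         /* "add" or "remove" */
                         const char *si_type,    /* "kcs", "bt", "smic" */
                         const char *addr_space, /* "mem" or "i/o" */
                         unsigned long addr,
                         unsigned int regspacing)
{
        assert(buf != NULL && len > 0);
        return snprintf(buf, len, "%s,%s,%s,0x%lx,rsp=%u",
                        op, si_type, addr_space, addr, regspacing);
}
```

Calling it with "add", "kcs", "i/o", 0xca2 and 4 yields the string
add,kcs,i/o,0xca2,rsp=4, which would then be written to the hotmod
parameter file.  For a remove, only the type, address space and address
matter, so the rsp= part would be ignored.
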
Other Pieces
------------

Get the detailed info related with the IPMI device
--------------------------------------------------

Some users need more detailed information about a device, like where
the address came from or the raw base device for the IPMI interface.
You can use the IPMI smi_watcher to catch the IPMI interfaces as they
come or go, and to grab the information, you can use the function
ipmi_get_smi_info(), which returns the following structure:

struct ipmi_smi_info {
        enum ipmi_addr_src addr_src;
        struct device *dev;
        union {
                struct {
                        void *acpi_handle;
                } acpi_info;
        } addr_info;
};

Currently, special info is returned only for SI_ACPI address sources.
Others may be added as necessary.

Note that the dev pointer is included in the above structure, and
assuming ipmi_get_smi_info() returns success, you must call put_device
on the dev pointer.

Watchdog
--------

A watchdog timer is provided that implements the Linux-standard
watchdog timer interface.  It has several module parameters that can be
used to control it:

  modprobe ipmi_watchdog timeout=<t> pretimeout=<t> action=<action type>
      preaction=<preaction type> preop=<preop type> start_now=x
      nowayout=x ifnum_to_use=n

ifnum_to_use specifies which interface the watchdog timer should use.
The default is -1, which means to pick the first one registered.

The timeout is the number of seconds to the action, and the pretimeout
is the number of seconds before the reset that the pre-timeout panic will
occur (if pretimeout is zero, then pretimeout will not be enabled).  Note
that the pretimeout is the time before the final timeout.  So if the
timeout is 50 seconds and the pretimeout is 10 seconds, then the pretimeout
will occur in 40 seconds (10 seconds before the timeout).

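The timeout/pretimeout relationship is simple subtraction, sketched
below.  The function name is invented, and treating a pretimeout that is
not smaller than the timeout as invalid is my assumption, not something
the driver is documented to do.

```c
#include <assert.h>

/* Seconds after the timer is started at which the pretimeout fires.
   Returns -1 if the pretimeout is disabled (zero) or, by assumption,
   if it is not smaller than the timeout. */
static int pretimeout_fires_at(int timeout, int pretimeout)
{
        assert(timeout >= 0 && pretimeout >= 0);
        if (pretimeout == 0 || pretimeout >= timeout)
                return -1;
        return timeout - pretimeout;
}
```

With timeout=50 and pretimeout=10 this gives 40, as in the example above.
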
The action may be "reset", "power_cycle", or "power_off", and
specifies what to do when the timer times out, and defaults to

The preaction may be "pre_smi" for an indication through the SMI
interface, "pre_int" for an indication through the SMI with an
interrupt, and "pre_nmi" for an NMI on a preaction.  This is how
the driver is informed of the pretimeout.

The preop may be set to "preop_none" for no operation on a pretimeout,
"preop_panic" to set the preoperation to panic, or "preop_give_data"
to provide data to read from the watchdog device when the pretimeout
occurs.  A "pre_nmi" setting CANNOT be used with "preop_give_data"
because you can't do data operations from an NMI.

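If you generate these settings from a script or a management tool, it
may be worth checking the one forbidden combination before loading the
module.  The check below is only a sketch of that rule; the function name
is hypothetical and it does not validate that the strings are otherwise
legal values.

```c
#include <assert.h>
#include <string.h>

/* Returns non-zero if the preaction/preop pair is allowed, zero for
   the one combination the text above rules out. */
static int watchdog_pre_settings_ok(const char *preaction,
                                    const char *preop)
{
        assert(preaction != NULL && preop != NULL);
        if (strcmp(preaction, "pre_nmi") == 0 &&
            strcmp(preop, "preop_give_data") == 0)
                return 0;
        return 1;
}
```
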
When preop is set to "preop_give_data", one byte comes ready to read
on the device when the pretimeout occurs.  Select and fasync work on
the device, as well.

If start_now is set to 1, the watchdog timer will start running as
soon as the driver is loaded.

If nowayout is set to 1, the watchdog timer will not stop when the
watchdog device is closed.  The default value of nowayout is true
if the CONFIG_WATCHDOG_NOWAYOUT option is enabled, or false if not.

When compiled into the kernel, the kernel command line is available
for configuring the watchdog:

  ipmi_watchdog.timeout=<t> ipmi_watchdog.pretimeout=<t>
        ipmi_watchdog.action=<action type>
        ipmi_watchdog.preaction=<preaction type>
        ipmi_watchdog.preop=<preop type>
        ipmi_watchdog.start_now=x
        ipmi_watchdog.nowayout=x

The options are the same as the module parameter options.

The watchdog will panic and start a 120 second reset timeout if it
gets a pre-action.  During a panic or a reboot, the watchdog will
start a 120 second timer if it is running to make sure the reboot occurs.

Note that if you use the NMI preaction for the watchdog, you MUST NOT
use the nmi watchdog.  There is no reasonable way to tell if an NMI
comes from the IPMI controller, so it must assume that if it gets an
otherwise unhandled NMI, it must be from IPMI and it will panic
immediately.

Once you open the watchdog timer, you must write a 'V' character to the
device to close it, or the timer will not stop.  This is a new semantic
for the driver, but makes it consistent with the rest of the watchdog
drivers in Linux.

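A userland program that wants the timer to stop when it exits can wrap
the 'V' write and the close together, roughly as sketched here.  The
helper name is made up; the descriptor is assumed to come from opening
the watchdog device (commonly /dev/watchdog, though the path depends on
your system).

```c
#include <assert.h>
#include <unistd.h>

/* Write the 'V' character and then close the descriptor.
   Returns 0 if both steps succeeded, -1 otherwise.  The descriptor
   is closed in either case. */
static int watchdog_magic_close(int fd)
{
        int ok;

        assert(fd >= 0);
        ok = (write(fd, "V", 1) == 1);
        if (close(fd) != 0)
                ok = 0;
        return ok ? 0 : -1;
}
```

Keep in mind that if nowayout is set, the timer keeps running after the
close no matter what was written.
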
Panic Timeouts
--------------

The OpenIPMI driver supports the ability to put semi-custom and custom
events in the system event log if a panic occurs.  If you enable the
'Generate a panic event to all BMCs on a panic' option, you will get
one event on a panic in a standard IPMI event format.  If you enable
the 'Generate OEM events containing the panic string' option, you will
also get a bunch of OEM events holding the panic string.

The field settings of the events are:
* Generator ID: 0x21 (kernel)
* EvM Rev: 0x03 (this event is formatted in IPMI 1.0 format)
* Sensor Type: 0x20 (OS critical stop sensor)
* Sensor #: The first byte of the panic string (0 if no panic string)
* Event Dir | Event Type: 0x6f (Assertion, sensor-specific event info)
* Event Data 1: 0xa1 (Runtime stop in OEM bytes 2 and 3)
* Event data 2: second byte of panic string
* Event data 3: third byte of panic string
See the IPMI spec for the details of the event layout.  This event is
always sent to the local management controller.  It will handle routing
the message to the right place.

Other OEM events have the following format:
Record ID (bytes 0-1): Set by the SEL.
Record type (byte 2): 0xf0 (OEM non-timestamped)
byte 3: The slave address of the card saving the panic
byte 4: A sequence number (starting at zero)
The rest of the bytes (11 bytes) are the panic string.  If the panic string
is longer than 11 bytes, multiple messages will be sent with increasing
sequence numbers.

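To show how a panic string maps onto these OEM records, here is a
sketch that splits a string into 16-byte records laid out as described
above.  It is an illustration only, not the driver's code: the function
name is invented, and zero-filling the unused bytes of the last record
(and the record ID bytes, which the SEL sets) is my assumption.

```c
#include <assert.h>
#include <string.h>

#define OEM_RECORD_LEN   16   /* 5 header bytes + 11 string bytes */
#define OEM_STRING_BYTES 11

/* Fill recs[] with OEM records for panic_str.  Returns the number of
   records written, which is at most max_records. */
static int build_panic_oem_records(const char *panic_str,
                                   unsigned char slave_addr,
                                   unsigned char recs[][OEM_RECORD_LEN],
                                   int max_records)
{
        size_t left = strlen(panic_str);
        int seq = 0;

        assert(max_records >= 0);
        while (left > 0 && seq < max_records) {
                size_t n = left < OEM_STRING_BYTES ? left : OEM_STRING_BYTES;
                unsigned char *r = recs[seq];

                memset(r, 0, OEM_RECORD_LEN); /* bytes 0-1 set by the SEL */
                r[2] = 0xf0;                  /* OEM non-timestamped */
                r[3] = slave_addr;            /* card saving the panic */
                r[4] = (unsigned char)seq;    /* sequence number */
                memcpy(&r[5], panic_str, n);  /* up to 11 string bytes */

                panic_str += n;
                left -= n;
                seq++;
        }
        return seq;
}
```

An 18-character panic string therefore produces two records: sequence 0
with the first 11 characters and sequence 1 with the remaining 7.
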
Because you cannot send OEM events using the standard interface, this
function will attempt to find an SEL and add the events there.  It
will first query the capabilities of the local management controller.
If it has an SEL, then they will be stored in the SEL of the local
management controller.  If not, and the local management controller is
an event generator, the event receiver from the local management
controller will be queried and the events sent to the SEL on that
device.  Otherwise, the events go nowhere since there is nowhere to
send them.

If the poweroff capability is selected, the IPMI driver will install
a shutdown function into the standard poweroff function pointer.  This
is in the ipmi_poweroff module.  When the system requests a powerdown,
it will send the proper IPMI commands to do this.  This is supported on
several platforms.

There is a module parameter named "poweroff_powercycle" that may
either be zero (do a power down) or non-zero (do a power cycle: power
the system off, then power it on in a few seconds).  Setting
ipmi_poweroff.poweroff_powercycle=x will do the same thing on the kernel
command line.  The parameter is also available via the proc filesystem
in /proc/sys/dev/ipmi/poweroff_powercycle.  Note that if the system
does not support power cycling, it will always do the power off.

The "ifnum_to_use" parameter specifies which interface the poweroff
code should use.  The default is -1, which means to pick the first one
registered.

Note that if you have ACPI enabled, the system will prefer using ACPI to
power off.