linux/Documentation/PCI/pci-error-recovery.txt

                       PCI Error Recovery
                       ------------------
                        February 2, 2006

                 Current document maintainer:
             Linas Vepstas <linasvepstas@gmail.com>
          updated by Richard Lary <rlary@us.ibm.com>
       and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009


Many PCI bus controllers are able to detect a variety of hardware
PCI errors on the bus, such as parity errors on the data and address
busses, as well as SERR and PERR errors.  Some of the more advanced
chipsets are able to deal with these errors; these include PCI-E chipsets,
and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
pSeries boxes. A typical action taken is to disconnect the affected device,
halting all I/O to it.  The goal of a disconnection is to avoid system
corruption; for example, to halt system memory corruption due to DMA's
to "wild" addresses. Typically, a reconnection mechanism is also
offered, so that the affected PCI device(s) are reset and put back
into working condition. The reset phase requires coordination
between the affected device drivers and the PCI controller chip.
This document describes a generic API for notifying device drivers
of a bus disconnection, and then performing error recovery.
This API is currently implemented in the 2.6.16 and later kernels.

Reporting and recovery is performed in several steps. First, when
a PCI hardware error has resulted in a bus disconnect, that event
is reported as soon as possible to all affected device drivers,
including multiple instances of a device driver on multi-function
cards. This allows device drivers to avoid deadlocking in spinloops,
waiting for some i/o-space register to change, when it never will.
It also gives the drivers a chance to defer incoming I/O as
needed.

Next, recovery is performed in several stages. Most of the complexity
is forced by the need to handle multi-function devices, that is,
devices that have multiple device drivers associated with them.
In the first stage, each driver is allowed to indicate what type
of reset it desires, the choices being a simple re-enabling of I/O
or requesting a slot reset.

If any driver requests a slot reset, that is what will be done.

After a reset and/or a re-enabling of I/O, all drivers are
again notified, so that they may then perform any device setup/config
that may be required.  After these have all completed, a final
"resume normal operations" event is sent out.

The biggest reason for choosing a kernel-based implementation rather
than a user-space implementation was the need to deal with bus
disconnects of PCI devices attached to storage media, and, in particular,
disconnects from devices holding the root file system.  If the root
file system is disconnected, a user-space mechanism would have to go
through a large number of contortions to complete recovery. Almost all
of the current Linux file systems are not tolerant of disconnection
from/reconnection to their underlying block device. By contrast,
bus errors are easy to manage in the device driver.
Indeed, most device drivers already handle very similar recovery
procedures; for example, the SCSI-generic layer already provides
significant mechanisms for dealing with SCSI bus errors and SCSI bus
resets.


Detailed Design
---------------
Design and implementation details below, based on a chain of
public email discussions with Ben Herrenschmidt, circa 5 April 2005.

The error recovery API support is exposed to the driver in the form of
a structure of function pointers pointed to by a new field in struct
pci_driver. A driver that fails to provide the structure is "non-aware",
and the actual recovery steps taken are platform dependent.  The
arch/powerpc implementation will simulate a PCI hotplug remove/add.

This structure has the form:
struct pci_error_handlers
{
        int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
        int (*mmio_enabled)(struct pci_dev *dev);
        int (*link_reset)(struct pci_dev *dev);
        int (*slot_reset)(struct pci_dev *dev);
        void (*resume)(struct pci_dev *dev);
};

The possible channel states are:
enum pci_channel_state {
        pci_channel_io_normal,  /* I/O channel is in normal state */
        pci_channel_io_frozen,  /* I/O to channel is blocked */
        pci_channel_io_perm_failure, /* PCI card is dead */
};

Possible return values are:
enum pci_ers_result {
        PCI_ERS_RESULT_NONE,        /* no result/none/not supported in device driver */
        PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
        PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
        PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
        PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
};

A driver does not have to implement all of these callbacks; however,
if it implements any, it must implement error_detected(). If a callback
is not implemented, the corresponding feature is considered unsupported.
For example, if mmio_enabled() and resume() aren't there, then it
is assumed that the driver is not doing any direct recovery and requires
a slot reset. If link_reset() is not implemented, the card is assumed to
not care about link resets. Typically a driver will want to know about
a slot_reset().

The actual steps taken by a platform to recover from a PCI error
event will be platform-dependent, but will follow the general
sequence described below.

STEP 0: Error Event
-------------------
A PCI bus error is detected by the PCI hardware.  On powerpc, the slot
is isolated, in that all I/O is blocked: all reads return 0xffffffff,
all writes are ignored.


STEP 1: Notification
--------------------
Platform calls the error_detected() callback on every instance of
every driver affected by the error.

At this point, the device might not be accessible anymore, depending on
the platform (the slot will be isolated on powerpc). The driver may
already have "noticed" the error because of a failing I/O, but this
is the proper "synchronization point", that is, it gives the driver
a chance to clean up, waiting for pending work (timers and the like)
to complete; it can take semaphores, schedule, and so on: everything but
touch the device. Within this function and after it returns, the driver
shouldn't do any new IOs. Called in task context. This is sort of a
"quiesce" point.
See note about interrupts at the end of this doc.

All drivers participating in this system must implement this call.
The driver must return one of the following result codes:
                - PCI_ERS_RESULT_CAN_RECOVER:
                  Driver returns this if it thinks it might be able to recover
                  the HW by just banging IOs or if it wants to be given
                  a chance to extract some diagnostic information (see
                  mmio_enabled, below).
                - PCI_ERS_RESULT_NEED_RESET:
                  Driver returns this if it can't recover without a
                  slot reset.
                - PCI_ERS_RESULT_DISCONNECT:
                  Driver returns this if it doesn't want to recover at all.

The next step taken will depend on the result codes returned by the
drivers.

If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER,
then the platform should re-enable IOs on the slot (or do nothing in
particular, if the platform doesn't isolate slots), and recovery
proceeds to STEP 2 (MMIO Enable).

If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET),
then recovery proceeds to STEP 4 (Slot Reset).

If the platform is unable to recover the slot, the next step
is STEP 6 (Permanent Failure).

>>> The current powerpc implementation assumes that a device driver will
>>> *not* schedule or semaphore in this routine; the current powerpc
>>> implementation uses one kernel thread to notify all devices;
>>> thus, if one device sleeps/schedules, all devices are affected.
>>> Doing better requires complex multi-threaded logic in the error
>>> recovery implementation (e.g. waiting for all notification threads
>>> to "join" before proceeding with recovery.)  This seems excessively
>>> complex and not worth implementing.

>>> The current powerpc implementation doesn't much care if the device
>>> attempts I/O at this point, or not.  I/O's will fail, returning
>>> a value of 0xff on read, and writes will be dropped. If more than
>>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
>>> assumes that the device driver has gone into an infinite loop
>>> and prints an error to syslog.  A reboot is then required to
>>> get the device working again.

STEP 2: MMIO Enabled
--------------------
The platform re-enables MMIO to the device (but typically not the
DMA), and then calls the mmio_enabled() callback on all affected
device drivers.

This is the "early recovery" call. IOs are allowed again, but DMA is
not, with some restrictions. This is NOT a callback for the driver to
start operations again, only to peek/poke at the device, extract diagnostic
information, if any, and eventually do things like trigger a device local
reset or some such, but not restart operations. This callback is made if
all drivers on a segment agree that they can try to recover and if no automatic
link reset was performed by the HW. If the platform can't just re-enable IOs
without a slot reset or a link reset, it will not call this callback, and
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset).

>>> The following is proposed; no platform implements this yet:
>>> Proposal: All I/O's should be done _synchronously_ from within
>>> this callback, errors triggered by them will be returned via
>>> the normal pci_check_whatever() API, no new error_detected()
>>> callback will be issued due to an error happening here. However,
>>> such an error might cause IOs to be re-blocked for the whole
>>> segment, and thus invalidate the recovery that other devices
>>> on the same segment might have done, forcing the whole segment
>>> into one of the next states, that is, link reset or slot reset.

The driver should return one of the following result codes:
                - PCI_ERS_RESULT_RECOVERED
                  Driver returns this if it thinks the device is fully
                  functional and thinks it is ready to start
                  normal driver operations again. There is no
                  guarantee that the driver will actually be
                  allowed to proceed, as another driver on the
                  same segment might have failed and thus triggered a
                  slot reset on platforms that support it.

                - PCI_ERS_RESULT_NEED_RESET
                  Driver returns this if it thinks the device is not
                  recoverable in its current state and it needs a slot
                  reset to proceed.

                - PCI_ERS_RESULT_DISCONNECT
                  Same as above. Total failure, no recovery even after
                  reset driver dead. (To be defined more precisely)

The next step taken depends on the results returned by the drivers.
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations).

If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
proceeds to STEP 4 (Slot Reset).

STEP 3: Link Reset
------------------
The platform resets the link, and then calls the link_reset() callback
on all affected device drivers.  This is a PCI-Express specific state
and is done whenever a non-fatal error has been detected that can be
"solved" by resetting the link. This call informs the driver of the
reset and the driver should check to see if the device appears to be
in working condition.
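
The driver-side contract for the callbacks described so far
(error_detected(), mmio_enabled() and link_reset()) can be sketched in
user space. This is only an illustration under stated assumptions, not
kernel code: every name carrying a mock_ prefix is hypothetical and merely
stands in for struct pci_dev, enum pci_channel_state and the
PCI_ERS_RESULT_* codes shown earlier. The rule that an all-ones read means
"still broken" is also an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel types named in this
 * document. The mock_ prefix marks everything here as illustration only. */
enum mock_channel_state { MOCK_IO_NORMAL, MOCK_IO_FROZEN, MOCK_IO_PERM_FAILURE };

enum mock_ers_result {
        MOCK_ERS_NONE,
        MOCK_ERS_CAN_RECOVER,
        MOCK_ERS_NEED_RESET,
        MOCK_ERS_DISCONNECT,
        MOCK_ERS_RECOVERED,
};

struct mock_dev {
        int isolated;        /* 1 while the platform blocks I/O to the slot */
        int io_stopped;      /* set once the driver has quiesced itself */
        uint32_t status_reg; /* pretend MMIO register */
};

/* Reads from an isolated slot return all ones, as described for powerpc. */
static uint32_t mock_read32(const struct mock_dev *dev)
{
        assert(dev != NULL);
        return dev->isolated ? 0xffffffffu : dev->status_reg;
}

/* STEP 1: quiesce without touching the device, then say what is needed. */
static enum mock_ers_result mock_error_detected(struct mock_dev *dev,
                                                enum mock_channel_state state)
{
        dev->io_stopped = 1;
        if (state == MOCK_IO_PERM_FAILURE)
                return MOCK_ERS_DISCONNECT;
        return MOCK_ERS_CAN_RECOVER;
}

/* STEP 2 and STEP 3 share one probe here: look at the device, but do not
 * restart normal I/O. Treating an all-ones read as "still broken" is an
 * assumption of this sketch, not a rule from the text. */
static enum mock_ers_result mock_probe(struct mock_dev *dev)
{
        if (mock_read32(dev) == 0xffffffffu)
                return MOCK_ERS_NEED_RESET;
        return MOCK_ERS_RECOVERED;
}

/* Platform side, reduced to a single driver: STEP 1, then STEP 2 only if
 * MMIO can be re-enabled without a reset. */
static enum mock_ers_result mock_recover(struct mock_dev *dev, int reenable_ok)
{
        enum mock_ers_result r;

        dev->isolated = 1;
        r = mock_error_detected(dev, MOCK_IO_FROZEN);
        if (r != MOCK_ERS_CAN_RECOVER)
                return r;
        if (!reenable_ok)
                return MOCK_ERS_NEED_RESET; /* STEP 2 is skipped; a reset follows */
        dev->isolated = 0;
        return mock_probe(dev); /* plays the role of mmio_enabled() */
}
```

Calling mock_recover() on a device whose register reads back normally yields
MOCK_ERS_RECOVERED; passing reenable_ok = 0 models a platform that cannot
re-enable I/O without a reset and therefore moves on to a reset step.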

The driver is not supposed to restart normal driver I/O operations
at this point.  It should limit itself to "probing" the device to
check its recoverability status. If all is right, then the platform
will call resume() once all drivers have ack'd link_reset().

        Result codes:
                (identical to STEP 2 (MMIO Enabled))

The platform then proceeds to either STEP 4 (Slot Reset) or STEP 5
(Resume Operations).

>>> The current powerpc implementation does not implement this callback.

STEP 4: Slot Reset
------------------

In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
platform will perform a slot reset on the requesting PCI device(s).
The actual steps taken by a platform to perform a slot reset
will be platform-dependent. Upon completion of slot reset, the
platform will call the device slot_reset() callback.

Powerpc platforms implement two levels of slot reset:
soft reset (default) and fundamental (optional) reset.

Powerpc soft reset consists of asserting the adapter #RST line and then
restoring the PCI BAR's and PCI configuration header to a state
that is equivalent to what it would be after a fresh system
power-on followed by power-on BIOS/system firmware initialization.
Soft reset is also known as hot-reset.

Powerpc fundamental reset is supported by PCI Express cards only
and results in device's state machines, hardware logic, port states and
configuration registers to initialize to their default conditions.

For most PCI devices, a soft reset will be sufficient for recovery.
Optional fundamental reset is provided to support a limited number
of PCI Express PCI devices for which a soft reset is not sufficient
for recovery.

If the platform supports PCI hotplug, then the reset might be
performed by toggling the slot electrical power off/on.

It is important for the platform to restore the PCI config space
to the "fresh poweron" state, rather than the "last state". After
a slot reset, the device driver will almost always use its standard
device initialization routines, and an unusual config space setup
may result in hung devices, kernel panics, or silent data corruption.

This call gives drivers the chance to re-initialize the hardware
(re-download firmware, etc.).  At this point, the driver may assume
that the card is in a fresh state and is fully functional. The slot
is unfrozen and the driver has full access to PCI config space,
memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
will also be available.

Drivers should not restart normal I/O processing operations
at this point.  If all device drivers report success on this
callback, the platform will call resume() to complete the sequence,
and let the driver restart normal I/O processing.

A driver can still return a critical failure for this function if
it can't get the device operational after reset.  If the platform
previously tried a soft reset, it might now try a hard reset (power
cycle) and then call slot_reset() again.  If the device still can't
be recovered, there is nothing more that can be done;  the platform
will typically report a "permanent failure" in such a case.  The
device will be considered "dead" in this case.

Drivers for multi-function cards will need to coordinate among
themselves as to which driver instance will perform any "one-shot"
or global device initialization. For example, the Symbios sym53cxx2
driver performs device init only from PCI function 0:

+       if (PCI_FUNC(pdev->devfn) == 0)
+               sym_reset_scsi_bus(np, 0);

        Result codes:
                - PCI_ERS_RESULT_DISCONNECT
                Same as above.

Drivers for PCI Express cards that require a fundamental reset must
set the needs_freset bit in the pci_dev structure in their probe function.
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
PCI card types:

+               pdev->needs_freset = 1;

Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
Failure).

>>> The current powerpc implementation does not try a power-cycle
>>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
>>> However, it probably should.


STEP 5: Resume Operations
-------------------------
The platform will call the resume() callback on all affected device
drivers if all drivers on the segment have returned
PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks.
The goal of this callback is to tell the driver to restart activity,
that everything is back and running. This callback does not return
a result code.

At this point, if a new error happens, the platform will restart
a new error recovery sequence.

STEP 6: Permanent Failure
-------------------------
A "permanent failure" has occurred, and the platform cannot recover
the device.  The platform will call error_detected() with a
pci_channel_state value of pci_channel_io_perm_failure.

The device driver should, at this point, assume the worst. It should
cancel all pending I/O, refuse all new I/O, returning -EIO to
higher layers. The device driver should then clean up all of its
memory and remove itself from kernel operations, much as it would
during system shutdown.

The platform will typically notify the system operator of the
permanent failure in some way.  If the device is hotplug-capable,
the operator will probably want to remove and replace the device.
Note, however, not all failures are truly "permanent". Some are
caused by over-heating, some by a poorly seated card. Many
PCI error events are caused by software bugs, e.g. DMA's to
wild addresses or bogus split transactions due to programming
errors. See the discussion in powerpc/eeh-pci-error-recovery.txt
for additional detail on real-life experience of the causes of
software errors.


Conclusion; General Remarks
---------------------------
The way the callbacks are called is platform policy. A platform with
no slot reset capability may want to just "ignore" drivers that can't
recover (disconnect them) and try to let other cards on the same segment
recover. Keep in mind that in most real life cases, though, there will
be only one driver per segment.

Now, a note about interrupts. If you get an interrupt and your
device is dead or has been isolated, there is a problem :)
The current policy is to turn this into a platform policy.
That is, the recovery API only requires that:

 - There is no guarantee that interrupt delivery can proceed from any
device on the segment starting from the error detection and until the
slot_reset callback is called, at which point interrupts are expected
to be fully operational.

 - There is no guarantee that interrupt delivery is stopped, that is,
a driver that gets an interrupt after detecting an error, or that detects
an error within the interrupt handler such that it prevents proper
ack'ing of the interrupt (and thus removal of the source) should just
return IRQ_NOTHANDLED. It's up to the platform to deal with that
condition, typically by masking the IRQ source during the duration of
the error handling. It is expected that the platform "knows" which
interrupts are routed to error-management capable slots and can deal
with temporarily disabling that IRQ number during error processing (this
isn't terribly complex). That means some IRQ latency for other devices
sharing the interrupt, but there is simply no other way. High end
platforms aren't supposed to share interrupts between many devices
anyway :)

>>> Implementation details for the powerpc platform are discussed in
>>> the file Documentation/powerpc/eeh-pci-error-recovery.txt

>>> As of this writing, there is a growing list of device drivers with
>>> patches implementing error recovery. Not all of these patches are in
>>> mainline yet. These may be used as "examples":
>>>
>>> drivers/scsi/ipr
>>> drivers/scsi/sym53c8xx_2
>>> drivers/scsi/qla2xxx
>>> drivers/net/bnx2.c
>>> drivers/net/e100.c
>>> drivers/net/e1000
>>> drivers/net/e1000e
>>> drivers/net/ixgb
>>> drivers/net/ixgbe
>>> drivers/net/cxgb3
>>> drivers/net/s2io.c
>>> drivers/net/qlge

The End
-------

