linux/Documentation/lockdep-design.txt
Runtime locking correctness validator
=====================================

started by Ingo Molnar <mingo@redhat.com>
additions by Arjan van de Ven <arjan@linux.intel.com>

Lock-class
----------

The basic object the validator operates upon is a 'class' of locks.

A class of locks is a group of locks that are logically the same with
respect to locking rules, even if the locks may have multiple (possibly
tens of thousands of) instantiations. For example a lock in the inode
struct is one class, while each inode has its own instantiation of that
lock class.

The validator tracks the 'state' of lock-classes, and it tracks
dependencies between different lock-classes. The validator maintains a
rolling proof that the state and the dependencies are correct.

Unlike a lock instantiation, the lock-class itself never goes away: when
a lock-class is used for the first time after bootup it gets registered,
and all subsequent uses of that lock-class will be attached to this
lock-class.

State
-----

The validator tracks lock-class usage history into 4n + 1 separate state bits:

- 'ever held in STATE context'
- 'ever held as readlock in STATE context'
- 'ever held with STATE enabled'
- 'ever held as readlock with STATE enabled'

Where STATE can be any one of (kernel/lockdep_states.h)
 - hardirq
 - softirq
 - reclaim_fs

- 'ever used'                                       [ == !unused        ]

When locking rules are violated, these state bits are presented in the
locking error messages, inside curlies. A contrived example:

   modprobe/2287 is trying to acquire lock:
    (&sio_locks[i].lock){-.-...}, at: [<c02867fd>] mutex_lock+0x21/0x24

   but task is already holding lock:
    (&sio_locks[i].lock){-.-...}, at: [<c02867fd>] mutex_lock+0x21/0x24


The bit position indicates STATE, STATE-read, for each of the states listed
above, and the character displayed in each indicates:

   '.'  acquired while irqs disabled and not in irq context
   '-'  acquired in irq context
   '+'  acquired with irqs enabled
   '?'  acquired in irq context with irqs enabled.

Unused mutexes cannot be part of the cause of an error.
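
The decoding above can be sketched in user-space C. This is an
illustration only, not the kernel's implementation: the struct layout and
the names class_usage, usage_char() and format_usage() are assumptions
made for this example. It maps the two history bits kept per STATE (and
per STATE-read) to the character shown inside the curlies.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_STATES 3	/* hardirq, softirq, reclaim_fs */

/* Hypothetical per-class history: index [state][0] = lock, [1] = readlock. */
struct class_usage {
	bool in_ctx[NR_STATES][2];	/* 'ever held in STATE context' */
	bool enabled[NR_STATES][2];	/* 'ever held with STATE enabled' */
};

/* Map the two history bits of one position to its report character. */
static char usage_char(bool ever_in_context, bool ever_with_enabled)
{
	if (ever_in_context)
		return ever_with_enabled ? '?' : '-';
	return ever_with_enabled ? '+' : '.';
}

/* Fill 'out' with STATE, STATE-read for each state, in listing order. */
static void format_usage(const struct class_usage *u,
			 char out[2 * NR_STATES + 1])
{
	size_t pos = 0;

	for (size_t s = 0; s < NR_STATES; s++)
		for (size_t r = 0; r < 2; r++)
			out[pos++] = usage_char(u->in_ctx[s][r],
						u->enabled[s][r]);
	out[pos] = '\0';
}
```

With only the 'ever held in hardirq context' and 'ever held in softirq
context' bits set, format_usage() produces "-.-...", the string shown in
the contrived report above.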


Single-lock state rules:
------------------------

A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The
following states are exclusive, and only one of them is allowed to be
set for any lock-class:

 <hardirq-safe> and <hardirq-unsafe>
 <softirq-safe> and <softirq-unsafe>

The validator detects and reports lock usage that violates these
single-lock state rules.
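
A minimal sketch of the exclusivity rule, assuming one bit per
safe/unsafe state. The enum values, struct lock_class_model and
mark_usage() are hypothetical names for this example; the kernel's own
bookkeeping is more detailed.

```c
#include <stdbool.h>

/* Hypothetical state bits for one lock-class. */
enum usage_bit {
	HARDIRQ_SAFE   = 1u << 0,
	HARDIRQ_UNSAFE = 1u << 1,
	SOFTIRQ_SAFE   = 1u << 2,
	SOFTIRQ_UNSAFE = 1u << 3
};

struct lock_class_model {
	unsigned int usage;
};

/*
 * Record a newly observed state. Returns false, leaving the class
 * unchanged, if the new state is exclusive with one already recorded.
 */
static bool mark_usage(struct lock_class_model *c, unsigned int new_bit)
{
	unsigned int after;

	/* A softirq-unsafe class is automatically hardirq-unsafe as well. */
	if (new_bit & SOFTIRQ_UNSAFE)
		new_bit |= HARDIRQ_UNSAFE;

	after = c->usage | new_bit;
	if ((after & HARDIRQ_SAFE) && (after & HARDIRQ_UNSAFE))
		return false;
	if ((after & SOFTIRQ_SAFE) && (after & SOFTIRQ_UNSAFE))
		return false;

	c->usage = after;
	return true;
}
```

In this model a class first seen as hardirq-safe can later be marked
softirq-safe, but an attempt to mark it softirq-unsafe is rejected,
because that would also make it hardirq-unsafe.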

Multi-lock dependency rules:
----------------------------

The same lock-class must not be acquired twice, because this could lead
to lock recursion deadlocks.

Furthermore, two locks may not be taken in different order:

 <L1> -> <L2>
 <L2> -> <L1>

because this could lead to lock inversion deadlocks. (The validator
finds such dependencies in arbitrary complexity, i.e. there can be any
other locking sequence between the acquire-lock operations, the
validator will still track all dependencies between locks.)
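
The ordering rule can be illustrated with a small dependency graph. This
is a sketch under stated assumptions, not the validator's data structure:
lock-classes are plain integers below MAX_CLASSES, and dep_graph,
reaches() and add_dependency() are names invented for this example. A new
<held> -> <acquired> dependency is refused when <acquired> already leads
back to <held>, however many intermediate locks are involved.

```c
#include <stdbool.h>

#define MAX_CLASSES 8	/* arbitrary limit for this sketch */

struct dep_graph {
	/* edge[a][b] is true if b was ever acquired while a was held */
	bool edge[MAX_CLASSES][MAX_CLASSES];
};

/* Depth-first search: is 'to' reachable from 'from'? */
static bool reaches(const struct dep_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reaches(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired taken while held". Returns false if that would
 * close a cycle (inversion) or if both are the same class (recursion).
 */
static bool add_dependency(struct dep_graph *g, int held, int acquired)
{
	bool seen[MAX_CLASSES] = { false };

	if (reaches(g, acquired, held, seen))
		return false;
	g->edge[held][acquired] = true;
	return true;
}
```

After recording 0 -> 1 and 1 -> 2, an attempt to record 2 -> 0 is refused
even though locks 0 and 2 were never taken directly after each other in
the opposite order.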

Furthermore, the following usage based lock dependencies are not allowed
between any two lock-classes:

   <hardirq-safe>   ->  <hardirq-unsafe>
   <softirq-safe>   ->  <softirq-unsafe>

The first rule comes from the fact that a hardirq-safe lock could be
taken by a hardirq context, interrupting a hardirq-unsafe lock - and
thus could result in a lock inversion deadlock. Likewise, a softirq-safe
lock could be taken by a softirq context, interrupting a softirq-unsafe
lock.

The above rules are enforced for any locking sequence that occurs in the
kernel: when acquiring a new lock, the validator checks whether there is
any rule violation between the new lock and any of the held locks.

When a lock-class changes its state, the following aspects of the above
dependency rules are enforced:

- if a new hardirq-safe lock is discovered, we check whether it
  took any hardirq-unsafe lock in the past.

- if a new softirq-safe lock is discovered, we check whether it took
  any softirq-unsafe lock in the past.

- if a new hardirq-unsafe lock is discovered, we check whether any
  hardirq-safe lock took it in the past.

- if a new softirq-unsafe lock is discovered, we check whether any
  softirq-safe lock took it in the past.

(Again, we do these checks too on the basis that an interrupt context
could interrupt _any_ of the irq-unsafe or hardirq-unsafe locks, which
could lead to a lock inversion deadlock - even if that lock scenario did
not trigger in practice yet.)

Exception: Nested data dependencies leading to nested locking
-------------------------------------------------------------

There are a few cases where the Linux kernel acquires more than one
instance of the same lock-class. Such cases typically happen when there
is some sort of hierarchy within objects of the same type. In these
cases there is an inherent "natural" ordering between the two objects
(defined by the properties of the hierarchy), and the kernel grabs the
locks in this fixed order on each of the objects.

An example of such an object hierarchy that results in "nested locking"
is that of a "whole disk" block-dev object and a "partition" block-dev
object; the partition is "part of" the whole device and as long as one
always takes the whole disk lock as a higher lock than the partition
lock, the lock ordering is fully correct. The validator does not
automatically detect this natural ordering, as the locking rule behind
the ordering is not static.

In order to teach the validator about this correct usage model, new
versions of the various locking primitives were added that allow you to
specify a "nesting level". An example call, for the block device mutex,
looks like this:

enum bdev_bd_mutex_lock_class
{
       BD_MUTEX_NORMAL,
       BD_MUTEX_WHOLE,
       BD_MUTEX_PARTITION
};

 mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);

In this case the locking is done on a bdev object that is known to be a
partition.

The validator treats a lock that is taken in such a nested fashion as a
separate (sub)class for the purposes of validation.

Note: When changing code to use the _nested() primitives, be careful and
check really thoroughly that the hierarchy is correctly mapped; otherwise
you can get false positives or false negatives.
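
The effect of the nesting level can be modelled in user space. This is a
sketch, not kernel code: it assumes the recursion check compares the pair
(class, subclass) rather than the class alone, and task_model,
held_lock_model and acquire_nested() are hypothetical names. Only the
enum is taken from the example above.

```c
#include <stdbool.h>

#define MAX_HELD 8	/* arbitrary limit for this sketch */

enum bdev_bd_mutex_lock_class {
	BD_MUTEX_NORMAL,
	BD_MUTEX_WHOLE,
	BD_MUTEX_PARTITION
};

struct held_lock_model {
	int class_id;
	int subclass;
};

struct task_model {
	struct held_lock_model held[MAX_HELD];
	int depth;
};

/*
 * Push a lock onto the task's held-lock stack. Returns false if the
 * same (class, subclass) pair is already held or the stack is full.
 */
static bool acquire_nested(struct task_model *t, int class_id, int subclass)
{
	if (t->depth == MAX_HELD)
		return false;
	for (int i = 0; i < t->depth; i++)
		if (t->held[i].class_id == class_id &&
		    t->held[i].subclass == subclass)
			return false;	/* would look like lock recursion */

	t->held[t->depth].class_id = class_id;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return true;
}
```

In this model a task may hold the same class once as BD_MUTEX_WHOLE and
once as BD_MUTEX_PARTITION, but a second BD_MUTEX_PARTITION acquisition
is still reported. This is also why a wrongly mapped hierarchy can hide a
real recursion or flag a harmless one.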

Proof of 100% correctness:
--------------------------

The validator achieves perfect, mathematical 'closure' (proof of locking
correctness) in the sense that for every simple, standalone single-task
locking sequence that occurred at least once during the lifetime of the
kernel, the validator proves it with a 100% certainty that no
combination and timing of these locking sequences can cause any class of
lock related deadlock. [*]

I.e. complex multi-CPU and multi-task locking scenarios do not have to
occur in practice to prove a deadlock: only the simple 'component'
locking chains have to occur at least once (anytime, in any
task/context) for the validator to be able to prove correctness. (For
example, complex deadlocks that would normally need more than 3 CPUs and
a very unlikely constellation of tasks, irq-contexts and timings to
occur, can be detected on a plain, lightly loaded single-CPU system as
well!)

This radically decreases the complexity of locking related QA of the
kernel: what has to be done during QA is to trigger as many "simple"
single-task locking dependencies in the kernel as possible, at least
once, to prove locking correctness - instead of having to trigger every
possible combination of locking interaction between CPUs, combined with
every possible hardirq and softirq nesting scenario (which is impossible
to do in practice).

[*] assuming that the validator itself is 100% correct, and no other
    part of the system corrupts the state of the validator in any way.
    We also assume that all NMI/SMM paths [which could interrupt
    even hardirq-disabled codepaths] are correct and do not interfere
    with the validator. We also assume that the 64-bit 'chain hash'
    value is unique for every lock-chain in the system. Also, lock
    recursion must not be higher than 20.

Performance:
------------

The above rules require _massive_ amounts of runtime checking. If we did
that for every lock taken and for every irqs-enable event, it would
render the system practically unusably slow. The complexity of checking
is O(N^2), so even with just a few hundred lock-classes we'd have to do
tens of thousands of checks for every event.

This problem is solved by checking any given 'locking scenario' (unique
sequence of locks taken after each other) only once. A simple stack of
held locks is maintained, and a lightweight 64-bit hash value is
calculated, which hash is unique for every lock chain. The hash value,
when the chain is validated for the first time, is then put into a hash
table, which hash-table can be checked in a lockfree manner. If the
locking chain occurs again later on, the hash table tells us that we
don't have to validate the chain again.
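
The caching idea can be sketched as follows. Everything here is an
assumption made for illustration: the FNV-1a-style mixing in chain_hash()
is a stand-in and not the kernel's chain hash, the table is a small
open-addressed array rather than a lockfree hash table, and chain_cache
and check_chain() are hypothetical names. Like the text above, the sketch
treats an equal 64-bit hash as an equal chain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64	/* arbitrary size for this sketch */

struct chain_cache {
	uint64_t key[CACHE_SLOTS];
	bool used[CACHE_SLOTS];
	unsigned long full_validations;	/* how often the slow path ran */
};

/* Order-sensitive 64-bit hash over the stack of held class ids. */
static uint64_t chain_hash(const int *classes, size_t n)
{
	uint64_t h = 14695981039346656037ULL;

	for (size_t i = 0; i < n; i++) {
		h ^= (uint64_t)(unsigned int)classes[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/*
 * Returns true if the chain was seen for the first time and therefore
 * had to be validated in full, false if the cached hash made that
 * unnecessary.
 */
static bool check_chain(struct chain_cache *c, const int *classes, size_t n)
{
	uint64_t h = chain_hash(classes, n);

	for (size_t probe = 0; probe < CACHE_SLOTS; probe++) {
		size_t slot = (size_t)((h + probe) % CACHE_SLOTS);

		if (!c->used[slot]) {
			c->used[slot] = true;
			c->key[slot] = h;
			c->full_validations++;	/* expensive checks go here */
			return true;
		}
		if (c->key[slot] == h)
			return false;	/* chain already validated */
	}
	c->full_validations++;	/* table full: validate without caching */
	return true;
}
```

Calling check_chain() twice with the chain {1, 2} runs the expensive path
once; the chain {2, 1} hashes differently and is validated separately,
because the order of acquisition is part of the scenario.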

Troubleshooting:
----------------

The validator tracks at most MAX_LOCKDEP_KEYS lock classes.
Exceeding this number will trigger the following lockdep warning:

	(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))

By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
desktop systems have fewer than 1,000 lock classes, so this warning
normally results from lock-class leakage or failure to properly
initialize locks.  These two problems are illustrated below:

1.	Repeated module loading and unloading while running the validator
	will result in lock-class leakage.  The issue here is that each
	load of the module will create a new set of lock classes for
	that module's locks, but module unloading does not remove old
	classes (see below discussion of reuse of lock classes for why).
	Therefore, if that module is loaded and unloaded repeatedly,
	the number of lock classes will eventually reach the maximum.

2.	Using structures such as arrays that have large numbers of
	locks that are not explicitly initialized.  For example,
	a hash table with 8192 buckets where each bucket has its own
	spinlock_t will consume 8192 lock classes -unless- each spinlock
	is explicitly initialized at runtime, for example, using the
	run-time spin_lock_init() as opposed to compile-time initializers
	such as __SPIN_LOCK_UNLOCKED().  Failure to properly initialize
	the per-bucket spinlocks would guarantee lock-class overflow.
	In contrast, a loop that called spin_lock_init() on each lock
	would place all 8192 locks into a single lock class.

	The moral of this story is that you should always explicitly
	initialize your locks.
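
The difference described in item 2 can be modelled in user space. This is
not kernel code and does not call the real spin_lock_init(). It assumes,
for illustration, that a run-time init call site contributes one static
key shared by every lock initialized there, while a lock that never
passes through an init call is keyed by its own address. The names
lock_model, model_lock_init(), model_first_use() and count_classes() are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct lock_class_key_model {
	char unused;
};

struct lock_model {
	const void *key;	/* identity of the lock's class */
};

/* Run-time init: one static key per call site, shared by all callers. */
#define model_lock_init(lock)						\
	do {								\
		static struct lock_class_key_model site_key;		\
		(lock)->key = &site_key;				\
	} while (0)

/* No init call was made: fall back to the lock's own address as key. */
static void model_first_use(struct lock_model *lock)
{
	if (!lock->key)
		lock->key = lock;
}

/* Count the distinct class keys among n locks. */
static size_t count_classes(const struct lock_model *locks, size_t n)
{
	size_t classes = 0;

	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		for (size_t j = 0; j < i; j++)
			if (locks[j].key == locks[i].key)
				seen = true;
		if (!seen)
			classes++;
	}
	return classes;
}
```

In this model, a loop calling model_lock_init() over an array of bucket
locks leaves them all in one class, while the same array used without
initialization ends up with one class per bucket.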

One might argue that the validator should be modified to allow
lock classes to be reused.  However, if you are tempted to make this
argument, first review the code and think through the changes that would
be required, keeping in mind that the lock classes to be removed are
likely to be linked into the lock-dependency graph.  This turns out to
be harder to do than to say.

Of course, if you do run out of lock classes, the next thing to do is
to find the offending lock classes.  First, the following command gives
you the number of lock classes currently in use along with the maximum:

	grep "lock-classes" /proc/lockdep_stats

This command produces the following output on a modest system:

	 lock-classes:                          748 [max: 8191]

If the number allocated (748 above) increases continually over time,
then there is likely a leak.  The following command can be used to
identify the leaking lock classes:

	grep "BD" /proc/lockdep

Run the command and save the output, then compare against the output from
a later run of this command to identify the leakers.  This same output
can also help you find situations where runtime lock initialization has
been omitted.