linux/Documentation/RCU/stallwarn.txt
Using RCU's CPU Stall Detector

The rcu_cpu_stall_suppress module parameter controls RCU's CPU stall
detector, which detects conditions that unduly delay RCU grace periods.
This module parameter leaves CPU stall detection enabled by default, but
may be overridden via boot-time parameter or at runtime via sysfs.
The stall detector's idea of what constitutes "unduly delayed" is
controlled by a set of kernel configuration variables and cpp macros:

CONFIG_RCU_CPU_STALL_TIMEOUT

	This kernel configuration parameter defines the period of time
	that RCU will wait from the beginning of a grace period until it
	issues an RCU CPU stall warning.  This time period is normally
	sixty seconds.

	This configuration parameter may be changed at runtime via
	/sys/module/rcutree/parameters/rcu_cpu_stall_timeout.  However,
	this parameter is checked only at the beginning of a cycle.
	So if you are 30 seconds into a 70-second stall, setting this
	sysfs parameter to (say) five will shorten the timeout for the
	-next- stall, or the following warning for the current stall
	(assuming the stall lasts long enough).  It will not affect the
	timing of the next warning for the current stall.

	Stall-warning messages may be enabled and disabled completely via
	/sys/module/rcutree/parameters/rcu_cpu_stall_suppress.

CONFIG_RCU_CPU_STALL_VERBOSE

	This kernel configuration parameter causes the stall warning to
	also dump the stacks of any tasks that are blocking the current
	RCU-preempt grace period.

CONFIG_RCU_CPU_STALL_INFO

	This kernel configuration parameter causes the stall warning to
	print out additional per-CPU diagnostic information, including
	information on scheduling-clock ticks and RCU's idle-CPU tracking.

RCU_STALL_DELAY_DELTA

	Although the lockdep facility is extremely useful, it does add
	some overhead.  Therefore, under CONFIG_PROVE_RCU, the
	RCU_STALL_DELAY_DELTA macro allows five extra seconds before
	giving an RCU CPU stall warning message.

RCU_STALL_RAT_DELAY

	The CPU stall detector tries to make the offending CPU print its
	own warnings, as this often gives better-quality stack traces.
	However, if the offending CPU does not detect its own stall in
	the number of jiffies specified by RCU_STALL_RAT_DELAY, then
	some other CPU will complain.  This delay is normally set to
	two jiffies.

When a CPU detects that it is stalling, it will print a message similar
to the following:

INFO: rcu_sched_state detected stall on CPU 5 (t=2500 jiffies)

This message indicates that CPU 5 detected that it was causing a stall,
and that the stall was affecting RCU-sched.  This message will normally be
followed by a stack dump of the offending CPU.  On TREE_RCU kernel builds,
RCU and RCU-sched are implemented by the same underlying mechanism,
while on TREE_PREEMPT_RCU kernel builds, RCU is instead implemented
by rcu_preempt_state.

On the other hand, if the offending CPU fails to print out a stall-warning
message quickly enough, some other CPU will print a message similar to
the following:

INFO: rcu_bh_state detected stalls on CPUs/tasks: { 3 5 } (detected by 2, 2502 jiffies)

This message indicates that CPU 2 detected that CPUs 3 and 5 were both
causing stalls, and that the stall was affecting RCU-bh.  This message
will normally be followed by stack dumps for each CPU.  Please note that
TREE_PREEMPT_RCU builds can be stalled by tasks as well as by CPUs,
and that the tasks will be indicated by PID, for example, "P3421".
It is even possible for a rcu_preempt_state stall to be caused by both
CPUs -and- tasks, in which case the offending CPUs and tasks will all
be called out in the list.

Finally, if the grace period ends just as the stall warning starts
printing, there will be a spurious stall-warning message:

INFO: rcu_bh_state detected stalls on CPUs/tasks: { } (detected by 4, 2502 jiffies)

This is rare, but does happen from time to time in real life.

If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set,
more information is printed with the stall-warning message, for example:

	INFO: rcu_preempt detected stall on CPU
	0: (63959 ticks this GP) idle=241/3fffffffffffffff/0
	   (t=65000 jiffies)

In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is
printed:

	INFO: rcu_preempt detected stall on CPU
	0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer not pending
	   (t=65000 jiffies)

The "(64628 ticks this GP)" indicates that this CPU has taken more
than 64,000 scheduling-clock interrupts during the current stalled
grace period.  If the CPU was not yet aware of the current grace
period (for example, if it was offline), then this part of the message
indicates how many grace periods behind the CPU is.

The "idle=" portion of the message prints the dyntick-idle state.
The hex number before the first "/" is the low-order 12 bits of the
dynticks counter, which will have an even-numbered value if the CPU is
in dyntick-idle mode and an odd-numbered value otherwise.  The hex
number between the two "/"s is the value of the nesting, which will
be a small positive number if in the idle loop and a very large positive
number (as shown above) otherwise.

For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the CPU is
not in the process of trying to force itself into dyntick-idle state, the
"." indicates that the CPU has not given up forcing RCU into dyntick-idle
mode (it would be "H" otherwise), and the "timer not pending" indicates
that the CPU has not recently forced RCU into dyntick-idle mode (it
would otherwise indicate the number of microseconds remaining in this
forced state).


Multiple Warnings From One Stall

If a stall lasts long enough, multiple stall-warning messages will be
printed for it.  The second and subsequent messages are printed at
longer intervals, so that the time between (say) the first and second
message will be about three times the interval between the beginning
of the stall and the first message.


What Causes RCU CPU Stall Warnings?

So your kernel printed an RCU CPU stall warning.  The next question is
"What caused it?"  The following problems can result in RCU CPU stall
warnings:

o	A CPU looping in an RCU read-side critical section.

o	A CPU looping with interrupts disabled.  This condition can
	result in RCU-sched and RCU-bh stalls.

o	A CPU looping with preemption disabled.  This condition can
	result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
	stalls.

o	A CPU looping with bottom halves disabled.  This condition can
	result in RCU-sched and RCU-bh stalls.

o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
	without invoking schedule().

o	A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
	happen to preempt a low-priority task in the middle of an RCU
	read-side critical section.  This is especially damaging if
	that low-priority task is not permitted to run on any other CPU,
	in which case the next RCU grace period can never complete, which
	will eventually cause the system to run out of memory and hang.
	While the system is in the process of running itself out of
	memory, you might see stall-warning messages.

o	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
	is running at a higher priority than the RCU softirq threads.
	This will prevent RCU callbacks from ever being invoked,
	and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent
	RCU grace periods from ever completing.  Either way, the
	system will eventually run out of memory and hang.  In the
	CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
	messages.

o	A hardware or software issue shuts off the scheduler-clock
	interrupt on a CPU that is not in dyntick-idle mode.  This
	problem really has happened, and seems to be most likely to
	result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.

o	A bug in the RCU implementation.

o	A hardware failure.  This is quite unlikely, but has occurred
	at least once in real life.  A CPU failed in a running system,
	becoming unresponsive, but not causing an immediate crash.
	This resulted in a series of RCU CPU stall warnings, eventually
	leading to the realization that the CPU had failed.

The RCU, RCU-sched, and RCU-bh implementations have CPU stall warnings.
SRCU does not have its own CPU stall warnings, but its calls to
synchronize_sched() will result in RCU-sched detecting RCU-sched-related
CPU stalls.  Please note that RCU only detects CPU stalls when there is
a grace period in progress.  No grace period, no CPU stall warnings.

To diagnose the cause of the stall, inspect the stack traces.
The offending function will usually be near the top of the stack.
If you have a series of stall warnings from a single extended stall,
comparing the stack traces can often help determine where the stall
is occurring, which will usually be in the function nearest the top of
that portion of the stack which remains the same from trace to trace.
If you can reliably trigger the stall, ftrace can be quite helpful.

RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE
and with RCU's event tracing.
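When sifting through a long console log, it can help to pick the
stall-warning messages apart mechanically.  The following is only a
minimal sketch in Python, assuming the message layouts quoted earlier in
this document (self-detected stall, stall detected by another CPU, and
the "idle=" field).  The function and key names (parse_stall_line,
parse_idle_field, and so on) are hypothetical helpers invented for this
illustration, not part of the kernel or of any existing tool, and real
kernels may format these messages differently.

```python
import re

# Self-detected stall, e.g.
#   INFO: rcu_sched_state detected stall on CPU 5 (t=2500 jiffies)
SELF_RE = re.compile(
    r"INFO: (?P<flavor>\w+) detected stall on CPU (?P<cpu>\d+) "
    r"\(t=(?P<jiffies>\d+) jiffies\)"
)

# Stall detected by some other CPU, e.g.
#   INFO: rcu_bh_state detected stalls on CPUs/tasks: { 3 5 } (detected by 2, 2502 jiffies)
OTHER_RE = re.compile(
    r"INFO: (?P<flavor>\w+) detected stalls on CPUs/tasks: "
    r"\{(?P<culprits>[^}]*)\} "
    r"\(detected by (?P<detector>\d+), (?P<jiffies>\d+) jiffies\)"
)

# Dyntick-idle state, e.g. idle=241/3fffffffffffffff/0
IDLE_RE = re.compile(r"idle=(?P<dynticks>[0-9a-f]+)/(?P<nesting>[0-9a-f]+)/")


def parse_stall_line(line):
    """Return a dict describing one stall-warning line, or None if no match."""
    m = SELF_RE.search(line)
    if m:
        cpu = int(m.group("cpu"))
        return {
            "flavor": m.group("flavor"),
            "cpus": [cpu],
            "tasks": [],
            "detected_by": cpu,  # the offending CPU reported itself
            "jiffies": int(m.group("jiffies")),
        }
    m = OTHER_RE.search(line)
    if m:
        tokens = m.group("culprits").split()
        return {
            "flavor": m.group("flavor"),
            # Bare numbers are CPUs; "P<pid>" entries are blocking tasks.
            "cpus": [int(t) for t in tokens if t.isdigit()],
            "tasks": [int(t[1:]) for t in tokens
                      if t.startswith("P") and t[1:].isdigit()],
            "detected_by": int(m.group("detector")),
            "jiffies": int(m.group("jiffies")),
        }
    return None


def parse_idle_field(text):
    """Decode the first two hex numbers of an "idle=" field, or return None."""
    m = IDLE_RE.search(text)
    if not m:
        return None
    dynticks = int(m.group("dynticks"), 16)
    return {
        "dynticks_low_bits": dynticks,
        # Even counter value: CPU is in dyntick-idle mode; odd: it is not.
        "in_dyntick_idle": dynticks % 2 == 0,
        "nesting": int(m.group("nesting"), 16),
    }
```

An empty "cpus" and "tasks" list from parse_stall_line() corresponds to
the spurious "{ }" warning described above, where the grace period ended
just as the warning started printing.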