linux/Documentation/oops-tracing.txt

NOTE: ksymoops is useless on 2.6.  Please use the Oops in its original format
(from dmesg, etc).  Ignore any references in this or other docs to "decoding
the Oops" or "running it through ksymoops".  If you post an Oops from 2.6 that
has been run through ksymoops, people will just tell you to repost it.

Quick Summary
-------------

Find the Oops and send it to the maintainer of the kernel area that seems to be
involved with the problem.  Don't worry too much about getting the wrong person.
If you are unsure, send it to the person responsible for the code relevant to
what you were doing.  If it occurs repeatably, try to describe how to recreate
it.  That's worth even more than the oops.

If you are totally stumped as to whom to send the report, send it to
linux-kernel@vger.kernel.org.  Thanks for your help in making Linux as
stable as humanly possible.

Where is the Oops?
----------------------

Normally the Oops text is read from the kernel buffers by klogd and
handed to syslogd which writes it to a syslog file, typically
/var/log/messages (depends on /etc/syslog.conf).  Sometimes klogd dies,
in which case you can run dmesg > file to read the data from the kernel
buffers and save it.  Or you can cat /proc/kmsg > file; however, you
have to break in to stop the transfer, because kmsg is a "never ending file".
If the machine has crashed so badly that you cannot enter commands or
the disk is not available then you have three options:

(1) Hand copy the text from the screen and type it in after the machine
    has restarted.  Messy, but it is the only option if you have not
    planned for a crash.  Alternatively, you can take a picture of
    the screen with a digital camera - not nice, but better than
    nothing.  If the messages scroll off the top of the console, you
    may find that booting with a higher resolution (eg, vga=791)
    will allow you to read more of the text.  (Caveat: This needs vesafb,
    so won't help for 'early' oopses.)

(2) Boot with a serial console (see Documentation/serial-console.txt),
    run a null modem to a second machine and capture the output there
    using your favourite communication program.  Minicom works well.

(3) Use Kdump (see Documentation/kdump/kdump.txt) and
    extract the kernel ring buffer from old memory using the dmesg
    gdbmacro in Documentation/kdump/gdbmacros.txt.


Full Information
----------------

NOTE: the message from Linus below applies to the 2.4 kernel.  I have preserved
it for historical reasons, and because some of the information in it still
applies.  Especially, please ignore any references to ksymoops.

From: Linus Torvalds <torvalds@osdl.org>

How to track down an Oops.. [originally a mail to linux-kernel]

The main trick is having 5 years of experience with those pesky oops
messages ;-)

Actually, there are things you can do that make this easier. I have two
separate approaches:
   
	gdb /usr/src/linux/vmlinux
	gdb> disassemble <offending_function>

That's the easy way to find the problem, at least if the bug-report is
well made (like this one was - run through ksymoops to get the
information of which function and the offset in the function that it
happened in).

Oh, it helps if the report happens on a kernel that is compiled with the
same compiler and similar setups.

The other thing to do is disassemble the "Code:" part of the bug report:
ksymoops will do this too with the correct tools, but if you don't have
the tools you can just do a silly program:

	char str[] = "\xXX\xXX\xXX...";	/* paste the "Code:" bytes here */
	int main(void) { return 0; }

and compile it with gcc -g and then do "disassemble str" (where the "XX"
stuff are the values reported by the Oops - you can just cut-and-paste
and do a replace of spaces to "\x" - that's what I do, as I'm too lazy
to write a program to automate this all).

Alternatively, you can use the shell script in scripts/decodecode.
Its usage is:  decodecode < oops.txt

The hex bytes that follow "Code:" may (in some architectures) have a series
of bytes that precede the current instruction pointer as well as bytes at and
following the current instruction pointer.  In some cases, one instruction
byte or word is surrounded by <> or (), as in "<86>" or "(f00d)".  These
<> or () markings indicate the current instruction pointer.  Example from
i386, split into multiple lines for readability:

Code: f9 0f 8d f9 00 00 00 8d 42 0c e8 dd 26 11 c7 a1 60 ea 2b f9 8b 50 08 a1
64 ea 2b f9 8d 34 82 8b 1e 85 db 74 6d 8b 15 60 ea 2b f9 <8b> 43 04 39 42 54
7e 04 40 89 42 54 8b 43 04 3b 05 00 f6 52 c0

Finally, if you want to see where the code comes from, you can do
   
	cd /usr/src/linux
	make fs/buffer.s	# or whatever file the bug happened in

and then you get a better idea of what happens than with the gdb
disassembly.

Now, the trick is just then to combine all the data you have: the C
sources (and general knowledge of what it _should_ do), the assembly
listing and the code disassembly (and additionally the register dump you
also get from the "oops" message - that can be useful to see _what_ the
corrupted pointers were, and when you have the assembler listing you can
also match the other registers to whatever C expressions they were used
for).

Essentially, you just look at what doesn't match (in this case it was the
"Code" disassembly that didn't match with what the compiler generated).
Then you need to find out _why_ they don't match. Often it's simple - you
see that the code uses a NULL pointer and then you look at the code and
wonder how the NULL pointer got there, and if it's a valid thing to do
you just check against it.

Now, if somebody gets the idea that this is time-consuming and requires
some small amount of concentration, you're right. Which is why I will
mostly just ignore any panic reports that don't have the symbol table
info etc looked up: it simply gets too hard to look it up (I have some
programs to search for specific patterns in the kernel code segment, and
sometimes I have been able to look up those kinds of panics too, but
that really requires pretty good knowledge of the kernel just to be able
to pick out the right sequences etc..)

_Sometimes_ it happens that I just see the disassembled code sequence
from the panic, and I know immediately where it's coming from. That's when
I get worried that I've been doing this for too long ;-)

		Linus
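
In the mail above, Linus turns the "Code:" bytes into a C string by hand and notes that a program could automate it. A minimal C sketch of that conversion, assuming the byte format of the i386 example (space-separated two-digit hex bytes, with the current instruction pointer marked as <xx> or (xx)). code_to_cstring is a hypothetical name, word-sized tokens such as "(f00d)" are rejected rather than guessed at, and scripts/decodecode remains the tool this file recommends.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the kernel tree): turn the hex bytes
 * after "Code:" into the body of a C string literal, e.g. "\x8b\x43\x04",
 * ready to paste into  char str[] = "...";  for disassembly with gdb.
 * Only two-digit byte tokens are accepted; a byte written as <xx> or (xx)
 * marks the current instruction pointer and its index goes to *ip_index
 * (-1 when no marker is present).  Returns the number of bytes converted,
 * or -1 on a malformed token or when out[] is too small.
 */
int code_to_cstring(const char *line, char *out, size_t outlen, int *ip_index)
{
	const char *p = strstr(line, "Code:");
	size_t used = 0;
	int count = 0;

	*ip_index = -1;
	if (outlen == 0)
		return -1;
	p = p ? p + strlen("Code:") : line;	/* also accept a bare byte list */

	for (;;) {
		while (isspace((unsigned char)*p))
			p++;
		if (*p == '\0')
			break;

		/* '<' or '(' opens an instruction-pointer marker */
		char closer = (*p == '<') ? '>' : (*p == '(') ? ')' : '\0';
		if (closer)
			p++;
		if (!isxdigit((unsigned char)p[0]) || !isxdigit((unsigned char)p[1]))
			return -1;
		const char *hex = p;
		p += 2;
		if (closer) {
			if (*p != closer)
				return -1;	/* e.g. a four-digit word such as (f00d) */
			p++;
			*ip_index = count;
		}
		if (*p != '\0' && !isspace((unsigned char)*p))
			return -1;

		if (used + 4 >= outlen)	/* "\xHH" plus the final NUL */
			return -1;
		out[used++] = '\\';
		out[used++] = 'x';
		out[used++] = hex[0];
		out[used++] = hex[1];
		count++;
	}
	out[used] = '\0';
	return count;
}
```

The resulting string goes between the quotes of the char str[] line in Linus's recipe. When *ip_index is N, asking gdb for something like "x/8i &str[N]" should start decoding at the marked instruction, which avoids misreading leading bytes that may belong to an instruction cut off at the start of the dump.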


---------------------------------------------------------------------------
Notes on Oops tracing with klogd:

In order to help Linus and the other kernel developers there has been
substantial support incorporated into klogd for processing protection
faults.  In order to have full support for address resolution at least
version 1.3-pl3 of the sysklogd package should be used.

When a protection fault occurs the klogd daemon automatically
translates important addresses in the kernel log messages to their
symbolic equivalents.  This translated kernel message is then
forwarded through whatever reporting mechanism klogd is using.  The
protection fault message can be simply cut out of the message files
and forwarded to the kernel developers.
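
Cutting the report out of a message file can be done by hand, but a small filter is easy to sketch. The C function below is a hypothetical helper, not part of klogd; its start and end markers ("Unable to handle kernel" or "Oops:", then the next "Code:" line) are a heuristic based on the klogd sample later in this file, and other kernel versions word the opening line differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: locate an Oops in an array of syslog lines.  The
 * start/end markers are a heuristic taken from the klogd sample: the
 * report opens with "Unable to handle kernel" or "Oops:" and closes with
 * the next line containing "Code:".  On success *first receives the index
 * of the opening line and the line count is returned; 0 means no complete
 * report was found.
 */
size_t find_oops(const char *const lines[], size_t n, size_t *first)
{
	size_t start = n;	/* n means "not started yet" */

	for (size_t i = 0; i < n; i++) {
		if (start == n && (strstr(lines[i], "Unable to handle kernel") ||
				   strstr(lines[i], "Oops:")))
			start = i;
		if (start != n && strstr(lines[i], "Code:")) {
			*first = start;
			return i - start + 1;
		}
	}
	return 0;
}
```

Copying lines[first] through lines[first + count - 1] unchanged keeps the report in its original format, as the note at the top of this file asks.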

Two types of address resolution are performed by klogd.  The first is
static translation and the second is dynamic translation.  Static
translation uses the System.map file in much the same manner that
ksymoops does.  In order to do static translation the klogd daemon
must be able to find a system map file at daemon initialization time.
See the klogd man page for information on how klogd searches for map
files.
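
At its core, static translation is a lookup in a table sorted by address: the symbol that covers an address is the one with the greatest address not above it. The C sketch below illustrates that idea; it is not klogd's code, ksym, map_load and map_lookup are hypothetical names, and because System.map records no symbol sizes, the size printed is assumed to be the gap to the next symbol.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One System.map entry; names longer than 63 characters are truncated. */
struct ksym {
	unsigned long long addr;
	char name[64];
};

static int ksym_cmp(const void *a, const void *b)
{
	const struct ksym *x = a, *y = b;

	return (x->addr > y->addr) - (x->addr < y->addr);
}

/*
 * Hypothetical loader: parse "address type name" lines (the System.map
 * layout) into tab[], skip lines that do not parse, and sort by address.
 * Returns the number of symbols stored.
 */
size_t map_load(const char *text, struct ksym *tab, size_t max)
{
	size_t n = 0;

	while (*text != '\0' && n < max) {
		const char *nl = strchr(text, '\n');
		size_t len = nl ? (size_t)(nl - text) : strlen(text);
		char line[160], type;

		if (len < sizeof line) {
			memcpy(line, text, len);
			line[len] = '\0';
			if (sscanf(line, "%llx %c %63s", &tab[n].addr, &type, tab[n].name) == 3)
				n++;
		}
		if (nl == NULL)
			break;
		text = nl + 1;
	}
	qsort(tab, n, sizeof tab[0], ksym_cmp);
	return n;
}

/*
 * Translate addr into "symbol+0xoffset/0xsize", taking the symbol with the
 * greatest address <= addr.  The size is assumed to be the gap to the next
 * symbol.  Returns -1 if addr lies before the first symbol or at/after the
 * last one, 0 otherwise.
 */
int map_lookup(const struct ksym *tab, size_t n, unsigned long long addr,
	       char *out, size_t outlen)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {		/* find the first entry above addr */
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0 || lo == n)
		return -1;
	snprintf(out, outlen, "%s+0x%llx/0x%llx", tab[lo - 1].name,
		 addr - tab[lo - 1].addr, tab[lo].addr - tab[lo - 1].addr);
	return 0;
}
```

For a made-up map containing "c0105a30 T schedule" followed by "c0106000 T _etext", map_lookup on 0xc0105a40 produces "schedule+0x10/0x5d0".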

Dynamic address translation is important when kernel loadable modules
are being used.  Since memory for kernel modules is allocated from the
kernel's dynamic memory pools there are no fixed locations for either
the start of the module or for functions and symbols in the module.

The kernel supports system calls which allow a program to determine
which modules are loaded and their location in memory.  Using these
system calls the klogd daemon builds a symbol table which can be used
to debug a protection fault which occurs in a loadable kernel module.

At the very minimum klogd will provide the name of the module which
generated the protection fault.  There may be additional symbolic
information available if the developer of the loadable module chose to
export symbol information from the module.
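
Attributing a fault to a module is a range check: the address must fall in [load address, load address + size). The C sketch below shows that check under an assumption the text above does not make: that the module list is read from /proc/modules as laid out by current kernels (name, size, reference count, dependencies, state, load address), whereas klogd of this era used system calls. struct kmod, parse_module_line and module_for are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A loaded module as described by one /proc/modules line (assumed layout). */
struct kmod {
	char name[64];
	unsigned long long size;	/* bytes */
	unsigned long long base;	/* load address */
};

/* Parse "name size refcount deps state address ..."; 0 on success. */
int parse_module_line(const char *line, struct kmod *m)
{
	/* %*s skips refcount, dependency list and state; %llx accepts "0x" */
	int got = sscanf(line, "%63s %llu %*s %*s %*s %llx",
			 m->name, &m->size, &m->base);

	return got == 3 ? 0 : -1;
}

/* Return the module whose [base, base + size) range contains addr, or NULL. */
const struct kmod *module_for(const struct kmod *mods, size_t n,
			      unsigned long long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= mods[i].base && addr - mods[i].base < mods[i].size)
			return &mods[i];
	return NULL;
}
```

Subtracting the module's load address gives an offset into the module's memory; turning that into a function name still needs the module's symbol information, which is the point made in the paragraph above.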

Since the kernel module environment can be dynamic there must be a
mechanism for notifying the klogd daemon when a change in module
environment occurs.  There are command line options available which
allow klogd to signal the currently executing daemon that symbol
information should be refreshed.  See the klogd manual page for more
information.

A patch is included with the sysklogd distribution which modifies the
modules-2.0.0 package to automatically signal klogd whenever a module
is loaded or unloaded.  Applying this patch provides essentially
seamless support for debugging protection faults which occur with
kernel loadable modules.
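
In the sample that follows, klogd prints translated addresses as [module:symbol+offset/size], e.g. [oops:_oops+16/3868], and omits the module: prefix for core-kernel symbols such as [_sys_ioctl+254/272]. A C parser for these tokens, offered as a sketch: the struct and function names are hypothetical, numbers are read as decimal unless they carry a 0x prefix (newer kernels print them in hex), and reading the second number as the symbol's length is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One translated address such as [oops:_oops+16/3868] (names hypothetical). */
struct sym_ref {
	char module[64];	/* empty for symbols in the core kernel */
	char symbol[128];
	unsigned long offset;	/* bytes into the symbol */
	unsigned long size;	/* assumed: length of the symbol */
};

/* Decimal by default, hex when written with a 0x prefix. */
static unsigned long parse_num(const char *s, char **end)
{
	int hex = s[0] == '0' && (s[1] == 'x' || s[1] == 'X');

	return strtoul(s, end, hex ? 16 : 10);
}

/* Parse "[module:symbol+off/size]" or "[symbol+off/size]"; 0 on success. */
int parse_sym_ref(const char *tok, struct sym_ref *r)
{
	const char *lb = strchr(tok, '[');
	const char *rb = lb ? strchr(lb, ']') : NULL;
	char *end;

	if (rb == NULL)
		return -1;
	const char *body = lb + 1;
	const char *plus = memchr(body, '+', (size_t)(rb - body));
	if (plus == NULL)
		return -1;
	const char *colon = memchr(body, ':', (size_t)(plus - body));
	const char *sym = colon ? colon + 1 : body;
	size_t mlen = colon ? (size_t)(colon - body) : 0;
	size_t slen = (size_t)(plus - sym);

	if (mlen >= sizeof r->module || slen == 0 || slen >= sizeof r->symbol)
		return -1;
	memcpy(r->module, body, mlen);
	r->module[mlen] = '\0';
	memcpy(r->symbol, sym, slen);
	r->symbol[slen] = '\0';

	r->offset = parse_num(plus + 1, &end);
	if (end == plus + 1 || *end != '/')
		return -1;
	const char *num = end + 1;
	r->size = parse_num(num, &end);
	if (end == num || end != rb)	/* size must run up to the ']' */
		return -1;
	return 0;
}
```

For [oops:_oops_ioctl+48/80] from the call trace, this yields module "oops", symbol "_oops_ioctl" and offset 48, which is where to look in "disassemble _oops_ioctl" output when following Linus's first approach.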

The following is an example of a protection fault in a loadable module
processed by klogd:
---------------------------------------------------------------------------
Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000
Aug 29 09:51:01 blizard kernel: *pde = 00000000
Aug 29 09:51:01 blizard kernel: Oops: 0002
Aug 29 09:51:01 blizard kernel: CPU:    0
Aug 29 09:51:01 blizard kernel: EIP:    0010:[oops:_oops+16/3868]
Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212
Aug 29 09:51:01 blizard kernel: eax: 315e97cc   ebx: 003a6f80   ecx: 001be77b   edx: 00237c0c
Aug 29 09:51:01 blizard kernel: esi: 00000000   edi: bffffdb3   ebp: 00589f90   esp: 00589f8c
Aug 29 09:51:01 blizard kernel: ds: 0018   es: 0018   fs: 002b   gs: 002b   ss: 0018
Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000)
Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
Aug 29 09:51:01 blizard kernel:        00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00
Aug 29 09:51:01 blizard kernel:        bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036
Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3
---------------------------------------------------------------------------

Dr. G.W. Wettstein           Oncology Research Div. Computing Facility
Roger Maris Cancer Center    INTERNET: greg@wind.rmcc.com
820 4th St. N.
Fargo, ND  58122
Phone: 701-234-7556


---------------------------------------------------------------------------
Tainted kernels:

Some oops reports contain the string 'Tainted: ' after the program
counter.  This indicates that the kernel has been tainted by some
mechanism.  The string is followed by a series of position-sensitive
characters, each representing a particular tainted value.

  1: 'G' if all modules loaded have a GPL or compatible license, 'P' if
     any proprietary module has been loaded.  Modules without a
     MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by
     insmod as GPL compatible are assumed to be proprietary.

  2: 'F' if any module was force loaded by "insmod -f", ' ' if all
     modules were loaded normally.

  3: 'S' if the oops occurred on an SMP kernel running on hardware that
     hasn't been certified as safe to run multiprocessor.
     Currently this occurs only [...] SMP capable.

  4: 'R' if a module was force unloaded by "rmmod -f", ' ' if all
     modules were unloaded normally.

  5: 'M' if any processor has reported a Machine Check Exception,
     ' ' if no Machine Check Exceptions have occurred.

  6: 'B' if a page-release function has found a bad page reference or
     some unexpected page flags.

  7: 'U' if a user or user application specifically requested that the
     Tainted flag be set, ' ' otherwise.

  8: 'D' if the kernel has died recently, i.e. there was an OOPS or BUG.

  9: 'A' if the ACPI table has been overridden.

 10: 'W' if a warning has previously been issued by the kernel.
     (Though some warnings may set more specific taint flags.)

 11: 'C' if a staging driver has been loaded.

 12: 'I' if the kernel is working around a severe bug in the platform
     firmware (BIOS or similar).

 13: 'O' if an externally-built ("out-of-tree") module has been loaded.

The primary reason for the 'Tainted: ' string is to tell kernel
debuggers if this is a clean kernel or if anything unusual has
occurred.  Tainting is permanent: even if an offending module is
unloaded, the tainted value remains to indicate that the kernel is not
trustworthy.
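
Because each flag is identified by its position, the string can be decoded mechanically. The C sketch below encodes the thirteen positions listed above; the table and function names are hypothetical, the caller is assumed to pass only the flag characters that follow 'Tainted: ', and any positions added by later kernels are simply ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Positions 1..13 of the 'Tainted: ' string, in the order listed above.
 * 'set' is the character printed when the condition holds; a blank means
 * it does not (position 1 prints 'G' instead of a blank).
 */
static const struct {
	char set;
	const char *meaning;
} taint_pos[] = {
	{ 'P', "proprietary module loaded" },
	{ 'F', "module force loaded" },
	{ 'S', "SMP kernel on hardware not certified for SMP" },
	{ 'R', "module force unloaded" },
	{ 'M', "machine check exception reported" },
	{ 'B', "bad page reference or unexpected page flags" },
	{ 'U', "taint requested by user or application" },
	{ 'D', "kernel died recently (OOPS or BUG)" },
	{ 'A', "ACPI table overridden" },
	{ 'W', "warning previously issued" },
	{ 'C', "staging driver loaded" },
	{ 'I', "working around platform firmware bug" },
	{ 'O', "out-of-tree module loaded" },
};

/*
 * Hypothetical decoder: store the meaning of every set flag in out[]
 * (at most max entries).  Returns how many were stored, or -1 when a
 * character does not belong at its position.  Characters past the
 * thirteenth position are ignored.
 */
int decode_taint(const char *flags, const char *out[], size_t max)
{
	const size_t npos = sizeof taint_pos / sizeof taint_pos[0];
	int n = 0;

	for (size_t i = 0; i < npos && flags[i] != '\0'; i++) {
		char c = flags[i];

		if (c == ' ' || (i == 0 && c == 'G'))
			continue;		/* condition not present */
		if (c != taint_pos[i].set)
			return -1;
		if ((size_t)n < max)
			out[n++] = taint_pos[i].meaning;
	}
	return n;
}
```

For instance, "PF" decodes to a proprietary module plus a force-loaded module, while "G" alone yields no flags.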