linux/Documentation/power/basic-pm-debugging.txt
Debugging hibernation and suspend
        (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL

1. Testing hibernation (aka suspend to disk or STD)

To check if hibernation works, you can try to hibernate in the "reboot" mode:

# echo reboot > /sys/power/disk
# echo disk > /sys/power/state

and the system should create a hibernation image, reboot, resume and get back to
the command prompt where you have started the transition.  If that happens,
hibernation is most likely to work correctly.  Still, you need to repeat the
test at least a couple of times in a row for confidence.  [This is necessary,
because some problems only show up on a second attempt at suspending and
resuming the system.]  Moreover, hibernating in the "reboot" and "shutdown"
modes causes the PM core to skip some platform-related callbacks which on ACPI
systems might be necessary to make hibernation work.  Thus, if your machine fails
to hibernate or resume in the "reboot" mode, you should try the "platform" mode:

# echo platform > /sys/power/disk
# echo disk > /sys/power/state

which is the default and recommended mode of hibernation.

Unfortunately, the "platform" mode of hibernation does not work on some systems
with broken BIOSes.  In such cases the "shutdown" mode of hibernation might
work:

# echo shutdown > /sys/power/disk
# echo disk > /sys/power/state

(it is similar to the "reboot" mode, but it requires you to press the power
button to make the system resume).

If neither "platform" nor "shutdown" hibernation mode works, you will need to
identify what goes wrong.

a) Test modes of hibernation

To find out why hibernation fails on your system, you can use a special testing
facility available if the kernel is compiled with CONFIG_PM_DEBUG set.  Then,
there is the file /sys/power/pm_test that can be used to make the hibernation
core run in a test mode.  There are 5 test modes available:

freezer
- test the freezing of processes

devices
- test the freezing of processes and suspending of devices

platform
- test the freezing of processes, suspending of devices and platform
  global control methods(*)

processors
- test the freezing of processes, suspending of devices, platform
  global control methods(*) and the disabling of nonboot CPUs

core
- test the freezing of processes, suspending of devices, platform global
  control methods(*), the disabling of nonboot CPUs and suspending of
  platform/system devices

(*) the platform global control methods are only available on ACPI systems
    and are only tested if the hibernation mode is set to "platform"

To use one of them it is necessary to write the corresponding string to
/sys/power/pm_test (e.g. "devices" to test the freezing of processes and
suspending of devices) and issue the standard hibernation commands.  For
example, to use the "devices" test mode along with the "platform" mode of
hibernation, you should do the following:

# echo devices > /sys/power/pm_test
# echo platform > /sys/power/disk
# echo disk > /sys/power/state

Then, the kernel will try to freeze processes, suspend devices, wait 5 seconds,
resume devices and thaw processes.  If "platform" is written to
/sys/power/pm_test, then after suspending devices the kernel will additionally
invoke the global control methods (e.g. ACPI global control methods) used to
prepare the platform firmware for hibernation.  Next, it will wait 5 seconds and
invoke the platform (e.g. ACPI) global methods used to cancel hibernation etc.

Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal
hibernation/suspend operations.  Also, when open for reading, /sys/power/pm_test
contains a space-separated list of all available tests (including "none" that
represents the normal functionality) in which the current test level is
indicated by square brackets.
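
As a rough illustration of the bracket convention described above, the current
test level can be extracted with a small shell helper.  This is only a sketch:
the function name current_pm_test is made up for this example, and the optional
file argument exists only so that the helper can be tried on a copy of the file.

```shell
#!/bin/sh
# current_pm_test [FILE]
# Print the test level that is wrapped in square brackets.
# FILE defaults to /sys/power/pm_test (present only with CONFIG_PM_DEBUG).
# The function name is hypothetical, chosen for this example.
current_pm_test() {
    file="${1:-/sys/power/pm_test}"
    # Keep only the text between the last '[' and the following ']'.
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$file"
}
```

Calling current_pm_test without arguments reads /sys/power/pm_test itself.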

Generally, as you can see, each test level is more "invasive" than the previous
one and the "core" level tests the hardware and drivers as deeply as possible
without creating a hibernation image.  Obviously, if the "devices" test fails,
the "platform" test will fail as well and so on.  Thus, as a rule of thumb, you
should try the test modes starting from "freezer", through "devices", "platform"
and "processors" up to "core" (repeat the test on each level a couple of times
to rule out random factors).
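
The rule of thumb above could be automated along the following lines.  This is
a sketch under stated assumptions, not a supported tool: the function name
run_pm_tests and its two arguments (the directory holding pm_test and state,
and the number of repetitions per level) are invented for this example, and it
assumes that a failed test shows up as a write error on the state file.  It
must be run as root and uses whatever hibernation mode is currently selected
in /sys/power/disk.

```shell
#!/bin/sh
# run_pm_tests [POWER_DIR] [REPEATS]
# Walk the test levels from the least to the most invasive one and stop at
# the first failure.  Both arguments are hypothetical conveniences:
#   POWER_DIR  directory containing pm_test and state (default /sys/power)
#   REPEATS    how many times each level is tried     (default 2)
run_pm_tests() {
    power_dir="${1:-/sys/power}"
    repeats="${2:-2}"
    for level in freezer devices platform processors core; do
        i=1
        while [ "$i" -le "$repeats" ]; do
            echo "$level" > "$power_dir/pm_test" || return 1
            # Assumption: a failing test makes this write return an error.
            if ! echo disk > "$power_dir/state"; then
                echo "FAILED at level $level (run $i)"
                echo none > "$power_dir/pm_test"
                return 1
            fi
            i=$((i + 1))
        done
        echo "ok: $level"
    done
    # Switch back to the normal hibernation/suspend operations.
    echo none > "$power_dir/pm_test"
}
```

If a level fails, check the output of dmesg before going any further, because
the system may be unstable afterwards.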

If the "freezer" test fails, there is a task that cannot be frozen (in that case
it is usually possible to identify the offending task by analysing the output of
dmesg obtained after the failing test).  Failure at this level usually means
that there is a problem with the tasks freezer subsystem that should be
reported.

If the "devices" test fails, most likely there is a driver that cannot suspend
or resume its device (in the latter case the system may hang or become unstable
after the test, so please take that into consideration).  To find this driver,
you can carry out a binary search according to the rules:
- if the test fails, unload half of the drivers currently loaded and repeat
(that would probably involve rebooting the system, so always note what drivers
have been loaded before the test),
- if the test succeeds, load half of the drivers you have unloaded most
recently and repeat.
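
One step of the binary search above can be sketched as a filter that picks the
half of a module list to act on.  The function name half_of_modules is
hypothetical; it only selects names and does not unload anything itself.

```shell
#!/bin/sh
# half_of_modules
# Read module names on stdin (first word of each line) and print the first
# half of them, rounding up.  The function name is made up for this example.
half_of_modules() {
    awk '{ m[NR] = $1 }
         END {
             n = int((NR + 1) / 2)
             for (i = 1; i <= n; i++)
                 print m[i]
         }'
}
```

For instance, "lsmod | tail -n +2 | half_of_modules" lists candidates to unload
(tail skips the lsmod header line).  Save the full lsmod output first, so that
you know what to load again later, and note that modules still in use by other
modules cannot be unloaded before their users.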

Once you have found the failing driver (there can be more than just one of
them), you have to unload it every time before hibernation.  In that case please
make sure to report the problem with the driver.

It is also possible that the "devices" test will still fail after you have
unloaded all modules.  In that case, you may want to look in your kernel
configuration for the drivers that can be compiled as modules (and test again
with these drivers compiled as modules).  You may also try to use some special
kernel command line options such as "noapic", "noacpi" or even "acpi=off".

If the "platform" test fails, there is a problem with the handling of the
platform (e.g. ACPI) firmware on your system.  In that case the "platform" mode
of hibernation is not likely to work.  You can try the "shutdown" mode, but that
is rather a poor man's workaround.

If the "processors" test fails, the disabling/enabling of nonboot CPUs does not
work (of course, this can only be an issue on SMP systems) and the problem
should be reported.  In that case you can also try to switch the nonboot CPUs
off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and
see if that works.
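
Switching the CPUs off and on through the online attributes can be sketched as
follows.  The function name cycle_nonboot_cpus and its optional directory
argument are assumptions made for this example.  CPUs without an online
attribute (often the boot CPU) are skipped.

```shell
#!/bin/sh
# cycle_nonboot_cpus [CPU_DIR]
# Take every CPU that has an "online" attribute offline and bring it back.
# CPU_DIR defaults to /sys/devices/system/cpu; the function name and the
# argument are hypothetical, chosen for this example.
cycle_nonboot_cpus() {
    cpu_dir="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_dir"/cpu[0-9]*/online; do
        # Skip the unexpanded pattern when no CPU has this attribute.
        [ -e "$f" ] || continue
        if echo 0 > "$f" && echo 1 > "$f"; then
            echo "ok: $f"
        else
            echo "FAILED: $f"
        fi
    done
}
```

Run it as root.  Any "FAILED" line points at a CPU that could not be switched
off or on, which is worth including in the problem report.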

If the "core" test fails, which means that suspending of the system/platform
devices has failed (these devices are suspended on one CPU with interrupts off),
the problem is most probably hardware-related and serious, so it should be
reported.

A failure of any of the "platform", "processors" or "core" tests may cause your
system to hang or become unstable, so please beware.  Such a failure usually
indicates a serious problem that very well may be related to the hardware, but
please report it anyway.

b) Testing minimal configuration

If all of the hibernation test modes work, you can boot the system with the
"init=/bin/bash" command line parameter and attempt to hibernate in the
"reboot", "shutdown" and "platform" modes.  If that does not work, there
probably is a problem with a driver statically compiled into the kernel and you
can try to compile more drivers as modules, so that they can be tested
individually.  Otherwise, there is a problem with a modular driver and you can
find it by loading half of the modules you normally use and binary searching
in accordance with the algorithm:
- if there are n modules loaded and the attempt to suspend and resume fails,
unload n/2 of the modules and try again (that would probably involve rebooting
the system),
- if there are n modules loaded and the attempt to suspend and resume succeeds,
load n/2 modules more and try again.

Again, if you find the offending module(s), it(they) must be unloaded every time
before hibernation, and please report the problem with it(them).

c) Advanced debugging

If hibernation does not work on your system even in the minimal configuration
and compiling more drivers as modules is not practical or some modules cannot
be unloaded, you can use one of the more advanced debugging techniques to find
the problem.  First, if there is a serial port in your box, you can boot the
kernel with the 'no_console_suspend' parameter and try to log kernel messages
using the serial console.  This may provide you with some information about the
reasons for the suspend (resume) failure.  Alternatively, it may be possible to
use a FireWire port for debugging with firescope
(ftp://ftp.firstfloor.org/pub/ak/firescope/).  On x86 it is also possible to
use the PM_TRACE mechanism documented in Documentation/power/s2ram.txt.

2. Testing suspend to RAM (STR)

To verify that the STR works, it is generally more convenient to use the s2ram
tool available from http://suspend.sf.net and documented at
http://en.opensuse.org/SDB:Suspend_to_RAM.  However, before doing that it is
advisable to test STR using the /sys/power/pm_test facility described in
section 1.

Namely, after writing "freezer", "devices", "platform", "processors", or "core"
into /sys/power/pm_test (available if the kernel is compiled with
CONFIG_PM_DEBUG set) the suspend code will work in the test mode corresponding
to the given string.  The STR test modes are defined in the same way as for
hibernation, so please refer to section 1 for more information about them.  In
particular, the "core" test allows you to test everything except for the actual
invocation of the platform firmware in order to put the system into the sleep
state.

Among other things, the testing with the help of /sys/power/pm_test may allow
you to identify drivers that fail to suspend or resume their devices.  They
should be unloaded every time before an STR transition.

Next, you can follow the instructions at http://en.opensuse.org/s2ram to test
the system, but if it does not work "out of the box", you may need to boot it
with "init=/bin/bash" and test s2ram in the minimal configuration.  In that
case, you may be able to search for failing drivers by following the procedure
analogous to the one described in section 1.  If you find some failing drivers,
you will have to unload them every time before an STR transition (i.e. before
you run s2ram), and please report the problems with them.

There is a debugfs entry that shows the suspend-to-RAM statistics.  Here is an
example of its output:

        # mount -t debugfs none /sys/kernel/debug
        # cat /sys/kernel/debug/suspend_stats
        success: 20
        fail: 5
        failed_freeze: 0
        failed_prepare: 0
        failed_suspend: 5
        failed_suspend_noirq: 0
        failed_resume: 0
        failed_resume_noirq: 0
        failures:
          last_failed_dev:      alarm
                                adc
          last_failed_errno:    -16
                                -16
          last_failed_step:     suspend
                                suspend

The success field is the number of successful suspend-to-RAM transitions and
the fail field is the number of failed ones.  The other counters give the
number of failures at each step of suspend to RAM.  suspend_stats lists only
the last 2 failed devices, together with the corresponding error numbers and
failed steps.
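
A single counter can be pulled out of that output with a short helper, which
may be handy when comparing the numbers before and after a series of tests.
This is a minimal sketch: the function name suspend_stat is invented for this
example, and the optional second argument only allows it to be tried on a
saved copy of the file.

```shell
#!/bin/sh
# suspend_stat FIELD [FILE]
# Print the value of one counter (e.g. "success" or "failed_suspend").
# FILE defaults to /sys/kernel/debug/suspend_stats, which requires debugfs
# to be mounted.  The function name is hypothetical.
suspend_stat() {
    field="$1"
    file="${2:-/sys/kernel/debug/suspend_stats}"
    # Match the field name exactly, so "fail" does not match "failed_*".
    awk -v key="$field:" '$1 == key { print $2; exit }' "$file"
}
```

For example, "suspend_stat failed_suspend" prints 5 for the output shown above.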