linux/Documentation/trace/events.txt
<<
>>
Prefs
   1                             Event Tracing
   2
   3                Documentation written by Theodore Ts'o
   4                Updated by Li Zefan and Tom Zanussi
   5
   61. Introduction
   7===============
   8
   9Tracepoints (see Documentation/trace/tracepoints.txt) can be used
  10without creating custom kernel modules to register probe functions
  11using the event tracing infrastructure.
  12
  13Not all tracepoints can be traced using the event tracing system;
  14the kernel developer must provide code snippets which define how the
  15tracing information is saved into the tracing buffer, and how the
  16tracing information should be printed.
  17
  182. Using Event Tracing
  19======================
  20
  212.1 Via the 'set_event' interface
  22---------------------------------
  23
  24The events which are available for tracing can be found in the file
  25/sys/kernel/debug/tracing/available_events.
  26
  27To enable a particular event, such as 'sched_wakeup', simply echo it
  28to /sys/kernel/debug/tracing/set_event. For example:
  29
  30        # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event
  31
  32[ Note: '>>' is necessary, otherwise it will firstly disable
  33  all the events. ]
  34
  35To disable an event, echo the event name to the set_event file prefixed
  36with an exclamation point:
  37
  38        # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event
  39
  40To disable all events, echo an empty line to the set_event file:
  41
  42        # echo > /sys/kernel/debug/tracing/set_event
  43
  44To enable all events, echo '*:*' or '*:' to the set_event file:
  45
  46        # echo *:* > /sys/kernel/debug/tracing/set_event
  47
  48The events are organized into subsystems, such as ext4, irq, sched,
  49etc., and a full event name looks like this: <subsystem>:<event>.  The
  50subsystem name is optional, but it is displayed in the available_events
  51file.  All of the events in a subsystem can be specified via the syntax
  52"<subsystem>:*"; for example, to enable all irq events, you can use the
  53command:
  54
  55        # echo 'irq:*' > /sys/kernel/debug/tracing/set_event
  56
  572.2 Via the 'enable' toggle
  58---------------------------
  59
  60The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy
  61of directories.
  62
  63To enable event 'sched_wakeup':
  64
  65        # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
  66
  67To disable it:
  68
  69        # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
  70
  71To enable all events in sched subsystem:
  72
  73        # echo 1 > /sys/kernel/debug/tracing/events/sched/enable
  74
  75To enable all events:
  76
  77        # echo 1 > /sys/kernel/debug/tracing/events/enable
  78
  79When reading one of these enable files, there are four results:
  80
  81 0 - all events this file affects are disabled
  82 1 - all events this file affects are enabled
  83 X - there is a mixture of events enabled and disabled
  84 ? - this file does not affect any event
  85
  862.3 Boot option
  87---------------
  88
  89In order to facilitate early boot debugging, use boot option:
  90
  91        trace_event=[event-list]
  92
  93event-list is a comma separated list of events. See section 2.1 for event
  94format.
  95
  963. Defining an event-enabled tracepoint
  97=======================================
  98
  99See The example provided in samples/trace_events
 100
 1014. Event formats
 102================
 103
 104Each trace event has a 'format' file associated with it that contains
 105a description of each field in a logged event.  This information can
 106be used to parse the binary trace stream, and is also the place to
 107find the field names that can be used in event filters (see section 5).
 108
 109It also displays the format string that will be used to print the
 110event in text mode, along with the event name and ID used for
 111profiling.
 112
 113Every event has a set of 'common' fields associated with it; these are
 114the fields prefixed with 'common_'.  The other fields vary between
 115events and correspond to the fields defined in the TRACE_EVENT
 116definition for that event.
 117
 118Each field in the format has the form:
 119
 120     field:field-type field-name; offset:N; size:N;
 121
 122where offset is the offset of the field in the trace record and size
 123is the size of the data item, in bytes.
 124
 125For example, here's the information displayed for the 'sched_wakeup'
 126event:
 127
 128# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format
 129
 130name: sched_wakeup
 131ID: 60
 132format:
 133        field:unsigned short common_type;       offset:0;       size:2;
 134        field:unsigned char common_flags;       offset:2;       size:1;
 135        field:unsigned char common_preempt_count;       offset:3;       size:1;
 136        field:int common_pid;   offset:4;       size:4;
 137        field:int common_tgid;  offset:8;       size:4;
 138
 139        field:char comm[TASK_COMM_LEN]; offset:12;      size:16;
 140        field:pid_t pid;        offset:28;      size:4;
 141        field:int prio; offset:32;      size:4;
 142        field:int success;      offset:36;      size:4;
 143        field:int cpu;  offset:40;      size:4;
 144
 145print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid,
 146           REC->prio, REC->success, REC->cpu
 147
 148This event contains 10 fields, the first 5 common and the remaining 5
 149event-specific.  All the fields for this event are numeric, except for
 150'comm' which is a string, a distinction important for event filtering.
 151
 1525. Event filtering
 153==================
 154
 155Trace events can be filtered in the kernel by associating boolean
 156'filter expressions' with them.  As soon as an event is logged into
 157the trace buffer, its fields are checked against the filter expression
 158associated with that event type.  An event with field values that
 159'match' the filter will appear in the trace output, and an event whose
 160values don't match will be discarded.  An event with no filter
 161associated with it matches everything, and is the default when no
 162filter has been set for an event.
 163
 1645.1 Expression syntax
 165---------------------
 166
 167A filter expression consists of one or more 'predicates' that can be
 168combined using the logical operators '&&' and '||'.  A predicate is
 169simply a clause that compares the value of a field contained within a
 170logged event with a constant value and returns either 0 or 1 depending
 171on whether the field value matched (1) or didn't match (0):
 172
 173          field-name relational-operator value
 174
 175Parentheses can be used to provide arbitrary logical groupings and
 176double-quotes can be used to prevent the shell from interpreting
 177operators as shell metacharacters.
 178
 179The field-names available for use in filters can be found in the
 180'format' files for trace events (see section 4).
 181
 182The relational-operators depend on the type of the field being tested:
 183
 184The operators available for numeric fields are:
 185
 186==, !=, <, <=, >, >=, &
 187
 188And for string fields they are:
 189
 190==, !=, ~
 191
 192The glob (~) only accepts a wild card character (*) at the start and or
 193end of the string. For example:
 194
 195  prev_comm ~ "*sh"
 196  prev_comm ~ "sh*"
 197  prev_comm ~ "*sh*"
 198
 199But does not allow for it to be within the string:
 200
 201  prev_comm ~ "ba*sh"   <-- is invalid
 202
 2035.2 Setting filters
 204-------------------
 205
 206A filter for an individual event is set by writing a filter expression
 207to the 'filter' file for the given event.
 208
 209For example:
 210
 211# cd /sys/kernel/debug/tracing/events/sched/sched_wakeup
 212# echo "common_preempt_count > 4" > filter
 213
 214A slightly more involved example:
 215
 216# cd /sys/kernel/debug/tracing/events/signal/signal_generate
 217# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter
 218
 219If there is an error in the expression, you'll get an 'Invalid
 220argument' error when setting it, and the erroneous string along with
 221an error message can be seen by looking at the filter e.g.:
 222
 223# cd /sys/kernel/debug/tracing/events/signal/signal_generate
 224# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
 225-bash: echo: write error: Invalid argument
 226# cat filter
 227((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
 228^
 229parse_error: Field not found
 230
 231Currently the caret ('^') for an error always appears at the beginning of
 232the filter string; the error message should still be useful though
 233even without more accurate position info.
 234
 2355.3 Clearing filters
 236--------------------
 237
 238To clear the filter for an event, write a '0' to the event's filter
 239file.
 240
 241To clear the filters for all events in a subsystem, write a '0' to the
 242subsystem's filter file.
 243
 2445.3 Subsystem filters
 245---------------------
 246
 247For convenience, filters for every event in a subsystem can be set or
 248cleared as a group by writing a filter expression into the filter file
 249at the root of the subsystem.  Note however, that if a filter for any
 250event within the subsystem lacks a field specified in the subsystem
 251filter, or if the filter can't be applied for any other reason, the
 252filter for that event will retain its previous setting.  This can
 253result in an unintended mixture of filters which could lead to
 254confusing (to the user who might think different filters are in
 255effect) trace output.  Only filters that reference just the common
 256fields can be guaranteed to propagate successfully to all events.
 257
 258Here are a few subsystem filter examples that also illustrate the
 259above points:
 260
 261Clear the filters on all events in the sched subsystem:
 262
 263# cd /sys/kernel/debug/tracing/events/sched
 264# echo 0 > filter
 265# cat sched_switch/filter
 266none
 267# cat sched_wakeup/filter
 268none
 269
 270Set a filter using only common fields for all events in the sched
 271subsystem (all events end up with the same filter):
 272
 273# cd /sys/kernel/debug/tracing/events/sched
 274# echo common_pid == 0 > filter
 275# cat sched_switch/filter
 276common_pid == 0
 277# cat sched_wakeup/filter
 278common_pid == 0
 279
 280Attempt to set a filter using a non-common field for all events in the
 281sched subsystem (all events but those that have a prev_pid field retain
 282their old filters):
 283
 284# cd /sys/kernel/debug/tracing/events/sched
 285# echo prev_pid == 0 > filter
 286# cat sched_switch/filter
 287prev_pid == 0
 288# cat sched_wakeup/filter
 289common_pid == 0
 290
 2916. Event triggers
 292=================
 293
 294Trace events can be made to conditionally invoke trigger 'commands'
 295which can take various forms and are described in detail below;
 296examples would be enabling or disabling other trace events or invoking
 297a stack trace whenever the trace event is hit.  Whenever a trace event
 298with attached triggers is invoked, the set of trigger commands
 299associated with that event is invoked.  Any given trigger can
 300additionally have an event filter of the same form as described in
 301section 5 (Event filtering) associated with it - the command will only
 302be invoked if the event being invoked passes the associated filter.
 303If no filter is associated with the trigger, it always passes.
 304
 305Triggers are added to and removed from a particular event by writing
 306trigger expressions to the 'trigger' file for the given event.
 307
 308A given event can have any number of triggers associated with it,
 309subject to any restrictions that individual commands may have in that
 310regard.
 311
 312Event triggers are implemented on top of "soft" mode, which means that
 313whenever a trace event has one or more triggers associated with it,
 314the event is activated even if it isn't actually enabled, but is
 315disabled in a "soft" mode.  That is, the tracepoint will be called,
 316but just will not be traced, unless of course it's actually enabled.
 317This scheme allows triggers to be invoked even for events that aren't
 318enabled, and also allows the current event filter implementation to be
 319used for conditionally invoking triggers.
 320
 321The syntax for event triggers is roughly based on the syntax for
 322set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
 323section of Documentation/trace/ftrace.txt), but there are major
 324differences and the implementation isn't currently tied to it in any
 325way, so beware about making generalizations between the two.
 326
 3276.1 Expression syntax
 328---------------------
 329
 330Triggers are added by echoing the command to the 'trigger' file:
 331
 332  # echo 'command[:count] [if filter]' > trigger
 333
 334Triggers are removed by echoing the same command but starting with '!'
 335to the 'trigger' file:
 336
 337  # echo '!command[:count] [if filter]' > trigger
 338
 339The [if filter] part isn't used in matching commands when removing, so
 340leaving that off in a '!' command will accomplish the same thing as
 341having it in.
 342
 343The filter syntax is the same as that described in the 'Event
 344filtering' section above.
 345
 346For ease of use, writing to the trigger file using '>' currently just
 347adds or removes a single trigger and there's no explicit '>>' support
 348('>' actually behaves like '>>') or truncation support to remove all
 349triggers (you have to use '!' for each one added.)
 350
 3516.2 Supported trigger commands
 352------------------------------
 353
 354The following commands are supported:
 355
 356- enable_event/disable_event
 357
 358  These commands can enable or disable another trace event whenever
 359  the triggering event is hit.  When these commands are registered,
 360  the other trace event is activated, but disabled in a "soft" mode.
 361  That is, the tracepoint will be called, but just will not be traced.
 362  The event tracepoint stays in this mode as long as there's a trigger
 363  in effect that can trigger it.
 364
 365  For example, the following trigger causes kmalloc events to be
 366  traced when a read system call is entered, and the :1 at the end
 367  specifies that this enablement happens only once:
 368
 369  # echo 'enable_event:kmem:kmalloc:1' > \
 370      /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
 371
 372  The following trigger causes kmalloc events to stop being traced
 373  when a read system call exits.  This disablement happens on every
 374  read system call exit:
 375
 376  # echo 'disable_event:kmem:kmalloc' > \
 377      /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
 378
 379  The format is:
 380
 381      enable_event:<system>:<event>[:count]
 382      disable_event:<system>:<event>[:count]
 383
 384  To remove the above commands:
 385
 386  # echo '!enable_event:kmem:kmalloc:1' > \
 387      /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
 388
 389  # echo '!disable_event:kmem:kmalloc' > \
 390      /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
 391
 392  Note that there can be any number of enable/disable_event triggers
 393  per triggering event, but there can only be one trigger per
 394  triggered event. e.g. sys_enter_read can have triggers enabling both
 395  kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc
 396  versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if
 397  bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they
 398  could be combined into a single filter on kmem:kmalloc though).
 399
 400- stacktrace
 401
 402  This command dumps a stacktrace in the trace buffer whenever the
 403  triggering event occurs.
 404
 405  For example, the following trigger dumps a stacktrace every time the
 406  kmalloc tracepoint is hit:
 407
 408  # echo 'stacktrace' > \
 409        /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
 410
 411  The following trigger dumps a stacktrace the first 5 times a kmalloc
 412  request happens with a size >= 64K
 413
 414  # echo 'stacktrace:5 if bytes_req >= 65536' > \
 415        /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
 416
 417  The format is:
 418
 419      stacktrace[:count]
 420
 421  To remove the above commands:
 422
 423  # echo '!stacktrace' > \
 424        /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
 425
 426  # echo '!stacktrace:5 if bytes_req >= 65536' > \
 427        /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
 428
 429  The latter can also be removed more simply by the following (without
 430  the filter):
 431
 432  # echo '!stacktrace:5' > \
 433        /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
 434
 435  Note that there can be only one stacktrace trigger per triggering
 436  event.
 437
 438- snapshot
 439
 440  This command causes a snapshot to be triggered whenever the
 441  triggering event occurs.
 442
 443  The following command creates a snapshot every time a block request
 444  queue is unplugged with a depth > 1.  If you were tracing a set of
 445  events or functions at the time, the snapshot trace buffer would
 446  capture those events when the trigger event occurred:
 447
 448  # echo 'snapshot if nr_rq > 1' > \
 449        /sys/kernel/debug/tracing/events/block/block_unplug/trigger
 450
 451  To only snapshot once:
 452
 453  # echo 'snapshot:1 if nr_rq > 1' > \
 454        /sys/kernel/debug/tracing/events/block/block_unplug/trigger
 455
 456  To remove the above commands:
 457
 458  # echo '!snapshot if nr_rq > 1' > \
 459        /sys/kernel/debug/tracing/events/block/block_unplug/trigger
 460
 461  # echo '!snapshot:1 if nr_rq > 1' > \
 462        /sys/kernel/debug/tracing/events/block/block_unplug/trigger
 463
 464  Note that there can be only one snapshot trigger per triggering
 465  event.
 466
 467- traceon/traceoff
 468
 469  These commands turn tracing on and off when the specified events are
 470  hit. The parameter determines how many times the tracing system is
 471  turned on and off. If unspecified, there is no limit.
 472
 473  The following command turns tracing off the first time a block
 474  request queue is unplugged with a depth > 1.  If you were tracing a
 475  set of events or functions at the time, you could then examine the
 476  trace buffer to see the sequence of events that led up to the
 477  trigger event:
 478
 479  # echo 'traceoff:1 if nr_rq > 1' > \
 480        /sys/kernel/debug/tracing/events/block/block_unplug/trigger
 481
 482  To always disable tracing when nr_rq  > 1 :
 483
 484  # echo 'traceoff if nr_rq > 1' > \
 485        /sys/kernel/debug/tracing/events/block/block_unplug/trigger
 486
 487  To remove the above commands:
 488
 489  # echo '!traceoff:1 if nr_rq > 1' > \
 490        /sys/kernel/debug/tracing/events/block/block_unplug/trigger
 491
 492  # echo '!traceoff if nr_rq > 1' > \
 493        /sys/kernel/debug/tracing/events/block/block_unplug/trigger
 494
 495  Note that there can be only one traceon or traceoff trigger per
 496  triggering event.
 497
lxr.linux.no kindly hosted by Redpill Linpro AS, provider of Linux consulting and operations services since 1995.