linux/Documentation/trace/events.txt
                             Event Tracing

                Documentation written by Theodore Ts'o
                Updated by Li Zefan and Tom Zanussi

1. Introduction
===============

Using the event tracing infrastructure, tracepoints (see
Documentation/trace/tracepoints.txt) can be used without creating
custom kernel modules to register probe functions.

Not all tracepoints can be traced using the event tracing system;
the kernel developer must provide code snippets which define how the
tracing information is saved into the tracing buffer, and how the
tracing information should be printed.

2. Using Event Tracing
======================

2.1 Via the 'set_event' interface
---------------------------------

The events which are available for tracing can be found in the file
/sys/kernel/debug/tracing/available_events.

To enable a particular event, such as 'sched_wakeup', simply echo it
to /sys/kernel/debug/tracing/set_event. For example:

        # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event

[ Note: '>>' is necessary, otherwise it will first disable
  all the events. ]

To disable an event, echo the event name to the set_event file prefixed
with an exclamation point:

        # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event

To disable all events, echo an empty line to the set_event file:

        # echo > /sys/kernel/debug/tracing/set_event

To enable all events, echo '*:*' or '*:' to the set_event file:

        # echo '*:*' > /sys/kernel/debug/tracing/set_event

The events are organized into subsystems, such as ext4, irq, sched,
etc., and a full event name looks like this: <subsystem>:<event>.  The
subsystem name is optional, but it is displayed in the available_events
file.  All of the events in a subsystem can be specified via the syntax
"<subsystem>:*"; for example, to enable all irq events, you can use the
command:

        # echo 'irq:*' > /sys/kernel/debug/tracing/set_event

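The set_event operations above can be wrapped in small shell helpers.
The following is only a sketch and is not part of the kernel tree: the
function names and the TRACING_DIR variable are assumptions introduced
here so that the tracing directory can be overridden; it defaults to
the path used in this document.

```shell
# Hypothetical helpers; TRACING_DIR is an assumed override for the
# tracing directory used throughout this document.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Enable one event (or "<subsystem>:*") without touching the others.
set_event_enable() {
    # '>>' appends; '>' would first disable every other event.
    printf '%s\n' "$1" >> "$TRACING_DIR/set_event"
}

# Disable one event by prefixing its name with '!'.
set_event_disable() {
    printf '!%s\n' "$1" >> "$TRACING_DIR/set_event"
}

# Disable all events by writing an empty line with '>'.
set_event_disable_all() {
    echo > "$TRACING_DIR/set_event"
}
```

For example, "set_event_enable 'irq:*'" followed by
"set_event_disable sched_wakeup" changes only those events and leaves
the rest of the configuration alone.
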
2.2 Via the 'enable' toggle
---------------------------

The available events are also listed in the
/sys/kernel/debug/tracing/events/ hierarchy of directories.

To enable event 'sched_wakeup':

        # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable

To disable it:

        # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable

To enable all events in the sched subsystem:

        # echo 1 > /sys/kernel/debug/tracing/events/sched/enable

To enable all events:

        # echo 1 > /sys/kernel/debug/tracing/events/enable

When reading one of these enable files, there are four possible results:

 0 - all events this file affects are disabled
 1 - all events this file affects are enabled
 X - there is a mixture of events enabled and disabled
 ? - this file does not affect any event

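The four values listed above can be turned into readable text by a
script.  The helper below is a minimal sketch; its name and its output
strings are assumptions made for illustration, and only the four input
values come from this document.

```shell
# Hypothetical helper: translate the content of an 'enable' file
# into the meaning listed above.
describe_enable_state() {
    case "$1" in
        0)   echo "all events disabled" ;;
        1)   echo "all events enabled" ;;
        X)   echo "mixture of enabled and disabled events" ;;
        '?') echo "file does not affect any event" ;;
        *)   echo "unexpected value: $1" ;;
    esac
}
```

It could be called with the content of an enable file, for instance
describe_enable_state "$(cat /sys/kernel/debug/tracing/events/sched/enable)".
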
2.3 Boot option
---------------

In order to facilitate early boot debugging, use the boot option:

        trace_event=[event-list]

event-list is a comma-separated list of events. See section 2.1 for the
event format.

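As an illustration of the comma-separated form, the sketch below
assembles a trace_event= parameter from a list of event names.  The
function name is an assumption made here; how the resulting string is
added to the kernel command line depends on the boot loader and is not
covered.

```shell
# Hypothetical helper: join event names into a trace_event= parameter.
build_trace_event_param() {
    # "$*" joins the arguments with the first character of IFS;
    # the subshell keeps the IFS change local.
    ( IFS=,; printf 'trace_event=%s\n' "$*" )
}
```

Calling build_trace_event_param sched:sched_wakeup 'irq:*' yields
trace_event=sched:sched_wakeup,irq:*
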
3. Defining an event-enabled tracepoint
=======================================

See the example provided in samples/trace_events.

4. Event formats
================

Each trace event has a 'format' file associated with it that contains
a description of each field in a logged event.  This information can
be used to parse the binary trace stream, and is also the place to
find the field names that can be used in event filters (see section 5).

It also displays the format string that will be used to print the
event in text mode, along with the event name and ID used for
profiling.

Every event has a set of 'common' fields associated with it; these are
the fields prefixed with 'common_'.  The other fields vary between
events and correspond to the fields defined in the TRACE_EVENT
definition for that event.

Each field in the format has the form:

     field:field-type field-name; offset:N; size:N;

where offset is the offset of the field in the trace record and size
is the size of the data item, in bytes.

For example, here's the information displayed for the 'sched_wakeup'
event:

# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format

name: sched_wakeup
ID: 60
format:
        field:unsigned short common_type;       offset:0;       size:2;
        field:unsigned char common_flags;       offset:2;       size:1;
        field:unsigned char common_preempt_count;       offset:3;       size:1;
        field:int common_pid;   offset:4;       size:4;
        field:int common_tgid;  offset:8;       size:4;

        field:char comm[TASK_COMM_LEN]; offset:12;      size:16;
        field:pid_t pid;        offset:28;      size:4;
        field:int prio; offset:32;      size:4;
        field:int success;      offset:36;      size:4;
        field:int cpu;  offset:40;      size:4;

print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid,
           REC->prio, REC->success, REC->cpu

This event contains 10 fields, the first 5 common and the remaining 5
event-specific.  All the fields for this event are numeric, except for
'comm' which is a string, a distinction important for event filtering.

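Because each field line follows the fixed form shown above, a format
file can be summarized with standard tools.  The following is a
minimal sketch: the function name is an assumption, and it relies only
on the 'field:...; offset:N; size:N;' layout described in this section.

```shell
# Hypothetical helper: read a 'format' file on stdin and print
# "<field-name> <offset> <size>" for every field line.
list_fields() {
    awk -F';' '/^[ \t]*field:/ {
        decl = $1
        sub(/^[ \t]*field:/, "", decl)   # drop the "field:" prefix
        n = split(decl, words, " ")      # last word is the field name
        name = words[n]
        sub(/\[.*$/, "", name)           # strip an array suffix like [N]
        off = $2
        sub(/^[ \t]*offset:/, "", off)
        size = $3
        sub(/^[ \t]*size:/, "", size)
        print name, off, size
    }'
}
```

For instance, running
list_fields < /sys/kernel/debug/tracing/events/sched/sched_wakeup/format
prints one line per field, which is a quick way to find the names
usable in filters.
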
5. Event filtering
==================

Trace events can be filtered in the kernel by associating boolean
'filter expressions' with them.  As soon as an event is logged into
the trace buffer, its fields are checked against the filter expression
associated with that event type.  An event with field values that
'match' the filter will appear in the trace output, and an event whose
values don't match will be discarded.  An event with no filter
associated with it matches everything, and is the default when no
filter has been set for an event.

5.1 Expression syntax
---------------------

A filter expression consists of one or more 'predicates' that can be
combined using the logical operators '&&' and '||'.  A predicate is
simply a clause that compares the value of a field contained within a
logged event with a constant value and returns either 0 or 1 depending
on whether the field value matched (1) or didn't match (0):

          field-name relational-operator value

Parentheses can be used to provide arbitrary logical groupings and
double-quotes can be used to prevent the shell from interpreting
operators as shell metacharacters.

The field-names available for use in filters can be found in the
'format' files for trace events (see section 4).

The relational-operators depend on the type of the field being tested:

The operators available for numeric fields are:

==, !=, <, <=, >, >=, &

And for string fields they are:

==, !=, ~

The glob (~) only accepts a wild card character (*) at the start and/or
end of the string. For example:

  prev_comm ~ "*sh"
  prev_comm ~ "sh*"
  prev_comm ~ "*sh*"

It does not allow the wild card to be within the string:

  prev_comm ~ "ba*sh"   <-- is invalid

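The glob rule above can be checked before a pattern is written to a
filter file.  The sketch below encodes one reading of that rule (at
most one '*' at the start and one at the end, none in between); the
function name is an assumption, and the kernel remains the authority
on what it actually accepts.

```shell
# Hypothetical helper: succeed if '*' appears only at the start
# and/or end of the pattern, as described above.
is_valid_glob() {
    inner=${1#\*}        # drop one leading '*', if present
    inner=${inner%\*}    # drop one trailing '*', if present
    case "$inner" in
        *\**) return 1 ;;   # a '*' is left inside the string
        *)    return 0 ;;
    esac
}
```

With this helper, is_valid_glob '*sh*' succeeds while
is_valid_glob 'ba*sh' fails.
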
5.2 Setting filters
-------------------

A filter for an individual event is set by writing a filter expression
to the 'filter' file for the given event.

For example:

# cd /sys/kernel/debug/tracing/events/sched/sched_wakeup
# echo "common_preempt_count > 4" > filter

A slightly more involved example:

# cd /sys/kernel/debug/tracing/events/signal/signal_generate
# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter

If there is an error in the expression, you'll get an 'Invalid
argument' error when setting it, and the erroneous string along with
an error message can be seen by looking at the filter e.g.:

# cd /sys/kernel/debug/tracing/events/signal/signal_generate
# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
-bash: echo: write error: Invalid argument
# cat filter
((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
^
parse_error: Field not found

Currently the caret ('^') for an error always appears at the beginning of
the filter string; the error message should still be useful though
even without more accurate position info.

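The write-then-inspect sequence above can be combined into one step.
This is a minimal sketch under the assumption that a failed write
leaves the error text readable from the filter file, as described
above; the function name and argument order are made up for
illustration.

```shell
# Hypothetical helper: set a filter and, if the write is rejected,
# show the error message stored in the filter file.
#   $1 = event directory, $2 = filter expression
set_filter() {
    if ! echo "$2" > "$1/filter" 2>/dev/null; then
        cat "$1/filter" >&2
        return 1
    fi
}
```

For example:

  set_filter /sys/kernel/debug/tracing/events/sched/sched_wakeup \
      "common_preempt_count > 4"
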
5.3 Clearing filters
--------------------

To clear the filter for an event, write a '0' to the event's filter
file.

To clear the filters for all events in a subsystem, write a '0' to the
subsystem's filter file.

5.4 Subsystem filters
---------------------

For convenience, filters for every event in a subsystem can be set or
cleared as a group by writing a filter expression into the filter file
at the root of the subsystem.  Note however, that if any event within
the subsystem lacks a field specified in the subsystem filter, or if
the filter can't be applied for any other reason, the filter for that
event will retain its previous setting.  This can result in an
unintended mixture of filters which could lead to confusing trace
output (to the user who might think different filters are in effect).
Only filters that reference just the common fields can be guaranteed
to propagate successfully to all events.

Here are a few subsystem filter examples that also illustrate the
above points:

Clear the filters on all events in the sched subsystem:

# cd /sys/kernel/debug/tracing/events/sched
# echo 0 > filter
# cat sched_switch/filter
none
# cat sched_wakeup/filter
none

Set a filter using only common fields for all events in the sched
subsystem (all events end up with the same filter):

# cd /sys/kernel/debug/tracing/events/sched
# echo common_pid == 0 > filter
# cat sched_switch/filter
common_pid == 0
# cat sched_wakeup/filter
common_pid == 0

Attempt to set a filter using a non-common field for all events in the
sched subsystem (all events but those that have a prev_pid field retain
their old filters):

# cd /sys/kernel/debug/tracing/events/sched
# echo prev_pid == 0 > filter
# cat sched_switch/filter
prev_pid == 0
# cat sched_wakeup/filter
common_pid == 0

5.5 PID filtering
-----------------

The set_event_pid file, located in the same directory as the top-level
events directory, restricts tracing to tasks whose PIDs are listed in
it; events from any other task are filtered out.

# cd /sys/kernel/debug/tracing
# echo $$ > set_event_pid
# echo 1 > events/enable

This will only trace events for the current task.

To add more PIDs without losing the PIDs already included, use '>>'.

# echo 123 244 1 >> set_event_pid

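The difference between '>' and '>>' on set_event_pid can be captured
in a helper that starts a fresh PID list and then extends it.  This is
only a sketch: the function name and the TRACING_DIR override are
assumptions introduced here.

```shell
# TRACING_DIR is an assumed override for the tracing directory.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Hypothetical helper: restrict event tracing to the given PIDs.
trace_only_pids() {
    # The first PID is written with '>' and starts a new list;
    # the remaining PIDs are appended with '>>'.
    echo "$1" > "$TRACING_DIR/set_event_pid"
    shift
    for pid in "$@"; do
        echo "$pid" >> "$TRACING_DIR/set_event_pid"
    done
}
```

For example, "trace_only_pids $$ 123" limits tracing to the current
shell and PID 123.
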
6. Event triggers
=================

Trace events can be made to conditionally invoke trigger 'commands'
which can take various forms and are described in detail below;
examples would be enabling or disabling other trace events or invoking
a stack trace whenever the trace event is hit.  Whenever a trace event
with attached triggers is invoked, the set of trigger commands
associated with that event is invoked.  Any given trigger can
additionally have an event filter of the same form as described in
section 5 (Event filtering) associated with it - the command will only
be invoked if the event being invoked passes the associated filter.
If no filter is associated with the trigger, it always passes.

Triggers are added to and removed from a particular event by writing
trigger expressions to the 'trigger' file for the given event.

A given event can have any number of triggers associated with it,
subject to any restrictions that individual commands may have in that
regard.

Event triggers are implemented on top of "soft" mode, which means that
whenever a trace event has one or more triggers associated with it,
the event is activated even if it isn't actually enabled, but is
disabled in a "soft" mode.  That is, the tracepoint will be called,
but just will not be traced, unless of course it's actually enabled.
This scheme allows triggers to be invoked even for events that aren't
enabled, and also allows the current event filter implementation to be
used for conditionally invoking triggers.

The syntax for event triggers is roughly based on the syntax for
set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
section of Documentation/trace/ftrace.txt), but there are major
differences and the implementation isn't currently tied to it in any
way, so beware of making generalizations between the two.

6.1 Expression syntax
---------------------

Triggers are added by echoing the command to the 'trigger' file:

  # echo 'command[:count] [if filter]' > trigger

Triggers are removed by echoing the same command but starting with '!'
to the 'trigger' file:

  # echo '!command[:count] [if filter]' > trigger

The [if filter] part isn't used in matching commands when removing, so
leaving that off in a '!' command will accomplish the same thing as
having it in.

The filter syntax is the same as that described in the 'Event
filtering' section above.

For ease of use, writing to the trigger file using '>' currently just
adds or removes a single trigger and there's no explicit '>>' support
('>' actually behaves like '>>') or truncation support to remove all
triggers (you have to use '!' for each one added.)

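The add and remove forms above can be paired in two small helpers.
This is a minimal sketch with made-up function names; the removal
helper drops the 'if filter' part, relying on the statement above that
the filter is not used when matching a command for removal.

```shell
# Hypothetical helpers; $1 is an event directory such as
# /sys/kernel/debug/tracing/events/kmem/kmalloc and $2 is a trigger
# expression of the form 'command[:count] [if filter]'.

add_trigger() {
    printf '%s\n' "$2" > "$1/trigger"
}

remove_trigger() {
    # Strip everything from the first " if " onwards and send
    # only '!command[:count]'.
    printf '!%s\n' "${2%% if *}" > "$1/trigger"
}
```

The same expression string can then be used both to add and to remove
a trigger, for example 'stacktrace:5 if bytes_req >= 65536'.
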
6.2 Supported trigger commands
------------------------------

The following commands are supported:

- enable_event/disable_event

  These commands can enable or disable another trace event whenever
  the triggering event is hit.  When these commands are registered,
  the other trace event is activated, but disabled in a "soft" mode.
  That is, the tracepoint will be called, but just will not be traced.
  The event tracepoint stays in this mode as long as there's a trigger
  in effect that can trigger it.

  For example, the following trigger causes kmalloc events to be
  traced when a read system call is entered, and the :1 at the end
  specifies that this enablement happens only once:

  # echo 'enable_event:kmem:kmalloc:1' > \
      /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger

  The following trigger causes kmalloc events to stop being traced
  when a read system call exits.  This disablement happens on every
  read system call exit:

  # echo 'disable_event:kmem:kmalloc' > \
      /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger

  The format is:

      enable_event:<system>:<event>[:count]
      disable_event:<system>:<event>[:count]

  To remove the above commands:

  # echo '!enable_event:kmem:kmalloc:1' > \
      /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger

  # echo '!disable_event:kmem:kmalloc' > \
      /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger

  Note that there can be any number of enable/disable_event triggers
  per triggering event, but there can only be one trigger per
  triggered event. e.g. sys_enter_read can have triggers enabling both
  kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc
  versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if
  bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they
  could be combined into a single filter on kmem:kmalloc though).

- stacktrace

  This command dumps a stacktrace in the trace buffer whenever the
  triggering event occurs.

  For example, the following trigger dumps a stacktrace every time the
  kmalloc tracepoint is hit:

  # echo 'stacktrace' > \
        /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

  The following trigger dumps a stacktrace the first 5 times a kmalloc
  request happens with a size >= 64K:

  # echo 'stacktrace:5 if bytes_req >= 65536' > \
        /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

  The format is:

      stacktrace[:count]

  To remove the above commands:

  # echo '!stacktrace' > \
        /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

  # echo '!stacktrace:5 if bytes_req >= 65536' > \
        /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

  The latter can also be removed more simply by the following (without
  the filter):

  # echo '!stacktrace:5' > \
        /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

  Note that there can be only one stacktrace trigger per triggering
  event.

- snapshot

  This command causes a snapshot to be triggered whenever the
  triggering event occurs.

  The following command creates a snapshot every time a block request
  queue is unplugged with a depth > 1.  If you were tracing a set of
  events or functions at the time, the snapshot trace buffer would
  capture those events when the trigger event occurred:

  # echo 'snapshot if nr_rq > 1' > \
        /sys/kernel/debug/tracing/events/block/block_unplug/trigger

  To only snapshot once:

  # echo 'snapshot:1 if nr_rq > 1' > \
        /sys/kernel/debug/tracing/events/block/block_unplug/trigger

  To remove the above commands:

  # echo '!snapshot if nr_rq > 1' > \
        /sys/kernel/debug/tracing/events/block/block_unplug/trigger

  # echo '!snapshot:1 if nr_rq > 1' > \
        /sys/kernel/debug/tracing/events/block/block_unplug/trigger

  Note that there can be only one snapshot trigger per triggering
  event.

- traceon/traceoff

  These commands turn tracing on and off when the specified events are
  hit. The parameter determines how many times the tracing system is
  turned on and off. If unspecified, there is no limit.

  The following command turns tracing off the first time a block
  request queue is unplugged with a depth > 1.  If you were tracing a
  set of events or functions at the time, you could then examine the
  trace buffer to see the sequence of events that led up to the
  trigger event:

  # echo 'traceoff:1 if nr_rq > 1' > \
        /sys/kernel/debug/tracing/events/block/block_unplug/trigger

  To always disable tracing when nr_rq > 1:

  # echo 'traceoff if nr_rq > 1' > \
        /sys/kernel/debug/tracing/events/block/block_unplug/trigger

  To remove the above commands:

  # echo '!traceoff:1 if nr_rq > 1' > \
        /sys/kernel/debug/tracing/events/block/block_unplug/trigger

  # echo '!traceoff if nr_rq > 1' > \
        /sys/kernel/debug/tracing/events/block/block_unplug/trigger

  Note that there can be only one traceon or traceoff trigger per
  triggering event.

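All of the commands above share the 'command[:count] [if filter]'
shape, so the expression string can be assembled from its parts.  The
helper below is a sketch with a made-up name; it only builds the
string and does not check that the command is one the kernel supports.

```shell
# Hypothetical helper: build 'command[:count] [if filter]'.
#   $1 = command, $2 = count (may be empty), $3 = filter (may be empty)
format_trigger() {
    trigger=$1
    [ -n "${2:-}" ] && trigger="$trigger:$2"
    [ -n "${3:-}" ] && trigger="$trigger if $3"
    printf '%s\n' "$trigger"
}
```

For example, format_trigger traceoff 1 'nr_rq > 1' produces the
string "traceoff:1 if nr_rq > 1", which can then be written to the
trigger file of the block_unplug event.
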