linux/Documentation/this_cpu_ops.txt
this_cpu operations
-------------------

this_cpu operations are a way of optimizing access to per cpu
variables associated with the *currently* executing processor through
the use of segment registers (or a dedicated register in which the cpu
permanently stores the beginning of the per cpu area for a specific
processor).

The this_cpu operations add a per cpu variable offset to the processor
specific percpu base and encode that operation in the instruction
operating on the per cpu variable.

This means there are no atomicity issues between the calculation of
the offset and the operation on the data. Therefore it is not
necessary to disable preemption or interrupts to ensure that the
processor is not changed between the calculation of the address and
the operation on the data.

Read-modify-write operations are of particular interest. Frequently
processors have special lower latency instructions that can operate
without the typical synchronization overhead but still provide some
sort of relaxed atomicity guarantee. The x86 for example can execute
RMW (Read Modify Write) instructions like inc/dec/cmpxchg without the
lock prefix and the associated latency penalty.

Access to the variable without the lock prefix is not synchronized but
synchronization is not necessary since we are dealing with per cpu
data specific to the currently executing processor. Only the current
processor should be accessing that variable and therefore there are no
concurrency issues with other processors in the system.

On x86 the fs: or the gs: segment registers contain the base of the
per cpu area.
It is then possible to simply use the segment override
to relocate a per cpu relative address to the proper per cpu area for
the processor. So the relocation to the per cpu base is encoded in the
instruction via a segment register prefix.

For example:

	DEFINE_PER_CPU(int, x);
	int z;

	z = this_cpu_read(x);

results in a single instruction

	mov ax, gs:[x]

instead of a sequence of calculation of the address and then a fetch
from that address which occurs with the percpu operations. Before
this_cpu_ops such a sequence also required preempt disable/enable to
prevent the kernel from moving the thread to a different processor
while the calculation is performed.

The main use of the this_cpu operations has been to optimize counter
operations.

	this_cpu_inc(x)

results in the following single instruction (no lock prefix!)

	inc gs:[x]

instead of the following operations required if there is no segment
register.

	int *y;
	int cpu;

	cpu = get_cpu();
	y = per_cpu_ptr(&x, cpu);
	(*y)++;
	put_cpu();

Note that these operations can only be used on percpu data that is
reserved for a specific processor. Without disabling preemption in the
surrounding code this_cpu_inc() will only guarantee that one of the
percpu counters is correctly incremented. However, there is no
guarantee that the OS will not move the process directly before or
after the this_cpu instruction is executed. In general this means that
the values of the individual counters for each processor are
meaningless.
The sum of all the per cpu counters is the only value
that is of interest.

Per cpu variables are used for performance reasons. Bouncing cache
lines can be avoided if multiple processors concurrently go through
the same code paths. Since each processor has its own per cpu
variables no concurrent cacheline updates take place. The price that
has to be paid for this optimization is the need to add up the per cpu
counters when the value of the counter is needed.


Special operations:
-------------------

	y = this_cpu_ptr(&x)

Takes the offset of a per cpu variable (&x !) and returns the address
of the per cpu variable that belongs to the currently executing
processor. this_cpu_ptr avoids multiple steps that the common
get_cpu/put_cpu sequence requires. No processor number is
available. Instead the offset of the local per cpu area is simply
added to the percpu offset.



Per cpu variables and offsets
-----------------------------

Per cpu variables have *offsets* to the beginning of the percpu
area. They do not have addresses although they look like that in the
code. Offsets cannot be directly dereferenced. The offset must be
added to a base pointer of a percpu area of a processor in order to
form a valid address.

Therefore the use of x or &x outside of the context of per cpu
operations is invalid and will generally be treated like a NULL
pointer dereference.

In the context of per cpu operations

	x is a per cpu variable. Most this_cpu operations take a per cpu
	variable.

	&x is the *offset* of a per cpu variable.
	this_cpu_ptr() takes
	the offset of a per cpu variable which makes this look a bit
	strange.



Operations on a field of a per cpu structure
--------------------------------------------

Let's say we have a percpu structure

	struct s {
		int n,m;
	};

	DEFINE_PER_CPU(struct s, p);


Operations on these fields are straightforward

	this_cpu_inc(p.m)

	z = this_cpu_cmpxchg(p.m, 0, 1);


If we have an offset to struct s:

	struct s __percpu *ps = &p;

	this_cpu_dec(ps->m);

	z = this_cpu_inc_return(ps->n);


The calculation of the pointer may require the use of this_cpu_ptr()
if we do not make use of this_cpu ops later to manipulate fields:

	struct s *pp;

	pp = this_cpu_ptr(&p);

	pp->m--;

	z = pp->n++;


Variants of this_cpu ops
------------------------

this_cpu ops are interrupt safe. Some architectures do not support
these per cpu local operations. In that case the operation must be
replaced by code that disables interrupts, then does the operations
that are guaranteed to be atomic and then reenables interrupts. Doing
so is expensive. If there are other reasons why the scheduler cannot
change the processor we are executing on then there is no reason to
disable interrupts. For that purpose the __this_cpu operations are
provided.
For example:

	__this_cpu_inc(x);

Will increment x and will not fall back to code that disables
interrupts on platforms that cannot accomplish atomicity through
address relocation and a Read-Modify-Write operation in the same
instruction.



&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
--------------------------------------------

The first operation takes the offset and forms an address and then
adds the offset of the n field.

The second one first adds the two offsets and then does the
relocation. IMHO the second form looks cleaner and has an easier time
with (). The second form also is consistent with the way
this_cpu_read() and friends are used.


Christoph Lameter, April 3rd, 2013
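

Illustration: summing per cpu counters
--------------------------------------

The text above notes that the individual per cpu counters are
meaningless on their own and that only their sum is of interest. The
following is a minimal user-space sketch of that idea, not kernel
code. NR_SIM_CPUS and every sim_* name are hypothetical stand-ins
invented for this illustration; a plain array replaces the real per
cpu area and the caller passes the processor number explicitly, which
real this_cpu operations never require.

```c
#include <assert.h>

/* Hypothetical number of simulated processors (an assumption). */
#define NR_SIM_CPUS 4

/* Stand-in for a per cpu counter: one slot per simulated processor. */
static long sim_counter[NR_SIM_CPUS];

/*
 * Stand-in for this_cpu_inc(x). The real operation needs no cpu
 * argument because the relocation is encoded in the instruction.
 */
static void sim_this_cpu_inc(int cpu)
{
	assert(cpu >= 0 && cpu < NR_SIM_CPUS);
	sim_counter[cpu]++;
}

/* Add up all slots; this total is the only meaningful value. */
static long sim_counter_sum(void)
{
	long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_SIM_CPUS; cpu++)
		sum += sim_counter[cpu];
	return sum;
}
```

If a task is moved between processors while it increments the
counter, the increments land in different slots, yet the value
returned by sim_counter_sum() still equals the number of increments
performed.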