linux/Documentation/scheduler/sched-rt-group.txt
                                Real-Time group scheduling
                                --------------------------

CONTENTS
========

0. WARNING
1. Overview
  1.1 The problem
  1.2 The solution
2. The interface
  2.1 System-wide settings
  2.2 Default behaviour
  2.3 Basis for grouping tasks
3. Future plans


0. WARNING
==========

Fiddling with these settings can result in an unstable system. The knobs are
root only and assume that root knows what they are doing.

Most notable:

 * very small values in sched_rt_period_us can result in an unstable
   system when the period is smaller than either the available hrtimer
   resolution, or the time it takes to handle the budget refresh itself.

 * very small values in sched_rt_runtime_us can result in an unstable
   system when the runtime is so small the system has difficulty making
   forward progress (NOTE: the migration thread and kstopmachine both
   are real-time processes).

1. Overview
===========


1.1 The problem
---------------

Realtime scheduling is all about determinism: a group has to be able to rely on
the amount of bandwidth (e.g. CPU time) being constant. In order to schedule
multiple groups of realtime tasks, each group must be assigned a fixed portion
of the CPU time available.  Without a minimum guarantee a realtime group can
obviously fall short. A fuzzy upper limit is of no use since it cannot be
relied upon. This leaves us with just the single fixed portion.

1.2 The solution
----------------

CPU time is divided by means of specifying how much time can be spent running
in a given period. We allocate this "run time" for each realtime group which
the other realtime groups will not be permitted to use.

Any time not allocated to a realtime group will be used to run normal priority
tasks (SCHED_OTHER). Any allocated run time not used will also be picked up by
SCHED_OTHER.

Let's consider an example: a fixed-frame-rate realtime renderer must deliver 25
frames a second, which yields a period of 0.04s per frame. Now say it will also
have to play some music and respond to input, leaving it with around 80% CPU
time dedicated for the graphics. We can then give this group a run time of 0.8
* 0.04s = 0.032s.

This way the graphics group will have a 0.04s period with a 0.032s run time
limit. Now if the audio thread needs to refill the DMA buffer every 0.005s, but
needs only about 3% CPU time to do so, it can make do with a run time of
0.03 * 0.005s = 0.00015s. So this group can be scheduled with a period of
0.005s and a run time of 0.00015s.

The remaining CPU time will be used for user input and other tasks. Because
realtime tasks have explicitly allocated the CPU time they need to perform
their tasks, buffer underruns in the graphics or audio can be eliminated.

NOTE: the above example is not fully implemented yet. We still
lack an EDF scheduler to make non-uniform periods usable.


2. The Interface
================


2.1 System-wide settings
------------------------

The system-wide settings are configured under the /proc virtual file system:

/proc/sys/kernel/sched_rt_period_us:
  The scheduling period that is equivalent to 100% CPU bandwidth

/proc/sys/kernel/sched_rt_runtime_us:
  A global limit on how much time realtime scheduling may use.  Even without
  CONFIG_RT_GROUP_SCHED enabled, this will limit time reserved to realtime
  processes. With CONFIG_RT_GROUP_SCHED it signifies the total bandwidth
  available to all realtime groups.

  * Time is specified in us because the interface is s32. This gives an
    operating range from 1us to about 35 minutes.
  * sched_rt_period_us takes values from 1 to INT_MAX.
  * sched_rt_runtime_us takes values from -1 to (INT_MAX - 1).
  * A run time of -1 specifies runtime == period, i.e. no limit.


2.2 Default behaviour
---------------------

The default values are 1000000 (1s) for sched_rt_period_us and 950000 (0.95s)
for sched_rt_runtime_us.  This gives 0.05s to be used by SCHED_OTHER (non-RT
tasks). These defaults were chosen so that a runaway realtime task will not
lock up the machine but leave a little time to recover it.  By setting runtime
to -1 you'd get the old behaviour back.

By default all bandwidth is assigned to the root group and new groups get the
period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
want to assign bandwidth to another group, reduce the root group's bandwidth
and assign some or all of the difference to another group.

Realtime group scheduling means you have to assign a portion of total CPU
bandwidth to the group before it will accept realtime tasks. Therefore you will
not be able to run realtime tasks as any user other than root until you have
done that, even if the user has the rights to run processes with realtime
priority!
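
As a rough illustration of how the two system-wide knobs relate, the Python
sketch below computes the share of each period that realtime tasks may use,
treating a run time of -1 as "no limit". The names rt_bandwidth and read_knob
are hypothetical helpers made up for this example, not a kernel or library
interface. Rejecting a run time larger than the period is an assumption of the
sketch, and read_knob assumes the /proc paths listed above exist on the running
system.

```python
INT_MAX = 2**31 - 1  # largest value an s32 interface can hold


def rt_bandwidth(period_us, runtime_us):
    """Return the fraction of each period available to realtime tasks."""
    if period_us <= 0:
        raise ValueError("period must be positive")
    if runtime_us == -1:
        # -1 means runtime == period, i.e. no limit
        return 1.0
    if runtime_us < 0 or runtime_us > period_us:
        # assumption of this sketch: such values are treated as invalid
        raise ValueError("runtime must be -1 or within 0..period")
    return runtime_us / period_us


def read_knob(name, base="/proc/sys/kernel"):
    """Read one integer knob, e.g. read_knob("sched_rt_period_us")."""
    with open(f"{base}/{name}") as f:
        return int(f.read())


if __name__ == "__main__":
    # 950000us of run time in a 1000000us period leaves 5% for SCHED_OTHER
    print(rt_bandwidth(1000000, 950000))  # 0.95
    print(rt_bandwidth(1000000, -1))      # 1.0
    # upper end of the operating range, in minutes (a little under 36)
    print(INT_MAX / 1e6 / 60)
```

On a live system the current values could be fed in with
rt_bandwidth(read_knob("sched_rt_period_us"), read_knob("sched_rt_runtime_us")).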


2.3 Basis for grouping tasks
----------------------------

Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
CPU bandwidth to task groups.

This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
to control the CPU time reserved for each control group.

For more information on working with control groups, you should read
Documentation/cgroups/cgroups.txt as well.

Group settings are checked against the following limits in order to keep the
configuration schedulable:

   \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period

For now, this can be simplified to just the following (but see Future plans):

   \Sum_{i} runtime_{i} <= global_runtime


3. Future plans
===============

There is work in progress to make the scheduling period for each group
("<cgroup>/cpu.rt_period_us") configurable as well.

The constraint on the period is that a subgroup must have a smaller or
equal period to its parent. But realistically it is not very useful _yet_
as it is prone to starvation without deadline scheduling.

Consider two sibling groups A and B; both have 50% bandwidth, but A's
period is twice the length of B's.

* group A: period=100000us, runtime=50000us
        - this runs for 0.05s once every 0.1s

* group B: period= 50000us, runtime=25000us
        - this runs for 0.025s twice every 0.1s (or once every 0.05 sec).

This means that currently a while (1) loop in A will run for the full period of
B and can starve B's tasks (assuming they are of lower priority) for a whole
period.
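
The starvation described above can be reproduced with a toy model. The Python
sketch below is not the kernel's throttling code. It assumes strict priority
between the groups (A above B), budgets that are refreshed at every period
boundary with unused budget discarded, and both groups always having runnable
work. The parameters are hypothetical values picked so that each group has the
50% bandwidth named in the text (A: 50000us per 100000us, B: 25000us per
50000us), and the function name simulate_fixed_priority is made up for this
illustration.

```python
def simulate_fixed_priority(groups, horizon_us, tick_us=1000):
    """Toy fixed-priority model.

    groups: list of (name, period_us, runtime_us), highest priority first.
    Periods, runtimes and horizon_us must be multiples of tick_us.
    Returns, per group, the run time received in each of its periods.
    """
    received = {name: [0] * (horizon_us // period)
                for name, period, _ in groups}
    budget = {name: 0 for name, _, _ in groups}

    for now in range(0, horizon_us, tick_us):
        # refresh the budget of every group whose period starts now
        for name, period, runtime in groups:
            if now % period == 0:
                budget[name] = runtime
        # the highest-priority group with budget left gets this tick
        for name, period, _ in groups:
            if budget[name] >= tick_us:
                budget[name] -= tick_us
                received[name][now // period] += tick_us
                break
    return received


if __name__ == "__main__":
    groups = [("A", 100000, 50000), ("B", 50000, 25000)]
    print(simulate_fixed_priority(groups, horizon_us=100000))
    # {'A': [50000], 'B': [0, 25000]}
```

In this model B receives nothing during its first period because A spends its
whole budget back to back, which is the starvation the paragraph above
describes.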

The next project will be SCHED_EDF (Earliest Deadline First scheduling) to bring
full deadline scheduling to the Linux kernel. Deadline scheduling the above
groups and treating the end of the period as a deadline will ensure that they
both get their allocated time.

Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
the biggest challenge as the current Linux PI infrastructure is geared towards
the limited static priority levels 0-99. With deadline scheduling you need to
do deadline inheritance (since priority is inversely proportional to the
deadline delta (deadline - now)).

This means the whole PI machinery will have to be reworked - and that is one of
the most complex pieces of code we have.
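
To illustrate the claim above that deadline scheduling lets both groups receive
their allocated time, here is a toy earliest-deadline-first model in Python. It
is only a sketch and says nothing about how SCHED_EDF would be implemented in
the kernel. It assumes two always-runnable groups with hypothetical parameters
giving each 50% bandwidth (A: 50000us per 100000us, B: 25000us per 50000us),
budgets refreshed at each period boundary, and the end of the current period
used as the deadline. The function name simulate_edf is made up for this
example.

```python
def simulate_edf(groups, horizon_us, tick_us=1000):
    """Toy earliest-deadline-first model.

    groups: list of (name, period_us, runtime_us).
    Periods, runtimes and horizon_us must be multiples of tick_us.
    Returns, per group, the run time received in each of its periods.
    """
    received = {name: [0] * (horizon_us // period)
                for name, period, _ in groups}
    budget = {name: 0 for name, _, _ in groups}
    deadline = {name: 0 for name, _, _ in groups}

    for now in range(0, horizon_us, tick_us):
        # a new period refreshes the budget and moves the deadline
        for name, period, runtime in groups:
            if now % period == 0:
                budget[name] = runtime
                deadline[name] = now + period
        # among groups with budget left, run the earliest deadline
        ready = [g for g in groups if budget[g[0]] >= tick_us]
        if ready:
            name, period, _ = min(ready, key=lambda g: deadline[g[0]])
            budget[name] -= tick_us
            received[name][now // period] += tick_us
    return received


if __name__ == "__main__":
    groups = [("A", 100000, 50000), ("B", 50000, 25000)]
    print(simulate_edf(groups, horizon_us=100000))
    # {'A': [50000], 'B': [25000, 25000]}
```

Under these assumptions both groups receive their full run time in every
period, because B's earlier deadline lets it run ahead of A at the start.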