linux/Documentation/vm/page_migration
Page migration
--------------

Page migration allows the moving of the physical location of pages between
nodes in a NUMA system while the process is running. This means that the
virtual addresses that the process sees do not change. However, the
system rearranges the physical location of those pages.

The main intent of page migration is to reduce the latency of memory access
by moving pages near to the processor where the process accessing that memory
is running.

Page migration
allows a process to manually relocate the node on which its
pages are located through the MPOL_MF_MOVE and MPOL_MF_MOVE_ALL options while
setting a new memory policy via mbind(). The pages of a process can also be
relocated from another process using the sys_migrate_pages() function call.
The migrate_pages() function call takes two sets of nodes and moves pages of a
process that are located on the from nodes to the destination nodes.
Page migration functions are provided by the numactl package by Andi Kleen
(a version later than 0.9.3 is required. Get it from
ftp://oss.sgi.com/www/projects/libnuma/download/). numactl provides libnuma
which provides an interface similar to other NUMA functionality for page
migration. cat /proc/<pid>/numa_maps allows an easy review of where the
pages of a process are located. See also the numa_maps documentation in the
proc(5) man page.

Manual migration
is useful if for example the scheduler has relocated
a process to a processor on a distant node. A batch scheduler or an
administrator may detect the situation and move the pages of the process
nearer to the new processor. The kernel itself only provides
manual page migration support. Automatic page migration may be implemented
through user space processes that move pages. A special function call
"move_pages" allows the moving of individual pages within a process.
A NUMA profiler may, for example, obtain a log showing frequent off-node
accesses and may use the result to move pages to more advantageous
locations.

Larger installations usually partition
the system using cpusets into
sections of nodes. Paul Jackson has equipped cpusets with the ability to
move pages when a task is moved to another cpuset (see
Documentation/cgroups/cpusets.txt).
Cpusets allow the automation of process locality. If a task is moved to
a new cpuset then all its pages are also moved with it so that the
performance of the process does not sink dramatically. The pages
of processes in a cpuset are also moved if the allowed memory nodes of a
cpuset are changed.

Page migration
allows the preservation of the relative location of pages
within a group of nodes for all migration techniques, which preserves a
particular memory allocation pattern even after migrating a
process. This is necessary in order to preserve the memory latencies.
Processes will run with similar performance after migration.

Page migration
occurs in several steps. First a high level
description for those trying to use migrate_pages() from the kernel
(for userspace usage see Andi Kleen's numactl package mentioned above)
and then a low level description of how the low level details work.

A. In kernel use of migrate_pages()
-----------------------------------

1. Remove pages from the LRU.

   Lists of pages to be migrated are generated by scanning over
   pages and moving them into lists. This is done by
   calling isolate_lru_page().
   Calling isolate_lru_page() increases the references to the page
   so that it cannot vanish while the page migration occurs.
   It also prevents the swapper or other scans from encountering
   the page.

2. We need to have a function of type new_page_t that can be
   passed to migrate_pages(). This function should figure out
   how to allocate the correct new page given the old page.

3. The migrate_pages() function is called which attempts
   to do the migration. It will call the function to allocate
   the new page for each page that is considered for
   moving.

B. How migrate_pages() works
----------------------------

migrate_pages() does several passes over its list of pages. A page is moved
if all references to a page are removable at the time. The page has
already been removed from the LRU via isolate_lru_page() and the refcount
is increased so that the page cannot be freed while page migration occurs.

Steps:

1. Lock the page to be migrated.

2. Ensure that writeback is complete.

3. Prep the new page that we want to move to. It is locked
   and set to not being uptodate so that all accesses to the new
   page immediately lock while the move is in progress.

4. The new page is prepped with some settings from the old page so that
   accesses to the new page will discover a page with the correct settings.

5. All the page table references to the page are converted
   to migration entries or dropped (nonlinear vmas).
   This decreases the mapcount of a page. If the resulting
   mapcount is not zero then we do not migrate the page.
   All user space processes that attempt to access the page
   will now wait on the page lock.

6. The radix tree lock is taken. This will cause all processes trying
   to access the page via the mapping to block on the radix tree spinlock.

7. The refcount of the page is examined and we back out if references remain,
   otherwise we know that we are the only one referencing this page.

8. The radix tree is checked and if it does not contain the pointer to this
   page then we back out because someone else modified the radix tree.

9. The radix tree is changed to point to the new page.

10. The reference count of the old page is dropped because the radix tree
    reference is gone. A reference to the new page is established because
    the new page is referenced by the radix tree.

11. The radix tree lock is dropped. With that lookups in the mapping
    become possible again. Processes will move from spinning on the tree_lock
    to sleeping on the locked new page.

12. The page contents are copied to the new page.

13. The remaining page flags are copied to the new page.

14. The old page flags are cleared to indicate that the page does
    not provide any information anymore.

15. Queued up writeback on the new page is triggered.

16. If migration entries were inserted into the page tables then replace
    them with real ptes. Doing so will enable access for user space
    processes not already waiting for the page lock.

17. The page locks are dropped from the old and new page.
    Processes waiting on the page lock will redo their page faults
    and will reach the new page.

18. The new page is moved to the LRU and can be scanned by the swapper
    etc. again.

Christoph Lameter, May 8, 2006.
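
As a rough illustration of steps 6 to 14 in section B, the following
user-space model replaces a page in a mapping only if nobody else holds a
reference to it. Everything here is an assumption made for the sketch: the
sim_* names are hypothetical, a plain array stands in for the radix tree, an
int flag stands in for the tree lock, and the expected refcount of 2 (one
reference from the mapping, one from the isolating caller) is a
simplification. Page table handling and writeback are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE     16
#define SIM_FLAG_DIRTY    0x1u
#define SIM_FLAG_LOCKED   0x2u
#define SIM_FLAG_UPTODATE 0x4u

/* Simplified page; hypothetical, not the kernel's struct page. */
struct sim_page {
    int refcount;
    unsigned int flags;
    unsigned char data[SIM_PAGE_SIZE];
};

/* Simplified mapping: a flag for the tree lock, an array for the tree. */
struct sim_mapping {
    int tree_locked;
    struct sim_page *slots[4];
};

/* Undo the locking done so far and report why the move was abandoned. */
static int sim_back_out(struct sim_mapping *m, struct sim_page *oldp,
                        struct sim_page *newp, int err)
{
    m->tree_locked = 0;
    oldp->flags &= ~SIM_FLAG_LOCKED;
    newp->flags = 0;
    return err;
}

/* Returns 0 on success, -1 if extra references remain,
 * -2 if the slot no longer points at the old page. */
int sim_migrate_one(struct sim_mapping *m, size_t idx,
                    struct sim_page *oldp, struct sim_page *newp)
{
    assert(idx < 4);
    oldp->flags |= SIM_FLAG_LOCKED;          /* step 1: lock the old page */
    newp->flags = SIM_FLAG_LOCKED;           /* step 3: locked, not uptodate */

    m->tree_locked = 1;                      /* step 6: take the tree lock */
    if (oldp->refcount != 2)                 /* step 7: others still hold it */
        return sim_back_out(m, oldp, newp, -1);
    if (m->slots[idx] != oldp)               /* step 8: tree was modified */
        return sim_back_out(m, oldp, newp, -2);
    m->slots[idx] = newp;                    /* step 9: point at new page */
    oldp->refcount--;                        /* step 10: move the tree ref */
    newp->refcount++;
    m->tree_locked = 0;                      /* step 11: drop the tree lock */

    memcpy(newp->data, oldp->data, SIM_PAGE_SIZE);  /* step 12: contents */
    newp->flags |= oldp->flags & ~SIM_FLAG_LOCKED;  /* step 13: copy flags */
    oldp->flags = SIM_FLAG_LOCKED;                  /* step 14: clear old */

    oldp->flags &= ~SIM_FLAG_LOCKED;         /* finally unlock both pages */
    newp->flags &= ~SIM_FLAG_LOCKED;
    return 0;
}

/* Demo: try to move one page while extra_refs other users hold it. */
int sim_demo(int extra_refs)
{
    struct sim_mapping m = { 0, { NULL, NULL, NULL, NULL } };
    struct sim_page oldp = { 2 + extra_refs,
                             SIM_FLAG_DIRTY | SIM_FLAG_UPTODATE, { 0 } };
    struct sim_page newp = { 1, 0, { 0 } };
    int rc;

    memcpy(oldp.data, "hello", 6);
    m.slots[1] = &oldp;
    rc = sim_migrate_one(&m, 1, &oldp, &newp);
    if (rc == 0) {
        assert(m.slots[1] == &newp);
        assert(memcmp(newp.data, "hello", 6) == 0);
        assert(newp.flags == (SIM_FLAG_DIRTY | SIM_FLAG_UPTODATE));
        assert(oldp.flags == 0 && oldp.refcount == 1 && newp.refcount == 2);
    } else {
        assert(m.slots[1] == &oldp && newp.flags == 0);
    }
    return rc;
}
```

With no extra references, sim_demo(0) succeeds and the slot ends up pointing
at the new page; with one extra reference, sim_demo(1) backs out at the
refcount check and leaves the mapping untouched. Checking the refcount and
the slot while the tree lock is held is what makes the replacement safe in
this model: nobody can look the page up through the mapping between the
check and the update.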