linux/Documentation/nommu-mmap.txt
                         =============================
                         NO-MMU MEMORY MAPPING SUPPORT
                         =============================

The kernel has limited support for memory mapping under no-MMU conditions, such
as are used in uClinux environments. From the userspace point of view, memory
mapping is made use of in conjunction with the mmap() system call, the shmat()
call and the execve() system call. From the kernel's point of view, execve()
mapping is actually performed by the binfmt drivers, which call back into the
mmap() routines to do the actual work.

Memory mapping behaviour also involves the way fork(), vfork(), clone() and
ptrace() work. Under uClinux there is no fork(), and clone() must be supplied
the CLONE_VM flag.

The behaviour is similar between the MMU and no-MMU cases, but not identical;
and it's also much more restricted in the latter case:

 (*) Anonymous mapping, MAP_PRIVATE

        In the MMU case: VM regions backed by arbitrary pages; copy-on-write
        across fork.

        In the no-MMU case: VM regions backed by arbitrary contiguous runs of
        pages.

 (*) Anonymous mapping, MAP_SHARED

        These behave very much like private mappings, except that they're
        shared across fork() or clone() without CLONE_VM in the MMU case.
        Since the no-MMU case doesn't support these, behaviour is identical to
        MAP_PRIVATE there.

 (*) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, !PROT_WRITE

        In the MMU case: VM regions backed by pages read from file; changes to
        the underlying file are reflected in the mapping; copied across fork.

        In the no-MMU case:

         - If one exists, the kernel will re-use an existing mapping to the
           same segment of the same file if that has compatible permissions,
           even if this was created by another process.

         - If possible, the file mapping will be directly on the backing device
           if the backing device has the BDI_CAP_MAP_DIRECT capability and
           appropriate mapping protection capabilities. Ramfs, romfs, cramfs
           and mtd might all permit this.

         - If the backing device can't or won't permit direct sharing,
           but does have the BDI_CAP_MAP_COPY capability, then a copy of the
           appropriate bit of the file will be read into a contiguous bit of
           memory and any extraneous space beyond the EOF will be cleared.

         - Writes to the file do not affect the mapping; writes to the mapping
           are visible in other processes (no MMU protection), but should not
           happen.

 (*) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, PROT_WRITE

        In the MMU case: like the non-PROT_WRITE case, except that the pages in
        question get copied before the write actually happens. From that point
        writes to the file underneath that page no longer get reflected into
        the mapping's backing pages.
        The page is then backed by swap instead.

        In the no-MMU case: works much like the non-PROT_WRITE case, except
        that a copy is always taken and never shared.

 (*) Regular file / blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

        In the MMU case: VM regions backed by pages read from file; changes to
        pages written back to file; writes to file reflected into pages backing
        mapping; shared across fork.

        In the no-MMU case: not supported.

 (*) Memory backed regular file, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

        In the MMU case: As for ordinary regular files.

        In the no-MMU case: The filesystem providing the memory-backed file
        (such as ramfs or tmpfs) may choose to honour an open, truncate, mmap
        sequence by providing a contiguous sequence of pages to map. In that
        case, a shared-writable memory mapping will be possible. It will work
        as for the MMU case. If the filesystem does not provide any such
        support, then the mapping request will be denied.

 (*) Memory backed blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

        In the MMU case: As for ordinary regular files.

        In the no-MMU case: As for memory backed regular files, but the
        blockdev must be able to provide a contiguous run of pages without
        truncate being called.
        The ramdisk driver could do this if it allocated all its memory as a
        contiguous array upfront.

 (*) Memory backed chardev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

        In the MMU case: As for ordinary regular files.

        In the no-MMU case: The character device driver may choose to honour
        the mmap() by providing direct access to the underlying device if it
        provides memory or quasi-memory that can be accessed directly. Examples
        of such are frame buffers and flash devices. If the driver does not
        provide any such support, then the mapping request will be denied.


============================
FURTHER NOTES ON NO-MMU MMAP
============================

 (*) A request for a private mapping of a file may return a buffer that is not
     page-aligned. This is because XIP may take place, and the data may not be
     page aligned in the backing store.

 (*) A request for an anonymous mapping will always be page aligned. If
     possible, the size of the request should be a power of two; otherwise some
     of the space may be wasted, as the kernel must allocate a power-of-2
     granule but will only discard the excess if appropriately configured, as
     this has an effect on fragmentation.

 (*) The memory allocated by a request for an anonymous mapping will normally
     be cleared by the kernel before being returned in accordance with the
     Linux man pages (ver 2.22 or later).

     In the MMU case this can be achieved with reasonable performance as
     regions are backed by virtual pages, with the contents only being mapped
     to cleared physical pages when a write happens on that specific page
     (prior to which, the pages are effectively mapped to the global zero page
     from which reads can take place).
     This spreads out the time it takes to initialize the contents of a
     page, depending on the write-usage of the mapping.

     In the no-MMU case, however, anonymous mappings are backed by physical
     pages, and the entire map is cleared at allocation time. This can cause
     significant delays during a userspace malloc() as the C library does an
     anonymous mapping and the kernel then does a memset for the entire map.

     However, for memory that isn't required to be precleared - such as that
     returned by malloc() - mmap() can take a MAP_UNINITIALIZED flag to
     indicate to the kernel that it shouldn't bother clearing the memory before
     returning it. Note that CONFIG_MMAP_ALLOW_UNINITIALIZED must be enabled
     to permit this, otherwise the flag will be ignored.

     uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses this
     to allocate the brk and stack region.

 (*) A list of all the private copy and anonymous mappings on the system is
     visible through /proc/maps in no-MMU mode.

 (*) A list of all the mappings in use by a process is visible through
     /proc/<pid>/maps in no-MMU mode.

 (*) Supplying MAP_FIXED or requesting a particular mapping address will
     result in an error.

 (*) Files mapped privately usually have to have a read method provided by the
     driver or filesystem so that the contents can be read into the memory
     allocated if mmap() chooses not to map the backing device directly. An
     error will result if they don't.
     This is most likely to be encountered with character device files, pipes,
     fifos and sockets.


==========================
INTERPROCESS SHARED MEMORY
==========================

Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU
mode. The former is supported through the usual mechanism, the latter through
files created on ramfs or tmpfs mounts.


=======
FUTEXES
=======

Futexes are supported in NOMMU mode if the arch supports them. An error will
be given if an address passed to the futex system call lies outside the
mappings made by a process or if the mapping in which the address lies does not
support futexes (such as an I/O chardev mapping).


=============
NO-MMU MREMAP
=============

The mremap() function is partially supported. It may change the size of a
mapping, and may move it[*] if MREMAP_MAYMOVE is specified and if the new size
of the mapping exceeds the size of the slab object currently occupied by the
memory to which the mapping refers, or if a smaller slab object could be used.

MREMAP_FIXED is not supported, though it is ignored if there's no change of
address and the object does not need to be moved.

Shared mappings may not be moved. Shareable mappings may not be moved either,
even if they are not currently shared.

The mremap() function must be given an exact match for base address and size of
a previously mapped object. It may not be used to create holes in existing
mappings, move parts of existing mappings or resize parts of mappings.
It must act on a complete mapping.

[*] Not currently supported.


============================================
PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT
============================================

To provide shareable character device support, a driver must provide a
file->f_op->get_unmapped_area() operation. The mmap() routines will call this
to get a proposed address for the mapping. This may return an error if it
doesn't wish to honour the mapping because it's too long, at a weird offset,
under some unsupported combination of flags or whatever.

The driver should also provide backing device information with capabilities set
to indicate the permitted types of mapping on such devices. The default is
assumed to be readable and writable, not executable, and only shareable
directly (can't be copied).

The file->f_op->mmap() operation will be called to actually inaugurate the
mapping. It can be rejected at that point. Returning the ENOSYS error will
cause the mapping to be copied instead if BDI_CAP_MAP_COPY is specified.

The vm_ops->close() routine will be invoked when the last mapping on a chardev
is removed. An existing mapping will be shared, partially or not, if possible
without notifying the driver.

It is permitted also for the file->f_op->get_unmapped_area() operation to
return -ENOSYS. This will be taken to mean that this operation just doesn't
want to handle it, despite the fact it's got an operation. For instance, it
might try directing the call to a secondary driver which turns out not to
implement it. Such is the case for the framebuffer driver which attempts to
direct the call to the device-specific driver.
Under such circumstances, the mapping request will be rejected if
BDI_CAP_MAP_COPY is not specified, and a copy mapped otherwise.

IMPORTANT NOTE:

        Some types of device may present a different appearance to anyone
        looking at them in certain modes. Flash chips can be like this; for
        instance if they're in programming or erase mode, you might see the
        status reflected in the mapping, instead of the data.

        In such a case, care must be taken lest userspace see a shared or a
        private mapping showing such information when the driver is busy
        controlling the device. Remember especially: private executable
        mappings may still be mapped directly off the device under some
        circumstances!


==============================================
PROVIDING SHAREABLE MEMORY-BACKED FILE SUPPORT
==============================================

Provision of shared mappings on memory backed files is similar to the provision
of support for shared mapped character devices. The main difference is that the
filesystem providing the service will probably allocate a contiguous collection
of pages and permit mappings to be made on that.

It is recommended that a truncate operation applied to such a file that
increases the file size, if that file is empty, be taken as a request to gather
enough pages to honour a mapping.
This is required to support POSIX shared memory.

Memory backed devices are indicated by the mapping's backing device info having
the memory_backed flag set.


========================================
PROVIDING SHAREABLE BLOCK DEVICE SUPPORT
========================================

Provision of shared mappings on block device files is exactly the same as for
character devices. If there isn't a real device underneath, then the driver
should allocate sufficient contiguous memory to honour any supported mapping.


=================================
ADJUSTING PAGE TRIMMING BEHAVIOUR
=================================

NOMMU mmap automatically rounds up to the nearest power-of-2 number of pages
when performing an allocation. This can have adverse effects on memory
fragmentation, and as such, is left configurable. The default behaviour is to
aggressively trim allocations and discard any excess pages back in to the page
allocator. In order to retain finer-grained control over fragmentation, this
behaviour can either be disabled completely, or bumped up to a higher page
watermark where trimming begins.

Page trimming behaviour is configurable via the sysctl `vm.nr_trim_pages'.
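
The power-of-2 rounding described above can be illustrated with a little
userspace arithmetic. The following is a minimal sketch, not kernel code: the
names SKETCH_PAGE_SIZE, pages_needed(), round_up_pow2() and excess_pages() are
hypothetical, and the 4096-byte page size is an assumption made purely for the
example. It only shows how many pages a request would leave over inside its
power-of-2 granule, which is the excess that trimming may hand back.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for illustration only; not taken from the kernel. */
#define SKETCH_PAGE_SIZE 4096UL

/* Number of whole pages needed to hold len bytes. */
static unsigned long pages_needed(size_t len)
{
	return (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Smallest power of two that is >= n (n must be non-zero). */
static unsigned long round_up_pow2(unsigned long n)
{
	unsigned long p = 1;

	assert(n != 0);
	while (p < n)
		p <<= 1;
	return p;
}

/* Pages in the power-of-2 granule beyond what the request needs. */
static unsigned long excess_pages(size_t len)
{
	unsigned long need = pages_needed(len);

	return round_up_pow2(need) - need;
}
```

Under these assumptions, a request for five pages falls into an eight-page
granule and leaves three excess pages, whereas a request for exactly eight
pages leaves none. This is why a power-of-two request size avoids waste when
trimming is disabled.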