                         =============================
                         NO-MMU MEMORY MAPPING SUPPORT
                         =============================

The kernel has limited support for memory mapping under no-MMU conditions, such
as are used in uClinux environments. From the userspace point of view, memory
mapping is used in conjunction with the mmap() system call, the shmat() call
and the execve() system call. From the kernel's point of view, execve() mapping
is actually performed by the binfmt drivers, which call back into the mmap()
routines to do the actual work.

Memory mapping behaviour also involves the way fork(), vfork(), clone() and
ptrace() work. Under uClinux there is no fork(), and clone() must be supplied
the CLONE_VM flag.

The behaviour is similar between the MMU and no-MMU cases, but not identical;
it is also much more restricted in the latter case:

 (*) Anonymous mapping, MAP_PRIVATE

        In the MMU case: VM regions backed by arbitrary pages; copy-on-write
        across fork.

        In the no-MMU case: VM regions backed by arbitrary contiguous runs of
        pages.

 (*) Anonymous mapping, MAP_SHARED

        These behave very much like private mappings, except that they're
        shared across fork() or clone() without CLONE_VM in the MMU case.
        Since the no-MMU case doesn't support these, behaviour is identical to
        MAP_PRIVATE there.

 (*) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, !PROT_WRITE

        In the MMU case: VM regions backed by pages read from file; changes to
        the underlying file are reflected in the mapping; copied across fork.

        In the no-MMU case:

         - If one exists, the kernel will re-use an existing mapping to the
           same segment of the same file if that has compatible permissions,
           even if this was created by another process.

         - If possible, the file mapping will be directly on the backing device
           if the backing device has the BDI_CAP_MAP_DIRECT capability and
           appropriate mapping protection capabilities. Ramfs, romfs, cramfs
           and mtd might all permit this.

         - If the backing device can't or won't permit direct sharing, but does
           have the BDI_CAP_MAP_COPY capability, then a copy of the appropriate
           bit of the file will be read into a contiguous bit of memory and any
           extraneous space beyond the EOF will be cleared.

         - Writes to the file do not affect the mapping; writes to the mapping
           are visible in other processes (no MMU protection), but should not
           happen.

 (*) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, PROT_WRITE

        In the MMU case: like the non-PROT_WRITE case, except that the pages in
        question get copied before the write actually happens. From that point
        on writes to the file underneath that page no longer get reflected into
        the mapping's backing pages.
        The page is then backed by swap instead.

        In the no-MMU case: works much like the non-PROT_WRITE case, except
        that a copy is always taken and never shared.

 (*) Regular file / blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

        In the MMU case: VM regions backed by pages read from file; changes to
        pages written back to file; writes to file reflected into pages backing
        mapping; shared across fork.

        In the no-MMU case: not supported.

 (*) Memory backed regular file, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

        In the MMU case: As for ordinary regular files.

        In the no-MMU case: The filesystem providing the memory-backed file
        (such as ramfs or tmpfs) may choose to honour an open, truncate, mmap
        sequence by providing a contiguous sequence of pages to map. In that
        case, a shared-writable memory mapping will be possible. It will work
        as for the MMU case. If the filesystem does not provide any such
        support, then the mapping request will be denied.

 (*) Memory backed blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

        In the MMU case: As for ordinary regular files.

        In the no-MMU case: As for memory backed regular files, but the
        blockdev must be able to provide a contiguous run of pages without
        truncate being called. The ramdisk driver could do this if it allocated
        all its memory as a contiguous array upfront.

 (*) Memory backed chardev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

        In the MMU case: As for ordinary regular files.

        In the no-MMU case: The character device driver may choose to honour
        the mmap() by providing direct access to the underlying device if it
        provides memory or quasi-memory that can be accessed directly.
        Examples of such are frame buffers and flash devices. If the driver
        does not provide any such support, then the mapping request will be
        denied.


============================
FURTHER NOTES ON NO-MMU MMAP
============================

 (*) A request for a private mapping of a file may return a buffer that is not
     page-aligned. This is because XIP may take place, and the data may not be
     page-aligned in the backing store.

 (*) A request for an anonymous mapping will always be page-aligned. If
     possible, the size of the request should be a power of two; otherwise some
     of the space may be wasted, as the kernel must allocate a power-of-2
     granule but will only discard the excess if appropriately configured, as
     this has an effect on fragmentation.

 (*) The memory allocated by a request for an anonymous mapping will normally
     be cleared by the kernel before being returned in accordance with the
     Linux man pages (ver 2.22 or later).

     In the MMU case this can be achieved with reasonable performance as
     regions are backed by virtual pages, with the contents only being mapped
     to cleared physical pages when a write happens on that specific page
     (prior to which, the pages are effectively mapped to the global zero page
     from which reads can take place). This spreads out the time it takes to
     initialize the contents of a page, depending on the write-usage of the
     mapping.

     In the no-MMU case, however, anonymous mappings are backed by physical
     pages, and the entire map is cleared at allocation time.
     This can cause significant delays during a userspace malloc() as the C
     library does an anonymous mapping and the kernel then does a memset for
     the entire map.

     However, for memory that isn't required to be precleared, such as that
     returned by malloc(), mmap() can take a MAP_UNINITIALIZED flag to
     indicate to the kernel that it shouldn't bother clearing the memory before
     returning it. Note that CONFIG_MMAP_ALLOW_UNINITIALIZED must be enabled
     to permit this, otherwise the flag will be ignored.

     uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses this
     to allocate the brk and stack region.

 (*) A list of all the private copy and anonymous mappings on the system is
     visible through /proc/maps in no-MMU mode.

 (*) A list of all the mappings in use by a process is visible through
     /proc/<pid>/maps in no-MMU mode.

 (*) Supplying MAP_FIXED or requesting a particular mapping address will
     result in an error.

 (*) Files mapped privately usually have to have a read method provided by the
     driver or filesystem so that the contents can be read into the memory
     allocated if mmap() chooses not to map the backing device directly. An
     error will result if they don't. This is most likely to be encountered
     with character device files, pipes, fifos and sockets.


==========================
INTERPROCESS SHARED MEMORY
==========================

Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU
mode. The former through the usual mechanism, the latter through files created
on ramfs or tmpfs mounts.


=======
FUTEXES
=======

Futexes are supported in NOMMU mode if the arch supports them.
An error will be given if an address passed to the futex system call lies
outside the mappings made by a process or if the mapping in which the address
lies does not support futexes (such as an I/O chardev mapping).


=============
NO-MMU MREMAP
=============

The mremap() function is partially supported. It may change the size of a
mapping, and may move it[*] if MREMAP_MAYMOVE is specified and if the new size
of the mapping exceeds the size of the slab object currently occupied by the
memory to which the mapping refers, or if a smaller slab object could be used.

MREMAP_FIXED is not supported, though it is ignored if there's no change of
address and the object does not need to be moved.

Shared mappings may not be moved. Shareable mappings may not be moved either,
even if they are not currently shared.

The mremap() function must be given an exact match for base address and size of
a previously mapped object. It may not be used to create holes in existing
mappings, move parts of existing mappings or resize parts of mappings. It must
act on a complete mapping.

[*] Not currently supported.


============================================
PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT
============================================

To provide shareable character device support, a driver must provide a
file->f_op->get_unmapped_area() operation. The mmap() routines will call this
to get a proposed address for the mapping.
This may return an error if it doesn't wish to honour the mapping because it's
too long, at a weird offset, under some unsupported combination of flags or
whatever.

The driver should also provide backing device information with capabilities set
to indicate the permitted types of mapping on such devices. The default is
assumed to be readable and writable, not executable, and only shareable
directly (can't be copied).

The file->f_op->mmap() operation will be called to actually inaugurate the
mapping. It can be rejected at that point. Returning the ENOSYS error will
cause the mapping to be copied instead if BDI_CAP_MAP_COPY is specified.

The vm_ops->close() routine will be invoked when the last mapping on a chardev
is removed. An existing mapping will be shared, partially or not, if possible
without notifying the driver.

It is permitted also for the file->f_op->get_unmapped_area() operation to
return -ENOSYS. This will be taken to mean that this operation just doesn't
want to handle it, despite the fact it's got an operation. For instance, it
might try directing the call to a secondary driver which turns out not to
implement it. Such is the case for the framebuffer driver which attempts to
direct the call to the device-specific driver. Under such circumstances, the
mapping request will be rejected if BDI_CAP_MAP_COPY is not specified, and a
copy mapped otherwise.

IMPORTANT NOTE:

        Some types of device may present a different appearance to anyone
        looking at them in certain modes.
        Flash chips can be like this; for instance if they're in programming or
        erase mode, you might see the status reflected in the mapping, instead
        of the data.

        In such a case, care must be taken lest userspace see a shared or a
        private mapping showing such information when the driver is busy
        controlling the device. Remember especially: private executable
        mappings may still be mapped directly off the device under some
        circumstances!


==============================================
PROVIDING SHAREABLE MEMORY-BACKED FILE SUPPORT
==============================================

Provision of shared mappings on memory backed files is similar to the provision
of support for shared mapped character devices. The main difference is that the
filesystem providing the service will probably allocate a contiguous collection
of pages and permit mappings to be made on that.

It is recommended that a truncate operation applied to such a file that
increases the file size, if that file is empty, be taken as a request to gather
enough pages to honour a mapping. This is required to support POSIX shared
memory.

Memory backed devices are indicated by the mapping's backing device info having
the memory_backed flag set.


========================================
PROVIDING SHAREABLE BLOCK DEVICE SUPPORT
========================================

Provision of shared mappings on block device files is exactly the same as for
character devices.
If there isn't a real device underneath, then the driver should allocate
sufficient contiguous memory to honour any supported mapping.


=================================
ADJUSTING PAGE TRIMMING BEHAVIOUR
=================================

NOMMU mmap automatically rounds up to the nearest power-of-2 number of pages
when performing an allocation. This can have adverse effects on memory
fragmentation, and as such, is left configurable. The default behaviour is to
aggressively trim allocations and discard any excess pages back into the page
allocator. In order to retain finer-grained control over fragmentation, this
behaviour can either be disabled completely, or bumped up to a higher page
watermark where trimming begins.

Page trimming behaviour is configurable via the sysctl `vm.nr_trim_pages'.