.. SPDX-License-Identifier: GPL-2.0

Idmappings
==========

Most filesystem developers will have encountered idmappings. They are used when
reading from or writing ownership to disk, reporting ownership to userspace, or
for permission checking. This document is aimed at filesystem developers that
want to know how idmappings work.

Formal notes
------------

An idmapping is essentially a translation of a range of ids into another or the
same range of ids. The notational convention for idmappings that is widely used
in userspace is::

 u:k:r

``u`` indicates the first element in the upper idmapset ``U`` and ``k``
indicates the first element in the lower idmapset ``K``. The ``r`` parameter
indicates the range of the idmapping, i.e. how many ids are mapped. From now
on, we will always prefix ids with ``u`` or ``k`` to make it clear whether
we're talking about an id in the upper or lower idmapset.

To see what this looks like in practice, let's take the following idmapping::

 u22:k10000:r3

and write down the mappings it will generate::

 u22 -> k10000
 u23 -> k10001
 u24 -> k10002

From a mathematical viewpoint ``U`` and ``K`` are well-ordered sets and an
idmapping is an order isomorphism from ``U`` into ``K``. So ``U`` and ``K`` are
order isomorphic. In fact, ``U`` and ``K`` are always well-ordered subsets of
the set of all possible ids usable on a given system.

Looking at this mathematically briefly will help us highlight some properties
that make it easier to understand how we can translate between idmappings. For
example, we know that the inverse idmapping is an order isomorphism as well::

 k10000 -> u22
 k10001 -> u23
 k10002 -> u24

Given that we are dealing with order isomorphisms plus the fact that we're
dealing with subsets we can embed idmappings into each other, i.e. we can
sensibly translate between different idmappings. For example, assume we've been
given the three idmappings::

 1. u0:k10000:r10000
 2. u0:k20000:r10000
 3. u0:k30000:r10000

and id ``k11000`` which has been generated by the first idmapping by mapping
``u1000`` from the upper idmapset down to ``k11000`` in the lower idmapset.

Because we're dealing with order isomorphic subsets it is meaningful to ask
what id ``k11000`` corresponds to in the second or third idmapping. The
straightforward algorithm to use is to apply the inverse of the first idmapping,
mapping ``k11000`` up to ``u1000``. Afterwards, we can map ``u1000`` down using
either the second or the third idmapping. The second idmapping would map
``u1000`` down to ``k21000``. The third idmapping would map ``u1000`` down to
``k31000``.

If we were given the same task for the following three idmappings::

 1. u0:k10000:r10000
 2. u0:k20000:r200
 3. u0:k30000:r300

we would fail to translate as the sets aren't order isomorphic over the full
range of the first idmapping anymore (however, they are order isomorphic over
the full range of the second idmapping). Neither the second nor the third
idmapping contains ``u1000`` in its upper idmapset ``U``. This is equivalent to
not having an id mapped. We can simply say that ``u1000`` is unmapped in the
second and third idmapping. The kernel's mapping helpers return the invalid id
``(uid_t)-1`` or ``(gid_t)-1`` for an unmapped id, and the kernel reports such
ids to userspace as the overflowuid or overflowgid.
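
The translation just described, including the unmapped case, can be sketched in
a few lines of Python. This is an illustrative user-space model, not kernel
code: the ``(u, k, r)`` tuples and the ``map_up()``, ``map_down()`` and
``translate()`` helpers are hypothetical names, and ``None`` stands in for an
unmapped id:

.. code-block:: python

   def map_up(idmap, kid):
       """Invert an idmapping: lower id -> upper id, None if unmapped."""
       u, k, r = idmap
       return kid - k + u if k <= kid < k + r else None


   def map_down(idmap, uid):
       """Apply an idmapping: upper id -> lower id, None if unmapped."""
       u, k, r = idmap
       return uid - u + k if u <= uid < u + r else None


   def translate(src, dst, kid):
       """Apply the inverse of src, then apply dst."""
       uid = map_up(src, kid)
       return None if uid is None else map_down(dst, uid)


   first = (0, 10000, 10000)

   print(translate(first, (0, 20000, 10000), 11000))  # 21000
   print(translate(first, (0, 30000, 10000), 11000))  # 31000
   print(translate(first, (0, 20000, 200), 11000))    # None, u1000 is unmapped
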

The algorithm to calculate what a given id maps to is pretty simple. First, we
need to verify that the range can contain our target id. We will skip this step
for simplicity. After that if we want to know what ``id`` maps to we can do
simple calculations:

- If we want to map from left to right::

   u:k:r
   id - u + k = n

- If we want to map from right to left::

   u:k:r
   id - k + u = n

Instead of "left to right" we can also say "down" and instead of "right to
left" we can also say "up". Obviously mapping down and up invert each other.

To see whether the simple formulas above work, consider the following two
idmappings::

 1. u0:k20000:r10000
 2. u500:k30000:r10000

Assume we are given ``k21000`` in the lower idmapset of the first idmapping. We
want to know what id this was mapped from in the upper idmapset of the first
idmapping. So we're mapping up in the first idmapping::

 id     - k      + u  = n
 k21000 - k20000 + u0 = u1000

Now assume we are given the id ``u1100`` in the upper idmapset of the second
idmapping and we want to know what this id maps down to in the lower idmapset
of the second idmapping. This means we're mapping down in the second
idmapping::

 id    - u    + k      = n
 u1100 - u500 + k30000 = k30600
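
The two formulas, together with the range check that was skipped above, can be
modelled as follows. This is a sketch for illustration only: the ``Idmap``
tuple and the ``map_down()`` and ``map_up()`` helpers are hypothetical names,
not kernel interfaces, and ``None`` stands in for an unmapped id:

.. code-block:: python

   from typing import NamedTuple, Optional


   class Idmap(NamedTuple):
       """Hypothetical model of a single u:k:r idmapping."""
       u: int  # first id in the upper idmapset
       k: int  # first id in the lower idmapset
       r: int  # number of ids that are mapped


   def map_down(m: Idmap, uid: int) -> Optional[int]:
       """Map left to right: id - u + k, or None if the id is out of range."""
       if not m.u <= uid < m.u + m.r:
           return None
       return uid - m.u + m.k


   def map_up(m: Idmap, kid: int) -> Optional[int]:
       """Map right to left: id - k + u, or None if the id is out of range."""
       if not m.k <= kid < m.k + m.r:
           return None
       return kid - m.k + m.u


   first = Idmap(u=0, k=20000, r=10000)
   second = Idmap(u=500, k=30000, r=10000)

   print(map_up(first, 21000))    # 1000, i.e. k21000 maps up to u1000
   print(map_down(second, 1100))  # 30600, i.e. u1100 maps down to k30600

Performing the range check first is what keeps an id such as ``u25`` from
being mapped by ``u22:k10000:r3``, which only covers ``u22`` to ``u24``.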

General notes
-------------

In the context of the kernel an idmapping can be interpreted as mapping a range
of userspace ids into a range of kernel ids::

 userspace-id:kernel-id:range

A userspace id is always an element in the upper idmapset of an idmapping of
type ``uid_t`` or ``gid_t`` and a kernel id is always an element in the lower
idmapset of an idmapping of type ``kuid_t`` or ``kgid_t``. From now on
"userspace id" will be used to refer to the well known ``uid_t`` and ``gid_t``
types and "kernel id" will be used to refer to ``kuid_t`` and ``kgid_t``.

The kernel is mostly concerned with kernel ids. They are used when performing
permission checks and are stored in an inode's ``i_uid`` and ``i_gid`` fields.
A userspace id on the other hand is an id that is reported to userspace by the
kernel, or is passed by userspace to the kernel, or a raw device id that is
written or read from disk.

Note that we are only concerned with idmappings as the kernel stores them, not
how userspace would specify them.

For the rest of this document we will prefix all userspace ids with ``u`` and
all kernel ids with ``k``. Ranges of idmappings will be prefixed with ``r``. So
an idmapping will be written as ``u0:k10000:r10000``.

For example, within this idmapping the id ``u1000`` is an id in the upper
idmapset or "userspace idmapset" starting with ``u0``. And it is mapped to
``k11000`` which is a kernel id in the lower idmapset or "kernel idmapset"
starting with ``k10000``.

A kernel id is always created by an idmapping. Such idmappings are associated
with user namespaces. Since we mainly care about how idmappings work we're not
going to be concerned with how idmappings are created nor how they are used
outside of the filesystem context. This is best left to an explanation of user
namespaces.

The initial user namespace is special. It always has an idmapping of the
following form::

 u0:k0:r4294967295

which is an identity idmapping over the full range of ids available on this
system.

Other user namespaces usually have non-identity idmappings such as::

 u0:k10000:r10000

When a process creates or wants to change ownership of a file, or when the
ownership of a file is read from disk by a filesystem, the userspace id is
immediately translated into a kernel id according to the idmapping associated
with the relevant user namespace.

For instance, consider a file that is stored on disk by a filesystem as being
owned by ``u1000``:

- If a filesystem were to be mounted in the initial user namespace (as most
  filesystems are) then the initial idmapping will be used. As we saw this is
  simply the identity idmapping. This would mean id ``u1000`` read from disk
  would be mapped to id ``k1000``. So an inode's ``i_uid`` and ``i_gid`` fields
  would contain ``k1000``.

- If a filesystem were to be mounted with an idmapping of ``u0:k10000:r10000``
  then ``u1000`` read from disk would be mapped to ``k11000``. So an inode's
  ``i_uid`` and ``i_gid`` would contain ``k11000``.

Translation algorithms
----------------------

We've already seen briefly that it is possible to translate between different
idmappings. We'll now take a closer look at how that works.

Crossmapping
~~~~~~~~~~~~

This translation algorithm is used by the kernel in quite a few places. For
example, it is used when reporting back the ownership of a file to userspace
via the ``stat()`` system call family.

If we've been given ``k11000`` from one idmapping we can map that id up in
another idmapping. In order for this to work both idmappings need to contain
the same kernel id in their kernel idmapsets. For example, consider the
following idmappings::

 1. u0:k10000:r10000
 2. u20000:k10000:r10000

and we are mapping ``u1000`` down to ``k11000`` in the first idmapping. We can
then translate ``k11000`` into a userspace id in the second idmapping using the
kernel idmapset of the second idmapping::

 /* Map the kernel id up into a userspace id in the second idmapping. */
 from_kuid(u20000:k10000:r10000, k11000) = u21000

Note how we can get back to the kernel id in the first idmapping by inverting
the algorithm::

 /* Map the userspace id down into a kernel id in the second idmapping. */
 make_kuid(u20000:k10000:r10000, u21000) = k11000

 /* Map the kernel id up into a userspace id in the first idmapping. */
 from_kuid(u0:k10000:r10000, k11000) = u1000

This algorithm allows us to answer the question what userspace id a given
kernel id corresponds to in a given idmapping. In order to be able to answer
this question both idmappings need to contain the same kernel id in their
respective kernel idmapsets.

For example, when the kernel reads a raw userspace id from disk it maps it down
into a kernel id according to the idmapping associated with the filesystem.
Let's assume the filesystem was mounted with an idmapping of
``u0:k20000:r10000`` and it reads a file owned by ``u1000`` from disk. This
means ``u1000`` will be mapped to ``k21000`` which is what will be stored in
the inode's ``i_uid`` and ``i_gid`` fields.

When someone in userspace calls ``stat()`` or a related function to get
ownership information about the file the kernel can't simply map the id back up
according to the filesystem's idmapping as this would give the wrong owner if
the caller is using an idmapping.

So the kernel will map the id back up in the idmapping of the caller. Let's
assume the caller has the slightly unconventional idmapping
``u3000:k20000:r10000`` then ``k21000`` would map back up to ``u4000``.
Consequently the user would see that this file is owned by ``u4000``.
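
The ``stat()`` walk-through above can be sketched as follows. The Python
functions below only borrow the names ``make_kuid()`` and ``from_kuid()``; they
are simplified models that take a hypothetical ``(u, k, r)`` tuple instead of a
user namespace, ``stat_owner()`` is an invented helper, and ``-1`` stands in
for the invalid id:

.. code-block:: python

   INVALID = -1  # stands in for the invalid id (uid_t)-1


   def make_kuid(idmap, uid):
       """Map a userspace id down into a kernel id."""
       u, k, r = idmap
       return uid - u + k if u <= uid < u + r else INVALID


   def from_kuid(idmap, kuid):
       """Map a kernel id up into a userspace id."""
       u, k, r = idmap
       return kuid - k + u if k <= kuid < k + r else INVALID


   def stat_owner(fs_idmap, caller_idmap, disk_uid):
       """Crossmap: down in the filesystem's idmapping, up in the caller's."""
       i_uid = make_kuid(fs_idmap, disk_uid)
       return from_kuid(caller_idmap, i_uid)


   fs = (0, 20000, 10000)
   caller = (3000, 20000, 10000)

   print(stat_owner(fs, caller, 1000))  # 4000

Crossmapping only succeeds when the kernel id is present in both kernel
idmapsets. With a caller idmapping of ``u0:k10000:r10000`` the same call
returns ``INVALID``.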

Remapping
~~~~~~~~~

It is possible to translate a kernel id from one idmapping to another one via
the userspace idmapset of the two idmappings. This is equivalent to remapping
a kernel id.

Let's look at an example. We are given the following two idmappings::

 1. u0:k10000:r10000
 2. u0:k20000:r10000

and we are given ``k11000`` in the first idmapping. In order to translate this
kernel id in the first idmapping into a kernel id in the second idmapping we
need to perform two steps:

1. Map the kernel id up into a userspace id in the first idmapping::

    /* Map the kernel id up into a userspace id in the first idmapping. */
    from_kuid(u0:k10000:r10000, k11000) = u1000

2. Map the userspace id down into a kernel id in the second idmapping::

    /* Map the userspace id down into a kernel id in the second idmapping. */
    make_kuid(u0:k20000:r10000, u1000) = k21000

As you can see we used the userspace idmapset in both idmappings to translate
the kernel id in one idmapping to a kernel id in another idmapping.

This allows us to answer the question what kernel id we would need to use to
get the same userspace id in another idmapping. In order to be able to answer
this question both idmappings need to contain the same userspace id in their
respective userspace idmapsets.

Note how we can easily get back to the kernel id in the first idmapping by
inverting the algorithm:

1. Map the kernel id up into a userspace id in the second idmapping::

    /* Map the kernel id up into a userspace id in the second idmapping. */
    from_kuid(u0:k20000:r10000, k21000) = u1000

2. Map the userspace id down into a kernel id in the first idmapping::

    /* Map the userspace id down into a kernel id in the first idmapping. */
    make_kuid(u0:k10000:r10000, u1000) = k11000

Another way to look at this translation is to treat it as inverting one
idmapping and applying another idmapping if both idmappings have the relevant
userspace id mapped. This will come in handy when working with idmapped mounts.
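
A minimal sketch of remapping and of its inverse is shown below. As before,
``make_kuid()`` and ``from_kuid()`` are simplified Python models operating on a
hypothetical ``(u, k, r)`` tuple, ``remap()`` is an invented helper, and
``None`` marks an unmapped id:

.. code-block:: python

   def from_kuid(idmap, kuid):
       """Map a kernel id up, or return None if it is unmapped."""
       u, k, r = idmap
       return kuid - k + u if k <= kuid < k + r else None


   def make_kuid(idmap, uid):
       """Map a userspace id down, or return None if it is unmapped."""
       u, k, r = idmap
       return uid - u + k if u <= uid < u + r else None


   def remap(src, dst, kuid):
       """Translate a kernel id from src to dst via the userspace idmapsets."""
       uid = from_kuid(src, kuid)
       return None if uid is None else make_kuid(dst, uid)


   first = (0, 10000, 10000)
   second = (0, 20000, 10000)

   print(remap(first, second, 11000))  # 21000
   print(remap(second, first, 21000))  # 11000

   # Remapping there and back returns every kernel id of the first idmapping
   # unchanged, because both idmappings map the same userspace ids.
   round_trip_ok = all(
       remap(second, first, remap(first, second, kuid)) == kuid
       for kuid in range(10000, 20000)
   )
   print(round_trip_ok)                # True
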

Invalid translations
~~~~~~~~~~~~~~~~~~~~

It is never valid to use an id in the kernel idmapset of one idmapping as the
id in the userspace idmapset of another or the same idmapping. While the kernel
idmapset always indicates an idmapset in the kernel id space the userspace
idmapset indicates a userspace id. So the following translations are forbidden::

 /* Map the userspace id down into a kernel id in the first idmapping. */
 make_kuid(u0:k10000:r10000, u1000) = k11000

 /* INVALID: Map the kernel id down into a kernel id in the second idmapping. */
 make_kuid(u10000:k20000:r10000, k11000) = k21000
                                 ~~~~~~

and equally wrong::

 /* Map the kernel id up into a userspace id in the first idmapping. */
 from_kuid(u0:k10000:r10000, k11000) = u1000

 /* INVALID: Map the userspace id up into a userspace id in the second idmapping. */
 from_kuid(u20000:k0:r10000, u1000) = u21000
                             ~~~~~
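
One way to make such mistakes impossible to overlook is to give userspace ids
and kernel ids different types. The sketch below is a Python model under that
assumption: the ``Uid`` and ``Kuid`` classes and the ``(u, k, r)`` tuples are
hypothetical, and the two functions only borrow the kernel helpers' names:

.. code-block:: python

   from dataclasses import dataclass


   @dataclass(frozen=True)
   class Uid:
       """A userspace id (models uid_t)."""
       val: int


   @dataclass(frozen=True)
   class Kuid:
       """A kernel id (models kuid_t)."""
       val: int


   def make_kuid(idmap, uid):
       """Map a userspace id down; refuses anything that is not a Uid."""
       if not isinstance(uid, Uid):
           raise TypeError("make_kuid() needs a userspace id")
       u, k, r = idmap
       if not u <= uid.val < u + r:
           raise LookupError("userspace id is unmapped")
       return Kuid(uid.val - u + k)


   def from_kuid(idmap, kuid):
       """Map a kernel id up; refuses anything that is not a Kuid."""
       if not isinstance(kuid, Kuid):
           raise TypeError("from_kuid() needs a kernel id")
       u, k, r = idmap
       if not k <= kuid.val < k + r:
           raise LookupError("kernel id is unmapped")
       return Uid(kuid.val - k + u)


   first = (0, 10000, 10000)
   second = (10000, 20000, 10000)

   kuid = make_kuid(first, Uid(1000))
   print(kuid)                  # Kuid(val=11000)

   try:
       make_kuid(second, kuid)  # the first invalid translation above
   except TypeError as err:
       print("rejected:", err)

The kernel gets a comparable guarantee at compile time because ``kuid_t`` and
``kgid_t`` are distinct struct types rather than plain integers.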

Idmappings when creating filesystem objects
-------------------------------------------

The concepts of mapping an id down or mapping an id up are expressed in the two
kernel functions filesystem developers are rather familiar with and which we've
already used in this document::

 /* Map the userspace id down into a kernel id. */
 make_kuid(idmapping, uid)

 /* Map the kernel id up into a userspace id. */
 from_kuid(idmapping, kuid)

We will take an abbreviated look into how idmappings figure into creating
filesystem objects. For simplicity we will only look at what happens when the
VFS has already completed path lookup right before it calls into the filesystem
itself. So we're concerned with what happens when e.g. ``vfs_mkdir()`` is
called. We will also assume that the directory we're creating filesystem
objects in is readable and writable for everyone.

When creating a filesystem object the kernel will look at the caller's
filesystem ids. These are just regular ``uid_t`` and ``gid_t`` userspace ids
but they are exclusively used when determining file ownership which is why they
are called "filesystem ids". They are usually identical to the uid and gid of
the caller but can differ. We will just assume they are always identical to not
get lost in too many details.

When the caller enters the kernel two things happen:

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping.
   (To be precise, the kernel will simply look at the kernel ids stashed in the
   credentials of the current task but for our education we'll pretend this
   translation happens just in time.)
2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping.

The second step is important as regular filesystems will ultimately need to map
the kernel id back up into a userspace id when writing to disk.
So with the second step the kernel guarantees that a valid userspace id can be
written to disk. If it can't the kernel will refuse the creation request to not
even remotely risk filesystem corruption.

The astute reader will have realized that this is simply a variation of the
crossmapping algorithm we mentioned above in a previous section. First, the
kernel maps the caller's userspace id down into a kernel id according to the
caller's idmapping and then maps that kernel id up according to the
filesystem's idmapping.

Example 1
~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295

Both the caller and the filesystem use the identity idmapping:

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping.

   For this second step the kernel will call the function
   ``fsuidgid_has_mapping()`` which ultimately boils down to calling
   ``from_kuid()``::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

In this example both idmappings are the same so there's nothing exciting going
on. Ultimately the userspace id that lands on disk will be ``u1000``.

Example 2
~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k20000:r10000, k11000) = u-1

It's immediately clear that while the caller's userspace id could be
successfully mapped down into kernel ids in the caller's idmapping the kernel
ids could not be mapped up according to the filesystem's idmapping. So the
kernel will deny this creation request.

Note that while this example is less common, because most filesystems can't be
mounted with non-initial idmappings, this is a general problem as we can see in
the next examples.

Example 3
~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k11000) = u11000

We can see that the translation always succeeds. The userspace id that the
filesystem will ultimately put to disk will always be identical to the value of
the kernel id that was created in the caller's idmapping. This has mainly two
consequences.

First, we can't allow a caller to ultimately write to disk with another
userspace id. We could only do this if we were to mount the whole filesystem
with the caller's or another idmapping. But that solution is limited to a few
filesystems and not very flexible. But this is a use-case that is pretty
important in containerized workloads.

Second, the caller will usually not be able to create any files or access
directories that have stricter permissions because none of the filesystem's
kernel ids map up into valid userspace ids in the caller's idmapping:

1. Map raw userspace ids down to kernel ids in the filesystem's idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Map kernel ids up to userspace ids in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k1000) = u-1

Example 4
~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295

In order to report ownership to userspace the kernel uses the crossmapping
algorithm introduced in a previous section:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k1000) = u-1

The crossmapping algorithm fails in this case because the kernel id in the
filesystem idmapping cannot be mapped up to a userspace id in the caller's
idmapping. Thus, the kernel will report the ownership of this file as the
overflowid.

Example 5
~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000

In order to report ownership to userspace the kernel uses the crossmapping
algorithm introduced in a previous section:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k21000) = u-1

Again, the crossmapping algorithm fails in this case because the kernel id in
the filesystem idmapping cannot be mapped to a userspace id in the caller's
idmapping. Thus, the kernel will report the ownership of this file as the
overflowid.

Note how in the last two examples things would be simple if the caller were
using the initial idmapping. For a filesystem mounted with the initial
idmapping it would be trivial. So we only consider a filesystem with an
idmapping of ``u0:k20000:r10000``:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k0:r4294967295, k21000) = u21000

Idmappings on idmapped mounts
-----------------------------

The examples we've seen in the previous section where the caller's idmapping
and the filesystem's idmapping are incompatible cause various issues for
workloads. For a more complex but common example, consider two containers
started on the host. To completely prevent the two containers from affecting
each other, an administrator may often use different non-overlapping idmappings
for the two containers::

 container1 idmapping:  u0:k10000:r10000
 container2 idmapping:  u0:k20000:r10000
 filesystem idmapping:  u0:k30000:r10000

An administrator wanting to provide easy read-write access to the following set
of files::

 dir id:       u0
 dir/file1 id: u1000
 dir/file2 id: u2000

to both containers currently can't.

Of course the administrator has the option to recursively change ownership via
``chown()``. For example, they could change ownership so that ``dir`` and all
files below it can be crossmapped from the filesystem's into the container's
idmapping. Let's assume they change ownership so it is compatible with the
first container's idmapping::

 dir id:       u10000
 dir/file1 id: u11000
 dir/file2 id: u12000

This would still leave ``dir`` rather useless to the second container. In fact,
``dir`` and all files below it would continue to appear owned by the overflowid
for the second container.

Or consider another increasingly popular example. Some service managers such as
systemd implement a concept called "portable home directories". A user may want
to use their home directories on different machines where they are assigned
different login userspace ids. Most users will have ``u1000`` as the login id
on their machine at home and all files in their home directory will usually be
owned by ``u1000``. At uni or at work they may have another login id such as
``u1125``. This makes it rather difficult to interact with their home directory
on their work machine.

In both cases changing ownership recursively has grave implications. The most
obvious one is that ownership is changed globally and permanently. In the home
directory case this change in ownership would even need to happen every time
the user switches from their home to their work machine. For really large sets
of files this becomes increasingly costly.

If the user is lucky, they are dealing with a filesystem that is mountable
inside user namespaces. But this would also change ownership globally and the
change in ownership is tied to the lifetime of the filesystem mount, i.e. the
superblock. The only way to change ownership is to completely unmount the
filesystem and mount it again in another user namespace. This is usually
impossible because it would mean that all users currently accessing the
filesystem can't anymore. And it means that ``dir`` still can't be shared
between two containers with different idmappings.
But usually the user doesn't even have this option since most filesystems
aren't mountable inside containers. And not having them mountable might be
desirable as it doesn't require the filesystem to deal with malicious
filesystem images.

But the use cases mentioned above and more can be handled by idmapped mounts.
They allow the same set of dentries to be exposed with different ownership at
different mounts. This is achieved by marking the mounts with a user namespace
through the ``mount_setattr()`` system call. The idmapping associated with it
is then used to translate from the caller's idmapping to the filesystem's
idmapping and vice versa using the remapping algorithm we introduced above.

Idmapped mounts make it possible to change ownership in a temporary and
localized way. The ownership changes are restricted to a specific mount and the
ownership changes are tied to the lifetime of the mount. All other users and
locations where the filesystem is exposed are unaffected.

Filesystems that support idmapped mounts don't have any real reason to support
being mountable inside user namespaces. A filesystem could be exposed
completely under an idmapped mount to get the same effect. This has the
advantage that filesystems can leave the creation of the superblock to
privileged users in the initial user namespace.

However, it is perfectly possible to combine idmapped mounts with filesystems
mountable inside user namespaces. We will touch on this further below.

Remapping helpers
~~~~~~~~~~~~~~~~~

Idmapping functions were added that translate between idmappings. They make use
of the remapping algorithm we've introduced earlier. We're going to look at
two:

- ``i_uid_into_mnt()`` and ``i_gid_into_mnt()``

  The ``i_*id_into_mnt()`` functions translate the filesystem's kernel ids into
  kernel ids in the mount's idmapping::

   /* Map the filesystem's kernel id up into a userspace id in the filesystem's idmapping. */
   from_kuid(filesystem, kid) = uid

   /* Map the filesystem's userspace id down into a kernel id in the mount's idmapping. */
   make_kuid(mount, uid) = kuid

- ``mapped_fsuid()`` and ``mapped_fsgid()``

  The ``mapped_fs*id()`` functions translate the caller's kernel ids into
  kernel ids in the filesystem's idmapping. This translation is achieved by
  remapping the caller's kernel ids using the mount's idmapping::

   /* Map the caller's kernel id up into a userspace id in the mount's idmapping. */
   from_kuid(mount, kid) = uid

   /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
   make_kuid(filesystem, uid) = kuid

Note that these two functions invert each other. Consider the following
idmappings::

 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000
 mount idmapping:      u0:k10000:r10000

Assume a file owned by ``u1000`` is read from disk. The filesystem maps this id
to ``k21000`` according to its idmapping. This is what is stored in the
inode's ``i_uid`` and ``i_gid`` fields.

When the caller queries the ownership of this file via ``stat()`` the kernel
would usually simply use the crossmapping algorithm and map the filesystem's
kernel id up to a userspace id in the caller's idmapping.

But when the caller is accessing the file on an idmapped mount the kernel will
first call ``i_uid_into_mnt()`` thereby translating the filesystem's kernel id
into a kernel id in the mount's idmapping::

 i_uid_into_mnt(k21000):
   /* Map the filesystem's kernel id up into a userspace id. */
   from_kuid(u0:k20000:r10000, k21000) = u1000

   /* Map the filesystem's userspace id down into a kernel id in the mount's idmapping. */
   make_kuid(u0:k10000:r10000, u1000) = k11000

Finally, when the kernel reports the owner to the caller it will turn the
kernel id in the mount's idmapping into a userspace id in the caller's
idmapping::

  from_kuid(u0:k10000:r10000, k11000) = u1000

We can test whether this algorithm really works by verifying what happens when
we create a new file. Let's say the user is creating a file with ``u1000``.

The kernel maps this to ``k11000`` in the caller's idmapping. Usually the
kernel would now apply the crossmapping, verifying that ``k11000`` can be
mapped to a userspace id in the filesystem's idmapping. Since ``k11000`` can't
be mapped up in the filesystem's idmapping directly this creation request
would fail.

But when the caller is accessing the file on an idmapped mount the kernel will
first call ``mapped_fs*id()`` thereby translating the caller's kernel id into
a kernel id according to the mount's idmapping::

 mapped_fsuid(k11000):
    /* Map the caller's kernel id up into a userspace id in the mount's idmapping. */
    from_kuid(u0:k10000:r10000, k11000) = u1000

    /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
    make_kuid(u0:k20000:r10000, u1000) = k21000

When finally writing to disk the kernel will then map ``k21000`` up into a
userspace id in the filesystem's idmapping::

   from_kuid(u0:k20000:r10000, k21000) = u1000

As we can see, we end up with an invertible and therefore information
preserving algorithm. A file created from ``u1000`` on an idmapped mount will
also be reported as being owned by ``u1000`` and vice versa.
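
The two helpers and the fact that they invert each other can be modelled as
follows. This is only a sketch: the Python functions reuse the kernel helpers'
names but take hypothetical ``(u, k, r)`` tuples for the mount's and the
filesystem's idmapping, which is not how the kernel functions are declared, and
``-1`` stands in for the invalid id:

.. code-block:: python

   INVALID = -1  # stands in for the invalid id (uid_t)-1


   def make_kuid(idmap, uid):
       """Map a userspace id down into a kernel id."""
       u, k, r = idmap
       return uid - u + k if u <= uid < u + r else INVALID


   def from_kuid(idmap, kuid):
       """Map a kernel id up into a userspace id."""
       u, k, r = idmap
       return kuid - k + u if k <= kuid < k + r else INVALID


   def i_uid_into_mnt(mnt, fs, kuid):
       """Remap a filesystem kernel id into the mount's idmapping."""
       return make_kuid(mnt, from_kuid(fs, kuid))


   def mapped_fsuid(mnt, fs, kuid):
       """Remap a caller kernel id into the filesystem's idmapping."""
       return make_kuid(fs, from_kuid(mnt, kuid))


   caller = (0, 10000, 10000)
   fs = (0, 20000, 10000)
   mnt = (0, 10000, 10000)

   i_uid = make_kuid(fs, 1000)                # 21000, stored in the inode
   mnt_kuid = i_uid_into_mnt(mnt, fs, i_uid)  # 11000
   print(from_kuid(caller, mnt_kuid))         # 1000, the owner stat() reports

   print(mapped_fsuid(mnt, fs, mnt_kuid) == i_uid)  # True, the helpers invert
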

Let's now briefly reconsider the failing examples from earlier in the context
of idmapped mounts.

Example 2 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000
 mount idmapping:      u0:k10000:r10000

When the caller is using a non-initial idmapping the common case is to attach
the same idmapping to the mount. We now perform three steps:

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Translate the caller's kernel id into a kernel id in the filesystem's
   idmapping::

    mapped_fsuid(k11000):
      /* Map the kernel id up into a userspace id in the mount's idmapping. */
      from_kuid(u0:k10000:r10000, k11000) = u1000

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k20000:r10000, u1000) = k21000

3. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k20000:r10000, k21000) = u1000

So the ownership that lands on disk will be ``u1000``.
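
To contrast this with the direct crossmapping, the following sketch runs the
creation path once without and once with the mount's idmapping. It is a
userspace model under stated assumptions, not the kernel implementation:
``struct idmap``, ``map_down()``, ``map_up()``, ``create_plain()`` and
``create_idmapped()`` are hypothetical names, and ``UINT32_MAX`` is an assumed
marker for "cannot be mapped":

.. code-block:: c

  #include <stdint.h>

  #define ID_INVALID UINT32_MAX  /* assumed "unmapped" marker */

  struct idmap { uint32_t u, k, r; };  /* one u:k:r extent */

  /* Userspace id -> kernel id, in the spirit of make_kuid() above. */
  uint32_t map_down(const struct idmap *m, uint32_t uid)
  {
      if (uid == ID_INVALID || uid < m->u || uid - m->u >= m->r)
          return ID_INVALID;
      return m->k + (uid - m->u);
  }

  /* Kernel id -> userspace id, in the spirit of from_kuid() above. */
  uint32_t map_up(const struct idmap *m, uint32_t kid)
  {
      if (kid == ID_INVALID || kid < m->k || kid - m->k >= m->r)
          return ID_INVALID;
      return m->u + (kid - m->k);
  }

  /* No idmapped mount: crossmap from the caller straight to the filesystem. */
  uint32_t create_plain(const struct idmap *caller, const struct idmap *fs,
                        uint32_t uid)
  {
      return map_up(fs, map_down(caller, uid));
  }

  /* Idmapped mount: the three steps listed above. */
  uint32_t create_idmapped(const struct idmap *caller, const struct idmap *mnt,
                           const struct idmap *fs, uint32_t uid)
  {
      uint32_t kid = map_down(caller, uid);  /* step 1 */

      kid = map_down(fs, map_up(mnt, kid));  /* step 2 */
      return map_up(fs, kid);                /* step 3 */
  }

For the idmappings of this example, ``create_plain()`` yields ``ID_INVALID``
because ``k11000`` lies below ``k20000``, while ``create_idmapped()`` yields
``1000``.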

Example 3 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u0:k10000:r10000

The same translation algorithm works with the third example.

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Translate the caller's kernel id into a kernel id in the filesystem's
   idmapping::

    mapped_fsuid(k11000):
      /* Map the kernel id up into a userspace id in the mount's idmapping. */
      from_kuid(u0:k10000:r10000, k11000) = u1000

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k0:r4294967295, u1000) = k1000

3. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

So the ownership that lands on disk will be ``u1000``.

Example 4 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u0:k10000:r10000

In order to report ownership to userspace the kernel now performs three steps
using the translation algorithm we introduced earlier:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Translate the kernel id into a kernel id in the mount's idmapping::

    i_uid_into_mnt(k1000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k0:r4294967295, k1000) = u1000

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(u0:k10000:r10000, u1000) = k11000

3. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k11000) = u1000

Earlier, the file's kernel id couldn't be crossmapped in the caller's
idmapping. With the idmapped mount in place it can now be crossmapped into the
caller's idmapping via the mount's idmapping. The file is now reported as
owned by ``u1000`` according to the mount's idmapping.
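
The reporting path of this example can be modelled the same way. The sketch
below is a userspace illustration only: ``struct idmap``, ``map_down()``,
``map_up()``, ``report_plain()`` and ``report_idmapped()`` are hypothetical
names, each idmapping is a single ``u:k:r`` extent, and ``UINT32_MAX`` is an
assumed marker for an id that has no mapping:

.. code-block:: c

  #include <stdint.h>

  #define ID_INVALID UINT32_MAX  /* assumed "unmapped" marker */

  struct idmap { uint32_t u, k, r; };  /* one u:k:r extent */

  /* Userspace id -> kernel id, in the spirit of make_kuid() above. */
  uint32_t map_down(const struct idmap *m, uint32_t uid)
  {
      if (uid == ID_INVALID || uid < m->u || uid - m->u >= m->r)
          return ID_INVALID;
      return m->k + (uid - m->u);
  }

  /* Kernel id -> userspace id, in the spirit of from_kuid() above. */
  uint32_t map_up(const struct idmap *m, uint32_t kid)
  {
      if (kid == ID_INVALID || kid < m->k || kid - m->k >= m->r)
          return ID_INVALID;
      return m->u + (kid - m->k);
  }

  /* No idmapped mount: map the on-disk id down, then up for the caller. */
  uint32_t report_plain(const struct idmap *caller, const struct idmap *fs,
                        uint32_t disk_uid)
  {
      return map_up(caller, map_down(fs, disk_uid));
  }

  /* Idmapped mount: the three steps listed above. */
  uint32_t report_idmapped(const struct idmap *caller, const struct idmap *mnt,
                           const struct idmap *fs, uint32_t disk_uid)
  {
      uint32_t kid = map_down(fs, disk_uid);  /* step 1 */

      kid = map_down(mnt, map_up(fs, kid));   /* step 2 */
      return map_up(caller, kid);             /* step 3 */
  }

In this model ``report_plain()`` returns ``ID_INVALID`` for the on-disk id
``1000`` because ``k1000`` lies outside the caller's ``k10000`` to ``k19999``
range, whereas ``report_idmapped()`` returns ``1000``. An on-disk id outside
the mount's range, for instance ``10000``, stays unmapped even with the
idmapped mount.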

Example 5 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000
 mount idmapping:      u0:k10000:r10000

Again, in order to report ownership to userspace the kernel now performs three
steps using the translation algorithm we introduced earlier:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Translate the kernel id into a kernel id in the mount's idmapping::

    i_uid_into_mnt(k21000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k20000:r10000, k21000) = u1000

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(u0:k10000:r10000, u1000) = k11000

3. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k11000) = u1000

Earlier, the file's kernel id couldn't be crossmapped in the caller's
idmapping. With the idmapped mount in place it can now be crossmapped into the
caller's idmapping via the mount's idmapping. The file is now reported as
owned by ``u1000`` according to the mount's idmapping.
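
Since Examples 2 to 5 only differ in their idmappings and in the direction of
the translation, they can be replayed from a single table. The following is a
sketch under stated assumptions rather than kernel code: ``struct idmap``,
``struct example``, ``map_down()``, ``map_up()`` and ``run_example()`` are
hypothetical names, each idmapping is a single ``u:k:r`` extent, and
``UINT32_MAX`` is an assumed marker for "cannot be mapped":

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  #define ID_INVALID UINT32_MAX  /* assumed "unmapped" marker */

  struct idmap { uint32_t u, k, r; };  /* one u:k:r extent */

  enum direction { CREATE, REPORT };

  struct example {
      enum direction dir;
      struct idmap caller, fs, mnt;
      uint32_t in;  /* caller id (CREATE) or on-disk id (REPORT) */
  };

  /* Examples 2 to 5 as listed above, in that order. */
  const struct example examples[] = {
      { CREATE, {0, 10000, 10000}, {0, 20000, 10000}, {0, 10000, 10000}, 1000 },
      { CREATE, {0, 10000, 10000}, {0, 0, 4294967295u}, {0, 10000, 10000}, 1000 },
      { REPORT, {0, 10000, 10000}, {0, 0, 4294967295u}, {0, 10000, 10000}, 1000 },
      { REPORT, {0, 10000, 10000}, {0, 20000, 10000}, {0, 10000, 10000}, 1000 },
  };

  /* Userspace id -> kernel id, in the spirit of make_kuid() above. */
  uint32_t map_down(const struct idmap *m, uint32_t uid)
  {
      if (uid == ID_INVALID || uid < m->u || uid - m->u >= m->r)
          return ID_INVALID;
      return m->k + (uid - m->u);
  }

  /* Kernel id -> userspace id, in the spirit of from_kuid() above. */
  uint32_t map_up(const struct idmap *m, uint32_t kid)
  {
      if (kid == ID_INVALID || kid < m->k || kid - m->k >= m->r)
          return ID_INVALID;
      return m->u + (kid - m->k);
  }

  /* Run the three steps of one example and return the resulting id. */
  uint32_t run_example(const struct example *e)
  {
      uint32_t kid;

      if (e->dir == CREATE) {
          kid = map_down(&e->caller, e->in);
          kid = map_down(&e->fs, map_up(&e->mnt, kid));
          return map_up(&e->fs, kid);
      }
      kid = map_down(&e->fs, e->in);
      kid = map_down(&e->mnt, map_up(&e->fs, kid));
      return map_up(&e->caller, kid);
  }

Each of the four table entries evaluates to ``1000`` in this model, i.e. the
``u1000`` that the corresponding example arrives at.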

Changing ownership on a home directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We've seen above how idmapped mounts can be used to translate between
idmappings when either the caller, the filesystem or both use a non-initial
idmapping. A wide range of use cases exists when the caller is using
a non-initial idmapping. This mostly happens in the context of containerized
workloads. The consequence is, as we have seen, that for both filesystems
mounted with the initial idmapping and filesystems mounted with non-initial
idmappings, access to the filesystem isn't working because the kernel ids can't
be crossmapped between the caller's and the filesystem's idmapping.

As we've seen above, idmapped mounts provide a solution to this by remapping
the caller's or filesystem's idmapping according to the mount's idmapping.

Aside from containerized workloads, idmapped mounts have the advantage that
they also work when both the caller and the filesystem use the initial
idmapping which means users on the host can change the ownership of directories
and files on a per-mount basis.

Consider our previous example where a user has their home directory on portable
storage. At home they have id ``u1000`` and all files in their home directory
are owned by ``u1000`` whereas at uni or work they have login id ``u1125``.

Taking their home directory with them becomes problematic. They can't easily
access their files, they might not be able to write to disk without applying
lax permissions or ACLs and even if they can, they will end up with an annoying
mix of files and directories owned by ``u1000`` and ``u1125``.

Idmapped mounts make it possible to solve this problem. A user can create an
idmapped mount for their home directory on their work computer or their
computer at home depending on what ownership they would prefer to end up on the
portable storage itself.

Let's assume they want all files on disk to belong to ``u1000``. When the user
plugs in their portable storage at their workstation they can set up a job that
creates an idmapped mount with the minimal idmapping ``u1000:k1125:r1``. So now
when they create a file the kernel performs the following steps we already know
from above::

 caller id:            u1125
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u1000:k1125:r1

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k0:r4294967295, u1125) = k1125

2. Translate the caller's kernel id into a kernel id in the filesystem's
   idmapping::

    mapped_fsuid(k1125):
      /* Map the kernel id up into a userspace id in the mount's idmapping. */
      from_kuid(u1000:k1125:r1, k1125) = u1000

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k0:r4294967295, u1000) = k1000

3. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

So ultimately the file will be created with ``u1000`` on disk.

Now let's briefly look at what ownership the caller with id ``u1125`` will see
on their work computer:

::

 file id:              u1000
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u1000:k1125:r1

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Translate the kernel id into a kernel id in the mount's idmapping::

    i_uid_into_mnt(k1000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k0:r4294967295, k1000) = u1000

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(u1000:k1125:r1, u1000) = k1125

3. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k0:r4294967295, k1125) = u1125

So ultimately the kernel will report to the caller that the file belongs to
``u1125`` which is the caller's userspace id on their workstation in our
example.

The raw userspace id that is put on disk is ``u1000`` so when the user takes
their home directory back to their home computer where they are assigned
``u1000`` using the initial idmapping and mounts the filesystem with the
initial idmapping they will see all those files owned by ``u1000``.
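
The home directory scenario can also be replayed in a userspace model that
reads idmappings in the ``u:k:r`` notation used throughout this document. As
before this is a sketch, not kernel code: ``struct idmap``, ``parse_idmap()``,
``map_down()``, ``map_up()``, ``owner_on_disk()`` and ``owner_reported()`` are
hypothetical names, each idmapping is a single extent, and ``UINT32_MAX`` is
an assumed marker for an id without a mapping:

.. code-block:: c

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  #define ID_INVALID UINT32_MAX  /* assumed "unmapped" marker */

  struct idmap { uint32_t u, k, r; };  /* one u:k:r extent */

  /* Parse e.g. "u1000:k1125:r1"; returns 0 on success, -1 otherwise. */
  int parse_idmap(const char *s, struct idmap *m)
  {
      int n = sscanf(s, "u%" SCNu32 ":k%" SCNu32 ":r%" SCNu32,
                     &m->u, &m->k, &m->r);

      return n == 3 ? 0 : -1;
  }

  /* Userspace id -> kernel id, in the spirit of make_kuid() above. */
  uint32_t map_down(const struct idmap *m, uint32_t uid)
  {
      if (uid == ID_INVALID || uid < m->u || uid - m->u >= m->r)
          return ID_INVALID;
      return m->k + (uid - m->u);
  }

  /* Kernel id -> userspace id, in the spirit of from_kuid() above. */
  uint32_t map_up(const struct idmap *m, uint32_t kid)
  {
      if (kid == ID_INVALID || kid < m->k || kid - m->k >= m->r)
          return ID_INVALID;
      return m->u + (kid - m->k);
  }

  /* Creation path: caller's userspace id -> userspace id put on disk. */
  uint32_t owner_on_disk(const struct idmap *caller, const struct idmap *mnt,
                         const struct idmap *fs, uint32_t uid)
  {
      uint32_t kid = map_down(caller, uid);

      kid = map_down(fs, map_up(mnt, kid));
      return map_up(fs, kid);
  }

  /* Reporting path: userspace id on disk -> userspace id the caller sees. */
  uint32_t owner_reported(const struct idmap *caller, const struct idmap *mnt,
                          const struct idmap *fs, uint32_t disk_uid)
  {
      uint32_t kid = map_down(fs, disk_uid);

      kid = map_down(mnt, map_up(fs, kid));
      return map_up(caller, kid);
  }

With ``u0:k0:r4294967295`` for the caller and the filesystem and
``u1000:k1125:r1`` for the mount, ``owner_on_disk()`` turns ``1125`` into
``1000`` and ``owner_reported()`` turns ``1000`` back into ``1125``. Because
the mount's idmapping covers a single id, any other on-disk id comes back as
``ID_INVALID`` in this model.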