.. SPDX-License-Identifier: GPL-2.0

===============
Shared Subtrees
===============

.. Contents:
        1) Overview
        2) Features
        3) Setting mount states
        4) Use cases
        5) Detailed semantics
        6) Quiz
        7) FAQ
        8) Implementation


1) Overview
-----------

Consider the following situation:

A process wants to clone its own namespace, but still wants to access the CD
that got mounted recently.  Shared subtree semantics provide the necessary
mechanism to accomplish the above.

They also provide the building blocks for features like per-user namespaces
and versioned filesystems.

2) Features
-----------

Shared subtree provides four different flavors of mounts; struct vfsmount to be
precise:

        a. shared mount
        b. slave mount
        c. private mount
        d. unbindable mount


2a) A shared mount can be replicated to as many mountpoints as needed, and all
the replicas continue to be exactly the same.

        Here is an example:

        Let's say /mnt has a mount that is shared::

            mount --make-shared /mnt

        Note: mount(8) command now supports the --make-shared flag,
        so the sample 'smount' program is no longer needed and has been
        removed.

        ::

            # mount --bind /mnt /tmp

        The above command replicates the mount at /mnt to the mountpoint /tmp
        and the contents of both the mounts remain identical.

        ::

            # ls /mnt
            a b c

            # ls /tmp
            a b c

        Now let's say we mount a device at /tmp/a::

            # mount /dev/sd0  /tmp/a

            # ls /tmp/a
            t1 t2 t3

            # ls /mnt/a
            t1 t2 t3

        Note that the mount has propagated to the mount at /mnt as well.

        And the same is true even when /dev/sd0 is mounted on /mnt/a. The
        contents will be visible under /tmp/a too.
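
The two-way propagation between /mnt and /tmp can be sketched with a small toy
model. This is not kernel code: the ``Mount`` class and the ``bind_shared`` and
``mount_device`` helpers are hypothetical names, and the model only tracks
which device is visible at which dentry.

```python
# Toy model (not kernel code): mounts in one peer group mirror each
# other's sub-mounts, as in the /mnt and /tmp example above.
class Mount:
    def __init__(self, path):
        self.path = path
        self.peers = {self}      # peer group, shared by reference
        self.children = {}       # dentry name -> device mounted there


def bind_shared(src, dest_path):
    """Replicate a shared mount: the replica joins src's peer group."""
    replica = Mount(dest_path)
    replica.children = dict(src.children)
    src.peers.add(replica)
    replica.peers = src.peers
    return replica


def mount_device(target, dentry, device):
    """Mount a device on target and propagate the event to every peer."""
    for m in target.peers:
        m.children[dentry] = device


mnt = Mount("/mnt")                  # mount --make-shared /mnt
tmp = bind_shared(mnt, "/tmp")       # mount --bind /mnt /tmp
mount_device(tmp, "a", "/dev/sd0")   # mount /dev/sd0 /tmp/a
print(mnt.children)                  # {'a': '/dev/sd0'}
```

Mounting on either member of the peer group gives the same result, which is
the symmetry the example describes.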


2b) A slave mount is like a shared mount except that mount and umount events
only propagate towards it.

        All slave mounts have a master mount which is shared.

        Here is an example:

        Let's say /mnt has a mount which is shared::

            # mount --make-shared /mnt

        Let's bind mount /mnt to /tmp::

            # mount --bind /mnt /tmp

        The new mount at /tmp becomes a shared mount and it is a replica of
        the mount at /mnt.

        Now let's make the mount at /tmp a slave of /mnt::

            # mount --make-slave /tmp

        Let's mount /dev/sd0 on /mnt/a::

            # mount /dev/sd0 /mnt/a

            # ls /mnt/a
            t1 t2 t3

            # ls /tmp/a
            t1 t2 t3

        Note that the mount event has propagated to the mount at /tmp.

        However, let's see what happens if we mount something on the mount
        at /tmp::

            # mount /dev/sd1 /tmp/b

            # ls /tmp/b
            s1 s2 s3

            # ls /mnt/b

        Note how the mount event has not propagated to the mount at
        /mnt.
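
The one-way behaviour of a slave mount can be sketched in the same spirit.
Again this is only a toy model under simplifying assumptions: the ``Mount``
class and ``mount_device`` helper are hypothetical, and a master simply keeps a
list of its slaves.

```python
# Toy model (not kernel code) of one-way propagation from master to slave.
class Mount:
    def __init__(self, path):
        self.path = path
        self.slaves = []         # mounts that receive events from this one
        self.children = {}       # dentry name -> device mounted there


def mount_device(target, dentry, device):
    """Mount on target, then forward the event to its slaves only."""
    target.children[dentry] = device
    for s in target.slaves:
        mount_device(s, dentry, device)


mnt = Mount("/mnt")
tmp = Mount("/tmp")
mnt.slaves.append(tmp)                 # mount --make-slave /tmp

mount_device(mnt, "a", "/dev/sd0")     # visible at /mnt/a and /tmp/a
mount_device(tmp, "b", "/dev/sd1")     # visible at /tmp/b only
print(sorted(mnt.children), sorted(tmp.children))   # ['a'] ['a', 'b']
```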


2c) A private mount does not forward or receive propagation.

        This is the mount we are familiar with. It's the default type.


2d) An unbindable mount is an unbindable private mount.

        Let's say we have a mount at /mnt and we make it unbindable::

            # mount --make-unbindable /mnt

        Let's try to bind mount this mount somewhere else::

            # mount --bind /mnt /tmp
            mount: wrong fs type, bad option, bad superblock on /mnt,
                    or too many mounted file systems

        Binding an unbindable mount is an invalid operation.


3) Setting mount states
-----------------------

        The mount command (util-linux package) can be used to set mount
        states::

            mount --make-shared mountpoint
            mount --make-slave mountpoint
            mount --make-private mountpoint
            mount --make-unbindable mountpoint
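
Under the hood these options correspond to propagation flags of the mount(2)
system call. The sketch below maps an option name to its flag value; the flag
constants are the values from the Linux UAPI headers, while the
``propagation_flags`` helper is a hypothetical name used only for
illustration. It also handles the recursive ``--make-r*`` variants that appear
later in this document by adding ``MS_REC``.

```python
# Flag values as defined in the Linux UAPI headers.
MS_REC = 1 << 14
MS_UNBINDABLE = 1 << 17
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20

_TYPES = {
    "shared": MS_SHARED,
    "slave": MS_SLAVE,
    "private": MS_PRIVATE,
    "unbindable": MS_UNBINDABLE,
}


def propagation_flags(option):
    """Translate e.g. '--make-rshared' into mount(2) flags (illustrative helper)."""
    prefix = "--make-"
    if not option.startswith(prefix):
        raise ValueError("not a propagation option: " + option)
    name = option[len(prefix):]
    if name in _TYPES:
        return _TYPES[name]
    # A leading 'r' selects the recursive variant, e.g. --make-rslave.
    if name.startswith("r") and name[1:] in _TYPES:
        return _TYPES[name[1:]] | MS_REC
    raise ValueError("unknown propagation type: " + option)


print(hex(propagation_flags("--make-shared")))    # 0x100000
print(hex(propagation_flags("--make-rslave")))    # 0x84000
```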


4) Use cases
------------

        A) A process wants to clone its own namespace, but still wants to
           access the CD that got mounted recently.

           Solution:

                The system administrator can make the mount at /cdrom shared::

                    mount --bind /cdrom /cdrom
                    mount --make-shared /cdrom

                Now any process that clones off a new namespace will have a
                mount at /cdrom which is a replica of the same mount in the
                parent namespace.

                So when a CD is inserted and mounted at /cdrom, that mount gets
                propagated to the other mount at /cdrom in all the other clone
                namespaces.

        B) A process wants its mounts invisible to any other process, but
           still be able to see the other system mounts.

           Solution:

                To begin with, the administrator can mark the entire mount tree
                as shareable::

                    mount --make-rshared /

                A new process can clone off a new namespace and mark some part
                of its namespace as slave::

                    mount --make-rslave /myprivatetree

                Henceforth any mounts within /myprivatetree done by the
                process will not show up in any other namespace. However, mounts
                done in the parent namespace under /myprivatetree still show
                up in the process's namespace.


        Apart from the above semantics this feature provides the
        building blocks to solve the following problems:

        C)  Per-user namespace

                The above semantics allow a way to share mounts across
                namespaces.  But namespaces are associated with processes. If
                namespaces are made first class objects with user API to
                associate/disassociate a namespace with userid, then each user
                could have his/her own namespace and tailor it to his/her
                requirements. This needs to be supported in PAM.

        D)  Versioned files

                If the entire mount tree is visible at multiple locations, then
                an underlying versioning file system can return different
                versions of the file depending on the path used to access that
                file.

                An example is::

                    mount --make-shared /
                    mount --rbind / /view/v1
                    mount --rbind / /view/v2
                    mount --rbind / /view/v3
                    mount --rbind / /view/v4

                and if /usr has a versioning filesystem mounted, then that
                mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
                /view/v4/usr too.

                A user can request v3 version of the file /usr/fs/namespace.c
                by accessing /view/v3/usr/fs/namespace.c . The underlying
                versioning filesystem can then decipher that v3 version of the
                filesystem is being requested and return the corresponding
                inode.

5) Detailed semantics
---------------------
        The section below explains the detailed semantics of
        bind, rbind, move, mount, umount and clone-namespace operations.

        Note: the word 'vfsmount' and the noun 'mount' have been used
        to mean the same thing, throughout this document.

5a) Mount states

        A given mount can be in one of the following states:

        1) shared
        2) slave
        3) shared and slave
        4) private
        5) unbindable

        A 'propagation event' is defined as an event generated on a vfsmount
        that leads to mount or unmount actions in other vfsmounts.

        A 'peer group' is defined as a group of vfsmounts that propagate
        events to each other.

        (1) Shared mounts

                A 'shared mount' is defined as a vfsmount that belongs to a
                'peer group'.

                For example::

                        mount --make-shared /mnt
                        mount --bind /mnt /tmp

                The mount at /mnt and that at /tmp are both shared and belong
                to the same peer group. Anything mounted or unmounted under
                /mnt or /tmp is reflected in all the other mounts of its peer
                group.


        (2) Slave mounts

                A 'slave mount' is defined as a vfsmount that receives
                propagation events and does not forward propagation events.

                A slave mount, as the name implies, has a master mount from
                which mount/unmount events are received. Events do not
                propagate from the slave mount to the master.  Only a shared
                mount can be made a slave by executing the following command::

                        mount --make-slave mount

                A shared mount that is made a slave is no longer shared unless
                modified to become shared.

        (3) Shared and Slave

                A vfsmount can be both shared as well as slave.  This state
                indicates that the mount is a slave of some vfsmount, and
                has its own peer group too.  This vfsmount receives propagation
                events from its master vfsmount, and also forwards propagation
                events to its 'peer group' and to its slave vfsmounts.

                Strictly speaking, the vfsmount is shared having its own
                peer group, and this peer-group is a slave of some other
                peer group.

                Only a slave vfsmount can be made 'shared and slave', either
                by executing the following command::

                        mount --make-shared mount

                or by moving the slave vfsmount under a shared vfsmount.

        (4) Private mount

                A 'private mount' is defined as a vfsmount that does not
                receive or forward any propagation events.

        (5) Unbindable mount

                An 'unbindable mount' is defined as a vfsmount that does not
                receive or forward any propagation events and cannot
                be bind mounted.
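
One way to observe these states on a running system is the optional-fields
column of /proc/self/mountinfo, which carries tags such as ``shared:N``,
``master:N`` and ``unbindable``. The sketch below classifies a single
mountinfo line; the ``propagation_state`` helper and the sample lines are
made up for illustration, and the parsing assumes the usual layout of six
fixed fields followed by optional fields up to a ``-`` separator.

```python
def propagation_state(mountinfo_line):
    """Classify one /proc/<pid>/mountinfo line into the five states above."""
    fields = mountinfo_line.split()
    # Optional fields sit between the sixth field and the "-" separator.
    optional = fields[6:fields.index("-")]
    tags = {f.split(":")[0] for f in optional}
    if "unbindable" in tags:
        return "unbindable"
    shared = "shared" in tags
    slave = "master" in tags
    if shared and slave:
        return "shared and slave"
    if shared:
        return "shared"
    if slave:
        return "slave"
    return "private"


# Hypothetical sample lines, not taken from a real system.
samples = [
    "36 35 98:0 / /mnt rw,noatime shared:1 - ext3 /dev/root rw",
    "37 35 98:0 / /tmp rw master:1 - ext3 /dev/root rw",
    "38 35 98:0 / /x rw shared:2 master:1 - ext3 /dev/root rw",
    "39 35 98:0 / /y rw - ext3 /dev/root rw",
    "40 35 98:0 / /z rw unbindable - ext3 /dev/root rw",
]
for line in samples:
    print(propagation_state(line))
```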


        State diagram:

        The state diagram below explains the state transition of a mount,
        in response to various commands::

            --------------------------------------------------------------------------
            |             |make-shared |  make-slave  | make-private |make-unbindable|
            |-------------|------------|--------------|--------------|---------------|
            |shared       |shared      |*slave/private|   private    |  unbindable   |
            |             |            |              |              |               |
            |-------------|------------|--------------|--------------|---------------|
            |slave        |shared      | **slave      |   private    |  unbindable   |
            |             |and slave   |              |              |               |
            |-------------|------------|--------------|--------------|---------------|
            |shared       |shared      | slave        |   private    |  unbindable   |
            |and slave    |and slave   |              |              |               |
            |-------------|------------|--------------|--------------|---------------|
            |private      |shared      | **private    |   private    |  unbindable   |
            |-------------|------------|--------------|--------------|---------------|
            |unbindable   |shared      |**unbindable  |   private    |  unbindable   |
            --------------------------------------------------------------------------

            * if the shared mount is the only mount in its peer group, making it
            slave makes it private automatically. Note that there is no master
            to which it can be slaved.

            ** slaving a non-shared mount has no effect on the mount.

        Apart from the commands listed above, the 'move' operation also changes
        the state of a mount depending on the type of the destination mount.
        It's explained in section 5d.
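
The state table can be written down as a small transition function. This is a
sketch of the table only, not of the kernel implementation; the ``transition``
name and the ``alone_in_peer_group`` parameter are assumptions introduced to
express the footnote marked with a single asterisk.

```python
def transition(state, command, alone_in_peer_group=False):
    """Return the new state of a mount after a make-* command."""
    if command == "make-shared":
        # A slave keeps its master and gains a peer group of its own.
        return "shared and slave" if "slave" in state else "shared"
    if command == "make-private":
        return "private"
    if command == "make-unbindable":
        return "unbindable"
    if command == "make-slave":
        if state == "shared":
            # (*) with no other peer there is no master to be slaved to.
            return "private" if alone_in_peer_group else "slave"
        if state == "shared and slave":
            return "slave"
        return state    # (**) slaving a non-shared mount has no effect
    raise ValueError("unknown command: " + command)


print(transition("slave", "make-shared"))    # shared and slave
print(transition("shared", "make-slave", alone_in_peer_group=True))   # private
```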

5b) Bind semantics

        Consider the following command::

            mount --bind A/a  B/b

        where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
        is the destination mount and 'b' is the dentry in the destination mount.

        The outcome depends on the type of mount of 'A' and 'B'. The table
        below contains a quick reference::

            --------------------------------------------------------------------------
            |         BIND MOUNT OPERATION                                           |
            |************************************************************************|
            |source(A)->| shared      |       private  |       slave    | unbindable |
            | dest(B)  |              |                |                |            |
            |   |      |              |                |                |            |
            |   v      |              |                |                |            |
            |************************************************************************|
            |  shared  | shared       |     shared     | shared & slave |  invalid   |
            |          |              |                |                |            |
            |non-shared| shared       |      private   |      slave     |  invalid   |
            **************************************************************************

        Details:

    1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
        which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
        mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
        are created and mounted at the dentry 'b' on all mounts where 'B'
        propagates to. A new propagation tree containing 'C1',..,'Cn' is
        created. This propagation tree is identical to the propagation tree of
        'B'.  And finally the peer-group of 'C' is merged with the peer group
        of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
        which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
        mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
        are created and mounted at the dentry 'b' on all mounts where 'B'
        propagates to. A new propagation tree is set up containing all new
        mounts 'C', 'C1', .., 'Cn' with exactly the same configuration as the
        propagation tree for 'B'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
        mount 'C' which is a clone of 'A' is created. Its root dentry is 'a'.
        'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
        'C3' ... are created and mounted at the dentry 'b' on all mounts where
        'B' propagates to. A new propagation tree containing the new mounts
        'C','C1',..  'Cn' is created. This propagation tree is identical to the
        propagation tree for 'B'. And finally the mount 'C' and its peer group
        are made the slave of mount 'Z'.  In other words, mount 'C' is in the
        state 'slave and shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
        invalid operation.

    5. 'A' is a private mount and 'B' is a non-shared (private or slave or
        unbindable) mount. A new mount 'C' which is a clone of 'A' is created.
        Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
        which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
        mounted on mount 'B' at dentry 'b'.  'C' is made a member of the
        peer-group of 'A'.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
        new mount 'C' which is a clone of 'A' is created. Its root dentry is
        'a'.  'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
        slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
        'Z'.  All mount/unmount events on 'Z' propagate to 'A' and 'C'. But
        mount/unmount on 'A' do not propagate anywhere else. Similarly
        mount/unmount on 'C' do not propagate anywhere else.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
        invalid operation. An unbindable mount cannot be bind mounted.
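
The quick-reference table for bind can be condensed into a lookup function. The
sketch below only reproduces the resulting type of the new mount 'C'; it does
not model propagation trees, and ``bind_result`` is a hypothetical helper name.

```python
def bind_result(source, dest_shared):
    """Type of the new mount 'C' for 'mount --bind A/a B/b'.

    source      -- type of 'A': shared, private, slave or unbindable
    dest_shared -- True if the destination mount 'B' is shared
    """
    if source == "unbindable":
        raise ValueError("an unbindable mount cannot be bind mounted")
    if dest_shared:
        # Mounting under a shared 'B' always gives 'C' a peer group.
        return {
            "shared": "shared",
            "private": "shared",
            "slave": "shared and slave",
        }[source]
    # Under a non-shared 'B' the clone keeps the type of 'A'.
    return {"shared": "shared", "private": "private", "slave": "slave"}[source]


print(bind_result("slave", dest_shared=True))      # shared and slave
print(bind_result("private", dest_shared=False))   # private
```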

5c) Rbind semantics

        rbind is the same as bind, except that bind replicates only the
        specified mount, while rbind replicates all the mounts in the tree
        belonging to the specified mount. An rbind mount is a bind mount
        applied to all the mounts in the tree.

        If the source tree that is rbound has some unbindable mounts,
        then the subtree under the unbindable mount is pruned in the new
        location.

        For example:

          Let's say we have the following mount tree::

                A
              /   \
              B   C
             / \ / \
             D E F G

          Let's say all the mounts except the mount C in the tree are
          of a type other than unbindable.

          If this tree is rbound to, say, Z,
          we will have the following tree at the new location::

                Z
                |
                A'
               /
              B'                Note how the tree under C is pruned
             / \                in the new location.
            D' E'
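
The pruning rule can be sketched as a recursive copy that skips unbindable
mounts together with everything below them. This is a toy model: trees are
plain ``(name, children)`` tuples and ``rbind_copy`` is a hypothetical helper.

```python
def rbind_copy(tree, unbindable):
    """Copy a mount tree, pruning unbindable mounts and their subtrees.

    tree       -- (name, [child trees])
    unbindable -- set of mount names that are unbindable
    """
    name, children = tree
    if name in unbindable:
        return None
    copies = (rbind_copy(child, unbindable) for child in children)
    kept = [c for c in copies if c is not None]
    return (name + "'", kept)


source = ("A", [
    ("B", [("D", []), ("E", [])]),
    ("C", [("F", []), ("G", [])]),
])
# With C unbindable, only A', B', D' and E' appear under Z.
print(rbind_copy(source, {"C"}))
```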



5d) Move semantics

        Consider the following command::

            mount --move A  B/b

        where 'A' is the source mount, 'B' is the destination mount and 'b' is
        the dentry in the destination mount.

        The outcome depends on the type of the mount of 'A' and 'B'. The table
        below is a quick reference::

            ---------------------------------------------------------------------------
            |                          MOVE MOUNT OPERATION                           |
            |**************************************************************************
            | source(A)->| shared      |       private  |       slave    | unbindable |
            | dest(B)  |               |                |                |            |
            |   |      |               |                |                |            |
            |   v      |               |                |                |            |
            |**************************************************************************
            |  shared  | shared        |     shared     |shared and slave|  invalid   |
            |          |               |                |                |            |
            |non-shared| shared        |      private   |    slave       | unbindable |
            ***************************************************************************

        .. Note:: moving a mount residing under a shared mount is invalid.

      Details follow:

    1. 'A' is a shared mount and 'B' is a shared mount.  The mount 'A' is
        mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1', 'A2'...'An'
        are created and mounted at dentry 'b' on all mounts that receive
        propagation from mount 'B'. A new propagation tree is created in the
        exact same configuration as that of 'B'. This new propagation tree
        contains all the new mounts 'A1', 'A2'...  'An'.  And this new
        propagation tree is appended to the already existing propagation tree
        of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
        mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'... 'An'
        are created and mounted at dentry 'b' on all mounts that receive
        propagation from mount 'B'. The mount 'A' becomes a shared mount and a
        propagation tree is created which is identical to that of
        'B'. This new propagation tree contains all the new mounts 'A1',
        'A2'...  'An'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount.  The
        mount 'A' is mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1',
        'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
        receive propagation from mount 'B'. A new propagation tree is created
        in the exact same configuration as that of 'B'. This new propagation
        tree contains all the new mounts 'A1', 'A2'...  'An'.  And this new
        propagation tree is appended to the already existing propagation tree of
        'A'.  Mount 'A' continues to be the slave mount of 'Z' but it also
        becomes 'shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
        is invalid, because mounting anything on the shared mount 'B' can
        create new mounts that get mounted on the mounts that receive
        propagation from 'B'.  And since the mount 'A' is unbindable, cloning
        it to mount at other mountpoints is not possible.

    5. 'A' is a private mount and 'B' is a non-shared (private or slave or
        unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount.  The mount 'A'
        is mounted on mount 'B' at dentry 'b'.  Mount 'A' continues to be a
        shared mount.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
        The mount 'A' is mounted on mount 'B' at dentry 'b'.  Mount 'A'
        continues to be a slave mount of mount 'Z'.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount. The mount
        'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be an
        unbindable mount.
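
As with bind, the move table can be condensed into a lookup. The sketch below
returns only the resulting type of the moved mount 'A'; ``move_result`` is a
hypothetical name, and the rule from the note above (the source must not
reside under a shared mount) is not modelled.

```python
def move_result(source, dest_shared):
    """Type of mount 'A' after 'mount --move A B/b'.

    source      -- type of 'A': shared, private, slave or unbindable
    dest_shared -- True if the destination mount 'B' is shared
    """
    if dest_shared:
        if source == "unbindable":
            # 'A' would have to be cloned onto the mounts 'B' propagates to.
            raise ValueError("cannot move an unbindable mount under a shared mount")
        return {
            "shared": "shared",
            "private": "shared",
            "slave": "shared and slave",
        }[source]
    if source not in ("shared", "private", "slave", "unbindable"):
        raise ValueError("unknown mount type: " + source)
    return source    # under a non-shared 'B' the type of 'A' is unchanged


print(move_result("private", dest_shared=True))       # shared
print(move_result("unbindable", dest_shared=False))   # unbindable
```

The only difference from the bind table is the bottom-right cell: an
unbindable mount can still be moved to a non-shared destination because no
clone is needed.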

5e) Mount semantics

        Consider the following command::

            mount device  B/b

        'B' is the destination mount and 'b' is the dentry in the destination
        mount.

        The above operation is the same as the bind operation with the
        exception that the source mount is always a private mount.


5f) Unmount semantics

        Consider the following command::

            umount A

        where 'A' is a mount mounted on mount 'B' at dentry 'b'.

        If mount 'B' is shared, then all most-recently-mounted mounts at dentry
        'b' on mounts that receive propagation from mount 'B' and do not have
        sub-mounts within them are unmounted.

        Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
        each other.

        Let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
        'B1', 'B2' and 'B3' respectively.

        Let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
        mount 'B1', 'B2' and 'B3' respectively.

        If 'C1' is unmounted, all the mounts that are most-recently-mounted on
        'B1' and on the mounts that 'B1' propagates to are unmounted.

        'B1' propagates to 'B2' and 'B3'. And the most recently mounted mount
        on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.

        So all of 'C1', 'C2' and 'C3' should be unmounted.

        If either 'C2' or 'C3' has some child mounts, then that mount is not
        unmounted, but all other mounts are unmounted. However, if 'C1' is told
        to be unmounted and 'C1' has some sub-mounts, the umount operation
        fails entirely.
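
The 'B1', 'B2', 'B3' example can be replayed with a small model in which each
mount keeps a stack of what is mounted at dentry 'b'. This is a sketch of the
rules as stated in this section, not of the kernel code; the
``umount_propagated`` helper and its parameters are hypothetical.

```python
def umount_propagated(stacks, target, has_submounts=()):
    """Unmount the top mount at dentry 'b' of `target` and propagate.

    stacks        -- {peer mount name: [mounts at dentry 'b', oldest first]}
    target        -- peer mount on which the user runs umount
    has_submounts -- names of mounts that have child mounts of their own
    Returns the list of mounts that were unmounted.
    """
    if stacks[target][-1] in has_submounts:
        # The explicitly requested umount fails entirely.
        raise OSError("umount: target is busy")
    removed = []
    for stack in stacks.values():
        if not stack or stack[-1] in has_submounts:
            continue    # a propagated umount skips mounts with sub-mounts
        removed.append(stack.pop())
    return removed


stacks = {"B1": ["A1", "C1"], "B2": ["A2", "C2"], "B3": ["A3", "C3"]}
print(umount_propagated(stacks, "B1"))    # ['C1', 'C2', 'C3']
```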

5g) Clone Namespace

        A cloned namespace contains all the mounts of the parent
        namespace.

        Let's say 'A' and 'B' are the corresponding mounts in the parent and the
        child namespace.

        If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
        each other.

        If 'A' is a slave mount of 'Z', then 'B' is also the slave mount of
        'Z'.

        If 'A' is a private mount, then 'B' is a private mount too.

        If 'A' is an unbindable mount, then 'B' is an unbindable mount too.
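
These four rules amount to copying each mount together with its propagation
type, its peer group and its master. A minimal sketch, assuming each mount is
described by a small dictionary (the field names and the ``clone_namespace``
helper are made up for illustration):

```python
def clone_namespace(parent):
    """Return the mounts of a child namespace cloned from `parent`.

    Each mount keeps its type; a shared mount stays in the same peer
    group as its original and a slave keeps the same master.
    """
    return {path: dict(mount) for path, mount in parent.items()}


parent = {
    "/cdrom":  {"type": "shared", "peer_group": 1, "master": None},
    "/data":   {"type": "slave", "peer_group": None, "master": 2},
    "/home":   {"type": "private", "peer_group": None, "master": None},
    "/secret": {"type": "unbindable", "peer_group": None, "master": None},
}
child = clone_namespace(parent)
# Same peer group id: events on /cdrom propagate across both namespaces.
print(child["/cdrom"]["peer_group"] == parent["/cdrom"]["peer_group"])   # True
```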


6) Quiz
-------

        A. What is the result of the following command sequence?

                ::

                    mount --bind /mnt /mnt
                    mount --make-shared /mnt
                    mount --bind /mnt /tmp
                    mount --move /tmp /mnt/1

                What should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
                Should they all be identical? Or should only /mnt and /mnt/1
                be identical?


        B. What is the result of the following command sequence?

                ::

                    mount --make-rshared /
                    mkdir -p /v/1
                    mount --rbind / /v/1

                What should the content of /v/1/v/1 be?


        C. What is the result of the following command sequence?

                ::

                    mount --bind /mnt /mnt
                    mount --make-shared /mnt
                    mkdir -p /mnt/1/2/3 /mnt/1/test
                    mount --bind /mnt/1 /tmp
                    mount --make-slave /mnt
                    mount --make-shared /mnt
                    mount --bind /mnt/1/2 /tmp1
                    mount --make-slave /mnt

                At this point we have the first mount at /tmp and
                its root dentry is 1. Let's call this mount 'A'.
                And then we have a second mount at /tmp1 with root
                dentry 2. Let's call this mount 'B'.
                Next we have a third mount at /mnt with root dentry
                mnt. Let's call this mount 'C'.

                'B' is the slave of 'A' and 'C' is a slave of 'B'::

                    A -> B -> C

                At this point, if we execute the following command::

                    mount --bind /bin /tmp/test

                the mount is attempted on 'A'.

                Will the mount propagate to 'B' and 'C'?

                What would be the contents of /mnt/1/test?

7) FAQ
------

        Q1. Why is bind mount needed? How is it different from symbolic links?

                Symbolic links can get stale if the destination mount gets
                unmounted or moved. Bind mounts continue to exist even if the
                other mount is unmounted or moved.

        Q2. Why can't the shared subtree be implemented using exportfs?

                exportfs is a heavyweight way of accomplishing part of what
                shared subtree can do. I cannot imagine a way to implement the
                semantics of slave mount using exportfs.

 688        Q3 Why is unbindable mount needed?
 689
 690                Let's say we want to replicate the mount tree at multiple
 691                locations within the same subtree.
 692
 693                if one rbind mounts a tree within the same subtree 'n' times
 694                the number of mounts created is an exponential function of 'n'.
 695                Having unbindable mount can help prune the unneeded bind
 696                mounts. Here is an example.
 697
 698                step 1:
 699                   let's say the root tree has just two directories with
 700                   one vfsmount::
 701
 702                                    root
 703                                   /    \
 704                                  tmp    usr
 705
 706                    And we want to replicate the tree at multiple
 707                    mountpoints under /root/tmp
 708
 709                step 2:
 710                      ::
 711
 712
 713                        mount --make-shared /root
 714
 715                        mkdir -p /tmp/m1
 716
 717                        mount --rbind /root /tmp/m1
 718
 719                      the new tree now looks like this::
 720
 721                                    root
 722                                   /    \
 723                                 tmp    usr
 724                                /
 725                               m1
 726                              /  \
 727                             tmp  usr
 728                             /
 729                            m1
 730
 731                          it has two vfsmounts
 732
 733                step 3:
 734                    ::
 735
 736                            mkdir -p /tmp/m2
 737                            mount --rbind /root /tmp/m2
 738
 739                        the new tree now looks like this::
 740
 741                                      root
 742                                     /    \
 743                                   tmp     usr
 744                                  /    \
 745                                m1       m2
 746                               / \       /  \
 747                             tmp  usr   tmp  usr
 748                             / \          /
 749                            m1  m2      m1
 750                                / \     /  \
 751                              tmp usr  tmp   usr
 752                              /        / \
 753                             m1       m1  m2
 754                            /  \
 755                          tmp   usr
 756                          /  \
 757                         m1   m2
 758
 759                       it has 6 vfsmounts
 760
 761                step 4:
                      ::

                          mkdir -p /tmp/m3
                          mount --rbind /root /tmp/m3

                      I won't draw the tree, but it has 24 vfsmounts.
 767
 768
                At step i the number of vfsmounts is V[i] = i*V[i-1], with
                V[1] = 1, i.e. V[i] = i!.  This grows even faster than an
                exponential function, and the tree has far more mounts than
                we really needed in the first place.
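
                As a quick check of that recurrence, here is a minimal C
                sketch.  The helper name vfsmounts_after_step() is
                hypothetical and exists only for this illustration; it models
                the counting argument and does not touch any real mounts.

                .. code-block:: c

                    #include <assert.h>

                    /*
                     * Hypothetical helper: number of vfsmounts after 'step',
                     * following V[i] = i * V[i-1] with V[1] = 1.
                     */
                    unsigned long long vfsmounts_after_step(unsigned int step)
                    {
                            unsigned long long count = 1; /* step 1: one vfsmount */
                            unsigned int i;

                            /* 20! still fits in 64 bits, 21! does not */
                            assert(step >= 1 && step <= 20);

                            for (i = 2; i <= step; i++)
                                    count *= i;
                            return count;
                    }

                For steps 1 through 4 this gives 1, 2, 6 and 24, matching the
                counts quoted for the trees above.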
 772
                One could use a series of umount operations at each step to
                prune out the unneeded mounts.  But there is a better solution:
                unbindable (unclonable) mounts come in handy here.
 776
 777                step 1:
 778                   let's say the root tree has just two directories with
 779                   one vfsmount::
 780
 781                                    root
 782                                   /    \
 783                                  tmp    usr
 784
                    How do we set up the same tree at multiple locations under
                    /root/tmp?
 787
 788                step 2:
 789                      ::
 790
 791
 792                        mount --bind /root/tmp /root/tmp
 793
 794                        mount --make-rshared /root
 795                        mount --make-unbindable /root/tmp
 796
 797                        mkdir -p /tmp/m1
 798
 799                        mount --rbind /root /tmp/m1
 800
 801                      the new tree now looks like this::
 802
 803                                    root
 804                                   /    \
 805                                 tmp    usr
 806                                /
 807                               m1
 808                              /  \
 809                             tmp  usr
 810
 811                step 3:
 812                      ::
 813
 814                            mkdir -p /tmp/m2
 815                            mount --rbind /root /tmp/m2
 816
 817                      the new tree now looks like this::
 818
 819                                    root
 820                                   /    \
 821                                 tmp    usr
 822                                /   \
 823                               m1     m2
 824                              /  \     / \
 825                             tmp  usr tmp usr
 826
 827                step 4:
 828                      ::
 829
 830                            mkdir -p /tmp/m3
 831                            mount --rbind /root /tmp/m3
 832
 833                      the new tree now looks like this::
 834
 835                                          root
 836                                      /           \
 837                                     tmp           usr
 838                                 /    \    \
 839                               m1     m2     m3
 840                              /  \     / \    /  \
 841                             tmp  usr tmp usr tmp usr
 842
8) Implementation
-----------------
 844
8A) Data structure
 846
 847        4 new fields are introduced to struct vfsmount:
 848
 849        *   ->mnt_share
 850        *   ->mnt_slave_list
 851        *   ->mnt_slave
 852        *   ->mnt_master
 853
 854        ->mnt_share
                links together all the mounts to/from which this vfsmount
                sends/receives propagation events.
 857
 858        ->mnt_slave_list
                links all the mounts to which this vfsmount propagates.
 861
 862        ->mnt_slave
 863                links together all the slaves that its master vfsmount
 864                propagates to.
 865
 866        ->mnt_master
 867                points to the master vfsmount from which this vfsmount
 868                receives propagation.
 869
 870        ->mnt_flags
 871                takes two more flags to indicate the propagation status of
                the vfsmount.  MNT_SHARED indicates that the vfsmount is a
                shared vfsmount.  MNT_UNBINDABLE indicates that the vfsmount
                cannot be replicated.
 875
 876        All the shared vfsmounts in a peer group form a cyclic list through
 877        ->mnt_share.
 878
        All vfsmounts with the same ->mnt_master form a cyclic list anchored
 880        in ->mnt_master->mnt_slave_list and going through ->mnt_slave.
 881
        ->mnt_master can point to arbitrary (and possibly different) members
        of the master peer group.  To find all immediate slaves of a peer
        group you need to go through _all_ ->mnt_slave_list of its members.
        Conceptually it's just a single set: distribution among the
        individual lists does not affect propagation or the way the
        propagation tree is modified by operations.
 888
 889        All vfsmounts in a peer group have the same ->mnt_master.  If it is
 890        non-NULL, they form a contiguous (ordered) segment of slave list.
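
        The relationships between these fields can be illustrated with a
        small userspace model.  This is only a sketch under simplifying
        assumptions: struct model_mnt and the model_*() helpers are
        hypothetical names, plain pointers stand in for the kernel's list
        linkage, and the slave list is NULL-terminated here rather than
        cyclic.

        .. code-block:: c

            #include <assert.h>
            #include <stddef.h>

            /* Hypothetical model of the propagation fields, not kernel code. */
            struct model_mnt {
                    struct model_mnt *share_next; /* models ->mnt_share (cyclic) */
                    struct model_mnt *slave_head; /* models ->mnt_slave_list */
                    struct model_mnt *slave_next; /* models ->mnt_slave */
                    struct model_mnt *master;     /* models ->mnt_master */
            };

            /* A fresh mount: peer group of one, no slaves, no master. */
            void model_init(struct model_mnt *m)
            {
                    m->share_next = m;
                    m->slave_head = NULL;
                    m->slave_next = NULL;
                    m->master = NULL;
            }

            /* Add a freshly initialised 'peer' to the peer group of 'm'. */
            void model_add_peer(struct model_mnt *m, struct model_mnt *peer)
            {
                    assert(peer->share_next == peer && peer->master == NULL);

                    peer->share_next = m->share_next;
                    m->share_next = peer;
                    if (m->master) {
                            /* same master; stay contiguous in its slave list */
                            peer->master = m->master;
                            peer->slave_next = m->slave_next;
                            m->slave_next = peer;
                    }
            }

            /* Make a freshly initialised 'slave' receive from 'master'. */
            void model_add_slave(struct model_mnt *master, struct model_mnt *slave)
            {
                    slave->master = master;
                    slave->slave_next = master->slave_head;
                    master->slave_head = slave;
            }

            /* Count the immediate slaves of the whole peer group of 'm'. */
            size_t model_count_group_slaves(const struct model_mnt *m)
            {
                    const struct model_mnt *peer = m;
                    size_t n = 0;

                    do {
                            const struct model_mnt *s;

                            /* every member's slave list has to be visited */
                            for (s = peer->slave_head; s; s = s->slave_next)
                                    n++;
                            peer = peer->share_next;
                    } while (peer != m);
                    return n;
            }

        Note that model_count_group_slaves() walks the slave list of every
        peer, which mirrors the remark that the slaves of a peer group are
        conceptually a single set spread over its members.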
 891
        An example propagation tree looks as shown in the figure below.
 893        [ NOTE: Though it looks like a forest, if we consider all the shared
 894        mounts as a conceptual entity called 'pnode', it becomes a tree]::
 895
 896
 897                        A <--> B <--> C <---> D
 898                       /|\            /|      |\
 899                      / F G          J K      H I
 900                     /
 901                    E<-->K
 902                        /|\
 903                       M L N
 904
        In the above figure 'A', 'B', 'C' and 'D' are all shared and propagate
        to each other.  'A' has three slave mounts: 'E', 'F' and 'G'.  'C' has
        two slave mounts: 'J' and 'K'.  'D' has two slave mounts: 'H' and 'I'.
        'E' is also shared with 'K' and they propagate to each other.  'K' has
        three slaves: 'M', 'L' and 'N'.
 910
 911        A's ->mnt_share links with the ->mnt_share of 'B' 'C' and 'D'
 912
 913        A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'
 914
 915        E's ->mnt_share links with ->mnt_share of K
 916
 917        'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'
 918
 919        'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'
 920
 921        K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'
 922
 923        C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'
 924
 925        J and K's ->mnt_master points to struct vfsmount of C
 926
 927        and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'
 928
 929        'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.
 930
 931
 932        NOTE: The propagation tree is orthogonal to the mount tree.
 933
8B) Locking
 935
 936        ->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected
 937        by namespace_sem (exclusive for modifications, shared for reading).
 938
 939        Normally we have ->mnt_flags modifications serialized by vfsmount_lock.
 940        There are two exceptions: do_add_mount() and clone_mnt().
 941        The former modifies a vfsmount that has not been visible in any shared
 942        data structures yet.
 943        The latter holds namespace_sem and the only references to vfsmount
 944        are in lists that can't be traversed without namespace_sem.
 945
8C) Algorithm
 947
 948        The crux of the implementation resides in rbind/move operation.
 949
        The overall algorithm breaks the operation into 3 phases (see
        attach_recursive_mnt() and propagate_mnt()):

        1. prepare phase
        2. commit phase
        3. abort phase
 956
 957        Prepare phase:
 958
 959        for each mount in the source tree:
 960
                   a) Create the necessary number of mount trees to
                      be attached to each of the mounts that receive
                      propagation from the destination mount.
                   b) Do not attach any of the trees to its destination.
                      However, note down its ->mnt_parent and ->mnt_mountpoint.
                   c) Link all the new mounts to form a propagation tree that
                      is identical to the propagation tree of the destination
                      mount.
 969
                   If this phase is successful, there should be 'n' new
                   propagation trees, where 'n' is the number of mounts in the
                   source tree.  There should also be 'm' new mount trees,
                   where 'm' is the number of mounts to which the destination
                   mount propagates.  Go to the commit phase.

                   If any memory allocation fails, go to the abort phase.
 979
 980        Commit phase
 981                attach each of the mount trees to their corresponding
 982                destination mounts.
 983
 984        Abort phase
 985                delete all the newly created trees.
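
        The prepare/commit/abort pattern can be sketched in userspace C as
        follows.  This is an illustration of the all-or-nothing idea only,
        not the code in attach_recursive_mnt(): struct pending_tree,
        attach_all_or_nothing(), limited_alloc() and alloc_budget are all
        hypothetical names, and a flag stands in for the real attach step.

        .. code-block:: c

            #include <assert.h>
            #include <stdlib.h>

            /* Hypothetical stand-in for a copied tree awaiting attachment. */
            struct pending_tree {
                    int dest_id;  /* destination this copy is meant for */
                    int attached; /* set only in the commit phase */
            };

            /* Hypothetical allocator that fails once the budget is used up. */
            int alloc_budget = 2;

            void *limited_alloc(size_t size)
            {
                    if (alloc_budget <= 0)
                            return NULL;
                    alloc_budget--;
                    return malloc(size);
            }

            /*
             * Returns 0 if all 'ndest' copies were created and attached,
             * or -1 if an allocation failed and everything was undone.
             */
            int attach_all_or_nothing(struct pending_tree **out, int ndest,
                                      void *(*alloc)(size_t))
            {
                    int i, made = 0;

                    assert(out != NULL && alloc != NULL && ndest >= 0);

                    /* prepare: create every copy, attach nothing yet */
                    for (i = 0; i < ndest; i++) {
                            out[i] = alloc(sizeof(*out[i]));
                            if (!out[i])
                                    goto abort_phase;
                            out[i]->dest_id = i;
                            out[i]->attached = 0;
                            made++;
                    }

                    /* commit: nothing here can fail */
                    for (i = 0; i < ndest; i++)
                            out[i]->attached = 1;
                    return 0;

            abort_phase:
                    /* abort: delete all the newly created copies */
                    for (i = 0; i < made; i++) {
                            free(out[i]);
                            out[i] = NULL;
                    }
                    return -1;
            }

        Because every step that can fail happens before anything is
        attached, a failure never leaves a partially attached result behind.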
 986
 987        .. Note::
           All the propagation-related functionality resides in the file
           fs/pnode.c.
 989
 990
 991------------------------------------------------------------------------
 992
 993version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)
 994
 995version 0.2  (Incorporated comments from Al Viro)
 996