=====================
autofs - how it works
=====================


Purpose
=======

The goal of autofs is to provide on-demand mounting and race-free
automatic unmounting of various other filesystems.  This provides two
key advantages:

1. There is no need to delay boot until all filesystems that
   might be needed are mounted.  Processes that try to access those
   slow filesystems might be delayed but other processes can
   continue freely.  This is particularly important for
   network filesystems (e.g. NFS) or filesystems stored on
   media with a media-changing robot.

2. The names and locations of filesystems can be stored in
   a remote database and can change at any time.  The content
   in that database at the time of access will be used to provide
   a target for the access.  The interpretation of names in the
   filesystem can even be programmatic rather than database-backed,
   allowing wildcards for example, and can vary based on the user who
   first accessed a name.


Context
=======

The "autofs" filesystem module is only one part of an autofs system.
There also needs to be a user-space program which looks up names
and mounts filesystems.  This will often be the "automount" program,
though other tools including "systemd" can make use of "autofs".
This document describes only the kernel module and the interactions
required with any user-space program.  Subsequent text refers to this
as the "automount daemon" or simply "the daemon".

"autofs" is a Linux kernel module which provides the "autofs"
filesystem type.  Several "autofs" filesystems can be mounted and they
can each be managed separately, or all managed by the same daemon.


Content
=======

An autofs filesystem can contain three sorts of objects: directories,
symbolic links and mount traps.  Mount traps are directories with
extra properties as described in the next section.

Objects can only be created by the automount daemon: symlinks are
created with a regular `symlink` system call, while directories and
mount traps are created with `mkdir`.  The determination of whether a
directory should be a mount trap is based on a master map. This master
map is consulted by autofs to determine which directories are mount
points. Mount points can be *direct*, *indirect* or *offset*.
On most systems, the default master map is located at */etc/auto.master*.

If neither the *direct* nor the *offset* mount option is given (so the
mount is considered to be *indirect*), then the root directory is
always a regular directory; otherwise it is a mount trap when it is
empty and a regular directory when not empty.  Note that *direct* and
*offset* are treated identically so a concise summary is that the root
directory is a mount trap only if the filesystem is mounted *direct*
and the root is empty.

Directories created in the root directory are mount traps only if the
filesystem is mounted *indirect* and they are empty.

Directories further down the tree depend on the *maxproto* mount
option and particularly whether it is less than five or not.
When *maxproto* is five, no directories further down the
tree are ever mount traps; they are always regular directories.  When
*maxproto* is four (or three), these directories are mount traps
precisely when they are empty.

So: non-empty (i.e. non-leaf) directories are never mount traps. Empty
directories are sometimes mount traps, and sometimes not depending on
where in the tree they are (root, top level, or lower), the *maxproto*,
and whether the mount was *indirect* or not.

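The rules in this section can be condensed into one decision function.
The following is a minimal sketch in C of that summary, not the kernel's
implementation.  The names `demo_mount_type` and `demo_is_mount_trap`,
and the encoding of `depth` (0 for the root, 1 for a directory created
in the root, 2 or more for anything lower), are assumptions made purely
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, used only for this illustration. */
enum demo_mount_type { DEMO_INDIRECT, DEMO_DIRECT, DEMO_OFFSET };

/*
 * depth: 0 = root of the autofs filesystem, 1 = directory created in
 * the root, 2+ = directory further down the tree.
 */
static bool demo_is_mount_trap(enum demo_mount_type type, int depth,
                               bool empty, int maxproto)
{
        assert(depth >= 0);

        /* Non-empty directories are never mount traps. */
        if (!empty)
                return false;

        /* The root is a trap only for direct (or offset) mounts. */
        if (depth == 0)
                return type != DEMO_INDIRECT;

        /* Top-level directories are traps only for indirect mounts. */
        if (depth == 1)
                return type == DEMO_INDIRECT;

        /* Lower levels are traps only when maxproto is below five. */
        return maxproto < 5;
}
```

For example, an empty top-level directory of an *indirect* mount is a
trap, while an empty directory two levels down is a trap only when
*maxproto* is four or three.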
Mount Traps
===========

A core element of the implementation of autofs is the Mount Traps
which are provided by the Linux VFS.  Any directory provided by a
filesystem can be designated as a trap.  This involves two separate
features that work together to allow autofs to do its job.


**DCACHE_NEED_AUTOMOUNT**

If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if
the inode has S_AUTOMOUNT set, or can be set directly) then it is
(potentially) a mount trap.  Any access to this directory beyond a
"`stat`" will (normally) cause the `d_op->d_automount()` dentry operation
to be called. The task of this method is to find the filesystem that
should be mounted on the directory and to return it.  The VFS is
responsible for actually mounting the root of this filesystem on the
directory.

autofs doesn't find the filesystem itself but sends a message to the
automount daemon asking it to find and mount the filesystem.  The
autofs `d_automount` method then waits for the daemon to report that
everything is ready.  It will then return "`NULL`" indicating that the
mount has already happened.  The VFS doesn't try to mount anything but
follows down the mount that is already there.

This functionality is sufficient for some users of mount traps such
as NFS which creates traps so that mountpoints on the server can be
reflected on the client.  However it is not sufficient for autofs.  As
mounting onto a directory is considered to be "beyond a `stat`", the
automount daemon would not be able to mount a filesystem on the 'trap'
directory without some way to avoid getting caught in the trap.  For
that purpose there is another flag.


**DCACHE_MANAGE_TRANSIT**

If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but
related behaviours are invoked, both using the `d_op->d_manage()`
dentry operation.

Firstly, before checking to see if any filesystem is mounted on the
directory, d_manage() will be called with the `rcu_walk` parameter set
to `false`.  It may return one of three things:

-  A return value of zero indicates that there is nothing special
   about this dentry and normal checks for mounts and automounts
   should proceed.

   autofs normally returns zero, but first waits for any
   expiry (automatic unmounting of the mounted filesystem) to
   complete.  This avoids races.

-  A return value of `-EISDIR` tells the VFS to ignore any mounts
   on the directory and to not consider calling `->d_automount()`.
   This effectively disables the **DCACHE_NEED_AUTOMOUNT** flag
   causing the directory not to be a mount trap after all.

   autofs returns this if it detects that the process performing the
   lookup is the automount daemon and that the mount has been
   requested but has not yet completed.  How it determines this is
   discussed later.  This allows the automount daemon not to get
   caught in the mount trap.

   There is a subtlety here.  It is possible that a second autofs
   filesystem can be mounted below the first and for both of them to
   be managed by the same daemon.  For the daemon to be able to mount
   something on the second it must be able to "walk" down past the
   first.  This means that d_manage cannot *always* return -EISDIR for
   the automount daemon.  It must only return it when a mount has
   been requested, but has not yet completed.

   `d_manage` also returns `-EISDIR` if the dentry shouldn't be a
   mount trap, either because it is a symbolic link or because it is
   not empty.

-  Any other negative value is treated as an error and returned
   to the caller.

   autofs can return

   - -ENOENT if the automount daemon failed to mount anything,
   - -ENOMEM if it ran out of memory,
   - -EINTR if a signal arrived while waiting for expiry to
     complete
   - or any other error sent down by the automount daemon.

The second use case only occurs during an "RCU-walk" and so `rcu_walk`
will be set.

An RCU-walk is a fast and lightweight process for walking down a
filename path (i.e. it is like running on tip-toes).  RCU-walk cannot
cope with all situations so when it finds a difficulty it falls back
to "REF-walk", which is slower but more robust.

RCU-walk will never call `->d_automount`; the filesystems must already
be mounted or RCU-walk cannot handle the path.
To determine if a mount-trap is safe for RCU-walk mode it calls
`->d_manage()` with `rcu_walk` set to `true`.

In this case `d_manage()` must avoid blocking and should avoid taking
spinlocks if at all possible.  Its sole purpose is to determine if it
would be safe to follow down into any mounted directory and the only
reason that it might not be is if an expiry of the mount is
underway.

In the `rcu_walk` case, `d_manage()` cannot return -EISDIR to tell the
VFS that this is a directory that doesn't require d_automount.  If
RCU-walk sees a dentry with DCACHE_NEED_AUTOMOUNT set but nothing
mounted, it *will* fall back to REF-walk.  `d_manage()` cannot make the
VFS remain in RCU-walk mode, but can only tell it to get out of
RCU-walk mode by returning `-ECHILD`.

So `d_manage()`, when called with `rcu_walk` set, should return
-ECHILD if there is any reason to believe it is unsafe to enter the
mounted filesystem; otherwise it should return 0.

autofs will return `-ECHILD` if an expiry of the filesystem has been
initiated or is being considered, otherwise it returns 0.

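The two `d_manage()` behaviours can be summarised as a small model.
This is a sketch of the return-value rules only, not kernel code: the
structure `demo_dentry_state` and the function `demo_d_manage` are
hypothetical names, and the model assumes that any wait for expiry in
the REF-walk case has already finished.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state that the text says autofs consults. */
struct demo_dentry_state {
        bool is_trap_candidate;  /* empty directory, not a symlink */
        bool mount_pending;      /* mount requested but not yet completed */
        bool expiring;           /* expiry initiated or being considered */
};

/*
 * Returns 0, -EISDIR or -ECHILD following the rules described above.
 * Errors such as -ENOENT or -EINTR are left out of this model.
 */
static int demo_d_manage(const struct demo_dentry_state *st,
                         bool caller_is_daemon, bool rcu_walk)
{
        assert(st != NULL);

        if (rcu_walk)
                /* Must not block: only say whether RCU-walk may continue. */
                return st->expiring ? -ECHILD : 0;

        /* Let the daemon through while it completes a requested mount. */
        if (caller_is_daemon && st->mount_pending)
                return -EISDIR;

        /* Symlinks and non-empty directories are not mount traps. */
        if (!st->is_trap_candidate)
                return -EISDIR;

        return 0;
}
```

Note that the daemon gets `-EISDIR` only while a mount is pending, which
is what lets it walk through one autofs filesystem to reach another.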
Mountpoint expiry
=================

The VFS has a mechanism for automatically expiring unused mounts,
much as it can expire any unused dentry information from the dcache.
This is guided by the MNT_SHRINKABLE flag.  This only applies to
mounts that were created by `d_automount()` returning a filesystem to be
mounted.  As autofs doesn't return such a filesystem but leaves the
mounting to the automount daemon, it must involve the automount daemon
in unmounting as well.  This also means that autofs has more control
over expiry.

The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to
the `umount` system call.  Unmounting with MNT_EXPIRE will fail unless
a previous attempt had been made, and the filesystem has been inactive
and untouched since that previous attempt.  autofs does not depend on
this but has its own internal tracking of whether filesystems were
recently used.  This allows individual names in the autofs directory
to expire separately.

With version 4 of the protocol, the automount daemon can try to
unmount any filesystems mounted on the autofs filesystem or remove any
symbolic links or empty directories any time it likes.  If the unmount
or removal is successful the filesystem will be returned to the state
it was in before the mount or creation, so that any access of the name
will trigger normal auto-mount processing.  In particular, `rmdir` and
`unlink` do not leave negative entries in the dcache as a normal
filesystem would, so an attempt to access a recently-removed object is
passed to autofs for handling.

With version 5, this is not safe except for unmounting from top-level
directories.  As lower-level directories are never mount traps, other
processes will see an empty directory as soon as the filesystem is
unmounted.  So it is generally safest to use the autofs expiry
protocol described below.

Normally the daemon only wants to remove entries which haven't been
used for a while.  For this purpose autofs maintains a "`last_used`"
time stamp on each directory or symlink.  For symlinks it genuinely
does record the last time the symlink was "used" or followed to find
out where it points to.  For directories the field is used slightly
differently.  The field is updated at mount time, during expire
checks if it is found to be in use (i.e. an open file descriptor or
a process working directory), and during path walks. The update done
during path walks prevents frequent expire and immediate mount of
frequently accessed automounts. But in the case where a GUI continually
accesses or an application frequently scans an autofs directory tree
there can be an accumulation of mounts that aren't actually being
used. To cater for this case the "`strictexpire`" autofs mount option
can be used to avoid the "`last_used`" update on path walk thereby
preventing this apparent inability to expire mounts that aren't
really in use.

The daemon is able to ask autofs if anything is due to be expired,
using an `ioctl` as discussed later.  For a *direct* mount, autofs
considers if the entire mount-tree can be unmounted or not.  For an
*indirect* mount, autofs considers each of the names in the top level
directory to determine if any of those can be unmounted and cleaned
up.

There is an option with indirect mounts to consider each of the leaves
that has been mounted on instead of considering the top-level names.
This was originally intended for compatibility with version 4 of autofs
and should be considered as deprecated for Sun Format automount maps.
However, it may be used again for amd format mount maps (which are
generally indirect maps) because the amd automounter allows for the
setting of an expire timeout for individual mounts. But there are
some difficulties in making the needed changes for this.

When autofs considers a directory it checks the `last_used` time and
compares it with the "timeout" value set when the filesystem was
mounted, though this check is ignored in some cases. It also checks if
the directory or anything below it is in use.  For symbolic links,
only the `last_used` time is ever considered.

If both appear to support expiring the directory or symlink, an action
is taken.

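The bookkeeping just described can be modelled in a few lines of C.
This is a sketch under stated assumptions rather than the kernel's code:
`demo_entry`, `demo_note_path_walk` and `demo_can_expire` are
hypothetical names, times are plain `time_t` seconds, and the cases
where the timeout check is ignored are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-entry bookkeeping; the field name follows the text. */
struct demo_entry {
        time_t last_used;
        bool is_symlink;
        bool in_use;      /* open file descriptor or working directory */
};

/* Path walks refresh last_used unless "strictexpire" was given. */
static void demo_note_path_walk(struct demo_entry *e, time_t now,
                                bool strictexpire)
{
        assert(e);
        if (!strictexpire)
                e->last_used = now;
}

/* True when both checks support expiring the entry. */
static bool demo_can_expire(const struct demo_entry *e, time_t now,
                            time_t timeout)
{
        assert(e);
        if (now - e->last_used < timeout)
                return false;            /* used too recently */
        if (e->is_symlink)
                return true;             /* only last_used matters */
        return !e->in_use;
}
```

With `strictexpire` set, a path walk leaves `last_used` alone, so a tree
that is scanned often but never really used can still time out.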
There are two ways to ask autofs to consider expiry.  The first is to
use the **AUTOFS_IOC_EXPIRE** ioctl.  This only works for indirect
mounts.  If it finds something in the root directory to expire it will
return the name of that thing.  Once a name has been returned the
automount daemon needs to unmount any filesystems mounted below the
name normally.  As described above, this is unsafe for non-toplevel
mounts in a version-5 autofs.  For this reason the current `automount(8)`
does not use this ioctl.

The second mechanism uses either the **AUTOFS_DEV_IOCTL_EXPIRE_CMD** or
the **AUTOFS_IOC_EXPIRE_MULTI** ioctl.  This will work for both direct and
indirect mounts.  If it selects an object to expire, it will notify
the daemon using the notification mechanism described below.  This
will block until the daemon acknowledges the expiry notification.
This implies that the "`EXPIRE`" ioctl must be sent from a different
thread than the one which handles notification.

While the ioctl is blocking, the entry is marked as "expiring" and
`d_manage` will block until the daemon affirms that the unmount has
completed (together with removing any directories that might have been
necessary), or has been aborted.

Communicating with autofs: detecting the daemon
===============================================

There are several forms of communication between the automount daemon
and the filesystem.  As we have already seen, the daemon can create and
remove directories and symlinks using normal filesystem operations.
autofs knows whether a process requesting some operation is the daemon
or not based on its process-group id number (see getpgid(2)).

When an autofs filesystem is mounted the pgid of the mounting
process is recorded unless the "pgrp=" option is given, in which
case that number is recorded instead.  Any request arriving from a
process in that process group is considered to come from the daemon.
If the daemon ever has to be stopped and restarted a new pgid can be
provided through an ioctl as will be described below.

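As a user-space analogy of this check, the sketch below compares a
caller's process group with a recorded one.  The kernel performs the
comparison internally; the function name `demo_request_is_from_daemon`
and the parameter `recorded_pgrp` (standing for the value captured at
mount time or supplied with "pgrp=") are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * A request counts as coming from the daemon when the caller's process
 * group matches the recorded one.  Passing caller == 0 asks about the
 * calling process itself.
 */
static bool demo_request_is_from_daemon(pid_t recorded_pgrp, pid_t caller)
{
        pid_t pgid = getpgid(caller);

        return pgid != (pid_t)-1 && pgid == recorded_pgrp;
}
```

A consequence worth noting is that every process in the daemon's
process group, including helpers it forks, is treated as the daemon.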
Communicating with autofs: the event pipe
=========================================

When an autofs filesystem is mounted, the 'write' end of a pipe must
be passed using the 'fd=' mount option.  autofs will write
notification messages to this pipe for the daemon to respond to.
For version 5, the format of the message is::

        struct autofs_v5_packet {
                struct autofs_packet_hdr hdr;
                autofs_wqt_t wait_queue_token;
                __u32 dev;
                __u64 ino;
                __u32 uid;
                __u32 gid;
                __u32 pid;
                __u32 tgid;
                __u32 len;
                char name[NAME_MAX+1];
        };

And the format of the header is::

        struct autofs_packet_hdr {
                int proto_version;              /* Protocol version */
                int type;                       /* Type of packet */
        };

where the type is one of ::

        autofs_ptype_missing_indirect
        autofs_ptype_expire_indirect
        autofs_ptype_missing_direct
        autofs_ptype_expire_direct

so messages can indicate that a name is missing (something tried to
access it but it isn't there) or that it has been selected for expiry.

The pipe will be set to "packet mode" (equivalent to passing
`O_DIRECT` to _pipe2(2)_) so that a read from the pipe will return at
most one packet, and any unread portion of a packet will be discarded.

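Packet mode can be observed with an ordinary pipe, without autofs being
involved at all.  The sketch below assumes a Linux system where
`pipe2()` accepts `O_DIRECT`; the function name `demo_packet_mode` and
the packet sizes are arbitrary choices for the demonstration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for pipe2() and O_DIRECT */
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Writes a 32-byte and a 16-byte packet into a packet-mode pipe, then
 * reads with an 8-byte and a 64-byte buffer.  On success stores the two
 * read() results in got[0] and got[1] and returns 0; returns -1 on error.
 */
static int demo_packet_mode(ssize_t got[2])
{
        char big[32], small[16], buf[64];
        int fds[2];
        int ret = -1;

        assert(got);
        memset(big, 'A', sizeof(big));
        memset(small, 'B', sizeof(small));

        if (pipe2(fds, O_DIRECT) < 0)
                return -1;

        if (write(fds[1], big, sizeof(big)) == (ssize_t)sizeof(big) &&
            write(fds[1], small, sizeof(small)) == (ssize_t)sizeof(small)) {
                /* Short read: the rest of the first packet is dropped. */
                got[0] = read(fds[0], buf, 8);
                /* Large read: still returns only the next packet. */
                got[1] = read(fds[0], buf, sizeof(buf));
                if (got[0] >= 0 && got[1] >= 0)
                        ret = buf[0] == 'B' ? 0 : -1;
        }

        close(fds[0]);
        close(fds[1]);
        return ret;
}
```

A daemon should therefore always read with a buffer at least as large as
the packet structure, since a short read loses the remainder.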
The `wait_queue_token` is a unique number which can identify a
particular request to be acknowledged.  When a message is sent over
the pipe the affected dentry is marked as either "active" or
"expiring" and other accesses to it block until the message is
acknowledged using one of the ioctls below with the relevant
`wait_queue_token`.

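How a daemon might turn a packet into an action and an acknowledgement
can be sketched as follows.  Everything here is a stand-in: the
`demo_*` enumerations are hypothetical, their numeric values are
arbitrary rather than taken from the kernel header, and a real daemon
would pass the packet's `wait_queue_token` to the ioctl it chooses.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the four packet types listed above. */
enum demo_ptype {
        DEMO_MISSING_INDIRECT,
        DEMO_EXPIRE_INDIRECT,
        DEMO_MISSING_DIRECT,
        DEMO_EXPIRE_DIRECT,
};

enum demo_action { DEMO_DO_MOUNT, DEMO_DO_UMOUNT };
enum demo_ack { DEMO_ACK_READY, DEMO_ACK_FAIL };

/* A "missing" packet asks for a mount, an "expire" packet for an unmount. */
static enum demo_action demo_action_for(enum demo_ptype type)
{
        switch (type) {
        case DEMO_MISSING_INDIRECT:
        case DEMO_MISSING_DIRECT:
                return DEMO_DO_MOUNT;
        case DEMO_EXPIRE_INDIRECT:
        case DEMO_EXPIRE_DIRECT:
                return DEMO_DO_UMOUNT;
        }
        assert(!"unknown packet type");
        return DEMO_DO_MOUNT;
}

/* Either way the token must be acknowledged so blocked accesses resume. */
static enum demo_ack demo_ack_for(bool action_succeeded)
{
        return action_succeeded ? DEMO_ACK_READY : DEMO_ACK_FAIL;
}
```

The point of the sketch is that every packet needs exactly one
acknowledgement, whether the requested action worked or not.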
Communicating with autofs: root directory ioctls
================================================

The root directory of an autofs filesystem will respond to a number of
ioctls.  The process issuing the ioctl must have the CAP_SYS_ADMIN
capability, or must be the automount daemon.

The available ioctl commands are:

- **AUTOFS_IOC_READY**:
        a notification has been handled.  The argument
        to the ioctl command is the "wait_queue_token" number
        corresponding to the notification being acknowledged.
- **AUTOFS_IOC_FAIL**:
        similar to above, but indicates failure with
        the error code `ENOENT`.
- **AUTOFS_IOC_CATATONIC**:
        causes autofs to enter "catatonic"
        mode meaning that it stops sending notifications to the daemon.
        This mode is also entered if a write to the pipe fails.
- **AUTOFS_IOC_PROTOVER**:
        This returns the protocol version in use.
- **AUTOFS_IOC_PROTOSUBVER**:
        Returns the protocol sub-version which
        is really a version number for the implementation.
- **AUTOFS_IOC_SETTIMEOUT**:
        This passes a pointer to an unsigned
        long.  The value is used to set the timeout for expiry, and
        the current timeout value is stored back through the pointer.
- **AUTOFS_IOC_ASKUMOUNT**:
        Returns, in the pointed-to `int`, 1 if
        the filesystem could be unmounted.  This is only a hint as
        the situation could change at any instant.  This call can be
        used to avoid a more expensive full unmount attempt.
- **AUTOFS_IOC_EXPIRE**:
        as described above, this asks if there is
        anything suitable to expire.  A pointer to a packet::

                struct autofs_packet_expire_multi {
                        struct autofs_packet_hdr hdr;
                        autofs_wqt_t wait_queue_token;
                        int len;
                        char name[NAME_MAX+1];
                };

        is required.  This is filled in with the name of something
        that can be unmounted or removed.  If nothing can be expired,
        `errno` is set to `EAGAIN`.  Even though a `wait_queue_token`
        is present in the structure, no "wait queue" is established
        and no acknowledgment is needed.
- **AUTOFS_IOC_EXPIRE_MULTI**:
        This is similar to
        **AUTOFS_IOC_EXPIRE** except that it causes notification to be
        sent to the daemon, and it blocks until the daemon acknowledges.
        The argument is an integer which can contain a combination of
        the following flags.

        **AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored
        and objects are expired if they are not in use.

        **AUTOFS_EXP_FORCED** causes the in use status to be ignored
        and objects are expired even if they are in use. This assumes
        that the daemon has requested this because it is capable of
        performing the umount.

        **AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level
        name to expire.  This is only safe when *maxproto* is 4.

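The effect of the **AUTOFS_EXP_IMMEDIATE** and **AUTOFS_EXP_FORCED**
flags on the selection can be sketched as below.  This is an
illustrative model only: the `DEMO_EXP_*` bit values are stand-ins
chosen for the sketch (real code should use the definitions from
`<linux/auto_fs.h>`), and treating a forced expire as bypassing the
`last_used` check as well is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values, not copied from the kernel header. */
#define DEMO_EXP_IMMEDIATE 0x1u
#define DEMO_EXP_FORCED    0x4u

/*
 * timed_out: last_used is older than the timeout.
 * in_use:    the object or something below it is still busy.
 */
static bool demo_select_for_expiry(bool timed_out, bool in_use,
                                   unsigned int how)
{
        /* FORCED ignores the in-use status (assumed: the timeout too). */
        if (how & DEMO_EXP_FORCED)
                return true;

        if (in_use)
                return false;

        /* IMMEDIATE ignores last_used; otherwise the timeout must pass. */
        return (how & DEMO_EXP_IMMEDIATE) || timed_out;
}
```

With no flags set, an object is selected only when it has timed out and
is not in use, which matches the default behaviour described earlier.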
Communicating with autofs: char-device ioctls
=============================================

It is not always possible to open the root of an autofs filesystem,
particularly a *direct* mounted filesystem.  If the automount daemon
is restarted there is no way for it to regain control of existing
mounts using any of the above communication channels.  To address this
need there is a "miscellaneous" character device (major 10, minor 235)
which can be used to communicate directly with the autofs filesystem.
It requires CAP_SYS_ADMIN for access.

The 'ioctl's that can be used on this device are described in a separate
document `autofs-mount-control.txt`, and are summarised briefly here.
Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure::

        struct autofs_dev_ioctl {
                __u32 ver_major;
                __u32 ver_minor;
                __u32 size;             /* total size of data passed in
                                         * including this struct */
                __s32 ioctlfd;          /* automount command fd */

                /* Command parameters */
                union {
                        struct args_protover            protover;
                        struct args_protosubver         protosubver;
                        struct args_openmount           openmount;
                        struct args_ready               ready;
                        struct args_fail                fail;
                        struct args_setpipefd           setpipefd;
                        struct args_timeout             timeout;
                        struct args_requester           requester;
                        struct args_expire              expire;
                        struct args_askumount           askumount;
                        struct args_ismountpoint        ismountpoint;
                };

                char path[0];
        };

For the **OPEN_MOUNT** and **IS_MOUNTPOINT** commands, the target
filesystem is identified by the `path`.  All other commands identify
the filesystem by the `ioctlfd` which is a file descriptor open on the
root, and which can be returned by **OPEN_MOUNT**.

The `ver_major` and `ver_minor` are in/out parameters which check that
the requested version is supported, and report the maximum version
that the kernel module can support.

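Because `path` trails the fixed part of the structure, the caller has to
allocate header and path together and record the total in `size`.  The
sketch below shows that arithmetic on a simplified stand-in structure.
`demo_dev_ioctl` and `demo_alloc_param` are hypothetical names, the
command union is reduced to an opaque placeholder, and the path in the
usage example is invented; real code should use the definition from
`<linux/auto_dev-ioctl.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct autofs_dev_ioctl. */
struct demo_dev_ioctl {
        uint32_t ver_major;
        uint32_t ver_minor;
        uint32_t size;          /* header plus path, including the NUL */
        int32_t  ioctlfd;
        uint64_t args;          /* placeholder for the command union */
        char     path[];
};

/* Allocates and fills a parameter block; the caller frees it. */
static struct demo_dev_ioctl *demo_alloc_param(uint32_t major, uint32_t minor,
                                               const char *path)
{
        size_t len = path ? strlen(path) + 1 : 0;
        struct demo_dev_ioctl *p = calloc(1, sizeof(*p) + len);

        if (!p)
                return NULL;

        p->ver_major = major;   /* version the caller was built against */
        p->ver_minor = minor;
        p->size = (uint32_t)(sizeof(*p) + len);
        p->ioctlfd = -1;        /* no open root yet; path identifies the fs */
        if (path)
                memcpy(p->path, path, len);
        return p;
}
```

A caller would build such a block with a path for the path-based
commands, for example `demo_alloc_param(1, 0, "/mnt/auto")`, and with a
NULL path once it holds an `ioctlfd`.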
Commands are:

- **AUTOFS_DEV_IOCTL_VERSION_CMD**:
        does nothing, except validate and
        set version numbers.
- **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**:
        return an open file descriptor
        on the root of an autofs filesystem.  The filesystem is identified
        by name and device number, which is stored in `openmount.devid`.
        Device numbers for existing filesystems can be found in
        `/proc/self/mountinfo`.
- **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**:
        same as `close(ioctlfd)`.
- **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**:
        if the filesystem is in
        catatonic mode, this can provide the write end of a new pipe
        in `setpipefd.pipefd` to re-establish communication with a daemon.
        The process group of the calling process is used to identify the
        daemon.
- **AUTOFS_DEV_IOCTL_REQUESTER_CMD**:
        `path` should be a
        name within the filesystem that has been auto-mounted on.
        On successful return, `requester.uid` and `requester.gid` will be
        the UID and GID of the process which triggered that mount.
- **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**:
        Check if path is a
        mountpoint of a particular type - see separate documentation for
        details.

- **AUTOFS_DEV_IOCTL_PROTOVER_CMD**
- **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD**
- **AUTOFS_DEV_IOCTL_READY_CMD**
- **AUTOFS_DEV_IOCTL_FAIL_CMD**
- **AUTOFS_DEV_IOCTL_CATATONIC_CMD**
- **AUTOFS_DEV_IOCTL_TIMEOUT_CMD**
- **AUTOFS_DEV_IOCTL_EXPIRE_CMD**
- **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD**

These all have the same
function as the similarly named **AUTOFS_IOC** ioctls, except
that **FAIL** can be given an explicit error number in `fail.status`
instead of assuming `ENOENT`, and this **EXPIRE** command
corresponds to **AUTOFS_IOC_EXPIRE_MULTI**.

Catatonic mode
==============

As mentioned, an autofs mount can enter "catatonic" mode.  This
happens if a write to the notification pipe fails, or if it is
explicitly requested by an `ioctl`.

When entering catatonic mode, the pipe is closed and any pending
notifications are acknowledged with the error `ENOENT`.

Once in catatonic mode attempts to access non-existing names will
result in `ENOENT` while attempts to access existing directories will
be treated in the same way as if they came from the daemon, so mount
traps will not fire.

When the filesystem is mounted a _uid_ and _gid_ can be given which
set the ownership of directories and symbolic links.  When the
filesystem is in catatonic mode, any process with a matching UID can
create directories or symlinks in the root directory, but not in other
directories.

Catatonic mode can only be left via the
**AUTOFS_DEV_IOCTL_OPENMOUNT_CMD** ioctl on `/dev/autofs`.

The "ignore" mount option
=========================

The "ignore" mount option can be used to provide a generic indicator
to applications that the mount entry should be ignored when displaying
mount information.

In other OSes that provide autofs and that provide a mount list to user
space based on the kernel mount list, a no-op mount option ("ignore" is
the one used on the most common OSes) is allowed so that autofs file
system users can optionally use it.

This is intended to be used by user space programs to exclude autofs
mounts from consideration when reading the mounts list.

autofs, name spaces, and shared mounts
======================================

With bind mounts and name spaces it is possible for an autofs
filesystem to appear at multiple places in one or more filesystem
name spaces.  For this to work sensibly, the autofs filesystem should
always be mounted "shared". e.g. ::

        mount --make-shared /autofs/mount/point

The automount daemon is only able to manage a single mount location for
an autofs filesystem and if mounts on that are not 'shared', other
locations will not behave as expected.  In particular access to those
other locations will likely result in the `ELOOP` error ::

        Too many levels of symbolic links

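A program that sets up its own autofs mounts could apply the same
propagation change through `mount(2)` instead of the shell command.
This is a minimal sketch: the wrapper name `demo_make_shared` is
hypothetical, and the call normally needs CAP_SYS_ADMIN and an existing
mount point.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Marks an existing mount point as shared, as "mount --make-shared"
 * does.  Returns 0 on success or -1 with errno set.
 */
static int demo_make_shared(const char *mount_point)
{
        assert(mount_point != NULL);

        /* Source, filesystem type and data are ignored for this flag. */
        return mount(NULL, mount_point, NULL, MS_SHARED, NULL);
}
```

The change would be made once, after the autofs filesystem is mounted
and before any bind mounts or new name spaces are created from it.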