.. SPDX-License-Identifier: GPL-2.0

===================================
File management in the Linux kernel
===================================

This document describes how locking for files (struct file)
and the file descriptor table (struct files_struct) works.

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using
a reference count (->f_count).

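
The "last put frees the table" pattern described above can be illustrated
with a small userspace sketch. This is not kernel code: struct shared_table,
table_alloc(), table_get() and table_put() are hypothetical names, and C11
atomics stand in for the kernel's own atomic helpers. It only shows the
counting logic, assuming each sharing task holds exactly one reference.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a table shared between tasks. */
struct shared_table {
        atomic_int count;       /* number of tasks sharing the table */
        int *slots;             /* placeholder payload */
};

/* Allocate a table owned by a single task (count starts at 1). */
static struct shared_table *table_alloc(size_t nslots)
{
        struct shared_table *t = malloc(sizeof(*t));

        if (!t)
                return NULL;
        t->slots = calloc(nslots, sizeof(*t->slots));
        if (!t->slots) {
                free(t);
                return NULL;
        }
        atomic_init(&t->count, 1);
        return t;
}

/* Another task starts sharing the table. */
static void table_get(struct shared_table *t)
{
        atomic_fetch_add(&t->count, 1);
}

/* Drop one reference; returns 1 if this call freed the table. */
static int table_put(struct shared_table *t)
{
        /* atomic_fetch_sub() returns the value before the decrement. */
        if (atomic_fetch_sub(&t->count, 1) != 1)
                return 0;
        free(t->slots);
        free(t);
        return 1;
}
```

In this sketch only the caller that drops the count from 1 to 0 frees the
memory, so no sharer can free the table while another still uses it.
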
In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for the updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of fdtable, a new fdtable structure is allocated
and files->fdt points to the new structure. The old fdtable
structure is freed with RCU, and lock-free readers see either
the old fdtable or the new fdtable, making the update
appear atomic. Here are the locking rules for
the fdtable structure:

1. All references to the fdtable must be done through
   the files_fdtable() macro::

        struct fdtable *fdt;

        rcu_read_lock();

        fdt = files_fdtable(files);
        ....
        if (n <= fdt->max_fds)
                ....
        ...
        rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the lookup_fd_rcu() or the files_lookup_fd_rcu() API.
   These take care of barrier requirements due to lock-free lookup.

   An example::

        struct file *file;

        rcu_read_lock();
        file = lookup_fd_rcu(fd);
        if (file) {
                ...
        }
        ....
        rcu_read_unlock();

5. Handling of the file structures is special. Since the look-up
   of the fd (fget()/fget_light()) is lock-free, it is possible
   that the look-up may race with the last put() operation on the
   file structure. This is avoided using atomic_long_inc_not_zero()
   on ->f_count::

        rcu_read_lock();
        file = files_lookup_fd_rcu(files, fd);
        if (file) {
                if (atomic_long_inc_not_zero(&file->f_count)) {
                        *fput_needed = 1;
                } else {
                        /* Didn't get the reference, someone's freed */
                        file = NULL;
                }
        }
        rcu_read_unlock();
        ....
        return file;

   atomic_long_inc_not_zero() detects if the refcount is already zero or
   goes to zero during the increment. If it does, we fail
   fget()/fget_light().

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and lookup_fd_rcu()/files_lookup_fd_rcu(), which take care of these
   issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread may expand the files, thereby creating a new
   fdtable and making the earlier fdtable pointer stale.

   For example::

        spin_lock(&files->file_lock);
        fd = locate_fd(files, file, start);
        if (fd >= 0) {
                /* locate_fd() may have expanded fdtable, load the ptr */
                fdt = files_fdtable(files);
                __set_open_fd(fd, fdt);
                __clear_close_on_exec(fd, fdt);
                spin_unlock(&files->file_lock);
        .....

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().
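
As a rough illustration of the reference-count check in rule 5, the
following userspace sketch shows one way an "increment unless zero"
operation can be built from a compare-and-swap loop. It is not the kernel's
atomic_long_inc_not_zero(): get_unless_zero() is a hypothetical name, and
C11 <stdatomic.h> is assumed in place of the kernel's atomic API. The sketch
also leaves out RCU entirely, which the kernel code above relies on.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace analogue of an "increment unless zero" operation.
 * Returns true and takes a reference if *count was non-zero;
 * returns false and leaves *count untouched if it was zero.
 */
static bool get_unless_zero(atomic_long *count)
{
        long old = atomic_load(count);

        while (old != 0) {
                /* On failure, old is updated to the current value. */
                if (atomic_compare_exchange_weak(count, &old, old + 1))
                        return true;
        }
        /* The count reached zero: the object is already being torn down. */
        return false;
}
```

A caller that gets false back must treat the object as gone, which mirrors
how the look-up in rule 5 sets file to NULL when the increment fails.
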