.. SPDX-License-Identifier: GPL-2.0

=================================
Network Filesystem Helper Library
=================================

.. Contents:

 - Overview.
 - Buffered read helpers.
   - Read helper functions.
   - Read helper structures.
   - Read helper operations.
   - Read helper procedure.
   - Read helper cache API.


Overview
========

The network filesystem helper library is a set of functions designed to aid a
network filesystem in implementing VM/VFS operations.  For the moment, that
just includes turning various VM buffered read operations into requests to read
from the server.  The helper library, however, can also interpose other
services, such as local caching or local data encryption.

Note that the library module doesn't link against local caching directly, so
access must be provided by the netfs.


Buffered Read Helpers
=====================

The library provides a set of read helpers that handle the ->readpage(),
->readahead() and much of the ->write_begin() VM operations and translate them
into a common call framework.

The following services are provided:

 * Handle folios that span multiple pages.

 * Insulate the netfs from VM interface changes.

 * Allow the netfs to arbitrarily split reads up into pieces, even ones that
   don't match folio sizes or folio alignments and that may cross folios.

 * Allow the netfs to expand a readahead request in both directions to meet its
   needs.

 * Allow the netfs to partially fulfil a read, which will then be resubmitted.

 * Handle local caching, allowing cached data and server-read data to be
   interleaved for a single request.

 * Handle clearing of buffer regions that aren't on the server.

 * Handle retrying of reads that failed, switching reads from the cache to the
   server as necessary.

 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.

From the network filesystem, the helpers require a table of operations.  This
includes a mandatory method to issue a read operation along with a number of
optional methods.


Read Helper Functions
---------------------

Three read helpers are provided::

        void netfs_readahead(struct readahead_control *ractl,
                             const struct netfs_read_request_ops *ops,
                             void *netfs_priv);
        int netfs_readpage(struct file *file,
                           struct folio *folio,
                           const struct netfs_read_request_ops *ops,
                           void *netfs_priv);
        int netfs_write_begin(struct file *file,
                              struct address_space *mapping,
                              loff_t pos,
                              unsigned int len,
                              unsigned int flags,
                              struct folio **_folio,
                              void **_fsdata,
                              const struct netfs_read_request_ops *ops,
                              void *netfs_priv);

Each corresponds to a VM operation, with the addition of a couple of parameters
for the use of the read helpers:

 * ``ops``

   A table of operations through which the helpers can talk to the filesystem.

 * ``netfs_priv``

   Filesystem private data (can be NULL).

Both of these values will be stored into the read request structure.

For ->readahead() and ->readpage(), the network filesystem should just jump
into the corresponding read helper; whereas for ->write_begin(), it may be a
little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired folio if
an error occurs after calling the helper.

The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations.  Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.

If an error occurs and netfs_priv is non-NULL, ops->cleanup() will be called to
deal with it.  If some parts of the request are in progress when an error
occurs, the request will get partially completed if sufficient data is read.

Additionally, there is::

        void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
                                     ssize_t transferred_or_error,
                                     bool was_async);

which should be called to complete a read subrequest.  This is given the number
of bytes transferred or a negative error code, plus a flag indicating whether
the operation was asynchronous (i.e. whether the follow-on processing can be
done in the current context, given this may involve sleeping).
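The calling convention of netfs_subreq_terminated() can be illustrated with a
small userspace model.  This is only a sketch, not the kernel implementation:
``struct model_subreq`` and ``model_subreq_terminated()`` are hypothetical
stand-ins that mimic how a single signed value carries either a byte count or
a negative error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct netfs_read_subrequest. */
struct model_subreq {
	size_t len;		/* Length of the slice */
	size_t transferred;	/* Bytes read so far */
	int error;		/* 0, or a negative error code */
	bool short_read;	/* True while the slice is incomplete */
};

/*
 * Model of the completion call: a negative value is an error code, anything
 * else is the number of bytes transferred by the operation that just ended.
 */
static void model_subreq_terminated(struct model_subreq *subreq,
				    ssize_t transferred_or_error,
				    bool was_async)
{
	/* This model has no deferred processing, so the flag is unused. */
	(void)was_async;

	if (transferred_or_error < 0) {
		subreq->error = (int)transferred_or_error;
		return;
	}

	subreq->transferred += (size_t)transferred_or_error;
	assert(subreq->transferred <= subreq->len);
	subreq->short_read = subreq->transferred < subreq->len;
}
```

In this model, a caller that read 40 bytes of a 100-byte slice would pass 40
and find ``short_read`` set; a caller that hit an I/O error would pass -EIO.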


Read Helper Structures
----------------------

The read helpers make use of a couple of structures to maintain the state of
the read.  The first is a structure that manages a read request as a whole::

        struct netfs_read_request {
                struct inode            *inode;
                struct address_space    *mapping;
                struct netfs_cache_resources cache_resources;
                void                    *netfs_priv;
                loff_t                  start;
                size_t                  len;
                loff_t                  i_size;
                const struct netfs_read_request_ops *netfs_ops;
                unsigned int            debug_id;
                ...
        };

The above fields are the ones the netfs can use.  They are:

 * ``inode``
 * ``mapping``

   The inode and the address space of the file being read from.  The mapping
   may or may not point to inode->i_data.

 * ``cache_resources``

   Resources for the local cache to use, if present.

 * ``netfs_priv``

   The network filesystem's private data.  The value for this can be passed in
   to the helper functions or set during the request.  The ->cleanup() op will
   be called if this is non-NULL at the end.

 * ``start``
 * ``len``

   The file position of the start of the read request and the length.  These
   may be altered by the ->expand_readahead() op.

 * ``i_size``

   The size of the file at the start of the request.

 * ``netfs_ops``

   A pointer to the operation table.  The value for this is passed into the
   helper functions.

 * ``debug_id``

   A number allocated to this operation that can be displayed in trace lines
   for reference.


The second structure is used to manage individual slices of the overall read
request::

        struct netfs_read_subrequest {
                struct netfs_read_request *rreq;
                loff_t                  start;
                size_t                  len;
                size_t                  transferred;
                unsigned long           flags;
                unsigned short          debug_index;
                ...
        };

Each subrequest is expected to access a single source, though the helpers will
handle falling back from one source type to another.  The members are:

 * ``rreq``

   A pointer to the read request.

 * ``start``
 * ``len``

   The file position of the start of this slice of the read request and the
   length.

 * ``transferred``

   The amount of data transferred so far of the length of this slice.  The
   network filesystem or cache should start the operation this far into the
   slice.  If a short read occurs, the helpers will call again, having updated
   this to reflect the amount read so far.

 * ``flags``

   Flags pertaining to the read.  There are two of interest to the filesystem
   or cache:

   * ``NETFS_SREQ_CLEAR_TAIL``

     This can be set to indicate that the remainder of the slice, from
     transferred to len, should be cleared.

   * ``NETFS_SREQ_SEEK_DATA_READ``

     This is a hint to the cache that it might want to try skipping ahead to
     the next data (ie. using SEEK_DATA).

 * ``debug_index``

   A number allocated to this slice that can be displayed in trace lines for
   reference.
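The way ``start``, ``len`` and ``transferred`` combine can be shown with a
short userspace model.  The names below (``struct model_slice``,
``MODEL_SREQ_CLEAR_TAIL`` and its bit value, and the three helper functions)
are hypothetical and exist only for this sketch; they are not the kernel
definitions.  The sketch computes where a resumed operation should start and
shows one plausible reading of clearing the tail of a slice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit; the real value is defined by the kernel headers. */
#define MODEL_SREQ_CLEAR_TAIL 0x1UL

/* Hypothetical stand-in for the fields of struct netfs_read_subrequest. */
struct model_slice {
	long long start;	/* File position of the slice */
	size_t len;		/* Length of the slice */
	size_t transferred;	/* Bytes transferred so far */
	unsigned long flags;
};

/* File position at which the next I/O on this slice should begin. */
static long long model_resume_pos(const struct model_slice *s)
{
	return s->start + (long long)s->transferred;
}

/* Number of bytes of the slice still to be filled. */
static size_t model_remaining(const struct model_slice *s)
{
	assert(s->transferred <= s->len);
	return s->len - s->transferred;
}

/*
 * If the clear-tail flag is set, zero the buffer from transferred to len and
 * treat the slice as complete.  Returns the number of bytes cleared.
 */
static size_t model_clear_tail(struct model_slice *s, unsigned char *buf)
{
	size_t n = model_remaining(s);

	if (!(s->flags & MODEL_SREQ_CLEAR_TAIL))
		return 0;
	memset(buf + s->transferred, 0, n);
	s->transferred = s->len;
	return n;
}
```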


Read Helper Operations
----------------------

The network filesystem must provide the read helpers with a table of operations
through which it can issue requests and negotiate::

        struct netfs_read_request_ops {
                void (*init_rreq)(struct netfs_read_request *rreq, struct file *file);
                bool (*is_cache_enabled)(struct inode *inode);
                int (*begin_cache_operation)(struct netfs_read_request *rreq);
                void (*expand_readahead)(struct netfs_read_request *rreq);
                bool (*clamp_length)(struct netfs_read_subrequest *subreq);
                void (*issue_op)(struct netfs_read_subrequest *subreq);
                bool (*is_still_valid)(struct netfs_read_request *rreq);
                int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
                                         struct folio *folio, void **_fsdata);
                void (*done)(struct netfs_read_request *rreq);
                void (*cleanup)(struct address_space *mapping, void *netfs_priv);
        };

The operations are as follows:

 * ``init_rreq()``

   [Optional] This is called to initialise the request structure.  It is given
   the file for reference and can modify the ->netfs_priv value.

 * ``is_cache_enabled()``

   [Required] This is called by netfs_write_begin() to ask if the file is being
   cached.  It should return true if it is being cached and false otherwise.

 * ``begin_cache_operation()``

   [Optional] This is called to ask the network filesystem to call into the
   cache (if present) to initialise the caching state for this read.  The netfs
   library module cannot access the cache directly, so the network filesystem
   should call something like fscache_begin_read_operation() to do this.

   The cache gets to store its state in ->cache_resources and must set a table
   of operations of its own there (though of a different type).

   This should return 0 on success and an error code otherwise.  If an error is
   reported, the operation may proceed anyway, just without local caching (only
   out of memory and interruption errors cause failure here).

 * ``expand_readahead()``

   [Optional] This is called to allow the filesystem to expand the size of a
   readahead read request.  The filesystem gets to expand the request in both
   directions, though it's not permitted to reduce it as the numbers may
   represent an allocation already made.  If local caching is enabled, it gets
   to expand the request first.

   Expansion is communicated by changing ->start and ->len in the request
   structure.  Note that if any change is made, ->len must be increased by at
   least as much as ->start is reduced.

 * ``clamp_length()``

   [Optional] This is called to allow the filesystem to reduce the size of a
   subrequest.  The filesystem can use this, for example, to chop up a request
   that has to be split across multiple servers or to put multiple reads in
   flight.

   This should return true on success and false on error.

 * ``issue_op()``

   [Required] The helpers use this to dispatch a subrequest to the server for
   reading.  In the subrequest, ->start, ->len and ->transferred indicate what
   data should be read from the server.

   There is no return value; the netfs_subreq_terminated() function should be
   called to indicate whether or not the operation succeeded and how much data
   it transferred.  The filesystem also should not deal with setting folios
   uptodate, unlocking them or dropping their refs - the helpers need to deal
   with this as they have to coordinate with copying to the local cache.

   Note that the helpers have the folios locked, but not pinned.  It is
   possible to use the ITER_XARRAY iov iterator to refer to the range of the
   inode that is being operated upon without the need to allocate large bvec
   tables.

 * ``is_still_valid()``

   [Optional] This is called to find out if the data just read from the local
   cache is still valid.  It should return true if it is still valid and false
   if not.  If it's not still valid, it will be reread from the server.

 * ``check_write_begin()``

   [Optional] This is called from the netfs_write_begin() helper once it has
   allocated/grabbed the folio to be modified to allow the filesystem to flush
   conflicting state before allowing it to be modified.

   It should return 0 if everything is now fine, -EAGAIN if the folio should be
   regrabbed and any other error code to abort the operation.

 * ``done()``

   [Optional] This is called after the folios in the request have all been
   unlocked (and marked uptodate if applicable).

 * ``cleanup()``

   [Optional] This is called as the request is being deallocated so that the
   filesystem can clean up ->netfs_priv.
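The expansion rule given for ``expand_readahead()`` (the request may grow in
both directions, must not shrink, and ->len must grow by at least as much as
->start is reduced) can be sketched in userspace as follows.  This is an
illustrative model only: ``struct model_rreq``, ``model_expand_readahead()``
and the ``granule`` parameter are assumptions made for the example, standing
in for whatever alignment a particular filesystem might want.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the start/len fields of the read request. */
struct model_rreq {
	long long start;	/* File position of the request */
	size_t len;		/* Length of the request */
};

/*
 * Round the request outwards to granule boundaries.  The start can only move
 * down and the end can only move up, so the request never shrinks and len
 * always grows by at least the amount by which start was reduced.
 */
static void model_expand_readahead(struct model_rreq *rreq, long long granule)
{
	long long end = rreq->start + (long long)rreq->len;
	long long new_start, new_end;

	assert(granule > 0 && rreq->start >= 0);

	new_start = rreq->start - rreq->start % granule;	/* Round down */
	new_end = (end + granule - 1) / granule * granule;	/* Round up */

	rreq->start = new_start;
	rreq->len = (size_t)(new_end - new_start);
}
```

For instance, with a 4096-byte granule a request for 3000 bytes at position
5000 becomes a request for 4096 bytes at position 4096: the start moved down
by 904 and the length grew by 1096.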


Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request.  This is then proposed to the VM.
   If the VM cannot fully perform the expansion, a partially expanded read will
   be performed, though this may not get written to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source.  This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.

   * The next slice begins at the end of the last one.

   * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of reissuing.

 * Once the data is read, the folios that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any folios that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the folios will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.
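The slicing step of the procedure above can be sketched as a userspace loop.
This is a simplified model under stated assumptions, not the helper code: it
ignores the cache, and ``struct model_slice``, ``model_slice_request()`` and
``model_clamp_rsize()`` (with its 1024-byte limit) are hypothetical names
chosen for the example.  It starts each slice as large as possible, lets a
clamp hook shrink it, and begins the next slice where the last one ended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the start/len fields of a subrequest. */
struct model_slice {
	long long start;
	size_t len;
};

/* Model of a clamp hook: may reduce slice->len; returns false on failure. */
typedef bool (*model_clamp_fn)(struct model_slice *slice);

/*
 * Slice [start, start + len) into at most max_slices pieces, storing them in
 * out[].  Returns the number of slices produced.
 */
static size_t model_slice_request(long long start, size_t len,
				  model_clamp_fn clamp,
				  struct model_slice *out, size_t max_slices)
{
	size_t n = 0;

	assert(out != NULL);

	while (len > 0 && n < max_slices) {
		/* Begin with a maximal slice covering all that remains. */
		struct model_slice s = { .start = start, .len = len };

		if (clamp && !clamp(&s))
			break;
		if (s.len == 0 || s.len > len)
			break;	/* A clamp must shrink, never grow or empty */

		out[n++] = s;
		start += (long long)s.len;	/* Next slice follows on */
		len -= s.len;
	}
	return n;
}

/* Example clamp imposing a hypothetical rsize of 1024 bytes. */
static bool model_clamp_rsize(struct model_slice *slice)
{
	if (slice->len > 1024)
		slice->len = 1024;
	return true;
}
```

With this clamp, a 2500-byte request at position 0 is cut into slices of 1024,
1024 and 452 bytes; without a clamp it is issued as one slice.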


Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work.  If using fscache, for
example, the network filesystem would call::

        int fscache_begin_read_operation(struct netfs_read_request *rreq,
                                         struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file.

The netfs_read_request object contains a place for the cache to hang its
state::

        struct netfs_cache_resources {
                const struct netfs_cache_ops    *ops;
                void                            *cache_priv;
                void                            *cache_priv2;
        };

This contains an operations table pointer and two private pointers.  The
operation table looks like the following::

        struct netfs_cache_ops {
                void (*end_operation)(struct netfs_cache_resources *cres);

                void (*expand_readahead)(struct netfs_cache_resources *cres,
                                         loff_t *_start, size_t *_len, loff_t i_size);

                enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq,
                                                       loff_t i_size);

                int (*read)(struct netfs_cache_resources *cres,
                            loff_t start_pos,
                            struct iov_iter *iter,
                            bool seek_data,
                            netfs_io_terminated_t term_func,
                            void *term_func_priv);

                int (*prepare_write)(struct netfs_cache_resources *cres,
                                     loff_t *_start, size_t *_len, loff_t i_size,
                                     bool no_space_allocated_yet);

                int (*write)(struct netfs_cache_resources *cres,
                             loff_t start_pos,
                             struct iov_iter *iter,
                             netfs_io_terminated_t term_func,
                             void *term_func_priv);

                int (*query_occupancy)(struct netfs_cache_resources *cres,
                                       loff_t start, size_t len, size_t granularity,
                                       loff_t *_data_start, size_t *_data_len);
        };

With a termination handler function pointer::

        typedef void (*netfs_io_terminated_t)(void *priv,
                                              ssize_t transferred_or_error,
                                              bool was_async);
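How a termination handler of this shape might be driven can be sketched with
a userspace model.  Everything here is an assumption made for illustration:
``model_io_terminated_t`` mirrors the typedef above, while
``struct model_result``, ``model_record_result()`` and ``model_cache_read()``
(an in-memory "cache" that completes synchronously and reports -ENODATA when
nothing is stored at the requested position) are hypothetical and do not
correspond to any real cache backend.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Userspace mirror of the termination handler type. */
typedef void (*model_io_terminated_t)(void *priv,
				      ssize_t transferred_or_error,
				      bool was_async);

/* Hypothetical private data recording what the handler was told. */
struct model_result {
	ssize_t value;		/* Bytes transferred or negative error */
	bool was_async;
	int calls;
};

static void model_record_result(void *priv, ssize_t transferred_or_error,
				bool was_async)
{
	struct model_result *res = priv;

	res->value = transferred_or_error;
	res->was_async = was_async;
	res->calls++;
}

/*
 * Model of a read from an in-memory cache.  The outcome is reported through
 * term_func with was_async false, as this model completes in the caller's
 * context.  Returns 0 on success or the negative error code.
 */
static int model_cache_read(const char *cache, size_t cache_size,
			    size_t start_pos, char *buf, size_t len,
			    model_io_terminated_t term_func,
			    void *term_func_priv)
{
	ssize_t ret;

	assert(cache != NULL && buf != NULL);

	if (start_pos >= cache_size) {
		ret = -ENODATA;		/* Nothing stored at this position */
	} else {
		size_t n = cache_size - start_pos;

		if (n > len)
			n = len;
		memcpy(buf, cache + start_pos, n);
		ret = (ssize_t)n;
	}

	if (term_func)
		term_func(term_func_priv, ret, false);
	return ret < 0 ? (int)ret : 0;
}
```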

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction.  This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request.  ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.

   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.

 * ``read()``

   [Required] Called to read from the cache.  The start file offset is given
   along with an iterator to read to, which gives the length also.  It can be
   given a hint requesting that it seek forward from that start position for
   data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``prepare_write()``

   [Required] Called to prepare a write to the cache to take place.  This
   involves checking to see whether the cache has sufficient space to honour
   the write.  ``*_start`` and ``*_len`` indicate the region to be written; the
   region can be shrunk or it can be expanded to a page boundary either way as
   necessary to align for direct I/O.  i_size holds the size of the object and
   is provided for reference.  no_space_allocated_yet is set to true if the
   caller is certain that no data has been written to that region - for example
   if it tried to do a read from there already.

 * ``write()``

   [Required] Called to write to the cache.  The start file offset is given
   along with an iterator to write from, which gives the length also.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``query_occupancy()``

   [Required] Called to find out where the next piece of data is within a
   particular region of the cache.  The start and length of the region to be
   queried are passed in, along with the granularity to which the answer needs
   to be aligned.  The function passes back the start and length of the data,
   if any, available within that region.  Note that there may be a hole at the
   front.

   It returns 0 if some data was found, -ENODATA if there was no usable data
   within the region or -ENOBUFS if there is no caching on this file.

Note that these methods are passed a pointer to the cache resource structure,
not the read request structure as they could be used in other situations where
there isn't a read request structure as well, such as writing dirty data to the
cache.
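The return convention described for ``query_occupancy()`` can be modelled in
userspace.  The sketch below is hypothetical throughout: it represents the
cache as an array of booleans, one per granule, which is an assumption made
for the example rather than how any real cache tracks its contents, and it
assumes the queried start and length are multiples of the granularity.  A
NULL map stands in for a file with no caching.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Find the first run of present granules within [start, start + len).
 * Returns 0 and sets *_data_start and *_data_len if data was found, -ENODATA
 * if the region holds no data, or -ENOBUFS if there is no cache map at all.
 */
static int model_query_occupancy(const bool *present, size_t nr_blocks,
				 long long start, size_t len,
				 size_t granularity,
				 long long *_data_start, size_t *_data_len)
{
	size_t first, last, i, j;

	if (!present)
		return -ENOBUFS;	/* No caching on this file */

	assert(granularity > 0 && start >= 0);

	first = (size_t)start / granularity;
	last = ((size_t)start + len) / granularity;	/* Exclusive bound */
	if (last > nr_blocks)
		last = nr_blocks;

	/* Skip any hole at the front of the region. */
	for (i = first; i < last; i++)
		if (present[i])
			break;
	if (i >= last)
		return -ENODATA;

	/* Measure the contiguous run of data that follows. */
	for (j = i; j < last && present[j]; j++)
		;

	*_data_start = (long long)(i * granularity);
	*_data_len = (j - i) * granularity;
	return 0;
}
```

Given a map in which only granules 2, 3 and 5 are present, a query over all
six 4096-byte granules reports data starting at 8192 with length 8192, which
shows the hole at the front being skipped.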


API Function Reference
======================

.. kernel-doc:: include/linux/netfs.h
.. kernel-doc:: fs/netfs/read_helper.c