linux/Documentation/filesystems/ext4/directory.rst
<<
>>
Prefs
   1.. SPDX-License-Identifier: GPL-2.0
   2
   3Directory Entries
   4-----------------
   5
   6In an ext4 filesystem, a directory is more or less a flat file that maps
   7an arbitrary byte string (usually ASCII) to an inode number on the
   8filesystem. There can be many directory entries across the filesystem
   9that reference the same inode number--these are known as hard links, and
  10that is why hard links cannot reference files on other filesystems. As
  11such, directory entries are found by reading the data block(s)
  12associated with a directory file for the particular directory entry that
  13is desired.
  14
  15Linear (Classic) Directories
  16~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  17
  18By default, each directory lists its entries in an “almost-linear”
  19array. I write “almost” because it's not a linear array in the memory
  20sense because directory entries are not split across filesystem blocks.
  21Therefore, it is more accurate to say that a directory is a series of
  22data blocks and that each block contains a linear array of directory
  23entries. The end of each per-block array is signified by reaching the
  24end of the block; the last entry in the block has a record length that
  25takes it all the way to the end of the block. The end of the entire
  26directory is of course signified by reaching the end of the file. Unused
  27directory entries are signified by inode = 0. By default the filesystem
  28uses ``struct ext4_dir_entry_2`` for directory entries unless the
  29“filetype” feature flag is not set, in which case it uses
  30``struct ext4_dir_entry``.
  31
  32The original directory entry format is ``struct ext4_dir_entry``, which
  33is at most 263 bytes long, though on disk you'll need to reference
  34``dirent.rec_len`` to know for sure.
  35
  36.. list-table::
  37   :widths: 8 8 24 40
  38   :header-rows: 1
  39
  40   * - Offset
  41     - Size
  42     - Name
  43     - Description
  44   * - 0x0
  45     - \_\_le32
  46     - inode
  47     - Number of the inode that this directory entry points to.
  48   * - 0x4
  49     - \_\_le16
  50     - rec\_len
  51     - Length of this directory entry. Must be a multiple of 4.
  52   * - 0x6
  53     - \_\_le16
  54     - name\_len
  55     - Length of the file name.
  56   * - 0x8
  57     - char
  58     - name[EXT4\_NAME\_LEN]
  59     - File name.
  60
  61Since file names cannot be longer than 255 bytes, the new directory
  62entry format shortens the name\_len field and uses the space for a file
  63type flag, probably to avoid having to load every inode during directory
  64tree traversal. This format is ``ext4_dir_entry_2``, which is at most
  65263 bytes long, though on disk you'll need to reference
  66``dirent.rec_len`` to know for sure.
  67
  68.. list-table::
  69   :widths: 8 8 24 40
  70   :header-rows: 1
  71
  72   * - Offset
  73     - Size
  74     - Name
  75     - Description
  76   * - 0x0
  77     - \_\_le32
  78     - inode
  79     - Number of the inode that this directory entry points to.
  80   * - 0x4
  81     - \_\_le16
  82     - rec\_len
  83     - Length of this directory entry.
  84   * - 0x6
  85     - \_\_u8
  86     - name\_len
  87     - Length of the file name.
  88   * - 0x7
  89     - \_\_u8
  90     - file\_type
  91     - File type code, see ftype_ table below.
  92   * - 0x8
  93     - char
  94     - name[EXT4\_NAME\_LEN]
  95     - File name.
  96
  97.. _ftype:
  98
  99The directory file type is one of the following values:
 100
 101.. list-table::
 102   :widths: 16 64
 103   :header-rows: 1
 104
 105   * - Value
 106     - Description
 107   * - 0x0
 108     - Unknown.
 109   * - 0x1
 110     - Regular file.
 111   * - 0x2
 112     - Directory.
 113   * - 0x3
 114     - Character device file.
 115   * - 0x4
 116     - Block device file.
 117   * - 0x5
 118     - FIFO.
 119   * - 0x6
 120     - Socket.
 121   * - 0x7
 122     - Symbolic link.
 123
 124To support directories that are both encrypted and casefolded directories, we
 125must also include hash information in the directory entry. We append
 126``ext4_extended_dir_entry_2`` to ``ext4_dir_entry_2`` except for the entries
 127for dot and dotdot, which are kept the same. The structure follows immediately
 128after ``name`` and is included in the size listed by ``rec_len`` If a directory
 129entry uses this extension, it may be up to 271 bytes.
 130
 131.. list-table::
 132   :widths: 8 8 24 40
 133   :header-rows: 1
 134
 135   * - Offset
 136     - Size
 137     - Name
 138     - Description
 139   * - 0x0
 140     - \_\_le32
 141     - hash
 142     - The hash of the directory name
 143   * - 0x4
 144     - \_\_le32
 145     - minor\_hash
 146     - The minor hash of the directory name
 147
 148
 149In order to add checksums to these classic directory blocks, a phony
 150``struct ext4_dir_entry`` is placed at the end of each leaf block to
 151hold the checksum. The directory entry is 12 bytes long. The inode
 152number and name\_len fields are set to zero to fool old software into
 153ignoring an apparently empty directory entry, and the checksum is stored
 154in the place where the name normally goes. The structure is
 155``struct ext4_dir_entry_tail``:
 156
 157.. list-table::
 158   :widths: 8 8 24 40
 159   :header-rows: 1
 160
 161   * - Offset
 162     - Size
 163     - Name
 164     - Description
 165   * - 0x0
 166     - \_\_le32
 167     - det\_reserved\_zero1
 168     - Inode number, which must be zero.
 169   * - 0x4
 170     - \_\_le16
 171     - det\_rec\_len
 172     - Length of this directory entry, which must be 12.
 173   * - 0x6
 174     - \_\_u8
 175     - det\_reserved\_zero2
 176     - Length of the file name, which must be zero.
 177   * - 0x7
 178     - \_\_u8
 179     - det\_reserved\_ft
 180     - File type, which must be 0xDE.
 181   * - 0x8
 182     - \_\_le32
 183     - det\_checksum
 184     - Directory leaf block checksum.
 185
 186The leaf directory block checksum is calculated against the FS UUID, the
 187directory's inode number, the directory's inode generation number, and
 188the entire directory entry block up to (but not including) the fake
 189directory entry.
 190
 191Hash Tree Directories
 192~~~~~~~~~~~~~~~~~~~~~
 193
 194A linear array of directory entries isn't great for performance, so a
 195new feature was added to ext3 to provide a faster (but peculiar)
 196balanced tree keyed off a hash of the directory entry name. If the
 197EXT4\_INDEX\_FL (0x1000) flag is set in the inode, this directory uses a
 198hashed btree (htree) to organize and find directory entries. For
 199backwards read-only compatibility with ext2, this tree is actually
 200hidden inside the directory file, masquerading as “empty” directory data
 201blocks! It was stated previously that the end of the linear directory
 202entry table was signified with an entry pointing to inode 0; this is
 203(ab)used to fool the old linear-scan algorithm into thinking that the
 204rest of the directory block is empty so that it moves on.
 205
 206The root of the tree always lives in the first data block of the
 207directory. By ext2 custom, the '.' and '..' entries must appear at the
 208beginning of this first block, so they are put here as two
 209``struct ext4_dir_entry_2``\ s and not stored in the tree. The rest of
 210the root node contains metadata about the tree and finally a hash->block
 211map to find nodes that are lower in the htree. If
 212``dx_root.info.indirect_levels`` is non-zero then the htree has two
 213levels; the data block pointed to by the root node's map is an interior
 214node, which is indexed by a minor hash. Interior nodes in this tree
 215contains a zeroed out ``struct ext4_dir_entry_2`` followed by a
 216minor\_hash->block map to find leafe nodes. Leaf nodes contain a linear
 217array of all ``struct ext4_dir_entry_2``; all of these entries
 218(presumably) hash to the same value. If there is an overflow, the
 219entries simply overflow into the next leaf node, and the
 220least-significant bit of the hash (in the interior node map) that gets
 221us to this next leaf node is set.
 222
 223To traverse the directory as a htree, the code calculates the hash of
 224the desired file name and uses it to find the corresponding block
 225number. If the tree is flat, the block is a linear array of directory
 226entries that can be searched; otherwise, the minor hash of the file name
 227is computed and used against this second block to find the corresponding
 228third block number. That third block number will be a linear array of
 229directory entries.
 230
 231To traverse the directory as a linear array (such as the old code does),
 232the code simply reads every data block in the directory. The blocks used
 233for the htree will appear to have no entries (aside from '.' and '..')
 234and so only the leaf nodes will appear to have any interesting content.
 235
 236The root of the htree is in ``struct dx_root``, which is the full length
 237of a data block:
 238
 239.. list-table::
 240   :widths: 8 8 24 40
 241   :header-rows: 1
 242
 243   * - Offset
 244     - Type
 245     - Name
 246     - Description
 247   * - 0x0
 248     - \_\_le32
 249     - dot.inode
 250     - inode number of this directory.
 251   * - 0x4
 252     - \_\_le16
 253     - dot.rec\_len
 254     - Length of this record, 12.
 255   * - 0x6
 256     - u8
 257     - dot.name\_len
 258     - Length of the name, 1.
 259   * - 0x7
 260     - u8
 261     - dot.file\_type
 262     - File type of this entry, 0x2 (directory) (if the feature flag is set).
 263   * - 0x8
 264     - char
 265     - dot.name[4]
 266     - “.\\0\\0\\0”
 267   * - 0xC
 268     - \_\_le32
 269     - dotdot.inode
 270     - inode number of parent directory.
 271   * - 0x10
 272     - \_\_le16
 273     - dotdot.rec\_len
 274     - block\_size - 12. The record length is long enough to cover all htree
 275       data.
 276   * - 0x12
 277     - u8
 278     - dotdot.name\_len
 279     - Length of the name, 2.
 280   * - 0x13
 281     - u8
 282     - dotdot.file\_type
 283     - File type of this entry, 0x2 (directory) (if the feature flag is set).
 284   * - 0x14
 285     - char
 286     - dotdot\_name[4]
 287     - “..\\0\\0”
 288   * - 0x18
 289     - \_\_le32
 290     - struct dx\_root\_info.reserved\_zero
 291     - Zero.
 292   * - 0x1C
 293     - u8
 294     - struct dx\_root\_info.hash\_version
 295     - Hash type, see dirhash_ table below.
 296   * - 0x1D
 297     - u8
 298     - struct dx\_root\_info.info\_length
 299     - Length of the tree information, 0x8.
 300   * - 0x1E
 301     - u8
 302     - struct dx\_root\_info.indirect\_levels
 303     - Depth of the htree. Cannot be larger than 3 if the INCOMPAT\_LARGEDIR
 304       feature is set; cannot be larger than 2 otherwise.
 305   * - 0x1F
 306     - u8
 307     - struct dx\_root\_info.unused\_flags
 308     -
 309   * - 0x20
 310     - \_\_le16
 311     - limit
 312     - Maximum number of dx\_entries that can follow this header, plus 1 for
 313       the header itself.
 314   * - 0x22
 315     - \_\_le16
 316     - count
 317     - Actual number of dx\_entries that follow this header, plus 1 for the
 318       header itself.
 319   * - 0x24
 320     - \_\_le32
 321     - block
 322     - The block number (within the directory file) that goes with hash=0.
 323   * - 0x28
 324     - struct dx\_entry
 325     - entries[0]
 326     - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.
 327
 328.. _dirhash:
 329
 330The directory hash is one of the following values:
 331
 332.. list-table::
 333   :widths: 16 64
 334   :header-rows: 1
 335
 336   * - Value
 337     - Description
 338   * - 0x0
 339     - Legacy.
 340   * - 0x1
 341     - Half MD4.
 342   * - 0x2
 343     - Tea.
 344   * - 0x3
 345     - Legacy, unsigned.
 346   * - 0x4
 347     - Half MD4, unsigned.
 348   * - 0x5
 349     - Tea, unsigned.
 350   * - 0x6
 351     - Siphash.
 352
 353Interior nodes of an htree are recorded as ``struct dx_node``, which is
 354also the full length of a data block:
 355
 356.. list-table::
 357   :widths: 8 8 24 40
 358   :header-rows: 1
 359
 360   * - Offset
 361     - Type
 362     - Name
 363     - Description
 364   * - 0x0
 365     - \_\_le32
 366     - fake.inode
 367     - Zero, to make it look like this entry is not in use.
 368   * - 0x4
 369     - \_\_le16
 370     - fake.rec\_len
 371     - The size of the block, in order to hide all of the dx\_node data.
 372   * - 0x6
 373     - u8
 374     - name\_len
 375     - Zero. There is no name for this “unused” directory entry.
 376   * - 0x7
 377     - u8
 378     - file\_type
 379     - Zero. There is no file type for this “unused” directory entry.
 380   * - 0x8
 381     - \_\_le16
 382     - limit
 383     - Maximum number of dx\_entries that can follow this header, plus 1 for
 384       the header itself.
 385   * - 0xA
 386     - \_\_le16
 387     - count
 388     - Actual number of dx\_entries that follow this header, plus 1 for the
 389       header itself.
 390   * - 0xE
 391     - \_\_le32
 392     - block
 393     - The block number (within the directory file) that goes with the lowest
 394       hash value of this block. This value is stored in the parent block.
 395   * - 0x12
 396     - struct dx\_entry
 397     - entries[0]
 398     - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.
 399
 400The hash maps that exist in both ``struct dx_root`` and
 401``struct dx_node`` are recorded as ``struct dx_entry``, which is 8 bytes
 402long:
 403
 404.. list-table::
 405   :widths: 8 8 24 40
 406   :header-rows: 1
 407
 408   * - Offset
 409     - Type
 410     - Name
 411     - Description
 412   * - 0x0
 413     - \_\_le32
 414     - hash
 415     - Hash code.
 416   * - 0x4
 417     - \_\_le32
 418     - block
 419     - Block number (within the directory file, not filesystem blocks) of the
 420       next node in the htree.
 421
 422(If you think this is all quite clever and peculiar, so does the
 423author.)
 424
 425If metadata checksums are enabled, the last 8 bytes of the directory
 426block (precisely the length of one dx\_entry) are used to store a
 427``struct dx_tail``, which contains the checksum. The ``limit`` and
 428``count`` entries in the dx\_root/dx\_node structures are adjusted as
 429necessary to fit the dx\_tail into the block. If there is no space for
 430the dx\_tail, the user is notified to run e2fsck -D to rebuild the
 431directory index (which will ensure that there's space for the checksum.
 432The dx\_tail structure is 8 bytes long and looks like this:
 433
 434.. list-table::
 435   :widths: 8 8 24 40
 436   :header-rows: 1
 437
 438   * - Offset
 439     - Type
 440     - Name
 441     - Description
 442   * - 0x0
 443     - u32
 444     - dt\_reserved
 445     - Zero.
 446   * - 0x4
 447     - \_\_le32
 448     - dt\_checksum
 449     - Checksum of the htree directory block.
 450
 451The checksum is calculated against the FS UUID, the htree index header
 452(dx\_root or dx\_node), all of the htree indices (dx\_entry) that are in
 453use, and the tail block (dx\_tail).
 454