Ceph Distributed File System
============================

Ceph is a distributed network file system designed to provide good
performance, reliability, and scalability.

Basic features include:

 * POSIX semantics
 * Seamless scaling from 1 to many thousands of nodes
 * High availability and reliability.  No single point of failure.
 * N-way replication of data across storage nodes
 * Fast recovery from node failures
 * Automatic rebalancing of data on node addition/removal
 * Easy deployment: most FS components are userspace daemons
 * Flexible snapshots (on any directory)
 * Recursive accounting (nested files, directories, bytes)

In contrast to cluster filesystems like GFS, OCFS2, and GPFS, which rely
on symmetric access by all clients to shared block devices, Ceph
separates data and metadata management into independent server
clusters, similar to Lustre.  Unlike Lustre, however, metadata and
storage nodes run entirely as user space daemons.  Storage nodes
utilize btrfs to store data objects, leveraging its advanced features
(checksumming, metadata replication, etc.).  File data is striped
across storage nodes in large chunks to distribute workload and
facilitate high throughput.  When storage nodes fail, data is
re-replicated in a distributed fashion by the storage nodes themselves
(with some minimal coordination from a cluster monitor), making the
system extremely efficient and scalable.

Metadata servers effectively form a large, consistent, distributed
in-memory cache above the file namespace that is extremely scalable,
dynamically redistributes metadata in response to workload changes,
and can tolerate arbitrary (well, non-Byzantine) node failures.  The
metadata server takes a somewhat unconventional approach to metadata
storage to significantly improve performance for common workloads.  In
particular, inodes with only a single link are embedded in
directories, allowing entire directories of dentries and inodes to be
loaded into its cache with a single I/O operation.  The contents of
extremely large directories can be fragmented and managed by
independent metadata servers, allowing scalable concurrent access.

The system offers automatic data rebalancing/migration when scaling
from a small cluster of just a few nodes to many hundreds, without
requiring an administrator to carve the data set into static volumes or
go through the tedious process of migrating data between servers.
When the file system approaches capacity, new nodes can easily be
added and things will "just work."

Ceph includes a flexible snapshot mechanism that allows a user to
create a snapshot of any subdirectory (and its nested contents) in the
system.  Snapshot creation and deletion are as simple as 'mkdir
.snap/foo' and 'rmdir .snap/foo'.
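
For example, assuming the file system is mounted at /mnt/ceph and
contains a directory 'foo' (the directory and snapshot names here are
arbitrary), a session might look like:

 $ cd /mnt/ceph/foo
 $ mkdir .snap/mysnap        # snapshot foo and everything beneath it
 $ ls .snap
 mysnap
 $ rmdir .snap/mysnap        # remove the snapshot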

Ceph also provides some recursive accounting on directories for nested
files and bytes.  That is, a 'getfattr -d foo' on any directory in the
system will reveal the total number of nested regular files and
subdirectories, and a summation of all nested file sizes.  This makes
the identification of large disk space consumers relatively quick, as
no 'du' or similar recursive scan of the file system is required.
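
As an illustration, the output might look like the following (the
attribute names follow the client's ceph.dir.* virtual xattrs; the
path and values shown are made up, and the exact set of attributes may
vary by client version):

 $ getfattr -d -m 'ceph.' /mnt/ceph/some/dir
 # file: mnt/ceph/some/dir
 ceph.dir.rbytes="1024000000"
 ceph.dir.rfiles="42"
 ceph.dir.rsubdirs="3"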

Mount Syntax
============

The basic mount syntax is:

 # mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt

You only need to specify a single monitor, as the client will get the
full list when it connects.  (However, if the monitor you specify
happens to be down, the mount won't succeed.)  The port can be left
off if the monitor is using the default.  So if the monitor is at
1.2.3.4,

 # mount -t ceph 1.2.3.4:/ /mnt/ceph

is sufficient.  If /sbin/mount.ceph is installed, a hostname can be
used instead of an IP address.
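
A mount specifying several monitors and a subdirectory of the
namespace might look like this (the addresses and subdirectory are
illustrative):

 # mount -t ceph 1.2.3.4:6789,1.2.3.5:6789:/some/subdir /mnt/ceph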

Mount Options
=============

  ip=A.B.C.D[:N]
        Specify the IP and/or port the client should bind to locally.
        There is normally not much reason to do this.  If the IP is not
        specified, the client's IP address is determined by looking at the
        address its connection to the monitor originates from.

  wsize=X
        Specify the maximum write size in bytes.  By default there is no
        maximum.  Ceph will normally size writes based on the file stripe
        size.

  rsize=X
        Specify the maximum readahead size in bytes.

  mount_timeout=X
        Specify the timeout value for mount (in seconds), in the case
        of a non-responsive Ceph file system.  The default is 30
        seconds.

  rbytes
        When stat() is called on a directory, set st_size to 'rbytes',
        the summation of file sizes over all files nested beneath that
        directory.  This is the default.

  norbytes
        When stat() is called on a directory, set st_size to the
        number of entries in that directory.

  nocrc
        Disable CRC32C calculation for data writes.  If set, the storage node
        must rely on the TCP checksum to detect data corruption
        in the data payload.

  dcache
        Use the dcache contents to perform negative lookups and
        readdir when the client has the entire directory contents in
        its cache.  (This does not change correctness; the client uses
        cached metadata only when a lease or capability ensures it is
        valid.)

  nodcache
        Do not use the dcache as above.  This avoids a significant amount of
        complex code, sacrificing performance without affecting correctness,
        and is useful for tracking down bugs.

  noasyncreaddir
        Do not use the dcache as above for readdir.
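
Options are passed with -o in the usual way.  For example, to mount
with directory entry counts in st_size rather than recursive byte
totals (the monitor address is illustrative):

 # mount -t ceph 1.2.3.4:/ /mnt/ceph -o norbytes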

More Information
================

For more information on Ceph, see the home page at
        http://ceph.newdream.net/

The Linux kernel client source tree is available at
        git://
        git://

and the source for the full system is at
        git://