This file contains brief information about the SCSI tape driver.
The driver is currently maintained by Kai Mäkisara (email
Kai.Makisara@kolumbus.fi)

Last modified: Sun Feb 24 21:59:07 2008 by kai.makisara


BASICS

The driver is generic, i.e., it does not contain any code tailored
to any specific tape drive. The tape parameters can be specified with
one of the following three methods:

1. Each user can specify the tape parameters he/she wants to use
directly with ioctls. This is administratively a very simple and
flexible method and applicable to single-user workstations. However,
in a multiuser environment the next user finds the tape parameters in
the state the previous user left them.

2. The system manager (root) can define default values for some tape
parameters, like block size and density, using the MTSETDRVBUFFER ioctl.
These parameters can be programmed to come into effect either when a
new tape is loaded into the drive or if writing begins at the
beginning of the tape. The second alternative (defaults take effect
only when writing begins at the beginning of the tape) is applicable if
the tape drive performs auto-detection of the tape format well (like
some QIC-drives). The result is that any tape can be read, writing can
be continued using the existing format, and the default format is used
if the tape is rewritten from the beginning (or a new tape is written
for the first time). The first alternative (defaults take effect
whenever a new tape is loaded) is applicable if the drive does not
perform auto-detection well enough and there is a single "sensible"
mode for the device. An example is a DAT drive that is used only in
variable block mode (I don't know if this is sensible or not :-).

The user can override the parameters defined by the system
manager. The changes persist until the defaults again come into
effect.
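The second method can be scripted with mt, but the underlying call is simple. A minimal sketch, assuming the MTSETDRVBUFFER subcommand constants from <linux/mtio.h> and a descriptor for a tape device opened by root; st_drvbuffer_arg(), st_drvbuffer(), and st_set_defaults() are hypothetical helper names, not driver or mt functions.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/mtio.h>

/* Hypothetical helper: combine an MTSETDRVBUFFER subcommand (high bits)
 * with its value (low bits). The caller must keep `value` within the
 * field of the subcommand, e.g. <= MT_ST_CLEAR_DEFAULT for the
 * MT_ST_DEF_DENSITY family. */
static unsigned int st_drvbuffer_arg(unsigned int subcmd, unsigned int value)
{
    return subcmd | value;
}

/* Issue one MTSETDRVBUFFER subcommand. Returns 0 on success, or -1 with
 * errno set (typically EPERM when not run as root, EBADF for a bad fd). */
static int st_drvbuffer(int fd, unsigned int subcmd, unsigned int value)
{
    struct mtop op;

    op.mt_op = MTSETDRVBUFFER;
    /* mt_count is an int; the subcommand code occupies the high bits,
     * so this conversion relies on two's-complement wrap-around. */
    op.mt_count = (int)st_drvbuffer_arg(subcmd, value);
    return ioctl(fd, MTIOCTOP, &op);
}

/* Make `blksize` and the 8-bit density code `density` the defaults of
 * the mode that `fd` refers to. */
static int st_set_defaults(int fd, unsigned int blksize, unsigned int density)
{
    if (st_drvbuffer(fd, MT_ST_DEF_BLKSIZE, blksize) < 0)
        return -1;
    return st_drvbuffer(fd, MT_ST_DEF_DENSITY, density & 0xff);
}
```

Whether the defaults take effect on tape load or only when writing starts at the beginning of the tape is selected separately with the MT_ST_DEF_WRITES option.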

3. By default, up to four modes can be defined and selected using the minor
number (bits 5 and 6). The number of modes can be changed by changing
ST_NBR_MODE_BITS in st.h. Mode 0 corresponds to the defaults discussed
above. Additional modes are dormant until they are defined by the
system manager (root). When specification of a new mode is started,
the configuration of mode 0 is used to provide a starting point for
definition of the new mode.

Using the modes allows the system manager to give the users choices
over some of the buffering parameters not directly accessible to the
users (buffered and asynchronous writes). The modes also allow choices
between formats in multi-tape operations (the explicitly overridden
parameters are reset when a new tape is loaded).

If more than one mode is used, all modes should contain definitions
for the same set of parameters.

Many Unices contain internal tables that associate different modes with
supported devices. The Linux SCSI tape driver does not contain such
tables (and will not do so in the future). Instead, a utility
program can be made that fetches the inquiry data sent by the device,
scans its database, and sets up the modes using the ioctls. Another
alternative is to make a small script that uses mt to set the defaults
tailored to the system.

The driver supports fixed and variable block size (within buffer
limits). Both the auto-rewind (minor equals device number) and
non-rewind devices (minor is 128 + device number) are implemented.

In variable block mode, the byte count in write() determines the size
of the physical block on tape. When reading, the drive reads the next
tape block and returns to the user the data if the read() byte count
is at least the block size. Otherwise, error ENOMEM is returned.

In fixed block mode, the data transfer between the drive and the
driver is in multiples of the block size. The write() byte count must
be a multiple of the block size. This is not required when reading but
may be advisable for portability.
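Since the write() byte count must be a multiple of the block size in fixed block mode, an application with a trailing partial block has to pad it itself. A minimal sketch, assuming zero padding is acceptable to whoever reads the tape later (the reader must know the real data length); write_fixed() is a hypothetical helper.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to a tape in fixed block mode
 * with block size `blk`. Complete blocks go out in one write(); a final
 * partial block is copied into a zero-filled block so that every write()
 * byte count is a multiple of the block size. Returns the number of bytes
 * written (including padding) or -1 with errno set. */
static ssize_t write_fixed(int fd, const void *data, size_t len, size_t blk)
{
    const unsigned char *p = data;
    size_t whole;
    ssize_t total = 0;

    if (blk == 0) {
        errno = EINVAL;
        return -1;
    }
    whole = len - len % blk;              /* bytes in complete blocks */
    if (whole > 0) {
        ssize_t n = write(fd, p, whole);

        if (n < 0)
            return -1;
        if ((size_t)n != whole)
            return n;                     /* short write, e.g. near EOM */
        total = n;
    }
    if (len > whole) {                    /* pad the trailing partial block */
        unsigned char *pad = calloc(1, blk);
        ssize_t n;

        if (pad == NULL)
            return -1;
        memcpy(pad, p + whole, len - whole);
        n = write(fd, pad, blk);
        free(pad);
        if (n < 0)
            return -1;
        total += n;
    }
    return total;
}
```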

Support is provided for changing the tape partition and partitioning
of the tape with one or two partitions. By default support for
partitioned tape is disabled for each driver and it can be enabled
with the ioctl MTSETDRVBUFFER.

By default the driver writes one filemark when the device is closed after
writing and the last operation has been a write. Two filemarks can be
optionally written. In both cases end of data is signified by
returning zero bytes for two consecutive reads.
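The end-of-data rule above gives a simple way to walk a tape. A minimal sketch, assuming the tape is positioned at the beginning of a file; count_tape_files() is a hypothetical helper. The buffer must be at least as large as the largest block on the tape, because in variable block mode a smaller read() byte count fails with ENOMEM.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: count the files from the current position by
 * reading until two consecutive read() calls return zero bytes.
 * Returns the number of files, or -1 with errno set (ENOMEM means the
 * buffer is smaller than a block in variable block mode). */
static long count_tape_files(int fd, void *buf, size_t cap)
{
    long files = 0;
    int zeros = 0;

    for (;;) {
        ssize_t n = read(fd, buf, cap);

        if (n < 0)
            return -1;
        if (n == 0) {
            if (++zeros == 2)
                break;            /* second zero read in a row: end of data */
            files++;              /* first zero read: a filemark was crossed */
        } else {
            zeros = 0;            /* data block: still inside a file */
        }
    }
    return files;
}
```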

If rewind, offline, bsf, or seek is done and the previous tape operation
was a write, a filemark is written before moving the tape.

The compile options are defined in the file linux/drivers/scsi/st_options.h.

4. If the open option O_NONBLOCK is used, open succeeds even if the
drive is not ready. If O_NONBLOCK is not used, the driver waits for
the drive to become ready. If this does not happen in ST_BLOCK_SECONDS
seconds, open fails with the errno value EIO. With O_NONBLOCK the
device can be opened for writing even if there is a write protected
tape in the drive (commands trying to write something return error if
attempted).
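A minimal sketch of opening without blocking and then checking whether the drive is ready, assuming <linux/mtio.h> and that the driver answers MTIOCGET even when no tape is loaded; open_tape_nowait() is a hypothetical helper.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/mtio.h>

/* Hypothetical helper: open `path` without waiting for the drive to become
 * ready, then ask the driver whether a tape is online (GMT_ONLINE).
 * Returns the descriptor and stores 1 or 0 in *online, or returns -1 if
 * the open or the status query fails. */
static int open_tape_nowait(const char *path, int for_writing, int *online)
{
    struct mtget status;
    int fd = open(path, (for_writing ? O_RDWR : O_RDONLY) | O_NONBLOCK);

    if (fd < 0)
        return -1;
    if (ioctl(fd, MTIOCGET, &status) < 0) {   /* e.g. not a tape device */
        close(fd);
        return -1;
    }
    *online = GMT_ONLINE(status.mt_gstat) ? 1 : 0;
    return fd;
}
```

A program can use the online flag to report a missing tape at once instead of waiting up to ST_BLOCK_SECONDS.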


MINOR NUMBERS

The tape driver currently supports 128 drives by default. This number
can be increased by editing st.h and recompiling the driver if
necessary. The upper limit is 2^17 drives if 4 modes for each drive
are used.

The minor numbers consist of the following bit fields:

dev_upper non-rew mode dev-lower
  20 -  8     7    6 5  4      0

The non-rewind bit is always bit 7 (the uppermost bit in the lowermost
byte). The bits defining the mode are below the non-rewind bit. The
remaining bits define the tape device number. This numbering is
backward compatible with the numbering used when the minor number was
only 8 bits wide.
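The layout can be expressed as a pair of helper functions. A minimal sketch, assuming the default of four modes (two mode bits); st_minor(), st_minor_to_tape(), and st_minor_to_mode() are hypothetical names, and combining the minor with the major number is left to the caller.

```c
/* Hypothetical helpers for the minor number layout, assuming four modes
 * (two mode bits): dev_lower in bits 0-4, mode in bits 5-6, the
 * non-rewind flag in bit 7, and dev_upper from bit 8 upward. */
static unsigned int st_minor(unsigned int tape, unsigned int mode, int norewind)
{
    return ((tape >> 5) << 8)            /* dev_upper */
         | (norewind ? 0x80u : 0u)       /* non-rewind bit */
         | ((mode & 0x3u) << 5)          /* mode */
         | (tape & 0x1fu);               /* dev_lower */
}

static unsigned int st_minor_to_tape(unsigned int minor)
{
    return ((minor >> 8) << 5) | (minor & 0x1fu);
}

static unsigned int st_minor_to_mode(unsigned int minor)
{
    return (minor >> 5) & 0x3u;
}
```

For the first tape, st_minor(0, 0, 1) is 128, which matches the rule in BASICS that the non-rewind minor is 128 + device number.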


SYSFS SUPPORT

The driver creates the directory /sys/class/scsi_tape and populates it with
directories corresponding to the existing tape devices. There are autorewind
and non-rewind entries for each mode. The names are stxy and nstxy, where x
is the tape number and y a character corresponding to the mode (none, l, m,
a). For example, the directories for the first tape device are (assuming four
modes): st0  nst0  st0l  nst0l  st0m  nst0m  st0a  nst0a.

Each directory contains the entries: default_blksize  default_compression
default_density  defined  dev  device  driver. The file 'defined' contains 1
if the mode is defined and zero if not defined. The files 'default_*' contain
the defaults set by the user. The value -1 means the default is not set. The
file 'dev' contains the device numbers corresponding to this device. The links
'device' and 'driver' point to the SCSI device and driver entries.

Each directory also contains the entry 'options' which shows the currently
enabled driver and mode options. The value in the file is a bit mask where the
bit definitions are the same as those used with MTSETDRVBUFFER in setting the
options.
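The naming scheme and the 'options' file can be used from a program. A minimal sketch, assuming sysfs is mounted at /sys; st_sysfs_name() and st_read_options() are hypothetical helpers. The options file is parsed with strtoul() and base 0 so that either a decimal or a 0x-prefixed hexadecimal value is accepted, without assuming the exact print format.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build the class directory name (stxy or nstxy)
 * for tape number `tape`, mode 0-3, and the rewind flavor. */
static int st_sysfs_name(char *out, size_t cap, unsigned int tape,
                         unsigned int mode, int norewind)
{
    static const char *const suffix[4] = { "", "l", "m", "a" };
    int n;

    if (mode > 3)
        return -1;
    n = snprintf(out, cap, "%sst%u%s", norewind ? "n" : "", tape, suffix[mode]);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Hypothetical helper: read the 'options' bit mask of one mode directory.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int st_read_options(const char *name, unsigned long *opts)
{
    char path[256], line[64];
    FILE *f;
    int n = snprintf(path, sizeof path, "/sys/class/scsi_tape/%s/options", name);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    *opts = strtoul(line, NULL, 0);
    return 0;
}
```

The returned mask can then be tested against the MT_ST_* option bits from <linux/mtio.h>, e.g. (opts & MT_ST_CAN_PARTITIONS).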

A link named 'tape' is made from the SCSI device directory to the class
directory corresponding to the mode 0 auto-rewind device (e.g., st0).


BSD AND SYS V SEMANTICS

The user can choose between these two behaviours of the tape driver by
defining the value of the symbol ST_SYSV. The semantics differ when a
file being read is closed. The BSD semantics leaves the tape where it
currently is whereas the SYS V semantics moves the tape past the next
filemark unless the filemark has just been crossed.

The default is BSD semantics.


BUFFERING

The driver tries to do transfers directly to/from user space. If this
is not possible, a driver buffer allocated at run-time is used. If
direct i/o is not possible for the whole transfer, the driver buffer
is used (i.e., bounce buffers for individual pages are not
used). Direct i/o can be impossible for several reasons, e.g.:
- one or more pages are at addresses not reachable by the HBA
- the number of pages in the transfer exceeds the number of
  scatter/gather segments permitted by the HBA
- one or more pages can't be locked into memory (should not happen in
  any reasonable situation)

The size of the driver buffers is always at least one tape block. In fixed
block mode, the minimum buffer size is defined (in 1024 byte units) by
ST_FIXED_BUFFER_BLOCKS. With small block size this allows buffering of
several blocks and using one SCSI read or write to transfer all of the
blocks. Buffering of data across write calls in fixed block mode is
allowed if ST_BUFFER_WRITES is non-zero and direct i/o is not used.
Buffer allocation uses chunks of memory having sizes 2^n * (page
size). Because of this the actual buffer size may be larger than the
minimum allowable buffer size.
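The effect of the 2^n * (page size) rule can be estimated with a few lines of arithmetic. A sketch under the simplifying assumption that the buffer comes from a single chunk (scatter/gather allocation, described below, may use several); the function names and the example values (e.g., 32 KiB, 4096-byte pages) are illustrative, not the driver's code or defaults.

```c
#include <stddef.h>

/* Sketch: smallest 2^n * page_size chunk holding at least `min_bytes`.
 * Returns 0 if page_size is 0 or the size would overflow. */
static size_t st_chunk_size(size_t min_bytes, size_t page_size)
{
    size_t size = page_size;

    if (page_size == 0)
        return 0;
    while (size < min_bytes) {
        if (size > (size_t)-1 / 2)
            return 0;               /* doubling would overflow */
        size <<= 1;
    }
    return size;
}

/* Minimum fixed block mode buffer: at least one tape block and at least
 * `fixed_kb` KiB (the role played by ST_FIXED_BUFFER_BLOCKS). */
static size_t st_fixed_buffer_size(size_t fixed_kb, size_t block_size,
                                   size_t page_size)
{
    size_t min_bytes = fixed_kb * 1024;

    if (block_size > min_bytes)
        min_bytes = block_size;
    return st_chunk_size(min_bytes, page_size);
}
```

For example, with 4096-byte pages a 10 KiB minimum becomes a 16 KiB chunk.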

NOTE that if direct i/o is used, small writes are not buffered. This may
come as a surprise when moving from 2.4. There, small writes (e.g., tar
without the -b option) may have had good throughput, but this is no longer
true with 2.6. Direct i/o can be turned off to solve this problem, but a
better solution is to use bigger write() byte counts (e.g., tar -b 64).
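Applications that produce small records can get the larger write() byte counts recommended above by batching in user space. A minimal sketch; struct batch, batch_write(), and batch_flush() are hypothetical, and the 32 KiB size simply mirrors tar -b 64 (64 blocks of 512 bytes). In variable block mode each write() becomes one tape block, so a reader needs a buffer of at least this size; in fixed block mode the final flush must still be a multiple of the block size.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical user-space batching buffer: 64 * 512 bytes = 32 KiB. */
#define BATCH_BYTES (64 * 512)

struct batch {
    int fd;
    size_t used;
    unsigned char buf[BATCH_BYTES];
};

/* Write out whatever has been collected. Returns 0 or -1. */
static int batch_flush(struct batch *b)
{
    ssize_t n;

    if (b->used == 0)
        return 0;
    n = write(b->fd, b->buf, b->used);
    if (n < 0 || (size_t)n != b->used)
        return -1;
    b->used = 0;
    return 0;
}

/* Append `len` bytes, issuing a full-size write() whenever the buffer fills. */
static int batch_write(struct batch *b, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len > 0) {
        size_t room = BATCH_BYTES - b->used;
        size_t take = len < room ? len : room;

        memcpy(b->buf + b->used, p, take);
        b->used += take;
        p += take;
        len -= take;
        if (b->used == BATCH_BYTES && batch_flush(b) < 0)
            return -1;
    }
    return 0;
}
```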

Asynchronous writing. Writing the buffer contents to the tape is
started and the write call returns immediately. The status is checked
at the next tape operation. Asynchronous writes are not done with
direct i/o and not in fixed block mode.

Buffered writes and asynchronous writes may in some rare cases cause
problems in multivolume operations if there is not enough space on the
tape after the early-warning mark to flush the driver buffer.

Read ahead for fixed block mode (ST_READ_AHEAD). Filling the buffer is
attempted even if the user does not want to get all of the data at
this read command. Read ahead should be disabled for drives that don't
like a filemark truncating a read request or that don't like backspacing.

Scatter/gather buffers (buffers that consist of non-contiguous chunks
of physical memory) are used if contiguous buffers can't be
allocated. To support all SCSI adapters (including those not
supporting scatter/gather), buffer allocation uses the following
three kinds of chunks:
1. The initial segment that is used for all SCSI adapters including
those not supporting scatter/gather. The size of this buffer will be
(PAGE_SIZE << ST_FIRST_ORDER) bytes if the system can give a chunk of
this size (and it is not larger than the buffer size specified by
ST_BUFFER_BLOCKS). If this size is not available, the driver halves
the size and tries again, down to the size of one page. The default
settings in st_options.h make the driver try to allocate all of the
buffer as one chunk.
2. The scatter/gather segments to fill the specified buffer size are
allocated so that as many segments as possible are used but the number
of segments does not exceed ST_FIRST_SG.
3. The remaining segments between ST_MAX_SG (or the module parameter
max_sg_segs) and the number of segments used in phases 1 and 2
are used to extend the buffer at run-time if this is necessary. The
number of scatter/gather segments allowed for the SCSI adapter is not
exceeded if it is smaller than the maximum number of scatter/gather
segments specified. If the maximum number allowed for the SCSI adapter
is smaller than the number of segments used in phases 1 and 2,
extending the buffer will always fail.


EOM BEHAVIOUR WHEN WRITING

When the end of medium early warning is encountered, the current write
is finished and the number of bytes is returned. The next write
returns -1 and errno is set to ENOSPC. To enable writing a trailer,
the next write is allowed to proceed and, if successful, the number of
bytes is returned. After this, -1 and the number of bytes are
alternately returned until the physical end of medium (or some other
error) is encountered.
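A writer that wants to put a trailer on each volume can follow the sequence above: keep writing until a write fails with ENOSPC, then use the one extra write for the trailer. A minimal sketch; write_with_trailer() is a hypothetical helper, and what the trailer contains and how the next volume is requested are left to the application.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes in records of at most `rec`
 * bytes. If a write fails with ENOSPC (early warning passed), the write
 * the driver then allows is used for `trailer`. Returns 0 when all data
 * was written, 1 when the early warning was hit and the trailer was
 * written (continue on the next volume from data + *done), or -1 on
 * any other error. */
static int write_with_trailer(int fd, const char *data, size_t len, size_t rec,
                              const char *trailer, size_t tlen, size_t *done)
{
    *done = 0;
    if (rec == 0) {
        errno = EINVAL;
        return -1;
    }
    while (*done < len) {
        size_t left = len - *done;
        ssize_t n = write(fd, data + *done, left < rec ? left : rec);

        if (n < 0) {
            if (errno != ENOSPC)
                return -1;
            /* Early warning: one more write is allowed for the trailer. */
            if (write(fd, trailer, tlen) != (ssize_t)tlen)
                return -1;
            return 1;
        }
        if (n == 0) {
            errno = EIO;          /* no progress; treated as an error here */
            return -1;
        }
        *done += (size_t)n;
    }
    return 0;
}
```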


MODULE PARAMETERS

The buffer size, write threshold, and the maximum number of allocated buffers
are configurable when the driver is loaded as a module. The keywords are:

buffer_kbs=xxx             the buffer size for fixed block mode is set
                           to xxx kilobytes
write_threshold_kbs=xxx    the write threshold is set to xxx kilobytes
max_sg_segs=xxx            the maximum number of scatter/gather
                           segments
try_direct_io=x            try direct transfer between user buffer and
                           tape drive if this is non-zero

Note that if the buffer size is changed but the write threshold is not
set, the write threshold is set to the new buffer size - 2 kB.


BOOT TIME CONFIGURATION

If the driver is compiled into the kernel, the same parameters can
also be set using, e.g., the LILO command line. The preferred syntax is
to use the same keyword used when loading as module but prepended
with 'st.'. For instance, to set the maximum number of scatter/gather
segments, the parameter 'st.max_sg_segs=xx' should be used (xx is the
number of scatter/gather segments).

For compatibility, the old syntax from early 2.5 and 2.4 kernel
versions is supported. The same keywords can be used as when loading
the driver as module. If several parameters are set, the keyword-value
pairs are separated with a comma (no spaces allowed). A colon can be
used instead of the equals sign. The definition is prepended by the
string st=. Here is an example:

        st=buffer_kbs:64,write_threshold_kbs:60

The following syntax used by the old kernel versions is also supported:

           st=aa[,bb[,dd]]

where
  aa is the buffer size for fixed block mode in 1024 byte units
  bb is the write threshold in 1024 byte units
  dd is the maximum number of scatter/gather segments
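To make the keyword form concrete, the following sketch parses such a string and applies the write threshold rule from the module parameters section. It is an illustration only, not the driver's own parser; struct st_params, is_keyword(), and parse_st_options() are hypothetical, and the positional form st=aa[,bb[,dd]] is not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three keywords; not a driver structure. */
struct st_params {
    long buffer_kbs;
    long write_threshold_kbs;
    long max_sg_segs;
};

static int is_keyword(const char *s, size_t len, const char *name)
{
    return strlen(name) == len && strncmp(s, name, len) == 0;
}

/* Parse e.g. "buffer_kbs:64,write_threshold_kbs:60" (':' or '=' accepted).
 * Keywords not mentioned keep the values already in *p. If buffer_kbs is
 * given without write_threshold_kbs, the threshold becomes buffer_kbs - 2.
 * Returns 0, or -1 on a malformed string (*p may be partially updated). */
static int parse_st_options(const char *s, struct st_params *p)
{
    int buffer_set = 0, threshold_set = 0;

    while (*s != '\0') {
        const char *sep = s + strcspn(s, ":=,");
        size_t klen = (size_t)(sep - s);
        char *end;
        long v;

        if (*sep != ':' && *sep != '=')
            return -1;                          /* keyword without a value */
        v = strtol(sep + 1, &end, 10);
        if (end == sep + 1 || (*end != ',' && *end != '\0'))
            return -1;                          /* missing or bad number */

        if (is_keyword(s, klen, "buffer_kbs")) {
            p->buffer_kbs = v;
            buffer_set = 1;
        } else if (is_keyword(s, klen, "write_threshold_kbs")) {
            p->write_threshold_kbs = v;
            threshold_set = 1;
        } else if (is_keyword(s, klen, "max_sg_segs")) {
            p->max_sg_segs = v;
        } else {
            return -1;                          /* unknown keyword */
        }
        s = (*end == ',') ? end + 1 : end;
    }
    if (buffer_set && !threshold_set)
        p->write_threshold_kbs = p->buffer_kbs - 2;
    return 0;
}
```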


IOCTLS

The tape is positioned and the drive parameters are set with ioctls
defined in mtio.h. The tape control program 'mt' uses these ioctls. Try
to find an mt that supports all of the Linux SCSI tape ioctls and
opens the device for writing if the tape contents will be modified
(look for a package mt-st* from the Linux ftp sites; the GNU mt does
not open for writing for, e.g., erase).
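Most of the commands listed below are issued by filling a struct mtop and calling ioctl(fd, MTIOCTOP, ...). A minimal sketch, assuming <linux/mtio.h> provides the MT* codes; tape_op() is a hypothetical wrapper, not part of mt or the driver.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/mtio.h>

/* Hypothetical wrapper: issue one MTIOCTOP command. `op` is an MT* code
 * from <linux/mtio.h> (MTFSF, MTBSF, MTWEOF, MTREW, ...) and `count` is
 * its repeat count or argument. Returns 0 or -1 with errno set. */
static int tape_op(int fd, short op, int count)
{
    struct mtop cmd;

    cmd.mt_op = op;
    cmd.mt_count = count;
    return ioctl(fd, MTIOCTOP, &cmd);
}
```

For example, tape_op(fd, MTFSF, 2) spaces forward over two filemarks and tape_op(fd, MTWEOF, 1) writes one filemark.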

The supported ioctls are:

The following use the structure mtop:

MTFSF   Space forward over count filemarks. Tape positioned after filemark.
MTFSFM  As above but tape positioned before filemark.
MTBSF   Space backward over count filemarks. Tape positioned before
        filemark.
MTBSFM  As above but tape positioned after filemark.
MTFSR   Space forward over count records.
MTBSR   Space backward over count records.
MTFSS   Space forward over count setmarks.
MTBSS   Space backward over count setmarks.
MTWEOF  Write count filemarks.
MTWSM   Write count setmarks.
MTREW   Rewind tape.
MTOFFL  Set device off line (often rewind plus eject).
MTNOP   Do nothing except flush the buffers.
MTRETEN Re-tension tape.
MTEOM   Space to end of recorded data.
MTERASE Erase tape. If the argument is zero, the short erase command
        is used. The long erase command is used with all other values
        of the argument.
MTSEEK  Seek to tape block count. Uses Tandberg-compatible seek (QFA)
        for SCSI-1 drives and SCSI-2 seek for SCSI-2 drives. The file and
        block numbers in the status are not valid after a seek.
MTSETBLK Set the drive block size. Setting to zero sets the drive into
        variable block mode (if applicable).
MTSETDENSITY Sets the drive density code to arg. See drive
        documentation for available codes.
MTLOCK and MTUNLOCK Explicitly lock/unlock the tape drive door.
MTLOAD and MTUNLOAD Explicitly load and unload the tape. If the
        command argument x is between MT_ST_HPLOADER_OFFSET + 1 and
        MT_ST_HPLOADER_OFFSET + 6, the number x is sent to the
        drive with the command and it selects the tape slot to use in
        the HP C1553A changer.
MTCOMPRESSION Sets compressing or uncompressing drive mode using the
        SCSI mode page 15. Note that some drives use other methods for
        control of compression. Some drives (like the Exabytes) use
        density codes for compression control. Some drives use another
        mode page but this page has not been implemented in the
        driver. Some drives without compression capability will accept
        any compression mode without error.
MTSETPART Moves the tape to the partition given by the argument at the
        next tape operation. The block at which the tape is positioned
        is the block where the tape was previously positioned in the
        new active partition unless the next tape operation is
        MTSEEK. In this case the tape is moved directly to the block
        specified by MTSEEK. MTSETPART is inactive unless
        MT_ST_CAN_PARTITIONS is set.
MTMKPART Formats the tape with one partition (argument zero) or two
        partitions (the argument gives in megabytes the size of
        partition 1 that is physically the first partition of the
        tape). The drive has to support partitions with size specified
        by the initiator. Inactive unless MT_ST_CAN_PARTITIONS is set.
MTSETDRVBUFFER
        Is used for several purposes. The command is obtained from count
        with mask MT_ST_OPTIONS; the low order bits are used as argument.
        This command is only allowed for the superuser (root). The
        subcommands are:
        0
           The drive buffer option is set to the argument. Zero means
           no buffering.
        MT_ST_BOOLEANS
           Sets the buffering options. The bits are the new states
           (enabled/disabled) of the following options (the text in
           parentheses tells whether the option is global or can be
           specified differently for each mode):
             MT_ST_BUFFER_WRITES write buffering (mode)
             MT_ST_ASYNC_WRITES asynchronous writes (mode)
             MT_ST_READ_AHEAD  read ahead (mode)
             MT_ST_TWO_FM writing of two filemarks (global)
             MT_ST_FAST_EOM using the SCSI spacing to EOD (global)
             MT_ST_AUTO_LOCK automatic locking of the drive door (global)
             MT_ST_DEF_WRITES the defaults are meant only for writes (mode)
             MT_ST_CAN_BSR backspacing over more than one record can
                be used for repositioning the tape (global)
             MT_ST_NO_BLKLIMS the driver does not ask the block limits
                from the drive (block size can be changed only to
                variable) (global)
             MT_ST_CAN_PARTITIONS enables support for partitioned
                tapes (global)
             MT_ST_SCSI2LOGICAL the logical block number is used in
                the MTSEEK and MTIOCPOS for SCSI-2 drives instead of
                the device dependent address. It is recommended to set
                this flag unless there are tapes using the device
                dependent addresses (from the old times) (global)
             MT_ST_SYSV sets the SYSV semantics (mode)
             MT_ST_NOWAIT enables immediate mode (i.e., don't wait for
                the command to finish) for some commands (e.g., rewind)
             MT_ST_SILI enables setting the SILI bit in SCSI commands when
                reading in variable block mode to enhance performance when
                reading blocks shorter than the byte count; set this only
                if you are sure that the drive supports SILI and the HBA
                correctly returns transfer residuals
             MT_ST_DEBUGGING debugging (global; debugging must be
                compiled into the driver)
        MT_ST_SETBOOLEANS
        MT_ST_CLEARBOOLEANS
           Sets or clears the option bits.
        MT_ST_WRITE_THRESHOLD
           Sets the write threshold for this device to kilobytes
           specified by the lowest bits.
        MT_ST_DEF_BLKSIZE
           Defines the default block size set automatically. Value
           0xffffff means that the default is not used any more.
        MT_ST_DEF_DENSITY
        MT_ST_DEF_DRVBUFFER
           Used to set or clear the density (8 bits), and drive buffer
           state (3 bits). If the value is MT_ST_CLEAR_DEFAULT
           (0xfffff) the default will not be used any more. Otherwise
           the lowermost bits of the value contain the new value of
           the parameter.
        MT_ST_DEF_COMPRESSION
           The compression default will not be used if the value of
           the lowermost byte is 0xff. Otherwise the lowermost bit
           contains the new default. If the bits 8-15 are set to a
           non-zero number, and this number is not 0xff, the number is
           used as the compression algorithm. The value
           MT_ST_CLEAR_DEFAULT can be used to clear the compression
           default.
        MT_ST_SET_TIMEOUT
           Set the normal timeout in seconds for this device. The
           default is 900 seconds (15 minutes). The timeout should be
           long enough for the retries done by the device while
           reading/writing.
        MT_ST_SET_LONG_TIMEOUT
           Set the long timeout that is used for operations that are
           known to take a long time. The default is 14000 seconds
           (3.9 hours). For erase this value is further multiplied by
           eight.
        MT_ST_SET_CLN
           Set the cleaning request interpretation parameters using
           the lowest 24 bits of the argument. The driver can set the
           generic status bit GMT_CLN if a cleaning request bit pattern
           is found from the extended sense data. Many drives set one or
           more bits in the extended sense data when the drive needs
           cleaning. The bits are device-dependent. The driver is
           given the number of the sense data byte (the lowest eight
           bits of the argument; must be >= 18 (values 1 - 17
           reserved) and <= the maximum requested sense data size),
           a mask to select the relevant bits (bits 8-15), and the
           bit pattern (bits 16-23). If the bit pattern is zero, one
           or more bits under the mask indicate cleaning request. If
           the pattern is non-zero, the pattern must match the masked
           sense data byte.

           (The cleaning bit is set if the additional sense code and
           qualifier 00h 17h are seen regardless of the setting of
           MT_ST_SET_CLN.)
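A minimal sketch of assembling MTSETDRVBUFFER arguments, assuming the constants from <linux/mtio.h>; st_setbooleans_arg(), st_set_cln_arg(), and st_setdrvbuffer() are hypothetical helpers. For MT_ST_SET_CLN it assumes the sense byte number in bits 0-7, the mask in bits 8-15, and the pattern in bits 16-23 (counting from bit 0); the byte, mask, and pattern for a given drive must come from its documentation.

```c
#include <sys/ioctl.h>
#include <linux/mtio.h>

/* Set (without clearing others) the option bits in `bits`. */
static unsigned int st_setbooleans_arg(unsigned int bits)
{
    return MT_ST_SETBOOLEANS | (bits & ~MT_ST_OPTIONS);
}

/* Cleaning request rule, assuming: sense byte number in bits 0-7
 * (>= 18), mask in bits 8-15, pattern in bits 16-23. */
static unsigned int st_set_cln_arg(unsigned int sense_byte, unsigned int mask,
                                   unsigned int pattern)
{
    return MT_ST_SET_CLN | (sense_byte & 0xffu) | ((mask & 0xffu) << 8)
         | ((pattern & 0xffu) << 16);
}

/* Pass a prepared argument to the driver (root only). */
static int st_setdrvbuffer(int fd, unsigned int arg)
{
    struct mtop op;

    op.mt_op = MTSETDRVBUFFER;
    op.mt_count = (int)arg;     /* high bits wrap into the int field */
    return ioctl(fd, MTIOCTOP, &op);
}
```

For example, st_setdrvbuffer(fd, st_setbooleans_arg(MT_ST_CAN_PARTITIONS)) enables the partition support that MTSETPART and MTMKPART require.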

The following ioctl uses the structure mtpos:
MTIOCPOS Reads the current position from the drive. Uses
        Tandberg-compatible QFA for SCSI-1 drives and the SCSI-2
        command for the SCSI-2 drives.
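MTIOCPOS pairs naturally with MTSEEK to return to a remembered position. A minimal sketch, assuming <linux/mtio.h>; tape_tell() and tape_seek() are hypothetical helpers. Whether the number is a logical block or a device-dependent address depends on the drive and on MT_ST_SCSI2LOGICAL, so it should only be fed back to the same drive.

```c
#include <sys/ioctl.h>
#include <linux/mtio.h>

/* Read the current block position. Returns 0 or -1 with errno set;
 * *block is left unchanged on failure. */
static int tape_tell(int fd, long *block)
{
    struct mtpos pos;

    if (ioctl(fd, MTIOCPOS, &pos) < 0)
        return -1;
    *block = pos.mt_blkno;
    return 0;
}

/* Seek back to a position previously returned by tape_tell(). */
static int tape_seek(int fd, long block)
{
    struct mtop op;

    op.mt_op = MTSEEK;
    op.mt_count = (int)block;
    return ioctl(fd, MTIOCTOP, &op);
}
```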

The following ioctl uses the structure mtget to return the status:
MTIOCGET Returns some status information.
        The file number and block number within file are returned. The
        block is -1 when it can't be determined (e.g., after MTBSF).
        The drive type is either MTISSCSI1 or MTISSCSI2.
        The number of recovered errors since the previous status call
        is stored in the lower word of the field mt_erreg.
        The current block size and the density code are stored in the field
        mt_dsreg (shifts for the subfields are MT_ST_BLKSIZE_SHIFT and
        MT_ST_DENSITY_SHIFT).
        The GMT_xxx status bits reflect the drive status. GMT_DR_OPEN
        is set if there is no tape in the drive. GMT_EOD means either
        end of recorded data or end of tape. GMT_EOT means end of tape.
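The fields described above can be unpacked as follows. A minimal sketch, assuming the masks, shifts, and GMT_* macros from <linux/mtio.h> (MT_ST_BLKSIZE_MASK, MT_ST_DENSITY_MASK, GMT_EOD, GMT_EOT, GMT_DR_OPEN); struct tape_status, decode_mtget(), and read_tape_status() are hypothetical.

```c
#include <sys/ioctl.h>
#include <linux/mtio.h>

/* Hypothetical decoded view of struct mtget. */
struct tape_status {
    long file, block;               /* block is -1 when unknown */
    unsigned long blksize, density;
    unsigned long soft_errors;      /* lower word of mt_erreg */
    int at_eod, at_eot, no_tape;
};

static void decode_mtget(const struct mtget *g, struct tape_status *s)
{
    unsigned long ds = (unsigned long)g->mt_dsreg;

    s->file = g->mt_fileno;
    s->block = g->mt_blkno;
    s->blksize = (ds & MT_ST_BLKSIZE_MASK) >> MT_ST_BLKSIZE_SHIFT;
    s->density = (ds & MT_ST_DENSITY_MASK) >> MT_ST_DENSITY_SHIFT;
    s->soft_errors = (unsigned long)g->mt_erreg & 0xffffUL;
    s->at_eod = GMT_EOD(g->mt_gstat) != 0;
    s->at_eot = GMT_EOT(g->mt_gstat) != 0;
    s->no_tape = GMT_DR_OPEN(g->mt_gstat) != 0;
}

/* Query the driver and decode the result. Returns 0 or -1. */
static int read_tape_status(int fd, struct tape_status *s)
{
    struct mtget g;

    if (ioctl(fd, MTIOCGET, &g) < 0)
        return -1;
    decode_mtget(&g, s);
    return 0;
}
```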


MISCELLANEOUS COMPILE OPTIONS

The recovered write errors are considered fatal if ST_RECOVERED_WRITE_FATAL
is defined.

The maximum number of tape devices is determined by the define
ST_MAX_TAPES. If more tapes are detected at driver initialization, the
maximum is adjusted accordingly.

Immediate return from tape positioning SCSI commands can be enabled by
defining ST_NOWAIT. If this is defined, the user should take care that
the next tape operation is not started before the previous one has
finished. The drives and SCSI adapters should handle this condition
gracefully, but some drive/adapter combinations are known to hang the
SCSI bus in this case.

The MTEOM command is by default implemented as spacing over 32767
filemarks. With this method the file number in the status is
correct. The user can request using direct spacing to EOD by setting
ST_FAST_EOM to 1 (or by using the MT_ST_OPTIONS ioctl). In this case the
file number will be invalid.

When using read ahead or buffered writes the position within the file
may not be correct after the file is closed (correct position may
require backspacing over more than one record). The correct position
within file can be obtained if ST_IN_FILE_POS is defined at compile
time or the MT_ST_CAN_BSR bit is set for the drive with an ioctl.
(The driver always backs over a filemark crossed by read ahead if the
user does not request data that far.)


DEBUGGING HINTS

To enable debugging messages, edit st.c and #define DEBUG 1. As seen
above, debugging can be switched off with an ioctl if debugging is
compiled into the driver. The debugging output is not voluminous.

If the tape seems to hang, I would be very interested to hear where
the driver is waiting. With the command 'ps -l' you can see the state
of the process using the tape. If the state is D, the process is
waiting for something. The field WCHAN tells where the driver is
waiting. If you have the current System.map in the correct place (in
/boot for the procps I use) or have updated /etc/psdatabase (for kmem
ps), ps writes the function name in the WCHAN field. If not, you have
to look up the function from System.map.

Note also that the timeouts are very long compared to most other
drivers. This means that the Linux driver may appear hung although the
real reason is that the tape firmware has got confused.