Anticipatory IO scheduler
-------------------------
Nick Piggin <piggin@cyberone.com.au> 13 Sep 2003

Attention! Database servers, especially those using "TCQ" disks, should
investigate performance with the 'deadline' IO scheduler. Any system with high
disk performance requirements should do so, in fact.

If you see unusual performance characteristics of your disk systems, or you
see big performance regressions versus the deadline scheduler, please email
me. Database users: don't bother unless you're willing to test a lot of
patches from me ;) it's a known issue.


Selecting IO schedulers
-----------------------
To choose an IO scheduler at boot time, use the 'elevator=' argument, for
example 'elevator=deadline'. 'noop' and 'as' (the default) are also available.
At present, IO schedulers can only be assigned globally, at boot time.


Tuning the anticipatory IO scheduler
------------------------------------
When using 'as', the anticipatory IO scheduler, there are 5 parameters under
/sys/block/*/iosched/. All are in units of milliseconds.

The parameters are:
* read_expire
    Controls how long until a read request becomes "expired". It also controls
    the interval at which expired requests are served, so with a setting of 50,
    a request might take anywhere up to 100ms to be serviced _if_ it is next on
    the expired list. Obviously it won't make the disk go faster. The result
    basically equates to the timeslice a single reader gets in the presence of
    other IO. 100/((seek time / read_expire) + 1) is very roughly the %
    streaming read efficiency your disk should get with multiple readers.

* read_batch_expire
    Controls how much time a batch of reads is given before pending writes are
    served. A higher value is more efficient. This might be set below
    read_expire if writes are to be given higher priority than reads, but reads
    are to be as efficient as possible when there are no writes. Generally
    though, it should be some multiple of read_expire.

* write_expire, and
* write_batch_expire are equivalent to the above, for writes.

* antic_expire
    Controls the maximum amount of time we can anticipate a good read before
    giving up. Anticipation may be stopped early for many other reasons, and
    some processes will not be "anticipated" at all. Should be a bit higher
    for devices with big seek times, though the correspondence is not linear:
    most processes have only a few ms of thinktime.

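To check what the five parameters are currently set to, something like the
following Python sketch could be used. It only reads the files under
/sys/block/*/iosched/. The function name is made up for illustration, and it
assumes each file holds a single plain integer (milliseconds); devices or
files that are missing or unreadable are skipped.

```python
from pathlib import Path

# The five tunables named in this document.
AS_TUNABLES = (
    "read_expire",
    "read_batch_expire",
    "write_expire",
    "write_batch_expire",
    "antic_expire",
)


def read_as_tunables(sys_block=Path("/sys/block")):
    """Return {device: {tunable: milliseconds}} for every device that has them.

    Hypothetical helper: assumes each tunable file contains one integer.
    """
    result = {}
    if not sys_block.is_dir():
        return result
    for dev in sorted(sys_block.iterdir()):
        iosched = dev / "iosched"
        if not iosched.is_dir():
            continue
        values = {}
        for name in AS_TUNABLES:
            try:
                values[name] = int((iosched / name).read_text().strip())
            except (OSError, ValueError):
                # Missing, unreadable, or not a plain integer: skip it.
                continue
        if values:
            result[dev.name] = values
    return result


if __name__ == "__main__":
    for device, values in read_as_tunables().items():
        for name, ms in values.items():
            print(f"{device}: {name} = {ms} ms")
```

On a system not using 'as', the iosched directory may hold different files,
in which case the sketch simply prints nothing for that device.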

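The rough efficiency estimate given under read_expire can be tried out with a
few lines of Python. This sketch reads the estimate as the share of each
timeslice spent transferring rather than seeking, i.e.
100 / ((seek time / read_expire) + 1). That reading, the function name and
the 8ms seek time in the example are assumptions for illustration only.

```python
def streaming_read_efficiency(seek_time_ms, read_expire_ms):
    """Very rough % streaming read efficiency with multiple readers.

    Assumes one seek is paid per read_expire timeslice, so the useful
    fraction is read_expire / (seek time + read_expire).
    """
    if seek_time_ms < 0 or read_expire_ms <= 0:
        raise ValueError("seek time must be >= 0 and read_expire must be > 0")
    return 100.0 / ((seek_time_ms / read_expire_ms) + 1.0)


if __name__ == "__main__":
    seek_ms = 8.0  # assumed seek time, not a measured value
    for read_expire in (10, 50, 125, 250):
        pct = streaming_read_efficiency(seek_ms, read_expire)
        print(f"read_expire={read_expire:>3} ms -> roughly {pct:.0f}% efficient")
```

A larger read_expire gives each reader a longer timeslice per seek, which is
why the estimate climbs towards 100% as the value grows, at the cost of
latency for everybody else.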
