                 Chelsio N210 10Gb Ethernet Network Controller

                         Driver Release Notes for Linux

                                 Version 2.1.1

                                 June 20, 2005

CONTENTS
========
 INTRODUCTION
 FEATURES
 PERFORMANCE
 DRIVER MESSAGES
 KNOWN ISSUES
 SUPPORT


INTRODUCTION
============

 This document describes the Linux driver for the Chelsio 10Gb Ethernet
 Network Controller. This driver supports the Chelsio N210 NIC and is
 backward compatible with the Chelsio N110 model 10Gb NICs.


FEATURES
========

 Adaptive Interrupts (adaptive-rx)
 ---------------------------------

  This feature provides an adaptive algorithm that adjusts the interrupt
  coalescing parameters, allowing the driver to dynamically adapt the latency
  settings to achieve the highest performance during various types of network
  load.

  The interface used to control this feature is ethtool. Please see the
  ethtool manpage for additional usage information.

  By default, adaptive-rx is disabled.
  To enable adaptive-rx:

      ethtool -C <interface> adaptive-rx on

  To disable adaptive-rx, use ethtool:

      ethtool -C <interface> adaptive-rx off

  After disabling adaptive-rx, the timer latency value will be set to 50us.
  You may set the timer latency after disabling adaptive-rx:

      ethtool -C <interface> rx-usecs <microseconds>

  An example to set the timer latency value to 100us on eth0:

      ethtool -C eth0 rx-usecs 100

  You may also provide a timer latency value while disabling adaptive-rx:

      ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>

  If adaptive-rx is disabled and a timer latency value is specified, the timer
  will be set to the specified value until changed by the user or until
  adaptive-rx is enabled.

  To view the status of the adaptive-rx and timer latency values:

      ethtool -c <interface>

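  The adaptive-rx commands above can be wrapped in a small helper. The sketch
  below is illustrative only: the function name coalesce_cmd is hypothetical,
  and it only prints the ethtool command line so that it can be reviewed
  before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: print the ethtool -C command for an interface.
#   coalesce_cmd <interface> on             enable adaptive-rx
#   coalesce_cmd <interface> off [<usecs>]  disable it, optionally setting
#                                           the timer latency (rx-usecs)
coalesce_cmd() {
    iface=$1
    mode=$2
    usecs=${3:-}
    case $mode in
        on)
            echo "ethtool -C $iface adaptive-rx on"
            ;;
        off)
            if [ -n "$usecs" ]; then
                echo "ethtool -C $iface adaptive-rx off rx-usecs $usecs"
            else
                echo "ethtool -C $iface adaptive-rx off"
            fi
            ;;
        *)
            echo "usage: coalesce_cmd <interface> on|off [<usecs>]" >&2
            return 1
            ;;
    esac
}
```

  For example, "coalesce_cmd eth0 off 100" prints the command that disables
  adaptive-rx on eth0 and sets the timer latency to 100us.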

 TCP Segmentation Offloading (TSO) Support
 -----------------------------------------

  This feature, also known as "large send", enables a system's protocol stack
  to offload portions of outbound TCP processing to a network interface card
  thereby reducing system CPU utilization and enhancing performance.

  The interface used to control this feature is ethtool version 1.8 or higher.
  Please see the ethtool manpage for additional usage information.

  By default, TSO is enabled.
  To disable TSO:

      ethtool -K <interface> tso off

  To enable TSO:

      ethtool -K <interface> tso on

  To view the status of TSO:

      ethtool -k <interface>

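  When checking TSO from a script, the status can be extracted from the
  "ethtool -k" output. The sketch below is an assumption-laden example: the
  function name tso_state is hypothetical, and it assumes a status line of
  the form "tcp-segmentation-offload: on" (or the same words separated by
  spaces). The exact wording depends on the ethtool version, so check the
  output on your system first.

```shell
#!/bin/sh
# Hypothetical helper: read "ethtool -k <interface>" output on stdin and
# print "on" or "off" for TSO. The line format matched here is an assumption;
# adjust the pattern if your ethtool prints something different.
tso_state() {
    awk -F': *' '
        /^[ \t]*tcp[- ]segmentation[- ]offload:/ {
            split($2, a, " ")   # drop trailing notes such as "[fixed]"
            print a[1]
            exit
        }'
}
```

  Possible usage: ethtool -k eth0 | tso_state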

PERFORMANCE
===========

 The following information is provided as an example of how to change system
 parameters for "performance tuning" and what values to use. You may or may
 not want to change these system parameters, depending on your
 server/workstation application. Doing so is not warranted in any way by
 Chelsio Communications, and is done at "YOUR OWN RISK". Chelsio will not be
 held responsible for loss of data or damage to equipment.

 Your distribution may have a different way of doing things, or you may prefer
 a different method. These commands are shown only to provide an example of
 what to do and are by no means definitive.

 Any of the following system changes will last only until you reboot your
 system. You may want to write a script that runs at boot-up which includes
 the optimal settings for your system.

  Setting PCI Latency Timer:
      setpci -d 1425:* 0x0c.l=0x0000F800

  Disabling TCP timestamp:
      sysctl -w net.ipv4.tcp_timestamps=0

  Disabling SACK:
      sysctl -w net.ipv4.tcp_sack=0

  Setting a large number of incoming connection requests:
      sysctl -w net.ipv4.tcp_max_syn_backlog=3000

  Setting maximum receive socket buffer size:
      sysctl -w net.core.rmem_max=1024000

  Setting maximum send socket buffer size:
      sysctl -w net.core.wmem_max=1024000

  Setting smp_affinity (on a multiprocessor system) to a single CPU:
      echo 1 > /proc/irq/<interrupt_number>/smp_affinity

  Setting default receive socket buffer size:
      sysctl -w net.core.rmem_default=524287

  Setting default send socket buffer size:
      sysctl -w net.core.wmem_default=524287

  Setting maximum option memory buffers:
      sysctl -w net.core.optmem_max=524287

  Setting maximum backlog (# of unprocessed packets before kernel drops):
      sysctl -w net.core.netdev_max_backlog=300000

  Setting TCP read buffers (min/default/max):
      sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"

  Setting TCP write buffers (min/default/max):
      sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"

  Setting TCP buffer space (min/pressure/max):
      sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"

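  Since these settings are lost at reboot, they can be collected in a boot-up
  script. The following is only a sketch covering a subset of the sysctl
  values listed above. The function name tune and the APPLY variable are
  names chosen for this example; by default the commands are printed rather
  than executed.

```shell
#!/bin/sh
# Sketch of a boot-time tuning script. Values are copied from the examples
# above. By default the commands are only printed; set APPLY=1 and run as
# root to execute them (APPLY is a name invented for this sketch).
tune() {
    while read -r key value; do
        [ -n "$key" ] || continue
        if [ "${APPLY:-0}" = 1 ]; then
            sysctl -w "$key=$value"
        else
            echo "sysctl -w $key=\"$value\""
        fi
    done <<EOF
net.ipv4.tcp_timestamps 0
net.ipv4.tcp_sack 0
net.ipv4.tcp_max_syn_backlog 3000
net.core.rmem_max 1024000
net.core.wmem_max 1024000
net.core.rmem_default 524287
net.core.wmem_default 524287
net.core.optmem_max 524287
net.core.netdev_max_backlog 300000
EOF
}
```

  Calling "tune" prints one sysctl command per setting; "APPLY=1 tune" would
  apply them. How such a script is hooked into boot-up depends on your
  distribution.
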
  TCP window size for single connections:
   The receive buffer (RX_WINDOW) size must be at least as large as the
   Bandwidth-Delay Product of the communication link between the sender and
   receiver. Due to variations in RTT, you may want to increase the buffer
   size up to 2 times the Bandwidth-Delay Product. See page 289 of
   "TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.
   At 10Gb speeds, use the following formula:
       RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
       Example for an RTT of 100us:
           RX_WINDOW = (1,250,000 * 0.1) = 125,000 bytes
   RX_WINDOW sizes of 256KB - 512KB should be sufficient.
   Setting the min, default, and max receive buffer (RX_WINDOW) size:
       sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"

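   The formula above can be evaluated with shell integer arithmetic. This is
   a minimal sketch, assuming the RTT is given in whole microseconds; the
   function names rx_window_bytes and rx_window_range are hypothetical.

```shell
#!/bin/sh
# At 10Gb/s the link carries 1.25e9 bytes per second, i.e. 1250 bytes per
# microsecond, so the Bandwidth-Delay Product in bytes is 1250 * RTT(us).
# This matches 1.25MBytes * RTT(ms) from the text.
rx_window_bytes() {
    rtt_us=$1
    echo $((1250 * rtt_us))
}

# Print "<BDP> <2*BDP>": the minimum window and the 2x upper suggestion.
rx_window_range() {
    bdp=$(rx_window_bytes "$1")
    echo "$bdp $((2 * bdp))"
}
```

   For the 100us example above, "rx_window_bytes 100" gives 125000, the same
   value as the worked example.
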
  TCP window size for multiple connections:
   The receive buffer (RX_WINDOW) size may be calculated in the same way as
   for a single connection, but should be divided by the number of
   connections. The smaller window prevents congestion and facilitates better
   pacing, especially if/when MAC level flow control does not work well or
   when it is not supported on the machine. Experimentation may be necessary
   to attain the correct value. This method is provided as a starting point
   for the correct receive buffer size.
   Setting the min, default, and max receive buffer (RX_WINDOW) size is
   performed in the same manner as for a single connection.

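   As a starting point for experimentation, the per-connection window can be
   computed as sketched below. The function name per_conn_window is
   hypothetical, the RTT is assumed to be in whole microseconds, and the
   result is rounded down by integer division.

```shell
#!/bin/sh
# Hypothetical helper: starting-point receive window per connection, i.e. the
# single-connection Bandwidth-Delay Product at 10Gb/s (1250 bytes per
# microsecond of RTT) divided by the number of connections.
per_conn_window() {
    rtt_us=$1
    conns=$2
    if [ "$conns" -le 0 ]; then
        echo "connection count must be greater than 0" >&2
        return 1
    fi
    echo $(( (1250 * rtt_us) / conns ))
}
```

   For example, "per_conn_window 100 4" prints 31250 (125,000 bytes split
   across 4 connections).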

DRIVER MESSAGES
===============

 The following messages are the most common messages logged by syslog. These
 may be found in /var/log/messages.

  Driver up:
     Chelsio Network Driver - version 2.1.1

  NIC detected:
     eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit

  Link up:
     eth#: link is up at 10 Gbps, full duplex

  Link down:
     eth#: link is down

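 The link messages above can be used to find the last reported link state in
 a log. The sketch below is an example only: the function name
 last_link_state is hypothetical, and it assumes the log lines contain the
 message text exactly as shown above.

```shell
#!/bin/sh
# Hypothetical helper: read syslog text on stdin and print the most recent
# link state ("up" or "down") reported for the given interface, based on the
# "link is up" / "link is down" messages listed above.
last_link_state() {
    iface=$1
    awk -v pat="$iface: link is " '
        { i = index($0, pat) }
        i {
            rest = substr($0, i + length(pat))
            split(rest, w, /[ ,]/)   # first word after "link is"
            state = w[1]
        }
        END { if (state != "") print state }'
}
```

 Possible usage: last_link_state eth0 < /var/log/messages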

KNOWN ISSUES
============

 These issues have been identified during testing. The following information
 is provided as workarounds for them. In some cases, an issue is inherent to
 Linux or to a particular Linux distribution and/or hardware platform.

  1. Large number of TCP retransmits on a multiprocessor (SMP) system.

      On a system with multiple CPUs, the interrupt (IRQ) for the network
      controller may be bound to more than one CPU. This can cause TCP
      retransmits if packets are processed on different CPUs and
      re-assembled in a different order than expected.

      To eliminate the TCP retransmits, set smp_affinity on the particular
      interrupt to a single CPU. You can locate the interrupt (IRQ) used on
      the N110/N210 by using ifconfig:
          ifconfig <dev_name> | grep Interrupt
      Set the smp_affinity to a single CPU:
          echo 1 > /proc/irq/<interrupt_number>/smp_affinity

      It is highly suggested that you do not run the irqbalance daemon on your
      system, as this will change any smp_affinity setting you have applied.
      The irqbalance daemon runs on a 10 second interval and binds interrupts
      to the least loaded CPU determined by the daemon. To disable this daemon:
          chkconfig --level 2345 irqbalance off

      By default, some Linux distributions enable the kernel feature,
      irqbalance, which performs the same function as the daemon. To disable
      this feature, add the following option to the kernel line in your
      bootloader configuration:
          noirqbalance

          Example using the Grub bootloader:
              title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
              root (hd0,0)
              kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
              initrd /initrd-2.4.21-27.ELsmp.img

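      The value written to smp_affinity is a hexadecimal CPU bitmask, so the
      "echo 1" above selects CPU 0. To bind the interrupt to a different
      single CPU, the mask can be computed as in the sketch below. The
      function name cpu_mask is hypothetical.

```shell
#!/bin/sh
# smp_affinity takes a hexadecimal CPU bitmask: bit N set means CPU N may
# service the interrupt. cpu_mask is a hypothetical helper that prints the
# mask selecting exactly one CPU (CPU 0 -> 1, CPU 1 -> 2, CPU 2 -> 4, ...).
cpu_mask() {
    printf '%x\n' $((1 << $1))
}
```

      Possible usage, run as root:
          echo $(cpu_mask 2) > /proc/irq/<interrupt_number>/smp_affinity
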
  2. After running insmod, the driver is loaded and the incorrect network
     interface is brought up without running ifup.

      When using 2.4.x kernels, including RHEL kernels, the Linux kernel
      invokes a script named "hotplug". This script is primarily used to
      automatically bring up USB devices when they are plugged in; however,
      the script also attempts to automatically bring up a network interface
      after loading the kernel module. The hotplug script does this by scanning
      the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
      for HWADDR=<mac_address>.

      If the hotplug script does not find the HWADDR within any of the
      ifcfg-eth# files, it will bring up the device with the next available
      interface name. If this interface is already configured for a different
      network card, your new interface will have an incorrect IP address and
      network settings.

      To solve this issue, you can add the HWADDR=<mac_address> key to the
      interface config file of your network controller.

      To disable this "hotplug" feature, you may add the driver (module name)
      to the "blacklist" file located in /etc/hotplug. It has been noted that
      this does not work for network devices because the net.agent script
      does not use the blacklist file. Simply remove, or rename, the net.agent
      script located in /etc/hotplug to disable this feature.

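      Before adding the HWADDR key, it may help to check whether a config
      file already carries it. This is a minimal sketch: the function name
      has_hwaddr is hypothetical, the MAC address in the usage line is a
      placeholder, and the check assumes an unquoted HWADDR=<mac_address>
      line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given ifcfg file already contains a
# HWADDR= line for the given MAC address. The comparison ignores case and
# assumes the value is not quoted.
has_hwaddr() {
    file=$1
    mac=$2
    grep -i -q "^HWADDR=${mac}[[:space:]]*\$" "$file"
}
```

      Possible usage:
          has_hwaddr /etc/sysconfig/network-scripts/ifcfg-eth0 \
              00:11:22:33:44:55 || echo "HWADDR key is missing"
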
  3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
     on an AMD Opteron system with HyperTransport PCI-X Tunnel chipset.

      If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
      chipset, you may experience the "133-MHz Mode Split Completion Data
      Corruption" bug identified by AMD while using a 133MHz PCI-X card on the
      PCI-X bus.

      AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
      can provide stale data via split completion cycles to a PCI-X card that
      is operating at 133 MHz", causing data corruption.

      AMD provides three workarounds for this problem; however, Chelsio
      recommends the first option for best performance with this bug:

        For 133MHz secondary bus operation, limit the transaction length and
        the number of outstanding transactions, via BIOS configuration
        programming of the PCI-X card, to the following:

           Data Length (bytes): 1k
           Total allowed outstanding transactions: 2

      Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
      section 56, "133-MHz Mode Split Completion Data Corruption" for more
      details on this bug and the workarounds suggested by AMD.

      It may be possible to work outside AMD's recommended PCI-X settings; try
      increasing the Data Length to 2k bytes for increased performance. If you
      have issues with these settings, please revert to the "safe" settings
      and duplicate the problem before submitting a bug or asking for support.

      NOTE: The default setting on most systems is 8 outstanding transactions
            and 2k bytes data length.

  4. On multiprocessor systems, it has been noted that an application which
     is handling 10Gb networking can switch between CPUs, causing degraded
     and/or unstable performance.

      If running on an SMP system and taking performance measurements, it
      is suggested you either run the latest netperf-2.4.0+ or use a binding
      tool such as Tim Hockin's procstate utilities (runon)
      <http://www.hockin.org/~thockin/procstate/>.

      Binding netserver and netperf (or other applications) to particular
      CPUs can make a significant difference in performance measurements.
      You may need to experiment to find which CPU to bind the application
      to in order to achieve the best performance for your system.

      If you are developing an application designed for 10Gb networking,
      please keep in mind you may want to look at the sched_setaffinity and
      sched_getaffinity system calls to bind your application.

      If you are just running user-space applications such as ftp, telnet,
      etc., you may want to try the runon tool provided by Tim Hockin's
      procstate utility. You could also try binding the interface to a
      particular CPU:
          runon 0 ifup eth0

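      Where runon is not available, the taskset(1) utility from util-linux
      offers similar CPU binding. The sketch below uses taskset in place of
      runon, which is a substitution and not something these release notes
      were tested with. The function name run_on_cpu and the DRY_RUN
      variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command bound to one CPU using taskset(1),
# used here in place of the runon tool named above. With DRY_RUN=1 the
# command line is printed instead of executed.
run_on_cpu() {
    cpu=$1
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "taskset -c $cpu $*"
    else
        taskset -c "$cpu" "$@"
    fi
}
```

      Possible usage: run_on_cpu 0 netserver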

SUPPORT
=======

 If you have problems with the software or hardware, please contact our
 customer support team via email at support@chelsio.com or check our website
 at http://www.chelsio.com

===============================================================================

 Chelsio Communications
 370 San Aleso Ave.
 Suite 100
 Sunnyvale, CA 94085
 http://www.chelsio.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2, as
published by the Free Software Foundation.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

 Copyright (c) 2003-2005 Chelsio Communications. All rights reserved.

===============================================================================