Open vSwitch datapath developer documentation
=============================================

The Open vSwitch kernel module allows flexible userspace control over
flow-level packet processing on selected network devices.  It can be
used to implement a plain Ethernet switch, network device bonding,
VLAN processing, network access control, flow-based network control,
and so on.

The kernel module implements multiple "datapaths" (analogous to
bridges), each of which can have multiple "vports" (analogous to ports
within a bridge).  Each datapath also has associated with it a "flow
table" that userspace populates with "flows" that map from keys based
on packet headers and metadata to sets of actions.  The most common
action forwards the packet to another vport; other actions are also
implemented.

When a packet arrives on a vport, the kernel module processes it by
extracting its flow key and looking it up in the flow table.  If there
is a matching flow, it executes the associated actions.  If there is
no match, it queues the packet to userspace for processing (as part of
its processing, userspace will likely set up a flow to handle further
packets of the same type entirely in-kernel).

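The lookup-or-upcall behavior described above can be sketched in a few
lines of Python.  This is only an illustrative model, not the kernel
implementation: the Datapath class, its extract_key() and receive()
methods, and the dict-based packet representation are all hypothetical
names invented for this sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a flow key is an ordered tuple of
# (attribute name, value) pairs so that it can be used as a dict key.
FlowKey = Tuple[Tuple[str, object], ...]
Action = Callable[[dict], None]


class Datapath:
    """Toy model of one datapath: a flow table plus a queue of misses."""

    def __init__(self) -> None:
        self.flow_table: Dict[FlowKey, List[Action]] = {}
        self.upcalls: List[Tuple[dict, FlowKey]] = []

    def extract_key(self, packet: dict, in_port: int) -> FlowKey:
        # Metadata (the receiving vport) plus the header fields of the packet.
        return (("in_port", in_port),) + tuple(sorted(packet["headers"].items()))

    def receive(self, packet: dict, in_port: int) -> bool:
        """Process one packet; return True if it was handled by a flow."""
        key = self.extract_key(packet, in_port)
        actions = self.flow_table.get(key)
        if actions is None:
            # No matching flow: queue the packet and its parsed key
            # for userspace.
            self.upcalls.append((packet, key))
            return False
        for action in actions:
            action(packet)
        return True
```

In this model, userspace would read an entry from upcalls, decide what
to do, and store a list of actions in flow_table under the key it was
given, after which later packets with the same key no longer miss.
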
Flow key compatibility
----------------------

Network protocols evolve over time.  New protocols become important
and existing protocols lose their prominence.  For the Open vSwitch
kernel module to remain relevant, it must be possible for newer
versions to parse additional protocols as part of the flow key.  It
might even be desirable, someday, to drop support for parsing
protocols that have become obsolete.  Therefore, the Netlink interface
to Open vSwitch is designed to allow carefully written userspace
applications to work with any version of the flow key, past or future.

To support this forward and backward compatibility, whenever the
kernel module passes a packet to userspace, it also passes along the
flow key that it parsed from the packet.  Userspace then extracts its
own notion of a flow key from the packet and compares it against the
kernel-provided version:

    - If userspace's notion of the flow key for the packet matches the
      kernel's, then nothing special is necessary.

    - If the kernel's flow key includes more fields than the userspace
      version of the flow key, for example if the kernel decoded IPv6
      headers but userspace stopped at the Ethernet type (because it
      does not understand IPv6), then again nothing special is
      necessary.  Userspace can still set up a flow in the usual way,
      as long as it uses the kernel-provided flow key to do it.

    - If the userspace flow key includes more fields than the
      kernel's, for example if userspace decoded an IPv6 header but
      the kernel stopped at the Ethernet type, then userspace can
      forward the packet manually, without setting up a flow in the
      kernel.  This case is bad for performance because every packet
      that the kernel considers part of the flow must go to userspace,
      but the forwarding behavior is correct.  (If userspace can
      determine that the values of the extra fields would not affect
      forwarding behavior, then it could set up a flow anyway.)

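The three cases above reduce to one decision.  The following Python
sketch models each flow key as the set of attribute names that were
parsed, which is a simplification (real keys also carry values).  The
names decide() and Decision are hypothetical and exist only for this
illustration.

```python
from enum import Enum


class Decision(Enum):
    INSTALL_FLOW = "install a flow using the kernel-provided key"
    FORWARD_IN_USERSPACE = "forward the packet manually, install no flow"


def decide(kernel_attrs, userspace_attrs, extra_fields_matter=True):
    """Choose how userspace should handle a packet the kernel passed up.

    Both arguments are sets of attribute names (an assumption of this
    sketch).  extra_fields_matter says whether fields seen only by
    userspace could change the forwarding decision.
    """
    if set(userspace_attrs) <= set(kernel_attrs):
        # First two cases: the keys match, or the kernel parsed more
        # than userspace did.
        return Decision.INSTALL_FLOW
    if not extra_fields_matter:
        # Parenthetical in the third case: the extra fields do not
        # affect forwarding, so a kernel flow is still safe.
        return Decision.INSTALL_FLOW
    # Third case: userspace sees fields the kernel cannot match on.
    return Decision.FORWARD_IN_USERSPACE
```

Keys that differ in both directions fall through to the conservative
answer, forwarding in userspace, which is slow but always correct.
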
How flow keys evolve over time is important to making this work, so
the following sections go into detail.


Flow key format
---------------

A flow key is passed over a Netlink socket as a sequence of Netlink
attributes.  Some attributes represent packet metadata, defined as any
information about a packet that cannot be extracted from the packet
itself, e.g. the vport on which the packet was received.  Most
attributes, however, are extracted from headers within the packet,
e.g. source and destination addresses from Ethernet, IP, or TCP
headers.

The <linux/openvswitch.h> header file defines the exact format of the
flow key attributes.  For informal explanatory purposes here, we write
them as comma-separated strings, with parentheses indicating arguments
and nesting.  For example, the following could represent a flow key
corresponding to a TCP packet that arrived on vport 1:

    in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
    eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=6, tos=0,
    frag=no), tcp(src=49163, dst=80)

Often we ellipsize arguments not important to the discussion, e.g.:

    in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)

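As an aid to reading this notation, the following Python sketch renders
a nested structure into the informal string form used in this document.
The function format_key() and its input layout are assumptions made for
illustration; they are not part of <linux/openvswitch.h> or of any Open
vSwitch tool.

```python
def format_key(attrs):
    """Render a list of (name, value) pairs in the informal notation.

    A value may be a dict of named fields, a nested list of attributes,
    or a scalar (hypothetical layout chosen for this sketch).
    """
    parts = []
    for name, value in attrs:
        if isinstance(value, dict):
            inner = ", ".join(f"{k}={v}" for k, v in value.items())
        elif isinstance(value, list):
            inner = format_key(value)      # nesting, e.g. encap(...)
        else:
            inner = str(value)
        parts.append(f"{name}({inner})")
    return ", ".join(parts)


key = [
    ("in_port", 1),
    ("eth_type", "0x0800"),
    ("tcp", {"src": 49163, "dst": 80}),
]
text = format_key(key)   # "in_port(1), eth_type(0x0800), tcp(src=49163, dst=80)"
```
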
Basic rule for evolving flow keys
---------------------------------

Some care is needed to really maintain forward and backward
compatibility for applications that follow the rules listed under
"Flow key compatibility" above.

The basic rule is obvious:

    ------------------------------------------------------------------
    New network protocol support must only supplement existing flow
    key attributes.  It must not change the meaning of already defined
    flow key attributes.
    ------------------------------------------------------------------

This rule does have less-obvious consequences, so it is worth working
through a few examples.  Suppose, for example, that the kernel module
did not already implement VLAN parsing.  Instead, it just interpreted
the 802.1Q TPID (0x8100) as the Ethertype and then stopped parsing the
packet.  The flow key for any packet with an 802.1Q header would look
essentially like this, ignoring metadata:

    eth(...), eth_type(0x8100)

Naively, to add VLAN support, it makes sense to add a new "vlan" flow
key attribute to contain the VLAN tag, then continue to decode the
encapsulated headers beyond the VLAN tag using the existing field
definitions.  With this change, a TCP packet in VLAN 10 would have a
flow key much like this:

    eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)

But this change would negatively affect a userspace application that
has not been updated to understand the new "vlan" flow key attribute.
The application could, following the flow key compatibility rules
above, ignore the "vlan" attribute that it does not understand and
therefore assume that the flow contained IP packets.  This is a bad
assumption (the flow only contains IP packets if one parses and skips
over the 802.1Q header) and it could cause the application's behavior
to change across kernel versions even though it follows the
compatibility rules.

The solution is to use a set of nested attributes.  This is, for
example, why 802.1Q support uses nested attributes.  A TCP packet in
VLAN 10 is actually expressed as:

    eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
    ip(proto=6, ...), tcp(...))

Notice how the "eth_type", "ip", and "tcp" flow key attributes are
nested inside the "encap" attribute.  Thus, an application that does
not understand the "vlan" key will not see any of those attributes
and therefore will not misinterpret them.  (Also, the outer eth_type
is still 0x8100, not changed to 0x0800.)

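The difference between the naive and the nested layout can be made
concrete with a small Python sketch of an application that has not
been updated for VLANs.  The set KNOWN_TO_OLD_APP and the function
old_app_view() are hypothetical; they only model "ignore any attribute
you do not understand".

```python
# Attributes an un-updated application understands (assumption of this sketch).
KNOWN_TO_OLD_APP = {"eth", "eth_type", "ip", "tcp"}


def old_app_view(key):
    """Return the top-level attributes the old application would act on.

    Unknown attributes, and anything nested inside them, are ignored.
    """
    return {name: value for name, value in key if name in KNOWN_TO_OLD_APP}


# Naive layout: "vlan" is inserted, inner headers stay at the top level.
naive = [("eth", "..."), ("vlan", {"vid": 10, "pcp": 0}),
         ("eth_type", 0x0800), ("ip", {"proto": 6}), ("tcp", "...")]

# Nested layout: inner headers are hidden inside "encap".
nested = [("eth", "..."), ("eth_type", 0x8100), ("vlan", {"vid": 10, "pcp": 0}),
          ("encap", [("eth_type", 0x0800), ("ip", {"proto": 6}), ("tcp", "...")])]

naive_view = old_app_view(naive)     # includes "ip" and "tcp": misread as plain IP
nested_view = old_app_view(nested)   # only "eth" and the outer eth_type 0x8100
```

With the naive layout the old application believes it is looking at an
untagged TCP flow.  With the nested layout it sees exactly what an
older kernel would have reported for the same packet.
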
Handling malformed packets
--------------------------

Don't drop packets in the kernel for malformed protocol headers, bad
checksums, etc.  This would prevent userspace from implementing a
simple Ethernet switch that forwards every packet.

Instead, in such a case, include an attribute with "empty" content.
It doesn't matter if the empty content could be valid protocol values,
as long as those values are rarely seen in practice, because userspace
can always arrange for all packets with those values to be sent to
userspace and handle them individually.

For example, consider a packet that contains an IP header that
indicates protocol 6 for TCP, but which is truncated just after the IP
header, so that the TCP header is missing.  The flow key for this
packet would include a tcp attribute with all-zero src and dst, like
this:

    eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)

As another example, consider a packet with an Ethernet type of 0x8100,
indicating that a VLAN TCI should follow, but which is truncated just
after the Ethernet type.  The flow key for this packet would include
an all-zero-bits vlan and an empty encap attribute, like this:

    eth(...), eth_type(0x8100), vlan(0), encap()

Unlike a TCP packet with source and destination ports 0, an
all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
attribute expressly to allow this situation to be distinguished.
Thus, the flow key in this second example unambiguously indicates a
missing or malformed VLAN TCI.

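Both examples can be sketched in Python.  The helpers tcp_attr(),
vlan_attr() and vlan_tci_present() are hypothetical, and the 20-byte
minimum TCP header length used as the truncation test is an assumption
of this sketch rather than a statement about the kernel's exact check.
The CFI bit is bit 12 (0x1000) of the 16-bit TCI.

```python
import struct

VLAN_CFI = 0x1000        # CFI bit of the 802.1Q TCI
MIN_TCP_HEADER = 20      # assumption: minimal TCP header length in bytes


def tcp_attr(l4_bytes: bytes) -> dict:
    """tcp attribute for the bytes that follow an IP header with proto=6."""
    if len(l4_bytes) < MIN_TCP_HEADER:
        # Truncated: report "empty" content instead of dropping the packet.
        return {"src": 0, "dst": 0}
    src, dst = struct.unpack_from("!HH", l4_bytes, 0)
    return {"src": src, "dst": dst}


def vlan_attr(after_ethertype: bytes) -> int:
    """vlan attribute value for the bytes that follow a 0x8100 Ethertype."""
    if len(after_ethertype) < 2:
        return 0                      # TCI missing: all-zero bits, CFI clear
    (tci,) = struct.unpack_from("!H", after_ethertype, 0)
    return tci | VLAN_CFI             # real TCI: CFI set, even if TCI was 0


def vlan_tci_present(vlan_value: int) -> bool:
    """True if the vlan attribute came from a TCI that was actually parsed."""
    return bool(vlan_value & VLAN_CFI)
```

A well-formed tag whose TCI is all zeros therefore yields 0x1000, which
cannot be confused with the value 0 reported for a truncated packet.
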
Other rules
-----------

The other rules for flow keys are much less subtle:

    - Duplicate attributes are not allowed at a given nesting level.

    - Ordering of attributes is not significant.

    - When the kernel sends a given flow key to userspace, it always
      composes it the same way.  This allows userspace to hash and
      compare entire flow keys that it may not be able to fully
      interpret.

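The last rule means userspace can treat a kernel-provided key as an
opaque byte string.  The following Python sketch shows one way to do
that; the OpaqueFlowTable class is hypothetical, and the raw key bytes
stand in for the serialized Netlink attributes exactly as received.

```python
class OpaqueFlowTable:
    """Index per-flow state by the raw bytes of a kernel-provided key."""

    def __init__(self):
        self._flows = {}

    def install(self, raw_key, actions):
        # bytes objects are hashable, so the key is never parsed here.
        self._flows[bytes(raw_key)] = actions

    def lookup(self, raw_key):
        # Works only because the kernel composes a given key the same
        # way every time it sends it.
        return self._flows.get(bytes(raw_key))
```

Such a table relies on byte-for-byte equality, so it is suitable only
for keys that came from the kernel, not for keys that userspace built
itself with a possibly different attribute order.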