linux/Documentation/crc32.txt
A brief CRC tutorial.

A CRC is a long-division remainder.  You add the CRC to the message,
and the whole thing (message+CRC) is a multiple of the given
CRC polynomial.  To check the CRC, you can either check that the
CRC matches the recomputed value, *or* you can check that the
remainder computed on the message+CRC is 0.  This latter approach
is used by a lot of hardware implementations, and is why so many
protocols put the end-of-frame flag after the CRC.

It's actually the same long division you learned in school, except that
- We're working in binary, so the digits are only 0 and 1, and
- When dividing polynomials, there are no carries.  Rather than add and
  subtract, we just xor.  Thus, we tend to get a bit sloppy about
  the difference between adding and subtracting.

Like all division, the remainder is always smaller than the divisor.
To produce a 32-bit CRC, the divisor is actually a 33-bit CRC polynomial.
Since it's 33 bits long, bit 32 is always going to be set, so usually the
CRC is written in hex with the most significant bit omitted.  (If you're
familiar with the IEEE 754 floating-point format, it's the same idea.)
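
The long division described above can be written out quite literally.
The following is a minimal sketch, not the kernel's implementation.  It
assumes the common CRC-32 polynomial (0x04C11DB7 with bit 32 omitted,
0x104C11DB7 with all 33 bits written out) and assumes each byte is fed
most significant bit first.  The names crc_longdiv() and CRCPOLY_FULL
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed example: the common CRC-32 polynomial, all 33 bits written out. */
#define CRCPOLY_FULL	0x104C11DB7ULL
/* The same polynomial as it is usually written, with bit 32 omitted. */
#define CRCPOLY		0x04C11DB7ULL

/*
 * Divide the message, followed by "extra" zero bits, by the polynomial
 * and return the remainder.  Bytes are fed most significant bit first
 * (an assumption made for this sketch).
 */
static uint32_t crc_longdiv(const unsigned char *msg, size_t len,
			    unsigned int extra)
{
	uint64_t rem = 0;
	size_t nbits = len * 8 + extra;
	size_t i;

	for (i = 0; i < nbits; i++) {
		unsigned int bit = 0;

		if (i < len * 8)
			bit = (msg[i / 8] >> (7 - i % 8)) & 1;
		rem = (rem << 1) | bit;		/* bring down the next digit */
		if (rem & (1ULL << 32))		/* too big: subtract (xor) the divisor */
			rem ^= CRCPOLY_FULL;
	}
	return (uint32_t)rem;	/* bit 32 is clear, so this fits in 32 bits */
}
```

Calling crc_longdiv(msg, len, 32) leaves a 32-bit hole for the CRC and
returns the value to put in it.  If that value is appended to the message
most significant byte first, crc_longdiv() over the resulting len + 4
bytes with extra = 0 should return 0, which is the "multiple of the
polynomial" property described above.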

Note that a CRC is computed over a string of *bits*, so you have
to decide on the endianness of the bits within each byte.  To get
the best error-detecting properties, this should correspond to the
order they're actually sent.  For example, standard RS-232 serial is
little-endian; the most significant bit (sometimes used for parity)
is sent last.  And when appending a CRC word to a message, you should
do it in the right order, matching the endianness.

Just like with ordinary division, you proceed one digit (bit) at a time.
Each step of the division you take one more digit (bit) of the dividend
and append it to the current remainder.  Then you figure out the
appropriate multiple of the divisor to subtract to bring the remainder
back into range.  In binary, this is easy - it has to be either 0 or 1,
and to make the XOR cancel, it's just a copy of bit 32 of the remainder.

When computing a CRC, we don't care about the quotient, so we can
throw the quotient bit away, but subtract the appropriate multiple of
the polynomial from the remainder and we're back to where we started,
ready to process the next bit.

A big-endian CRC written this way would be coded like:
for (i = 0; i < input_bits; i++) {
	/* Bit 31 before the shift becomes bit 32 after it. */
	multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
	remainder = (remainder << 1 | next_input_bit()) ^ multiple;
}

Notice how, to get at bit 32 of the shifted remainder, we look
at bit 31 of the remainder *before* shifting it.

But also notice how the next_input_bit() bits we're shifting into
the remainder don't actually affect any decision-making until
32 bits later.  Thus, the first 32 cycles of this are pretty boring.
Also, to add the CRC to a message, we need a 32-bit-long hole for it at
the end, so we have to add 32 extra cycles shifting in zeros at the
end of every message.

These details lead to a standard trick: delay merging in the
next_input_bit() until the moment it's needed.  Then the first 32 cycles
can be precomputed, and merging in the final 32 zero bits to make room
for the CRC can be skipped entirely.  This changes the code to:

for (i = 0; i < input_bits; i++) {
	remainder ^= next_input_bit() << 31;
	multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
	remainder = (remainder << 1) ^ multiple;
}

With this optimization, the little-endian code is particularly simple:
for (i = 0; i < input_bits; i++) {
	remainder ^= next_input_bit();
	multiple = (remainder & 1) ? CRCPOLY : 0;
	remainder = (remainder >> 1) ^ multiple;
}

The most significant coefficient of the remainder polynomial is stored
in the least significant bit of the binary "remainder" variable.
The other details of endianness have been hidden in CRCPOLY (which must
be bit-reversed) and next_input_bit().

As long as next_input_bit is returning the bits in a sensible order, we don't
*have* to wait until the last possible moment to merge in additional bits.
We can do it 8 bits at a time rather than 1 bit at a time:
for (i = 0; i < input_bytes; i++) {
	remainder ^= next_input_byte() << 24;
	for (j = 0; j < 8; j++) {
		multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
		remainder = (remainder << 1) ^ multiple;
	}
}

Or in little-endian:
for (i = 0; i < input_bytes; i++) {
	remainder ^= next_input_byte();
	for (j = 0; j < 8; j++) {
		multiple = (remainder & 1) ? CRCPOLY : 0;
		remainder = (remainder >> 1) ^ multiple;
	}
}

If the input is a multiple of 32 bits, you can even XOR in a 32-bit
word at a time and increase the inner loop count to 32.

You can also mix and match the two loop styles, for example doing the
bulk of a message byte-at-a-time and adding bit-at-a-time processing
for any fractional bytes at the end.

To reduce the number of conditional branches, software commonly uses
the byte-at-a-time table method, popularized by Dilip V. Sarwate,
"Computation of Cyclic Redundancy Checks via Table Look-Up", Comm. ACM
v.31 no.8 (August 1988) p. 1008-1013.

Here, rather than just shifting one bit of the remainder to decide
on the correct multiple to subtract, we can shift a byte at a time.
This produces a 40-bit (rather than a 33-bit) intermediate remainder,
and the correct multiple of the polynomial to subtract is found using
a 256-entry lookup table indexed by the high 8 bits.

(The table entries are simply the CRC-32 of the given one-byte messages.)

When space is more constrained, smaller tables can be used, e.g. two
4-bit shifts followed by a lookup in a 16-entry table.

It is not practical to process much more than 8 bits at a time using this
technique, because tables larger than 256 entries use too much memory and,
more importantly, too much of the L1 cache.

To get higher software performance, a "slicing" technique can be used.
See "High Octane CRC Generation with the Intel Slicing-by-8 Algorithm",
ftp://download.intel.com/technology/comms/perfnet/download/slicing-by-8.pdf

This does not change the number of table lookups, but does increase
the parallelism.
With the classic Sarwate algorithm, each table lookup
must be completed before the index of the next can be computed.

A "slicing by 2" technique would shift the remainder 16 bits at a time,
producing a 48-bit intermediate remainder.  Rather than doing a single
lookup in a 65536-entry table, the two high bytes are looked up in
two different 256-entry tables.  Each contains the remainder required
to cancel out the corresponding byte.  The tables are different because the
polynomials to cancel are different.  One has non-zero coefficients from
x^32 to x^39, while the other goes from x^40 to x^47.

Since modern processors can handle many parallel memory operations, this
takes barely longer than a single table look-up and thus performs almost
twice as fast as the basic Sarwate algorithm.

This can be extended to "slicing by 4" using 4 256-entry tables.
Each step, 32 bits of data is fetched, XORed with the CRC, and the result
broken into bytes and looked up in the tables.  Because the 32-bit shift
leaves the low-order bits of the intermediate remainder zero, the
final CRC is simply the XOR of the 4 table look-ups.

But this still enforces sequential execution: a second group of table
look-ups cannot begin until the previous group's 4 table look-ups have all
been completed.  Thus, the processor's load/store unit is sometimes idle.

To make maximum use of the processor, "slicing by 8" performs 8 look-ups
in parallel.  Each step, the 32-bit CRC is shifted 64 bits and XORed
with 64 bits of input data.  What is important to note is that 4 of
those 8 bytes are simply copies of the input data; they do not depend
on the previous CRC at all.  Thus, those 4 table look-ups may commence
immediately, without waiting for the previous loop iteration.
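
The table-driven variants described above can be sketched as follows.
This is only an illustration under stated assumptions, not the kernel's
code: it uses the little-endian bit order with the bit-reversed
polynomial 0xEDB88320 (assumed), shows slicing by 4 rather than by 8 to
stay short, and all function and table names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CRCPOLY_LE	UINT32_C(0xEDB88320)	/* assumed: bit-reversed 0x04C11DB7 */

static uint32_t tab[4][256];

/* Build the tables; tab[0] is the classic Sarwate table. */
static void crc_init_tables(void)
{
	uint32_t i, c;
	int j, k;

	for (i = 0; i < 256; i++) {
		c = i;
		for (j = 0; j < 8; j++)
			c = (c >> 1) ^ ((c & 1) ? CRCPOLY_LE : 0);
		tab[0][i] = c;
	}
	/* tab[k][i] is the remainder of byte i followed by k zero bytes. */
	for (k = 1; k < 4; k++)
		for (i = 0; i < 256; i++)
			tab[k][i] = (tab[k - 1][i] >> 8) ^
				    tab[0][tab[k - 1][i] & 0xff];
}

/* Bit-at-a-time reference, equivalent to the little-endian loop in the text. */
static uint32_t crc_bitwise(uint32_t crc, const unsigned char *p, size_t len)
{
	int j;

	while (len--) {
		crc ^= *p++;
		for (j = 0; j < 8; j++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
	}
	return crc;
}

/* Sarwate: one table look-up per byte, each depending on the previous one. */
static uint32_t crc_sarwate(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--)
		crc = tab[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
	return crc;
}

/* Slicing by 4: four independent look-ups per 32-bit word. */
static uint32_t crc_slice4(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len >= 4) {
		/* Assemble the word bytewise so host endianness does not matter. */
		crc ^= (uint32_t)p[0] | (uint32_t)p[1] << 8 |
		       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
		crc = tab[3][crc & 0xff] ^ tab[2][(crc >> 8) & 0xff] ^
		      tab[1][(crc >> 16) & 0xff] ^ tab[0][crc >> 24];
		p += 4;
		len -= 4;
	}
	return crc_sarwate(crc, p, len);	/* leftover bytes */
}
```

After crc_init_tables() has run, all three functions should return the
same remainder for the same starting value and input.  The slicing
version differs only in that its four look-ups do not depend on one
another.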

By always having 4 loads in flight, a modern superscalar processor can
be kept busy and make full use of its L1 cache.

Two more details about CRC implementation in the real world:

Normally, appending zero bits to a message which is already a multiple
of a polynomial produces a larger multiple of that polynomial.  Thus,
a basic CRC will not detect appended zero bits (or bytes).  To enable
a CRC to detect this condition, it's common to invert the CRC before
appending it.  This makes the remainder of the message+crc come out not
as zero, but some fixed non-zero value.  (The CRC of the inversion
pattern, 0xffffffff.)

The same problem applies to zero bits prepended to the message, and a
similar solution is used.  Instead of starting the CRC computation with
a remainder of 0, an initial remainder of all ones is used.  As long as
you start the same way decoding, it doesn't make a difference.
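
Both details can be combined in a short sketch.  It is illustrative
only and makes several assumptions: little-endian bit order, the
bit-reversed polynomial 0xEDB88320, and the CRC appended least
significant byte first.  The names crc_update() and crc32_inverted()
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CRCPOLY_LE	UINT32_C(0xEDB88320)	/* assumed: bit-reversed 0x04C11DB7 */

/* Raw remainder update, one bit at a time (little-endian bit order). */
static uint32_t crc_update(uint32_t remainder, const unsigned char *p,
			   size_t len)
{
	int j;

	while (len--) {
		remainder ^= *p++;
		for (j = 0; j < 8; j++)
			remainder = (remainder >> 1) ^
				    ((remainder & 1) ? CRCPOLY_LE : 0);
	}
	return remainder;
}

/* Start from all ones and invert the result before it is appended. */
static uint32_t crc32_inverted(const unsigned char *p, size_t len)
{
	return crc_update(UINT32_C(0xFFFFFFFF), p, len) ^ UINT32_C(0xFFFFFFFF);
}

/*
 * Receiver-side check: run the raw update over message+CRC.  With the
 * assumptions above the remainder is not 0 but the fixed value
 * 0xDEBB20E3.
 */
static int crc32_frame_ok(const unsigned char *frame, size_t len)
{
	return crc_update(UINT32_C(0xFFFFFFFF), frame, len) ==
	       UINT32_C(0xDEBB20E3);
}
```

With these choices, crc32_inverted() over the nine ASCII bytes
"123456789" gives 0xCBF43926.  Starting crc_update() from 0 instead
shows the problem being solved: leading zero bytes leave a zero
remainder unchanged, so the message with and without them produces the
same result.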