linux/Documentation/crc32.txt
A brief CRC tutorial.

A CRC is a long-division remainder.  You add the CRC to the message,
and the whole thing (message+CRC) is a multiple of the given
CRC polynomial.  To check the CRC, you can either check that the
CRC matches the recomputed value, *or* you can check that the
remainder computed on the message+CRC is 0.  This latter approach
is used by a lot of hardware implementations, and is why so many
protocols put the end-of-frame flag after the CRC.

It's actually the same long division you learned in school, except that:
- We're working in binary, so the digits are only 0 and 1, and
- When dividing polynomials, there are no carries.  Rather than add and
  subtract, we just xor.  Thus, we tend to get a bit sloppy about
  the difference between adding and subtracting.
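
To make the "no carries" rule concrete, here is a minimal sketch of
carry-less long division on small integers.  The function name gf2_mod
is made up for this illustration and is not part of any kernel API;
each bit of an integer stands for one polynomial coefficient.

```c
#include <stdint.h>

/*
 * Hypothetical helper: remainder of dividend / divisor where both are
 * polynomials with 0/1 coefficients, one coefficient per bit.
 * "Subtracting" a multiple of the divisor is just an xor.
 */
static uint32_t gf2_mod(uint32_t dividend, uint32_t divisor)
{
        int divisor_bits = 0;   /* degree of the divisor, plus one */
        int shift;
        uint32_t d;

        for (d = divisor; d; d >>= 1)
                divisor_bits++;
        if (divisor_bits == 0)
                return dividend;        /* division by zero: left undefined here */

        /* Walk from the highest possible alignment down to alignment 0. */
        for (shift = 32 - divisor_bits; shift >= 0; shift--) {
                uint32_t lead = (uint32_t)1 << (shift + divisor_bits - 1);

                /* If the leading term is present, cancel it with an xor. */
                if (dividend & lead)
                        dividend ^= divisor << shift;
        }
        return dividend;
}
```

For example, dividing 110101 (0x35) by 1011 (0xB) this way leaves the
remainder 100 (4), and xoring that remainder back into the dividend
gives 0x31, which is an exact multiple of 0xB.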

Like all division, the remainder is always smaller than the divisor.
To produce a 32-bit CRC, the divisor is actually a 33-bit CRC polynomial.
Since it's 33 bits long, bit 32 is always going to be set, so usually the
CRC is written in hex with the most significant bit omitted.  (If you're
familiar with the IEEE 754 floating-point format, it's the same idea.)
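
As an illustration of the omitted top bit, the sketch below assumes the
widely used CRC-32 polynomial (the one used by Ethernet and zlib); the
text above does not name a particular polynomial, so treat the constants
and the helper name as assumptions made for this example.

```c
#include <stdint.h>

/* Assumption: the common CRC-32 (IEEE 802.3) polynomial, used as an example. */
#define CRCPOLY_FULL UINT64_C(0x104C11DB7)   /* all 33 coefficients, x^32 .. x^0 */
#define CRCPOLY_BE   UINT32_C(0x04C11DB7)    /* same polynomial, x^32 term implied */

/* Hypothetical helper: drop the always-set bit 32 to get the 32-bit notation. */
static uint32_t crcpoly_drop_top_bit(uint64_t full)
{
        return (uint32_t)(full & UINT64_C(0xFFFFFFFF));
}
```

Nothing is lost by dropping the bit: the divisor's x^32 term is known to
be present, so only the lower 32 coefficients need to be stored.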

Note that a CRC is computed over a string of *bits*, so you have
to decide on the endianness of the bits within each byte.  To get
the best error-detecting properties, this should correspond to the
order they're actually sent.  For example, standard RS-232 serial is
little-endian; the most significant bit (sometimes used for parity)
is sent last.  And when appending a CRC word to a message, you should
do it in the right order, matching the endianness.

Just like with ordinary division, you proceed one digit (bit) at a time.
Each step of the division you take one more digit (bit) of the dividend
and append it to the current remainder.  Then you figure out the
appropriate multiple of the divisor to subtract to bring the remainder
back into range.  In binary, this is easy - it has to be either 0 or 1,
and to make the XOR cancel, it's just a copy of bit 32 of the remainder.

When computing a CRC, we don't care about the quotient, so we can
throw the quotient bit away, but subtract the appropriate multiple of
the polynomial from the remainder and we're back to where we started,
ready to process the next bit.

A big-endian CRC written this way would be coded like:
for (i = 0; i < input_bits; i++) {
        multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
        remainder = (remainder << 1 | next_input_bit()) ^ multiple;
}

Notice how, to get at bit 32 of the shifted remainder, we look
at bit 31 of the remainder *before* shifting it.

But also notice how the next_input_bit() bits we're shifting into
the remainder don't actually affect any decision-making until
32 bits later.  Thus, the first 32 cycles of this are pretty boring.
Also, to add the CRC to a message, we need a 32-bit-long hole for it at
the end, so we have to add 32 extra cycles shifting in zeros at the
end of every message.

These details lead to a standard trick: rearrange merging in the
next_input_bit() until the moment it's needed.  Then the first 32 cycles
can be precomputed, and merging in the final 32 zero bits to make room
for the CRC can be skipped entirely.  This changes the code to:

for (i = 0; i < input_bits; i++) {
        remainder ^= next_input_bit() << 31;
        multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
        remainder = (remainder << 1) ^ multiple;
}

With this optimization, the little-endian code is particularly simple:
for (i = 0; i < input_bits; i++) {
        remainder ^= next_input_bit();
        multiple = (remainder & 1) ? CRCPOLY : 0;
        remainder = (remainder >> 1) ^ multiple;
}

The most significant coefficient of the remainder polynomial is stored
in the least significant bit of the binary "remainder" variable.
The other details of endianness have been hidden in CRCPOLY (which must
be bit-reversed) and next_input_bit().

As long as next_input_bit is returning the bits in a sensible order, we don't
*have* to wait until the last possible moment to merge in additional bits.
We can do it 8 bits at a time rather than 1 bit at a time:
for (i = 0; i < input_bytes; i++) {
        remainder ^= next_input_byte() << 24;
        for (j = 0; j < 8; j++) {
                multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
                remainder = (remainder << 1) ^ multiple;
        }
}

Or in little-endian:
for (i = 0; i < input_bytes; i++) {
        remainder ^= next_input_byte();
        for (j = 0; j < 8; j++) {
                multiple = (remainder & 1) ? CRCPOLY : 0;
                remainder = (remainder >> 1) ^ multiple;
        }
}

If the input is a multiple of 32 bits, you can even XOR in a 32-bit
word at a time and increase the inner loop count to 32.

You can also mix and match the two loop styles, for example doing the
bulk of a message byte-at-a-time and adding bit-at-a-time processing
for any fractional bytes at the end.

To reduce the number of conditional branches, software commonly uses
the byte-at-a-time table method, popularized by Dilip V. Sarwate,
"Computation of Cyclic Redundancy Checks via Table Look-Up", Comm. ACM
v.31 no.8 (August 1988) p. 1008-1013.

Here, rather than just shifting one bit of the remainder to decide
on the correct multiple to subtract, we can shift a byte at a time.
This produces a 40-bit (rather than a 33-bit) intermediate remainder,
and the correct multiple of the polynomial to subtract is found using
a 256-entry lookup table indexed by the high 8 bits.

(The table entries are simply the CRC-32 of the given one-byte messages.)
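
The table method can be sketched as follows for the little-endian
variant.  This is an illustration under stated assumptions, not the
kernel's implementation: the polynomial is assumed to be the common
CRC-32 one in bit-reversed form, and the names crc_table, crc_one_byte,
crc_table_init, crc_le_table and crc_le_bitwise are invented here.  In
the little-endian layout the "high 8 bits" of the intermediate remainder
are the low 8 bits of the variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit-reversed form of the common CRC-32 polynomial 0x04C11DB7. */
#define CRCPOLY_LE UINT32_C(0xEDB88320)

static uint32_t crc_table[256];

/* Bit-at-a-time remainder of a single byte (zero initial remainder). */
static uint32_t crc_one_byte(uint8_t byte)
{
        uint32_t remainder = byte;
        int j;

        for (j = 0; j < 8; j++) {
                uint32_t multiple = (remainder & 1) ? CRCPOLY_LE : 0;

                remainder = (remainder >> 1) ^ multiple;
        }
        return remainder;
}

/* Entry n is the remainder of the one-byte message n. */
static void crc_table_init(void)
{
        unsigned int n;

        for (n = 0; n < 256; n++)
                crc_table[n] = crc_one_byte((uint8_t)n);
}

/* Byte-at-a-time: one lookup replaces the 8-step inner loop. */
static uint32_t crc_le_table(uint32_t remainder, const uint8_t *buf, size_t len)
{
        while (len--)
                remainder = (remainder >> 8) ^
                            crc_table[(remainder ^ *buf++) & 0xff];
        return remainder;
}

/* Reference version: the plain little-endian byte loop from the text. */
static uint32_t crc_le_bitwise(uint32_t remainder, const uint8_t *buf, size_t len)
{
        size_t i;
        int j;

        for (i = 0; i < len; i++) {
                remainder ^= buf[i];
                for (j = 0; j < 8; j++) {
                        uint32_t multiple = (remainder & 1) ? CRCPOLY_LE : 0;

                        remainder = (remainder >> 1) ^ multiple;
                }
        }
        return remainder;
}
```

Call crc_table_init() once before crc_le_table().  Both functions
should return the same remainder for the same input and starting value;
the table version simply trades 1 KiB of memory for the inner loop.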

When space is more constrained, smaller tables can be used, e.g. two
4-bit shifts followed by a lookup in a 16-entry table.

It is not practical to process much more than 8 bits at a time using this
technique, because tables larger than 256 entries use too much memory and,
more importantly, too much of the L1 cache.

To get higher software performance, a "slicing" technique can be used.
See "High Octane CRC Generation with the Intel Slicing-by-8 Algorithm",
ftp://download.intel.com/technology/comms/perfnet/download/slicing-by-8.pdf

This does not change the number of table lookups, but does increase
the parallelism.  With the classic Sarwate algorithm, each table lookup
must be completed before the index of the next can be computed.

A "slicing by 2" technique would shift the remainder 16 bits at a time,
producing a 48-bit intermediate remainder.  Rather than doing a single
lookup in a 65536-entry table, the two high bytes are looked up in
two different 256-entry tables.  Each contains the remainder required
to cancel out the corresponding byte.  The tables are different because the
polynomials to cancel are different.  One has non-zero coefficients from
x^32 to x^39, while the other goes from x^40 to x^47.

Since modern processors can handle many parallel memory operations, this
takes barely longer than a single table look-up and thus performs almost
twice as fast as the basic Sarwate algorithm.

This can be extended to "slicing by 4" using 4 256-entry tables.
Each step, 32 bits of data is fetched, XORed with the CRC, and the result
broken into bytes and looked up in the tables.  Because the 32-bit shift
leaves the low-order bits of the intermediate remainder zero, the
final CRC is simply the XOR of the 4 table look-ups.

But this still enforces sequential execution: a second group of table
look-ups cannot begin until the previous group's 4 table look-ups have all
been completed.  Thus, the processor's load/store unit is sometimes idle.

To make maximum use of the processor, "slicing by 8" performs 8 look-ups
in parallel.  Each step, the 32-bit CRC is shifted 64 bits and XORed
with 64 bits of input data.  What is important to note is that 4 of
those 8 bytes are simply copies of the input data; they do not depend
on the previous CRC at all.  Thus, those 4 table look-ups may commence
immediately, without waiting for the previous loop iteration.

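A rough sketch of "slicing by 4" for the little-endian variant is shown
here.  It is an illustration only: the polynomial is assumed to be the
bit-reversed common CRC-32 one, and slice_table, slice_table_init and
crc_le_slice4 are names invented for this example.  Table k holds the
remainder of a byte followed by k zero bytes, which is what lets the
four look-ups be combined with a plain XOR.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit-reversed form of the common CRC-32 polynomial 0x04C11DB7. */
#define CRCPOLY_LE UINT32_C(0xEDB88320)

static uint32_t slice_table[4][256];

static void slice_table_init(void)
{
        unsigned int n;
        int j, k;

        /* Table 0 is the ordinary byte-at-a-time table. */
        for (n = 0; n < 256; n++) {
                uint32_t r = n;

                for (j = 0; j < 8; j++)
                        r = (r >> 1) ^ ((r & 1) ? CRCPOLY_LE : 0);
                slice_table[0][n] = r;
        }
        /* Table k is table k-1 advanced past one more zero byte. */
        for (n = 0; n < 256; n++) {
                for (k = 1; k < 4; k++) {
                        uint32_t prev = slice_table[k - 1][n];

                        slice_table[k][n] = (prev >> 8) ^
                                            slice_table[0][prev & 0xff];
                }
        }
}

static uint32_t crc_le_slice4(uint32_t remainder, const uint8_t *buf, size_t len)
{
        while (len >= 4) {
                /* Assemble the word byte by byte so host byte order is irrelevant. */
                remainder ^= (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
                             (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
                /* Four independent look-ups; the first input byte has 3 bytes after it. */
                remainder = slice_table[3][remainder & 0xff] ^
                            slice_table[2][(remainder >> 8) & 0xff] ^
                            slice_table[1][(remainder >> 16) & 0xff] ^
                            slice_table[0][remainder >> 24];
                buf += 4;
                len -= 4;
        }
        /* Leftover bytes fall back to the plain byte-at-a-time step. */
        while (len--)
                remainder = (remainder >> 8) ^
                            slice_table[0][(remainder ^ *buf++) & 0xff];
        return remainder;
}
```

Call slice_table_init() once before use.  Slicing by 8 follows the same
pattern with eight tables and 64 bits of input per step.
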
By always having 4 loads in flight, a modern superscalar processor can
be kept busy and make full use of its L1 cache.

Two more details about CRC implementation in the real world:

Normally, appending zero bits to a message which is already a multiple
of a polynomial produces a larger multiple of that polynomial.  Thus,
a basic CRC will not detect appended zero bits (or bytes).  To enable
a CRC to detect this condition, it's common to invert the CRC before
appending it.  This makes the remainder of the message+crc come out not
as zero, but some fixed non-zero value.  (The CRC of the inversion
pattern, 0xffffffff.)

The same problem applies to zero bits prepended to the message, and a
similar solution is used.  Instead of starting the CRC computation with
a remainder of 0, an initial remainder of all ones is used.  As long as
you start the same way on decoding, it doesn't make a difference.
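
Putting both details together, here is a minimal sketch of a complete
little-endian CRC with an all-ones initial remainder and a final
inversion.  The polynomial is again assumed to be the common CRC-32 one
in bit-reversed form, and crc_le_update, crc32_finished and crc32_append
are hypothetical names used only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit-reversed form of the common CRC-32 polynomial 0x04C11DB7. */
#define CRCPOLY_LE UINT32_C(0xEDB88320)

/* Raw little-endian remainder update, one byte (8 bit steps) at a time. */
static uint32_t crc_le_update(uint32_t remainder, const uint8_t *buf, size_t len)
{
        size_t i;
        int j;

        for (i = 0; i < len; i++) {
                remainder ^= buf[i];
                for (j = 0; j < 8; j++) {
                        uint32_t multiple = (remainder & 1) ? CRCPOLY_LE : 0;

                        remainder = (remainder >> 1) ^ multiple;
                }
        }
        return remainder;
}

/* Start from all ones (catches prepended zeros), invert at the end
 * (catches appended zeros). */
static uint32_t crc32_finished(const uint8_t *buf, size_t len)
{
        return crc_le_update(UINT32_C(0xFFFFFFFF), buf, len) ^
               UINT32_C(0xFFFFFFFF);
}

/* Store the finished CRC after the message, least significant byte first.
 * The caller must provide room for 4 extra bytes at buf[len]. */
static void crc32_append(uint8_t *buf, size_t len)
{
        uint32_t crc = crc32_finished(buf, len);
        int k;

        for (k = 0; k < 4; k++)
                buf[len + k] = (uint8_t)(crc >> (8 * k));
}
```

With these assumptions, crc32_finished() over the nine ASCII bytes
"123456789" yields 0xCBF43926, the usual check value for this CRC.
Running crc_le_update() with an all-ones start over a message plus its
appended CRC should give the same fixed non-zero remainder for every
message, which is the property a receiver can test.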