linux/Documentation/crc32.txt
A brief CRC tutorial.

A CRC is a long-division remainder.  You add the CRC to the message,
and the whole thing (message+CRC) is a multiple of the given
CRC polynomial.  To check the CRC, you can either check that the
CRC matches the recomputed value, *or* you can check that the
remainder computed on the message+CRC is 0.  This latter approach
is used by a lot of hardware implementations, and is why so many
protocols put the end-of-frame flag after the CRC.

It's actually the same long division you learned in school, except that:
- We're working in binary, so the digits are only 0 and 1, and
- When dividing polynomials, there are no carries.  Rather than add and
  subtract, we just xor.  Thus, we tend to get a bit sloppy about
  the difference between adding and subtracting.

Like all division, the remainder is always smaller than the divisor.
To produce a 32-bit CRC, the divisor is actually a 33-bit CRC polynomial.
Since it's 33 bits long, bit 32 is always going to be set, so usually the
CRC polynomial is written in hex with the most significant bit omitted.
(If you're familiar with the IEEE 754 floating-point format, it's the
same idea.)

Note that a CRC is computed over a string of *bits*, so you have
to decide on the endianness of the bits within each byte.  To get
the best error-detecting properties, this should correspond to the
order they're actually sent.  For example, standard RS-232 serial is
little-endian; the most significant bit (sometimes used for parity)
is sent last.  And when appending a CRC word to a message, you should
do it in the right order, matching the endianness.

Just like with ordinary division, you proceed one digit (bit) at a time.
Each step of the division you take one more digit (bit) of the dividend
and append it to the current remainder.  Then you figure out the
appropriate multiple of the divisor to subtract to bring the remainder
back into range.  In binary, this is easy - it has to be either 0 or 1,
and to make the XOR cancel, it's just a copy of bit 32 of the remainder.

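As a rough illustration of the step just described, here is a minimal
sketch that does the division literally, keeping the 33-bit polynomial in
a 64-bit variable.  The name poly_remainder() and the choice of the
standard CRC-32 polynomial (0x04C11DB7 with bit 32 written out) are
assumptions made for this example; bits are taken most significant first.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: the CRC-32 polynomial with bit 32 written out. */
#define POLY33 0x104C11DB7ULL

/*
 * Long division of a bit string (most significant bit of each byte first)
 * by POLY33.  Returns the 32-bit remainder; the quotient is thrown away.
 */
static uint32_t poly_remainder(const unsigned char *msg, size_t len)
{
	uint64_t remainder = 0;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			/* Append one more bit of the dividend. */
			remainder = (remainder << 1) | ((msg[i] >> bit) & 1);
			/* The multiple to subtract is a copy of bit 32. */
			if (remainder & (1ULL << 32))
				remainder ^= POLY33;
		}
	}
	return (uint32_t)remainder;
}
```

To use it as a CRC, run it over the message followed by four zero bytes,
then overwrite those four bytes with the result, most significant byte
first.  Running it again over the whole thing should then return 0.
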
When computing a CRC, we don't care about the quotient, so we can
throw the quotient bit away, but subtract the appropriate multiple of
the polynomial from the remainder and we're back to where we started,
ready to process the next bit.

A big-endian CRC written this way would be coded like:
for (i = 0; i < input_bits; i++) {
        multiple = remainder & 0x80000000 ? CRCPOLY : 0;
        remainder = (remainder << 1 | next_input_bit()) ^ multiple;
}

Notice how, to get at bit 32 of the shifted remainder, we look
at bit 31 of the remainder *before* shifting it.

But also notice how the next_input_bit() bits we're shifting into
the remainder don't actually affect any decision-making until
32 bits later.  Thus, the first 32 cycles of this are pretty boring.
Also, to add the CRC to a message, we need a 32-bit-long hole for it at
the end, so we have to add 32 extra cycles shifting in zeros at the
end of every message.

These details lead to a standard trick: delay merging in the
next_input_bit() until the moment it's needed.  Then the first 32 cycles
can be precomputed, and merging in the final 32 zero bits to make room
for the CRC can be skipped entirely.  This changes the code to:

for (i = 0; i < input_bits; i++) {
        remainder ^= next_input_bit() << 31;
        multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
        remainder = (remainder << 1) ^ multiple;
}

With this optimization, the little-endian code is particularly simple:
for (i = 0; i < input_bits; i++) {
        remainder ^= next_input_bit();
        multiple = (remainder & 1) ? CRCPOLY : 0;
        remainder = (remainder >> 1) ^ multiple;
}

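To see that the trick really changes nothing, here is a small sketch of
both big-endian forms side by side.  The function names, the way bits are
pulled out of a byte buffer (standing in for next_input_bit()), and the
value 0x04C11DB7 for CRCPOLY are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: the standard CRC-32 polynomial, bit 32 omitted. */
#define CRCPOLY_BE 0x04C11DB7u

/* First form: shift input bits in at the bottom, then 32 extra zero cycles. */
static uint32_t crc_be_naive(const unsigned char *msg, size_t len)
{
	uint32_t remainder = 0, multiple, bit;
	size_t i, nbits = len * 8 + 32;

	for (i = 0; i < nbits; i++) {
		/* Real message bits first, then the 32 trailing zeros. */
		bit = (i < len * 8) ? (msg[i / 8] >> (7 - i % 8)) & 1u : 0;
		multiple = (remainder & 0x80000000u) ? CRCPOLY_BE : 0;
		remainder = ((remainder << 1) | bit) ^ multiple;
	}
	return remainder;
}

/* Second form: merge each input bit in at bit 31, no trailing zeros needed. */
static uint32_t crc_be_fast(const unsigned char *msg, size_t len)
{
	uint32_t remainder = 0, multiple, bit;
	size_t i;

	for (i = 0; i < len * 8; i++) {
		bit = (msg[i / 8] >> (7 - i % 8)) & 1u;
		remainder ^= bit << 31;
		multiple = (remainder & 0x80000000u) ? CRCPOLY_BE : 0;
		remainder = (remainder << 1) ^ multiple;
	}
	return remainder;
}
```

Both functions should return the same value for any buffer; the second
simply does it in 32 fewer cycles per message.
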
The most significant coefficient of the remainder polynomial is stored
in the least significant bit of the binary "remainder" variable.
The other details of endianness have been hidden in CRCPOLY (which must
be bit-reversed) and next_input_bit().

As long as next_input_bit is returning the bits in a sensible order, we don't
*have* to wait until the last possible moment to merge in additional bits.
We can do it 8 bits at a time rather than 1 bit at a time:
for (i = 0; i < input_bytes; i++) {
        remainder ^= next_input_byte() << 24;
        for (j = 0; j < 8; j++) {
                multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
                remainder = (remainder << 1) ^ multiple;
        }
}

Or in little-endian:
for (i = 0; i < input_bytes; i++) {
        remainder ^= next_input_byte();
        for (j = 0; j < 8; j++) {
                multiple = (remainder & 1) ? CRCPOLY : 0;
                remainder = (remainder >> 1) ^ multiple;
        }
}

If the input is a multiple of 32 bits, you can even XOR in a 32-bit
word at a time and increase the inner loop count to 32.

You can also mix and match the two loop styles, for example doing the
bulk of a message byte-at-a-time and adding bit-at-a-time processing
for any fractional bytes at the end.

To reduce the number of conditional branches, software commonly uses
the byte-at-a-time table method, popularized by Dilip V. Sarwate,
"Computation of Cyclic Redundancy Checks via Table Look-Up", Comm. ACM
v.31 no.8 (August 1988) p. 1008-1013.

Here, rather than just shifting one bit of the remainder to decide
on the correct multiple to subtract, we can shift a byte at a time.
This produces a 40-bit (rather than a 33-bit) intermediate remainder,
and the correct multiple of the polynomial to subtract is found using
a 256-entry lookup table indexed by the high 8 bits.

(The table entries are simply the CRC-32 of the given one-byte messages.)

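A minimal sketch of the table method in its little-endian form follows.
The names crc_table, crc_table_init(), crc_le_bitwise() and
crc_le_table() are made up for this example, and CRCPOLY is assumed to be
the bit-reversed CRC-32 polynomial 0xEDB88320.  In this orientation the
"high 8 bits" of the remainder polynomial live in the low byte of the
variable, so that is what indexes the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: the bit-reversed CRC-32 polynomial. */
#define CRCPOLY_LE 0xEDB88320u

static uint32_t crc_table[256];

/* Each entry is the bit-at-a-time remainder of the one-byte message i. */
static void crc_table_init(void)
{
	uint32_t i, remainder;
	int j;

	for (i = 0; i < 256; i++) {
		remainder = i;
		for (j = 0; j < 8; j++)
			remainder = (remainder >> 1) ^
				    ((remainder & 1) ? CRCPOLY_LE : 0);
		crc_table[i] = remainder;
	}
}

/* Reference version: the little-endian byte loop with its inner bit loop. */
static uint32_t crc_le_bitwise(uint32_t remainder, const unsigned char *p,
			       size_t len)
{
	int j;

	while (len--) {
		remainder ^= *p++;
		for (j = 0; j < 8; j++)
			remainder = (remainder >> 1) ^
				    ((remainder & 1) ? CRCPOLY_LE : 0);
	}
	return remainder;
}

/* Table version: one lookup replaces the eight conditional steps. */
static uint32_t crc_le_table(uint32_t remainder, const unsigned char *p,
			     size_t len)
{
	while (len--)
		remainder = (remainder >> 8) ^
			    crc_table[(remainder ^ *p++) & 0xff];
	return remainder;
}
```

After calling crc_table_init() once, crc_le_table() should agree with
crc_le_bitwise() for any starting remainder and any buffer.
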
When space is more constrained, smaller tables can be used, e.g. two
4-bit shifts followed by a lookup in a 16-entry table.

It is not practical to process much more than 8 bits at a time using this
technique, because tables larger than 256 entries use too much memory and,
more importantly, too much of the L1 cache.

To get higher software performance, a "slicing" technique can be used.
See "High Octane CRC Generation with the Intel Slicing-by-8 Algorithm",
ftp://download.intel.com/technology/comms/perfnet/download/slicing-by-8.pdf

This does not change the number of table lookups, but does increase
the parallelism.  With the classic Sarwate algorithm, each table lookup
must be completed before the index of the next can be computed.

A "slicing by 2" technique would shift the remainder 16 bits at a time,
producing a 48-bit intermediate remainder.  Rather than doing a single
lookup in a 65536-entry table, the two high bytes are looked up in
two different 256-entry tables.  Each contains the remainder required
to cancel out the corresponding byte.  The tables are different because the
polynomials to cancel are different.  One has non-zero coefficients from
x^32 to x^39, while the other goes from x^40 to x^47.

Since modern processors can handle many parallel memory operations, this
takes barely longer than a single table look-up and thus performs almost
twice as fast as the basic Sarwate algorithm.

This can be extended to "slicing by 4" using 4 256-entry tables.
Each step, 32 bits of data is fetched, XORed with the CRC, and the result
broken into bytes and looked up in the tables.  Because the 32-bit shift
leaves the low-order bits of the intermediate remainder zero, the
final CRC is simply the XOR of the 4 table look-ups.

But this still enforces sequential execution: a second group of table
look-ups cannot begin until the previous group's 4 table look-ups have all
been completed.  Thus, the processor's load/store unit is sometimes idle.

To make maximum use of the processor, "slicing by 8" performs 8 look-ups
in parallel.  Each step, the 32-bit CRC is shifted 64 bits and XORed
with 64 bits of input data.  What is important to note is that 4 of
those 8 bytes are simply copies of the input data; they do not depend
on the previous CRC at all.  Thus, those 4 table look-ups may commence
immediately, without waiting for the previous loop iteration.

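As a sketch of the idea (not the kernel's actual implementation), here is
a little-endian "slicing by 4" next to the plain byte-at-a-time loop.  The
names tab, slice4_init(), crc_le_bytewise() and crc_le_slice4() and the
polynomial value 0xEDB88320 are assumptions made for this example.
Slicing by 8 follows the same pattern with eight tables and 64 bits of
input per step.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: the bit-reversed CRC-32 polynomial. */
#define CRCPOLY_LE 0xEDB88320u

static uint32_t tab[4][256];

static void slice4_init(void)
{
	uint32_t i, r;
	int j, k;

	/* tab[0] is the ordinary byte-at-a-time table. */
	for (i = 0; i < 256; i++) {
		r = i;
		for (j = 0; j < 8; j++)
			r = (r >> 1) ^ ((r & 1) ? CRCPOLY_LE : 0);
		tab[0][i] = r;
	}
	/* tab[k][i] is tab[k-1][i] pushed through one more zero byte. */
	for (k = 1; k < 4; k++)
		for (i = 0; i < 256; i++)
			tab[k][i] = (tab[k - 1][i] >> 8) ^
				    tab[0][tab[k - 1][i] & 0xff];
}

static uint32_t crc_le_bytewise(uint32_t r, const unsigned char *p, size_t len)
{
	while (len--)
		r = (r >> 8) ^ tab[0][(r ^ *p++) & 0xff];
	return r;
}

static uint32_t crc_le_slice4(uint32_t r, const unsigned char *p, size_t len)
{
	while (len >= 4) {
		/* Fetch 32 bits as a little-endian word, whatever the host is. */
		r ^= (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
		     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
		/* Four independent look-ups; the new CRC is just their XOR. */
		r = tab[3][r & 0xff] ^ tab[2][(r >> 8) & 0xff] ^
		    tab[1][(r >> 16) & 0xff] ^ tab[0][r >> 24];
		p += 4;
		len -= 4;
	}
	/* Mix styles: finish any leftover bytes one at a time. */
	return crc_le_bytewise(r, p, len);
}
```

After slice4_init(), both functions should give the same result for any
buffer length.  The four look-ups in each step do not depend on one
another, which is what lets the processor overlap them.
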
By always having 4 loads in flight, a modern superscalar processor can
be kept busy and make full use of its L1 cache.

Two more details about CRC implementation in the real world:

Normally, appending zero bits to a message which is already a multiple
of a polynomial produces a larger multiple of that polynomial.  Thus,
a basic CRC will not detect appended zero bits (or bytes).  To enable
a CRC to detect this condition, it's common to invert the CRC before
appending it.  This makes the remainder of the message+crc come out not
as zero, but some fixed non-zero value.  (The CRC of the inversion
pattern, 0xffffffff.)

The same problem applies to zero bits prepended to the message, and a
similar solution is used.  Instead of starting the CRC computation with
a remainder of 0, an initial remainder of all ones is used.  As long as
you start the same way on decoding, it doesn't make a difference.
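
Both details can be tried out with a short sketch.  The function names
below and the polynomial value 0xEDB88320 are assumptions made for this
example; crc_basic() starts from 0 and inverts nothing, while
crc_inverted() starts from all ones and inverts the result before it
would be appended.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: the bit-reversed CRC-32 polynomial. */
#define CRCPOLY_LE 0xEDB88320u

/* Little-endian bit-at-a-time update, starting from the given remainder. */
static uint32_t crc_le_update(uint32_t remainder, const unsigned char *p,
			      size_t len)
{
	int j;

	while (len--) {
		remainder ^= *p++;
		for (j = 0; j < 8; j++)
			remainder = (remainder >> 1) ^
				    ((remainder & 1) ? CRCPOLY_LE : 0);
	}
	return remainder;
}

/* Basic CRC: initial remainder of 0, no inversion. */
static uint32_t crc_basic(const unsigned char *p, size_t len)
{
	return crc_le_update(0, p, len);
}

/* Initial remainder of all ones, and the CRC inverted before appending. */
static uint32_t crc_inverted(const unsigned char *p, size_t len)
{
	return ~crc_le_update(0xFFFFFFFFu, p, len);
}
```

With crc_basic(), zero bytes in front of a message leave the CRC
unchanged, and a message with its CRC appended (least significant byte
first) still divides to 0 after extra zero bytes are added at the end.
With crc_inverted(), both changes alter the result, and the remainder
over message+crc is the same non-zero value for every message.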