linux/Documentation/crypto/descore-readme.txt
Below is the original README file from the descore.shar package.
------------------------------------------------------------------------------

des - fast & portable DES encryption & decryption.
Copyright (C) 1992  Dana L. How

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU Library General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Library General Public License for more details.

You should have received a copy of the GNU Library General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Author's address: how@isl.stanford.edu

$Id: README,v 1.15 1992/05/20 00:25:32 how Exp $


==>> To compile after untarring/unsharring, just `make' <<==


This package was designed with the following goals:
1.      Highest possible encryption/decryption PERFORMANCE.
2.      PORTABILITY to any byte-addressable host with a 32bit unsigned C type
3.      Plug-compatible replacement for KERBEROS's low-level routines.

This second release includes a number of performance enhancements for
register-starved machines.  My discussions with Richard Outerbridge,
71755.204@compuserve.com, sparked a number of these enhancements.

To more rapidly understand the code in this package, inspect desSmallFips.i
(created by typing `make') BEFORE you tackle desCode.h.  The latter is set
up in a parameterized fashion so it can easily be modified by speed-daemon
hackers in pursuit of that last microsecond.
You will find it more
illuminating to inspect one specific implementation,
and then move on to the common abstract skeleton with this one in mind.


performance comparison to other available des code which i could
compile on a SPARCStation 1 (cc -O4, gcc -O2):

this code (byte-order independent):
   30us per encryption (options: 64k tables, no IP/FP)
   33us per encryption (options: 64k tables, FIPS standard bit ordering)
   45us per encryption (options:  2k tables, no IP/FP)
   48us per encryption (options:  2k tables, FIPS standard bit ordering)
  275us to set a new key (uses 1k of key tables)
        this has the quickest encryption/decryption routines i've seen.
        since i was interested in fast des filters rather than crypt(3)
        and password cracking, i haven't really bothered yet to speed up
        the key setting routine. also, i have no interest in re-implementing
        all the other junk in the mit kerberos des library, so i've just
        provided my routines with little stub interfaces so they can be
        used as drop-in replacements with mit's code or any of the mit-
        compatible packages below. (note that the first two timings above
        are highly variable because of cache effects).

kerberos des replacement from australia (version 1.95):
   53us per encryption (uses 2k of tables)
   96us to set a new key (uses 2.25k of key tables)
        so despite the author's inclusion of some of the performance
        improvements i had suggested to him, this package's
        encryption/decryption is still slower on the sparc and 68000.
        more specifically, 19-40% slower on the 68020 and 11-35% slower
        on the sparc, depending on the compiler;
        in full gory detail (ALT_ECB is a libdes variant):
        compiler        machine         desCore libdes  ALT_ECB slower by
        gcc 2.1 -O2     Sun 3/110       304  uS 369.5uS 461.8uS  22%
        cc      -O1     Sun 3/110       336  uS 436.6uS 399.3uS  19%
        cc      -O2     Sun 3/110       360  uS 532.4uS 505.1uS  40%
        cc      -O4     Sun 3/110       365  uS 532.3uS 505.3uS  38%
        gcc 2.1 -O2     Sun 4/50         48  uS  53.4uS  57.5uS  11%
        cc      -O2     Sun 4/50         48  uS  64.6uS  64.7uS  35%
        cc      -O4     Sun 4/50         48  uS  64.7uS  64.9uS  35%
        (my time measurements are not as accurate as his).
   the comments in my first release of desCore on version 1.92:
   68us per encryption (uses 2k of tables)
   96us to set a new key (uses 2.25k of key tables)
        this is a very nice package which implements the most important
        of the optimizations which i did in my encryption routines.
        it's a bit weak on common low-level optimizations which is why
        it's 39%-106% slower.  because he was interested in fast crypt(3) and
        password-cracking applications, he also used the same ideas to
        speed up the key-setting routines with impressive results.
        (at some point i may do the same in my package).  he also implements
        the rest of the mit des library.
        (code from eay@psych.psy.uq.oz.au via comp.sources.misc)

fast crypt(3) package from denmark:
        the des routine here is buried inside a loop to do the
        crypt function and i didn't feel like ripping it out and measuring
        performance. his code takes 26 sparc instructions to compute one
        des iteration; above, Quick (64k) takes 21 and Small (2k) takes 37.
        he claims to use 280k of tables but the iteration calculation seems
        to use only 128k.  his tables and code are machine independent.
        (code from glad@daimi.aau.dk via alt.sources or comp.sources.misc)

swedish reimplementation of Kerberos des library
  108us per encryption (uses 34k worth of tables)
  134us to set a new key (uses 32k of key tables to get this speed!)
        the tables used seem to be machine-independent;
        he seems to have included a lot of special case code
        so that, e.g., `long' loads can be used instead of 4 `char' loads
        when the machine's architecture allows it.
        (code obtained from chalmers.se:pub/des)

crack 3.3c package from england:
        as in crypt above, the des routine is buried in a loop. it's
        also very modified for crypt.  his iteration code uses 16k
        of tables and appears to be slow.
        (code obtained from aem@aber.ac.uk via alt.sources or comp.sources.misc)

``highly optimized'' and tweaked Kerberos/Athena code (byte-order dependent):
  165us per encryption (uses 6k worth of tables)
  478us to set a new key (uses <1k of key tables)
        so despite the comments in this code, it was possible to get
        faster code AND smaller tables, as well as making the tables
        machine-independent.
        (code obtained from prep.ai.mit.edu)

UC Berkeley code (depends on machine-endedness):
  226us per encryption
10848us to set a new key
        table sizes are unclear, but they don't look very small
        (code obtained from wuarchive.wustl.edu)


motivation and history

a while ago i wanted some des routines and the routines documented on sun's
man pages either didn't exist or dumped core.  i had heard of kerberos,
and knew that it used des, so i figured i'd use its routines.
but once
i got it and looked at the code, it really set off a lot of pet peeves -
it was too convoluted, the code had been written without taking
advantage of the regular structure of operations such as IP, E, and FP
(i.e. the author didn't sit down and think before coding),
it was excessively slow, the author had attempted to clarify the code
by adding MORE statements to make the data movement more `consistent'
instead of simplifying his implementation and cutting down on all data
movement (in particular, his use of L1, R1, L2, R2), and it was full of
idiotic `tweaks' for particular machines which failed to deliver significant
speedups but which did obfuscate everything.  so i took the test data
from his verification program and rewrote everything else.

a while later i ran across the great crypt(3) package mentioned above.
the fact that this guy was computing 2 sboxes per table lookup rather
than one (and using a MUCH larger table in the process) emboldened me to
do the same - it was a trivial change from which i had been scared away
by the larger table size.  in his case he didn't realize you don't need to keep
the working data in TWO forms, one for easy use of half the sboxes in
indexing, the other for easy use of the other half; instead you can keep
it in the form for the first half and use a simple rotate to get the other
half.  this means i have (almost) half the data manipulation and half
the table size.  in fairness though he might be encoding something particular
to crypt(3) in his tables - i didn't check.

i'm glad that i implemented it the way i did, because this C version is
portable (the ifdef's are performance enhancements) and it is faster
than versions hand-written in assembly for the sparc!


porting notes

one thing i did not want to do was write an enormous mess
which depended on endedness and other machine quirks,
and which necessarily produced different code and different lookup tables
for different machines.
see the kerberos code for an example
of what i didn't want to do; all their endedness-specific `optimizations'
obfuscate the code and in the end were slower than a simpler machine
independent approach.  however, there are always some portability
considerations of some kind, and i have included some options
for varying numbers of register variables.
perhaps some will still regard the result as a mess!

1) i assume everything is byte addressable, although i don't actually
   depend on the byte order, and that bytes are 8 bits.
   i assume word pointers can be freely cast to and from char pointers.
   note that 99% of C programs make these assumptions.
   i always use unsigned char's if the high bit could be set.
2) the typedef `word' means a 32 bit unsigned integral type.
   if `unsigned long' is not 32 bits, change the typedef in desCore.h.
   i assume sizeof(word) == 4 EVERYWHERE.

the (worst-case) cost of my NOT doing endedness-specific optimizations
in the data loading and storing code surrounding the key iterations
is less than 12%.  also, there is the added benefit that
the input and output work areas do not need to be word-aligned.
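
as a rough illustration of the byte-at-a-time loading and storing described
above: the sketch below is NOT the package's code (which also folds bit
shuffling into this step). the `word' typedef is a stand-in built on uint32_t,
load_word()/store_word() are hypothetical names, and the most-significant-
byte-first order is an assumption made only for the example. because every
byte is touched individually, the result is the same on any host byte order
and the buffer does not have to be word-aligned.

```c
#include <stdint.h>

/* stand-in for the package's `word' typedef: a 32 bit unsigned type */
typedef uint32_t word;

/* build a word from 4 bytes, first byte most significant (assumed order).
 * only byte reads are used, so p may have any alignment. */
word load_word(const unsigned char *p)
{
    return ((word)p[0] << 24) | ((word)p[1] << 16) |
           ((word)p[2] << 8)  |  (word)p[3];
}

/* inverse of load_word(): write a word out one byte at a time */
void store_word(unsigned char *p, word w)
{
    p[0] = (unsigned char)(w >> 24);
    p[1] = (unsigned char)(w >> 16);
    p[2] = (unsigned char)(w >> 8);
    p[3] = (unsigned char)w;
}
```

the price is a few extra shifts and ors per block compared with a single
word load, which is the kind of overhead the worst-case figure above refers to.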


OPTIONAL performance optimizations

1) you should define one of `i386,' `vax,' `mc68000,' or `sparc,'
   whichever one is closest to the capabilities of your machine.
   see the start of desCode.h to see exactly what this selection implies.
   note that if you select the wrong one, the des code will still work;
   these are just performance tweaks.
2) for those with functional `asm' keywords: you should change the
   ROR and ROL macros to use machine rotate instructions if you have them.
   this will save 2 instructions and a temporary per use,
   or about 32 to 40 instructions per en/decryption.
   note that gcc is smart enough to translate the ROL/R macros into
   machine rotates!

these optimizations are all rather persnickety, yet with them you should
be able to get performance equal to assembly-coding, except that:
1) with the lack of a bit rotate operator in C, rotates have to be synthesized
   from shifts.  so access to `asm' will speed things up if your machine
   has rotates, as explained above in (2) (not necessary if you use gcc).
2) if your machine has less than
12 32-bit registers i doubt your compiler will
   generate good code.
   `i386' tries to configure the code for a 386 by only declaring 3 registers
   (it appears that gcc can use ebx, esi and edi to hold register variables).
   however, if you like assembly coding, the 386 does have 7 32-bit registers,
   and if you use ALL of them, use `scaled by 8' address modes with displacement
   and other tricks, you can get reasonable routines for DesQuickCore... with
   about 250 instructions apiece.  For DesSmall... it will help to rearrange
   des_keymap, i.e., now the sbox # is the high part of the index and
   the 6 bits of data is the low part; it helps to exchange these.
   since i have no way to conveniently test it i have not provided my
   shoehorned 386 version.  note that with this release of desCore, gcc is able
   to put everything in registers(!), and generate about 370 instructions apiece
   for the DesQuickCore... routines!

coding notes

the en/decryption routines each use 6 necessary register variables,
with 4 being actively used at once during the inner iterations.
if you don't have 4 register variables get a new machine.
up to 8 more registers are used to hold constants in some configurations.

i assume that the use of a constant is more expensive than using a register:
a) additionally, i have tried to put the larger constants in registers.
   registering priority was by the following:
        anything more than 12 bits (bad for RISC and CISC)
        greater than 127 in value (can't use movq or byte immediate on CISC)
        9-127 (may not be able to use CISC shift immediate or add/sub quick),
        1-8 were never registered, being the cheapest constants.
b) the compiler may be too stupid to realize table and table+256 should
   be assigned to different constant registers and instead repetitively
   do the arithmetic, so i assign these to explicit `m' register variables
   when possible and helpful.

i assume that indexing is cheaper or equivalent to auto increment/decrement,
where the index is 7 bits unsigned or smaller.
this assumption is reversed for 68k and vax.

i assume that addresses can be cheaply formed from two registers,
or from a register and a small constant.
for the 68000, the `two registers and small offset' form is used sparingly.
all index scaling is done explicitly - no hidden shifts by log2(sizeof).

the code is written so that even a dumb compiler
should never need more than one hidden temporary,
increasing the chance that everything will fit in the registers.
KEEP THIS MORE SUBTLE POINT IN MIND IF YOU REWRITE ANYTHING.
(actually, there are some code fragments now which do require two temps,
but fixing it would either break the structure of the macros or
require declaring another temporary).


special efficient data format

bits are manipulated in this arrangement most of the time (S7 S5 S3 S1):
        003130292827xxxx242322212019xxxx161514131211xxxx080706050403xxxx
(the x bits are still there, i'm just emphasizing where the S boxes are).
bits are rotated left 4 when computing S6 S4 S2 S0:
        282726252423xxxx201918171615xxxx121110090807xxxx040302010031xxxx
the rightmost two bits are usually cleared so the lower byte can be used
as an index into an sbox mapping table. the next two x'd bits are set
to various values to access different parts of the tables.
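
the arrangement above can be sketched in a few lines of C. this is an
illustration under stated assumptions, not the macros from desCode.h:
`word' is a uint32_t stand-in, the ROL/ROR definitions are the usual
shift-and-or synthesis (the real macros may differ), and sbox_fields()
and table_lookup() are hypothetical helpers. it shows that each byte carries
one 6-bit S-box input in its top six bits, that a rotate left by 4 exposes
the other four S-box inputs, and that clearing the low two bits of the low
byte gives a ready-made byte offset into a table of 4-byte words, so no
hidden scaling shift is needed.

```c
#include <stdint.h>

/* stand-in for the package's `word' typedef: a 32 bit unsigned type */
typedef uint32_t word;

/* rotates synthesized from two shifts and an or; n must be 1..31 */
#define ROL(x, n) ((word)(((word)(x) << (n)) | ((word)(x) >> (32 - (n)))))
#define ROR(x, n) ((word)(((word)(x) >> (n)) | ((word)(x) << (32 - (n)))))

/* pull out the four 6-bit fields held in the top six bits of each byte,
 * most significant byte first */
void sbox_fields(word x, unsigned out[4])
{
    int i;
    for (i = 0; i < 4; i++)
        out[i] = (unsigned)((x >> (26 - 8 * i)) & 0x3Fu);
}

/* index a table of words using the low byte with its two low bits cleared.
 * the masked value is already a byte offset (field * 4), so the scaling
 * is explicit rather than hidden in pointer arithmetic. */
word table_lookup(const word *table, word x)
{
    return *(const word *)((const char *)table + (x & 0xFCu));
}
```

one 64-entry table of words occupies 256 bytes, which is consistent with the
`table and table+256' remark in the coding notes.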


how to use the routines

datatypes:
        pointer to 8 byte area of type DesData
        used to hold keys and input/output blocks to des.

        pointer to 128 byte area of type DesKeys
        used to hold full 768-bit key.
        must be long-aligned.

DesQuickInit()
        call this before using any other routine with `Quick' in its name.
        it generates the special 64k table these routines need.
DesQuickDone()
        frees this table

DesMethod(m, k)
        m points to a 128byte block, k points to an 8 byte des key
        which must have odd parity (or -1 is returned) and which must
        not be a (semi-)weak key (or -2 is returned).
        normally DesMethod() returns 0.
        m is filled in from k so that when one of the routines below
        is called with m, the routine will act like standard des
        en/decryption with the key k. if you use DesMethod,
        you supply a standard 56bit key; however, if you fill in
        m yourself, you will get a 768bit key - but then it won't
        be standard.  it's 768bits not 1024 because the least significant
        two bits of each byte are not used.  note that these two bits
        will be set to magic constants which speed up the encryption/decryption
        on some machines.  and yes, each byte controls
        a specific sbox during a specific iteration.
        you really shouldn't use the 768bit format directly;  i should
        provide a routine that converts 128 6-bit bytes (specified in
        S-box mapping order or something) into the right format for you.
        this would entail some byte concatenation and rotation.

Des{Small|Quick}{Fips|Core}{Encrypt|Decrypt}(d, m, s)
        performs des on the 8 bytes at s into the 8 bytes at d. (d,s: char *).
        uses m as a 768bit key as explained above.
        the Encrypt|Decrypt choice is obvious.
        Fips|Core determines whether a completely standard FIPS initial
        and final permutation is done; if not, then the data is loaded
        and stored in a nonstandard bit order (FIPS w/o IP/FP).
        Fips slows down Quick by 10%, Small by 9%.
        Small|Quick determines whether you use the normal routine
        or the crazy quick one which gobbles up 64k more of memory.
        Small is 50% slower than Quick, but Quick needs 32 times as much
        memory.  Quick is included for programs that do nothing but DES,
        e.g., encryption filters, etc.


Getting it to compile on your machine

there are no machine-dependencies in the code (see porting),
except perhaps the `now()' macro in desTest.c.
ALL generated tables are machine independent.
you should edit the Makefile with the appropriate optimization flags
for your compiler (MAX optimization).


Speeding up kerberos (and/or its des library)

note that i have included a kerberos-compatible interface in desUtil.c
through the functions des_key_sched() and des_ecb_encrypt().
to use these with kerberos or kerberos-compatible code put desCore.a
ahead of the kerberos-compatible library on your linker's command line.
you should not need to #include desCore.h;  just include the header
file provided with the kerberos library.

Other uses

the macros in desCode.h would be very useful for putting inline des
functions in more complicated encryption routines.
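
returning to the key handling described under `how to use the routines':
DesMethod() is said to reject keys without odd parity and to fill a 128 byte
block in which only 6 bits per byte are key material. the sketch below only
illustrates those two facts; it is not DesMethod() itself, and the names
key_has_odd_parity(), key_set_odd_parity() and the DES_KEYS_* constants are
hypothetical. the parity rule used is the standard DES convention that every
key byte contains an odd number of 1 bits, with the low bit of each byte
serving as the parity bit.

```c
/* 16 iterations x 8 sboxes = 128 bytes; 6 used bits per byte = 768 bits */
enum {
    DES_KEYS_BYTES     = 16 * 8,
    DES_KEYS_USED_BITS = DES_KEYS_BYTES * 6
};

/* 1 if the byte has an odd number of 1 bits, else 0 */
static int byte_is_odd(unsigned b)
{
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return (int)(b & 1u);
}

/* 1 if all 8 key bytes have odd parity, else 0 */
int key_has_odd_parity(const unsigned char key[8])
{
    int i;
    for (i = 0; i < 8; i++)
        if (!byte_is_odd(key[i]))
            return 0;
    return 1;
}

/* force odd parity by flipping the low (parity) bit where needed */
void key_set_odd_parity(unsigned char key[8])
{
    int i;
    for (i = 0; i < 8; i++)
        if (!byte_is_odd(key[i]))
            key[i] ^= 1u;
}
```

a caller could run key_set_odd_parity() on raw key material before handing it
to a routine that insists on odd parity; note that fixing parity does nothing
about (semi-)weak keys, which are a separate check.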