                        Dynamic DMA mapping
                        ===================

                 David S. Miller <davem@redhat.com>
                 Richard Henderson <rth@cygnus.com>
                  Jakub Jelinek <jakub@redhat.com>

Most of the 64bit platforms have special hardware that translates bus
addresses (DMA addresses) into physical addresses.  This is similar to
how page tables and/or a TLB translate virtual addresses to physical
addresses on a CPU.  This is needed so that e.g. PCI devices can
access with a Single Address Cycle (32bit DMA address) any page in the
64bit physical address space.  Previously in Linux those 64bit
platforms had to set artificial limits on the maximum RAM size in the
system, so that the virt_to_bus() static scheme would work (the DMA
address translation tables were simply filled on bootup to map each bus
address to the physical page __pa(bus_to_virt())).

So that Linux can use the dynamic DMA mapping, it needs some help from the
drivers, namely the drivers have to take into account that DMA addresses
should be mapped only for the time they are actually used and unmapped
after the DMA transfer.

The following API will work of course even on platforms where no such
hardware exists, see e.g. include/asm-i386/pci.h for how it is implemented on
top of the virt_to_bus interface.

First of all, you should make sure

#include <linux/pci.h>

is in your driver. This file provides the definition of the dma_addr_t
type (which can hold any valid DMA address for the platform).  Use this
type everywhere you hold a DMA (bus) address returned from the DMA
mapping functions.

                         What memory is DMA'able?

The first piece of information you must know is what kernel memory can
be used with the DMA mapping facilities.  There has been an unwritten
set of rules regarding this, and this text is an attempt to finally
write them down.

If you acquired your memory via the page allocator
(i.e. __get_free_page*()) or the generic memory allocators
(i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from
that memory using the addresses returned from those routines.

This means specifically that you may _not_ use the memory/addresses
returned from vmalloc() for DMA.  It is possible to DMA to the
_underlying_ memory mapped into a vmalloc() area, but this requires
walking page tables to get the physical addresses, and then
translating each of those pages back to a kernel address using
something like __va().  [ EDIT: Update this when we integrate
Gerd Knorr's generic code which does this. ]

This rule also means that you may not use kernel image addresses
(i.e. items in the kernel's data/text/bss segment, or your driver's)
nor may you use kernel stack addresses for DMA.  Both of these items
might be mapped somewhere entirely different than the rest of physical
memory.

Also, this means that you cannot take the address returned by a kmap()
call and DMA to/from that.  This is similar to vmalloc().

What about block I/O and networking buffers?  The block I/O and
networking subsystems make sure that the buffers they use are valid
for you to DMA from/to.

                        DMA addressing limitations

Does your device have any DMA addressing limitations?  For example, is
your device only capable of driving the low order 24-bits of address
on the PCI bus for SAC DMA transfers?  If so, you need to inform the
PCI layer of this fact.

By default, the kernel assumes that your device can address the full
32-bits in a SAC cycle.  For a 64-bit DAC capable device, this needs
to be increased.  And for a device with limitations, as discussed in
the previous paragraph, it needs to be decreased.

For correct operation, you must interrogate the PCI layer in your
device probe routine to see if the PCI controller on the machine can
properly support the DMA addressing limitation your device has.  It is
good style to do this even if your device holds the default setting,
because this shows that you did think about these issues with respect
to your device.

The query is performed via a call to pci_set_dma_mask():

        int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask);

Here, pdev is a pointer to the PCI device struct of your device, and
device_mask is a bit mask describing which bits of a PCI address your
device supports.  It returns zero if your card can perform DMA
properly on the machine given the address mask you provided.
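
The mask arithmetic itself can be tried outside the kernel.  What
follows is a minimal userspace sketch, not kernel code:
dma_mask_for_bits() and dma_addr_fits_mask() are hypothetical helpers
invented for this illustration.  It shows how a mask follows from the
number of address bits a device can drive, and what it means for a bus
address to lie within that mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low `bits` address bits set. */
static uint64_t dma_mask_for_bits(unsigned int bits)
{
        assert(bits >= 1 && bits <= 64);
        /* Shifting a 64-bit value by 64 is undefined, so treat it apart. */
        return bits == 64 ? ~(uint64_t)0 : (((uint64_t)1 << bits) - 1);
}

/* Hypothetical helper: can a device with this mask drive address addr? */
static int dma_addr_fits_mask(uint64_t addr, uint64_t mask)
{
        /* Any bit set outside the mask is a bit the device cannot drive. */
        return (addr & ~mask) == 0;
}
```

With these helpers, dma_mask_for_bits(24) gives 0x00ffffff, the mask a
device limited to 24 address bits would describe itself with, and a
bus address of 0x01000000 falls outside it.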

If it returns non-zero, your device cannot perform DMA properly on
this platform, and attempting to do so will result in undefined
behavior.  You must either use a different mask, or not use DMA.

This means that in the failure case, you have three options:

1) Use another DMA mask, if possible (see below).
2) Use some non-DMA mode for data transfer, if possible.
3) Ignore this device and do not initialize it.

It is recommended that your driver print a kernel KERN_WARNING message
when you end up performing either #2 or #3.  In this manner, if a user
of your driver reports that performance is bad or that the device is not
even detected, you can ask them for the kernel messages to find out
exactly why.

The standard 32-bit addressing PCI device would do something like
this:

        if (pci_set_dma_mask(pdev, 0xffffffff)) {
                printk(KERN_WARNING
                       "mydev: No suitable DMA available.\n");
                goto ignore_this_device;
        }

Another common scenario is a 64-bit capable device.  The approach
here is to try for 64-bit DAC addressing, but back down to a
32-bit mask should that fail.  The PCI platform code may fail the
64-bit mask not because the platform is not capable of 64-bit
addressing.  Rather, it may fail in this case simply because
32-bit SAC addressing is done more efficiently than DAC addressing.
Sparc64 is one platform which behaves in this way.

Here is how you would handle a 64-bit capable device which can drive
all 64-bits during a DAC cycle:

        int using_dac;

        if (!pci_set_dma_mask(pdev, 0xffffffffffffffffULL)) {
                using_dac = 1;
        } else if (!pci_set_dma_mask(pdev, 0xffffffff)) {
                using_dac = 0;
        } else {
                printk(KERN_WARNING
                       "mydev: No suitable DMA available.\n");
                goto ignore_this_device;
        }

If your 64-bit device is going to be an enormous consumer of DMA
mappings, this can be problematic since the DMA mappings are a
finite resource on many platforms.  Please see the "DAC Addressing
for Address Space Hungry Devices" section near the end of this
document for how to handle this case.

Finally, if your device can only drive the low 24-bits of
address during PCI bus mastering you might do something like:

        if (pci_set_dma_mask(pdev, 0x00ffffff)) {
                printk(KERN_WARNING
                       "mydev: 24-bit DMA addressing not available.\n");
                goto ignore_this_device;
        }

When pci_set_dma_mask() is successful, and returns zero, the PCI layer
saves away this mask you have provided.  The PCI layer will use this
information later when you make DMA mappings.

There is a case which we are aware of at this time, which is worth
mentioning in this documentation.  If your device supports multiple
functions (for example a sound card provides playback and record
functions) and the various different functions have _different_
DMA addressing limitations, you may wish to probe each mask and
only provide the functionality which the machine can handle.  It
is important that the last call to pci_set_dma_mask() be for the
most specific mask.

Here is pseudo-code showing how this might be done:

        #define PLAYBACK_ADDRESS_BITS   0xffffffff
        #define RECORD_ADDRESS_BITS     0x00ffffff

        struct my_sound_card *card;
        struct pci_dev *pdev;

        ...
        /* pci_set_dma_mask() returns zero on success. */
        if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) {
                card->playback_enabled = 1;
        } else {
                card->playback_enabled = 0;
                printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n",
                       card->name);
        }
        if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) {
                card->record_enabled = 1;
        } else {
                card->record_enabled = 0;
                printk(KERN_WARNING "%s: Record disabled due to DMA limitations.\n",
                       card->name);
        }

A sound card was used as an example here because this genre of PCI
devices seems to be littered with ISA chips given a PCI front end,
and thus retaining the 16MB DMA addressing limitations of ISA.

                        Types of DMA mappings

There are two types of DMA mappings:

- Consistent DMA mappings which are usually mapped at driver
  initialization, unmapped at the end and for which the hardware should
  guarantee that the device and the CPU can access the data
  in parallel and will see updates made by each other without any
  explicit software flushing.

  Think of "consistent" as "synchronous" or "coherent".

  Consistent DMA mappings are always SAC addressable.  That is
  to say, consistent DMA addresses given to the driver will always
  be in the low 32-bits of the PCI bus space.

  Good examples of what to use consistent mappings for are:

        - Network card DMA ring descriptors.
        - SCSI adapter mailbox command data structures.
        - Device firmware microcode executed out of
          main memory.

  The invariant these examples all require is that any CPU store
  to memory is immediately visible to the device, and vice
  versa.  Consistent mappings guarantee this.

  IMPORTANT: Consistent DMA memory does not preclude the usage of
             proper memory barriers.  The CPU may reorder stores to
             consistent memory just as it may with normal memory.
             Example: if it is important for the device to see the
             first word of a descriptor updated before the second,
             you must do something like:

                desc->word0 = address;
                wmb();
                desc->word1 = DESC_VALID;

             in order to get correct behavior on all platforms.

- Streaming DMA mappings which are usually mapped for one DMA transfer,
  unmapped right after it (unless you use pci_dma_sync below) and for which
  hardware can optimize for sequential accesses.

  Think of "streaming" as "asynchronous" or "outside the coherency
  domain".

  Good examples of what to use streaming mappings for are:

        - Networking buffers transmitted/received by a device.
        - Filesystem buffers written/read by a SCSI device.

  The interfaces for using this type of mapping were designed in
  such a way that an implementation can make whatever performance
  optimizations the hardware allows.  To this end, when using
  such mappings you must be explicit about what you want to happen.

Neither type of DMA mapping has alignment restrictions that come
from PCI, although some devices may have such restrictions.

                 Using Consistent DMA mappings

To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
you should do:

        dma_addr_t dma_handle;

        cpu_addr = pci_alloc_consistent(dev, size, &dma_handle);

where dev is a struct pci_dev *. You should pass NULL for PCI like buses
where devices don't have struct pci_dev (like ISA, EISA).  This may be
called in interrupt context.

The dev argument is needed because the DMA translations may be bus
specific (and often are private to the bus which the device is attached
to).

Size is the length of the region you want to allocate, in bytes.

This routine will allocate RAM for that region, so it acts similarly to
__get_free_pages (but takes size instead of a page order).  If your
driver needs regions sized smaller than a page, you may prefer using
the pci_pool interface, described below.

The consistent DMA mapping interfaces, for non-NULL dev, will always
return a DMA address which is SAC (Single Address Cycle) addressable.
Even if the device indicates (via PCI dma mask) that it may address
the upper 32-bits and thus perform DAC cycles, consistent allocation
will still only return 32-bit PCI addresses for DMA.  This is true
of the pci_pool interface as well.

In fact, as mentioned above, all consistent memory provided by the
kernel DMA APIs is always SAC addressable.

pci_alloc_consistent returns two values: the virtual address which you
can use to access it from the CPU and dma_handle which you pass to the
card.

The cpu return address and the DMA bus master address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size.  This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary.
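
To see why this alignment rule gives the 64K guarantee, here is a small
userspace sketch of the arithmetic.  It is not kernel code: the
sketch_* names are hypothetical and a 4096-byte page is assumed.  The
requested size is rounded up to a power-of-two number of pages, and a
buffer aligned to that rounded size cannot straddle any larger
power-of-two boundary.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL         /* assumption: 4 KB pages */

/* Smallest power-of-two multiple of the page size holding `size` bytes. */
static unsigned long sketch_alloc_span(unsigned long size)
{
        unsigned long span = SKETCH_PAGE_SIZE;

        assert(size > 0 && size <= (1UL << 30));
        while (span < size)
                span <<= 1;
        return span;
}

/* Does the buffer [addr, addr + size) cross a multiple of `boundary`? */
static int sketch_crosses_boundary(uint64_t addr, unsigned long size,
                                   unsigned long boundary)
{
        /* First and last byte must fall in the same boundary-sized window. */
        return (addr / boundary) != ((addr + size - 1) / boundary);
}
```

For a 48K request the span is 64K.  A buffer placed at a 64K-aligned
address such as 0x30000 stays inside one 64K window, whereas the same
buffer at 0x3c000 would cross into the next one, which is exactly what
the alignment guarantee rules out.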

To unmap and free such a DMA region, you call:

        pci_free_consistent(dev, size, cpu_addr, dma_handle);

where dev, size are the same as in the above call and cpu_addr and
dma_handle are the values pci_alloc_consistent returned to you.
This function may not be called in interrupt context.

If your driver needs lots of smaller memory regions, you can write
custom code to subdivide pages returned by pci_alloc_consistent,
or you can use the pci_pool API to do that.  A pci_pool is like
a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages.
Also, it understands common hardware constraints for alignment,
like queue heads needing to be aligned on N byte boundaries.

Create a pci_pool like this:

        struct pci_pool *pool;

        pool = pci_pool_create(name, dev, size, align, alloc, flags);

The "name" is for diagnostics (like a kmem_cache name); dev and size
are as above.  The device's hardware alignment requirement for this
type of data is "align" (which is expressed in bytes, and must be a
power of two).  The flags are SLAB_ flags as you'd pass to
kmem_cache_create.  Not all flags are understood, but SLAB_POISON may
help you find driver bugs.  If you call this in a non-sleeping
context (e.g. in_interrupt is true or while holding SMP locks), pass
SLAB_ATOMIC.  If your device has no boundary crossing restrictions,
pass 0 for alloc; passing 4096 says memory allocated from this pool
must not cross 4KByte boundaries (but in that case it may be better to
go for pci_alloc_consistent directly instead).

Allocate memory from a pci_pool like this:

        cpu_addr = pci_pool_alloc(pool, flags, &dma_handle);

The flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor
holding SMP locks), SLAB_ATOMIC otherwise.  Like pci_alloc_consistent,
this returns two values, cpu_addr and dma_handle.

Free memory that was allocated from a pci_pool like this:

        pci_pool_free(pool, cpu_addr, dma_handle);

where pool is what you passed to pci_pool_alloc, and cpu_addr and
dma_handle are the values pci_pool_alloc returned. This function
may be called in interrupt context.

Destroy a pci_pool by calling:

        pci_pool_destroy(pool);

Make sure you've called pci_pool_free for all memory allocated
from a pool before you destroy the pool. This function may not
be called in interrupt context.

                        DMA Direction

The interfaces described in subsequent portions of this document
take a DMA direction argument, which is an integer and takes on
one of the following values:

 PCI_DMA_BIDIRECTIONAL
 PCI_DMA_TODEVICE
 PCI_DMA_FROMDEVICE
 PCI_DMA_NONE

You should provide the exact DMA direction if you know it.

PCI_DMA_TODEVICE means "from main memory to the PCI device".
PCI_DMA_FROMDEVICE means "from the PCI device to main memory".
It is the direction in which the data moves during the DMA
transfer.

You are _strongly_ encouraged to specify this as precisely
as you possibly can.

If you absolutely cannot know the direction of the DMA transfer,
specify PCI_DMA_BIDIRECTIONAL.  It means that the DMA can go in
either direction.  The platform guarantees that you may legally
specify this, and that it will work, but this may be at the
cost of performance for example.

The value PCI_DMA_NONE is to be used for debugging.  You can
hold this in a data structure before you come to know the
precise direction, and this will help catch cases where your
direction tracking logic has failed to set things up properly.
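
The sentinel idea can be sketched in plain userspace C.  Everything
here is hypothetical and only models the pattern: the sketch_* names
are invented, and the enum is a stand-in for the PCI_DMA_* values, with
numeric values chosen for illustration only.  A request starts out
with no direction, and a check before mapping catches any request
whose direction was never filled in.

```c
#include <assert.h>

/* Stand-ins for the PCI_DMA_* values; the numbers are assumptions. */
enum sketch_dma_dir {
        SKETCH_DMA_BIDIRECTIONAL = 0,
        SKETCH_DMA_TODEVICE      = 1,
        SKETCH_DMA_FROMDEVICE    = 2,
        SKETCH_DMA_NONE          = 3
};

/* Hypothetical per-request state kept by a driver. */
struct sketch_request {
        enum sketch_dma_dir dir;
};

/* A fresh request has no direction yet. */
static struct sketch_request sketch_request_new(void)
{
        struct sketch_request rq = { SKETCH_DMA_NONE };

        return rq;
}

/* Record the direction once the driver knows it. */
static struct sketch_request sketch_request_set_dir(struct sketch_request rq,
                                                    enum sketch_dma_dir dir)
{
        assert(dir != SKETCH_DMA_NONE);
        rq.dir = dir;
        return rq;
}

/* Check made just before mapping: was a real direction ever set? */
static int sketch_request_can_map(struct sketch_request rq)
{
        return rq.dir != SKETCH_DMA_NONE;
}
```

A driver following this pattern would refuse to map, or complain
loudly, whenever sketch_request_can_map() reports zero.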

Another advantage of specifying the direction precisely (outside of
potential platform-specific optimizations of such) is for debugging.
Some platforms actually have a write permission boolean which DMA
mappings can be marked with, much like page protections in the user
program address space.  Such platforms can and do report errors in the
kernel logs when the PCI controller hardware detects violation of the
permission setting.

Only streaming mappings specify a direction; consistent mappings
implicitly have a direction attribute setting of
PCI_DMA_BIDIRECTIONAL.

The SCSI subsystem provides mechanisms for you to easily obtain
the direction to use, in the SCSI command:

        scsi_to_pci_dma_dir(SCSI_DIRECTION)

Where SCSI_DIRECTION is obtained from the 'sc_data_direction'
member of the SCSI command your driver is working on.  The
interface mentioned above returns a value suitable for passing
into the streaming DMA mapping interfaces below.

For Networking drivers, it's a rather simple affair.  For transmit
packets, map/unmap them with the PCI_DMA_TODEVICE direction
specifier.  For receive packets, just the opposite, map/unmap them
with the PCI_DMA_FROMDEVICE direction specifier.

                  Using Streaming DMA mappings

The streaming DMA mapping routines can be called from interrupt
context.  There are two versions of each map/unmap, one which will
map/unmap a single memory region, and one which will map/unmap a
scatterlist.

To map a single region, you do:

        struct pci_dev *pdev = mydev->pdev;
        dma_addr_t dma_handle;
        void *addr = buffer->ptr;
        size_t size = buffer->len;

        dma_handle = pci_map_single(pdev, addr, size, direction);

and to unmap it:

        pci_unmap_single(pdev, dma_handle, size, direction);

You should call pci_unmap_single when the DMA activity is finished, e.g.
from the interrupt which told you that the DMA transfer is done.

Using cpu pointers like this for single mappings has a disadvantage:
you cannot reference HIGHMEM memory in this way.  Thus, there is a
map/unmap interface pair akin to pci_{map,unmap}_single.  These
interfaces deal with page/offset pairs instead of cpu pointers.
Specifically:

        struct pci_dev *pdev = mydev->pdev;
        dma_addr_t dma_handle;
        struct page *page = buffer->page;
        unsigned long offset = buffer->offset;
        size_t size = buffer->len;

        dma_handle = pci_map_page(pdev, page, offset, size, direction);

        ...

        pci_unmap_page(pdev, dma_handle, size, direction);

Here, "offset" means byte offset within the given page.

With scatterlists, you map a region gathered from several regions by:

        int i, count = pci_map_sg(dev, sglist, nents, direction);
        struct scatterlist *sg;

        for (i = 0, sg = sglist; i < count; i++, sg++) {
                hw_address[i] = sg_dma_address(sg);
                hw_len[i] = sg_dma_len(sg);
        }

where nents is the number of entries in the sglist.

The implementation is free to merge several consecutive sglist entries
into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any
consecutive sglist entries can be merged into one provided the first one
ends and the second one starts on a page boundary - in fact this is a huge
advantage for cards which either cannot do scatter-gather or have very
limited number of scatter-gather entries) and returns the actual number
of sg entries it mapped them to.

Then you should loop count times (note: this can be less than nents times)
and use sg_dma_address() and sg_dma_len() macros where you previously
accessed sg->address and sg->length as shown above.
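
The merging rule described above can be modelled in a few lines of
userspace C.  This is a toy model under stated assumptions, not how
any platform implements pci_map_sg: struct sketch_seg and
sketch_count_merged() are hypothetical, a 4096-byte page is assumed,
and only the page-boundary condition from the text is checked.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL         /* assumption: 4 KB pages */

/* Hypothetical stand-in for one scatterlist entry. */
struct sketch_seg {
        unsigned long start;            /* start address of the entry */
        unsigned long len;              /* length in bytes */
};

/*
 * Count the segments left after merging neighbours: entry i joins the
 * running segment when the previous entry ends on a page boundary and
 * entry i starts on one.  The result is never larger than nents.
 */
static size_t sketch_count_merged(const struct sketch_seg *sg, size_t nents)
{
        size_t i, count;

        assert(sg != NULL || nents == 0);
        if (nents == 0)
                return 0;

        count = 1;
        for (i = 1; i < nents; i++) {
                unsigned long prev_end = sg[i - 1].start + sg[i - 1].len;

                if (prev_end % SKETCH_PAGE_SIZE != 0 ||
                    sg[i].start % SKETCH_PAGE_SIZE != 0)
                        count++;        /* cannot merge: new segment */
        }
        return count;
}
```

In this model three entries can collapse to two segments, which is why
a driver must loop over the returned count rather than over nents.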

To unmap a scatterlist, just call:

        pci_unmap_sg(dev, sglist, nents, direction);

Again, make sure DMA activity has already finished.

PLEASE NOTE:  The 'nents' argument to the pci_unmap_sg call must be
              the _same_ one you passed into the pci_map_sg call,
              it should _NOT_ be the 'count' value _returned_ from the
              pci_map_sg call.

Every pci_map_{single,sg} call should have its pci_unmap_{single,sg}
counterpart, because the bus address space is a shared resource (although
in some ports the mapping is per bus, so fewer devices contend for the
same bus address space) and you could render the machine unusable by eating
all bus addresses.

If you need to use the same streaming DMA region multiple times and touch
the data in between the DMA transfers, just map it with
pci_map_{single,sg}, and after each DMA transfer call either:

        pci_dma_sync_single(dev, dma_handle, size, direction);

or:

        pci_dma_sync_sg(dev, sglist, nents, direction);

as appropriate.

After the last DMA transfer call one of the DMA unmap routines
pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_*
call till pci_unmap_*, then you don't have to call the pci_dma_sync_*
routines at all.

Here is pseudo code which shows a situation in which you would need
to use the pci_dma_sync_*() interfaces.

        void my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
        {
                dma_addr_t mapping;

                mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE);

                cp->rx_buf = buffer;
                cp->rx_len = len;
                cp->rx_dma = mapping;

                give_rx_buf_to_card(cp);
        }

        ...

        void my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs)
        {
                struct my_card *cp = devid;

                ...
                if (read_card_status(cp) == RX_BUF_TRANSFERRED) {
                        struct my_card_header *hp;

                        /* Examine the header to see if we wish
                         * to accept the data.  But synchronize
                         * the DMA transfer with the CPU first
                         * so that we see updated contents.
                         */
                        pci_dma_sync_single(cp->pdev, cp->rx_dma, cp->rx_len,
                                            PCI_DMA_FROMDEVICE);

                        /* Now it is safe to examine the buffer. */
                        hp = (struct my_card_header *) cp->rx_buf;
                        if (header_is_ok(hp)) {
                                pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len,
                                                 PCI_DMA_FROMDEVICE);
                                pass_to_upper_layers(cp->rx_buf);
                                make_and_setup_new_rx_buf(cp);
                        } else {
                                /* Just give the buffer back to the card. */
                                give_rx_buf_to_card(cp);
                        }
                }
        }

Drivers converted fully to this interface should not use virt_to_bus any
longer, nor should they use bus_to_virt. Some drivers have to be changed a
little bit, because there is no longer an equivalent to bus_to_virt in the
dynamic DMA mapping scheme - you have to always store the DMA addresses
returned by the pci_alloc_consistent, pci_pool_alloc, and pci_map_single
calls (pci_map_sg stores them in the scatterlist itself if the platform
supports dynamic DMA mapping in hardware) in your driver structures and/or
in the card registers.

All PCI drivers should be using these interfaces with no exceptions.
It is planned to completely remove virt_to_bus() and bus_to_virt() as
they are entirely deprecated.  Some ports already do not provide these
as it is impossible to correctly support them.

                64-bit DMA and DAC cycle support

Do you understand all of the text above?  Great, then you already
know how to use 64-bit DMA addressing under Linux.  Simply make
the appropriate pci_set_dma_mask() calls based upon your card's
capabilities, then use the mapping APIs above.

It is that simple.

Well, not for some odd devices.  See the next section for information
about that.

        DAC Addressing for Address Space Hungry Devices

There exists a class of devices which do not mesh well with the PCI
DMA mapping API.  By definition these "mappings" are a finite
resource.  The number of total available mappings per bus is platform
specific, but there will always be a reasonable amount.

What is "reasonable"?  Reasonable means that networking and block I/O
devices need not worry about using too many mappings.

As an example of a problematic device, consider compute cluster cards.
They can potentially need to access gigabytes of memory at once via
DMA.  Dynamic mappings are unsuitable for this kind of access pattern.

To this end we've provided a small API by which a device driver
may use DAC cycles to directly address all of physical memory.
Not all platforms support this, but most do.  It is easy to determine
whether the platform will work properly at probe time.

First, understand that there may be a SEVERE performance penalty for
using these interfaces on some platforms.  Therefore, you MUST only
use these interfaces if it is absolutely required.  99% of devices can
use the normal APIs without any problems.

Note that for streaming type mappings you must either use these
interfaces, or the dynamic mapping interfaces above.  You may not mix
usage of both for the same device.  Such an act is illegal and is
guaranteed to put a banana in your tailpipe.

However, consistent mappings may in fact be used in conjunction with
these interfaces.  Remember that, as defined, consistent mappings are
always going to be SAC addressable.

The first thing your driver needs to do is query the PCI platform
layer with your device's DAC addressing capabilities:

        int pci_dac_set_dma_mask(struct pci_dev *pdev, u64 mask);

This routine behaves identically to pci_set_dma_mask.  You may not
use the following interfaces if this routine fails.

Next, DMA addresses using this API are kept track of using the
dma64_addr_t type.  It is guaranteed to be big enough to hold any
DAC address the platform layer will give to you from the following
routines.  If you have consistent mappings as well, you still
use plain dma_addr_t to keep track of those.

All mappings obtained here will be direct.  The mappings are not
translated, and this is the purpose of this dialect of the DMA API.

All routines work with page/offset pairs.  This is the _ONLY_ way to
portably refer to any piece of memory.  If you have a cpu pointer
(which may be validly DMA'd too) you may easily obtain the page
and offset using something like this:

        struct page *page = virt_to_page(ptr);
        unsigned long offset = ((unsigned long)ptr & ~PAGE_MASK);

Here are the interfaces:

        dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev,
                                         struct page *page,
                                         unsigned long offset,
                                         int direction);

The DAC address for the tuple PAGE/OFFSET is returned.  The direction
argument is the same as for pci_{map,unmap}_single().  The same rules
for cpu/device access apply here as for the streaming mapping
interfaces.  To reiterate:

        The cpu may touch the buffer before pci_dac_page_to_dma.
        The device may touch the buffer after the pci_dac_page_to_dma
        call is made, but the cpu may NOT.

When the DMA transfer is complete, invoke:

        void pci_dac_dma_sync_single(struct pci_dev *pdev,
                                     dma64_addr_t dma_addr,
                                     size_t len, int direction);

This must be done before the CPU looks at the buffer again.
This interface behaves identically to pci_dma_sync_{single,sg}().

If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t
the following interfaces are provided:

        struct page *pci_dac_dma_to_page(struct pci_dev *pdev,
                                         dma64_addr_t dma_addr);
        unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev,
                                            dma64_addr_t dma_addr);

This is possible with the DAC interfaces purely because they are
not translated in any way.
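
The round trip is easy to picture with a userspace model.  This is a
sketch under loud assumptions, not any platform's implementation: the
sketch_* names are hypothetical, a 4096-byte page is assumed, a page
is represented by its page frame number, and the bus address is taken
to be equal to the physical address.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                    /* assumption: 4 KB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

typedef uint64_t sketch_dma64_t;                /* stand-in for dma64_addr_t */

/* Untranslated mapping: page frame number and offset form the address. */
static sketch_dma64_t sketch_page_to_dma(uint64_t pfn, unsigned long offset)
{
        assert(offset < SKETCH_PAGE_SIZE);
        return (pfn << SKETCH_PAGE_SHIFT) + offset;
}

/* Nothing was translated, so the page can be read back from the address. */
static uint64_t sketch_dma_to_pfn(sketch_dma64_t dma)
{
        return dma >> SKETCH_PAGE_SHIFT;
}

/* Likewise the byte offset within the page. */
static unsigned long sketch_dma_to_offset(sketch_dma64_t dma)
{
        return (unsigned long)(dma & (SKETCH_PAGE_SIZE - 1));
}
```

With a translated (dynamic) mapping no such inverse exists, because
the bus address is an arbitrary slot handed out by the mapping
hardware rather than a function of the page.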

                Optimizing Unmap State Space Consumption

On many platforms, pci_unmap_{single,page}() is simply a nop.
Therefore, keeping track of the mapping address and length is a waste
of space.  Instead of filling your drivers up with ifdefs and the like
to "work around" this (which would defeat the whole purpose of a
portable API) the following facilities are provided.

Actually, instead of describing the macros one by one, we'll
transform some example code.

1) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures.
   Example, before:

        struct ring_state {
                struct sk_buff *skb;
                dma_addr_t mapping;
                __u32 len;
        };

   after:

        struct ring_state {
                struct sk_buff *skb;
                DECLARE_PCI_UNMAP_ADDR(mapping)
                DECLARE_PCI_UNMAP_LEN(len)
        };

   NOTE: DO NOT put a semicolon at the end of the DECLARE_*()
         macro.

2) Use pci_unmap_{addr,len}_set to set these values.
   Example, before:

        ringp->mapping = FOO;
        ringp->len = BAR;

   after:

        pci_unmap_addr_set(ringp, mapping, FOO);
        pci_unmap_len_set(ringp, len, BAR);

3) Use pci_unmap_{addr,len} to access these values.
   Example, before:

        pci_unmap_single(pdev, ringp->mapping, ringp->len,
                         PCI_DMA_FROMDEVICE);

   after:

        pci_unmap_single(pdev,
                         pci_unmap_addr(ringp, mapping),
                         pci_unmap_len(ringp, len),
                         PCI_DMA_FROMDEVICE);

It really should be self-explanatory.  We treat the ADDR and LEN
separately, because it is possible for an implementation to only
need the address in order to perform the unmap operation.
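
For readers wondering how such macros can make the fields disappear,
here is a userspace sketch of one way a platform header might define
them.  All SKETCH_/sketch_ names and the SKETCH_NEED_UNMAP_STATE
switch are hypothetical; the real definitions live in each
architecture's headers and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sketch_dma_addr_t;     /* stand-in for dma_addr_t */

#ifdef SKETCH_NEED_UNMAP_STATE
/* Platform whose unmap does real work: keep the address and length. */
#define SKETCH_DECLARE_UNMAP_ADDR(name)         sketch_dma_addr_t name;
#define SKETCH_DECLARE_UNMAP_LEN(name)          uint32_t name;
#define sketch_unmap_addr(ptr, name)            ((ptr)->name)
#define sketch_unmap_addr_set(ptr, name, val)   (((ptr)->name) = (val))
#define sketch_unmap_len(ptr, name)             ((ptr)->name)
#define sketch_unmap_len_set(ptr, name, val)    (((ptr)->name) = (val))
#else
/* Platform whose unmap is a nop: the fields vanish from the structure. */
#define SKETCH_DECLARE_UNMAP_ADDR(name)
#define SKETCH_DECLARE_UNMAP_LEN(name)
#define sketch_unmap_addr(ptr, name)            (0)
#define sketch_unmap_addr_set(ptr, name, val)   do { } while (0)
#define sketch_unmap_len(ptr, name)             (0)
#define sketch_unmap_len_set(ptr, name, val)    do { } while (0)
#endif

struct sketch_ring_state {
        void *skb;
        SKETCH_DECLARE_UNMAP_ADDR(mapping)
        SKETCH_DECLARE_UNMAP_LEN(len)
};

/* Record a mapping; compiles to nothing in the nop variant. */
static void sketch_save_mapping(struct sketch_ring_state *rs,
                                sketch_dma_addr_t addr, uint32_t nbytes)
{
        assert(rs != NULL);
        (void)addr;                     /* unused in the nop variant */
        (void)nbytes;
        sketch_unmap_addr_set(rs, mapping, addr);
        sketch_unmap_len_set(rs, len, nbytes);
}
```

The sketch also shows why the semicolon must be left off: in one
variant the macro already supplies it, and in the other the macro
expands to nothing, so an extra semicolon would leave a stray empty
declaration inside the structure.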

                        Platform Issues

If you are just writing drivers for Linux and do not maintain
an architecture port for the kernel, you can safely skip down
to "Closing".

1) Struct scatterlist requirements.

   Struct scatterlist must contain, at a minimum, the following
   members:

        char *address;
        struct page *page;
        unsigned int offset;
        unsigned int length;

   The "address" member will disappear in 2.5.x.

   This means that your pci_{map,unmap}_sg() and all other
   interfaces dealing with scatterlists must be able to cope
   properly with page being non-NULL.

   A scatterlist is in one of two states.  The base address is
   either specified by "address" or by a "page+offset" pair.
   If "address" is NULL, then "page+offset" is being used.
   If "page" is NULL, then "address" is being used.

   In 2.5.x, all scatterlists will use "page+offset".  But during
   2.4.x we still have to support the old method.

2) More to come...

                           Closing

This document, and the API itself, would not be in its current
form without the feedback and suggestions from numerous individuals.
We would like to specifically mention, in no particular order, the
following people:

        Russell King <rmk@arm.linux.org.uk>
        Leo Dagum <dagum@barrel.engr.sgi.com>
        Ralf Baechle <ralf@oss.sgi.com>
        Grant Grundler <grundler@cup.hp.com>
        Jay Estabrook <Jay.Estabrook@compaq.com>
        Thomas Sailer <sailer@ife.ee.ethz.ch>
        Andrea Arcangeli <andrea@suse.de>
        Jens Axboe <axboe@suse.de>
        David Mosberger-Tang <davidm@hpl.hp.com>