KMDF Scatter-Gather in EvtProgramDma vs Common Buffer
Posted by Maxim on July 3rd, 2007


Hi everyone

I am implementing a KMDF driver for a PCIE device and following the
toolkit's DMA framework. I don't see an elegant way of solving my problem.

My hardware device has up to 16 DMA engines, so it is capable of initiating
multiple DMA transfers in parallel. I can use that functionality to
implement scatter-gather transfers or multiple transfers, but there are
limitations which I don't seem to be able to overcome.

1. The length of a DMA access for this device has to be a multiple of 128 bytes.
I truncate the user I/O request to satisfy that requirement for the
transaction as a whole. However, individual elements of the SG list I receive in
EvtProgramDma do not necessarily satisfy this requirement. Is there a way
to make the framework enforce a minimum length for SG list fragments? Setting
the device alignment to 127 doesn't help. I'm not sure whether forcing user apps
to use 128-byte alignment would solve this problem...

2. Alternatively, I could switch to using packet-based rather than SG-based
DMA, but the framework claims to allow no more than a single
outstanding DMA at any given time in packet mode. That would not
let me achieve good PCIe utilization, i.e. bandwidth would be much lower, so
it doesn't work for me.
Is there any way of overcoming the limitation? Why is it there in the first place?

3. Finally, I could do DMA transfers to a pre-allocated common buffer, and
then copy them into the user buffer. WDF documentation doesn't really show
how that plugs into the framework. I understand I could take an interrupt,
schedule a work item at a lower IRQL, and in the work handler get a handle to
user request memory and copy it there from the common buffer. However, would
calling DMA completion routines interfere with that, e.g. overwrite the
request memory or do something equally bad? Furthermore, I would effectively
be doing DMA my way. Is there a WDF-suggested way of using common buffer
with DMAs where the DMA results need to be copied to the user buffer?

Any suggestions would be much appreciated.

Posted by Eliyas Yakub [MSFT] on July 6th, 2007


1) I'm not clear whether the issue you are having is with the length of the
fragments or the alignment. For alignments up to the page size, virtual address
alignment and logical address alignment are guaranteed to be the same. So if
your buffers are aligned to 128 bytes, then the logical address of each
fragment in the SGList will also be aligned to 128.
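
To make the alignment argument concrete, here is a minimal user-mode sketch (not driver code). It assumes 4 KB pages and an SG list that is split only at page boundaries; count_bad_fragments and the two constants are hypothetical names used for illustration, not part of KMDF. It walks a buffer page by page and counts the fragments whose offset within the page or whose length is not a multiple of 128.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KB pages */
#define DMA_GRANULE     128u   /* device requirement stated in the question */

/* Hypothetical helper: splits [va, va + length) at page boundaries, as an
 * unmerged scatter-gather list would, and returns how many fragments have
 * an in-page offset or a length that is not a multiple of DMA_GRANULE. */
static size_t count_bad_fragments(uintptr_t va, size_t length)
{
    size_t bad = 0;

    while (length > 0) {
        size_t offset = (size_t)(va % PAGE_SIZE_BYTES);
        size_t room = PAGE_SIZE_BYTES - offset;      /* bytes left in this page */
        size_t frag = length < room ? length : room; /* this fragment's length */

        if (offset % DMA_GRANULE != 0 || frag % DMA_GRANULE != 0) {
            bad++;
        }
        va += frag;
        length -= frag;
    }
    return bad;
}
```

Under these assumptions, a buffer that starts on a 128-byte boundary and whose total length is a multiple of 128 yields no bad fragments, while a buffer starting at an offset of 4 within a page yields a bad first and last fragment. Merging physically contiguous pages into one element would only add multiples of 128 together, so it would not change the result.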

2) No, switching to packet-based is not the right solution. Yes, there is a
limitation on the number of concurrent transfers, and it's due to the way DMA
support is designed in the NT system. This is not a framework-imposed limit.

3) If your hardware supports scatter-gather, then there is no point in using
a common buffer. You should DMA directly into the user buffer pages. There are
a couple of KMDF samples that demonstrate how to do this. Take a look at the PLX
and PCIDRV samples.

-Eliyas

Posted by Maxim on July 6th, 2007


Thanks for answering. I guess we'll have to force the user application to
allocate buffers on 128-byte boundaries.


Posted by Maxim on July 6th, 2007


Since you seem to know about DMA, let me ask you one more question, please.
I already posted it in a separate message:

http://www.microsoft.com/communities...59b5fe8c06&p=1

I am clear on the blocking part of that question. The DMA part of that
question remains open. Namely, the number of scatter-gather elements available in
hardware is limited, depending on its current state. Is it a certainty that for a
request of length n, the maximum number of scatter-gather elements passed to
EvtProgramDma will be n/PAGE_SIZE+1? If so, then I could leave the requests
in the queue until the right resources become available; otherwise, I would
have to evaluate the resources from within EvtProgramDma callback, and if
they are not available, create yet another queue and requeue the DMA
programming request.
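
As a rough illustration of the page-span arithmetic behind this question, here is a user-mode sketch. It assumes 4 KB pages and that each scatter-gather element covers at most one page; pages_spanned and worst_case_pages are hypothetical helper names. The first function mirrors the calculation done by the WDK macro ADDRESS_AND_SIZE_TO_SPAN_PAGES.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KB pages */

/* Number of pages touched by a buffer of `length` bytes starting at `va`. */
static size_t pages_spanned(uintptr_t va, size_t length)
{
    size_t offset = (size_t)(va % PAGE_SIZE_BYTES);

    return (offset + length + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Worst case over every possible starting offset: the buffer begins on the
 * last byte of a page (offset == PAGE_SIZE_BYTES - 1). */
static size_t worst_case_pages(size_t length)
{
    if (length == 0) {
        return 0;
    }
    return (length + 2 * PAGE_SIZE_BYTES - 2) / PAGE_SIZE_BYTES;
}
```

Under these assumptions, a 4098-byte buffer that starts on the last byte of a page touches three pages, one more than n/PAGE_SIZE+1 gives, so a bound derived only from the length would need that extra element for arbitrarily aligned buffers. The real element count can be lower when adjacent pages are physically contiguous and get merged.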

Posted by Eliyas Yakub [MSFT] on July 6th, 2007


There is an easy solution for you in KMDF. Depending on your hardware state,
you call WdfDmaEnablerSetMaximumScatterGatherElements to set the
scatter-gather element limit. When you call WdfDmaTransactionExecute to
start a DMA transaction, the framework first checks whether the
transaction meets the SG limit set by your driver. If it doesn't, then it
returns the STATUS_WDF_TOO_FRAGMENTED error. At that point, you can put the
request in a different queue and process it later when the state changes.

-Eliyas