Packet alignment constraint in WinUSB
Posted by Paarvai Naai on May 30th, 2008


I'm using WinUSB to communicate with my device and had some issues
regarding the RAW_IO pipe policy.

At first, I noticed that there is a significant performance hit when
not using the RAW_IO pipe policy. The documentation for
WinUsb_SetPipePolicy states that WinUSB's queuing and error handling
is bypassed when using this policy, but I could find no further
discussion of the queuing and error handling.

After setting the pipe policy to RAW_IO, I found that the MSDN
documentation states that calls to WinUsb_ReadPipe and
WinUsb_WritePipe must satisfy the following conditions:

* The buffer length must be a multiple of the maximum endpoint packet size.
* The length must be less than what the host controller supports.

There is a comment in the WinUSB "How to" guide that states that it is
only for read requests that buffers must be a multiple of the maximum
packet size. In contrast, the MSDN documentation states that this
constraint applies to both read and write requests.

I tested the behavior and the "How to" guide is correct. I can make
calls to WinUsb_WritePipe with an odd-sized buffer and things work as
expected. The MSDN documentation should be updated to only have the
restriction on read requests.
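
As a rough illustration of the rule as I observed it, the check below applies the packet-size multiple to reads only. It is a sketch in portable C rather than WinUSB code: raw_io_length_ok, its parameters, and the inclusive comparison against max_transfer are assumptions made for the example. In a real program the two limits would come from the endpoint descriptor and from whatever maximum transfer size the pipe reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of WinUSB. Returns true if a transfer of
 * `length` bytes satisfies the RAW_IO rules as observed in this post. */
bool raw_io_length_ok(bool is_read, size_t length,
                      size_t max_packet, size_t max_transfer)
{
    assert(max_packet > 0);

    /* Assumption: the host controller limit is treated as inclusive. */
    if (length > max_transfer)
        return false;

    /* Only reads (IN transfers) need a whole number of packets. */
    if (is_read && length % max_packet != 0)
        return false;

    return true;
}
```

With a 512-byte endpoint, a 768-byte write passes this check while a 768-byte read does not.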

The larger issue is that the read constraint imposes an onerous burden
on the application programmer. There are many situations in which a
partial read is required. As such, I had to work around this by
submitting temporary buffers and copying the result after the
operation completes. This has a clear performance disadvantage over
simply reading a partial packet directly from the driver.

It is worth noting that Linux and Mac OS X do not impose this
constraint, thus allowing the application to simply read as many bytes
as desired from the USB stream.

Is it possible to remove this constraint in the next WinUSB update?

Thanks,
Paarvai


Posted by Tim Roberts on May 31st, 2008


Paarvai Naai <PaarvaiNaai@discussions.microsoft.com> wrote:
There is no buffering anywhere in the USB driver path. Your buffer is sent
to the host controller driver, where it is filled and returned back to you.
The device has no idea how much data was requested. A device simply
receives a "send data now" signal. If you asked for 128 bytes, and the
device sends 512 bytes, that's called "babble". It is a USB protocol
violation.

If you need buffered access to your device data stream, then you have to
provide the buffers, just as you describe. This is nothing new. It's been
true for USB devices forever.

I can't speak for Mac OS, but I am experienced with Linux USB coding. The
exact same restrictions apply: there is no buffering. If you ask for 128
bytes and the device transmits 512, that's a protocol violation. Even the
libusb library, which is not part of the operating system, provides no
buffering.

So, let me get this straight. You want to use "raw IO" so that you can
avoid buffering, but you really want it to do buffering?
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Posted by Maxim S. Shatskih on June 1st, 2008


I think WinUSB also supports this, just not in raw mode.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com


Posted by Paarvai Naai on June 2nd, 2008


Thanks for your response.

Maybe an example will make the scenario more clear:

1) There is a device that sends exactly 768 bytes. This will arrive
as two packets: 512 bytes + 256 bytes.

2) The application would like to allocate a buffer of exactly 768
bytes to receive this data.

3) Under the RAW_IO pipe policy, there cannot be a partial packet for
the read. Therefore, a buffer of 1024 bytes is required.

My low-level library code does not own the buffer's memory, relying
upon the application code to provide a properly sized buffer. The
application code is not concerned with USB packet sizing. It simply
wishes to receive 768 bytes and provides a buffer to the library that
is sized accordingly.

The current workaround is for the low-level code to allocate an
additional aligned 1024-byte buffer and copy the result into the
application buffer when complete. This has the unnecessary
performance penalty of memory allocation and copying. While this
example is for a small packet, the penalties accumulate over larger
packets and many iterations.
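
The workaround can be sketched roughly as follows. This is not our production code: raw_read_fn is a hypothetical stand-in for the RAW_IO read call (WinUsb_ReadPipe in the real library), and fake_device is a test double invented here so the sketch can run without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a RAW_IO read call such as WinUsb_ReadPipe:
 * fills buf with up to len bytes and returns the count, or -1 on error. */
typedef long (*raw_read_fn)(void *ctx, unsigned char *buf, size_t len);

/* Round len up to the next multiple of max_packet (768 -> 1024 for 512). */
size_t round_up_to_packet(size_t len, size_t max_packet)
{
    assert(max_packet > 0);
    return (len + max_packet - 1) / max_packet * max_packet;
}

/* Read into a packet-aligned temporary buffer, then copy into app_buf. */
long read_via_bounce_buffer(raw_read_fn read_fn, void *ctx,
                            unsigned char *app_buf, size_t app_len,
                            size_t max_packet)
{
    size_t padded;
    unsigned char *tmp;
    long got;

    if (app_len == 0)
        return 0;
    padded = round_up_to_packet(app_len, max_packet);
    tmp = (unsigned char *)malloc(padded);
    if (tmp == NULL)
        return -1;

    got = read_fn(ctx, tmp, padded);
    if (got > 0 && (size_t)got <= app_len)
        memcpy(app_buf, tmp, (size_t)got);
    else if (got > 0)
        got = -1;   /* device sent more than the application asked for */

    free(tmp);
    return got;
}

/* Test double for illustration only: a "device" with `remaining` bytes
 * queued that rejects reads that are not a multiple of max_packet. */
struct fake_device {
    const unsigned char *data;
    size_t remaining;
    size_t max_packet;
};

long fake_device_read(void *ctx, unsigned char *buf, size_t len)
{
    struct fake_device *dev = (struct fake_device *)ctx;
    size_t n;

    if (len % dev->max_packet != 0)
        return -1;
    n = dev->remaining < len ? dev->remaining : len;
    memcpy(buf, dev->data, n);
    dev->data += n;
    dev->remaining -= n;
    return (long)n;
}
```

The allocation and the memcpy are the extra costs I am referring to; both scale with the size of the transfer.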

In contrast, the most efficient usage would be the following (assume
that "char *buffer" is the application-supplied buffer of length 768):

1) Receive the first 512 bytes into &buffer[0].
2) Receive the remaining 256 bytes into &buffer[512].

Please note that this works as expected on both Linux and Mac OS X.
Therefore, there is no inherent USB controller limitation. Rather, it
seems that the WinUSB driver has an unnecessary restriction that
prevents step #2 from working with RAW_IO.

Is it possible to remove this constraint in the next WinUSB update?

Best Regards,
Paarvai


"Tim Roberts" wrote:


Posted by Maxim S. Shatskih on June 2nd, 2008


Then move away from RAW_IO; that is the simplest option.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com


Posted by Paarvai Naai on June 2nd, 2008


While your suggestion may seem to be the simplest approach, the reason
we are using RAW_IO is the undetermined latency behavior when using
the buffered mode.

The WinUSB documentation does not sufficiently describe how the
buffering mechanism works. Furthermore, I cannot inspect the WinUSB
source code to characterize the latency behavior. Therefore, the only
choice is to use RAW_IO.

Our application is a mature product that has been working for years on
Windows and Linux. We have strict constraints on throughput and
latency. We would like to transition the Windows version to WinUSB
for better maintenance. However, we need to be in control of USB
communications at the lowest level.

This approach works perfectly on other operating systems. It works
almost perfectly on WinUSB. If WinUSB removed the limitation on read
packets, its features would be brought up to the same level as the
Linux and Darwin user-mode USB stacks.

Best Regards,
Paarvai


"Maxim S. Shatskih" wrote:


Posted by Alexander Grigoriev on June 3rd, 2008


What is your target throughput?

"Paarvai Naai" <PaarvaiNaai@discussions.microsoft.com> wrote in message
news:954972E9-69D1-4F11-BFE0-FBB4AFEF724F@microsoft.com...


Posted by Paarvai Naai on June 3rd, 2008


The maximum theoretical bandwidth of USB 2.0 is 53MB/s. This is the ideal
case on the best motherboards. Lesser motherboards are often limited
to 30MB/s. We would like to get as close as possible to the highest
bandwidth achievable on a given motherboard.
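
For reference, the 53MB/s figure matches the commonly cited high-speed bulk limit of 13 maximum-size (512-byte) packets per 125-microsecond microframe. The sketch below only reproduces that arithmetic; the function name and constants are chosen for the example.

```c
#include <stdint.h>

enum {
    MICROFRAMES_PER_SECOND      = 8000, /* one microframe every 125 us */
    BULK_PACKETS_PER_MICROFRAME = 13,   /* commonly cited high-speed bulk maximum */
    BULK_MAX_PACKET             = 512   /* bytes per high-speed bulk packet */
};

/* Upper bound on high-speed bulk payload, in bytes per second. */
uint32_t hs_bulk_bytes_per_second(void)
{
    return (uint32_t)MICROFRAMES_PER_SECOND
         * BULK_PACKETS_PER_MICROFRAME
         * BULK_MAX_PACKET;
}
```

That works out to 53,248,000 bytes per second, before any host controller or chipset limits.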

This means that we do not want to introduce any unnecessary
inefficiencies. Our current "copy buffer" workaround for the WinUSB
issue is probably sufficient on a fast machine. However, we don't
know how it will manifest on slower machines.

Besides the straight performance numbers, it is also a code
maintainability issue. Both Linux and Darwin allow us to walk the
pointer in a single buffer, reading the precise amount of data
required with no extra copying. Only WinUSB requires that we align
the reads, requiring the inelegant workaround described above. The
fact that the workaround works does not mean there is no problem.

Is it possible for somebody from Microsoft to comment on this? I
suspect that it should be relatively easy to remove the limitation of
packet alignment for RAW_IO reads. I would be interested in hearing
your thoughts.

Thanks,
Paarvai


"Alexander Grigoriev" wrote:


Posted by Maxim S. Shatskih on June 3rd, 2008


Latency constraints on a bulk pipe???

As you can understand, this is the normal natural limitation on non-buffered
USB traffic.

So, you can either get rid of RAW_IO, or implement your own buffering.

How is your Windows kernel-mode driver implemented? Using its own buffering, I
believe?

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com


Posted by Maxim S. Shatskih on June 3rd, 2008


There are no other ways. USB transfers are aligned, period.

Non-raw traffic will result in the same "copy buffer" being done for you by WinUSB.

Then the copying is done in the system library for USB, same way as in WinUSB
in non-raw mode.

Switch off the raw mode and enjoy the same functionality as in UNIXen.

No, it is just plain impossible. "Raw" means - no extra processing, exactly as
on USB wire. This means the alignment requirement.

If you do not want this requirement, move away from raw mode; you will have the
same functionality as in both UNIXen.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com


Posted by Alexander Grigoriev on June 3rd, 2008


I'm afraid you won't get good throughput with odd buffers. Change your
device to always supply full size packets, submit large buffers, and it will
be better.

That said, 40 MB/s memory copy is nothing for modern systems. Those from
2005 could achieve that USB throughput easily under WinXP.

"Paarvai Naai" <PaarvaiNaai@discussions.microsoft.com> wrote in message
news:8EA56C65-B6E4-498C-85D5-D3EEC4FC0BEA@microsoft.com...


Posted by Paarvai Naai on June 3rd, 2008


The device does use full-sized packets, but there are technical reasons why
it needs to send a partial packet every so often.

Regardless of whether a 40MB/s memory copy is significant on a modern
computer, it is simply code bloat to have something that would not be
necessary if the underlying problem were fixed.

Paarvai

"Alexander Grigoriev" wrote:


Posted by Paarvai Naai on June 3rd, 2008


The latency constraints stem from a requirement for high burst
bandwidth. This is not to be confused with high continuous bandwidth.
I have started a new thread called "Buffered performance in WinUSB" to
address the issue in more detail.

As for your question about the kernel driver, there is no buffering
involved. Rather, it simply populates the appropriate fields in the
Windows URB structure:

http://msdn.microsoft.com/en-us/library/ms793340.aspx
http://msdn.microsoft.com/en-us/library/ms793345.aspx

In particular, the TransferBufferLength field is simply set to the
size. The size does not have to be a multiple of 512.

Paarvai


"Maxim S. Shatskih" wrote:


Posted by Paarvai Naai on June 3rd, 2008


This is incorrect. Please refer to the EHCI specification:

http://www.intel.com/technology/usb/ehcispec.htm


In section 4.10.3 on page 83, it states:

---
The maximum number of bytes a device can send is Maximum Packet
Size. The number of bytes moved during an OUT transaction is either
Maximum Packet Length bytes or Total Bytes to Transfer, whichever is
less.

....

The PID Code field indicates an IN and the device sends more than the
expected number of bytes (e.g. Maximum Packet Length or Total Bytes
to Transfer bytes, whichever is less) (e.g. a packet babble). This
results in the host controller setting the Halted bit to a one.
---

Section 3.5.3 on page 42 has a description of the relevant data
structures.
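
Read literally, the quoted rule can be modeled as below. This is only a sketch of the text above, not host controller code: the enum, the function names, and the parameter names are made up for the example, and the model looks at a single IN packet in isolation.

```c
#include <stddef.h>

typedef enum {
    IN_PACKET_OK,     /* full-size packet, within what the host expected */
    IN_PACKET_SHORT,  /* fewer bytes than Maximum Packet Length */
    IN_PACKET_BABBLE  /* more bytes than the host expected */
} in_packet_result;

/* "Maximum Packet Length or Total Bytes to Transfer, whichever is less." */
size_t expected_in_bytes(size_t max_packet_length,
                         size_t total_bytes_to_transfer)
{
    return total_bytes_to_transfer < max_packet_length
         ? total_bytes_to_transfer
         : max_packet_length;
}

/* Classify one IN packet of bytes_sent bytes against the quoted rule. */
in_packet_result classify_in_packet(size_t max_packet_length,
                                    size_t total_bytes_to_transfer,
                                    size_t bytes_sent)
{
    size_t expected = expected_in_bytes(max_packet_length,
                                        total_bytes_to_transfer);
    if (bytes_sent > expected)
        return IN_PACKET_BABBLE;
    if (bytes_sent < max_packet_length)
        return IN_PACKET_SHORT;
    return IN_PACKET_OK;
}
```

With 256 bytes left to transfer on a 512-byte endpoint, a 256-byte packet is accepted as a short packet, while a 512-byte packet is classified as babble.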


Therefore, for both OUT and IN packets, partial packet transfers are
fully supported by the EHCI hardware. This makes sense, since Linux
and Darwin both have the same behavior as I previously described. In
fact, the underlying Windows EHCI also has the same behavior. It
seems that WinUSB is introducing the alignment limitation.

Is it possible for somebody at Microsoft with visibility into the
source code to comment on this issue?

Thanks,
Paarvai


Posted by Randy Aull (MS) on June 4th, 2008


You are correct that the MaxPacket restriction only applies to IN endpoints.
The purpose of the constraint on IN endpoints is to prevent a malicious or
malfunctioning usermode client from causing a babble condition on the bus.
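
One way to picture the constraint is the simplified model below. It is not WinUSB's actual check; the function names are hypothetical. The idea is that the last packet of a request is the only one where the host may expect fewer than MaxPacket bytes, so a request that is a whole number of packets never leaves room for a full-size packet to overrun it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of bytes the host expects in the final packet of an IN request. */
size_t last_packet_expected(size_t request_len, size_t max_packet)
{
    size_t rem;

    assert(max_packet > 0 && request_len > 0);
    rem = request_len % max_packet;
    return rem == 0 ? max_packet : rem;
}

/* True if a device sending only full-size packets could overrun the
 * request, which is the babble case described in this thread. */
bool request_can_babble(size_t request_len, size_t max_packet)
{
    return last_packet_expected(request_len, max_packet) < max_packet;
}
```

For a 512-byte endpoint, a 768-byte request expects only 256 bytes in its last packet, so it can babble; a 1024-byte request cannot.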

"Paarvai Naai" <PaarvaiNaai@discussions.microsoft.com> wrote in message
news:28A343A0-2B3D-4F1D-A4AD-39460B87F1C2@microsoft.com...

Posted by Randy Aull (MSFT) on June 4th, 2008


The penalty doesn't "accumulate". The cost is fixed. For every transfer
that you have that is not a multiple of MaxPacket, you will experience a
fixed overhead of an extra request for the last MaxPacket chunk of the
buffer.

Now, the restriction on MaxPacket alignment is not imposed by the lower USB
stack. It is in fact imposed by the WinUSB driver itself. This is to
prevent malicious or malfunctioning software from causing a babble condition
on the bus.
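
One reading of the fixed overhead is sketched below: the packet-aligned part of the application buffer is read in place, and only the trailing partial packet goes through a MaxPacket-sized temporary buffer. The read_plan type and plan_raw_read are hypothetical names for this example and are not part of WinUSB.

```c
#include <assert.h>
#include <stddef.h>

/* How a read of app_len bytes could be split under the RAW_IO rule. */
typedef struct {
    size_t direct_len;  /* bytes read straight into the application buffer */
    size_t tail_len;    /* bytes copied out of the temporary buffer */
    size_t bounce_len;  /* size of the extra aligned request (0 if none) */
} read_plan;

read_plan plan_raw_read(size_t app_len, size_t max_packet)
{
    read_plan plan;

    assert(max_packet > 0);
    plan.tail_len   = app_len % max_packet;
    plan.direct_len = app_len - plan.tail_len;
    /* The extra request is always exactly one MaxPacket, however large
     * the transfer is, so the copy never exceeds max_packet - 1 bytes. */
    plan.bounce_len = plan.tail_len != 0 ? max_packet : 0;
    return plan;
}
```

For the 768-byte example with a 512-byte endpoint, this gives a 512-byte direct read plus one 512-byte request from which 256 bytes are copied.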


"Paarvai Naai" <PaarvaiNaai@discussions.microsoft.com> wrote in message
news:AF300D3D-3917-4AB1-8F2E-878C02B12BF1@microsoft.com...

Posted by Paarvai Naai on June 4th, 2008


My understanding is that WinUSB imposes the MaxPacket limitation to
prevent a security issue in which user-mode software can stall the
EHCI controller for the entire system. Is this correct?

Is it possible for the WinUSB or EHCI driver to automatically unstall
the controller in the specific case of babble on an endpoint for which
the user has requested less than MaxPacket bytes?

In the meantime, I will test Linux and Darwin to see how they behave
in this situation and post the results to this discussion group.

Thanks,
Paarvai


"Randy Aull (MSFT)" wrote:


Posted by Tim Roberts on June 5th, 2008


Paarvai Naai <PaarvaiNaai@discussions.microsoft.com> wrote:
Oh, the device is absolutely allowed to send a partial packet. That is
fully supported. If you supply a 1024-byte buffer and the device sends
768, the request will be completed with 768 bytes. A short packet
terminates a transfer.
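
To make the termination rule concrete, here is a small host-side model, not driver code. The function name is made up for the example, and it assumes that a device with nothing left to send answers with a zero-length packet. A transfer ends when the buffer is full or when a packet smaller than the maximum packet size arrives.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many bytes a single IN transfer completes with, given a
 * packet-aligned buffer and device_pending bytes queued on the device. */
size_t simulate_in_transfer(size_t buffer_len, size_t device_pending,
                            size_t max_packet)
{
    size_t received = 0;

    assert(max_packet > 0 && buffer_len % max_packet == 0);
    while (received < buffer_len) {
        size_t pkt = device_pending < max_packet ? device_pending
                                                 : max_packet;
        received += pkt;
        device_pending -= pkt;
        if (pkt < max_packet)
            break;  /* short or zero-length packet ends the transfer */
    }
    return received;
}
```

With a 1024-byte buffer and 768 bytes pending, the model completes with 768 bytes; with a 512-byte buffer it completes with 512 and leaves 256 for the next request.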
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Posted by Tim Roberts on June 5th, 2008


Paarvai Naai <PaarvaiNaai@discussions.microsoft.com> wrote:
No, that's silly. A stall on an endpoint affects ONLY that endpoint.

Look, if the limitations of WinUSB aren't acceptable for you, then for
goodness' sake, just throw WinUSB in the trash and write a kernel driver.
Kernel USB drivers are not that hard to write.

WinUSB is a good solution for many USB problems. However, it is NOT the
solution for EVERY USB problem.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Posted by Tim Roberts on June 5th, 2008


Paarvai Naai <PaarvaiNaai@discussions.microsoft.com> wrote:

I am curious to know what part of Maxim's statement you believe is
contradicted by this section.

!!! What? How did you possibly come to that conclusion? What those
paragraphs say exactly matches what the rest of us have been saying: when a
device sends more data than the request asked for, that's "babble", which
is a protocol violation.

"Partial packet transfers" are supported, yes, but only in the sense that
the DEVICE is allowed to send less than the maximum packet size.

No, it doesn't.

This newsgroup is not an official Microsoft support channel. If you want
an official word, you will have to call Microsoft support and pay for a
support event.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

