Oh, finally…

So, this post is a little bit late as well; I almost didn’t manage to make this work :) I’ve been tinkering with writing a USB device stack for my USB stick for literally the past two weeks. Since this is a new space for me, I expected to have issues, but between incredibly stupid bugs, very minimal documentation, and errors in some critical hardware documentation, this has been quite a challenging project.

On the plus side, I now have my USB stick behaving as a virtual COM port, so it will be trivial to write extensions and make it start to do really useful stuff (haven’t quite got around to that yet, so expect a part 3).

Additionally, in the process I’ve become pretty familiar with how USB works, and I’ll try to explain it more simply for those who are interested in getting started.

First, some detail about how I managed to make it to this point; I’ve wanted to play around with USB for quite a long time – the first time I tried was many many years ago, around the time I had mastered several much simpler serial protocols… That didn’t particularly go anywhere as I had no idea where to start with the USB specification, and every part of it seemed equally irrelevant and incomprehensible.

As you might imagine, I’ve since become a lot better at reading specifications, and while the USB specification is still by no means trivial, it makes quite a bit of sense now, and I have been able to identify the important parts and work through the process of actually building a compatible device.

Since I’m using an AVR with USB support, some level of this is done for me: the entire electrical communication layer is handled, along with the vast majority of the protocol work. All that remains is to handle the high-level messages and instruct the hardware on how to proceed. This was indeed a pretty easy thing to do; it’s just that the bugs caused quite a lot of pain.

One thing I’ve found very helpful in this process is USB protocol tracing. There are a number of tools out there that can capture USB traffic, and if you’re using Windows 7, a mechanism to do this is built into the OS’s USB stack itself (take a look here for more information). In general, the bugs I’ve hit were mainly packets that didn’t complete properly (the device returned an inappropriate response, or didn’t respond at all), and those things are pretty easy to find in a protocol trace.

So, let’s start with a broad question: What is USB?

The USB specification defines the behavior and allowed interactions of all parts of the USB ecosystem. It provides a framework for building devices that need to transfer data between the device and the host, and it supports a number of ways of doing so. The specifications for USB are available from usb.org. Most modern devices are USB 2.0, and USB 3.0 is out of the scope of what I’m talking about here :)

It’s actually a very wide specification, and the USB specification and related specifications seek to define a lot of things, such as:

  • How a USB device physically attaches to a host system (Mechanical, Chapter 6)
  • How a USB device electrically communicates with a host system (Electrical, Chapter 7)
  • What specific sorts of message types should be sent between the host and device, and the bit patterns they map to (Protocol Layer, Chapter 8)
  • How the device uses those messages to interact with the host and expose data about itself (USB Device Framework, Chapter 9)
  • How the host should behave in general (USB Host, Chapter 10)
  • How hubs should work, to allow a single USB port to drive multiple devices (Hub Specification, Chapter 11)
  • Additionally, the USB Device Class specifications describe in more detail how certain types of devices should behave, in a standardized way.

For what I have been doing with the USB stick and general microcontroller USB development (when hardware support is available), only the parts about the Device Framework and device class specifications are really important – it’s good to know about the protocol layer but that is for the most part taken care of for you.

Still, it is pretty complex – but only because there are a number of details to attend to. I’m omitting a few of the details, but I’ll go over the major points individually:

First, a device is a logical object composed of endpoints. There are 16 endpoint numbers, and 0 is always a special number used for sending control messages to the device. Everything about USB is host-centric: the host initiates all transactions, and the device may only respond. So the majority of what happens in USB is that the host sends a packet of a certain type to a specific endpoint, and then the device responds to that request with data, an acknowledgement, or some error condition.

An endpoint is really just a buffer to send or receive data. The endpoints in USB microcontrollers typically only operate in one direction, though the USB standard allows the same endpoint number to be used for both sending and receiving. Typically you will have to configure your endpoint with a specific size (which will be communicated to the host via descriptors, see further below). Then either you will make an IN endpoint, write data to the buffer, and the hardware will happily send it to the host the next time the host asks about that endpoint; or you will create an OUT endpoint, wait for it to be filled by the host, and then process the incoming data. Endpoint 0 must operate in both directions, and that balance is a little elaborate to maintain.

The big three request packets the host uses to control the flow of data are: SETUP, IN, and OUT. IN commands are the host asking for data, OUT commands are the host sending data, and SETUP packets are slightly more complicated – a SETUP packet must only be sent to endpoint 0, and it is always followed by an 8-byte data packet, which is a data structure specifying a request. As part of the setup transaction, the host may send additional data (with an OUT), or may ask for data (with an IN); and a third stage with the opposite direction will give the device a chance to acknowledge or reject the transaction.
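As a rough sketch of that 8-byte structure: the field names below come from section 9.3 of the USB 2.0 specification, while the helper names (`usb_setup_parse`, `usb_setup_shape`) are made up for illustration and are not from any particular library. The second helper picks which of the three SETUP shapes applies, based on the requested length and the direction bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 8 bytes that follow a SETUP token (USB 2.0, section 9.3). */
typedef struct {
    uint8_t  bmRequestType; /* bit 7: direction, bits 6..5: type, bits 4..0: recipient */
    uint8_t  bRequest;      /* request code */
    uint16_t wValue;        /* meaning depends on the request */
    uint16_t wIndex;        /* meaning depends on the request */
    uint16_t wLength;       /* bytes in the optional data stage, 0 if none */
} usb_setup_t;

/* Hypothetical helper: decode the raw bytes field by field (they are
   little-endian on the wire), so the result does not depend on struct
   padding or on the CPU's byte order. */
static usb_setup_t usb_setup_parse(const uint8_t raw[8])
{
    usb_setup_t s;
    assert(raw != NULL);
    s.bmRequestType = raw[0];
    s.bRequest      = raw[1];
    s.wValue  = (uint16_t)(raw[2] | ((uint16_t)raw[3] << 8));
    s.wIndex  = (uint16_t)(raw[4] | ((uint16_t)raw[5] << 8));
    s.wLength = (uint16_t)(raw[6] | ((uint16_t)raw[7] << 8));
    return s;
}

typedef enum { SETUP_NO_DATA, SETUP_DATA_IN, SETUP_DATA_OUT } setup_shape_t;

/* Which way does the extra data (if any) flow? */
static setup_shape_t usb_setup_shape(const usb_setup_t *s)
{
    assert(s != NULL);
    if (s->wLength == 0)
        return SETUP_NO_DATA;                  /* status stage only */
    return (s->bmRequestType & 0x80) ? SETUP_DATA_IN   /* device sends data */
                                     : SETUP_DATA_OUT; /* host sends data */
}
```

Parsing byte by byte is a little slower than casting the buffer to a struct pointer, but it avoids surprises if the same code is ever compiled for a different CPU.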

Here’s another representation…

Host sending data to device:  OUT – <host sends data> – [device ACK, handled by hardware]

Host requesting data from device: IN – <device sends data> – [host ACK, or retry logic handled by hardware]

Host sending a SETUP request with no extra data: SETUP – <host sends 8 bytes> – [device ACK, in hardware] – IN – <device sends zero-length data packet to acknowledge> – [host ACK]

Host sending a SETUP request and requesting extra data: SETUP – <host sends 8 bytes> – [device ACK] – IN – <device sends extra data> – [host ACK] – OUT – <host sends zero-length packet to acknowledge> – [device ACK]

Host sending a SETUP request and sending extra data: SETUP – <host sends 8 bytes> – [device ACK] – OUT – <host sends additional data> – [device ACK] – IN – <device sends zero-length packet to acknowledge> – [host ACK]

Alternatively, the device may send a STALL response instead of data or an ACK if it does not support a SETUP operation (or if the endpoint has an error). To add another detail: USB hardware will also send NAK packets to delay requests until the device is ready to respond, which I haven’t included above for the sake of simplicity.

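Another small complexity in the data stage of a SETUP transaction is that if your data payload exceeds the maximum packet size of the endpoint, multiple packets will be sent/requested. The transfer ends with a packet shorter than the maximum packet size (possibly a zero-length one), or as soon as exactly the number of bytes the host asked for has been transferred.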
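To make the packet-splitting rule concrete, here is a small sketch that counts how many data packets a device ends up sending for one IN data stage. The function name and parameters are my own invention for illustration; the rule it encodes is the standard one (stop on a short packet, or when the requested length has been reached exactly).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of data packets the device sends in the data
   stage of a control read.
     available  - bytes the device has to offer (e.g. descriptor size)
     requested  - wLength from the SETUP packet
     max_packet - maximum packet size of endpoint 0                       */
static uint16_t data_stage_packets(uint16_t available, uint16_t requested,
                                   uint16_t max_packet)
{
    uint32_t to_send, packets;
    assert(max_packet > 0);

    /* Never send more than the host asked for. */
    to_send = (available < requested) ? available : requested;

    /* Round up to whole packets. */
    packets = (to_send + max_packet - 1u) / max_packet;

    /* If we are sending less than requested and the data ends exactly on a
       packet boundary, the host cannot tell we are done, so a zero-length
       packet has to follow. */
    if (to_send < requested && (to_send % max_packet) == 0)
        packets++;

    return (uint16_t)packets;
}
```

With a 32-byte endpoint and a 67-byte descriptor this gives 3 packets (32 + 32 + 3), while a 64-byte descriptor requested with a larger length also needs 3 (32 + 32 + an empty one).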

So, did I say this was simple? I’m sorry :)

That’s the very outer shell: the information you need to know to interface with the hardware. Next come the requests themselves.

The requests are completely documented in the USB specification, in sections 9.3 and 9.4. The specification leaves a little bit to be desired in the clarity of what exactly needs to be handled, but most importantly you need to focus on the “type” field in bmRequestType and on the request code itself. If the type is a standard request, the table of standard device request codes is in that area; if the type is a class or vendor request, a different table of request codes applies. It is necessary to handle several of the standard requests in order for a device to work. The most important ones are set_address, get_descriptor, and set_configuration. Several of the others are required, but not all of them (I’m not elaborating further because the descriptions in section 9.4 are actually not bad).
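Here is one way the “type” field and request code might be picked apart. The bit positions and the standard request codes are taken from tables 9-2 and 9-4 of the USB 2.0 specification; the function names and the routing enum are hypothetical, and which standard requests get handled is just the minimal set discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* bmRequestType "type" field, bits 6..5 */
enum { REQ_TYPE_STANDARD = 0, REQ_TYPE_CLASS = 1, REQ_TYPE_VENDOR = 2 };

/* Standard request codes (bRequest) */
enum {
    REQ_GET_STATUS = 0,     REQ_CLEAR_FEATURE = 1,     REQ_SET_FEATURE = 3,
    REQ_SET_ADDRESS = 5,    REQ_GET_DESCRIPTOR = 6,    REQ_SET_DESCRIPTOR = 7,
    REQ_GET_CONFIGURATION = 8, REQ_SET_CONFIGURATION = 9,
    REQ_GET_INTERFACE = 10, REQ_SET_INTERFACE = 11,    REQ_SYNCH_FRAME = 12
};

static uint8_t req_type(uint8_t bm)      { return (uint8_t)((bm >> 5) & 0x03); }
static uint8_t req_recipient(uint8_t bm) { return (uint8_t)(bm & 0x1F); }
static int     req_is_in(uint8_t bm)     { return (bm & 0x80) != 0; }

typedef enum { ROUTE_STANDARD, ROUTE_CLASS, ROUTE_VENDOR, ROUTE_STALL } route_t;

/* Hypothetical dispatcher: decide which handler table applies to the
   8-byte SETUP data, or whether to answer with STALL. */
static route_t route_request(const uint8_t setup[8])
{
    assert(setup != NULL);
    switch (req_type(setup[0])) {
    case REQ_TYPE_STANDARD:
        switch (setup[1]) {
        case REQ_SET_ADDRESS:
        case REQ_GET_DESCRIPTOR:
        case REQ_SET_CONFIGURATION:
            return ROUTE_STANDARD;
        default:
            return ROUTE_STALL;   /* not implemented in this sketch */
        }
    case REQ_TYPE_CLASS:
        return ROUTE_CLASS;
    case REQ_TYPE_VENDOR:
        return ROUTE_VENDOR;
    default:
        return ROUTE_STALL;       /* type 3 is reserved */
    }
}
```

A real device would handle more of the standard requests than this; the point is only that the type field chooses the table before the request code is looked at.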

For set_address, some interesting logic is usually required. After receiving the set_address packet, you must complete the ACK phase before actually setting the address (or else your address will change before the host commits to your new address, and it will not be possible to complete the ACK). The Atmel datasheet goes into some detail on how it should be done. The entire transaction is done without extra data (the address is passed in one of the fields of that 8-byte SETUP data packet), so the software’s only responsibility is to receive that, process it, and then instruct the hardware to send a zero-length data packet in response to the host’s IN (and set the address AFTER that packet completes).
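The ordering can be sketched as two event handlers. This is a host-side model rather than real AVR register code: the struct and both function names are hypothetical, and `zlp_queued` merely stands in for “tell the hardware to answer the next IN with a zero-length packet”.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t address;          /* address the device currently answers to */
    uint8_t pending_address;  /* address from SET_ADDRESS, not yet active */
    uint8_t address_pending;  /* 1 while waiting for the status stage */
    uint8_t zlp_queued;       /* stand-in for arming a zero-length IN packet */
} addr_state_t;

/* Called when the SETUP data turns out to be SET_ADDRESS.
   The new address is the low 7 bits of wValue. */
static void on_set_address(addr_state_t *d, uint16_t wValue)
{
    assert(d != NULL);
    d->pending_address = (uint8_t)(wValue & 0x7F);
    d->address_pending = 1;
    d->zlp_queued = 1;        /* acknowledge on the OLD address */
}

/* Called when the hardware reports that the IN packet on endpoint 0
   has been sent and acknowledged by the host. */
static void on_ep0_in_complete(addr_state_t *d)
{
    assert(d != NULL);
    d->zlp_queued = 0;
    if (d->address_pending) {
        d->address = d->pending_address;   /* only now switch over */
        d->address_pending = 0;
    }
}
```

Some USB controllers latch the address in hardware and apply it automatically after the status stage, so check the datasheet for your part before copying this pattern.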

Get_Descriptor is the most important command to get right, and you only absolutely have to handle two types of requests here: first, descriptor type 1 with descriptor index 0 (the Device descriptor), and second, descriptor type 2 with descriptor index 0 (the first Configuration descriptor). I will cover the descriptors in a moment, but they are relatively large data structures that describe the structure of your device, among other important details. This is probably the only setup request that is likely to need multiple packets in response (my setup endpoint is 32 bytes and my configuration descriptor is 67 bytes, so 3 packets for me). In this case the idea is to process the SETUP data structure and reject it with STALL if it’s not one of the supported types; otherwise, send max-length packets one at a time, waiting for each to complete, until you get to the end (the last packet is either a non-max-length packet or a zero-length packet). Note as well that the get_descriptor request includes a maximum length, so you may have to stop early sometimes.
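A lookup for get_descriptor could look roughly like the following. In the SETUP data, the descriptor type is the high byte of wValue and the index is the low byte, and wLength is the maximum the host will accept. The table layout and the name `find_descriptor` are assumptions made for this sketch, not anything from Atmel’s code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row per descriptor the device can return (hypothetical layout). */
typedef struct {
    uint8_t        type;   /* 1 = device, 2 = configuration, ... */
    uint8_t        index;
    const uint8_t *data;
    uint16_t       len;
} desc_entry_t;

/* Returns the descriptor bytes and stores how many of them to send in
   *out_len (clamped to wLength). Returns NULL if the descriptor is not
   supported, in which case the caller should STALL. */
static const uint8_t *find_descriptor(const desc_entry_t *table, size_t count,
                                      uint16_t wValue, uint16_t wLength,
                                      uint16_t *out_len)
{
    uint8_t type  = (uint8_t)(wValue >> 8);
    uint8_t index = (uint8_t)(wValue & 0xFF);
    size_t i;

    assert(table != NULL && out_len != NULL);
    for (i = 0; i < count; i++) {
        if (table[i].type == type && table[i].index == index) {
            /* The host may ask for less than the full descriptor. */
            *out_len = (table[i].len < wLength) ? table[i].len : wLength;
            return table[i].data;
        }
    }
    *out_len = 0;
    return NULL;
}
```

Hosts commonly ask for just the first few bytes of a descriptor before asking for the whole thing, which is why the clamp to wLength matters in practice.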

By default, your device is in an unconfigured state: essentially all of the endpoints (except endpoint 0) are supposed to be turned off, and the device should not be doing anything notable. set_configuration allows the host to switch the device into one of potentially several operating modes (each of these modes gets its own configuration descriptor).
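A minimal model of set_configuration for a device with a single configuration might look like this. The requested configuration value arrives in the low byte of wValue, and a value of 0 means “go back to unconfigured”. The struct, the function name, and `MY_CONFIG_VALUE` are all hypothetical; `endpoints_on` is a stand-in for actually enabling the data endpoints in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the bConfigurationValue of our only configuration. */
#define MY_CONFIG_VALUE 1

typedef struct {
    uint8_t configuration;  /* 0 while unconfigured */
    uint8_t endpoints_on;   /* stand-in for the real endpoint enable bits */
} cfg_state_t;

/* Returns 1 if the request should be acknowledged with a zero-length
   packet, 0 if it should be rejected with STALL. */
static int on_set_configuration(cfg_state_t *d, uint16_t wValue)
{
    uint8_t value = (uint8_t)(wValue & 0xFF);
    assert(d != NULL);

    if (value == 0) {                 /* host wants us unconfigured */
        d->configuration = 0;
        d->endpoints_on = 0;
        return 1;
    }
    if (value == MY_CONFIG_VALUE) {   /* the one mode we support */
        d->configuration = value;
        d->endpoints_on = 1;
        return 1;
    }
    return 0;                         /* unknown configuration */
}
```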

If you can implement those three, the others will follow pretty naturally, and Class requests when implementing class devices follow the same pattern.

One last subject for this blog post: Descriptors!

The USB descriptor definitions in Section 9.6 are pretty well documented structures. The device descriptor is simple enough: when you are asked to provide a device descriptor, you should send a string of bytes laid out as the USB device descriptor (9.6.1). Note that some of the numeric values are defined a bit… distantly; the descriptor type numbers are in table 9-5, back in section 9.4.
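For illustration, here is what an 18-byte device descriptor could look like as a plain byte array. The field order follows section 9.6.1; the values are placeholders chosen for this sketch (in particular the vendor and product IDs are made up, the CDC class code is an assumption based on the virtual COM port use case, and 32 matches the endpoint 0 size mentioned earlier). The small `desc_u16` helper is also hypothetical and just reads a little-endian field back out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const uint8_t device_descriptor[18] = {
    18,         /* bLength: size of this descriptor */
    0x01,       /* bDescriptorType: DEVICE */
    0x00, 0x02, /* bcdUSB: 2.00, little-endian */
    0x02,       /* bDeviceClass: communications (assumption) */
    0x00,       /* bDeviceSubClass */
    0x00,       /* bDeviceProtocol */
    32,         /* bMaxPacketSize0: endpoint 0 packet size */
    0x34, 0x12, /* idVendor: 0x1234, placeholder only */
    0x78, 0x56, /* idProduct: 0x5678, placeholder only */
    0x00, 0x01, /* bcdDevice: 1.00 */
    0,          /* iManufacturer: no string descriptor */
    0,          /* iProduct: no string descriptor */
    0,          /* iSerialNumber: no string descriptor */
    1           /* bNumConfigurations */
};

/* Read a 16-bit little-endian field at byte offset `off`. */
static uint16_t desc_u16(const uint8_t *d, size_t off)
{
    assert(d != NULL);
    return (uint16_t)(d[off] | ((uint16_t)d[off + 1] << 8));
}
```

Writing the descriptor as raw bytes sidesteps struct padding and byte-order questions, at the cost of having to count offsets by hand.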

The configuration descriptor is the tricky one. If you read about the get_descriptor device request (9.4.3) you’ll find that the configuration descriptor should actually be a configuration descriptor, followed by one or more interface descriptors, and each interface descriptor may be followed by one or more endpoint descriptors (think of a tree). In the USB world, a configuration is thought of as a mode of operation, and an interface is thought of as a logical unit that serves a specific function. That specific function is associated with endpoint zero for control (typically as vendor-type requests, which you can define yourself if you want to write a driver), and possibly with one or more other endpoints, which serve as data pipes for interfacing with the device.
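The tree is flattened into one run of bytes, with each descriptor starting with its own length and type, so it can be walked without knowing every descriptor format. The sketch below builds a made-up example (one vendor-specific interface with a bulk IN and a bulk OUT endpoint, 32 bytes in total) and counts what it finds. The blob contents and the name `walk_config` are assumptions for illustration, not my actual 67-byte descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const uint8_t config_blob[] = {
    /* Configuration: bLength, type 2, wTotalLength (32, LE), bNumInterfaces,
       bConfigurationValue, iConfiguration, bmAttributes, bMaxPower (2 mA units) */
    9, 0x02, 32, 0, 1, 1, 0, 0x80, 50,
    /* Interface: bLength, type 4, number, alt setting, bNumEndpoints,
       class (0xFF = vendor specific), subclass, protocol, iInterface */
    9, 0x04, 0, 0, 2, 0xFF, 0x00, 0x00, 0,
    /* Endpoint: bLength, type 5, address (0x81 = EP1 IN), bulk,
       wMaxPacketSize (32, LE), bInterval */
    7, 0x05, 0x81, 0x02, 32, 0, 0,
    /* Endpoint: address 0x02 = EP2 OUT, bulk */
    7, 0x05, 0x02, 0x02, 32, 0, 0
};

/* Walk the blob descriptor by descriptor, counting interfaces and endpoints.
   Returns 0 on success, -1 if the blob is malformed or truncated. */
static int walk_config(const uint8_t *blob, size_t size,
                       unsigned *interfaces, unsigned *endpoints)
{
    size_t total, off = 0;
    assert(blob != NULL && interfaces != NULL && endpoints != NULL);

    if (size < 9 || blob[1] != 0x02)
        return -1;
    total = (size_t)blob[2] | ((size_t)blob[3] << 8);   /* wTotalLength */
    if (total > size)
        return -1;

    *interfaces = 0;
    *endpoints = 0;
    while (off < total) {
        size_t len;
        if (total - off < 2)
            return -1;
        len = blob[off];
        if (len < 2 || len > total - off)
            return -1;
        if (blob[off + 1] == 0x04)
            (*interfaces)++;
        else if (blob[off + 1] == 0x05)
            (*endpoints)++;
        off += len;     /* unknown descriptor types are simply skipped */
    }
    return 0;
}
```

Skipping by length is also how class-specific descriptors mixed into the blob can be passed over by code that does not understand them.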

How should you define a configuration? Well, any way you want if you are writing your own driver… But USB-IF noticed that this wasn’t ideal, and so the notion of Class devices was brought into the world.

I’m not going to cover this here (this post is already so very long!), but in addition to the USB core specification there are a large number of official class specifications, which define how specific types of devices should operate. To be more specific, they define what things you should have in your configuration, and they provide a list of class requests, notifications, and custom descriptors related to describing various types of devices. The upshot of all this is that you simply have to implement a class device, and the driver to make your device work on whatever computer you plug it into has likely already been written for you.

In re-reading this, I doubt I will have answered everyone’s questions, so I’m happy to answer further questions if you post them. USB does seem pretty simple after wading through it for a week or two, but it is still a large array of moving parts, regardless of how simple they may be individually.

I’m planning to continue this another time, and I have been working on a USB cribsheet for my own reference that I’ll also link here sometime.

(And now, I am off, to implement a USB bootloader, and USB jtag programming and… )