USB Virtual Serial Receive Speed

As part of optimization work to support large-scale LED projects, PJRC developed three simple USB virtual serial receive speed benchmarks. The complete source code is available below.

Here are the benchmark results for boards that currently support native USB virtual serial for Arduino sketches.

These tests show the importance of efficient USB software. The actual speed a board achieves depends much more on its software design than its raw hardware speed.

To understand these results, we need to look at what each test attempts to measure.

Standard Receive Test

The "standard" test measures the sustained speed with a sketch using Serial.available() and Serial.read().
        // receive 500 bytes
        for (count=0; count < 500; count++) {
          while (!Serial.available()) ;
          buf[count] = Serial.read();
        }
Nearly all example sketches use this inefficient approach, with 2 function calls for every incoming byte. Speeding up these 2 functions offers the greatest benefit for casual Arduino users who copy example code.

The 32 bit processors on Teensy 3.0, Fubarino Mini and Maple can execute these 2 function calls much faster than 8 bit AVR.

ReadBytes Receive Test

Arduino 1.0 introduced the Stream class, which adds many useful functions to all Arduino communication objects. The readBytes() function can receive large blocks of data without the repetitive double function call overhead.
        // receive 500 bytes, using Serial.readBytes
        // as many times as necessary until all 500
        while (count < 500) {
          n = Serial.readBytes(buf+count, 500-count);
          count = count + n;
        }
Until recently, every readBytes() implementation used available() and read() internally. Teensyduino 1.14 is the first system to optimize the readBytes() function. The result is a dramatic increase in speed! Even the relatively slow Teensy 2.0 is able to receive at the full USB speed. The benchmark also attempts to estimate the CPU time left over. See the testing methodology below for details.
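To illustrate why a block read can be so much cheaper than a per-byte loop, here is a minimal sketch in plain C++. It is not Teensyduino's actual implementation: RxPacket, readOne() and readBlock() are hypothetical names standing in for a received USB packet and the two access styles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for one received USB packet.
struct RxPacket {
    const unsigned char *data; // packet payload
    std::size_t len;           // number of bytes in the packet
    std::size_t pos;           // index of the next unread byte
};

// Byte-at-a-time style: one function call and one bounds check per byte.
int readOne(RxPacket &p) {
    if (p.pos >= p.len) return -1; // nothing left to read
    return p.data[p.pos++];
}

// Block style: one bounds check and one memcpy for the whole run of bytes.
std::size_t readBlock(RxPacket &p, unsigned char *dst, std::size_t want) {
    assert(p.pos <= p.len);
    std::size_t n = std::min(want, p.len - p.pos);
    std::memcpy(dst, p.data + p.pos, n);
    p.pos += n;
    return n;
}
```

Copying a 64 byte packet with readOne() costs 64 calls, while readBlock() does the same work in a single call, which is the kind of saving a readBytes() that bypasses available() and read() can offer.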

ReadBytes with Overhead Test

The "overhead" test attempts to simulate a more realistic readBytes() usage scenario, while still maintaining a repeatable benchmark. After receiving 500 bytes, the sketch waits for an "overhead" delay of 500 µs.
        // Delay for 500 microseconds, to simulate doing
        // something useful with the received data
        unsigned long beginMicros = micros();
        while (micros() - beginMicros <= 500) ; // wait 500 us
Most Arduino sketches are designed to be simple and leverage libraries. One example might involve receiving data and updating a long LED strip with the FastSPI_LED library. This test attempts to measure the real-world speed achieved when a program spends significant time actually using the data, rather than calling the readBytes() function as rapidly as possible.
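As a rough back-of-the-envelope model of how a fixed delay limits throughput, the sketch below assumes receiving and processing never overlap. That assumption, the function name effectiveBytesPerSec() and the example raw rate are illustrative and not part of the benchmark.

```cpp
#include <cassert>

// Simplified model (an assumption): each block costs its receive time plus a
// fixed processing overhead, with no reception happening during the overhead.
double effectiveBytesPerSec(double rawBytesPerSec, double blockBytes,
                            double overheadMicros) {
    assert(rawBytesPerSec > 0 && blockBytes > 0 && overheadMicros >= 0);
    double receiveMicros = blockBytes / rawBytesPerSec * 1e6;
    return blockBytes / (receiveMicros + overheadMicros) * 1e6;
}
```

With a hypothetical raw rate of 1,000,000 bytes/sec, 500 byte blocks and 500 µs of overhead, this model gives about 500,000 bytes/sec. A real board may do better than the model suggests, because the USB stack can keep buffering incoming packets while the sketch is busy.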

Operating System Differences

The above tests were made using a fast Linux-based desktop computer and a large USB write transfer size, to minimize any impact of speed limitations on the PC. You might believe a 64 bit multi-core CPU running at several GHz clock speed with gigabytes of RAM and extremely fast buses could not possibly limit these USB speeds, but in fact the USB device drivers on each operating system vary considerably in performance when smaller transfer sizes are used.

In this test, the ReadBytes test was run with the same Teensy 3.0 and MacBook Pro (late-2012 model), rebooted into each operating system. The benchmark program was first run with its default setting, which writes 30000 bytes at a time. It was then run with the options to write 24 bytes and 1 byte at a time.

Apple has done a remarkable job in OS X 10.7 (Lion) of handling small writes and transmitting the data efficiently.

Linux was tested using an Ubuntu 12.04 live CD. Similar numbers were measured on the fast desktop PC, but the numbers reported here are from the MacBook.

Windows 7 (SP1) was tested running natively, by dual-booting with Apple's Boot Camp. Microsoft's drivers seem to have a lot of trouble efficiently handling small writes.

Unfortunately, it is not usually feasible in non-benchmark applications to write 30000 bytes at once. Often data is sent in small, fixed size messages. On Windows there is a tremendous performance penalty for writing only small messages to be transmitted by USB virtual serial.
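One possible mitigation on the PC side, when an application can tolerate a little extra latency, is to coalesce several small messages into one larger write. The class below is a minimal sketch of that idea and is not part of the benchmark program: CoalescingWriter and its sink callback are hypothetical names, and in a real program the sink would be whatever actually writes to the serial port (for example, a call to POSIX write() on the open device).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Collects small messages and hands them to the sink in larger chunks.
class CoalescingWriter {
public:
    using Sink = std::function<void(const char *, std::size_t)>;

    CoalescingWriter(std::size_t capacity, Sink sink)
        : buffer_(capacity), used_(0), sink_(std::move(sink)) {
        assert(capacity > 0);
    }

    // Queue one message; flush first if it would not fit in the buffer.
    void write(const void *data, std::size_t len) {
        if (used_ + len > buffer_.size()) flush();
        if (len >= buffer_.size()) {
            // Too large to benefit from buffering: pass it straight through.
            sink_(static_cast<const char *>(data), len);
            return;
        }
        std::memcpy(buffer_.data() + used_, data, len);
        used_ += len;
    }

    // Send whatever is buffered. Call this when latency matters.
    void flush() {
        if (used_ == 0) return;
        sink_(buffer_.data(), used_);
        used_ = 0;
    }

private:
    std::vector<char> buffer_;
    std::size_t used_;
    Sink sink_;
};
```

With a 4096 byte buffer, 1000 messages of 24 bytes each reach the sink in 6 writes instead of 1000. The trade-off is that a message may sit in the buffer until the next flush(), so this only suits applications that do not need every message delivered immediately.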

Testing Methodology & Source Code

Anyone familiar with the computer industry knows to be skeptical of benchmarks, especially those published by any vendor showing their product to be superior! We strongly encourage you to embrace the Soviet-era doctrine "Trust, but Verify".

Teensy 2.0 and 3.0 were optimized using these benchmarks as a metric. Prior to Teensyduino 1.14, Teensy 3.0 measured approximately 330 kbytes/sec on the standard test (slower than Fubarino), and both Teensy boards had identical standard and ReadBytes performance. The readBytes() optimization was developed using these benchmarks to evaluate its improvement. These benchmarks were not in any way designed to favor Teensy, but Teensy did have the early advantage of optimization work using them before they were made publicly available.

Here is the complete source code needed to run these benchmarks:

These benchmarks are being made available in the hope other vendors may also be able to optimize their code, and of course for anyone interested in USB virtual serial performance, or anyone who wishes to "Trust, but Verify"!

To run the benchmark, simply upload one of the benchmark sketches to your Arduino-compatible board. Then run the "receive_test" program from a command line. A pre-compiled Windows version is included. On Mac and Linux, you will need to edit the makefile to select your system, then run "make" to compile. On Mac, you will need Apple's Xcode command line utilities package, which is available from their free (but registration required) developer program.

The ReadBytes results with Teensy are limited by your system's available USB bandwidth, so you may see varying results depending on which USB port you use and what other USB devices are connected or in use during the test. The other tests are limited by software speeds, so you should see very similar results. If you do find different results, please post on the PJRC forum to let us know.

Boards Tested: Maple, Fubarino Mini, Teensy 3.0, Teensy 2.0 (left, top to bottom)
and Arduino Leonardo, Arduino Due (right, top to bottom).

Test Conditions

Teensy 3.0 and Teensy 2.0 were tested using Arduino 1.0.5 with Teensyduino 1.14 on Linux.

Fubarino Mini was tested using "mpide-0023-linux-20130514" on Linux.

Maple was tested using "maple-ide-v0.0.12" on Linux.

Arduino Due was tested using Arduino 1.5.2 on Linux. A quick check was also done using the nightly build from June 1, 2013, which showed essentially the same results.

Arduino Leonardo was tested using Arduino 1.0.5 on Linux.

The board speed tests were run on a desktop PC running Ubuntu 12.04.2 LTS (64 bit), Linux kernel 3.2.0-44 with an i7-3930K CPU, 32 GB RAM and SSD. Each board was connected through a USB 2.0 hub (Genesys Logic, USB ID 05e3:0608). A USB keyboard, mouse, scanner and FTDI USB-serial cable were connected to the system during all tests, but not in use.

During the readBytes() test, a multimeter was connected to pin 3. The average DC voltage during the test was noted, then divided by the logic high voltage measured after the test ends. This ratio gives a reasonable approximation of the free CPU time on the board during the test. These numbers involve a subjective reading of the average voltage, which varies slightly while the test runs, so the CPU usage numbers are approximate.
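The arithmetic behind this estimate is a single division, sketched below. The function name freeCpuFraction() and the voltage readings in the usage note are hypothetical and are not measurements from these tests.

```cpp
#include <cassert>

// Estimate the fraction of CPU time left free, from the average voltage read
// on the pin during the test and the logic high voltage read afterwards.
double freeCpuFraction(double averageVolts, double logicHighVolts) {
    assert(logicHighVolts > 0 && averageVolts >= 0);
    return averageVolts / logicHighVolts;
}
```

For example, an average reading of 2.97 V against a logic high of 3.30 V would suggest roughly 90% of the CPU time was free.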

The operating system tests were performed on a MacBook Pro (late-2012 model), running OS X 10.7.?, Windows 7 SP1, and Ubuntu 12.04, connected to a Teensy 3.0 programmed from the earlier Linux-based tests. The Teensy 3.0 had identical programming in each operating system test (it was not reprogrammed by the Arduino IDE on each system).

What About Energia?

For this test, we attempted to use Energia 0101E0009 with the Launchpad LM4F120 board. However, Energia does not appear to have any support for using the native USB serial port on this board. The benchmark measured 11 kbyte/sec using the debug port, which is limited to the slow speed of regular hardware serial.

If a future version of Energia supports the native USB port, this benchmark could be used to test its performance.