How to cross compile and sign Windows EXE on Linux with Yubikey token

I recently obtained a new code signing certificate used to sign the Windows version of Teensy’s software. Because key storage rules have changed, and very little information currently exists for use in a cross compile workflow, I decided to write this article.

The basic concept is you wish to compile software for Windows, from the comfort of your Linux machine. The Windows EXE file needs to be properly signed, so users see “Verified Publisher” and Microsoft Defender SmartScreen will (eventually) learn your application is safe.

These steps were performed on Ubuntu 22.04. Other Linux distributions can very likely also work.

Buying a Code Signing Certificate

Before June 2023, obtaining a code signing cert was simpler and less expensive. You would create a private key and use it to send a certificate signing request (CSR). After verifying your identity, the certificate authority would send you the certificate file.

Signing a Windows EXE file requires these 2 items, a secret key and a certificate issued by one of the certificate authority companies Microsoft trusts. Previously it was just 2 files on your computer, or a single “P12” format file which can hold both.

Since June 2023, your secret key must either be stored inside a hardware token and/or stored on the certificate authority’s cloud service. Supposedly this is more secure.

It is also much more expensive! As with buying anything, shop around. Prices vary and change. As of March 2024, the best price I found was at SSL.com. This article isn’t sponsored or affiliated with SSL.com in any way. They simply had the best (least bad) price in March 2024. Pricing for “OV” was $110 per year for 3 years, and $250 for the hardware key. The actual hardware has $80 retail price when blank from Yubico.

If you choose another certificate authority, you might contact them to confirm the hardware key is Yubikey FIPS. Theoretically other hardware keys might exist, if not today then at some point in the future, and you would want to confirm whether the hardware they send works with Linux and libengine-pkcs11-openssl.

SSL.com also offers a cloud service which eliminates the need for a hardware key, but its least expensive tier is $20/month. That’s almost as much as the Yubikey FIPS hardware, repeated every year! Worse year, that low tier offers only 20 signatures per month. Teensy’s software has several signed EXE files. Merely building a few beta tests could mean needing the next tier.

After you’ve paid, the certificate authority will verify your identity. For “OV” level, the verification is so simple that the price and complexity of needing a hardware token feels rather wasteful. Nonetheless, to get Windows “Verified Publisher” this is the process.

Then all you can do is wait for your Yubikey token to arrive.

Collect Critically Important Data

Your Yubikey FIPS token will arrive initialized with 2 passwords, called PIN and PUK. Your first step is to obtain these 2 passwords.

From SSL.com, my Yubikey came with an 8 digit serial number, printed on a label and also on the key. On the SSL.com website order summary, under “CERTIFICATE DETAILS” -> “physical tokens” was an “activate” link. Clicking it brings up a dialog box to enter the 8 digit serial number. After entering the serial number, the PIN and PUK passwords appeared.

Other certificate authorities will likely work differently, but all Yubikey FIPS use these PIN and PUK passwords for access to the key. You must get the PIN and PUK passwords the certificate authority used. If the PIN is entered incorrectly 3 times, the key becomes locked. The PUK password lets you assign a new PIN password. If PUK is incorrect 3 times, the hardware becomes unusable! Make sure you have the correct PIN and PUK before using your key.

You will also need the certificate data. On the SSL.com I found this on the order summary page under “END ENTITY CERTIFICATES”. It begins “—–BEGIN CERTIFICATE—–“, has a couple dozen lines of base64 encoded data, and ends with “—–END CERTIFICATE—–“. Save this to a file. Any name is ok, I named it “mycert.pem”.

Install Software

Fortunately Ubuntu 22.04 has all the required software as packages. Use these commands install everything demonstrated in this article.

sudo apt install yubikey-manager osslsigncode ykcs11 libengine-pkcs11-openssl
sudo apt install gcc-mingw-w64-i686
sudo apt install wine

You only need the (large) wine package if you want to run your freshly compiled EXE on Linux.

Check Yubikey Hardware

Now you’re ready to plug in your Yubikey and check it. First run this command:

ykman list

You should see it detect your Yubikey and print basic info. The 8 digit serial number should appear.

YubiKey 5 NFC FIPS (5.4.3) [OTP+FIDO+CCID] Serial: 00000000

Type this command to check the keys stored:

ykman piv info

You should see results like this, but with actual serial numbers instead of zeros.

PIV version: 5.4.3
PIN tries remaining: 3/3
Management key algorithm: TDES
Management key is stored on the YubiKey, protected by PIN.
CHUID: 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
CCC:   No data available.
Slot 84:
        Algorithm: ECCP384
        Subject DN: CN=SSL.com Code Signing Intermediate CA ECC R2,O=SSL Corp,L=Houston,ST=Texas,C=US
        Issuer DN: CN=SSL.com Root Certification Authority ECC,O=SSL Corporation,L=Houston,ST=Texas,C=US
        Serial: 00000000000000000000000000000000000000
        Fingerprint: 0000000000000000000000000000000000000000000000000000000000000000
        Not before: 2019-03-07 19:35:47
        Not after: 2034-03-03 19:35:47
Slot 9a:
        Algorithm: ECCP384
        Subject DN: CN=PJRC.COM\, LLC,O=PJRC.COM\, LLC,L=Sherwood,ST=Oregon,C=US
        Issuer DN: CN=SSL.com Code Signing Intermediate CA ECC R2,O=SSL Corp,L=Houston,ST=Texas,C=US
        Serial: 00000000000000000000000000000000000000
        Fingerprint: 0000000000000000000000000000000000000000000000000000000000000000
        Not before: 2024-03-04 15:51:07
        Not after: 2027-03-04 15:51:07

The “PIN tries remaining” is particularly important. If you enter the PIN password incorrectly 3 times, the hardware becomes locked. Best to download the graphical YubiKey Managers software from Yubico’s website, and use it to set a new PIN with the PUK password. However, 3 wrong tries with PUK locks the hardware completely. Unlocking requires deleting the keys you paid $250 to get.

Compile a Windows EXE program

Create a file “hello.c” with any simple C program.

#include <stdio.h>
int main() {
    printf("Hello World\n");
    return 0;
}

Use these commands to compile and run the code.

i686-w64-mingw32-gcc -Os -s -o hello.exe hello.c
wine hello.exe

Check Yubikey PKCS11 module pathname

The default PKCS11 module pathname is /usr/lib/x86_64-linux-gnu/libykcs11.so

If you run Linux on hardware other than x86-64, such as Raspberry Pi, or if you use a different Linux distribution, the pathname may differ. On Ubuntu, you can use dpkg to see all the files the ykcs11 package installed.

dpkg -L ykcs11
ls -l /usr/lib/x86_64-linux-gnu/libykcs11.so
ls -l /usr/lib/x86_64-linux-gnu/libykcs11.so.2
ls -l /usr/lib/x86_64-linux-gnu/libykcs11.so.2.2.0

Sign your hello.exe file

To sign your freshly compiled EXE file, run osslsigncode with these parameters. If your PKCS11 module pathname is different, replace it as needed.

rm -f hello2.exe
osslsigncode sign -h sha2 -pkcs11module /usr/lib/x86_64-linux-gnu/libykcs11.so -certs mycert.pem -key 'pkcs11:pin-value=XXXXXXXX' -ts http://ts.ssl.com -in hello.exe -out hello2.exe

If hello2.exe already exists, osslsigncode isn’t very smart and will show confusing errors. Best to make sure it’s deleted before running osslsigncode.

Your new hello2.exe file is a signed copy. When copied to a Windows computer, right click and view Properties. It should have a Digital Signatures tab. Clicking Details should show the signature info.

Microsoft Defender SmartScreen

Unfortunately, unless you purchased the most expensive “EV” type certificate, your signature using a brand new key will not yet have a reputation with SmartScreen. Windows users will see “isn’t commonly downloaded”.

Eventually as more people download and use your software, SmartScreen will begin to trust it.

Feedback

If you have questions or feedback, please post on this forum thread.

Especially if you know of certificate authorities offers better prices, please share!

Improving Arduino Serial Monitor Performance

Recently I’ve been working to improve the Arduino Serial Monitor.  Here it is running with Teensyduino 1.48-beta1.

Previously if a board sent data this fast (as Teensy 4.0 can), Java would run out of memory and the Arduino IDE crashes.

Teensy 4.0’s USB code is not yet fully optimized, so we can expect even greater speeds later this year.  The Arduino Serial Monitor needs improvement to handle these faster data rates!

Deja Vu From 2014

This isn’t the first time Teensy has crashed Arduino by sending too rapidly to the Serial Monitor.  Back in 2014, this same problem existed with Teensy 3.1.  Serial.print() without delay on Teensy 3.1 would cause Java to run out of memory and crash the Arduino IDE.

Arduino Due was also capable of crashing the Arduino IDE this way.  The Arduino developers had tried in October 2014 to solve it by limiting the buffered data size, which helped, but still Java would eventually run out of memory and lock up.

On December 6, 2014, I finally managed to work around the problem well enough for Arduino to handle sustained USB full speed (12 Mbit/sec) incoming data.  My solution worked around the terrible slowness of adding and removing data from the JTextArea component by collecting incoming data to a buffer and using a timer to add data at only 30 Hz rate.  It also limited the rate of removal to only once every 150 adds, and removed by the number of characters rather than the number of lines.  4 days later, the Arduino developers adapted my solution and merged it into Arduino.  This code as been in every version of Arduino since 1.6.0.

At the time, I wrote this explanation of the details and rant about Java performance.  Back then I wrote “Java is pretty horrible”.  Now with the benefit of hindsight, I realize I was equating Swing’s JTextArea and JTextComponent classes (and the complicated data storage infrastructure lurking behind them) with Java in general.  I also wrote “if dramatically faster hardware is made … in the future, this buffer might need to grow”.

Now with Teensy 4.0 bringing that dramatically faster hardware, and some hindsight, I’ve learned how so much more than merely increasing the size of an intermediate buffer is needed to support sustained data transfer at such speed.

What’s Really Using So Much Memory?

I quickly discovered the terrible slowness inside JTextArea & JTextComponent scaled up (or “down”) rapidly with data size.  Keeping the same 30 updates per second but with larger data would not work.  It failed spectacularly.  Under the load of Teensy 4.0 printing without delay, the Arduino IDE would run slowly for a matter of seconds, then on Windows and Mac start throwing OutOfMemoryError exceptions and ultimately lock up.  On Linux, it would keep running, but unusably slow and consume many gigabytes of memory.  Not good.

To start digging into the problem, I ran VisualVM, which is a Java profiler.  It’s one of the programs bundled with the Java SDK.  If you have the SDK and a JAVA_HOME environment variable (the usual setup for compiling Arduino from source), it can be run from the command line with “$JAVA_HOME/bin/jvisualvm”.

VisualVM is very easy to use.  Every Java-based program running on your machine shows up in the “Local” group.  Arduino appears as “processing.app.Base (pid [number])”.  Clicking it connects the profiler to the running Arduino IDE.  Then clicking the “Profiler” tab lets you see which Java classes are using so much memory.

This screenshot shows the memory use after only several seconds of Teensy 4.0 printing rapidly to the Serial Monitor.  While 100 megabytes is used by raw character data, the really startling result is nearly 2 million live instances of GapContent$MarkData, GapContent$UnfoPosRef, AbstractDocument$LeafElement, and GapContent$StickyPosition.  Yikes!

Even on Linux, with the extra burden of the VisualVM profiler, Java quickly crashes under the strain of Teensy 4.0 printing without delays.  But the profiler served its purpose, so shine light on what’s consuming such an insane amount of memory.  GapContent appeared to the culprit.

Flexible or Bloated, A Matter of Perspective?

Java has a pretty amazing amount of good documentation.  Google searches always turn up Oracle’s reference material.  Usually searches turn up many nice Java tutorials and well answered questions on sites like Stack Overflow.  But from the lack of non-reference material, not even unanswered questions (other than people trying to use JTextArea or JTextComponent as a terminal or live log file display and hitting these same memory use problems), it seems this part of Java is a seldom traveled path.  That’s much of the reason I’m taking some time to write this lengthy blog article, to share with you what I’ve learned on this optimization journey.

I spent a lot of time reading the Java reference pages.  A lot of time…

Internally a number of Java classes are used in a rather modular way.  This modularity could be been seen a highly flexibly or highly bloated system, either a blessing or a curse, depending on your perspective.  In the end, the modular nature turned out to be quite useful.  But first, let’s look at how it’s structured.

This diagram from the JTextComponent reference best sums up the way things really work under the hood.

If you compare with Oracle’s page, you’ll see I’ve add the GapContent part to this diagram.  It turns out the Document class actually outsources all the data storage to GapContent.  At one point I had imagined just replacing GapContext with something more efficient.  But sadly, the GapContent API is designed around the assumption that the data size grows to any arbitrary size.  I wanted to replace all the storage with a fixed-size FIFO circular buffer and avoid *any* use dynamically allocating classes or large data on the heap during the sustained processing of data.

FifoDocument Class

GapContent had to go, so I started work on a new FifoDocument class which would hold all the Serial Monitor text in a fixed size array.

The idea was simple.  Since we only new add lines at the end, and delete the oldest lines from the beginning, this ought to be simple, right?

Elements & Events

At first I did not understand the purpose of the Element class.  I imagined just sending a DocumentEvent output with a single Element representing all the text.  If you read only that Element reference, perhaps you can see how it seems to imply that might work?  At least that’s what I incorrectly assumed.

The Document reference is the only other page (which I found) describing Elements.  But it only describes how they might be used in a generic way.  This Element structure image is completely wrong for the use case of JTextComponent.

It turns out JTextComponent expects a single top-level Element as a container for the entire document, which has child Elements representing each line.  Near the end of the explanation on that Document reference page is a dead link to “see The Swing Connection and most particularly the article, The Element Interface“.  Every indication is Oracle deleted “The Swing Connection” blog many years ago, and dead links automatically redirect to the generic Java page.

Fortunately I did find a copy of The Element Interface article archived at an academic site.  This article is essential to understand what Element structure the various Java Document classes actually use, if you want to craft your own custom Document class to replace on of them.  For the Arduino Serial Monitor case, it’s the PlainDocument structure.

My initial hope to use a single Element was replaced by adding a large, fixed size array of FifoElementLine instances which keep track of where the individual lines of data are located within the big FIFO circular buffer.

With this addition and many trial-and-error tests to figure out which functions actually get called, *finally* the Serial Monitor window started responding to the DocumentEvent notifications and displayed the text.

An early experiment also showed that FifoDocument could delete data without receiving any input from JTextArea.  By simply sending an event to notify JTextArea that data has been deleted, indeed the Serial Monitor would update properly.  This highly flexible event-based design that could be seen as bloat turned out to be very useful for implementing an efficient FIFO that automatically discards old data.

Later this ability for the Document to notify the GUI of changes (which it didn’t initiate or participate in any way) turned out to be even more useful for directly adding data into the FIFO.

Bugs & Thread Safety

Without the help of several great people on the PJRC Forum, especially Tim (Defragster), FifoDocment probably never would have reached a usable state.  The DocumentEvent interface involves many complex requirements which are only scantly documented.  Many tries were needed to get everything right.  Defragster found pretty much every bug very quickly.

Even after the “easy” bugs where fixed, thorny problems with threads remained.  I ended up making almost all the public methods of FifoDocument synchronized.  The thread which adds new data into FifoDocument is also called using SwingUtilities.invokeAndWait().  These are less than optimal.  Perhaps later even better performance could be possible?

Direct Write Into FIFO Memory

Even with all these optimizations, Java would still run out of memory on some Macintosh machines.  Arduino’s traditional implementation of the serial monitor makes multiple copies of incoming data before it finally ends up stored in FifoDocument (or GapContent).

First, the Serial class has an event handler which receives incoming characters into a buffer, which is allocates on the heap.  Then that buffer is converted to a String and passed to a message() function, which is the abstraction allowing different classes to receive data.  The contents of that String are then copied into a StringBuffer instance, which is the improvement I contributed 5 years ago.  At a rate of 30 Hz, then StringBuffer is then copied another String instances, which is passed to JTextArea, which then passes it to the Document storing the data.

I replaced all this copying of data with path directly from arriving characters into the FifoDocument character buffer.

Unfortunately this means overriding the usual path data takes between the abstract serial monitor classes.  Instead, a single loop waits for data to arrive.  When data is ready to read, it requests the maximum number of bytes FifoDocument can accept, and the offset where that data goes inside FifoDocument’s fixed size buffer.  It then reads incoming characters directly from the input stream to FifoDocument’s buffer.  No extra copies are made in other buffers, or String instances to pass the data between abstraction layers.

With this final optimization, even older Macs could continuously receive from Teensy 4.0 without running out of memory or locking up.

However, Java’s InputStreamReader class is still used to convert the raw bytes from UTF8 format to Java’s internal handling of all characters, and the Document event API still uses heap-based temporary allocations for some features.  These do still cause Java’s memory usage to grow gradually, then suddenly shrink what Java’s garbage collection runs.  But at least initial testing appears as if this overhead is acceptable.

Auto Scroll Behavior

While developing the FifoDocument with a truly fixed size buffer, I was forced to make some hard decisions about how to handle the Serial Monitor’s auto-scroll checkbox.

Since Arduino 1.6.0, the Serial Monitor has used a target size of 4,000,000 characters for its buffer (half of the maxChars size in the TextAreaFIFO class).  But this only checked every 150 updates, which can happen no faster than 30 Hz.  Regardless of the auto-scroll checkbox, if the stored data has grown longer than 4,000,000 characters, the oldest data is deleted so only 4,000,000 characters remain.

FifoDocument has a fixed size buffer, rather than the flexible storage of GapContent, so I was not able to preserve this functionality.  I’m also not convinced this existing approach is really correct, since rapidly arriving data can cause whatever the user is trying to read to be deleted.  The TextAreaFIFO class as a flag to tell whether to trim the oldest data, but in all modern versions of Arduino it is never used or changed.

For FifoDocument, I implemented a buffer management policy where the buffer is allowed to fill to 60% capacity while scrolling.  During sustained scrolling, 40% of the buffer remains free.

When auto-scroll is disabled, FifoDocument allows new data to completely fill the buffer.  Once the buffer is 100% full, FifoDocument discards new data.  This may be a controversial decision.  The idea is to allow the user to read anything already within the buffer, and everything new which arrives until the buffer becomes 100% full.  The Serial Monitor window can’t jump to go blank if too much new data pours in while the user is reading.  But once the fixed size buffer is full, nothing more can be captured until auto-scroll is turned back on.

FifoDocument currently implements a 10,000,000 character buffer, and a maximum of 1,000,000 lines.  This allows more data to be captured, but the behavior is not exactly the same as current versions of Arduino when auto-scroll is turned off.

Performance

Despite my frustrated words 5 years ago, indeed Java is capable of implementing the serial monitor at these higher speeds.  But the pervasive approach of most Java code, allocating new objects while passing data between many abstraction layers, puts far too much burden on Java’s memory management and garbage collection.  Using a fixed size buffer, with incoming data stored directly to the buffer without allocating copies on Java’s heap is indeed fast enough.

In a somewhat surprising result, the most efficient system for this use is Microsoft Windows 10.

The Java virtual machine appears to run Java code more efficiently on Windows than it does on Macintosh and Linux.  Perhaps Oracle (or Sun) invested more work to optimize the JRE for Windows?

In this screenshot, you can see the “teensy_serialmon” program running, also with low CPU usage.  This helper program talks to Teensy using Windows native WIN32 functions and then passes the data to Java using stdin & stdout streams.  On Windows 10, both the built in serial driver and the lightweight anonymous pipes used for stdin/stdout appear to be very efficient when accessed with native WIN32 functions.

While Linux is not far behind Windows, unfortunately Macintosh appears to have considerable CPU overhead for accessing serial devices.

This screenshot was taken on the same Macbook Air as the Windows 10 test above (running natively, via Bootcamp dual boot).

However, the Macintosh USB drivers allow Teensy 4.0 to transmit about 60% faster than the drivers on Windows and Linux.  When running the USB serial print speed benchmark, Linux and Windows usually sustain about 200,000 lines/second.  Macintosh usually runs just over 300,0000 lines/second.

These differences are very likely an artifact caused by less-than-optimal USB driver code on Teensy 4.0, interacting with subtle timing differences in the USB host drivers on each system.  I have been delaying work for USB optimizations on Teensy 4.0 until the Serial Monitor is capable of handling the incoming speed without crashing the Arduino IDE.  Now, with these improvements, I can start to focus my effort on optimizing the Teensy side!

Arduino Contribution

As always, my intention is to contribute Teensy-inspired improvements to the Arduino IDE back to the Arduino project.  Several weeks ago I exchanged a few emails with the Arduino developers about this Serial Monitor optimization work, so they are aware of my effort.  Part of the reason to write this lengthy article is to document this work from a “high level” perspective which isn’t really possible by the comments in the code which give more detail.

All of this source code is published on Github.  These are the files:

  • FifoDocument.java – All the FIFO and Document code
  • TeensyPipeMonitor.java – Teensy’s “Pluggable” Serial Monitor using FifoDocument.  This file creates the listener threads which receive stdin data directly into FifoDocument’s buffer, and parse stderr for status updates.
  • FifoEvent.java – The event info FifoDocument sends to JTextComponent.  Methods just call the actual implementation in FifoDocument.
  • FifoElementLine.java – The Element instances representing individual lines of text.  Methods just call the actual implementation in FifoDocument.
  • FifoElementRoot.java – The top-level Element required by JTextComponent.
  • FifoPosition.java – The Position interface which JTextComponent requires for users to select text and copy to clipboard.  Positions are actually implemented with a 64 bit total-historic offset, as explained in this comment.

This work could be considered 2 separate contributions, even though their code is now pretty tightly integrated.  The other work could be called “Pluggable Serial Monitor”, similar in concept to Pluggable Discovery.  The concept, like with discovering ports, is a board can provide a program like “teensy_serialmon” which does the actual communication and makes it available to the Arduino IDE using stdin & stdout streams.  Teensy has used this approach in beta testing since early 2017, and released in Teensyduino 1.42  June 2018).  The port discovery portion was accepted by Arduino around that time.  The serial monitor part will hopefully become an official Arduino feature at some point in the future.

Whether this serial monitor performance improvement may ever accepted by Arduino is unclear.  Arduino has announced at Maker Faire and Arduino Day (March 2018) a next-gen Arduino IDE which will no longer use Java.  How they feel about merging such a large change to the existing Java code is unclear, especially when Teensy 4.0 is (probably) the only board available today which can transmit at these speeds which crash the existing IDE.

Still, my hope is this code may eventually find its way into an official Arduino release.  Eventually more microcontrollers will appear on the market with 480 Mbit or faster USB, and fast enough processors and USB code capable of sustained printing at these speeds and perhaps much faster.

When anyone is later interested in this Serial Monitor optimization, hopefully this lengthy article and the code on github will help.

High Precision Sine Wave Synthesis Using Taylor Series

Normally sine waves are generated on microcontrollers using a table lookup.

Lookup tables are perfect when wavelength happens to be an exact multiple of the sample rate, because you never actually need to know the values in between the table’s stored points.

But if you want to generate waveforms at any frequency without changing your sample rate, you end up needing points on the waveform that are between two entries in the table.  Four approaches are possible.

  1. Use the prior point, even if the next point would have been better
  2. Use the prior or next point, whichever is closer
  3. Use both nearest points with linear interpolation
  4. Use 3 or more nearest points, with spline or other non-linear interpolation

With any of these, larger lookup tables give better accuracy.  Since sine waves have symmetry, some programmers choose to store only 1/4 of the waveform and add slight overhead to map the other 3 quadrants onto the smaller table.

The Teensy Audio Library uses approach #3 for normal sine wave synthesis.  The vast majority of sine wave examples in the Arduino ecosystem use approach #1.

If you want a sine wave with extremely low distortion, where 16 or 20 or even 24 bits are within +/- 1 from an ideal sine wave, you would need an extremely large table!

Ideally, you’d want to be able to very rapidly compute an accurate sine value for any 32 bit resolution phase angle, so your samples always line up to an ideal sine wave.

Sine can be computed using Taylor series approximation.  The formula is: (where x is the angle, in radians)

sin(x) = x – (x^3)/3! + (x^5)/5! – (x^7)/7! + (x^9)/9! – (x^11)/11! + ….

This series goes on forever, but each extra terms makes the approximation rapidly converge to the true value.  In doing quite a lot of testing, I discovered the C library function on Linux for sin() uses this approximation, to only the (x^7)/7! term.  I also found a few sites talking about going to the (x^9)/9! for “professional quality” audio.

One nice advantage of cutting off the Taylor series with on of the subtracted powers (3, 7, 11, etc) is the tiny remaining error will always be slightly less than the true ideal sine value.  This means the final result does not need be checked for greater than 1.00000 and rounded down to fit into the maximum value of an integer.

If you’re still reading by this point, you’re probably shaking your head, thinking this couldn’t possibly be practical in a microcontroller.  That’s a complex equation with floating point numbers, and huge values in x^11 and 11!, since 11 factorial happens to be 39916800.

However, this Taylor series equation can be computed very efficiently, by exploiting the Cortex-M4 DSP extension instructions and bit shift operations, where the phase angle from 0 up to 2π is mapped from 0x00000000 to 0xFFFFFFFF.

The code I’m sharing here implements this equation to the (x^11)/11! term using 32 bit integers, using only 12 multiply instructions, which execute in a single cycle on Cortex-M4.  The add & subtract take zero CPU time, since those multiply instructions also come in flavors that do a multiply-and-accumulate, either positive or negative accumulate.

The Cortex-M4 multiplies perform a 32×32 to 64 bit multiply, and then discard the low 32 bits, with proper round off.  That turns out to be exactly the right thing for managing the huge values of x raised to an increasing power, and the huge numbers of the factorials.  Since those divisions are by constants, it’s possible to multiply by the reciprocal to get the same effect.

So, here’s is the optimized code:

https://github.com/PaulStoffregen/Audio/blob/master/synth_sine.cpp#L75

// High accuracy 11th order Taylor Series Approximation
// input is 0 to 0xFFFFFFFF, representing 0 to 360 degree phase
// output is 32 bit signed integer, top 25 bits should be very good
static int32_t taylor(uint32_t ph)
{
        int32_t angle, sum, p1, p2, p3, p5, p7, p9, p11;

        if (ph >= 0xC0000000 || ph < 0x40000000) {
                angle = (int32_t)ph; // valid from -90 to +90 degrees
        } else {
                angle = (int32_t)(0x80000000u - ph);
        }
        p1 =  multiply_32x32_rshift32_rounded(angle << 1, 1686629713);
        p2 =  multiply_32x32_rshift32_rounded(p1, p1) << 3;
        p3 =  multiply_32x32_rshift32_rounded(p2, p1) << 3;
        sum = multiply_subtract_32x32_rshift32_rounded(p1 << 1, p3, 1431655765);
        p5 =  multiply_32x32_rshift32_rounded(p3, p2) << 1;
        sum = multiply_accumulate_32x32_rshift32_rounded(sum, p5, 286331153);
        p7 =  multiply_32x32_rshift32_rounded(p5, p2);
        sum = multiply_subtract_32x32_rshift32_rounded(sum, p7, 54539267);
        p9 =  multiply_32x32_rshift32_rounded(p7, p2);
        sum = multiply_accumulate_32x32_rshift32_rounded(sum, p9, 6059919);
        p11 = multiply_32x32_rshift32_rounded(p9, p2);
        sum = multiply_subtract_32x32_rshift32_rounded(sum, p11, 440721);
        return sum <<= 1;
}

On top of the 12 cycles for multiplies, there’s a few bit shifts, and a quick conditional test which subtracts from a constant.  That’s necessary because the Taylor series approximation applies only if the angle is between -pi/2 to +pi/2.  For the other half of the sine wave, that subtract maps back into the valid range, because the sine wave has symmetry.

This function takes a 32 bit angle, where 0 represents 0 degrees, and 0xFFFFFFFF is just before 360 degrees.  So the input is perfect for a DDS phase accumulator.  The output is a 32 bit signed integer, where 0x7FFFFFFF represents an amplitude of +1.0, and 0x80000001 represents -1.0.

This code will never return 0x80000000, so you don’t need to worry about that case.

I did quite a lot of testing while working out these constants and the bit shifts for correct numerical ranges.  I believe the top 25 bits are “perfect”.  Six of the low 7 bits are very close, but the approximation does diverge slightly as the angle approaches pi/2 magnitude.  The LSB is always zero, since the computation needs to have extra overhead range to accommodate values representing up to ~1.57 (pi/2) before the latter terms converge to the final accurate value.

For 8 bit AVR, this approach probably isn’t practical.  It probably isn’t practical on Cortex-M0+ either, since there’s no 32×32 multiply with 64 bit result.  Cortex-M3 does have such a multiply, but not in the convenient version that rounds off and discards the low 32 bits.  On Cortex-M4, this code runs very fast.  In fact, when executing at 100 MHz or faster, it might even rival the table lookup, since non-sequential flash accesses (for the table) usually involve a few wait states for a cache miss.  Then again, this code does have 6 integer constants, for the conversion to radians and the factorial coefficients… and depending on compiler flags and flash caching behavior, loading those 6 constants might be the slowest part of this algorithm?

I’m sure most people will still use table lookups.  Linear interpolation between the nearest 2 table entries is fast and gives a result good enough for most applications.  Often a large table is also works well enough, without interpolation.  But I wanted to take a moment to share this anyway, even if it is massively overkill for most applications.  Hope you find it interesting.

UPDATE: Josy Boelen mentioned alternate forms for Taylor series approximation which require fewer multiplies.  Whether these could also be optimized with the M4 DSP extension instructions (not keeping full 64 bit resolution at every step) could be a really interesting future project…

 

This article was originally published in January 2016 (archive.org link) on the DorkbotPDX site.  Since then, the DorkbotPDX blog section has vanished.  I’m reposting it here with slight edits and a couple waveform plots, to preserve the info, and also because Michael Field recently asked for an article about these sorts of numerical approximations (which are rarely given as highly optimized fixed-point source code).

 

 

Arduino Pluggable Discovery

Arduino 1.8.9 has a new Pluggable Discovery feature, which began as a contribution from Teensy.  Ports can now be customized to support any protocol.

Already Omlzo is using Pluggable Discovery with CAN bus.

In this article, I’ll dive into the details needed to use Pluggable Discovery.  But first…

A Brief History

Since the first Arduino software in 2005, the main way to upload code to boards has been by serial ports.  In 2008, support for in-circuit programmer hardware was added, where you select a type of programmer  rather than a port.  In 2013, port detection and selection in the Ports menu was expanded to include network ports for Arduino Yun.  Around that time, Arduino also became highly configurable by recipes in platform.txt, based on a major contribution from Mark Sproul & Rick Anderson.   But until now, Arduino has supported detecting only 2 types of ports, either serial or network (using Yun’s specific protocol), with the only way to customize being dedicated programmer hardware.  Protocols other than serial & Yun’s network just weren’t supported for the Ports menu.

Teensy has used HID protocol, not Serial, for code uploading since its launch in 2008.  Using Teensy with Arduino was somewhat awkward.  When running a program implementing USB Serial, Teensy would appear in Arduino’s list of serial ports.  But with HID or other non-serial protocols, no port would be shown.  Teensyduino’s upload process could automatically detect Teensy in those modes, so uploading would “just work” even though the ports menu indicated no ports available.  When things went wrong, like this infamous Windows 7 driver bug, lack of reliable ports display made confusing situations hard to understand.

In early 2017, I started work to improve how Teensyduino integrates with Arduino.  A dedicated “teensy_ports” utility watches for Teensy’s USB devices to be added & removed.  Inside the Arduino IDE, I added a 3rd type of port to the DiscoveryManager class, using this utility’s info.  Finally, Teensy could be seen in the Ports menu even with running in a non-Serial mode.  You could choose exactly which board to use for upload, rather than hoping the automatic detection would find the right one.

In a June 2018 conversation with Massimo Banzi, Cristian Maglie and other Arduino developers, it was decided that I would contribute this code to Arduino.  A flexible JSON format would replace the original single-line text, so anyone could use it, and Arduino can adapt the format in the future.  After a lengthy process, where Cristian rewrote and greatly improved the JSON parsing, in March 2019 Arduino 1.8.9 was finally released with this new “Pluggable Discovery” feature.

JSON Data Format

The first step to using Pluggable Discovery is to write a “discoverer” utility which sends information to the Arduino IDE as your board is connected and disconnected.  Data is sent to Arduino on stdout, and Arduino may send you commands on stdin.

Here is a sample of the JSON data “teensy_ports” sends to Arduino 1.8.9 when running on Linux.

{
  "eventType": "add",
  "port": {
    "address": "/sys/devices/pci0000:00/0000:00:14.0/usb4/4-14/4-14.1/4-14.1.4",
    "protocol": "Teensy",
    "protocolLabel": "Teensy Ports",
    "boardName": "Teensy 3.6",
    "label": "/dev/hidraw5 Audio",
    "identificationPrefs": {
    },
    "prefs": {
      "vendorId": "16C0",
      "productId": "04D2",
      "serialNumber": "0"
    },
    "online": "true"
  }
}

Here is a quick attempt to document these fields.  Please understand these fields may change meaning, or others may be added by future Arduino releases.  Hopefully in the future Arduino will add official documentation for this JSON format.

  • eventType: Either “add” when a device is detected and Arduino should begin showing it in the Ports menu, or “remove” when disconnected and the Ports menu should no longer show it.  If information changes, or more information becomes available, multiple “add” messages may be sent to cause Arduino to update the Ports menu.  The eventType “error” is also supported.  See Discoverer Commands below for details.
  • port: A container for all of the information about the port.
  • address: A unique identifier for this port.  The address is not shows in the Ports menu, but does appear in the status bar on the bottom of the Arduino IDE windows when the port has been selected.  The address is the primary info you would give to your uploader utility.  See “Integrating With Your Uploader” below for details.
  • protocol: A unique (or at least distinctive) name for the communication protocol the port uses.  Currently Arduino 1.8.9 ignores this field, unless it is “serial” or “network”.  This name can be passed to your uploader utility.
  • protocolLabel: The name of your type of port, as shown in the title of a sub-section of the Ports menu.  Currently there is no way to adapt this field to different languages.  This info can also be passed to your uploader utility, though typically the “protocol” field would be used rather than this name meant for display to the user.
  • boardName: The name of the board.  This should probably match one of the names you define in boards.txt, though Arduino 1.8.9 does not currently enforce any requirements.
  • label: The human readable name for the port, to be shown as a selectable item in the Ports menu.
  • identificationPrefs: Arduino 1.8.9 does not make any use of this field.
  • prefs: This is meant to collect info about the port, such as the info shown by “Get Board Info”.  However, Arduino 1.8.9 does not make use of this field for non-serial ports.  Hopefully future versions will fix this.
  • online: While not used by Arduino 1.8.9 for non-serial ports, this should normally be “true” when eventType is “add” and “false” when eventType is “remove”.  With serial ports, the port’s online status can be momentarily false when Arduino is attempting to send a command to the board to automatically reboot into its uploading or bootloader mode.

When a port should be deleted, only the eventType and address fields are required.  Here is a sample of the JSON remove message teensy_ports transmits.

{
  "eventType": "remove",
  "port": {
    "address": "/sys/devices/pci0000:00/0000:00:14.0/usb4/4-14/4-14.1/4-14.1.4",
    "online": "false"
  }
}

Like all programs which print to stdout, buffering may exist between your output and Arduino’s reception.  Composing the entire JSON message in a buffer, then using a single write and flush to stdout is the most efficient way.  Arduino is using the Jackson library to parse your JSON messages, so sending piecemeal, with or without newlines or other optional white space should also work.

Discoverer Commands

The Arduino IDE will also send your discoverer commands on its stdin stream.  These are not JSON.  They are plain text commands, terminated by a single newline character.

  • START_SYNC – Begin running in sync mode, where your program transmits JSON messages automatically as the device change events occur.
  • START – Begin running in polled mode, where Arduino will send a “LIST” command at regular intervals.
  • LIST – In polled mode, this command is meant to prompt your program to scan for changes.
  • STOP – Stop running.

Future versions of Arduino may behave differently, so please understand the following behavior is specific to version 1.8.9.

When Arduino runs your discoverer, it will send a single START_SYNC command.  If you do not need polled mode, no response is needed or expected.  Your program simply transmits JSON messages as it detects ports, starting with “add” eventType messages all the ports currently available, and then more “add” or “remove” messages as available ports change.

If you require polling mode, you would transmit this JSON message.

{
  "eventType": "error",
  "message": "START_SYNC not supported"
}

Upon receiving this message, Arduino should transmit “START” and then send “LIST” at regular intervals.  If your discoverer is unable to detect hardware changes as they occur, you can design your code to wait for “LIST” and then check whether ports have changed.

With polling, typically you would need to design your discoverer to keep in memory a list of the ports it has previously transmitted to Arduino.  When you poll which ports are actually available, your discoverer is responsible for detecting when any port you have previously transmitted an “add” eventType is no longer present.  You must transmit the “remove” eventType, to cause the Arduino IDE to no longer offer that port to users in the Ports menu.

A list of previously detected ports can also help you avoid repeatedly sending the same info as your discoverer checks which ports currently exist.  But this is not required.  As long as the “address” field remains the same, Arduino Pluggable Discovery treats a redundant “add” eventType as a request to update the port’s info.  Sending the same info repeatedly is inefficient, but harmless.

Polling mode may not be working properly in Arduino 1.8.9.  Most testing was done with sync mode.

Arduino 1.8.9 transmits STOP only when the user closes the entire IDE, just before your program is terminated.  There is no guarantee your discoverer will continue running long enough to actually receive the STOP message.

Future Arduino software may send START_SYNC, START and STOP at different times, but with 1.8.9 the start messages are used only as the Arduino IDE starts up, and STOP is sent only when the user quits the software.

Running Your Discoverer

Like everything else customizable in modern Arduino, you create a recipe in your platform.txt file which tells Arduino to run you discoverer.  Here is a recipe for “teensy_ports”.

## Teensy Ports Discovery (Arduino 1.8.9)
discovery.teensy.pattern="{runtime.ide.path}/hardware/tools/teensy_ports" -J2

This recipe assumes the teensy_ports utility is installed in the hardware folder, inside the Arduino software.  Because the recipe starts with “discovery”, Arduino 1.8.9 detects it is a command for a pluggable discovery program.  It will be run as Arduino starts up.  Use of {runtime.ide.path} only works if your discover is copied to a locating inside Arduino’s folder.

The “-J2” is parameter given to “teensy_ports”, the same as if you had typed “./teensy_ports -J2” on the command line.  While unlikely to be needed, you can build almost any command line your discoverer might need.  In this case, the JSON format changed during development, so “-J2” tells teensy_ports with JSON dialect to output.

If your board is installed by Arduino’s Boards Manager, your discover is meant to be defined and installed as a tool and referenced with {runtime.tools.TOOLNAME.path}.  See this comment from Cristian for more detail.

Integrating With Your Uploader

Now that your discover can find your board’s custom port, and the user can select it in the Ports menu, you need a way to actually pass the detected info to your board’s uploader program.

Fortunately this part is easy.  The macro {serial.port} in your uploader recipe in platform.txt will be replaced by whatever text your discoverer provided in the JSON “address” field.

tools.yourtoolname.upload.pattern="{runtime.hardware.path}/upload_program_name" "{build.path}/{build.project_name}" "{serial.port}"

Yes, it’s still called “serial.port” in these substitution patterns, even though we can now (theoretically) extend the Arduino IDE any sort of protocol.

While your uploader probably only needs the unique address field and the pathname to find the code Arduino compiled, you can also pass some of the other JSON fields to your uploader.  Here is the recipe Teensy is currently using.

tools.teensyloader.upload.pattern="{cmd.path}/teensy_post_compile" "-file={build.project_name}" "-path={build.path}" "-tools={cmd.path}" "-board={build.board}" -reboot "-port={serial.port}" "-portlabel={serial.port.label}" "-portprotocol={serial.port.protocol}"

Hopefully Arduino will eventually document the mapping from JSON fields to recipe substitution names.  These were found by reading the Java source code and some experimentation.

Troubleshooting

What could possibly go wrong?!

Arduino 1.8.9 has a fair amount of debugging code built in to help you see how Arduino is running your discoverer, and what it’s receiving.  To enable the debug output, add this line to your preferences.txt file.

discovery.debug=true

If you’re not sure where your preferences.txt file is located, click File > Preferences and look at the info in the button part of the Preferences dialog.

Minor Issues In Arduino 1.8.9

Polling mode may not work.

The substitution macro {runtime.hardware.path} very often used in platform.txt recipes is not yet supported.  Teensyduino adds a patch to Arduino 1.8.9 to support this in discovery recipes.  If you prefer {runtime.hardware.path}, installing Teensyduino may be a short-term solution.  Hopefully this patch will be merged into future Arduino versions.

Other substitution issues may exist in Arduino 1.8.9.  Pluggable discovery is brand new.

Pluggable Discovery does not yet exist in Arduino’s Command Line Interface software.  Using Pluggable Discovery today means your board’s package can not work with the CLI.  Cristian started work on Pluggable Discovery for CLI months ago, but since January there appears to be no progress.  Hopefully as more people use Pluggable Discovery to support non-Serial protocols, we’ll see progress on bringing it to CLI as well.

Serial Monitor

Of course the other key Arduino IDE feature missing to truly support boards with any type of port is the serial monitor window.  And also the serial plotter.

Teensyduino currently adds several patches to the Arduino IDE to install a customized serial monitor.  It communicates, via stdin/stdout/stderr with a “teensy_serialmon” utility program.  Hopefully in the future, these patches can eventually be contributed to Arduino as as “Pluggable Serial Monitor” feature which complements Pluggable Discovery.

But at least for now, if you are interested in extending the Arduino IDE to work with boards using completely different communication protocols, hopefully the Pluggable Discovery contribution and this article can help.

 

 

 

 

Better SPI Bus Design in 3 Steps

Most Arduino SPI tutorials show this simple but poor SPI bus design:

Better SPI bus design can prevent conflicts.  3 simple improvements are needed:

  1. Use pullup resistors on all chip select signals.

  2. Verify tri-state behavior on MISO: use a tri-state buffer chip if necessary.

  3. Protect bus access with SPI.beginTransaction(settings) and SPI.endTransaction().

Step 1: Pullup Resistors for Chip Select & Reset Signals

When multiple SPI devices are used, and especially when each is supported by its own library, pullup resistors are needed on the chip select pins.

Without a pullup resistor, the second device can “hear” and respond to the communication taking place on the first device, if that second device’s chip select pin is not pulled up.  This is easy to understand in hindsight, but it can be temendously confusing and frustrating to novice Arduino users who purchase shields or breakout boards without pullup resistors.  Each SPI device works when used alone, but they sometimes mysteriously fail when used together, only because both devices are hearing communication meant to initialize only the first device!

A simpe workaround for devices without pullup resistor involves adding code at the beginning of setup.

    void setup() {
      pinMode(4, OUTPUT);
      digitalWrite(4, HIGH);
      pinMode(10, OUTPUT);
      digitalWrite(10, HIGH);
      delay(1);
      // now it's safe to use SD.begin(4) and Ethernet.begin()
    }

Step 2: Proper MISO Tri-State Behavior

Most SPI chips will tri-state (effectively disconnect) their MISO pin when their chip select signal is high (inactive).

However, some chips do not have proper MISO tri-state behavior.  Fortunately, checking MISO tri-state is easy, especially when prototyping on a breadboard.  Just connect two 10K resistors to the MISO line, like this:

When all SPI chips are disabled, the MISO signal should “float” to approximately half the Vcc voltage.  If any device is still driving the MISO line, you’ll see a logic high (usually close to 3.3V or 5.0V) or logic low (close to zero volts).  This test is so easy, it should always be performed by designers of Arduino compatible products.

Arduino shields and breakout boards with poorly-behaved chips should always include a tri-state buffer.  Adafruit’s CC3000 breakout board is a good example:

Step 3: USB SPI Transactions in Software

Newer versions of Arduino’s SPI library support transactions.  Transactions give you 2 benefits:

  • Your SPI settings are used, even if other devices use different settings
  • Your device gains exclusive use of the SPI bus.  Others will not disturb you.

These improvements solve software conflicts, allowing multiple SPI devices to properly share the SPI bus.

A typical use of transactions looks like this:

    SPI.beginTransaction(SPISettings(14000000, MSBFIRST, SPI_MODE0));
    digitalWrite(chipSelectPin, LOW);
    SPI.transfer(mybyte1);
    SPI.transfer(mybyte2);
    digitalWrite(chipSelectPin, HIGH);
    SPI.endTransaction();

SPI.beginTransaction() takes a special SPISettings variable, which give the maximum clock speed, the data order, and clock polarity mode.  The speed is give as an ordinary number, expressing the maximum clock speed that device can use.  The SPI library will automatically select the fastest clock available which is equal or less than your number.  This allows your code to always use the best speed, even on board with different clock speeds.

If your code will ever call SPI library functions from within an interrupt (eg, from attachInterrupt), you must call SPI.usingInterrupt().  For example:

    SPI.begin();
    SPI.usingInterrupt(digitalPinToInterrupt(mypin));
    attachInterrupt(digitalPinToInterrupt(mypin), myFunction, LOW);

If you are developing a library that must be compatible with older versions of Arduino, which lack these SPI transaction functions, you can use SPI_HAS_TRANSACTION to check for the new version.  For example:

    #ifdef SPI_HAS_TRANSACTION
    SPI.beginTransaction(SPISettings(2000000, LSBFIRST, SPI_MODE1));
    #endif

Please Share and Use This Information

Many SPI-based products for Arduino do not work well together.  My hope is this information can help all makers of Arduino compatible devices to achieve much better compatibility.

Long-term, sharing of knowledge is needed.  Please share this information and ask makers of SPI devices and libraries to consider these suggestions.

This article may be shared and copied under the terms of the Creative Commons Attribution 4.0 International License.  Please, copy & share!  🙂

 

This article was originally published on the DorkbotPDX website, on November 24, 2014.  In late 2018, DorkbotPDX removed its blog section.  An archive of the original article is still available on the Internet Archive.  I am republishing this article here, in the hope it may continue to be found and used by everyone using SPI chips, and especially companies making SPI-based products for the Arduino community.

 

 

 

Control Voltage to 1.2V Analog Input Pin

Often the question is asked, what is the simplest way to get modular synth control voltage (CV) into an analog input pin?

This simple circuit using only 3 resistors and 1 capacitor converts the -5V to +5V CV signal range to the 0 to 1.2V ADC input range.

For Teensy 3.2, 3.5 and 3.6, you would use analogReference(INTERNAL) to configure for the 0 to 1.2V range.  Then you can read the CV signal at any time with analogRead().

While this circuit is the simplest and easiest way, it is not necessarily the best way.  The input impedance, or load placed on the CV signal, it 31K ohms, rather than the standard 100K impedance normally used for modular synthesis systems.  This forum discussion goes into the circuit’s limitations and suggests more advanced ways using opamps.

 

Originally this article was published on the DorkbotPDX website.  Since that time, the DorkbotPDX blog section has vanished.  I’m reposting it here, to preserve this info.  A copy of the original can also be found at the internet archive.

Light Table For Web White Background Photos

Years ago, in my slow quest for better photography of electronic projects, I built a light table to eliminate shadows.  Most of the white background photos you see on the PJRC site are shot with this light table.

This is how it turned out.

The build used boards from OSH Park (then “Laen’s PCB group order”), materials from TAP Plastics, 15 white LEDs and parts I mostly had laying around.

This view is inside a cheap 2 foot sized light tent I purchased from some ebay vendor, and a couple bright lights outside the tent on both side.

The LEDs are Cree CLM3C-WKW-CWBYA453, which are supposedly the same 5500K color temperature as the CFL lights outside the tent. Maybe that matters, maybe not, but it seemed like a good idea.

The Cree LED is a surface mount part, but fortunately Lean’s PCB group order made it very easy to convert to something I can solder wires onto. All the PCBs mount with double sticky mounting tape.

As you can see in this LED photo, there’s a bit of shadow. It’s a soft shadow due to the light tent casting light from many directions, but it’s still very present.  This is the type of shadow I’m hoping just a little bit of under side lighting will eliminate.

This little board is a constant current regulator. It takes a 0 to 5 volt input and regulates a 0 to 20 mA output current to a string of 5 LEDs. I wanted to make sure the current was perfectly constant since the camera might choose a quick shutter time.

Here’s the schematic for that circuit. At the time, it seemed like a good idea to sense the current using a resistor between the NPN transistor’s collector and the LEDs. The idea was any small change in ground potential between the board 0-5V control signal wouldn’t matter, if I ran separate signal and power ground lines.

But I didn’t consider the current draw though those resistors around the upper opamp. As you can see in the schematic, I change the values to about as high as I reasonably could. It still have a tiny bit of the lowest part of the range where the LEDs won’t completely turn off.

The 0 to 5 volt signal just comes from this potentiometer on the front panel.  Because it’s driving only the inputs to opamps, it doesn’t have any substantial load.  I still used a 1K pot anyways, though a higher value would have consumed a little less current.  I guess I didn’t care about an extra 5 mA.

The power for everything comes from this simple little power supply, which is just (approx) 24 volts unregulated, and a 5 volt regulated output from the pot, which is from a 7805 regulator.  Simple.

Of course, the opamp circuit isn’t perfect. After putting this together, I decided to try a different approach, sensing on the emitter side, and no current sensing path to add to the LED current! I also included 4.7K resistors on the feedback looks, and the positive inputs see about a 1K impedance. Any errors from the opamp’s input (PNP) bias currents should be small, and should be more on the negative than positive, so hopefully any tiny error will tend to reduce the LED current, not increase.

Then again, the original boards might work out ok, but Laen’s PCB group order makes it so very easy to just quickly draw up a (small) board.  Because the cost is so low, it’s easy to just send it off without all the worry the goes into a normal board order.

I still haven’t actually put this thing to use… the top plastic cover turned out to be just a bit too small, so I need to go shave it down to size on my table saw (which currently has a bunch of other project stuff piled on top of it). But soon, with a little luck, I’ll be able to take pictures of electronic stuff and adjust the light table to null out or at least soften away most of the little shadows that I still get, even with the light tent.

 

Originally this article was published on the DorkbotPDX website.  Since that time, the DorkbotPDX blog section has vanished.  I’m reposting it here, to preserve this info.  A copy of the original can also be found at the internet archive.

Measuring Microamps & Milliamps at 3 MHz Bandwidth

Some time ago, I needed to actually “see” a current waveform in the 100 uA to 5 mA range with at least a couple MHz bandwidth.  This extremely expensive probe would have been perfect, but instead I built something similar for about $30 using the amazing Analog Devices AD8428 amplifier.

The first step was cutting the power trace and adding a resistor.  I used two 1 ohm resistors in parallel.

At 5 mA, this makes only 2.5 mV.  My scope’s supposed resolution is 1 mV, but the truth is there’s plenty of noise down in the 1 mV range.  That’s pretty common for most scopes, even pretty spendy ones.  So it’s just not feasible to measure this signal directly (not to mention using 2 probes and subtracting them in the scope).

That incredibly expensive Agilent probe probably has a couple really nice amplifiers inside…. so I went searching for an amplifier.  After a bit of searching, I found the AD8428.  It has a fixed gain of 2000 and a bandwidth of 3.5 MHz.  That’s a gain-bandwidth product of 7 GHz !!!  It’s also an extremely well matched instrumentation amp with an amazing CMRR of 140 dB.  So it gets rid of the power supply voltage and outputs the amplified signal referenced to ground.

The AD8428 is perfect.  It’s so very easy!  Of course, such amazing performance costs money: about $20.  Here’s that expensive little amplifier, and a 5V to +/- 15V power supply (about $10) to power it.

The one trick with measuring such tiny voltages is twisting the 2 sense wires together.  Honestly, I didn’t try it running them separately, but since this thing is getting voltages in only the microvoltage range for the lower measurements, I didn’t want to risk picking up noise.  I also put a 100 ohm resistor on the output, just in case I accidentally short the output or do something clumsy that might blow that little $20 part.

Here’s a scope screenshot using this little amplifier to “see” the current (the blue waveform).

In this test, the microcontroller is running in its slowest mode at only 10 kHz, drawing about 120 uA.  Then when the chip’s internal oscillator is started, the current jumps to about 600 uA.  Later, the CPU switches to actually clocking from that oscillator.  There’s an on-chip clock divider which is switched in and increased gradually.

The bottom trace (red) is the voltage on the chip’s 1.8V linear regulator.  It turns out that sudden jumps in current cause pretty substantial downward spikes on the regulated voltage.  This more gradual startup approach really helps.  This sort of thing is impossible to see with a slow multimeter, but with a reasonably good bandwidth measurement of the current, it’s easy to see what various code actually does to the current.

I tried connecting my multimeter to the amp output.  Sometimes it’s just a lot more convenient to look at a single number on a meter than fiddle with the scope.  I had been using the current mode on the meter before building this.  One thing I was surprised to find it my little meter updates its screen much faster while measuring about 125 mA than it does when measuring 125 uA.

Another interesting thing I’ve been noticing is patterns within the blue current waveform.  This Agilent scope has a “digital phosphor” rendering of the huge amount of data it collected.  This static screenshot can’t really capture the interactive experience of adjusting the waveform intensity, where various regions within the data change brightness differently, indicating there’s something interesting/different going on.  Even so, you can see several areas in the screenshot where interesting things are happening once the CPU is up and running.  It’s interesting how the current waveform changes as different code executes.

I know this isn’t anything terribly impressive… basically just buy a high-end amplifier and use it with a series resistor.  Maybe it even reads like an Analog Devices ad?  I’m not affiliated with Analog Devices… I just bought this part at normal qty 1 pricing from Digikey.

Still, this is the first time I’ve ever really looked at such low microcontroller currents with a few MHz bandwidth, and I’m guessing not many people have ever bothered to really measure such currents, so I thought I’d share.

 

Years ago I originally posted this article to the DorkbotPDX site.  Hackday published an article about it at the time.  Since then, the DorkbotPDX blog section has vanished.  I’m reposting it here, to preserve the info… for the next time I or others might need to do this sort of current measurement on a budget.

 

 

 

 

 

 

How To Get Tech Help From Strangers On The Internet

Online tech forums can feel intimidating.  Three simple things can greatly improve your experience & odds for useful help, regardless of your skill level.

Good First Impression

“You get 1 chance to make a good first impression” is timeless wisdom. Strangers will quickly form an opinion of you, based only on the words, images or video in your message.  Make your words count!

Showing you’ve made an effort, more than any other factor, will influence people to help.

But if you’re an absolute beginner, what sort of effort can reasonably be expected?  Anyone can try a Google search using the words from their question.  Many experts have long forgotten how difficult finding relevant info can be when you don’t know the proper terms, or unrelated tech has since started using those words.  Just explaining the search you tried and confusing or off-topic info you’ve seen can go a long way towards helping seasoned experts to understand your struggle.  The results aren’t important, your personal effort is what matters!

Genuine interest to learn also tends to make a good impression.  Even a few words can really communicate your attitude.  Frustrated but determined to learn from a trying experience can make a great impression.  Very smart people who can help the most tend to appreciate genuine & candid communication.

Of course if you have started a project or done work, mention it.  Or better, show what you have tried.  Screenshots, photos or even a quick video can powerfully demonstrate what you’ve done, and cover the other 2 aspects of a successful forum question.

Even just a few extra words, which show you are making a sincere effort and sincerely wish to learn, can make an improvement in the response you’ll likely receive.

Context Brings Understanding

Humans have amazing capability to genuinely understand.  Experts can sometimes apply their tremendous knowledge towards creative ideas to solve your problem… when they actually understand what you really need.

Providing context in your question is key.  What are your larger goals?  How does the question you’re asking fit into your project?  What do you ideally hope to accomplish?  And why?

When working on a technical matter, it’s easy to focus on only the issue at hand.  Remember, good people with vast knowledge & experience regularly read forums, just because they enjoy helping others.  When composing your forum question, don’t forget to give context.  It can make the difference between answers that are at best technically correct and answers that are truly insightful.

Details Are Essential

Modern technical issues can involve a mind boggling number of details.  How much raw information should your question include?  Before posting, try asking these questions:

  1. Can readers SEE or EXPERIENCE your problem?
  2. Can readers REPRODUCE your problem?

It’s easy to joke “I saw an error, but clicked OK without reading”.  But seriously, a basic level of detail allows everyone reading your message to see the problem.  A screenshot or exact copy of an error, plus specific information about software, hardware, and steps taken are a baseline for your readers to merely attempt to see the problem as you did.

Ideally, people reading your message would be able to reproduce the problem.  Providing this level of detail is not always practical.  But if you can give enough info, odds of someone solving the problem are vastly improved.

Software programming questions should include complete source code, not merely the small fragments.  Often code issues involve subtle details, such as declarations of variables elsewhere in the code.  Some experts can spot these sorts of problems with amazing speed, if you show complete code!  Don’t forget to be specific about any extra libraries used.

Electronic hardware issues usually need photos or accurate wiring diagrams, and links or part numbers of materials used.

It’s common to feel nervous or self-conscious about showing your project details on a forum.  Don’t be shy.  Experts who regularly read the forum genuinely want to help.  Show enough detail, give context and demonstrate you’ve made some effort and odds very good they will help!

Common Pitfalls

Forums don’t always work ideally.  Usually the 3 steps above will lead to good results, but there are a number of common problems to keep in mind.

Too Much Diagnosis

Technical problem solving involves observing the effects of an issue and trying to deduce the cause.  A very natural human tendency writing while in this mindset is to focus moreso on the presumed causes than objectively sharing the observed effects.

Usually a short break between an intense debugging session and composing your forum post can make a tremendous difference.  When writing, try to think of your reader’s perspective.  Are you asking people to help diagnose the problem, or merely asking them to nod “yup, you’re right” in agreement with your existing conclusions?

However, sharing your thought process can avoid people needlessly covering the same ground you’ve already investigated.  There is no one perfect way.  Just keep in mind that too much diagnosis can shut people out of helping you troubleshoot.

Venting Frustration

Let’s face it, some technical problems are really hard, even intensely frustrating.  When you’re annoyed, venting frustration is a very natural human tendency.

Before you post on a forum, creating that all-important first impression with strangers, review your message.  Some level of expressed frustration is normal and maybe even helpful.  It can show you’ve really tried.  Just beware the common trap, where your forum message ends up being perceived as a rant rather than a question or request for help.

Nobody Replies

Even under ideal conditions, sometimes forum posts get no replies.  This can feel disappointing & disheartening.  Try not to let that emotion take control as you compose a reply to “bump” the thread.  You can do much better.

Effort matters.  If days or weeks have elapsed since your first post, presumably you’ve put more effort into the project?  Even if fruitless, the personal effort you’ve invested since your prior post is something you can share that tends to draw people in.  Photos or code or other clear signs of work on your part can be highly effective.

Sometimes very difficult questions, with plenty of detail, get no replies simply because nobody knows the answer.  Or nobody knows with certainty.  Asking for opinions or ideas for directions to try can really help in these cases, turning what started as a solid question into a more fluid conversation.  Again, apparent effort is the key factor that attracts people to contribute towards helping solve a tough problem.

Most forums discourage or even ban posting duplicate copies of your question, or “cross posting”.  Posting the exact same message in multiple places at the same time is almost never a good idea.  Experts who regularly read the forums, the people most likely to help you, will almost certainly notice.  You get one chance to make a good first impression.  However, after a period with no answers, sometimes duplicate posting can be done gracefully, usually with an explanation & link to the prior unanswered question.

Academically Dishonest Students

Every tech forum gets lazy students asking people to do their homework.  Experts who regularly read forums are inundated by these low-quality questions.  Like an email inbox with spam, the last thing you want is your message to be mistaken for more junk & quickly dismissed.

If you are doing a student project, usually the best approach is to be honest about the academic nature (and deadline) and to make sure your sincere effort shines through.  Earnest effort and genuine desire to learn is what distinguishes good students in the minds of experts who will actually help!

Proprietary Projects

When you can’t share source code or other relevant technical details, technical forum help is rarely effective.

Usually you must invest extra effort to separate just the one piece that is problematic from the rest of your project, so it can be shared.  If you have not done this, consider that asking “has anyone seen a problem like XYX” is essentially asking humans to function as a search engine.  Sometimes the results are good (better than you could do with Google), but difficult technical problems without sharing relevant details are rarely solved by blind guessing.

If your employer or organization absolutely will not allow any code or details shared, no matter how trivial, perhaps using “enterprise” products which come with support contracts, or hiring a private consulting firm is more appropriate than asking on a public forum.

After Your Question Is Answered

The best way to say “Thank You” is to acknowledge who had the right answer.  Many experts pour countless unpaid hours into helping strangers on forums, simply because it feels good to be helpful.  Being recognized as supplying the correct answer is a nice reward.

When your problem is solved, please consider Google and other search engines may send people with similar technical questions to your messages for years to come.  The very best thing you can do after a technical problem is resolved is a quick but clear message confirming the solution.

If you posted on multiple forums, please take a few minutes to follow up on every thread with a link to the message with best answer.

About This Article (and Me)

Over the last 6 years, I’ve written 18620 answers our forum here at PJRC, and many more on numerous other forums.  During this time, we’ve managed to grow a pretty good forum community and help many thousands of people with projects.

I watched and tried to learn which what has worked and how we might improve.  One thing that’s become very clear is so many people need a little guidance on how to best ask their questions.  It’s not the technical nature, but these subtle human factors that matter most.

My hope is to see all tech forums improve.  Please, share this article.

-Paul Stoffregen

 

USB Hub Bug Hunting & Lessons Learned

This tale begins with a customer reporting this cute little 2 port USB hub wasn’t working with Teensy 3.6.

In this article I’ll show how a protocol analyzer is used, how my instincts turned out to be very wrong, and along the way dive into arcane USB details you probably won’t see explained anywhere else.

First Instincts: Fail

This article is being written from the perspective of hindsight.  You can much of the process as it actually happened on the forum thread where this problem was reported.

Most of my first troubleshooting instincts revolved around these 2 questions:

1: This hub works on Linux, Mac & Windows.  What are they doing that Teensy isn’t?

2: All other hubs work on Teensy, so what’s different about this particular hub?

I believe most people would start with these 2 questions.  In the end, both turned out to involve very wrong assumptions that were ultimately a distraction from finding the real problem.

Protocol Analyzer: Fail (Software Crash)

Usually the first step in debugging USB problems is to hook up the protocol analyzer.  I have this one, the Beagle480 from Total Phase.

The way this works is pretty simple.  On the left side where I drew the 2 blue arrows, your USB communication passes through.  To whatever you’re monitoring, it just looks like a longer cable.  But it makes a copy of all communication happening in either direction and sends it to the port on the right (green arrow).

If you’re just making a USB device that plugs into your PC, usually you can run software to capture the data from your PC’s driver.  But when you’re making an embedded USB host like Teensy 3.6’s 2nd USB port, this is the only way to actually see what’s really happening.  The downside is the cost.  Total Phase sells this one for $1200.

Here’s the test setup, with the Beagle480 inserted between the Teensy 3.6 and 2-port hub.  Usually quite a clutter of cables is involved, but this little hub plugs right into the side of the Beagle480.  The hub has 1 keyboard connected.  A Mac laptop is watching the USB communication, and a Linux desktop is running the Arduino IDE to upload code to Teensy.

Despite the high price, it’s not made of magic.  Inside it has a fairly small buffer memory.  It has to send a copy everything on that right side port, which can be problematic during periods of sustained fast data flow.

This case turned out to be one of those problematic situations.  Unfortunately Total Phase’s software does not handle this situation well.  The Linux version simply crashes.

The Mac version is better.  It hangs with the infamous Mac beachball.  A Force quit is needed.

Fortunately having the software lock up means at least something can be seen on the screen, even if I could not scroll up to look at any other data.   Whatever was going wrong with this hub was causing a SETUP token to be repeatedly tried.  The timestamps show it’s happening about every 8 to 12 microseconds!

What’s Different About This Hub?

With the protocol analyzer not helping much, I decided to first focus on question #2.  All the other hubs work.  What’s different about this one?

In USBHost_t36.h on line 59 is an option to turn on lots of verbose debug printing.  But it doesn’t print much about the USB descriptors.  I decided this was a good time to fix that (rather than just plug into a PC and run a program to view), optimistically hoping the info would show some stark difference between this bad hub and all the good ones.  I spent a few hours adding this code to print descriptor info.

Here is the configuration descriptor for this 2 port hub.

Configuration Descriptor:
  09 02 29 00 01 01 00 E0 01 
    NumInterfaces = 1
    ConfigurationValue = 1
  09 04 00 00 01 09 00 01 00 
    Interface = 0
    Number of endpoints = 1
    Class/Subclass/Protocol = 9(Hub) / 0 / 1(Single-TT)
  07 05 81 03 01 00 0C 
    Endpoint = 1 IN
    Type = Interrupt
    Max Size = 1
    Polling Interval = 12
  09 04 00 01 01 09 00 02 00 
    Interface = 0
    Number of endpoints = 1
    Class/Subclass/Protocol = 9(Hub) / 0 / 2(Multi-TT)
  07 05 81 03 01 00 0C 
    Endpoint = 1 IN
    Type = Interrupt
    Max Size = 1
    Polling Interval = 12

Turns out this little 2 port hub is a Multi-TT type.  I’ll talk more about the transaction translators soon.  Most hubs have only a single TT, but 2 of my test hubs are Multi-TT.  Here’s one of their descriptors.

Configuration Descriptor:
  09 02 29 00 01 01 00 E0 32 
    NumInterfaces = 1
    ConfigurationValue = 1
  09 04 00 00 01 09 00 01 00 
    Interface = 0
    Number of endpoints = 1
    Class/Subclass/Protocol = 9(Hub) / 0 / 1(Single-TT)
  07 05 81 03 01 00 0C 
    Endpoint = 1 IN
    Type = Interrupt
    Max Size = 1
    Polling Interval = 12
  09 04 00 01 01 09 00 02 00 
    Interface = 0
    Number of endpoints = 1
 Class/Subclass/Protocol = 9(Hub) / 0 / 2(Multi-TT)
  07 05 81 03 01 00 0C 
    Endpoint = 1 IN
    Type = Interrupt
    Max Size = 1
    Polling Interval = 12

Turns out they’re an exact binary match, except for the last byte in the configuration descriptor.  I can’t emphasize enough how this process involves looking stuff up in the USB 2.0 PDF, in this case page 266.  This byte is just the hub telling the PC how much power is might draw.  The 2 port hub say 01, meaning only 2 mA which can’t be right.  But I know nothing in code on Teensy (which I wrote) ever look at this byte.  Maybe someday we’ll detect power issues, but for now it’s ignored.

Hubs also have a special descriptor for their capability.  It’s documented on page 417-418 of the USB 2.0 PDF.  Here the 3 different hubs did differ.

2 port (not working)
09 29 02 09 00 32 01 00 FF


4 port, ioGear, GUH274 (works)
09 29 04 89 00 32 64 00 FF


4 port, no-name blue color, UH-BSC4-US (works)
09 29 04 E0 00 32 64 00 FF

While I now know these differences are meaningless, and really only the 3rd byte (the number of ports) matters, at the moment I found these it seemed like this just had to explain the problem.  That 2 port hub is different.

Turns out the 4th byte describes features like which LEDs, over-current protection and other features the hub has.  The 7th byte says how long we’re supposed to wait between enabling power and accessing a port, but the bad 2-port hub says only 1 ms, much less than the other 2 working hubs.

What Would Linux Do?

With nothing apparently different between the hub, I returned to the idea that Teensy must be missing some initialization or other step that hasn’t mattered for all the other hubs, but does for this one.

It’s easy to look at what Linux does, since nothing goes wrong that crashes the Beagle480’s software.  Here’s most of the hub’s enumeration process on Linux:

I quickly noticed Linux is sending this Set Interface control transfer.  I did testing with all 7 of the hubs I have.  This doesn’t happen on the USB 1.1 hub or the 3 that are Single-TT, but Linux does do it for all the Multi-TT hubs.

Adding this to the USBHost_t36 library took time.  The hub driver was only looking at the first 16 bytes, to see if the interface and endpoint are compatible.  I had to rewrite the hub driver to look for all the possible interfaces.  Then when it was able to detect if 2 or more exist and parse which is the best to use, I added code to actually send that Set Interface command.

It didn’t make any difference.  I’d poured a few more hours into coding, which in the long term was probably worthwhile, but still no closer to solving the problem!

Working Around Data Capture Software Lockup

I seriously considered resigning this problem to my low-priority bug list, since all the other hubs work.  I considered restructuring the enumeration code to even more closely match Linux’s approach.  I looked over the descriptor data yet again, hoping to find the answer, but no.

Clearly I needed to get around the problem with Total Phase’s software locking up.  At first I just added a “while (1) ; // die here” right before calling new_Device() to start the enumeration process after the hub detects the keyboard.  But that didn’t reveal anything useful.

I needed to let the problem happen, but then shut off the USB port before the Beagle480’s buffer could overfill (or whatever other problem causes the Total Phase software to lock up).  Ultimately I did this:

    if (state == PORT_RECOVERY) {
        port_doing_reset = 0;
        print("PORT_RECOVERY, port=", port);
        println(", addr=", device->address);
static IntervalTimer stoptimer;
stoptimer.begin(panic, 1000);
        // begin enumeration process
        uint8_t speed = port_doing_reset_speed;
        devicelist[port-1] = new_Device(speed, device->address, port);

When I put in terrible debug code, I like to give it very odd indenting so I’ll see it when/if I later forget to remove it!  In this case, a hardware timer is started to generate an interrupt in 1 millisecond, which calls panic().

The panic() function looked like this:

void panic()
{
    GPIOE_PCOR = (1<<6); // turn off USB host power
    Serial.printf("Panic stop\n");
    while (1) ;
}

With this I was finally able to see the beginning of the problem.

Somehow the bad 2 port hub was indeed working, at least for the first control transfer which reads 8 bytes of the device descriptor.  But then everything goes wrong and nothing works.  The host controller keeps trying to resend the Set Address command… at least for 1 millisecond, until the panic() function abruptly shuts off the USB port’s power.

Realizing What’s Really Wrong

I wish I could tell you there’s a reliable process to go from actually being able to observe the problem to understanding it.  There isn’t!

In this case I spent several hours re-reading the USB 2.0 spec, staring at that screenful of info, and basically just guessing.  For a while I considered there might be some sort of electrical interference problem unique to this hub, somehow triggered by completing that first transfer.

It literally took hours to focus on this 1 byte which turned out to be the cause of so much trouble.

To understand this byte, you need to know about USB split transactions, which are probably among the most arcane of USB features.  This next part will be quite technical, so perhaps skip to the end if you’re not in the mood for intensely low-level USB protocol details.

Split transfers are used only between USB hosts and hubs.  If you’re making a USB device, you’ll never see this type of USB communication.  They’re used only when a host at 480 Mbit speed needs to communicate with devices at 12 or 1.5 Mbit.

When USB 3.x is used, all the 5+ Gbps communication happens on dedicated wires.  The cables & hubs have a redundant 480 Mbit path. When you use a 12 or 1.5 Mbit device on a USB 3.0 hub, that slow communicate is done using split transactions on the 480 Mbit line.  Slow speeds are never translated to the 5 Gbps lines.

USB has a lot of very specific terminology.  From smallest to largest, communication is in Tokens, then Transactions, then Transfers.  I’m not going into details here, other than it’s important to remember we’re dealing with 3 starts-with-T junks of data.  Tokens are the smallest.  Transactions are bigger, made up of Tokens.  Transfers are the largest, maybe of of Transactions & Tokens.

These SPLIT tokens communicate with a Transaction Translator (TT) inside the hub.  When the host wants to send a transaction at 12 or 1.5 Mbit, it sends a SPLIT-START token to the hub’s TT.  If the TT isn’t busy, it acknowledges and begins working on the slow communication.  That sequence of tokens is a transaction, also called split start (maybe confusing if you don’t know the context).  Later the host sends SPLIT-COMPLETE.  If the TT has finished, it replies with ACK, and if it’s still busy it replies with NYET.

Here’s the diagram from the USB 2.0 spec on page 202 showing this for OUT transactions.  Sadly there isn’t a diagram specifically for SETUP transactions, but my guess is this should be pretty close.  Did I mention this sort of work is all about guesswork and re-reading technical specs over and over?

Total Phase’s software is very nice.  It normally shows you one Transfer per line.  If you want to see the Transactions that made up that transfer, you click the little triangle to expand.  Likewise if you want to drill down to see the actual tokens, you can.  Here’s the first successful transfer.  Hopefully you can see this more-or-less corresponds to Figure 8-9, except SETUP instead of OUT.  Of course, we only see the stuff from circles #1 & #3 since we’re connected between Teensy and the hub.

In this case, we can see Teensy sent 58 SPLIT-COMPLETE tokens while the hub slowly send that SETUP token at 1.5 Mbit to the keyboard, where the hub replied with NYET to the first 57, then finally an ACK.

This is important, because the process of figuring out why something isn’t working almost always involves first looking at the deeper details of what it’s supposed to do when it does work.

With that in mind, here’s an expanded view of the first Set Address transfer that fails.

Remember the software is showing us nesting of Tokens inside Transactions inside Transfers.  So the 3rd line is actually the first real data.  This badness begins with the bytes 78 01 82 B4.

Earlier 78 01 82 10 was considered ok.  But 78 01 82 B4.  So what does that last byte mean?  Again the answer is digging into the nitty gritty details in the USB 2.0 document.  It happens to be on the same page as that diagram above.

Even though going from 10 to B4 looks like a big change, in fact the top 5 bits are a CRC check which is expected to be totally different if any of the checked bits changed.  So really this means just one of the “ET” bits flipped from a 0 to 1.

The meaning of those ET bits is documented a few pages later.

These bits were 00 for Control, but somehow became 10 for Bulk.  That’s definitely not right!  We’re sending a SETUP token, and that token has the endpoint number encoded within (also documented elsewhere in chapter 8….)

Where This 1 Bit Went So Wrong

The good news is this last part, going from an understanding of what’s wrong to finding the mistake causing it, is relatively easy.  It took only about an hour.  While I didn’t know exactly what controlled those ET bits, I was pretty sure it was something in the EHCI controller’s QH or qTD data structures.

USB controllers implementing EHCI are very unlike traditional embedded peripherals, where a handful of fixed register addresses control everything.  EHCI gets almost everything from main system RAM, where it’s regularly using DMA to read linked lists and binary tree data structures to find out what work you want it to perform.

For each pipe/endpoint each USB device, you create a QH structure (except complexities for isochronous… but trying to keep this simple).  Here’s the EHCI diagram for the QH structure.

The first field is used for the linked lists or inverted tree pointers.  The next two are where you actually configure how the hardware will communicate.  The big yellow area is written by the hardware, coping other info there temporarily as it works on the Transfers you want.  (Yes, the hardware deals in Transfers and automatically does the Transactions & Tokens)

While there’s a lot of small fields in those 64 bits, the “C” bit turned out to be the key.  The EHCI spec documents it as:

Control Endpoint Flag (C). If the QH.EPS field indicates the endpoint is not a high-speed device, and the endpoint is an control endpoint, then software must set this bit to a one. Otherwise it should always set this bit to a zero.

Fortunately the code uses C structs to with distinctive names to represent the QH and other EHCI data structures.  A quick grep of all the code turned up everything that accesses this field.  Only a few of them actually write to it.

Ultimately the bug turned out to be just this 1 line function called pipe_set_maxlen()

pipe->qh.capabilities[0] = (pipe->qh.capabilities[0] & 0x8000FFFF) | (maxlen << 16);

Of course if you look at the 2nd line in the QH documentation, this is incorrectly zeroing too many bits, destroying the C bit and part of the RL field.  It should be:

pipe->qh.capabilities[0] = (pipe->qh.capabilities[0] & 0xF800FFFF) | (maxlen << 16);

Indeed applying this fix made the “bad” 2-port hub work quite nicely.

Headslap Moment

After I found the fix, I looked one last time at the question “All other hubs work on Teensy, so what’s different about this particular hub?”

Here’s the communication with that keyboard through one of the “good” hubs!

Turns out the code has been sending improper SPLIT-START tokens all along, but every hub I own and the numerous hubs other people have used all were able to work anyway!  The different was this particular hub was actually rejecting the bad tokens.

In all this time, developing this library earlier this year and all this recent work to track down this bug, apparently I never actually connected the protocol analyzer and watched the “good” hubs really communicating with low speed devices!  Could have saved a *lot* of work.  Now I know, in hindsight.