CrashReport and Breadcrumbs

Teensy 4.0, Teensy 4.1 and MicroMod Teensy have a Memory Protection Unit (MPU) which can detect incorrect memory access. A fault handler stores information about the fault at a reserved location in memory, and reboots Teensy after 8 seconds. Your program can use CrashReport to access this stored information.

Basic CrashReport Usage with Arduino Serial Monitor

CrashReport can be tested as a boolean to check whether stored data is present, and it can be passed to the Arduino print() function. The simplest usage just prints this info to the Arduino Serial Monitor when your program starts.

void setup() {
  Serial.begin(9600);
  /* check for CrashReport stored from previous run */
  if (CrashReport) {
    /* print info (hope Serial Monitor window is open) */
    Serial.print(CrashReport);
  }
}

Using Breadcrumbs

Discovering where and why your program crashes can be quite a challenge. To help, CrashReport provides up to 6 breadcrumbs you can use to log the progress of your program. Each breadcrumb stores a 32-bit number which indicates where your program had recently run. (The term "breadcrumb" is inspired by the classic story Hansel and Gretel.)

In this simple example, breadcrumb #2 is written with 4 different numbers as the program runs.

void setup() {
  Serial.begin(9600);
  /* check for CrashReport stored from previous run */
  if (CrashReport) {
    while (!Serial && millis() < 10000) ; /* wait up to 10 sec */
    /* print info (hope Serial Monitor window is open) */
    Serial.print(CrashReport);
  }
}

void loop() {
  CrashReport.breadcrumb(2, 1111111);
  delay(1000);
  CrashReport.breadcrumb(2, 2222222);
  delay(1000);
  CrashReport.breadcrumb(2, 3333333);
  delay(1000);
  volatile byte *p = nullptr; /* using this pointer will crash */
  *p = 5;
  CrashReport.breadcrumb(2, 4444444);
  delay(1000);
}

When the CrashReport info is printed, we can see breadcrumb #2 holds 3333333, which means the crash occurred after the 3333333 breadcrumb was written and before the 4444444 breadcrumb could be written.

After uploading your program with new breadcrumb values, the first CrashReport info you see may be leftover info from your old program's prior run. Remember to look for the 2nd CrashReport print after upload to be sure you are getting breadcrumb info from your latest upload.
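
One possible way to tell a leftover report from a fresh one, sketched here as an assumption rather than a built-in CrashReport feature, is to dedicate one breadcrumb to a build identifier. The hypothetical helper fnv1a() below hashes the compiler's __DATE__ and __TIME__ strings with the FNV-1a algorithm, so each compile is very likely to produce a different 32-bit number. The names fnv1a and BUILD_ID are illustrative only.

```cpp
#include <stdint.h>

/* FNV-1a hash of a C string, usable at compile time.
   fnv1a and BUILD_ID are illustrative names, not part of CrashReport. */
constexpr uint32_t fnv1a(const char *s, uint32_t hash = 2166136261u) {
  return (*s == '\0')
    ? hash
    : fnv1a(s + 1, (hash ^ static_cast<uint8_t>(*s)) * 16777619u);
}

/* Changes whenever this file is recompiled at a different date or time */
constexpr uint32_t BUILD_ID = fnv1a(__DATE__ " " __TIME__);
```

A sketch could print BUILD_ID at startup and, after printing any stored CrashReport, call CrashReport.breadcrumb(6, BUILD_ID). A report whose breadcrumb #6 differs from the number printed at startup came from an earlier upload.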

Finding a problem in a large program is usually much more difficult than this small example. For programs which have several logical functions or tasks, usually a different breadcrumb is used in each logical section. When the CrashReport info is printed, multiple breadcrumbs can give you an idea of which portions of work performed by your program had recently been completed.
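
One way to organize this, shown as a minimal sketch rather than anything CrashReport itself provides, is to reserve one breadcrumb slot per logical section and pack a pass count and step number into the 32-bit value. The names CrumbSlot, crumbValue, crumbPass and crumbStep are hypothetical helpers, and the three sections are only an assumed example.

```cpp
#include <stdint.h>

/* Hypothetical assignment of breadcrumb slots (1 to 6) to program sections */
enum CrumbSlot : unsigned int {
  CRUMB_SENSORS = 1,
  CRUMB_DISPLAY = 2,
  CRUMB_NETWORK = 3
};

/* Pack a pass count (upper 16 bits) and a step number (lower 16 bits)
   into one 32-bit breadcrumb value */
constexpr uint32_t crumbValue(uint16_t pass, uint16_t step) {
  return (static_cast<uint32_t>(pass) << 16) | step;
}

/* Recover the two fields when reading a value from the printed report */
constexpr uint16_t crumbPass(uint32_t value) {
  return static_cast<uint16_t>(value >> 16);
}
constexpr uint16_t crumbStep(uint32_t value) {
  return static_cast<uint16_t>(value & 0xFFFF);
}
```

In a sketch, the sensor code might call CrashReport.breadcrumb(CRUMB_SENSORS, crumbValue(pass, 3)) just before its third step. Viewed in hexadecimal, the upper four digits of the stored value are the pass count and the lower four are the step.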

Using addr2line

TODO...

Logging CrashReport

CrashReport can work with any Arduino library which uses the Arduino print() function. Instead of immediately printing to the Serial Monitor, you can log CrashReport information to an SD card using the SD library. To view the file, simply remove the SD card and read it with any computer.

#include <SD.h>

/* Teensy audio board: pin 10
   Teensy 3.5 & 3.6 & 4.1 on-board: BUILTIN_SDCARD
   Wiz820+SD board: pin 4 */
const int chipSelect = BUILTIN_SDCARD;

void setup() {
  if (CrashReport) {
    if (SD.begin(chipSelect)) {
      /* FILE_WRITE will append to end of the file */
      File logFile = SD.open("crashlog.txt", FILE_WRITE);
      if (logFile) {
        logFile.print(CrashReport);
        logFile.close();
      }
    }
  }
}

If an SD card is not available, you can also use LittleFS to log CrashReport information to a small unused portion of Teensy's program memory. Because you can't physically remove the program memory, your program must also provide a way to read the logged CrashReport file.

#include <LittleFS.h>

LittleFS_Program myfs;
const int storageSize = 256 * 1024;

void setup() {
  if (CrashReport) {
    if (myfs.begin(storageSize)) {
      /* FILE_WRITE_BEGIN will overwrite any old file */
      File logFile = myfs.open("crashlog.txt", FILE_WRITE_BEGIN);
      if (logFile) {
        logFile.print(CrashReport);
        logFile.close();
      }
    }
  }
  Serial.begin(9600);
  delay(1000);
  Serial.println("Press 'R' to show crashlog.txt file");
}

void loop() {
  /* every time loop() runs, check if the user wants CrashReport info */
  if (Serial.available()) {
    char c = Serial.read();
    if (c == 'r' || c == 'R') {
      File logFile = myfs.open("crashlog.txt");
      if (logFile) {
        while (logFile.available()) {
          c = logFile.read();
          Serial.write(c);
        }
        logFile.close(); /* release the file after reading */
      } else {
        Serial.println("Sorry, no crashlog.txt stored");
      }
    }
  }
  
  // program to do its normal work...

  /* uncomment this code to create a CrashReport */
  /* static elapsedMillis msec;
     if (msec > 15000) {
       CrashReport.breadcrumb(1, 123);
       CrashReport.breadcrumb(2, 456);
       CrashReport.breadcrumb(3, 789);
       volatile byte *p = nullptr;
       *p = 5;
     } */
}

Limitations

CrashReport is only effective when the MPU detects a fault condition or an unused interrupt is run. Many other types of problems cannot be detected by the MPU. Programs can enter an infinite loop. Interrupts may be disabled, allowing the main program to keep running but effectively shutting off all hardware I/O. Certain peripherals, such as FlexIO, can hang the bus indefinitely if accessed without first enabling their clocks. The MPU cannot detect these problems.

Normally a watchdog timer would be used for critical applications which must recover from these types of problems. However, effective use of watchdog timer hardware requires careful planning. Usually a group of flags is needed to track whether all functional portions of a program are operating properly. Deciding when to reset the timer also requires careful design: the reset must happen only after all program tasks are fully confirmed, but always before the timeout causes a hardware reboot. Unlike the MPU, which is very "safe" (it will only take action when software makes a confirmed mistake), watchdog timers can create new problems if the program's timing is not carefully designed.
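
The flag-tracking idea can be sketched as below. This is only an illustration of the bookkeeping, under the assumption of three tasks. It does not touch any watchdog hardware, and the names Task, markAlive and allAlive are hypothetical.

```cpp
#include <stdint.h>

/* Hypothetical list of program tasks which must all report in */
enum Task : uint32_t {
  TASK_SENSORS = 1u << 0,
  TASK_CONTROL = 1u << 1,
  TASK_COMMS   = 1u << 2,
  ALL_TASKS    = TASK_SENSORS | TASK_CONTROL | TASK_COMMS
};

/* Return the flags with one more task recorded as having completed its work */
constexpr uint32_t markAlive(uint32_t flags, Task t) {
  return flags | static_cast<uint32_t>(t);
}

/* True only when every task has reported since the flags were last cleared */
constexpr bool allAlive(uint32_t flags) {
  return (flags & ALL_TASKS) == ALL_TASKS;
}
```

Each task would update a shared flags variable with markAlive() when it finishes a cycle. The main loop would reset the watchdog timer, and then clear the flags to 0, only when allAlive() returns true. If any task stalls, the timer is never reset and the hardware reboots.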

These hardware features can give you resilience to software flaws, but understanding their limitations is important to actually achieving good results.

MyFault Library

To test CrashReport, the MyFault Library provides a collection of examples which trigger the various types of faults the MPU can detect. It also includes faults which are not properly detected.