Chapter 6: Running the "Hello World" Example on Arduino

Steps to Run the "Hello World" Example

Running the "Hello World" example from Harvard's TinyMLx course using the Arduino IDE involves a series of steps. Here's a concise guide:

Prerequisites

  • Arduino IDE installed
  • A compatible microcontroller board (like Arduino Nano 33 BLE Sense)
  • Necessary drivers and libraries installed

Steps

  1. Install Board Support:

    • Open Arduino IDE, go to Tools > Board > Boards Manager.
    • Search for the board you're using (e.g., "Arduino Nano 33 BLE") and install it.
  2. Install Libraries:

    • Go to Sketch > Include Library > Manage Libraries.
    • Search for "Harvard_TinyMLx" and install the library.
  3. Load the Example:

    • Go to File > Examples > Harvard_TinyMLx > hello_world.
    • This will open the example sketch.
  4. Configure the Board:

    • Go to Tools > Board and select your board from the list.
    • Go to Tools > Port and select the port to which your board is connected.
  5. Upload the Code:

    • Click the "Upload" button (arrow pointing right) or press Ctrl + U to compile and upload the code to your board.
  6. Monitor Output:

    • You should see the on-board LED's brightness pulse in a pattern that follows a sine wave, which means the model is running and making predictions. You can also open Tools > Serial Plotter to watch the brightness values the sketch logs on each inference.

TfLitePtrUnion

The TfLitePtrUnion is a union data structure used in TensorFlow Lite Micro to manage the different data types that may be stored in a tensor. A union allows you to store different data types in the same memory location, which is especially useful when memory is constrained, as it is on microcontrollers.

Here's a quick breakdown:

  • The various typed pointers like i32, u32, i64, f, etc., allow you to conveniently treat the raw memory as a certain type. This is helpful for code readability and might make some operations faster.

  • The void* data member is the most generic one, and it essentially means "a pointer to some data, but I don't care what type it is." This is useful when you're writing generic code that must work with any data type.

  • The comment in the code suggests that direct access to the other members is deprecated, and instead, you should use .data or the helper function GetTensorData<TYPE>(tensor) for type safety.

In TensorFlow Lite Micro, a tensor is often an abstract representation that can hold various types of data like floats, integers, or even more complex types like TfLiteComplex64. Using a union like TfLitePtrUnion enables a flexible and memory-efficient way to manage these different possibilities.
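For reference, here is an abridged, illustrative sketch of what the union looks like (the full definition lives in tensorflow/lite/c/common.h, and its exact member list varies between TensorFlow Lite versions, so treat this as a sketch rather than the authoritative definition):

// Abridged: only a few representative members are shown.
typedef union TfLitePtrUnion {
  int32_t* i32;
  uint32_t* u32;
  int64_t* i64;
  float* f;
  int8_t* int8;
  TfLiteComplex64* c64;
  /* Only use this member. */
  void* data;
} TfLitePtrUnion;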

Here is an example of how TfLitePtrUnion is used in the TfLiteTensor struct:

First, let's create an instance of MicroInterpreter and get the input and output tensors:

tflite::MicroInterpreter interpreter(model, op_resolver, tensor_arena, kTensorArenaSize);
TF_LITE_ENSURE_STATUS(interpreter.AllocateTensors());

TfLiteTensor* input = interpreter.input(0);
TFLITE_CHECK_NE(input, nullptr);

TfLiteTensor* output = interpreter.output(0);
TFLITE_CHECK_NE(output, nullptr);

From the tensors, we can get the data member, which is of type TfLitePtrUnion. We can either use .f to treat the data as a float:

// If input is float
input->data.f[0] = golden_inputs_float;
TF_LITE_ENSURE_STATUS(interpreter.Invoke());
float y_pred = output->data.f[0];
TFLITE_CHECK_LE(abs(sin(golden_inputs_float) - y_pred), epsilon);

...or use .int8 to treat the data as an int8:

// If input is int8 (quantized)
input->data.int8[0] = golden_inputs_int8;
TF_LITE_ENSURE_STATUS(interpreter.Invoke());
float y_pred = (output->data.int8[0] - output_zero_point) * output_scale;
TFLITE_CHECK_LE(abs(sin(golden_inputs_float) - y_pred), epsilon);
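Alternatively, the same access can go through the type-safe helper mentioned earlier. Here's a minimal sketch, assuming your build exposes tflite::GetTensorData<T> (it typically lives in tensorflow/lite/kernels/internal/tensor_ctypes.h):

// Type-safe view of the same memory that input->data.int8 points to.
int8_t* input_data = tflite::GetTensorData<int8_t>(input);
int8_t* output_data = tflite::GetTensorData<int8_t>(output);

input_data[0] = golden_inputs_int8;
TF_LITE_ENSURE_STATUS(interpreter.Invoke());
float y_pred = (output_data[0] - output_zero_point) * output_scale;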

Folder Structure of the "Hello World" Example

The folder containing the "Hello World" example is located at /Users/myName/Documents/Arduino/libraries/Harvard_TinyMLx/examples/hello_world on my computer (again, all under the Apache License 2.0). Here's the folder structure:

.
├── arduino_constants.cpp
├── arduino_main.cpp
├── arduino_output_handler.cpp
├── constants.h
├── hello_world.ino
├── main_functions.h
├── model.cpp
├── model.h
└── output_handler.h

I won't go into the details of each file, but here's a brief overview:


output_handler.h and arduino_output_handler.cpp

These files declare and implement HandleOutput(), the function that outputs the predicted sine value to the Arduino Nano 33 BLE Sense's built-in LED using PWM (pulse-width modulation). The core of the Arduino implementation looks like this:

// Do this only once
if (!initialized) {
    // Set the LED pin to output
    pinMode(led, OUTPUT);
    initialized = true;
}

// Calculate the brightness of the LED such that y=-1 is fully off
// and y=1 is fully on. The LED's brightness can range from 0-255.
int brightness = (int)(127.5f * (y_value + 1));

// PWM output to the LED
analogWrite(led, brightness);

// Log the current brightness value for display in the Arduino plotter
TF_LITE_REPORT_ERROR(error_reporter, "%d\n", brightness);
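The function this body belongs to is declared in output_handler.h, so the rest of the sketch can call it without knowing which hardware-specific implementation is linked in. The declaration looks roughly like this, matching the HandleOutput(error_reporter, x, y) call we'll see in loop():

// Declared in output_handler.h; arduino_output_handler.cpp provides the
// Arduino implementation shown above.
void HandleOutput(tflite::ErrorReporter* error_reporter, float x_value,
                  float y_value);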

model.h and model.cpp

These files are where the model trained in the notebook shown in Chapter 4 is stored. The model.h file contains the following declarations:

extern const unsigned char g_model[];
extern const int g_model_len;

The model is stored as a byte array in model.cpp, and alignas(8) is used to ensure that the array starts on an 8-byte boundary. This matters because the Arduino Nano 33 BLE Sense's microcontroller is a 32-bit Arm Cortex-M4F, and keeping the model 8-byte aligned guarantees that any 64-bit accesses to it are aligned.

#include "model.h"

// Keep model aligned to 8 bytes to guarantee aligned 64-bit accesses.
alignas(8) const unsigned char g_model[] = {
    0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, 0x14, 0x00, 0x20, 0x00,
    0x1c, 0x00, 0x18, 0x00, 0x14, 0x00, 0x10, 0x00, 0x0c, 0x00, 0x00, 0x00,
    0x08, 0x00, 0x04, 0x00, 0x14, 0x00, 0x00, 0x00, 0x1c, 0x00, 0x00, 0x00,
    0x98, 0x00, 0x00, 0x00, 0xc8, 0x00, 0x00, 0x00, 0x1c, 0x03, 0x00, 0x00,
    ...
    0x0c, 0x00, 0x10, 0x00, 0x0f, 0x00, 0x00, 0x00, 0x08, 0x00, 0x04, 0x00,
    0x0c, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x09};
const int g_model_len = 2488;

hello_world.ino

This is the "main" file of an Arduino sketch. It's the file that contains the setup() and loop() functions. The setup() function is called once when the board boots up, and the loop() function is called repeatedly.

We start by declaring some global variables (and put them in an anonymous namespace to avoid name collisions):

namespace
{
    tflite::ErrorReporter* error_reporter = nullptr;
    const tflite::Model* model = nullptr;
    tflite::MicroInterpreter* interpreter = nullptr;
    TfLiteTensor* input = nullptr;
    TfLiteTensor* output = nullptr;
    int inference_count = 0;

    constexpr int kTensorArenaSize = 2000;
    uint8_t tensor_arena[kTensorArenaSize];
}
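The 2,000-byte kTensorArenaSize is an empirical choice for this tiny model: the arena has to hold the model's input, output, and intermediate tensors, and there is no exact formula for its size. One practical approach (a sketch, assuming your TensorFlow Lite Micro version provides MicroInterpreter::arena_used_bytes()) is to start with a generous value, then log how much was actually used after allocation and trim accordingly:

// Call after AllocateTensors() succeeds, e.g. at the end of setup().
TF_LITE_REPORT_ERROR(error_reporter, "Arena used: %d bytes",
                     static_cast<int>(interpreter->arena_used_bytes()));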

Then, we define the setup() function, which is called once when the board boots up. This function is used to initialize the board and set up the model.

void setup()
{
    // Set up logging. Google style is to avoid globals or statics because of
    // lifetime uncertainty, but since this has a trivial destructor it's okay.
    // NOLINTNEXTLINE(runtime-global-variables)
    static tflite::MicroErrorReporter micro_error_reporter;
    error_reporter = &micro_error_reporter;

    // Map the model (in model.h) to a usable data structure. This doesn't
    // involve any copying or parsing, it's a very lightweight operation.
    model = tflite::GetModel(g_model);

    // Check that the model we are using is correct
    if (model->version() != TFLITE_SCHEMA_VERSION)
    {
        TF_LITE_REPORT_ERROR(error_reporter,
                            "Model provided is schema version %d not equal "
                            "to supported version %d.",
                            model->version(), TFLITE_SCHEMA_VERSION);
        return;
    }

    // This pulls in all the operation implementations we need.
    // NOLINTNEXTLINE(runtime-global-variables)
    static tflite::AllOpsResolver resolver;

    // Build an interpreter to run the model with.
    static tflite::MicroInterpreter static_interpreter(model, resolver, tensor_arena, kTensorArenaSize, error_reporter);
    interpreter = &static_interpreter;

    // Allocate memory from the tensor_arena for the model's tensors.
    TfLiteStatus allocate_status = interpreter->AllocateTensors();
    if (allocate_status != kTfLiteOk)
    {
        TF_LITE_REPORT_ERROR(error_reporter, "AllocateTensors() failed");
        return;
    }

    // Obtain pointers to the model's input and output tensors.
    input = interpreter->input(0);
    output = interpreter->output(0);

    // Keep track of how many inferences we have performed.
    inference_count = 0;
}

We can ignore the "NOLINTNEXTLINE" comments for now. They just suppress specific warnings from the linter, essentially saying, "Yes, I know this is generally considered bad practice, but I have a good reason for doing it this way, so don't warn me about it."

Also, note how we use tflite::AllOpsResolver to pull in the operation implementations. This is a convenience class that registers every operation implementation TensorFlow Lite Micro provides, which is easy but wastes program memory on kernels the model never uses. It's therefore recommended to use a custom resolver that registers only the operations your model actually needs; we'll cover this properly in the next chapter, but a quick preview follows.
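A custom resolver for this model might look like the following sketch, which would replace the static tflite::AllOpsResolver resolver; line in setup(). It assumes the sine model only needs the fully connected operator and that micro_mutable_op_resolver.h is included; check your model's actual operator list before trimming the resolver:

// Sized for one operator; the template parameter is the maximum number of
// operators that can be registered.
static tflite::MicroMutableOpResolver<1> resolver;
if (resolver.AddFullyConnected() != kTfLiteOk)
{
    TF_LITE_REPORT_ERROR(error_reporter, "Failed to register FullyConnected");
    return;
}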

Next, we define the loop() function, which is called repeatedly. This function is used to run the model and output the results.

void loop()
{
    // Calculate an x value to feed into the model. We compare the current
    // inference_count to the number of inferences per cycle to determine
    // our position within the range of possible x values the model was
    // trained on, and use this to calculate a value.
    float position = static_cast<float>(inference_count) /
                    static_cast<float>(kInferencesPerCycle);
    float x = position * kXrange;

    // Quantize the input from floating-point to integer
    int8_t x_quantized = x / input->params.scale + input->params.zero_point;
    // Place the quantized input in the model's input tensor
    input->data.int8[0] = x_quantized;

    // Run inference, and report any error
    TfLiteStatus invoke_status = interpreter->Invoke();
    if (invoke_status != kTfLiteOk)
    {
        TF_LITE_REPORT_ERROR(error_reporter, "Invoke failed on x: %f\n",
                            static_cast<double>(x));
        return;
    }

    // Obtain the quantized output from model's output tensor
    int8_t y_quantized = output->data.int8[0];
    // Dequantize the output from integer to floating-point
    float y = (y_quantized - output->params.zero_point) * output->params.scale;

    // Output the results. A custom HandleOutput function can be implemented
    // for each supported hardware target.
    HandleOutput(error_reporter, x, y);

    // Increment the inference_counter, and reset it if we have reached
    // the total number per cycle
    inference_count += 1;
    if (inference_count >= kInferencesPerCycle) inference_count = 0;
}
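To make the quantize/dequantize arithmetic concrete, here's a small worked example with hypothetical parameter values; the real scale and zero_point are read from input->params and output->params and will differ for your model:

// Hypothetical quantization parameters, for illustration only.
const float scale = 0.025f;   // stands in for input->params.scale
const int zero_point = -128;  // stands in for input->params.zero_point

float x = 1.5f;
int8_t x_quantized = static_cast<int8_t>(x / scale + zero_point);  // 60 - 128 = -68

// Dequantizing maps the int8 value back into the float range.
float x_recovered = (x_quantized - zero_point) * scale;            // (-68 + 128) * 0.025 = 1.5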