Chapter 5: Building an Application

1. Testing the "Hello World" Example on Local Machine

Clone the tflite-micro repository and run the following commands to test the "Hello World" example on your local machine:
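First, the clone step (the repository is hosted on GitHub under tensorflow/tflite-micro):

git clone https://github.com/tensorflow/tflite-micro.git
cd tflite-micro

Then, from the repository root, run the test: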

make -f tensorflow/lite/micro/tools/make/Makefile test_hello_world_test

This builds and runs the hello_world_test binary in the terminal. You should see a lot of output, but the last few lines should look like this:

tensorflow/lite/micro/tools/make/test_latency_log.sh hello_world_test  gen/osx_arm64_default/bin/hello_world_test '~~~ALL TESTS PASSED~~~' osx

"Unique Tag","Total ticks across all events with that tag."
FULLY_CONNECTED, 20
total number of ticks, 20

[RecordingMicroAllocator] Arena allocation total 2344 bytes
[RecordingMicroAllocator] Arena allocation head 128 bytes
[RecordingMicroAllocator] Arena allocation tail 2216 bytes
[RecordingMicroAllocator] 'TfLiteEvalTensor data' used 240 bytes with alignment overhead (requested 240 bytes for 10 allocations)
[RecordingMicroAllocator] 'Persistent TfLiteTensor data' used 128 bytes with alignment overhead (requested 128 bytes for 2 tensors)
[RecordingMicroAllocator] 'Persistent buffer data' used 1152 bytes with alignment overhead (requested 1100 bytes for 7 allocations)
[RecordingMicroAllocator] 'NodeAndRegistration struct' used 192 bytes with alignment overhead (requested 192 bytes for 3 NodeAndRegistration structs)
~~~ALL TESTS PASSED~~~

Running hello_world_test took 0.302 seconds

Overview: What's Happening Here?

In this test, we're running a simple TensorFlow Lite Micro application on your local machine to ensure everything is set up correctly. This "Hello World" example serves as a quick sanity check and a way to familiarize yourself with the framework's workflow. The application uses several components to get the job done:

  1. Model: A pre-trained TFLite model that encodes the machine learning algorithm you'll be running.

  2. Op Resolver: This resolves the operations (ops) in the model, linking them to their code implementations.

  3. Arena: A fixed-size memory block allocated for the model's tensors, helping to manage resource-constrained environments.

  4. Interpreter: This is the engine that uses the model, resolver, and arena to execute the machine learning algorithm.

Let's take a look at the code in hello_world_test.cc.

You can see that we define three tests and run them in the main() function. The first test profiles memory and latency, the second checks accuracy with the float model, and the third checks accuracy with the int8-quantized model. Let's take a look at each of them.
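Before that, here is the main() function that drives them. It looks roughly like this (lightly paraphrased from the example, so details may differ from the exact source):

int main(int argc, char* argv[]) {
    tflite::InitializeTarget();  // Platform setup, declared in system_setup.h
    TF_LITE_ENSURE_STATUS(ProfileMemoryAndLatency());
    TF_LITE_ENSURE_STATUS(LoadFloatModelAndPerformInference());
    TF_LITE_ENSURE_STATUS(LoadQuantModelAndPerformInference());
    MicroPrintf("~~~ALL TESTS PASSED~~~\n");
    return kTfLiteOk;
}

Note that "~~~ALL TESTS PASSED~~~" is exactly the string the test script greps for in the output shown earlier.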

1.1 Including Header Files

Before implementing the tests, we need to include the necessary header files. The file starts with the following includes (collected into a single snippet after this list), all licensed under the Apache License 2.0:

  1. tensorflow/lite/core/c/common.h: This is the main header for common definitions and APIs in TensorFlow Lite. It contains data structures and constants used throughout the framework, e.g., TF_LITE_ENSURE_STATUS, TfLiteStatus, and TfLiteTensor.

  2. tensorflow/lite/micro/examples/hello_world/models/hello_world_float_model_data.h: This file contains the floating-point representation of a "Hello World" model that has been trained and converted to C array format for easy inclusion in a microcontroller project.

  3. tensorflow/lite/micro/examples/hello_world/models/hello_world_int8_model_data.h: Similar to the float version, this one contains the integer (int8) version of the "Hello World" model.

  4. tensorflow/lite/micro/micro_interpreter.h: This file includes the MicroInterpreter class, which is the object that runs inference on a TFLite Micro model. It performs the operations defined in the model using the inputs and outputs that you provide.

  5. tensorflow/lite/micro/micro_log.h: Contains logging utilities specific to TensorFlow Lite Micro. These are usually simpler and more constrained compared to full-scale logging frameworks, to fit the needs of microcontroller environments.

  6. tensorflow/lite/micro/micro_mutable_op_resolver.h: This file helps in resolving the operators (ops) that your model will use. In TensorFlow Lite Micro, you need to explicitly tell the interpreter which ops your model will be using to save space.

  7. tensorflow/lite/micro/micro_profiler.h: Contains utilities for profiling the performance of your model on a microcontroller. It's useful for identifying bottlenecks and ensuring that the model runs efficiently.

  8. tensorflow/lite/micro/recording_micro_interpreter.h: This is an extension of the MicroInterpreter that adds functionality for recording various runtime metrics like memory usage, which can help in debugging and optimization.

  9. tensorflow/lite/micro/system_setup.h: Contains setup functions for the system to get it ready to run TensorFlow Lite for Microcontrollers.

  10. tensorflow/lite/schema/schema_generated.h: This file is generated from the TFLite schema and includes the FlatBuffers schema for TFLite files. It contains the object representations and utility functions for reading TFLite model files.
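Put together, the include section at the top of hello_world_test.cc reads:

#include "tensorflow/lite/core/c/common.h"
#include "tensorflow/lite/micro/examples/hello_world/models/hello_world_float_model_data.h"
#include "tensorflow/lite/micro/examples/hello_world/models/hello_world_int8_model_data.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_log.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/micro_profiler.h"
#include "tensorflow/lite/micro/recording_micro_interpreter.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"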

1.2 Defining RegisterOps

namespace
{
    using HelloWorldOpResolver = tflite::MicroMutableOpResolver<1>;

    TfLiteStatus RegisterOps(HelloWorldOpResolver& op_resolver)
    {
        TF_LITE_ENSURE_STATUS(op_resolver.AddFullyConnected());
        return kTfLiteOk;
    }
}
  1. The anonymous namespace ensures that HelloWorldOpResolver and RegisterOps are only visible within the translation unit where they are defined. This avoids name clashes if the same names are used in other parts of the project.

  2. using HelloWorldOpResolver = tflite::MicroMutableOpResolver<1>;: This line is creating a type alias named HelloWorldOpResolver that is equivalent to tflite::MicroMutableOpResolver<1>. The MicroMutableOpResolver class is templated to specify the maximum number of ops it can resolve; here, it's set to 1.

  3. TfLiteStatus RegisterOps(HelloWorldOpResolver& op_resolver): This function takes a reference to an object of type HelloWorldOpResolver and attempts to register the FullyConnected operator with it.

  4. TF_LITE_ENSURE_STATUS(op_resolver.AddFullyConnected());: This macro line adds the FullyConnected operator to op_resolver and checks if the operation was successful. If it wasn't, the function returns an error status. Because each HelloWorldOpResolver can only resolve one op, the op_resolver is full after this function is called.

  5. return kTfLiteOk;: If everything goes as planned, the function returns kTfLiteOk, indicating success.

In TensorFlow Lite Micro, "resolving" an operation means mapping the operation used in a machine learning model to the actual code that will be executed when the model is run. Before the model can run, the interpreter uses the resolver to map the ops in the model to their actual code implementations. Essentially, the resolver tells the interpreter, "For this op, run this piece of code."

1.3 Test 1: Profiling Memory and Latency

TfLiteStatus ProfileMemoryAndLatency()
{
    tflite::MicroProfiler profiler;
    HelloWorldOpResolver op_resolver;
    TF_LITE_ENSURE_STATUS(RegisterOps(op_resolver));

    // Arena size just a round number. The exact arena usage can be determined
    // using the RecordingMicroInterpreter.
    constexpr int kTensorArenaSize = 3000;
    uint8_t tensor_arena[kTensorArenaSize];
    constexpr int kNumResourceVariables = 24;

    tflite::RecordingMicroAllocator* allocator(
        tflite::RecordingMicroAllocator::Create(tensor_arena, kTensorArenaSize));
    tflite::RecordingMicroInterpreter interpreter(
        tflite::GetModel(g_hello_world_float_model_data), op_resolver, allocator,
        tflite::MicroResourceVariables::Create(allocator, kNumResourceVariables),
        &profiler);

    TF_LITE_ENSURE_STATUS(interpreter.AllocateTensors());
    TFLITE_CHECK_EQ(interpreter.inputs_size(), 1);
    interpreter.input(0)->data.f[0] = 1.f;
    TF_LITE_ENSURE_STATUS(interpreter.Invoke());

    MicroPrintf("");  // Print an empty new line
    profiler.LogTicksPerTagCsv();

    MicroPrintf("");  // Print an empty new line
    interpreter.GetMicroAllocator().PrintAllocations();
    return kTfLiteOk;
}

The call to profiler.LogTicksPerTagCsv() prints the profiling info in CSV format. Arranged as a table, the output looks like this:

Unique Tag              Total ticks across all events with that tag
FULLY_CONNECTED         20
total number of ticks   20

This means that the FullyConnected op took 20 ticks to run. The total number of ticks is also 20, which means the FullyConnected op was the only op that ran. (A tick is a platform-dependent time unit, so the absolute value is only meaningful when comparing runs on the same platform.)

The call to interpreter.GetMicroAllocator().PrintAllocations() prints the memory usage info. The output looks like this:

[RecordingMicroAllocator] Arena allocation total 2344 bytes
[RecordingMicroAllocator] Arena allocation head 128 bytes
[RecordingMicroAllocator] Arena allocation tail 2216 bytes
[RecordingMicroAllocator] 'TfLiteEvalTensor data' used 240 bytes with alignment overhead (requested 240 bytes for 10 allocations)
[RecordingMicroAllocator] 'Persistent TfLiteTensor data' used 128 bytes with alignment overhead (requested 128 bytes for 2 tensors)
[RecordingMicroAllocator] 'Persistent buffer data' used 1152 bytes with alignment overhead (requested 1100 bytes for 7 allocations)
[RecordingMicroAllocator] 'NodeAndRegistration struct' used 192 bytes with alignment overhead (requested 192 bytes for 3 NodeAndRegistration structs)
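The head/tail split reflects how the arena is filled from both ends: non-persistent (scratch) buffers at the head, persistent allocations at the tail. You can also query the footprint programmatically. A small sketch, assuming the interpreter from the listing above (arena_used_bytes() is a MicroInterpreter method; treat its availability as an assumption to verify against your TFLM version):

// After AllocateTensors(), the interpreter knows its true arena footprint,
// which is handy for right-sizing kTensorArenaSize once the model is stable.
MicroPrintf("Arena used: %d bytes",
            static_cast<int>(interpreter.arena_used_bytes()));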

ProfileMemoryAndLatency() returns a TfLiteStatus, which can either be kTfLiteOk or an error status. TfLiteStatus is defined in tensorflow/lite/core/c/c_api_types.h:

typedef enum TfLiteStatus {
  /// Success
  kTfLiteOk = 0,

  /// Generally referring to an error in the runtime (i.e. interpreter)
  kTfLiteError = 1,

  /// Generally referring to an error from a TfLiteDelegate itself.
  kTfLiteDelegateError = 2,

  /// Generally referring to an error in applying a delegate due to
  /// incompatibility between runtime and delegate, e.g., this error is returned
  /// when trying to apply a TF Lite delegate onto a model graph that's already
  /// immutable.
  kTfLiteApplicationError = 3,

  /// Generally referring to serialized delegate data not being found.
  /// See tflite::delegates::Serialization.
  kTfLiteDelegateDataNotFound = 4,

  /// Generally referring to data-writing issues in delegate serialization.
  /// See tflite::delegates::Serialization.
  kTfLiteDelegateDataWriteError = 5,

  /// Generally referring to data-reading issues in delegate serialization.
  /// See tflite::delegates::Serialization.
  kTfLiteDelegateDataReadError = 6,

  /// Generally referring to issues when the TF Lite model has ops that cannot
  /// be resolved at runtime. This could happen when the specific op is not
  /// registered or built with the TF Lite framework.
  kTfLiteUnresolvedOps = 7,

  /// Generally referring to invocation cancelled by the user.
  /// See `interpreter::Cancel`.
  // TODO(b/194915839): Implement `interpreter::Cancel`.
  // TODO(b/250636993): Cancellation triggered by `SetCancellationFunction`
  // should also return this status code.
  kTfLiteCancelled = 8,
} TfLiteStatus;
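The TF_LITE_ENSURE_STATUS macro used throughout these tests builds on this enum: it evaluates its argument and returns early from the enclosing function whenever the status is not kTfLiteOk. A simplified sketch of the pattern (the real macro lives in common.h and may differ in detail):

// Evaluate expr once; bail out of the enclosing function on any error status.
#define ENSURE_STATUS_SKETCH(expr)                 \
    do                                             \
    {                                              \
        const TfLiteStatus ensure_status = (expr); \
        if (ensure_status != kTfLiteOk)            \
        {                                          \
            return ensure_status;                  \
        }                                          \
    } while (false)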

1.4 Test 2: Accuracy

Again, we will first create an interpreter, allocate tensors, and invoke the model. Then, we will check if the predicted output is within a small range of the expected output.

TfLiteStatus LoadFloatModelAndPerformInference() {
    const tflite::Model* model =
        ::tflite::GetModel(g_hello_world_float_model_data);
    TFLITE_CHECK_EQ(model->version(), TFLITE_SCHEMA_VERSION);

    HelloWorldOpResolver op_resolver;
    TF_LITE_ENSURE_STATUS(RegisterOps(op_resolver));

    // Arena size just a round number. The exact arena usage can be determined
    // using the RecordingMicroInterpreter.
    constexpr int kTensorArenaSize = 3000;
    uint8_t tensor_arena[kTensorArenaSize];

    tflite::MicroInterpreter interpreter(model, op_resolver, tensor_arena,
                                        kTensorArenaSize);
    TF_LITE_ENSURE_STATUS(interpreter.AllocateTensors());

    // Check if the predicted output is within a small range of the
    // expected output
    float epsilon = 0.05f;
    constexpr int kNumTestValues = 4;
    float golden_inputs[kNumTestValues] = {0.f, 1.f, 3.f, 5.f};

    for (int i = 0; i < kNumTestValues; ++i) {
        interpreter.input(0)->data.f[0] = golden_inputs[i];
        TF_LITE_ENSURE_STATUS(interpreter.Invoke());
        float y_pred = interpreter.output(0)->data.f[0];
        TFLITE_CHECK_LE(abs(sin(golden_inputs[i]) - y_pred), epsilon);
    }

    return kTfLiteOk;
}
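To make the tolerance concrete: for the golden input 1.f, sin(1.0) ≈ 0.8415, so the TFLITE_CHECK_LE assertion only passes if the model's prediction lies within [0.7915, 0.8915] (epsilon = 0.05).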

1.5 Test 3: Accuracy with Integer Quantization

This test is similar to the previous one, except that we're using the integer (int8) version of the model.

TfLiteStatus LoadQuantModelAndPerformInference() {
    // Map the model into a usable data structure. This doesn't involve any
    // copying or parsing, it's a very lightweight operation.
    const tflite::Model* model =
        ::tflite::GetModel(g_hello_world_int8_model_data);
    TFLITE_CHECK_EQ(model->version(), TFLITE_SCHEMA_VERSION);

    HelloWorldOpResolver op_resolver;
    TF_LITE_ENSURE_STATUS(RegisterOps(op_resolver));

    // Arena size just a round number. The exact arena usage can be determined
    // using the RecordingMicroInterpreter.
    constexpr int kTensorArenaSize = 3000;
    uint8_t tensor_arena[kTensorArenaSize];

    tflite::MicroInterpreter interpreter(model, op_resolver, tensor_arena,
                                        kTensorArenaSize);

    TF_LITE_ENSURE_STATUS(interpreter.AllocateTensors());

    TfLiteTensor* input = interpreter.input(0);
    TFLITE_CHECK_NE(input, nullptr);

    TfLiteTensor* output = interpreter.output(0);
    TFLITE_CHECK_NE(output, nullptr);

    float output_scale = output->params.scale;
    int output_zero_point = output->params.zero_point;

    // Check if the predicted output is within a small range of the
    // expected output
    float epsilon = 0.05;

    constexpr int kNumTestValues = 4;
    float golden_inputs_float[kNumTestValues] = {0.77, 1.57, 2.3, 3.14};

    // The int8 values are calculated using the quantization formula:
    // round(golden_inputs_float[i] / input->params.scale) + input->params.zero_point
    int8_t golden_inputs_int8[kNumTestValues] = {-96, -63, -34, 0};

    for (int i = 0; i < kNumTestValues; ++i) {
        input->data.int8[0] = golden_inputs_int8[i];
        TF_LITE_ENSURE_STATUS(interpreter.Invoke());
        float y_pred = (output->data.int8[0] - output_zero_point) * output_scale;
        TFLITE_CHECK_LE(abs(sin(golden_inputs_float[i]) - y_pred), epsilon);
    }

    return kTfLiteOk;
}

Note that to convert the output from int8 to float, we need to use the scale and zero point values from the output tensor. The formula is:

    float_value = (int8_value - zero_point) * scale

where zero_point is the zero point and scale is the scale. Those two values are stored in the params field of the output tensor, which is why the code reads output->params.scale and output->params.zero_point.
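As a worked example with illustrative quantization parameters (assumptions for the sake of arithmetic, not values read from the actual model): suppose the output tensor has scale 0.00784 and zero point 0. If an inference writes the byte 89 into output->data.int8[0], the dequantized prediction is (89 - 0) * 0.00784 ≈ 0.698, well within epsilon of sin(0.77) ≈ 0.696. The same formula in reverse explains the golden inputs: with an input scale of about 0.0246 and a zero point of -127, round(0.77 / 0.0246) + (-127) = 31 - 127 = -96, matching golden_inputs_int8[0].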