Chapter 13: TensorFlow Lite Micro

Comparing TensorFlow Versions

TensorFlow
  • Developed by Google, released in 2015.
  • Adds about 20 MB to a binary by default; can be reduced to around 2 MB by trimming features.

TensorFlow Lite
  • Started in 2017, designed for efficient inference on mobile devices.
  • Omits training; only runs inference.
  • Supports a limited set of data types and operations compared to TensorFlow.
  • Fits in a few hundred KB.
  • Supports 8-bit quantization and ships optimized kernel libraries.

TensorFlow Lite Micro
  • Started in 2018, targeting embedded platforms.
  • Focuses on binary size, aiming for 20 KB or less.
  • Removes dependencies such as the C Standard Library.
  • First application was wake-word detection (e.g., "Hey Google").

Requirements of TensorFlow Lite Micro:

  • No OS Dependencies: To accommodate platforms without an OS, the framework avoids references to files or devices.

  • No C/C++ Library Dependencies at Link Time: Binary size was crucial, so dependencies that would add size, such as sprintf(), were avoided. The C math library, needed for specific functions, is the one exception.

  • No Floating-Point Hardware Expected: The code leans towards 8-bit integer parameters due to some platforms lacking hardware support for floating-point arithmetic. However, it can support float operations when necessary.

  • No Dynamic Memory Allocation: Because of concerns like heap fragmentation, the framework avoids dynamic memory allocation. Instead, it works out of a fixed-size arena provided by the calling application (a short sketch follows this list).

  • Requires C++11: TensorFlow Lite Micro is written primarily in C++, adhering to the C++11 standard. This was chosen to allow code sharing with the rest of TensorFlow Lite, and because most popular embedded toolchains already support it.

  • Expects 32-bit Processors: The focus is on the increasingly common 32-bit embedded devices, maintaining consistency across mobile and embedded versions. While there are reports of 16-bit platform ports, they aren't the primary concern.
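
For illustration, here is a minimal sketch of how the fixed-size arena shows up in application code. The arena size is a made-up value, and the exact MicroInterpreter constructor arguments vary between TensorFlow Lite Micro versions, so treat this as a shape rather than a drop-in snippet:

#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// The application owns all the memory the framework will use: a fixed-size
// arena, sized by trial and error for the particular model (hypothetical size).
constexpr int kTensorArenaSize = 10 * 1024;
static uint8_t tensor_arena[kTensorArenaSize];

void SetupInterpreter(const tflite::Model* model,
                      tflite::MicroMutableOpResolver<4>& resolver) {
  // Every tensor is allocated out of the caller-provided arena; the framework
  // itself never calls malloc or new.
  static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                              kTensorArenaSize);
  interpreter.AllocateTensors();
}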


Why Is the Model Interpreted?

When deciding between interpreting models at runtime and generating code from a model beforehand, different benefits and challenges come into play.

Advantages of Code Generation:

  • Ease of Building: Integration into build systems is simplified. Users can include a handful of C or C++ files directly in any IDE and build the project smoothly.
  • Modifiability: A single implementation file is easier to debug and modify than a large library.
  • Inline Data: Model data is compiled into the source, eliminating the need for extra files and loading steps.
  • Code Size: Only the code that is actually needed is included, minimizing program size.

Drawbacks of Code Generation (vs. Model Interpretation):

  • Upgradability: Upgrading to a newer framework version after modifying generated code is cumbersome; it requires manually merging changes or regenerating the code and reapplying local modifications.
  • Multiple Models: Supporting several models at once is difficult without duplicating much of the source.
  • Replacing Models: Switching to a different model requires recompiling the whole program.

The team found a middle-ground solution called "project generation" to harness the benefits of code generation while avoiding its downsides.


Project Generation in TensorFlow Lite:

Definition: Project generation in TensorFlow Lite creates a copy of the necessary source files to build a specific model. It optionally sets up IDE-specific project files for easy building. Unlike code generation, it doesn't modify the source files.

Key Advantages:

  • Upgradability: The source files keep the original TensorFlow Lite structure, so local modifications and library upgrades can be reconciled with standard merge tools.
  • Multiple and Replacement Models: Thanks to the interpreter structure, multiple models can coexist, and swapping models is straightforward without a recompile.
  • Inline Data: Model parameters can be compiled directly into the program as a C data array. Because of the FlatBuffers serialization format, this data can be used in memory as-is, without any unpacking.
  • External Dependencies: All required files are gathered in one folder, eliminating the need to download or install separate dependencies.

One challenge, however, is code size. Due to the interpreter structure, it's harder to remove unused code paths. TensorFlow Lite addresses this with the OpResolver mechanism, allowing users to register only the essential kernel implementations.
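
To make the OpResolver idea concrete, here is a minimal sketch. The class name and templated operator count below follow recent TensorFlow Lite Micro releases and have changed over time, so treat this as a shape rather than exact API:

#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"

// Register only the kernels this particular model actually uses; every other
// operator implementation can then be stripped out, keeping the binary small.
static tflite::MicroMutableOpResolver<3> resolver;  // 3 = number of ops registered
void RegisterOps() {
  resolver.AddConv2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();
}

The resolver is later handed to the interpreter along with the model, so each operation in the graph is looked up against only this small registered set.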


Using TensorFlow Lite's Build System for Your Projects:

TensorFlow Lite, rooted in the Linux environment, uses familiar Unix tools like shell scripts, Make, and Python. However, the developers acknowledge the varied preferences of embedded developers and offer a versatile solution: the "project generation" mechanism. Here's a guide on how to navigate this system for your project.

Step-by-Step Guide for Setting Up Your Project:

  1. Start with the Source:
    • Fetch TensorFlow's source code from GitHub.
    • For foundational set-up on Linux, use:
make -f tensorflow/lite/micro/tools/make/Makefile test
  2. Pre-existing Projects:
    • If you're working with a provided example, like the speech wake-word for SparkFun Edge, compile using:
make -f tensorflow/lite/micro/tools/make/Makefile TARGET="sparkfun_edge" micro_speech_bin
  3. Setting Up a New Project:
    • Once you have your model's byte array ready for your new project, you need to create a corresponding directory structure and files for your project.
    • Use project generation to scaffold out this structure. For instance, to create a project compatible with the Mbed IDE:
make -f tensorflow/lite/micro/tools/make/Makefile TARGET="your_target_name" generate_your_project_name_mbed_project

This will create a directory with source files and dependencies tailored for the Mbed environment.

  4. For Windows and Other IDEs:

    • If you're on platforms other than Linux or use IDEs like Keil, Mbed, or Arduino, leverage project generation to obtain a compatible structure.
    • Similar commands (as shown above for Mbed) can be crafted for other platforms.
    • Additionally, there's a universal option that provides a basic folder structure of source files. It might not have specific project meta-information but does include a configuration file for Visual Studio Code.
  5. Keeping Your Project Updated:

    • Given the nature of TensorFlow Lite's build system, you might wonder about updates. The beauty of project generation is that it streamlines updates. Regularly, TensorFlow Lite's team auto-generates projects, making the results accessible on a public web server. Instead of manually pulling from GitHub, you can download the latest configurations for your IDE and merge changes as needed.

Specializing Code in TensorFlow Lite

In the TensorFlow Lite Micro framework, it is common to need platform-specific or optimized implementations of certain modules. To balance customization with maintainability and ease of integration back into the main framework, the following approach is used:

  1. Modular Design: The library is divided into small modules. Each module has a default C++ file (implementation) and an associated header file (interface).

  2. Custom Implementations: For platform-specific or optimized versions, the specialized version of the module is saved in a subfolder named after the target platform or feature, within the directory of the original module. This subfolder version is used over the default when building for that target.

  3. Example - Audio Provider:

    • The wake-word sample needs to capture audio, but no cross-platform method exists.
    • A default audio_provider.cpp returns a buffer filled with zeros (without using a microphone). This allows prototyping even if a microphone isn't yet functional.
    • For real devices, platform-specific implementations are required. E.g., for the STM32F746NG Discovery kit, a specialized audio_provider.cpp exists in a disco_f746ng subfolder.
    • When building for STM32F746NG, the build system uses this specialized version instead of the default.
    • Similarly, there are various specialized versions for different platforms, such as macOS.
  4. Extensibility and Optimization: Beyond portability, this approach is also used for optimizations. Default implementations in the library are simple and generic. For performance improvements, specialized and optimized versions can replace these defaults, allowing developers to incrementally improve and test code performance.

This structure ensures flexibility for developers while maintaining a consistent interface and ease of integration with the larger TensorFlow Lite ecosystem.
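
As a rough illustration of the audio-provider example above, the default implementation can be as small as a function that hands back a buffer of silence. The real function in the TensorFlow Lite Micro examples returns a TfLiteStatus and its exact signature has varied between releases, so the following is only a sketch:

#include <cstdint>

namespace {
// Hypothetical buffer size; the wake-word example uses its own constants.
constexpr int kMaxAudioSampleSize = 512;
int16_t g_dummy_audio_data[kMaxAudioSampleSize] = {0};
}  // namespace

// Default, platform-neutral audio provider: always returns zeros so the rest
// of the pipeline can be exercised before a microphone driver exists. Platform
// subfolders (e.g., disco_f746ng) replace this file with one that captures
// real samples from the hardware.
int GetAudioSamples(int start_ms, int duration_ms,
                    int* audio_samples_size, int16_t** audio_samples) {
  *audio_samples_size = kMaxAudioSampleSize;
  *audio_samples = g_dummy_audio_data;
  return 0;  // success
}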


TensorFlow Lite File Format

  1. Complexity: TensorFlow Lite's model storage format is intricate, but becomes manageable after understanding its basics.

  2. Neural Network Basics: Neural network models are graphs with operations having inputs and outputs. Inputs can include:

    • Learned values (weights)
    • Results from preceding operations
    • External input values (like image pixels or audio samples)

The model's final outputs often represent predictions for different categories.

  3. Transfer to Devices: Neural models are usually trained on desktops and need to be transferred to other devices like phones or microcontrollers.

    • Transfer is done using a converter that exports a trained TensorFlow model to a TensorFlow Lite file.
    • Conversion is challenging because:
      • A desktop-trained model might rely on features not available on simpler platforms.
      • Conversion requires transforming variable values like weights into constants.
      • There's a need to remove operations necessary only for training and apply optimizations.
  4. Large Number of Operations: There are over 800 operations in mainline TensorFlow, with more being added regularly. This makes writing a comprehensive converter a challenging task.

  5. Simplicity in TensorFlow Lite Files: The TensorFlow Lite conversion process aims to produce:

    • A simpler, clearer representation of the trained model.
    • Variables frozen into constant weights.
    • Common graph optimizations already applied.

    Due to its simplicity and clarity, using the TensorFlow Lite file format is recommended for accessing TensorFlow models for inference, even outside the context of microcontrollers.


Introduction to FlatBuffers

FlatBuffers is a highly efficient binary serialization library. It was originally created at Google for game development and other performance-critical applications, and it is the serialization format used by TensorFlow Lite and TensorFlow Lite Micro.

Unlike many serialization formats, FlatBuffers does not require a parsing or decoding step into a secondary representation: serialized data can be accessed directly in place, without unpacking. This makes it well suited to applications that need high performance, real-time processing, or tight memory budgets.

Tutorial: Using FlatBuffers with C++

1. Setting up FlatBuffers

First, you need to download and install the FlatBuffers compiler (flatc) and C++ headers. You can usually do this via package managers or directly from the FlatBuffers GitHub repository.

2. Define your data schema

FlatBuffers uses a schema to understand the structure of the data you wish to serialize. This schema is written in a file with a .fbs extension.

For our tutorial, let's define a simple schema for a user:

namespace Tutorial;

table User {
  id: int;
  name: string;
  email: string;
}

root_type User;

Save this schema into a file named user.fbs.

3. Compile the schema

Using the flatc compiler, we can generate the C++ headers for our schema.

flatc --cpp user.fbs

This will produce a file named user_generated.h.

4. Serialize data using FlatBuffer in C++

Now, let's write a simple C++ program to serialize some user data:

#include "user_generated.h"
#include <iostream>
#include <flatbuffers/flatbuffers.h>

int main() {
    flatbuffers::FlatBufferBuilder builder;

    // Child objects such as strings must be created before the table
    // builder that references them.
    auto name = builder.CreateString("John Doe");
    auto email = builder.CreateString("john.doe@example.com");

    // Build the User table itself.
    Tutorial::UserBuilder userBuilder(builder);
    userBuilder.add_id(1);
    userBuilder.add_name(name);
    userBuilder.add_email(email);
    auto user = userBuilder.Finish();

    // Mark the User table as the root of the finished buffer.
    builder.Finish(user);

    // Get a pointer to the serialized data and its size.
    uint8_t *buf = builder.GetBufferPointer();
    int size = builder.GetSize();

    std::cout << "Serialized size: " << size << " bytes." << std::endl;

    return 0;
}

5. Deserialize data

Here's how you can deserialize the user data from the buffer:

const Tutorial::User* user = flatbuffers::GetRoot<Tutorial::User>(buf);
std::cout << "User ID: " << user->id() << std::endl;
std::cout << "Name: " << user->name()->c_str() << std::endl;
std::cout << "Email: " << user->email()->c_str() << std::endl;

The core FlatBuffers C++ runtime is header-only, so for this example you only need the FlatBuffers headers on your include path along with the generated user_generated.h. That's it! You've now learned the basics of using FlatBuffers with C++ for serialization and deserialization. With this foundation, you can explore more advanced features and optimize your applications for performance.


FlatBuffers in TensorFlow Lite Micro

Due to the limited resources available on devices like microcontrollers, it's crucial to have efficient ways to represent models and data. FlatBuffers plays a critical role here. Here's how TensorFlow Lite Micro (TFLM) uses FlatBuffers:

1. Model Serialization

When you convert a TensorFlow model to the TensorFlow Lite format using the TensorFlow Lite converter, the resulting model is serialized into a FlatBuffer format. This format represents the model's structure, metadata, and the weights/biases in a compact binary form.
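
As an illustration of what this looks like at runtime, the flatc-generated schema code exposes an accessor that maps the raw bytes as a tflite::Model. Header paths and the g_model_data symbol below are assumptions for the sketch; real projects typically embed the .tflite bytes as a C array (for example via xxd -i):

#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/version.h"

// The serialized .tflite file, compiled into the binary and usually placed in
// read-only memory (hypothetical symbol name).
extern const unsigned char g_model_data[];

const tflite::Model* LoadModel() {
  // No copying or unpacking: the Model pointer is a view over the raw bytes.
  const tflite::Model* model = tflite::GetModel(g_model_data);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    // The converter that produced the model and this runtime disagree on the
    // schema version; bail out rather than misread the buffer.
    return nullptr;
  }
  return model;
}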

2. Memory Efficiency

Using FlatBuffers allows TFLM to read directly from this serialized data without any deserialization overhead. Given that deserialization often requires additional memory, this direct access is a boon for devices with tight memory constraints.

3. Immutable Data

FlatBuffers creates immutable data structures. This is ideal for TFLM because models are typically stored in the read-only memory (ROM) of microcontrollers. The fact that these data structures can't be modified at runtime ensures the integrity of the model.

4. Schemas for Flexibility

The use of FlatBuffer schemas (.fbs files) in TensorFlow Lite provides a clear contract of what the serialized model should look like. If there are updates or changes to the model structure, the schema can be versioned, providing backward compatibility and forward flexibility.

5. Size Optimization

Given the resource constraints of microcontrollers, it's essential to minimize model sizes. FlatBuffers allows the TFLM runtime to work directly on the serialized binary data without unpacking, thus minimizing the runtime footprint.