Chapter 9: Person Detection

Application Structure

When developing an application for person detection, the structure can be broken down into three main steps:

1. Image Capture

The first step is to capture an image from the camera module. This is the raw data that will be fed into the machine learning model. Depending on your setup, you may capture it as a raw bitmap, a JPEG, or another format, as long as it can be decoded into the pixel array the model expects.

2. Model Inference

Next, the captured image is fed directly into the pre-trained person detection model. The model processes the image and outputs its inference, essentially determining whether or not there is a person in the image. This step can be computationally intensive, depending on the complexity of the model.

3. LED Feedback

Based on the model's inference, the application then lights up specific LEDs. For instance, a green LED could be lit if a person is detected, and a red one if no person is found in the image.
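
In code, the whole application is essentially these three steps repeated in a loop. A minimal sketch follows; CaptureImage and SetLeds are hypothetical helpers rather than part of the actual example, and interpreter, input, and kPersonIndex/kNotAPersonIndex are set up as shown later in this chapter:

// Hypothetical main loop tying the three steps together.
void loop() {
    CaptureImage(input->data.int8);                   // 1. Image capture into the input tensor
    interpreter->Invoke();                            // 2. Model inference
    TfLiteTensor* output = interpreter->output(0);
    int8_t person_score = output->data.int8[kPersonIndex];
    int8_t no_person_score = output->data.int8[kNotAPersonIndex];
    SetLeds(person_score > no_person_score);          // 3. LED feedback
}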

Simplicity Compared to Wake-Word Detection

When compared to wake-word detection (common in voice-activated systems), person detection has certain advantages in terms of simplicity:

Minimal Preprocessing

In wake-word detection, audio signals often require substantial preprocessing to be used effectively by the model. In contrast, person detection is more straightforward. Since the sensor's output is already an image—a format that can be directly used by the model—there is very little preprocessing needed.
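
As a rough sketch of how little is involved (camera_frame is a hypothetical buffer of grayscale bytes; kNumRows and kNumCols come from model_settings.h):

// Each pixel only needs an offset into the signed range the quantized
// model expects; there is no spectrogram or feature extraction step.
for (int i = 0; i < kNumRows * kNumCols; ++i) {
    input->data.int8[i] = static_cast<int8_t>(camera_frame[i] - 128);
}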

No Need for Averaging

Wake-word detection models usually average multiple inferences over a short window to improve accuracy. With person detection this is generally not necessary: the models are heavier and each inference takes longer to run, so averaging several runs to reach a single decision would be impractical.


Test the Model on Local Machine

The following test code runs the model on your local development machine, before any hardware is involved.

Includes and Initial Setup

#include "tensorflow/lite/c/common.h"
#include "tensorflow/lite/micro/examples/person_detection/model_settings.h"
...
uint8_t tensor_arena[tensor_arena_size];

This part includes the necessary headers and declares the tensor_arena, the block of memory from which the model's tensors will be allocated. The size of the arena depends on the platform (XTENSA and VISION_P6 in this case).
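
The elided lines choose the arena size with a platform check, roughly like the sketch below (the byte counts are illustrative; see the test source for the real values):

// A larger arena for the Xtensa Vision P6 build, a smaller default elsewhere.
#if defined(XTENSA) && defined(VISION_P6)
constexpr int tensor_arena_size = 352 * 1024;
#else
constexpr int tensor_arena_size = 136 * 1024;
#endif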

Test Initialization

TF_LITE_MICRO_TESTS_BEGIN
TF_LITE_MICRO_TEST(TestInvoke) {

It starts a group of micro tests and then defines a specific test called TestInvoke.

Model Loading

const tflite::Model* model = ::tflite::GetModel(g_person_detect_model_data);

Loads the model into a lightweight data structure for later use.

Schema Version Check

if (model->version() != TFLITE_SCHEMA_VERSION) {
    MicroPrintf("Model provided is schema version %d not equal to supported version %d.\n", model->version(), TFLITE_SCHEMA_VERSION);
}

Checks if the model's schema version matches the supported version.

Operation Resolver

tflite::MicroMutableOpResolver<5> micro_op_resolver;
micro_op_resolver.AddAveragePool2D(tflite::Register_AVERAGE_POOL_2D_INT8());
...

Defines which operations the interpreter should be able to execute. In this case, it adds support for five different types of operations.
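
The elided lines register the remaining four operations. They look roughly like this (the INT8-specific registration variants are an assumption based on the test; the Arduino sketch later in this chapter uses the plain Add* calls with no arguments):

micro_op_resolver.AddConv2D(tflite::Register_CONV_2D_INT8());
micro_op_resolver.AddDepthwiseConv2D(tflite::Register_DEPTHWISE_CONV_2D_INT8());
micro_op_resolver.AddReshape();
micro_op_resolver.AddSoftmax(tflite::Register_SOFTMAX_INT8());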

Interpreter Creation

tflite::MicroInterpreter interpreter(model, micro_op_resolver, tensor_arena, tensor_arena_size);
interpreter.AllocateTensors();

Creates a MicroInterpreter object and allocates memory for the tensors.

Input Validation

TfLiteTensor* input = interpreter.input(0);
TF_LITE_MICRO_EXPECT(input != nullptr);
...

Grabs the model's input tensor and verifies its dimensions and data type.
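
The elided assertions check that the input is a four-dimensional tensor of shape [1, kNumRows, kNumCols, kNumChannels] holding int8 data, along these lines (a close approximation of the test, not a verbatim copy):

TF_LITE_MICRO_EXPECT_EQ(4, input->dims->size);
TF_LITE_MICRO_EXPECT_EQ(1, input->dims->data[0]);
TF_LITE_MICRO_EXPECT_EQ(kNumRows, input->dims->data[1]);
TF_LITE_MICRO_EXPECT_EQ(kNumCols, input->dims->data[2]);
TF_LITE_MICRO_EXPECT_EQ(kNumChannels, input->dims->data[3]);
TF_LITE_MICRO_EXPECT_EQ(kTfLiteInt8, input->type);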

Test Data Loading

memcpy(input->data.int8, g_person_image_data, input->bytes);

Copies an image containing a person into the input tensor.

Model Invocation

TfLiteStatus invoke_status = interpreter.Invoke();
...

Invokes the model and checks the return status.
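
The elided line simply asserts that the call succeeded:

// Fail the test if the interpreter could not run the model.
TF_LITE_MICRO_EXPECT_EQ(kTfLiteOk, invoke_status);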

Output Validation

TfLiteTensor* output = interpreter.output(0);
...
TF_LITE_MICRO_EXPECT_GT(person_score, no_person_score);

Fetches the output tensor and checks if the output is as expected. Specifically, it ensures that the person score is higher than the no-person score for a person image, and vice versa for a no-person image.
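
The elided lines pull the two class scores out of the output tensor, roughly as follows (kPersonIndex and kNotAPersonIndex are defined in model_settings.h):

int8_t person_score = output->data.int8[kPersonIndex];
int8_t no_person_score = output->data.int8[kNotAPersonIndex];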

Test End

TF_LITE_MICRO_TESTS_END

Marks the end of the test suite.


Deploy the Model on Arduino

Run the person detection example from the Arduino IDE. The code is located at File > Examples > Harvard_TinyMLx > person_detection.

Again, the Arduino sketch follows the same structure as the other examples:

Include TensorFlow Lite Micro and other helper files

#include <TensorFlowLite.h>
#include "main_functions.h"
#include "detection_responder.h"
...

Declaring global variables

These will later be initialized in the setup() function.

namespace {
    tflite::ErrorReporter* error_reporter = nullptr;
    const tflite::Model* model = nullptr;
    tflite::MicroInterpreter* interpreter = nullptr;
    TfLiteTensor* input = nullptr;

    constexpr int kTensorArenaSize = 136 * 1024;
    static uint8_t tensor_arena[kTensorArenaSize];
}

Setup function

void setup() {
    // NOLINTNEXTLINE(runtime-global-variables)
    static tflite::MicroErrorReporter micro_error_reporter;
    error_reporter = &micro_error_reporter;

    // Map the model into a usable data structure. This doesn't involve any
    // copying or parsing, it's a very lightweight operation.
    model = tflite::GetModel(g_person_detect_model_data);
    if (model->version() != TFLITE_SCHEMA_VERSION) {
        TF_LITE_REPORT_ERROR(error_reporter,
                            "Model provided is schema version %d not equal "
                            "to supported version %d.",
                            model->version(), TFLITE_SCHEMA_VERSION);
        return;
    }

    // Pull in only the operation implementations we need.
    // This relies on a complete list of all the ops needed by this graph.
    // An easier approach is to just use the AllOpsResolver, but this will
    // incur some penalty in code space for op implementations that are not
    // needed by this graph.
    //
    // tflite::AllOpsResolver resolver;
    // NOLINTNEXTLINE(runtime-global-variables)
    static tflite::MicroMutableOpResolver<5> micro_op_resolver;
    micro_op_resolver.AddAveragePool2D();
    micro_op_resolver.AddConv2D();
    micro_op_resolver.AddDepthwiseConv2D();
    micro_op_resolver.AddReshape();
    micro_op_resolver.AddSoftmax();

    // Build an interpreter to run the model with.
    // NOLINTNEXTLINE(runtime-global-variables)
    static tflite::MicroInterpreter static_interpreter(
        model, micro_op_resolver, tensor_arena, kTensorArenaSize, error_reporter);
    interpreter = &static_interpreter;

    // Allocate memory from the tensor_arena for the model's tensors.
    TfLiteStatus allocate_status = interpreter->AllocateTensors();
    if (allocate_status != kTfLiteOk) {
        TF_LITE_REPORT_ERROR(error_reporter, "AllocateTensors() failed");
        return;
    }

    // Get information about the memory area to use for the model's input.
    input = interpreter->input(0);
}

Loop function

void loop() {
    // Get image from provider.
    if (kTfLiteOk != GetImage(error_reporter, kNumCols, kNumRows, kNumChannels, input->data.int8)) {
        TF_LITE_REPORT_ERROR(error_reporter, "Image capture failed.");
    }

    // Run the model on this input and make sure it succeeds.
    if (kTfLiteOk != interpreter->Invoke()) {
        TF_LITE_REPORT_ERROR(error_reporter, "Invoke failed.");
    }

    TfLiteTensor* output = interpreter->output(0);

    // Process the inference results.
    int8_t person_score = output->data.int8[kPersonIndex];
    int8_t no_person_score = output->data.int8[kNotAPersonIndex];
    RespondToDetection(error_reporter, person_score, no_person_score);
}
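
RespondToDetection is implemented in the board-specific detection responder (arduino_detection_responder.cpp). The sketch below approximates what it does on the Arduino Nano 33 BLE Sense, whose onboard RGB LED pins (LEDR, LEDG, LEDB) are active-low; treat it as a close approximation rather than a verbatim copy:

void RespondToDetection(tflite::ErrorReporter* error_reporter,
                        int8_t person_score, int8_t no_person_score) {
    static bool is_initialized = false;
    if (!is_initialized) {
        pinMode(LEDR, OUTPUT);
        pinMode(LEDG, OUTPUT);
        pinMode(LEDB, OUTPUT);
        is_initialized = true;
    }

    // Turn the red and green LEDs off, then flash blue to signal that an
    // inference has completed (the LEDs are active-low).
    digitalWrite(LEDR, HIGH);
    digitalWrite(LEDG, HIGH);
    digitalWrite(LEDB, LOW);
    delay(100);
    digitalWrite(LEDB, HIGH);

    // Green for "person detected", red for "no person".
    if (person_score > no_person_score) {
        digitalWrite(LEDG, LOW);
    } else {
        digitalWrite(LEDR, LOW);
    }

    TF_LITE_REPORT_ERROR(error_reporter, "Person score: %d No person score: %d",
                         person_score, no_person_score);
}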

Get Image from Camera

The GetImage function is used to capture an image from the camera module. It is located in arduino_image_provider.cpp.

#include <TinyMLShield.h> // Camera library

TfLiteStatus GetImage(tflite::ErrorReporter* error_reporter, int image_width,
                      int image_height, int channels, int8_t* image_data) {
    byte data[176 * 144]; // Receiving QCIF grayscale from camera = 176 * 144 * 1

    static bool g_is_camera_initialized = false;
    static bool serial_is_initialized = false;

    // Initialize camera if necessary
    if (!g_is_camera_initialized) {
        if (!Camera.begin(QCIF, GRAYSCALE, 5, OV7675)) {
            TF_LITE_REPORT_ERROR(error_reporter, "Failed to initialize camera!");
            return kTfLiteError;
        }
        g_is_camera_initialized = true;
    }

    // Read camera data
    Camera.readFrame(data);

    int min_x = (176 - 96) / 2;
    int min_y = (144 - 96) / 2;
    int index = 0;

    // Crop to a 96x96 image. This reduces the field of view; ideally we would downsample instead, but cropping is simpler.
    for (int y = min_y; y < min_y + 96; y++) {
        for (int x = min_x; x < min_x + 96; x++) {
            image_data[index++] = static_cast<int8_t>(data[(y * 176) + x] - 128); // convert TF input image to signed 8-bit
        }
    }

    return kTfLiteOk;
}


TensorFlow Lite Micro kernels now use signed int8_t values for quantized models, so input images must be converted from unsigned to signed format. This is why the image data is shifted to signed 8-bit inside the nested for loop above.
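
As a quick worked example of the conversion (the pixel value is arbitrary):

// A grayscale pixel of 200 (unsigned, range 0..255) maps to 200 - 128 = 72
// (signed, range -128..127).
uint8_t pixel = 200;
int8_t quantized = static_cast<int8_t>(pixel - 128);  // 72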