Chapter 9: Person Detection
Application Structure
When developing an application for person detection, the structure can be broken down into three main steps:
1. Image Capture
The first step is to capture an image from the camera module. This is the raw data that will be fed into the machine learning model. Depending on your setup, you may capture this as a bitmap, JPEG, or in another image format that the model can interpret.
2. Model Inference
Next, the captured image is fed directly into the pre-trained person detection model. The model processes the image and outputs its inference: essentially, whether or not there is a person in the image. This step can be computationally intensive, depending on the complexity of the model.
3. LED Feedback
Based on the model's inference, the application then lights up specific LEDs. For instance, a green LED could be lit if a person is detected, and a red one if no person is found in the image.
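For step 3, a minimal sketch of the LED feedback might look like the following. The pin numbers and helper functions here are hypothetical and are not part of the example code shown later in this chapter; they only illustrate the idea.
#include <Arduino.h>
// Hypothetical pin assignments for two external LEDs; adjust for your wiring.
const int kGreenLedPin = 2;
const int kRedLedPin = 3;
void SetupLeds() {
  pinMode(kGreenLedPin, OUTPUT);
  pinMode(kRedLedPin, OUTPUT);
}
// Light the green LED when a person is detected, the red LED otherwise.
void ShowDetectionResult(bool person_detected) {
  digitalWrite(kGreenLedPin, person_detected ? HIGH : LOW);
  digitalWrite(kRedLedPin, person_detected ? LOW : HIGH);
}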
Simplicity Compared to Wake-Word Detection
When compared to wake-word detection (common in voice-activated systems), person detection has certain advantages in terms of simplicity:
Minimal Preprocessing
In wake-word detection, audio signals often require substantial preprocessing to be used effectively by the model. In contrast, person detection is more straightforward. Since the sensor's output is already an image—a format that can be directly used by the model—there is very little preprocessing needed.
No Need for Averaging
Wake-word detection models usually average multiple inferences over a period of time to improve accuracy. With person detection, this is generally not necessary: a single image frame usually contains enough information for a confident decision. Vision models are also heavier and slower to run, which makes averaging multiple inferences impractical anyway.
Test the Model on a Local Machine
The following code tests the model on your local development machine before it is deployed to a device.
Includes and Initial Setup
#include "tensorflow/lite/c/common.h"
#include "tensorflow/lite/micro/examples/person_detection/model_settings.h"
...
uint8_t tensor_arena[tensor_arena_size];
This part includes the necessary headers and defines the tensor_arena used to allocate memory for the model's tensors. The size of this arena depends on the platform (XTENSA and VISION_P6 in this case).
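The arena size is chosen at compile time based on those platform macros. The pattern looks roughly like this (the sizes shown are illustrative; the real values are defined in the test source):
// Larger arena for the XTENSA / VISION_P6 build, smaller default elsewhere (illustrative values).
#if defined(XTENSA) && defined(VISION_P6)
constexpr int tensor_arena_size = 352 * 1024;
#else
constexpr int tensor_arena_size = 136 * 1024;
#endif
uint8_t tensor_arena[tensor_arena_size];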
Test Initialization
TF_LITE_MICRO_TESTS_BEGIN
TF_LITE_MICRO_TEST(TestInvoke) {
This starts a group of micro tests and then defines a specific test called TestInvoke.
Model Loading
const tflite::Model* model = ::tflite::GetModel(g_person_detect_model_data);
Loads the model into a lightweight data structure for later use.
Schema Version Check
if (model->version() != TFLITE_SCHEMA_VERSION) {
MicroPrintf("Model provided is schema version %d not equal to supported version %d.\n", model->version(), TFLITE_SCHEMA_VERSION);
}
Checks if the model's schema version matches the supported version.
Operation Resolver
tflite::MicroMutableOpResolver<5> micro_op_resolver;
micro_op_resolver.AddAveragePool2D(tflite::Register_AVERAGE_POOL_2D_INT8());
...
Defines which operations the interpreter should be able to execute. In this case, it adds support for five different types of operations.
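The five operations match those registered in the Arduino sketch later in this chapter: average pooling, 2D convolution, depthwise convolution, reshape, and softmax. In this int8-specialized test they are added roughly as follows (a sketch; the exact registration helpers used in the test may differ):
// Register only the five kernels the person detection graph needs, in their int8 variants.
micro_op_resolver.AddAveragePool2D(tflite::Register_AVERAGE_POOL_2D_INT8());
micro_op_resolver.AddConv2D(tflite::Register_CONV_2D_INT8());
micro_op_resolver.AddDepthwiseConv2D(tflite::Register_DEPTHWISE_CONV_2D_INT8());
micro_op_resolver.AddReshape();
micro_op_resolver.AddSoftmax(tflite::Register_SOFTMAX_INT8());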
Interpreter Creation
tflite::MicroInterpreter interpreter(model, micro_op_resolver, tensor_arena, tensor_arena_size);
interpreter.AllocateTensors();
Creates a MicroInterpreter object and allocates memory for the tensors.
Input Validation
TfLiteTensor* input = interpreter.input(0);
TF_LITE_MICRO_EXPECT(input != nullptr);
...
Grabs the model's input tensor and verifies its dimensions and data type.
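The elided checks verify the tensor's shape and type against the constants from model_settings.h. They typically look like this (a sketch of the usual assertions):
// Expect a 4-D input tensor of shape [1, kNumRows, kNumCols, kNumChannels] with int8 data.
TF_LITE_MICRO_EXPECT_EQ(4, input->dims->size);
TF_LITE_MICRO_EXPECT_EQ(1, input->dims->data[0]);
TF_LITE_MICRO_EXPECT_EQ(kNumRows, input->dims->data[1]);
TF_LITE_MICRO_EXPECT_EQ(kNumCols, input->dims->data[2]);
TF_LITE_MICRO_EXPECT_EQ(kNumChannels, input->dims->data[3]);
TF_LITE_MICRO_EXPECT_EQ(kTfLiteInt8, input->type);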
Test Data Loading
memcpy(input->data.int8, g_person_image_data, input->bytes);
Copies an image containing a person into the input tensor.
Model Invocation
TfLiteStatus invoke_status = interpreter.Invoke();
...
Invokes the model and checks the return status.
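The elided check simply asserts that inference succeeded, along the lines of:
// The test fails if Invoke() returned anything other than kTfLiteOk.
TF_LITE_MICRO_EXPECT_EQ(kTfLiteOk, invoke_status);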
Output Validation
TfLiteTensor* output = interpreter.output(0);
...
TF_LITE_MICRO_EXPECT_GT(person_score, no_person_score);
Fetches the output tensor and checks if the output is as expected. Specifically, it ensures that the person score is higher than the no-person score for a person image, and vice versa for a no-person image.
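The elided lines read the two class scores out of the output tensor using the index constants from model_settings.h, mirroring what the Arduino loop does later in this chapter:
// Read the quantized scores for the "person" and "no person" classes.
int8_t person_score = output->data.int8[kPersonIndex];
int8_t no_person_score = output->data.int8[kNotAPersonIndex];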
Test End
TF_LITE_MICRO_TESTS_END
Marks the end of the test suite.
Deploy the Model on Arduino
Run the person detection example from the Arduino IDE. The code is located at File > Examples > Harvard_TinyMLx > person_detection.
Again, the Arduino sketch follows the same structure as the other examples:
Include TensorFlow Lite Micro and other helper files
#include <TensorFlowLite.h>
#include "main_functions.h"
#include "detection_responder.h"
...
Declaring global variables
These will later be initialized in the setup() function.
namespace {
tflite::ErrorReporter* error_reporter = nullptr;
const tflite::Model* model = nullptr;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input = nullptr;
constexpr int kTensorArenaSize = 136 * 1024;
static uint8_t tensor_arena[kTensorArenaSize];
}
Setup function
void setup() {
// NOLINTNEXTLINE(runtime-global-variables)
static tflite::MicroErrorReporter micro_error_reporter;
error_reporter = &micro_error_reporter;
// Map the model into a usable data structure. This doesn't involve any
// copying or parsing, it's a very lightweight operation.
model = tflite::GetModel(g_person_detect_model_data);
if (model->version() != TFLITE_SCHEMA_VERSION) {
TF_LITE_REPORT_ERROR(error_reporter,
"Model provided is schema version %d not equal "
"to supported version %d.",
model->version(), TFLITE_SCHEMA_VERSION);
return;
}
// Pull in only the operation implementations we need.
// This relies on a complete list of all the ops needed by this graph.
// An easier approach is to just use the AllOpsResolver, but this will
// incur some penalty in code space for op implementations that are not
// needed by this graph.
//
// tflite::AllOpsResolver resolver;
// NOLINTNEXTLINE(runtime-global-variables)
static tflite::MicroMutableOpResolver<5> micro_op_resolver;
micro_op_resolver.AddAveragePool2D();
micro_op_resolver.AddConv2D();
micro_op_resolver.AddDepthwiseConv2D();
micro_op_resolver.AddReshape();
micro_op_resolver.AddSoftmax();
// Build an interpreter to run the model with.
// NOLINTNEXTLINE(runtime-global-variables)
static tflite::MicroInterpreter static_interpreter(
model, micro_op_resolver, tensor_arena, kTensorArenaSize, error_reporter);
interpreter = &static_interpreter;
// Allocate memory from the tensor_arena for the model's tensors.
TfLiteStatus allocate_status = interpreter->AllocateTensors();
if (allocate_status != kTfLiteOk) {
TF_LITE_REPORT_ERROR(error_reporter, "AllocateTensors() failed");
return;
}
// Get information about the memory area to use for the model's input.
input = interpreter->input(0);
}
Loop function
void loop() {
// Get image from provider.
if (kTfLiteOk != GetImage(error_reporter, kNumCols, kNumRows, kNumChannels, input->data.int8)) {
TF_LITE_REPORT_ERROR(error_reporter, "Image capture failed.");
}
// Run the model on this input and make sure it succeeds.
if (kTfLiteOk != interpreter->Invoke()) {
TF_LITE_REPORT_ERROR(error_reporter, "Invoke failed.");
}
TfLiteTensor* output = interpreter->output(0);
// Process the inference results.
int8_t person_score = output->data.int8[kPersonIndex];
int8_t no_person_score = output->data.int8[kNotAPersonIndex];
RespondToDetection(error_reporter, person_score, no_person_score);
}
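The actual implementation of RespondToDetection lives in the detection_responder files included at the top of the sketch. As a rough illustration only, a responder could turn the two quantized scores into LED feedback like this (reusing the hypothetical LED pins from the sketch near the start of this chapter):
// A hypothetical detection responder (not the library's actual implementation).
// Whichever quantized score is larger determines which LED lights up.
void RespondToDetection(tflite::ErrorReporter* error_reporter,
                        int8_t person_score, int8_t no_person_score) {
  bool person_detected = person_score > no_person_score;
  digitalWrite(kGreenLedPin, person_detected ? HIGH : LOW);
  digitalWrite(kRedLedPin, person_detected ? LOW : HIGH);
  TF_LITE_REPORT_ERROR(error_reporter, "Person score: %d, No person score: %d",
                       person_score, no_person_score);
}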
Get Image from Camera
The GetImage function is used to capture an image from the camera module. It is located in arduino_image_provider.cpp.
#include <TinyMLShield.h> // Camera library
TfLiteStatus GetImage(tflite::ErrorReporter* error_reporter, int image_width,
int image_height, int channels, int8_t* image_data) {
byte data[176 * 144]; // Receiving QCIF grayscale from camera = 176 * 144 * 1
static bool g_is_camera_initialized = false;
static bool serial_is_initialized = false;
// Initialize camera if necessary
if (!g_is_camera_initialized) {
if (!Camera.begin(QCIF, GRAYSCALE, 5, OV7675)) {
TF_LITE_REPORT_ERROR(error_reporter, "Failed to initialize camera!");
return kTfLiteError;
}
g_is_camera_initialized = true;
}
// Read camera data
Camera.readFrame(data);
int min_x = (176 - 96) / 2;
int min_y = (144 - 96) / 2;
int index = 0;
// Crop 96x96 image. This lowers FOV, ideally we would downsample but this is simpler.
for (int y = min_y; y < min_y + 96; y++) {
for (int x = min_x; x < min_x + 96; x++) {
image_data[index++] = static_cast<int8_t>(data[(y * 176) + x] - 128); // convert TF input image to signed 8-bit
}
}
return kTfLiteOk;
}
TensorFlow Lite Micro kernels now use signed int8_t for quantized models, which means the input image must be converted from unsigned to signed format. This is why each pixel is shifted by 128 and cast to signed 8-bit inside the nested for loop.
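As a quick check of what the subtraction does (standalone C++, not part of the Arduino sketch): pixel 0 maps to -128, 128 maps to 0, and 255 maps to 127, so the full 8-bit range is preserved.
#include <cstdint>
#include <cstdio>
int main() {
  // Map uint8 pixel values [0, 255] to the int8 range [-128, 127] expected by the model.
  const int pixels[] = {0, 128, 255};
  for (int pixel : pixels) {
    std::printf("%3d -> %4d\n", pixel, static_cast<int8_t>(pixel - 128));
  }
  return 0;
}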