Chapter 4: Building a "Hello World" Model

This chapter explores a notebook that trains a model with TensorFlow to predict the sine function. Below are the main takeaways:

Designing the Model

Predicting the sine function is a regression task. We'll create a simple two-layer neural network: a hidden Dense layer with 16 units and a single-unit output layer. The input and output will each be a single scalar value. We'll train the model to minimize the mean squared error (MSE) loss function.

import tensorflow as tf
from tensorflow.keras import layers

model_1 = tf.keras.Sequential()
# Hidden layer: 16 units with ReLU activation, taking a single scalar input
model_1.add(layers.Dense(16, activation='relu', input_shape=(1,)))
# Output layer: a single unit producing the predicted sine value
model_1.add(layers.Dense(1))
# Configure training with the RMSprop optimizer, MSE loss, and MAE as a metric
model_1.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
model_1.summary()

The summary of the model is as follows:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 16)                32        
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 17        
=================================================================
Total params: 49
Trainable params: 49
Non-trainable params: 0
_________________________________________________________________
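
The parameter counts follow directly from the layer shapes: the hidden layer has 1 × 16 weights plus 16 biases (32 parameters), and the output layer has 16 weights plus 1 bias (17 parameters), for 49 in total.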

Note that Keras models are "compiled" before training, but this is not compilation into machine code: compile() simply configures the model for training by attaching the optimizer, loss function, and metrics. (The static-graph-versus-dynamic contrast with PyTorch applies to TensorFlow 1.x; TensorFlow 2.x executes eagerly by default and only builds graphs when you opt in with tf.function.)

Training the Model

To train the model, we simply call model_1.fit(). The training data is generated by the generate_data() function, and the model trains for 1,000 epochs.
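
The generate_data() function itself isn't reproduced here. As a rough sketch of what it might do (the sample count, noise level, random seed, and split ratios below are assumptions, not the notebook's exact values):

import numpy as np

# Number of samples and seed are assumed values, chosen for reproducibility
SAMPLES = 1000
np.random.seed(1337)
# Uniformly distributed x values over one full period of the sine wave
x_values = np.random.uniform(low=0, high=2 * np.pi, size=SAMPLES)
np.random.shuffle(x_values)
# Sine of each x, plus a little Gaussian noise so the data isn't a perfect curve
y_values = np.sin(x_values) + 0.1 * np.random.randn(SAMPLES)
# Split into 60% training, 20% validation, and 20% test sets
TRAIN_SPLIT = int(0.6 * SAMPLES)
TEST_SPLIT = int(0.2 * SAMPLES) + TRAIN_SPLIT
x_train, x_validate, x_test = np.split(x_values, [TRAIN_SPLIT, TEST_SPLIT])
y_train, y_validate, y_test = np.split(y_values, [TRAIN_SPLIT, TEST_SPLIT])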

history_1 = model_1.fit(x_train, y_train, epochs=1000, batch_size=16,
                    validation_data=(x_validate, y_validate))

The fit() method returns a history object that contains training and validation loss and metrics. We can then plot these metrics to understand the model's performance.
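
As one way to visualize this (a minimal Matplotlib sketch; 'loss' and 'val_loss' are the keys Keras records when a validation set is supplied to fit()):

import matplotlib.pyplot as plt

# Per-epoch training and validation loss recorded by fit()
loss = history_1.history['loss']
val_loss = history_1.history['val_loss']
epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'g.', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()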

[Figure: training and validation loss curves]

Making Predictions: Test 1

To predict the sine function, we use the model_1.predict() method. The initial results aren't promising. To improve them, we'll increase the model's capacity by adding another hidden layer.
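
As a sketch of that step, assuming the x_test and y_test arrays from the held-out split:

# Predict sine values for the test data with the first model
predictions = model_1.predict(x_test)

# Plot the predictions alongside the actual test values
plt.clf()
plt.title('Comparison of predictions and actual values')
plt.plot(x_test, y_test, 'b.', label='Actual')
plt.plot(x_test, predictions, 'r.', label='Predicted')
plt.legend()
plt.show()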

[Figure: model_1 predictions vs. actual sine values]

Improving the Model

Here, we design a model with two hidden layers of 16 units each, followed by a single-unit output layer (three Dense layers in total).

model_2 = tf.keras.Sequential()
model_2.add(layers.Dense(16, activation='relu', input_shape=(1,)))
model_2.add(layers.Dense(16, activation='relu'))
model_2.add(layers.Dense(1))
model_2.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
model_2.summary()

The summary for this improved model is:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_2 (Dense)              (None, 16)                32        
_________________________________________________________________
dense_3 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 17        
=================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________

Train the model again:

history_2 = model_2.fit(x_train, y_train, epochs=600, batch_size=16,
                    validation_data=(x_validate, y_validate))

Making Predictions: Test 2

The predicted sine function is much closer to the actual sine function.
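
As before, a sketch of the prediction-and-plot step, this time using model_2 (so the predictions variable now holds the improved model's output for the later comparison plot):

# Predict with the larger model and plot against the actual values
predictions = model_2.predict(x_test)
plt.clf()
plt.title('Comparison of predictions and actual values')
plt.plot(x_test, y_test, 'b.', label='Actual')
plt.plot(x_test, predictions, 'r.', label='Predicted')
plt.legend()
plt.show()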

[Figure: model_2 predictions vs. actual sine values]

Convert to TensorFlow Lite

Everything covered so far has been pretty basic. The real TinyML magic happens when we convert the model to TensorFlow Lite. The conversion process is as follows:

# Convert the model to the TensorFlow Lite format without quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model_2)
tflite_model = converter.convert()
open("sine_model.tflite", "wb").write(tflite_model)

The resulting model is a binary file that can be loaded onto a microcontroller. The model is 2736 bytes in size, which is small enough to fit on most microcontrollers.
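
One quick way to confirm the size on disk (a small sketch using os.path.getsize, which reports a file's size in bytes):

import os

# Report the size of the converted model file in bytes
basic_model_size = os.path.getsize("sine_model.tflite")
print("Basic model is %d bytes" % basic_model_size)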

However, we can make the model even smaller by quantizing it. Quantization reduces the precision of the model's weights and activations, typically from 32-bit floats to 8-bit integers, which shrinks the model's size and makes it faster to execute. We will go into more detail on quantization in a later chapter. The quantization process is as follows:

# Convert the model to the TensorFlow Lite format with quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model_2)
# Indicate that we want to perform the default optimizations,
# which includes quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Define a generator function that provides our test data's x values
# as a representative dataset, and tell the converter to use it
def representative_dataset_generator():
    for value in x_test:
        # Each scalar value must be inside of a 2D array that is wrapped in a list
        yield [np.array(value, dtype=np.float32, ndmin=2)]
converter.representative_dataset = representative_dataset_generator
# Convert the model
tflite_model = converter.convert()

# Save the model to disk
open("sine_model_quantized.tflite", "wb").write(tflite_model)

The quantized model is 2512 bytes in size, which saves only 224 bytes. This is because the model is already so small that there is little to shrink; quantization is far more effective on larger models.
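
Continuing the size check from the earlier sketch (reusing basic_model_size):

# Compare the on-disk sizes of the unquantized and quantized models
quantized_model_size = os.path.getsize("sine_model_quantized.tflite")
print("Quantized model is %d bytes" % quantized_model_size)
print("Difference is %d bytes" % (basic_model_size - quantized_model_size))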

Note that the converter.representative_dataset is used specifically for quantization when we're converting a TensorFlow model to TensorFlow Lite. The representative dataset helps the converter determine the dynamic range of activations so that it can quantize the model in a way that minimally impacts accuracy. The dataset should be representative of the data the model will actually see in deployment to get the most accurate quantized model.
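
One illustrative way to see the quantization's effect (this check isn't part of the chapter's notebook) is to load the quantized model into an interpreter and list each tensor's dtype and quantization parameters:

# Load the quantized model and inspect its tensors
interpreter = tf.lite.Interpreter(model_path="sine_model_quantized.tflite")
interpreter.allocate_tensors()
for tensor in interpreter.get_tensor_details():
    print(tensor['name'], tensor['dtype'], tensor['quantization'])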

Hex Dump of the Model

We can run the following code to print the model as a C array: a hex dump of the entire .tflite file (structure and weights) that can be compiled into a microcontroller project.

# Install xxd if it is not available
!apt-get -qq install xxd
# Save the file as a C source file
!xxd -i sine_model_quantized.tflite > sine_model_quantized.cc
# Print the source file
!cat sine_model_quantized.cc

The output is:

const unsigned char g_sine_model_data[] DATA_ALIGN_ATTRIBUTE = {
    0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x0e, 0x00, 0x18, 0x00, 0x04, 0x00, 0x08, 0x00, 0x0c, 0x00,
    0x10, 0x00, 0x14, 0x00, 0x0e, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00,
    0x10, 0x09, 0x00, 0x00, 0x58, 0x02, 0x00, 0x00, 0x40, 0x02, 0x00, 0x00,
    ...
    0x00, 0x00, 0x00, 0x06, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x06, 0x00,
    0x06, 0x00, 0x05, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, 0x72, 0x0a, 0x00,
    0x0c, 0x00, 0x07, 0x00, 0x00, 0x00, 0x08, 0x00, 0x0a, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x09, 0x04, 0x00, 0x00, 0x00};
const int g_sine_model_data_len = 2432;

We can copy and paste this into a C file and load the model onto a microcontroller.

Test the Model in Python

# Instantiate an interpreter for the model
sine_model = tf.lite.Interpreter('sine_model.tflite')

# Allocate memory for the model
sine_model.allocate_tensors()

# Get indexes of the input and output tensors
sine_model_input_index = sine_model.get_input_details()[0]["index"]
sine_model_output_index = sine_model.get_output_details()[0]["index"]

# Create a list to store the predictions
sine_model_predictions = []

# Run the interpreter on each test input (x_value) and store the predictions
for x_value in x_test:
    # Create a 2D tensor wrapping the current x value
    x_value_tensor = tf.convert_to_tensor([[x_value]], dtype=np.float32)
    # Write the value to the input tensor
    sine_model.set_tensor(sine_model_input_index, x_value_tensor)
    # Run inference
    sine_model.invoke()
    # Read the prediction from the output tensor
    sine_model_predictions.append(sine_model.get_tensor(sine_model_output_index)[0])


# See how they line up with the data
plt.clf()
plt.title('Comparison of various models against actual values')
plt.plot(x_test, y_test, 'bo', label='Actual')
plt.plot(x_test, predictions, 'ro', label='Original predictions')
plt.plot(x_test, sine_model_predictions, 'bx', label='Lite predictions')
plt.legend()
plt.show()

In a standard TensorFlow (or even Keras) environment, we would typically use the .predict() method to get model predictions. However, TensorFlow Lite is designed for resource-constrained environments like embedded systems, and the API is simplified and different.

set_tensor and get_tensor

  • set_tensor(index, tensor): This function takes a tensor index and a value, and copies that value into the input tensor at the given index. Essentially, we're writing the model's input data before running inference.

  • get_tensor(index): This function takes a tensor index and returns a copy of the value currently held by the tensor at that index. We use this to read the model's prediction from the output tensor after running inference.

So, in TensorFlow Lite, the typical .predict() flow is broken down into these lower-level operations to give you more control. This is especially useful in systems where you might need fine-grained control over aspects like memory allocation or execution timing.

Here's what the equivalent step looks like with the standard Keras API, calling predict() on the original model_2 rather than the interpreter, for comparison:

predictions = model_2.predict(x_test)

In TensorFlow Lite, you manually set the input, invoke the model, and then get the output, which allows for more control but requires a bit more boilerplate code.