Online Digital Forensics Courses and Labs
Building a Low-cost and State-of-the-art IoT Security Hands-on Laboratory

Funded by National Science Foundation (NSF)

Audio Control on TI-RSLK


Instructor: Dr. Yier Jin, 352-294-0401, yier.jin@ece.ufl.edu

Student Assistants: Max Panoff, Jingyuan Li

Prerequisite:

  • Basic understanding of robotics
  • Ability to use a computer

Goal:

Use TensorFlow to train a keyword-spotting model for specific spoken words and deploy it to control a moving robot.

Introduction

With the rapid development of artificial intelligence techniques such as machine learning, more and more applications are being built on this technology. The embedded field is no exception: embedded systems with artificial intelligence capabilities can not only perform traditional simple control tasks, but can now also achieve some forms of intelligent control.

Problems follow, however. Deep neural networks often require substantial computing power, usually provided by GPUs, while the computing power of embedded systems is very limited. In addition, the size of a trained model often exceeds the storage available on an embedded system. Furthermore, many embedded systems run without an operating system, which can make deploying neural networks to them extremely complex.

However, TensorFlow Lite for Microcontrollers makes intelligent speech recognition on embedded systems feasible. It shares most of its code with TensorFlow Lite but fits within these constraints, with a core runtime footprint of only about 20 kilobytes.

What does TensorFlow Lite contain?

The main idea of the project is that, by using an adapted model, you can modify the code so that an MSP432 board collects audio data, determines which keyword best matches that data, and responds appropriately.


Required materials

Hardware:

  1. MSP432 LaunchPad
  2. BOOSTXL-AUDIO Audio Booster Pack
  3. TI-RSLK

Software:

  1. Adapted TensorFlow Lite (TFLite) Micro micro_speech project.
  2. CCS. We adopt Code Composer Studio (CCS) as the IDE. It can be downloaded from http://software-dl.ti.com/ccs/esd/documents/ccs_downloads.html. Our release is CCS 9.1.0 for a 64-bit Windows system. The IDE is developed for TI microcontrollers and embedded processors and is very powerful. In addition, the official website offers useful reference documents, which are included in the Documents folder. We also need to download the TI-RSLK Maze software, which offers many example projects and an unfinished code framework. It can be downloaded from http://www.ti.com/general/docs/lit/getliterature.tsp?baseLiteratureNumber=SLAC768&fileType=zip.
  3. Google Colab.

Platform construction

Almost all the hardware construction steps are provided by the curriculum mentioned before; we mainly reference Module 5 and Module 12. There are three main components that need to be connected to realize the basic function: the Motor Drive and Power Distribution Board (MDPDB), the MSP432 LaunchPad, and the motors. For how to build up the platform, please refer to the "Platform construction" part of http://cyberforensic.net/k-12/k-12_lab4.html.

BOOSTXL-AUDIO Audio Booster Pack

This board contains the microphone that you will use to take audio measurements. It is simple to interface with and requires only two pins: one supplies power to the microphone, and the other reads the microphone's output through an analog-to-digital converter (ADC). More information can be found in the documentation.

After installation, the platform should look like the picture below.


Sequence of implementation

To deploy a TensorFlow model to a microcontroller, you will need to follow this process:

  1. Create or obtain a TensorFlow model. The model must be small enough to fit on your target device after conversion, and it can only use supported operations. If you want to use operations that are not currently supported, you can provide your own implementations.

  2. Convert the model to a TensorFlow Lite FlatBuffer. You will convert your model into the standard TensorFlow Lite format using the TensorFlow Lite converter. You may wish to output a quantized model, since these are smaller in size and more efficient to execute.

  3. Convert the FlatBuffer to a C byte array. Models are kept in read-only program memory and provided in the form of a simple C file. Standard tools can be used to convert the FlatBuffer into a C array (see the sketch after this list).

  4. Integrate the TensorFlow Lite for Microcontrollers C++ library. Write your microcontroller code to collect data, perform inference using the C++ library, and make use of the results.

  5. Deploy to your device. Build and deploy the program to your device.
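To illustrate step 3, the generated C file typically has the shape sketched below. The file and symbol names (model_data.cc, g_model, g_model_len) are only examples, the bytes are placeholders for your converted .tflite file, and a standard tool such as xxd -i will produce this layout for you.

    // model_data.cc -- example layout only.  The real bytes come from your
    // converted model, e.g.:  xxd -i model.tflite > model_data.cc
    alignas(16) const unsigned char g_model[] = {
        // Placeholder bytes -- replace with the contents of your .tflite file.
        0x00, 0x00, 0x00, 0x00,
    };

    const unsigned int g_model_len = sizeof(g_model);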


Train your model

Models may be trained with the speech_recognition example in TensorFlow. TensorFlow is interfaced with through Python, so installing Python is required to train new models; we recommend using Anaconda to do this on Windows. We also recommend using a modified tiny_conv model architecture, which can be found and modified through models.py in the speech_recognition project. The models we provide use first_filter_width = 4, first_filter_height = 5, and first_filter_count = 4, which we determined through experimentation to be small enough to run at sufficient speed. You may change these values as desired. For an explanation of how convolution layers work, please refer to the attached slides. You may also change the architecture as you wish, though doing so requires more advanced knowledge of Python and TensorFlow.

Training the model is done by calling train.py in the speech_recognition project. train.py takes many arguments; we recommend reading through the file to identify them all. By default, it trains the model using the open-source speech dataset supplied by Google. More detail on this can be found in the slides. This dataset should be sufficient for the purposes of this project, but you are free to modify it as you wish.

After training, the model must be converted into a TensorFlow Lite FlatBuffer and then into a C-style byte array. More detail on this can be found in the slides. Note: part of this process is easiest to complete on Linux; if you do not have access to Linux, it should be possible to execute it through Google Colab.


Main and Main_Functions

We will start our discussion with the main.cc and main_functions.cc files. These follow a certain structure that must be maintained for the code to work: the main function should consist of a "setup" function and a "loop" function, both of which take no arguments. This keeps the code base compatible with certain other boards; while not necessary for the MSP432, changing it can cause issues with other files in the project. We do not recommend altering this structure.

In a file titled main_functions.cc are the definitions of the setup and loop functions. Setup is called once at the very start of the program and configures the board for use. Loop is then called repeatedly for the rest of the program, until it terminates. The contents of the loop function as defined in main_functions.cc are responsible for running the keyword-recognition program and should be saved if the loop function is to be modified. There are several potential ways to let a program structured this way use an RTOS; we leave it to you to devise a method that works for your implementation. We suggest starting threads in the setup function and leaving loop to idle, but your implementation is up to you. Note that the descriptions below describe the functions as they are given to you, not as they must be for the program to work. You are free to modify what these functions contain, but we suggest saving the content elsewhere before doing so: the current content is integral to running the program successfully, but it may not belong in these functions, depending on your implementation.
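The sketch below shows a minimal main.cc with this structure, modeled on the upstream micro_speech example; main_functions.h is assumed to declare setup and loop.

    // main.cc -- minimal sketch of the required structure.
    #include "main_functions.h"  // assumed to declare setup() and loop()

    int main(int argc, char* argv[]) {
      setup();          // one-time board and model initialization
      while (true) {
        loop();         // capture audio, run inference, respond -- forever
      }
    }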

Setup

Called once at the start of the program, it creates all objects/instances used in the default micro_speech example and reserves memory for pointers/buffers. It also ensures that the loaded model is valid.
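A hedged sketch of what setup typically contains, following the upstream micro_speech example. The include paths and resolver class name vary between TensorFlow Lite versions, model_data.h / g_model refer to the converted model from earlier, and the tensor-arena size is only an assumed placeholder.

    #include <cstdint>

    #include "tensorflow/lite/micro/all_ops_resolver.h"
    #include "tensorflow/lite/micro/micro_error_reporter.h"
    #include "tensorflow/lite/micro/micro_interpreter.h"
    #include "tensorflow/lite/schema/schema_generated.h"
    #include "tensorflow/lite/version.h"
    #include "model_data.h"  // g_model / g_model_len from the converted model

    namespace {
    tflite::ErrorReporter* error_reporter = nullptr;
    const tflite::Model* model = nullptr;
    tflite::MicroInterpreter* interpreter = nullptr;
    TfLiteTensor* model_input = nullptr;

    // Working memory for the interpreter's tensors; the size is an assumption.
    constexpr int kTensorArenaSize = 10 * 1024;
    uint8_t tensor_arena[kTensorArenaSize];
    }  // namespace

    void setup() {
      static tflite::MicroErrorReporter micro_error_reporter;
      error_reporter = &micro_error_reporter;

      // Map the model stored in flash and check its schema version.
      model = tflite::GetModel(g_model);
      if (model->version() != TFLITE_SCHEMA_VERSION) {
        error_reporter->Report("Model schema version mismatch.");
        return;
      }

      // Build an interpreter that uses the arena for all intermediate buffers.
      static tflite::AllOpsResolver resolver;
      static tflite::MicroInterpreter static_interpreter(
          model, resolver, tensor_arena, kTensorArenaSize, error_reporter);
      interpreter = &static_interpreter;
      if (interpreter->AllocateTensors() != kTfLiteOk) {
        error_reporter->Report("AllocateTensors() failed.");
        return;
      }
      model_input = interpreter->input(0);

      // Board-specific initialization (clock, microphone ADC, motors, UART) goes here.
    }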

Loop

Called repeatedly over the course of the program. It uses a FeatureProvider instance to convert the contents of an audio buffer into features for keyword recognition, invokes the pretrained model using those features as inputs, uses a RecognizeCommands instance to process the model's outputs to determine whether a keyword was said and, if so, which one, and then calls RespondToCommand to react to the observed keyword.
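A hedged sketch of this flow, again following the upstream micro_speech example. It assumes setup created the feature_provider and recognizer objects along with feature_buffer, that kFeatureElementCount comes from the model settings, and that LatestAudioTimestamp is provided by the audio provider; the uint8 input assumes the original quantized micro_speech model.

    void loop() {
      // Ask the feature provider for spectrogram slices covering the newest audio.
      static int32_t previous_time = 0;
      const int32_t current_time = LatestAudioTimestamp();
      int how_many_new_slices = 0;
      feature_provider->PopulateFeatureData(error_reporter, previous_time,
                                            current_time, &how_many_new_slices);
      previous_time = current_time;
      if (how_many_new_slices == 0) {
        return;  // nothing new to classify yet
      }

      // Copy the features into the model input and run inference.
      for (int i = 0; i < kFeatureElementCount; ++i) {
        model_input->data.uint8[i] = feature_buffer[i];
      }
      if (interpreter->Invoke() != kTfLiteOk) {
        error_reporter->Report("Invoke failed.");
        return;
      }

      // Average the raw scores over time and decide whether a keyword was heard.
      const char* found_command = nullptr;
      uint8_t score = 0;
      bool is_new_command = false;
      recognizer->ProcessLatestResults(interpreter->output(0), current_time,
                                       &found_command, &score, &is_new_command);

      // React to the keyword (drive the robot, report over UART, etc.).
      RespondToCommand(error_reporter, current_time, found_command, score,
                       is_new_command);
    }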


Audio_Provider

Audio_Provider consists of two major (required) functions, LatestAudioTimestamp and GetAudioSamples, but has several smaller functions as well.

LatestAudioTimestamp returns the time (in milliseconds) at which the most recent audio sample was captured.

GetAudioSamples aims to maintain a certain number of microphone measurements, taken at a certain frequency, in a buffer. While the number of samples and the sampling frequency are determined by the model, for all the models we provide this will be 512 samples at 16 kHz. We include an outline of how this can be achieved but leave the actual implementation to you. You do not need to follow our outline for this function, other than providing the major functions. Note: this is not the feature-extraction step that converts the time domain to the frequency domain; that is feature_provider and should not require adjustment.
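One possible shape is sketched below, under the assumption of a ring buffer filled from an interrupt. The two exported functions follow the upstream micro_speech audio_provider signatures; CaptureSampleISR is a hypothetical hook where your MSP432-specific ADC/timer interrupt would deliver each sample, and startup edge cases are ignored.

    #include <cstdint>

    #include "tensorflow/lite/c/common.h"
    #include "tensorflow/lite/core/api/error_reporter.h"

    namespace {
    constexpr int kAudioSampleFrequency = 16000;     // samples per second
    constexpr int kAudioCaptureBufferSize = 16000;   // ~1 second ring buffer
    volatile int16_t g_capture_buffer[kAudioCaptureBufferSize];
    volatile int32_t g_latest_audio_timestamp = 0;   // milliseconds
    int16_t g_output_buffer[512];                    // one model window
    }  // namespace

    // Hypothetical hook: call this from your ADC/timer interrupt at 16 kHz
    // with each new microphone sample.
    extern "C" void CaptureSampleISR(int16_t sample) {
      static int32_t sample_index = 0;
      g_capture_buffer[sample_index % kAudioCaptureBufferSize] = sample;
      ++sample_index;
      g_latest_audio_timestamp = (sample_index * 1000) / kAudioSampleFrequency;
    }

    int32_t LatestAudioTimestamp() { return g_latest_audio_timestamp; }

    TfLiteStatus GetAudioSamples(tflite::ErrorReporter* error_reporter,
                                 int start_ms, int duration_ms,
                                 int* audio_samples_size,
                                 int16_t** audio_samples) {
      // Copy the requested slice out of the ring buffer into a stable buffer
      // that the feature provider can read without racing the interrupt.
      const int samples_per_ms = kAudioSampleFrequency / 1000;
      const int start_index = (start_ms * samples_per_ms) % kAudioCaptureBufferSize;
      const int sample_count = duration_ms * samples_per_ms;
      for (int i = 0; i < sample_count; ++i) {
        g_output_buffer[i] =
            g_capture_buffer[(start_index + i) % kAudioCaptureBufferSize];
      }
      *audio_samples_size = sample_count;
      *audio_samples = g_output_buffer;
      return kTfLiteOk;
    }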


Command_Responder

Command_Responder has one major function: RespondToCommand.

RespondToCommand:

RespondToCommand is passed a pointer to the name of the command the model believes it heard. Based on this command, different actions can be taken. We leave it to the students to determine what exactly this function does, as it will depend on the rest of their implementation. That said, we do include a framework for reporting the current command to CCS through UART.
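A hedged sketch of one way to fill in RespondToCommand, using the signature from the upstream micro_speech example. The Motor_Forward / Motor_Backward / Motor_Left / Motor_Right / Motor_Stop helpers are assumed to come from the TI-RSLK module code, and the duty-cycle values are arbitrary placeholders.

    #include <cstdint>
    #include <cstring>

    #include "tensorflow/lite/core/api/error_reporter.h"
    #include "Motor.h"  // TI-RSLK motor helpers -- assumed part of your project

    void RespondToCommand(tflite::ErrorReporter* error_reporter,
                          int32_t current_time, const char* found_command,
                          uint8_t score, bool is_new_command) {
      // Only act the first time a command is reported, not on every audio window.
      if (!is_new_command) {
        return;
      }

      // Report the detection to the CCS console over UART.
      error_reporter->Report("Heard %s (score %d) at %dms", found_command, score,
                             current_time);

      // Map keywords to robot motion; the duty-cycle values are placeholders.
      if (std::strcmp(found_command, "up") == 0) {
        Motor_Forward(3000, 3000);
      } else if (std::strcmp(found_command, "down") == 0) {
        Motor_Backward(3000, 3000);
      } else if (std::strcmp(found_command, "left") == 0) {
        Motor_Left(2000, 2000);
      } else if (std::strcmp(found_command, "right") == 0) {
        Motor_Right(2000, 2000);
      } else {
        Motor_Stop();  // "silence", "unknown", or anything unexpected
      }
    }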


Recognize_Commands

Recognize_Commands has one major class, RecognizeCommands, which has one major function, ProcessLatestResults, as well as several private members. This class and function should not require updates but can be helpful when troubleshooting issues.

ProcessLatestResults:

This function should not be confused with the command responder; it has a distinct purpose despite the similar name. ProcessLatestResults not only performs many internal checks that print diagnostic information about issues through UART, it also determines which command was heard. Most DNN models made for micro_speech predict the likelihood of a given sample belonging to each keyword; ProcessLatestResults determines the most likely keyword and also suppresses commands when necessary (i.e., to stop one spoken keyword from being detected multiple times).

Also of note are the member variables of RecognizeCommands. These are set in the setup function and cannot be changed afterwards without custom code. They control important characteristics such as the timeout period between commands (suppression_ms), the number of detections (minimum_count) required within a given amount of time (average_window_duration_ms) to report a keyword, and the detection threshold (detection_threshold) that a keyword's score must exceed to be recognized. Students should not need to update these values but may do so if they find it improves performance.
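For reference, a hedged sketch of how these parameters are passed when the RecognizeCommands instance is created in setup; the argument order follows the upstream class, and the numeric values are only example defaults.

    // Created once in setup(); the values below are example defaults.
    static RecognizeCommands static_recognizer(
        error_reporter,
        /*average_window_duration_ms=*/1000,  // scores are averaged over this window
        /*detection_threshold=*/200,          // 0-255 score a keyword must exceed
        /*suppression_ms=*/1500,              // ignore repeats within this period
        /*minimum_count=*/3);                 // detections needed inside the window
    recognizer = &static_recognizer;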


Summary

In this project, we design and build a simple model to demonstrate how to incorporate AI technology into an embedded system using the TensorFlow Lite solution. We mainly use the components offered by the TI-RSLK and the BOOSTXL-AUDIO Audio Booster Pack. In its current form, the vehicle is able to respond to the keywords "up", "down", "right", and "left". The basic function has been realized, but we welcome everyone who is interested in this project and TensorFlow Lite to extend it.

For future improvement, there are several aspects everyone is welcome to work on:

  1. Improve the model.
  2. Develop more sophisticated applications.

Appendix

References:

The video demo link can be found at: https://www.youtube.com/watch?v=au5htYl8fmo.

The slides can be downloaded here.

The source code can be downloaded here.