Hardware used for AI/ML applications has trended toward more powerful and more power-hungry devices. GPUs and some FPGA data-center accelerator cards currently consume 200-300 W at full load, which makes them impractical for many edge-computing applications. Some semiconductor manufacturers are beginning to build AI-accelerated silicon that addresses not only power consumption but also form factor and cost. We examine one such device, the MAX78000 Artificial Intelligence Microcontroller. Using synthesis software provided by the manufacturer, this microcontroller can perform inference with models trained in high-level frameworks such as PyTorch or TensorFlow. Before synthesis, the model weights are quantized, which greatly reduces the model's memory footprint and makes computation more efficient, but decreases model accuracy. We measure the performance reduction and accuracy degradation to be expected on this device by benchmarking CNN (Convolutional Neural Network) inference on datasets such as MNIST [1], a dataset of handwritten digits, and CIFAR-10 [2], a dataset of images divided into ten classes. We benchmark inference using models such as SimpleNet and models found through NAS (Neural Architecture Search), adding batch processing of the test datasets to code generated by the ai8x-synthesis tool from the MAX78000 SDK. Using the performance and accuracy results from these benchmarks, we assess the feasibility of performing inference for CNN use cases such as real-time image recognition and object detection. For each case we examine which commonly used algorithms are or are not feasible within the resource limitations of the MAX78000 SoC.
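To make the quantization trade-off concrete, the following is a minimal sketch of symmetric per-tensor 8-bit weight quantization, the general class of technique the synthesis flow applies; it is illustrative only and does not reproduce the ai8x-synthesis tool's exact procedure, and all function names and shapes here are our own assumptions.

```python
# Illustrative sketch (NOT the ai8x-synthesis implementation): symmetric
# per-tensor quantization of float32 weights to int8, showing the 4x memory
# saving and the quantization error that can degrade model accuracy.
import numpy as np

def quantize_weights_int8(w: np.ndarray):
    """Map float32 weights to int8 so the largest magnitude lands at +/-127."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights; the residual is quantization error."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 3, 3, 3)).astype(np.float32)  # toy conv kernel
q, scale = quantize_weights_int8(w)
w_hat = dequantize(q, scale)

print(f"footprint: {w.nbytes} B (float32) -> {q.nbytes} B (int8)")
print(f"mean abs quantization error: {np.abs(w - w_hat).mean():.6f}")
```

The printed error term is the source of the accuracy degradation measured in our benchmarks: every weight is rounded to one of 256 levels, so networks whose decisions depend on fine weight differences lose precision.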