Hardware acceleration of edge AI with NPU

dc.contributor.author: Rantala, Tuomas
dc.contributor.department: Department of Computing
dc.contributor.faculty: Faculty of Technology
dc.contributor.studysubject: Information and Communication Technology
dc.date.accessioned: 2026-06-16T19:31:31Z
dc.date.issued: 2026-06-02
dc.description.abstract: This thesis focuses on the integration and deployment of two artificial neural networks into a firmware project on the STMicroelectronics STM32N6 microcontroller. A custom development board with onboard microphones, PSRAM, and flash memory was used. Neural network inference is executed on the Neural-Art 14 NPU integrated into the STM32N6. The basics of artificial neural networks, CPUs, GPUs, and NPUs are discussed, highlighting the importance of hardware acceleration for efficient inference. Using STMicroelectronics X-CUBE-AI, a Yamnet-1024 audio event detection model from the STM32 model zoo GitHub page was integrated into the firmware project. The integration included building the audio pipeline, with the DSP functions implemented using Arm Helium vector processing technology. The performance of the model was measured by timing the CPU and NPU execution times, and the functionality of the model was validated. In the initial version, the total inference time of 21.22 ms was split relatively evenly between the CPU and the NPU. Updating X-CUBE-AI to a newer version enabled the epoch controller of the NPU. This decreased the total inference time to 10.45 ms, in line with the performance metrics presented in the model zoo. The epoch controller reduced the CPU time of the inference to just 260 µs, demonstrating the importance of configuring the hardware correctly to achieve optimal performance. For the second model, the MobileNetV1 image classifier was chosen. X-CUBE-AI proved limited for integrating this second neural network into the project because it lacked memory configuration options. During this work, a new tool, STM32Cube AI Studio, was released to replace X-CUBE-AI. Yamnet-1024 was migrated to the new tool, which was seen as a significant improvement, although it also had limitations regarding memory configuration.
Despite these limitations, the MobileNetV1 model was integrated into the firmware alongside Yamnet-1024, with the two models run sequentially. No performance degradation was observed for Yamnet-1024 after MobileNetV1 was added to run alongside it. Overall, the STM32N6 with the Neural-Art 14 NPU was shown to be a well-suited choice for edge AI applications that use larger embedded models.
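The headline timing figures in the abstract can be related with simple arithmetic. The symbols t_initial, t_epoch, and t_CPU,epoch are introduced here only for illustration; the values are those reported in the abstract, with 260 µs written as 0.26 ms:

```latex
\text{speedup} = \frac{t_{\text{initial}}}{t_{\text{epoch}}}
             = \frac{21.22\ \text{ms}}{10.45\ \text{ms}} \approx 2.03,
\qquad
\frac{t_{\text{CPU,epoch}}}{t_{\text{epoch}}}
             = \frac{0.26\ \text{ms}}{10.45\ \text{ms}} \approx 2.5\,\%
```

That is, enabling the epoch controller roughly halved the end-to-end inference time, and the CPU accounted for only about 2.5 % of the remaining inference time.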
dc.format.extent: 72
dc.identifier.uri: https://www.utupub.fi/handle/11111/62086
dc.identifier.urn: URN:NBN:fi-fe2026061671676
dc.language.iso: eng
dc.rights: This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
dc.rights.accessrights: closed
dc.subject: Edge AI
dc.subject: NPU
dc.subject: STM32N6
dc.title: Hardware acceleration of edge AI with NPU
dc.type.ontasot: Master's thesis

Files

Name: Tuomas_Rantala_opinnayte.pdf
Size: 1.77 MB
Format: Adobe Portable Document Format