Hardware acceleration of edge AI with NPU

Rantala, Tuomas


dc.contributor.author	Rantala, Tuomas
dc.contributor.department	Department of Computing
dc.contributor.faculty	Faculty of Technology
dc.contributor.studysubject	Information and Communication Technology
dc.date.accessioned	2026-06-16T19:31:31Z
dc.date.issued	2026-06-02
dc.description.abstract	This thesis focuses on the integration and deployment of two artificial neural networks into a firmware project on the STMicroelectronics STM32N6 microcontroller. A custom development board with onboard microphones, PSRAM, and flash was used. The neural network inference is executed on the Neural-Art 14 NPU integrated into the STM32N6. The basics of artificial neural networks, CPUs, GPUs, and NPUs are discussed, highlighting the importance of hardware acceleration for efficient inference. Using STMicroelectronics X-CUBE-AI, a Yamnet-1024 audio event detection model from the STM32 model zoo GitHub page was integrated into the firmware project. The integration included building the audio pipeline, which uses Arm Helium vector processing technology for the DSP functions. The performance of the model was measured by timing the CPU and NPU execution, and the functionality of the model was validated. In the initial version, the inference time was split relatively evenly between the CPU and NPU, totalling 21.22 ms. X-CUBE-AI was then updated to a newer version, which enabled the epoch controller of the NPU. This decreased the total inference time to 10.45 ms, which is in line with the performance metrics presented in the model zoo. The epoch controller reduced the CPU time of the inference to just 260 µs, demonstrating the importance of configuring the hardware correctly to achieve optimal performance. For the second model, the image classifier MobileNetV1 was chosen. X-CUBE-AI was found to have limitations for integrating the second neural network into the project because it lacks memory configuration options. During this work, a new tool, STM32Cube AI Studio, was released to replace X-CUBE-AI. Yamnet-1024 was migrated to the new tool, which was seen as a great improvement, even though it also had limitations regarding the memory configuration.
Regardless of the limitations in the tool, the MobileNetV1 model was integrated alongside the Yamnet-1024 model into the firmware. The models were run sequentially. No performance degradation was observed on the Yamnet-1024 model after the MobileNetV1 model was integrated to run alongside it. Overall, the STM32N6 with the Neural-Art 14 NPU was shown to be a well-suited choice for edge AI applications where larger embedded models are used.
dc.format.extent	72
dc.identifier.uri	https://www.utupub.fi/handle/11111/62086
dc.identifier.urn	URN:NBN:fi-fe2026061671676
dc.language.iso	eng
dc.rights	This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
dc.rights.accessrights	closed
dc.subject	Edge AI
dc.subject	NPU
dc.subject	STM32N6
dc.title	Hardware acceleration of edge AI with NPU
dc.type.ontasot	Master's thesis

Files


Name: Tuomas_Rantala_opinnayte.pdf
Size: 1.77 MB
Format: Adobe Portable Document Format


Collections

Master's theses (pro gradu and diploma theses) and advanced studies theses (restricted visibility)