Data Reduction Methods of Audio Signals for Embedded Sound Event Recognition
Yan, Junjie (2020-06-23)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Commercial use is prohibited.
open
The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe2020082161345
Abstract
Sound event detection is a typical Internet of Things (IoT) task. It can be used in many scenarios, such as security applications where cameras are unsuitable because of environmental variations in lighting and movement. In practice, models for this task are usually deployed on embedded devices equipped with microphones. The idea of edge computing is to process data close to where it is produced, because some applications must react in real time: transmitting recorded audio clips to the cloud can introduce large delays and sometimes lead to serious consequences. Local processing, however, raises another problem: the heavy computation involved may exceed what embedded devices can handle, which is precisely their weakness. Work on this problem, such as model compression and hardware acceleration, has made considerable progress in recent years.
This thesis provides a new perspective on embedded deep learning for audio tasks, aiming to reduce the amount of audio signal data needed for sound event recognition. Instead of compressing the model or designing a hardware accelerator, our methods focus on the analog front-end signal acquisition side and reduce the data volume of audio clips directly, using specific sampling methods. State-of-the-art approaches to sound event detection are mainly based on deep learning models. For such models, a smaller input means lower latency: fewer time steps for a recurrent neural network (RNN), or fewer convolution operations for a convolutional neural network (CNN). Reducing the amount of input data therefore reduces both the computation and the number of parameters of the neural network classifier, and naturally the delay during inference. Our experiments apply three data reduction methods to the sound event detection task, all based on reducing the number of sample points of an audio signal: lowering the sampling rate and sampling width, using a sigma-delta analog-to-digital converter (ADC), and using a level-crossing (LC) ADC. We simulated these three kinds of signals and fed them into the neural network to train the classifier.
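The three reduction strategies named above can be sketched in simulation. The snippet below is a minimal illustration, not the thesis implementation: the decimation factor, bit width, sigma-delta order (first order), and level-crossing threshold are all illustrative assumptions.

```python
import numpy as np

def downsample_requantize(x, factor=4, bits=8):
    """Reduce sampling rate (keep every `factor`-th sample) and sampling
    width (requantize to `bits` bits). Naive decimation, no anti-alias filter."""
    y = x[::factor]
    levels = 2 ** bits
    # map amplitudes in [-1, 1] onto integer codes 0 .. levels-1
    return np.round((y + 1.0) / 2.0 * (levels - 1)).astype(np.int32)

def sigma_delta_1bit(x):
    """First-order sigma-delta modulator producing a 1-bit (+1/-1) stream."""
    integ = 0.0
    out = np.empty(len(x), dtype=np.int8)
    for i, v in enumerate(x):
        out[i] = 1 if integ >= 0 else -1   # 1-bit quantizer
        integ += v - out[i]                # integrate the quantization error
    return out

def level_crossing_sample(x, delta=0.1):
    """Level-crossing ADC: record a sample only when the input has moved
    by at least `delta` from the last recorded level."""
    idx, vals = [0], [x[0]]
    last = x[0]
    for i in range(1, len(x)):
        if abs(x[i] - last) >= delta:
            idx.append(i)
            vals.append(x[i])
            last = x[i]
    return np.array(idx), np.array(vals)

# A 1 kHz test tone sampled at 16 kHz for 10 ms (illustrative signal)
t = np.arange(160) / 16000.0
x = np.sin(2 * np.pi * 1000 * t)

codes = downsample_requantize(x, factor=4, bits=8)   # 160 -> 40 samples
bitstream = sigma_delta_1bit(x)                      # 1 bit per sample
idx, vals = level_crossing_sample(x, delta=0.1)      # event-driven samples
print(len(x), len(codes), len(bitstream), len(idx))
```

In each case the output stream carries fewer bits than the original clip, which is the quantity the classifier's input size, and hence its inference latency, depends on.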
Finally, we conclude that, for audio classification, signals acquired with traditional sampling still contain some redundancy, and that using these specific ADC modules gives better classification performance than conventional sampling at the same data volume.