Comparing Data Augmentation Methods for Synthesizer Parameter Estimation
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Abstract
Synthesizer parameter estimation is a machine learning task in which a model is trained to estimate the synthesizer parameters that recreate a given target sound. One approach is to generate synthesizer sounds from random parameters and use them as training data. A model trained this way may estimate parameters well for synthesizer-generated sounds but poorly for other sounds, such as instrument sounds recorded with a microphone. Data augmentation methods can help overcome this problem.
This thesis compares two data augmentation methods in the context of synthesizer parameter estimation. The first is applied while the sound is being synthesized, by adding noise to the pitch and amplitude envelopes of the synthesizer. The second applies masking to the spectrogram of the sound, which is the input to the neural network. We compare four configurations: no augmentation, envelope augmentation only, spectrogram augmentation only, and both augmentations. Evaluation uses recorded instrument sounds and compares the spectrogram of the predicted sound with that of the target sound.
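The two augmentation methods described above can be illustrated with a minimal NumPy sketch. The abstract does not specify the noise distribution, the mask shapes, or any parameter values, so everything below is an assumption made for illustration: the noise is Gaussian, the masking zeroes one random frequency band and one random time band, and the helper names `add_envelope_noise` and `mask_spectrogram` are hypothetical and not taken from the thesis.

```python
import numpy as np


def add_envelope_noise(envelope, noise_std, rng, min_value=None):
    """Add Gaussian noise to a 1-D pitch or amplitude envelope.

    Assumption: the thesis does not state the noise type; Gaussian noise
    is used here only as an example.
    """
    noisy = envelope + rng.normal(0.0, noise_std, size=envelope.shape)
    if min_value is not None:
        # For example, keep an amplitude envelope non-negative.
        noisy = np.clip(noisy, min_value, None)
    return noisy


def mask_spectrogram(spec, max_freq_width, max_time_width, rng):
    """Zero one random frequency band and one random time band.

    `spec` has shape (n_freq_bins, n_time_frames). Assumption: one mask
    per axis with a uniformly random width; the thesis may differ.
    """
    masked = spec.copy()
    n_freq, n_time = masked.shape

    # Random band of frequency bins, at most max_freq_width bins wide.
    f_width = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f_start = int(rng.integers(0, n_freq - f_width + 1))
    masked[f_start:f_start + f_width, :] = 0.0

    # Random band of time frames, at most max_time_width frames wide.
    t_width = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t_start = int(rng.integers(0, n_time - t_width + 1))
    masked[:, t_start:t_start + t_width] = 0.0

    return masked


# Usage with placeholder data (not data from the thesis).
rng = np.random.default_rng(0)
amp_envelope = np.linspace(1.0, 0.0, 50)      # simple linear decay
noisy_envelope = add_envelope_noise(amp_envelope, 0.05, rng, min_value=0.0)

spectrogram = np.ones((64, 100))              # dummy (freq, time) input
masked_spectrogram = mask_spectrogram(spectrogram, 8, 10, rng)
```

In this sketch the envelope noise acts before synthesis and therefore changes the audio itself, whereas the masking acts only on the network input and leaves the target parameters untouched.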
The results show slightly better performance with spectrogram augmentation alone, but the difference is small and its significance is difficult to assess. The results do not clearly answer how the augmentation methods affect the model's performance, which suggests that further research with a different evaluation metric is needed.