Automatic analysis of the emotional content of speech in daylong child-centered recordings from a neonatal intensive care unit

dc.contributor.author: Vaaras Einari
dc.contributor.author: Ahlqvist-Björkroth Sari
dc.contributor.author: Drossos Konstantinos
dc.contributor.author: Räsänen Okko
dc.contributor.organization: fi=lastentautioppi|en=Paediatrics and Adolescent Medicine|
dc.contributor.organization: fi=psykologia|en=Psychology|
dc.contributor.organization: fi=tyks, vsshp|en=tyks, varha|
dc.contributor.organization-code: 1.2.246.10.2458963.20.40612039509
dc.contributor.organization-code: 2603103
dc.contributor.organization-code: 2607313
dc.converis.publication-id: 68413743
dc.converis.url: https://research.utu.fi/converis/portal/Publication/68413743
dc.date.accessioned: 2022-10-28T14:09:30Z
dc.date.available: 2022-10-28T14:09:30Z
dc.description.abstract: <p>Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As part of this research, hundreds of hours of daylong recordings of preterm infants' audio environments were collected from two hospitals in Finland and Estonia in the context of the so-called APPLE study. In order to analyze the emotional content of speech in such a massive dataset, an automatic speech emotion recognition (SER) system is required. However, there are no emotion labels or existing in-domain SER systems available for this purpose. In this paper, we introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques for deploying a SER system to a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning in the task. As a result, we show that the best-performing models achieve a classification performance of 73.4% unweighted average recall (UAR) and 73.2% UAR for binary classification of valence and arousal, respectively. The results also show that active learning achieves the most consistent performance compared to the other two approaches.</p>
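The abstract reports results as unweighted average recall (UAR), i.e. the mean of per-class recalls, which weights each class equally regardless of how many samples it has. A minimal sketch of how that metric is computed (the label sequences below are illustrative, not taken from the dataset):

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recalls, so minority and majority
    classes contribute equally to the score."""
    correct = defaultdict(int)  # per-class correct predictions
    total = defaultdict(int)    # per-class sample counts
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Binary valence example (0 = negative, 1 = positive):
# class 0 recall = 3/4, class 1 recall = 1/2, UAR = 0.625
print(unweighted_average_recall([0, 0, 0, 0, 1, 1],
                                [0, 0, 0, 1, 1, 0]))  # 0.625
```

The same value is obtained with scikit-learn via `recall_score(y_true, y_pred, average='macro')`.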
dc.format.pagerange: 3380
dc.format.pagerange: 3384
dc.identifier.eisbn: 978-1-71383-690-2
dc.identifier.issn: 1990-9772
dc.identifier.jour-issn: 1990-9772
dc.identifier.olddbid: 186609
dc.identifier.oldhandle: 10024/169703
dc.identifier.uri: https://www.utupub.fi/handle/11111/39153
dc.identifier.url: https://urn.fi/URN:NBN:fi:tuni-202112038869
dc.identifier.urn: URN:NBN:fi-fe2022012811252
dc.language.iso: en
dc.okm.affiliatedauthor: Vaaras, Einari
dc.okm.affiliatedauthor: Ahlqvist-Björkroth, Sari
dc.okm.affiliatedauthor: Räsänen, Okko
dc.okm.affiliatedauthor: Dataimport, tyks, vsshp
dc.okm.discipline: 3123 Gynaecology and paediatrics (en_GB)
dc.okm.discipline: 515 Psychology (en_GB)
dc.okm.discipline: 3123 Naisten- ja lastentaudit (fi_FI)
dc.okm.discipline: 515 Psykologia (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.relation.conference: Annual Conference of the International Speech Communication Association
dc.relation.doi: 10.21437/Interspeech.2021-303
dc.relation.ispartofjournal: Interspeech
dc.relation.ispartofseries: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.relation.volume: 1
dc.source.identifier: https://www.utupub.fi/handle/10024/169703
dc.title: Automatic analysis of the emotional content of speech in daylong child-centered recordings from a neonatal intensive care unit
dc.title.book: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
dc.year.issued: 2021

Files

Name: vaaras21_interspeech.pdf
Size: 457.73 KB
Format: Adobe Portable Document Format