Automatic analysis of the emotional content of speech in daylong child-centered recordings from a neonatal intensive care unit

dc.contributor.author: Vaaras Einari
dc.contributor.author: Ahlqvist-Björkroth Sari
dc.contributor.author: Drossos Konstantinos
dc.contributor.author: Räsänen Okko
dc.contributor.organization: fi=lastentautioppi|en=Paediatrics and Adolescent Medicine|
dc.contributor.organization: fi=psykologia|en=Psychology|
dc.contributor.organization: fi=tyks, vsshp|en=tyks, varha|
dc.contributor.organization-code: 1.2.246.10.2458963.20.40612039509
dc.contributor.organization-code: 2603103
dc.contributor.organization-code: 2607313
dc.converis.publication-id: 68413743
dc.converis.url: https://research.utu.fi/converis/portal/Publication/68413743
dc.date.accessioned: 2022-10-28T14:09:30Z
dc.date.available: 2022-10-28T14:09:30Z
dc.description.abstract: <p>Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As part of this research, hundreds of hours of daylong recordings of preterm infants' audio environments were collected from two hospitals in Finland and Estonia in the context of the so-called APPLE study. In order to analyze the emotional content of speech in such a massive dataset, an automatic speech emotion recognition (SER) system is required. However, there are no emotion labels or existing in-domain SER systems available for this purpose. In this paper, we introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques for deploying a SER system to a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning in the task. As a result, we show that the best-performing models achieve a classification performance of 73.4% unweighted average recall (UAR) and 73.2% UAR for binary classification of valence and arousal, respectively. The results also show that active learning achieves the most consistent performance compared to the other two approaches.</p>
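The abstract reports results as unweighted average recall (UAR), i.e. the mean of per-class recalls, which weights each class equally regardless of how many samples it has. A minimal sketch of how that metric is computed (the label sequences below are illustrative, not taken from the dataset):

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recalls, so minority and majority
    classes contribute equally to the score."""
    correct = defaultdict(int)  # per-class correct predictions
    total = defaultdict(int)    # per-class sample counts
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Binary valence example (0 = negative, 1 = positive):
# class 0 recall = 3/4, class 1 recall = 1/2, UAR = 0.625
print(unweighted_average_recall([0, 0, 0, 0, 1, 1],
                                [0, 0, 0, 1, 1, 0]))  # 0.625
```

The same value is obtained with scikit-learn via `recall_score(y_true, y_pred, average='macro')`.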
dc.format.pagerange: 3380
dc.format.pagerange: 3384
dc.identifier.eisbn: 978-1-71383-690-2
dc.identifier.issn: 1990-9772
dc.identifier.jour-issn: 1990-9772
dc.identifier.olddbid: 186609
dc.identifier.oldhandle: 10024/169703
dc.identifier.uri: https://www.utupub.fi/handle/11111/39153
dc.identifier.url: https://urn.fi/URN:NBN:fi:tuni-202112038869
dc.identifier.urn: URN:NBN:fi-fe2022012811252
dc.language.iso: en
dc.okm.affiliatedauthor: Vaaras, Einari
dc.okm.affiliatedauthor: Ahlqvist-Björkroth, Sari
dc.okm.affiliatedauthor: Räsänen, Okko
dc.okm.affiliatedauthor: Dataimport, tyks, vsshp
dc.okm.discipline: 3123 Gynaecology and paediatrics (en_GB)
dc.okm.discipline: 515 Psychology (en_GB)
dc.okm.discipline: 3123 Naisten- ja lastentaudit (fi_FI)
dc.okm.discipline: 515 Psykologia (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.relation.conference: Annual Conference of the International Speech Communication Association
dc.relation.doi: 10.21437/Interspeech.2021-303
dc.relation.ispartofjournal: Interspeech
dc.relation.ispartofseries: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.relation.volume: 1
dc.source.identifier: https://www.utupub.fi/handle/10024/169703
dc.title: Automatic analysis of the emotional content of speech in daylong child-centered recordings from a neonatal intensive care unit
dc.title.book: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
dc.year.issued: 2021

Files

Name: vaaras21_interspeech.pdf
Size: 457.73 KB
Format: Adobe Portable Document Format