Development of a speech emotion recognizer for large-scale child-centered audio recordings from a hospital environment

dc.contributor.authorVaaras Einari
dc.contributor.authorAhlqvist-Björkroth Sari
dc.contributor.authorDrossos Konstantinos
dc.contributor.authorLehtonen Liisa
dc.contributor.authorRäsänen Okko
dc.contributor.organizationfi=lastentautioppi|en=Paediatrics and Adolescent Medicine|
dc.contributor.organizationfi=psykologia|en=Psychology|
dc.contributor.organizationfi=tyks, vsshp|en=tyks, varha|
dc.contributor.organization-code1.2.246.10.2458963.20.15586825505
dc.contributor.organization-code1.2.246.10.2458963.20.40612039509
dc.converis.publication-id179057384
dc.converis.urlhttps://research.utu.fi/converis/portal/Publication/179057384
dc.date.accessioned2025-08-27T22:22:58Z
dc.date.available2025-08-27T22:22:58Z
dc.description.abstract<p>One approach to studying how early emotional experiences shape infant development is to analyze the emotional content of speech heard by infants, as captured by child-centered daylong recordings and analyzed by automatic speech emotion recognition (SER) systems. However, since large-scale daylong audio is initially unannotated and differs from typical speech corpora collected in controlled environments, there are no existing in-domain SER systems for the task. Based on the existing literature, it is also unclear what the best approach is for deploying a SER system in a new domain. Consequently, in this study, we investigated alternative strategies for deploying a SER system for large-scale child-centered audio recordings from a neonatal hospital environment, comparing cross-corpus generalization, active learning (AL), and domain adaptation (DA) methods in the process. We first conducted simulations with existing emotion-labeled speech corpora to find the best strategy for SER system deployment. We then tested how the findings generalize to our new, initially unannotated dataset.
We found that the studied AL method provided the most consistent results overall, being less dependent on the specifics of the training corpora or speech features than the alternative methods. However, when annotating data is not possible, unsupervised DA proved to be the best approach. We also observed that deploying a SER system for real-world daylong child-centered audio recordings achieved a SER performance level comparable to those reported in the literature, and that the amount of human effort required for system deployment was relatively modest.</p>
dc.format.pagerange22
dc.format.pagerange9
dc.identifier.eissn1872-7182
dc.identifier.jour-issn0167-6393
dc.identifier.olddbid202079
dc.identifier.oldhandle10024/185106
dc.identifier.urihttps://www.utupub.fi/handle/11111/45394
dc.identifier.urlhttps://doi.org/10.1016/j.specom.2023.02.001
dc.identifier.urnURN:NBN:fi-fe2023033033889
dc.language.isoen
dc.okm.affiliatedauthorAhlqvist-Björkroth, Sari
dc.okm.affiliatedauthorLehtonen, Liisa
dc.okm.affiliatedauthorDataimport, tyks, vsshp
dc.okm.discipline515 Psychologyen_GB
dc.okm.discipline515 Psykologiafi_FI
dc.okm.internationalcopublicationnot an international co-publication
dc.okm.internationalityInternational publication
dc.okm.typeA1 ScientificArticle
dc.publisherElsevier B.V.
dc.publisher.countryNetherlandsen_GB
dc.publisher.countryAlankomaatfi_FI
dc.publisher.country-codeNL
dc.relation.doi10.1016/j.specom.2023.02.001
dc.relation.ispartofjournalSpeech Communication
dc.relation.volume148
dc.source.identifierhttps://www.utupub.fi/handle/10024/185106
dc.titleDevelopment of a speech emotion recognizer for large-scale child-centered audio recordings from a hospital environment
dc.year.issued2023

Files

Name:
1-s2.0-S0167639323000262-main (1).pdf
Size:
1.25 MB
Format:
Adobe Portable Document Format