Comparison of Word and Character Level Information for Medical Term Identification Using Convolutional Neural Networks and Transformers

Seneviratne Sandaru; Lenskiy Artem; Nolan Christopher; Daskalaki Eleni; Suominen Hanna


dc.contributor.author	Seneviratne Sandaru
dc.contributor.author	Lenskiy Artem
dc.contributor.author	Nolan Christopher
dc.contributor.author	Daskalaki Eleni
dc.contributor.author	Suominen Hanna
dc.contributor.organization	Department of Computing
dc.contributor.organization-code	1.2.246.10.2458963.20.85312822902
dc.converis.publication-id	176265377
dc.converis.url	https://research.utu.fi/converis/portal/Publication/176265377
dc.date.accessioned	2025-08-28T00:41:54Z
dc.date.available	2025-08-28T00:41:54Z
dc.description.abstract	Complexity and domain-specificity make medical text hard to understand for patients and their next of kin. To simplify such text, this paper explored how word- and character-level information can be leveraged to identify medical terms when training data is limited. We created a dataset of medical and general terms using the Human Disease Ontology from BioPortal and Wikipedia pages. Our results from 10-fold cross-validation indicated that convolutional neural networks (CNNs) and transformers perform competitively. The best F score of 93.9% was achieved by a CNN trained on both word- and character-level embeddings. Statistical significance tests demonstrated that general word embeddings provide rich word representations for medical term identification. Consequently, focusing on words is favorable for medical term identification when using deep learning architectures.
dc.format.pagerange	253
dc.identifier.issn	0926-9630
dc.identifier.jour-issn	0926-9630
dc.identifier.olddbid	206227
dc.identifier.oldhandle	10024/189254
dc.identifier.uri	https://www.utupub.fi/handle/11111/44610
dc.identifier.url	https://ebooks.iospress.nl/doi/10.3233/SHTI210717
dc.identifier.urn	URN:NBN:fi-fe2022091258506
dc.language.iso	en
dc.okm.affiliatedauthor	Suominen, Hanna
dc.okm.discipline	113 Computer and information sciences	en_GB
dc.okm.internationalcopublication	international co-publication
dc.okm.internationality	International publication
dc.okm.type	A4 Conference Article
dc.publisher.country	Netherlands	en_GB
dc.publisher.country-code	NL
dc.relation.conference	International Congress in Nursing Informatics
dc.relation.doi	10.3233/SHTI210717
dc.relation.ispartofjournal	Studies in Health Technology and Informatics
dc.relation.ispartofseries	Studies in Health Technology and Informatics
dc.relation.volume	284
dc.source.identifier	https://www.utupub.fi/handle/10024/189254
dc.title	Comparison of Word and Character Level Information for Medical Term Identification Using Convolutional Neural Networks and Transformers
dc.title.book	Nurses and Midwives in the Digital Age: Selected Papers, Posters and Panels from the 15th International Congress in Nursing Informatics
dc.year.issued	2021
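The record contains no code. As a rough illustration of how character-level information, 10-fold cross-validation, and the F score from the abstract fit together, here is a minimal sketch under loudly stated assumptions: it is not the authors' pipeline, it replaces their CNNs and transformers with a hypothetical character-trigram nearest-profile classifier, it does not model word-level embeddings at all, and the toy terms are invented for illustration only. All function names are hypothetical.

```python
from collections import Counter
import math
import random

# Hypothetical stand-in for the paper's models: NOT a CNN or transformer.
# Each term is represented only by its character trigrams (character-level info).


def char_ngrams(term, n=3):
    """Character n-grams of a lowercased term, padded with boundary markers."""
    padded = f"<{term.lower()}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def profile(terms, n=3):
    """Aggregate character n-gram counts over a list of terms."""
    counts = Counter()
    for t in terms:
        counts.update(char_ngrams(t, n))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(v * b[g] for g, v in a.items() if g in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle indices once, then split them into k disjoint test folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]


def f_score(y_true, y_pred, positive=1):
    """Binary F score (harmonic mean of precision and recall) for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # also covers folds with no positive terms
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validate(terms, labels, k=10, seed=0):
    """Mean per-fold F score of the nearest-profile classifier (1 = medical)."""
    scores = []
    for train, test in k_fold_indices(len(terms), k, seed):
        medical = profile([terms[i] for i in train if labels[i] == 1])
        general = profile([terms[i] for i in train if labels[i] == 0])
        preds = []
        for i in test:
            p = Counter(char_ngrams(terms[i]))
            preds.append(1 if cosine(p, medical) > cosine(p, general) else 0)
        scores.append(f_score([labels[i] for i in test], preds))
    return sum(scores) / len(scores)


# Invented toy data for illustration only; not the paper's dataset.
terms = ["hepatitis", "nephritis", "gastritis", "cardiomyopathy",
         "window", "garden", "bicycle", "kitchen"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(cross_validate(terms, labels, k=4))
```

With so few toy terms the sketch uses `k=4` rather than 10; the abstract's setting corresponds to `k=10` on a much larger labeled set. Averaging per-fold F scores is one common convention; pooling predictions across folds before scoring is another, and the record does not say which the authors used.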

Files


Name:: SHTI-284-SHTI210717.pdf
Size:: 155.58 KB
Format:: Adobe Portable Document Format


Collections

Rinnakkaistallenteet (self-archived publications)