Comparison of Word and Character Level Information for Medical Term Identification Using Convolutional Neural Networks and Transformers

dc.contributor.author: Seneviratne Sandaru
dc.contributor.author: Lenskiy Artem
dc.contributor.author: Nolan Christopher
dc.contributor.author: Daskalaki Eleni
dc.contributor.author: Suominen Hanna
dc.contributor.organization: fi=tietotekniikan laitos|en=Department of Computing|
dc.contributor.organization-code: 1.2.246.10.2458963.20.85312822902
dc.converis.publication-id: 176265377
dc.converis.url: https://research.utu.fi/converis/portal/Publication/176265377
dc.date.accessioned: 2025-08-28T00:41:54Z
dc.date.available: 2025-08-28T00:41:54Z
dc.description.abstract: Complexity and domain-specificity make medical text hard to understand for patients and their next of kin. To simplify such text, this paper explored how word and character level information can be leveraged to identify medical terms when training data is limited. We created a dataset of medical and general terms using the Human Disease Ontology from BioPortal and Wikipedia pages. Our results from 10-fold cross validation indicated that convolutional neural networks (CNNs) and transformers perform competitively. The best F score of 93.9% was achieved by a CNN trained on both word and character level embeddings. Statistical significance tests demonstrated that general word embeddings provide rich word representations for medical term identification. Consequently, focusing on words is favorable for medical term identification if using deep learning architectures.
dc.format.pagerange: 249
dc.format.pagerange: 253
dc.identifier.issn: 0926-9630
dc.identifier.jour-issn: 0926-9630
dc.identifier.olddbid: 206227
dc.identifier.oldhandle: 10024/189254
dc.identifier.uri: https://www.utupub.fi/handle/11111/44610
dc.identifier.url: https://ebooks.iospress.nl/doi/10.3233/SHTI210717
dc.identifier.urn: URN:NBN:fi-fe2022091258506
dc.language.iso: en
dc.okm.affiliatedauthor: Suominen, Hanna
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 3112 Neurosciences (en_GB)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.discipline: 3112 Neurotieteet (fi_FI)
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.publisher.country: Netherlands (en_GB)
dc.publisher.country: Alankomaat (fi_FI)
dc.publisher.country-code: NL
dc.relation.conference: International Congress in Nursing Informatics
dc.relation.doi: 10.3233/SHTI210717
dc.relation.ispartofjournal: Studies in Health Technology and Informatics
dc.relation.ispartofseries: Studies in Health Technology and Informatics
dc.relation.volume: 284
dc.source.identifier: https://www.utupub.fi/handle/10024/189254
dc.title: Comparison of Word and Character Level Information for Medical Term Identification Using Convolutional Neural Networks and Transformers
dc.title.book: Nurses and Midwives in the Digital Age: Selected Papers, Posters and Panels from the 15th International Congress in Nursing Informatics
dc.year.issued: 2021

Files

Name: SHTI-284-SHTI210717.pdf
Size: 155.58 KB
Format: Adobe Portable Document Format