Wide-scope biomedical named entity recognition and normalization with CRFs, fuzzy matching and character level modeling

dc.contributor.author: Suwisa Kaewphan
dc.contributor.author: Kai Hakala
dc.contributor.author: Niko Miekka
dc.contributor.author: Tapio Salakoski
dc.contributor.author: Filip Ginter
dc.contributor.organization: fi=kieli- ja puheteknologia|en=Language and Speech Technology|
dc.contributor.organization: fi=tietojenkäsittelytiede|en=Computer Science|
dc.contributor.organization-code: 1.2.246.10.2458963.20.47465613983
dc.contributor.organization-code: 2606803
dc.contributor.organization-code: 2606805
dc.converis.publication-id: 35859071
dc.converis.url: https://research.utu.fi/converis/portal/Publication/35859071
dc.date.accessioned: 2022-10-28T12:37:16Z
dc.date.available: 2022-10-28T12:37:16Z
dc.description.abstract: We present a system for automatically identifying a multitude of biomedical entities from the literature. This work is based on our previous efforts in the BioCreative VI: Interactive Bio-ID Assignment shared task in which our system demonstrated state-of-the-art performance with the highest achieved results in named entity recognition. In this paper we describe the original conditional random field-based system used in the shared task as well as experiments conducted since, including better hyperparameter tuning and character level modeling, which led to further performance improvements. For normalizing the mentions into unique identifiers we use fuzzy character n-gram matching. The normalization approach has also been improved with a better abbreviation resolution method and stricter guideline compliance resulting in vastly improved results for various entity types. All tools and models used for both named entity recognition and normalization are publicly available under open license.
dc.format.pagerange: 1
dc.format.pagerange: 10
dc.identifier.eissn: 1758-0463
dc.identifier.jour-issn: 1758-0463
dc.identifier.olddbid: 177737
dc.identifier.oldhandle: 10024/160831
dc.identifier.uri: https://www.utupub.fi/handle/11111/34393
dc.identifier.url: https://academic.oup.com/database/article/doi/10.1093/database/bay096/5101499
dc.identifier.urn: URN:NBN:fi-fe2021042719760
dc.language.iso: en
dc.okm.affiliatedauthor: Kaewphan, Suwisa
dc.okm.affiliatedauthor: Hakala, Kai
dc.okm.affiliatedauthor: Miekka, Niko
dc.okm.affiliatedauthor: Salakoski, Tapio
dc.okm.affiliatedauthor: Ginter, Filip
dc.okm.discipline: 113 Computer and information sciences [en_GB]
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet [fi_FI]
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Oxford University Press
dc.publisher.country: United Kingdom [en_GB]
dc.publisher.country: Britannia [fi_FI]
dc.publisher.country-code: GB
dc.relation.doi: 10.1093/database/bay096
dc.relation.ispartofjournal: Database: The Journal of Biological Databases and Curation
dc.relation.volume: 2018
dc.source.identifier: https://www.utupub.fi/handle/10024/160831
dc.title: Wide-scope biomedical named entity recognition and normalization with CRFs, fuzzy matching and character level modeling
dc.year.issued: 2018
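
The abstract in this record mentions normalizing mentions to identifiers with fuzzy character n-gram matching. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below compares a mention to lexicon names by Jaccard overlap of character trigrams. The similarity measure, the threshold, the function names and the toy lexicon with its made-up identifiers are all assumptions introduced for this example.

```python
from typing import Dict, Optional, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a lowercased string.

    The string is padded with n-1 spaces on each side so that word
    boundaries also contribute n-grams.
    """
    pad = " " * (n - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def normalize(
    mention: str,
    lexicon: Dict[str, str],
    n: int = 3,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Optional[Tuple[str, float]]:
    """Return (identifier, score) of the best lexicon match, or None
    if no lexicon name reaches the similarity threshold."""
    mention_grams = char_ngrams(mention, n)
    best: Optional[Tuple[str, float]] = None
    for name, identifier in lexicon.items():
        score = jaccard(mention_grams, char_ngrams(name, n))
        if score >= threshold and (best is None or score > best[1]):
            best = (identifier, score)
    return best


# Hypothetical toy lexicon: names mapped to made-up identifiers.
LEXICON = {
    "tumor necrosis factor": "ID:0001",
    "interleukin 6": "ID:0002",
    "insulin receptor": "ID:0003",
}

if __name__ == "__main__":
    # A spelling variant still maps to the same hypothetical identifier.
    print(normalize("tumour necrosis factor", LEXICON))
    # An unrelated string falls below the threshold and yields None.
    print(normalize("zzzz", LEXICON))
```

Set-based overlap is used here because it needs only the standard library; a production system would more plausibly index the lexicon and weight n-grams rather than scan every name.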

Files

Name: bay096.pdf
Size: 867.36 KB
Format: Adobe Portable Document Format
Description: Publisher's PDF