Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task

dc.contributor.author: Abeed Sarker
dc.contributor.author: Maksim Belousov
dc.contributor.author: Jasper Friedrichs
dc.contributor.author: Kai Hakala
dc.contributor.author: Svetlana Kiritchenko
dc.contributor.author: Farrokh Mehryary
dc.contributor.author: Sifei Han
dc.contributor.author: Tung Tran
dc.contributor.author: Anthony Rios
dc.contributor.author: Ramakanth Kavuluru
dc.contributor.author: Berry de Bruijn
dc.contributor.author: Filip Ginter
dc.contributor.author: Debanjan Mahata
dc.contributor.author: Saif M. Mohammad
dc.contributor.author: Goran Nenadic
dc.contributor.author: Graciela Gonzalez-Hernandez
dc.contributor.organization: fi=tietotekniikan laitos|en=Department of Computing|
dc.contributor.organization-code: 1.2.246.10.2458963.20.85312822902
dc.contributor.organization-code: 2610300
dc.converis.publication-id: 36971574
dc.converis.url: https://research.utu.fi/converis/portal/Publication/36971574
dc.date.accessioned: 2022-10-28T12:29:54Z
dc.date.available: 2022-10-28T12:29:54Z
dc.description.abstract: Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data.
Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3); and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks.
Results: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems.
Discussion: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1).
Conclusions: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).
dc.format.pagerange: 1274
dc.format.pagerange: 1283
dc.identifier.jour-issn: 1067-5027
dc.identifier.olddbid: 176840
dc.identifier.oldhandle: 10024/159934
dc.identifier.uri: https://www.utupub.fi/handle/11111/32453
dc.identifier.url: https://academic.oup.com/jamia/article/25/10/1274/5113021
dc.identifier.urn: URN:NBN:fi-fe2021042720279
dc.language.iso: en
dc.okm.affiliatedauthor: Hakala, Kai
dc.okm.affiliatedauthor: Mehryary, Farrokh
dc.okm.affiliatedauthor: Ginter, Filip
dc.okm.discipline: 515 Psychology (en_GB)
dc.okm.discipline: 515 Psykologia (fi_FI)
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: OXFORD UNIV PRESS
dc.publisher.country: United Kingdom (en_GB)
dc.publisher.country: Britannia (fi_FI)
dc.publisher.country-code: GB
dc.relation.doi: 10.1093/jamia/ocy114
dc.relation.ispartofjournal: Journal of the American Medical Informatics Association
dc.relation.issue: 10
dc.relation.volume: 25
dc.source.identifier: https://www.utupub.fi/handle/10024/159934
dc.title: Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task
dc.year.issued: 2018

Files

Name: Sarker_et_al_Data_and_systems_for_medication-related_text_classification_and_concept_normalization_from_Twitter.pdf
Size: 558.56 KB
Format: Adobe Portable Document Format
Description: Publisher's PDF