Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task

Abeed Sarker; Maksim Belousov; Jasper Friedrichs; Kai Hakala; Svetlana Kiritchenko; Farrokh Mehryary; Sifei Han; Tung Tran; Anthony Rios; Ramakanth Kavuluru; Berry de Bruijn; Filip Ginter; Debanjan Mahata; Saif M. Mohammad; Goran Nenadic; Graciela Gonzalez-Hernandez

dc.contributor.author	Abeed Sarker
dc.contributor.author	Maksim Belousov
dc.contributor.author	Jasper Friedrichs
dc.contributor.author	Kai Hakala
dc.contributor.author	Svetlana Kiritchenko
dc.contributor.author	Farrokh Mehryary
dc.contributor.author	Sifei Han
dc.contributor.author	Tung Tran
dc.contributor.author	Anthony Rios
dc.contributor.author	Ramakanth Kavuluru
dc.contributor.author	Berry de Bruijn
dc.contributor.author	Filip Ginter
dc.contributor.author	Debanjan Mahata
dc.contributor.author	Saif M. Mohammad
dc.contributor.author	Goran Nenadic
dc.contributor.author	Graciela Gonzalez-Hernandez
dc.contributor.organization	fi=tietotekniikan laitos\|en=Department of Computing\|
dc.contributor.organization-code	2610300
dc.converis.publication-id	36971574
dc.converis.url	https://research.utu.fi/converis/portal/Publication/36971574
dc.date.accessioned	2022-10-28T12:29:54Z
dc.date.available	2022-10-28T12:29:54Z
dc.description.abstract	Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data.
Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3); and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks.
Results: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems.
Discussion: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1).
Conclusions: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).
dc.format.pagerange	1274-1283
dc.identifier.jour-issn	1067-5027
dc.identifier.olddbid	176840
dc.identifier.oldhandle	10024/159934
dc.identifier.uri	https://www.utupub.fi/handle/11111/32453
dc.identifier.url	https://academic.oup.com/jamia/article/25/10/1274/5113021
dc.identifier.urn	URN:NBN:fi-fe2021042720279
dc.language.iso	en
dc.okm.affiliatedauthor	Hakala, Kai
dc.okm.affiliatedauthor	Mehryary, Farrokh
dc.okm.affiliatedauthor	Ginter, Filip
dc.okm.discipline	515 Psychology	en_GB
dc.okm.discipline	515 Psykologia	fi_FI
dc.okm.internationalcopublication	international co-publication
dc.okm.internationality	International publication
dc.okm.type	A1 ScientificArticle
dc.publisher	OXFORD UNIV PRESS
dc.publisher.country	United Kingdom	en_GB
dc.publisher.country	Britannia	fi_FI
dc.publisher.country-code	GB
dc.relation.doi	10.1093/jamia/ocy114
dc.relation.ispartofjournal	Journal of the American Medical Informatics Association
dc.relation.issue	10
dc.relation.volume	25
dc.source.identifier	https://www.utupub.fi/handle/10024/159934
dc.title	Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task
dc.year.issued	2018

Files

Name: Sarker_et_al_Data_and_systems_for_medication-related_text_classification_and_concept_normalization_from_Twitter.pdf
Size: 558.56 KB
Format: Adobe Portable Document Format
Description: Publisher's PDF

Collections

Self-archived publications