Developing an online hate classifier for multiple social media platforms

Joni Salminen; Maximilian Hopf; Shammur A. Chowdhury; Soon-gyo Jung; Hind Almerekhi; Bernard Jansen

dc.contributor.author	Joni Salminen
dc.contributor.author	Maximilian Hopf
dc.contributor.author	Shammur A. Chowdhury
dc.contributor.author	Soon-gyo Jung
dc.contributor.author	Hind Almerekhi
dc.contributor.author	Bernard Jansen
dc.contributor.organization	fi=markkinointi|en=Marketing|
dc.contributor.organization-code	1.2.246.10.2458963.20.50826905346
dc.converis.publication-id	45063523
dc.converis.url	https://research.utu.fi/converis/portal/Publication/45063523
dc.date.accessioned	2022-10-28T12:20:11Z
dc.date.available	2022-10-28T12:20:11Z
dc.description.abstract	The proliferation of social media enables people to express their opinions widely online. However, at the same time, this has resulted in the emergence of conflict and hate, making online environments uninviting for users. Although researchers have found that hate is a problem across multiple platforms, there is a lack of models for online hate detection using multi-platform data. To address this research gap, we collect a total of 197,566 comments from four platforms: YouTube, Reddit, Wikipedia, and Twitter, with 80% of the comments labeled as non-hateful and the remaining 20% labeled as hateful. We then experiment with several classification algorithms (Logistic Regression, Naïve Bayes, Support Vector Machines, XGBoost, and Neural Networks) and feature representations (Bag-of-Words, TF-IDF, Word2Vec, BERT, and their combination). While all the models significantly outperform the keyword-based baseline classifier, XGBoost using all features performs the best (F1 = 0.92). Feature importance analysis indicates that BERT features are the most impactful for the predictions. Findings support the generalizability of the best model, as the platform-specific results from Twitter and Wikipedia are comparable to their respective source papers. We make our code publicly available for application in real software systems as well as for further development by online hate researchers.
dc.identifier.eissn	2192-1962
dc.identifier.jour-issn	2192-1962
dc.identifier.olddbid	175919
dc.identifier.oldhandle	10024/159013
dc.identifier.uri	https://www.utupub.fi/handle/11111/30086
dc.identifier.urn	URN:NBN:fi-fe2021042824119
dc.language.iso	en
dc.okm.affiliatedauthor	Salminen, Joni
dc.okm.discipline	113 Computer and information sciences	en_GB
dc.okm.discipline	113 Tietojenkäsittely ja informaatiotieteet	fi_FI
dc.okm.internationalcopublication	international co-publication
dc.okm.internationality	International publication
dc.okm.type	A1 ScientificArticle
dc.publisher	Springer
dc.publisher.country	Germany	en_GB
dc.publisher.country	Saksa	fi_FI
dc.publisher.country-code	DE
dc.relation.articlenumber	1
dc.relation.doi	10.1186/s13673-019-0205-6
dc.relation.ispartofjournal	Human-Centric Computing and Information Sciences
dc.relation.issue	1
dc.relation.volume	10
dc.source.identifier	https://www.utupub.fi/handle/10024/159013
dc.title	Developing an online hate classifier for multiple social media platforms
dc.year.issued	2020

Files

Name: s13673-019-0205-6.pdf
Size: 1.87 MB
Format: Adobe Portable Document Format
Description: Publisher's version

Collections

Self-archived publications (Rinnakkaistallenteet)