Developing an online hate classifier for multiple social media platforms

Maximilian Hopf; Shammur A. Chowdhury; Joni Salminen; Hind Almerekhi; Bernard Jansen; Soon-gyo Jung

Springer
doi:10.1186/s13673-019-0205-6
The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe2021042824119
Abstract

The proliferation of social media enables people to express their
opinions widely online. However, at the same time, this has resulted in
the emergence of conflict and hate, making online environments
uninviting for users. Although researchers have found that hate is a
problem across multiple platforms, there is a lack of models for online
hate detection using multi-platform data. To address this research gap,
we collect a total of 197,566 comments from four platforms: YouTube,
Reddit, Wikipedia, and Twitter, with 80% of the comments labeled as
non-hateful and the remaining 20% labeled as hateful. We then experiment
with several classification algorithms (Logistic Regression, Naïve
Bayes, Support Vector Machines, XGBoost, and Neural Networks) and
feature representations (Bag-of-Words, TF-IDF, Word2Vec, BERT, and their
combination). While all the models significantly outperform the
keyword-based baseline classifier, XGBoost using all features performs
the best (F1 = 0.92). Feature importance analysis indicates that BERT
features are the most impactful for the predictions. Findings support
the generalizability of the best model, as the platform-specific results
from Twitter and Wikipedia are comparable to their respective source
papers. We make our code publicly available for application in real
software systems as well as for further development by online hate
researchers.
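
The abstract compares its models against a keyword-based baseline and reports F1 scores. As a minimal sketch of what such a baseline and metric might look like, the Python example below flags a comment as hateful when it contains any word from a keyword list, then computes F1 for the hateful class. The keyword list, the toy comments, and the function names are hypothetical and chosen only for illustration. The abstract does not describe the paper's actual lexicon or data, nor does it say whether the reported F1 is computed for the hateful class or averaged over classes.

```python
import re

# Hypothetical keyword list (assumption): the abstract does not give the
# lexicon used by the paper's keyword-based baseline.
HATE_KEYWORDS = {"idiot", "scum", "trash"}


def keyword_baseline(comment):
    """Return 1 (hateful) if any keyword appears as a token, else 0."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return int(any(tok in HATE_KEYWORDS for tok in tokens))


def f1_score(y_true, y_pred):
    """F1 for the positive (hateful) class: harmonic mean of precision and recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labelled comments (assumption): (text, label) with 1 = hateful.
comments = [
    ("You are an idiot", 1),
    ("Thanks for sharing this video", 0),
    ("People like you are scum", 1),
    ("Get lost, nobody wants you here", 1),  # hateful, but contains no keyword
    ("This trash can is full", 0),           # keyword used innocently
    ("Great explanation, very clear", 0),
]

y_true = [label for _, label in comments]
y_pred = [keyword_baseline(text) for text, _ in comments]
print(f"baseline F1 = {f1_score(y_true, y_pred):.2f}")
```

The two commented toy examples show the typical failure modes of keyword matching: it misses hateful comments that avoid listed words and flags innocent uses of listed words. This is one plausible reason learned models could outperform such a baseline.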

Collections
  • Self-archived publications [19207]

Turku University Library | University of Turku
julkaisut@utu.fi | Privacy | Accessibility statement
 

 
