Twitter Sentiment Analysis on Product Reviews : A comparison between two machine learning techniques
Velzeboer, Emily (2019-08-23)
This publication is subject to copyright. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Open access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe202001243243
Abstract
We live in an era in which large amounts of information, so-called Big Data, are exploited to find patterns and trends and to make predictions.
Sentiment analysis deals with analyzing large amounts of text in order to capture the mood and feeling about something, e.g., a product or an event. In particular, this thesis applies sentiment analysis techniques to classify product reviews on the Twitter social media platform, with the goal of understanding the general feeling about a newly launched product. The work focuses on a recently launched line of running shoes by the American sports equipment manufacturer Nike Inc.
The methodology used in this work is rooted in the field of machine learning: the data is first preprocessed to make it easier to exploit, and standard algorithms such as the Naïve Bayes classifier and the Support Vector Machine (SVM) are then used to train classifiers and make predictions.
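The preprocess-then-classify pipeline described above can be sketched as follows. This is a minimal, standard-library-only illustration, not the thesis's implementation: the preprocessing rules (lowercasing, removing URLs and @mentions, dropping `#` from hashtags), the toy training tweets, and the names `preprocess`, `NaiveBayes`, and `LinearSVM` are all assumptions. Naïve Bayes is shown in its multinomial form with add-one smoothing; the SVM is a linear SVM trained with Pegasos-style stochastic subgradient descent on the hinge loss, without a bias term.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed preprocessing rules (illustrative only, not taken from the thesis).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def preprocess(tweet):
    """Lowercase, strip URLs and @mentions, keep hashtag words, tokenize."""
    text = URL_RE.sub(" ", tweet.lower())
    text = MENTION_RE.sub(" ", text).replace("#", " ")
    return TOKEN_RE.findall(text)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        v = len(self.vocab)

        def log_posterior(c):
            score = self.log_prior[c]
            for t in tokens:
                if t in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            return score

        return max(self.classes, key=log_posterior)


class LinearSVM:
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    stochastic subgradient descent; no bias term, fixed example order."""

    def __init__(self, lam=0.1, epochs=10):
        self.lam, self.epochs = lam, epochs

    def _score(self, features):
        return sum(self.w.get(k, 0.0) * v for k, v in features.items())

    def fit(self, docs, labels):
        self.w = defaultdict(float)
        t = 0
        for _ in range(self.epochs):
            for tokens, label in zip(docs, labels):
                t += 1
                y = 1 if label == "positive" else -1
                x = Counter(tokens)  # bag-of-words term counts
                eta = 1.0 / (self.lam * t)
                violated = y * self._score(x) < 1  # inside the margin?
                for k in list(self.w):  # step on the L2 regularizer
                    self.w[k] *= 1 - eta * self.lam
                if violated:  # step on the hinge loss
                    for k, v in x.items():
                        self.w[k] += eta * y * v
        return self

    def predict(self, tokens):
        return "positive" if self._score(Counter(tokens)) >= 0 else "negative"


# Toy "generic" training tweets, made up for illustration.
train = [
    ("@friend I love these shoes, so comfortable! https://example.com", "positive"),
    ("Great run today, feeling amazing", "positive"),
    ("What a wonderful day", "positive"),
    ("I hate waiting, this is terrible", "negative"),
    ("Awful service and bad quality", "negative"),
    ("Worst day ever, so sad", "negative"),
]
docs = [preprocess(text) for text, _ in train]
labels = [label for _, label in train]

nb = NaiveBayes().fit(docs, labels)
svm = LinearSVM().fit(docs, labels)
for review in ["Love my new #shoes, amazing!", "Terrible quality, I hate it"]:
    tokens = preprocess(review)
    print(review, "->", nb.predict(tokens), svm.predict(tokens))
```

In practice, library implementations such as scikit-learn's `MultinomialNB` and `LinearSVC` would normally replace these hand-written classifiers; they are spelled out here only to make the two models explicit.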
Beyond the comparison drawn between several techniques, the originality of this work lies in showing that a generic training set gives good results when used to train classifiers for the specific problem of classifying product reviews. In other words, this work shows that the text elements that define and identify sentiment are constant across different use cases, and that knowledge extracted from generic texts can be successfully exploited for a specific problem.
Moreover, this work presents an innovative treatment of neutral tweets, i.e., those that express neither a positive nor a negative feeling. Although the training set contains only positive and negative samples and no neutral-specific training is performed, the classifier is shown to identify neutral tweets in many instances. This is achieved with a different approach, based on the subjectivity of the text itself, under the assumption that a text that conveys a feeling will be subjective, whereas a neutral text will tend to be more objective.
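The abstract does not say how subjectivity is measured, so the following is only a minimal sketch of the idea under stated assumptions: subjectivity is approximated as the fraction of tokens found in a small, hypothetical opinion lexicon (`SUBJECTIVE_WORDS`), texts scoring below an arbitrary `threshold` of 0.1 are labeled neutral, and everything else is passed to a binary positive/negative classifier. The functions `subjectivity`, `classify_with_neutral`, and `toy_polarity` are illustrative names; `toy_polarity` merely stands in for a trained Naïve Bayes or SVM model.

```python
import re

# Hypothetical subjectivity lexicon (illustrative; a real system would use
# a much larger resource or a trained subjectivity model).
SUBJECTIVE_WORDS = {
    "love", "hate", "great", "awful", "amazing", "terrible", "comfortable",
    "bad", "best", "worst", "wonderful", "disappointed", "happy", "sad",
}
NEGATIVE_WORDS = {"hate", "awful", "terrible", "bad", "worst", "disappointed", "sad"}
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def subjectivity(text):
    """Fraction of tokens found in the subjectivity lexicon (0.0 to 1.0)."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        return 0.0
    return sum(t in SUBJECTIVE_WORDS for t in tokens) / len(tokens)


def classify_with_neutral(text, polarity_fn, threshold=0.1):
    """Return 'neutral' for text that looks objective; otherwise defer to a
    binary classifier that was trained on positive/negative data only."""
    if subjectivity(text) < threshold:
        return "neutral"
    return polarity_fn(text)


def toy_polarity(text):
    """Stand-in for a trained binary classifier (hypothetical)."""
    tokens = TOKEN_RE.findall(text.lower())
    return "negative" if any(t in NEGATIVE_WORDS for t in tokens) else "positive"


for tweet in [
    "Nike releases new running shoe line today",
    "I love the new Nike running shoes",
    "Terrible fit, really disappointed",
]:
    print(tweet, "->", classify_with_neutral(tweet, toy_polarity))
```

The design choice this illustrates is that the neutral class never needs its own training examples: the subjectivity gate runs first, and the polarity classifier is only consulted for texts that appear to carry an opinion.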