Master of Science (MS)
Electrical Engineering and Computer Science
Computer Sciences | Physical Sciences and Mathematics
Interactive websites generate terabytes of data every day. This data can be used in many analytical applications to teach computers more about human behavior, and text classification is one such application. Freely available user-generated text can be used to teach computers to identify the sentiment behind a user’s on-screen interactions without human intervention. Sentiment analysis is an interesting problem; solving it would, in theory, bring a computer closer to passing the Turing test. In this thesis, we test the ability of a classifier to accurately identify user sentiment. Rather than focusing on standard classification settings, we aim to train the classifier so that it is also effective at identifying the sentiment behind user-generated text from a completely new social media platform. To do this, we must first identify behavioral bias in user interactions on two different social media sites as well as on websites that accept user reviews. This bias must then be mitigated to obtain an unbiased classifier that can be used to identify user sentiment on any social media platform. For the research in this thesis, user-generated text is obtained from the social media sites Reddit and Twitter. We also obtain product review data for both books and wine. Various natural language processing techniques are then employed to process the data and extract similar and dissimilar trends. The vectorized user text is used to train sentiment classifiers. Finally, classification bias is identified and mitigated to obtain classifiers that can identify human sentiment in real time with improved accuracy and limited dependence on source information.
Deshpande, Alpana, "Sentiment Classification Bias In User Generated Content" (2021). Theses - ALL. 478.