EWU Institutional Repository

Sentiment Analysis on Twitter Data


dc.contributor.author Rahman, Anika
dc.contributor.author Ali, Ahamad
dc.date.accessioned 2017-02-22T06:37:27Z
dc.date.available 2017-02-22T06:37:27Z
dc.date.issued 12/1/2016
dc.identifier.uri http://dspace.ewubd.edu/handle/2525/2077
dc.description This thesis was submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering of East West University, Dhaka, Bangladesh. en_US
dc.description.abstract Every day, millions of users share opinions on various topics through microblogging. Twitter is a very popular microblogging site where users are allowed a limit of 140 characters; this restriction forces users to be concise and expressive at the same time. For that reason, it has become a rich source for sentiment analysis and belief mining, which is why we became interested in working with Twitter data. The aim of this project is to develop a functional classifier that can accurately and automatically classify the sentiment of an unknown tweet. In this thesis, we propose two techniques to classify the sentiment label accurately: a sentiment classification algorithm (SCA) and a machine learning algorithm, SVM. For both SCA and SVM, we calculate weights based on different features. In SCA, we build feature vectors and form pairs of tweets using the different features. From those pairs, we measure the Euclidean distance between every tweet and its paired tweets. From those distances, we consider only the labels of the nearest 8 tweets to classify that tweet. In SVM, on the other hand, we build a matrix from the calculated weights and, by applying PCA (principal component analysis), try to find the k eigenvectors with the largest eigenvalues. From this transformed sample dataset, we use grid search to find the best C and best gamma for the SVM. Finally, we apply SVM to assign the sentiment label of each tweet in the test dataset. For both algorithms, we use a confusion matrix to calculate the accuracy. In our results, we found that SCA always performs better than SVM. We also evaluate the performance of these two techniques for different dataset sizes. en_US
dc.language.iso en_US en_US
dc.publisher East West University en_US
dc.relation.ispartofseries ;CSE00063
dc.subject Sentiment Analysis on Twitter Data en_US
dc.title Sentiment Analysis on Twitter Data en_US
dc.type Thesis en_US
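
The nearest-neighbour step and the confusion-matrix accuracy described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' SCA: the abstract does not say how the 8 nearest labels are combined, so a simple majority vote is assumed here, and the feature vectors, labels, and function names (`classify_nearest`, `confusion_matrix`, `accuracy`) are hypothetical placeholders.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_nearest(query, labeled, k=8):
    """Label `query` from its k nearest labeled tweets.

    `labeled` is a list of (feature_vector, label) pairs.
    ASSUMPTION: the k labels are combined by majority vote; the abstract
    only says the labels of the nearest 8 tweets are considered.
    """
    nearest = sorted(labeled, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix(actual, predicted, labels):
    """Nested dict: matrix[actual_label][predicted_label] -> count."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def accuracy(matrix):
    """Share of predictions on the diagonal of the confusion matrix."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total if total else 0.0
```

With real data, each feature vector would hold the per-feature weights computed for a tweet, and `k=8` matches the neighbourhood size given in the abstract.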

