Sentiment Analysis on Twitter Data

Rahman, Anika; Ali, Ahamad

dc.contributor.author	Rahman, Anika
dc.contributor.author	Ali, Ahamad
dc.date.accessioned	2017-02-22T06:37:27Z
dc.date.available	2017-02-22T06:37:27Z
dc.date.issued	12/1/2016
dc.identifier.uri	http://dspace.ewubd.edu/handle/2525/2077
dc.description	This thesis was submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering of East West University, Dhaka, Bangladesh.	en_US
dc.description.abstract	Every day, millions of users share opinions on various topics through microblogging. Twitter is a very popular microblogging site where users are allowed a limit of 140 characters; this restriction forces users to be concise and expressive at the same time. For that reason it has become a rich source for sentiment analysis and belief mining, and for the same reason we became interested in working with Twitter data. The aim of this project is to develop a functional classifier that can accurately and automatically classify the sentiment of an unknown tweet. In this thesis, we propose two techniques to classify the sentiment label accurately: a sentiment classification algorithm (SCA) and a machine learning algorithm, SVM. For both SCA and SVM we calculate weights based on different features. In SCA, we build a feature vector for each tweet and form pairs of tweets using the different features. For each pair we measure the Euclidean distance between the tweets, and we classify a tweet using only the labels of its 8 nearest tweets. In SVM, on the other hand, we build a matrix from the weights calculated for the different features and apply PCA (principal component analysis) to find the k eigenvectors with the largest eigenvalues. On this transformed sample dataset we use grid search to find the best C and gamma for the SVM. Finally, we apply the SVM to assign a sentiment label to each tweet in the test dataset. For both algorithms we use a confusion matrix to calculate the accuracy. In our experiments, we found that SCA always performs better than SVM. We also evaluate the performance of the two techniques on datasets of different sizes.	en_US
dc.language.iso	en_US	en_US
dc.publisher	East West University	en_US
dc.relation.ispartofseries	;CSE00063
dc.subject	Sentiment Analysis on Twitter Data	en_US
dc.title	Sentiment Analysis on Twitter Data	en_US
dc.type	Thesis	en_US
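
The nearest-neighbour step that the abstract describes for SCA (measure Euclidean distances, then use the labels of the 8 nearest tweets) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the feature weights are computed or how the 8 labels are combined, so the function name `classify_by_nearest`, the two-dimensional example vectors, and the simple majority vote are all assumptions made for this sketch.

```python
from collections import Counter
from math import dist  # Euclidean distance, available since Python 3.8


def classify_by_nearest(train_vectors, train_labels, query, k=8):
    """Label `query` by majority vote among its k nearest training vectors.

    Hypothetical helper: the majority vote is an assumption, since the
    abstract only says that the labels of the 8 nearest tweets are used.
    """
    if len(train_vectors) != len(train_labels):
        raise ValueError("train_vectors and train_labels must have equal length")
    # Order the labelled training tweets by Euclidean distance to the query.
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    # Count the labels of the k closest tweets and return the most common one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up feature vectors standing in for per-tweet feature weights.
example_vectors = [
    (1.0, 1.0), (0.9, 1.0), (1.0, 0.9), (0.8, 0.9), (0.9, 0.8),
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.2, 0.1), (0.1, 0.2),
]
example_labels = ["positive"] * 5 + ["negative"] * 5

if __name__ == "__main__":
    print(classify_by_nearest(example_vectors, example_labels, (0.85, 0.9)))
```

With k fixed at 8, a query is labelled by whichever class contributes more of its 8 closest neighbours; when the vote is tied, this sketch returns the label of the nearer tweet, which is one possible convention among several.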