EFFICIENT WORD SENSE DISAMBIGUATION TECHNIQUE FOR SENTENCE LEVEL SENTIMENT CLASSIFICATION OF ONLINE REVIEWS.
ABSTRACT: Extracting the intended sense of words from text has a long history in computational linguistics, and it is especially important for sentiment analysis, where the ambiguous senses of words make classification considerably harder. In this work we propose a new method of word sense disambiguation (WSD) that uses a matrix map of the semantic scores that SentiWordNet assigns to the terms of WordNet glosses. The correct sense of a target word is the sense whose WordNet gloss matrix is most similar to the context matrix. Our empirical results show that the proposed method improves sentence-level sentiment classification on datasets from different domains, achieving an accuracy of 90.71% on online reviews.
KEYWORDS: Sentiment Analysis, Sentiment Classification, Word Sense Disambiguation, WordNet, SentiWordNet
The rapid growth of the World Wide Web and of electronic documents has made it essential to classify text documents, such as online customer reviews, in order to organize and exploit the new information they contain.
Unstructured web text is useful in many walks of life, for example in decision making and in planning market strategies. For this reason researchers have taken a growing interest in sentiment analysis of online text.
Nowadays people are enthusiastic about online discussion and communication. They use blogs and web forums as social media, and as a result a large amount of data has been generated on the Internet. These data express people's opinions about products, events, topics, etc., which can support decision making in a particular area.
People express opinions for and against individuals, organizations and government authorities, and they even organize protests and other disruptive demonstrations. For example, in the United States the Occupy Wall Street movement of 2011-2012 was largely coordinated through social media such as Facebook, Twitter and blogs.
Similarly, in the UK, riots against the police were organized through social media, and in Libya many people joined the revolution because social networks delivered every update to their homes and to the palms of their hands. People comment on and discuss every aspect of a subject and share their opinions openly.
According to the literature, over 45,000 new blogs and 1.2 million new posts are created every day. Moreover, 40% of people rely on opinions, reviews and recommendations gathered from blogs, forums and similar resources. These data change rapidly because information on the Web is updated around the clock. For instance, a survey of more than 2,000 Americans found that 81% of Internet users have searched online for a particular product at least once.
Between 43% and 84% of Internet users who read online reviews of hotels, restaurants and services such as medical care or travel agencies report that these reviews significantly influence their purchase choices; 32% have rated a product, person or service using an online rating system, and 30% (including 18% of online senior citizens) have posted an online review or comment about a product or service. Sentiment classification techniques can be used to identify and understand the opinions in such text, which mostly express one of three types of sentiment: "positive", "negative" or "neutral". They also let us decide whether a text is subjective or objective. Previous research has mainly used two techniques for sentiment classification: semantic orientation and machine learning.
Sentiment analysis of online text has many applications, for example product reviews, political reviews and sports reviews. Product reviews give feedback that guides both the product owner and customers; similarly, the comments political leaders receive through social media such as Facebook, Twitter and blogs reflect their popularity.
Two lexical resources are commonly used to detect sentiment in text: SentiWordNet and WordNet. WordNet lists the senses of each English word, and SentiWordNet assigns positive and negative sentiment scores to each of these WordNet senses.
A word can have many meanings depending on its context. Word Sense Disambiguation (WSD) is the Natural Language Processing (NLP) task of extracting the correct meaning of a word in its context, and it is an important task in many application areas such as information retrieval (IR) and sentiment classification. WSD methods are mostly divided into two categories: corpus-based and knowledge-based. Corpus-based methods have been found more effective than knowledge-based methods because the correct senses of each ambiguous word are collected in an annotated corpus. These methods use supervised learning: a classifier is trained on sense-annotated words and then predicts the correct sense of an ambiguous word.
Knowledge-based methods mostly use unsupervised techniques that rely on external resources such as lexical databases or dictionaries rather than on annotated corpora.
In this article we focus on knowledge-based methods, since our goal is to improve sentiment analysis of online text by extracting the correct sense of each word and thereby eliminating ambiguity. Similarity-based methods, one of the two families of knowledge-based methods, compute the similarity between each sense of a word and its context; the sense with the greatest similarity is taken to be the correct one.
This article makes several contributions to sentiment analysis. (1) We propose a domain-independent, sentence-level, rule-based method for classifying the sentiment of online text and reviews, in which each review is split into sentences and each sentence is checked semantically according to its context. (2) We collect a list of short abbreviated words from social media and online blogs such as Facebook, Twitter and Flickr, together with the original words they stand for, which minimizes the loss of information. (3) We use SentiWordNet as a lexical resource for obtaining the positive and negative sentiment scores that drive the decision-making process. (4) We use the WordNet database to extract word senses according to the context from WordNet glosses. (5) Many words can shift the context of the words around them, so the scores extracted from SentiWordNet must be updated; for this purpose we include a knowledge base of context shifters. (6) Similarly, some words increase or decrease the polarity of a word, so we collect lists of enhancers and reducers and use them as a knowledge base. (7) A single word can have many senses with different positive or negative scores, which creates ambiguity; the problem is to extract the correct sense automatically, and after WSD the results become more accurate and more useful for decision making. (8) To solve this problem we propose a new similarity-based method for word sense disambiguation, called the matrix map.
The remainder of the paper is organized as follows: Section II reviews the literature and related work, Section III explains the proposed work in detail, Section IV presents the results and discussion, and Section V concludes the paper and outlines future work.
II BACKGROUND AND RELATED WORK
In the early days of sentiment analysis, in the late 1990s, researchers focused on subjectivity detection and later worked on related areas such as narrative and the interpretation of metaphors. Subjectivity detection is discussed below. With the rapid growth of the World Wide Web and Internet usage, the Web became a major source of information, so research gradually moved from subjectivity analysis toward sentiment analysis of online text. Sentiment analysis deals with the opinions people express through online social media, web forums and customer reviews of products. People describe their opinions about online services, e-shopping, e-banking, etc. on websites and forums such as Twitter, Flickr, LinkedIn and CNET, and the information extracted from these sources helps customers and companies make decisions and plan market strategies.
In sentence-level text classification, the text is split into sentences and each sentence is checked semantically by lexical or statistical NLP techniques. First, a sentence is classified as subjective or objective; subjective sentences are then classified as positive, negative or neutral.
One method tags words along three semantic dimensions using WordNet synonymy relations: the shortest WordNet path for a particular word is calculated and used to assign the word a positive or negative weight.
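The shortest-path idea can be illustrated with a small sketch. It assumes a hand-made toy synonymy graph in place of WordNet and uses the relative-distance measure of Kamps et al. [23], where the orientation of a word w is (d(w, bad) - d(w, good)) / d(good, bad). Only the evaluative (good/bad) dimension is shown, and the graph, words and function names are hypothetical.

```python
from collections import deque

# Hypothetical toy synonymy graph standing in for WordNet synonymy links (symmetric).
SYNONYMS = {
    "good": ["fine", "decent"],
    "fine": ["good", "nice", "okay"],
    "decent": ["good", "okay"],
    "nice": ["fine"],
    "okay": ["fine", "decent", "mediocre"],
    "mediocre": ["okay", "poor"],
    "poor": ["mediocre", "bad"],
    "bad": ["poor", "awful"],
    "awful": ["bad"],
}


def distance(graph, source, target):
    """Length of the shortest synonymy path from source to target (breadth-first search)."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        word, dist = queue.popleft()
        if word == target:
            return dist
        for nxt in graph.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no path between the two words


def orientation(graph, word):
    """Kamps-style orientation: > 0 leans positive, < 0 leans negative."""
    d_good = distance(graph, word, "good")
    d_bad = distance(graph, word, "bad")
    d_ref = distance(graph, "good", "bad")
    if None in (d_good, d_bad, d_ref):
        return 0.0  # unreachable words are treated as neutral
    return (d_bad - d_good) / d_ref
```

With this toy graph, nice scores 0.6, okay 0.2 and awful -1.0; a word with no path to either reference word is treated as neutral.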
Dictionary-based methods use references and lexicographical resources such as WordNet to assign sentiment to words. Most of these methods exploit relationships between words (synonymy, antonymy, hypernymy and hyponymy) to find words in a list.
Some recent methods perform word-level sentiment classification using dictionary definitions, and various resources such as ConceptNet and SentiWordNet are used for lexicon-based semantic orientation. Sentence-level sentiment classification employing WordNet relations was first proposed in [24, 25].
For sentence-level sentiment analysis, text documents or reviews are broken down into sentences, and each sentence is evaluated with lexical or statistical methods to determine its semantic orientation. This process involves two functions: first, determining whether a sentence is subjective or objective, and second, determining the opinion orientation of the subjective sentences. Some existing work performs the analysis at different levels, in particular at the level of individual opinion words and at the phrase level. The semantic orientation of words and phrases can then be accumulated to find the overall semantic orientation of a sentence or review [28, 29].
Word Sense Disambiguation has been widely studied in Natural Language Processing (NLP), and the literature describes WSD in detail. Similarity-based methods are inspired by two linguistic distributional hypotheses: first, that words appearing in similar contexts have similar meanings, and second, that the original senses of words can be extracted from dictionary definitions. Using these hypotheses, the first such WSD algorithm determined the sense of a polysemous word by counting the words its glosses share with the glosses of one or more other words. A more recent approach extracts the correct sense from WordNet glosses by constructing a context vector.
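The gloss-overlap idea can be sketched as a simplified Lesk variant, which compares each candidate gloss directly with the words of the sentence rather than with the glosses of neighbouring words. The two short glosses for bank, modeled loosely on WordNet's senses, the stopword list and the sense labels are illustrative assumptions, not data taken from WordNet.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "in", "to", "that", "is", "by", "and", "or", "for", "all", "on"}

# Two short glosses written for illustration (hypothetical sense labels).
BANK_SENSES = {
    "bank#1": "sloping land beside a body of water",
    "bank#2": "a financial institution that accepts deposits and channels the money into lending activities",
}

CONTEXT = "The virus spread in all saving deposit money systems in the bank."


def tokens(text):
    """Lowercased content words of a text, with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, senses):
    """Return the sense whose gloss shares the most content words with the context."""
    context_words = tokens(context)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in senses.items():
        overlap = len(context_words & tokens(gloss))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense_id, overlap
    return best_sense
```

On the example sentence used later in this paper, the financial sense wins because its gloss shares the word money with the context; no stemming is applied, so deposit and deposits do not match.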
In this work we define a new approach to word sense disambiguation that uses a matrix map to extract the correct sense of a target word from WordNet glosses, and we combine it with the lexical resource SentiWordNet to improve sentence-level sentiment analysis of online reviews and text.
III PROPOSED APPROACH
SENTENCE LEVEL SENTIMENT CLASSIFICATION
In this work we classify the sentiment of online text at the sentence level. Each sentence of a review is checked semantically to determine whether it carries positive or negative sentiment. The positivity and negativity scores are obtained from the SentiWordNet lexical resource, which provides positive and negative scores for English words, and the score of each sentence is calculated from the scores of its words.
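A minimal sketch of this sentence-level scoring, assuming the sentence score is simply the sum of its words' positive and negative scores and that a sentence with no scored words is objective. The lexicon values below are made up and only stand in for SentiWordNet entries.

```python
# Invented (positive, negative) scores standing in for SentiWordNet entries.
LEXICON = {
    "great": (0.75, 0.0),
    "comfortable": (0.5, 0.0),
    "delayed": (0.0, 0.5),
    "rude": (0.0, 0.625),
}


def sentence_score(words, lexicon):
    """Sum the positive and negative scores of the words found in the lexicon."""
    pos = sum(lexicon.get(w, (0.0, 0.0))[0] for w in words)
    neg = sum(lexicon.get(w, (0.0, 0.0))[1] for w in words)
    return pos, neg


def classify_sentence(words, lexicon=LEXICON):
    """Subjectivity check first, then polarity from the summed scores."""
    pos, neg = sentence_score(words, lexicon)
    if pos == 0 and neg == 0:
        return "objective"  # no opinion-bearing word was found
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

For example, ["flight", "delayed", "staff", "rude"] is classified as negative, while ["flight", "landed"] contains no scored word and is treated as objective.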
Online text collected from social media and blogs commonly contains short abbreviated words, so a database of abbreviations is used to replace each one with its original words. In addition, some words in a sentence shift its context, and others reduce or enhance the actual sentiment score; context-shifter and enhancer/reducer dictionaries are therefore used to update the sentence score according to the context.
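The context-shifter and enhancer/reducer updates can be sketched as follows. The word lists, the multiplicative weights and the rule that a shifter or modifier affects only the next sentiment word are all assumptions for illustration, not the exact rules of the proposed system.

```python
# Hypothetical knowledge bases; real lists would be much larger.
SHIFTERS = {"not", "never", "no"}
MODIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}  # > 1 enhancer, < 1 reducer
LEXICON = {"good": (0.625, 0.0), "comfortable": (0.5, 0.0)}   # invented scores


def adjusted_scores(tokens, lexicon=LEXICON):
    """Return (positive, negative) sentence totals after shifters and modifiers."""
    pos_total, neg_total = 0.0, 0.0
    negate, weight = False, 1.0
    for tok in tokens:
        if tok in SHIFTERS:
            negate = True
            continue
        if tok in MODIFIERS:
            weight *= MODIFIERS[tok]
            continue
        if tok in lexicon:
            pos, neg = lexicon[tok]
            if negate:
                pos, neg = neg, pos  # a context shifter swaps the polarity
            pos_total += weight * pos
            neg_total += weight * neg
            negate, weight = False, 1.0  # effects apply to the next sentiment word only
    return pos_total, neg_total
```

Under these assumptions "not good" yields (0.0, 0.625) and "very comfortable" yields (0.75, 0.0).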
WORD SENSE DISAMBIGUATION (WSD) FOR IMPROVING RESULTS
The resulting sentence scores are still imperfect because of ambiguity: a natural-language word can be used in different senses, each with a different score depending on the context, and it is very difficult to decide automatically which sense a sentence uses. In this work we develop a new method for extracting the correct sense during sentiment analysis. First, the senses of individual words are extracted from WordNet glosses. Next, to select the correct sense, we create a matrix of the similarity scores between the words and their senses. Finally, once the correct sense is extracted, the correct word scores are obtained and the sentence score improves accordingly. The steps of the proposed system, including WSD, are described below.
i. Split the extracted online text into sentences to form a Bag of Sentences (BOS).
ii. Replace short abbreviated words with their full text from the short abbreviated word list.
iii. Clean the sentences: remove noise and correct spelling and case.
iv. Tag each word in a sentence with its Part of Speech (POS) and record its position in the sentence.
v. Prepare the list of sentiment words, i.e. nouns, adjectives, verbs and adverbs.
vi. Use SentiWordNet as a lexical resource to obtain the positive and negative sentiment scores.
vii. Use the WordNet database to extract word senses according to the context from WordNet glosses.
viii. To calculate the importance of sentiments, build a sentence-level matrix of the sentiment words and their scores.
ix. Obtain the WordNet glosses of the words in the matrix and build a matrix for each gloss with its scores.
x. Find the similarity between the sentiment words and the WordNet glosses by mapping the sentiment word matrix onto the gloss matrices, and extract the correct sense.
xi. After extracting the correct sense, update the polarity of the sentence with the knowledge base according to the sentence structure and contextual information.
(1) Split sentences:
In this work we perform sentence-level sentiment classification, so the online text and reviews are split into sentences and each sentence is identified by a sentence_ID. We use "." as the sentence boundary when splitting a review into sentences.
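A minimal sketch of this splitting step, assuming a review ID is available and that sentence IDs take the hypothetical form <review>_S<n>. Splitting on "." alone also breaks on abbreviations such as "e.g.", which is a limitation of this boundary rule.

```python
def split_sentences(review, review_id):
    """Split a review on '.' and give each non-empty sentence a sentence_ID."""
    parts = [p.strip() for p in review.split(".")]
    sentences = [p for p in parts if p]  # drop empty pieces, e.g. after the final '.'
    return [(f"{review_id}_S{i}", s) for i, s in enumerate(sentences, start=1)]
```

For instance, split_sentences("Great flight. Staff were rude.", "R1") returns [("R1_S1", "Great flight"), ("R1_S2", "Staff were rude")].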
(2) Short Abbreviated Word List:
A list of short abbreviated words was collected from social media and online blogs such as Facebook, Twitter and Flickr and updated with the original words each abbreviation stands for. Replacing each abbreviation with its original words minimizes the loss of information. For example:
Table 1 Short abbreviated list
Abbreviation | Original words
Just4u | Just for you
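The replacement step can be sketched with a dictionary lookup. Only the Just4u entry comes from Table 1; the other entries and the function name are hypothetical. Matching is case-insensitive, and a replaced token is emitted in the table's lowercase form.

```python
import re

# Only "just4u" comes from Table 1; the other entries are hypothetical.
ABBREVIATIONS = {"just4u": "just for you", "gr8": "great", "thx": "thanks"}


def expand_abbreviations(sentence, table=ABBREVIATIONS):
    """Replace each alphanumeric token found in the table by its original words."""
    def replace(match):
        token = match.group(0)
        return table.get(token.lower(), token)  # unknown tokens are left as they are
    return re.sub(r"[A-Za-z0-9]+", replace, sentence)
```

For example, "Just4u the food was gr8" becomes "just for you the food was great", and punctuation around a token is preserved.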
(3) Text Cleaning Process:
In this step an automatic spell-checker module corrects misspelled words and normalizes their case. Because online text is noisy, the noise is also removed before the text is processed for sentiment analysis.
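One way to sketch this cleaning step uses Python's standard difflib module for spelling correction. The noise patterns (URLs, @mentions, # signs, digits and punctuation), the lowercase normalization and the tiny vocabulary are assumptions; the spell checker actually used is not specified.

```python
import difflib
import re

# Toy vocabulary (hypothetical); a real spell checker needs a full word list.
VOCABULARY = ["the", "service", "was", "excellent", "terrible", "food", "staff"]


def clean_text(sentence, vocabulary=VOCABULARY):
    """Remove noise, lowercase, and replace near-miss spellings with vocabulary words."""
    text = re.sub(r"https?://\S+|@\w+|#", " ", sentence)  # URLs, @mentions, '#' signs
    text = re.sub(r"[^A-Za-z\s]", " ", text).lower()      # digits and punctuation
    cleaned = []
    for word in text.split():
        match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
        cleaned.append(match[0] if match else word)  # keep unknown words unchanged
    return " ".join(cleaned)
```

The cutoff of 0.8 is a judgment call: a lower value corrects more typos but also risks replacing legitimate words that are missing from the vocabulary.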
(4) Part of Speech (POS) tagging:
We use the Stanford POS tagger to assign a tag to each word. After tagging, we keep only the relevant tags, i.e. nouns, verbs, adjectives and adverbs, as described in Table 2.
Table 2 Part of Speech Tags
(5) Sentiment Word List:
Once tagging is complete, we generate a word list of nouns, adjectives, verbs and adverbs, recording for each word the sentence it belongs to and its position in that sentence. Each word in the list is annotated with its POS_ID.
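The tag filtering and word-list construction can be sketched as below, assuming the tagger output is a string of word_TAG tokens with Penn Treebank tags (the Stanford tagger's usual plain-text format). Tags are mapped to the one-letter POS codes n, v, a and r used by WordNet and SentiWordNet; the layout of each entry is hypothetical.

```python
# Penn Treebank tag prefixes mapped to WordNet/SentiWordNet POS letters.
PTB_TO_WN = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}


def sentiment_word_list(tagged_sentence, sentence_id):
    """Keep nouns, verbs, adjectives and adverbs with sentence ID and position."""
    entries = []
    for position, token in enumerate(tagged_sentence.split(), start=1):
        word, _, tag = token.rpartition("_")  # "staff_NNS" -> ("staff", "_", "NNS")
        pos = PTB_TO_WN.get(tag[:2])
        if pos:  # skip determiners, pronouns, punctuation, etc.
            entries.append({"sentence_id": sentence_id, "position": position,
                            "word": word.lower(), "pos": pos})
    return entries
```

Using only the first two letters of the tag lets NNS, VBD, JJR, RBS and similar variants fall into the same four classes.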
(6) SentiWordNet for Positive and Negative Score:
Each word in the list is assigned positive and negative scores from the SentiWordNet lexical resource. In this step we take the highest score of the word from SentiWordNet without considering its sense.
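A sketch of this sense-free lookup, assuming it takes the maximum positive and the maximum negative score over all senses of the word independently. The per-sense scores are invented placeholders, not actual SentiWordNet values.

```python
# Invented (positive, negative) scores per WordNet sense number; placeholders only.
SENSE_SCORES = {
    ("virus", "n"): {1: (0.0, 0.25), 2: (0.0, 0.125), 3: (0.0, 0.5)},
    ("bank", "n"): {1: (0.0, 0.0), 2: (0.125, 0.0)},
}


def sense_free_score(word, pos, table=SENSE_SCORES):
    """Highest positive and highest negative score over all senses, ignoring context."""
    senses = table.get((word, pos))
    if not senses:
        return (0.0, 0.0)  # word not in the lexicon: treat as neutral
    best_pos = max(p for p, _ in senses.values())
    best_neg = max(n for _, n in senses.values())
    return (best_pos, best_neg)
```

Because the score ignores the sense, virus here receives the strongest negative value of any of its senses, which is exactly the imprecision the later WSD step is meant to correct.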
(7) WordNet for word sense extraction:
To extract the correct sense of a sentiment word we use WordNet glosses, which list the multiple senses of a word with their sense numbers; SentiWordNet provides the actual positive and negative scores for each of these senses.
(8) Sentence Level Sentiment Word Matrix:
For each sentence, we construct a matrix of the words in the sentiment word list together with the score extracted for each word, as illustrated in the WSD example.
(9) Matrices for WordNet Glosses:
A single word may have multiple senses in WordNet, each with its own gloss. We therefore construct a separate matrix for each sense, containing the positive score, negative score and POS abbreviation.
(10) Mapping the Sentence Matrix onto the Gloss Matrices to Find Similarity:
We map the sentence matrix onto each gloss matrix, compute their similarity, and apply the word sense disambiguation (WSD) algorithm to extract the correct sense; the score in the word list is then updated according to the extracted sense.
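The matrix map can be sketched as follows. Because the similarity function is not spelled out, this sketch assumes each matrix is summarized by its column means (average positive and average negative score) and that similarity is 1 / (1 + Euclidean distance) between those means. All scores are invented so that gloss ii of the virus example (the computer sense) is selected; they are not SentiWordNet values, and the function names are hypothetical.

```python
import math


def column_means(matrix):
    """Average (positive, negative) score over the rows of a score matrix."""
    if not matrix:
        return (0.0, 0.0)
    n = len(matrix)
    return (sum(row[0] for row in matrix) / n, sum(row[1] for row in matrix) / n)


def matrix_similarity(a, b):
    """Assumed similarity: 1 / (1 + Euclidean distance between column means)."""
    (a_pos, a_neg), (b_pos, b_neg) = column_means(a), column_means(b)
    return 1.0 / (1.0 + math.hypot(a_pos - b_pos, a_neg - b_neg))


def choose_sense(context_matrix, gloss_matrices):
    """Return the sense whose gloss matrix is most similar to the context matrix."""
    return max(gloss_matrices,
               key=lambda sense: matrix_similarity(context_matrix, gloss_matrices[sense]))


def disambiguate(entry, context_matrix, gloss_matrices, sense_scores):
    """Record the chosen sense and replace the word's scores with that sense's scores."""
    sense = choose_sense(context_matrix, gloss_matrices)
    entry["sense"] = sense
    entry["pos_score"], entry["neg_score"] = sense_scores[sense]
    return entry


# Invented rows of (positive, negative) scores for the context and for each gloss.
CONTEXT = [(0.0, 0.25), (0.25, 0.0), (0.0, 0.0), (0.0, 0.0)]
GLOSSES = {
    1: [(0.0, 0.75), (0.0, 0.5)],
    2: [(0.0, 0.125), (0.125, 0.0), (0.0, 0.0)],
    3: [(0.0, 0.625), (0.0, 0.375)],
}
VIRUS_SENSE_SCORES = {1: (0.0, 0.25), 2: (0.0, 0.125), 3: (0.0, 0.5)}
```

Other similarity functions (for example cosine similarity over summed rows) would fit the same structure; only matrix_similarity would change.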
(11) Update Sentence Polarity:
Word_List := all sentiment words in a sentence
AMB_Word := an ambiguous word in Word_List
WRD_GLOSSES_LIST := WordNet glosses for AMB_Word
W_SENSE := WordNet sense for a word in Word_List
Foreach AMB_Word in Word_List
    SELECT W_POSITIVE_SCORE AND W_NEGATIVE_SCORE FROM SentiWordNet
    Foreach WRD_GLOSS in WRD_GLOSSES_LIST
        SELECT G_POSITIVE_SCORE AND G_NEGATIVE_SCORE FROM SentiWordNet
        If W_POSITIVE_SCORE AND W_NEGATIVE_SCORE are similar to G_POSITIVE_SCORE AND G_NEGATIVE_SCORE of WRD_GLOSS
            then W_SENSE := sense of WRD_GLOSS
Table 3 Sentence Level Sentiment Word Matrix
With each word now scored according to its context, the sentence-level score is updated according to the sentence structure and contextual information for sentiment classification.
A WORD SENSE DISAMBIGUATION EXAMPLE
To explain word sense disambiguation, consider the following sentence extracted from online reviews: "The virus spread in all saving deposit money systems in the bank." This sentence contains the ambiguous words virus and bank. To extract the correct sense we follow the steps described above. First, we construct the sentence-level sentiment matrix.
Next, we extract the WordNet glosses for the word virus, of which there are three:
i. infection by a virus that is pathogenic to humans
ii. A true virus cannot spread to another computer without human assistance.
iii. the virus of jealousy is latent in everyone
Table 5 Glosses Matrix 1
Word | Positive Score | Negative Score | POS | Sentence ID
Table 6 Glosses Matrix 2
Word | Positive Score | Negative Score | POS | Sentence ID
Table 7 Glosses Matrix 3
Word | Positive Score | Negative Score | POS | Sentence ID
IV RESULTS AND DISCUSSION
To evaluate the effectiveness of the proposed method, we conducted experiments on three datasets. We collected 1,000 Twitter comments from a dataset that is publicly available to researchers, and 550 comments from the Geo website (2013) about the success of the political party Pakistan Tehreek-e-Insaf in Khyber Pakhtunkhwa in the 2013 election. Similarly, we extracted 1,000 reviews from Skytrax, which contains about 2.5 million reviews of airlines and airports. The reviews were split into sentences, and semantic orientation was used to extract the positive and negative sentences. All datasets were processed according to the steps of the proposed method defined in Section III. Table 8 shows the classification of the datasets and their sentences.
Table 8 Opinion and Non Opinion Sentences
Dataset | Comments | Sentences | Opinion | Non-opinion
Table 9 Accuracy of Positive and Negative Sentences
Dataset | Sentences | Correct | Incorrect | Accuracy
Pakistan Comments (Negative) | 393 | 365 | 28 | .928
Only subjective sentences are processed for semantic orientation, and their semantic scores are extracted from SentiWordNet. For the Twitter comments we used 1,200 positive and 436 negative sentences (Table 9). Evaluated by the proposed system, these sentences were classified with 89% accuracy for the positive class and 90% for the negative class. Similarly, for the election comments 700 positive and 393 negative sentences were evaluated, with accuracies of 90% and 93% respectively, and for the airline reviews 4,000 positive and 1,405 negative sentences were evaluated, with accuracies of 87.5% and 96% respectively. Overall, an average sentence-level accuracy of 90.71% is achieved. We therefore conclude that the proposed lexicon-based method improves performance and adapts well to datasets from different domains.
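The per-class accuracies are ratios of correctly classified sentences to all sentences. The sketch below checks the one row that survives in Table 9 and shows two common ways of averaging such figures, an unweighted macro average and a pooled accuracy over all sentences; which of the two underlies the reported average is not stated here, and the function names are illustrative.

```python
def accuracy(correct, total):
    """Fraction of sentences whose predicted polarity matches the gold label."""
    return correct / total


def macro_average(accuracies):
    """Unweighted mean of per-class or per-dataset accuracies."""
    return sum(accuracies) / len(accuracies)


def pooled_accuracy(counts):
    """Accuracy over all sentences pooled together; counts are (correct, total) pairs."""
    counts = list(counts)
    return sum(c for c, _ in counts) / sum(t for _, t in counts)


print(round(accuracy(365, 393), 4))  # negative election comments (Table 9): 0.9288
```

For the negative election comments, 365 / 393 is about 0.9288, the value listed as .928 in Table 9. The pooled figure weights large classes (such as the 4,000 positive airline sentences) more heavily than the macro average does.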
Table 9 Performance of Different Sentence Level Sentiment Classification
Table 9 shows the performance of the proposed method compared with other methods. The contributions of this work are the preparation of a short abbreviated word list, the matrix map method for WSD, and sentence-level semantic orientation that takes into account all parts of speech and the contextual structure of the sentence. Figure 2 shows the performance of the proposed method compared with machine learning and Twitrratr methods.
V CONCLUSION AND FUTURE WORK
Our empirical results show that the proposed method improves sentence-level sentiment classification on datasets from various domains. In future work, we plan to apply WSD with the matrix map to semantic orientation at the document and feedback levels, and to use it to improve sentence clustering, which may in turn rely on improved sentence similarity measures. We are also exploring the feasibility of using the matrix map technique in other text mining tasks.
1. A. Khan, B. Baharudin, and K. Khan, "Sentiment Classification from Online Customer Reviews Using Lexical Contextual Sentence Structure."
2. A. Khan and B. Baharudin, "Sentiment Classification Using Sentence-level Semantic Orientation of Opinion Terms from Blogs," IEEE, 2011.
3. Y. Dang, Y. Zhang, and H. Chen, "A Lexicon Enhanced Method for Sentiment Classification: An Experiment on Online Product Reviews," IEEE, 2013.
4. "Statistical Text Analysis and Sentiment Classification in Social Media."
5. "A Review of Domain Adaptation for Opinion Detection and Sentiment Classification."
6. "Sentiment Classification by Sentence Level Semantic Orientation using SentiWordNet from Online Reviews and Blogs."
7. "Sentiment Classification of Reviews Using SentiWordNet."
8. A. Khan and B. Baharudin, "Sentence Level Semantic Orientation of Online Reviews and Blogs using SentiWordNet for Effective Sentiment Classification," 2011.
9. B. Pang and L. Lee, "Opinion Mining and Sentiment Analysis," Foundations and Trends in Information Retrieval, vol. 2, no. 1-2, 2008, pp. 1-135.
10. A. Khan, B. Baharudin, L.H. Lee, and K. Khan, "A Review of Machine Learning Algorithms for Text-Documents Classification," Journal of Advances in Information Technology, vol. 1, 2010, pp. 4-20.
11. J.F. Cai, W.S. Lee, and Y.W. Teh, "Improving Word Sense Disambiguation Using Topic Features," Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007.
12. Y.K. Lee and H.T. Ng, "An Empirical Evaluation of Knowledge Sources and Learning Algorithms for Word Sense Disambiguation," Proceedings of EMNLP, 2002.
13. K. Abdalgader and A. Skabar, "Unsupervised Similarity-Based Word Sense Disambiguation Using Context Vectors and Sentential Word Importance," ACM Transactions on Speech and Language Processing, vol. 9, no. 1, article 2, May 2012.
14. R. Navigli, "Word Sense Disambiguation: A Survey," ACM Computing Surveys, vol. 41, no. 2, 2009, pp. 1-69.
15. B. Snyder and M. Palmer, "The English All-Words Task," Proceedings of the Meeting of the Association for Computational Linguistics (ACL/SIGLEX), 2004, pp. 41-43.
16. D. Yarowsky and R. Florian, "Evaluating Sense Disambiguation across Diverse Parameter Spaces," Natural Language Engineering, vol. 8, no. 4, 2002, pp. 293-310.
17. R. Navigli and M. Lapata, "An Experimental Study of Graph Connectivity for Unsupervised Word Sense Disambiguation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 4, 2010, pp. 678-692.
18. C. Fellbaum, Ed., WordNet: An Electronic Lexical Database, MIT Press, Cambridge, MA, 1998.
19. J.M. Wiebe, "Tracking Point of View in Narrative," International Journal of Computational Linguistics, vol. 20, 1994, pp. 233-287.
20. F. Zhu and X. Zhang, "Impact of Online Consumer Reviews on Sales: The Moderating Role of Product and Consumer Characteristics," Journal of Marketing, vol. 74, no. 2, 2010, pp. 133-148.
21. B. Pang and L. Lee, "Opinion Mining and Sentiment Analysis," Foundations and Trends in Information Retrieval, vol. 2, no. 1-2, 2008, pp. 1-135.
22. M. Hu and B. Liu, "Mining and Summarizing Customer Reviews," Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, pp. 168-177.
23. J. Kamps, M.J. Marx, R.J. Mokken, and M. de Rijke, "Using WordNet to Measure Semantic Orientations of Adjectives," European Language Resources Association (ELRA), 2004.
24. S.M. Kim and E. Hovy, "Determining the Sentiment of Opinions," Proceedings of the 20th International Conference on Computational Linguistics, 2004, pp. 1367-1374.
25. S.M. Kim and E. Hovy, "Automatic Detection of Opinion Bearing Words and Sentences," Companion Volume to the Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP), 2005, pp. 61-66.
26. C.W.K. Leung and S.C.F. Chan, "Sentiment Analysis of Product Reviews," Encyclopedia of Data Warehousing and Mining, Information Science Reference, 2008, pp. 1794-1799.
27. A. Westerski, "Sentiment Analysis: Introduction and the State of the Art Overview," Universidad Politecnica de Madrid, Spain, 2007, pp. 211-218.
28. P. Turney, "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews," Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), 2002, pp. 417-424.
29. A. Andreevskaia and S. Bergler, "When Specialists and Generalists Work Together: Overcoming Domain Dependence in Sentiment Tagging," Proceedings of ACL-08: HLT, 2008, pp. 290-298.
30. R. Navigli, "Word Sense Disambiguation: A Survey," ACM Computing Surveys, vol. 41, no. 2, 2009, pp. 1-69.
31. Z. Harris, "Distributional Structure," in The Philosophy of Linguistics, J.J. Katz, Ed., Oxford University Press, Oxford, UK, 1954, pp. 26-47.
32. M. Lesk, "Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone," Proceedings of the 5th Annual International Conference on Systems Documentation (SIGDOC), 1986, pp. 24-26.
33. K. Abdalgader and A. Skabar, "Unsupervised Similarity-Based Word Sense Disambiguation Using Context Vectors and Sentential Word Importance," ACM Transactions on Speech and Language Processing, vol. 9, no. 1, article 2, May 2012, 21 pages.
34. D.A. Shamma, L. Kennedy, and E.F. Churchill, "Tweet the Debates: Understanding Community Annotation of Uncollected Sources," Proceedings of the First SIGMM Workshop on Social Media, 2009, pp. 3-10.
35. A. Go, R. Bhayani, and L. Huang, "Twitter Sentiment Classification Using Distant Supervision," CS224N Project Report, Stanford University, 2009, pp. 1-12.
EDC, Gandhara University Peshawar; IECS, UST Bannu; IECS, UST Bannu
Authors: Khan, Muhammad Faheem; Khan, Aurangzeb; Khan, Khairullah
Date: Dec 31, 2013