Central University of Las Villas, Cuba
Artificial Intelligence Lab, Computer Science Department

Opinion Mining
Prof. Leticia Arco García
[email protected]

Motivation
- Someone who wants to buy a car looks for comments and reviews.
- Someone who just bought a car comments on it and writes about his experience.
- A car manufacturer gets feedback from customers in order to improve its products and adjust its marketing strategies.

Opinions
- Opinions are central to almost all human activities and are key influencers of our behaviours.
- Our beliefs and perceptions of reality, and the choices we make, are, to a considerable degree, conditioned upon how others see and evaluate the world.
- When we need to make a decision, we often seek out the opinions of others.
- This is true not only for individuals but also for organizations.

Social media on the Web
- With the explosive growth of social media on the Web, individuals and organizations are increasingly using the content in these media for decision making.

Contents
- Origin and definition
- Different levels of analysis
- Opinions: definition, types and main problems
- Sentiment analysis tasks
- Polarity detection: two approaches
- Other sentiment analysis proposals
- Lexical resources and available datasets
- Applications in Business Informatics

- Our results
- Challenges: SemEval-2017 and TASS 2017

Origin (1/2)
- Some earlier work on the interpretation of metaphors, sentiment adjectives, subjectivity, viewpoints and affects (1990-1999)
- Learning subjective adjectives from corpora (Wiebe, 2000)
- Yahoo! for Amazon: Extracting market sentiment from stock message boards (Das and Chen, 2001)
- An operational system for detecting and tracking opinions in on-line discussion (Tong, 2001)

Origin (2/2)
- Sentiment analysis: "Sentiment analysis: Capturing favourability using natural language processing" (Nasukawa and Yi, 2003)
- Opinion mining: "Mining the peanut gallery: Opinion extraction and semantic classification of product reviews" (Dave et al., 2003)

Definition
Sentiment analysis is the field of study that analyses people's opinions, sentiments, evaluations, appraisals, attitudes and emotions towards entities such as products, services, organizations, individuals, issues, events, topics and their attributes.

Review mining, sentiment analysis, subjectivity analysis, sentiment mining, affect analysis, emotion analysis, opinion extraction, opinion mining

Different levels of analysis (1/2)
- Document level
  - Classifies whether a whole opinion document expresses a positive or negative sentiment
  - Assumption: each document expresses opinions on a single entity
- Sentence level
  - Determines whether each sentence expresses a positive, negative or neutral opinion
  - Closely related to subjectivity classification
- Example: "The iPhone's call quality is good, but its battery life is short."
- Neither level discovers what exactly people liked and did not like

Different levels of analysis (2/2)
- Entity and aspect level
  - Performs finer-grained analysis
  - Directly looks at the opinion itself
  - Goal: discover sentiments on entities and/or their aspects
- Example: "The iPhone's call quality is good, but its battery life is short."
  - Entity: iPhone

  - Aspects: call quality and battery life
  - Sentiment on the iPhone's call quality: positive
  - Sentiment on its battery life: negative

Opinion
- The meaning of opinion itself is still very broad
- Sentiment analysis mainly focuses on opinions which express or imply positive or negative sentiments
[Figure: annotated example opinion. Labels: 1, 2, 3 and 4 positive; 5 negative; date; opinion source; opinion holders; sentiment orientations (opinion polarities); topic]

Opinion definition
An opinion is a quintuple (e_i, a_ij, s_ijkl, h_k, t_l), where:
- e_i is the name of an entity
- a_ij is an aspect of e_i
- s_ijkl is the sentiment on aspect a_ij of entity e_i
- h_k is the opinion holder
- t_l is the time when the opinion is expressed by h_k
The sentiment s_ijkl is positive, negative or neutral, or is expressed with different strength/intensity levels, e.g., 1 to 5.

Types of opinions (1/2)
- Regular opinions: express a sentiment only on a particular entity or an aspect of the entity
  - Direct opinion: "Belgian chocolates taste very good."
  - Indirect opinion: "After injection of the drug, my joints felt worse."
- Comparative opinions: compare multiple entities based on some of their shared aspects

  - "Belgian beers taste much better than Cuban beers."

Types of opinions (2/2)
- Explicit opinion: a subjective statement that gives a regular or comparative opinion
  - "UHASSELT is a very good university."
- Implicit opinion: an objective statement that implies a regular or comparative opinion
  - "The battery life of Nokia phones is longer than Samsung phones."
- Explicit opinions are easier to detect and to classify than implicit opinions

Sentiment analysis is an NLP problem
- It touches every aspect of NLP
  - Co-reference resolution
  - Negation handling
  - Word sense disambiguation
- Sentiment analysis is a highly restricted NLP problem
  - It does not need to fully understand the semantics of each sentence or document
  - It only needs to understand some aspects of it: positive or negative sentiments, their target entities and their topics

Opinion mining is more difficult than text mining
- Informal language
- Abbreviations
- Emoticons
- Spelling and typographical errors
- Ironic and sarcastic language
- Language knowledge level
- Cultural level
These characteristics make opinion mining more difficult than other text mining tasks.
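
As a minimal sketch, the opinion quintuple (e_i, a_ij, s_ijkl, h_k, t_l) defined earlier could be held in a small record type, and the iPhone example sentence would then yield two records. The class name, the field names, the holder and the date are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (e_i, a_ij, s_ijkl, h_k, t_l)."""
    entity: str     # e_i: name of the entity
    aspect: str     # a_ij: aspect of the entity
    sentiment: str  # s_ijkl: "positive", "negative" or "neutral"
    holder: str     # h_k: opinion holder
    time: date      # t_l: when the opinion was expressed


# Hypothetical holder and date, chosen only for illustration.
posted = date(2017, 1, 1)
opinions = [
    Opinion("iPhone", "call quality", "positive", "reviewer", posted),
    Opinion("iPhone", "battery life", "negative", "reviewer", posted),
]

for op in opinions:
    print(f"{op.entity} / {op.aspect}: {op.sentiment}")
```

A frozen dataclass is used here so that extracted opinions can be stored in sets or used as dictionary keys; any plain tuple would serve equally well.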

Sentiment analysis tasks
Objective of sentiment analysis: given an opinion document, discover all opinion quintuples.
1. Entity extraction and categorization
2. Aspect extraction and categorization
3. Opinion holder extraction and categorization
4. Time extraction and standardization
5. Aspect sentiment classification
6. Opinion quintuple generation

Polarity detection: two approaches
- Semantic approaches: characterized by the use of dictionaries of words (lexicons) with semantic orientation of polarity or opinion
- Computational learning techniques: consist of training a classifier, using any supervised learning algorithm, from a collection of annotated texts

Words expressing feeling or opinion
- Positive opinion: good, wonderful, amazing, ...
- Negative opinion: bad, poor, terrible, ...
- Sentiment lexicon or opinion lexicon (sentiment words, opinion words, polar words, opinion-bearing words)
  - Base type
  - Comparative type

Approaches to compile sentiment words
- Manual approach
  - Labour intensive and time consuming
  - Useful as a final check in automated approaches
- Dictionary-based approach
  - A few seed sentiment words are used to bootstrap, based on the synonym and antonym structure of a dictionary
- Corpus-based approach
  1. Given a seed list of known sentiment words, discover other sentiment words and their orientations from a domain corpus

  2. Adapt a general-purpose sentiment lexicon to a new one using a domain corpus, for sentiment analysis applications in that domain

Sentiment lexicon
- Although sentiment words and phrases are important for sentiment analysis, using them alone is far from sufficient
- A sentiment lexicon is necessary but not sufficient for sentiment analysis

Some problems of feeling words
- They may have opposite orientations in different application domains
- A sentence containing sentiment words may not express any sentiment
- Sarcastic sentences with or without sentiment words are hard to deal with
- Many sentences without feeling words can also imply opinions
Examples:
- "It is a large dictionary, covering thousands of words."
- "He has put on weight, and is now quite large."
- "He likes to talk large, but I think he exaggerates."
- "Can you tell me which camera is good?"
- "If I can find a good camera in the shop, I'll buy it."
- "I HATE to admit it but, I LOVE admitting things."
- "He killed the ant before it could bite him."
- "I liiikeee winter, summer does not arrive yet :-("
- "What a great car! It stopped working in two days."
- "This washer uses a lot of water."

Sentiment classification using supervised learning (1/3)
- Two-class classification problem: positive and negative
- Training and testing data are normally product reviews
  - A review with 4 or 5 stars is considered a positive review
  - A review with 1 or 2 stars is considered a negative review
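
The star-rating convention above can be sketched as a small labelling function that turns rated reviews into two-class training data. The function name and the sample reviews are hypothetical, and leaving 3-star reviews out of the training set is an assumption, since the slides only state the rules for 1-2 and 4-5 stars.

```python
from typing import Optional


def label_from_stars(stars: int) -> Optional[str]:
    """Map a 1-5 star rating to a class label, or None if unlabelled."""
    if not 1 <= stars <= 5:
        raise ValueError("star rating must be between 1 and 5")
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None  # 3 stars: assumed to be left out of two-class training


# Hypothetical rated reviews, for illustration only.
reviews = [
    ("What a great car!", 5),
    ("It stopped working in two days.", 1),
    ("It is fine.", 3),
]

training_data = [
    (text, label_from_stars(stars))
    for text, stars in reviews
    if label_from_stars(stars) is not None
]
print(training_data)
```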

- First approaches
  - Naïve Bayes classification
  - Support Vector Machines

Sentiment classification using supervised learning (2/3)
- Like other supervised machine learning applications, the key for sentiment classification is the engineering of a set of effective features
  - Terms and their frequency
  - Part of speech
  - Sentiment words and phrases
  - Rules of opinions
  - Sentiment shifters
  - Syntactic dependency

Sentiment classification using supervised learning (3/3)
- Apart from the classification of positive and negative sentiments, researchers have also studied the problem of predicting the rating scores (e.g., 1-5 stars) of reviews
- This is a regression problem

Subjectivity classification
- Objective sentences: express factual information
- Subjective sentences: express subjective views and opinions
- Is subjectivity equivalent to sentiment?
  - "I think that he went home."
  - "The phone broke in two days."

Emotion
- Emotions are our subjective feelings and thoughts
- Six primary emotions: love, joy, surprise, anger, sadness and fear
- Opinions that we study in sentiment analysis are mostly evaluations

  - Rational evaluations come from rational reasoning, tangible beliefs and utilitarian attitudes
  - Emotional evaluations come from non-tangible and emotional responses to entities, which go deep into people's state of mind
- Five sentiment ratings: emotional negative (-2), rational negative (-1), neutral (0), rational positive (+1) and emotional positive (+2)

Aspect-based sentiment analysis
- Such methods are typically unsupervised
  - Sentiment lexicon
  - Composite expressions
  - Rules of opinions
  - Sentence parse tree
  - Sentiment shifters
  - But-clauses
  - Aggregate opinions

Aspect extraction approaches
- Extraction based on frequent nouns and noun phrases
- Extraction by exploiting opinion and target relations
- Extraction using supervised learning
- Extraction using topic modelling
- Semantic classification and deep learning

Grouping aspects into categories
- Aspect expressions need to be grouped into synonymous aspect categories
  - Each category represents a unique aspect
  - Same aspect for phones: call quality and voice quality
- Grouping aspect expressions from the same aspect is critical for opinion analysis
- WordNet and other thesauri
  - Many aspect expressions are multi-word phrases, which cannot be easily handled with dictionaries
  - "movie" and "picture" are synonyms in movie reviews
  - "picture" is more likely to be synonymous with "photo", and "movie" with "video", in camera reviews

Opinion summarization (1/3)
- Aspect-based opinion summary
  - Different entity names
  - Different aspect names

Opinion summarization (2/3)
[Figure: visualization of an aspect-based summary of opinions on a digital camera]
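
An aspect-based opinion summary of the kind visualized above can be sketched as a count of positive and negative opinions per aspect. The aspect names and the sample data below are hypothetical, and the function name `summarize` is an assumption introduced for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (aspect, sentiment) pairs extracted from camera reviews.
extracted = [
    ("picture quality", "positive"),
    ("picture quality", "positive"),
    ("picture quality", "negative"),
    ("battery life", "negative"),
    ("battery life", "positive"),
    ("battery life", "negative"),
]


def summarize(pairs):
    """Count the opinions of each polarity for every aspect."""
    counts = defaultdict(Counter)
    for aspect, sentiment in pairs:
        counts[aspect][sentiment] += 1
    return {aspect: dict(c) for aspect, c in counts.items()}


summary = summarize(extracted)
for aspect, c in summary.items():
    print(f"{aspect}: +{c.get('positive', 0)} / -{c.get('negative', 0)}")
```

The resulting per-aspect counts are what a bar-chart style summary would plot, with positive counts above and negative counts below the axis.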

Opinion summarization (3/3)
[Figure: visualization of aspect-based summaries of opinions]

Opinion spammers
- A key feature of social media is that it enables anyone from anywhere in the world to freely express his/her views and opinions without disclosing his/her true identity and without the fear of undesirable consequences
- Opinion spammers
  - Friends and family
  - Competitors
  - Company employees
  - Genuine customers
  - Businesses that provide fake review writing services
- Some businesses give discounts and even full refunds to some of their customers on the condition that the customers write positive reviews for them
- Agencies and political organizations may employ people to post messages to secretly influence social media conversations and to spread lies and disinformation

Opinion spammers vs opinion spam detection
- Review content: linguistic features
- Meta-data about the review: user-id, star rating, time, host IP address, ...
- Product information

Opinion spam detection
- Supervised
  - There is no labelled training data for learning
  - Exploit duplicate reviews
  - Create features
- Unsupervised
  - Spam detection based on atypical behaviours
  - Spam detection using review graph
- Group spam detection: frequent pattern mining

Cross-domain sentiment classification
- A classifier trained using opinion documents from one domain often performs poorly on test data from another domain

  - Words and even language constructs used in different domains for expressing opinions can be quite different
  - The same word may be positive in one domain but negative in another
- Domain adaptation or transfer learning is needed
  - A small amount of labelled training data for the new domain
  - No labelled data for the new domain

Lifelong machine learning
- Learn as humans do
- Retain learned knowledge from previous tasks and use it to help future learning
- It is a continuous learning process where the learner has performed a sequence of learning tasks

Cross-language sentiment classification
- Perform sentiment classification of opinion documents in multiple languages
  - Researchers from different countries want to build sentiment analysis systems in their own languages
  - Companies want to know and compare consumer opinions about their products and services in different countries
- Co-training methods
- Lexical resources

Opinion search and retrieval
- Find public opinions about a particular entity or an aspect of the entity
  - Find customer opinions about a digital camera
- Find opinions of a person or organization (i.e., opinion holder) about a particular entity or an aspect of the entity (or topic)
  - Find Charles Michel's opinion about terrorism

Lexical resources
- WordNet Affect
- SentiWordNet
- General Inquirer

WordNet Affect
- WordNet-Affect is an extension of WordNet Domains, including a subset of synsets suitable to represent affective concepts correlated with affective words
- Affective labels (a-labels) are assigned to a number of WordNet synsets

WordNet Affect: terms and affective categories
[Figure: some terms related to "university" through their emotional categories]

SentiWordNet
- SentiWordNet is a lexical resource for opinion mining
- SentiWordNet assigns three sentiment scores to each synset of WordNet: positivity, negativity and objectivity

Generating SentiWordNet
1. A weak-supervision, semi-supervised learning step
2. A random-walk step
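
As a minimal sketch of the three SentiWordNet scores, the record below relies on the fact that, in SentiWordNet, each score lies between 0 and 1 and the three scores of a synset sum to 1, so objectivity can be derived from the other two. The class name and the example values are illustrative assumptions; they are not taken from the actual resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynsetScores:
    """Positivity and negativity of one synset; objectivity is derived."""
    positivity: float
    negativity: float

    def __post_init__(self):
        if self.positivity < 0 or self.negativity < 0:
            raise ValueError("scores must be non-negative")
        if self.positivity + self.negativity > 1:
            raise ValueError("positivity + negativity must not exceed 1")

    @property
    def objectivity(self) -> float:
        # The three scores sum to 1 for every synset.
        return 1.0 - self.positivity - self.negativity


# Hypothetical scores, for illustration only.
scores = SynsetScores(positivity=0.75, negativity=0.0)
print(scores.positivity, scores.negativity, scores.objectivity)
```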

General Inquirer
- Harvard categories: Positive, Negative, Strong, Weak, Active, Passive, Pleasure, Pain, Feel, Arousal, Virtue, Emotion, ...
- New categories based on social cognition
- Lasswell value dictionary categories

Some publicly available datasets
- Stanford large movie dataset
- Movie
- TripAdvisor
- TBOD
- ISEAR
- DUC data
- Spinn3r dataset
- HASH
- EMOT
- OpinRank dataset

Opinion mining and enterprises
- Enterprises are open and flexible in the use of technological tools to sense customers and the market
- Acquiring information in real time allows the company to be agile and to develop Sense and Response capabilities
- An agile enterprise responds immediately to any internal or external event, such as customer demand or customer opinions
- Knowing what the customer thinks of a given product/service helps top management to introduce improvements in processes and products
- Customer opinions represent potential knowledge to be considered for the acquisition of competitive advantages

Opinions are very important for decision making
- Gretzel and Yoo (2008) show that 97.7% of travel booking decisions are made after consulting other travellers' opinions, of which 77.9% involve the use of customer reviews as a source of information that helps to make a better decision
Gretzel, U. & Yoo, K. H. (2008). Use and Impact of Online Travel Reviews. Information and Communication Technologies in Tourism. Innsbruck, Austria.

How can a sentiment analysis tool help my brand?
- Better understand the motivations behind sentiment
- Learn from social posts, news, reviews, and more
- Benchmark against competitors
- Track purchase intent
- Evaluate campaign impact
- Analyse product launch response

Some sentiment analysis tools
- Opinion Crawl
- MeaningCloud
- Trackur
- SAS
- OpenText
- StatSoft
- NetOwl Extractor
- Meltwater

Cloud-based Event-processing Architecture for Opinion Mining (1/2)
- Smart distributed architecture for opinion mining on internet-based content that answers key challenges:
  - Integrating heterogeneous data sources
  - Adapting to events through dynamic system configuration
- A novel approach of semantic complex event processing in a cloud environment capturing different levels of information:
  - Event data
  - Content from various heterogeneous sources
  - Distributed sources
  - Dynamic co-reference resolution

Cloud-based Event-processing Architecture for Opinion Mining (2/2)
1. Topic modelling and sentiment analysis
2. Deep linguistic and interlinking analysis
3. Transfer learning and active learning of opinions
4. Cloud computing and event processing

Enterprise information fusion for real-time business intelligence (1/2)
- Correlate the external events in real time with known facts about the internal operations and transactions of the enterprise and its ecosystem

Enterprise information fusion for real-time business intelligence (2/2)
- News event detection from Twitter

Identifying customer preferences about tourism products using an aspect-based opinion mining approach

A novel application mining for competitive intelligence
- A new method to extract opinion patterns from customer reviews and its application to evaluate resources or internal factors in an enterprise
  1. Opinion gathering
  2. Text pre-processing
  3. Factor and polarity detection
  4. Internal factor evaluation

Customer voice sensor
- The call centre is an important intermediary between enterprise and customers
- A comprehensive opinion mining system for call centre conversations
  - It helps customers to solve their problems
  - It allows the enterprise to deeply analyse the customer's voice and make a distinct market positioning

Mobile application for customers' reviews opinion mining

The Power of Text-mining in Business Process Management (1/2)

The Power of Text-mining in Business Process Management (2/2)

PosNeg Opinion
Input: opinions
1. Identify terms
2. Disambiguate each term lexically
3. Obtain all meanings of each term
4. Classify each term as positive or negative
5. Evaluate the opinion

Improving SentiWordNet 3.0
Starting point: 84342 terms
- Preprocessing stage: split terms according to whether or not they have polarity values assigned (5037 terms with polarity values, 79305 terms without)
- Stage 1: assign polarity values considering the synonyms of terms without assigned polarity values (51027 terms assigned, 28278 terms remaining)
- Stage 2: assign inverse polarity values considering the antonyms of terms without assigned polarity values (5678 terms assigned, 22600 terms remaining)
- Stage 3: assign polarity values considering the synonyms of terms with assigned polarity values (5770 terms assigned, 16830 terms remaining)
- Stage 4: assign inverse polarity values considering the antonyms of terms with assigned polarity values (1539 terms assigned, 15291 terms remaining)
- Result: 69051 terms with assigned polarity values and 15291 terms without assigned polarity values

SpanishSentiWordNet
- Example entry: agresor n 09195176 09158637 09848308 attacker assailant aggressor assaulter aggressor robber
  - The Spanish term and its POS label
  - Intralinguistic index
  - English meaning of the term
- Improved SentiWordNet 3.0 gives the negative and positive polarities of each meaning
- Evaluate the polarity of the term by adding the positive and negative polarities of its meanings
- Output: negative and positive polarities of the Spanish term (SpanishSentiWordNet)

Topic detection assisting polarity detection
Example review:
"We stay at Hilton for 4 nights last march. It was a pleasant stay. We got a large room with 2 double beds and 2 bathrooms, The TV was Ok, a 27' CRT Flat Screen. The concierge was very friendly when we need. The room was very cleaned when we arrived, we ordered some pizzas from room service and the pizza was Ok also. The main Hall is beautiful. The breakfast is charged, 20 dollars, kinda expensive. The internet access (WiFi) is charged, 13 dollars/day. Pros: Low rate price, huge rooms, close to attractions at Loop, close to metro station. Cons: Expensive breakfast, Internet access charged. Tip: When leaving the building, always use the Michigan Av exit. It's a great view."
Segments highlighted in the figure:
- "We got a large room with 2 double beds and 2 bathrooms, The TV was Ok, a 27' CRT Flat Screen."
- "The concierge was very friendly when we need."
- "The breakfast is charged, 20 dollars, kinda expensive."
- "The room was very cleaned when we arrived"

Schema for topic segmentation and detection
Input: textual corpora
1. Identify textual units (output: textual units)
2. Pre-process (output: tokens)
3. Represent textual units (output: vectors, graphs, probabilistic distribution)
4. Segment (output: segments)
5. Represent segments (output: vectors, graphs, probabilistic distribution)
6. Cluster segments (output: segment clusters, i.e., topics)
7. Label segment clusters (output: topics and corresponding labels)
Implementations: framework OpinionTopicDetection, desktop application OpinionTD
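
The schema above could be sketched, under strong simplifying assumptions, as follows: sentences serve as textual units, each sentence is treated as its own segment, segments are represented as bag-of-words vectors, clustering is a greedy single pass using cosine similarity, and each cluster is labelled with its most frequent term. None of these choices is claimed to be how OpinionTopicDetection or OpinionTD work; the stop-word list, the threshold and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a real resource).
STOPWORDS = {"the", "a", "was", "is", "we", "and", "very", "it", "when"}


def tokenize(text):
    """Pre-process: lowercase, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(u, v):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def detect_topics(text, threshold=0.2):
    """Return a list of (label, segments) pairs, one per segment cluster."""
    # Identify textual units: split on sentence-final punctuation.
    units = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    clusters = []
    for unit in units:
        vec = Counter(tokenize(unit))  # represent the segment as a vector
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["units"].append(unit)
            best["centroid"].update(vec)
        else:
            clusters.append({"centroid": Counter(vec), "units": [unit]})
    # Label each cluster with its most frequent term (empty if no terms).
    return [
        (c["centroid"].most_common(1)[0][0] if c["centroid"] else "", c["units"])
        for c in clusters
    ]


sample = (
    "The room was large. The room was very clean. "
    "The breakfast is expensive. Breakfast costs 20 dollars."
)
topics = detect_topics(sample)
for label, segments in topics:
    print(label, "->", segments)
```

Once segments are grouped by topic, a polarity can be assigned per topic rather than per document, which is the sense in which topic detection assists polarity detection.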

Open issues and future directions (1/2)
- Data collected from various resources are often very noisy, misspelt and unstructured
- There is a lack of a universal opinion grading system across sentiment dictionaries
- Online and political discussions often contain ironic and sarcastic sentences
- For better product comparison, we should compare a set of products with respect to their common aspects
- The lack of a proper review spam dataset is a major issue for opinion spam detection

Open issues and future directions (2/2)
- Very few attempts have been made to utilize the potential of optimization techniques for feature selection
- There is a lack of opinion mining systems for non-English languages
- Cross-domain sentiment analysis is still a major challenge
- Aspect-level sentiment analysis is very much required for comparative visualization of similar kinds of products
- The main challenge in review helpfulness is the validation of the proposed method

Challenges

SemEval 2017: Detecting sentiment, humor, and truth
- Task 4: Sentiment Analysis in Twitter
- Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
- Task 6: #HashtagWars: Learning a Sense of Humor
- Task 7: Detection and Interpretation of English Puns
- Task 8: RumourEval: Determining rumour veracity and support for rumours

TASS 2017
- Task 1: Sentiment analysis at tweet level
- Task 2: Aspect-based sentiment analysis

Thanks!
Questions, ideas, suggestions, comments, ...
