vaderSentimentScores

Sentiment scores with VADER algorithm

Syntax

compoundScores = vaderSentimentScores(documents)

compoundScores = vaderSentimentScores(documents,Name,Value)

[compoundScores,positiveScores,negativeScores,neutralScores] = vaderSentimentScores(___)

Description

Use vaderSentimentScores to evaluate sentiment in tokenized text with the Valence Aware Dictionary and sEntiment Reasoner (VADER) algorithm. The vaderSentimentScores function uses, by default, the VADER sentiment lexicon and modifier word lists.

The function supports English text only.

compoundScores = vaderSentimentScores(documents) returns sentiment scores for tokenized documents. The function calculates the compound scores by aggregating individual token scores, adjusting them according to the algorithm rules, and then normalizing the result to the range from -1 to 1. The function discards all single-character tokens that are not present in the sentiment lexicon.

compoundScores = vaderSentimentScores(documents,Name,Value) specifies additional options using one or more name-value pairs.

[compoundScores,positiveScores,negativeScores,neutralScores] = vaderSentimentScores(___) also returns the proportions of each document that are positive, negative, and neutral, respectively, using any of the previous syntaxes.

Examples

Evaluate Sentiment in Text

Create a tokenized document array.

str = [
    "The book was VERY good!!!!"
    "The book was not very good."];
documents = tokenizedDocument(str);

Evaluate the sentiment of the tokenized documents. Scores close to 1 indicate positive sentiment, scores close to -1 indicate negative sentiment, and scores close to 0 indicate neutral sentiment.

compoundScores = vaderSentimentScores(documents)

compoundScores = 2×1

    0.7264
   -0.3865
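
The four-output syntax can be applied to the same documents. The following is a minimal sketch, assuming Text Analytics Toolbox is installed; the table named scores is introduced here only to display the results side by side and is not part of the function's interface.

```matlab
% Recreate the tokenized documents from the example above.
str = [
    "The book was VERY good!!!!"
    "The book was not very good."];
documents = tokenizedDocument(str);

% Request the compound scores together with the positive, negative,
% and neutral proportions of each document.
[compoundScores,positiveScores,negativeScores,neutralScores] = ...
    vaderSentimentScores(documents);

% Collect the four outputs in a table with one row per document.
scores = table(compoundScores,positiveScores,negativeScores,neutralScores)
```

Each row of scores corresponds to one input document, in the same order as documents.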

Evaluate Sentiment Using Custom Lexicon

Sentiment analysis algorithms such as VADER rely on annotated lists of words called sentiment lexicons. For example, VADER uses a sentiment lexicon with words annotated with a sentiment score ranging from -4 to 4, where scores close to 4 indicate strong positive sentiment, scores close to -4 indicate strong negative sentiment, and scores close to zero indicate neutral sentiment.

To analyze the sentiment of text using the VADER algorithm, use the vaderSentimentScores function. If the sentiment lexicon used by the vaderSentimentScores function does not suit the data you are analyzing, for example, if you have a domain-specific data set such as medical or engineering data, then you can use your own custom sentiment lexicon. For an example showing how to generate a domain-specific sentiment lexicon, see Generate Domain Specific Sentiment Lexicon.

Create a tokenized document array containing the text data to analyze.

textData = [ 
    "This company is showing extremely strong growth."
    "This other company is accused of misleading consumers."];
documents = tokenizedDocument(textData);

Load the example domain-specific lexicon for finance data.

filename = "financeSentimentLexicon.csv";
tbl = readtable(filename);
head(tbl)

        Token         SentimentScore
    ______________    ______________

    {'innovative'}             4    
    {'greater'   }        3.6216    
    {'efficiency'}        3.5971    
    {'enhance'   }        3.5628    
    {'better'    }        3.5532    
    {'creative'  }        3.5358    
    {'strengthen'}        3.5161    
    {'improved'  }         3.484

Evaluate the sentiment using the vaderSentimentScores function and specify the custom sentiment lexicon using the 'SentimentLexicon' option. Scores close to 1 indicate positive sentiment, scores close to -1 indicate negative sentiment, and scores close to 0 indicate neutral sentiment.

compoundScores = vaderSentimentScores(documents,'SentimentLexicon',tbl)

compoundScores = 2×1

    0.8762
   -0.1176
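
If discrete labels are more convenient than continuous scores, the compound scores can be thresholded. The sketch below hard-codes the scores shown above and uses a cutoff of 0.05, a convention borrowed from the documentation of the open-source VADER reference implementation. The cutoff and the variable names threshold and labels are assumptions for illustration; vaderSentimentScores does not define them.

```matlab
% Compound scores copied from the output above.
compoundScores = [0.8762; -0.1176];

% Hypothetical cutoff; an assumption, not an option of vaderSentimentScores.
threshold = 0.05;

% Start with every document labeled neutral, then overwrite the
% entries whose compound score passes the cutoff in either direction.
labels = repmat("neutral",size(compoundScores));
labels(compoundScores >= threshold) = "positive";
labels(compoundScores <= -threshold) = "negative";
labels = categorical(labels)
```

A wider cutoff assigns more documents to the neutral class; a suitable value depends on the data.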

Input Arguments

`documents` — Input documents
`tokenizedDocument` array

Input documents, specified as a tokenizedDocument array.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Boosters',["verry"; "verrry"] specifies the boosters "verry" and "verrry".

`SentimentLexicon` — Sentiment lexicon
table

Sentiment lexicon, specified as a table with these variables:

Token – Token, specified as a string scalar. The tokens must be lowercase.
SentimentScore – Sentiment score of token, specified as a numeric scalar in the range [-4, 4], where scores close to -4 indicate strong negative sentiment, scores close to 4 indicate strong positive sentiment, and scores close to 0 indicate neutral sentiment.
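
A lexicon with these two variables can also be built directly in code rather than read from a file. The following is a minimal sketch, assuming Text Analytics Toolbox is installed; the tokens and scores are made up for illustration and do not come from any published lexicon.

```matlab
% Hypothetical lowercase tokens and scores in the range [-4, 4].
Token = ["bullish"; "bearish"; "stable"];
SentimentScore = [3.2; -3.1; 0.4];

% The table variable names are taken from the workspace variable names,
% so they match the names that SentimentLexicon expects.
lexicon = table(Token,SentimentScore);

% Score a document using only the custom lexicon.
documents = tokenizedDocument("Analysts are bullish on this stock.");
compoundScore = vaderSentimentScores(documents,'SentimentLexicon',lexicon)
```

Tokens that do not appear in the custom lexicon contribute no sentiment score of their own, so a small lexicon such as this one leaves most words unscored.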

When evaluating sentiment, the software, by default, ignores tokens with one character and replaces emojis with an equivalent textual description before computing the sentiment scores. For example, the software replaces instances of the emoji "😀" with the text "grinning face" and then evaluates the sentiment scores. If you provide tokens with one character or emojis with corresponding sentiment scores in SentimentLexicon, then the function does not remove or replace these tokens.

The default sentiment lexicon is the VADER sentiment lexicon.

Data Types: table

`Boosters` — List of booster words or n-grams
string array

List of booster words or n-grams, specified as a string array.

The function uses booster n-grams, such as "absolutely" and "amazingly", to boost the sentiment of subsequent tokens.

For a list of words, the list must be a column vector. For a list of n-grams, the list has size NumNgrams-by-maxN, where NumNgrams is the number of n-grams, and maxN is the length of the largest n-gram. The (i,j)th element of the list is the jth word of the ith n-gram. If the number of words in the ith n-gram is less than maxN, then the remaining entries of the ith row of the list are empty.
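
The n-gram layout described above can be sketched as follows. The booster list is hypothetical, and the sketch assumes that "empty" entries are zero-length strings (""), which is how MATLAB string arrays are usually padded; check the behavior in your release before relying on it.

```matlab
% Hypothetical booster list mixing unigrams and one bigram (maxN = 2).
% Shorter n-grams are padded with "" so that every row has maxN entries.
boosters = [
    "absolutely" ""
    "amazingly"  ""
    "way"        "more"];

% Score a document using the custom booster list in place of the default.
documents = tokenizedDocument("The sequel is way more exciting.");
compoundScores = vaderSentimentScores(documents,'Boosters',boosters)
```

The same NumNgrams-by-maxN layout would apply to a custom list of dampener n-grams.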

The booster n-grams must be lowercase.

The default list of booster n-grams is the VADER booster list.

Data Types: string

`Dampeners` — List of dampener words or n-grams
string array

List of dampener words or n-grams, specified as a string array.

The function uses dampener n-grams, such as "hardly" and "somewhat", to dampen the sentiment of subsequent tokens.

The dampener n-grams must be lowercase.

The default list of dampener n-grams is the VADER dampener list.

Data Types: string

`Negations` — List of negation words
string array

List of negation words, specified as a string array.

The function uses negation words, such as "not" and "isn't", to negate the sentiment of subsequent tokens.
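
As a rough illustration, a custom negation list can be passed as a column vector of lowercase words. The list below is hypothetical, and the sketch assumes that a specified list is used in place of the default list rather than being appended to it.

```matlab
% Hypothetical negation list; assumed to replace the default list.
negations = ["not"; "never"; "without"];

% Two documents that differ only by a negation word.
documents = tokenizedDocument([
    "The service was good."
    "The service was never good."]);

compoundScores = vaderSentimentScores(documents,'Negations',negations)
```

Comparing the two scores gives a quick check of whether a word in the list is being treated as a negation.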

The negation words must be lowercase.

The default list of negation words is the VADER negation list.

Data Types: string

Output Arguments

`compoundScores` — Compound sentiment scores
numeric vector

Compound sentiment scores, returned as a numeric vector. The function returns one score for each input document. The value compoundScores(i) corresponds to the compound sentiment score of documents(i).

The function determines the compound scores by aggregating individual token scores, adjusting them according to the VADER algorithm rules, and then normalizing the result to the range from -1 to 1.
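
To give a sense of the normalization step, the open-source VADER reference implementation maps an aggregated score x to x/sqrt(x^2 + alpha) with alpha = 15. Whether vaderSentimentScores uses exactly this formula and constant is an assumption; the sketch below only illustrates how such a mapping keeps values strictly between -1 and 1. The names alpha, normalizeScore, and rawScores are hypothetical.

```matlab
% Assumed normalization, following the open-source VADER reference
% implementation: x/sqrt(x^2 + alpha) with alpha = 15.
alpha = 15;
normalizeScore = @(x) x ./ sqrt(x.^2 + alpha);

% Hypothetical aggregated token scores for five documents.
rawScores = [-10 -2 0 2 10];

% Large magnitudes approach -1 or 1 but never reach them; 0 stays 0.
normalized = normalizeScore(rawScores)
```

Because the mapping is monotonic, it preserves the ordering of the aggregated scores while compressing large magnitudes.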

`positiveScores` — Positive sentiment scores
numeric vector

Positive sentiment scores, returned as a numeric vector. The function returns one score for each input document. The value positiveScores(i) corresponds to the positive sentiment score of documents(i).

`negativeScores` — Negative sentiment scores
numeric vector

Negative sentiment scores, returned as a numeric vector. The function returns one score for each input document. The value negativeScores(i) corresponds to the negative sentiment score of documents(i).

`neutralScores` — Neutral sentiment scores
numeric vector

Neutral sentiment scores, returned as a numeric vector. The function returns one score for each input document. The value neutralScores(i) corresponds to the neutral sentiment score of documents(i).

References

[1] Hutto, C., and Eric Gilbert. “VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text.” Proceedings of the International AAAI Conference on Web and Social Media 8, no. 1 (May 16, 2014): 216–25. https://doi.org/10.1609/icwsm.v8i1.14550.

Version History

Introduced in R2019b
