wordcloud
Create word cloud chart from text data
Syntax
Description
wordcloud(___,Name,Value) specifies additional WordCloudChart properties using one or more name-value pair arguments.
wordcloud(parent,___) creates the word cloud in the figure, panel, or tab specified by parent.
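As a minimal sketch of the parent syntax, the following code places a word cloud inside a panel. The words and sizes are made-up illustration data, and the panel position is an arbitrary choice.

```matlab
% Hypothetical sample data, for illustration only.
words = ["matlab"; "chart"; "cloud"; "panel"; "figure"];
sizes = [12; 9; 7; 4; 3];

% Create a figure that contains a panel, then use the panel as the parent.
fig = figure;
p = uipanel(fig,'Position',[0.05 0.05 0.9 0.9]);
wc = wordcloud(p,words,sizes);
```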
wc = wordcloud(___) returns the WordCloudChart object. Use wc to modify properties of the word cloud after creating it. For a list of properties, see WordCloudChart Properties.
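The following sketch shows one way to use the returned object to change properties after the chart exists. The sample words and sizes are invented for illustration; the properties set here (Shape, MaxDisplayWords, HighlightColor) are the ones described on this page.

```matlab
% Hypothetical sample data, for illustration only.
words = ["alpha"; "beta"; "gamma"; "delta"; "epsilon"];
sizes = [20; 15; 10; 5; 1];

figure
wc = wordcloud(words,sizes);

% Modify the chart through the returned WordCloudChart object.
wc.Shape = 'rectangle';      % switch from the default oval shape
wc.MaxDisplayWords = 3;      % show only the three largest words
wc.HighlightColor = 'blue';  % color used for the largest words
```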
Note
Text Analytics Toolbox extends the functionality of the wordcloud (MATLAB®) function. It adds support for creating word clouds directly from string arrays, and creating word clouds from bag-of-words models, bag-of-n-gram models, and LDA topics. For the wordcloud (Text Analytics Toolbox) reference page, see wordcloud (Text Analytics Toolbox).
Examples
Create Word Cloud from Table
Load the example data sonnetsTable. The table tbl contains a list of words in the variable Word, and the corresponding frequency counts in the variable Count.
load sonnetsTable
head(tbl)
       Word        Count
    ___________    _____
    {'''tis'  }      1
    {''Amen'' }      1
    {''Fair'  }      2
    {''Gainst'}      1
    {''Since' }      1
    {''This'  }      2
    {''Thou'  }      1
    {''Thus'  }      1
Plot the table data using wordcloud. Specify the words and corresponding word sizes to be the Word and Count variables respectively.
figure
wordcloud(tbl,'Word','Count');
title("Sonnets Word Cloud")
Prepare Text Data for Word Clouds
If you have Text Analytics Toolbox™ installed, then you can create word clouds directly from string arrays. For more information, see wordcloud (Text Analytics Toolbox). If you do not have Text Analytics Toolbox, then you must preprocess the text data manually.
This example shows how to create a word cloud from plain text by reading it into a string array, preprocessing it, and passing it to the wordcloud function.
Read the text from Shakespeare's Sonnets with the fileread function and convert it to string.
sonnets = string(fileread("sonnets.txt"));
extractBefore(sonnets,"II")
ans =
"THE SONNETS
by William Shakespeare
I
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou, contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world's fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And tender churl mak'st waste in niggarding:
Pity the world, or else this glutton be,
To eat the world's due, by the grave and thee.
"
Split sonnets into a string array whose elements contain individual words. To do this, remove the punctuation characters and join all the string elements into a 1-by-1 string and then split on the space characters. Then, remove words with fewer than five characters and convert the words to lowercase.
punctuationCharacters = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,punctuationCharacters," ");
words = split(join(sonnets));
words(strlength(words)<5) = [];
words = lower(words);
words(1:10)
ans = 10x1 string
"sonnets"
"william"
"shakespeare"
"fairest"
"creatures"
"desire"
"increase"
"thereby"
"beauty's"
"might"
Convert words to a categorical array C and then plot using wordcloud. The function plots the unique elements of C with sizes corresponding to their frequency counts.
C = categorical(words);
figure
wordcloud(C);
title("Sonnets Word Cloud")
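To try the same preprocessing steps without the sonnets.txt file, the sketch below applies them to a short inline string. The sample sentence is invented for illustration; the steps themselves mirror the example above.

```matlab
% Hypothetical sample text, standing in for the contents of a file.
txt = "The quick brown fox. The lazy dog! Quick, quick: brown foxes?";

% Replace punctuation with spaces, split into words, drop short words,
% and convert to lowercase, as in the example above.
punctuationCharacters = ["." "?" "!" "," ";" ":"];
txt = replace(txt,punctuationCharacters," ");
words = split(join(txt));
words(strlength(words)<5) = [];
words = lower(words);

% Convert to categorical so wordcloud sizes each word by its frequency.
C = categorical(words);
figure
wordcloud(C);
title("Sample Word Cloud")
```

Because words shorter than five characters are removed, any empty strings produced by consecutive spaces are discarded in the same step.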
Specify Word Sizes
Create a word cloud from plain text by reading it into a string array, preprocessing it, and passing it to the wordcloud function.
Read the text from Shakespeare's Sonnets with the fileread function and convert it to string.
sonnets = string(fileread('sonnets.txt'));
extractBefore(sonnets,"II")
ans =
"THE SONNETS
by William Shakespeare
I
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou, contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world's fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And tender churl mak'st waste in niggarding:
Pity the world, or else this glutton be,
To eat the world's due, by the grave and thee.
"
Split sonnets into a string array whose elements contain individual words. To do this, remove the punctuation characters and join all the string elements into a 1-by-1 string and then split on the space characters. Then, remove words with fewer than five characters and convert the words to lowercase.
punctuationCharacters = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,punctuationCharacters," ");
words = split(join(sonnets));
words(strlength(words)<5) = [];
words = lower(words);
words(1:10)
ans = 10×1 string
"sonnets"
"william"
"shakespeare"
"fairest"
"creatures"
"desire"
"increase"
"thereby"
"beauty's"
"might"
Find the unique words in sonnets and count their frequency. Create a word cloud using the frequency counts as size data.
[numOccurrences,uniqueWords] = histcounts(categorical(words));
figure
wordcloud(uniqueWords,numOccurrences);
title("Sonnets Word Cloud")
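As an alternative sketch of the counting step, the categories and countcats functions return the unique words and their counts separately, which makes it easy to sort the words by frequency before plotting. The short word list is invented for illustration.

```matlab
% Hypothetical preprocessed words, for illustration only.
words = ["might"; "beauty"; "might"; "world"; "beauty"; "might"];

C = categorical(words);
uniqueWords = categories(C);     % cell array of character vectors, sorted
numOccurrences = countcats(C);   % counts in the same order as uniqueWords

% Sort by frequency so the most common words come first.
[sortedCounts,idx] = sort(numOccurrences,'descend');
topWords = uniqueWords(idx);

figure
wordcloud(topWords,sortedCounts);
title("Sample Word Cloud")
```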
Specify Word Colors
Load the example data sonnetsTable. The table tbl contains a list of words in the Word variable, and corresponding frequency counts in the Count variable.
load sonnetsTable
head(tbl)
       Word        Count
    ___________    _____
    {'''tis'  }      1
    {''Amen'' }      1
    {''Fair'  }      2
    {''Gainst'}      1
    {''Since' }      1
    {''This'  }      2
    {''Thou'  }      1
    {''Thus'  }      1
Plot the table data using wordcloud. Specify the words and corresponding word sizes to be the Word and Count variables respectively. To set the word colors to random values, set 'Color' to a random matrix of RGB triplets with one row for each word.
numWords = size(tbl,1);
colors = rand(numWords,3);
figure
wordcloud(tbl,'Word','Count','Color',colors);
title("Sonnets Word Cloud")
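The color matrix does not have to be random. As a sketch under stated assumptions, the code below builds a small table and colors words by a frequency threshold. The table contents and the threshold of 10 are invented for illustration, and the chart may still apply its highlight color to the largest words.

```matlab
% Hypothetical word counts, for illustration only.
words = {'love'; 'time'; 'beauty'; 'heart'};
counts = [20; 12; 7; 3];
tbl = table(words,counts,'VariableNames',{'Word','Count'});

% Start with the default gray for every word, one row per word.
numWords = size(tbl,1);
colors = repmat([0.3804 0.3804 0.3804],numWords,1);

% Use blue for words that occur at least 10 times (arbitrary threshold).
isFrequent = tbl.Count >= 10;
colors(isFrequent,:) = repmat([0 0.4470 0.7410],nnz(isFrequent),1);

figure
wordcloud(tbl,'Word','Count','Color',colors);
```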
Create Word Cloud Using Text Analytics Toolbox
If you have Text Analytics Toolbox installed, then you can create word clouds directly from string arrays. If you do not have Text Analytics Toolbox, then you must preprocess the text data manually. For an example showing how to create a word cloud without Text Analytics Toolbox, see Prepare Text Data for Word Clouds.
Extract the text from sonnets.txt using extractFileText.
str = extractFileText("sonnets.txt");
extractBefore(str,"II")
ans =
"THE SONNETS
by William Shakespeare
I
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou, contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world's fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And tender churl mak'st waste in niggarding:
Pity the world, or else this glutton be,
To eat the world's due, by the grave and thee.
"
Display the words from the sonnets in a word cloud.
figure
wordcloud(str);
Input Arguments
wordVar
— Table variable for word data
string scalar | character vector | numeric index | logical vector
Table variable for word data, specified as a string scalar, character vector, numeric index, or a logical vector.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | string
sizeVar
— Table variable for size data
string scalar | character vector | numeric index | logical vector
Table variable for size data, specified as a string scalar, character vector, numeric index, or a logical vector.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | string
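The sketch below shows the three ways of identifying the table variables that the wordVar and sizeVar descriptions list: by name, by numeric index, and by logical vector. The small table is invented for illustration, and the three calls are assumed to produce the same chart.

```matlab
% Hypothetical table, for illustration only.
tbl = table({'alpha';'beta';'gamma'},[10;5;2], ...
    'VariableNames',{'Word','Count'});

figure
wc1 = wordcloud(tbl,'Word','Count');             % variables by name

figure
wc2 = wordcloud(tbl,1,2);                        % variables by numeric index

figure
wc3 = wordcloud(tbl,[true false],[false true]);  % variables by logical vector
```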
C
— Input categorical data
categorical array
Input categorical data, specified as a categorical array. The function plots each unique element of C with size corresponding to histcounts(C).
Data Types: categorical
words
— Input words
string vector | cell array of character vectors
Input words, specified as a string vector or cell array of character vectors.
Data Types: string | cell
sizeData
— Word size data
numeric vector
Word size data, specified as a numeric vector.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
parent
— Parent container
Figure object | Panel object | Tab object | TiledChartLayout object | GridLayout object
Parent container, specified as a Figure, Panel, Tab, TiledChartLayout, or GridLayout object.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'HighlightColor','red' sets the highlight color to red.
The WordCloudChart properties listed here are only a subset. For a complete list, see WordCloudChart Properties.
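As a brief illustration of the two name-value syntaxes, the calls below are intended to be equivalent. The sample data is invented; the Name=Value form requires R2021a or later.

```matlab
% Hypothetical sample data, for illustration only.
words = ["alpha"; "beta"; "gamma"];
sizes = [10; 5; 2];

% Name=Value syntax (R2021a and later).
figure
wc1 = wordcloud(words,sizes,Shape="rectangle",MaxDisplayWords=2);

% Comma-separated syntax with quoted names (works in earlier releases too).
figure
wc2 = wordcloud(words,sizes,'Shape','rectangle','MaxDisplayWords',2);
```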
MaxDisplayWords
— Maximum number of words to display
100 (default) | nonnegative integer
Maximum number of words to display, specified as a nonnegative integer. The software displays the MaxDisplayWords largest words.
Color
— Word color
[0.3804 0.3804 0.3804] (default) | RGB triplet | character vector containing a color name | matrix
Word color, specified as an RGB triplet, a character vector containing a color name, or an N-by-3 matrix where N is the length of WordData. If Color is a matrix, then each row corresponds to an RGB triplet for the corresponding word in WordData.
RGB triplets and hexadecimal color codes are useful for specifying custom colors.
An RGB triplet is a three-element row vector whose elements specify the intensities of the red, green, and blue components of the color. The intensities must be in the range [0,1]; for example, [0.4 0.6 0.7].
A hexadecimal color code is a character vector or a string scalar that starts with a hash symbol (#) followed by three or six hexadecimal digits, which can range from 0 to F. The values are not case sensitive. Thus, the color codes "#FF8800", "#ff8800", "#F80", and "#f80" are equivalent.
Alternatively, you can specify some common colors by name. This table lists the named color options, the equivalent RGB triplets, and hexadecimal color codes.
Color Name | Short Name | RGB Triplet | Hexadecimal Color Code |
---|---|---|---|
"red" | "r" | [1 0 0] | "#FF0000" |
"green" | "g" | [0 1 0] | "#00FF00" |
"blue" | "b" | [0 0 1] | "#0000FF" |
"cyan" | "c" | [0 1 1] | "#00FFFF" |
"magenta" | "m" | [1 0 1] | "#FF00FF" |
"yellow" | "y" | [1 1 0] | "#FFFF00" |
"black" | "k" | [0 0 0] | "#000000" |
"white" | "w" | [1 1 1] | "#FFFFFF" |
Here are the RGB triplets and hexadecimal color codes for the default colors MATLAB uses in many types of plots.
RGB Triplet | Hexadecimal Color Code |
---|---|
[0 0.4470 0.7410] | "#0072BD" |
[0.8500 0.3250 0.0980] | "#D95319" |
[0.9290 0.6940 0.1250] | "#EDB120" |
[0.4940 0.1840 0.5560] | "#7E2F8E" |
[0.4660 0.6740 0.1880] | "#77AC30" |
[0.3010 0.7450 0.9330] | "#4DBEEE" |
[0.6350 0.0780 0.1840] | "#A2142F" |
Example: 'blue'
Example: [0 0 1]
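To make the relationship between hexadecimal color codes and RGB triplets concrete, the sketch below converts a hex code to the equivalent triplet by hand. It is only an illustration of the arithmetic (each pair of hex digits is divided by 255), not a description of how wordcloud parses colors internally.

```matlab
hexCode = "#F80";   % three-digit shorthand, equivalent to "#FF8800"

% Strip the hash symbol and expand three digits to six by doubling each.
hexDigits = char(extractAfter(hexCode,"#"));
if numel(hexDigits) == 3
    hexDigits = repelem(hexDigits,2);   % 'F80' becomes 'FF8800'
end

% Convert each pair of hex digits to an intensity in the range [0,1].
rgb = hex2dec(reshape(hexDigits,2,3)')' / 255;
```

Because hex2dec is not case sensitive, "#f80" and "#ff8800" produce the same triplet.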
HighlightColor
— Word highlight color
[0.7529 0.2980 0.0431] (default) | RGB triplet | character vector containing a color name
Word highlight color, specified as an RGB triplet, or a character vector containing a color name. The software highlights the largest words with this color.
RGB triplets and hexadecimal color codes are useful for specifying custom colors.
An RGB triplet is a three-element row vector whose elements specify the intensities of the red, green, and blue components of the color. The intensities must be in the range [0,1]; for example, [0.4 0.6 0.7].
A hexadecimal color code is a character vector or a string scalar that starts with a hash symbol (#) followed by three or six hexadecimal digits, which can range from 0 to F. The values are not case sensitive. Thus, the color codes "#FF8800", "#ff8800", "#F80", and "#f80" are equivalent.
Alternatively, you can specify some common colors by name. This table lists the named color options, the equivalent RGB triplets, and hexadecimal color codes.
Color Name | Short Name | RGB Triplet | Hexadecimal Color Code |
---|---|---|---|
"red" | "r" | [1 0 0] | "#FF0000" |
"green" | "g" | [0 1 0] | "#00FF00" |
"blue" | "b" | [0 0 1] | "#0000FF" |
"cyan" | "c" | [0 1 1] | "#00FFFF" |
"magenta" | "m" | [1 0 1] | "#FF00FF" |
"yellow" | "y" | [1 1 0] | "#FFFF00" |
"black" | "k" | [0 0 0] | "#000000" |
"white" | "w" | [1 1 1] | "#FFFFFF" |
Here are the RGB triplets and hexadecimal color codes for the default colors MATLAB uses in many types of plots.
RGB Triplet | Hexadecimal Color Code |
---|---|
[0 0.4470 0.7410] | "#0072BD" |
[0.8500 0.3250 0.0980] | "#D95319" |
[0.9290 0.6940 0.1250] | "#EDB120" |
[0.4940 0.1840 0.5560] | "#7E2F8E" |
[0.4660 0.6740 0.1880] | "#77AC30" |
[0.3010 0.7450 0.9330] | "#4DBEEE" |
[0.6350 0.0780 0.1840] | "#A2142F" |
Example: 'blue'
Example: [0 0 1]
Shape
— Shape of word cloud
'oval' (default) | 'rectangle'
Shape of word cloud chart, specified as 'oval' or 'rectangle'.
Example: 'rectangle'
LayoutNum
— Word placement layout
1 (default) | nonnegative integer
Word placement layout, specified as a nonnegative integer. If you repeatedly call wordcloud with the same inputs, then the word placement layouts will be the same each time. To get different word placement layouts, use different values of LayoutNum.
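As a short sketch, the code below draws the same data with two different LayoutNum values so the word placements can be compared side by side in separate figures. The sample data is invented for illustration.

```matlab
% Hypothetical sample data, for illustration only.
words = ["alpha"; "beta"; "gamma"; "delta"; "epsilon"];
sizes = [20; 15; 10; 5; 1];

figure
wcA = wordcloud(words,sizes,'LayoutNum',1);
title("LayoutNum = 1")

figure
wcB = wordcloud(words,sizes,'LayoutNum',2);
title("LayoutNum = 2")
```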
Output Arguments
wc
— WordCloudChart object
WordCloudChart object
WordCloudChart object. You can modify the properties of a WordCloudChart after it is created. For more information, see WordCloudChart Properties.
Tips
Text Analytics Toolbox extends the functionality of the wordcloud (MATLAB) function. It adds support for creating word clouds directly from string arrays, and creating word clouds from bag-of-words models, bag-of-n-gram models, and LDA topics. For the wordcloud (Text Analytics Toolbox) reference page, see wordcloud (Text Analytics Toolbox).
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
The wordcloud function supports tall arrays with the following usage notes and limitations:
The syntax wc = wordcloud(str), where str is a string array, character vector, or cell array of character vectors (these inputs require Text Analytics Toolbox), is not supported.
When the words and sizeData inputs are provided as tall arrays, they are gathered into memory and thus must fit into memory.
Version History
Introduced in R2017b