wordcount2

Version 1.0.0.0 (2.81 KB) by Lee White
Counts the unique words in a text file and returns the frequency of those words.
772 Downloads
Updated 9 Aug 2012


NAME: Wordcount2 (Written on Apr 8, 2008, corrected August 8, 2012)

AUTHORS: Suri Like (surilike@gmail.com)
Lee White (lwhite4@gmail.com)

PURPOSE:
This function reads the alphanumeric words (e.g. Finance, recycle, M16) from a plain text document (.txt) and displays the most frequently used words in the document. For example, after processing a document containing pizza recipes, I got the following output from this function:

'WORD' 'FREQ' 'REL. FREQ'
'dough' [ 170] '1.1336%'
'flour' [ 84] '0.5601%'
'oven' [ 70] '0.4668%'
'pizza' [ 49] '0.3268%'
'sauce' [ 47] '0.3134%'
'cheese' [ 39] '0.2601%'

The first column lists the most frequently used words in the document. The second column gives each word's frequency (the number of times that word appears in the document). The last column gives the word's relative frequency, which is its frequency divided by the total number of words in the document, expressed as a percentage. This function might be useful for statistical purposes, such as studying the writing habits of a particular author. Note that words are case-sensitive: 'Great' and 'great' are treated as two different words.
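The frequency and relative-frequency columns can be sketched in Python as follows. This is not the author's MATLAB code: it assumes a "word" is a maximal run of ASCII letters and digits (regex `[A-Za-z0-9]+`), it keeps case as-is to match the case-sensitive behavior described above, and the helper names `word_frequencies` and `relative_frequency` are hypothetical.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return (counts, total): a case-sensitive word -> frequency map and the total word count."""
    # Assumption: a word is a run of ASCII letters/digits, e.g. "Finance", "recycle", "M16".
    words = re.findall(r"[A-Za-z0-9]+", text)
    return Counter(words), len(words)

def relative_frequency(count, total):
    """Frequency divided by the total number of words, formatted as a percentage."""
    return f"{100.0 * count / total:.4f}%"

text = "Great dough, great flour. Dough rises; dough bakes."
counts, total = word_frequencies(text)
# "Dough" and "dough" (and "Great"/"great") are counted separately.
for word, n in counts.most_common(3):
    print(word, n, relative_frequency(n, total))
```

Here the text contains 8 words, so "dough" (2 occurrences) has a relative frequency of 25.0000%, while "Dough" is tallied as a different word.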

INPUTS:
The first input, 'filename', is the name of the text file. The second input, 'num', is the number of words you want the function to display. For example, to see only the 10 most frequently used words, set 'num' to 10. Only words used more than once are displayed, so if fewer than 'num' words appear more than once, the output will contain fewer words than you specified.
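The effect of 'num' can be illustrated with a small Python sketch. It is an assumption-based reading of the description above, not the author's implementation: words occurring only once are dropped, the rest are ranked by descending count, and at most `num` are kept. The function name `top_words` is hypothetical.

```python
from collections import Counter

def top_words(counts, num):
    """Return up to `num` (word, count) pairs, most frequent first.

    Assumption from the description: words that appear only once are not
    displayed, so fewer than `num` pairs may be returned.
    """
    repeated = [(w, c) for w, c in counts.items() if c > 1]
    # Sort by descending count; Python's sort is stable, so ties keep first-seen order.
    repeated.sort(key=lambda wc: wc[1], reverse=True)
    return repeated[:num]

counts = Counter({"dough": 5, "flour": 3, "oven": 2, "basil": 1})
print(top_words(counts, 10))  # only 3 words occur more than once
print(top_words(counts, 2))
```

Asking for 10 words returns only dough, flour and oven, because "basil" appears just once.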

OUTPUT:
The output 'results' is a table like the one in the pizza recipe example above. 'unique_words' contains all the unique words in the file, for verification. 'frequencies' contains the frequency of each of those words.

HOW TO USE:
Say you want to find the most frequently used words in an article you found on the web. Copy the article and paste it into Notepad, then save the text file under any name you like (e.g. 'article.txt'). Next, navigate in MATLAB to the directory containing the text file and type: results = wordcount('article.txt', 10) to see the 10 most frequently used words in article.txt.
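For readers without MATLAB, the whole workflow (read a text file, count words, return a results table plus the unique words and their frequencies) can be approximated in Python. This is a hypothetical analogue, not the MATLAB function: the name `wordcount_sketch`, the `encoding` parameter, the ASCII word pattern and the alphabetical ordering of `unique_words` are all assumptions made for illustration.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

def wordcount_sketch(filename, num, encoding="utf-8"):
    """Hypothetical analogue of the three outputs described above.

    Returns (results, unique_words, frequencies); results is a header row
    followed by up to `num` rows of [word, count, relative frequency].
    """
    text = Path(filename).read_text(encoding=encoding)
    words = re.findall(r"[A-Za-z0-9]+", text)   # assumption: ASCII alphanumeric words
    counts = Counter(words)
    total = len(words)
    unique_words = sorted(counts)                # every distinct word, for verification
    frequencies = [counts[w] for w in unique_words]
    # Keep only words used more than once, most frequent first, at most `num` of them.
    ranked = sorted(((w, c) for w, c in counts.items() if c > 1),
                    key=lambda wc: wc[1], reverse=True)[:num]
    results = [["WORD", "FREQ", "REL. FREQ"]]
    results += [[w, c, f"{100.0 * c / total:.4f}%"] for w, c in ranked]
    return results, unique_words, frequencies

# Usage: save a small "article" to a temporary file, then ask for the top 10 words.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "article.txt"
    path.write_text("pizza dough and more dough for pizza night", encoding="utf-8")
    results, unique_words, frequencies = wordcount_sketch(path, 10)
    for row in results:
        print(row)
```

Only "pizza" and "dough" occur more than once here, so the table has two data rows even though 10 were requested.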

NOTES:
The fopen call used by this program is sensitive to the character encoding of your source file. When possible, use ANSI encoding. If you run into problems, verify that MATLAB is reading your file correctly by examining the returned unique words cell array.
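Why checking the unique words helps can be shown with a short Python sketch (not part of wordcount2). It decodes the same bytes with two encodings. Windows "ANSI" usually means the system code page, commonly Windows-1252 in Western locales. The helper name `words_from_bytes` is hypothetical, and it assumes a Unicode-aware word pattern (`\w+`).

```python
import re

def words_from_bytes(raw, encoding):
    """Decode raw file bytes with `encoding` and return the sorted distinct words.

    Assumption: a word is a run of Unicode word characters, so a mis-decoded
    character can truncate or alter a word.
    """
    text = raw.decode(encoding)
    return sorted(set(re.findall(r"\w+", text)))

raw = "café menu".encode("utf-8")          # file saved as UTF-8
print(words_from_bytes(raw, "utf-8"))      # decoded correctly
print(words_from_bytes(raw, "cp1252"))     # same bytes read as Windows-1252
```

Read as Windows-1252, the UTF-8 bytes for "é" become two characters, and "café" turns into "cafÃ". This is the kind of corruption that shows up immediately when you inspect the list of unique words.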

Cite As

Lee White (2024). wordcount2 (https://www.mathworks.com/matlabcentral/fileexchange/37768-wordcount2), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2012a
Compatible with any release
Platform Compatibility
Windows macOS Linux
