- Process large text files to find unique words and their frequencies.
- Visually represent those word frequencies when there are thousands of unique words.
Visually representing word frequencies for a large amount of text
I am working on text mining. I have some text files that contain millions of words, and I want to determine their word frequencies. I have two problems:
- How can I process large data in MATLAB to find the unique words and their occurrences in a text document (containing millions of words)?
- After finding the unique words and their occurrences, how can I represent them graphically, e.g. with a Circos plot or a pie chart (the unique words can number in the thousands)?
Answers (1)
  Samayochita
 on 18 Jun 2025
        Hi moin khan,
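I understand that, while working on large-scale text mining in MATLAB, the goal is to find the unique words and their frequencies in large text files and then represent those frequencies visually.
To process large text data efficiently in MATLAB: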
Step 1: Read large files
Use fileread for files that fit in memory, or fopen with fgetl to read line by line when memory is a concern.
textData = fileread('largeTextFile.txt');  % Suitable for moderately large files
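For very large files, prefer reading line by line:
fid = fopen('largeTextFile.txt', 'r');
if fid == -1, error('Could not open largeTextFile.txt'); end
while ~feof(fid)
    tline = fgetl(fid);  % one line of text; named tline to avoid shadowing the built-in line function
    % process tline
end
fclose(fid);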
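If the file is too large to hold in memory, one option is to update a running count while reading line by line. The following is a minimal sketch rather than a tuned implementation. It assumes that words are runs of letters, digits, or underscores (\w+), keeps the counts in a containers.Map, and writes a tiny temporary sample file only so that it runs as-is. The variable names are illustrative.

```matlab
% Create a tiny sample file so the sketch runs as-is; use your own path instead.
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'The cat saw the dog.\nThe dog ran!\n');
fclose(fid);

% Running word counts: key = word (char), value = number of occurrences.
wordCounts = containers.Map('KeyType', 'char', 'ValueType', 'double');

fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
while true
    tline = fgetl(fid);
    if ~ischar(tline)          % fgetl returns -1 at end of file
        break
    end
    % Lowercase the line and extract its words (assumption: \w+ defines a word).
    tokens = regexp(lower(tline), '\w+', 'match');
    for k = 1:numel(tokens)
        w = tokens{k};
        if isKey(wordCounts, w)
            wordCounts(w) = wordCounts(w) + 1;
        else
            wordCounts(w) = 1;
        end
    end
end
fclose(fid);
delete(filename);              % remove the temporary sample file

uniqueWords = keys(wordCounts);            % 1-by-N cell array of words
counts = cell2mat(values(wordCounts));     % 1-by-N counts, same order as the keys
```

Only one line of text and the map of unique words are held in memory at a time, so memory use grows with the vocabulary size rather than with the file size. Updating the map word by word can be slow for millions of words. Counting each line or block with unique and accumarray first and then merging into the map is a possible refinement.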
Step 2: Tokenize text and clean it (optional but preferred)
Break the text into words, convert to lowercase, remove punctuation, etc.
cleanedText = lower(regexprep(textData, '[^\w\s]', ''));  % remove punctuation
words = split(cleanedText);  % tokenize
words = words(~cellfun('isempty',words));  % remove empty strings
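To see what the cleaning step does, here is a small worked example on a made-up sentence (sampleText is just an illustration). Note that \w also keeps digits and underscores, and that removing punctuation turns a word such as "don't" into "dont".

```matlab
sampleText = 'Hello, world! Hello again.';                 % made-up input
cleaned = lower(regexprep(sampleText, '[^\w\s]', ''));     % 'hello world hello again'
tokens = split(cleaned);                                   % split on whitespace
tokens = tokens(~cellfun('isempty', tokens));              % drop empty entries, if any
disp(tokens)
```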
Step 3: Count word frequencies
Use the “unique” and “accumarray” functions, or the “tabulate” function (which requires Statistics and Machine Learning Toolbox).
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx, 1);
OR
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
T = tabulate(words)
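With thousands of unique words, it usually helps to rank them by frequency before looking at them. A possible sketch, reusing the same toy word list (topN and freqTable are names chosen here for illustration):

```matlab
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx(:), 1);                  % one count per unique word

% Rank the words from most to least frequent.
[sortedCounts, order] = sort(counts, 'descend');
sortedWords = uniqueWords(order);

% Keep only the top N entries in a table for easy inspection.
topN = min(2, numel(sortedWords));
freqTable = table(sortedWords(1:topN)', sortedCounts(1:topN), ...
    'VariableNames', {'Word', 'Count'});
disp(freqTable)
```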
Step 4: Visualize word frequencies using word cloud
A word cloud chart is a good fit when there are hundreds or thousands of unique words, because it encodes frequency as word size.
wordcloud(uniqueWords, counts);
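Since a chart with thousands of labels is hard to read, it can help to limit what is drawn. The sketch below uses toy data in place of the real uniqueWords and counts. It shows a bar chart of the most frequent words and a word cloud capped with the MaxDisplayWords option. The cutoff of 20 words for the bar chart is an arbitrary choice.

```matlab
% Toy data standing in for the real results of Step 3.
uniqueWords = {'bird', 'cat', 'dog'};
counts = [1; 3; 2];

% Pick the most frequent words (at most 20 here).
[sortedCounts, order] = sort(counts, 'descend');
topN = min(20, numel(order));
topWords = uniqueWords(order(1:topN));

% Bar chart of the top words.
figure;
bar(sortedCounts(1:topN));
set(gca, 'XTick', 1:topN, 'XTickLabel', topWords);
xlabel('Word');
ylabel('Count');

% Word cloud limited to the 100 most frequent words.
figure;
wordcloud(uniqueWords, counts, 'MaxDisplayWords', 100);
```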
Please refer to the following documentation links for more information: 
 Hope this is helpful!