My similarity matrix is very large. How can I process it without segmenting it?
Hi,
I have a similarity matrix of size 17770x17770, and when I process it I run out of memory.
In fact, I originally built this similarity matrix by segmenting the original matrix into 7 parts, each of size 2500x17770, and then collecting these parts to get the final matrix. But in the next step I cannot process it in parts, because I want to cluster the whole similarity matrix, so processing it piece by piece is impossible.
Is there a way to process this similarity matrix?
Thanks in advance
23 Comments
per isakson
on 26 Apr 2013
data type?
huda nawaf
on 27 Apr 2013
huda nawaf
on 27 Apr 2013
Matt J
on 27 Apr 2013
How sparse is the matrix?
Cedric
on 28 Apr 2013
Use NNZ on a 2500x17770 block to tell how sparse it is, and if the count is significantly smaller than the product of the dimensions, go for a solution based on SPARSE matrices.
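A minimal sketch of that check (the variable name `block` is just illustrative; substitute whatever holds one of your 2500x17770 segments):

```matlab
% block: one 2500x17770 segment of the similarity matrix
density = nnz(block) / numel(block);   % fraction of nonzero entries
if density < 0.1                       % threshold is a judgment call
    blockSparse = sparse(block);       % stores only the nonzeros
end
```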
Walter Roberson
on 28 Apr 2013
What is the maximum co-occurrence count? Could you represent it as int16? That would only be on the order of 600 MB, which could probably be processed even on a 32-bit MS Windows system if the /3GB switch was in effect.
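One way to check whether the counts actually fit in int16 before converting (a sketch; `co_occurance_mat` is the variable name used later in this thread):

```matlab
maxCount = max(co_occurance_mat(:));     % largest co-occurrence count
if maxCount <= intmax('int16')           % intmax('int16') is 32767
    co_occurance_mat = int16(co_occurance_mat);  % 2 bytes per element
end
```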
huda nawaf
on 28 Apr 2013
huda nawaf
on 28 Apr 2013
Walter Roberson
on 28 Apr 2013
nnz should be lower-case.
Cedric
on 28 Apr 2013
Hi Huda, nnz should be applied to a variable that contains one of the 2500*17770 blocks of data. As Walter mentions, the function name is lower case (but some of us tend to write function names in upper case on the forum, to differentiate them from the rest of the text).
huda nawaf
on 28 Apr 2013
huda nawaf
on 28 Apr 2013
Cedric
on 28 Apr 2013
Ok, it is dense. A full 17770x17770 matrix stored as a double (class/type) array takes a little more than 2.5 GB. How much RAM do you have, and are you working on a 32- or 64-bit system? If, for any reason, 2.5 GB is too large for your system, you can either keep operating on blocks or, as Walter mentions, work with a lower-precision class/type of array (double is 8 bytes; less-precise 4- and 2-byte classes are also available).
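Those sizes follow directly from elements times bytes per element; a quick sketch of the arithmetic in MATLAB:

```matlab
n = 17770;
bytesPerElement = [8 4 2];               % double, single, int16
gigabytes = n^2 * bytesPerElement / 1e9  % ~2.53, ~1.26, ~0.63 GB
```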
huda nawaf
on 28 Apr 2013
Walter Roberson
on 28 Apr 2013
When you are initializing the integer co-occurrence matrix, instead of initializing it as zeros(17770,17770), initialize it as zeros(17770,17770,'int32').
Then when you want to normalize it, use
co_occurance_mat = single(co_occurance_mat) ./ single(max(co_occurance_mat(:)));
That might still cause you to run out of memory because of the temporary space needed to do the conversion and division. If it does, then probably the formation of the distance matrix during clustering would also run out of memory.
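If that temporary does blow the memory budget, one possible workaround (a sketch, reusing the 2500-column block size from earlier in the thread) is to convert and normalize in column blocks, so only a slice is duplicated at any one time:

```matlab
m = single(max(co_occurance_mat(:)));    % scalar maximum of the counts
out = zeros(17770, 17770, 'single');     % preallocate the result
for j = 1:2500:17770
    cols = j : min(j + 2499, 17770);     % one block of columns
    out(:, cols) = single(co_occurance_mat(:, cols)) ./ m;
end
```

Note this still holds both the int32 matrix and the single result in memory at once; the saving is only in avoiding a third full-size temporary.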
huda nawaf
on 28 Apr 2013
Walter Roberson
on 28 Apr 2013
We needed to see the result of nnz to know whether the matrix was sparse or dense. It turns out to be dense, so the idea of using sparse calculations to save memory will not work.
Matt J
on 28 Apr 2013
"Anyway, I want someone tell me how deal with blocks of matrix to make clustering for total matrix?"
That question becomes unnecessary if it turns out that the majority of your matrix elements are zeros. In that case, you don't have to break the matrix into blocks. You would use the SPARSE command to make the entire matrix fit into memory. Since you seem unaware of SPARSE and what it does, the others want to make sure you consider it before proceeding.
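For illustration, SPARSE can also build the matrix directly from (row, column, value) triplets, so the dense array never has to exist at all (the indices and values below are made up):

```matlab
i = [1; 3; 17770];                    % illustrative row indices
j = [2; 4; 1];                        % illustrative column indices
v = [0.9; 0.7; 0.2];                  % the corresponding similarities
S = sparse(i, j, v, 17770, 17770);    % storage proportional to nnz(S)
```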
Walter Roberson
on 28 Apr 2013
It appears to me that you could save memory during the clustering by not using pdist yourself, and instead use
L = linkage(d, 'ward', 'euclidean', 'savememory', 'on');
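A sketch of how that fits into the clustering step, assuming `X` holds the raw observations in rows (with 'savememory','on', linkage computes distances on the fly instead of forming the full pairwise distance matrix; 'ward' needs the raw data, not a precomputed distance matrix):

```matlab
L = linkage(X, 'ward', 'euclidean', 'savememory', 'on');
T = cluster(L, 'maxclust', 10);   % e.g. cut the tree into 10 clusters
```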
huda nawaf
on 29 Apr 2013
Aishwarya Iyengar
on 9 Jul 2020
@huda nawaf
I have a question :
How do I create a similarity matrix for 300x300 images?
Please help. Thanks in advance.
Answers (0)