How can I process a very large similarity matrix without segmenting it?

*Hi,
I have a similarity matrix of size 17770*17770. When I process it, I run out of memory.
In fact, I originally built this similarity matrix by segmenting the original matrix into 7 parts, each of size 2500*17770, then combining these parts to get the final matrix. But in the next step I cannot process it part by part, because I want to run clustering on the whole similarity matrix, so processing it in pieces is impossible.
Is there a way to process this similarity matrix?*
Thanks in advance

23 Comments

How sparse is the matrix? How is it computed?
To Matt J,
The 17770*17770 matrix is a co-occurrence matrix; I obtained it by computing it in 7 segments.
I have 17770 files; each file is a movie that has been seen by a number of users. I wanted to compute the co-occurrence of all movies, so I computed this matrix in parts. Once I have the full matrix, I would like to cluster the movies based on it.
Thanks
To Per,
It is a co-occurrence matrix; the values are integers.
What kind of processing do you want to do on it? If it doesn't fit into memory and you need to do neighborhood-like processing on it, write it to a TIFF file and use something like blockproc on it.
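A minimal sketch of that approach, assuming the counts are first scaled to fit a 16-bit TIFF (the variable name co_mat is illustrative, and blockproc requires the Image Processing Toolbox):

```matlab
% Sketch: store the matrix as a 16-bit TIFF and process it block by block.
% Counts above 65535 must be scaled (or clipped) to fit uint16 first.
scaled = uint16(double(co_mat) * (65535 / double(max(co_mat(:)))));
imwrite(scaled, 'co_mat.tif');

% Example per-block operation: maximum of each 2500x17770 block.
fun    = @(blk) max(blk.data(:));
blkMax = blockproc('co_mat.tif', [2500 17770], fun);
```

With a file name as its first argument, blockproc reads one block at a time, so the full matrix never has to be in memory at once.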
Use NNZ on a 2500x17770 block to see how sparse it is, and if the count is significantly smaller than the product of the dimensions, go for a solution based on SPARSE matrices.
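For example (with blk standing in for one of the stored 2500x17770 blocks):

```matlab
% What fraction of one block is nonzero?
fractionNonzero = nnz(blk) / numel(blk);
if fractionNonzero < 0.5
    % Sparse storage pays off only when most entries are zero;
    % sparse() accepts double or logical input, hence the cast.
    blkSparse = sparse(double(blk));
end
```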
What is the maximum co-occurrence count? Could you represent it as int16? That would only be on the order of 600 MB, which could probably be processed even on a 32-bit MS Windows system if /3GB was in effect.
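The figure comes straight from the element size; a back-of-the-envelope check (keeping in mind that int16 tops out at 32767, so it only works if every count fits):

```matlab
n = 17770;
bytesDouble = n^2 * 8    % ~2.5 GB at 8 bytes per element
bytesInt16  = n^2 * 2    % ~0.63 GB, valid only if all counts <= intmax('int16')
```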
To Anand,
I tried to cluster the co-occurrence matrix using the Ward method (following code), but when I used another method, spectral clustering (Newman's), I got out of memory too.
d = dlmread('d:\matlab\r2011a\bin\co_occurance_mov.txt');   % load co-occurrence matrix
p = pdist(d, 'euclidean');                                  % pairwise distances
No_cluster = 4;
L = linkage(p, 'ward');                                     % Ward hierarchical clustering
cluster1 = struct('user', 0, 'length', 0);
T{1} = cluster(L, 'maxclust', No_cluster);
for k = 1:No_cluster
    z = find(T{1} == k);                                    % members of cluster k
    cluster1(k).user   = z(:)';
    cluster1(k).length = length(z);
end
f0 = fopen('d:\matlab\r2011a\bin\new_movielens\100k_mov\divid_seq\cluster2_ward.txt', 'w');
for i = 1:No_cluster
    fprintf(f0, '%d ', cluster1(i).user);                   % one cluster per line
    fprintf(f0, '\n');
end
fclose(f0);
thanks
To Cedric, when I used NNZ on the 17770*17770 matrix, I got
Undefined function or method 'NNZ' for input arguments of type 'double'.
Walter, the max count is 232940; this is a diagonal element, i.e., it is the co-occurrence of an item with itself.
thanks
Hi Huda, nnz should be applied to a variable that contains one of the 2500*17770 blocks of data. As Walter mentions, the function name is lower case (but some of us tend to write function names in upper case on the forum, to differentiate them from the rest of the text).
nnz for the co-occurrence matrix is 315077904
the product of the dimensions is 315772900
thanks
Walter, I have to normalize the matrix before clustering, so the max count will be 1.
Ok, it is dense. A full matrix of size 17770*17770 stored as a double (class/type) array takes a little more than 2.5 GB. How much RAM do you have, and are you working on a 32- or 64-bit system? If, for any reason, 2.5 GB is too large for your system, you can either keep operating on blocks or, as Walter mentions, work with a lower-precision class/type of array (double is 8 bytes, and there are less precise 4- and 2-byte classes available).
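Operating on blocks can be as simple as reading the text file a slab of rows at a time, instead of loading all 17770 rows at once (a sketch using the poster's 2500-row block size; fscanf fills its output column-wise, hence the transpose):

```matlab
nRows = 17770;  blkRows = 2500;
fid = fopen('co_occurance_mov.txt', 'r');
for r0 = 1:blkRows:nRows
    nr  = min(blkRows, nRows - r0 + 1);
    blk = fscanf(fid, '%f', [nRows, nr])';   % nr x 17770 slab of rows
    % ... process blk here, e.g. accumulate per-block statistics ...
end
fclose(fid);
```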
My system is 64bit, and 6 GB of RAM.
In this case , must I use blocks or what Walter suggested. if so, please give me an idea how use blocks or how work with a lower precision?
thanks in advance
When you are initializing the integer co-occurrence matrix, instead of initializing it as zeros(17770,17770), initialize it as zeros(17770,17770,'int32').
Then when you want to normalize it, use
co_occurance_mat = single(co_occurance_mat) ./ single(max(co_occurance_mat(:)));
That might still cause you to run out of memory because of the temporary space needed to do the conversion and division. If it does, then probably the formation of the distance matrix during clustering would also run out of memory.
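If the one-shot conversion runs out of memory, the normalization itself can be done a slab of rows at a time, so only one small temporary exists at any moment (a sketch; co_occurance_mat is the int32 matrix built above):

```matlab
m       = single(max(co_occurance_mat(:)));          % global maximum count
co_norm = zeros(size(co_occurance_mat), 'single');   % ~1.26 GB result
blkRows = 2500;
for r0 = 1:blkRows:size(co_occurance_mat, 1)
    r1 = min(r0 + blkRows - 1, size(co_occurance_mat, 1));
    % only a 2500-row temporary is created per iteration
    co_norm(r0:r1, :) = single(co_occurance_mat(r0:r1, :)) ./ m;
end
```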
Hi Cedric, why do I have to use nnz on a 2500*17770 block? We need to know the number of nonzeros in the total matrix, right?
Anyway, I would like someone to tell me how to work with blocks of the matrix so I can cluster the total matrix.
We needed to see the result of nnz to know whether the matrix was sparse or dense. It turns out to be dense, so the idea of using sparse calculations to save memory will not work.
*Anyway, I would like someone to tell me how to work with blocks of the matrix so I can cluster the total matrix.*
That question becomes unnecessary if it turns out that the majority of your matrix elements are zeros. In that case, you don't have to break the matrix into blocks. You would use the SPARSE command to make the entire matrix fit into memory. Since you seem unaware of SPARSE and what it does, the others want to make sure you consider it before proceeding.
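For reference, SPARSE works like this (it accepts double or logical input, so integer blocks need a cast; blk is illustrative):

```matlab
S = sparse(double(blk));               % store only the nonzero entries
% or build directly from (row, column, value) triplets:
S2 = sparse([1 3], [2 5], [7 9], 17770, 17770);
whos S S2                              % memory scales with nnz, not numel
```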
It appears to me that you could save memory during the clustering by not using pdist yourself, and instead use
L = linkage(d, 'ward', 'euclidean', 'savememory', 'on');
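With 'savememory' set to 'on', linkage computes the Euclidean distances internally instead of requiring the full pdist vector, and the rest of the pipeline stays the same:

```matlab
L = linkage(d, 'ward', 'euclidean', 'savememory', 'on');
T = cluster(L, 'maxclust', 4);   % same cluster-assignment step as before
```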
Walter,
Ward did cluster when I used: L = linkage(d, 'ward', 'euclidean', 'savememory', 'on');
But I cannot predict the running time, maybe 4-5 hours. Anyway, the running time is not important because I only run it once.
You solved a big problem, many many thanks.
Walter, if I want to use spectral clustering instead of Ward to show the difference between them, earlier I faced the same problem (out of memory) with spectral clustering. What do I have to change in the following code? This function calls another function, but the out of memory happens before that call.
sim = dlmread('d:\matlab\r2011a\bin\netflix\combain_arrays_sim\sim2_norm.txt');
[p, o] = size(sim)

% degree of each node = number of nonzero similarities in its row
for i = 1:p
    deg(i) = nnz(sim(i, :));
end
total_edg = sum(deg) / 2

% compute the modularity matrix
B = sim - ((deg'*deg) / (2*total_edg));

% compute eigenvalues and eigenvectors
[U, Beta] = eig(B);
Beta1 = diag(Beta);
[Beta1, ind] = sort(Beta1, 'descend');

if Beta1(1) > 0
    % the sign of the leading eigenvector defines the split
    s = zeros(1, p);
    s(U(:, ind(1)) > 0)  = 1;
    s(U(:, ind(1)) <= 0) = -1;
    v = s*B*s'
    % if v>0
    % divide the eigenvector into two groups
    if sum(s) ~= length(s) && sum(s) ~= -length(s)
        trac  = find(s > 0);     % members of group 1
        trac1 = find(s <= 0);    % members of group 2
        Grp_1 = B(trac,  :);  Grp_1(:, trac1) = [];   % B restricted to group 1
        Grp_2 = B(trac1, :);  Grp_2(:, trac)  = [];   % B restricted to group 2
        % generalized modularity matrices: subtract row sums on the diagonal
        B_updat  = Grp_1 - diag(sum(Grp_1, 2));
        B1_updat = Grp_2 - diag(sum(Grp_2, 2));
        itr = 1
        nn = (s*B*s') / 2;
        z = 0;
        % recurse on each group
        Divide1_2(B_updat,  z, trac,  itr, nn);
        Divide1_2(B1_updat, z, trac1, itr, nn);
    else
        'the network is indivisible because s is indivisible'
        return;
    end
else
    'the network is indivisible because Beta1<0'
end
@huda nawaf
I have a question:
How do I create a similarity matrix for 300x300 images?
Please help. Thanks in advance.


Answers (0)


Asked on 26 Apr 2013
