How to handle large matrices

I have a general problem, when I want to work with large matrices. The calculation time of my script is around 2 weeks - far too much. I am sure, that the solution is quite simple for someone, who is more experienced with Matlab programming.
Problem with 2 matrices:
pc_raw = 23617489x12 double
fw_raw = 35631184x5 double
Now I want to find for each value in pc_raw(:,1) the corresponding value in fw_raw(:,1) and copy fw_raw(corr_value, 1:5) to pc_raw(corr_value, 8:12).
Example:
pc_raw = 1 1 3 3 4 5 6 7 8 9 10
fw_raw = 1 1 1 2 2 3 3 4 4 5 6 7 8 8 9 10
My script so far:
for k=1:len_pc_raw
pc_timestep = pc_raw(k,1);
NaN_test = isnan(pc_raw(k,8));
if NaN_test == 1
pc_temp = find(pc_raw(:,1)==pc_timestep);
fw_temp = find(fw_raw(:,1)==pc_timestep);
empty_test = isempty(fw_temp);
if empty_test == 0
len_pc_temp = length(pc_temp);
len_fw_temp = length(fw_temp);
if len_pc_temp <= len_fw_temp
for j=1:len_pc_temp
pc_echo = pc_temp(j); fw_echo = fw_temp(j);
pc_raw(pc_echo,8:12) = fw_raw(fw_echo,1:5);
end
else
for j=1:len_fw_temp
pc_echo = pc_temp(j); fw_echo = fw_temp(j);
pc_raw(pc_echo,8:12) = fw_raw(fw_echo,1:5);
end
end
end
end
end
Any idea how to speed up the calculation time? I would be very grateful for any comments!

7 Comments

Yur loop is not consistent. What if pc_temp has more entries than fw_temp?
Normally, the length of pc_temp is smaller than the length of fw_temp. pc_temp is more or less a subset of fw_temp. But I implemented a "backup", if the opposite will happen.
Are pc_raw(:,1) and fw_temp(:,1) always pre-sorted in ascending order as you have shown?
Yes, both are sorted in ascending order.
That will help a *lot*.
What do you want to have happen in the following cases:
1) Value in pc_raw(:,1) is not found in fw_raw(1,:)
2) Value in pc_raw(:,1) has multiple matches in fw_raw(1,:)
3) Multiple same values in pc_raw(:,1)
Once I know these answers I can post m-code and/or a mex routine that should work pretty fast.
for 1) skip the row in pc_raw
for 2) use the first (lowest) value in fw_raw
for 3) use all values in pc_raw and use all corresponding values in fw_raw (in ascending order again) for the length of min(len_pc_temp,len_fw_temp);

Sign in to comment.

 Accepted Answer

This should work, if you are guaranteed that every value of pc_raw(:,1) is in fact in fw_raw(:,1), and is unique. If not, you will need to do some more work.
[tf,loc] = ismember(pc_raw(:,1),fw_raw(:,1));
pc_raw(:,8:12) = fw_raw(loc,:);
EDIT IN RESPONSE TO COMMENTS:
Here is a fuller piece of code in which I have tried to be very explicit about what I am doing. I hope this helps.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This part of the code creates two sample input arrays
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
rng(2)
N = 3;
M = 7;
pc_raw = rand(N,12);
fw_raw = rand(M,5);
fw_raw(M+1,:) = fw_raw(M,:);
fw_raw(M+1,2) = fw_raw(M+1,2) + 0.1
index = randperm(M);
pc_raw(:,1) = fw_raw(index(1:N),1)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This part of the code implements the algorithm
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% First step:
% For each row of pc_raw, find the corresponding row of fw_raw
% that has the same first element
[tf,loc] = ismember(pc_raw(:,1),fw_raw(:,1))
% Second step:
% For every row of pc_raw, insert the all columns of
% fw_raw (for the correct row!) into columns 8:12 of pc_raw
pc_raw(:,8:12) = fw_raw(loc,:)

8 Comments

Unfortunately following may be could happen:
pc_raw = 3 3 3 4 5 6 7 8
fw_raw = 3 3 3 3 4 5 6 7
With "ismember", I just detect the first fw_raw "3" as member (and position 1) with a loc vetor like this as output:
loc = 1 1 1 5 6 7 8
And I updated the script, sry... I forget to show the detailed parts... :(
But thanks for the help until now!
I'm confused, in many ways. You said that pc_raw will be Nx12, and your fw_raw will be Mx5. You little test case here is not like that. So I would not expect my algorithm to work.
Here is the test code I wrote to mimic your problem. It behaves as I expect.
pc_raw = rand(23617489,12);
fw_raw = rand(7,5)
fw_raw = rand(35631184,5);
pc_raw(:,1) = fw_raw(1:23617489,1);
tic
[tf,loc] = ismember(pc_raw(:,1),fw_raw(:,1));
pc_raw(:,8:12) = fw_raw(loc,:);
toc
It took about 80 seconds on my machine.
(In that test code, I accidentally left in a line where I defined fw_raw as 7x5. You don't need that line.)
Here is another bit of test code. It shows that my algorithm works even if some elements of pc_raw(:,1) appear more than once in fw_raw(:,1). In that case, it will use the LAST row of fw_raw that matches.
rng(2)
N = 3;
M = 7;
pc_raw = rand(N,12);
fw_raw = rand(M,5);
fw_raw(M+1,:) = fw_raw(M,:);
fw_raw(M+1,2) = fw_raw(M+1,2) + 0.1
index = randperm(M);
pc_raw(:,1) = fw_raw(index(1:N),1)
[tf,loc] = ismember(pc_raw(:,1),fw_raw(:,1))
pc_raw(:,8:12) = fw_raw(loc,:)
Mhm, with this step...
pc_raw(:,1) = fw_raw(1:23617489,1);
... I will overwrite the original values in pc_raw(:,1). For my small example...
pc_raw = 3 3 3 4 5 6 7 8
fw_raw = 3 3 3 3 4 5 6 7
... the values in pc_raw(:,1) will change to:
pc_raw = 3 3 3 3 4 5 6 7
and I lose the connection between the indices of the first column (pc_raw(:,1) and the attributes in the following columns (pc_raw(:,2:12).
I will extend my example to visualize my problem more clear (just the first column - if the value is repeated (e.g. 3), the possible relations will be ranked in ascending order -> see script):
pc_raw(:,1) = 3 3 3 4 5 6 7 8 (input)
fw_raw(:,1) = 3 3 3 3 4 5 6 7 8 9 (input)
xx_raw(:,1) = 3 3 3 4 5 6 7 8 (output)
Not so easy to explain, sry for that. :(
In the part of my code that you mention, I am still just CREATING my "pretend" input. That is NOT part of the algorithm. The algorithm is ONLY the two lines in my original answer.
I don't fully understand what you want to happen for (1) repeated values in the first column of pc_raw, (2) repeated values in the first column of fw_raw, or (3) the same repeated value in both.
You might get some value out of first applying the unique() function to your first columns, which will identify the locations of the repeats.
case 1) repeated values in the first column of pc_raw & just one corresponding value in the first column of fw_raw
-> copy the one line of fw_raw to the first value of pc_raw
case 2) just one value in the first column of pc_raw & repeated corresponding values in the first column of fw_raw
-> copy the first line of fw_raw to the one value of pc_raw
case 3) repeated values in the first column of pc_raw & repeated corresponding values in the first column of fw_raw
-> copy the min(len_pc_itemp,len_fw_temp) lines of fw_raw to the corresponding values of pc_raw
I can use the unique() function before, but around 80 % of the values are repeated. So it is still a large matrix.
(by the way: big thanks for your admirable patience! I am really impressed - thanks a lot! :) )

Sign in to comment.

More Answers (1)

Here is some m-code for what I think you want:
m1 = size(pc_raw,1);
m2 = size(fw_raw,1);
k2 = 1;
for k1=1:m1
if( isnan(pc_raw(k1,8)) )
while( pc_raw(k1,1) > fw_raw(k2,1) )
k2 = k2 + 1;
if( k2 > m2 )
break;
end
end
if( k2 > m2 )
break;
end
if( pc_raw(k1,1) == fw_raw(k2,1) )
pc_raw(k1,8:12) = fw_raw(k2,1:5);
k2 = k2 + 1;
if( k2 > m2 )
break;
end
end
end
end
It relies heavily on the fact that the first column of each array is pre-sorted in ascending order as you stated.
And here is a mex implementation (CAUTION: No argument checking)
EDIT: 17-Feb-2012 changed location of mxUnshareArray
/* findrep(pc_raw,fw_raw); */
#include "mex.h"
#ifndef MWSIZE_MAX
#define mwSize int
#endif
void mxUnshareArray(mxArray *);
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
mwSize m1, m2, k1, k2;
double *Apr1, *Apr8, *Apr9, *Apr10, *Apr11, *Apr12;
double *Bpr1, *Bpr2, *Bpr3, *Bpr4, *Bpr5;
mxUnshareArray(prhs[0]);
m1 = mxGetM(prhs[0]);
m2 = mxGetM(prhs[1]);
Apr1 = mxGetPr(prhs[0]);
Bpr1 = mxGetPr(prhs[1]);
Apr8 = Apr1 + m1*7;
Apr9 = Apr8 + m1;
Apr10 = Apr9 + m1;
Apr11 = Apr10 + m1;
Apr12 = Apr11 + m1;
Bpr2 = Bpr1 + m2;
Bpr3 = Bpr2 + m2;
Bpr4 = Bpr3 + m2;
Bpr5 = Bpr4 + m2;
k2 = 0;
for( k1=0; k1<m1; k1++ ) {
if( mxIsNaN(Apr8[k1]) ) {
while( Apr1[k1] > Bpr1[k2] ) {
if( ++k2 == m2 ) {
return;
}
}
if( Apr1[k1] == Bpr1[k2] ) {
Apr8[k1] = Bpr1[k2];
Apr9[k1] = Bpr2[k2];
Apr10[k1] = Bpr3[k2];
Apr11[k1] = Bpr4[k2];
Apr12[k1] = Bpr5[k2];
if( ++k2 == m2 ) {
return;
}
}
}
}
}
To use the mex code, put it in a file on the MATLAB path, e.g. findrep.c, and then do this:
mex findrep.c
Call it without any output arguments, e.g.,
findrep(pc_raw,fw_raw);

1 Comment

This works perfect in all respects. Time requirement change from around 2 weeks to few minutes! Great help for me - thanks!!!
Reik

Sign in to comment.

Categories

Asked:

on 16 Feb 2012

Edited:

dpb
on 15 Oct 2013

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!