Merging/Superimposing two tables without using loops

2 views (last 30 days)
I have two very large (200000x200) tables, Table_1 & Table_2. They have columns/variables that store different data types- strings, doubles, integers etc.
I want to create a third table, Table_3, that superimposes Table 2 over Table 1 if Table 1 has a missing value. i.e.
if Table_1(row,column)<>{''},
Table_3(row,column)=Table_1(row,column)
else
Table_3(row,column)=Table_2(row,column)
For example,
Table 1 Table 2 Table 3
Record Name Age Record Name Age Record Name Age
1 John 10 1 John 1 John 10
2 Daniela 2 Daniela 5 2 Daniela 5
3 Austin 30 3 Austin 30 3 Austin 30
4 45 4 Mark 45 4 Mark 45
I want to do this without using for loops because the time to traverse them is very very slow.
To consider:
  • Is there a way to do this using the fillmissing() function? This would be the most natural approach however, Table_3=fillmissing(Table_1,'Constant',Table_2) doesn't work. (The 2nd argument (Table_2) cannot be multidimensional.)
  • If this were a simple matrix, Table_3(Table_1=='')=Table_2 would've worked; but I can't seem to do this with tables.
Thanks, NT
  1 Comment
Nikhil Tawakley
Nikhil Tawakley on 17 Mar 2017
Thanks Peter- it's "like" a join but not quite- one typically does a join if there are extra variables to capture; these tables are identical in structure/size. Data in Table_1 is just more reliable than the other (but is comparatively sparsely populated).
I wan't to "superimpose" the two tables such that data from Table 1 is given priority over Table 2.
There could be situation where Mark's age in Table_1 is 85- in that case I would like the output to read:
Table 1 Table 2 Table 3
Record Name Age Record Name Age Record Name Age
4 85 4 Mark 45 4 Mark 85
There's gotta be a way to do this! Thanks again for looking into this.

Sign in to comment.

Accepted Answer

Peter Perkins
Peter Perkins on 16 Mar 2017
This isn't exactly a join, but it's close. It's not entirely clear to me what you want to do, for example, Mark is missing from reconrd 4 in table 1, but Age == 45 is not. However, in tables 2 and 3, where Mark is present, it's the same Age: 45. Is that always true? If so, I would think you could delete records from table 1 that had missing values, then outerjoin with table 2, then repeat for table 3.
Otherwise, maybe you're looking for some kind of setdiff operation.
  4 Comments
Peter Perkins
Peter Perkins on 20 Mar 2017
Edited: Peter Perkins on 20 Mar 2017
OK, I've been misinterpreting your example, and I didn't realize that "table 3" was you expected output. So after re-reading, I see what you want.
You say, "no loops", but I think you mean "no loops over rows." Here's a version that loops over variables, making use of ismissing. With only 200 vars, I don't think that will be an issue. I'm gonna assume that the two tables contain the same record numbers, if that's not true you would have to do some kind of join before this:
>> t1 = table( ...
[1;2;3;4],{'John';'Daniela';'Austin';''},[10;NaN;30;45], ...
'VariableNames',{'Record' 'Name' 'Age'})
t1 =
4×3 table
Record Name Age
______ _________ ___
1 'John' 10
2 'Daniela' NaN
3 'Austin' 30
4 '' 45
>> t2 = table( ...
[1;2;3;4],{'John';'Daniela';'Austin';'Mark'},[NaN;5;30;45], ...
'VariableNames',{'Record' 'Name' 'Age'})
t2 =
4×3 table
Record Name Age
______ _________ ___
1 'John' NaN
2 'Daniela' 5
3 'Austin' 30
4 'Mark' 45
>> isempty(setxor(t1.Record,t2.Record)) % check the assumption
ans =
logical
1
>> t1 = sortrows(t1,'Record'); % just in case
>> t2 = sortrows(t2,'Record');
>> t3 = t1;
>> missingValues = ismissing(t1);
>> for varIndex = 2:width(t1) % index by var number, not name
i = missingValues(:,varIndex);
t3.(varIndex)(i) = t2.(varIndex)(i);
end
>> t3
t3 =
4×3 table
Record Name Age
______ _________ ___
1 'John' 10
2 'Daniela' 5
3 'Austin' 30
4 'Mark' 45
Nikhil Tawakley
Nikhil Tawakley on 10 Apr 2017
Thanks for the detailed response, Peter; this was very helpful. I ended up "solving" the issue by looping over variables. It's still slow but fortunately it's acceptable :)

Sign in to comment.

More Answers (0)

Categories

Find more on Data Type Identification in Help Center and File Exchange

Tags

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!