Appending to the field of a structure array

Hello, I am trying to create a word index. I start off with an empty struct array with three fields: Word, Documents, and Locations. For now, ignore the latter two. I have a cell array of words:
Doc1 = {'Matlab','is','awesome'};
To avoid confusion: there are other documents that contain the same words. I want to take my Index, which I create with this function:
function Index = InitializeIndex()
% Return an empty 1x0 struct array with fields Word, Documents, and Locations
c10 = cell(1,0);
Index = struct('Word', c10, 'Documents', c10, 'Locations', c10);
I want to add the unique words to Index, so here is my function:
function Index = InsertDoc(Index, newDoc, DocNum)
% Index is a struct array where each element corresponds to a unique
% word in a group of documents. In each element of the struct array, the
% word is stored in the Word field, the numbers of the documents that
% contain the word are in the Documents field, and the locations of the
% word in each document are in the Locations field.
Index = {Index.Word};
for i = 1:numel(newDoc)
% IndexWord is either empty or the word is not present in IndexWord
if isempty(Index) || strcmpi(Index{i},newDoc(i))
Index.Word{end+1} = newDoc(i);
end
end
My problem is twofold. First, I am having difficulty with the condition that checks whether a word is already in Index: how do I make it append the word only if it does not already exist in Index? Second, how do I actually append the word to the Word field of Index?

Answers (2)

Assuming that you want Index to be a single struct with fields Word/Documents/Locations (each of the fields being a cell array), you could do something along these lines:
UniqueWordsInDoc = unique(newDoc); % unique words
in = ismember(UniqueWordsInDoc,Index.Word); % words already in Index
idx = numel(Index.Word)+(1:nnz(~in)); % new Index entries
Index.Word(idx) = UniqueWordsInDoc(~in); % adds new words
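One way to check the first-case snippet is to run it over two small documents in a row. This is a sketch that assumes Index is a single struct initialized with struct('Word',{{}}); the document contents are made up for illustration. Note that unique also sorts the words, so they are added in sorted order rather than in document order.

```matlab
% Case 1: Index is a single struct whose Word field is a cell array
Index = struct('Word', {{}});   % 1x1 struct, Index.Word is an empty cell

docs = {{'Matlab','is','awesome'}, {'is','fun'}};   % hypothetical documents
for d = 1:numel(docs)
    newDoc = docs{d};
    UniqueWordsInDoc = unique(newDoc);               % unique words, sorted
    in  = ismember(UniqueWordsInDoc, Index.Word);    % words already in Index
    idx = numel(Index.Word) + (1:nnz(~in));          % positions for new words
    Index.Word(idx) = UniqueWordsInDoc(~in);         % adds new words
end

disp(Index.Word)   % each distinct word is stored once
```

The word 'is' from the second document is skipped because ismember flags it as already present.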
If, on the other hand, you want Index to be a struct array with fields Word/Documents/Locations (each of the fields being a string or vector), you could do something along these lines:
UniqueWordsInDoc = unique(newDoc); % unique words
in = ismember(UniqueWordsInDoc,{Index.Word}); % words already in Index
idx = numel(Index)+(1:nnz(~in)); % new Index entries
[Index(idx).Word] = deal(UniqueWordsInDoc{~in}); % adds new words
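The second-case snippet can be exercised the same way. In this sketch (documents again hypothetical), Index starts as an empty struct array, and each call to deal writes one word into each newly created element, so the array grows by one element per new word.

```matlab
% Case 2: Index is a struct array with one element per word
Index = struct('Word', {});     % 0x0 struct array with a Word field

docs = {{'Matlab','is','awesome'}, {'is','fun'}};   % hypothetical documents
for d = 1:numel(docs)
    newDoc = docs{d};
    UniqueWordsInDoc = unique(newDoc);                % unique words, sorted
    in  = ismember(UniqueWordsInDoc, {Index.Word});   % words already in Index
    idx = numel(Index) + (1:nnz(~in));                % new Index entries
    [Index(idx).Word] = deal(UniqueWordsInDoc{~in});  % one word per element
end

fprintf('%s\n', Index.Word);   % one word per line
```

Here {Index.Word} gathers the Word field of every element into a cell array, which is what ismember needs to compare against.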
In the former case you initialize using:
Index = struct('Word',{{}});
while in the latter you would initialize Index using:
Index = struct('Word',{});
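The extra pair of braces is what separates the two cases: when a value passed to struct() is a cell array, struct() creates one element per cell. A short sketch showing the resulting sizes:

```matlab
% struct() treats a cell-array value as "one element per cell", so the
% extra pair of braces decides whether you get one struct or none.
a = struct('Word', {{}});   % former case: 1x1 struct, a.Word is an empty cell
b = struct('Word', {});     % latter case: 0x0 struct array, no elements

fprintf('a: %dx%d struct, Word is a %s\n', size(a, 1), size(a, 2), class(a.Word));
fprintf('b: %dx%d struct\n', size(b, 1), size(b, 2));
```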
I hope this clarifies the "indexing" issues; this can be kind of tricky...
EDIT1: added correction by Cedric
EDIT2: "concatenating" versions, something along these lines:
In case 1:
UniqueWordsInDoc = unique(newDoc); % unique words
in = ismember(UniqueWordsInDoc,Index.Word); % words already in Index
Index.Word = [Index.Word UniqueWordsInDoc(~in)]; % adds new words
In case 2:
UniqueWordsInDoc = unique(newDoc); % unique words
in = ismember(UniqueWordsInDoc,{Index.Word}); % words already in Index
Index = [Index cell2struct(UniqueWordsInDoc(~in),'Word')']; % adds new words
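The concatenating version for case 2 assumes Index has only the Word field. With the three-field index built by InitializeIndex in the question, concatenation fails, because MATLAB only concatenates struct arrays whose field names match. One workaround, sketched below with a hypothetical document, is to build the new entries with struct() using the same three field names and leave Documents and Locations empty.

```matlab
% Three-field empty index, as built by InitializeIndex in the question
c10 = cell(1, 0);
Index = struct('Word', c10, 'Documents', c10, 'Locations', c10);

newDoc = {'Matlab','is','awesome','is'};                % hypothetical document
UniqueWordsInDoc = unique(newDoc);                      % unique words, sorted
in = ismember(UniqueWordsInDoc, {Index.Word});          % words already in Index

% Build the new entries with the same field names so that [Index newEntries]
% is allowed; the scalar cell {[]} is expanded to every new element
newEntries = struct('Word', UniqueWordsInDoc(~in), ...
                    'Documents', {[]}, 'Locations', {[]});
Index = [Index newEntries];                             % adds new words
```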

7 Comments

Rick on 5 Jul 2014
Edited: Rick on 5 Jul 2014
Index has already been defined in my first function. Also, we never learned what 'in' is, so I can't really use your code because I don't understand it.
in is a logical array, the same length as UniqueWordsInDoc, indicating which of these words already exist in Index.
If you initialize the variable Index as in your InitializeIndex() example, then you are in the "latter" case. However, the way you were attempting to append in your InsertDoc example suggests that you expected to be in the "former" case, hence the two options, in case they help clarify the differences (no big deal; just disregard this answer if it does not help).
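To make the explanation of in concrete: ismember returns one true/false value per element of its first argument, and ~in then selects only the words that are not yet indexed. The word lists below are hypothetical.

```matlab
UniqueWordsInDoc = {'Matlab','awesome','is'};   % hypothetical new words
existingWords    = {'is','fun'};                % hypothetical words already indexed

in = ismember(UniqueWordsInDoc, existingWords); % logical array, one entry per new word
% in is [false false true]: only 'is' already exists
newWords = UniqueWordsInDoc(~in);               % logical indexing keeps the rest
```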
I just had a quick look, but in the first case, you probably wanted to write
idx = numel(Index.Word)+..
Is it more efficient to explicitly index extra cells than to concatenate?
Thanks, Cedric, you are totally right; I just edited that.
Regarding efficiency, I believe they are just about the same, but I am not totally sure. I just ran a few tests and do not see any difference; I believe the main bottleneck is probably resizing/relocating the new variable in memory, so if the compiler is reasonably smart it should end up doing the same thing in both cases.
In any case, I only used explicit indexing for the new entries because I wanted to emphasize the similarities between the two cases (keeping the same strategy for both), and appending in the second case might look a bit more complicated (e.g., using cell2struct and then appending the result to Index).
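If you want to repeat such a test yourself, here is a rough sketch that grows a cell array many times both ways, which is closer to calling InsertDoc repeatedly than a single append. The word list and N are arbitrary; timings depend on the MATLAB version and machine, so no particular result is implied.

```matlab
% Rough comparison of growing a cell array by explicit indexing versus
% concatenation, one element at a time
N = 5000;
words = arrayfun(@(k) sprintf('w%d', k), 1:N, 'UniformOutput', false);

tic
a = {};
for k = 1:N
    a(numel(a) + 1) = words(k);   % explicit indexing into a new cell
end
tIndex = toc;

tic
b = {};
for k = 1:N
    b = [b words(k)];             % concatenation
end
tConcat = toc;

fprintf('indexing: %.4f s, concatenation: %.4f s\n', tIndex, tConcat);
```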
Cedric on 5 Jul 2014
Edited: Cedric on 5 Jul 2014
OK, that makes sense; I asked in case I had missed something important (like the multiplication by ok in the other thread, which I would never have thought of!).
Here is my code right now:
function Index = InsertDoc(Index, newDoc, DocNum)
% Index is a struct array where each element corresponds to a unique
% word in a group of documents. In each element of the struct array, the
% word is stored in the Word field, the numbers of the documents that
% contain the word are in the Documents field, and the locations of the
% word in each document are in the Locations field.
for i = 1:numel(newDoc)
% IndexWord is either empty or the word is not present in IndexWord
if isempty(Index)|| strcmpi({Index.Word},newDoc{i})
Index(end + 1).Word = newDoc{i};
end
end
Here is my input:
Doc1 = {'Matlab','is','awesome'};
E7 = InitializeIndex;
E7 = InsertDoc(E7,Doc1,1);
My output was not what I expected; I expected E7(2).Word to be 'is'.
EDU>> E7(1)
ans =
Word: 'Matlab'
Documents: []
Locations: []
EDU>> E7(2)
Index exceeds matrix dimensions.
Change
strcmpi({Index.Word},newDoc{i})
to
~any(strcmpi({Index.Word},newDoc{i}))
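Why this works: strcmpi returns one logical value per existing word, so the original condition was true only when the new word did match an existing entry. After 'Matlab' was added, 'is' and 'awesome' matched nothing and were skipped; and once Index held more than one element, || would also raise an error because its operand would no longer be a scalar. Wrapping the comparison in ~any(...) reduces it to a single "not found" flag. A sketch of the corrected loop run as a script (the two documents are hypothetical):

```matlab
% Empty three-field index, as returned by InitializeIndex in the question
c10 = cell(1, 0);
Index = struct('Word', c10, 'Documents', c10, 'Locations', c10);

docs = {{'Matlab','is','awesome'}, {'MATLAB','is','fun'}};   % hypothetical documents
for d = 1:numel(docs)
    newDoc = docs{d};
    for i = 1:numel(newDoc)
        % ~any(...) is a scalar that is true only when no existing word
        % matches newDoc{i}, ignoring case
        if isempty(Index) || ~any(strcmpi({Index.Word}, newDoc{i}))
            Index(end + 1).Word = newDoc{i};   % other fields default to []
        end
    end
end

fprintf('%s\n', Index.Word);
```

Because strcmpi ignores case, 'MATLAB' in the second document is treated as the already-indexed 'Matlab'.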


For the first part, use the ismember() function. For the second part, you can just append using
new_list = [old_list, {new_word}];
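Braces and square brackets behave differently when appending to a cell array: braces build a new cell array that nests the old one, while square brackets concatenate. A small sketch with made-up words:

```matlab
old_list = {'cat','dog'};
new_word = 'bird';

nested   = {old_list, new_word};    % 1x2 cell: its first cell holds the old 1x2 cell
appended = [old_list, {new_word}];  % 1x3 cell: {'cat','dog','bird'}

grown = old_list;
grown{end + 1} = new_word;          % same contents as appended, grown in place
```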

3 Comments

For the appending, do you mean this?
Index(i).Word = {{Index.Word},newDoc(i)};
Actually, I think I misunderstood what you meant. If you already had
Index(1).Word = 'cat';
then you can append with
Index(end+1).Word = 'dog';
So do you mean this? I got rid of Index = {Index.Word} because it was overwriting the struct created by my InitializeIndex function.
for i = 1:numel(newDoc)
% IndexWord is either empty or the word is not present in IndexWord
if isempty(Index)|| strcmpi({Index.Word},newDoc{i})
Index(end + 1).Word = newDoc{i};
end
end
I get the following problem: when I type Index(2).Word, I get 'Index exceeds matrix dimensions.'



Asked: 5 Jul 2014
