
Parsing information between two tags in MATLAB

Ahmed on 18 Feb 2016
Edited: Ahmed on 18 Feb 2016
I am trying to parse node and edge information from an XML file. Here is part of the file:
<node id="1677" label="O60711">
  <att name="shared name" value="O60711" type="string"/>
  <att name="name" value="O60711" type="string"/>
  <att name="selected" value="0" type="boolean"/>
  <att name="Taxonomy ID" value="9606" type="string"/>
  <att name="Taxonomy Name" value="human" type="string"/>
  <att name="Human Readable Label" value="LPXN" type="string"/>
  <att name="uniprotkb_accession" value="O60711" type="string"/>
  <att name="uniprot" type="list">
    <att name="uniprot" value="O60711" type="string"/>
    <att name="uniprot" value="B4DV71" type="string"/>
    <att name="uniprot" value="LPXN" type="string"/>
    <att name="uniprot" value="Q53FW6" type="string"/>
    <att name="uniprot" value="Q6FI07" type="string"/>
    <att name="uniprot" value="B2R8B4" type="string"/>
    <att name="uniprot" value="LDLP" type="string"/>
  </att>
  -------
  -------
  -------
</node>
I used this code to extract the node ID and label, and the edge ID, label, source and target:
clc
clear
xDoc = xmlread('ans.xgmml');
ansNode = xDoc.getDocumentElement;
nodes = ansNode.getElementsByTagName('node');
edges = ansNode.getElementsByTagName('edge');

% Node ID (numeric) and label (text) for every <node> element
node_Matrix = zeros(nodes.getLength,1);
node_Matrix1 = cell(nodes.getLength,1);
for i = 0 : nodes.getLength-1          % DOM node lists are 0-based
    node_IDs = nodes.item(i).getAttribute('id');
    node_Labels = nodes.item(i).getAttribute('label');
    node_Matrix(i+1,1) = str2double(node_IDs);
    node_Matrix1{i+1,1} = char(node_Labels);
end
t_node = table(node_Matrix1, node_Matrix, ...
    'VariableNames', {'Label','ID'});
writetable(t_node,'nodeinfo.txt')

% Edge ID, source and target (numeric) plus the label split at spaces
edge_Matrix = zeros(edges.getLength,3);
edge_Matrix1 = cell(edges.getLength,1);
for j = 0 : edges.getLength-1
    edge_IDs = edges.item(j).getAttribute('id');
    edge_Labels = edges.item(j).getAttribute('label');
    edge_sources = edges.item(j).getAttribute('source');
    edge_targets = edges.item(j).getAttribute('target');
    edge_Matrix(j+1,1) = str2double(edge_IDs);
    edge_Matrix(j+1,2) = str2double(edge_sources);
    edge_Matrix(j+1,3) = str2double(edge_targets);
    edge_Matrix1{j+1,1} = regexp(char(edge_Labels), '( )', 'split');
end
t_edge = table(edge_Matrix1, edge_Matrix(:,1), edge_Matrix(:,2), edge_Matrix(:,3), ...
    'VariableNames', {'Label','ID','Source','Target'});
writetable(t_edge,'edgeinfo.txt')
Now I want to extract the remaining information from these lines:
<att name="Human Readable Label" value="LPXN" type="string"/>
<att name="uniprot" value="O60711" type="string"/>
<att name="uniprot" value="B4DV71" type="string"/>
<att name="uniprot" value="LPXN" type="string"/>
<att name="uniprot" value="Q53FW6" type="string"/>
<att name="uniprot" value="Q6FI07" type="string"/>
<att name="uniprot" value="B2R8B4" type="string"/>
<att name="uniprot" value="LDLP" type="string"/>
</att>
and output it in this form:
Human Readable Label=LPXN
uniprot=O60711
uniprot=B4DV71
uniprot=LPXN
uniprot=Q53FW6
uniprot=Q6FI07
uniprot=B2R8B4
uniprot=LDLP
How can I get this information from the lines above? Any help would be highly appreciated.
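A minimal sketch of one way to do this, assuming the file follows the structure shown above: call `getElementsByTagName('att')` on each `node` element. That method searches all descendants, so it also returns the `<att>` entries nested inside the `uniprot` list; the list container itself has no `value` attribute and is skipped with `hasAttribute('value')`. The hypothetical `wanted` list selects which attribute names to print. So that the sketch runs on its own, it writes a trimmed copy of the node from the question (elided lines dropped, wrapped in an assumed `<graph>` root) to a temporary file; for the real data, pass `'ans.xgmml'` to `xmlread` as in the code above.

```matlab
% Sketch: print selected <att> name/value pairs for every <node>.
% The XML is a trimmed copy of the node from the question, wrapped in an
% assumed <graph> root so the example runs on its own.
xmlText = ['<graph><node id="1677" label="O60711">' ...
    '<att name="shared name" value="O60711" type="string"/>' ...
    '<att name="Human Readable Label" value="LPXN" type="string"/>' ...
    '<att name="uniprot" type="list">' ...
    '<att name="uniprot" value="O60711" type="string"/>' ...
    '<att name="uniprot" value="B4DV71" type="string"/>' ...
    '<att name="uniprot" value="LPXN" type="string"/>' ...
    '<att name="uniprot" value="Q53FW6" type="string"/>' ...
    '<att name="uniprot" value="Q6FI07" type="string"/>' ...
    '<att name="uniprot" value="B2R8B4" type="string"/>' ...
    '<att name="uniprot" value="LDLP" type="string"/>' ...
    '</att></node></graph>'];
fname = [tempname '.xml'];
fid = fopen(fname, 'w');
fprintf(fid, '%s', xmlText);
fclose(fid);

wanted = {'Human Readable Label', 'uniprot'};   % hypothetical: names to keep
xDoc = xmlread(fname);                          % use 'ans.xgmml' for the real file
delete(fname);
root  = xDoc.getDocumentElement;
nodes = root.getElementsByTagName('node');
allPairs = {};                                  % every 'name=value' string found
for i = 0 : nodes.getLength-1
    node = nodes.item(i);
    fprintf('node %s\n', char(node.getAttribute('id')));
    % getElementsByTagName searches all descendants, so this also returns
    % the <att> elements nested inside the "uniprot" list.
    atts = node.getElementsByTagName('att');
    for k = 0 : atts.getLength-1
        att = atts.item(k);
        attName = char(att.getAttribute('name'));
        % The list container has no value attribute, so it is skipped here.
        if att.hasAttribute('value') && any(strcmp(attName, wanted))
            pair = [attName '=' char(att.getAttribute('value'))];
            fprintf('%s\n', pair);
            allPairs{end+1, 1} = pair; %#ok<AGROW>
        end
    end
end
```

For the node in the question this prints `node 1677` followed by the `Human Readable Label` and `uniprot` lines requested above. The strings collected in `allPairs` could be written out the same way as the node and edge tables, for example by converting them with `cell2table` and passing the result to `writetable`.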

Answers (0)

