How do I count the total number 'AA' repeats in the text? I managed to count how many times it appears in each cell, but don't know how to add it up.

4 views (last 30 days)
clear all;
%read SeqDNA file
Seq_Out_File='./Seq_Out.txt';
fileID=fopen(Seq_Out_File);
C=textscan(fileID, '%s');
NAA=(strfind(C{1},'AA'));
x=cellfun('length', NAA);

Accepted Answer

Walter Roberson
Walter Roberson on 3 May 2018
S = fileread(Seq_Out_File);
no_overlap_count = length(regexp(S, 'AA'));
with_overlap_count = length(regexp(S, 'A(?=A)'));
  3 Comments

Sign in to comment.

More Answers (1)

John BG
John BG on 4 May 2018
Edited: John BG on 4 May 2018
Hi Nitzan Khan
1.-
According to
AAA counts as 2x AA and AAAA would count as 3x AA.
.
2.-
Also, you asked for AA match but you may want to really count all possible outcomes, all possible pairs of the basic sequence 'ACTG'
A basic equivalent to the Stack Overflow code in MATLAB would be:
A1='CTACTGCGACTTATGCCCATAATTGGCCACAATAAGTTTCTCGGATTCGCAGGTACCCTCGAGAGTATGGTCGTGGACTCAACCTTAGAGGCAACGGAGT'
L1='ACTG'
nL=combinator(4,2) % SChwarz's legendary function available here:
L2=L1(nL)
cell1={}
for k=1:1:size(L2,1)
cell1=[cell1 L2(k,:)]
end
nRep=[];
for k=1:1:size(L2,1)
[t1,t2]=regexp(A1,L2(k,:))
nRep=[nRep numel(t1)]
end
for k=1:1:size(L2,1)
str1=['n' L2(k,:) '=' num2str(nRep(k))]
evalin('base',str1)
end
L3=[repmat('n',size(L2,1),1) L2 repmat(',',size(L2,1),1)]'
L3=L3(:)'
L3(end)=[]
str3=['T1=table(' L3 ')']
evalin('base',str3)
T1 =
1×16 table
nAA nAC nAT nAG nCA nCC nCT nCG nTA nTC nTT nTG nGA nGC nGT nGG
___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
5 7 6 7 6 4 7 6 7 6 5 5 7 5 6 7
.
3.-
For a complete sequence in a text file:
clear all;clc;close all
A=fileread('Seq_Out.txt');
L1='ACTG'
nL=combinator(4,2) % SChwarz's legendary function available here:
L2=L1(nL)
cell1={}
for k=1:1:size(L2,1)
cell1=[cell1 L2(k,:)]
end
nRep=[];
for k=1:1:size(L2,1)
[t1,t2]=regexp(A,L2(k,:));
nRep=[nRep numel(t1)];
end
for k=1:1:size(L2,1)
str1=['n' L2(k,:) '=' num2str(nRep(k))]
evalin('base',str1)
end
L32=[repmat('n',size(L2,1),1) L2 repmat(',',size(L2,1),1)]'
L32=L32(:)'
L32(end)=[]
.
the count table for all pairs is:
.
str3=['T2=table(' L32 ')']
T2 =
1×16 table
nAA nAC nAT nAG nCA nCC nCT nCG nTA nTC nTT nTG nGA nGC nGT nGG
_____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____
48684 62452 62140 62799 62493 48543 62702 62323 62328 62690 48502 62482 62491 62410 62683 48427
.
Besides the sequence used, also attached the saved variables in .mat file.
You can add more sequences, one each row, as show in the link above
.
If you find this answer useful would you please be so kind to consider marking my answer as Accepted Answer?
To any other reader, if you find this answer useful please consider clicking on the thumbs-up vote link
thanks in advance for time and attention
John BG

Categories

Find more on Creating and Concatenating Matrices in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!