How to find a sequence from a txt file?
I'm trying to identify a specific primer sequence from three different text files. I already have this code:
clear;
clc;
% Open the file
fileID = fopen('cDNA1-1.txt', 'r');
% Read the DNA sequence from the file
dna_sequence = strfind(fileID, '%s');
%
DNAsequence=string(dna_sequence)
% Define the primer sequence
primer_sequence = 'TACG';
% Find the location of the primer sequence in the DNA sequence
primer_location = strfind(DNAsequence, primer_sequence);
% Display the location of the primer sequence
if isempty(primer_location)
disp('Primer sequence not found in the DNA sequence.');
else
disp(['Primer sequence found at position(s): ', num2str(primer_location)]);
end
However, for some reason my dna_sequence variable is empty, and I keep getting that the sequence is not found in any of the text files. I know that that's wrong, so I need help.
I will include the three txt files along with my code.
Thank you!
More Answers (1)
John D'Errico
on 15 Mar 2024
Edited: John D'Errico
on 15 Mar 2024
strfind does NOT read a string from a file! You did this:
% Open the file
fileID = fopen('cDNA1-1.txt', 'r');
% Read the DNA sequence from the file
dna_sequence = strfind(fileID, '%s');
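WRONG. You opened the file, but then never read anything from it. strfind only searches text that is already in memory; here it was handed the file identifier (just a number) rather than the file's contents, so there was nothing to match and dna_sequence came back empty. Essentially, you got ahead of yourself.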
fileID = fopen('cDNA1-1.txt', 'r');
I'll use fread, which reads the file in as numeric ASCII codes, so char is needed to convert them to characters. I'll also transpose the result into a row vector. (There are many ways we could do this. I'm just grabbing one that works.)
D = char(fread(fileID))';
fclose(fileID); % done reading, so release the file handle
But note that the file contains carriage returns and line feed characters, so I'll strip them out. Keep only the DNA part.
D = D(ismember(D,'ACGT'))
strfind(D,'TACG')
And that would be the locations of that substring in your file.
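Since the question mentions three files, here is a minimal sketch of the same idea wrapped in a loop. It swaps fopen/fread for fileread, which returns the whole file as a char row vector in one call. Only 'cDNA1-1.txt' is named in the question; the other two file names are placeholders, and cleanSeq and findPrimer are helper names made up for this example.

```matlab
% Helper handles (hypothetical names, not part of MATLAB):
% keep only the base letters, then search the cleaned sequence.
cleanSeq   = @(txt) txt(ismember(txt, 'ACGT'));
findPrimer = @(txt, primer) strfind(cleanSeq(txt), primer);

% Only the first name comes from the question; the other two are placeholders.
files  = {'cDNA1-1.txt', 'cDNA1-2.txt', 'cDNA1-3.txt'};
primer = 'TACG';

for k = 1:numel(files)
    if ~isfile(files{k})
        fprintf('%s: file not found, skipping.\n', files{k});
        continue
    end
    raw = fileread(files{k});      % entire file as a char row vector
    pos = findPrimer(raw, primer); % positions in the cleaned sequence
    if isempty(pos)
        fprintf('%s: primer %s not found.\n', files{k}, primer);
    else
        fprintf('%s: primer %s found at position(s): %s\n', ...
            files{k}, primer, num2str(pos));
    end
end
```

Note that the reported positions index into the cleaned sequence (line breaks removed), not into the raw file. The ismember filter is also case-sensitive and keeps any A, C, G or T that happens to appear in non-sequence text such as a header line, so this assumes the files contain only uppercase sequence data.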