Problem 79. DNA N-Gram Distribution
Given a string s and a number n, find the most frequently occurring n-gram in the string, where the n-grams can begin at any point in the string. This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.
So for
s = 'AACTGAACG'
and
n = 3
we get the following n-grams (trigrams):
AAC, ACT, CTG, TGA, GAA, AAC, ACG
Since AAC appears twice and every other trigram appears once, the answer, hifreq, is AAC. There will always be exactly one highest-frequency n-gram.
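One possible approach is to slide a window of width n over every starting position and count the substrings. The sketch below is written in Python for illustration and is not the reference solution. The function name `most_frequent_ngram` is hypothetical, and removing whitespace before counting is an assumption, not part of the problem statement.

```python
from collections import Counter


def most_frequent_ngram(s: str, n: int) -> str:
    """Return the most frequent length-n substring of s, counting overlapping windows."""
    # Assumption: whitespace is not part of the sequence, so it is removed first.
    s = "".join(s.split())
    if n <= 0 or n > len(s):
        raise ValueError("n must be between 1 and the sequence length")
    # One window per starting position: len(s) - n + 1 windows in total.
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    # most_common(1) returns a one-element list of (ngram, count).
    # The problem guarantees a unique maximum, so ties are not handled here.
    hifreq, _ = counts.most_common(1)[0]
    return hifreq


print(most_frequent_ngram("AACTGAACG", 3))  # AAC
```

For the example above, the seven windows are AAC, ACT, CTG, TGA, GAA, AAC, ACG, and only AAC is counted twice.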
Problem Comments
E Chang
on 22 Oct 2018
Note that spaces in the input must be ignored, or else test suites 3 and 5 fail.