Given a string s and a number n, find the most frequently occurring n-gram in the string, where the n-grams can begin at any point in the string. This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.
So for s = 'AACTGAACG' and n = 3, we get the following n-grams (trigrams):
AAC, ACT, CTG, TGA, GAA, AAC, ACG
Since AAC appears twice, the answer, hifreq, is AAC. There will always be exactly one highest-frequency n-gram.
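A minimal sketch of the sliding-window idea in Python (not the MATLAB solution the problem expects). It assumes n is between 1 and the string length. The function name `most_frequent_ngram` is hypothetical. Every start index from 0 to len(s)-n yields one overlapping window, and a counter picks the most frequent one.

```python
from collections import Counter


def most_frequent_ngram(s: str, n: int) -> str:
    """Return the most frequent length-n substring of s, counting overlapping windows."""
    if not 1 <= n <= len(s):
        raise ValueError("n must be between 1 and len(s)")
    # Start indices 0..len(s)-n each give one n-gram, so windows overlap.
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    # The problem guarantees a unique maximum, so the top entry is the answer.
    return counts.most_common(1)[0][0]


print(most_frequent_ngram('AACTGAACG', 3))  # AAC
```

For the example string there are len(s) - n + 1 = 7 windows, and only AAC occurs twice.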
This problem was originally inspired by a MATLAB Newsgroup discussion.
Note that spaces must be ignored; otherwise tests 3 and 5 fail.
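One way to read "ignore spaces" is to delete them before forming windows, so an n-gram may span a removed space. That reading is an assumption, not something the problem statement confirms. The hypothetical helper below sketches it in Python.

```python
from collections import Counter


def most_frequent_ngram_ignoring_spaces(s: str, n: int) -> str:
    # Assumption: spaces are removed first, then overlapping windows are counted.
    t = s.replace(" ", "")
    counts = Counter(t[i:i + n] for i in range(len(t) - n + 1))
    return counts.most_common(1)[0][0]


print(most_frequent_ngram_ignoring_spaces('AAC TGA ACG', 3))  # AAC
```

With this reading, 'AAC TGA ACG' reduces to the original example string and gives the same answer.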
Good use of the hankel function.
Cool solution.
Sorry about this, but I got stuck and I want to learn how to do it. After looking at several solutions, I found my mistake and was able to create my own solution :)
What happens if the test suite changed in the future?
This solution is not correct in general: the way hankel is used here generates n-1 fake fragments.
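The Hankel approach and the n-1 fake fragments can be illustrated with a pure-Python sketch. It uses the indexing H[i][j] = s[i + j], which is the Hankel pattern MATLAB's `hankel` builds. Both function names are hypothetical. The padded variant is only one plausible way such spurious columns arise. It imitates taking the first n rows of a square Hankel matrix of the whole string; MATLAB's `hankel(c)` puts zeros below the anti-diagonal, and a visible '0' stands in for that padding here.

```python
def hankel_ngrams(s: str, n: int) -> list[str]:
    # Column j of the n-by-(len(s)-n+1) Hankel matrix H[i][j] = s[i + j]
    # is the n-gram starting at index j; no padding is needed.
    return ["".join(s[i + j] for i in range(n)) for j in range(len(s) - n + 1)]


def padded_hankel_ngrams(s: str, n: int, pad: str = "0") -> list[str]:
    # Hypothetical flawed variant: first n rows of a square Hankel matrix of s,
    # with entries past the end of s filled by `pad`. This yields len(s)
    # columns, and the last n-1 of them contain padding, not real bases.
    return ["".join(s[i + j] if i + j < len(s) else pad for i in range(n))
            for j in range(len(s))]


s, n = "AACTGAACG", 3
good = hankel_ngrams(s, n)
bad = padded_hankel_ngrams(s, n)
print(good)            # ['AAC', 'ACT', 'CTG', 'TGA', 'GAA', 'AAC', 'ACG']
print(bad[len(good):])  # ['CG0', 'G00']
```

The extra columns are harmless only as long as none of them becomes the most frequent n-gram. Restricting the matrix to n by (len(s)-n+1), as in MATLAB's two-argument form `hankel(c, r)`, avoids them.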
Clever use of the Hankel matrix. I wouldn't automatically think of the Hankel matrix for this application, but it works really well. Thanks, I've learned something.
What's the point of a 'solution' like this? It passes the test suite, but in what way was it interesting for you to write it?