Problem 61158. Gene Duplication with Sequencing Errors

You are investigating the genome of the bacterium Codex matlabius. A virus that infects C. matlabius is known to insert long, repeated sections of its own genes into the bacterial genome. Your job is to find duplicates in the genome that might signal these viral insertions.
Unfortunately, your gene sequencer isn't perfect and sometimes makes reading mistakes. You need to consider both exact matches and close matches with at most one mismatch (a position where the two sequences disagree).
Given a single string of nucleotide characters taken from the genome, find the longest substring that appears in two non-overlapping locations. The two occurrences can either match exactly or differ by at most one character.
Rules:
  • The two occurrences must not overlap
  • They must be at least 5 nucleotides in length
  • Only characters A (adenine), C (cytosine), G (guanine), or T (thymine) appear in the input
  • If the two occurrences differ by exactly one character, mark that position with 'X' in the output
  • The 'X' marker must appear in the interior of the string, never at the beginning or end
  • If the two occurrences match exactly, return the substring without any 'X'
  • If no valid duplicated substring exists, return an empty string
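The rules above can be turned into a simple exhaustive search. The following is a minimal brute-force sketch in Python, not a reference solution. The problem itself targets MATLAB, and the function name `find_dupe` and the `min_len` parameter are hypothetical. For every offset d between the two starting positions, it grows a window from each start, allows at most one mismatch, and keeps the longest window that satisfies all the rules: length at least 5, no overlap (length at most d), and any mismatch strictly inside. The problem does not say how to break ties between equally long candidates, so this sketch assumes the first one found is kept.

```python
def find_dupe(genome: str, min_len: int = 5) -> str:
    """Longest non-overlapping duplicate with at most one interior mismatch.

    Brute-force sketch. Ties keep the first candidate found (an assumption;
    the problem statement does not specify tie-breaking).
    """
    n = len(genome)
    best = ""
    for d in range(1, n):              # offset from first start to second start
        for i in range(n - d):         # start of the first occurrence
            mismatch = -1              # window index of the single mismatch, -1 if none
            # Non-overlap requires length <= d; the second copy must fit in the string.
            for length in range(1, min(d, n - d - i) + 1):
                k = i + length - 1     # newest character added to the window
                if genome[k] != genome[k + d]:
                    if mismatch >= 0:
                        break          # a second mismatch: stop growing this window
                    mismatch = length - 1
                if length < min_len or length <= len(best):
                    continue           # too short, or not longer than the best so far
                if mismatch in (0, length - 1):
                    continue           # 'X' would sit at the beginning or end
                chars = list(genome[i:i + length])
                if mismatch >= 0:
                    chars[mismatch] = "X"
                best = "".join(chars)
    return best


print(find_dupe("AATGCTACCTTAGTACCACTGGATGCTACATTAGA"))  # ATGCTACXTTAG
print(find_dupe("AAATCGATCGTTTCGATCG"))                  # TCGATCG
```

Trying every offset and start and growing each window costs O(n³) character comparisons in the worst case, which is acceptable for short test strings. A mismatch at the last position is not rejected outright, because growing the window further can move it into the interior.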
Example 1: Fuzzy match (1 mismatch)
Input
genome = 'AATGCTACCTTAGTACCACTGGATGCTACATTAGA'
Output
dupe = 'ATGCTACXTTAG'
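The duplicated gene appears in two places: 'ATGCTACCTTAG' starting at position 2 of the genome and 'ATGCTACATTAG' starting at position 23. The two copies differ only at position 8 (C versus A), which becomes the 'X'.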
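The marked output for Example 1 can be reproduced directly from the input. This short Python sketch (variable names are illustrative) slices the two 12-character occurrences out of the genome and replaces the one disagreeing character with 'X'. Python indices are 0-based, so the occurrences starting at 1-based positions 2 and 23 become slices starting at 1 and 22.

```python
genome = 'AATGCTACCTTAGTACCACTGGATGCTACATTAGA'

# Python slices are 0-based and end-exclusive:
# 1-based positions 2..13 -> genome[1:13], positions 23..34 -> genome[22:34]
first = genome[1:13]
second = genome[22:34]

# Keep agreeing characters and replace the disagreeing one with 'X'
dupe = "".join(a if a == b else "X" for a, b in zip(first, second))

print(first)   # ATGCTACCTTAG
print(second)  # ATGCTACATTAG
print(dupe)    # ATGCTACXTTAG
```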
Example 2: Exact match (X at beginning is not allowed)
Input
genome = 'AAATCGATCGTTTCGATCG'
Output
dupe = 'TCGATCG'
Extending this exact match one character to the left gives an 8-character fuzzy match ('ATCGATCG' versus 'TTCGATCG'), but its only mismatch is at the first position, where 'X' is not allowed. The 7-character exact match is returned instead.
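The rejected candidate in Example 2 can be checked directly. This Python sketch (variable names are illustrative) slices out both the exact 7-character match and its 8-character leftward extension, lists the positions where the extended occurrences disagree, and tests whether that single mismatch lies strictly inside the substring.

```python
genome = 'AAATCGATCGTTTCGATCG'

# The exact match 'TCGATCG' occupies 0-based slices [3:10] and [12:19].
exact_first, exact_second = genome[3:10], genome[12:19]

# Extending both occurrences one character to the left gives 8 characters each.
first, second = genome[2:10], genome[11:19]
mismatches = [k for k, (a, b) in enumerate(zip(first, second)) if a != b]

# A lone mismatch is acceptable only strictly inside the substring.
interior = all(0 < k < len(first) - 1 for k in mismatches)

print(first, second)                             # ATCGATCG TTCGATCG
print(mismatches, interior)                      # [0] False
print(exact_first == exact_second, exact_first)  # True TCGATCG
```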
