

Split merged paired-end sequences into separate files



seqsplitpe(fastqFile) splits the merged paired-end sequences in fastqFile into two separate files. Each sequence is split in the middle: the first half is saved in the first output file and the second half in the second output file. By default, each output file name is the input file name with the suffix '_1' or '_2' inserted before the file extension.


seqsplitpe(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments.


[outFiles,N] = seqsplitpe(___) returns the names of the output files in a cell array outFiles. N is a vector containing the number of sequences saved in each output file.
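The "split in the middle" behavior described above can be pictured with a minimal sketch on a single read. This is not the function's implementation: the read, its quality string, and the use of floor for the midpoint are assumptions made for illustration, since the text does not say how odd-length reads are handled.

```matlab
% Hypothetical merged read: two 4-base mates concatenated end to end.
mergedSeq  = 'ACGTTGCA';
mergedQual = 'IIIIHHHH';

% Assumption: the midpoint is taken with floor. The documentation above
% does not state how an odd-length read would be divided.
mid = floor(numel(mergedSeq) / 2);

% The first half would go to the '_1' file, the second half to the '_2' file.
firstSeq   = mergedSeq(1:mid);
secondSeq  = mergedSeq(mid+1:end);
firstQual  = mergedQual(1:mid);
secondQual = mergedQual(mid+1:end);

fprintf('%s | %s\n', firstSeq, secondSeq);
```

The quality string is cut at the same index as the sequence so that each half keeps one quality character per base.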


Examples

Split each of the paired-end sequences in half, and store each half in separate output files.

[outFiles, N] = seqsplitpe('SXX123456_merged.fastq');

Check the number of sequences in each output file.

N = 2×1


Input Arguments


fastqFile: Names of FASTQ files containing sequence and quality information, specified as a character vector, string, string vector, or cell array of character vectors.

Example: 'SRR005164_1_50.fastq'

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'OutputSuffix','PairedEnd_split' specifies to use the custom suffix in the output file names.

OutputDir: Relative or absolute path to the output file directory, specified as a character vector or string. The default is the current directory.

Example: 'OutputDir','F:\results'

OutputSuffix: Custom suffix to use in the output file names, specified as a character vector or string. The suffix is inserted after the input file name and before the suffix '_1' or '_2'. The default is ''.

Example: 'OutputSuffix','_MisMatches2'
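As an illustration of the naming rule described above, the following sketch builds the file names one would expect for a given input name, output directory, and custom suffix. It only composes strings and does not call seqsplitpe. The variable names are hypothetical, and whether outFiles contains bare names or full paths is not stated in the text, so treat the result as an expectation to check rather than a guarantee.

```matlab
% Hypothetical inputs; only the naming rule comes from the text above.
fastqFile    = 'SXX123456_merged.fastq';
outputDir    = pwd;              % documented default: current directory
outputSuffix = '_MisMatches2';   % documented default: ''

% Separate the base name from the extension.
[~, baseName, ext] = fileparts(fastqFile);

% Input name, then the custom suffix, then '_1' or '_2', then the extension.
expected1 = fullfile(outputDir, [baseName outputSuffix '_1' ext]);
expected2 = fullfile(outputDir, [baseName outputSuffix '_2' ext]);

disp(expected1);
disp(expected2);
```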

UseParallel: Boolean indicating whether to perform computation in parallel, specified as true or false.

For parallel computing, you must have Parallel Computing Toolbox™. If a parallel pool does not exist, one is created automatically when the auto-creation option is enabled in your parallel preferences. Otherwise, computation runs in serial mode.


There is a cost associated with sharing large input files across workers in a distributed environment. In some cases, running in parallel may not be beneficial in terms of performance.

Example: 'UseParallel',true
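One possible way to request parallel execution only where it can be used is sketched below. The file name is the hypothetical one from the example on this page, the license check is a choice made for this sketch rather than something the function requires, and the call is skipped if the file is not present.

```matlab
fastqFile = 'SXX123456_merged.fastq';   % hypothetical input file

% Ask for parallel execution only when a Parallel Computing Toolbox
% license is available ('Distrib_Computing_Toolbox' is its feature name).
useParallel = logical(license('test', 'Distrib_Computing_Toolbox'));

if exist(fastqFile, 'file') == 2
    % Comma-separated name-value syntax, which also works before R2021a.
    [outFiles, N] = seqsplitpe(fastqFile, 'UseParallel', useParallel);
else
    fprintf('Input file %s not found; skipping the split.\n', fastqFile);
end
```

Because sharing a large input file across workers has a cost, it may be worth timing the serial and parallel runs on a representative file before enabling this option by default.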

Output Arguments


outFiles: Output file names, returned as a cell array of character vectors. By default, each output file name is the input file name with the suffix '_1' or '_2' inserted before the file extension.

N: Number of sequences saved in each output file, returned as an n-by-1 vector, where n is the number of output files. If there are multiple output files, the order of the elements in N corresponds to the order of the output files.
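To cross-check the counts, the sketch below compares each element of N with the entry count that fastqinfo reports for the corresponding output file. It assumes the hypothetical input file from the example exists, that the entries of outFiles can be opened from the current directory, and that the structure returned by fastqinfo has a NumberOfEntries field.

```matlab
fastqFile = 'SXX123456_merged.fastq';   % hypothetical input file

if exist(fastqFile, 'file') == 2
    [outFiles, N] = seqsplitpe(fastqFile);

    % N(k) is the count for outFiles{k}; both outputs share the same order.
    for k = 1:numel(outFiles)
        info = fastqinfo(outFiles{k});
        fprintf('%s: N = %d, fastqinfo reports %d entries\n', ...
            outFiles{k}, N(k), info.NumberOfEntries);
    end
else
    fprintf('Input file %s not found; nothing to check.\n', fastqFile);
end
```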


Version History

Introduced in R2016b