nt2int
Convert nucleotide sequence from letter to integer representation
Syntax
SeqInt = nt2int(SeqChar)
SeqInt = nt2int(SeqChar, ...'Unknown', UnknownValue, ...)
SeqInt = nt2int(SeqChar, ...'ACGTOnly', ACGTOnlyValue, ...)
Input Arguments
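SeqChar | Character vector or string specifying a nucleotide sequence. For valid letter codes, see the table Mapping Nucleotide Letter Codes to Integers. |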
UnknownValue | Integer to represent unknown nucleotides. Choices are integers ≥ 0 and ≤ 255. Default is 0. |
ACGTOnlyValue | Controls the prohibition of ambiguous nucleotides. Choices are true or false (default). If ACGTOnlyValue is true, you can enter only the characters A, C, G, T, and U. |
Output Arguments
SeqInt | Nucleotide sequence specified by a row vector of integers. |
Description
SeqInt = nt2int(SeqChar) converts SeqChar, a character vector or string specifying a nucleotide sequence, to SeqInt, a row vector of integers specifying the same nucleotide sequence. For valid codes, see the table Mapping Nucleotide Letter Codes to Integers. Unknown characters (characters not in the table) are mapped to 0. Gaps represented with hyphens are mapped to 16.
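As a minimal sketch of the basic call, the snippet below converts a short sequence containing a gap and the ambiguous code N. The input sequence is made up for illustration, and the expected values in the comment are read off the table Mapping Nucleotide Letter Codes to Integers rather than from a recorded run.

```matlab
% Hypothetical input: four unambiguous bases, one gap, one "any nucleotide" code.
seq = 'AC-GTN';

% Convert letters to integers using the default options.
ints = nt2int(seq);

% Per the mapping table: A=1, C=2, gap=16, G=3, T=4, N=15.
disp(ints)
```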
SeqInt = nt2int(SeqChar, ...'PropertyName', PropertyValue, ...) calls nt2int with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
SeqInt = nt2int(SeqChar, ...'Unknown', UnknownValue, ...) specifies an integer to represent unknown nucleotides. UnknownValue can be an integer ≥ 0 and ≤ 255. Default is 0.
SeqInt = nt2int(SeqChar, ...'ACGTOnly', ACGTOnlyValue, ...) controls the prohibition of ambiguous nucleotides (N, R, Y, K, M, S, W, B, D, H, and V). Choices are true or false (default). If ACGTOnlyValue is true, you can enter only the characters A, C, G, T, and U.
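The two properties can be illustrated with a short sketch. The sequences and variable names here are hypothetical. The snippet assumes that 'Z' counts as an unknown character because it does not appear in the letter-code table, and that passing an ambiguous code with 'ACGTOnly' set to true raises an error. The try/catch block is there so the example runs either way.

```matlab
% Hypothetical input: 'Z' is not in the letter-code table, so it is treated as unknown.
seqWithUnknown = 'ACZGT';

% Default behavior: the unknown character is mapped to 0.
intsDefault = nt2int(seqWithUnknown);

% Custom value: the unknown character is mapped to 255 instead.
intsCustom = nt2int(seqWithUnknown, 'Unknown', 255);

disp(intsDefault)
disp(intsCustom)

% Hypothetical input containing the ambiguous code R (purine).
seqAmbiguous = 'ACGTR';
try
    % With 'ACGTOnly' true, only A, C, G, T, and U are allowed.
    strictInts = nt2int(seqAmbiguous, 'ACGTOnly', true);
    disp(strictInts)
catch err
    % Assumed outcome: the ambiguous code is rejected with an error.
    fprintf('Sequence rejected: %s\n', err.message);
end
```

Choosing an 'Unknown' value outside the range 1 to 16 keeps unknown characters distinguishable from every code in the mapping table.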
Mapping Nucleotide Letter Codes to Integers
Nucleotide | Code | Integer |
---|---|---|
Adenosine | A | 1 |
Cytidine | C | 2 |
Guanine | G | 3 |
Thymidine | T | 4 |
Uridine (if 'Alphabet' set to 'RNA') | U | 4 |
Purine (A or G) | R | 5 |
Pyrimidine (T or C) | Y | 6 |
Keto (G or T) | K | 7 |
Amino (A or C) | M | 8 |
Strong interaction (3 H bonds) (G or C) | S | 9 |
Weak interaction (2 H bonds) (A or T) | W | 10 |
Not A (C or G or T) | B | 11 |
Not C (A or G or T) | D | 12 |
Not G (A or C or T) | H | 13 |
Not T or U (A or C or G) | V | 14 |
Any nucleotide (A or C or G or T or U) | N | 15 |
Gap of indeterminate length | - | 16 |
Unknown (any character not in table) | * | 0 (default) |
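To make the table concrete, the mapping can be reproduced with a plain lookup vector. This is only an illustrative sketch in base MATLAB, not how nt2int is implemented. The names codes, lut, seq, and ints are hypothetical, and the sketch handles uppercase letters only.

```matlab
% Letter codes in table order, so that position k corresponds to integer k.
codes = 'ACGTRYKMSWBDHVN-';

% Lookup vector indexed by character code; entries left at 0 act as "unknown".
lut = zeros(1, 256, 'uint8');
lut(double(codes)) = 1:16;

% U shares the integer 4 with T, as in the table.
lut(double('U')) = 4;

% Convert a sequence by indexing the lookup vector with its character codes.
seq = 'ACTGCTAGC';
ints = lut(double(seq));
disp(ints)
```

Indexing a precomputed vector converts the whole sequence in one vectorized step, and any character that was never assigned falls through to 0, which mirrors the default value for unknown characters.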
Examples
Convert a nucleotide sequence from letters to integers.
s = nt2int('ACTGCTAGC')
s =
1 2 4 3 2 4 1 3 2
Create a random character vector to represent a nucleotide sequence.
SeqChar = randseq(20)
SeqChar =
TTATGACGTTATTCTACTTT
Convert the nucleotide sequence from letter to integer representation.
SeqInt = nt2int(SeqChar)
SeqInt =
Columns 1 through 13
4 4 1 4 3 1 2 3 4 4 1 4 4
Columns 14 through 20
2 4 1 2 4 4 4
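As a quick sanity check, the integer form can be converted back with int2nt (listed under See Also). This is a small sketch that reuses the sequence from the example above as a fixed input. The variable roundTrip is hypothetical, and the check assumes the sequence contains only unambiguous uppercase letters.

```matlab
% Fixed copy of the sequence from the example above.
SeqChar = 'TTATGACGTTATTCTACTTT';

% Letters to integers, then back to letters.
SeqInt = nt2int(SeqChar);
roundTrip = int2nt(SeqInt);

% Displays 1 (true) when the round trip reproduces the original sequence.
disp(isequal(roundTrip, SeqChar))
```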
Version History
Introduced before R2006a
See Also
aa2int | baselookup | int2aa | int2nt