hmmprofstruct

Create or edit hidden Markov model (HMM) profile structure

Syntax

Model = hmmprofstruct(Length)
Model = hmmprofstruct(Length, Field1, Field1Value, Field2, Field2Value, ...)
NewModel = hmmprofstruct(Model, Field1, Field1Value, Field2, Field2Value, ...)

Input Arguments

Length

Number of match states in the model.

Model

MATLAB® structure containing fields for the parameters of an HMM profile created with the hmmprofstruct function.

Field

Character vector or string containing a field name in the structure Model. See the table below for field names.

FieldValue

Value associated with Field. See the table below for descriptions.

Output Arguments

Model

MATLAB structure containing fields for the parameters of an HMM profile.

Description

Model = hmmprofstruct(Length) returns Model, a MATLAB structure containing fields for the parameters of an HMM profile. Length specifies the number of match states in the model. All other required parameters are set to their default values.

Model = hmmprofstruct(Length, Field1, Field1Value, Field2, Field2Value, ...) returns an HMM profile structure using the specified parameters. All other required parameters are set to default values.

NewModel = hmmprofstruct(Model, Field1, Field1Value, Field2, Field2Value, ...) returns an updated HMM profile structure using the specified parameters. All other parameters are taken from the input Model.
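
The three call forms above can be sketched as follows. This is a minimal illustration that assumes Bioinformatics Toolbox is installed; the variable names and the NullX values are hypothetical and chosen only to show the mechanics.

```matlab
% Sketch of the three documented call forms (variable names are illustrative).

% 1) Length only: every other parameter takes its default value.
m1 = hmmprofstruct(10);

% 2) Length plus field/value pairs: here the nucleotide alphabet is selected.
m2 = hmmprofstruct(10, 'Alphabet', 'NT');

% 3) Existing model plus field/value pairs: returns an updated copy,
%    taking every unspecified field from m2.
m3 = hmmprofstruct(m2, 'NullX', [0.02; 0.98]);

disp(m1.ModelLength)   % number of MATCH states requested
disp(m2.Alphabet)      % alphabet chosen in the second call
disp(m3.NullX)         % field updated in the third call
```

Because the third form returns a new structure, m2 itself is left unchanged unless the result is assigned back to it.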

HMM Profile Structure

The MATLAB structure Model contains the following fields, which are the required and optional parameters of an HMM profile. All probability values are in the [0 1] range.

Field Description
ModelLength

Integer specifying the length of the profile (number of MATCH states).

Alphabet

Character vector or string specifying the alphabet used in the model. Choices are 'AA' (default) or 'NT'.

Note

AlphaLength is 20 for 'AA' and 4 for 'NT'.

MatchEmission

Symbol emission probabilities in the MATCH states.

Either of the following:

  • A matrix of size ModelLength-by-AlphaLength, where each row corresponds to the emission distribution for a specific MATCH state. Defaults to uniform distributions.

  • A structure containing residue counts, such as returned by aacount or basecount.

InsertEmission

Symbol emission probabilities in the INSERT states.

Either of the following:

  • A matrix of size ModelLength-by-AlphaLength, where each row corresponds to the emission distribution for a specific INSERT state. Defaults to uniform distributions.

  • A structure containing residue counts, such as returned by aacount or basecount.

NullEmission

Symbol emission probabilities in the MATCH and INSERT states for the NULL model.

Either of the following:

  • A 1-by-AlphaLength row vector. Defaults to a uniform distribution.

  • A structure containing residue counts, such as returned by aacount or basecount.

Note

The NULL model is used to compute the log-odds ratio at every state, which avoids numerical underflow when propagating the probabilities through the model.

Note

NULL probabilities are also known as the background probabilities.
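
To make the role of the NULL model concrete, the snippet below works through the log-odds arithmetic for a single symbol in plain MATLAB. It is only a sketch: the base-2 logarithm and the example probabilities are assumptions made for illustration, and the code does not reproduce any scaling that the toolbox alignment functions apply internally.

```matlab
% Log-odds of one symbol: emission probability relative to the background.
alphaLength  = 20;                                   % 'AA' alphabet
nullEmission = ones(1, alphaLength) / alphaLength;   % uniform background

pMatch = 0.2;   % hypothetical MATCH-state probability for one residue
k      = 1;     % index of that residue in the alphabet (illustrative)

logOdds = log2(pMatch / nullEmission(k));            % ratio to the background

% A product of many probabilities shrinks toward zero, whereas a sum of
% log-odds values stays in a comfortable numeric range.
nPositions = 500;
rawProduct = pMatch ^ nPositions;    % too small to represent as a double
sumLogOdds = nPositions * logOdds;   % remains finite

fprintf('log-odds = %g, product = %g, summed score = %g\n', ...
        logOdds, rawProduct, sumLogOdds);
```

A positive log-odds value means the symbol is more likely under the profile state than under the background distribution.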

BeginX

BEGIN state transition probabilities.

Format is a 1-by-(ModelLength + 1) row vector:

[B->D1 B->M1 B->M2 B->M3 ... B->Mend]

Note

If necessary, hmmprofstruct will normalize the data such that the sum of the transition probabilities from the BEGIN state equals 1:

sum(Model.BeginX) = 1

For fragment profiles:

sum(Model.BeginX(3:end)) = 0

Default is [0.01 0.99 0 0 ... 0].
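
The normalization described in the note amounts to dividing each entry by the total. The plain-MATLAB sketch below illustrates that arithmetic on a hypothetical set of unnormalized weights; it is not the code that hmmprofstruct runs internally, only the relationship its output is documented to satisfy.

```matlab
% Hypothetical raw weights for a profile with 4 MATCH states,
% ordered as [B->D1 B->M1 B->M2 B->M3 B->M4].
rawBeginX = [1 99 0 0 0];

% Divide by the total so the transitions out of BEGIN sum to 1.
beginX = rawBeginX / sum(rawBeginX);

disp(beginX)        % normalized transition probabilities
disp(sum(beginX))   % total probability leaving the BEGIN state
```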

MatchX

MATCH state transition probabilities.

Format is a 4-by-(ModelLength - 1) matrix:

[M1->M2 M2->M3 ... M[end-1]->Mend;
 M1->I1 M2->I2 ... M[end-1]->I[end-1];
 M1->D2 M2->D3 ... M[end-1]->Dend;
 M1->E  M2->E  ... M[end-1]->E  ]

Note

If necessary, hmmprofstruct will normalize the data such that the sum of the transition probabilities from every MATCH state equals 1:

sum(Model.MatchX) = [ 1 1 ... 1 ]

For fragment profiles:

sum(Model.MatchX(4,:)) = 0

Default is repmat([0.998 0.001 0.001 0],ModelLength-1,1).
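
As a quick check on the default expression above, the following plain-MATLAB sketch evaluates it for a hypothetical five-state profile. Evaluated literally, repmat places one MATCH state per row (the same orientation as the sizes displayed in Example 8), so the normalization check here sums along the second dimension. Whether hmmprofstruct stores the field in exactly this orientation is not verified by this snippet.

```matlab
modelLength = 5;   % hypothetical profile length

% Default from the text: each row is [M->M  M->I  M->D  M->E] for one state.
defaultMatchX = repmat([0.998 0.001 0.001 0], modelLength - 1, 1);

disp(size(defaultMatchX))            % (modelLength - 1) rows, 4 columns

rowTotals = sum(defaultMatchX, 2);   % total probability leaving each MATCH state
disp(rowTotals')
```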

InsertX

INSERT state transition probabilities.

Format is a 2-by-(ModelLength - 1) matrix:

[ I1->M2 I2->M3 ... I[end-1]->Mend;
  I1->I1 I2->I2 ... I[end-1]->I[end-1] ]

Note

If necessary, hmmprofstruct will normalize the data such that the sum of the transition probabilities from every INSERT state equals 1:

sum(Model.InsertX) = [ 1 1 ... 1 ]

Default is repmat([0.5 0.5],ModelLength-1,1).

DeleteX

DELETE state transition probabilities.

Format is a 2-by-(ModelLength - 1) matrix:

[ D1->M2 D2->M3 ... D[end-1]->Mend ;
  D1->D2 D2->D3 ... D[end-1]->Dend ]

Note

If necessary, hmmprofstruct will normalize the data such that the sum of the transition probabilities from every DELETE state equals 1:

sum(Model.DeleteX) = [ 1 1 ... 1 ]

Default is repmat([0.5 0.5],ModelLength-1,1).

FlankingInsertX

Flanking insert states (N and C) used for LOCAL profile alignment.

Format is a 2-by-2 matrix:

[N->B  C->T ;
 N->N  C->C]

Note

If necessary, hmmprofstruct will normalize the data such that the sum of the transition probabilities from Flanking Insert states equals 1:

sum(Model.FlankingInsertX) = [1 1]

Note

To force global alignment use:

Model.FlankingInsertX = [1 1; 0 0]

Default is [0.01 0.01; 0.99 0.99].

LoopX

Loop state transition probabilities used for multiple-hit alignment.

Format is a 2-by-2 matrix:

[E->C  J->B ;
 E->J  J->J]

Note

If necessary, hmmprofstruct will normalize the data such that the sum of the transition probabilities from Loop states equals 1:

sum(Model.LoopX) = [1 1] 

Default is [0.5 0.01; 0.5 0.99].

NullX

Null transition probabilities, used so that state transitions are also scored as log-odds values.

Format is a 2-by-1 column vector:

[G->F ; G->G]

Note

If necessary, hmmprofstruct will normalize the data such that the sum of the transition probabilities from Null states equals 1:

sum(Model.NullX) = 1

Default is [0.01; 0.99].

IDNumber

Optional. User-assigned identification number.

Description

Optional. User-assigned description of the model.

HMM Profile Model

An HMM profile model is a common statistical tool for modeling structured sequences composed of symbols. Such models include randomness in both the output (emission of symbols) and the state transitions of the process. Markov models are generally represented by state diagrams.

The following figure is a state diagram for an HMM profile of length four. INSERT, MATCH, and DELETE states are in the center section.

  • INSERT state represents the excess of one or more symbols in the target sequence that are not included in the profile.

  • MATCH state means that the target sequence is aligned to the profile at the specific location.

  • DELETE state represents a gap or symbol absence in the target sequence (also known as a silent state because it does not emit any symbols).

Flanking states (S, N, B, E, C, T) are used to model the ends of the sequence properly for global, local, or fragment alignment of the profile. S, B, E, and T are silent, while N and C are used to insert symbols at the flanks.

Examples

Example 8. Creating an HMM Profile Structure

Create an HMM profile structure with 100 MATCH states, using the amino acid alphabet.

hmmProfile = hmmprofstruct(100,'Alphabet','AA')

hmmProfile = 

        ModelLength: 100
           Alphabet: 'AA'
      MatchEmission: [100x20 double]
     InsertEmission: [100x20 double]
       NullEmission: [1x20 double]
             BeginX: [101x1 double]
             MatchX: [99x4 double]
            InsertX: [99x2 double]
            DeleteX: [99x2 double]
    FlankingInsertX: [2x2 double]
              LoopX: [2x2 double]
              NullX: [2x1 double]

Example 9. Editing an HMM Profile Structure

  1. Use the pfamhmmread function to create an HMM profile structure from pf00002.ls, a PFAM HMM-formatted file included with the software.

    hmm02 = pfamhmmread('pf00002.ls');

  2. Modify the HMM profile structure to force a global alignment by setting the looping transition probabilities in the flanking insert states to zero.

    hmm02 = hmmprofstruct(hmm02,'FlankingInsertX',[1 1;0 0]);
    hmm02.FlankingInsertX
    
    ans =
    
         1     1
         0     0

Version History

Introduced before R2006a