matrixProfile

Compute matrix profile of time series

Since R2024b

Description

Return Matrix Profile

MP = matrixProfile(X,len) returns the matrix profile of the time series X, which is the vector of minimum z-normalized Euclidean distances between each subsequence of X with length len and its closest neighbor.
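To make this definition concrete, the following brute-force sketch computes a comparable quantity using only base MATLAB functions. It is illustrative only and is not the algorithm that matrixProfile uses. The synthetic signal, the variable names mpSketch and mpiSketch, the population-style normalization std(s,1), and the exclusion half-width ceil(len/2) are all assumptions made for the sketch.

```matlab
% Brute-force sketch of the matrix profile definition (illustrative only).
rng(0);
X = sin(2*pi*(1:200)'/25) + 0.1*randn(200,1);   % synthetic series (assumed)
len  = 20;
excl = ceil(len/2);              % assumed exclusion half-width, not a documented default
nSub = numel(X) - len + 1;       % number of complete subsequences

% Z-normalize every subsequence and store one subsequence per column.
S = zeros(len,nSub);
for k = 1:nSub
    s = X(k:k+len-1);
    S(:,k) = (s - mean(s))/std(s,1);
end

mpSketch  = inf(nSub,1);         % distance to the nearest neighbor
mpiSketch = zeros(nSub,1);       % starting index of the nearest neighbor
for k = 1:nSub
    d = sqrt(sum((S - S(:,k)).^2,1))';           % distances to all subsequences
    d(max(1,k-excl):min(nSub,k+excl)) = Inf;     % ignore trivial self-matches
    [mpSketch(k),mpiSketch(k)] = min(d);
end
```

The sketch costs on the order of nSub^2 distance evaluations, so it is practical only for short series.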

You can use the function findDiscord to find the locations of the top discords in MP.

[MP,MPI] = matrixProfile(X,len) also returns the matrix profile index vector MPI, where MPI(k) is the location of the nearest neighbor of the subsequence that starts at index k.

[___] = matrixProfile(___,Name=Value) specifies options using one or more name-value arguments in addition to the arguments in previous syntaxes. For example, to use parallel processing, set UseParallel to true.

Plot Matrix Profile

matrixProfile(___) creates an interactive plot of the matrix profile. You can use this syntax with any of the previous input-argument combinations.

Examples

Load the data, which consists of the timetable T1. T1 contains armature current measurements of a degrading DC motor.

load matrix_profile_data T1

Set the time series variable X to T1.MotorCurrent and the query segment length to 100.

X = T1.MotorCurrent;
len = 100;

Calculate the matrix profile.

[MP,MPI] = matrixProfile(X,len);

Plot the matrix profile.

matrixProfile(X,len)

Matrix Profile Plots. The Time-Series plot is on the top. Overlays of yellow and purple on the plotted data show the two top motif pairs and the discord. The Matrix Profile plot, which plots the distances, is in the middle. The Subsequences plot is on the bottom, and shows the subsequences for the top two motif pairs and the discord together.

The plot shows that the two top motif pairs, which are the segments that agree best with their nearest neighbors, occur at locations 6717 and 3119. These locations are consistent with minima in the matrix profile plot.

The profile also shows a single discord at location 9797. This subsequence visibly deviates from the motif subsequences for much of its length.
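As a rough cross-check of how such locations relate to the outputs, the following sketch finds the strongest motif and discord directly from MP and MPI. It uses synthetic data with an injected disturbance rather than the motor data, and the names motifLoc, motifMatch, and discordLoc are hypothetical. It assumes that MPI holds starting indices, as described under Output Arguments.

```matlab
% Synthetic series with an injected anomaly (assumed data, not the motor data).
rng(1);
X = sin(2*pi*(1:2000)'/50) + 0.05*randn(2000,1);
X(1200:1230) = X(1200:1230) + 1.5;      % injected disturbance
len = 50;

[MP,MPI] = matrixProfile(X,len);

[~,motifLoc]   = min(MP);       % min and max skip NaN values by default
motifMatch     = MPI(motifLoc); % nearest neighbor of the motif subsequence
[~,discordLoc] = max(MP);       % subsequence farthest from its nearest neighbor

% Overlay the z-normalized subsequences for a visual comparison.
idx = (0:len-1)';
plot(idx,normalize(X(motifLoc+idx)), ...
     idx,normalize(X(motifMatch+idx)), ...
     idx,normalize(X(discordLoc+idx)))
legend("Motif","Motif match","Discord")
```

The two motif curves should nearly coincide, while the discord curve should differ from both over part of its length.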

Use findDiscord to find more discords, which are the locations of segments with the largest distances from their nearest neighbors. Show the top four locations.

locs = findDiscord(MP,MPI);
toplocs = locs(1:4)
toplocs = 4×1

        9797
        9800
        9802
        9792

Show the corresponding distances.

topdist = MP(toplocs)
topdist = 4×1

    8.3894
    8.2062
    8.1517
    7.9777

Plot the findDiscord results.

findDiscord(MP,MPI)

findDiscord Plots. The Time-Series plot is on the top. The Matrix Profile plot is in the middle. The Matrix Profile Discord plot is on the bottom. This plot shows a number of discords that are overlapping or close together.

Discords that are close to each other are probably part of the same anomaly, so you need to identify only one discord for each such segment. Set the minimum separation between discords to 40 samples and limit the number of discords to 10.

findDiscord(MP,MPI,MinSeparation=40,MaxNumDiscords=10)

findDiscord Plots. The Time-Series plot is on the top. The Matrix Profile plot is in the middle. The Matrix Profile Discord plot is on the bottom, and now shows discrete discord instances.

The highest discord is at location 9797, as the original matrix profile showed. The plot also shows significant discords in other locations.
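The effect of a minimum separation and a cap on the number of discords can be sketched with a simple greedy selection in base MATLAB. This is one common approach and is not necessarily how findDiscord selects locations. The profile values and the names minSep, maxNum, and locs are hypothetical.

```matlab
% Hypothetical matrix profile values (assumed for illustration).
MP = [1 2 9 8.5 8 2 1 7 6.5 1 3 1]';
minSep = 3;     % assumed minimum spacing between reported discords
maxNum = 3;     % assumed cap on the number of reported discords

d = MP;                 % working copy that is masked as peaks are selected
locs = zeros(0,1);
while numel(locs) < maxNum && any(isfinite(d))
    [~,k] = max(d);                      % largest remaining distance
    locs(end+1,1) = k;                   % record its location
    lo = max(1,k-minSep+1);
    hi = min(numel(d),k+minSep-1);
    d(lo:hi) = -Inf;                     % suppress neighbors closer than minSep
end
locs    % returns [3; 8; 11]
```

Locations 4 and 5 have large distances but are suppressed because they lie within minSep samples of the peak at location 3.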

Input Arguments

Time series to evaluate (X), specified as a numeric vector of length N. X must not have any missing data.

Length of the query subsequence (len), specified as an integer. len must be less than the time series length N.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: MP = matrixProfile(X,len,UseParallel=true) results in parallel processing.

Length of the exclusion zone (ExclusionZoneLength) on either side of the query starting position loc, specified as the number of data points to exclude. This argument prevents false matches with the query subsequence itself.
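The following base MATLAB sketch shows which candidate starting positions an exclusion zone removes for one query. The values of nSub, loc, and ez are hypothetical, and the clipping at the ends of the series is an assumption about boundary handling.

```matlab
nSub = 20;   % number of candidate subsequences (assumed)
loc  = 8;    % query starting position (assumed)
ez   = 3;    % exclusion zone length on either side (assumed value)

% Positions within ez samples of loc are not eligible as neighbors.
excluded   = max(1,loc-ez):min(nSub,loc+ez);   % 5:11
candidates = setdiff(1:nSub,excluded);         % [1:4 12:20]
```

Without the exclusion zone, the best match for position 8 would be position 8 itself or an almost identical shifted copy such as position 7 or 9.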

Option for controlling the output length (EndPoints) when X ends with a partial subsequence, specified as one of the following options:

  • "discard" — Truncate the length of the output vectors MP and MPI to N-len+1, where N is the length of X.

  • "fill" — Extend the length of MP and MPI to N by padding MP with len-1 NaNs. The software sets the last len-1 elements of the vector MPI to the sequence N-len+2:N.
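A quick way to see the difference between the two options is to compare the output lengths on a short test signal. The random signal and the variable names mpD and mpF below are assumptions; the expected lengths in the comments follow from the option descriptions above.

```matlab
rng(2);
X   = randn(500,1);     % synthetic series (assumed)
len = 40;
N   = numel(X);

[mpD,mpiD] = matrixProfile(X,len,EndPoints="discard");
[mpF,mpiF] = matrixProfile(X,len,EndPoints="fill");

numel(mpD)          % N-len+1 according to the "discard" description
numel(mpF)          % N according to the "fill" description
nnz(isnan(mpF))     % number of NaN values in the filled profile
```

Use "fill" when you want MP to align sample for sample with X, for example when you plot both against the same time axis.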

Maximum number of iterations for computing an upper bound on MP, specified as an integer. The default value is N-len+1, which runs the algorithm to completion.

Option to use a parallel pool (UseParallel) to speed up computations, specified as false for serial computation or true for parallel computation.

Output Arguments

Matrix profile containing the z-normalized distances between each subsequence of length len in the time series X and the best-matching len-length neighbor of that subsequence, returned as a numeric vector.

The length of MP is equal to N-len+1, where N is the length of X, when X ends with a complete subsequence with respect to len. When X ends with a partial subsequence, the value of EndPoints further modifies the length of MP by truncation or fill.

The ExclusionZoneLength value prevents false matches with the query subsequence itself.

You can use the function findDiscord to find the locations of the top discords in MP.

Starting indices of the subsequences X(MPI(k):MPI(k)+len-1) of X that best match the query subsequences X(k:k+len-1), returned as an integer vector.

References

[1] Yeh, Chin-Chia Michael, et al. “Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets.” 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp. 1317–22. https://doi.org/10.1109/ICDM.2016.0179.

Version History

Introduced in R2024b