matrixProfile

Compute matrix profile between all pairs of subsequences in a univariable or multivariable time series

Since R2024b

Description

Return Matrix Profile

MP = matrixProfile(X,len) returns the matrix profile of the univariable or multivariable time series X. The matrix profile is the vector of minimum z-normalized Euclidean distances between each subsequence of X with length len and its closest neighbor.

  • If X is a vector, then the software treats it as a single channel.

  • If X is a matrix, then the software computes the matrix profile independently for each column (multivariable solution).

You can use the functions findDiscord and findMotif to find the locations of the top discords and motifs in MP.
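For reference, the z-normalized Euclidean distance that the matrix profile is built on can be written as follows. This is a sketch in notation introduced here, not a statement of the exact internal computation: m stands for len, μ and σ are the mean and population standard deviation of a subsequence, ρ is the Pearson correlation between the two subsequences, and w is an assumed exclusion zone length.

```latex
\[
d(x, y) = \sqrt{\sum_{i=1}^{m} \left( \frac{x_i - \mu_x}{\sigma_x} - \frac{y_i - \mu_y}{\sigma_y} \right)^{2}}
        = \sqrt{2m\,\bigl(1 - \rho_{xy}\bigr)}
\]

\[
\mathrm{MP}(k) = \min_{j \,:\, |j - k| > w} \; d\bigl(X_{k:k+m-1},\, X_{j:j+m-1}\bigr)
\]
```

The second form of d shows why a perfectly correlated pair of subsequences has distance 0 and why the largest possible distance is 2√m.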

[MP,MPI] = matrixProfile(X,len) also returns the matrix profile index vector MPI, which contains the starting location of the nearest neighbor of each subsequence.

[___] = matrixProfile(___,Name=Value) specifies options using one or more name-value arguments in addition to the arguments in previous syntaxes. For example, to use parallel processing, set UseParallel to true.

Plot Matrix Profile

matrixProfile(___) creates an interactive plot of the matrix profile. You can use this syntax with any of the previous input-argument combinations.

Examples

Load the data T1, a timetable containing armature current measurements of a degrading DC motor.

load matrix_profile_data T1

Set the time series variable X to T1.MotorCurrent and the query segment length to 100.

X = T1.MotorCurrent;
len = 100;

Calculate the matrix profile.

[MP,MPI] = matrixProfile(X,len);

Plot the matrix profile.

matrixProfile(X,len)

Matrix Profile Plots. The Time-Series plot is on the top. Overlays of yellow and purple on the plotted data show the two top motif pairs and the discord. The Matrix Profile plot, which plots the distances, is in the middle. The Subsequences plot is on the bottom, and shows the subsequences for the top two motif pairs and the discord together.

The plot shows that the two top motif pairs, which are the segments that agree best with their neighbors, occur at locations 6717 and 3119. These locations are consistent with minima in the matrix profile plot.

The profile also shows a single discord at location 9797. This subsequence visibly deviates from the motif subsequences for much of its length.

Use findDiscord to find more discords, which are the locations of segments with the furthest distances from their neighbors. Show the top four locations.

locs = findDiscord(MP,MPI);
toplocs = locs(1:4)
toplocs = 4×1

        9797
        9800
        9802
        9792

Show the corresponding distances.

topdist = MP(toplocs)
topdist = 4×1

    8.3894
    8.2062
    8.1517
    7.9777

Plot the findDiscord results.

figure
findDiscord(MP,MPI)

Matrix Profile plot. The plot shows Distance versus Time as a line, with the discords shown as markers.

Discords that are close to each other are probably part of the same anomaly. You need to identify only one discord for such a segment. Set the minimum separation between discords to 40 and limit the number of discords to 10.

findDiscord(MP,MPI,MinSeparation=40,MaxNumDiscords=10)

findDiscord Plots. The Time-Series plot is on the top. The Matrix Profile plot is in the middle. The Matrix Profile Discord plot is on the bottom, and now shows discrete discord instances.

The highest discord is at location 9797, as the original matrix profile showed. The plot also shows significant discords in other locations.

Input Arguments

X

Time series to evaluate, specified as a numeric vector of length n or a numeric matrix containing multiple columns of length n. X must not have any missing data.

len

Length of query subsequence, specified as an integer. len must be less than the time series length n.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: MP = matrixProfile(X,len,UseParallel=true) results in parallel processing.

ExclusionZoneLength

Length of exclusion zone on either side of the query starting position loc, specified as the number of data points to exclude. This argument prevents false matches with the query subsequence itself.
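The effect of an exclusion zone on the set of candidate neighbors can be sketched in base MATLAB. The values below (n, len, ez, loc) are hypothetical and chosen only for illustration; they are not the function defaults, and the masking shown is an assumption about how the zone is applied, not the internal implementation.

```matlab
% Hypothetical values for illustration only (not the function defaults)
n   = 20;   % time series length
len = 5;    % subsequence length
ez  = 2;    % assumed exclusion zone length, in samples on each side
loc = 8;    % query starting position

numSub = n - len + 1;            % number of complete subsequences
candidates = true(1,numSub);     % every start position begins as a candidate

zoneStart = max(1,loc - ez);     % clip the zone to the valid index range
zoneEnd   = min(numSub,loc + ez);
candidates(zoneStart:zoneEnd) = false;   % drop the query and its close neighbors

excluded = find(~candidates);    % start positions 6 through 10
```

Without the zone, the nearest neighbor of a subsequence would always be itself or a trivially shifted copy, at a distance at or near zero.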

EndPoints

Option for controlling the output length when X ends with a partial subsequence, specified as one of these options:

  • "discard": Truncate the length of the output vectors MP and MPI to n – len + 1, where n is the length of X.

  • "fill": Extend the length of MP and MPI to n by padding MP with len – 1 NaN values. The software sets the last len – 1 elements of MPI to the sequence (n – len + 2:n).
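The output lengths for the two options follow from simple index arithmetic. The sketch below uses hypothetical values of n and len and does not call matrixProfile; it only reproduces the counts described above.

```matlab
n   = 10;   % length of X (hypothetical)
len = 4;    % subsequence length (hypothetical)

% "discard": one value per complete subsequence
numDiscard = n - len + 1;        % 7

% "fill": outputs are extended to length n
numPad  = len - 1;               % 3 padding elements
mpTail  = NaN(numPad,1);         % values appended to MP
mpiTail = ((n - len + 2):n)';    % values assigned to the tail of MPI: 8, 9, 10

totalFill = numDiscard + numPad; % equals n
```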

Maximum number of iterations for computing an upper bound on MP, specified as an integer. The default value is n – len + 1, which runs the algorithm to completion.

UseParallel

Option to use the parallel pool to speed up computations, specified as false for serial computation or true for parallel computation.

Matrix profile algorithm to use, specified as "STAMP" or "STOMP".

  • The STAMP algorithm (scalable time series anytime matrix profile) [1] supports anytime and parallel computation, and is usually a good choice for univariable time series.

  • The STOMP algorithm (scalable time series ordered matrix profile) [2] is faster than the STAMP algorithm by a factor of approximately log2(n). It is useful for multivariable time series if you have a GPU and do not need anytime capability, that is, the ability to stop the algorithm before it completes and still obtain an acceptably accurate solution.

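The anytime property can be illustrated with a brute-force sketch in base MATLAB. This is not the STAMP implementation (which computes distance profiles far more efficiently); it only shows that processing subsequences in random order and keeping a running minimum yields an upper bound on the final profile at every step. The data, len, ez, and the snapshot iteration are all hypothetical.

```matlab
rng(0);                                  % reproducible example data
x   = cumsum(randn(200,1));              % hypothetical random-walk series
len = 20;                                % subsequence length (hypothetical)
ez  = 10;                                % assumed exclusion zone length
numSub = numel(x) - len + 1;

% z-normalize every subsequence once; each column of Z is one subsequence
idx = (0:len-1)' + (1:numSub);           % len-by-numSub index matrix
S = x(idx);
Z = (S - mean(S,1)) ./ std(S,1,1);       % population standard deviation

order = randperm(numSub);                % random processing order
mp = inf(numSub,1);                      % running upper bound on the profile
for it = 1:numSub
    k = order(it);
    d = sqrt(sum((Z - Z(:,k)).^2,1))';   % distances from subsequence k to all others
    d(max(1,k-ez):min(numSub,k+ez)) = inf;   % apply the exclusion zone
    mp = min(mp,d);                      % distance is symmetric, so update every entry
    mp(k) = min(mp(k),min(d));           % nearest neighbor of subsequence k itself
    if it == 50
        mpPartial = mp;                  % snapshot after stopping early
    end
end
mpFull = mp;                             % exact profile after all iterations
```

Because each update can only lower an entry, mpPartial is never smaller than mpFull at any position.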
Output Arguments

MP

Matrix profile containing the z-normalized Euclidean distances between each subsequence of length len in the time series X and the best-matching len-length neighbor of that subsequence, returned as a numeric vector.

  • If X is a vector, then the software treats it as a single channel when computing MP.

  • If X is a matrix, then the software computes the matrix profile independently for each column.

The length of MP is equal to n – len + 1 or n, where n is the length of X, depending on the setting for EndPoints.

The ExclusionZoneLength value prevents false matches with the query subsequence itself.

You can use the functions findDiscord and findMotif to find the locations of the top discords and top motif pairs, respectively, in MP.

MPI

Starting indices of the best-matching subsequences, returned as an integer vector. For the query subsequence X(k:k+len–1), the best match is the subsequence X(MPI(k):MPI(k)+len–1).
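How the two outputs relate can be sketched with a brute-force computation in base MATLAB. This is a minimal illustration under stated assumptions, not the algorithm that matrixProfile uses: the data, len, and the exclusion zone length ez are hypothetical, and the local names mp and mpi stand in for the outputs.

```matlab
% Hypothetical periodic series with one anomalous sample at index 10
x   = [0 1 0 -1 0 1 0 -1 0 5 0 -1 0 1 0 -1]';
len = 4;                         % subsequence length (hypothetical)
ez  = 2;                         % assumed exclusion zone length

numSub = numel(x) - len + 1;
mp  = inf(numSub,1);             % nearest-neighbor distances
mpi = zeros(numSub,1);           % nearest-neighbor start indices

for k = 1:numSub
    q  = x(k:k+len-1);
    zq = (q - mean(q)) / std(q,1);       % z-normalize the query
    for j = 1:numSub
        if abs(j - k) <= ez
            continue                     % skip the query and its close neighbors
        end
        c  = x(j:j+len-1);
        zc = (c - mean(c)) / std(c,1);   % z-normalize the candidate
        d  = norm(zq - zc);              % z-normalized Euclidean distance
        if d < mp(k)
            mp(k)  = d;                  % keep the smallest distance so far
            mpi(k) = j;                  % and where it was found
        end
    end
end

[~,discordLoc] = max(mp);        % falls in a subsequence that contains index 10
```

Subsequences from the periodic part have an exact repeat elsewhere, so their distances are zero, while the subsequences that contain the anomalous sample have no close match and produce the largest values.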

References

[1] Yeh, Chin-Chia Michael, et al. “Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets.” 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp. 1317–22. https://doi.org/10.1109/ICDM.2016.0179.

[2] Zhu, Yan, et al. “Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins.” 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp. 739–48. https://doi.org/10.1109/ICDM.2016.0085.

Version History

Introduced in R2024b