similarityDistance

Compute distance profile between query and time series subsequences

Since R2024b

Syntax

d = similarityDistance(x,y)

[d,i] = similarityDistance(x,y)

[___] = similarityDistance(x,y,EndPoints=endPoints)

Description

d = similarityDistance(x,y) returns the vector of z-normalized Euclidean distances between the query sequence y and every subsequence of the time series x with the same length as y.


[d,i] = similarityDistance(x,y) also returns the vector i of starting indices of the subsequences of x, ordered from the best match to the query y (smallest distance) to the worst match (largest distance).


[___] = similarityDistance(x,y,EndPoints=endPoints) specifies how to handle query windows near the end points of x.

Use this syntax with any of the input and output arguments in the previous syntaxes.

Examples


Compute and Evaluate Similarity Distance Profile


Load the data, which consists of T1 and T2. T1 is a timetable containing armature current measurements on a degrading DC motor. T2 is a timetable that contains data collected from a known faulty motor.

load matrix_profile_data T1 T2

Set x to the MotorCurrent variable in T1. Plot x in a subplot.

x = T1.MotorCurrent;

subplot(211)
plot(x)
ylabel("Motor Current, mA")
hold on

[Figure: plot of the motor current time series x. Y-axis: Motor Current, mA.]

T2 contains anomalous data in a segment that begins at location 3000 and has a length of 100. Extract this data as the target segment y.

len = 100;
loc = 3000;
iy = loc:loc+len-1;
y = T2.MotorCurrent(iy);

Compute the similarity distance of the target anomaly segment y to the subsequences within the motor data in x.

[d,i] = similarityDistance(x,y);

Using the first three indices in i, plot the three closest matching subsequences. These matches indicate potentially similar anomalies to the anomaly in y.

for k = 1:3
  id = i(k):i(k)+len-1;
  plot(id,x(id),"--");
  hold on
end
legend({"Time Series", "Match 1", "Match 2", "Match 3"})
hold off

[Figure: time series x with Match 1, Match 2, and Match 3 overlaid as dashed lines. Y-axis: Motor Current, mA.]

For comparison, plot the target anomaly sequence.

subplot(212)
plot(y);
hold on
ylabel("Motor Current, mA")

[Figure: two subplots. Top: time series x with Match 1, Match 2, and Match 3. Bottom: target anomaly sequence y. Y-axis of both: Motor Current, mA.]

Plot the data in the three matching subsequences with the target anomaly.

for k = 1:3
  id = i(k):i(k)+len-1;
  plot(x(id),"--");
  hold on
end
legend({"Target Anomaly", "Match 1", "Match 2", "Match 3"})
hold off

The matching subsequences appear similar to the target anomaly.

Input Arguments


`x` — Time series to evaluate
numeric vector

Time series to evaluate, specified as a numeric vector of length n. x must not have any missing data.

`y` — Query sequence
numeric vector

Query sequence, specified as a numeric vector of length m, where m is less than or equal to the length n of the time series x. y must not have any missing data.

`endPoints` — Method for handling query windows near endpoints
`"discard"` (default) | `"fill"`

Method for handling query windows near the endpoints of x, specified as one of these options:

"discard" — Truncate the length of the output vectors d and i to n – m + 1, where n is the length of x and m is the length of y.
"fill" — Extend the length of d and i to n by padding d with m – 1 NaNs. The software sets the last m – 1 elements of the vector i to n – m + 2:n.

For example, to set the method to "discard", use a syntax such as

d = similarityDistance(x,y,EndPoints="discard")

Here, x is the time series and y is the query sequence.
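To make the difference between the two options concrete, the following minimal sketch calls the function once with each setting and inspects the output sizes. The data is synthetic and hypothetical (a random walk with a query copied out of it), and the comments restate the lengths documented above rather than verified output.

```matlab
% Hypothetical synthetic data, not part of the shipped example data.
rng(0)                        % reproducible random numbers
x = cumsum(randn(1000,1));    % time series of length n
y = x(301:350);               % query sequence of length m
n = numel(x);
m = numel(y);

% Same inputs, two end-point handling methods.
[dDiscard,iDiscard] = similarityDistance(x,y,EndPoints="discard");
[dFill,iFill] = similarityDistance(x,y,EndPoints="fill");

numel(dDiscard)               % documented length: n - m + 1
numel(dFill)                  % documented length: n
nnz(isnan(dFill))             % documented padding: m - 1 NaN values
iFill(n-m+2:n)                % documented tail of i: n - m + 2 through n
```

Use "fill" when d needs to stay aligned sample for sample with x, for example when plotting both against the same index axis.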

Output Arguments


`d` — Distance vector
numeric vector

Distance vector containing the z-normalized Euclidean distances between the query sequence y and the subsequences x(k:k+m-1), where k varies from 1 to n-m+1, returned as a numeric vector. The length of d depends on the EndPoints setting: n-m+1 for "discard" and n for "fill".
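As an illustration of what a z-normalized Euclidean distance profile is, the following sketch computes one with a plain loop in base MATLAB. It is not the algorithm that similarityDistance uses internally. It assumes the population standard deviation (std(...,1)), which this page does not confirm, and it uses hypothetical synthetic data and the hypothetical names dNaive and iNaive.

```matlab
% Naive z-normalized Euclidean distance profile (illustration only).
% Assumption: population standard deviation, std(...,1).
rng(0)                           % reproducible random numbers
x = cumsum(randn(200,1));        % hypothetical time series, length n
y = x(51:70);                    % hypothetical query copied from x, length m

n = numel(x);
m = numel(y);
zy = (y - mean(y))/std(y,1);     % z-normalize the query once

dNaive = zeros(n-m+1,1);
for k = 1:n-m+1
    w = x(k:k+m-1);              % k-th subsequence of x
    zw = (w - mean(w))/std(w,1); % z-normalize the subsequence
    dNaive(k) = norm(zw - zy);   % Euclidean distance between normalized vectors
end

[~,iNaive] = sort(dNaive);       % ascending order, best match first
```

Because y is copied from x starting at index 51, the smallest distance in this sketch is zero and occurs at that index. The loop costs on the order of n*m operations, so it is useful only as a readable reference for small inputs.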

`i` — Vector of starting indices for subsequences
positive integer vector

Vector of starting indices for subsequences of x that best match y, returned as a positive integer vector with the same size as d.

The elements of i are ordered so that d(i) is in ascending order of distance, that is, from the best match (smallest distance) to the worst match (largest distance). The best match, therefore, starts at index i(1) and has distance d(i(1)), and the worst match starts at index i(n-m+1) and has distance d(i(n-m+1)).
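The relationship between i and d can be seen by listing the start index and distance of the top few matches. This is a minimal sketch on hypothetical synthetic data. The names loc, k, bestStart, and bestDist are illustrative and not part of the function.

```matlab
% Hypothetical synthetic data with a query copied from a known location.
rng(0)                           % reproducible random numbers
x = cumsum(randn(2000,1));       % time series
loc = 801;                       % known start of the planted query
y = x(loc:loc+99);               % query sequence of length 100

[d,i] = similarityDistance(x,y);

k = 3;                           % number of top matches to list
bestStart = i(1:k);              % start indices of the k best matches
bestDist = d(bestStart);         % their distances, in ascending order

disp(table(bestStart(:),bestDist(:),VariableNames=["StartIndex","Distance"]))
```

In a sketch like this, the top matches after the first are often shifted copies of the best match with heavily overlapping windows, so an application might need to skip start indices that lie within about m samples of one already selected.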

References

[1] Abdullah Mueen, Sheng Zhong, Yan Zhu, Michael Yeh, Kaveh Kamgar, Krishnamurthy Viswanathan, Chetan Kumar Gupta, and Eamonn Keogh, The Fastest Similarity Search Algorithm for Time Series Subsequences Under Euclidean Distance, 2022. https://www.cs.unm.edu/%7Emueen/FastestSimilaritySearch.html

Extended Capabilities


GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

The similarityDistance function fully supports GPU arrays. To run the function on a GPU, specify the input data as a gpuArray (Parallel Computing Toolbox). For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
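A minimal sketch of GPU use follows. It assumes that Parallel Computing Toolbox and a supported GPU are available, and it falls back to the CPU otherwise. Moving both x and y to the GPU is a choice made here for illustration, and the data is hypothetical.

```matlab
% Hypothetical synthetic data.
rng(0)                                   % reproducible random numbers
x = cumsum(randn(5000,1));               % time series
y = x(1201:1300);                        % query sequence

if canUseGPU
    xg = gpuArray(x);                    % copy the time series to the GPU
    yg = gpuArray(y);                    % copy the query to the GPU
    dDevice = similarityDistance(xg,yg); % compute with GPU array inputs
    dHost = gather(dDevice);             % copy the result back to host memory
else
    dHost = similarityDistance(x,y);     % CPU fallback
end
```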

Version History

Introduced in R2024b
