Edit distance on real signals
[___] = edr(___, specifies
the distance metric to use in addition to any of the input arguments
in previous syntaxes.
metric can be one of
edr(___) without output arguments
plots the original and aligned signals.
If the signals are real vectors, the function displays the two original signals on a subplot and the aligned signals in a subplot below the first one.
If the signals are complex vectors, the function displays the original and aligned signals in three-dimensional plots.
If the signals are real matrices, the function uses
imagesc to display
the original and aligned signals.
If the signals are complex matrices, the function plots their real and imaginary parts in the top and bottom half of each image.
Generate two real signals: a chirp and a sinusoid. Add a clearly outlying section to each signal.
x = cos(2*pi*(3*(1:1000)/1000).^2); y = cos(2*pi*9*(1:399)/400); x(400:410) = 7; y(100:115) = 7;
Warp the signals so that the edit distance between them is smallest. Specify a tolerance of 0.1. Plot the aligned signals, both before and after the warping, and output the distance between them.
tol = 0.1; edr(x,y,tol)
ans = 617
Change the sinusoid frequency to twice its initial value. Repeat the computation.
y = cos(2*pi*18*(1:399)/400); y(100:115) = 7; edr(x,y,tol);
Add an imaginary part to each signal. Restore the initial sinusoid frequency. Align the signals by minimizing the sum of squared Euclidean distances.
x = exp(2i*pi*(3*(1:1000)/1000).^2); y = exp(2i*pi*9*(1:399)/400); x(400:405) = 5+3j; x(405:410) = 7; y(100:107) = 3j; y(108:115) = 7-3j; edr(x,y,tol,'squared');
Generate two signals consisting of two distinct peaks separated by valleys of different lengths. Plot the signals.
x1 = [0 1 0 1 0]*.95; x2 = [0 1 0 0 0 0 0 0 0 0 1 0]*.95; subplot(2,1,1) plot(x1) xlim([0 12]) subplot(2,1,2) plot(x2) xlim([0 12])
Compute the edit distance between the signals. Set a small tolerance so that the only matches are between equal samples.
tol = 0.1; figure edr(x1,x2,tol);
The distance between the signals is 7. To align them, it is necessary to remove the seven central zeros of
x2 or add seven zeros to
Compute the D matrix, whose bottom-right element corresponds to the edit distance. For the definition of D, see Edit Distance on Real Signals.
cnd = (abs(x1'-x2))>tol; D = zeros(length(x1)+1,length(x2)+1); D(1,2:end) = 1:length(x2); D(2:end,1) = 1:length(x1); for h = 2:length(x1)+1 for k = 2:length(x2)+1 D(h,k) = min([D(h-1,k)+1 ... D(h,k-1)+1 ... D(h-1,k-1)+cnd(h-1,k-1)]); end end D
D = 6×13 0 1 2 3 4 5 6 7 8 9 10 11 12 1 0 1 2 3 4 5 6 7 8 9 10 11 2 1 0 1 2 3 4 5 6 7 8 9 10 3 2 1 0 1 2 3 4 5 6 7 8 9 4 3 2 1 1 2 3 4 5 6 7 7 8 5 4 3 2 1 1 2 3 4 5 6 7 7
Compute and display the warping path that aligns the signals.
[d,i1,i2] = edr(x1,x2,tol); E = zeros(length(x1),length(x2)); for k = 1:length(i1) E(i1(k),i2(k)) = NaN; end E
E = 5×12 NaN 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0 0 0 0 0 0 0 0 0 0 0 NaN NaN NaN NaN NaN NaN NaN NaN 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0 0 0 0 0 0 0 0 0 0 0 NaN
Repeat the computation, but now constrain the warping path to deviate at most two elements from the diagonal. Plot the stretched signals and the warping path. In the second plot, set the matrix columns to run along the x-axis.
[dc,i1c,i2c] = edr(x1,x2,tol,2); subplot(2,1,1) plot([x1(i1c);x2(i2c)]','.-') title(['Distance: ' num2str(dc)]) subplot(2,1,2) plot(i2c,i1c,'o-',[i2(1) i2(end)],[i1(1) i1(end)]) axis ij title('Warping Path')
The constraint results in a smaller edit distance but distorts the signals. If the constraint cannot be met, then
NaN for the distance. See this by forcing the warping path to deviate at most one element from the diagonal.
[dc,i1c,i2c] = edr(x1,x2,tol,1); subplot(2,1,1) plot([x1(i1c);x2(i2c)]','.-') title(['Distance: ' num2str(dc)]) subplot(2,1,2) plot(i2c,i1c,'o-',[i2(1) i2(end)],[i1(1) i1(end)]) axis ij title('Warping Path')
MATLAB2.gif contain two handwritten samples of the word "MATLAB®." Load the files. Add outliers by blotching the data.
samp1 = 'MATLAB1.gif'; samp2 = 'MATLAB2.gif'; x = double(imread(samp1)); y = double(imread(samp2)); x(15:20,54:60) = 4000; y(15:20,84:96) = 4000;
Align the handwriting samples along the x-axis using the edit distance. Specify a tolerance of 450.
x— Input signal
Input signal, specified as a real or complex vector or matrix.
Complex Number Support: Yes
y— Input signal
Input signal, specified as a real or complex vector or matrix.
Complex Number Support: Yes
Tolerance, specified as a positive scalar.
maxsamp— Width of adjustment window
Inf(default) | positive integer
Width of adjustment window, specified as a positive integer.
metric— Distance metric
Distance metric, specified as
'symmkl'. If X and Y are both K-dimensional
metric prescribes dmn(X,Y), the
distance between the mth sample of X and the nth sample of Y.
'euclidean' — Root sum of
squared differences, also known as the Euclidean or ℓ2 metric:
'absolute' — Sum of absolute
differences, also known as the Manhattan, city block, taxicab, or ℓ1 metric:
'squared' — Square of the
Euclidean metric, consisting of the sum of squared differences:
'symmkl' — Symmetric Kullback-Leibler
metric. This metric is valid only for real and positive X and Y:
dist— Minimum distance
Minimum distance between signals, returned as a positive real scalar.
Two signals with equivalent features arranged
in the same order can appear very different due to differences in
the durations of their sections.
these durations so that the corresponding features appear at the same
location on a common time axis, thus highlighting the similarities
between the signals. The criterion used to perform the distortion
is designed to be robust to outliers.
Consider the two K-dimensional signals
which have M and N samples,
respectively. Given dmn(X,Y), the
distance between the mth sample of X and the nth sample of Y specified in
edr function stretches X and Y onto a common set of instants such that the edit
distance between the signals is smallest.
Given ε, a real number that is the
tolerance specified in
tol, declare that the mth
sample of X and the nth
sample of Y match if dmn(X,Y) < ε. If two samples, m and n,
do not match, you can make them match in any of three ways:
Remove m from the first signal, such as when the next sample does match n. This removal is equivalent to adding m to the second signal and obtaining two consecutive matches.
Lengthen the first signal by adding in position a sample that matches n and displacing the rest of the samples by one location. This addition is equivalent to removing the unmatched n from the second signal.
Substitute m with n in the first signal, or, equivalently, remove both m and n.
The edit distance is the total number of these operations that are needed to make the two signals match. This number is not unique. To compute the smallest possible edit distance between X and Y, start from these facts:
Two empty signals have zero distance between them.
The distance between an empty signal and a signal with L samples is L, because that is the number of samples that must be added to the empty signal to recover the other one. Equivalently, L is the number of samples that must be removed from an L-sample signal to empty it.
Create an (M + 1)-by-(N + 1) matrix, D, such that:
D1,1 = 0.
Dm,1 = m – 1 for m = 2, …, M + 1.
D1,n = n – 1 for n = 2, …, N + 1.
For m, n > 1,
The smallest edit distance between X and Y is then DM+1,N+1.
The warping path through D that
results in this smallest edit distance is parameterized by two sequences
of the same length,
and is a combination of “chess king” moves:
Vertical moves: (m,n) → (m + 1,n) corresponds to removing a sample from X or adding a sample to Y. Each move increases the edit distance by 1.
Horizontal moves: (m,n) → (m,n + 1) corresponds to removing a sample from Y or adding a sample to X. Each move increases the edit distance by 1.
Diagonal moves: (m,n) → (m + 1,n + 1) corresponds to a match if dm,n(X,Y) ≤ ε or corresponds to removing one sample from each signal if dm,n(X,Y) > ε. Matches do not increase the distance. Removals increase it by 1.
This structure ensures that any acceptable path aligns
the complete signals, does not skip samples, and does not repeat signal
features. Additionally, a desirable path runs close to the diagonal
line extended between d1,1(X,Y) and dM,N(X,Y). This
extra constraint, adjusted by the
ensures that the warping compares sections of similar length.
The penalty for making two samples match is independent of the difference in value between the samples. Two samples that differ by a little more than the tolerance incur the same penalty as two samples that are markedly different. For that reason, the edit distance is not affected by outliers. Conversely, repeating samples to align two signals has a cost, which is not the case with dynamic time warping.
 Chen, Lei, M. Tamer Özsu, and Vincent Oria. "Robust and Fast Similarity Search for Moving Object Trajectories." Proceedings of 24th ACM International Conference on Management of Data (SIGMOD ‘05). 2005, pp. 491–502.
 Paliwal, K. K., Anant Agarwal, and Sarvajit S. Sinha. "A Modification over Sakoe and Chiba’s Dynamic Time Warping Algorithm for Isolated Word Recognition." Signal Processing. Vol. 4, 1982, pp. 329–333.
 Sakoe, Hiroaki, and Seibi Chiba. "Dynamic Programming Algorithm Optimization for Spoken Word Recognition." IEEE® Transactions on Acoustics, Speech, and Signal Processing. Vol. ASSP-26, No. 1, 1978, pp. 43–49.