File Exchange

## Outlier Detection and Removal [hampel]

version 1.6 (4.88 KB) by

Detect and replace outliers with appropriate local values in a non-linear time series.

Updated

HAMPEL(X,Y,DX,T,varargin) returns the Hampel filtered values of the
elements in Y. It was developed to detect outliers in a time series,
but it can also be used as an alternative to the standard median
filter.

References
Chapters 1.4.2, 3.2.2 and 4.3.4 in Mining Imperfect Data: Dealing with
Contamination and Incomplete Records by Ronald K. Pearson.

Acknowledgements
I would like to thank Ronald K. Pearson for the introduction to moving
window filters. Please visit his blog at:
http://exploringdatablog.blogspot.com/2012/01/moving-window-filters-and-pracma.html

X,Y are row or column vectors with an equal number of elements.
The elements in Y should be Gaussian distributed.

Input DX,T,varargin must not contain NaN values!

DX,T are optional scalar values.
DX is a scalar which defines the half width of the filter window.
It is required that DX > 0 and DX should be dimensionally equivalent to
the values in X.
T is a scalar which defines the threshold value used in the equation
|Y - Y0| > T*S0.

Standard Parameters for DX and T:
DX = 3*median(X(2:end)-X(1:end-1));
T = 3;

varargin covers addtional optional input. The optional input must be in
the form of 'PropertyName', PropertyValue.
Supported PropertyNames:
'standard': Use the standard Hampel filter.
Revision 1 details below.

Supported PropertyValues: Scalar value which defines the tolerance of
the adaptive filter. In the case of standard Hampel filter this value
is ignored.

Output YY,I,Y0,LB,UB,ADX are column vectors containing Hampel filtered
values of Y, a logical index of the replaced values, nominal data,
lower and upper bounds on the Hampel filter and the relative half size
of the local window, respectively.

NO is a scalar that specifies the Number of Outliers detected.

Examples
1. Hampel filter removal of outliers
X = 1:1000; % Pseudo Time
Y = 5000 + randn(1000, 1); % Pseudo Data
Outliers = randi(1000, 10, 1); % Index of Outliers
Y(Outliers) = Y(Outliers) + randi(1000, 10, 1); % Pseudo Outliers
[YY,I,Y0,LB,UB] = hampel(X,Y);

plot(X, Y, 'b.'); hold on; % Original Data
plot(X, YY, 'r'); % Hampel Filtered Data
plot(X, Y0, 'b--'); % Nominal Data
plot(X, LB, 'r--'); % Lower Bounds on Hampel Filter
plot(X, UB, 'r--'); % Upper Bounds on Hampel Filter
plot(X(I), Y(I), 'ks'); % Identified Outliers

2. Adaptive Hampel filter removal of outliers
DX = 1; % Window Half size
T = 3; % Threshold
X = 1:DX:1000; % Pseudo Time
Y = 5000 + randn(1000, 1); % Pseudo Data
Outliers = randi(1000, 10, 1); % Index of Outliers
Y(Outliers) = Y(Outliers) + randi(1000, 10, 1); % Pseudo Outliers

plot(X, Y, 'b.'); hold on; % Original Data
plot(X, YY, 'r'); % Hampel Filtered Data
plot(X, Y0, 'b--'); % Nominal Data
plot(X, LB, 'r--'); % Lower Bounds on Hampel Filter
plot(X, UB, 'r--'); % Upper Bounds on Hampel Filter
plot(X(I), Y(I), 'ks'); % Identified Outliers

3. Median Filter Based on Filter Window
DX = 3; % Filter Half Size
T = 0; % Threshold
X = 1:1000; % Pseudo Time
Y = 5000 + randn(1000, 1); % Pseudo Data
[YY,I,Y0] = hampel(X,Y,DX,T);

plot(X, Y, 'b.'); hold on; % Original Data
plot(X, Y0, 'r'); % Median Filtered Data

Sara Barros

Avaa Gao

Fiakro Castro

Fiakro Castro

E. Cheynet

Mark Shore

### Mark Shore (view profile)

An interesting and potentially very useful tool. I'm reviewing both this implementation as well as Pearson's book and webpage.

The code appears well-commented and carefully written but I'll hold off on a rating until I have tested it more using my own data.

Michael Lindholm Nielsen

### Michael Lindholm Nielsen (view profile)

There is a new update on the way as soon as Mathworks has reviewed it. This update corrects minor errors.

peter

### peter (view profile)

thanks a lot, a very practical tool

 9 Feb 2012 1.6 (1) Corrected potential error in internal median function. (2) Removed internal "keyboard" command. (3) Optimized internal Gauss filter. 8 Feb 2012 1.5 (1) The elements in X and Y are now temporarily sorted for internal computations. (2) Performance optimization. (3) Added Example 3. 6 Feb 2012 1.4 (1) If the number of elements (X,Y) are below 2 the output YY will be a copy of Y. No outliers will be detected. No error will be issued. 5 Feb 2012 1.3 (1) Changed a calculation in the adaptive Hampel filter. See details in file. (2) Checks if DX,T or varargin contains NaN values. (3) Now capable of ignoring NaN values in X and Y. (4) Added output Y0 - Nominal Data. 28 Jan 2012 1.1 (1) Replaced output S with lower and upper bounds on the Hampel filter. (2) Added option to use an experimental adaptive Hampel filter. The Principle behind this filter is described in hampel.m.
##### MATLAB Release
MATLAB 7.10 (R2010a)