Different results from STL decomposition in MATLAB and Python

I used the trenddecomp function in MATLAB and the STL function (statsmodels) in Python to decompose a time series, and the results are quite different between the two. Is there something different in how these functions process the data?
Here are the code scripts.
MATLAB
data = readtable('stldata.txt');
index = data.index;
datas = data.data;
[LT,ST,R] = trenddecomp(datas,'stl',12);
figure(1);
plot(index,LT)
hold on
plot(index,ST)
plot(index,R)
plot(index,datas)
legend('Long term','Seasonal','Residual','Original')
Python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import STL
file_path = './stldata.txt'
data = pd.read_csv(file_path, sep=r'\s+')  # delim_whitespace=True is deprecated in recent pandas
stl = STL(data['data'],period=12)
result = stl.fit()
fig = result.plot()
fig.set_size_inches(10, 6)
plt.show()
Result of Python

Answers (1)

Are they really that "different"? From your plots the curves do not look very different. Try rescaling the y-axis in Python, or plot all 3-4 curves in one figure in different colours, as MATLAB does, and compare again.
So the differences you perceive are mainly due to two things. First, the plot layouts differ: MATLAB draws all curves in one set of axes, while the statsmodels plot puts each component in its own subplot, each with its own y-axis scaling. Second, the Python STL(...) applies less low-pass filtering with its moving-average and convolution filters than MATLAB's (a higher cut-off frequency, so fewer high-frequency components are rejected), which makes the MATLAB trend smoother.
Other things to check: 1) how your input data is sampled (spaced) — it needs to be uniformly spaced for MATLAB; 2) there may be two seasonal trends, and you have so far found only one; 3) the scaling/zoom of the plots — in matplotlib try fig, ax = plt.subplots(layout='constrained'), fig.set_size_inches(...), subplot mosaics, and ax.set_yscale(...).
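The low-pass point can be illustrated with a plain moving average (a sketch only — both implementations actually use LOESS smoothing, and the window lengths here are my own illustrative choices): averaging over a full seasonal period cancels a period-12 seasonal component exactly, while a shorter window lets it through.

```python
import math

def moving_average(x, window):
    """Trailing moving average; returns len(x) - window + 1 values."""
    return [sum(x[i:i + window]) / window for i in range(len(x) - window + 1)]

# A pure period-12 seasonal component
seasonal = [math.sin(2 * math.pi * k / 12) for k in range(120)]

full_period = moving_average(seasonal, 12)  # window == period: seasonal cancels
short = moving_average(seasonal, 5)         # window < period: seasonal leaks through

print(max(abs(v) for v in full_period))  # ~0 (floating-point noise only)
print(max(abs(v) for v in short))        # clearly nonzero
```

A smoother trend (as in MATLAB) corresponds to a longer effective smoothing window, i.e. a lower cut-off frequency.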
Should we look up the original STL and SSA algorithm implementations in MATLAB and Python for comparison? Are they implemented in MATLAB as Fortran or C++ code, or as prebuilt libraries?
In Python, the STL decomposition is implemented using moving-average and convolution filters.
What about MATLAB?
Hope this helps.

4 Comments

Thank you for your comment. But I do believe there is something different in the two results. For example, the long-term trends in the two results are totally different; I don't think that is because of the y-axis. The raw time series is the monthly groundwater storage of a specific region, so it should have a period of 12, and I have set this parameter to 12 in both MATLAB and Python.
Please see the updates I have added to the answer:
I meant that the differences between the two LOESS STL implementations' visualisations are relatively small, not drastic.
I found the following Python invocation, with these argument settings, that brings the result closer to MATLAB's:
stl = STL(data['data'], period=12, seasonal=9, trend=None, low_pass=19, seasonal_deg=1, trend_deg=1, low_pass_deg=1, robust=True, seasonal_jump=2, trend_jump=2, low_pass_jump=2)
or you can use the more advanced MSTL for multiple seasonal components:
from statsmodels.tsa.seasonal import MSTL
stl_kwargs = {"seasonal_deg": 0}
# the periods below are illustrative (hourly data); for monthly data use periods=12
model = MSTL(data['data'], periods=(24, 24 * 7, 24 * 7 * 4, 24 * 7 * 4 * 30), stl_kwargs=stl_kwargs)
So the differences you perceive come down to two reasons.
First:
MATLAB and Python plot layouts differ (in MATLAB all curves are in one set of axes; in Python there is one subplot per component, each with its own y-axis scaling).
Second:
Both MATLAB and Python trace back to Fortran routines for the STL algorithm.
Python's statsmodels version is derived from the NETLIB Fortran code written by [1]: "The original code contains a bug that appears in the determination of the median that is used in the robust weighting. This version matches the fixed version that uses a correct partitioned sort to determine the median."
References
[1]
R. B. Cleveland, W. S. Cleveland, J.E. McRae, and I. Terpenning (1990) STL: A Seasonal-Trend Decomposition Procedure Based on LOESS. Journal of Official Statistics, 6, 3-73
Which implementation MATLAB uses internally for trenddecomp is not documented.
As noted above, with its default settings the Python STL(...) applies less low-pass filtering than MATLAB's (a higher cut-off frequency, so fewer high-frequency components are rejected), which is why MATLAB's trend is smoother. You can adjust this with the following arguments of the Python STL function:
low_pass : {int, None}, optional
    Length of the low-pass filter. Must be an odd integer >= 3. If not provided, uses the smallest odd integer > period.
low_pass_deg : int, optional
    Degree of the low-pass LOESS. 0 (constant) or 1 (constant and trend).
robust : bool, optional
    Flag indicating whether to use a weighted version that is robust to some forms of outliers.
seasonal_jump : int, optional
    Positive integer determining the linear interpolation step. If larger than 1, the LOESS is used every seasonal_jump points and linear interpolation is used between fitted points. Higher values reduce estimation time.
trend_jump : int, optional
    Positive integer determining the linear interpolation step. If larger than 1, the LOESS is used every trend_jump points and values between the two are linearly interpolated. Higher values reduce estimation time.
low_pass_jump : int, optional
    Positive integer determining the linear interpolation step. If larger than 1, the LOESS is used every low_pass_jump points and values between the two are linearly interpolated. Higher values reduce estimation time.
In Python, use the following template to get all curves in one set of axes, as in MATLAB (note that matplotlib's old set_color_cycle was removed; use set_prop_cycle instead):
...
result = stl.fit()
plt.gca().set_prop_cycle(color=['red', 'green', 'blue', 'magenta'])
plt.plot(result.observed)
plt.plot(result.trend)
plt.plot(result.seasonal)
plt.plot(result.resid)
plt.legend(['Original', 'Long term', 'Seasonal', 'Residual'])
plt.show()
You can also forecast with it:
# for univariate time series:
from statsmodels.tsa.statespace.sarimax import SARIMAX
# for multivariate analysis (Vector Autoregressive Moving Average with eXogenous regressors):
from statsmodels.tsa.statespace.varmax import VARMAX
from statsmodels.tsa.forecasting.stl import STLForecast
stlf = STLForecast(data['data'], SARIMAX, period=12)
stlf_res = stlf.fit()
horizon = 3
forecast = stlf_res.forecast(horizon)
plt.plot(forecast)
plt.show()
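The idea behind STLForecast can be sketched without statsmodels (a toy sketch with my own helper logic, not the library's implementation): deseasonalise the series, forecast the remainder naively, then add the seasonal pattern back over the forecast horizon.

```python
# Toy sketch of the STLForecast idea on a synthetic monthly-like series
period = 12
series = [10 + 0.1 * k + (1 if k % period == 0 else 0) for k in range(48)]

# crude seasonal estimate: mean of each within-period position, centred
seasonal = [sum(series[p::period]) / len(series[p::period]) - sum(series) / len(series)
            for p in range(period)]
deseason = [series[k] - seasonal[k % period] for k in range(len(series))]

# naive drift forecast of the deseasonalised series
drift = (deseason[-1] - deseason[0]) / (len(deseason) - 1)
horizon = 3
forecast = [deseason[-1] + drift * (h + 1) + seasonal[(len(series) + h) % period]
            for h in range(horizon)]
print(len(forecast))  # 3
```

STLForecast does the same in principle, but fits the supplied model (e.g. SARIMAX) to the deseasonalised series instead of a naive drift.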
data = readtable('https://uk.mathworks.com/matlabcentral/answers/uploaded_files/1809698/stldata.txt');
index = data.index;
datas = data.data;
per1 = 12;
[LT,ST,R] = trenddecomp(datas,'stl',per1);
figure;
subplot(4, 1, 1);
plot(index, datas, 'r');
title('Input');
xlabel('time');
ylabel('Val[units]');
grid on;
subplot(4, 1, 2);
plot(index, LT, 'b');
title('LongTermTrend');
xlabel('time');
ylabel('Val[units]');
grid on;
subplot(4, 1, 3);
plot(index, ST, 'g');
title('SeasonalPeriodic12');
xlabel('time');
ylabel('Val[units]');
grid on;
subplot(4, 1, 4);
plot(index, R, 'm');
title('Residual');
xlabel('time');
ylabel('Val[units]');
grid on;
Thanks for your update. Now I have used your Python code to run the decomposition on the same source data.
stl = STL(data['data'],period=12, seasonal=9, trend=None, low_pass=19, seasonal_deg=1, trend_deg=1, low_pass_deg=1, robust=True, seasonal_jump=2, trend_jump=2, low_pass_jump=2)
Following is the result. If you look at the long-term trend, you may find that it increases when x ranges from 0 to ~20, decreases from ~20 to ~210, and finally increases from ~210 to 240. But in the MATLAB result the long-term trend continuously decreases from 0 to ~200 and increases from ~200 to the end. So if I were a new reader, I would draw different conclusions from these two results. I think this may be a critical problem, especially when I want to use STL in scientific research work.
I believe different parameters may be used in MATLAB and Python. But which one is more reasonable?
I read the original paper on the STL algorithm, and the style of the Python result seems more similar to the original paper than that of MATLAB.
Thank you again for your effort on this topic.
Hm...
OK — that is a valid concern for a scientific manuscript (a Ph.D./M.Sc. workflow, of course).
For x from 0 to ~25, Python regarded the fluctuations as an upward trend, while MATLAB merged them into the stronger downward trend that follows. You are right: it depends on the resolution/scale at which one needs to catch/distinguish trendlines, so the choice is both subjective and objective.
How do you treat the residuals — as noise (a disturbance), or do you expect useful insights from them?
Also, while MATLAB's trenddecomp(...) does not expose direct control over the internal STL algorithm, it could be wrapped: special pre-processing of the input can bring its output closer to a chosen reference or to specific requirements.
So which refinement of the trenddecomp result do you need? Please let me know.
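The resolution/scale tradeoff above can be illustrated with a plain moving average (a sketch only — STL actually uses LOESS, and the window lengths and synthetic series here are my own assumptions): a long window recovers the broad underlying trend, while a short window keeps a local wiggle that a reader could mistake for a trend reversal.

```python
import math

def moving_average(x, window):
    """Trailing moving average; returns len(x) - window + 1 values."""
    return [sum(x[i:i + window]) / window for i in range(len(x) - window + 1)]

n = 240
trend = [((k - 120) / 60.0) ** 2 for k in range(n)]                 # broad U shape
wiggle = [0.5 * math.sin(2 * math.pi * k / 40) for k in range(n)]   # local fluctuation
series = [t + w for t, w in zip(trend, wiggle)]

long_win = moving_average(series, 81)   # ~2 wiggle periods: wiggle averaged out
short_win = moving_average(series, 15)  # wiggle survives, reads as local trends

def max_dev(smoothed, window):
    """Worst-case deviation from the true trend (centre-aligned)."""
    offset = (window - 1) // 2
    return max(abs(s - trend[i + offset]) for i, s in enumerate(smoothed))

print(max_dev(long_win, 81) < max_dev(short_win, 15))  # True
```

Here the long-window estimate stays closer to the true underlying trend, mirroring how the heavier smoothing in the MATLAB result absorbed the early fluctuations into the downward trend.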
In order to help correctly:
1. Regarding your run with the parameters I suggested: I see — MATLAB simply filtered more heavily toward the low-pass end to catch the long-term trend. That is not necessarily bad, especially in historical data analysis aimed at accurate future predictions.
Conceptually, a typical real-world input is nonlinear, with both multiplicative and additive terms in a multivariate space-time domain, e.g.:
F(x1,x2) = f1(x1,x2)*f2(x1,x2)*f3(x1,x2) + f3(x1,x2)*f4(x1,x2)*f5(x1,x2) + f6(x1,x2)*f7(x1,x2)*f8(x1,x2)
STL, by contrast, performs an additive, univariate analysis. MATLAB is better for SSA, while Python also has MSTL and more control over STL through the argument list I gave above. For a better-quality manuscript you just need to connect the right components.
Which conclusions can you draw from the results?
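Since STL is additive, a common workaround for multiplicative behaviour (provided the series is strictly positive) is to decompose log(y): under the logarithm a multiplicative model becomes the additive form STL expects. A minimal sketch:

```python
import math

# Multiplicative model: y = trend * seasonal * noise (all strictly positive)
trend, seasonal, noise = 50.0, 1.2, 1.01
y = trend * seasonal * noise

# Under the log, the product becomes a sum: log y = log T + log S + log e
left = math.log(y)
right = math.log(trend) + math.log(seasonal) + math.log(noise)
print(math.isclose(left, right))  # True
```

In practice you would run STL on the log-transformed series and, if needed, map the components back with exp().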
What do you get when you run the following? (Note that seasonal and low_pass must be odd integers and low_pass_deg must be 0 or 1.)
stl = STL(data['data'], period=12, seasonal=25, trend=None, low_pass=41, seasonal_deg=1, trend_deg=1, low_pass_deg=1, robust=True, seasonal_jump=12, trend_jump=24, low_pass_jump=48)
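The *_jump parameters in that call trade accuracy for speed: the LOESS is fitted only at every jump-th point and values in between are linearly interpolated. A rough sketch of that interpolation step (my own illustration, not the statsmodels internals):

```python
def subsample_and_interpolate(fitted, jump):
    """Keep every `jump`-th value (plus the last) and linearly interpolate
    the rest -- a sketch of what seasonal_jump/trend_jump/low_pass_jump do."""
    n = len(fitted)
    anchors = list(range(0, n, jump))
    if anchors[-1] != n - 1:
        anchors.append(n - 1)
    out = [0.0] * n
    for a, b in zip(anchors, anchors[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            out[i] = fitted[a] * (1 - t) + fitted[b] * t
    return out

line = [0.5 * k for k in range(10)]  # linear data is reproduced (up to rounding)
approx = subsample_and_interpolate(line, 3)
print(all(abs(a - b) < 1e-12 for a, b in zip(approx, line)))  # True
```

Large jump values (like 12/24/48 above) are cheap but can miss curvature between fitted points, so keep them small when the shape of the trend matters.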
It also depends on your mindset. When I wrote my initial answer I was focused on forecasting and prediction of multi-dimensional, multivariate data, so from a forecasting standpoint I was weighing the tradeoff between prediction horizon and accuracy, and therefore favoured more robust detection of long-term trends — which requires amplifying low frequencies and attenuating high frequencies.
See my original explanation above: with default settings, the Python STL(...) applies less low-pass filtering than MATLAB's (a higher cut-off frequency, so fewer high-frequency components are rejected), which is why the MATLAB trend is smoother.


Release: R2022b
Asked: 19 Nov 2024
Edited: 21 Nov 2024
