Industrial Cooling Fan Anomaly Detection Algorithm Development with Deployment to a Microservice Docker Image

Detection of anomalies from sensor measurements is an important part of an equipment condition monitoring process for preventing catastrophic failure. This example shows an end-to-end workflow for creating a predictive maintenance application, from development of a machine learning algorithm to the deployment of the trained machine learning model. A Simulink® model is used to synthetically create measurement data for both normal and anomalous conditions. A support vector machine (SVM) model is trained for detecting anomalies in load, fan mechanics, and power consumption of an industrial cooling fan.

The model detects a load anomaly when the system is working on overloaded conditions, that is, when the system work demand exceeds designed limits. A fan anomaly is detected when a fault occurs within the mechanical subsystem of the motor or the fan. Finally, a power supply anomaly can be detected by a drop in the voltage.

Data Generation

In this example, a thermistor-controlled fan model, defined using Simscape™ Electrical blocks, generates the measurements. The Simulink model includes the thermal, mechanical, and electrical components of a fan. For more information about this model, see [1].

addpath('Data_Generator/', 'Data_Generator/VaryingConvectionLib/');
mdl = "CoolingFanWithFaults";
open_system(mdl)

Anomalies are introduced into this model at random times and for random durations by using custom MATLAB® functions within the DC Source Voltage, Aero Drag, and External Temp blocks. There are only three sensors in this model: a voltage sensor, a power sensor, and a temperature sensor. Data from these sensors is used to detect anomalies.

To generate data with sufficient instances of all anomalies, simulate the model 40 times with randomly generated anomalies. Note that these anomalies can occur in more than one component at a time. The generateData helper function manages the simulation of the data. Within generateData, the nominal and anomalous ranges are defined, and the generateSimulationEnsemble function runs the simulations, in parallel if Parallel Computing Toolbox™ is installed, to generate the data for this experiment.

generateFlag = false;
if isfolder('./Data')
    folderContent = dir('Data/CoolingFan*.mat');
    if isempty(folderContent)
        generateFlag = true;
    else
        numSim = numel(folderContent);
    end
else
    generateFlag = true;
end

if generateFlag
    numSim = 40;
    rng("default");
    generateData('training', numSim);
end

Once the simulations have completed, load the data by creating a simulationEnsembleDatastore for the files generated by the 40 simulations. A simulationEnsembleDatastore object is a datastore specialized for developing condition monitoring and predictive maintenance algorithms from simulated data. Set the selected variables to Signals and Anomalies.

ensemble = simulationEnsembleDatastore('./Data')
ensemble = 
  simulationEnsembleDatastore with properties:

           DataVariables: [4×1 string]
    IndependentVariables: [0×0 string]
      ConditionVariables: [0×0 string]
       SelectedVariables: [4×1 string]
                ReadSize: 1
              NumMembers: 40
          LastMemberRead: [0×0 string]
                   Files: [40×1 string]

ensemble.SelectedVariables = ["Signals", "Anomalies"];

Data Characteristics

Split the ensemble into training, test, and validation subsets. Assign the first 37 members of the ensemble as the training set, the next two members as the test set, and the last member as the validation set. To visualize the signal characteristics, read the first member of the training ensemble and plot the signals and the anomaly labels associated with the measured signals.

trainEnsemble = subset(ensemble, 1:numSim-3)
trainEnsemble = 
  simulationEnsembleDatastore with properties:

           DataVariables: [4×1 string]
    IndependentVariables: [0×0 string]
      ConditionVariables: [0×0 string]
       SelectedVariables: [2×1 string]
                ReadSize: 1
              NumMembers: 37
          LastMemberRead: [0×0 string]
                   Files: [37×1 string]

testEnsemble = subset(ensemble, numSim-2:numSim-1);
validationEnsemble = subset(ensemble, numSim);

tempData = trainEnsemble.read;
signal_data = tempData.Signals{1};
anomaly_data = tempData.Anomalies{1};

h = figure;
tiledlayout("vertical");
ax1 = nexttile; plot(signal_data.Data(:,1)); title('Motor Voltage')
ax2 = nexttile; plot(signal_data.Data(:,2)); title('Fan Speed')
ax3 = nexttile; plot(signal_data.Data(:,3)); title('Temperature'); xlabel('Time in seconds');
ax4 = nexttile; plot(anomaly_data.Data(:,1), 'bx', 'MarkerIndices',find(anomaly_data.Data(:,1)>0.1), 'MarkerSize', 5); hold on;
plot(anomaly_data.Data(:,2), 'ro', 'MarkerIndices',find(anomaly_data.Data(:,2)>0.1), 'MarkerSize', 5);
plot(anomaly_data.Data(:,3), 'square', 'Color', 'g', 'MarkerIndices',find(anomaly_data.Data(:,3)>0.1), 'MarkerSize', 5); 
ylim([0, 2]);
title('Anomalies');
legend("Load Anomaly", "Fan Anomaly", "Power Supply Anomaly", 'Location', 'bestoutside')
linkaxes([ax1 ax2 ax3 ax4],'x')
set(h,'Units','normalized','Position',[0 0 1 .8]); 

Anomalies manifest as small pulses on the traces that last between 10 and 100 seconds. Notice that different types of anomalies have different profiles. The information contained in any one of these traces is not sufficient to classify the different types of anomalies, because one anomaly can manifest in more than one of the signals. Zoom in to see the signal characteristics when the anomalies occur. At this scale, it is clear that the magnitude of the raw measurements does change during periods of anomalies. The plot also shows that more than one anomaly can occur at a time. Therefore, the anomaly detection algorithm needs to use the information in all three measurements to distinguish among anomalies in load, fan speed, and power supply.

Model Development

Characteristics of the measured data and the possible anomalies are important in determining the modeling approach. As seen in the data above, anomalies can span multiple consecutive time steps, and monitoring just one signal is not sufficient: all three signals need to be monitored to identify each of the three anomalies. Furthermore, the selected model must have a quick response time, so it needs to detect anomalies within a reasonably small window of data. Based on these factors and knowledge of the mechanical components, define a window size of 2000 time samples. This window is long enough to capture an anomalous event but short enough for the anomaly detection to be quick.
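As a concrete sketch of this framing step, the following snippet splits a multichannel signal into non-overlapping 2000-sample windows. It uses stand-in random data, not the cooling fan measurements, and is only an illustration of the windowing idea, not the actual helper function used later in this example.

```matlab
% Illustrative sketch only: frame a multichannel signal into
% non-overlapping windows of 2000 samples (stand-in data).
windowSize = 2000;
x = randn(9500, 3);                        % 3 channels, 9500 samples
numFrames = floor(size(x,1)/windowSize);   % 4 complete frames; remainder is dropped
frames = cell(numFrames, 1);
for k = 1:numFrames
    idx = (k-1)*windowSize + (1:windowSize);
    frames{k} = x(idx, :);                 % one 2000-by-3 frame per cell
end
```

Each frame then becomes one observation for feature extraction, with a single label per frame.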

The next step is to understand the balance of the dataset across the different anomaly cases. There are a total of eight different possibilities, ranging from no anomalies in any of load, fan speed, or power supply to anomalies in all three. Read in the training data and summarize the number of instances of each anomaly type. Single anomalies are represented as 'Anomaly1', two simultaneously occurring anomalies are labeled as 'Anomaly12', and anomalies in all three categories are labeled as 'Anomaly123'.

trainingData = trainEnsemble.readall;
anomaly_data = vertcat(trainingData.Anomalies{:});
labelArray={'Normal','Anomaly1','Anomaly2', 'Anomaly3', 'Anomaly12','Anomaly13','Anomaly23','Anomaly123'}';
yCategoricalTrain=labelArray(sum(anomaly_data.Data.*2.^(size(anomaly_data.Data,2)-1:-1:0),2)+1,1);
yCategoricalTrain=categorical(yCategoricalTrain);
summary(yCategoricalTrain);
     Anomaly1          50653 
     Anomaly12         51123 
     Anomaly13            59 
     Anomaly2          48466 
     Anomaly23            64 
     Anomaly3             28 
     Normal         31817644 

As seen above, the dataset is highly imbalanced. There are significantly more points for the normal condition than for anomalous conditions, and the categories with multiple simultaneous anomalies occur even less frequently. To use the information in the multiple simultaneous anomaly cases (Anomaly12, Anomaly13, Anomaly23), train three independent classifiers, where each SVM model learns about only a single anomaly type. An SVM model captures interactions between the features and does not assume independence, as long as a nonlinear kernel is used.

Features need to be extracted from the raw measurement data to achieve a robust and performant model. The Diagnostic Feature Designer (DFD) app assists in the design of features. Create data ensembles using the generateEnsembleTable helper function to represent the raw data as inputs to the DFD app. The helper function splits the time series into frames of 2000 samples each. To deal with the highly imbalanced nature of the dataset, use the downsampleNormalData helper function to reduce the number of frames with the normal label by about 87%. Use this reduced dataset in the DFD app to design features. For each anomaly type, import the data into the app, extract suitable features, and select the top-ranked features to train the SVM. To make the workflow reproducible, this example instead uses the diagnosticFeatures function, exported from the DFD app, which computes the top features for all three anomaly types, 18 features in total. Set startApp to true to open the DFD app and compute new features instead.

yTrainAnomaly1=logical(anomaly_data.Data(:,1));
yTrainAnomaly2=logical(anomaly_data.Data(:,2));
yTrainAnomaly3=logical(anomaly_data.Data(:,3));

windowSize=2000;
sensorDataTrain = vertcat(trainingData.Signals{:});
ensembleTableTrain1 = generateEnsembleTable(sensorDataTrain.Data,yTrainAnomaly1,windowSize);
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 8 workers.
ensembleTableTrain2 = generateEnsembleTable(sensorDataTrain.Data,yTrainAnomaly2,windowSize);
ensembleTableTrain3 = generateEnsembleTable(sensorDataTrain.Data,yTrainAnomaly3,windowSize);

ensembleTableTrain1_reduced = downsampleNormalData(ensembleTableTrain1);
ensembleTableTrain2_reduced = downsampleNormalData(ensembleTableTrain2);
ensembleTableTrain3_reduced = downsampleNormalData(ensembleTableTrain3);

startApp = false;
if startApp 
    diagnosticFeatureDesigner;
end

featureTableTrain1 = diagnosticFeatures(ensembleTableTrain1_reduced);
featureTableTrain2 = diagnosticFeatures(ensembleTableTrain2_reduced);
featureTableTrain3 = diagnosticFeatures(ensembleTableTrain3_reduced);

head(featureTableTrain1);
    Anomaly    Signal_tsmodel/Coef1    Signal_tsmodel/Coef2    Signal_tsmodel/Coef3    Signal_tsmodel/AIC    Signal_tsmodel/Freq1    Signal_tsmodel/Mean    Signal_tsmodel/RMS    Signal_tsmodel_1/Freq1    Signal_tsmodel_1/Mean    Signal_tsmodel_1/RMS    Signal_tsfeat/ACF1    Signal_tsfeat/PACF1    Signal_tsmodel_2/Freq1    Signal_tsmodel_2/AIC    Signal_tsmodel_2/Mean    Signal_tsmodel_2/RMS    Signal_tsfeat_2/ACF1    Signal_tsfeat_2/PACF1
    _______    ____________________    ____________________    ____________________    __________________    ____________________    ___________________    __________________    ______________________    _____________________    ____________________    __________________    ___________________    ______________________    ____________________    _____________________    ____________________    ____________________    _____________________

     true            -0.9242                 0.16609                 -0.11822               -12.792               0.0025644                0.026824               0.03654                0.033299                  109.08                   109.14                0.95851                0.95851                0.0028621                 -10.778                  0.69223                 0.89283                 0.95851                  0.95851       
     false           -1.4043                  0.8304                 -0.47453               -20.195                0.033608                 0.13957               0.14214                0.056065                  133.08                   133.49                0.87089                0.87089                 0.046561                 -17.282                   4.0114                  4.0427                 0.87089                  0.87089       
     false           -1.4142                 0.84819                 -0.46628               -20.131                0.050843                 0.15994               0.16218                0.075457                  141.99                   142.36                0.86277                0.86277                 0.070711                 -17.225                   4.4718                  4.4996                 0.86277                  0.86277       
     false           -1.4191                 0.85979                 -0.48505               -20.118                0.038362                 0.13684               0.13951                0.040716                  119.62                    120.1                0.87749                0.87749                 0.051107                 -17.233                   3.7951                  3.8294                 0.87749                  0.87749       
     false           -1.3758                 0.78127                 -0.45944               -20.077                0.028993                 0.12857               0.13129                0.038983                  122.79                   123.25                0.87798                0.87798                 0.035637                 -17.195                   3.6566                  3.6912                 0.87798                  0.87798       
     false           -1.4031                 0.83297                 -0.47675               -20.106                0.031022                 0.14107               0.14362                0.039402                  126.99                   127.42                0.86883                0.86883                 0.037251                 -17.231                   3.9411                  3.9736                 0.86883                  0.86883       
     false           -1.4361                 0.84933                 -0.43984               -20.075                0.032548                 0.13086               0.13366                0.039892                  119.19                   119.66                0.88179                0.88179                 0.039854                 -17.195                   3.6701                  3.7057                 0.88179                  0.88179       
     false           -1.9505                  1.5022                 -0.94977               -17.373               0.0022521               0.0055684              0.038888               0.0026149                  5.2551                   15.076                0.99633                0.99633                0.0024746                 -15.039                  0.14137                 0.76884                 0.99633                  0.99633       

Once the feature tables, which contain the features and the labels, have been computed for each anomaly type, train an SVM model with a Gaussian kernel for each. As seen in the feature table above, the features have different ranges. To prevent the SVM models from being biased toward features with larger values, set the Standardize option to true. Save the three models to the anomalyDetectionModelSVM.mat file for use in a later section.

m.m1 = fitcsvm(featureTableTrain1, 'Anomaly', 'Standardize',true, 'KernelFunction','gaussian');
m.m2 = fitcsvm(featureTableTrain2, 'Anomaly', 'Standardize',true, 'KernelFunction','gaussian');
m.m3 = fitcsvm(featureTableTrain3, 'Anomaly', 'Standardize',true, 'KernelFunction','gaussian');

% Save model for reuse later
save anomalyDetectionModelSVM m

Model Testing and Validation

To test the model accuracy, use the data in the testEnsemble. Read the signal and anomaly data in testEnsemble, and from that data, create a data ensemble using the generateEnsembleTable helper function. Then, use the diagnosticFeatures function to compute the selected features.

testData = testEnsemble.readall;
anomaly_data_test = vertcat(testData.Anomalies{:});

yTestAnomaly1=logical(anomaly_data_test.Data(:,1));
yTestAnomaly2=logical(anomaly_data_test.Data(:,2));
yTestAnomaly3=logical(anomaly_data_test.Data(:,3));

sensorDataTest = vertcat(testData.Signals{:});
ensembleTableTest1 = generateEnsembleTable(sensorDataTest.Data,yTestAnomaly1,windowSize);
ensembleTableTest2 = generateEnsembleTable(sensorDataTest.Data,yTestAnomaly2,windowSize);
ensembleTableTest3 = generateEnsembleTable(sensorDataTest.Data,yTestAnomaly3,windowSize);

featureTableTest1 = diagnosticFeatures(ensembleTableTest1);
featureTableTest2 = diagnosticFeatures(ensembleTableTest2);
featureTableTest3 = diagnosticFeatures(ensembleTableTest3);

% Predict
results1 = m.m1.predict(featureTableTest1(:, 2:end));
results2 = m.m2.predict(featureTableTest2(:, 2:end));
results3 = m.m3.predict(featureTableTest3(:, 2:end));

Compare the predictions from the three models for each anomaly type to the true labels using a confusion chart.

% Combine predictions and create final vector to understand performance
yT = [featureTableTest1.Anomaly, featureTableTest2.Anomaly, featureTableTest3.Anomaly];
yCategoricalT = labelArray(sum(yT.*2.^(size(yT,2)-1:-1:0),2)+1,1);
yCategoricalT = categorical(yCategoricalT);

yHat = [results1, results2, results3];
yCategoricalTestHat=labelArray(sum(yHat.*2.^(size(yHat,2)-1:-1:0),2)+1,1);
yCategoricalTestHat=categorical(yCategoricalTestHat);

figure
confusionchart(yCategoricalT, yCategoricalTestHat)

The confusion chart illustrates that the models performed well in identifying single and multiple simultaneous anomalies in the test data set. Because three separate models are used, you can also create individual confusion charts to visualize the performance of each model.

figure
tiledlayout(1,3);
nexttile; confusionchart(featureTableTest1.Anomaly, results1);
nexttile; confusionchart(featureTableTest2.Anomaly, results2);
nexttile; confusionchart(featureTableTest3.Anomaly, results3);

The confusion charts indicate that the models work well at identifying both single and multiple anomalies. At this point, a suitable model with satisfactory performance has been trained and tested. In the next section, repackage the algorithm to prepare it for deployment.

Create Deployment Ready Code

To prepare for deployment, create a function called detectCoolingFanAnomaly that loads the pretrained models stored in the anomalyDetectionModelSVM.mat file, creates a data ensemble, computes the selected features, and predicts anomalies. To validate the detectCoolingFanAnomaly function, use the previously unused data in validationEnsemble. The function expects data in windows of 2000 samples, so loop through the data in validationEnsemble to simulate streaming data.

validationData = validationEnsemble.readall;
anomaly_data_validation = vertcat(validationData.Anomalies{1}.Data);

yValidationAnomaly1=logical(anomaly_data_validation(:,1));
yValidationAnomaly2=logical(anomaly_data_validation(:,2));
yValidationAnomaly3=logical(anomaly_data_validation(:,3));

sensorDataValidation = vertcat(validationData.Signals{:}.Data);

windowSize = 2000;
ii = IndexIterator(1, size(sensorDataValidation,1), windowSize);
predictedLabel = nan(floor(size(sensorDataValidation,1)/windowSize), 3);
trueLabel = nan(floor(size(sensorDataValidation,1)/windowSize), 3);
counter = 1;
while ~ii.EndofRangeFlag
    frame = ii.nextFrameIndex;
    [predictedLabel(counter, :), trueLabel(counter, :)] = detectCoolingFanAnomaly(sensorDataValidation(frame(1):frame(2), 1:3), anomaly_data_validation(frame(1):frame(2), 1:3));
    counter = counter+1;
end
reset(ii);

tiledlayout(1,3);
nexttile; confusionchart(trueLabel(:,1), predictedLabel(:,1));
nexttile; confusionchart(trueLabel(:,2), predictedLabel(:,2));
nexttile; confusionchart(trueLabel(:,3), predictedLabel(:,3));

From the confusion charts, the model has an accuracy similar to the test case. This implies that the detectCoolingFanAnomaly function has been created successfully and is ready for deployment. The choice of deployment approach depends on the requirements of the IT/OT infrastructure.

Create a Microservice Docker Image

For IT/OT infrastructure that implements a microservice architecture with containers, MATLAB Compiler SDK™ enables deployment of the algorithm to a Docker image with an HTTP endpoint. Note that this section requires Docker to be installed and configured. Verify the installation by evaluating the following code in the MATLAB command window.

[~,msg] = system('docker version')
msg = 
    'Client: Docker Engine - Community
      Version:           24.0.6
      API version:       1.43
      Go version:        go1.20.7
      Git commit:        ed223bc
      Built:             Mon Sep  4 12:32:17 2023
      OS/Arch:           linux/amd64
      Context:           default
     
     Server: Docker Engine - Community
      Engine:
       Version:          24.0.6
       API version:      1.43 (minimum version 1.12)
       Go version:       go1.20.7
       Git commit:       1a79695
       Built:            Mon Sep  4 12:32:17 2023
       OS/Arch:          linux/amd64
       Experimental:     false
      containerd:
       Version:          1.6.24
       GitCommit:        61f9fd88f79f081d64d6fa3bb1a0dc71ec870523
      runc:
       Version:          1.1.9
       GitCommit:        v1.1.9-0-gccaecfc
      docker-init:
       Version:          0.19.0
       GitCommit:        de40ad0
     '

Start by creating an archive called coolingFanAnomalyArchive using the productionServerArchive function. Next, create a microservice Docker image from the files generated by the productionServerArchive function.

mpsResults = compiler.build.productionServerArchive('detectCoolingFanAnomaly.m', 'ArchiveName', 'coolingFanAnomalyArchive', 'Verbose', 'off')
mpsResults = 
  Results with properties:

                  BuildType: 'productionServerArchive'
                      Files: {'/mathworks/home/anarasim/Documents/MATLAB/ExampleManager/anarasim.Bdoc23b.j2445683/predmaint-ex62086719/coolingFanAnomalyArchiveproductionServerArchive/coolingFanAnomalyArchive.ctf'}
    IncludedSupportPackages: {}
                    Options: [1×1 compiler.build.ProductionServerArchiveOptions]

compiler.package.microserviceDockerImage(mpsResults,'ImageName','image_detectcoolingfan', 'VerbosityLevel', 'none');

Verify that a Docker image with the name image_detectcoolingfan has been successfully created by listing the current images.

[~, msg] = system('docker images image_detectcoolingfan')
msg = 
    'REPOSITORY               TAG       IMAGE ID       CREATED                  SIZE
     image_detectcoolingfan   latest    1afb5e09760f   Less than a second ago   3.87GB
     '

Start a container from the image_detectcoolingfan image, publishing host port 9900 mapped to port 9910, the port of the HTTP service inside the container:

[~, msg] = system('docker run --rm -p 9900:9910 image_detectcoolingfan &');

To ensure that the container is up and running, send a health check request to the /api/health endpoint. The response should be JSON with the value {"status": "ok"}:

webread('http://localhost:9900/api/health')
ans = struct with fields:
    status: 'ok'

Validate the Algorithm in the Docker Container

With the Docker container running, verify the algorithm it contains by reusing the data in validationEnsemble. Following a similar approach as above, split the data into windows of 2000 samples each and populate the message body of an HTTP POST request for each window. The response to each request contains the predicted anomaly status, which can then be compared against the true labels to verify accuracy.

import matlab.net.http.*
import matlab.net.http.field.*
data = matlab.net.http.MessageBody();

windowSize = 2000;
ii = IndexIterator(1, size(sensorDataValidation, 1), windowSize);
predictedLabel = nan(floor(size(sensorDataValidation,1)/windowSize), 3);
trueLabel = nan(floor(size(sensorDataValidation,1)/windowSize), 3);
counter = 1;

while ~ii.EndofRangeFlag
    frame = ii.nextFrameIndex;

    rhs = {sensorDataValidation(frame(1):frame(2), :), anomaly_data_validation(frame(1):frame(2), :)};
    data.Payload = mps.json.encoderequest(rhs, 'Nargout', 2, 'OutputFormat', 'large');
    request = RequestMessage('POST', ...
        [HeaderField('Content-Type', 'application/json')], ...
        data);

    response = request.send('http://localhost:9900/coolingFanAnomalyArchive/detectCoolingFanAnomaly');
    predictedLabel(counter, :) = response.Body.Data.lhs(1).mwdata';
    trueLabel(counter, :) = response.Body.Data.lhs(2).mwdata';
    counter = counter+1;
end
reset(ii);
 
figure;
tiledlayout(1,3)
nexttile; confusionchart(trueLabel(1:250,1), predictedLabel(1:250,1));
nexttile; confusionchart(trueLabel(1:250,2), predictedLabel(1:250,2));
nexttile; confusionchart(trueLabel(1:250,3), predictedLabel(1:250,3));

The anomaly detection algorithm in the Docker container can now be integrated into existing IT/OT infrastructure and managed with Kubernetes for high availability and for handling large volumes of requests. The Docker image can also be shared with other team members to test the algorithm or to integrate it into other systems.
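As one sketch of the sharing step (the archive file name here is arbitrary, and this assumes the image name created above), the image can be exported to a tar archive directly from MATLAB and restored with docker load on the receiving machine:

```matlab
% Sketch: export the microservice image to a tar archive for sharing.
% On the receiving machine, 'docker load -i image_detectcoolingfan.tar'
% restores the image.
[status, msg] = system('docker save image_detectcoolingfan -o image_detectcoolingfan.tar');
```

For registry-based workflows, docker tag and docker push to a private registry serve the same purpose.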

References

[1] Thermistor-Controlled Fan, Simscape Electrical documentation: https://www.mathworks.com/help/sps/ug/thermistor-controlled-fan.htm
