Industrial Cooling Fan Anomaly Detection Algorithm Development with Deployment to a Microservice Docker Image
Detection of anomalies from sensor measurements is an important part of an equipment condition monitoring process for preventing catastrophic failure. This example shows an end-to-end workflow for creating a predictive maintenance application, from development of a machine learning algorithm to the deployment of the trained machine learning model. A Simulink® model is used to synthetically create measurement data for both normal and anomalous conditions. A support vector machine (SVM) model is trained for detecting anomalies in load, fan mechanics, and power consumption of an industrial cooling fan.
The model detects a load anomaly when the system is working under overloaded conditions, that is, when the work demanded of the system exceeds its design limits. A fan anomaly is detected when a fault occurs within the mechanical subsystem of the motor or the fan. Finally, a power supply anomaly is detected when the supply voltage drops.
Data Generation
In this example, a thermistor-controlled fan model, defined using Simscape™ Electrical blocks, generates the measurements. This Simulink model includes the thermal, mechanical, and electrical components of a fan. Learn more about this model in [1].
addpath('Data_Generator/', 'Data_Generator/VaryingConvectionLib/');
mdl = "CoolingFanWithFaults";
open_system(mdl)
Anomalies are introduced into this model at random times and for random durations by using custom MATLAB® functions within the DC Source Voltage, Aero Drag, and External Temp blocks. There are only three sensors in this model: a voltage sensor, a speed sensor, and a temperature sensor. Data from these sensors is used to detect the anomalies.
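As an illustration of the idea only, a helper along the following lines could inject a random pulse into an otherwise nominal trace. The function name, arguments, and pulse shape here are hypothetical; the shipped model uses its own custom functions inside the blocks listed above.

function sig = injectRandomAnomaly(sig, t, anomalousRange)
% Hypothetical sketch: overwrite a random 10-100 s span of a nominal signal
% with a value drawn from an assumed anomalous range.
tStart = t(1) + rand*(t(end) - t(1) - 100);   % random onset
tDur   = 10 + 90*rand;                        % random duration, 10-100 s
idx    = t >= tStart & t <= tStart + tDur;
sig(idx) = anomalousRange(1) + diff(anomalousRange)*rand;
end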
To generate data with sufficient instances of all anomalies, simulate the model 40 times with randomly generated anomalies. Note that these anomalies can occur in more than one component at a time. The generateData helper function supports the simulation of the data. Within the generateData function, the nominal ranges and the anomalous ranges are defined. The generateSimulationEnsemble function is used within the generateData function to run the simulations, in parallel if Parallel Computing Toolbox is installed, and to generate the data for this experiment.
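For orientation, here is a minimal sketch of what a helper like generateData might do, assuming it builds an array of Simulink.SimulationInput objects and passes them to generateSimulationEnsemble. The randomized variable name and ranges below are placeholders, not the actual parameters used by the example.

function generateData(prefix, numSim)
% Hypothetical sketch of the data-generation helper (prefix is unused here).
mdl = "CoolingFanWithFaults";
simIn(1:numSim) = Simulink.SimulationInput(mdl);
for k = 1:numSim
    % Placeholder: randomize a model variable read by the custom anomaly blocks
    simIn(k) = simIn(k).setVariable('SupplyVoltageRange', [200 + 40*rand, 400]);
end
% Run all simulations (in parallel when Parallel Computing Toolbox is available)
% and log each run as one ensemble member under the Data folder
generateSimulationEnsemble(simIn, fullfile(pwd, 'Data'), 'UseParallel', true);
end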
generateFlag = false;
if isfolder('./Data')
    folderContent = dir('Data/CoolingFan*.mat');
    if isempty(folderContent)
        generateFlag = true;
    else
        numSim = numel(folderContent);
    end
else
    generateFlag = true;
end
if generateFlag
    numSim = 40;
    rng("default");
    generateData('training', numSim);
end
Once the simulations have been completed, load the data by using simulationEnsembleDatastore and setting the selected variables to Signals and Anomalies. A simulationEnsembleDatastore is a datastore specialized for developing condition monitoring and predictive maintenance algorithms from simulated data. Create a simulation ensemble datastore for the data generated by the 40 simulations and set the selected variables appropriately.
ensemble = simulationEnsembleDatastore('./Data')
ensemble = 
  simulationEnsembleDatastore with properties:

           DataVariables: [4×1 string]
    IndependentVariables: [0×0 string]
      ConditionVariables: [0×0 string]
       SelectedVariables: [4×1 string]
                ReadSize: 1
              NumMembers: 40
          LastMemberRead: [0×0 string]
                   Files: [40×1 string]
ensemble.SelectedVariables = ["Signals", "Anomalies"];
Data Characteristics
Split the ensemble into training, test, and validation subsets. Assign the first 37 members of the ensemble as the training set, the next two members as the test set, and the last member as the validation set. To visualize the signal characteristics, read the first member of the training ensemble and plot the signals and the anomaly labels associated with the measured signals.
trainEnsemble = subset(ensemble, 1:numSim-3)
trainEnsemble = 
  simulationEnsembleDatastore with properties:

           DataVariables: [4×1 string]
    IndependentVariables: [0×0 string]
      ConditionVariables: [0×0 string]
       SelectedVariables: [2×1 string]
                ReadSize: 1
              NumMembers: 37
          LastMemberRead: [0×0 string]
                   Files: [37×1 string]
testEnsemble = subset(ensemble, numSim-2:numSim-1);
validationEnsemble = subset(ensemble, numSim);
tempData = trainEnsemble.read;
signal_data = tempData.Signals{1};
anomaly_data = tempData.Anomalies{1};
h = figure;
tiledlayout("vertical");
ax1 = nexttile;
plot(signal_data.Data(:,1)); title('Motor Voltage')
ax2 = nexttile;
plot(signal_data.Data(:,2)); title('Fan Speed')
ax3 = nexttile;
plot(signal_data.Data(:,3)); title('Temperature'); xlabel('Time in seconds');
ax4 = nexttile;
plot(anomaly_data.Data(:,1), 'bx', 'MarkerIndices', find(anomaly_data.Data(:,1)>0.1), 'MarkerSize', 5);
hold on;
plot(anomaly_data.Data(:,2), 'ro', 'MarkerIndices', find(anomaly_data.Data(:,2)>0.1), 'MarkerSize', 5);
plot(anomaly_data.Data(:,3), 'square', 'Color', 'g', 'MarkerIndices', find(anomaly_data.Data(:,3)>0.1), 'MarkerSize', 5);
ylim([0, 2]); title('Anomalies');
legend("Load Anomaly", "Fan Anomaly", "Power Supply Anomaly", 'Location', 'bestoutside')
linkaxes([ax1 ax2 ax3 ax4], 'x')
set(h, 'Units', 'normalized', 'Position', [0 0 1 .8]);
Anomalies are manifested as small pulses on the traces that last between 10 and 100 seconds. Notice that different types of anomalies have different profiles. The information contained in only one of these traces is not sufficient to classify the different types of anomalies, as one anomaly can be manifested in more than one of the signals. Zoom in to see the signal characteristics when the anomalies occur. At this scale, it is clear that the magnitudes of the raw measurements change during periods of anomalies. The plot also shows that more than one anomaly can occur at a time. Therefore, the anomaly detection algorithm needs to use the information in all three measurements to distinguish among anomalies in load, fan speed, and power supply.
Model Development
Characteristics of the measured data and of the possible anomalies are important in determining the modeling approach. As seen in the data above, anomalies can span multiple consecutive time steps. It is not sufficient to monitor just one signal; all three signals need to be monitored to identify each of the three anomalies. Furthermore, the selected model must have a quick response time, so it needs to detect anomalies within a reasonably small window of data. Based on these factors and knowledge of the mechanical components, define a window size of 2000 time samples. This window size is long enough to capture an anomalous event but short enough for the anomaly detection to be quick.
The next step is to understand the balance of the dataset across the different anomaly cases. There are a total of eight possibilities, ranging from no anomalies in any of the load, fan speed, and power supply categories to anomalies in all three. Read in the trainingData and summarize the number of instances of each anomaly type. Single anomalies are represented as 'Anomaly1', two simultaneously occurring anomalies are labeled as 'Anomaly12', and anomalies in all three categories are labeled as 'Anomaly123'.
trainingData = trainEnsemble.readall;
anomaly_data = vertcat(trainingData.Anomalies{:});
labelArray = {'Normal','Anomaly1','Anomaly2','Anomaly3','Anomaly12','Anomaly13','Anomaly23','Anomaly123'}';
yCategoricalTrain = labelArray(sum(anomaly_data.Data.*2.^(size(anomaly_data.Data,2)-1:-1:0),2)+1,1);
yCategoricalTrain = categorical(yCategoricalTrain);
summary(yCategoricalTrain);
     Anomaly1         50653 
     Anomaly12        51123 
     Anomaly13           59 
     Anomaly2         48466 
     Anomaly23           64 
     Anomaly3            28 
     Normal        31817644 
As seen above, the dataset is highly imbalanced. There are significantly more points for the normal condition than for anomalous conditions, and the categories with multiple simultaneous anomalies occur even less frequently. To use the information in the multiple simultaneous anomaly cases (Anomaly12, Anomaly13, Anomaly23), train three independent classifiers. Each SVM model learns about only a single anomaly type, but an SVM looks at the interactions between the features and does not assume independence, as long as a nonlinear kernel is used.
Features need to be extracted from the raw measurement data to achieve a robust and performant model. The Diagnostic Feature Designer (DFD) app assists in the design of features. Create data ensembles using the generateEnsembleTable helper function to represent the raw data as inputs to the DFD app. The helper function splits the time series into frames of 2000 samples. To deal with the highly imbalanced nature of the dataset, use the downsampleNormalData helper function to reduce the number of frames with the normal label by about 87%. Use this reduced dataset in the DFD app to design features: for each anomaly type, import the data into the app, extract suitable features, and select the top-ranked features to train the SVM. To make the workflow reproducible, this example instead uses a MATLAB function, diagnosticFeatures, exported from the DFD app; the top features for all three anomaly types have been selected, for a total of 18 features. Set startApp to true to open the DFD app and compute new features instead.
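For context, here are minimal sketches of what the two helpers could look like, under the assumption that a frame is labeled anomalous if any sample in it is anomalous and that roughly 13% of normal frames are kept. The shipping helper functions may differ in detail.

function T = generateEnsembleTable(signalData, labels, windowSize)
% Sketch: split the signals into non-overlapping frames and label each frame.
numFrames = floor(size(signalData,1)/windowSize);
Signal = cell(numFrames, 1);
Anomaly = false(numFrames, 1);
for k = 1:numFrames
    idx = (k-1)*windowSize + (1:windowSize);
    Signal{k} = signalData(idx, :);
    Anomaly(k) = any(labels(idx));   % frame is anomalous if any sample is
end
T = table(Anomaly, Signal);
end

function Tr = downsampleNormalData(T)
% Sketch: keep all anomalous frames and roughly 13% of the normal frames.
normalIdx = find(~T.Anomaly);
keep = normalIdx(rand(size(normalIdx)) < 0.13);
Tr = T([find(T.Anomaly); keep], :);
Tr = Tr(randperm(height(Tr)), :);    % shuffle the frame order
end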
yTrainAnomaly1 = logical(anomaly_data.Data(:,1));
yTrainAnomaly2 = logical(anomaly_data.Data(:,2));
yTrainAnomaly3 = logical(anomaly_data.Data(:,3));
windowSize = 2000;
sensorDataTrain = vertcat(trainingData.Signals{:});
ensembleTableTrain1 = generateEnsembleTable(sensorDataTrain.Data,yTrainAnomaly1,windowSize);
Starting parallel pool (parpool) using the 'Processes' profile ... Connected to parallel pool with 8 workers.
ensembleTableTrain2 = generateEnsembleTable(sensorDataTrain.Data,yTrainAnomaly2,windowSize);
ensembleTableTrain3 = generateEnsembleTable(sensorDataTrain.Data,yTrainAnomaly3,windowSize);
ensembleTableTrain1_reduced = downsampleNormalData(ensembleTableTrain1);
ensembleTableTrain2_reduced = downsampleNormalData(ensembleTableTrain2);
ensembleTableTrain3_reduced = downsampleNormalData(ensembleTableTrain3);
startApp = false;
if startApp
    diagnosticFeatureDesigner;
end
featureTableTrain1 = diagnosticFeatures(ensembleTableTrain1_reduced);
featureTableTrain2 = diagnosticFeatures(ensembleTableTrain2_reduced);
featureTableTrain3 = diagnosticFeatures(ensembleTableTrain3_reduced);
head(featureTableTrain1);
    Anomaly    Signal_tsmodel/Coef1    Signal_tsmodel/Coef2    Signal_tsmodel/Coef3    Signal_tsmodel/AIC    Signal_tsmodel/Freq1    Signal_tsmodel/Mean    Signal_tsmodel/RMS    Signal_tsmodel_1/Freq1    Signal_tsmodel_1/Mean    Signal_tsmodel_1/RMS    Signal_tsfeat/ACF1    Signal_tsfeat/PACF1    Signal_tsmodel_2/Freq1    Signal_tsmodel_2/AIC    Signal_tsmodel_2/Mean    Signal_tsmodel_2/RMS    Signal_tsfeat_2/ACF1    Signal_tsfeat_2/PACF1
    true       -0.9242    0.16609    -0.11822    -12.792    0.0025644    0.026824     0.03654     0.033299     109.08    109.14    0.95851    0.95851    0.0028621    -10.778    0.69223    0.89283    0.95851    0.95851
    false      -1.4043    0.8304     -0.47453    -20.195    0.033608     0.13957      0.14214     0.056065     133.08    133.49    0.87089    0.87089    0.046561     -17.282    4.0114     4.0427     0.87089    0.87089
    false      -1.4142    0.84819    -0.46628    -20.131    0.050843     0.15994      0.16218     0.075457     141.99    142.36    0.86277    0.86277    0.070711     -17.225    4.4718     4.4996     0.86277    0.86277
    false      -1.4191    0.85979    -0.48505    -20.118    0.038362     0.13684      0.13951     0.040716     119.62    120.1     0.87749    0.87749    0.051107     -17.233    3.7951     3.8294     0.87749    0.87749
    false      -1.3758    0.78127    -0.45944    -20.077    0.028993     0.12857      0.13129     0.038983     122.79    123.25    0.87798    0.87798    0.035637     -17.195    3.6566     3.6912     0.87798    0.87798
    false      -1.4031    0.83297    -0.47675    -20.106    0.031022     0.14107      0.14362     0.039402     126.99    127.42    0.86883    0.86883    0.037251     -17.231    3.9411     3.9736     0.86883    0.86883
    false      -1.4361    0.84933    -0.43984    -20.075    0.032548     0.13086      0.13366     0.039892     119.19    119.66    0.88179    0.88179    0.039854     -17.195    3.6701     3.7057     0.88179    0.88179
    false      -1.9505    1.5022     -0.94977    -17.373    0.0022521    0.0055684    0.038888    0.0026149    5.2551    15.076    0.99633    0.99633    0.0024746    -15.039    0.14137    0.76884    0.99633    0.99633
Once the feature tables, which contain the features and the labels, have been computed for each anomaly type, train an SVM model with a gaussian kernel for each. Further, as seen in the summary of one of the feature tables, the features have different ranges. To prevent the SVM models from being biased towards features with larger values, set the Standardize parameter to true. Save the three models to the anomalyDetectionModelSVM.mat file for use in a later section.
m.m1 = fitcsvm(featureTableTrain1, 'Anomaly', 'Standardize', true, 'KernelFunction', 'gaussian');
m.m2 = fitcsvm(featureTableTrain2, 'Anomaly', 'Standardize', true, 'KernelFunction', 'gaussian');
m.m3 = fitcsvm(featureTableTrain3, 'Anomaly', 'Standardize', true, 'KernelFunction', 'gaussian');
% Save model for reuse later
save anomalyDetectionModelSVM m
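Optionally, before moving to the held-out test set, an estimate of generalization error can be obtained with k-fold cross-validation. This step is a sketch, not part of the original workflow, and is shown for the load-anomaly model only.

% Optional sanity check: 5-fold cross-validated misclassification error
cvm1 = crossval(m.m1, 'KFold', 5);
fprintf('Load anomaly model, cross-validated error: %.3f\n', kfoldLoss(cvm1));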
Model Testing and Validation
To test the model accuracy, use the data in testEnsemble. Read the signal and anomaly data in testEnsemble and, from that data, create a data ensemble using the generateEnsembleTable helper function. Then, use the diagnosticFeatures function to compute the selected features.
testData = testEnsemble.readall;
anomaly_data_test = vertcat(testData.Anomalies{:});
yTestAnomaly1=logical(anomaly_data_test.Data(:,1));
yTestAnomaly2=logical(anomaly_data_test.Data(:,2));
yTestAnomaly3=logical(anomaly_data_test.Data(:,3));
sensorDataTest = vertcat(testData.Signals{:});
ensembleTableTest1 = generateEnsembleTable(sensorDataTest.Data,yTestAnomaly1,windowSize);
ensembleTableTest2 = generateEnsembleTable(sensorDataTest.Data,yTestAnomaly2,windowSize);
ensembleTableTest3 = generateEnsembleTable(sensorDataTest.Data,yTestAnomaly3,windowSize);
featureTableTest1 = diagnosticFeatures(ensembleTableTest1);
featureTableTest2 = diagnosticFeatures(ensembleTableTest2);
featureTableTest3 = diagnosticFeatures(ensembleTableTest3);
% Predict
results1 = m.m1.predict(featureTableTest1(:, 2:end));
results2 = m.m2.predict(featureTableTest2(:, 2:end));
results3 = m.m3.predict(featureTableTest3(:, 2:end));
Compare the predictions from the three models for each anomaly type to the true labels using a confusion chart.
% Combine predictions and create final vector to understand performance
yT = [featureTableTest1.Anomaly, featureTableTest2.Anomaly, featureTableTest3.Anomaly];
yCategoricalT = labelArray(sum(yT.*2.^(size(yT,2)-1:-1:0),2)+1,1);
yCategoricalT = categorical(yCategoricalT);
yHat = [results1, results2, results3];
yCategoricalTestHat=labelArray(sum(yHat.*2.^(size(yHat,2)-1:-1:0),2)+1,1);
yCategoricalTestHat=categorical(yCategoricalTestHat);
figure
confusionchart(yCategoricalT, yCategoricalTestHat)
The confusion chart illustrates that the models performed well in identifying single and multiple simultaneous anomalies in the test data set. Since we are using three separate models, individual confusion charts can be created to visualize the performance of each model.
figure
tiledlayout(1,3);
nexttile;
confusionchart(featureTableTest1.Anomaly, results1);
nexttile;
confusionchart(featureTableTest2.Anomaly, results2);
nexttile;
confusionchart(featureTableTest3.Anomaly, results3);
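Because the classes are imbalanced, per-class precision and recall can complement the confusion charts. The following sketch, not part of the original example, computes both for the load-anomaly model from the same test predictions.

% Precision and recall for the load-anomaly model (rows: true, columns: predicted)
C = confusionmat(featureTableTest1.Anomaly, results1);
precision = C(2,2)/sum(C(:,2));   % fraction of flagged frames that are truly anomalous
recall    = C(2,2)/sum(C(2,:));   % fraction of truly anomalous frames that are detected
fprintf('Load anomaly model: precision %.2f, recall %.2f\n', precision, recall);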
The confusion charts indicate that the models work well at identifying both single and multiple anomalies. At this point, a suitable model with satisfactory performance has been trained and tested. In the next section, repackage the algorithm to prepare it for deployment.
Create Deployment Ready Code
To prepare for deployment, create a function called detectCoolingFanAnomaly that loads the pretrained models stored in the anomalyDetectionModelSVM.mat file, creates a data ensemble, computes the selected features, and predicts anomalies. To validate the detectCoolingFanAnomaly function, use the data in validationEnsemble, which has not been used so far. The function expects data in windows of 2000 samples, so loop through the data in validationEnsemble to simulate streaming data.
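For reference, here is a minimal sketch of what such a function could look like, assuming it reuses the generateEnsembleTable and diagnosticFeatures helpers internally and derives the true label from the supplied anomaly data. The actual function used for deployment may differ.

function [predictedLabel, trueLabel] = detectCoolingFanAnomaly(signalWindow, anomalyWindow)
% Sketch of the deployable entry point: signalWindow and anomalyWindow are
% 2000-by-3 arrays, and the outputs are 1-by-3 predicted and true labels.
persistent m
if isempty(m)
    S = load('anomalyDetectionModelSVM.mat', 'm');   % pretrained SVM models
    m = S.m;
end
windowSize = size(signalWindow, 1);
% One-frame ensemble table per anomaly type, then the exported features
et1 = generateEnsembleTable(signalWindow, anomalyWindow(:,1), windowSize);
et2 = generateEnsembleTable(signalWindow, anomalyWindow(:,2), windowSize);
et3 = generateEnsembleTable(signalWindow, anomalyWindow(:,3), windowSize);
f1 = diagnosticFeatures(et1);
f2 = diagnosticFeatures(et2);
f3 = diagnosticFeatures(et3);
predictedLabel = [m.m1.predict(f1(:,2:end)), m.m2.predict(f2(:,2:end)), m.m3.predict(f3(:,2:end))];
trueLabel = [any(anomalyWindow(:,1)), any(anomalyWindow(:,2)), any(anomalyWindow(:,3))];
end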
validationData = validationEnsemble.readall;
anomaly_data_validation = vertcat(validationData.Anomalies{1}.Data);
yValidationAnomaly1 = logical(anomaly_data_validation(:,1));
yValidationAnomaly2 = logical(anomaly_data_validation(:,2));
yValidationAnomaly3 = logical(anomaly_data_validation(:,3));
sensorDataValidation = vertcat(validationData.Signals{:}.Data);
windowSize = 2000;
ii = IndexIterator(1, size(sensorDataValidation,1), windowSize);
predictedLabel = nan(floor(size(sensorDataValidation,1)/windowSize), 3);
trueLabel = nan(floor(size(sensorDataValidation,1)/windowSize), 3);
counter = 1;
while ~ii.EndofRangeFlag
    frame = ii.nextFrameIndex;
    [predictedLabel(counter, :), trueLabel(counter, :)] = detectCoolingFanAnomaly( ...
        sensorDataValidation(frame(1):frame(2), 1:3), ...
        anomaly_data_validation(frame(1):frame(2), 1:3));
    counter = counter + 1;
end
reset(ii);
tiledlayout(1,3);
nexttile;
confusionchart(trueLabel(:,1), predictedLabel(:,1));
nexttile;
confusionchart(trueLabel(:,2), predictedLabel(:,2));
nexttile;
confusionchart(trueLabel(:,3), predictedLabel(:,3));
From the confusion charts, the model accuracy on the validation data is similar to that on the test data. This implies that the detectCoolingFanAnomaly function has been created successfully and is ready for deployment. The choice of deployment approach depends on the requirements of the IT/OT infrastructure.
Create a Microservice Docker Image
For IT/OT infrastructure that implements a microservice architecture with containers, MATLAB Compiler SDK™ enables deployment of the algorithm as a Docker image with an HTTP endpoint. Note that this section requires Docker to be installed and configured, which can be verified by evaluating the following code in the MATLAB command window.
[~,msg] = system('docker version')
msg = 
    'Client: Docker Engine - Community
      Version:           24.0.6
      API version:       1.43
      Go version:        go1.20.7
      Git commit:        ed223bc
      Built:             Mon Sep 4 12:32:17 2023
      OS/Arch:           linux/amd64
      Context:           default

     Server: Docker Engine - Community
      Engine:
       Version:          24.0.6
       API version:      1.43 (minimum version 1.12)
       Go version:       go1.20.7
       Git commit:       1a79695
       Built:            Mon Sep 4 12:32:17 2023
       OS/Arch:          linux/amd64
       Experimental:     false
      containerd:
       Version:          1.6.24
       GitCommit:        61f9fd88f79f081d64d6fa3bb1a0dc71ec870523
      runc:
       Version:          1.1.9
       GitCommit:        v1.1.9-0-gccaecfc
      docker-init:
       Version:          0.19.0
       GitCommit:        de40ad0
     '
Start by creating an archive called coolingFanAnomalyArchive using the productionServerArchive function. Next, create a microservice Docker image from the files generated by the productionServerArchive function.
mpsResults = compiler.build.productionServerArchive('detectCoolingFanAnomaly.m', 'ArchiveName', 'coolingFanAnomalyArchive', 'Verbose', 'off')
mpsResults = 
  Results with properties:

                  BuildType: 'productionServerArchive'
                      Files: {'/mathworks/home/anarasim/Documents/MATLAB/ExampleManager/anarasim.Bdoc23b.j2445683/predmaint-ex62086719/coolingFanAnomalyArchiveproductionServerArchive/coolingFanAnomalyArchive.ctf'}
    IncludedSupportPackages: {}
                    Options: [1×1 compiler.build.ProductionServerArchiveOptions]
compiler.package.microserviceDockerImage(mpsResults,'ImageName','image_detectcoolingfan', 'VerbosityLevel', 'none');
Verify that a Docker image with the name image_detectcoolingfan has been successfully created by listing the current images.
[~, msg] = system('docker images image_detectcoolingfan')
msg = 
    'REPOSITORY               TAG       IMAGE ID       CREATED                  SIZE
     image_detectcoolingfan   latest    1afb5e09760f   Less than a second ago   3.87GB
     '
Start a container from the image_detectcoolingfan image, publishing host port 9900 and mapping it to port 9910, the port of the HTTP service inside the container:
[~, msg] = system('docker run --rm -p 9900:9910 image_detectcoolingfan &');
To ensure that the container is up and running, send a health check request to the /api/health endpoint. The response should be JSON with the value {"status": "ok"}:
webread('http://localhost:9900/api/health')
ans = struct with fields:
status: 'ok'
Validate the Algorithm in the Docker Container
With the Docker container initialized, verify the algorithm it contains by reusing the data in validationEnsemble. Follow a similar approach as before: split the data into windows of 2000 samples each and populate the message body of an HTTP POST request. The response to each request contains the predicted anomaly status, which can then be compared against the true labels to verify accuracy.
import matlab.net.http.*
import matlab.net.http.field.*
data = matlab.net.http.MessageBody();
windowSize = 2000;
ii = IndexIterator(1, size(sensorDataValidation, 1), windowSize);
predictedLabel = nan(floor(size(sensorDataValidation,1)/windowSize), 3);
trueLabel = nan(floor(size(sensorDataValidation,1)/windowSize), 3);
counter = 1;
while ~ii.EndofRangeFlag
    frame = ii.nextFrameIndex;
    rhs = {sensorDataValidation(frame(1):frame(2), :), anomaly_data_validation(frame(1):frame(2), :)};
    data.Payload = mps.json.encoderequest(rhs, 'Nargout', 2, 'OutputFormat', 'large');
    request = RequestMessage('POST', ...
        [HeaderField('Content-Type', 'application/json')], ...
        data);
    response = request.send('http://localhost:9900/coolingFanAnomalyArchive/detectCoolingFanAnomaly');
    predictedLabel(counter, :) = response.Body.Data.lhs(1).mwdata';
    trueLabel(counter, :) = response.Body.Data.lhs(2).mwdata';
    counter = counter + 1;
end
reset(ii);
figure;
tiledlayout(1,3)
nexttile;
confusionchart(trueLabel(1:250,1), predictedLabel(1:250,1));
nexttile;
confusionchart(trueLabel(1:250,2), predictedLabel(1:250,2));
nexttile;
confusionchart(trueLabel(1:250,3), predictedLabel(1:250,3));
The anomaly detection algorithm in the Docker container can now be integrated into existing IT/OT infrastructure and orchestrated with Kubernetes for high availability and for handling large volumes of requests. The Docker image can also be shared easily with other team members for testing the algorithm and for integration into other systems.
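When local testing is complete, the container started above can be stopped. The following sketch uses standard Docker CLI commands and assumes the image name used earlier in this example.

% Optional cleanup: find and stop the container started from the image above
[~, cid] = system('docker ps --filter ancestor=image_detectcoolingfan --format "{{.ID}}"');
if ~isempty(strtrim(cid))
    system(['docker stop ' strtrim(cid)]);
end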
References
[1] https://www.mathworks.com/help/sps/ug/thermistor-controlled-fan.htm
See Also
Related Topics
- Thermistor-Controlled Fan (Simscape Electrical)