Analyze and Select Features for Pump Diagnostics
This example shows how to use the Diagnostic Feature Designer app to analyze and select features to diagnose faults in a triplex reciprocating pump.
The example uses simulated pump fault data generated by the Multi-Class Fault Detection Using Simulated Data example. The data has been preprocessed to remove the pump startup transients.
Open Diagnostic Feature Designer
Load the triplex pump fault data. The pump data contains 240 flow and pressure measurements for different fault conditions. There are three fault types (leaking pump cylinder, blocked pump inlet, increased pump bearing friction). The measurements cover conditions where none, one, or multiple faults are present. The data is collected in a table where each row is a different measurement.
load('savedPumpData')
pumpData
pumpData=240×3 table
           flow                pressure         faultCode
    __________________    __________________    _________
    {1201×1 timetable}    {1201×1 timetable}       0     
    {1201×1 timetable}    {1201×1 timetable}       0     
    {1201×1 timetable}    {1201×1 timetable}       100   
    {1201×1 timetable}    {1201×1 timetable}       100   
    {1201×1 timetable}    {1201×1 timetable}       100   
    {1201×1 timetable}    {1201×1 timetable}       100   
    {1201×1 timetable}    {1201×1 timetable}       100   
    {1201×1 timetable}    {1201×1 timetable}       100   
    {1201×1 timetable}    {1201×1 timetable}       100   
    {1201×1 timetable}    {1201×1 timetable}       100   
    {1201×1 timetable}    {1201×1 timetable}       100   
    {1201×1 timetable}    {1201×1 timetable}       100   
    {1201×1 timetable}    {1201×1 timetable}       0     
    {1201×1 timetable}    {1201×1 timetable}       100   
    {1201×1 timetable}    {1201×1 timetable}       100   
    {1201×1 timetable}    {1201×1 timetable}       100   
      ⋮
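Before opening the app, it can help to check at the command line how the members are distributed across fault codes and what a single member looks like. The following is a minimal sketch that builds a small stand-in table with the same layout, since savedPumpData is available only with the example files. The name demoData, the random placeholder signals, and the 1 ms sample time are assumptions made for illustration.

```matlab
% Stand-in for pumpData: 6 members, each a 1201-sample timetable.
% The signals are random placeholders, not the simulated pump data.
rng(0);                                   % reproducible placeholder values
numMembers = 6;
t = seconds((0:1200)' * 1e-3);            % assumed 1 ms sample time
flow = cell(numMembers, 1);
pressure = cell(numMembers, 1);
for k = 1:numMembers
    flow{k}     = timetable(t, randn(1201, 1), 'VariableNames', {'Data'});
    pressure{k} = timetable(t, randn(1201, 1), 'VariableNames', {'Data'});
end
faultCode = [0; 0; 100; 100; 100; 0];
demoData = table(flow, pressure, faultCode);

% Count members per fault code.
[codes, ~, idx] = unique(demoData.faultCode);
counts = accumarray(idx, 1);
codeCounts = table(codes, counts)

% Pull one member out of its cell to inspect the raw samples.
firstFlow = demoData.flow{1};
memberSize = size(firstFlow)              % rows (samples) by variables
```

With the real data loaded, the same two steps should apply to pumpData directly, for example by replacing demoData with pumpData.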
Open Diagnostic Feature Designer by using the diagnosticFeatureDesigner command. Initiate a new session by clicking New Session, which opens a dialog box for importing data. 

In the Select dataset from workspace pane, select pumpData as your data source. In the Select source variables pane, confirm that the variable names match those that you viewed at the command line. flow and pressure are both signals. faultCode is a condition variable. Condition variables denote the presence or absence of a fault and are used by the app for grouping and classification. When you first open the New Session dialog box, the app displays the variable properties for the first variable.

Click Import to import the pump data into the app.
Plot Data and Group by Fault Code
Plot the flow signal by selecting flow from the Variables section of the data browser and clicking Signal Trace in the plot gallery. Plot the pressure signal the same way.

These plots show the pressure and flow signals for all 240 members in the dataset. You can click the Signal Trace tab and select Group by faultCode to display signals with the same fault code in the same color. Grouping signals in this way can help you to quickly determine if there are any clear differences between signals of different fault types. In this case, the measured signals do not show any clear differences for different fault codes.

To group all future plots by faultCode, use Plot Options. Clicking Plot Options opens a dialog box that lets you set preferences for the session.

Extract Time Domain Features
Because the measured signals do not show any clear differences between fault conditions, the next step is to extract time-domain features, such as the signal mean and standard deviation, from the signals. First, select flow/Data in the data browser. Then, select Time-Domain Features and then Signal Features.

Two new tabs open, Signal Features and Time-Domain Features. In Signal Features, select the features you would like to extract and click Apply. For now, clear the Plot results check box. You will plot results later to see if the features help distinguish different fault conditions. Repeat this process for the pressure signal.
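To make the time-domain features concrete, here is a minimal command-line sketch that computes a few basic statistics for one signal. It runs on a placeholder sinusoid rather than pump data, and the 1000 Hz sample rate is an assumption. The peak value and crest factor are included as further common examples. The app may compute or name its features differently.

```matlab
% Placeholder signal: a 1201-sample sinusoid with an offset (not pump data).
fs = 1000;                         % assumed sample rate in Hz
t  = (0:1200)' / fs;
x  = 2 + 0.5 * sin(2*pi*50*t);

% Basic statistics of the kind extracted as signal features.
feat = struct();
feat.Mean        = mean(x);
feat.Std         = std(x);
feat.RMS         = sqrt(mean(x.^2));
feat.PeakValue   = max(abs(x));
feat.CrestFactor = feat.PeakValue / feat.RMS;
disp(feat)
```

Applying the same computation to every member yields one row of feature values per member, which is the shape of the feature table the app builds.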

Extract Frequency Domain Features
A reciprocating pump uses a drive shaft and cylinders to pump fluid. Because of the mechanical construction of the pump, there are likely to be cyclic fluctuations in the pump flow and pressure. For example, zoom into a section of the flow signals using the signal panner below the signal trace plot.

Computing the frequency spectrum of the flow will highlight the cyclic nature of the flow signal and could give better insight into how the flow signal changes under different fault conditions. Estimate the frequency spectra using an autoregressive model.

This method fits an autoregressive model of the prescribed order to the data, and then computes the spectrum of that estimated model. This approach reduces any overfitting to the raw data signal. In this case, specify a model order of 20. Also set the frequency grid to have a minimum of 0 Hz and a maximum of 500 Hz.
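The idea of an autoregressive spectrum can be sketched at the command line. The example below fits an order-20 model by solving the Yule-Walker equations and evaluates the model spectrum on a 0 to 500 Hz grid. It is an illustration under assumptions: the signal is a synthetic placeholder, the 1000 Hz sample rate is assumed, and Yule-Walker is a stand-in estimator that may not match the estimation approach the app uses.

```matlab
% Placeholder signal: two sinusoids plus noise (not pump data).
rng(1);
fs = 1000;                          % assumed sample rate in Hz
t  = (0:1200)' / fs;
x  = sin(2*pi*60*t) + 0.5*sin(2*pi*180*t) + 0.1*randn(size(t));
x  = x - mean(x);

order = 20;                         % model order from the text
f = linspace(0, 500, 501)';         % frequency grid from 0 to 500 Hz

% Biased autocorrelation estimates r(0), ..., r(order).
N = numel(x);
r = zeros(order + 1, 1);
for k = 0:order
    r(k + 1) = sum(x(1+k:N) .* x(1:N-k)) / N;
end

% Yule-Walker equations for A(z) = 1 + a(1) z^-1 + ... + a(order) z^-order.
R = toeplitz(r(1:order));
a = -(R \ r(2:end));
noiseVar = r(1) + r(2:end)' * a;    % prediction error variance

% AR power spectral density (two-sided scaling) on the frequency grid.
A = 1 + exp(-1j*2*pi*f/fs * (1:order)) * a;
pxx = noiseVar ./ (fs * abs(A).^2);

[~, iMax] = max(pxx);
fprintf('Largest spectral peak near %.0f Hz\n', f(iMax));
```

Because the spectrum comes from a 20-coefficient model rather than from the raw samples, it is smooth and its peaks are easy to locate. If Signal Processing Toolbox is available, functions such as pyulear or pburg provide comparable estimates.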

Plotting the computed spectra on a linear scale clearly shows resonant peaks. Grouping by fault code highlights how the spectra change for different fault conditions.

Perform the same computations for the pressure signal, because the results provide additional features to help distinguish different fault conditions.
You can now compute spectral features such as peaks, modal coefficients, and band power.

Extract these features in a narrower frequency band, from 23 Hz to 250 Hz, because the peaks above 250 Hz are smaller. For each signal, extract five spectral peaks. For now, clear the Plot results check box. You will plot results later to see if the features help distinguish different fault conditions. Repeat this process for the pressure signal by changing the signal selected at the top of the dialog box.
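The band-limited peak extraction can be illustrated with a short sketch. It restricts a spectrum to the 23 to 250 Hz band, keeps the five largest local maxima, and integrates the spectrum to get a band power. The spectrum here is a synthetic placeholder with hypothetical peak locations and heights, and the app may define its spectral features differently.

```matlab
% Placeholder spectrum with known resonances (not pump data).
f = (0:500)';                                   % frequency grid in Hz
centers = [40 80 120 160 200 300];              % hypothetical peak locations
heights = [5 4 3 2 1 0.5];                      % hypothetical peak heights
pxx = 0.01 * ones(size(f));
for k = 1:numel(centers)
    pxx = pxx + heights(k) * exp(-((f - centers(k)) / 3).^2);
end

% Restrict to the band used in the text.
band  = f >= 23 & f <= 250;
fBand = f(band);
pBand = pxx(band);

% Local maxima, sorted by height; keep the five largest.
isPeak   = islocalmax(pBand);
peakFreq = fBand(isPeak);
peakAmp  = pBand(isPeak);
[peakAmp, sortIdx] = sort(peakAmp, 'descend');
numPeaks = min(5, numel(peakAmp));
peakAmp  = peakAmp(1:numPeaks);
peakFreq = peakFreq(sortIdx(1:numPeaks));

% Band power as the integral of the spectrum over the band.
bandPower = trapz(fBand, pBand);
disp(table(peakFreq, peakAmp))
```

The peak at 300 Hz lies outside the band, so it does not appear among the extracted features.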

View Features
All the features you have extracted are collected in a table shown in the Feature Tables browser. To view the computed feature data, select FeatureTable1 from the data browser and click Feature Table View in the plot gallery. The fault code is also displayed in the feature table view as the rightmost column in the table. As more features are computed, more columns are appended to the table.

You can see the distributions of the feature values for different condition variable values, in this case fault types, by viewing the feature table as a histogram. Select FeatureTable1, and then click Histogram in the plot gallery to create a set of histogram plots. Use the next and previous buttons to show histograms for different features. Histogram plots grouped by fault code can help to determine if certain features are strong differentiators between fault types. If they are strong differentiators, their distributions are farther apart from each other. For the triplex pump data, the feature distributions tend to overlap and there are no features that can clearly be used to identify faults. The next section looks at using automated ranking to find which features are more useful for fault prediction.

Rank and Export Features
From the Feature Designer tab, click Rank Features and select FeatureTable1. The app gathers all the feature data and ranks the features based on a metric such as ANOVA. The app lists the features in order of importance based on the metric value. In this case, the RMS value for the flow signal and the RMS and mean values for the pressure signal are the features that most strongly distinguish different fault types from each other.
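As a rough illustration of ANOVA-based ranking, the sketch below computes a one-way ANOVA F-statistic for each feature and sorts the features by that score. A larger F-statistic means the group means differ more relative to the spread within each group. The feature matrix, group labels, and feature names are hypothetical placeholders, and the app's exact scoring and normalization may differ.

```matlab
% Placeholder feature matrix: 3 features, 3 fault groups (not pump data).
rng(2);
group = repelem((1:3)', 20);                    % 60 members, 20 per group
X = randn(60, 3);
X(:,2) = X(:,2) + 3 * group;                    % feature 2 separates groups well
X(:,3) = X(:,3) + 0.5 * group;                  % feature 3 separates weakly
names = {'featA', 'featB', 'featC'};            % hypothetical feature names

% One-way ANOVA F-statistic per feature.
[~, ~, g] = unique(group);
k = max(g);                                     % number of groups
n = accumarray(g, 1);                           % members per group
N = numel(g);
F = zeros(1, size(X, 2));
for j = 1:size(X, 2)
    x = X(:, j);
    groupMean = accumarray(g, x) ./ n;
    ssBetween = sum(n .* (groupMean - mean(x)).^2);
    ssWithin  = sum((x - groupMean(g)).^2);
    F(j) = (ssBetween / (k - 1)) / (ssWithin / (N - k));
end

% Rank features from most to least discriminative.
[~, rankIdx] = sort(F, 'descend');
ranked = names(rankIdx)
```

In this placeholder data, featB is constructed to separate the groups strongly, so it ranks first.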

After you have ranked your features in terms of importance, the next step is to export them so that you can train a classification model based on these features. Click Export, select Export features to Classification Learner, and select the features you want to use for classification. In this case, export the top 15 features. The app then sends these features to Classification Learner where they can be used to design a classifier to identify different faults.

In the New Session from File dialog box that Classification Learner opens, confirm 5-fold cross-validation and start the session.
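For intuition about what 5-fold cross-validation does with 240 members, here is a minimal sketch of the partitioning step. It assigns members to folds at random without balancing fault codes across folds, which is a simplification and an assumption. Classification Learner may partition the data differently.

```matlab
% Assign 240 members to 5 folds at random, 48 members per fold.
rng(3);
numMembers = 240;
numFolds = 5;
foldId = mod(randperm(numMembers)', numFolds) + 1;   % fold labels 1..5

for k = 1:numFolds
    testIdx  = find(foldId == k);      % held-out members for this fold
    trainIdx = find(foldId ~= k);      % members used to fit the model
    fprintf('Fold %d: %d training, %d validation members\n', ...
        k, numel(trainIdx), numel(testIdx));
    % A classifier would be trained on trainIdx and scored on testIdx here.
end
```

Each member is held out exactly once, so the validation accuracy reported for a model is an average over predictions made on data the model did not see during training.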

Classification Learner displays a scatter plot for a single model.

In the Models section of the Classification Learner tab, select all model types for training.

In the Models list, select Multiple. Then, click Train All.

When the training is complete, Classification Learner lists each model in order of model number, along with the model validation accuracy, and displays a confusion matrix for the first model in the set. Change the Sort by order to Accuracy (Validation).

The SVM method has the highest classification accuracy of around 79%. There is some randomness in the process, so your results may be different. Select this model and click Confusion Matrix. The confusion matrix illustrates how well this method classifies members for each fault type. The entries on the diagonal represent the number of members whose fault type is correctly classified. Off-diagonal entries represent members for which the predicted and true classes are not the same. To improve accuracy, you can try increasing the number of features. Alternatively, you can iterate on the existing features, especially the spectral features, and perhaps modify the spectral computation method, change the bandwidth, or use different frequency peaks.
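The relationship between the confusion matrix and the accuracy figure can be shown with a small worked sketch. The true and predicted fault codes below are hypothetical values chosen for illustration, not results from the pump classifier.

```matlab
% Hypothetical true and predicted fault codes for 10 members.
trueCode = [0 0 0 1 1 1 10 10 100 100]';
predCode = [0 0 1 1 1 1 10 100 100 100]';

% Map codes to row and column indices and count each pairing.
classes = unique([trueCode; predCode]);
[~, rowIdx] = ismember(trueCode, classes);      % rows are true classes
[~, colIdx] = ismember(predCode, classes);      % columns are predicted classes
numClasses = numel(classes);
C = accumarray([rowIdx colIdx], 1, [numClasses numClasses]);

accuracy = trace(C) / sum(C(:));                % fraction on the diagonal
perClassRecall = diag(C) ./ sum(C, 2);          % correct fraction per true class
disp(C)
fprintf('Accuracy: %.2f\n', accuracy);          % prints "Accuracy: 0.80"
```

Eight of the ten members fall on the diagonal, which gives the accuracy of 0.80. The two off-diagonal counts show which pairs of fault codes the classifier confuses.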

Diagnose Triplex Pump Faults
This example showed how to use Diagnostic Feature Designer to analyze and select features and create a classifier to diagnose faults in a triplex reciprocating pump.
See Also
Diagnostic Feature Designer | Classification Learner