Statistics and Machine Learning Toolbox
Analyze and model data using statistics and machine learning
Statistics and Machine Learning Toolbox provides functions and apps to describe, analyze, and model data. You can use descriptive statistics, visualizations, and clustering for exploratory data analysis; fit probability distributions to data; generate random numbers for Monte Carlo simulations; and perform hypothesis tests. Regression and classification algorithms let you draw inferences from data and build predictive models either interactively, using the Classification Learner and Regression Learner apps, or programmatically, using AutoML.
For multidimensional data analysis and feature extraction, the toolbox provides principal component analysis (PCA), regularization, dimensionality reduction, and feature selection methods that let you identify variables with the best predictive power.
The toolbox provides supervised, semi-supervised, and unsupervised machine learning algorithms, including support vector machines (SVMs), boosted decision trees, shallow neural networks, k-means, and other clustering methods. You can apply interpretability techniques such as partial dependence plots, Shapley values, and LIME, and automatically generate C/C++ code for embedded deployment. Native Simulink blocks let you use predictive models with simulations and Model-Based Design. Many toolbox algorithms can be used on data sets that are too big to be stored in memory.
Explore data through statistical plots, interactive graphics, and descriptive statistics. Quickly understand and describe potentially large data sets using measures of central tendency, dispersion, shape, correlation, and covariance.
Identify patterns and features by applying k-means, hierarchical, DBSCAN, and other clustering methods that divide data into groups or clusters. Determine the optimal number of clusters for the data using different evaluation criteria. Detect anomalies to identify outliers and novelties.
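As a rough illustration of the clustering workflow described above, the following sketch scores several candidate cluster counts and then partitions the data with k-means. The synthetic three-blob data set, the silhouette criterion, and the candidate range 2 to 6 are assumptions chosen for the example, not recommendations.

```matlab
rng(1);                                   % fix the seed so the run is repeatable

% Synthetic data (an assumption for illustration): three 2-D Gaussian blobs
X = [randn(100,2);
     randn(100,2) + 6;
     randn(100,2) + [6 -6]];

% Score candidate cluster counts with the silhouette criterion
eva = evalclusters(X, 'kmeans', 'silhouette', 'KList', 2:6);
k = eva.OptimalK;                         % count with the best criterion value

% Partition the data using the selected number of clusters
[idx, C] = kmeans(X, k, 'Replicates', 5); % idx: cluster labels, C: centroids
```

Other criteria accepted by `evalclusters`, such as `'CalinskiHarabasz'` or `'gap'`, can be substituted to see whether they agree on the cluster count.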
Assign sample variance to different sources and determine whether the variation arises within or among different population groups. Use one-way, two-way, multiway, multivariate, and nonparametric ANOVA, as well as analysis of covariance (ANOCOVA) and repeated measures analysis of variance (RANOVA).
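A minimal one-way ANOVA sketch, assuming three hypothetical groups of synthetic measurements, might look like this. The group labels and means are invented for illustration.

```matlab
rng(3);                                   % repeatable synthetic data

% Synthetic measurements (assumed for illustration) from three groups;
% group C is shifted so that a difference in means exists
y = [5 + randn(20,1); 5 + randn(20,1); 7 + randn(20,1)];
g = [repmat({'A'},20,1); repmat({'B'},20,1); repmat({'C'},20,1)];

% One-way ANOVA; 'off' suppresses the table and box plot figures
[p, tbl, stats] = anova1(y, g, 'off');

% Pairwise comparisons of group means, again without opening a figure
c = multcompare(stats, 'Display', 'off');
```

A small `p` suggests that at least one group mean differs; the rows of `c` indicate which pairs of groups differ.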
Use the Regression Learner app, or work programmatically, to train and assess models such as linear regression, Gaussian processes, support vector machines, neural networks, and ensembles.
Use the Classification Learner app, or work programmatically, to train and validate models such as logistic regression, support vector machines, boosted trees, and shallow neural networks.
Extract features from images, signals, text, and numeric data. Iteratively explore and create new features and select the ones that optimize performance. Reduce dimensionality either by transforming existing features into new predictor variables and dropping the less descriptive ones after transformation, or by applying automated feature selection.
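For the programmatic route, a short sketch of training and cross-validating a classifier could look like the following. It assumes the `fisheriris` sample data set that ships with the toolbox and restricts it to two classes so that a binary SVM applies; the kernel and fold count are arbitrary choices for the example.

```matlab
load fisheriris                      % sample data: meas (150x4), species (150x1)

% Keep two of the three classes so a binary SVM applies
keep = ~strcmp(species, 'setosa');
X = meas(keep, :);
Y = species(keep);

rng(1);                              % repeatable cross-validation partition
mdl = fitcsvm(X, Y, 'KernelFunction', 'linear', 'Standardize', true);

cvmdl = crossval(mdl, 'KFold', 5);   % 5-fold cross-validated model
err = kfoldLoss(cvmdl);              % estimated misclassification rate
```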
Fit continuous and discrete distributions, use statistical plots to evaluate goodness-of-fit, and compute probability density functions and cumulative distribution functions for more than 40 different distributions.
Draw inferences about a population based on statistical evidence from a sample. Perform t-tests, distribution tests, and nonparametric tests for one, paired, or independent samples. Test for autocorrelation and randomness, and compare distributions.
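As a small illustration of testing independent samples, the sketch below compares two synthetic samples with a two-sample t-test and a two-sample Kolmogorov-Smirnov test. The sample sizes, means, and spread are assumptions made up for the example.

```matlab
rng(2);                          % repeatable synthetic samples

% Two independent samples (assumed for illustration) with different means
a = 10 + 2*randn(50,1);
b = 11 + 2*randn(50,1);

% Two-sample t-test of equal means at the default 5% significance level
[h, p, ci] = ttest2(a, b);       % h = 1 rejects the null hypothesis

% Two-sample Kolmogorov-Smirnov test comparing the two distributions
[hks, pks] = kstest2(a, b);
```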
Statistically analyze effects and data trends. Design experiments to create and test practical plans for how to manipulate data inputs to generate information about their effects on data outputs. Visualize and analyze time-to-failure data with and without censoring, and monitor and assess the quality of industrial processes.
Without changing your code, use tall arrays and tables with many classification, regression, and clustering algorithms to train models on data sets that do not fit in memory.
Generate portable and readable C/C++ code for inference of classification and regression models, descriptive statistics, and probability distributions. Generate C/C++ prediction code with reduced precision, and update parameters of deployed models without regenerating the prediction code.
Statistics and Machine Learning Toolbox provides functions and apps to describe, analyze, and model data using descriptive statistics, visualizations, clustering, probability distributions, hypothesis tests, and supervised, semi-supervised, and unsupervised machine learning algorithms.
You can train models interactively using the Classification Learner and Regression Learner apps, or programmatically using AutoML, with algorithms including linear regression, support vector machines, boosted decision trees, and shallow neural networks.
The toolbox provides linear regression, generalized linear models (logistic, Poisson, and more), nonlinear regression, mixed-effects models for grouped or hierarchical data, and Gaussian process regression. You can perform stepwise variable selection, regularization with lasso and ridge, and multinomial logistic regression for multiclass outcomes.
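To make two of these options concrete, the sketch below fits an ordinary linear model and a cross-validated lasso model to the same synthetic data. The data-generating coefficients and the five-fold setting are assumptions for illustration only.

```matlab
rng(4);                                   % repeatable synthetic data

% Synthetic predictors and response (assumed): only columns 1 and 3 matter
X = randn(200, 5);
y = 3*X(:,1) - 2*X(:,3) + 0.5*randn(200,1);

% Ordinary least-squares linear regression
mdl = fitlm(X, y);
r2 = mdl.Rsquared.Ordinary;               % coefficient of determination

% Lasso regularization with 5-fold cross-validation over a lambda path
[B, info] = lasso(X, y, 'CV', 5);
bSel = B(:, info.IndexMinMSE);            % coefficients at minimum CV error
```

Coefficients in `bSel` that are exactly zero correspond to predictors that the lasso penalty removed from the model.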
The toolbox provides k-means, hierarchical clustering, Gaussian mixture models, DBSCAN, spectral clustering, and other methods to identify patterns, determine optimal cluster numbers, and detect anomalies or outliers.
The toolbox includes over 25 built-in probability distributions (including normal, Weibull, gamma, beta, lognormal, and many more) that you can fit to data using fitdist or the interactive Distribution Fitter app. You can also create and fit custom distributions using maximum likelihood estimation, compare fits using information criteria, and work with censored data for survival and reliability analyses.
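A brief sketch of fitting and comparing distributions with `fitdist` follows. The synthetic Weibull sample and the choice of AIC as the information criterion are assumptions for the example; AIC is computed by hand here from the negative log-likelihood.

```matlab
rng(0);                              % repeatable synthetic sample

% Synthetic sample (assumed): Weibull with scale 2 and shape 1.5
x = wblrnd(2, 1.5, 500, 1);

% Fit two candidate distributions by maximum likelihood
pdW = fitdist(x, 'Weibull');
pdL = fitdist(x, 'Lognormal');

% Akaike information criterion: 2*k + 2*NLL, with k = 2 parameters each
aicW = 2*2 + 2*negloglik(pdW);
aicL = 2*2 + 2*negloglik(pdL);

% Evaluate the fitted Weibull density on a grid, for example for plotting
xg = linspace(0, max(x), 200);
fW = pdf(pdW, xg);
```

The candidate with the lower AIC value is the better-supported fit under this criterion.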
You can generate portable C/C++ code for embedded deployment using MATLAB Coder, use native Simulink blocks for simulations and Model-Based Design, or compile MATLAB code with MATLAB Compiler for IT environments.
Statistics and Machine Learning Toolbox covers the workflow from exploratory analysis through production deployment in a single tested and documented product. Key differentiators include interactive guided apps for beginners, C/C++ code generation for embedded deployment via MATLAB Coder, and direct Simulink integration for Model-Based Design. Open-source alternatives like Python and R offer free, community-supported options, but typically require assembling multiple packages and do not provide equivalent paths for embedded code generation or simulation integration.
You can perform t-tests, distribution tests, nonparametric tests, ANOVA, analysis of covariance, repeated measures ANOVA, and tests for autocorrelation, randomness, and distribution comparisons.
Yes. You can use interactive apps such as Distribution Fitter to fit probability distributions to your data, and Classification Learner and Regression Learner to train, compare, and validate machine learning models, all without writing code.