Logistic regression in MATLAB (without Statistics and Machine Learning Toolbox)
68 views (last 30 days)
Alexandre Englert
on 20 Aug 2025 at 18:19
Commented: Alexandre Englert
on 26 Aug 2025 at 13:23
Does anyone know of a way to perform logistic regression (similar to the "LogisticRegression" model in Python / scikit-learn) in MATLAB, but without the Statistics and Machine Learning Toolbox? I have the basic MATLAB software (R2020b) without that toolbox, and I don't want to migrate to the Python environment or buy this specific toolbox.
Many thanks.
3 Comments
dpb
on 20 Aug 2025 at 21:54
There are at least a couple of well-rated functions at the File Exchange you could look at...
Accepted Answer
William Rose
on 21 Aug 2025 at 22:07
Edited: William Rose
on 21 Aug 2025 at 22:15
[edit: The equation I entered with LaTeX, in my answer below, displays as a fuzzy gray rectangle on my computer after I submit the answer. I hope it looks better to you. In case it doesn't, here is the equation in non-LaTeX format:
probability=1/(1+exp(-(x-mu)/s))
]
Here is code to do logistic regression without using any toolboxes. What it does:
- Generate simulated data: 100 observed values of 0 or 1, based on a logistic function of x, with known, specified parameters mu and s (location and scale, respectively).
- Estimate the parameters of the logistic function from the observations.
- Display the estimated parameter values in the console window.
- Plot the observations, the logistic function with the known parameters, and the logistic function with the parameters estimated from the data.
The code defines a function which returns the negative log likelihood of the observations, for given values of mu and s (location and scale), using the logistic probability equation above.
The code uses fminsearch() (which is part of basic MATLAB, not in a toolbox) to find the values of mu and s which give the best fit, i.e. which minimize the negative log likelihood of the observed data.
This script will give slightly different results each time you run it, since it uses rand() to generate the observations (see the note below if you want repeatable results).
See detailed comments in the code.
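If you want the script to produce identical results on every run, you could seed the random number generator at the top of the script; rng() is part of basic MATLAB, no toolbox needed. For example:
rng(0) % fix the random seed so rand() returns the same sequence each run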
% Generate simulated data for regression
mu=-1; % location = x-value where probability=0.5
s=2; % scale = transition width
x=sort(20*rand(1,100)-10); % 100 random numbers between -10 and +10
pr=1./(1+exp(-(x-mu)/s)); % probability of observing 1, at each x value
y=(rand(1,100)<=pr); % observed y-values (0 or 1)
% Estimate the logistic probability parameters from the observed data
params0=[0,1]; % initial guess for [mu,s]
params=fminsearch(@(params)negloglike(params,x,y),params0); % best-fit parameters
muEst=params(1); sEst=params(2);
fprintf('Estimated mu=%.3f, estimated s=%.3f.\n',muEst,sEst) % display results on console
% Compute estimated probability
prEst=1./(1+exp(-(x-muEst)/sEst)); % estimated probability
% Plot the observations, the probability function used to generate the observations,
% and the probability function estimated from the observations
figure
plot(x,y,'ko','MarkerFaceColor','k') % observations
hold on;
plot(x,pr,'-b','LineWidth',2) % probability function used to generate the observations
xlabel('X'); grid on
plot(x,prEst,'-r','LineWidth',2); % probability function estimated from the observations
title('Logistic Regression: Observations and Probabilities')
legend('Observation','True Probability','Estimated Probability', 'Location','southeast')
function negLL=negloglike(params,x,y)
% NEGLOGLIKE Negative log likelihood of observations, using logistic regression
% Inputs
% params=[mu,s] where
% mu = x-value where probability=0.5
% s = scale = transition width in the x-direction
% x = x-values of observations (vector)
% y = observations, true(1) or false(0) (vector)
% Output
% negLL=-log(probability of observed observations, given mu and s)
% =-sum(log(prob. of each observation))
% =-sum(y*log(p)+(1-y)*log(1-p))
% where p = probability from the logistic function
mu=params(1); s=params(2);
p=1./(1+exp(-(x-mu)/s)); % probability of observing 1
negLL=-sum(y.*log(p)+(1-y).*log(1-p));
end
5 Comments
William Rose
on 26 Aug 2025 at 1:32
Edited: William Rose
on 26 Aug 2025 at 1:35
@Alexandre Englert, I made a small adjustment. Now the script for logistic regression with three predictors works well. I tried four runs with N=50 samples and four runs with N=200 samples. The script converged to reasonable estimates every time. An example of the output is below. The script is attached.
>> logisticRegressionExample3
200 samples: 80 zeros, 120 ones.
Estimated a= 2.293, estimated s= 1.696, 3.479,-2.605.
True a= 2.167, true s= 2.000, 4.000,-3.000.
>>
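The attached script is not reproduced here, but a minimal sketch of a three-predictor version along the same lines (using fminsearch as in the answer above) might look like the following. The model form p = 1/(1+exp(-(a + X*s))), the simulated predictor ranges, and the variable names are assumptions for illustration, not necessarily what the attached file does:
% Sketch: logistic regression with three predictors via fminsearch (no toolbox)
% Assumed model: p = 1./(1+exp(-(a + X*s))), scalar intercept a, 3x1 coefficient vector s
N = 200;
X = 4*rand(N,3) - 2;                        % N samples of 3 predictors, uniform on [-2,2]
aTrue = 2; sTrue = [2; 4; -3];              % example "true" parameters used to simulate the data
pTrue = 1./(1+exp(-(aTrue + X*sTrue)));     % probability of observing 1 at each sample
y = double(rand(N,1) <= pTrue);             % simulated 0/1 observations
fprintf('%d samples: %d zeros, %d ones.\n', N, sum(y==0), sum(y==1));
prob = @(q) 1./(1+exp(-(q(1) + X*q(2:4)))); % probability for parameter vector q = [a; s]
negLL = @(q) -sum(y.*log(prob(q)) + (1-y).*log(1-prob(q))); % negative log likelihood
qEst = fminsearch(negLL, zeros(4,1));       % estimated [a; s1; s2; s3]
fprintf('Estimated a=%6.3f, estimated s=%6.3f,%6.3f,%6.3f.\n', qEst);
fprintf('True a=%6.3f, true s=%6.3f,%6.3f,%6.3f.\n', aTrue, sTrue);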

More Answers (1)
the cyclist
on 20 Aug 2025 at 21:46
Edited: the cyclist
on 20 Aug 2025 at 21:47
Logistic regression is an algorithm that can definitely be programmed in MATLAB, which is a general-purpose programming language. It is a numerical optimization problem.
I was not willing to put in the work to solve that for you, but ChatGPT was. Here is what it came up with, along with a comparison to the output of fitglm (from the Statistics and Machine Learning Toolbox), which is what I would typically use to do a logistic regression.
Without extensive testing, I cannot vouch that this code is doing what I would expect. Frankly, I have not even checked to see if there is a stupendously easier way to do this.
% Logistic regression from scratch (Newton-Raphson / IRLS), no toolboxes required
% - Small default ridge for stability (lambda = 1e-6)
% - Base MATLAB implementation + optional fitglm comparison
rng default
% Generate a simple synthetic binary classification dataset (2 informative features)
n = 400;
X1 = [randn(n/2,1) + 1.0; randn(n/2,1) - 1.0];
X2 = [randn(n/2,1) - 0.5; randn(n/2,1) + 0.5];
X = [ones(n,1), X1, X2]; % add intercept only (no extra constant feature)
% True parameters (with intercept)
beta_true = [-0.25; 2.0; -1.5];
p = sigmoid(X*beta_true);
y = double(rand(n,1) < p);
% Fit with our scratch implementation (no toolboxes)
opts = struct('maxIter', 100, 'tol', 1e-8, 'lambda', 1e-6); % tiny ridge for numerical safety
[beta_hat, stats] = logreg_newton(X, y, opts);
% Report results
fprintf('=== Base MATLAB Logistic Regression (Newton/IRLS) ===\n');
disp(table((0:size(X,2)-1)', beta_hat, 'VariableNames', {'CoeffIndex','Estimate'}));
fprintf('Converged: %d in %d iters, final |step| = %.3e, logLik = %.6f\n', ...
stats.converged, stats.iters, stats.lastStepNorm, stats.logLik);
% Train/test split and accuracy
idx = randperm(n);
train = idx(1:round(0.7*n));
test = idx(round(0.7*n)+1:end);
phat_train = sigmoid(X(train,:)*beta_hat);
phat_test = sigmoid(X(test,:)*beta_hat);
yhat_train = phat_train >= 0.5;
yhat_test = phat_test >= 0.5;
acc_train = mean(yhat_train == y(train));
acc_test = mean(yhat_test == y(test));
fprintf('Train acc: %.2f%% | Test acc: %.2f%%\n', 100*acc_train, 100*acc_test);
% Optional: compare to fitglm
hasStatsTBX = ~isempty(ver('stats'));
if hasStatsTBX
    Xglm = X(:,2:end); % drop explicit intercept for fitglm
    mdl = fitglm(Xglm, y, 'Distribution', 'binomial', 'Link', 'logit', 'Intercept', true);
    beta_glm = mdl.Coefficients.Estimate;
    fprintf('\n=== fitglm Comparison ===\n');
    disp(table((0:size(X,2)-1)', beta_hat, beta_glm, beta_hat - beta_glm, ...
        'VariableNames', {'CoeffIndex','BaseMATLAB','fitglm','Diff'}));
    phat_test_glm = predict(mdl, Xglm(test,:));
    yhat_test_glm = phat_test_glm >= 0.5;
    acc_test_glm = mean(yhat_test_glm == y(test));
    fprintf('Test acc (base): %.2f%% | Test acc (fitglm): %.2f%%\n', 100*acc_test, 100*acc_test_glm);
else
    fprintf('\n[Note] Statistics and Machine Learning Toolbox not detected. Skipping fitglm comparison.\n');
end
% ------- Local functions (base MATLAB only) -------
function [beta, stats] = logreg_newton(X, y, opts)
%LOGREG_NEWTON Logistic regression via Newton-Raphson (IRLS).
if nargin < 3, opts = struct; end
if ~isfield(opts, 'maxIter'), opts.maxIter = 100; end
if ~isfield(opts, 'tol'), opts.tol = 1e-8; end
if ~isfield(opts, 'lambda'), opts.lambda = 1e-6; end % tiny ridge
[n,p] = size(X);
beta = zeros(p,1);
lambda = opts.lambda;
R = zeros(p); % L2 penalty (no intercept penalty)
if lambda > 0
    R(2:end,2:end) = lambda*eye(p-1);
end
for k = 1:opts.maxIter
    eta = X*beta;
    p1 = sigmoid(eta);
    W = p1 .* (1 - p1);
    W = max(W, 1e-12); % avoid zeros
    g = X'*(y - p1) - R*beta;
    % Build H = X' * diag(W) * X + R without forming diag(W)
    H = X'*(bsxfun(@times, X, W)) + R;
    % Extra diagonal jitter for numerical stability in tough cases
    % (helps if features are nearly collinear)
    H = H + 1e-12*eye(p);
    % Solve Newton step
    step = H \ g;
    % Backtracking line search on penalized log-likelihood
    t = 1.0;
    pll_prev = loglik(y, eta) - 0.5*lambda*sum(beta(2:end).^2);
    while t > 1e-8
        beta_try = beta + t*step;
        eta_try = X*beta_try;
        pll_try = loglik(y, eta_try) - 0.5*lambda*sum(beta_try(2:end).^2);
        if pll_try >= pll_prev
            beta = beta_try;
            pll_prev = pll_try;
            break;
        end
        t = t/2;
    end
    if norm(step) < opts.tol
        stats.converged = true;
        stats.iters = k;
        stats.lastStepNorm = norm(step);
        stats.logLik = pll_prev;
        return;
    end
end
stats.converged = false;
stats.iters = opts.maxIter;
stats.lastStepNorm = norm(step);
stats.logLik = pll_prev;
end
function y = sigmoid(z)
%SIGMOID Numerically stable logistic sigmoid
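% Splitting by the sign of z avoids overflow: for z >= 0, exp(-z) <= 1, and for
% z < 0 the algebraically equivalent form exp(z)./(1+exp(z)) keeps exp() bounded.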
y = zeros(size(z));
pos = z >= 0;
neg = ~pos;
y(pos) = 1 ./ (1 + exp(-z(pos)));
ez = exp(z(neg));
y(neg) = ez ./ (1 + ez);
end
function ll = loglik(y, eta)
%LOGLIK Bernoulli log-likelihood
% Compute sum(y.*eta - log(1+exp(eta))) stably
ll = 0;
for i = 1:numel(eta)
    t = eta(i);
    if t > 0
        ll = ll + y(i)*t - (t + log1p(exp(-t)));
    else
        ll = ll + y(i)*t - log1p(exp(t));
    end
end
end
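For reference, the penalized Newton/IRLS step that logreg_newton takes at each iteration (before the backtracking line search scales it by t) can be written as
$$\beta_{\text{new}} = \beta + t\,(X^{\top} W X + R)^{-1}\bigl(X^{\top}(y - p) - R\beta\bigr), \qquad W = \operatorname{diag}\bigl(p_i(1-p_i)\bigr), \quad p = \sigma(X\beta),$$
where R is the small ridge penalty (zero for the intercept) and sigma is the logistic sigmoid.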
2 Comments
Torsten
on 21 Aug 2025 at 13:20
Edited: Torsten
on 21 Aug 2025 at 14:29
I fully support your efforts to program your own Logistic Regression code.
But from an economic point of view, the time you will have to spend understanding the theory and coding the software reliably will not pay off compared to the price of the toolbox.