Logistic regression in MATLAB (without Statistics and Machine Learning Toolbox)

Anyone knows if there is a way to perform logistic regression (similar to the "LogisticRegression" model utilized in Python / scikit-learn) in MATLAB but without the "Statistics and Machine Learning Toolbox"? I have the basic MATLAB software (R2020b) without such toolbox and I didn´t want to migrate to the Python environment as well as to buy this specific toolbox.
Many thanks.

3 Comments

You can use the Deep Learning Toolbox, but I guess you don't want to rely on that either?
There are at least a couple of well-rated functions at the File Exchange you could look at...
Matt: Yep, I just want to use plain MATLAB (no toolboxes).
dpb: I tried using the "Logistic Regression for Classification" package which I found on "File Exchange", but already trying the "demo.m" file gave a message saying I needed the "Statistics and Machine Learning Toolbox"...
Any other suggestion of functions or packages there?
Many thanks to both.

Sign in to comment.

 Accepted Answer

[edit: The equation I entered with LaTeX, in my answer below, is displaying as a fuzzy gray rectangle on my computer, after I submit the anwer. I hope it looks better to you. In case it doesn't look better for you, here is the equation in non-LaTeX format:
probability=1/(1+exp(-(x-mu)/s))
]
Here is code to do logistic regression without using any toolboxes. What it does:
  • Generate simulated data: 100 observed values of 0 or 1, based on a logistic function of x, with known, specified parameters mu and s (location and scale, respectively).
  • Estimate the parameters of the logistic function from the observations.
  • Display the estimated parameter values in the console window.
  • Plot the observations, and the logistic function with the known parameters, and the logistic function with the parameters estimated from the data.
The code defines a function which returns the negative log likelihood of the observations, for given values of mu and s (location and scale), using the logistic probability equation above.
The code uses fminsearch() (which is part of basic Matlab, not in a toolbox) to find the values of mu and s which give the best fit, i.e. which minimize the negative log likelihood of the observed data.
This script will give slightly different results each time you run it, since it uses rand() to generate the observations.
See detailed comments in the code.
% Generate simulated data for regression
mu=-1; % location = x-value where probability=0.5
s=2; % scale = transition width
x=sort(20*rand(1,100)-10); % 100 random numbers between -10 and +10
pr=1./(1+exp(-(x-mu)/s)); % probability of observing 1, at each x value
y=(rand(1,100)<=pr); % observed y-values (0 or 1)
% Estimate the logistic probability parameters from the observed data
params0=[0,1]; % initial guess for [mu,s]
params=fminsearch(@(params)negloglike(params,x,y),params0); % best-fit parameters
muEst=params(1); sEst=params(2);
fprintf('Estimated mu=%.3f, estimated s=%.3f.\n',muEst,sEst) % display results on console
Estimated mu=-1.213, estimated s=1.948.
% Compute estimated probability
prEst=1./(1+exp(-(x-muEst)/sEst)); % estimated probability
% Plot the observations, the probability function used to generate the observations,
% and the probability function estimated from the observations
figure
plot(x,y,'ko','MarkerFaceColor','k') % observations
hold on;
plot(x,pr,'-b','LineWidth',2) % probability function used to generate the observations
xlabel('X'); grid on
plot(x,prEst,'-r','LineWidth',2); % probability function esitmated from the observations
title('Logistic Regression: Observations and Probabilities')
legend('Observation','True Probability','Estimated Probability', 'Location','southeast')
function negLL=negloglike(params,x,y)
% NEGLOGLIKE Negative log likelihood of observations, using logistic regression
% Inputs
% params=[mu,s] where
% mu = x-value where probability=0.5
% s = scale = transition width in the x-direction
% x = x-values of observations (vector)
% y = observations, true(1) or false(0) (vector)
% Output
% negLL=-log(probability of observed observations, given mu and s)
% =-sum(log(prob. of each observation))
% =-sum(y*log(p)+(1-y)*log(1-p))
% where p=probility from logistic probability density
mu=params(1); s=params(2);
p=1./(1+exp(-(x-mu)/s)); % probability of observing 1
negLL=-sum(y.*log(p)+(1-y).*log(1-p));
end

5 Comments

Thank you very much. This code (script + function) did the job! :-)
p.s. Any chance this could be expanded to 3 input variables (i.e. 3 features) and 1 logic (binary) response variable?
Many thanks!
@Alexandre Englert, you're welcome.
"Any chance this could be expanded..."
Yes. Stay tuned.
My script to do logistic regression with three predictor variables is not working well yet. It usually fails to converge to a solution. This happens because function negloglike() is returning NaN or -Inf in many cases. This NaN or -Inf seems to be associated with the term sum((1-y).*log(1-p)), where y and p are vectors, and p is the calculated probability. Therefore I suspect what is happening is that calculated p=1. In theory, p should never =1. But in practice, if a+b1x1+b2x2+b3x3 is >>0, then p=1/(1+exp(-(a+b1x1+b2x2+b3x3))) will equal one, and log(1-p) will equal -Inf. I am working on how to get around that.
Since three predictors is not working at the moment, I made a script to do logistic regression with two predictors. The script seems to work well. I ran it four times with N=50 samples per run, and it worked, without error, every time. By "worked", I mean it converged to a reasonable estimate. I also tried it with N=200 samples, four times, and it worked every time. Every run is different in the details, because it generates a new set of random data each time. The script for two predictors is attached. An example of its output, with N=200, is shown below. In this example, the probability goes to zero as x2 get bigger. The example shows that, if a predictor works this way, the algorithm does what we want, without any changes to the code: it figures out that the width, s, is negative.
>> logisticRegressionExample2
200 samples: 128 zeros, 72 ones.
Estimated a=-2.459, estimated s= 1.692,-2.623.
True a=-1.833, true s= 2.000,-3.000.
>>
As you probably know, you can not estimate the midpoint values (mu1, mu2) for each separate variable, when you do logistic regression with multiple predictors. In the case of two predictors, the probability of getting a 1 for each sample is given by
where
and and .
I defined values for true mu1 and mu2 in my script, to make the simulated data, and I used the true mu1, mu2 values to make the blue curves in the plot above. But it is impossible to estimate mu1 and mu2 independently from the experimental data. The reason is that there are infinitely many combinations of mu1, mu2 which will give exactly the same predicted probabilities, for every single sample.
The same thing happens in linear multiple regression: Suppose y1=a1+b1*x1 and y2=a2+b2*x2, and y=y1+y2. Then the equation for predicting y is y=a+b1*x1+b2*x2, where a=a1+a2 . You can see that, by making offsetting changes in a1 and a2, we will get the exact same predictions, for every combination of x1 and x2. So in this multiple linear regresion problem, we can estimate a1+a2, but we can't get the separate values for a1 and a2.
The same thing happens with logistic regression, but the equations are messier.
@Alexandre Englert, I made a small adjustment. Now the script for logistic regresison with three predictors works well. I tried four runs with N=50 samples and four runs with N=200 samples. The script converged to reasonable estimates every time. An example of the output is below. The script is attached.
>> logisticRegressionExample3
200 samples: 80 zeros, 120 ones.
Estimated a= 2.293, estimated s= 1.696, 3.479,-2.605.
True a= 2.167, true s= 2.000, 4.000,-3.000.
>>

Sign in to comment.

More Answers (1)

Logistic regression is an algorithm that can definitely be programmed in MATLAB, which is a general-purpose programming language. It is a numerical optimization problem.
I was not willing to put in the work to solve that for you, but ChatGPT was. Here is what it came up with, along with a comparison to the output of fitglm (from the Statistics and Machine Learning Toolbox), which is what I would typically use to do a logistic regression.
Without extensive testing, I cannot vouch that this code is doing what I would expect. Frankly, I have not even checked to see if there is a stupendously easier way to do this.
% - Small default ridge for stability (lambda = 1e-6)
% - Base MATLAB implementation + optional fitglm comparison
rng default
% Generate a simple synthetic binary classification dataset (2 informative features)
n = 400;
X1 = [randn(n/2,1) + 1.0; randn(n/2,1) - 1.0];
X2 = [randn(n/2,1) - 0.5; randn(n/2,1) + 0.5];
X = [ones(n,1), X1, X2]; % add intercept only (no extra constant feature)
% True parameters (with intercept)
beta_true = [-0.25; 2.0; -1.5];
p = sigmoid(X*beta_true);
y = double(rand(n,1) < p);
% Fit with our scratch implementation (no toolboxes)
opts = struct('maxIter', 100, 'tol', 1e-8, 'lambda', 1e-6); % tiny ridge for numerical safety
[beta_hat, stats] = logreg_newton(X, y, opts);
% Report results
fprintf('=== Base MATLAB Logistic Regression (Newton/IRLS) ===\n');
=== Base MATLAB Logistic Regression (Newton/IRLS) ===
disp(table((0:size(X,2)-1)', beta_hat, 'VariableNames', {'CoeffIndex','Estimate'}));
CoeffIndex Estimate __________ ________ 0 -0.26037 1 2.2738 2 -1.6743
fprintf('Converged: %d in %d iters, final |step| = %.3e, logLik = %.6f\n', ...
stats.converged, stats.iters, stats.lastStepNorm, stats.logLik);
Converged: 1 in 8 iters, final |step| = 3.509e-11, logLik = -106.924048
% Train/test split and accuracy
idx = randperm(n);
train = idx(1:round(0.7*n));
test = idx(round(0.7*n)+1:end);
phat_train = sigmoid(X(train,:)*beta_hat);
phat_test = sigmoid(X(test,:)*beta_hat);
yhat_train = phat_train >= 0.5;
yhat_test = phat_test >= 0.5;
acc_train = mean(yhat_train == y(train));
acc_test = mean(yhat_test == y(test));
fprintf('Train acc: %.2f%% | Test acc: %.2f%%\n', 100*acc_train, 100*acc_test);
Train acc: 89.29% | Test acc: 88.33%
% Optional: compare to fitglm
hasStatsTBX = ~isempty(ver('stats'));
if hasStatsTBX
Xglm = X(:,2:end); % drop explicit intercept for fitglm
mdl = fitglm(Xglm, y, 'Distribution', 'binomial', 'Link', 'logit', 'Intercept', true);
beta_glm = mdl.Coefficients.Estimate;
fprintf('\n=== fitglm Comparison ===\n');
disp(table((0:size(X,2)-1)', beta_hat, beta_glm, beta_hat - beta_glm, ...
'VariableNames', {'CoeffIndex','BaseMATLAB','fitglm','Diff'}));
phat_test_glm = predict(mdl, Xglm(test,:));
yhat_test_glm = phat_test_glm >= 0.5;
acc_test_glm = mean(yhat_test_glm == y(test));
fprintf('Test acc (base): %.2f%% | Test acc (fitglm): %.2f%%\n', 100*acc_test, 100*acc_test_glm);
else
fprintf('\n[Note] Statistics and Machine Learning Toolbox not detected. Skipping fitglm comparison.\n');
end
=== fitglm Comparison ===
CoeffIndex BaseMATLAB fitglm Diff __________ __________ ________ ___________ 0 -0.26037 -0.26037 2.1952e-08 1 2.2738 2.2738 -2.0554e-07 2 -1.6743 -1.6743 1.7967e-07
Test acc (base): 88.33% | Test acc (fitglm): 88.33%
% ------- Local functions (base MATLAB only) -------
function [beta, stats] = logreg_newton(X, y, opts)
%LOGREG_NEWTON Logistic regression via Newton-Raphson (IRLS).
if nargin < 3, opts = struct; end
if ~isfield(opts, 'maxIter'), opts.maxIter = 100; end
if ~isfield(opts, 'tol'), opts.tol = 1e-8; end
if ~isfield(opts, 'lambda'), opts.lambda = 1e-6; end % tiny ridge
[n,p] = size(X);
beta = zeros(p,1);
lambda = opts.lambda;
R = zeros(p); % L2 penalty (no intercept penalty)
if lambda > 0
R(2:end,2:end) = lambda*eye(p-1);
end
for k = 1:opts.maxIter
eta = X*beta;
p1 = sigmoid(eta);
W = p1 .* (1 - p1);
W = max(W, 1e-12); % avoid zeros
g = X'*(y - p1) - R*beta;
% Build H = X' * diag(W) * X + R without forming diag(W)
H = X'*(bsxfun(@times, X, W)) + R;
% Extra diagonal jitter for numerical stability in tough cases
% (helps if features are nearly collinear)
H = H + 1e-12*eye(p);
% Solve Newton step
step = H \ g;
% Backtracking line search on penalized log-likelihood
t = 1.0;
pll_prev = loglik(y, eta) - 0.5*lambda*sum(beta(2:end).^2);
while t > 1e-8
beta_try = beta + t*step;
eta_try = X*beta_try;
pll_try = loglik(y, eta_try) - 0.5*lambda*sum(beta_try(2:end).^2);
if pll_try >= pll_prev
beta = beta_try;
pll_prev = pll_try;
break;
end
t = t/2;
end
if norm(step) < opts.tol
stats.converged = true;
stats.iters = k;
stats.lastStepNorm = norm(step);
stats.logLik = pll_prev;
return;
end
end
stats.converged = false;
stats.iters = opts.maxIter;
stats.lastStepNorm = norm(step);
stats.logLik = pll_prev;
end
function y = sigmoid(z)
%SIGMOID Numerically stable logistic sigmoid
y = zeros(size(z));
pos = z >= 0;
neg = ~pos;
y(pos) = 1 ./ (1 + exp(-z(pos)));
ez = exp(z(neg));
y(neg) = ez ./ (1 + ez);
end
function ll = loglik(y, eta)
%LOGLIK Bernoulli log-likelihood
% Compute sum(y.*eta - log(1+exp(eta))) stably
ll = 0;
for i = 1:numel(eta)
t = eta(i);
if t > 0
ll = ll + y(i)*t - (t + log1p(exp(-t)));
else
ll = ll + y(i)*t - log1p(exp(t));
end
end
end

2 Comments

Thank you for the answer. However, I didn´t want to rely on chatGPT... Are there any good books on MATLAB and supervised machine learning which you could suggest?
I fully support your efforts to program your own Logistic Regression code.
But from the economic point of view, the time you will have to spend understanding the theory and reliably coding the software will not pay compared to the price of the toolbox.

Sign in to comment.

Products

Release

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!