Multiple Linear Regression Trouble

I'm using regression techniques to attempt spectra unfolding but the result matlab is giving is wrong. I start with several response functions that have been modeled for mono-energetic values like in IgorPlot.png. Each of these responses is then multiplied by a unique coefficient "a(n)" and the summation is taken. Using regression techniques, the summation is then fitted to an example function like in FitTest.png to produce the coefficient matrix "a" which should indicate the which modeled response functions are most prevalent in the example. This technique has been proven to work in similar applications, but I'm running into a problem where, as can be seen in FitTest, matlab favors the last response function above every other function used.
I've tried rewriting the code to include both fewer and more responses and in both cases, regardless of the number of responses used, Matlab will consistently favor there last response listed in the summation equation. Am I going about this completely wrong, or is there something I'm missing here?

6 Comments

Image that didn't attach.
Don't see how you expect us to tell anything w/o even a hint of the code you're trying...
I've modified a basic example from one of my textbooks to handle y as a function of 13 "x" variables. The matrix and equation setup I used are shown below, unfortunately the data I'm using to do this cannot be posted here. All I want to know is why matlab insists on telling me that the last value it finds in Coeff is always the largest, when I know for a fact it shouldn't be.
X_matrix = [n,sum(A0_5),sum(A1_0),sum(A1_5),sum(A2_0),sum(A2_5),sum(A3_0),sum(A3_5),sum(A4_0),sum(A4_5),sum(A5_0),sum(A5_5),sum(A6_0);
sum(A0_5),sum(A05A05),sum(A05A10),sum(A05A15),sum(A05A20),sum(A05A25),sum(A05A30),sum(A05A35),sum(A05A40),sum(A05A45),sum(A05A50),sum(A05A55),sum(A05A60);
sum(A1_0),sum(A05A10),sum(A10A10),sum(A10A15),sum(A10A20),sum(A10A25),sum(A10A30),sum(A10A35),sum(A10A40),sum(A10A45),sum(A10A50),sum(A10A55),sum(A10A60);
sum(A1_5),sum(A05A15),sum(A10A15),sum(A15A15),sum(A15A20),sum(A15A25),sum(A15A30),sum(A15A35),sum(A15A40),sum(A15A45),sum(A15A50),sum(A15A55),sum(A15A60);
sum(A2_0),sum(A05A20),sum(A10A20),sum(A15A20),sum(A20A20),sum(A20A25),sum(A20A30),sum(A20A35),sum(A20A40),sum(A20A45),sum(A20A50),sum(A20A55),sum(A20A60);
sum(A2_5),sum(A05A25),sum(A10A25),sum(A15A25),sum(A20A25),sum(A25A25),sum(A25A30),sum(A25A35),sum(A25A40),sum(A25A45),sum(A25A50),sum(A25A55),sum(A25A60);
sum(A3_0),sum(A05A30),sum(A10A30),sum(A15A30),sum(A20A30),sum(A25A30),sum(A30A30),sum(A30A35),sum(A30A40),sum(A30A45),sum(A30A50),sum(A30A55),sum(A30A60);
sum(A3_5),sum(A05A35),sum(A10A35),sum(A15A35),sum(A20A35),sum(A25A35),sum(A30A35),sum(A35A35),sum(A35A40),sum(A35A45),sum(A35A50),sum(A35A55),sum(A35A60);
sum(A4_0),sum(A05A40),sum(A10A40),sum(A15A40),sum(A20A40),sum(A25A40),sum(A30A40),sum(A35A40),sum(A40A40),sum(A40A45),sum(A40A50),sum(A40A55),sum(A40A60);
sum(A4_5),sum(A05A45),sum(A10A45),sum(A15A45),sum(A20A45),sum(A25A45),sum(A30A45),sum(A35A45),sum(A40A45),sum(A45A45),sum(A45A50),sum(A45A55),sum(A45A60);
sum(A5_0),sum(A05A50),sum(A10A50),sum(A15A50),sum(A20A50),sum(A25A50),sum(A30A50),sum(A35A50),sum(A40A50),sum(A45A50),sum(A50A50),sum(A50A55),sum(A50A60);
sum(A5_5),sum(A05A55),sum(A10A55),sum(A15A55),sum(A20A55),sum(A25A55),sum(A30A55),sum(A35A55),sum(A40A55),sum(A45A55),sum(A50A55),sum(A55A55),sum(A55A60);
sum(A6_0),sum(A60A60),sum(A10A60),sum(A15A60),sum(A20A60),sum(A25A60),sum(A30A60),sum(A35A60),sum(A40A60),sum(A45A60),sum(A50A60),sum(A55A60),sum(A60A60)];
Y1_matrix = [sum(y1);sum(A05y1);sum(A10y1);sum(A15y1);sum(A20y1);sum(A25y1);sum(A30y1);sum(A35y1);sum(A40y1);sum(A45y1);sum(A50y1);sum(A55y1);sum(A60y1)];
Coeff = X_matrix\Y1_matrix;
dpb
dpb on 7 Jul 2014
Edited: dpb on 8 Jul 2014
Normally, one would write the design matrix X and let Matlab compute X'X instead of building it explicitly. That, however, wouldn't be the cause of a problem.
I'd say the location and magnitude of the coefficients is entirely dependent upon the data and the model chosen. You can prove this by rearranging the order of terms in X and recomputing.
Or, build the model with some of the regression tools in either Statistics or Curve Fitting toolboxes.
Specifics on the actual problem are impossible to divine if you can't share the data or provide a representative pseudo-set that duplicates the issue.
...Image that didn't attach.
How in the world did you generate the wiggles in the "LSR fitted response"??? That would take a heckuva' lot more than 13 terms it would seem...
The "wiggles" as you call them are actually a result of something called the Bragg peak from nuclear physics and are present in each of the 13 base responses used. Although the angle may not be ideal in the IgorPlot file, you can still see these peaks to an extent.

Sign in to comment.

Answers (1)

dpb
dpb on 8 Jul 2014
Edited: dpb on 8 Jul 2014
I think on reflection the comment I made earlier about Matlab building the X.'*X matrix actually is your problem. I just happened to have the data still in memory from the current other thread where a poor misguided soul is trying to fit a nonlinear model of incredible complexity to a set of data that are an essentially perfect quadratic. Anyway, having solved for that quadratic with polyfit, I demonstrate the same computation with mldivide --
>> x=[1:length(yh)].'; % I'd lost his x; just used 1:N instead for demo
>> b=polyfit(x,yh,2) % the polyfit results
b =
-4.1158e-05 0.0600 -22.9005
Now build the design matrix X for a quadratic...
>> X=[x.*x x ones(size(x))]; % the design matrix
>> best=X\y % solve the least squares problem
>> best
best =
-0.0000
0.0600
-22.9005
>> b.'-best
ans =
1.0e-14 *
-0.0000
0.0028
-0.7105
>>
Note the results aren't identical but are within E-14 or so which is down in the precision noise level.
Hence, NB: Matlab formed the X'*X matrix internally in the process of solving via \, you do not need to create it and it'll be the wrong answer if you do. I'm guessing there's your source of confusion and difficulty in understanding your results.

4 Comments

Jess
Jess on 8 Jul 2014
Edited: Jess on 8 Jul 2014
I appreciate the suggestion to fit it the summation to a different equation form, but I have tried using the polyval/polyfit commands before and run into the issue that fitting a polynomial is utterly useless to my application. What I require are not the coefficients for a polynomial, but the coefficients of a linear equation (y=a1*x1+a2*x2+...an*xn). Unless I have completely misunderstood the use of the poly commands, they return coefficients for an equation such that y=a1*x^n+a2*x^(n-1)+...an*x+a(n+1).
I opted to use MLR because although it is far more hideous and lengthy to create, it seems the only method which can incorporate several sets of unique x-data (which currently exist as 13 sets of 1000x1 matrices) into a single linear summation.
As I said in the Answer, I did NOT suggest using a different form, but "...having solved for that quadratic with polyfit, I demonstrate the same computation with mldivide"
Hence the polynomial was simply an example of use mldivide (aka "\"), comparing its output with a known other solution technique for the same problem to illustrate and ensure the proper application of the backslash operator.
For a linear model in N variables, the design matrix would be
X=[x1 x2 x3 ... xN];
and the solution
b=X\y;
Your sample which begins as
X_matrix = [n, sum(A0_5), ..., sum(A5_5), sum(A6_0);
sum(A0_5), ...
looks too suspiciously as though it is actually the X'*X matrix created from X_matrix which is not correct to use with \. If this observation is, indeed, correct, then there's a least one problem (if not the only one) in your solution.
Thank you for clarifying that. I've done a shortened run to test it out, and allowing matlab to generate the x-matrix appears to have solved part of the problem with the coefficients. I'll try running one of the larger codes and let you know how it goes.
For a linear model in N variables, the design matrix would be X=[x1 x2 x3 ... xN];...
NB: this is a zero-intercept model; your X_matrix including the term n indicates you probably have an intercept. The design matrix in that case includes a column of ones...
X=[ones(size(x1) x1 x2 x3 ... xN];
for the model
y=a+b*x1+c*x2+...

Sign in to comment.

Categories

Find more on Particle & Nuclear Physics in Help Center and File Exchange

Products

Asked:

on 3 Jul 2014

Commented:

dpb
on 8 Jul 2014

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!