Linear regression with categorical predictor & quadratic term - dataset

Hey!
I am trying to construct a dataset array, as in the example below (from the MATLAB help), in order to run a regression:
>> load carsmall
>> ds = dataset(MPG, Weight);
>> ds.Year=nominal(Model_Year);
>> mdl = fitlm(ds, 'MPG~Year+Weight^2')
mdl =
Linear regression model:
    MPG ~ 1 + Weight + Year + Weight^2
Estimated Coefficients:
                   Estimate       SE            tStat      pValue
    (Intercept)    54.206         4.7117        11.505     2.6648e-19
    Weight         -0.016404      0.0031249     -5.2493    1.0283e-06
    Year_76        2.0887         0.71491       2.9215     0.0044137
    Year_82        8.1864         0.81531       10.041     2.6364e-16
    Weight^2       1.5573e-06     4.9454e-07    3.149      0.0022303
Number of observations: 94, Error degrees of freedom: 89
Root Mean Squared Error: 2.78
R-squared: 0.885, Adjusted R-Squared 0.88
F-statistic vs. constant model: 172, p-value = 5.52e-41
Unfortunately, I get an error message when I try the same thing with my own data:
>> ds=dataset(price, size, weight, speed);
>> ds.postcode=nominal(postcode);
>> mdl = fitlm(ds, 'price~postcode+size+weight+speed')
The error is:
Index of element to remove exceeds matrix dimensions.
Error in classreg.regr.modelutils.designmatrix>dummyVars (line 432)
X0 = eye(ng); X0(:,1) = [];
Error in classreg.regr.modelutils.designmatrix (line 279)
[Xj,dummynames] = dummyVars(dummyCoding{j},Xj,catLevels{j});
Error in classreg.regr.TermsRegression/designMatrix (line 316)
[design,~,~,coefTerm,coefNames] ...
Error in LinearModel/fitter (line 654)
[model.Design,model.CoefTerm,model.CoefficientNames] =
designMatrix(model,X);
Error in classreg.regr.FitObject/doFit (line 220)
model = fitter(model);
Error in LinearModel.fit (line 857)
model = doFit(model);
Error in fitlm (line 111)
model = LinearModel.fit(X,varargin{:});
Thank you for your help!

10 Comments

Looks like there may be insufficient data for the added variable postcode.
I am using the same number of observations for postcode as for the other variables. Could the problem be that I use more variables than the example does? Do I need different code in that case? Or is the problem that my postcode variable is more complex than the one in the example (I have several different postcodes)? The example only distinguishes between three model years, as the help explains:
fitlm creates two dummy (indicator) variables for the nominal variable, Year. The dummy variable Year_76 takes the value 1 if the model year is 1976 and takes the value 0 if it is not. The dummy variable Year_82 takes the value 1 if the model year is 1982 and takes the value 0 if it is not. 1970 is the reference year ...
Thank you!
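The dummy coding described in the quoted help text can be reproduced by hand. This is only a sketch of the idea using the carsmall example data, not what fitlm does internally; the variable names D and X are chosen here for illustration.

```matlab
% Sketch: build the indicator columns for Year by hand (carsmall example data).
load carsmall
Year = nominal(Model_Year);     % three levels: 70, 76, 82

% One indicator column per level; double(Year) gives the level index 1..3.
D = dummyvar(double(Year));

% Drop the first column so that the first level (70) acts as the reference.
% The two remaining columns correspond to Year_76 and Year_82.
X = D(:, 2:end);
```

A nominal variable with k levels therefore adds k-1 columns to the design matrix, so a postcode variable with many distinct values adds many coefficients.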
Save the variable postcode in a MAT-file and post it here. It is much easier to help when the error can be reproduced on another machine, and in your case it seems difficult to find the problem without knowing the input variables.
I just saw that the postcodes were replaced by NaN during import ... that might be the problem. How do I import the postcodes, given that they are text rather than numbers? Thanks :)
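One way to keep text postcodes is to hold them as a cell array of strings and pass that to nominal. The sketch below uses made-up postcode values, since the real data are not shown in this thread. A postcode column that is entirely NaN would give nominal no levels to work with, which would be consistent with the error above, although the thread does not confirm that this was the cause.

```matlab
% Hypothetical postcode values for illustration; the real data are not shown.
postcodeText = {'NW1'; 'SW1X'; 'NW1'; 'EC1A'};

% Converting text like this to numbers yields NaN for every entry,
% which is what a numeric import of a text column looks like.
asNumbers = str2double(postcodeText);

% Passing the strings to nominal keeps each distinct postcode as a level.
postcode = nominal(postcodeText);
levels = getlabels(postcode);   % cell array with one label per distinct postcode
```

With the data in this form, ds.postcode = nominal(postcodeText) can be used in the same way as ds.Year in the carsmall example.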
Does this look right to you? I have quite a lot of variables because of the dummy variables, but it still seems to work, doesn't it?
I would need a lot more data to be in a position to assess the quality of the regression. But judging by the R-squared value and the F-statistic, the result seems quite good. The large number of x-variables is a bit troublesome ...
I might try a decision tree instead of dummy variables in order to have fewer variables. But for now I am happy that you helped me get this one working :)
I don't know; R-squared is only roughly 40% (leaving 60% of the total variation unexplained), and just glancing through it, very few of the coefficients out of the "cast of thousands" appear to be significant. Superficially I wouldn't say it looks like a very good model, and it certainly isn't high on the candidate list for a parsimonious one... :)
I'd guess the model estimated on only
(Intercept) 0.090898
bedroom 1.0747e-22
postcode_NW1 0.072403
postcode_SW1X 0.026157
would perform nearly as well.
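The kind of screening suggested above, keeping only the terms with small p-values, can be sketched as follows. It uses the carsmall model from the question because the poster's data are not available; the 0.05 cutoff and the variable names are assumptions made for illustration, and p-value screening is only a rough first look, not a full model selection procedure.

```matlab
% Sketch: list the coefficients whose p-value is below a chosen cutoff.
load carsmall
ds = dataset(MPG, Weight);
ds.Year = nominal(Model_Year);
mdl = fitlm(ds, 'MPG~Year+Weight^2');

p = mdl.Coefficients.pValue;        % one p-value per coefficient
names = mdl.CoefficientNames(:);    % matching coefficient names
cutoff = 0.05;                      % assumed threshold, adjust as needed
significantNames = names(p < cutoff);
```

The names in significantNames can then be used to write a shorter formula for a second fitlm call, and the two fits compared on adjusted R-squared.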
Thanks, this is just the beginning for me. I will try other models on the data afterwards and hopefully get better results. But this helped me a lot to get my first MATLAB analysis working. Thanks :)


Answers (0)

Asked: on 12 Jun 2014

Commented: on 14 Jun 2014
