How to improve my regression problem having a low accuracy?

I want to solve a small regression problem. My dataset consists of two features:
  • the population of a country
  • the number of representatives
So my objective is: given the population of a country, estimate the number of representatives. The issue is that with linear regression I get an accuracy of only 50%. Could this be explained by the distribution of the data? Here are some descriptors of the dataset:
And here is my scatter plot:
I am new to ML and I'm trying to work through some things on my own. How can I improve my model? I was thinking of:
  • use a nonlinear regression to better fit the data
  • improve my dataset (e.g. removing outliers)
_______________________

Answers (1)

Ameer Hamza on 30 Nov 2020
Linear regression has a closed-form solution, i.e., you get the globally optimal least-squares fit just by plugging the data into an equation. This means that, for a straight-line model, you cannot improve on the current result unless you change the dataset or the form of the model.
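To make the closed-form point concrete, here is a minimal sketch in plain Python of the least-squares formulas for a single feature. The function name `fit_line` and the toy numbers are assumptions for illustration only; they are not the asker's data.

```python
from statistics import mean

def fit_line(x, y):
    """Closed-form ordinary least squares for y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical toy data: population (millions) and number of representatives
population = [1.0, 2.0, 3.0, 4.0, 5.0]
representatives = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(population, representatives)
print(slope, intercept)
```

No iteration or tuning is involved: the same data always yields the same line, which is why the fit itself cannot be "trained harder".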
Also, percentage accuracy is not a good metric for regression problems. Usually, you should use MSE or RMSE when talking about the accuracy of a linear regression.
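As a rough sketch of those metrics, the helpers below compute MSE and RMSE in plain Python. The coefficient of determination (R²) is included as well, on the assumption (not confirmed in the question) that the quoted "50%" may be an R² value. All names and numbers are illustrative.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: same units as the target variable."""
    return math.sqrt(mse(y_true, y_pred))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual and predicted numbers of representatives
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

print(mse(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

RMSE is often the easiest of the three to interpret here, since it reads directly as "typical error in number of representatives".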
If you look at the data points, you can easily see that the data is not distributed linearly, but that is fine. You don't need the model to accurately predict each data point (that will lead to overfitting). You just need the model to capture the general trend of data distribution.
That said, removing outliers and using a nonlinear model will likely reduce the error, but don't try to minimize the error too much. ML is not about curve-fitting and reducing the error to 0. It is about making the model learn the important patterns in your dataset and make predictions based on those patterns.
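One way to check whether a nonlinear model really captures the trend better, rather than just fitting the training points more tightly, is to compare errors on data held out from the fit. The sketch below is only an illustration in plain Python: the data are synthetic, and the power-law form is an assumption chosen for the demo, not something known about the asker's dataset.

```python
import math
from statistics import mean

def fit_line(x, y):
    """Closed-form least squares for y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Synthetic data (assumption): reps = 20 * population**0.4 with +/-3% wiggle
populations = [1, 2, 3, 5, 8, 12, 20, 30, 50, 80, 120, 200, 300, 500, 800, 1200]
reps = [20 * p ** 0.4 * (1 + 0.03 * (-1) ** i) for i, p in enumerate(populations)]

# Hold out every third point (indices 1, 4, 7, ...) for evaluation
test_idx = set(range(1, len(populations), 3))
pairs = list(zip(populations, reps))
x_train, y_train = zip(*[pr for i, pr in enumerate(pairs) if i not in test_idx])
x_test, y_test = zip(*[pr for i, pr in enumerate(pairs) if i in test_idx])

# Model A: straight line in the original units
a, b = fit_line(x_train, y_train)
rmse_linear = rmse(y_test, [a * x + b for x in x_test])

# Model B: power law y = c * x**k, fitted as a straight line in log-log space
exponent, log_c = fit_line([math.log(x) for x in x_train],
                           [math.log(y) for y in y_train])
rmse_power = rmse(y_test, [math.exp(log_c) * x ** exponent for x in x_test])

print("held-out RMSE, linear:", rmse_linear)
print("held-out RMSE, power law:", rmse_power)
```

If the more flexible model only wins on the training points and not on the held-out ones, that is a sign of overfitting rather than of a better model.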
