MATLAB too slow in creating one new variable, looking for a more efficient way

I have a table called z, that looks like this:
ID Price
1 8
2 7
3 10
...
This table has over 1 million rows. I want to create a new variable in this table called "bucket", and I write
for i = 1:height(z)
    if (z.Price(i) >= 6.490 && z.Price(i) <= 6.499)
        z.bucket(i) = 1;
    elseif (z.Price(i) >= 6.500 && z.Price(i) <= 6.999)
        z.bucket(i) = 2;
    elseif (z.Price(i) >= 7.000 && z.Price(i) <= 7.499)
        z.bucket(i) = 3;
    elseif (z.Price(i) >= 7.500 && z.Price(i) <= 7.999)
        z.bucket(i) = 4;
    elseif (z.Price(i) >= 8.000 && z.Price(i) <= 8.499)
        z.bucket(i) = 5;
    elseif (z.Price(i) >= 8.500 && z.Price(i) <= 8.999)
        z.bucket(i) = 6;
    elseif (z.Price(i) >= 9.000 && z.Price(i) <= 9.499)
        z.bucket(i) = 7;
    elseif (z.Price(i) >= 9.500 && z.Price(i) <= 9.990)
        z.bucket(i) = 8;
    else
        z.bucket(i) = 9;
    end
end
I did not initialize bucket; I just ran the code directly. It has now been running for over 3 hours, and I got this warning as soon as it started:
Warning: The new variables being added to the table have fewer rows than the table. They have been extended with rows containing default values.
> In table/subsasgnDot (line 271)
In table/subsasgn (line 67)
What is a better way to do this so that the code runs faster? I think what I want to do is very straightforward: create a new variable based on some if/else conditions. I do it a lot in R and SQL, but I didn't realize MATLAB would be so slow at such a simple thing. Maybe I'm not doing it the right way. Thanks!

Accepted Answer

Guillaume on 2 Mar 2016
z.Bucket = discretize(z.Price, [6.49, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 9.991, Inf]);
Note that I made the last bin (9) start at 9.991, since your code does not specify where it begins.
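For anyone comparing this with the loop in the question, here is a minimal sketch of the same discretize call on a small made-up table. The sample data and the NaN handling are assumptions added for illustration, not part of the answer above. discretize returns NaN for values outside the edges, so prices below 6.49 would not land in bucket 9 the way the loop's else branch puts them there; the last line maps them over, assuming that is the intended behaviour.

```matlab
% Hypothetical sample data standing in for the real table z
z = table((1:6)', [8; 7; 10; 6.495; 9.75; 5], 'VariableNames', {'ID', 'Price'});

% Bin edges as in the answer. Each bin includes its left edge and excludes
% its right edge, except the last bin, which includes both.
edges = [6.49, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 9.991, Inf];

% One vectorized call assigns a bin number to every row
z.Bucket = discretize(z.Price, edges);

% Values outside the edges (here: prices below 6.49) come back as NaN.
% Assumption: they belong in bucket 9, like the else branch of the loop.
z.Bucket(isnan(z.Bucket)) = 9;

disp(z)
```

One difference from the loop remains: the loop leaves small gaps between its ranges (for example, 6.4995 is above 6.499 but below 6.500, so it falls through to bucket 9), whereas the contiguous edges here put such a price in bucket 1. If prices are rounded to three decimals, the two approaches should agree.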
