Grouping age data into 5 year bins

Dear MATLAB experts,
I'm trying to create a new column in a table, which contains the age binned into 5 years intervals per row. This column should create the age bins according to another column in the same table, which specifies the age of the individual per row. In order to make what I'm looking for more clear, here a few examples:
If the age of the individual in a certain row were 25.92 (in the age column), the binned age column should display the following value: (25.0, 30.0]
If the age of the individual in a certain row were 42.53 (in the age column), the binned age column should display the following value: (40.0, 45.0]
If the age of the individual in a certain row were 33 (in the age column), the binned age column should display the following value: (30.0, 35.0]
... and so on
Thank you in advance

 Accepted Answer

DGM
DGM on 12 May 2021
Edited: DGM on 12 May 2021
I'm going to just do this example starting with a numeric vector, but you can adapt it to use your table if you want.
age = rand(10,1)*50+15; % random pretend data
bs = 5;
agebin = ceil(age/bs)*bs;
agebin = [agebin-bs agebin]
agetable = table(age,agebin) % make a table
gives
agetable =
10×2 table
age agebin
______ ________
27.211 25 30
43.016 40 45
63.641 60 65
43.78 40 45
18.965 15 20
51.753 50 55
43.554 40 45
25.962 25 30
34.771 30 35
48.543 45 50
If it's instead desired to have that field as a char/string with the brackets as shown:
age = rand(10,1)*50+15; % random pretend data
bs = 5;
agebin = ceil(age/bs)*bs;
agebin = [agebin-bs agebin]
agebin = sprintfc('(%d %d]',agebin); % reformat it
agetable = table(age,agebin) % make a table
gives
agetable =
10×2 table
age agebin
______ ___________
30.402 {'(30 35]'}
54.001 {'(50 55]'}
17.773 {'(15 20]'}
33.628 {'(30 35]'}
20.063 {'(20 25]'}
24.851 {'(20 25]'}
38.988 {'(35 40]'}
48.377 {'(45 50]'}
18.798 {'(15 20]'}
46.282 {'(45 50]'}

6 Comments

Hi DGM,
this is really helpful. How could I now change the cell data type into a double for example, so that ' ' isn't displayed?
Thank you in advance
The first example I gave is using numeric (double) output.
Alright, I was just trying to display it so that everything else except for the ' ' would show. Thank you.
I have an additional question, that you might also know something about. I'm now trying to group the data within the different agebins and creating a mean per each age bin. Do you know how this could be done?
Thank you in advance
DGM
DGM on 12 May 2021
Edited: DGM on 12 May 2021
First, I need to point out that I made a mistake in the previous code. I've edited the code above such that it correctly handles cases where the ages are integer-valued. The old code didn't.
Regarding getting bin means, there's probably a canonical way of doing this, but this is what I came up with real quick:
% list of all age bins (by upper edge)
allagebins = unique(agebin(:,2));
meanage = zeros(size(allagebins));
for b = 1:numel(allagebins)
binmembers = age(agebin(:,2)==allagebins(b));
meanage(b) = mean(binmembers);
end
% show as a table
othertable = table([allagebins-bs allagebins],meanage)
gives
othertable =
7×2 table
Var1 meanage
________ _______
15 20 17.335
30 35 33.855
40 45 42.711
45 50 48.693
50 55 50.352
55 60 55.168
60 65 60.981
Thanks for your answers! This has helped me a lot, I understand how you've done it, but I wouldn't be able to come up with the answer on my own.
No worries. There are plenty of code examples around here that make me feel the same way.

Sign in to comment.

More Answers (1)

J. Alex Lee
J. Alex Lee on 12 May 2021
Edited: J. Alex Lee on 12 May 2021
Interesting, this could be useful for me too! You can use "discretize" to bin the data. Below is a quick and dirty class to implement this idea.
BinEdges = 20:5:45
Data = [25.92;42.53;33;30]
% make sure discretize does what i want
[BData,E] = discretize(Data,BinEdges,"IncludedEdge","right")
LeftEdges = BinEdges(BData)
RightEdges = BinEdges(BData+1)
Using the class below results in:
>> db = DataBins(BinEdges,Data)
db =
"[25,30)"
"[40,45)"
"[30,35)"
"[30,35)"
>> db.IncludedEdge = "right"
db =
"(25,30]"
"(40,45]"
"(30,35]"
"(25,30]"
The class
classdef DataBins < handle & matlab.mixin.CustomDisplay
properties
BinEdges
IncludedEdge
Value
BinIDs
LeftEdges
RightEdges
end
methods
function this = DataBins(BinEdges,Value,IncludedEdge)
arguments
BinEdges (1,:)
Value
IncludedEdge (1,1) string {mustBeMember(IncludedEdge,["left","right"])} = "left"
end
this.BinEdges = BinEdges;
this.Value = Value;
this.IncludedEdge = IncludedEdge;
this.update();
end
function update(this)
try % lazy way to only do when all data are set
V = this.Value(:);
S = size(this.Value);
BIDs = discretize(V,this.BinEdges,"IncludedEdge",this.IncludedEdge);
this.BinIDs = reshape(BIDs,S);
this.LeftEdges = reshape(this.BinEdges(BIDs ),S);
this.RightEdges = reshape(this.BinEdges(BIDs+1),S);
end
end
function set.BinEdges(this,val)
this.BinEdges = val;
this.update();
end
function set.Value(this,val)
this.Value = val;
this.update();
end
function set.IncludedEdge(this,val)
this.IncludedEdge = val;
this.update();
end
end
methods (Access = protected)
function displayScalarObject(this)
switch this.IncludedEdge
case "left"
lb = "[";
rb = ")";
case "right"
lb = "(";
rb = "]";
end
disp(...
compose("%s%d,%d%s",lb,this.LeftEdges,this.RightEdges,rb) ...
);
end
end
end

Categories

Asked:

on 12 May 2021

Commented:

DGM
on 12 May 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!