How can I analyse particular portions of an array to use for plotting? The dataset has ~1.5 million rows for each variable and ~150,000 locations, with 10 values (for each variable) per location
I have a dataset of ~1.5 million rows and 6 different variables, defined below:
- Variable 1 (Column 1) - Location
- Var 2 (Column 2) - Temperature
- Var 3 - Rainfall
- Var 4 - Number of people in the location (Variable 1)
- Var 5 - Sensor value A
- Var 6 - Sensor value B
The dataset contains 10 values per location, meaning there are ~150,000 locations.
The questions I'm trying to answer are:
- a) Determine the average number of people and plot the average for the top 100 locations with the most people on average
- b) Determine the minimum and maximum temperature and plot both values for the top 100 locations ranked by maximum temperature
The structure given to approach this question was to find the Minimum, Maximum and Average from the 10 values in each location for each Variable (Temperature, Rainfall, Sensor A and B readings). They then suggested creating a matrix with all ~150,000 rows that includes the min, max and average for each required Variable, then plotting the graphs with the newly created matrix.
What process would I follow to find the Minimum, Maximum and Average of the 10 values in each location for each Variable?
I'm currently unsure how to:
- Group the 10 values from each location together to find the min, max and mean for starters
- Make those into a matrix
- Plot the top 100 rows of my matrix (to represent a plot of the top 100 locations); I think I know the basics of how to plot a graph, but not for particular/select data, like the top 100 rows of a matrix/array.
Any guidance would be much appreciated, thank you in advance for any assistance!
Answers (1)
dpb on 4 Jun 2020
Edited: dpb on 4 Jun 2020
"I’ve imported the data from a table and converted each variable into their own individual arrays,..."
That's exactly the wrong approach -- use grouping variables on the desired variables as suggested and illustrated in the doc for groupsummary, findgroups and/or splitapply. You'll also find groupsummary already does much if not all of what you're asking for automagically.
Assuming you already have the table, I'll name it tData for "table Data"
tData.Properties.VariableNames={'Location','Temperature','Rainfall','Population','SensorA','SensorB'}; % define meaningful variable names
tGData=groupsummary(tData,'Location',{'min','max','mean'}); % compute wanted statistics by location
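To see what the grouped table looks like, here is a minimal sketch on a toy table. The data values are made up for illustration (3 locations with 2 readings each rather than ~150,000 with 10); only the variable names follow the ones defined above. groupsummary returns one row per location, adds a GroupCount column, and names each statistic method_variable, e.g. mean_Population.

```matlab
% Toy stand-in for the real table: 3 locations x 2 readings each (values are made up)
Location    = [1;1;2;2;3;3];
Temperature = [10;14;20;22;5;9];
Rainfall    = [0;2;1;1;4;6];
Population  = [100;120;50;70;300;320];
SensorA     = [1;2;3;4;5;6];
SensorB     = [6;5;4;3;2;1];
tData = table(Location,Temperature,Rainfall,Population,SensorA,SensorB);

% One output row per location; each statistic is stored as <method>_<variable>
tGData = groupsummary(tData,'Location',{'min','max','mean'});

% List the generated column names, then pull out one statistic by name
disp(tGData.Properties.VariableNames')
meanPop = tGData.mean_Population;   % average number of people per location
```

The grouped table already is the "matrix" of min, max and mean per location that the question asks for, so no separate per-variable arrays are needed.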
Then use maxk on the desired statistic; its optional second output gives the row indices of the topmost 100 locations in the output table, which you can use to extract those rows for plotting or whatever else is to be done.
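A rough sketch of that maxk step for both parts of the question might look like the following. The data are again made up (5 locations with 10 readings each, and only the two columns needed), k is set to 3 so the toy example has something to select, and the plot layout is just one possible choice, not a prescribed one.

```matlab
% Made-up data: 5 locations x 10 readings, only the columns needed here
nLoc = 5; nPer = 10;
Location    = repelem((1:nLoc)', nPer);
offsets     = repmat((0:nPer-1)', nLoc, 1);
Population  = repelem([50;10;40;20;30], nPer) + offsets;
Temperature = repelem([12;30;18;25;21], nPer) + offsets/10;
tData  = table(Location,Temperature,Population);
tGData = groupsummary(tData,'Location',{'min','max','mean'});

k = min(3, height(tGData));          % use 100 for the real data

% (a) rows with the k largest mean populations; the second output of maxk
%     is the row index into tGData, in descending order of the statistic
[~,ixPop] = maxk(tGData.mean_Population, k);
tTopPop   = tGData(ixPop,:);

% (b) rows with the k largest maximum temperatures
[~,ixTmp] = maxk(tGData.max_Temperature, k);
tTopTmp   = tGData(ixTmp,:);

figure
subplot(2,1,1)
bar(tTopPop.mean_Population)
xticks(1:k), xticklabels(string(tTopPop.Location))
xlabel('Location'), ylabel('Mean number of people')

subplot(2,1,2)
plot(1:k, tTopTmp.max_Temperature, 'r-o', 1:k, tTopTmp.min_Temperature, 'b-o')
xticks(1:k), xticklabels(string(tTopTmp.Location))
xlabel('Location'), ylabel('Temperature'), legend('max','min')
```

With 100 locations the tick labels will be crowded, so you may prefer to drop the xticklabels calls and plot against rank instead.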
It's really all there; just use the tools TMW has provided...