How to code Categorical Variables in NARX neural network data input?
I am working on predicting electricity demand (load), and many of the inputs to my Neural Network Time Series (NARX) app are categorical variables: months (12 categories, spelled out January - December), days of the week (seven categories, 1 - 7), and hours in each day (1 through 24). When I load my Excel data table to assign my variables as "Inputs", MATLAB is not able to read and display my categorical variable "Months" because the values are spelled out January through December. Should I write a simple line of code such as the one below, or is there a different way to flag those variables as categorical for NARX neural networks? I prefer not to convert months into 1-12, as MATLAB will assume some scale (month 12 is higher than month 6, etc.). Thank you in advance!
T.HE = categorical(T.HE);
T.MONTH = categorical(T.MONTH);
T.WEEKDAY = categorical(T.WEEKDAY);
3 Comments
awezmm
on 3 Jan 2020
What is the error you are getting when you say "the Matlab is not able to read and display my categorical variable"?
SK
on 3 Jan 2020
Walter Roberson
on 3 Jan 2020
You will not be able to proceed with the MathWorks tools and will need to write your own. The MathWorks tools can only work with data that is all (orderable) numeric, or all categorical, or all cell array of character vectors.
Even if you were to switch to all categorical, you would have challenges: when you concatenate categorical arrays together, the individual ranges lose their identity and a new categorical array is created that combines all of the categories, renumbering elements. The neural networks would have no way of knowing that the second column could not simultaneously have Tuesday and March, for example.
However, as I touched on in my Answer, I think you are making a mistake in trying to make the entries unordered. When you make them unordered, you are saying that the second day of February has more predictive power for load on the second day of August than the first day of August has for the second day of August.
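The category merging described in the comment above can be seen in a small sketch. The sample values and variable names here are made up for illustration and are not taken from the original data table.

```matlab
% Hypothetical sample data: two separate categorical columns.
weekday = categorical({'Monday'; 'Tuesday'});
month   = categorical({'March'; 'April'});

% Horizontal concatenation yields ONE categorical array whose category
% list is the union of the categories of both inputs.
combined = [weekday, month];

disp(categories(weekday))    % only the weekday names
disp(categories(combined))   % weekday and month names mixed together
```

After the concatenation, nothing in `combined` records that the first column may only hold weekdays and the second only months.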
Accepted Answer
More Answers (1)
SK
on 3 Jan 2020
1 vote
4 Comments
Walter Roberson
on 3 Jan 2020
Edited: Walter Roberson
on 3 Jan 2020
"you are recommending to recode ALL my categorical variables into zeros and ones (binary) by creating additional 12 Columns for months, 24 columns for hours and 7 columns for days of the week"
If you do that, then the result is double() datatype, and it is valid to combine that with numeric data such as temperature and humidity in the same array. It is valid to use
[isJanuary, isFebruary, isMarch, isApril, isMay, isJune, isJuly, isAugust, isSeptember, isOctober, isNovember, isDecember, isMonday, isTuesday, isWednesday, isThursday, isFriday, isSaturday, isSunday, is0000, is0100, is0200, is0300, is0400, is0500, is0600, is0700, is0800, is0900, is1000, is1100, is1200, is1300, is1400, is1500, is1600, is1700, is1800, is1900, is2000, is2100, is2200, is2300, TemperatureC, RelativeHumidity, SolarIntensity]
whereas if you tried to use
[monthCategory, WeekdayCategory, HourCategory, TemperatureC, RelativeHumidity, SolarIntensity]
then that would fail because you cannot combine categorical and double precision in the same array.
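As a minimal sketch of the zeros-and-ones coding, the month column could be expanded as follows. The sample data and the names `monthNames`, `monthCat`, `codes`, and `isMonth` are assumptions for illustration; in practice the values would come from the table `T`, and the same pattern would be repeated for weekday and hour.

```matlab
% Hypothetical sample data; real values would come from the table T.
monthNames = {'January','February','March','April','May','June', ...
              'July','August','September','October','November','December'};
MONTH        = {'January'; 'March'; 'December'};   % spelled-out months
TemperatureC = [2.5; 8.0; -1.0];

% Fix the category list so every month gets a column, even if it does
% not occur in the data.
monthCat = categorical(MONTH, monthNames);

% double() of a categorical array returns the category index (1..12 here).
codes = double(monthCat);

% Row k of the identity matrix is the indicator vector for category k.
I = eye(numel(monthNames));
isMonth = I(codes, :);           % 3-by-12 matrix of zeros and ones

% Every column is double now, so concatenation with numeric data is valid.
X = [isMonth, TemperatureC];     % 3-by-13 double matrix
```

This sketch assumes every entry of `MONTH` matches one of `monthNames`; an unmatched entry becomes an undefined category, its index is NaN, and the indexing step then errors.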
However, my belief is that you will get further if you code as
[monthNumber, WeekdayNumber, HourNumber, TemperatureC, RelativeHumidity, SolarIntensity]
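For the numeric coding, one possible way to turn spelled-out month names into numbers is sketched below. The sample data is made up, and the use of `ismember` is only one option, not necessarily what was used in this thread.

```matlab
% Hypothetical sample data; real values would come from the table T.
monthNames = {'January','February','March','April','May','June', ...
              'July','August','September','October','November','December'};
MONTH         = {'January'; 'June'; 'December'};   % spelled-out months
WeekdayNumber = [3; 6; 1];
HourNumber    = [1; 14; 24];
TemperatureC  = [2.5; 21.0; -1.0];

% The second output of ismember is the position of each name within
% monthNames, which is the month number 1..12.
[found, monthNumber] = ismember(MONTH, monthNames);
assert(all(found), 'Unrecognized month name in MONTH.');

% All columns are double, so they can share one input matrix.
X = [monthNumber, WeekdayNumber, HourNumber, TemperatureC];
```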
SK
on 3 Jan 2020
Walter Roberson
on 4 Jan 2020
Yes, that makes sense. Version 2 corresponds to using unordered categories, and Version 1 corresponds to using ordered categories.
SK
on 4 Jan 2020