New data for a one-hot encoded NN

Hi all.
I have a Neural Net with a few one-hot encoded fields. Everything is working well and it is exhibiting the learning rate I need. The issue I have is running new data through the NN to get the probability output. When I created the NN I used a lot of data, so, for instance, one field was transformed into 400 variables by one-hot encoding. When I convert the new data to run through the NN, I of course need to one-hot encode that field in the new data as well. The problem is that, because there is less new data, the one-hot encoding doesn't produce the same number of variables, so it doesn't work.
So, for example, the field had 400 distinct values in the training data and therefore became 400 variables once encoded. The matching field in the new data only had 5 distinct values, so it became only 5 variables once encoded. I'm sure I'm missing something here, so does anyone have any idea?
SPG
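To make the mismatch concrete, here is a minimal sketch in Python (the thread itself contains no code, and the category values are made up for illustration). The hypothetical helper `naive_one_hot` builds its category list from whatever data it is given, which is effectively what happens when the new data is encoded on its own. Besides the width changing, the column assigned to a given value can also shift, so even a coincidentally matching width would not be safe.

```python
def naive_one_hot(values):
    """One-hot encode using only the categories that appear in `values`.

    The column set is derived from the data passed in, so the output
    width (and column order) depends on which values happen to be present.
    """
    categories = sorted(set(values))  # vocabulary built from this data only
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical data: training has 4 distinct values, new data only 2.
train = ["red", "green", "blue", "yellow", "red"]
new = ["blue", "red"]

train_cats, train_rows = naive_one_hot(train)
new_cats, new_rows = naive_one_hot(new)

print(len(train_rows[0]))  # 4 columns
print(len(new_rows[0]))    # 2 columns, but the network expects 4
print(train_cats.index("red"), new_cats.index("red"))  # 2 1: "red" moved column
```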

Accepted Answer

Nandini on 20 Jul 2023
It seems like you're facing an issue with the dimensionality mismatch when applying one-hot encoding to new data that has fewer categories compared to the original data used to train your Neural Network (NN). One possible solution is to ensure that the encoding of the new data matches the same dimensions as the original data.
Here's a suggestion to handle this situation:
1. Determine the unique categories of the original data that was used for training. Let's call this set of unique categories `original_categories`.
2. When you encounter new data, do not let the encoding be derived from the new data alone, because that produces only as many columns as there are categories in the new data. Instead, ensure that the resulting one-hot encoded vector has the same length and column order as for the original data:
- Create a list of unique categories in the new data. Let's call this set of unique categories `new_categories`.
- Compare `new_categories` with `original_categories` to identify any missing categories.
- Add the missing categories to `new_categories` and sort them in the same order as `original_categories`.
- Perform one-hot encoding on the field using the updated `new_categories` to ensure the dimensions match.
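The steps above can be sketched as follows. This is a minimal, hypothetical Python illustration: the helper names `align_categories` and `one_hot` and the four colour values are assumptions, not part of the original thread. It assumes `original_categories` was stored in a fixed order when the network was trained and that the new data contains no values unseen in training.

```python
def align_categories(original_categories, new_values):
    """Start from the categories present in the new data, add the missing
    training categories, and put everything in training order."""
    new_categories = set(new_values)
    unseen = new_categories - set(original_categories)
    if unseen:
        # This procedure assumes every new value was seen during training.
        raise ValueError(f"categories not seen in training: {sorted(unseen)}")
    missing = set(original_categories) - new_categories
    combined = new_categories | missing
    # Sorting by training position reproduces original_categories exactly.
    return sorted(combined, key=original_categories.index)


def one_hot(values, categories):
    """Encode each value against a fixed, ordered category list."""
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return rows


# Step 1: the category order recorded at training time (hypothetical values).
original_categories = ["blue", "green", "red", "yellow"]

new = ["blue", "red"]
categories = align_categories(original_categories, new)
encoded = one_hot(new, categories)
print(categories)  # ['blue', 'green', 'red', 'yellow']
print(encoded)     # [[1, 0, 0, 0], [0, 0, 1, 0]]
```

Because adding the missing categories and sorting them into training order always reproduces `original_categories` when nothing is unseen, in practice you can call `one_hot(new, original_categories)` directly. The essential point is to store and reuse the training category list rather than derive a new one from the incoming data.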
By aligning the categories and dimensions of the one-hot encoding for both the training data and new data, you can ensure consistency when running the new data through the Neural Network.
Additionally, if the new data contains categories that were not present in the original training data, the Neural Network has not learned anything about them. You need to decide how to handle such unseen categories, for example by rejecting those rows, encoding them as an all-zero vector, or mapping them to a dedicated "other" category that was included during training.
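One possible policy for unseen categories is sketched below in Python. It is an assumption for illustration, not the only option: the hypothetical helper `one_hot_with_unknown` encodes unknown values as an all-zero row and reports them back so they can be reviewed. The network never saw an all-zero pattern for this field during training, so predictions for such rows should be treated with caution; reserving an explicit "other" category at training time is the usual alternative.

```python
def one_hot_with_unknown(values, original_categories):
    """Encode against the training categories; values never seen in
    training become an all-zero row (an assumed policy)."""
    index = {c: i for i, c in enumerate(original_categories)}
    rows, unknown = [], []
    for v in values:
        row = [0] * len(original_categories)
        if v in index:
            row[index[v]] = 1
        else:
            unknown.append(v)  # collect so the caller can log or review them
        rows.append(row)
    return rows, unknown


original_categories = ["blue", "green", "red", "yellow"]
rows, unknown = one_hot_with_unknown(["red", "purple"], original_categories)
print(rows)     # [[0, 0, 1, 0], [0, 0, 0, 0]]
print(unknown)  # ['purple']
```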
I hope this suggestion helps resolve the dimensionality mismatch issue. Let me know if you have any further questions!
Stephen Gray on 20 Jul 2023
That makes a lot of sense thanks. The data will be unlikely to contain anything unseen before so that should be OK. It's a bit more work than I thought but then it always is with data!
SPG



Categories: Sequence and Numeric Feature Data Workflows

Release: R2022a
