agent.learn data type issue, reinforcement learning toolbox

Question

Lars Meijer on 12 Mar 2024

https://uk.mathworks.com/matlabcentral/answers/2093281-agent-learn-data-type-issue-reinforcement-learning-toolbox

Commented: Lars Meijer on 19 Mar 2024

I am working on a reinforcement learning study. Currently, I am trying to finalize the agent and make it learn from its experiences. I cannot show all of the code, but I think this is the most important part:

%% Define action and observation specifications
ActionInfo = rlFiniteSetSpec([1 2 3]); % Actions that the agent is able to take
ObservationInfo = rlNumericSpec([30 10]); % This will eventually be the input to the neural network
% lots of code here ....
% Define everything that goes into the experience
CurrentState = env.reset();
action = agent.getAction(CurrentState);  % Get action from agent
[nextState, reward, isDone, ~] = env.step(action);  % Interact with environment
% Collect experience
experience = struct(...
    'Observation', {num2cell(CurrentState)}, ...
    'Action', {num2cell(action)}, ...
    'Reward', reward, ...
    'NextObservation', {num2cell(nextState)}, ...
    'IsDone', isDone);
% Train the agent with the experience
agent = agent.learn(experience);  % Update agent with experience

To elaborate, CurrentState and nextState are 30 x 10 matrices of datatype double, action is a 1x1 cell, reward is of datatype double, and isDone is logical. However, when passing these to experience without num2cell, the agent.learn function does not work because of these parts of the code in the batchExperienceArray.m file:

% batch observation, next observation
for ct = 1:numel(ObservationDimension)
    BatchDim = numel(ObservationDimension{ct})+1;
    % Observation
    Observation = arrayfun(@(x) (x.Observation{ct}), ExpStructArray, 'UniformOutput', false);
    ObservationArray{ct} = cat(BatchDim, Observation{:});
    % NextObservation
    NextObservation = arrayfun(@(x) (x.NextObservation{ct}), ExpStructArray, 'UniformOutput', false);
    NextObservationArray{ct} = cat(BatchDim, NextObservation{:});
end
Action = [ExpStructArray.Action];
for ct = 1:numel(ActionDimension)
    BatchDim = numel(ActionDimension{ct})+1;
    ActionArray{ct} = cat(BatchDim,Action{ct,:});
end

Here the error is that brace indexing is not supported for the data type. When I do pass all the variables in experience like it is in the code above, the error becomes:

Error using rl.function.AbstractFunction/validateInputData_
Input data dimensions must match the dimensions specified in the corresponding observation and action info specifications.

The question thus becomes: how can I pass the data correctly to the agent.learn with the experience, without all these errors? What am I missing here? If any more information is missing, let me know.

Answer 1

Avadhoot on 19 Mar 2024

https://uk.mathworks.com/matlabcentral/answers/2093281-agent-learn-data-type-issue-reinforcement-learning-toolbox#answer_1427321

Hi @Lars Meijer,

From the information in your question, I infer that the problem lies in the dimensions of the observation and action data you pass in. Batching is also involved in your code. The error you see is caused by a dimension mismatch between the input data and the observation and action info specifications. There may also be an issue with how you pass the experience structure to the "learn" function. You mentioned that passing the variables without the "num2cell" conversion gives the error "brace indexing is not supported for the data type". This is because the batching in the "learn" function expects the inputs to be cell arrays.
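To make the cell-array point concrete, here is a minimal sketch in plain MATLAB (no toolbox calls) of what the two ways of wrapping a 30-by-10 observation put into the struct field. The variable names perElement and perChannel are made up for illustration, and the random matrix only stands in for a real observation. It is not a statement of what "learn" requires, only a way to inspect what brace indexing such as x.Observation{ct} would return in each case.

```matlab
% Stand-in for one 30-by-10 observation (values are arbitrary).
CurrentState = rand(30, 10);

% num2cell splits the matrix into one cell per element, so the field
% becomes a 30-by-10 cell array of scalars.
perElement = struct('Observation', {num2cell(CurrentState)});

% Wrapping the matrix in braces keeps it whole. The outer braces are
% consumed by struct(); the inner braces survive as a 1-by-1 cell.
perChannel = struct('Observation', {{CurrentState}});

disp(size(perElement.Observation));     % 30 10  (the cell array itself)
disp(size(perElement.Observation{1}));  %  1  1  (brace indexing gives a scalar)
disp(size(perChannel.Observation));     %  1  1  (one cell per channel)
disp(size(perChannel.Observation{1}));  % 30 10  (brace indexing gives the matrix)
```

In the first form, brace indexing yields a single scalar, which cannot match an observation specification of [30 10]; in the second form it yields the full matrix.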

According to the MATLAB documentation, there should be buffers to store experiences, and the dimensions of each buffer must be as follows:

For the observation buffer: number of observations * number of observation channels * batch size.
For the action buffer: number of actions * number of action channels * batch size.
For the reward buffer: 1 * batch size.

The source of your error might be that you have not formatted the observations and actions according to the batch size. Consider formatting the buffers in the dimensions mentioned above.
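As a rough illustration of batching along a trailing dimension, the sketch below stacks several experiences in the same way the batchExperienceArray.m excerpt in the question does (concatenating along dimension numel(dim)+1). It uses only core MATLAB; the batch size of 4, the zero-valued observations, and the variable names are assumptions chosen for the example.

```matlab
obsDim    = [30 10];   % observation size from the question
batchSize = 4;         % arbitrary batch size for illustration

% Build a 1-by-batchSize struct array; each Observation field holds a
% 1-by-1 cell containing the whole observation matrix.
oneExperience = struct('Observation', {{zeros(obsDim)}}, 'Reward', 1);
experiences   = repmat(oneExperience, 1, batchSize);

% Concatenate along the first dimension after the observation's own ones.
batchDim = numel(obsDim) + 1;
obsCells = arrayfun(@(x) x.Observation{1}, experiences, 'UniformOutput', false);
obsBatch = cat(batchDim, obsCells{:});

% Rewards are scalars, so they line up as 1-by-batchSize.
rewardBatch = [experiences.Reward];

disp(size(obsBatch));     % 30 10 4
disp(size(rewardBatch));  %  1  4
```

If each Observation field held per-element cells instead, the same concatenation would produce a 1-by-1-by-4 array, which is one way a dimension mismatch can arise.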

For more information on the training procedure, refer to the example below:

https://www.mathworks.com/help/reinforcement-learning/ug/train-reinforcement-learning-policy-using-custom-training.html

I hope this helps in getting an idea about the cause of the error.

3 Comments

Avadhoot on 19 Mar 2024

For training a DQN agent you can take a look at the following example: https://www.mathworks.com/help/reinforcement-learning/ug/model-based-reinforcement-learning-using-custom-training-loop.html

Lars Meijer on 19 Mar 2024

I also looked at that one. However, it also does not use the agent creation from the MATLAB toolbox. I have gone back to the basics with the following code:

%% Trying to create the custom training loop from scratch again
clear, clc
%% Create parameters that the environment needs, but should be defined outside of the environment to have a better overview
updateAfter = 24;   % Determines after how many time instances (hours in this case) you want to plan the job shop again
JobBatchSize = 10;  % Determines in what size of batch the updated jobs will be given (directly influences the size of the inputs of the neural network)
MaxMachines = 20;   % Determines the max of machines (directly influences the size of inputs as well), which is dependent on the generated data
rng(0, 'twister')   % Set rng to produce deterministic results for reproducibility
%% Importing training data
scriptPath = mfilename('fullpath');                    % This determines the path where this file is in
scriptFolder = fileparts(scriptPath);   
folderPath = fullfile(scriptFolder, 'TrainingData');   % This creates a path to the training data
epDataFiles = dir(fullfile(folderPath, '*.txt'));      % Determines all the episode data files
numEpisodes = length(epDataFiles);                     % Determines the number of episodes based on the number of data files
%% Define action and observation specifications
ActionInfo = rlFiniteSetSpec([1 2 3]); % Actions that the agent is able to take
ObservationInfo = rlNumericSpec([(JobBatchSize+MaxMachines) 10]); % This will eventually be the input to the neural network
%% Creating the neural network
% Determine the wanted neurons per layer
Neurons = 64;
% Define the neural Network
qNetwork = [imageInputLayer(ObservationInfo.Dimension, 'Normalization', 'none')   % Specify 'Normalization' parameter
    fullyConnectedLayer(Neurons)                                    % Fully connected layer with 64 neurons
    reluLayer                                                       % Rectified Linear Unit (ReLU) activation function
    fullyConnectedLayer(numel(ActionInfo.Elements))];               % Output layer
% Convert the network to a dlnetwork object
qNetwork = dlnetwork(qNetwork);
%% Creating DQN Agent
% Create a critic, so that the created neural network is used instead of a
% standard neural network
critic = rlVectorQValueFunction(qNetwork ,ObservationInfo,ActionInfo);
agent = rlDQNAgent(critic);
%% Initialize environment
env = DJSPEnvironmentFinal(ObservationInfo, ActionInfo, epDataFiles, folderPath, updateAfter, MaxMachines, JobBatchSize);
%% Training loop
for episode = 1:numEpisodes
    isDone = false;
    currentState = env.reset();
    episodeReward = 0; % Initialize episode-specific reward
    while ~isDone
        % determine the number of steps taken
        env.StepCount = env.StepCount + 1;
        action = agent.getAction(currentState);
        [nextState, reward, isDone, ~] = env.step(action);  % Interact with environment
        
        % Update episode reward and current state
        episodeReward = episodeReward + reward;
        currentState = nextState;
        
    end
    
    % Update the episode number until training is over
    env.CurrentEpisode = env.CurrentEpisode + 1;
end

Now I need to add the training of the agent. I still do not completely understand how to do that. I think agent.train() would not be useful as I created my own training loop. However, I still do not completely understand the agent.learn() function. I hope that this extra context helps you give me some direction. Thanks again for your reply.
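One possible direction for the storage side of a custom loop is the toolbox's documented replay buffer, rlReplayMemory. The sketch below assumes a release that ships it (R2022a or later) and uses random matrices and a fixed action in place of the real environment, so the loop body is purely illustrative. It covers only storing and sampling experiences; how the sampled mini-batch is then used to update the agent is not shown here.

```matlab
% Specifications matching the ones in the question.
obsInfo = rlNumericSpec([30 10]);
actInfo = rlFiniteSetSpec([1 2 3]);

% Replay buffer with an arbitrary capacity of 1000 experiences.
buffer = rlReplayMemory(obsInfo, actInfo, 1000);

for k = 1:8
    % Placeholder data standing in for env.step() results.
    experience.Observation     = {rand(30, 10)};  % one cell per channel
    experience.Action          = {2};             % one cell per action channel
    experience.Reward          = 1;
    experience.NextObservation = {rand(30, 10)};
    experience.IsDone          = 0;

    append(buffer, experience);
end

% Draw a random mini-batch; observations come back stacked along the
% dimension after the observation's own dimensions.
miniBatch = sample(buffer, 4);
disp(size(miniBatch.Observation{1}));
```

Note that each observation is wrapped as {matrix}, not passed through num2cell, so that every cell holds a full 30-by-10 array.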
