
How to run multi-agent reinforcement learning in a custom environment based on GYM?

19 views (last 30 days)
Hi,
Recently I followed this link, MAT-DL on GitHub, and created custom environments based on OpenAI GYM that can be trained with a single agent. My question is: how can I create custom environments with GYM that support multiple agents?
Thanks!

Answers (1)

Ronit
Ronit on 16 Feb 2024
Edited: Ronit on 16 Feb 2024
Hi,
I understand that you are trying to create a custom GYM environment with multi-agent support. To achieve this, you can use the rlMultiAgentFunctionEnv function, which was added in the R2023b release. You will have to install the Reinforcement Learning Toolbox to use it.
This function requires you to define the observation and action specifications for your agents and to provide custom MATLAB reset and step functions.
However, as this function was added in R2023b, it cannot be used in earlier versions of MATLAB.
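As a side note, if you are not sure which release and toolboxes you have installed, you can check with the standard MATLAB commands below (just a convenience check, not specific to this workflow):
version('-release')   % returns the release string, for example '2022a'
ver                   % lists installed toolboxes, including Reinforcement Learning Toolbox if present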
Here is an example of a custom multi-agent reinforcement learning environment:
  • Consider an environment containing two agents. The first agent receives an observation belonging to a four-dimensional continuous space and returns an action that can have two values, -1 and 1.
  • The second agent receives an observation belonging to a mixed observation space with two channels. The first channel carries a two-dimensional continuous vector, and the second channel carries a value that is either 0 or 1. The action returned by the second agent is a continuous scalar.
  • To define the observation and action spaces of the two agents, use cell arrays.
The following code shows how to do this:
obsInfo = {rlNumericSpec([4 1]), [rlNumericSpec([2 1]) rlFiniteSetSpec([0 1])]};
actInfo = {rlFiniteSetSpec([-1 1]), rlNumericSpec([1 1])};
env = rlMultiAgentFunctionEnv(obsInfo, actInfo, @stepFcn, @resetFcn)
function [initialObs, info] = resetFcn()
% For this example, initialize the agent observations randomly
% (but set to 1 the value carried by the second observation channel of the second agent).
initialObs = {rand(4,1), {rand(2,1) 1} };
% Set the info argument equal to the observation cell.
info = initialObs;
end
function [nextObs, reward, isdone, info] = stepFcn(action, info)
% stepFcn specifies how the environment advances to the next state given
% the actions from all the agents.
% If N is the total number of agents, then the arguments are as follows.
% - NEXTOBS is a 1xN cell array.
% - ACTION is a 1xN cell array.
% - REWARD is a 1xN numeric array.
% - ISDONE is a logical or numeric scalar.
% - INFO contains any data that you want to pass between steps.
% For this example, just return to each agent a random observation multiplied
% by the norm of its respective action.
% The second observation channel of the second agent carries a value that can only be 0 or 1.
nextObs = { rand([4 1])*norm(action{1}) , {rand([2 1])*norm(action{2}) 0} };
% Return a random reward vector and a false is-done value.
reward = rand(2,1);
isdone = false;
end
Running the above code creates the environment and displays the resulting rlMultiAgentFunctionEnv object in the Command Window.
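If you then want to actually train agents on this environment, here is a rough sketch (not verified end-to-end; the choice of PPO agents, the option values, and the rlMultiAgentTrainingOptions settings below are assumptions you would adapt to your task). The idea is to create one agent per observation/action specification pair and pass them to train as an array:
% Sketch only: default agents created directly from the specifications above.
agent1 = rlPPOAgent(obsInfo{1}, actInfo{1});   % agent with the discrete action (-1 or 1)
agent2 = rlPPOAgent(obsInfo{2}, actInfo{2});   % agent with the continuous scalar action
% Multi-agent training options (rlMultiAgentTrainingOptions was also introduced in R2023b).
trainOpts = rlMultiAgentTrainingOptions( ...
    MaxEpisodes=500, ...
    MaxStepsPerEpisode=100, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=100);
% Train both agents against the custom environment.
trainResults = train([agent1, agent2], env, trainOpts);
Depending on your problem you may prefer other agent types or training settings; the rlMultiAgentFunctionEnv and rlMultiAgentTrainingOptions documentation pages cover the available options.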
Hope this helps!
Ronit Jain

Release

R2022a
