
How to run multi-agent reinforcement learning in a custom environment based on GYM?

19 views (last 30 days)
Hi,
Recently I followed this link, MAT-DL on GitHub, and created custom environments based on OpenAI GYM that can be trained with a single agent. My question is: how can I create custom environments with GYM that support multiple agents?
Thanks!

Answers (1)

Ronit
Ronit on 16 Feb 2024
Edited: Ronit on 16 Feb 2024
Hi,
I understand that you are trying to create a custom GYM environment with multi-agent support. To achieve this, you can use the rlMultiAgentFunctionEnv function, which was added in the R2023b release. You will have to install the Reinforcement Learning Toolbox to use it.
This function requires you to define the observation and action specifications for your agents and to provide custom MATLAB reset and step functions.
However, as this function was added in R2023b, it cannot be used in earlier versions of MATLAB.
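As a side note, if you are not sure which release and toolboxes you have installed, you can check with the standard MATLAB commands below (just a convenience check, not specific to this workflow):
version('-release')   % returns the release string, for example '2022a'
ver                   % lists installed toolboxes, including Reinforcement Learning Toolbox if present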
Here is an example of a custom multi-agent reinforcement learning environment:
  • Consider an environment containing two agents. The first agent receives an observation belonging to a four-dimensional continuous space and returns an action that can have two values, -1 and 1.
  • The second agent receives an observation belonging to a mixed observation space with two channels. The first channel carries a two-dimensional continuous vector, and the second channel carries a value that is either 0 or 1. The action returned by the second agent is a continuous scalar.
  • To define the observation and action spaces of the two agents, use cell arrays.
The following code shows how to do this:
obsInfo = {rlNumericSpec([4 1]), [rlNumericSpec([2 1]) rlFiniteSetSpec([0 1])]};
actInfo = {rlFiniteSetSpec([-1 1]), rlNumericSpec([1 1])};
env = rlMultiAgentFunctionEnv(obsInfo, actInfo, @stepFcn, @resetFcn)
function [initialObs, info] = resetFcn()
% For this example, initialize the agent observations randomly
% (but set to 1 the value carried by the second observation channel of the second agent).
initialObs = {rand(4,1), {rand(2,1) 1} };
% Set the info argument equal to the observation cell.
info = initialObs;
end
function [nextObs, reward, isdone, info] = stepFcn(action, info)
% stepFcn specifies how the environment advances to the next state given
% the actions from all the agents.
% If N is the total number of agents, then the arguments are as follows.
% - NEXTOBS is a 1xN cell array.
% - ACTION is a 1xN cell array.
% - REWARD is a 1xN numeric array.
% - ISDONE is a logical or numeric scalar.
% - INFO contains any data that you want to pass between steps.
% For this example, just return to each agent a random observation multiplied
% by the norm of its respective action.
% The second observation channel of the second agent carries a value that can only be 0 or 1.
nextObs = { rand([4 1])*norm(action{1}) , {rand([2 1])*norm(action{2}) 0} };
% Return a random reward vector and a false is-done value.
reward = rand(2,1);
isdone = false;
end
Running the above code creates the environment and displays the resulting rlMultiAgentFunctionEnv object in the Command Window.
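If you then want to actually train agents on this environment, here is a rough sketch (not verified end-to-end; the choice of PPO agents, the option values, and the rlMultiAgentTrainingOptions settings below are assumptions you would adapt to your task). The idea is to create one agent per observation/action specification pair and pass them to train as an array:
% Sketch only: default agents created directly from the specifications above.
agent1 = rlPPOAgent(obsInfo{1}, actInfo{1});   % agent with the discrete action (-1 or 1)
agent2 = rlPPOAgent(obsInfo{2}, actInfo{2});   % agent with the continuous scalar action
% Multi-agent training options (rlMultiAgentTrainingOptions was also introduced in R2023b).
trainOpts = rlMultiAgentTrainingOptions( ...
    MaxEpisodes=500, ...
    MaxStepsPerEpisode=100, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=100);
% Train both agents against the custom environment.
trainResults = train([agent1, agent2], env, trainOpts);
Depending on your problem you may prefer other agent types or training settings; the rlMultiAgentFunctionEnv and rlMultiAgentTrainingOptions documentation pages cover the available options.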
Hope this helps!
Ronit Jain

Release

R2022a
