Collaborative DDPG/Actor-Critic Example

I have developed a DDPG model that optimizes traffic at an intersection along one direction. I am now looking to implement four copies of the same model, one per direction (North-South, South-North, East-West, and West-East), i.e. I would like to run 4 DDPG models simultaneously, each with its own local reward function. I have attempted to combine all 4 approaches, but unfortunately the model appears to confuse actions in one direction with observations in another.
For example, if the agent signals a vehicle in the east-west lane to change its speed while simultaneously doing the same for another vehicle in the north-south direction, the system considers the sum of the rewards for all actions performed, so optimal actions on one approach are overshadowed by subpar actions on another.
It is for this reason that I believe a collaborative multi-agent approach may be ideal, but I cannot find anything in the MATLAB documentation indicating how this may be done beyond very simple Simulink examples. I have noted the following, which still leaves significant gaps:
My current model uses a custom environment that interfaces with another software package's COM server to generate a sample environment from which observations are taken and to which actions are applied. I am not currently using Simulink because of the need for this external traffic simulation software. My current system uses an rlNumericSpec observation space with 10 variables and a continuous action space with 2 actions; a sketch of this setup follows below.
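For context, here is a simplified sketch of that single-agent setup (stepfunction/resetfunction and the handle wiring are placeholders standing in for my actual constructor code):
% Single-agent specs: 10 observations, 2 continuous actions (speeds between 20 and 40)
obsInfo = rlNumericSpec([10 1]);
actInfo = rlNumericSpec([2 1],'LowerLimit',[20;20],'UpperLimit',[40;40]);
% The constructor creates the Vissim COM object once and shares it with the handles
Vissim = actxserver('Vissim.Vissim.700');
Vissim.LoadNet('D:\User\Vissim\testnet\testnetdiscrete.inpx');
StepHandle = @(action,loggedSignals) stepfunction(action,loggedSignals,Vissim);
ResetHandle = @() resetfunction(Vissim);
env = rlFunctionEnv(obsInfo,actInfo,StepHandle,ResetHandle);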
I would like to simultaneously run 4 of the same DDPG agents (or other actor-critic models if necessary), each with its own independent reward and action space. Is this possible with the Reinforcement Learning Toolbox as of 2020, and if so, how might one approach it? More specifically:
  • How would one specify the 4 different sets of observations/actions, and how would this be done in the same custom constructor function? Each observation set is of the form rlNumericSpec([10 1]), for a total of 40 observations, and each action set is of the form rlNumericSpec([2 1],'LowerLimit',[20;20],'UpperLimit',[40;40]), for a total of 8 actions. I have tried following this example (Train Multiple Agents for Path Following Control - MATLAB & Simulink (mathworks.com)) for the actInfo and obsInfo syntax, i.e. obsInfo = {obsInfo1, obsInfo2, ...}, which has thus far returned an error; see the sketch after this list.
  • When applying said actions to the custom environment, how would those actions appear once the model is running? Would they simply be of the form Action1(), Action2(), etc.?
  • How would the individual localized reward functions be set within the step function? By default, for a single agent, the reward is simply stored as "Reward"; is there a form in which the rewards would be split into Reward_agent1, Reward_agent2, etc.?
  • Is it an absolute must to use Simulink, or can this be done with my existing custom environment setup?
  • Are there any additional resources that might help me achieve this which I may have missed?
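For reference, the spec-definition syntax I attempted (modeled on the path-following example; the per-approach specs shown are placeholders for my actual ones) was along these lines:
% One observation/action spec per approach (NS, SN, EW, WE)
obsInfo1 = rlNumericSpec([10 1]);
actInfo1 = rlNumericSpec([2 1],'LowerLimit',[20;20],'UpperLimit',[40;40]);
% ...obsInfo2-4 and actInfo2-4 are defined identically...
obsInfo = {obsInfo1, obsInfo2, obsInfo3, obsInfo4};
actInfo = {actInfo1, actInfo2, actInfo3, actInfo4};
% Passing these cell arrays to my custom MATLAB environment constructor is what errors out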
I understand that this is quite a large question, but I hope it will also help others looking to use this software for more complex multi-agent applications without Simulink. Thank you in advance for your assistance.

Accepted Answer

Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 11 Dec 2020
Hello,
As you noticed, as of R2020b we support (decentralized) multi-agent RL but only in Simulink. We are looking to expand this to more centralized multi-agent approaches in future releases, potentially outside of Simulink (i.e. in MATLAB) as well.
One workaround would be to convert your MATLAB-based environment into a Simulink one using the MATLAB Function block. That would allow you to use multi-agent training in R2020b and follow the example links you posted.
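For reference, the R2020b multi-agent syntax from those examples looks roughly like this (the model name, block names, and agent variables below are placeholders):
mdl = "trafficModel"; % hypothetical Simulink model with one RL Agent block per approach
agentBlks = mdl + ["/Agent NS","/Agent SN","/Agent EW","/Agent WE"];
obsInfos = {obsInfo1, obsInfo2, obsInfo3, obsInfo4}; % one spec per agent, as cell arrays
actInfos = {actInfo1, actInfo2, actInfo3, actInfo4};
env = rlSimulinkEnv(mdl, agentBlks, obsInfos, actInfos);
trainStats = train([agentNS, agentSN, agentEW, agentWE], env, trainOpts);
Note that these cell arrays of specs are only accepted by rlSimulinkEnv as of R2020b, which is why the same syntax errors out in a MATLAB environment constructor.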
Another workaround is to combine all the observations and actions into a single DDPG agent; that way you would be able to use a MATLAB environment (is this what you meant when you said you combined the 4 approaches?). As you found out, though, decentralized multi-agent training comes with challenges, particularly because it leads to non-stationary environments. I don't know how you have set up your problem, but each agent will need to be aware of what every other agent is doing, and vice versa, so all previous actions will need to show up as observations, for example. That may resolve the situation you described where optimal actions are overshadowed by subpar ones (although the individual sub-rewards will also need to be properly scaled).
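A minimal sketch of that combined-agent idea, stacking your per-approach specs (the weights w1..w4 are placeholders for your reward scaling):
% One agent sees all 40 observations and outputs all 8 actions
obsInfo = rlNumericSpec([40 1]);
actInfo = rlNumericSpec([8 1], ...
    'LowerLimit',repmat([20;20],4,1),'UpperLimit',repmat([40;40],4,1));
% Inside the step function, scale and combine the per-approach rewards, e.g.:
% reward = w1*rewardNS + w2*rewardSN + w3*rewardEW + w4*rewardWE;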
You may also want to look at this example, which sounds similar to what you are trying to do.
Hope that helps
  5 Comments
Tarek Ghoul on 13 Dec 2020
Hello again Emmanouil,
After taking your advice, I have attempted to transfer the code from standard MATLAB to Simulink. I have spent several hours reading the supporting documentation and taking the Onramp course available online. I am still having difficulty loading Vissim into Simulink, as I keep getting errors.
In the original model, the Vissim object was loaded in a one-time-use constructor via Vissim = actxserver('Vissim.Vissim.700'), which also passed Vissim on to the reset and step handles, as I mentioned earlier. In this model, I cannot seem to get Vissim to act as an input to the custom MATLAB Function block containing the step/environment code.
As you recommended:
>Create a MATLAB function 'myfun' that either creates or loads the vissim object from workspace. You can make that a persistent variable to avoid doing that all the time.
I have created the following code at the workspace level to import the Vissim object as a persistent variable. While the Vissim object persists in other files if accessed, this does not seem to be the case for the simulated Simulink environment.
coder.extrinsic('StartVissim2')
makepersist()
[Vissim] = StartVissim2();

function [test] = makepersist()
% intended to make the COM handles persist for the other functions
persistent Vissim
persistent vnet
persistent sim
end

function [Vissim] = StartVissim2()
Vissim = actxserver('Vissim.Vissim.700');
Vissim.LoadNet('D:\User\Vissim\testnet\testnetdiscrete.inpx');
vnet = Vissim.Net;
sim = Vissim.Simulation;
mlock
end
After running the above, any reference to Vissim via dot indexing is undefined, returning an error of the form "Attempt to extract field 'Simulation'" (or 'Net', etc.). It appears as though Vissim itself is stored as an mxArray, and any attempt to call it yields an error.
For reference, my environment/step function is (with irrelevant/trivial portions removed for readability):
function [rewardA, rewardB, isDone, LoggedSignals, observation] = stepfunction(LoggedSignals, actionA, actionB, observation)
coder.extrinsic('StartVissim2')
% Test flag to run the startup function just once; will be modified once working
TR = 0;
if TR < 1
[Vissim] = StartVissim2();
end
%sets up dot indexed variables to be called
sim = Vissim.Simulation;
vnet = Vissim.Net;
%gets logged signals
State = LoggedSignals.State;
% Unpack state vector from previous step
n1 = LoggedSignals.State(1);
n2 = LoggedSignals.State(2);
n3 = LoggedSignals.State(3);
n4 = LoggedSignals.State(4);
asp1 = LoggedSignals.State(5);
asp2 = LoggedSignals.State(6);
asp3 = LoggedSignals.State(7);
asp4 = LoggedSignals.State(8);
PGap = LoggedSignals.State(9);
TSG = LoggedSignals.State(10);
TSR = LoggedSignals.State(11);
%Stores and executes the actions using the ApplyAction function (shown for actionA;
%actionB, the platoon matrices, and App_Number are handled in the trimmed code)
VP1 = actionA(1);
VP2 = actionA(2);
ApplyAction(Vissim,vnet,LoggedSignals,VP1,VP2,PlatoonIDmat1,PlatoonIDmat2,TSG,TSR,TSGmax,TSRmax,App_Number);
%Function which runs 10 Vissim timesteps using the 'Vissim.Simulation.RunSingleStep'
%command while obtaining vehicle data from Vissim and interpreting it
[Outputs_cycle_parameters] = Generate_CV_and_NCV_Matrices(Vissim,sim,vnet,LoggedSignals);
%extracts relevant reward and observation parameters from Outputs_cycle_parameters
ConfM2= Outputs_cycle_parameters(8);
TSGmax = Outputs_cycle_parameters(13);
TSRmax = Outputs_cycle_parameters(14);
TSG = Outputs_cycle_parameters(15);
TSR = Outputs_cycle_parameters(16);
%obtains observation data now that actions have been performed and the reward is obtained
[n1, n2, n3, n4, asp1, asp2, asp3, asp4] = getns(Vissim, sim, vnet);
[PlatoonIDmat1, PlatoonIDmat2, PGap] = GetPlatoons(Vissim, vnet, LoggedSignals);
%sets the LoggedSignals.State as well as a .Pass field that allows the maximum green and red times
%to be recorded and saved for the next timestep without being treated as observations
LoggedSignals.State(1) = n1;
LoggedSignals.State(2) = n2;
LoggedSignals.State(3) = n3;
LoggedSignals.State(4) = n4;
LoggedSignals.State(5) = asp1;
LoggedSignals.State(6) = asp2;
LoggedSignals.State(7) = asp3;
LoggedSignals.State(8) = asp4;
LoggedSignals.State(9) = PGap;
LoggedSignals.State(10) = TSG;
LoggedSignals.State(11) = TSR;
LoggedSignals.Pass(1) = TSGmax;
LoggedSignals.Pass(2) = TSRmax;
%Defines the observation returned to the agents
observation = LoggedSignals.State;
% Check terminal condition (no reasonable "done" condition due to Q0); episode length is governed by the step count
isDone = 0;
%Defines the rewards from the conflict measure (the per-approach reward split is part of the trimmed code)
rewardA = -ConfM2;
rewardB = -ConfM2;
end
function [Outputs_cycle_parameters]= Generate_CV_and_NCV_Matrices(Vissim,sim,vnet,LoggedSignals)
%Function which runs the simulation using a for loop with sim.RunSingleStep and several
%other simple MATLAB computations. To avoid inundating this post with code, the most relevant
%portions with regard to the Simulink problem are:
get(Vissim.Net.SignalHeads.ItemByKey(15), 'AttValue', 'State'); %obtains signal states from the Vissim COM interface
speedmat = Vissim.Net.Vehicles.GetMultiAttValues('Speed'); %obtains speeds using the GetMultiAttValues method
posmat = Vissim.Net.Vehicles.GetMultiAttValues('Pos'); %obtains positions using the GetMultiAttValues method
%the remaining code simply manipulates this to obtain traffic parameters for the observation space and collects
% all speeds/positions/types in one big matrix
end
function [n1, n2, n3, n4, asp1, asp2, asp3, asp4] = getns(Vissim, sim, vnet)
%simple function to obtain the vehicle counts and average speeds of vehicles in 4 segments
%of an approach, for the observation
end
function [PlatoonIDmat1, PlatoonIDmat2, PGap] = GetPlatoons(Vissim, vnet, LoggedSignals)
%Simple matrix manipulation to identify clusters of vehicles using the same GetMultiAttValues method
end
function ApplyAction(Vissim,vnet,LoggedSignals,VP1,VP2,PlatoonIDmat1,PlatoonIDmat2,TSGm,TSRm,TSGmax,TSRmax,App_Number)
%applies speeds to vehicles and to objects in the Vissim simulation using both conditional
%statements and the speeds obtained from the action, with the following COM-related syntax:
vnet.DesSpeedDecision.ItemByKey(2+5*(App_Number-1)).set('AttValue','DesSpeedDistr(70)',VP2);
vnet.Vehicles.ItemByKey(PlatoonIDmat2(n,1)).set('AttValue','DesSpeed',VP1); %n is the loop index in the trimmed code
end
%Attempt to define the function locally so that it loads Vissim within the environment.
%Unfortunately this does not work.
function [Vissim] = StartVissim2()
Vissim = actxserver('Vissim.Vissim.700');
Vissim.LoadNet('D:\User\Vissim\testnet\testnetdiscrete.inpx');
mlock
end
If I understood correctly, the above code is what is necessary (beyond the COM interface) for the intended block structure to be used (assuming 2 agents): effectively the same as the other examples, with the inputs being the actions and the previous state via logged signals and observations, and the outputs being the rewards, observation, isDone, and logged signals. For the time being I am testing the step function independently of the other systems to ensure that it works beforehand.
I hope that this explains the situation. With this in mind, how might one go about modifying the code with regard to the extrinsic functions and persistent variables, so that the COM interface sticks and its variables can be called within the Simulink custom function?
Thank you again for all of your help and your quick responses. I appreciate the time that you have spent helping me and others like me with learning how to use this powerful software.
Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 16 Dec 2020
Hi again Tarek,
No problem, I try to make time to help out every now and then, given that this is not my day job.
I think your setup is headed in the right direction. Here is where I believe the problem is: Vissim is an object that MATLAB cannot directly recognize or generate code from. Converting objects to their C code equivalents is necessary, particularly when you are using these objects as inputs/outputs of functions, as you are doing. My recommendation is to encapsulate every function where Vissim is needed as an input/output inside a single function that only inputs/outputs variables that can be directly read by MATLAB. So something like:
function [rewardA, rewardB, isDone, LoggedSignals, observation] = stepfunction(LoggedSignals, actionA, actionB, observation)
coder.extrinsic('myfun')
[rewardA, rewardB, isDone, LoggedSignals, observation] = myfun(LoggedSignals, actionA, actionB, observation);
end
Then put anything that handles Vissim inside myfun, and MATLAB will not try to convert it to C code, which may eliminate the errors you are seeing.
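A sketch of what myfun itself could look like (the persistent guard creates the COM object only once; the network path is the one from your post):
function [rewardA, rewardB, isDone, LoggedSignals, observation] = myfun(LoggedSignals, actionA, actionB, observation)
% Runs extrinsically in the MATLAB interpreter, so COM objects are fine here
persistent Vissim vnet sim
if isempty(Vissim)
    Vissim = actxserver('Vissim.Vissim.700');
    Vissim.LoadNet('D:\User\Vissim\testnet\testnetdiscrete.inpx');
    vnet = Vissim.Net;
    sim = Vissim.Simulation;
    mlock % keep the persistent variables (and the COM object) alive between calls
end
% ...apply actions via vnet, run simulation steps via sim, build the rewards and observation...
end
One more detail: in the wrapper, preassign the numeric outputs before the extrinsic call (e.g. rewardA = 0; rewardB = 0; isDone = false;) so that the code generator knows their types; otherwise the values returned by myfun stay as opaque mxArrays, which is the kind of error you saw when dot-indexing Vissim.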
Hopefully that works
