Collaborative DDPG/Actor-Critic Example
Tarek Ghoul
on 11 Dec 2020
Edited: Emmanouil Tzorakoleftherakis
on 16 Dec 2020
I have developed a DDPG model that optimizes traffic at intersections along one direction. I would like to implement four copies of the same model, one per direction (North-South, South-North, East-West, and West-East), i.e. run 4 DDPG models simultaneously, each with its own local reward function. I have attempted to combine all 4 approaches, but unfortunately the model appears to confuse actions in one direction with observations in another.
For example, if the agent signals a vehicle in the east-west lane to change its speed while simultaneously doing the same for another vehicle in the north-south direction, the system considers the sum of the rewards for all actions performed, so optimal actions on one approach end up overshadowed by subpar actions on another.
For this reason I believe a collaborative multi-agent approach may be ideal, but I cannot find anything in the MATLAB documentation indicating how this may be done beyond very simple Simulink examples. I have noted the following points, which still leave significant gaps:
My current model uses a custom environment that interfaces with another software's COM server to generate a sample environment from which observations are taken and actions are applied. I am not using Simulink because of the need for the external traffic simulation software. My current system uses an rlNumericSpec observation space with 10 variables and a continuous action space that performs 2 actions.
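For reference, a minimal sketch of how such a single-direction MATLAB environment can be constructed with rlFunctionEnv (the function names myStepFcn/myResetFcn are placeholders for my COM-driven step and reset logic):

```matlab
% Observation: 10 continuous variables read from the traffic simulation
obsInfo = rlNumericSpec([10 1]);
obsInfo.Name = 'traffic observations';

% Action: 2 continuous speed commands, bounded between 20 and 40
actInfo = rlNumericSpec([2 1],'LowerLimit',[20;20],'UpperLimit',[40;40]);
actInfo.Name = 'speed commands';

% Custom MATLAB environment; myStepFcn/myResetFcn wrap the COM interface
env = rlFunctionEnv(obsInfo,actInfo,'myStepFcn','myResetFcn');
```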
I would like to simultaneously run 4 of the same DDPG agents (or other actor-critic models if necessary), each with its own independent reward and action space. Is this possible with the Reinforcement Learning Toolbox as of 2020, and if so, how might one approach it? More specifically:
- How would one specify the 4 different sets of observations/actions, and how would this be done in the same custom constructor function? Each observation set is of the form rlNumericSpec([10 1]) for a total of 40 observations, and the action space is of the form rlNumericSpec([8 1],'LowerLimit',[20;20],'UpperLimit',[40;40]). I have tried following the actInfo and obsInfo syntax in this example (Train Multiple Agents for Path Following Control - MATLAB & Simulink (mathworks.com)), i.e. obsInfo = {obsInfo1, obsInfo2, ...}, which thus far has returned an error.
- When applying said actions to the custom environment, how would the actions appear once the model is running? Would they simply be of the form Action1(), Action2(), etc.?
- How would the individual localized reward functions be set within the step function? By default, for a single agent the reward is simply stored as "Reward"; is there a form in which the rewards would be split into Reward_agent1, Reward_agent2, etc.?
- Is it an absolute must to use Simulink, or can this be done with my existing custom environment setup?
- Are there any additional resources which may help me achieve this that I may have missed?
I understand that this is quite a large question, but I hope it will also help others looking to use this software for more complex multi-agent applications without Simulink. Thank you in advance for your assistance.
Accepted Answer
Emmanouil Tzorakoleftherakis
on 11 Dec 2020
Edited: Emmanouil Tzorakoleftherakis
on 11 Dec 2020
Hello,
As you noticed, as of R2020b we support (decentralized) multi-agent RL but only in Simulink. We are looking to expand this to more centralized multi-agent approaches in future releases, potentially outside of Simulink (i.e. in MATLAB) as well.
One workaround would be to convert your MATLAB-based environment into a Simulink one using the MATLAB Function block. That would allow you to use multi-agent training in 20b and follow the example links you posted.
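For reference, once the environment logic lives inside a MATLAB Function block, a four-agent Simulink environment in R2020b is created roughly like this (the model and agent block names below are placeholders):

```matlab
% Simulink model containing 4 RL Agent blocks (placeholder names)
mdl = "trafficModel";
agentBlks = mdl + ["/AgentNS","/AgentSN","/AgentEW","/AgentWE"];

% One observation/action spec per agent, passed as cell arrays
obsInfos = cell(1,4);
actInfos = cell(1,4);
for k = 1:4
    obsInfos{k} = rlNumericSpec([10 1]);
    actInfos{k} = rlNumericSpec([2 1],'LowerLimit',[20;20],'UpperLimit',[40;40]);
end

env = rlSimulinkEnv(mdl,agentBlks,obsInfos,actInfos);
```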
Another workaround is to combine all the observations and actions into a single DDPG agent. That way you would be able to keep using a MATLAB environment (is this what you meant when you said that you combined the 4 approaches?). As you found out, though, decentralized multi-agent training comes with challenges, particularly because it leads to non-stationary environments. I don't know how you have set up your problem, but each agent will need to be aware of what every other agent is doing and vice versa; for example, all previous actions will need to show up as observations. That may resolve the situation you described where optimal actions are overshadowed by subpar ones (although the individual subrewards will also need to be properly scaled).
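As a sketch, the combined single-agent specs would stack the four approaches (the dimensions come from your description; the reward weights are a design choice on your side, not a toolbox requirement):

```matlab
% 4 approaches x 10 variables = 40 observations in one spec
obsInfo = rlNumericSpec([40 1]);

% 4 approaches x 2 actions = 8 actions; repeat the per-approach limits
actInfo = rlNumericSpec([8 1], ...
    'LowerLimit',repmat([20;20],4,1), ...
    'UpperLimit',repmat([40;40],4,1));

% In the step function, scale the per-approach subrewards before summing,
% e.g. reward = w(1)*rewardNS + w(2)*rewardSN + w(3)*rewardEW + w(4)*rewardWE;
```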
Hope that helps
5 Comments
Emmanouil Tzorakoleftherakis
on 13 Dec 2020
Edited: Emmanouil Tzorakoleftherakis
on 16 Dec 2020
Hi again Tarek,
No problem, I try to make time to help out every now and then, given that this is not my day job.
I think your setup is headed in the right direction. Here is where I believe the problem is: vssim is an object that MATLAB cannot directly recognize or generate code from. Code generation would need to convert the object to a C-code equivalent, which is a problem particularly when you use such objects as inputs/outputs of functions, as you are doing. My recommendation is to encapsulate every function where vssim is needed as an input/output inside a single function that only takes and returns variables MATLAB can directly read. So something like:
function [rewardA,rewardB,isDone,LoggedSignals,observation] = stepfunction(LoggedSignals,actionA,actionB,observation)
% Declare myfun as extrinsic so it is executed in MATLAB
% instead of being converted to C code
coder.extrinsic('myfun')
[rewardA,rewardB,isDone,LoggedSignals,observation] = myfun(LoggedSignals,actionA,actionB,observation);
end
Then put anything that handles vssim inside myfun; MATLAB will not try to convert it to C code, which may eliminate the errors you are seeing.
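So myfun would contain all the COM handling; a sketch (the helpers applyAction, readObservations, localReward, and checkTermination are hypothetical wrappers around your own COM calls, and storing the vssim handle in LoggedSignals is an assumption):

```matlab
function [rewardA,rewardB,isDone,LoggedSignals,observation] = myfun(LoggedSignals,actionA,actionB,observation)
% Runs as a regular MATLAB function, so the COM object never goes
% through code generation.
vssim = LoggedSignals.vssim;               % COM handle kept in LoggedSignals (assumption)

applyAction(vssim,actionA);                % hypothetical wrappers around COM calls
applyAction(vssim,actionB);

observation = readObservations(vssim);     % pull the new state from the simulator
rewardA = localReward(observation,actionA);  % per-approach local rewards
rewardB = localReward(observation,actionB);
isDone = checkTermination(vssim);
end
```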
Hopefully that works