Firstly, I must say that bringing RL into the MATLAB platform, with the ability to integrate with Simulink, is really exciting for an ML engineer who is otherwise used to Python. I believe this will evolve and cause tremendous excitement in engineering organizations worldwide. I am so happy to be an early adopter.
I am comparing PID control with RL control for a non-linear valve model.
I used the water-tank control DDPG example that MATLAB provides as a starting point. I used a similar strategy of moving the reference signal randomly (between 2 and 10) and moving the initial state of the flow randomly (again between 2 and 10).
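A minimal sketch of this kind of randomization, modeled on the reset-function pattern used in the water-tank example. The model name `rlvalve`, both block paths, and the choice of uniform sampling are hypothetical placeholders for illustration, not the actual model from this post.

```matlab
% Sketch only: 'rlvalve' and both block paths are hypothetical placeholders.
% Attach to the environment with: env.ResetFcn = @(in) localResetFcn(in);
function in = localResetFcn(in)
    lo = 2;  hi = 10;   % range quoted in the post

    % Randomize the reference signal (assumed here to be a Constant block).
    ref = lo + (hi - lo)*rand;
    in = setBlockParameter(in, 'rlvalve/Reference', 'Value', num2str(ref));

    % Randomize the initial flow (assumed here to be an integrator state).
    flow0 = lo + (hi - lo)*rand;
    in = setBlockParameter(in, 'rlvalve/Plant/Flow', ...
        'InitialCondition', num2str(flow0));
end
```

The function receives a `Simulink.SimulationInput` object and returns the modified copy, so the environment calls it once at the start of every episode.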
I expected the Episode Manager plots to look similar, but Q0 does not converge. I've tried training for 5,000 and, once, up to 10,000 as well.
The critic and actor designs are similar to those in the water-tank example. The attached images show the PID model, the RL model, the plant, and the Episode Manager plots.
Do you have any suggestions, please?
I have gone through some similar posts here, including this and the suggestions from expert Enrico Anderlini.
I did try modifying the exploration parameters a bit (programmed variance at 0.5 and decay at 1e-4), but that didn't seem to help much.
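To put those two numbers in perspective, here is a small calculation of how quickly the exploration noise shrinks. It assumes the update rule described in the Reinforcement Learning Toolbox documentation for the Ornstein-Uhlenbeck noise options, where the variance is multiplied by (1 - VarianceDecayRate) at every agent step (not every episode). The step counts are arbitrary illustrative values.

```matlab
% Noise variance after n agent steps, assuming the per-step update
%   Variance = Variance*(1 - VarianceDecayRate)
variance0 = 0.5;     % value quoted in the post
decayRate = 1e-4;    % value quoted in the post

% Number of agent steps until the variance has halved.
halfLife = log(0.5)/log(1 - decayRate);

% Variance remaining after a few illustrative step counts.
n = [1e3 1e4 1e5];
varianceAtN = variance0*(1 - decayRate).^n;

fprintf('half-life: %.0f steps\n', halfLife);
fprintf('variance after %g steps: %.4g\n', [n; varianceAtN]);
```

Comparing the half-life with the number of steps per episode gives a rough idea of how many episodes the agent keeps exploring before the noise becomes negligible.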
1. PID Control
2. RL Control
2.b. Non-linear Plant_model
3. RL at 400 episodes
4. RL at 700 episodes