RL Agent Training for multiple training samples

I have designed a reinforcement learning (RL) environment using the Reset and Step functions. The code is provided below. In the Reset function, I utilize two files: "G1 line data.xlsx" and "G1 load data.xlsx." Currently, I obtain the first samples from these files for loads and power flow using the commands loads = load_data(1, :)'; and powerFlows = line_data(1, :)';, respectively. As a result, the RL agent trains on a single sample and then stops.
I would like to modify my approach so that the RL agent can train on all the samples in the files "G1 line data.xlsx" and "G1 load data.xlsx."
Reset Function
function [initialObs, initialState] = myResetFunction()
mpc = loadcase('case118');
%mpc.branch(:,6)=2*100*ones(1,186);
%initial_results=rundcopf(mpc)
%loads = initial_results.bus(:, 3);
%powerFlows = initial_results.branch(:, 14);
line_data=xlsread("G1 line data.xlsx");
load_data=xlsread("G1 load data.xlsx");
loads=load_data(1,:)';
powerFlows=line_data(1,:)';
% Ensure the initial observation is a column vector
initialObs = [loads; powerFlows];
initialObs = reshape(initialObs, [], 1); % Ensure column vector with correct shape
initialState = initialObs; % Initialize or reset logged signals if needed
end
Step Function
function [nextObs, reward, isDone, nextstate] = myStepFunctionnew(action, nextstate)
% Load the case
mpc = loadcase('case118');
mpc.branch(:, 6) = 2 * 100 * ones(1, 186); % Setting line limits
% Initialize penalties
genPenalty = 0;
linePenalty = 0;
% Initial generation values (example initialization)
initialGen = [
37.72083507, 41.21935468, 38.62409283, 16.8478368, 200, ...
88.01527158, 15.75220711, 12.34566609, 11.40791109, 8.44E-09, ...
196.3249472, 270.6145549, 9.26E-09, 6.794891389, 1.14E-08, ...
21.74105571, 21.99954329, 13.90368324, 4.394980313, 18.79989818, ...
201.4793428, 47.31339019, 3.51E-08, 3.52E-08, 152.5197471, ...
157.3295995, 3.07E-08, 383.9069549, 385.811582, 505.3672237, ...
1.65E-08, 1.10E-08, 1.52E-08, 1.89E-08, 2.11E-08, 2.33E-08, ...
466.9307997, 2.37E-08, 3.914399662, 594.0434087, 2.37E-08, ...
2.37E-08, 2.37E-08, 2.39E-08, 246.6411074, 39.14938215, ...
2.38E-08, 2.38E-08, 2.38E-08, 2.38E-08, 35.23444391, ...
2.38E-08, 5.851888342, 2.60E-08
]';
% Check if the action vector length matches the number of generators
if length(action) ~= length(initialGen)
    error('Action vector length must match the number of generators.');
end
% Define the bounds for new generation values
lowerBound = initialGen * 0.8; % 20% reduction
upperBound = initialGen * 1.2; % 20% increase
% Calculate new generation values based on action
newGen = initialGen .* (1 + action); % Adjust PG value
% Clamp newGen to ensure it's within the bounds
newGen = max(newGen, lowerBound); % Ensure not below lower bound
newGen = min(newGen, upperBound); % Ensure not above upper bound
% Normalize newGen to ensure its sum is 4242
currentSum = sum(newGen);
desiredSum = 4242;
if currentSum ~= 0 % Avoid division by zero
    scalingFactor = desiredSum / currentSum;
    newGen = newGen * scalingFactor; % Scale newGen
end
mpc.gen(:, 2) = newGen; % Update the generation value
% Debugging output: Check new generation values
%disp('New Generation Values:');
%disp(newGen);
% Check if the generator is within the ±20% range of initial value
for i = 1:length(action)
    if newGen(i) < 0.8 * initialGen(i) || newGen(i) > 1.2 * initialGen(i)
        % Penalize by the full deviation from the initial value
        % (not only the excess beyond the ±20% bound)
        genPenalty = genPenalty + abs(newGen(i) - initialGen(i));
    end
end
fprintf('genPenalty = %g\n', genPenalty)
% Run the DC power flow calculation
[results, success] = rundcpf(mpc);
% Extract observations
loads = results.bus(:, 3);
powerFlows = results.branch(:, 14);
nextObs = [loads; powerFlows];
nextObs = reshape(nextObs, [], 1); % Ensure column vector
% Power Flow Constraint Check
maxLineLimits = mpc.branch(:, 6);
for j = 1:length(powerFlows)
    if abs(powerFlows(j)) > maxLineLimits(j)
        % Amount by which the absolute flow exceeds the line limit
        linePenalty = linePenalty + (abs(powerFlows(j)) - maxLineLimits(j));
    end
end
fprintf('linePenalty = %g\n', linePenalty)
% Current cost calculation
currentCost = sum(results.gencost(:, 5) .* results.gen(:, 2).^2 + results.gencost(:, 6) .* results.gen(:, 2));
fprintf('currentCost = %.2f\n', currentCost)
% Compute the reward
if success == 1
    if genPenalty > 0 || linePenalty > 0
        % Generator or line constraints are violated: penalize in proportion to the violation
        reward = -100 * (genPenalty + linePenalty);
        fprintf('reward is due to penalty %.2f\n', reward);
    else
        % No constraints are violated: the reward decreases with the generation cost
        reward = 1e4 - 0.01 * currentCost;
        fprintf('reward is actual reward %.2f\n', reward);
        fprintf('cost = %.2f\n', currentCost);
    end
else
    reward = -127460.046762613 * 10000; % High penalty for solution divergence
    fprintf('reward is due to divergence %.2f\n', reward);
end
% Set isDone to false as termination condition is removed
isDone = false; % Modify this as needed based on your logic
% Update the next state
nextstate = nextObs;
% Store the reward in the episode history
persistent rewardHistory;
if isempty(rewardHistory)
    rewardHistory = [];
end
rewardHistory(end + 1) = reward; % Append the current reward to history
end

Answers (1)

Aravind on 30 Oct 2024
I understand that you want your reinforcement learning agent to start training from different initial conditions each time, based on the conditions listed in your Excel files. From your code, it looks like you are currently reading the Excel files but only selecting the first entry. To vary the initial conditions, you can simply use a different index instead of always selecting the first one.
Here are a couple of strategies you can use to select the index for sampling:
  1. Sample Index Tracking: Implement a system to keep track of which sample the agent is currently using. You can achieve this with a variable that persists across episodes, ensuring that each new episode uses the next sample from the dataset. If the index surpasses the number of available samples, you can reset it to start from the beginning.
  2. Random Index: At the start of each episode, randomly select the sample index. This method removes the need for global or persistent variables and introduces variability into the training process. By randomly picking a starting sample, the agent is exposed to a wider range of initial conditions over time, which can enhance its ability to generalize and adapt to new situations.
By applying these methods, your RL agent will be able to iterate through all samples in your dataset, allowing for comprehensive training. Random initialization can enhance the robustness and adaptability of your RL agent, particularly when dealing with large and diverse datasets.
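The two strategies above can be sketched as follows. This is only a minimal illustration, not a drop-in reset function: the matrices are synthetic stand-ins for the contents of the two Excel files (assuming one sample per row), and the names numSamples, idx, visited, and randIdx are made up for the example. In an actual reset function, idx would be declared persistent so that it survives between calls.

```matlab
% Synthetic stand-ins for the spreadsheet contents (assumption: one sample per row)
load_data = rand(5, 3);            % 5 samples, 3 load values each
line_data = rand(5, 4);            % 5 samples, 4 line-flow values each
numSamples = size(load_data, 1);

% Strategy 1: sequential index with wrap-around.
% Here a plain variable plays the role of the persistent index.
idx = 1;
visited = zeros(1, 7);
for episode = 1:7
    visited(episode) = idx;            % sample used in this episode
    idx = mod(idx, numSamples) + 1;    % 1, 2, ..., numSamples, 1, 2, ...
end
disp(visited)

% Strategy 2: random index, drawn fresh at every reset
randIdx = randi(numSamples);
initialObs = [load_data(randIdx, :)'; line_data(randIdx, :)'];  % column vector
```

With the sequential strategy every sample is visited once per pass through the data, while the random strategy needs no stored index but may revisit some samples before it has seen others.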
Here are some resources you might find helpful for implementing these strategies for selecting the initial condition in the reset function:
  1. Persistent variables in MATLAB: https://in.mathworks.com/help/matlab/ref/persistent.html
  2. Global variables in MATLAB: https://www.mathworks.com/help/matlab/ref/global.html
  3. "randi" function for generating random integers: https://www.mathworks.com/help/matlab/ref/randi.html
  4. Example of using the "randi" function in MATLAB: https://www.mathworks.com/help/matlab/math/random-integers.html
I hope this helps!
  1 Comment
Praveen Verma on 30 Oct 2024
@Aravind Thanks for the response! If I change the reset function as shown below, will it work? I'm also a bit confused about how the agent will know when to move from the current sample to the next one (currentSample + 1).
function [initialObs, initialState] = myResetFunction()
persistent currentSampleIndex; % Persistent variable to track the current sample index
if isempty(currentSampleIndex)
    currentSampleIndex = 1; % Initialize on the first call
end
mpc = loadcase('case118');
% Load data from Excel files
line_data = xlsread("G1 line data.xlsx");
load_data = xlsread("G1 load data.xlsx");
% Use currentSampleIndex to select loads and power flows
loads = load_data(currentSampleIndex, :)';
powerFlows = line_data(currentSampleIndex, :)';
% Ensure the initial observation is a column vector
initialObs = [loads; powerFlows];
initialObs = reshape(initialObs, [], 1); % Ensure column vector with correct shape
initialState = initialObs; % Initialize or reset logged signals if needed
% Display the sample index being used
fprintf('Using sample index: %d\n', currentSampleIndex);
% Update the sample index for the next call
currentSampleIndex = currentSampleIndex + 1; % Increment to the next index
if currentSampleIndex > size(load_data, 1)
    currentSampleIndex = 1; % Reset to the first sample if at the end
end
end
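
On when the sample changes: as far as I understand the Reinforcement Learning Toolbox training loop, the reset function is called once at the start of every episode, and an episode ends when the step function returns isDone = true or when the MaxStepsPerEpisode limit in the training options is reached. So the agent itself never decides to move on; the index advances each time a new episode begins. The sketch below imitates that loop in plain MATLAB, without the toolbox. The values of numSamples, numEpisodes, and maxStepsPerEpisode are hypothetical, and a plain variable stands in for the persistent index.

```matlab
numSamples = 3;            % hypothetical number of rows in the spreadsheets
numEpisodes = 5;           % hypothetical number of training episodes
maxStepsPerEpisode = 4;    % hypothetical step cap per episode

currentSampleIndex = 1;    % plays the role of the persistent variable
sampleUsed = zeros(1, numEpisodes);
stepsTaken = zeros(1, numEpisodes);

for episode = 1:numEpisodes
    % Reset phase: runs once, at the start of the episode
    sampleUsed(episode) = currentSampleIndex;
    currentSampleIndex = currentSampleIndex + 1;
    if currentSampleIndex > numSamples
        currentSampleIndex = 1;        % wrap around to the first sample
    end

    % Step phase: repeats until isDone is true or the step cap is reached
    for step = 1:maxStepsPerEpisode
        isDone = false;                % as in the step function, never true
        stepsTaken(episode) = step;
        if isDone
            break
        end
    end
end
disp(sampleUsed)
disp(stepsTaken)
```

Because isDone is always false in the step function, each episode here runs for the full step cap before the next reset selects the next sample.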
