GSoC: Integration of Agents.jl with RL methods #1170

base: main

Conversation
Codecov Report

@@            Coverage Diff            @@
##             main    #1170     +/-  ##
=========================================
+ Coverage   70.12%   77.30%   +7.18%
=========================================
  Files          42       42
  Lines        2718     2992     +274
=========================================
+ Hits         1906     2313     +407
+ Misses        812      679     -133
I am trying to implement the new model type in the following way:

```julia
struct ReinforcementLearningABM{
S<:SpaceType,
A<:AbstractAgent,
C<:Union{AbstractDict{Int,A},AbstractVector{A}},
T,G,K,F,P,R<:AbstractRNG} <: AgentBasedModel{S}
# Standard ABM components
agents::C
agent_step::G
model_step::K
space::S
scheduler::F
properties::P
rng::R
agents_types::T
agents_first::Bool
maxid::Base.RefValue{Int64}
time::Base.RefValue{Int64}
# RL-specific components
rl_config::Base.RefValue{Any}
trained_policies::Dict{Type,Any}
training_history::Dict{Type,Vector{Float64}}
is_training::Base.RefValue{Bool}
end
# Extend mandatory internal API for AgentBasedModel
containertype(::ReinforcementLearningABM{S,A,C}) where {S,A,C} = C
agenttype(::ReinforcementLearningABM{S,A}) where {S,A} = A
discretimeabm(::ReinforcementLearningABM) = true
function ReinforcementLearningABM(
A::Type,
space::S=nothing,
rl_config=nothing;
agent_step!::G=dummystep,
model_step!::K=dummystep,
container::Type=Dict,
scheduler::F=Schedulers.Randomly(),
properties::P=nothing,
rng::R=Random.default_rng(),
agents_first::Bool=true,
warn=true,
kwargs...
) where {S<:SpaceType,G,K,F,P,R<:AbstractRNG}
# Initialize agent container using proper construction
agents = construct_agent_container(container, A)
agents_types = union_types(A)
T = typeof(agents_types)
C = typeof(agents)
model = ReinforcementLearningABM{S,A,C,T,G,K,F,P,R}(
agents,
agent_step!,
model_step!,
space,
scheduler,
properties,
rng,
agents_types,
agents_first,
Ref(0),
Ref(0),
Ref{Any}(rl_config),
Dict{Type,Any}(),
Dict{Type,Vector{Float64}}(),
Ref(false)
)
return model
end
```

I am able to create the model, yet when I try to use the function … What I mean is the following: is it better to do something like …?
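For reference, a minimal sketch of how this constructor could be called. The `WealthAgent` type, the stepping rule, and the `rl_config` contents are made-up placeholders for illustration, not the PR's actual example.

```julia
using Agents, Random

# Hypothetical agent type, used only to illustrate calling the constructor above
@agent struct WealthAgent(GridAgent{2})
    wealth::Int
end

# Placeholder stepping rule (does nothing)
wealth_step!(agent, model) = nothing

space = GridSpace((10, 10); periodic = true)
rl_config = Dict(:observation_radius => 2)   # assumed config layout

model = ReinforcementLearningABM(WealthAgent, space, rl_config;
    agent_step! = wealth_step!,
    rng = Xoshiro(42),
)
```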
you need to make the model a subtype of …
I thought I was already doing it here: … So this is not sufficient?
Ah, sorry, I didn't see this correctly. I am a bit short on time right now, but I can spend some time to help here next weekend, 26-27 of July. @Tortar perhaps you can give some advice before that?
No worries, I think I figured it out. I went with this solution: …
Yes, this …
We have our next meeting scheduled on 24/07 at 11. By the way, is there any chance we can move it to later in the day? I might have some problems connecting at that time. Furthermore, there are also some design choices regarding the implementation of the …
Which timezone? We can move it, give me a time window (and zone) of preference and I'll put a date there, assuming Adriano is also available.
Actually the best way to solve this is to make …
For me it's okay at any hour in the afternoon, by the way, so I'll let you decide on the time of our meeting.
We could make it after lunch; would 2.30pm (CEST) work?
Yes, I updated the invite.
…erface for RL
- Reorganized old interface examples
- Added ReinforcementLearningABM and RLEnvironmentWrapper to enable compatibility with POMDPs-based RL algorithms provided by Crux.
- Implemented necessary POMDPs functions: actions, observations, observation, initialstate, initialobs, gen, isterminal, discount, and state_space.
- Added step_rl! and rl_agent_step! functions for RL agent behavior.
- Added examples to show how the new model type works.
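For context, a hedged sketch of what extending a couple of the listed POMDPs.jl functions for the wrapper could look like. The `RLEnvironmentWrapper` fields used here (`steps_taken`, `max_steps`) and the discount value are assumptions, not the PR's actual implementation.

```julia
import POMDPs

# Hedged sketch only: the wrapper's internals are assumed here
POMDPs.discount(env::RLEnvironmentWrapper) = 0.95   # assumed fixed discount factor

function POMDPs.isterminal(env::RLEnvironmentWrapper, s)
    # assume the wrapper tracks elapsed steps and a maximum episode length
    return env.steps_taken >= env.max_steps
end
```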
Hello, I am sorry but I am on sick leave for two weeks and I cannot meet next week. I am available again on the 11th of August and would be happy to meet that week!
Hello, I am returning from my sick leave next week, would you like to have a videocall on Thursday?
At what time would you like to meet?
I can do 2pm UK time if that works.
For me it works.
target_agent = model[agent_id]
agent_pos = target_agent.pos
width, height = getfield(model, :space).extent
observation_radius = model.rl_config[][:observation_radius]
this could be a model property instead of being passed in rl_config
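A small sketch of this suggestion (hedged): the radius read through the model-properties interface instead of from `rl_config`; the property name and value are illustrative only.

```julia
# Current approach in the snippet above: radius pulled out of rl_config
observation_radius = model.rl_config[][:observation_radius]

# Suggested alternative (sketch): store it as an ordinary model property,
# e.g. properties = Dict(:observation_radius => 2) at model creation,
# and read it through the usual property accessor
observation_radius = model.observation_radius
```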
# decreases in the Gini coefficient. This creates an incentive for agents to learn
# movement patterns that promote wealth redistribution.

function boltzmann_calculate_reward(env, agent, action, initial_model, final_model)
Probably `env` can be spared, and something like `boltzmann_calculate_reward(agent, action, previous_model, current_model)` could be better.
Yeah this still needs to be fixed.
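A hedged sketch of the suggested signature, with a Gini-based reward as described in the example's comments above; it assumes a `gini(model)` helper along the lines discussed further down.

```julia
# Sketch only: reward is the decrease in the Gini coefficient between the model
# state before and after the agent's action (assumed gini(model) helper)
function boltzmann_calculate_reward(agent, action, previous_model, current_model)
    return gini(previous_model) - gini(current_model)
end
```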
properties = Dict{Symbol,Any}(
    :gini_coefficient => 0.0,
    :step_count => 0
)
these are probably not needed
yeap
I have taken care of most of these review items myself. I've also generated these videos showing how random agents differ from RL agents in our example: boltzmann.mp4 and rl_boltzmann.mp4. RL agents are clearly smarter :-) Great work Giorgio!
I verified locally that the documentation works; only the videos are lacking at the moment, because they need to be included with HTML code after being produced. Apart from that, I think that the PR is in pretty good shape and so I will approve it, but before merging it, it would probably be useful if @Datseris has a final look at it (and the few open review comments are tackled).
Co-authored-by: George Datseris <[email protected]>
Thanks a lot for your work, both @bergio13 and @Tortar. I agree that this looks great, and it is very close to finishing! However, I would really like to have a final in-depth look before that. The only problem is that I am currently under a lot of pressure from my main job and therefore lack time. I will try to work on this on the coming weekend if that's okay. Here and there during the evenings I will be adding comments to the review (you won't see them until I submit). The PR is approved, and as far as I can tell @bergio13 had a great GSoC project!
Thanks to you and to @Tortar for your help throughout this GSoC!
Just letting you know that I have started working on this locally. I am adding review comments little by little. You won't see them until I submit the review. But there is progress! Hopefully by the end of the weekend I will be done!
All right! I am submitting my review! I have requested plenty of changes, primarily regarding improving documentation and our interface.
There is something that really concerns me: why do we need the user to use all three of Crux, POMDPs, Flux? Please write here a bullet point list justifying why each particular package is required from the user's side. What particular function or type does the user need to give from that package into Agents.jl?
Lastly, we should add a warning/note to `ReinforcementLearningABM` that this feature is experimental and may change in the future without this being considered a breaking change, because when this is tested by more users we may find that some things need to change.
(Also address the comments by @Tortar please.)
We should remove all the figures and videos that we can generate while building the docs. What is the status with the wolf sheep model? Is this very slow to run/train? If yes, we can upload videos and pictures for it here: https://github.com/JuliaDynamics/JuliaDynamics/tree/master/videos/agents
# **Rules:**
# - Agents move on a 2D periodic grid
# - When agents occupy the same position, they may exchange wealth
# - Wealth flows from richer to poorer agents
I am a bit confused about this rule. Wealth flows from richer to poorer? Shouldn't it be the other way around? In the other Boltzmann model we have, the exchange is random with all nearby agents.
This is the same rule that has been implemented in the Mesa example.
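For clarity, a minimal sketch of the rule as stated (wealth moving from the richer to the poorer of two co-located agents); this illustrates the rule under discussion, it is not the PR's code.

```julia
# Illustration only: one unit of wealth moves from the richer agent to the
# poorer one when they occupy the same position
function exchange_wealth!(a, b)
    giver, receiver = a.wealth >= b.wealth ? (a, b) : (b, a)
    if giver.wealth > 0
        giver.wealth -= 1
        receiver.wealth += 1
    end
    return nothing
end
```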
# First, we define the Gini coefficient calculation, which measures wealth inequality.
# A Gini coefficient of 0 represents perfect equality, while 1 represents maximum inequality.

function gini(wealths::Vector{Int})
It will simplify the code if the function `gini` takes a model as input and calculates the wealths internally. I see that in the subsequent functions the wealths are not used anywhere else.
function gini(wealths::Vector{Int})
    n, sum_wi = length(wealths), sum(wealths)
    (n <= 1 || sum_wi == 0.0) && return 0.0
    num = sum((2i - n - 1) * w for (i, w) in enumerate(sort(wealths)))
Suggested change:
- num = sum((2i - n - 1) * w for (i, w) in enumerate(sort(wealths)))
+ num = sum((2i - n - 1) * w for (i, w) in enumerate(sort!(wealths)))
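A hedged sketch combining the two suggestions above (taking the model directly and using `sort!`); the final normalization is the standard Gini formula for sorted values, since the closing lines of the original function are not shown in this excerpt.

```julia
# Sketch of the suggested version: gini takes the model and computes the
# wealths internally
function gini(model)
    wealths = [agent.wealth for agent in allagents(model)]
    n, sum_wi = length(wealths), sum(wealths)
    (n <= 1 || sum_wi == 0) && return 0.0
    num = sum((2i - n - 1) * w for (i, w) in enumerate(sort!(wealths)))
    return num / (n * sum_wi)
end
```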
trained_policies::Dict{Type,Any}
training_history::Dict{Type,Any}
is_training::Base.RefValue{Bool}
current_training_agent_type::Base.RefValue{Any}
Why is this of type `Any` instead of `A`?
2. Alternate training batches between agent types
3. Each agent learns against the evolving policies of others

## Arguments
delete this!
## Arguments
- `model::ReinforcementLearningABM`: The model containing agents to train. Must have RL
  configuration set via `set_rl_config!` before training.
- `agent_types`: Single agent type (e.g., `MyAgent`) or vector of agent types (e.g.,
`agent_types` SHOULD NOT be an input to this function! If this information is already present in the `rl_config`, why do we ask the user to input the same information here as a duplicate? No need. We obtain it from the `rl_config`. Please reduce this function to a two-argument version, with the first argument being the model and the second argument the training mode.
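A rough sketch of the reduced signature being requested; the name `train!` and the `:agent_types` config key are placeholders, since the actual names are not visible in this excerpt.

```julia
# Placeholder sketch of the requested two-argument form (names hypothetical):
# agent types come from the RL configuration stored in the model, not the caller
function train!(model::ReinforcementLearningABM, training_mode; kwargs...)
    config = getfield(model, :rl_config)[]
    agent_types = config[:agent_types]   # assumed config key
    # ... run the training loop over agent_types according to training_mode ...
end
```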
```

## Returns
Move this statement to the very top of the function as a new sentence. No need to say what the function returns. Just say that the policies are stored in the model and can be accessed via `get_trained_policies(model)` or copied to other models using `copy_trained_policies!(target, source)`.
You SHOULD NOT write `model.trained_policies` anywhere, as the users are FORBIDDEN from accessing model fields! The syntax `model.property` is reserved for accessing auxiliary user-provided model properties.
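A short usage sketch of the two accessors named above (assuming a second model instance to copy into):

```julia
# Read the trained policies stored in the model (instead of touching fields)
policies = get_trained_policies(model)

# Reuse them in another model instance
copy_trained_policies!(other_model, model)
```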
or copied to other models using `copy_trained_policies!(target, source)`.

## Notes
- `max_steps` is read directly from the RL configuration (`model.rl_config[][:max_steps]`)
Why? This is odd. Why is this one training option part of `rl_config` while all the others are given as keywords to the training function? We need to make up our mind instead of fragmenting these kinds of decisions. I vote we move this training option to yet another keyword of this function and remove it from `rl_config` altogether.
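That is, something along these lines (the function name `train!` is again a placeholder):

```julia
# Sketch of the proposal: max_steps passed as a keyword of the training call
# rather than read from an rl_config entry
train!(model, training_mode; max_steps = 100)
```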

## Notes
- `max_steps` is read directly from the RL configuration (`model.rl_config[][:max_steps]`)
- Episode termination is controlled by the RL environment wrapper using the config value
I don't understand what this sentence means. What is an "RL environment wrapper"?
You are right, it's not very clear. I was referring to the wrapper I use to allow the model to work with POMDPs.jl. I will modify the sentence and make it clearer; I can probably just remove it.
@Datseris thank you for your suggestions! I will implement them as soon as I have more free time!
Fixes #648
Initial draft for the integration of reinforcement learning methods within Agents.jl. It is a working sketch still to be polished, refined and improved. The file `rl_interface` contains the code that allows training the agents of an ABM using reinforcement learning. Examples of how to use this interface are provided in `rl_interface_examples`. These examples can be compared with the ones implemented without the interface (see `boltzmann_local` and `wolfsheep`) to see how much the interface simplifies the code the user needs to implement.