GSoC: Integration of Agents.jl with RL methods #1170

base: main

Conversation
Codecov Report

@@            Coverage Diff            @@
##             main    #1170     +/-  ##
=========================================
+ Coverage   70.12%   77.30%   +7.18%
=========================================
  Files          42       42
  Lines        2718     2992     +274
=========================================
+ Hits         1906     2313     +407
+ Misses        812      679     -133
I am trying to implement the new model type in the following way:

```julia
struct ReinforcementLearningABM{
S<:SpaceType,
A<:AbstractAgent,
C<:Union{AbstractDict{Int,A},AbstractVector{A}},
T,G,K,F,P,R<:AbstractRNG} <: AgentBasedModel{S}
# Standard ABM components
agents::C
agent_step::G
model_step::K
space::S
scheduler::F
properties::P
rng::R
agents_types::T
agents_first::Bool
maxid::Base.RefValue{Int64}
time::Base.RefValue{Int64}
# RL-specific components
rl_config::Base.RefValue{Any}
trained_policies::Dict{Type,Any}
training_history::Dict{Type,Vector{Float64}}
is_training::Base.RefValue{Bool}
end
# Extend mandatory internal API for AgentBasedModel
containertype(::ReinforcementLearningABM{S,A,C}) where {S,A,C} = C
agenttype(::ReinforcementLearningABM{S,A}) where {S,A} = A
discretimeabm(::ReinforcementLearningABM) = true
function ReinforcementLearningABM(
A::Type,
space::S=nothing,
rl_config=nothing;
agent_step!::G=dummystep,
model_step!::K=dummystep,
container::Type=Dict,
scheduler::F=Schedulers.Randomly(),
properties::P=nothing,
rng::R=Random.default_rng(),
agents_first::Bool=true,
warn=true,
kwargs...
) where {S<:SpaceType,G,K,F,P,R<:AbstractRNG}
# Initialize agent container using proper construction
agents = construct_agent_container(container, A)
agents_types = union_types(A)
T = typeof(agents_types)
C = typeof(agents)
model = ReinforcementLearningABM{S,A,C,T,G,K,F,P,R}(
agents,
agent_step!,
model_step!,
space,
scheduler,
properties,
rng,
agents_types,
agents_first,
Ref(0),
Ref(0),
Ref{Any}(rl_config),
Dict{Type,Any}(),
Dict{Type,Vector{Float64}}(),
Ref(false)
)
return model
end
```

I am able to create the model, yet when I try to use the function … What I mean is the following: is it better to do something like …?
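For reference, a minimal sketch of how this constructor could be called. The `WealthAgent` type, the stepping rule, and the `rl_config` contents are made-up placeholders for illustration, not the PR's actual example.

```julia
using Agents, Random

# Hypothetical agent type, used only to illustrate calling the constructor above
@agent struct WealthAgent(GridAgent{2})
    wealth::Int
end

# Placeholder stepping rule (does nothing)
wealth_step!(agent, model) = nothing

space = GridSpace((10, 10); periodic = true)
rl_config = Dict(:observation_radius => 2)   # assumed config layout

model = ReinforcementLearningABM(WealthAgent, space, rl_config;
    agent_step! = wealth_step!,
    rng = Xoshiro(42),
)
```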
you need to make the model a subtype of …
I thought I was already doing it here: … So this is not sufficient?
Ah, sorry, I didn't see this correctly. I am a bit short on time right now, but I can spend some time to help here next weekend, 26-27 of July. @Tortar perhaps you can give some advice before that?
No worries, I think I figured it out. I went with this solution: …
Yes, this …
We have our next meeting scheduled on 24/07 at 11. By the way, is there any chance we can move it to later in the day? I might have some problems connecting at that time. Furthermore, there are also some design choices regarding the implementation of the …
Which timezone? We can move it, give me a time window (and zone) of preference and I'll put a date there, assuming Adriano is also available.
Actually the best way to solve this is to make …
For me it's okay at any hour in the afternoon, by the way, so I'll let you decide on the time of our meeting.
We could make it after lunch; would 2.30pm (CEST) work?
Yes, I updated the invite.
…erface for RL
- Reorganized old interface examples
- Added ReinforcementLearningABM and RLEnvironmentWrapper to enable compatibility with POMDPs-based RL algorithms provided by Crux.
- Implemented necessary POMDPs functions: actions, observations, observation, initialstate, initialobs, gen, isterminal, discount, and state_space.
- Added step_rl! and rl_agent_step! functions for RL agent behavior.
- Added examples to show how the new model type works.
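For context, a hedged sketch of what extending a couple of the listed POMDPs.jl functions for the wrapper could look like. The `RLEnvironmentWrapper` fields used here (`steps_taken`, `max_steps`) and the discount value are assumptions, not the PR's actual implementation.

```julia
import POMDPs

# Hedged sketch only: the wrapper's internals are assumed here
POMDPs.discount(env::RLEnvironmentWrapper) = 0.95   # assumed fixed discount factor

function POMDPs.isterminal(env::RLEnvironmentWrapper, s)
    # assume the wrapper tracks elapsed steps and a maximum episode length
    return env.steps_taken >= env.max_steps
end
```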
Hello, I am sorry but I am on sick leave for two weeks and I cannot meet next week. I am available again on the 11th of August and would be happy to meet that week!
Hello, I am returning from my sick leave next week, would you like to have a videocall on Thursday?
At what time would you like to meet?
I can do 2pm UK time if that works.
For me it works.
target_agent = model[agent_id]
agent_pos = target_agent.pos
width, height = getfield(model, :space).extent
observation_radius = model.rl_config[][:observation_radius]
this could be a model property instead of being passed in rl_config
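A small sketch of this suggestion (hedged): the radius read through the model-properties interface instead of from `rl_config`; the property name and value are illustrative only.

```julia
# Current approach in the snippet above: radius pulled out of rl_config
observation_radius = model.rl_config[][:observation_radius]

# Suggested alternative (sketch): store it as an ordinary model property,
# e.g. properties = Dict(:observation_radius => 2) at model creation,
# and read it through the usual property accessor
observation_radius = model.observation_radius
```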
# decreases in the Gini coefficient. This creates an incentive for agents to learn
# movement patterns that promote wealth redistribution.

function boltzmann_calculate_reward(env, agent, action, initial_model, final_model)
Probably `env` can be spared, and something like `boltzmann_calculate_reward(agent, action, previous_model, current_model)` could be better.
Yeah this still needs to be fixed.
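A hedged sketch of the suggested signature, with a Gini-based reward as described in the example's comments above; it assumes a `gini(model)` helper along the lines discussed further down.

```julia
# Sketch only: reward is the decrease in the Gini coefficient between the model
# state before and after the agent's action (assumed gini(model) helper)
function boltzmann_calculate_reward(agent, action, previous_model, current_model)
    return gini(previous_model) - gini(current_model)
end
```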
properties = Dict{Symbol,Any}(
    :gini_coefficient => 0.0,
    :step_count => 0
)
these are probably not needed
yeap
I have taken care of most of these review items myself. I've also generated these videos showing how random agents differ from RL agents in our example: boltzmann.mp4 and rl_boltzmann.mp4. RL agents are clearly smarter :-) Great work Giorgio!
I verified locally that the documentation works; only the videos are lacking at the moment, because they need to be included with HTML code after being produced. Apart from that, I think that the PR is in pretty good shape and so I will approve it, but before merging it, it would probably be useful if @Datseris has a final look at it (and the few open review comments are tackled).
Co-authored-by: George Datseris <[email protected]>
Thanks a lot for your work, both @bergio13 and @Tortar. I agree that this looks great, and it is very close to finishing! However, I would really like to have a final in-depth look before that. The only problem is that I am currently under a lot of pressure from my main job and therefore lack time. I will try to work on this on the coming weekend if that's okay. Here and there during the evenings I will be adding comments to the review (you won't see them until I submit). The PR is approved, and as far as I can tell @bergio13 had a great GSoC project!
Thanks to you and to @Tortar for your help throughout this GSoC!
Just letting you know that I have started working on this locally. I am adding review comments little by little. You won't see them until I submit the review. But there is progress! Hopefully by the end of the weekend I will be done!
All right! I am submitting my review! I have requested plenty of changes, primarily regarding improving documentation and our interface.
There is something that really concerns me: why do we need the user to use all three of Crux, POMDPs, Flux? Please write here a bullet point list justifying why each particular package is required from the user's side. What particular function or type does the user need to give from that package into Agents.jl?
Lastly, we should add a warning/note to `ReinforcementLearningABM` that this feature is experimental and may change in the future without this being considered a breaking change, because when this is tested by more users we may find that some things need to change.
(Also address the comments by @Tortar please.)
We should remove all the figures and videos that we can generate while building the docs. What is the status with the wolf sheep model? Is this very slow to run/train? If yes, we can upload videos and pictures for it here: https://github.com/JuliaDynamics/JuliaDynamics/tree/master/videos/agents
# **Rules:**
# - Agents move on a 2D periodic grid
# - When agents occupy the same position, they may exchange wealth
# - Wealth flows from richer to poorer agents
I am a bit confused about this rule. Wealth flows from richer to poorer? Shouldn't it be the other way around? In the other Boltzmann model we have, the exchange is random with all nearby agents.
This is the same rule that has been implemented in the Mesa example.
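For clarity, a minimal sketch of the rule as stated (wealth moving from the richer to the poorer of two co-located agents); this illustrates the rule under discussion, it is not the PR's code.

```julia
# Illustration only: one unit of wealth moves from the richer agent to the
# poorer one when they occupy the same position
function exchange_wealth!(a, b)
    giver, receiver = a.wealth >= b.wealth ? (a, b) : (b, a)
    if giver.wealth > 0
        giver.wealth -= 1
        receiver.wealth += 1
    end
    return nothing
end
```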
# First, we define the Gini coefficient calculation, which measures wealth inequality.
# A Gini coefficient of 0 represents perfect equality, while 1 represents maximum inequality.

function gini(wealths::Vector{Int})
It will simplify the code if the function `gini` takes a model as input and calculates the wealths internally. I see that in the subsequent functions the wealths are not used anywhere else.
function gini(wealths::Vector{Int})
    n, sum_wi = length(wealths), sum(wealths)
    (n <= 1 || sum_wi == 0.0) && return 0.0
    num = sum((2i - n - 1) * w for (i, w) in enumerate(sort(wealths)))
Suggested change:
- num = sum((2i - n - 1) * w for (i, w) in enumerate(sort(wealths)))
+ num = sum((2i - n - 1) * w for (i, w) in enumerate(sort!(wealths)))
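A hedged sketch combining the two suggestions above (taking the model directly and using `sort!`); the final normalization is the standard Gini formula for sorted values, since the closing lines of the original function are not shown in this excerpt.

```julia
# Sketch of the suggested version: gini takes the model and computes the
# wealths internally
function gini(model)
    wealths = [agent.wealth for agent in allagents(model)]
    n, sum_wi = length(wealths), sum(wealths)
    (n <= 1 || sum_wi == 0) && return 0.0
    num = sum((2i - n - 1) * w for (i, w) in enumerate(sort!(wealths)))
    return num / (n * sum_wi)
end
```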
trained_policies::Dict{Type,Any}
training_history::Dict{Type,Any}
is_training::Base.RefValue{Bool}
current_training_agent_type::Base.RefValue{Any}
Why is this of type `Any` instead of `A`?
2. Alternate training batches between agent types
3. Each agent learns against the evolving policies of others

## Arguments
delete this!
## Arguments
- `model::ReinforcementLearningABM`: The model containing agents to train. Must have RL
  configuration set via `set_rl_config!` before training.
- `agent_types`: Single agent type (e.g., `MyAgent`) or vector of agent types (e.g.,
`agent_types` SHOULD NOT be an input to this function! If this information is already present in the `rl_config`, why do we ask the user to input the same information here as a duplicate? No need. We obtain it from the `rl_config`. Please reduce this function to a two-argument version, with the first argument being the model and the second argument the training mode.
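A rough sketch of the reduced signature being requested; the name `train!` and the `:agent_types` config key are placeholders, since the actual names are not visible in this excerpt.

```julia
# Placeholder sketch of the requested two-argument form (names hypothetical):
# agent types come from the RL configuration stored in the model, not the caller
function train!(model::ReinforcementLearningABM, training_mode; kwargs...)
    config = getfield(model, :rl_config)[]
    agent_types = config[:agent_types]   # assumed config key
    # ... run the training loop over agent_types according to training_mode ...
end
```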
```

## Returns
Move this statement to the very top of the function as a new sentence. No need to say what the function returns. Just say that the policies are stored in the model and can be accessed via `get_trained_policies(model)` or copied to other models using `copy_trained_policies!(target, source)`.
You SHOULD NOT write `model.trained_policies` anywhere, as the users are FORBIDDEN from accessing model fields! The syntax `model.property` is reserved for accessing auxiliary user-provided model properties.
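A short usage sketch of the two accessors named above (assuming a second model instance to copy into):

```julia
# Read the trained policies stored in the model (instead of touching fields)
policies = get_trained_policies(model)

# Reuse them in another model instance
copy_trained_policies!(other_model, model)
```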
or copied to other models using `copy_trained_policies!(target, source)`.

## Notes
- `max_steps` is read directly from the RL configuration (`model.rl_config[][:max_steps]`)
Why? This is odd. Why is this one training option part of `rl_config` while all the others are given as keywords to the training function? We need to make up our mind instead of fragmenting these kinds of decisions. I vote we move this training option to yet another keyword of this function and remove it from `rl_config` altogether.
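That is, something along these lines (the function name `train!` is again a placeholder):

```julia
# Sketch of the proposal: max_steps passed as a keyword of the training call
# rather than read from an rl_config entry
train!(model, training_mode; max_steps = 100)
```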

## Notes
- `max_steps` is read directly from the RL configuration (`model.rl_config[][:max_steps]`)
- Episode termination is controlled by the RL environment wrapper using the config value
I don't understand what this sentence means. What is an "RL environment wrapper"?
You are right, it's not very clear. I was referring to the wrapper I use to allow the model to work with POMDPs.jl. I will modify the sentence and make it clearer; I can probably just remove it.
@Datseris thank you for your suggestions! I will implement them as soon as I have more free time!
Fixes #648
Initial draft for the integration of reinforcement learning methods within Agents.jl. It is a working sketch still to be polished, refined and improved. The file `rl_interface` contains the code that allows training the agents of an ABM using reinforcement learning. Examples of how to use this interface are provided in `rl_interface_examples`. These examples can be compared with the ones implemented without the interface (see `boltzmann_local` and `wolfsheep`) to see how much the interface simplifies the code the user needs to implement.