@DavidFletcherShef

Updated existing index.rst tutorial file to add a new tutorial for a graph based network example.

Currently this focuses on methods but doesn't tax the GPU very much at all. Maybe something needs adding to show the capabilities with more agents (e.g. dynamically generating a larger network; I haven't yet thought out how to do this).

Although this is a draft pull request some feedback would be useful at this stage.

Member

@Robadob Robadob left a comment

This is quite a thorough example of the graph API. I'm not sure whether it would be better placed as a well-documented example model (we've been mulling restructuring the examples somehow) or on its own tutorial page (e.g. advanced-tutorial/tutorial2 etc).

A few notes on what I expect will be feedback when reviewed by the whole team.

python

This is a pure C++ tutorial. All recent tutorials/guides have been either Python first or C++ & Python, as it's assumed that's where most of the audience now is (despite the limitations of Python).

We'd probably port all the code to Python once finished, so it can be duplicated like the above tutorial.

timenow

You're using raw C++ time and file IO, to enable you to do some manual logging. My feeling is we probably want to build that into the FLAMEGPU API if it's important enough of a use-case to be in a tutorial.

We do have a logging API, which you're using; is there a limitation within it that you're working around?

environment properties

Your use of environment properties to configure the visualisation, rather than the model itself, is interesting. That's not something I think we've considered (I personally rarely use input state files).


This begins with an initialisation routine that will only run in the first modelling timestep. This is here rather than in an initialisation function as it requires access to the graph data which is only available in device functions

Device/agent init has long been something I've wanted to add (FLAMEGPU/FLAMEGPU2#329).

@DavidFletcherShef
Author

Hi Rob,

Thanks for this - I'll come back about these points in a few days as I'm away just now. To address the point about something that taxes the GPU more I've been working on a much larger network (~560 nodes, ~100 links) and getting some errors such as:

terminate called after throwing an instance of 'flamegpu::visualiser::VisAssert' what(): /home/dif/Documents/2025/Flame-Graph-Example/build/_deps/flamegpu_visualiser-src/src/flamegpu/visualiser/Draw.cpp(101): Draw::_save(): Line drawings require an even number of vertices.

terminate called after throwing an instance of 'flamegpu::visualiser::SketchError' what(): /home/dif/Documents/2025/Flame-Graph-Example/build/_deps/flamegpu_visualiser-src/src/flamegpu/visualiser/Visualiser.cpp(120): Lines sketch contains invalid number of vertices (447/3) or colours (596/4).

terminate called after throwing an instance of 'flamegpu::visualiser::SketchError' what(): /home/dif/Documents/2025/Flame-Graph-Example/build/_deps/flamegpu_visualiser-src/src/flamegpu/visualiser/Visualiser.cpp(120): Lines sketch contains invalid number of vertices (69/3) or colours (88/4).

It runs for a while then crashes, but how soon it crashes is not repeatable. The numbers it shows do not correspond to the graph size; it seems to randomly lose some of the vertices and then fail. It happens whether or not there are any train agents simulated or drawn. I recompiled with rc3 today but the error remains. It got worse when I used visualiser.setSimulationSpeed(50); to slow the simulation down while trying to figure out what is happening, so maybe there's some kind of timing issue. Or could there be an issue with exceeding the memory reserved for the graph data? Or it could be that I currently have additional nodes in the graph data that are not yet linked to anything.

I've reproduced it on two different computers with different GPUs. GPU memory was only about 35% used, and GPU utilisation 30%. I will likely need your help to get to the bottom of this as part of completing the tutorial.

@Robadob
Member

Robadob commented Nov 17, 2025

If you can email me the larger network, I'll try and find time to debug it in the next week.

@Robadob
Member

Robadob commented Nov 18, 2025

I've diagnosed the problem.

There's a race condition.

The visualiser is trying to rebuild the graph sent to the visualisation (EnvironmentGraphVisData::constructGraph()) at the same time as the graphs are first linked to the visualisation. When the two overlap, the buffer the visualiser is reading is in an inconsistent state, which produces the various errors you saw.

The visualiser is locking the lines_dynamic_mutex mutex, but it appears I forgot to do the same on the simulation side when the graph is first linked to the visualiser (it is in place for later updates to the graph).

This explains the non-deterministic behaviour! I presume the graph I tested with was much smaller, so the chance of hitting this race was much lower.
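
The fix pattern described above can be illustrated with a minimal, self-contained sketch. It is not the FLAME GPU visualiser code: `LineSketch`, `linkGraph`, `isConsistent` and `countTornReads` are hypothetical names, and only the mutex name is borrowed from the discussion. The validity rule (3 floats per vertex, 4 per colour, an even number of vertices) is an assumption inferred from the error messages quoted earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for line buffers shared by the simulation thread
// and the render thread; not the real visualiser classes.
struct LineSketch {
    std::mutex lines_dynamic_mutex;  // name borrowed from the discussion
    std::vector<float> vertices;     // assumed: 3 floats (x, y, z) per vertex
    std::vector<float> colours;      // assumed: 4 floats (r, g, b, a) per vertex
};

// Simulation side: rebuild the whole buffer while holding the lock, so a
// reader can never observe a half-written graph.
void linkGraph(LineSketch &sketch, std::size_t edge_count) {
    std::lock_guard<std::mutex> guard(sketch.lines_dynamic_mutex);
    sketch.vertices.clear();
    sketch.colours.clear();
    for (std::size_t e = 0; e < edge_count; ++e) {
        for (int end = 0; end < 2; ++end) {  // two vertices per line
            sketch.vertices.insert(sketch.vertices.end(),
                {static_cast<float>(e), static_cast<float>(end), 0.0f});
            sketch.colours.insert(sketch.colours.end(), {1.0f, 1.0f, 1.0f, 1.0f});
        }
    }
}

// Render side: take the same lock before inspecting the buffers.
bool isConsistent(LineSketch &sketch) {
    std::lock_guard<std::mutex> guard(sketch.lines_dynamic_mutex);
    const std::size_t vertex_count = sketch.vertices.size() / 3;
    return sketch.vertices.size() % 3 == 0
        && sketch.colours.size() == vertex_count * 4
        && vertex_count % 2 == 0;
}

// Runs a writer thread against a reader loop and counts inconsistent reads.
int countTornReads(std::size_t edge_count, int iterations) {
    assert(iterations > 0);
    LineSketch sketch;
    int torn = 0;
    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i) linkGraph(sketch, edge_count);
    });
    for (int i = 0; i < iterations; ++i) {
        if (!isConsistent(sketch)) ++torn;
    }
    writer.join();
    return torn;
}
```

Because both sides hold the same mutex, `countTornReads` should report zero inconsistent reads. Removing the `lock_guard` from `linkGraph` would recreate the kind of intermittent failure described here, with the odds of hitting it growing with the graph size.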

I believe I've resolved it with a small patch to the main FLAMEGPU2 repository. GitHub is currently returning internal server error on push, so I can't create a pull request just yet.

@Robadob
Member

Robadob commented Nov 18, 2025

Finally got through the GitHub outage.

You can find the PR here: FLAMEGPU/FLAMEGPU2#1350

You should be able to test it locally by reconfiguring CMake with

cmake -DFLAMEGPU_VERSION=bugfix_graph_vis_race

If CMake isn't happy with changing branch, you may need to try it in a clean build directory and pass that upfront with the rest of the config args.

@DavidFletcherShef
Author

Using -DFLAMEGPU_VERSION=bugfix_graph_vis_race and rc4 this error has now gone - thank you. Running the simulation for longer has enabled me to spot some things wrong in my tutorial code. I've worked out the updates needed (removing a confusion between graph vertex_index and vertex_id) and will clean up the tutorial code with a new commit shortly.
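
For readers unfamiliar with the distinction, here is a generic sketch of how a vertex ID and a vertex index can differ. It assumes IDs are arbitrary user-assigned labels and indices are dense storage positions. `VertexTable` and its methods are hypothetical and are not the FLAME GPU graph API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Hypothetical vertex table: IDs are user-chosen labels, indices are the
// positions at which vertices happen to be stored.
class VertexTable {
 public:
    // Appends a vertex and returns the index it was stored at.
    std::size_t add(unsigned int id) {
        if (id_to_index.count(id) != 0) {
            throw std::invalid_argument("duplicate vertex id");
        }
        const std::size_t index = ids.size();
        ids.push_back(id);
        id_to_index.emplace(id, index);
        return index;
    }
    // Storage position -> label (throws std::out_of_range if invalid).
    unsigned int idAt(std::size_t index) const { return ids.at(index); }
    // Label -> storage position (throws std::out_of_range if unknown).
    std::size_t indexOf(unsigned int id) const { return id_to_index.at(id); }

 private:
    std::vector<unsigned int> ids;
    std::unordered_map<unsigned int, std::size_t> id_to_index;
};
```

With vertices added in the order 101, 205, 17, the vertex with ID 17 lives at index 2. Passing the ID 17 where an index is expected would fall outside the three stored vertices, which is the kind of mix-up that can go unnoticed while IDs and indices happen to coincide.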

@github-actions

github-actions bot commented Dec 1, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@DavidFletcherShef
Author

I have read the CLA Document and I hereby sign the CLA

@DavidFletcherShef
Author

I've made some updates to the Graph Tutorial. Currently I've left it on the same page as the Circles Tutorial, but it could easily be moved to a separate Advanced Tutorial page or something like that.

Python - yes, this needs doing and the tab structure is there so it can be easily inserted. But I'm not a Python expert, so I'll need some help on this.

timenow - I've realised some of this was duplicating functions already present as part of the logging. I was writing the start and end time to screen but have removed that code now. I was going to remove the timenow code completely but giving unique filenames to the log files is very useful to prevent accidental overwrites and to easily identify the files. So I've left that part in.
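
A minimal sketch of the unique-filename idea, using only the C++ standard library. The function name `timestampedFilename`, the `graph_log` prefix and the `.json` extension are illustrative assumptions, not the tutorial's actual code.

```cpp
#include <chrono>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Builds "<prefix>_YYYYMMDD_HHMMSS<extension>" from a time point, so each
// run writes to its own log file instead of overwriting the previous one.
std::string timestampedFilename(
    const std::string &prefix,
    const std::string &extension,
    std::chrono::system_clock::time_point now = std::chrono::system_clock::now()) {
    const std::time_t t = std::chrono::system_clock::to_time_t(now);
    // std::localtime is not thread-safe; that is acceptable for a single
    // call made on the host before the simulation starts.
    const std::tm local = *std::localtime(&t);
    std::ostringstream name;
    name << prefix << '_' << std::put_time(&local, "%Y%m%d_%H%M%S") << extension;
    return name.str();
}
```

Calling `timestampedFilename("graph_log", ".json")` yields a name with a second-resolution timestamp, which can then be passed wherever the log file path is set. Two runs started within the same second would still collide, so a run counter or random suffix could be appended if that matters.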

Environment properties - yes, this and other models I've developed make extensive use of reading configurations from XML config files. In this tutorial it's just configuring the visualisation and that's quite simple. Not relevant to this tutorial, but if you want to use the XML config file to set up aspects of the agents or messaging that's harder as the config file information is not available until reading the file is triggered by flamegpu::CUDASimulation simulation(model, argc, argv);. In other models I've resolved this by creating a temporary CUDASimulation object, using temp_cuda_sim.getEnvironmentProperty to obtain the required values, and then using them to set up agents or messaging for the real CUDASimulation prior to flamegpu::CUDASimulation real_simulation(model, argc, argv);. That seemed easier than developing a separate way to read the XML, but it probably incurs some memory overhead on the host.

@Robadob
Member

Robadob commented Dec 1, 2025

Not relevant to this tutorial, but if you want to use the XML config file to set up aspects of the agents or messaging that's harder as the config file information is not available until reading the file is triggered

Possibly something to discuss with Paul.

I previously drafted a means of mutating the spatial messaging configuration via the HostAPI that a PhD student was after. However, I believe Paul wasn't keen on this functionality as it could be misused in a manner harmful to performance.

There's definitely room for a compromise solution somewhere.
