Skip to content

Conversation

ntfshard
Copy link
Contributor

@ntfshard ntfshard commented Jun 14, 2025

🦟 Bug fix

Fixes #

Summary

MeshManager is a singleton which handle all meshes. Therefore it should be potentially accessible from a different treads, like simulation workers or callbacks. But previous synchronization implementation was not fully correct. We should protect not a single mesh in Load method, as previous comment mentioned, but whole unordered_map container.

  • Here I propose fix based on shared_mutex. Due to we want to allow reading a state simultaneously but changing state could lead to a problems.
  • Spotted potential memory leak in AddMesh function: old version could ignore/lose object passed by user if mesh with same name already exist, but in method description mentioned that it will take an ownership
  • Propose to replace raw pointers in unordered_map to unique_ptr for simplicity/safety.

Also I want to discuss problem of staling cache for meshes. I'll leave comment in code to start thread.

Checklist

  • Signed all commits for DCO
  • Added tests
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • Consider updating Python bindings (if the library has them)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage)
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers
  • Was GenAI used to generate this PR? If so, make sure to add "Generated-by" to your commits. (See this policy for more info.)

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by and Generated-by messages.

Signed-off-by: Maksim Derbasov <[email protected]>
Comment on lines +143 to 146
if (iter != this->dataPtr->meshes.end())
{
return this->dataPtr->meshes[_filename];
return iter->second.get();
}
Copy link
Contributor Author

@ntfshard ntfshard Jun 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned problem of staled cache is here.
Due to Server and GUI is a different processes, it's impossible(AFAIU) to share in-memory created mesh between 2 of them. (For example it could be for soil shape or something what we want to deform)

So if Server plugin creates Model through ECM, it can specify file as an input (plugin could create and update mesh file periodically). But we have to pick a new name every time and the new meshes will accumulate in server and GUI processes (yes, we can remove from server side, but GUI is detached). We can workaround by adding GUI plugin which should remove old meshes, but it's quite dirty hack.

So. It could be convenient if we could update a mesh if file changed. But I need some clarification/approval to avoid breaking whole design.
Problems which I potentially see: user (of MeshManger) saved pointer to mesh which can be destroyed. Maybe we can replace only guts of mesh and keep pointer intact (but should we?).

In general, it's hard to track lifetime with such API, but redesigning/rewriting API is not very fast procedure. (also seems this class could have a good cleanup for API, but it's out of context)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tidying up the code. It was ported from old gazebo-classic and it's overdue for cleanup / redesign.

So. It could be convenient if we could update a mesh if file changed. But I need some clarification/approval to avoid breaking whole design.

To clarify, is this the use case you're describing? e.g. MeshManager loads a common::Mesh called my_mesh_1 from my_mesh.glb. Sometime later, my_mesh.glb gets updated on disk, and it would be nice to be able load the new my_mesh.glb into my_mesh_1 again instead of adding another new mesh to the map?

I think that's fine. We could add an extra arg to Load, e.g. Load(const std::string &_filename, bool _forceReload) so that it forces the mesh manager to reload the file and updates the mesh stored in the map. Note this extra arg however break ABI so would need a slightly different approach for gz-common6.

On the other hand, I see that you modified AddMesh to support replacing a mesh. I would lean towards having a separate API to explicitly do this, e.g. UpdateMesh.

Copy link
Contributor Author

@ntfshard ntfshard Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, is this the use case you're describing?

My case: we have plugin which calculates new mesh for a soil shape with some frequency. And to be visible in GUI part and visible for sensors on server part we have to re-instantiate it both processes.
From a plugin we can do it only by removing model and adding new model (maybe we can limit to visual but it's the same idea). And changes in ECM will be propagated to GUI process. BUT due to MeshManager on other side already loaded mesh with same name, it won't be reloaded. As I said, we can use other filename, but it will lead to memory bloat in MeshManager.

Interactive use case: User making model of some object in Blender. Exports it in file. Loads it as a mesh to simulator and decide to update it in Blender one more time, saves it as a file with same name and trying to load it: and instead of new model they can see old version. That's very strange user experience
And here if we check file modification time in load call, we can spot this case and do not show old mesh to user

_forceReload doesn't make sense. You can just access singleton to remove old and load new one.

On the other hand, I see that you modified AddMesh to support replacing a mesh. I would lean towards having a separate API to explicitly do this, e.g. UpdateMesh.

As I said, previous implementation had a potential flaw: memory leak if model exist. And due to comment in header file clearly explaining that we are taking ownership of object we have only 2 options: replace or destroy new. And if user somehow want to load new, maybe it will be convenient to replace.

So, let me rephrase my problem in a different way: from user(real human or plugin) we don't have too much ways how to control some aspects of simulator. If MeshManager could automatically sync from Server to GUI process I could just create mesh from memory, add it to ECM and be happy (and delete it from ECM, add it back for update or something).

And out of context comment, you can have 2 API calls which won't break an ABI.

Copy link

@mderbaso-deepx mderbaso-deepx Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it could be much cleaner to redesign api around or shared_ptr, or very abstract ID type (like entity_id). But it will take 1 major release to get rid of current version and I want to aligned vision with maintainers on such step (not in this PR)

(sorry, work account of ntfshard)

@ntfshard
Copy link
Contributor Author

Also maybe I'm targeting this patch to a wrong branch. I confuse it all the time

auto iter = this->dataPtr->meshes.find(_mesh->Name());
if (iter != this->dataPtr->meshes.end())
{
gzdbg << "MeshManager::AddMesh replaced: " << _mesh->Name();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
gzdbg << "MeshManager::AddMesh replaced: " << _mesh->Name();
gzdbg << "MeshManager::AddMesh replaced: " << _mesh->Name() << std::endl;

or remove if the debug statement if it is not intended to be left here?

Copy link
Contributor Author

@ntfshard ntfshard Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intended, as I said earlier before we had a memory leak here in this case. Right now maybe it could be convenient to notify user that model was replaced, maybe they have some logic which is loading each time instead on once. And for persons who expected that model will be reloaded, I expect to see a ticket what this function not worked(before)

In other words, leaving (even potential) memory leak is not a great solution in my opinion. We can destroy new model and write message about it. Just want to have some agreement about system behaviour

I'll apply your proposed changes slightly later today

Comment on lines +143 to 146
if (iter != this->dataPtr->meshes.end())
{
return this->dataPtr->meshes[_filename];
return iter->second.get();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tidying up the code. It was ported from old gazebo-classic and it's overdue for cleanup / redesign.

So. It could be convenient if we could update a mesh if file changed. But I need some clarification/approval to avoid breaking whole design.

To clarify, is this the use case you're describing? e.g. MeshManager loads a common::Mesh called my_mesh_1 from my_mesh.glb. Sometime later, my_mesh.glb gets updated on disk, and it would be nice to be able load the new my_mesh.glb into my_mesh_1 again instead of adding another new mesh to the map?

I think that's fine. We could add an extra arg to Load, e.g. Load(const std::string &_filename, bool _forceReload) so that it forces the mesh manager to reload the file and updates the mesh stored in the map. Note this extra arg however break ABI so would need a slightly different approach for gz-common6.

On the other hand, I see that you modified AddMesh to support replacing a mesh. I would lean towards having a separate API to explicitly do this, e.g. UpdateMesh.

@ntfshard ntfshard requested a review from iche033 June 19, 2025 03:45
Signed-off-by: Maksim Derbasov <[email protected]>
@ntfshard ntfshard force-pushed the meshmanagerconcurrency branch from 2493775 to 51db501 Compare June 19, 2025 16:37
Comment on lines 69 to -70
imgFormat == common::Image::PixelFormatType::BAYER_GRBG8 ||
imgFormat == common::Image::PixelFormatType::BAYER_GRBG8 ||
Copy link
Contributor Author

@ntfshard ntfshard Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same line twice
(re-pushed commit due to CI on Windows complained about temp dir)

@ntfshard
Copy link
Contributor Author

ubuntu22-screen0.webm

I filmed this bug with updating mesh and loading it through GUI. Sorry for strange compression, but seems VirtualBox compressed file too well
script:

  • (till 0:20 -- I'm trying to guess folder; lol)
  • in docker in /mesh we have 2 files: cube.stl and head.stl;
  • copy cube.stl to /a.stl;
  • run simulation for empty world;
  • copy head.stl to /a.stl, load /a.stl, still cube;
  • restart sim, load /a.stl -> head

Also reproducible if remove it from scene and try to upload; due to lifetime of mesh is unknown for ECM, we can't remove mesh due to maybe it's still in use in other place

@ntfshard ntfshard closed this Jun 25, 2025
@github-project-automation github-project-automation bot moved this from Inbox to Done in Core development Jun 25, 2025
@ntfshard ntfshard reopened this Jun 25, 2025
@github-project-automation github-project-automation bot moved this from Done to Inbox in Core development Jun 25, 2025
@ntfshard ntfshard closed this Jul 1, 2025
@ntfshard ntfshard reopened this Jul 1, 2025
@github-project-automation github-project-automation bot moved this from Inbox to Done in Core development Jul 1, 2025
@github-project-automation github-project-automation bot moved this from Done to Inbox in Core development Jul 1, 2025
@ntfshard ntfshard closed this Jul 2, 2025
@github-project-automation github-project-automation bot moved this from Inbox to Done in Core development Jul 2, 2025
@ntfshard ntfshard reopened this Jul 2, 2025
@github-project-automation github-project-automation bot moved this from Done to Inbox in Core development Jul 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Inbox
Development

Successfully merging this pull request may close these issues.

3 participants