
Conversation


@chillenzer chillenzer commented Dec 12, 2025

Species Refactoring

This PR is a complete rewrite of our picmi.Species class and the related protocols. It is somewhat related to #5500, although not quite aligned with the original vision of how that effort would proceed. This PR removes PyPIConGPU as an officially supported interface and removes all pypicongpu unit tests.

New Species protocol

The previous protocol for handling species in PyPIConGPU/PICMI had several significant drawbacks. The two main ones are:

  • It was based on a stateful InitManager instance that could not be re-used after having been "baked" once.
  • It was implemented on the PyPIConGPU level while it semantically belongs in our implementation of the PICMI level.

The main problem that this construction tried to address is the fact that some dependencies are inverted between the PICMI level and the PyPIConGPU level. For example, on the PICMI level the Species contains the initial_distribution, but on the PyPIConGPU level we have a SimpleDensity operation that contains a (list of) Species. Also, various entities in the code impose requirements on the Species, which it must be informed about.
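To illustrate the inversion with schematic code (the names and signatures here are approximations of the classes mentioned above, not the exact API):

```python
# PICMI level: the species owns its initial distribution ...
electrons = picmi.Species(
    particle_type="electron",
    name="electrons",
    initial_distribution=uniform_distribution,  # schematic
)

# ... while on the PyPIConGPU level the dependency is inverted:
# the density operation owns a (list of) species.
operation = SimpleDensity(
    profile=uniform_profile,         # schematic
    species=[pypicongpu_electrons],  # produced later via get_as_pypicongpu()
)
```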

The new species protocol is as follows:

  • The picmi.Species provides a register_requirements public method. Consumers of the species are allowed to call this before any get_as_pypicongpu() calls to register requirements such as "you need to have a mass (optionally: of value =...)". They should typically do so in their __init__.
  • Requirements are given in the form of a list containing
    • Any directly instantiable PyPIConGPU class (in practice, only Constants and Attributes are currently considered)
    • A DelayedConstruction of any PyPIConGPU class that cannot be instantiated directly. This typically happens when that class depends on a species itself.
    • A DependsOn(species=...) instance which (hopefully unsurprisingly) signals a dependency on another species (e.g. an ion species declares DependsOn(species=electrons) for the electrons it emits upon ionisation).
  • At the time of calling Simulation.get_as_pypicongpu, all information is available by definition, and a partial ordering is defined via DependsOn (cyclic dependencies are currently not checked). The translation can call get_as_pypicongpu on any entity, including the Species instances. This call is stateless, i.e., it can be repeated and the order of execution doesn't matter. Thus, any entity can independently construct any PyPIConGPU class instances it needs.
  • Afterwards, there's a global operation on all initialisation operations to fulfill some inter-species relationships that exist in the PyPIConGPU layer for no apparent reason.
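To make the protocol concrete, here is a minimal sketch of how a consumer might register requirements. The names Mass and Ionizer (and the exact call signatures) are illustrative assumptions, not the PR's actual classes:

```python
electrons = picmi.Species(particle_type="electron", name="electrons")
ions = picmi.Species(particle_type="N", name="nitrogen", charge_state=2)

# A consumer of the species (e.g. an ionisation model) registers its
# requirements, typically in its __init__, before get_as_pypicongpu() is called:
ions.register_requirements(
    [
        Mass(),                        # directly instantiable constant (hypothetical name)
        DelayedConstruction(Ionizer),  # needs the species itself, so construction is delayed
        DependsOn(species=electrons),  # the emitted electrons must exist as a species
    ]
)

# At translation time, this call is stateless and repeatable:
pypicongpu_ions = ions.get_as_pypicongpu()
```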

As a side remark, one should note that it is as of now undecided whether register_requirements should be considered public interface. It is designed as an implementation detail of our internal species handling. But it might provide an unintended resolution to some feature requests from @pordyna and @PrometheusPi. Their experience with the new protocol will show whether the user experience is sufficiently intuitive to support its public usage. It is unlikely to make it into the PICMI standard.

For the documentation sprint:

  • Write doc strings and documentation about the new species protocol.

Dropping PyPIConGPU as an officially supported interface

I'm dropping PyPIConGPU as an officially supported interface. Its design goal was to be a Python representation of the input parameters of PIConGPU. It currently fails to meet this goal for no apparent reason, but it seems quite natural to move further in that direction. This would also resolve some of the above-mentioned unnecessary inter-species relationships. As we are also moving to Pydantic, this layer will literally contain no logic in itself, just a plain representation of data. In that state it will need no testing (beyond for typos, which we can already test through PICMI). Once we have reached this state, it will be trivial to add this as an official interface again. For now, supporting, testing and thereby fixing its interface is a maintenance burden because there are thousands of lines of code that need adapting to the new interface. I have removed all PyPIConGPU unit tests. We are now free to change any behaviour unobservable from within PICMI.

In a future refactoring:

  • Revisit the test cases and checks removed by this and re-implement their invariants as part of Pydantic validation.

Other fixes and features

Some minor fixes, features, etc. sneaked in because they had been on my to-do list for quite a while:

  • I have removed skipped unit tests in the PICMI layer. Those were tests for classes from the PICMI standard that we don't actually implement (yet).
  • I have fixed the directory handling of the pypicongpu.Runner. It used to just cd around at its liking, which internally messed up the working directory and made it hard to reason about where exactly something was executed (particularly while debugging). I have provided a context manager that makes sure to always return to where it came from after changing directory for, e.g., running pic-build (see the sketch after this list).
  • Fixed a warning in copy_attributes.py: inspect.getmembers accesses __fields__, which is deprecated in Pydantic v2. I'm now checking for pydantic.BaseModel and using the model_fields property instead in that case (also sketched after this list).
  • Fixed the broken computation of the typical ppc. This is a natural result of the refactoring I've done, but it's worth noting because it will generally lead to larger values of the typical ppc.
  • Added the method attribute to picmi.Species according to the PICMI standard. It indicates the pusher method used for this particle species. Our species class should now be standard-compliant.
  • PICMI diagnostics are BaseModels now.
  • I have refrained from doing too much refactoring towards the transition of PyPIConGPU to Pydantic (#5500) but I have performed some simplifications in that direction.
  • Removed the Auto diagnostic.
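For the Runner fix, a minimal sketch of such a context manager (the helper name _working_directory is hypothetical; Python 3.11+ also ships contextlib.chdir for exactly this purpose):

```python
import os
from contextlib import contextmanager

@contextmanager
def _working_directory(path):
    # Change into `path` for the duration of the block and always return
    # to the previous working directory, even if, e.g., pic-build fails.
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)
```

And the gist of the copy_attributes.py workaround (a sketch, not the literal code):

```python
import inspect
import pydantic

def _members_of(obj):
    # inspect.getmembers touches `__fields__`, which is deprecated on
    # Pydantic v2 models, so branch on BaseModel and use the class-level
    # `model_fields` mapping instead.
    if isinstance(obj, pydantic.BaseModel):
        return type(obj).model_fields
    return dict(inspect.getmembers(obj))
```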

Testing status

I have performed the following testing:

  • The (remaining) quick and compiling tests pass (and will be checked in CI again).
  • The end-to-end tests pass (and will not yet be checked in CI). Please run them again during review. They should run on any machine with a working PIConGPU environment including openPMD.
  • I have diffed the generated setup from the LWFA example against its mainline/dev output. The result is identical up to whitespace and the two known features/fixes (typical ppc and pusher method).

Implementation details

The main algorithm of requirement resolution is implemented in evaluate_requirements in picmi/species.py. It basically filters for the type of requirement it's looking for, evaluates/resolves/??? that filtered collection, and runs the delayed constructions. That intermediate "evaluates/resolves/???" step is a nasty, big $\mathcal{O}(N^2)$ for-loop that, on every pair of requirements,

  • checks for conflicts
  • tries to merge them (for example, different forms of ionisation behaviour are bundled into one instance)
  • drops unnecessary duplicates.

These steps are implemented as traits (free functions that run some default behaviour unless the given object(s) override it).
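In pseudocode, that intermediate step looks roughly like this (reconstructed from the description and the excerpt quoted in the review below; not the literal implementation):

```python
def _resolve_pairwise(requirements):
    result = []
    for lhs in requirements:
        append_this = True
        for rhs in result:
            # trait: fail early with a clear error message on contradiction
            check_for_conflict(lhs, rhs)
            # trait: merge lhs into rhs (e.g. bundle ionisation behaviours)
            # or swallow lhs if it is an unnecessary duplicate
            if try_update_with(rhs, lhs):
                append_this = False
                break
        if append_this:
            result.append(lhs)
    return result
```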

The DelayedConstruction is an interesting object; make sure you read its docstring. It is my solution to the conflict of requiring the freedom of arbitrary functions while at the same time having enough information/metadata about what is run to meaningfully reason about the intent, uniqueness, etc. encapsulated in that instance.
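The core idea, as a hedged sketch (field and method names are assumptions based on the description and the metadata.kwargs access visible in the review excerpts below):

```python
from dataclasses import dataclass, field

@dataclass
class DelayedConstructionSketch:
    """Pairs a not-yet-runnable construction with inspectable metadata."""

    cls: type  # the PyPIConGPU class to construct later
    kwargs: dict = field(default_factory=dict)  # metadata to reason about intent/uniqueness

    def construct(self, species):
        # The species only becomes available at Simulation.get_as_pypicongpu()
        # time, which is why the construction has to be delayed.
        return self.cls(species=species, **self.kwargs)
```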

For review

Just despair. There's no hope! It's thousands of lines of code and you're never gonna make it!
...
Just kidding.

To the extent applicable to your situation:

  • Please make sure that you run the end-to-end tests. (They are not yet run in CI.)
  • Please run the examples.
  • Please run any of your production cases and compare the output of mainline/dev with this branch. A convenient way of doing so for me was:
```bash
git checkout mainline/dev
# run picmi script with `write_input_file("expected")`
git checkout -
# run picmi script with `write_input_file("new_branch")`
for file in $(ls {expected,new_branch}/include/picongpu/param/*.param | sed 's|^[^/]*/||g' | sort | uniq); do
    echo $file && diff {expected,new_branch}/$file;
done
```

I'd not primarily look at the new code in GitHub's diff view because most of it is a complete rewrite and there's little meaning to the diff. Start exploring it in your normal editor; jump around and get a feel for the code. Start from the picmi.Simulation class, specifically its get_as_pypicongpu method. Following the control flow will lead you through the algorithm of requirement resolution. Check the species-internal requirements added from picmi.Species.__init__ and the ionisation-related changes for examples of registering requirements. Afterwards you'll have covered the main functional code changes.

Back in the GitHub diff view, the remainder of the changes will hopefully feel intuitive and self-explanatory as part of one of the following categories:

  • Necessary changes to adopt the new protocol
  • Removal of the old protocol
  • Removal and/or adaption of the tests
  • Any item from the list of minor fixes and features

There's certainly plenty of room for bikeshedding. Some names and placements of code have not yet received the kind of careful consideration that they might deserve. Feel free to speak up about those. Some of the major ones I'm aware of:

  • _make_unique in species_requirements.py is poorly named, but I'm struggling to come up with a better name because it carries quite an overload of responsibilities. That's probably a code smell, but it is modelling complex correlations.
  • There's surely a point to be made for moving the provided specialisations of DelayedConstruction closer to their use cases. Currently, they are all bundled up in species_requirements.py. I was uncertain how much re-use to expect across the code base when I started out, so I put them all into one central place. That will not scale well, I'm afraid.
  • _DensityImpl is poorly named and badly placed.

I have not cleaned up the git history in any particular way, and I'm uncertain what would be a safe and intuitive way to group these changes. It is admittedly quite a messy history, but I'm afraid that the risk of introducing bugs by trying to disentangle and squash individual aspects into a small number of commits is quite high. And a single commit of ~8 kloc of changes might also not be what we want.

@chillenzer chillenzer changed the title WIP: Species refactoring WIP: PICMI Species handling refactoring Dec 15, 2025
@chillenzer chillenzer changed the title WIP: PICMI Species handling refactoring PICMI Species handling refactoring Dec 15, 2025
@chillenzer chillenzer marked this pull request as ready for review December 15, 2025 21:50
@chillenzer
Contributor Author

@PrometheusPi @pordyna @ikbuibui Should be ready for full review now. Please note (and read!) the updated description.

@chillenzer chillenzer mentioned this pull request Dec 18, 2025

@pordyna pordyna left a comment


Partial review; I just went over the files in my IDE, as suggested. I still need to look deeper into it.

```python
append_this = True
for rhs in result:
    check_for_conflict(lhs, rhs)
    merge_success = try_update_with(rhs, lhs)
```
Member

What if try_update_with updates rhs into a state that changes the outcome of check_for_conflict that was already checked?

Contributor Author

Interesting question. Let's make an assumption to start off: try_update_with does not create new conflicts by itself, i.e., any conflict that updated-rhs has was already present in original-rhs or lhs. We can revisit that assumption later.

Control flow:

  • Starting point: result contains a partial list and (the instance that in your question will be) rhs is supposed to be added to result.
  • rhs checks for conflicts with everything already in result and doesn't find any.
  • result gets filled up further and every new entry checks for conflicts with the already added rhs.
  • lhs is supposed to be added and checks for conflicts with everything in result up to rhs, where it is merged.
  • result is filled up to completion while every new addition checks against updated-rhs for conflicts.
  • Finally, we arrive at result = [..., updated-rhs, ..., <position where lhs would have been added>, ...].

This creates the following checking situation:

  • rhs was checked against everything because
    • original-rhs was checked against everything left of the would-be-lhs position,
    • updated-rhs was checked against everything right of updated-rhs,
    • we assumed that try_update_with doesn't create new conflicts.
  • lhs was checked against everything left of updated-rhs and right of the would-be-lhs position because
    • it was checked against everything left of updated-rhs before merging with rhs,
    • updated-rhs was checked against everything right of the would-be-lhs position.

This leaves a hole between updated-rhs and the would-be-lhs position that lhs was never checked against. This hole could be closed by removing the short-circuiting.

Is that assumption reasonable? We can just make it a requirement and revisit it if that turns out to be wrong. Checking for conflicts is in any case more of a convenience feature: if there are conflicts, it would typically blow up at some point anyway. The main functionality of check_for_conflict is to fail early with a clear error message instead of something cryptic about ambiguous linking or the like during compilation.

What are your thoughts about this?

Contributor Author

Yes, I'm really not happy about this function. It's a difficult protocol.

Contributor Author

A different approach would be to maintain a list of possible actions, conflicts, etc. that we apply to the requirements:

```python
# Each of these returns a new list.
# The agreement is that they act on all pertinent entries
# and leave the rest as-is.
_RESOLUTION_ACTIONS = [
    # Checks for conflicts before dropping:
    DropDuplicateConstantsAndAttributes,
    MergeSimpleDensities,
    MergeIonizationModels,
    ...
]

# I'd probably rename this to resolve_requirements or the like
# but for the sake of consistency I'm sticking with _make_unique for now:
def _make_unique(requirements, actions=_RESOLUTION_ACTIONS):
    if len(actions) == 0:
        return requirements
    return _make_unique(actions[0](requirements), actions[1:])
```

Advantages:

  • With this approach it is certainly easier to understand what actions are performed to resolve the requirements.
  • The process is stateless, and the style is functional, which I like.

Disadvantages:

  • Customisation of this pipeline would be centralised, i.e., if anybody wants to add a new action they'd have to hack something into this list. (Also, we've got a mutable default argument, which is always dangerous.)
  • I don't like relying on a global constant.
  • One could instead hard-code the actions one after another into the function body, but that's even less customisable.

Contributor Author

Further designs with the spirit of shifting responsibilities to an earlier stage:

  • Species.register_requirements could already run the checking, the merging, etc. That would look similar to one of the above, but it would be one loop each time a new element is registered instead of one big $\mathcal{O}(N^2)$ loop at the end. That doesn't sound like a significant gain but would already simplify things a little.
  • We could expose the Species._requirements variable and allow registration via direct modification of it. The registering class would then be responsible for keeping the requirements in a consistent state.
  • Species.register_requirements could let the freshly registered requirement handle this:
```python
# trait to allow custom actions to be performed and provide some defaults otherwise:
def resolve(new_req, requirements):
    if hasattr(new_req, "resolve"):
        return new_req.resolve(requirements)
    if isinstance(new_req, Constant):
        if not is_unique_in(new_req, Constant):
            raise ValueError(...)
        return requirements + [new_req]
    ...

def register_requirements(self, new_requirements):
    for req in new_requirements:
        self.requirements = resolve(req, self.requirements)
```

Contributor Author

@pordyna @PrometheusPi You might want to add opinions on this design decision.

Contributor Author

I'll look it up in the GoF again at home.

Contributor

I'll take a closer look. As an aside, for some interesting reading: this problem reminds me of this paper.

Contributor

Can the check_for_conflict(lhs, rhs) simply be moved into a separate loop after the merging is done with try_update_with(rhs, lhs)? Is the check before merging somehow different from after merging? Is one check at the end not sufficient?


@PrometheusPi PrometheusPi left a comment


There are some minor requests for changes and several questions, some of which might indicate further changes.

I did not yet review:

  • lib/python/picongpu/pypicongpu/simulation.py
  • lib/python/picongpu/picmi/species_requirements.py
  • lib/python/picongpu/picmi/species.py
  • lib/python/picongpu/picmi/simulation.py

and will do this (hopefully) tomorrow.

Comment on lines +232 to +237
```python
return (
    isinstance(other, SimpleDensityOperation)
    and other.metadata.kwargs["profile"] == self.metadata.kwargs["profile"]
    and other.metadata.kwargs["layout"] == self.metadata.kwargs["layout"]
    and (self.metadata.kwargs["species"].extend(other.metadata.kwargs["species"]) or True)
)
```
Contributor

I really find this syntax confusing every time I see it.


@chillenzer chillenzer Jan 6, 2026


It is admittedly a tiny bit code-golfy that there is a mutating operation in there. Would you be more comfortable with an explicit form?

Suggested change

```python
return (
    isinstance(other, SimpleDensityOperation)
    and other.metadata.kwargs["profile"] == self.metadata.kwargs["profile"]
    and other.metadata.kwargs["layout"] == self.metadata.kwargs["layout"]
    and (self.metadata.kwargs["species"].extend(other.metadata.kwargs["species"]) or True)
)
```

becomes

```python
if (
    isinstance(other, SimpleDensityOperation)
    and other.metadata.kwargs["profile"] == self.metadata.kwargs["profile"]
    and other.metadata.kwargs["layout"] == self.metadata.kwargs["layout"]
):
    self.metadata.kwargs["species"].extend(other.metadata.kwargs["species"])
    return True
return False
```

Member

@ikbuibui what is your opinion on @chillenzer's suggestion?

Comment on lines 205 to 209
```python
picmistandard.PICMI_Simulation.__init__(self, **keyword_arguments)

if picongpu_typical_ppc is not None and picongpu_typical_ppc <= 0:
    raise ValueError(f"Typical ppc should be > 0, not {self.picongpu_typical_ppc=}.")
```

Contributor

Does PICMI_Simulation.__init__ modify picongpu_typical_ppc? If not, maybe the check should be performed before the base-class init.

Contributor Author

No.

Fine with me. Do you have a particular reason?

Contributor Author

Done


Comment on lines 457 to 458
```python
mi, ma = reduce(lambda lhs, rhs: (min(lhs[0], rhs), max(lhs[1], rhs)), iterable, (1000, 0))
return (ma - mi) // 2 + mi
```
Contributor

What is the (1000, 0)? And why the integer division?

Contributor Author

(1000, 0) is the neutral element of the minmax operation on the interval [0, 1000]. You might have a follow-up question like: "Why on earth would you want to put 'the neutral element of the minmax operation on the interval [0, 1000]' in there?" To which I reply: "Okay, fine! I'll choose a less arbitrary range like (-np.inf, np.inf)." On a more serious note: this is currently used for the typical ppc, so the values under consideration are on the order of 1-10 and they are all integers. The latter made me try not to reach for floats here.

Integer division because I want to have an integer. Maybe I should rename that function.

Member

@ikbuibui What is your opinion on @chillenzer's reply? And @chillenzer, what is the status of your "Maybe I should rename that function"?

Contributor Author

I've removed the need for a neutral element by using the first element as the starting point.

I've made the function "_private" and added a small docstring clarifying the intent.
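Concretely, the reworked helper presumably looks something like this (a sketch of the fix as described, not the merged code):

```python
from functools import reduce

def _midpoint_of_range(iterable):
    """Integer midpoint of the range spanned by `iterable` (sketch)."""
    first, *rest = iterable
    # Seeding the reduction with the first element removes the need for a
    # neutral element like the former (1000, 0):
    mi, ma = reduce(lambda acc, x: (min(acc[0], x), max(acc[1], x)), rest, (first, first))
    return (ma - mi) // 2 + mi  # // because the typical ppc is an integer
```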

@chillenzer chillenzer force-pushed the species-refactoring branch from aedc4d4 to ff535dd Compare January 6, 2026 08:54
@chillenzer
Contributor Author

Switched to eager requirement resolution and added some not-at-all-polished tests. We'll have to get back to this and properly reconsider the internal protocol. I think the existing code is sufficiently encapsulated that this won't really affect the other changes.

Check if you're happy now!

@PrometheusPi PrometheusPi self-requested a review January 7, 2026 14:21

@PrometheusPi PrometheusPi left a comment


Some questions that need clarification. Looks otherwise good.


@PrometheusPi PrometheusPi merged commit e48b5d8 into ComputationalRadiationPhysics:dev Jan 9, 2026
10 checks passed

Labels

  • CI:no-compile: CI is skipping compile/runtime tests but runs PICMI tests
  • PICMI: pypicongpu and picmi related
  • refactoring: code change to improve performance or to unify a concept but does not change public API
