Conversation
This looks good to me.
TODOs (ideally in the next 2 weeks, i.e., by 27.10):
- Add SCF as a workflow class, like GeomOpt
- Determine a standard for populating SCF and SCF+GeomOpt, ideally using a representative parser
- Align with @ndaelman in the context of #191 and decide what indicators will be present in data.outputs
- Consider backward compatibility and the integration procedure in the parsers
Pull Request Test Coverage Report for Build 18594497000
💛 - Coveralls
The current status of the implementation is:
That's great @kubanmar, thanks for pushing it forward 🙌 Let's see if there is any time next week; if not, it can be wrapped up the following week.
```python
convergence_threshold_unit = Quantity(
    type=str,
    description="""
    Unit using the pint UnitRegistry() notation for the `convergence_threshold`.
    """,
)
```
Note that this won't be compatible with the NOMAD framework for handling units: it won't convert when the user asks for it. Moreover, to fit your description, you would have to alter the type, and even then it won't power unit conversion on request.
The modular handling of units is very hard; many people have given it a shot. The closest we have right now are these dataframes.
Newest updates:
TODO:
I have left some comments in the code where I could need some help.
@kubanmar can you rename this PR so the scope is clearer, and include SCF?
The latest commit provides a working example:
TODO:
@EBB2675 if you want to review/contribute I would be happy; otherwise I will continue working on it after the break.
Pull Request Test Coverage Report for Build 22540757767
💛 - Coveralls
I have now merged develop and fixed formatting and type checks. There are still plenty of TODOs for which I would appreciate help. Most severe is that this is only compatible with NOMAD v1.3, as the newer version generates too many nested subsections for the workflow tasks. If I understood @EBB2675 correctly, this is not a schema/parser-level issue, but can only be addressed on the mapping parser level. Using https://github.com/FAIRmat-NFDI/nomad-parser-plugins-simulation/tree/schema_update_convergence_targets for the parsers, the code now generates the intended data for exciting single-point calculations and geometry optimizations. From my side, this is therefore ready for review.
TODOs:
* refactor convergence target schema
* add convergence test module
* generalize get_convergence_value
* revert Results removal
* rename to threshold_type
* fix convergence test module
* clean and ruff
* ruff format
* fix bool treatment in tests
* fix reference assignment
* doc string format fix
* docstring fixes
* update with parsing refactor
* ruff
Force-pushed from 9aadb92 to 88a7159
Additionally, I recommend removing `residuum` from the enum and instead defining a dedicated class:

```python
class WavefunctionResidualConvergenceTarget(WorkflowConvergenceTarget):
    threshold = Quantity(type=np.float64, unit='coulomb')
    threshold_type = Quantity(...)  # can still use absolute/rms/etc. for the residual
    _convergence_property_path = 'scf_steps.wavefunction_residuals'
```

This maintains the clean semantic separation: class = physical quantity, `threshold_type` = comparison method.
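For context, a dotted `_convergence_property_path` like the one above could be resolved roughly as follows. This is a hypothetical sketch: `resolve_path` and the dict layout are illustrative, not the PR's actual resolver.

```python
def resolve_path(obj, path: str):
    """Walk a dotted path through nested dicts/objects, returning None on a miss."""
    for part in path.split('.'):
        if isinstance(obj, dict):
            obj = obj.get(part)
        else:
            obj = getattr(obj, part, None)
        if obj is None:
            return None
    return obj

# Illustrative outputs section with per-step wavefunction residuals:
outputs = {'scf_steps': {'wavefunction_residuals': [1e-3, 1e-5, 1e-7]}}
residuals = resolve_path(outputs, 'scf_steps.wavefunction_residuals')
```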
```python
if isinstance(value, int | float | np.floating):
    # Scalar value - use absolute or relative
    if conv_type == 'absolute':
        return self._check_absolute(value)
    elif conv_type == 'relative':
        # For relative, child class should provide reference
        logger.warning(
            f'Relative convergence requires reference value in '
            f'{self.__class__.__name__}'
        )
        return None
    else:
        return self._check_absolute(value)
elif isinstance(value, np.ndarray):
    # Array value - can use maximum or rms
    if conv_type == 'maximum':
        return self._check_maximum(value)
    elif conv_type == 'rms':
        return self._check_rms(value)
    elif conv_type == 'absolute':
        # For array, treat as maximum
        return self._check_maximum(value)
    else:
        return self._check_maximum(value)
```
Issue: The `threshold_type` dispatch is coupled to data shape, creating artificial constraints where data shape dictates which comparison methods are valid rather than user intent or physical meaning. Why can't you check `absolute` on each array element, or `maximum` of a scalar?
Additionally, there's silent fallback behavior with no warnings when `threshold_type` doesn't match data shape:
- Parser sets `threshold_type='maximum'` but data is scalar → silently uses `absolute` instead (line 219)
- Parser sets `threshold_type='absolute'` but data is array → silently uses `maximum` instead (line 229)
The `threshold_type` should describe the mathematical operation independent of data shape.
After reflecting more on it: the `threshold_type` enum conflates three orthogonal concerns:

- Data shape: scalar vs. array
- Aggregation method: how to reduce vectors to comparable scalars (`maximum`, `rms`)
- Comparison type: how to compare values (`absolute`, `relative`)

These should be independent. For example:

- You might want `maximum` aggregation of force components with `relative` convergence (current max force compared to previous max force)
- Or `rms` aggregation of energies across a trajectory with `absolute` convergence

The current design forces specific pairings that may not match the physical quantity or convergence criteria desired.

A cleaner design might separate:

```python
aggregation_method = MEnum('scalar', 'maximum', 'rms')  # How to reduce arrays
comparison_type = MEnum('absolute', 'relative')         # How to compare
```
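A sketch of how the decoupled design could dispatch. The function names and signatures are illustrative assumptions, not the PR's API:

```python
import numpy as np

def aggregate(value, method: str) -> float:
    """Reduce a scalar or array to a single comparable number."""
    arr = np.atleast_1d(np.asarray(value, dtype=float))
    if method == 'scalar':
        if arr.size != 1:
            raise ValueError('scalar aggregation expects a single value')
        return float(arr[0])
    if method == 'maximum':
        return float(np.max(np.abs(arr)))
    if method == 'rms':
        return float(np.sqrt(np.mean(arr**2)))
    raise ValueError(f'unknown aggregation method: {method}')

def is_converged(value, threshold, aggregation='scalar',
                 comparison='absolute', reference=None) -> bool:
    """Compare the aggregated value against the threshold."""
    current = aggregate(value, aggregation)
    if comparison == 'absolute':
        return current <= threshold
    if comparison == 'relative':
        if reference is None:
            raise ValueError('relative comparison requires a reference value')
        return abs(current - aggregate(reference, aggregation)) <= threshold
    raise ValueError(f'unknown comparison type: {comparison}')
```

Any aggregation can then pair with any comparison, e.g. `maximum` of forces with `relative` comparison against the previous step, and shape mismatches raise instead of silently falling back.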
Question about the archive structure. I noticed that the section-based structure (`archive.data`):

```
archive.data
├── model_system[]  (repeating sections)
├── model_method[]  (repeating sections)
└── outputs[]       (repeating sections)
```

differs from the workflow structure (`archive.workflow2`):

```
archive.workflow2
├── method   (subsection)
├── results  (subsection with quantities)
│   ├── is_converged             (quantity)
│   ├── final_energy_difference  (quantity)
│   ├── final_force_maximum      (quantity)
│   └── convergence[]            (repeating subsection)
└── tasks[]  (repeating sections)
```

Question: is the quantity structure in `workflow2.results` meant to keep shadowing `archive.results`? Understanding this would help clarify the design intent and whether there are legacy patterns that should be considered for future schema evolution.
Change threshold type in all convergence target classes from `np.float64` to `positive_float()` to enforce schema-level validation. This ensures thresholds are non-negative (x ≥ 0), allowing zero thresholds but rejecting semantically invalid negative values.

Updated classes:
- `EnergyConvergenceTarget`
- `ForceConvergenceTarget`
- `PotentialConvergenceTarget`
- `ChargeConvergenceTarget`

Add test to verify zero thresholds are accepted. Remove test for negative thresholds as these are now prevented at the schema level.
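As an illustration of the schema-level constraint, here is a minimal sketch of a non-negativity check; `positive_float()` itself belongs to the project's schema utilities and its actual implementation may differ:

```python
def validate_threshold(x: float) -> float:
    """Reject semantically invalid negative thresholds; zero is allowed (x >= 0)."""
    if x < 0:
        raise ValueError(f'threshold must be non-negative, got {x}')
    return float(x)

validate_threshold(0.0)   # zero thresholds are accepted
validate_threshold(1e-6)  # positive thresholds are accepted
```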
I do not mean it to shadow archive.results. In the same spirit that archive.results contains a 'summary' of data, workflow.results is a summary of the workflow. I would like to think that there are quantities specific to workflow.results. I concede that some quantities seem to be duplicated, e.g., for the geometry_optimization workflow, tolerances appear in both. But in my opinion this should not be the case. For me this is limited by the requirement that the search indices and the GUI are built from archive.results. This already came up in the new results normalizer discussion: whether we indeed rely on archive.results for everything, or can work directly with the data/workflow sections.
Thanks for clarifying! These overlapping quantities are why I asked about the shadowing in the first place.
I see. I raised this question since all other SCF data is now handled by sections. This is the only exception, and it is relevant to plotting. I was thinking of turning it into sections as well, for consistency and polymorphism. I had not yet considered how it would impact query performance (or feasibility).
Yes, 100% agree with @ladinesa. There are some other developments of the results section and normalization that were discussed at the end of the day yesterday; I will report at our next meeting.
- Add `WavefunctionConvergenceTarget` class to `general.py` for tracking wavefunction coefficient convergence in SCF workflows
- Add `delta_wavefunction_rms` quantity to `SCFSteps` schema in `outputs.py` to store RMS changes of wavefunction coefficients
- Add comprehensive test suite covering edge cases: zero convergence, missing data, single iteration, NaN/Inf handling, negative values, boundary conditions, and array vs. scalar data
- Use `positive_float()` for threshold validation (x ≥ 0)
- Set `_convergence_property_path` to `scf_steps.delta_wavefunction_rms` for automatic resolution

All 51 convergence target tests pass.
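For reference, the RMS change of wavefunction coefficients between two consecutive SCF steps could be computed along these lines; the helper name is illustrative, not the parser's actual code:

```python
import numpy as np

def delta_wavefunction_rms(prev_coeffs, curr_coeffs) -> float:
    """Root-mean-square change of coefficients between consecutive SCF steps."""
    diff = np.asarray(curr_coeffs, dtype=float) - np.asarray(prev_coeffs, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))
```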
- Remove `residuum` from `threshold_type` enum (semantically inconsistent: describes WHAT, not HOW)
- Remove unused imports (`Iterable`, `jmespath`, `SimulationTime`, `Outputs`)
- Simplify `WorkflowConvergenceTarget` docstring
Resolved in commit 705b7d9. Removed
Created follow-up issues from review discussion:
In this PR I want to refactor how we deal with non-converged or otherwise unfinished calculations.

The first step is to refactor and unify how we represent the convergence of the simulation workflow. This PR introduces a new subsection `WorkflowConvergenceTarget` to all cases where a workflow converges, e.g., SCF or geometry optimization. It is intended to replace the convergence parameters in `GeometryOptimizationModel` as well as the convergence settings in `SCFOutputs` and `SelfConsistency`. `SCFOutputs` is very minimal now and I don't know what it is used for. The information about the SCF should be included in the workflow section. Should `SCFOutputs` be a reference then, to not duplicate information? Is `SCFOutputs` used anywhere?

This is a draft PR; I left the failing tests in there on purpose to keep track of the functionality that I broke by removing `SelfConsistency`.

Note: this PR breaks the abinit parser, because it tries to populate `workflow.geometry_optimization.GeometryOptimizationModel.convergence_tolerance_energy_difference`, which moved to `WorkflowConvergenceTarget`.

Summary of changes (affected classes): `Program`, `WorkflowConvergenceTarget` (new, added to `SimulationWorkflowModel`), `SelfConsistency` (removed), `self_consistency_ref` (removed from `PhysicalProperty`), `SCFOutputs`, `GeometryOptimizationModel`.