Commit 86a3410

GeigerJ2 authored and agoscinski committed
WIP: Profile data dumping (#6723)
Squashed commit at 2025-05-09 21:54

Add config pydantic model
Add detect.py
Add group-node-mapping
Add dump logger
Add dump engine
Add dump managers
Add facades
Add utils
Add changes to CLI
Add changes to init, disable mypy for feature for now
Add changes to docs
Add changes to and additional tests

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Fix bug in explicitly-grouped sub-workflows being filtered out for profile/group dumping

Fix group validation exception on `verdi profile dump -G` and creation of empty dirs for deselected groups.
1 parent 948ffc5 commit 86a3410

31 files changed: +7926 -1125 lines changed

.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
@@ -191,6 +191,7 @@ repos:
191191
src/aiida/transports/plugins/local.py|
192192
src/aiida/transports/plugins/ssh.py|
193193
src/aiida/workflows/arithmetic/multiply_add.py|
194+
src/aiida/tools/dumping/.*|
194195
)$
195196
196197
- id: generate-conda-environment

docs/source/howto/data.rst

Lines changed: 260 additions & 1 deletion
@@ -81,7 +81,10 @@ For details refer to the next section :ref:`"How to add support for custom data
8181
.. _how-to:data:dump:
8282

8383
Dumping data to disk
84-
--------------------
84+
====================
85+
86+
Process dumping
87+
---------------
8588

8689
.. versionadded:: 2.6
8790

@@ -148,6 +151,262 @@ subdirectories.
148151

149152
For a full list of available options, call :code:`verdi process dump --help`.
150153

154+
Group Dumping
155+
-------------
156+
157+
.. versionadded:: 2.7
158+
159+
The functionality has been expanded to also dump data from groups:
160+
161+
.. code-block:: shell
162+
163+
verdi group dump <group-identifier>
164+
165+
This command creates a directory tree containing all processes in the specified group. For example:
166+
167+
.. code-block:: shell
168+
169+
$ verdi group dump my-calculations
170+
Warning: This is a new feature which is still in its testing phase. If you encounter unexpected behavior or bugs, please reach out via Discourse.
171+
Report: No config file found. Using command-line arguments.
172+
Report: Starting dump process of group `my-calculations` in mode: INCREMENTAL
173+
Report: Processing group changes...
174+
Report: Processing 1 new or modified groups: ['my-calculations']
175+
Report: Dumping 1 nodes for group 'my-calculations'
176+
Report: Saving final dump log, mapping, and configuration...
177+
Success: Raw files for group `my-calculations` dumped into folder `group-my-calculations-dump`.
178+
179+
This results in the following output directory:
180+
181+
.. code-block:: shell
182+
183+
$ tree -a group-my-calculations-dump/
184+
group-my-calculations-dump
185+
├── .aiida_dump_safeguard
186+
├── aiida_dump_config.yaml
187+
├── aiida_dump_log.json
188+
└── calculations
189+
    └── ArithmeticAddCalculation-4
190+
        ├── .aiida_dump_safeguard
191+
        ├── .aiida_node_metadata.yaml
192+
        ├── inputs
193+
        │   ├── .aiida
194+
        │   │   ├── calcinfo.json
195+
        │   │   └── job_tmpl.json
196+
        │   ├── _aiidasubmit.sh
197+
        │   └── aiida.in
198+
        └── outputs
199+
            ├── _scheduler-stderr.txt
200+
            ├── _scheduler-stdout.txt
201+
            └── aiida.out
202+
203+
Similarly for a group ``my-workflows`` with a ``MultiplyAddWorkChain``:
204+
205+
.. code-block:: shell
206+
207+
$ verdi group dump my-workflows
208+
Warning: This is a new feature which is still in its testing phase. If you encounter unexpected behavior or bugs, please reach out via Discourse.
209+
Report: No config file found. Using command-line arguments.
210+
Report: Starting dump process of group `my-workflows` in mode: INCREMENTAL
211+
Report: Processing group changes...
212+
Report: Processing 1 new or modified groups: ['my-workflows']
213+
Report: Dumping 1 nodes for group 'my-workflows'
214+
Report: Saving final dump log, mapping, and configuration...
215+
Success: Raw files for group `my-workflows` dumped into folder `group-my-workflows-dump`.
216+
217+
This produces the following output directory:
218+
219+
.. code-block:: shell
220+
221+
$ tree -a group-my-workflows-dump/
222+
group-my-workflows-dump
223+
├── .aiida_dump_safeguard
224+
├── aiida_dump_config.yaml
225+
├── aiida_dump_log.json
226+
└── workflows
227+
    └── MultiplyAddWorkChain-11
228+
        ├── .aiida_dump_safeguard
229+
        ├── .aiida_node_metadata.yaml
230+
        ├── 01-multiply-12
231+
        │   ├── .aiida_dump_safeguard
232+
        │   ├── .aiida_node_metadata.yaml
233+
        │   └── inputs
234+
        │       └── source_file
235+
        └── 02-ArithmeticAddCalculation-14
236+
            ├── .aiida_dump_safeguard
237+
            ├── .aiida_node_metadata.yaml
238+
            ├── inputs
239+
            │   ├── .aiida
240+
            │   │   ├── calcinfo.json
241+
            │   │   └── job_tmpl.json
242+
            │   ├── _aiidasubmit.sh
243+
            │   └── aiida.in
244+
            └── outputs
245+
                ├── _scheduler-stderr.txt
246+
                ├── _scheduler-stdout.txt
247+
                └── aiida.out
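
The directory names in the listings above appear to follow the pattern ``[NN-]<label>-<pk>``, and every dumped node directory contains a ``.aiida_node_metadata.yaml`` file. The following is a minimal sketch, using only the Python standard library, that locates dumped node directories and splits their names into label, PK, and optional step index. The naming pattern is inferred from the example output and is an assumption, not a documented guarantee; ``find_dumped_nodes`` and ``DumpedNode`` are hypothetical helper names, not part of AiiDA.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# Marker file that sits in every dumped node directory in the listings above.
NODE_MARKER = ".aiida_node_metadata.yaml"

# Assumed naming scheme, inferred from the listings: "[NN-]<label>-<pk>".
_NAME_RE = re.compile(r"^(?:(?P<step>\d+)-)?(?P<label>.+)-(?P<pk>\d+)$")


class DumpedNode(NamedTuple):
    path: Path
    label: str
    pk: int
    step: Optional[int]  # position within the parent workflow, if present


def find_dumped_nodes(dump_root):
    """Return one DumpedNode per directory under dump_root that holds the marker file."""
    nodes = []
    for marker in sorted(Path(dump_root).rglob(NODE_MARKER)):
        match = _NAME_RE.match(marker.parent.name)
        if match is None:
            continue  # directory name does not follow the assumed scheme
        step = int(match["step"]) if match["step"] else None
        nodes.append(DumpedNode(marker.parent, match["label"], int(match["pk"]), step))
    return nodes
```

Called as ``find_dumped_nodes("group-my-workflows-dump")``, this would yield one entry for the work chain and one for each of its two sub-processes, provided the dump has the layout shown above.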
248+
249+
Profile Dumping
250+
---------------
251+
252+
.. versionadded:: 2.7
253+
254+
You can also dump the data of an entire AiiDA profile.
255+
If no options are provided, no data is dumped by default:
256+
257+
.. code-block:: shell
258+
259+
$ verdi profile dump
260+
Warning: This is a new feature which is still in its testing phase. If you encounter unexpected behavior or bugs, please reach out via Discourse.
261+
Report: No config file found. Using command-line arguments.
262+
Warning: No specific data selection determined from config file or CLI arguments.
263+
Warning: Please specify `--all` to dump all profile data or filters such as `groups`, `user` etc.
264+
Warning: Use `--help` for all options and `--dry-run` to preview.
265+
266+
This is to avoid accidentally initiating the dumping operation on a large AiiDA database.
267+
To dump all data of the profile, use the ``--all`` flag. To dump only a subset of your AiiDA data, use
268+
``--groups``, ``--user``, or one of the time-based filter options the command provides.
269+
270+
If we run with ``--all`` on our current profile, we get the following result:
271+
272+
.. code-block:: shell
273+
274+
$ verdi profile dump --all
275+
Warning: This is a new feature which is still in its testing phase. If you encounter unexpected behavior or bugs, please reach out via Discourse.
276+
Report: No config file found. Using command-line arguments.
277+
Report: Starting dump process of default profile in mode: INCREMENTAL
278+
Report: Processing group changes...
279+
Report: Processing 2 new or modified groups: ['my-calculations', 'my-workflows']
280+
Report: Dumping 1 nodes for group 'my-calculations'
281+
Report: Dumping 1 nodes for group 'my-workflows'
282+
Report: Saving final dump log, mapping, and configuration...
283+
Success: Raw files for profile `docs` dumped into folder `profile-docs-dump`.
284+
285+
The resulting directory preserves the group organization:
286+
287+
.. code-block:: shell
288+
289+
$ tree -a profile-docs-dump/
290+
profile-docs-dump
291+
├── .aiida_dump_safeguard
292+
├── aiida_dump_config.yaml
293+
├── aiida_dump_log.json
294+
└── groups
295+
    ├── my-calculations
296+
    │   ├── .aiida_dump_safeguard
297+
    │   └── calculations
298+
    │       └── ArithmeticAddCalculation-4
299+
    │           ├── .aiida_dump_safeguard
300+
    │           ├── .aiida_node_metadata.yaml
301+
    │           ├── inputs
302+
    │           │   ├── .aiida
303+
    │           │   │   ├── calcinfo.json
304+
    │           │   │   └── job_tmpl.json
305+
    │           │   ├── _aiidasubmit.sh
306+
    │           │   └── aiida.in
307+
    │           └── outputs
308+
    │               ├── _scheduler-stderr.txt
309+
    │               ├── _scheduler-stdout.txt
310+
    │               └── aiida.out
311+
    └── my-workflows
312+
        ├── .aiida_dump_safeguard
313+
        └── workflows
314+
            └── MultiplyAddWorkChain-11
315+
                ├── .aiida_dump_safeguard
316+
                ├── .aiida_node_metadata.yaml
317+
                ├── 01-multiply-12
318+
                │   ├── .aiida_dump_safeguard
319+
                │   ├── .aiida_node_metadata.yaml
320+
                │   └── inputs
321+
                │       └── source_file
322+
                └── 02-ArithmeticAddCalculation-14
323+
                    ├── .aiida_dump_safeguard
324+
                    ├── .aiida_node_metadata.yaml
325+
                    ├── inputs
326+
                    │   ├── .aiida
327+
                    │   │   ├── calcinfo.json
328+
                    │   │   └── job_tmpl.json
329+
                    │   ├── _aiidasubmit.sh
330+
                    │   └── aiida.in
331+
                    └── outputs
332+
                        ├── _scheduler-stderr.txt
333+
                        ├── _scheduler-stdout.txt
334+
                        └── aiida.out
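
Because the profile dump keeps one sub-directory per group under ``groups/``, the group membership of the dumped processes can be recovered from the directory layout alone. The sketch below assumes exactly the layout shown in the listing (``groups/<label>/calculations`` and ``groups/<label>/workflows``) and flat group labels. ``processes_by_group`` is a hypothetical helper, not an AiiDA function.

```python
from pathlib import Path

# Sub-directory names taken from the listing above.
PROCESS_KINDS = ("calculations", "workflows")


def processes_by_group(profile_dump_root):
    """Map each group label to the names of its top-level dumped process directories."""
    groups_dir = Path(profile_dump_root) / "groups"
    mapping = {}
    if not groups_dir.is_dir():
        return mapping  # not a profile dump with the assumed layout
    for group_dir in sorted(p for p in groups_dir.iterdir() if p.is_dir()):
        names = []
        for kind in PROCESS_KINDS:
            kind_dir = group_dir / kind
            if kind_dir.is_dir():
                names.extend(sorted(p.name for p in kind_dir.iterdir() if p.is_dir()))
        mapping[group_dir.name] = names
    return mapping
```

For the ``profile-docs-dump`` directory above, ``processes_by_group("profile-docs-dump")`` would list ``ArithmeticAddCalculation-4`` under ``my-calculations`` and ``MultiplyAddWorkChain-11`` under ``my-workflows``.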
335+
336+
.. Common Options
337+
.. ------------
338+
339+
.. All three commands (``verdi process dump``, ``verdi group dump``, and ``verdi profile dump``) support various options:
340+
341+
.. - ``-p/--path PATH``: Specify a custom dumping path
342+
.. - ``-o/--overwrite``: Fully overwrite an existing dumping directory
343+
.. - ``--include-inputs/--exclude-inputs``: Include/exclude linked input nodes
344+
.. - ``--include-outputs/--exclude-outputs``: Include/exclude linked output nodes
345+
.. - ``--include-attributes/--exclude-attributes``: Include/exclude node attributes
346+
.. - ``--include-extras/--exclude-extras``: Include/exclude node extras
347+
.. - ``-f/--flat``: Dump files in a flat directory structure
348+
.. - ``--dump-unsealed/--no-dump-unsealed``: Allow/disallow dumping of unsealed process nodes
349+
350+
.. For group and profile dumping, additional options include:
351+
352+
.. - ``--filter-by-last-dump-time/--no-filter-by-last-dump-time``: Only dump nodes modified since last dump
353+
.. - ``--dump-processes/--no-dump-processes``: Control process dumping
354+
.. - ``--only-top-level-calcs/--no-only-top-level-calcs``: Control calculation directory creation
355+
.. - ``--only-top-level-workflows/--no-only-top-level-workflows``: Control workflow directory creation
356+
.. - ``--symlink-calcs/--no-symlink-calcs``: Use symlinks for duplicate calculations to avoid data duplication
357+
358+
.. For a full list of available options, call ``verdi process dump --help``, ``verdi group dump --help``, or ``verdi profile dump --help``.
359+
360+
.. Incremental Dumping
361+
.. -------------------
362+
363+
.. By default, all dump commands operate in incremental mode, which means they only process nodes that are new or have been modified since the last dump operation. This makes the feature efficient when run repeatedly:
364+
365+
.. .. code-block:: shell
366+
367+
.. $ verdi group dump my-calculations
368+
.. Report: No (new) calculations to dump in group `my-calculations`.
369+
.. Report: No (new) workflows to dump in group `my-calculations`.
370+
.. Success: Raw files for group `my-calculations` dumped into folder `my-calculations-dump`.
371+
372+
Python API
373+
----------
374+
375+
The dump functionality is also available through a Python API:
376+
377+
.. code-block:: python
378+
379+
# Dump a single process
380+
from aiida import orm, load_profile
381+
from aiida.tools.dump.process import ProcessDump
382+
383+
load_profile()
384+
process_node = orm.load_node(4) # ArithmeticAddCalculation node
385+
process_dump = ProcessDump(process_node=process_node)
386+
process_dump.dump()
387+
388+
# Dump a group
389+
from aiida.tools.dump.group import GroupDump
390+
group = orm.load_group('my-calculations')
391+
group_dump = GroupDump(group=group)
392+
group_dump.dump()
393+
394+
# Dump a profile
395+
from aiida.tools.dump.profile import ProfileDump
396+
profile_dump = ProfileDump()
397+
profile_dump.dump()
398+
399+
Usage Scenarios
400+
---------------
401+
402+
The data dumping functionality was designed to bridge the gap between research conducted with AiiDA and scientists not familiar with AiiDA. Some common use cases include:
403+
404+
1. Sharing simulation results with collaborators who don't use AiiDA
405+
2. Periodically running the dump command to reflect changes while working on a project
406+
3. Analyzing data using traditional shell tools outside of AiiDA's programmatic approach
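
As an illustration of the third use case, a dumped directory can be processed with nothing but the standard library, so collaborators do not need an AiiDA installation. The sketch below gathers the text of every ``outputs/aiida.out`` file under a dump directory. The file name is taken from the example listings in this section, and ``collect_outputs`` is a hypothetical helper, not part of AiiDA.

```python
from pathlib import Path


def collect_outputs(dump_root, filename="aiida.out"):
    """Return {path relative to dump_root: file text} for every outputs/<filename>."""
    root = Path(dump_root)
    results = {}
    for path in sorted(root.rglob(filename)):
        # Only accept files that sit directly inside an "outputs" directory.
        if path.is_file() and path.parent.name == "outputs":
            results[path.relative_to(root).as_posix()] = path.read_text()
    return results
```

The returned dictionary can then be passed to whatever analysis or reporting tool is already in use, for example to compare results across all calculations in a group.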
407+
409+
151410
.. _how-to:data:import:provenance:
152411

153412
Provenance

docs/source/reference/command_line.rst

Lines changed: 3 additions & 1 deletion
@@ -223,6 +223,7 @@ Below is a list with all available subcommands.
223223
create Create an empty group with a given label.
224224
delete Delete groups and (optionally) the nodes they contain.
225225
description Change the description of a group.
226+
dump Dump data of an AiiDA group to disk.
226227
list Show a list of existing groups.
227228
move-nodes Move the specified NODES from one group to another.
228229
path Inspect groups of nodes, with delimited label paths.
@@ -397,6 +398,7 @@ Below is a list with all available subcommands.
397398
Commands:
398399
configure-rabbitmq Configure RabbitMQ for a profile.
399400
delete Delete one or more profiles.
401+
dump Dump all data in an AiiDA profile's storage to disk.
400402
list Display a list of all available profiles.
401403
set-default Set a profile as the default profile.
402404
setdefault (Deprecated) Set a profile as the default profile.
@@ -451,7 +453,7 @@ Below is a list with all available subcommands.
451453
--broker-host HOSTNAME Hostname for the message broker. [default: 127.0.0.1]
452454
--broker-port INTEGER Port for the message broker. [default: 5672]
453455
--broker-virtual-host TEXT Name of the virtual host for the message broker without
454-
leading forward slash. [default: ""]
456+
leading forward slash.
455457
--repository DIRECTORY Absolute path to the file repository.
456458
--test-profile Designate the profile to be used for running the test
457459
suite only.
