
GSoC 2021 Projects

Leopold Talirz edited this page Jan 27, 2021 · 8 revisions

Getting started with AiiDA

AiiDA is a python framework for managing computational science workflows, with roots in computational materials science. It helps researchers manage large numbers of simulations (1k, 10k, 100k, ...) and complex workflows involving multiple executables. At the same time, it records the provenance of the entire simulation pipeline with the aim of making it fully reproducible.

AiiDA is used in research projects at universities, research institutes and companies (see SciPy 2020 talk, publications, and testimonials).

If you would like to be considered as a GSoC student, we ask you to make a small pull request to aiida-core: this could be a simple bug fix, a documentation improvement, etc. See e.g. the GitHub issues by label.

Why work on AiiDA?

  • Help accelerate the transition to open (computational) science
  • Help fix the reproducibility crisis. Computational science is a good place to start.
  • Work with a team of computational scientists (mostly physics backgrounds) who are passionate about both science and coding.

A background in materials science is not needed, but a basic interest in materials science topics will make things easier for you.

Project 1 - Extending the AiiDA REST API towards workflow management

Level: intermediate

AiiDA comes with a built-in REST API (based on the flask microframework) that provides access to the provenance graph stored automatically with any workflow execution. In order to enable the integration of AiiDA as a workflow backend into new or existing web platforms, we plan to extend the REST API to support workflow management.

The design of the REST API extension will follow an AiiDA enhancement proposal that is currently being drafted (and will be ready before you start).

Expected outcomes

In this project, you will implement POST methods that allow the creation of new AiiDA entities via the REST API, starting with /users, and continuing with /computers, /nodes and /groups.
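To give a feel for what a POST handler for /users has to do, here is a minimal sketch. It is deliberately independent of flask and of the AiiDA ORM so that it runs with the standard library alone; in a flask view, the same function would be called with the parsed JSON body of the request. The function name `create_user`, the field names and the in-memory `_users` store are assumptions made for illustration, not the design of the enhancement proposal.

```python
import itertools

# Assumed field names for a user entity (illustrative only).
REQUIRED_FIELDS = ("email",)
OPTIONAL_FIELDS = ("first_name", "last_name", "institution")

_users = {}                    # in-memory stand-in for the database
_next_id = itertools.count(1)  # stand-in for a primary-key sequence


def create_user(payload):
    """Handle the body of a POST /users request.

    Returns a (response_body, http_status) tuple.
    """
    if not isinstance(payload, dict):
        return {"error": "request body must be a JSON object"}, 400

    missing = [f for f in REQUIRED_FIELDS if not payload.get(f)]
    if missing:
        return {"error": "missing fields: " + ", ".join(missing)}, 400

    allowed = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    unknown = sorted(set(payload) - allowed)
    if unknown:
        return {"error": "unknown fields: " + ", ".join(unknown)}, 400

    # Reject duplicates instead of silently overwriting an existing user.
    if any(u["email"] == payload["email"] for u in _users.values()):
        return {"error": "user already exists"}, 409

    user_id = next(_next_id)
    user = {"id": user_id}
    for field in REQUIRED_FIELDS + OPTIONAL_FIELDS:
        user[field] = payload.get(field, "")
    _users[user_id] = user
    return user, 201  # 201 Created
```

The handler validates before it writes, and distinguishes malformed input (400) from a conflict with existing data (409), which is the kind of decision the real endpoints will need to make for each entity type.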

For particularly motivated students, there are exciting stretch goals available (not required/expected):

  • Option 1: implement a new /processes endpoint supporting GET, PUT and DELETE for workflow management
  • Option 2: implement authentication for the new endpoints

Skills

We expect you to be familiar with object-oriented programming in python. Some familiarity with web frameworks like flask will be beneficial.

Project 2 - Performance optimizations at the ORM level

Level: intermediate

AiiDA uses an object-relational mapping (ORM) to map python objects to corresponding records in its PostgreSQL database. The AiiDA ORM allows users to create and manage objects (e.g. AiiDA nodes in the provenance graph) through the AiiDA python API.

While an ORM provides useful abstraction for the user, it adds overhead that can become a bottleneck when operating on large numbers of objects at once.

The goal of this project is to speed up such bulk operations by implementing an ORM API for bulk object creation.

Expected outcomes

You will implement bulk insertion functionality in the AiiDA ORM that works with both ORM backends supported by AiiDA (django and sqlalchemy) and provides performance improvements of several orders of magnitude for large numbers of operations (don't worry, it will).
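The difference between per-object and bulk insertion can be illustrated with a small sketch. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL and talks to the database directly rather than through the django or sqlalchemy backends; the `node` table and the function names are assumptions made for this example, and it is not a benchmark.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a toy 'node' table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, label TEXT)")
    return conn


def insert_one_by_one(conn, labels):
    """Insert rows individually, committing after each one.

    This mirrors storing ORM objects one at a time: one statement
    and one transaction per object.
    """
    for label in labels:
        conn.execute("INSERT INTO node (label) VALUES (?)", (label,))
        conn.commit()


def insert_bulk(conn, labels):
    """Insert all rows as one batch inside a single transaction."""
    with conn:  # commits once at the end, rolls back on error
        conn.executemany(
            "INSERT INTO node (label) VALUES (?)",
            [(label,) for label in labels],
        )


def count_nodes(conn):
    return conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
```

Both functions leave the database in the same state; the bulk version simply avoids the per-object round trips and transactions, which is where much of the overhead tends to come from.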

Stretch goal for exceptional students (not required/expected): use your implementation inside AiiDA to speed up data import and export from AiiDA archive files & more.

Skills

You will need to understand what an object-relational mapping is and be able to work with existing ORM python frameworks. This requires familiarity with object-oriented programming in python as well as a basic understanding of relational databases (like PostgreSQL). Previous experience with an ORM like django or sqlalchemy is beneficial, but not required.

Project 3 - Built-in support for containerized simulation codes (docker, shifter, singularity, ...)

Level: intermediate

AiiDA stores all calculation executions (including detailed information on inputs and outputs) in the form of a directed acyclic graph, where each calculation is represented as a node, and is linked to other data nodes representing the inputs and the outputs that it created. Outputs, in turn, can then be inputs of new calculations. This graph is generated automatically by AiiDA; by inspecting all the "ancestors" of a given data node in the graph, we have a complete description of the "provenance" of that data node, i.e. the full sequence of calculations (with their inputs) that led to its generation.
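As a rough illustration of what "inspecting all the ancestors" means, the following sketch walks a toy graph stored as a plain dictionary. It does not use the AiiDA API; the node names and the `INPUTS` mapping are made up for the example.

```python
from collections import deque

# Hypothetical provenance graph: each node maps to the nodes it directly
# depends on (a calculation's inputs, or the calculation that created a data node).
INPUTS = {
    "structure": [],
    "parameters": [],
    "calc_relax": ["structure", "parameters"],
    "relaxed_structure": ["calc_relax"],
    "calc_bands": ["relaxed_structure", "parameters"],
    "bands": ["calc_bands"],
}


def ancestors(node, inputs=INPUTS):
    """Return the set of all nodes from which `node` was (indirectly) derived."""
    seen = set()
    queue = deque(inputs.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a node can be reached along several paths
        seen.add(current)
        queue.extend(inputs.get(current, []))
    return seen
```

For the toy graph above, the ancestors of `bands` are both calculations together with the three data nodes that fed into them, i.e. its full provenance.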

When a calculation is performed by an external code (e.g. a binary on a remote high-performance computer (HPC)), the code is included as an input of the calculation. As of today, codes in AiiDA are represented as "symlinks" to an existing executable on a remote computer, i.e., they contain a reference to the computer on which they are installed, and the full path to the executable (plus some additional metadata, such as which dynamic libraries to load at runtime).

Recent years have seen an increasing adoption of containers (using technologies such as docker, singularity, shifter or sarus), including in the HPC domain, where executables are no longer compiled on the target machine but are compiled once and run in a portable, encapsulated environment. The encapsulation of the full run-time environment, as well as the availability of global container registries, constitute a major step forward in terms of reproducibility - storing the identifier of the container in the AiiDA graph makes it possible to directly re-run existing workflows without access to the computer where they were originally executed.

This project will make containerized codes first-class citizens in the AiiDA provenance graph, making it possible to re-run recorded workflows, even if simulation steps are run on different remote (super)computers.

Expected outcomes

This project will

  1. extend the Code class/interface in AiiDA, to define a code that is not necessarily already installed on a supercomputer, but may be pulled from a container registry on demand (e.g. DockerHub or some local registry in the supercomputer centre)
  2. implement routines to re-run workflows recorded in an existing AiiDA graph, with no parameters other than the computer to run on.
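One possible shape for the first item is sketched below. The class names `InstalledCode` and `ContainerizedCode`, their fields and the `command` method are hypothetical and are not the existing AiiDA Code interface; the sketch only contrasts a code tied to a path on one computer with a code identified by a container image.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InstalledCode:
    """A code as a reference to an executable on one specific computer."""
    computer: str
    executable_path: str

    def command(self, args: List[str]) -> List[str]:
        return [self.executable_path, *args]


@dataclass(frozen=True)
class ContainerizedCode:
    """A code identified by a container image rather than by a computer."""
    image: str            # image identifier in a registry (hypothetical example below)
    executable: str       # executable name inside the container
    engine: str = "docker"

    def command(self, args: List[str]) -> List[str]:
        if self.engine == "docker":
            prefix = ["docker", "run", "--rm", self.image]
        elif self.engine == "singularity":
            prefix = ["singularity", "exec", "docker://" + self.image]
        else:
            raise ValueError("unsupported container engine: " + self.engine)
        return [*prefix, self.executable, *args]
```

Because `ContainerizedCode` carries no reference to a computer, the same provenance record could in principle be re-run anywhere the image can be pulled, which is what the second item relies on.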

Skills

The participant will need to work with the workflow engine of AiiDA. This requires advanced python knowledge (including basic understanding of coroutines), as well as prior experience with container technologies (docker or singularity). Experience with job schedulers on clusters/supercomputers will be beneficial.

Project 4 - Make the AiiDA REST API extensible through plugins

Level: intermediate

AiiDA lives in an ecosystem of plugins that provide a wide range of functionalities, from support for certain simulation codes, through scientific workflows and new data types, to support for schedulers on supercomputers (see intro to plugin internals). This project focuses on making it possible for plugins to extend the AiiDA REST API - a feature that becomes increasingly important with the integration of AiiDA into web platforms.

Expected outcomes

Under guidance of your mentors,

  • you will refactor the AiiDA REST API to use python entry points for registering API endpoints.
  • all existing endpoints (/users, /computers, /nodes, /groups, ...) will be registered through entry points themselves.
  • the aiida-diff demo plugin will include an example of how to add a new REST endpoint
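The discovery side of such a mechanism could look roughly like the sketch below, which uses the standard-library `importlib.metadata` module to load whatever installed packages have registered under an entry point group. The group name `aiida.rest.endpoints` and the function name `load_endpoints` are assumptions made for illustration; the actual group name and the shape of the registered objects would be decided in the project.

```python
from importlib.metadata import entry_points


def load_endpoints(group="aiida.rest.endpoints"):
    """Return {endpoint name: loaded object} for every entry point in `group`.

    The default group name is hypothetical. Plugins would declare their
    endpoints under this group in their package metadata.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a selectable collection.
        selected = eps.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}
```

With this pattern, the REST API would iterate over the returned mapping at start-up and attach each object to the flask application, so that built-in and plugin-provided endpoints are treated the same way.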

Skills

We expect you to be familiar with object-oriented programming in python. Some familiarity with web frameworks like flask will be beneficial.

Project N - Your Idea Here

If you're already familiar with AiiDA and have your own idea on how to improve it, we're happy to consider it (you may also want to check the development roadmap for further interesting project ideas). In this case, please think about the steps you would take to attack the problem and contact us in advance so that we can draw up a rough work plan.

Mentorship

The mentors for GSoC 2021 are

We have an active Slack workspace & biweekly developer meetings.
