
GSoC 2023 projects

Osvaldo A Martin edited this page Jan 20, 2023 · 7 revisions

ArviZ

ArviZ is a project dedicated to promoting and building tools for exploratory analysis of Bayesian models. It currently has a Python and a Julia interface. ArviZ aims to integrate seamlessly with established probabilistic programming languages and libraries such as PyStan, PyMC (3 and 4), Turing, Soss, emcee, and Pyro, and to be easy to integrate with novel or bespoke Bayesian analyses. Where the aim of the probabilistic programming languages is to make it easy to build and solve Bayesian models, the aim of the ArviZ libraries is to make it easy to process and analyze the results from those models.

Timeline

The timeline of the GSoC internships is available on the GSoC website.

Projects

Below is a list of possible topics for your GSoC project. We are also open to other topics; contact us on Gitter to discuss them. Keep in mind that these are only ideas and that some of them can't be solved entirely in a single GSoC project. When writing your proposal, choose some specific tasks and make sure the scope of your proposal fits the GSoC time commitment. We expect all projects to be 350h projects. If you'd like to be considered for a 175h project, you must reach out on Gitter. We will not accept 175h applications from people who have not discussed their time commitment with us before submitting the application.

Each project also lists the specific requirements needed to complete it successfully. General requirements are listed below.

Note that these requirements can be learned while writing the proposal and during the community bonding period. You should feel confident working on any project whose requirements interest you and that you would like to learn about. They are not skills that you are expected to have before writing your proposal. We aim for GSoC to provide a win-win scenario where you benefit from an inclusive and thriving environment in which to learn and the library benefits from your contributions.

All projects require being comfortable using ArviZ and understanding the relations between its 3 main modules: plots, stats, and data. However, unless specified otherwise, no specific knowledge of inference libraries or about the internals of from_xyz converter functions is needed.

Python

Students working on Python projects should be familiar with Python, numpy, and scipy and have basic xarray/InferenceData knowledge. They should also be able to write unit tests for the added functionality using pytest, and to follow the development conventions, using black, pylint, and pydocstyle for code style and linting.

Julia

Students working on Julia projects should be familiar with Julia, PyCall (to use Python objects from within Julia), DataFrames, and StatsBase. They should also be able to write unit tests for the added functionality using Test.

Project priority

The highest priority projects are (in no particular order): "Add Gen converter to ArviZ.jl", "Explore design ideas for plotting refactoring" and "InferenceData R compatibility". What does this mean? Organizations send a ranked list of their applicants to Google, who then select the top x ranked students. We will use proposal quality as the main evaluation criterion, taking project priority into account only as a tiebreaker between proposals of similar quality. We recommend you apply for the projects that interest you most, as you will probably write a much better proposal for them and have a better chance of being accepted.

Expected benefits of working on ArviZ

Students who work on ArviZ can expect their skillset to grow in:

  • Bayesian inference libraries
  • Bayesian modeling workflow and model criticism
  • Matplotlib and/or bokeh usage (depending on project)
  • xarray usage (depending on project)
  • Numba or Dask optimization (depending on project)

ArviZ Dashboards (Python)

The main proposal is to build a dashboard with linked plots, so that inspecting multiple dimensions is easier. At first, the focus should be on the ability to call templates that only consume data. Possible templates include prior + prior predictive, sample diagnostics, posterior + posterior predictive, loo, and regression. The ability to dynamically add or remove plots, change plot types, and manually select and save information should also be considered. This dashboard could be built on top of Panel, although other alternatives might be explored.
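To make the idea of "templates that only consume data" more concrete, here is a minimal sketch that uses only the Python standard library. Every name in it (`PlotSpec`, `TEMPLATES`, `register_template`, `Dashboard`) is hypothetical and is not part of ArviZ or Panel. The data is stood in for by a plain dictionary mapping group names to variable names, and the plots are only described, not rendered. It is one possible starting point for discussion, not a proposed design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# All names below are illustrative assumptions, not ArviZ or Panel APIs.
@dataclass(frozen=True)
class PlotSpec:
    """Description of one plot: its type and the data it reads."""

    kind: str      # e.g. "trace", "rank", "ppc"
    group: str     # data group the plot reads from
    var_name: str  # variable within that group


# A template only consumes data (here: group name -> variable names)
# and returns the plots that should appear in the dashboard.
Template = Callable[[Dict[str, List[str]]], List[PlotSpec]]
TEMPLATES: Dict[str, Template] = {}


def register_template(name: str):
    """Register a template function under a user-facing name."""

    def decorator(func: Template) -> Template:
        TEMPLATES[name] = func
        return func

    return decorator


@register_template("sample_diagnostics")
def sample_diagnostics(groups):
    # One trace plot and one rank plot per posterior variable.
    return [
        PlotSpec(kind, "posterior", var)
        for var in groups.get("posterior", [])
        for kind in ("trace", "rank")
    ]


@register_template("posterior_predictive")
def posterior_predictive(groups):
    # One predictive check per posterior predictive variable.
    return [
        PlotSpec("ppc", "posterior_predictive", var)
        for var in groups.get("posterior_predictive", [])
    ]


@dataclass
class Dashboard:
    """Holds the current set of plots; rendering is out of scope here."""

    plots: List[PlotSpec] = field(default_factory=list)

    @classmethod
    def from_template(cls, name, groups):
        return cls(plots=list(TEMPLATES[name](groups)))

    def add(self, spec: PlotSpec) -> None:
        self.plots.append(spec)

    def remove(self, spec: PlotSpec) -> None:
        self.plots.remove(spec)


# Start from a template, then add and remove plots dynamically.
groups = {"posterior": ["mu", "tau"], "posterior_predictive": ["obs"]}
dash = Dashboard.from_template("sample_diagnostics", groups)
dash.add(PlotSpec("ppc", "posterior_predictive", "obs"))
dash.remove(PlotSpec("rank", "posterior", "tau"))
```

Keeping the plot descriptions separate from any rendering backend is only one option. It would let the same template drive Panel or an alternative framework, at the cost of an extra translation layer.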

Required skills

People working on this project will need to be familiar with Panel (or an alternative dashboard framework) and with the ArviZ plotting and stats modules.

Expected outcome

The expected outcome is a prototype of a new library (or a module within the ArviZ library) with building blocks for users to generate dashboards, plus a couple of working example dashboards built with that prototype. We don't expect the prototype written during GSoC to support all ArviZ stats and plots as dashboard elements, but we do expect a writeup/developer guide explaining the design of the prototype, so that other people can continue the work after GSoC and add the rest of the ArviZ functions.

Info

  • Expected size: 350h
  • Difficulty rating: hard
  • Potential Mentors: Ari Hartikainen, Osvaldo Martin, Andy Maloney

Plotting refactoring (Python)

We are brainstorming ways to refactor the plotting module to improve the composability and extensibility of ArviZ plotting. We have some ideas in other wiki pages:

Expected output

The expected output is a prototype implementation of the ideas in the links above. We don't expect the prototype to replicate all ArviZ plots, only a small subset of them, enough to evaluate the implementation and its feasibility as a potential refactor for ArviZ.

Required skills

People working on this project should be familiar with plot faceting and composition in both matplotlib and bokeh.

Info

  • Expected size: 350h
  • Difficulty rating: hard
  • Potential mentors: Oriol Abril, Ari Hartikainen, Osvaldo Martin