213 changes: 213 additions & 0 deletions docs/how_to_contribute.rst
@@ -0,0 +1,213 @@
.. _how_to_contribute:

=====================
How to Contribute
=====================

We welcome contributions to PyHealth! This guide will help you get started with contributing datasets, tasks, models, bug fixes, or other improvements to the project.

Getting Started
===============

Prerequisites
-------------

PyHealth uses GitHub for development, so you'll need a GitHub account to contribute.

Setting Up Your Development Environment
---------------------------------------

To start contributing to PyHealth:

1. **Fork the PyHealth repository** on GitHub.
2. **Clone your fork** to your local machine:

   .. code-block:: bash

      git clone https://github.com/your_username/PyHealth.git
      cd PyHealth

3. **Install dependencies**:

   .. code-block:: bash

      pip install mne pandarallel rdkit transformers accelerate polars

4. **Implement your changes**, including test cases (see Test Case Requirements below).
5. **Push your changes** to your fork.
6. **Open a pull request** against the main PyHealth repository:

   - Target the ``main`` branch.
   - Enable edits by maintainers.
   - Before opening the PR, rebase your branch onto the latest ``main`` branch of the upstream ``sunlabuiuc/PyHealth`` repository.

Implementation Requirements
===========================

Code File Headers
-----------------

If you are a new contributor, include the following information at the top of each code file you add:

- Your name(s)
- Your NetID(s) (UIUC students only)
- Paper title (for reproducibility contributions)
- Paper link (if applicable)
- A description of the dataset, task, or model you are implementing
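
A header for a new dataset module might look like the sketch below. Every name, NetID, and description in it is a placeholder (Jane Doe and the XYZ Hospital dataset are borrowed from the example PR description further down), and the field labels are one possible layout rather than a required template.

.. code-block:: python

   """XYZ Hospital dataset for PyHealth.

   Contributors: Jane Doe
   NetIDs: jdoe2
   Paper: <paper title, for reproducibility contributions only>
   Paper link: <URL of the paper, if applicable>

   Description:
       Loads patient admission records and diagnostic codes from the
       hypothetical XYZ Hospital dataset so they can be used with a
       mortality prediction task.
   """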

Code Style and Documentation
-----------------------------

**General Guidelines:**

- Use object-oriented design with well-defined functions that carry type hints
- Use snake_case for variable and function names (e.g., ``this_variable``)
- Use PascalCase for class names (e.g., ``ThisClass``)
- Follow PEP 8 style, with a maximum line length of 88 characters
- Use Google-style docstrings
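
The following class is a small, hypothetical illustration of these conventions: a PascalCase class name, snake_case names, type hints on every function, Google-style docstrings, and lines under 88 characters. ``VisitCounter`` is not part of PyHealth; it only demonstrates the style.

.. code-block:: python

   from collections import Counter
   from typing import Dict, List


   class VisitCounter:
       """Counts hospital visits per patient.

       Args:
           min_visits: Patients with fewer visits than this are dropped.
       """

       def __init__(self, min_visits: int = 1) -> None:
           self.min_visits = min_visits

       def count_visits(self, visit_patient_ids: List[str]) -> Dict[str, int]:
           """Counts how many visits each patient has.

           Called once per dataset, after all visits have been loaded.

           Args:
               visit_patient_ids: One patient ID per visit, e.g. ["p1", "p1", "p2"].

           Returns:
               A dict mapping each patient ID to its visit count, keeping only
               patients with at least ``self.min_visits`` visits.
           """
           visit_counts = Counter(visit_patient_ids)
           return {
               patient_id: count
               for patient_id, count in visit_counts.items()
               if count >= self.min_visits
           }

For instance, ``VisitCounter(min_visits=2).count_visits(["p1", "p1", "p2"])`` returns ``{"p1": 2}``.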

**Function Documentation Requirements:**

Each function must document:

- **Input arguments**: their types and descriptions
- **Return values**: their types and descriptions
- **High-level description** of what the function does
- **Example use case**, or where the function is called

**Example Well-Documented Function:**

.. code-block:: python

   from typing import Dict

   from pyhealth.data import Patient


   # Excerpt of a dataset class method, shown outside its class.
   def parse_basic_info(self, patients: Dict[str, Patient]) -> Dict[str, Patient]:
       """Helper function that parses the patients and admissions tables.

       Called by ``self.parse_tables()``.

       Docs:
           - patients: https://mimic.mit.edu/docs/iv/modules/hosp/patients/
           - admissions: https://mimic.mit.edu/docs/iv/modules/hosp/admissions/

       Args:
           patients: A dict of ``Patient`` objects indexed by patient_id.

       Returns:
           The updated patients dict.
       """
       # Table-parsing logic omitted; the method updates and returns patients.
       return patients

Types of Contributions
======================

Contributing a Dataset
----------------------

All datasets must follow these guidelines:

- **Inherit from BaseDataset**: All datasets must inherit from the appropriate ``BaseDataset`` class.
- **Follow established patterns**:

  - For EHR datasets: See the `MIMIC4 dataset example <https://github.com/sunlabuiuc/PyHealth/blob/main/pyhealth/datasets/mimic4.py>`_
  - For image datasets: See the `CovidCXR dataset example <https://github.com/sunlabuiuc/PyHealth/blob/main/pyhealth/datasets/covidcxr.py>`_, where each folder represents a sample

- **Include a test task**: Ideally, pair the dataset with a task that can be used to test it.

**Key Requirements:**

- Define all required variables outlined in the BaseDataset documentation
- Provide clear data loading and processing methods
- Include proper error handling and validation
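
As a sketch of the error handling and validation expected of a dataset, the hypothetical loader below checks that required columns exist and rejects rows with an empty patient ID, raising ``ValueError`` with a message that says what went wrong. It uses only Python's standard ``csv`` module and does not subclass PyHealth's ``BaseDataset``; the column names and the ``load_admissions`` function are assumptions for illustration.

.. code-block:: python

   import csv
   import io
   from typing import Dict, List, TextIO

   # Hypothetical required columns for an admissions table.
   REQUIRED_COLUMNS = ("patient_id", "admission_id", "admit_time")


   def load_admissions(source: TextIO) -> Dict[str, List[Dict[str, str]]]:
       """Loads admission rows from CSV text and groups them by patient ID.

       Args:
           source: An open text stream with a header row followed by data rows.

       Returns:
           A dict mapping each patient ID to the list of its admission rows.

       Raises:
           ValueError: If a required column is missing or a patient ID is empty.
       """
       reader = csv.DictReader(source)
       header = reader.fieldnames or []  # None when the input is empty.
       missing = [column for column in REQUIRED_COLUMNS if column not in header]
       if missing:
           raise ValueError(f"Admissions table is missing columns: {missing}")

       patients: Dict[str, List[Dict[str, str]]] = {}
       # Line 1 is the header; assumes no quoted fields span multiple lines.
       for line_number, row in enumerate(reader, start=2):
           patient_id = (row["patient_id"] or "").strip()
           if not patient_id:
               raise ValueError(f"Empty patient_id on line {line_number}")
           patients.setdefault(patient_id, []).append(row)
       return patients


   sample_csv = io.StringIO(
       "patient_id,admission_id,admit_time\n"
       "p1,a1,2020-01-01\n"
       "p1,a2,2020-02-01\n"
       "p2,a3,2020-03-01\n"
   )
   admissions = load_admissions(sample_csv)
   print({pid: len(rows) for pid, rows in admissions.items()})  # {'p1': 2, 'p2': 1}

Failing early with a specific message (which column, which line) makes problems in a user's local copy of the data much easier to diagnose than a ``KeyError`` deep inside later processing.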

Contributing a Task
-------------------

Tasks must follow the established task class structure:

- **Inherit from the base task class**: Follow the pattern used by existing tasks
- **Examples to reference**:

  - `Mortality prediction task <https://github.com/sunlabuiuc/PyHealth/blob/main/pyhealth/tasks/mortality_prediction.py>`_
  - `X-ray classification task <https://github.com/sunlabuiuc/PyHealth/blob/main/pyhealth/tasks/chest_xray_classification.py>`_

- **Flexibility**: Implementation details may vary, but every task must have clearly defined inputs and outputs
- **Test cases**: Include example test cases with defined inputs and expected outputs
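
The toy task below sketches what clear inputs and outputs look like, together with a test case that fixes both. It assumes a patient is a plain dict of visits (PyHealth passes its own ``Patient`` objects), and its ``task_name``, ``input_schema``, and ``output_schema`` attributes only approximate the pattern that PyHealth 2.0 tasks such as the mortality prediction example above appear to follow; check the base task class for the actual interface. ``ToyMortalityTask`` itself is hypothetical.

.. code-block:: python

   from typing import Any, Dict, List

   # Hypothetical stand-in for a patient: a plain dict instead of PyHealth's Patient.
   PatientRecord = Dict[str, Any]


   class ToyMortalityTask:
       """Predicts mortality for each visit from its diagnosis codes.

       The schema attributes only approximate PyHealth's task pattern.
       """

       task_name: str = "ToyMortalityPrediction"
       input_schema: Dict[str, str] = {"conditions": "sequence"}
       output_schema: Dict[str, str] = {"mortality": "binary"}

       def __call__(self, patient: PatientRecord) -> List[Dict[str, Any]]:
           """Converts one patient into a list of samples, one per usable visit.

           Args:
               patient: A dict with "patient_id" and "visits"; each visit has
                   "visit_id", "conditions" (a list of codes), and "died" (a bool).

           Returns:
               One sample dict per visit that has at least one condition code.
           """
           samples = []
           for visit in patient["visits"]:
               if not visit["conditions"]:
                   continue  # A visit without codes has no usable input.
               samples.append(
                   {
                       "patient_id": patient["patient_id"],
                       "visit_id": visit["visit_id"],
                       "conditions": list(visit["conditions"]),
                       "mortality": int(visit["died"]),
                   }
               )
           return samples


   # Example test case: a defined input and the exact expected output.
   patient = {
       "patient_id": "p1",
       "visits": [
           {"visit_id": "v1", "conditions": ["I10", "E119"], "died": False},
           {"visit_id": "v2", "conditions": [], "died": True},
       ],
   }
   expected = [
       {
           "patient_id": "p1",
           "visit_id": "v1",
           "conditions": ["I10", "E119"],
           "mortality": 0,
       }
   ]
   assert ToyMortalityTask()(patient) == expected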

Contributing a Model
--------------------

Models must follow the model base class structure:

- **Inherit from BaseModel**: All models must inherit from the appropriate base model class
- **Reference implementation**: See the `RNN model example <https://github.com/sunlabuiuc/PyHealth/blob/main/pyhealth/models/rnn.py>`_
- **Test cases**: Include example test cases with dummy inputs and expected outputs

**Key Requirements:**

- Implement required abstract methods from the base class
- Provide clear forward pass implementation
- Include proper initialization and configuration methods
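
To illustrate implementing a required abstract method and testing it with dummy inputs, the sketch below uses a hypothetical pure-Python base class in place of PyHealth's ``BaseModel``, which is built on PyTorch and works with different inputs and outputs. Only the pattern carries over: the base class declares ``forward`` as abstract, the subclass implements it, and dummy inputs with hand-checkable outputs make a quick test. ``ToyBaseModel`` and ``ToyLogisticModel`` are not PyHealth classes.

.. code-block:: python

   import math
   from abc import ABC, abstractmethod
   from typing import Dict, List


   def sigmoid(logit: float) -> float:
       """Numerically stable logistic function."""
       if logit >= 0:
           return 1.0 / (1.0 + math.exp(-logit))
       exp_logit = math.exp(logit)
       return exp_logit / (1.0 + exp_logit)


   class ToyBaseModel(ABC):
       """Hypothetical base class: every subclass must implement ``forward``."""

       @abstractmethod
       def forward(self, features: List[List[float]]) -> Dict[str, List[float]]:
           """Maps a batch of feature vectors to a dict of model outputs."""


   class ToyLogisticModel(ToyBaseModel):
       """Logistic regression with fixed weights, used only to show the pattern."""

       def __init__(self, weights: List[float], bias: float = 0.0) -> None:
           self.weights = weights
           self.bias = bias

       def forward(self, features: List[List[float]]) -> Dict[str, List[float]]:
           y_prob = []
           for row in features:
               if len(row) != len(self.weights):
                   raise ValueError(
                       f"Expected {len(self.weights)} features, got {len(row)}"
                   )
               logit = sum(w * x for w, x in zip(self.weights, row)) + self.bias
               y_prob.append(sigmoid(logit))
           return {"y_prob": y_prob}


   # Dummy-input test: both rows have logit 0, so both probabilities are 0.5.
   model = ToyLogisticModel(weights=[1.0, -1.0])
   assert model.forward([[0.0, 0.0], [2.0, 2.0]]) == {"y_prob": [0.5, 0.5]}

Because ``forward`` is abstract, instantiating ``ToyBaseModel`` directly raises ``TypeError``, so a subclass that forgets to implement it fails immediately instead of at training time.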

Test Case Requirements
======================

Every contribution must include two types of test cases:

1. **Automated tests**: These are run by our continuous integration system.
2. **Manual test cases**: Define these yourself, including:

   - Clear input specifications
   - Expected output formats
   - Example usage demonstrating the functionality

**Note**: You can use frontier LLMs to help generate basic test cases, which we consider valid as long as they are reasonable and comprehensive.
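
A manual test case can be as small as the module below, which states its inputs and expected outputs explicitly and covers edge cases. ``normalize_codes`` is a hypothetical helper, and the sketch uses the standard-library ``unittest`` module; the repository's own test conventions may differ, so mirror existing test files in the repository.

.. code-block:: python

   import unittest
   from typing import List


   def normalize_codes(codes: List[str]) -> List[str]:
       """Hypothetical helper: cleans codes and drops blank entries."""
       cleaned = (code.strip().replace(".", "").upper() for code in codes)
       return [code for code in cleaned if code]


   class TestNormalizeCodes(unittest.TestCase):
       def test_typical_input(self) -> None:
           # Input: raw ICD-style strings with mixed case, dots, and padding.
           # Expected output: compact upper-case codes in the original order.
           self.assertEqual(normalize_codes(["i10", " E11.9 "]), ["I10", "E119"])

       def test_empty_and_blank_input(self) -> None:
           # Edge cases: an empty list and entries that are only whitespace.
           self.assertEqual(normalize_codes([]), [])
           self.assertEqual(normalize_codes(["  ", ""]), [])

Saved to a test file, it can be run with ``python -m unittest path/to/test_file.py`` or with ``pytest``, which also collects ``unittest.TestCase`` classes.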

Pull Request Guidelines
=======================

Formatting Your Pull Request
----------------------------

Every pull request description must include the following information:

1. **Who you are** (include your NetID if you are an Illinois student)
2. **Type of contribution** (dataset, task, model, bug fix, etc.)
3. **High-level description** of what you implemented
4. **File guide**: A quick rundown of which files reviewers should examine to test your implementation

**Example PR Description:**

.. code-block:: text

   **Contributor:** Jane Doe (jdoe2@illinois.edu)

   **Contribution Type:** New Dataset

   **Description:** Added support for the XYZ Hospital dataset with patient
   admission records and diagnostic codes. Includes data preprocessing and a
   sample task for mortality prediction.

   **Files to Review:**
   - `pyhealth/datasets/xyz_hospital.py` - Main dataset implementation
   - `pyhealth/tasks/xyz_mortality.py` - Example task
   - `tests/core/test_xyz_dataset.py` - Test cases

Review Process
--------------

After submitting your pull request:

1. Maintainers will review your code for style, functionality, and completeness
2. Automated tests will be run to ensure compatibility
3. You may be asked to make revisions based on feedback
4. Once approved, your contribution will be merged into the main branch

Getting Help
============

If you need assistance:

- Check existing issues and discussions on GitHub
- Review similar implementations in the codebase
- Reach out to maintainers through GitHub issues
- Consider using LLMs to help with code formatting and documentation

We appreciate your contributions to making PyHealth better for the healthcare AI community!
3 changes: 2 additions & 1 deletion docs/index.rst
@@ -302,7 +302,7 @@ GRASP deep learning ``pyhealth.models.GRASP`

install
tutorials
-advance_tutorials
+.. advance_tutorials


.. toctree::
@@ -326,6 +326,7 @@ GRASP deep learning ``pyhealth.models.GRASP`
:hidden:
:caption: Additional Information

+how_to_contribute
live
log
about
6 changes: 3 additions & 3 deletions docs/tutorials.rst
@@ -1,14 +1,14 @@
Tutorials
========================

-We provide the following tutorials to help users get started with our pyhealth. Please bare with us as we update the docuemntation on how to use pyhealth.
+We provide the following tutorials to help users get started with our pyhealth. Please bear with us as we update the documentation on how to use pyhealth 2.0.


`Tutorial 0: Introduction to pyhealth.data <https://colab.research.google.com/drive/1y9PawgSbyMbSSMw1dpfwtooH7qzOEYdN?usp=sharing>`_ `[Video] <https://www.youtube.com/watch?v=Nk1itBoLOX8&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=2>`_

-`Tutorial 1: Introduction to pyhealth.datasets <https://colab.research.google.com/drive/18kbzEQAj1FMs_J9rTGX8eCoxnWdx4Ltn?usp=sharing>`_ `[Video] <https://www.youtube.com/watch?v=c1InKqFJbsI&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=3>`_
+`Tutorial 1: Introduction to pyhealth.datasets <https://colab.research.google.com/drive/1voSx7wEfzXfEf2sIfW6b-8p1KqMyuWxK?usp=sharing>`_ `[Video (PyHealth 1.6)] <https://www.youtube.com/watch?v=c1InKqFJbsI&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=3>`_

-`Tutorial 2: Introduction to pyhealth.tasks <https://colab.research.google.com/drive/1r7MYQR_5yCJGpK_9I9-A10HmpupZuIN-?usp=sharing>`_ `[Video] <https://www.youtube.com/watch?v=CxESe1gYWU4&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=4>`_
+`Tutorial 2: Introduction to pyhealth.tasks <https://colab.research.google.com/drive/1kKkkBVS_GclHoYTbnOtjyYnSee79hsyT?usp=sharing>`_ `[Video (PyHealth 1.6)] <https://www.youtube.com/watch?v=CxESe1gYWU4&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=4>`_

`Tutorial 3: Introduction to pyhealth.models <https://colab.research.google.com/drive/1LcXZlu7ZUuqepf269X3FhXuhHeRvaJX5?usp=sharing>`_ `[Video] <https://www.youtube.com/watch?v=fRc0ncbTgZA&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=6>`_
