
Commit 0893db7

Update docs (#40)
1 parent ed2fd59 commit 0893db7

File tree: 8 files changed, +2546 −1703 lines


.github/workflows/docs_build.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -31,5 +31,5 @@ jobs:
         python3 -m pip install --upgrade pip && python3 -m pip install poetry
         poetry env use '3.10'
         source $(poetry env info --path)/bin/activate
-        poetry install --with docs,test
+        poetry install --with docs,test,dev,peft
         cd docs && rm -rf source/reference/api && make html
```

README.md

Lines changed: 30 additions & 30 deletions
````diff
@@ -1,23 +1,30 @@
 # mmlearn
+
 [![code checks](https://github.com/VectorInstitute/mmlearn/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/mmlearn/actions/workflows/code_checks.yml)
 [![integration tests](https://github.com/VectorInstitute/mmlearn/actions/workflows/integration_tests.yml/badge.svg)](https://github.com/VectorInstitute/mmlearn/actions/workflows/integration_tests.yml)
 [![license](https://img.shields.io/github/license/VectorInstitute/mmlearn.svg)](https://github.com/VectorInstitute/mmlearn/blob/main/LICENSE)
 
-This project aims at enabling the evaluation of existing multimodal representation learning methods, as well as facilitating
+*mmlearn* aims at enabling the evaluation of existing multimodal representation learning methods, as well as facilitating
 experimentation and research for new techniques.
 
 ## Quick Start
+
 ### Installation
+
 #### Prerequisites
+
 The library requires Python 3.10 or later. We recommend using a virtual environment to manage dependencies. You can create
 a virtual environment using the following command:
+
 ```bash
 python3 -m venv /path/to/new/virtual/environment
 source /path/to/new/virtual/environment/bin/activate
 ```
 
 #### Installing binaries
+
 To install the pre-built binaries, run:
+
 ```bash
 python3 -m pip install mmlearn
 ```
@@ -73,13 +80,15 @@ Uses the <a href=https://huggingface.co/docs/peft/index>PEFT</a> library to enab
 </table>
 
 For example, to install the library with the `vision` and `audio` extras, run:
+
 ```bash
 python3 -m pip install mmlearn[vision,audio]
 ```
 
 </details>
 
 #### Building from source
+
 To install the library from source, run:
 
 ```bash
@@ -89,6 +98,7 @@ python3 -m pip install -e .
 ```
 
 ### Running Experiments
+
 We use [Hydra](https://hydra.cc/docs/intro/) and [hydra-zen](https://mit-ll-responsible-ai.github.io/hydra-zen/) to manage configurations
 in the library.
 
@@ -97,9 +107,11 @@ have an `__init__.py` file to make it a Python package and an `experiment` folde
 This format allows the use of `.yaml` configuration files as well as Python modules (using [structured configs](https://hydra.cc/docs/tutorials/structured_config/intro/) or [hydra-zen](https://mit-ll-responsible-ai.github.io/hydra-zen/)) to define the experiment configurations.
 
 To run an experiment, use the following command:
+
 ```bash
 mmlearn_run 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> experiment=your_experiment_name
 ```
+
 Hydra will compose the experiment configuration from all the configurations in the specified directory as well as all the
 configurations in the `mmlearn` package. *Note the dot-separated path to the directory containing the experiment configuration
 files.*
````
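
As a rough illustration of the setup the README describes above: the directory added to `hydra.searchpath` is a Python package whose configurations Hydra can discover. The sketch below uses only standard Hydra API, but the package name `my_project.configs` and every field in it are hypothetical; the real experiment schema comes from `mmlearn`, and a `.yaml` file under an `experiment/` folder works just as well.

```python
# my_project/configs/__init__.py -- hypothetical package; the __init__.py is
# what makes the directory importable and addressable as
# pkg://my_project.configs on hydra.searchpath.
from dataclasses import dataclass

from hydra.core.config_store import ConfigStore


@dataclass
class MyExperimentConfig:
    """Toy structured config; real experiments use mmlearn's own schema."""

    seed: int = 42
    max_epochs: int = 10
    learning_rate: float = 1e-4


# Registering under the "experiment" group is what lets
# `+experiment=my_experiment` resolve on the mmlearn_run command line;
# package="_global_" merges the fields at the top level of the composed config.
cs = ConfigStore.instance()
cs.store(group="experiment", name="my_experiment", node=MyExperimentConfig, package="_global_")
```

With such a package on the search path, `mmlearn_run 'hydra.searchpath=[pkg://my_project.configs]' +experiment=my_experiment` would compose this config with the defaults shipped in `mmlearn`.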
````diff
@@ -109,23 +121,38 @@ One can add a path to `hydra.searchpath` either as a package (`pkg://path.to.con
 Hence, please refrain from using the `file://` notation.
 
 Hydra also allows for overriding configuration parameters from the command line. To see the available options and other information, run:
+
 ```bash
 mmlearn_run 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> --help
 ```
 
 By default, the `mmlearn_run` command will run the experiment locally. To run the experiment on a SLURM cluster, we use
 the [submitit launcher](https://hydra.cc/docs/plugins/submitit_launcher/) plugin built into Hydra. The following is an example
 of how to run an experiment on a SLURM cluster:
+
 ```bash
-mmlearn_run --multirun hydra.launcher.mem_gb=32 hydra.launcher.qos=your_qos hydra.launcher.partition=your_partition hydra.launcher.gres=gpu:4 hydra.launcher.cpus_per_task=8 hydra.launcher.tasks_per_node=4 hydra.launcher.nodes=1 hydra.launcher.stderr_to_stdout=true hydra.launcher.timeout_min=60 '+hydra.launcher.additional_parameters={export: ALL}' 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> experiment=your_experiment_name
+mmlearn_run --multirun \
+    hydra.launcher.mem_per_cpu=5G \
+    hydra.launcher.qos=your_qos \
+    hydra.launcher.partition=your_partition \
+    hydra.launcher.gres=gpu:4 \
+    hydra.launcher.cpus_per_task=8 \
+    hydra.launcher.tasks_per_node=4 \
+    hydra.launcher.nodes=1 \
+    hydra.launcher.stderr_to_stdout=true \
+    hydra.launcher.timeout_min=720 \
+    'hydra.searchpath=[pkg://path.to.my_project.configs]' \
+    +experiment=my_experiment \
+    experiment_name=my_experiment_name
 ```
+
 This will submit a job to the SLURM cluster with the specified resources.
 
 **Note**: After the job is submitted, it is okay to cancel the program with `Ctrl+C`. The job will continue running on
 the cluster. You can also add `&` at the end of the command to run it in the background.
 
-
 ## Summary of Implemented Methods
+
 <table>
 <tr>
 <th style="text-align: left; width: 250px"> Pretraining Methods </th>
@@ -181,33 +208,6 @@ Binary and multi-class classification tasks are supported.
 </tr>
 </table>
 
-## Components
-### Datasets
-Every dataset object must return an instance of `Example` with one or more keys/attributes corresponding to a modality name
-as specified in the `Modalities` registry. The `Example` object must also include an `example_index` attribute/key, which
-is used, in addition to the dataset index, to uniquely identify the example.
-
-<details>
-<summary><b>CombinedDataset</b></summary>
-
-The `CombinedDataset` object is used to combine multiple datasets into one. It accepts an iterable of `torch.utils.data.Dataset`
-and/or `torch.utils.data.IterableDataset` objects and returns an `Example` object from one of the datasets, given an index.
-Conceptually, the `CombinedDataset` object is a concatenation of the datasets in the input iterable, so the given index
-can be mapped to a specific dataset based on the size of the datasets. As iterable-style datasets do not support random access,
-the examples from these datasets are returned in order as they are iterated over.
-
-The `CombinedDataset` object also adds a `dataset_index` attribute to the `Example` object, corresponding to the index of
-the dataset in the input iterable. Every example returned by the `CombinedDataset` will have an `example_ids` attribute,
-which is an instance of `Example` containing the same keys/attributes as the original example, with the exception of the
-`example_index` and `dataset_index` attributes, with values being a tensor of the `dataset_index` and `example_index`.
-</details>
-
-### Dataloading
-When dealing with multiple datasets with different modalities, the default `collate_fn` of `torch.utils.data.DataLoader`
-may not work, as it assumes that all examples have the same keys/attributes. In that case, the `collate_example_list`
-function can be used as the `collate_fn` argument of `torch.utils.data.DataLoader`. This function takes a list of `Example`
-objects and returns a dictionary of tensors, with all the keys/attributes of the `Example` objects.
-
 ## Contributing
 
 If you are interested in contributing to the library, please see [CONTRIBUTING.MD](CONTRIBUTING.MD). This file contains
````
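
The `Components` section removed by this commit still documents the dataset API concisely, so here is a minimal end-to-end sketch of how those pieces fit together. Treat it as a hedged illustration: the import paths, the `Example` constructor, and the modality key names are assumptions inferred from the removed text, not verified against the package.

```python
# Sketch only: mmlearn.datasets.core is an assumed module path, and the
# Example constructor / modality keys ("rgb", "text") are illustrative.
import torch
from torch.utils.data import DataLoader, Dataset

from mmlearn.datasets.core import CombinedDataset, Example, collate_example_list


class ToyImageTextDataset(Dataset):
    """Hypothetical dataset; each item is an Example keyed by modality name."""

    def __len__(self) -> int:
        return 8

    def __getitem__(self, idx: int) -> Example:
        # Keys should correspond to modality names in the Modalities registry,
        # and example_index is required to uniquely identify the example.
        return Example(
            {
                "rgb": torch.rand(3, 224, 224),
                "text": f"caption {idx}",
                "example_index": idx,
            }
        )


# CombinedDataset behaves like a concatenation: a global index is mapped to
# one of the underlying datasets, and a dataset_index is attached to each
# returned Example.
combined = CombinedDataset([ToyImageTextDataset(), ToyImageTextDataset()])

# Examples from different datasets need not share keys, so the default
# collate_fn may fail; collate_example_list merges a list of Examples into
# a dictionary of batched tensors keyed by the union of their attributes.
loader = DataLoader(combined, batch_size=4, collate_fn=collate_example_list)
```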

docs/source/conf.py

Lines changed: 11 additions & 9 deletions
```diff
@@ -31,8 +31,14 @@
     "sphinx_copybutton",
     "sphinx_design",
     "sphinxcontrib.apidoc",
+    "myst_parser",
 ]
 add_module_names = False
+apidoc_module_dir = "../../mmlearn"
+apidoc_output_dir = "reference/api"
+apidoc_excluded_paths = ["tests"]
+apidoc_separate_modules = True
+apidoc_module_first = True
 autoclass_content = "class"
 autodoc_default_options = {
     "members": True,
@@ -47,13 +53,6 @@
 autosummary_generate = True
 copybutton_prompt_text = r">>> |\.\.\. "
 copybutton_prompt_is_regexp = True
-napoleon_google_docstring = False
-napoleon_numpy_docstring = True
-napoleon_include_init_with_doc = True
-napoleon_attr_annotations = True
-set_type_checking_flag = True
-
-
 intersphinx_mapping = {
     "python": ("https://docs.python.org/3.10/", None),
     "numpy": ("http://docs.scipy.org/doc/numpy/", None),
@@ -67,9 +66,12 @@
     "torchmetrics": ("https://lightning.ai/docs/torchmetrics/stable/", None),
     "Pillow": ("https://pillow.readthedocs.io/en/latest/", None),
     "transformers": ("https://huggingface.co/docs/transformers/en/", None),
-    "peft": ("https://huggingface.co/docs/peft/en/", None),
 }
-
+napoleon_google_docstring = False
+napoleon_numpy_docstring = True
+napoleon_include_init_with_doc = True
+napoleon_attr_annotations = True
+set_type_checking_flag = True
 templates_path = ["_templates"]
 
 # -- Options for HTML output -------------------------------------------------
```

docs/source/contributing.rst

Lines changed: 2 additions & 0 deletions
```diff
@@ -0,0 +1,2 @@
+.. include:: ../../CONTRIBUTING.md
+   :parser: myst_parser.sphinx_
```

docs/source/index.rst

Lines changed: 2 additions & 1 deletion
```diff
@@ -12,5 +12,6 @@ Contents
    :maxdepth: 2
 
    installation
-   getting_started
+   user_guide
+   contributing
    api
```
