
Commit 71e6838

Fix pip extras (#4)
1 parent 9d52a41 commit 71e6838

File tree

7 files changed: +374, -633 lines


.pre-commit-config.yaml

Lines changed: 5 additions & 6 deletions
```diff
@@ -22,7 +22,7 @@ repos:
         args: [--lock]

   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.5.7
+    rev: v0.6.1
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]
@@ -40,9 +40,10 @@ repos:
     rev: v1.11.1
     hooks:
      - id: mypy
-        entry: python3 -m mypy --show-error-codes --pretty --config-file pyproject.toml
-        types: [python]
-        exclude: "tests"
+        entry: mypy
+        args: ["--config-file=pyproject.toml", "--show-error-codes", "--pretty"]
+        types_or: [python, pyi]
+        exclude: tests|projects

   - repo: https://github.com/nbQA-dev/nbQA
     rev: 1.8.7
@@ -58,5 +59,3 @@ repos:
         entry: python3 -m pytest -m "not integration_test"
         pass_filenames: false
         always_run: true
-
-        exclude: "projects"
```

README.md

Lines changed: 81 additions & 20 deletions
````diff
@@ -16,6 +16,63 @@ python3 -m venv /path/to/new/virtual/environment
 source /path/to/new/virtual/environment/bin/activate
 ```

+<details>
+<summary><b>Installation Options</b></summary>
+You can install optional dependencies to enable additional features. Use one or more of the pip extras listed below to
+install the desired dependencies.
+
+<table>
+<tr>
+<th style="text-align: left; width: 150px"> pip extra </th>
+<th style="text-align: center"> Dependencies </th>
+<th style="text-align: center"> Notes </th>
+</tr>
+
+<tr>
+<td>
+vision
+</td>
+<td>
+"torchvision", "opencv-python", "timm"
+</td>
+<td>
+Enables image processing and vision tasks.
+</td>
+</tr>
+
+<tr>
+<td>
+audio
+</td>
+<td>
+"torchaudio"
+</td>
+<td>
+Enables audio processing and tasks.
+</td>
+</tr>
+
+<tr>
+<td>
+peft
+</td>
+<td>
+"peft"
+</td>
+<td>
+Uses the <a href=https://huggingface.co/docs/peft/index>PEFT</a> library to enable parameter-efficient fine-tuning.
+</td>
+</tr>
+
+</table>
+
+For example, to install the library with the `vision` and `audio` extras, run:
+```bash
+python3 -m pip install mmlearn[vision,audio]
+```
+
+</details>
+
 #### Installing binaries
 To install the pre-built binaries, run:
 ```bash
````
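As a quick way to confirm which of the optional dependencies from the table above are present in the current environment, the following illustrative snippet (not part of mmlearn) checks whether each extra's packages are importable; note that `opencv-python` is imported as `cv2`:

```python
# Illustrative check only -- not part of mmlearn. Package names follow the
# extras table above; opencv-python is imported as `cv2`.
import importlib.util

extras = {
    "vision": ["torchvision", "cv2", "timm"],
    "audio": ["torchaudio"],
    "peft": ["peft"],
}

for extra, modules in extras.items():
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    print(f"{extra}: {'OK' if not missing else 'missing ' + ', '.join(missing)}")
```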
````diff
@@ -32,25 +89,31 @@ python3 -m pip install -e .
 ```

 ### Running Experiments
-To run an experiment, create a folder with a similar structure as the [`configs`](configs/) folder.
-Then, use the `mmlearn_run` command to run the experiment as defined in a `.yaml` file under the `experiment` folder, like so:
+We use [Hydra](https://hydra.cc/docs/intro/) and [hydra-zen](https://mit-ll-responsible-ai.github.io/hydra-zen/) to manage configurations
+in the library.
+
+For new experiments, it is recommended to create a new directory to store the configuration files. The directory should
+have an `__init__.py` file to make it a Python package and an `experiment` folder to store the experiment configuration files.
+This format allows the use of `.yaml` configuration files as well as Python modules (using [structured configs](https://hydra.cc/docs/tutorials/structured_config/intro/) or [hydra-zen](https://mit-ll-responsible-ai.github.io/hydra-zen/)) to define the experiment configurations.
+
+To run an experiment, use the following command:
 ```bash
-mmlearn_run --config-dir /path/to/config/dir +experiment=<name_of_experiment_config> experiment=your_experiment_name
+mmlearn_run 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> experiment=your_experiment_name
 ```
-Notice that the config directory refers to the top-level directory containing the `experiment` folder. The experiment
-name is the name of the `.yaml` file under the `experiment` folder, without the extension.
+Hydra will compose the experiment configuration from all the configurations in the specified directory as well as all the
+configurations in the `mmlearn` package. *Note the dot-separated path to the directory containing the experiment configuration
+files.*

-We use [Hydra](https://hydra.cc/docs/intro/) to manage configurations, so you can override any configuration parameter
-from the command line. To see the available options and other information, run:
+Hydra also allows for overriding configuration parameters from the command line. To see the available options and other information, run:
 ```bash
-mmlearn_run --config-dir /path/to/config/dir +experiment=<name_of_experiment> --help
+mmlearn_run 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> --help
 ```

 By default, the `mmlearn_run` command will run the experiment locally. To run the experiment on a SLURM cluster, we use
 the [submitit launcher](https://hydra.cc/docs/plugins/submitit_launcher/) plugin built into Hydra. The following is an example
 of how to run an experiment on a SLURM cluster:
 ```bash
-mmlearn_run --multirun hydra.launcher.mem_gb=32 hydra.launcher.qos=your_qos hydra.launcher.partition=your_partition hydra.launcher.gres=gpu:4 hydra.launcher.cpus_per_task=8 hydra.launcher.tasks_per_node=4 hydra.launcher.nodes=1 hydra.launcher.stderr_to_stdout=true hydra.launcher.timeout_min=60 '+hydra.launcher.additional_parameters={export: ALL}' --config-dir /path/to/config/dir +experiment=<name_of_experiment_config> experiment=your_experiment_name
+mmlearn_run --multirun hydra.launcher.mem_gb=32 hydra.launcher.qos=your_qos hydra.launcher.partition=your_partition hydra.launcher.gres=gpu:4 hydra.launcher.cpus_per_task=8 hydra.launcher.tasks_per_node=4 hydra.launcher.nodes=1 hydra.launcher.stderr_to_stdout=true hydra.launcher.timeout_min=60 '+hydra.launcher.additional_parameters={export: ALL}' 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> experiment=your_experiment_name
 ```
 This will submit a job to the SLURM cluster with the specified resources.
````
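Experiment configurations can therefore be written either as `.yaml` files under the `experiment` folder or as Python structured configs. Below is a minimal, illustrative sketch of the hydra-zen route; the package name `my_project`, the config fields, and how mmlearn discovers the registration are assumptions for demonstration, not the library's documented API:

```python
# my_project/__init__.py -- illustrative sketch only. The config fields and the
# "experiment" group name are assumptions; consult the mmlearn configs for the
# actual experiment schema.
from hydra_zen import make_config, store

# Build a dataclass-backed config with a couple of toy fields.
ToyExperiment = make_config(seed=42, max_epochs=10)

# Register it under the "experiment" group so it could be selected with
# `+experiment=toy_experiment`.
store(ToyExperiment, group="experiment", name="toy_experiment")

# Copy the registered entries into Hydra's global ConfigStore.
store.add_to_hydra_store()
```

With the configuration package on Hydra's search path (for example `'hydra.searchpath=[pkg://my_project]'`), `.yaml` files under its `experiment` folder are composed the same way.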

````diff
@@ -93,20 +156,18 @@ using recall@k metric. This is applicable to any number of pairs of modalities a

 ## Components
 ### Datasets
-Every dataset object must return an instance of [`Example`](mmlearn/datasets/core/example.py) with one or more keys/attributes
-corresponding to a modality name as specified in the [`Modalities registry`](mmlearn/datasets/core/modalities.py).
-The `Example` object must also include an `example_index` attribute/key, which is used, in addition to the dataset index,
-to uniquely identify the example.
+Every dataset object must return an instance of `Example` with one or more keys/attributes corresponding to a modality name
+as specified in the `Modalities` registry. The `Example` object must also include an `example_index` attribute/key, which
+is used, in addition to the dataset index, to uniquely identify the example.

 <details>
 <summary><b>CombinedDataset</b></summary>

-The [`CombinedDataset`](mmlearn/datasets/core/combined_dataset.py) object is used to combine multiple datasets into one. It
-accepts an iterable of `torch.utils.data.Dataset` and/or `torch.utils.data.IterableDataset` objects and returns an `Example`
-object from one of the datasets, given an index. Conceptually, the `CombinedDataset` object is a concatenation of the
-datasets in the input iterable, so the given index can be mapped to a specific dataset based on the size of the datasets.
-As iterable-style datasets do not support random access, the examples from these datasets are returned in order as they
-are iterated over.
+The `CombinedDataset` object is used to combine multiple datasets into one. It accepts an iterable of `torch.utils.data.Dataset`
+and/or `torch.utils.data.IterableDataset` objects and returns an `Example` object from one of the datasets, given an index.
+Conceptually, the `CombinedDataset` object is a concatenation of the datasets in the input iterable, so the given index
+can be mapped to a specific dataset based on the size of the datasets. As iterable-style datasets do not support random access,
+the examples from these datasets are returned in order as they are iterated over.

 The `CombinedDataset` object also adds a `dataset_index` attribute to the `Example` object, corresponding to the index of
 the dataset in the input iterable. Every example returned by the `CombinedDataset` will have an `example_ids` attribute,
@@ -116,7 +177,7 @@ which is instance of `Example` containing the same keys/attributes as the origin

 ### Dataloading
 When dealing with multiple datasets with different modalities, the default `collate_fn` of `torch.utils.data.DataLoader`
-may not work, as it assumes that all examples have the same keys/attributes. In that case, the [`collate_example_list`](mmlearn/datasets/core/example.py)
+may not work, as it assumes that all examples have the same keys/attributes. In that case, the `collate_example_list`
 function can be used as the `collate_fn` argument of `torch.utils.data.DataLoader`. This function takes a list of `Example`
 objects and returns a dictionary of tensors, with all the keys/attributes of the `Example` objects.
````
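To make the `Example` and `collate_example_list` contract concrete, here is a minimal, illustrative sketch. The import path mirrors `mmlearn/datasets/core/example.py` referenced above, but the exact `Example` constructor and the modality key names used here are assumptions rather than verified library APIs:

```python
# Illustrative sketch only. The import path follows mmlearn/datasets/core/example.py
# as referenced above; the Example constructor and the modality key names
# ("rgb", "text") are assumptions for demonstration.
import torch
from torch.utils.data import DataLoader, Dataset

from mmlearn.datasets.core.example import Example, collate_example_list


class ToyImageTextDataset(Dataset):
    """A toy dataset that yields Example objects keyed by modality names."""

    def __len__(self) -> int:
        return 8

    def __getitem__(self, idx: int) -> Example:
        return Example(
            {
                "rgb": torch.rand(3, 224, 224),  # key assumed to match a name in the Modalities registry
                "text": torch.randint(0, 100, (16,)),  # e.g. token ids
                "example_index": idx,  # used, with the dataset index, to identify the example
            }
        )


# collate_example_list merges a list of Example objects (possibly with
# heterogeneous keys) into a single dictionary of batched tensors.
loader = DataLoader(ToyImageTextDataset(), batch_size=4, collate_fn=collate_example_list)
for batch in loader:
    print(batch.keys())
```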

mmlearn/datasets/__init__.py

Lines changed: 0 additions & 2 deletions
```diff
@@ -1,7 +1,6 @@
 """Datasets."""

 from mmlearn.datasets.chexpert import CheXpert
-from mmlearn.datasets.ego4d import Ego4DDataset
 from mmlearn.datasets.imagenet import ImageNet
 from mmlearn.datasets.librispeech import LibriSpeech
 from mmlearn.datasets.llvip import LLVIPDataset
@@ -12,7 +11,6 @@

 __all__ = [
     "CheXpert",
-    "Ego4DDataset",
     "ImageNet",
     "LibriSpeech",
     "LLVIPDataset",
```

mmlearn/datasets/ego4d.py

Lines changed: 0 additions & 126 deletions
This file was deleted.
