You can install optional dependencies to enable additional features. Use one or more of the pip extras listed below to
install the desired dependencies.

<table>
<tr>
<th style="text-align: left; width: 150px"> pip extra </th>
<th style="text-align: center"> Dependencies </th>
<th style="text-align: center"> Notes </th>
</tr>
<tr>
<td> vision </td>
<td> "torchvision", "opencv-python", "timm" </td>
<td> Enables image processing and vision tasks. </td>
</tr>
<tr>
<td> audio </td>
<td> "torchaudio" </td>
<td> Enables audio processing and tasks. </td>
</tr>
<tr>
<td> peft </td>
<td> "peft" </td>
<td> Uses the <a href="https://huggingface.co/docs/peft/index">PEFT</a> library to enable parameter-efficient fine-tuning. </td>
</tr>
</table>

For example, to install the library with the `vision` and `audio` extras, run:

```bash
python3 -m pip install mmlearn[vision,audio]
```

</details>
#### Installing binaries
To install the pre-built binaries, run:
```bash
python3 -m pip install mmlearn
```

#### Installing from source

To install the library from source (e.g., for development), run:

```bash
python3 -m pip install -e .
```
### Running Experiments
We use [Hydra](https://hydra.cc/docs/intro/) and [hydra-zen](https://mit-ll-responsible-ai.github.io/hydra-zen/) to manage configurations in the library.

For new experiments, it is recommended to create a new directory to store the configuration files. The directory should have an `__init__.py` file to make it a Python package and an `experiment` folder to store the experiment configuration files. This format allows the use of `.yaml` configuration files as well as Python modules (using [structured configs](https://hydra.cc/docs/tutorials/structured_config/intro/) or [hydra-zen](https://mit-ll-responsible-ai.github.io/hydra-zen/)) to define the experiment configurations.
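
For example, an experiment can be launched with the `mmlearn_run` command using Hydra's override syntax. The invocation below is a sketch: `my_project` and `my_experiment` are placeholder names, and the SLURM-related overrides assume the [hydra-submitit-launcher](https://hydra.cc/docs/plugins/submitit_launcher/) plugin is installed:

```bash
mmlearn_run --multirun \
    hydra/launcher=submitit_slurm \
    hydra.launcher.partition=gpu \
    hydra.launcher.mem_gb=32 \
    'hydra.searchpath=[pkg://my_project.configs]' \
    +experiment=my_experiment
```
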
This will submit a job to the SLURM cluster with the specified resources.
## Components
### Datasets
Every dataset object must return an instance of `Example` with one or more keys/attributes corresponding to a modality name as specified in the `Modalities` registry. The `Example` object must also include an `example_index` attribute/key, which is used, in addition to the dataset index, to uniquely identify the example.
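
For illustration, a map-style dataset might look like the following sketch. The import paths and the `Modalities` attribute names are assumptions made for the example, not a guaranteed API:

```python
# A minimal sketch of a dataset returning `Example` objects keyed by
# modality names from the `Modalities` registry. The exact attribute
# names (e.g., `Modalities.RGB`) are assumed for illustration.
from torch.utils.data import Dataset

from mmlearn.datasets.core.example import Example
from mmlearn.datasets.core.modalities import Modalities


class ImageCaptionDataset(Dataset):
    """Pairs preloaded image tensors with their captions."""

    def __init__(self, images, captions):
        self.images = images      # list of image tensors
        self.captions = captions  # list of caption strings

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return Example(
            {
                Modalities.RGB.name: self.images[idx],
                Modalities.TEXT.name: self.captions[idx],
                "example_index": idx,  # uniquely identifies the example
            }
        )
```
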
<details>
<summary><b>CombinedDataset</b></summary>
The `CombinedDataset` object is used to combine multiple datasets into one. It accepts an iterable of `torch.utils.data.Dataset` and/or `torch.utils.data.IterableDataset` objects and returns an `Example` object from one of the datasets, given an index. Conceptually, the `CombinedDataset` object is a concatenation of the datasets in the input iterable, so the given index can be mapped to a specific dataset based on the size of the datasets. For example, combining a dataset of length 100 with one of length 50 yields a `CombinedDataset` of length 150, where index 120 maps to the 21st example of the second dataset. As iterable-style datasets do not support random access, the examples from these datasets are returned in order as they are iterated over.
The `CombinedDataset` object also adds a `dataset_index` attribute to the `Example` object, corresponding to the index of the dataset in the input iterable. Every example returned by the `CombinedDataset` will have an `example_ids` attribute, which is an instance of `Example` containing the same keys/attributes as the original example.

</details>
### Dataloading
When dealing with multiple datasets with different modalities, the default `collate_fn` of `torch.utils.data.DataLoader`
may not work, as it assumes that all examples have the same keys/attributes. In that case, the `collate_example_list`
function can be used as the `collate_fn` argument of `torch.utils.data.DataLoader`. This function takes a list of `Example`
objects and returns a dictionary of tensors, with all the keys/attributes of the `Example` objects.
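
For instance, a sketch of wiring this into a `DataLoader` (the two dataset variables are hypothetical stand-ins for datasets that return `Example` objects):

```python
# Sketch: batching heterogeneous `Example` objects with collate_example_list.
from torch.utils.data import DataLoader

from mmlearn.datasets.core.combined_dataset import CombinedDataset
from mmlearn.datasets.core.example import collate_example_list

# `image_text_dataset` and `audio_text_dataset` are placeholders for any
# datasets that return `Example` objects, possibly with different keys.
dataset = CombinedDataset([image_text_dataset, audio_text_dataset])
loader = DataLoader(
    dataset,
    batch_size=32,
    collate_fn=collate_example_list,  # tolerates differing keys across examples
)
```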