README.md (35 additions, 14 deletions)
@@ -28,8 +28,8 @@ _Suggestions are always welcome!_
 **Why you might want to use it:**

-✅ Speed <br>
-Rapidly iterate over models, datasets, tasks and experiments on different accelerators like multi-GPUs or TPUs.
+✅ Save on boilerplate <br>
+Easily add new models, datasets, tasks, and experiments, and train on different accelerators, like multi-GPU, TPU, or SLURM clusters.

 ✅ Education <br>
 Thoroughly commented. You can use this repo as a learning resource.
@@ -46,7 +46,10 @@ Lightning and Hydra are still evolving and integrate many libraries, which means
 Template is not really adjusted for building data pipelines that depend on each other. It's more efficient to use it for model prototyping on ready-to-use data.

 ❌ Overfitted to simple use case <br>
-The configuration setup is built with simple lightning training in mind. You might need to put in some effort to adjust it for different use cases, e.g. Lightning Lite.
+The configuration setup is built with simple lightning training in mind. You might need to put in some effort to adjust it for different use cases, e.g. Lightning Fabric.
+
+❌ Might not support your workflow <br>
+For example, you can't resume a hydra-based multirun or hyperparameter search.

 > **Note**: _Keep in mind this is an unofficial community project._
@@ -319,9 +322,6 @@ python train.py debug=overfit
 # raise exception if there are any numerical anomalies in tensors, like NaN or +/-inf

@@ ... @@
 > **Note**: Apply pre-commit hooks to do things like auto-formatting code and configs, performing code analysis, or removing output from jupyter notebooks. See [# Best Practices](#best-practices) for more.

+Update pre-commit hook versions in `.pre-commit-config.yaml` with:
+
+```bash
+pre-commit autoupdate
+```
+
 </details>
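For context, a minimal `.pre-commit-config.yaml` that `pre-commit autoupdate` would bump might look like the following sketch (the hook selection is illustrative, not the template's actual config):

```yaml
# Illustrative sketch, not the template's actual config.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0 # `pre-commit autoupdate` bumps this tag to the latest release
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```

Running `pre-commit autoupdate` rewrites each `rev:` field in place to the newest tagged release of that hook repository.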

 <details>
@@ -818,7 +824,7 @@ You can use different optimization frameworks integrated with Hydra, like [Optun
 The `optimization_results.yaml` will be available under the `logs/task_name/multirun` folder.

-This approach doesn't support advanced techniques like pruning - for more sophisticated search, you should probably write a dedicated optimization task (without the multirun feature).
+This approach doesn't support resuming an interrupted search or advanced techniques like pruning - for more sophisticated search and workflows, you should probably write a dedicated optimization task (without the multirun feature).

 <br>
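For reference, a Hydra-based hyperparameter search with the Optuna sweeper is typically driven by a config along these lines. This is a sketch assuming the `hydra-optuna-sweeper` plugin; the parameter paths and ranges are made up for this example, not taken from the template:

```yaml
# Illustrative sketch assuming the hydra-optuna-sweeper plugin;
# parameter paths and ranges are hypothetical.
defaults:
  - override /hydra/sweeper: optuna

hydra:
  mode: MULTIRUN
  sweeper:
    direction: minimize # minimize the returned metric
    n_trials: 20 # total number of runs to launch
    params:
      model.optimizer.lr: interval(0.0001, 0.1)
      data.batch_size: choice(32, 64, 128)
```

Each trial is a separate run, which is why an interrupted sweep cannot simply be resumed: the sweeper's in-memory study state is lost with the process.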
@@ -889,10 +895,13 @@ def on_train_start(self):
 ## Best Practices

 <details>
-<summary><b>Use Miniconda for GPU environments</b></summary>
+<summary><b>Use Miniconda</b></summary>
+
+It's usually unnecessary to install the full Anaconda distribution; Miniconda should be enough (it weighs around 80 MB).
+
+A big advantage of conda is that it installs precompiled binaries, so packages don't require certain compilers or libraries to be available on the system. This often makes it easier to install dependencies such as cudatoolkit for GPU support.

-It's usually unnecessary to install full anaconda environment, miniconda should be enough.
-It often makes it easier to install some dependencies, like cudatoolkit for GPU support. It also allows you to access your environments globally.
+It also allows you to access your environments globally, which might be more convenient than creating a new local environment for every project.
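As a minimal sketch of a conda-managed GPU environment (the environment name, channels, and versions here are illustrative, not from the template):

```yaml
# Hypothetical environment.yaml sketch -- names and versions
# are illustrative; adjust them to your CUDA setup.
name: myenv
channels:
  - pytorch
  - nvidia
  - defaults
dependencies:
  - python=3.10
  - pytorch
  - pytorch-cuda=11.8
```

Such a file is created and activated with `conda env create -f environment.yaml` followed by `conda activate myenv`.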
+# open the Aim UI with the following command (run it in the folder containing the `.aim` folder):
+# `aim up`
+
+aim:
+  _target_: aim.pytorch_lightning.AimLogger
+  repo: ${paths.root_dir} # .aim folder will be created here
+  # repo: "aim://ip_address:port" # can instead provide IP address pointing to Aim remote tracking server which manages the repo, see https://aimstack.readthedocs.io/en/latest/using/remote_tracking.html#
+
+  # aim allows grouping runs under an experiment name
+  experiment: null # any string, set to "default" if not specified
+
+  train_metric_prefix: "train/"
+  val_metric_prefix: "val/"
+  test_metric_prefix: "test/"
+
+  # sets the tracking interval in seconds for system usage metrics (CPU, GPU, memory, etc.)
+  system_tracking_interval: 10 # set to null to disable system metrics tracking
+
+  # enable/disable logging of system params such as installed packages, git info, env vars, etc.
+  log_system_params: true
+
+  # enable/disable tracking console logs (default value is true)
+  capture_terminal_logs: false # set to false to avoid infinite console log loop issue https://github.com/aimhubio/aim/issues/2550
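The `*_metric_prefix` options tell Aim how to split logged metric names into a context plus a bare metric name. As a small, self-contained Python illustration of that splitting (the helper name is hypothetical, not part of Aim's API):

```python
# Hypothetical helper illustrating how "train/"-style prefixes
# partition metric names into (context, bare metric name).
def split_metric(name, prefixes=("train/", "val/", "test/")):
    for prefix in prefixes:
        if name.startswith(prefix):
            return prefix.rstrip("/"), name[len(prefix):]
    return None, name  # unprefixed metrics get no context


print(split_metric("train/loss"))  # -> ('train', 'loss')
print(split_metric("epoch"))       # -> (None, 'epoch')
```

This is why logging metrics as `train/loss` and `val/loss` makes them appear side by side under one metric name in the Aim UI, grouped by context.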