
Commit c7b5c4e

Hydra help and documentation update for file-based execution configuration (#31)

1 parent 5765e10

6 files changed, +159 -56 lines changed

README.md

Lines changed: 86 additions & 53 deletions
````diff
@@ -49,69 +49,102 @@ autointent data.train_path=default-multiclass \
 seed=42
 ```
 
-All options (grouped):
+All options as YAML (default values shown):
+```yaml
+data:
+  # Path to a json file with training data. Set to "default" to use banking77 data stored within the
+  # autointent package.
+  train_path: ???
+
+  # Path to a json file with test records. Skip this option if you want to use a random subset of the
+  # training sample as test data.
+  test_path: null
+
+  # Set to true if your data is multiclass but you want to train the multilabel classifier.
+  force_multilabel: false
+
+task:
+  # Path to a yaml configuration file that defines the optimization search space.
+  # Omit this to use the default configuration.
+  search_space_path: null
+logs:
+  # Name of the run prepended to optimization assets dirname (generated randomly if omitted)
+  run_name: "awful_hippo_10-30-2024_19-42-12"
+
+  # Location where to save optimization logs that will be saved as `<logs_dir>/<run_name>_<cur_datetime>/logs.json`.
+  # Omit to use current working directory (not correct on Windows).
+  dirpath: "/home/user/AutoIntent/awful_hippo_10-30-2024_19-42-12"
+
+  dump_dir: "/home/user/AutoIntent/runs/awful_hippo_10-30-2024_19-42-12/modules_dumps"
+
+vector_index:
+  # Location where to save faiss database file. Omit to use your system's default cache directory.
+  db_dir: null
+
+  # Specify device in torch notation
+  device: cpu
+
+augmentation:
+  # Number of shots per intent to sample from regular expressions. This option extends sample utterances
+  # within multiclass intent records.
+  regex_sampling: 0
+
+  # Config string like "[20, 40, 20, 10]" means 20 one-label examples, 40 two-label examples, 20 three-label examples,
+  # 10 four-label examples. This option extends multilabel utterance records.
+  multilabel_generation_config: null
+
+embedder:
+  # Batch size for embedding computation.
+  batch_size: 1
+  # Sentence length limit for embedding computation
+  max_length: null
+
+# Affects the randomness
+seed: 0
+
+# String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}. Omit to use ERROR by default.
+hydra:
+  job_logging:
+    root:
+      level: "ERROR"
 ```
-seed                    Affects the randomness
 
-== task ==
-
-search_space_path       Path to a yaml configuration file that defines the
-                        optimization search space. Omit this to use the
-                        default configuration.
-
-== data ==
-
-train_path              Path to a json file with training data. Set to
-                        "default" to use banking77 data stored within the
-                        autointent package.
-
-test_path               Path to a json file with test records. Skip this
-                        option if you want to use a random subset of the
-                        training sample as test data.
-
-force_multilabel        Set to true if your data is multiclass but you want to
-                        train the multilabel classifier.
-
-== logs ==
-
-dirpath                 Location where to save optimization logs that will be
-                        saved as `<logs_dir>/<run_name>_<cur_datetime>/logs.json`.
-                        Omit to use current working directory.
-
-run_name                Name of the run prepended to optimization assets dirname
-
-log_level               String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}.
-                        Omit to use ERROR by default.
-
-== vector_index ==
-
-db_dir                  Location where to save faiss database file. Omit to
-                        use your system's default cache directory.
-
-device                  Specify device in torch notation
+### How to set configuration options
+* Method 1: on the command line, as key=value. Example:
+```bash
+autointent embedder.batch_size=32
+```
 
-== augmentation ==
+* Method 2: in a YAML configuration file.
+Create a YAML file **my_config.yaml** with the following structure in a separate directory:
+```yaml
+defaults:
+  - optimization_config
+  - _self_
+  - override hydra/job_logging: custom
+
+# put the configuration options you want to override here. The full structure is presented above.
+# Here is just an example with the same options as for the command line variant above.
+embedder:
+  batch_size: 32
+```
+Run AutoIntent:
+```bash
+autointent --config-path=/path/to/config/directory --config-name=my_config
+```
 
-regex_sampling          Number of shots per intent to sample from regular
-                        expressions. This option extends sample utterances
-                        within multiclass intent records.
+Important:
+* specify the full (absolute) path in the `--config-path` option.
+* do not use tabs in the YAML file.
+* preferably give the file a name other than
+  optimization_config.yaml, to avoid warnings from Hydra.
 
-seed                    Affects the data partitioning
+You can combine methods 1 and 2. Options given on the command line take the highest precedence.
 
-hydra.job_logging.root.level
-                        String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}.
-                        Omit to use ERROR by default.
 
-multilabel_generation_config
-                        Config string like "[20, 40, 20, 10]" means 20 one-
-                        label examples, 40 two-label examples, 20 three-label
-                        examples, 10 four-label examples. This option extends
-                        multilabel utterance records.
-```
 
 A default config and default data (5-shot banking77 / 20-shot dstc3) are shipped with the package.
 
-An example of the input data is in the `data/intent_records` directory.
+Examples:
+- input data examples: [data](./data)
+- config examples: [example_configs](./example_configs)
 
 ### Inference
 
````
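
The README says the two methods can be combined and that command-line options win. A minimal sketch of that layering with plain dictionaries: README defaults first, then the YAML file, then dotted `key=value` arguments. `deep_merge`, `parse_scalar` and `apply_override` are hypothetical helpers, not Hydra's API; real Hydra also validates values against the structured config and supports override prefixes and interpolation, which this model ignores.

```python
import json
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` with `override` merged in; nested dicts merge key by key."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def parse_scalar(text: str) -> Any:
    """Rough stand-in for YAML scalar parsing: '32' -> 32, 'null' -> None, 'cpu' -> 'cpu'."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def apply_override(config: dict[str, Any], override: str) -> dict[str, Any]:
    """Apply one dotted override such as 'embedder.batch_size=32'."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    patch: dict[str, Any] = {leaf: parse_scalar(raw_value)}
    for parent in reversed(parents):
        patch = {parent: patch}
    return deep_merge(config, patch)


# Lowest precedence: defaults (a subset of the README's YAML listing).
defaults = {"embedder": {"batch_size": 1, "max_length": None}, "seed": 0}
# Method 2: values read from my_config.yaml.
yaml_file = {"embedder": {"batch_size": 32}}
# Method 1: command-line arguments, applied last so they win.
cli_args = ["embedder.batch_size=64", "seed=42"]

config = deep_merge(defaults, yaml_file)
for arg in cli_args:
    config = apply_override(config, arg)

print(config)  # {'embedder': {'batch_size': 64, 'max_length': None}, 'seed': 42}
```

Because the command-line arguments are applied after the file is merged, `batch_size` ends up as 64 rather than the file's 32.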

autointent/configs/optimization_cli.py

Lines changed: 27 additions & 3 deletions
```diff
@@ -6,7 +6,6 @@
 from hydra.core.config_store import ConfigStore
 from omegaconf import MISSING
 
-from autointent.custom_types import LogLevel
 from autointent.pipeline.optimization.utils import generate_name
 
 
@@ -28,7 +27,6 @@ class TaskConfig:
 class LoggingConfig:
     run_name: str | None = None
     dirpath: Path | None = None
-    level: LogLevel = LogLevel.ERROR
     dump_dir: Path | None = None
 
     def __post_init__(self) -> None:
@@ -84,7 +82,11 @@ class OptimizationConfig:
     embedder: EmbedderConfig = field(default_factory=EmbedderConfig)
 
     defaults: list[Any] = field(
-        default_factory=lambda: ["_self_", {"override hydra/job_logging": "autointent_standard_job_logger"}]
+        default_factory=lambda: [
+            "_self_",
+            {"override hydra/job_logging": "autointent_standard_job_logger"},
+            {"override hydra/help": "autointent_help"},
+        ]
     )
 
 
@@ -107,7 +109,29 @@ class OptimizationConfig:
     "disable_existing_loggers": "false",
 }
 
+help_config = {
+    "app_name": "AutoIntent",
+    "header": "== ${hydra.help.app_name} ==",
+    "footer": """
+Powered by Hydra (https://hydra.cc)
+Use --hydra-help to view Hydra specific help""",
+    "template": """
+${hydra.help.header}
+
+This is ${hydra.help.app_name}!
+== Config ==
+This is the config generated for this run.
+You can override everything, for example:
+autointent data.train_path=default-multiclass seed=42
+-------
+$CONFIG
+-------
+
+${hydra.help.footer}""",
+}
+
 
 cs = ConfigStore.instance()
 cs.store(name="optimization_config", node=OptimizationConfig)
 cs.store(name="autointent_standard_job_logger", group="hydra/job_logging", node=logger_config)
+cs.store(name="autointent_help", group="hydra/help", node=help_config)
```
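
`LoggingConfig` keeps `run_name` and `dirpath`, and the README documents that logs are written to `<logs_dir>/<run_name>_<cur_datetime>/logs.json`, falling back to the current working directory. A minimal sketch of building that path; `make_logs_path` is a hypothetical helper, not AutoIntent's code, and the timestamp format `%m-%d-%Y_%H-%M-%S` is an assumption inferred from the README's example value `awful_hippo_10-30-2024_19-42-12`.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def make_logs_path(run_name: str, now: datetime, logs_dir: Path | None = None) -> Path:
    """Build <logs_dir>/<run_name>_<cur_datetime>/logs.json (hypothetical helper).

    The timestamp format is an assumption inferred from the README example.
    """
    base = logs_dir if logs_dir is not None else Path.cwd()  # README: omit to use the cwd
    stamp = now.strftime("%m-%d-%Y_%H-%M-%S")
    return base / f"{run_name}_{stamp}" / "logs.json"


path = make_logs_path("awful_hippo", datetime(2024, 10, 30, 19, 42, 12), Path("/home/user/AutoIntent"))
print(path.as_posix())  # /home/user/AutoIntent/awful_hippo_10-30-2024_19-42-12/logs.json
```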

example_configs/example_1.yaml

Lines changed: 11 additions & 0 deletions
```diff
@@ -0,0 +1,11 @@
+defaults:
+  - optimization_config
+  - _self_
+
+data:
+  train_path: "default-multilabel"
+
+hydra:
+  job_logging:
+    root:
+      level: "INFO"
```
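
`example_1.yaml` raises verbosity through `hydra.job_logging.root.level`, which takes over from the removed `LoggingConfig.level` field. Hydra configures Python's standard logging through `logging.config.dictConfig`, so the effect of the root level can be sketched with the standard library alone. The handler, format and logger name below are illustrative assumptions, not the contents of `autointent_standard_job_logger`.

```python
import io
import logging
import logging.config

captured = io.StringIO()  # capture log output so the effect is easy to inspect

logging.config.dictConfig(
    {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {"simple": {"format": "%(levelname)s %(message)s"}},
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "formatter": "simple",
                "stream": captured,
            }
        },
        # Plays the role of hydra.job_logging.root.level in example_1.yaml.
        "root": {"level": "INFO", "handlers": ["console"]},
    }
)

log = logging.getLogger("autointent_demo")  # hypothetical logger name
log.debug("dropped: DEBUG is below the INFO threshold")
log.info("kept: INFO and above pass")
print(captured.getvalue(), end="")
```

With the README's default level `"ERROR"`, both calls above would be filtered out.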

example_configs/example_2.yaml

Lines changed: 15 additions & 0 deletions
```diff
@@ -0,0 +1,15 @@
+defaults:
+  - optimization_config
+  - _self_
+
+data:
+  train_path: "data/intent_records/ac_robotic_new.json"
+  force_multilabel: true
+
+logs:
+  dirpath: "experiments/multiclass_as_multilabel/"
+  run_name: "robotics_new_testing"
+
+augmentation:
+  regex_sampling: 10
+  multilabel_generation_config: "[0, 4000, 1000]"
```
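
`example_2.yaml` passes `multilabel_generation_config` as the string `"[0, 4000, 1000]"`. Following the README's description, position *i* (counting from 1) gives the number of examples with *i* labels. A minimal sketch of reading the string that way; `parse_generation_config` is a hypothetical helper illustrating the format, not AutoIntent's parser.

```python
import json


def parse_generation_config(config: str) -> dict[int, int]:
    """Map 'labels per example' to 'number of examples to generate' (hypothetical helper).

    Follows the README: "[20, 40, 20, 10]" means 20 one-label, 40 two-label,
    20 three-label and 10 four-label examples.
    """
    counts = json.loads(config)
    valid = isinstance(counts, list) and all(
        isinstance(c, int) and not isinstance(c, bool) and c >= 0 for c in counts
    )
    if not valid:
        raise ValueError(f"expected a list of non-negative integers, got {config!r}")
    return {n_labels: count for n_labels, count in enumerate(counts, start=1)}


print(parse_generation_config("[0, 4000, 1000]"))  # {1: 0, 2: 4000, 3: 1000}
```

Read this way, `example_2.yaml` asks for no one-label examples, 4000 two-label examples and 1000 three-label examples.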

example_configs/example_3.yaml

Lines changed: 11 additions & 0 deletions
```diff
@@ -0,0 +1,11 @@
+defaults:
+  - optimization_config
+  - _self_
+
+data:
+  train_path: "data/intent_records/ac_robotic_new.json"
+  test_path: "data/intent_records/ac_robotic_val.json"
+  force_multilabel: true
+
+augmentation:
+  regex_sampling: 20
```

example_configs/example_4.yaml

Lines changed: 9 additions & 0 deletions
```diff
@@ -0,0 +1,9 @@
+defaults:
+  - optimization_config
+  - _self_
+
+data:
+  train_path: "default-multiclass"
+  test_path: "data/intent_records/banking77_test.json"
+
+seed: 42
```
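
`example_4.yaml` supplies a separate `test_path`. When it is omitted, the README says a random subset of the training sample is used as test data, and the removed help text noted that `seed` affects the data partitioning. A minimal sketch of a seeded hold-out split over a plain list of records; `split_train_test` and the 25% test fraction are hypothetical assumptions, not AutoIntent's actual splitting logic.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_train_test(
    records: Sequence[T], test_fraction: float = 0.25, seed: int = 42
) -> tuple[list[T], list[T]]:
    """Hold out a seeded random subset of `records` as test data (hypothetical helper)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # same seed -> same partition
    indices = list(range(len(records)))
    rng.shuffle(indices)
    n_test = max(1, round(len(records) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test


records = [f"utterance {i}" for i in range(20)]
train, test = split_train_test(records, seed=42)
print(len(train), len(test))  # 15 5
```

Fixing the seed makes the partition reproducible across runs, which is why `seed` matters even when no `test_path` is given.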
