@@ -49,69 +49,102 @@ autointent data.train_path=default-multiclass \
4949 seed=42
5050```
5151
52- All options (by group):
52+ All options as yaml (default values shown):
53+ ```yaml
54+ data:
55+   # Path to a json file with training data. Set to "default" to use banking77 data stored within the
56+   # autointent package.
57+   train_path: ???
58+
59+   # Path to a json file with test records. Skip this option if you want to use a random subset of the
60+   # training sample as test data.
61+   test_path: null
62+
63+   # Set to true if your data is multiclass but you want to train the multilabel classifier.
64+   force_multilabel: false
65+
66+ task:
67+   # Path to a yaml configuration file that defines the optimization search space.
68+   # Omit this to use the default configuration.
69+   search_space_path: null
70+ logs:
71+   # Name of the run prepended to optimization assets dirname (generated randomly if omitted)
72+   run_name: "awful_hippo_10-30-2024_19-42-12"
73+
74+   # Location where to save optimization logs that will be saved as `<logs_dir>/<run_name>_<cur_datetime>/logs.json`.
75+   # Omit to use current working directory. <-- on Windows it is not correct
76+   dirpath: "/home/user/AutoIntent/awful_hippo_10-30-2024_19-42-12"
77+
78+   dump_dir: "/home/user/AutoIntent/runs/awful_hippo_10-30-2024_19-42-12/modules_dumps"
79+
80+ vector_index:
81+   # Location where to save faiss database file. Omit to use your system's default cache directory.
82+   db_dir: null
83+
84+   # Specify device in torch notation
85+   device: cpu
86+
87+ augmentation:
88+   # Number of shots per intent to sample from regular expressions. This option extends sample utterances
89+   # within multiclass intent records.
90+   regex_sampling: 0
91+
92+   # Config string like "[20, 40, 20, 10]" means 20 one-label examples, 40 two-label examples, 20 three-label examples,
93+   # 10 four-label examples. This option extends multilabel utterance records.
94+   multilabel_generation_config: null
95+
96+ embedder:
97+   # Batch size for embedding computation.
98+   batch_size: 1
99+   # Sentence length limit for embedding computation.
100+   max_length: null
101+
102+ # Affects the randomness
103+ seed: 0
104+
105+ # String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}. Omit to use ERROR by default.
106+ hydra.job_logging.root.level: "ERROR"
53107```
54- seed Affects the randomness
55108
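To make the `multilabel_generation_config` format concrete, here is a minimal Python sketch of how a string such as "[20, 40, 20, 10]" can be read. The function name `describe_generation_config` is hypothetical and is not part of the autointent package; the sketch only illustrates the position-to-label-count mapping described in the option's comment.

```python
import json


def describe_generation_config(config: str) -> dict[int, int]:
    """Map "number of labels per example" to "number of examples to generate".

    Hypothetical helper for illustration only, not autointent's own code.
    """
    counts = json.loads(config)  # "[20, 40, 20, 10]" is a valid JSON list
    # Position 1 holds the count of one-label examples, position 2 of two-label ones, etc.
    return {n_labels: n_examples for n_labels, n_examples in enumerate(counts, start=1)}


plan = describe_generation_config("[20, 40, 20, 10]")
for n_labels, n_examples in plan.items():
    print(f"{n_examples} examples with {n_labels} label(s)")
```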
56- == task ==
57-
58- search_space_path Path to a yaml configuration file that defines the
59- optimization search space. Omit this to use the
60- default configuration.
61-
62- == data ==
63-
64- train_path Path to a json file with training data. Set to
65- "default" to use banking77 data stored within the
66- autointent package.
67-
68- test_path Path to a json file with test records. Skip this
69- option if you want to use a random subset of the
70- training sample as test data.
71-
72- force_multilabel Set to true if your data is multiclass but you want to
73- train the multilabel classifier.
74-
75- == logs ==
76-
77- dirpath Location where to save optimization logs that will be
78- saved as `<logs_dir>/<run_name>_<cur_datetime>/logs.json`.
79- Omit to use current working directory.
80-
81- run_name Name of the run prepended to optimization assets dirname
82-
83- log_level String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}.
84- Omit to use ERROR by default.
85-
86- == vector_index ==
87-
88- db_dir Location where to save faiss database file. Omit to
89- use your system's default cache directory.
90-
91- device Specify device in torch notation
109+ ### How to set configuration options
110+ * Option 1: on the command line as key=value. Example:
111+ ```bash
112+ autointent embedder.batch_size=32
113+ ```
92114
93- == augmentation ==
115+ * Option 2: in a yaml configuration file.
116+ Create a yaml file named **my_config.yaml** in a separate directory, with the following structure:
117+ ```yaml
118+ defaults:
119+   - optimization_config
120+   - _self_
121+   - override hydra/job_logging: custom
122+
123+ # Put the configuration options you want to override here. The full structure is presented above.
124+ # This example sets the same option as the command-line variant above.
125+ embedder:
126+   batch_size: 32
127+ ```
128+ Run AutoIntent:
129+ ```bash
130+ autointent --config-path=/path/to/config/directory --config-name=my_config
131+ ```
94132
95- regex_sampling Number of shots per intent to sample from regular
96- expressions. This option extends sample utterances
97- within multiclass intent records.
133+ Important:
134+ * specify the full path in the config-path option.
135+ * do not use tabs in the yaml file.
136+ * preferably give the file a name other than
137+ optimization_config.yaml, to avoid warnings from hydra
98138
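The three notes above can be checked before launching a run. The following is a minimal sketch in Python; `check_config` is a hypothetical helper written for this illustration, not something shipped with autointent or hydra, and it only tests the three conditions listed above.

```python
from pathlib import Path


def check_config(config_file: str) -> list[str]:
    """Return a list of problems found in a custom config file (empty if none).

    Hypothetical pre-flight check for illustration only.
    """
    path = Path(config_file)
    problems = []
    # config-path should be given as a full (absolute) directory path.
    if not path.parent.is_absolute():
        problems.append("config-path should be an absolute directory path")
    # A name equal to the packaged default is reported to trigger hydra warnings.
    if path.name == "optimization_config.yaml":
        problems.append("file name clashes with optimization_config.yaml")
    # YAML indentation must use spaces, not tabs.
    if path.is_file() and "\t" in path.read_text(encoding="utf-8"):
        problems.append("file contains tab characters")
    return problems


for problem in check_config("/path/to/config/directory/my_config.yaml"):
    print("warning:", problem)
```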
99- seed Affects the data partitioning
139+ You can combine Options 1 and 2. Options given on the command line take the highest priority.
100140
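The precedence described above (defaults, then the yaml file, then command-line key=value pairs) can be pictured as a chain of nested-dictionary merges. The Python sketch below is an illustration under that assumption; it is not hydra's implementation, `deep_merge` and `parse_override` are hypothetical names, and override values are kept as plain strings for simplicity.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` where values from `override` win, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def parse_override(arg: str) -> dict:
    """Turn "embedder.batch_size=64" into {"embedder": {"batch_size": "64"}}."""
    dotted_key, value = arg.split("=", 1)
    nested = value  # the value stays a string in this sketch
    for part in reversed(dotted_key.split(".")):
        nested = {part: nested}
    return nested


defaults = {"embedder": {"batch_size": 1, "max_length": None}, "seed": 0}
from_yaml = {"embedder": {"batch_size": 32}}           # contents of my_config.yaml
from_cli = ["embedder.batch_size=64", "seed=42"]       # key=value arguments

config = deep_merge(defaults, from_yaml)                # yaml file overrides defaults
for arg in from_cli:
    config = deep_merge(config, parse_override(arg))    # command line overrides both
print(config)
```

Here the command-line value of `embedder.batch_size` replaces the one from the yaml file, while `embedder.max_length` keeps its default because neither source sets it.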
101- hydra.job_logging.root.level
102- String from {DEBUG,INFO,WARNING,ERROR,CRITICAL}.
103- Omit to use ERROR by default.
104141
105- multilabel_generation_config
106- Config string like "[20, 40, 20, 10]" means 20 one-
107- label examples, 40 two-label examples, 20 three-label
108- examples, 10 four-label examples. This option extends
109- multilabel utterance records.
110- ```
111142
112143The package ships with a default config and data (5-shot banking77 / 20-shot dstc3).
113144
114- Example input data is in the `data/intent_records` directory.
145+ Examples:
146+ - example input data: [data](./data)
147+ - example configs: [example_configs](./example_configs)
115148
116149### Inference
117150