Commit e1753f5

Docs/automl tutorials (#163)
* update pipeline tutorials * minor fix * update tutorials
1 parent 0a3024e commit e1753f5

File tree

7 files changed (+105, -15 lines)


docs/source/conf.py

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@
 nbsphinx_prolog = """
 :tutorial_name: {{ env.docname }}
 """
-nbsphinx_execute = "never"
+# nbsphinx_execute = "never"
 nbsphinx_thumbnails = {
     "user_guides/*": "_static/square-white.svg",
 }
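For context on the change above: commenting out the line restores nbsphinx's default execution policy, so the tutorial notebooks are actually run while the docs are built. A minimal sketch of the setting and its documented values (this snippet is illustrative, not part of the commit):

```python
# nbsphinx executes notebooks at docs-build time unless told otherwise.
# Documented values: "auto" (default) runs only notebooks without stored
# outputs, "always" re-runs everything, "never" skips execution entirely.
nbsphinx_execute = "auto"
```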
Lines changed: 29 additions & 6 deletions
@@ -1,6 +1,6 @@
 # %% [markdown]
 """
-# Search Space Configuration
+# AutoML Customization
 
 In this guide, you will learn how to configure a custom hyperparameter search space.
 """
@@ -29,15 +29,23 @@
 
 # %% [markdown]
 """
-The ``module_name`` field specifies the name of the module. You can find the names, for example, in...
+The ``module_name`` field specifies the name of the module. You can explore the available names yourself:
+"""
+
+# %%
+from autointent.modules import SCORING_MODULES, DECISION_MODULES, EMBEDDING_MODULES, REGEX_MODULES
 
-TODO: _Add docs for all available modules._
+print(list(SCORING_MODULES.keys()))
+print(list(DECISION_MODULES.keys()))
+print(list(EMBEDDING_MODULES.keys()))
+print(list(REGEX_MODULES.keys()))
 
+# %% [markdown]
+"""
 All fields except ``module_name`` are lists that define the search space for each hyperparameter (see %mddoclink(class,modules.scoring,KNNScorer)). If you omit them, the default set of hyperparameters will be used:
 """
 
 # %%
-
 linear_module = {"module_name": "linear"}
 
 # %% [markdown]
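To make the pattern in the hunk above concrete, a module entry with explicit hyperparameter lists might look like the sketch below. The `k` and `weights` names are illustrative stand-ins for whatever hyperparameters a given scorer exposes, not confirmed autointent parameter names:

```python
# Hypothetical KNN scoring module entry: every field except
# "module_name" is a list of candidate values to search over.
knn_module = {
    "module_name": "knn",
    "k": [1, 3, 5, 10],                   # assumed hyperparameter name
    "weights": ["uniform", "distance"],   # assumed hyperparameter name
}

# Omitting the lists falls back to the module's default search space,
# as with the linear module in the diff.
linear_module = {"module_name": "linear"}
```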
@@ -110,7 +118,6 @@
 """
 
 # %%
-
 from autointent import Dataset
 
 dataset = Dataset.from_hub("AutoIntent/clinc150_subset")
@@ -124,7 +131,23 @@
 from autointent import Pipeline
 
 pipeline_optimizer = Pipeline.from_search_space(search_space)
-pipeline_optimizer.fit(dataset)
+pipeline_optimizer.fit(dataset, sampler="random")
+
+# %% [markdown]
+"""
+There are three hyperparameter tuning samplers available:
+
+- "random"
+- "brute"
+- "tpe"
+
+All the samplers are implemented with [optuna](https://optuna.org/).
+"""
+
+# %% [markdown]
+"""
+One can also use the more versatile %mddoclink(class,,OptimizationConfig) and %mddoclink(method,Pipeline,from_optimization_config).
+"""
 
 # %% [markdown]
 """

user_guides/advanced/03_caching.py

Lines changed: 0 additions & 6 deletions
This file was deleted.

user_guides/basic_usage/03_automl.py

Lines changed: 74 additions & 2 deletions
@@ -74,6 +74,42 @@
 logging_config = LoggingConfig(project_dir=Path.cwd() / "runs", dump_modules=False, clear_ram=False)
 custom_pipeline.set_config(logging_config)
 
+# %% [markdown]
+"""
+## Default Transformers
+
+One can specify which embedding model and cross-encoder model to use along with the default settings:
+"""
+
+# %%
+from autointent.configs import EmbedderConfig, CrossEncoderConfig
+
+custom_pipeline.set_config(EmbedderConfig(model_name="prajjwal1/bert-tiny", device="cpu"))
+custom_pipeline.set_config(CrossEncoderConfig(model_name="cross-encoder/ms-marco-MiniLM-L2-v2", max_length=8))
+
+# %% [markdown]
+"""
+See the docs for %mddoclink(class,configs,EmbedderConfig) and %mddoclink(class,configs,CrossEncoderConfig) for the available customization options.
+"""
+
+# %% [markdown]
+"""
+## Cross-Validation vs Hold-Out Validation
+
+If you have plenty of training and evaluation data, you can use the default hold-out validation strategy. If not, you can choose cross-validation: it takes a little more time but utilizes all the available data for better hyperparameter tuning.
+
+This behavior is controlled with %mddoclink(class,configs,DataConfig):
+"""
+
+# %%
+from autointent.configs import DataConfig
+custom_pipeline.set_config(DataConfig(scheme="cv", n_folds=3))
+
+# %% [markdown]
+"""
+See the docs for %mddoclink(class,configs,DataConfig) for other customization options.
+"""
+
 # %% [markdown]
 """
 ## Complete Example
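The hold-out versus cross-validation trade-off described in the hunk above can be illustrated with plain Python, without touching the autointent API. With `n_folds=3`, every sample is used for evaluation exactly once and for training twice, instead of a single fixed split:

```python
# Hold-out: one fixed split; the held-out samples never help training.
samples = list(range(12))
holdout = samples[:4]   # 4 samples reserved for evaluation only
train = samples[4:]     # 8 samples seen during training

# Cross-validation with n_folds=3, mirroring DataConfig(scheme="cv", n_folds=3):
n_folds = 3
folds = [samples[i::n_folds] for i in range(n_folds)]
splits = [
    (sum(folds[:i] + folds[i + 1:], []), folds[i])  # (train, eval) per fold
    for i in range(n_folds)
]
```

Each of the three splits trains on 8 samples and evaluates on 4, so the full dataset contributes to both sides of the tuning loop, at roughly three times the cost of a single hold-out run.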
@@ -99,7 +135,43 @@
 custom_pipeline.set_config(logging_config)
 
 # start auto-configuration
-custom_pipeline.fit(dataset)
+context = custom_pipeline.fit(dataset)
 
-# inference
+# inference on the fly
 custom_pipeline.predict(["hello world!"])
+
+# %% [markdown]
+"""
+## Dump Results
+
+One can save all the results of the auto-configuration process to the file system (to ``LoggingConfig.dirpath``):
+"""
+
+# %%
+context.dump()
+
+# %% [markdown]
+"""
+Or one can dump only the configured pipeline to any desired location (by default, ``LoggingConfig.dirpath``):
+"""
+
+# %%
+custom_pipeline.dump()
+
+# %% [markdown]
+"""
+## Load Pipeline for Inference
+"""
+
+# %%
+loaded_pipe = Pipeline.load(logging_config.dirpath)
+
+# %% [markdown]
+"""
+Since this notebook is executed automatically while building the docs, we will clean up the workspace if you don't mind :)
+"""
+
+# %%
+import shutil
+
+shutil.rmtree(logging_config.dirpath)
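The dump/load pair added above follows a common persist-then-restore pattern: fit once, write the result to a run directory, and later reload it in a separate inference process. A generic stdlib sketch of the same idea with a plain JSON config (the file name and keys are made up for illustration, not autointent's on-disk format):

```python
import json
import shutil
import tempfile
from pathlib import Path

# Pretend "pipeline config" produced by an auto-configuration run.
config = {"scoring": {"module_name": "knn"}, "decision": {"module_name": "argmax"}}

run_dir = Path(tempfile.mkdtemp())  # stand-in for LoggingConfig.dirpath
(run_dir / "pipeline.json").write_text(json.dumps(config))

# Later, e.g. in an inference process: restore from disk instead of re-fitting.
loaded = json.loads((run_dir / "pipeline.json").read_text())

shutil.rmtree(run_dir)  # clean up, as the tutorial does
```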

user_guides/basic_usage/04_inference.py

Lines changed: 1 addition & 0 deletions
@@ -80,6 +80,7 @@
 # %%
 context = pipeline.fit(dataset)
 context.dump()
+# or pipeline.dump() to save only the configured pipeline, without all the optimization assets
 
 # %% [markdown]
 """
