1 | 1 | Search Space Configuration |
2 | 2 | ========================== |
3 | 3 | |
4 | | -В этом гайде вы узнаете как настраивать кастомное пространство поиска гипепараметров. |
| 4 | +In this guide, you will learn how to configure a custom hyperparameter search space. |
| 5 | + |
| 6 | +Python API |
| 7 | +########## |
| 8 | + |
| 9 | +.. note:: |
| 10 | + |
| 11 | +    Before reading this guide, we recommend familiarizing yourself with the sections :doc:`../concepts` and :doc:`../learn/optimization`. |
| 12 | + |
| 13 | +Optimization Module |
| 14 | +------------------- |
| 15 | + |
| 16 | +To set up the optimization module, you need to create the following dictionary: |
| 17 | + |
| 18 | +.. code-block:: python |
| 19 | + |
| 20 | +    knn_module = { |
| 21 | +        "module_type": "knn", |
| 22 | +        "k": [1, 5, 10, 50], |
| 23 | +        "embedder_name": [ |
| 24 | +            "avsolatorio/GIST-small-Embedding-v0", |
| 25 | +            "infgrad/stella-base-en-v2" |
| 26 | +        ] |
| 27 | +    } |
| 28 | + |
| 29 | +The ``module_type`` field specifies the name of the module. You can find the names, for example, in :py:data:`autointent.modules.SCORING_MODULES_MULTICLASS`. |
| 30 | + |
| 31 | +All fields except ``module_type`` are lists that define the search space for each hyperparameter. If you omit them, the default set of hyperparameters will be used during auto-configuration: |
| 32 | + |
| 33 | +.. code-block:: python |
| 34 | + |
| 35 | +    linear_module = {"module_type": "linear"} |
| 36 | + |
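To make the "lists define the search space" idea concrete, here is a minimal sketch that enumerates every hyperparameter combination a module dictionary describes. The helper `expand_module` is hypothetical and not part of AutoIntent, and treating the lists as a full grid is an assumption; the library may select or sample candidates differently.

```python
from itertools import product


def expand_module(module: dict) -> list[dict]:
    """Enumerate every hyperparameter combination of one module dict (hypothetical helper)."""
    module_type = module["module_type"]
    # Every field except "module_type" is treated as a list of candidate values.
    names = [key for key in module if key != "module_type"]
    grids = [module[name] for name in names]
    return [
        {"module_type": module_type, **dict(zip(names, values))}
        for values in product(*grids)
    ]


knn_module = {
    "module_type": "knn",
    "k": [1, 5, 10, 50],
    "embedder_name": [
        "avsolatorio/GIST-small-Embedding-v0",
        "infgrad/stella-base-en-v2",
    ],
}

configs = expand_module(knn_module)
print(len(configs))  # 4 values of k x 2 embedders = 8
print(configs[0])
```

A module with no hyperparameter lists, such as `{"module_type": "linear"}`, yields exactly one entry in this sketch, standing in for "use the defaults".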
| 37 | +Optimization Node |
| 38 | +----------------- |
| 39 | + |
| 40 | +To set up the optimization node, you need to create a list of modules and specify the metric for optimization: |
| 41 | + |
| 42 | +.. code-block:: python |
| 43 | + |
| 44 | +    scoring_node = { |
| 45 | +        "node_type": "scoring", |
| 46 | +        "metric": "scoring_roc_auc", |
| 47 | +        "search_space": [ |
| 48 | +            knn_module, |
| 49 | +            linear_module, |
| 50 | +        ] |
| 51 | +    } |
| 52 | + |
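Because a node is a plain dictionary, shape mistakes (a scalar where a list is expected, a module without `module_type`) are easy to make. The following is a minimal sanity check under the rules stated in this guide; `check_node` is a hypothetical helper, not an AutoIntent function, and it deliberately ignores the metric field.

```python
def check_node(node: dict) -> list[str]:
    """Return human-readable problems found in a node dict (hypothetical helper)."""
    problems = []
    if "node_type" not in node:
        problems.append("missing 'node_type'")
    modules = node.get("search_space")
    if not isinstance(modules, list) or not modules:
        problems.append("'search_space' must be a non-empty list of modules")
        return problems
    for index, module in enumerate(modules):
        if "module_type" not in module:
            problems.append(f"module {index}: missing 'module_type'")
        # Every field except "module_type" should be a list of candidate values.
        for name, values in module.items():
            if name != "module_type" and not isinstance(values, list):
                problems.append(f"module {index}: '{name}' must be a list")
    return problems


good = {
    "node_type": "scoring",
    "search_space": [{"module_type": "knn", "k": [1, 5]}, {"module_type": "linear"}],
}
bad = {"node_type": "scoring", "search_space": [{"k": 5}]}

print(check_node(good))  # []
print(check_node(bad))
```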
| 53 | +Search Space |
| 54 | +------------ |
| 55 | + |
| 56 | +The search space for the entire pipeline looks approximately like this: |
| 57 | + |
| 58 | +.. code-block:: python |
| 59 | + |
| 60 | +    search_space = [ |
| 61 | +        { |
| 62 | +            "node_type": "retrieval", |
| 63 | +            "metric": "retrieval_hit_rate", |
| 64 | +            "search_space": [ |
| 65 | +                { |
| 66 | +                    "module_type": "vector_db", |
| 67 | +                    "k": [10], |
| 68 | +                    "embedder_name": [ |
| 69 | +                        "avsolatorio/GIST-small-Embedding-v0", |
| 70 | +                        "infgrad/stella-base-en-v2" |
| 71 | +                    ] |
| 72 | +                } |
| 73 | +            ] |
| 74 | +        }, |
| 75 | +        { |
| 76 | +            "node_type": "scoring", |
| 77 | +            "metric": "scoring_roc_auc", |
| 78 | +            "search_space": [ |
| 79 | +                { |
| 80 | +                    "module_type": "knn", |
| 81 | +                    "k": [1, 3, 5, 10], |
| 82 | +                    "weights": ["uniform", "distance", "closest"] |
| 83 | +                }, |
| 84 | +                { |
| 85 | +                    "module_type": "linear" |
| 86 | +                }, |
| 87 | +                { |
| 88 | +                    "module_type": "dnnc", |
| 89 | +                    "cross_encoder_name": [ |
| 90 | +                        "BAAI/bge-reranker-base", |
| 91 | +                        "cross-encoder/ms-marco-MiniLM-L-6-v2" |
| 92 | +                    ], |
| 93 | +                    "k": [1, 3, 5, 10] |
| 94 | +                } |
| 95 | +            ] |
| 96 | +        }, |
| 97 | +        { |
| 98 | +            "node_type": "prediction", |
| 99 | +            "metric": "prediction_accuracy", |
| 100 | +            "search_space": [ |
| 101 | +                { |
| 102 | +                    "module_type": "threshold", |
| 103 | +                    "thresh": [0.5] |
| 104 | +                }, |
| 105 | +                { |
| 106 | +                    "module_type": "argmax" |
| 107 | +                } |
| 108 | +            ] |
| 109 | +        } |
| 110 | +    ] |
| 111 | + |
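To get a feel for how large the pipeline search space above is, the sketch below counts candidate configurations per node. It assumes that modules within a node are alternatives and that each module's lists form a full grid; both are assumptions made for illustration, and `count_candidates` is a hypothetical helper rather than AutoIntent code.

```python
from math import prod

# Same structure as the pipeline search space above, written compactly.
search_space = [
    {"node_type": "retrieval", "metric": "retrieval_hit_rate", "search_space": [
        {"module_type": "vector_db", "k": [10], "embedder_name": [
            "avsolatorio/GIST-small-Embedding-v0",
            "infgrad/stella-base-en-v2",
        ]},
    ]},
    {"node_type": "scoring", "metric": "scoring_roc_auc", "search_space": [
        {"module_type": "knn", "k": [1, 3, 5, 10],
         "weights": ["uniform", "distance", "closest"]},
        {"module_type": "linear"},
        {"module_type": "dnnc", "k": [1, 3, 5, 10], "cross_encoder_name": [
            "BAAI/bge-reranker-base",
            "cross-encoder/ms-marco-MiniLM-L-6-v2",
        ]},
    ]},
    {"node_type": "prediction", "metric": "prediction_accuracy", "search_space": [
        {"module_type": "threshold", "thresh": [0.5]},
        {"module_type": "argmax"},
    ]},
]


def count_candidates(node: dict) -> int:
    """Sum the grid sizes of all modules in a node (hypothetical helper)."""
    return sum(
        # prod() of an empty sequence is 1: a module without lists counts once.
        prod(len(values) for name, values in module.items() if name != "module_type")
        for module in node["search_space"]
    )


sizes = {node["node_type"]: count_candidates(node) for node in search_space}
print(sizes)  # {'retrieval': 2, 'scoring': 21, 'prediction': 2}
```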
| 112 | +Start Auto Configuration |
| 113 | +------------------------ |
| 114 | + |
| 115 | +.. code-block:: python |
| 116 | + |
| 117 | +    from autointent.pipeline import PipelineOptimizer |
| 118 | + |
| 119 | +    pipeline_optimizer = PipelineOptimizer.from_dict(search_space) |
| 120 | +    pipeline_optimizer.fit(dataset)  # `dataset` is assumed to be loaded beforehand |
| 121 | + |
| 122 | +CLI |
| 123 | +### |
| 124 | + |
| 125 | +YAML Format |
| 126 | +----------- |
| 127 | + |
| 128 | +YAML (YAML Ain't Markup Language) is a human-readable data serialization format commonly used for configuration files and for exchanging data between programs written in different languages. It serves much the same purpose as JSON but is usually easier to read. |
| 129 | + |
| 130 | +Here's an example YAML file: |
| 131 | + |
| 132 | +.. code-block:: yaml |
| 133 | + |
| 134 | +    database: |
| 135 | +      host: localhost |
| 136 | +      port: 5432 |
| 137 | +      username: admin |
| 138 | +      # this is a comment |
| 139 | +      password: secret |
| 140 | + |
| 141 | +    counts: |
| 142 | +      - 10 |
| 143 | +      - 20 |
| 144 | +      - 30 |
| 145 | + |
| 146 | +    literal_counts: [10, 20, 30] |
| 147 | + |
| 148 | +    users: |
| 149 | +      - name: Alice |
| 150 | +        age: 30 |
| 151 | + |
| 152 | +      - name: Bob |
| 153 | +        age: 25 |
| 154 | + |
| 155 | + |
| 156 | +    settings: |
| 157 | +      debug: true |
| 158 | +      timeout: 30 |
| 159 | + |
| 160 | +Explanation: |
| 161 | + |
| 162 | +- the whole file represents a dictionary with keys ``database``, ``counts``, ``literal_counts``, ``users``, and ``settings`` (``debug`` and ``timeout`` are keys of the nested ``settings`` dictionary) |
| 163 | +- ``database`` itself is a dictionary with keys ``host``, ``port``, and so on |
| 164 | +- ``counts`` is a list (Python ``[10, 20, 30]``) |
| 165 | +- ``literal_counts`` is the same list written in inline (flow) syntax |
| 166 | +- ``users`` is a list of dictionaries |
| 167 | + |
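For readers more at home in Python, the YAML file above corresponds to the literal below. This is a hand-written equivalent, assuming ``debug`` and ``timeout`` are nested under ``settings``; a YAML parser such as PyYAML's ``yaml.safe_load`` would be expected to return an equal structure, but no parser is called here so that the snippet needs only the standard library.

```python
# Hand-written Python equivalent of the YAML example (comments are not data).
config = {
    "database": {
        "host": "localhost",
        "port": 5432,  # unquoted YAML integers become int
        "username": "admin",
        "password": "secret",
    },
    "counts": [10, 20, 30],
    "literal_counts": [10, 20, 30],
    "users": [
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25},
    ],
    "settings": {"debug": True, "timeout": 30},  # YAML `true` becomes True
}

print(list(config))
print(config["counts"] == config["literal_counts"])  # True
print(config["users"][1]["name"])  # Bob
```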
| 168 | +Start Auto Configuration |
| 169 | +------------------------ |
| 170 | + |
| 171 | +To set up the search space for optimization from the command line, you need to... |