
Commit 60b1bd8: pull dev

2 parents: 3d10192 + 95be22a

37 files changed, +388 / -147 lines

CONTRIBUTING.md

Lines changed: 34 additions & 24 deletions
@@ -1,63 +1,73 @@
 # Contribute to AutoIntent

-## Минимальная конфигурация
+## Minimum Configuration

-Мы используем `poetry` в качесте менеджера зависимостей и упаковщика.
+We use `poetry` as our dependency manager and packager.

-1. Установить `poetry`. Советуем обратиться к разделу официальной документации [Installation with the official installer](https://python-poetry.org/docs/#installing-with-the-official-installer). Если кратко, то достаточно просто запустить команду:
+1. Install `poetry`. We recommend referring to the official documentation section [Installation with the official installer](https://python-poetry.org/docs/#installing-with-the-official-installer). In short, you just need to run:
 ```bash
 curl -sSL https://install.python-poetry.org | python3 -
 ```

-2. Склонировать проект, перейти в корень
+2. Clone the project and navigate to the root directory.

-3. Установить проект со всеми зависимостями:
+3. Install the project with all dependencies:
 ```bash
 make install
 ```

-## Дополнительно
+## Additional Setup

-Чтобы удобнее трекать ошибки в кодстайле, советуем установить расширение ruff для IDE. Например, для VSCode
+To make it easier to track code style errors, we recommend installing the ruff extension for your IDE. For example, for VSCode:
 ```
 https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff
 ```
-С этим расширением ошибки в кодстайле будут подчеркиваться прямо в редакторе.
+With this extension, code style errors will be underlined directly in the editor.

-В корень проекта добавлен файл `.vscode/settings.json`, который указывает расширению путь к конфигу линтера.
+A `.vscode/settings.json` file in the project root points the extension to the linter configuration.

 ## Contribute

-1. Создать ветку, в которой вы будете работать. Чтобы остальным было проще понимать характер вашего контрибьюта, нужно давать краткие, но понятные названия начинающиеся. Советем начинать названия на `feat/` для веток с новыми фичами, `fix/` для исправления багов, `refactor/` для рефакторинга, `test/` для добавления тестов.
+1. Create a branch for your work. To make it easier for others to understand the nature of your contribution, use brief but clear names. We recommend starting branch names with `feat/` for new features, `fix/` for bug fixes, `refactor/` for refactoring, and `test/` for adding tests.

-2. Коммит, коммит, коммит, коммит
+2. Commit, commit, commit, commit.

-3. Если есть новые фичи, желательно добавить для них тесты в директорию [tests](./tests).
+3. If you add new features, please also add tests for them in the [tests](./tests) directory.

-4. Проверить, что внесенные изменения не ломают имеющиеся фичи
+4. You can open a PR!
+
+Every commit in any PR triggers GitHub Actions with automated tests. All checks block merging into the main branch (with rare exceptions).
+
+Waiting for CI can take a while, so it is often more convenient to run the checks locally:
+
+- Check that your changes don't break existing features:
 ```bash
 make test
 ```
-
-5. Проверить кодстайл
+Or run a specific test (using `test_bert.py` as an example):
+```bash
+poetry run pytest tests/modules/scoring/test_bert.py
+```
+- Check code style (this also applies the formatter):
 ```bash
 make lint
 ```
+- Check type hints:
+```bash
+make typing
+```
+Note: if mypy shows different errors locally than in GitHub Actions, update your local dependencies:
+```bash
+make update
+```

-6. Ура, можно открывать Pull Request!
-
-## Устройство проекта
-
-![](assets/dependency-graph.png)
-
-## Построение документации
+## Building Documentation

-Построить html версию в папке `docs/build`:
+Build the HTML version in the `docs/build` folder:
 ```bash
 make docs
 ```

-Построить html версию и захостить локально:
+Build the HTML version and host it locally:
 ```bash
 make serve-docs
 ```

Makefile

Lines changed: 4 additions & 3 deletions
@@ -22,9 +22,10 @@ lint:
 	$(poetry) ruff format
 	$(poetry) ruff check --fix

-.PHONY: sync
-sync:
-	poetry sync --extras "dev test typing docs"
+.PHONY: update
+update:
+	rm -f poetry.lock
+	poetry install --extras "dev test typing docs"

 .PHONY: docs
 docs:

assets/classification_pipeline.png

-134 KB (binary file not shown)

assets/dependency-graph.png

-73.4 KB (binary file not shown)

autointent/_callbacks/wandb.py

Lines changed: 11 additions & 1 deletion
@@ -89,6 +89,11 @@ def log_metrics(self, metrics: dict[str, Any]) -> None:
         """
         self.wandb.log(metrics)

+    def _close_current_run(self) -> None:
+        """Close the current W&B run if open."""
+        if self.wandb.run is not None:
+            self.wandb.finish()
+
     def log_final_metrics(self, metrics: dict[str, Any]) -> None:
         """Logs final evaluation metrics to W&B.

@@ -97,6 +102,8 @@ def log_final_metrics(self, metrics: dict[str, Any]) -> None:
         Args:
             metrics: A dictionary of final performance metrics.
         """
+        self._close_current_run()
+
         wandb_run_init_args = {
             "project": self.project_name,
             "group": self.group,
@@ -105,11 +112,14 @@ def log_final_metrics(self, metrics: dict[str, Any]) -> None:
         }

         try:
-            self.wandb.init(config=metrics, **wandb_run_init_args)
+            config = metrics["configs"]
+            self.wandb.init(config=config, **wandb_run_init_args)
+            self.wandb.log(metrics)
         except Exception as e:
             if "run config cannot exceed" not in str(e):
                 # https://github.com/deeppavlov/AutoIntent/issues/202
                 raise
+            self._close_current_run()
             logger.warning("W&B run config is too large, skipping logging modules configs")
             logger.warning("'final_metrics' will be logged to W&B with pipeline_metrics only")
             logger.warning("If you want to access modules configs in future, address to the individual modules runs")

autointent/_wrappers/embedder.py

Lines changed: 7 additions & 2 deletions
@@ -188,10 +188,14 @@ def embed(self, utterances: list[str], task_type: TaskTypeEnum | None = None) ->
         Returns:
             A numpy array of embeddings.
         """
+        prompt = self.config.get_prompt(task_type)
+
         if self.config.use_cache:
            hasher = Hasher()
            hasher.update(self)
            hasher.update(utterances)
+           if prompt:
+               hasher.update(prompt)

            embeddings_path = _get_embeddings_path(hasher.hexdigest())
            if embeddings_path.exists():
@@ -200,11 +204,12 @@ def embed(self, utterances: list[str], task_type: TaskTypeEnum | None = None) ->
         self._load_model()

         logger.debug(
-            "Calculating embeddings with model %s, batch_size=%d, max_seq_length=%s, embedder_device=%s",
+            "Calculating embeddings with model %s, batch_size=%d, max_seq_length=%s, embedder_device=%s, prompt=%s",
             self.config.model_name,
             self.config.batch_size,
             str(self.config.tokenizer_config.max_length),
             self.config.device,
+            prompt,
         )

         if self.config.tokenizer_config.max_length is not None:
@@ -215,7 +220,7 @@ def embed(self, utterances: list[str], task_type: TaskTypeEnum | None = None) ->
             convert_to_numpy=True,
             batch_size=self.config.batch_size,
             normalize_embeddings=True,
-            prompt=self.config.get_prompt_type(task_type),
+            prompt=prompt,
         )

         if self.config.use_cache:

autointent/configs/__init__.py

Lines changed: 9 additions & 1 deletion
@@ -2,11 +2,19 @@

 from ._inference_node import InferenceNodeConfig
 from ._optimization import DataConfig, LoggingConfig
-from ._transformers import CrossEncoderConfig, EmbedderConfig, HFModelConfig, TaskTypeEnum, TokenizerConfig
+from ._transformers import (
+    CrossEncoderConfig,
+    EarlyStoppingConfig,
+    EmbedderConfig,
+    HFModelConfig,
+    TaskTypeEnum,
+    TokenizerConfig,
+)

 __all__ = [
     "CrossEncoderConfig",
     "DataConfig",
+    "EarlyStoppingConfig",
     "EmbedderConfig",
     "HFModelConfig",
     "InferenceNodeConfig",

autointent/configs/_transformers.py

Lines changed: 33 additions & 16 deletions
@@ -2,7 +2,10 @@
 from typing import Any, Literal

 from pydantic import BaseModel, ConfigDict, Field, PositiveInt
-from typing_extensions import Self, assert_never
+from typing_extensions import Self
+
+from autointent.custom_types import FloatFromZeroToOne
+from autointent.metrics import SCORING_METRICS_MULTICLASS, SCORING_METRICS_MULTILABEL


 class TokenizerConfig(BaseModel):
@@ -56,7 +59,7 @@ class EmbedderConfig(HFModelConfig):
     default_prompt: str | None = Field(
         None, description="Default prompt for the model. This is used when no task-specific prompt is provided."
     )
-    classifier_prompt: str | None = Field(None, description="Prompt for classifier.")
+    classification_prompt: str | None = Field(None, description="Prompt for classification.")
     cluster_prompt: str | None = Field(None, description="Prompt for clustering.")
     sts_prompt: str | None = Field(None, description="Prompt for finding most similar sentences.")
     query_prompt: str | None = Field(None, description="Prompt for query.")
@@ -76,8 +79,8 @@ def get_prompt_config(self) -> dict[str, str] | None:
         prompts = {}
         if self.default_prompt:
             prompts[TaskTypeEnum.default.value] = self.default_prompt
-        if self.classifier_prompt:
-            prompts[TaskTypeEnum.classification.value] = self.classifier_prompt
+        if self.classification_prompt:
+            prompts[TaskTypeEnum.classification.value] = self.classification_prompt
         if self.cluster_prompt:
             prompts[TaskTypeEnum.cluster.value] = self.cluster_prompt
         if self.query_prompt:
@@ -88,7 +91,7 @@ def get_prompt_config(self) -> dict[str, str] | None:
             prompts[TaskTypeEnum.sts.value] = self.sts_prompt
         return prompts if len(prompts) > 0 else None

-    def get_prompt_type(self, prompt_type: TaskTypeEnum | None) -> str | None:  # noqa: PLR0911
+    def get_prompt(self, prompt_type: TaskTypeEnum | None) -> str | None:
         """Get the prompt for the given task type.

         Args:
@@ -97,21 +100,17 @@ def get_prompt_type(self, prompt_type: TaskTypeEnum | None) -> str | None:  # noqa: PLR0911
         Returns:
             The prompt for the given task type.
         """
-        if prompt_type is None:
-            return self.default_prompt
-        if prompt_type == TaskTypeEnum.classification:
-            return self.classifier_prompt
-        if prompt_type == TaskTypeEnum.cluster:
+        if prompt_type == TaskTypeEnum.classification and self.classification_prompt is not None:
+            return self.classification_prompt
+        if prompt_type == TaskTypeEnum.cluster and self.cluster_prompt is not None:
             return self.cluster_prompt
-        if prompt_type == TaskTypeEnum.query:
+        if prompt_type == TaskTypeEnum.query and self.query_prompt is not None:
             return self.query_prompt
-        if prompt_type == TaskTypeEnum.passage:
+        if prompt_type == TaskTypeEnum.passage and self.passage_prompt is not None:
             return self.passage_prompt
-        if prompt_type == TaskTypeEnum.sts:
+        if prompt_type == TaskTypeEnum.sts and self.sts_prompt is not None:
             return self.sts_prompt
-        if prompt_type == TaskTypeEnum.default:
-            return self.default_prompt
-        assert_never(prompt_type)
+        return self.default_prompt


 class CrossEncoderConfig(HFModelConfig):
@@ -122,3 +121,21 @@ class CrossEncoderConfig(HFModelConfig):
     tokenizer_config: TokenizerConfig = Field(
         default_factory=lambda: TokenizerConfig(max_length=512)
     )  # this is because sentence-transformers doesn't allow you to customize tokenizer settings properly
+
+
+class EarlyStoppingConfig(BaseModel):
+    val_fraction: float = Field(
+        0.2,
+        description=(
+            "Fraction of train samples to allocate to a dev set to monitor quality "
+            "during training and perform early stopping if quality doesn't improve."
+        ),
+    )
+    patience: PositiveInt = Field(1, description="Maximum number of epochs to wait for quality to improve.")
+    threshold: FloatFromZeroToOne = Field(
+        0.0,
+        description="Minimum quality increment to count as an improvement. Default: any increment counts.",
+    )
+    metric: Literal[tuple((SCORING_METRICS_MULTILABEL | SCORING_METRICS_MULTICLASS).keys())] | None = Field(  # type: ignore[valid-type]
+        "scoring_f1", description="Metric to monitor."
+    )
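In effect, `get_prompt` now returns a task-specific prompt only when that field is actually set, and falls back to `default_prompt` otherwise, instead of returning `None` for unset task prompts. A hedged usage sketch: the model name and prompt strings are invented, and it assumes `EmbedderConfig` accepts `model_name` as used elsewhere in the codebase.

```python
from autointent.configs import EarlyStoppingConfig, EmbedderConfig, TaskTypeEnum

cfg = EmbedderConfig(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    default_prompt="Represent the sentence: ",
    classification_prompt="Classify the intent: ",
)

# a task-specific prompt wins when it is set...
assert cfg.get_prompt(TaskTypeEnum.classification) == "Classify the intent: "
# ...and unset task prompts now fall back to default_prompt instead of None
assert cfg.get_prompt(TaskTypeEnum.sts) == "Represent the sentence: "
assert cfg.get_prompt(None) == "Represent the sentence: "

# the new EarlyStoppingConfig is a plain pydantic model; metric defaults to "scoring_f1"
early_stopping = EarlyStoppingConfig(patience=3, threshold=0.01)
```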

autointent/context/data_handler/_stratification.py

Lines changed: 10 additions & 1 deletion
@@ -72,7 +72,16 @@ def __call__(
             ValueError: If OOS samples are present but allow_oos_in_train is not specified.
         """
         if not self._has_oos_samples(dataset):
-            return self._split_without_oos(dataset, multilabel, self.test_size)
+            train, test = self._split_without_oos(dataset, multilabel, self.test_size)
+            if self.is_few_shot:
+                train, test = create_few_shot_split(
+                    train,
+                    test,
+                    multilabel=multilabel,
+                    label_column=self.label_feature,
+                    examples_per_label=self.examples_per_label,
+                )
+            return train, test
         if allow_oos_in_train is None:
             msg = (
                 "Error while splitting dataset. It contains OOS samples, "

autointent/context/optimization_info/_optimization_info.py

Lines changed: 0 additions & 1 deletion
@@ -225,7 +225,6 @@ def dump_evaluation_results(self) -> dict[str, Any]:
             "pipeline_metrics": self.pipeline_metrics,
             "metrics": node_wise_metrics,
             "configs": self.trials.model_dump(),
-            "artifacts": self.artifacts.model_dump(),
         }

     def dump(self, path: Path) -> None:
