
Added CatBoostScorer #209

Merged

voorhs merged 55 commits into dev from feat/catboost_scorer on Jun 18, 2025

Conversation

@nikiduki (Collaborator) commented May 14, 2025

In catboost_scorer.py I added dump and load for CatBoostScorer. Later they can be revised and moved into _dump_tools.

verbose: bool = False,
**catboost_kwargs: Any, # noqa: ANN401
) -> None:
self.classification_model_config = EmbedderConfig.from_search_config(classification_model_config)
Member


If classification_model_config is None, an EmbedderConfig with default values will be created. As a result, your test_catboost_without_embedder test is not actually testing what it is supposed to.
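A small illustration of the behaviour being flagged, with a dataclass standing in for the real pydantic EmbedderConfig (names assumed from the snippet):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmbedderConfig:
    """Stand-in for the real pydantic model."""

    model_name: str = "default-embedder"

    @classmethod
    def from_search_config(cls, config: Optional["EmbedderConfig"]) -> "EmbedderConfig":
        # Mirrors the behaviour described in the review comment:
        # a None input silently becomes a default config.
        return cls() if config is None else config


cfg = EmbedderConfig.from_search_config(None)
# `cfg` is a default EmbedderConfig, not None, so a test that passes
# classification_model_config=None never exercises the no-embedder path.
```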


def encode(texts: list[str]) -> npt.NDArray[np.float32]:
with torch.no_grad():
batch = tokenizer(
Member


EmbedderConfig already has tokenization parameters in tokenizer_config.

Comment on lines 172 to 177

def _init_text_tools(self) -> None:
if not hasattr(self, "_tokenizer"):
self._tokenizer = Tokenizer(lowercasing=True, separator_type="BySense", token_types=["Word", "Number"])
if not hasattr(self, "_dictionary"):
self._dictionary = Dictionary(occurence_lower_bound=1, gram_order=1)
Member


This can be removed here as well.

Comment on lines 210 to 214
y_mat = np.zeros((len(labels), self._n_classes), dtype=np.float32)
for i, lbls in enumerate(cast("Sequence[Sequence[int]]", labels)):
for class_i, lbl in enumerate(lbls):
y_mat[i, class_i] = lbl
y = y_mat
Member


Can this simply be replaced with np.asarray(labels)?
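The nested loop above can indeed be collapsed into a single np.asarray call, provided every row of labels is a full binary indicator vector of length _n_classes (the data below is illustrative):

```python
import numpy as np

# Illustrative multilabel input: one binary indicator per class.
labels = [[0, 1, 1], [1, 0, 0]]
n_classes = 3

# Manual construction, as in the snippet above:
y_mat = np.zeros((len(labels), n_classes), dtype=np.float32)
for i, lbls in enumerate(labels):
    for class_i, lbl in enumerate(lbls):
        y_mat[i, class_i] = lbl

# Suggested replacement:
y = np.asarray(labels, dtype=np.float32)
```

The two are equivalent only when all rows have the same length; ragged rows would previously be zero-padded by the loop but would fail (or produce an object array) under np.asarray.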

self._dictionary_fitted = False

def get_embedder_config(self) -> dict[str, Any]:
return self.embedder_config.model_dump()
Member


If self._use_embedder is False, this will raise an error.

Comment on lines 123 to 128
if not hasattr(self, "_tokenizer"):
self._tokenizer = Tokenizer(lowercasing=True, separator_type="BySense", token_types=["Word", "Number"])
if not hasattr(self, "_dictionary"):
self._dictionary = Dictionary(occurence_lower_bound=1, gram_order=1)
if not hasattr(self, "_dictionary_fitted"):
self._dictionary_fitted = False
Member


This should somehow be made configurable — how about this? https://catboost.ai/docs/en/references/text-processing__test-processing__default-value
It seems the tokenizer, dictionary, etc. can simply be left unspecified.

voorhs and others added 9 commits June 14, 2025 21:40
* fix

* sklearn scorer proper name

* fix typing errors

* try to fix pydantic errors
* Update wandb.py

* Update wandb.py

* Update wandb.py

* Update _optimization_info.py

* remove print
* fix few shot split

* lint
* change how `clear_cache` is called

* first version of early stopping

* change mypy version

* train_test_split bug fix

* add `compute_metrics` and `EarlyStoppingCallback`

* bug fix

* fix mypy

* try to fix `"eval_f1" not found` error

* forgot to upd `from_context`

* try to fix mypy

* try to fix "not found f1" error

* refactor a little bit

* disable early stopping for lora

* fix typing errors

* update contributing and makefile

* minor change

* use our metrics

* add docstrings

* set 3.10 for mypy

* upd contributing.md

* try to fix bug

* try to fix typing issue

* try to fix

* add early stopping to ptuning
* add test for configuration

* lint

* satisfy mypy
* add prompt logging

* Update optimizer_config.schema.json

* fix

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* fix default prompt

* allow to use default prompt with override
@voorhs merged commit ac1b732 into dev on Jun 18, 2025
21 of 22 checks passed
@voorhs deleted the feat/catboost_scorer branch on June 18, 2025 at 16:56