Commit 2746b58

authored
Upd/docs (#249)
* upd main page
* upd quickstart
* fix quickstart doctests
* upd concepts
* add page about text embeddings
* add page on automl theory
* update dialogue systems page
1 parent 53806e6 commit 2746b58

10 files changed

+1069
-114
lines changed

docs/source/concepts.rst

Lines changed: 109 additions & 16 deletions
Original file line number | Diff line number | Diff line change
@@ -1,30 +1,123 @@
1+
============
12
Key Concepts
23
============
34

4-
.. _key-search-space:
5+
This page introduces the fundamental concepts behind AutoIntent's design and functionality. Understanding them will help you use the framework effectively and make informed decisions about your text classification projects.
6+
7+
.. _concepts-pipeline:
8+
9+
Three-Stage Pipeline Architecture
10+
=================================
11+
12+
AutoIntent organizes text classification into a modular three-stage pipeline, providing clear separation of concerns and flexibility in optimization:
13+
14+
**🔤 Embedding Stage**
15+
Transforms raw text into dense vector representations using pre-trained transformer models. This stage handles the computationally intensive text encoding and can be optimized independently of downstream classification tasks.
16+
17+
**📊 Scoring Stage**
18+
Predicts class probabilities, typically from the embeddings produced by the previous stage. This stage supports diverse approaches, from classical machine learning (KNN, logistic regression) to deep learning models (BERT fine-tuning, CNNs). Classical models operate on pre-computed embeddings for efficiency, whereas fine-tuning approaches such as BERT process the raw text directly.
19+
20+
**⚖️ Decision Stage**
21+
Converts predicted probabilities into final classifications by applying thresholds and decision rules. This stage is crucial for multi-label classification and out-of-scope detection scenarios.
22+
23+
This modular design enables efficient experimentation, allows reusing expensive embedding computations across different models, and supports deployment on CPU-only systems.
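
The three stages can be illustrated with a minimal plain-Python sketch. It makes three simplifying assumptions: a toy bag-of-words "embedding" over a hypothetical six-word vocabulary stands in for a transformer, a nearest-centroid model serves as the scorer, and the decision is an argmax with a confidence threshold. None of the names below belong to AutoIntent's API.

.. code-block:: python

    import math

    # Hypothetical six-word vocabulary standing in for a pre-trained encoder.
    VOCAB = ["balance", "card", "lost", "weather", "rain", "forecast"]


    def embed(text: str) -> list[float]:
        """Embedding stage: an L2-normalized bag-of-words vector."""
        words = text.lower().split()
        vec = [float(words.count(w)) for w in VOCAB]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]


    def fit_scorer(embeddings: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
        """Scoring stage (training): the mean embedding (centroid) of each class."""
        centroids = {}
        for label in sorted(set(labels)):
            members = [e for e, y in zip(embeddings, labels) if y == label]
            centroids[label] = [sum(col) / len(members) for col in zip(*members)]
        return centroids


    def score(centroids: dict[str, list[float]], vec: list[float]) -> dict[str, float]:
        """Scoring stage (inference): softmax over dot-product similarities."""
        sims = {c: sum(a * b for a, b in zip(vec, cen)) for c, cen in centroids.items()}
        exps = {c: math.exp(5.0 * s) for c, s in sims.items()}  # 5.0 is an arbitrary sharpening factor
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}


    def decide(probs: dict[str, float], threshold: float = 0.7):
        """Decision stage: argmax, or None (out of scope) if confidence is too low."""
        best = max(probs, key=probs.get)
        return best if probs[best] >= threshold else None


    train_texts = ["check my card balance", "lost my card", "weather forecast today", "will it rain"]
    train_labels = ["banking", "banking", "weather", "weather"]
    train_emb = [embed(t) for t in train_texts]  # encoded once, reusable by any scorer
    centroids = fit_scorer(train_emb, train_labels)
    print(decide(score(centroids, embed("rain forecast"))))  # weather

An utterance with no known words, such as ``"hello there"``, scores both classes equally. Its top probability (0.5) falls below the 0.7 threshold, so ``decide`` returns ``None``.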
24+
25+
.. _concepts-automl:
26+
27+
AutoML Optimization Strategy
28+
============================
29+
30+
AutoIntent employs a hierarchical optimization approach that balances exploration with computational efficiency:
31+
32+
**🔧 Module-Level Optimization**
33+
Components are optimized sequentially: embedding → scoring → decision. Each stage builds on the best model found for the previous stage. This keeps the pipeline cohesive and avoids the combinatorial explosion of searching all stage combinations jointly.
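
Sequential optimization can be pictured as a greedy, stage-by-stage search. In the sketch below, the stage candidates are hypothetical. ``evaluate`` is a made-up additive score standing in for training and validating a partial pipeline; a real system would need a proxy metric for the earlier stages, for example a retrieval metric for embeddings.

.. code-block:: python

    import math

    # Hypothetical candidates per stage; dict order fixes the stage order.
    STAGES = {
        "embedding": ["model-a", "model-b", "model-c"],
        "scoring": ["knn", "linear", "mlp"],
        "decision": ["argmax", "threshold"],
    }


    def evaluate(pipeline: dict) -> float:
        """Made-up validation score of the stages chosen so far (stand-in only)."""
        bonus = {"model-b": 0.3, "linear": 0.2, "threshold": 0.1}
        return sum(bonus.get(choice, 0.0) for choice in pipeline.values())


    def optimize_sequentially(stages: dict) -> tuple[dict, int]:
        best: dict = {}
        n_evaluated = 0
        for stage, candidates in stages.items():  # embedding, then scoring, then decision
            results = []
            for candidate in candidates:
                # Earlier stages stay frozen at their best choice.
                results.append((evaluate({**best, stage: candidate}), candidate))
                n_evaluated += 1
            best[stage] = max(results)[1]  # keep the winner before moving on
        return best, n_evaluated


    best, n_evaluated = optimize_sequentially(STAGES)
    joint_grid_size = math.prod(len(c) for c in STAGES.values())
    print(best, n_evaluated, joint_grid_size)

The sequential search evaluates 3 + 3 + 2 = 8 configurations instead of the 3 × 3 × 2 = 18 of a joint grid. The trade-off is that a greedy search is not guaranteed to find the jointly optimal pipeline when the stages interact.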
34+
35+
**🤖 Model-Level Optimization**
36+
Within each module, model architectures and their hyperparameters are optimized jointly using Optuna, with either the Tree-structured Parzen Estimator (TPE) sampler or random sampling.
37+
38+
**🗺️ Search Space Configuration**
39+
Optimization behavior is controlled through dictionary-like search spaces that define:
40+
41+
- Available model types and their hyperparameter ranges
42+
- Optimization budget and resource constraints
43+
- Cross-validation and evaluation strategies
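
The sketch below shows a dictionary-like search space with random sampling over it. It assumes a hypothetical layout (model type, then hyperparameter name, then a list of candidate values) that is not necessarily AutoIntent's configuration schema. A TPE sampler, such as Optuna's, would replace the uniform ``rng.choice`` calls with proposals informed by earlier trials.

.. code-block:: python

    import random

    # Hypothetical search space: model type -> hyperparameter name -> candidate values.
    SEARCH_SPACE = {
        "knn": {"k": [1, 3, 5, 10], "weights": ["uniform", "distance"]},
        "logreg": {"C": [0.01, 0.1, 1.0, 10.0]},
    }


    def sample_config(space: dict, rng: random.Random) -> tuple[str, dict]:
        """Jointly sample a model type and one value for each of its hyperparameters."""
        model = rng.choice(sorted(space))
        params = {name: rng.choice(values) for name, values in space[model].items()}
        return model, params


    def random_search(space: dict, objective, n_trials: int, seed: int = 0):
        """Plain random search; ``n_trials`` acts as the optimization budget."""
        rng = random.Random(seed)
        best = None
        for _ in range(n_trials):
            model, params = sample_config(space, rng)
            value = objective(model, params)
            if best is None or value > best[0]:
                best = (value, model, params)
        return best


    def toy_objective(model: str, params: dict) -> float:
        """Made-up validation score that peaks at logreg with C=1.0."""
        if model == "logreg":
            return 1.0 - abs(params["C"] - 1.0) / 10.0
        return 0.5 + params["k"] / 100.0


    best_value, best_model, best_params = random_search(SEARCH_SPACE, toy_objective, n_trials=200)
    print(best_model, best_params, best_value)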
44+
45+
.. _concepts-embedding-centric:
46+
47+
Embedding-Centric Design
48+
========================
49+
50+
AutoIntent's architecture is centered on transformer-based text embeddings, which provides several key advantages:
51+
52+
**⚡ Pre-computed Embeddings**
53+
Text is encoded once and reused across all scoring models, dramatically reducing computational overhead during hyperparameter optimization and enabling efficient experimentation.
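
A small caching wrapper shows where the saving comes from. ``CachedEncoder`` and ``toy_encode`` are hypothetical stand-ins, not AutoIntent classes. The point is that the expensive encoder runs once per unique text, however many optimization trials reuse the vectors.

.. code-block:: python

    from typing import Callable


    class CachedEncoder:
        """Memoizes an expensive text encoder so each unique text is encoded once."""

        def __init__(self, encode_fn: Callable[[str], list[float]]):
            self._encode_fn = encode_fn
            self._cache: dict[str, list[float]] = {}
            self.n_calls = 0  # how many times the underlying encoder actually ran

        def __call__(self, texts: list[str]) -> list[list[float]]:
            for text in texts:
                if text not in self._cache:
                    self._cache[text] = self._encode_fn(text)
                    self.n_calls += 1
            return [self._cache[text] for text in texts]


    def toy_encode(text: str) -> list[float]:
        """Stand-in for a transformer forward pass."""
        return [float(len(text)), float(text.count(" "))]


    encoder = CachedEncoder(toy_encode)
    texts = ["book a flight", "cancel my order", "book a flight", "hello"]
    for trial in range(10):  # e.g. ten hyperparameter-optimization trials
        vectors = encoder(texts)
        # ... fit and evaluate a scoring model on `vectors` here ...
    print(encoder.n_calls)  # 3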
54+
55+
**🤗 Model Repository Integration**
56+
Access to thousands of pre-trained models on the Hugging Face Hub, with embedding models selected either by retrieval metrics or by downstream classification performance.
57+
58+
**🚀 Deployment Flexibility**
59+
Separating embedding generation from classification makes it possible to deploy lightweight classifiers on resource-constrained systems while still benefiting from powerful transformer representations.
60+
61+
.. _concepts-multiclass-multilabel:
62+
63+
Classification Paradigms
64+
========================
65+
66+
AutoIntent supports various classification scenarios through its flexible decision module:
67+
68+
**🏷️ Multi-Class Classification**
69+
Traditional single-label classification where each input belongs to exactly one class. Uses argmax or threshold-based decisions on predicted probabilities.
70+
71+
**🔖 Multi-Label Classification**
72+
Each input can belong to multiple classes simultaneously. Employs adaptive thresholding strategies that can be sample-specific or learned globally across the dataset.
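
The difference between the two paradigms shows up directly in the decision rule. This sketch is illustrative only. The function names are hypothetical, and the "sample-specific" rule (keep classes close to the sample's own top score) is one assumed example of such a strategy, not necessarily the one AutoIntent implements.

.. code-block:: python

    def decide_multiclass(probs: list[float]) -> int:
        """Single label: the index of the most probable class (argmax)."""
        return max(range(len(probs)), key=lambda i: probs[i])


    def decide_multilabel_global(probs: list[float], threshold: float = 0.5) -> list[int]:
        """Multi-label: every class whose probability clears one dataset-wide threshold."""
        return [i for i, p in enumerate(probs) if p >= threshold]


    def decide_multilabel_relative(probs: list[float], ratio: float = 0.8) -> list[int]:
        """Multi-label, sample-specific: keep classes within `ratio` of this sample's top score."""
        top = max(probs)
        return [i for i, p in enumerate(probs) if p >= ratio * top]


    probs = [0.45, 0.40, 0.05]  # independent per-class probabilities for one input
    print(decide_multiclass(probs))           # 0
    print(decide_multilabel_global(probs))    # []
    print(decide_multilabel_relative(probs))  # [0, 1]

With a single global threshold of 0.5, this input receives no label at all. The sample-specific rule instead keeps both closely scored classes.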
73+
74+
75+
.. _concepts-oos:
76+
77+
Out-of-Scope Detection
78+
======================
79+
80+
Out-of-scope (OOS) detection is a critical capability for production text classification systems, especially in conversational AI:
81+
82+
**📏 Confidence Thresholding**
83+
Uses predicted probability scores to identify inputs that don't belong to any known class. Threshold values can be tuned automatically to balance precision and recall.
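
The sketch below illustrates automatic threshold tuning. It assumes a held-out set where each input's highest class probability and a true/false out-of-scope flag are known. The tuned objective here is the F1 score for flagging out-of-scope inputs, one common way to balance precision and recall. The data and the candidate grid are made up.

.. code-block:: python

    def tune_oos_threshold(max_probs, is_oos, candidates):
        """Return the (threshold, F1) pair that best flags out-of-scope inputs."""
        best_t, best_f1 = None, -1.0
        for t in candidates:
            flagged = [p < t for p in max_probs]  # low confidence -> predicted out of scope
            tp = sum(f and y for f, y in zip(flagged, is_oos))
            fp = sum(f and not y for f, y in zip(flagged, is_oos))
            fn = sum(y and not f for f, y in zip(flagged, is_oos))
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            if f1 > best_f1:
                best_t, best_f1 = t, f1
        return best_t, best_f1


    # Made-up validation data: top class probability per input and its true OOS flag.
    max_probs = [0.95, 0.90, 0.85, 0.40, 0.35, 0.60]
    is_oos = [False, False, False, True, True, False]
    print(tune_oos_threshold(max_probs, is_oos, candidates=[0.3, 0.5, 0.7, 0.9]))  # (0.5, 1.0)

Raising the threshold to 0.7 would also flag the in-scope input scored at 0.60. That lowers precision, and F1 drops to 0.8 on this data.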
84+
85+
**🔗 Integration with Multi-Label**
86+
OOS detection also works in multi-label scenarios, making it possible to distinguish completely unknown inputs from partial matches to known classes.
87+
88+
.. _concepts-presets:
89+
90+
Optimization Presets
91+
====================
92+
93+
AutoIntent provides predefined optimization strategies that balance quality, speed, and resource consumption:
594

6-
Optimization Search Space
7-
-------------------------
95+
**⚡ Zero-Shot Presets**
96+
Leverage class descriptions and large language models for classification without training data. Ideal for rapid prototyping and cold-start scenarios.
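
The following deliberately crude sketch makes "classifying from descriptions alone" concrete. It swaps the language model for plain word overlap (Jaccard similarity) between the utterance and each class description. ``zero_shot`` and the descriptions are hypothetical; a real preset would use an embedding model or an LLM instead.

.. code-block:: python

    def tokens(text: str) -> set[str]:
        return set(text.lower().split())


    def zero_shot(utterance: str, descriptions: dict[str, str]) -> str:
        """Pick the class whose description overlaps most with the utterance (Jaccard)."""
        u = tokens(utterance)

        def jaccard(description: str) -> float:
            d = tokens(description)
            return len(u & d) / (len(u | d) or 1)

        return max(descriptions, key=lambda label: jaccard(descriptions[label]))


    # Hypothetical class descriptions; no labeled training utterances are needed.
    DESCRIPTIONS = {
        "book_flight": "user wants to book a flight ticket",
        "check_weather": "user asks about the weather forecast",
    }
    print(zero_shot("please book me a flight", DESCRIPTIONS))       # book_flight
    print(zero_shot("what is the weather forecast", DESCRIPTIONS))  # check_weather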
897

9-
The automatic selection of a classifier occurs through the iteration of hyperparameters within a certain *search space*. Conceptually, this search space is a dictionary where the keys are the names of the hyperparameters, and the values are lists. The hyperparameters act as the coordinate "axes" of the search space, and the values in the lists act as points on this axis.
98+
**📈 Classic Presets**
99+
Focus on traditional ML approaches (KNN, linear models, tree-based methods) operating on transformer embeddings. They offer a good balance between performance and efficiency.
10100

11-
.. _key-stages:
101+
**🧠 Neural Network Presets**
102+
Include deep learning approaches such as CNNs, RNNs, and transformer fine-tuning. They offer the highest potential performance at an increased computational cost.
12103

13-
Classification Stages
14-
---------------------
104+
**🪜 Computational Tiers**
105+
Each preset family offers light, medium, and heavy variants that trade optimization time for potential performance improvements.
15106

16-
Intent classification can be divided into two stages: scoring and decision. Scoring involves predicting the probabilities of the presence of each intent in a given utterance. Prediction involves forming the final decision based on the provided probabilities.
107+
.. _concepts-modularity:
17108

18-
.. _key-oos:
109+
Modular Architecture
110+
====================
19111

20-
Out-of-domain utterances
21-
------------------------
112+
AutoIntent's design emphasizes modularity and extensibility:
22113

23-
If we want to detect out-of-domain examples, it is necessary to set a probability threshold during the decision stage, at which the presence of some known intent can be asserted.
114+
**🧩 Plugin Architecture**
115+
Each component (embedding models, scoring methods, decision strategies) implements a common interface, enabling easy addition of new approaches without modifying core framework code.
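
A shared interface can be sketched with ``typing.Protocol``. ``Scorer``, ``MajorityScorer`` and ``run_scorer`` are hypothetical names chosen for illustration; AutoIntent's actual base classes and method signatures may differ.

.. code-block:: python

    from typing import Protocol, Sequence


    class Scorer(Protocol):
        """Hypothetical shared interface for scoring plug-ins."""

        def fit(self, embeddings: Sequence[Sequence[float]], labels: Sequence[int]) -> None: ...

        def predict(self, embeddings: Sequence[Sequence[float]]) -> list[list[float]]: ...


    class MajorityScorer:
        """Trivial plug-in: predicts the training label frequencies for every input."""

        def __init__(self) -> None:
            self.priors: list[float] = []

        def fit(self, embeddings: Sequence[Sequence[float]], labels: Sequence[int]) -> None:
            counts = [0] * (max(labels) + 1)
            for y in labels:
                counts[y] += 1
            self.priors = [c / len(labels) for c in counts]

        def predict(self, embeddings: Sequence[Sequence[float]]) -> list[list[float]]:
            return [list(self.priors) for _ in embeddings]


    def run_scorer(scorer: Scorer, train_x, train_y, test_x) -> list[list[float]]:
        """Framework-side code relies only on the interface, never on a concrete class."""
        scorer.fit(train_x, train_y)
        return scorer.predict(test_x)


    print(run_scorer(MajorityScorer(), [[0.0], [1.0], [2.0]], [0, 1, 1], [[5.0]]))

``MajorityScorer`` never inherits from ``Scorer``, yet a static type checker accepts it structurally. Any class with matching ``fit`` and ``predict`` methods can be passed to ``run_scorer``.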
24116

25-
.. _key-nodes-modules:
117+
**⚙️ Configuration-Driven**
118+
All aspects of optimization can be controlled through declarative configuration files, supporting reproducible experiments and easy sharing of optimization strategies.
26119

27-
Nodes and Modules
28-
-----------------
120+
**🔧 Extensibility**
121+
The framework can be extended with custom embedding models, scoring algorithms, and decision strategies while remaining compatible with the AutoML optimization pipeline.
29122

30-
The scoring or decision model, along with its hyperparameters that need to be iterated, is called an *optimization module*. A set of modules related to one optimization stage (scoring or decision) is called an *optimization node*.
123+
This modular design ensures that AutoIntent can evolve with advances in NLP research while maintaining stability and backward compatibility for existing users.

docs/source/conf.py

Lines changed: 1 addition & 0 deletions
@@ -50,6 +50,7 @@
5050
"sphinx.ext.intersphinx",
5151
"sphinx_multiversion",
5252
"sphinx.ext.napoleon",
53+
"sphinx_toolbox.collapse"
5354
]
5455

5556
templates_path = ["_templates"]

docs/source/index.rst

Lines changed: 24 additions & 21 deletions
@@ -1,13 +1,15 @@
11
AutoIntent documentation
22
========================
33

4-
**AutoIntent** is an open source tool for automatic configuration of a text classification pipeline for intent prediction.
4+
**AutoIntent** is an open source tool for automatic configuration of text classification pipelines, with specialized support for intent prediction.
55

66
.. note::
77

88
This project is under active development.
99

10-
The task of intent detection is one of the main subtasks in creating task-oriented dialogue systems, along with scriptwriting and slot filling. AutoIntent project offers users the following:
10+
The task of intent detection is one of the main subtasks in creating task-oriented dialogue systems, along with scriptwriting and slot filling. While AutoIntent is particularly well-suited for intent detection, it can be applied to any text classification problem, including sentiment analysis, topic classification, document categorization, and other NLP tasks.
11+
12+
The AutoIntent project offers users the following:
1113

1214
- A convenient library of methods for intent classification that can be used in a sklearn-like "fit-predict" format.
1315
- An AutoML approach to creating classifiers, where the only thing needed is to upload a set of labeled data.
@@ -36,33 +38,34 @@ Example of building an intent classifier in a couple of lines of code:
3638
for match in glob("vector_db*"):
3739
shutil.rmtree(match)
3840

39-
Documentation Contents
40-
----------------------
41-
42-
:doc:`Quickstart <quickstart>`
43-
..............................
44-
45-
It is recommended to begin with the :doc:`quickstart` page. It contains overview of our capabilities and basic instructions for working with our library.
41+
Documentation Guide
42+
-------------------
4643

47-
:doc:`Key Concepts <concepts>`
48-
..............................
44+
Getting Started
45+
...............
4946

50-
Key terms and concepts we use throughout our documentation.
47+
:doc:`🚀 Quickstart <quickstart>`
48+
Jump right in! Install AutoIntent and build your first text classifier in minutes. Perfect for users who want to get up and running quickly with practical examples.
5149

52-
:doc:`User Guides<user_guides>`
53-
................................
50+
:doc:`📚 Key Concepts <concepts>`
51+
Essential terminology and concepts used throughout AutoIntent. Understanding these will help you navigate the documentation and make the most of the library's features.
5452

55-
A series of notebooks that demonstrate in detail and comprehensively the capabilities of our library and how to use it.
53+
In-Depth Learning
54+
.................
5655

57-
:doc:`API Reference <autoapi/autointent/index>`
58-
...............................................
56+
:doc:`📖 User Guides <user_guides>`
57+
Comprehensive tutorials and examples that walk you through AutoIntent's capabilities step-by-step. These hands-on guides cover everything from basic usage to advanced techniques.
5958

60-
Pay special attention to the sections :doc:`autoapi/autointent/modules/index` and :doc:`autoapi/autointent/metrics/index`.
59+
:doc:`🎓 Learn AutoIntent <learn/index>`
60+
Dive deeper into the theory behind AutoIntent. Learn about dialogue systems, AutoML principles, and the science that powers intelligent text classification.
6161

62-
:doc:`Learn AutoIntent<learn/index>`
63-
....................................
62+
Reference
63+
.........
6464

65-
Some theoretical background on dialogue systems and auto ML.
65+
:doc:`🔧 API Reference <autoapi/autointent/index>`
66+
Complete technical documentation for all classes, methods, and functions. Essential reference for developers integrating AutoIntent into their applications.
67+
68+
Key sections: :doc:`Modules <autoapi/autointent/modules/index>` | :doc:`Metrics <autoapi/autointent/metrics/index>`
6669

6770

6871
.. toctree::
