Commit 95ba133

Author: Jaime Céspedes Sisniega
Merge pull request #313 from IFCA-Advanced-Computing/feature-faq-documentation
Add FAQ section to documentation
2 parents: a235b3d + d7f3518

File tree

4 files changed: +91, -7 lines


docs/source/concepts.md

Lines changed: 0 additions & 6 deletions
@@ -30,12 +30,6 @@ The different types of changes that are considered as a form of drift can be cat
 ${P_{[0, t]}(X) \neq P_{[t+1, \infty)}(X)}$. [Data drift methods](#data-drift) are designed to try to detect this type of drift. Unlike when *concept drift* takes place, the presence of *data drift* does not guarantee that the model's performance is being affected, but it is highly probable that it is. We have renamed *dataset shift* {cite}`rabanser2019failing` to *data drift*
 in order to maintain consistency with the *concept drift* definition. These *data drift* methods can also be used to detect *label drift*, also known as *prior probability shift* {cite}`storkey2009training`, where the label distribution ${P(Y)}$ is the one that changes over time, in such a way that ${P_{[0, t]}(Y) \neq P_{[t+1, \infty)}(Y)}$.
 
-## Why do I need to use a drift detector?
-
-One of the main mistakes when deploying a machine learning model for consumption is to assume that the data used for inference will come from the same distribution as the data on which the model was trained, i.e. that the data will be stationary. It may also be the case that the data use at inference time is still similar to those used for training, but the concept of what was learned in the first instance has changed over time, making the model obsolete in terms of performance.
-
-Drift detectors make it possible to monitor model performance or feature distributions in order to detect significant deviations that can cause model performance decay. By using them it is possible to know when it is necessary to replace the current model with a new one trained on more recent data.
-
 ## Verification latency or delay
 
 According to {cite}`dos2016fast`, it is defined as the period between a model's prediction and the availability of the ground-truth label (in the case of a classification problem) or the target value (in the case of a regression problem).

docs/source/conf.py

Lines changed: 1 addition & 0 deletions
@@ -112,6 +112,7 @@
 ]
 myst_url_schemes = ("http", "https", "mailto")
 myst_heading_anchors = 3
+myst_all_links_external = True
 
 # MyST-NB configuration
 nb_execution_timeout = 480

docs/source/faq.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+# FAQ
+
+Here we will try to answer some of the most common questions about drift detection and the Frouros library.
+
+## What is the difference between *concept drift* and *data drift*?
+
+Concept drift refers to changes in the underlying concept being modeled, such as changes in the relationship between
+the input features and the target variable. It can be caused by changes in the conditional probability $P(y|X)$, with or
+without a change in $P(X)$. Data drift, on the other hand, refers to changes in the distribution of the input features
+$P(X)$, such as changes in the feature distributions over time. It focuses on detecting when the incoming data no longer
+resembles the data the model was trained on, potentially leading to decreased performance or reliability.
+
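To make the distinction concrete, the following minimal NumPy sketch simulates both kinds of drift for a single feature: under data drift, $P(X)$ shifts while the labeling rule $P(y|X)$ stays fixed; under concept drift, the labeling rule changes while $P(X)$ stays the same. The rule "y = 1 iff x > 0" is an arbitrary illustrative concept, not anything prescribed by Frouros.

```python
import numpy as np

rng = np.random.default_rng(seed=31)

# Reference period: X ~ N(0, 1) and the concept is y = 1 iff x > 0,
# i.e. P(y|X) is a fixed deterministic rule.
X_ref = rng.normal(loc=0.0, scale=1.0, size=1000)
y_ref = (X_ref > 0).astype(int)

# Data drift: P(X) changes (the mean shifts to 2) but P(y|X) is unchanged.
# A model trained on X_ref may still predict correctly here, but its inputs
# no longer look like the training data.
X_data_drift = rng.normal(loc=2.0, scale=1.0, size=1000)
y_data_drift = (X_data_drift > 0).astype(int)

# Concept drift: P(X) is unchanged but the labeling rule flips, so P(y|X)
# changes. The input distribution looks identical, yet the old model is
# now systematically wrong.
X_concept_drift = rng.normal(loc=0.0, scale=1.0, size=1000)
y_concept_drift = (X_concept_drift <= 0).astype(int)
```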
+## What is the difference between *out-of-distribution* detection and *data drift* detection?
+
+Out-of-distribution detection focuses on identifying samples that fall outside the training distribution, and is often
+used to detect anomalies or novel data. It aims to detect instances that differ significantly from the data the model
+was trained on. Data drift detection, on the other hand, is concerned with identifying shifts or changes in the
+distribution of the data over time.
+
+## How can I detect *concept drift* without having access to the ground truth labels at inference time?
+
+In cases where ground truth labels are not available at inference time, or the verification latency is high, it may not
+be possible to directly detect concept drift using traditional methods. In such cases, it may be necessary to use
+alternative techniques, such as data drift detection, to monitor changes in the feature distributions and identify
+potential drift. By monitoring the feature distributions, it may be possible to detect when the incoming data no
+longer resembles the data the model was trained on, even in the absence of ground truth labels.
+
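As an illustration of a label-free check, the sketch below compares training-time features against unlabeled serving-time features with Frouros's Kolmogorov-Smirnov data drift detector. The import path and the `fit`/`compare`/`p_value` usage follow the Frouros README at the time of writing and should be verified against the current documentation.

```python
import numpy as np
from frouros.detectors.data_drift import KSTest  # import path as in the README; verify

rng = np.random.default_rng(seed=31)

X_train = rng.normal(loc=0.0, scale=1.0, size=1000)   # features seen at training time
X_serving = rng.normal(loc=1.5, scale=1.0, size=200)  # unlabeled features at inference time

alpha = 0.01  # significance level for the KS test

detector = KSTest()
detector.fit(X=X_train)                    # store the reference distribution
result, _ = detector.compare(X=X_serving)  # two-sample KS test against the reference

# No ground truth labels are needed: drift is flagged from P(X) alone.
if result.p_value <= alpha:
    print("Data drift detected: serving features no longer match the training data")
```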
+## Why do I need to use a *drift* detector?
+
+One of the main mistakes when deploying a machine learning model for consumption is to assume that the data used for
+inference will come from the same distribution as the data on which the model was trained, i.e., that the data will be
+stationary. It may also be the case that the data used at inference time is still similar to the data used for training,
+but the concept of what was learned in the first instance has changed over time, making the model obsolete in terms of
+performance.
+
+Drift detectors make it possible to monitor model performance or feature distributions to detect significant deviations
+that can cause model performance decay. By using them, it is possible to know when it is necessary to replace the
+current model with a new one trained on more recent data.
+
+## Is *model drift* the same as *concept drift*?
+
+Model drift is a term used to describe the degradation of a model's performance over time. This can be caused by a
+variety of factors, including concept drift, data drift, or other issues such as model aging. Concept drift, on the
+other hand, refers specifically to changes in the underlying concept being modeled, such as changes in the relationship
+between the input features and the target variable. While concept drift can lead to model drift, model drift can also be
+caused by other factors and may not always be directly related to changes in the underlying concept.
+
+## What actions should I take if *drift* is detected in my model?
+
+If drift is detected in your model, it is important to take action to address the underlying cause of the drift.
+This may involve retraining the model on more recent data, updating the model's features or architecture, or taking
+other steps to ensure that the model remains accurate and reliable. In some cases, it may also be necessary to
+re-evaluate the model's performance and consider whether it is still suitable for its intended use case.
+
+## Can Frouros be integrated with popular machine learning frameworks such as TensorFlow or PyTorch?
+
+Yes, Frouros is designed to be compatible with any machine learning framework, including TensorFlow and PyTorch. It is
+framework-agnostic and can be used with any machine learning model or pipeline.
+
+For instance, we provide an [example](./examples/data_drift/MMD_advance.html) that shows how to integrate Frouros with a PyTorch model to detect data
+drift in a computer vision use case. In addition, there is an [example](./examples/concept_drift/DDM_advance.html) that shows how to integrate Frouros with
+scikit-learn to detect concept drift in a streaming manner.
+
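As a rough sketch of the scikit-learn case (simplified from the streaming idea in the linked example, not copied from it), the pattern is to feed the detector a 0/1 error value per prediction and poll its status. The `DDM`, `DDMConfig`, `update`, and `status` names follow the Frouros README at the time of writing; verify them against the current API reference.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from frouros.detectors.concept_drift import DDM, DDMConfig  # names as in the README; verify

rng = np.random.default_rng(seed=31)

# Train a simple scikit-learn model on reference data where y depends on x0 + x1.
X_train = rng.normal(size=(500, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# DDM monitors the stream of prediction errors (1 = wrong, 0 = correct).
config = DDMConfig(warning_level=2.0, drift_level=3.0, min_num_instances=25)
detector = DDM(config=config)

# Simulate a stream whose concept flips halfway through.
X_stream = rng.normal(size=(1000, 2))
y_stream = (X_stream[:, 0] + X_stream[:, 1] > 0).astype(int)
y_stream[500:] = 1 - y_stream[500:]  # concept drift: P(y|X) changes

for i, (x, y) in enumerate(zip(X_stream, y_stream)):
    y_pred = model.predict(x.reshape(1, -1)).item()
    detector.update(value=int(y_pred != y))
    if detector.status["drift"]:  # status dict per the README; verify the field name
        print(f"Concept drift detected at sample {i}")
        break
```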
+## How frequently should I run *drift* detection checks in my machine learning pipeline?
+
+The frequency of drift detection checks will depend on the specific use case and the nature of the data being
+processed. In general, it is good practice to run drift detection checks regularly, such as after each batch of
+data or at regular intervals, to ensure that any drift is detected and addressed in a timely manner.
+
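For instance, a per-batch schedule can be as simple as the framework-agnostic sketch below, which uses `scipy.stats.ks_2samp` directly (rather than a Frouros detector) to keep the scheduling logic in focus; the batch size and significance level are illustrative placeholders.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=0)

reference = rng.normal(loc=0.0, scale=1.0, size=2000)  # feature values from training time
ALPHA = 0.01  # illustrative significance level; tune per use case

def batch_has_drift(batch: np.ndarray) -> bool:
    """Two-sample KS test of an incoming batch against the training reference."""
    _, p_value = ks_2samp(reference, batch)
    return p_value <= ALPHA

# Run the check once per incoming batch: the first batch matches the
# reference, the second one has shifted.
for i, batch in enumerate((rng.normal(0.0, 1.0, 200), rng.normal(1.5, 1.0, 200))):
    print(f"batch {i}: {'drift detected' if batch_has_drift(batch) else 'no drift'}")
```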
+## What are some common causes of *drift* in machine learning models?
+
+Drift in machine learning models can be caused by a variety of factors, including changes in the underlying concept
+being modeled, changes in the distribution of the input features, changes in the relationship between the input
+features and the target variable, and other issues such as model aging or degradation. It is important to monitor
+models for drift and take action to address any detected drift to maintain model accuracy and reliability.
+
+## How can I contribute to the development of Frouros or report issues?
+
+The [contribute section](./contribute.html#how-to-contribute) provides information on how to contribute to the development of Frouros,
+including guidelines for reporting issues, submitting feature requests, and contributing code or documentation.
+
+## Does Frouros provide visualization tools for *drift* detection results?
+
+Frouros does not currently provide built-in visualization tools for drift detection results, but it is planned to
+include them in future releases.

docs/source/index.md

Lines changed: 5 additions & 1 deletion
@@ -5,7 +5,10 @@
 :end-before: ⚡️ Quickstart
 ```
 
-In order to start using `frouros`, we highly recommend to check {doc}`concepts <concepts>` section to get a quick idea of what `frouros` is capable of, and what it is not yet capable of. Subsequently, we recommend taking a look at the {doc}`examples <examples>` section since it is the best way to start using `frouros`.
+In order to start using `frouros`, we highly recommend checking the {doc}`concepts <concepts>` and
+{doc}`FAQ <faq>` sections to get a quick idea of what `frouros` is capable of, and what it is not yet capable
+of. Subsequently, we recommend taking a look at the {doc}`examples <examples>` section, since it is the best way to
+start using `frouros`.
 
 Read {doc}`installation <installation>` instructions to start using `frouros`.
 
@@ -26,4 +29,5 @@ concepts
 api_reference
 examples
 contribute
+faq
 ```
