docs: Update README and "Getting Started" tutorial
Updates the project's documentation to be more user-friendly for new users.
- The main `README.md` has been updated with installation instructions, a clearer "Getting Started" section, and links to the blog and official documentation. The code example has been corrected to use the proper dictionary format for the `reals` parameter.
- The "Getting Started" tutorial (`docs/tutorials/getting_started.qmd`) has been restructured to clearly explain and provide examples for the three main use cases: single model evaluation, model comparison, and population comparison. This new structure is inspired by the documentation for the R version of `rtichoke`.
README.md
36 additions & 4 deletions
@@ -7,7 +7,41 @@
7
7
* **Gains and Lift Charts**
8
8
* **Decision Curves**
9
9
10
-
The library is designed to be easy to use, while still offering a high degree of control over the final plots.
10
+
The library is designed to be easy to use, while still offering a high degree of control over the final plots. For reproducible examples, visit the [rtichoke blog](https://uriahf.github.io/rtichoke-py/blog.html).
11
+
12
+
## Installation
13
+
14
+
You can install `rtichoke` from PyPI:
15
+
16
+
```bash
17
+
pip install rtichoke
18
+
```
19
+
20
+
## Getting Started
21
+
22
+
To use `rtichoke`, you'll need two main inputs:
23
+
24
+
* `probs`: A dictionary containing your model's predicted probabilities.
25
+
* `reals`: A dictionary of the true binary outcomes.
26
+
27
+
Here's a quick example of how to create a ROC curve for a single model:
@@ -18,6 +52,4 @@ The library is designed to be easy to use, while still offering a high degree of
18
52
19
53
## Documentation
20
54
21
-
For a complete guide to the library, including a "Getting Started" tutorial and a full API reference, please see the **[official documentation](https://your-documentation-url.com)**.
22
-
23
-
*(Note: The documentation URL will need to be updated once the website is deployed.)*
55
+
For a complete guide to the library, including a "Getting Started" tutorial and a full API reference, please see the **[official documentation](https://uriahf.github.io/rtichoke-py/)**.
docs/tutorials/getting_started.qmd
This tutorial provides a basic introduction to the `rtichoke` library. We'll walk through the process of preparing data, creating a decision curve, and visualizing the results.
5
+
This tutorial provides an introduction to the `rtichoke` library, showing how to visualize model performance for different scenarios.
6
6
7
7
## 1. Import Libraries
8
8
@@ -11,52 +11,98 @@ First, let's import the necessary libraries. We'll need `numpy` for data manipul
11
11
```python
12
12
import numpy as np
13
13
import rtichoke as rk
14
+
15
+
# For reproducibility
16
+
np.random.seed(42)
14
17
```
15
18
16
-
## 2. Prepare Your Data
19
+
## 2. Understanding the Inputs
20
+
21
+
`rtichoke` expects two main inputs for creating performance curves:
22
+
23
+
* **`probs` (Probabilities)**: A dictionary where keys are model or population names and values are lists or NumPy arrays of predicted probabilities.
24
+
* **`reals` (Outcomes)**: A dictionary where keys are population names and values are lists or NumPy arrays of the true binary outcomes (0 or 1).
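
The input rules above can be checked before plotting. The following is a minimal sketch of such a check. `check_inputs` is a hypothetical helper, not part of `rtichoke`, and the pairing rule it enforces (either one shared `reals` entry or matching keys) is an assumption drawn from the use cases in this tutorial rather than from the library's own validation.

```python
def check_inputs(probs, reals):
    """Hypothetical helper: sanity-check probs/reals dictionaries.

    Not part of rtichoke. The pairing rule is an assumption based on
    the use cases described in the tutorial.
    """
    for name, values in probs.items():
        if not all(0.0 <= p <= 1.0 for p in values):
            raise ValueError(f"{name!r}: probabilities must lie in [0, 1]")
    for name, values in reals.items():
        if not all(y in (0, 1) for y in values):
            raise ValueError(f"{name!r}: outcomes must be 0 or 1")

    if len(reals) == 1:
        # One population: every probs entry is scored against the same outcomes.
        (outcomes,) = reals.values()
        pairs = [(name, values, outcomes) for name, values in probs.items()]
    else:
        # Several populations: keys must match one-to-one.
        if set(probs) != set(reals):
            raise ValueError("probs and reals must share the same keys")
        pairs = [(name, probs[name], reals[name]) for name in probs]

    for name, values, outcomes in pairs:
        if len(values) != len(outcomes):
            raise ValueError(
                f"{name!r}: {len(values)} probabilities vs {len(outcomes)} outcomes"
            )
    return True


# Illustrative data: two models scored on one shared population.
example_probs = {"Model A": [0.1, 0.8, 0.4], "Model B": [0.3, 0.6, 0.2]}
example_reals = {"Population": [0, 1, 0]}
print(check_inputs(example_probs, example_reals))
```

Because the helper only iterates over values, it should work with plain lists as well as NumPy arrays.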
17
25
18
-
`rtichoke` expects data in a specific format. You'll need two main components:
26
+
Let's look at the three main use cases.
19
27
20
-
* **Probabilities (`probs`)**: A dictionary where keys are model names and values are NumPy arrays of predicted probabilities.
21
-
* **Real Outcomes (`reals`)**: A NumPy array containing the true binary outcomes (0 or 1).
28
+
### Use Case 1: Single Model
22
29
23
-
Let's create some sample data for two different models:
30
+
This is the simplest case, where you want to evaluate the performance of a single predictive model.
31
+
32
+
For this, you provide `probs` with a single entry for your model and `reals` with a single entry for the corresponding outcomes.
# In an interactive environment (like a Jupyter notebook),
46
+
# this will display the plot.
47
+
fig.show()
40
48
```
41
49
42
-
## 3. Create a Decision Curve
50
+
### Use Case 2: Model Comparison
51
+
52
+
Often, you want to compare the performance of several different models on the *same* population.
43
53
44
-
Now that we have our data, we can create a decision curve. This is a simple one-liner with `rtichoke`:
54
+
For this, you provide `probs` with an entry for each model you want to compare. `reals` will still have a single entry, since the outcome data is the same for all models.
# Create a precision-recall curve to compare the models
67
+
fig = rk.create_precision_recall_curve(
68
+
probs=probs_comparison,
69
+
reals=reals_comparison,
50
70
)
71
+
72
+
fig.show()
51
73
```
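
To make concrete what a precision-recall comparison measures, here is a small standard-library sketch that computes precision and recall for two models at a single threshold against one shared set of outcomes. The data, the names `precision_recall_at`, `reals_shared`, and `probs_by_model`, and the `>=` threshold convention are all assumptions made for illustration; this is not how `rtichoke` computes its curves internally.

```python
def precision_recall_at(probs, reals, threshold):
    """Precision and recall when predicting positive for p >= threshold."""
    predicted = [p >= threshold for p in probs]
    tp = sum(1 for pred, y in zip(predicted, reals) if pred and y == 1)
    fp = sum(1 for pred, y in zip(predicted, reals) if pred and y == 0)
    fn = sum(1 for pred, y in zip(predicted, reals) if not pred and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Illustrative data (not from the tutorial): two models, one shared population.
reals_shared = [1, 0, 1, 1, 0, 0, 1, 0]
probs_by_model = {
    "Model A": [0.9, 0.2, 0.8, 0.6, 0.3, 0.1, 0.7, 0.4],
    "Model B": [0.6, 0.7, 0.4, 0.9, 0.5, 0.2, 0.3, 0.8],
}

results = {
    name: precision_recall_at(p, reals_shared, threshold=0.5)
    for name, p in probs_by_model.items()
}
for name, (precision, recall) in results.items():
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}")
```

A precision-recall curve repeats this calculation across many thresholds, which is why a single shared `reals` entry is enough when every model is scored on the same population.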
52
74
53
-
## 4. Show the Plot
75
+
### Use Case 3: Several Populations
54
76
55
-
Finally, let's display the plot. Since `rtichoke` uses Plotly under the hood, you can show the figure just like any other Plotly object.
77
+
This is useful when you want to evaluate a single model's performance across different populations. A common example is comparing performance on a training set versus a testing set to check for overfitting.
78
+
79
+
For this, you provide `probs` with an entry for each population and `reals` with a corresponding entry for each population's outcomes.
56
80
57
81
```python
58
-
# To display the plot in an interactive environment (like a Jupyter notebook)
82
+
# Generate sample data for train and test sets
83
+
probs_train = np.random.rand(100)
84
+
reals_train = (probs_train > 0.5).astype(int)
85
+
86
+
probs_test = np.random.rand(80)
87
+
reals_test = (probs_test > 0.4).astype(int)  # A slightly different relationship
88
+
89
+
probs_populations = {
90
+
"Train": probs_train,
91
+
"Test": probs_test
92
+
}
93
+
reals_populations = {
94
+
"Train": reals_train,
95
+
"Test": reals_test
96
+
}
97
+
98
+
# Create a calibration curve to compare the model's performance
99
+
# on the two populations.
100
+
fig = rk.create_calibration_curve(
101
+
probs=probs_populations,
102
+
reals=reals_populations,
103
+
)
104
+
59
105
fig.show()
60
106
```
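
The sample outcomes above are a deterministic threshold of the probabilities, so the observed event rate jumps from 0 to 1 at the cutoff. As a rough sketch of what a calibration comparison summarizes, the code below bins predictions and compares the mean predicted probability with the observed event rate in each bin. The helper `calibration_table`, the equal-width binning, and the Bernoulli simulation are assumptions for illustration and may differ from what `rtichoke` does internally.

```python
import random


def calibration_table(probs, reals, n_bins=5):
    """Per-bin (mean predicted probability, observed event rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, reals):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        table.append((mean_predicted, observed_rate, len(members)))
    return table


rng = random.Random(42)  # local generator, for reproducibility
probs_sim = [rng.random() for _ in range(1000)]
# Assumption: outcomes are drawn as Bernoulli(p), unlike the threshold rule
# in the tutorial, so the simulated model is well calibrated by construction.
reals_sim = [1 if rng.random() < p else 0 for p in probs_sim]

sim_table = calibration_table(probs_sim, reals_sim)
for mean_predicted, observed_rate, count in sim_table:
    print(f"predicted={mean_predicted:.2f}  observed={observed_rate:.2f}  n={count}")
```

For a well-calibrated model the two columns should track each other closely. Running the same table separately on training and test data is one way to read the per-population comparison numerically.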
61
107
62
-
And that's it! You've created your first decision curve with `rtichoke`. From here, you can explore the other curve types and options that the library has to offer.
108
+
And that's it! You've now seen how to create three of the most common evaluation plots with `rtichoke`. From here, you can explore the other curve types and options that the library has to offer in the [API Reference](../reference/index.qmd).