Commit 80e1c0e

docs: Update README and "Getting Started" tutorial
Updates the project's documentation to be more user-friendly for new users.

- The main `README.md` has been updated with installation instructions, a clearer "Getting Started" section, and links to the blog and official documentation. The code example has been corrected to use the proper dictionary format for the `reals` parameter.
- The "Getting Started" tutorial (`docs/tutorials/getting_started.qmd`) has been restructured to clearly explain and provide examples for the three main use cases: single model evaluation, model comparison, and population comparison. This new structure is inspired by the documentation for the R version of `rtichoke`.

1 parent 11edff9 commit 80e1c0e

2 files changed: +112 -34 lines changed


README.md

Lines changed: 36 additions & 4 deletions
@@ -7,7 +7,41 @@
 * **Gains and Lift Charts**
 * **Decision Curves**

-The library is designed to be easy to use, while still offering a high degree of control over the final plots.
+The library is designed to be easy to use, while still offering a high degree of control over the final plots. For some reproducible examples please visit the [rtichoke blog](https://uriahf.github.io/rtichoke-py/blog.html)!
+
+## Installation
+
+You can install `rtichoke` from PyPI:
+
+```bash
+pip install rtichoke
+```
+
+## Getting Started
+
+To use `rtichoke`, you'll need two main inputs:
+
+* `probs`: A dictionary containing your model's predicted probabilities.
+* `reals`: A dictionary of the true binary outcomes.
+
+Here's a quick example of how to create a ROC curve for a single model:
+
+```python
+import numpy as np
+import rtichoke as rk
+
+# Sample data
+probs = {'My Model': np.random.rand(100)}
+reals = {'My Population': np.random.randint(0, 2, 100)}
+
+# Create the ROC curve
+fig = rk.create_roc_curve(
+    probs=probs,
+    reals=reals
+)
+
+fig.show()
+```

 ## Key Features

@@ -18,6 +52,4 @@ The library is designed to be easy to use, while still offering a high degree of

 ## Documentation

-For a complete guide to the library, including a "Getting Started" tutorial and a full API reference, please see the **[official documentation](https://your-documentation-url.com)**.
-
-*(Note: The documentation URL will need to be updated once the website is deployed.)*
+For a complete guide to the library, including a "Getting Started" tutorial and a full API reference, please see the **[official documentation](https://uriahf.github.io/rtichoke-py/)**.
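
The README change describes `probs` and `reals` as dictionaries of arrays. A minimal sketch of a pre-flight check for that structure, using only NumPy, might look like the following. `check_inputs` is a hypothetical helper written for illustration and is not part of `rtichoke`; the two matching rules it enforces (one shared `reals` entry, or entries matched by key) are assumptions drawn from the tutorial's three use cases, not from the library's documented behavior.

```python
import numpy as np


def check_inputs(probs, reals):
    """Hypothetical helper (not part of rtichoke): sanity-check probs/reals dicts."""
    for name, p in probs.items():
        p = np.asarray(p, dtype=float)
        if p.min() < 0.0 or p.max() > 1.0:
            raise ValueError(f"probs[{name!r}] has values outside [0, 1]")
    for name, r in reals.items():
        r = np.asarray(r)
        if not np.isin(r, (0, 1)).all():
            raise ValueError(f"reals[{name!r}] must contain only 0 and 1")
    if len(reals) == 1:
        # Assumed convention: one outcome vector shared by every probs entry.
        n = len(next(iter(reals.values())))
        lengths_ok = all(len(p) == n for p in probs.values())
    else:
        # Assumed convention: probs and reals entries are matched by key.
        lengths_ok = probs.keys() == reals.keys() and all(
            len(probs[k]) == len(reals[k]) for k in probs
        )
    if not lengths_ok:
        raise ValueError("probs and reals entries do not line up")


rng = np.random.default_rng(42)
probs = {"My Model": rng.random(100)}
reals = {"My Population": rng.integers(0, 2, 100)}
check_inputs(probs, reals)  # raises ValueError if the structure is off
```

Running a check like this before calling a plotting function gives a readable error at the point where the data is built, rather than a failure deeper inside the plotting code.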

docs/tutorials/getting_started.qmd

Lines changed: 76 additions & 30 deletions
@@ -1,8 +1,8 @@
 ---
-title: "Getting Started with Rtichoke"
+title: "Getting Started with rtichoke"
 ---

-This tutorial provides a basic introduction to the `rtichoke` library. We'll walk through the process of preparing data, creating a decision curve, and visualizing the results.
+This tutorial provides an introduction to the `rtichoke` library, showing how to visualize model performance for different scenarios.

 ## 1. Import Libraries

@@ -11,52 +11,98 @@ First, let's import the necessary libraries. We'll need `numpy` for data manipul
 ```python
 import numpy as np
 import rtichoke as rk
+
+# For reproducibility
+np.random.seed(42)
 ```

-## 2. Prepare Your Data
+## 2. Understanding the Inputs
+
+`rtichoke` expects two main inputs for creating performance curves:
+
+* **`probs` (Probabilities)**: A dictionary where keys are model or population names and values are lists or NumPy arrays of predicted probabilities.
+* **`reals` (Outcomes)**: A dictionary where keys are population names and values are lists or NumPy arrays of the true binary outcomes (0 or 1).

-`rtichoke` expects data in a specific format. You'll need two main components:
+Let's look at the three main use cases.

-* **Probabilities (`probs`)**: A dictionary where keys are model names and values are NumPy arrays of predicted probabilities.
-* **Real Outcomes (`reals`)**: A NumPy array containing the true binary outcomes (0 or 1).
+### Use Case 1: Single Model

-Let's create some sample data for two different models:
+This is the simplest case, where you want to evaluate the performance of a single predictive model.
+
+For this, you provide `probs` with a single entry for your model and `reals` with a single entry for the corresponding outcomes.

 ```python
-# Sample data from the dcurves_example.py script
-probs_dict = {
-    "Marker": np.array([
-        0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5,
-        0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
-    ]),
-    "Marker2": np.array([
-        0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5,
-        0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
-    ])
-}
-reals = np.array([
-    1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1
-])
+# Generate sample data for one model
+probs_single = {"Good Model": np.random.rand(100)}
+reals_single = {"Population": np.random.randint(0, 2, 100)}
+
+# Create a ROC curve
+fig = rk.create_roc_curve(
+    probs=probs_single,
+    reals=reals_single,
+)
+
+# In an interactive environment (like a Jupyter notebook),
+# this will display the plot.
+fig.show()
 ```

-## 3. Create a Decision Curve
+### Use Case 2: Models Comparison
+
+Often, you want to compare the performance of several different models on the *same* population.

-Now that we have our data, we can create a decision curve. This is a simple one-liner with `rtichoke`:
+For this, you provide `probs` with an entry for each model you want to compare. `reals` will still have a single entry, since the outcome data is the same for all models.

 ```python
-fig = rk.create_decision_curve(
-    probs=probs_dict,
-    reals=reals,
+# Generate sample data for three models
+probs_comparison = {
+    "Good Model": np.random.rand(100) + 0.1, # Slightly better
+    "Bad Model": np.random.rand(100),
+    "Random Guess": np.linspace(0, 1, 100)
+}
+reals_comparison = {"Population": np.random.randint(0, 2, 100)}
+
+
+# Create a precision-recall curve to compare the models
+fig = rk.create_precision_recall_curve(
+    probs=probs_comparison,
+    reals=reals_comparison,
 )
+
+fig.show()
 ```

-## 4. Show the Plot
+### Use Case 3: Several Populations

-Finally, let's display the plot. Since `rtichoke` uses Plotly under the hood, you can show the figure just like any other Plotly object.
+This is useful when you want to evaluate a single model's performance across different populations. A common example is comparing performance on a training set versus a testing set to check for overfitting.
+
+For this, you provide `probs` with an entry for each population and `reals` with a corresponding entry for each population's outcomes.

 ```python
-# To display the plot in an interactive environment (like a Jupyter notebook)
+# Generate sample data for train and test sets
+probs_train = np.random.rand(100)
+reals_train = (probs_train > 0.5).astype(int)
+
+probs_test = np.random.rand(80)
+reals_test = (probs_test > 0.4).astype(int) # A slightly different relationship
+
+probs_populations = {
+    "Train": probs_train,
+    "Test": probs_test
+}
+reals_populations = {
+    "Train": reals_train,
+    "Test": reals_test
+}
+
+# Create a calibration curve to compare the model's performance
+# on the two populations.
+fig = rk.create_calibration_curve(
+    probs=probs_populations,
+    reals=reals_populations,
+)
+
 fig.show()
 ```

-And that's it! You've created your first decision curve with `rtichoke`. From here, you can explore the other curve types and options that the library has to offer.
+And that's it! You've now seen how to create three of the most common evaluation plots with `rtichoke`. From here, you can explore the other curve types and options that the library has to offer in the [API Reference](../reference/index.qmd).
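
In the tutorial's Use Cases 1 and 2, scores and outcomes are drawn independently, so every curve should sit near chance level, and `np.random.rand(100) + 0.1` can produce values above 1. A minimal sketch of toy data in which the "good" model really is more informative, assuming that is the intent: draw a latent risk, sample outcomes from it, and derive each model's score by adding noise and clipping to [0, 1]. The names `true_risk`, `noisy_score` and `rank_auc` are illustrative and not part of `rtichoke`.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Latent event probability for each observation (illustrative assumption).
true_risk = rng.random(n)
# Outcomes are Bernoulli draws from the latent risk, so scores can carry signal.
outcomes = (rng.random(n) < true_risk).astype(int)


def noisy_score(risk, noise_sd):
    """Add Gaussian noise to the latent risk and clip back into [0, 1]."""
    return np.clip(risk + rng.normal(0.0, noise_sd, risk.shape), 0.0, 1.0)


probs_comparison = {
    "Good Model": noisy_score(true_risk, 0.1),   # little noise
    "Bad Model": noisy_score(true_risk, 0.6),    # heavy noise
    "Random Guess": rng.random(n),               # unrelated to outcomes
}
reals_comparison = {"Population": outcomes}


def rank_auc(scores, y):
    """AUC as P(random positive outscores random negative), ties counted half."""
    pos, neg = scores[y == 1], scores[y == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties


for name, p in probs_comparison.items():
    print(name, round(rank_auc(p, outcomes), 3))
```

The resulting `probs_comparison` and `reals_comparison` have the same shape as the dictionaries in Use Case 2, so they can be passed to `rk.create_precision_recall_curve` in the same way.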
