
Commit 5818f61

docs: Update README and "Getting Started" tutorial
Updates the project's documentation to be more user-friendly for new users.

- The main `README.md` has been updated with installation instructions, a clearer "Getting Started" section, and links to the blog and official documentation.
- The "Getting Started" tutorial (`docs/tutorials/getting_started.qmd`) has been restructured to clearly explain and provide examples for the three main use cases: single model evaluation, model comparison, and population comparison.
- All code examples in both the README and the tutorial now use more realistic and intuitive sample data where model predictions are clearly correlated with outcomes, making the visualizations more meaningful.
1 parent 80e1c0e commit 5818f61

File tree

2 files changed: +63 -21 lines changed


README.md

Lines changed: 16 additions & 3 deletions
@@ -30,9 +30,22 @@ Here's a quick example of how to create a ROC curve for a single model:
 import numpy as np
 import rtichoke as rk
 
-# Sample data
-probs = {'My Model': np.random.rand(100)}
-reals = {'My Population': np.random.randint(0, 2, 100)}
+# For reproducibility
+np.random.seed(42)
+
+# Generate more realistic sample data for a "good" model
+# Probabilities for the positive class are generally higher
+probs_positive_class = np.random.rand(50) * 0.5 + 0.5  # High probabilities (0.5 to 1.0)
+probs_negative_class = np.random.rand(50) * 0.5  # Low probabilities (0.0 to 0.5)
+
+# Combine and shuffle the data
+probs_combined = np.concatenate([probs_positive_class, probs_negative_class])
+reals_combined = np.concatenate([np.ones(50), np.zeros(50)])
+
+shuffle_index = np.random.permutation(100)
+probs = {'My Model': probs_combined[shuffle_index]}
+reals = {'My Population': reals_combined[shuffle_index]}
 
 # Create the ROC curve
 fig = rk.create_roc_curve(
docs/tutorials/getting_started.qmd

Lines changed: 47 additions & 18 deletions
@@ -32,9 +32,15 @@ This is the simplest case, where you want to evaluate the performance of a singl
 For this, you provide `probs` with a single entry for your model and `reals` with a single entry for the corresponding outcomes.
 
 ```python
-# Generate sample data for one model
-probs_single = {"Good Model": np.random.rand(100)}
-reals_single = {"Population": np.random.randint(0, 2, 100)}
+# Generate realistic sample data for a "good" model
+probs_positive_class = np.random.rand(50) * 0.5 + 0.5
+probs_negative_class = np.random.rand(50) * 0.5
+probs_combined = np.concatenate([probs_positive_class, probs_negative_class])
+reals_combined = np.concatenate([np.ones(50), np.zeros(50)])
+shuffle_index = np.random.permutation(100)
+
+probs_single = {"Good Model": probs_combined[shuffle_index]}
+reals_single = {"Population": reals_combined[shuffle_index]}
 
 # Create a ROC curve
 fig = rk.create_roc_curve(
@@ -54,13 +60,26 @@ Often, you want to compare the performance of several different models on the *s
 For this, you provide `probs` with an entry for each model you want to compare. `reals` will still have a single entry, since the outcome data is the same for all models.
 
 ```python
-# Generate sample data for three models
+# Generate data for a "Good Model", a "Bad Model", and a "Random Guess"
+# The "Good Model" has a clearer separation of probabilities.
+good_probs_pos = np.random.rand(50) * 0.4 + 0.6  # 0.6 to 1.0
+good_probs_neg = np.random.rand(50) * 0.4  # 0.0 to 0.4
+good_probs = np.concatenate([good_probs_pos, good_probs_neg])
+
+# The "Bad Model" has more overlap.
+bad_probs_pos = np.random.rand(50) * 0.5 + 0.4  # 0.4 to 0.9
+bad_probs_neg = np.random.rand(50) * 0.5 + 0.1  # 0.1 to 0.6
+bad_probs = np.concatenate([bad_probs_pos, bad_probs_neg])
+
+reals_comparison_data = np.concatenate([np.ones(50), np.zeros(50)])
+shuffle_index_comp = np.random.permutation(100)
+
 probs_comparison = {
-    "Good Model": np.random.rand(100) + 0.1,  # Slightly better
-    "Bad Model": np.random.rand(100),
-    "Random Guess": np.linspace(0, 1, 100)
+    "Good Model": good_probs[shuffle_index_comp],
+    "Bad Model": bad_probs[shuffle_index_comp],
+    "Random Guess": np.random.rand(100)
 }
-reals_comparison = {"Population": np.random.randint(0, 2, 100)}
+reals_comparison = {"Population": reals_comparison_data[shuffle_index_comp]}
 
 
 # Create a precision-recall curve to compare the models
@@ -79,20 +98,30 @@ This is useful when you want to evaluate a single model's performance across dif
 For this, you provide `probs` with an entry for each population and `reals` with a corresponding entry for each population's outcomes.
 
 ```python
-# Generate sample data for train and test sets
-probs_train = np.random.rand(100)
-reals_train = (probs_train > 0.5).astype(int)
-
-probs_test = np.random.rand(80)
-reals_test = (probs_test > 0.4).astype(int)  # A slightly different relationship
+# Generate sample data for a train and test set.
+# Let's assume the model is slightly overfit, performing better on the train set.
+
+# Train set: clear separation
+train_probs_pos = np.random.rand(50) * 0.4 + 0.6
+train_probs_neg = np.random.rand(50) * 0.4
+train_probs = np.concatenate([train_probs_pos, train_probs_neg])
+train_reals = np.concatenate([np.ones(50), np.zeros(50)])
+train_shuffle = np.random.permutation(100)
+
+# Test set: more overlap
+test_probs_pos = np.random.rand(40) * 0.5 + 0.4
+test_probs_neg = np.random.rand(40) * 0.5 + 0.1
+test_probs = np.concatenate([test_probs_pos, test_probs_neg])
+test_reals = np.concatenate([np.ones(40), np.zeros(40)])
+test_shuffle = np.random.permutation(80)
 
 probs_populations = {
-    "Train": probs_train,
-    "Test": probs_test
+    "Train": train_probs[train_shuffle],
+    "Test": test_probs[test_shuffle]
 }
 reals_populations = {
-    "Train": reals_train,
-    "Test": reals_test
+    "Train": train_reals[train_shuffle],
+    "Test": test_reals[test_shuffle]
 }
 
 # Create a calibration curve to compare the model's performance
