Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc

SebastienMelo · SebastienMelo · commit 222a3c290d9f · 2025-05-14T11:24:18.000+02:00
diff --git a/python_scripts/cross_validation_grouping.py b/python_scripts/cross_validation_grouping.py
@@ -110,9 +110,10 @@
 print(digits.DESCR)
 
 # %% [markdown]
-# If we read carefully, 13 writers wrote the digits of our dataset, accounting
-# for a total amount of 1797 samples. Thus, a writer wrote several times the
-# same numbers. Let's suppose that the writer samples are grouped. Subsequently,
+# If we read carefully, `load_digits` loads a copy of the **test set** of the
+# UCI ML hand-written digits dataset, which consists of 1797 images by
+# **13 different writers**. Thus, each writer wrote the same digits several
+# times. Let's suppose the dataset is ordered by writer. Consequently,
 # not shuffling the data will keep all writer samples together either in the
 # training or the testing sets. Mixing the data will break this structure, and
 # therefore digits written by the same writer will be available in both the