
Commit 194c9f4

reorder sections
1 parent 2141507 commit 194c9f4


1 file changed

+44 -43 lines changed

docs/hub/datasets-viewer-sql-console.md

````diff
@@ -26,49 +26,6 @@ You can also use the DuckDB CLI to query the dataset via the `hf://` protocol. S
 
 ## Examples
 
-### Leakage Detection
-
-Leakage detection is the process of identifying whether data in a dataset is present in multiple splits, for example, whether the test set is present in the training set.
-
-<div class="flex justify-center">
-<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sql_console/leakage-detection.png"/>
-<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sql_console/leakage-detection-dark.png"/>
-</div>
-
-<p class="text-sm text-center italic">
-Learn more about leakage detection <a href="https://huggingface.co/blog/lbourdois/lle">here</a>.
-</p>
-
-```sql
-WITH
-overlapping_rows AS (
-SELECT COALESCE(
-(SELECT COUNT(*) AS overlap_count
-FROM train
-INTERSECT
-SELECT COUNT(*) AS overlap_count
-FROM test),
-0
-) AS overlap_count
-),
-total_unique_rows AS (
-SELECT COUNT(*) AS total_count
-FROM (
-SELECT * FROM train
-UNION
-SELECT * FROM test
-) combined
-)
-SELECT
-overlap_count,
-total_count,
-CASE
-WHEN total_count > 0 THEN (overlap_count * 100.0 / total_count)
-ELSE 0
-END AS overlap_percentage
-FROM overlapping_rows, total_unique_rows;
-```
-
 ### Filtering
 
 The SQL Console makes filtering datasets really easy. For example, if you want to filter the `SkunkworksAI/reasoning-0.01` dataset for instructions and responses with a reasoning length of at least 10, you can use the following query:
@@ -128,3 +85,47 @@ FROM train
 WHERE regexp_matches(instruction, '```[a-z]*\n')
 limit 100
 ```
+
+
+### Leakage Detection
+
+Leakage detection is the process of identifying whether data in a dataset is present in multiple splits, for example, whether the test set is present in the training set.
+
+<div class="flex justify-center">
+<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sql_console/leakage-detection.png"/>
+<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sql_console/leakage-detection-dark.png"/>
+</div>
+
+<p class="text-sm text-center italic">
+Learn more about leakage detection <a href="https://huggingface.co/blog/lbourdois/lle">here</a>.
+</p>
+
+```sql
+WITH
+overlapping_rows AS (
+SELECT COALESCE(
+(SELECT COUNT(*) AS overlap_count
+FROM train
+INTERSECT
+SELECT COUNT(*) AS overlap_count
+FROM test),
+0
+) AS overlap_count
+),
+total_unique_rows AS (
+SELECT COUNT(*) AS total_count
+FROM (
+SELECT * FROM train
+UNION
+SELECT * FROM test
+) combined
+)
+SELECT
+overlap_count,
+total_count,
+CASE
+WHEN total_count > 0 THEN (overlap_count * 100.0 / total_count)
+ELSE 0
+END AS overlap_percentage
+FROM overlapping_rows, total_unique_rows;
+```
````
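The overlap measure behind the Leakage Detection section (rows shared between splits, as a percentage of all distinct rows) can be sketched with Python's built-in `sqlite3` module instead of DuckDB. This is a minimal sketch under stated assumptions: the `train` and `test` tables, their `text` and `label` columns, the toy rows, and the `split_overlap` helper are all hypothetical, and both splits are assumed to share one schema. It intersects the rows themselves (`INTERSECT`) to count rows present in both splits, then divides by the number of distinct rows across both (`UNION`).

```python
import sqlite3


def split_overlap(conn: sqlite3.Connection) -> tuple[int, int, float]:
    """Hypothetical helper: measure row overlap between the train and test tables.

    Assumes both tables exist in `conn` and share the same columns.
    Returns (overlap_count, total_count, overlap_percentage).
    """
    # Rows present in both splits; INTERSECT compares whole rows and deduplicates.
    overlap_count = conn.execute(
        "SELECT COUNT(*) FROM (SELECT * FROM train INTERSECT SELECT * FROM test)"
    ).fetchone()[0]
    # Distinct rows across both splits; UNION (without ALL) also deduplicates.
    total_count = conn.execute(
        "SELECT COUNT(*) FROM (SELECT * FROM train UNION SELECT * FROM test)"
    ).fetchone()[0]
    # Guard against division by zero when both splits are empty.
    overlap_percentage = overlap_count * 100.0 / total_count if total_count > 0 else 0.0
    return overlap_count, total_count, overlap_percentage


# Toy data (hypothetical): the row ("c", 0) appears in both splits.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (text TEXT, label INTEGER)")
conn.execute("CREATE TABLE test (text TEXT, label INTEGER)")
conn.executemany("INSERT INTO train VALUES (?, ?)", [("a", 0), ("b", 1), ("c", 0)])
conn.executemany("INSERT INTO test VALUES (?, ?)", [("c", 0), ("d", 1)])

print(split_overlap(conn))  # (1, 4, 25.0)
```

Because `INTERSECT` and `UNION` both drop duplicates, a row repeated within one split is counted once; catching near-duplicates (for example, rows that differ only in whitespace) would need a normalized comparison instead of exact row equality.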

0 commit comments
