Skip to content

Commit 2f6be59

Browse files
committed
update older images
1 parent 67f55ba commit 2f6be59

File tree

2 files changed

+20
-24
lines changed

2 files changed

+20
-24
lines changed

docs/hub/_toctree.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -206,7 +206,7 @@
206206
- local: datasets-webdataset
207207
title: WebDataset
208208
- local: datasets-viewer
209-
title: Dataset Viewer
209+
title: Data Studio
210210
sections:
211211
- local: datasets-viewer-configure
212212
title: Configure the Dataset Viewer

docs/hub/datasets-viewer-sql-console.md

Lines changed: 19 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -3,8 +3,8 @@
33
You can run SQL queries on the dataset in the browser using the SQL Console. The SQL Console is powered by [DuckDB](https://duckdb.org/) WASM and runs entirely in the browser. You can access the SQL Console from the Data Studio.
44

55
<div class="flex justify-center">
6-
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/sql-ai.png" width=600/>
7-
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/sql-ai-dark.png" width=600/>
6+
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/sql-ai.png"/>
7+
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/sql-ai-dark.png"/>
88
</div>
99

1010
<p class="text-sm text-center italic">
@@ -32,59 +32,55 @@ You can also use the DuckDB locally through the CLI to query the dataset via the
3232
The SQL Console makes filtering datasets really easy. For example, if you want to filter the `SkunkworksAI/reasoning-0.01` dataset for instructions and responses with a reasoning length of at least 10, you can use the following query:
3333

3434
<div class="flex justify-center">
35-
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sql_console/bar-struct-length.png"/>
36-
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sql_console/bar-struct-length-dark.png"/>
35+
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datastudio-filtering.png"/>
36+
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datastudio-filtering-dark.png"/>
3737
</div>
3838

39-
In the query, we can use the `len` function to get the length of the `reasoning_chains` column and the `bar` function to create a bar chart of the reasoning lengths.
40-
39+
Here's the SQL to sort by length of the reasoning
4140
```sql
42-
SELECT len(reasoning_chains) AS reason_len, bar(reason_len, 0, 100), *
41+
SELECT *
4342
FROM train
44-
WHERE reason_len > 10
45-
ORDER BY reason_len DESC
43+
WHERE LENGTH(reasoning_chains) > 10;
4644
```
4745

48-
The [bar](https://duckdb.org/docs/sql/functions/char.html#barx-min-max-width) function is a neat built-in DuckDB function that creates a bar chart of the reasoning lengths.
49-
5046
### Histogram
5147

5248
Many dataset authors choose to include statistics about the distribution of the data in the dataset. Using the DuckDB `histogram` function, we can plot a histogram of a column's values.
5349

54-
For example, to plot a histogram of the `reason_len` column in the `SkunkworksAI/reasoning-0.01` dataset, you can use the following query:
50+
For example, to plot a histogram of the `Rating` column in the [Lichess/chess-puzzles](https://huggingface.co/datasets/Lichess/chess-puzzles) dataset, you can use the following query:
5551

5652
<div class="flex justify-center">
57-
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sql_console/histogram-simple.png"/>
58-
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sql_console/histogram-simple-dark.png"/>
53+
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datastudio-histogram.png"/>
54+
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datastudio-histogram-dark.png"/>
5955
</div>
6056
<p class="text-sm text-center italic">
6157
Learn more about the `histogram` function and parameters <a href="https://cfahlgren1-sql-snippets.hf.space/histogram" target="_blank" rel="noopener noreferrer">here</a>.
6258
</p>
6359

6460
```sql
65-
FROM histogram(train, len(reasoning_chains))
61+
from histogram(train, Rating)
6662
```
6763

6864
### Regex Matching
6965

7066
One of the most powerful features of DuckDB is the deep support for regular expressions. You can use the `regexp` function to match patterns in your data.
7167

72-
Using the [regexp_matches](https://duckdb.org/docs/sql/functions/char.html#regexp_matchesstring-pattern) function, we can filter the `SkunkworksAI/reasoning-0.01` dataset for instructions that contain markdown code blocks.
68+
Using the [regexp_matches](https://duckdb.org/docs/sql/functions/char.html#regexp_matchesstring-pattern) function, we can filter the [GeneralReasoning/GeneralThought-195k](https://huggingface.co/datasets/GeneralReasoning/GeneralThought-195K) dataset for instructions that contain markdown code blocks.
7369

7470
<div class="flex justify-center">
75-
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sql_console/regex-matching-markdown-code.png"/>
76-
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sql_console/regex-matching-markdown-code-dark.png"/>
71+
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datastudio-regex.png"/>
72+
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datastudio-regex-dark.png"/>
7773
</div>
7874
<p class="text-sm text-center italic">
7975
Learn more about the DuckDB regex functions <a href="https://duckdb.org/docs/sql/functions/regular_expressions.html" target="_blank" rel="noopener noreferrer">here</a>.
8076
</p>
8177

8278

8379
```sql
84-
SELECT *
80+
SELECT *
8581
FROM train
86-
WHERE regexp_matches(instruction, '```[a-z]*\n')
87-
limit 100
82+
WHERE regexp_matches(model_answer, '```')
83+
LIMIT 10;
8884
```
8985

9086

@@ -93,8 +89,8 @@ limit 100
9389
Leakage detection is the process of identifying whether data in a dataset is present in multiple splits, for example, whether the test set is present in the training set.
9490

9591
<div class="flex justify-center">
96-
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sql_console/leakage-detection.png"/>
97-
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sql_console/leakage-detection-dark.png"/>
92+
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datastudio-leakage.png"/>
93+
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datastudio-leakage-dark.png"/>
9894
</div>
9995

10096
<p class="text-sm text-center italic">

0 commit comments

Comments
 (0)