
Commit 95d7ada

Link Spark docs to Spark JupyterLab Space (#1403)
* link to spark jupyterlab space
* Update datasets-spark.md
1 parent 1517a74 commit 95d7ada

1 file changed (+9 -1 lines)


docs/hub/datasets-spark.md

Lines changed: 9 additions & 1 deletion
@@ -211,7 +211,7 @@ To filter the dataset and only keep dialogues in Chinese:
 ```python
 >>> criteria = [("langdetect", "=", "zh-cn")]
 >>> df_chinese_only = read_parquet("hf://datasets/BAAI/Infinity-Instruct/7M/*.parquet", filters=criteria)
->>> df_chinese_only
+>>> df_chinese_only.show()
 +---+----------------------------+-----+----------+----------+
 | id|               conversations|label|langdetect|    source|
 +---+----------------------------+-----+----------+----------+
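
For context on the change above: evaluating a bare DataFrame expression only prints its schema repr, while `df.show()` actually runs the query and renders rows, which matches the ASCII table output in the docs. A minimal sketch of the same filter-and-show pattern in plain PySpark; the local Parquet path below is illustrative, while the docs themselves go through their `read_parquet` helper with an `hf://` URL:

```python
from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session.
spark = SparkSession.builder.appName("filter-example").getOrCreate()

# Illustrative local path; the docs read hf://datasets/... via their read_parquet helper.
df = spark.read.parquet("/tmp/infinity_instruct")

# Keep only the rows whose detected language is Chinese.
df_chinese_only = df.filter(df.langdetect == "zh-cn")

# A bare `df_chinese_only` only prints the schema repr; .show() executes the
# query and prints the first 20 rows as an ASCII table.
df_chinese_only.show()
```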
@@ -357,3 +357,11 @@ tmpmj97ab30.parquet: 100%|██████████| 71.3M/71.3M [00:02<00:
 <img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datasets-spark-infinity-instruct-chinese-only-min.png"/>
 <img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datasets-spark-infinity-instruct-chinese-only-dark-min.png"/>
 </div>
+
+## Run in JupyterLab on Hugging Face Spaces
+
+You can duplicate the [Spark on HF JupyterLab](https://huggingface.co/spaces/lhoestq/Spark-on-HF-JupyterLab) Space to get a Notebook with PySpark and those helper functions pre-installed.
+
+Click on "Duplicate Space", choose a name for your Space, select your hardware and you are ready:
+
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/spark-on-hf-jupyterlab-screenshot-min.png">
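
Beyond the "Duplicate Space" button described in the added section, the duplication can also be scripted. A hedged sketch, assuming a recent `huggingface_hub` release that exposes `duplicate_space` and a write token already configured via `huggingface-cli login`; the target Space name and hardware tier below are illustrative choices, not part of the commit:

```python
from huggingface_hub import duplicate_space

# Duplicate the Spark JupyterLab Space into your own namespace.
new_space = duplicate_space(
    from_id="lhoestq/Spark-on-HF-JupyterLab",
    to_id="my-username/spark-jupyterlab",  # hypothetical target name
    hardware="cpu-upgrade",                # pick whatever hardware tier you need
    private=True,
)
print(new_space)  # URL of the newly created Space
```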

0 commit comments