Commit 03359ed

Add 'Working Without File Access' section to LLM guide
- Add commands for LLMs to request when they don't have direct file access
- Includes tree/find commands for structure, du/ls for sizes, head for data preview
- Helps LLMs guide users on web interfaces (Claude, ChatGPT browser)
- Placed early in guide for immediate visibility
1 parent 89029b8 commit 03359ed

1 file changed: +37 -0 lines changed
docs/hub/datasets-upload-guide-llm.md

Lines changed: 37 additions & 0 deletions
@@ -17,6 +17,43 @@ Your goal is to help a user upload a dataset to the Hugging Face Hub. Ideally, t
| **Use appropriate Features** | When using the datasets library, specify correct feature types (e.g., Image(), Audio(), ClassLabel()) to ensure proper data handling and viewer functionality. This enables type-specific optimizations and previews. | Required (when using datasets library) |
| **Document non-standard datasets** | If conversion to hub-compatible formats is impossible and custom formats must be used, ensure repository limits are strictly followed and provide clear documentation on how to download and load the dataset. Include usage examples and any special requirements. | Required (when datasets library isn't compatible) |

## Working Without File Access

When you don't have direct access to the user's files (e.g., in a web interface such as Claude or ChatGPT in the browser), ask the user to run these commands and paste the output so you can understand their dataset:

**Dataset structure**:
```bash
# Show directory tree (install with: apt install tree or brew install tree)
tree -L 3 --filelimit 20

# Alternative without tree (the parentheses make -type f apply to every pattern):
find . -type f \( -name "*.csv" -o -name "*.json" -o -name "*.parquet" \) | head -20
```

**Check file sizes**:
```bash
# Total dataset size
du -sh .

# Individual file sizes
ls -lh data/
```
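
Because repository limits can apply to individual files as well as to the total size, it may also help to ask which files are largest. The following is a minimal sketch, not part of the original guide: `largest_files` is a hypothetical helper name, and sizes are reported as disk usage in KiB (via `du -k`), which can differ slightly from the byte count shown by `ls`.

```bash
# List the N largest files under a directory, largest first.
# largest_files is a hypothetical helper name chosen for this sketch.
#   $1: directory to scan (default: current directory)
#   $2: number of files to show (default: 10)
largest_files() {
  # du -k prints "<size in KiB> <path>" per file; sort numerically, descending.
  find "${1:-.}" -type f -exec du -k {} + | sort -rn | head -n "${2:-10}"
}

largest_files . 10
```

Sizes are kept in KiB rather than human-readable units so that a plain numeric sort works on both GNU and BSD systems.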

**Peek at data format**:
```bash
# First few lines of CSV/JSON
head -n 5 data/train.csv

# Check image folder structure
ls -la images/ | head -10
```
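
For CSV files, a compact list of column names plus a line count is often easier to paste into a chat than raw rows. The sketch below is an illustration under stated assumptions: `csv_overview` is a hypothetical helper, `data/train.csv` is the same example path used above, and the header is split on commas only, so quoted column names that contain commas will be split incorrectly.

```bash
# Print the column names of a CSV file, one per line, then its line count.
# csv_overview is a hypothetical helper name chosen for this sketch.
# Assumption: the first line is a header and no column name contains a comma.
csv_overview() {
  # Strip a possible Windows carriage return, then put each column on its own line.
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n'
  # wc -l counts newline characters; BSD wc pads with spaces, so remove them.
  printf 'lines (including header): %s\n' "$(wc -l < "$1" | tr -d ' ')"
}

# Only run the example if the sample path exists.
if [ -f data/train.csv ]; then
  csv_overview data/train.csv
fi
```

The line count is only an approximation of the row count when quoted fields contain embedded newlines.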

**Quick file count**:
```bash
# Count files of one type (here: JPEG images)
find . -type f -name "*.jpg" | wc -l
```
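
When the file types are not known in advance, a per-extension tally can stand in for several separate counts. This is a rough sketch rather than part of the original guide: `count_by_extension` is a hypothetical helper name, extensions are lowercased so `.JPG` and `.jpg` are counted together, and file names containing newlines are not handled.

```bash
# Count files per extension under a directory, most common first.
# count_by_extension is a hypothetical helper name chosen for this sketch.
#   $1: directory to scan (default: current directory)
count_by_extension() {
  find "${1:-.}" -type f -name '*.*' \
    | sed 's/.*\.//' \
    | tr '[:upper:]' '[:lower:]' \
    | sort \
    | uniq -c \
    | sort -rn
}

count_by_extension .
```

Files without a dot in their name are skipped, and hidden files such as `.gitattributes` are reported under the text after the dot.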

## Critical Constraints

**Storage Limits**: