6 changes: 6 additions & 0 deletions .claude-plugin/marketplace.json
@@ -73,6 +73,12 @@
"source": "./skills/huggingface-vision-trainer",
"skills": "./",
"description": "Train and fine-tune object detection models (RTDETRv2, YOLOS, DETR and others) and image classification models (timm and transformers models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3) using Transformers Trainer API on Hugging Face Jobs infrastructure or locally. Includes COCO dataset format support, Albumentations augmentation, mAP/mAR metrics, trackio tracking, hardware selection, and Hub persistence."
},
{
"name": "huggingface-spaces",
"source": "./skills/huggingface-spaces",
"skills": "./",
"description": "Find and call Hugging Face Spaces to generate AI artifacts (images, audio, 3D models, etc). Uses semantic search to discover Spaces, then calls their Gradio APIs."
}
]
}
1 change: 1 addition & 0 deletions README.md
@@ -97,6 +97,7 @@ This repository contains a few skills to get you started. You can also contribut
| `huggingface-llm-trainer` | Train or fine-tune language models using TRL on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes hardware selection, cost estimation, Trackio monitoring, and Hub persistence. | [SKILL.md](skills/huggingface-llm-trainer/SKILL.md) |
| `huggingface-paper-publisher` | Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles. | [SKILL.md](skills/huggingface-paper-publisher/SKILL.md) |
| `huggingface-papers` | Look up and read Hugging Face paper pages in markdown, and use the papers API for structured metadata like authors, linked models, datasets, Spaces, and media URLs when needed. | [SKILL.md](skills/huggingface-papers/SKILL.md) |
| `huggingface-spaces` | Find and call Hugging Face Spaces to generate AI artifacts (images, audio, 3D models, etc). Uses semantic search to discover Spaces, then calls their Gradio APIs. | [SKILL.md](skills/huggingface-spaces/SKILL.md) |
| `huggingface-tool-builder` | Build reusable scripts for Hugging Face Hub and API workflows. Useful for chaining API calls, enriching Hub metadata, or automating repeated tasks. | [SKILL.md](skills/huggingface-tool-builder/SKILL.md) |
| `huggingface-trackio` | Track and visualize ML training experiments with Trackio. Log metrics via Python API and retrieve them via CLI. Supports real-time dashboards synced to HF Spaces. | [SKILL.md](skills/huggingface-trackio/SKILL.md) |
| `huggingface-vision-trainer` | Train and fine-tune object detection models (RTDETRv2, YOLOS, DETR and others) and image classification models (timm and transformers models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3) using Transformers Trainer API on Hugging Face Jobs infrastructure or locally. Includes COCO dataset format support, Albumentations augmentation, mAP/mAR metrics, trackio tracking, hardware selection, and Hub persistence. | [SKILL.md](skills/huggingface-vision-trainer/SKILL.md) |
2 changes: 2 additions & 0 deletions agents/AGENTS.md
@@ -10,6 +10,7 @@ These skills are:
- huggingface-llm-trainer -> "skills/huggingface-llm-trainer/SKILL.md"
- huggingface-paper-publisher -> "skills/huggingface-paper-publisher/SKILL.md"
- huggingface-papers -> "skills/huggingface-papers/SKILL.md"
- huggingface-spaces -> "skills/huggingface-spaces/SKILL.md"
- huggingface-tool-builder -> "skills/huggingface-tool-builder/SKILL.md"
- huggingface-trackio -> "skills/huggingface-trackio/SKILL.md"
- huggingface-vision-trainer -> "skills/huggingface-vision-trainer/SKILL.md"
@@ -26,6 +27,7 @@ huggingface-gradio: `Build Gradio web UIs and demos in Python. Use when creating`
huggingface-llm-trainer: `Train or fine-tune language and vision models using TRL (Transformer Reinforcement Learning) or Unsloth with Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, model selection/leaderboards and model persistence. Use for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.`
huggingface-paper-publisher: `Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles.`
huggingface-papers: `Look up and read Hugging Face paper pages in markdown, and use the papers API for structured metadata such as authors, linked models/datasets/spaces, Github repo and project page. Use when the user shares a Hugging Face paper page URL, an arXiv URL or ID, or asks to summarize, explain, or analyze an AI research paper.`
huggingface-spaces: `Find and call Hugging Face Spaces to generate AI artifacts (images, audio, 3D models, etc). Uses semantic search to discover Spaces, then calls their Gradio APIs.`
huggingface-tool-builder: `Use this skill when the user wants to build tool/scripts or achieve a task where using data from the Hugging Face API would help. This is especially useful when chaining or combining API calls or the task will be repeated/automated. This Skill creates a reusable script to fetch, enrich or process data.`
huggingface-trackio: `Track and visualize ML training experiments with Trackio. Use when logging metrics during training (Python API), firing alerts for training diagnostics, or retrieving/analyzing logged metrics (CLI). Supports real-time dashboard visualization, alerts with webhooks, HF Space syncing, and JSON output for automation.`
huggingface-vision-trainer: `Trains and fine-tunes vision models for object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3 — plus any Transformers classifier), and SAM/SAM2 segmentation using Hugging Face Transformers on Hugging Face Jobs cloud GPUs. Covers COCO-format dataset preparation, Albumentations augmentation, mAP/mAR evaluation, accuracy metrics, SAM segmentation with bbox/point prompts, DiceCE loss, hardware selection, cost estimation, Trackio monitoring, and Hub persistence. Use when users mention training object detection, image classification, SAM, SAM2, segmentation, image matting, DETR, D-FINE, RT-DETR, ViT, timm, MobileNet, ResNet, bounding box models, or fine-tuning vision models on Hugging Face Jobs.`
96 changes: 96 additions & 0 deletions skills/huggingface-spaces/SKILL.md
@@ -0,0 +1,96 @@
---
name: huggingface-spaces
description: Find and call Hugging Face Spaces to generate AI artifacts (images, audio, 3D models, etc). Uses semantic search to discover Spaces, then calls their Gradio APIs.
user-invocable: true
allowed-tools: Bash WebFetch Read Write
argument-hint: <prompt describing what to generate>
---

# Hugging Face Spaces Tool

You have access to thousands of AI apps hosted on Hugging Face Spaces. Use them to generate artifacts like images, audio, 3D models, videos, text, and more.

## Authentication

Install the latest `hf` CLI and log in:

```bash
curl -LsSf https://hf.co/cli/install.sh | bash
hf auth login
```

Always include the user's HF token in API requests:
- **REST calls**: `Authorization: Bearer $(hf auth token)`
- **Python client**: `Client("space-url", hf_token=subprocess.check_output(["hf", "auth", "token"]).decode().strip())`
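
For example, a minimal Python sketch that reuses the stored token with the Gradio client (the Space ID is a placeholder):

```python
import subprocess

from gradio_client import Client  # pip install gradio_client

# Reuse the token written by `hf auth login`
hf_token = subprocess.check_output(["hf", "auth", "token"]).decode().strip()

# "owner/spacename" is a placeholder; a full .hf.space URL also works as the source
client = Client("owner/spacename", hf_token=hf_token)
```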

## Workflow

### Step 1: Find the right Space

Use the `hf` CLI to search for a Space matching the user's request:

```bash
hf spaces search --sdk gradio "<search query>"
```

- Always filter by `--sdk gradio` (only Gradio spaces have callable APIs)
- The output lists Space IDs sorted by relevance with descriptions
- Prefer spaces that are running and have high trending scores
- The space domain is derived from the `id`: `owner-spacename.hf.space` (replace `/` with `-`, lowercase)
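
For example, the domain derivation is a simple string transform (the Space ID below is illustrative):

```python
space_id = "mrfakename/Z-Image-Turbo"  # example ID taken from search output
domain = space_id.replace("/", "-").lower() + ".hf.space"
# -> "mrfakename-z-image-turbo.hf.space"
```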

### Step 2: Call the Space

Fetch the Space's `agents.md` and follow its instructions to call the Space:

```bash
curl https://huggingface.co/spaces/<owner>/<spacename>/agents.md
```

This returns a Markdown document with everything needed to call the Space: available endpoints, parameters, input/output types, and usage examples — purpose-built for agents like this one.
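
As a rough sketch, a call with the Python client usually looks like the following; the endpoint and parameter names here are hypothetical and must be replaced with the ones documented in the Space's `agents.md`:

```python
import subprocess

from gradio_client import Client

hf_token = subprocess.check_output(["hf", "auth", "token"]).decode().strip()
client = Client("owner/spacename", hf_token=hf_token)  # placeholder Space ID

client.view_api()  # optional: list endpoints if agents.md leaves anything unclear

result = client.predict(
    prompt="a sunset over the ocean",  # parameter name is hypothetical
    api_name="/generate",              # endpoint name is hypothetical
)
print(result)  # file outputs are typically returned as locally downloaded paths
```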

### Step 3: Handle the output

- **Files (images, audio, 3D models)**: Download from the returned URL and save locally
- **Open/play the result**: Use `open <file>` to view it or `afplay <file>` to play audio (both macOS)
- File URLs from Gradio look like: `https://<space>.hf.space/gradio_api/file=<path>`
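
If an endpoint returns a file URL instead of a local path, a sketch of an authenticated download (the URL below is a placeholder following the pattern above):

```python
import subprocess

import requests

token = subprocess.check_output(["hf", "auth", "token"]).decode().strip()
file_url = "https://owner-spacename.hf.space/gradio_api/file=/tmp/output.png"  # placeholder

resp = requests.get(file_url, headers={"Authorization": f"Bearer {token}"}, timeout=120)
resp.raise_for_status()
with open("output.png", "wb") as f:
    f.write(resp.content)
```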

### Step 4: Save results to a Hugging Face bucket

Persist generated artifacts to a Hugging Face bucket so they survive across sessions and are shareable:

```bash
# Create the bucket once (no-op if it already exists)
hf buckets create <namespace>/<bucket-name> --exist-ok

# Upload the generated file
hf buckets cp <local-file> hf://buckets/<namespace>/<bucket-name>/
```

- Default to a bucket named after the use case (e.g. `<user>/spaces-outputs`) unless the user specifies one
- Use subpaths to organize by Space or run: `hf://buckets/<user>/spaces-outputs/<space-name>/<timestamp>-<file>`
- Use `hf buckets sync ./local-dir hf://buckets/<user>/<bucket>` for batch uploads
- Print the `hf://` URI back to the user so they can retrieve it later with `hf buckets cp`

## Tips

- Read `agents.md` carefully — it often documents exact parameter names, accepted values, and example calls
- For the Python client, use `handle_file("/path/to/file")` or `handle_file("https://url")` for file/image inputs (see the sketch after this list)
- ZeroGPU spaces have usage quotas — if you get "GPU quota exceeded", wait or try another space
- Multi-step pipelines (e.g., image-to-3D) often require session state — use the Python client
- If a user provides a specific Space URL, skip the search step and use it directly
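
A sketch of a file input with `handle_file`; the Space ID, endpoint, and parameter name are hypothetical:

```python
import subprocess

from gradio_client import Client, handle_file

hf_token = subprocess.check_output(["hf", "auth", "token"]).decode().strip()
client = Client("owner/image-to-3d", hf_token=hf_token)  # placeholder Space ID

result = client.predict(
    image=handle_file("./photo.png"),  # local path; an https:// URL also works
    api_name="/generate",              # hypothetical endpoint taken from agents.md
)
```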

## Examples

**User says**: "generate an image of a sunset"
1. Search: `hf spaces search --sdk gradio "text to image generation"`
2. Pick a top result (e.g., `mrfakename/Z-Image-Turbo`)
3. Fetch `agents.md`, follow its instructions, download and open the result
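
A sketch of the first two steps driven from Python; the query and Space ID are illustrative:

```python
import subprocess

import requests

# Step 1: semantic search over Gradio Spaces
listing = subprocess.run(
    ["hf", "spaces", "search", "--sdk", "gradio", "text to image generation"],
    capture_output=True, text=True, check=True,
).stdout
print(listing)  # pick a running, high-trending Space from this output

# Step 2: fetch the Space's agents.md for endpoints and parameters
agents_md = requests.get(
    "https://huggingface.co/spaces/mrfakename/Z-Image-Turbo/agents.md", timeout=30
).text
print(agents_md)
```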

**User says**: "convert this image to 3D"
1. Search: `hf spaces search --sdk gradio "image to 3d model"`
2. Pick a result (e.g., a Trellis space)
3. Fetch `agents.md` and follow its instructions

**User says**: "say hello world in speech"
1. Search: `hf spaces search --sdk gradio "text to speech"`
2. Pick a TTS space, call the generate endpoint, download and play the audio