Commit e13e957

Merge branch 'main' of github.com:avinashsingh77/modctl
2 parents 3b75daa + 4d7a6dd commit e13e957

1 file changed: +131 -0 lines changed

scripts/README.md

# Model Selection Scripts

This directory contains scripts for selecting and filtering HuggingFace models compatible with modctl.

## select_top_models.py

Python script that fetches top models from HuggingFace Hub and filters them based on modctl compatibility criteria.

### Compatibility Criteria

The script filters models based on:

1. **Has config.json** - Required for auto-detection of model metadata
2. **Supported formats** - Must have files in formats like:
   - `safetensors` (preferred)
   - `gguf`
   - `bin` (PyTorch)
   - `pt`, `pth` (PyTorch)
   - `onnx`
3. **Size limit** - Configurable maximum size (default: 20GB)
4. **Metadata** - Attempts to extract:
   - Model family (llama, qwen, gpt2, etc.)
   - Parameter size (0.5B, 7B, etc.)
   - Format type

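The criteria above can be sketched as a small filter. This is a minimal illustration, not the script's actual code: `RepoFile`, `detect_format`, `is_compatible`, and `guess_param_size` are hypothetical names, the preference order after `safetensors` is assumed, the size is assumed to be the sum of all file sizes in GiB, and the parameter-size regex is only one plausible heuristic.

```python
import re
from dataclasses import dataclass

# Extensions from the list above. Only "safetensors first" comes from the
# README; the order of the remaining entries is an assumption.
SUPPORTED_EXTENSIONS = (".safetensors", ".gguf", ".bin", ".pt", ".pth", ".onnx")


@dataclass
class RepoFile:
    """Hypothetical stand-in for one file entry in a model repository."""
    name: str
    size_bytes: int


def detect_format(files):
    """Return the first supported format present, in preference order, or None."""
    for ext in SUPPORTED_EXTENSIONS:
        if any(f.name.endswith(ext) for f in files):
            return ext.lstrip(".")
    return None


def is_compatible(files, max_size_gb=20.0):
    """Apply criteria 1-3: config.json present, supported format, size limit."""
    has_config = any(f.name == "config.json" for f in files)
    total_gb = sum(f.size_bytes for f in files) / 1024**3  # assumed unit: GiB
    return has_config and detect_format(files) is not None and total_gb <= max_size_gb


def guess_param_size(model_id):
    """Heuristic for criterion 4: pull a token such as '0.6B' or '7b' from the id."""
    match = re.search(r"(\d+(?:\.\d+)?)[Bb]\b", model_id)
    return f"{match.group(1)}B" if match else None
```

A repository holding `config.json` plus a 1.5 GB `model.safetensors` passes with the default limit and fails with `max_size_gb=1.0`.
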
### Installation

```bash
pip install -r requirements.txt
```

### Usage

Basic usage (fetch top 10 models by downloads):

```bash
python scripts/select_top_models.py
```

#### Options

```bash
# --limit     Number of models to select (default: 10)
# --max-size  Maximum model size in GB (default: 20.0)
# --sort-by   Sort by: downloads, likes, trending (default: downloads)
# --task      Task filter (default: text-generation)
# --output    Output file (default: stdout)
python scripts/select_top_models.py \
  --limit 10 \
  --max-size 20.0 \
  --sort-by downloads \
  --task text-generation \
  --output models.json
```

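For readers who want to see how these options map onto a parser, here is a minimal `argparse` sketch. The flag names and defaults are taken from the list above; `build_parser` is a hypothetical helper, and the real script may define its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Select top HuggingFace models compatible with modctl"
    )
    parser.add_argument("--limit", type=int, default=10,
                        help="Number of models to select")
    parser.add_argument("--max-size", type=float, default=20.0,
                        help="Maximum model size in GB")
    parser.add_argument("--sort-by", choices=["downloads", "likes", "trending"],
                        default="downloads", help="Sort criterion")
    parser.add_argument("--task", default="text-generation",
                        help="Task filter")
    parser.add_argument("--output", default=None,
                        help="Output file; stdout when omitted")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that `argparse` exposes `--max-size` and `--sort-by` as `args.max_size` and `args.sort_by`.
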
#### Examples

Get top 5 small models (< 5GB):

```bash
python scripts/select_top_models.py --limit 5 --max-size 5
```

Get most liked models:

```bash
python scripts/select_top_models.py --limit 10 --sort-by likes
```

Save to file:

```bash
python scripts/select_top_models.py --limit 20 --output top_models.json
```

### Output Format

The script outputs JSON with model metadata:

```json
[
  {
    "id": "Qwen/Qwen3-0.6B",
    "family": "qwen3",
    "arch": "transformer",
    "format": "safetensors",
    "param_size": "0.6B",
    "size_gb": 1.41,
    "downloads": 7509488,
    "likes": 867
  }
]
```

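Downstream tooling can consume this output with the standard `json` module. The sketch below assumes only the field names shown in the sample above; `smallest_first` and `summarize` are hypothetical helpers, and the embedded record is the sample itself rather than fresh data.

```python
import json

# The sample record from the README, embedded so the sketch is self-contained.
SAMPLE_OUTPUT = """
[
  {
    "id": "Qwen/Qwen3-0.6B",
    "family": "qwen3",
    "arch": "transformer",
    "format": "safetensors",
    "param_size": "0.6B",
    "size_gb": 1.41,
    "downloads": 7509488,
    "likes": 867
  }
]
"""


def smallest_first(models, max_size_gb):
    """Keep models at or under the size limit, ordered by size ascending."""
    kept = [m for m in models if m["size_gb"] <= max_size_gb]
    return sorted(kept, key=lambda m: m["size_gb"])


def summarize(models):
    """Render one short line per model."""
    return [f"{m['id']} ({m['format']}, {m['size_gb']} GB)" for m in models]


models = json.loads(SAMPLE_OUTPUT)
for line in summarize(smallest_first(models, max_size_gb=5.0)):
    print(line)
```

To read a file written with `--output`, replace `json.loads(SAMPLE_OUTPUT)` with `json.load(open("top_models.json"))`.
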
### Authentication

Some models require HuggingFace authentication. Set the `HF_TOKEN` environment variable:

```bash
export HF_TOKEN="your_huggingface_token"
python scripts/select_top_models.py
```

Or use `huggingface-cli`:

```bash
huggingface-cli login
python scripts/select_top_models.py
```

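As a rough illustration of how a token from the environment can be turned into request credentials, the sketch below builds a bearer-token header, which is the scheme the HuggingFace Hub HTTP API accepts. `auth_headers` is a hypothetical helper; how the script itself reads and passes the token is not shown in this README.

```python
import os


def auth_headers(env=None):
    """Return an Authorization header if HF_TOKEN is set, else an empty dict."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    return {"Authorization": f"Bearer {token}"} if token else {}


if __name__ == "__main__":
    # Report only whether a token was found; never print the token itself.
    print("authenticated" if auth_headers() else "anonymous")
```

Returning an empty dict when the variable is unset lets the same request code serve both public and gated models.
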
## GitHub Workflow Integration

The `build-top-models.yml` workflow uses this script to automatically:

1. Select top models from HuggingFace
2. Build them using modctl
3. Push to GitHub Container Registry

### Manual Trigger

You can manually trigger the workflow from the GitHub Actions tab with custom parameters:

- **limit**: Number of models to build (default: 10)
- **max_size**: Maximum model size in GB (default: 20)
- **sort_by**: Sort criteria - downloads, likes, or trending

### Scheduled Runs

The workflow runs automatically every Sunday at 00:00 UTC.

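A schedule of "every Sunday at 00:00 UTC" is what the cron expression `0 0 * * 0` describes; whether the workflow file uses exactly that expression is an assumption. To check when the next scheduled build would fire, a small standard-library sketch (the function name is hypothetical) is:

```python
from datetime import datetime, timedelta, timezone


def next_sunday_midnight_utc(now):
    """Return the first Sunday 00:00 UTC strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    days_ahead = (6 - now.weekday()) % 7  # weekday(): Monday=0 ... Sunday=6
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    if candidate <= now:  # already at or past this week's slot
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_sunday_midnight_utc(datetime.now(timezone.utc)).isoformat())
```
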
### Required Secrets

The workflow requires these GitHub secrets:

- `HF_TOKEN` - HuggingFace API token (for downloading models)
- `GITHUB_TOKEN` - Automatically provided by GitHub Actions
