| 1 | +# Model Selection Scripts |
| 2 | + |
| 3 | +This directory contains scripts for selecting and filtering HuggingFace models compatible with modctl. |
| 4 | + |
| 5 | +## select_top_models.py |
| 6 | + |
| 7 | +A Python script that fetches the top models from the HuggingFace Hub and filters them by modctl compatibility criteria. |
| 8 | + |
| 9 | +### Compatibility Criteria |
| 10 | + |
| 11 | +The script filters models based on: |
| 12 | + |
| 13 | +1. **Has config.json** - Required for auto-detection of model metadata |
| 14 | +2. **Supported formats** - Must include model files in at least one supported format, such as: |
| 15 | + - `safetensors` (preferred) |
| 16 | + - `gguf` |
| 17 | + - `bin` (PyTorch) |
| 18 | + - `pt`, `pth` (PyTorch) |
| 19 | + - `onnx` |
| 20 | +3. **Size limit** - Configurable maximum size (default: 20 GB) |
| 21 | +4. **Metadata** - Attempts to extract: |
| 22 | + - Model family (llama, qwen, gpt2, etc.) |
| 23 | + - Parameter size (0.5B, 7B, etc.) |
| 24 | + - Format type |
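
The first three criteria can be sketched as a single predicate over a repository's file listing. This is a minimal illustration, not the script's actual code: `is_compatible`, the `files` mapping, and the choice of binary gigabytes are all assumptions made for the example.

```python
from pathlib import PurePosixPath

# Extensions taken from the list above; the real script may accept others.
SUPPORTED_EXTENSIONS = {".safetensors", ".gguf", ".bin", ".pt", ".pth", ".onnx"}
BYTES_PER_GB = 1024 ** 3  # assumption: sizes are measured in binary gigabytes


def is_compatible(files, max_size_gb=20.0):
    """Return True if a repo's file listing passes the three checks.

    `files` is a hypothetical mapping of repo-relative path to size in bytes.
    """
    # 1. config.json must be present at the repository root.
    if "config.json" not in files:
        return False
    # 2. At least one file must use a supported weight format.
    has_weights = any(
        PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS for path in files
    )
    if not has_weights:
        return False
    # 3. The total size must not exceed the configured limit.
    total_gb = sum(files.values()) / BYTES_PER_GB
    return total_gb <= max_size_gb
```

Checking the cheap conditions first means the size sum only runs for repositories that already look usable.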
| 25 | + |
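Metadata extraction could look roughly like the sketch below. It assumes the family is read from the `model_type` key of `config.json` and the parameter size is parsed from the repository name. Both are guesses about the approach; `extract_metadata` and `PARAM_SIZE_RE` are hypothetical names.

```python
import re

# Matches a parameter count such as "0.6B" or "7b" that stands alone in the name.
PARAM_SIZE_RE = re.compile(r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)[bB](?![A-Za-z0-9])")


def extract_metadata(model_id, config):
    """Guess the family and parameter size of a model.

    `config` is the parsed config.json as a dict (assumed input shape).
    Fields that cannot be determined are returned as None.
    """
    repo_name = model_id.split("/")[-1]
    match = PARAM_SIZE_RE.search(repo_name)
    return {
        "family": config.get("model_type"),
        "param_size": f"{match.group(1)}B" if match else None,
    }
```

Returning `None` for missing fields matches the "attempts to extract" wording: a model without a recognizable size in its name is still reported, just with less metadata.
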
| 26 | +### Installation |
| 27 | + |
| 28 | +```bash |
| 29 | +pip install -r requirements.txt |
| 30 | +``` |
| 31 | + |
| 32 | +### Usage |
| 33 | + |
| 34 | +Basic usage (fetch top 10 models by downloads): |
| 35 | + |
| 36 | +```bash |
| 37 | +python scripts/select_top_models.py |
| 38 | +``` |
| 39 | + |
| 40 | +#### Options |
| 41 | + |
| 42 | +```bash |
| 43 | +# --limit: number of models to select (default: 10) |
| 44 | +# --max-size: maximum model size in GB (default: 20.0) |
| 45 | +# --sort-by: downloads, likes, or trending (default: downloads) |
| 46 | +# --task: task filter (default: text-generation) |
| 47 | +# --output: output file (default: stdout) |
| 48 | +python scripts/select_top_models.py --limit 10 --max-size 20.0 --sort-by downloads --task text-generation --output models.json |
| 49 | +``` |
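
For readers who want to see how these options and defaults fit together, here is a hedged `argparse` sketch that mirrors the documented flags. The function name `build_parser` and the help strings are assumptions; the script's real parser may be structured differently.

```python
import argparse


def build_parser():
    """Build a parser with the options and defaults documented above."""
    parser = argparse.ArgumentParser(
        description="Select top HuggingFace models compatible with modctl"
    )
    parser.add_argument("--limit", type=int, default=10,
                        help="Number of models to select")
    parser.add_argument("--max-size", type=float, default=20.0,
                        help="Maximum model size in GB")
    parser.add_argument("--sort-by", choices=["downloads", "likes", "trending"],
                        default="downloads", help="Sort order")
    parser.add_argument("--task", default="text-generation",
                        help="Task filter")
    parser.add_argument("--output", default=None,
                        help="Output file; stdout when omitted")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Note that `argparse` exposes `--max-size` and `--sort-by` as the attributes `max_size` and `sort_by`.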
| 50 | + |
| 51 | +#### Examples |
| 52 | + |
| 53 | +Get top 5 small models (< 5GB): |
| 54 | + |
| 55 | +```bash |
| 56 | +python scripts/select_top_models.py --limit 5 --max-size 5 |
| 57 | +``` |
| 58 | + |
| 59 | +Get most liked models: |
| 60 | + |
| 61 | +```bash |
| 62 | +python scripts/select_top_models.py --limit 10 --sort-by likes |
| 63 | +``` |
| 64 | + |
| 65 | +Save to file: |
| 66 | + |
| 67 | +```bash |
| 68 | +python scripts/select_top_models.py --limit 20 --output top_models.json |
| 69 | +``` |
| 70 | + |
| 71 | +### Output Format |
| 72 | + |
| 73 | +The script outputs JSON with model metadata: |
| 74 | + |
| 75 | +```json |
| 76 | +[ |
| 77 | + { |
| 78 | + "id": "Qwen/Qwen3-0.6B", |
| 79 | + "family": "qwen3", |
| 80 | + "arch": "transformer", |
| 81 | + "format": "safetensors", |
| 82 | + "param_size": "0.6B", |
| 83 | + "size_gb": 1.41, |
| 84 | + "downloads": 7509488, |
| 85 | + "likes": 867 |
| 86 | + } |
| 87 | +] |
| 88 | +``` |
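
Downstream tooling can consume this output with the standard `json` module. The sketch below embeds the sample record above as a string; `models_under` is a hypothetical helper, not part of the script.

```python
import json

# The sample record from the output format above, embedded for illustration.
SAMPLE_OUTPUT = """
[
  {"id": "Qwen/Qwen3-0.6B", "family": "qwen3", "arch": "transformer",
   "format": "safetensors", "param_size": "0.6B", "size_gb": 1.41,
   "downloads": 7509488, "likes": 867}
]
"""


def models_under(models, max_gb):
    """Return the ids of models whose size_gb is at most max_gb."""
    return [m["id"] for m in models if m["size_gb"] <= max_gb]


models = json.loads(SAMPLE_OUTPUT)
print(models_under(models, 5.0))  # ['Qwen/Qwen3-0.6B']
```

When the script is run with `--output`, the same function can be applied to `json.load()` on the saved file.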
| 89 | + |
| 90 | +### Authentication |
| 91 | + |
| 92 | +Some models require HuggingFace authentication. Set the `HF_TOKEN` environment variable: |
| 93 | + |
| 94 | +```bash |
| 95 | +export HF_TOKEN="your_huggingface_token" |
| 96 | +python scripts/select_top_models.py |
| 97 | +``` |
| 98 | + |
| 99 | +Or use `huggingface-cli`: |
| 100 | + |
| 101 | +```bash |
| 102 | +huggingface-cli login |
| 103 | +python scripts/select_top_models.py |
| 104 | +``` |
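
As a rough illustration of the environment-variable path, a client can turn `HF_TOKEN` into a bearer header for Hub API requests. The helper name `auth_headers` is an assumption, and this sketch does not cover the token saved by `huggingface-cli login`.

```python
import os


def auth_headers(environ=os.environ):
    """Build HTTP headers for Hub API requests from HF_TOKEN, if it is set."""
    token = environ.get("HF_TOKEN", "").strip()
    # Anonymous requests still work for public models, so return no header.
    return {"Authorization": f"Bearer {token}"} if token else {}
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.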
| 105 | + |
| 106 | +## GitHub Workflow Integration |
| 107 | + |
| 108 | +The `build-top-models.yml` workflow uses this script to automatically: |
| 109 | + |
| 110 | +1. Select top models from HuggingFace |
| 111 | +2. Build them using modctl |
| 112 | +3. Push to GitHub Container Registry |
| 113 | + |
| 114 | +### Manual Trigger |
| 115 | + |
| 116 | +You can manually trigger the workflow from the GitHub Actions tab with custom parameters: |
| 117 | + |
| 118 | +- **limit**: Number of models to build (default: 10) |
| 119 | +- **max_size**: Maximum model size in GB (default: 20) |
| 120 | +- **sort_by**: Sort criterion: downloads, likes, or trending |
| 121 | + |
| 122 | +### Scheduled Runs |
| 123 | + |
| 124 | +The workflow runs automatically every Sunday at 00:00 UTC. |
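
A weekly schedule like this is commonly written as the cron expression `0 0 * * 0`, although the workflow file itself is not shown here. To check when the next run is due, the arithmetic can be sketched as follows; `next_scheduled_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def next_scheduled_run(now):
    """Return the next Sunday 00:00 UTC strictly after `now`.

    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    days_ahead = (6 - midnight.weekday()) % 7  # Monday is 0, Sunday is 6
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= now:
        # It is already Sunday at or past midnight, so move to next week.
        candidate += timedelta(days=7)
    return candidate


print(next_scheduled_run(datetime.now(timezone.utc)))
```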
| 125 | + |
| 126 | +### Required Secrets |
| 127 | + |
| 128 | +The workflow requires these GitHub secrets: |
| 129 | + |
| 130 | +- `HF_TOKEN` - HuggingFace API token (for downloading models) |
| 131 | +- `GITHUB_TOKEN` - Automatically provided by GitHub Actions |