
Commit be43ea3

Add Hugging Face model support
- Add HuggingFaceLLMClient for local model inference
- Support for 6 popular Hugging Face models (Llama 2, Mistral, Zephyr, etc.)
- Add memory optimization with quantization support
- Create comprehensive example and documentation
- Add unit tests for Hugging Face integration
- Update dependencies to include transformers, torch, accelerate
1 parent 3bcdd05 commit be43ea3

File tree: 10 files changed, +1053 −15 lines

HUGGINGFACE_SUPPORT.md

Lines changed: 163 additions & 0 deletions
# Hugging Face Model Support

Stagehand now supports open-source Hugging Face models for local inference! This allows you to run web automation tasks without relying on external API services.

## Supported Models

The following Hugging Face models are pre-configured and ready to use:

- **Llama 2 7B Chat** (`huggingface/meta-llama/Llama-2-7b-chat-hf`)
- **Llama 2 13B Chat** (`huggingface/meta-llama/Llama-2-13b-chat-hf`)
- **Mistral 7B Instruct** (`huggingface/mistralai/Mistral-7B-Instruct-v0.1`)
- **Zephyr 7B Beta** (`huggingface/HuggingFaceH4/zephyr-7b-beta`)
- **CodeGen 2B Mono** (`huggingface/Salesforce/codegen-2B-mono`)
- **StarCoder2 7B** (`huggingface/bigcode/starcoder2-7b`)
## Requirements

### Hardware Requirements

- **GPU**: CUDA-compatible GPU with at least 8GB VRAM (recommended)
- **RAM**: At least 16GB system RAM
- **Storage**: 20GB+ free space for model downloads

### Software Requirements

- Python 3.9+
- CUDA toolkit (for GPU acceleration)
- PyTorch with CUDA support
## Installation

Install the required dependencies:

```bash
pip install transformers torch accelerate bitsandbytes
```

For GPU support, make sure you have the appropriate CUDA version installed. You can verify your setup with the short check below.
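This check uses PyTorch's standard CUDA API to confirm that a GPU is visible and has enough VRAM before any multi-gigabyte model download starts:

```python
import torch

# Verify that PyTorch sees a CUDA device and report its VRAM.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA device detected; inference will fall back to CPU.")
```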
## Basic Usage

```python
import asyncio

from stagehand import Stagehand, StagehandConfig
from stagehand.schemas import AvailableModel


async def main():
    # Configure Stagehand to use a Hugging Face model
    config = StagehandConfig(
        env="LOCAL",
        model_name=AvailableModel.HUGGINGFACE_ZEPHYR_7B,
        verbose=2,
        use_api=False,
    )

    stagehand = Stagehand(config=config)

    try:
        await stagehand.init()
        await stagehand.navigate("https://example.com")

        # Extract data using the Hugging Face model
        result = await stagehand.extract(
            instruction="Extract the main heading from this page"
        )

        print(f"Extracted: {result.data}")
    finally:
        await stagehand.close()


asyncio.run(main())
```
## Advanced Configuration

### Memory Optimization

For systems with limited GPU memory, you can use quantization:

```python
config = StagehandConfig(
    env="LOCAL",
    model_name=AvailableModel.HUGGINGFACE_LLAMA_2_7B,
    use_api=False,
    model_client_options={
        "device": "cuda",
        "quantization_config": {
            "load_in_4bit": True,
            "bnb_4bit_compute_dtype": "float16",
            "bnb_4bit_use_double_quant": True,
        },
    },
)
```
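Options like these correspond to the `bitsandbytes` settings in the `transformers` library. As a rough sketch of the equivalence (loading the model directly with `transformers`, not Stagehand's internal code path):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# The same 4-bit settings, expressed directly against the transformers API.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on the available GPU(s)
)
```

Double quantization quantizes the quantization constants themselves, saving roughly 0.4 bits per parameter on top of plain 4-bit loading.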
### Custom Models

You can also use any Hugging Face model by prefixing its full Hub repository ID with `huggingface/`:

```python
config = StagehandConfig(
    env="LOCAL",
    model_name="huggingface/your-username/your-model",
    use_api=False,
)
```
## Performance Tips

1. **Use GPU**: Always use CUDA if available for significantly faster inference
2. **Quantization**: Use 4-bit or 8-bit quantization to reduce memory usage
3. **Model Size**: Start with smaller models (~7B parameters) for testing
4. **Sequential Processing**: Run tasks in sequence rather than in parallel, so only one inference occupies the GPU at a time (see the sketch below)
5. **Memory Management**: Close other GPU applications when running large models
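Tip 4 in practice: because one local model instance occupies the GPU, queuing calls one after another keeps memory usage flat. A minimal sketch reusing the `stagehand` object from Basic Usage (the URLs and instructions are placeholders):

```python
# Inside an async function, with `stagehand` already initialized.
tasks = [
    ("https://example.com", "Extract the main heading from this page"),
    ("https://example.org", "Extract the first paragraph"),
]

# Sequential awaits: each extraction finishes (and frees its working memory)
# before the next begins; asyncio.gather() would contend for the same GPU.
for url, instruction in tasks:
    await stagehand.navigate(url)
    result = await stagehand.extract(instruction=instruction)
    print(f"{url}: {result.data}")
```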
## Troubleshooting

### Out of Memory Errors

- Use quantization (`load_in_4bit=True`)
- Try a smaller model
- Close other GPU applications
- Fall back to CPU mode (slower, but avoids GPU memory limits entirely; see the sketch below)
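Assuming the client honors the same `model_client_options` keys as the memory-optimization example above, a CPU-only configuration would look like this sketch:

```python
config = StagehandConfig(
    env="LOCAL",
    model_name=AvailableModel.HUGGINGFACE_ZEPHYR_7B,
    use_api=False,
    # "cpu" mirrors the "cuda" device value shown earlier; inference is much
    # slower, but no GPU VRAM is required.
    model_client_options={"device": "cpu"},
)
```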
### Slow Performance

- Ensure CUDA is properly installed
- Use the GPU instead of the CPU
- Try a smaller model
- Check whether other processes are using the GPU (e.g., with `nvidia-smi`)
### Model Download Issues

- Check your internet connection
- Ensure sufficient disk space
- Try downloading the model manually from the Hugging Face Hub, as shown below
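For a manual download, the `huggingface_hub` CLI can pre-fetch a model into the local cache (Zephyr shown as an example):

```bash
pip install huggingface_hub
huggingface-cli download HuggingFaceH4/zephyr-7b-beta
```

Gated models such as Llama 2 additionally require accepting the license on the model page and authenticating with `huggingface-cli login`.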
## Examples

See `examples/example_huggingface.py` for comprehensive examples, including:

- Basic usage with different models
- Memory-efficient configurations
- Form filling and data extraction
- Error handling and troubleshooting
## Limitations

- **First Run**: Models are downloaded on first use (5-15 GB per model)
- **Memory**: Large models require significant GPU memory
- **Speed**: Local inference is slower than hosted API calls
- **Model Quality**: Some models may not perform as well as commercial APIs
## Contributing

To add support for a new Hugging Face model:

1. Add the model to the `AvailableModel` enum in `schemas.py` (see the sketch below)
2. Test the model with various web automation tasks
3. Update this documentation
4. Add tests for the new model
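For step 1, a new entry follows the `huggingface/<org>/<repo>` naming convention used by the existing models. A sketch (the actual layout of `schemas.py` may differ, and `HUGGINGFACE_QWEN_7B` is a hypothetical addition):

```python
from enum import Enum

class AvailableModel(str, Enum):
    # ... existing entries, for example:
    HUGGINGFACE_ZEPHYR_7B = "huggingface/HuggingFaceH4/zephyr-7b-beta"
    # Hypothetical new model, using the same "huggingface/<org>/<repo>" prefix:
    HUGGINGFACE_QWEN_7B = "huggingface/Qwen/Qwen2-7B-Instruct"
```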
## Support

For issues related to Hugging Face model support:

- Check the [Hugging Face documentation](https://huggingface.co/docs)
- Review the example file for usage patterns
- Open an issue on the GitHub repository
