# Hugging Face Model Support

Stagehand now supports open-source Hugging Face models for local inference! This lets you run web automation tasks without relying on external API services.

## Supported Models

The following Hugging Face models are pre-configured and ready to use:

- **Llama 2 7B Chat** (`huggingface/meta-llama/Llama-2-7b-chat-hf`)
- **Llama 2 13B Chat** (`huggingface/meta-llama/Llama-2-13b-chat-hf`)
- **Mistral 7B Instruct** (`huggingface/mistralai/Mistral-7B-Instruct-v0.1`)
- **Zephyr 7B Beta** (`huggingface/HuggingFaceH4/zephyr-7b-beta`)
- **CodeGen 2B Mono** (`huggingface/Salesforce/codegen-2B-mono`)
- **StarCoder2 7B** (`huggingface/bigcode/starcoder2-7b`)

## Requirements

### Hardware Requirements
- **GPU**: CUDA-compatible GPU with at least 8GB VRAM (recommended)
- **RAM**: At least 16GB system RAM
- **Storage**: 20GB+ free space for model downloads

### Software Requirements
- Python 3.9+
- CUDA toolkit (for GPU acceleration)
- PyTorch with CUDA support

## Installation

Install the required dependencies:

```bash
pip install transformers torch accelerate bitsandbytes
```

For GPU support, make sure the installed CUDA toolkit matches your PyTorch build.
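
Before downloading any models, it's worth confirming that PyTorch can actually see your GPU. This quick check uses only standard PyTorch calls:

```python
import torch

# Verify that PyTorch was built with CUDA support and can see a GPU.
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    # Total VRAM in GiB; the 7B models want roughly 8GB+ even when quantized.
    vram = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"VRAM: {vram:.1f} GiB")
```

If `CUDA available` prints `False`, reinstall PyTorch with a CUDA build that matches your driver before proceeding.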

## Basic Usage

```python
import asyncio
from stagehand import Stagehand, StagehandConfig
from stagehand.schemas import AvailableModel

async def main():
    # Configure Stagehand to use a Hugging Face model
    config = StagehandConfig(
        env="LOCAL",
        model_name=AvailableModel.HUGGINGFACE_ZEPHYR_7B,
        verbose=2,
        use_api=False,
    )

    stagehand = Stagehand(config=config)

    try:
        await stagehand.init()
        await stagehand.navigate("https://example.com")

        # Extract data using the Hugging Face model
        result = await stagehand.extract(
            instruction="Extract the main heading from this page"
        )

        print(f"Extracted: {result.data}")

    finally:
        await stagehand.close()

asyncio.run(main())
```

## Advanced Configuration

### Memory Optimization

For systems with limited GPU memory, you can use quantization:

```python
config = StagehandConfig(
    env="LOCAL",
    model_name=AvailableModel.HUGGINGFACE_LLAMA_2_7B,
    use_api=False,
    model_client_options={
        "device": "cuda",
        "quantization_config": {
            "load_in_4bit": True,
            "bnb_4bit_compute_dtype": "float16",
            "bnb_4bit_use_double_quant": True,
        },
    },
)
```
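
The `quantization_config` keys above mirror the options of `BitsAndBytesConfig` from the `transformers` library. If you want to sanity-check a quantized load outside of Stagehand, a minimal standalone sketch looks like this (note that the Llama 2 repositories are gated, so your Hugging Face token needs approved access):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization with float16 compute, matching the options above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on the GPU
)
```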

### Custom Models

You can also use any Hugging Face model by specifying the full model name:

```python
config = StagehandConfig(
    env="LOCAL",
    model_name="huggingface/your-username/your-model",
    use_api=False,
)
```

## Performance Tips

1. **Use GPU**: Always use CUDA when available; inference is significantly faster than on CPU
2. **Quantization**: Use 4-bit or 8-bit quantization to reduce memory usage
3. **Model Size**: Start with smaller models (7B parameters) for testing
4. **Sequential Processing**: Run tasks one after another rather than in parallel, so generations don't compete for GPU memory (see the sketch below)
5. **Memory Management**: Close other GPU applications when running large models
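
For the sequential-processing tip, here is a sketch built on the same API as the Basic Usage example (the helper function and URL list are purely illustrative):

```python
async def run_sequentially(stagehand, urls):
    """Run one navigate/extract cycle at a time so generations never overlap."""
    results = []
    for url in urls:
        await stagehand.navigate(url)
        result = await stagehand.extract(
            instruction="Extract the main heading from this page"
        )
        results.append(result.data)
    return results

# e.g. await run_sequentially(stagehand, ["https://example.com", "https://example.org"])
```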

## Troubleshooting

### Out of Memory Errors
- Use quantization (`load_in_4bit=True`)
- Try a smaller model
- Close other GPU applications
- Fall back to CPU mode (much slower, but it avoids the GPU's VRAM limit; see the sketch below)
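
For the CPU fallback, a configuration along these lines should work, assuming `model_client_options` accepts a `device` key as in the quantization example above:

```python
config = StagehandConfig(
    env="LOCAL",
    model_name=AvailableModel.HUGGINGFACE_ZEPHYR_7B,
    use_api=False,
    model_client_options={
        "device": "cpu",  # no VRAM needed, but expect much slower generation
    },
)
```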

### Slow Performance
- Ensure CUDA is properly installed
- Use the GPU instead of the CPU
- Try a smaller model
- Check whether other processes are using the GPU (see below)
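
On NVIDIA systems, `nvidia-smi` shows current GPU utilization and which processes are holding VRAM:

```bash
# List GPU utilization and the processes using VRAM
nvidia-smi
```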

### Model Download Issues
- Check your internet connection
- Ensure sufficient disk space
- Try downloading the model manually from the Hugging Face Hub (see below)
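
To pre-download a model manually (and to resume an interrupted download), you can use `snapshot_download` from the `huggingface_hub` package, which is installed alongside `transformers`:

```python
from huggingface_hub import snapshot_download

# Fetches all model files into the local Hugging Face cache.
# Safe to re-run: files that already downloaded are skipped.
snapshot_download(repo_id="HuggingFaceH4/zephyr-7b-beta")
```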

## Examples

See `examples/example_huggingface.py` for comprehensive examples including:
- Basic usage with different models
- Memory-efficient configurations
- Form filling and data extraction
- Error handling and troubleshooting

## Limitations

- **First Run**: Models are downloaded on first use (5-15GB)
- **Memory**: Large models require significant GPU memory
- **Speed**: Local inference is slower than API calls
- **Model Quality**: Some models may not perform as well as commercial APIs

## Contributing

To add support for new Hugging Face models:

1. Add the model to the `AvailableModel` enum in `schemas.py` (see the sketch below)
2. Test the model with various web automation tasks
3. Update this documentation
4. Add tests for the new model
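
Each enum entry pairs a `huggingface/` prefix with the Hub repo id. A hypothetical addition might look like the sketch below; the member name is illustrative and the enum's actual base classes may differ, so follow the existing entries in `schemas.py`:

```python
from enum import Enum

class AvailableModel(str, Enum):
    # ... existing entries ...
    HUGGINGFACE_ZEPHYR_7B = "huggingface/HuggingFaceH4/zephyr-7b-beta"
    # Hypothetical new entry:
    HUGGINGFACE_MY_MODEL = "huggingface/your-username/your-model"
```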

## Support

For issues related to Hugging Face model support:
- Check the [Hugging Face documentation](https://huggingface.co/docs)
- Review the example file for usage patterns
- Open an issue on the GitHub repository