
Commit c99feee

Updated LocalLab v0.3.1 and Updated Docs

1 parent ccd826a commit c99feee

File tree

6 files changed: +452 −178 lines changed

CHANGELOG.md

Lines changed: 17 additions & 137 deletions
@@ -2,6 +2,22 @@

 All notable changes to LocalLab will be documented in this file.

+## [0.3.1] - 2025-03-06
+
+### Fixed
+
+- Fixed NameError in config.py by properly defining the CUSTOM_MODEL variable
+- Enhanced error handling for environment variable loading
+- Improved model registry initialization for custom models
+- Fixed key mismatch in custom model requirements dictionary
+- Standardized environment variable naming with LOCALLAB_ prefix
+- Removed duplicate configuration settings for optimization parameters
+- Added safe model registry access with fallback support
+- Added better error handling for missing environment variables
+- Added consistent version constraints for dependencies in setup.py
+- Improved error messages for configuration issues
+- Updated environment variable documentation with more details
+
 ## [0.3.0] - 2025-03-05

 ### Added
@@ -108,140 +124,4 @@ All notable changes to LocalLab will be documented in this file.

 ### Added

-- Added new `get_gpu_info()` function for detailed GPU monitoring
-- Added improved system resource endpoint with detailed GPU metrics
-- Added robust environment variable handling for optimization settings
-
-### Changed
-
-- Made optimization flags more robust by checking for empty string values
-- Improved fallback handling for missing torch packages
-- Enhanced server startup logs with better optimization information
-
-## [0.2.3] - 2025-03-02
-
-### Fixed
-
-- Fixed critical server startup error in the Google Colab environment with uvicorn callback configuration
-- Resolved "'list' object is not callable" error by properly implementing callback_notify as an async function
-- Enhanced server startup sequence for better compatibility with both local and Colab environments
-- Improved custom server implementation to handle callbacks more robustly
-
-## [0.2.2] - 2025-03-02
-
-### Fixed
-
-- Fixed circular import issue between core/app.py and routes/system.py by updating system.py to use get_request_count from the logger module directly
-- Made the Flash Attention warning less alarming by changing it from a warning to an info message with a better explanation
-- Enhanced the get_system_info endpoint with cleaner code and better organization
-- Fixed potential issues with GPU info retrieval through better error handling
-
-## [0.2.0] - 2025-03-02
-
-### Added
-
-- Comprehensive environment check system that validates:
-  - Python version compatibility
-  - CUDA/GPU availability and configuration
-  - Ngrok token presence when running in Google Colab
-- Improved error handling with detailed error messages and suggestions
-- Clear instructions for setting up the ngrok authentication token
-
-### Changed
-
-- Complete removal of the deprecated monolithic `main.py` file
-- Enhanced ngrok setup process with better authentication handling:
-  - Automatic detection of the auth token from environment variables
-  - Clear error messages when the auth token is missing
-  - Improved token validation and connection process
-- Parameter renamed from `ngrok` to `use_ngrok` for clarity
-- More readable ASCII art for the initializing banner
-- Improved documentation about ngrok requirements for Google Colab
-
-### Fixed
-
-- Fixed circular import issues between core/app.py and routes modules
-- Fixed the ngrok authentication flow to properly use the auth token from environment variables
-- Fixed an error caused by a missing torch import in server.py
-- Added graceful handling of a missing torch module to prevent startup failures
-- Improved error messages when the server fails to start
-- Better exception handling throughout the codebase
-
-## [0.1.9] - 2025-03-01
-
-### Added
-
-- Clear ASCII art status indicators ("INITIALIZING" and "RUNNING") showing server state
-- Warning messages that prevent users from making API requests before the server is ready
-- Callback mechanism to display the "RUNNING" banner only when the server is fully operational
-- New dedicated logger module with comprehensive features:
-  - Colorized console output for different log levels
-  - Server status tracking (initializing, running, error, shutting_down)
-  - Request tracking with detailed metrics
-  - Model loading/unloading metrics
-  - Performance monitoring for slow requests
-- API documentation for the logger module with usage examples
-
-### Changed
-
-- Completely refactored the codebase into a more modular structure:
-  - Split main.py into smaller, focused modules
-  - Created separate directories for routes, UI components, utilities, and core functionality
-  - Improved import structure to prevent circular dependencies
-  - Better organization of server startup and API functionality
-- Enhanced model loading process with proper timing and status updates
-- Improved error handling throughout the application
-- Better request metrics in response headers
-- Removed old logger.py in favor of the new dedicated logger module
-
-### Fixed
-
-- Complete removal of health checks and validation when setting up ngrok tunnels
-- Fixed issue where logs did not appear correctly because the server started in a separate process
-- Simplified the ngrok setup process to run without validation, preventing connection errors during startup
-- Improved server startup flow to be more direct, without background health checks or API validation
-- Reorganized startup sequence to work properly with ngrok, enhancing compatibility with Colab
-
-## [0.1.7] - 2025-03-01
-
-### Changed
-
-- Removed the background-process workflow for server startup. The server now runs directly in the main process, ensuring that all logs (banner, model details, system resources, etc.) are displayed properly.
-- Simplified the startup process by directly calling uvicorn.run(), with optional ngrok setup if the server is run in Google Colab.
-
-## [0.1.6] - 2025-02-25
-
-### Added
-
-- Added utility function `is_port_in_use(port: int) -> bool` to check if a port is already in use.
-- Added async utility function `load_model_in_background(model_id: str)` to load the model asynchronously in the background while managing the global loading flag.
-- Updated server startup functions to incorporate these utilities, ensuring proper port management and asynchronous model loading.
-
-## [0.1.5] - 2025-02-25
-
-### Changed
-
-- Extended the initial wait time in start_server from 5 to 15 seconds to give the server ample time to initialize, especially in Google Colab environments.
-- Increased the health check timeout to 120 seconds for ngrok mode and 60 seconds for local mode to accommodate slower startups.
-- Added detailed logging during health checks to aid in debugging startup issues.
-
-## [0.1.4] - 2025-02-25
-
-### Changed
-
-- Improved logging across startup: the banner, model details, configuration, system resources, API documentation, quick start guide, and footer are now fully logged and printed.
-- Updated the start_server function to extend the health check timeout to 60 seconds in Google Colab (when using ngrok) and to set an environment variable to trigger the Colab branch in run_server_proc.
-- Modified startup_event to load the model in the background, ensuring that the server's /health endpoint becomes available in time and that logging output is complete.
-
-## [0.1.3] - 2025-02-25
-
-### Changed
-
-- Updated the GitHub Actions workflow to install the LocalLab package along with its runtime dependencies in CI, ensuring that all required packages are available for proper testing.
-
-### Fixed
-
-- Refactored `run_server_proc` in the spawned process to initialize a dedicated logger ("locallab.spawn") to avoid inheriting SemLock objects from a fork context.
-- Ensured that the log queue is created using the multiprocessing spawn context, preventing runtime errors in Google Colab.
-- Updated Mermaid diagrams in `README.md` and `docs/colab/README.md` to enclose node labels in double quotes, resolving parse errors in GitHub rendering.
-- Removed duplicate architecture diagrams from the root `
+- Added new `

docs/colab/setup_guide.md

Lines changed: 188 additions & 0 deletions
# Running LocalLab in Google Colab

This guide shows you how to set up and run LocalLab in a Google Colab environment. Google Colab provides free GPU resources that can significantly speed up model inference.

## Quick Start

Copy and paste this code into a Colab notebook cell:

```python
# Install LocalLab package
!pip install locallab

# Set up and run the server
import os
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# Replace with your actual ngrok token
os.environ["NGROK_AUTH_TOKEN"] = "your_ngrok_token"

# Model selection - use any HuggingFace model
os.environ["HUGGINGFACE_MODEL"] = "microsoft/phi-2"  # Light model that works well on Colab

# Memory optimization settings
os.environ["LOCALLAB_ENABLE_QUANTIZATION"] = "true"
os.environ["LOCALLAB_QUANTIZATION_TYPE"] = "int8"
os.environ["LOCALLAB_ENABLE_ATTENTION_SLICING"] = "true"

# Speed optimization settings
os.environ["LOCALLAB_ENABLE_FLASH_ATTENTION"] = "true"
os.environ["LOCALLAB_ENABLE_BETTERTRANSFORMER"] = "true"

# Import and start the server
from locallab import start_server

# Start with the explicit ngrok flag
start_server(use_ngrok=True)
```
## Step-by-Step Setup

### 1. Create a New Notebook

Create a new notebook in Google Colab:

- Go to [Google Colab](https://colab.research.google.com/)
- Click "New Notebook"
### 2. Enable GPU Runtime

Change the runtime to use GPU:

- Click "Runtime" in the menu
- Select "Change runtime type"
- Set "Hardware accelerator" to "GPU"
- Click "Save"
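After the runtime switches over, you can confirm that a GPU is actually attached. A quick check using PyTorch, which Colab preinstalls:

```python
import torch

# Prints True plus the GPU name (e.g. "Tesla T4") when the runtime has a GPU
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```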
### 3. Install the LocalLab Package

```python
!pip install locallab
```
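If you want the exact release documented in this commit (0.3.1), you can pin the version, assuming that release is published on PyPI:

```python
# Pin to the release covered by this commit's changelog (assumes it is on PyPI)
!pip install locallab==0.3.1
```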
### 4. Get an ngrok Auth Token

LocalLab uses ngrok to make your local server accessible over the internet:

1. Sign up or log in at [ngrok.com](https://ngrok.com)
2. Go to the [Auth page](https://dashboard.ngrok.com/get-started/your-authtoken)
3. Copy your authtoken
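Rather than pasting the token into the notebook itself, you can prompt for it at runtime. A small sketch using Python's standard-library `getpass`:

```python
import os
from getpass import getpass

# Prompt for the token so it never appears in the saved notebook
os.environ["NGROK_AUTH_TOKEN"] = getpass("Paste your ngrok authtoken: ")
```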
### 5. Configure and Run the Server

```python
import os
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# Set your ngrok token
os.environ["NGROK_AUTH_TOKEN"] = "your_ngrok_token_here"  # Replace with your actual token

# Set environment variables
os.environ["HUGGINGFACE_MODEL"] = "microsoft/phi-2"  # Good default for Colab
os.environ["LOCALLAB_ENABLE_QUANTIZATION"] = "true"
os.environ["LOCALLAB_QUANTIZATION_TYPE"] = "int8"
os.environ["LOCALLAB_ENABLE_ATTENTION_SLICING"] = "true"

# Import and start the server
from locallab import start_server

# Start with the explicit ngrok flag
start_server(use_ngrok=True)
```
### 6. Access the Server

After running the cell, you'll see output like:

```
✅ Server running at: https://1a2b3c4d5e.ngrok.io
```

Visit this URL to access the LocalLab UI and API.
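To verify from another machine that the server is reachable, you can poll its /health endpoint (the endpoint is mentioned in this commit's changelog; the exact response format may vary by version). A minimal sketch:

```python
import requests

# Replace with the ngrok URL printed at startup
BASE_URL = "https://1a2b3c4d5e.ngrok.io"

# The /health endpoint becomes available once startup completes
resp = requests.get(f"{BASE_URL}/health", timeout=30)
print(resp.status_code, resp.text)
```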
## Recommended Models for Colab

Different Colab tiers support different model sizes:

| Colab Tier | GPU Type          | Recommended Models                                                                       |
| ---------- | ----------------- | ---------------------------------------------------------------------------------------- |
| Free       | T4 (~16GB VRAM)   | microsoft/phi-2, TinyLlama/TinyLlama-1.1B-Chat-v1.0                                      |
| Pro        | T4/V100           | meta-llama/Llama-2-7b-chat-hf, mistralai/Mistral-7B-Instruct-v0.1                        |
| Pro+       | A100 (~40GB VRAM) | meta-llama/Llama-2-13b-chat-hf, mistralai/Mixtral-8x7B-Instruct-v0.1 (with quantization) |
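If you want the notebook to choose from this table automatically, a rough sketch follows; the VRAM thresholds are illustrative assumptions, not LocalLab defaults:

```python
import os
import torch

def pick_model() -> str:
    """Map detected VRAM to the table above (thresholds are illustrative)."""
    if not torch.cuda.is_available():
        return "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # small enough without a GPU
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if vram_gb >= 40:   # A100 class (Pro+)
        return "meta-llama/Llama-2-13b-chat-hf"
    if vram_gb >= 24:   # V100 class (Pro)
        return "meta-llama/Llama-2-7b-chat-hf"
    return "microsoft/phi-2"  # free-tier T4

os.environ["HUGGINGFACE_MODEL"] = pick_model()
```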
## Memory Optimization Configurations

### Configuration for Smaller Models

```python
os.environ["HUGGINGFACE_MODEL"] = "microsoft/phi-2"
os.environ["LOCALLAB_ENABLE_QUANTIZATION"] = "true"
os.environ["LOCALLAB_QUANTIZATION_TYPE"] = "int8"
os.environ["LOCALLAB_ENABLE_FLASH_ATTENTION"] = "true"
```
### Configuration for Medium Models (7B)

```python
os.environ["HUGGINGFACE_MODEL"] = "meta-llama/Llama-2-7b-chat-hf"
os.environ["LOCALLAB_ENABLE_QUANTIZATION"] = "true"
os.environ["LOCALLAB_QUANTIZATION_TYPE"] = "int8"
os.environ["LOCALLAB_ENABLE_ATTENTION_SLICING"] = "true"
os.environ["LOCALLAB_ENABLE_FLASH_ATTENTION"] = "true"
```
### Configuration for Large Models (13B+)

```python
os.environ["HUGGINGFACE_MODEL"] = "meta-llama/Llama-2-13b-chat-hf"
os.environ["LOCALLAB_ENABLE_QUANTIZATION"] = "true"
os.environ["LOCALLAB_QUANTIZATION_TYPE"] = "int4"  # More aggressive quantization
os.environ["LOCALLAB_ENABLE_ATTENTION_SLICING"] = "true"
os.environ["LOCALLAB_ENABLE_CPU_OFFLOADING"] = "true"  # Offload some layers to CPU
```
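Since these profiles differ only in their environment variables, you may prefer a small helper that applies a whole profile at once. This is a convenience sketch, not a LocalLab API:

```python
import os

# Hypothetical helper: apply a profile of LocalLab environment variables in one call
def apply_profile(profile: dict) -> None:
    for key, value in profile.items():
        os.environ[key] = value

LARGE_MODEL_PROFILE = {
    "HUGGINGFACE_MODEL": "meta-llama/Llama-2-13b-chat-hf",
    "LOCALLAB_ENABLE_QUANTIZATION": "true",
    "LOCALLAB_QUANTIZATION_TYPE": "int4",
    "LOCALLAB_ENABLE_ATTENTION_SLICING": "true",
    "LOCALLAB_ENABLE_CPU_OFFLOADING": "true",
}
apply_profile(LARGE_MODEL_PROFILE)
```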
## Troubleshooting

### Out of Memory Errors

If you encounter GPU out-of-memory errors:

1. Use a smaller model
2. Enable more aggressive memory optimizations:
   ```python
   os.environ["LOCALLAB_QUANTIZATION_TYPE"] = "int4"
   os.environ["LOCALLAB_ENABLE_CPU_OFFLOADING"] = "true"
   ```
3. Reduce the max length:
   ```python
   os.environ["LOCALLAB_MAX_LENGTH"] = "1024"  # Default is 2048
   ```
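To judge how close you are to the limit before picking an optimization, you can check free GPU memory directly with standard PyTorch:

```python
import torch

# Report free vs. total GPU memory to guide which optimizations to enable
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"Free: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")
```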
### Connection Issues

If you cannot connect to the ngrok URL:

1. Ensure your ngrok token is correct
2. Check that Colab hasn't timed out (Colab sessions expire after idle periods)
3. Try a different region:
   ```python
   os.environ["LOCALLAB_NGROK_REGION"] = "eu"  # Options: us, eu, ap, au, sa, jp, in
   ```
## See Also

- [Environment Variables Documentation](../guides/environment_variables.md)
- [Performance Guide](../features/performance.md)
- [Memory Monitoring](../features/memory.md)
