Commit 94924f3

feat: MetaX (沐曦) deployment adaptation
Signed-off-by: Ceng23333 <[email protected]>
1 parent 7591b5a commit 94924f3

13 files changed: +685 -78 lines changed

BABYSITTER_README.md

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
# InfiniLM Service Babysitter

This directory contains scripts to automatically restart the InfiniLM service when it crashes or panics.

## Files

- `babysitter.py` - Python script that monitors and restarts the service
- `start_service.sh` - Shell script wrapper for easy service startup
- `infinilm.service` - Systemd service file for production deployment

## Quick Start

### Method 1: Using the shell script (Recommended)

```bash
# Start with default settings (port 5000, service.toml)
./start_service.sh

# Start on a different port
./start_service.sh -p 8080 service.toml

# Allow more restart attempts
./start_service.sh --max-restarts 20 service.toml

# Custom restart delay
./start_service.sh --restart-delay 10 service.toml
```

### Method 2: Using the Python script directly

```bash
# Basic usage
python3 babysitter.py service.toml

# With custom options
python3 babysitter.py --port 8080 --max-restarts 15 --restart-delay 10 service.toml
```

### Method 3: Systemd service (Production)

```bash
# Copy the service file
sudo cp infinilm.service /etc/systemd/system/

# Reload systemd
sudo systemctl daemon-reload

# Enable and start the service
sudo systemctl enable infinilm.service
sudo systemctl start infinilm.service

# Check status
sudo systemctl status infinilm.service

# View logs
sudo journalctl -u infinilm.service -f
```

## Features

- **Automatic Restart**: Restarts the service when it crashes or panics
- **Configurable Limits**: Set the maximum number of restart attempts and the delay between restarts
- **Logging**: Logs to both a file (`babysitter.log`) and the console
- **Graceful Shutdown**: Handles SIGINT and SIGTERM signals properly
- **Real-time Output**: Shows service output in real-time
- **Error Handling**: Robust error handling and recovery
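
The restart behaviour listed above can be illustrated with a minimal sketch. This is not the code in `babysitter.py`; the function name `supervise` and its parameters are hypothetical, and signal handling and log capture are left out to keep the loop readable. It assumes that a non-zero exit code counts as a crash.

```python
import subprocess
import time


def supervise(cmd, max_restarts=10, restart_delay=5, sleep=time.sleep):
    """Run cmd and restart it after each non-zero exit, up to max_restarts times.

    Returns the number of restarts that were performed.
    (Hypothetical helper for illustration, not the real babysitter code.)
    """
    restarts = 0
    while True:
        # The child inherits stdout/stderr, so its output appears in real time.
        exit_code = subprocess.call(cmd)
        if exit_code == 0:
            return restarts  # clean exit: nothing to restart
        if restarts >= max_restarts:
            return restarts  # restart budget exhausted: give up
        restarts += 1
        sleep(restart_delay)  # wait before the next attempt
```

Passing `sleep` as a parameter is only a convenience for trying the loop without real delays.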
67+
68+
## Configuration Options
69+
70+
### Babysitter Options
71+
72+
- `--port`: Port to run the service on (default: 5000)
73+
- `--max-restarts`: Maximum number of restart attempts (default: 10)
74+
- `--restart-delay`: Delay between restarts in seconds (default: 5)
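
As a rough sketch of how these options and defaults could be declared with `argparse`: the flag names and defaults come from the list above, while the function name `build_parser`, the positional `config` argument, and the help strings are assumptions rather than the actual `babysitter.py` source.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Restart the service when it exits unexpectedly"
    )
    # Assumed positional argument, matching `babysitter.py service.toml`.
    parser.add_argument("config", help="path to the service config file")
    parser.add_argument("--port", type=int, default=5000,
                        help="port to run the service on")
    parser.add_argument("--max-restarts", type=int, default=10,
                        help="maximum number of restart attempts")
    parser.add_argument("--restart-delay", type=int, default=5,
                        help="delay between restarts in seconds")
    return parser
```

With this sketch, `build_parser().parse_args(["--port", "8080", "service.toml"])` yields `port=8080` and keeps the other defaults.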
75+
76+
### Service Configuration
77+
78+
The service uses the same configuration as the original `cargo run service` command:
79+
80+
- Model paths and GPU assignments
81+
- Sampling parameters (temperature, top-p, etc.)
82+
- Token limits
83+
- Blacklist settings
84+
- **New**: `max-sessions` parameter to limit concurrent connections
85+
86+
## Logging
87+
88+
The babysitter creates a `babysitter.log` file in the current directory with detailed logs including:
89+
90+
- Service start/stop events
91+
- Restart attempts and reasons
92+
- Service output
93+
- Error messages
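
Logging to both a file and the console can be set up with Python's standard `logging` module, roughly as below. The logger name, the format string, and the `make_logger` helper are assumptions for illustration; only the `babysitter.log` file name comes from this README.

```python
import logging
import sys


def make_logger(log_path="babysitter.log"):
    """Return a logger that writes each record to log_path and to stdout."""
    logger = logging.getLogger("babysitter")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called more than once
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.FileHandler(log_path), logging.StreamHandler(sys.stdout)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

A call such as `make_logger().info("service started")` then prints the record and appends it to `babysitter.log` in the current directory.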
94+
95+
## Troubleshooting
96+
97+
### Service won't start
98+
99+
1. Check if the config file exists and is valid
100+
2. Verify that the `xtask` directory exists
101+
3. Ensure you have the required dependencies (Python 3, Rust, CUDA)
102+
4. Check the `babysitter.log` file for detailed error messages
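
The first three checks can be automated with a small pre-flight script. The sketch below is a hypothetical helper, not part of this commit: it assumes the `xtask` directory sits directly under the project root and that `python3` and `cargo` are the tools to look for on `PATH`. It does not validate the contents of the config file or the GPU runtime.

```python
import shutil
from pathlib import Path


def preflight(config_path, project_root=".", tools=("python3", "cargo")):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if not Path(config_path).is_file():
        problems.append(f"config file not found: {config_path}")
    if not (Path(project_root) / "xtask").is_dir():
        problems.append(f"xtask directory not found under: {project_root}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Printing the result of `preflight("service.toml")` before starting the babysitter gives a quick list of what is missing.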
103+
104+
### Service keeps restarting
105+
106+
1. Check the service logs for the root cause of crashes
107+
2. Verify GPU memory availability
108+
3. Check if the model files are accessible
109+
4. Review the `max-sessions` setting if you're hitting connection limits
110+
111+
### Performance issues
112+
113+
1. Adjust the `max-sessions` parameter in your config file
114+
2. Monitor GPU memory usage
115+
3. Consider reducing batch sizes or model parameters

## Example Configuration

Add the `max-sessions` parameter to your `service.toml`:

```toml
[FM9G-7B]
path = "/root/zenghua/fm9g-7B-sft-v0.0-F16.gguf"
gpus = [1]
max-tokens = 32768
temperature = 0.7
top-p = 0.7
repetition-penalty = 1.02
max-sessions = 5  # Limit to 5 concurrent sessions

[Qwen3-32B]
path = "/root/zenghua/Qwen3-32B-F16.gguf"
gpus = [4, 5, 6, 7]
max-tokens = 32768
temperature = 0.6
top-p = 0.95
top-k = 20
repetition-penalty = 1.02
think = false
max-sessions = 3  # Limit to 3 concurrent sessions for larger model
```
142+
143+
## Monitoring
144+
145+
### Check service status
146+
147+
```bash
148+
# If using systemd
149+
sudo systemctl status infinilm.service
150+
151+
# Check babysitter logs
152+
tail -f babysitter.log
153+
154+
# Check service logs
155+
sudo journalctl -u infinilm.service -f
156+
```
157+
158+
### Monitor resource usage
159+
160+
```bash
161+
# Check GPU usage
162+
nvidia-smi
163+
164+
# Check memory usage
165+
free -h
166+
167+
# Check process status
168+
ps aux | grep xtask
169+
```
170+
171+
## Security Considerations
172+
173+
- The systemd service runs as root (required for GPU access)
174+
- Consider using a dedicated user for production deployments
175+
- Review and adjust the security settings in `infinilm.service`
176+
- Monitor logs for any suspicious activity

Cargo.lock

Lines changed: 26 additions & 44 deletions