InfiniTensor
diff --git a/Cargo.lock b/Cargo.lock
Lines changed: 26 additions & 44 deletions
diff --git a/BABYSITTER_README.md b/BABYSITTER_README.md
Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
+# InfiniLM Service Babysitter
+
+This directory contains scripts to automatically restart the InfiniLM service when it crashes or panics.
+
+## Files
+
+- `babysitter.py` - Python script that monitors and restarts the service
+- `start_service.sh` - Shell script wrapper for easy service startup
+- `infinilm.service` - Systemd service file for production deployment
+
+## Quick Start
+
+### Method 1: Using the shell script (Recommended)
+
+```bash
+# Start with default settings (port 5000, service.toml)
+./start_service.sh
+
+# Start on a different port
+./start_service.sh -p 8080 service.toml
+
+# Allow more restart attempts
+./start_service.sh --max-restarts 20 service.toml
+
+# Custom restart delay
+./start_service.sh --restart-delay 10 service.toml
+```
+
+### Method 2: Using the Python script directly
+
+```bash
+# Basic usage
+python3 babysitter.py service.toml
+
+# With custom options
+python3 babysitter.py --port 8080 --max-restarts 15 --restart-delay 10 service.toml
+```
+
+### Method 3: Systemd service (Production)
+
+```bash
+# Copy the service file
+sudo cp infinilm.service /etc/systemd/system/
+
+# Reload systemd
+sudo systemctl daemon-reload
+
+# Enable and start the service
+sudo systemctl enable infinilm.service
+sudo systemctl start infinilm.service
+
+# Check status
+sudo systemctl status infinilm.service
+
+# View logs
+sudo journalctl -u infinilm.service -f
+```
+
+## Features
+
+- **Automatic Restart**: Restarts the service when it crashes or panics
+- **Configurable Limits**: Set the maximum number of restart attempts (`--max-restarts`) and the delay between restarts (`--restart-delay`)
+- **Logging**: Logs to both `babysitter.log` and the console
+- **Graceful Shutdown**: Handles SIGINT and SIGTERM
+- **Real-time Output**: Shows service output as it is produced
+- **Error Handling**: Robust error handling and recovery
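
The restart behaviour listed above can be sketched as a small supervision loop. This is a minimal illustration and not the code in `babysitter.py`: the function name `supervise`, its return value, and the choice to stop on a clean exit (code 0) are assumptions made for the example.

```python
import subprocess
import time


def supervise(cmd, max_restarts=10, restart_delay=5.0):
    """Run `cmd` and restart it after every non-zero exit.

    Hypothetical helper: stops when the command exits cleanly or when
    `max_restarts` restarts have been used. Returns the restart count.
    """
    restarts = 0
    while True:
        # Child output goes straight to our stdout/stderr, so it is
        # visible in real time.
        returncode = subprocess.call(cmd)
        if returncode == 0:
            return restarts  # clean exit: nothing to restart (assumption)
        if restarts >= max_restarts:
            return restarts  # restart budget exhausted, give up
        restarts += 1
        time.sleep(restart_delay)
```

With the defaults, a service that crashes on every launch is started 11 times in total (the initial run plus 10 restarts) before the loop gives up.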
+
+## Configuration Options
+
+### Babysitter Options
+
+- `--port`: Port to run the service on (default: 5000)
+- `--max-restarts`: Maximum number of restart attempts (default: 10)
+- `--restart-delay`: Delay between restarts in seconds (default: 5)
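
For reference, the three options and their documented defaults could be declared with `argparse` roughly as below. This is a sketch only: the positional argument name `config`, the `build_parser` helper, and the use of `float` for the delay are assumptions, and the real `babysitter.py` may define its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="InfiniLM service babysitter")
    # Positional config file, e.g. service.toml (name `config` is assumed).
    parser.add_argument("config", help="path to the service config file")
    parser.add_argument("--port", type=int, default=5000,
                        help="port to run the service on")
    parser.add_argument("--max-restarts", type=int, default=10,
                        help="maximum number of restart attempts")
    parser.add_argument("--restart-delay", type=float, default=5,
                        help="delay between restarts in seconds")
    return parser
```

`argparse` exposes the dashed flags as `args.max_restarts` and `args.restart_delay`.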
+
+### Service Configuration
+
+The service uses the same configuration as the original `cargo run service` command:
+
+- Model paths and GPU assignments
+- Sampling parameters (temperature, top-p, etc.)
+- Token limits
+- Blacklist settings
+- **New**: `max-sessions` parameter to limit concurrent connections
+
+## Logging
+
+The babysitter creates a `babysitter.log` file in the current directory with detailed logs including:
+
+- Service start/stop events
+- Restart attempts and reasons
+- Service output
+- Error messages
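
Logging to a file and the console at the same time is commonly done by attaching two handlers to one logger. The sketch below assumes a logger named `babysitter` and a simple timestamped format; both are illustrative choices, not details taken from `babysitter.py`.

```python
import logging
import sys


def make_logger(log_path="babysitter.log"):
    """Return a logger that writes every record to a file and to stdout."""
    logger = logging.getLogger("babysitter")  # logger name is an assumption
    logger.setLevel(logging.INFO)
    logger.propagate = False   # do not also emit through the root logger
    logger.handlers.clear()    # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.FileHandler(log_path),
                    logging.StreamHandler(sys.stdout)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

A call such as `make_logger().info("service started")` then appears both in `babysitter.log` and on the terminal.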
+
+## Troubleshooting
+
+### Service won't start
+
+1. Check if the config file exists and is valid
+2. Verify that the `xtask` directory exists
+3. Ensure you have the required dependencies (Python 3, Rust, CUDA)
+4. Check the `babysitter.log` file for detailed error messages
+
+### Service keeps restarting
+
+1. Check the service logs for the root cause of crashes
+2. Verify GPU memory availability
+3. Check if the model files are accessible
+4. Review the `max-sessions` setting if you're hitting connection limits
+
+### Performance issues
+
+1. Adjust the `max-sessions` parameter in your config file
+2. Monitor GPU memory usage
+3. Consider reducing batch sizes or model parameters
+
+## Example Configuration
+
+Add the `max-sessions` parameter to your `service.toml`:
+
+```toml
+[FM9G-7B]
+path = "/root/zenghua/fm9g-7B-sft-v0.0-F16.gguf"
+gpus = [1]
+max-tokens = 32768
+temperature = 0.7
+top-p = 0.7
+repetition-penalty = 1.02
+max-sessions = 5  # Limit to 5 concurrent sessions
+
+[Qwen3-32B]
+path = "/root/zenghua/Qwen3-32B-F16.gguf"
+gpus = [4, 5, 6, 7]
+max-tokens = 32768
+temperature = 0.6
+top-p = 0.95
+top-k = 20
+repetition-penalty = 1.02
+think = false
+max-sessions = 3  # Limit to 3 concurrent sessions for larger model
+```
+
+## Monitoring
+
+### Check service status
+
+```bash
+# If using systemd
+sudo systemctl status infinilm.service
+
+# Check babysitter logs
+tail -f babysitter.log
+
+# Check service logs
+sudo journalctl -u infinilm.service -f
+```
+
+### Monitor resource usage
+
+```bash
+# Check GPU usage
+nvidia-smi
+
+# Check memory usage
+free -h
+
+# Check process status (the [x] keeps grep from matching its own process)
+ps aux | grep '[x]task'
+```
+
+## Security Considerations
+
+- The systemd service runs as root (the provided unit assumes root is needed for GPU access)
+- For production deployments, consider a dedicated non-root user if your driver setup allows GPU access without root
+- Review and adjust the security settings in `infinilm.service`
+- Monitor logs for any suspicious activity