REST API
Manage your SSH configurations, enclaves, security rules, and sessions programmatically using a well-documented REST API.

Self-Healing System
Automatically detects, analyzes, and repairs system errors through intelligent coding agents. Configure patching policies (immediate, off-hours, or never) per pod/service, with built-in security analysis that holds security-sensitive errors for manual review instead of healing them automatically. When configured, the system can automatically create GitHub pull requests with fixes.

Custom SSH Server responds via Sentrius UI or terminals

All JIRA requests are authenticated through Keycloak and validated against the user's permissions.

## Self-Healing System

Sentrius includes an intelligent self-healing system that automatically detects, analyzes, and repairs errors in your infrastructure.

### Key Features

- **Automatic Error Detection**: Monitors the `error_output` table and OpenTelemetry data for system errors
- **Security Analysis**: Automatically analyzes errors to determine whether they pose security concerns before attempting repairs
- **Flexible Patching Policies**: Configure per-pod/service policies for when repairs should be applied:
  - **Immediate**: Apply fixes as soon as errors are detected
  - **Off-Hours**: Queue fixes to apply during configured maintenance windows (default: 10 PM to 6 AM)
  - **Never**: Disable self-healing for critical services that require manual intervention
- **Coding Agent Deployment**: Automatically launches isolated coding agent pods to analyze errors and generate fixes
- **Docker Image Building**: Spins up Kubernetes Jobs that use Kaniko to build and push Docker images containing the fixes
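The three patching policies reduce to a small decision per detected error. The sketch below is a minimal illustration, not the Sentrius implementation: the `PatchingPolicy` enum, its string values, and the `in_window`/`decide` functions are hypothetical. It assumes the default 10 PM to 6 AM window, which wraps past midnight and therefore needs a two-sided comparison.

```python
from datetime import datetime, time
from enum import Enum


class PatchingPolicy(Enum):
    """Hypothetical names for the three documented policies."""
    IMMEDIATE = "immediate"
    OFF_HOURS = "off-hours"
    NEVER = "never"


def in_window(now: time, start: time = time(22, 0), end: time = time(6, 0)) -> bool:
    """True if `now` falls in [start, end); the end is treated as exclusive (an assumption)."""
    if start <= end:
        return start <= now < end
    # The window wraps past midnight (e.g. 22:00 to 06:00).
    return now >= start or now < end


def decide(policy: PatchingPolicy, now: datetime) -> str:
    """Return 'apply', 'queue', or 'skip' for a detected error under the given policy."""
    if policy is PatchingPolicy.NEVER:
        return "skip"
    if policy is PatchingPolicy.IMMEDIATE:
        return "apply"
    # OFF_HOURS: apply inside the maintenance window, otherwise queue until it opens.
    return "apply" if in_window(now.time()) else "queue"
```

With this sketch, an `OFF_HOURS` error detected at 23:30 is applied right away, while one detected at 14:00 is queued until the window opens.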

```yaml
# ...
enabled: false # Auto-enabled if GitHub integration exists
apiUrl: "https://api.github.com"
owner: ""
repo: ""
```

**Important**: Self-healing requires a GitHub integration to be configured in the integration tokens table. The system automatically checks whether a GitHub token exists and proceeds only if one is configured. To add a GitHub integration token, open **Integration Settings** in the UI and add a token with `connectionType: "github"`.
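The GitHub prerequisite acts as a gate that runs before any healing work starts. A minimal sketch, assuming integration tokens can be read as simple records with a `connectionType` field; `find_github_token` and `GitHubIntegrationMissing` are hypothetical names, not part of a documented Sentrius API.

```python
from typing import Iterable, Mapping


class GitHubIntegrationMissing(Exception):
    """Hypothetical error raised when no GitHub integration token is configured."""


def find_github_token(tokens: Iterable[Mapping[str, str]]) -> Mapping[str, str]:
    """Return the first token record whose connectionType is "github".

    Raising instead of returning None forces callers to stop before launching
    an agent, mirroring the "only proceed if configured" rule above.
    """
    for token in tokens:
        if token.get("connectionType") == "github":
            return token
    raise GitHubIntegrationMissing(
        "Self-healing requires a GitHub integration token; "
        "add one with connectionType \"github\" in Integration Settings."
    )
```

A scheduled scan or a manual trigger would call a check like this first and surface the exception message to the user.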

### Viewing Healing Sessions

Monitor active and completed healing sessions:

1. Navigate to **Self-Healing Sessions** (`/sso/v1/self-healing/sessions`)
2. Filter by status: All, Active, or Completed
3. View detailed information about each session, including:
   - Agent activity and logs
   - Security analysis results
   - Docker build status
   - GitHub PR links (if created)
   - Error details and resolution

### How It Works

The self-healing workflow consists of the following automated steps:

1. **Error Detection**: The system scans the `error_output` table every 5 minutes for new errors
2. **Policy Check**: Determines whether healing is enabled for the affected pod and checks its patching policy
3. **Security Analysis**: Scans error logs for security-related keywords; flagged errors are held for manual review
4. **Agent Launch**: If the error is not flagged as a security concern, launches a coding agent pod to analyze and fix it
5. **Code Repair**: The coding agent examines the error, generates fixes, and commits the changes
6. **Docker Build**: Creates a Kubernetes Job that uses Kaniko to build a new Docker image with the fixes
7. **GitHub PR**: If configured, creates a pull request with the changes
8. **Completion**: Updates the healing session with results and status
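The ordering of these steps matters because each gate can end a session early. The following sketch models steps 2 through 8 for a single error, with injected callables standing in for the real policy lookup, security analysis, agent pod, Kaniko Job, and GitHub client. `HealingSession`, `run_workflow`, and the status strings are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class HealingSession:
    """Hypothetical in-memory stand-in for a row in self_healing_session."""
    error_id: str
    status: str = "PENDING"
    completed_steps: List[str] = field(default_factory=list)
    pr_url: Optional[str] = None


def run_workflow(
    session: HealingSession,
    healing_allowed: Callable[[str], bool],       # step 2: policy check
    is_security_concern: Callable[[str], bool],   # step 3: security analysis
    launch_agent: Callable[[str], str],           # steps 4-5: returns a commit reference
    build_image: Callable[[str], bool],           # step 6: did the build Job succeed?
    open_pr: Optional[Callable[[str], str]] = None,  # step 7: returns a PR URL
) -> HealingSession:
    if not healing_allowed(session.error_id):
        session.status = "SKIPPED"
        return session
    session.completed_steps.append("policy")

    if is_security_concern(session.error_id):
        session.status = "NEEDS_REVIEW"  # held for manual review; no agent launched
        return session
    session.completed_steps.append("security")

    commit = launch_agent(session.error_id)
    session.completed_steps.append("agent")

    if not build_image(commit):
        session.status = "FAILED"
        return session
    session.completed_steps.append("build")

    if open_pr is not None:
        session.pr_url = open_pr(commit)
        session.completed_steps.append("pr")

    # Step 8: record the final outcome on the session.
    session.status = "COMPLETED"
    return session
```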

The entire workflow is asynchronous and can handle multiple concurrent healing sessions.
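Because sessions are independent, they can run side by side. A minimal `asyncio` sketch, assuming a cap on simultaneous sessions (the README does not state one, so `max_concurrent` is a hypothetical parameter) and using `asyncio.sleep` as a stand-in for agent and build time:

```python
import asyncio
from typing import Dict, List


async def heal(error_id: str, limit: asyncio.Semaphore) -> Dict[str, str]:
    """Placeholder for one healing session; the sleep stands in for agent and build work."""
    async with limit:  # at most max_concurrent sessions hold the semaphore at once
        await asyncio.sleep(0.01)
        return {"error_id": error_id, "status": "COMPLETED"}


async def heal_all(error_ids: List[str], max_concurrent: int = 3) -> List[Dict[str, str]]:
    limit = asyncio.Semaphore(max_concurrent)
    # gather runs the sessions concurrently and returns results in input order.
    return list(await asyncio.gather(*(heal(e, limit) for e in error_ids)))


if __name__ == "__main__":
    for result in asyncio.run(heal_all(["e1", "e2", "e3", "e4"])):
        print(result)
```

The semaphore keeps a burst of new errors from launching an unbounded number of agent pods at once, while `gather` still lets the permitted sessions overlap.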

### Security Considerations

The self-healing system includes built-in safety mechanisms:

- **GitHub Integration Required**: Self-healing proceeds only if a GitHub integration token is configured in the system. This ensures all fixes can be tracked via pull requests.
- **Security Analysis**: Errors containing security-related keywords (authentication, authorization, vulnerability, etc.) are flagged and require manual review before healing proceeds
- **Visibility Restriction**: Security-flagged errors are hidden from general users until cleared by administrators
- **Audit Trail**: All healing attempts are logged and tracked in the `self_healing_session` table
- **Isolated Execution**: Healing agents run in isolated Kubernetes pods with limited permissions
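The keyword screen can be sketched as a case-insensitive substring check. This is an illustration under stated assumptions: the README names only authentication, authorization, and vulnerability (followed by "etc."), so `SECURITY_KEYWORDS` is deliberately incomplete, and `security_hits`/`requires_manual_review` are hypothetical names.

```python
from typing import Iterable, List

# Illustrative subset only; the full keyword list is not documented in this README.
SECURITY_KEYWORDS = ("authentication", "authorization", "vulnerability")


def security_hits(error_text: str, keywords: Iterable[str] = SECURITY_KEYWORDS) -> List[str]:
    """Return every keyword found in the error text (case-insensitive substring match)."""
    text = error_text.lower()
    return [kw for kw in keywords if kw.lower() in text]


def requires_manual_review(error_text: str) -> bool:
    """Flag the error for manual review if any security keyword appears."""
    return bool(security_hits(error_text))
```

Substring matching errs toward flagging: `AuthorizationException` is caught even though it is not a standalone word, which suits a gate whose failure mode should be an extra manual review rather than an unreviewed fix.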

### Manual Triggering

You can manually trigger self-healing for specific errors (requires GitHub integration to be configured):

1. Navigate to **Error Logs** (`/sso/v1/notifications/error/log/get`)
2. Click **Trigger Self-Healing** on any error
3. Monitor progress in the **Self-Healing Sessions** view

Or via the API:

```bash
curl -X POST http://localhost:8080/api/v1/self-healing/trigger/{errorId} \
  -H "Authorization: Bearer <TOKEN>"
```

**Note**: If GitHub integration is not configured, the trigger fails with a message prompting you to add a GitHub integration token first.
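For scripts that cannot shell out to `curl`, the same call can be made with Python's standard library. This sketch reuses the endpoint and bearer-token header from the `curl` example; the helper names are hypothetical, and it assumes (this README does not confirm it) that a rejected trigger, such as one without GitHub integration, comes back as a non-2xx HTTP status.

```python
import urllib.error
import urllib.request


def build_trigger_request(base_url: str, error_id: str, token: str) -> urllib.request.Request:
    """Build the POST request for /api/v1/self-healing/trigger/{errorId}."""
    url = f"{base_url.rstrip('/')}/api/v1/self-healing/trigger/{error_id}"
    return urllib.request.Request(url, method="POST", headers={"Authorization": f"Bearer {token}"})


def trigger_self_healing(base_url: str, error_id: str, token: str) -> str:
    """Send the trigger request and return the response body as text.

    An HTTP error is re-raised with the server's message so that a failure
    (e.g. missing GitHub integration) is not silently swallowed.
    """
    request = build_trigger_request(base_url, error_id, token)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"Trigger failed with HTTP {err.code}: {detail}") from err
```

For example, `trigger_self_healing("http://localhost:8080", "42", token)` returns the response body, or raises `RuntimeError` carrying the server's message if the trigger is rejected.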

### Database Schema

The self-healing system uses three main tables:

- `self_healing_config`: Stores patching policies per pod/service
- `self_healing_session`: Tracks each healing attempt and its status
- `error_output`: Extended with healing status and security analysis fields

## Custom Agents

Sentrius supports both Java and Python-based custom agents that can extend the platform's functionality for monitoring, automation, and user assistance.