Event Schema
This document describes the unified event system used throughout Stark Orchestrator for debugging, UI timelines, audits, and real-time communication.
Events are first-class data in Stark Orchestrator. Unlike simple logs, events are:
- Structured — Consistent schema for all event types
- Queryable — Filter by category, severity, resource, namespace, time range
- Persistent — Stored in database for audit trails and debugging
- Real-time — Streamed via WebSocket for live updates
- Actionable — Used for UI timelines, alerting, and automation
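To make the "Queryable" property concrete, here is a minimal in-memory sketch of filtering events by category, severity, namespace, and time. `MiniEvent`, `MiniFilter`, and `filterEvents` are hypothetical names invented for this illustration; they are not part of Stark Orchestrator, and the real query layer runs against the database.

```typescript
// Hypothetical, trimmed-down event shape used only for this illustration.
type Severity = 'info' | 'warning' | 'error' | 'critical';

interface MiniEvent {
  eventType: string;
  category: string;
  severity: Severity;
  namespace?: string;
  timestamp: Date;
}

// Hypothetical filter: every field is optional, unset fields match everything.
interface MiniFilter {
  category?: string;
  severity?: Severity[];
  namespace?: string;
  since?: Date;
}

// Keeps the events that satisfy every filter field that is set.
function filterEvents(events: MiniEvent[], filter: MiniFilter): MiniEvent[] {
  return events.filter((e) => {
    if (filter.category !== undefined && e.category !== filter.category) return false;
    if (filter.severity !== undefined && filter.severity.indexOf(e.severity) === -1) return false;
    if (filter.namespace !== undefined && e.namespace !== filter.namespace) return false;
    if (filter.since !== undefined && e.timestamp.getTime() < filter.since.getTime()) return false;
    return true;
  });
}

const sampleEvents: MiniEvent[] = [
  { eventType: 'PodFailed', category: 'pod', severity: 'error', namespace: 'production', timestamp: new Date('2026-02-03T09:30:00Z') },
  { eventType: 'PodScheduled', category: 'pod', severity: 'info', namespace: 'production', timestamp: new Date('2026-02-03T09:45:00Z') },
  { eventType: 'NodeLost', category: 'node', severity: 'warning', timestamp: new Date('2026-02-03T10:00:15Z') },
  { eventType: 'PodFailed', category: 'pod', severity: 'error', namespace: 'production', timestamp: new Date('2026-02-03T08:00:00Z') },
];

const podErrors = filterEvents(sampleEvents, {
  category: 'pod',
  severity: ['error', 'critical'],
  namespace: 'production',
  since: new Date('2026-02-03T09:00:00Z'),
});

console.log(podErrors.map((e) => e.eventType)); // one match: the 09:30 PodFailed
```

All set filter fields are combined with AND, while the severity list acts as an OR within that field.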
| Category | Example Events |
|---|---|
| Pod | PodScheduled, PodFailed, PodRestarted, PodEvicted |
| Node | NodeLost, NodeRecovered, NodeRegistered, NodeDraining |
| Service | ServiceScaled, ServiceRollback, ServiceFailed |
| Pack | PackPublished, PackVersionDeprecated |
| System | ClusterStarted, ConfigChanged |
| Auth | UserLoggedIn, PermissionDenied |
| Secret | SecretCreated, SecretUpdated, SecretDeleted, SecretInjected |
| Scheduler | SchedulingCycleCompleted, NoNodesAvailable |
| Ephemeral | PodJoinedGroup, PodLeftGroup, PodGroupCreated, PodGroupDissolved |
All events follow a consistent structure stored in the events table:
interface StarkEvent {
// Identification
id: string; // Unique event ID
eventType: string; // e.g., 'PodScheduled', 'NodeLost'
category: EventCategory; // 'pod' | 'node' | 'pack' | 'service' | 'system' | 'auth' | 'secret' | 'scheduler'
severity: EventSeverity; // 'info' | 'warning' | 'error' | 'critical'
// Resource (polymorphic)
resourceId?: string; // Primary resource ID
resourceType?: string; // 'pod', 'node', 'pack', etc.
resourceName?: string; // Human-readable name
namespace?: string; // Namespace context
// Actor
actorId?: string; // Who caused the event
actorType: EventActorType; // 'user' | 'system' | 'scheduler' | 'node'
// Details
reason?: string; // Short reason code
message?: string; // Human-readable description
previousState?: object; // State before event
newState?: object; // State after event
// Related resource
relatedResourceId?: string; // e.g., node for pod events
relatedResourceType?: string;
relatedResourceName?: string;
// Metadata
metadata: Record<string, unknown>;
source: EventSource; // 'server' | 'node' | 'client' | 'scheduler'
correlationId?: string; // For tracing
timestamp: Date;
}

| Level | Description | Examples |
|---|---|---|
| info | Normal operations | PodCreated, NodeRegistered, PodStarted |
| warning | Degraded state, attention needed | PodEvicted, NodeDraining, NodeLost |
| error | Failure occurred | PodFailed, ScheduleFailed, ServiceFailed |
| critical | Immediate attention required | Cluster-wide failures, data loss risks |
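The severity levels above read as an ordered scale. The sketch below assumes that ordering (info < warning < error < critical) and shows how a threshold such as "error or worse" could be expanded into the list of levels to filter on. `SEVERITY_ORDER`, `meetsThreshold`, and `severitiesAtOrAbove` are hypothetical helpers, not part of the Stark Orchestrator API.

```typescript
type EventSeverity = 'info' | 'warning' | 'error' | 'critical';

// Assumed ordering, lowest to highest, following the table above.
const SEVERITY_ORDER: EventSeverity[] = ['info', 'warning', 'error', 'critical'];

// Position of a severity on the assumed scale (0 = info, 3 = critical).
function severityRank(severity: EventSeverity): number {
  return SEVERITY_ORDER.indexOf(severity);
}

// True when `severity` is at least as severe as `threshold`.
function meetsThreshold(severity: EventSeverity, threshold: EventSeverity): boolean {
  return severityRank(severity) >= severityRank(threshold);
}

// All levels at or above the threshold, e.g. to build a severity filter.
function severitiesAtOrAbove(threshold: EventSeverity): EventSeverity[] {
  return SEVERITY_ORDER.slice(severityRank(threshold));
}

console.log(severitiesAtOrAbove('error')); // [ 'error', 'critical' ]
console.log(meetsThreshold('warning', 'error')); // false
```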
Emitted when a new pod is created.
{
"eventType": "PodCreated",
"category": "pod",
"severity": "info",
"resourceId": "pod-uuid",
"resourceType": "pod",
"resourceName": "my-app-abc123",
"namespace": "default",
"reason": "PodCreated",
"message": "Pod created",
"newState": { "status": "pending" },
"metadata": {
"packName": "my-app",
"packVersion": "1.0.0"
}
}

Emitted when a pod is assigned to a node.
{
"eventType": "PodScheduled",
"category": "pod",
"severity": "info",
"resourceId": "pod-uuid",
"resourceType": "pod",
"resourceName": "my-app-abc123",
"namespace": "default",
"reason": "Scheduled",
"message": "Pod scheduled to node worker-1",
"previousState": { "status": "pending" },
"newState": { "status": "scheduled" },
"relatedResourceId": "node-uuid",
"relatedResourceType": "node",
"relatedResourceName": "worker-1"
}

Emitted when a pod fails with an error.
{
"eventType": "PodFailed",
"category": "pod",
"severity": "error",
"resourceId": "pod-uuid",
"resourceType": "pod",
"resourceName": "my-app-abc123",
"namespace": "default",
"reason": "OOMKilled",
"message": "Container killed due to memory limit",
"previousState": { "status": "running" },
"newState": { "status": "failed" },
"relatedResourceId": "node-uuid",
"relatedResourceType": "node",
"metadata": {
"exitCode": 137,
"memoryLimit": 256
}
}

Emitted when a pod is restarted.
{
"eventType": "PodRestarted",
"category": "pod",
"severity": "info",
"resourceId": "pod-uuid",
"resourceType": "pod",
"resourceName": "my-app-abc123",
"namespace": "default",
"reason": "Restarted",
"message": "Pod restarted",
"metadata": {
"restartCount": 3
}
}

Emitted when a pod is evicted from a node.
{
"eventType": "PodEvicted",
"category": "pod",
"severity": "warning",
"resourceId": "pod-uuid",
"resourceType": "pod",
"resourceName": "my-app-abc123",
"namespace": "default",
"reason": "NodeDrain",
"message": "Pod evicted due to node drain",
"previousState": { "status": "running" },
"newState": { "status": "evicted" },
"relatedResourceId": "node-uuid",
"relatedResourceType": "node"
}

- PodStarting - Pod is starting up
- PodRunning - Pod is now running
- PodStopped - Pod stopped normally
- PodRolledBack - Pod rolled back to previous version
- PodScaled - Pod replicas changed
- PodUpdated - Pod configuration updated
- PodDeleted - Pod was deleted
- PodScheduleFailed - Scheduling failed (no suitable node)
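Pod events carry `previousState`, `newState`, and `reason`, which is enough to render a one-line timeline entry. The following is a minimal sketch of such a formatter; `PodEventLike` and `describeTransition` are hypothetical names, and the output format is an arbitrary choice for illustration.

```typescript
// Hypothetical subset of the StarkEvent fields used by the formatter.
interface PodEventLike {
  eventType: string;
  resourceName?: string;
  reason?: string;
  previousState?: { status?: string };
  newState?: { status?: string };
}

// Builds a short timeline line such as "PodFailed my-app: running -> failed (OOMKilled)".
function describeTransition(event: PodEventLike): string {
  const name = event.resourceName ?? 'unknown-pod';
  const from = event.previousState?.status;
  const to = event.newState?.status;

  let change: string;
  if (from && to) {
    change = `${from} -> ${to}`;
  } else if (to) {
    change = `-> ${to}`; // creation-style events only carry the new state
  } else {
    change = 'no state change recorded';
  }

  const reason = event.reason ? ` (${event.reason})` : '';
  return `${event.eventType} ${name}: ${change}${reason}`;
}

const failed = describeTransition({
  eventType: 'PodFailed',
  resourceName: 'my-app-abc123',
  reason: 'OOMKilled',
  previousState: { status: 'running' },
  newState: { status: 'failed' },
});

console.log(failed); // PodFailed my-app-abc123: running -> failed (OOMKilled)
```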
Emitted when a node registers with the orchestrator.
{
"eventType": "NodeRegistered",
"category": "node",
"severity": "info",
"resourceId": "node-uuid",
"resourceType": "node",
"resourceName": "worker-1",
"reason": "NodeRegistered",
"message": "Node registered with orchestrator",
"newState": { "status": "ready" },
"metadata": {
"runtimeType": "node",
"labels": { "env": "production" },
"allocatable": { "cpu": 2000, "memory": 4096, "pods": 20 }
}
}

Emitted when a node stops responding (heartbeat timeout or disconnect).
{
"eventType": "NodeLost",
"category": "node",
"severity": "warning",
"resourceId": "node-uuid",
"resourceType": "node",
"resourceName": "worker-1",
"reason": "HeartbeatTimeout",
"message": "Node lost: worker-1",
"previousState": { "status": "ready" },
"newState": { "status": "offline" },
"metadata": {
"lastHeartbeat": "2026-02-03T10:00:15.000Z",
"runtimeType": "node"
}
}

Emitted when a previously offline node reconnects.
{
"eventType": "NodeRecovered",
"category": "node",
"severity": "info",
"resourceId": "node-uuid",
"resourceType": "node",
"resourceName": "worker-1",
"reason": "HeartbeatRestored",
"message": "Node recovered: worker-1",
"previousState": { "status": "offline" },
"newState": { "status": "ready" }
}

- NodeReady - Node is ready to accept pods
- NodeDraining - Node is being drained
- NodeDrained - Node drain completed
- NodeCordoned - Node marked unschedulable
- NodeUncordoned - Node marked schedulable again
- NodeDeleted - Node was removed
- NodeResourcePressure - Node has resource pressure
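Because NodeLost and NodeRecovered bracket an outage, pairing them gives a rough downtime figure per node. The sketch below assumes each event exposes `resourceId` and `timestamp` as in the StarkEvent interface; `NodeEventLike` and `downtimeByNode` are hypothetical names, and outages that have not yet recovered are simply left out.

```typescript
// Hypothetical subset of the StarkEvent fields needed for the calculation.
interface NodeEventLike {
  eventType: string;
  resourceId: string;
  timestamp: Date;
}

// Sums, per node, the milliseconds between a NodeLost and the next NodeRecovered.
function downtimeByNode(events: NodeEventLike[]): Record<string, number> {
  const sorted = events
    .slice()
    .sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());

  const lostAt: Record<string, number> = {};
  const totals: Record<string, number> = {};

  for (const event of sorted) {
    const id = event.resourceId;
    const time = event.timestamp.getTime();
    const start = lostAt[id];

    if (event.eventType === 'NodeLost') {
      // Keep the first loss of an outage; repeated NodeLost events do not reset it.
      if (start === undefined) lostAt[id] = time;
    } else if (event.eventType === 'NodeRecovered' && start !== undefined) {
      totals[id] = (totals[id] ?? 0) + (time - start);
      delete lostAt[id];
    }
  }
  return totals;
}

const nodeDowntime = downtimeByNode([
  { eventType: 'NodeLost', resourceId: 'node-1', timestamp: new Date('2026-02-03T10:00:15.000Z') },
  { eventType: 'NodeRecovered', resourceId: 'node-1', timestamp: new Date('2026-02-03T10:05:15.000Z') },
  { eventType: 'NodeLost', resourceId: 'node-2', timestamp: new Date('2026-02-03T10:01:00.000Z') },
]);

console.log(nodeDowntime); // { 'node-1': 300000 }
```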
{
"eventType": "ServiceCreated",
"category": "service",
"severity": "info",
"resourceId": "service-uuid",
"resourceType": "service",
"resourceName": "my-service",
"namespace": "default",
"reason": "Created",
"message": "Service created with 3 replicas"
}

{
"eventType": "ServiceRollback",
"category": "service",
"severity": "warning",
"resourceId": "service-uuid",
"resourceType": "service",
"resourceName": "my-service",
"namespace": "default",
"reason": "ConsecutiveFailures",
"message": "Service rolled back from v1.2.0 to v1.1.0",
"previousState": { "version": "1.2.0" },
"newState": { "version": "1.1.0" },
"metadata": {
"failureCount": 3
}
}

Emitted when a new secret is created. Secret values are never included in event data.
{
"eventType": "SecretCreated",
"category": "secret",
"severity": "info",
"resourceId": "secret-uuid",
"resourceType": "secret",
"resourceName": "db-creds",
"namespace": "default",
"reason": "Created",
"message": "Secret created (opaque, 2 keys)",
"metadata": {
"secretType": "opaque",
"keyCount": 2,
"injectionMode": "env"
}
}

{
"eventType": "SecretUpdated",
"category": "secret",
"severity": "info",
"resourceId": "secret-uuid",
"resourceType": "secret",
"resourceName": "db-creds",
"namespace": "default",
"reason": "Updated",
"message": "Secret updated (2 keys changed)",
"metadata": {
"keyCount": 2
}
}

{
"eventType": "SecretDeleted",
"category": "secret",
"severity": "info",
"resourceId": "secret-uuid",
"resourceType": "secret",
"resourceName": "db-creds",
"namespace": "default",
"reason": "Deleted",
"message": "Secret deleted"
}

Emitted when secrets are resolved and injected into a pod. Lists secret names but never values.
{
"eventType": "SecretInjected",
"category": "secret",
"severity": "info",
"resourceId": "pod-uuid",
"resourceType": "pod",
"resourceName": "my-app-abc123",
"namespace": "default",
"reason": "SecretsInjected",
"message": "2 secrets injected into pod",
"metadata": {
"secretNames": ["db-creds", "api-cert"],
"envVarCount": 2,
"volumeMountCount": 1
}
}

# Get events for a pod
GET /api/events?resourceType=pod&resourceId=<pod-id>
# Get error and critical events from the last hour
GET /api/events?severity=error,critical&since=2026-02-03T09:00:00Z
# Get namespace timeline
GET /api/events?namespace=production&limit=100
# Get node events
GET /api/events?category=node&limit=50

import { queryEvents, getPodEvents, getCriticalEvents } from '@stark-o/server';
// Query with filters
const result = await queryEvents({
category: 'pod',
namespace: 'production',
severity: ['error', 'warning'],
since: new Date(Date.now() - 60 * 60 * 1000), // Last hour
limit: 100,
});
// Get pod timeline
const { events } = await getPodEvents('pod-uuid');
// Get critical events for alerting
const { events: critical } = await getCriticalEvents(
new Date(Date.now() - 60 * 60 * 1000) // Since 1 hour ago
);

Events are automatically emitted by database triggers when resources change state. You can also emit events programmatically:
import { emitPodEvent, emitNodeEvent, emitEvent } from '@stark-o/server';
// Emit a pod event
await emitPodEvent({
eventType: 'PodFailed',
podId: 'pod-uuid',
podName: 'my-app-abc123',
namespace: 'default',
severity: 'error',
reason: 'OOMKilled',
message: 'Container killed due to memory limit',
previousStatus: 'running',
newStatus: 'failed',
nodeId: 'node-uuid',
nodeName: 'worker-1',
metadata: { exitCode: 137 },
});
// Emit a node event
await emitNodeEvent({
eventType: 'NodeLost',
nodeId: 'node-uuid',
nodeName: 'worker-1',
severity: 'warning',
reason: 'HeartbeatTimeout',
message: 'Node stopped responding',
});
// Emit a generic event
await emitEvent({
eventType: 'ConfigChanged',
category: 'system',
severity: 'info',
reason: 'ConfigUpdate',
message: 'Cluster configuration updated',
actorId: 'user-uuid',
});

Common reason codes for categorizing events:
| Reason | Description |
|---|---|
| Scheduled | Pod assigned to node |
| ScheduleFailed | No suitable node found |
| NoNodesAvailable | No nodes available |
| InsufficientResources | Not enough resources |
| TaintNotTolerated | Node taint not tolerated |
| Started | Pod started running |
| Failed | Pod execution failed |
| OOMKilled | Out of memory |
| CrashLoopBackOff | Repeated crashes |
| Evicted | Pod evicted |
| NodeDrain | Evicted due to node drain |
| Preempted | Preempted by higher priority pod |

| Reason | Description |
|---|---|
| Registered | Node registered |
| HeartbeatTimeout | Heartbeat not received |
| HeartbeatRestored | Heartbeat resumed |
| Ready | Node ready |
| Cordoned | Marked unschedulable |
| Draining | Being drained |
| MemoryPressure | Low memory |
| DiskPressure | Low disk space |
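Since `reason` is typed as a plain string in the schema, consumers may want to recognise the common codes while still tolerating others (the example events also use reasons such as `Restarted` and `Created`, which these tables do not list). The sketch below copies the codes from the two tables, pod-related first and node-related second; the constant and function names are hypothetical and are not exported by `@stark-o/server`.

```typescript
// Pod-related reason codes copied from the first table above.
const POD_REASONS = [
  'Scheduled', 'ScheduleFailed', 'NoNodesAvailable', 'InsufficientResources',
  'TaintNotTolerated', 'Started', 'Failed', 'OOMKilled', 'CrashLoopBackOff',
  'Evicted', 'NodeDrain', 'Preempted',
] as const;

// Node-related reason codes copied from the second table above.
const NODE_REASONS = [
  'Registered', 'HeartbeatTimeout', 'HeartbeatRestored', 'Ready',
  'Cordoned', 'Draining', 'MemoryPressure', 'DiskPressure',
] as const;

type PodReason = typeof POD_REASONS[number];
type NodeReason = typeof NODE_REASONS[number];

// Type guards: narrow a free-form string to one of the listed codes.
function isPodReason(value: string): value is PodReason {
  return (POD_REASONS as readonly string[]).indexOf(value) !== -1;
}

function isNodeReason(value: string): value is NodeReason {
  return (NODE_REASONS as readonly string[]).indexOf(value) !== -1;
}

// Tallies events per reason; events without a reason are counted under '(none)'.
function countByReason(events: { reason?: string }[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const event of events) {
    const key = event.reason ?? '(none)';
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

const reasonCounts = countByReason([
  { reason: 'OOMKilled' },
  { reason: 'OOMKilled' },
  { reason: 'HeartbeatTimeout' },
  {},
]);

console.log(reasonCounts);
console.log(isPodReason('OOMKilled'), isNodeReason('OOMKilled')); // true false
```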
The previous pod_history table has been superseded by the unified events table. The pod_history_compat view remains available for backwards compatibility:
SELECT * FROM pod_history_compat WHERE pod_id = 'pod-uuid';

The ephemeral data plane emits its own in-memory event stream, separate from the persistent StarkEvent system above. These events are not stored in the database; they are delivered via audit hooks registered on the PodGroupStore or EphemeralDataPlane.
interface EphemeralEvent {
type: EphemeralEventType;
timestamp: string; // ISO 8601
groupId?: string;
podId?: string;
queryId?: string;
message?: string;
metadata?: Record<string, unknown>;
}

| Event | Severity | Emitted When |
|---|---|---|
| PodJoinedGroup | info | A pod joins a group for the first time |
| PodLeftGroup | info | A pod explicitly leaves a group |
| PodGroupCreated | info | A group is lazily created (first member joins) |
| PodGroupDissolved | info | A group is garbage-collected (last member leaves or expires) |
| PodMembershipExpired | info | A membership is reaped due to TTL expiration |
| PodMembershipRefreshed | info | A pod refreshes its existing membership |
| EphemeralQueryIssued | info | An ephemeral fan-out query is sent |
| EphemeralResponseReceived | info | An ephemeral response is received from a pod |
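One way to read the table is as a set of state changes on group membership. The sketch below replays a list of ephemeral events into a `groupId -> podIds` map. The `EphemeralEvent` shape follows the interface above, while the `EphemeralEventType` union is assumed from the event names in the table, and `replayMembership` is a hypothetical helper rather than anything the data plane provides.

```typescript
// Assumed from the event names in the table above.
type EphemeralEventType =
  | 'PodJoinedGroup' | 'PodLeftGroup' | 'PodGroupCreated' | 'PodGroupDissolved'
  | 'PodMembershipExpired' | 'PodMembershipRefreshed'
  | 'EphemeralQueryIssued' | 'EphemeralResponseReceived';

interface EphemeralEvent {
  type: EphemeralEventType;
  timestamp: string; // ISO 8601
  groupId?: string;
  podId?: string;
  queryId?: string;
  message?: string;
  metadata?: Record<string, unknown>;
}

// Replays events in order and returns the resulting membership per group.
function replayMembership(events: EphemeralEvent[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const event of events) {
    const groupId = event.groupId;
    if (groupId === undefined) continue; // nothing to update without a group
    const members = groups[groupId];
    switch (event.type) {
      case 'PodGroupCreated':
        if (members === undefined) groups[groupId] = [];
        break;
      case 'PodJoinedGroup': {
        const podId = event.podId;
        if (podId === undefined) break;
        if (members === undefined) groups[groupId] = [podId];
        else if (members.indexOf(podId) === -1) members.push(podId);
        break;
      }
      case 'PodLeftGroup':
      case 'PodMembershipExpired':
        if (members !== undefined) {
          groups[groupId] = members.filter((id) => id !== event.podId);
        }
        break;
      case 'PodGroupDissolved':
        delete groups[groupId];
        break;
      default:
        break; // refresh, query, and response events leave membership unchanged
    }
  }
  return groups;
}

const chat = 'demo:podgroup-chat';
const history: EphemeralEvent[] = [
  { type: 'PodGroupCreated', timestamp: '2026-02-12T10:00:00.000Z', groupId: chat },
  { type: 'PodJoinedGroup', timestamp: '2026-02-12T10:00:00.000Z', groupId: chat, podId: 'pod-a' },
  { type: 'PodJoinedGroup', timestamp: '2026-02-12T10:00:30.000Z', groupId: chat, podId: 'pod-b' },
  { type: 'PodMembershipExpired', timestamp: '2026-02-12T10:02:00.000Z', groupId: chat, podId: 'pod-a' },
];

console.log(replayMembership(history)); // { 'demo:podgroup-chat': [ 'pod-b' ] }
```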
{
"type": "PodJoinedGroup",
"timestamp": "2026-02-12T10:00:00.000Z",
"groupId": "demo:podgroup-chat",
"podId": "pod-abc-123",
"message": "Pod 'pod-abc-123' joined group 'demo:podgroup-chat'",
"metadata": { "role": "echo" }
}

{
"type": "PodGroupCreated",
"timestamp": "2026-02-12T10:00:00.000Z",
"groupId": "demo:podgroup-chat",
"message": "Group 'demo:podgroup-chat' created"
}

{
"type": "PodGroupDissolved",
"timestamp": "2026-02-12T10:05:00.000Z",
"groupId": "demo:podgroup-chat",
"message": "Group 'demo:podgroup-chat' dissolved (no remaining members)"
}

{
"type": "PodMembershipExpired",
"timestamp": "2026-02-12T10:02:00.000Z",
"groupId": "demo:podgroup-chat",
"podId": "pod-abc-123",
"message": "Membership expired: pod 'pod-abc-123' in group 'demo:podgroup-chat'"
}

// On PodGroupStore (server-side)
const disposeStoreHook = store.onEvent((event) => {
  if (event.type === 'PodGroupDissolved') {
    console.log(`Group ${event.groupId} dissolved`);
  }
});
// On EphemeralDataPlane (pack-side, local mode only)
const disposePlaneHook = plane.onEvent((event) => {
  console.log(`[audit] ${event.type}: ${event.message}`);
});
// Unregister later
disposeStoreHook();
disposePlaneHook();

Full reference: PodGroups & Ephemeral Data Plane
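The hooks above follow a subscribe-and-dispose pattern: `onEvent` returns a function that unregisters the listener. For readers unfamiliar with it, here is a self-contained sketch of how such a registry could behave. `AuditHookRegistry`, `AuditEvent`, and `emit` are hypothetical stand-ins written for this example; they do not reproduce the real `PodGroupStore` or `EphemeralDataPlane` internals.

```typescript
// Hypothetical event shape, reduced to the fields used here.
interface AuditEvent {
  type: string;
  groupId?: string;
}

type AuditListener = (event: AuditEvent) => void;

// Hypothetical registry illustrating the subscribe/dispose pattern.
class AuditHookRegistry {
  private listeners: AuditListener[] = [];

  // Registers a listener and returns a function that removes it again.
  onEvent(listener: AuditListener): () => void {
    this.listeners.push(listener);
    let active = true;
    return () => {
      if (!active) return; // a second dispose() call is a no-op
      active = false;
      const index = this.listeners.indexOf(listener);
      if (index !== -1) this.listeners.splice(index, 1);
    };
  }

  // Delivers to a snapshot of the listeners, so disposing during delivery is safe.
  emit(event: AuditEvent): void {
    for (const listener of this.listeners.slice()) {
      listener(event);
    }
  }
}

const registry = new AuditHookRegistry();
const dissolvedGroups: string[] = [];

const disposeHook = registry.onEvent((event) => {
  if (event.type === 'PodGroupDissolved' && event.groupId) {
    dissolvedGroups.push(event.groupId);
  }
});

registry.emit({ type: 'PodGroupDissolved', groupId: 'demo:podgroup-chat' });
disposeHook();
registry.emit({ type: 'PodGroupDissolved', groupId: 'demo:other-group' }); // not delivered

console.log(dissolvedGroups); // only 'demo:podgroup-chat' was recorded
```

Returning the disposer from `onEvent` keeps registration and cleanup together, so callers do not need to hold a reference to the listener itself.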
- API Reference - REST and WebSocket APIs
- PodGroups & Ephemeral Data Plane - Ephemeral events & audit hooks
- Metrics and Observability - Monitoring and metrics
- Glossary - Terminology definitions