
Commit d5cab58

Merge branch 'main' into garbage-collector-cronjob
2 parents bac3f00 + 7bf725e commit d5cab58

File tree

7 files changed: +205 -4 lines

docs/user_scenarios.md

Lines changed: 21 additions & 0 deletions

@@ -23,7 +23,28 @@ The backfill request will be picked up by the operator, which will create a Kube
 
 If the stream **is suspended**, you need to create a backfill request first and then unsuspend the stream.
 
+Example backfill request:
+```yaml
+apiVersion: streaming.sneaksanddata.com/v1
+kind: BackfillRequest
+metadata:
+  name: my-backfill-request
+  namespace: arcane-stream-mock
+spec:
+  streamClass: arcane-stream-mock
+  streamId: my-stream
+```
+
 ## My stream has failed and I want to restart it
 If your stream has failed, you can set `spec.suspended` to `true` to stop the stream.
 To avoid data loss, you may create a backfill request that fills in any gaps that occurred during the failure.
 After that, you can set `spec.suspended` to `false` to restart the stream.
+
+## I deleted the pod and my stream transitioned to a failed state, how do I avoid that in the future?
+Arcane streaming is built on top of Kubernetes Jobs. By default, when a pod is deleted manually or due to node eviction,
+all containers in the pod receive a SIGTERM signal and have a grace period to shut down gracefully **with exit code 0**.
+If the containers do not shut down within the grace period, the pod is forcefully terminated. If the pod is terminated
+with a non-zero exit code, Kubernetes counts this pod **as failed** and the job may transition to a failed state.
+It is the responsibility of the user and/or the plugin developer to ensure that the streaming job handles termination
+signals gracefully and exits with code 0, or that the exit code returned by the plugin executable on termination is
+[added to the job's podFailurePolicy](https://kubernetes.io/docs/tasks/job/pod-failure-policy/).
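
For illustration, a minimal sketch of what such a `podFailurePolicy` could look like is shown below. The job name, container name, and image are hypothetical, and the exit codes 120 and 121 only mirror the `retryOnExitCodes` values set in the justfile change further down; they are not values prescribed by Arcane:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-streaming-job            # hypothetical name; Arcane generates the real job names
spec:
  backoffLimit: 1
  podFailurePolicy:
    rules:
      # Do not treat terminations with these exit codes as Job failures.
      # 120 and 121 are illustrative, mirroring retryOnExitCodes in the justfile below.
      - action: Ignore
        onExitCodes:
          containerName: stream     # assumed container name
          operator: In
          values: [120, 121]
  template:
    spec:
      restartPolicy: Never          # podFailurePolicy requires restartPolicy: Never
      containers:
        - name: stream
          image: registry.example.com/arcane-stream-mock:latest   # placeholder image
```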

justfile

Lines changed: 1 addition & 1 deletion

@@ -27,7 +27,7 @@ mock-stream-plugin:
     --namespace default \
     --set jobTemplateSettings.podFailurePolicySettings.retryOnExitCodes="{120,121}" \
     --set jobTemplateSettings.backoffLimit=1 \
-    --version v1.0.4
+    --version v1.0.5
 
 manifests:
     kubectl apply -f integration_tests/manifests/*.yaml

Lines changed: 137 additions & 0 deletions

@@ -0,0 +1,137 @@
# Performance Testing Resources for Arcane Operator

## Overview

This directory contains Kubernetes manifests for performance testing the Arcane operator. The resources enable automated testing of streaming workloads at scale.

## Prerequisites

- Kubernetes cluster with the Arcane operator and `arcane-stream-mock` installed
- `kubectl` configured to access the target cluster
- Namespace `arcane-stream-mock` created

## Resources

### 1. test_secret.yaml
A Kubernetes Secret containing test credentials and configuration data.

**Purpose**: Provides the ability to test secret management and secret access by streaming jobs.

**Apply**:
```bash
kubectl apply -f test_secret.yaml
```

### 2. stream_template.yaml
A `TestStreamDefinition` resource that defines mock streaming workloads.

**Purpose**: Creates test streams for performance evaluation.

**Key Parameters**:
- `runDuration`: Duration for each test run (default: 360s)
- `suspended`: Set to `false` to start streams in backfill mode
- `shouldFail`: Set to `true` to test failure scenarios

**Apply** (the manifest uses `generateName`, so use `kubectl create` rather than `kubectl apply`):
```bash
kubectl create -f stream_template.yaml
```

## Performance Testing Workflow

### Step 1: Deploy Test Secret
```bash
kubectl apply -f test_secret.yaml
```

Verify the secret is created:
```bash
kubectl get secret test-secret -n arcane-stream-mock
```

### Step 2: Create StreamingJobTemplate
Ensure the referenced `StreamingJobTemplate` (`arcane-stream-mock-standard-job`) exists in the `arcane-stream-mock` namespace before applying the stream template.

### Step 3: Deploy Test Streams
```bash
for i in {1..2000}; do kubectl create -f stream_template.yaml; done
```

This creates two thousand test streams, each with a generated name (e.g., `mock-stream-abc123`).

### Step 4: Activate Test Streams
By default, streams are created in a suspended state. To start testing, assuming you want to activate 300 streams at a time in parallel with 4 concurrent processes:
```bash
kubectl get teststreamdefinition -n arcane-stream-mock --no-headers \
  | grep Suspended \
  | head -n 300 \
  | awk '{print $1}' \
  | xargs --max-procs=4 -I {} \
      kubectl patch teststreamdefinition {} -n arcane-stream-mock -p '{"spec":{"suspended":false}}' --type=merge
```

### Step 5: Monitor Performance
Use the Prometheus metrics exposed by the Arcane operator to monitor performance and resource usage during testing.
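
If the cluster runs the Prometheus Operator, a `PodMonitor` sketch along these lines could scrape those metrics; the `<operator-namespace>` placeholder matches the one used in the Troubleshooting section, and the metrics port name is an assumption:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: arcane-operator-metrics
  namespace: <operator-namespace>   # same placeholder as in the Troubleshooting section
spec:
  selector:
    matchLabels:
      app: arcane-operator          # label used by the Troubleshooting kubectl logs example
  podMetricsEndpoints:
    - port: metrics                 # assumed name of the operator's metrics port
      interval: 15s
```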

## Scaling Tests

## Customizing Tests

### Modify Run Duration
Edit `stream_template.yaml` and change `runDuration`:
```yaml
spec:
  runDuration: 600s # 10 minutes
```

### Test Failure Scenarios
```yaml
spec:
  shouldFail: true
```

## Cleanup

### Remove All Test Streams
```bash
kubectl delete teststreamdefinition -n arcane-stream-mock --all
```

### Remove Test Secret
```bash
kubectl delete -f test_secret.yaml
```

## Troubleshooting

### Stream Not Starting
- Check if `suspended` is set to `false`
- Verify the referenced `StreamingJobTemplate` exists
- Check operator logs: `kubectl logs -n <operator-namespace> -l app=arcane-operator`

### Jobs Failing
- Verify the test secret is properly configured
- Check job logs: `kubectl logs -n arcane-stream-mock <job-pod-name>`
- Ensure sufficient cluster resources (CPU, memory)
- Check whether `shouldFail` is set to `true` for intentional failures

## Metrics Collection

Monitor operator performance metrics:
```bash
# CPU/Memory usage
kubectl top pods -n <operator-namespace>

# Stream processing metrics
kubectl describe teststreamdefinition -n arcane-stream-mock
```

## Best Practices

1. **Incremental Load Testing**: Start with 1-2 streams, gradually increase
2. **Resource Limits**: Monitor cluster resources during tests
3. **Cleanup**: Always clean up test resources after performance testing
4. **Isolation**: Use a dedicated namespace for performance tests
5. **Baseline**: Establish baseline metrics before scaling tests

Lines changed: 23 additions & 0 deletions

@@ -0,0 +1,23 @@
apiVersion: streaming.sneaksanddata.com/v1
kind: TestStreamDefinition
metadata:
  generateName: mock-stream-
  namespace: arcane-stream-mock
spec:
  backfillJobTemplateRef:
    apiVersion: streaming.sneaksanddata.com/v1
    kind: StreamingJobTemplate
    name: arcane-stream-mock-standard-job
    namespace: arcane-stream-mock
  destination: mock-destination
  jobTemplateRef:
    apiVersion: streaming.sneaksanddata.com/v1
    kind: StreamingJobTemplate
    name: arcane-stream-mock-standard-job
    namespace: arcane-stream-mock
  runDuration: 360s
  shouldFail: false
  source: mock-source
  suspended: true
  testSecretRef:
    name: test-secret

Lines changed: 10 additions & 0 deletions

@@ -0,0 +1,10 @@
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  namespace: arcane-stream-mock
data:
  TEST_SECRET: c2VjcmV0IGRhdGE=
type: Opaque
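
As a convenience, the same Secret could instead be written with `stringData`, letting the API server do the base64 encoding; this is a sketch equivalent to the manifest above:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  namespace: arcane-stream-mock
type: Opaque
stringData:
  TEST_SECRET: secret data   # plain text; stored under .data as c2VjcmV0IGRhdGE=
```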

services/controllers/stream/stream_reconciler.go

Lines changed: 1 addition & 1 deletion

@@ -497,7 +497,7 @@ func (s *streamReconciler) getLogger(_ context.Context, request types.Namespaced
 func (s *streamReconciler) updateStreamPhase(ctx context.Context, definition Definition, backfillRequest *v1.BackfillRequest, next Phase, eventFunc controllers.EventFunc) (reconcile.Result, error) {
     logger := s.getLogger(ctx, definition.NamespacedName())
     if definition.GetPhase() == next { // coverage-ignore
-        logger.V(0).Info("Stream phase is already set to", "phase", definition.GetPhase())
+        logger.V(0).Info("Stream phase is already set", "phase", definition.GetPhase())
         return reconcile.Result{}, nil
     }
     logger.V(0).Info("updating Stream status", "from", definition.GetPhase(), "to", next)
services/controllers/stream/streaming_job.go

Lines changed: 12 additions & 2 deletions

@@ -18,11 +18,21 @@ func (j StreamingJob) CurrentConfiguration() (string, error) { // coverage-ignor
 }
 
 func (j StreamingJob) IsCompleted() bool { // coverage-ignore (trivial)
-    return j.Status.Succeeded > 0
+    for _, condition := range j.Status.Conditions {
+        if condition.Type == v1.JobComplete && condition.Status == "True" {
+            return true
+        }
+    }
+    return false
 }
 
 func (j StreamingJob) IsFailed() bool { // coverage-ignore (trivial)
-    return j.Status.Failed > 0 && j.Status.Failed >= *j.Spec.BackoffLimit
+    for _, condition := range j.Status.Conditions {
+        if condition.Type == v1.JobFailed && condition.Status == "True" {
+            return true
+        }
+    }
+    return false
 }
 
 func (j StreamingJob) ToV1Job() *v1.Job { // coverage-ignore (trivial)
