
Commit 39e9d91

Merge branch 'master' into multiple-streaming-subscriptions-2
2 parents 7c75ef7 + 91391fe commit 39e9d91

13 files changed: +144 -112 lines changed


docs/release_notes/v1.15.4.md

Lines changed: 104 additions & 2 deletions
@@ -3,7 +3,12 @@
 This update includes bug fixes:

 - [Fix degradation of Workflow runtime performance over time](#fix-degradation-of-workflow-runtime-performance-over-time)
+- [Fix remote Actor invocation 500 retry](#fix-remote-actor-invocation-500-retry)
+- [Fix Global Actors Enabled Configuration](#fix-global-actors-enabled-configuration)
+- [Prevent panic of reminder operations on slow Actor Startup](#prevent-panic-of-reminder-operations-on-slow-actor-startup)
+- [Remove client-side rate limiter from Sentry](#remove-client-side-rate-limiter-from-sentry)
 - [Allow Service Account for MetalBear mirrord operator in sidecar injector](#allow-service-account-for-metalbear-mirrord-operator-in-sidecar-injector)
+- [Fix Scheduler Client connection pruning](#fix-scheduler-client-connection-pruning)

 ## Fix degradation of Workflow runtime performance over time

@@ -25,6 +30,85 @@ This caused Jobs to fail, and enter failure policy retry loops.

 Refactor the Scheduler connection pool logic to properly prune stale connections to prevent job execution occurring on stale connections and causing failure policy loops.

+## Fix remote Actor invocation 500 retry
+
+### Problem
+
+An actor invocation across hosts that resulted in a 500 HTTP response code caused the request to be retried 5 times.
+
+### Impact
+
+Services returning a 500 HTTP response code would see requests return slowly under normal operation, and the service would be invoked multiple times for the same request.
+
+### Root cause
+
+The Actor engine considered a 500 HTTP response code to be a retriable error, rather than a successful request that returned a non-200 status code.
+
+### Solution
+
+Remove the 500 HTTP response code from the list of retriable errors.
+
+## Fix Global Actors Enabled Configuration
+
+### Problem
+
+When `global.actors.enabled` was set to `false` via Helm or the environment variable `ACTORS_ENABLED=false`, the Dapr sidecar would still attempt to connect to the placement service, causing readiness probe failures and repeatedly logged errors about failing to connect to placement.
+Fixes this [issue](https://github.com/dapr/dapr/issues/8551).
+
+### Impact
+
+Dapr sidecars would fail their readiness probes and log errors like:
+```
+Failed to connect to placement dns:///dapr-placement-server.dapr-system.svc.cluster.local:50005: failed to create placement client: rpc error: code = Unavailable desc = last resolver error: produced zero addresses
+```
+
+### Root cause
+
+The sidecar injector was not respecting the global actors enabled configuration when setting up the placement service connection.
+
+### Solution
+
+The sidecar injector now respects the `global.actors.enabled` Helm configuration and the `ACTORS_ENABLED` environment variable. When set to `false`, it will not attempt to connect to the placement service, allowing the sidecar to start successfully without actor functionality.
+
+
+## Prevent panic of reminder operations on slow Actor Startup
+
+### Problem
+
+The Dapr runtime HTTP server would panic if a reminder operation timed out while an Actor was starting up.
+
+### Impact
+
+The HTTP server would panic, causing degraded performance.
+
+### Root cause
+
+The Dapr runtime would attempt to use the reminder service before it was initialized.
+
+### Solution
+
+Correctly return an error indicating that the actor runtime was not ready in time for the reminder operation.
+
+## Remove client-side rate limiter from Sentry
+
+### Problem
+
+A cold start of many Dapr deployments would take a long time, and could even cause crash loops.
+
+### Impact
+
+A large Dapr deployment would take disproportionately (non-linearly) longer than a smaller one to completely roll out.
+
+### Root cause
+
+The Sentry Kubernetes client was configured with a rate limiter which would be exhausted when servicing many new Dapr deployments at once, causing many clients to wait significantly.
+
+### Solution
+
+Remove the client-side rate limiting from the Sentry Kubernetes client.
+
 ## Allow Service Account for MetalBear mirrord operator in sidecar injector

 ### Problem
@@ -33,12 +117,30 @@ Mirrord Operator is not on the allow list of Service Accounts for the dapr sidec

 ### Impact

-Running mirrord in `copy_target` mode would cause the pod to initalise with without the dapr container.
+Running mirrord in `copy_target` mode would cause the pod to initialise without the dapr container.

 ### Root cause

 Mirrord Operator is not on the allow list of Service Accounts for the dapr sidecar injector.

 ### Solution

-Add the Mirrord Operator into the allow list of Service Accounts for the dapr sidecar injector.
+Add the Mirrord Operator into the allow list of Service Accounts for the dapr sidecar injector.
+
+## Fix Scheduler Client connection pruning
+
+### Problem
+
+Daprd would attempt to connect to stale Scheduler addresses.
+
+### Impact
+
+Increased network resource usage and error reporting from service mesh sidecars.
+
+### Root cause
+
+Daprd would not close Scheduler gRPC connections to hosts which no longer exist.
+
+### Solution
+
+Daprd now closes connections to Scheduler hosts when they are no longer in the list of active hosts.
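
The "Fix remote Actor invocation 500 retry" note above amounts to treating a 500 as the app's answer rather than a transport failure. A minimal Go sketch of that idea, using a hypothetical `shouldRetry` helper; this is illustrative only and is not Dapr's actual invocation code.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
)

// shouldRetry retries only transport-level failures such as timeouts.
// If the remote app produced any response, even a 500, that response is
// surfaced to the caller instead of being retried.
func shouldRetry(resp *http.Response, err error) bool {
	if resp != nil {
		return false // a 500 is a completed request with a non-200 status
	}
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

func main() {
	fmt.Println(shouldRetry(&http.Response{StatusCode: http.StatusInternalServerError}, nil)) // false
	fmt.Println(shouldRetry(nil, &net.DNSError{IsTimeout: true}))                             // true
}
```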

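For the "Remove client-side rate limiter from Sentry" note, the general technique is to stop the Kubernetes client from queueing its own requests. A hedged client-go sketch of that idea, assuming a kubeconfig-based setup; Dapr's real Sentry wiring is not part of this commit.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a config from the local kubeconfig (illustrative; an in-cluster
	// controller would normally use rest.InClusterConfig()).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}

	// A negative QPS disables client-go's default token-bucket rate limiter,
	// so a burst of requests is sent immediately instead of being queued on
	// the client side.
	cfg.QPS = -1

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("client ready:", client != nil)
}
```
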
go.mod

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 module github.com/dapr/dapr

-go 1.24.1
+go 1.24.2

 require (
 	contrib.go.opencensus.io/exporter/prometheus v0.4.2
@@ -252,7 +252,7 @@ require (
 	github.com/godbus/dbus v0.0.0-20190726142602-4481cbc300e2 // indirect
 	github.com/gofrs/uuid v4.4.0+incompatible // indirect
 	github.com/gogo/protobuf v1.3.2 // indirect
-	github.com/golang-jwt/jwt/v4 v4.5.1 // indirect
+	github.com/golang-jwt/jwt/v4 v4.5.2 // indirect
 	github.com/golang-jwt/jwt/v5 v5.2.2 // indirect
 	github.com/golang-sql/civil v0.0.0-20220223132316-b832511892a9 // indirect
 	github.com/golang-sql/sqlexp v0.1.0 // indirect

go.sum

Lines changed: 2 additions & 2 deletions
@@ -738,8 +738,8 @@ github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69
 github.com/goji/httpauth v0.0.0-20160601135302-2da839ab0f4d/go.mod h1:nnjvkQ9ptGaCkuDUx6wNykzzlUixGxvkme+H/lnzb+A=
 github.com/golang-jwt/jwt v3.2.1+incompatible/go.mod h1:8pz2t5EyA70fFQQSrl6XZXzqecmYZeUEB8OUGHkxJ+I=
 github.com/golang-jwt/jwt v3.2.2+incompatible/go.mod h1:8pz2t5EyA70fFQQSrl6XZXzqecmYZeUEB8OUGHkxJ+I=
-github.com/golang-jwt/jwt/v4 v4.5.1 h1:JdqV9zKUdtaa9gdPlywC3aeoEsR681PlKC+4F5gQgeo=
-github.com/golang-jwt/jwt/v4 v4.5.1/go.mod h1:m21LjoU+eqJr34lmDMbreY2eSTRJ1cv77w39/MY0Ch0=
+github.com/golang-jwt/jwt/v4 v4.5.2 h1:YtQM7lnr8iZ+j5q71MGKkNw9Mn7AjHM68uc9g5fXeUI=
+github.com/golang-jwt/jwt/v4 v4.5.2/go.mod h1:m21LjoU+eqJr34lmDMbreY2eSTRJ1cv77w39/MY0Ch0=
 github.com/golang-jwt/jwt/v5 v5.2.2 h1:Rl4B7itRWVtYIHFrSNd7vhTiz9UpLdi6gZhZ3wEeDy8=
 github.com/golang-jwt/jwt/v5 v5.2.2/go.mod h1:pqrtFR0X4osieyHYxtmOUWsAWrfe1Q5UVIyoH402zdk=
 github.com/golang-sql/civil v0.0.0-20220223132316-b832511892a9 h1:au07oEsX2xN0ktxqI+Sida1w446QrXBRJ0nee3SNZlA=

pkg/actors/actors.go

Lines changed: 9 additions & 7 deletions
@@ -193,12 +193,10 @@ func (a *actors) Init(opts InitOptions) error {

 	storeEnabled := a.buildStateStore(opts, apiLevel)

-	if a.reminderStore != nil {
-		a.reminders = reminders.New(reminders.Options{
-			Storage: a.reminderStore,
-			Table:   a.table,
-		})
-	}
+	a.reminders = reminders.New(reminders.Options{
+		Storage: a.reminderStore,
+		Table:   a.table,
+	})

 	var err error
 	a.placement, err = placement.New(placement.Options{
@@ -357,6 +355,10 @@ func (a *actors) Reminders(ctx context.Context) (reminders.Interface, error) {
 		return nil, err
 	}

+	if a.reminders == nil {
+		return nil, messages.ErrActorRuntimeNotFound
+	}
+
 	return a.reminders, nil
 }

@@ -374,7 +376,7 @@ func (a *actors) waitForReady(ctx context.Context) error {
 		}
 		return nil
 	case <-ctx.Done():
-		return ctx.Err()
+		return messages.ErrActorRuntimeNotFound
 	}
 }

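The diff above returns `messages.ErrActorRuntimeNotFound` instead of `ctx.Err()` and nil-checks the reminders subsystem. A standalone sketch of that guard pattern with made-up names (`errNotReady`, `readyCh`); it is not the actual dapr runtime code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errNotReady = errors.New("actor runtime was not ready in time")

type runtime struct {
	readyCh   chan struct{}
	reminders any // nil until the reminder subsystem is initialised
}

// Reminders waits for readiness and then nil-checks the subsystem, so a slow
// startup yields a descriptive error instead of a nil-pointer panic deeper in
// the HTTP handler.
func (r *runtime) Reminders(ctx context.Context) (any, error) {
	select {
	case <-r.readyCh:
	case <-ctx.Done():
		return nil, errNotReady // not ctx.Err(): callers get an actionable message
	}
	if r.reminders == nil {
		return nil, errNotReady
	}
	return r.reminders, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()

	r := &runtime{readyCh: make(chan struct{})} // never becomes ready
	_, err := r.Reminders(ctx)
	fmt.Println(err)
}
```
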
pkg/apphealth/health.go

Lines changed: 0 additions & 31 deletions
@@ -155,11 +155,6 @@ func (h *AppHealth) ReportHealth(status *Status) {
 		return
 	}

-	// Limit health reports to 1 per second
-	if !h.ratelimitReports() {
-		return
-	}
-
 	// Channel is buffered, so make sure that this doesn't block
 	// Just in case another report is being worked on!
 	select {
@@ -205,32 +200,6 @@ func (h *AppHealth) doProbe(parentCtx context.Context) {
 	}
 }

-// Returns true if the health report can be saved. Only 1 report per second at most is allowed.
-func (h *AppHealth) ratelimitReports() bool {
-	var (
-		swapped  bool
-		attempts uint8
-	)
-
-	now := h.clock.Now().UnixMicro()
-
-	// Attempts at most 2 times before giving up, as the report may be stale at that point
-	for !swapped && attempts < 2 {
-		attempts++
-
-		// If the last report was less than `reportMinInterval` ago, nothing to do here
-		prev := h.lastReport.Load()
-		if prev > now-reportMinInterval.Microseconds() {
-			return false
-		}
-
-		swapped = h.lastReport.CompareAndSwap(prev, now)
-	}
-
-	// If we couldn't do the swap after 2 attempts, just return false
-	return swapped
-}
-
 func (h *AppHealth) setResult(ctx context.Context, status *Status) {
 	h.lastReport.Store(h.clock.Now().UnixMicro())

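With the per-second limiter removed, `ReportHealth` relies on the buffered-channel, non-blocking-send pattern visible in the remaining context lines above. A self-contained sketch of that pattern, with illustrative names only.

```go
package main

import "fmt"

func main() {
	reports := make(chan string, 1) // buffered so callers never block

	report := func(status string) {
		select {
		case reports <- status:
		default:
			// a previous report is still pending; drop this one rather than block
		}
	}

	report("healthy")
	report("unhealthy") // dropped: the buffer still holds the first report
	fmt.Println(<-reports, "| pending:", len(reports))
}
```
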
pkg/apphealth/health_test.go

Lines changed: 0 additions & 50 deletions
@@ -121,56 +121,6 @@ func TestAppHealth_setResult(t *testing.T) {
 	assert.Equal(t, threshold+3, h.failureCount.Load())
 }

-func TestAppHealth_ratelimitReports(t *testing.T) {
-	clock := clocktesting.NewFakeClock(time.Now())
-	h := New(config.AppHealthConfig{}, nil)
-	h.clock = clock
-
-	// First run should always succeed
-	require.True(t, h.ratelimitReports())
-
-	// Run again without waiting
-	require.False(t, h.ratelimitReports())
-	require.False(t, h.ratelimitReports())
-
-	// Step and test
-	clock.Step(reportMinInterval)
-	require.True(t, h.ratelimitReports())
-	require.False(t, h.ratelimitReports())
-
-	// Run tests for 1 second, constantly
-	// Should succeed only 10 times.
-	clock.Step(reportMinInterval)
-	firehose := func(start time.Time, step time.Duration) (passed int64) {
-		for clock.Now().Sub(start) < time.Second*10 {
-			if h.ratelimitReports() {
-				passed++
-			}
-			clock.Step(step)
-		}
-		return passed
-	}
-
-	passed := firehose(clock.Now(), 10*time.Millisecond)
-	assert.Equal(t, int64(10), passed)
-
-	// Repeat, but run with 3 parallel goroutines
-	wg := sync.WaitGroup{}
-	totalPassed := atomic.Int64{}
-	start := clock.Now()
-	wg.Add(3)
-	for range 3 {
-		go func() {
-			totalPassed.Add(firehose(start, 3*time.Millisecond))
-			wg.Done()
-		}()
-	}
-	wg.Wait()
-	passed = totalPassed.Load()
-	assert.GreaterOrEqual(t, passed, int64(8))
-	assert.LessOrEqual(t, passed, int64(12))
-}
-
 func Test_StartProbes(t *testing.T) {
 	t.Run("closing context should return", func(t *testing.T) {
 		ctx, cancel := context.WithCancel(t.Context())

pkg/apphealth/status.go

Lines changed: 0 additions & 5 deletions
@@ -16,11 +16,6 @@ package apphealth

 import "time"

-const (
-	// reportMinInterval is the minimum interval between health reports.
-	reportMinInterval = time.Second
-)
-
 type Status struct {
 	IsHealthy bool  `json:"ishealthy"`
 	TimeUnix  int64 `json:"timeUnix"`

pkg/injector/service/config_test.go

Lines changed: 17 additions & 6 deletions
@@ -26,12 +26,27 @@ import (
 )

 func TestGetInjectorConfig(t *testing.T) {
+	t.Setenv("NAMESPACE", "test-namespace")
+	t.Setenv("SIDECAR_IMAGE", "daprd-test-image")
+
+	t.Run("respect globally disabling placement", func(t *testing.T) {
+		t.Setenv("ACTORS_ENABLED", "false")
+		cfg, err := GetConfig()
+		require.NoError(t, err)
+		assert.False(t, cfg.parsedActorsEnabled)
+		assert.Equal(t, "false", cfg.ActorsEnabled)
+	})
+	t.Run("default placement is enabled", func(t *testing.T) {
+		cfg, err := GetConfig()
+		require.NoError(t, err)
+		assert.Empty(t, cfg.ActorsEnabled)
+		assert.True(t, cfg.parsedActorsEnabled)
+	})
+
 	t.Run("with kube cluster domain env", func(t *testing.T) {
 		t.Setenv("TLS_CERT_FILE", "test-cert-file")
 		t.Setenv("TLS_KEY_FILE", "test-key-file")
-		t.Setenv("SIDECAR_IMAGE", "daprd-test-image")
 		t.Setenv("SIDECAR_IMAGE_PULL_POLICY", "Always")
-		t.Setenv("NAMESPACE", "test-namespace")
 		t.Setenv("KUBE_CLUSTER_DOMAIN", "cluster.local")
 		t.Setenv("ALLOWED_SERVICE_ACCOUNTS", "test1:test-service-account1,test2:test-service-account2")
 		t.Setenv("ALLOWED_SERVICE_ACCOUNTS_PREFIX_NAMES", "namespace:test-service-account1,namespace2*:test-service-account2")
@@ -49,9 +64,7 @@ func TestGetInjectorConfig(t *testing.T) {
 	t.Run("not set kube cluster domain env", func(t *testing.T) {
 		t.Setenv("TLS_CERT_FILE", "test-cert-file")
 		t.Setenv("TLS_KEY_FILE", "test-key-file")
-		t.Setenv("SIDECAR_IMAGE", "daprd-test-image")
 		t.Setenv("SIDECAR_IMAGE_PULL_POLICY", "IfNotPresent")
-		t.Setenv("NAMESPACE", "test-namespace")
 		t.Setenv("KUBE_CLUSTER_DOMAIN", "")

 		cfg, err := GetConfig()
@@ -65,8 +78,6 @@ func TestGetInjectorConfig(t *testing.T) {
 	t.Run("sidecar run options not set", func(t *testing.T) {
 		t.Setenv("TLS_CERT_FILE", "test-cert-file")
 		t.Setenv("TLS_KEY_FILE", "test-key-file")
-		t.Setenv("SIDECAR_IMAGE", "daprd-test-image")
-		t.Setenv("NAMESPACE", "test-namespace")

 		// Default values are true
 		t.Setenv("SIDECAR_RUN_AS_NON_ROOT", "")
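
The new test cases expect actors to stay enabled when `ACTORS_ENABLED` is unset and to be disabled when it is `"false"`. A hypothetical sketch of boolean-environment parsing that would satisfy those expectations; the real `GetConfig` implementation is not shown in this commit.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseActorsEnabled mirrors the behaviour the tests above expect: an unset
// variable keeps actors enabled, and any value strconv.ParseBool accepts is
// honoured.
func parseActorsEnabled() bool {
	raw := os.Getenv("ACTORS_ENABLED")
	if raw == "" {
		return true // default: actors stay enabled when the variable is unset
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return true // unparseable values fall back to the default
	}
	return enabled
}

func main() {
	fmt.Println(parseActorsEnabled()) // true (unset)
	os.Setenv("ACTORS_ENABLED", "false")
	fmt.Println(parseActorsEnabled()) // false
}
```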

pkg/injector/service/pod_patch.go

Lines changed: 5 additions & 2 deletions
@@ -80,10 +80,13 @@ func (i *injector) getPodPatchOperations(ctx context.Context, ar *admissionv1.Ad
 	sidecar.CurrentTrustAnchors = trustAnchors
 	sidecar.DisableTokenVolume = !token.HasKubernetesToken()

-	// Set addresses for actor services
+	// Set addresses for actor services only if it's not explicitly globally disabled
 	// Even if actors are disabled, however, the placement-host-address flag will still be included if explicitly set in the annotation dapr.io/placement-host-address
 	// So, if the annotation is already set, we accept that and also use placement for actors services
-	if sidecar.PlacementAddress == "" {
+	if !i.config.GetActorsEnabled() {
+		sidecar.ActorsService = ""
+		sidecar.PlacementAddress = ""
+	} else if sidecar.PlacementAddress == "" {
 		// Set configuration for the actors service
 		actorsSvcName, actorsSvc := i.config.GetActorsService()
 		actorsSvcAddr := actorsSvc.Address(i.config.Namespace, i.config.KubeClusterDomain)

tests/apps/resiliencyapp/go.mod

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 module github.com/dapr/dapr/tests/apps/resiliencyapp

-go 1.24.1
+go 1.24.2

 require (
 	github.com/dapr/dapr v0.0.0
