Commit 2a302ae
authored
Add independent DPU upgrade support via gNOI streaming operations (#532)
* Add gRPC interceptor framework and DPU proxy for gNMI
Implement a gRPC interceptor framework to support request routing
based on metadata headers. Add DPU proxy interceptor that detects
requests with x-sonic-ss-target-type=dpu metadata and resolves
DPU information from Redis STATE_DB.
Changes:
- Add pkg/interceptors: Core interceptor framework with chaining support
- Add pkg/interceptors/dpuproxy: DPU proxy implementation with:
- Metadata extraction for x-sonic-ss-target-type and x-sonic-ss-target-index
- Redis client interface and adapter for pure Go compatibility
- DPU resolver to query CHASSIS_MIDPLANE_TABLE from STATE_DB
- Comprehensive unit and integration tests using miniredis
Current implementation (Phase 1.5):
- Intercepts unary and streaming gRPC calls
- Extracts DPU routing metadata from request headers
- Resolves DPU IP address and reachability status from Redis
- Logs resolved information and passes through to local handler
Future work (Phase 2):
- Implement actual gRPC forwarding to DPU servers
- Add connection pooling for DPU clients
- Handle bidirectional streaming proxy
Testing:
- All pure Go packages pass make -f pure.mk ci
- Integration tests with miniredis for Redis operations
- Verified on hardware with DPU metadata headers
* Implement DPU gRPC client framework with Time method forwarding
Add connection pooling with keepalive for long-lived connections to DPU
gNOI servers. Implement forwarding for System.Time method to enable
testing of the DPU proxy infrastructure.
* Implement gNOI File.Put RPC with comprehensive unit tests
- Enable File.Put RPC in gnmi_server to handle file uploads
- Implement HandlePut function in pkg/gnoi/file with:
- Streaming file upload support
- MD5 hash verification
- Path security validation (only /tmp and /var/tmp allowed)
- Atomic file write (temp file + rename)
- Container path translation support
- Add comprehensive unit tests covering:
- Successful uploads and multi-chunk transfers
- Hash validation and mismatch detection
- Path security enforcement
- Error handling for invalid inputs
- Large file uploads (1MB test)
- Edge cases (EOF, invalid messages)
- Achieve 86.4% test coverage with 83.3% for HandlePut
* Refactor DPU TransferToRemote logic to pure package
- Move HandleTransferToRemoteForDPU from gnoi_file.go to pkg/gnoi/file
- Update gnoi_file.go to use pure package function
- Add proper parameter validation and documentation
- Maintain same functionality with cleaner separation
- Business logic now in testable pure package
* Fix hardcoded DPU gNMI port - read from CONFIG_DB
- Add CONFIG_DB lookup for DPU gNMI port configuration
- Update DPUResolver to query both StateDB and ConfigDB
- Add GNMIPort field to DPUInfo structure
- Update getConnection to use dynamic port instead of hardcoded 50052
- Initialize both Redis clients in telemetry.go
- Default to 50052 if CONFIG_DB doesn't specify gnmi_port
- Improve error messages to show actual connection targets
Fixes connection failures when DPUs use non-standard gNMI ports.
* Add robust DPU gNMI port discovery with fallback
- Try multiple common gNMI ports (8080, 50052) if first fails
- Add GNMIPortsToTry field to DPUInfo for port list
- Update getConnection to attempt multiple ports with quick health check
- Cache successful port for better logging
- Prioritize CONFIG_DB configured port, then try common alternatives
- Improve error messages to show all attempted ports
- Prevents connection failures due to port misconfigurations
This makes the DPU proxy resilient to:
- CONFIG_DB vs actual service port mismatches
- Different DPU gNMI service configurations
- Port configuration changes
* Refactor System.SetPackage to use sonic-installer instead of dbus
- Move SetPackage business logic from gnmi_server to pure pkg/gnoi/system package
- Replace dbus.InstallImage() with sonic-installer install -y command execution
- Replace dbus.DownloadImage() with requirement for local images only
- Use pkg/exec for sonic-installer command execution via nsenter
- Add comprehensive test coverage for pure packages (71.1% coverage)
- Remove RemoteDownload support - images must be local for security
- Update pure.mk to include new testable packages: pkg/exec, pkg/gnoi/os, pkg/gnoi/system
- Simplify gnmi_server/gnoi_system.go SetPackage from 87 lines to 22 lines
- Also refactor OS.Verify to use sonic-installer list instead of dbus
This eliminates dbus dependency for image management operations and makes
the business logic testable without SONiC environment dependencies.
Tested end-to-end: System.SetPackage successfully installs and activates
SONiC images using sonic-installer commands.
* Fix code formatting with gofmt
- Remove trailing empty lines and fix indentation
- Clean up whitespace in struct field alignment
- Remove extra blank lines between constants
No functional changes, only formatting cleanup.
* Fix dpuproxy package tests to use dual Redis clients
Update test functions to match NewDPUResolver constructor signature that
requires separate state and config Redis clients instead of single client.
This fixes the broken CI for pure packages.
* Add DPU reboot capability to gNOI System.Reboot RPC
- Add HandleDPUReboot function to pkg/gnoi/system for DPU-specific reboot logic
- Modify System.Reboot to detect DPU metadata and route to HandleDPUReboot when present
- Register /gnoi.system.System/Reboot in DPU proxy with HandleOnNPU mode
- Use reboot -d DPU{index} command via pkg/exec for host-level DPU control
- Handle async reboot behavior - don't treat non-zero exit codes as errors
- Add comprehensive tests for DPU reboot validation and function signatures
- Preserve existing system reboot functionality for non-DPU requests
This enables remote DPU rebooting via gRPC metadata headers:
x-sonic-ss-target-type: dpu
x-sonic-ss-target-index: 3
Tested with DPU3 on NPU 10.3.146.230 - successfully reboots DPU.
* Add DPU streaming support for gNOI System.SetPackage RPC
- Add SetPackage to forwardable methods registry for DPU proxy
- Implement forwardSetPackageStream function for streaming RPC forwarding
- Increase timeouts for sonic-installer commands (10min install, 2min set-default)
- Enable complete DPU package installation workflow through proxy
* Implement streaming proxy for DPU File.TransferToRemote operations
This change replaces the inefficient download-store-upload pattern with a
direct streaming proxy that eliminates NPU disk usage and minimizes memory
consumption for large file transfers to DPU devices.
Key improvements:
- Zero NPU disk usage (was: full file size)
- Constant memory usage ~128KB (was: full file in RAM)
- Concurrent MD5 hash calculation during streaming
- Direct HTTP-to-gRPC streaming with io.TeeReader
- Comprehensive test coverage (85.5% download, 91.7% hash)
Components added:
- internal/download: DownloadHTTPStreaming() for size-limited HTTP streams
- internal/hash: StreamingMD5Calculator for concurrent hash calculation
- pkg/gnoi/file: HandleTransferToRemoteForDPUStreaming() main proxy logic
- gnmi_server: Updated routing to use streaming implementation
The streaming proxy maintains the same security model and error handling
while dramatically improving resource efficiency for large firmware and
package transfers to DPU devices.
* format fix
* Fix duplicate OSServer function declarations causing build failure
Remove duplicate Activate and Verify functions from gnoi.go that were
incorrectly added during merge resolution. These functions already exist
in gnoi_os.go as part of the gNOI OS refactor from master branch.
The duplication caused compilation errors:
- OSServer.Activate redeclared
- OSServer.Verify redeclared
Also cleaned up unused imports that were only needed by the removed functions.
* Move DPU interceptor setup logic to pure package with proper cleanup
Extract DPU proxy creation logic from telemetry.go into pkg/interceptors/setup.go
to improve testability and fix resource leaks identified by Copilot review.
Changes:
- Add pkg/interceptors/setup.go with NewServerChain() factory function
- Simplify telemetry.go from 13 lines of setup to 4 lines
- Fix Redis client resource leak with proper defer cleanup
- Maintain pure package architecture for better testing
This addresses Copilot feedback about Redis clients never being closed,
preventing connection leaks during server operation.
* Fix critical closure variable capture bug in interceptor chain
Address closure capture issue identified by GitHub Copilot where loop variables
were captured by reference instead of value, causing all interceptors to use
the last iteration's values.
Changes:
- Fix UnaryInterceptor: Use immediate function invocation to capture variables by value
- Fix StreamInterceptor: Apply same pattern for streaming RPCs
- Remove unused "strings" import from gnoi.go
This bug would have caused complete failure of the DPU interceptor chain,
where all interceptors would reference the same (last) interceptor instance,
breaking the entire DPU upgrade workflow.
The fix uses the Go idiom of immediate function invocation to capture loop
variables by value rather than reference, ensuring each interceptor closure
captures the correct interceptor instance.
* Fix temp file cleanup in gNOI File.Put operations
Improve temp file cleanup logic to only remove files on error paths,
preventing removal of successfully renamed files. Add error logging
for cleanup failures to aid troubleshooting.
Changes:
- Replace unconditional os.Remove with conditional cleanup
- Check file existence before attempting removal
- Log cleanup errors without failing the operation
- Add missing log import for error reporting
This resolves a potential temp file leak when rename operations
fail after successful file writes.
* Improve error handling and resource management across core components
* Enhanced temp file cleanup error handling in gNOI file operations
* Simplified exit code handling in command execution for more consistent behavior
* Replaced connection state checks with actual RPC health checks in DPU proxy
* Added proper interceptor chain lifecycle management in telemetry server
* Improved logging for DPU reboot operations to reflect expected behavior
* Added detailed comments for file size limit detection logic
These changes focus on defensive programming practices and prevent resource leaks in long-running services.
* Remove obsolete tests after pure package refactoring
* Remove TestSetPackage tests in gnmi_server that tested deprecated dbus implementation
* Remove Put_Fails_with_Unimplemented_Error test since gNOI File.Put is now implemented
* Keep pure package tests in pkg/gnoi/system for new sonic-installer implementation
These test removals address pipeline failures caused by tests expecting obsolete behavior
after the DPU interceptor and pure package refactoring.
* Enhance test coverage for pure packages
- Add comprehensive tests for pkg/gnoi/file/file.go (39.3% -> 70.1% coverage)
- Add extensive DPU function validation and HTTP streaming tests
- Add missing tests for pkg/interceptors/setup.go (0% -> 97.1% coverage)
- Enhance pkg/gnoi/os/os_test.go with context cancellation tests
- Add validation tests for pkg/gnoi/system/system_test.go
- Improve error path coverage across all pure packages
- Add security validation and edge case tests
Coverage improvements:
- pkg/interceptors/setup.go: 0% -> 97.1% (+97.1%)
- pkg/gnoi/file/file.go: 39.3% -> 70.1% (+30.8%)
- pkg/gnoi/os/os.go: 41.7% -> 66.7% (+25.0%)
- Comprehensive validation and error handling coverage
Note: DPU gRPC functions remain at ~55-57% due to untestable
networking code that requires full integration test setup.
* fix format
* just to rerun pipeline
* Refactor DPU routing logic to pure Go package and improve test coverage
Move DPU metadata parsing from gnmi_server/gnoi_file.go to pkg/gnoi/file/file.go
to enable easier testing without CGO dependencies. This improves architecture
by separating authentication concerns from business logic.
Changes:
- Move DPU routing logic into HandlePut function in pure Go package
- Simplify gnmi_server Put implementation to just auth + delegation
- Add comprehensive tests for DPU routing in pure Go package
- Achieve 71.4% coverage for pkg/gnoi/file package
- Add missing Put success path test for gnmi_server coverage
Benefits:
- Better testability with pure Go package (no CGO required)
- Cleaner separation of concerns (auth vs business logic)
- Reduced complexity in gnmi_server layer
- Improved test coverage for DPU functionality
* Achieve 87.1% test coverage for pkg/gnoi/file package
- Add comprehensive monkey patching tests for DPU functions using gomonkey
- Update pure.mk to include -gcflags="all=-N -l" for monkey patching compatibility
- Add coverage boost tests for edge cases and error paths
- Fix infinite recursion issues in container path testing
Coverage improvements:
- HandleTransferToRemoteForDPU: 57.1% → 83.7%
- HandleTransferToRemoteForDPUStreaming: 55.0% → 88.3%
- Overall package coverage: 71.4% → 87.1% (exceeds 80% requirement)
The monkey patching tests mock the entire call chain (HandleTransferToRemote,
grpc.Dial, gNOI file client methods) to test DPU routing logic without
requiring actual network connections or external dependencies.
Pipeline fix: The key issue was that Azure pipeline ran tests without
-gcflags="all=-N -l" flags needed for monkey patching. Without these flags,
Go compiler optimizations prevent gomonkey from working, causing DPU function
tests to fail and resulting in lower coverage (71.4% vs 87.1%).
* Achieve 100% test coverage for pkg/gnoi/os package
- Add comprehensive monkey patching tests for HandleVerify function
- Mock exec.RunHostCommand to test success and error paths
- Test various sonic-installer output formats and parsing edge cases
Coverage improvements:
- HandleVerify: 42.9% → 100%
- parseCurrentVersion: 100% (maintained)
- Overall package coverage: 66.7% → 100%
The monkey patching tests cover all execution paths:
- Successful sonic-installer execution with version parsing
- Command execution failures (permission denied, etc.)
- Version parsing failures (malformed output)
- Different version string formats (standard, master builds, with spaces)
This exceeds the 80% coverage requirement for the pipeline.
* Achieve 100% test coverage for pkg/gnoi/system package
- Add comprehensive monkey patching tests for all system functions
- Mock exec.RunHostCommand to test success and error paths
- Test HandleDPUReboot function with various DPU indices
- Test package installation and activation workflows
Coverage improvements:
- HandleSetPackage: 75.9% → 100%
- installPackage: 66.7% → 100%
- activatePackage: 66.7% → 100%
- HandleDPUReboot: 0% → 100%
- Overall package coverage: 60.7% → 100%
The monkey patching tests cover all execution paths:
- Successful package installation with and without activation
- Command execution failures (permission denied, command not found)
- DPU reboot operations for different DPU indices
- Error handling in both installation and activation phases
This far exceeds the 80% coverage requirement for the pipeline.
* Extract business logic from server wrappers to pure handlers for testability
Moved DPU metadata extraction and routing logic from server wrapper files
into testable pure handler functions to improve test coverage and resolve
diff coverage issues preventing merge approval.
Changes:
- Consolidated DPU routing logic into existing handlers (HandleTransferToRemote, HandleReboot, HandleSetPackageStream)
- Simplified server wrappers to delegate all business logic to pure handlers
- Added comprehensive tests using gomonkey for newly testable pure functions
- Coverage improvements: file 87.8%, system 95.9% (target: 80% minimum)
Fixes diff coverage below threshold in:
- gnmi_server/gnoi_file.go (lines 117,120,125-126)
- gnmi_server/gnoi_system.go (lines 206,210,337,342)
* Format and cleanup test files
* Fix compilation errors for incremental build
Remove unused imports and fix variable declarations:
- Remove unused dpuproxy import from gnoi_system.go
- Remove unused metadata import from gnoi_file.go
- Fix variable naming conflict in sendRebootReqOnNotifCh usage
Enables successful dpkg-buildpackage build with extracted business logic.
* Replace NPU terminology with generic local device terminology
Update symbols and comments to use more accurate terminology:
- HandleOnNPU → HandleLocally (clearer intent for local processing)
- handleTransferToRemoteNPU → handleTransferToRemoteLocal
- TestHandleReboot_NPU_Fallback → TestHandleReboot_Local_Fallback
- Update comments: 'NPU fallback/handling' → 'local fallback/handling'
Improves architectural clarity by distinguishing between:
- DPU operations (forwarded to remote DPU)
- Local operations (processed on host device)
No functional changes - purely terminology cleanup for better code readability.1 parent 6052f33 commit 2a302ae
File tree
39 files changed
+9071
-491
lines changed- gnmi_server
- internal
- download
- hash
- pkg
- exec
- gnoi
- file
- os
- system
- interceptors
- dpuproxy
- telemetry
39 files changed
+9071
-491
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
97 | 97 | | |
98 | 98 | | |
99 | 99 | | |
100 | | - | |
| 100 | + | |
| 101 | + | |
101 | 102 | | |
102 | 103 | | |
103 | 104 | | |
104 | 105 | | |
105 | 106 | | |
106 | 107 | | |
107 | 108 | | |
| 109 | + | |
| 110 | + | |
108 | 111 | | |
109 | 112 | | |
110 | 113 | | |
111 | | - | |
| 114 | + | |
| 115 | + | |
112 | 116 | | |
113 | 117 | | |
114 | 118 | | |
115 | 119 | | |
116 | 120 | | |
117 | 121 | | |
118 | 122 | | |
119 | | - | |
120 | | - | |
| 123 | + | |
121 | 124 | | |
122 | 125 | | |
123 | 126 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
| 13 | + | |
13 | 14 | | |
14 | 15 | | |
15 | 16 | | |
16 | 17 | | |
17 | 18 | | |
18 | 19 | | |
| 20 | + | |
19 | 21 | | |
20 | 22 | | |
21 | 23 | | |
22 | 24 | | |
23 | | - | |
24 | | - | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
25 | 35 | | |
26 | 36 | | |
27 | 37 | | |
| |||
40 | 50 | | |
41 | 51 | | |
42 | 52 | | |
43 | | - | |
| 53 | + | |
44 | 54 | | |
45 | 55 | | |
46 | 56 | | |
47 | 57 | | |
48 | | - | |
| 58 | + | |
49 | 59 | | |
50 | 60 | | |
51 | 61 | | |
52 | 62 | | |
53 | 63 | | |
54 | 64 | | |
55 | | - | |
| 65 | + | |
56 | 66 | | |
57 | 67 | | |
58 | 68 | | |
| |||
188 | 198 | | |
189 | 199 | | |
190 | 200 | | |
191 | | - | |
192 | | - | |
193 | | - | |
194 | | - | |
195 | | - | |
196 | | - | |
197 | | - | |
198 | | - | |
199 | | - | |
200 | | - | |
201 | | - | |
202 | | - | |
203 | | - | |
204 | | - | |
205 | | - | |
206 | 201 | | |
207 | 202 | | |
208 | 203 | | |
| |||
236 | 231 | | |
237 | 232 | | |
238 | 233 | | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
| 294 | + | |
| 295 | + | |
| 296 | + | |
| 297 | + | |
| 298 | + | |
239 | 299 | | |
240 | 300 | | |
241 | 301 | | |
| |||
399 | 459 | | |
400 | 460 | | |
401 | 461 | | |
| 462 | + | |
| 463 | + | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
| 482 | + | |
| 483 | + | |
| 484 | + | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
| 489 | + | |
| 490 | + | |
| 491 | + | |
| 492 | + | |
| 493 | + | |
| 494 | + | |
| 495 | + | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
| 504 | + | |
| 505 | + | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
| 509 | + | |
| 510 | + | |
| 511 | + | |
| 512 | + | |
| 513 | + | |
| 514 | + | |
| 515 | + | |
| 516 | + | |
| 517 | + | |
| 518 | + | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
402 | 526 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
| 15 | + | |
15 | 16 | | |
16 | 17 | | |
17 | 18 | | |
| |||
198 | 199 | | |
199 | 200 | | |
200 | 201 | | |
201 | | - | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
202 | 216 | | |
203 | 217 | | |
204 | 218 | | |
| |||
211 | 225 | | |
212 | 226 | | |
213 | 227 | | |
214 | | - | |
| 228 | + | |
215 | 229 | | |
216 | 230 | | |
217 | 231 | | |
218 | | - | |
| 232 | + | |
219 | 233 | | |
220 | | - | |
| 234 | + | |
221 | 235 | | |
222 | | - | |
| 236 | + | |
223 | 237 | | |
224 | 238 | | |
225 | 239 | | |
| |||
308 | 322 | | |
309 | 323 | | |
310 | 324 | | |
311 | | - | |
312 | | - | |
313 | | - | |
314 | | - | |
315 | | - | |
316 | | - | |
317 | | - | |
318 | | - | |
319 | | - | |
320 | | - | |
321 | | - | |
322 | | - | |
323 | | - | |
324 | | - | |
325 | | - | |
326 | | - | |
327 | | - | |
328 | | - | |
329 | | - | |
330 | | - | |
331 | | - | |
332 | | - | |
333 | | - | |
334 | | - | |
335 | | - | |
336 | | - | |
337 | | - | |
338 | | - | |
339 | | - | |
340 | | - | |
341 | | - | |
342 | | - | |
343 | | - | |
344 | | - | |
345 | | - | |
346 | | - | |
347 | | - | |
348 | | - | |
349 | | - | |
350 | | - | |
351 | | - | |
352 | | - | |
353 | | - | |
354 | | - | |
355 | | - | |
356 | | - | |
357 | | - | |
358 | | - | |
359 | | - | |
360 | | - | |
361 | | - | |
362 | | - | |
363 | | - | |
364 | | - | |
365 | | - | |
366 | | - | |
367 | | - | |
368 | | - | |
369 | | - | |
370 | | - | |
371 | | - | |
372 | | - | |
373 | | - | |
374 | | - | |
375 | | - | |
376 | | - | |
377 | | - | |
378 | | - | |
379 | | - | |
380 | | - | |
381 | | - | |
382 | | - | |
383 | | - | |
384 | | - | |
385 | | - | |
386 | | - | |
387 | | - | |
| 325 | + | |
| 326 | + | |
388 | 327 | | |
389 | 328 | | |
390 | 329 | | |
| |||
0 commit comments