Skip to content

gateway: auto select relay#53

Merged
x032205 merged 2 commits intomainfrom
PLATFRM-4
Nov 10, 2025
Merged

gateway: auto select relay#53
x032205 merged 2 commits intomainfrom
PLATFRM-4

Conversation

@x032205
Copy link
Member

@x032205 x032205 commented Nov 5, 2025

Automatically select relay if the --relay flag is not passed. Selection is done in this order of priority:

  • Is healthy?
  • Is org relay?
  • Is ping low?

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Implements automatic relay selection when the --relay flag is not provided. The selection prioritizes healthy relays (heartbeat < 1 hour), then org-specific relays over instance relays, and finally selects the relay with the lowest ping latency. A relay cache file is used to remember the last successful relay.

Key changes:

  • Removed required --relay flag from command examples
  • Added GetRelayName function in helper.go with smart relay selection logic
  • Moved gateway initialization after access token and relay name are obtained
  • Relay selection order: manual flag/env > cached relay (if healthy) > org relays (lowest ping) > instance relays (lowest ping) > random healthy relay

Critical issue found:

  • The code attempts to access the domain flag via cmd.Flags().GetString("domain"), but domain is only defined as a persistent flag on rootCmd, not on gatewayStartCmd. This will cause a runtime error when auto-selecting relays.

Confidence Score: 1/5

  • This PR has a critical bug that will cause runtime failures when auto-selecting relays
  • The domain flag access issue in helper.go:50 will cause the auto-relay selection feature to fail immediately. The domain flag is defined as a persistent flag on rootCmd, but the code tries to access it through cmd.Flags().GetString("domain") which won't work for subcommands unless the flag is explicitly added to them. This is a blocking issue that needs to be resolved before merge.
  • packages/util/helper.go requires immediate attention due to the domain flag access bug that will cause runtime errors

Important Files Changed

File Analysis

Filename Score Overview
packages/cmd/gateway.go 4/5 Refactored to call GetRelayName for automatic relay selection; moved gateway instantiation after obtaining access token and relay name
packages/util/helper.go 1/5 Added GetRelayName function with relay selection logic; contains critical bug with domain flag access that will cause runtime error

Sequence Diagram

sequenceDiagram
    participant User
    participant gatewayStartCmd
    participant GetRelayName
    participant InfisicalAPI
    participant HealthyRelays
    participant Cache

    User->>gatewayStartCmd: Start gateway (optional --relay flag)
    gatewayStartCmd->>gatewayStartCmd: Get access token
    gatewayStartCmd->>GetRelayName: Call with cmd, forceRefetch=false, token
    
    GetRelayName->>GetRelayName: Check --relay flag or INFISICAL_RELAY_NAME env
    alt Relay specified
        GetRelayName-->>gatewayStartCmd: Return specified relay
    else No relay specified (auto-select)
        GetRelayName->>GetRelayName: Get domain flag (BUG: not accessible)
        GetRelayName->>InfisicalAPI: GET /v1/relays
        InfisicalAPI-->>GetRelayName: All relays
        GetRelayName->>GetRelayName: Filter healthy relays (heartbeat < 1 hour)
        
        alt Has cached relay
            GetRelayName->>Cache: Read RELAY_CACHE_FILE
            Cache-->>GetRelayName: Cached relay name
            GetRelayName->>GetRelayName: Check if cached relay in healthy list
            GetRelayName->>GetRelayName: Ping cached relay (TCP :8443)
            alt Cached relay responsive
                GetRelayName-->>gatewayStartCmd: Return cached relay
            end
        end
        
        GetRelayName->>GetRelayName: Split into org relays & instance relays
        
        alt Has healthy org relays
            GetRelayName->>HealthyRelays: Ping all org relays (parallel)
            HealthyRelays-->>GetRelayName: Latencies
            GetRelayName->>GetRelayName: Select lowest latency org relay
        end
        
        alt No responsive org relays AND has instance relays
            GetRelayName->>HealthyRelays: Ping all instance relays (parallel)
            HealthyRelays-->>GetRelayName: Latencies
            GetRelayName->>GetRelayName: Select lowest latency instance relay
        end
        
        alt All pings failed
            GetRelayName->>GetRelayName: Random selection (prioritize org relays)
        end
        
        GetRelayName->>Cache: Write chosen relay to RELAY_CACHE_FILE
        GetRelayName-->>gatewayStartCmd: Return chosen relay name
    end
    
    gatewayStartCmd->>gatewayStartCmd: Create gateway instance with relay
    gatewayStartCmd->>gatewayStartCmd: Start gateway
Loading

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@x032205
Copy link
Member Author

x032205 commented Nov 10, 2025

@greptile review this

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Implements automatic relay selection when --relay flag is not provided. The selection prioritizes healthy relays, then org relays, then lowest ping.

Key changes:

  • Added CallGetRelays API to fetch available relays from platform
  • Created GetRelayName helper that selects best relay based on health, org membership, and latency
  • Removed relay flag requirement from gateway start commands
  • Implements caching to reuse last successful relay

Issues found:

  • Critical security concern: Relay Host field from API used without validation to construct network addresses (lines 83, 126 in helper.go). Compromised backend could return malicious hosts enabling SSRF/DNS manipulation attacks
  • Weak random number generation using math/rand instead of crypto/rand for relay selection (lines 186, 189)
  • Cache file permissions too permissive (0644 vs 0600)

The automatic selection logic is well-structured with proper fallbacks, but needs security hardening before merge.

Confidence Score: 2/5

  • Not safe to merge - critical security vulnerabilities in relay Host validation
  • Score of 2 due to multiple security issues: unvalidated relay Host field enables DNS manipulation/SSRF attacks if backend is compromised, weak RNG for security-sensitive relay selection, and overly permissive cache file permissions. The core logic is sound, but these security flaws must be fixed before merge.
  • packages/util/helper.go requires immediate attention for security fixes on Host validation (lines 83, 126) and RNG usage (lines 186, 189)

Important Files Changed

File Analysis

Filename Score Overview
packages/api/api.go 5/5 Added CallGetRelays function to fetch relays from API - straightforward implementation with proper error handling
packages/api/model.go 5/5 Added Relay struct and GetRelaysResponse type - clean data model additions
packages/util/helper.go 2/5 New GetRelayName function with auto-selection logic - contains security issues: unvalidated Host field enables DNS manipulation attacks, weak random number generation for relay selection

Sequence Diagram

sequenceDiagram
    participant Gateway as Gateway Start
    participant GetRelayName as GetRelayName()
    participant API as Infisical API
    participant Cache as Cache File
    participant Network as Network Ping

    Gateway->>GetRelayName: Request relay name (with token)
    GetRelayName->>GetRelayName: Check --relay flag
    alt Relay flag provided
        GetRelayName-->>Gateway: Return relay name
    else No relay flag
        GetRelayName->>API: CallGetRelays()
        API-->>GetRelayName: Return all relays
        GetRelayName->>GetRelayName: Filter healthy relays (heartbeat < 1hr)
        alt Has healthy relays
            GetRelayName->>Cache: Read cached relay name
            alt Cache exists & relay still healthy
                GetRelayName->>Network: Ping cached relay:8443
                alt Ping successful
                    GetRelayName-->>Gateway: Return cached relay
                else Ping failed
                    GetRelayName->>GetRelayName: Proceed to selection
                end
            end
            GetRelayName->>GetRelayName: Split into org/instance relays
            alt Has org relays
                GetRelayName->>Network: Ping all org relays (parallel)
                Network-->>GetRelayName: Return latencies
                GetRelayName->>GetRelayName: Select lowest latency
            else No org relays or all failed
                GetRelayName->>Network: Ping all instance relays (parallel)
                Network-->>GetRelayName: Return latencies
                GetRelayName->>GetRelayName: Select lowest latency
            end
            alt All pings failed
                GetRelayName->>GetRelayName: Random selection from healthy relays
            end
            GetRelayName->>Cache: Write selected relay name
            GetRelayName-->>Gateway: Return selected relay
        else No healthy relays
            GetRelayName-->>Gateway: Error: no healthy relays
        end
    end
Loading

3 files reviewed, 6 comments

Edit Code Review Agent Settings | Greptile

@x032205 x032205 merged commit a760abc into main Nov 10, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants