feat: add SSH subscribe/execute/report remote-ops#1
Conversation
Device agent (BusyBox sh, no runtime) + cloud Go dispatcher (gliderlabs/ssh, independent of bastion): - device-agent/remote-agent.sh: poll fetch -> run via sh -> report, with offline spooling, idempotency and per-task timeout; inittab respawn snippet. - device-server/main.go: pubkey->devid auth (device never self-reports id), fetch/report verbs over SSH exec, file-backed task queue, taskid path- traversal allowlist, 4 MiB result cap. devid is derived from the authenticated public key; host-key pinning enforced on the device side (no -y). Verified end-to-end (fetch/execute/report, unauth key rejected, path traversal rejected).
There was a problem hiding this comment.
Code Review
This pull request introduces a minimal SSH-based 'subscribe / execute / report' system consisting of a shell-based device agent (remote-agent.sh) and a Go-based dispatcher server (device-server). Feedback on these changes highlights several critical security and robustness improvements. Key recommendations include validating taskid on the agent and devid on the server to prevent path traversal vulnerabilities, explicitly rejecting empty device IDs, avoiding standard input conflicts when executing scripts on the agent, preserving SSH stderr output for troubleshooting, properly detecting truncated execution reports, and handling file deletion errors to prevent infinite execution loops.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| [ -n "$resp" ] || return 0 | ||
|
|
||
| taskid=$(printf '%s\n' "$resp" | head -n1) | ||
| [ -n "$taskid" ] || return 0 |
There was a problem hiding this comment.
The taskid received from the server is used directly to write files in the spool directory ($SPOOL/$taskid). If the server is compromised or spoofed, a malicious taskid containing path traversal characters (e.g., ../../etc/cron.d/malicious) could allow arbitrary file write on the device. Validate the taskid to ensure it only contains safe characters.
| [ -n "$taskid" ] || return 0 | |
| [ -n "$taskid" ] || return 0 | |
| case "$taskid" in | |
| *[!A-Za-z0-9._-]* | "." | "..") | |
| log "invalid taskid: $taskid" | |
| return 0 | |
| ;; | |
| esac |
| } | ||
|
|
||
| func (d *dispatcher) handle(sess ssh.Session) { | ||
| devid, _ := sess.Context().Value("devid").(string) |
There was a problem hiding this comment.
If devid is empty (e.g., due to a misconfiguration or if another authentication mechanism is added later), the server will perform operations on the root of tasksDir or resultsDir (since filepath.Join(dir, "") resolves to dir). This could allow unauthorized access or directory listing of all tasks. Explicitly reject connections with an empty devid.
devid, _ := sess.Context().Value("devid").(string)
if devid == "" {
io.WriteString(sess.Stderr(), "unauthorized: missing device identity\n")
_ = sess.Exit(1)
return
}|
|
||
| script=$(printf '%s\n' "$resp" | tail -n +2) | ||
| log "executing task $taskid" | ||
| out=$(printf '%s\n' "$script" | timeout "$TASK_TIMEOUT" sh 2>&1) |
There was a problem hiding this comment.
Piping the script body into sh's standard input can cause issues if any command inside the script (such as read, ssh, or interactive tools) also reads from standard input. Those commands will consume the remaining lines of the script itself, leading to truncated execution or unexpected behavior. Running the script via sh -c with stdin redirected from /dev/null avoids this issue and is more efficient.
| out=$(printf '%s\n' "$script" | timeout "$TASK_TIMEOUT" sh 2>&1) | |
| out=$(timeout "$TASK_TIMEOUT" sh -c "$script" < /dev/null 2>&1) |
| run_once() { | ||
| flush_spool | ||
|
|
||
| resp=$(dbc fetch 2>/dev/null) || { log "fetch failed"; return 0; } |
There was a problem hiding this comment.
Redirecting stderr to /dev/null for dbc fetch hides useful SSH connection errors (e.g., host key mismatch, network timeout, authentication failure). Removing 2>/dev/null allows these errors to be printed to stderr, which is captured by the init system or syslog, greatly improving observability and troubleshooting.
| resp=$(dbc fetch 2>/dev/null) || { log "fetch failed"; return 0; } | |
| resp=$(dbc fetch) || { log "fetch failed"; return 0; } |
| fields := strings.SplitN(line, " ", 2) | ||
| if len(fields) != 2 { | ||
| return nil, fmt.Errorf("line %d: expected '<devid> <pubkey>'", i+1) | ||
| } | ||
| pk, _, _, _, err := gossh.ParseAuthorizedKey([]byte(fields[1])) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("line %d: %w", i+1, err) | ||
| } | ||
| devs = append(devs, device{id: fields[0], key: pk}) |
There was a problem hiding this comment.
Validate the devid loaded from the authorized_devices file to ensure it only contains safe characters (reusing validTaskID). This acts as a defense-in-depth measure to prevent path traversal or other unexpected filesystem issues if the configuration file is misconfigured or maliciously modified.
fields := strings.SplitN(line, " ", 2)
if len(fields) != 2 {
return nil, fmt.Errorf("line %d: expected '<devid> <pubkey>'", i+1)
}
devid := fields[0]
if !validTaskID(devid) {
return nil, fmt.Errorf("line %d: invalid devid %q", i+1, devid)
}
pk, _, _, _, err := gossh.ParseAuthorizedKey([]byte(fields[1]))
if err != nil {
return nil, fmt.Errorf("line %d: %w", i+1, err)
}
devs = append(devs, device{id: devid, key: pk})| body, err := io.ReadAll(io.LimitReader(sess, d.maxResult)) | ||
| if err != nil { | ||
| log.Printf("report read %s/%s: %v", devid, taskid, err) | ||
| _ = sess.Exit(1) | ||
| return | ||
| } |
There was a problem hiding this comment.
Using io.LimitReader(sess, d.maxResult) silently truncates the report if it exceeds the maximum size, and the server still deletes the task as if it succeeded. This can lead to silent loss of execution logs. Reading d.maxResult + 1 bytes allows the server to detect when the limit is exceeded, reject the report with an error, and keep the task in the queue for troubleshooting.
| body, err := io.ReadAll(io.LimitReader(sess, d.maxResult)) | |
| if err != nil { | |
| log.Printf("report read %s/%s: %v", devid, taskid, err) | |
| _ = sess.Exit(1) | |
| return | |
| } | |
| body, err := io.ReadAll(io.LimitReader(sess, d.maxResult+1)) | |
| if err != nil { | |
| log.Printf("report read %s/%s: %v", devid, taskid, err) | |
| _ = sess.Exit(1) | |
| return | |
| } | |
| if int64(len(body)) > d.maxResult { | |
| io.WriteString(sess.Stderr(), "error: result exceeds maximum allowed size\n") | |
| _ = sess.Exit(2) | |
| return | |
| } |
| _ = sess.Exit(1) | ||
| return | ||
| } | ||
| _ = os.Remove(filepath.Join(d.tasksDir, devid, taskid)) |
There was a problem hiding this comment.
Ignoring the error from os.Remove can lead to silent infinite task execution loops if the task file cannot be deleted (e.g., due to permission issues). Log the error so that operators are alerted to filesystem issues.
if err := os.Remove(filepath.Join(d.tasksDir, devid, taskid)); err != nil {
log.Printf("failed to remove completed task %s/%s: %v", devid, taskid, err)
}- Removed the standalone device-server, integrating its fetch/report logic into the sshportal main program. - Updated README to reflect the new management commands for devices, device keys, and device tasks. - Implemented new database models for Device, DeviceKey, DeviceTask, and DeviceReport. - Added device management commands in the bastion shell for creating, inspecting, enabling, disabling, and removing devices. - Introduced device task management commands for creating, inspecting, listing, and canceling tasks. - Implemented device RPC handling for fetch and report commands, ensuring compatibility with existing device-agent scripts. - Enhanced authentication context to support device-specific operations.
…nality - Added WebSession model to handle cookie-based authentication. - Introduced session management in the API server with session creation and expiration. - Implemented login and logout handlers to manage user sessions. - Enhanced auth middleware to support both API tokens and web sessions. - Added session garbage collection to periodically clean up expired sessions. - Updated server to serve static files with optional on-disk overrides for the embedded UI. - Introduced a new endpoint for user authentication and session management. - Added a full-stack demo for end-to-end testing of the new features.
- Introduced new API endpoints for managing hosts and SSH keys, including listing, creating, retrieving, and deleting hosts. - Implemented WebSocket support for shell access to hosts, allowing users to interact with remote systems via a terminal interface. - Enhanced the Server struct to include AES key and ACL check command for secure operations. - Added CheckACL function to share authorization logic between the web API and bastion. - Updated the API server initialization to accept new parameters for encryption and ACL checks.
- Introduced new SFTP handling routes in the API to manage file operations (list, upload, download, mkdir, remove, rename). - Implemented SFTP client connection and authentication logic in a new sftp.go file. - Updated the existing hosts.go file to route SFTP requests appropriately. - Added necessary data structures and response formats for SFTP operations. - Enhanced error handling and logging for SFTP actions.
- Introduced device status derived from LastSeenAt with states: online, stale, and offline. - Implemented deviceInfo model to store periodic self-reports from devices. - Enhanced taskDTO to include task history and duration metrics. - Added new API endpoints for device info and task management. - Implemented event logging for task lifecycle events including queued, dispatched, reported, and cancelled. - Updated the agent to send periodic info snapshots to the server. - Added documentation for the device agent and REST API usage. - Improved error handling and response structures across various endpoints.
Device agent (BusyBox sh, no runtime) + cloud Go dispatcher (gliderlabs/ssh, independent of bastion):
devid is derived from the authenticated public key; host-key pinning enforced on the device side (no -y). Verified end-to-end (fetch/execute/report, unauth key rejected, path traversal rejected).