hiroTamada (contributor) commented Sep 30, 2025

<!-- mesa-description-start -->
Initially:

[screenshot]

After running:

➜  ~ curl -X PATCH http://localhost:444/display \
  -H "Content-Type: application/json" \
  -d '{"width": 1920, "height": 1080}'
{"height":1080,"width":1920}
[screenshot]

For headless mode, I exec'd into the running Docker container and confirmed the Xvfb configuration:

root@18f971b11d5d:/# ps aux | grep Xvfb
root      1675  0.0  0.8 203536 69648 ?        S    18:31   0:00 Xvfb :1 -ac -screen 0 1920x1080x24 -retro -dpi 96 -nolisten tcp -nolisten unix
root      2195  0.0  0.0   2904  1532 pts/1    S+   18:34   0:00 grep --color=auto Xvfb
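The same check could be automated, for example in an e2e test, by parsing the `-screen` argument from the Xvfb command line. This is a hedged sketch. `parseXvfbScreen` is a hypothetical helper, and it assumes the `-screen <n> WIDTHxHEIGHTxDEPTH` form shown in the `ps` output above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseXvfbScreen (hypothetical) extracts WIDTHxHEIGHTxDEPTH from an Xvfb
// command line such as "Xvfb :1 -ac -screen 0 1920x1080x24 -dpi 96".
func parseXvfbScreen(cmdline string) (width, height, depth int, err error) {
	fields := strings.Fields(cmdline)
	for i, f := range fields {
		// Expected shape: -screen <screennum> <WxHxD>
		if f != "-screen" || i+2 >= len(fields) {
			continue
		}
		parts := strings.Split(fields[i+2], "x")
		if len(parts) != 3 {
			return 0, 0, 0, fmt.Errorf("unexpected geometry %q", fields[i+2])
		}
		nums := make([]int, 3)
		for j, p := range parts {
			n, convErr := strconv.Atoi(p)
			if convErr != nil {
				return 0, 0, 0, fmt.Errorf("bad number %q: %w", p, convErr)
			}
			nums[j] = n
		}
		return nums[0], nums[1], nums[2], nil
	}
	return 0, 0, 0, fmt.Errorf("no -screen argument found")
}

func main() {
	line := "Xvfb :1 -ac -screen 0 1920x1080x24 -retro -dpi 96 -nolisten tcp -nolisten unix"
	w, h, d, err := parseXvfbScreen(line)
	fmt.Println(w, h, d, err) // 1920 1080 24 <nil>
}
```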

TL;DR

This PR introduces a PATCH /display API endpoint to dynamically change the browser's viewport resolution without restarting the container.

Why we made these changes

This allows users to programmatically control the browser's viewport size, which is crucial for testing responsive designs, simulating different device screens, and tailoring the resolution for various automation tasks.

What changed?

  • API: Added a PATCH /display endpoint in server/cmd/api/api/display.go to allow dynamic resolution changes for both headful (Xorg) and headless (Xvfb) modes.
  • Resolution Logic: The new handler uses xrandr for Xorg and supervisor restarts for Xvfb. It includes a require_idle safety check and restarts Chromium to apply the new viewport.
  • Neko Integration: Added an authenticated Neko client (server/lib/nekoclient/client.go) to manage headful sessions.
  • Dependencies: Added the x11-xserver-utils package to the images/chromium-headful/Dockerfile to provide xrandr.
  • Configuration: Updated images/chromium-headful/xorg.conf to include additional display modes.
  • API Spec: Updated openapi.yaml with the new display endpoint and regenerated the Go client.
  • Scripts: Modified run-docker.sh and run-unikernel.sh to pass Chromium flags as a JSON array instead of a space-separated string.
  • Testing: Added a new e2e test, TestDisplayResolutionChange, to validate the functionality.

Validation

  • Manually tested resolution change with the curl command.
  • Verified Chromium restarts and fits the new resolution.
  • Ensured require_idle flag prevents changes during an active session.

Description generated by Mesa.
<!-- mesa-description-end -->


hiroTamada marked this pull request as ready for review October 4, 2025 04:51

mesa-dot-dev (bot) left a comment:

Performed full review of 2d73dc8...fc57289

Analysis

  1. Session Disruption Risk: The requirement to restart Chromium after resolution changes could negatively impact user experience and session continuity.

  2. Deployment Infrastructure Coupling: Direct modification of supervisor configuration files for Xvfb creates tight coupling between the application and deployment infrastructure.

  3. System Component Coordination Complexity: The implementation requires orchestrating multiple system components (X server, supervisor, Chromium), increasing the number of potential failure points and making the system more brittle.

  4. Runtime Flexibility vs. Stability Tradeoff: While the configurable viewport adds runtime flexibility, it also introduces additional complexity that could impact system stability.

This review was generated by Mesa.

4 files reviewed | 0 comments

cursor[bot]

This comment was marked as outdated.

hiroTamada changed the title from "WIP: POC for configurable viewport" to "Configurable viewport" on Oct 4, 2025

wget ca-certificates python2 supervisor xclip xdotool \
pulseaudio dbus-x11 xserver-xorg-video-dummy \
libcairo2 libxcb1 libxrandr2 libxv1 libopus0 libvpx7 \
x11-xserver-utils \
hiroTamada (author):

Needed for xrandr.


hiroTamada requested a review from rgarcia on October 7, 2025 23:09
s.ProcessExec(ctx, oapi.ProcessExecRequestObject{Body: &removeEnvReq})

// Add the environment line with WIDTH and HEIGHT
addEnvCmd := []string{"-lc", fmt.Sprintf(`sed -i '/\[program:xvfb\]/a environment=WIDTH="%d",HEIGHT="%d",DPI="96",DISPLAY=":1"' /etc/supervisor/conf.d/services/xvfb.conf`, width, height)}
Contributor:

This logic is intricate enough that an e2e test might be beneficial. See https://github.com/onkernel/kernel-images/blob/main/server/e2e/e2e_chromium_test.go#L102-L219

Sayan- left a comment:

great stuff!

Two general thoughts on Neko:

  1. Instead of raw HTTP requests, I wonder if it would be easier to generate an oapi client from their spec (e.g. https://github.com/onkernel/neko/blob/aa0487f68ebbff1056d7355ec2da127986e5be5f/server/openapi.yaml#L146-L170 +
  2. It might be helpful to pull the Neko bits out into their own client that manages the token internally, rather than making that the ApiService's responsibility.

Comment on lines 1181 to 1184
refresh_rate:
  type: integer
  enum: [60, 30, 25]
  description: Display refresh rate in Hz. If omitted, uses the highest available rate for the resolution.
Sayan- commented Oct 8, 2025:

Double-checking: did we also benchmark the impact of refresh rate? I don't see anything in https://docs.google.com/spreadsheets/d/1VBhIEoNTJoD95gKsalM3XUH2lDnz2XwFlyVwECV-45I/edit?gid=0#gid=0

Namely, I'd expect higher refresh rates to consume more resources in the GStreamer pipeline backing live views, so I'm curious about the numbers there.

hiroTamada (author):

On the API side, we are only going to allow the following configurations (width x height x refresh rate in Hz):

1024x768x60
1920x1080x60
2560x1440x30
1920x1200x60
1440x900x30

On the images side, I think we can leave it flexible?
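Here is a sketch of how that allowlist could be enforced before any display change is attempted. The `displayMode` type and the `validateMode` function are hypothetical. The five entries are the ones listed above, read as width x height x refresh rate.

```go
package main

import "fmt"

// displayMode (hypothetical) is one allowed width/height/refresh-rate combination.
type displayMode struct {
	Width, Height, RefreshRate int
}

// allowedModes holds the five configurations listed in the comment above.
var allowedModes = map[displayMode]bool{
	{1024, 768, 60}:  true,
	{1920, 1080, 60}: true,
	{2560, 1440, 30}: true,
	{1920, 1200, 60}: true,
	{1440, 900, 30}:  true,
}

// validateMode (hypothetical) rejects anything outside the allowlist.
func validateMode(w, h, rate int) error {
	if !allowedModes[displayMode{w, h, rate}] {
		return fmt.Errorf("unsupported display mode %dx%d@%dHz", w, h, rate)
	}
	return nil
}

func main() {
	fmt.Println(validateMode(1920, 1080, 60)) // <nil>
	fmt.Println(validateMode(1920, 1080, 30)) // unsupported display mode 1920x1080@30Hz
}
```

A map keyed by the full triple keeps the check a single lookup. It also makes it impossible to accept a resolution at a refresh rate that was never benchmarked.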

Contributor:

Did we also want 10 in here?

Comment on lines 1205 to 1216
live_view_sessions:
  type: integer
  description: Number of active Neko viewer sessions.
is_recording:
  type: boolean
  description: Whether recording is currently active
is_replaying:
  type: boolean
  description: Whether replay is currently active
resizable_now:
  type: boolean
  description: True when no blockers are present for resizing
Contributor:

Do we still need all these bits?

hiroTamada (author):

I guess not. I can just return the current width, height, and refresh_rate.

if requireIdle {
	live := s.getActiveNekoSessions(ctx)
	isRecording := s.anyRecordingActive(ctx)
	isReplaying := false // replay not currently implemented
Contributor:

Is this droppable?

hiroTamada (author):

My thought process: we can make the images API flexible enough to handle both cases, whether or not the caller requires that there be no active sessions.

If the request sets requireIdle, we reject it when there are active sessions.

Contributor:

Ah, sorry, this was in reference to isReplaying. I think it's just a duplicate of isRecording, and it isn't implemented.

hiroTamada (author):

Ah, I see. Yeah, I'll do that. Thanks!
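Here is a minimal sketch of the `requireIdle` check as agreed in this thread, with `isReplaying` dropped. `idleState` and `checkIdle` are hypothetical names. In the real handler, the values would come from `getActiveNekoSessions` and `anyRecordingActive`. How the returned error maps to an HTTP status is left open here.

```go
package main

import "fmt"

// idleState (hypothetical) holds the blockers discussed above. There is no
// replay field: replay is not implemented and duplicated the recording check.
type idleState struct {
	LiveViewSessions int
	IsRecording      bool
}

// checkIdle (hypothetical) returns an error only when requireIdle is set and
// a resize would disrupt a live view or an in-progress recording.
func checkIdle(requireIdle bool, s idleState) error {
	if !requireIdle {
		return nil
	}
	if s.LiveViewSessions > 0 {
		return fmt.Errorf("refusing resize: %d live view session(s) active", s.LiveViewSessions)
	}
	if s.IsRecording {
		return fmt.Errorf("refusing resize: recording in progress")
	}
	return nil
}

func main() {
	fmt.Println(checkIdle(true, idleState{LiveViewSessions: 1})) // refusing resize: 1 live view session(s) active
}
```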

displayMode := s.detectDisplayMode(ctx)

// Parse restartChromium flag (default depends on mode)
restartChrome := (displayMode == "xvfb") // default true for xvfb, false for xorg
Contributor:

Checking my understanding: why don't we restart for headful?

hiroTamada (author) commented Oct 8, 2025:

We start Chromium with the --start-maximized flag, so when the Xorg display expands, the Chromium window expands with it. The request can optionally override this default. Chromium restarts take time, so we want to avoid them as much as possible.
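The default-plus-override behavior described above can be shown as a small sketch. `shouldRestartChromium` is a hypothetical helper. It assumes the request's override arrives as an optional (`*bool`) field, so that "not set" can be told apart from an explicit `false`.

```go
package main

import "fmt"

// shouldRestartChromium (hypothetical) returns the effective restart decision.
// Default: restart under Xvfb (headless). Skip under Xorg (headful), where
// --start-maximized lets the window follow the new screen size.
// A non-nil override from the request always wins.
func shouldRestartChromium(displayMode string, override *bool) bool {
	if override != nil {
		return *override
	}
	return displayMode == "xvfb"
}

func main() {
	fmt.Println(shouldRestartChromium("xorg", nil), shouldRestartChromium("xvfb", nil)) // false true
}
```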

Comment on lines +122 to +142
// detectDisplayMode detects whether we're running Xorg (headful) or Xvfb (headless)
func (s *ApiService) detectDisplayMode(ctx context.Context) string {
	log := logger.FromContext(ctx)
	checkCmd := []string{"-lc", "supervisorctl status xvfb >/dev/null 2>&1 && echo 'xvfb' || echo 'xorg'"}
	checkReq := oapi.ProcessExecRequest{Command: "bash", Args: &checkCmd}
	checkResp, _ := s.ProcessExec(ctx, oapi.ProcessExecRequestObject{Body: &checkReq})

	if execResp, ok := checkResp.(oapi.ProcessExec200JSONResponse); ok {
		if execResp.StdoutB64 != nil {
			if output, err := base64.StdEncoding.DecodeString(*execResp.StdoutB64); err == nil {
				outputStr := strings.TrimSpace(string(output))
				if outputStr == "xvfb" {
					log.Info("detected Xvfb display (headless mode)")
					return "xvfb"
				}
			}
		}
	}
	log.Info("detected Xorg display (headful mode)")
	return "xorg"
}
Contributor:

Honestly, I wonder if this would be easier as an env var.

hiroTamada (author):

Yeah, good point. Let me look into it. I talked about this with Raf as well, and he was more comfortable with this approach.

Comment on lines 161 to 172
// setResolutionXorgViaNeko changes resolution for Xorg using Neko API
func (s *ApiService) setResolutionXorgViaNeko(ctx context.Context, width, height, refreshRate int, restartChrome bool) error {
	if err := s.setResolutionViaNeko(ctx, width, height, refreshRate); err != nil {
		return fmt.Errorf("failed to change resolution via Neko API: %w", err)
	}

	if restartChrome {
		s.restartChromium(ctx)
	}

	return nil
}
Contributor:

Do we need this wrapper?

Comment on lines 438 to 440
// getNekoToken obtains a bearer token from Neko API for authentication.
// It caches the token and reuses it for subsequent requests.
func (s *ApiService) getNekoToken(ctx context.Context) (string, error) {
Contributor:

Nit: we should pull the Neko port/URL from the environment instead of hardcoding it.

Contributor:

How long do these tokens last? Do they survive Neko restarts? I'm mainly wondering if we could just fetch one at API startup so it's always ready, though I can see there could be problems with that approach.

hiroTamada (author):

I think the token lifetime is technically infinite, but it can be manually expired or rotated. Maybe it's safer to keep the current implementation?

hiroTamada requested reviews from Sayan- and rgarcia on October 8, 2025 20:04

  eval "FLAGS_ARRAY=($CHROMIUM_FLAGS)"
else
  FLAGS_ARRAY=()
fi

Bug: Environment Variable Injection Vulnerability

The use of eval "FLAGS_ARRAY=($CHROMIUM_FLAGS)" in both scripts introduces a command injection vulnerability. Since CHROMIUM_FLAGS can be set via environment variables, malicious input could lead to arbitrary shell code execution.

Additional Locations (1)

hiroTamada (author):

That's fine. This isn't for production use.

fi
done
FLAGS_JSON+=']}'
echo "$FLAGS_JSON" > "$FLAGS_FILE"

Bug: Shell Code Injection via User-Controlled Flags

The eval "FLAGS_ARRAY=($CHROMIUM_FLAGS)" command in both scripts introduces a command injection vulnerability. Since CHROMIUM_FLAGS can be user-controlled, malicious input could lead to arbitrary shell code execution.

Additional Locations (1)

hiroTamada (author):

Again, this isn't user-facing, so it's fine.

// Note: Using bash -c (not -lc) to avoid login shell overriding DISPLAY env var
cmd := exec.CommandContext(ctx, "bash", "-c", "xrandr | grep -E '\\*' | awk '{print $1}'")
cmd.Env = append(os.Environ(), fmt.Sprintf("DISPLAY=%s", display))


Bug: Insecure Command Execution in Resolution Fetching

The getCurrentResolution function directly uses exec.CommandContext with bash -c instead of the ProcessExec API. This approach is fragile and vulnerable to command injection due to unescaped DISPLAY and xrandr output parsing. It also bypasses internal process controls and logging, and handles the DISPLAY environment variable inconsistently.

hiroTamada (author):

This isn't a user-facing path, so it's fine.
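For comparison, here is a sketch of reading the current mode without going through `bash -c`. It runs `xrandr` directly and parses the line marked `*` in Go, which is what the `grep`/`awk` pipeline does. `currentResolution` and `parseCurrentMode` are hypothetical names. The sample output in `main` is illustrative rather than captured from the image.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// parseCurrentMode (hypothetical) returns the first field of the first line
// containing '*', which xrandr uses to mark the active mode (e.g. "1920x1080").
func parseCurrentMode(out string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, "*") {
			if fields := strings.Fields(line); len(fields) > 0 {
				return fields[0], nil
			}
		}
	}
	return "", fmt.Errorf("no active mode in xrandr output")
}

// currentResolution (hypothetical) runs xrandr without a shell. DISPLAY is
// appended last, so it takes precedence over any inherited value.
func currentResolution(ctx context.Context, display string) (string, error) {
	cmd := exec.CommandContext(ctx, "xrandr")
	cmd.Env = append(os.Environ(), "DISPLAY="+display)
	out, err := cmd.Output()
	if err != nil {
		return "", fmt.Errorf("xrandr: %w", err)
	}
	return parseCurrentMode(string(out))
}

func main() {
	// Illustrative xrandr output; the output name and modes are made up.
	sample := "Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 16384 x 16384\n" +
		"DUMMY0 connected 1920x1080+0+0 0mm x 0mm\n" +
		"   1920x1080     60.00*+\n" +
		"   1024x768      60.00\n"
	mode, err := parseCurrentMode(sample)
	fmt.Println(mode, err) // 1920x1080 <nil>
}
```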

"--privileged",
"--network=host",
"-p", "10001:10001", // API server
"-p", "9222:9222", // DevTools proxy

Bug: API URL Mismatch After Network Change

The runContainer function switched from --network=host to explicit port mapping (-p 10001:10001). This means the hardcoded apiBaseURL (http://127.0.0.1:10001) no longer correctly points to the container's API server from the host, causing API calls in tests that use runContainer to fail.

hiroTamada (author):

I had to do this to run it on a Mac. I'll revert it if it isn't necessary.
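If both setups need to keep working, the networking flags could be chosen per host OS. This is a hedged sketch. `networkArgs` is a hypothetical helper. The reasoning assumes that Docker Desktop on macOS has historically not provided Linux-style host networking, while `--network=host` is kept on Linux. The port numbers are the ones from the snippet above.

```go
package main

import (
	"fmt"
	"runtime"
)

// networkArgs (hypothetical) picks docker run networking flags per host OS:
// host networking on Linux, explicit port mappings elsewhere (e.g. macOS).
func networkArgs(goos string) []string {
	if goos == "linux" {
		return []string{"--network=host"}
	}
	return []string{
		"-p", "10001:10001", // API server
		"-p", "9222:9222", // DevTools proxy
	}
}

func main() {
	fmt.Println(networkArgs(runtime.GOOS))
}
```

With the port mappings, a test talking to http://127.0.0.1:10001 on the host still reaches the container's API server. Only the set of exposed ports becomes explicit.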

s.ProcessExec(ctx, oapi.ProcessExecRequestObject{Body: &removeEnvReq})

// Add the environment line with WIDTH and HEIGHT
addEnvCmd := []string{"-lc", fmt.Sprintf(`sed -i '/\[program:xvfb\]/a environment=WIDTH="%d",HEIGHT="%d",DPI="96",DISPLAY=":1"' /etc/supervisor/conf.d/services/xvfb.conf`, width, height)}

Bug: Vulnerability in sed Command Injection

The sed commands in setResolutionXvfb use fmt.Sprintf to inject width and height into shell commands without escaping. This allows for command injection. The direct string manipulation of the supervisor config is also fragile, risking breakage if the format changes.

hiroTamada (author):

That's fine. It's not user-facing.
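Here is a sketch of doing the same config edit in Go as a plain text transformation, with no shell and no sed. It drops any `environment=` line inside `[program:xvfb]` and inserts the new one right after the section header, which is what the two sed commands do. `setXvfbEnvironment` is a hypothetical helper. Reading and writing /etc/supervisor/conf.d/services/xvfb.conf and restarting the program through supervisor are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// setXvfbEnvironment (hypothetical) returns conf with the environment= line of
// the [program:xvfb] section replaced by one carrying the new WIDTH/HEIGHT.
func setXvfbEnvironment(conf string, width, height int) (string, error) {
	envLine := fmt.Sprintf(`environment=WIDTH="%d",HEIGHT="%d",DPI="96",DISPLAY=":1"`, width, height)
	lines := strings.Split(conf, "\n")
	out := make([]string, 0, len(lines)+1)
	inSection, found := false, false
	for _, l := range lines {
		trimmed := strings.TrimSpace(l)
		if strings.HasPrefix(trimmed, "[") {
			// Entering a new section; insert the env line right after our header.
			inSection = trimmed == "[program:xvfb]"
			out = append(out, l)
			if inSection {
				found = true
				out = append(out, envLine)
			}
			continue
		}
		if inSection && strings.HasPrefix(trimmed, "environment=") {
			continue // drop the old environment line
		}
		out = append(out, l)
	}
	if !found {
		return "", fmt.Errorf("[program:xvfb] section not found")
	}
	return strings.Join(out, "\n"), nil
}

func main() {
	conf := "[program:xvfb]\ncommand=/usr/bin/Xvfb :1\n"
	out, err := setXvfbEnvironment(conf, 1920, 1080)
	fmt.Println(out, err)
}
```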

  eval "FLAGS_ARRAY=($CHROMIUM_FLAGS)"
else
  FLAGS_ARRAY=()
fi

Bug: Shell Injection via CHROMIUM_FLAGS

The eval command in run-unikernel.sh and run-docker.sh parses CHROMIUM_FLAGS. This creates a security vulnerability, allowing arbitrary command injection if CHROMIUM_FLAGS contains malicious shell code, as it can be controlled by user input or environment variables.

Additional Locations (1)

Sayan- left a comment:

overall lgtm! last couple of q's

Comment on lines 53 to 62
// Initialize Neko authenticated client
adminPassword := os.Getenv("NEKO_ADMIN_PASSWORD")
if adminPassword == "" {
	adminPassword = "admin" // Default from neko.yaml
}
nekoAuthClient, err := nekoclient.NewAuthClient("http://127.0.0.1:8080", "admin", adminPassword)
if err != nil {
	return nil, fmt.Errorf("failed to create neko auth client: %w", err)
}

Contributor:

Why do we prefer to initialize this here instead of passing it in?

hiroTamada (author):

No reason, lol.

Comment on lines 1181 to 1184
refresh_rate:
  type: integer
  enum: [60, 30, 25]
  description: Display refresh rate in Hz. If omitted, uses the highest available rate for the resolution.
Contributor:

Did we also want 10 in here?

Comment on lines +12 to +20
// AuthClient wraps the Neko OpenAPI client and handles authentication automatically.
// It manages token caching and refresh on 401 responses.
type AuthClient struct {
	client   *nekooapi.ClientWithResponses
	tokenMu  sync.Mutex
	token    string
	username string
	password string
}
Contributor:

this is way easier to read, thanks for iterating!

	N25 PatchDisplayRequestRefreshRate = 25
	N30 PatchDisplayRequestRefreshRate = 30
	N60 PatchDisplayRequestRefreshRate = 60
)

Bug: Missing Enum Value Causes Validation Errors

The generated PatchDisplayRequestRefreshRate enum constants are incomplete. The value 10 is missing from the generated Go code, despite being defined in the OpenAPI spec. This prevents clients from using 10 Hz and may lead to validation issues.

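For reference, here is the shape the regenerated constants would take if 10 is added to the spec's `enum`. The spec snippet above lists only 60, 30 and 25, and whether to add 10 was still an open question. The sketch assumes the generated type is an `int`-based enum. The `Valid` method is a hypothetical addition. The normal fix is to regenerate the client from the updated spec rather than hand-edit generated code.

```go
package main

import "fmt"

// PatchDisplayRequestRefreshRate mirrors the generated enum type (assumed int-based).
type PatchDisplayRequestRefreshRate int

const (
	N10 PatchDisplayRequestRefreshRate = 10 // the value reported as missing
	N25 PatchDisplayRequestRefreshRate = 25
	N30 PatchDisplayRequestRefreshRate = 30
	N60 PatchDisplayRequestRefreshRate = 60
)

// Valid (hypothetical) reports whether r is one of the enum values.
func (r PatchDisplayRequestRefreshRate) Valid() bool {
	switch r {
	case N10, N25, N30, N60:
		return true
	}
	return false
}

func main() {
	fmt.Println(N10.Valid(), PatchDisplayRequestRefreshRate(15).Valid()) // true false
}
```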

hiroTamada force-pushed the hiro/configurable_viewport branch from d1189ff to 9802b64 on October 10, 2025 18:55
hiroTamada force-pushed the hiro/configurable_viewport branch from 9802b64 to 513dcbb on October 10, 2025 18:55
else
FLAGS_JSON+=",\"$flag\""
fi
fi

Bug: Shell Command Injection via eval

The eval command used to parse CHROMIUM_FLAGS in both run-unikernel.sh and run-docker.sh introduces a command injection vulnerability. If CHROMIUM_FLAGS contains malicious shell commands, eval will execute them, potentially leading to arbitrary code execution.

Additional Locations (1)


fi
done
FLAGS_JSON+=']}'
echo "$FLAGS_JSON" > "$FLAGS_FILE"

Bug: JSON Flag Parsing Fails on Special Characters

The JSON generation for Chromium flags doesn't escape special characters within flag values. If a flag contains double quotes, backslashes, or other JSON special characters, the resulting JSON will be invalid. This could cause the downstream flag parsing system to fail or misinterpret flags.

Additional Locations (1)


hiroTamada merged commit a730519 into main on Oct 10, 2025
11 of 14 checks passed
hiroTamada deleted the hiro/configurable_viewport branch on October 10, 2025 19:38

4 participants