@buiilding buiilding commented Dec 19, 2025

Problem

When using local grounding models like OpenGVLab/InternVL3_5-4B via vLLM, the model may return normalized coordinates (0.0 - 1.0) instead of absolute pixel coordinates.
The previous implementation only extracted integers (re.findall(r"\d+", response)), which caused two issues:

  1. Decimals were stripped (e.g., 0.4518 became 0 and 4518).
  2. The y coordinate often became a large integer (e.g., 4518), causing pyautogui to trigger a FailSafeException by hitting the screen edge.
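The failure can be reproduced in isolation. The sketch below uses a hypothetical response string (the exact output format of the model is not shown in this PR) and assumes the first two extracted tokens are taken as x and y.

```python
import re

# Hypothetical model response containing normalized coordinates.
sample_response = "(0.4518, 0.3021)"

# Old behaviour: the integer-only pattern splits each decimal at the dot.
old_tokens = re.findall(r"\d+", sample_response)

# Assuming the first two tokens are used as (x, y), the result is x=0, y=4518.
old_x, old_y = int(old_tokens[0]), int(old_tokens[1])
```

The y value ends up far outside any normal screen height, which matches the crash described above.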

Solution

Updated gui_agents/s3/agents/grounding.py to:

  1. Parse floating point numbers correctly using re.findall(r"\d+\.?\d*", response).
  2. Detect if coordinates are normalized (values <= 1.0).
  3. Automatically scale normalized coordinates by grounding_width and grounding_height to get correct screen pixel values.
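The three steps above can be sketched as a standalone function. This is a minimal illustration, not the code in grounding.py: the name `parse_and_scale` and its signature are hypothetical, and the width and height are passed in directly instead of being read from the engine parameters.

```python
import re
from typing import Tuple


def parse_and_scale(response: str, grounding_width: int, grounding_height: int) -> Tuple[int, int]:
    """Hypothetical helper: parse two numbers and scale them if they look normalized."""
    # Step 1: capture integers and decimals such as "0.4518".
    numericals = re.findall(r"\d+\.?\d*", response)
    if len(numericals) < 2:
        raise ValueError(f"expected two coordinates, got: {response!r}")
    x, y = float(numericals[0]), float(numericals[1])

    # Step 2: treat the pair as normalized when both values are <= 1.0.
    if x <= 1.0 and y <= 1.0:
        # Step 3: scale to pixel coordinates.
        return int(x * grounding_width), int(y * grounding_height)

    # Otherwise keep the values as absolute pixel coordinates.
    return int(x), int(y)
```

For example, `parse_and_scale("(0.5, 0.25)", 1920, 1080)` gives `(960, 270)`, while `parse_and_scale("640, 480", 1920, 1080)` is returned unchanged as `(640, 480)`.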

Testing

Tested with agent_s using vllm provider and InternVL3_5-4B. The agent now correctly clicks on target elements instead of crashing with pyautogui.FailSafeException.

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced coordinate parsing to properly handle both normalized and absolute coordinate formats. Coordinates in the 0-1 range now receive automatic scaling based on grounding parameters, while absolute values maintain their existing behavior. This ensures full backward compatibility while extending support for both coordinate systems.


coderabbitai bot commented Dec 19, 2025

Walkthrough

The generate_coords function in the grounding module was updated to parse coordinates as floating-point numbers and conditionally scale normalized coordinates (in range [0,1]) using engine parameters, while preserving integer handling for absolute coordinates.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Coordinate Parsing Enhancement: gui_agents/s3/agents/grounding.py | Modified generate_coords to parse coordinates as floats; added conditional scaling logic that applies grounding_width and grounding_height when both coordinates fall within the [0,1] range; otherwise casts to integers directly, preserving backward compatibility for absolute coordinate values. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Focus areas: Verify the conditional logic for [0,1] range detection is correct and that the scaling calculation properly applies grounding_width and grounding_height
  • Edge cases: Confirm behavior for boundary values (0.0, 1.0) and mixed coordinate types (one normalized, one absolute)
  • Regression risk: Ensure existing behavior for absolute coordinates remains unchanged
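The boundary and mixed cases listed above can be checked directly against the PR's condition. The helper name `is_treated_as_normalized` is hypothetical; it only reproduces the `x <= 1.0 and y <= 1.0` test from the diff.

```python
def is_treated_as_normalized(x: float, y: float) -> bool:
    """Reproduces the detection condition used in the PR."""
    return x <= 1.0 and y <= 1.0


# Boundary values, a mixed pair, and ordinary absolute pixels.
samples = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.5, 500.0), (0.0, 500.0), (640.0, 480.0)]
results = {pair: is_treated_as_normalized(*pair) for pair in samples}
```

Only pairs where both values are at most 1.0 are scaled, so (0.0, 0.0) and (1.0, 1.0) count as normalized while (0.5, 500.0) and (0.0, 500.0) fall through to the absolute branch.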

Poem

🐰 A rabbit hops through coordinates divine,
Where normalized values now perfectly align,
From floats to scaled dimensions bright,
The grounding logic dances just right,
No integers lost—just enhanced and refined! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: handling normalized coordinates from local grounding models, which is the core fix addressing the normalization issue. |
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
gui_agents/s3/agents/grounding.py (2)

244-249: Regex pattern may miss decimals without leading digits.

The pattern r"\d+\.?\d*" correctly captures most floating-point numbers but will not match numbers starting with a decimal point (e.g., ".456"). While most models output "0.456" format, consider using r"\d+\.?\d*|\.\d+" for complete coverage.

🔎 More robust regex pattern
-        # Regex to find floating point numbers (0.xxxx) or integers
-        numericals = re.findall(r"\d+\.?\d*", response)
+        # Regex to find floating point numbers (0.xxxx, .xxxx) or integers
+        numericals = re.findall(r"\d+\.?\d*|\.\d+", response)
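The difference between the two patterns shows up only for numbers written without a leading zero. The sample string below is hypothetical and chosen to contain one such value.

```python
import re

ORIGINAL_PATTERN = r"\d+\.?\d*"
EXTENDED_PATTERN = r"\d+\.?\d*|\.\d+"

# Hypothetical response mixing ".456" and "0.789" styles.
sample = "x=.456, y=0.789"

# The original pattern skips the leading dot and captures "456" as an integer.
original_matches = re.findall(ORIGINAL_PATTERN, sample)

# The extended pattern keeps the dot, so float() recovers 0.456.
extended_matches = re.findall(EXTENDED_PATTERN, sample)
extended_values = [float(m) for m in extended_matches]
```

With the original pattern the first value would be read as 456 and treated as an absolute coordinate, which is the case the reviewer's alternative is meant to cover.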

242-242: Consider using logger instead of print for debugging.

The code uses print() for debugging output. For consistency with the rest of the codebase (which uses logging.getLogger("desktopenv.agent") at line 16), consider using the logger instead.

🔎 Proposed change
-        print("RAW GROUNDING MODEL RESPONSE:", response)
+        logger.debug(f"RAW GROUNDING MODEL RESPONSE: {response}")
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2cb57fb and 9aa4005.

📒 Files selected for processing (1)
  • gui_agents/s3/agents/grounding.py (1 hunks)

Comment on lines +251 to +257
        # If coordinates are normalized (0-1), scale them up
        if x <= 1.0 and y <= 1.0:
            x = int(x * self.engine_params_for_grounding["grounding_width"])
            y = int(y * self.engine_params_for_grounding["grounding_height"])
        else:
            x = int(x)
            y = int(y)

⚠️ Potential issue | 🔴 Critical

Add defensive checks for required engine parameters.

The code directly accesses self.engine_params_for_grounding["grounding_width"] and ["grounding_height"] without verifying these keys exist. If these parameters are missing from the configuration, the code will raise a KeyError and crash the agent.

🔎 Proposed fix with defensive checks
         # If coordinates are normalized (0-1), scale them up
         if x <= 1.0 and y <= 1.0:
-            x = int(x * self.engine_params_for_grounding["grounding_width"])
-            y = int(y * self.engine_params_for_grounding["grounding_height"])
+            grounding_width = self.engine_params_for_grounding.get("grounding_width")
+            grounding_height = self.engine_params_for_grounding.get("grounding_height")
+            if grounding_width is None or grounding_height is None:
+                raise ValueError("grounding_width and grounding_height must be specified in engine_params_for_grounding when using normalized coordinates")
+            x = int(x * grounding_width)
+            y = int(y * grounding_height)
         else:
             x = int(x)
             y = int(y)
🤖 Prompt for AI Agents
In gui_agents/s3/agents/grounding.py around lines 251 to 257, the code accesses
self.engine_params_for_grounding["grounding_width"] and ["grounding_height"]
directly which can raise KeyError; add defensive checks: verify the keys exist
and their values are numeric and >0 before using them (e.g., use dict.get with a
sensible default or explicitly raise a clear ValueError if missing/invalid),
convert to int only after validation, and ensure the fallback behavior for
normalized coordinates still makes sense when defaults are applied.
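One way to follow this prompt is to validate both parameters in a single place before scaling. The sketch below is an assumption about how that could look, not code from the PR: the function name `get_grounding_size` is hypothetical, and it chooses to raise a ValueError over applying defaults.

```python
from numbers import Real
from typing import Tuple


def get_grounding_size(engine_params: dict) -> Tuple[int, int]:
    """Hypothetical helper: return (width, height) or raise a clear ValueError."""
    width = engine_params.get("grounding_width")
    height = engine_params.get("grounding_height")
    for name, value in (("grounding_width", width), ("grounding_height", height)):
        # Reject missing values, non-numeric values (including bool), and non-positive sizes.
        if isinstance(value, bool) or not isinstance(value, Real) or value <= 0:
            raise ValueError(
                f"{name} must be a positive number in engine_params_for_grounding, got {value!r}"
            )
    # Convert to int only after validation has passed.
    return int(width), int(height)
```

Raising early keeps the error message specific to the missing parameter, where a silent default could send clicks to the wrong place on screen.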

⚠️ Potential issue | 🟠 Major

Consider edge cases in normalized coordinate detection.

The condition x <= 1.0 and y <= 1.0 may misidentify absolute coordinates as normalized in edge cases:

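  • Absolute coordinates where both values are ≤ 1.0, such as (0, 0), (1, 1), (0, 1), or (1, 0), would trigger incorrect scaling; (1, 1), for example, would be mapped to (grounding_width, grounding_height). Pairs such as (0, 500) or (1, 768) are not affected, because y > 1.0 fails the condition
  • Mixed coordinates (one normalized, one absolute) like (0.5, 500) would be treated as absolute, so x would be truncated to 0, potentially causing incorrect clicks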

While these edge cases are unlikely in typical usage, consider adding a more robust heuristic, such as checking if both coordinates are in the range [0.0, 1.0] and the model is known to output normalized coordinates (via a configuration flag).

🔎 Suggested improvement with configuration flag

Add a configuration parameter to explicitly indicate coordinate format:

# In __init__ or configuration
self.uses_normalized_coords = engine_params_for_grounding.get("normalized_coordinates", False)

Then update the detection logic:

-        # If coordinates are normalized (0-1), scale them up
-        if x <= 1.0 and y <= 1.0:
+        # If coordinates are normalized (0-1), scale them up
+        should_scale = self.uses_normalized_coords and 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0
+        if should_scale:
             x = int(x * self.engine_params_for_grounding["grounding_width"])
             y = int(y * self.engine_params_for_grounding["grounding_height"])
         else:

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In gui_agents/s3/agents/grounding.py around lines 251-257, the current check (x
<= 1.0 and y <= 1.0) can misclassify absolute coordinates as normalized; add a
configuration flag self.uses_normalized_coords =
engine_params_for_grounding.get("normalized_coordinates", False) (initialized in
__init__) and change the detection to only scale when
self.uses_normalized_coords is True and both x and y are within [0.0, 1.0];
otherwise treat as absolute coordinates (convert to int) and optionally emit a
debug/warning if one coordinate looks normalized and the other absolute to aid
debugging.
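The behaviour described in this prompt can be sketched as a pure function. The name `to_pixels` and its parameters are hypothetical; only the logger name `desktopenv.agent` and the flag semantics come from the review text.

```python
import logging
from typing import Tuple

logger = logging.getLogger("desktopenv.agent")


def to_pixels(x: float, y: float, width: int, height: int, uses_normalized_coords: bool) -> Tuple[int, int]:
    """Hypothetical helper: scale only when the flag is set and both values are in [0, 1]."""
    x_in_unit_range = 0.0 <= x <= 1.0
    y_in_unit_range = 0.0 <= y <= 1.0

    if uses_normalized_coords and x_in_unit_range and y_in_unit_range:
        return int(x * width), int(y * height)

    # One value in [0, 1] and the other outside it suggests a mixed pair.
    if x_in_unit_range != y_in_unit_range:
        logger.warning("Coordinates may mix normalized and absolute values: x=%s, y=%s", x, y)

    return int(x), int(y)
```

With the flag off, (1.0, 1.0) stays at pixel (1, 1) and is no longer scaled. The warning is a debugging aid only: a legitimate absolute point such as (0, 500) would also trigger it.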
