Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 16 additions & 2 deletions gui_agents/s3/agents/grounding.py
Original file line number Diff line number Diff line change
Expand Up @@ -240,9 +240,23 @@ def generate_coords(self, ref_expr: str, obs: Dict) -> List[int]:
# Generate and parse coordinates
response = call_llm_safe(self.grounding_model)
print("RAW GROUNDING MODEL RESPONSE:", response)
numericals = re.findall(r"\d+", response)

# Regex to find floating point numbers (0.xxxx) or integers
numericals = re.findall(r"\d+\.?\d*", response)
assert len(numericals) >= 2
return [int(numericals[0]), int(numericals[1])]

x = float(numericals[0])
y = float(numericals[1])

# If coordinates are normalized (0-1), scale them up
if x <= 1.0 and y <= 1.0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Would you mind adding additional check of if both
grounding_width and grounding_height key exists in engine_params_for_grounding?
This avoids the edge case of either y or x == 1.0

x = int(x * self.engine_params_for_grounding["grounding_width"])
y = int(y * self.engine_params_for_grounding["grounding_height"])
else:
x = int(x)
y = int(y)
Comment on lines +251 to +257
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Add defensive checks for required engine parameters.

The code directly accesses self.engine_params_for_grounding["grounding_width"] and ["grounding_height"] without verifying these keys exist. If these parameters are missing from the configuration, the code will raise a KeyError and crash the agent.

🔎 Proposed fix with defensive checks
         # If coordinates are normalized (0-1), scale them up
         if x <= 1.0 and y <= 1.0:
-            x = int(x * self.engine_params_for_grounding["grounding_width"])
-            y = int(y * self.engine_params_for_grounding["grounding_height"])
+            grounding_width = self.engine_params_for_grounding.get("grounding_width")
+            grounding_height = self.engine_params_for_grounding.get("grounding_height")
+            if grounding_width is None or grounding_height is None:
+                raise ValueError("grounding_width and grounding_height must be specified in engine_params_for_grounding when using normalized coordinates")
+            x = int(x * grounding_width)
+            y = int(y * grounding_height)
         else:
             x = int(x)
             y = int(y)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# If coordinates are normalized (0-1), scale them up
if x <= 1.0 and y <= 1.0:
x = int(x * self.engine_params_for_grounding["grounding_width"])
y = int(y * self.engine_params_for_grounding["grounding_height"])
else:
x = int(x)
y = int(y)
# If coordinates are normalized (0-1), scale them up
if x <= 1.0 and y <= 1.0:
grounding_width = self.engine_params_for_grounding.get("grounding_width")
grounding_height = self.engine_params_for_grounding.get("grounding_height")
if grounding_width is None or grounding_height is None:
raise ValueError("grounding_width and grounding_height must be specified in engine_params_for_grounding when using normalized coordinates")
x = int(x * grounding_width)
y = int(y * grounding_height)
else:
x = int(x)
y = int(y)
🤖 Prompt for AI Agents
In gui_agents/s3/agents/grounding.py around lines 251 to 257, the code accesses
self.engine_params_for_grounding["grounding_width"] and ["grounding_height"]
directly which can raise KeyError; add defensive checks: verify the keys exist
and their values are numeric and >0 before using them (e.g., use dict.get with a
sensible default or explicitly raise a clear ValueError if missing/invalid),
convert to int only after validation, and ensure the fallback behavior for
normalized coordinates still makes sense when defaults are applied.

⚠️ Potential issue | 🟠 Major

Consider edge cases in normalized coordinate detection.

The condition x <= 1.0 and y <= 1.0 may misidentify absolute coordinates as normalized in edge cases:

  • Absolute coordinates like (0, 0), (1, 1), (0, 500), or (1, 768) where one or both values are ≤ 1.0 would trigger incorrect scaling
  • Mixed coordinates (one normalized, one absolute) like (0.5, 500) would be treated as absolute, potentially causing incorrect clicks

While these edge cases are unlikely in typical usage, consider adding a more robust heuristic, such as checking if both coordinates are in the range [0.0, 1.0] and the model is known to output normalized coordinates (via a configuration flag).

🔎 Suggested improvement with configuration flag

Add a configuration parameter to explicitly indicate coordinate format:

# In __init__ or configuration
self.uses_normalized_coords = engine_params_for_grounding.get("normalized_coordinates", False)

Then update the detection logic:

-        # If coordinates are normalized (0-1), scale them up
-        if x <= 1.0 and y <= 1.0:
+        # If coordinates are normalized (0-1), scale them up
+        should_scale = (x <= 1.0 and y <= 1.0) if self.uses_normalized_coords else (x < 1.0 and y < 1.0)
+        if should_scale:
             x = int(x * self.engine_params_for_grounding["grounding_width"])
             y = int(y * self.engine_params_for_grounding["grounding_height"])
         else:

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In gui_agents/s3/agents/grounding.py around lines 251-257, the current check (x
<= 1.0 and y <= 1.0) can misclassify absolute coordinates as normalized; add a
configuration flag self.uses_normalized_coords =
engine_params_for_grounding.get("normalized_coordinates", False) (initialized in
__init__) and change the detection to only scale when
self.uses_normalized_coords is True and both x and y are within [0.0, 1.0];
otherwise treat as absolute coordinates (convert to int) and optionally emit a
debug/warning if one coordinate looks normalized and the other absolute to aid
debugging.


return [x, y]

# Calls pytesseract to generate word level bounding boxes for text grounding
def get_ocr_elements(self, b64_image_data: str) -> Tuple[str, List]:
Expand Down