fix: Handle normalized coordinates from local grounding models (e.g. InternVL) #159
base: main
@@ -240,9 +240,23 @@ def generate_coords(self, ref_expr: str, obs: Dict) -> List[int]:
         # Generate and parse coordinates
         response = call_llm_safe(self.grounding_model)
         print("RAW GROUNDING MODEL RESPONSE:", response)
-        numericals = re.findall(r"\d+", response)
+
+        # Regex to find floating point numbers (0.xxxx) or integers
+        numericals = re.findall(r"\d+\.?\d*", response)
         assert len(numericals) >= 2
-        return [int(numericals[0]), int(numericals[1])]
+
+        x = float(numericals[0])
+        y = float(numericals[1])
+
+        # If coordinates are normalized (0-1), scale them up
+        if x <= 1.0 and y <= 1.0:
+            x = int(x * self.engine_params_for_grounding["grounding_width"])
+            y = int(y * self.engine_params_for_grounding["grounding_height"])
+        else:
+            x = int(x)
+            y = int(y)
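To illustrate why the regex had to change, here is a minimal sketch comparing the two patterns from the diff. The response strings are hypothetical examples, not captured output from InternVL or any other grounding model.

```python
import re

# Hypothetical grounding-model responses (illustrative only).
normalized_response = "(0.512, 0.384)"
absolute_response = "(983, 415)"

old_pattern = r"\d+"        # pattern before this change
new_pattern = r"\d+\.?\d*"  # pattern introduced by this change

# The old pattern splits each float at the decimal point, so the first two
# matches are "0" and "512" rather than the x and y values.
old_matches = re.findall(old_pattern, normalized_response)

# The new pattern keeps each float intact and still matches plain integers.
new_matches = re.findall(new_pattern, normalized_response)
abs_matches = re.findall(new_pattern, absolute_response)

print(old_matches)  # ['0', '512', '0', '384']
print(new_matches)  # ['0.512', '0.384']
print(abs_matches)  # ['983', '415']
```

With the old pattern, the first two matches for a normalized response would have been returned as the point [0, 512].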
Comment on lines +251 to +257

Add defensive checks for required engine parameters. The code directly accesses the grounding_width and grounding_height keys of engine_params_for_grounding.

Proposed fix with defensive checks:

         # If coordinates are normalized (0-1), scale them up
         if x <= 1.0 and y <= 1.0:
-            x = int(x * self.engine_params_for_grounding["grounding_width"])
-            y = int(y * self.engine_params_for_grounding["grounding_height"])
+            grounding_width = self.engine_params_for_grounding.get("grounding_width")
+            grounding_height = self.engine_params_for_grounding.get("grounding_height")
+            if grounding_width is None or grounding_height is None:
+                raise ValueError("grounding_width and grounding_height must be specified in engine_params_for_grounding when using normalized coordinates")
+            x = int(x * grounding_width)
+            y = int(y * grounding_height)
         else:
             x = int(x)
             y = int(y)

Consider edge cases in normalized coordinate detection. The condition x <= 1.0 and y <= 1.0 also matches absolute pixel coordinates such as (1, 1).
While these edge cases are unlikely in typical usage, consider adding a more robust heuristic, such as checking if both coordinates are in the range [0.0, 1.0] and the model is known to output normalized coordinates (via a configuration flag).

Suggested improvement with configuration flag: add a configuration parameter to explicitly indicate coordinate format:

        # In __init__ or configuration
        self.uses_normalized_coords = engine_params_for_grounding.get("normalized_coordinates", False)

Then update the detection logic:

-        # If coordinates are normalized (0-1), scale them up
-        if x <= 1.0 and y <= 1.0:
+        # If coordinates are normalized (0-1), scale them up
+        should_scale = (x <= 1.0 and y <= 1.0) if self.uses_normalized_coords else (x < 1.0 and y < 1.0)
+        if should_scale:
             x = int(x * self.engine_params_for_grounding["grounding_width"])
             y = int(y * self.engine_params_for_grounding["grounding_height"])
         else:
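The edge case raised in this review can be shown with a small standalone sketch. It contrasts the value-based check from the diff with an explicit flag. Both function names and the `normalized` argument are hypothetical and exist only for illustration; this is not the code merged in the PR.

```python
from typing import List


def heuristic_pixel_coords(x: float, y: float, width: int, height: int) -> List[int]:
    """Value-based detection, mirroring the condition used in the diff."""
    if x <= 1.0 and y <= 1.0:
        return [int(x * width), int(y * height)]
    return [int(x), int(y)]


def flagged_pixel_coords(
    x: float, y: float, width: int, height: int, normalized: bool
) -> List[int]:
    """Detection driven by an explicit (hypothetical) configuration flag."""
    if normalized:
        # Reject values outside [0, 1] instead of silently scaling them.
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError(f"expected normalized coordinates, got ({x}, {y})")
        return [int(x * width), int(y * height)]
    return [int(x), int(y)]


# An absolute pixel at (1, 1) is misread as normalized by the heuristic.
print(heuristic_pixel_coords(1, 1, 1920, 1080))                      # [1920, 1080]
print(flagged_pixel_coords(1, 1, 1920, 1080, normalized=False))      # [1, 1]
print(flagged_pixel_coords(0.5, 0.25, 1920, 1080, normalized=True))  # [960, 270]
```

The trade-off is that the flag has to be set correctly per model, whereas the heuristic needs no configuration but cannot tell the two formats apart near the origin.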
+
+        return [x, y]

     # Calls pytesseract to generate word level bounding boxes for text grounding
     def get_ocr_elements(self, b64_image_data: str) -> Tuple[str, List]:
Thanks for the suggestion. Would you mind adding an additional check that both the
grounding_width and grounding_height keys exist in engine_params_for_grounding?
This avoids the edge case where either x or y == 1.0.
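One way the requested key check could look is sketched below. The helper name `require_grounding_size` is hypothetical; only the key names `grounding_width` and `grounding_height` come from the diff.

```python
from typing import Dict, Tuple


def require_grounding_size(engine_params: Dict) -> Tuple[int, int]:
    """Return (width, height), or raise if either key is missing.

    Hypothetical helper: not part of the PR, shown only to illustrate the check.
    """
    required = ("grounding_width", "grounding_height")
    missing = [key for key in required if key not in engine_params]
    if missing:
        raise ValueError(
            "engine_params_for_grounding is missing: " + ", ".join(missing)
        )
    return engine_params["grounding_width"], engine_params["grounding_height"]


print(require_grounding_size({"grounding_width": 1920, "grounding_height": 1080}))
# (1920, 1080)
```

Running the check before the scaling branch turns a bare KeyError into an error message that names the missing parameter.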