4 changes: 4 additions & 0 deletions .gitignore
@@ -11,3 +11,7 @@ __pycache__
*.so
demonstrations/*
plan.txt

# Jetbrains
.idea/
*.iml
45 changes: 45 additions & 0 deletions README.md
@@ -13,3 +13,48 @@ Currently, the main script in the repo is `spot_localization.py`. Here is an example:
python spot_utils/spot_localization.py --hostname 192.168.80.3 --map_name b45-621
```
You can 'hijack' the robot via the tablet when prompted, move it around, and then have it print out its pose! We can then command it to navigate to any saved pose via other utilities (which are forthcoming).

## Mapping
> Last Updated: 05/17/2025

Our code is currently designed to operate given a saved map of a particular
environment. The map is needed to define a coordinate system that persists
across robot runs. Moreover, each map has associated metadata that we use for
visualization, collision checking, and a variety of other functions.

To create a new map of a new environment:
1. Print out and tape AprilTags around the environment. The tags are available [here](https://support.bostondynamics.com/s/article/About-Fiducials).
2. Run the interactive script from the Spot SDK to create a map while walking
Spot around the environment. The script is [here](https://github.com/boston-dynamics/spot-sdk/blob/master/python/examples/graph_nav_command_line/recording_command_line.py).
- `python/examples/graph_nav_command_line/recording_command_line.py`
- See the Google doc for authentication info: https://docs.google.com/document/d/1BFIRWoBJyJk-XPf_4wVe616g_d3o7oDfB2ZuQyszcFU/edit?tab=t.0
- Set the `BOSDYN_CLIENT_USERNAME` and `BOSDYN_CLIENT_PASSWORD` environment variables according to the authentication details in the doc.
- Run it with the `hostname` (i.e., the IP address).
3. Make sure to `(0) Clear map` first; otherwise leftover data from a previous session can make map construction behave unpredictably.
4. `(1) Start recording a map`
5. Hijack controls on the tablet, walk Spot around, create a bunch of waypoints using `(4) Create a default waypoint in the current robot's location.`
6. `(9) Automatically find and close loops.` then `(0) Close all loops.`
7. `(a) Optimize the map's anchoring.`
8. `(2) Stop recording a map.`
9. `(5) Download the map after recording.`
10. Save the map files to `spot_utils/graph_nav_maps/<your new env name>`.
11. Create a file named `metadata.yaml`, if one doesn't already exist, within the folder
associated with the map. See below for more details.
12. Set `--spot_graph_nav_map` to your new env name.
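
For reference, here is a minimal sketch of a `metadata.yaml` and how it can be read. Only the `spot-room-pose` keys (`x`, `y`, `z`, `angle`) are known to be read by `exec_plan.py`; the `boundary-points` key is a hypothetical placeholder for the (x, y) boundary coordinates collected in the pointcloud step below.
```
import yaml  # requires pyyaml

# Hypothetical example; adapt the keys to your environment's actual metadata.yaml.
example_metadata = """
spot-room-pose:
  x: 0.0
  y: 0.0
  z: 0.0
  angle: 0.0
boundary-points:  # hypothetical key for (x, y) boundary points
  - [1.20, -0.50]
  - [3.40, -0.60]
"""

metadata = yaml.safe_load(example_metadata)
print(metadata["spot-room-pose"])  # {'x': 0.0, 'y': 0.0, 'z': 0.0, 'angle': 0.0}
```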


**Converting a map to a pointcloud**
1. Run [this script](https://github.com/boston-dynamics/spot-sdk/tree/master/python/examples/graph_nav_extract_point_cloud) on the pre-made map to yield an output `.ply` pointcloud file.
- [Optional] The script path is `python/examples/graph_nav_extract_point_cloud/extract_point_cloud.py`.
2. Install the [Open3D package](http://www.open3d.org/docs/release/getting_started.html) with `pip install open3d`.
3. Open up a python interpreter in your terminal, and run the following commands:
```
import open3d as o3d

pcd = o3d.io.read_point_cloud("<path to your pointcloud file>")
o3d.visualization.draw_geometries_with_editing([pcd])
```
4. Within the window, SHIFT + left click on a point to print its 3D coordinates to the terminal. Copy the first two coordinates (x, y) of each
boundary point into the yaml; a small helper for this is sketched below. Note that you can SHIFT + right click to deselect a point.
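
As a convenience, here is a small sketch (not part of this PR) for turning points copied from the Open3D output into yaml-ready (x, y) entries; the `boundary-points` key name is an assumption.
```
from typing import List, Tuple

def boundary_points_to_yaml(points_3d: List[Tuple[float, float, float]]) -> str:
    """Format the (x, y) of each picked 3D point as yaml list entries."""
    lines = ["boundary-points:"]  # assumed key name; match your metadata.yaml
    for x, y, _z in points_3d:
        lines.append(f"  - [{x:.3f}, {y:.3f}]")
    return "\n".join(lines)

# Example usage with two points copied from the Open3D terminal output.
print(boundary_points_to_yaml([(1.234, -0.567, 0.02), (3.456, -0.678, 0.03)]))
```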
88 changes: 71 additions & 17 deletions exec_plan.py
@@ -22,8 +22,12 @@
close_gripper,
move_hand_to_relative_pose,
open_gripper,
stow_arm,
)
from skills.spot_navigation import navigate_to_absolute_pose
from skills.spot_navigation import (
navigate_to_absolute_pose_precise,
)
from spot_utils.gemini_utils import get_pixel_from_gemini
from spot_utils.perception.spot_cameras import capture_images
from spot_utils.spot_localization import SpotLocalizer
from spot_utils.utils import (
@@ -32,6 +36,9 @@
get_pixel_from_user,
verify_estop,
)
import rerun as rr
from PIL import Image
import cv2

DEFAULT_HAND_LOOK_FLOOR_POSE = math_helpers.SE3Pose(
x=0.80, y=0.0, z=0.25, rot=math_helpers.Quat.from_pitch(np.pi / 3)
@@ -85,10 +92,13 @@ def map_to_spot(pose: math_helpers.SE2Pose) -> math_helpers.SE2Pose:

def move_to(x_abs: float, y_abs: float, yaw_abs: float) -> None:
"""Move the robot to the specified absolute pose."""
print(f"move_to(x_abs={x_abs}, y_abs={y_abs}, yaw_abs={yaw_abs}")
desired_pose = math_helpers.SE2Pose(x=x_abs, y=y_abs, angle=yaw_abs)
desired_pose_spot = map_to_spot(desired_pose)
if ROBOT is not None and LOCALIZER is not None:
navigate_to_absolute_pose(ROBOT, LOCALIZER, desired_pose_spot)
navigate_to_absolute_pose_precise(
ROBOT, LOCALIZER, desired_pose_spot, tolerance=0.05
)


def gaze(direction: str) -> None:
@@ -102,20 +112,43 @@ def grasp(text_prompt: Optional[str]) -> None:
"""Grasp an object at a specified pixel."""
# Capture an image.
camera = "hand_color_image"
if ROBOT is not None and LOCALIZER is not None:
rgbd = capture_images(ROBOT, LOCALIZER, [camera])[camera]

if text_prompt and SAM_ENDPOINT:
# Select a pixel by querying GroundedSAM.
pixel = get_pixel_from_grounded_sam(rgbd.rgb, text_prompt, SAM_ENDPOINT)
else:
# Select a pixel by querying the user.
pixel = get_pixel_from_user(rgbd.rgb)

if pixel is not None:
# Grasp at the pixel with a top-down grasp.
top_down_rot = math_helpers.Quat.from_pitch(np.pi / 2)
grasp_at_pixel(ROBOT, rgbd, pixel, grasp_rot=top_down_rot)
assert ROBOT is not None, "ROBOT must be initialized before calling grasp()."
assert LOCALIZER is not None, "LOCALIZER must be initialized before calling grasp()."

images = capture_images(ROBOT, LOCALIZER, [camera])
rgbd = images[camera]
rgb_np = rgbd.rgb
rr.log("rgb_raw", rr.Image(rgb_np))

# FIXME: Don't do this!!! Hiding implementation
# if text_prompt and SAM_ENDPOINT:
# # Select a pixel by querying GroundedSAM.
# pixel = get_pixel_from_grounded_sam(rgbd.rgb, text_prompt, SAM_ENDPOINT)
# else:
# # Select a pixel by querying the user.
# pixel = get_pixel_from_user(rgbd.rgb)

# Call Gemini to point
vlm_query_template = f"""
Point to the {text_prompt}. If you cannot see the {text_prompt} fully, point to the best guess.
The answer should follow the json format: [{{"point": , "label": }}, ...]. The points are in [y, x] format normalized to 0-1000.
"""
image_pil = Image.fromarray(rgb_np)
pixel = get_pixel_from_gemini(vlm_query_template, image_pil)

# Draw pixel on the image
bgr = cv2.cvtColor(rgb_np, cv2.COLOR_RGB2BGR)
cv2.circle(bgr, pixel, 5, (0, 0, 255), -1)
rgb_annotated = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
rr.log("pointing", rr.Image(rgb_annotated))

if pixel is not None:
# Grasp at the pixel with a top-down grasp.
top_down_rot = math_helpers.Quat.from_pitch(np.pi / 2)
grasp_at_pixel(ROBOT, rgbd, pixel, grasp_rot=top_down_rot)
return
else:
raise RuntimeError("WTF. Grasp failed!")


def grasp_at_pose(X_RobEE: NDArray) -> None:
@@ -148,6 +181,7 @@ def place_at_pose(X_RobEE: NDArray) -> None:
if __name__ == "__main__":
# running this script standalone initializes a bosdyn robot and localizer.
# It then executes the list of commands provided in the plan file
rr.init("exec_plan", spawn=True)
parser = argparse.ArgumentParser(description="Parse the robot's hostname.")
parser.add_argument(
"--hostname",
@@ -178,7 +212,27 @@ def place_at_pose(X_RobEE: NDArray) -> None:
SPOT_ROOM_POSE = metadata["spot-room-pose"]
else:
print("spot-room-pose not found in metadata.yaml, using default val")
SPOT_ROOM_POSE = {"x": 0.0, "y": 0.0, "z": 0.0}
SPOT_ROOM_POSE = {"x": 0.0, "y": 0.0, "z": 0.0, "angle": 0.0}
# Stow before running plan

# grasp("teddy bear")
stow_arm(ROBOT)
# open_gripper(ROBOT)
# close_gripper(ROBOT)
# gaze("DOWN")
# move_to(x_abs=2.454116588554405, y_abs=-2.759339339399913, yaw_abs=-2.401609411896162)
# move_to(x_abs=2.542117893278808, y_abs=-0.752019696769485, yaw_abs=-1.933767708350784)
#
# move_to(x_abs=2.723552922357717, y_abs=-3.172621984462674, yaw_abs=-2.753087971468099)
# gaze("AHEAD")
# grasp("caterpillar")
#
# move_to(x_abs=4.471199645580120, y_abs=-3.217432928943325, yaw_abs=-0.023820034339159)
# place_at_pose([0.9594754639330341, 0.0, 0.09019530213554317, 0.0, 0.0, -0.18379480399141906, 0.9829646331510385])

# move_to(x_abs=4.483301381932474, y_abs=-3.486604871046227, yaw_abs=0.130814244458979)
# place_at_pose([0.8673816028445435, 0.0, -0.011952195088877238, 0.0, 0.0, 0.1621424933557064, 0.9867673544703406])
# print()
with open(args.plan, "r") as plan_file:
exec(plan_file.read())
print("done")
58 changes: 57 additions & 1 deletion skills/spot_navigation.py
@@ -18,7 +18,12 @@
from bosdyn.client.sdk import Robot

from spot_utils.spot_localization import SpotLocalizer
from spot_utils.utils import get_graph_nav_dir, get_robot_state, verify_estop
from spot_utils.utils import (
get_graph_nav_dir,
get_robot_state,
get_se2_distance,
verify_estop,
)


def navigate_to_relative_pose(
@@ -106,6 +111,57 @@ def navigate_to_absolute_pose(
)


def navigate_to_absolute_pose_precise(
robot: Robot,
localizer: SpotLocalizer,
target_pose: math_helpers.SE2Pose,
max_xytheta_vel: Tuple[float, float, float] = (2.0, 2.0, 1.0),
min_xytheta_vel: Tuple[float, float, float] = (-2.0, -2.0, -1.0),
timeout: float = 20.0,
tolerance: float = 0.05,
max_num_tries: int = 5,
) -> None:
"""Move to the specific target absolute pose; potentially trying multiple
times.
"""
# First, localize the robot.
localizer.localize()
robot_pose = localizer.get_last_robot_pose()
robot_se2 = robot_pose.get_closest_se2_transform()
curr_dist_to_target = get_se2_distance(robot_se2, target_pose)
curr_tries = 0
prev_dist_to_target = float("inf")
while curr_dist_to_target > tolerance and max_num_tries > curr_tries:
prev_dist_to_target = curr_dist_to_target
localizer.localize()
navigate_to_absolute_pose(
robot, localizer, target_pose, max_xytheta_vel, min_xytheta_vel, timeout
)
# Re-localize the robot.
localizer.localize()
robot_pose = localizer.get_last_robot_pose()
robot_se2 = robot_pose.get_closest_se2_transform()
curr_dist_to_target = get_se2_distance(robot_se2, target_pose)
print(curr_dist_to_target)
> **Copilot AI** (May 17, 2025): [nitpick] Consider replacing print statements with an appropriate logging mechanism for better production-level output control.
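
As a minimal sketch of that suggestion (the `logging` setup below is not part of this PR):
```
import logging

logger = logging.getLogger(__name__)

# Instead of: print(curr_dist_to_target)
logger.debug("Current distance to target: %.3f m", curr_dist_to_target)
```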
# If the robot is not making progress, nudge it backwards
# to avoid getting stuck.
if abs(curr_dist_to_target - prev_dist_to_target) < 0.01:
# Move backwards.
rel_pose = math_helpers.SE2Pose(x=-0.25, y=0, angle=0)
navigate_to_relative_pose(
robot, rel_pose, max_xytheta_vel, min_xytheta_vel, timeout
)
curr_tries += 1
if curr_dist_to_target > tolerance:
navigate_to_absolute_pose(
robot, localizer, target_pose, max_xytheta_vel, min_xytheta_vel, timeout
)
print(
f"Failed to reach the target pose after {curr_tries} tries. "
f"Current distance to target: {curr_dist_to_target}"
)


if __name__ == "__main__":
# Run this file alone to test manually.
# Make sure to pass in --spot_robot_ip.
Binary file added spot_utils/388885845015.jpg
98 changes: 98 additions & 0 deletions spot_utils/gemini_utils.py
@@ -0,0 +1,98 @@
import json
from typing import Tuple

import PIL.Image

from spot_utils.pretrained_model_interface import GoogleGeminiVLM


def get_pixel_from_gemini(
vlm_query_str: str, pil_image: PIL.Image.Image
) -> Tuple[int, int]:
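"""Ask Gemini to point at the object described by `vlm_query_str` in `pil_image`.

Returns the selected pixel as (x, y) in image coordinates. The prompt is
expected to request points in [y, x] format normalized to 0-1000, which is
what the parsing and denormalization below assume.
"""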
# Instantiate the Gemini VLM wrapper; adjust the model name here if needed.
vlm = GoogleGeminiVLM("gemini-1.5-flash")

# 2. Construct the query
# Adjust prompt as needed for better VLM performance
def parse_json_output(json_output_str):
# Parsing out the markdown fencing
lines = json_output_str.splitlines()
for i, line in enumerate(lines):
if line.strip() == "```json":
json_output_str = "\n".join(lines[i + 1 :])
json_output_str = json_output_str.split("```")[0]
break
json_output_str = json_output_str.strip()
return json_output_str

# 3. Query the VLM
# Assuming sample_completions takes a list of images
vlm_output_list = vlm.sample_completions(
prompt=vlm_query_str,
imgs=[pil_image],
temperature=0.0, # Low temp for deterministic output
seed=42,
num_completions=1,
)
vlm_output_str = vlm_output_list[0]
# 4. Parse the JSON string
json_string_to_parse = parse_json_output(vlm_output_str)
parsed_data = json.loads(json_string_to_parse)
# 5. Extract and denormalize coordinates
if not isinstance(parsed_data, list) or not parsed_data:
raise ValueError("Parsed JSON is not a non-empty list.")
# Assuming the first point is the desired one
first_point_obj = parsed_data[0]
if (
"point" not in first_point_obj
or not isinstance(first_point_obj["point"], list)
or len(first_point_obj["point"]) != 2
):
raise ValueError(
"First element in JSON does not contain a valid 'point' list [y, x]."
)
y_norm, x_norm = first_point_obj["point"]
if not isinstance(y_norm, (int, float)) or not isinstance(x_norm, (int, float)):
raise ValueError("Normalized coordinates are not numbers.")
# Denormalize from 0-1000 range to image pixel coordinates
img_height = pil_image.height
img_width = pil_image.width
y = int(y_norm * img_height / 1000.0)
x = int(x_norm * img_width / 1000.0)
# Clamp coordinates to be within image bounds
y = max(0, min(y, img_height - 1))
x = max(0, min(x, img_width - 1))
# Assign to the 'pixel' variable in (x, y) format
pixel = (x, y)
return pixel



if __name__ == "__main__":
import logging
import rerun as rr
from PIL import Image
import cv2
import numpy as np


rr.init("gemini_test", spawn=True)
logging.basicConfig(level=logging.DEBUG)

pil_image = Image.open("388885845015.jpg")

vlm_query_template = """
Point to the human.
The answer should follow the json format: [{"point": , "label": }, ...]. The points are in [y, x] format normalized to 0-1000.
"""

pixel = get_pixel_from_gemini(vlm_query_template, pil_image)

image_np = np.asarray(pil_image)
bgr = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)
cv2.circle(bgr, pixel, 5, (0, 0, 255), -1)

rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
rr.log("rgb", rr.Image(rgb))
print()
(5 binary files; contents not shown.)
@@ -0,0 +1,2 @@

4edge_snapshot_id_ane-syphon-WgF8TQA5by29Onmt7bAc7w==
(7 binary files; contents not shown.)
@@ -0,0 +1,2 @@

7edge_snapshot_id_carven-angler-6dUv3NnhHWdPOOU36y+eEw==
(102 binary files; contents not shown.)
Binary file modified spot_utils/graph_nav_maps/outside-621/graph
Binary file not shown.