Skip to content
164 changes: 164 additions & 0 deletions docs/superpowers/plans/2026-03-19-zerocopy-raw-normalize.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,164 @@
# Zerocopy Raw Normalize Implementation Plan

> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Refactor the macOS zerocopy path so raw transport tensors stay internal while the public `ZEROCOPY` path continues to return normalized `RGB + flipped` output.

**Architecture:** Keep the native macOS bridge unchanged for now and move the raw-to-public normalization boundary into `ObservationConverter`. Minimize diff by touching the current macOS zerocopy branches only, adding a small normalization helper and focused tests.

**Tech Stack:** Python, PyTorch, pytest, existing CraftGround environment converter code

---

## Chunk 1: MacOS Zerocopy Boundary

### Task 1: Add failing tests for macOS zerocopy public normalization

**Files:**
- Modify: `tests/python/unit/test_observation_converter.py` or nearest existing converter test file
- Modify: `src/craftground/environment/observation_converter.py`

- [ ] **Step 1: Locate or create the nearest unit-test file for `ObservationConverter`**

Run: `rg -n "ObservationConverter|initialize_zerocopy|ScreenEncodingMode.ZEROCOPY" tests src/craftground`
Expected: identify the closest existing test home; only create a new test file if no suitable one exists.

- [ ] **Step 2: Write a failing test for the macOS path that preserves public output semantics**

Test idea:

```python
def test_macos_zerocopy_returns_rgb_flipped_from_raw_transport(...):
converter = ObservationConverter(...)
raw = fake_tensor_with_known_bgra_rows(...)
converter.internal_type = ObservationTensorType.APPLE_TENSOR
converter.last_observations[0] = raw

obs, _ = converter.convert(fake_observation(...))

assert torch.equal(obs, expected_rgb_flipped)
```

- [ ] **Step 3: Write a failing test that initialization stores raw transport data without normalization**

Test idea:

```python
def test_initialize_zerocopy_stores_raw_apple_tensor(...):
converter = ObservationConverter(...)
raw = fake_raw_tensor(...)
mock_initialize_from_mach_port.return_value = raw

converter.initialize_zerocopy(fake_mach_port_bytes)

assert converter.last_observations[0] is raw
```

- [ ] **Step 4: Run the targeted tests and verify they fail for the intended reason**

Run: `pytest <targeted-test-file> -k zerocopy -v`
Expected: FAIL due to missing helper/incorrect state handling, not import or syntax errors.

- [ ] **Step 5: Commit the failing tests**

```bash
git add <targeted-test-file>
git commit -m "test: add macos zerocopy normalization coverage"
```

### Task 2: Implement raw/public separation with minimal diff

**Files:**
- Modify: `src/craftground/environment/observation_converter.py`
- Test: `tests/python/unit/test_observation_converter.py` or nearest existing converter test file

- [ ] **Step 1: Add a small helper for macOS raw-to-public normalization**

Implementation target:

```python
def _normalize_apple_zerocopy_tensor(self, tensor: "TorchArrayType") -> "TorchArrayType":
return tensor.clone()[:, :, [2, 1, 0]].flip(0)
```

Keep this helper private and adjacent to zerocopy handling.

- [ ] **Step 2: Make `initialize_zerocopy()` store the raw Apple tensor only**

Requirements:

- no normalization in initialization
- keep storing the raw transport tensor
- ensure type tracking updates the field used by `convert()`

- [ ] **Step 3: Update the public `ZEROCOPY` branch to call the helper at return time**

Requirements:

- preserve current `RGB + flipped` behavior for Apple tensors
- do not broaden the change into unrelated output types
- keep non-Apple branches unchanged unless required for correctness

- [ ] **Step 4: Fix the touched zerocopy state variable inconsistency if still present**

The code currently mixes `internal_type` and `observation_tensor_type`. The implementation should ensure the active state variable used by `convert()` is updated consistently in the touched path.

- [ ] **Step 5: Run the targeted tests and verify they pass**

Run: `pytest <targeted-test-file> -k zerocopy -v`
Expected: PASS

- [ ] **Step 6: Run any neighboring converter tests**

Run: `pytest <targeted-test-file> -v`
Expected: PASS with no zerocopy regressions in nearby behavior.

- [ ] **Step 7: Commit the implementation**

```bash
git add src/craftground/environment/observation_converter.py <targeted-test-file>
git commit -m "refactor: separate raw macos zerocopy transport from public normalization"
```

## Chunk 2: Verification And Handoff

### Task 3: Verify branch state and prepare for PR

**Files:**
- Modify: `docs/superpowers/specs/2026-03-19-zerocopy-raw-normalize-design.md`
- Modify: `docs/superpowers/plans/2026-03-19-zerocopy-raw-normalize.md`

- [ ] **Step 1: Run a focused verification sweep**

Run:

```bash
pytest <targeted-test-file> -v
git status --short
git log --oneline -5
```

Expected:

- tests passing
- only intended files modified
- branch contains the new zerocopy commits

- [ ] **Step 2: Update docs if implementation differs from the planned low-diff shape**

Keep spec and plan aligned with actual code decisions.

- [ ] **Step 3: Prepare PR summary**

Include:

- internal raw/public separation
- no public `ZEROCOPY` behavior change
- future raw-mode support enabled

- [ ] **Step 4: Commit any doc adjustments**

```bash
git add docs/superpowers/specs/2026-03-19-zerocopy-raw-normalize-design.md docs/superpowers/plans/2026-03-19-zerocopy-raw-normalize.md
git commit -m "docs: record zerocopy raw normalize design and plan"
```
118 changes: 118 additions & 0 deletions docs/superpowers/specs/2026-03-19-zerocopy-raw-normalize-design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
# Zerocopy Raw Normalize Design

## Summary

Refactor the macOS zerocopy path so the internal transport tensor remains in its native raw layout, while the public `ZEROCOPY` API continues to expose the current normalized image contract (`RGB + flipped`). This keeps the current user-facing behavior stable, reduces coupling between the custom MPS bridge and presentation logic, and creates a clean path to expose a future raw mode without redesigning the pipeline again.

## Current Problem

The macOS zerocopy path currently mixes three concerns in one flow:

1. Transporting the framebuffer into Python as an MPS-backed tensor.
2. Representing the native producer layout.
3. Returning the user-facing normalized layout.

This makes the custom bridge responsible for data semantics it should not own. It also forces the current implementation to rely on `clone()` in the public path as both a correctness workaround and a format-conversion boundary.

## Decision

Use the approved option:

- Keep the internal macOS zerocopy representation as native raw data.
- Preserve the existing public `ZEROCOPY` output semantics.
- Move normalization (`RGB + flip`) to the public conversion boundary.
- Do not expose a new raw public mode in this change.

This is the lowest-diff option that improves maintainability without giving up future extensibility.

## Scope

In scope:

- macOS zerocopy internals in `observation_converter.py`
- separation between raw transport tensor and normalized public tensor
- tests for the new internal/public boundary behavior
- cleanup of the internal zerocopy type-tracking bug if encountered in the touched path

Out of scope:

- changing the public API shape of `ScreenEncodingMode.ZEROCOPY`
- introducing a new public raw mode
- redesigning the native C++ bridge
- changing non-macOS behavior unless required for shared helper safety

## Design

### Internal Representation

`initialize_zerocopy()` should store the macOS tensor exactly as produced by the native bridge. No channel reorder, Y-flip, or user-facing normalization should happen at initialization time.

The raw tensor is the transport artifact. It is allowed to be in producer-native layout as long as the public boundary is responsible for converting it into the documented output format.

### Public Representation

The existing `ScreenEncodingMode.ZEROCOPY` path should continue to return the same logical image as before. The normalization step should happen in a helper near the public return site, not during bridge setup.

For the macOS path this means:

- source: raw Apple tensor
- public output: normalized `RGB + flipped`

This keeps behavior stable for users while isolating future raw-mode support to a small API addition later.

### State Ownership

The converter should make the distinction between:

- raw zerocopy tensor state
- normalized public observation state

The implementation may use separate fields or a clearly named helper-based boundary, but the raw/public distinction must be explicit in code. Avoid clever state machines. This should stay easy to follow inside the existing file.

### Future Raw Mode

This refactor should make a later `ZEROCOPY_RAW` or equivalent mode straightforward:

- the raw tensor already exists as a first-class internal concept
- the later feature would only add a new public branch that returns the raw tensor directly
- the normalized branch would remain unchanged

## Testing Strategy

Add targeted tests around the converter boundary:

- macOS zerocopy initialization stores the raw transport tensor without premature normalization
- public `ZEROCOPY` still returns the same normalized semantics as before
- normalization helper is the only place that performs the macOS format conversion

Tests should focus on layout semantics and call boundaries, not the real native bridge.

## Risk Management

Primary risk:

- accidentally changing the user-visible output of `ZEROCOPY`

Mitigation:

- keep the public return contract unchanged
- isolate the new normalization helper
- cover the output semantics with tests

Secondary risk:

- touching shared converter state and unintentionally affecting non-macOS paths

Mitigation:

- keep the diff local to the macOS-specific branches whenever possible
- avoid broad refactors outside the touched boundary

## Success Criteria

The change is successful when:

- current `ZEROCOPY` callers still receive `RGB + flipped` output
- the macOS zerocopy path internally preserves a raw transport tensor
- the code clearly separates raw transport from public normalization
- the refactor makes a future raw public mode a small additive change
26 changes: 18 additions & 8 deletions src/craftground/environment/observation_converter.py
Original file line number Diff line number Diff line change
Expand Up @@ -111,10 +111,12 @@ def convert(
if self.internal_type == ObservationTensorType.NONE:
self.initialize_zerocopy(observation.ipc_handle)
if self.internal_type == ObservationTensorType.APPLE_TENSOR:
obs_1 = self.last_observations[0].clone()[:, :, [2, 1, 0]].flip(0)
obs_1 = self._normalize_apple_zerocopy_tensor(
self.last_observations[0]
)
return (obs_1, None)
elif self.internal_type == ObservationTensorType.CUDA_DLPACK:
obs_1 = self.last_observations[0].clone()[:, :, :3].flip(0)
obs_1 = self._normalize_cuda_zerocopy_tensor(self.last_observations[0])
return (obs_1, None)
Comment on lines 119 to 126
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

render() now sees the raw zerocopy tensor, not the public RGB output.

Line 120/125 normalizes only the return value. initialize_zerocopy() still leaves self.last_observations[0] pointing at the raw Apple/CUDA buffer, and render() later consumes that cache directly on Lines 168-204. That regresses zerocopy rendering back to unnormalized BGR/RGBA data.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/craftground/environment/observation_converter.py` around lines 119 - 126,
The code normalizes only the value returned in the APPLE_TENSOR/CUDA_DLPACK
branches but leaves self.last_observations[0] as the raw zerocopy buffer so
render() later consumes unnormalized data; fix by replacing the raw cache with
the normalized tensor (assign the result of _normalize_apple_zerocopy_tensor or
_normalize_cuda_zerocopy_tensor back into self.last_observations[0]) inside the
branches (affecting initialize_zerocopy / the branch handling for
ObservationTensorType.APPLE_TENSOR and ObservationTensorType.CUDA_DLPACK) so
both the returned tuple and the cached last_observations[0] contain the
normalized RGB(A) tensor.

else:
raise ValueError(
Expand Down Expand Up @@ -229,6 +231,16 @@ def convert_raw_observation(self, image_bytes: bytes) -> np.ndarray:
# arr = np.transpose(last_rgb_frame, (2, 1, 0)) # channels, width, height
return rgb_array_or_tensor

def _normalize_apple_zerocopy_tensor(
self, raw_tensor: "TorchArrayType"
) -> "TorchArrayType":
return raw_tensor.clone()[:, :, [2, 1, 0]].flip(0)

def _normalize_cuda_zerocopy_tensor(
self, raw_tensor: "TorchArrayType"
) -> "TorchArrayType":
return raw_tensor.clone()[:, :, :3].flip(0)

def initialize_zerocopy(self, ipc_handle: bytes):
import torch
from .craftground_native import initialize_from_mach_port # type: ignore
Expand All @@ -250,8 +262,7 @@ def initialize_zerocopy(self, ipc_handle: bytes):
print(rgb_array_or_tensor.dtype)
print(rgb_array_or_tensor.device)
self.last_observations[0] = rgb_array_or_tensor
# drop alpha, flip y axis, and clone
self.observation_tensor_type = ObservationTensorType.APPLE_TENSOR
self.internal_type = ObservationTensorType.APPLE_TENSOR
else:
import torch.utils.dlpack

Expand All @@ -268,8 +279,7 @@ def initialize_zerocopy(self, ipc_handle: bytes):
print(rgb_array_or_tensor.device)
print(f"{rgb_array_or_tensor.data_ptr()=}\n\n")
self.last_observations[0] = rgb_array_or_tensor
# drop alpha, flip y axis, and clone
self.observation_tensor_type = ObservationTensorType.CUDA_DLPACK
self.internal_type = ObservationTensorType.CUDA_DLPACK

def convert_jax_observation(self, ipc_handle: bytes) -> "JaxArrayType":
import jax.numpy as jnp
Expand All @@ -295,7 +305,7 @@ def convert_jax_observation(self, ipc_handle: bytes) -> "JaxArrayType":
self.last_observations[0] = rgb_array_or_tensor
# drop alpha, flip y axis, and clone
rgb_array_or_tensor = rgb_array_or_tensor.clone()[:, :, [2, 1, 0]].flip(0)
self.observation_tensor_type = ObservationTensorType.JAX_NP
self.internal_type = ObservationTensorType.JAX_NP
return rgb_array_or_tensor
else:
cuda_dlpack = mtl_tensor_from_cuda_mem_handle(
Expand All @@ -308,5 +318,5 @@ def convert_jax_observation(self, ipc_handle: bytes) -> "JaxArrayType":
jax_image = jnp.from_dlpack(cuda_dlpack)
rgb_array_or_tensor = jax_image
rgb_array_or_tensor = rgb_array_or_tensor.clone()[:, :, [2, 1, 0]].flip(0)
self.observation_tensor_type = ObservationTensorType.JAX_NP
self.internal_type = ObservationTensorType.JAX_NP
return jax_image, None
Loading
Loading