Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings, or use the review commands and checkboxes below for quick actions.
📝 Walkthrough: Adds a native Metal normalization pipeline and refactors zerocopy handling so raw Apple tensors are retained internally and normalized at the public conversion boundary; updates the Python observation converter and JAX paths, adds unit tests for macOS/JAX flows, and integrates Apple C++/Metal normalization into the build.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Python as Python (ObservationConverter)
    participant Native as craftground_native (pybind IPC)
    participant Metal as Metal Compute (MTL)
    Python->>Native: initialize_from_mach_port(ipc_handle)
    Native-->>Python: raw MTL-backed uint8 tensor
    Python->>Python: store raw tensor (last_observations[0]), set internal_type=APPLE_TENSOR
    Python->>Native: normalize_apple_mtl_tensor(raw_tensor)
    alt Native (Apple build) returns tensor
        Native->>Metal: dispatch Metal kernel (flip + BGR->RGB)
        Metal-->>Native: normalized RGB uint8 tensor
        Native-->>Python: normalized tensor
    else Native returns None / absent
        Python->>Python: fallback normalize (_normalize_apple_zerocopy_tensor)
        Python-->>Python: produce normalized tensor
    end
    Python-->>Client: return normalized observation
```
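The fallback branch at the bottom of the diagram is a plain host-side transform (channel reorder plus vertical flip). A minimal sketch, using NumPy as a stand-in for the torch MPS tensor the converter actually handles:

```python
import numpy as np

def fallback_normalize(raw_bgr: np.ndarray) -> np.ndarray:
    """Sketch of the fallback normalize step (_normalize_apple_zerocopy_tensor):
    reorder BGR -> RGB and flip the y axis. NumPy stand-in; the converter
    performs the equivalent torch ops (clone + [2, 1, 0] index + flip(0))."""
    rgb = raw_bgr[..., ::-1]            # BGR -> RGB channel reorder
    return np.flip(rgb, axis=0).copy()  # flip y axis; copy detaches the shared buffer
```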
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Pull request overview
Refactors the macOS zerocopy observation flow so raw BGRA/BGR + unflipped transport tensors are kept internally, while the public ScreenEncodingMode.ZEROCOPY output remains normalized (RGB + flipped) at the converter boundary. This also updates internal tensor-type bookkeeping and adds targeted unit/regression coverage plus design/plan docs.
Changes:
- Move Apple/CUDA zerocopy normalization into dedicated helper(s) at the `convert()` return boundary.
- Fix zerocopy/JAX internal tensor type tracking by consistently using `internal_type`.
- Add a macOS zerocopy regression test and document the design/implementation plan.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/craftground/environment/observation_converter.py | Stores raw zerocopy tensors internally; normalizes only at the public return boundary; updates internal type tracking. |
| tests/python/unit/test_observation_converter.py | Adds a unit test that asserts raw storage + normalized output for the macOS (mach-port) zerocopy path. |
| docs/superpowers/specs/2026-03-19-zerocopy-raw-normalize-design.md | Captures the rationale and intended boundary between raw transport vs normalized public output. |
| docs/superpowers/plans/2026-03-19-zerocopy-raw-normalize.md | Records the implementation/testing checklist used to land the refactor safely. |
```python
self.last_observations[0] = rgb_array_or_tensor
# drop alpha, flip y axis, and clone
rgb_array_or_tensor = rgb_array_or_tensor.clone()[:, :, [2, 1, 0]].flip(0)
```
```python
jax_image = jnp.from_dlpack(cuda_dlpack)
rgb_array_or_tensor = jax_image
rgb_array_or_tensor = rgb_array_or_tensor.clone()[:, :, [2, 1, 0]].flip(0)
self.observation_tensor_type = ObservationTensorType.JAX_NP
self.internal_type = ObservationTensorType.JAX_NP
return jax_image, None
```
Actionable comments posted: 2
🧹 Nitpick comments (2)
docs/superpowers/plans/2026-03-19-zerocopy-raw-normalize.md (1)
13-13: Minor: Use official "macOS" spelling. Apple's operating system is officially spelled "macOS" (lowercase "mac").
📝 Suggested fix

```diff
-## Chunk 1: MacOS Zerocopy Boundary
+## Chunk 1: macOS Zerocopy Boundary
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-03-19-zerocopy-raw-normalize.md` at line 13, update the header and any occurrences that use "MacOS" to the official Apple capitalization "macOS" (e.g., change the heading "## Chunk 1: MacOS Zerocopy Boundary" to "## Chunk 1: macOS Zerocopy Boundary" and update any other instances of "MacOS" in this document to "macOS").

tests/python/unit/test_observation_converter.py (1)
21-24: Consider adding error path coverage. The current test covers the happy path well. Consider adding tests for edge cases in follow-up work:

- Error handling when `initialize_from_mach_port` returns `None`
- CUDA zerocopy path (different handle length)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/python/unit/test_observation_converter.py` around lines 21 - 24, Add test cases in tests/python/unit/test_observation_converter.py to cover failure and CUDA zerocopy branches: create a fake_native where initialize_from_mach_port returns None and assert the converter (the code exercised by the existing test) handles/raises the expected error or returns the expected fallback; and create another fake_native where mtl_tensor_from_cuda_mem_handle receives a CUDA handle of a different length to exercise the CUDA zerocopy path and assert the converter's behavior. Reference the existing fake_native SimpleNamespace and the initialize_from_mach_port and mtl_tensor_from_cuda_mem_handle call sites to implement these two new tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/craftground/environment/observation_converter.py`:
- Around line 319-322: The JAX branch incorrectly uses tensor methods (.clone(),
.flip()) and returns the original jax_image instead of the processed tensor;
update the block that sets rgb_array_or_tensor to use JAX-safe operations (e.g.,
use jax.numpy to copy/convert and jnp.flip to flip the array) rather than
.clone()/.flip(), change the channel reorder from [:, :, [2, 1, 0]] to [:, :,
:3] to drop alpha (matching CUDA semantics), ensure self.internal_type is set to
ObservationTensorType.JAX_NP, and return rgb_array_or_tensor, None (not
jax_image).
- Around line 306-309: convert_jax_observation() is using PyTorch methods on JAX
arrays (rgb_array_or_tensor.clone() and .flip(0)), which will raise
AttributeError; replace those with JAX equivalents by copying or re-wrapping the
array using jnp.array(...) or .copy() and use jnp.flip(..., axis=0) to flip the
y axis, and perform channel reordering via advanced indexing (e.g., [...,
[2,1,0]]); set self.internal_type = ObservationTensorType.JAX_NP as before and
return the transformed array; also apply the same replacements in the else
branch that performs the same clone/flip operations.
---
Nitpick comments:
In `@docs/superpowers/plans/2026-03-19-zerocopy-raw-normalize.md`:
- Line 13: Update the header and any occurrences that use "MacOS" to the
official Apple capitalization "macOS" (e.g., change the heading "## Chunk 1:
MacOS Zerocopy Boundary" to "## Chunk 1: macOS Zerocopy Boundary" and update any
other instances of "MacOS" in this document to "macOS").
In `@tests/python/unit/test_observation_converter.py`:
- Around line 21-24: Add test cases in
tests/python/unit/test_observation_converter.py to cover failure and CUDA
zerocopy branches: create a fake_native where initialize_from_mach_port returns
None and assert the converter (the code exercised by the existing test)
handles/raises the expected error or returns the expected fallback; and create
another fake_native where mtl_tensor_from_cuda_mem_handle receives a CUDA handle
of a different length to exercise the CUDA zerocopy path and assert the
converter's behavior. Reference the existing fake_native SimpleNamespace and the
initialize_from_mach_port and mtl_tensor_from_cuda_mem_handle call sites to
implement these two new tests.
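The JAX-path fix described in the inline comments above can be sketched as follows. This is a hypothetical illustration, not the repository's code: it assumes an HxWx4 BGRA, bottom-up input and uses NumPy, whose `flip` and slicing ops behave the same as their `jax.numpy` counterparts:

```python
import numpy as np  # jax.numpy (jnp) provides the same ops used below

def normalize_jax_observation(raw_bgra):
    """Hypothetical sketch: replace the torch-only .clone()/.flip() calls
    with array ops that jnp also supports (jnp.flip, basic slicing)."""
    rgb = raw_bgra[..., :3][..., ::-1]  # drop alpha, then BGR -> RGB
    return np.flip(rgb, axis=0)         # jnp.flip(rgb, axis=0) under JAX
```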
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 22924490-8cd7-4aa4-9b17-4c8fefa20703
📒 Files selected for processing (4)
- docs/superpowers/plans/2026-03-19-zerocopy-raw-normalize.md
- docs/superpowers/specs/2026-03-19-zerocopy-raw-normalize-design.md
- src/craftground/environment/observation_converter.py
- tests/python/unit/test_observation_converter.py
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/cpp/ipc_apple.h (1)
6-8: Missing pybind11 include for `py::object` type. This header uses `py::object` but doesn't include the pybind11 header. It relies on the includer having already included pybind11, which is fragile and can cause confusing compilation errors.

♻️ Proposed fix

```diff
 #ifndef __IPC_APPLE_H__
 #define __IPC_APPLE_H__
+#include <pybind11/pybind11.h>
+namespace py = pybind11;
+
 #define USE_CUSTOM_DL_PACK_TENSOR 1
 py::object
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpp/ipc_apple.h` around lines 6 - 8: the header uses `py::object` (in declarations `mtl_tensor_from_mach_port` and `normalize_apple_mtl_tensor_impl`) but doesn't include pybind11, causing fragile builds; add the appropriate pybind11 include (e.g., `#include <pybind11/pybind11.h>` or `<pybind11/pytypes.h>`) at the top of src/cpp/ipc_apple.h so the `py::object` type is defined and the header can be included independently.

src/cpp/ipc.h (1)
7-10: Missing pybind11 include for `py` namespace alias. The header declares `namespace py = pybind11;` and uses `py::object`, but doesn't include the pybind11 header. This works only when the includer happens to include pybind11 first.

♻️ Proposed fix

```diff
 #include <stdlib.h>
 #include <string>
+#include <pybind11/pybind11.h>
 namespace py = pybind11;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpp/ipc.h` around lines 7 - 10: the header declares `namespace py = pybind11` and uses `py::object` in `initialize_from_mach_port` and `normalize_apple_mtl_tensor` but doesn't include pybind11; add the proper include for pybind11 (e.g. `#include <pybind11/pybind11.h>`) at the top of src/cpp/ipc.h so the `py` alias and `py::object` are always defined regardless of include order.

src/cpp/ipc_apple_ops.mm (1)
120-122: Consider using PyTorch's `getMTLBufferStorage()` helper for consistency with MPS backend patterns. The direct cast from `src.storage().data()` to `id<MTLBuffer>` is supported by PyTorch's MPS backend and documented in their Metal kernel implementations. However, PyTorch provides the `getMTLBufferStorage()` helper function for this exact operation, which aligns with the backend's recommended approach and improves code clarity. Using the helper would make the intent explicit and align with PyTorch's own Metal kernel code patterns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpp/ipc_apple_ops.mm` around lines 120 - 122, Replace the raw casts to Metal buffers with PyTorch's helper: instead of casting src.storage().data() and dst.storage().data() to id<MTLBuffer>, call getMTLBufferStorage(src.storage()) and getMTLBufferStorage(dst.storage()) (keeping use of getNormalizePipeline(stream->device())/pipeline unchanged) so src_buffer and dst_buffer are obtained via the helper for clarity and consistency with MPS backend patterns.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cpp/ipc_apple_ops.mm`:
- Around line 71-95: Reformat the Metal pipeline creation block to satisfy
clang-format style rules: run clang-format -i on the file or reflow the block so
spacing, alignment and brace placement match project style; specifically
reformat the section that creates and checks
library/newFunctionWithName:@"normalize_bgr_flip", the error assignments to
library_error and pipeline_error, and the calls involving device, options,
pipeline, function, and pipeline_error so that indentation, line breaks and
spacing conform to clang-format (target the code around
newLibraryWithSource:options:error:, newFunctionWithName:, and
newComputePipelineStateWithFunction:error:); then re-run CI to ensure no
remaining formatting violations.
---
Nitpick comments:
In `@src/cpp/ipc_apple_ops.mm`:
- Around line 120-122: Replace the raw casts to Metal buffers with PyTorch's
helper: instead of casting src.storage().data() and dst.storage().data() to
id<MTLBuffer>, call getMTLBufferStorage(src.storage()) and
getMTLBufferStorage(dst.storage()) (keeping use of
getNormalizePipeline(stream->device())/pipeline unchanged) so src_buffer and
dst_buffer are obtained via the helper for clarity and consistency with MPS
backend patterns.
In `@src/cpp/ipc_apple.h`:
- Around line 6-8: The header uses py::object (in declarations
mtl_tensor_from_mach_port and normalize_apple_mtl_tensor_impl) but doesn't
include pybind11, causing fragile builds; add the appropriate pybind11 include
(e.g., `#include` <pybind11/pybind11.h> or <pybind11/pytypes.h>) at the top of
src/cpp/ipc_apple.h so the py::object type is defined and the header can be
included independently.
In `@src/cpp/ipc.h`:
- Around line 7-10: The header declares namespace py = pybind11 and uses
py::object in initialize_from_mach_port and normalize_apple_mtl_tensor but
doesn't include pybind11; add the proper include for pybind11 (e.g. include
<pybind11/pybind11.h>) at the top of src/cpp/ipc.h so the py alias and
py::object are always defined regardless of include order.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 266c2b26-26b0-4ebe-860d-c43ea28eefb0
📒 Files selected for processing (8)
- CMakeLists.txt
- src/cpp/ipc.cpp
- src/cpp/ipc.h
- src/cpp/ipc_apple.h
- src/cpp/ipc_apple.mm
- src/cpp/ipc_apple_ops.mm
- src/craftground/environment/observation_converter.py
- tests/python/unit/test_observation_converter.py
✅ Files skipped from review due to trivial changes (1)
- CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (1)
- src/craftground/environment/observation_converter.py
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cpp/ipc_apple.mm (1)
117-123: ⚠️ Potential issue | 🟠 Major: Validate the IOSurface size at runtime before building the tensor.

Line 148 still uses the caller-supplied `width`/`height`. The only check is `assert` on lines 120-121, which disappears in release builds. If those values drift, the code can give the tensor the wrong shape and size the no-copy buffer from the wrong height.

Suggested fix

```diff
-  assert(iosurface_width == width);
-  assert(iosurface_height == height);
+  if (iosurface_width != static_cast<size_t>(width) ||
+      iosurface_height != static_cast<size_t>(height)) {
+    throw std::runtime_error(
+        "IOSurface dimensions do not match requested tensor shape"
+    );
+  }
   size_t bytesPerRow = IOSurfaceGetBytesPerRow(ioSurface);
-  size_t length = bytesPerRow * height;
+  size_t length = bytesPerRow * iosurface_height;
@@
-  DLManagedTensor *tensor =
-      createDLPackTensorMetal(mtlBuffer, width, height, bytesPerRow);
+  DLManagedTensor *tensor = createDLPackTensorMetal(
+      mtlBuffer, iosurface_width, iosurface_height, bytesPerRow
+  );
```

Also applies to: 147-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpp/ipc_apple.mm` around lines 117 - 123, Runtime validation is missing: don't rely on asserts for iosurface size — check iosurface_width and iosurface_height against the caller-supplied width/height at runtime and fail fast if they differ; use IOSurfaceGetWidth/IOSurfaceGetHeight to compute the tensor shape and the no-copy buffer length (bytesPerRow * iosurface_height) instead of using the possibly-stale caller `height`/`width`, and handle the mismatch by returning an error/throwing/early-exit from the function (replace the assert-based checks around iosurface_width/iosurface_height and update the calculation of `length` to use `iosurface_height` and `iosurface_width` as the authoritative dimensions).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cpp/ipc_apple_ops.mm`:
- Around line 69-96: The code uses manual retains/releases and alloc/init which
break ARC; update the Metal setup to ARC-safe operations: remove all `[...
release]` and the `retain` on library_error, replace the `[[NSError alloc]
initWithDomain:...]` with the autoreleasing factory `[NSError
errorWithDomain:code:userInfo:]`, and stop passing the static `pipeline_error`
into Metal calls—use a local `NSError *localPipelineError = nil` when calling
`newComputePipelineStateWithFunction:function error:&localPipelineError` then
assign `pipeline_error = localPipelineError` (or `pipeline_error =
[localPipelineError copy]` only if you need an owned copy). Keep references to
`options`, `library`, `function`, and `pipeline` as ARC-managed objects and
remove manual memory management calls around them.
- Around line 124-125: The C-style casts converting storage().data() to
id<MTLBuffer> (variables src_buffer and dst_buffer) are incompatible with ARC;
change those casts to non-retaining Objective-C bridge casts by replacing the
C-style casts with (__bridge id<MTLBuffer>) when recovering the buffers from
src.storage().data() and dst.storage().data() so the conversions are ARC-safe
and do not alter retain counts.
In `@src/cpp/ipc.cpp`:
- Around line 105-109: The Python import expects a symbol named
mtl_dlpack_from_mach_port but the PYBIND11_MODULE only exports
initialize_from_mach_port, normalize_apple_mtl_tensor, and
mtl_tensor_from_cuda_mem_handle; add an exported binding named
mtl_dlpack_from_mach_port (either by binding the existing C++ function
initialize_from_mach_port under that Python name or by implementing a thin
wrapper function mtl_dlpack_from_mach_port that calls the existing
implementation) in the PYBIND11_MODULE block so observation_converter.py can
import mtl_dlpack_from_mach_port successfully.
In `@src/craftground/environment/observation_converter.py`:
- Around line 119-126: The code normalizes only the value returned in the
APPLE_TENSOR/CUDA_DLPACK branches but leaves self.last_observations[0] as the
raw zerocopy buffer so render() later consumes unnormalized data; fix by
replacing the raw cache with the normalized tensor (assign the result of
_normalize_apple_zerocopy_tensor or _normalize_cuda_zerocopy_tensor back into
self.last_observations[0]) inside the branches (affecting initialize_zerocopy /
the branch handling for ObservationTensorType.APPLE_TENSOR and
ObservationTensorType.CUDA_DLPACK) so both the returned tuple and the cached
last_observations[0] contain the normalized RGB(A) tensor.
In `@tests/python/unit/test_observation_converter.py`:
- Around line 15-21: The fixture should model macOS' 4-byte-per-pixel layout by
creating an underlying HxWx4 uint8 tensor (add a dummy fourth channel per pixel)
and then produce the HxWx3 view using torch.as_strided with element-strides
(H_stride=8, W_stride=4, C_stride=1) so the resulting raw_bgr has shape (2,2,3)
but a physical 4-byte stride; replace the current raw_bgr creation with this
pattern (use the variable name raw_bgr) and apply the same change to the other
Apple zero-copy test fixture referenced in the file.
---
Outside diff comments:
In `@src/cpp/ipc_apple.mm`:
- Around line 117-123: Runtime validation is missing: don't rely on asserts for
iosurface size — check iosurface_width and iosurface_height against the
caller-supplied width/height at runtime and fail fast if they differ; use
IOSurfaceGetWidth/IOSurfaceGetHeight to compute the tensor shape and the no-copy
buffer length (bytesPerRow * iosurface_height) instead of using the
possibly-stale caller `height`/`width`, and handle the mismatch by returning an
error/throwing/early-exit from the function (replace the assert-based checks
around iosurface_width/iosurface_height and update the calculation of `length`
to use `iosurface_height` and `iosurface_width` as the authoritative
dimensions).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3628f1fe-9f4e-46de-b7ba-2309a7b0df39
📒 Files selected for processing (8)
- pybind11
- src/cpp/ipc.cpp
- src/cpp/ipc.h
- src/cpp/ipc_apple.h
- src/cpp/ipc_apple.mm
- src/cpp/ipc_apple_ops.mm
- src/craftground/environment/observation_converter.py
- tests/python/unit/test_observation_converter.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/cpp/ipc.h
- src/cpp/ipc_apple.h
```cpp
PYBIND11_MODULE(craftground_native, m) {
    m.doc() = "Craftground Native Module";
    m.def("initialize_from_mach_port", &initialize_from_mach_port);
    m.def("normalize_apple_mtl_tensor", &normalize_apple_mtl_tensor);
    m.def("mtl_tensor_from_cuda_mem_handle", &mtl_tensor_from_cuda_mem_handle);
```
The Apple JAX path still imports a symbol this module never exports.
Line 108 adds the new normalize binding, but src/craftground/environment/observation_converter.py Line 298 still does from .craftground_native import mtl_dlpack_from_mach_port. That name is not bound anywhere in this module, so the first macOS/JAX conversion will fail with ImportError. The new unit test only passes because it injects a fake implementation into sys.modules.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cpp/ipc.cpp` around lines 105 - 109, The Python import expects a symbol
named mtl_dlpack_from_mach_port but the PYBIND11_MODULE only exports
initialize_from_mach_port, normalize_apple_mtl_tensor, and
mtl_tensor_from_cuda_mem_handle; add an exported binding named
mtl_dlpack_from_mach_port (either by binding the existing C++ function
initialize_from_mach_port under that Python name or by implementing a thin
wrapper function mtl_dlpack_from_mach_port that calls the existing
implementation) in the PYBIND11_MODULE block so observation_converter.py can
import mtl_dlpack_from_mach_port successfully.
```diff
 if self.internal_type == ObservationTensorType.APPLE_TENSOR:
-    obs_1 = self.last_observations[0].clone()[:, :, [2, 1, 0]].flip(0)
+    obs_1 = self._normalize_apple_zerocopy_tensor(
+        self.last_observations[0]
+    )
     return (obs_1, None)
 elif self.internal_type == ObservationTensorType.CUDA_DLPACK:
-    obs_1 = self.last_observations[0].clone()[:, :, :3].flip(0)
+    obs_1 = self._normalize_cuda_zerocopy_tensor(self.last_observations[0])
     return (obs_1, None)
```
render() now sees the raw zerocopy tensor, not the public RGB output.
Line 120/125 normalizes only the return value. initialize_zerocopy() still leaves self.last_observations[0] pointing at the raw Apple/CUDA buffer, and render() later consumes that cache directly on Lines 168-204. That regresses zerocopy rendering back to unnormalized BGR/RGBA data.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/craftground/environment/observation_converter.py` around lines 119 - 126,
The code normalizes only the value returned in the APPLE_TENSOR/CUDA_DLPACK
branches but leaves self.last_observations[0] as the raw zerocopy buffer so
render() later consumes unnormalized data; fix by replacing the raw cache with
the normalized tensor (assign the result of _normalize_apple_zerocopy_tensor or
_normalize_cuda_zerocopy_tensor back into self.last_observations[0]) inside the
branches (affecting initialize_zerocopy / the branch handling for
ObservationTensorType.APPLE_TENSOR and ObservationTensorType.CUDA_DLPACK) so
both the returned tuple and the cached last_observations[0] contain the
normalized RGB(A) tensor.
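A minimal stand-in showing the shape of that fix (names are illustrative; the real converter dispatches on `internal_type` and has separate Apple/CUDA helpers):

```python
import numpy as np

class ConverterSketch:
    """Illustrative stand-in for the converter's zerocopy branch: the key
    change is writing the normalized tensor back into the cache so that a
    later render() call also sees RGB data, not the raw buffer."""

    def __init__(self, raw):
        self.last_observations = [raw]  # raw zerocopy buffer (BGR, unflipped)

    def _normalize(self, raw):
        # BGR -> RGB reorder plus y flip, as in the fallback path
        return np.flip(raw[..., ::-1], axis=0).copy()

    def convert(self):
        normalized = self._normalize(self.last_observations[0])
        # The fix: replace the raw cache with the normalized tensor
        self.last_observations[0] = normalized
        return normalized, None
```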
```python
raw_bgr = torch.tensor(
    [
        [[10, 20, 30], [40, 50, 60]],
        [[70, 80, 90], [100, 110, 120]],
    ],
    dtype=torch.uint8,
)
```
Model the macOS raw tensor as a 4-byte-strided view in these fixtures.
The Apple bridge now exposes `shape[2] == 3` with a physical pixel stride of 4. These compact HxWx3 fixtures would keep passing even if the stride metadata regressed again, so they don't guard against that bug.
Example fixture shape

```diff
-    raw_bgr = torch.tensor(
+    raw_bgra = torch.tensor(
         [
-            [[10, 20, 30], [40, 50, 60]],
-            [[70, 80, 90], [100, 110, 120]],
+            [[10, 20, 30, 255], [40, 50, 60, 255]],
+            [[70, 80, 90, 255], [100, 110, 120, 255]],
         ],
         dtype=torch.uint8,
     )
+    raw_bgr = raw_bgra[..., :3]
+    assert raw_bgr.stride() == (8, 4, 1)
```

Apply the same pattern to both Apple zerocopy tests.
Also applies to: 63-69
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/python/unit/test_observation_converter.py` around lines 15 - 21, The
fixture should model macOS' 4-byte-per-pixel layout by creating an underlying
HxWx4 uint8 tensor (add a dummy fourth channel per pixel) and then produce the
HxWx3 view using torch.as_strided with element-strides (H_stride=8, W_stride=4,
C_stride=1) so the resulting raw_bgr has shape (2,2,3) but a physical 4-byte
stride; replace the current raw_bgr creation with this pattern (use the variable
name raw_bgr) and apply the same change to the other Apple zero-copy test
fixture referenced in the file.
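A NumPy rendering of that fixture pattern (the test itself uses `torch.as_strided`, which takes element strides; NumPy's `as_strided` takes byte strides, and the two coincide for uint8):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Back a (2, 2, 3) BGR view with a BGRA buffer so the fixture reproduces
# macOS' 4-byte-per-pixel layout: shape HxWx3 but physical pixel stride 4.
raw_bgra = np.array(
    [
        [[10, 20, 30, 255], [40, 50, 60, 255]],
        [[70, 80, 90, 255], [100, 110, 120, 255]],
    ],
    dtype=np.uint8,
)
raw_bgr = as_strided(raw_bgra, shape=(2, 2, 3), strides=(8, 4, 1))
```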
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cpp/ipc_apple_ops.mm`:
- Around line 116-118: The kernel assumes channels are contiguous; before
building NormalizeParams and dispatching the kernel, validate the tensor channel
stride and positivity: add TORCH_CHECK(src.stride(2) == 1, "Expected contiguous
channel stride (stride(2) == 1)") and also assert positive strides (e.g.,
TORCH_CHECK(src.stride(i) > 0, "...") for relevant dims) to avoid non-contiguous
memory causing corrupted reads; perform these checks near the existing
TORCH_CHECK calls that reference src.dim(), src.size(2), and
src.storage().data() so the validation runs before NormalizeParams construction
and kernel dispatch.
- Around line 129-138: The NormalizeParams construction narrows 64-bit tensor
metadata (src/dst size, stride, storage_offset) to uint32_t without bounds
checks; add checked conversions before filling NormalizeParams by verifying each
value from src.size(...), src.stride(...), src.storage_offset(),
dst.stride(...), dst.storage_offset() is <= UINT32_MAX (use
std::numeric_limits<uint32_t>::max() or UINT32_MAX), and handle violations
(return an error/throw an exception or log and abort) instead of silently
truncating; after validation, perform the static_casts into NormalizeParams as
before.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/cpp/ipc_apple_ops.mm (1)
12-21: Pin the host/kernel param ABI with a `static_assert`. `NormalizeParams` is duplicated in C++ and in the embedded Metal source. A small host-side size check would at least fail fast if the CPU layout drifts from the shader contract in a later refactor.

🧭 Suggested hardening

```diff
 struct NormalizeParams {
     uint32_t width;
     uint32_t height;
     uint32_t src_row_stride;
     uint32_t src_pixel_stride;
     uint32_t src_offset;
     uint32_t dst_row_stride;
     uint32_t dst_pixel_stride;
     uint32_t dst_offset;
 };
+static_assert(
+    sizeof(NormalizeParams) == 8 * sizeof(uint32_t),
+    "NormalizeParams must stay in sync with the Metal kernel"
+);
```

Also applies to: 33-42

🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpp/ipc_apple_ops.mm` around lines 12 - 21, Add a compile-time layout check to pin the host/kernel ABI: immediately after the NormalizeParams struct definition add a static_assert that sizeof(NormalizeParams) equals the exact byte size expected by the Metal shader ABI (use the literal byte value that matches the embedded Metal struct), and include similar static_asserts for the other duplicated structs mentioned (the ones around lines 33-42) so any future layout drift fails to compile; place these assertions next to the corresponding struct declarations and keep the message descriptive (e.g., "NormalizeParams size must match Metal shader ABI").
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cpp/ipc_apple_ops.mm`:
- Around line 119-126: Validation currently checks channel contiguity
(src.stride(2) == 1) and positive strides but misses overlap checks; add checks
before the kernel launch to reject HWC views that would let the kernel read
invalid bytes: if width > 1 require src.stride(1) >= 3, and if height > 1
require src.stride(0) >= ((width > 0 ? (width - 1) : 0) * src.stride(1) + 3);
reference the tensor strides via src.stride(1) and src.stride(0), and the
width/height variables used for the kernel so the code returns an error
(TORCH_CHECK) when these conditions are not met.
---
Nitpick comments:
In `@src/cpp/ipc_apple_ops.mm`:
- Around line 12-21: Add a compile-time layout check to pin the host/kernel ABI:
immediately after the NormalizeParams struct definition add a static_assert that
sizeof(NormalizeParams) equals the exact byte size expected by the Metal shader
ABI (use the literal byte value that matches the embedded Metal struct), and
include similar static_asserts for the other duplicated structs mentioned (the
ones around lines 33-42) so any future layout drift fails to compile; place
these assertions next to the corresponding struct declarations and keep the
message descriptive (e.g., "NormalizeParams size must match Metal shader ABI").
```cpp
TORCH_CHECK(
    src.stride(2) == 1,
    "Expected contiguous channel stride (stride(2) == 1)"
);
TORCH_CHECK(
    src.stride(0) > 0 && src.stride(1) > 0,
    "Expected positive row and pixel strides"
);
```
🧩 Analysis chain

🏁 Script executed: `fd -t f ipc_apple_ops.mm` (repository: yhs0602/CraftGround; length of output: 87)

🏁 Script executed: `cat -n src/cpp/ipc_apple_ops.mm` (repository: yhs0602/CraftGround; length of output: 7864)
Reject overlapping HWC views in validation checks.
Lines 119–126 enforce channel contiguity (stride(2) == 1) and positive strides, but do not ensure pixels or rows are non-overlapping. The kernel (lines 54–62) reads 3 bytes per pixel assuming stride(1) >= 3 and reads across rows assuming stride(0) >= (width - 1) * stride(1) + 3. Tensors with stride(1) < 3 or insufficient stride(0) will cause the kernel to read the wrong bytes without error.
Add validation before line 148:

- For width > 1: require `stride(1) >= 3`
- For height > 1: require `stride(0) >= (width - 1) * stride(1) + 3` (handle width = 0 as 0)
Suggested fix

```diff
+    const auto width = to_u32_checked(src.size(1), "width");
+    const auto height = to_u32_checked(src.size(0), "height");
+    const auto src_row_stride =
+        to_u32_checked(src.stride(0), "src_row_stride");
+    const auto src_pixel_stride =
+        to_u32_checked(src.stride(1), "src_pixel_stride");
+    const auto min_row_span =
+        width == 0
+            ? uint64_t{0}
+            : (static_cast<uint64_t>(width) - 1) *
+                  static_cast<uint64_t>(src_pixel_stride) +
+              3u;
+
+    TORCH_CHECK(
+        width <= 1 || src_pixel_stride >= 3,
+        "Expected pixel stride >= 3 for non-overlapping RGB pixels"
+    );
+    TORCH_CHECK(
+        height <= 1 || static_cast<uint64_t>(src_row_stride) >= min_row_span,
+        "Expected row stride to cover full row without overlap"
+    );
+
     NormalizeParams params{
-        to_u32_checked(src.size(1), "width"),
-        to_u32_checked(src.size(0), "height"),
-        to_u32_checked(src.stride(0), "src_row_stride"),
-        to_u32_checked(src.stride(1), "src_pixel_stride"),
+        width,
+        height,
+        src_row_stride,
+        src_pixel_stride,
         to_u32_checked(src.storage_offset(), "src_offset"),
         to_u32_checked(dst.stride(0), "dst_row_stride"),
         to_u32_checked(dst.stride(1), "dst_pixel_stride"),
         to_u32_checked(dst.storage_offset(), "dst_offset"),
     };
```

🤖 Prompt for AI Agents
};🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cpp/ipc_apple_ops.mm` around lines 119 - 126, Validation currently checks
channel contiguity (src.stride(2) == 1) and positive strides but misses overlap
checks; add checks before the kernel launch to reject HWC views that would let
the kernel read invalid bytes: if width > 1 require src.stride(1) >= 3, and if
height > 1 require src.stride(0) >= ((width > 0 ? (width - 1) : 0) *
src.stride(1) + 3); reference the tensor strides via src.stride(1) and
src.stride(0), and the width/height variables used for the kernel so the code
returns an error (TORCH_CHECK) when these conditions are not met.
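To illustrate why the overlap check matters, here is a NumPy sketch (a stand-in for the MPS tensor) of an HWC view whose pixel stride is less than 3: adjacent pixels alias each other's bytes, so a kernel assuming `stride(1) >= 3` would read the wrong data without any error:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

buf = np.arange(16, dtype=np.uint8)
# A (1, 4, 3) "HWC" view with pixel stride 2 (< 3): pixel i shares its last
# byte with pixel i+1's first byte, so the pixels overlap in memory.
overlapping = as_strided(buf, shape=(1, 4, 3), strides=(16, 2, 1))
```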
From a performance standpoint, the impact of this PR is less about achieving complete zero-copy and more about removing unnecessary intermediate processing from the existing macOS public zerocopy path. Summarizing the old path from the Python side: the converter performed the BGR-to-RGB reorder and y-flip in eager PyTorch before returning.
In other words, the performance value of this PR is not that copies were reduced to zero, but that redundant intermediate processing was removed from the public path.
Summary
- The `ZEROCOPY` path on macOS now uses a native Metal normalize op that writes an allocator-managed MPS tensor in RGB + flipped layout

Verification

- `uv run --extra test pytest tests/python/unit/test_observation_converter.py -v`
- `python -m pip install -e . --no-build-isolation`

Notes

- `@rpath/libomp.dylib` runtime issue in this machine setup

Summary by CodeRabbit
Documentation
New Features
Refactor
Tests