decoder_native_transforms.md (10 additions, 7 deletions)
@@ -25,12 +25,14 @@ What the user is asking for, in English:

2. For each decoded frame, I want each frame to pass through the following transforms:
   1. Add or remove frames as necessary to ensure a constant 30 frames per second.
   2. Resize the frame to 640x480. Use the algorithm that is TorchVision's default.
   3. Inside the resized frame, crop the image to 32x32. The x and y coordinates are
      chosen randomly upon the creation of the Python `VideoDecoder` object. All decoded
      frames use the same values for x and y.
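As a rough sketch of what that request might look like with the `transforms` parameter proposed below (hedged: that parameter does not exist yet, the path is a placeholder, and TorchVision currently has no FPS-conversion transform, so that step only appears as a comment):

```python
from torchcodec.decoders import VideoDecoder
from torchvision.transforms import v2

# Hypothetical usage of the proposed `transforms` parameter.
# An FPS-conversion transform would also belong in this list, but TorchVision
# does not currently offer one, so it is omitted from the sketch.
decoder = VideoDecoder(
    "video.mp4",  # placeholder path
    transforms=[
        v2.Resize((480, 640)),  # (height, width); TorchVision's default interpolation
        v2.RandomCrop(32),      # 32x32 crop; the doc wants x/y fixed once at decoder creation
    ],
)
first_frame = decoder[0]  # comes back already resized and cropped
```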
These three transforms are instructive, as they force us to consider:

1. How "easy" TorchVision transforms will be handled, where all values are
   static. `Resize` is such an example.
2. Transforms that involve randomness. The main question we need to answer
   is when the random value is resolved. I think this comes down to: once
   upon Python `VideoDecoder` creation, or different for each frame decoded?
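To make the randomness question concrete, here is a minimal standalone sketch (not TorchCodec or TorchVision code; the class name is made up) of what "resolve once at `VideoDecoder` creation" would mean for the random crop, in contrast to TorchVision's `RandomCrop`, which samples a new location on every call:

```python
import random

import torch

class FixedRandomCrop:
    """Sketch: sample the crop origin once, then reuse it for every frame."""

    def __init__(self, size: int, frame_height: int, frame_width: int):
        self.size = size
        # Resolved exactly once, at construction time -- analogous to resolving
        # the random value when the Python VideoDecoder object is created.
        self.top = random.randint(0, frame_height - size)
        self.left = random.randint(0, frame_width - size)

    def __call__(self, frame: torch.Tensor) -> torch.Tensor:
        # frame is (C, H, W); every call crops the same window.
        return frame[..., self.top : self.top + self.size,
                     self.left : self.left + self.size]

# torchvision.transforms.v2.RandomCrop, by contrast, draws a new origin on each
# call, which would mean a different crop for every decoded frame.
```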
@@ -41,11 +43,11 @@ These three transforms are instructive, as they force us to consider:

   TorchVision. In particular, FPS is something that multiple users have
   asked for.

First let's consider implementing the "easy" case of `Resize`.

1. We add an optional `transforms` parameter to the initialization of
   `VideoDecoder`. It is a sequence of TorchVision Transforms.
2. During `VideoDecoder` object creation, we walk the list, capturing two
   pieces of information:
   1. The transform name that the C++ layer will understand. (We will
      have to decide if we want to just use the FFmpeg filter name
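As a sketch of step 2 (not actual TorchCodec code; it assumes the bridge name is simply the FFmpeg filter name, which is still an open question below, and it only handles `Resize` given an explicit `(height, width)` size):

```python
from torchvision.transforms import v2

def convert_transforms(transforms):
    """Sketch: map TorchVision transforms to (name, params) pairs for the C++ layer."""
    converted = []
    for transform in transforms:
        if isinstance(transform, v2.Resize):
            # Static values, so they can be captured at VideoDecoder creation.
            height, width = transform.size  # assumes size was given as (height, width)
            # "scale" is FFmpeg's resizing filter; whether the bridge should
            # reuse FFmpeg filter names is one of the open questions below.
            converted.append(("scale", {"width": width, "height": height}))
        else:
            raise ValueError(f"Unsupported transform: {transform}")
    return converted
```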
@@ -87,9 +89,10 @@ Open questions:

2. For random transforms, when should the value be fixed?
3. Transforms such as `Resize` don't actually implement a `make_params()`
   method. How does TorchVision get their parameters? How will TorchCodec?
4. Should the name at the bridge layer between Python and C++ just be the FFmpeg filter name?
5. How do we communicate the transformation names and parameters to the C++
   layer? We need to support transforms with an arbitrary number of parameters.
6. How does this generalize to `AudioDecoder`? Ideally we would be able to
   support TorchAudio's transforms in a similar way.
7. What is the relationship between the C++ transform objects, `FilterGraph`