We want to support this user-facing API:

``` python
decoder = VideoDecoder(
    "vid.mp4",
    transforms=[
        # (first transform elided)
        torchvision.transforms.v2.Resize(
            width=640,
            height=480,
        ),
        torchvision.transforms.v2.RandomCrop(
            width=32,
            height=32,
        ),
    ]
)
```

What the user is asking for, in English:

 1. I want to decode frames from the file `"vid.mp4"`.
 2. I want each decoded frame to pass through the following
    transforms:
    a. Add or remove frames as necessary to ensure a constant 30 frames
       per second.
    b. Resize each frame to 640x480.
    c. Take a random 32x32 crop of each frame.

These three transforms are instructive, as they force us to consider:

 2. Transforms that involve randomness. The main question we need to resolve
    is when the random value is fixed: once, upon Python `VideoDecoder`
    creation, or anew for each decoded frame? I made the call above that it
    should be once, upon `VideoDecoder` creation, but we need to make sure
    that lines up with what we think users will want.
 3. Transforms that are supported by FFmpeg but not supported by
    TorchVision.
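
To make the randomness question in point 2 concrete, here is a minimal, dependency-free sketch of the "resolve once, at decoder creation" policy. `RandomCropSpec` and `resolve_transform_params` are invented names for illustration, not TorchCodec or TorchVision APIs.

``` python
# Hypothetical sketch: fix all random values once, at decoder-creation
# time, so every decoded frame sees the same crop offsets.
import random


class RandomCropSpec:
    """Stand-in for torchvision.transforms.v2.RandomCrop."""

    def __init__(self, width, height):
        self.width = width
        self.height = height


def resolve_transform_params(transforms, frame_width, frame_height, seed=None):
    """Sample random parameters once, at VideoDecoder-creation time.

    Per-frame randomness would instead re-run this sampling for every
    decoded frame.
    """
    rng = random.Random(seed)
    resolved = []
    for t in transforms:
        if isinstance(t, RandomCropSpec):
            x = rng.randrange(frame_width - t.width + 1)
            y = rng.randrange(frame_height - t.height + 1)
            # FFmpeg's crop filter takes out_w:out_h:x:y.
            resolved.append(f"crop={t.width}:{t.height}:{x}:{y}")
    return resolved
```

Under this policy, calling the decoder twice with the same inputs and seed yields identical crop strings; the alternative policy would re-sample inside the per-frame decode path instead.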
First let's consider implementing the "easy" case of Resize.

 1. We add an optional `transforms` parameter to the initialization of
    `VideoDecoder`. It is a sequence of TorchVision Transforms.
 2. During `VideoDecoder` object creation, we walk the list, capturing two
    pieces of information:
    a. The transform name that the C++ layer will understand. (We will
       have to decide if we want to just use the FFmpeg filter name
       here, the fully resolved Transform name, or introduce a new
       naming layer.)
    b. The parameters in a format that the C++ layer will understand. We
       obtain them by calling `make_params()` on the Transform object.
 3. We add an optional `transforms` parameter to `core.add_video_stream()`.
    This parameter will be a vector, but whether the vector contains strings,
    tensors, or some combination of them is TBD.
 4. The `custom_ops.cpp` and `pybind_ops.cpp` layer is responsible for turning
    the values passed from the Python layer into transform objects that the
    C++ layer knows about. We will have one class per transform we support.
    Each class will have:
    c. A virtual member function that knows how to produce a string that
       can be passed to FFmpeg's filtergraph.
 5. We add a vector of such transforms to
    `SingleStreamDecoder::addVideoStream`. We store the vector as a field in
    `SingleStreamDecoder`.
 6. We need to reconcile `FilterGraph`, `FiltersContext` and this vector of
    transforms. They are all related, but it's not clear to me what the
    exact relationship should be.
 7. The actual string we pass to FFmpeg's filtergraph comes from calling
    the virtual member function on each transform object.

For the transforms that do not exist in TorchVision, we can build on the above:

 1. We define a new module, `torchcodec.decoders.transforms`.
 2. All transforms we define in there inherit from
    `torchvision.transforms.v2.Transform`.
 3. We implement the minimum needed to hook the new transforms into the
    machinery defined above.

Open questions:

 1. Is `torchcodec.transforms` the right namespace?
 2. For random transforms, when should the value be fixed?
 3. Transforms such as Resize don't actually implement a `make_params()`
    method. How does TorchVision get their parameters? How will TorchCodec?
 4. How do we communicate the transform names and parameters to the C++
    layer? We need to support transforms with an arbitrary number of
    parameters.
 5. How does this generalize to `AudioDecoder`? Ideally we would be able to
    support TorchAudio's transforms in a similar way.
 6. What is the relationship between the C++ transform objects, `FilterGraph`,
    and `FiltersContext`?