Commit a9e8182 ("Formatting"), 1 parent 567bdc5

1 file changed: decoder_native_transforms.md (+20, -18)
@@ -1,5 +1,6 @@
 We want to support this user-facing API:
-
+
+```python
 decoder = VideoDecoder(
     "vid.mp4",
     transforms=[
@@ -8,18 +9,19 @@ We want to support this user-facing API:
         ),
         torchvision.transforms.v2.Resize(
             width=640,
-            height=480,
+            height=480,
         ),
         torchvision.transforms.v2.RandomCrop(
             width=32,
             height=32,
         ),
     ]
 )
+```
 
 What the user is asking for, in English:
 
-1. I want to decode frames from the file "vid.mp4".
+1. I want to decode frames from the file `"vid.mp4".`
 2. For each decoded frame, I want each frame to pass through the following
    transforms:
      a. Add or remove frames as necessary to ensure a constant 30 frames
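The "constant 30 frames per second" requirement above (transform "a") behaves like FFmpeg's `fps` filter: frames are duplicated or dropped so that output ticks fall on a fixed grid. A minimal, self-contained sketch of that idea (the function name and timestamp-based selection rule are illustrative, not TorchCodec's actual code):

```python
def resample_to_constant_fps(frame_pts, duration, fps=30):
    """Pick, for each output tick, the index of the source frame to show.

    frame_pts: presentation timestamps (seconds) of the decoded frames.
    duration: total duration (seconds) to cover at the target fps.
    Slower sources get frames duplicated; faster sources get frames dropped.
    """
    picked = []
    n_out = int(duration * fps)
    for i in range(n_out):
        t = i / fps
        # Most recent source frame at or before this output tick.
        idx = max(j for j, pts in enumerate(frame_pts) if pts <= t)
        picked.append(idx)
    return picked

# A 2 fps source resampled to 4 fps: each source frame shown twice.
assert resample_to_constant_fps([0.0, 0.5], 1.0, fps=4) == [0, 0, 1, 1]
```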
@@ -38,7 +40,7 @@ These three transforms are instructive, as they force us to consider:
 2. Transforms that involve randomness. The main question we need to resolve
    is when the random value is resolved. I think this comes down to: once
    upon Python VideoDecoder creation, or different for each frame decoded?
-   I made the call above that it should be once upon Python VideoDecoder
+   I made the call above that it should be once upon Python `VideoDecoder`
    creation, but we need to make sure that lines up with what we think
    users will want.
 3. Transforms that are supported by FFmpeg but not supported by
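The randomness question above can be made concrete with a small stand-in for `RandomCrop`. This is an illustrative sketch only (the class and method names are hypothetical, not TorchCodec's code): "once upon `VideoDecoder` creation" means sampling the crop window a single time and reusing it for every frame.

```python
import random

class RandomCropSpec:
    """Hypothetical stand-in for torchvision.transforms.v2.RandomCrop."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def make_params(self, frame_w, frame_h, rng):
        # Sample a top-left corner for the crop window.
        return {
            "x": rng.randint(0, frame_w - self.width),
            "y": rng.randint(0, frame_h - self.height),
            "w": self.width,
            "h": self.height,
        }

rng = random.Random(0)
spec = RandomCropSpec(32, 32)

# The call made in the doc: resolve once, at decoder creation, then reuse
# the same crop window for every decoded frame.
fixed = spec.make_params(640, 480, rng)
params_per_frame = [fixed] * 3
assert all(p == fixed for p in params_per_frame)

# The alternative would be to call make_params() once per decoded frame,
# yielding a (usually) different window each time.
```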
@@ -48,19 +50,19 @@ These three transforms are instructive, as they force us to consider:
 First let's consider implementing the "easy" case of Resize.
 
 1. We add an optional `transforms` parameter to the initialization of
-   VideoDecoder. It is a sequence of TorchVision Transforms.
+   `VideoDecoder`. It is a sequence of TorchVision Transforms.
 2. During VideoDecoder object creation, we walk the list, capturing two
    pieces of information:
-     a. The transform name that the C++ layer will understand. (We will
+   a. The transform name that the C++ layer will understand. (We will
       have to decide if we want to just use the FFmpeg filter name
       here, the fully resolved Transform name, or introduce a new
       naming layer.)
    b. The parameters in a format that the C++ layer will understand. We
       obtain them by calling `make_params()` on the Transform object.
-3. We add an optional transforms parameter to core.add_video_stream(). This
+3. We add an optional transforms parameter to `core.add_video_stream()`. This
    parameter will be a vector, but whether the vector contains strings,
    tensors, or some combination of them is TBD.
-4. The custom_ops.cpp and pybind_ops.cpp layer is responsible for turning
+4. The `custom_ops.cpp` and `pybind_ops.cpp` layer is responsible for turning
    the values passed from the Python layer into transform objects that the
    C++ layer knows about. We will have one class per transform we support.
    Each class will have:
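Step 2 above (walking the list and capturing a name plus parameters) could be sketched as below. The mapping to FFmpeg filter names and the `capture_transform_specs` helper are assumptions about how the design might look, not the actual implementation; `scale` and `crop` are real FFmpeg filters, and the dataclasses stand in for the torchvision classes so the sketch is self-contained.

```python
from dataclasses import dataclass

@dataclass
class Resize:       # stand-in for torchvision.transforms.v2.Resize
    width: int
    height: int

@dataclass
class RandomCrop:   # stand-in for torchvision.transforms.v2.RandomCrop
    width: int
    height: int

# One option for the naming layer: reuse FFmpeg's own filter names.
_FILTER_NAMES = {"Resize": "scale", "RandomCrop": "crop"}

def capture_transform_specs(transforms):
    """Walk the user's transforms, yielding (name, params) pairs."""
    specs = []
    for t in transforms:
        kind = type(t).__name__
        if kind not in _FILTER_NAMES:
            raise ValueError(f"unsupported transform: {kind}")
        # In the real design, parameters would come from make_params();
        # here we just read the stand-in object's attributes.
        specs.append((_FILTER_NAMES[kind], vars(t)))
    return specs

specs = capture_transform_specs([Resize(640, 480), RandomCrop(32, 32)])
assert specs[0] == ("scale", {"width": 640, "height": 480})
```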
@@ -69,31 +71,31 @@ First let's consider implementing the "easy" case of Resize.
    c. A virtual member function that knows how to produce a string that
       can be passed to FFmpeg's filtergraph.
 5. We add a vector of such transforms to
-   SingleStreamDecoder::addVideoStream. We store the vector as a field in
-   SingleStreamDecoder.
-6. We need to reconcile FilterGraph, FiltersContext and this vector of
+   `SingleStreamDecoder::addVideoStream`. We store the vector as a field in
+   `SingleStreamDecoder`.
+6. We need to reconcile `FilterGraph`, `FiltersContext` and this vector of
    transforms. They are all related, but it's not clear to me what the
    exact relationship should be.
 7. The actual string we pass to FFmpeg's filtergraph comes from calling
    the virtual member function on each transform object.
 
 For the transforms that do not exist in TorchVision, we can build on the above:
 
-1. We define a new module, torchcodec.decoders.transforms.
+1. We define a new module, `torchcodec.decoders.transforms`.
 2. All transforms we define in there inherit from
-   torchvision.transforms.v2.Transform.
+   `torchvision.transforms.v2.Transform`.
 3. We implement the minimum needed to hook the new transforms into the
    machinery defined above.
 
 Open questions:
 
-1. Is torchcodec.transforms the right namespace?
+1. Is `torchcodec.transforms` the right namespace?
 2. For random transforms, when should the value be fixed?
-3. Transforms such as Resize don't actually implement a make_params()
+3. Transforms such as Resize don't actually implement a `make_params()`
    method. How does TorchVision get their parameters? How will TorchCodec?
 4. How do we communicate the transformation names and parameters to the C++
    layer? We need to support transforms with an arbitrary number of parameters.
-5. How does this generalize to AudioDecoder? Ideally we would be able to
+5. How does this generalize to `AudioDecoder`? Ideally we would be able to
    support TorchAudio's transforms in a similar way.
-6. What is the relationship between the C++ transform objects, FilterGraph
-   and FiltersContext?
+6. What is the relationship between the C++ transform objects, `FilterGraph`
+   and `FiltersContext`?
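Steps 4c and 7 (each transform object produces its own filtergraph fragment, and the decoder joins them) can be sketched in Python rather than C++. `scale` and `crop` are real FFmpeg filters with the argument orders shown; the class names and the `to_filter_string` method are hypothetical stand-ins for the proposed virtual member function.

```python
class ResizeTransform:
    """Stand-in for the proposed C++ transform class for Resize."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def to_filter_string(self):
        # FFmpeg's scale filter takes scale=width:height.
        return f"scale={self.width}:{self.height}"

class CropTransform:
    """Stand-in for the proposed C++ transform class for a fixed crop."""

    def __init__(self, width, height, x, y):
        self.width, self.height, self.x, self.y = width, height, x, y

    def to_filter_string(self):
        # FFmpeg's crop filter takes crop=out_w:out_h:x:y.
        return f"crop={self.width}:{self.height}:{self.x}:{self.y}"

def build_filtergraph_string(transforms):
    # Chained filters in a filtergraph are separated by commas.
    return ",".join(t.to_filter_string() for t in transforms)

graph = build_filtergraph_string(
    [ResizeTransform(640, 480), CropTransform(32, 32, 10, 20)]
)
assert graph == "scale=640:480,crop=32:32:10:20"
```

Keeping string production behind a per-transform virtual function, as step 4c proposes, means `FilterGraph`/`FiltersContext` never need to know transform-specific parameters; they only consume the joined string.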
