Commit 90f41ed

Another open question

1 parent 07f1d73 commit 90f41ed

File tree

1 file changed: +10 -7 lines changed

decoder_native_transforms.md

Lines changed: 10 additions & 7 deletions
@@ -25,12 +25,14 @@ What the user is asking for, in English:
2. For each decoded frame, I want each frame to pass through the following transforms:
   1. Add or remove frames as necessary to ensure a constant 30 frames per second.
   2. Resize the frame to 640x480. Use the algorithm that is TorchVision's default.
-   3. Inside the resized frame, crop the image to 32x32. The x and y coordinates are chosen randomly upon the creation of the Python `VideoDecoder` object. All decoded frames use the same values for x and y.
+   3. Inside the resized frame, crop the image to 32x32. The x and y coordinates are
+      chosen randomly upon the creation of the Python `VideoDecoder` object. All decoded
+      frames use the same values for x and y.

These three transforms are instructive, as they force us to consider:

1. How "easy" TorchVision transforms will be handled, where all values are
-   static. Resize is such an example.
+   static. `Resize` is such an example.
2. Transforms that involve randomness. The main question we need to resolve
   is when the random value is resolved. I think this comes down to: once
   upon Python `VideoDecoder` creation, or different for each frame decoded?
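
For a concrete sense of the request above, a hypothetical call site might look like the sketch below. It assumes the optional `transforms` parameter proposed further down in this document; the 30 FPS step is left as a comment because TorchVision has no frame-rate transform, while `v2.Resize` and `v2.RandomCrop` are existing TorchVision v2 transforms.

```python
# Hypothetical usage sketch; `transforms=` is the parameter proposed in this
# document, not an existing API.
from torchvision.transforms import v2
from torchcodec.decoders import VideoDecoder

decoder = VideoDecoder(
    "video.mp4",
    transforms=[
        # 1. (placeholder) add/drop frames to reach a constant 30 FPS --
        #    no TorchVision transform exists for this; it would be decoder-native.
        v2.Resize(size=(480, 640)),    # 2. static; TorchVision's default interpolation
        v2.RandomCrop(size=(32, 32)),  # 3. x and y fixed once, at decoder creation
    ],
)

frame = decoder[0]  # each decoded frame arrives already resized and cropped
```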
@@ -41,11 +43,11 @@ These three transforms are instructive, as they force us to consider:
   TorchVision. In particular, FPS is something that multiple users have
   asked for.

-First let's consider implementing the "easy" case of Resize.
+First let's consider implementing the "easy" case of `Resize`.

1. We add an optional `transforms` parameter to the initialization of
   `VideoDecoder`. It is a sequence of TorchVision Transforms.
-2. During VideoDecoder object creation, we walk the list, capturing two
+2. During `VideoDecoder` object creation, we walk the list, capturing two
   pieces of information:
   1. The transform name that the C++ layer will understand. (We will
      have to decide if we want to just use the FFmpeg filter name
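
A minimal sketch of the walk in step 2, assuming the naming question is resolved in favor of FFmpeg filter names (`scale`, `crop`), which the document leaves open; the helper below is hypothetical and only covers the two cases discussed so far.

```python
# Hypothetical sketch of the per-transform capture in step 2 above. The FFmpeg
# filter names ("scale", "crop") are one candidate naming scheme, not a decision.
import torch
from torchvision.transforms import v2

def to_filter_spec(transform, frame_height=480, frame_width=640):
    """Return a (name, params) pair describing what the C++ layer should apply."""
    if isinstance(transform, v2.Resize):
        height, width = transform.size  # static params; assumes an explicit (h, w)
        return "scale", {"w": width, "h": height}
    if isinstance(transform, v2.RandomCrop):
        crop_height, crop_width = transform.size
        # Random values are resolved once, here, during VideoDecoder creation.
        x = torch.randint(0, frame_width - crop_width + 1, ()).item()
        y = torch.randint(0, frame_height - crop_height + 1, ()).item()
        return "crop", {"w": crop_width, "h": crop_height, "x": x, "y": y}
    raise ValueError(f"Unsupported transform: {type(transform).__name__}")
```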
@@ -87,9 +89,10 @@ Open questions:
2. For random transforms, when should the value be fixed?
3. Transforms such as Resize don't actually implement a `make_params()`
   method. How does TorchVision get their parameters? How will TorchCodec?
-4. How do we communicate the transformation names and parameters to the C++
+4. Should the name at the bridge layer between Python and C++ just be the FFmpeg filter name?
+5. How do we communicate the transformation names and parameters to the C++
   layer? We need to support transforms with an arbitrary number of parameters.
-5. How does this generalize to `AudioDecoder`? Ideally we would be able to
+6. How does this generalize to `AudioDecoder`? Ideally we would be able to
   support TorchAudio's transforms in a similar way.
-6. What is the relationship between the C++ transform objects, `FilterGraph`
+7. What is the relationship between the C++ transform objects, `FilterGraph`
   and `FiltersContext`?
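
On questions 4 and 5: one possible bridge format, sketched below, is to serialize each (name, params) pair into an FFmpeg-style filter string and join them into a single filtergraph description, which naturally accommodates an arbitrary number of parameters. This is only an illustration of the trade-off, not a proposal the document commits to.

```python
# Sketch of one way to carry names plus an arbitrary number of parameters across
# the Python/C++ boundary: FFmpeg-style filter strings. Illustration only.
def to_filtergraph(filter_specs):
    """filter_specs: sequence of (name, params_dict) pairs."""
    filters = []
    for name, params in filter_specs:
        args = ":".join(f"{key}={value}" for key, value in params.items())
        filters.append(f"{name}={args}" if args else name)
    return ",".join(filters)

# The pipeline above would serialize to:
#   scale=w=640:h=480,crop=w=32:h=32:x=8:y=12
print(to_filtergraph([("scale", {"w": 640, "h": 480}),
                      ("crop", {"w": 32, "h": 32, "x": 8, "y": 12})]))
```

Passing structured (name, params) pairs through the op boundary instead of a single string is equally plausible; the string form is shown only because it maps directly onto FFmpeg's existing filter syntax.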
