decoder_native_transforms.md (10 additions, 7 deletions)
@@ -25,12 +25,14 @@ What the user is asking for, in English:

2. For each decoded frame, I want each frame to pass through the following transforms:
   1. Add or remove frames as necessary to ensure a constant 30 frames per second.
   2. Resize the frame to 640x480. Use the algorithm that is TorchVision's default.
   3. Inside the resized frame, crop the image to 32x32. The x and y coordinates are
      chosen randomly upon the creation of the Python `VideoDecoder` object. All decoded
      frames use the same values for x and y.
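As a rough sketch of what that request might look like with the `transforms` parameter proposed below (hedged: that parameter does not exist yet, the path is a placeholder, and TorchVision currently has no FPS-conversion transform, so that step only appears as a comment):

```python
from torchcodec.decoders import VideoDecoder
from torchvision.transforms import v2

# Hypothetical usage of the proposed `transforms` parameter.
# An FPS-conversion transform would also belong in this list, but TorchVision
# does not currently offer one, so it is omitted from the sketch.
decoder = VideoDecoder(
    "video.mp4",  # placeholder path
    transforms=[
        v2.Resize((480, 640)),  # (height, width); TorchVision's default interpolation
        v2.RandomCrop(32),      # 32x32 crop; the doc wants x/y fixed once at decoder creation
    ],
)
first_frame = decoder[0]  # comes back already resized and cropped
```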
These three transforms are instructive, as they force us to consider:

1. How "easy" TorchVision transforms will be handled, where all values are
   static. `Resize` is such an example.
2. Transforms that involve randomness. The main question we need to answer
   is when the random value is resolved. I think this comes down to: once
   upon Python `VideoDecoder` creation, or different for each frame decoded?
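To make the randomness question concrete, here is a minimal standalone sketch (not TorchCodec or TorchVision code; the class name is made up) of what "resolve once at `VideoDecoder` creation" would mean for the random crop, in contrast to TorchVision's `RandomCrop`, which samples a new location on every call:

```python
import random

import torch

class FixedRandomCrop:
    """Sketch: sample the crop origin once, then reuse it for every frame."""

    def __init__(self, size: int, frame_height: int, frame_width: int):
        self.size = size
        # Resolved exactly once, at construction time -- analogous to resolving
        # the random value when the Python VideoDecoder object is created.
        self.top = random.randint(0, frame_height - size)
        self.left = random.randint(0, frame_width - size)

    def __call__(self, frame: torch.Tensor) -> torch.Tensor:
        # frame is (C, H, W); every call crops the same window.
        return frame[..., self.top : self.top + self.size,
                     self.left : self.left + self.size]

# torchvision.transforms.v2.RandomCrop, by contrast, draws a new origin on each
# call, which would mean a different crop for every decoded frame.
```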
@@ -41,11 +43,11 @@ These three transforms are instructive, as they force us to consider:

   TorchVision. In particular, FPS is something that multiple users have
   asked for.

First let's consider implementing the "easy" case of `Resize`.

1. We add an optional `transforms` parameter to the initialization of
   `VideoDecoder`. It is a sequence of TorchVision Transforms.
2. During `VideoDecoder` object creation, we walk the list, capturing two
   pieces of information:
   1. The transform name that the C++ layer will understand. (We will
      have to decide if we want to just use the FFmpeg filter name
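As a sketch of step 2 (not actual TorchCodec code; it assumes the bridge name is simply the FFmpeg filter name, which is still an open question below, and it only handles `Resize` given an explicit `(height, width)` size):

```python
from torchvision.transforms import v2

def convert_transforms(transforms):
    """Sketch: map TorchVision transforms to (name, params) pairs for the C++ layer."""
    converted = []
    for transform in transforms:
        if isinstance(transform, v2.Resize):
            # Static values, so they can be captured at VideoDecoder creation.
            height, width = transform.size  # assumes size was given as (height, width)
            # "scale" is FFmpeg's resizing filter; whether the bridge should
            # reuse FFmpeg filter names is one of the open questions below.
            converted.append(("scale", {"width": width, "height": height}))
        else:
            raise ValueError(f"Unsupported transform: {transform}")
    return converted
```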
@@ -87,9 +89,10 @@ Open questions:

2. For random transforms, when should the value be fixed?
3. Transforms such as `Resize` don't actually implement a `make_params()`
   method. How does TorchVision get their parameters? How will TorchCodec?
4. Should the name at the bridge layer between Python and C++ just be the FFmpeg filter name?
5. How do we communicate the transformation names and parameters to the C++
   layer? We need to support transforms with an arbitrary number of parameters.
6. How does this generalize to `AudioDecoder`? Ideally we would be able to
   support TorchAudio's transforms in a similar way.
7. What is the relationship between the C++ transform objects, `FilterGraph`