Fixes wrong output dimensions in ConvTranspose1d#78
Open
ecobost wants to merge 3 commits intodescriptinc:mainfrom
Open
Fixes wrong output dimensions in ConvTranspose1d#78ecobost wants to merge 3 commits intodescriptinc:mainfrom
ecobost wants to merge 3 commits intodescriptinc:mainfrom
Conversation
Changes output_padding to deal better with odd strides.
plenty of changes since 1.0.0
|
@ecobost Hi thanks for your work. I come to this pr since I am fixing the same problem in stable-audio-tools repo. May I ask if you have observe any artifacts that will caused by using output_padding=0? |
Author
|
Hi @ArchiMickey, I haven't seen any artifacts but I wasn't necessarily looking out for them. I doubt there will be any, though. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Solves #42, #58 and #68: all related to incorrect computation of output shape in the ConvTraspose1d of the DecoderBlock (as also pointed out in PR #44).
When using stride > 1 in a conv operation the output dimensions are underdetermined and ConvTranspose1d needs extra info (the output_padding) to compute the expected output (see note in docs).
Given the construction constraints of the conv/deconv operations (namely, kernel_size=stride/2, padding=ceil(stride/2)), I figured out the right output_padding (so we always recover the same input dimensions) is:
with input_timesteps = timestestep dimension of the input to the original conv1d.
This PR sets output_padding=0 for even strides and 1 for odd strides. This will work in the vast majority of cases (including for all pretrained models) except when:
1: if stride is even and input_timesteps is not divisible by stride.
2: if stride is odd and input_timesteps+1 is divisible by stride.
Both of which are unlikely ( and case 1 would fail anyway even without this PR). At the very least, I believe the current setting is a more sensible default.