
Implementation Discrepancy Relative to Publication #548

@Yatagarasu50469

@JiahuiYu
Salutations,

Examining the original publication, it indicates that the inputs (I) to a gated convolution are convolved with two different weight sets (one for features, W_f, and another for gating, W_g), passed through different activation functions (sigmoid for gating, ELU for features), and then multiplied together elementwise. For the first gated convolution in the model, given cnum=48, W_g and W_f would each have 48 filters, and the output of the gated convolution would have 48 channels. This interpretation matches the publication as well as a comment made in a prior issue's discussion, #62 (comment), stating that gated convolution can be implemented as (Code A):

x1 = self.conv1(x)                # W_f: cnum filters
x2 = self.conv2(x)                # W_g: cnum filters
x = sigmoid(x2) * activation(x1)  # gated output: cnum channels
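
For concreteness, here is a minimal sketch of this reading of the paper in PyTorch; the class name GatedConv2dPaper and its constructor arguments are illustrative and are not taken from the repository:

import torch
import torch.nn as nn

class GatedConv2dPaper(nn.Module):
    """Gated convolution as described in the publication: two independent
    weight sets (W_f for features, W_g for gating), each with cnum filters."""
    def __init__(self, in_channels, cnum, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.conv_feature = nn.Conv2d(in_channels, cnum, kernel_size, stride, padding)  # W_f
        self.conv_gating = nn.Conv2d(in_channels, cnum, kernel_size, stride, padding)   # W_g
        self.activation = nn.ELU()

    def forward(self, x):
        features = self.activation(self.conv_feature(x))  # ELU(W_f * I)
        gates = torch.sigmoid(self.conv_gating(x))        # sigmoid(W_g * I)
        return features * gates                           # cnum output channels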

However, in the gen_conv definition provided, a single Conv2D is applied with cnum=48 filters, and its result is then split in half into two smaller tensors (x and y) of just 24 channels each; each half is activated, and the two halves are then multiplied together. As implemented (Code B), the gated convolution therefore produces only 24 output channels.

x, y = tf.split(x, 2, 3)  # split the cnum channels in half along the channel axis
x = activation(x)         # feature half: cnum/2 channels
y = tf.nn.sigmoid(y)      # gating half: cnum/2 channels
x = x * y                 # gated output: cnum/2 channels
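
For comparison, here is a minimal sketch of this split-based behaviour, again written in illustrative PyTorch rather than the repository's TensorFlow; the class name GatedConv2dSplit is hypothetical:

import torch
import torch.nn as nn

class GatedConv2dSplit(nn.Module):
    """Gated convolution in the split style of gen_conv: one convolution with
    cnum filters whose output is halved along the channel axis, so only
    cnum // 2 channels survive the gating."""
    def __init__(self, in_channels, cnum, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, cnum, kernel_size, stride, padding)
        self.activation = nn.ELU()

    def forward(self, x):
        x = self.conv(x)
        features, gates = torch.split(x, x.shape[1] // 2, dim=1)  # split along channels
        return self.activation(features) * torch.sigmoid(gates)   # cnum // 2 channels

With cnum=48, the first sketch produces 48 output channels while the second produces 24, which is exactly the discrepancy described above.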

Similarly, referencing the same prior issue discussion, the other code given there (Code C):

x = self.conv(x)
x1, x2 = split(x, 2) # split along channels 
x = sigmoid(x2) * activation(x1)

would not be equivalent to Code A unless "self.conv" has twice as many filters as "self.conv1" or "self.conv2" (assuming, of course, that self.conv1 and self.conv2 have the same number of filters); a shape check illustrating this condition is sketched below.
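
A quick, self-contained shape check (illustrative PyTorch with arbitrary input sizes) makes the condition concrete: Code A with 48 filters per branch and Code C with a 96-filter convolution both yield 48 gated channels, whereas Code C with only 48 filters yields 24. Note that this only matches channel counts; the branches remain independently parameterized.

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 5, 64, 64)  # arbitrary input: 5 channels, 64x64

# Code A: two separate convolutions with 48 filters each -> 48 gated channels.
conv1 = nn.Conv2d(5, 48, 3, padding=1)
conv2 = nn.Conv2d(5, 48, 3, padding=1)
print((torch.sigmoid(conv2(x)) * F.elu(conv1(x))).shape)  # torch.Size([1, 48, 64, 64])

# Code C with 48 filters: splitting leaves two 24-channel halves -> 24 gated channels.
conv_48 = nn.Conv2d(5, 48, 3, padding=1)
x1, x2 = torch.split(conv_48(x), 24, dim=1)
print((torch.sigmoid(x2) * F.elu(x1)).shape)  # torch.Size([1, 24, 64, 64])

# Code C with 96 filters: splitting leaves two 48-channel halves -> 48 gated channels,
# matching Code A's channel count only because the filter count was doubled.
conv_96 = nn.Conv2d(5, 96, 3, padding=1)
x1, x2 = torch.split(conv_96(x), 48, dim=1)
print((torch.sigmoid(x2) * F.elu(x1)).shape)  # torch.Size([1, 48, 64, 64])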

Any insight you can provide regarding this apparent discrepancy would be greatly appreciated.
Thank you in advance.
