.. _chapter_transposed_conv:

Transposed Convolution
======================

The layers we have introduced so far for convolutional neural networks,
including convolutional layers (:numref:`chapter_conv_layer`) and pooling
layers (:numref:`chapter_pooling`), often reduce the input width and height,
or keep them unchanged. Applications such as semantic segmentation
(:numref:`chapter_semantic_segmentation`) and generative adversarial networks
(:numref:`chapter_dcgan`), however, require predicting values for each pixel
and therefore need to increase the input width and height. Transposed
convolution, also named fractionally-strided convolution
:ref:`Dumoulin.Visin.2016` or deconvolution
:ref:`Long.Shelhamer.Darrell.2015`, serves this purpose.

.. code:: python

    from mxnet import nd, init
    from mxnet.gluon import nn
    import d2l

Basic 2D Transposed Convolution
-------------------------------

Let’s consider the basic case in which both the input and output channels are
1, with 0 padding and stride 1. :numref:`fig_trans_conv` illustrates how a
transposed convolution with a :math:`2\times 2` kernel is computed on a
:math:`2\times 2` input matrix.

.. _fig_trans_conv:

.. figure:: ../img/trans_conv.svg

   Transposed convolution layer with a :math:`2\times 2` kernel.

We can implement this operation given the kernel matrix :math:`K` and the
input matrix :math:`X`.

.. code:: python

    def trans_conv(X, K):
        h, w = K.shape
        Y = nd.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
        for i in range(X.shape[0]):
            for j in range(X.shape[1]):
                Y[i: i + h, j: j + w] += X[i, j] * K
        return Y

Remember that convolution computes its results by
``Y[i, j] = (X[i: i + h, j: j + w] * K).sum()`` (refer to ``corr2d`` in
:numref:`chapter_conv_layer`), which summarizes input values through the
kernel. The transposed convolution, in contrast, broadcasts input values
through the kernel, which results in a larger output shape. Verify the results
against :numref:`fig_trans_conv`.

.. code:: python

    X = nd.array([[0, 1], [2, 3]])
    K = nd.array([[0, 1], [2, 3]])
    trans_conv(X, K)

.. parsed-literal::
    :class: output

    [[ 0.  0.  1.]
     [ 0.  4.  6.]
     [ 4. 12.  9.]]

Alternatively, we can use ``nn.Conv2DTranspose`` to obtain the same results.
As with ``nn.Conv2D``, both the input and the kernel should be 4-D tensors.

.. code:: python

    X, K = X.reshape((1, 1, 2, 2)), K.reshape((1, 1, 2, 2))
    tconv = nn.Conv2DTranspose(1, kernel_size=2)
    tconv.initialize(init.Constant(K))
    tconv(X)

.. parsed-literal::
    :class: output

    [[[[ 0.  0.  1.]
       [ 0.  4.  6.]
       [ 4. 12.  9.]]]]

Padding, Strides, and Channels
------------------------------

In convolution, padding elements are applied to the input; in transposed
convolution, they are applied to the output instead. A :math:`1\times 1`
padding means we first compute the output as normal, then remove the
first/last rows and columns.

.. code:: python

    tconv = nn.Conv2DTranspose(1, kernel_size=2, padding=1)
    tconv.initialize(init.Constant(K))
    tconv(X)

.. parsed-literal::
    :class: output

    [[[[4.]]]]

Similarly, strides are applied to the output as well.

.. code:: python

    tconv = nn.Conv2DTranspose(1, kernel_size=2, strides=2)
    tconv.initialize(init.Constant(K))
    tconv(X)

.. parsed-literal::
    :class: output

    [[[[0. 0. 0. 1.]
       [0. 0. 2. 3.]
       [0. 2. 0. 3.]
       [4. 6. 6. 9.]]]]

The multi-channel extension of the transposed convolution is the same as for
the convolution. When the input has multiple channels, denoted by :math:`c_i`,
the transposed convolution assigns a :math:`k_h\times k_w` kernel matrix to
each input channel. If the output has a channel size of :math:`c_o`, then we
have a :math:`c_i\times k_h\times k_w` kernel for each output channel.
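For instance, a minimal sketch (with channel counts and input size chosen here
for illustration, not taken from the text above): a transposed convolution
with 3 input channels, 5 output channels, a :math:`2\times 2` kernel, and
default padding and stride maps a :math:`4\times 4` input to a
:math:`5\times 5` output.

.. code:: python

    # Hypothetical shapes chosen for illustration: 3 input channels,
    # 5 output channels, 2x2 kernel, default padding and stride.
    X = nd.random.uniform(shape=(1, 3, 4, 4))
    tconv = nn.Conv2DTranspose(5, kernel_size=2)
    tconv.initialize()
    # Each output channel uses one k_h x k_w kernel per input channel,
    # and the spatial size grows from 4x4 to 5x5.
    tconv(X).shape  # expected: (1, 5, 5, 5)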
As a result, if we feed :math:`X` into a convolutional layer :math:`f` to
compute :math:`Y=f(X)` and create a transposed convolution layer :math:`g`
with the same hyper-parameters as :math:`f`, except that the number of output
channels is set to the channel size of :math:`X`, then :math:`g(Y)` should
have the same shape as :math:`X`. Let’s verify this statement.

.. code:: python

    X = nd.random.uniform(shape=(1, 10, 16, 16))
    conv = nn.Conv2D(20, kernel_size=5, padding=2, strides=3)
    tconv = nn.Conv2DTranspose(10, kernel_size=5, padding=2, strides=3)
    conv.initialize()
    tconv.initialize()
    tconv(conv(X)).shape == X.shape

.. parsed-literal::
    :class: output

    True

Analogy to Matrix Transposition
-------------------------------

The transposed convolution takes its name from matrix transposition. In fact,
convolution operations can also be achieved by matrix multiplication. In the
example below, we define a :math:`3\times 3` input :math:`X` and a
:math:`2\times 2` kernel :math:`K`, and then use ``corr2d`` to compute the
convolution output.

.. code:: python

    X = nd.arange(9).reshape((3, 3))
    K = nd.array([[0, 1], [2, 3]])
    Y = d2l.corr2d(X, K)
    Y

.. parsed-literal::
    :class: output

    [[19. 25.]
     [37. 43.]]

Next, we rewrite the convolution kernel :math:`K` as a matrix :math:`W`. Its
shape will be :math:`(4, 9)`, where the :math:`i`-th row corresponds to
applying the kernel to the input to generate the :math:`i`-th output element.

.. code:: python

    def kernel2matrix(K):
        k, W = nd.zeros(5), nd.zeros((4, 9))
        k[:2], k[3:5] = K[0, :], K[1, :]
        W[0, :5], W[1, 1:6], W[2, 3:8], W[3, 4:] = k, k, k, k
        return W

    W = kernel2matrix(K)
    W

.. parsed-literal::
    :class: output

    [[0. 1. 0. 2. 3. 0. 0. 0. 0.]
     [0. 0. 1. 0. 2. 3. 0. 0. 0.]
     [0. 0. 0. 0. 1. 0. 2. 3. 0.]
     [0. 0. 0. 0. 0. 1. 0. 2. 3.]]

Then the convolution operator can be implemented by matrix multiplication with
proper reshaping.

.. code:: python

    Y == nd.dot(W, X.reshape((-1))).reshape((2, 2))

.. parsed-literal::
    :class: output

    [[1. 1.]
     [1. 1.]]

We can implement the transposed convolution as a matrix multiplication as well
by reusing ``kernel2matrix``. To reuse the generated :math:`W`, we construct a
:math:`2\times 2` input, so the corresponding weight matrix will have a shape
of :math:`(9, 4)`, which is :math:`W^T`. Let’s verify the results.

.. code:: python

    X = nd.array([[0, 1], [2, 3]])
    Y = trans_conv(X, K)
    Y == nd.dot(W.T, X.reshape((-1))).reshape((3, 3))

.. parsed-literal::
    :class: output

    [[1. 1. 1.]
     [1. 1. 1.]
     [1. 1. 1.]]

Summary
-------

- Compared to convolutions, which reduce inputs through kernels, transposed
  convolutions broadcast inputs, resulting in larger outputs.
- If a convolution layer reduces the input width and height by :math:`n_w` and
  :math:`n_h` times, respectively, then a transposed convolution layer with
  the same kernel sizes, padding, and strides will increase the input width
  and height by :math:`n_w` and :math:`n_h` times, respectively.
- We can implement convolution operations by matrix multiplication; the
  corresponding transposed convolutions can then be done by multiplication
  with the transposed matrix.

Exercises
---------

- Is it efficient to use matrix multiplication to implement convolution
  operations? Why?