.. _sec_basic_gan:

Generative Adversarial Networks
===============================


Throughout most of this book, we have talked about how to make
predictions. In some form or another, we used deep neural networks to
learn mappings from data examples to labels. This kind of learning is
called discriminative learning, as in, we’d like to be able to
discriminate between photos of cats and photos of dogs. Classifiers and
regressors are both examples of discriminative learning. And neural
networks trained by backpropagation have upended everything we thought
we knew about discriminative learning on large complicated datasets.
Classification accuracies on high-res images have gone from useless to
human-level (with some caveats) in just 5-6 years. We will spare you
another spiel about all the other discriminative tasks where deep neural
networks do astoundingly well.

But there is more to machine learning than just solving discriminative
tasks. For example, given a large dataset, without any labels, we might
want to learn a model that concisely captures the characteristics of
this data. Given such a model, we could sample synthetic data examples
that resemble the distribution of the training data. For example, given
a large corpus of photographs of faces, we might want to be able to
generate a new photorealistic image that looks like it might plausibly
have come from the same dataset. This kind of learning is called
generative modeling.

Until recently, we had no method that could synthesize novel
photorealistic images. But the success of deep neural networks for
discriminative learning opened up new possibilities. One big trend over
the last three years has been the application of discriminative deep
nets to overcome challenges in problems that we do not generally think
of as supervised learning problems. The recurrent neural network
language models are one example of using a discriminative network
(trained to predict the next character) that once trained can act as a
generative model.

In 2014, a breakthrough paper introduced Generative adversarial networks
(GANs) :cite:`Goodfellow.Pouget-Abadie.Mirza.ea.2014`, a clever new
way to leverage the power of discriminative models to get good
generative models. At their heart, GANs rely on the idea that a data
generator is good if we cannot tell fake data apart from real data. In
statistics, this is called a two-sample test - a test to answer the
question whether datasets :math:`X=\{x_1,\ldots, x_n\}` and
:math:`X'=\{x'_1,\ldots, x'_n\}` were drawn from the same distribution.
The main difference between most statistics papers and GANs is that the
latter use this idea in a constructive way. In other words, rather than
just training a model to say “hey, these two datasets do not look like
they came from the same distribution”, they use the `two-sample
test <https://en.wikipedia.org/wiki/Two-sample_hypothesis_testing>`__ to
provide training signals to a generative model. This allows us to
improve the data generator until it generates something that resembles
the real data. At the very least, it needs to fool the classifier even
if our classifier is a state of the art deep neural network.

.. _fig_gan:

.. figure:: ../img/gan.svg

   Generative Adversarial Networks


The GAN architecture is illustrated in :numref:`fig_gan`. As you can
see, there are two pieces in GAN architecture - first off, we need a
device (say, a deep network but it really could be anything, such as a
game rendering engine) that might potentially be able to generate data
that looks just like the real thing. If we are dealing with images, this
needs to generate images. If we are dealing with speech, it needs to
generate audio sequences, and so on. We call this the generator network.
The second component is the discriminator network. It attempts to
distinguish fake and real data from each other. Both networks are in
competition with each other. The generator network attempts to fool the
discriminator network. At that point, the discriminator network adapts
to the new fake data. This information, in turn is used to improve the
generator network, and so on.

The discriminator is a binary classifier to distinguish if the input
:math:`x` is real (from real data) or fake (from the generator).
Typically, the discriminator outputs a scalar prediction
:math:`o\in\mathbb R` for input :math:`\mathbf x`, such as using a fully
connected layer with hidden size 1, and then applies sigmoid function to
obtain the predicted probability :math:`D(\mathbf x) = 1/(1+e^{-o})`.
Assume the label :math:`y` for the true data is :math:`1` and :math:`0`
for the fake data. We train the discriminator to minimize the
cross-entropy loss, *i.e.*,

.. math::  \min_D \{ - y \log D(\mathbf x) - (1-y)\log(1-D(\mathbf x)) \},

For the generator, it first draws some parameter
:math:`\mathbf z\in\mathbb R^d` from a source of randomness, *e.g.*, a
normal distribution :math:`\mathbf z \sim \mathcal{N} (0, 1)`. We often
call :math:`\mathbf z` as the latent variable. It then applies a
function to generate :math:`\mathbf x'=G(\mathbf z)`. The goal of the
generator is to fool the discriminator to classify
:math:`\mathbf x'=G(\mathbf z)` as true data, *i.e.*, we want
:math:`D( G(\mathbf z)) \approx 1`. In other words, for a given
discriminator :math:`D`, we update the parameters of the generator
:math:`G` to maximize the cross-entropy loss when :math:`y=0`, *i.e.*,

.. math::  \max_G \{ - (1-y) \log(1-D(G(\mathbf z))) \} = \max_G \{ - \log(1-D(G(\mathbf z))) \}.

If the generator does a perfect job, then
:math:`D(\mathbf x')\approx 1`, so the above loss is near 0, which
results in the gradients that are too small to make good progress for
the discriminator. So commonly, we minimize the following loss:

.. math::  \min_G \{ - y \log(D(G(\mathbf z))) \} = \min_G \{ - \log(D(G(\mathbf z))) \}, 

which is just feeding :math:`\mathbf x'=G(\mathbf z)` into the
discriminator but giving label :math:`y=1`.

To sum up, :math:`D` and :math:`G` are playing a “minimax” game with the
comprehensive objective function:

.. math:: \min_D \max_G \{ -E_{x \sim \textrm{Data}} \log D(\mathbf x) - E_{z \sim \textrm{Noise}} \log(1 - D(G(\mathbf z))) \}.

Many of the GANs applications are in the context of images. As a
demonstration purpose, we are going to content ourselves with fitting a
much simpler distribution first. We will illustrate what happens if we
use GANs to build the world’s most inefficient estimator of parameters
for a Gaussian. Let’s get started.


.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#pytorch-1-0" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab is-active">pytorch</a><a href="#mxnet-1-1" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab ">mxnet</a><a href="#tensorflow-1-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

    <div class="mdl-tabs__panel is-active" id="pytorch-1-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    %matplotlib inline
    import torch
    from torch import nn
    from d2l import torch as d2l


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="mxnet-1-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    %matplotlib inline
    from mxnet import autograd, gluon, init, np, npx
    from mxnet.gluon import nn
    from d2l import mxnet as d2l
    
    npx.set_np()


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="tensorflow-1-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    import tensorflow as tf
    from d2l import tensorflow as d2l


.. raw:: html

    </div>


.. raw:: html

    </div>

Generate Some “Real” Data
-------------------------

Since this is going to be the world’s lamest example, we simply generate
data drawn from a Gaussian.


.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#pytorch-3-0" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab is-active">pytorch</a><a href="#mxnet-3-1" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab ">mxnet</a><a href="#tensorflow-3-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

    <div class="mdl-tabs__panel is-active" id="pytorch-3-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    X = torch.normal(0.0, 1, (1000, 2))
    A = torch.tensor([[1, 2], [-0.1, 0.5]])
    b = torch.tensor([1, 2])
    data = torch.matmul(X, A) + b


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="mxnet-3-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    X = np.random.normal(0.0, 1, (1000, 2))
    A = np.array([[1, 2], [-0.1, 0.5]])
    b = np.array([1, 2])
    data = np.dot(X, A) + b


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    [21:54:27] ../src/storage/storage.cc:196: Using Pooled (Naive) StorageManager for CPU


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="tensorflow-3-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    X = tf.random.normal((1000, 2), 0.0, 1)
    A = tf.constant([[1, 2], [-0.1, 0.5]])
    b = tf.constant([1, 2], tf.float32)
    data = tf.matmul(X, A) + b


.. raw:: html

    </div>


.. raw:: html

    </div>

Let’s see what we got. This should be a Gaussian shifted in some rather
arbitrary way with mean :math:`b` and covariance matrix :math:`A^TA`.


.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#pytorch-5-0" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab is-active">pytorch</a><a href="#mxnet-5-1" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab ">mxnet</a><a href="#tensorflow-5-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

    <div class="mdl-tabs__panel is-active" id="pytorch-5-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    d2l.set_figsize()
    d2l.plt.scatter(data[:100, (0)].detach().numpy(), data[:100, (1)].detach().numpy());
    print(f'The covariance matrix is\n{torch.matmul(A.T, A)}')


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    The covariance matrix is
    tensor([[1.0100, 1.9500],
            [1.9500, 4.2500]])


.. figure:: output_gan_bc32f8_27_1.svg


.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    batch_size = 8
    data_iter = d2l.load_array((data,), batch_size)


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="mxnet-5-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    d2l.set_figsize()
    d2l.plt.scatter(data[:100, (0)].asnumpy(), data[:100, (1)].asnumpy());
    print(f'The covariance matrix is\n{np.dot(A.T, A)}')


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    The covariance matrix is
    [[1.01 1.95]
     [1.95 4.25]]


.. figure:: output_gan_bc32f8_31_1.svg


.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    batch_size = 8
    data_iter = d2l.load_array((data,), batch_size)


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="tensorflow-5-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    d2l.set_figsize()
    d2l.plt.scatter(data[:100, 0].numpy(), data[:100, 1].numpy());
    print(f'The covariance matrix is\n{tf.matmul(A, A, transpose_a=True)}')


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    The covariance matrix is
    [[1.01 1.95]
     [1.95 4.25]]


.. figure:: output_gan_bc32f8_35_1.svg


.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    batch_size = 8
    data_iter = d2l.load_array((data,), batch_size)


.. raw:: html

    </div>


.. raw:: html

    </div>

Generator
---------

Our generator network will be the simplest network possible - a single
layer linear model. This is since we will be driving that linear network
with a Gaussian data generator. Hence, it literally only needs to learn
the parameters to fake things perfectly.


.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#pytorch-7-0" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab is-active">pytorch</a><a href="#mxnet-7-1" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab ">mxnet</a><a href="#tensorflow-7-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

    <div class="mdl-tabs__panel is-active" id="pytorch-7-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    net_G = nn.Sequential(nn.Linear(2, 2))


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="mxnet-7-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    net_G = nn.Sequential()
    net_G.add(nn.Dense(2))


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="tensorflow-7-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    net_G = tf.keras.layers.Dense(2)


.. raw:: html

    </div>


.. raw:: html

    </div>

Discriminator
-------------

For the discriminator we will be a bit more discriminating: we will use
an MLP with 3 layers to make things a bit more interesting.


.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#pytorch-9-0" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab is-active">pytorch</a><a href="#mxnet-9-1" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab ">mxnet</a><a href="#tensorflow-9-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

    <div class="mdl-tabs__panel is-active" id="pytorch-9-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    net_D = nn.Sequential(
        nn.Linear(2, 5), nn.Tanh(),
        nn.Linear(5, 3), nn.Tanh(),
        nn.Linear(3, 1))


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="mxnet-9-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    net_D = nn.Sequential()
    net_D.add(nn.Dense(5, activation='tanh'),
              nn.Dense(3, activation='tanh'),
              nn.Dense(1))


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="tensorflow-9-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    net_D = tf.keras.models.Sequential([
        tf.keras.layers.Dense(5, activation="tanh", input_shape=(2,)),
        tf.keras.layers.Dense(3, activation="tanh"),
        tf.keras.layers.Dense(1)
    ])


.. raw:: html

    </div>


.. raw:: html

    </div>

Training
--------

First we define a function to update the discriminator.


.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#pytorch-11-0" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab is-active">pytorch</a><a href="#mxnet-11-1" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab ">mxnet</a><a href="#tensorflow-11-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

    <div class="mdl-tabs__panel is-active" id="pytorch-11-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    #@save
    def update_D(X, Z, net_D, net_G, loss, trainer_D):
        """Update discriminator."""
        batch_size = X.shape[0]
        ones = torch.ones((batch_size,), device=X.device)
        zeros = torch.zeros((batch_size,), device=X.device)
        trainer_D.zero_grad()
        real_Y = net_D(X)
        fake_X = net_G(Z)
        # Do not need to compute gradient for `net_G`, detach it from
        # computing gradients.
        fake_Y = net_D(fake_X.detach())
        loss_D = (loss(real_Y, ones.reshape(real_Y.shape)) +
                  loss(fake_Y, zeros.reshape(fake_Y.shape))) / 2
        loss_D.backward()
        trainer_D.step()
        return loss_D


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="mxnet-11-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    #@save
    def update_D(X, Z, net_D, net_G, loss, trainer_D):
        """Update discriminator."""
        batch_size = X.shape[0]
        ones = np.ones((batch_size,), ctx=X.ctx)
        zeros = np.zeros((batch_size,), ctx=X.ctx)
        with autograd.record():
            real_Y = net_D(X)
            fake_X = net_G(Z)
            # Do not need to compute gradient for `net_G`, detach it from
            # computing gradients.
            fake_Y = net_D(fake_X.detach())
            loss_D = (loss(real_Y, ones) + loss(fake_Y, zeros)) / 2
        loss_D.backward()
        trainer_D.step(batch_size)
        return float(loss_D.sum())


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="tensorflow-11-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    #@save
    def update_D(X, Z, net_D, net_G, loss, optimizer_D):
        """Update discriminator."""
        batch_size = X.shape[0]
        ones = tf.ones((batch_size,)) # Labels corresponding to real data
        zeros = tf.zeros((batch_size,)) # Labels corresponding to fake data
        # Do not need to compute gradient for `net_G`, so it is outside GradientTape
        fake_X = net_G(Z)
        with tf.GradientTape() as tape:
            real_Y = net_D(X)
            fake_Y = net_D(fake_X)
            # We multiply the loss by batch_size to match PyTorch's BCEWithLogitsLoss
            loss_D = (loss(ones, tf.squeeze(real_Y)) + loss(
                zeros, tf.squeeze(fake_Y))) * batch_size / 2
        grads_D = tape.gradient(loss_D, net_D.trainable_variables)
        optimizer_D.apply_gradients(zip(grads_D, net_D.trainable_variables))
        return loss_D


.. raw:: html

    </div>


.. raw:: html

    </div>

The generator is updated similarly. Here we reuse the cross-entropy loss
but change the label of the fake data from :math:`0` to :math:`1`.


.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#pytorch-13-0" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab is-active">pytorch</a><a href="#mxnet-13-1" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab ">mxnet</a><a href="#tensorflow-13-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

    <div class="mdl-tabs__panel is-active" id="pytorch-13-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    #@save
    def update_G(Z, net_D, net_G, loss, trainer_G):
        """Update generator."""
        batch_size = Z.shape[0]
        ones = torch.ones((batch_size,), device=Z.device)
        trainer_G.zero_grad()
        # We could reuse `fake_X` from `update_D` to save computation
        fake_X = net_G(Z)
        # Recomputing `fake_Y` is needed since `net_D` is changed
        fake_Y = net_D(fake_X)
        loss_G = loss(fake_Y, ones.reshape(fake_Y.shape))
        loss_G.backward()
        trainer_G.step()
        return loss_G


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="mxnet-13-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    #@save
    def update_G(Z, net_D, net_G, loss, trainer_G):
        """Update generator."""
        batch_size = Z.shape[0]
        ones = np.ones((batch_size,), ctx=Z.ctx)
        with autograd.record():
            # We could reuse `fake_X` from `update_D` to save computation
            fake_X = net_G(Z)
            # Recomputing `fake_Y` is needed since `net_D` is changed
            fake_Y = net_D(fake_X)
            loss_G = loss(fake_Y, ones)
        loss_G.backward()
        trainer_G.step(batch_size)
        return float(loss_G.sum())


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="tensorflow-13-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    #@save
    def update_G(Z, net_D, net_G, loss, optimizer_G):
        """Update generator."""
        batch_size = Z.shape[0]
        ones = tf.ones((batch_size,))
        with tf.GradientTape() as tape:
            # We could reuse `fake_X` from `update_D` to save computation
            fake_X = net_G(Z)
            # Recomputing `fake_Y` is needed since `net_D` is changed
            fake_Y = net_D(fake_X)
            # We multiply the loss by batch_size to match PyTorch's BCEWithLogits loss
            loss_G = loss(ones, tf.squeeze(fake_Y)) * batch_size
        grads_G = tape.gradient(loss_G, net_G.trainable_variables)
        optimizer_G.apply_gradients(zip(grads_G, net_G.trainable_variables))
        return loss_G


.. raw:: html

    </div>


.. raw:: html

    </div>

Both the discriminator and the generator performs a binary logistic
regression with the cross-entropy loss. We use Adam to smooth the
training process. In each iteration, we first update the discriminator
and then the generator. We visualize both losses and generated examples.


.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#pytorch-15-0" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab is-active">pytorch</a><a href="#mxnet-15-1" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab ">mxnet</a><a href="#tensorflow-15-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

    <div class="mdl-tabs__panel is-active" id="pytorch-15-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    def train(net_D, net_G, data_iter, num_epochs, lr_D, lr_G, latent_dim, data):
        loss = nn.BCEWithLogitsLoss(reduction='sum')
        for w in net_D.parameters():
            nn.init.normal_(w, 0, 0.02)
        for w in net_G.parameters():
            nn.init.normal_(w, 0, 0.02)
        trainer_D = torch.optim.Adam(net_D.parameters(), lr=lr_D)
        trainer_G = torch.optim.Adam(net_G.parameters(), lr=lr_G)
        animator = d2l.Animator(xlabel='epoch', ylabel='loss',
                                xlim=[1, num_epochs], nrows=2, figsize=(5, 5),
                                legend=['discriminator', 'generator'])
        animator.fig.subplots_adjust(hspace=0.3)
        for epoch in range(num_epochs):
            # Train one epoch
            timer = d2l.Timer()
            metric = d2l.Accumulator(3)  # loss_D, loss_G, num_examples
            for (X,) in data_iter:
                batch_size = X.shape[0]
                Z = torch.normal(0, 1, size=(batch_size, latent_dim))
                metric.add(update_D(X, Z, net_D, net_G, loss, trainer_D),
                           update_G(Z, net_D, net_G, loss, trainer_G),
                           batch_size)
            # Visualize generated examples
            Z = torch.normal(0, 1, size=(100, latent_dim))
            fake_X = net_G(Z).detach().numpy()
            animator.axes[1].cla()
            animator.axes[1].scatter(data[:, 0], data[:, 1])
            animator.axes[1].scatter(fake_X[:, 0], fake_X[:, 1])
            animator.axes[1].legend(['real', 'generated'])
            # Show the losses
            loss_D, loss_G = metric[0]/metric[2], metric[1]/metric[2]
            animator.add(epoch + 1, (loss_D, loss_G))
        print(f'loss_D {loss_D:.3f}, loss_G {loss_G:.3f}, '
              f'{metric[2] / timer.stop():.1f} examples/sec')


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="mxnet-15-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    def train(net_D, net_G, data_iter, num_epochs, lr_D, lr_G, latent_dim, data):
        loss = gluon.loss.SigmoidBCELoss()
        net_D.initialize(init=init.Normal(0.02), force_reinit=True)
        net_G.initialize(init=init.Normal(0.02), force_reinit=True)
        trainer_D = gluon.Trainer(net_D.collect_params(),
                                  'adam', {'learning_rate': lr_D})
        trainer_G = gluon.Trainer(net_G.collect_params(),
                                  'adam', {'learning_rate': lr_G})
        animator = d2l.Animator(xlabel='epoch', ylabel='loss',
                                xlim=[1, num_epochs], nrows=2, figsize=(5, 5),
                                legend=['discriminator', 'generator'])
        animator.fig.subplots_adjust(hspace=0.3)
        for epoch in range(num_epochs):
            # Train one epoch
            timer = d2l.Timer()
            metric = d2l.Accumulator(3)  # loss_D, loss_G, num_examples
            for X in data_iter:
                batch_size = X.shape[0]
                Z = np.random.normal(0, 1, size=(batch_size, latent_dim))
                metric.add(update_D(X, Z, net_D, net_G, loss, trainer_D),
                           update_G(Z, net_D, net_G, loss, trainer_G),
                           batch_size)
            # Visualize generated examples
            Z = np.random.normal(0, 1, size=(100, latent_dim))
            fake_X = net_G(Z).asnumpy()
            animator.axes[1].cla()
            animator.axes[1].scatter(data[:, 0], data[:, 1])
            animator.axes[1].scatter(fake_X[:, 0], fake_X[:, 1])
            animator.axes[1].legend(['real', 'generated'])
            # Show the losses
            loss_D, loss_G = metric[0]/metric[2], metric[1]/metric[2]
            animator.add(epoch + 1, (loss_D, loss_G))
        print(f'loss_D {loss_D:.3f}, loss_G {loss_G:.3f}, '
              f'{metric[2] / timer.stop():.1f} examples/sec')


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="tensorflow-15-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    def train(net_D, net_G, data_iter, num_epochs, lr_D, lr_G, latent_dim, data):
        loss = tf.keras.losses.BinaryCrossentropy(
            from_logits=True, reduction=tf.keras.losses.Reduction.SUM)
        for w in net_D.trainable_variables:
            w.assign(tf.random.normal(mean=0, stddev=0.02, shape=w.shape))
        for w in net_G.trainable_variables:
            w.assign(tf.random.normal(mean=0, stddev=0.02, shape=w.shape))
        optimizer_D = tf.keras.optimizers.Adam(learning_rate=lr_D)
        optimizer_G = tf.keras.optimizers.Adam(learning_rate=lr_G)
        animator = d2l.Animator(
            xlabel="epoch", ylabel="loss", xlim=[1, num_epochs], nrows=2,
            figsize=(5, 5), legend=["discriminator", "generator"])
        animator.fig.subplots_adjust(hspace=0.3)
        for epoch in range(num_epochs):
            # Train one epoch
            timer = d2l.Timer()
            metric = d2l.Accumulator(3)  # loss_D, loss_G, num_examples
            for (X,) in data_iter:
                batch_size = X.shape[0]
                Z = tf.random.normal(
                    mean=0, stddev=1, shape=(batch_size, latent_dim))
                metric.add(update_D(X, Z, net_D, net_G, loss, optimizer_D),
                           update_G(Z, net_D, net_G, loss, optimizer_G),
                           batch_size)
            # Visualize generated examples
            Z = tf.random.normal(mean=0, stddev=1, shape=(100, latent_dim))
            fake_X = net_G(Z)
            animator.axes[1].cla()
            animator.axes[1].scatter(data[:, 0], data[:, 1])
            animator.axes[1].scatter(fake_X[:, 0], fake_X[:, 1])
            animator.axes[1].legend(["real", "generated"])
    
            # Show the losses
            loss_D, loss_G = metric[0] / metric[2], metric[1] / metric[2]
            animator.add(epoch + 1, (loss_D, loss_G))
    
        print(f'loss_D {loss_D:.3f}, loss_G {loss_G:.3f}, '
              f'{metric[2] / timer.stop():.1f} examples/sec')


.. raw:: html

    </div>


.. raw:: html

    </div>

Now we specify the hyperparameters to fit the Gaussian distribution.


.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#pytorch-17-0" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab is-active">pytorch</a><a href="#mxnet-17-1" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab ">mxnet</a><a href="#tensorflow-17-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

    <div class="mdl-tabs__panel is-active" id="pytorch-17-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    lr_D, lr_G, latent_dim, num_epochs = 0.05, 0.005, 2, 20
    train(net_D, net_G, data_iter, num_epochs, lr_D, lr_G,
          latent_dim, data[:100].detach().numpy())


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    loss_D 0.693, loss_G 0.693, 1020.0 examples/sec


.. figure:: output_gan_bc32f8_102_1.svg


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="mxnet-17-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    lr_D, lr_G, latent_dim, num_epochs = 0.05, 0.005, 2, 20
    train(net_D, net_G, data_iter, num_epochs, lr_D, lr_G,
          latent_dim, data[:100].asnumpy())


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    loss_D 0.693, loss_G 0.693, 265.2 examples/sec


.. figure:: output_gan_bc32f8_105_1.svg


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="tensorflow-17-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    lr_D, lr_G, latent_dim, num_epochs = 0.05, 0.005, 2, 20
    train(net_D, net_G, data_iter, num_epochs, lr_D, lr_G,
          latent_dim, data[:100].numpy())


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    loss_D 0.693, loss_G 0.693, 167.5 examples/sec


.. figure:: output_gan_bc32f8_108_1.svg


.. raw:: html

    </div>


.. raw:: html

    </div>

Summary
-------

-  Generative adversarial networks (GANs) composes of two deep networks,
   the generator and the discriminator.
-  The generator generates the image as much closer to the true image as
   possible to fool the discriminator, via maximizing the cross-entropy
   loss, *i.e.*, :math:`\max \log(D(\mathbf{x'}))`.
-  The discriminator tries to distinguish the generated images from the
   true images, via minimizing the cross-entropy loss, *i.e.*,
   :math:`\min - y \log D(\mathbf{x}) - (1-y)\log(1-D(\mathbf{x}))`.

Exercises
---------

-  Does an equilibrium exist where the generator wins, *i.e.* the
   discriminator ends up unable to distinguish the two distributions on
   finite samples?


.. raw:: html

    <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#pytorch-19-0" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab is-active">pytorch</a><a href="#mxnet-19-1" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab ">mxnet</a></div>


.. raw:: html

    <div class="mdl-tabs__panel is-active" id="pytorch-19-0">

`Discussions <https://discuss.d2l.ai/t/1082>`__


.. raw:: html

    </div>


.. raw:: html

    <div class="mdl-tabs__panel " id="mxnet-19-1">

`Discussions <https://discuss.d2l.ai/t/408>`__


.. raw:: html

    </div>


.. raw:: html

    </div>