.. _sec_hybridize:
Compilers and Interpreters
==========================
So far, this book has focused on imperative programming, which makes use
of statements such as ``print``, ``+``, and ``if`` to change a program’s
state. Consider the following example of a simple imperative program.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   def add(a, b):
       return a + b

   def fancy_func(a, b, c, d):
       e = add(a, b)
       f = add(c, d)
       g = add(e, f)
       return g

   print(fancy_func(1, 2, 3, 4))

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   10
Python is an *interpreted language*. When evaluating the
``fancy_func`` function above, it performs the operations making up the
function’s body *in sequence*. That is, it will evaluate
``e = add(a, b)`` and store the result as the variable ``e``, thereby
changing the program’s state. The next two statements ``f = add(c, d)``
and ``g = add(e, f)`` will be executed similarly, performing additions
and storing the results as variables. :numref:`fig_compute_graph`
illustrates the flow of data.
.. _fig_compute_graph:

.. figure:: ../img/computegraph.svg

   Data flow in an imperative program.
Although imperative programming is convenient, it may be inefficient.
For one thing, even though the ``add`` function is repeatedly called
throughout ``fancy_func``, Python will execute the three function calls
individually. If these are executed, say, on a GPU (or even on multiple
GPUs), the overhead arising from the Python interpreter can become
overwhelming. Moreover, Python will need to save the values of the
variables ``e`` and ``f`` until all the statements in ``fancy_func``
have been executed, since we do not know whether ``e`` and ``f`` will be
used by other parts of the program after the statements
``e = add(a, b)`` and ``f = add(c, d)`` are executed.
Symbolic Programming
--------------------
Consider the alternative, *symbolic programming*, where computation is
usually performed only once the process has been fully defined. This
strategy is used by multiple deep learning frameworks, including Theano
and TensorFlow (the latter has acquired imperative extensions). It
usually involves the following steps:
1. Define the operations to be executed.
2. Compile the operations into an executable program.
3. Provide the required inputs and call the compiled program for
   execution.
This allows for a significant amount of optimization. First, we can skip
the Python interpreter in many cases, thus removing a performance
bottleneck that can become significant on multiple fast GPUs paired with
a single Python thread on a CPU. Second, a compiler might optimize and
rewrite the above code into ``print((1 + 2) + (3 + 4))`` or even
``print(10)``. This is possible since a compiler gets to see the full
code before turning it into machine instructions. For instance, it can
release memory (or never allocate it) whenever a variable is no longer
needed. Or it can transform the code entirely into an equivalent piece.
To get a better idea, consider the following simulation of symbolic
programming (it is Python after all).
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   def add_():
       return '''
   def add(a, b):
       return a + b
   '''

   def fancy_func_():
       return '''
   def fancy_func(a, b, c, d):
       e = add(a, b)
       f = add(c, d)
       g = add(e, f)
       return g
   '''

   def evoke_():
       return add_() + fancy_func_() + 'print(fancy_func(1, 2, 3, 4))'

   prog = evoke_()
   print(prog)
   y = compile(prog, '', 'exec')
   exec(y)

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   def add(a, b):
       return a + b

   def fancy_func(a, b, c, d):
       e = add(a, b)
       f = add(c, d)
       g = add(e, f)
       return g
   print(fancy_func(1, 2, 3, 4))
   10
The differences between imperative (interpreted) programming and
symbolic programming are as follows:

- Imperative programming is easier. When imperative programming is used
  in Python, the majority of the code is straightforward and easy to
  write. It is also easier to debug imperative programming code, since
  it is easier to obtain and print all relevant intermediate variable
  values, or to use Python’s built-in debugging tools (see the short
  sketch after this list).
- Symbolic programming is more efficient and easier to port. Symbolic
  programming makes it easier to optimize the code during compilation,
  while also having the ability to port the program into a format
  independent of Python. This allows the program to be run in a
  non-Python environment, thus avoiding any potential performance
  issues related to the Python interpreter.
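As a small illustration of the debugging point, here is a sketch (our
own, not from the original text) showing how intermediate state in the
imperative ``fancy_func`` can be inspected with an ordinary ``print`` or
Python’s built-in debugger:

.. code:: python

   def add(a, b):
       return a + b

   def fancy_func(a, b, c, d):
       e = add(a, b)
       print('e =', e)                # intermediate values are plain variables
       # import pdb; pdb.set_trace()  # or drop into the built-in debugger here
       f = add(c, d)
       return add(e, f)

   print(fancy_func(1, 2, 3, 4))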
Hybrid Programming
------------------
Historically most deep learning frameworks chose between an imperative
or a symbolic approach. For example, Theano, TensorFlow (inspired by the
former), Keras, and CNTK formulate models symbolically. Conversely,
Chainer and PyTorch take an imperative approach. An imperative mode was
added to TensorFlow 2.0 and Keras in later revisions.
As mentioned above, PyTorch is based on imperative programming and uses
dynamic computation graphs. In an effort to leverage the portability and
efficiency of symbolic programming, developers considered whether it
would be possible to combine the benefits of both programming paradigms.
This led to torchscript, which lets users develop and debug using pure
imperative programming, while retaining the ability to convert most
programs into symbolic programs to be run when product-level computing
performance and deployment are required.
When designing Gluon, developers considered whether it would be possible
to combine the benefits of both programming paradigms. This led to a
hybrid model that lets users develop and debug with pure imperative
programming, while having the ability to convert most programs into
symbolic programs to be run when product-level computing performance and
deployment are required.
In practice this means that we build models using the ``HybridBlock`` or
``HybridSequential`` class. By default, either of them is executed in
the same way the ``Block`` or ``Sequential`` class is executed in
imperative programming. The ``HybridSequential`` class is a subclass of
``HybridBlock`` (just like ``Sequential`` subclasses ``Block``). When
the ``hybridize`` function is called, Gluon compiles the model into the
form used in symbolic programming. This allows one to optimize the
computation-intensive components without sacrificing anything in the way
a model is implemented. We will illustrate the benefits below, focusing
on sequential models and blocks.
The imperative programming paradigm is now the default in TensorFlow 2,
a welcome change for those new to the framework. However, the same
symbolic programming techniques and subsequent computational graphs
still exist in TensorFlow, and can be accessed via the easy-to-use
``tf.function`` decorator. This lets users define ordinary, intuitive
Python functions, then wrap them and compile them into computational
graphs automatically, using a feature the TensorFlow team refers to as
autograph.
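For instance, the following sketch (the function is our own, not from
the original text) decorates a plain Python function with
``tf.function``; when the function is traced, autograph rewrites the
``if`` statement into a graph-level conditional:

.. code:: python

   import tensorflow as tf

   @tf.function
   def clipped_sum(x):
       s = tf.reduce_sum(x)
       if s > 10.0:  # rewritten by autograph into a graph conditional
           s = tf.constant(10.0)
       return s

   print(clipped_sum(tf.ones(20)))  # capped at 10.0
   print(clipped_sum(tf.ones(3)))   # 3.0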
Hybridizing the ``Sequential`` Class
------------------------------------
The easiest way to get a feel for how hybridization works is to consider
deep networks with multiple layers. Conventionally the Python
interpreter will need to execute the code for all layers to generate an
instruction that can then be forwarded to a CPU or a GPU. For a single
(fast) computing device, this does not cause any major issues. On the
other hand, if we use an advanced 8-GPU server such as an AWS
P3dn.24xlarge instance, Python will struggle to keep all GPUs busy. The
single-threaded Python interpreter becomes the bottleneck here. Let’s
see how we can address this for significant parts of the code by
replacing ``Sequential`` with ``HybridSequential``. We begin by defining
a simple MLP.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   import torch
   from torch import nn
   from d2l import torch as d2l


   # Factory for networks
   def get_net():
       net = nn.Sequential(nn.Linear(512, 256),
                           nn.ReLU(),
                           nn.Linear(256, 128),
                           nn.ReLU(),
                           nn.Linear(128, 2))
       return net

   x = torch.randn(size=(1, 512))
   net = get_net()
   net(x)

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   tensor([[-0.1602,  0.0003]], grad_fn=<AddmmBackward0>)
By converting the model using the ``torch.jit.script`` function, we are able
to compile and optimize the computation in the MLP. The model’s
computation result remains unchanged.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   net = torch.jit.script(net)
   net(x)

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   tensor([[-0.1602,  0.0003]], grad_fn=<AddmmBackward0>)
This seems almost too good to be true: write the same code as before and
simply convert the model using ``torch.jit.script``. Once this happens
the network is optimized (we will benchmark the performance below).
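If you are curious what the compiled program looks like, the scripted
module exposes its TorchScript source through the ``code`` attribute; a
minimal sketch (output elided):

.. code:: python

   # Inspect the TorchScript source that `torch.jit.script` generated
   # for the forward computation of the scripted model.
   print(net.code)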
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   from mxnet import np, npx
   from mxnet.gluon import nn
   from d2l import mxnet as d2l

   npx.set_np()


   # Factory for networks
   def get_net():
       net = nn.HybridSequential()
       net.add(nn.Dense(256, activation='relu'),
               nn.Dense(128, activation='relu'),
               nn.Dense(2))
       net.initialize()
       return net

   x = np.random.normal(size=(1, 512))
   net = get_net()
   net(x)

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   [22:07:10] ../src/storage/storage.cc:196: Using Pooled (Naive) StorageManager for CPU

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   array([[ 0.16526175, -0.14005634]])
By calling the ``hybridize`` function, we are able to compile and
optimize the computation in the MLP. The model’s computation result
remains unchanged.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   net.hybridize()
   net(x)

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   array([[ 0.16526175, -0.14005634]])
This seems almost too good to be true: simply designate a block to be
``HybridSequential``, write the same code as before and invoke
``hybridize``. Once this happens the network is optimized (we will
benchmark the performance below). Unfortunately this does not work
magically for every layer. In particular, a layer will not be optimized
if it inherits from the ``Block`` class instead of the ``HybridBlock``
class.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   import tensorflow as tf
   from tensorflow.keras.layers import Dense
   from d2l import tensorflow as d2l


   # Factory for networks
   def get_net():
       net = tf.keras.Sequential()
       net.add(Dense(256, input_shape=(512,), activation="relu"))
       net.add(Dense(128, activation="relu"))
       net.add(Dense(2, activation="linear"))
       return net

   x = tf.random.normal([1, 512])
   net = get_net()
   net(x)
Formerly, all functions built in TensorFlow were built as a
computational graph, and therefore JIT compiled by default. However,
with the release of TensorFlow 2.X and ``EagerTensor``, this is no
longer the default behavior. We can re-enable this functionality with
``tf.function``. ``tf.function`` is more commonly used as a function
decorator; however, it is possible to call it directly as a normal
Python function, as shown below. The model’s computation result remains
unchanged.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   net = tf.function(net)
   net(x)
This seems almost too good to be true: write the same code as before and
simply convert the model using ``tf.function``. Once this happens the
network is built as a computational graph in TensorFlow’s MLIR
intermediate representation and is heavily optimized at the compiler
level for rapid execution (we will benchmark the performance below).
Explicitly adding the ``jit_compile=True`` flag to the
``tf.function()`` call enables XLA (Accelerated Linear Algebra)
functionality in TensorFlow. XLA can further optimize JIT compiled code
in certain instances. Graph-mode execution is enabled without this
explicit flag; however, XLA can make certain large linear algebra
operations (of the kind we see in deep learning applications) much
faster, particularly in a GPU environment.
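As a brief sketch of what this looks like (we do not benchmark this
variant here), XLA compilation can be requested when wrapping the model:

.. code:: python

   # Ask TensorFlow to additionally compile the traced graph with XLA.
   net_xla = tf.function(net, jit_compile=True)
   net_xla(x)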
Acceleration by Hybridization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To demonstrate the performance improvement gained by compilation we
compare the time needed to evaluate ``net(x)`` before and after
hybridization. Let’s first define a class to measure this time. It will
come in handy throughout the chapter as we set out to measure (and improve)
performance.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   #@save
   class Benchmark:
       """For measuring running time."""
       def __init__(self, description='Done'):
           self.description = description

       def __enter__(self):
           self.timer = d2l.Timer()
           return self

       def __exit__(self, *args):
           print(f'{self.description}: {self.timer.stop():.4f} sec')
Now we can invoke the network twice, once with and once without
torchscript.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   net = get_net()
   with Benchmark('Without torchscript'):
       for i in range(1000): net(x)

   net = torch.jit.script(net)
   with Benchmark('With torchscript'):
       for i in range(1000): net(x)

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   Without torchscript: 2.1447 sec
   With torchscript: 4.0545 sec
In principle, scripting an ``nn.Sequential`` instance with the
``torch.jit.script`` function allows its computation to be optimized
through symbolic programming. Note, however, that in the run above the
scripted network is actually slower: the benefit of torchscript depends
on the model, the workload, and the hardware.
Now we can invoke the network twice, once with and once without
hybridization.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   net = get_net()
   with Benchmark('Without hybridization'):
       for i in range(1000): net(x)
       npx.waitall()

   net.hybridize()
   with Benchmark('With hybridization'):
       for i in range(1000): net(x)
       npx.waitall()

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   Without hybridization: 0.7242 sec
   With hybridization: 0.4670 sec
As is observed in the above results, after a ``HybridSequential``
instance calls the ``hybridize`` function, computing performance is
improved through the use of symbolic programming.
Now we can invoke the network twice, once executed eagerly and once
with graph-mode execution.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   net = get_net()
   with Benchmark('Eager Mode'):
       for i in range(1000): net(x)

   net = tf.function(net)
   with Benchmark('Graph Mode'):
       for i in range(1000): net(x)

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   Eager Mode: 1.9038 sec
   Graph Mode: 0.4864 sec
As is observed in the above results, after a ``tf.keras.Sequential``
instance is wrapped with ``tf.function``, computing performance is
improved through the use of symbolic programming via graph-mode
execution in TensorFlow.
Serialization
~~~~~~~~~~~~~
One of the benefits of compiling the models is that we can serialize
(save) the model and its parameters to disk. This allows us to store a
model in a manner that is independent of the front-end language of
choice, and to deploy trained models to other devices where we can
easily use other front-end programming languages. At the same time the
code is often faster than what can be achieved in imperative
programming. Let’s see the ``save`` function in action.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   net.save('my_mlp')
   !ls -lh my_mlp*

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   -rw-r--r-- 1 ci ci 651K Aug 18 19:32 my_mlp
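The saved module can be restored with ``torch.jit.load``, either in
Python or in a C++ process via LibTorch; a minimal Python sketch:

.. code:: python

   # Reload the serialized TorchScript module; the original Python
   # class definition is no longer required.
   net_loaded = torch.jit.load('my_mlp')
   net_loaded(x)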
One of the benefits of compiling the models is that we can serialize
(save) the model and its parameters to disk. This allows us to store a
model in a manner that is independent of the front-end language of
choice, and to deploy trained models to other devices where we can
easily use other front-end programming languages. At the same time the
code is often faster than what can be achieved in imperative
programming. Let’s see the ``export`` function in action.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   net.export('my_mlp')
   !ls -lh my_mlp*

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   -rw-r--r-- 1 ci ci 643K Aug 18 22:07 my_mlp-0000.params
   -rw-r--r-- 1 ci ci 3.2K Aug 18 22:07 my_mlp-symbol.json
The model is decomposed into a (large binary) parameter file and a JSON
description of the program required to execute the model computation.
The files can be read by the other front-end languages supported by
MXNet, such as C++, R, Scala, and Perl. Let’s have a look at the
first few lines in the model description.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   !head my_mlp-symbol.json

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   {
     "nodes": [
       {
         "op": "null",
         "name": "data",
         "inputs": []
       },
       {
         "op": "null",
         "name": "dense3_weight",
Earlier, we demonstrated that, after calling the ``hybridize`` function,
the model is able to achieve superior computing performance and
portability. Note, though, that hybridization can affect model
flexibility, in particular in terms of control flow. Moreover, in
contrast to a ``Block`` instance, which implements the ``forward``
function, a ``HybridBlock`` instance needs to implement the
``hybrid_forward`` function.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   class HybridNet(nn.HybridBlock):
       def __init__(self, **kwargs):
           super(HybridNet, self).__init__(**kwargs)
           self.hidden = nn.Dense(4)
           self.output = nn.Dense(2)

       def hybrid_forward(self, F, x):
           print('module F: ', F)
           print('value x: ', x)
           x = F.npx.relu(self.hidden(x))
           print('result : ', x)
           return self.output(x)
The code above implements a simple network with 4 hidden units and 2
outputs. The ``hybrid_forward`` function takes an additional argument
``F``. This is needed since, depending on whether the code has been
hybridized or not, it will use a slightly different library (``ndarray``
or ``symbol``) for processing. Both modules perform very similar
functions and MXNet automatically determines the argument. To understand
what is going on we print the arguments as part of the function
invocation.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   net = HybridNet()
   net.initialize()
   x = np.random.normal(size=(1, 3))
   net(x)

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   module F:  <module 'mxnet.ndarray' ...>
   value x:  [[-0.6338663   0.40156594  0.46456942]]
   result :  [[0.01641375 0.         0.         0.        ]]

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   array([[0.00097611, 0.00019453]])
Repeating the forward computation will lead to the same output (we omit
details). Now let’s see what happens if we invoke the ``hybridize``
function.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   net.hybridize()
   net(x)

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   module F:  <module 'mxnet.symbol' ...>
   value x:  <_Symbol data>
   result :  <_Symbol hybridnet0_relu0>

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   array([[0.00097611, 0.00019453]])
Instead of using ``ndarray`` we now use the ``symbol`` module for ``F``.
Moreover, even though the input is of ``ndarray`` type, the data flowing
through the network is now converted to ``symbol`` type as part of the
compilation process. Repeating the function call leads to a surprising
outcome:
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   net(x)

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   array([[0.00097611, 0.00019453]])
This is quite different from what we saw previously. All print
statements, as defined in ``hybrid_forward``, are omitted. Indeed, after
hybridization the execution of ``net(x)`` no longer involves the Python
interpreter. This means that any spurious Python code (such as print
statements) is omitted in favor of a much more streamlined execution and
better performance: MXNet directly calls the C++ backend instead. Also
note that some functions are not supported in the ``symbol`` module
(e.g., ``asnumpy``), and in-place operations such as ``a += b`` and
``a[:] = a + b`` must be rewritten as ``a = a + b``, as sketched below.
Nonetheless, compilation of models is worth the effort whenever speed
matters. The benefit can range from small percentage points to more than
twice the speed, depending on the complexity of the model, the speed of
the CPU, and the speed and number of GPUs.
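As a minimal sketch of the in-place restriction (the class below is our
own, not from the original text), a hybridizable block should create new
variables instead of mutating existing ones:

.. code:: python

   class SafeNet(nn.HybridBlock):
       def __init__(self, **kwargs):
           super(SafeNet, self).__init__(**kwargs)
           self.hidden = nn.Dense(4)

       def hybrid_forward(self, F, x):
           h = F.npx.relu(self.hidden(x))
           # Not `h += h` or `h[:] = h + h`: in-place updates fail on symbols.
           h = h + h
           return h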
One of the benefits of compiling the models is that we can serialize
(save) the model and its parameters to disk. This allows us to store a
model in a manner that is independent of the front-end language of
choice, and to deploy trained models to other devices, use other
front-end programming languages, or execute a trained model on a server.
At the same time the code is often faster than what can be achieved in
imperative programming. The low-level API that allows us to save in
TensorFlow is ``tf.saved_model``. Let’s see ``tf.saved_model`` in
action.
.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   net = get_net()
   tf.saved_model.save(net, 'my_mlp')
   !ls -lh my_mlp*

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   INFO:tensorflow:Assets written to: my_mlp/assets
   total 72K
   drwxr-xr-x 2 ci ci   6 Aug 18 19:55 assets
   -rw-r--r-- 1 ci ci  56 Aug 18 19:55 fingerprint.pb
   -rw-r--r-- 1 ci ci 68K Aug 18 19:55 saved_model.pb
   drwxr-xr-x 2 ci ci  66 Aug 18 19:55 variables
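Sketching the reverse direction (assuming the default serving signature
was generated at save time), the model can be restored with
``tf.saved_model.load``:

.. code:: python

   # Restore the SavedModel; the original Python class is not required.
   net_loaded = tf.saved_model.load('my_mlp')
   infer = net_loaded.signatures['serving_default']
   print(infer(tf.constant(x)))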
Summary
-------
- Imperative programming makes it easy to design new models since it is
  possible to write code with control flow and the ability to use a
  large amount of the Python software ecosystem.
- Symbolic programming requires that we specify the program and compile
  it before executing it. The benefit is improved performance.
- MXNet is able to combine the advantages of both approaches as needed.
- Models constructed by the ``HybridSequential`` and ``HybridBlock``
  classes are able to convert imperative programs into symbolic
  programs by calling the ``hybridize`` function.
Exercises
---------
1. Add ``x.asnumpy()`` to the first line of the ``hybrid_forward``
   function of the ``HybridNet`` class in this section. Execute the code
   and observe the errors you encounter. Why do they happen?
2. What happens if we add control flow, i.e., the Python statements
   ``if`` and ``for``, in the ``hybrid_forward`` function?
3. Review the models that interest you in the previous chapters. Can you
   improve their computational performance by reimplementing them?