{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 0
   },
   "source": [
    "# Convolutions for Images\n",
    ":label:`sec_conv_layer`\n",
    "\n",
    "Now that we understand how convolutional layers work in theory,\n",
    "we are ready to see how they work in practice.\n",
    "Building on our motivation of convolutional neural networks\n",
    "as efficient architectures for exploring structure in image data,\n",
    "we stick with images as our running example.\n",
    "\n",
    "\n",
    "## The Cross-Correlation Operation\n",
    "\n",
    "Recall that strictly speaking, convolutional layers\n",
    "are a  misnomer, since the operations they express\n",
    "are more accurately described as cross-correlations.\n",
    "Based on our descriptions of convolutional layers in :numref:`sec_why-conv`,\n",
    "in such a layer, an input tensor\n",
    "and a kernel tensor are combined\n",
    "to produce an output tensor through a (**cross-correlation operation.**)\n",
    "\n",
    "Let us ignore channels for now and see how this works\n",
    "with two-dimensional data and hidden representations.\n",
    "In :numref:`fig_correlation`,\n",
    "the input is a two-dimensional tensor\n",
    "with a height of 3 and width of 3.\n",
    "We mark the shape of the tensor as $3 \\times 3$ or ($3$, $3$).\n",
    "The height and width of the kernel are both 2.\n",
    "The shape of the *kernel window* (or *convolution window*)\n",
    "is given by the height and width of the kernel\n",
    "(here it is $2 \\times 2$).\n",
    "\n",
    "![Two-dimensional cross-correlation operation. The shaded portions are the first output element as well as the input and kernel tensor elements used for the output computation: $0\\times0+1\\times1+3\\times2+4\\times3=19$.](../img/correlation.svg)\n",
    ":label:`fig_correlation`\n",
    "\n",
    "In the two-dimensional cross-correlation operation,\n",
    "we begin with the convolution window positioned\n",
    "at the upper-left corner of the input tensor\n",
    "and slide it across the input tensor,\n",
    "both from left to right and top to bottom.\n",
    "When the convolution window slides to a certain position,\n",
    "the input subtensor contained in that window\n",
    "and the kernel tensor are multiplied elementwise\n",
    "and the resulting tensor is summed up\n",
    "yielding a single scalar value.\n",
    "This result gives the value of the output tensor\n",
    "at the corresponding location.\n",
    "Here, the output tensor has a height of 2 and width of 2\n",
    "and the four elements are derived from\n",
    "the two-dimensional cross-correlation operation:\n",
    "\n",
    "$$\n",
    "0\\times0+1\\times1+3\\times2+4\\times3=19,\\\\\n",
    "1\\times0+2\\times1+4\\times2+5\\times3=25,\\\\\n",
    "3\\times0+4\\times1+6\\times2+7\\times3=37,\\\\\n",
    "4\\times0+5\\times1+7\\times2+8\\times3=43.\n",
    "$$\n",
    "\n",
    "Note that along each axis, the output size\n",
    "is slightly smaller than the input size.\n",
    "Because the kernel has width and height greater than one,\n",
    "we can only properly compute the cross-correlation\n",
    "for locations where the kernel fits wholly within the image,\n",
    "the output size is given by the input size $n_h \\times n_w$\n",
    "minus the size of the convolution kernel $k_h \\times k_w$\n",
    "via\n",
    "\n",
    "$$(n_h-k_h+1) \\times (n_w-k_w+1).$$\n",
    "\n",
    "This is the case since we need enough space\n",
    "to \"shift\" the convolution kernel across the image.\n",
    "Later we will see how to keep the size unchanged\n",
    "by padding the image with zeros around its boundary\n",
    "so that there is enough space to shift the kernel.\n",
    "Next, we implement this process in the `corr2d` function,\n",
    "which accepts an input tensor `X` and a kernel tensor `K`\n",
    "and returns an output tensor `Y`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "origin_pos": 4,
    "tab": [
     "tensorflow"
    ]
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "from d2l import tensorflow as d2l\n",
    "\n",
    "\n",
    "def corr2d(X, K):  #@save\n",
    "    \"\"\"Compute 2D cross-correlation.\"\"\"\n",
    "    h, w = K.shape\n",
    "    Y = tf.Variable(tf.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1)))\n",
    "    for i in range(Y.shape[0]):\n",
    "        for j in range(Y.shape[1]):\n",
    "            Y[i, j].assign(tf.reduce_sum(\n",
    "                X[i: i + h, j: j + w] * K))\n",
    "    return Y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 5
   },
   "source": [
    "We can construct the input tensor `X` and the kernel tensor `K`\n",
    "from :numref:`fig_correlation`\n",
    "to [**validate the output of the above implementation**]\n",
    "of the two-dimensional cross-correlation operation.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "origin_pos": 6,
    "tab": [
     "tensorflow"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=\n",
       "array([[19., 25.],\n",
       "       [37., 43.]], dtype=float32)>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = tf.constant([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])\n",
    "K = tf.constant([[0.0, 1.0], [2.0, 3.0]])\n",
    "corr2d(X, K)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 7
   },
   "source": [
    "## Convolutional Layers\n",
    "\n",
    "A convolutional layer cross-correlates the input and kernel\n",
    "and adds a scalar bias to produce an output.\n",
    "The two parameters of a convolutional layer\n",
    "are the kernel and the scalar bias.\n",
    "When training models based on convolutional layers,\n",
    "we typically initialize the kernels randomly,\n",
    "just as we would with a fully-connected layer.\n",
    "\n",
    "We are now ready to [**implement a two-dimensional convolutional layer**]\n",
    "based on the `corr2d` function defined above.\n",
    "In the `__init__` constructor function,\n",
    "we declare `weight` and `bias` as the two model parameters.\n",
    "The forward propagation function\n",
    "calls the `corr2d` function and adds the bias.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "origin_pos": 10,
    "tab": [
     "tensorflow"
    ]
   },
   "outputs": [],
   "source": [
    "class Conv2D(tf.keras.layers.Layer):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "\n",
    "    def build(self, kernel_size):\n",
    "        initializer = tf.random_normal_initializer()\n",
    "        self.weight = self.add_weight(name='w', shape=kernel_size,\n",
    "                                      initializer=initializer)\n",
    "        self.bias = self.add_weight(name='b', shape=(1, ),\n",
    "                                    initializer=initializer)\n",
    "\n",
    "    def call(self, inputs):\n",
    "        return corr2d(inputs, self.weight) + self.bias"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 11
   },
   "source": [
    "In\n",
    "$h \\times w$ convolution\n",
    "or a $h \\times w$ convolution kernel,\n",
    "the height and width of the convolution kernel are $h$ and $w$, respectively.\n",
    "We also refer to\n",
    "a convolutional layer with a $h \\times w$\n",
    "convolution kernel simply as a $h \\times w$ convolutional layer.\n",
    "\n",
    "\n",
    "## Object Edge Detection in Images\n",
    "\n",
    "Let us take a moment to parse [**a simple application of a convolutional layer:\n",
    "detecting the edge of an object in an image**]\n",
    "by finding the location of the pixel change.\n",
    "First, we construct an \"image\" of $6\\times 8$ pixels.\n",
    "The middle four columns are black (0) and the rest are white (1).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "origin_pos": 13,
    "tab": [
     "tensorflow"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Variable 'Variable:0' shape=(6, 8) dtype=float32, numpy=\n",
       "array([[1., 1., 0., 0., 0., 0., 1., 1.],\n",
       "       [1., 1., 0., 0., 0., 0., 1., 1.],\n",
       "       [1., 1., 0., 0., 0., 0., 1., 1.],\n",
       "       [1., 1., 0., 0., 0., 0., 1., 1.],\n",
       "       [1., 1., 0., 0., 0., 0., 1., 1.],\n",
       "       [1., 1., 0., 0., 0., 0., 1., 1.]], dtype=float32)>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = tf.Variable(tf.ones((6, 8)))\n",
    "X[:, 2:6].assign(tf.zeros(X[:, 2:6].shape))\n",
    "X"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 14
   },
   "source": [
    "Next, we construct a kernel `K` with a height of 1 and a width of 2.\n",
    "When we perform the cross-correlation operation with the input,\n",
    "if the horizontally adjacent elements are the same,\n",
    "the output is 0. Otherwise, the output is non-zero.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "origin_pos": 15,
    "tab": [
     "tensorflow"
    ]
   },
   "outputs": [],
   "source": [
    "K = tf.constant([[1.0, -1.0]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 16
   },
   "source": [
    "We are ready to perform the cross-correlation operation\n",
    "with arguments `X` (our input) and `K` (our kernel).\n",
    "As you can see, [**we detect 1 for the edge from white to black\n",
    "and -1 for the edge from black to white.**]\n",
    "All other outputs take value 0.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "origin_pos": 17,
    "tab": [
     "tensorflow"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Variable 'Variable:0' shape=(6, 7) dtype=float32, numpy=\n",
       "array([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],\n",
       "       [ 0.,  1.,  0.,  0.,  0., -1.,  0.],\n",
       "       [ 0.,  1.,  0.,  0.,  0., -1.,  0.],\n",
       "       [ 0.,  1.,  0.,  0.,  0., -1.,  0.],\n",
       "       [ 0.,  1.,  0.,  0.,  0., -1.,  0.],\n",
       "       [ 0.,  1.,  0.,  0.,  0., -1.,  0.]], dtype=float32)>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Y = corr2d(X, K)\n",
    "Y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 18
   },
   "source": [
    "We can now apply the kernel to the transposed image.\n",
    "As expected, it vanishes. [**The kernel `K` only detects vertical edges.**]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "origin_pos": 19,
    "tab": [
     "tensorflow"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Variable 'Variable:0' shape=(8, 5) dtype=float32, numpy=\n",
       "array([[0., 0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0., 0.]], dtype=float32)>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "corr2d(tf.transpose(X), K)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 20
   },
   "source": [
    "## Learning a Kernel\n",
    "\n",
    "Designing an edge detector by finite differences `[1, -1]` is neat\n",
    "if we know this is precisely what we are looking for.\n",
    "However, as we look at larger kernels,\n",
    "and consider successive layers of convolutions,\n",
    "it might be impossible to specify\n",
    "precisely what each filter should be doing manually.\n",
    "\n",
    "Now let us see whether we can [**learn the kernel that generated `Y` from `X`**]\n",
    "by looking at the input--output pairs only.\n",
    "We first construct a convolutional layer\n",
    "and initialize its kernel as a random tensor.\n",
    "Next, in each iteration, we will use the squared error\n",
    "to compare `Y` with the output of the convolutional layer.\n",
    "We can then calculate the gradient to update the kernel.\n",
    "For the sake of simplicity,\n",
    "in the following\n",
    "we use the built-in class\n",
    "for two-dimensional convolutional layers\n",
    "and ignore the bias.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "origin_pos": 23,
    "tab": [
     "tensorflow"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 2, loss 6.643\n",
      "epoch 4, loss 1.270\n",
      "epoch 6, loss 0.277\n",
      "epoch 8, loss 0.073\n",
      "epoch 10, loss 0.023\n"
     ]
    }
   ],
   "source": [
    "# Construct a two-dimensional convolutional layer with 1 output channel and a\n",
    "# kernel of shape (1, 2). For the sake of simplicity, we ignore the bias here\n",
    "conv2d = tf.keras.layers.Conv2D(1, (1, 2), use_bias=False)\n",
    "\n",
    "# The two-dimensional convolutional layer uses four-dimensional input and\n",
    "# output in the format of (example, height, width, channel), where the batch\n",
    "# size (number of examples in the batch) and the number of channels are both 1\n",
    "X = tf.reshape(X, (1, 6, 8, 1))\n",
    "Y = tf.reshape(Y, (1, 6, 7, 1))\n",
    "lr = 3e-2  # Learning rate\n",
    "\n",
    "Y_hat = conv2d(X)\n",
    "for i in range(10):\n",
    "    with tf.GradientTape(watch_accessed_variables=False) as g:\n",
    "        g.watch(conv2d.weights[0])\n",
    "        Y_hat = conv2d(X)\n",
    "        l = (abs(Y_hat - Y)) ** 2\n",
    "        # Update the kernel\n",
    "        update = tf.multiply(lr, g.gradient(l, conv2d.weights[0]))\n",
    "        weights = conv2d.get_weights()\n",
    "        weights[0] = conv2d.weights[0] - update\n",
    "        conv2d.set_weights(weights)\n",
    "        if (i + 1) % 2 == 0:\n",
    "            print(f'epoch {i + 1}, loss {tf.reduce_sum(l):.3f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 24
   },
   "source": [
    "Note that the error has dropped to a small value after 10 iterations. Now we will [**take a look at the kernel tensor we learned.**]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "origin_pos": 27,
    "tab": [
     "tensorflow"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[ 0.97336704, -1.0011578 ]], dtype=float32)>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf.reshape(conv2d.get_weights()[0], (1, 2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 28
   },
   "source": [
    "Indeed, the learned kernel tensor is remarkably close\n",
    "to the kernel tensor `K` we defined earlier.\n",
    "\n",
    "## Cross-Correlation and Convolution\n",
    "\n",
    "Recall our observation from :numref:`sec_why-conv` of the correspondence\n",
    "between the cross-correlation and convolution operations.\n",
    "Here let us continue to consider two-dimensional convolutional layers.\n",
    "What if such layers\n",
    "perform strict convolution operations\n",
    "as defined in :eqref:`eq_2d-conv-discrete`\n",
    "instead of cross-correlations?\n",
    "In order to obtain the output of the strict *convolution* operation, we only need to flip the two-dimensional kernel tensor both horizontally and vertically, and then perform the *cross-correlation* operation with the input tensor.\n",
    "\n",
    "It is noteworthy that since kernels are learned from data in deep learning,\n",
    "the outputs of convolutional layers remain unaffected\n",
    "no matter such layers\n",
    "perform\n",
    "either the strict convolution operations\n",
    "or the cross-correlation operations.\n",
    "\n",
    "To illustrate this, suppose that a convolutional layer performs *cross-correlation* and learns the kernel in :numref:`fig_correlation`, which is denoted as the matrix $\\mathbf{K}$ here.\n",
    "Assuming that other conditions remain unchanged,\n",
    "when this layer performs strict *convolution* instead,\n",
    "the learned kernel $\\mathbf{K}'$ will be the same as $\\mathbf{K}$\n",
    "after $\\mathbf{K}'$ is\n",
    "flipped both horizontally and vertically.\n",
    "That is to say,\n",
    "when the convolutional layer\n",
    "performs strict *convolution*\n",
    "for the input in :numref:`fig_correlation`\n",
    "and $\\mathbf{K}'$,\n",
    "the same output in :numref:`fig_correlation`\n",
    "(cross-correlation of the input and $\\mathbf{K}$)\n",
    "will be obtained.\n",
    "\n",
    "In keeping with standard terminology with deep learning literature,\n",
    "we will continue to refer to the cross-correlation operation\n",
    "as a convolution even though, strictly-speaking, it is slightly different.\n",
    "Besides,\n",
    "we use the term *element* to refer to\n",
    "an entry (or component) of any tensor representing a layer representation or a convolution kernel.\n",
    "\n",
    "\n",
    "## Feature Map and Receptive Field\n",
    "\n",
    "As described in :numref:`subsec_why-conv-channels`,\n",
    "the convolutional layer output in\n",
    ":numref:`fig_correlation`\n",
    "is sometimes called a *feature map*,\n",
    "as it can be regarded as\n",
    "the learned representations (features)\n",
    "in the spatial dimensions (e.g., width and height)\n",
    "to the subsequent layer.\n",
    "In CNNs,\n",
    "for any element $x$ of some layer,\n",
    "its *receptive field* refers to\n",
    "all the elements (from all the previous layers)\n",
    "that may affect the calculation of $x$\n",
    "during the forward propagation.\n",
    "Note that the receptive field\n",
    "may be larger than the actual size of the input.\n",
    "\n",
    "Let us continue to use :numref:`fig_correlation` to explain the receptive field.\n",
    "Given the $2 \\times 2$ convolution kernel,\n",
    "the receptive field of the shaded output element (of value $19$)\n",
    "is\n",
    "the four elements in the shaded portion of the input.\n",
    "Now let us denote the $2 \\times 2$\n",
    "output as $\\mathbf{Y}$\n",
    "and consider a deeper CNN\n",
    "with an additional $2 \\times 2$ convolutional layer that takes $\\mathbf{Y}$\n",
    "as its input, outputting\n",
    "a single element $z$.\n",
    "In this case,\n",
    "the receptive field of $z$\n",
    "on $\\mathbf{Y}$ includes all the four elements of $\\mathbf{Y}$,\n",
    "while\n",
    "the receptive field\n",
    "on the input includes all the nine input elements.\n",
    "Thus,\n",
    "when any element in a feature map\n",
    "needs a larger receptive field\n",
    "to detect input features over a broader area,\n",
    "we can build a deeper network.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "## Summary\n",
    "\n",
    "* The core computation of a two-dimensional convolutional layer is a two-dimensional cross-correlation operation. In its simplest form, this performs a cross-correlation operation on the two-dimensional input data and the kernel, and then adds a bias.\n",
    "* We can design a kernel to detect edges in images.\n",
    "* We can learn the kernel's parameters from data.\n",
    "* With kernels learned from data, the outputs of convolutional layers remain unaffected regardless of such layers' performed operations (either strict convolution or cross-correlation).\n",
    "* When any element in a feature map needs a larger receptive field to detect broader features on the input, a deeper network can be considered.\n",
    "\n",
    "\n",
    "## Exercises\n",
    "\n",
    "1. Construct an image `X` with diagonal edges.\n",
    "    1. What happens if you apply the kernel `K` in this section to it?\n",
    "    1. What happens if you transpose `X`?\n",
    "    1. What happens if you transpose `K`?\n",
    "1. When you try to automatically find the gradient for the `Conv2D` class we created, what kind of error message do you see?\n",
    "1. How do you represent a cross-correlation operation as a matrix multiplication by changing the input and kernel tensors?\n",
    "1. Design some kernels manually.\n",
    "    1. What is the form of a kernel for the second derivative?\n",
    "    1. What is the kernel for an integral?\n",
    "    1. What is the minimum size of a kernel to obtain a derivative of degree $d$?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 31,
    "tab": [
     "tensorflow"
    ]
   },
   "source": [
    "[Discussions](https://discuss.d2l.ai/t/271)\n"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}