{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 0
   },
   "source": [
    "# Concise Implementation of Linear Regression\n",
    ":label:`sec_linear_concise`\n",
    "\n",
    "Broad and intense interest in deep learning for the past several years\n",
    "has inspired companies, academics, and hobbyists\n",
    "to develop a variety of mature open source frameworks\n",
    "for automating the repetitive work of implementing\n",
    "gradient-based learning algorithms.\n",
    "In :numref:`sec_linear_scratch`, we relied only on\n",
    "(i) tensors for data storage and linear algebra;\n",
    "and (ii) auto differentiation for calculating gradients.\n",
    "In practice, because data iterators, loss functions, optimizers,\n",
    "and neural network layers\n",
    "are so common, modern libraries implement these components for us as well.\n",
    "\n",
    "In this section, (**we will show you how to implement\n",
    "the linear regression model**) from :numref:`sec_linear_scratch`\n",
    "(**concisely by using high-level APIs**) of deep learning frameworks.\n",
    "\n",
    "\n",
    "## Generating the Dataset\n",
    "\n",
    "To start, we will generate the same dataset as in :numref:`sec_linear_scratch`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "origin_pos": 1,
    "tab": [
     "mxnet"
    ]
   },
   "outputs": [],
   "source": [
    "from mxnet import autograd, gluon, np, npx\n",
    "from d2l import mxnet as d2l\n",
    "\n",
    "npx.set_np()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "origin_pos": 4,
    "tab": [
     "mxnet"
    ]
   },
   "outputs": [],
   "source": [
    "true_w = np.array([2, -3.4])\n",
    "true_b = 4.2\n",
    "features, labels = d2l.synthetic_data(true_w, true_b, 1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 5
   },
   "source": [
    "## Reading the Dataset\n",
    "\n",
    "Rather than rolling our own iterator,\n",
    "we can [**call upon the existing API in a framework to read data.**]\n",
    "We pass in `features` and `labels` as arguments and specify `batch_size`\n",
    "when instantiating a data iterator object.\n",
    "Besides, the boolean value `is_train`\n",
    "indicates whether or not\n",
    "we want the data iterator object to shuffle the data\n",
    "on each epoch (pass through the dataset).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "origin_pos": 6,
    "tab": [
     "mxnet"
    ]
   },
   "outputs": [],
   "source": [
    "def load_array(data_arrays, batch_size, is_train=True):  #@save\n",
    "    \"\"\"Construct a Gluon data iterator.\"\"\"\n",
    "    dataset = gluon.data.ArrayDataset(*data_arrays)\n",
    "    return gluon.data.DataLoader(dataset, batch_size, shuffle=is_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "origin_pos": 9,
    "tab": [
     "mxnet"
    ]
   },
   "outputs": [],
   "source": [
    "batch_size = 10\n",
    "data_iter = load_array((features, labels), batch_size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 10
   },
   "source": [
    "Now we can use `data_iter` in much the same way as we called\n",
    "the `data_iter` function in :numref:`sec_linear_scratch`.\n",
    "To verify that it is working, we can read and print\n",
    "the first minibatch of examples.\n",
    "Comparing with :numref:`sec_linear_scratch`,\n",
    "here we use `iter` to construct a Python iterator and use `next` to obtain the first item from the iterator.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "origin_pos": 11,
    "tab": [
     "mxnet"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[array([[-0.5143139 ,  0.1389543 ],\n",
       "        [ 0.9472738 , -0.72838444],\n",
       "        [-0.3931961 ,  0.81664836],\n",
       "        [-1.0394957 ,  0.40891457],\n",
       "        [ 1.8883077 , -3.072945  ],\n",
       "        [ 0.05601164,  1.362304  ],\n",
       "        [-2.0417614 , -0.1588216 ],\n",
       "        [-0.37867823, -0.8062544 ],\n",
       "        [ 0.03905592, -1.9712304 ],\n",
       "        [-0.8365324 , -0.5932887 ]]),\n",
       " array([[ 2.7082257 ],\n",
       "        [ 8.566592  ],\n",
       "        [ 0.63570935],\n",
       "        [ 0.7104962 ],\n",
       "        [18.425564  ],\n",
       "        [-0.309327  ],\n",
       "        [ 0.67318946],\n",
       "        [ 6.1864944 ],\n",
       "        [10.995712  ],\n",
       "        [ 4.5396338 ]])]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "next(iter(data_iter))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 12
   },
   "source": [
    "## Defining the Model\n",
    "\n",
    "When we implemented linear regression from scratch\n",
    "in :numref:`sec_linear_scratch`,\n",
    "we defined our model parameters explicitly\n",
    "and coded up the calculations to produce output\n",
    "using basic linear algebra operations.\n",
    "You *should* know how to do this.\n",
    "But once your models get more complex,\n",
    "and once you have to do this nearly every day,\n",
    "you will be glad for the assistance.\n",
    "The situation is similar to coding up your own blog from scratch.\n",
    "Doing it once or twice is rewarding and instructive,\n",
    "but you would be a lousy web developer\n",
    "if every time you needed a blog you spent a month\n",
    "reinventing the wheel.\n",
    "\n",
    "For standard operations, we can [**use a framework's predefined layers,**]\n",
    "which allow us to focus especially\n",
    "on the layers used to construct the model\n",
    "rather than having to focus on the implementation.\n",
    "We will first define a model variable `net`,\n",
    "which will refer to an instance of the `Sequential` class.\n",
    "The `Sequential` class defines a container\n",
    "for several layers that will be chained together.\n",
    "Given input data, a `Sequential` instance passes it through\n",
    "the first layer, in turn passing the output\n",
    "as the second layer's input and so forth.\n",
    "In the following example, our model consists of only one layer,\n",
    "so we do not really need `Sequential`.\n",
    "But since nearly all of our future models\n",
    "will involve multiple layers,\n",
    "we will use it anyway just to familiarize you\n",
    "with the most standard workflow.\n",
    "\n",
    "Recall the architecture of a single-layer network as shown in :numref:`fig_single_neuron`.\n",
    "The layer is said to be *fully-connected*\n",
    "because each of its inputs is connected to each of its outputs\n",
    "by means of a matrix-vector multiplication.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 13,
    "tab": [
     "mxnet"
    ]
   },
   "source": [
    "In Gluon, the fully-connected layer is defined in the `Dense` class.\n",
    "Since we only want to generate a single scalar output,\n",
    "we set that number to 1.\n",
    "\n",
    "It is worth noting that, for convenience,\n",
    "Gluon does not require us to specify\n",
    "the input shape for each layer.\n",
    "So here, we do not need to tell Gluon\n",
    "how many inputs go into this linear layer.\n",
    "When we first try to pass data through our model,\n",
    "e.g., when we execute `net(X)` later,\n",
    "Gluon will automatically infer the number of inputs to each layer.\n",
    "We will describe how this works in more detail later.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "origin_pos": 16,
    "tab": [
     "mxnet"
    ]
   },
   "outputs": [],
   "source": [
    "# `nn` is an abbreviation for neural networks\n",
    "from mxnet.gluon import nn\n",
    "\n",
    "net = nn.Sequential()\n",
    "net.add(nn.Dense(1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 19
   },
   "source": [
    "## Initializing Model Parameters\n",
    "\n",
    "Before using `net`, we need to (**initialize the model parameters,**)\n",
    "such as the weights and bias in the linear regression model.\n",
    "Deep learning frameworks often have a predefined way to initialize the parameters.\n",
    "Here we specify that each weight parameter\n",
    "should be randomly sampled from a normal distribution\n",
    "with mean 0 and standard deviation 0.01.\n",
    "The bias parameter will be initialized to zero.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 20,
    "tab": [
     "mxnet"
    ]
   },
   "source": [
    "We will import the `initializer` module from MXNet.\n",
    "This module provides various methods for model parameter initialization.\n",
    "Gluon makes `init` available as a shortcut (abbreviation)\n",
    "to access the `initializer` package.\n",
    "We only specify how to initialize the weight by calling `init.Normal(sigma=0.01)`.\n",
    "Bias parameters are initialized to zero by default.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "origin_pos": 23,
    "tab": [
     "mxnet"
    ]
   },
   "outputs": [],
   "source": [
    "from mxnet import init\n",
    "\n",
    "net.initialize(init.Normal(sigma=0.01))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 26,
    "tab": [
     "mxnet"
    ]
   },
   "source": [
    "The code above may look straightforward but you should note\n",
    "that something strange is happening here.\n",
    "We are initializing parameters for a network\n",
    "even though Gluon does not yet know\n",
    "how many dimensions the input will have!\n",
    "It might be 2 as in our example or it might be 2000.\n",
    "Gluon lets us get away with this because behind the scene,\n",
    "the initialization is actually *deferred*.\n",
    "The real initialization will take place only\n",
    "when we for the first time attempt to pass data through the network.\n",
    "Just be careful to remember that since the parameters\n",
    "have not been initialized yet,\n",
    "we cannot access or manipulate them.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 29
   },
   "source": [
    "## Defining the Loss Function\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 30,
    "tab": [
     "mxnet"
    ]
   },
   "source": [
    "In Gluon, the `loss` module defines various loss functions.\n",
    "In this example, we will use the Gluon\n",
    "implementation of squared loss (`L2Loss`).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "origin_pos": 33,
    "tab": [
     "mxnet"
    ]
   },
   "outputs": [],
   "source": [
    "loss = gluon.loss.L2Loss()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 36
   },
   "source": [
    "## Defining the Optimization Algorithm\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 37,
    "tab": [
     "mxnet"
    ]
   },
   "source": [
    "Minibatch stochastic gradient descent is a standard tool\n",
    "for optimizing neural networks\n",
    "and thus Gluon supports it alongside a number of\n",
    "variations on this algorithm through its `Trainer` class.\n",
    "When we instantiate `Trainer`,\n",
    "we will specify the parameters to optimize over\n",
    "(obtainable from our model `net` via `net.collect_params()`),\n",
    "the optimization algorithm we wish to use (`sgd`),\n",
    "and a dictionary of hyperparameters\n",
    "required by our optimization algorithm.\n",
    "Minibatch stochastic gradient descent just requires that\n",
    "we set the value `learning_rate`, which is set to 0.03 here.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "origin_pos": 40,
    "tab": [
     "mxnet"
    ]
   },
   "outputs": [],
   "source": [
    "from mxnet import gluon\n",
    "\n",
    "trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.03})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 43
   },
   "source": [
    "## Training\n",
    "\n",
    "You might have noticed that expressing our model through\n",
    "high-level APIs of a deep learning framework\n",
    "requires comparatively few lines of code.\n",
    "We did not have to individually allocate parameters,\n",
    "define our loss function, or implement minibatch stochastic gradient descent.\n",
    "Once we start working with much more complex models,\n",
    "advantages of high-level APIs will grow considerably.\n",
    "However, once we have all the basic pieces in place,\n",
    "[**the training loop itself is strikingly similar\n",
    "to what we did when implementing everything from scratch.**]\n",
    "\n",
    "To refresh your memory: for some number of epochs,\n",
    "we will make a complete pass over the dataset (`train_data`),\n",
    "iteratively grabbing one minibatch of inputs\n",
    "and the corresponding ground-truth labels.\n",
    "For each minibatch, we go through the following ritual:\n",
    "\n",
    "* Generate predictions by calling `net(X)` and calculate the loss `l` (the forward propagation).\n",
    "* Calculate gradients by running the backpropagation.\n",
    "* Update the model parameters by invoking our optimizer.\n",
    "\n",
    "For good measure, we compute the loss after each epoch and print it to monitor progress.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "origin_pos": 44,
    "tab": [
     "mxnet"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[10:52:56] src/base.cc:49: GPU context requested, but no GPUs found.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 1, loss 0.024935\n",
      "epoch 2, loss 0.000090\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 3, loss 0.000051\n"
     ]
    }
   ],
   "source": [
    "num_epochs = 3\n",
    "for epoch in range(num_epochs):\n",
    "    for X, y in data_iter:\n",
    "        with autograd.record():\n",
    "            l = loss(net(X), y)\n",
    "        l.backward()\n",
    "        trainer.step(batch_size)\n",
    "    l = loss(net(features), labels)\n",
    "    print(f'epoch {epoch + 1}, loss {l.mean().asnumpy():f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 47
   },
   "source": [
    "Below, we [**compare the model parameters learned by training on finite data\n",
    "and the actual parameters**] that generated our dataset.\n",
    "To access parameters,\n",
    "we first access the layer that we need from `net`\n",
    "and then access that layer's weights and bias.\n",
    "As in our from-scratch implementation,\n",
    "note that our estimated parameters are\n",
    "close to their ground-truth counterparts.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "origin_pos": 48,
    "tab": [
     "mxnet"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "error in estimating w: [ 4.9364567e-04 -1.5735626e-05]\n",
      "error in estimating b: [0.00065851]\n"
     ]
    }
   ],
   "source": [
    "w = net[0].weight.data()\n",
    "print(f'error in estimating w: {true_w - w.reshape(true_w.shape)}')\n",
    "b = net[0].bias.data()\n",
    "print(f'error in estimating b: {true_b - b}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 51
   },
   "source": [
    "## Summary\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 52,
    "tab": [
     "mxnet"
    ]
   },
   "source": [
    "* Using Gluon, we can implement models much more concisely.\n",
    "* In Gluon, the `data` module provides tools for data processing, the `nn` module defines a large number of neural network layers, and the `loss` module defines many common loss functions.\n",
    "* MXNet's module `initializer` provides various methods for model parameter initialization.\n",
    "* Dimensionality and storage are automatically inferred, but be careful not to attempt to access parameters before they have been initialized.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 55
   },
   "source": [
    "## Exercises\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "origin_pos": 56,
    "tab": [
     "mxnet"
    ]
   },
   "source": [
    "1. If we replace `l = loss(output, y)` with `l = loss(output, y).mean()`, we need to change `trainer.step(batch_size)` to `trainer.step(1)` for the code to behave identically. Why?\n",
    "1. Review the MXNet documentation to see what loss functions and initialization methods are provided in the modules `gluon.loss` and `init`. Replace the loss by Huber's loss.\n",
    "1. How do you access the gradient of `dense.weight`?\n",
    "\n",
    "[Discussions](https://discuss.d2l.ai/t/44)\n"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}