Gluon Implementation of Linear Regression¶
With the development of deep learning frameworks, it has become increasingly easy to develop deep learning applications. In practice, we can usually implement the same model much more concisely than how we introduced it in the previous section. In this section, we will introduce how to use the Gluon interface provided by MXNet.
Generating Data Sets¶
We will generate the same data set as that used in the previous section.
In [1]:
from mxnet import autograd, nd
num_inputs = 2
num_examples = 1000
true_w = nd.array([2, -3.4])
true_b = 4.2
features = nd.random.normal(scale=1, shape=(num_examples, num_inputs))
labels = nd.dot(features, true_w) + true_b
labels += nd.random.normal(scale=0.01, shape=labels.shape)
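As a quick sanity check (an illustrative sketch, not part of the original notebook), we can peek at the first generated example; its label should be roughly true_w[0] * features[0][0] + true_w[1] * features[0][1] + true_b, up to the small Gaussian noise.
# Sketch: inspect one example; its label should match the linear model
# true_w . x + true_b up to the added noise.
print(features[0], labels[0])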
Reading Data¶
Gluon provides the data module to read data. Since data is often used as a variable name, we will replace it with the pseudonym gdata (adding the first letter of Gluon) when referring to the imported data module. In each iteration, we will randomly read a mini-batch containing 10 data instances.
In [2]:
from mxnet.gluon import data as gdata
batch_size = 10
# Combining the features and labels of the training data.
dataset = gdata.ArrayDataset(features, labels)
# Randomly reading mini-batches.
data_iter = gdata.DataLoader(dataset, batch_size, shuffle=True)
The use of data_iter here is the same as in the previous section. Now, we can read and print the first mini-batch of instances.
In [3]:
for X, y in data_iter:
    print(X, y)
    break
[[ 1.8917114 -1.2347414 ]
[ 0.8761411 -0.695539 ]
[ 0.05878817 1.2624414 ]
[ 0.01046171 0.08084143]
[-1.2243727 -0.27489948]
[-0.72766423 -1.8169528 ]
[ 1.7130351 2.0944266 ]
[ 1.4136469 -1.416811 ]
[-2.5899131 -0.3166397 ]
[ 1.6491317 -1.2902275 ]]
<NDArray 10x2 @cpu(0)>
[12.192286 8.318588 0.03279139 3.960095 2.6707168 8.924203
0.5036066 11.866432 0.10322509 11.878293 ]
<NDArray 10 @cpu(0)>
Define the Model¶
When we implemented the linear regression model from scratch in the previous section, we needed to define the model parameters and use them to describe, step by step, how the model is evaluated. This can become complicated as we build complex models. Gluon provides a large number of predefined layers, which allow us to focus on which layers to use to construct the model rather than on their implementation.
To define a linear model, first import the module nn. nn is an abbreviation for neural networks. As the name implies, this module defines a large number of neural network layers. We will first define a model variable net, which is a Sequential instance. In Gluon, a Sequential instance can be regarded as a container that concatenates the various layers in sequence. When constructing the model, we will add the layers in their order of occurrence in the container. When input data is given, each layer in the container will be calculated in order, and the output of one layer will be the input of the next layer.
In [4]:
from mxnet.gluon import nn
net = nn.Sequential()
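Before adding layers to net, here is a purely illustrative sketch (the toy container below is hypothetical and is not used in what follows): layers added to a Sequential are applied in the order in which they were added, so the output of the first layer would become the input of the second.
# Illustrative sketch only: a Sequential chaining two Dense layers in order.
toy = nn.Sequential()
toy.add(nn.Dense(4), nn.Dense(1))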
Recall the architecture of a single-layer network. The layer is fully connected since it connects all inputs with all outputs by means of a matrix-vector multiplication. In Gluon, the fully connected layer is referred to as a Dense instance. Since we only want to generate a single scalar output, we set that number to 1.
(Figure: Linear regression is a single-layer neural network.)
In [5]:
net.add(nn.Dense(1))
It is worth noting that, in Gluon, we do not need to specify the input shape for each layer, such as the number of linear regression inputs. When the model sees the data, for example when net(X) is executed later, the model will automatically infer the number of inputs to each layer. We will describe this mechanism in detail in the chapter “Deep Learning Computation”. Gluon introduces this design to make model development more convenient.
Initialize Model Parameters¶
Before using net, we need to initialize the model parameters, such as the weights and biases in the linear regression model. We will import the initializer module from MXNet. This module provides various methods for model parameter initialization. The init here is the abbreviation of initializer. With init.Normal(sigma=0.01) we specify that each weight parameter element is to be randomly sampled at initialization from a normal distribution with a mean of 0 and a standard deviation of 0.01. The bias parameter will be initialized to zero by default.
In [6]:
from mxnet import init
net.initialize(init.Normal(sigma=0.01))
The code above looks pretty straightforward, but in reality something quite subtle is happening here. We are initializing the parameters of a network for which we have not yet told Gluon how many dimensions the input will have. It might be 2, as in our example, or 2,000, so we could not simply preallocate enough space to make it work. What happens behind the scenes is that initialization is deferred until the first time data is sent through the network. At that point all the settings are put in place (and the user does not even need to worry about it). The only caution is that, since the parameters have not been initialized yet, we cannot manipulate them yet.
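As a hedged illustration of this deferred behavior (not part of the original notebook), trying to read the weights right after calling net.initialize() but before any forward pass should fail, because their shapes are still unknown.
# Sketch: before data has passed through the network, the parameter shapes
# are unknown, so reading them typically raises a deferred-initialization error.
try:
    net[0].weight.data()
except Exception as e:
    print('parameters not ready yet:', type(e).__name__)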
Define the Loss Function¶
In Gluon, the module loss defines various loss functions. We will replace the imported module loss with the pseudonym gloss, and directly use the squared loss it provides as the loss function for the model.
In [7]:
from mxnet.gluon import loss as gloss
loss = gloss.L2Loss() # The squared loss is also known as the L2 norm loss.
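As a small sanity check (an illustrative sketch, not part of the original notebook), Gluon's L2Loss computes half the squared difference for each example, i.e. (y_hat - y)^2 / 2.
# Sketch: L2Loss returns (y_hat - y)^2 / 2 element-wise.
y_hat = nd.array([2.0, 5.0])
y_true = nd.array([1.0, 5.0])
print(loss(y_hat, y_true))  # expected roughly [0.5, 0.]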
Define the Optimization Algorithm¶
Again, we do not need to implement mini-batch stochastic gradient descent. After importing Gluon, we now create a Trainer instance and specify mini-batch stochastic gradient descent (sgd) with a learning rate of 0.03 as the optimization algorithm. This optimization algorithm will be used to iterate over all the parameters contained in the layers nested in the net instance (those added with the add function). These parameters can be obtained by the collect_params function.
In [8]:
from mxnet import gluon
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.03})
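To see which parameters the Trainer will update, we can print the dictionary returned by collect_params. The exact parameter names (for example dense0_weight and dense0_bias) depend on how the layers were created, so treat them as illustrative.
# Sketch: inspect the parameters managed by the trainer; exact names may vary.
print(net.collect_params())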
Training¶
You might have noticed that it was a bit more concise to express our model in Gluon. For example, we didn’t have to individually allocate parameters, define our loss function, or implement stochastic gradient descent. The benefits of relying on Gluon’s abstractions will grow substantially once we start working with much more complex models. But once we have all the basic pieces in place, the training loop itself is quite similar to what we would do if implementing everything from scratch.
To refresh your memory: for some number of epochs, we'll make a complete pass over the dataset (data_iter), grabbing one mini-batch of inputs and the corresponding ground-truth labels at a time. Then, for each batch, we'll go through the following ritual.
- Generate predictions net(X) and the loss l by executing a forward pass through the network.
- Calculate gradients by making a backward pass through the network via l.backward().
- Update the model parameters by invoking our SGD optimizer (note that we need not tell trainer.step which parameters to update, but only the amount of data, since we already specified the parameters when constructing trainer).
For good measure, we compute the loss over the entire dataset after each epoch and print it to monitor progress.
In [9]:
num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        with autograd.record():
            l = loss(net(X), y)
        l.backward()
        trainer.step(batch_size)
    l = loss(net(features), labels)
    print('epoch %d, loss: %f' % (epoch, l.mean().asnumpy()))
epoch 1, loss: 0.040675
epoch 2, loss: 0.000148
epoch 3, loss: 0.000050
The model parameters we have learned and the actual model parameters are compared below. We get the layer we need from net and access its weight (weight) and bias (bias). The parameters we have learned and the actual parameters are very close.
In [10]:
w = net[0].weight.data()
print('Error in estimating w', true_w.reshape(w.shape) - w)
b = net[0].bias.data()
print('Error in estimating b', true_b - b)
Error in estimating w
[[0.00042605 0.00014257]]
<NDArray 1x2 @cpu(0)>
Error in estimating b
[0.00035143]
<NDArray 1 @cpu(0)>
Summary¶
- Using Gluon, we can implement the model more succinctly.
- In Gluon, the module data provides tools for data processing, the module nn defines a large number of neural network layers, and the module loss defines various loss functions.
- MXNet's module initializer provides various methods for model parameter initialization.
- Dimensionality and storage are inferred automatically (but be careful not to access parameters before they have been initialized).
Problems¶
- If we replace l = loss(output, y) with l = loss(output, y).mean(), we need to change trainer.step(batch_size) to trainer.step(1) accordingly. Why?
- Review the MXNet documentation to see what loss functions and initialization methods are provided in the modules gluon.loss and init. Replace the loss by Huber's loss.
- How do you access the gradient of dense.weight?