Data Operations¶
In deep learning, we frequently operate on data. To prepare for hands-on deep learning, this section describes how to operate on data in memory.
In MXNet, NDArray is the primary tool for storing and transforming data. If you have used NumPy before, you will find that NDArray is very similar to NumPy’s multidimensional array. However, NDArray provides more features, such as GPU computing and automatic differentiation, which make it more suitable for deep learning.
Create NDArray¶
Let us introduce the most basic functionalities of NDArray first. If you are not familiar with the mathematical operations we use, you can refer to the “Mathematical Basics” section in the appendix.
First, import the ndarray module from MXNet. Here, nd is short for ndarray.
In [1]:
from mxnet import nd
Then, we create a row vector using the arange function.
In [2]:
x = nd.arange(12)
x
Out[2]:
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.]
<NDArray 12 @cpu(0)>
This returns an NDArray instance containing 12 consecutive integers starting from 0, stored as floating-point values (hence the trailing dots in the output). From the property <NDArray 12 @cpu(0)> shown when printing x, we can see that it is a one-dimensional array with a length of 12 and is created in the CPU main memory. The 0 in “@cpu(0)” has no special meaning and does not represent a specific core.
We can get the shape of the NDArray instance through the shape property.
In [3]:
x.shape
Out[3]:
(12,)
We can also get the total number of elements in the NDArray instance through the size property.
In [4]:
x.size
Out[4]:
12
In the following, we use the reshape function to change the shape of the row vector x to (3, 4), which is a matrix of 3 rows and 4 columns. Except for the shape change, the elements in x remain unchanged.
In [5]:
x = x.reshape((3, 4))
x
Out[5]:
[[ 0. 1. 2. 3.]
[ 4. 5. 6. 7.]
[ 8. 9. 10. 11.]]
<NDArray 3x4 @cpu(0)>
Notice that the shape property of x has changed. The above x.reshape((3, 4)) can also be written as x.reshape((-1, 4)) or x.reshape((3, -1)). Since the number of elements of x is known, the dimension marked -1 can be inferred from the number of elements and the sizes of the other dimensions.
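The inference of the -1 dimension can be checked with a small sketch. NumPy is used here as a stand-in for mxnet.nd (an assumption, chosen because its reshape follows the same convention); the variable names are illustrative only.

```python
import numpy as np  # assumption: NumPy stands in for mxnet.nd

x = np.arange(12)

# -1 asks the library to infer that dimension from the total size.
a = x.reshape((3, 4))
b = x.reshape((-1, 4))  # 12 elements / 4 columns = 3 rows
c = x.reshape((3, -1))  # 12 elements / 3 rows = 4 columns

print(a.shape, b.shape, c.shape)
```

All three calls should describe the same 3-by-4 layout, because only one value for the missing dimension keeps the element count at 12.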
Next, we create a tensor with each element being 0 and a shape of (2, 3, 4). In fact, the previously created vectors and matrices are special tensors.
In [6]:
nd.zeros((2, 3, 4))
Out[6]:
[[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]]
<NDArray 2x3x4 @cpu(0)>
Similarly, we can create a tensor with each element being 1.
In [7]:
nd.ones((3, 4))
Out[7]:
[[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]]
<NDArray 3x4 @cpu(0)>
We can also specify the value of each element in the NDArray that needs to be created through a Python list.
In [8]:
y = nd.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
y
Out[8]:
[[2. 1. 4. 3.]
[1. 2. 3. 4.]
[4. 3. 2. 1.]]
<NDArray 3x4 @cpu(0)>
In some cases, we need to randomly generate the value of each element in the NDArray. Next, we create an NDArray with a shape of (3, 4). Each of its elements is randomly sampled from a normal distribution with a mean of 0 and a standard deviation of 1.
In [9]:
nd.random.normal(0, 1, shape=(3, 4))
Out[9]:
[[ 2.2122064 0.7740038 1.0434405 1.1839255 ]
[ 1.8917114 -1.2347414 -1.771029 -0.45138445]
[ 0.57938355 -1.856082 -1.9768796 -0.20801921]]
<NDArray 3x4 @cpu(0)>
Operation¶
NDArray supports a large number of operators. For example, we can carry out addition by element on two previously created NDArrays with a shape of (3, 4). The shape of the result does not change.
In [10]:
x + y
Out[10]:
[[ 2. 2. 6. 6.]
[ 5. 7. 9. 11.]
[12. 12. 12. 12.]]
<NDArray 3x4 @cpu(0)>
Multiply by element:
In [11]:
x * y
Out[11]:
[[ 0. 1. 8. 9.]
[ 4. 10. 18. 28.]
[32. 27. 20. 11.]]
<NDArray 3x4 @cpu(0)>
Divide by element:
In [12]:
x / y
Out[12]:
[[ 0. 1. 0.5 1. ]
[ 4. 2.5 2. 1.75]
[ 2. 3. 5. 11. ]]
<NDArray 3x4 @cpu(0)>
Exponentiate by element:
In [13]:
y.exp()
Out[13]:
[[ 7.389056 2.7182817 54.59815 20.085537 ]
[ 2.7182817 7.389056 20.085537 54.59815 ]
[54.59815 20.085537 7.389056 2.7182817]]
<NDArray 3x4 @cpu(0)>
In addition to computations by element, we can also use the dot function for matrix operations. Next, we multiply x by the transpose of y. Since x is a matrix of 3 rows and 4 columns, and y is transposed into a matrix of 4 rows and 3 columns, the two matrices are multiplied to obtain a matrix of 3 rows and 3 columns.
In [14]:
nd.dot(x, y.T)
Out[14]:
[[ 18. 20. 10.]
[ 58. 60. 50.]
[ 98. 100. 90.]]
<NDArray 3x3 @cpu(0)>
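To make the shape rule concrete, the sketch below recomputes the product entry by entry. It uses NumPy as a stand-in for mxnet.nd (an assumption), and the name manual is hypothetical. Entry (i, j) of the result is the dot product of row i of x with row j of y, since row j of y is column j of its transpose.

```python
import numpy as np  # assumption: NumPy stands in for mxnet.nd

x = np.arange(12, dtype=np.float32).reshape((3, 4))
y = np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]], dtype=np.float32)

# Entry (i, j) is the dot product of row i of x and row j of y.
manual = np.array(
    [[sum(x[i, k] * y[j, k] for k in range(4)) for j in range(3)]
     for i in range(3)],
    dtype=np.float32,
)

product = np.dot(x, y.T)  # (3, 4) times (4, 3) gives (3, 3)
print(product.shape)
```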
We can also merge multiple NDArrays. Next, we concatenate two matrices along the rows (dimension 0, the leftmost element in the shape) and along the columns (dimension 1, the second element from the left in the shape).
In [15]:
nd.concat(x, y, dim=0), nd.concat(x, y, dim=1)
Out[15]:
(
[[ 0. 1. 2. 3.]
[ 4. 5. 6. 7.]
[ 8. 9. 10. 11.]
[ 2. 1. 4. 3.]
[ 1. 2. 3. 4.]
[ 4. 3. 2. 1.]]
<NDArray 6x4 @cpu(0)>,
[[ 0. 1. 2. 3. 2. 1. 4. 3.]
[ 4. 5. 6. 7. 1. 2. 3. 4.]
[ 8. 9. 10. 11. 4. 3. 2. 1.]]
<NDArray 3x8 @cpu(0)>)
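The effect of the dimension argument on the result shape can be sketched as follows. NumPy's concatenate with its axis argument is used as a stand-in for nd.concat with dim (an assumption); only the shapes matter here, so the inputs are filled with placeholder values.

```python
import numpy as np  # assumption: NumPy stands in for mxnet.nd

x = np.zeros((3, 4))
y = np.ones((3, 4))

# Concatenating along dimension 0 stacks rows: 3 + 3 rows, 4 columns.
by_rows = np.concatenate([x, y], axis=0)
# Concatenating along dimension 1 stacks columns: 3 rows, 4 + 4 columns.
by_cols = np.concatenate([x, y], axis=1)

print(by_rows.shape, by_cols.shape)
```

Only the size of the chosen dimension grows; all other dimensions have to match between the inputs.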
A new NDArray whose elements are 0 or 1 can be obtained using a conditional expression. Take x == y as an example. If x and y are equal at a given position, then the new NDArray has a value of 1 at that position; otherwise, it is 0.
In [16]:
x == y
Out[16]:
[[0. 1. 0. 1.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
<NDArray 3x4 @cpu(0)>
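Because the result is a 0/1 mask, summing it counts the matching positions. The sketch below uses NumPy as a stand-in for mxnet.nd (an assumption). NumPy comparisons return booleans rather than floats, so the cast is added to mimic the 0/1 encoding shown above.

```python
import numpy as np  # assumption: NumPy stands in for mxnet.nd

x = np.arange(12, dtype=np.float32).reshape((3, 4))
y = np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]], dtype=np.float32)

# Cast the boolean mask to float32 to get the 0/1 encoding.
equal = (x == y).astype(np.float32)

# Summing the mask counts the positions where x and y agree.
matches = int(equal.sum())
print(matches)
```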
Summing all the elements in the NDArray yields an NDArray with only one element.
In [17]:
x.sum()
Out[17]:
[66.]
<NDArray 1 @cpu(0)>
We can transform the result into a scalar in Python using the asscalar function. In the following example, the \(L_2\) norm of x is a single-element NDArray, as in the previous example, but the final result is transformed into a Python scalar.
In [18]:
x.norm().asscalar()
Out[18]:
22.494444
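As a cross-check, the \(L_2\) norm is the square root of the sum of squared elements. The sketch below recomputes it with the Python standard library only, assuming x still holds the values 0 through 11 as at this point in the section; the names values and l2 are illustrative.

```python
import math

values = list(range(12))  # the elements of x: 0, 1, ..., 11

total = sum(values)                           # matches x.sum()
l2 = math.sqrt(sum(v * v for v in values))    # sqrt(0^2 + 1^2 + ... + 11^2)

print(total, l2)
```

The sum of squares is 506, so the norm is the square root of 506, which agrees with the value above up to float32 rounding.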
We can also rewrite y.exp(), x.sum(), x.norm(), etc. as nd.exp(y), nd.sum(x), nd.norm(x), etc.
Broadcast Mechanism¶
In the above section, we saw how to perform operations by element on two NDArrays of the same shape. When two NDArrays of different shapes are operated on by element, a broadcasting mechanism may be triggered: the elements are first copied appropriately so that the two NDArrays have the same shape, and then the operation is carried out by element.
Define two NDArrays:
In [19]:
a = nd.arange(3).reshape((3, 1))
b = nd.arange(2).reshape((1, 2))
a, b
Out[19]:
(
[[0.]
[1.]
[2.]]
<NDArray 3x1 @cpu(0)>,
[[0. 1.]]
<NDArray 1x2 @cpu(0)>)
Since a and b are matrices of 3 rows and 1 column and of 1 row and 2 columns respectively, computing a + b broadcasts (copies) the three elements in the first column of a to a second column, and the two elements in the first row of b to a second and third row. In this way, we can add two matrices of 3 rows and 2 columns by element.
In [20]:
a + b
Out[20]:
[[0. 1.]
[1. 2.]
[2. 3.]]
<NDArray 3x2 @cpu(0)>
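The copying step can be made explicit. The sketch below uses NumPy as a stand-in for mxnet.nd (an assumption), and the names a_full and b_full are hypothetical. It expands both operands to the common shape (3, 2) by hand, then compares the result with the implicit broadcast.

```python
import numpy as np  # assumption: NumPy stands in for mxnet.nd

a = np.arange(3).reshape((3, 1))
b = np.arange(2).reshape((1, 2))

# Expand each operand to the common shape (3, 2) explicitly.
a_full = np.broadcast_to(a, (3, 2))  # the column of a repeated twice
b_full = np.broadcast_to(b, (3, 2))  # the row of b repeated three times

explicit = a_full + b_full
implicit = a + b  # broadcasting performs the same expansion implicitly

print(np.array_equal(explicit, implicit))
```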
Index¶
In NDArray, the index represents the position of the element. Indexes start at 0. For example, the row indexes of a matrix of 3 rows and 2 columns are 0, 1, and 2 respectively, and the column indexes are 0 and 1 respectively.
In the following example, we specify the row index slice of the NDArray as [1:3]. Following the convention that a range is closed on the left and open on the right, it extracts the two rows of the matrix x with indexes 1 and 2.
In [21]:
x[1:3]
Out[21]:
[[ 4. 5. 6. 7.]
[ 8. 9. 10. 11.]]
<NDArray 2x4 @cpu(0)>
We can specify the location of the individual elements in the NDArray that need to be accessed, such as the index of the rows and columns in the matrix, and reassign the element.
In [22]:
x[1, 2] = 9
x
Out[22]:
[[ 0. 1. 2. 3.]
[ 4. 5. 9. 7.]
[ 8. 9. 10. 11.]]
<NDArray 3x4 @cpu(0)>
Of course, we can also slice out some of the elements and reassign them. In the following example, we reassign every element in the row with index 1.
In [23]:
x[1:2, :] = 12
x
Out[23]:
[[ 0. 1. 2. 3.]
[12. 12. 12. 12.]
[ 8. 9. 10. 11.]]
<NDArray 3x4 @cpu(0)>
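The half-open range convention and the two kinds of assignment can be replayed in a short sketch. NumPy is used as a stand-in for mxnet.nd (an assumption), and the names n_rows and selected are illustrative.

```python
import numpy as np  # assumption: NumPy stands in for mxnet.nd

x = np.arange(12, dtype=np.float32).reshape((3, 4))

# [1:3] is closed on the left and open on the right: rows 1 and 2.
n_rows = x[1:3].shape[0]
selected = list(range(3))[1:3]  # the row indexes the slice covers

x[1, 2] = 9      # reassign a single element
x[1:2, :] = 12   # reassign every element of row 1

print(n_rows, selected, x[1])
```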
Memory Overhead of the Operation¶
In the previous examples, we allocated new memory for each operation to store its result. For example, even with an operation like y = x + y, we allocate new memory and then point y to it. To demonstrate this, we can use Python's built-in id function: if the IDs of two instances are the same, then they correspond to the same memory address; otherwise, they are different.
In [24]:
before = id(y)
y = y + x
id(y) == before
Out[24]:
False
If we want to write the result to specific memory, we can use the index notation described earlier to perform the replacement. In the example below, we first use zeros_like to create an NDArray with the same shape as y and with every element 0, denoted as z. Next, we write the result of x + y into the memory corresponding to z through [:].
In [25]:
z = y.zeros_like()
before = id(z)
z[:] = x + y
id(z) == before
Out[25]:
True
In fact, in the above example, we still allocated temporary memory for x + y to store the computation result, and then copied it to the memory corresponding to z. If we want to avoid this temporary memory overhead, we can use the out parameter of the operator's full-name function.
In [26]:
nd.elemwise_add(x, y, out=z)
id(z) == before
Out[26]:
True
If the value of x is not reused later in the program, we can also use x[:] = x + y or x += y to reduce the memory overhead of the operation.
In [27]:
before = id(x)
x += y
id(x) == before
Out[27]:
True
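The three patterns above can be compared side by side. The sketch below uses NumPy as a stand-in for mxnet.nd (an assumption), with np.add and its out argument playing the role of nd.elemwise_add. Note that id compares Python object identity, which is what the checks in this section rely on.

```python
import numpy as np  # assumption: NumPy stands in for mxnet.nd

x = np.ones((3, 4), dtype=np.float32)
y = np.full((3, 4), 2, dtype=np.float32)

# Rebinding: y + x builds a new array and y then points to it.
before_y = id(y)
y = y + x
rebound = id(y) != before_y

# out=: the sum is written directly into the existing array z.
z = np.zeros_like(y)
before_z = id(z)
np.add(x, y, out=z)
z_reused = id(z) == before_z

# In-place operator: x keeps its identity and receives the result.
before_x = id(x)
x += y
x_reused = id(x) == before_x

print(rebound, z_reused, x_reused)
```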
Mutual Transformation of NDArray and NumPy¶
We can use the array and asnumpy functions to transform data between NDArray and NumPy formats. Below, a NumPy instance is transformed into an NDArray instance.
In [28]:
import numpy as np
p = np.ones((2, 3))
d = nd.array(p)
d
Out[28]:
[[1. 1. 1.]
[1. 1. 1.]]
<NDArray 2x3 @cpu(0)>
Then, the NDArray instance is transformed into a NumPy instance.
In [29]:
d.asnumpy()
Out[29]:
array([[1., 1., 1.],
[1., 1., 1.]], dtype=float32)
Summary¶
- NDArray is a primary tool for storing and transforming data in MXNet.
- We can easily create, operate, and specify indexes on NDArray, as well as transform them from/to NumPy.
Exercises¶
- Run the code in this section. Change the conditional expression x == y in this section to x < y or x > y, and then see what kind of NDArray you get.
- Replace the two NDArrays that operate by element in the broadcast mechanism with other shapes. Is the result the same as expected?