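In the previous two notes I focused on how convolution works in convolutional neural networks, analyzing 2D and 3D convolution in detail and covering the key CNN concepts: convolution, pooling, fully connected layers, filters, and receptive fields. This note follows the same learning path used earlier for DNNs: before building the network with TensorFlow, we first build a convolutional neural network by hand with numpy, to get a deeper understanding of the convolution mechanism and of how forward and backward propagation work.
Single-step convolution
Before building the CNN itself, we use numpy to define a single convolution step, based on the view from the earlier notes that convolution is a linear computation. The code is as follows: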
import numpy as np

def conv_single_step(a_slice_prev, W, b):
    # Element-wise product of the receptive field and the filter.
    s = a_slice_prev * W
    # Sum over all entries of the volume s.
    Z = np.sum(s)
    # Add the bias; squeeze b from shape (1, 1, 1) so that Z is a scalar float.
    Z = float(Z + np.squeeze(b))
    return Z
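In the single-step convolution above, we pass in the region of the previous layer's output that is to be convolved, i.e. the receptive field a_slice_prev, the filter W (the weights of the convolutional layer), and the bias b. Applying the linear computation Z = Wx + b to them (an element-wise product summed over the window, plus the bias) gives one convolution step.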
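As a quick sanity check, the single step can be evaluated by hand on a tiny input. The sketch below restates the function compactly so that it runs on its own; the 3x3 single-channel window, the all-ones filter, and the bias value are made-up illustration values, not data from this note.

```python
import numpy as np

def conv_single_step(a_slice_prev, W, b):
    # Element-wise product of window and filter, summed, plus the bias.
    s = a_slice_prev * W
    return float(np.sum(s) + np.squeeze(b))

# Hypothetical inputs, chosen so the result is easy to verify by hand.
a_slice = np.arange(9, dtype=float).reshape(3, 3, 1)  # values 0..8
W = np.ones((3, 3, 1))                                # all-ones filter sums the window
b = np.array([[[0.5]]])                               # bias of shape (1, 1, 1)

z = conv_single_step(a_slice, W, b)
print(z)  # 0 + 1 + ... + 8 = 36, plus the bias 0.5, gives 36.5
```

With an all-ones filter the step reduces to summing the window, which makes the role of W as a set of per-pixel weights easy to see.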
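CNN forward propagation: convolution
Just as with a DNN, a CNN is still trained by forward and backward propagation, even though it adds convolution and pooling. The forward pass of a CNN consists of convolution and pooling, so let's first see how to use numpy to build a complete convolution on top of the single step defined above. The convolution arithmetic itself is not hard, since the single step already implements it; the difficulty lies in scanning and moving the filter across the input image matrix.
To do this we need to be clear about a few variables and parameters, and about the shape of every input and output, which is essential for the convolution and matrix products to work. The input is either the raw image matrix or the activated output of the previous layer; here we take the previous layer's activation as the input, and its shape must be known. Next come the filter matrix and the bias, and we also need to account for stride and padding. On that basis, we combine the filter movement with the single-step convolution to define the forward convolution as follows: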
def conv_forward(A_prev, W, b, hparameters):
    """
    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"
    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """
    # Shape of the previous layer's output
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Shape of the filter weights
    (f, f, n_C_prev, n_C) = W.shape
    # Stride
    stride = hparameters['stride']
    # Padding
    pad = hparameters['pad']
    # Height and width of the output
    n_H = int((n_H_prev + 2 * pad - f) / stride + 1)
    n_W = int((n_W_prev + 2 * pad - f) / stride + 1)
    # Initialize the output
    Z = np.zeros((m, n_H, n_W, n_C))
    # Zero-pad the borders of the input (zero_pad is a helper not shown in this note)
    A_prev_pad = zero_pad(A_prev, pad)
    for i in range(m):
        a_prev_pad = A_prev_pad[i, :, :, :]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    # Position of the filter as it scans the input
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Receptive field at this position
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Single-step convolution on the receptive field
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:, :, :, c], b[:, :, :, c])
    assert(Z.shape == (m, n_H, n_W, n_C))
    cache = (A_prev, W, b, hparameters)
    return Z, cache
With that, one complete convolution in the forward pass of a convolutional neural network is defined. The convolution output is usually followed by a relu activation, as shown in Figure 2 earlier; we omit it here.
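The forward convolution above calls zero_pad, which is not defined in this note. A minimal sketch of such a helper follows, assuming it pads only the height and width axes of a (m, n_H, n_W, n_C) batch; the body is my assumption about what the helper does, and conv_output_size is a hypothetical name introduced here only to illustrate the output-size formula.

```python
import numpy as np

def zero_pad(X, pad):
    """Pad the height and width of a (m, n_H, n_W, n_C) batch with zeros.

    Assumed behaviour: the batch and channel axes are left untouched.
    """
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                  mode="constant", constant_values=0)

def conv_output_size(n_prev, f, pad, stride):
    # Same formula as in the forward pass: floor((n + 2p - f) / s) + 1.
    return (n_prev + 2 * pad - f) // stride + 1

X = np.ones((2, 5, 5, 3))
X_pad = zero_pad(X, 1)
print(X_pad.shape)                                  # (2, 7, 7, 3)
print(conv_output_size(5, f=3, pad=1, stride=2))    # (5 + 2 - 3) // 2 + 1 = 3
```

Padding by 1 with a 3x3 filter and stride 1 would keep the spatial size at 5; with stride 2 it shrinks to 3.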
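CNN forward propagation: pooling
Put simply, pooling takes the maximum (or, for average pooling, the mean) of each local region. The forward pass of pooling resembles that of convolution but is somewhat simpler, because it needs no multiply-and-sum like the single-step convolution. As before, pay attention to the parameters and to the input and output shapes. We define the forward pooling pass as follows: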
def pool_forward(A_prev, hparameters, mode = "max"):
    """
    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters
    """
    # Shape of the previous layer's output
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Pooling window size and stride
    f = hparameters["f"]
    stride = hparameters["stride"]
    # Height and width of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev
    # Initialize the output
    A = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    # Position of the pooling window as it scans the input
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Pooling region
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
                    # Apply the selected pooling type
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)
    cache = (A_prev, hparameters)
    assert(A.shape == (m, n_H, n_W, n_C))
    return A, cache
As the code shows, the forward pooling pass has almost the same structure as the forward convolution pass.
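To make the two pooling modes concrete, here is a small worked example on a single 4x4 channel with a 2x2 window and stride 2. pool2d is a hypothetical single-channel simplification written for this illustration, not the pool_forward function above, and the input values are made up.

```python
import numpy as np

def pool2d(x, f, stride, mode="max"):
    """Pool one 2D channel; a simplified stand-in for the batched version."""
    n_H = (x.shape[0] - f) // stride + 1
    n_W = (x.shape[1] - f) // stride + 1
    out = np.zeros((n_H, n_W))
    reduce_fn = np.max if mode == "max" else np.mean
    for h in range(n_H):
        for w in range(n_W):
            window = x[h * stride:h * stride + f, w * stride:w * stride + f]
            out[h, w] = reduce_fn(window)
    return out

x = np.array([[ 1.,  2.,  5.,  6.],
              [ 3.,  4.,  7.,  8.],
              [ 9., 10., 13., 14.],
              [11., 12., 15., 16.]])

max_out = pool2d(x, f=2, stride=2, mode="max")
avg_out = pool2d(x, f=2, stride=2, mode="average")
print(max_out)  # [[ 4.  8.] [12. 16.]]
print(avg_out)  # [[ 2.5  6.5] [10.5 14.5]]
```

Each output cell summarizes one non-overlapping 2x2 block, so the 4x4 input shrinks to 2x2 in both modes.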
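CNN backward propagation: convolution
With the forward pass defined, the hard and crucial part is defining backward propagation for convolution and pooling. Backpropagation through a convolutional layer has always been an involved process. In TensorFlow we only need to define the forward pass and the backward pass is computed automatically, but when building a CNN with numpy we have to define it ourselves. The key is still to define accurately the gradient of the loss with respect to each variable: the layer input, the filter weights, and the bias.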
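The gradient formulas themselves did not survive in this copy of the note. As a hedged reconstruction, read off from the update statements in the backward code rather than copied from the original, they can be written as follows. Here $W_c$ is the $c$-th filter, $a^{(i)}_{slice}(h,w)$ is the padded input window that produced output position $(h, w)$ for example $i$, and the first line accumulates into that same window of the padded input gradient.

```latex
\begin{aligned}
dA^{(i)}_{prev,\,pad}\big[\text{window}(h, w)\big] &\mathrel{+}= \sum_{c=1}^{n_C} W_c \; dZ^{(i)}_{h,w,c}
  \quad \text{for every } (h, w) \\
dW_c &= \sum_{i=1}^{m} \sum_{h=1}^{n_H} \sum_{w=1}^{n_W} a^{(i)}_{slice}(h, w) \; dZ^{(i)}_{h,w,c} \\
db_c &= \sum_{i=1}^{m} \sum_{h=1}^{n_H} \sum_{w=1}^{n_W} dZ^{(i)}_{h,w,c}
\end{aligned}
```

All three follow from the forward step being linear: each output $Z^{(i)}_{h,w,c}$ is a sum of products between the window and $W_c$ plus $b_c$, so its gradient flows back to the window scaled by $W_c$, to $W_c$ scaled by the window, and to $b_c$ unchanged.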
Based on these gradients and on the forward convolution pass, we define the backward function for convolution as follows:
def conv_backward(dZ, cache):
    """
    Arguments:
    dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward(), output of conv_forward()
    Returns:
    dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev),
               numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    dW -- gradient of the cost with respect to the weights of the conv layer (W)
          numpy array of shape (f, f, n_C_prev, n_C)
    db -- gradient of the cost with respect to the biases of the conv layer (b)
          numpy array of shape (1, 1, 1, n_C)
    """
    # Retrieve the cache stored during the forward pass
    (A_prev, W, b, hparameters) = cache
    # Shape of the previous layer's output
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Shape of the filter
    (f, f, n_C_prev, n_C) = W.shape
    # Stride and padding
    stride = hparameters['stride']
    pad = hparameters['pad']
    # Shape of dZ
    (m, n_H, n_W, n_C) = dZ.shape
    # Initialize dA_prev, dW, db
    dA_prev = np.zeros((m, n_H_prev, n_W_prev, n_C_prev))
    dW = np.zeros((f, f, n_C_prev, n_C))
    db = np.zeros((1, 1, 1, n_C))
    # Zero-pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)
    for i in range(m):
        # Select the ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i, :, :, :]
        da_prev_pad = dA_prev_pad[i, :, :, :]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    # Corners of the current receptive field
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Input slice covered by the filter at this position
                    a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Accumulate the gradients
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:, :, :, c] * dZ[i, h, w, c]
                    dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
                    db[:, :, :, c] += dZ[i, h, w, c]
        # Strip the padding; explicit end indices are used because [pad:-pad] is empty when pad == 0
        dA_prev[i, :, :, :] = da_prev_pad[pad:pad + n_H_prev, pad:pad + n_W_prev, :]
    assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))
    return dA_prev, dW, db
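A hand-written backward pass is easy to get subtly wrong, so a numerical gradient check is a useful companion. The sketch below is self-contained and therefore restates the forward pass and the dW accumulation in compact form. The function names conv_forward_loop and conv_dW_loop, the random test shapes, and the choice of loss (the sum of Z weighted by a fixed random array G, so that dZ equals G) are all assumptions made for this illustration.

```python
import numpy as np

def conv_forward_loop(A_prev, W, b, stride, pad):
    """Compact restatement of the forward pass, used only for this check."""
    m, n_H_prev, n_W_prev, _ = A_prev.shape
    f, _, _, n_C = W.shape
    n_H = (n_H_prev + 2 * pad - f) // stride + 1
    n_W = (n_W_prev + 2 * pad - f) // stride + 1
    # np.pad defaults to constant zero padding.
    A_pad = np.pad(A_prev, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    Z = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):
        for h in range(n_H):
            for w in range(n_W):
                win = A_pad[i, h * stride:h * stride + f, w * stride:w * stride + f, :]
                for c in range(n_C):
                    Z[i, h, w, c] = np.sum(win * W[..., c]) + b[0, 0, 0, c]
    return Z, A_pad

def conv_dW_loop(dZ, A_pad, W_shape, stride):
    """Analytic dW: every window is accumulated, scaled by its dZ entry."""
    f = W_shape[0]
    dW = np.zeros(W_shape)
    m, n_H, n_W, n_C = dZ.shape
    for i in range(m):
        for h in range(n_H):
            for w in range(n_W):
                win = A_pad[i, h * stride:h * stride + f, w * stride:w * stride + f, :]
                for c in range(n_C):
                    dW[..., c] += win * dZ[i, h, w, c]
    return dW

rng = np.random.default_rng(0)
A_prev = rng.standard_normal((2, 4, 4, 3))
W = rng.standard_normal((2, 2, 3, 2))
b = rng.standard_normal((1, 1, 1, 2))
stride, pad = 1, 1

Z, A_pad = conv_forward_loop(A_prev, W, b, stride, pad)
G = rng.standard_normal(Z.shape)      # loss = sum(Z * G), hence dZ = G
dW = conv_dW_loop(G, A_pad, W.shape, stride)

# Central differences on every weight.
eps = 1e-5
dW_num = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[idx] += eps
    W_minus[idx] -= eps
    loss_plus = np.sum(conv_forward_loop(A_prev, W_plus, b, stride, pad)[0] * G)
    loss_minus = np.sum(conv_forward_loop(A_prev, W_minus, b, stride, pad)[0] * G)
    dW_num[idx] = (loss_plus - loss_minus) / (2 * eps)

max_err = np.max(np.abs(dW - dW_num))
print(max_err)  # should be tiny, limited only by floating-point rounding
```

Because the loss is linear in W, the central difference is exact up to rounding, so any sizeable mismatch would point to an indexing error in the backward loops. The same pattern can be applied to dA_prev and db.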
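CNN backward propagation: pooling
The backward pass for pooling is again similar to that for convolution. Before writing it, we need two helpers that operate on the pooling window: a mask for max pooling and a distribute_value for average pooling: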
def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.
    Arguments:
    x -- Array of shape (f, f)
    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """
    mask = (x == np.max(x))
    return mask

def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape
    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz
    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """
    (n_H, n_W) = shape
    # Compute the value to distribute on the matrix
    average = dz / (n_H * n_W)
    # Create a matrix where every entry is the "average" value
    a = np.full(shape, average)
    return a
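A short example shows what the two helpers produce. The sketch inlines their one-line cores so that it runs on its own; the window values and the upstream gradients 5.0 and 8.0 are made up. Note the tie in the window: the comparison marks every entry equal to the maximum, so with ties the gradient is copied to each tied position.

```python
import numpy as np

# Max pooling: the mask marks where the maximum sits in the window.
x = np.array([[1., 3.],
              [2., 3.]])          # the maximum 3 appears twice
mask = (x == np.max(x))
print(mask)                       # [[False  True] [False  True]]

# With an upstream gradient of 5.0, both tied cells receive 5.0.
grad_max = mask * 5.0
print(grad_max)                   # [[0. 5.] [0. 5.]]

# Average pooling: an upstream gradient of 8.0 is spread evenly over 2x2 cells.
grad_avg = np.full((2, 2), 8.0 / (2 * 2))
print(grad_avg)                   # [[2. 2.] [2. 2.]]
```

For average pooling the distributed values always sum back to the upstream gradient; for max pooling that only holds when the maximum is unique.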
We then combine them into the backward pass for pooling, covering both max and average modes:
def pool_backward(dA, cache, mode = "max"):
    """
    Arguments:
    dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
    cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
    Returns:
    dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
    """
    # Retrieve information from cache
    (A_prev, hparameters) = cache
    # Retrieve hyperparameters from "hparameters"
    stride = hparameters['stride']
    f = hparameters['f']
    # Retrieve dimensions from A_prev's shape and dA's shape
    m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    m, n_H, n_W, n_C = dA.shape
    # Initialize dA_prev with zeros
    dA_prev = np.zeros((m, n_H_prev, n_W_prev, n_C_prev))
    for i in range(m):
        # Select training example from A_prev
        a_prev = A_prev[i, :, :, :]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Compute the backward propagation in both modes.
                    if mode == "max":
                        a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]
                        mask = create_mask_from_window(a_prev_slice)
                        dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += np.multiply(mask, dA[i, h, w, c])
                    elif mode == "average":
                        # Get the value da from dA
                        da = dA[i, h, w, c]
                        # Define the shape of the filter as f x f
                        shape = (f, f)
                        # Distribute it to get the correct slice of dA_prev, i.e. add the distributed value of da.
                        dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += distribute_value(da, shape)
    # Make sure the output shape is correct
    assert(dA_prev.shape == A_prev.shape)
    return dA_prev
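That completes the forward and backward passes of a convolutional neural network. It takes a fair amount of effort, but writing each step by hand from first principles should leave you with a much deeper understanding of how convolutional neural networks work.
This article was contributed by Kevin (凯文) of the 自兴动脑人工智能 project team.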