3. Deep Neural Networks in TensorFlow
So far we have seen the LeNet5 CNN architecture. LeNet5 contains two convolutional layers followed by fully connected layers, so it can be called a shallow neural network. At that time (1998), GPUs were not yet used for computation and CPUs were not that powerful either, so two convolutional layers was already quite innovative.
Since then, many other types of convolutional neural networks have been designed; you can look up the details of each of them.
For example, the very famous AlexNet architecture (2012) developed by Alex Krizhevsky, the 7-layer ZF Net (2013), and the 16-layer VGGNet (2014).
In 2015, Google released GoogLeNet, a 22-layer CNN containing inception modules, and Microsoft Research Asia built a 152-layer CNN known as ResNet.
Now, with everything we have learned so far, let's see how we can create the AlexNet and VGGNet16 architectures in TensorFlow.
3.1 AlexNet
Although LeNet5 was the first ConvNet, it is considered to be a shallow neural network. It performs well on the MNIST dataset, which consists of 28 x 28 grayscale images, but performance drops when we try to classify larger images with a higher resolution and more classes.
The first deep CNN, called AlexNet, was introduced in 2012 by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton. Compared to more recent architectures AlexNet can be considered simple, but at the time it was extremely successful. It won the ImageNet competition with an incredible test error rate of 15.4% (the runner-up scored 26.2%) and started a revolution in deep learning and artificial intelligence around the world.
It consists of 5 convolutional layers, 3 max-pooling layers, 3 fully connected layers and 2 dropout layers. The overall architecture looks as follows:
Layer 0: input image of size 224 x 224 x 3
Layer 1: a convolutional layer with 96 filters (filter_depth_1 = 96) of size 11 x 11 (filter_size_1 = 11) and a stride of 4, with a ReLU activation function. It is followed by a max-pooling layer and a local response normalization layer.
Layer 2: a convolutional layer with 256 filters (filter_depth_2 = 256) of size 5 x 5 (filter_size_2 = 5) and a stride of 1, with a ReLU activation function. It is again followed by a max-pooling layer and a local response normalization layer.
Layer 3: a convolutional layer with 384 filters (filter_depth_3 = 384) of size 3 x 3 (filter_size_3 = 3) and a stride of 1, with a ReLU activation function.
Layer 4: the same as layer 3.
Layer 5: a convolutional layer with 256 filters (filter_depth_4 = 256) of size 3 x 3 (filter_size_4 = 3) and a stride of 1, with a ReLU activation function.
Layers 6-8: these convolutional layers are followed by fully connected layers with 4096 neurons each. In the original paper a dataset with 1000 classes is classified, but we will use the oxford17 dataset, which contains 17 different classes (of flowers).
Note that this CNN (or other deep CNNs) cannot be used on the MNIST or CIFAR-10 datasets, because the images in those datasets are too small. As we have seen before, a pooling layer (or a convolutional layer with a stride of 2) reduces the image size by a factor of 2. AlexNet has 3 max-pooling layers and one convolutional layer with a stride of 4, which means the original image size is reduced by a factor of 2^5. An image from the MNIST dataset would simply shrink to less than a single pixel.
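As a quick sanity check (a minimal sketch; the factor of 2^5 follows from the three stride-2 pooling layers plus the stride-4 convolution counted as two halvings), we can compare what this reduction does to a 224 x 224 image and to a 28 x 28 MNIST image:
reduction_factor = 2**5   # 3 max-pooling layers with stride 2, plus one convolution with stride 4
for size in (224, 28):
    print(size, '->', size // reduction_factor)   # prints: 224 -> 7 and 28 -> 0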
Therefore we need to load a dataset with larger images, preferably of size 224 x 224 x 3 (as in the original paper). The 17-category flower dataset, a.k.a. the oxflower17 dataset, is ideal, since it contains images of exactly this size:
ox17_image_width = 224
ox17_image_height = 224
ox17_image_depth = 3
ox17_num_labels = 17
import tflearn.datasets.oxflower17 as oxflower17
train_dataset_, train_labels_ = oxflower17.load_data(one_hot=True)
# use the first 1000 images for training and the remaining ones for testing
train_dataset_ox17, train_labels_ox17 = train_dataset_[:1000,:,:,:], train_labels_[:1000,:]
test_dataset_ox17, test_labels_ox17 = train_dataset_[1000:,:,:,:], train_labels_[1000:,:]
print('Training set', train_dataset_ox17.shape, train_labels_ox17.shape)
print('Test set', test_dataset_ox17.shape, test_labels_ox17.shape)
讓我們?cè)囍贏lexNet中創(chuàng)建權(quán)重矩陣和不同的層。正如我們之前看到的,我們需要跟層數(shù)一樣多的權(quán)重矩陣和偏差矢量,并且每個(gè)權(quán)重矩陣的大小應(yīng)該與其所屬層的過(guò)濾器的大小相對(duì)應(yīng)。
ALEX_PATCH_DEPTH_1, ALEX_PATCH_DEPTH_2, ALEX_PATCH_DEPTH_3, ALEX_PATCH_DEPTH_4 = 96, 256, 384, 256
ALEX_PATCH_SIZE_1, ALEX_PATCH_SIZE_2, ALEX_PATCH_SIZE_3, ALEX_PATCH_SIZE_4 = 11, 5, 3, 3
ALEX_NUM_HIDDEN_1, ALEX_NUM_HIDDEN_2 = 4096, 4096
def variables_alexnet(patch_size1 = ALEX_PATCH_SIZE_1, patch_size2 = ALEX_PATCH_SIZE_2,
                      patch_size3 = ALEX_PATCH_SIZE_3, patch_size4 = ALEX_PATCH_SIZE_4,
                      patch_depth1 = ALEX_PATCH_DEPTH_1, patch_depth2 = ALEX_PATCH_DEPTH_2,
                      patch_depth3 = ALEX_PATCH_DEPTH_3, patch_depth4 = ALEX_PATCH_DEPTH_4,
                      num_hidden1 = ALEX_NUM_HIDDEN_1, num_hidden2 = ALEX_NUM_HIDDEN_2,
                      image_width = 224, image_height = 224, image_depth = 3, num_labels = 17):
    w1 = tf.Variable(tf.truncated_normal([patch_size1, patch_size1, image_depth, patch_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([patch_depth1]))
    w2 = tf.Variable(tf.truncated_normal([patch_size2, patch_size2, patch_depth1, patch_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[patch_depth2]))
    w3 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth2, patch_depth3], stddev=0.1))
    b3 = tf.Variable(tf.zeros([patch_depth3]))
    w4 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth3, patch_depth3], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape=[patch_depth3]))
    # layer 5 has 256 filters (patch_depth4), as described in the architecture above
    w5 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth3, patch_depth4], stddev=0.1))
    b5 = tf.Variable(tf.zeros([patch_depth4]))

    # the 3 pooling layers and the stride-4 convolution reduce the image size by 2**5 in total
    pool_reductions = 3
    conv_reductions = 2
    no_reductions = pool_reductions + conv_reductions
    w6 = tf.Variable(tf.truncated_normal([(image_width // 2**no_reductions) * (image_height // 2**no_reductions) * patch_depth4, num_hidden1], stddev=0.1))
    b6 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))
    w7 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b7 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    w8 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b8 = tf.Variable(tf.constant(1.0, shape = [num_labels]))

    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5, 'w6': w6, 'w7': w7, 'w8': w8,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5, 'b6': b6, 'b7': b7, 'b8': b8
    }
    return variables
def model_alexnet(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 4, 4, 1], padding='SAME')
    layer1_relu = tf.nn.relu(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.max_pool(layer1_relu, [1, 3, 3, 1], [1, 2, 2, 1], padding='SAME')
    layer1_norm = tf.nn.local_response_normalization(layer1_pool)

    layer2_conv = tf.nn.conv2d(layer1_norm, variables['w2'], [1, 1, 1, 1], padding='SAME')
    layer2_relu = tf.nn.relu(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.max_pool(layer2_relu, [1, 3, 3, 1], [1, 2, 2, 1], padding='SAME')
    layer2_norm = tf.nn.local_response_normalization(layer2_pool)

    layer3_conv = tf.nn.conv2d(layer2_norm, variables['w3'], [1, 1, 1, 1], padding='SAME')
    layer3_relu = tf.nn.relu(layer3_conv + variables['b3'])

    layer4_conv = tf.nn.conv2d(layer3_relu, variables['w4'], [1, 1, 1, 1], padding='SAME')
    layer4_relu = tf.nn.relu(layer4_conv + variables['b4'])

    layer5_conv = tf.nn.conv2d(layer4_relu, variables['w5'], [1, 1, 1, 1], padding='SAME')
    layer5_relu = tf.nn.relu(layer5_conv + variables['b5'])
    layer5_pool = tf.nn.max_pool(layer5_relu, [1, 3, 3, 1], [1, 2, 2, 1], padding='SAME')
    layer5_norm = tf.nn.local_response_normalization(layer5_pool)

    flat_layer = flatten_tf_array(layer5_norm)
    layer6_fccd = tf.matmul(flat_layer, variables['w6']) + variables['b6']
    layer6_tanh = tf.tanh(layer6_fccd)
    layer6_drop = tf.nn.dropout(layer6_tanh, 0.5)

    layer7_fccd = tf.matmul(layer6_drop, variables['w7']) + variables['b7']
    layer7_tanh = tf.tanh(layer7_fccd)
    layer7_drop = tf.nn.dropout(layer7_tanh, 0.5)

    logits = tf.matmul(layer7_drop, variables['w8']) + variables['b8']
    return logits
Now we can modify the CNN model to use the weights and layers of the AlexNet model to classify images.
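Below is a minimal sketch of how these two functions could be plugged into a training graph. It assumes TensorFlow 1.x, the flatten_tf_array helper used above, and the oxflower17 data loaded earlier; the learning rate of 0.01 is just an example value.
graph = tf.Graph()
with graph.as_default():
    # placeholders for a batch of 224 x 224 x 3 flower images and their one-hot labels
    tf_train_dataset = tf.placeholder(tf.float32, shape=(None, ox17_image_width, ox17_image_height, ox17_image_depth))
    tf_train_labels = tf.placeholder(tf.float32, shape=(None, ox17_num_labels))

    variables = variables_alexnet()
    logits = model_alexnet(tf_train_dataset, variables)

    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    train_prediction = tf.nn.softmax(logits)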
3.2 VGG Net-16
VGG Net was created in 2014 by Karen Simonyan and Andrew Zisserman of Oxford University. It contains many more layers (16-19), but each layer is simpler in design: all convolutional layers have 3 x 3 filters with a stride of 1, and all max-pooling layers have a stride of 2.
So it is a deeper CNN, but a simpler one.
It exists in different configurations, with either 16 or 19 layers. The difference between these two configurations is the use of 3 or 4 convolutional layers after the second, third and fourth max-pooling layers (see below).
The configuration with 16 layers (configuration D) seems to produce the best results, so let's try to create it in TensorFlow.
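For reference, configuration D consists of the following blocks of 3 x 3 convolutional layers, where each block is followed by a 2 x 2 max-pooling layer, plus three fully connected layers at the end:
# number of 3x3 filters per convolutional layer, grouped into the five blocks of VGG-16 (configuration D)
vgg16_conv_blocks = [[64, 64], [128, 128], [256, 256, 256], [512, 512, 512], [512, 512, 512]]
# sizes of the fully connected layers at the end of the network in the original paper
vgg16_fc_layers = [4096, 4096, 1000]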
#The VGGNET Neural Network
VGG16_PATCH_SIZE_1, VGG16_PATCH_SIZE_2, VGG16_PATCH_SIZE_3, VGG16_PATCH_SIZE_4 = 3, 3, 3, 3
VGG16_PATCH_DEPTH_1, VGG16_PATCH_DEPTH_2, VGG16_PATCH_DEPTH_3, VGG16_PATCH_DEPTH_4 = 64, 128, 256, 512
VGG16_NUM_HIDDEN_1, VGG16_NUM_HIDDEN_2 = 4096, 1000
def variables_vggnet16(patch_size1 = VGG16_PATCH_SIZE_1, patch_size2 = VGG16_PATCH_SIZE_2,
                       patch_size3 = VGG16_PATCH_SIZE_3, patch_size4 = VGG16_PATCH_SIZE_4,
                       patch_depth1 = VGG16_PATCH_DEPTH_1, patch_depth2 = VGG16_PATCH_DEPTH_2,
                       patch_depth3 = VGG16_PATCH_DEPTH_3, patch_depth4 = VGG16_PATCH_DEPTH_4,
                       num_hidden1 = VGG16_NUM_HIDDEN_1, num_hidden2 = VGG16_NUM_HIDDEN_2,
                       image_width = 224, image_height = 224, image_depth = 3, num_labels = 17):
    w1 = tf.Variable(tf.truncated_normal([patch_size1, patch_size1, image_depth, patch_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([patch_depth1]))
    w2 = tf.Variable(tf.truncated_normal([patch_size1, patch_size1, patch_depth1, patch_depth1], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[patch_depth1]))
    w3 = tf.Variable(tf.truncated_normal([patch_size2, patch_size2, patch_depth1, patch_depth2], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [patch_depth2]))
    w4 = tf.Variable(tf.truncated_normal([patch_size2, patch_size2, patch_depth2, patch_depth2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [patch_depth2]))
    w5 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth2, patch_depth3], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [patch_depth3]))
    w6 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth3, patch_depth3], stddev=0.1))
    b6 = tf.Variable(tf.constant(1.0, shape = [patch_depth3]))
    w7 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth3, patch_depth3], stddev=0.1))
    b7 = tf.Variable(tf.constant(1.0, shape=[patch_depth3]))
    w8 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth3, patch_depth4], stddev=0.1))
    b8 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))
    w9 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b9 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))
    w10 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b10 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))
    w11 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b11 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))
    w12 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b12 = tf.Variable(tf.constant(1.0, shape=[patch_depth4]))
    w13 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b13 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))
    # the 5 max-pooling layers each halve the image size, so it is reduced by 2**5 in total
    no_pooling_layers = 5
    w14 = tf.Variable(tf.truncated_normal([(image_width // (2**no_pooling_layers)) * (image_height // (2**no_pooling_layers)) * patch_depth4, num_hidden1], stddev=0.1))
    b14 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))
    w15 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b15 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))
    w16 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b16 = tf.Variable(tf.constant(1.0, shape = [num_labels]))

    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5, 'w6': w6, 'w7': w7, 'w8': w8, 'w9': w9, 'w10': w10,
        'w11': w11, 'w12': w12, 'w13': w13, 'w14': w14, 'w15': w15, 'w16': w16,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5, 'b6': b6, 'b7': b7, 'b8': b8, 'b9': b9, 'b10': b10,
        'b11': b11, 'b12': b12, 'b13': b13, 'b14': b14, 'b15': b15, 'b16': b16
    }
    return variables
def model_vggnet16(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.nn.relu(layer1_conv + variables['b1'])
    layer2_conv = tf.nn.conv2d(layer1_actv, variables['w2'], [1, 1, 1, 1], padding='SAME')
    layer2_actv = tf.nn.relu(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.max_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')