As we know, the input to a CNN is the image matrix, i.e. the most basic features, and the whole network is essentially a feature-extraction pipeline: low-level features are progressively abstracted into higher-level ones. More layers mean a richer hierarchy of abstraction levels, and the deeper the network, the more abstract and semantically meaningful the extracted features become. But is a deeper network really always better? The figure below compares plain (non-residual) networks of different depths, and it shows clearly that deeper is not necessarily better.
For a plain CNN, increasing the depth easily causes vanishing or exploding gradients. The usual remedies are normalized initialization and intermediate normalization layers, but once deep networks can converge, another problem appears: degradation, where accuracy on the training set saturates and then drops as more layers are added. Residual networks were proposed precisely to address this.
1. Algorithm Principle
A residual network becomes easier to optimize by adding shortcut connections. A small stack of layers wrapped by one shortcut connection is called a residual block, as shown in the figure below.
The biggest difference between a plain network and a deep residual network is that the residual network has many bypass branches that feed the input directly to later layers, so those layers only need to learn the residual; these branches are the shortcuts. When information passes through ordinary convolutional or fully connected layers, some of it is inevitably lost or degraded. ResNet alleviates this by routing the input directly to the output, which preserves the information: the network then only has to learn the difference between input and output, simplifying the learning target and reducing the difficulty.
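Formally, if H(x) is the underlying mapping a block should learn, the stacked layers are instead asked to fit the residual F(x) = H(x) - x, and the shortcut adds the input back, so the block outputs

y = F(x, {W_i}) + x

When the optimal mapping is close to the identity, driving F(x) toward zero is easier than fitting H(x) directly.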
2. Hands-On Code
Build a 19-layer ResNet and apply it to load forecasting.
%%
clc
clear
close all
load Train.mat
% load Test.mat
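% one-hot encode the categorical calendar variables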
Train.weekend = dummyvar(Train.weekend);
Train.month = dummyvar(Train.month);
Train = movevars(Train,{'weekend','month'},'After','demandLag');
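% drop the timestamp column and the first row of the table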
Train.ts = [];
Train(1,:) =[];
y = Train.demand;
x = Train{:,2:5};
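% scale features and target to [0,1]; keep the mapping settings for the test data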
[xnorm,xopt] = mapminmax(x',0,1);
[ynorm,yopt] = mapminmax(y',0,1);
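% use the first 1000 samples for training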
xnorm = xnorm(:,1:1000);
ynorm = ynorm(1:1000);
k = 24; % lag window length
% reshape the series into 2-D "images": each sample is a 4-by-k window of the features
for i = 1:length(ynorm)-k
Train_xNorm{:,i} = xnorm(:,i:i+k-1);
Train_yNorm(i) = ynorm(i+k-1);
Train_y(i) = y(i+k-1); % un-normalized target, kept as a numeric vector for the response column
end
Train_x = Train_xNorm';
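% hold out samples 1001-1170 as the test window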
ytest = Train.demand(1001:1170);
xtest = Train{1001:1170,2:5};
[xtestnorm] = mapminmax('apply', xtest',xopt);
[ytestnorm] = mapminmax('apply',ytest',yopt);
% xtestnorm = [xtestnorm; Train.weekend(1001:1170,:)'; Train.month(1001:1170,:)'];
xtest = xtest';
for i = 1:length(ytestnorm)-k
Test_xNorm{:,i} = xtestnorm(:,i:i+k-1);
Test_yNorm(i) = ytestnorm(i+k-1);
Test_y(i) = ytest(i+k-1);
end
Test_x = Test_xNorm';
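% pack the samples into tables for trainNetwork: predictors in the first column, training responses in the second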
x_train = table(Train_x,Train_y');
x_test = table(Test_x);
%% Train/validation split (optional, left commented out)
% TrainSampleLength = length(Train_yNorm);
% validatasize = floor(TrainSampleLength * 0.1);
% Validata_xNorm = Train_xNorm(:,end - validatasize:end,:);
% Validata_yNorm = Train_yNorm(:,TrainSampleLength-validatasize:end);
% Validata_y = Train_y(TrainSampleLength-validatasize:end);
%
% Train_xNorm = Train_xNorm(:,1:end-validatasize,:);
% Train_yNorm = Train_yNorm(:,1:end-validatasize);
% Train_y = Train_y(1:end-validatasize);
%% Build the residual network
lgraph = layerGraph();
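% each residual block is conv -> batchnorm -> relu -> addition; the shortcut carries
% the conv output around the batchnorm/relu pair straight into the addition layer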
tempLayers = [
imageInputLayer([4 24],"Name","imageinput")
convolution2dLayer([3 3],32,"Name","conv","Padding","same")];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = [
batchNormalizationLayer("Name","batchnorm")
reluLayer("Name","relu")];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = [
additionLayer(2,"Name","addition")
convolution2dLayer([3 3],32,"Name","conv_1","Padding","same")];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = [
batchNormalizationLayer("Name","batchnorm_1")
reluLayer("Name","relu_1")];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = [
additionLayer(2,"Name","addition_1")
convolution2dLayer([3 3],32,"Name","conv_2","Padding","same")];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = [
batchNormalizationLayer("Name","batchnorm_2")
reluLayer("Name","relu_2")];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = [
additionLayer(2,"Name","addition_2")
convolution2dLayer([3 3],32,"Name","conv_3","Padding","same")];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = [
batchNormalizationLayer("Name","batchnorm_3")
reluLayer("Name","relu_3")];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = [
additionLayer(2,"Name","addition_3")
fullyConnectedLayer(1,"Name","fc")
regressionLayer("Name","regressionoutput")];
lgraph = addLayers(lgraph,tempLayers);
% clear the helper variable
clear tempLayers;
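% wire up the main path and the shortcut of every residual block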
lgraph = connectLayers(lgraph,"conv","batchnorm");
lgraph = connectLayers(lgraph,"conv","addition/in2");
lgraph = connectLayers(lgraph,"relu","addition/in1");
lgraph = connectLayers(lgraph,"conv_1","batchnorm_1");
lgraph = connectLayers(lgraph,"conv_1","addition_1/in2");
lgraph = connectLayers(lgraph,"relu_1","addition_1/in1");
lgraph = connectLayers(lgraph,"conv_2","batchnorm_2");
lgraph = connectLayers(lgraph,"conv_2","addition_2/in2");
lgraph = connectLayers(lgraph,"relu_2","addition_2/in1");
lgraph = connectLayers(lgraph,"conv_3","batchnorm_3");
lgraph = connectLayers(lgraph,"conv_3","addition_3/in2");
lgraph = connectLayers(lgraph,"relu_3","addition_3/in1");
plot(lgraph);
analyzeNetwork(lgraph);
%% Training options
maxEpochs = 60;
miniBatchSize = 20;
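% 'Shuffle','never' keeps the sliding-window samples in temporal order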
options = trainingOptions('adam', ...
'MaxEpochs',maxEpochs, ...
'MiniBatchSize',miniBatchSize, ...
'InitialLearnRate',0.01, ...
'GradientThreshold',1, ...
'Shuffle','never', ...
'Plots','training-progress',...
'Verbose',0);
net = trainNetwork(x_train, lgraph, options);
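% predict on the test table; the training targets were un-normalized demand, so no inverse scaling is needed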
Predict_yNorm = predict(net,x_test);
Predict_y = double(Predict_yNorm);
plot(Test_y)
hold on
plot(Predict_y)
legend('Actual','Predicted')
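To quantify forecast quality on the held-out window, a minimal sketch (assuming Test_y and Predict_y are aligned vectors of equal length) could be:
% simple error metrics on the test window
err  = Predict_y(:) - Test_y(:);
rmse = sqrt(mean(err.^2));                      % root mean squared error
mape = mean(abs(err) ./ abs(Test_y(:))) * 100;  % mean absolute percentage error (%)
fprintf('RMSE = %.4f, MAPE = %.2f%%\n', rmse, mape);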