Over the past decade, deep learning has witnessed something of a Cambrian explosion. The sheer number of techniques, applications, and algorithms vastly outstrips the progress of previous decades. This is due to a fortuitous combination of multiple factors, one of which is the powerful free tools offered by a number of open-source deep learning frameworks. Theano (Bergstra et al., 2010), DistBelief (Dean et al., 2012), and Caffe (Jia et al., 2014) arguably represent the first generation of such frameworks to find widespread adoption. In contrast to earlier (seminal) works like SN2 (Simulateur Neuristique) (Bottou and Le Cun, 1988), which provided a Lisp-like programming experience, modern frameworks offer automatic differentiation and the convenience of Python. These frameworks allow us to automate and modularize the repetitive work of implementing gradient-based learning algorithms.
In Section 3.4, we relied only on (i) tensors for data storage and linear algebra; and (ii) automatic differentiation for computing gradients. In practice, because data iterators, loss functions, optimizers, and neural network layers are so common, modern libraries implement these components for us as well. In this section, we will show you how to implement the linear regression model from Section 3.4 concisely by using the high-level APIs of deep learning frameworks.
import numpy as np
import torch
from torch import nn
from d2l import torch as d2l
from mxnet import autograd, gluon, init, np, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l

npx.set_np()
import jax
import optax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
import numpy as np
import tensorflow as tf
from d2l import tensorflow as d2l
3.5.1. Defining the Model
When we implemented linear regression from scratch in Section 3.4, we defined our model parameters explicitly and coded up the calculations to produce output using basic linear algebra operations. You should know how to do this. But once your models get more complex, and once you have to do this nearly every day, you will be glad of the assistance. The situation is similar to coding up your own blog from scratch. Doing it once or twice is rewarding and instructive, but you would be a lousy web developer if you spent a month reinventing the wheel.
For standard operations, we can use a framework's predefined layers, which allow us to focus on the layers used to construct the model rather than worrying about their implementation. Recall the architecture of a single-layer network as described in Fig. 3.1.2. The layer is called fully connected, since each of its inputs is connected to each of its outputs by means of a matrix-vector multiplication.
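As a quick sanity check (an illustrative sketch with arbitrarily chosen shapes, not part of the model below), we can confirm that such a layer computes nothing more than a matrix multiplication plus a bias:

import torch
from torch import nn

# A fully connected layer computes X @ W.T + b: every input feature is
# connected to every output through the weight matrix.
layer = nn.Linear(2, 1)
X = torch.randn(4, 2)
print(torch.allclose(layer(X), X @ layer.weight.T + layer.bias))  # True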
In PyTorch, the fully connected layer is defined in the Linear and LazyLinear classes (the latter available since version 1.8.0). The latter allows users to specify merely the output dimension, while the former additionally asks for how many inputs go into this layer. Specifying input shapes is inconvenient and may require nontrivial calculations (such as in convolutional layers). Thus, for simplicity, we will use such "lazy" layers whenever we can.
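For instance, here is a minimal sketch (with an arbitrary input shape) of how a lazy layer differs from its eager counterpart: the input dimension is only materialized on the first forward pass.

import torch
from torch import nn

# nn.Linear requires both input and output dimensions up front;
# nn.LazyLinear only needs the output dimension and infers the rest.
eager = nn.Linear(2, 1)        # in_features specified explicitly
lazy = nn.LazyLinear(1)        # in_features inferred from data
X = torch.randn(4, 2)
lazy(X)                        # first forward pass materializes the weights
print(lazy.weight.shape)       # torch.Size([1, 2])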
class LinearRegression(d2l.Module):  #@save
    """The linear regression model implemented with high-level APIs."""
    def __init__(self, lr):
        super().__init__()
        self.save_hyperparameters()
        self.net = nn.LazyLinear(1)
        self.net.weight.data.normal_(0, 0.01)
        self.net.bias.data.fill_(0)
In Gluon, the fully connected layer is defined in the Dense class. Since we only want to generate a single scalar output, we set that number to 1. It is worth noting that, for convenience, Gluon does not require us to specify the input shape for each layer. Hence we do not need to tell Gluon how many inputs go into this linear layer. When we first pass data through our model, e.g., when we execute net(X) later, Gluon will automatically infer the number of inputs to each layer and thus instantiate the correct model. We will describe how this works in more detail later.
class LinearRegression(d2l.Module):  #@save
    """The linear regression model implemented with high-level APIs."""
    def __init__(self, lr):
        super().__init__()
        self.save_hyperparameters()
        self.net = nn.Dense(1)
        self.net.initialize(init.Normal(sigma=0.01))
class LinearRegression(d2l.Module):  #@save
    """The linear regression model implemented with high-level APIs."""
    lr: float

    def setup(self):
        self.net = nn.Dense(1, kernel_init=nn.initializers.normal(0.01))
In Keras, the fully connected layer is defined in the Dense class. Since we only want to generate a single scalar output, we set that number to 1. It is worth noting that, for convenience, Keras does not require us to specify the input shape for each layer. We do not need to tell Keras how many inputs go into this linear layer. When we first try to pass data through our model, e.g., when we execute net(X) later, Keras will automatically infer the number of inputs to each layer. We will describe how this works in more detail later.
class LinearRegression(d2l.Module):  #@save
    """The linear regression model implemented with high-level APIs."""
    def __init__(self, lr):
        super().__init__()
        self.save_hyperparameters()
        initializer = tf.initializers.RandomNormal(stddev=0.01)
        self.net = tf.keras.layers.Dense(1, kernel_initializer=initializer)
In the forward method we just invoke the built-in __call__ method of the predefined layers to compute the outputs.
@d2l.add_to_class(LinearRegression)  #@save
def forward(self, X):
    return self.net(X)
@d2l.add_to_class(LinearRegression)  #@save
def forward(self, X):
    return self.net(X)
@d2l.add_to_class(LinearRegression)  #@save
def forward(self, X):
    return self.net(X)
@d2l.add_to_class(LinearRegression)  #@save
def forward(self, X):
    return self.net(X)
3.5.2. Defining the Loss Function
The MSELoss class computes the mean squared error (without the 1/2 factor in (3.1.5)). By default, MSELoss returns the average loss over examples. It is faster (and easier to use) than implementing our own.
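As a small illustrative check (with made-up values), the averaging behavior can be verified against a manual computation:

import torch
from torch import nn

# MSELoss averages the squared errors over the minibatch,
# matching ((y_hat - y) ** 2).mean() and omitting the 1/2 factor.
y_hat = torch.tensor([2.0, 0.5])
y = torch.tensor([1.0, 1.0])
print(nn.MSELoss()(y_hat, y))      # tensor(0.6250)
print(((y_hat - y) ** 2).mean())   # tensor(0.6250)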
@d2l.add_to_class(LinearRegression)  #@save
def loss(self, y_hat, y):
    fn = nn.MSELoss()
    return fn(y_hat, y)
The loss module defines many useful loss functions. For speed and convenience, we forgo implementing our own and choose the built-in loss.L2Loss instead. Because the loss that it returns is the squared error for each example, we use mean to average the loss over the minibatch.
@d2l.add_to_class(LinearRegression)  #@save
def loss(self, y_hat, y):
    fn = gluon.loss.L2Loss()
    return fn(y_hat, y).mean()
@d2l.add_to_class(LinearRegression)  #@save
def loss(self, params, X, y, state):
    y_hat = state.apply_fn({'params': params}, *X)
    return optax.l2_loss(y_hat, y).mean()
The MeanSquaredError class computes the mean squared error (without the 1/2 factor in (3.1.5)). By default, it returns the average loss over examples.
@d2l.add_to_class(LinearRegression)  #@save
def loss(self, y_hat, y):
    fn = tf.keras.losses.MeanSquaredError()
    return fn(y, y_hat)
3.5.3. Defining the Optimization Algorithm
Minibatch SGD is a standard tool for optimizing neural networks, and thus PyTorch supports it alongside a number of variations on this algorithm in the optim module. When we instantiate an SGD instance, we specify the parameters to optimize over (obtainable from our model via self.parameters()) and the learning rate (self.lr) required by our optimization algorithm.
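To see what a single optimizer step does, consider an illustrative sketch with made-up parameter values: each parameter is updated in place as param <- param - lr * param.grad.

import torch

# One SGD step on a toy parameter: the gradient of (w ** 2).sum() is 2 * w.
w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()                      # w.grad == tensor([2., 4.])
opt = torch.optim.SGD([w], lr=0.1)
opt.step()                           # w <- w - 0.1 * w.grad
print(w)                             # tensor([0.8000, 1.6000], requires_grad=True)
opt.zero_grad()                      # clear gradients before the next step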
@d2l.add_to_class(LinearRegression)  #@save
def configure_optimizers(self):
    return torch.optim.SGD(self.parameters(), self.lr)
Minibatch SGD is a standard tool for optimizing neural networks and thus Gluon supports it alongside a number of variations on this algorithm through its Trainer class. Note that Gluon’s Trainer class stands for the optimization algorithm, while the Trainer class we created in Section 3.2 contains the training method, i.e., repeatedly call the optimizer to update the model parameters. When we instantiate Trainer, we specify the parameters to optimize over, obtainable from our model net via net.collect_params(), the optimization algorithm we wish to use (sgd), and a dictionary of hyperparameters required by our optimization algorithm.
@d2l.add_to_class(LinearRegression)  #@save
def configure_optimizers(self):
    return gluon.Trainer(self.collect_params(),
                         'sgd', {'learning_rate': self.lr})
@d2l.add_to_class(LinearRegression)  #@save
def configure_optimizers(self):
    return optax.sgd(self.lr)
Minibatch SGD is a standard tool for optimizing neural networks and thus Keras supports it alongside a number of variations on this algorithm in the optimizers module.
@d2l.add_to_class(LinearRegression)  #@save
def configure_optimizers(self):
    return tf.keras.optimizers.SGD(self.lr)
3.5.4. Training
You might have noticed that expressing our model through high-level APIs of a deep learning framework requires fewer lines of code. We did not have to allocate parameters individually, define our loss function, or implement minibatch SGD. Once we start working with much more complex models, the advantages of the high-level API will grow considerably. Now that we have all the basic pieces in place, the training loop itself is the same as the one we implemented from scratch. So we just call the fit method (introduced in Section 3.2.4), which relies on the implementation of the fit_epoch method from Section 3.4, to train our model.
model = LinearRegression(lr=0.03)
data = d2l.SyntheticRegressionData(w=torch.tensor([2, -3.4]), b=4.2)
trainer = d2l.Trainer(max_epochs=3)
trainer.fit(model, data)
model = LinearRegression(lr=0.03)
data = d2l.SyntheticRegressionData(w=np.array([2, -3.4]), b=4.2)
trainer = d2l.Trainer(max_epochs=3)
trainer.fit(model, data)
model = LinearRegression(lr=0.03)
data = d2l.SyntheticRegressionData(w=jnp.array([2, -3.4]), b=4.2)
trainer = d2l.Trainer(max_epochs=3)
trainer.fit(model, data)
model = LinearRegression(lr=0.03)
data = d2l.SyntheticRegressionData(w=tf.constant([2, -3.4]), b=4.2)
trainer = d2l.Trainer(max_epochs=3)
trainer.fit(model, data)
Below, we compare the model parameters learned by training on finite data with the actual parameters that generated our dataset. To access the parameters, we access the weights and bias of the layer that we need. As in our implementation from scratch, note that our estimated parameters are close to their true counterparts.
@d2l.add_to_class(LinearRegression)  #@save
def get_w_b(self):
    return (self.net.weight.data, self.net.bias.data)

w, b = model.get_w_b()

print(f'error in estimating w: {data.w - w.reshape(data.w.shape)}')
print(f'error in estimating b: {data.b - b}')
error in estimating w: tensor([ 0.0022, -0.0069])
error in estimating b: tensor([0.0080])
@d2l.add_to_class(LinearRegression)  #@save
def get_w_b(self):
    return (self.net.weight.data(), self.net.bias.data())

w, b = model.get_w_b()
@d2l.add_to_class(LinearRegression)  #@save
def get_w_b(self, state):
    net = state.params['net']
    return net['kernel'], net['bias']

w, b = model.get_w_b(trainer.state)
@d2l.add_to_class(LinearRegression)  #@save
def get_w_b(self):
    return (self.get_weights()[0], self.get_weights()[1])

w, b = model.get_w_b()
3.5.5. Summary
This section contains the first implementation of a deep network (in this book) to tap into the conveniences afforded by modern deep learning frameworks, such as MXNet (Chen et al., 2015), JAX (Frostig et al., 2018), PyTorch (Paszke et al., 2019), and TensorFlow (Abadi et al., 2016). We used framework defaults for loading data, defining a layer, a loss function, an optimizer, and a training loop. Whenever the framework provides all necessary features, it is generally a good idea to use them, since the library implementations of these components tend to be heavily optimized for performance and properly tested for reliability. At the same time, try not to forget that these modules can be implemented directly. This is especially important for aspiring researchers who wish to live on the leading edge of model development, where you will be inventing new components that cannot possibly exist in any current library.
In PyTorch, the data module provides tools for data processing, and the nn module defines a large number of neural network layers and common loss functions. We can initialize the parameters by replacing their values with methods ending with _. Note that we need to specify the input dimensions of the network. While this is trivial for now, it can have significant knock-on effects when we want to design complex networks with many layers. Careful consideration of how to parametrize these networks is needed to allow portability.
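For instance, here is a minimal sketch (using a standalone layer rather than the model above) of such in-place initialization with underscore-suffixed methods:

import torch
from torch import nn

# Methods whose names end with "_" modify the underlying tensors in place.
layer = nn.Linear(2, 1)
layer.weight.data.normal_(0, 0.01)   # weights drawn from N(0, 0.01^2)
layer.bias.data.fill_(0)             # bias set to zero
print(layer.weight, layer.bias)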
In Gluon, the data module provides tools for data processing, the nn module defines a large number of neural network layers, and the loss module defines many common loss functions. Moreover, the initializer gives access to many choices for parameter initialization. Conveniently for the user, dimensionality and storage are automatically inferred. A consequence of this lazy initialization is that you must not attempt to access parameters before they have been instantiated (and initialized).
In TensorFlow, the data module provides tools for data processing, the keras module defines a large number of neural network layers and common loss functions. Moreover, the initializers module provides various methods for model parameter initialization. Dimensionality and storage for networks are automatically inferred (but be careful not to attempt to access parameters before they have been initialized).
3.5.6. Exercises
How would you need to change the learning rate if you replace the total loss over the minibatch with the average of the losses on the minibatch?
Review the framework documentation to see which loss functions are provided. In particular, replace the squared loss with Huber's robust loss function. That is, use the loss function
$$l(y, y') = \begin{cases} |y - y'| - \dfrac{\sigma}{2} & \text{if } |y - y'| > \sigma \\ \dfrac{1}{2\sigma}\,(y - y')^2 & \text{otherwise} \end{cases} \tag{3.5.1}$$
How do you access the gradients of the model's weights?
How does the solution change if you change the learning rate and the number of epochs? Does it keep on improving?
How does the solution change as you vary the amount of data generated?
Plot the estimation errors $\hat{w} - w$ and $\hat{b} - b$ as a function of the amount of data. Hint: increase the amount of data logarithmically rather than linearly, i.e., 5, 10, 20, 50, ..., 10,000 rather than 1,000, 2,000, ..., 10,000.
Why is the suggestion in the hint appropriate?