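Like word similarity and analogy tasks, sentiment analysis can also benefit from pretrained word vectors. Since the IMDb review dataset in Section 16.1 is not very large, using text representations pretrained on large-scale corpora may reduce overfitting of the model. As a concrete example, illustrated in Fig. 16.2.1, we will represent each token using the pretrained GloVe model and feed these token representations into a multilayer bidirectional RNN to obtain a representation of the text sequence, which is then transformed into sentiment analysis outputs (Maas et al., 2011). For the same downstream application, we will consider a different architectural choice later.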
16.2.1. Representing Single Text with RNNs
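In text classification tasks such as sentiment analysis, a varying-length text sequence is transformed into one of a fixed set of categories. In the following BiRNN class, each token of a text sequence gets its individual pretrained GloVe representation via the embedding layer (self.embedding), while the entire sequence is encoded by a bidirectional RNN (self.encoder). More concretely, the hidden states (at the last layer) of the bidirectional LSTM at both the initial and final time steps are concatenated as the representation of the text sequence. This single text representation is then transformed into output categories by a fully connected layer (self.decoder) with two outputs ("positive" and "negative").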
# PyTorch implementation
import torch
from torch import nn
from d2l import torch as d2l


class BiRNN(nn.Module):
    def __init__(self, vocab_size, embed_size, num_hiddens,
                 num_layers, **kwargs):
        super(BiRNN, self).__init__(**kwargs)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Set `bidirectional` to True to get a bidirectional RNN
        self.encoder = nn.LSTM(embed_size, num_hiddens, num_layers=num_layers,
                               bidirectional=True)
        self.decoder = nn.Linear(4 * num_hiddens, 2)

    def forward(self, inputs):
        # The shape of `inputs` is (batch size, no. of time steps). Because
        # LSTM requires its input's first dimension to be the temporal
        # dimension, the input is transposed before obtaining token
        # representations. The output shape is (no. of time steps, batch size,
        # word vector dimension)
        embeddings = self.embedding(inputs.T)
        self.encoder.flatten_parameters()
        # Returns hidden states of the last hidden layer at different time
        # steps. The shape of `outputs` is (no. of time steps, batch size,
        # 2 * no. of hidden units)
        outputs, _ = self.encoder(embeddings)
        # Concatenate the hidden states at the initial and final time steps as
        # the input of the fully connected layer. Its shape is (batch size,
        # 4 * no. of hidden units)
        encoding = torch.cat((outputs[0], outputs[-1]), dim=1)
        outs = self.decoder(encoding)
        return outs
# MXNet implementation (an alternative to the PyTorch version; run one or
# the other, since both define `BiRNN` and rebind `nn`)
from mxnet import np, npx
from mxnet.gluon import nn, rnn

npx.set_np()


class BiRNN(nn.Block):
    def __init__(self, vocab_size, embed_size, num_hiddens,
                 num_layers, **kwargs):
        super(BiRNN, self).__init__(**kwargs)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Set `bidirectional` to True to get a bidirectional RNN
        self.encoder = rnn.LSTM(num_hiddens, num_layers=num_layers,
                                bidirectional=True, input_size=embed_size)
        self.decoder = nn.Dense(2)

    def forward(self, inputs):
        # The shape of `inputs` is (batch size, no. of time steps). Because
        # LSTM requires its input's first dimension to be the temporal
        # dimension, the input is transposed before obtaining token
        # representations. The output shape is (no. of time steps, batch size,
        # word vector dimension)
        embeddings = self.embedding(inputs.T)
        # Returns hidden states of the last hidden layer at different time
        # steps. The shape of `outputs` is (no. of time steps, batch size,
        # 2 * no. of hidden units)
        outputs = self.encoder(embeddings)
        # Concatenate the hidden states at the initial and final time steps as
        # the input of the fully connected layer. Its shape is (batch size,
        # 4 * no. of hidden units)
        encoding = np.concatenate((outputs[0], outputs[-1]), axis=1)
        outs = self.decoder(encoding)
        return outs
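To see why the fully connected layer expects 4 * num_hiddens inputs, it can help to trace the tensor shapes on a small example. The following is a minimal PyTorch sketch, not part of the model above: the sizes (7 time steps, batch size 3, and so on) are arbitrary values chosen for illustration. It also checks which slices of `outputs` coincide with the final hidden states that `nn.LSTM` returns for each direction of the last layer.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Arbitrary illustrative sizes (assumptions, not the values used for training)
num_steps, batch_size, embed_size, num_hiddens = 7, 3, 5, 4

lstm = nn.LSTM(embed_size, num_hiddens, num_layers=2, bidirectional=True)
X = torch.randn(num_steps, batch_size, embed_size)

# `outputs` holds the last layer's hidden states at every time step, with
# the forward and backward directions concatenated along the last axis
outputs, (h_n, c_n) = lstm(X)
print(outputs.shape)   # (num_steps, batch_size, 2 * num_hiddens)

# Concatenating the first and last time steps doubles the width again
encoding = torch.cat((outputs[0], outputs[-1]), dim=1)
print(encoding.shape)  # (batch_size, 4 * num_hiddens)

# The forward direction finishes at the last step, the backward direction
# finishes at the first step; `h_n[-2]` and `h_n[-1]` are those final states
# for the last layer
fwd_final = outputs[-1][:, :num_hiddens]
bwd_final = outputs[0][:, num_hiddens:]
print(torch.allclose(fwd_final, h_n[-2]), torch.allclose(bwd_final, h_n[-1]))
```

So the concatenated encoding contains the final state of each direction, together with each direction's state after reading only a single token.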
Let's construct a bidirectional RNN with two hidden layers to represent single text for sentiment analysis.
embed_size, num_hiddens, num_layers, devices = 100, 100, 2, d2l.try_all_gpus()
net = BiRNN(len(vocab), embed_size, num_hiddens, num_layers)

def init_weights(module):
    if type(module) == nn.Linear:
        nn.init.xavier_uniform_(module.weight)
    if type(module) == nn.LSTM:
        for param in module._flat_weights_names:
            if "weight" in param:
                nn.init.xavier_uniform_(module._parameters[param])
net.apply(init_weights);
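The initialization above walks over the LSTM's private `_flat_weights_names` attribute. As a sketch of an alternative (our assumption, not the approach used in this section), the same weight matrices can be reached through the public `named_parameters()` method. The example below applies Xavier uniform initialization to a standalone two-layer bidirectional LSTM and checks that the sampled values stay within the Xavier bound sqrt(6 / (fan_in + fan_out)).

```python
import math
import torch
from torch import nn

torch.manual_seed(0)
embed_size, num_hiddens = 100, 100
lstm = nn.LSTM(embed_size, num_hiddens, num_layers=2, bidirectional=True)

# Collect the weight matrices (not the biases) via the public API
weight_names = [name for name, _ in lstm.named_parameters()
                if "weight" in name]
for name, param in lstm.named_parameters():
    if "weight" in name:
        nn.init.xavier_uniform_(param)

# 2 layers x 2 directions x (input-to-hidden, hidden-to-hidden)
print(len(weight_names))

# For a 2-D weight, fan_out is the number of rows, fan_in the number of columns
w = lstm.weight_ih_l0
fan_out, fan_in = w.shape
bound = math.sqrt(6.0 / (fan_in + fan_out))
print(tuple(w.shape), w.abs().max().item() <= bound)
```

Note that the first dimension of each LSTM weight is 4 * num_hiddens because the parameters of the four gates are stacked into one matrix.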
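16.2.2. Loading Pretrained Word Vectors
Below we load the pretrained 100-dimensional GloVe embeddings (the dimension needs to be consistent with embed_size) for the tokens in the vocabulary.
Print the shape of the vectors for all the tokens in the vocabulary.
We use these pretrained word vectors to represent tokens in the reviews and will not update these vectors during training.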
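Keeping the word vectors fixed during training amounts to copying them into the embedding layer and turning off gradient tracking for that weight. The following is a minimal PyTorch sketch of this idea. The random matrix `embeds` is only a stand-in for real GloVe vectors, and the small vocabulary size and the mean-pooling classifier are assumptions made to keep the example self-contained.

```python
import torch
from torch import nn

torch.manual_seed(0)
vocab_size, embed_size = 10, 100

# Stand-in for a (vocab_size, embed_size) matrix of pretrained vectors;
# in practice this would hold the GloVe vector of every vocabulary token
embeds = torch.randn(vocab_size, embed_size)

embedding = nn.Embedding(vocab_size, embed_size)
embedding.weight.data.copy_(embeds)       # overwrite the random initialization
embedding.weight.requires_grad = False    # freeze: no updates during training

# A toy downstream layer (hypothetical) to show where gradients flow
decoder = nn.Linear(embed_size, 2)
tokens = torch.tensor([[1, 2, 3]])        # (batch size, no. of time steps)
decoder(embedding(tokens).mean(dim=1)).sum().backward()

# The frozen embedding receives no gradient; the decoder does
print(embedding.weight.grad is None, decoder.weight.grad is not None)
```

Because the frozen weight never receives a gradient, an optimizer constructed over all model parameters leaves it unchanged.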
16.2.3. Training and Evaluating the Model
Now we can train the bidirectional RNN for sentiment analysis.
lr, num_epochs = 0.01, 5
trainer = torch.optim.Adam(net.parameters(), lr=lr)
loss = nn.CrossEntropyLoss(reduction="none")
d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)
Output of the PyTorch run:
loss 0.311, train acc 0.872, test acc 0.850
574.5 examples/sec on [device(type='cuda', index=0), device(type='cuda', index=1)]
Output of the corresponding MXNet run:
loss 0.428, train acc 0.806, test acc 0.791
488.5 examples/sec on [gpu(0), gpu(1)]
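The loss in the training code is created with `reduction="none"`, which makes it return one loss value per example instead of a single averaged scalar, so the training loop can aggregate the values itself. The sketch below illustrates this on made-up logits and labels (both are assumptions for illustration); it also shows how an accuracy like the one reported above can be computed from the logits.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Made-up scores for 4 examples and 2 classes, with made-up labels
logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])

# One loss value per example
per_example = nn.CrossEntropyLoss(reduction="none")(logits, labels)
print(per_example.shape)

# Averaging the per-example losses recovers the default ("mean") reduction
mean_loss = nn.CrossEntropyLoss()(logits, labels)
print(torch.allclose(per_example.mean(), mean_loss))

# Accuracy: fraction of examples whose highest-scoring class is the label
accuracy = (logits.argmax(dim=1) == labels).float().mean().item()
print(accuracy)
```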
We define the following function to predict the sentiment of a text sequence using the trained model net.
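A prediction function of this kind could look like the sketch below. It is not the original listing, and it rests on several assumptions: the text is tokenized by whitespace, the vocabulary is a plain dictionary from tokens to indices (`token_to_idx`, with index 0 for unknown tokens), and class index 1 means "positive". `ToyNet` is a hypothetical stand-in for a trained model, used only so that the example runs on its own.

```python
import torch
from torch import nn


def predict_sentiment(net, token_to_idx, sequence, unk_idx=0):
    """Return 'positive' or 'negative' for a whitespace-tokenized string."""
    ids = [token_to_idx.get(tok, unk_idx) for tok in sequence.split()]
    inputs = torch.tensor(ids).reshape(1, -1)  # (1, no. of time steps)
    net.eval()
    with torch.no_grad():
        label = torch.argmax(net(inputs), dim=1).item()
    # Assumption: class index 1 is "positive", 0 is "negative"
    return "positive" if label == 1 else "negative"


class ToyNet(nn.Module):
    """Hypothetical stand-in for a trained model: sums fixed token scores."""

    def __init__(self):
        super().__init__()
        # Rows: <unk>, "great", "bad"; columns: (negative, positive) scores
        self.scores = nn.Embedding.from_pretrained(
            torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]))

    def forward(self, inputs):
        # (batch size, no. of time steps, 2) -> (batch size, 2)
        return self.scores(inputs).sum(dim=1)


toy_vocab = {"<unk>": 0, "great": 1, "bad": 2}
toy_net = ToyNet()
print(predict_sentiment(toy_net, toy_vocab, "this movie is so great"))
print(predict_sentiment(toy_net, toy_vocab, "this movie is so bad"))
```

With the trained model, the same function would be called with `net` and a token-to-index mapping built from the training vocabulary, after moving the input tensor to the model's device.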