A Simple BiLSTM Text Generation Implementation in PyTorch

1. Preparing the data

seq = "I love you. Chinese vocabulary is generally used to express one's feelings to another person whom one admires. It can also be used among relatives. It is the expression of one person's feelings to another. It can also be used to express things with strong feelings, such as pets and goods. It can be said by boys to girls, girls to boys, girls to girls, boys to boys."

Next, convert the text to lowercase, remove punctuation while keeping spaces, and build a character index table, as shown below:

index2word = {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g', 7: 'h', 8: 'i', 9: 'j', 10: 'k', 11: 'l', 12: 'm', 13: 'n', 14: 'o', 15: 'p', 16: 'q', 17: 'r', 18: 's', 19: 't', 20: 'u', 21: 'v', 22: 'w', 23: 'x', 24: 'y', 25: 'z', 26: ' '}
word2index = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6, 'h': 7, 'i': 8, 'j': 9, 'k': 10, 'l': 11, 'm': 12, 'n': 13, 'o': 14, 'p': 15, 'q': 16, 'r': 17, 's': 18, 't': 19, 'u': 20, 'v': 21, 'w': 22, 'x': 23, 'y': 24, 'z': 25, ' ': 26}
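The cleaning step and both tables can also be generated in code instead of written out by hand. This is a minimal sketch. It assumes the vocabulary is exactly the 26 lowercase letters plus a space, as above. The helper names `build_vocab` and `clean` are illustrative and are not taken from the original code.

```python
import string

def build_vocab():
    """Return (word2index, index2word) for the letters a-z followed by a space."""
    letters = list(string.ascii_lowercase) + [' ']
    word2index = {ch: i for i, ch in enumerate(letters)}
    index2word = {i: ch for ch, i in word2index.items()}
    return word2index, index2word

def clean(text, vocab):
    """Lowercase the text and drop every character that is not in the vocabulary."""
    return ''.join(ch for ch in text.lower() if ch in vocab)

word2index, index2word = build_vocab()
print(clean("I love you. It's fine!", word2index))  # i love you its fine
```

Dropping punctuation also removes apostrophes, so "one's" in the training text turns into "ones".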

Then represent each character of seq by its index, for example:

"i love" = ['i', ' ', 'l', 'o', 'v', 'e'] = [8, 26, 11, 14, 21, 4]
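The mapping can be checked in both directions. This is a small sketch. It assumes the same 27-symbol table as above, and the names `encode` and `decode` are hypothetical.

```python
import string

letters = list(string.ascii_lowercase) + [' ']
word2index = {ch: i for i, ch in enumerate(letters)}
index2word = {i: ch for ch, i in word2index.items()}

def encode(text):
    """Map each character to its index."""
    return [word2index[ch] for ch in text]

def decode(indices):
    """Map indices back to a string."""
    return ''.join(index2word[i] for i in indices)

ids = encode("i love")
print(ids)          # [8, 26, 11, 14, 21, 4]
print(decode(ids))  # i love
```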

Finally, choose a window size. For example, to predict the next letter from every 5 letters, set window = 5, as illustrated in the figure:
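The windowing step can be sketched as follows, using the same logic as the full code. Each run of `window` indices becomes one input, and the index that follows it becomes the target. The last window has nothing after it, so, as in the full code, it is labelled with the index of the space character. The helper name `make_windows` is hypothetical.

```python
def make_windows(seq_index, window, pad_index):
    """Split an index sequence into (input window, next index) pairs.

    The final window has no following character, so its target is pad_index
    (the full code uses word2index[' '], i.e. 26, for this).
    """
    xs, ys = [], []
    for i in range(len(seq_index) - window + 1):
        xs.append(seq_index[i:i + window])
        nxt = i + window
        ys.append(seq_index[nxt] if nxt < len(seq_index) else pad_index)
    return xs, ys

# "i love" encoded, with a window of 3
xs, ys = make_windows([8, 26, 11, 14, 21, 4], window=3, pad_index=26)
for x, y in zip(xs, ys):
    print(x, '->', y)
```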

2. The model

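The input indices go through an embedding layer, which turns each character into a vector.
The model stacks a bidirectional LSTM followed by a unidirectional LSTM. The last hidden state of the second LSTM (its output at the final time step) is the input to a fully connected layer.
The model diagram is shown below: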
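The tensor shapes through this architecture can be traced with a small sketch. It uses the same sizes as the full code (vocabulary 27, embedding 16, hidden 32). It also assumes `batch_first=True`, which replaces the `permute(1, 0, 2)` used in the full code; that setting is an illustrative choice, not the original's. The last line confirms that, for a single-layer unidirectional LSTM, the output at the last time step equals the final hidden state `h_n`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embedding_size, n_hidden = 27, 16, 32
batch, window = 4, 3

embed = nn.Embedding(vocab_size, embedding_size)
# batch_first=True keeps tensors as (batch, seq, features), so no permute is needed
bilstm = nn.LSTM(embedding_size, n_hidden, batch_first=True, bidirectional=True)
lstm = nn.LSTM(2 * n_hidden, 2 * n_hidden, batch_first=True)
fc = nn.Linear(2 * n_hidden, vocab_size)

x = torch.randint(0, vocab_size, (batch, window))  # (4, 3) character indices
e = embed(x)                                       # (4, 3, 16)
b_out, _ = bilstm(e)                               # (4, 3, 64): forward and backward states concatenated
l_out, (h_n, _) = lstm(b_out)                      # (4, 3, 64)
last = l_out[:, -1]                                # (4, 64): output at the last time step
logits = fc(last)                                  # (4, 27): one score per character

print(e.shape, b_out.shape, l_out.shape, logits.shape)
print(torch.allclose(last, h_n[-1]))               # True
```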

3. Full code

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data

seq = "I love you. Chinese vocabulary is generally used to express one's feelings to another person whom one admires. It can also be used among relatives. It is the expression of one person's feelings to another. It can also be used to express things with strong feelings, such as pets and goods. It can be said by boys to girls, girls to boys, girls to girls, boys to boys."
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ']

# Lowercase and drop punctuation (keep only letters and spaces)
seq_lower = [ch for ch in seq.lower() if ch in letters]

# Character <-> index tables
word2index = {ch: idx for idx, ch in enumerate(letters)}
index2word = {idx: ch for ch, idx in word2index.items()}

# Encode the sentence as indices
seq_index = [word2index[ch] for ch in seq_lower]
seq_length = len(seq_index)
window = 3

# Build (input window, next character) pairs
batch_x = []
batch_y = []
for i in range(seq_length - window + 1):
    x = seq_index[i: i + window]
    if i + window >= seq_length:
        # The last window has no next character, so label it with a space
        y = word2index[' ']
    else:
        y = seq_index[i + window]
    batch_x.append(x)
    batch_y.append(y)

# Training data (Variable is deprecated; plain tensors work directly)
batch_x, batch_y = torch.LongTensor(batch_x), torch.LongTensor(batch_y)

# Hyperparameters
vocab_size = len(letters)
embedding_size = 16
n_hidden = 32
batch_size = 10
num_classes = vocab_size

dataset = Data.TensorDataset(batch_x, batch_y)
loader = Data.DataLoader(dataset, batch_size, shuffle=True)

# Model
class BiLSTM(nn.Module):
    def __init__(self):
        super(BiLSTM, self).__init__()
        self.word_vec = nn.Embedding(vocab_size, embedding_size)
        # bidirectional=True makes this LSTM bidirectional
        self.bilstm = nn.LSTM(embedding_size, n_hidden, 1, bidirectional=True)
        self.lstm = nn.LSTM(2 * n_hidden, 2 * n_hidden, 1, bidirectional=False)
        self.fc = nn.Linear(n_hidden * 2, num_classes)

    def forward(self, input):
        embedding_input = self.word_vec(input)  # (batch, window, embedding_size)
        # Swap dims 0 and 1: nn.LSTM expects (seq_len, batch, features) by default
        embedding_input = embedding_input.permute(1, 0, 2)
        bilstm_output, (h_n1, c_n1) = self.bilstm(embedding_input)  # (window, batch, 2*n_hidden)
        lstm_output, (h_n2, c_n2) = self.lstm(bilstm_output)        # (window, batch, 2*n_hidden)
        fc_out = self.fc(lstm_output[-1])  # last time step -> (batch, num_classes)
        return fc_out

model = BiLSTM()
print(model)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training
for epoch in range(50):
    cost = 0
    for input_batch, target_batch in loader:
        pred = model(input_batch)
        loss = criterion(pred, target_batch)
        cost += loss.item()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print("Epoch: %d,  loss: %.5f " % (epoch, cost))

# Test: predict the letter that follows 'lov'
test_text = 'lov'
test_batch = [[word2index[i] for i in test_text]]
test_batch = torch.LongTensor(test_batch)
model.eval()
with torch.no_grad():
    out = model(test_batch)
predict = torch.max(out, 1)[1].item()
print(test_text, "-> next letter:", index2word[predict])
```
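The code above predicts only a single letter. To generate longer text, you can append each prediction to the input and slide the window forward. This is a minimal greedy sketch. It assumes a model with the same interface as `BiLSTM` above: a LongTensor of shape (1, window) goes in, and logits of shape (1, vocab_size) come out. The helper `generate` and the class `StandIn` are hypothetical. `StandIn` is an untrained placeholder that exists only so the block runs on its own, so its output is gibberish. With the trained model you would call `generate(model, 'lov', 20, window)`.

```python
import string
import torch
import torch.nn as nn

letters = list(string.ascii_lowercase) + [' ']
word2index = {ch: i for i, ch in enumerate(letters)}
index2word = {i: ch for ch, i in word2index.items()}

def generate(model, seed, n_chars, window):
    """Greedily extend `seed` by n_chars letters, one prediction at a time."""
    model.eval()
    text = seed
    with torch.no_grad():
        for _ in range(n_chars):
            ctx = [word2index[ch] for ch in text[-window:]]  # last `window` letters
            logits = model(torch.LongTensor([ctx]))          # (1, vocab_size)
            text += index2word[logits.argmax(dim=1).item()]  # append the most likely letter
    return text

# Untrained stand-in with the same input/output shapes as BiLSTM (illustration only)
class StandIn(nn.Module):
    def __init__(self, vocab_size=27, emb=16, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, vocab_size)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return self.fc(out[:, -1])

torch.manual_seed(0)
print(generate(StandIn(), 'lov', n_chars=10, window=3))
```

Greedy decoding always picks the most likely letter, so it tends to repeat itself on a tiny corpus. Sampling from `torch.softmax(logits, dim=1)` would be a common alternative.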
