TensorFlow using LSTMs for generating text

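I would like to use TensorFlow to generate text and have been modifying the LSTM tutorial (https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html#recurrent-neural-networks) code to do this. However, my initial solution seems to generate nonsense, and it does not improve even after training for a long time. I fail to see why. The idea is to start with a zero matrix and then generate one word at a time.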

This is the code, to which I have added the two functions below:
https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/rnn/ptb/ptb_word_lm.py

The generator looks as follows:

def generate_text(session, m, eval_op):

    state = m.initial_state.eval()

    x = np.zeros((m.batch_size, m.num_steps), dtype=np.int32)

    output = str()
    for i in xrange(m.batch_size):
        for step in xrange(m.num_steps):
            try:
                # Run the batch
                # targets have to be set, but m is the validation model, thus it should not train the neural network
                cost, state, _, probabilities = session.run([m.cost, m.final_state, eval_op, m.probabilities],
                                                            {m.input_data: x, m.targets: x, m.initial_state: state})

                # Sample a word-id and add it to the matrix and output
                word_id = sample(probabilities[0, :])
                output = output + " " + reader.word_from_id(word_id)
                x[i][step] = word_id

            except ValueError as e:
                print("ValueError")

    print(output)

I have added the variable "probabilities" to ptb_model; it is simply a softmax over the logits.

self._probabilities = tf.nn.softmax(logits)

And the sampling:

def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))

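I have been working toward the exact same goal and just got it to work. You have many of the right modifications here, but I think you missed a few steps.

First, to generate text you need to create a different version of the model that represents just a single time step. The reason is that each output y has to be sampled before it can be fed into the next step of the model. I did this by making a new config that sets num_steps and batch_size both equal to 1.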

class SmallGenConfig(object):
    """Small config. for generation"""
    init_scale = 0.1
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 2
    num_steps = 1  # this is the main difference
    hidden_size = 200
    max_epoch = 4
    max_max_epoch = 13
    keep_prob = 1.0
    lr_decay = 0.5
    batch_size = 1
    vocab_size = 10000

I also added a probabilities output to the model with these lines:

self._output_probs = tf.nn.softmax(logits)

and

@property
def output_probs(self):
    return self._output_probs

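Then there are a few differences in my generate_text() function. The first is that I load the saved model parameters from disk using a tf.train.Saver() object. Note that we do this after instantiating PTBModel with the new config from above.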

def generate_text(train_path, model_path, num_sentences):
    gen_config = SmallGenConfig()

    with tf.Graph().as_default(), tf.Session() as session:
        initializer = tf.random_uniform_initializer(-gen_config.init_scale,
                                                    gen_config.init_scale)
        with tf.variable_scope("model", reuse=None, initializer=initializer):
            m = PTBModel(is_training=False, config=gen_config)

        # Restore variables from disk.
        saver = tf.train.Saver()
        saver.restore(session, model_path)
        print("Model restored from file " + model_path)

The second difference is that I get the lookup table from ids to word strings (I had to write this function, see the code below).

words = reader.get_vocab(train_path)

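I set up the initial state the same way you do, but I set up the initial token differently. I want to use the "end of sentence" token so that my sentence starts with the right kind of word. I looked through the word index and found that <eos> happens to have index 2 (the ordering is deterministic), so I just hard-coded it. Finally, I wrap it in a 1x1 NumPy matrix so that it has the right type for the model inputs.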

state = m.initial_state.eval()
x = 2 # the id for '<eos>' from the training set
input = np.matrix([[x]]) # a 2D numpy matrix
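
As a side note, a plain 2-D integer array should work just as well as np.matrix here, since all the placeholder needs is a (batch_size, num_steps) = (1, 1) input. A minimal sketch, assuming the model input is an int32 placeholder as in the tutorial; the names `eos_id` and `model_input` are mine, not from the PTB code:

```python
import numpy as np

eos_id = 2  # the same hardcoded '<eos>' id used in the snippet above

# A plain 2-D array with shape (1, 1): one batch entry, one time step.
model_input = np.array([[eos_id]], dtype=np.int32)
```

This also avoids shadowing Python's built-in input().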

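Finally, here is the part where we generate sentences. Note that we tell session.run() to compute output_probs and final_state, and we give it the input and the state. In the first iteration the input is <eos> and the state is initial_state, but in subsequent iterations we feed in our last sampled output and pass the state along from the previous iteration. Note also that we use the words list to look up the word string for the output index.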

text = ""
count = 0
while count < num_sentences:
    output_probs, state = session.run([m.output_probs, m.final_state],
                                      {m.input_data: input,
                                       m.initial_state: state})
    x = sample(output_probs[0], 0.9)
    if words[x] == "<eos>":
        text += ".\n\n"
        count += 1
    else:
        text += " " + words[x]
    # now feed this new word as input into the next iteration
    input = np.matrix([[x]])
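
To make the feedback loop above concrete without needing a trained checkpoint, here is a minimal sketch that swaps the LSTM for a made-up table of next-word probabilities. The `transition` matrix, the four-word vocabulary, and the `generate` helper are all invented for illustration. In the real model the probabilities come from session.run() and the LSTM state is carried along as well, which this toy has no equivalent of.

```python
import numpy as np

# Toy stand-in for the trained network: row i holds the probability of each
# next word given that the current word has id i. These values are invented.
words = ["the", "cat", "<eos>", "sat"]
transition = np.array([
    [0.0, 0.7, 0.0, 0.3],   # after "the"
    [0.0, 0.0, 0.2, 0.8],   # after "cat"
    [0.9, 0.1, 0.0, 0.0],   # after "<eos>"
    [0.3, 0.0, 0.7, 0.0],   # after "sat"
])

def generate(num_sentences, seed=0):
    """Sample one word at a time, feeding each sample back in as the next input."""
    rng = np.random.default_rng(seed)
    x = words.index("<eos>")          # start from the end-of-sentence token
    sentences, current = [], []
    while len(sentences) < num_sentences:
        probs = transition[x]         # in the real model: session.run(...)
        x = int(rng.choice(len(words), p=probs))
        if words[x] == "<eos>":
            sentences.append(" ".join(current) + ".")
            current = []
        else:
            current.append(words[x])
    return sentences

sentences = generate(3)
```

The important part is that x is overwritten by the sampled id on every pass, so each prediction is conditioned on what was actually generated, not on a fixed input.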

Then all we have to do is print out the text we accumulated.

print(text)
return

That's it for the generate_text() function.

Finally, let me show you the function definition for get_vocab(), which I put in reader.py.

def get_vocab(filename):
    data = _read_words(filename)

    counter = collections.Counter(data)
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

    words, _ = list(zip(*count_pairs))

    return words
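
As an illustration of why the word ids are deterministic, here is a small sketch of the same sorting logic run on a made-up token list (the `build_vocab` name and the corpus are hypothetical). Because ties in frequency are broken alphabetically, the same training file always yields the same ids, so the <eos> id could also be looked up by name rather than hardcoded.

```python
import collections

def build_vocab(tokens):
    """Return words ordered by descending count, ties broken alphabetically."""
    counter = collections.Counter(tokens)
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    return [word for word, _ in count_pairs]

# Tiny made-up corpus; as far as I can tell, the PTB reader turns
# newlines into "<eos>" tokens before counting.
tokens = "the cat sat <eos> the dog sat <eos> the end <eos>".split()
vocab = build_vocab(tokens)

# Build the reverse mapping and look the id up instead of hardcoding it.
word_to_id = {word: i for i, word in enumerate(vocab)}
eos_id = word_to_id["<eos>"]
```

The id you get on the real PTB training data will differ from this toy corpus, which is exactly why a lookup is a bit safer than a literal.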

The last thing you need is to save the model after training it, which looks like

save_path = saver.save(session, "/tmp/model.ckpt")

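That is the model you will load from disk later when generating text.

There was one more problem: I found that the probability distribution produced by the TensorFlow softmax function sometimes doesn't sum exactly to 1.0. When the sum is larger than 1.0, np.random.multinomial() throws an error. So I had to write my own sampling function, which looks like this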

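import random

import numpy as np

def sample(a, temperature=1.0):
    # rescale the distribution by the temperature and renormalize
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    # walk the cumulative distribution until it passes a uniform draw
    r = random.random()  # range: [0,1)
    total = 0.0
    for i in range(len(a)):
        total += a[i]
        if total > r:
            return i
    # fall back to the last index if rounding kept the total below r
    return len(a) - 1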
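
Another way to sidestep the sum-greater-than-1.0 error, sketched here under the assumption that the cause is float32 rounding in the softmax output, is to cast to float64 and renormalize before handing the vector to NumPy. The name `sample_renormalized` and the `rng` parameter are made up for this illustration and are not part of the PTB code.

```python
import numpy as np

def sample_renormalized(probs, temperature=1.0, rng=None):
    """Sample an index from `probs` after temperature scaling (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # Work in float64 so that rounding error in the renormalized sum stays tiny.
    p = np.asarray(probs, dtype=np.float64)
    # Clip to avoid log(0) = -inf for words with zero probability.
    logits = np.log(np.clip(p, 1e-12, None)) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

# Example: float32 probabilities like those a softmax op would return.
probs = np.array([0.1, 0.2, 0.7], dtype=np.float32)
idx = sample_renormalized(probs, temperature=0.9, rng=np.random.default_rng(0))
```

Lower temperatures push the choice toward the most likely word, and higher ones flatten the distribution, same as in the hand-written loop.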

When you put all this together, the small model was able to generate some cool sentences for me. Good luck.

I used your code and it did not seem right, so I modified it a little and now it seems to work.
Here is my code, though I am not sure it is correct:

def generate_text(session, m, eval_op, word_list):
    output = []
    for i in xrange(20):
        state = m.initial_state.eval()
        x = np.zeros((1, 1), dtype=np.int32)
        y = np.zeros((1, 1), dtype=np.int32)
        output_str = ""
        for step in xrange(100):
            if True:
                # Run the batch
                # targets have to be set, but m is the validation model, thus it should not train the neural network
                cost, state, _, probabilities = session.run([m.cost, m.final_state, eval_op, m.probabilities],
                                                            {m.input_data: x, m.targets: y, m.initial_state: state})
                # Sample a word-id and add it to the matrix and output
                word_id = sample(probabilities[0, :])
                # skip ids outside the vocabulary (>= so that len(word_list) itself is rejected)
                if (word_id < 0) or (word_id >= len(word_list)):
                    continue
                #print(word_id)
                output_str = output_str + " " + word_list[word_id]
                x[0][0] = word_id
        print(output_str)
        output.append(output_str)
    return output