TensorFlow using LSTMs for generating text
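I want to use TensorFlow to generate text, and I have been modifying the LSTM tutorial code (https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html#recurrent-neural-networks) to do it. However, my first attempt seems to produce nonsense, and it does not improve even after long training. I don't understand why. The idea is to start from a matrix of zeros and then generate one word at a time.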
Here is the code. I added the two functions below to
https://tensorflow.googlesource.com/tensorflow/ /master/tensorflow/models/rnn/ptb/ptb_word_lm.py
The generator looks like this:
```python
def generate_text(session, m, eval_op):
    state = m.initial_state.eval()
    x = np.zeros((m.batch_size, m.num_steps), dtype=np.int32)
    output = str()
    for i in xrange(m.batch_size):
        for step in xrange(m.num_steps):
            try:
                # Run the batch.
                # targets have to be set, but m is the validation model, so it should not train the network
                cost, state, _, probabilities = session.run(
                    [m.cost, m.final_state, eval_op, m.probabilities],
                    {m.input_data: x, m.targets: x, m.initial_state: state})
                # Sample a word id and add it to the matrix and the output
                word_id = sample(probabilities[0, :])
                output = output + " " + reader.word_from_id(word_id)
                x[i][step] = word_id
            except ValueError as e:
                print("ValueError")
    print(output)
```
I added the variable `probabilities` to `ptb_model`; it is just a softmax over the logits:
```python
self._probabilities = tf.nn.softmax(logits)
```
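For reference, `tf.nn.softmax` turns each row of logits into a probability distribution. Here is a minimal pure-Python sketch of the same computation for a single row; the name `softmax` is only illustrative, not a TensorFlow API. It subtracts the largest logit first so that `exp` cannot overflow:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # shift so the largest exponent is exp(0) = 1
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

Subtracting the maximum does not change the result mathematically, because the common factor `exp(-m)` cancels between numerator and denominator. Floating-point rounding still means the outputs need not sum to exactly 1.0.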
And the sampling function:
```python
def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))
```
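The `temperature` argument rescales the distribution before sampling: `exp(log(p) / T)` is the same as `p ** (1 / T)`, which is then renormalized. Values below 1 sharpen the distribution toward the most likely word, and values above 1 flatten it toward uniform. A small stdlib-only sketch (the helper name `apply_temperature` is hypothetical) makes the effect visible:

```python
def apply_temperature(probs, temperature=1.0):
    """Return probs ** (1 / temperature), renormalized to sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


probs = [0.5, 0.3, 0.2]
print(apply_temperature(probs, 1.0))  # unchanged, up to rounding
print(apply_temperature(probs, 0.5))  # sharper: mass moves toward index 0
print(apply_temperature(probs, 2.0))  # flatter: closer to uniform
```

Working with `p ** (1 / T)` directly also avoids the divide-by-zero warning that `np.log` emits when some probabilities are exactly 0.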
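I have been working toward exactly the same goal and just got it to work. You have made many of the right modifications here, but I think you have missed a few steps.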
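First, to generate text you need to create a different version of the model that represents only a single time step. The reason is that we need to sample each output y before we can feed it into the next step of the model. I did this by making a new config that sets `num_steps` and `batch_size` to 1: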
```python
class SmallGenConfig(object):
    """Small config for generation."""
    init_scale = 0.1
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 2
    num_steps = 1  # this is the main difference
    hidden_size = 200
    max_epoch = 4
    max_max_epoch = 13
    keep_prob = 1.0
    lr_decay = 0.5
    batch_size = 1
    vocab_size = 10000
```
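Why `num_steps = 1` matters: generation is autoregressive, so each step needs the word sampled at the previous step as its input, plus the recurrent state carried over from that step. The loop below sketches that control flow with a toy stand-in for the LSTM. `toy_step` and `generate` are hypothetical names; in the real code the step is a `session.run` call on the single-step model.

```python
import random


def toy_step(token, state):
    """Hypothetical stand-in for one LSTM step.

    Returns a distribution over a 3-word vocabulary and the new state.
    Here the "state" just counts how many steps have run.
    """
    probs = [0.0, 0.0, 0.0]
    probs[(token + 1) % 3] = 1.0  # deterministic: always predict token + 1
    return probs, state + 1


def generate(step_fn, start_token, initial_state, length, rng):
    tokens = []
    token, state = start_token, initial_state
    for _ in range(length):
        probs, state = step_fn(token, state)  # carry the state forward
        token = rng.choices(range(len(probs)), weights=probs)[0]
        tokens.append(token)  # the sampled token becomes the next input
    return tokens, state


tokens, final_state = generate(toy_step, 0, 0, 5, random.Random(0))
print(tokens, final_state)  # [1, 2, 0, 1, 2] 5
```

With a multi-step model you would have to supply all `num_steps` inputs up front, before any of them has been sampled, which is why the question's version feeds in mostly zeros.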
I also added a probability output to the model with the following lines:
```python
self._output_probs = tf.nn.softmax(logits)
```
and
```python
@property
def output_probs(self):
    return self._output_probs
```
Then my `generate_text` function starts like this:
```python
def generate_text(train_path, model_path, num_sentences):
    gen_config = SmallGenConfig()

    with tf.Graph().as_default(), tf.Session() as session:
        initializer = tf.random_uniform_initializer(-gen_config.init_scale,
                                                    gen_config.init_scale)
        with tf.variable_scope("model", reuse=None, initializer=initializer):
            m = PTBModel(is_training=False, config=gen_config)

        # Restore variables from disk.
        saver = tf.train.Saver()
        saver.restore(session, model_path)
        print("Model restored from file " + model_path)
```
The second difference is that I get a lookup table from word ids to word strings (I had to write this function; see the code below):
```python
words = reader.get_vocab(train_path)
```
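I set up the initial state the same way you do, but I set the initial token differently. I wanted to start from the end-of-sentence token so that my sentences begin with the right kind of word. I looked through the word index and found that `<eos>` has id 2: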
```python
state = m.initial_state.eval()
x = 2  # the id for '<eos>' from the training set
input = np.matrix([[x]])  # a 2D numpy matrix
```
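Finally, here is the part where we generate sentences. Note that we tell `session.run` to compute `output_probs` and `final_state`, and we feed it the current input and the state from the previous step. On the first iteration the input is `<eos>` and the state is the initial state; after that, the input is the word we just sampled.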
```python
text = ""
count = 0
while count < num_sentences:
    output_probs, state = session.run([m.output_probs, m.final_state],
                                      {m.input_data: input,
                                       m.initial_state: state})
    x = sample(output_probs[0], 0.9)
    if words[x] == "<eos>":
        text += ".\n\n"
        count += 1
    else:
        text += " " + words[x]
    # now feed this new word as input into the next iteration
    input = np.matrix([[x]])
```
Then all we need to do is print out the text we accumulated:
```python
print(text)
return
```
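Building the output by string concatenation makes spacing around the periods fiddly. If you collect the sampled words in a list instead, a small helper can format them afterwards. `tokens_to_text` is hypothetical, not part of the tutorial:

```python
def tokens_to_text(tokens, eos="<eos>"):
    """Join sampled words into sentences, ending one at each end-of-sentence token."""
    sentences, current = [], []
    for word in tokens:
        if word == eos:
            sentences.append(" ".join(current) + ".")
            current = []
        else:
            current.append(word)
    if current:  # keep a trailing, unfinished sentence without a period
        sentences.append(" ".join(current))
    return "\n\n".join(sentences)


print(tokens_to_text(["the", "cat", "sat", "<eos>", "it", "ran", "<eos>"]))
```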
Finally, here is the `get_vocab` function, which I added to `reader.py`:
```python
def get_vocab(filename):
    data = _read_words(filename)

    counter = collections.Counter(data)
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

    words, _ = list(zip(*count_pairs))

    return words
```
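`get_vocab` orders words by descending frequency and breaks ties alphabetically, which appears to be the same ordering the tutorial's reader uses when it assigns ids, so a word's position in the returned tuple is its id. That also means you can invert the tuple and look up the id of `<eos>` instead of hard-coding `x = 2`. The sketch below assumes a token list rather than a file, and `build_vocab` is a hypothetical name:

```python
import collections


def build_vocab(tokens):
    """Same ordering as get_vocab, but taking a token list instead of a file."""
    counter = collections.Counter(tokens)
    # Most frequent first; ties broken alphabetically.
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    words, _ = list(zip(*count_pairs))
    word_to_id = {w: i for i, w in enumerate(words)}
    return words, word_to_id


tokens = "the cat sat <eos> the dog sat <eos> the end <eos>".split()
words, word_to_id = build_vocab(tokens)
eos_id = word_to_id["<eos>"]  # look the id up instead of hard-coding it
print(words[eos_id], eos_id)  # <eos> 0
```

Here `<eos>` and `the` both occur three times, and `<eos>` sorts first because `<` comes before `t`. In the PTB data the id will differ, which is exactly why a lookup is safer than a constant.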
The last thing you need to do is save the model after training it, which looks like this:
```python
save_path = saver.save(session, "/tmp/model.ckpt")
```
That is the checkpoint you will later load from disk when generating text.
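There was one more problem: I found that the probability distribution produced by TensorFlow's softmax sometimes does not sum to exactly 1.0. When the sum is greater than 1.0, `np.random.multinomial` raises a `ValueError`, so I wrote my own sampling function: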
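```python
import random

import numpy as np


def sample(a, temperature=1.0):
    # Apply the temperature and renormalize.
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    # Walk the cumulative distribution until it passes a uniform draw;
    # the fallback below means rounding errors in the sum cannot raise.
    r = random.random()  # range: [0, 1)
    total = 0.0
    for i in range(len(a)):
        total += a[i]
        if total > r:
            return i
    return len(a) - 1
```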
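A different way around the same issue is to let the standard library do the cumulative sum: `random.choices` accepts unnormalized weights, so small rounding errors in the softmax output do not matter. This is a substitute technique, not the function used above, and `sample_robust` is an illustrative name:

```python
import random


def sample_robust(probs, temperature=1.0, rng=random):
    """Sample an index from (possibly slightly unnormalized) probabilities."""
    # p ** (1 / T) equals exp(log(p) / T) but avoids log(0) for zero entries.
    weights = [float(p) ** (1.0 / temperature) for p in probs]
    # choices() divides by the total itself, so the weights need not sum to 1.0.
    return rng.choices(range(len(weights)), weights=weights)[0]


# Probabilities that sum to slightly more than 1.0 are still fine.
probs = [0.2, 0.3, 0.5 + 1e-7]
rng = random.Random(42)
draws = [sample_robust(probs, 0.9, rng) for _ in range(1000)]
print([draws.count(i) for i in range(3)])
```

`float(p)` converts NumPy scalars such as `float32`, so the function can take a row of `output_probs` directly.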
When you put all of this together, the small model was able to generate some great sentences for me. Good luck.
I used your code and it didn't seem right, so I modified it a little, and now it seems to work.
Here is my code, though I'm not sure it is entirely correct:
```python
def generate_text(session, m, eval_op, word_list):
    output = []
    for i in xrange(20):
        state = m.initial_state.eval()
        x = np.zeros((1, 1), dtype=np.int32)
        y = np.zeros((1, 1), dtype=np.int32)
        output_str = ""
        for step in xrange(100):
            # Run one step.
            # targets have to be set, but m is the validation model, so it should not train the network
            cost, state, _, probabilities = session.run(
                [m.cost, m.final_state, eval_op, m.probabilities],
                {m.input_data: x, m.targets: y, m.initial_state: state})
            # Sample a word id and add it to the matrix and the output
            word_id = sample(probabilities[0, :])
            # skip ids that are not valid indexes into word_list
            if (word_id < 0) or (word_id >= len(word_list)):
                continue
            output_str = output_str + " " + word_list[word_id]
            x[0][0] = word_id
        print(output_str)
        output.append(output_str)
    return output
```