This article is based on a post from the Google AI Blog. Open-domain dialogue research aims to develop a chatbot that is not specialized for a particular task but can still chat with users about virtually anything. Today’s chatbots often give responses that do not make sense.
Meena is an end-to-end trained neural conversational model with 2.6 billion parameters that learns to respond sensibly to a given conversational context. For open-domain chatbots, the paper proposes a human evaluation metric, the Sensibleness and Specificity Average (SSA), which captures basic but important attributes of natural conversation.
Perplexity is an automatic metric available to any neural conversational model. It measures the uncertainty of a language model and correlates strongly with SSA: the lower the perplexity, the better the model’s SSA score. Experiments show that Meena’s SSA is approaching that of humans. In subsequent work, other attributes such as personality and factuality will also be considered.
Original text from Google AI Blog by Daniel Adiwardana
The following is his explanation of ‘Meena’ from the blog.
Modern chatbots tend to be highly specialized: they perform well as long as users do not stray too far from the expected usage. To better handle a wide variety of conversational topics, open-domain dialogue research explores a complementary approach, attempting to develop a chatbot that is not specialized but can still chat with users about virtually anything.
Besides being a fascinating research problem, such a chatbot could lead to many interesting applications, such as further humanizing computer interactions, improving foreign language practice, and creating relatable interactive movie and video game characters.
However, current open-domain chatbots have a serious flaw: what they say often does not make sense. They sometimes contradict what they said earlier, or lack common sense and basic knowledge about the world. In addition, their responses are often not specific to the current context. For example, “I don’t know” is a sensible answer to almost any question, but it is not specific. Current chatbots give such replies far more often than humans do, because a generic reply covers many possible user inputs.
In the paper “Towards a Human-like Open-Domain Chatbot”, we present Meena, an end-to-end trained neural conversational model with 2.6 billion parameters. We show that Meena can conduct conversations that are more sensible and specific than those of existing state-of-the-art chatbots. We propose a new human evaluation metric for open-domain chatbots, the Sensibleness and Specificity Average (SSA), which captures basic but important attributes of human conversation.
Notably, we show that perplexity, an automatic metric that is readily available to any neural conversational model, correlates strongly with SSA.
Meena is an end-to-end neural conversational model that learns to respond sensibly to a given conversational context. The training objective is to minimize perplexity, that is, the uncertainty of predicting the next token (in this case, the next word in a conversation).
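To make the perplexity objective concrete, here is a minimal sketch using the standard definition: the exponential of the average negative log-probability the model assigns to each actual next token. The function name and the probability values are illustrative assumptions, not taken from the Meena paper or code.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to the true next tokens.

    token_probs is a hypothetical list of floats in (0, 1], one per token.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (cross-entropy in nats).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# Made-up probabilities: a model that is fairly sure of each next word
# versus one that is much less sure.
confident = perplexity([0.5, 0.5, 0.5, 0.5])
uncertain = perplexity([0.1, 0.1, 0.1, 0.1])
print(confident, uncertain)
```

A model that puts more probability on the words that actually follow gets a lower perplexity, which is why minimizing it is a reasonable training target.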
Specifically, Meena has a single Evolved Transformer (ET) encoder block and 13 ET decoder blocks, as shown in the figure below. The encoder processes the conversation context to help Meena understand what has already been said. The decoder then uses that information to formulate the actual response. Through hyper-parameter tuning, we found that a more powerful decoder was the key to higher conversational quality.
Figure: An example of Meena encoding a 7-turn conversation context and generating the response “The Next Generation”.
The conversations used for training are organized as tree threads, where each reply in a thread is treated as one conversation turn. We extract each training example, with up to seven turns of context, as one path through a tree thread. We chose seven turns as a good balance between having a context long enough to train a conversational model and fitting within memory constraints, since longer contexts consume more memory.
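The extraction of training examples from a reply tree can be sketched as follows. This is only an illustration under assumed data structures: the nested-dict layout, the field names `text` and `replies`, and the function `extract_examples` are hypothetical, not the authors' pipeline.

```python
def extract_examples(tree, max_context=7):
    """Return (context, response) pairs, one per reply in a reply tree.

    tree is a hypothetical nested dict: {"text": str, "replies": [subtrees]}.
    Each context holds at most max_context of the most recent turns on the
    path from the root message to the reply.
    """
    examples = []

    def walk(node, path):
        if path:  # the root message has no context, so it is not a response
            examples.append((path[-max_context:], node["text"]))
        for child in node.get("replies", []):
            walk(child, path + [node["text"]])

    walk(tree, [])
    return examples


# A made-up thread with two branches under the first message.
thread = {
    "text": "Hi!",
    "replies": [
        {"text": "Hello, how are you?",
         "replies": [{"text": "Great, thanks.", "replies": []}]},
        {"text": "Hey there.", "replies": []},
    ],
}
examples = extract_examples(thread)
for context, response in examples:
    print(context, "->", response)
```

Capping the context at `max_context` turns is what keeps long threads within a fixed memory budget.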
The Meena model has 2.6 billion parameters and is trained on 341 GB of text filtered from public-domain social media conversations. Compared with an existing state-of-the-art generative model, OpenAI GPT-2, Meena has 1.7 times the model capacity and was trained on 8.5 times as much data.
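The two ratios can be checked with quick arithmetic. The GPT-2 figures used here (about 1.5 billion parameters and about 40 GB of training text) are not stated in this article; they are the commonly cited figures for the largest GPT-2 model and are treated as assumptions.

```python
# Meena figures as stated in the article.
MEENA_PARAMS = 2.6e9
MEENA_DATA_GB = 341

# Assumed GPT-2 figures (not from this article).
GPT2_PARAMS = 1.5e9
GPT2_DATA_GB = 40

capacity_ratio = MEENA_PARAMS / GPT2_PARAMS
data_ratio = MEENA_DATA_GB / GPT2_DATA_GB
print(f"capacity: {capacity_ratio:.1f}x, data: {data_ratio:.1f}x")
```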
Human evaluation metric: SSA
SSA stands for Sensibleness and Specificity Average.
To compute SSA, we collected free-form conversations with the chatbots being tested. These include Meena and other well-known open-domain chatbots, specifically Mitsuku, Cleverbot, XiaoIce, and DialoGPT. To ensure consistency between evaluations, each conversation starts with the same greeting, “Hi!”.
For each response, crowd workers answer two questions: “Does it make sense?” and “Is it specific?”. Evaluators are asked to use common sense to judge whether a response is completely reasonable in context. If anything seems off, such as being confusing, illogical, out of context, or factually wrong, the response should be labeled “does not make sense”.
If the response makes sense, the evaluator then judges whether it is specific to the given context. For example, if A says “I love tennis” and B answers “That’s nice”, the response should be marked “not specific”, because that reply could be used in dozens of different contexts. However, if B answers “Me too, I love Federer!”, the response is marked “specific”, because it relates closely to what is being discussed.
For each chatbot, we collected between 1,600 and 2,400 individual conversation turns through about 100 conversations. Each model response is labeled by crowd workers to indicate whether it is sensible and whether it is specific. The sensibleness of a chatbot is the fraction of its responses labeled “sensible”, and its specificity is the fraction labeled “specific”. The average of these two is the SSA score. The results show that Meena far outperforms existing state-of-the-art chatbots in terms of SSA and is closing the gap with human performance.
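The SSA calculation described above can be sketched in a few lines. The label format and the function name `ssa_scores` are assumptions for illustration, and the labels are made up rather than real evaluation data. Following the two-step procedure, a response that is not sensible is never counted as specific.

```python
def ssa_scores(labels):
    """Compute (sensibleness, specificity, SSA) from crowd-worker labels.

    labels is a hypothetical list of (sensible, specific) booleans, one pair
    per chatbot response.
    """
    if not labels:
        raise ValueError("need at least one labeled response")
    total = len(labels)
    sensible_count = sum(1 for sensible, _ in labels if sensible)
    # Specificity is only judged for responses that already make sense.
    specific_count = sum(1 for sensible, specific in labels
                         if sensible and specific)
    sensibleness = sensible_count / total
    specificity = specific_count / total
    return sensibleness, specificity, (sensibleness + specificity) / 2


# Four made-up labeled responses.
labels = [(True, True), (True, False), (False, False), (True, True)]
sensibleness, specificity, ssa = ssa_scores(labels)
print(sensibleness, specificity, ssa)
```

With these toy labels, three of four responses are sensible and two of four are specific, so the SSA is the mean of 0.75 and 0.5.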
‘Meena’ is not yet available to users
Google has made it clear that its new chatbot is not yet publicly available, although that could change. Google said it is weighing the risks and benefits of releasing the model checkpoint externally and may choose to make it available in the coming months to help advance research in this area.