Step 3: Utterance Generation and Intent Mapping

Once we've compiled a list of core user intents, it's time to generate as many matching utterances as we can. An utterance is any free text a user types when conversing with a bot. Because it's free text, there is a multitude of ways users can phrase a request, and that phrasing is further diversified by poor spelling, typos, and varying levels of domain knowledge.

We generate these utterances based on key customer personas identified in previous design stages. By now we should have a good idea of who the users are – but how do they speak?

For example, the intent "invest" may be triggered by "I want to invest", "invest my money", "start investing", or even "I want to save" if the user is less familiar with the field of investment banking. It's important to anticipate this variety so we can train the bot accordingly, and also to eliminate design bias wherever possible.

Example training utterances for a GetTaxCertificates intent in LUIS, an NLP engine

Bot training consists of generating a collection of utterances users are likely to say and mapping them to the intent manually within the Natural Language Processing (NLP) engine. This training data 'teaches' the bot how to interpret new utterances sent through by actual users.
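In code, that training data boils down to a mapping from intents to example utterances. A minimal Python sketch (the intent names and phrasings are illustrative, echoing the examples above, not the actual LUIS model):

```python
# Hypothetical training data: each intent mapped to the utterances
# we expect users to phrase it with.
training_data = {
    "WantToInvest": [
        "I want to invest",
        "invest my money",
        "start investing",
        "I want to save",  # less domain-aware phrasing
    ],
    "GetTaxCertificates": [
        "where can i see my tax cert",
        "send me my tax certificate",
    ],
}

# Flatten into (utterance, intent) pairs, the shape most NLP engines
# accept as labelled training examples.
labelled = [
    (utterance, intent)
    for intent, examples in training_data.items()
    for utterance in examples
]
```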

This process is ongoing: when interacting with real-world users, each new utterance the bot understands correctly becomes part of the training data, steadily increasing the bot's accuracy and competence.

Once the bot has been running for a while, we can look at the conversation data and see where we can further improve its training. This means that the longer our bots run, the better they become at performing their core purpose, and the more tailored to the user they'll be.

Example of LUIS correctly matching an utterance to the GetTaxCertificates intent (with a confidence score of 0.885) in the testing panel
```json
{
  "query": "where can i see my tax cert",
  "topScoringIntent": {
    "intent": "GetTaxCertificates",
    "score": 0.8853706
  },
  "intents": [
    {
      "intent": "GetTaxCertificates",
      "score": 0.8853706
    },
    {
      "intent": "None",
      "score": 0.1953956
    },
    {
      "intent": "ChangeUserType",
      "score": 0.0587605648
    },
    {
      "intent": "AskContactDetails",
      "score": 0.0461635441
    },
    {
      "intent": "WantToInvest",
      "score": 0.0357262753
    }
  ]
}
```
Json result in LUIS showing the top intent matched, as well as how the utterance scored for other intents
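Downstream, the bot mostly needs the top-scoring intent from a result like the one above, plus a minimum confidence below which it falls back to a clarifying prompt. A sketch in Python (the 0.4 threshold and the trimmed result shape are assumptions for illustration, based on the JSON above):

```python
# Assumed minimum confidence before we trust an intent match;
# in practice this is tuned per bot.
CONFIDENCE_THRESHOLD = 0.4

def resolve_intent(result: dict) -> str:
    """Return the matched intent name, or "None" to trigger a fallback."""
    top = result["topScoringIntent"]
    if top["intent"] != "None" and top["score"] >= CONFIDENCE_THRESHOLD:
        return top["intent"]
    return "None"

# Trimmed-down version of the JSON result shown above.
result = {
    "query": "where can i see my tax cert",
    "topScoringIntent": {"intent": "GetTaxCertificates", "score": 0.8853706},
}
print(resolve_intent(result))  # GetTaxCertificates
```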

Fuzzy text matching

At a base level, our bots can handle general spelling errors and poor grammar. However, language varies widely, and dialects and idiolects abound. So, in addition to NLP training, we also apply a layer of fuzzy text matching to our NLP engine.

A fuzzy match occurs when an utterance is matched to an intent because it fits the pattern of that intent approximately rather than exactly. When a user inputs something that isn't an exact match to a pattern or entity the bot knows, we fuzzy match it to something that has a high likelihood of being the real match.

This proves a useful technique for refining the bot's intent-matching results, increasing accuracy as well as accounting for language varieties we may not have thought of when we trained the bot.
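To illustrate the idea (this is not the matching layer we actually use), Python's standard library can fuzzy-match an inexact input against a list of known entities:

```python
import difflib

# Known entities the bot was trained on (illustrative fund names).
known_funds = ["core balanced fund", "equity fund", "money market fund"]

# Fuzzy-match a user's inexact input to the closest known entity.
# cutoff=0.6 rejects matches with low similarity.
matches = difflib.get_close_matches(
    "core balance fund", known_funds, n=1, cutoff=0.6
)
print(matches)  # ['core balanced fund']
```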

A fuzzy text match using the Levenshtein distance

Applying the Levenshtein distance is one of the methods we use in fuzzy text matching, particularly for interpreting typos and spelling errors. The Levenshtein distance is the minimum number of single-character edits needed to change one word into another.

For example, if a user asked for a "core balance fund fact sheet", we apply this edit distance to map their request to the actual name of the fund, the "core balanced fund".

The Levenshtein distance here would be one, as it takes a single-character insertion (the final "d") to move from "balance" to "balanced". This is a smaller edit distance than mapping the request to the "equity fund", for example. So, our bot would assume that the unrecognised entity "core balance fund" should be interpreted as the similar, recognised entity "core balanced fund".
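The distance itself is straightforward to compute with dynamic programming; a minimal Python sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # prev[j] holds the distance from a[:i-1] to b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("balance", "balanced"))  # 1
print(levenshtein("kitten", "sitting"))    # 3
```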

Alternatives?

Of course, it is sometimes better to restrict the margin for error introduced when a user is given free rein in a chat conversation. As an alternative to NLP (or in conjunction with it), we can also create a guided solution that lets a user achieve their intents purely through structured queries, decision-tree logic and visual controls.

Example of guided conversation

We use components such as the Atura Form Engine when building this style of assistant. Utilising buttons as guardrails in a chat conversation keeps users on the "happy path" (away from Sorry, I didn't understand that ☹ messages), and also helps avoid scenarios where users stare blankly at a chat window, not knowing what the bot does or how to talk to it.
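Under the hood, a guided flow like this is a small decision tree. A sketch in Python (the node names, prompts and buttons are hypothetical; the Atura Form Engine's actual API will differ):

```python
# Each node has a prompt and a fixed set of buttons mapping a choice
# to the next node; free text is never required.
flow = {
    "start": {
        "prompt": "What would you like to do?",
        "buttons": {"Invest": "choose_fund", "Tax certificates": "send_certs"},
    },
    "choose_fund": {
        "prompt": "Which fund?",
        "buttons": {"Core balanced fund": "done", "Equity fund": "done"},
    },
    "send_certs": {"prompt": "Here are your tax certificates.", "buttons": {}},
    "done": {"prompt": "All set!", "buttons": {}},
}

def next_node(current: str, choice: str) -> str:
    # Buttons act as guardrails: only predefined choices are accepted,
    # so there is no unrecognised input to fall back from.
    return flow[current]["buttons"][choice]

print(next_node("start", "Invest"))  # choose_fund
```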

That's about it for utterance generation and intent mapping. Now that we've taught our bot how to understand user input, what is it going to respond to – and how?