Q: How many Training Questions should my Custom Answers have?
As a general guideline, aim for 10 to 20 Training Questions per Custom Answer, and make sure each Training Question contains multiple words (avoid single-word Training Questions). Keep your Training Question sets balanced as well: when some Custom Answers have drastically more training than others, predictions tend to become less accurate.
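If you keep a copy of your training outside the dashboard, the guidelines above can be checked with a few lines of Python. This is only a sketch: the `training` dictionary (Custom Answer name mapped to its Training Questions), the `audit_training` helper, and the 3x imbalance threshold are all assumptions made for illustration, not part of the product.

```python
from statistics import median

MIN_QUESTIONS = 10     # lower end of the recommended range
MAX_QUESTIONS = 20     # upper end of the recommended range
IMBALANCE_RATIO = 3.0  # assumed threshold for "drastically more" training


def audit_training(training):
    """Return {answer_name: [warnings]} for answers that miss the guidelines.

    `training` is a hypothetical export: {answer_name: [training questions]}.
    """
    counts = {name: len(questions) for name, questions in training.items()}
    typical = median(counts.values()) if counts else 0
    report = {}
    for name, questions in training.items():
        warnings = []
        n = counts[name]
        if n < MIN_QUESTIONS:
            warnings.append(f"only {n} Training Questions (aim for {MIN_QUESTIONS}-{MAX_QUESTIONS})")
        elif n > MAX_QUESTIONS:
            warnings.append(f"{n} Training Questions (aim for {MIN_QUESTIONS}-{MAX_QUESTIONS})")
        # Flag Training Questions that consist of a single word.
        single_word = [q for q in questions if len(q.split()) < 2]
        if single_word:
            warnings.append(f"{len(single_word)} single-word Training Question(s)")
        # Flag answers with far more training than the typical answer.
        if typical and n > IMBALANCE_RATIO * typical:
            warnings.append("far more training than the typical Custom Answer")
        if warnings:
            report[name] = warnings
    return report
```

Calling `audit_training(training)` returns only the Custom Answers that need attention, so an empty result means every answer falls within the guidelines as encoded here.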
Q: How should I train queries that often include product names (e.g. "Do you have apples?" or "Can I buy oranges from you?")? How does the ML handle technical/business jargon?
Our ML uses more than one model for answer prediction. One of these models is more sensitive to where keywords appear in training, which helps when a specific name or term is important for the Custom Answer. Another model is more sensitive to semantic meaning and will generalize, to some degree, to other words that are semantically similar. Unfortunately, there is currently no way to indicate whether a term should be treated as specific or general, but we are aiming to build out this functionality in the future!
Q: Are training changes immediately reflected in the chatter experience, or is there a lag time?
There is some latency in model training after you add or modify training data, and it is proportional to the total amount of training data: the more Training Questions there are, the longer the bot takes to train on them. As a general rule, wait at least 15 minutes after adding or modifying training data before you test.
Q: Can we train the bot with non-English questions?
Yes! Thanks to our Multilingual Training feature, you can now add training in any language your bot supports.
Q: Should we include capitalizations? Avoid it? Do capitalizations have any particular meaning for the model?
None of our current models considers the casing of characters, so capitalization does not matter: we currently always use the lowercase version of the Training Questions. However, we recommend using capitalization wherever it makes sense grammatically (for example, in proper nouns) in case we decide to make use of it in the future.
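One practical consequence of lowercasing is that Training Questions differing only in capitalization are effectively the same training example. The sketch below shows one way to spot such pairs in your own list. The `case_insensitive_duplicates` helper is hypothetical, and it uses plain `str.lower()` as a stand-in for whatever lowercasing happens on the back end.

```python
from collections import defaultdict


def case_insensitive_duplicates(questions):
    """Group Training Questions that are identical once lowercased.

    Returns {lowercased_text: [original variants]} for groups of 2 or more.
    """
    groups = defaultdict(list)
    for question in questions:
        groups[question.lower()].append(question)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}
```

Any group this returns could be reduced to a single Training Question, freeing room for more varied phrasings.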
Q: Does punctuation need to be included or removed? If punctuation should be removed, are existing Training Questions that have punctuation detrimental to the model?
Some of our models are more sensitive to punctuation than others. In general, preserve punctuation as it naturally appears in the customer messages you expect to see. If our models need punctuation removed for any reason, we remove it automatically on the back end.
Q: How do numbers and special characters impact matching accuracy?
Our models are not optimized for training on numbers and special characters, but they have some awareness of them and some ability to predict on them. Having only one Custom Answer that is trained with numbers can negatively impact training, since users frequently use numbers to refer to a variety of things. Include numbers in your training wherever they would naturally occur.
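To check whether numbers are concentrated in a single Custom Answer, a small script along these lines may help. It is a sketch under assumptions: the `training` dictionary (Custom Answer name mapped to its Training Questions) and the `answers_with_numbers` helper are hypothetical, and "contains a number" is approximated as "contains at least one digit".

```python
import re

DIGIT = re.compile(r"\d")


def answers_with_numbers(training):
    """Return the names of Custom Answers whose training contains a digit.

    `training` is a hypothetical export: {answer_name: [training questions]}.
    """
    return [
        name
        for name, questions in training.items()
        if any(DIGIT.search(question) for question in questions)
    ]
```

If the returned list has exactly one entry, that Custom Answer is the only one trained with numbers, which is the situation described above as a risk.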