Step 2: Prepare Datasets for Training and Testing

This is the step that takes the most effort: the machine learning algorithms involved in NLP - and most of the state-of-the-art NLP engines out there are based on some kind of machine learning - are only as good as the data they have been trained on. It is a question of both quality and quantity.

Botium has tools to support gathering and augmenting datasets for training and testing.


Although from a technical perspective it doesn’t make a lot of sense to use training data for testing, this is usually the first step in Botium Coach:

  • It can be done with a few clicks in Botium Box

  • It will give you first insights into how the NLP engine is performing on the data it has been trained on

  • It shows up any flaws within the training data itself

Don’t underestimate the importance of clean training data for the real-life performance of your NLP engine!

Option 1: Use Training Data for Testing

Instead of annotating the test cases manually, Botium Box includes a Test Case Wizard to download the conversation model of an NLP provider and convert it to BotiumScript test cases. They can be used instantly by Botium Coach.


See Botium Coach / NLP Analytics Support for an overview of which Botium connectors are supported by the Test Case Wizard.

Using the Test Case Wizard

For each supported NLP engine there are different options. When using it with Rasa, you first have to upload the Rasa training data file to Botium.

Depending on the NLP engine, you can decide to:

  • Only import intent names and user examples as utterance lists (recommended for getting started)

  • Generate convo files with NLU intent asserters

  • Generate convo files with NLU intent and entity asserters


Using the same data for training and testing (as is the case here) doesn’t allow you to draw conclusions about how the NLU model will perform in real-life scenarios. If your primary concern is to test your NLU model for real-life scenarios, training and test data must be separated.

Pro Tip: Split Training Data by 80/20 into Training and Testing Data

In typical machine learning projects, there is the 80/20 rule: Use 80% of the available data for training, and 20% for testing. Botium Box includes a dataset splitter to divide a dataset into two pieces. Expand the Test Set Splitter section in the Transformation Wizard menu of the Botium dataset:

Here you can tell Botium the ratio by which to split the data as well as the names of the resulting Botium datasets. You should now use the 80% training data to re-train your NLP engine and use the 20% test data for testing. The Test Case Wizard in Botium Box helps to upload the training data to your NLP engine of choice - see Step 6: Training your NLU engine.
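To make the 80/20 idea concrete, here is a minimal Python sketch of such a split. It is not the Botium Box Test Set Splitter itself; the function name `split_dataset`, the dictionary layout (intent name mapped to a list of user examples) and the fixed seed are assumptions made for illustration only. The split is done intent by intent, so every intent stays represented in both parts.

```python
import random


def split_dataset(utterances, train_ratio=0.8, seed=42):
    """Split {intent: [examples]} into (train, test) dicts, intent by intent.

    Hypothetical helper for illustration; not part of Botium.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = {}, {}
    for intent, examples in utterances.items():
        shuffled = list(examples)  # copy, so the input is left untouched
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)
        train[intent] = shuffled[:cut]
        test[intent] = shuffled[cut:]
    return train, test


dataset = {
    "travel": [
        "I want to travel from Berlin to Vienna",
        "go to vienna, from berlin",
        "book a flight from berlin",
        "book a ticket to vienna",
    ],
}
train, test = split_dataset(dataset)
print(len(train["travel"]), "training examples,", len(test["travel"]), "test examples")
```

With very small utterance lists the rounding matters: an intent with only a handful of examples may end up with a single test example, or none at all.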

Option 2: Use Included Botium Datasets

Botium Box comes with batteries included: out of the box, it provides datasets you can use for testing and training your NLP engine.

  • More than 70,000 user examples

  • More than 20 languages (English, German, French, Spanish, …)

  • More than 40 domains (smalltalk, banking, travel, insurance, customer support, security, …)

Botium Coach will work very well with those datasets.

In order to use them with Botium Coach, you will have to rename the utterance lists to the intent name your NLP engine is using.

Option 3 - The Ideal Scenario: Bring Your Own Data

As a general rule of thumb, never use training data for testing: It is not a challenge for an NLP engine to correctly predict the intent for a user example it already knows. The purpose of all the NLP training is to finally make predictions for user examples that it has never seen before.

That’s why it is recommended to always strictly separate the data you use for training your NLU engine from the data you use for testing.
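One way to check this separation is to look for test examples that also occur in the training data. The following Python sketch does that; the helper names `normalize` and `find_leaks` are hypothetical, and comparing examples after lowercasing and collapsing whitespace is an assumption about what should count as "the same" user example.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variants count as equal."""
    return " ".join(text.lower().split())


def find_leaks(train_examples, test_examples):
    """Return the test examples that also appear in the training data."""
    seen = {normalize(example) for example in train_examples}
    return [example for example in test_examples if normalize(example) in seen]


train_examples = [
    "I want to travel from Berlin to Vienna",
    "book a flight from berlin",
]
test_examples = [
    "Book a flight from  Berlin",  # same as a training example, apart from case and spacing
    "book a ticket to vienna",
]
print(find_leaks(train_examples, test_examples))
```

Any example reported here tells you little about real-life performance and is better moved to one side of the split.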

Annotate Existing Test Cases with NLP Asserters

If you are already using Botium Box for conversational flow testing, you can annotate your BotiumScript test cases with NLP asserters so Botium Coach knows the expected outcome and can compare it with the predictions.

Here you have an example test case from Botium Box, involving a chatbot from the tourism domain:

In BotiumScript, this test case looks like this:


#me
I want to travel from Berlin to Vienna.

#bot
Im happy to hear it. And where are you now?

#me
in Munich

#bot
So you are in Munich, and want to travel from Berlin to Vienna?

You can annotate the expected NLP intent and entities by editing the bot conversation step and adding the NLP Intent Asserter and the NLP Entity Values Asserter.

The annotated test case then looks like this:

And again, in BotiumScript:


#me
I want to travel from Berlin to Vienna.

#bot
Im happy to hear it. And where are you now?
INTENT travel

#me
in Munich.

#bot
So you are in Munich, and want to travel from Berlin to Vienna?
INTENT travel
ENTITY_VALUES Berlin|Vienna|Munich


Test cases can be written in all supported BotiumScript file formats: plain text, YAML, JSON, CSV, Markdown, Excel and Google Sheets.

The benefit of annotating existing conversational test cases is that you can re-use existing test data. The drawback is that the analytic results will be distorted if you have multi-step conversations: a Botium test case exits as soon as the first asserter fails, and all following conversation steps are ignored.

Botium Coach works best for simple question/answer conversations: a question (“user example”) is sent to the NLP engine, and the response is processed by Botium Coach. Register a new test set in Botium Box and add a new utterance list for each NLP intent you want to be resolved - name it exactly like the intent. Then add user examples you want to test.

In BotiumScript this is a flat text file:

I want to travel from Berlin to Vienna
go to vienna, from berlin
book a flight from berlin
book a ticket to vienna

As a final step, you have to tell Botium that this test set is only for question/answer conversations. In the Settings menu, expand the Advanced Scripting Settings and enable the Expand Utterances to Conversations as well as the Use Utterance Name as NLU Intent options.
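Conceptually, these two options turn every user example into its own one-step test case, with the utterance list name as the expected intent. The Python sketch below illustrates that idea only; it is not Botium's implementation, and the function name `expand_utterances` and the dictionary keys `user_says` and `expected_intent` are made up for this example.

```python
def expand_utterances(utterance_lists):
    """Turn {intent: [examples]} into one question/answer test case per example.

    Illustration only; the output structure is hypothetical, not a Botium format.
    """
    cases = []
    for intent, examples in utterance_lists.items():
        for example in examples:
            # The utterance list name doubles as the expected NLU intent.
            cases.append({"user_says": example, "expected_intent": intent})
    return cases


utterance_lists = {
    "travel": [
        "I want to travel from Berlin to Vienna",
        "go to vienna, from berlin",
        "book a flight from berlin",
        "book a ticket to vienna",
    ],
}
for case in expand_utterances(utterance_lists):
    print(case["expected_intent"], "<-", case["user_says"])
```

Because each test case has a single step, one failed prediction cannot hide the results of the other user examples.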

Pro Tip: Use Paraphraser to Quickly Generate New User Examples

Botium Box includes a paraphraser to quickly generate new user examples based on given ones. After adding a handful of user examples to the utterance list, click on the Paraphrase it! button to get a couple of suggestions for additional user examples and select the ones you want to use.


Before using the Paraphraser, you have to configure the Paraphraser API Key in the Botium Box System Settings.

The result is a large number of similar user examples to use for testing.

Read more about the Botium Box Paraphraser here.

Keep Test Dataset in Git Repository

Instead of adding your test datasets to the internal Botium Box repository, you can (and should!) use a Git repository for your test data and establish a process for continuous improvement - see Best Practice: Test Case Development.

Download Training Data into Botium Box (optional)

Besides adding the test dataset to Botium Box, you should also add the training dataset - if not already done with one of the previous steps. This way you can use the tools included in Botium Box for data augmentation, such as the paraphraser, the translator and the humanificator.

Use the Test Case Wizard to download the training data from your NLP engine into Botium Box.

Advanced Challenges

“The art of challenging chatbots” is the Botium tagline. And if you need some special challenges for your chatbot, then read on.

Multi-Language Testing

Many chatbots out there are built to serve users in multiple languages, so you need training and test datasets in multiple languages. The language of the internet is English, and most public domain datasets for training and testing chatbots are available in English only. That’s why we included a Translation Transformer in the Botium Box Transformation Wizard.

Before using the translator, you have to configure the Google Translate Service Account Key in the Botium Box System Settings.

You can select the source language (leave empty for auto-detect) and the target language. You now have an additional dataset you can use for multi-language testing.

Humanification Testing

Humanification in Botium stands for the simulation of human behaviour or habits. BotiumScript makes it easy to verify the chatbot’s ability to follow a conversation flow. It is an important step to recognize the need for automation in this area. But in the real world, you cannot expect human users to act like a computer script:

  • typographic errors are introduced

  • different typing speeds

  • sausage finger syndrome

With Botium’s Humanification layer it is possible to evaluate how your chatbot deals with typical human typing habits. To learn more, see Best Practice: Humanification of Test Sets.
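To give a feeling for what simulated typing habits can look like, here is a small Python sketch that injects one typo into a user example. It is not the Botium Humanification layer; the functions `swap_adjacent`, `double_key` and `humanify` are hypothetical, and the two mutations shown (swapped neighbouring characters, a key hit twice) are just two of many possible typing slips.

```python
import random


def swap_adjacent(text, rng):
    """Swap two neighbouring characters, a common typing slip."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def double_key(text, rng):
    """Repeat one character, as if a key was hit twice."""
    if not text:
        return text
    i = rng.randrange(len(text))
    return text[: i + 1] + text[i] + text[i + 1:]


def humanify(text, seed=0):
    """Apply one randomly chosen typo; the seed makes the result repeatable."""
    rng = random.Random(seed)
    mutation = rng.choice([swap_adjacent, double_key])
    return mutation(text, rng)


for seed in range(3):
    print(humanify("book a flight from berlin", seed=seed))
```

Running the same user examples once in clean form and once with such typos gives a rough impression of how sensitive the intent predictions are to sloppy typing.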