Creating a Module

First, create a new module by clicking the Create Module button at the top of your screen:

 

A three-step dialog opens, where you specify the characteristics of your classifier.

Name, Description, Permissions and Module Type

  • Name: the name of the module.
  • Description: a short explanation of what your module does.
  • Permissions:
    • Public (every user in MonkeyLearn will be able to see and use the module)
    • Private (only you or people that you explicitly invite through teams can see and use the module)
  • Module Type:
    • Classifier
    • Pipeline

Type of Problem

You can select the type of problem you are working on, which internally sets parameters to obtain better results, e.g. Social Media, Web Scraping, Sentiment Analysis, Topic Categorization, etc.

Type of Text and Language

Choose the type of text that you will classify (tweets, news articles, descriptions, reviews, etc.).
Then set the natural language (English, Spanish, French, German, etc.) of the text that will be processed. If you have to process different languages at the same time, or your language is not in the list, select the multilanguage option.
With this information, MonkeyLearn selects the most appropriate set of parameters (preprocessing and algorithms) for your particular application.

Advanced Options

Optionally, you can configure the classifier’s advanced options to tweak particular parameters. See Classifier Parameters for more details.

Creating a Category Tree

To create a category tree, see the section Category Tree for details. In short, you have two options:
  • Create the categories directly in the category tree on the left, within the Sandbox/Tree tab.
  • Create the categories when uploading tagged data: when you upload data, you also define the categories and their hierarchy.

Creating categories on the GUI

Just go to the Sandbox/Tree tab and add categories to the corresponding parent category:

 


Creating categories on data upload

Just go to the Sandbox/Samples tab and upload new tagged data with the Upload wizard:

 


 

When you upload a tagged dataset, you can specify the column that has the text content and the column that has the category. Use the combo boxes at the top of each column to select “Use as text” or “Use as category” respectively. If you upload samples with new categories, MonkeyLearn will create the corresponding categories for you. Take a look at the CSV/Excel file specification for the syntax used to denote hierarchies and multilabel categories.
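As a rough illustration of what a tagged dataset can look like, the sketch below builds a small CSV with one text column and one category column and lists the categories that such an upload would introduce. The column names and sample rows are assumptions made for this example. The syntax for hierarchies and multilabel categories is defined in the CSV/Excel file specification and is not reproduced here.

```python
import csv
import io

# Hypothetical tagged dataset: one text column and one category column.
rows = [
    ("The team won the final in overtime", "Sports"),
    ("Parliament passed the new budget", "Politics"),
    ("The striker scored twice last night", "Sports"),
]


def write_tagged_csv(rows):
    """Serialize (text, category) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["text", "category"])  # column names are illustrative
    writer.writerows(rows)
    return buffer.getvalue()


def categories_in(csv_text, category_column="category"):
    """Return the distinct categories found in the category column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({row[category_column] for row in reader})


csv_text = write_tagged_csv(rows)
print(categories_in(csv_text))
```

In the Upload wizard you would then mark the first column as “Use as text” and the second as “Use as category”.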

Adding Training Samples

Now that you have your category tree, you must upload training samples that are representative of each category node. If you created the category tree by uploading a tagged CSV/Excel file, you may have already uploaded some samples.

You have three different options to upload text samples:
  • Create sample allows you to create a sample by pasting text into a textbox.
  • Upload a CSV/Excel file through the GUI.
  • Upload data through the API.

Training your Classifier

Now you are ready to train your machine learning model. After creating the category tree and adding samples to each category (at least one sample per leaf node), you can train the model by clicking the Train button.
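If you keep your data outside MonkeyLearn, you can check the "at least one sample per leaf node" requirement before training. The sketch below assumes a category tree stored as nested dictionaries and a mapping from category name to samples. Both structures are hypothetical and only serve this example.

```python
# Hypothetical category tree: each key is a category, each value its children.
tree = {
    "Sports": {"Basketball": {}, "Football": {}},
    "Politics": {},
}

# Hypothetical samples, keyed by the category they were tagged with.
samples = {
    "Basketball": ["The point guard hit a three-pointer at the buzzer"],
    "Politics": ["The senate debated the new bill"],
}


def leaf_categories(tree):
    """Yield the names of all categories that have no children."""
    for name, children in tree.items():
        if children:
            yield from leaf_categories(children)
        else:
            yield name


def leaves_missing_samples(tree, samples):
    """Return the leaf categories that do not have at least one sample."""
    return [leaf for leaf in leaf_categories(tree) if not samples.get(leaf)]


print(leaves_missing_samples(tree, samples))  # ['Football']
```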

 

 

You will notice that the state changes to a yellow TRAINING, along with a progress bar. As our example has few categories and samples, the training is almost instant. If the process is successful, the state changes to a green TRAINED when training finishes. Congratulations! You have trained your first machine learning model!

The screen now shows some performance indicators in the Statistics section. The results depend on the samples you uploaded, the category tree and the selected parameters. In the picture, the results show an Accuracy of 82% (please refer to the section Classifier statistics to review the different performance indicators).
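As a reminder of what an accuracy figure means, the sketch below computes the fraction of samples whose predicted category matches the true one. The labels are made up for illustration, and how MonkeyLearn evaluates the model internally is not described here, so treat this only as the textbook definition.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels: four of the five predictions are correct.
true_labels = ["Sports", "Politics", "Sports", "Politics", "Sports"]
predicted_labels = ["Sports", "Politics", "Politics", "Politics", "Sports"]

print(accuracy(true_labels, predicted_labels))  # 0.8
```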

The statistics also show the number of Samples that have been used to train the model. In the picture we can see that 3,000 samples were used.

You may have noticed that when you select a category in the tree, you can view the statistics and the samples associated with that particular category. Samples are only shown when the category has samples associated with it. Keep in mind that categories with child categories use the samples of their children for training. For example, when the classification module has to decide between Sports and Politics, the samples from Basketball and Football are used as samples for the Sports category.
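The way a parent category inherits samples from its children can be pictured with a small sketch. It assumes the tree is stored as nested dictionaries and the samples as a mapping from category name to texts. Both are hypothetical structures and do not describe MonkeyLearn's internal representation.

```python
# Hypothetical category tree and samples, mirroring the example above.
tree = {
    "Sports": {"Basketball": {}, "Football": {}},
    "Politics": {},
}
samples = {
    "Basketball": ["The center blocked the final shot"],
    "Football": ["The goalkeeper saved a penalty"],
    "Politics": ["The minister announced a new reform"],
}


def effective_samples(name, children, samples):
    """Collect a category's own samples plus those of all its descendants."""
    collected = list(samples.get(name, []))
    for child_name, grandchildren in children.items():
        collected.extend(effective_samples(child_name, grandchildren, samples))
    return collected


# Sports has no samples of its own, so it uses Basketball and Football.
sports_samples = effective_samples("Sports", tree["Sports"], samples)
print(len(sports_samples))  # 2
```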

Keyword Cloud

To the right of the statistics you can see a keyword cloud showing the terms used to characterize the samples and decide which category they belong to (in machine learning these are commonly called features or attributes). Keep in mind that the keywords may appear slightly transformed if you use stemming in your advanced settings. The length of the terms also depends on your n-gram range configuration. The following shows the keyword cloud corresponding to the Sports category in our example classifier:

You can see a detailed list of keywords in order of relevance in the corresponding category by opening the Keyword List. The following shows the start of the keyword list corresponding to the previous keyword cloud:
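To make the effect of the n-gram range on term length concrete, here is a minimal sketch that extracts word n-grams from a text. It uses plain lowercasing and whitespace splitting, which is an assumption made for this example and not MonkeyLearn's actual preprocessing. The function name and the ngram_range parameter are also illustrative.

```python
def word_ngrams(text, ngram_range=(1, 2)):
    """Return all word n-grams whose length lies within ngram_range."""
    words = text.lower().split()
    min_n, max_n = ngram_range
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n words over the text.
        for start in range(len(words) - n + 1):
            grams.append(" ".join(words[start:start + n]))
    return grams


print(word_ngrams("Great basketball game", ngram_range=(1, 2)))
# ['great', 'basketball', 'game', 'great basketball', 'basketball game']
```

With a range of (1, 1) only single words would appear as keywords, while a wider range also allows multi-word terms.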

Confusion Matrix

Another useful tool for analyzing how well the classifier is performing, and in particular which errors it is making, is the Confusion matrix. If you select a particular parent category, you’ll see the confusion matrix at the bottom of the screen, like this:
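If you want to see how such a matrix is built, the sketch below counts (true category, predicted category) pairs for the children of one parent category. The labels are invented for illustration, and the layout (rows for true categories, columns for predicted ones) is a common convention assumed here, not necessarily the one used on the screen.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, categories):
    """Build a matrix where rows are true categories and columns are predictions."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in categories] for t in categories]


# Made-up labels for the children of a hypothetical Sports parent category.
categories = ["Basketball", "Football"]
true_labels = ["Basketball", "Basketball", "Football", "Football", "Football"]
predicted_labels = ["Basketball", "Football", "Football", "Football", "Basketball"]

matrix = confusion_matrix(true_labels, predicted_labels, categories)
for category, row in zip(categories, matrix):
    print(category, row)
# Basketball [1, 1]
# Football [1, 2]
```

Values on the diagonal are correct classifications. Everything off the diagonal is an error, and its position tells you which two categories the classifier confuses.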

 

Testing your Classifier

After you train your module, MonkeyLearn publishes a web API that allows you to integrate your module into your project with any programming language. Take a look at Integrating Modules for more details. With the Classify tab, you can test those endpoints through a simple graphical interface.

You can type or paste a text into the MonkeyLearn interface, click Submit and obtain the corresponding classification in the result box.

The result is returned in JSON format. Check the API documentation for more details.
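Once you have a JSON result, you will typically parse it and pick the most likely category. The sketch below shows the idea with Python's standard json module. The response body and its field names ("label" and "probability") are invented for illustration and are not taken from the API documentation, so adapt them to the real format.

```python
import json

# Hypothetical response body: the structure and the field names are
# assumptions made for this example, not the documented API format.
raw_result = (
    '[{"label": "Sports", "probability": 0.91},'
    ' {"label": "Politics", "probability": 0.09}]'
)


def top_category(raw_json):
    """Parse a JSON list of candidates and return the most probable one."""
    candidates = json.loads(raw_json)
    best = max(candidates, key=lambda item: item["probability"])
    return best["label"], best["probability"]


label, probability = top_category(raw_result)
print(label, probability)
```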

You can also classify a list of samples by uploading a CSV/Excel file. Just select Classify File and follow the wizard.

Putting your Module in Production

After you have created and trained a custom classifier, you may want to integrate it with your project via the API. The correct way to do that is to first Deploy your module. This process makes a copy of your module from the Sandbox to the Live version. Note that this generates a different endpoint for calling the module through the API. The live endpoint should be used in production, and the sandbox version only for development, experimentation and testing. This way you can keep modifying and experimenting with your classifier in the sandbox without affecting the production (live) version of your module.
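One simple way to keep the two endpoints apart in your own project is to choose the URL from configuration. The sketch below is only an assumption about how you might organize this. The URLs are placeholders and not real MonkeyLearn endpoints, and APP_ENV is a hypothetical environment variable name.

```python
import os

# Placeholder URLs: replace them with the sandbox and live endpoints
# shown for your own module. These are NOT real MonkeyLearn endpoints.
ENDPOINTS = {
    "sandbox": "https://api.example.com/classifiers/MODULE_ID/sandbox/",
    "live": "https://api.example.com/classifiers/MODULE_ID/live/",
}


def endpoint_for(environment):
    """Return the live endpoint in production and the sandbox one otherwise."""
    if environment == "production":
        return ENDPOINTS["live"]
    return ENDPOINTS["sandbox"]


# APP_ENV is a hypothetical variable; it defaults to development here.
current_endpoint = endpoint_for(os.environ.get("APP_ENV", "development"))
print(current_endpoint)
```

Defaulting to the sandbox means that a missing or misspelled setting never sends experimental traffic to the live version.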