A category tree is the way categories (the tags that you want to assign to your texts) are organized into hierarchies.

The term “tree” is used for this structure because its graphical representation resembles a tree, although it is drawn upside down: the root is the top node and the “leaves” are the bottom nodes. In MonkeyLearn, we use trees to represent the categories that are used to tag text.

So the first step in building a text classifier is to design a category tree that organizes the tags that we want to assign to the data.

Sentiment Analysis

If you want to assign a sentiment to an opinion expressed in text, you would probably have categories like Negative, Neutral and Positive.

 

Topic Classification

If you want to assign the topic of a news article, you could have categories like: Sports, Politics, Science, etc.

You can define a hierarchical tree when you want to have subcategories. For example, let’s say that we want to be more specific and assign subcategories within Politics, Science and Sports like the following hierarchy:

You can be as specific as you want and add more subcategories as needed.
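
As a rough illustration of such a hierarchy, the sketch below models a category tree as nested dictionaries in Python. The subcategory names (Elections, Physics, Soccer and so on) are made-up examples, and the nested-dictionary layout is only an assumption for illustration, not the way MonkeyLearn stores trees.

```python
# Illustrative category tree as nested dictionaries.
# The subcategory names are hypothetical examples, not taken from MonkeyLearn.
category_tree = {
    "Root": {
        "Politics": {"Elections": {}, "International": {}},
        "Science": {"Physics": {}, "Biology": {}},
        "Sports": {"Soccer": {}, "Tennis": {}},
    }
}


def leaf_paths(tree, prefix=()):
    """Yield the full path from the root to every leaf category."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            # Not a leaf yet: keep walking down the hierarchy.
            yield from leaf_paths(children, path)
        else:
            yield path


paths = [" / ".join(p) for p in leaf_paths(category_tree)]
for line in paths:
    print(line)
```

Adding a more specific subcategory in this model is just a matter of nesting another dictionary under an existing node.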

Tips to design a good Category Tree

  • Try to organize categories according to their semantic relations. For example: Cell Phones and Laptops should be children of Electronics because they are specific types of electronic devices.
  • Try to declare sibling categories that are disjoint. That is, avoid defining categories that are ambiguous or that overlap. There should be no doubt about which category a text belongs in.
  • Make sure you have a label for each type of text you want to classify. Every text input should have a corresponding category to which it can be assigned.
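
The last tip can be checked mechanically before uploading data. The following is a minimal sketch, assuming the tree is held as nested dictionaries and the labels come from your own sample data. The function names and the "Tablets" label are hypothetical and are not part of MonkeyLearn.

```python
def all_categories(tree):
    """Collect every category name found in a nested-dict tree."""
    names = set()
    for name, children in tree.items():
        names.add(name)
        names |= all_categories(children)
    return names


def labels_without_category(tree, labels):
    """Return the labels (sorted) that have no matching category in the tree."""
    return sorted(set(labels) - all_categories(tree))


# Hypothetical tree and sample labels, used only for illustration.
tree = {"Root": {"Electronics": {"Cell Phones": {}, "Laptops": {}}}}
sample_labels = ["Laptops", "Cell Phones", "Tablets"]

missing = labels_without_category(tree, sample_labels)
print(missing)
```

Any label reported here points to a type of text that has no category yet, so either the tree or the samples need adjusting.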

To create a category tree, you have two options.

Create a category tree through the GUI

You can design and create your own category tree directly through the MonkeyLearn GUI. When you have just created a classification module, the category tree has only one node, called Root. This node is the basis of every category tree and cannot be deleted. To start creating your own categories, click the contextual menu of the root category:

[Image: Creating custom classifiers, add child]

Click Add child and then type the name of the new category. You can keep adding more categories to create the category hierarchy that you need.
There is another way to create a category tree that may be more appropriate for complex structures. It is covered in the next section.

Create a category tree by uploading a CSV/Excel file

You can also upload samples and the tree structure with a CSV or Excel file. The file must have the following format:

text sample 1 | category sample 1
text sample 2 | category sample 2
text sample N | category sample N

Each row is a sample: the first column holds the sample’s text content and the second column holds the sample’s category label. Uploading samples with a CSV/Excel file into MonkeyLearn is as easy as following a simple wizard. You can read more about it in the CSV/Excel file documentation.
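
A file in this two-column layout can be produced with Python's standard csv module, as in the sketch below. The sample texts are invented, and the sketch writes no header row because the format above shows none. Check the CSV/Excel file documentation for the exact expectations before relying on it.

```python
import csv
import io

# Hypothetical samples: (text content, category label).
samples = [
    ("The team won the championship last night.", "Sports"),
    ("Parliament votes on the new budget, again.", "Politics"),
    ("Researchers observed a new exoplanet.", "Science"),
]

# Write to an in-memory buffer; use open("samples.csv", "w", newline="")
# instead to create a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
for text, category in samples:
    # First column: the sample's text. Second column: its category label.
    writer.writerow([text, category])

csv_text = buffer.getvalue()
print(csv_text)

# Read the data back to confirm that every row has exactly two columns.
rows = list(csv.reader(io.StringIO(csv_text)))
```

Using csv.writer rather than joining strings by hand means that texts containing commas or quotes are quoted correctly, so they still end up in a single column.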

 

Modifying your category tree

If you want to improve your category tree, you can use the contextual menus in every node:

[Image: Creating custom classifiers, modify tree]
  • Add child, as used before, creates a new child node under the selected node.

  • Rename changes the name of a node.

  • Change parent moves a node to another parent. A dialog will pop up so that you can select the new parent to which the category will be moved.

[Image: Creating custom classifiers, change parents]

 

  • Delete all samples deletes all the samples corresponding to the selected node.

  • Delete category removes a category from the tree. A dialog will pop up to confirm the operation and to select what to do with the corresponding samples. Three options are possible:
    • Delete all the corresponding samples.
    • Move the corresponding samples to the parent category.
    • Select a new category where the corresponding samples will be transferred.

    [Image: Creating custom classifiers, delete category]
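
To make the effect of these menu operations concrete, here is a small conceptual model in Python. It is only a sketch: the Category class and its methods are hypothetical and do not correspond to MonkeyLearn's code or API. The text above does not say what happens to subcategories of a deleted category, so the sketch simply detaches the node and handles only the node's own samples.

```python
class Category:
    """A node in a toy category tree (hypothetical model, not MonkeyLearn code)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.samples = []
        if parent is not None:
            parent.children.append(self)

    def change_parent(self, new_parent):
        """Move this node under another parent."""
        self.parent.children.remove(self)
        new_parent.children.append(self)
        self.parent = new_parent

    def delete(self, move_samples_to=None):
        """Remove this node; its samples are dropped unless a target is given."""
        if self.parent is None:
            raise ValueError("the Root category cannot be deleted")
        if move_samples_to is not None:
            move_samples_to.samples.extend(self.samples)
        self.samples = []
        self.parent.children.remove(self)
        self.parent = None


root = Category("Root")
electronics = Category("Electronics", parent=root)   # Add child
laptops = Category("Laptops", parent=root)           # Add child
laptops.samples.append("Lightweight laptop with a 14-inch screen")

laptops.change_parent(electronics)                   # Change parent
laptops.delete(move_samples_to=electronics)          # Delete category
```

Calling delete() with no argument models the first option (the samples are deleted), passing the node's parent models the second, and passing any other category models the third.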