Text Classification

Text classification modules classify information: they assign a category to a text, a process also known as tagging. A category is a label, and categories are organized in a hierarchical category tree. For example, the following category tree shows categories related to retail products:

_images/category_tree.png

A machine learning classifier learns to assign the corresponding category to a text by using mathematical models and algorithms that associate a particular input (the text) with the corresponding output (the label).

This process is automatic. To create a custom text classifier in MonkeyLearn, you only have to:

  1. Define your category tree.
  2. Upload text samples for each category.
  3. Train the machine learning classifier (just click the Train button!).

Text samples can be reviews, articles, emails, tweets, or any piece of text.
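
To make the three steps above concrete, here is a minimal sketch of what training on labeled text samples can look like. It uses a small naive Bayes model written in plain Python. This is an illustrative assumption, not MonkeyLearn's actual model or API, and the function names (tokenize, train, classify) and the sample texts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(samples):
    """Count words per category from (text, category) training pairs."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of samples
    vocabulary = set()
    for text, category in samples:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocabulary.update(tokens)
    return word_counts, doc_counts, vocabulary


def classify(model, text):
    """Return the category with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        # Start from the prior: the share of training samples in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            count = word_counts[category][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical training samples for two top-level categories.
samples = [
    ("wireless bluetooth speaker with clear sound", "Electronics"),
    ("smartphone with fast chip and large display", "Electronics"),
    ("warm printed pullover sweater for winter", "Apparel"),
    ("comfortable sandals with padded insole", "Apparel"),
]
model = train(samples)
print(classify(model, "bluetooth speaker with large display"))  # Electronics
```

A real classifier would need far more samples per category, but the flow is the same: define categories, provide labeled samples, train, then classify new text.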

In the previous category tree, a classifier that gets the input:

Built on 64-bit desktop-class architecture, the new A8 chip delivers more power,
even while driving a larger display.

Other Features include: Wi-Fi 802.11
a/b/g/n/ac, dual-band, Wi-Fi hotspot, Bluetooth: v4.0, A2DP, USB: v2.0, GPS:
with A-GPS, GLONASS, Browser: HTML (Safari), Messaging: iMessage, SMS (threaded
view), MMS, Email, Push Email, Built-in rechargeable lithium-ion Battery,
Talktime: Up to 14 Hours (3G), Standby: Up to 250 Hours (3G)

May return the following result:

That is, the classifier understands that the text is about Electronics and, in particular, about Cell Phones. The probabilities indicate how sure the classifier is about each prediction; 1 means 100% sure.

Text classification can be single-label or multi-label: the classifier assigns either exactly one category or multiple categories to a text. Which type to use depends on your particular problem; MonkeyLearn supports both types of classification.
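
The difference between the two schemas can be sketched as two ways of reading the same per-category probabilities. The snippet below is a minimal illustration: the scores, the category names, and the 0.5 threshold are made-up assumptions, not output from a MonkeyLearn classifier.

```python
def single_label(scores):
    """Single-label: return only the category with the highest probability."""
    return max(scores, key=scores.get)


def multi_label(scores, threshold=0.5):
    """Multi-label: return every category whose probability reaches the threshold."""
    return sorted(c for c, p in scores.items() if p >= threshold)


# Hypothetical probabilities for one text (illustrative values only).
scores = {"Environment": 0.72, "Politics": 0.55, "Food & Drink": 0.03}

print(single_label(scores))  # Environment
print(multi_label(scores))   # ['Environment', 'Politics']
```

In a multi-label setting the probabilities are independent per category, so they do not need to sum to 1.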

Examples of Text Classification

As examples of possible text classification applications, MonkeyLearn offers pre-created classification modules for different types of information. The following are some of the pre-created classifiers that are ready to use within MonkeyLearn.

Language Detection

In Language Detection, you have a text and want to programmatically detect the language in which it is written, such as Spanish, English, or French.

For example, if we have the following texts:

Text A: Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data…

Text B: El aprendizaje automático o aprendizaje de máquinas es una rama de la inteligencia artificial cuyo objetivo es desarrollar técnicas que permitan a las computadoras aprender…

Text C: L’apprentissage automatique (machine learning en anglais), un des champs d’étude de l’intelligence artificielle, est la discipline scientifique concernée par le développement…

Text A would be classified as English, Text B as Spanish and Text C as French.
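
One simple way to see why this task is tractable is to count common function words of each language in the text. The sketch below does exactly that. It is a toy heuristic for illustration, not how MonkeyLearn's language classifier works, and the word lists are small hand-picked assumptions.

```python
# Small hand-picked lists of frequent function words (illustrative only).
STOPWORDS = {
    "English": {"the", "of", "and", "a", "that", "from", "can", "is"},
    "Spanish": {"el", "la", "de", "que", "es", "una", "o", "las", "a"},
    "French": {"le", "la", "des", "un", "est", "par", "en", "de", "les"},
}


def detect_language(text):
    """Return the language whose function words appear most often in the text."""
    # Lowercase, split on whitespace, and trim surrounding punctuation.
    tokens = [t.strip(".,;:()…") for t in text.lower().split()]
    counts = {
        language: sum(token in words for token in tokens)
        for language, words in STOPWORDS.items()
    }
    return max(counts, key=counts.get)


texts = {
    "Text A": "Machine learning, a branch of artificial intelligence, concerns "
              "the construction and study of systems that can learn from data…",
    "Text B": "El aprendizaje automático o aprendizaje de máquinas es una rama "
              "de la inteligencia artificial cuyo objetivo es desarrollar "
              "técnicas que permitan a las computadoras aprender…",
    "Text C": "L’apprentissage automatique (machine learning en anglais), un des "
              "champs d’étude de l’intelligence artificielle, est la discipline "
              "scientifique concernée par le développement…",
}
for name, text in texts.items():
    print(name, "->", detect_language(text))
```

A trained classifier generalizes this idea by learning which words and character patterns are typical of each language from many samples instead of relying on a fixed list.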

Product Classification

Imagine you have a set of products and you want to programmatically classify them according to their descriptions.

Let’s say we have three different products with their descriptions:

Product A: “This women’s printed pullover sweater is a great basic to add to your wardrobe. Throw this sweater over a top for extra warmth and to add some fun pops of color to your outfit. Pair it with jeans and boots this winter. This top is an excellent essential…”

Product B: “Relieve tired, aching feet from the stress of high heels. The fully-lined padded insole provides a comfortable fit and a rubber outsole for durability. Bendable comes in a convenient carry-bag…”

Product C: “Listen to your favorite songs from your Bluetooth-enabled device through the crystal clear V7 Bluetooth wireless speaker. You will enjoy its 33-foot range and hands-free speakerphone function that allows…”

With a classifier, we can organize products according to their descriptions. In our examples:

Product A’s description talks about a sweater, so it would be classified, for example, as Apparel -> Clothes -> Sweaters.

Product B’s description talks about some kind of women’s sandal, so it would be classified as Apparel -> Shoes.

Product C’s description talks about a Bluetooth speakerphone, so it would be classified as Electronics -> Audio.
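
The category paths above imply a hierarchical category tree. As a rough sketch, such a tree can be represented as nested dictionaries built from the path strings. The helper names (build_tree, render) are hypothetical, and the tree only contains the categories mentioned in these three examples.

```python
def build_tree(paths):
    """Build a nested dict from 'A -> B -> C' style category paths."""
    tree = {}
    for path in paths:
        node = tree
        for name in (part.strip() for part in path.split("->")):
            # Create the child category if needed, then descend into it.
            node = node.setdefault(name, {})
    return tree


def render(tree, indent=0):
    """Return the tree as indented lines, one category per line."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * indent + name)
        lines.extend(render(children, indent + 1))
    return lines


paths = [
    "Apparel -> Clothes -> Sweaters",
    "Apparel -> Shoes",
    "Electronics -> Audio",
]
tree = build_tree(paths)
print("\n".join(render(tree)))
# Apparel
#   Clothes
#     Sweaters
#   Shoes
# Electronics
#   Audio
```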

Topic Classification

Imagine you have any type of text (tweets, documents, news, or reviews) and you want to programmatically detect which topic it is about.

For example, suppose we have a webpage with content like this:

_images/cooking_recipe.png

A topic classifier may understand the written text and tag the information as Food & Drink -> Cooking.

Or if we have a tweet like this:

_images/tweet.png

A topic classifier may interpret the tweet and tag the information as Environment.

Sentiment Analysis

With sentiment analysis, you typically have a comment, tweet, or review from a user or customer, and you want to programmatically detect its sentiment (whether the author is talking positively or negatively about something):

_images/review.png

A sentiment classifier may categorize the review as: Hotels -> Positive.

You can be even more specific and do sentiment analysis at the sentence level: partition a review into sentences and get the sentiment of each one. This is very useful because most reviews express both positive and negative opinions. You can also detect which particular aspect each opinion is about. In the case of hotels, you may classify opinions to know whether they are talking about the service, location, price, etc.
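
The sentence-level approach can be sketched as follows: split the review into sentences, then label each sentence with a sentiment and the aspects it mentions. The keyword lists and the sample review below are made-up assumptions for illustration. A real sentiment classifier is trained on labeled samples and does not rely on fixed word lists.

```python
import re

# Tiny hand-made lexicons (illustrative assumptions, not a trained model).
POSITIVE = {"great", "friendly", "clean", "comfortable", "excellent"}
NEGATIVE = {"dirty", "noisy", "rude", "expensive", "bad"}
ASPECT_KEYWORDS = {
    "service": {"staff", "service"},
    "location": {"location", "neighborhood"},
    "price": {"price", "expensive", "cheap"},
}


def split_sentences(review):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def analyze_sentence(sentence):
    """Return (sentiment, aspects) for a single sentence."""
    tokens = set(re.findall(r"[a-z']+", sentence.lower()))
    score = len(tokens & POSITIVE) - len(tokens & NEGATIVE)
    if score > 0:
        sentiment = "Positive"
    elif score < 0:
        sentiment = "Negative"
    else:
        sentiment = "Neutral"
    aspects = sorted(a for a, words in ASPECT_KEYWORDS.items() if tokens & words)
    return sentiment, aspects


def analyze(review):
    """Return a list of (sentence, sentiment, aspects) tuples."""
    return [(s, *analyze_sentence(s)) for s in split_sentences(review)]


# Hypothetical hotel review mixing positive and negative opinions.
review = "The staff was friendly and helpful. The location was noisy. The price was great!"
results = analyze(review)
for sentence, sentiment, aspects in results:
    print(sentiment, aspects, "-", sentence)
```

Analyzing each sentence separately keeps the mixed opinions apart, whereas a single label for the whole review would hide the negative comment about the location.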

Key Points in a Classifier

Text classification modules are constructed taking into account the following key aspects:

  1. A Category Tree
  2. Some Training Samples
  3. A Machine Learning Model
  4. Some Performance Measures

Next steps

We have looked at the basic concepts involved in text classification. The next step is to get a brief idea of how classifiers work.