Extraction modules extract data from text; that is, the result you are looking for exists within the text itself. This is the main difference from classification, where the result is a predicted label, tag, or category that is usually not present in the text and has to be inferred from its contents.
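The distinction can be sketched in a few lines of Python. This is only an illustration, not MonkeyLearn code: the function names `extract_emails` and `classify_topic`, the regular expression, and the word list are all assumptions made for this example.

```python
import re

def extract_emails(text):
    """Extraction: every result is a substring of the input text."""
    # Deliberately simple pattern, good enough for this illustration only.
    return re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)

def classify_topic(text):
    """Classification: the result is a label that need not appear in the text."""
    sports_words = {"match", "goal", "team"}  # hypothetical cue words
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "Sports" if words & sports_words else "Other"

text = "The team scored a late goal. Contact press@example.com for details."
emails = extract_emails(text)   # values copied out of the text
label = classify_topic(text)    # value inferred from the text

print(emails)
print(label)
```

Every extracted email can be found verbatim in the input, whereas the label is produced by the classifier and does not have to occur in the input at all.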

MonkeyLearn offers extraction modules for different types of data: addresses, emails, entities, company names, keywords, etc. Select the extraction module that fits your particular problem. In the near future, we will add a feature that lets users create their own custom extractors.

Examples of Extraction

MonkeyLearn provides several pre-created extraction modules. The following are some of the pre-created extractors that are ready to use within MonkeyLearn.

Keyword Extraction

Keywords are relevant terms within a text, terms that in some way summarize its contents. A keyword can consist of one or more words. Keywords can be used to index data for search, summarize texts, generate tag clouds, etc.

For example, if we have the following text as an input to a keyword extractor:

The results may be something like:

The extractor returns the most relevant terms within the text. As you can see, a term can consist of more than one word, and each term has a corresponding relevance measure that indicates how important it is within that particular content.
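To make the idea of multi-word keywords with a relevance measure concrete, here is a minimal sketch based on simple phrase frequency. It is not the algorithm MonkeyLearn uses; the function name `extract_keywords`, the stopword list, the output field names, and the scoring (a phrase's share of all candidate phrases) are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Small, hypothetical stopword list; a real extractor would use a larger one.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in",
             "is", "it", "of", "on", "or", "the", "to", "with"}

def extract_keywords(text, top_n=5):
    """Return the top_n candidate phrases with a frequency-based relevance."""
    # Keep punctuation as separate tokens so phrases never span sentences.
    tokens = re.findall(r"[a-z0-9]+|[.,;:!?]", text.lower())
    counts = Counter()
    phrase = []
    for token in tokens + ["."]:  # trailing "." flushes the last phrase
        if token in STOPWORDS or not token.isalnum():
            if phrase:
                counts[" ".join(phrase)] += 1
            phrase = []
        else:
            phrase.append(token)
    total = sum(counts.values())
    return [
        {"keyword": kw, "relevance": round(n / total, 3)}
        for kw, n in counts.most_common(top_n)
    ]

sample_text = (
    "Text mining is useful. Text mining and machine learning are related, "
    "and machine learning is popular. Text mining is fun."
)
keywords = extract_keywords(sample_text, top_n=2)
for item in keywords:
    print(item["keyword"], item["relevance"])
```

In this sample, "text mining" occurs three times and "machine learning" twice, so they rank first and second. Candidate phrases are runs of consecutive non-stopword tokens, which is why a keyword can span more than one word.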

Entity Extraction

Entities can be persons, organizations, or locations. A Named Entity Recognition (NER) extractor returns the entities that exist within the text. NER extractors label sequences of words in a text that are the names of things, alongside their corresponding types: PERSON, ORGANIZATION, and LOCATION.

For example, if we have the following text as an input to an entity extractor:

The results may be something like:

It found that the text mentioned six different locations (Europe, Prussia, Austria-Hungary, Austria, Germany and Russia), and one person (Otto von Bismarck).
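A result like the one described above can be grouped by entity type with a few lines of Python. The record layout used here (a list of dictionaries with `entity` and `tag` keys) is an assumption for illustration and may not match the format the module actually returns; the entity names and types are the ones listed in the paragraph above.

```python
from collections import defaultdict

# Hypothetical result shape: one record per entity found in the text.
entities = [
    {"entity": "Europe", "tag": "LOCATION"},
    {"entity": "Prussia", "tag": "LOCATION"},
    {"entity": "Austria-Hungary", "tag": "LOCATION"},
    {"entity": "Austria", "tag": "LOCATION"},
    {"entity": "Germany", "tag": "LOCATION"},
    {"entity": "Russia", "tag": "LOCATION"},
    {"entity": "Otto von Bismarck", "tag": "PERSON"},
]

def group_by_type(records):
    """Collect entity names under their type, preserving input order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["tag"]].append(record["entity"])
    return dict(grouped)

grouped = group_by_type(entities)
for tag, names in grouped.items():
    print(f"{tag} ({len(names)}): {', '.join(names)}")
```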

Address Extraction

Let’s say that we want to extract addresses from text. If we input the following text from a webpage to an address extraction module:

[Image: _images/addresses_example.png]

We would obtain the following output (a list of extracted addresses with their corresponding components):
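As a rough illustration of what extracting addresses together with their components involves, the sketch below uses a regular expression for simple US-style street addresses. It is not MonkeyLearn's address extractor: the pattern, the function name `extract_addresses`, the component names, and the sample text are all assumptions made for this example, and a production extractor has to handle far more formats.

```python
import re

# Hypothetical pattern: "<number> <Street Name> <suffix>, <City>, <ST> <zip>".
ADDRESS_RE = re.compile(
    r"(?P<number>\d+)\s+"
    r"(?P<street>[A-Z][A-Za-z]*(?:\s[A-Z][A-Za-z]*)*"
    r"\s(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd)\.?),\s*"
    r"(?P<city>[A-Z][A-Za-z]*(?:\s[A-Z][A-Za-z]*)*),\s*"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5})"
)

def extract_addresses(text):
    """Return one dict of components per address found in the text."""
    return [match.groupdict() for match in ADDRESS_RE.finditer(text)]

# Made-up sample text, not the webpage text from the original example.
sample = (
    "Visit us at 123 Main Street, Springfield, IL 62701 "
    "or write to 456 Oak Ave, Portland, OR 97205."
)
addresses = extract_addresses(sample)
for address in addresses:
    print(address)
```

Each match is returned as a dictionary of named components, so downstream code can use the street, city, state, and ZIP code separately instead of parsing a single string again.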

Next steps

We have covered the basic concepts and some examples of extraction modules. As a next step, see Extraction: Quick Start for a quick overview of how to use an extraction module.