After selecting a parent category in the category tree, you can see its confusion matrix, which shows the confusion between the actual category (the one the sample was tagged with) and the predicted category (the one predicted by the classifier).


The example above shows the confusion matrix for the Root category of this news classifier. You can see, for example, that 49 samples were tagged as Arts & Culture but were predicted as Living (the red number 49 in the first row of the matrix).

Note that you always want all numbers greater than 0 to be on the orange diagonal of the matrix. That would mean your classifier works with 100% accuracy (100% precision and 100% recall in every category involved). Red numbers indicate confusions that are substantially larger than the rest; those errors are good candidates for where to start improving your classifier.
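To make the link between the matrix and the precision and recall figures concrete, here is a minimal sketch in Python. The `precision_recall` helper and the nested-dictionary layout are assumptions made for illustration, not part of the product. Only the 49 (Arts & Culture predicted as Living) comes from the example above; the other counts are made up.

```python
def precision_recall(matrix, category):
    """Compute (precision, recall) for one category.

    `matrix[actual][predicted]` holds the number of samples tagged as
    `actual` that the classifier predicted as `predicted`.
    """
    true_positives = matrix[category][category]
    # Column total: everything the classifier predicted as `category`.
    predicted_total = sum(row[category] for row in matrix.values())
    # Row total: everything that was actually tagged as `category`.
    actual_total = sum(matrix[category].values())
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / actual_total if actual_total else 0.0
    return precision, recall


# Illustrative counts: only the 49 is taken from the example in the text.
matrix = {
    "Arts & Culture": {"Arts & Culture": 151, "Living": 49},
    "Living": {"Arts & Culture": 10, "Living": 190},
}

arts_precision, arts_recall = precision_recall(matrix, "Arts & Culture")
living_precision, living_recall = precision_recall(matrix, "Living")
```

With these counts, the 49 off-diagonal samples lower the recall of Arts & Culture and the precision of Living. If every off-diagonal cell were 0, both values would be 1.0 for every category.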

The confusion matrix is created by partitioning the dataset into 4 disjoint subsets and performing k-fold cross-validation with k = 4. This consists of training and testing 4 different models, each one trained on 3/4 of the data and tested on the remaining 1/4.
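The procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's implementation: `k_fold_confusion`, the stride-based split, the `train_nearest_mean` toy classifier and the numeric samples are all hypothetical stand-ins, and the way the real dataset is split is not described here.

```python
from collections import Counter, defaultdict


def k_fold_confusion(samples, labels, train, k=4):
    """Return a Counter mapping (actual, predicted) -> count over k folds."""
    confusion = Counter()
    for fold in range(k):
        # Hold out every k-th sample (offset by `fold`) for testing,
        # so the k test sets are disjoint and cover the whole dataset.
        test_idx = set(range(fold, len(samples), k))
        train_x = [s for i, s in enumerate(samples) if i not in test_idx]
        train_y = [l for i, l in enumerate(labels) if i not in test_idx]
        predict = train(train_x, train_y)  # a fresh model per fold
        for i in sorted(test_idx):
            confusion[(labels[i], predict(samples[i]))] += 1
    return confusion


def train_nearest_mean(xs, ys):
    """Toy classifier: predict the category whose mean value is closest."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    means = {y: sum(v) / len(v) for y, v in groups.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


# Made-up one-dimensional "samples" standing in for real texts.
samples = [1.0, 9.0, 2.0, 8.0, 1.5, 8.5, 6.0, 9.5]
labels = ["Living", "Arts & Culture"] * 4

confusion = k_fold_confusion(samples, labels, train_nearest_mean, k=4)
```

Because every sample lands in exactly one test set, the counts in `confusion` add up to the size of the dataset, and each count reflects a prediction made by a model that never saw that sample during training.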

You can click the numbers in the matrix to see the corresponding samples in the Samples section. For example, if you click on the red 49, you'll see something like this:



These are the samples that were tagged as Arts & Culture but were predicted as Living at testing time.

See the Sample Management section for more details on how to manage samples in a classifier.