NetOwl DocMatcher

Document Categorization

NetOwl DocMatcher® uses advanced linguistic features and robust machine learning algorithms to intelligently compare and categorize documents. This unique combination of natural language processing (NLP)-based features and machine learning makes DocMatcher both powerful and highly accurate.

DocMatcher is used for a variety of automated document categorization and comparison tasks, ranging from intelligence analysis to Customer Relationship Management (CRM), Customer Experience Management (CEM), patent analysis, and resume routing. For example, DocMatcher's Sentiment Analysis categorizes a wide variety of social media data, such as Twitter posts, as negative, neutral, or positive. Its more sophisticated Opinion Mining capabilities detect customers’ opinions on particular products and services.

DocMatcher also makes it easy to explore concepts and relationships between documents and to identify duplicate or near-duplicate documents.

Unlike other trainable categorization products, DocMatcher does not require a large set of training documents to achieve high accuracy. In a large-scale operational benchmark, a large government agency ranked DocMatcher highest, by a significant margin, among the categorization tools it evaluated.



Achieves high accuracy using natural language processing (NLP) technology and sophisticated machine learning algorithms.


Optimized for high-speed document comparison and categorization.


Easy to create categorization “models” for different domains and applications, from sentiment analysis for Customer Experience Management (CEM) to topic categorization for enterprise portals.


Easy to customize categorization features through built-in and custom thesauri, semantic concepts, and stop word capabilities.


Scales to handle tens of thousands of categories to support large-scale enterprise applications.

Deployment Friendly

Offers the ability to incrementally update a trained model.


Supports document categorization in multiple languages.


  • Compares and categorizes documents with high accuracy
  • Avoids the bottleneck of requiring large sets of training documents
  • Optimized for high throughput in enterprise applications
  • Effectively reduces unstructured data volume through duplicate detection