NetOwl Extractor offers best-of-breed named entity extraction, as well as link and event extraction, in multiple languages. It is based on over a decade of advanced research and development. Using sophisticated computational linguistics and natural language processing technologies, NetOwl Extractor accurately finds and classifies key entities, events, and links in unstructured text.
NetOwl Extractor’s state-of-the-art accuracy and high throughput, combined with the latest cloud computing architectures using frameworks such as Hadoop and HPCC (High Performance Computing Cluster), make advanced Big Data Analysis a reality for unstructured text.
NetOwl offers four types of semantic ontologies out of the box, each one offering a progressively richer set of semantic concepts. NetOwl ontologies encompass a variety of domains including Business, Cyber Security, Finance, Homeland Security, Intelligence, Law Enforcement, Military, National Security, Politics, and Social Media.
With over 100 types of entities, NetOwl offers a broad semantic ontology for entity extraction that goes far beyond that of standard named entity extraction.
- Sample entities include people, organizations, places, addresses, artifacts, phone numbers, dates, etc.
- Not only named entities but also definite noun phrases and pronouns are extracted, and coreference resolution is performed to resolve entities within a document.
- Many important attributes, such as age, personal characteristics, country code, and currency, are also extracted for entities.
NetOwl extracts semantic links between two entities based on the linguistic clues within a text, not by simple co-occurrence. NetOwl’s Link ontology offers a large set of semantic links between entities.
- Sample links, useful across many domains, include affiliation, association, and various familial relationships.
- Advanced coreference resolution enables information-rich link extraction.
NetOwl offers an extensive event ontology out of the box, with over 100 types of events applicable to various domains. Event extraction captures not only the events themselves but also their participants and the associated times and locations. As with link extraction, coreference resolution enriches extracted events with detailed participant information.
- Sample events include transaction, conflict, merger and acquisition, personnel events, etc.
- Event participants (arguments) are extracted along with each event, yielding full semantic triples.
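To make the idea of an event with participants, a time, and a location more concrete, here is a minimal sketch of how such a record might be modeled and flattened into triples. The class names, field names, role labels, and sample values are all hypothetical illustrations; they are not NetOwl's actual output schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Participant:
    role: str    # hypothetical role label, e.g. "acquirer"
    entity: str  # name of the (already resolved) participating entity


@dataclass
class Event:
    event_type: str                      # hypothetical event type label
    participants: List[Participant] = field(default_factory=list)
    time: Optional[str] = None           # associated event time, if any
    location: Optional[str] = None       # associated event location, if any

    def triples(self) -> List[Tuple[str, str, str]]:
        """Flatten the event into (event_type, role, entity) triples."""
        return [(self.event_type, p.role, p.entity) for p in self.participants]


# Invented sample data, for illustration only.
event = Event(
    event_type="merger_and_acquisition",
    participants=[
        Participant("acquirer", "Acme Corp"),
        Participant("acquired", "Widget Inc"),
    ],
    time="2015-03-02",
    location="New York",
)

for triple in event.triples():
    print(triple)
```

Keeping the time and location on the event record, rather than on each triple, is just one possible design; a downstream application could equally attach them to every triple.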
Cyber Security Extraction
The Cyber Security ontology adds Cyber Security-specific semantic entity and event ontology types to NetOwl’s Entity, Link, and Event ontologies. It integrates concepts from US-CERT, DoD, and other leading cyber security organizations.
- Sample cyber entities and events include malware, hacking tools, denial of service, phishing, website hijacking, etc.
Multilingual
Supports multiple languages, including:
- Chinese (traditional and simplified)
- Persian (Farsi and Dari)
Language ID
Offers a seamlessly integrated language ID capability: the language of the input text is detected automatically, and the text is processed accordingly. Both microblog-length and standard document-length texts are supported. Mixed-language documents, in which different sections are written in different languages, are also handled automatically.
Name Normalization
Assigns normalized forms to extracted person, organization, and place names, taking into account capitalization, acronyms, abbreviations, nicknames, etc. When NetOwl’s Smart Geotagging is used alongside entity extraction, place names are both disambiguated and normalized. Name normalization is ideal for cross-document name resolution for applications such as faceted search and link analysis.
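As a rough illustration of why normalized names help with faceted search, the sketch below maps surface variants to a single normalized form and counts mentions per form. It is a toy example under stated assumptions: the `ALIASES` table, the `normalize_name` function, and the sample mentions are hypothetical, and a simple lookup table is a stand-in for, not a description of, NetOwl's normalization logic.

```python
import re
from collections import Counter

# Hypothetical alias table mapping cleaned-up variants to a normalized form.
ALIASES = {
    "faa": "Federal Aviation Administration",
    "federal aviation administration": "Federal Aviation Administration",
    "bill smith": "William Smith",
    "william smith": "William Smith",
}


def normalize_name(name: str) -> str:
    """Return a normalized form for a name, ignoring case and punctuation."""
    key = re.sub(r"[^\w\s]", "", name).lower()  # "F.A.A." -> "faa"
    key = " ".join(key.split())                 # collapse whitespace
    # Unknown names fall back to the original text with whitespace collapsed.
    return ALIASES.get(key, " ".join(name.split()))


# Invented mentions, as if extracted from several documents.
mentions = ["FAA", "F.A.A.", "Federal Aviation Administration",
            "Bill Smith", "William Smith"]

# Facet counts keyed by normalized name rather than by surface form.
facets = Counter(normalize_name(m) for m in mentions)
print(facets)
```

Counting by normalized form means all five mentions collapse into two facet entries instead of five.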
Semantic Disambiguation
Recognizes and classifies concepts using linguistic context. This sophisticated feature resolves semantic ambiguities such as:
- "Apple" (company) vs. "apple" (fruit)
- "Jordan" (place) vs. "Jordan" (person)
- "fire" a weapon (conflict) vs. "fire" a person (personnel)
Coreference Resolution
Resolves co-referring extracted entities, whether they are name aliases, pronouns, or definite noun phrases, identifying them as referring to the same object. For example:
- "FAA" ➞ "Federal Aviation Administration"
- "The company’s Chairman of the Board" ➞ "Mary Smith"
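One way to picture the result of coreference resolution is as chains of mentions that all refer to the same object. The sketch below groups mention pairs into chains with a small union-find structure. It only shows how already-resolved pairs could be represented; it does not decide which mentions corefer, and it is not NetOwl's algorithm. The `build_chains` function is hypothetical, and the "she" mention is an invented addition to the examples above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_chains(links: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (mention, antecedent) pairs into chains keyed by a root mention."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for mention, antecedent in links:
        root_mention = find(mention)
        root_antecedent = find(antecedent)
        parent[root_mention] = root_antecedent  # merge the two chains

    chains: Dict[str, List[str]] = defaultdict(list)
    for mention in list(parent):
        chains[find(mention)].append(mention)
    return dict(chains)


# Pairs assumed to have been resolved already.
links = [
    ("FAA", "Federal Aviation Administration"),
    ("The company's Chairman of the Board", "Mary Smith"),
    ("she", "The company's Chairman of the Board"),  # invented pronoun mention
]

chains = build_chains(links)
for root, members in chains.items():
    print(root, "<-", members)
```

Because chains are merged transitively, the pronoun ends up in the same chain as "Mary Smith" even though it was linked only to the noun phrase.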
Smart Name Translation
NetOwl Extractor’s Smart Name Translation capability provides a Latin-character translation of named entities extracted from languages written in non-Latin scripts, for example, Arabic, Persian (Farsi, Dari), Chinese (traditional, simplified), Korean, and Russian.