As data volumes continue to grow, even the most powerful single servers can no longer keep up with the throughput requirements of many applications. Cloud computing is the latest framework for providing horizontally scalable, distributed computing services that can keep pace with the demands of Big Data applications. NetOwl has been deployed on a variety of public and private clouds to help customers address their data analytics needs.
Public clouds such as Amazon’s Elastic Compute Cloud (EC2) provide a readily accessible platform for rapidly deploying NetOwl services on any number of nodes to meet an individual customer’s throughput requirements. Other public clouds, such as those offered by HP, IBM, Microsoft, and others, can also run NetOwl services to provide advanced text and entity analytics for Big Data.
For customers who prefer private clouds, NetOwl provides the same text and entity analytics services within such closed environments. Whether running on the same commodity hardware and infrastructure used by public cloud frameworks or on special-purpose platforms such as the LexisNexis High Performance Computing Cluster (HPCC), where it is currently deployed, NetOwl is readily deployable across distributed computing environments.
One specific distributed computing environment for running NetOwl on a cluster of machines is Apache Hadoop. Through Hadoop’s MapReduce processing paradigm, NetOwl analyzes large quantities of unstructured data in parallel to extract key entities, relationships, events, and geospatial data, as sketched below. This NetOwl-derived semantic information can then be stored in a traditional RDBMS, an XML database such as MarkLogic, or an emerging NoSQL database such as Apache HBase, Apache Cassandra, or Amazon Dynamo, where downstream analytical and visualization software can exploit it for more advanced Big Data analysis.
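To illustrate how extraction fits the MapReduce paradigm, the following is a minimal, map-only Hadoop job sketch: each mapper processes one document and emits (entity type, entity text) pairs. The extractEntities() method is a hypothetical stand-in, since NetOwl’s actual API is not shown here; the Hadoop classes and calls are standard.

import java.io.IOException;
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EntityExtractionJob {

  // Mapper: each input record is treated as one document; the map step
  // runs entity extraction and emits (entityType, entityText) pairs.
  public static class ExtractionMapper
      extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String document = value.toString();
      for (String[] entity : extractEntities(document)) {
        context.write(new Text(entity[0]), new Text(entity[1]));
      }
    }

    // Hypothetical placeholder: a real deployment would invoke the
    // extraction engine here instead of returning an empty list.
    private Iterable<String[]> extractEntities(String document) {
      return Collections.emptyList();
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "entity-extraction");
    job.setJarByClass(EntityExtractionJob.class);
    job.setMapperClass(ExtractionMapper.class);
    job.setNumReduceTasks(0); // map-only: each document is processed independently
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because extraction is independent per document, a map-only job like this scales horizontally with the number of cluster nodes; the extracted pairs written to the output path could equally be loaded into HBase, Cassandra, or another store for downstream analysis.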