ZETTASET

Zettaset enables massive-scale data analytics for the enterprise via Hadoop, for a variety of applications including the Security Data Warehouse (SDW). A Hadoop-based SDW provides long-term log storage for (1) forensics during incident investigations and (2) data mining for a variety of fraud analyses. An SDW complements the real-time log/event collection, normalization, enrichment, and correlation performed by Security Information and Event Management (SIEM) solutions. Large organizations now generate log volumes that have outstripped traditional and even columnar relational databases. The underlying Hadoop engine enables iterative analysis of log data in the hundreds to thousands of terabytes by scaling out horizontally on commodity servers and storage.

Zettaset provides an easy-to-scale, resilient, and secure solution for data aggregation and analysis. It offers true redundancy in a solution that is scalable and easy to deploy, with a simple licensing model that leads to a lower total cost of ownership.

Big data refers to large datasets on the order of terabytes, petabytes, exabytes, and zettabytes. Working with this much data using existing databases and management tools is challenging because of its volume, speed, and diversity, and because of the analysis it requires. Bridging this gap allows analysts to make informed decisions, and as we enter the zettabyte age this becomes a necessity.

The Apache Hadoop software library is a framework for the distributed processing of large datasets across clusters of computers built from commodity hardware. It is most beneficial when combined with its subprojects and other Hadoop-related projects. The framework is designed for flexibility and scalability, with an architecture that scales to thousands of servers. Rather than relying on the hardware for high availability, the library detects and handles failures at the application layer, delivering a highly available service on commodity hardware.
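
Hadoop detects failures at the application layer largely through heartbeats: worker nodes report to a master periodically (for example, DataNodes send heartbeats to the NameNode), and a node that stays silent too long is treated as failed. The sketch below is a simplified, single-process illustration of that idea, not Hadoop code; the `HeartbeatMonitor` class and the 30-second timeout are hypothetical choices made only for illustration.

```python
import time
from typing import Dict, List, Optional

# Hypothetical timeout chosen for illustration; real Hadoop timeouts differ
# and are configurable.
HEARTBEAT_TIMEOUT_SECONDS = 30.0


class HeartbeatMonitor:
    """Tracks the last heartbeat from each worker node and flags nodes
    that have been silent for longer than the timeout."""

    def __init__(self, timeout: float = HEARTBEAT_TIMEOUT_SECONDS) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, node: str, now: Optional[float] = None) -> None:
        # Workers report periodically; store the arrival time of the latest report.
        self.last_seen[node] = time.monotonic() if now is None else now

    def dead_nodes(self, now: Optional[float] = None) -> List[str]:
        # A node is presumed failed if its last heartbeat is older than the timeout.
        current = time.monotonic() if now is None else now
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if current - seen > self.timeout
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=30.0)
    monitor.record_heartbeat("node-a", now=0.0)
    monitor.record_heartbeat("node-b", now=0.0)
    monitor.record_heartbeat("node-a", now=25.0)  # node-a keeps reporting
    print(monitor.dead_nodes(now=40.0))           # ['node-b']
```

Passing explicit `now` values keeps the example deterministic; in a live system the monitor would use the clock and the master would then reschedule work or re-replicate data held by the failed node.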

Zettaset extends this offering by filling the gaps in the open-source projects, integrating as many as 30 services and dependencies into a single deployable system.

MAAP – Management, Administration, Automation and Provisioning

Management

  • A single administration interface lets users manage petabytes of data and thousands of machines, controlling all cluster environments from one application
  • Administrators can see which servers in the cluster are online, which processes and services are running, and performance metrics for individual nodes, and can apply updates and expand functionality
  • Data is protected by the fail-safe model and the redundancy of the solution’s key components, including the NameNode server
  • Kerberos authentication, in conjunction with group and user level access control and data encryption, provides customers with the means to customize their security model to further protect the safety and availability of the data
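
Kerberos establishes who a user is; group- and user-level access control then decides what that user may do. HDFS uses a POSIX-style owner/group/other permission model, and the sketch below illustrates that style of check. It is a hypothetical, simplified helper (`FileEntry`, `User`, and `is_allowed` are illustrative names), omits superuser handling and extended ACLs, and is not Zettaset's or Hadoop's implementation.

```python
from dataclasses import dataclass, field
from typing import FrozenSet

# Permission bits, as in POSIX modes.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class FileEntry:
    owner: str
    group: str
    mode: int  # e.g. 0o750: owner rwx, group r-x, others ---


@dataclass(frozen=True)
class User:
    name: str  # identity assumed to be already authenticated (e.g. by Kerberos)
    groups: FrozenSet[str] = field(default_factory=frozenset)


def is_allowed(user: User, entry: FileEntry, action: int) -> bool:
    """Select exactly one permission class (owner, then group, then other)
    and check that it grants every bit requested in `action`."""
    if user.name == entry.owner:
        bits = (entry.mode >> 6) & 0o7
    elif entry.group in user.groups:
        bits = (entry.mode >> 3) & 0o7
    else:
        bits = entry.mode & 0o7
    return (bits & action) == action


if __name__ == "__main__":
    logs = FileEntry(owner="etl", group="security", mode=0o750)
    analyst = User("alice", frozenset({"security"}))
    print(is_allowed(analyst, logs, READ))   # True: group class grants r-x
    print(is_allowed(analyst, logs, WRITE))  # False: group class lacks w
```

As in POSIX, only the first matching class is consulted, so an owner whose owner bits deny access is not rescued by the group bits.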

Administration

  • Our Application Programming Interface (API) supports most major programming and scripting languages to allow for easy integration
  • Support for any data format (structured, semi-structured, or unstructured), combined with the ability to scale to several thousand servers and ease of management, allows the infrastructure to expand on demand on commodity hardware
  • A centralized web console for managing services and jobs and for adding or decommissioning nodes within a cluster
  • Ability to propagate software updates to all nodes from a centralized console and audit all transactions occurring on the platform

Automation

  • Monitoring of all critical services, with alerts and notifications for failures on the cluster and self-healing
  • Rapid server provisioning for thousands of servers, for a highly resilient and easily deployable solution
  • Failover for high availability, where every critical service (Hive, Oozie, and ZooKeeper) has an automated failover system with support for multiple backup servers
  • NameNode failover to mitigate data loss by providing secondary- and tertiary-level redundancy of the file block map; triple replication of each data block protects the data itself
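
Triple replication means each block is stored on three different nodes, so the raw disk consumed is roughly three times the data actually stored. The arithmetic below is a sketch assuming a 128 MiB block size (the HDFS default `dfs.blocksize` in Hadoop 2.x and later) and a replication factor of 3 (the HDFS default `dfs.replication`); both are configurable, and `storage_footprint` is a hypothetical helper name.

```python
from typing import Tuple

MiB = 1024 * 1024
BLOCK_SIZE_BYTES = 128 * MiB  # assumed block size (HDFS default in Hadoop 2.x+)
REPLICATION_FACTOR = 3        # "triple replication"


def storage_footprint(file_size_bytes: int,
                      block_size: int = BLOCK_SIZE_BYTES,
                      replication: int = REPLICATION_FACTOR) -> Tuple[int, int, int]:
    """Return (blocks, block replicas, raw bytes on disk) for one file."""
    # Integer ceiling division: a partial final block still counts as a block.
    blocks = (file_size_bytes + block_size - 1) // block_size
    replicas = blocks * replication
    # A partial final block occupies only the bytes it holds, so raw usage is
    # the file size times the replication factor, not replicas * block_size.
    raw_bytes = file_size_bytes * replication
    return blocks, replicas, raw_bytes


if __name__ == "__main__":
    print(storage_footprint(300 * MiB))  # (3, 9, 943718400) -> 900 MiB on disk
    print(storage_footprint(1024 ** 4))  # 1 TiB -> 8192 blocks, 24576 replicas, 3 TiB raw
```

Losing any single node therefore leaves at least two copies of every block it held, which the cluster can use to restore the third.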

Provisioning

  • Assistance during installation in choosing the appropriate hardware, software, and tools, plus a fully supported Hadoop distribution with associated infrastructure
  • Installation, setup, and configuration, with documentation of the process for adding and removing machines
  • Support for more than a trillion files expands the utility of the analytic platform
  • Simple licensing model, per usable stored terabyte, leads to a significantly lower Total Cost of Ownership (TCO), enabling customers to scale easily and efficiently

 

Features of Zettaset Compared with the Native Apache Hadoop Project:

Feature                        Apache Hadoop   Zettaset
Ease of Configuration          No              Yes
Ease of Scalability            No              Yes
Resilience                     No              Yes
Built-in Management            No              Yes
Automatic Health Monitoring    No              Yes
Distributed Data Processing    Yes             Yes
Commodity Hardware             Yes             Yes
Certified Stack                No              Yes
Additional Language Support    No              Yes
Enhanced Security              No              Yes
Rolling Upgrades               No              Yes

Definitions of Terms

  • Hadoop – Java software framework to support data-intensive distributed applications
  • ZooKeeper – a highly reliable distributed coordination system
  • MapReduce – flexible parallel data processing framework on large datasets
  • HDFS – Hadoop Distributed File System
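  • Oozie – a workflow scheduler for Hadoop jobs such as MapReduce
  • HBase – a distributed, column-oriented key-value database built on HDFS
  • Hive – a data warehouse system built on top of MapReduce that uses a SQL-like language (HiveQL) to analyze large datasets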
  • Pig – enables the analysis of large datasets using Pig Latin. Pig Latin is a high-level language compiled into MapReduce for parallel data processing
  • Admin Console / API – enables users to import, visualize, manipulate, analyze and export data with API support for almost all programming/scripting languages
  • Services Management & Monitoring – automatic monitoring and restart of services
  • High Availability – NameNode failover and services failover, no single point of failure
  • Backups & Failover – hot Disaster Recovery (DR)
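
MapReduce splits a job into a map step that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce step that aggregates each group. The classic word-count example below runs all three phases in a single Python process purely to illustrate the model; it does not use Hadoop's Java API, the function names are hypothetical, and a real job would run many map and reduce tasks in parallel across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    # Sorting keys mirrors the sorted input each Hadoop reducer receives.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    logs = [
        "login failed user=alice",
        "login ok user=bob",
        "login failed user=alice",
    ]
    print(word_count(logs))
```

Because each mapper sees only its own records and each reducer sees only its own keys, both phases can be spread across machines; the shuffle is the step that moves data between them.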
