BDL Architecture

Faster, Connected, Big Data

BDL is the industry's only truly secure, enterprise-ready Big Data distribution aimed at understanding unstructured data. BDL addresses the full range of needs for data-at-rest and dark data, powers a new breed of customer applications and delivers unprecedented insights to accelerate innovation.

Data Management

The Hadoop Distributed File System (HDFS) is the core component of the BigConnect Data Lake (BDL) for data-at-rest and dark data. HDFS provides scalable, fault-tolerant and cost-efficient storage for your Big Data Lake. Operational data is stored on top of HDFS in Druid, Accumulo or the BigConnect Graph Engine.

More info: HDFS, Druid, Accumulo, Hive, BigConnect Graph Engine
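To make the storage claims more concrete: HDFS splits each file into large fixed-size blocks and keeps several replicas of every block on different DataNodes, so losing a single node does not lose data. The minimal sketch below estimates the block count and raw disk usage for one file, assuming the stock Apache Hadoop defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); a BDL cluster may be configured differently, and the function name `hdfs_footprint` is hypothetical.

```python
# Stock Apache Hadoop defaults (assumed; a given cluster may override them).
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # dfs.blocksize, in bytes
DEFAULT_REPLICATION = 3                 # dfs.replication


def hdfs_footprint(file_size: int,
                   block_size: int = DEFAULT_BLOCK_SIZE,
                   replication: int = DEFAULT_REPLICATION) -> tuple[int, int, int]:
    """Return (blocks, block replicas, raw bytes on disk) for a single file.

    HDFS does not pad the final block, so raw usage is file_size * replication
    rather than blocks * block_size * replication.
    """
    if file_size < 0 or block_size <= 0 or replication <= 0:
        raise ValueError("file_size must be >= 0; block_size and replication > 0")
    # Integer ceiling division avoids float rounding on very large files.
    blocks = (file_size + block_size - 1) // block_size
    return blocks, blocks * replication, file_size * replication


if __name__ == "__main__":
    one_gib = 1024 ** 3
    blocks, replicas, raw = hdfs_footprint(one_gib)
    print(blocks, replicas, raw // one_gib)  # 8 24 3
```

Under these assumptions a 1 GiB file occupies 8 blocks, stored as 24 block replicas and 3 GiB of raw disk: the replication overhead is the price paid for fault tolerance on commodity hardware.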

Data Access

BDL includes a versatile range of processing engines that let you interact with the same data in multiple ways at the same time. Big data analytics applications can therefore access the data in whichever way suits them best: federated, interactive SQL, low-latency NoSQL access, or linked data through Cypher.

More info: Druid, Accumulo, Hive, Presto, Cypher Lab

Data Analytics

Best-of-breed tools for big data analytics help you derive instant knowledge and insights. Use BigConnect Explorer to search and aggregate data and see how it is linked, or BigConnect Discovery to create stunning dashboards and reports.

More info: BigConnect Explorer, BigConnect Discovery

Data Science

Emerging data science use cases are enabled by Apache Spark, TensorFlow, H2O, Jupyter Notebooks and Prodigy, all integrated seamlessly. Train, test and productionize your state-of-the-art models with StreamSets data pipelines or with data preparation workflows in BigConnect Discovery.

More info: Spark, TensorFlow, H2O, Jupyter Notebook

Data Movement

BDL brings data access and management to a new level with powerful tools for data governance and integration. These tools provide a reliable, repeatable and complete framework for managing the flow of data into and out of your Data Lake. This controlled structure, together with tooling that eases and automates applying schemas or metadata to sources, is critical for successfully integrating a Data Lake into your modern data architecture.

More info: BigConnect Data Collector

Security

Security is woven into BDL at multiple layers, with critical features in place for each key requirement: authentication, authorization, accountability and data protection.

Operations

Operations teams deploy, monitor and manage a BDL cluster within their broader enterprise data ecosystem, and Apache Ambari simplifies this work. Ambari is an open-source management platform for provisioning, managing, monitoring and securing the BigConnect Data Lake, enabling it to fit seamlessly into your enterprise environment.

More info: Apache Ambari

Hybrid Cloud

More and more enterprise architectures are shifting to hybrid and multi-cloud environments. While this shift allows for more flexibility and agility, it also means separating compute from storage, which creates new challenges in how data is managed and orchestrated across frameworks, clouds and storage systems. BDL provides an in-memory, HDFS-compatible virtual file system for working with on-premises and cloud data in a unified way, enabling hybrid data.

More info: Alluxio
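Alluxio presents a single namespace in which different storage systems are mounted at paths, much like a Unix mount table, so applications address on-premises and cloud data the same way. The sketch below illustrates only that resolution idea with a longest-prefix lookup that maps a unified path to an underlying storage URI. It is not Alluxio's API, and the mount points and URIs (`/cloud`, `hdfs://namenode:8020/lake`, `s3://example-bucket/lake` and so on) are hypothetical; in a real deployment, mounts are managed by Alluxio itself.

```python
from posixpath import normpath

# Hypothetical mount table: unified-namespace path -> underlying storage URI.
MOUNTS = {
    "/": "hdfs://namenode:8020/lake",
    "/cloud": "s3://example-bucket/lake",
    "/cloud/archive": "gs://example-archive/lake",
}


def resolve(path: str, mounts: dict[str, str] = MOUNTS) -> str:
    """Map a unified path to an underlying URI via the longest matching mount point."""
    # Normalize to an absolute path and collapse "..", "." and duplicate slashes.
    path = normpath("/" + path.lstrip("/"))
    best = None
    for mount in mounts:
        # A mount covers itself and anything below it, split on a "/" boundary.
        if mount == "/" or path == mount or path.startswith(mount + "/"):
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    suffix = path[len(best):].lstrip("/")
    target = mounts[best].rstrip("/")
    return f"{target}/{suffix}" if suffix else target


if __name__ == "__main__":
    print(resolve("/sales/2021.csv"))            # hdfs://namenode:8020/lake/sales/2021.csv
    print(resolve("/cloud/events.json"))         # s3://example-bucket/lake/events.json
    print(resolve("/cloud/archive/2020/x.csv"))  # gs://example-archive/lake/2020/x.csv
```

Longest-prefix matching lets a nested mount such as `/cloud/archive` override its parent `/cloud`, and requiring a `/` boundary keeps a path like `/cloudy/x` from being routed to the `/cloud` mount.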