Course Summary

The Big Data Architect works closely with the customer and the solutions architect to translate the customer's business requirements into a Big Data solution. The Big Data Architect has deep knowledge of the relevant technologies, understands how those technologies relate to one another, and knows how they can be integrated and combined to solve a given big data business problem effectively. This individual can design large-scale data processing systems for the enterprise and provide input on architectural decisions, including hardware and software. The Big Data Architect also understands the complexity of data and can design systems and models that handle its variety (structured, semi-structured, unstructured), volume, velocity (including stream processing) and veracity. Finally, the Big Data Architect can address the information governance and security challenges associated with the system.

After completing this course, you will be well on your way to becoming a BigConnect Architect.

Topics covered

  • BigConnect Overview
  • Data Lifecycle: loading, streaming, retention, replication
  • Transforming Data
  • Real-time, near-real-time and batch processing
  • Machine Learning pipelines
  • Scale up, scale out, scale to X
  • Hadoop, Spark and BigConnect Scalability
  • Fault tolerance in Hadoop, Spark and BigConnect
  • Security: principles, privacy, threats, technologies
  • Cluster sizing and evolution
  • Technology Selection: choose the right Big Data tools for the job
  • BigConnect and Hadoop software architecture
  • Planning and implementing a Data Lake
  • Enterprise Search
  • Best practices

Course Details


This course is best suited to solution architects and senior-level engineers who want to become proficient in the design, implementation and integration of Big Data solutions in enterprise IT or cloud-based environments.


4 Days | 4 hours per day


Ability to translate functional requirements into technical specifications.

Ability to take an overall solution or logical architecture and turn it into a physical architecture.

Understanding of clusters, scalability, networking, data modeling, latency, disaster recovery, data ingestion, data querying and security.

Working knowledge of technologies such as HDFS, Spark, MapReduce, Elasticsearch, data formats and relational database management systems.


  • Mac, Linux, or Windows
  • Latest version of Chrome or Firefox (Safari is not fully supported)
  • Stable internet connection
  • Disable any ad-blockers and restart your browser before class