HDP Developer: Quick Start - Public Live Training

This 4-day training course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Apache Pig and Apache Hive, and to develop applications on Apache Spark.

Not currently available

About this course

Overview:
This training course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Apache Pig and Apache Hive, and to develop applications on Apache Spark. Topics include an essential understanding of HDP and its capabilities; Hadoop, YARN, HDFS, and MapReduce/Tez; data ingestion; using Pig and Hive to perform data analytics on Big Data; and an introduction to Spark Core, Spark SQL, Apache Zeppelin, and additional Spark features.

Target Audience:
Developers and data engineers who need to understand and develop applications on HDP. 

Prerequisites:
Students should be familiar with programming principles and have experience in software development. SQL and light scripting knowledge is also helpful. No prior Hadoop knowledge is required.

Format:
Lecture/Discussion, Hands-on Labs and Demos

Duration:
4 Days

Curriculum

  • Course Logistics
  • HDP Developer: Quick Start - Live Training Schedule
  • Lesson 1:
  • Case for Hadoop
  • Lesson 2:
  • The Hadoop Ecosystem
  • Lab 1 - Starting an HDP 2.3 Cluster
  • Lesson 3:
  • HDFS Architecture
  • Lab 2 - Using HDFS Commands
  • Lesson 4:
  • Ingesting Data Into HDFS
  • Lesson 5:
  • Parallel Processing Fundamentals
  • Lesson 6:
  • YARN Architecture
  • Lesson 7:
  • Apache Pig
  • Demonstration 1 - Understanding Pig
  • Lab 3 - Getting Started with Pig
  • Lab 4 - Exploring Data with Pig
  • Lesson 8:
  • Advanced Pig Processing
  • Lab 5 - Splitting a Dataset
  • Lab 6 - Joining Datasets (Optional)
  • Lab 7 - Preparing Data for Hive
  • Lesson 9:
  • Apache Hive
  • Lab 8 - Understanding Hive Tables
  • Demonstration 2 - Understanding Partitions and Skew
  • Lab 9 - Analyzing Big Data with Hive
  • Demonstration 3 - Computing ngrams (Optional)
  • Lab 10 - Joining Datasets in Hive
  • Lab 11 - Computing ngrams of Emails in Avro Format (Optional)
  • Lesson 10:
  • Using HCatalog
  • Lab 12 - Using HCatalog with Pig (Optional)
  • Lesson 11:
  • Advanced Hive Programming
  • Lab 13 - Advanced Hive Programming
  • Lesson 12:
  • Overview of Zeppelin and Spark
  • Lab 14 - Introduction to Spark REPLs and Zeppelin
  • Lesson 13:
  • RDD Programming
  • Lab 15 - Create and Manipulate RDDs
  • Lesson 14:
  • Pair RDDs
  • Lab 16 - Create and Manipulate Pair RDDs
  • Lesson 15:
  • Spark SQL
  • Lab 17 - Create and Save DataFrames and Tables
  • Lab 18 - Working with DataFrames
  • Lesson 16:
  • Caching and Persisting
  • Lesson 17:
  • Build and Submit Spark Applications
  • Lab 19 - Build and Submit Applications to YARN
  • Lesson 18: (Optional)
  • Introduction to Machine Learning with Spark (Optional)
  • Lab - Machine Learning Walkthrough.pdf (Optional)
  • Wrapping Up
  • Course & Instructor Survey
