HDP Analyst: Data Science

This course provides instruction on the processes and practice of data science, including machine learning and natural language processing.

About this course

Overview:
This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. It covers tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.

Target Audience:
Architects, software developers, analysts and data scientists who need to apply data science and machine learning on Hadoop.

Prerequisites:
Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.
Students should be familiar with programming principles and have previous experience in software development. Experience with Linux and a basic understanding of DataFlow tools would be helpful. Prior hands-on Hadoop experience is not required, but it is very helpful.

Format:
Live Instructor
50% Lecture
50% Hands-On Labs

How to Register:

  1. Click the "Purchase" button at the top of the page to initiate your purchase.
  2. After completing your purchase and registration, log in to your account and select the event you wish to attend from the classes scheduled below.

HDP Analyst: Data Science - Live Training Schedule

Event | Date | Spaces Left
HDP Analyst: Data Science (Virtual) - 3 | Nov 28, 2017 10:00 a.m. - Nov 30, 2017 6:00 p.m. EST | 25
HDP Analyst: Data Science (Virtual) - 3 | Mar 06, 2018 10:00 a.m. - Mar 08, 2018 6:00 p.m. EST | 25

Curriculum

  • Course Logistics
  • HDP Analyst: Data Science - Live Training Schedule
  • Lesson 1:
  • AWS Guacamole Setup Guide
  • Using Hadoop for Data Science
  • Lab Guide: Setting Up the Development Environment
  • Lesson 2:
  • HDFS
  • Demonstration: Understanding Block Storage
  • Lab Guide: Using HDFS Commands
  • Lesson 3:
  • The MapReduce Framework
  • Demonstration: Understanding MapReduce
  • Lesson 4:
  • Hadoop 2 and YARN
  • Lesson 5:
  • Machine Learning From Data
  • Lab Guide: Using Apache Mahout for Machine Learning
  • Lesson 6:
  • Introduction to Pig
  • Demonstration: Understanding Pig
  • Lab Guide: Getting Started with Apache Pig
  • Lesson 7:
  • Python Programming
  • Lab Guide: Using the IPython Notebook
  • Lesson 8:
  • Analyzing Data with Python
  • Demonstration: Understanding the NumPy Package
  • Demonstration: Pandas Library
  • Lab Guide: Performing Data Analysis with Python
  • Lab Guide: Interpolating Data Points
  • Lesson 9:
  • Running Python on Hadoop
  • Lab Guide: Defining a Pig User Defined Function in Python
  • Lab Guide: Streaming Python with Pig
  • Lab Guide: Exploring Data with Apache Pig
  • Lesson 10:
  • Machine Learning Algorithms
  • Demonstration: Classification with Scikit-Learn
  • Lab Guide: Computing K-Nearest Neighbor
  • Lab Guide: Generating a K-Means Clustering
  • Lesson 11:
  • Natural Language Processing
  • Demonstration: POS Tagging Using a Decision Tree
  • Lab Guide: Using the Python Natural Language Toolkit
  • Lab Guide: Classifying Text using Naïve Bayes
  • Lesson 12:
  • Apache Spark MLlib
  • Lab Guide: Using Spark Transformations and Actions
  • Lab Guide: Using Spark MLlib
  • Lab Guide: Creating a Spam Classifier using Spark MLlib
  • Lesson 13:
  • Taking Data Science to Production
  • Wrapping Up
  • Course & Instructor Survey
