Management and Processing of Big Data

Course Title: Management and Processing of Big Data
Course Number: 900-058-EQ
Platform: Linux
Duration: 48 hours
Emploi-Québec fee (taxes incl.): $96
General Public fee (taxes incl.): $772.63
Schedule: Saturday & Sunday, 10 a.m. – 5:30 p.m.
Last class: Saturday, July 22, 10 a.m. – 4:30 p.m.
(30-minute lunch break)
Dates: May 27, June 3, 10, 17, July 8, 15, 22
Prerequisites: Comfort working in a Linux environment; some programming experience is required (ideally Java)
Target Audience: Developers, programmers
Instructor: Jacques LeNormand
Location: Brittain Hall / BH-213

NB: This is a non-credit course. A certificate is provided to all participants who complete at least 80% of the course hours.

Course Description:

This course provides practical, foundation-level training that prepares participants to take part in big data projects. Participants will be introduced to big data technologies and tools, including MapReduce and Hadoop. They will learn how to install and configure Hadoop in a cluster environment, how to write complex MapReduce programs, and how to analyze big data using Pig and Hive.

Topics Covered in this Course

Introduction to Hadoop

  1. Problems with traditional systems
  2. What is Hadoop and what does it do?
  3. Running your first big data program: an example

Components of Hadoop: Basic Concepts and HDFS

  1. The Hadoop Distributed File System
  2. MapReduce Overview
  3. Hadoop Cluster Overview
  4. Hadoop Jobs and Tasks

Hadoop Operations

  1. Setting up an Amazon Web Services (AWS) environment
  2. Setting up a Hadoop cluster on AWS
  3. Installing and configuring Hadoop in pseudo-distributed mode
  4. Adding and removing data nodes

Writing Basic MapReduce Programs

  1. Writing basic MapReduce programs in Java
  2. Writing MapReduce programs with the Streaming API
  3. Unit testing MapReduce programs
  4. Accessing HDFS programmatically

Advanced MapReduce Programs

  1. Chaining jobs
  2. Joining data from multiple sources

Related Projects

  1. NoSQL databases
  2. Hadoop ecosystem: Pig, Hive, Sqoop, HBase
