
Big Data



Introduction

Big data refers to the approach used when traditional data mining and handling techniques cannot uncover the insights and meaning in the underlying data. Data that is unstructured, time-sensitive, or simply very large is difficult to process with relational database engines alone. It calls for a different processing approach, commonly called big data, which relies on massive parallelism across readily available hardware.

Quite simply, big data reflects the changing world we live in. The more things change, the more the changes are captured and recorded as data. Take weather as an example. For a weather forecaster, the amount of data collected around the world about local conditions is substantial. Logically, it would make sense that local environments dictate regional effects and regional effects dictate global effects, but it could well be the other way around. Either way, weather data shows the attributes of big data: a massive amount of data must be processed in real time, and the many inputs may be machine-generated readings, personal observations, or outside forces such as sunspots.


Big Data Course Content

BIG DATA HADOOP

  • Big Data – the actual reason for Hadoop
  • Understanding Big data
  • Collecting and cleaning data
  • Traditional approach for processing and its challenges
  • Big data vs Hadoop

AN INTRODUCTION TO HADOOP

  • Hadoop overview
  • Hadoop components
  • Hadoop distributions
  • Getting started
  • What is HDFS
  • What is Map Reduce
  • Hadoop stack
  • Hands On – Hadoop setup and basic operation

HDFS

  • HDFS explained
  • High availability
  • Federation
  • Architecture
  • File system Shell
  • Hands On

MAP REDUCE

  • Map Reduce flow
  • Hello World
  • Map Reduce API concepts
  • Mapper
  • Reducer
  • Other components – combiner, partitioner, shuffle/sort
  • Hadoop 1.x vs 2.x
  • Hadoop streaming API
  • Hands on with Eclipse
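
As a rough illustration of the Map Reduce flow listed above, the usual "Hello World" exercise is a word count. The sketch below imitates the map, shuffle/sort and reduce steps in plain Python on a single machine; it does not use the Hadoop API, and the function names (mapper, shuffle, reducer, word_count) are hypothetical names chosen for this example.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort step: group values by key, then order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three steps in sequence over an iterable of text lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data needs hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

On a real cluster the mapper and reducer would run in parallel on different nodes and the framework would perform the shuffle; here everything runs in one process only to make the data flow visible.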

YARN

  • Architecture
  • Scheduler
  • Resource Manager (RM)
  • RM HA
  • YARN commands
  • Hands On with YARN applications

INTEGRATING HADOOP INTO THE WORKFLOW

  • RDBMS interaction using Sqoop
  • Workflow management using Oozie
  • Back office jobs with Zookeeper
  • Hands On with actual data sets

DATA MINING

  • Unstructured data mining using Pig
  • Structured data mining using Hive
  • Hands On with actual data sets

HBASE

  • Problems with SQL databases
  • Introduction to NoSQL
  • Hands On Exercises
  • Introduction to HBase
  • Column Families
  • Delving deeper into HBase
  • HBase Architecture
  • HBase Hands-On Exercises

DELVING DEEPER INTO THE HADOOP API

  • More about ToolRunner
  • Testing with MRUnit
  • Reducing Intermediate Data With Combiners
  • The configure and close methods for Map/Reduce Setup and Teardown
  • Writing Partitioners for Better Load Balancing
  • Hands-On Exercise
  • Directly Accessing HDFS
  • Using the Distributed Cache

PRACTICAL DEVELOPMENT TIPS AND TECHNIQUES

  • Debugging MapReduce Code
  • Using LocalJobRunner Mode for Easier Debugging
  • Retrieving Job Information with Counters
  • Logging
  • Splittable File Formats
  • Determining the Optimal Number of Reducers
  • Map-Only MapReduce Jobs
  • Hands-On Exercise

JOINING DATA SETS IN MAPREDUCE

  • Map-Side Joins
  • The Secondary Sort
  • Reduce-Side Joins

SQL

INTRODUCING SQL BASIC COMMANDS

INTRODUCTION TO SQL

  • Writing basic SELECT statements
  • Restricting and sorting data
  • Introducing SQL functions
  • Single-row functions and group functions
  • Conditional expressions
  • Using substitution variables
  • Introducing SQL commands
  • Using DDL Statements
  • Managing Tables
  • Data manipulation operations
  • Understanding transactions
  • Using transaction control statements
  • Overview of locks
  • Using the flashback and purge commands
  • Granting and revoking system and object privileges
  • Designing tables by using key constraints
  • Deferred constraints
  • Retrieving data from more than one table using join operations
  • Aggregating Data Using Group Functions
  • Introduction to views
  • Introduction to indexes
  • Introduction to synonyms
  • Introduction to sequences and their use with the database
  • Introducing subqueries
  • Single-row subquery, multiple-row subquery
  • Correlated subquery
  • Top-n analysis
  • Using the Set Operators
  • Inserting and Updating Data
  • Deleting Data
  • Creating Other Schema Objects
  • Managing Objects with Data Dictionary Views
  • Date and Time Functions
  • ROLLUP and CUBE Operators with GROUP BY
  • Multiple Insertion and Types
  • Hierarchical Tree
  • SQL*Loader

ADVANCED SQL

  • Overview of analytic functions
  • Introducing inline views
  • Introduction to the WITH clause
  • Materialized views
  • Overview of restore
  • Overview of semi-joins and anti-joins
  • Embedding subqueries in DML operations

LINUX

  • Installation of Linux
  • Access the command line
  • Manage files from the command line
  • Create, view, and edit text files
  • Manage local Linux users and groups