Big Data Syllabus
Big Data is a term applied to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage and process with low latency. Big data has one or more of the following characteristics: high volume, high velocity or high variety. Artificial intelligence (AI), mobile, social and the Internet of Things (IoT) are driving data complexity through new forms and sources of data. For example, big data comes from sensors, devices, video/audio, networks, log files, transactional applications, web, and social media, much of it generated in real time and at a very large scale.
Analysis of big data allows analysts, researchers and business users to make better and faster decisions using data that was previously inaccessible or unusable. Businesses can use advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, statistics and natural language processing to gain new insights from previously untapped data sources independently or together with existing enterprise data.
Big Data Training Modules
Why Big Data?
What is Big Data?
Use of Big Data
Different types of Databases:
OLAP & OLTP
Data
Hadoop major components
How does MapReduce work?
HDFS
Reading data from and writing data to HDFS
Components in HDFS
Hadoop 1 and Hadoop 2
Hadoop 1 Architecture
Hadoop 2 Architecture
MapReduce V1 Components & Responsibilities:
JobTracker
TaskTracker
MapReduce in Hadoop 2
Limitations and Advantages of Hadoop 1 & 2
What is YARN?
YARN Architecture
Core Components in YARN:
Application Master
Resource Manager
NodeManager
Container
How does it work?
Writing MapReduce Programs
Introducing UNIX Commands
HIVE
What is Hive and why use it?
Hive Architecture:
Metastore
HiveQL
Working on Hive
Hive Data types:
Column Types
String types
Timestamp
Literals
Complex Types
Create Databases and Operations
Partitioning:
Static Partition
Dynamic partition
Hive Bucketing
Querying a Table:
To filter on conditions
Arithmetic and Logical Operators
Complex Operators
Built-in Functions
Aggregate functions
Joins:
Left outer join
Right Outer join
Full Outer Join
SubQuery
File Formats in Hive
Connecting to Hive from a Java program
Program
PIG
Big Data Analytics Using Pig
Pig features
Pig vs MapReduce
Pig Architecture
Pig Components
How does Pig work?
Data Model
Tuple, Bag, Relation
Pig - Execution Modes
LOAD file
Pig Latin - Data Types
Complex Data Types
Type construction operators
Relational Operators
Join Operator:
Self-join
Inner-join
Outer-join: left-join, right-join, and full-join
Using Multiple Keys:
Cross Operator
Union Operator
Split Operator
Filtering
Distinct Operator
Foreach operator
Order By
Limit Operator
Built-in eval functions
String functions
Date Time functions
Program Exercise
R PROGRAMMING
OVERVIEW OF R
Features of R
Basic Syntax
R Data Types
R-variables
R-Operators
Conditional Statements
R-switch statement
R-Loops
R-functions
R-Strings
R-Vectors
R-lists
R-Matrices
R Arrays
R Factors
R Dataframes
R Packages
Melting and casting
R using Files
R JSON file
R-databases
Pie Charts
Bar Charts
Boxplots
Histograms
Mean, Median, Mode
Line Graph
Scatter plots
HBASE
What is HBase?
How does it work?
HBase Architecture
HBase Features
Table Creation
Describe and Alter
DML: Data Manipulation Language
Table scope operators
Exercise
Sqoop
Sqoop introduction
Import Data
Export Data
Sqoop JOB
Codegen
Eval
List Databases
List tables
Spark
What is Apache Spark?
How does it work?
Architecture
RDD