Data Engineering

The Introduction to Data Science class will survey the foundational topics in data science, namely:

  • Data Manipulation
  • Data Analysis with Statistics and Machine Learning
  • Data Communication with Information Visualization
  • Data at Scale — Working with Big Data
  • Introduction to R

The class will focus on breadth and present the topics briefly instead of focusing on a single topic in depth. This will give you the opportunity to sample and apply the basic techniques of data science.

Who Can Attend

  • Any Graduate (B.A., B.Com., B.Sc.)
  • Engineering Students (B.Tech., B.E., M.Tech.)
  • BCA, MCA
  • Any Diploma Holder
  • Any Working Professional
  • Data Warehouse Administrators
  • Database Administrators
  • Software Testers
  • Project Managers
  • MIS Support

Fundamentals of Big Data

  • What is Big Data
  • Managing Big Data
  • Extracting insights from Big Data
  • Big Data for business intelligence

Development Environment

  • Introduction to Databricks
  • Databricks account setup

Spark – DataFrame API

  • Create DataFrame
  • Schema Inference
  • File formats – awareness
  • Define custom schema
  • Introduction to DBFS

Spark – DataFrame – Functions

  • Functions, Filters & Aggregations
  • Windowing
  • Partitions & Bucketing
  • Joins

Spark SQL

  • Functions, Filters & Aggregations
  • Windowing
  • Partitions & Bucketing
  • Joins

Spark Architecture

  • Why Spark
  • Distributed computing
  • Cluster concepts
  • RDD concepts
  • Memory Management
  • Spark Optimization
  • Structured Streaming
  • Spark UI

ELT with Spark SQL

  • Data Extraction techniques
  • Data load features
  • Transformation techniques
  • Delta Lake
  • Lakehouse architecture

Big Data Ecosystem

  • Resource Manager (YARN)
  • Hive
  • Fundamentals of Cloud Computing
  • Introduction, Cloud computing architecture
  • Delivery Models, Deployment Models and Benefits of moving to cloud

Just Enough Scala/Python for Spark Programmers

  • Getting started with Python
  • Variables and Data Types
  • Loops and Conditions
  • Methods
  • Functions and Packages
  • Collections and Classes

Project: Real-Life Case Study

  • Understand business intelligence, business analytics, and data analytics.
  • Understand business data analysis using data application tools.
  • Learn how to apply Tableau and MapReduce, and get an introduction to R and R+.
  • Understand data mining methods and how to create decision trees.
  • Explore different aspects of Big Data Technologies.
  • Learn the concepts of loop functions and debugging tools.