Data Science

  • What will you learn
  • Audience
  • Course Topic
  • Course Objective

The Introduction to Data Science class will survey the foundational topics in data science, namely:

  • Data Manipulation
  • Data Analysis with Statistics and Machine Learning
  • Data Communication with Information Visualization
  • Data at Scale -- Working with Big Data
  • Introduction to R

The class will focus on breadth and present the topics briefly instead of focusing on a single topic in depth. This will give you the opportunity to sample and apply the basic techniques of data science.

  • Any Graduate (B.A., B.Com., B.Sc.)
  • Engineering Students (B.Tech, B.E, M.Tech)
  • BCA, MCA
  • Any Diploma Holder
  • Any Working Professional
  • Data Warehouse Administrators
  • Database Administrators
  • Software Tester
  • Project Manager
  • MIS Support

Data Science Overview

 Session 1: Introduction to R 

  • Introduction to Analytics and R
  • Installation of R software
  • R packages
  • Simple Calculations
  • Vector Creation and Concatenation
  • Property of Vector
  • Concepts of Vector – Numeric, Character, and Factor
  • Creating Factor
  • Conversion to other data types
  • Concepts of class(), typeof(), levels()
  • Creating Data frame
  • Structure
  • Concept of Strings as Factors
  • Renaming
  • Accessing columns
  • Subsetting
  • Horizontal and vertical alignment
  • Filtering
  • Usage of Boolean operations in Filtering
  • Filtering using Multiple conditions (Normal and Alternative process)
  • Accessing top and bottom rows
  • Sorting rows
  • Multiple sorting
  • Attach and paste command
  • Gather and spread command
  • Checking number of rows and columns
  • Importing and Exporting data

 

Session 2: Data handling in R

  • Usage of simple if-else statement
  • Multiple if-else statement (Normal and Alternative process)
  • Usage of if-else with AND, OR conditions
  • Appending different data sets
  • Merging (Inner, Outer, Left and Right)
  • Handling of Missing values
  • If else statement
  • Extra trick of using if else statement
  • Different Statistical Functions
  • Removal of duplicate values
  • Viewing of duplicate rows
  • Viewing unique rows
  • Renaming a column
  • Usage of cut() function
  • Missing values
    • Deleting missing values
    • Viewing rows with missing values
    • Viewing rows without missing values
    • Missing value imputation
  • Deleting a column

 

Session 3: String and Date functions in R 

  • Inbuilt string functions in R
  • Data cleaning with text functions
  • tolower()
  • toupper()
  • str_to_title()
  • gsub()
  • trimws()
  • trimws() with which
  • str_trim()
  • removeNumbers()
  • removePunctuation()
  • stripWhitespace()
  • Other usages of gsub()
    • gsub() to remove punctuation
    • gsub() to remove numbers
    • gsub() to remove letters
    • gsub() to remove space
    • gsub() to remove numbers, space and punctuations at a time
    • gsub() to keep digits and remove everything else
    • gsub() to keep spaces and remove everything else
    • gsub() to keep alphabets, numbers and remove everything else
  • Splitting the string
  • Replacing characters in a string
  • Translating words in a string
  • Use of substring()
  • Extracting words
  • Inbuilt date functions in R
  • Printing present date and time
  • Extraction of day
  • Extraction of weekdays
  • Extraction of month
  • Extraction of months in words
  • Extraction of year
  • Concatenating day, month, year with delimiter
  • Showing date in a specific format
  • Updating the date
  • Adding days or weeks or months or years
  • Showing difference between two dates in weeks, days, hours, minutes and seconds

 

Session 4: Pivot Table and SQL in R 

  • Pivot Table in R
    • dcast() function
    • setDT() function
    • list() function
    • aggregate() function
    • count() function
    • table() function

 

  • SQL in R
    • Reading all data
    • Reading specific variables
    • Creating aliases for variables
    • Storing results with condition
    • Creating new columns with condition
    • Descriptive statistics in SQL
    • Descriptive statistics in SQL differentiated by some variable
    • Descriptive statistics in SQL sorted by some variable
    • Alternative method of if-else
    • Aggregate functions
    • Removal of duplicated rows
    • Alternative method to create new fields
    • Counting the frequencies

Session 5: Basics of Statistics 1 

  • Types of Data
  • Descriptive Statistics
    • Measures of Central Tendency (Mean, median, mode, quantiles, percentiles, outliers)
    • Measures of Dispersion
    • Absolute Measures (Range, QD, IQR, MAD, Variance, Standard deviation)
    • Disadvantages of Absolute Measures of Dispersion
    • Relative Measures (Coefficient of Variation, Coefficient of QD, Coefficient of MAD)
    • Measures of Shape (Skewness, Kurtosis, Box-and-Whisker Plot)
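
As a taste of this session, here is a minimal sketch of several of the measures listed above, written in Python with the standard `statistics` module. The sample values are hypothetical and chosen only so the results are easy to check by hand; the session itself may use different tools and data.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Absolute measures of dispersion
value_range = max(data) - min(data)
variance = statistics.pvariance(data)    # population variance
std_dev = statistics.pstdev(data)        # population standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                            # interquartile range

# Relative measure: the coefficient of variation has no units,
# so it can be compared across variables measured on different scales.
cv = std_dev / mean

print(mean, median, mode, value_range, variance, std_dev, iqr, cv)
```

Absolute measures carry the units of the data, which is their main disadvantage when comparing variables; the coefficient of variation divides that unit out.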

 

Session 6: Basics of Statistics 2 

  • Population and Sample
  • Sampling Distribution
  • Central Limit Theorem
  • Standard error
  • Normal Distribution
  • Area under the Normal Curve
  • Confidence Interval
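
The topics above can be tied together in a short Python sketch: the standard error of a sample mean, the area under the normal curve, and an approximate 95% confidence interval. The sample is hypothetical, and the interval uses the normal approximation for simplicity; for a sample this small a t-based interval would usually be preferred.

```python
import math
import statistics

# Hypothetical sample of ten measurements.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)          # sample standard deviation
standard_error = sample_sd / math.sqrt(n)     # spread of the sample mean

# Area under the standard normal curve within one standard deviation.
standard_normal = statistics.NormalDist()
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

# Approximate 95% confidence interval (normal approximation).
z = standard_normal.inv_cdf(0.975)
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(round(within_one_sd, 4), round(z, 2), (lower, upper))
```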

Session 7: Introduction to Machine Learning Algorithms

  • Machine Learning concept
  • Difference between Artificial Intelligence, Machine Learning and Deep Learning

 

  • Broad categorization of Machine Learning Algorithms
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

 

Python Training

Session 1:

  • Python as Calculator
  • Variable Types
  • Finding variable types
  • Tuples and Lists
  • Working with Lists
  • List within lists
  • Subset a list
  • Subset and calculate
  • Slicing and Dicing
  • Subset list of lists
  • List Manipulation
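
A minimal sketch of the list topics in this session. The room names and areas are hypothetical data used only for illustration.

```python
# Hypothetical data: room names paired with areas in square metres.
house = [["hallway", 11.25], ["kitchen", 18.0],
         ["living room", 20.0], ["bedroom", 10.75]]

print(type(house))                   # finding a variable's type

first_room = house[0]                # subset a list
last_area = house[-1][1]             # subset a list of lists
first_two = house[:2]                # slicing: items 0 and 1
total = house[1][1] + house[2][1]    # subset and calculate

# List manipulation: change, add, and remove elements.
house[3][1] = 11.0
house.append(["bathroom", 9.5])
del house[0]

# A tuple looks similar but cannot be changed after creation.
point = (3, 4)
```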

 

Session 2:

  • Dictionary
  • Accessing items
  • Create and print dictionary
  • Change values
  • Print items in a loop in dictionary
  • Dictionary length
  • Adding items
  • Adding two dictionaries
  • Removing items
  • Delete
  • Clear
  • Get
  • Dictionaries of dictionaries
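
The dictionary operations listed above can be sketched as follows. The item names and quantities are hypothetical.

```python
# Create and print a dictionary (hypothetical stock counts).
stock = {"apples": 10, "pears": 4}
print(stock)
print(stock["apples"])            # accessing an item

stock["pears"] = 6                # change a value
stock["plums"] = 12               # add an item

for item, quantity in stock.items():   # print items in a loop
    print(item, quantity)

size = len(stock)                 # dictionary length
stock.update({"figs": 3})         # add the contents of another dictionary

removed = stock.pop("plums")      # remove an item and keep its value
del stock["pears"]                # delete an item
fallback = stock.get("kiwis", 0)  # get with a default for a missing key

# Dictionary of dictionaries
warehouse = {"north": {"apples": 5}, "south": {"apples": 7}}
south_apples = warehouse["south"]["apples"]

snapshot = dict(stock)            # copy before emptying
stock.clear()                     # remove every item
```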

 

Session 3:

  • String methods (capitalize, casefold, center, count, find, join, replace, split, strip, lower, upper, title, +)
  • Conditional Statements (if, elif, else)
  • Loops (while, for)
  • Functions
  • Lambda Functions
  • Scope of Variables – Global and Local Variables
  • Function Arguments, Keyword Arguments, Default Arguments, Variable Length Arguments
  • List Comprehension
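
A short sketch covering several topics from this session. The function names `grade` and `total`, and the `scale` keyword, are hypothetical and exist only for this illustration.

```python
# String methods on a hypothetical course title.
raw = "  data science with python  "
cleaned = raw.strip()                    # remove surrounding whitespace
title = cleaned.title()                  # capitalise each word
words = cleaned.split()                  # split on whitespace
slug = "-".join(words)                   # join with a delimiter
position = cleaned.find("science")       # index of first match, -1 if absent


def grade(score, pass_mark=50):
    """Conditional statement with a default argument."""
    if score >= 80:
        return "distinction"
    elif score >= pass_mark:
        return "pass"
    else:
        return "fail"


def total(*values, **options):
    """Variable-length arguments; 'scale' is an optional keyword argument."""
    return sum(values) * options.get("scale", 1)


square = lambda x: x * x                 # lambda function
even_squares = [square(n) for n in range(10) if n % 2 == 0]  # list comprehension

# A while loop and a for loop that compute the same sum.
running = 0
i = 1
while i <= 5:
    running += i
    i += 1

for_sum = 0
for k in range(1, 6):
    for_sum += k
```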

 

Session 4:

  • Basics of Numpy
  • Numpy 2D Array & 3D Array
  • Printing Numpy Arrays
  • Indexing and Slicing Numpy array
  • Stacking
  • Numpy Side Effects
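
A minimal NumPy sketch of the topics above. The outline does not say what "side effects" covers; this example assumes it refers to type coercion and to operators behaving differently on arrays than on lists.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # 2D array, shape (2, 3)
cube = np.arange(24).reshape(2, 3, 4)       # 3D array, shape (2, 3, 4)
print(matrix)

# Indexing and slicing
element = matrix[0, 1]       # row 0, column 1
last_cols = matrix[:, 1:]    # all rows, columns 1 onward

# Stacking
vertical = np.vstack([matrix, matrix])      # shape (4, 3)
horizontal = np.hstack([matrix, matrix])    # shape (2, 6)

# Assumed meaning of "side effects":
# "+" concatenates lists but adds arrays element by element,
list_sum = [1, 2, 3] + [4, 5, 6]
array_sum = np.array([1, 2, 3]) + np.array([4, 5, 6])
# and mixed input types are coerced to one common type.
mixed = np.array([1, 2.5, True])
print(mixed.dtype)
```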

 

Session 5:

  • Introduction to Pandas
  • Object Creation (pd.Series, pd.DataFrame, pd.date_range)
  • Checking structure of data
  • Reading and Writing
  • Head and Tail of data
  • Viewing Columns
  • Summary of data for both numeric and categorical columns
  • Rows and columns of data
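
The pandas basics above can be sketched as follows. The data is hypothetical, and the example writes to an in-memory buffer instead of a file so that it runs anywhere.

```python
import io

import pandas as pd

# Object creation
hours = pd.Series([1.5, 2.5, 3.5], name="hours")
dates = pd.date_range("2024-01-01", periods=4, freq="D")
scores = pd.DataFrame({
    "date": dates,
    "score": [70, 85, 90, 60],       # hypothetical values
    "grade": ["C", "B", "A", "D"],
})

scores.info()                        # checking the structure of the data
print(scores.head(2))                # first rows
print(scores.tail(2))                # last rows
print(scores.columns.tolist())       # viewing columns

numeric_summary = scores["score"].describe()       # count, mean, std, ...
categorical_summary = scores["grade"].describe()   # count, unique, top, freq
n_rows, n_cols = scores.shape                      # rows and columns

# Writing and reading CSV through an in-memory buffer.
buffer = io.StringIO()
scores.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer, parse_dates=["date"])
```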

 

Session 6:

  • Dimensions
  • Change the type of columns to numeric
  • Converting multiple columns to numeric columns
  • Converting single column or multiple columns to categorical columns
  • Subset the data
  • Accessing rows and columns using loc and iloc
  • Filtering data, use of AND, OR, AND-OR, isin, ==, NOT, Multiple NOT
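
A minimal sketch of type conversion, `loc`/`iloc`, and filtering. The names, ages, and cities are hypothetical.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dina"],
    "age": ["34", "28", "45", "19"],            # stored as text on purpose
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})

dimensions = people.shape                        # (rows, columns)

people["age"] = pd.to_numeric(people["age"])             # text to numeric
people["city"] = people["city"].astype("category")       # text to categorical

by_label = people.loc[0, "name"]      # loc: row label and column name
by_position = people.iloc[0, 0]       # iloc: row and column positions
subset = people[["name", "age"]]      # subset of columns

# Filtering: wrap each condition in parentheses.
both = people[(people["age"] > 25) & (people["city"] == "Pune")]      # AND
either = people[(people["age"] > 40) | (people["city"] == "Delhi")]   # OR
in_list = people[people["city"].isin(["Pune", "Delhi"])]              # isin
not_pune = people[~(people["city"] == "Pune")]                        # NOT
```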

 

Session 7: 

  • Working with Missing Data
  • Statistics for the variables
  • Different merging (inner, left join, right join, outer join)
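
The difference between the four join types is easiest to see on two small tables. This sketch uses hypothetical order and customer data, and also shows basic handling of a missing value.

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "amount": [250.0, None, 100.0, 80.0],   # one missing amount
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Asha", "Ben", "Chen"],
})

# Working with missing data
missing_count = int(orders["amount"].isna().sum())
dropped = orders.dropna()                                  # delete rows with missing values
imputed = orders.fillna({"amount": orders["amount"].mean()})   # mean imputation

# Statistics for a variable
amount_stats = orders["amount"].describe()

# Merging: only the rows kept differ between join types.
inner = orders.merge(customers, on="customer_id", how="inner")   # keys in both
left = orders.merge(customers, on="customer_id", how="left")     # all orders
right = orders.merge(customers, on="customer_id", how="right")   # all customers
outer = orders.merge(customers, on="customer_id", how="outer")   # all keys

print(len(inner), len(left), len(right), len(outer))
```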

 

Session 8:

  • Grouping the variables (single variable single statistic, multiple variables single statistic, single variable multiple statistics, multiple variables multiple statistics)
  • Reshaping the data using pivot and melt
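
The four grouping combinations and the two reshaping functions can be sketched on one small table. The sales figures are hypothetical.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 90, 110],
    "units": [10, 12, 9, 11],
})

grouped = sales.groupby("region")

one_var_one_stat = grouped["revenue"].sum()
many_vars_one_stat = grouped[["revenue", "units"]].sum()
one_var_many_stats = grouped["revenue"].agg(["mean", "max"])
many_vars_many_stats = grouped.agg({
    "revenue": ["sum", "mean"],
    "units": ["min", "max"],
})

# pivot: long to wide, one column per quarter.
wide = sales.pivot(index="region", columns="quarter", values="revenue")

# melt: wide back to long, one row per region and quarter.
long = wide.reset_index().melt(
    id_vars="region", var_name="quarter", value_name="revenue"
)
```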

 

Session 9:

  • Data Visualization using Matplotlib and Seaborn

 

Machine Learning Using Python

  • Machine Learning Algorithms Using Python
  • Difference between Artificial Intelligence, Machine Learning, Deep Learning and Data Science
  • What is Machine Learning?
  • Categorization of Machine Learning – Supervised Learning, Unsupervised Learning, Reinforcement Learning
  • Difference between Regression and Classification
  • Concepts of Linear Regression
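
To make the idea of linear regression concrete, here is a minimal sketch of simple (one-variable) least-squares regression in plain Python. The data points are hypothetical and lie exactly on a line so the result is easy to verify; in practice a library would normally be used instead of hand-written formulas.

```python
# Hypothetical data generated from y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Least-squares estimates:
# slope = covariance(x, y) / variance(x), intercept = mean_y - slope * mean_x
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x


def predict(x):
    """Predict a continuous value, which is what makes this regression."""
    return intercept + slope * x


print(slope, intercept, predict(6))
```

Regression predicts a number on a continuous scale, as here; classification would instead assign each input to one of a fixed set of categories.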

  • Understand business intelligence, business analytics, and data analytics.
  • Understand business data analysis through the powerful tools of data application.
  • Learn how to apply Tableau and MapReduce, and get introduced to R and R+.
  • Understand the methods of data mining and the creation of decision trees.
  • Explore different aspects of Big Data technologies.
  • Learn the concepts of loop functions and debugging tools.