Data Science

• What will you learn
• Audience
• Course Topic
• Course Objective

What will you learn

The Introduction to Data Science class surveys the foundational topics in data science:

• Data Manipulation
• Data Analysis with Statistics and Machine Learning
• Data Communication with Information Visualization
• Data at Scale -- Working with Big Data
• Introduction to R

The class will focus on breadth and present the topics briefly instead of focusing on a single topic in depth. This will give you the opportunity to sample and apply the basic techniques of data science.

Audience

• Engineering students (B.Tech, B.E., M.Tech)
• BCA and MCA students
• Diploma holders
• Working professionals
• Software testers
• Project managers
• MIS support staff

Course Topic

Data Science Overview

Session 1: Introduction to R

• Introduction to Analytics and R
• Installation of R software
• R packages
• Simple Calculations
• Vector Creation and Concatenation
• Properties of Vectors
• Concepts of Vector – Numeric, Character, and Factor
• Creating Factor
• Conversion to other data types
• Concepts of class(), typeof(), levels()
• Creating Data frame
• Structure
• Concept of Strings as Factors
• Renaming
• Accessing columns
• Subsetting
• Horizontal and vertical alignment
• Filtering
• Usage of Boolean operations in Filtering
• Filtering using Multiple conditions (Normal and Alternative process)
• Accessing top and bottom rows
• Sorting rows
• Multiple sorting
• attach() and paste() commands
• Checking number of rows and columns
• Importing and Exporting data

Session 2: Data handling in R

• Usage of simple if-else statement
• Multiple if-else statement (Normal and Alternative process)
• Usage of if-else with AND, OR conditions
• Appending different data sets
• Merging (Inner, Outer, Left and Right)
• Handling of Missing values
• If else statement
• Extra trick of using if else statement
• Removal of Duplicates
• Different Statistical Functions
• Removal of duplicate values
• Viewing of duplicate rows
• Viewing unique rows
• Renaming a column
• Usage of cut() function
• Missing values
• Deleting missing values
• Viewing rows with missing values
• Viewing rows without missing values
• Missing value imputation
• Deleting a column

Session 3: String and Date functions in R

• Inbuilt string functions in R
• Data cleaning with text functions
• tolower()
• toupper()
• str_to_title()
• gsub()
• trimws()
• trimws() with the which argument
• str_trim()
• removeNumbers()
• removePunctuation()
• stripWhitespace()
• Other usages of gsub()
• gsub() to remove punctuation
• gsub() to remove numbers
• gsub() to remove letters
• gsub() to remove space
• gsub() to remove numbers, spaces and punctuation at once
• gsub() to keep digits and remove everything else
• gsub() to keep spaces and remove everything else
• gsub() to keep letters and numbers and remove everything else
• Splitting the string
• Replacing characters in a string
• Translating words in a string
• Use of substring()
• Extracting words
• Inbuilt date functions in R
• Printing present date and time
• Extraction of day
• Extraction of weekdays
• Extraction of month
• Extraction of months in words
• Extraction of year
• Concatenating day, month, year with delimiter
• Showing date in a specific format
• Updating the date
• Adding days or weeks or months or years
• Showing difference between two dates in weeks, days, hours, minutes and seconds

Session 4: Pivot Table and SQL in R

• Pivot Table in R
• dcast() function
• setDT() function
• list() function
• aggregate() function
• count() function
• table() function

• SQL in R
• Creating aliases for variables
• Storing results with condition
• Creating new columns with condition
• Descriptive statistics in SQL
• Descriptive statistics in SQL differentiated by some variable
• Descriptive statistics in SQL sorted by some variable
• Alternative method of if-else
• Aggregate functions
• Removal of duplicated rows
• Alternative method to create new fields
• Counting the frequencies

Session 5: Basics of Statistics 1

• Types of Data
• Descriptive Statistics
• Measures of Central Tendency (Mean, median, mode, quantiles, percentiles, outliers)
• Measures of Dispersion
• Absolute Measures (Range, QD, IQR, MAD, Variance, Standard deviation)
• Disadvantages of Absolute Measures of Dispersion
• Relative Measures (Coefficient of Variation, Coefficient of QD, Coefficient of MAD)
• Measures of Shape (Skewness, Kurtosis, Box-and-Whisker Plot)
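
The measures listed in this session can be computed in a few lines. The sketch below uses Python (the language of the later sessions) and only its standard `statistics` module; the data values are hypothetical, and the 1.5 × IQR rule for flagging outliers is one common convention, assumed here rather than taken from the course material.

```python
import statistics

# Hypothetical sample; 95 is included on purpose as an extreme value.
data = [12, 15, 11, 19, 15, 22, 14, 95]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Quartiles (statistics.quantiles uses the "exclusive" method by default)
q1, q2, q3 = statistics.quantiles(data, n=4)

# Absolute measures of dispersion
data_range = max(data) - min(data)
iqr = q3 - q1
variance = statistics.variance(data)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)       # sample standard deviation

# Relative measure of dispersion: coefficient of variation
cv = std_dev / mean

# Flag outliers with the 1.5 * IQR rule (an assumed convention)
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]

print(mean, median, mode, data_range, iqr, round(cv, 3), outliers)
```

Note how the single extreme value pulls the mean well above the median, which is why the median is often preferred for skewed data.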

Session 6: Basics of Statistics 2

• Population and Sample
• Sampling Distribution
• Central Limit Theorem
• Standard error
• Normal Distribution
• Area under the Normal Curve
• Confidence Interval
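
As a rough illustration of how standard error, the normal curve, and confidence intervals fit together, the following Python sketch builds an approximate 95% confidence interval for a mean. The sample values are hypothetical, and the normal approximation is an assumption: for a sample this small, a t-distribution would usually be the more careful choice.

```python
import math
import statistics

# Hypothetical sample of measurements
sample = [48, 52, 50, 47, 53, 51, 49, 50, 52, 48]

n = len(sample)
sample_mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean

# z-value that leaves 2.5% in each tail of the standard normal curve
z = statistics.NormalDist().inv_cdf(0.975)

ci_low = sample_mean - z * std_error
ci_high = sample_mean + z * std_error

print(f"mean = {sample_mean}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```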

Session 7: Introduction to Machine Learning Algorithms

• Machine Learning concept
• Difference between Artificial Intelligence, Machine Learning and Deep Learning

• Broad categorization of Machine Learning Algorithms
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning

Python Training

Session 1:

• Python as Calculator
• Variable Types
• Finding variable types
• Tuples and Lists
• Working with Lists
• List within lists
• Subset a list
• Subset and calculate
• Slicing and Dicing
• Subset list of lists
• List Manipulation
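
A minimal sketch of the list operations in this session. The room names and areas are hypothetical sample data, not taken from the course material.

```python
# Hypothetical sample data: each inner list is [room name, area in square metres].
rooms = [["hallway", 11.25], ["kitchen", 18.0], ["living room", 20.0], ["bedroom", 10.75]]

first_room = rooms[0]        # subset a single element of the outer list
last_area = rooms[-1][1]     # subset a list of lists: last room, second field
middle = rooms[1:3]          # slice: index 1 up to, but not including, index 3

# Subset and calculate: add up the second field of every inner list
total_area = sum(area for _, area in rooms)

# List manipulation: change, append, and delete elements
rooms[3][1] = 12.0                 # update the bedroom area
rooms.append(["garage", 15.0])     # add a new room at the end
del rooms[0]                       # remove the hallway

names = [name for name, _ in rooms]
print(total_area, names)
```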

Session 2:

• Dictionary
• Accessing items
• Create and print dictionary
• Change values
• Print items in a loop in dictionary
• Dictionary length
• Removing items
• Delete
• Clear
• Get
• Dictionaries of dictionaries
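
The dictionary operations above can be sketched as follows. The student names and scores are hypothetical.

```python
# Hypothetical data: student name -> score
scores = {"asha": 82, "ravi": 74, "meena": 91}

asha_score = scores["asha"]         # access an item by key
scores["ravi"] = 78                 # change a value
scores["john"] = 65                 # add a new item
missing = scores.get("zoe", 0)      # get() returns a default instead of raising KeyError

# Print items in a loop
for name, score in scores.items():
    print(name, score)

removed = scores.pop("john")        # remove an item and return its value
del scores["meena"]                 # delete an item by key
count = len(scores)                 # dictionary length

# Dictionary of dictionaries
report = {
    "asha": {"maths": 82, "science": 88},
    "ravi": {"maths": 78, "science": 69},
}
ravi_science = report["ravi"]["science"]

scores.clear()                      # remove every item
```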

Session 3:

• String methods (capitalize, casefold, center, count, find, join, replace, split, strip, lower, upper, title) and concatenation with +
• Conditional Statements (if, elif, else)
• Loops (while, for)
• Functions
• Lambda Functions
• Scope of Variables – Global and Local Variables
• Function Arguments, Keyword Arguments, Default Arguments, Variable Length Arguments
• List Comprehension
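
A short sketch of several topics from this session: variable scope, the different kinds of function arguments, lambda functions, and list comprehensions. The function names `increment` and `summarize` are hypothetical and chosen only for illustration.

```python
counter = 0  # global variable


def increment():
    """Modify the global counter; 'global' is needed to rebind it."""
    global counter
    counter += 1
    return counter


def summarize(name, *scores, unit="points", **extras):
    """name: positional, *scores: variable length, unit: default/keyword,
    **extras: arbitrary keyword arguments."""
    total = sum(scores)  # local variable, visible only inside the function
    note = ", ".join(f"{key}={value}" for key, value in sorted(extras.items()))
    if note:
        return f"{name}: {total} {unit} ({note})"
    return f"{name}: {total} {unit}"


# Lambda function and list comprehension with a condition
square = lambda x: x * x
even_squares = [square(n) for n in range(10) if n % 2 == 0]

first = increment()
second = increment()
print(summarize("Asha", 10, 20, unit="marks", grade="A"))
print(summarize("Ravi"))
print(even_squares, first, second)
```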

Session 4:

• Basics of NumPy
• NumPy 2D Array & 3D Array
• Printing NumPy Arrays
• Indexing and Slicing NumPy arrays
• Stacking
• NumPy Side Effects

Session 5:

• Introduction to Pandas
• Object Creation (pd.Series, pd.DataFrame, pd.date_range)
• Checking structure of data
• Head and Tail of data
• Viewing Columns
• Summary of data for both numeric and categorical columns
• Rows and columns of data

Session 6:

• Dimensions
• Change the type of columns to numeric
• Converting multiple columns to numeric columns
• Converting single column or multiple columns to categorical columns
• Subset the data
• Accessing rows and columns using loc and iloc
• Filtering data using AND, OR, combined AND-OR, isin, ==, NOT, and multiple NOT conditions

Session 7:

• Working with Missing Data
• Statistics for the variables
• Different merging (inner, left join, right join, outer join)

Session 8:

• Grouping the variables (single variable single statistic, multiple variables single statistic, single variable multiple statistics, multiple variables multiple statistics)
• Reshaping the data using pivot and melt

Session 9:

• Data Visualization using Matplotlib and Seaborn

Machine Learning Using Python

• Machine Learning Algorithms Using Python
• Difference between Artificial Intelligence, Machine Learning, Deep Learning and Data Science
• What is Machine Learning?
• Categorization of Machine Learning – Supervised Learning, Unsupervised Learning, Reinforcement Learning
• Difference between Regression and Classification
• Concepts of Linear Regression
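
To make the idea of linear regression concrete, here is a minimal sketch of simple (one-variable) least-squares fitting in plain Python. The helper `fit_line` and the study-hours data are hypothetical; in practice a library such as scikit-learn would normally be used instead.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on the line score = 50 + 2 * hours
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6   # predict the score for 6 hours of study
print(slope, intercept, predicted)
```

Regression predicts a continuous number like the score above, whereas classification would predict a category such as pass or fail.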

Course Objective

• Understand business intelligence, business analytics, and data analytics.
• Understand business data analysis using data application tools.
• Learn how to apply Tableau and MapReduce, and get an introduction to R and R+.
• Understand the methods of data mining and how to create decision trees.
• Explore different aspects of Big Data technologies.
• Learn the concepts of loop functions and debugging tools.