Foundations of Data Science
Get certified as a Data Scientist with our Data Science Training Course
By reading this, you've already taken your first steps on the path to becoming a data scientist. Here are a few reasons to stick around!
Data Science is fast becoming one of the most sought-after professions in India and around the world.
More than 1.5 lakh job openings for data scientists were projected for 2020, a 62% increase over 2019.
Data is everywhere; it is a universal currency. Learning how to gain insights from data is an invaluable skill to have.
If you are familiar with programming (in any language) and comfortable with mathematics at 12th standard (high school) level, then you should be able to follow along with this course. This course is well suited for the following learning objectives:
Understand the value of data science and the process behind using it.
Learn the fundamentals of statistics and probability required for data science.
Use Python to gather, store, clean, analyse, and visualise data-sets.
Apply statistical methods to formulate and test data hypotheses.
Apply statistical inference to uncover relationships within data-sets.
Understand the role of ML and DL in the data science pipeline.
Understand real-world challenges with several case studies.
Mitesh Khapra and Pratyush Kumar
Hello!
Information Questionnaire
Introduction
What is Data Science?
Collecting Data
Storing Data
Processing Data
Describing Data
Statistical Modelling
Algorithmic Modelling
Why is Data Science so popular today?
Are AI and Data Science related?
Problem Solving
Knowledge Representation & Reasoning
Decision Making
Communication, Perception & Actuation
The Myths of Data Science
The Path to Data Science
Feedback: Introduction
Week-1 Quiz Test (Graded & compulsory)
Week-1 Quiz Explained (optional)
Engineering Aspects of Data Science
System Perspective of Data Science
CRISP-DM: Business Understanding
CRISP-DM: Data Understanding, Preparation & Modelling
CRISP-DM: Evaluation & Deployment
Programming Tools
Why Python?
Python - Libraries
Summary
Feedback: Engineering Data Science Systems
Week 2 Part-1 Quiz Test (Graded & Compulsory)
Week 2 Part-1 Quiz Explained (Optional)
Introduction to Statistics
What is Statistics
How to select a Sample
How to Design an Experiment
How to Describe & Summarise Data
Why do we need Probability Theory?
How do we give guarantees for estimates made from a sample?
What is a hypothesis & How do we test it?
How to model the relationship between variables?
How well does the model fit the data?
Summary
Feedback: What is Statistics
Week-2 Part-2 Quiz Test (Graded & compulsory)
Week-2 Part-2 Quiz Explained (Optional)
Getting started with Python
Google Colab
Printing & Basic Data Types
Variables
Integers, Floating Points, Boolean types & Input
Processing Strings, Integers & Floating Points
If, For, While Blocks
Functions
Download: Week 3 Course NoteBook
Assignment Problems
Week 3 Assignment 1 Questions
Week 3 Quiz Test (Graded & Compulsory)
Week 3 Quiz Explained (Optional)
Solution to Assignment Problem 1 - Part 1
Solution to Assignment Problem 1 - Part 2
Solution to Assignment Problem 2 - Part 1
Solution to Assignment Problem 2 - Part 2
Download: Week 3 Assignment Solutions
Feedback: Getting started with Python
Blog Contest 1 - Winners
Introduction to Descriptive Statistics
Different types of Data
How to describe Qualitative Data?
Course Insights
How to describe Quantitative Data? Histograms
Histograms Continued...
Typical Trends in Histograms
Uses of Histograms in ML
Stem and Leaf Plots
How to describe the relationship between variables? Scatter Plots
Uses of Scatter Plots in ML
Summary
Feedback: Descriptive Statistics Part 1
Week 4 Quiz test (Graded & Compulsory)
Week 4 Quiz Explained (optional)
Commenting and Error Handling
Lists
Lists - Continued
Solution - Exercise problem on Lists
Tuples & Sets
Dictionaries
(Solution - Exercise problem on Dictionaries) & (Exercise problem in Design Thinking)
Solution - Exercise problem on Design Thinking
File Handling - Read
File Handling - Write
Solution Parts 1, 2 (Exercise on most common words)
Solution Part 3 (Exercise on most common 2-grams)
Feedback: Python (contd)
Week 5 Quiz test (Graded & Compulsory)
Week 5 Quiz Explained (Optional)
Download: Week 5 course notebook, sample text file
Week 5 Assignment - Download (Compulsory)
Week 5 Assignment Solutions - Download (Optional)
Python Data Objects Reference NoteBook
Introduction - Measures of Centrality and Spread
Different measures of Centrality
Characteristics of Measures of Centrality
Sensitivity of the Measures of Centrality to Outliers
What do the measures of Centrality look like for different types of distributions?
Compute median from a Histogram
Compute mean from a Histogram
Compute mode from a Histogram
Effect of Transformations on the measures of centrality
Summary
Feedback: Descriptive Statistics Part 2
Week 6 Quiz Test (Graded & Compulsory)
Week 6 Quiz Explained (Optional)
Introduction to Measures of Spread - Percentiles
Procedure for Computing Percentile
Alternative methods for Computing Percentile - Part - 1
Alternative methods for Computing Percentile - Part - 2
Frequently used Percentile
Compute the Percentile rank of a value in the data
Effect of Transformation on Percentiles
Summary Percentiles
Measures of Spread
Measures of Spread (Variance)
Why do we square the Deviations?
What does the variance tell us about the data?
Effect of Transformations on Measures of spread
How do you use mean & Variance to Standardise data?
Summary Measures of Spread
What are Box Plots?
Feedback: Descriptive Statistics Part 3
Week 7 Quiz Test (Graded & Compulsory)
Week 7 Quiz Explained (Optional)
Python Data Containers - Reference
W8 Data Files - Download
NumPy
High Dimensional Array & Creating NumPy Array
Indexing
NumPy Operations
Problem Solution
Broadcasting
File handling
Stats with NumPy
Rules of Statistics
Case Study & Problems
Problem Solution Part 1
Problem Solution Part 2
Problem Solution Part 3
Lecture Notebooks - Download
Feedback: Numpy
Numpy - Additional Exercises
Week-8 Quiz test (Graded & Compulsory)
Week-8 Quiz Explained (Optional)
W9 Data File - Download
Introduction - Pandas
Creating Series Object
iloc & loc
Simple Operations
Solution - Task 1
Solution - Task 2 & 3
NIFTY case study
Case Study Solution
W9 Lecture Notebook
Week 9 Quiz Test (Compulsory)
Week 9 Quiz Explained (Optional)
Feedback: Pandas
Dataframe Object
Task on creating Dataframes
Creating Mean row
Working with Planetary dataset
Dropping Null Values
Querying from dataframe
Applying functions to dataframes
Use of groupby method
Filter, Split, Apply, Aggregate
Working with Nifty50 Dataset
Nifty data - Download
Tasks on NIFTY datasets
W10 - Pandas (continued) Notebook
Feedback: Pandas (continued)
Week 10 Quiz Test (Compulsory)
Week 10 Quiz Explained (Optional)
Data Visualisation
Read Complex JSON files
Styling Tabulation
Distribution of Data - Histogram
Box Plot
Distribution of a categorical variable
Joint Distribution of two variables
Swarm Plot
Violin Plot
Multiple Violin Plots
Paired Violin Plot
Faceted plotting
Pair Plot
Boxen Plots
Feedback: Visualization
Week 11 - Quiz Test (Compulsory)
Week 11 - Quiz Explained (optional)
W11 - Visualisation Notebook
Data Visualization - Recap
Pie Chart
Donut Chart
Stacked Bar Plot
Relative Stacked Bar Plot
Time-Varying composition of data
Stacked Area Plot
Scatter Plots
Bar Plot
Continuous vs Continuous Plot
Line Plot
Line Plot Covid Data
Heat Map
Summary & Task on open-ended visualisation
W12 - Visualisation (cont.) Notebook
Feedback: Visualization (cont.)
Week 12 Quiz Test (Compulsory)
Week 12 Quiz Explained (optional)
Pandas Recap
Handling missing data
Missing data with Pandas
Open ended descriptive statistics
Agriculture Example Part 1
Agriculture Example Part 2
Week 13 Lecture NB
Feedback: Approaching Open Ended DS Problem
Week 13 Quiz Test (Compulsory)
Week 13 Quiz Explained (Optional)
Why do we need Counting and Probability Theory?
Very Simple Counting
The Multiplication Principle
Multiplication Principle Special Case: Sequences with Repetition
Multiplication Principle Special Case: Sequences without Repetition
Example: A Different Kind of Sequence
Multiplication Principle Special Case: Sequence Length Equals the Number of Objects
The Subtraction Principle
Collections
Collections (Some Examples)
Collections with Repetitions
Collections (+ multiplication principle)
Collections (+ subtraction principle)
Summary
Week 14 Quiz Test (Compulsory)
Week 14 Quiz Explained (Optional)
Feedback: Counting
Introduction
The Element of Chance (Nothing in life is certain)
A brief overview of Set Theory
Properties of Set Operations
Experiments & Sample spaces
Events of an Experiment
Axioms of Probability
Some properties of Probability
Example problems (Probability Theory)
Designing Probability functions (as relative frequency)
Designing Probability functions (equally likely outcomes)
Summary - 1
Conditional Probabilities
Examples (Conditional Probabilities)
The Multiplication Principle
Total Probability Theorem
Bayes' Theorem
Independent Events
Summary - 2
Week 15 Quiz Test (Compulsory)
Week 15 Quiz Explained (Optional)
Feedback: Sample spaces & Events
Introduction
Random Variable
Probability Mass Functions
Properties of PMF
Discrete distributions
Bernoulli Distribution
Binomial Distribution
Example (Binomial Distribution)
More Examples (Binomial Distribution)
Is Binomial Distribution a valid distribution?
Geometric Distribution
Is Geometric distribution a valid distribution?
Uniform Distribution
Expectation
Examples - Expectation
Properties of Expectation
Function of a Random Variable
Variance of a Random Variable
Properties of Variance
Summary
Week 16 Quiz Test (Compulsory)
Week 16 Quiz Explained (Optional)
Feedback: Random Variables
Introduction
Continuous Random Variable
Intuition: Density vs Mass
Uniform Distribution (Continuous)
Some Fun with Functions
Normal Distributions
Probability Density Function
Standard Normal Distribution
Sampling Methods
Experimental Studies
Week 17 Quiz Test (Compulsory)
Week 17 Quiz Explained (Optional)
Feedback: Distributions & Sampling Strategies
Introduction - Inferential Statistics
Distribution of Sample Statistics
Parameter
Sample
Why do we Compute Statistics?
Estimate Population Parameters
Random Sample
Recap : Probability
Probability Space
What kind of random variables?
What is inferential statistics?
Our Roadmap
Demo 01
Demo 02
Demo Problems
Exercise - Part 1
Exercise - Part 2
Week 18 Quiz Test (Compulsory)
Week 18 Quiz Explained (Optional)
Feedback: Distributions of Sample Statistics
Central Limit Theorem
Demo 01
Alternative version of CLT
CLT - Attempt at Proof
Implications of CLT
Computing area under N
Demo 02
Special Significance for N
Likelihood of sample mean
Super-Impose N
Approximating Distributions
Demo 03
Normal Approximation of Binomial Distribution
Week 19 Quiz Test (Compulsory)
Week 19 Quiz Explained (Optional)
Feedback: Central Limit Theorem
Chi Square Distribution
Estimating E[S²]
Estimating E[S²] - Exercise
Geometric argument
Algebraic argument
Find Expected value of the error
Estimating Var[S²]
Distribution of sum of squares of standard normal variables
Distribution for N>1
k degrees of freedom
Variance of χ²(k)
Recap & Statistics of S²
On to Experiments
Expectation of Proportion
Variance of Proportion
Week 20 Quiz Test (Compulsory)
Week 20 Quiz Explained (Optional)
Feedback: Chi-square Distribution
Point and Interval Estimators
Examples to Solve
What are Estimators
Properties of Estimator
Point Estimator for Mean & Proportion
Point Estimator for Sample Variance
Example Estimation with TimeSeries
Real World Problem
On to Interval Estimators
Interval Estimator of μ with known σ
Examples of Estimator
Examples of Estimation
Lower and Upper Bounds
Upper Confidence Bound
Interval Estimator of μ with unknown σ
T Distribution Plots
Comparing interval bounds with z- and t- variables
Examples with T Statistics
Computing interval bounds for population proportion p
Week 21 Quiz Test (Compulsory)
Week 21 Quiz Explained (Optional)
Feedback: Point and Interval Estimators
Hypothesis Testing Case Study - 1
Case Study 2
Case Study 3 & 4
Case Study 5 & 6
Three Cases
Variance: Known - Case Study 1
Variance: Known - Case Study 2
Effect of n, σ, and α
Variance: Known - Case Study 3 & 4
Variance: Known - Case Study 5 & 6
z-test vs t-test
Variance: Unknown - Case Study 1 & 2
Hypothesis testing proportion(p) instead of mean
Type 1 & Type 2 errors
Two tailed & One tailed z- test
Two tailed & One tailed t- test
Plotting Distribution
Chi-Square test of independence (case studies)
Chi-Square test of independence (case study -2)
Summary
Week 22 Lecture Notebook
Week 22 Quiz Test (Compulsory)
Week 22 Quiz Explained (Optional)
Feedback: Week 22
End of course Feedback
Obtaining Certificate
Driven by our passion for teaching and interest in nation building,
all PadhAI One courses are offered at very affordable prices.
For students/faculty | For professionals |
Students enrolled in schools/colleges and faculty members | Working professionals and those looking to up-skill |
Applicants must provide a valid ID card indicating present affiliation. | No pre-requisites |
Rs 1,000 + 18% GST for each course | Rs 5,000 + 18% GST for each course |
Student Testimonials from our previous offering: PadhAI Deep Learning
Each week, we will release 2 to 3 hours of video content. We recommend a further 2 to 3 hours of self-learning and practice, so the course requires a weekly commitment of 4 to 6 hours. The course runs for 18 to 20 weeks.
However, if you are unable to find this time due to other commitments, you can do the course at your own pace and complete it at any time within one year.
Yes, if you complete the entire course and finish the assignments, you will receive a certificate from One Fourth Labs. This is digitally signed and can be shared on LinkedIn and other websites.
Each course in the PadhAI One Data Science series will have a separate certificate.
You will have access to the course content (videos, assignments, community) for 1 year from the start of the course.
The Foundations in Data Science course focuses on the basics of statistics and Python programming for data science. These fundamentals are required for many job roles.
Also, in the machine learning course, we will assume a background in these areas. If you are confident about the topics listed in the syllabus, then you can directly join the Machine Learning course that begins later this year.
No, we do not provide any computational resources. The course platform only hosts the video lectures and assignments. All programming assignments and projects will be done on Google Colaboratory, which is a freely available resource. In the course, we provide a tutorial on how to use Google Colaboratory. It is therefore sufficient to have a standard computer and a good internet connection.
You will have access to the PadhAI course community where you can post your queries. Dedicated TAs will answer them. You are also encouraged to interact with your peers and learn together.
While data science is a highly sought-after job role, we do not provide any placement guarantee or support.