AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS Anna University Syllabus R2021
AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS L T P C
3 0 0 3
OBJECTIVES:
- To understand the techniques and processes of data science
- To apply descriptive data analytics
- To visualize data for various applications
- To understand inferential data analytics
- To analyze data and build predictive models from it
UNIT I INTRODUCTION TO DATA SCIENCE 08
Need for data science – benefits and uses – facets of data – data science process – setting the research goal – retrieving data – cleansing, integrating, and transforming data – exploratory data analysis – building the models – presenting and building applications.
UNIT II DESCRIPTIVE ANALYTICS 10
Frequency distributions – Outliers – interpreting distributions – graphs – averages – describing variability – interquartile range – variability for qualitative and ranked data – Normal distributions – z scores – correlation – scatter plots – regression – regression line – least squares regression line – standard error of estimate – interpretation of r² – multiple regression equations – regression toward the mean.
UNIT III INFERENTIAL STATISTICS 09
Populations – samples – random sampling – Sampling distribution – standard error of the mean – Hypothesis testing – z-test – z-test procedure – decision rule – calculations – decisions – interpretations – one-tailed and two-tailed tests – Estimation – point estimate – confidence interval – level of confidence – effect of sample size.
UNIT IV ANALYSIS OF VARIANCE 09
t-test for one sample – sampling distribution of t – t-test procedure – t-test for two independent samples – p-value – statistical significance – t-test for two related samples. F-test – ANOVA – Two-factor experiments – three F-tests – two-factor ANOVA – Introduction to chi-square tests.
UNIT V PREDICTIVE ANALYTICS 09
Linear least squares – implementation – goodness of fit – testing a linear model – weighted resampling. Regression using StatsModels – multiple regression – nonlinear relationships – logistic regression – estimating parameters – Time series analysis – moving averages – missing values – serial correlation – autocorrelation. Introduction to survival analysis.
TOTAL : 45 PERIODS
OUTCOMES:
Upon successful completion of this course, the students will be able to:
CO1: Explain the data analytics pipeline
CO2: Describe and visualize data
CO3: Perform statistical inferences from data
CO4: Analyze the variance in the data
CO5 : Build models for predictive analytics
TEXT BOOKS
1. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning Publications, 2016. (first two chapters for Unit I).
2. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley Publications, 2017.
3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016.
REFERENCES
1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”, Green Tea Press, 2014.
2. Sanjeev J. Wagh, Manisha S. Bhende, Anuradha D. Thakare, “Fundamentals of Data Science”, CRC Press, 2022.
3. Chirag Shah, “A Hands-On Introduction to Data Science”, Cambridge University Press, 2020.
4. Vineet Raina, Srinath Krishnamurthy, “Building an Effective Data Science Practice: A Framework to Bootstrap and Manage a Successful Data Science Practice”, Apress, 2021.