AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS Anna University Syllabus R2021

AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS L T P C
3 0 0 3

OBJECTIVES:

To understand the techniques and processes of data science

To apply descriptive data analytics

To visualize data for various applications

To understand inferential data analytics

To analyze data and build predictive models

UNIT I                          INTRODUCTION TO DATA SCIENCE                         08

Need for data science – benefits and uses – facets of data – data science process – setting the research goal – retrieving data – cleansing, integrating, and transforming data – exploratory data analysis – build the models – presenting and building applications.

UNIT II                             DESCRIPTIVE ANALYTICS                           10

Frequency distributions – outliers – interpreting distributions – graphs – averages – describing variability – interquartile range – variability for qualitative and ranked data – normal distributions – z scores – correlation – scatter plots – regression – regression line – least squares regression line – standard error of estimate – interpretation of r² – multiple regression equations – regression toward the mean.

UNIT III                            INFERENTIAL STATISTICS                         09

Populations – samples – random sampling – sampling distribution – standard error of the mean – hypothesis testing – z-test – z-test procedure – decision rule – calculations – decisions – interpretations – one-tailed and two-tailed tests – estimation – point estimate – confidence interval – level of confidence – effect of sample size.

UNIT IV                                 ANALYSIS OF VARIANCE                        09

t-test for one sample – sampling distribution of t – t-test procedure – t-test for two independent samples – p-value – statistical significance – t-test for two related samples. F-test – ANOVA – two-factor experiments – three F-tests – two-factor ANOVA – introduction to chi-square tests.

UNIT V                                PREDICTIVE ANALYTICS                           09

Linear least squares – implementation – goodness of fit – testing a linear model – weighted resampling. Regression using StatsModels – multiple regression – nonlinear relationships – logistic regression – estimating parameters – Time series analysis – moving averages – missing values – serial correlation – autocorrelation. Introduction to survival analysis.
TOTAL: 45 PERIODS

OUTCOMES:

Upon successful completion of this course, the students will be able to:
CO1: Explain the data analytics pipeline
CO2: Describe and visualize data
CO3: Perform statistical inferences from data
CO4: Analyze the variance in the data
CO5: Build models for predictive analytics

TEXT BOOKS

1. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning Publications, 2016. (first two chapters for Unit I).
2. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley Publications, 2017.
3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016.

REFERENCES

1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”, Green Tea Press, 2014.
2. Sanjeev J. Wagh, Manisha S. Bhende, Anuradha D. Thakare, “Fundamentals of Data Science”, CRC Press, 2022.
3. Chirag Shah, “A Hands-On Introduction to Data Science”, Cambridge University Press, 2020.
4. Vineet Raina, Srinath Krishnamurthy, “Building an Effective Data Science Practice: A Framework to Bootstrap and Manage a Successful Data Science Practice”, Apress, 2021.
