Applied Data Analysis

Teachers

Included in study programs

Learning outcomes

The aim of the course is to give students the knowledge and skills in modern methods of applied data analysis and statistical learning, including the use of R, that they need to conduct empirical economic research and to design research methods for solving economic problems.
Upon completion of the course, students should acquire:
a) knowledge of modern methods of research and data visualization, linear regression, general data analysis, and statistical learning.
b) skills in working with data that they can use in their own empirical research. They will also acquire advanced skills in using modern software (R) for empirical economic research: writing their own functions, using functions for data visualization, and estimating advanced statistical learning models.
c) competences to design research for a given economic problem and to perform the data analysis. They will be able to further develop their knowledge of data analysis and modern software, understand empirical articles in applied data analysis, and apply the methods from those articles in new contexts.

Indicative content

1. Introduction to the R Programming Language and Function Anatomy
2. Basic Mathematical and Statistical Concepts in Statistical Learning, Notation, and Types of Variables
3. Introduction to Data Types, Their Loading, Cleaning, Wrangling, and Merging
4. Visualization of Categorical Data
5. Visualization of Numerical Data
6. Summarizing the Relationship Between Two (Categorical and Numerical) Variables: Linear and Non-Linear Relationships, Scatter Plots, Correlation, and Quantile Plots
7. Randomization and Randomized Controlled Experiments, Unbiasedness and Consistency of Estimates
8. Standard Errors and Confidence Intervals, Hypothesis Testing, Parameters vs. Hyperparameters, Classification vs. Regression
9. Fundamental Algorithms I: Linear Regression
10. Fundamental Algorithms II: Logistic Regression, Decision Trees, Support Vector Machines, k-Nearest Neighbors
11. Anatomy of Statistical Learning, Gradient Descent, Basic Variable Transformations and Algorithm Selection, Underfitting vs. Overfitting
12. Preparation of Explanatory Variables (Encoding, Normalization, Handling Missing Data), Regularization, and Model Selection and Evaluation
13. Advanced Methods of Statistical Learning

Course literature

Required readings:
Wickham, H., Çetinkaya-Rundel, M. and Grolemund, G., 2022. R for Data Science. O’Reilly Media, Inc.
Burkov, A. The Hundred-Page Machine Learning Book. GitHub.
Recommended readings:
James, G., Witten, D., Hastie, T. and Tibshirani, R., 2013. An Introduction to Statistical Learning: with Applications in R. New York: Springer.
Imai, K., 2018. Quantitative Social Science: An Introduction. Princeton University Press.

Requirements to complete the course

20 % - activity during seminars
20 % - assignments
60 % - final exam

Student workload

Total study load: 156h
Out of that:
participation in lectures 26h,
participation in seminars 26h,
preparation for seminars 26h,
assignments 26h,
preparation for the final exam 52h

Language required to complete the course

English, Slovak

Date of approval: 10.02.2023

Date of the latest change: 21.05.2024