Data Science in R

Teachers

Included in study programs

Learning outcomes

Students acquire in particular the following knowledge:
- basic knowledge of data processing and visualization in R,
- basic knowledge of programming in R,
- basic knowledge of project creation in R,
- basic knowledge of the options for working with large databases in R.
Students acquire in particular the following skills:
- using basic tools for data processing, visualization and analysis in R,
- working with R and RStudio.
Students acquire the following competencies:
- practical competence in applying methods used to analyze data and to solve economic and other problems.

Indicative content

The aim of this course is to provide knowledge of data analysis in R and of the tools for applying it to specific empirical problems. Emphasis is placed on data processing, selection, modeling and visualization. The course also covers the basics of working with large databases in R.

1. Mathematical operations in R, logical operators and comparison operators, data types in R, definition of variables and vectors, indexing of vectors and operations with vectors, lists.
2. Creation of matrices, operations with matrices, indexing of matrices, creation of table structures using data frames, selection and indexing of data frames and operations with data frames, import and export of data.
3. Basics of programming in R: conditional statements (if/else), the ifelse() function, for and while loops, writing custom functions.
4. Overview of the tidyverse, a group of packages for data import, manipulation, modeling and visualization (packages such as readr, tibble, tidyr, dplyr, ggplot2, forcats, modelr…).
5. Data manipulation with the dplyr package: selection of variables, filtering of observations, calculation of summary statistics, the pipe operator (%>%).
6. Preparation and cleaning of data for analysis (tidyr), grouping of data by specific variables, working with categorical data, working with dates and times.
7. Working with table structures (tibble), working with relational data, joining data from multiple tables based on keys, filtering using multiple tables.
8. Using the ggplot2 package to create various types of charts (bar chart, pie chart, line chart, histogram, scatter plot, box plot…) and setting selected parameters of individual charts.
9. Working with R Markdown, a tool for combining text, code and results.
10. Connecting to and working with SQL databases using the dbplyr package. Working with large datasets and connecting to other types of databases (dtplyr, data.table).
11. Formulating and answering a research question by building and testing a regression model (tidymodels, modelr).
12. Introduction to machine learning, overview of possibilities of using machine learning in R, application of machine learning using regression.
13. Basic information about the possibilities of data extraction from the web (import.io, rvest…).
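
As an illustration of the kind of exercise covered in topics 1 to 3, the following is a minimal sketch in base R. The data frame `sales`, its columns and the function `total_by_region` are hypothetical names invented for this example and are not taken from the course materials.

```r
# Hypothetical toy data: all names and values are invented for illustration.
sales <- data.frame(
  region = c("North", "South", "North", "East", "South"),
  units  = c(12, 7, 20, 5, 15),
  price  = c(2.5, 3.0, 2.5, 4.0, 3.0),
  stringsAsFactors = FALSE
)

# Vectorized arithmetic on data frame columns (topics 1 and 2).
sales$revenue <- sales$units * sales$price

# ifelse() is the vectorized counterpart of if/else (topic 3).
sales$size <- ifelse(sales$units >= 10, "large", "small")

# Logical indexing of a data frame (topic 2).
large_orders <- sales[sales$size == "large", ]

# A custom function combining a for loop with if/else (topic 3).
# It accumulates revenue per region in a named numeric vector.
total_by_region <- function(df) {
  totals <- numeric(0)
  for (i in seq_len(nrow(df))) {
    key <- df$region[i]
    if (key %in% names(totals)) {
      totals[key] <- totals[key] + df$revenue[i]
    } else {
      totals[key] <- df$revenue[i]
    }
  }
  totals
}

region_totals <- total_by_region(sales)
print(region_totals)
```

The explicit loop is shown only to practice control flow. With the dplyr package from topic 5, the same result would more typically be obtained with `group_by()` and `summarise()` chained by the pipe operator.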

Recommended literature

1. H. Wickham, G. Grolemund (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. https://r4ds.had.co.nz/index.html
2. J. Bryan. STAT 545. https://stat545.com/
3. P. L. de Micheaux, R. Drouilhet, B. Liquet (2013). The R Software: Fundamentals of Programming and Statistical Analysis. Springer.

Requirements to complete the course

10% activity during the semester
20% tests
70% semester project and final exam

Student workload

Total study load (in hours): 6 credits x 26 hours = 156 hours
Distribution of study load:
Lecture attendance: 26 hours
Seminar attendance: 26 hours
Preparation for seminars: 13 hours
Preparation for tests: 13 hours
Semester project preparation: 52 hours
Preparation for final exam: 26 hours

Language required to complete the course

Slovak

Date of approval: 11.03.2024

Date of the latest change: 16.05.2022