Data Science Basics

Teachers

Included in study programs

Learning outcomes

Knowledge and competences:
After completing the course, the student understands the process of collecting, storing and processing data in order to answer a business question. The course gives the student the opportunity to master the basic concepts and techniques of working effectively with data and analysing it. The student will understand how to approach, conceptually, the process of extracting knowledge from complex, multidimensional data and will gain an intuition for which business-related questions can be answered through in-depth analysis. Emphasis is placed on preprocessing, exploratory data analysis (EDA) and clear data visualization.
Skills:
The student will work primarily with two languages, namely SQL (in PostgreSQL) and Python, as well as with other visualization tools.

Indicative content

1. Introduction to real-world data-related issues - big data, multidimensional data, structured and unstructured data, discussion on the topic of data in practice.
2. Data storage in relational databases - creating and updating tables in PostgreSQL and understanding the different data types, simple queries selecting one or more columns from an SQL table, use of aliases and compound queries, filtering rows using WHERE and logical operators.
3. Merging and aggregating data in relational databases - setting primary and foreign keys as an optimization strategy, sorting and grouping data with ORDER BY and GROUP BY, filtering groups with HAVING, applying aggregation functions, joining tables with INNER, LEFT and RIGHT JOIN, introducing the keyword UNION.
4. Nested data query in relational databases - nested queries inside SELECT, FROM and WHERE clauses, basic arithmetic in nested queries.
5. Reporting and exploratory data analysis (EDA) in relational databases - exploring PostgreSQL databases, analysing the data in them and summarizing the main characteristics of a data set through EDA.
6. The Python scripting language and its ecosystem - developing Python scripts in common integrated development environments (IDEs), variable assignment, basic commands in Python, working with user input, logical operations, basic arithmetic and string formatting, control flow using conditions (if, elif, else).
7. Basics of programming in Python - working with data arrays, manipulating these data objects with predefined methods and functions, loops with a controlled run and the Python-specific keywords that make this easier, runtime optimization using list comprehensions.
8. Introduction to data science - working with NumPy library objects, the core principle of vectorization as a means of optimizing performance, arithmetic operations in NumPy.
9. Data science with the use of Pandas - understanding the basic Pandas-specific objects and their characteristics, retrieving values and handling common data issues (missing values, outliers, conditional data operations), importing external data sources with Python.
10. Data science with the use of Pandas II - grouping and sorting data using one or more predefined aggregation functions and user-defined functions, merging tables in Pandas, understanding multi-indexing.
11. Data visualization in Python - creating static and interactive graphs with existing libraries, modifying chart parameters and style, subplots.
12. Data acquisition - API connections with the requests library, interacting with the application interface and formatting the obtained data, extracting data from the web with the Beautiful Soup library, formatting HTML data, creating a spider application designed for comprehensive web crawling.
13. Comprehensive summary applied in an economic analysis - a comprehensive analysis that requires applying all of the presented topics.
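
As an illustration of the kind of exercise topics 3, 4 and 7 lead to, the following is a minimal sketch and not part of the official course material. It assumes hypothetical `customers` and `orders` tables with made-up rows, and it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server; the SQL shown is standard and should behave the same way in PostgreSQL.

```python
import sqlite3

# Assumption: sqlite3 stands in for PostgreSQL; the table names, column
# names and rows below are hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 40.0);
""")

# Topic 3: LEFT JOIN keeps customers that have no orders,
# GROUP BY aggregates per customer, ORDER BY sorts the result.
totals = conn.execute("""
    SELECT c.name,
           COUNT(o.order_id)          AS n_orders,
           COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.name
""").fetchall()

# Topic 3: HAVING filters whole groups after aggregation
# (WHERE, by contrast, filters individual rows before grouping).
big_spenders = conn.execute("""
    SELECT c.name
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    HAVING SUM(o.amount) > 100
""").fetchall()

# Topic 4: nested query inside WHERE, selecting orders above the average amount.
above_average = conn.execute("""
    SELECT order_id
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY order_id
""").fetchall()

# Topic 7: a list comprehension over the query result.
names_without_orders = [name for name, n_orders, _ in totals if n_orders == 0]

conn.close()
```

With this sample data, `big_spenders` contains only Alice, `above_average` contains only order 1, and `names_without_orders` is `['Carol']`, which shows why a LEFT JOIN rather than an INNER JOIN is needed to find customers without orders.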

Support literature

- VANDERPLAS, J. Python Data Science Handbook: Essential Tools for Working with Data. O’Reilly Media, 2016. 541 p. ISBN 978-1-4919-1205-8.
- MOLINARO, A. SQL Cookbook: Query Solutions and Techniques for All SQL Users, 2nd Edition. Kindle Edition. O’Reilly Media, 2020. 806 p. ASIN: B08P3XYBM1.
- LUTZ, M. Learning Python, 5th Edition. O’Reilly Media, 2016. 1648 p. ISBN 978-1-4493-5573-9.
- NELSON, D. Data Visualization in Python. Kindle Edition, 2020. 405 p. ASIN: B08QVJJFG8.

Requirements to complete the course

20% - midterm exam
20% - seminar paper
60% - final exam

Student workload

- participation in lectures 26 hours
- participation in seminars 26 hours
- studying for midterm exam 26 hours
- writing the seminar paper 26 hours
- preparation for the exam 52 hours

Date of approval: 11.03.2024

Date of the latest change: 28.12.2021