Big Data

Teachers

Included in study programs

Learning outcomes

Upon completion of the course, students should be able to:
A. define the basic concepts of big data management and analysis,
B. recognize the challenges that organizations face with big data,
C. understand how big data affects business, scientific progress, and our daily lives,
D. design scalable solutions for organizations of different types,
E. analyze and solve problems related to the processing and use of big data, both conceptually and practically, for a variety of industries such as government organizations, manufacturing, retail, education, banking/finance, healthcare and pharmaceuticals, and more.

Indicative content

1. Introduction to the problem of big data.
2. Current challenges, trends and applications of big data
3. Data types and data formats of big data.
4. Introduction to Hadoop, how Hadoop works
5. Hadoop ecosystem
6. Principles of HDFS
7. Technologies for big data management
8. YARN, HBase, Hive, Pig
9. Basic principles and data processing with MapReduce
10. HBase principles
11. Technologies for big data management
12. Algorithms for big data analysis
13. Big data application perspective and big data implementation issues

Support literature

1. Hendl, J.: Big data - Věda o datech, základy a aplikace (in Czech). Grada, 2021
2. Holubová I., Kosek J., Minařík K., Novák D.: Big Data a NoSQL databáze. Grada, 2015, ISBN 9788024754666
3. Matthew J. Salganik. (2017). Bit by Bit: Social Research in the Digital Age. Princeton University Press.
4. Cathy O’Neil. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Penguin Books.
5. Rob Kitchin. (2014). The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. SAGE Publications
6. Lockwood, Glenn. (2014). Conceptual Overview of Map-Reduce and Hadoop. Blog Post (http://www.glennklockwood.com/data-intensive/hadoop/overview.html)
7. Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. (2014). The Parable of Google Flu: Traps in Big Data Analysis. Science 343(6176): 1203-1205.
8. Lazer, David. (2015). The Rise of the Social Algorithm. Science 348(6239): 1090-1091.
9. Anand Rajaraman and Jeffrey David Ullman. (2011). Mining of Massive Datasets. ISBN-10: 1107015359, ISBN-13: 978-1107015357
10. Murugesan, San; Bojanova, Irena, (2016) Encyclopedia of cloud computing. Wiley-IEEE Press. ISBN: 9781118821954

Syllabus

Within the course, the content will focus on the following three areas:
• Introduction to the problem of big data: current challenges, trends and applications. This area also includes the history of big data and its elements, types, advantages and disadvantages. It covers the definition of big data, enterprise/structured data, social/unstructured data, unstructured data for analytical services, what large data sets are, sources of big data, industries using big data, and the challenges faced in the field of big data. It also addresses the use of big data in enterprises and businesses, and a big data application perspective that covers its use in marketing, analytics, retail, healthcare, consumer goods, defense, government, and so on.
• Algorithms for analyzing big data: knowledge mining algorithms and AI methods that have been developed specifically to solve the problems of processing big data, and data mining algorithms for big data and data streams.
• Technologies for managing big data: big data technologies and tools, with special emphasis on the MapReduce paradigm and the Hadoop ecosystem. This area covers an introduction to Hadoop, how Hadoop operates, and cloud computing (features, benefits, applications). It builds an understanding of Hadoop and its ecosystem, which includes HDFS, MapReduce, YARN, HBase, Hive, Pig, Sqoop, Zookeeper, Flume, Oozie, etc. The basics of MapReduce and HBase emphasize the creation of a simple MapReduce framework and the concepts that apply to it. This area also covers the big data stack, i.e. the data source layer, ingestion layer, source layer, security layer, visualization layer, visualization approaches, etc., as well as NoSQL data management systems, including document databases, relationships, graph databases, schema-free databases, and so on.
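As an illustration of what a "simple MapReduce framework" might look like in the exercises, the following is a minimal single-machine sketch in Python. It is not Hadoop code and is not taken from the course materials; the function names (run_mapreduce, word_count_mapper, word_count_reducer) and the word-count task are assumptions chosen for illustration only. It shows the three conceptual phases: map, shuffle (group by key), and reduce.

```python
from collections import defaultdict
from itertools import chain


def map_phase(records, mapper):
    """Apply the mapper to every record and flatten the emitted (key, value) pairs."""
    return chain.from_iterable(mapper(record) for record in records)


def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the list of values for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_mapreduce(records, mapper, reducer):
    """Run map, shuffle and reduce in sequence, in memory, on one machine."""
    return reduce_phase(shuffle_phase(map_phase(records, mapper)), reducer)


# Hypothetical example job: counting words in lines of text.
def word_count_mapper(line):
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    return sum(counts)


if __name__ == "__main__":
    lines = ["big data is big", "data about data"]
    print(run_mapreduce(lines, word_count_mapper, word_count_reducer))
    # prints {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

In a real Hadoop deployment the map and reduce tasks run in parallel on different nodes and the shuffle moves data across the network; this sketch keeps everything in one process so that the data flow between the phases is easy to follow.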

Requirements to complete the course

Exercises: 40% of the grade. Exercises include the development and defense of projects, which students work on in exercises during the semester. Each submitted project is graded separately, and the student must achieve at least 51% when the results are aggregated. The exercises verify the summative level of learning outcomes D and E.
Examination: 60% of the grade. The exam consists of two parts: a test and a specific problem-solving task. The test verifies the level of learning outcomes A, B and C.

Student workload

Total study load (in hours): 6 credits x 26 hours = 156 hours
Distribution of study load
Lectures and seminar participation: 52 hours
Preparation for seminars: 13 hours
Written assignments: 13 hours
Final exam preparation: 60 hours

Language required to complete the course

Slovak

Date of approval: 10.02.2023

Date of the latest change: 18.05.2022