Module Code - Title:

CS4422 - DATA CENTRIC COMPUTING

Year Last Offered:

2025/6

Hours Per Week:

Lecture

2

Lab

14

Tutorial

1

Other

0

Private

18

Credits

21

Grading Type:

N

Prerequisite Modules:

Rationale and Purpose of the Module:

This is Block 3 (21 ECTS) on the 3+1 Integrated BSc/MSc Immersive Software Engineering and runs Year 1 Weeks -2 to 9 (11 Weeks) in the spring semester. This block introduces students to data-centric computing in real-world, data-rich scenarios, using programming APIs and appropriate data structures. Students will take a deep dive into problem-centric data seams, using data mining and machine learning techniques to capture and visualise patterns, leading to quantitative critical thinking. The block is oriented towards data mining for computer scientists and is motivated by similar offerings in Year 1 of undergraduate programmes in North American institutions.

Syllabus:

1. Define data science: formulating a question, data collection and cleaning, exploratory data analysis and visualisation, data mining techniques, and critical evaluation; and describe the relation between data mining, data analytics, and data science.
2. Ethical dimensions of data science, such as: modelling the social and cultural values that inform how data-driven tools are developed and deployed; examining how data science might transform society.
3. Data pre-processing: feature extraction, data cleaning, handling missing data, methods for identifying outliers, data transformation.
4. Mathematics for data science: linear algebra, differential equations, eigenvectors.
5. Methods for feature selection: filter, wrapper and embedded methods.
6. An overview of machine learning for data mining: supervised vs. unsupervised learning, classification, numeric prediction, clustering, association learning.
7. A sample of algorithms for building predictive and descriptive analytics models:
- Predictive modelling algorithms for classification and numeric prediction, such as OneR, ID3, C4.5, Naïve Bayes, k-Nearest Neighbours, Prism, Support Vector Machines, linear regression, logistic regression, Perceptron, and others.
- Descriptive modelling algorithms for clustering and association learning: k-means, Apriori, Max-Miner.
8. Evaluation of predictive and descriptive analytics models: holdout and cross-validation, cost-benefit analysis, user feedback.
9. Visual analytics: methodology and workflow.
10. Case studies in subdomains, such as sentiment analysis, recommender systems, etc.
11. Apply data regulations in a data workflow with a focus on the General Data Protection Regulation (GDPR) and other national frameworks.
12. Preparing for Residency Part 3: series of seminars on professional practice such as standups, sprints, documentation, stakeholders, coping with uncertainty, setbacks, and conflict; and guest lectures on software development in practice.

Learning Outcomes:

Cognitive (Knowledge, Understanding, Application, Analysis, Evaluation, Synthesis)

On successful completion of this module, students will be able to:
- Summarise the main elements of the data science workflow.
- Differentiate predictive from descriptive analytics in terms of methods and output.
- Perform calculations by hand using linear algebra, differential equations, and eigenvectors, and understand how these techniques are applied inside data science algorithms and the libraries that implement them.
- Recognise and describe at least one algorithm in each of the four categories: classification, numeric prediction, clustering, association learning.
- Formulate a data science question based on a real-world dataset.
- Write a data mining program to pre-process this dataset, performing actions such as cleaning the data and identifying outliers.
- Extend this program to select appropriate features from the dataset.
- Extend this program to use these features to train one or more predictive and/or descriptive models.
- Extend this program to evaluate the model for accuracy, bias, and real-world usefulness.
- Demonstrate an understanding of the requirements for professional practice.

Affective (Attitudes and Values)

On successful completion of this module, students will be able to:
- Discuss the ethical dimensions of data science from a societal perspective.
- Appreciate the need for GDPR compliance.
- Present data to non-technical audiences.

Psychomotor (Physical Skills)

On successful completion of this module, students will be able to:

How the Module will be Taught and what will be the Learning Experiences of the Students:

The block is taught using problem-based learning, the flipped classroom concept, and blended learning in a state-of-the-art laboratory setting, with an emphasis on collaborative practice and technical excellence. Learning and teaching will be research-led, with a focus on translating theory into practice, innovation, and knowledge creation.

Research Findings Incorporated into the Syllabus (If Relevant):

Prime Texts:

I. H. Witten, E. Frank, and M. Hall (2017) Data Mining: Practical Machine Learning Tools and Techniques, Elsevier Science & Technology

Other Relevant Texts:

M. North (2018) Data Mining for the Masses, Third Edition, CreateSpace Independent Publishing Platform

J. Han, M. Kamber, and J. Pei (2017) Data Mining: Concepts and Techniques, Elsevier Science & Technology

Programme(s) in which this Module is Offered:

Semester(s) Module is Offered:

Spring

Module Leader:

mark.burkley@ul.ie