Module Code - Title:

MS6071 - R FOR STATISTICAL DATA SCIENCE

Year Last Offered:

2025/6

Hours Per Week:

Lecture: 0
Lab: 2
Tutorial: 0
Other: 0
Private: 8

Credits: 6

Grading Type:

N

Prerequisite Modules:

Rationale and Purpose of the Module:

R is the core programming language in statistical data science. This module will start with an introduction to programming in R and build to more complex topics such as writing R packages. The module will cover data wrangling, data visualisation and dashboarding, statistical modelling in R, and how to create R packages. Students will also learn about reproducible research, coding, and modelling practices.

Syllabus:

1. R language essentials: objects, functions, packages and libraries, using the tidyverse and base R.
2. Data cleaning and wrangling: subsetting, filtering, merging, grouping, outlier detection, etc.
3. Flow control and loops: for, while, if/else.
4. Data visualisation.
5. Probability distributions and simulation.
6. Statistical inference: one- and two-sample inference for quantitative and qualitative data.
7. Predictive analytics: correlation and regression.
8. Reproducible research and coding practices using R Markdown, Git and GitHub, and Quarto.
9. Developing R Shiny dashboards.
10. Introduction to developing R packages.

Learning Outcomes:

Cognitive (Knowledge, Understanding, Application, Analysis, Evaluation, Synthesis)

On successful completion of this module, students will be able to:
1. Demonstrate proficiency in programming in R and RStudio/Posit.
2. Examine and explore data through appropriate visualisation methods.
3. Summarise data using relevant summary statistics.
4. Develop a statistical data science pipeline to visualise, summarise, and model data, and disseminate the results.
5. Apply reproducible coding practices to real-world datasets.
6. Develop interactive visualisation tools for dissemination.

Affective (Attitudes and Values)

On successful completion of this module, students will be able to:
1. Synthesise information across the statistical data science pipeline to develop algorithms for decision-making.
2. Formulate a well-constructed rationale to defend and justify coding and modelling approaches adopted.
3. Display a commitment to reproducible coding, modelling and dissemination practices.

Psychomotor (Physical Skills)

N/A

How the Module will be Taught and what will be the Learning Experiences of the Students:

This module will be taught using a flipped classroom approach. Lectures will consist of online material, and hands-on labs will apply that knowledge to relevant data examples. The module will contribute towards graduates who are KNOWLEDGEABLE (able to bring their data science knowledge to bear on real-world problems), RESPONSIBLE (via reproducible analytics, modelling and dissemination practices), ARTICULATE (able to present their findings to a variety of stakeholders, both academic and industry), CREATIVE (developing dashboards to disseminate findings and present research visually), and COLLABORATIVE (sharing best practice in a data science team).

Research Findings Incorporated into the Syllabus (If Relevant):

Prime Texts:

Peng, R.D. (2022) R Programming for Data Science, Available online
Wickham, H. and Grolemund, G. (2016) R for Data Science, Available online
Xie, Y., Allaire, J.J. and Grolemund, G. (2018) R Markdown: The Definitive Guide, Available online
Wickham, H. and Bryan, J. (2021) R Packages, Available online

Other Relevant Texts:

Programme(s) in which this Module is Offered:

MSDSSLTFA - DATA SCIENCE AND STATISTICAL LEARNING

Semester(s) Module is Offered:

Autumn

Module Leader:

Maeve.Upton@ul.ie