Module Code - Title:
MS6071 - R FOR STATISTICAL DATA SCIENCE
Year Last Offered:
2025/6
Hours Per Week:
Grading Type:
N
Prerequisite Modules:
Rationale and Purpose of the Module:
R is a core programming language in statistical data science. This module starts with an introduction to programming in R and builds to more complex topics such as writing R packages. It covers data wrangling, data visualisation and dashboarding, and statistical modelling in R. Students will also learn about reproducible research, coding and modelling practices.
Syllabus:
1. R language essentials: objects, functions, packages and libraries, using the tidyverse and base R.
2. Data cleaning and wrangling: subsetting, filtering, merging, grouping, outlier detection, etc.
3. Flow control and loops: for, while, if/else.
4. Data visualisation.
5. Probability distributions and simulation.
6. Statistical inference: one- and two-sample inference for quantitative and qualitative data.
7. Predictive analytics: correlation and regression.
8. Reproducible research and coding practices using R Markdown, Quarto, Git and GitHub.
9. Developing RShiny dashboards.
10. Introduction to developing R packages.
Learning Outcomes:
Cognitive (Knowledge, Understanding, Application, Analysis, Evaluation, Synthesis)
On successful completion of this module, students will be able to:
1. Demonstrate proficiency in programming in R and RStudio/Posit.
2. Examine and explore data through appropriate visualisation methods.
3. Summarise data using relevant summary statistics.
4. Develop a statistical data science pipeline to visualise, summarise, and model data, and disseminate the results.
5. Apply reproducible coding practices to real-world datasets.
6. Develop interactive visualisation tools for dissemination.
Affective (Attitudes and Values)
On successful completion of this module, students will be able to:
1. Synthesise information across the statistical data science pipeline to develop algorithms for decision-making.
2. Formulate a well-constructed rationale to defend and justify coding and modelling approaches adopted.
3. Display a commitment to reproducible coding, modelling and dissemination practices.
Psychomotor (Physical Skills)
N/A
How the Module will be Taught and what will be the Learning Experiences of the Students:
This module will be taught using a flipped classroom approach. Lectures will consist of online material, and hands-on labs will apply that knowledge to relevant data examples. The module will contribute towards graduates who are KNOWLEDGEABLE (being able to bring their data science knowledge to bear on real-world problems), RESPONSIBLE (via reproducible analytics, modelling and dissemination practices), ARTICULATE (being able to present their findings to a variety of stakeholders, both academic and industry), CREATIVE (developing dashboards to disseminate findings and present research visually), and COLLABORATIVE (sharing best practice in a data science team).
Research Findings Incorporated into the Syllabus (If Relevant):
Prime Texts:
Peng, R.D. (2022) R Programming for Data Science, Available online
Wickham, H. and Grolemund, G. (2016) R for Data Science, Available online
Xie, Y., Allaire, J.J. and Grolemund, G. (2018) R Markdown: The Definitive Guide, Available online
Wickham, H. and Bryan, J. (2021) R Packages, Available online
Other Relevant Texts:
Programme(s) in which this Module is Offered:
MSDSSLTFA - DATA SCIENCE AND STATISTICAL LEARNING
Semester(s) Module is Offered:
Autumn
Module Leader:
Maeve.Upton@ul.ie