Module Code - Title:

EE4032 - TENSOR AND GPU FUNDAMENTALS

Year Last Offered:

2025/6

Hours Per Week:

Lecture

2

Lab

2

Tutorial

0

Other

0

Private

6

Credits

6

Grading Type:

N

Prerequisite Modules:

CE4703
CE4518

Rationale and Purpose of the Module:

In today's complex computing applications, there is a move towards incorporating AI (artificial intelligence), machine learning and deep learning concepts and algorithms into computing software and hardware. Many systems are based on software programs running on a CPU (central processing unit). To meet the need for high performance computing (HPC), the designer uses other types of processing unit and hardware resources to develop a computing platform that meets the needs of an application in terms of processing time, data storage and processing, and cost. Designers therefore need to understand how to use the hardware and software resources available. The GPU (graphics processing unit) will be explored as a processor architecture superior to the CPU for AI and machine learning applications. The module will also be offered on the Department of Electronic and Computer Engineering's new MEng programme (MEng in Electronic and Computer Engineering).

Syllabus:

The module will focus on the use of appropriate computing platform hardware and is divided into two parts, each with a specific focus and purpose, as follows:

-------------------------------------------------------------------
Part 1: The Graphics Processing Unit (GPU) for AI and machine learning
-------------------------------------------------------------------
Heterogeneous parallel computing, architecture of a modern GPU, challenges in parallel computing, data parallel computing, CUDA program structure. Device and host memory transfers, kernel functions and threading, thread organisation, launching kernels. Thread scheduling and latency, CUDA memory types and usage, tiling. Warps, thread granularity, numerical and arithmetic issues with CUDA.

--------------------------------------------------------------
Part 2: Data structures and hardware for AI and machine learning
--------------------------------------------------------------
Representing data: scalars, vectors, arrays, matrices and tensors. Tensors: what are tensors, and why use them? Example applications of tensors. Tensor calculus: tensor arithmetic, tensor rank, tensor products. Modelling the world using tensors. Multidimensional arrays. Hardware considerations: processing units (CPU - central, GPU - graphics and TPU - tensor). The TPU. Memory. The field programmable gate array (FPGA) and the application specific integrated circuit (ASIC).
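As a small illustration of the tensor rank and tensor product topics in Part 2, the following pure-Python sketch treats a tensor as a nested list. Here "rank" is assumed to mean the number of indices needed to select one element (the sense used by TensorFlow's `tf.rank`), and the outer (tensor) product of two tensors has a rank equal to the sum of the operand ranks. The helper names `rank`, `shape` and `outer` are hypothetical, and the sketch assumes rectangular, non-empty nested lists; in the practicals a library such as TensorFlow would normally do this work.

```python
from typing import Any


def rank(t: Any) -> int:
    """Number of indices needed to address one element (0 for a scalar).

    Assumes t is a scalar or a rectangular, non-empty nested list.
    """
    r = 0
    while isinstance(t, list):
        r += 1
        t = t[0]
    return r


def shape(t: Any) -> tuple:
    """Size along each index, read from the first element at each level."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


def outer(a: Any, b: Any) -> Any:
    """Tensor (outer) product: (a ⊗ b)[i..., j...] = a[i...] * b[j...]."""
    if isinstance(a, list):
        # Walk a's indices first, so they come first in the result
        return [outer(x, b) for x in a]
    if isinstance(b, list):
        # a is now a scalar: scale every element of b by it
        return [outer(a, y) for y in b]
    return a * b


u = [1, 2]                     # rank-1 tensor (vector)
v = [3, 4, 5]
M = outer(u, v)                # rank-2 tensor (matrix)
print(M, rank(M), shape(M))    # [[3, 4, 5], [6, 8, 10]] 2 (2, 3)
T = outer(M, u)                # rank 2 + rank 1 = rank 3
print(rank(T), shape(T))       # 3 (2, 3, 2)
```

Each element of `T` is a product of one element from each factor, for example `T[1][2][1] == M[1][2] * u[1] == 20`.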

Learning Outcomes:

Cognitive (Knowledge, Understanding, Application, Analysis, Evaluation, Synthesis)

Detail the different CUDA memory types and their respective uses. Exploit tiling for efficient use of global memory. Understand how complex data sets are stored and analysed. Understand the role of the different hardware components required in an AI and machine learning computing platform.
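To show why tiling makes more efficient use of global memory, the pure-Python sketch below models (but is not) a CUDA tiled matrix multiplication. A local list stands in for shared memory and a counter stands in for global-memory reads. It assumes square n x n matrices with n a multiple of `tile`; the function names are hypothetical. The naive version performs 2n³ reads, while the tiled version performs 2n³ / tile, because each element loaded into a tile is reused `tile` times.

```python
def matmul_naive(A, B, n):
    """Each output element reads a full row of A and column of B from 'global memory'."""
    C = [[0] * n for _ in range(n)]
    reads = 0
    for row in range(n):
        for col in range(n):
            acc = 0
            for k in range(n):
                acc += A[row][k] * B[k][col]
                reads += 2                    # one read of A, one read of B
            C[row][col] = acc
    return C, reads


def matmul_tiled(A, B, n, tile):
    """Each (tile x tile) output block stages sub-tiles of A and B in a local
    buffer (standing in for CUDA shared memory) and reuses them."""
    assert n % tile == 0, "sketch assumes n is a multiple of tile"
    C = [[0] * n for _ in range(n)]
    reads = 0
    for by in range(0, n, tile):              # block row
        for bx in range(0, n, tile):          # block column
            for ph in range(0, n, tile):      # phase: which pair of tiles to load
                # Cooperative load: "thread" (ty, tx) fetches one element of each tile
                As = [[A[by + ty][ph + tx] for tx in range(tile)] for ty in range(tile)]
                Bs = [[B[ph + ty][bx + tx] for tx in range(tile)] for ty in range(tile)]
                reads += 2 * tile * tile
                # In CUDA, __syncthreads() barriers would surround this compute step
                for ty in range(tile):
                    for tx in range(tile):
                        C[by + ty][bx + tx] += sum(As[ty][k] * Bs[k][tx] for k in range(tile))
    return C, reads


n, tile = 4, 2
A = [[r * n + c for c in range(n)] for r in range(n)]
B = [[(r + c) % 3 for c in range(n)] for r in range(n)]
C1, naive_reads = matmul_naive(A, B, n)
C2, tiled_reads = matmul_tiled(A, B, n, tile)
assert C1 == C2                     # same result, fewer global reads
print(naive_reads, tiled_reads)     # 128 64
```

Increasing `tile` reduces global traffic further. On a real GPU, the limit is the amount of shared memory available per block.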

Affective (Attitudes and Values)

Appreciate the role of the GPU in AI and machine learning applications. Appreciate how to use the GPU. Appreciate the need for suitable hardware platforms for implementing AI and machine learning algorithms. Appreciate how Python and TensorFlow can be used to model complex, multidimensional arrays to solve tensor analysis problems.

Psychomotor (Physical Skills)

Create, implement, run and test a CUDA program to transfer data between host and device. Launch kernels with different thread organisations. Develop Python and TensorFlow scripts to model and solve multidimensional array problems.
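The thread organisation behind a kernel launch can be sketched without a GPU. The plain-Python emulation below runs threads one after another and mimics how a 1-D CUDA launch maps each thread to one array element through `blockIdx.x * blockDim.x + threadIdx.x`, including the bounds guard needed when the last block overhangs the data. This is a model under stated assumptions, not CUDA. Real threads run concurrently on the device, host-device transfers (`cudaMalloc`/`cudaMemcpy`) are not modelled, and the helper names `launch_1d` and `vec_add_kernel` are hypothetical.

```python
def launch_1d(kernel, n, block_dim, *args):
    """Emulate kernel<<<grid_dim, block_dim>>>(...) for a 1-D launch over n elements."""
    grid_dim = (n + block_dim - 1) // block_dim   # ceiling division: enough blocks for n
    for block_idx in range(grid_dim):             # on a GPU, blocks run in parallel
        for thread_idx in range(block_dim):       # ...as do threads within a block
            kernel(block_idx, block_dim, thread_idx, n, *args)
    return grid_dim


def vec_add_kernel(block_idx, block_dim, thread_idx, n, a, b, c):
    i = block_idx * block_dim + thread_idx        # global thread index
    if i < n:                                     # guard: last block may overhang n
        c[i] = a[i] + b[i]


n = 1000
a = list(range(n))
b = [2 * x for x in range(n)]
for block_dim in (100, 128, 256):                 # three different thread organisations
    c = [0] * n
    grid_dim = launch_1d(vec_add_kernel, n, block_dim, a, b, c)
    assert c == [3 * x for x in range(n)]         # same answer every time
    print(block_dim, grid_dim, grid_dim * block_dim - n)   # block size, blocks, idle threads
# 100 10 0
# 128 8 24
# 256 4 24
```

A 2-D organisation follows the same pattern, with a row index built from the `.y` components and a column index from the `.x` components.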

How the Module will be Taught and what will be the Learning Experiences of the Students:

The module will focus on the use of appropriate computing platform hardware and is divided into two parts, each with a specific focus and purpose, as follows:

Part 1: The Graphics Processing Unit (GPU) for AI and machine learning. This part introduces the architecture, role and programming of the GPU as a processor architecture superior to the CPU for AI and machine learning applications. Students will learn how to program a GPU and how to work with complex data sets. The CUDA parallel computing platform will be used for GPU programming.

Part 2: Data structures and hardware for AI and machine learning. This part introduces hardware design concerns for AI and machine learning applications, including the structures, storage and processing of complex data sets. The tensor will be introduced as a compact way to model and analyse complex, multidimensional data arrays. The Python programming language with TensorFlow will be used to develop the required practical skills.
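Both CUDA device buffers and tensor libraries ultimately store a multidimensional array as one flat block of memory. The sketch below assumes row-major layout, the layout of C arrays and NumPy's default. It shows how a tensor index maps to a flat offset through per-dimension strides. The function names are hypothetical.

```python
def row_major_strides(shape):
    """Stride of each index: how many flat elements to skip when that index increases by 1."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return tuple(strides)


def flat_offset(index, shape):
    """Map a multidimensional index to its offset in a flat, row-major buffer."""
    assert len(index) == len(shape)
    assert all(0 <= i < s for i, s in zip(index, shape)), "index out of bounds"
    return sum(i * s for i, s in zip(index, row_major_strides(shape)))


shape = (2, 3, 4)                            # e.g. 2 samples x 3 rows x 4 columns
flat = list(range(2 * 3 * 4))                # each value equals its own offset
print(row_major_strides(shape))              # (12, 4, 1)
print(flat[flat_offset((1, 2, 3), shape)])   # 23, the last element
```

The same arithmetic appears in CUDA kernels that index a flattened 2-D array as `row * width + col`.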

Research Findings Incorporated into the Syllabus (If Relevant):

Prime Texts:

David Kirk, Wen-mei Hwu (2016) Programming Massively Parallel Processors, Morgan Kaufmann
Daniel Fleisch (2017) A Student's Guide to Vectors and Tensors, Cambridge University Press

Other Relevant Texts:

Programme(s) in which this Module is Offered:

BEECENUFA - ELECTRONIC AND COMPUTER ENGINEERING

Semester(s) Module is Offered:

Spring

Module Leader:

Richard.Conway@ul.ie