
Module Code - Title:

MN5001 - NATURAL LANGUAGE PROCESSING: AN INTRODUCTION

Year Last Offered:

2025/6

Hours Per Week:

Lecture

2

Lab

0

Tutorial

0

Other

2

Private

6

Credits

6

Grading Type:

N

Prerequisite Modules:

Rationale and Purpose of the Module:

This module introduces students to Natural Language Processing (NLP). It covers the fundamentals of statistical NLP, together with its techniques and applications, taking a foundational approach. NLP is a set of ICT skills and techniques that allow human language and text to be understood by electronic devices and computer systems, giving those devices the ability to interpret human interactions. It focuses on translating human speech, gestures, and text into actionable data for the system to use. NLP works in the background to enable virtual assistants, chatbots, grammar and sentiment checkers, and webpage translation. Combined with machine learning algorithms, NLP yields systems that can be trained to perform tasks and improve through experience. Drawing on computer science and computational linguistics, among other disciplines, NLP attempts to bridge the gap between human communication and computer understanding.

Syllabus:

1. Basic Text Processing: Regular Expressions, Word Tokenization, Normalization, Stemming and Lemmatization, Sentence Segmentation.
2. String Similarity: Minimum Edit Distance, Backtrace and Alignment, Weighted Minimum Edit Distance, Phonetic Matching, Real-world applications (record de-duplication).
3. N-gram Language Models: Introduction to N-grams, Estimating N-gram Probabilities, N-gram Evaluation and Perplexity, Generalization and Zeros, Add-One (Laplace) Smoothing, (Interpolation, Good-Turing Smoothing, Kneser-Ney Smoothing), Google Books N-gram Corpus, Zipf's law.
4. Spelling Correction: Introduction to the task of Spelling Correction, The Noisy Channel Model of Spelling, Real Word Spelling Correction, Peter Norvig's Spell Checker, State of the Art Systems.
5. Text Classification: Introduction to the task of Text Classification, Introduction to Naive Bayes, Formalizing the Naive Bayes Classifier, Naive Bayes Learning, Naive Bayes Relationship to Language Modeling, Precision, Recall, and the F measure, Text Classification Evaluation (micro & macro averaging), Practical Issues in Text Classification, Manual labelling tools and techniques (Amazon Mechanical Turk, brat, doccano, INCEpTION).
6. Sentiment Analysis: Introduction to the task of Sentiment Analysis, The Baseline Algorithm for Sentiment Analysis (tokenization, feature extraction, classification), Sentiment Lexicons, Learning Sentiment Lexicons, Other Sentiment Tasks (aspects, attributes, targets).
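As an illustration of the String Similarity topic above, minimum edit distance can be sketched with a small dynamic-programming table. This is a minimal sketch, not course material: the function name `min_edit_distance` and the `sub_cost` parameter are assumptions made for the example, with insertions and deletions fixed at cost 1.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Minimum edit distance between two strings.

    Insertions and deletions cost 1; substitutions cost `sub_cost`
    (an illustrative parameter, not part of the module descriptor).
    """
    n, m = len(source), len(target)
    # dist[i][j] holds the cost of turning source[:i] into target[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))              # substitution cost 1
print(min_edit_distance("intention", "execution", sub_cost=2))  # substitution cost 2
```

With a substitution cost of 1 this is the Levenshtein distance; raising `sub_cost` to 2 treats a substitution as a deletion plus an insertion. A weighted variant would replace the fixed costs with per-character cost tables.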

Learning Outcomes:

Cognitive (Knowledge, Understanding, Application, Analysis, Evaluation, Synthesis)

On successful completion of this module, students will be able to:
1. Use regular expressions to match complex patterns.
2. Implement various text pre-processing steps (tokenization, normalization, stemming, lemmatization, sentence segmentation).
3. Use minimum edit distance and phonetic matching to measure the similarity between two strings.
4. Learn an N-gram language model from a corpus, and deploy the model to generate text.
5. Implement a spelling correction program.
6. Implement a Naive Bayes text classification system and evaluate its performance using a standard benchmark dataset.
7. Implement a sentiment analysis system and evaluate its performance using a standard dataset.
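To make the outcome on learning an N-gram model from a corpus more concrete, the following is a minimal sketch of a bigram model with add-one (Laplace) smoothing. The function names, the whitespace tokenizer, the `<s>` and `</s>` boundary markers, and the three-sentence toy corpus are all assumptions made for the example; counting the boundary markers as vocabulary items is a simplification.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenized, lowercased sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        # <s> and </s> are hypothetical sentence-boundary markers.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def laplace_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)  # simplification: boundary markers count as vocabulary
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Toy corpus, purely illustrative.
corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
unigrams, bigrams = train_bigram_model(corpus)

print(laplace_prob("i", "am", unigrams, bigrams))   # seen bigram
print(laplace_prob("i", "ham", unigrams, bigrams))  # unseen bigram, still non-zero
```

In this toy corpus "i" occurs 3 times, "i am" occurs twice, and the vocabulary has 12 types, so the smoothed estimate for "am" after "i" is (2 + 1) / (3 + 12) = 0.2, while the unseen bigram "i ham" receives 1 / 15 instead of zero.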

Affective (Attitudes and Values)

On successful completion of this module, students will be able to:
1. Appreciate the use of third-party state-of-the-art NLP libraries such as spaCy, NLTK, TextBlob, and CoreNLP in their projects.
2. Appreciate the role of third-party NLP cloud platforms such as IBM Watson's Natural Language Understanding, Google Cloud Natural Language, Amazon Comprehend, and Microsoft Azure Text Analytics API in advancing NLP applications.

Psychomotor (Physical Skills)

On successful completion of this module, students will be able to:

How the Module will be Taught and what will be the Learning Experiences of the Students:

The module will be delivered fully online through online lectures, labs and tutorials.

Research Findings Incorporated into the Syllabus (If Relevant):

Prime Texts:

Daniel Jurafsky, James H. Martin (2021) Speech and Language Processing (3rd Edition), Stanford

Other Relevant Texts:

Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. () Dive into Deep Learning (D2L.ai): Interactive Deep Learning Book with Multi-Framework Code, Math, and Discussions, D2Lai Project

Programme(s) in which this Module is Offered:

MSARINTPA - ARTIFICIAL INTELLIGENCE

Semester(s) Module is Offered:

Autumn

Module Leader:

arash.joorabchi@ul.ie