Module Code - Title:
MN5151 - INFORMATION RETRIEVAL
Year Last Offered:
2025/6
Hours Per Week:
Grading Type:
N
Prerequisite Modules:
Rationale and Purpose of the Module:
This module introduces students to the fields of Information Retrieval, Information Extraction, and the Semantic Web. It covers a blend of fundamental concepts and the current tools, techniques, and technologies used in modern information retrieval systems.
Syllabus:
The syllabus covers the six topic areas below.
1. Information Retrieval Concepts and Models: such as structured vs. unstructured data, the classic search model, term-document incidence matrices, the inverted index, query processing with the inverted index, the Boolean retrieval model and extended Boolean models, and phrase queries and positional indexes.
2. Ranked Retrieval Systems: including scoring with the Jaccard coefficient, term frequency weighting, inverse document frequency weighting, TF-IDF weighting, the vector space model, TF-IDF cosine similarity, and evaluating search engines (precision-recall curves, MAP, MRR, NDCG).
3. Text Clustering Methods: such as flat clustering, K-means, K-means for text documents, and hierarchical clustering.
4. Information Extraction Approaches: including Named Entity Recognition, relation extraction, using patterns to extract relations, and semi-supervised and unsupervised relation extraction.
5. Question Answering Approaches: such as answer types and query formulation, passage retrieval and answer extraction, using knowledge bases in question answering, and answering complex questions (query-focused summarization).
6. Semantic Web and Linked Data Approaches: including taxonomies, ontologies, knowledge graphs, ontology querying, ontology reasoning, and data quality and interlinking.
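A minimal Python sketch of the ranked-retrieval evaluation measures listed under heading 2 follows. It assumes binary relevance for average precision (AP) and reciprocal rank (RR), graded relevance for NDCG, and the common log2(rank + 1) discount, which is only one of several NDCG variants in use. MAP and MRR are the means of AP and RR over a set of queries. The function names and the toy run data are hypothetical.

```python
import math

def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k at which relevant docs appear.
    Dividing by len(relevant) means relevant docs never retrieved contribute 0."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / k
    return 0.0

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) position discount."""
    return sum(g / math.log2(k + 1) for k, g in enumerate(gains, start=1))

def ndcg(ranked, grades, k=None):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.
    grades maps doc id -> graded relevance; unjudged docs get gain 0."""
    if k is None:
        k = len(ranked)
    gains = [grades.get(doc, 0) for doc in ranked[:k]]
    ideal = sorted(grades.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0

def mean(values):
    return sum(values) / len(values)

# Hypothetical toy run: (ranked doc ids, set of relevant doc ids) per query.
runs = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    (["d5", "d4", "d6"], {"d5"}),
]
print("MAP:", mean([average_precision(r, rel) for r, rel in runs]))  # MAP: 0.75
print("MRR:", mean([reciprocal_rank(r, rel) for r, rel in runs]))    # MRR: 0.75
print("NDCG:", ndcg(["d3", "d1", "d2"], {"d1": 2, "d2": 1}))
```

Because AP divides by the total number of relevant documents rather than the number retrieved, a system is penalized for relevant documents it never returns.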
Learning Outcomes:
Cognitive (Knowledge, Understanding, Application, Analysis, Evaluation, Synthesis)
On successful completion of this module, students will be able to:
1. Implement a simple inverted index and search through it.
2. Implement TF-IDF and cosine similarity to build a simple search engine.
3. Use local full-text search engines such as Apache Lucene, Solr, and Elasticsearch, and cloud-based options such as MeiliSearch and Algolia.
4. Implement a simple K-means text clustering method.
5. Use document clustering engines such as Carrot2 and Weka.
6. Implement a simple Named Entity Recognition method (sentence segmentation, tokenization, part-of-speech tagging, entity detection, relation detection).
7. Store linked data in a triplestore such as Ontotext GraphDB.
8. Use SPARQL to query knowledge bases such as Wikidata.
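Learning outcomes 1 and 2 can be illustrated with a minimal Python sketch of an inverted index that answers Boolean AND queries and ranks documents by TF-IDF cosine similarity. It assumes lower-cased whitespace tokenization with no stemming or stop-word removal, and the log-scaled weighting (1 + log10 tf) × log10(N / df), which is one common TF-IDF variant among several. The class name, method names, and sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Assumption: lower-cased whitespace tokens with edge punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

class TinyIndex:
    """Hypothetical in-memory inverted index with Boolean AND and TF-IDF ranking."""

    def __init__(self, docs):
        self.postings = defaultdict(list)  # term -> sorted list of doc ids
        counts_per_doc = []
        for doc_id, text in enumerate(docs):
            counts = Counter(tokenize(text))
            counts_per_doc.append(counts)
            for term in counts:
                # doc ids are visited in increasing order, so each list stays sorted
                self.postings[term].append(doc_id)
        n = len(docs)
        self.idf = {t: math.log10(n / len(p)) for t, p in self.postings.items()}
        self.doc_vecs = [self._weigh(c) for c in counts_per_doc]

    def _weigh(self, counts):
        """(1 + log10 tf) * idf, then L2-normalized so a dot product is a cosine."""
        vec = {t: (1 + math.log10(f)) * self.idf.get(t, 0.0) for t, f in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm > 0 else {}

    def boolean_and(self, query):
        """Intersect postings lists with a linear merge, shortest list first."""
        lists = sorted((self.postings.get(t, []) for t in tokenize(query)), key=len)
        if not lists:
            return []
        result = lists[0]
        for other in lists[1:]:
            merged, i, j = [], 0, 0
            while i < len(result) and j < len(other):
                if result[i] == other[j]:
                    merged.append(result[i])
                    i += 1
                    j += 1
                elif result[i] < other[j]:
                    i += 1
                else:
                    j += 1
            result = merged
        return list(result)

    def search(self, query, k=3):
        """Return up to k (doc_id, score) pairs ranked by TF-IDF cosine similarity."""
        q = self._weigh(Counter(tokenize(query)))
        scored = []
        for doc_id, vec in enumerate(self.doc_vecs):
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                scored.append((score, doc_id))
        scored.sort(reverse=True)
        return [(doc_id, round(score, 3)) for score, doc_id in scored[:k]]

docs = [
    "Information retrieval with an inverted index",
    "Boolean retrieval and phrase queries",
    "Ranking documents with TF-IDF and cosine similarity",
]
index = TinyIndex(docs)
print(index.boolean_and("retrieval index"))  # [0]
print(index.search("cosine ranking"))
```

Keeping each postings list sorted by document id is what lets the AND intersection run as a single linear merge instead of a nested loop; full-text engines such as those named in outcome 3 build compression and many other optimizations on top of the same idea.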
Affective (Attitudes and Values)
On successful completion of this module, students will be able to:
1. Appreciate the use of libraries and cloud-based services for Named Entity Recognition (e.g., GATE, OpenNLP, spaCy, NLTK, Azure Cognitive Services, Watson Natural Language Understanding, TextRazor).
2. Appreciate the use of Semantic Web technologies (RDF, RDFS, OWL, JSON-LD, RDFa, and schema.org) to create and publish Linked Data.
Psychomotor (Physical Skills)
On successful completion of this module, students will be able to:
How the Module will be Taught and what will be the Learning Experiences of the Students:
The module will be delivered fully online through online lectures, labs, and tutorials.
Research Findings Incorporated into the Syllabus (If Relevant):
Prime Texts:
Manning, C.D., Raghavan, P. & Schütze, H. (2008) Introduction to Information Retrieval, Cambridge University Press.
Other Relevant Texts:
Allemang, D., Hendler, J. & Gandon, F. (2020) Semantic Web for the Working Ontologist: Effective Modeling for Linked Data, RDFS, and OWL, Association for Computing Machinery.
Jurafsky, D. & Martin, J.H. (2021) Speech and Language Processing (3rd Edition), Stanford.
Programme(s) in which this Module is Offered:
MSARINTPA - ARTIFICIAL INTELLIGENCE
Semester(s) Module is Offered:
Autumn
Module Leader:
arash.joorabchi@ul.ie