Chapter 1

Introduction to IR and Web Search

Introduction, Data retrieval vs. information retrieval, Logical view of documents, Architecture of an IR system, Web search systems, History of IR, Related areas

Notes Coming Soon

Chapter 2

Text Properties, Operations, and Preprocessing

Tokenization, Text normalization, Stop-word removal, Morphological analysis, Word stemming (Porter algorithm), Case folding, Lemmatization, Word statistics (Zipf's law, Heaps' law), Index term selection, Inverted indices, Positional inverted index, Natural language processing in information retrieval, Basic NLP tasks (POS tagging, shallow parsing)
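As an illustration of the positional inverted index listed above, here is a minimal Python sketch, not taken from the course notes. It assumes a deliberately simple tokenizer (case folding plus splitting on non-alphanumeric characters, with no stemming or stop-word removal), and the names `tokenize`, `build_positional_index`, and `phrase_query` are hypothetical. Each term maps to the documents containing it and the word positions inside each one, which is what lets a phrase query check that terms are adjacent.

```python
import re
from collections import defaultdict

# Illustrative sketch; all function names here are hypothetical.

def tokenize(text):
    # Assumed tokenizer: case folding plus a split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(docs):
    # term -> {doc_id -> [positions of the term in that document]}
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def phrase_query(index, phrase):
    # Return ids of documents where the phrase terms occur at consecutive positions.
    terms = tokenize(phrase)
    if not terms or any(t not in index for t in terms):
        return []
    # Candidate documents must contain every term of the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    result = []
    for doc_id in sorted(candidates):
        for start in index[terms[0]][doc_id]:
            # Term k of the phrase must sit exactly k positions after the first term.
            if all(start + k in index[t][doc_id] for k, t in enumerate(terms[1:], 1)):
                result.append(doc_id)
                break
    return result
```

For example, with the documents "New home sales top forecasts", "Home sales rise in July", and "New home sales rise" (ids 0, 1, 2), `phrase_query(index, "new home")` should return `[0, 2]`.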

Notes Coming Soon

Chapter 3

Basic IR Models

Classes of retrieval models, Boolean model, Term weighting (TF, IDF, TF-IDF weighting), Cosine similarity, Vector space model, Probabilistic models (Binary independence model, Language models, KL divergence, Smoothing), Non-overlapping lists, Proximal nodes model
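A minimal sketch of TF-IDF weighting combined with cosine similarity, as used to rank documents against a query in the vector space model. TF and IDF have several textbook variants; this one assumes raw term counts for TF and IDF = log10(N / df), with documents already tokenized. The function names are hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch; function names are hypothetical.

def tf_idf_vectors(docs):
    # docs: list of token lists. Returns one {term: weight} dict per document, plus the idf table.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per document
    idf = {t: math.log10(n / df_t) for t, df_t in df.items()}
    vectors = [{t: count * idf[t] for t, count in Counter(tokens).items()} for tokens in docs]
    return vectors, idf

def query_vector(tokens, idf):
    # Weight query terms with the collection idf; terms absent from the collection are dropped.
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def cosine(u, v):
    # Cosine of the angle between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

With this weighting, a term that occurs in every document gets IDF 0 and therefore contributes nothing to the similarity.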

Notes Coming Soon

Chapter 4

Evaluation of IR

Precision, Recall, F-measure, Mean average precision (MAP), Discounted cumulative gain (DCG), Known-item search evaluation
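The measures listed above can be computed directly from a ranked result list and a set of relevance judgments. This sketch assumes binary relevance for precision, recall, F1 (the balanced F-measure), and average precision, and graded relevance values for DCG with the rel_i / log2(i + 1) discount; some texts instead use rel_1 + sum over i >= 2 of rel_i / log2(i). Function names are hypothetical.

```python
import math

# Illustrative sketch; function names are hypothetical.

def precision_recall_f1(retrieved, relevant):
    # Set-based measures over an unranked retrieved set.
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    p = hits / len(retrieved_set) if retrieved_set else 0.0
    r = hits / len(relevant_set) if relevant_set else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

def average_precision(ranking, relevant):
    # Mean of precision@k over the ranks k where a relevant document appears;
    # relevant documents that are never retrieved contribute zero.
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    # runs: list of (ranking, relevant) pairs, one per query.
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def dcg(gains):
    # gains[i] is the graded relevance of the document at rank i + 1.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
```

For the ranking d1, d2, d3, d4, d5 with relevant set {d1, d3, d6}, precision is 2/5, recall is 2/3, and average precision is (1/1 + 2/3) / 3 = 5/9.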

Notes Coming Soon

Chapter 5

Query Operations and Languages

Relevance feedback and pseudo-relevance feedback, Query expansion (using a thesaurus or WordNet, and a correlation matrix), Spelling correction (Edit distance, k-gram indexes, Context-sensitive spelling correction), Query languages (Single-word queries, Context queries, Boolean queries, Structural queries, Natural language queries)
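Edit distance underlies the spelling-correction topics listed above. Below is a minimal sketch of Levenshtein distance (unit cost for insertion, deletion, and substitution) computed by dynamic programming with two rows, plus a hypothetical `suggest` helper that returns the closest vocabulary words. A k-gram index is commonly used to narrow the candidate set first; this sketch simply scans a small vocabulary.

```python
# Illustrative sketch; function names are hypothetical.

def edit_distance(a, b):
    # Levenshtein distance: minimum number of single-character insertions,
    # deletions, and substitutions (each cost 1) that turn a into b.
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            )
        prev = curr
    return prev[len(b)]

def suggest(word, vocabulary):
    # Return all vocabulary words at the minimum edit distance from word.
    best = min(edit_distance(word, v) for v in vocabulary)
    return sorted(v for v in vocabulary if edit_distance(word, v) == best)
```

For example, `edit_distance("kitten", "sitting")` is 3, and `suggest("informaton", ["information", "formation", "informant"])` returns `["information"]`.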

Notes Coming Soon

Chapter 6

Web Search

Search engines (working principle), Spidering (Structure of a spider, Simple spidering algorithm, Multithreaded spidering, Bots), Directed spidering (Topic-directed, Link-directed), Crawlers (Basic crawler architecture), Link analysis (HITS, PageRank), Query log analysis, Handling the “invisible” web, Snippet generation, Cross-language information retrieval (CLIR)
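Link analysis with PageRank can be illustrated with a short power-iteration sketch. It assumes a damping factor of 0.85 (a commonly used value), that every linked page also appears as a key in `links`, and that pages without out-links (dangling pages) spread their rank uniformly over all pages; other treatments of dangling pages exist. The function name is hypothetical.

```python
# Illustrative sketch; the function name is hypothetical.

def pagerank(links, damping=0.85, max_iter=200, tol=1e-10):
    # links: {page: [pages it links to]}; every target must also be a key.
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed evenly (assumed convention).
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            out = links[p]
            for q in out:
                # Each page splits its rank equally among its out-links.
                new_rank[q] += damping * rank[p] / len(out)
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank
```

Because the random-jump term and the redistributed dangling mass keep the total at 1, the returned scores form a probability distribution over pages; a page with no in-links ends up with exactly (1 - damping) / n when there are no dangling pages.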

Notes Coming Soon

Chapter 7

Text Categorization

Categorization, Learning for categorization, General learning issues, Learning algorithms (Naïve Bayes, Decision tree, k-nearest neighbors (KNN), Rocchio)
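As one instance of the learning algorithms listed above, a multinomial naive Bayes text classifier with add-one (Laplace) smoothing might look like the following sketch. It assumes pre-tokenized documents and ignores terms unseen during training; scores are kept as log probabilities to avoid floating-point underflow. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Illustrative sketch; function names are hypothetical.

def train_nb(labeled_docs):
    # labeled_docs: list of (tokens, label). Multinomial model with add-one smoothing.
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(labeled_docs)
    model = {}
    for label in class_docs:
        total = sum(term_counts[label].values())
        log_prior = math.log(class_docs[label] / n_docs)
        log_likelihood = {
            t: math.log((term_counts[label][t] + 1) / (total + len(vocab)))
            for t in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model

def classify_nb(model, tokens):
    # Pick the label with the highest log posterior; unseen terms are skipped.
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(log_likelihood[t] for t in tokens if t in log_likelihood)
    return max(model, key=score)
```

Trained on three "China" documents and one "Japan" document, the classifier still labels "chinese chinese chinese tokyo japan" as China, because the repeated term and the larger prior outweigh the two Japan-specific terms.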

Notes Coming Soon

Chapter 8

Text Clustering

Clustering, Clustering algorithms (Hierarchical clustering, k-means, k-medoids, Expectation-maximization (EM), Text shingling)
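Text shingling represents a document as the set of its contiguous k-word sequences, and the Jaccard coefficient of two shingle sets measures how much the documents overlap; a common use is near-duplicate detection. A minimal sketch, assuming lowercasing and whitespace tokenization; the helper names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.

def shingles(text, k=3):
    # Set of contiguous k-word sequences (word-level shingles) after lowercasing.
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle of all their words (assumed convention).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For instance, "a rose is a rose is a rose" has only three distinct 4-word shingles, because repeated sequences collapse in the set.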

Notes Coming Soon

Chapter 9

Recommender Systems

Personalization, Collaborative filtering recommendation, Content-based recommendation
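A minimal sketch of user-based collaborative filtering: a user's missing rating for an item is predicted as the similarity-weighted average of other users' ratings for that item. It assumes cosine similarity computed over co-rated items only and no mean-centering of ratings; both are simplifications, and the function names are hypothetical.

```python
import math

# Illustrative sketch; function names are hypothetical.

def cosine_sim(r1, r2):
    # r1, r2: {item: rating}; similarity is computed over co-rated items only.
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = math.sqrt(sum(r1[i] ** 2 for i in common))
    n2 = math.sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (n1 * n2)

def predict(ratings, user, item):
    # Similarity-weighted average of other users' ratings for item;
    # returns None when no other user has rated it.
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine_sim(ratings[user], r)
        num += s * r[item]
        den += abs(s)
    return num / den if den else None
```

Users whose past ratings agree more closely with the target user pull the prediction toward their own rating.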

Notes Coming Soon

Chapter 10

Question Answering

Information bottleneck, Information extraction (IE), Ambiguities in IE, Architecture of a QA system, Question processing, Paragraph retrieval, Answer processing

Notes Coming Soon

Chapter 11

Advanced IR Models

Latent semantic indexing (LSI), Singular value decomposition (SVD), Latent Dirichlet allocation (LDA), Efficient string searching, Knuth-Morris-Pratt, Boyer-Moore family, Pattern matching
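The Knuth-Morris-Pratt algorithm avoids re-examining text characters by precomputing, for each pattern prefix, the length of its longest proper prefix that is also a suffix (the failure function). A minimal sketch that returns all match positions, including overlapping ones; the function name is hypothetical.

```python
# Illustrative sketch; the function name is hypothetical.

def kmp_search(text, pattern):
    # Return all start indices where pattern occurs in text.
    if not pattern:
        return list(range(len(text) + 1))
    # failure[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]  # fall back without moving backwards in the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # continue so overlapping matches are found
    return matches
```

For example, `kmp_search("abababca", "abab")` returns `[0, 2]`, and the search makes a single left-to-right pass over the text.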