Lebanese University
Faculty of Economics and Business Administration, 1st Branch
Class: M1
Instructor: Dr. Lina A. Nimri
Introduction
Example of an information need in the context of the World Wide Web:
Find all documents containing information on computer courses which:
(1) are offered by universities in South England, and
(2) are accredited by the BCS/IEE bodies.
To be relevant, a document must include information on admission requirements, plus an e-mail address and phone number for contact purposes.
Information Retrieval
Primary goal of an IR system: retrieve all the documents that are relevant to a user query, while retrieving as few non-relevant documents as possible.
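The goal above is commonly quantified with two measures that the slide does not name: recall (the share of relevant documents that were retrieved) and precision (the share of retrieved documents that are relevant). A minimal sketch, assuming relevance judgments are available as a set of document ids; the function name and the sample ids are hypothetical:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 4 documents retrieved, 3 documents relevant overall.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d7"],
)
print(f"precision={p:.2f} recall={r:.2f}")
```

Retrieving every document in the collection would maximize recall but ruin precision, which is why the goal is stated as a trade-off.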
Data Retrieval
Determining which documents contain the keywords in the user query is not always enough to satisfy the user's information need.
Data retrieval retrieves objects that satisfy clearly defined conditions, such as regular expressions or relational algebra expressions.
A data retrieval system deals with data that has a well-defined structure and semantics.
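To illustrate what "clearly defined conditions" means, here is a minimal sketch of data retrieval over structured records. The records, field names, and the `data_retrieval` helper are all hypothetical; the point is only that every object either satisfies the conditions exactly or is excluded, with no notion of partial relevance.

```python
import re

# Hypothetical structured records with well-defined fields.
courses = [
    {"code": "CS101", "region": "South England", "accredited": True},
    {"code": "CS205", "region": "Scotland", "accredited": True},
    {"code": "MA110", "region": "South England", "accredited": False},
    {"code": "CS310", "region": "South England", "accredited": True},
]


def data_retrieval(records, code_pattern, region):
    """Return the codes of records that satisfy every condition exactly."""
    pattern = re.compile(code_pattern)
    return [
        rec["code"]
        for rec in records
        if pattern.fullmatch(rec["code"])  # regular-expression condition
        and rec["region"] == region        # exact-match selection
        and rec["accredited"]              # boolean condition
    ]


result = data_retrieval(courses, r"CS\d{3}", "South England")
print(result)
```

An IR system, by contrast, works on free text and must judge how well each document answers the information need.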
Area of interest
Digital libraries
Information experts
World Wide Web - very difficult task
Effective retrieval
The
User tasks
Pull technology
User requests information in an interactive manner
3 retrieval tasks:
Browsing (hypertext)
Retrieval (classical IR systems)
Browsing and retrieval (modern digital libraries and web systems)
Push technology
Automatic and permanent pushing of information to the user
Software agents
Example: news service
Filtering (retrieval task): relevant information for later inspection by the user
Pulling
The user can browse the documents when their main objectives are not clear at the beginning and may change during the interaction with the system. Combining retrieval and browsing is not yet a well-established approach.
Retrieval
Database
Browsing
Documents
Unit of retrieval
A passage of free text
Size of documents is arbitrary: newspaper article vs. journal paper vs. e-mail
What is a document?
Representation of documents
Documents are represented through a set of index terms or keywords:
extracted directly from the text
specified by human subjects (information science)
Metadata
Most complete representation: high computational cost
Reduce the set of representative keywords:
Elimination of stop words
Stemming
Identification of noun phrases
Further compression and indexing
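The keyword-reduction steps listed above can be sketched as a small pipeline. This is an illustration under stated assumptions, not a production indexer: the stop-word list is a tiny hand-picked sample, `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer, and noun-phrase identification is left out.

```python
import re
import unicodedata

# Assumption: a tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to"}
# Assumption: naive suffix stripping stands in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def normalize(text):
    """Remove accents, lowercase, and split the text into alphabetic tokens."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z]+", no_accents.lower())


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Full text -> reduced list of index terms."""
    return [crude_stem(tok) for tok in normalize(text) if tok not in STOP_WORDS]


terms = index_terms("Indexing the documents of a café")
print(terms)  # ['index', 'document', 'cafe']
```

Each step shrinks the representation, trading completeness for lower storage and computation cost.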
[Figure: text operations on a document: accents and spacing, stopwords, noun groups, stemming, manual indexing]
[Figure: query, relevance feedback, retrieval functions, retrieved documents]
Queries
Information Need:
Simple queries
composed of two or three, perhaps even
Boolean queries
neural networks AND speech recognition
Context Queries
Proximity search, phrase queries
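The difference between Boolean and context queries can be sketched with a positional inverted index, which records where each term occurs in each document. The four sample documents, the whitespace tokenizer, and all function names below are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical mini-collection.
docs = {
    1: "neural networks for speech recognition",
    2: "speech recognition with hidden markov models",
    3: "training deep neural networks",
    4: "recognition of speech using networks that are neural",
}


def build_positional_index(collection):
    """Map term -> {doc_id: [positions]} using whitespace tokenization."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in collection.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def term_and(index, *terms):
    """Boolean AND: documents that contain every term, in any position."""
    postings = [set(index[t]) if t in index else set() for t in terms]
    return set.intersection(*postings) if postings else set()


def phrase_query(index, phrase):
    """Phrase query: documents where the terms occur consecutively, in order."""
    terms = phrase.lower().split()
    matches = set()
    for doc_id in term_and(index, *terms):
        for start in index[terms[0]][doc_id]:
            if all(start + k in index[t][doc_id] for k, t in enumerate(terms)):
                matches.add(doc_id)
                break
    return matches


def near(index, a, b, k):
    """Proximity search: documents where a and b occur within k positions."""
    return {
        doc_id
        for doc_id in term_and(index, a, b)
        if any(abs(i - j) <= k for i in index[a][doc_id] for j in index[b][doc_id])
    }


index = build_positional_index(docs)
print(term_and(index, "neural", "networks", "speech", "recognition"))  # {1, 4}
print(phrase_query(index, "neural networks") & phrase_query(index, "speech recognition"))  # {1}
print(near(index, "speech", "recognition", 2))  # {1, 2, 4}
```

Document 4 contains all four words, so it satisfies the term-level AND, but it fails both phrase queries because the words are not adjacent and in order.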
Best-Match retrieval
Compare the terms in a document and the query
Compute the similarity between each document in the collection and the query, based on the terms they have in common
Sort the documents in order of decreasing similarity with the query
The output is a ranked list displayed to the user; the top documents are the most relevant as judged by the system
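The steps above can be sketched as follows. The slide does not say which similarity function is used, so this example assumes cosine similarity over raw term counts as one common choice; the sample collection and function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))  # only shared terms contribute
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, collection):
    """Return (doc_id, score) pairs sorted by decreasing similarity to the query."""
    q = Counter(query.lower().split())
    scored = [
        (doc_id, cosine(q, Counter(text.lower().split())))
        for doc_id, text in collection.items()
    ]
    # Drop documents that share no term with the query; break ties by doc id.
    scored = [(d, s) for d, s in scored if s > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical mini-collection.
collection = {
    "d1": "information retrieval systems rank documents",
    "d2": "database systems store structured data",
    "d3": "information retrieval",
    "d4": "cooking recipes",
}

ranking = rank("information retrieval systems", collection)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Unlike a Boolean query, the result is ordered: d3 and d1 score highest, d2 matches on a single term, and d4 is never shown because it shares nothing with the query.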
[Figure: documents, indexed documents, retrieved documents, ranked documents]
[Figure: text operations, logical view, query operations, query, indexing, inverted file, index, text repository, ranking]
Key Topics
Indexing text documents
Retrieving text documents
Evaluation
Query reformulations
Information Extraction
Extract from the text what the document means.