
Amit

Unit 6

1. What are the different techniques of document indexing?

2. Compare data retrieval and information retrieval.

3. What are an inverted index and a signature file?

4. Write a note on indexing of documents.

5. Why is an effective index structure important?

6. What is a web crawler?


Ans: Web crawlers are programs that locate and gather
information on the Web.
They recursively follow hyperlinks present in known documents to
find other documents. A crawler retrieves each document and
adds the information found in it to a combined index; the
document itself is generally not stored, although some search
engines do cache a copy of the document to give clients faster access.
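
The link-following behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions: `TOY_WEB` is a hypothetical in-memory stand-in for the Web (a real crawler would fetch pages over HTTP), the function name `crawl` is made up for this example, and a queue is used in place of literal recursion.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP instead.
TOY_WEB = {
    "a.example": ("databases store data", ["b.example", "c.example"]),
    "b.example": ("search engines index data", ["a.example"]),
    "c.example": ("crawlers follow links", ["d.example"]),
    "d.example": ("links connect documents", []),
}

def crawl(seed_urls, web):
    """Follow links from the seed URLs and build a combined index.

    Returns a dict mapping each word to the set of URLs containing it.
    The page text itself is not stored, only the index entries.
    """
    index = {}
    seen = set(seed_urls)
    frontier = deque(seed_urls)
    while frontier:
        url = frontier.popleft()
        if url not in web:
            continue  # dead link: nothing to retrieve
        text, links = web[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:  # avoid crawling the same page twice
                seen.add(link)
                frontier.append(link)
    return index

index = crawl(["a.example"], TOY_WEB)
print(sorted(index["data"]))   # ['a.example', 'b.example']
print(sorted(index["links"]))  # ['c.example', 'd.example']
```

Starting from a single known page, the crawler reaches all four pages because each one is linked from a page already visited.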

7. Describe a Web search engine.


Ans: Since the number of documents on the Web is very large, it is
not possible to crawl the whole Web in a short period of time. In
fact, all search engines cover only some portions of the Web,
not all of it, and their crawlers may take weeks or months to
perform a single crawl of all the pages they cover. There are
usually many processes, running on multiple machines, involved
in crawling. A database stores a set of links (or sites) to be
crawled; it assigns links from this set to each crawler process.
New links found during a crawl are added to the database, and
may be crawled later if they are not crawled immediately. Pages
found during a crawl are also handed over to an indexing system,
which may be running on a different machine. Pages have to be
refetched (that is, links recrawled) periodically to obtain updated
information, and to discard sites that no longer exist, so that the
information in the search index is kept reasonably up to date.
The indexing system itself runs on multiple machines in parallel. It
is not a good
idea to add pages to the same index that is being used for
queries, since doing so
would require concurrency control on the index, and affect query
and update performance. Instead, one copy of the index is used
to answer queries while another copy is updated with newly
crawled pages. At periodic intervals the copies switch over, with
the old one being updated while the new copy is being used for
queries.
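
The two-copy scheme can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class name `SwitchingIndex`, its methods, and the simple word-to-pages dictionaries are hypothetical, and a real system would run the copies on separate machines.

```python
class SwitchingIndex:
    """Two index copies: one serves queries, the other receives updates."""

    def __init__(self):
        self._copies = [{}, {}]   # each maps word -> set of page ids
        self._live = 0            # position of the copy answering queries
        self._pending = []        # pages not yet applied to the live copy

    def add_page(self, page_id, text):
        # Only the offline copy is modified, so queries are never blocked.
        offline = self._copies[1 - self._live]
        for word in text.split():
            offline.setdefault(word, set()).add(page_id)
        self._pending.append((page_id, text))

    def query(self, word):
        return set(self._copies[self._live].get(word, set()))

    def switch(self):
        # The updated copy goes live; the old live copy then catches up
        # on the pages it missed so it is ready for the next round.
        self._live = 1 - self._live
        stale = self._copies[1 - self._live]
        for page_id, text in self._pending:
            for word in text.split():
                stale.setdefault(word, set()).add(page_id)
        self._pending = []

idx = SwitchingIndex()
idx.add_page("p1", "web crawler")
print(idx.query("crawler"))  # set()  (new page not visible yet)
idx.switch()
print(idx.query("crawler"))  # {'p1'}
```

Because queries only ever read the live copy and updates only ever write the other one, no concurrency control is needed on the index between switches.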

8. What is a question answering system?


Ans: Question answering systems attempt to provide direct
answers to questions posed by users. Systems targeted at
information on the Web typically generate one or more keyword
queries from a submitted question, execute the keyword queries
against Web search engines, and parse the returned documents to
find the segments that answer the question.

9. Describe the distinct ways a user can find information on the Web.


Ans: 1) Information extraction
2) Querying structured data
3) Question answering

10. What do Web search engines do? Describe in one line.


Ans: Web search engines crawl the web to find pages, analyze
them to compute prestige measures, and index them.

Q1:-WHAT IS A SYNONYM?
ANS1:-Synonyms are words that have the same meaning but are
written differently.

Q2:-WHAT IS A HOMONYM?
ANS2:-Homonyms are words that have the same
pronunciation but different meanings.

Q3:-WHAT IS AN ONTOLOGY?
ANS3:-An ontology is a hierarchical structure that captures
relationships between concepts. Ontologies are used to overcome
the limitations of keyword-based search.
Q4:-EXPLAIN SYNONYMS WITH THE HELP OF AN EXAMPLE.


ANS4:-Synonyms are words that have the same meaning but are
written differently,
e.g. "motorcycle repair" = "motorcycle maintenance".

Q5:-EXPLAIN HOMONYMS WITH THE HELP OF AN EXAMPLE.


ANS5:-Homonyms are words that have different meanings but the
same pronunciation,
e.g. "hair" and "hare".

Q.1 How is relevance ranking calculated using TF?


A. We use the frequency of occurrence of the term in the document
(that is, how many times that particular term occurs) as
a measure of its relevance.
One way of measuring TF(d,t), i.e. the Term Frequency or the
relevance of the document d to a term t, is
TF(d,t) = log (1 + n(d,t)/n(d))
where n(d,t) is the number of occurrences of term t in document d
and n(d) is the total number of terms in document d.
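
The formula can be tried out with a short sketch. Two things here are assumptions, not part of the notes: the function name `term_frequency`, and the choice of the natural logarithm, since the formula does not state a base. Words are split on whitespace only, which is a simplification.

```python
import math

def term_frequency(document, term):
    """TF(d, t) = log(1 + n(d, t) / n(d)), natural log assumed."""
    words = document.lower().split()
    if not words:
        return 0.0
    n_dt = words.count(term.lower())  # occurrences of the term in d
    n_d = len(words)                  # total number of terms in d
    return math.log(1 + n_dt / n_d)

doc = "web search web crawler"
print(round(term_frequency(doc, "web"), 3))    # 0.405
print(round(term_frequency(doc, "index"), 3))  # 0.0
```

A term that never occurs gets a score of 0, and the logarithm keeps the score from growing in direct proportion to the raw count.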

Q.2 What is the use of an Information Retrieval System?


A. An Information Retrieval System is intended to support people
who are actively seeking or searching for information, as in
internet searching. Information Retrieval typically assumes a
static or relatively static database, against which people search.

Q.3 Explain Similarity-based Retrieval Systems.


A. Similarity-based retrieval relies on best match rather than
exact match and uses techniques to compute the similarities
between the query and information items. As the user's information
needs are also fuzzy, an important characteristic of this class of
retrieval technique is its support for the iterative process of
retrieval.

Q.4 What is cosine similarity?

Q.5 What is the use of similarity based Retrieval System?


Q.1. What is the difference between Information Retrieval and
Data Retrieval?


A.
1. A Data Retrieval System gives an exact match for the search
elements, whereas an Information Retrieval System gives partial or
best match results.
2. The query language in a Data Retrieval System is artificial,
whereas natural language is used in an Information Retrieval
System.
3. A complete query specification is required in a Data Retrieval
System, whereas a partial query specification works in an Information
Retrieval System.
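
The first difference, exact match versus best match, can be made concrete with a toy sketch. The sample `records`, both function names, and the scoring rule (count of shared words) are all assumptions chosen for illustration; real systems use far richer matching.

```python
records = ["web crawler", "web search engine", "inverted index"]

def exact_match(query, items):
    """Data retrieval style: return only items identical to the query."""
    return [item for item in items if item == query]

def best_match(query, items):
    """Information retrieval style: rank items by shared query words."""
    query_words = set(query.split())
    scored = [(len(query_words & set(item.split())), item) for item in items]
    ranked = sorted((pair for pair in scored if pair[0] > 0), reverse=True)
    return [item for _, item in ranked]

print(exact_match("web search", records))  # []
print(best_match("web search", records))   # ['web search engine', 'web crawler']
```

The same partial query returns nothing under exact matching but still yields a ranked list under best matching.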

Q.2. Explain the components of an Information Retrieval System.


A. The typical components of an Information Retrieval System are:
1. Input
2. Processor
3. Output

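Q.3. What is Relevance? How is it calculated?


A. Relevance can be calculated as the cosine of the angle between
the query vector and the document vector, i.e. their dot product
divided by the product of their magnitudes (the magnitude of a vector
is the square root of the sum of the squares of its components).
For non-negative term weights this measure varies between 0 and 1.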
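
A minimal sketch of the cosine measure, computed as the dot product of two term-weight vectors divided by the product of their magnitudes. The function name `cosine_similarity` and the sample vectors are assumptions for illustration; a zero vector is given a similarity of 0 by convention here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)

print(cosine_similarity([3, 4], [3, 4]))  # 1.0 (same direction)
print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (no terms in common)
print(round(cosine_similarity([1, 1, 0], [1, 0, 0]), 3))  # 0.707
```

Identical vectors score 1, vectors with no terms in common score 0, and partial overlap falls in between.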

Q.4. What is TF-IDF (Term Frequency – Inverse Document Frequency)?


A. A measure that combines the frequency of occurrence of a
particular term in a particular document with how often that term
occurs in the entire collection of interest.

Q.5. How is TF – IDF used? What is the need?


A. If a term occurs frequently in one document but also occurs
frequently in every other document in the collection, then it is not
a very important word and the TF-IDF measure reduces the
weight placed on it. A common term is considered less important
than a rare term.
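Taking the inverse document frequency as the base-10 logarithm of
(number of documents / number of documents containing the term):
if a term occurs in every document, the inverse document frequency
is zero. If it occurs in half of the documents, it will be about 0.3,
and if it occurs in 20 of 10000 documents, it will be about 2.7.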
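
These values can be checked with a short sketch. The base-10 form log10(N / n_t) is an assumption: it is the form that gives 0 and 0.3 for the first two cases. The function name `inverse_document_frequency` and the collection size of 10000 used for all three cases are also chosen only for illustration.

```python
import math

def inverse_document_frequency(total_docs, docs_with_term):
    """IDF = log10(N / n_t), assuming the base-10 form."""
    return math.log10(total_docs / docs_with_term)

# Term in every document, in half of them, and in 20 of 10000.
print(round(inverse_document_frequency(10000, 10000), 1))  # 0.0
print(round(inverse_document_frequency(10000, 5000), 1))   # 0.3
print(round(inverse_document_frequency(10000, 20), 1))     # 2.7
```

The rarer the term, the larger its inverse document frequency, so rare terms contribute more weight to the TF-IDF score.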

Q.6. Illustrate the components of an Information Retrieval System
using a diagram.

Q.7. An Information Retrieval System is best match or partial match,
whereas a Data Retrieval System is exact match. Expand.
