
CS 214 Introduction to Internet and HTML

by Jayson G. Mauricio

CS 214: Introduction to Internet and HTML

LECTURE 3: Familiarization with Web Tools Part Two: Search Engines

by Jayson G. Mauricio, Our Lady of Fatima University, Antipolo Campus


What Is a Search Engine?


As the term is generally used, a search engine has two parts:

1. A "robot" or "crawler" that goes to every page, or to representative pages, on the Web and creates a huge index.
2. A program that receives your search request, compares it to the entries in the index, and returns results to you.
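The two parts can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any real search engine is built: the `TOY_WEB` dictionary stands in for the Web (its page names, text, and links are invented), and `crawl` and `search` are hypothetical helper names chosen for this example.

```python
from collections import deque

# Hypothetical stand-in for the Web: page name -> (page text, outgoing links).
TOY_WEB = {
    "home.html": ("welcome to our campus site", ["courses.html", "about.html"]),
    "courses.html": ("internet and html course notes", ["home.html"]),
    "about.html": ("about the campus", []),
}

def crawl(start):
    """Part one: the 'robot' follows links from `start` and builds the index."""
    index = {}              # word -> set of pages that contain the word
    seen = {start}          # pages already queued, so each is visited once
    queue = deque([start])
    while queue:
        page = queue.popleft()
        text, links = TOY_WEB[page]
        for word in text.split():
            index.setdefault(word, set()).add(page)
        for link in links:
            if link in TOY_WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

def search(index, word):
    """Part two: compare the request to the index and return matching pages."""
    return sorted(index.get(word.lower(), set()))

index = crawl("home.html")
print(search(index, "campus"))
```

Here the crawler reaches every page by following links from a single starting page. The search step never looks at the pages themselves, only at the index the crawler built.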

Internet search engines are special sites on the Web that are designed to help people find information stored on other sites. There are differences in the ways various search engines work, but they all perform three basic tasks:

1. They search the Internet, or select pieces of the Internet, based on important words.
2. They keep an index of the words they find, and where they find them.
3. They allow users to look for words or combinations of words found in that index.

Early search engines held an index of a few hundred thousand pages and documents, and received maybe one or two thousand inquiries each day. Today, a top search engine will index hundreds of millions of pages, and respond to tens of millions of queries per day. In this lecture, we'll look at how these major tasks are performed, and how Internet search engines put the pieces together to let you find the information you need on the Web.
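The second and third tasks (keeping an index of words and where they occur, then looking up words or combinations of words) can be illustrated with a small inverted index. The sketch below is an assumption-laden toy: the `DOCS` sample text is invented, and `build_index` and `find_all` are hypothetical names, not part of any real search engine.

```python
import re

# Invented sample documents; a real engine would index fetched Web pages.
DOCS = {
    "page1": "Search engines index the Web",
    "page2": "A crawler visits the Web and engines index words",
    "page3": "HTML pages contain words",
}

def build_index(docs):
    """Map each word to {document name: [positions where the word occurs]}."""
    index = {}
    for name, text in docs.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index.setdefault(word, {}).setdefault(name, []).append(position)
    return index

def find_all(index, query):
    """Return the documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))   # keep only documents with this word too
    return sorted(matches)

INDEX = build_index(DOCS)
print(INDEX["web"])                       # where the word "web" was found
print(find_all(INDEX, "engines index"))   # documents containing both words
```

Storing positions alongside document names is what lets an index record not only which words were found but where they were found.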

The Major Search Engines and How They Work


Most if not all of the major search engines attempt to index something close to the entire content of the World Wide Web. Once a site's pages have been indexed, the search engine returns periodically to the site to update the index. Some search engines give special weighting to words in the title, to subject descriptions and keywords listed in HTML META tags, to the first words on a page, and to the frequent recurrence (up to a limit) of a word on a page. Because each search engine uses a somewhat different indexing and retrieval scheme (which is likely to be treated as proprietary information), and because each search engine can change its scheme at any time, we haven't tried to describe these schemes here.

Major Search Engines

1. AltaVista, sponsored by Digital Equipment Corp., processes more than 2.5 million search requests every day. It has cataloged more than 15 billion words on some 30 million Web pages as well as all 13,000 Usenet newsgroups. It collects Web pages at the rate of 2.5 million a day. Find AltaVista at http://www.altavista.digital.com
2. Excite has a database of 1.5 million Web pages that you can search by keyword or by concept. In addition, it has a browsable directory of more than 50,000 reviewed Web sites, a Usenet database of more than 1 million articles, and a search of the Usenet classifieds from the last 2 weeks. Find Excite at http://www.excite.com
3. HotBot features a menu-driven search engine. You can search by file type, date, geographic location and domain, and Web site. Find HotBot at http://www.hotbot.com
4. InfoSeek is a full-text search system with which you can look for Web pages, Usenet newsgroups, and FAQs. A normal, free search is limited to the first 100 matches. If you subscribe to InfoSeek Professional, you can search computer, medical, and business news, press releases, and technical-support databases. Find InfoSeek at http://www2.infoseek.com
5. Lycos is used by more than 500,000 people every week and catalogs some 20 million Web pages, FTP sites, and Gopher sites. Find Lycos at http://www.lycos.com
6. Open Text Index is a very powerful, multilingual search engine with which you can do a weighted search and receive information that is ranked by relevancy. Find Open Text at http://www.opentext.com:8080
7. WebCrawler is a free service from America Online that gives you fast access to a 200-megabyte database of 2 million indexed Web documents. Find WebCrawler at http://webcrawler.com
8. Yahoo lists more than 200,000 Web sites in more than 20,000 categories. A utility at this site lets you extend your search to other search engines, such as AltaVista, Lycos, or WebCrawler. Find Yahoo at http://www.yahoo.com
9. Yehey is the first Filipino search engine.
10. Google. "Googol" is the mathematical term for a 1 followed by 100 zeros. The term was coined by Milton Sirotta, nephew of American mathematician Edward Kasner, and was popularized in the book "Mathematics and the Imagination" by Kasner and James Newman. Google's play on the term reflects the company's mission to organize the immense amount of information available on the Web.
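The weighting factors mentioned at the start of this section (words in the title, keywords in META tags, the first words on a page, and recurrence up to a limit) can be pictured with a toy scoring function. Since each engine's real scheme is proprietary and changeable, everything specific here is an assumption made for illustration: the weight values, the limit, the page dictionaries, and the `score` function name are all invented.

```python
# All weights and limits below are made-up values for illustration only.
TITLE_WEIGHT = 5          # bonus when the term appears in the page title
META_WEIGHT = 3           # bonus when the term is listed as a META keyword
FIRST_WORDS_WEIGHT = 2    # bonus when the term is among the first body words
FIRST_WORDS_COUNT = 10    # how many leading body words count as "first words"
RECURRENCE_LIMIT = 4      # occurrences beyond this limit add nothing

def score(page, term):
    """Return a toy relevance score for `term` on a hypothetical page dict."""
    term = term.lower()
    title = page["title"].lower().split()
    keywords = [keyword.lower() for keyword in page["keywords"]]
    body = page["body"].lower().split()

    total = 0
    if term in title:
        total += TITLE_WEIGHT
    if term in keywords:
        total += META_WEIGHT
    if term in body[:FIRST_WORDS_COUNT]:
        total += FIRST_WORDS_WEIGHT
    # Frequent recurrence helps, but only up to the limit.
    total += min(body.count(term), RECURRENCE_LIMIT)
    return total

page_a = {
    "title": "Search Engines",
    "keywords": ["search", "index"],
    "body": "search tools help you search the web",
}
page_b = {"title": "Home", "keywords": [], "body": "spam " * 50}

print(score(page_a, "search"))
print(score(page_b, "spam"))
```

The cap on recurrence is the point of the second page: repeating a word fifty times earns no more than repeating it four times, so a page that uses the term in its title and keywords still scores higher.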
