
Unit 2 World Wide Web

2.1 Web Browsers
A web browser is a software application that enables a user to display and interact with text, images, videos, music and other information typically located on a Web page at a website on the World Wide Web or a Local Area Network. Text and images on a Web page can contain hyperlinks to other Web pages at the same or a different website. Web browsers format HTML information for display, so the appearance of a Web page may differ between browsers. Some of the Web browsers available for personal computers are Internet Explorer, Mozilla Firefox, Safari, and Opera, listed in order of descending popularity.

Web browsers are the most commonly used type of HTTP user agent. Although browsers are typically used to access the World Wide Web, they can also be used to access information provided by Web servers in private networks or content in file systems. Web browsers communicate with Web servers primarily using HTTP (Hypertext Transfer Protocol) to fetch Web pages. HTTP allows Web browsers to submit information to Web servers as well as fetch Web pages from them.

2.1.1 Internet Explorer

Windows Internet Explorer (formerly Microsoft Internet Explorer, abbreviated MSIE), commonly abbreviated IE, is a series of graphical web browsers developed by Microsoft and included as part of the Microsoft Windows line of operating systems starting in 1995. It has been the most widely used web browser since 1999.

2.1.1.1 Features of Internet Explorer

Internet Explorer has been designed to view the broadest range of web pages and to provide certain features within the operating system, including Microsoft Update. Some of its features are as follows:

1. Standard support

Internet & Web Technology

Internet Explorer almost fully supports HTML, CSS and XML.

2. Usability and accessibility
Internet Explorer makes use of the accessibility framework provided in Windows. Internet Explorer can also act as a user interface for FTP. Recent versions feature pop-up blocking and tabbed browsing.

3. Cache
Internet Explorer caches visited content in the Temporary Internet Files folder to allow quicker access to previously visited pages.

4. Security
Internet Explorer uses a zone-based security framework that groups sites based on certain conditions, including whether a site is an intranet or an Internet site. Security restrictions are applied per zone: all the sites in a zone are subject to that zone's restrictions.

5. Group Policy
Internet Explorer is fully configurable using Group Policy (a feature that provides centralized management and configuration of computers and remote users). Administrators of Windows Server domains (logical groups of computers running versions of the Microsoft Windows operating system that share a central directory database) can apply and enforce a variety of settings that affect the user interface (such as disabling menu items and individual configuration options), as well as underlying security features such as file downloads, zone configuration and per-site settings.

2.1.1.2 Criticisms of Internet Explorer

Internet Explorer has been the subject of many criticisms, most of them concerning its security architecture and its degree of support for open standards. On the security side, much of the spyware, adware and computer viruses on the Internet have been made possible by exploitable bugs and flaws in the security of Internet Explorer, sometimes requiring nothing more than viewing a malicious web page in order to install them.

Introduction to the Internet

2.1.2 Netscape Navigator

Netscape Navigator, also known as Netscape, is a proprietary web browser that was popular during the 1990s. (Proprietary software is computer software with restrictions on use or private modification, or with restrictions judged to be excessive on copying or publishing modified or unmodified versions.) It is a closed-source, non-free web browser. Initially it was released under the name Mosaic Netscape and was a paid commercial web browser, but later in 1994 it was decided to make it freely available to all non-commercial users. During development, the Netscape browser was known by the code name Mozilla, and the Mozilla name was used as the user agent in HTTP requests sent by the browser. Other web browsers that claimed to be compatible with Netscape's extensions to HTML also included "Mozilla" in their own user-agent strings. Mozilla is now a generic name for matters related to the open-source successor to Netscape Communicator.

2.1.2.1 The Rise of Netscape

When the consumer Internet revolution arrived in the mid-to-late 1990s, Netscape was well positioned to take advantage of it. With a good mix of features and an attractive licensing scheme that allowed free use for non-commercial purposes, the Netscape browser soon became the de facto (in practice) standard, particularly on the Windows platform. Internet service providers and computer magazine publishers helped make Navigator readily available.

An important innovation that Netscape introduced in 1994 was the on-the-fly display of web pages, where text and graphics appeared on the screen as the web page downloaded. Earlier web browsers would not display a page until all the graphics on it had been loaded over the network connection, which often left the user staring at a blank page for as long as several minutes. With Netscape, people using dial-up connections could begin reading the text of a web page within seconds of entering a web address, even before the rest of the text and graphics had finished downloading. This made the web much more tolerable for the average user.

2.1.2.2 Fall of Netscape

Microsoft saw Netscape's success as a clear threat to the dominant status of the Microsoft Windows operating system. It began a wide-reaching campaign to establish control over the browser market.

The resulting battle between the two companies became known as the browser wars. Netscape Navigator 3.0 came in two versions, Standard Edition and Gold Edition. The latter consisted of the Navigator browser with mail and news readers and a WYSIWYG web page composition tool integrated into it. The extra functionality only made the program larger, slower, and more prone to crashes, and the decision to integrate all these features was widely criticized. By the end of the decade, Netscape's web browser had unquestionably lost its former dominance on the Windows platform.

2.1.2.3 Criticisms of Netscape

Netscape Navigator has mostly been criticized for implementing nonstandard HTML markup extensions such as the BLINK tag, which is sometimes referred to as a symbol of Netscape's urge to develop extensions not standardized by the W3C, and which is even mentioned in the fictional Book of Mozilla. (The World Wide Web Consortium, W3C, is the main international standards organization for the World Wide Web, abbreviated WWW or W3. It is arranged as a consortium whose member organizations maintain full-time staff to work together on the development of standards for the Web.) Netscape has also been criticized for following actual web standards poorly, often lagging behind them or supporting them incompletely or even incorrectly. This criticism was not very loud during the days of Netscape's popularity, as web designers then often simply developed for Netscape Navigator, but it became an increasing annoyance to web designers who wished to keep their web sites backward compatible, most often with Netscape Navigator 4 and Netscape Communicator. Today, many webmasters simply choose not to support these old versions, because of their extremely small market share and poor standards support. However, not all of Netscape's own contributions to the web have been a source of frustration to web developers.

2.2 Web Terminologies

Some of the commonly used terms associated with Web surfing are as follows:
2.2.1 Page or Web Page. A file that can be read over the World Wide Web.

2.2.2 Pages or Web Pages. The global collection of documents associated with and accessible via the World Wide Web.

2.2.3 Hyperlink. A string of clickable text or a clickable graphic that points to another Web page or document. When the hyperlink is selected, another Web page is requested, retrieved, and rendered by the browser.

2.2.4 Hypertext. Web pages that have hyperlinks to other pages. More generally, any text having nonlinear links to other text.

2.2.5 Browser. A software tool used to view Web pages, read email, and read newsgroups, among other things. Browsers are also called Web clients.

2.2.6 Multimedia. Information in the form of graphics, audio, video, or movies. A multimedia document contains a media element other than just plain text.

2.2.7 Hypermedia. Media with links and navigational tools.

2.2.8 Uniform Resource Locator. A string of characters that specifies the address of a Web page.

2.2.9 Surfer. A person who spends time exploring the World Wide Web.

2.2.10 Web Presentation. A collection of associated and hyperlinked Web pages. Usually, there is an underlying theme to the pages. For example, a Web presentation for a company may describe facts about the company, its employees, its products, and the method for ordering the products online.

2.2.11 Webmaster. A person who creates, maintains, and manages a Web presentation, often for a business, organization, or university. This person usually signs the Web pages, so that questions and comments can be sent to him or her.

2.2.12 Web manager. Synonym for Webmaster.

2.2.13 Web site. An entity on the Internet that publishes Web pages. A Web site typically has a computer serving Web pages, whereas a Web presentation is the actual Web pages themselves. For example, www.bsaitm.org is the name of a Web site, whereas www.bsaitm.org/home.php is the name of a Web presentation.

2.2.14 Web server. A computer that satisfies users' requests for Web pages.

2.2.15 Mirror site. A site that contains a duplicate copy of a Web presentation from another site. If a Web presentation is extremely popular, other sites may be used to mirror the original presentation; that is, they contain the same information as the original site. This allows the load on the Web server and the network to be distributed. If one server is down, a mirror site can be tried. If several mirror sites exist, it is a good idea to try the one closest to the user first.
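The mirror-site advice (try the closest mirror first, move on if a server is down) can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular browser behaves: each mirror is assumed to come with a known distance (a hypothetical value such as a previously measured ping time), and try_mirrors is our own function name. The opener parameter defaults to the standard urllib.request.urlopen and can be swapped for a stub when testing.

```python
import urllib.request


def try_mirrors(path, mirrors, timeout=5.0, opener=urllib.request.urlopen):
    """Fetch `path` from the closest working mirror.

    `mirrors` is a list of (base_url, distance) pairs. "Distance" is any
    proximity measure available (hypothetical here, e.g. ping time);
    smaller means closer, so those mirrors are tried first.
    """
    failures = []
    for base, _distance in sorted(mirrors, key=lambda m: m[1]):
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            with opener(url, timeout=timeout) as response:
                return url, response.read()
        except OSError as exc:
            # urllib.error.URLError is a subclass of OSError, so an unreachable
            # or failing server lands here and the next mirror is tried.
            failures.append((url, exc))
    raise RuntimeError(f"all mirrors failed: {failures}")
```

In practice the choice of mirror is often left to the user or to a download page; the sketch only automates the "closest first, fall back if down" rule.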

How the Web Works


To surf the Web, the user needs a Web browser. A WEB BROWSER IS AN APPLICATION THAT DISPLAYS WEB DOCUMENTS. The user first opens a Web browser and then enters a URL (Uniform Resource Locator) in the address bar. A URL IS THE ADDRESS OF THE DOCUMENT THAT THE USER WANTS TO RETRIEVE. From the URL, the browser determines the address of the server computer that holds the required document. It then establishes a connection with that server and sends a request for the document, and the server responds by sending the requested document back to the browser. This can be explained in the following steps:

1. Parsing the URL. The browser examines the URL to determine where the document is located on the Web; first of all, it extracts the domain name of the server. Consider the URL:

http://www.microsoft.com/home.asp

Here http names the protocol to be used (THE HYPERTEXT TRANSFER PROTOCOL DESCRIBES HOW TO TELL THE SERVER WHICH DOCUMENT TO PROVIDE TO THE USER AND HOW TO RETRIEVE IT), the domain name of the server to connect to is www.microsoft.com, and the requested document is /home.asp.

2. Resolving the IP address. The browser uses a resolver to query the Domain Name System (DNS) for the IP address corresponding to the domain name.

3. Establishing the connection with the server. After resolving the IP address, the browser establishes a TCP connection with that server.

4. Sending the request. Once the connection is established, the browser transmits a request of the following form to the server:

GET /request-URI HTTP/version

where version tells the server which HTTP version is being used. For the example URL, the request line would be GET /home.asp HTTP/1.1.

5. The server's response. When the server receives the HTTP request, it locates the appropriate file and returns it to the client's browser. In the above example the requested file is home.asp; because this is a server-side script, the server runs it and returns the page it generates.

6. Downloading the document. The document returned by the server (for example, an HTML document) is downloaded by the browser and then displayed to the user. If the HTML document refers to other resources such as graphics, sound or animations, these are downloaded in the same way as the HTML document and then displayed.

7. Connection termination. Once the document has been downloaded, the connection with the server is closed (HTTP/1.1 can also keep it open for further requests).
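The seven steps can be traced in a short Python sketch that uses only the standard library (urllib.parse for step 1, socket for steps 2 to 7). It is a simplification under stated assumptions: it handles plain http:// URLs only, sends an HTTP/1.0 request so that the server returns an un-chunked body and closes the connection afterwards, follows no redirects, and does not fetch embedded images. The function names parse_url, build_request and fetch are our own, not part of any library.

```python
import socket
from urllib.parse import urlsplit


def parse_url(url):
    """Step 1: split a URL into (scheme, host, port, request URI)."""
    parts = urlsplit(url)
    port = parts.port or 80            # HTTP's default port
    request_uri = parts.path or "/"
    if parts.query:
        request_uri += "?" + parts.query
    return parts.scheme, parts.hostname, port, request_uri


def build_request(host, request_uri, version="1.1"):
    """Step 4: the request line 'GET /request-URI HTTP/version' plus headers.
    The Host header names the server (it is required by HTTP/1.1)."""
    return (f"GET {request_uri} HTTP/{version}\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n")                     # a blank line ends the request headers


def fetch(url):
    """Steps 2-7 for a plain http:// URL; returns (status line, body bytes)."""
    scheme, host, port, request_uri = parse_url(url)
    if scheme != "http":
        raise ValueError("this sketch only handles http:// URLs")
    ip = socket.gethostbyname(host)                              # step 2: DNS lookup
    with socket.create_connection((ip, port), timeout=10) as sock:  # step 3: TCP
        # Step 4: HTTP/1.0 keeps the body un-chunked; the server closes afterwards.
        sock.sendall(build_request(host, request_uri, "1.0").encode("ascii"))
        chunks = []
        while True:                     # steps 5-6: read until the server closes
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Step 7: leaving the "with" block closes our end of the connection.
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    return status_line, body
```

With network access, calling fetch("http://www.microsoft.com/home.asp") would run all seven steps and return the server's status line together with the raw body; a real browser would then go on to parse the HTML and request the resources it references.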

HTTP and URL (As done in Class)

2.3 How the Web Browser Works

The working of the browser depends on the following:
a. MIME types
b. Plug-ins
c. Helper applications

These are explained below:

2.3.1 MIME Types

MULTIPURPOSE INTERNET MAIL EXTENSIONS (MIME) IS AN INTERNET STANDARD THAT EXTENDS THE FORMAT OF E-MAIL TO SUPPORT:
text in character sets other than ASCII;
non-text attachments;
message bodies with multiple parts; and
header information in non-ASCII character sets.

Originally, only ASCII text files could be sent via email. Today, an email message may contain an attachment that consists of virtually any type of file. ASCII files are usually referred to as text or plain-text files, and all other files as binary files. MIME is used to send messages that contain other forms of media, such as graphics, HTML code, a spreadsheet document, video, voice, and/or a word-processor document, attached in addition to text. All that is necessary is that the sender's mailer and the recipient's mailer be MIME-compliant. If the recipient's mailer is not MIME-compliant, attached files will not be displayed properly with all the features of the document.

MIME was originally developed as an extension to the Internet mail protocol that allows multimedia to be included in mail messages. The BASIC IDEA OF MIME is to transmit text with headers that indicate what kind of (possibly binary) data follows. Each MIME type is composed of two parts, indicating the data type and subtype, in the following format:

Content-Type: type/subtype

where type can be text, image, audio, video, application, message, multipart, or an extension token, and subtype gives the specifics of the content. Some sample types are:

text/html
video/quicktime
image/jpeg
application/x-pdf
image/gif
audio/x-wav
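To make the type/subtype format concrete, here is a small Python sketch. parse_content_type is a hypothetical helper of ours that splits a Content-Type value into its type, subtype and any parameters (such as charset); it assumes no semicolon appears inside a quoted parameter value. The standard mimetypes module is then used to guess a type from a file extension, similar in spirit to the extension tables many web servers consult when labelling files.

```python
import mimetypes


def parse_content_type(value):
    """Split e.g. 'text/html; charset=UTF-8' into ('text', 'html', {'charset': 'UTF-8'}).

    Simplified: assumes no ';' appears inside a quoted parameter value.
    """
    media, *params = [part.strip() for part in value.split(";")]
    mtype, _, subtype = media.partition("/")
    parameters = {}
    for param in params:
        if not param:
            continue
        name, _, val = param.partition("=")
        parameters[name.strip().lower()] = val.strip().strip('"')
    # Type and subtype names are case-insensitive, so normalise them.
    return mtype.strip().lower(), subtype.strip().lower(), parameters


for name in ["index.html", "logo.gif", "photo.jpeg"]:
    # guess_type returns (type, encoding); encoding is e.g. 'gzip' for .gz files.
    print(name, mimetypes.guess_type(name)[0])
```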

When a web server delivers a file, the browser examines the header information that comes with it. The MIME type is specified by the Content-Type HTTP response header. For example, if a browser receives a basic HTML file, the text/html value in the Content-Type header tells the browser what to do with it. The browser would first read the HTML being delivered and then retrieve any other objects it refers to, such as GIF images, sound files, Flash files and Java applets; each of these results in another request to the server. If a browser encountered something like this:

<IMG src="images/logo.gif" alt="demo company" height="100" width="200">

it would then form a request like this:

GET /images/logo.gif HTTP/1.1
Connection: Keep-Alive
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)
Accept: application/x-pdf, image/gif, image/x-bitmap, image/jpeg, */*
Accept-Language: en-us

The server would then respond with a similar answer, indicating that a MIME type of image/gif is being returned, followed by the binary data that makes up the image:

HTTP/1.1 200 OK
Date: Tue, 18 Jan 2000 04:41:15 GMT
Server: Apache/1.3.4 (UNIX)
Last-Modified: Wed, 13 Oct 1999 23:37:38 GMT
Content-Length: 28531
Connection: close
Content-Type: image/gif
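The request and response shown above can be checked with a short Python sketch. parse_headers and accepts are hypothetical helpers written for this illustration: the first splits a raw HTTP message into its start line and a dictionary of headers (header names are case-insensitive, so they are lower-cased); the second tests whether the Content-Type the server returned is covered by the browser's Accept header, honouring the type/* and */* wildcards. Real content negotiation also weighs q-values, which this sketch ignores.

```python
REQUEST = """GET /images/logo.gif HTTP/1.1
Connection: Keep-Alive
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)
Accept: application/x-pdf, image/gif, image/x-bitmap, image/jpeg, */*
Accept-Language: en-us
"""

RESPONSE = """HTTP/1.1 200 OK
Date: Tue, 18 Jan 2000 04:41:15 GMT
Server: Apache/1.3.4 (UNIX)
Last-Modified: Wed, 13 Oct 1999 23:37:38 GMT
Content-Length: 28531
Connection: close
Content-Type: image/gif
"""


def parse_headers(message):
    """Return (start line, {lower-cased header name: value}) for a raw message."""
    lines = message.strip().splitlines()
    headers = {}
    for line in lines[1:]:
        if not line:                    # a blank line separates headers from the body
            break
        name, _, value = line.partition(":")   # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def accepts(accept_header, content_type):
    """True if content_type matches an entry of the Accept header.
    Handles 'type/*' and '*/*' wildcards; ignores q-values."""
    wanted = content_type.split(";")[0].strip().lower()
    major = wanted.split("/")[0]
    for entry in accept_header.split(","):
        media_range = entry.split(";")[0].strip().lower()
        if media_range in (wanted, major + "/*", "*/*"):
            return True
    return False


start, request_headers = parse_headers(REQUEST)
status, response_headers = parse_headers(RESPONSE)
print(status, response_headers["content-type"],
      accepts(request_headers["accept"], response_headers["content-type"]))
# prints: HTTP/1.1 200 OK image/gif True
```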

2.3.2 Plug-ins and Helper Applications

To get the most from the browser, the user needs to configure it to handle the different types of files that are used on the Web. Plug-ins and helper applications are two different ways to view these files. Plug-ins are special programs designed to display documents inside the browser, whereas a helper application can be any program on the user's computer.

2.3.2.1 PLUG-INS

PLUG-INS ARE SMALL PROGRAMS THAT EXTEND THE BROWSER TO SUPPORT NEW FUNCTIONALITY. Users must locate and download plug-ins, install them, and occasionally even restart their browsers. Plug-ins can be included in web pages by using the <embed> or <object> tag. The <embed> syntax is typically used, but the <object> syntax is the preferred method because it is part of the XHTML specification and will therefore validate.

1. <EMBED> SYNTAX FOR PLUG-INS

In general, the embed element takes a src attribute to specify the URL of the included binary object. The height and width attributes are often used to indicate the pixel dimensions of the included object, if it is visible. For example:

<EMBED src="welcome.avi" height="100" width="100"></EMBED>

The <embed> tag displays the plug-in as part of the HTML/XHTML document. Some of the other attributes used in the embed element are:
(i) align: used to align the object relative to the page and allow text to flow around the object.
(ii) hspace and vspace: used to set the buffer region, in pixels, between the embedded object and the surrounding text.
(iii) border: used to set a border for the plug-in, in pixels.

Values for height and width should always be set, unless the hidden attribute is used. Setting the hidden attribute to true in an <embed> tag causes the plug-in to be hidden and overrides any height and width settings.

1.1 CUSTOM PLUG-IN ATTRIBUTES

In addition to the standard attributes, plug-ins might have custom attributes to communicate specialized information to the plug-in code. For example, a movie player plug-in may have a loop attribute to indicate how many times to loop the movie.

1.2 ATTRIBUTES FOR INSTALLATION OF PLUG-INS

If the embedded data in a web page has no associated plug-in on the visitor's machine, the visitor will need to install one to handle it. The page author should therefore set the pluginspage attribute to a URL that gives instructions for installing the needed plug-in. The pluginurl attribute can also be used in place of the pluginspage attribute.

2. <NOEMBED>

Some browsers don't understand the <embed> tag. The <noembed> tag lets the page author provide some alternative text or marked-up content for them. For example:

<EMBED src="welcome.avi" height="100" width="100" />
<NOEMBED>
<img src="welcome.gif" alt="Welcome to Demo Company" />
</NOEMBED>

One potential problem with the <noembed> approach occurs when a browser supports plug-ins but lacks the specific plug-in needed for the included binary object. In this case the <noembed> content is not shown; instead, the user is presented with a broken puzzle-piece icon or a similar icon, and is then directed to a page to download the missing plug-in.

3. <OBJECT> SYNTAX FOR PLUG-INS

The primary attribute of the <object> element when referencing plug-ins is data, which represents the URL of the object's data and is equivalent to the src attribute of <embed>. Other attributes are:
type: represents the MIME type of the object's data.
codebase: similar to the pluginspage attribute; represents the URL of the plug-in.
classid: identifies the object's implementation (for an ActiveX control, its class identifier written as a clsid: value).
id: sets the name of the object for scripting.

For example:

<OBJECT data="click.wav" type="audio/wav" height="60" width="144">
<PARAM name="autostart" value="false" />
<B>Sorry, no live audio installed</B>
</OBJECT>

DRAWBACKS OF PLUG-INS
Although plug-ins can go a long way toward extending the possible capabilities of a browser, the technology does have its drawbacks:

(i) Users must locate and download plug-ins, install them, and occasionally even restart their browsers.
(ii) Even if installation were not such a big problem, plug-ins are not available on every machine: an executable program, or binary, must be created for each particular operating system. Because of this machine-specific approach, many plug-ins work only on Windows-based systems.
(iii) Each plug-in installed on a system is a persistent extension to the browser, and takes up memory and disk space.

BENEFITS OF PLUG-INS
The benefit of plug-ins is that they can be well integrated into web pages. They can be included by using the <embed> or <object> tag.

2.3.2.2 HELPER APPLICATIONS

A HELPER APPLICATION IS AN EXTERNAL VIEWER PROGRAM LAUNCHED TO DISPLAY CONTENT RETRIEVED USING A WEB BROWSER. Some common examples are Windows Media Player and QuickTime Player for playing streaming content. Unlike a plug-in, whose code is loaded into the browser and runs as part of it, a helper application needs only a small configuration entry telling the browser to open it when it encounters a certain file format. Common helper applications include RealAudio, which allows browsers to play live sound tracks such as radio broadcasts or recorded lectures, and Acrobat, for viewing PDF documents.

For local files and other files for which the browser does not get a content type from the server, the user can create a helper application entry to give a specific MIME type to files with a specific extension. To do this, the user creates an entry in Edit/Preferences/Helper Applications, enters the desired MIME type in the Type field, and enters the extensions in the Extension field. The rest can be left at the default values. The user might get a warning that the browser can handle the type internally; if all the user wants is to change the MIME type assigned to a file, this message can be ignored (click "Proceed anyway").
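How a browser might choose between its own renderer, a plug-in and a helper application can be sketched as a lookup on the MIME type. Everything in this Python sketch is hypothetical: the tables stand in for a browser's configuration (the program names come from the examples above), EXTENSION_TYPES plays the role of the user-defined extension entries described for local files, and offering to save an unknown type to disk is an assumed fallback.

```python
import os

# Hypothetical browser configuration, keyed by MIME type.
BUILT_IN = {"text/html", "text/plain", "image/gif", "image/jpeg"}
PLUGINS = {"application/x-shockwave-flash": "Flash plug-in"}
HELPERS = {
    "audio/x-pn-realaudio": "RealAudio player",
    "application/pdf": "Acrobat",
    "video/quicktime": "QuickTime Player",
}

# User-defined helper entries: extension -> MIME type, used for local files
# and for any file the server sent without a Content-Type.
EXTENSION_TYPES = {".ra": "audio/x-pn-realaudio", ".pdf": "application/pdf"}


def handle(filename, content_type=None):
    """Return (who handles the file, detail) for a retrieved file."""
    if content_type is None:
        extension = os.path.splitext(filename)[1].lower()
        content_type = EXTENSION_TYPES.get(extension, "application/octet-stream")
    if content_type in BUILT_IN:
        return "browser", content_type           # shown in the browser window itself
    if content_type in PLUGINS:
        return "plug-in", PLUGINS[content_type]   # runs inside the page
    if content_type in HELPERS:
        return "helper", HELPERS[content_type]    # a separate program is launched
    return "save", content_type                   # unknown: offer to save to disk
```

The split mirrors the distinction drawn above: a plug-in displays content inside the page, while a helper entry only records which external program to start.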

2.4 Directories
A directory is an approach to organizing and locating information on the World Wide Web. It offers a hierarchical representation of hyperlinks to web pages and presentations, broken down into topics and subtopics; the hierarchy can descend many levels. Directories can be classified as either general or specialized.

A GENERALIZED DIRECTORY is also called a web directory, a subject directory or sometimes a web guide. The top level of a general directory provides a wide range of very broad topics such as arts, automobiles, education, news, science, and so on. In addition to being very easy to use, a directory structure has the benefit that the user need not know exactly what he or she is looking for in order to find something worthwhile. The user just selects (clicks on) the category for the topic of interest and continues to move down through the hierarchy, selecting subcategories and narrowing the search at each level, until a list of hyperlinks that pertain to the desired topic is presented. While searching, one may find other interesting items one was previously unaware of. On the other hand, one may reach the bottom of the directory without finding the information being searched for. In such cases, one can backtrack, going up several levels and then proceeding down again. When traversing a directory downwards, one is moving towards more specific topics; when going upwards, one is heading back to more general topics.

A SPECIALIZED DIRECTORY is usually organized by an expert in a particular field, and thus offers a narrow selection of topics covered in more depth. Specialized directories are also called subject guides or gateway pages. They deal with a variety of topics, including law, medicine, news, shopping, and so on. The top level of a directory about law, for instance, may contain topics such as legal forms, top law firms, law schools and rankings, research resources, and online legal journals.

In both generalized and specialized directories, each topic is a hyperlink that leads to more specific subtopics. These, in turn, have a number of subtopics, and so on, until a specific web page or web presentation is reached. For a general query, a generalized web directory will probably supply the answer quickly. However, for a very specific query, a specialized subject guide may be necessary to locate the information efficiently. In addition to providing a shortcut to obscure information on the WWW, some subject guides also provide access to what has become known as the invisible web: information that is not available to general search engines. The invisible web is also referred to as the hidden or deep web, while the part of the web that can be cataloged by search engines has been dubbed the surface web or visible web.

Popular Generalized Directories:
LookSmart: www.looksmart.com
Lycos: www.lycos.com
Open Directory Project (ODP): www.dmoz.com or www.dmoz.org
Yahoo!: www.yahoo.com

Popular Subject Guides:
The Alternative Medicine Home Page: www.pitt.edu/~cbw/altm.html
Copyright Resources on the Internet: www.Groton.k12.ct.us/mts/pt2a.htm
Financial Aid Resource Center: www.theoldschool.org
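The hierarchical browsing of a directory (moving down to narrow a topic, backtracking up to broaden it) maps naturally onto a tree. In this Python sketch the categories and URLs are invented placeholders: nested dictionaries are categories, and a list holds the hyperlinks found at the bottom of a branch. browse, show and up are hypothetical helper names.

```python
# A made-up directory: dicts are categories, lists are the hyperlinks at the bottom.
DIRECTORY = {
    "Arts": {"Music": ["http://example.org/music"]},
    "Science": {
        "Astronomy": {"Telescopes": ["http://example.org/telescopes"]},
        "Biology": ["http://example.org/cells"],
    },
}


def browse(directory, path):
    """Follow a path of category names downward, toward more specific topics."""
    node = directory
    for depth, name in enumerate(path):
        if not isinstance(node, dict) or name not in node:
            where = "/".join(path[:depth]) or "the top level"
            raise KeyError(f"{name!r} is not a subcategory of {where}")
        node = node[name]
    return node


def show(directory, path):
    """List what the user sees at this level: subcategories or hyperlinks."""
    node = browse(directory, path)
    return sorted(node) if isinstance(node, dict) else list(node)


def up(path, levels=1):
    """Backtrack toward more general topics by dropping the last levels."""
    return path[:max(0, len(path) - levels)]
```

For example, show(DIRECTORY, ["Science"]) lists the subcategories Astronomy and Biology, and up(["Science", "Astronomy", "Telescopes"], 2) returns ["Science"], from where the user can head down a different branch.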

2.5 Search Engines


A search engine is a computer program that:
1. allows the user to submit a form containing a query, that is, a word or phrase describing the specific information to be located on the web;
2. searches its database to try to match the query;
3. collates and returns a list of clickable URLs of presentations that match the query, usually ordered with the better matches at the top;
4. permits the user to revise and resubmit the query.
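The four jobs of a search engine can be illustrated with a toy search engine in Python. It is a sketch, not how any real engine works: the database is an inverted index mapping each word to the set of pages containing it, and pages are ranked simply by how many times the query words occur on them. The class and method names are hypothetical; revising a query is just another call to search.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinySearchEngine:
    """Toy search engine over an in-memory inverted index (hypothetical names)."""

    def __init__(self):
        self.index = defaultdict(set)   # word -> URLs of pages containing it
        self.pages = {}                 # URL -> list of words on the page

    def add_page(self, url, text):
        """Put a page into the database (a real engine's crawler would do this)."""
        words = tokenize(text)
        self.pages[url] = words
        for word in words:
            self.index[word].add(url)

    def search(self, query):
        """Match the query against the index; return (url, score) pairs, best
        first. Score = total occurrences of the query words on the page."""
        terms = tokenize(query)
        if not terms:
            return []
        candidates = set().union(*(self.index.get(t, set()) for t in terms))
        ranked = [(url, sum(self.pages[url].count(t) for t in terms))
                  for url in candidates]
        ranked.sort(key=lambda pair: (-pair[1], pair[0]))
        return ranked
```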

Like directories, search engines can also be classified as either general or specialty search engines. A GENERAL SEARCH ENGINE retrieves information from a database that contains information on a wide variety of topics. A SPECIALTY SEARCH ENGINE, also called a vertical search engine or a topic search engine, has a database containing information on a specific topic. Because its focus is narrow, a specialty search engine can usually provide in-depth information on specific topics that may be more valuable for a particular application. Apart from finding information on a topic, a specialty search engine can also tap the invisible web.

To use a search engine, the user supplies a query by entering information into a field on the screen. A disadvantage is that one has to learn the query language, or syntax, of the specific search engine, and the user-friendliness and power of query languages vary from one search engine to another.

Popular General Search Engines:
AltaVista: www.altavista.com
Google: www.google.com
AskJeeves: www.ask.com

Popular Specialty Search Engines:
Moreover: www.moreover.com
MP3 search: www.mp3search.nu
Travelocity: www.travelocity.com

To locate specialized search engines, refer to the Invisible Web catalog at www.invisibleweb.com (the search engine for search engines).


2.6 Meta Search Engines


A metasearch engine, or all-in-one search engine, performs a search by calling on more than one other search engine to do the actual work. It does not maintain its own database of information; instead, by submitting searches to other search engines, it queries their databases. Many metasearch engines collate the search results into one list, remove duplicates, and then rank the pages according to how well they match the query. Others, like Dogpile, provide the results from each search engine separately. The advantage is that one can access a number of different search engines with a single query. The disadvantage is that there will often be a high noise-to-signal ratio, that is, many of the matches will not be of interest, so much more time has to be spent evaluating the results and deciding which hyperlinks to follow.

Popular Metasearch Engines:
DogPile: www.dogpile.com
MetaCrawler: www.metacrawler.com
MetaSearch: www.metasearch.com
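The collate, de-duplicate and rank behaviour of a metasearch engine can be sketched in Python. Here each underlying engine is represented by a plain function returning a ranked list of URLs (in reality these would be calls to other search services), and the merged ranking uses a simple Borda-count rule chosen for illustration: each engine awards a URL more points the higher it places it, and the points are summed. The function name metasearch is hypothetical.

```python
def metasearch(query, engines):
    """Query every engine, merge the ranked lists, drop duplicates, re-rank.

    engines: dict mapping an engine name to a function query -> list of URLs,
    best first. Scoring (a Borda count): in a list of n results, position 0
    earns n points, position 1 earns n - 1, and so on; points are summed.
    """
    scores = {}
    sources = {}
    for name, engine in engines.items():
        results = engine(query)
        for position, url in enumerate(results):
            scores[url] = scores.get(url, 0) + len(results) - position
            sources.setdefault(url, []).append(name)
    merged = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, scores[url], sources[url]) for url in merged]
```

A Dogpile-style display would instead show each engine's list separately; the sources field keeps the information needed for that.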

2.7 Search Fundamentals


The basic fundamentals of search that are used by a search engine involve:

Search Terminology
Pattern Matching Queries
Boolean Queries
Search Domain
Search Subjects


2.7.1 Search Terminology


A few common search-related terms used by various search engines are:
Search Tools. Mechanisms for locating information on the web (a search engine, metasearch engine or directory).
Query. Information entered into a form on a search engine's web page, describing what is to be searched for.
Query Syntax. The set of rules describing what constitutes a legal query.
Hit. A URL that a search engine returns in response to a query.
Match. A synonym for hit.
Relevancy Score. A value that indicates how closely a URL matched a query, usually expressed as a value from 1 to 100, with a higher score meaning more relevant.

2.7.2 Pattern Matching Queries


This is the most basic type of query. It is formulated using a keyword or a group of keywords.

The search engine returns the URL of any page that contains these words. The exact details of how pattern-matching queries are resolved are specific to each search engine.

2.7.3 Boolean Queries


Boolean queries involve the Boolean operators AND, OR, and NOT. Most search engines allow entering a Boolean query.

For example, paint AND house will turn up all web pages that contain both paint and house.

Some search engines permit using multiple ANDs, for example, Janet AND Tito AND Michael AND Latoya.

In many search engines, putting quotes around a phrase means the words must appear together. The exact query syntax varies from search engine to search engine.
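A Boolean query can be evaluated against a page with a small Python sketch. matches is a hypothetical function: it understands AND, OR and NOT written in capitals and "quoted phrases", treats adjacent terms with no operator as AND, and applies operators strictly from left to right without precedence or parentheses. Those simplifications are assumptions of the sketch; real engines each have their own syntax.

```python
import re

# A token is either a "quoted phrase" or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def words_of(text):
    return re.findall(r"\w+", text.lower())


def matches(text, query):
    """True if the page text satisfies the Boolean query (left-to-right rules)."""
    page = " " + " ".join(words_of(text)) + " "

    def contains(term):
        # A phrase must appear as consecutive words; a single word is a 1-word phrase.
        return " " + " ".join(words_of(term)) + " " in page

    result, op, negate = None, "AND", False
    for phrase, word in TOKEN.findall(query):
        if word in ("AND", "OR"):
            op = word
            continue
        if word == "NOT":
            negate = True
            continue
        value = contains(phrase or word)
        if negate:
            value, negate = not value, False
        if result is None:
            result = value
        elif op == "AND":
            result = result and value
        else:
            result = result or value
        op = "AND"          # adjacent terms without an operator are ANDed
    return bool(result)
```

Applied to a collection, [url for url, text in pages.items() if matches(text, "paint AND house")] keeps only the pages containing both words.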


2.7.4 Search Domain


Most search tools provide some flexibility in the choice of domain to search.

For example, one can search the web, newsgroups, specialized databases, or the Internet as a whole.

Depending on the item being looked for, one may decide to try either a more specific domain first, in hopes of a more efficient search, or a comprehensive but more time-consuming search.

2.7.5 Search Subjects


Several search and metasearch engines provide a way to view the search queries of anonymous users in real time. These pages show a list of the queries currently being processed by the search engine; since users are submitting the queries in real time, the displayed list may be refreshed every 15 seconds or so. For example, MetaSpy is the spy page for MetaCrawler, a popular metasearch engine. MetaSpy offers a filtered and a non-filtered version, acknowledging that some queries are obscene and/or offensive.

Some web pages that allow users to view real-time searches are:
o www.askjeeves.com/docs/peek (AskJeeves Peek Through the Keyhole)
o www.excite.com/voyeur-xt (Excite Search Voyeur)
o www.metaspy.com (MetaSpy)
o www.webcrawler.com/SearchTicker.html (WebCrawler Search Voyeur)
o Buzz.yahoo.com (Yahoo! Buzz Index)
