
COSC1300

Networking and the Internet

HTTP Protocol

Web - What spiders make inside sheds


The Rural Australia Thesaurus of Computer Terminology

Table Of Contents

In this chapter, we cover a complete study of web servers and related issues.

1. Introduction
2. Hypertext Transfer Protocol
2.1 Request Phase
2.2 Response Phase
3. Persistent Connections
4. Comparing HTTP/1.0 &
HTTP/1.1
5. Web Servers
6. Useful Tools
7. Related Links

1. Introduction
In Chapter 2, we discussed the TCP/IP Network Protocol Suite and the functionality of
each layer. There we mentioned various application protocols in the Application Layer
of the TCP/IP hierarchy. Among these application layer protocols, the Hypertext
Transfer Protocol is one of the most important, and is the topic for this chapter.
HTTP is the language that Web Browsers (the client) and Web Servers (the server) use
to speak to each other. It is important to enforce a strict set of rules for this conversation,
as the client probably needs to communicate with many servers (e.g. you access this site
as well as many other web sites) and the server needs to communicate with many clients
(e.g. this site is accessed by many students). The Internet Engineering Task Force (IETF)
has released several RFCs (Requests for Comments) that outline HTTP and set the
standard for Web communication. The following table lists a few important RFCs
related to HTTP.
RFC Number  Purpose                              URL
1945        HTTP/1.0 Specifications              http://www.w3.org/Protocols/rfc1945/rfc1945
2616        HTTP/1.1 Specifications              http://www.ietf.org/rfc/rfc2616.txt
2617        HTTP Basic & Digest Authentication   http://www.ietf.org/rfc/rfc2617.txt

You can find a full list of RFCs at http://www.faqs.org.

1.1 Uniform Resource Locators & Uniform Resource Identifiers


In a heterogeneous environment like the World Wide Web, it is important to use an
unambiguous way of referring to resources that clients can access. This is done through
the Uniform Resource Locator (URL) notation, a straightforward way of indicating the
location in terms of the protocol, the host and the path of the resource within the host.
Typical components of a URL are illustrated in Fig. 1.

Fig 1. Anatomy of a URL


The first part of the URL specifies the communication protocol. For example, when the
client wants to access a resource from a web server using HTTP, the URL starts with http://.
Other valid protocols include FTP, MAILTO, FILE and TELNET. The second part is the
host name where the requested resource resides. By default, web clients assume that
the server listens on the default port (port 80) for web requests. If the server is
configured for a non-standard port, however, the URL should include the port number,
separated from the host name by a colon. The rest of the URL specifies the path of the
resource (relative to the DOCUMENT_ROOT of the web server). We discuss the
DOCUMENT_ROOT later in Section 5.
A URI (Uniform Resource Identifier) is a superset of a URL, in anticipation of
different resource naming conventions being developed for the Web. For the time being,
however, the only URI syntax used in practice is the URL - you can safely assume that
"URI" is synonymous with "URL", even though this is not exactly correct. For more
information about URIs please read RFC 1630.
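
The URL components described above can be pulled apart with a few lines of code. The following is a minimal sketch using Python's standard urllib.parse module; the helper name url_parts and the DEFAULT_PORTS table are assumptions made for illustration and are not part of these notes.

```python
from urllib.parse import urlsplit

# Assumed for this sketch: only the http default port is listed.
DEFAULT_PORTS = {"http": 80}

def url_parts(url):
    """Split a URL into protocol, host, port and path."""
    parts = urlsplit(url)
    port = parts.port
    if port is None:
        # No explicit port in the URL: fall back to the protocol's default.
        port = DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path

print(url_parts("http://goanna.cs.rmit.edu.au:2000/hello.html"))
```

A URL without an explicit port, such as http://www.faqs.org/, yields port 80 in this sketch, which mirrors what a web client assumes by default.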

1.2 Terminology
There are a number of terms used in this chapter that have specific meanings in the
context of HTTP communication. A few of the most important terms are given below.
Connection
A transport layer virtual connection (TCP/IP connection in most cases) established
between the server and the client for the purpose of communication.
Message
The basic unit of HTTP communication.
Request
An HTTP request message sent by the client to the server.

Response
An HTTP response message sent by the server to the client.
Resource
A network data object or a service that can be identified by a URI.
Note: A resource may not necessarily be a web page; it could be any resource that
can be served via the network (e.g. a voice stream).
User Agent
The client which initiates the request. In most cases, this is a web browser.
Server
An application program that accepts connections, receives requests and sends back
responses. This is a very broad definition, and depending on the nature of the
requests being served, the server could be an origin server, proxy, or another type
of server.
The rest of the chapter focuses on the details of the HTTP and discusses how this
protocol operates. In later sections, we discuss web server performance issues and
attempt to identify bottlenecks in web communication. Furthermore, we consider how a
web server can be tuned to maximize its performance under different conditions.

Checkpoint
1. As a part of the assignment, we asked you to set up an Apache web server on a
port > 50000. Why do you not use port 80?
2. What are the pros and cons of defining a protocol as a sequence of interactions
within a single session (like SMTP) versus one single REQUEST, one single
RESPONSE and then disconnecting (as in HTTP/1.0)?

COSC1300 - Lecture Notes
Web Servers and Web Technology

HTTP Protocol
Copyright 2000 RMIT Computer Science
All Rights Reserved

In this chapter, we discuss the HTTP protocol, the request and response phases of an
HTTP connection, and the different request methods.

2. Hypertext Transfer Protocol (HTTP)


Each HTTP transaction is handled as a separate conversation between the browser and the
server. For this reason, we say that HTTP is stateless -- it does not remember the state that it
was in at the end of the last conversation.
Note: This statelessness has advantages and disadvantages. One advantage is
efficiency: it reduces the overhead of keeping track of historical transactions, which makes a
big difference to a heavily loaded web server. On the other hand, statelessness creates
problems: just imagine an online shopping cart application that can't remember what you
ordered on the previous page. In such cases, the users will benefit if the server can remember
information on previous transactions.
HTTP is stateless, but you can add state to this stateless mechanism using other methods
such as sessions. These topics are discussed elsewhere.
The protocol normally consists of two phases: the request phase and the response phase. In the
request phase, the browser sends out a request consisting of a request method, the path part of
a URL, and the version number of the HTTP protocol. It then sends some header
information, terminated by a blank line. In the response phase, the server returns the protocol
version, a status code, and a few lines of header information, terminated by a blank line. Then,
the server sends data, the actual content requested by the browser.

Fig 2. A short conversation between your browser and
http://goanna.cs.rmit.edu.au:2000/hello.html.
All HTTP transactions follow the same general format. Each client request and server
response has three parts: the request or response line, a header section, and the body.
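
To make the three-part layout concrete, here is a minimal sketch that splits a raw HTTP message into its first line, its headers, and its body. The function name split_message and the sample response are assumptions made for illustration; a real client would normally leave this work to an HTTP library.

```python
def split_message(raw):
    """Split a raw HTTP message (bytes) into start line, headers and body."""
    # The first blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body

# A made-up response used only to exercise the function.
SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"Hello, world!"
)

start_line, headers, body = split_message(SAMPLE_RESPONSE)
print(start_line)
print(headers)
print(body)
```

The same function works for requests, since a request differs only in what its first line contains.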

2.1 Request Phase


The three parts of a client request are as follows:
1. The client contacts the server at a designated port. (The default port is 80, but a web
server can be set up to listen on a non-standard port.) It then sends a document request by
specifying an HTTP command called a Method, followed by a document address and the
HTTP version number.
For example, if the client wants to fetch hello.html using the HTTP/1.1 protocol, it
sends the command:
GET /hello.html HTTP/1.1

This command uses the GET method to request the document hello.html.
The Methods supported by the HTTP protocol are discussed in Section 2.1.1.
2. Next, the client sends optional header information to inform the server of its
configuration and the document formats it will accept. All header information is given
as a <Header Name: Value> pair.
For example,

Connection: Keep-Alive
User-Agent: Mozilla/4.73
Accept: image/gif, image/jpeg, */*

tells the server:

- keep the TCP/IP connection open even after the document is delivered.
- the browser name is Mozilla (Netscape) and its version is 4.73.
- the browser can handle GIF and JPEG images and, through */*, will accept any other
type as well.
The header section terminates with a blank line. You can find an extensive list of HTTP
request headers and their meanings in Section 2.1.3.
3. The third part of the client request is optional. The client may use it to send additional
information that might be needed to process POST requests. In other words, if you use
the GET method to request, then there is no need to pass this portion to the server. We
will discuss how the POST method works and how it differs from the GET method later.
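
The three parts of a request can be assembled by hand, which shows exactly what travels over the connection. The sketch below is illustrative only: build_request is a hypothetical helper, and the Host header is included because HTTP/1.1 requires it even though the short example above omits it.

```python
def build_request(method, path, headers, body=b"", version="HTTP/1.1"):
    """Assemble a raw HTTP request: command line, headers, blank line, body."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF, and an empty line closes the header section.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

request = build_request(
    "GET",
    "/hello.html",
    {
        "Host": "goanna.cs.rmit.edu.au:2000",
        "Connection": "Keep-Alive",
        "User-Agent": "Mozilla/4.73",
        "Accept": "image/gif, image/jpeg, */*",
    },
)
print(request.decode("ascii"))
```

For a GET request the body argument stays empty, so the message ends right after the blank line.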

Checkpoint
1. What is the general format of an HTTP request and an HTTP response?

2.1.1 Methods
In this section, we discuss the different methods used in the
Hypertext Transfer Protocol between clients and servers.

Five commonly used methods defined in the HTTP protocol are listed below; a few others
are covered in Section 2.1.1.4.

Method    Description
GET       Returns the contents of the document
HEAD      Returns the header information of the document
POST      Treats the document as a script, executes it and sends the results
PUT       Replaces the content of the document with some data
DELETE    Deletes the document

2.1.1.1 The GET Method


GET is the most common method used by clients to request documents. When a client uses the
GET method, the server responds with a status line, headers, and the requested document. If the
server cannot process the request due to an error or lack of authorization, the server usually sends
a textual explanation in the data portion of the response.
We mentioned that the client request may comprise three parts, but the GET request has
only two: the request command and the request headers. The third part of the request (the
entity-body portion) of a GET request is always empty. GET is basically used for "Please send me
this file"-type requests.
However, it should be noted that you can use the GET method to pass data to a script, for example
when processing a form. In such cases, we attach this additional information (for example, form
fields and their values) to the requested URL. In other words, this additional information is
passed as a part of the request command. For clarification, please refer to the following illustration.

Illustration of the GET method used in a more complex situation.


1. The user requests a form by GETing its URL.

2. The user fills in the form, and hits the submit button.

At this point, the browser collects the form fields and their values, attaches them to the URL (of
the script that processes the form), and passes them back to the server using the GET method.
The command part of this request is of the following form:
GET /serve_drink.php?username=Citizen&favorite=Water&submit=Submit HTTP/1.1

3. The server executes the script (using the arguments it received) and sends the processed results
back to the client.
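
The way a browser attaches form fields to the URL, and the way a script gets them back, can be sketched with Python's urllib.parse module. The function names below are hypothetical; the field names and values are the ones from the illustration above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def get_request_line(script_path, fields, version="HTTP/1.1"):
    """Build the command line of a GET request that carries form fields."""
    query = urlencode(fields)  # name=value pairs joined with '&'
    return f"GET {script_path}?{query} {version}"

def fields_from_request_line(request_line):
    """Recover the form fields on the server side."""
    method, target, version = request_line.split(" ")
    return parse_qs(urlsplit(target).query)

line = get_request_line(
    "/serve_drink.php",
    {"username": "Citizen", "favorite": "Water", "submit": "Submit"},
)
print(line)
print(fields_from_request_line(line))
```

Note that parse_qs returns a list of values for each name, because a form may submit the same field more than once.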

2.1.1.2 The POST Method

This is another method the client can use to send a request to a web server. However, the server
responds in a different way this time around. When the server receives a POST request, it passes
this request and its associated data to another program (or a script). In most cases, such a program
acts as a web gateway or a web interface to a database or another information system. This
program is executed and the result is sent back to the web server. The web server in turn sends
the processed result back to the client. The POST method, in general, can be considered as a
"please do this for me"-type request.
Essentially, a POST request has three parts: the command, the request headers and additional data
required to process the request. For example, in a form processing program, this additional data
may contain form field values.

Fig 4. The conversation between the client and the server, when you POST
http://goanna.cs.rmit.edu.au:2000/multiply.php program.
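
As a rough sketch of what such a POST request looks like on the wire, the code below builds one by hand. The helper build_post and the form fields x and y are assumptions made for illustration; the actual fields expected by multiply.php are not given in these notes.

```python
from urllib.parse import urlencode

def build_post(path, host, fields):
    """Assemble a raw POST request whose body carries URL-encoded form fields."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        # Content-Length tells the server how many body bytes to read.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body

# x and y are hypothetical field names.
request = build_post("/multiply.php", "goanna.cs.rmit.edu.au:2000", {"x": 6, "y": 7})
print(request.decode("ascii"))
```

Unlike the GET example, the field values travel in the third part of the request rather than in the command line.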

2.1.1.3 HEAD Method


The HEAD method is functionally similar to the GET method except that the server will send only
the response header in its reply. A HEAD request consists of only two parts: the command and the
request headers. These request headers are similar to the request headers in a GET request.
This method is used when the client wants to find out information about the document and not
retrieve it. For example, the client may desire the following information:
- The modification time of a document, useful for cache-related queries. (Caching will be
discussed in Chapter 6.)
- The size of the document, useful for page layout, estimating arrival time, or determining
whether to request a smaller version of the document.
- The type of the document.
- The type of the server, to allow customized server queries.
Please note that the header information sent by the server can vary from server to server.
The following diagram illustrates a conversation between the client and the server using the
HEAD method.

Fig 4. The conversation between the client and the server, when you HEAD
http://goanna.cs.rmit.edu.au:2000/hello.html .

2.1.1.4 Other Methods


Apart from the above three methods, HTTP specifies a few other methods that are used less
frequently. In fact, not all servers implement these methods.
DELETE Method
This allows the client to request the server to delete a document specified in the command line.
PUT Method
This allows the client to pass a document to be saved in the server's document tree.
OPTIONS Method
This method allows the client to determine the options associated with a resource or the
capabilities of a server, without initiating a retrieval.
TRACE Method
This allows the client to send a request to the server and get it echoed back. It is useful for
checking the connection and tracing its path.
CONNECT Method
This is a reserved method, used specifically for SSL tunnelling. (SSL is described in Chapter 6.)
Availability of methods in HTTP/1.0 & HTTP/1.1

Method     HTTP/1.0   HTTP/1.1
GET        Yes        Yes
POST       Yes        Yes
HEAD       Yes        Yes
DELETE     Yes        Yes
PUT        Yes        Yes
OPTIONS    No         Yes
TRACE      No         Yes
CONNECT    No         Yes

More about these methods can be found in RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1,
RFC 1945 - Hypertext Transfer Protocol -- HTTP/1.0, and Key differences between HTTP/1.0
and HTTP/1.1.

2.1.2 Comparison between GET & POST Methods


Although the GET method is meant for "Please send me"-type requests, we saw in Section
2.1.1.1 how the GET method can be used to pass some values (as a part of the GET command) to
the server, and then on to a server-side script. The reader may be unsure whether to use the
GET method or the POST method when the client wants to pass some values (say,
form fields) to a server-side script. The following table gives you a rule of thumb to make a
reasonable decision.

Use GET method, if                       Use POST method, if

- you have only a few values to          - you have a very long list of
  pass to the script                       values (say more than 1000 bytes)
                                           to be passed to the script
- you expect the user wants to           - you do not want to allow the
  save the query result as a               user to bookmark the page with
  bookmark, and the user expects           the query values, and you expect
  to retrieve the same results             the user to select them
                                           dynamically
- you don't care if the user sees        - your query consists of some
  the values passed to the script          confidential data that should not
                                           appear in the URL
- you want to pass only ASCII            - you want to pass non-ASCII
  data                                     data (say, an image).

Checkpoint
1. If the server has multi-lingual support (it can deliver documents in different languages), how
does it determine in which language the document should be delivered?

2.1.3 Request Headers


In this section, we discuss the request phase of the
HTTP connection, and request headers in particular.

The request header is comprised of an arbitrary number of header fields. Most of these
fields are informational, and are generally optional. The following table gives a list of
commonly used header fields and their meanings.
Header              Description
From                E-mail address of the requesting user
User-Agent          Name and the version no. of client browser
Referer             URL of the last document the client displayed
Accept              File types that the client will accept
Accept-Encoding     Compression methods that client will accept
Accept-Language     Language(s) that client will accept
Authorization       Used for authentication purposes
If-Modified-Since   Return document only if modified since specified date/time
Content-Length      Length (in bytes) of the request body (used in POST method)
Connection          Connection options, such as keep-alive
Host                Virtual host to retrieve data from
Cookie              Send a previously saved cookie to the server

User-Agent
This header is useful for the server to generate custom-built pages. For example, the
server may deliver a Frames version of a document to a Netscape client, while
delivering a No-Frames version to a Lynx client. (Lynx is a UNIX-based text-only
browser that does not support frames, well, sort of).
Referer
This header is used to send the last visited URL to the server. This could be used to
dynamically generate a Back button in your documents. Furthermore, it can be used
to determine whether the client followed a proper sequence of pages, when such a
sequence exists.
Accept
This field is used to specify what document types the client (browser) wants to receive.
There may be multiple Accept lines in a request header.
For example:
Accept: text/plain
Accept: text/html
Accept: image/jpeg

These headers tell the server that the client can accept plain text and HTML documents and
JPEG images.
Accept-Encoding and Accept-Language
These fields specify the compression methods that the client can understand (and
uncompress) and its language priorities, respectively.
If the same document is available in different languages, the server can determine the
document to deliver using the Accept-Language field.
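
One way a server might use the Accept-Language field is sketched below. This is an illustrative simplification, not how any particular server does it: the function name pick_language and the default language are assumptions, and the q= quality values come from HTTP/1.1, where a client may rank its preferences from 0 to 1.

```python
def pick_language(accept_language, available, default="en"):
    """Return the client's most preferred language that the server can supply."""
    ranked = []
    for position, item in enumerate(accept_language.split(",")):
        tag, _, params = item.strip().partition(";")
        quality = 1.0  # a language without a q-value has the highest priority
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by the order the client listed them.
            ranked.append((-quality, position, tag.strip().lower()))
    for _, _, tag in sorted(ranked):
        if tag in available:
            return tag
    return default

print(pick_language("fr, de;q=0.8, en;q=0.5", {"en", "de"}))
```

Here the client prefers French, but the hypothetical server only has German and English versions, so the German one is chosen.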
If-Modified-Since
This is used in caching schemes. In order to improve efficiency, most browsers keep a
copy of previously accessed documents in a browser cache, and display the local copy
when the user requests it again, rather than downloading it again. However, in order for
this to work well, the browser must check the remote server to make sure that the
document hasn't changed. If-Modified-Since is used by the browser to ask the server to
return the document only if it has changed since the specified date/time. Caching is
discussed in Chapter 6.
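
The server's side of this check can be sketched as a date comparison. The function below is a hypothetical helper; it assumes the server answers with status 304 (Not Modified, a code defined in the HTTP specifications but not listed in the status table in these notes) when the cached copy is still current, and 200 otherwise.

```python
from email.utils import parsedate_to_datetime

def conditional_status(if_modified_since, last_modified):
    """Return 304 if the browser's cached copy is still current, else 200."""
    since = parsedate_to_datetime(if_modified_since)
    modified = parsedate_to_datetime(last_modified)
    # Resend the document only when it changed after the date the client named.
    return 200 if modified > since else 304

print(conditional_status("Wed, 02 Aug 2000 01:19:50 GMT",
                         "Tue, 01 Aug 2000 10:00:00 GMT"))
```

With a 304 reply the server sends no document body, and the browser displays its local copy.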
Connection
This header field is sent to the server to ask for special handling mechanisms. For
example, if the client wishes to establish a persistent connection for the entire
transaction, it can ask for a Keep-Alive connection.
Authorization
Authorization is used by various validation schemes. This will contain the name of the
authorization method and any other information expected by the validation method, such
as realm, username and password.
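
For the Basic scheme described in RFC 2617, the client joins the username and password with a colon and Base64-encodes the result. The sketch below shows that step; the function name is hypothetical and the credentials are the well-known example pair from the RFC. Note that Base64 is an encoding, not encryption, so anyone who sees the header can recover the password.

```python
import base64

def basic_authorization(username, password):
    """Build the Authorization header used by the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"

header = basic_authorization("Aladdin", "open sesame")
print(header)

# Decoding the token again shows that nothing is hidden.
print(base64.b64decode(header.split(" ")[-1]).decode("utf-8"))
```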
Cookie

The Cookie header field is defined neither in HTTP/1.0 nor in HTTP/1.1. Nevertheless,
among all the request header fields, this is the most popular one, and it is
used in millions of sites. Cookie is an extension provided by Netscape, and is widely used
to maintain the state of web pages. It is used by the browser to send cookie values
that have been saved in the browser. Cookies are discussed in detail in Chapter 4.
How Cookies work on the client side
When the user types in a URL, the browser searches its "Cookies Database" to see if
there are any cookies associated with the requested page. If any such cookies exist (and
they have not expired), it attaches a Cookie header field to the request header (along
with the cookie <name=value> pairs) and sends it to the server.
E.g:
Cookie: username=Citizen
Cookie: favorite=Water

The way the cookies are stored in the browser varies from browser to browser. For
example, Netscape Navigator keeps them in a single file, while Internet Explorer stores
them in individual files.
On arrival of the request header, the server detaches the cookie and acts on the
received information. Most servers store cookie data in an environment variable called
HTTP_COOKIE and make them available for server-side scripts.
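
A server-side script can turn the cookie data back into name/value pairs. The following is a minimal sketch using Python's standard http.cookies module; the function name is hypothetical, and the sample string assumes the common form in which several cookies share one header, separated by semicolons.

```python
import os
from http.cookies import SimpleCookie

def parse_cookie_header(value):
    """Turn the value of a Cookie header into a dict of name/value pairs."""
    jar = SimpleCookie()
    jar.load(value)
    return {name: morsel.value for name, morsel in jar.items()}

# In a CGI-style script the value would typically come from the environment;
# here a sample string stands in when HTTP_COOKIE is not set.
raw = os.environ.get("HTTP_COOKIE", "username=Citizen; favorite=Water")
print(parse_cookie_header(raw))
```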
The Set-Cookie header field is used in the response headers, and it is used by the server
to send cookies to be saved in the browser. We'll discuss the server side of the
Cookies story later.
For more information about Cookies, please visit
http://developer.netscape.com/viewsource/archive/goodman_cookies.html.

Checkpoint
1. Why is the Host header required when requesting a resource from a virtual host?
2. How does Basic Authentication work at the client side?
3. At http://www.fruit.com, people can buy apples, oranges, and bananas. A
customer's basket can contain, say, 0.5 kilograms of apples and 1.4 kilograms of bananas.
What command, and with which parameters, can such a web site use to store this
information in a cookie on the user's workstation?

2.2 Response Phase


In this section, we discuss how the response phase
of an HTTP connection works.

Now it's the server's turn to respond to the client request. Similar to the client request, the
server response consists of three components.
- The Status Line
- The Response Headers
- The Response Body
The server first sends back a line, usually referred to as the Status line, containing the
protocol version, a three-digit status code, and a text explanation of the status.

2.2.1 Status Codes


The status codes listed here fall into four groups, as follows.

Code  Text                   Description

2XX codes - Success
200   OK                     The URL was found. The contents will be delivered
                             in the response body.
201   Created                A URL was created in response to a POST request.
202   Accepted               The request was accepted for processing later.
204   No Content             The request is successful, but there is no data to
                             send. This happens when an executable script has
                             done some processing in response to a query, but it
                             doesn't have any particular information to display.

3XX codes - Redirection
301   Moved Permanently      The URL has permanently moved to a new location.
302   Found                  The URL can be temporarily found at a new location.

4XX codes - Client Errors
400   Bad Request            Syntax error in the request.
401   Unauthorized           The client failed to authenticate itself
                             successfully.
403   Forbidden              This URL is forbidden. It could be an IP address
                             based restriction, a user-based restriction, or a
                             directory-based restriction. This type of error
                             can't be overcome by just providing the correct
                             username/password.
404   Not Found              You knocked on the wrong door; the document is not
                             there.

5XX codes - Server Errors
500   Internal Server Error  The server encountered an unexpected error.
502   Bad Gateway            The server was trying to fetch data from elsewhere
                             when the remote service failed.
503   Service Unavailable    The server is overloaded at the moment with too
                             many requests.
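
Because the first digit of the status code identifies its group, a client can classify a response without knowing every individual code. The sketch below parses a status line and looks up the group; the function names and the STATUS_GROUPS table are assumptions made for illustration and cover only the four groups listed in this section.

```python
STATUS_GROUPS = {
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def parse_status_line(line):
    """Split a status line into protocol version, numeric code and text."""
    # The text explanation may itself contain spaces, so split at most twice.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason

def status_group(code):
    """Map a three-digit status code to its group via the first digit."""
    return STATUS_GROUPS.get(code // 100, "Unknown")

version, code, reason = parse_status_line("HTTP/1.1 404 Not Found")
print(version, code, reason, "->", status_group(code))
```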

2.2.2 Response Headers


After the status line, the server sends out a response header. The header is a mixture of
various pieces of information about the server and the document to follow. Like the
request header, much of the information in the response header is optional, with the
exception of the Content-Type field.
After the header, the server sends a blank line that delimits the header from the response
body. The response body contains the actual document. After this, the HTTP
conversation between the server and the client is terminated.
In the following table and the subsequent subsections, you will find the most often used
response headers.
Header              Description
Server              Name & the version of the server software
Date                The current date & time (GMT)
Last-Modified       Date on which the document was last modified.
Expires             Date on which the document expires.
Location            The location of the document. This is used when the document is
                    retrieved from a redirected location.
MIME-Version        The MIME version used
Content-Length      The length in bytes
Content-Encoding    The compression method of this data
Content-Language    The language in which this document is written.
Pragma              Additional information for the browser
WWW-Authenticate    Used for authentication.
ETag                Unique identifier for the server resource.
Set-Cookie          Sets and sends a cookie to the browser.
WWW-Authenticate
This specifies the authorization scheme and the realm of authorization required for the
requested URL. When the client receives this header, it pops up a dialogue window for
the user to enter the username and the password.
e.g: This site returns
WWW-Authenticate: BASIC realm="SameAsForums"

and when the client receives it for the first time, it displays the user authentication
dialogue box. This is covered in more detail in Chapter 4.
Content-Type
This describes the media type and the subtype of the response body. The server should
return media types that conform with the client's preferred formats. The client usually
specifies what it wishes to receive in its Accept request header.
ETag
This indicates an entity tag. This field provides the client with a unique identifier for the
server resource. It is highly unlikely that different server resources will have the same
entity tag. This tag provides a powerful mechanism for caching.
e.g:
ETag: "2f5cd-964-381e1bd6"

Set-Cookie
This is the server-side part of the Cookie communication. This header contains a
<name=value> pair (the actual cookie) which the server wants the client to maintain.
There are other optional fields the server may include in the header. The additional fields
include the expire date of the cookie and the path of the document tree to which this
cookie is attached. Cookies are discussed in more detail in chapter 4.
e.g:
Set-Cookie: username=Citizen; expires=Saturday, 29-Jul-00 12:30:00 GMT

This will store a cookie named "username" with the value "Citizen" in the client
browser, and it is attached to the current document.
It is possible to send a cookie that applies to a whole branch of the document tree or even
to more than one server.
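
A server-side script can produce such a header with Python's standard http.cookies module, as in the minimal sketch below. The helper name set_cookie_header and the path value "/" are assumptions made for illustration; setting a path is one way to attach a cookie to a branch of the document tree.

```python
from http.cookies import SimpleCookie

def set_cookie_header(name, value, path=None, expires=None):
    """Build a Set-Cookie response header line."""
    jar = SimpleCookie()
    jar[name] = value
    if path is not None:
        # The cookie is sent back for every document below this path.
        jar[name]["path"] = path
    if expires is not None:
        jar[name]["expires"] = expires
    return jar.output()

print(set_cookie_header("username", "Citizen", path="/",
                        expires="Saturday, 29-Jul-00 12:30:00 GMT"))
```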
Pragma
Pragma is used to send various instructions to the browser. A commonly-used hint is
no-cache, which tells the browser not to add the document into its local browser cache.
This is useful if the document is a result of a POST request and is generated on-the-fly by
a script and changes every time it is requested.
e.g:
Pragma: no-cache

Checkpoint
1. The client receives the following response header.
HTTP/1.1 302 Found
Date: Wed, 02 Aug 2000 01:19:50 GMT
Server: Apache/1.3.12 (Unix) PHP/4.0.0 mod_ssl/2.6.4 OpenSSL/0.9.5a
Location: https://yallara.cs.rmit.edu.au:8001/new_server.html
Connection: close
Content-Type: text/html; charset=iso-8859-1

What is the meaning of these response headers?

3. Persistent HTTP Connections


In this section, we discuss how the HTTP/1.1
protocol handles persistent connections, and how
we can achieve this in HTTP/1.0.

One main drawback in HTTP/1.0 is that it requires a new TCP/IP connection be set up
and destroyed for each document transferred. This imposes a severe performance
degradation when a browser needs to fetch several URLs from the same server - a
common case when downloading a document that contains several images.
E.g. Let's assume that we want to download the following page:
<HTML><HEAD>
<TITLE>The multiple images example</TITLE></HEAD>
<BODY>
<IMG SRC="1.gif">
<IMG SRC="2.gif">
<IMG SRC="3.gif">
</BODY></HTML>

The entire conversation that takes place between the server and the client is as follows.
1. The client starts up a TCP/IP connection with the server.
2. The client sends the HTTP request.
3. The server sends the document, with image tags, but not the images.
4. The connection is destroyed.
5. The client establishes three new TCP/IP connections with the server.
6. The client hands over each HTTP request (for each image) via newly-established
connections.
7. The server sends the images.
8. The connections are destroyed.

Since we destroy the original connection at a time when we have not completed the
download (i.e. document and the images), the performance is degraded.
HTTP/1.1 proposes a solution for this drawback. It allows the client and the server to
establish persistent connections, allowing the client to continue with the existing
connection if it needs to download more resources.
If we used HTTP/1.1, the above conversation would be as follows:
1. The client starts up a TCP/IP connection with the server.
2. The client sends the HTTP request.
3. The server sends the document, with image tags, but not the images.
4. The client establishes two more TCP/IP connections with the server.
5. The client hands over each HTTP request (for each image) via the existing
connection and the newly-established connections.
6. The server sends the images.
7. The connections are destroyed.

In comparison to the previous example, we need to establish only two new connections,
saving the start-up time for one TCP/IP connection.
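
The connection counts in the two conversations can be written down as a small model. This is only a sketch of the scenario described above, in which every image is fetched on its own connection; the function names are hypothetical, and a client could equally well fetch all images one after another over a single persistent connection.

```python
def connections_http10(num_images):
    """HTTP/1.0: one connection for the page, then a fresh one per image."""
    return 1 + num_images

def connections_persistent(num_images):
    """HTTP/1.1 as described above: the page's connection stays open and
    carries one image; each remaining image gets a new connection."""
    return 1 + max(0, num_images - 1)

images = 3
print("HTTP/1.0:", connections_http10(images))
print("HTTP/1.1:", connections_persistent(images))
print("saved:", connections_http10(images) - connections_persistent(images))
```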

4. HTTP/1.0 and HTTP/1.1


In this chapter, we present a comparison between
the HTTP/1.0 and HTTP/1.1 protocols.

Comparison between HTTP/1.0 and HTTP/1.1


Some of the most significant differences between HTTP/1.0 and HTTP/1.1 are given
below.
Persistent TCP/IP Connections
As discussed in the above section, HTTP/1.1 connections remain open by default,
allowing the browser to download multiple resources using the same TCP/IP
session.
Partial Document Transfers
HTTP/1.1 allows browsers to obtain specific portions of documents by specifying
the start and end positions to be retrieved.
In addition, this protocol allows documents to be divided into logical chunks that
are handled independently.
This allows for caching schemes in which only those portions of the document that
have changed need to be downloaded from the web server.
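
In HTTP/1.1 the start and end positions are sent in a Range request header, counted in bytes from zero with both ends included. The sketch below shows the header a client would send and the slice a server would return; the function names and the sample document are assumptions made for illustration.

```python
def range_header(start, end):
    """Build the request header asking for bytes start..end (inclusive)."""
    return f"Range: bytes={start}-{end}"

def apply_range(document, start, end):
    """Return the requested portion, as the server would for that range."""
    # Byte ranges include the end position, hence the + 1 in the slice.
    return document[start:end + 1]

document = b"Hello, world!"
print(range_header(0, 4))
print(apply_range(document, 0, 4))
```

A server that honours the range replies with status 206 (Partial Content) instead of 200, so the client knows it has received only part of the document.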
Conditional Fetch
HTTP/1.0 allowed only a single type of conditional fetch - using the
If-Modified-Since header field.
HTTP/1.1 adds several additional types of conditional fetch (for example, the
If-Match and If-None-Match header fields), increasing the flexibility of this feature.
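
The decision a server makes for an If-Modified-Since request can be sketched as follows. This is a minimal illustration rather than the code of any real server: the function name conditional_status is our own, and we assume that the server knows the last modification time of the document as a UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # ordinary, unconditional request
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unreadable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have a resolution of one second.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Not Modified: no body needs to be sent
    return 200


if __name__ == "__main__":
    modified = datetime(2000, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
    header = format_datetime(modified, usegmt=True)
    print(header, "->", conditional_status(modified, header))
```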
Better Content Negotiation
HTTP/1.0 implements server-side content negotiation. The browser gives the
server a prioritized list of MIME types it is willing to accept, and the server
decides which version of a document to send. HTTP/1.1 adds client-side content
negotiation, in which the server announces what formats are available and the
browser picks the version it wants.
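
Server-side negotiation can be sketched as below. The browser's prioritized list arrives in the Accept header, where each MIME type may carry a quality value q between 0 and 1. The function names and the list of available formats are our own assumptions, and the matching is deliberately simplified.

```python
def parse_accept(header: str):
    """Return a list of (mime_type, q) pairs, most preferred first."""
    preferences = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if not parts[0]:
            continue
        quality = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        preferences.append((parts[0], quality))
    # sorted() is stable, so equal q values keep the order given by the browser
    return sorted(preferences, key=lambda pref: pref[1], reverse=True)


def choose_variant(accept_header: str, available):
    """Pick the first available format that the browser is willing to accept."""
    for mime, quality in parse_accept(accept_header):
        if quality <= 0:
            continue
        for variant in available:
            if mime in (variant, "*/*"):
                return variant
            if mime.endswith("/*") and variant.startswith(mime[:-1]):
                return variant
    return None


if __name__ == "__main__":
    accept = "text/html;q=0.8, image/png, */*;q=0.1"
    print(choose_variant(accept, ["text/html", "image/gif"]))
```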
Official Support for Nonstandard HTTP/1.0 Extensions
There were quite a few non-standard features that were used with the HTTP/1.0
protocol, and HTTP/1.1 makes several of them official. One example is the Host
header field, used to select a logical host from a server that houses several
virtual hosts. HTTP/1.1 requires this field in every request.
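
How a server might use the Host field to choose between virtual hosts is sketched below. The host names and directories are hypothetical, and the function name select_document_root is our own.

```python
def select_document_root(headers: dict, virtual_hosts: dict, default_root: str) -> str:
    """Choose a document root from the Host header of a request."""
    host = headers.get("Host", "")
    # Drop an optional ":port" suffix; host names are not case-sensitive.
    hostname = host.split(":", 1)[0].strip().lower()
    return virtual_hosts.get(hostname, default_root)


if __name__ == "__main__":
    # Hypothetical virtual hosts sharing one server
    hosts = {
        "www.example.com": "/usr/local/htdocs/example",
        "www.example.org": "/usr/local/htdocs/example_org",
    }
    request_headers = {"Host": "www.example.org:80"}
    print(select_document_root(request_headers, hosts, "/usr/local/htdocs"))
```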
Better Support for Alternative Character Sets
HTTP/1.1 provides better support for alternative character sets, such as those
used for Chinese and Japanese text.
More Flexible Authentication
HTTP/1.1 adds support for user authentication across firewalls and gateways. It
also provides an authentication mechanism (Digest authentication, described in
RFC 2617) based on the MD5 hash algorithm, which avoids the problem of sending
usernames and passwords across the network in plaintext. Cryptography is discussed
in Chapter 6.
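
The idea behind the MD5-based mechanism can be sketched as follows. The client never sends the password itself; it sends a hash computed from the password, the request, and a one-time value (the nonce) chosen by the server. The sketch follows the calculation given in RFC 2617 for the qop=auth case. The user name, realm, password and nonce values are made up for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of text as 32 hexadecimal digits."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the 'response' value of a Digest Authorization header."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who the user is
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being requested
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


if __name__ == "__main__":
    # All values below are hypothetical.
    response = digest_response(
        username="student", realm="COSC1300", password="secret",
        method="GET", uri="/index.html",
        nonce="abc123", nc="00000001", cnonce="xyz789",
    )
    print(response)
```

Because the nonce changes from one challenge to the next, a response captured on the network cannot simply be replayed later with a fresh nonce.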
For more information about performance improvements gained in HTTP/1.1, please
read the article W3C Recommendations - Reducing World Wide Wait and the paper
Network Performance Effects of HTTP/1.1, CSS1, and PNG.

Checkpoint
1. How does the server choose the protocol to be used, i.e., either HTTP/1.0 or
HTTP/1.1?


5. Web Servers
In this chapter, we discuss the installation,
configuration and running of a web server.


A web server is an application that listens for requests from a client (generally a web
browser), processes each request in some way, and sends a response. The language
used for this communication is HTTP, and the exchange is possible because there is a
logical HTTP connection between the client and the server.
The best way of understanding something is doing it for yourself. Installing and
configuring a web server is no exception. You will find most of the topics covered
here easier to understand if you spend some time installing your own server,
tweaking its configuration options, and experimenting with its performance.
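
Before installing a full server, it may help to see the request, process, respond cycle in miniature. The sketch below is not a real web server: it leaves out the network entirely and only shows how the bytes of a request could be turned into the bytes of a response. The document table and the function names are our own assumptions.

```python
# Hypothetical documents held by our toy server
DOCUMENTS = {"/index.html": b"<html><body>Hello</body></html>"}


def build_response(status: int, reason: str, body: bytes) -> bytes:
    """Assemble the status line, one header, a blank line and the body."""
    head = f"HTTP/1.0 {status} {reason}\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("ascii") + body


def handle_request(raw_request: bytes) -> bytes:
    """Process one HTTP request and return the complete response."""
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = request_line.split()
    if len(parts) != 3:
        return build_response(400, "Bad Request", b"")
    method, path, _version = parts
    if method != "GET":
        return build_response(501, "Not Implemented", b"")
    body = DOCUMENTS.get(path)
    if body is None:
        return build_response(404, "Not Found", b"")
    return build_response(200, "OK", body)


if __name__ == "__main__":
    print(handle_request(b"GET /index.html HTTP/1.0\r\n\r\n").decode("ascii"))
```

A real server such as Apache does the same job, but it also manages the TCP connections, reads files from the document root, and applies its configuration to every request.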

5.1 Why Apache?


There are a number of reasons why we chose Apache.

It is easy to install and configure - the installation and configuration are
straightforward and self-descriptive, so you should be able to install it
successfully without being an expert in the field.

It is open - the configuration is so open that you can see exactly the effect of each
change you make, which makes it ideal for teaching purposes.

It is open source and released under the Apache License - therefore we can
use it without licensing costs. For more information on license issues visit the
GNU web site.

It is the most popular web server today - approximately 60% of all web sites in the
world use Apache servers. For more information about web server usage statistics,
visit the Netcraft web server survey at http://www.netcraft.com/survey. You may wish
to use Netcraft's Exploring sites facility to detect the web servers running at your
favorite web sites.

It can be smoothly integrated with many other useful modules. For example, the
PHP scripting language can be accommodated in the Apache server as a module.

5.2 Apache Server Installation - Directory Structure


When installing a server from scratch using Apache source code, we need to store this
source code temporarily in a suitable place, traditionally /usr/local/src. It is always a
good idea to keep the source code directory away from the final server software
installation directories.
The location where we install the Apache server software is referred to as the
SERVER_ROOT. The installation process creates various subdirectories, including the
following, under the SERVER_ROOT.

htdocs is the DOCUMENT_ROOT of your web server, where you put the documents
you want to publish.
bin is where the executable scripts that come with Apache, such as apachectl
and apxs, are located.
conf is where the Apache configuration files, such as httpd.conf, are located.

5.3 Configuring Apache


Most configuration information for Apache is held in the file httpd.conf. Other files
can be pulled in by placing the Include filename directive in the configuration file.
Apache reads its configuration into memory when it is first started, so if you make any
changes to your configuration file, you will need to restart your server for them to take
effect. The configuration file is read from top to bottom, so if the same single-valued
directive appears twice, the later one takes effect.
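
The top-to-bottom rule can be pictured with the sketch below. It is not how Apache itself is implemented, and it ignores containers and directives that may legitimately appear several times. It only shows why a later setting replaces an earlier one when each directive holds a single value.

```python
def parse_directives(config_text: str) -> dict:
    """Read 'Name value' lines from top to bottom; later lines win."""
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[name] = value  # overwrites any earlier value
    return settings


if __name__ == "__main__":
    # Hypothetical configuration fragment
    sample = """
    # first setting
    ServerAdmin first@example.com
    ServerName www.example.com
    ServerAdmin second@example.com
    """
    print(parse_directives(sample))
```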
5.3.1 Server-Level Directives
Server-level directives apply to the server as a whole. Some directives, such as
LoadModule, only make sense at the server level. For other directives, it is useful to
set a default value which can be specifically overridden by container or per-directory
directives.
5.3.2 Container Directives
There are nine container directives that can be used in the Apache configuration file.
These directives are used to specify resources or request methods. The containers can
then include configuration directives specific to the matched entities. In most cases, the
resource can be specified with or without quotes.

The Match forms are used for matching multiple resources using regular expressions.

<Directory> and <DirectoryMatch>

These directives are used to match specific directories under the web document root.

<Directory "/usr/local/htdocs/php_examples">

This would match the directory /usr/local/htdocs/php_examples.

<DirectoryMatch "^/usr/local/htdocs/.*/[A-Z]{3}">

This would match directories below /usr/local/htdocs whose names begin with three
capital letters. Because the pattern contains .*/ the directory must sit at least one
level below htdocs.
<Files> and <FilesMatch>

These directives are used to match specific files under the web document root.

<Files "apache.gif">

This would match any file named apache.gif.

<FilesMatch "\.(jpeg|gif)$">

This would match any file ending with .gif or .jpeg.



<Location> and <LocationMatch>

These directives are used to match a URL. This means that the parameter does not have
to match the file system but may match files or directories.

<Location "/php_examples">

This would match the URL http://domainname/php_examples

<LocationMatch "(c|php|pl)_examples">

This would match any URL containing c_examples, php_examples or pl_examples,
such as http://domainname/pl_examples and http://domainname/php_examples

<Limit> and <LimitExcept>

These containers restrict the directives they enclose to the HTTP methods specified.

<Limit GET POST>

The directives entered in this container will only apply to requests made using the GET
and POST HTTP methods.

<LimitExcept HEAD>

The directives entered in this container will apply to requests made using any HTTP
method except HEAD.

<VirtualHost>

This container allows one server to serve files for multiple domains or IP addresses. It is
possible to override server-level directives in a VirtualHost container. For example,
each virtual host can have its own logs and web document root.
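
The regular expressions used in the Match examples of this section can be tried out before they go into a configuration file. The sketch below uses Python's re module, on the assumption that it treats these simple patterns the same way as Apache's regular expression library. The sample paths are made up.

```python
import re

# The three patterns from the Match examples
directory_match = re.compile(r"^/usr/local/htdocs/.*/[A-Z]{3}")
files_match = re.compile(r"\.(jpeg|gif)$")
location_match = re.compile(r"(c|php|pl)_examples")


def matches(pattern, candidate: str) -> bool:
    """True if the pattern is found anywhere in the candidate string."""
    return pattern.search(candidate) is not None


if __name__ == "__main__":
    samples = [
        (directory_match, "/usr/local/htdocs/courses/ABC"),
        (directory_match, "/usr/local/htdocs/ABC"),
        (files_match, "apache.gif"),
        (files_match, "photo.jpg"),
        (location_match, "/pl_examples"),
        (location_match, "/java_examples"),
    ]
    for pattern, candidate in samples:
        print(pattern.pattern, candidate, matches(pattern, candidate))
```

Note that /usr/local/htdocs/ABC does not match the first pattern, because the .*/ part requires one more directory level between htdocs and the three capital letters.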
5.3.3 Per-Directory Directives
Files containing directives can be placed in individual directories. The directives in
such a file apply to that directory and its subdirectories. The AllowOverride directive
controls the types of directives that can be placed in these files, while the
AccessFileName directive specifies what these files must be called. The default name
is .htaccess.
5.3.4 Order Allow,Deny
One of the most common tasks that a server administrator will want to perform is to
allow or deny access to certain resources. This is achieved by the use of the Order,
Allow and Deny directives. Allow and Deny can be used to specify hosts or networks, by
domain or IP address and allow or deny access to them. Order is used to specify the
order in which the Allow and Deny directives are evaluated.
Deny from 192.168.12.122

This will deny access from the host with the IP address 192.168.12.122.

Allow from 192.168

This will allow access from all hosts with an IP address beginning with 192.168.

Deny from yallara.cs.rmit.edu.au

This will deny access from the host yallara.cs.rmit.edu.au.

Allow from rmit.edu.au

This will allow access from all hosts in the rmit.edu.au domain.


Order Allow,Deny

This will force the evaluation of all Allow directives, followed by all Deny directives.

Order Deny,Allow
Deny from cs.rmit.edu.au
Allow from yallara.cs.rmit.edu.au

This will cause all hosts on RMIT's computer science network except yallara to be
denied access.
Note that the second argument of the Order directive also sets the default access for
hosts that match neither an Allow nor a Deny directive.
Order Allow,Deny

This will deny access to all hosts. While the default access does work, it is unclear and
should be avoided in favour of more explicit directives, such as below.
Order Allow,Deny
Deny from all
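
The way Order, Allow and Deny combine can be sketched as follows. This is a simplified model and not Apache's own code: a host is given as a single name or IP address, patterns are limited to all, a full or partial IP address, or a domain name, and the function names are our own. The host goanna.cs.rmit.edu.au is made up for the example.

```python
def host_matches(pattern: str, host: str) -> bool:
    """Match 'all', a full or partial IP address, or a domain name."""
    if pattern == "all":
        return True
    if pattern.replace(".", "").isdigit():
        prefix = pattern.rstrip(".")  # full or partial IP address
        return host == prefix or host.startswith(prefix + ".")
    return host == pattern or host.endswith("." + pattern)  # domain name


def is_allowed(order: str, allow, deny, host: str) -> bool:
    """Decide access for host under the given Order setting."""
    allowed = any(host_matches(p, host) for p in allow)
    denied = any(host_matches(p, host) for p in deny)
    order = order.replace(" ", "").lower()
    if order == "deny,allow":
        # Default is allow; an Allow match overrides a Deny match.
        return allowed or not denied
    if order == "allow,deny":
        # Default is deny; a Deny match overrides an Allow match.
        return allowed and not denied
    raise ValueError("Order must be Allow,Deny or Deny,Allow")


if __name__ == "__main__":
    allow = ["yallara.cs.rmit.edu.au"]
    deny = ["cs.rmit.edu.au"]
    for host in ["yallara.cs.rmit.edu.au", "goanna.cs.rmit.edu.au", "www.example.com"]:
        print(host, is_allowed("Deny,Allow", allow, deny, host))
```

With Order Allow,Deny and no Allow directives at all, is_allowed returns False for every host, which is the default access described above.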


6. Useful Tools
In this chapter, we present you with a list of Web
resources that could be useful in your studies.


In this section, you will find some useful software tools for setting up, configuring and
maintaining a web server.
Of course, you need Apache if you are going to install an Apache server.

If you think it is difficult to edit configuration files manually, try the TkApache
Graphical User Interface.

Apache server is bundled with RedHat Linux. If you run Linux at home, you
might like to install it using the .rpm format. Instructions can be found here.
Alternatively, you can order RedHat Linux from http://www.lsl.com.au.

You can download PHP source code or binaries for many platforms including
Win32 from http://www.php.net/downloads.php. The Australian mirror is
http://au.php.net/downloads.php.

Zend Optimizer is a very useful add-on for a PHP-capable Apache server. It
optimizes intermediate PHP code and enhances server performance. Zend
Optimizer can be downloaded from http://www.zend.com.

Jigsaw Web Server is W3C's leading-edge Web server platform, providing a
sample HTTP 1.1 implementation and a variety of other features on top of an
advanced architecture implemented in Java. The W3C Jigsaw Activity statement
explains the motivation and future plans in more detail. Jigsaw is a W3C Open
Source Project, started in May 1996.

Internet Information Server is Microsoft's candidate in the web server market.

A very popular relational database system used with web servers is MySQL.
The precompiled PHP binary for the Windows platform has built-in MySQL functions.

7. Related Links
1. http://www.apacheweek.com - ApacheWeek weekly online magazine. Read this
to find out what's happening in the Apache world.
2. http://www-genome.wi.mit.edu/WWW/resource_guide.html - Lincoln Stein's How
to Set Up and Maintain a Web Site home page.
3. http://serverwatch.internet.com/webservers.html - Web server technical details &
server comparison.
4. http://www.w3.org/Talks/1998/10/WAP-NG-Overview - W3C's presentation on
HTTP/ng (I guess HTTP/ng is not progressing).
5. http://Apache-Server.Com/tutorials - Ken Coar's Apache tutorials (author of
Apache Server for Dummies).
6. http://www8.org/w8-papers/5c-protocols/key/key.html - Key Differences between
HTTP/1.0 and HTTP/1.1 - A paper on HTTP/1.0 & HTTP/1.1.
7. http://developer.netscape.com/docs/manuals/enterprise.html - Netscape Enterprise
Server Documentation.
8. http://www.microsoft.com/ISN/whitepapers.com - Web Hosting with IIS 5.0 - A
review of Internet Information Server 5.0.
9. http://www.irt.org/articles/js177/index.htm - Apache at your Web Service, an
IRT document on Apache.
10. http://www.devshed.com/Server_Side/PHP/SoothinglySeamless - Devshed's
Apache+PHP+SSL+MySQL installation tutorial.

Contributors:
Santha Sumanasekara (santhas@cs.rmit.edu.au)
Michael Harris (miharris@cs.rmit.edu.au)
