HTTP Protocol
Table Of Contents
1. Introduction
2. Hypertext Transfer Protocol
2.1 Request Phase
2.2 Response Phase
3. Persistent Connections
4. Comparing HTTP/1.0 & HTTP/1.1
5. Web Servers
6. Useful Tools
7. Related Links
1. Introduction
In Chapter 2, we discussed the TCP/IP Network Protocol Suite and the functionality of
each layer. There we mentioned various application protocols in the Application Layer
of the TCP/IP hierarchy. Among these application layer protocols, the Hypertext
Transfer Protocol is one of the most important, and is the topic for this chapter.
HTTP is the language that Web Browsers (the client) and Web Servers (the server) use
to speak to each other. It is important to enforce a strict set of rules for this conversation,
as the client probably needs to communicate with many servers (e.g., you access this site
as well as many other web sites) and the server needs to communicate with many clients
(e.g., this site is accessed by many students). The Internet Engineering Task Force (IETF)
has released several RFCs (Requests for Comments) that outline HTTP and set the
standard for Web communication. The following table lists a few important RFCs
related to HTTP.
RFC Number   Purpose                              URL
1945         HTTP/1.0 Specifications              http://www.w3.org/Protocols/rfc1945/rfc1945
2616         HTTP/1.1 Specifications              http://www.ietf.org/rfc/rfc2616.txt
2617         HTTP Basic & Digest Authentication   http://www.ietf.org/rfc/rfc2617.txt
1.2 Terminology
There are a number of terms used in this chapter that have specific meanings in the
context of HTTP communication. A few of the most important terms are given below.
Connection
A transport layer virtual connection (TCP/IP connection in most cases) established
between the server and the client for the purpose of communication.
Message
The basic unit of HTTP communication.
Request
An HTTP request message sent by the client to the server.
Response
An HTTP response message sent by the server to the client.
Resource
A network data object or a service that can be identified by a URI.
Note: A resource may not necessarily be a web page; it could be any resource that
can be served via the network (e.g. a voice stream).
User Agent
The client which initiates the request. In most cases, this is a web browser.
Server
An application program that accepts connections, receives requests and sends back
responses. This is a very broad definition, and depending on the nature of the
requests being served, the server could be an origin server, proxy, or another type
of server.
The rest of the chapter focuses on the details of HTTP and discusses how this
protocol operates. In later sections, we discuss web server performance issues and
attempt to identify bottlenecks in web communication. Furthermore, we consider how a
web server can be tuned to maximize its performance under different conditions.
Checkpoint
1. As a part of the assignment, we asked you to set up an Apache web server on a
port > 50000. Why do you not use port 80?
2. What are the pros and cons of defining a protocol as a sequence of interactions
within a single session (like SMTP) versus one single REQUEST, one single
RESPONSE and then disconnecting (as in HTTP/1.0)?
Copyright 2000 RMIT Computer Science
All Rights Reserved
COSC1300
This command uses the GET method to request the document hello.html.
The methods supported by HTTP are discussed in Section 2.1.1.
2. Next, the client sends optional header information to inform the server of its
configuration and the document formats it will accept. All header information is given
as a <Header Name:Value> pair.
For example,
Connection:Keep-Alive
User-Agent:Mozilla/4.73
Accept:image/gif, image/jpeg, */*
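As a rough sketch of the two steps above, the following Python snippet assembles a request as plain text: the command line first, then one line per header, then an empty line. The function name build_request, the path /hello.html and the HTTP/1.0 version string are assumptions made for illustration; the header values are the ones from the example above.

```python
def build_request(method, path, headers, version="HTTP/1.0"):
    """Assemble an HTTP request: command line, header lines, blank line."""
    lines = [f"{method} {path} {version}"]
    # Each header is sent as a "Name: Value" pair on its own line.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # HTTP lines end with CRLF; an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical request for hello.html using the headers shown above.
request = build_request(
    "GET",
    "/hello.html",
    {
        "Connection": "Keep-Alive",
        "User-Agent": "Mozilla/4.73",
        "Accept": "image/gif, image/jpeg, */*",
    },
)
print(request)
```

A real browser sends more headers than this, but the overall shape of the message is the same.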
Checkpoint
1. What is the general format of an HTTP request and an HTTP response?
COSC1300 - Lecture Notes
Web Servers and Web Technology
2.1.1 Methods
In this section, we discuss the different methods used in the
Hypertext Transfer Protocol between clients and servers.
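The five methods listed below are available in both HTTP/1.0 and HTTP/1.1. HTTP/1.1 adds three more (OPTIONS, TRACE and CONNECT), which are described later in this section.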
Method   Description
GET      Returns the contents of the document.
HEAD     Returns the header information of the document.
POST     Treats the document as a script, executes it and sends the results.
PUT      Replaces the content of the document with some data.
DELETE   Deletes the document.
2. The user fills in the form, and hits the submit button.
At this point, the browser collects the form fields and their values, attaches them to the URL (of
the script that processes the form), and passes them back to the server using the GET method.
The command part of this request is of the following form:
GET /serve_drink.php?username=Citizen&favorite=Water&submit=Submit HTTP/1.1
3. The server executes the script (using the arguments it received) and sends the processed results
back to the client.
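The way the browser attaches form fields to the URL can be sketched in a few lines of Python. This is only an illustration: the function name form_to_get_command is an assumption, and urllib.parse.urlencode stands in for the browser's own encoding routine.

```python
from urllib.parse import urlencode


def form_to_get_command(script_path, fields, version="HTTP/1.1"):
    """Build the command line of a GET request that carries form data."""
    # Field names and values are joined as name=value pairs separated by "&",
    # with special characters percent-encoded.
    query = urlencode(fields)
    return f"GET {script_path}?{query} {version}"


command = form_to_get_command(
    "/serve_drink.php",
    {"username": "Citizen", "favorite": "Water", "submit": "Submit"},
)
print(command)
```

With these field values the result is the same command line as the one shown above.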
POST Method
This is another method the client can use to send a request to a web server. However, the server
responds in a different way this time. When the server receives a POST request, it passes
the request and its associated data to another program (or a script). In most cases, such a program
acts as a web gateway or a web interface to a database or another information system. The
program is executed and its result is returned to the web server, which in turn sends
the processed result back to the client. In general, the POST method can be thought of as a
"please do this for me" type of request.
Essentially, a POST request has three parts: the command, the request headers and additional data
required to process the request. For example, in a form processing program, this additional data
may contain form field values.
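The three parts of a POST request can be sketched as follows. This is a minimal illustration rather than what any particular browser sends: the function name build_post_request and the form field names x and y are hypothetical, and the form data is assumed to be URL-encoded.

```python
from urllib.parse import urlencode


def build_post_request(path, host, fields):
    """Assemble a POST request: command, request headers, then the data."""
    body = urlencode(fields)  # additional data, here URL-encoded form fields
    head = [
        f"POST {path} HTTP/1.1",  # the command
        f"Host: {host}",          # the request headers
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    # An empty line separates the headers from the data.
    return "\r\n".join(head) + "\r\n\r\n" + body


# Hypothetical field names; the real form fields are not given in the notes.
post = build_post_request(
    "/multiply.php", "goanna.cs.rmit.edu.au:2000", {"x": "6", "y": "7"}
)
print(post)
```

Unlike the GET case, the form data travels after the headers rather than in the URL, so the Content-Length header tells the server how many bytes of data to read.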
Fig 4. The conversation between the client and the server, when you POST
http://goanna.cs.rmit.edu.au:2000/multiply.php program.
Fig 4. The conversation between the client and the server, when you HEAD
http://goanna.cs.rmit.edu.au:2000/hello.html .
DELETE Method
This allows the client to ask the server to delete the document specified in the command line.
PUT Method
This allows the client to pass a document to be saved in the server's document tree.
OPTIONS Method
This method allows the client to determine the options associated with a resource or the
capabilities of a server, without initiating a retrieval.
TRACE Method
This allows the client to send a request to the server and have it echoed back. It is useful for checking
the connection and tracing its path.
CONNECT Method
This is a reserved method, used specifically for SSL tunnelling. (SSL is described in Chapter 6.)
Availability of methods in HTTP/1.0 & HTTP/1.1
Method    HTTP/1.0   HTTP/1.1
GET       Yes        Yes
POST      Yes        Yes
HEAD      Yes        Yes
DELETE    Yes        Yes
PUT       Yes        Yes
OPTIONS   No         Yes
TRACE     No         Yes
CONNECT   No         Yes
More about these methods can be found in RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1,
RFC 1945 - Hypertext Transfer Protocol -- HTTP/1.0, and Key differences between HTTP/1.0
and HTTP/1.1.
Checkpoint
1. If the server has multilingual support (it can deliver documents in different languages), how
does it determine in which language the document should be delivered?
Request Headers
The request header consists of an arbitrary number of header fields. Most of these
fields are informational, and are generally optional. The following table gives a list of
commonly used header fields and their meanings.
Header
Description
From
User-Agent
Referer
Accept
Accept-Encoding
Accept-Language
Authorization
If-Modified-Since
Content-Length
Connection
Host
Cookie
User-Agent
This header is useful for the server to generate custom-built pages. For example, the
server may deliver a Frames version of a document to a Netscape client, while
headers tell the server that it can accept plain text and HTML documents and Jpeg
images.
Accept-Encoding and Accept-Language specify, respectively, the compression methods that the
client can understand (and uncompress) and the client's language priorities.
If the same document is available in different languages, the server can determine the
document to deliver using the Accept-Language field.
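One way a server might pick a language from the Accept-Language field is sketched below. It is a simplified illustration, not how any particular server implements it: the function names are assumptions, and the matching rule (exact tag first, then the primary language, then a default) is a simplification of full content negotiation.

```python
def parse_accept_language(header):
    """Return the language tags in the header, most preferred first."""
    choices = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without a q value has the highest priority
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            choices.append((-quality, position, tag))
    # Sort by descending quality, keeping header order for ties.
    return [tag for _, _, tag in sorted(choices)]


def choose_language(header, available, default="en"):
    """Pick the first preferred language that the server can deliver."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # e.g. fall back from "en-au" to "en"
        if primary in available:
            return primary
    return default


print(choose_language("fr;q=0.8, de, en;q=0.5", {"en", "fr"}))
```

Here the client prefers German, which this hypothetical server does not have, so the next preference (French) is chosen.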
If-Modified-Since
This is used in caching schemes. In order to improve efficiency, most browsers keep a
copy of previously accessed documents in a browser cache, and display the local copy
when the user requests it again, rather than downloading it again. However, in order for
this to work well, the browser must check the remote server to make sure that the
document hasn't changed. If-Modified-Since is used by the browser to ask the server to
return the document only if it has changed since the specified date/time. Caching is
discussed in Chapter 6.
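The server's side of this check can be sketched as a single comparison. The function name conditional_status is an assumption for illustration; the sketch returns status code 304 (Not Modified) when the browser's copy is still current, and 200 when the full document should be sent.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the document is unchanged since the given date, else 200."""
    if if_modified_since is None:
        return 200  # no condition: always send the document
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unreadable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop the microseconds.
    unchanged = last_modified.replace(microsecond=0) <= since
    return 304 if unchanged else 200


# Hypothetical document, last changed on 1 August 2000 at 12:00 GMT.
last_change = datetime(2000, 8, 1, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_status("Wed, 02 Aug 2000 01:19:50 GMT", last_change))
```

When the answer is 304 the server sends only the headers, and the browser displays its cached copy.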
Connection
This header field is sent to the server to ask for special handling mechanisms. For
example, if the client wishes to establish a persistent connection for the entire
transaction, it can ask for a Keep-Alive connection.
Authorization
Authorization is used by various validation schemes. This will contain the name of the
authorization method and any other information expected by the validation method, such
as realm, username and password.
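For the Basic scheme described in RFC 2617, the client joins the username and password with a colon and encodes the result in Base64. The sketch below shows this; the function name basic_authorization is an assumption, and the credentials are the sample ones used in the RFC itself.

```python
import base64


def basic_authorization(username, password):
    """Build the Authorization header for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


print(basic_authorization("Aladdin", "open sesame"))
```

Note that Base64 is an encoding, not encryption: anyone who can read the request can recover the password.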
Cookie
The Cookie header field is defined in neither HTTP/1.0 nor HTTP/1.1. Nevertheless,
among all the request header fields, this is the most popular, and it is
used by millions of sites. Cookie is an extension provided by Netscape, and is widely used
to maintain state across web pages. It is used by the browser to send cookie values
that have been saved in the browser. Cookies are discussed in detail in Chapter 4.
How Cookies work on the client side
When the user types in a URL, the browser searches its "Cookies Database" to see if
there are any cookies associated with the requested page. If any such cookies exist (and
they have not expired), it attaches a Cookie header field to the request header (along
with the cookie <name=value> pairs) and sends it to the server.
E.g.:
Cookie: username=Citizen; favorite=Water
The way the cookies are stored in the browser varies from browser to browser. For
example, Netscape Navigator keeps them in a single file, while Internet Explorer stores
them in individual files.
On arrival of the request header, the server detaches the cookie and acts on the
received information. Most servers store cookie data in an environment variable called
HTTP_COOKIE and make them available for server-side scripts.
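A server-side script could read these values roughly as follows. This is a sketch under stated assumptions: the function name read_cookies is hypothetical, and the script is assumed to run in an environment where the server has set HTTP_COOKIE. Parsing is done with SimpleCookie from Python's standard http.cookies module.

```python
import os
from http.cookies import SimpleCookie


def read_cookies(environ=os.environ):
    """Return the cookies found in HTTP_COOKIE as a name -> value dictionary."""
    cookie = SimpleCookie()
    # If the variable is missing, there are simply no cookies to read.
    cookie.load(environ.get("HTTP_COOKIE", ""))
    return {name: morsel.value for name, morsel in cookie.items()}


# A dictionary stands in for the real environment in this example.
print(read_cookies({"HTTP_COOKIE": "username=Citizen; favorite=Water"}))
```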
The Set-Cookie header field is used in the response headers; the server uses it
to send cookies to be saved in the browser. We'll discuss the server side of the
cookie story later.
For more information about Cookies, please visit
http://developer.netscape.com/viewsource/archive/goodman_cookies.html.
Checkpoint
1. Why is the Host header required when requesting a resource from a virtual host?
2. How does Basic Authentication work at the client side?
3. At http://www.fruit.com, people can buy apples, oranges, and bananas. A
customer's basket might contain 0.5 kilograms of apples and 1.4 kilograms of bananas.
What command, and with which parameters, can such a web site use to store this
information in a cookie on the user's workstation?
Response Phase
Now it's the server's turn to respond to the client's request. Similar to the client request, the
server response consists of three components.
The Status Line
The Response Headers
The Response Body
The server first sends back a line, usually referred to as the status line, containing the
protocol version, a three-digit status code, and a text explanation of the status.
Code   Text
200    OK
201    Created
202    Accepted
204    No Content
301    Moved Permanently
302    Found
401    Unauthorized
403    Forbidden
404    Not Found
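The three parts of the status line can be separated with a simple split, as in the sketch below. The function name parse_status_line is an assumption for illustration.

```python
def parse_status_line(line):
    """Split a status line into protocol version, status code and text."""
    parts = line.rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    # The text explanation may contain spaces, or be missing altogether.
    text = parts[2] if len(parts) > 2 else ""
    return version, code, text


print(parse_status_line("HTTP/1.1 302 Found\r\n"))
```

Clients act on the numeric code; the text is there for human readers.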
Header             Description
Server             Name and version of the server software.
Date               The current date and time (GMT).
Last-Modified      Date on which the document was last modified.
Expires            Date on which the document expires.
Location           The location of the document. This is used when the document is retrieved from a redirected location.
MIME-Version       The MIME version used.
Content-Length     The length in bytes.
Content-Encoding   The compression method of this data.
Content-Language   The language in which this document is written.
Pragma             Additional information for the browser.
WWW-Authenticate   Used for authentication.
ETag               Unique identifier for the server resource.
Set-Cookie         Sets and sends a cookie to the browser.
WWW-Authenticate
This specifies the authorization scheme and the realm of authorization required for the
requested URL. When the client receives this header, it pops up a dialogue window for
the user to enter the username and the password.
e.g.: This site returns
WWW-Authenticate: BASIC realm="SameAsForums"
and when the client receives it for the first time, it displays the user authentication
dialogue box. This is covered in more detail in Chapter 4.
Content-Type
This describes the media type and the subtype of the response body. The server should
return media types that conform with the client's preferred formats. The client usually
specifies what it wishes to receive in its Accept request header.
ETag
This indicates an entity tag. This field provides the client with a unique identifier for the
server resource. It is highly unlikely that different server resources will have the same
entity tag. This tag provides a powerful mechanism for caching.
e.g:
ETag: "2f5cd-964-381e1bd6"
Set-Cookie
This is the server-side part of the Cookie communication. This header contains a
<name=value> pair (the actual cookie) which the server wants the client to maintain.
There are other optional fields the server may include in the header. The additional fields
include the expire date of the cookie and the path of the document tree to which this
cookie is attached. Cookies are discussed in more detail in Chapter 4.
e.g.:
Set-Cookie: username=Citizen; expires=Saturday, 29-Jul-00 12:30:00 GMT
This will store a cookie named "username" with the value "Citizen" in the client
browser, and it is attached to the current document.
It is possible to send a cookie that applies to a whole branch of the document tree or even
to more than one server.
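The structure of this header can be sketched with a small helper. The function name build_set_cookie is an assumption for illustration; path and domain are the optional fields from Netscape's cookie specification that widen a cookie's scope to a branch of the document tree or to several servers, and the domain value used below is hypothetical.

```python
def build_set_cookie(name, value, expires=None, path=None, domain=None):
    """Build a Set-Cookie header from a name=value pair and optional fields."""
    parts = [f"{name}={value}"]  # the cookie itself
    if expires:
        parts.append(f"expires={expires}")  # when the browser should discard it
    if path:
        parts.append(f"path={path}")        # branch of the document tree
    if domain:
        parts.append(f"domain={domain}")    # servers that should receive it
    return "Set-Cookie: " + "; ".join(parts)


print(build_set_cookie(
    "username", "Citizen",
    expires="Saturday, 29-Jul-00 12:30:00 GMT",
    path="/",
    domain=".rmit.edu.au",  # hypothetical domain, for illustration only
))
```

Setting the path to / asks the browser to return the cookie for every document on the server rather than only for the current one.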
Pragma
Pragma is used to send various instructions to the browser. A commonly-used hint is
no-cache, which tells the browser not to add the document into its local browser cache.
This is useful if the document is a result of a POST request and is generated on-the-fly by
a script and changes every time it is requested.
e.g:
Pragma: no-cache
Checkpoint
1. The client receives the following response header.
HTTP/1.1 302 Found
Date: Wed, 02 Aug 2000 01:19:50 GMT
Server: Apache/1.3.12 (Unix) PHP/4.0.0 mod_ssl/2.6.4 OpenSSL/0.9.5a
Location: https://yallara.cs.rmit.edu.au:8001/new_server.html
Connection: close
Content-Type: text/html; charset=iso-8859-1
Persistent Connections
One main drawback in HTTP/1.0 is that it requires a new TCP/IP connection be set up
and destroyed for each document transferred. This imposes a severe performance
degradation when a browser needs to fetch several URLs from the same server - a
common case when downloading a document that contains several images.
E.g. Let's assume that we want to download the following page:
<HTML><HEAD>
<TITLE>The multiple images example</TITLE></HEAD>
<BODY>
<IMG SRC="1.gif">
<IMG SRC="2.gif">
<IMG SRC="3.gif">
</BODY></HTML>
The entire conversation that takes place between the server and the client is as follows.
1.
2.
3.
4.
5.
6.
Since we destroy the original connection at a time when we have not completed the
download (i.e. document and the images), the performance is degraded.
HTTP/1.1 proposes a solution for this drawback. It allows the client and the server to
establish persistent connections, allowing the client to continue with the existing
connection if it needs to download more resources.
If we used HTTP/1.1, the above conversation would be as follows:
1.
2.
3.
4.
5.
In comparison to the previous example, we need to establish only two new connections,
saving the start-up time for one TCP/IP connection.
Checkpoint
1. How does the server choose the protocol to be used, i.e., either HTTP/1.0 or
HTTP/1.1?
5. Web Servers
In this chapter, we discuss the installation,
configuration and running of a web server.
A web server is an application that listens for requests from a client (generally a web
browser), processes each request in some way, and sends a response. The language
used for this communication is HTTP, and it is possible because there is a logical HTTP
connection between the client and the server.
The best way of understanding something is doing it for yourself. Installing and
configuring a web server is no exception. You will be able to understand most of the
topics that are covered easily if you spend some time installing your own server,
tweaking its configuration options, and experimenting with its performance.
Netcraft's "Exploring sites" facility to detect the web servers running at your
favorite web sites.
It can be smoothly integrated with many other useful modules. For example, the
PHP scripting language can be accommodated in the Apache server as a module.
htdocs is the DOCUMENT_ROOT of your web server, where you put the documents
you want to publish.
bin is where the executable scripts that come with Apache, such as apachectl
and apxs, are located.
conf is where the Apache configuration files, such as httpd.conf, are located.
The Match forms are used for matching multiple resources using regular expressions.
<Directory> and <DirectoryMatch>
These directives are used to match specific directories under the web document root.
<Directory "/usr/local/htdocs/php_examples">
This would match any directory under /usr/local/htdocs consisting of three capital
letters.
<Files> and <FilesMatch>
These directives are used to match specific files under the web document root.
<Files "apache.gif">
<Location> and <LocationMatch>
These directives are used to match a URL. This means that the parameter does not have
to match the file system but may match files or directories.
<Location "php_examples">
<Limit> and <LimitExcept>
These containers can be used to limit the scope of their effectiveness to the HTTP
methods specified.
<Limit GET POST>
The directives entered in this container will only apply to requests made using the GET
and POST HTTP methods.
<LimitExcept HEAD>
The directives entered in this container will apply to requests made using any HTTP
method except HEAD.
<VirtualHost>
This container allows one server to serve files for multiple domains or IP addresses. It is
possible to override server-level directives in a VirtualHost container. For example,
each virtual host can have its own logs and web document root.
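The idea of choosing a document root per domain can be sketched as a lookup on the Host request header. This is only an illustration of name-based virtual hosting, not Apache's implementation: the host names, directories and the function name document_root_for are all hypothetical, and only plain host names (no IPv6 literals) are handled.

```python
# Hypothetical virtual hosts, each with its own web document root.
VIRTUAL_HOSTS = {
    "www.example.com": "/usr/local/htdocs/example_com",
    "www.example.org": "/usr/local/htdocs/example_org",
}
DEFAULT_ROOT = "/usr/local/htdocs"  # used when no virtual host matches


def document_root_for(host_header):
    """Pick the document root for a request from its Host header."""
    host = (host_header or "").strip().lower()
    # The header may carry a port number, e.g. "www.example.com:8080".
    host = host.partition(":")[0]
    return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)


print(document_root_for("www.example.com:8080"))
```

Because every virtual host shares one IP address and port in this scheme, the Host header is the only thing that tells the server which site the client wants.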
5.3.3 Per-Directory Directives
Files can be placed in individual directories containing directives that will apply to that
directory and its subdirectories. The AllowOverride directive controls the types of
directives that can be placed in these files, while the AccessFileName directive specifies
what these files must be called. The default name is .htaccess.
5.3.4 Order Allow,Deny
One of the most common tasks that a server administrator will want to perform is to
allow or deny access to certain resources. This is achieved by the use of the Order,
Allow and Deny directives. Allow and Deny can be used to specify hosts or networks, by
domain or IP address and allow or deny access to them. Order is used to specify the
order in which the Allow and Deny directives are evaluated.
Deny from 192.168.12.122
This will deny from the host with the IP address 192.168.12.122.
Allow from 192.168
This will allow from all hosts with an IP address beginning with 192.168.
Deny from yallara.cs.rmit.edu.au
Order Deny,Allow
Deny from cs.rmit.edu.au
Allow from yallara.cs.rmit.edu.au
This will cause all hosts except yallara on RMIT's computer science network to be
denied access.
Note that the Order directive uses the second argument to provide default access.
Order Allow,Deny
This will deny access to all hosts. While the default access does work, it is unclear and
should be avoided in favour of more explicit directives, such as below.
Order Allow,Deny
Deny from all
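The way Order, Allow and Deny combine can be modelled in a few lines. The sketch below is a simplified model for study purposes, not Apache's code: the function names are assumptions, only "all", domain names and full or partial IP addresses are handled, and the client IP addresses in the example are hypothetical.

```python
def matches(spec, host, ip):
    """Return True if one Allow/Deny argument matches the client."""
    if spec == "all":
        return True
    if spec[0].isdigit():
        # Full or partial IP address: compare whole octets only.
        return ip == spec or ip.startswith(spec.rstrip(".") + ".")
    # Domain name: match the host itself or any host under that domain.
    host, spec = host.lower(), spec.lower().lstrip(".")
    return host == spec or host.endswith("." + spec)


def access_allowed(order, allow, deny, host, ip):
    """Evaluate the Allow and Deny lists in the sequence given by Order."""
    allowed = any(matches(spec, host, ip) for spec in allow)
    denied = any(matches(spec, host, ip) for spec in deny)
    if order == "Deny,Allow":
        # Deny is evaluated first and Allow can override it; default is allow.
        return allowed or not denied
    if order == "Allow,Deny":
        # Allow is evaluated first and Deny can override it; default is deny.
        return allowed and not denied
    raise ValueError(f"unknown order: {order}")


# The Deny,Allow example from above, with made-up client addresses.
rules = ("Deny,Allow", ["yallara.cs.rmit.edu.au"], ["cs.rmit.edu.au"])
print(access_allowed(*rules, "yallara.cs.rmit.edu.au", "10.0.0.1"))
print(access_allowed(*rules, "goanna.cs.rmit.edu.au", "10.0.0.2"))
```

In this model the second argument of Order decides both which list has the final say and what happens to a client that matches neither list.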
6. Useful Tools
In this chapter, we present you with a list of Web
resources that could be useful in your studies.
In this section, you will find some useful software tools for setting up, configuring and
maintaining a web server.
Of course, you need Apache if you are going to install an Apache server.
If you think it is difficult to edit configuration files manually, try TkApache
Graphical User Interface.
Apache server is bundled with RedHat Linux. If you run Linux at home, you
might like to install using .rpm format. Instructions can be found here.
Alternatively, you can order RedHat Linux from http://www.lsl.com.au.
You can download PHP source code or binaries for many platforms including
Win32 from http://www.php.net/downloads.php. The Australian mirror is
http://au.php.net/downloads.php.
Zend Optimizer is a very useful add-on for a PHP-capable Apache server. It
optimizes intermediate PHP code and enhances server performance. Zend
Optimizer can be downloaded from http://www.zend.com.
Jigsaw Web Server is W3C's leading-edge Web server platform, providing a
sample HTTP 1.1 implementation and a variety of other features on top of an
advanced architecture implemented in Java. The W3C Jigsaw Activity statement
explains the motivation and future plans in more detail. Jigsaw is a W3C Open
Source Project, started May 1996.
7. Related Links
1. http://www.apacheweek.com - ApacheWeek weekly online magazine. Read this
to find out what's happening in the Apache world.
2. http://www-genome.wi.mit.edu/WWW/resource_guide.html - Lincoln Stein's "How
to Set Up and Maintain a Web Site" home page.
3. http://serverwatch.internet.com/webservers.html - Web server technical details &
server comparison.
4. http://www.w3.org/Talks/1998/10/WAP-NG-Overview - W3C's presentation on
HTTP/ng (HTTP/ng does not appear to be progressing).
5. http://Apache-Server.Com/tutorials - Ken Coar's Apache tutorials (author of
Apache Server for Dummies).
6. http://www8.org/w8-papers/5c-protocols/key/key.html - Key Differences between
HTTP/1.0 and HTTP/1.1 - A paper on HTTP/1.0 & HTTP/1.1.
7. http://developer.netscape.com/docs/manuals/enterprise.html - Netscape Enterprise
Server documentation.
8. http://www.microsoft.com/ISN/whitepapers.com - Web Hosting with IIS 5.0 - A
review of Internet Information Server 5.0.
9. http://www.irt.org/articles/js177/index.htm - "Apache at your Web Service", an
IRT document on Apache.
10. http://www.devshed.com/Server_Side/PHP/SoothinglySeamless - Devshed's
Apache+PHP+SSL+MySql installation tutorial.
Contributors:
Santha Sumanasekara (santhas@cs.rmit.edu.au)
Michael Harris (miharris@cs.rmit.edu.au)