Introduction to Internet

The Internet is a network of networks that connects computers all over the world. The
Internet has its roots in the U.S. military, which funded a network in 1969, called the ARPANET
(Advanced Research Projects Agency Network), to connect the computers at some of the colleges and
universities where military research took place. As more computers connected, the ARPANET
was replaced by the NSFNET, which was run by the National Science Foundation. By
the late 1990s, the Internet had shed its military and research heritage and was available for use
by the general public. Internet service providers (ISPs) began offering dial-up Internet accounts
for a monthly fee, giving users access to e-mail, discussion groups, and file transfer. In 1989, the
World Wide Web (an Internet-based system of interlinked pages of information) was born, and in
the early 1990s, the combination of e-mail, the Web, and online chat propelled the Internet into
national and international prominence.

Computers connected to the Internet communicate by using the Internet Protocol (IP), which
slices information into packets (chunks of data to be transmitted separately) and routes them
to their destination. One definition of the Internet is all the computers that pass packets to each
other by using IP. Along with IP, most computers on the Internet communicate with the
Transmission Control Protocol (TCP), and the combination is called TCP/IP.

HISTORY OF INTERNET AND WWW


The history of the Internet is best explained via a timeline. While the timeline begins in 1969,
some general comments on the 1960s are presented here first. The history of the Internet is
fascinating both for itself and as a case of technological innovation.

1960s Telecommunications
Essential to the early Internet concept was packet switching, in which data to be transmitted
is divided into small packets of information and labeled to identify the sender and recipient. The
packets were sent over a network and then reassembled at their destination. If any packet did not
arrive or was not intact, the original sender was requested to resend the packet. Prior to packet
switching, the less efficient circuit switching method of data transmission was used. In the early
1960s, several papers on packet switching theory were written, laying the groundwork for
computer networking as it exists today.
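The packet-switching idea described above can be illustrated with a small sketch. This is not how any real network stack is written; the packet layout (a dictionary with `from`, `to`, `seq`, `total`, and `data` keys), the function names, and the host names are assumptions made only for illustration.

```python
import random

def packetize(message: bytes, sender: str, recipient: str, size: int = 8):
    """Split a message into labeled packets of at most `size` bytes each."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [
        {"from": sender, "to": recipient, "seq": n, "total": len(chunks), "data": chunk}
        for n, chunk in enumerate(chunks)
    ]

def reassemble(packets):
    """Return (message, missing). If packets are missing, message is None and
    `missing` lists the sequence numbers the sender would be asked to resend."""
    if not packets:
        return None, []
    total = packets[0]["total"]
    by_seq = {p["seq"]: p["data"] for p in packets}
    missing = [n for n in range(total) if n not in by_seq]
    if missing:
        return None, missing
    return b"".join(by_seq[n] for n in range(total)), []

# Hypothetical hosts; packets may arrive in any order, so shuffle them first.
packets = packetize(b"Packets may arrive in any order.", "host-a", "host-b")
random.shuffle(packets)
message, missing = reassemble(packets)
```

Because every packet carries its own sequence number, the receiver can rebuild the message regardless of arrival order and can tell exactly which packets to request again.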

 ARPANET, 1969

In 1969, Bolt, Beranek, and Newman, Inc. (BBN) designed a network called the Advanced
Research Projects Agency Network (ARPANET) for the United States Department of Defense.
The military created ARPA to enable researchers to share "supercomputing" power. Initially
only 4 nodes (or hosts) comprised the ARPANET. They were located at the University of
California at Los Angeles, the University of California at Santa Barbara, the University of Utah,
and the Stanford Research Institute. The ARPANET later became known as the Internet.

1970s Telecommunications
In this decade, the ARPANET was used primarily by the military, some of the larger companies,
such as IBM, and universities for e-mail. The general population was not yet connected to the
system and very few people were online at work.

The use of local area networks (LANs) became more prevalent during the 1970s. Also, the idea
of an open architecture was promoted; that is, networks making up the ARPANET could have
any designs. In later years, this concept had a tremendous impact on the growth of the
ARPANET.

 1972

By 1972, the ARPANET was international, with nodes in Europe at University College
London, England, and the Royal Radar Establishment in Norway. The number of nodes on the
network was up to 23, and the trend would be for that number to double every year from then on.
Ray Tomlinson, who worked at BBN, invented e-mail.

 UUCP, 1976

AT&T Bell Labs developed UNIX-to-UNIX Copy (UUCP). In 1977, UUCP was distributed with
UNIX.

 USENET, 1979

User Network (USENET) was started by using UUCP to connect Duke University and the
University of North Carolina at Chapel Hill. Newsgroups emerged from this early development.

1980s Telecommunications
In this decade, Transmission Control Protocol/Internet Protocol (TCP/IP), a set of rules
governing how networks making up the ARPANET communicate, was established. For the first
time, the term "Internet" was being used to describe the ARPANET. Security became a concern,
as viruses appeared and electronic break-ins occurred.

The 1980s saw the Internet grow beyond being predominantly research oriented to including
business applications and supporting a wide range of users. As the Internet became larger, the
Domain Name System (DNS) was developed to allow the network to expand more easily by
assigning names to host computers in a distributed fashion.

 CSNET, 1980
The Computer Science Network (CSNET) connected all university computer science
departments in the United States. Computer science departments were relatively new, and only a
limited number existed in 1980. CSNET joined the ARPANET in 1981.

 BITNET, 1981

The Because It's Time Network (BITNET) formed at the City University of New York and
connected to Yale University. Many mailing lists originated with BITNET.

 TCP/IP, 1983

The United States Defense Communications Agency required that TCP/IP be used for all
ARPANET hosts. Since TCP/IP was distributed at no charge, the Internet became what is called
an open system. This allowed the Internet to grow quickly, as all connected computers were now
"speaking the same language." Central administration was no longer necessary to run the
network.

 NSFNET, 1985

The National Science Foundation Network (NSFNET) was formed to connect the National
Science Foundation's five supercomputing centers. This allowed researchers to access the most
powerful computers in the world, at a time when large, powerful, and expensive computers were
a rarity and generally inaccessible.

 The Internet Worm and IRC, 1988

The Internet Worm (created by Robert Morris while he was a computer science graduate student
at Cornell University) was released. It infected 10 percent of all Internet hosts. Also in this year,
Internet Relay Chat (IRC) was written by Jarkko Oikarinen.

 NSF's control of the ARPANET, 1989

NSF took over control of the ARPANET in 1989. This changeover went unnoticed by nearly all
users. Also, the number of hosts on the Internet exceeded the 100,000 mark.

1990s Telecommunications
During the 1990s, many commercial organizations started getting online. This stimulated the
growth of the Internet like never before. Graphical browsing tools were developed, and the
markup language HTML allowed users all over the world to publish on what was called
the World Wide Web. Millions of people went online to work, shop, bank, and be entertained. The
Internet played a much more significant role in society, as many nontechnical users from all
walks of life got involved with computers.

 GOPHER, 1991
Gopher was developed at the University of Minnesota. Gopher allows users to fetch files on the
Internet using a menu-based system.

 World Wide Web, 1991

The World Wide Web (WWW) was created by Tim Berners-Lee at CERN (a French acronym
for the European Laboratory for Particle Physics) as a simple way to publish information and
make it available on the Internet.

 WWW, 1992

The interesting nature of the Web caused it to spread, and it became available to the public in
1992.

 Mosaic, 1993

Mosaic, a graphical browser for the Web, was released by Marc Andreessen and several other
graduate students at the University of Illinois. Mosaic was first released under X Windows and
graphical UNIX.

 Netscape Communications, 1994

The company called Netscape Communications, formed by Jim Clark, released Netscape
Navigator, a Web browser that captured the imagination of everyone who used it.

 Yahoo!, 1994

Stanford graduate students David Filo and Jerry Yang developed their Internet search engine
and directory called Yahoo!

 Java, 1995

The Internet programming environment, Java, was released by Sun Microsystems, Inc. This
language, originally called Oak, allowed programmers to develop Web pages that were more
interactive.

 Microsoft discovers the Internet, 1995

The software giant committed many of its resources to developing its browser, Microsoft
Internet Explorer, and Internet applications.

Hypertext Transfer Protocol (HTTP)

The Hypertext Transfer Protocol (HTTP) is an application-level, TCP/IP-based protocol with the
lightness and speed necessary for distributed, collaborative, hypermedia information systems
such as the World Wide Web.

HTTP Overview

HTTP stands for Hypertext Transfer Protocol. It is a TCP/IP-based communication protocol
which is used to deliver virtually all files and other data, collectively called resources, on the
World Wide Web. These resources could be HTML files, image files, query results, or anything
else.

A browser works as an HTTP client because it sends requests to an HTTP server, which is
called a Web server. The Web server then sends responses back to the client. The standard and
default port for HTTP servers to listen on is 80, but it can be changed to any other port, such
as 8080.

There are three important things about HTTP of which you should be aware:

 HTTP is connectionless: The client opens a connection, sends a request, and receives the
response over that same connection. In HTTP/1.0 the server then closes the connection, so
each new request requires a new connection.

 HTTP is media independent: Any type of data can be sent by HTTP as long as both the
client and server know how to handle the data content. How content is handled is
determined by the MIME specification.

 HTTP is stateless: This is a direct result of HTTP's being connectionless. The server and
client are aware of each other only during a request. Afterwards, each forgets the other.
For this reason neither the client nor the server retains information between different
requests across web pages.

[Diagram: where the HTTP protocol fits in communication]

HTTP Message Structure

Like most network protocols, HTTP uses the client-server model: An HTTP client opens a
connection and sends a request message to an HTTP server; the server then returns a response
message, usually containing the resource that was requested. After delivering the response, the
server closes the connection.

The request and response messages have a similar format, with the following structure:

 An initial line, ending in CRLF
 Zero or more header lines, each ending in CRLF
 A blank line, i.e. a CRLF by itself
 An optional message body, such as a file, query data, or query output

Initial lines and headers should end in CRLF. More exactly, CR and LF here mean ASCII values
13 and 10.
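The four-part structure above can be sketched as a pair of helper functions. The names `build_message` and `split_message` are hypothetical, and the sketch assumes the initial line and headers are plain ASCII.

```python
CRLF = "\r\n"  # carriage return (ASCII 13) followed by line feed (ASCII 10)

def build_message(initial_line: str, headers: dict, body: bytes = b"") -> bytes:
    """Assemble: initial line, zero or more header lines, a blank line, optional body."""
    lines = [initial_line] + [f"{name}: {value}" for name, value in headers.items()]
    head = CRLF.join(lines) + CRLF + CRLF  # the second CRLF is the blank line
    return head.encode("ascii") + body

def split_message(raw: bytes):
    """Split a raw message back into (initial line, header lines, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    initial_line, *header_lines = head.decode("ascii").split(CRLF)
    return initial_line, header_lines, body

raw = build_message("HTTP/1.0 200 OK", {"Content-Type": "text/html"}, b"<p>hi</p>")
```

The blank line is what lets a receiver find where the headers stop and the body starts, which is why `split_message` partitions on the first empty line.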

Initial Line: Request

The initial line is different for the request than for the response. A request line has three parts,
separated by spaces:

 An HTTP method name.
 The local path of the requested resource.
 The version of HTTP being used.

Here is an example of the initial line of a request message:

GET /path/to/file/index.html HTTP/1.0

 GET is the most common HTTP method. Other methods include POST and HEAD.
 The path is the part of the URL after the host name. This path is also called the request
Uniform Resource Identifier (URI). A URI is like a URL, but more general.
 The HTTP version always takes the form "HTTP/x.x", uppercase.
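As a rough sketch, a request line can be split into its three parts as follows. The function name `parse_request_line` is an assumption, and the version check only enforces the uppercase "HTTP/x.x" shape described above.

```python
import re

def parse_request_line(line: str):
    """Split a request line into (method, path, version)."""
    # Unpacking raises ValueError if there are not exactly three parts.
    method, path, version = line.strip().split(" ")
    if not re.fullmatch(r"HTTP/\d+\.\d+", version):
        raise ValueError(f"not an HTTP version: {version!r}")
    return method, path, version

parts = parse_request_line("GET /path/to/file/index.html HTTP/1.0")
```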
Initial Line: Response

The initial response line, called the status line, also has three parts separated by spaces:

 The version of HTTP being used.
 A response status code that gives the result of the request.
 An English reason phrase describing the status code.

Here is an example of the initial line of a response message:

HTTP/1.0 200 OK

or

HTTP/1.0 404 Not Found
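A status line can be taken apart in the same spirit. In this sketch the function names are hypothetical; the split is limited to two spaces because the reason phrase itself may contain spaces, and the first digit of the code gives its general class.

```python
def parse_status_line(line: str):
    """Split a status line into (version, numeric code, reason phrase)."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason

def status_class(code: int) -> str:
    """Map a status code to its general class using the first digit."""
    classes = {1: "informational", 2: "success", 3: "redirection",
               4: "client error", 5: "server error"}
    return classes.get(code // 100, "unknown")

ok = parse_status_line("HTTP/1.0 200 OK")
missing = parse_status_line("HTTP/1.0 404 Not Found")
```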

Header Lines

Header lines provide information about the request or response, or about the object sent in the
message body.

The header lines are in the usual text header format, which is: one line per header, of the form
"Header-Name: value", ending with CRLF. It's the same format used for email and news
postings, defined in RFC 822.

 A header line should end in CRLF, but you should handle LF correctly.
 The header name is not case-sensitive.
 Any number of spaces or tabs may be between the ":" and the value.
 Header lines beginning with space or tab are actually part of the previous header line,
folded into multiple lines for easy reading.

Here is an example of a header line:

User-agent: Mozilla/3.0Gold

or

Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT
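The header rules listed above (CRLF or bare LF line endings, case-insensitive names, optional whitespace after the colon, and folded continuation lines) can be sketched in one small parser. The function name and the `X-Note` header in the usage lines are made up for illustration.

```python
def parse_headers(text: str) -> dict:
    """Parse header lines into a dict keyed by lower-cased header name."""
    headers = {}
    last = None
    # Accept CRLF as well as a bare LF as the line terminator.
    for line in text.replace("\r\n", "\n").split("\n"):
        if not line:
            continue
        if line[0] in (" ", "\t") and last is not None:
            # A line starting with space or tab continues the previous header.
            headers[last] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last = name.strip().lower()      # header names are not case-sensitive
        headers[last] = value.strip()    # drop spaces or tabs around the value
    return headers

sample = (
    "User-agent: Mozilla/3.0Gold\r\n"
    "Last-Modified:   Fri, 31 Dec 1999 23:59:59 GMT\n"
    "X-Note: first part\r\n"
    "\tsecond part\r\n"
)
parsed = parse_headers(sample)
```

Only the first colon separates name from value, so values that contain colons themselves (such as times) survive intact.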

The Message Body

An HTTP message may have a body of data sent after the header lines. In a response, this is
where the requested resource is returned to the client (the most common use of the message
body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data
or uploaded files are sent to the server.

If an HTTP message includes a body, there are usually header lines in the message that describe
the body. In particular:

 The Content-Type: header gives the MIME-type of the data in the body, such
as text/html or image/gif.
 The Content-Length: header gives the number of bytes in the body.
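A short sketch of how these two headers relate to the body. The helper names are hypothetical. Note that Content-Length counts bytes, not characters, so the text must be encoded before it is measured.

```python
import io

def describe_body(body: bytes, mime_type: str) -> dict:
    """Build the two headers that describe a message body."""
    return {"Content-Type": mime_type, "Content-Length": str(len(body))}

def read_body(stream, headers: dict) -> bytes:
    """Read exactly as many bytes as the Content-Length header announces."""
    length = int(headers.get("Content-Length", 0))
    return stream.read(length)

body = "café".encode("utf-8")            # 4 characters, but 5 bytes in UTF-8
headers = describe_body(body, "text/html")
received = read_body(io.BytesIO(body + b"extra bytes"), headers)
```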

HTTP Methods

 The GET Method

The GET method means: retrieve whatever information (in the form of an entity) is identified by
the Request-URI. If the Request-URI refers to a data-producing process, it is the produced data
which shall be returned as the entity in the response and not the source text of the process, unless
that text happens to be the output of the process.

A conditional GET method requests that the identified resource be transferred only if it has been
modified since the date given by the If-Modified-Since header. The conditional GET method is
intended to reduce network usage by allowing cached entities to be refreshed without requiring
multiple requests or transferring unnecessary data.

The GET method can also be used to submit forms. The form data is URL-encoded and
appended to the request URI after a "?".
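As an illustration of form submission with GET, the standard library's `urllib.parse` can do the URL-encoding. The helper name `form_to_get_url` is an assumption, and the host and field names are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def form_to_get_url(base_url: str, fields: dict) -> str:
    """URL-encode the form fields and append them to the URL after '?'."""
    return base_url + "?" + urlencode(fields)

url = form_to_get_url(
    "http://myhost.com/mypath/myscript.cgi",
    {"home": "Mosby", "favorite flavor": "flies"},
)
# The server side can reverse the encoding to recover the fields.
decoded = parse_qs(urlsplit(url).query)
```

Spaces become `+` and fields are joined with `&`, so the whole submission fits in a single URL.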

 The HEAD Method

A HEAD request is just like a GET request, except it asks the server to return the response
headers only, and not the actual resource (i.e. no message body). This is useful to check
characteristics of a resource without actually downloading it, thus saving bandwidth. Use HEAD
when you don't actually need a file's contents.

The response to a HEAD request must never contain a message body, just the status line and
headers.
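The difference between GET and HEAD can be observed with a throwaway local server. This is only a sketch using Python's standard `http.server` and `http.client` modules; the handler class, the page content, and the `fetch` helper are made up for the demonstration, and the server binds to a free port on 127.0.0.1.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello</body></html>"

class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)   # GET: headers followed by the resource

    def do_HEAD(self):
        self._send_headers()     # HEAD: same headers, no message body

    def log_message(self, format, *args):
        pass                     # keep the demo quiet

def fetch(method: str, port: int):
    """Send one request and return (status, Content-Length header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/")
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Content-Length"), resp.read())
    conn.close()
    return result

server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
get_result = fetch("GET", port)
head_result = fetch("HEAD", port)
server.shutdown()
server.server_close()
```

Both responses report the same Content-Length, but only the GET response actually carries that many bytes.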

 The POST Method

A POST request is used to send data to the server to be processed in some way, like by a CGI
script. A POST request is different from a GET request in the following ways:

 There's a block of data sent with the request, in the message body. There are usually extra
headers to describe this message body, like Content-Type: and Content-Length:
 The request URI is not a resource to retrieve; it's usually a program to handle the data
you're sending.
 The HTTP response is normally program output, not a static file.

The most common use of POST, by far, is to submit HTML form data to CGI scripts. In this
case, the Content-Type: header is usually application/x-www-form-urlencoded, and
the Content-Length: header gives the length of the URL-encoded form data. The CGI script
receives the message body through STDIN, and decodes it. Here's a typical form submission,
using POST:

POST /path/script.cgi HTTP/1.0
From: frog@jmarshall.com
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

home=Mosby&favorite+flavor=flies
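The form submission above can be reproduced with a few lines of code. This is a sketch only; `build_post_request` is a hypothetical helper that encodes the fields, measures the encoded body, and assembles the raw request bytes.

```python
from urllib.parse import urlencode

def build_post_request(path: str, fields: dict, extra_headers=None) -> bytes:
    """Assemble a raw HTTP/1.0 POST request carrying URL-encoded form data."""
    body = urlencode(fields).encode("ascii")
    headers = dict(extra_headers or {})
    headers["Content-Type"] = "application/x-www-form-urlencoded"
    headers["Content-Length"] = str(len(body))   # length of the encoded body in bytes
    lines = [f"POST {path} HTTP/1.0"] + [f"{k}: {v}" for k, v in headers.items()]
    return "\r\n".join(lines).encode("ascii") + b"\r\n\r\n" + body

request = build_post_request(
    "/path/script.cgi",
    {"home": "Mosby", "favorite flavor": "flies"},
    {"From": "frog@jmarshall.com", "User-Agent": "HTTPTool/1.0"},
)
```

The encoded body `home=Mosby&favorite+flavor=flies` is 32 bytes long, which matches the Content-Length value in the example.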

 GET vs. POST Methods

If you were writing a CGI script directly, i.e. not using PHP but Perl, shell, C, or another
language, you would have to pay attention to where you get the user's variable/value
pairs. In the case of GET you would read the QUERY_STRING environment variable,
and in the case of POST you would use the CONTENT_LENGTH environment variable to
control how much of STDIN you read as you parse for special characters to extract each
variable and its value.
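A minimal sketch of that decision, written in Python for illustration. The function name `read_form_data` is an assumption; the environment and input stream are passed in as arguments so the logic can be tried outside a real Web server. `REQUEST_METHOD`, `QUERY_STRING`, and `CONTENT_LENGTH` are the standard CGI variable names.

```python
import io
from urllib.parse import parse_qs

def read_form_data(environ: dict, stdin) -> dict:
    """Return the submitted form fields for either a GET or a POST request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the data is in the message body; read exactly CONTENT_LENGTH bytes.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii")
    else:
        # GET: the data is the part of the URL after '?'.
        raw = environ.get("QUERY_STRING", "")
    # Keep only the first value of each field to keep the sketch simple.
    return {name: values[0] for name, values in parse_qs(raw).items()}

get_fields = read_form_data(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "name1=value1&name2=value2"},
    io.BytesIO(b""),
)
post_fields = read_form_data(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "32"},
    io.BytesIO(b"home=Mosby&favorite+flavor=flies"),
)
```

In a real CGI script the two arguments would typically be `os.environ` and `sys.stdin.buffer`.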

POST Method:

 Query length can be unlimited (unlike in GET)

 Is used to send a chunk of data to the server to be processed.

 You can send entire files using post.

 Your form data is attached to the end of the POST request (as opposed to the URL).

 Not as quick and easy as using GET, but more versatile (provided that you are writing the
CGI directly).

GET Method :

 Your entire form submission can be encapsulated in one URL, like a hyperlink, so a query
can be stored as just a URL.

 You can access the CGI program with a query without using a form.

 The form data is fully included in the URL:
http://myhost.com/mypath/myscript.cgi?name1=value1&name2=value2

 Is how your browser downloads most files.

 Don't use GET if you want to log each request.

 Is used to get a file or other resource.

HTTP Header Fields

Header lines provide information about the request or response, or about the object sent in the
message body.

 Allow

The Allow entity-header field lists the set of methods supported by the resource identified by the
Request-URI. The purpose of this field is strictly to inform the recipient of valid methods
associated with the resource.

Example

Allow: GET, HEAD

 Authorization

The Authorization field value consists of credentials containing the authentication information of
the user agent for the realm of the resource being requested.

Example

Authorization: credentials
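The text above does not say what the credentials look like. As one illustration, assuming the Basic authentication scheme, they are the word "Basic" followed by the base64 encoding of "user:password". The function names and the user name and password below are examples only.

```python
import base64

def basic_credentials(user: str, password: str) -> str:
    """Build an Authorization field value for the Basic scheme (assumed here)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def decode_basic(value: str):
    """Recover (user, password) from a Basic Authorization field value."""
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password

header_value = basic_credentials("Aladdin", "open sesame")
```

Base64 is an encoding, not encryption: anyone who sees the header can decode it, as `decode_basic` shows.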

 Content-Encoding

The Content-Encoding entity-header field is used as a modifier to the media-type. When present,
its value indicates what additional content coding has been applied to the resource, and thus what
decoding mechanism must be applied in order to obtain the media-type referenced by the
Content-Type header field. The Content-Encoding is primarily used to allow a document to be
compressed without losing the identity of its underlying media type.

Example

Content-Encoding: x-gzip
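A sketch of how a content coding wraps a body without changing its media type, using Python's standard `gzip` module. The helper names are hypothetical, and the sketch treats `gzip` and `x-gzip` as the same coding.

```python
import gzip

def encode_body(body: bytes, mime_type: str):
    """Compress a body and describe it; Content-Type still names the original type."""
    compressed = gzip.compress(body)
    headers = {
        "Content-Type": mime_type,
        "Content-Encoding": "x-gzip",
        "Content-Length": str(len(compressed)),  # length of what is actually sent
    }
    return headers, compressed

def decode_body(headers: dict, payload: bytes) -> bytes:
    """Undo the content coding, if any, to obtain the original media."""
    if headers.get("Content-Encoding") in ("gzip", "x-gzip"):
        return gzip.decompress(payload)
    return payload

original = b"<html>" + b"hello " * 100 + b"</html>"
headers, payload = encode_body(original, "text/html")
restored = decode_body(headers, payload)
```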
 Content-Length

The Content-Length entity-header field indicates the size of the Entity-Body, in decimal number
of octets, sent to the recipient or, in the case of the HEAD method, the size of the Entity-Body
that would have been sent had the request been a GET.

Example

Content-Length: 3495

 Content-Type

The Content-Type entity-header field indicates the media type of the Entity-Body sent to the
recipient or, in the case of the HEAD method, the media type that would have been sent had the
request been a GET.

Example

Content-Type: text/html

 Date

The Date general-header field represents the date and time at which the message was originated,
having the same semantics as orig-date in RFC 822.

Example

Date: Tue, 15 Nov 1994 08:12:31 GMT
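Date values in this format can be produced and read back with Python's standard `email.utils` module, as in the sketch below. The variable names are illustrative.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Format a UTC timestamp the way the Date header expects it.
stamp = datetime(1994, 11, 15, 8, 12, 31, tzinfo=timezone.utc)
header_value = format_datetime(stamp, usegmt=True)

# Parse a header value back into a timezone-aware datetime.
parsed = parsedate_to_datetime("Tue, 15 Nov 1994 08:12:31 GMT")
```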

 Expires

The Expires entity-header field gives the date/time after which the entity should be considered
stale. This allows information providers to suggest the volatility of the resource, or a date after
which the information may no longer be valid.

Example

Expires: Thu, 01 Dec 1994 16:00:00 GMT
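A cache might apply the Expires value roughly as follows. This is a sketch; `is_stale` is a hypothetical helper, and the current time is passed in explicitly so the check is easy to try.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def is_stale(expires_value: str, now: datetime) -> bool:
    """Return True once `now` is later than the Expires date/time."""
    return now > parsedate_to_datetime(expires_value)

expires = "Thu, 01 Dec 1994 16:00:00 GMT"
before = datetime(1994, 12, 1, 15, 59, 59, tzinfo=timezone.utc)
after = datetime(1994, 12, 1, 16, 0, 1, tzinfo=timezone.utc)
```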

 From

The From request-header field, if given, should contain an Internet e-mail address for the human
user who controls the requesting user agent. The address should be machine-usable, as defined
by mailbox in RFC 822.

Example

From: webmaster@w3.org

 If-Modified-Since

The If-Modified-Since request-header field is used with the GET method to make it conditional:
if the requested resource has not been modified since the time specified in this field, a copy of
the resource will not be returned from the server; instead, a 304 (not modified) response will be
returned without any Entity-Body.

Example

If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
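The server-side decision for a conditional GET can be sketched as below. The function name `conditional_get` is an assumption, and the sketch returns only a status line and a body rather than a complete response.

```python
from email.utils import parsedate_to_datetime

def conditional_get(if_modified_since, last_modified: str, body: bytes):
    """Return (status line, body) for a GET that may carry If-Modified-Since."""
    if if_modified_since is not None:
        changed_at = parsedate_to_datetime(last_modified)
        if changed_at <= parsedate_to_datetime(if_modified_since):
            # Not modified since the client's copy: send no entity body.
            return "HTTP/1.0 304 Not Modified", b""
    return "HTTP/1.0 200 OK", body

last_modified = "Tue, 15 Nov 1994 12:45:26 GMT"
fresh = conditional_get("Sat, 29 Oct 1994 19:43:31 GMT", last_modified, b"new page")
unchanged = conditional_get("Tue, 15 Nov 1994 12:45:26 GMT", last_modified, b"new page")
plain = conditional_get(None, last_modified, b"new page")
```

Only the first and last calls transfer the page; the middle one lets the client keep using its cached copy.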

 Last-Modified

The Last-Modified entity-header field indicates the date and time at which the sender believes
the resource was last modified.

Example

Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT

 Location

The Location response-header field defines the exact location of the resource that was identified
by the Request-URI. For 3xx responses, the location must indicate the server's preferred URL for
automatic redirection to the resource. Only one absolute URL is allowed.

Example

Location: http://www.w3.org/hypertext/WWW/NewLocation.html

 Pragma

The Pragma general-header field is used to include implementation-specific directives that may
apply to any recipient along the request/response chain. All pragma directives specify optional
behavior from the viewpoint of the protocol; however, some systems may require that behavior
be consistent with the directives.

Example

Pragma = "Pragma" ":" 1#pragma-directive


pragma-directive = "no-cache" | extension-pragma

extension-pragma = token [ "=" word ]

 Referer

The Referer request-header field allows the client to specify, for the server's benefit, the address
(URI) of the resource from which the Request-URI was obtained.

Example

Referer: http://www.w3.org/hypertext/DataSources/Overview.html

 Server

The Server response-header field contains information about the software used by the origin
server to handle the request. The field can contain multiple product tokens and comments
identifying the server and any significant subproducts.

Example

Server: CERN/3.0 libwww/2.17

 User-Agent

The User-Agent request-header field contains information about the user agent originating the
request. This is for statistical purposes, the tracing of protocol violations, and automated
recognition of user agents for the sake of tailoring responses to avoid particular user agent
limitations.

Example

User-Agent: CERN-LineMode/2.15 libwww/2.17b3

 WWW-Authenticate

The WWW-Authenticate response-header field must be included in 401 (unauthorized) response
messages. The field value consists of at least one challenge that indicates the authentication
scheme(s) and parameters applicable to the Request-URI.

Example

WWW-Authenticate = "WWW-Authenticate" ":" 1#challenge


TWO MARK QUESTIONS
1. Define Internet.

The Internet is a global system of interconnected computer networks that use the
standard Internet Protocol Suite (TCP/IP) to serve billions of users worldwide. It is a network of
networks that consists of millions of private, public, academic, business, and government
networks, of local to global scope, that are linked by a broad array of electronic and optical
networking technologies. The Internet carries a vast range of information resources and services,
such as the inter-linked hypertext documents of the World Wide Web (WWW) and the
infrastructure to support electronic mail.

2. Define Protocol.

In computing, a protocol is a formal description of digital message formats and the rules for
exchanging those messages in or between computing systems and in telecommunications.
Protocols may include signaling, authentication, and error detection and correction capabilities. A
protocol describes the syntax, semantics, and synchronization of communication and may be
implemented in hardware or software, or both.

3. Difference between Computer networks and Distributed systems.

Computer Networks: Computers are connected, generally in the same physical location, using
different topologies, e.g. token ring, star, serial connection, etc. Such a network is also called
a LAN (Local Area Network).

Distributed Systems: These can also be considered a type of computer network, but on a much larger
scale. The key distinction is that in a distributed system, a collection of independent computers
appears to its users as a single coherent system. Usually, it has a single model or paradigm that it
presents to the users. In computer networks, this coherence, model, and software are absent. Users
are exposed to the actual machines, without any attempt by the system to make the machines look
and act in a coherent way.

4. Define World Wide Web

The World Wide Web, abbreviated as WWW and commonly known as the Web, is a system of
interlinked hypertext documents accessed via the Internet. With a web browser, one can
view web pages that may contain text, images, videos, and other multimedia and navigate
between them via hyperlinks.

5. What is W3C? Who is the founder of W3C?

W3C stands for the World Wide Web Consortium. W3C was created in October 1994 by Tim
Berners-Lee, the inventor of the Web.

W3C is working to make the Web accessible to all users (despite differences in culture,
education, ability, resources, and physical limitations). W3C also coordinates its work with many
other standards organizations such as the Internet Engineering Task Force, the Wireless
Application Protocol (WAP) Forum, and the Unicode Consortium.
