Web Cache

INTERNET ACCESSING TECHNOLOGY
SEMINAR REPORT
Submitted in partial fulfillment of the requirements For the award of the degree of Master Of Computer Applications Sri Venkateswara University, Tirupati.
BY Y.PANDURANGASWAMY
M.C.A. IV SEMESTER.
Under the valuable guidance of

Mrs. N.V. Muthu Lakshmi
Department of Computer Applications
Department Of Computer Applications Sri Venkateswara University College of Engineering Tirupati
2001
SEMINAR REPORT ON INTERNET ACCESSING TECHNOLOGY
BY PANDURANGASWAMY.Y, (0199114).
CONTENTS
1. INTRODUCTION
2. WHAT IS A WEB CACHE?
3. WHY CACHING IS IMPORTANT NOW
   3.1. QUALITY OF SERVICE
   3.2. TRAFFIC SURGE PROTECTION
   3.3. OVERALL TRAFFIC REDUCTION
4. WHAT CACHING PRODUCTS LOOK LIKE
5. ASSURING DATA FRESHNESS
6. HOW CACHES ARE CONFIGURED
   6.1. PROXY CACHES
   6.2. TRANSPARENT CACHES
   6.3. SERVER ACCELERATORS
   6.4. IMPROVING SITE ACCESS WITH DISTRIBUTED ACCELERATORS
7. CONCLUSION
8. BIBLIOGRAPHY
1.INTRODUCTION
World Wide Web technology (HTTP publishing and browsing) has become so popular that the increasing traffic volume threatens to overwhelm the network capacity in place within corporate Intranets and on the Internet. Web caching products are a key solution to this problem. Caches diminish the need for network bandwidth, typically by reducing the traffic from browsers to content servers. Caches can make even more dramatic improvements in the Quality of Service (QoS) for browser users by delivering content at higher bandwidth and reducing transmission delays (latency). Caches also gather detailed usage information that allows for smarter network management.
2.WHAT IS A WEB CACHE ?
A Web cache exploits some basic facts of life on an HTTP network. There are billions of Web pages out there, but only a small fraction of those pages (or objects on pages) are requested frequently. Many users request the same popular pages and objects. A simple example is the logo image at the top of most Amazon.com pages. This image must be delivered to a browser user every time the browser accesses one of Amazon's pages, and these pages are requested tens of thousands of times a day. A Web cache is a dedicated computer system within the Internet that monitors Web object requests and stores the objects it retrieves from a server. On subsequent requests for the same object, the cache delivers the object from its own storage rather than passing the request on to the origin server. Every object changes over time, so each Web object has a useful life, or freshness. Caches determine whether their copy of an object is still fresh or whether they need to retrieve a new copy from the origin server. The higher the number of people requesting the same object during its useful life, the more upstream traffic the cache eliminates. By handling object requests rather than passing them upstream to the origin server, caches reduce network traffic and improve the browsing experience for users. Caches can be located anywhere on a network, and each cache will store a different set of objects based on the needs of the users it serves.
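The behaviour described above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular caching product works: the class name `WebCache`, the `fetch_from_origin` callback, and the single fixed `lifetime` applied to every object are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedObject:
    body: bytes
    fetched_at: float  # time of retrieval, in seconds from the injected clock
    lifetime: float    # assumed useful life of the object, in seconds


class WebCache:
    """Toy in-memory Web cache keyed by URL (hypothetical, for illustration)."""

    def __init__(self,
                 fetch_from_origin: Callable[[str], bytes],
                 lifetime: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._lifetime = lifetime
        self._clock = clock
        self._store: Dict[str, CachedObject] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        # Serve from storage only while the stored copy is still fresh.
        if entry is not None and now - entry.fetched_at < entry.lifetime:
            self.hits += 1
            return entry.body
        # Missing or stale: go upstream to the origin server and store the result.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = CachedObject(body, now, self._lifetime)
        return body
```

Every call to `get` that is answered from `_store` is a request that never reaches the origin server, which is the upstream traffic saving the text refers to.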
Caches come in all sizes. Caches for individual LAN servers cost as little as $1800, while the largest carrier class products for network peering points can run above $100,000.
3.WHY CACHING IS IMPORTANT NOW

Web caching is a promising approach to the problem of rising Internet and Intranet traffic volume for three main reasons: quality of service, surge protection, and overall traffic reduction.
3.1. Quality of service

Web object requests that come from long distances can take longer to satisfy because there are more network routers to be traversed between browser and server, and any router in the path can become overwhelmed with traffic. The network connection between browser and server is actually a series of links, and the speed of the connection is limited to the speed of the slowest link in the path. When a router reaches capacity, it drops the data packets it cannot handle; those packets must then be retransmitted, which causes a delay. A cross-country browser-server link can pass through as many as 20 routers.
A cache located closer to the browser delivers frequently requested content through fewer routers, thus reducing the potential for packet loss delays and speeding overall service. And because there are fewer service delays between a cache and a browser, a cache increases the transmission bandwidth to the browser for cached objects. With a streaming media object, for example, a cache that detects a 1 Mbit/s connection to a browser will serve the media to the browser at that speed. On the other hand, a media server located across the country would detect less available bandwidth over the distance to the browser, and would therefore serve the object at a much slower speed, reducing the quality of the browser user's experience. Caches implemented in diverse geographic locations also minimize the distance that data has to travel, thereby reducing long-distance transmission costs.
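The slowest-link rule and the effect of the number of routers can be illustrated with a short calculation. This is a rough sketch: the link speeds and the 1% per-router loss rate are made-up numbers, and treating packet loss as independent at each router is a simplifying assumption.

```python
from typing import Sequence


def path_bandwidth(link_speeds_mbps: Sequence[float]) -> float:
    """Return the end-to-end speed, which is capped by the slowest link."""
    if not link_speeds_mbps:
        raise ValueError("a path needs at least one link")
    return min(link_speeds_mbps)


def delivery_probability(per_router_loss: float, routers: int) -> float:
    """Chance that a packet survives every router, assuming independent loss."""
    return (1.0 - per_router_loss) ** routers


# Hypothetical link speeds in Mbit/s, chosen only for illustration.
cross_country_path = [10.0, 45.0, 155.0, 0.5, 155.0, 45.0, 1.0]
nearby_cache_path = [10.0, 1.0]

print(path_bandwidth(cross_country_path))  # limited by a congested long-haul link
print(path_bandwidth(nearby_cache_path))   # limited only by the browser's own line
print(delivery_probability(0.01, 20), delivery_probability(0.01, 3))
```

With these assumed figures the long path is held to the speed of its weakest link, while the short path to a nearby cache runs at the full speed of the browser's connection and loses fewer packets along the way.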
3.2. Traffic surge protection

Caches also help to reduce bandwidth demands during network traffic surges. Surges occur when a very large group of users wants access to the same small number of pages. For example, the release of the Starr report in 1998 overwhelmed capacity due to the volume of people requesting access to the same objects, causing a network traffic jam that took hours to clear. Surges can swamp any portion of a network, regardless of the network's bandwidth. However, the more users requesting the same objects, the more likely it is that a cache will alleviate the problem. The more users wanting the same object, the more likely it is that the object will be stored in the cache, and the more effective the cache becomes at eliminating upstream traffic. Caches can completely eliminate upstream traffic surges caused by heavy demand for a few specific objects.
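To make the surge argument concrete, the following sketch counts how many requests actually travel upstream when many users ask for the same few objects. It assumes a cache with unlimited storage and objects that stay fresh for the whole surge; the function name and the request mix are hypothetical.

```python
from typing import Iterable, Tuple


def upstream_fetches(requests: Iterable[str]) -> Tuple[int, int]:
    """Return (total_requests, requests_sent_upstream) for an idealized cache."""
    seen = set()
    total = 0
    fetched = 0
    for url in requests:
        total += 1
        if url not in seen:
            # First request for this object: the cache must go to the origin.
            seen.add(url)
            fetched += 1
    return total, fetched


# Hypothetical surge: 10,000 users each request the same three objects.
surge = ["/report.html", "/report.css", "/seal.gif"] * 10_000
total, fetched = upstream_fetches(surge)
print(total, fetched)
```

Under these assumptions the number of upstream fetches depends only on the number of distinct objects, not on the number of users, which is why a surge for a few popular objects is the case where a cache helps most.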
3.3. Overall traffic reduction

The more people who use Web browsers, the higher the probability that any cache will receive requests for the same objects. Once again, the more users per cache, the more efficient the cache is at reducing upstream traffic. As a rule of thumb, caches have a hit rate of 35%, which means that 35% of the content requested through them can be served from the cache, and they therefore reduce upstream traffic on the network by roughly that same percentage.
4.WHAT CACHING PRODUCTS LOOK LIKE
Web caching products come in two forms: appliances and software.
4.1. Caching Appliances:

Caching appliances from companies like NetApp and CacheFlow integrate caching software with a hardware platform and a proprietary operating system. These devices are designed for easy setup and require minimal administration. Many appliances are rack-mountable, like routers and other network devices.
Because appliances have proprietary operating systems, however, it can be more difficult to integrate them with additional hardware or software, or to combine their functions with those of other server-based programs that run on standard operating systems. For example, it would require some additional software engineering work to combine an appliance cache's usage data with Solaris-based programs, work that is not needed when a Solaris-based software cache is combined with other Solaris-based programs.
4.2. Software:
Caching software products run on standard operating system platforms such as UNIX and Windows NT. Server hardware and operating system vendors such as Sun, Novell, and Microsoft offer their own caching products, and third parties like Inktomi offer products as well. Because caching software runs on the same operating system as other network management applications, the data generated by such a cache is easier to integrate with other network management functions.
5.ASSURING DATA FRESHNESS

A cache stores objects, and objects change over time. The cache must therefore determine the freshness of each object and replace outdated objects as they change. All caches perform this function passively using one of three methods. A cache can:
- Pass a get if modified request to the server each time an object is requested. In HTTP, a standard object request is called a get. However, HTTP also supports a get if modified request (a GET carrying an If-Modified-Since header), where the request is fulfilled only if the object has been modified since the previous request for the same object. When a cache receives a request for an object that it has already stored, it sends a get if modified request. If the cache's modification date for that object is older than the server's, the cache retrieves a new copy of the object.
- Use freshness data (such as the expiration time in the object's header under HTTP 1.1) to evaluate a stored object, and then retrieve a fresh copy of the object when its freshness expires.
- Apply heuristics to judge the life expectancy of each object based on the elapsed time since the object was last modified.
The heuristics approach is popular because it reduces the latency between the cache and a browser. When it retrieves an object from the server, the cache notes the Last-Modified date on the object, and then assumes that the object has an additional useful life that is a fixed percentage (10%, for example) of the time elapsed since the last modification. So, for example, if an object was last modified 10 days before the cache fetches it, the cache assumes that the object will grow stale in one more day, and the cache itself satisfies requests for that object for that day. When the freshness period elapses, the cache returns to the server to revalidate the object's freshness and obtain a new copy if the object has changed. Heuristics sometimes result in stale files being sent from the cache, because the 10% additional freshness allowance sometimes proves to be too generous. On the other hand, heuristics can also result in unnecessary traffic between the cache and the server when the cache sends get if modified requests that prove unnecessary.
To improve cache performance, some caching vendors are promoting the idea of active caching. Active caching takes one of the following forms:
- The cache automatically issues get if modified requests on its own when there is no traffic to the server, and thereby continually builds a more accurate freshness model for its contents during low-traffic periods. This reduces bandwidth between the cache and the server because it results in fewer file refreshes taking place. It also reduces the need for the cache to send get if modified requests when the server is busy, and so reduces server load at peak times.
- The network administrator can instruct the cache to refresh data at specific times or intervals.
- The cache can be configured to evaluate its logs for user behavior, and then refresh the more popular data in anticipation that it will remain popular in the future, based on the level or frequency of demand or other criteria.
6.HOW CACHES ARE CONFIGURED ?
A cache can be configured as a proxy for browser users, or it can be transparent to browser users. Virtually all cache products can be configured to operate in either form. With some additional engineering by the vendor, caches can also be configured as server accelerators, or reverse proxy caches.
6.1. Proxy caches

A proxy cache operates by explicitly cooperating with the browser. Rather than directing HTTP requests to a target Web server, the browser directs them to the cache. The cache then either satisfies the request itself or passes the request on to the server as a proxy for the browser (hence the name). Proxy caches are particularly useful on enterprise Intranets, where they serve as a firewall that protects Intranet servers against attacks from the Internet. Linking an Intranet to the Internet offers a company's users direct access to everything out there, but it also means exposing internal systems to attack from the Internet. With a proxy server, only the proxy server system need be literally on the Internet, and all the internal systems are on a relatively isolated and protected Intranet. The proxy server can then enforce specific policies for external access to the Intranet.
The most obvious disadvantage of the proxy configuration is that each browser must be explicitly configured to use it. Earlier browsers required manual user set-up changes when a proxy server was installed or changed, which was a support headache at best for ISPs supporting thousands of users. Today, a user can configure the current version of either Navigator or Internet Explorer to locate a proxy without further user involvement.
Another disadvantage of the proxy configuration is that the cache itself becomes another point of system failure:
- The cache server can crash and interrupt Internet access for all Intranet systems configured to use the proxy for access.
- The cache server can become overloaded and become an additional performance bottleneck.
6.2. Transparent caches

A transparent cache is so named because it works by intercepting network traffic transparently to the browser. In this mode, the cache short-circuits the retrieval process if the desired file is in the cache. Transparent caches are especially useful to ISPs because they require no browser set-up modification. Transparent caches are also the simplest way to use a cache internally on a network (at peering hand-off points between an ISP and a larger network, for example), because they don't require explicit coordination with other caches.
The main disadvantage of the transparent approach is that the cache must be placed at a choke point in the network through which all the network traffic that is to benefit from caching is guaranteed to pass. Using a transparent cache therefore requires an understanding of the traffic routing in place. However, there are many network topologies where a suitable choke point is obvious, such as the place where a cache sits next to an outgoing data line.
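A transparent cache still has to judge freshness like any other cache. As a minimal sketch of the heuristic rule this report describes (an additional useful life equal to a fixed percentage of the object's age when it was fetched), the date arithmetic might look like the following. The function names and the default of 10 percent are assumptions for illustration, not the behaviour of any specific product.

```python
from datetime import datetime, timedelta


def heuristic_lifetime(last_modified: datetime,
                       fetched_at: datetime,
                       percent: int = 10) -> timedelta:
    """Extra useful life granted to an object: a fixed percentage of its age at fetch time."""
    age = fetched_at - last_modified
    if age < timedelta(0):
        # A Last-Modified date in the future gives no basis for a guess.
        return timedelta(0)
    return age * percent / 100


def needs_revalidation(last_modified: datetime,
                       fetched_at: datetime,
                       now: datetime,
                       percent: int = 10) -> bool:
    """True once the heuristic freshness period has elapsed, so the cache should
    send a get if modified request before serving the object again."""
    expires_at = fetched_at + heuristic_lifetime(last_modified, fetched_at, percent)
    return now >= expires_at


# The example from the text: last modified 10 days before the cache fetched it.
modified = datetime(2001, 1, 1)
fetched = datetime(2001, 1, 11)
print(heuristic_lifetime(modified, fetched))
```

Raising `percent` cuts the number of revalidation requests but makes stale deliveries more likely, which is the trade-off between the two failure modes of the heuristic approach.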
6.3. Server accelerators

A server accelerator is a proxy cache that stands in for one or more specific Web servers rather than working on behalf of a group of browser users. A server accelerator (or reverse proxy cache) is so named because it reduces load on a particular server or server cluster, rather than reducing upstream bandwidth on a network. With a server accelerator cache, the cache intercepts all requests for one or more servers, caches a copy of the objects served, and then serves those objects when it next receives requests for them. By serving frequently requested content itself, the cache relieves the origin servers of this load, freeing up processing power on those servers for other tasks.
Server accelerator caches also help control costs. A cache implemented on a relatively cheap PC platform can offload processing from a much more expensive server platform such as a SPARCstation. On any platform, however, cache software is tuned to deliver data quickly and so will outperform a server running on the same type of platform when it comes to satisfying requests for objects.
6.4. Improving site access with distributed accelerators

Most Web content is stored on servers in the United States, and the cost of retrieving data from them increases with distance. By adding multiple server accelerator caches to the network in Europe, Asia, or Latin America, and by integrating a traffic distributor device such as Cisco DistributedDirector, U.S. Web site operators can reduce transmission costs and improve the quality of service.
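The role of a traffic distributor can be pictured as choosing, for each user, the accelerator that is cheapest to reach. The sketch below uses measured round-trip time as the selection criterion; that criterion, the site names, and the numbers are assumptions made for illustration and do not describe how any specific distributor product makes its decision.

```python
from typing import Dict


def pick_accelerator(rtt_ms_by_site: Dict[str, float]) -> str:
    """Return the site with the lowest round-trip time to the requesting user."""
    if not rtt_ms_by_site:
        raise ValueError("no accelerator sites available")
    return min(rtt_ms_by_site, key=rtt_ms_by_site.get)


# Hypothetical round-trip times, in milliseconds, seen from a user in Europe.
measurements = {"us-origin": 180.0, "europe": 25.0, "asia": 210.0}
print(pick_accelerator(measurements))
```

A user in Europe is directed to the European accelerator rather than across the Atlantic to the origin site, which is where the savings in transmission cost and latency come from.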
7.CONCLUSION
Web caching is a very effective technology with clear benefits. The current and proposed legal regimes pose some difficulties for the practice of caching, and whether to adopt Web caching remains a matter of debate. In my opinion, what is needed is a method for counting exactly the hits on a particular Web page or site even when clients are served from caches. If that condition can be met, we can conclude that Web caching is definitely a very good technology for the Internet world.
8.BIBLIOGRAPHY
Magazines:
1. PC Quest
2. Developer IQ
3. Dataquest
Books:
1. Web Caching by JET FRIENDS.
2. Internet Protocol by ROBERT FROST.
Sites:
1. www.developeriq.com
2. www.web_cache.com
3. www.webcaching.com