SEMINAR REPORT
Submitted in partial fulfillment of the requirements for the award of the degree of Master of Computer Applications, Sri Venkateswara University, Tirupati.
BY Y.PANDURANGASWAMY
M.C.A. IV SEMESTER.
2001
BY PANDURANGASWAMY.Y, (0199114).
CONTENTS
1. INTRODUCTION
2. WHAT IS WEB CACHE?
3. WHY CACHING IS IMPORTANT NOW
3.1. QUALITY OF SERVICE
3.2. TRAFFIC SURGE PROTECTION
3.3. OVERALL TRAFFIC REDUCTION
4. WHAT CACHING PRODUCTS LOOK LIKE
5. ASSURING DATA FRESHNESS
6. HOW CACHES ARE CONFIGURED
6.1. PROXY CACHES
6.2. TRANSPARENT CACHES
6.3. SERVER ACCELERATORS
6.4. IMPROVING SITE ACCESS WITH DISTRIBUTED
7. CONCLUSION
8. BIBLIOGRAPHY
1.INTRODUCTION
World Wide Web technology (HTTP publishing and browsing) has become so popular that the increasing traffic volume threatens to overwhelm the network capacity in place within corporate Intranets and on the Internet. Web caching products are a key solution to this problem. Caches diminish the need for network bandwidth, typically by reducing the traffic from browsers to content servers. Caches can make even more dramatic improvements in the Quality of Service (QoS) for browser users by delivering content at higher bandwidth and reducing transmission delays (latency). Caches also gather superior network management information that allows for smarter network management.
A Web cache exploits some inalienable facts of life on an HTTP network. There are billions of Web pages out there, but only a small fraction of those pages (or objects on pages) are requested frequently. Many users request the same popular pages and objects. A simple example is the logo image at the top of most Amazon.com pages. This image must be delivered to a browser user every time the browser accesses one of Amazon's pages, and these pages are requested tens of thousands of times a day.
A Web cache is a dedicated computer system within the Internet that monitors Web object requests and stores objects it retrieves from a server. On subsequent requests for the same object, the cache delivers the object from its storage rather than passing the request on to the origin server. Every object changes over time, so each Web object has a useful life, or freshness. Caches determine whether their copy of an object is still fresh, or whether they need to retrieve a new copy from the origin server. The higher the number of people requesting the same object during its useful life, the more upstream traffic the cache eliminates. By handling object requests rather than passing them upstream to the origin server, caches reduce network traffic and improve the browsing experience for users. Caches can be located anywhere on a network, and each cache will store a different set of objects based on the needs of the users it serves.
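The store-and-serve behavior described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class name WebCache, the stand-in function fake_origin, and the example URL are hypothetical, and freshness checking is left out entirely.

```python
from typing import Callable, Dict


class WebCache:
    """Stores objects by URL and counts how many requests it absorbs."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin-server fetch
        self._store: Dict[str, bytes] = {}   # URL -> stored object
        self.hits = 0                        # requests served from storage
        self.misses = 0                      # requests passed upstream

    def get(self, url: str) -> bytes:
        # Serve from storage when the object is already held.
        if url in self._store:
            self.hits += 1
            return self._store[url]
        # Otherwise pass the request upstream and keep a copy.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = body
        return body


def fake_origin(url: str) -> bytes:
    """Stand-in for a real origin server (an assumption for this sketch)."""
    return ("content of " + url).encode()


cache = WebCache(fake_origin)
for _ in range(5):
    cache.get("http://example.com/logo.gif")
```

In this run only the first of the five requests travels upstream; the other four are answered from the cache's own storage, which is the traffic reduction the paragraph describes.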
Caches come in all sizes. Caches for individual LAN servers cost as little as $1800, while the largest carrier class products for network peering points can run above $100,000.
A cache located closer to the browser delivers frequently requested content through fewer routers, thus reducing the potential for packet loss delays and speeding overall service. And because there are fewer service delays between a cache and a browser, a cache increases the transmission bandwidth to the browser for cached objects. With a streaming media object, for example, a cache detecting a 1 Mbit connection to a browser will serve the media to the browser at that speed. On the other hand, a media server located across the country would detect less available bandwidth over the distance to the browser, and would therefore serve the object at a much slower speed, reducing the quality of the browser user's experience. Caches implemented in diverse geographic locations also minimize the distance that data has to travel, thereby reducing long-distance transmission costs.
effective the cache becomes at eliminating upstream traffic. Caches can completely eliminate upstream traffic surges caused by heavy demand for a few specific objects.
Because appliances have proprietary operating systems, however, it can be more difficult to integrate them with additional hardware or software, or to combine their functions with those of other server-based programs that run on standard operating systems. For example, manipulating an appliance cache's usage data with Solaris-based programs would require some additional software engineering work that a Solaris-based software cache would not.
4.2. Software:
Caching software products run on standard operating system platforms such as UNIX and Windows NT. Server hardware and operating system vendors such as Sun, Novell, and Microsoft offer their own caching products, and third parties like Inktomi offer products as well. Because caching software runs on the same operating system as other network management applications, the data generated by such a cache is easier to integrate with other network management functions.
5.ASSURING DATA FRESHNESS
A cache stores objects, and objects change over time. The cache must therefore determine the freshness of each object and replace outdated ones as they change. All caches perform this function passively using one of three methods. A cache can:
Pass a get if modified request to the server each time an object is requested.
In between the cache and the server when the cache sends get if modified requests that prove unnecessary.
To improve cache performance, some caching vendors are promoting the idea of active caching. Active caching takes one of these forms:
The cache automatically issues get if modified requests on its own when there is no traffic to the server, and thereby continually builds a more accurate freshness model for its contents during low traffic periods. This reduces bandwidth between the cache and the server because it results in fewer file refreshes taking place. It also reduces the need for the cache to send get if modified requests when the server is busy, and so reduces server load at peak times.
The network administrator can instruct the cache to refresh data at specific times or intervals.
The cache can be configured to evaluate its logs for user behavior, and then refresh the more popular data in anticipation that it will remain popular in the future, based on the level or frequency of demand or other criteria.
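The log-driven form of active caching can be sketched as follows. This is a minimal sketch, assuming the log is available as a plain sequence of requested URLs; the function name pick_objects_to_refresh and the sample log entries are hypothetical, and a real cache would also weigh the "other criteria" mentioned above.

```python
from collections import Counter
from typing import Iterable, List


def pick_objects_to_refresh(request_log: Iterable[str], top_n: int) -> List[str]:
    """Return the top_n most frequently requested URLs from the log."""
    counts = Counter(request_log)
    # most_common() orders entries from highest to lowest request count.
    return [url for url, _ in counts.most_common(top_n)]


# Hypothetical log of requested URLs, oldest first.
log = ["/logo.gif", "/index.html", "/logo.gif",
       "/news.html", "/logo.gif", "/index.html"]
to_refresh = pick_objects_to_refresh(log, top_n=2)
```

The cache would then issue get if modified requests for the objects in to_refresh during a quiet period, rather than waiting for the next browser request.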
A cache can be configured as a proxy for browser users, or it can be transparent to browser users. Virtually all cache products can be configured to operate in either form. With some additional engineering by the vendor, caches can also be configured as server accelerators, or reverse proxy caches.
when a proxy server was installed or changed, which was a support headache at best for ISPs supporting thousands of users. Today, a user can configure the current version of either Navigator or Internet Explorer to locate a proxy without further user involvement. Another disadvantage of the proxy configuration is that the cache itself becomes another point of system failure: the cache server can crash and interrupt Internet access for all Intranet systems configured to use the proxy for access. The cache server can also become overloaded and become an incremental performance limitation.
The main disadvantage of the transparent approach is that the cache must be placed at a choke point in the network through which all the network traffic to benefit from caching is guaranteed to pass. Using a transparent cache therefore requires an understanding of the traffic routing in place. However, there are many network topologies where a suitable choke point is obvious, such as the place where a cache is next to an outgoing data line.
However, HTTP also supports a get if modified request, where the request is fulfilled only if the object has been modified since the previous request for the same object. When a cache receives a request for an object that it has already stored, it sends a get if modified request. If the cache's modification date for that object is older than the server's, the cache retrieves a new copy of the object.
Use freshness data (such as the time expiration data in the object's header under HTTP 1.1) to evaluate a stored object, and then retrieve a fresh copy of the object when its freshness expires.
Apply heuristics to judge the life expectancy of each object based on the elapsed time since the object was last modified. The heuristics approach is popular because it reduces the latency between the cache and a browser. When it retrieves an object from the server, the cache notes the Last Modified date on the object, and then assumes that the object has an additional useful life that is a fixed percentage (10 %, for example) of the time elapsed since the last modification. So, for example, if an object was last modified 10 days before the cache fetches it, the cache assumes that the object will grow stale in one more day, and the cache itself satisfies requests for that object for that day. When the freshness period elapses, the cache will return to the server to revalidate the object's freshness and obtain a new copy if the object has changed. Heuristics sometimes results in stale files being sent from the cache, because the 10 % additional freshness allowance sometimes proves to be too generous. On the other hand, heuristics can also result in unnecessary traffic.
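The heuristic freshness rule and the get if modified decision described in this report can be worked through in a short Python sketch. It is a sketch under assumptions, not any vendor's implementation: the names CachedObject, fresh_until, is_fresh and needs_new_copy are hypothetical, and the 10 % figure is simply the example value from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

FRESHNESS_PERCENT = 10  # the 10 % example figure from the text


@dataclass
class CachedObject:
    body: bytes
    last_modified: datetime   # Last Modified date reported by the server
    fetched_at: datetime      # when the cache retrieved its copy

    def fresh_until(self) -> datetime:
        # Useful life = fixed percentage of the object's age at fetch time.
        age_at_fetch = self.fetched_at - self.last_modified
        return self.fetched_at + age_at_fetch * FRESHNESS_PERCENT / 100

    def is_fresh(self, now: datetime) -> bool:
        return now < self.fresh_until()


def needs_new_copy(cached: CachedObject, server_last_modified: datetime) -> bool:
    """Get if modified decision: refetch only if the server's copy is newer."""
    return server_last_modified > cached.last_modified


# The example from the text: last modified 10 days before the fetch.
fetched = datetime(2001, 3, 11, 12, 0)
modified = fetched - timedelta(days=10)
obj = CachedObject(b"page", modified, fetched)
```

With these values obj.fresh_until() falls exactly one day after the fetch, so the cache answers requests itself for that day and only then asks the server whether the object has changed.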
Server accelerator caches also help control costs. A cache implemented on a relatively cheap PC platform can offload processing from a much more expensive server platform such as a SPARC station. On any platform, however, cache software is tuned to deliver data quickly and so will outperform a server using the same type of platform when it comes to satisfying requests for objects.
7.CONCLUSION
Web caching is a very effective technology with many uses. There are difficulties posed by the current and proposed legal regime to the practice of caching, however, and whether to adopt Web caching at all remains a matter of debate. I am of the opinion that a method should be devised to count exactly the hits on a particular Web page or site even when clients are served by caches. If that condition is met, we can conclude that Web caching is definitely a very good technology for the Internet world.