INTRODUCTION
(ii) Keeping in cache only those websites that give a better hit ratio than the existing ones.
The needs of rural people are limited, so they do not access the internet arbitrarily; rather, they stick to their necessities. For example, the requirements of villagers vary between health, agriculture, jobs, education, and governmental services. They rarely use the internet beyond that.
Isaacman and Martonosi (2008) have discussed this issue extensively. According to them, as the internet becomes more pervasive in everyday life in the developed world, those who wish to be competitive in modern markets must have access to its information. Those unable to harness the internet's vast resources will be disadvantaged through a lack of powerful tools for communication, healthcare, and so on. In this work, we make the web caching system a bit intelligent; that is, the system would, in a sense, read the user's mind and act accordingly. One effective way to make a system intelligent is to apply Genetic Algorithms or similar evolutionary techniques. This keeps the system simple in design, yet increases its efficiency, and it eliminates the need for any architectural reconfiguration. In other words, it enhances the internal behavior of the system, which in our case is the web cache.
We have observed that the cache is the key to all of these. So improving the performance of the cache improves the efficiency of all of these techniques. But a cache is inherently a hash table in functionality and configuration. We therefore emphasize improving the performance of a hash table, so that the same principles can be applied to a cache as well. Our work is based on the following system model.
1.3 Methodology
We have assumed that the websites accessed are named w1, w2, w3, ..., wn. The individual web pages are named w11, w12, w13, ..., w1m, and similarly w21, w22, w23, etc. Each incoming web page is indexed according to its IP address. Each IP address is a number that can be expressed in binary form. Here we assume that the index numbers are consecutive, so that each new website is given the next binary number in succession. For example, if a website Wxy is represented by a binary number m, then the next website Wxy+1 is represented by the next consecutive number m+1, so that substitution operations on the chromosomes remain traceable and always result in a valid website address. Let P be the set of websites, or the population of our study. Each time a new website arrives, let the population change to P1. The hit ratio corresponding to the current population is studied each time. The best possible population is chosen at the end.
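The indexing scheme above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the function and variable names are assumptions, and the 32-bit width is chosen only to match the binary strings shown later in the population table.

```python
# A minimal sketch of the indexing scheme described above: each newly seen
# website receives the next consecutive integer index, whose binary form
# serves as its representation at the chromosome level.

def build_index(access_log, start=1):
    """Map each distinct website to the next consecutive index."""
    index = {}
    for site in access_log:
        if site not in index:
            index[site] = start + len(index)
    return index

log = ["w1", "w2", "w1", "w3", "w2", "w4"]
idx = build_index(log)

# Consecutive indices guarantee that an index substituted during a genetic
# operation (e.g. m -> m + 1) still names a valid website.
binary = {site: format(i, "032b") for site, i in idx.items()}
```

Because the indices are consecutive, any substitution that increments or decrements an index lands on another valid website, which is what makes the genetic operators safe to apply.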
----- (F1)

h(r) = ((p · r + q) mod x) mod N ----- (F2)

where x is a prime number, M ≤ x < 2M, and p, q are any two random integers with 0 < p < x and 0 < q < x.
Ultimately, GEP has been chosen as the basis of this project. We have developed the following algorithm to carry out the task. The chromosome chosen is the hash function F(2) above.
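The evolutionary search outlined above can be sketched as follows. This is an illustration under stated assumptions, not the thesis algorithm itself: the chromosome is the (p, q, x) triple of hash function F(2), the fitness measure (number of distinct slots used, i.e. fewer collisions) and all parameter ranges are assumptions made for the sketch.

```python
import random

PRIMES = [29, 31, 37, 41, 43, 47]   # candidate prime moduli x
N = 20                              # assumed hash table size

def h(k, p, q, x):
    """Hash function F(2): ((p*k + q) mod x) mod N."""
    return ((p * k + q) % x) % N

def fitness(chrom, keys):
    """Count distinct slots used: more slots = fewer collisions = fitter."""
    p, q, x = chrom
    return len({h(k, p, q, x) for k in keys})

def evolve(keys, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [(rng.randrange(1, 29), rng.randrange(1, 29), rng.choice(PRIMES))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, keys), reverse=True)
        survivors = pop[:pop_size // 2]
        children = []
        for p, q, x in survivors:
            # point mutation on one of the three loci
            locus = rng.randrange(3)
            if locus == 0:
                p = rng.randrange(1, 29)
            elif locus == 1:
                q = rng.randrange(1, 29)
            else:
                x = rng.choice(PRIMES)
            children.append((p, q, x))
        pop = survivors + children
    return max(pop, key=lambda c: fitness(c, keys))

best = evolve(range(1001, 1016))   # the 15 website indices used later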
Collaborative caching and pre-fetching in the internet is the basis of this research. The potential of these techniques has been elaborately discussed by Isaacman and Martonosi [1], who coined the term in the context of largely disconnected internet access in remote villages. The distributed collaborative caching for proxy servers discussed by Kasbekar and Desai [2] proposes a distributed proxy server and WWW clients, implemented on a connected network of Sun workstations running Solaris 2.5.1. This system writes a wrapper for the client. If the reply from the remote server implies no change in the document since the last update date supplied, the document received from the user stub is sent to the requesting client. On the other hand, if the remote server sends the document, it is sent to the requesting client, and the proxy's index of the locally cached documents is updated.
The success and popularity of social network systems such as del.icio.us, Facebook, MySpace, and YouTube have generated many interesting and challenging problems for the research community. Li, Guo and Zhao [3] discuss, among other things, why discovering the social interests shared by groups of users is very important. The main challenge in solving this problem comes from the difficulty of detecting and representing the interests of users. The existing approaches are all based on the online connections of users and are thus unable to identify the common interests of users who have no online connections. In their paper, a social interest discovery approach is suggested based on user-generated tags. The authors have developed an Internet Social Interest Discovery system, ISID, which can effectively cluster similar documents by interest topic and discover user communities with common interests regardless of whether they have any online connections. CoCache, as discussed by Qian, Xu,
Zhou, and Zhou [4], is a query processing system based on collaborative caching technology in a peer-to-peer environment. It differs from existing P2P systems in that both the caching process and the query processing are fully decentralized, based on a distributed hash table (DHT) scheme called CON (Coordinator Overlay Network). Query answering performance is improved greatly, with low overhead for maintaining CON. Systematic P2P Aided Cache Enhancement, or SPACE, a new collaboration scheme among clients in a computer cluster of a high-performance computing facility that lets them share their caches with each other, has been discussed by the authors in [5]. Here the clients create an environment that gives the perception of a large pseudo global cache by exchanging information through gossip messages. If a request cannot be served from the local cache, it is looked up in the pseudo global cache without involving any file manager or central server. The collaboration is achieved in a distributed manner and is designed on a peer-to-peer computing model. Dominguez-Sal, Larriba-Pey and Surdeanu [6] have designed a multilayer collaborative cache for question answering. In this work, a multi-class maximum entropy classifier is used to map each question into a known answer-type taxonomy. Another algorithm is used to retrieve the queries. The question set contains questions that are randomly selected using a Zipf distribution. Two different protocols have been proposed for the management of the multilayer, distributed cache. Xu, Liu, Li and Jia [7] have discussed caching and pre-fetching for web content distribution.
Zhang, Lee, Ma and Zhou [13] have discussed the pre-fetching coordinator (PFC), a multi-level independent pre-fetching scheme. An intermediate layer of intelligence is placed between the upper- and lower-level strategies for pre-fetching and cache replacement. They implemented four well-known pre-fetching algorithms used in real systems: P-Block Read Ahead (RA), Linux kernel prefetching, SARC and AMP, and showed that when the four algorithms are applied to a two-level storage system, the addition of PFC can improve system performance by up to 35%. Tang, Zhang, and Chanson [14] compare the various caching algorithms designed for transcoding proxies. They propose an adaptive algorithm that dynamically selects an appropriate policy for adjusting cache management. Their experimental results show that the algorithm significantly outperforms those that cache only transcoded or only untranscoded objects. Kipruto, Tan, Musau and Mushi [15] implemented web caching within the proxy web caching architecture using Squid running on the Windows operating system. To preserve a consistent cache population and optimize cacheable objects and results of the e-learning content, their implementation adopts a Genetic Algorithm (GA) approach. With the adoption of GA or GEP, one can avoid the necessity of using multiple or multi-level caches. Davison [16] elaborately explains various web caching methods and resources. Rabinovich and Spatscheck [17] have highlighted the technique called replication. According to them, replication is a method that enhances scalability on the server side: it creates and maintains distributed copies of content under the control of the content provider, and client requests are sent to the nearest and least busy server. A decentralized peer-to-peer web cache system, called Squirrel, has been explained extensively by Iyer, Rowstron and Druschel [19]. There are many works relating to cooperative web caching and workload characterization, but this paper demonstrates that it is efficient to adopt a peer-to-peer web caching system in a corporate LAN situated in a single geographical location, as it uses no extra hardware or administration while simultaneously being fault-tolerant. A differential service architecture has been designed and implemented by Venketesh, Sivanandam and Manigandan [22]. This model achieves service differentiation with an improved hit rate, using a separate replacement algorithm for each class based on its requirement specifications. Du and Subhlok [23] have evaluated the performance of cooperative Pass-Down C-LRU (PDC-LRU). Each web page is first placed in a fit region, but when the cache becomes full, instead of being removed under the LRU scheme, it is passed down into the unfit region. Che, Tung and Wang [33] have viewed a cache as a low-pass filter and have designed a hierarchical web caching model. The hierarchy is a k-level tree with a single cache C0 at the root level and k leaf nodes of size Ck at the leaf level. The traces available at any level of the tree are the filtered traces of the lower levels. Each cache is identified by a unique request arrival time. This approach is compared with the traditional uncooperative hierarchical models.
After discussing all the above traditional techniques, we have observed that the cache is the key to all of them: improving the performance of the cache improves the efficiency of all of these techniques. But a cache is inherently a hash table in functionality and configuration [41]. We therefore emphasize improving the performance of a hash table, so that the same principles can be applied to a cache as well. We made the following observations regarding the functioning of a hash table: its performance can be improved in two ways, (1) by reducing collisions to a minimum, and (2) by increasing the hit ratio considerably. Estebanez, Cesar and Ribagorda [9] have implemented hash functions using Genetic Programming (GP). There, the fitness function flips a single bit of the input; after hashing both inputs, the Hamming distance between the two digests is calculated. Safdari and Joshi [10] have implemented universal hash functions using a Genetic Algorithm (GA), and have shown that a universal hash function gives the minimum number of collisions. Going further along the Genetic Algorithm path, we came across Gene Expression Programming, first introduced by Ferreira [11]. According to her, GEP is similar to GA and GP in that individuals are selected from populations according to fitness and varied using genetic operators; the difference lies in the nature of the individuals. Skaruz and Seredynski [35] have discussed anomaly detection methods in web applications using Gene Expression Programming. The security of an organization depends on an effective intrusion detection system, and GEP makes the system intelligent enough that currently known attacks, as well as those that may occur in the future, can be detected. Liu, English and Pohl [38] propose a GEP-based data mining method to obtain formulas for the reliability of the C(k, n: F) system. Wang and Lu have shown that GATree algorithms improve performance over the traditional LRU-based algorithms.
CHAPTER 2
OVERVIEW OF CACHING TECHNOLOGY
(ii) Sequentiality:
Fig. 2.1 The memory hierarchy (Level 1 cache, Level 2 cache, main memory, and backup storage)
A cache is a component that transparently stores data so that future requests for that
data can be served faster. The data that is stored within a cache might be values that
have been computed earlier or duplicates of original values that are stored elsewhere.
If requested data is contained in the cache (a cache hit), the request can be served by simply reading the cache, which is comparatively fast. Otherwise (a cache miss), the data has to be recomputed or fetched from its original storage location, which is comparatively slow. Hence, the more requests that can be served from the cache, the faster the overall system performs.
To be cost efficient and to enable an efficient use of data, caches are relatively small.
Nevertheless, caches have proven themselves in many areas of computing because
access patterns in typical computer applications have locality of reference. References
exhibit temporal locality if data is requested again that has been recently requested
already. References exhibit spatial locality if data is requested that is physically stored
close to data that has been requested already.
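Temporal locality is exactly what a least-recently-used (LRU) cache exploits. The sketch below is illustrative, not drawn from the thesis: the class name, capacity and request stream are assumptions, chosen only to show how repeated requests turn into hits.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: hits are served fast, misses fall through to fetch."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key, fetch):
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)      # mark as most recently used
            return self.store[key]
        self.misses += 1
        value = fetch(key)                   # slow path: origin fetch
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used
        return value

cache = LRUCache(capacity=3)
requests = ["a", "b", "a", "a", "c", "b", "d", "a"]
for r in requests:
    cache.get(r, fetch=lambda k: k.upper())
# Repeated references ("a", "b") hit; references evicted by newer data miss.
```

Because the request stream revisits "a" and "b" soon after first touching them, three of the eight requests are served from the cache despite its small capacity.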
found in the pseudo global cache, the clients cooperate in fetching the block.
Moreover, through the coordination, the performance of a busy cache can be
improved by properly utilizing a nearby idle cache. Here, an idle cache not only helps
the busy cache in fetching blocks, but also preserves critical blocks to reduce the cost
of retrieval from the mass storage. In addition, we propose that the clients introduce
replicas of frequently accessed blocks. Replication of such blocks often reduces the
bottleneck of the central server, distributes the service load among clients, and
increases the chance of hits in the local as well as in the pseudo global cache.
However, when the system gets busy, the clients coordinate in an elimination process
to remove one or more replicas, and make space for newly introduced blocks. Due to
the collaboration, a client acts as a requester for services from other clients and at the
same time, acts like a service point for other clients. As a result, the load of the system
is distributed among the participants. In this scheme, the data server gets a service request only if the request cannot be served by the pseudo global cache, i.e., a miss happens in both the local and the global cache. To achieve this coordination, a peer-to-peer (P2P) client partnership has been proposed to model the collaborative cache. In this partnership, the client-server relation becomes blurred, and cooperation among peers (i.e., clients of the file server) emerges to provide a higher number of hits in the
pseudo global cache. With this approach, we obtain three additional fundamental
benefits: (1) low maintenance cost, (2) easy integration with existing software
platforms, and (3) easy development platform. The authors have successfully shown
that the proposed scheme reasonably approximates the ideal Global LRU caching
policy which has the instantaneous view of all the caches in the system. The results
also show that the scheme performs better than existing centralized solutions.
Additionally, the results demonstrate that the message communication and memory
overhead for the maintenance operations are fairly low.
occurs). The prefetch cache attempts to anticipate the locality about to be requested by
the processor and thus prefetches it into the cache.
The updates on the cache table are propagated to the target database in two modes. Synchronous mode ensures that once a database operation completes, the update has also been applied at the target database. In asynchronous mode, updates to the target database are delayed until all database operations are complete. Synchronous mode gives high cache consistency and is suited to real-time applications; asynchronous mode gives high throughput and is suited to non-real-time applications.
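The two propagation modes can be contrasted in a short sketch. This is an illustration under assumed names (the class, its methods, and the dict-backed "database" are all stand-ins), not a real caching platform.

```python
class CacheFront:
    """Cache layer propagating writes to a target database in one of two modes."""

    def __init__(self, database, synchronous=True):
        self.cache = {}
        self.database = database      # the target database (a dict here)
        self.synchronous = synchronous
        self.pending = []             # queued updates in asynchronous mode

    def put(self, key, value):
        self.cache[key] = value
        if self.synchronous:
            self.database[key] = value          # strong consistency
        else:
            self.pending.append((key, value))   # high throughput, delayed

    def flush(self):
        """Asynchronous mode: propagate once operations are complete."""
        for key, value in self.pending:
            self.database[key] = value
        self.pending.clear()

db_sync = {}
front = CacheFront(db_sync, synchronous=True)
front.put("x", 1)              # immediately visible in the target database

db_async = {}
front2 = CacheFront(db_async, synchronous=False)
front2.put("y", 2)             # queued, not yet in db_async
front2.flush()                 # now propagated in one batch
```

The trade-off is visible in the code: the synchronous path pays a database write on every `put`, while the asynchronous path batches them at `flush` time and accepts a window of inconsistency.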
Transparent failover: There should not be any service outage in case of a caching platform failure; client connections should be routed to the target database.
No or very few application changes for the caching solution: Support for standard interfaces (JDBC, ODBC, etc.) means the application should work seamlessly without any application code changes. The solution should route all stored procedure calls to the target database so that they don't need to be migrated.
the resulting path is stored into the query plan cache, and the stored path is used for the 300 execute requests. Because the path is already known, the optimizer does not need to be called, which saves database CPU and time.
The third, relation caching, simply involves putting the entire relation (usually a table or an index) into memory so that it can be read quickly. This saves disk access, which basically means that it saves time. (This type of caching can also occur at the OS level, which caches files.) Those are the three basic types of caching; ways of implementing each are discussed below. Each one should complement the others, and a query may be able to use one, two, or all three of the caches.
size of the cache will be compared to the size of the query + output, to see if there is room for it. If there is, the query will be saved with a status of valid, a time of 'now', a count of 1, a list of all affected columns found by parsing the query, and the total size of the query + output. If there is no room, then it will try to delete one or more entries to make room. Deletion can be based on the oldest access time, the smallest access count, or the size of the query output. Some balance of the first two would probably work best, with the access time being the most important. Everything will be configurable, of
course. Whenever a table is changed, the cache must be checked as well. A list of all
columns that were actually changed is computed and compared against the list of
columns for each query. At the first sign of a match, the query is marked as "invalid."
This should happen before the changes are made to the table itself. We do not delete
the query immediately since this may be inside of a transaction, and subject to
rollback. However, we do need to mark it as invalid for the current user inside the
current transaction: thus, the status flag. When the transaction is committed, all
queries that have an "invalid" flag are deleted, and then the tables are changed. Since
the only time a query can be flagged as "invalid" is inside your own transaction, the
deletion can be done very quickly.
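The mark-then-delete protocol described above can be sketched compactly. This is an illustrative model, not a real database component: the class, its method names, and the column-set representation are assumptions made for the sketch.

```python
class ResultCache:
    """Result cache entries record their columns; invalidation is two-phase."""

    def __init__(self):
        self.entries = {}   # query text -> {"columns", "status", "result"}

    def add(self, query, columns, result):
        self.entries[query] = {"columns": set(columns),
                               "status": "valid",
                               "result": result}

    def on_table_change(self, changed_columns):
        # Mark, do not delete: the change may still be rolled back.
        for entry in self.entries.values():
            if entry["columns"] & set(changed_columns):
                entry["status"] = "invalid"

    def on_commit(self):
        # Invalid entries can now be dropped quickly.
        self.entries = {q: e for q, e in self.entries.items()
                        if e["status"] == "valid"}

    def on_rollback(self):
        # The change never happened, so the marked entries are still good.
        for entry in self.entries.values():
            entry["status"] = "valid"

rc = ResultCache()
rc.add("SELECT name FROM users", ["users.name"], ["alice"])
rc.add("SELECT total FROM orders", ["orders.total"], [42])
rc.on_table_change(["users.name"])   # inside a transaction
rc.on_commit()                       # only the matching entry is removed
```

Splitting invalidation into a cheap mark step and a commit-time delete step is what keeps the protocol transaction-safe: a rollback simply restores the "valid" flags.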
II. Query plan caching: If a query is not cached, then it "falls through" to the
next level of caching, the query plans. This can either be automatic or strictly on a
user-requested format (i.e. through the prepare-execute paradigm). The latter is
probably better, but it also would not hurt much to store non-explicitly prepared
queries in this cache as long as there is room. This cache has a field for the query
itself, the plan to be followed (i.e. scan this table, that index, sort the results, then
group them), the columns used, the access time, the access count, and the total size. It
may also want a simple flag of "prepared or non-prepared", where prepared indicates
an explicitly prepared statement that has placeholders for future values. A good
optimizer will actually change the plan based on the values plugged in to the prepared
queries, so that information should become a part of the query itself as needed, and
multiple queries may exist to handle different inputs. In general, most of the inputs
will be similar enough to use the same path (e.g. "SELECT flavor FROM foo WHERE size=?" will usually result in a simple numeric value for the
executables). If a match *is* found, then the database can use the stored path, and not
have to bother calling up the optimizer to figure it out. It then updates the access time,
the access count, and continues as normal. If a match was *not* found, then it might
possibly want to be cached. Certainly, explicit prepares should always be cached.
Non-explicitly prepared queries (those without placeholders) can also be cached. In
theory, some of this will also be in the result cache, so that should be checked as well: if it is there, there is no reason to put it here. Prepared queries should always have priority over
non-prepared and the rest of the rules above for the result query should also apply,
with a caveat that things that would affect the output of the optimizer (e.g.
vacuuming) should also be taken into account when deleting entries.
Client Caching: This is installed in the web browser and stores the contents of web sites near the client side.
Proxy Caches: A proxy cache is installed near the Web users, say within an enterprise. Users in the enterprise are told to configure their browsers to use the proxy. Requests for objects from a website are intercepted and handled by the proxy cache. If they are not in the cache, the proxy gets them from another cache or from the website itself.
Gateway caches: These are installed in the gateways connecting different networks.
The desired relative hit rate for each class is assigned based on the service differentiation policy. This architecture assigns values based on previous measurements taken without differentiation criteria. The desired relative hit rates of all classes are normalized so that they sum to 1.
Ri = Hi / (H1 + H2 + H3 + ... + Hn)
The difference between the measured relative hit rate and the desired relative hit rate is used to adjust the space allocated to a particular class. To evaluate this differentiated model that provides QoS, the architecture has been implemented in the widely used Squid proxy server. Squid is an open-source, high-performance Internet proxy cache that services HTTP requests on behalf of clients. Squid maintains a cache of requested documents to avoid re-fetching them from the web server when another client makes the same request. The hit rate (H) is considered an important parameter in measuring cache efficiency.
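The normalization Ri = Hi / (H1 + ... + Hn) is a one-liner; the hit counts below are illustrative values, not measurements from the thesis.

```python
def relative_hit_rates(hits):
    """Normalize per-class hit counts so the relative rates sum to 1."""
    total = sum(hits)
    return [h / total for h in hits]

rates = relative_hit_rates([50, 30, 20])   # three service classes
# rates sum to 1, as the differentiation policy requires
```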
CHAPTER 3
GENE EXPRESSION PROGRAMMING CONCEPT
with the start codon, continues with the amino acid codons, and ends at a
termination codon. However, a gene is more than the respective ORF, with sequences
upstream from the start codon and sequences downstream from the stop codon.
Although in GEP the start site is always the first position of a gene, the termination point does not always coincide with the last position of the gene. It is common for GEP genes to have non-coding regions downstream of the termination point.
Consider, for example, the algebraic expression:

√((a + b) · (c − d)) (5.1)

This can also be represented as a diagram or expression tree (ET), whose genotype is read off as:

01234567
Q*+-abcd (5.2)

which is the straightforward reading of the ET from left to right and from top to bottom. Expression (5.2) is an ORF, starting at Q (position 0) and terminating at d (position 7). These ORFs were named K-expressions (from the Karva language, the name chosen for the language of GEP). Note that this ordering differs from both the postfix and prefix expressions used in different GP implementations with arrays or stacks. The inverse process, that is, the translation of a K-expression into an ET, is also very simple. Consider the following K-expression:
01234567890
Q*+*a*Qaaba (5.3)
Looking only at the structure of GEP ORFs, it is difficult or even impossible to see
the advantages of such a representation, except perhaps for its simplicity and
elegance. However, when ORFs are analyzed in the context of a gene, the advantages
of such representation become obvious. As stated previously, GEP chromosomes have
fixed length and are composed of one or more genes of equal length; therefore the length of a gene is also fixed. Thus, in GEP, what varies is not the length of the genes (which is constant) but the length of the ORFs. Indeed, the length of an ORF may be equal to or less than the length of the gene. In the first case, the termination point coincides with the end of the gene; in the second case, the termination point is somewhere upstream of the end of the gene. So, what is the function of these non-coding regions in GEP genes? They are, in fact, the essence of GEP and its evolvability, for they allow modification of the genome using any genetic operator without restrictions, always producing syntactically correct programs without the need for a complicated editing process or highly constrained ways of implementing genetic operators. Indeed, this is the paramount difference between GEP and previous GP implementations, with or without linear genomes.
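The translation of a K-expression into an ET can be sketched as a breadth-first fill: each symbol takes its arguments from the next unused positions, and whatever remains past the last needed position is the non-coding region. This is an illustrative sketch; the function set and its arities (Q = square root; *, +, -, / binary) are the ones used in the examples above.

```python
import math

ARITY = {"Q": 1, "*": 2, "+": 2, "-": 2, "/": 2}   # terminals have arity 0

def karva_eval(kexpr, env):
    """Build the ET of a K-expression breadth-first and evaluate it."""
    children = {i: [] for i in range(len(kexpr))}
    frontier, next_pos = [0], 1
    while frontier and next_pos < len(kexpr):
        new_frontier = []
        for node in frontier:
            for _ in range(ARITY.get(kexpr[node], 0)):
                if next_pos < len(kexpr):
                    children[node].append(next_pos)
                    new_frontier.append(next_pos)
                    next_pos += 1
        frontier = new_frontier
    # next_pos now marks the end of the ORF; trailing symbols are non-coding.

    def value(i):
        s = kexpr[i]
        if s == "Q":
            return math.sqrt(value(children[i][0]))
        if s in "+-*/":
            a, b = (value(c) for c in children[i])
            return {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[s]
        return env[s]  # terminal: look up its value

    return value(0)

# K-expression (5.3), evaluated with a = b = 1:
result = karva_eval("Q*+*a*Qaaba", {"a": 1.0, "b": 1.0})
```

Reading (5.3) this way gives the tree for √((a + a·b) · (√a · a)); for the shorter ORF of expression (5.2), the same routine recovers √((a + b) · (c − d)).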
(Figure: the expression tree of the hash function, built from the functions mod and * and the terminals p, q and r.)
minimum collision. They constructed their hash function using a Genetic Algorithm; we have designed our algorithm using GEP. The combination of p, q and x is treated as the chromosome for our experiment. Let the initial population start with the following.
Table 3.1: Hash values for the initial population. For each chromosome, A = p·r + q, B = A mod x, and the final slot is B mod N with N = 20. The first rows, with q = 2 and r = 0, are:

p                                | q | r | x  | p·r + q (A) | A mod x (B) | B mod N (N = 20)
00000000000000000000000000000001 | 2 | 0 | 29 | 2           | 2           | 2
00000000000000000000000000000010 | 2 | 0 | 31 | 2           | 2           | 2
00000000000000000000000000000011 | 2 | 0 | 37 | 2           | 2           | 2
00000000000000000000000000000100 | 2 | 0 | 41 | 2           | 2           | 2
00000000000000000000000000000101 | 2 | 0 | 43 | 2           | 2           | 2
00000000000000000000000000000110 | 2 | 0 | 47 | 2           | 2           | 2

The table continues in the same pattern for increasing values of r and p, with A running through 3, 4, ..., 26 and x cycling through the primes 29, 31, 37, 41, 43 and 47; since every such A is below each prime modulus, B = A, and the final slot wraps modulo 20 once A reaches 20.
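The columns of the table above can be checked with a short sketch. The function name is an assumption; the arithmetic is exactly the A, B, and B mod N columns of the table.

```python
PRIMES = [29, 31, 37, 41, 43, 47]
N = 20

def row(p, q, r, x):
    """Compute one table row: A = p*r + q, B = A mod x, slot = B mod N."""
    a = p * r + q
    b = a % x
    return a, b, b % N

# With q = 2 and r = 0, every row gives A = 2, B = 2, slot = 2,
# regardless of p and of which prime x is used (since 2 < x for all of them).
first_rows = [row(p, 2, 0, x) for p, x in zip(range(1, 7), PRIMES)]
```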
3.4 The Tools & Sample Data: We used three tools for our experiment.
1. Code::Blocks 10.05
2. DTREG
3. GeneXproTools 4.0
Sample data file:
A sample of history files was obtained, which is as follows.
Each of the web sites above has a particular IP address. From our project's point of view, we convert each of them into an index. The index is an arbitrary number, because it is merely a representation of the actual site; hence, we have taken the numbers 1001 to 1015 for our input file.
Fig. 3.3 The collected website history over the month of August 2011
Table- 3.2 (A sample of 15 collected Web site History pages)
Web address                                                       | Index
1. https://login.tikona.in/userportal                             | 1001
2. http://www.google.com/firefox?client=firefox-                  | 1002
3. http://www.orkut.com/Logout?msg=0&hl=en-US                     | 1003
4. http://www.google.co.in/accounts/                              | 1004
5. http://www.orkut.co.in/Main#Home                               | 1005
6. http://uk.yahoo.com/                                           | 1006
7. http://1.254.254.254/?N=1314542648179                          | 1007
8. https://www.irctc.co.in/cgi-bin/bv60.dll/irctc/booking/planner.do?ReturnB | 1008
9. https://www.ir...ankResponse=true&ErrorTemplate                | 1009
10. http://n.admagnet.net/d/pc/?AwMNCgAUDgxUXltfWl9JCh            | 1010
11. http://www.indianrail.gov.in/dont_Know_Station_Code.html      | 1011
12. http://uk.mc290.mail.yahoo.com/mc/welcome?.gx=1&.tm=1314454235&.rand=1l5oigv20vkup#_pg=showFolder;_ylc | 1012
13. http://www.gamesfreak.net/games/Grand-Prix-Go_4208.html       | 1013
14. http://news.indiaagainstcorruption.org/?p=3520                | 1014
15. https://www.irctc.co.in/                                      | 1015

This table is obtained from the above history page.
Target variable: The target variable is the variable whose values are to be modeled
and predicted by other variables. It is analogous to the dependent variable (i.e., the
variable on the left of the equal sign) in linear regression. There must be one and only
one target variable.
Predictor variable: A predictor variable is a variable whose values will be used to
predict the value of the target variable. It is analogous to the independent variables
(i.e., the variables on the right side of the equal sign) in linear regression. There must
be at least one predictor variable specified, and there may be many predictor
variables. If more than one predictor variable is specified, DTREG will determine
how the predictor variables can be combined to best predict the values of the target
variable. For time series analysis, DTREG can automatically generate lag variables.
Weight variable: Optionally, you can specify a weight variable. If a weight variable is specified, it must be a numeric (continuous) variable whose values are greater than or equal to 0 (zero). The value of the weight variable specifies the weight given to a row
in the dataset. For example, a weight value of 2 would cause DTREG to give twice as
much weight to a row as it would to rows with a weight of 1; the effect on model
training is the same as two occurrences of the row in the dataset. Weight values may
be real (non-integer) values such as 2.5. A weight value of 0 (zero) causes the row to
be ignored. If you do not specify a weight variable, all rows are given equal weight.
An integer weight value has the same effect on model training as duplicating rows the
equivalent number of times in the training data. Since the goal of model training is to
tune parameters to minimize the overall error (or variance) of the training data,
weighted (or duplicated) rows that are misclassified add a greater amount to the total
error than un-weighted rows, so they have an increased influence on the model.
Types of Variables
Variables may be of two types: continuous and categorical.
Continuous variables with ordered values -- A continuous variable has numeric
values such as 1, 2, 3.14, -5, etc. The relative magnitude of the values is significant
To create a new project, the leftmost icon on the toolbar is clicked. Project wizard screens guide you through setting up the project.
The Evolution property page contains parameters that control evolution operations
such as mutation and recombination.
Mutation and inversion rates
Mutation rate: This is the probability that a symbol (variable, function or constant)
in a gene will be mutated during each generation. Symbols in the head of a gene can
be replaced by variables, functions and constants (if constants are used); symbols in
the tail of the gene can be replaced only by variables and constants.
Inversion rate: This is the probability that the inversion operation will be performed
on a chromosome. Inversion selects a random starting symbol in a gene and a random
ending symbol. All of the symbols between the starting and ending points are then
reversed in order.
Transposition rates
Transposition is the process of moving a sequence of symbols in a gene from one
location to another. Some types of transposition allow sequences of symbols to be
moved from one gene to another gene in the same chromosome.
IS transposition rate: This is the probability that Insertion Sequence Transposition will
be applied to a chromosome. Source and destination genes are selected in the
chromosome; the source gene may be the same as the destination. Starting and ending
symbol positions are selected in the source gene. The starting point may be in the
head or tail section of the gene, and the selected section may span the head and tail.
The destination insertion point is selected in the head of the destination gene, but it is
not allowed to be the first (root) symbol of the gene, and the selection length is
restricted so that it will remain entirely in the head of the destination gene. The
selected sequence of symbols is then inserted into the destination gene, and any
symbols following the insertion point that are in the head of the destination gene are
moved right to make room for the insertion. Symbols shifted out of the head by the
insertion are discarded.
RIS transposition rate: This is the probability that Root Insertion Sequence
Transposition will be applied to a chromosome. A random scan point is selected in the
head of a gene beyond the first (root) symbol of the gene. The process then scans
forward looking for a function symbol. If no function is found, RIS transposition does
nothing. If a function is found, a random ending point is selected beyond the starting
point but in the head of the gene. The symbols in the selected range are then inserted
at the beginning (root) of the gene. Symbols pushed out of the head by the insertion
are discarded.
Gene transposition rate: This is the probability that Gene Transposition will be applied
to a chromosome. A random gene that is not the first gene of a chromosome is
selected. This gene is then inserted as the first gene of the chromosome. The gene
being inserted is removed from its original location, and the genes preceding it are
moved over to make room for the insertion at the head of the chromosome. So the
length of the chromosome is not changed.
Recombination rates
During Recombination, two chromosomes are randomly selected, and genetic material
is exchanged between them to produce two new chromosomes. It is analogous to the
process that occurs when two individuals are bred, and the offspring share genetic
material from both parents.
One-point rate: This is the probability that one-point recombination will be applied to a
chromosome. Two parent chromosomes are randomly selected and paired together. A
split point is selected anywhere in the chromosomes (any gene and any position in a
gene head or tail). The symbols in the parents from the split point to the ends of the
chromosomes are then exchanged between the parents. Note that all chromosomes
have the same number of symbols, so no symbols are lost during the exchange.
Two-point rate: This is the probability that two-point recombination will be applied to a
chromosome. Two parent chromosomes are randomly selected and paired together.
Two recombination points are selected in the chromosomes. The symbols between the
starting and ending recombination points are then exchanged between the parent
genes.
Gene recombination rate: This is the probability that gene recombination will be
applied to a chromosome. Two parent chromosomes are randomly selected and paired
together. A random gene is selected and exchanged between the parent chromosomes.
3.5.3 Genes
A gene consists of a fixed number of symbols encoded in the Karva language. A gene
has two sections, the head and the tail. The head is used to encode functions for the
expression. The tail is a reservoir of extra terminal symbols that can be used if there
aren't enough terminals in the head to provide arguments for the functions. Thus, the
head can contain functions, variables and constants, but the tail can contain only
variables and constants (i.e. terminals). The number of symbols in the head of a gene
is specified as a parameter for the analysis. The number of symbols in the tail is
determined by the equation
t = h * (MaxArg - 1) + 1
where t is the number of symbols in the tail, h is the number of symbols in the head,
and MaxArg is the maximum number of arguments required by any function that is
allowed to be used in the expression. For example, if the head length is 6 and the
allowable set of functions consists of binary operators (+, -, *, /), then the tail length
is:
t = 6 * (2 - 1) + 1 = 7
The purpose of the tail is to provide a reservoir of terminal symbols (variables and
constants) that can be used as arguments for functions in the head if there aren't
enough terminals in the head.
Chromosomes and Linking Functions
(a + b) / (b + a) => 1
a AND NOT a => 0
Optimization of Random Constants
In addition to functions and variables, expressions can contain constants. You can
specify a set of explicit constants, and you can allow DTREG to generate and evolve
random constants. While evolution can do a good job of finding an expression that fits
data well, it is difficult for evolution to come up with exact values for real constants.
DTREG provides an optional final step to the GEP process to refine the values of
random constants. If this option is enabled, DTREG uses a sophisticated nonlinear
regression algorithm to refine the values of the random constants. This optimization is
performed after evolution has developed the functional form and linking and
simplification have been performed. DTREG uses a model/trust-region technique
along with an adaptive choice of the model Hessian. The algorithm is essentially a
combination of Gauss-Newton and Levenberg-Marquardt methods; however, the
adaptive algorithm often works much better than either of these methods alone.
If nonlinear regression does not improve the accuracy of the model, the original
model is used. So there is no risk of losing accuracy by using this option.
Chapter 4
EXPERIMENT AND RESULT ANALYSIS
return prod;
}
void mutation(int a)
{
int i = a,j=0;
int arr1[12],arr2[12];
do{
arr1[j]= i%2;
i= (i/2);
++j;
} while(i!=1);
arr1[j] = 1;
for(int n=j+1;n<12;n++)
{
arr1[n]=0;
}
cout<<"\nBefore Mutation.........\n";
for(int k=11,l=0;k>=0;k--,l++)
{
arr2[l]=arr1[k];
cout<<arr2[l];
}
cout<<"\n";
int pos;
pos=6;
cout<< "\nAfter mutation at position "<<pos<<endl;
int l;
l= NOT(arr2[pos]);
for(int x=0;x<pos;x++)
cout<<arr2[x];
cout<<l;
for(int x=pos;x<11;x++)
cout<<arr2[x];
}
void delay(int n)
{
for(int i=0;i<n;i++)
{
}
}
void CrossOver(int a, int b)
{
int i = a,j=0;
int arr1[12],arr2[12], arr3[12], arr4[12],arr5[12],arr6[12];
char ch;
ch = getch();
if(ch=='y'){
cout<<"\n\n\nGoing to exit from console.......\n\n";
exit(0);
}
else if(ch=='c')
{
do{
arr1[j]= i%2;
i= (i/2);
++j;
} while(i!=1);
arr1[j] = 1;
for(int n=j+1;n<12;n++)
{
arr1[n]=0;
}
// (code converting b to binary in arr3/arr4 and choosing the crossover
//  point pos is missing here in the source at a page break)
for(q=pos;q<12;q++)
{
temp=arr2[q];
arr2[q] = arr4[q];
arr4[q]=temp;
}
int sum_K=0,sum_R=0;
for(int p=0,m=11;p<=11;p++,m--)
{
cout<<arr2[p];
arr5[m]=arr2[p];
}
//cout<<"\nContents of array5 = ";
for(int m=0;m<=11;m++)
{
int y= power(m);
arr5[m] = arr5[m]*y;
sum_K=sum_K+arr5[m];
}
cout<<"\n";
cout<<sum_K;
cout<<"\n";
for(int r=0, n=11;r<=11;r++,n--)
{
cout<<arr4[r];
arr6[n]=arr4[r];
}
for(int n=0;n<=11;n++)
{
int y= power(n);
arr6[n]=arr6[n]*y;
//cout<<arr5[p];
sum_R=sum_R+arr6[n];
}
cout<<"\n";
cout<<sum_R;
static int val=0;
char ch1=getch();
if (ch1=='t')
{
cout<<"\nPerforming mutation on "<<val<<" th No. after Cross Over....";
if(val==0)
mutation(sum_K);
else if(val==1)
mutation(sum_R);
// alternate between the two numbers on successive key presses
val++;
if(val>=2)
val=0;
}
else
CrossOver(sum_K, sum_R);
delay(500000000);
}
else exit(0);
}
int main()
{
CrossOver(135,245);
//mutation(56);
return 0;
}
Output:
4.2 Program Showing GEP is Faster than GA: This program demonstrates the
steps of each evolution. In our project, according to the principles of GEP, the fitness function
is the chromosome itself, so each evolution depicts one possible combination of the
following chromosomes.
Evolution 1: + * % p q r x n p q r x n
Evolution 2: * % p q r x n p q r x n +
Evolution 3: % p q r x n p q r x n + * ... and so on. To simplify the structure and to
evaluate the computations, we have taken the following expressions in the program. The
expressions are not visible to the outside, but predict whether each chromosome is selected
or discarded. The basic principle in this GEP program is to discard those chromosomes that
cannot produce a valid expression tree, in other words, any chromosome that does not
start with a function symbol such as +, * or %.
#include <iostream>
#include<cstdlib>
#include<conio.h>
#include<math.h>
#include<cstdio>
using namespace std;
int p, q, r, x, n;
int funct1(){
return (p+q+r+x+n); }
int funct2(){
return ((p+r)*(p+r)); }
int funct3(){
return (p+r+p+q); }
int funct4(){
return (p+r+p+x); }
int funct5(){
return (p+r+q+x); }
int funct6(){
return (p+r+x+n); }
int funct7(){
return (p+r+p+n); }
int funct8(){
return (p+r+r+x); }
int funct9(){
return (p+r+q+n); }
int funct10(){
return (p*r+p*r); }
int funct11(){
return (p*r+p+r); }
int funct12(){
return (p*r+p+q); }
int funct13(){
return (p*r+p+x); }
int funct14(){
return (p*r+p+n); }
int funct15(){
return (p*r+r+q); }
int funct16(){
return (p*r+r+x); }
int funct17(){
return (p*r+r+n); }
int funct18(){
return (p*r+q+x); }
int funct19(){
return (p*r+q+n); }
int funct20(){
return (p*r+x+n); }
int funct21(){
return (p*r+p*r); }
int funct22(){
return (p*r+p*q); }
int funct23(){
return (p*r+p*x); }
int funct24(){
return (p*r+p*n); }
int funct25(){
return (p*r+r*q); }
// (funct26 to funct29 and the header of funct30 are missing in the source at a page break)
int funct30(){
return (p*r+p%r); }
int funct31(){
return (p*r+p%q); }
int funct32(){
return (p*r+r%q); }
int funct33(){
return (p*r+r%x); }
int funct34(){
return (p*r+r%n); }
int funct35(){
return (p*r+r%p); }
int funct36(){
return (p*r+q%p); }
int funct37(){
return (p*r+q%r); }
int funct38(){
return (p*r+q%x); }
int funct39(){
return (p*r+q%n); }
int funct40(){
return (p*r+x%n); }
int funct41(){
return (p*r+x%p); }
int funct42(){
return (p*r+x%q); }
int funct43(){
return (p*r+x%r); }
int funct44(){
return (p*r+n%p); }
int funct45(){
return (p*r+n%q); }
int funct46(){
return (p*r+n%x); }
int funct47(){
return (p*r+n%r); }
int funct48(){
return (p*r+q*x); }
int funct49(){
return (p*r+q*n); }
int funct50(){
return (p*r+q*r); }
int funct51(){
return (p*r+x*n); }
int funct52(){
return (p*r+q%n); }
int funct53(){
return (p*r*(p+q)); }
int funct54(){
return (p*r*(x+n)); }
int funct55(){
return (p*r*(p+n)); }
int funct56(){
return (p*r*(r+x)); }
int funct57(){
return (p*r*(q+n)); }
int funct58(){
return (p*r*(p*r)); }
int funct59(){
return (p*r+q%n); }
int funct60(){
return (p*r*(p*x)); }
int funct61(){
return (p*r*(p*n)); }
int funct62(){
return (p*r*(r*q)); }
int funct63(){
return (p*r*(q*x)); }
int funct64(){
return (p*r*r*x); }
int funct65(){
return (p*r*r*n); }
int funct66(){
return (p*r*r%n); }
int funct67(){
return (p*r*p%n); }
int funct68(){
return (p*r*p%q); }
int funct69(){
return (p*r*r%q); }
int funct70(){
return (p*r*r%x); }
int funct71(){
return (p*r*r%p); }
int funct72(){
return (p*r*q%p); }
int funct73(){
return (p*r*q%n); }
int funct74(){
return (p*r*q%x); }
int funct75(){
return (p*r*q%r); }
int funct76(){
return (p*r*x%q); }
int funct77(){
return (p*r*x%r); }
int funct78(){
return (p*r*x%n); }
int funct79(){
return (p*r*x%p); }
int funct80(){
return (p*r*(n%p)); }
int funct81(){
return (p*r*(n%q)); }
int funct82(){
return (p*r*(n%x)); }
int funct83(){
return (p*r*(n%r)); }
int funct84(){
return (p*r*(q*n)); }
int funct85(){
return (p*r*x*n); }
int funct86(){
return (p*r*p*q); }
int funct87(){
return (p*r+q%n); }
int funct88(){
return (p+q+p+q);}
int funct89(){
return (p+q+x+n); }
int funct90(){
return (p+q+x+r); }
int funct91(){
return (p+q+r+n); }
int funct92(){
return (p+q+q+x); }
int funct93(){
return (p+q+q+n); }
int funct94(){
return (p+q+p*q); }
int funct95(){
return (p+q+p*x); }
int funct96(){
return (p+q+p*n); }
int funct97(){
return (p+q+r*q); }
int funct98(){
return (p+q+r*x); }
int funct99(){
return (p+q+q*x); }
int funct100(){
return (p+q+q*n); }
int funct101(){
return (p+q+r*n); }
int funct102(){
return (p+q+x*n); }
int funct103(){
return (p+q+p%q); }
int funct104(){
return (p+q+q%r); }
int funct105(){
return (p+q+q%x); }
int funct106(){
return (p+q+q%n); }
int funct107(){
return (p+q+p%r); }
int funct108(){
return (p+q+p%x); }
int funct109(){
return (p+q+p%n); }
int funct110(){
return (p+q+r%p); }
int funct111(){
return (p+q+r%q); }
int funct112(){
return (p+q+r%x); }
int funct113(){
return (p+q+r%n); }
int funct114(){
return (p+q+x%p); }
int funct115(){
return (p+q+x%q); }
int funct116(){
return (p+q+x%n); }
int funct117(){
return (p+q+x%r); }
int funct118(){
return (p+q+n%p); }
int funct119(){
return (p+q+n%q); }
int funct120(){
return (p+q+n%r); }
int funct121(){
return (p+q+n%x); }
int funct122(){
return (p+x+p+r); }
int funct123(){
return (p+x+p+n); }
int funct124(){
return (p+x+p+q); }
// (funct125 and the header of funct126 are missing in the source at a page break)
int funct126(){
return (p+x+r+x); }
int funct127(){
return (p+x+r+n); }
int funct128(){
return (p+x+q+x); }
int funct129(){
return (p+x+q+n); }
int funct130(){
return (p+x+x+n); }
int funct131(){
return (p+x+p*r); }
int funct132(){
return (p+x+p*q); }
int funct133(){
return (p+x+p*n); }
int funct134(){
return (p+x+r*q); }
int funct135(){
return (p+x+r*x); }
int funct136(){
return (p+x+r*n); }
int funct137(){
return (p+x+p%q); }
int funct138(){
return (p+x+p%r); }
int funct139(){
return (p+x+p%n); }
int funct140(){
return (p+x+p%x); }
int funct141(){
return (p+x+r%q); }
int funct142(){
return (p+x+r%n); }
int funct143(){
return (p+x+q*x); }
int funct144(){
return (p+x+q*n); }
int funct145(){
return (p+x+x*n); }
int funct146(){
return (p+x+q%x); }
int funct147(){
return (p+x+x%n); }
int funct148(){
return (p+x+q%n); }
int funct149(){
return (p+x+n%p); }
int funct150(){
return (p+x+n%q); }
int funct151(){
return (p+x+n%r); }
int funct152(){
return (p+x+n%x); }
int funct153(){
return (p+x+q%r); }
int funct154(){
return (p+x+r%p); }
int funct155(){
return (p*r+q%p); }
int funct156(){
return (p+n+p+r); }
int funct157(){
return (p+n+p+q); }
int funct158(){
return (p+n+p+x); }
int funct159(){
return (p+n+p+n); }
int funct160(){
return (p+n+r+q); }
int funct161(){
return (p+n+r+x); }
int funct162(){
return (p+n+r+n); }
int funct163(){
return (p+n+q+x); }
int funct164(){
return (p+n+q+n); }
int funct165(){
return (p+n+x+n); }
int funct166(){
return (p+n+p*r); }
int funct167(){
return (p+n+p*q); }
int funct168(){
return (p+n+p*n); }
int funct169(){
return (p+n+r*q); }
int funct170(){
return (p+n+r*x); }
int funct171(){
return (p+n+r*n); }
int funct172(){
return (p+n+p%r); }
int funct173(){
return (p+n+p%q); }
int funct174(){
return (p+n+p%x); }
int funct175(){
return (p+n+p%n); }
int funct176(){
return (p+n+r%q); }
int funct177(){
return (p+n+r%x); }
int funct178(){
return (p+n+r%n); }
int funct179(){
return (p+n+q*x); }
int funct180(){
return (p+n+q*n); }
int funct181(){
return (p+n+q*r); }
int funct182(){
return (p+n+x*n); }
int funct183(){
return (p+n+q%x);}
int funct184(){
return (p+n+q%r);}
int funct185(){
return (p+n+q%p);}
int funct186(){
return (p+n+q%n); }
int funct187(){
return (p+n+x%p);}
int funct188(){
return (p+n+x%q);}
int funct189(){
return (p+n+x%r);}
int funct190(){
return (p+n+x%n);}
int funct191(){
return (p+n+r%p);}
int funct192(){
return (r+q+p+r);}
int funct193(){
return (r+q+p+q);}
int funct194(){
return (r+q+p+x);}
int funct195(){
return (r+q+p+n);}
int funct196(){
return (r+q+r+q);}
int funct197(){
return (r+q+r+x);}
int funct198(){
return (r+q+r+n);}
int funct199(){
return (r+q+q+x);}
int funct200(){
return (r+q+q+n);}
int funct201(){
return (r+q+x+n);}
int funct202(){
return (r+q+p*r);}
int funct203(){
return (r+q+p*q);}
int funct204(){
return (r+q+p*x);}
int funct205(){
return (r+q+p*n);}
int funct206(){
return (r+q+q*r);}
int funct207(){
return (r+q+q*x);}
int funct208(){
return (r+q+q*n);}
int funct209(){
return (r+q+r*x);}
int funct210(){
return (r+q+r*n);}
int funct211(){
return (r+q+x*n);}
int funct212(){
return (r+q+p%r);}
int funct213(){
return (r+q+p%q);}
int funct214(){
return (r+q+p%x);}
int funct215(){
return (r+q+p%n);}
int funct216(){
return (r+q+q%p);}
int funct217(){
return (r+q+q%r);}
int funct218(){
return (r+q+q%x);}
int funct219(){
return (r+q+q%n);}
int funct220(){
return (r+q+r%p);}
int funct221(){
return (r+q+r%q);}
int funct222(){
return (r+q+r%x);}
int funct223(){
return (r+q+r%n);}
int funct224(){
return (r+q+x%p);}
int funct225(){
return (r+q+x%q);}
int funct226(){
return (r+q+x%r);}
int funct227(){
return (r+q+x%n);}
int funct228(){
return (r+q+n%p);}
int funct229(){
return (r+q+n%q);}
int funct230(){
return (r+q+n%r);}
int funct231(){
return (r+q+n%x);}
int funct232(){
return (r+x+p+r);}
int funct233(){
return (r+x+p+x);}
int funct234(){
return (r+x+p+n);}
int funct235(){
return (r+x+p+q);}
int funct236(){
return (r+x+q+r);}
int funct237(){
return (r+x+q+x);}
int funct238(){
return (r+x+q+n);}
int funct239(){
return (r+x+r+x);}
int funct240(){
return (r+x+r+n);}
int funct241(){
return (r+x+x+n);}
int funct242(){
return (r+x+p*q);}
int funct243(){
return (r+x+p*r);}
int funct244(){
return (r+x+p*x);}
int funct245(){
return (r+x+p*n);}
int funct246(){
return (r+x+q*r);}
int funct247(){
return (r+x+q*x);}
int funct248(){
return (r+x+q*n);}
int funct249(){
return (r+x+r*x);}
int funct250(){
return (r+x+r*n);}
int funct251(){
return (r+x+x*n);}
int funct252(){
return (r+x+p%q);}
int funct253(){
return (r+x+p%r);}
int funct254(){
return (r+x+p%x); }
int funct255(){
return (r+x+p%n);}
int funct256(){
return (r+x+q%p);}
int funct257(){
return (r+x+q%r);}
int funct258(){
return (r+x+q%x);}
int funct259(){
return (r+x+q%n); }
int funct260(){
return (r+x+r%p);}
int funct261(){
return (r+x+r%q);}
int funct262(){
return (r+x+r%x);}
int funct263(){
return (r+x+r%n);}
int funct264(){
return (r+x+x%p);}
int funct265(){
return (r+x+x%q);}
int funct266(){
return (r+x+x%r);}
int funct267(){
return (r+x+x%n);}
int funct268(){
return (r+x+n%p);}
int funct269(){
return (r+x+n%q);}
int funct270(){
return (r+x+n%r);}
int funct271(){
return (r+x+n%x);}
int funct272(){
return (r+n+p+q);}
int funct273(){
return (r+n+p+r);}
int funct274(){
return (r+n+p+x);}
int funct275(){
return (r+n+p+n);}
int funct276(){
return (r+n+q+r);}
int funct277(){
return (r+n+q+x);}
int funct278(){
return (r+n+q+n);}
int funct279(){
return (r+n+r+x);}
int funct280(){
return (r+n+r+n);}
int funct281(){
return (r+n+x+n);}
int funct282(){
return (r+n+p*q);}
int funct283(){
return (r+n+p*r);}
int funct284(){
return (r+n+p*x);}
int funct285(){
return (r+n+p*n);}
int funct286(){
return (r+n+q*r);}
int funct287(){
return (r+n+q*x);}
int funct288(){
return (r+n+q*n);}
int funct289(){
return (r+n+r*x);}
int funct290(){
return (r+n+r*n);}
int funct291(){
return (r+n+x*n);}
int funct292(){
return (r+n+p%q);}
int funct293(){
return (r+n+p%r);}
int funct294(){
return (r+n+p%x);}
int funct295(){
return (r+n+p%n);}
int funct296(){
return (r+n+q%p);}
int funct297(){
return (r+n+q%r);}
int funct298(){
return (r+n+q%x);}
int funct299(){
return (r+n+q%n);}
int funct300(){
return (r+n+r%p);}
int funct301(){
return (r+n+r%q);}
int funct302(){
return (r+n+r%x);}
int funct303(){
return (r+n+r%n);}
int funct304(){
return (r+n+q%p);}
int funct305(){
return (r+n+q%r);}
int funct306(){
return (r+n+q%x);}
int funct307(){
return (r+n+q%n);}
int funct308(){
return (r+n+x%p);}
int funct309(){
return (r+n+x%q);}
int funct310(){
return (r+n+x%r);}
int funct311(){
return (r+n+x%n);}
int funct312(){
return (r+n+n%r);}
int funct313(){
return (r+n+n%p);}
int funct314(){
return (r+n+n%q);}
int funct315(){
return (r+n+n%x);}
int funct316(){
return (r+n+n%x);}
int funct317(){
return (r+n+n%x);}
int funct318(){
return (r+n+n%x);}
int funct319(){
return (r+n+n%x);}
int funct320(){
return (r+n+n%x);}
int funct321(){
return (r+n+n%x);}
int funct322(){
return (r+n+n%x);}
int funct323(){
return (r+n+n%x);}
int funct324(){
return (r+n+n%x);}
int funct325(){
return (r+n+n%x);}
int funct326(){
return (r+n+n%x);}
int funct327(){
return (r+n+n%x);}
int funct328(){
return (r+n+n%x);}
int funct329(){
return (r+n+n%x);}
int funct330(){
return (r+n+n%x);}
int funct795(){
return (p%x+x%n);}
int funct1015(){
return(q*n+x%n);}
int funct1115(){
return (q%n+x%n);}
int funct2015(){
return (((p*r+q)%x)%n);}
int main()
{
int i, a[324];   // indices up to a[323] are used below
int f;
p=1, q=2, r=1001, x=19, n=20;
f= (((p * r+ q) %x) %n);
a[0]=funct1();
a[1]=funct2();
a[2]=funct3();
a[3]=funct4();
a[4]=funct5();
a[5]=funct6();
a[6]=funct7();
a[7]=funct8();
a[8]=funct9();
a[9]=funct10();
a[10]=funct11();
a[11]=funct12();
a[12]=funct13();
a[13]=funct14();
a[14]=funct15();
a[15]=funct16();
a[16]=funct17();
a[17]=funct18();
a[18]=funct19();
a[19]=funct20();
a[20]=funct21();
a[21]=funct22();
a[22]=funct23();
a[23]=funct24();
a[24]=funct25();
a[25]=funct26();
a[26]=funct27();
a[27]=funct28();
a[28]=funct29();
a[29]=funct30();
a[30]=funct31();
a[31]=funct32();
a[32]=funct33();
a[33]=funct34();
a[34]=funct35();
a[35]=funct36();
a[36]=funct37();
a[37]=funct38();
a[38]=funct39();
a[39]=funct40();
a[40]=funct41();
a[41]=funct42();
a[42]=funct43();
a[43]=funct44();
a[44]=funct45();
a[45]=funct46();
a[46]=funct47();
a[47]=funct48();
a[48]=funct49();
a[49]=funct50();
a[50]=funct51();
a[51]=funct52();
a[52]=funct53();
a[53]=funct54();
a[54]=funct55();
a[55]=funct56();
a[56]=funct57();
a[57]=funct58();
a[58]=funct59();
a[59]=funct60();
a[60]=funct61();
a[61]=funct62();
a[62]=funct63();
a[63]=funct64();
a[64]=funct65();
a[65]=funct66();
a[66]=funct67();
a[67]=funct68();
a[68]=funct69();
a[69]=funct70();
a[70]=funct71();
a[71]=funct72();
a[72]=funct73();
a[73]=funct74();
a[74]=funct75();
a[75]=funct76();
a[76]=funct77();
a[77]=funct78();
a[78]=funct79();
a[79]=funct80();
a[80]=funct81();
a[81]=funct82();
a[82]=funct83();
a[83]=funct84();
a[84]=funct85();
a[85]=funct86();
a[86]=funct87();
a[87]=funct88();
a[88]=funct89();
a[89]=funct90();
a[90]=funct91();
a[91]=funct92();
a[92]=funct93();
a[93]=funct94();
a[94]=funct95();
a[95]=funct96();
a[96]=funct97();
a[97]=funct98();
a[98]=funct99();
a[99]=funct100();
a[100]=funct101();
a[101]=funct102();
a[102]=funct103();
a[103]=funct104();
a[104]=funct105();
a[105]=funct106();
a[106]=funct107();
a[107]=funct108();
a[108]=funct109();
a[109]=funct110();
a[110]=funct111();
a[111]=funct112();
a[112]=funct113();
a[113]=funct114();
a[114]=funct115();
a[115]=funct116();
a[116]=funct117();
a[117]=funct118();
a[118]=funct119();
a[119]=funct120();
a[120]=funct121();
a[121]=funct122();
a[122]=funct123();
a[123]=funct124();
a[124]=funct125();
a[125]=funct126();
a[126]=funct127();
a[127]=funct128();
a[128]=funct129();
a[129]=funct130();
a[130]=funct131();
a[131]=funct132();
a[132]=funct133();
a[133]=funct134();
a[134]=funct135();
a[135]=funct136();
a[136]=funct137();
a[137]=funct138();
a[138]=funct139();
a[139]=funct140();
a[140]=funct141();
a[141]=funct142();
a[142]=funct143();
a[143]=funct144();
a[144]=funct145();
a[145]=funct146();
a[146]=funct147();
a[147]=funct148();
a[148]=funct149();
a[149]=funct150();
a[150]=funct151();
a[151]=funct152();
a[152]=funct153();
a[153]=funct154();
a[154]=funct155();
a[155]=funct156();
a[156]=funct157();
a[157]=funct158();
a[158]=funct159();
a[159]=funct160();
a[160]=funct161();
a[161]=funct162();
a[162]=funct163();
a[163]=funct164();
a[164]=funct165();
a[165]=funct166();
a[166]=funct167();
a[167]=funct168();
a[168]=funct169();
a[169]=funct170();
a[170]=funct171();
a[171]=funct172();
a[172]=funct173();
a[173]=funct174();
a[174]=funct175();
a[175]=funct176();
a[176]=funct177();
a[177]=funct178();
a[178]=funct179();
a[179]=funct180();
a[180]=funct181();
a[181]=funct182();
a[182]=funct183();
a[183]=funct184();
a[184]=funct185();
a[185]=funct186();
a[186]=funct187();
a[187]=funct188();
a[188]=funct189();
a[189]=funct190();
a[190]=funct191();
a[191]=funct192();
a[192]=funct193();
a[193]=funct194();
a[194]=funct195();
a[195]=funct196();
a[196]=funct197();
a[197]=funct198();
a[198]=funct199();
a[199]=funct200();
a[200]=funct201();
a[201]=funct202();
a[202]=funct203();
a[203]=funct204();
a[204]=funct205();
a[205]=funct206();
a[206]=funct207();
a[207]=funct208();
a[208]=funct209();
a[209]=funct210();
a[210]=funct211();
a[211]=funct212();
a[212]=funct213();
a[213]=funct214();
a[214]=funct215();
a[215]=funct216();
a[216]=funct217();
a[217]=funct218();
a[218]=funct219();
a[219]=funct220();
a[220]=funct221();
a[221]=funct222();
a[222]=funct223();
a[223]=funct224();
a[224]=funct225();
a[225]=funct226();
a[226]=funct227();
a[227]=funct228();
a[228]=funct229();
a[229]=funct230();
a[230]=funct231();
a[231]=funct232();
a[232]=funct233();
a[233]=funct234();
a[234]=funct235();
a[235]=funct236();
a[236]=funct237();
a[237]=funct238();
a[238]=funct239();
a[239]=funct240();
a[240]=funct241();
a[241]=funct242();
a[242]=funct243();
a[243]=funct244();
a[244]=funct245();
a[245]=funct246();
a[246]=funct247();
a[247]=funct248();
a[248]=funct249();
a[249]=funct250();
a[250]=funct251();
a[251]=funct252();
a[252]=funct253();
a[253]=funct254();
a[254]=funct255();
a[255]=funct256();
a[256]=funct257();
a[257]=funct258();
a[258]=funct259();
a[259]=funct260();
a[260]=funct261();
a[261]=funct262();
a[262]=funct263();
a[263]=funct264();
a[264]=funct265();
a[265]=funct266();
a[266]=funct267();
a[267]=funct268();
a[268]=funct269();
a[269]=funct270();
a[270]=funct271();
a[271]=funct272();
a[272]=funct273();
a[273]=funct274();
a[274]=funct275();
a[275]=funct276();
a[276]=funct277();
a[277]=funct278();
a[278]=funct279();
a[279]=funct280();
a[280]=funct281();
a[281]=funct282();
a[282]=funct283();
a[283]=funct284();
a[284]=funct285();
a[285]=funct286();
a[286]=funct287();
a[287]=funct288();
a[288]=funct289();
a[289]=funct290();
a[290]=funct291();
a[291]=funct292();
a[292]=funct293();
a[293]=funct294();
a[294]=funct295();
a[295]=funct296();
a[296]=funct297();
a[297]=funct298();
a[298]=funct299();
a[299]=funct300();
a[300]=funct301();
a[301]=funct302();
a[302]=funct303();
a[303]=funct304();
a[304]=funct305();
a[305]=funct306();
a[306]=funct307();
a[307]=funct308();
a[308]=funct309();
a[309]=funct310();
a[310]=funct311();
a[311]=funct312();
a[312]=funct313();
a[313]=funct314();
a[314]=funct315();
a[315]=funct316();
a[316]=funct317();
a[317]=funct318();
a[318]=funct319();
a[319]=funct320();
a[320]=funct795();
a[321]=funct1015();
a[322]=funct1115();
a[323]=funct2015();
for(i=0;i<324;i++)
{
if (a[i]==f)
cout<<"Target is found after "<<i<<" th iteration \n";
}
return 0;
}
Output:
go up to a number much larger than these without guaranteeing a valid result. On the other
hand, GEP always ensures a valid result.
4.3 Simulating GEP Using the DTREG Tool: This is the first step towards
initiating the project. It is done by single-clicking the leftmost icon on the toolbar, after
which the following screen comes up. It asks for a name for the project followed by the
location of the input file. The input file is a CSV (comma separated values) file; a CSV file
is a plain-text form of a spreadsheet file.
4.4.1 Data:
4.4.2 General settings:
4.4.4 Run
4.4.5 History
4.4.6 Results
CHAPTER 5
5.1 Conclusion:
A key performance measure for the World Wide Web is the speed with which
content is served to users. As traffic on the Web increases, users are faced with
increasing delays and failures in data delivery. Web caching is one of the key
strategies that has been explored to improve performance. Caching has been
employed to improve the efficiency and reliability of data delivery over the Internet. A
nearby cache can serve a (cached) page quickly even if the originating server is
swamped or the network path to it is congested. While this argument provides the
self-interested user with the motivation to exploit caches, it is worth noting that
widespread use of caches also engenders a general good: if requests are intercepted by
nearby caches, then fewer go to the source server, reducing load on the server and
network traffic to the benefit of all users.
Applying hashing to integers is an established technique. But applying hashing to
enhance the functionality of a cache that stores web pages is a new approach which
needs lots of experiment and careful observation. Therefore, the future scope of this
project lies in further experiments using various kinds of websites
such as jobsites, educational and e-governance sites.
into enhancing a hash function which actually represents a web cache. In our
experiment, we have taken a cluster of computer systems in rural areas called the
nodes, which are connected to a slow internet connection and at the same time
connected to each other through high speed network. Each of them has a local cache.
They can access each other's cache when a miss occurs in their own cache. In this
scenario, the village community which uses the system has very limited requirements.
They mostly need internet for healthcare, education, jobs and land usages. So our
improved cache is full of related websites which are meant to fulfill the needs of the
villagers. It is not optimized to incorporate the needs of people of other communities
with varied interests such as people living in metros. Therefore, the future scope of
our project lies in developing improved cache functionality for different usage
patterns for different groups of people. For example, some people are interested in
research, some in politics, some in government functionaries, and some in social
networking sites. Our project would throw light on improving the web cache
functionality for the varied sets of web sites used by different groups of users with
different ranges of interests.
REFERENCES
1. Sibren Isaacman and Margaret Martonosi, Potential for Collaborative Caching and
Prefetching in Largely-Disconnected Villages, WiNS-DR '08, September 19, 2008,
San Francisco, California, USA, ACM 978-1-60558-190-3/08/09.
on Collaborative Caching in P2P Systems,
homepage.fudan.edu.cn/~wnqian/publications/DASFAA05P2P.pdf
5. Mursalin Akon, Towhidul Islam, Xuemin Shen and Ajit Singh, SPACE: A
lightweight collaborative caching for clusters, Peer-to-Peer Netw. Appl. (2010)
3:83-99, DOI 10.1007/s12083-009-0047-5.
6. Dominguez-Sal, D., Larriba-Pey,J.,and Surdeanu, M., A Multi-layer Collaborative
Cache for Question Answering. In Proceedings of Euro-Par. 2007, 295-306.
7. Jianliang Xu, Jiangchuan Liu, Bo Li, Xiaohua Jia, Caching and Prefetching for
Web Content Distribution, IEEE, July-Aug. 2004, Volume 6, Issue 4, pp. 54-59,
ISSN 1521-9615, DOI 10.1109/MCSE.2004.5.
Systems, ISBN 978-83-60434-59-8, pages 89-398,
iis.ipipan.waw.pl/2009/proceedings/iis09-38.pdf
11. C. Ferreira, GEP: Mathematical Modeling by an Artificial Intelligence, online
version.
12. Michael J. Flynn ,Computer architecture, Pipelined and Parallel Processor
Design, ISBN- 978-81-7319-100-8.
13. Zhe Zhang, Kyuhyung Lee, Xiaosong Ma, Yuanyuan Zhou, PFC: Transparent
Optimization of Existing Prefetching Strategies for Multi-level Storage Systems,
IEEE ICDCS '08, ISBN 978-0-7695-3172-4.
14. Tang; Zhang; Chanson; Streaming Media Caching Algorithms for Transcoding
Proxies, http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.6017.
15. Kipruto, Tan, Musau, Mushi, Using Genetic Algorithms to Optimize Web Caching in Multimedia-Integrated e-Learning Content, International Journal of Digital Content Technology and its Applications, Volume 5, Number 8, August 2011.
16. Brian D. Davison, A Web Caching Primer, reprinted from IEEE Internet Computing, Volume 5, Number 4, July/August 2001, pages 38-45.
17. Hai Liu, Maobian Chen, Evaluation of Web Caching Consistency, 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), IEEE, ISBN 978-1-4244-6539-2.
18. Michael Rabinovich and Oliver Spatscheck; Web Caching and Replication;
Addison Wesley; 1st edition (2002), ISBN 0-201-61570-3.
19. Sitaram Iyer, Antony Rowstron, Peter Druschel, Squirrel: A Decentralized Peer-to-Peer Web Cache, 21st ACM Symposium on Principles of Distributed Computing (PODC 2002).
20. Pawan Kumar Choudhary and Kishor S. Trivedi; Performance evaluation of
Web Cache, www.ee.duke.edu/~pkc4/webcache.pdf
21. Sarina Sulaiman, Siti Mariyam Shamsuddin, and Ajith Abraham, Rough Web Caching, www.softcomputing.net/sarina-rs.pdf
22. P. Venketesh, S.N. Sivanandam, S. Manigandan, Enhancing QoS in Web Caching using
40. Athena Vakali, A Genetic Algorithm Scheme for Web Replication and Caching,
http://www.csd.auth.gr/teachers/vakali.html
41. en.wikipedia.org/wiki/Hash_table
Appendix I
List of Publications:
1. National Conference on "Information & Communication Technology: Opportunities & Challenges in 21st Century", organized by Birla Institute of Technology.
2. International Conference on Advances in Computing and Communication.
Paper titles recoverable from the certificate scans: "Web-Mining Enabled Services in Web Mining using Advanced Expression Programming"; "Web Mining: Ranking Metrics Method".
Appendix II
CURRICULUM VITAE
Profile Summary
Ten years of work experience spanning both academia and industry.
Education:
B.E.
Publications:
1. National Conference on "Information & Communication Technology: Opportunities & Challenges in 21st Century", NCICT, Birla Institute of Technology, Noida, February 2011.
2. International Conference on Advances in Computing and Communication, ICACC-2011, NIT Hamirpur, April 2011, on the topic "Improving Web Caching Mechanisms using Gene Expression Programming".