Christoforus Yoga Haryanto, Sutrisno, Rahmadi Trimananda
Department of Computer Systems, Faculty of Computer Science, Universitas Pelita Harapan
cyharyanto@gmail.com, sutrisno@uph.edu, rahmadi.trimananda@staff.uph.edu
Abstract
This report presents the Garcinia peer-to-peer distributed computing framework. Garcinia is implemented on top of the Java Platform and uses IP multicasting to create an unstructured overlay network layer. The framework consists of the agent and the shared library. The agent acts as a node in the peer-to-peer network and also as a worker and dispatcher service provider to the consumer of the shared library. The agent keeps an up-to-date list of neighbouring node information, provides that information to the requesting consumer application, and accepts tasks submitted by the consumer application. All communication between the agent and the consumer application is done via the shared library. A local agent instance must be started to ensure the fairness of the network. The framework provides an interface for developing applications that solve bag-of-tasks style processing. The framework gives the consumer application the flexibility to choose the worker or to use the built-in mechanism that performs randomised dynamic load balancing. The randomised dynamic load balancing in this framework can attain 62.2% system efficiency compared to an ideal parallel ensemble.

Keywords: peer-to-peer, distributed computing, distributed systems, Java
2.2 Peer-to-Peer
A peer-to-peer system is a type of distribution in a distributed system where the processes that constitute the system are equal. Every process must implement the functions the system needs to carry out; hence the interaction between processes is symmetric. Each process acts as a client and a server at the same time.[13] The way a peer-to-peer system organizes the processes is defined in an overlay network. The overlay network is a network in which the nodes are formed by the processes and the links between nodes represent the permissible communication channels. Every node communicates with the others only through the permissible channels on the overlay network.[13] There are two types of overlay network: structured and unstructured. The distributed hash table (DHT) is the most common example of a structured overlay network. An example among unstructured peer-to-peer file-sharing systems is Gnutella, which is known for its ping, pong, and bye maintenance messages. Pings are used to discover hosts; pongs are replies to pings and contain information about the responding peer and other peers; byes are optional messages announcing an impending connection close.[13][14][15] Previous research at Purdue University on peer-to-peer distributed computing also uses an unstructured overlay network.[4] A design issue is the construction of the overlay network. There are two approaches: in the first, nodes organize themselves directly into a tree, so there is a unique path between every pair of nodes; in the second, nodes organize into a mesh network where there are generally multiple paths between nodes. The second generally provides higher robustness.[13]
2.5 Multicast
A multicast is a mechanism to send packets to multiple destinations simultaneously in a network. It differs from broadcast in that the packet is sent to a set of destinations that have joined a group, rather than to each destination individually. Multicast was developed to eliminate the inefficiencies of broadcasting in its use of network bandwidth and computational resources, since with broadcast each machine has to process every broadcasted datagram. Multicast can be hardware based, such as Ethernet multicast, or IP based, as in IP multicasting, which is the internet abstraction of hardware multicasting.[17]
Multicast makes use of the Internet Group Management Protocol (IGMP) to communicate group membership information. IGMP supports local membership declaration and propagation to multicast routers. It also supports dynamic membership by periodically polling hosts on the local network. IGMP is carefully designed to avoid excessive overhead and should not congest networks that include multiple routers and hosts all participating in multicasting. In most cases, IGMP only introduces a periodic message from a multicast router and a single reply for each multicast group.[17]
2.6 Socket
A socket is an abstraction of network input and output. It is an application programming interface (API) between application programs and the protocol software. The socket is not in the TCP/IP standards, and the de facto standard of the socket is the implementation on the BSD UNIX operating system. It is a generalisation of the UNIX file access mechanism that provides an endpoint for communication. A program using a socket thus performs network input and output similarly to file access, with read and write operations. A socket usually encapsulates a Transmission Control Protocol (TCP) connection between two machines. Many operating systems support sockets, and there are libraries to facilitate socket operations.[18]
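To make the file-like read/write model concrete, the sketch below runs a one-line echo exchange over a loopback TCP connection using the `java.net` socket API. The class and method names are illustrative, not part of the Garcinia Framework:

```java
import java.io.*;
import java.net.*;

// Minimal sketch: a loopback echo exchange. The server thread accepts one
// connection and echoes a single line; the client writes and reads the
// socket streams much like a file.
public class EchoSketch {
    public static String echoOnce(String message) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = any free port
            Thread serverThread = new Thread(() -> {
                try (Socket peer = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(peer.getInputStream()));
                     PrintWriter out = new PrintWriter(peer.getOutputStream(), true)) {
                    out.println(in.readLine()); // echo the received line back
                } catch (IOException ignored) { }
            });
            serverThread.start();
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                out.println(message); // write, as if writing to a file
                return in.readLine(); // read the echoed reply
            } finally {
                serverThread.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoOnce("hello"));
    }
}
```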
where T(1) is the execution time of the sequential program that solves the problem, and T(n) is the execution time of the parallel program that solves the same problem on an n-processor system. The system efficiency indicates the actual degree of speedup achieved by a parallel computer compared to the maximum value. The system efficiency for an n-processor system is defined by: [21]

E(n) = S(n) / n = T(1) / (n × T(n))

where S(n) is the speedup of the n-processor system, T(1) is the execution time on a uniprocessor system, and T(n) is the execution time on the n-processor system. The speedup and system efficiency of a parallel computer are tied to a particular system architecture and the algorithm used. Both are expected to be higher when the overhead is lower, such as when the program requires fewer interprocess interactions, generates less idling, and avoids excess computation. Embarrassingly parallel problems are the best candidates for executing independently without communication. The best expected speedup is linear in the number of processing elements, and the most scalable system is the one whose efficiency is as close as possible to 100%.[21][23][24]
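As a worked illustration of these definitions, the sketch below computes S(n) and E(n) from timings. The 100-second and 11.5-second values are invented for the example, chosen so the efficiency lands near the 62.2% figure reported later in this report:

```java
// Speedup and system efficiency as defined above:
//   S(n) = T(1) / T(n)   and   E(n) = S(n) / n.
public class ParallelMetrics {
    public static double speedup(double t1, double tn) {
        return t1 / tn;
    }

    public static double efficiency(double t1, double tn, int n) {
        return speedup(t1, tn) / n;
    }

    public static void main(String[] args) {
        // Hypothetical timings: 100 s sequential, 11.5 s on 14 processors.
        System.out.printf("S = %.2f, E = %.1f%%%n",
                speedup(100.0, 11.5), 100 * efficiency(100.0, 11.5, 14));
    }
}
```

With these numbers, S(14) ≈ 8.70 and E(14) ≈ 62.1%, so a 62.2% efficiency on 14 processing elements corresponds to a speedup of roughly 8.7.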
Figure 1. Java Platform, Standard Edition 6 JDK component diagram. 2006, Sun Microsystems, Inc. Source: http://download.oracle.com/javase/6/docs/api/index.html
3. The Design of Garcinia Framework

3.1 Garcinia Peer-to-Peer Overlay Network Architecture
The Garcinia Framework peer-to-peer layer uses an unstructured overlay network architecture where the agents act as the nodes on the overlay network. An IPv4 multicast address is used as the overlay network communication channel. The multicast address is not fixed and must be configurable in a production environment. An agent is a multifunctional piece of program, one of whose functions is to be a peer to the other neighbouring agents. As a peer, the agent communicates with the other agents in a peer-to-peer manner, where each agent acts as an equal to every other agent in the network. The first time the agent is started, it joins the multicast group by registering as a destination peer. After successfully joining the multicast group, the agent sends a service discovery datagram to the group and waits for replies from the other agents reachable within the group. The replying agents are put into a local list of neighbouring nodes, which the agent periodically maintains. The agents joined to the multicast group essentially create an unstructured overlay network with a mesh organisation, where each agent can communicate freely with every other reachable agent and the underlying multicast topology itself is transparent to the agents.
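The join-then-discover sequence described above can be sketched with the `java.net` multicast API. The group address, port, and plaintext payload format below are illustrative assumptions for this sketch, not the framework's actual values:

```java
import java.net.*;
import java.nio.charset.StandardCharsets;

// Sketch of an agent joining the overlay's multicast group and announcing
// itself. GROUP and PORT are assumed values; the real framework reads its
// group address from configuration.
public class OverlayJoinSketch {
    static final String GROUP = "239.7.12.90"; // administratively scoped group (assumed)
    static final int PORT = 7090;              // assumed port

    // An assumed plaintext discovery payload sent to the group.
    public static byte[] discoveryDatagram(String nodeId) {
        return ("HELLO " + nodeId).getBytes(StandardCharsets.UTF_8);
    }

    // Joining the group registers the host as a destination peer (and
    // triggers an IGMP membership report); the datagram is then sent to
    // every reachable member of the group.
    public static void joinAndAnnounce(String nodeId) throws Exception {
        InetAddress group = InetAddress.getByName(GROUP);
        try (MulticastSocket socket = new MulticastSocket(PORT)) {
            socket.joinGroup(group);
            byte[] payload = discoveryDatagram(nodeId);
            socket.send(new DatagramPacket(payload, payload.length, group, PORT));
            socket.leaveGroup(group);
        }
    }

    public static void main(String[] args) {
        System.out.println(new String(discoveryDatagram("node-1"),
                StandardCharsets.UTF_8));
    }
}
```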
[Component diagram: the Consumer Application (User Application Code) interacts through the Garcinia Shared Library classes (Task (Abstract), Task Implementation, Task Metadata, Assignment, Assignment Data, Assignment Result, Worker Info) with the agent services (Overlay Service Manager, Worker Service, Executor, Dispatcher Service, Logging Service).]
The agent program contains five main components: the Command Line Service, Overlay Service, Worker Service, Dispatcher Service, and Logging Service. The Garcinia Agent also makes use of the Garcinia Shared Library to facilitate its operation.
[Diagram: the Worker Service comprises a task executor and handles task code segment migration and assignment data transfer.]
[Sequence diagram: example of interaction between the Consumer Application and the Garcinia Shared Library, showing the User Application Code creating a new Sender and using the Task (Abstract), Task Metadata, Assignment, Assignment Result, and Worker Info classes.]
The overlay service manager periodically maintains the list of neighbouring nodes by sending and receiving datagrams between the nodes in multicast and unicast manner. The maintenance includes service discovery by processing the incoming datagrams. The datagram is a simple UDP packet containing a message in plaintext format. There are three types of message to be processed: the HELLO message, the HI message, and the LEAVE message. The structure of the messages can be observed in Table 1.
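A minimal sketch of dispatching on the three maintenance message types is shown below. Table 1 gives the actual message structure; the "TYPE sender-info" layout assumed here is illustrative only:

```java
// Sketch: classify an incoming plaintext maintenance datagram by its
// leading keyword. The payload layout after the keyword is an assumption.
public class MessageDispatchSketch {
    public enum Type { HELLO, HI, LEAVE, UNKNOWN }

    public static Type classify(String datagramText) {
        String head = datagramText.trim().split("\\s+")[0];
        switch (head) {
            case "HELLO": return Type.HELLO; // service discovery probe
            case "HI":    return Type.HI;    // reply carrying node information
            case "LEAVE": return Type.LEAVE; // node departing the overlay
            default:      return Type.UNKNOWN;
        }
    }
}
```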
Each node is connected to a 1000BASE-T switched network by Category 6 Unshielded Twisted-Pair (Cat6 UTP) cable in a star topology. The IPv4 address block 122.200.8.0/25 is assigned to the test network.
Table 3. Node name, IP address, packet loss, and TCP latency.

Node Name  IP Address    Packet Loss  TCP Latency
TEST-01    122.200.8.59  0.00%        60.92 ms
TEST-02    122.200.8.60  0.00%        70.63 ms
TEST-03    122.200.8.63  0.00%        61.33 ms
TEST-04    122.200.8.64  0.00%        57.89 ms
TEST-05    122.200.8.65  0.00%        61.89 ms
TEST-06    122.200.8.69  0.00%        58.13 ms
TEST-07    122.200.8.70  0.00%        57.41 ms
TEST-08    122.200.8.71  0.00%        58.51 ms
TEST-09    122.200.8.74  0.00%        63.68 ms
TEST-10    122.200.8.75  0.00%        67.47 ms
MONITOR    122.200.8.79  n/a          n/a

Of 524 total TCP connections: packet loss 0.00% throughout; latency 28 ms minimum, 103 ms maximum, 62.40 ms average, 27.96% CV.
From Table 4, it can be observed that each node generates exactly 60 multicast UDP packets over the period of 10 minutes (one packet every 10 seconds). This confirms the implementation of the overlay service, which sends an update to the network every 10 seconds.
Table 5. Summary of IGMP traffic of multicast address 239.7.12.90 over a period of 10 minutes.

Address A      Address B   Packets  Bytes
122.200.8.59   224.0.0.22  5        310
122.200.8.60   224.0.0.22  5        310
122.200.8.63   224.0.0.22  5        310
122.200.8.64   224.0.0.22  5        310
122.200.8.65   224.0.0.22  5        310
122.200.8.69   224.0.0.22  5        310
122.200.8.70   224.0.0.22  5        310
122.200.8.71   224.0.0.22  5        310
122.200.8.74   224.0.0.22  5        310
122.200.8.75   224.0.0.22  5        310
122.200.8.79   224.0.0.22  5        270
122.200.8.147  224.0.0.22  5        510
TOTAL:                     60       3,880
From Table 5 it can be seen that the multicast group membership report datagrams are sent from the individual node IP addresses to the 224.0.0.22 multicast group. Note the "Join group 239.7.12.90 for any sources" messages; these inform that the corresponding source IP address is willing to receive any multicast datagram sent to the 239.7.12.90 multicast group. A stray machine at IP address 122.200.8.147 also appears on the list, as it runs software that also uses multicast groups and includes the address 239.7.12.90 beside the shown address 239.192.152.143 in its IGMP packet. This occurrence is considered not relevant to the operation of the Garcinia Framework peer-to-peer overlay network. Of the total 318,858 bytes in 3,478 packets captured within 600 seconds (10 minutes), only 73,712 bytes in 720 packets were generated by the Garcinia overlay network and its corresponding IGMP. That translates to 123.0 bytes/second of traffic at just 1.201 packets/second throughput. For comparison, the total UDP traffic is 179,269 bytes in 1,601 packets over the same 10-minute period.
Figure 7. The consumer application and agent output of the reverse string test application.
The successful local agent functional test also proves the Garcinia Framework's resilience to network failures, as task execution can still be done without the presence of other agents in the overlay network.
It can be observed from Table 6 that every node can successfully execute the distributed LCS test application with an average execution time of 8,840 milliseconds. The execution time variation is measured using the coefficient of variation (CV). The fact that the execution time varies by 11.94% can be explained by the use of randomised assignment distribution and the variation of network performance over time.
Table 7. Worker service load using random assignment distribution.
Worker service load (%) by test location (rows = test location acting as dispatcher node, with 0.0% load on itself; columns = worker nodes):

Location  TEST-01 TEST-02 TEST-03 TEST-04 TEST-05 TEST-06 TEST-07 TEST-08 TEST-09 TEST-10
TEST-01   0.0%    11.1%   15.6%   31.1%   11.1%   6.7%    4.4%    8.9%    6.7%    4.4%
TEST-02   22.2%   0.0%    8.9%    17.8%   4.4%    2.2%    15.6%   8.9%    6.7%    13.3%
TEST-03   20.0%   22.2%   0.0%    13.3%   13.3%   4.4%    4.4%    8.9%    4.4%    8.9%
TEST-04   15.6%   28.9%   4.4%    0.0%    20.0%   2.2%    6.7%    2.2%    8.9%    11.1%
TEST-05   20.0%   26.7%   13.3%   11.1%   0.0%    8.9%    2.2%    2.2%    11.1%   4.4%
TEST-06   15.6%   24.4%   13.3%   17.8%   2.2%    0.0%    8.9%    6.7%    4.4%    6.7%
TEST-07   11.1%   28.9%   13.3%   17.8%   13.3%   8.9%    0.0%    2.2%    2.2%    2.2%
TEST-08   11.1%   20.0%   20.0%   20.0%   2.2%    6.7%    11.1%   0.0%    0.0%    8.9%
TEST-09   15.6%   22.2%   6.7%    13.3%   8.9%    11.1%   8.9%    2.2%    0.0%    11.1%
TEST-10   20.0%   24.4%   6.7%    15.6%   6.7%    8.9%    6.7%    8.9%    2.2%    0.0%
Average:  16.8%   23.2%   11.4%   17.5%   9.1%    6.7%    7.7%    5.7%    5.2%    7.9%
CV:       24.0%   23.5%   44.2%   33.2%   64.9%   47.1%   52.6%   59.1%   67.8%   46.9%

Overall CV = 64.7%.
The load is balanced by making use of the built-in randomised dynamic load balancing. The load can be measured by analysing the number of assignments assigned to a worker service relative to the total number of assignments of the task, called the relative load. In this test, the relative load is measured against the total load of each corresponding test location, which is set at 100%.
It can be observed from Table 7 that the nodes' worker service load is not perfectly balanced and varies considerably from one test to another, with an average CV over the 10 nodes of 64.7%. In comparison to the data in Table 6, it can be seen that although the load is not perfectly balanced, the framework maintains an execution time variation of just 11.94%, still considerably less than the worker service load variation.
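The randomised dispatch underlying these load figures can be sketched as below. Each assignment is sent to a uniformly random worker drawn from the neighbour list; the worker names and counts are placeholders, and the relative load of a worker is its count divided by the total:

```java
import java.util.*;

// Sketch of randomised dynamic load balancing: every assignment goes to a
// uniformly random worker. With enough assignments the loads even out on
// average, but any single run can be quite unbalanced, matching the high
// per-run CV observed in Table 7.
public class RandomDispatchSketch {
    public static Map<String, Integer> dispatch(List<String> workers,
                                                int assignments, long seed) {
        Random rng = new Random(seed); // seeded for reproducibility
        Map<String, Integer> load = new HashMap<>();
        for (String w : workers) load.put(w, 0);
        for (int i = 0; i < assignments; i++) {
            String chosen = workers.get(rng.nextInt(workers.size()));
            load.merge(chosen, 1, Integer::sum);
        }
        return load;
    }

    public static void main(String[] args) {
        // 45 assignments (the 10-strand case) over three placeholder workers.
        System.out.println(dispatch(
                Arrays.asList("TEST-01", "TEST-02", "TEST-03"), 45, 42L));
    }
}
```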
An important note about the test configurations is that the Reference configuration uses the sequential version of the LCS Matching algorithm, while every other test configuration uses the distributed parallel version of the algorithm. The Reference configuration execution time is the basis of the speedup and system efficiency analysis, because such an analysis of a distributed computing system running a distributed parallel program needs a reference to the sequential version of the program running on a sequential computer. Table 9 shows the 10 cases of the number of strands used in each test configuration, along with the length of the strands and the maximum number of concurrent assignment executions. A combination of a test configuration and a case is called a treatment. In total, 120 treatments are used to test the Garcinia Framework speedup and system efficiency.
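The sequential reference can be sketched as the textbook longest-common-subsequence dynamic program; whether the LCS Matching application uses exactly this formulation is not shown in this report, so the sketch below is the standard O(m·n) recurrence rather than the framework's actual code:

```java
// Sketch of a sequential LCS computation: dp[i][j] holds the LCS length of
// the first i characters of a and the first j characters of b.
public class LcsSketch {
    public static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                dp[i][j] = (a.charAt(i - 1) == b.charAt(j - 1))
                        ? dp[i - 1][j - 1] + 1                    // match extends the LCS
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);   // else carry the best
            }
        }
        return dp[a.length()][b.length()];
    }
}
```

For example, lcsLength("ABCBDAB", "BDCABA") is 4 (one witness is "BCBA"). Each distributed assignment pairs two strands and runs this comparison independently, which is what makes the workload bag-of-tasks.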
Table 9. Test cases for the speedup and system efficiency analysis.
No.  Number of Strands  Length of Strand  Max. Concurrency  Exp. Number of Assignments
1.   5                  5,000             100               10
2.   10                 5,000             100               45
3.   15                 5,000             100               105
4.   20                 5,000             100               190
5.   25                 5,000             100               300
6.   30                 5,000             100               435
7.   35                 5,000             100               595
8.   40                 5,000             100               780
9.   45                 5,000             100               990
10.  50                 5,000             100               1,225
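The expected number of assignments in Table 9 follows directly from pairwise matching: with n strands there are n(n-1)/2 unordered pairs, one discrete assignment per pair, which is why the count grows quadratically:

```java
// Expected number of discrete assignments for n strands: one assignment
// per unordered pair of strands, i.e. n choose 2 = n * (n - 1) / 2.
public class AssignmentCount {
    public static int pairs(int strands) {
        return strands * (strands - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 50; n += 5) {
            System.out.println(n + " strands -> " + pairs(n) + " assignments");
        }
    }
}
```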
The execution time is the time that elapses from the moment the calculation starts, including assignment scheduling, to the moment the LCS Genetic Matching program finishes execution. Each data series in Figure 8 corresponds to a test configuration and each data point corresponds to a treatment. The vertical axis shows the execution time in seconds and the horizontal axis shows the number of strands used in the treatments. The data is plotted on a linear scale on both axes. Lower execution time means better performance.
Figure 8. Distributed LCS Matching Execution Time (execution time in seconds vs. number of strands, 0 to 50; one data series per test configuration, N = 0 to N = 10).
Figure 9 shows the execution time for 50 strands relative to the number of processing elements. It clearly shows that the execution time is faster (lower) when additional processing elements are added. Note that there is no data for execution with two or three processing elements, as no such test was done; the dual-core monitor node is equivalent to four processing elements instead of two.
Figure 9. Distributed LCS Matching Execution Time for 50 Strands (execution time in seconds vs. number of processing elements, 0 to 14).
Figure 10 shows the speedup in relation to the number of processing elements for the different test cases. The speedup is always less than 2.000 for the five-strand test case regardless of the test configuration. Although with lower numbers of strands the speedup increase is not readily apparent, it can be observed that the speedup generally rises with the number of processing elements, although the relation is not always linear. The speedup is almost linear in the number of processing elements, especially for the higher-order test cases.
Figure 10. Distributed LCS Matching Speedup vs. Number of Processing Elements (one data series per strand count, 5 to 50 strands).
The correlations between the number of processing elements and the speedup are positive and exceed 0.90 for the larger numbers of strands. Because the test is done in a controlled test environment, it can be said that a higher number of nodes, and correspondingly a higher number of processing elements, is the cause of the increasing speedup, especially at higher numbers of strands.
Figure 11. Distributed LCS Matching Speedup vs. Number of Strands (one data series per test configuration, N = 0 to N = 10).
From Figure 11, it can be observed that the speedup increases with the number of strands. In the test configurations with more processing elements, the speedup is higher than in those with fewer: almost nine times speedup in the configuration with 14 processing elements (the N = 10 test configuration) but almost no speedup in the N = 0 configuration. The correlations between the number of strands and the speedup are positive and average more than 0.85 for five or more processing elements. Because the test is done in a controlled test environment, it can be said that a larger number of strands (and consequently a larger number of assignments) is also a cause of the increasing speedup, especially on systems with many processing elements available. The reason is that as the test case gets larger, more discrete assignments are generated and the amount of parallelism increases relative to the problem size. Therefore, it is beneficial to use the framework to process a large number of discrete assignments rather than a small number.

One of the most common use cases of distributed computing is to speed up the processing of large data sets; therefore, it is wise to focus on the system efficiency of the larger test cases. As can be observed in Figure 12, the system efficiency actually increases as the number of processing elements increases; this is contrary to the usual case of reduced system efficiency when more processing elements are used. It may indicate that the proportion of system overhead is actually lower when the system is scaled up, although 100% efficiency may not be achievable as some overhead will always exist.
Figure 12. Distributed LCS Matching System Efficiency vs. Number of Processing Elements (4 to 14; one data series per strand count, 15 to 50 strands).
It is quite remarkable that the system efficiency can reach 62.2% in the 50-strand test case with the 14-processing-element configuration using the randomised dynamic load balancing. This can be related to the achievement of 44% system efficiency in the research by Purdue University using random-walk load balancing. The results cannot be compared directly because the Purdue University research is based on a simulation-based evaluation.
Figure 13. Distributed LCS Matching System Efficiency vs. Number of Strands (5 to 50; one data series per test configuration, N = 1 to N = 10).
It can be readily observed from Figure 13 that a larger number of strands results in higher system efficiency. Due to the way bag-of-tasks distributed computing works, the amount of parallelism depends on the number of discrete assignments, and the number of discrete assignments increases quadratically with the number of strands. With a low number of strands, only a few discrete assignments are available and, due to the random assignment distribution mechanism used in the Garcinia Framework, the assignments cannot be reliably distributed to every processing element in a balanced manner. Therefore, it is better to utilise the available distributed processing by solving a large number of assignments instead of solving problems with few discrete assignments.
Figure 14. Physical topology of the multiplatform test network.
The physical network topology is a star topology. The central hub is the VMware virtual switch mapped to the Realtek RTL8103E PCI-E Fast Ethernet network interface of WINDOWS-01. Figure 14 shows the physical topology of the multiplatform test network and Table 10 shows the node configurations.
Table 10. Node name, IP address, and specifications of the multiplatform test.
Node Name   IP Address    Operating System        CPU      RAM
WINDOWS-01  192.168.90.1  Windows 7 64-bit        2 cores  4096 MB
WINDOWS-02  192.168.90.2  Windows 7 32-bit        4 cores  3072 MB
WINDOWS-03  192.168.90.3  Windows XP SP2 32-bit   1 core   256 MB
WINDOWS-04  192.168.90.4  Windows XP SP2 32-bit   1 core   256 MB
LINUX-01    192.168.90.5  Linux Fedora 14 32-bit  1 core   256 MB
LINUX-02    192.168.90.6  Linux Fedora 14 32-bit  1 core   256 MB
From Figure 15a it can be clearly observed that there are six different nodes: four nodes with a single-core processor, a node with four cores, and a node with a dual-core processor. To test the functionality of the multiplatform Garcinia network, the distributed sort is run with 50,000 randomly generated data elements. The consumer application output can be seen in Figure 15b. Figure 15c shows the log output of the Garcinia Agent on the test machine. From the log output, it can be observed that the agent launched three threads in the thread pool: the first thread is assigned to the dispatcher service and the next two threads are assigned to process the work requests of the consumer application.
Figure 15a-c. The consumer application and agent output on the multiplatform test.
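The thread-pool layout observed in the agent log can be sketched with `java.util.concurrent`. The pool size and the placeholder task bodies below are illustrative assumptions, not the agent's actual code:

```java
import java.util.concurrent.*;

// Sketch: a fixed thread pool executing submitted assignments, in the
// spirit of the agent's pool where one thread serves the dispatcher and
// the others run incoming work requests. The squared-integer "work" is a
// placeholder for real Task execution.
public class AgentPoolSketch {
    public static int runAssignments(int poolThreads, int assignments)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        try {
            CompletionService<Integer> workers =
                    new ExecutorCompletionService<>(pool);
            for (int i = 0; i < assignments; i++) {
                final int id = i;
                workers.submit(() -> id * id); // placeholder assignment body
            }
            int completed = 0;
            for (int i = 0; i < assignments; i++) {
                workers.take().get(); // collect each assignment result
                completed++;
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAssignments(3, 10) + " assignments completed");
    }
}
```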
4) the data traffic of the peer-to-peer overlay network datagrams is low when compared to the other data traffic sharing the same network;
5) the Garcinia Framework's support for remote invocation and code migration reduces the need to manually share the code segment of the consumer program;
6) there is no total loss of computing capability when there is a total loss of network connectivity, as the Garcinia Framework is resilient to network failure by means of immediate fallback to the local agent worker service;
7) the Garcinia Framework is scalable enough to hold near-linear speedup relative to the number of processing elements for large numbers of discrete assignments, and its dynamic load balancing obtains system efficiency comparable to other peer-to-peer distributed computing systems;
8) the Garcinia Framework Application Programming Interface (API) is available to consumer applications without exposing the details of the network topology and communication protocols of the peer-to-peer distributed computing system;
9) the write-once, run-everywhere flexibility of the Java Platform is fully utilised, as the Garcinia Framework is able to run on several types of operating system platform while sharing the same library, task code segment, assignment structure, overlay network, worker services, and dispatcher services.
6.2 Recommendations
For future development, the following improvements can be made to the Garcinia Framework:
1) The extended functionality test should be done in a relatively hostile network environment consisting of hundreds of nodes, where multiple node entries, frequent leaves, and failures are common.
2) The performance of the worker service and the assignment scheduling can be optimised with a better load balancing strategy to fully utilise the potential of peer-to-peer distributed computing.
3) An error reporting service must be implemented to help monitor the operation of remote task invocations.
4) A security subsystem should be implemented to limit the privileges of incoming tasks, as currently a task can execute almost arbitrary code on the agent worker service.
5) A well-established peer-to-peer overlay network such as JXTA can be used to support advanced features such as Network Address Translation (NAT) traversal and proxy services.
6) Provision for IPv6 networks should be implemented as soon as possible to reduce the reliance on IPv4.
7. Bibliography
[1] Stanford University. (2010) Folding@home Distributed Computing. [Online]. http://folding.stanford.edu/. Last accessed 25 June 2011.
[2] University of California. (2011) SETI@home Search for Extraterrestrial Intelligence. [Online]. http://setiathome.berkeley.edu/. Last accessed 25 June 2011.
[3] Distributed.net. (2011) Distributed.net. [Online]. http://www.distributed.net/. Last accessed 25 June 2011.
[4] Asad Awan, Ronaldo A. Ferreira, Suresh Jagannathan, and Ananth Grama, "Unstructured Peer-to-Peer Networks for Sharing Processor Cycles," Parallel Computing - Parallel Matrix Algorithms and Applications (PMAA'04), pp. 115-135, 2006.
[5] Nazareno Andrade, Lauro Costa, Guilherme Germoglio, and Walfredo Cirne, "Peer-to-peer grid computing with the OurGrid Community," in 23rd Brazilian Symposium on Computer Networks (SBRC 2005) - 4th Special Tools Session, 2005.
[6] Mohamad bin Osman and Abd Rahman Milan, Mangosteen: Garcinia mangostana L., J. T. Williams et al., Eds. Southampton, United Kingdom: Southampton Centre for Underutilised Crops, 2006.
[7] Arnold L. Rosenberg, "Optimal Schedules for Cycle-Stealing in a Network of Workstations with a Bag-of-Tasks Workload," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 2, pp. 179-191, February 2002.
[8] George Coulouris, Jean Dollimore, and Tim Kindberg, Distributed Systems: Concepts and Design, 4th ed. Boston: Addison Wesley, 2005.
[9] Nazareno Andrade, Francisco Brasileiro, and Walfredo Cirne, "Discouraging Free Riding in a Peer-to-Peer CPU-Sharing Grid," in 13th IEEE International Symposium on High Performance Distributed Computing, 2004, pp. 129-137.
[10] Oracle Corporation. (2011) Java Platform, Standard Edition 6 API Specification. [Online]. http://download.oracle.com/javase/6/docs/api/index.html. Last accessed 25 June 2011.
[11] William Y. Arms, Digital Libraries. Cambridge: MIT Press, 2000.
[12] Timothy G. Mattson, Beverly A. Sanders, and Berna L. Massingill, A Pattern Language for Parallel Programming. Boston: Addison Wesley, 2004.
[13] Andrew S. Tanenbaum and Maarten van Steen, Distributed Systems: Principles and Paradigms. New Jersey: Pearson Prentice Hall, 2007.
[14] J. Risson and T. Moors, "Survey of Research towards Robust Peer-to-Peer Networks: Search Methods," Computer Networks, vol. 50, 2006.
[15] Yi Qiao and Fabian E. Bustamante, "Structured and Unstructured Overlays under the Microscope," in USENIX Annual Technical Conference, 2006, pp. 341-355.
[16] A. Fuggetta, G. P. Picco, and G. Vigna, "Understanding Code Mobility," IEEE Transactions on Software Engineering, vol. 24, no. 5, May 1998.
[17] Douglas E. Comer, Internetworking with TCP/IP: Principles, Protocols, and Architecture, 4th ed. New Jersey: Prentice Hall, 2000, vol. 1.
[18] Douglas E. Comer, Internetworking with TCP/IP: Principles, Protocols, and Architecture, 4th ed. New Jersey: Prentice Hall, 2000.
[19] Hisao Kameda, El-Zoghdy Said Fathy, Inhwan Ryu, and Jie Li, "A Performance Comparison of Dynamic vs. Static Load Balancing Policies in a Mainframe - Personal Computer Network Model," in Proceedings of the 39th IEEE Conference on Decision and Control, Sydney, NSW, Australia, 2000, pp. 1415-1420, vol. 2.
[20] Kenneth A. Berman and Jerome L. Paul, Algorithms: Sequential, Parallel, and Distributed. Boston: Thomson Course Technology, 2005.
[21] Kai Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, 2005.
[22] Ruby Lee, "Empirical Results on the Speedup, Efficiency, Redundancy, and Quality of Parallel Computations," in Proceedings of the International Conference on Parallel Processing, 1980, pp. 91-96.
[23] Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar. (2003) Analytical Modeling of Parallel Systems. Lecture Notes.
[24] Ian Foster, Designing and Building Parallel Programs. Boston: Addison-Wesley, 1995.
[25] Ulrich Drepper, "How to Write Shared Libraries," in UKUUG Linux Developers' Conference, Bristol, 2010.
[26] Greg Travis. (2001, May) Build Your Own Java Library. [Online]. http://www.ibm.com/developerworks/java/tutorials/j-javalibrary/j-javalibrary-pdf.pdf. Last accessed 25 June 2011.
[27] Oracle Corporation, Your First Cup: An Introduction to the Java EE Platform. Oracle, 2011.
[28] Internet Assigned Numbers Authority. (2011, June) IPv4 Multicast Address Space Registry. [Online]. http://www.iana.org/assignments/multicast-addresses/multicast-addresses.xml. Last accessed 25 June 2011.
[29] David Meyer. (1998, July) Administratively Scoped IP Multicast (IETF RFC 2365, Best Current Practice). [Online]. http://tools.ietf.org/html/rfc2365. Last accessed 25 June 2011.