Christoforus Yoga Haryanto, Sutrisno, Rahmadi Trimananda
Department of Computer Systems, Faculty of Computer Science, Universitas Pelita Harapan
cyharyanto@gmail.com, sutrisno@uph.edu, rahmadi.trimananda@staff.uph.edu
Abstract
This report presents the Garcinia peer-to-peer distributed computing framework. Garcinia is implemented on top of the Java Platform and uses IP multicasting to create an unstructured overlay network layer. The framework consists of the agent and the shared library. The agent acts as a node in the peer-to-peer network and also as a worker and dispatcher service provider to the consumer of the shared library. The agent keeps an up-to-date list of neighbouring node information, provides that information to the requesting consumer application, and accepts tasks submitted by the consumer application. All communication between the agent and the consumer application is done via the shared library. A local agent instance must be started to ensure the fairness of the network. The framework provides an interface for developing applications that solve bag-of-tasks style processing. The framework gives the consumer application the flexibility to choose the worker or to use the built-in mechanism that performs randomised dynamic load balancing. The randomised dynamic load balancing in this framework can attain 62.2% system efficiency compared to an ideal parallel ensemble.

Keywords: peer-to-peer, distributed computing, distributed systems, Java
2.2 Peer-to-Peer
A peer-to-peer system is a type of distribution in a distributed system where the processes that constitute the system are equal. Every process must implement the functions the system needs to carry out; hence the interaction between processes is symmetric. Each process acts as a client and a server at the same time.[13] The way a peer-to-peer system organizes the processes is defined in an overlay network. The overlay network is a network in which the nodes are formed by the processes and the links between nodes represent the permissible communication channels. Every node communicates with the others only through the permissible channels on the overlay network.[13] There are two types of overlay network: structured and unstructured. The distributed hash table (DHT) is the most common example of a structured overlay network. An example among unstructured peer-to-peer file-sharing systems is Gnutella, which is known for its ping, pong, and bye maintenance messages. Pings are used to discover hosts; pongs are replies to pings and contain information about the responding peer and other peers; byes are optional messages announcing an impending connection close.[13][14][15] Previous research at Purdue University on peer-to-peer distributed computing also uses an unstructured overlay network.[4] A design issue is the construction of the overlay network. There are two approaches: in the first, nodes organize themselves directly into a tree, so there is a unique path between every pair of nodes; in the second, nodes organize into a mesh network where there are generally multiple paths between nodes. The second generally provides higher robustness.[13]
2.5 Multicast
A multicast is a mechanism to send packets to multiple destinations simultaneously in a network. It differs from broadcast in that the packet is sent to a set of destinations that have joined a group, rather than to each destination individually. Multicast was developed to eliminate the inefficiencies of broadcasting in its use of network bandwidth and computational resources, since with broadcast each machine has to process every broadcasted datagram. Multicast can be hardware based, such as Ethernet multicast, or IP based, as in IP multicasting, which is the internet abstraction of hardware multicasting.[17]
Multicast makes use of the Internet Group Management Protocol (IGMP) to communicate group membership information. IGMP supports local membership declaration and propagation to multicast routers. It also supports dynamic membership by periodically polling hosts on the local network. IGMP is carefully designed to avoid excessive overhead and should not congest networks that include multiple routers and hosts all participating in multicasting. In most cases, IGMP only introduces a periodic message from a multicast router and a single reply for each multicast group.[17]
2.6 Socket
A socket is an abstraction of network input and output. It is an application programming interface (API) between application programs and the protocol software. The socket is not in the TCP/IP standards, and the de facto standard of the socket is the implementation on the BSD UNIX operating system. It is a generalisation of the UNIX file access mechanism that provides an endpoint for communication. A program using a socket thus performs network input and output similarly to file access, with read and write operations. A socket usually encapsulates a Transmission Control Protocol (TCP) connection between two machines. Many operating systems support sockets, and there are libraries to facilitate socket operations.[18]
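To make the file-like read/write model concrete, the sketch below runs a one-line echo exchange over a loopback TCP connection using the `java.net` socket API. The class and method names are illustrative, not part of the Garcinia Framework:

```java
import java.io.*;
import java.net.*;

// Minimal sketch: a loopback echo exchange. The server thread accepts one
// connection and echoes a single line; the client writes and reads the
// socket streams much like a file.
public class EchoSketch {
    public static String echoOnce(String message) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = any free port
            Thread serverThread = new Thread(() -> {
                try (Socket peer = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(peer.getInputStream()));
                     PrintWriter out = new PrintWriter(peer.getOutputStream(), true)) {
                    out.println(in.readLine()); // echo the received line back
                } catch (IOException ignored) { }
            });
            serverThread.start();
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                out.println(message); // write, as if writing to a file
                return in.readLine(); // read the echoed reply
            } finally {
                serverThread.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoOnce("hello"));
    }
}
```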
where T(1) is the execution time of the sequential program that solves the problem, and T(n) is the execution time of the parallel program that solves the same problem on an n-processor system. The system efficiency indicates the actual degree of speedup achieved by a parallel computer compared to the maximum value. The system efficiency for an n-processor system is defined by: [21]

E(n) = S(n) / n = T(1) / (n × T(n))

where S(n) is the speedup of the n-processor system, T(1) is the execution time on a uniprocessor system, and T(n) is the execution time on the n-processor system. The speedup and system efficiency of a parallel computer are tied to a particular system architecture and the algorithm used. Both are expected to be higher when the overhead is lower, such as when the program requires fewer interprocess interactions, generates less idling, and avoids excess computation. Embarrassingly parallel problems are the best candidates for executing independently without communication. The best expected speedup is linear in the number of processing elements, and the most scalable system is the one whose efficiency is as close as possible to 100%.[21][23][24]
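As a worked illustration of these definitions, the sketch below computes S(n) and E(n) from timings. The 100-second and 11.5-second values are invented for the example, chosen so the efficiency lands near the 62.2% figure reported later in this report:

```java
// Speedup and system efficiency as defined above:
//   S(n) = T(1) / T(n)   and   E(n) = S(n) / n.
public class ParallelMetrics {
    public static double speedup(double t1, double tn) {
        return t1 / tn;
    }

    public static double efficiency(double t1, double tn, int n) {
        return speedup(t1, tn) / n;
    }

    public static void main(String[] args) {
        // Hypothetical timings: 100 s sequential, 11.5 s on 14 processors.
        System.out.printf("S = %.2f, E = %.1f%%%n",
                speedup(100.0, 11.5), 100 * efficiency(100.0, 11.5, 14));
    }
}
```

With these numbers, S(14) ≈ 8.70 and E(14) ≈ 62.1%, so a 62.2% efficiency on 14 processing elements corresponds to a speedup of roughly 8.7.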
Figure 1. Java Platform, Standard Edition 6 JDK component diagram. 2006, Sun Microsystems, Inc. Source: http://download.oracle.com/javase/6/docs/api/index.html
3. The Design of Garcinia Framework

3.1 Garcinia Peer-to-Peer Overlay Network Architecture
The Garcinia Framework peer-to-peer layer uses an unstructured overlay network architecture where the agents act as the nodes on the overlay network. An IPv4 multicast address is used as the overlay network communication channel. The multicast address is not fixed and must be configurable in a production environment. An agent is a multifunctional piece of program, one of whose functions is to be a peer to the other neighbouring agents. As a peer, the agent communicates with the other agents in a peer-to-peer manner, where each agent acts as an equal to every other agent in the network. The first time the agent is started, it joins the multicast group by registering as a destination peer. After successfully joining the multicast group, the agent sends a service discovery datagram to the group and waits for replies from the other agents reachable within the group. The replying agents are put into a local list of neighbouring nodes, which the agent periodically maintains. The agents joined to the multicast group essentially create an unstructured overlay network with a mesh organisation, where each agent can communicate freely with every other reachable agent and the underlying multicast topology itself is transparent to the agents.
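The join-then-discover sequence described above can be sketched with the `java.net` multicast API. The group address, port, and plaintext payload format below are illustrative assumptions for this sketch, not the framework's actual values:

```java
import java.net.*;
import java.nio.charset.StandardCharsets;

// Sketch of an agent joining the overlay's multicast group and announcing
// itself. GROUP and PORT are assumed values; the real framework reads its
// group address from configuration.
public class OverlayJoinSketch {
    static final String GROUP = "239.7.12.90"; // administratively scoped group (assumed)
    static final int PORT = 7090;              // assumed port

    // An assumed plaintext discovery payload sent to the group.
    public static byte[] discoveryDatagram(String nodeId) {
        return ("HELLO " + nodeId).getBytes(StandardCharsets.UTF_8);
    }

    // Joining the group registers the host as a destination peer (and
    // triggers an IGMP membership report); the datagram is then sent to
    // every reachable member of the group.
    public static void joinAndAnnounce(String nodeId) throws Exception {
        InetAddress group = InetAddress.getByName(GROUP);
        try (MulticastSocket socket = new MulticastSocket(PORT)) {
            socket.joinGroup(group);
            byte[] payload = discoveryDatagram(nodeId);
            socket.send(new DatagramPacket(payload, payload.length, group, PORT));
            socket.leaveGroup(group);
        }
    }

    public static void main(String[] args) {
        System.out.println(new String(discoveryDatagram("node-1"),
                StandardCharsets.UTF_8));
    }
}
```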
[Component diagram: the Consumer Application (User Application Code) interacts through the Garcinia Shared Library classes (Task (Abstract), Task Implementation, Task Metadata, Assignment, Assignment Data, Assignment Result, Worker Info) with the agent services (Overlay Service Manager, Worker Service, Executor, Dispatcher Service, Logging Service).]
The agent program contains five main components: the Command Line Service, Overlay Service, Worker Service, Dispatcher Service, and Logging Service. The Garcinia Agent also makes use of the Garcinia Shared Library to facilitate its operation.
[Diagram: the Worker Service comprises a task executor and handles task code segment migration and assignment data transfer.]
[Sequence diagram: example of interaction between the Consumer Application and the Garcinia Shared Library, showing the User Application Code creating a new Sender and using the Task (Abstract), Task Metadata, Assignment, Assignment Result, and Worker Info classes.]
The overlay service manager periodically maintains the list of neighbouring nodes by sending and receiving datagrams between the nodes in multicast and unicast manner. The maintenance includes service discovery by processing the incoming datagrams. The datagram is a simple UDP packet containing a message in plaintext format. There are three types of message to be processed: the HELLO message, the HI message, and the LEAVE message. The structure of the messages can be observed in Table 1.
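A minimal sketch of dispatching on the three maintenance message types is shown below. Table 1 gives the actual message structure; the "TYPE sender-info" layout assumed here is illustrative only:

```java
// Sketch: classify an incoming plaintext maintenance datagram by its
// leading keyword. The payload layout after the keyword is an assumption.
public class MessageDispatchSketch {
    public enum Type { HELLO, HI, LEAVE, UNKNOWN }

    public static Type classify(String datagramText) {
        String head = datagramText.trim().split("\\s+")[0];
        switch (head) {
            case "HELLO": return Type.HELLO; // service discovery probe
            case "HI":    return Type.HI;    // reply carrying node information
            case "LEAVE": return Type.LEAVE; // node departing the overlay
            default:      return Type.UNKNOWN;
        }
    }
}
```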
Each node is connected to a 1000BASE-T switched network by Category 6 Unshielded Twisted-Pair (Cat6 UTP) cable in a star topology. The IPv4 address block 122.200.8.0/25 is assigned to the test network.
Table 3. Node name, IP address, packet loss, and TCP latency.

Node Name  IP Address    Packet Loss  TCP Latency
TEST-01    122.200.8.59  0.00%        60.92 ms
TEST-02    122.200.8.60  0.00%        70.63 ms
TEST-03    122.200.8.63  0.00%        61.33 ms
TEST-04    122.200.8.64  0.00%        57.89 ms
TEST-05    122.200.8.65  0.00%        61.89 ms
TEST-06    122.200.8.69  0.00%        58.13 ms
TEST-07    122.200.8.70  0.00%        57.41 ms
TEST-08    122.200.8.71  0.00%        58.51 ms
TEST-09    122.200.8.74  0.00%        63.68 ms
TEST-10    122.200.8.75  0.00%        67.47 ms
MONITOR    122.200.8.79  n/a          n/a

Of 524 total TCP connections: packet loss 0.00% throughout; latency 28 ms minimum, 103 ms maximum, 62.40 ms average, 27.96% CV.
From Table 4, it can be observed that each node generates exactly 60 multicast UDP packets over the period of 10 minutes (one packet every 10 seconds). This confirms the implementation of the overlay service, which sends an update to the network every 10 seconds.
Table 5. Summary of IGMP traffic of multicast address 239.7.12.90 over a period of 10 minutes.

Address A      Address B   Packets  Bytes
122.200.8.59   224.0.0.22  5        310
122.200.8.60   224.0.0.22  5        310
122.200.8.63   224.0.0.22  5        310
122.200.8.64   224.0.0.22  5        310
122.200.8.65   224.0.0.22  5        310
122.200.8.69   224.0.0.22  5        310
122.200.8.70   224.0.0.22  5        310
122.200.8.71   224.0.0.22  5        310
122.200.8.74   224.0.0.22  5        310
122.200.8.75   224.0.0.22  5        310
122.200.8.79   224.0.0.22  5        270
122.200.8.147  224.0.0.22  5        510
TOTAL:                     60       3,880
From Table 5 it can be seen that the multicast group membership report datagrams are sent from the individual node IP addresses to the 224.0.0.22 multicast group. Note the "Join group 239.7.12.90 for any sources" messages; these inform that the corresponding source IP address is willing to receive any multicast datagram sent to the 239.7.12.90 multicast group. A stray machine at IP address 122.200.8.147 also appears on the list, as it runs software that also uses multicast groups and includes the address 239.7.12.90 beside the shown address 239.192.152.143 in its IGMP packet. This occurrence is considered not relevant to the operation of the Garcinia Framework peer-to-peer overlay network. Of the total 318,858 bytes in 3,478 packets captured within 600 seconds (10 minutes), only 73,712 bytes in 720 packets were generated by the Garcinia overlay network and its corresponding IGMP. That translates to 123.0 bytes/second of traffic at just 1.201 packets/second throughput. For comparison, the total UDP traffic is 179,269 bytes in 1,601 packets over the same 10-minute period.
Figure 7. The consumer application and agent output of the reverse string test application.
The successful local agent functional test also proves the Garcinia Framework's resilience to network failures, as task execution can still be done without the presence of other agents in the overlay network.
It can be observed from Table 6 that every node can successfully execute the distributed LCS test application with an average execution time of 8,840 milliseconds. The execution time variation is measured using the coefficient of variation (CV). The fact that the execution time varies by 11.94% can be explained by the use of randomised assignment distribution and the variation of network performance over time.
Table 7. Worker service load using random assignment distribution.
Worker service load (%) by test location (rows = test location acting as dispatcher node, with 0.0% load on itself; columns = worker nodes):

Location  TEST-01 TEST-02 TEST-03 TEST-04 TEST-05 TEST-06 TEST-07 TEST-08 TEST-09 TEST-10
TEST-01   0.0%    11.1%   15.6%   31.1%   11.1%   6.7%    4.4%    8.9%    6.7%    4.4%
TEST-02   22.2%   0.0%    8.9%    17.8%   4.4%    2.2%    15.6%   8.9%    6.7%    13.3%
TEST-03   20.0%   22.2%   0.0%    13.3%   13.3%   4.4%    4.4%    8.9%    4.4%    8.9%
TEST-04   15.6%   28.9%   4.4%    0.0%    20.0%   2.2%    6.7%    2.2%    8.9%    11.1%
TEST-05   20.0%   26.7%   13.3%   11.1%   0.0%    8.9%    2.2%    2.2%    11.1%   4.4%
TEST-06   15.6%   24.4%   13.3%   17.8%   2.2%    0.0%    8.9%    6.7%    4.4%    6.7%
TEST-07   11.1%   28.9%   13.3%   17.8%   13.3%   8.9%    0.0%    2.2%    2.2%    2.2%
TEST-08   11.1%   20.0%   20.0%   20.0%   2.2%    6.7%    11.1%   0.0%    0.0%    8.9%
TEST-09   15.6%   22.2%   6.7%    13.3%   8.9%    11.1%   8.9%    2.2%    0.0%    11.1%
TEST-10   20.0%   24.4%   6.7%    15.6%   6.7%    8.9%    6.7%    8.9%    2.2%    0.0%
Average:  16.8%   23.2%   11.4%   17.5%   9.1%    6.7%    7.7%    5.7%    5.2%    7.9%
CV:       24.0%   23.5%   44.2%   33.2%   64.9%   47.1%   52.6%   59.1%   67.8%   46.9%

Overall CV = 64.7%.
The load is balanced by making use of the built-in randomised dynamic load balancing. The load can be measured by analysing the number of assignments assigned to a worker service relative to the total number of assignments of the task, called the relative load. In this test, the relative load is measured against the total load of each corresponding test location, which is set at 100%.
It can be observed from Table 7 that the nodes' worker service load is not perfectly balanced and varies considerably from one test to another, with an average CV over the 10 nodes of 64.7%. In comparison to the data in Table 6, it can be seen that although the load is not perfectly balanced, the framework maintains an execution time variation of just 11.94%, still considerably less than the worker service load variation.
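The randomised dispatch underlying these load figures can be sketched as below. Each assignment is sent to a uniformly random worker drawn from the neighbour list; the worker names and counts are placeholders, and the relative load of a worker is its count divided by the total:

```java
import java.util.*;

// Sketch of randomised dynamic load balancing: every assignment goes to a
// uniformly random worker. With enough assignments the loads even out on
// average, but any single run can be quite unbalanced, matching the high
// per-run CV observed in Table 7.
public class RandomDispatchSketch {
    public static Map<String, Integer> dispatch(List<String> workers,
                                                int assignments, long seed) {
        Random rng = new Random(seed); // seeded for reproducibility
        Map<String, Integer> load = new HashMap<>();
        for (String w : workers) load.put(w, 0);
        for (int i = 0; i < assignments; i++) {
            String chosen = workers.get(rng.nextInt(workers.size()));
            load.merge(chosen, 1, Integer::sum);
        }
        return load;
    }

    public static void main(String[] args) {
        // 45 assignments (the 10-strand case) over three placeholder workers.
        System.out.println(dispatch(
                Arrays.asList("TEST-01", "TEST-02", "TEST-03"), 45, 42L));
    }
}
```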
An important note about the test configurations is that the Reference configuration uses the sequential version of the LCS Matching algorithm, while every other test configuration uses the distributed parallel version of the algorithm. The Reference configuration execution time is the basis of the speedup and system efficiency analysis, because such an analysis of a distributed computing system running a distributed parallel program needs a reference to the sequential version of the program running on a sequential computer. Table 9 shows the 10 cases of the number of strands used in each test configuration, along with the length of the strands and the maximum number of concurrent assignment executions. A combination of a test configuration and a case is called a treatment. In total, 120 treatments are used to test the Garcinia Framework speedup and system efficiency.
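The sequential reference can be sketched as the textbook longest-common-subsequence dynamic program; whether the LCS Matching application uses exactly this formulation is not shown in this report, so the sketch below is the standard O(m·n) recurrence rather than the framework's actual code:

```java
// Sketch of a sequential LCS computation: dp[i][j] holds the LCS length of
// the first i characters of a and the first j characters of b.
public class LcsSketch {
    public static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                dp[i][j] = (a.charAt(i - 1) == b.charAt(j - 1))
                        ? dp[i - 1][j - 1] + 1                    // match extends the LCS
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);   // else carry the best
            }
        }
        return dp[a.length()][b.length()];
    }
}
```

For example, lcsLength("ABCBDAB", "BDCABA") is 4 (one witness is "BCBA"). Each distributed assignment pairs two strands and runs this comparison independently, which is what makes the workload bag-of-tasks.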
Table 9. Test cases for the speedup and system efficiency analysis.
No.  Number of Strands  Length of Strand  Max. Concurrency  Exp. Number of Assignments
1.   5                  5,000             100               10
2.   10                 5,000             100               45
3.   15                 5,000             100               105
4.   20                 5,000             100               190
5.   25                 5,000             100               300
6.   30                 5,000             100               435
7.   35                 5,000             100               595
8.   40                 5,000             100               780
9.   45                 5,000             100               990
10.  50                 5,000             100               1,225
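The expected number of assignments in Table 9 follows directly from pairwise matching: with n strands there are n(n-1)/2 unordered pairs, one discrete assignment per pair, which is why the count grows quadratically:

```java
// Expected number of discrete assignments for n strands: one assignment
// per unordered pair of strands, i.e. n choose 2 = n * (n - 1) / 2.
public class AssignmentCount {
    public static int pairs(int strands) {
        return strands * (strands - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 50; n += 5) {
            System.out.println(n + " strands -> " + pairs(n) + " assignments");
        }
    }
}
```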
The execution time is the time that elapses from the moment the calculation starts, including assignment scheduling, to the moment the LCS Genetic Matching program finishes execution. Each data series in Figure 8 corresponds to a test configuration and each data point corresponds to a treatment. The vertical axis shows the execution time in seconds and the horizontal axis shows the number of strands used in the treatments. The data is plotted on a linear scale on both axes. Lower execution time means better performance.
Figure 8. Distributed LCS Matching Execution Time (execution time in seconds vs. number of strands, 0 to 50; one data series per test configuration, N = 0 to N = 10).
Figure 9 shows the execution time for 50 strands relative to the number of processing elements. It clearly shows that the execution time is faster (lower) when additional processing elements are added. Note that there is no data for execution with two or three processing elements, as no such test was done; the dual-core monitor node is equivalent to four processing elements instead of two.
Figure 9. Distributed LCS Matching Execution Time for 50 Strands (execution time in seconds vs. number of processing elements, 0 to 14).
Figure 10 shows the speedup in relation to the number of processing elements for the different test cases. The speedup is always less than 2.000 for the five-strand test case regardless of the test configuration. Although with lower numbers of strands the speedup increase is not readily apparent, it can be observed that the speedup generally rises with the number of processing elements, although the relation is not always linear. The speedup is almost linear in the number of processing elements, especially for the higher-order test cases.
Figure 10. Distributed LCS Matching Speedup vs. Number of Processing Elements (one data series per strand count, 5 to 50 strands).
The correlations between the number of processing elements and the speedup are positive and exceed 0.90 for the larger numbers of strands. Because the test is done in a controlled test environment, it can be said that a higher number of nodes, and correspondingly a higher number of processing elements, is the cause of the increasing speedup, especially at higher numbers of strands.
Figure 11. Distributed LCS Matching Speedup vs. Number of Strands (one data series per test configuration, N = 0 to N = 10).
From Figure 11, it can be observed that the speedup increases with the number of strands. In the test configurations with more processing elements, the speedup is higher than in those with fewer: almost nine times speedup in the configuration with 14 processing elements (the N = 10 test configuration) but almost no speedup in the N = 0 configuration. The correlations between the number of strands and the speedup are positive and average more than 0.85 for five or more processing elements. Because the test is done in a controlled test environment, it can be said that a larger number of strands (and consequently a larger number of assignments) is also a cause of the increasing speedup, especially on systems with many processing elements available. The reason is that as the test case gets larger, more discrete assignments are generated and the amount of parallelism increases relative to the problem size. Therefore, it is beneficial to use the framework to process a large number of discrete assignments rather than a small number.

One of the most common use cases of distributed computing is to speed up the processing of large data sets; therefore, it is wise to focus on the system efficiency of the larger test cases. As can be observed in Figure 12, the system efficiency actually increases as the number of processing elements increases; this is contrary to the usual case of reduced system efficiency when more processing elements are used. It may indicate that the proportion of system overhead is actually lower when the system is scaled up, although 100% efficiency may not be achievable as some overhead will always exist.
Figure 12. Distributed LCS Matching System Efficiency vs. Number of Processing Elements (4 to 14; one data series per strand count, 15 to 50 strands).
It is quite remarkable that the system efficiency can reach 62.2% in the 50-strand test case with the 14-processing-element configuration using the randomised dynamic load balancing. This can be related to the achievement of 44% system efficiency in the research by Purdue University using random-walk load balancing. The results cannot be compared directly because the Purdue University research is based on a simulation-based evaluation.
Figure 13. Distributed LCS Matching System Efficiency vs. Number of Strands (5 to 50; one data series per test configuration, N = 1 to N = 10).
It can be readily observed from Figure 13 that a larger number of strands results in higher system efficiency. Due to the way bag-of-tasks distributed computing works, the amount of parallelism depends on the number of discrete assignments, and the number of discrete assignments increases quadratically with the number of strands. With a low number of strands, only a few discrete assignments are available and, due to the random assignment distribution mechanism used in the Garcinia Framework, the assignments cannot be reliably distributed to every processing element in a balanced manner. Therefore, it is better to utilise the available distributed processing by solving a large number of assignments instead of solving problems with few discrete assignments.
Figure 14. Physical topology of the multiplatform test network.
The physical network topology is a star topology. The central hub is the VMware virtual switch mapped to the Realtek RTL8103E PCI-E Fast Ethernet network interface of WINDOWS-01. Figure 14 shows the physical topology of the multiplatform test network and Table 10 shows the node configurations.
Table 10. Node name, IP address, and specifications of the multiplatform test.
Node Name   IP Address    Operating System        CPU      RAM
WINDOWS-01  192.168.90.1  Windows 7 64-bit        2 cores  4096 MB
WINDOWS-02  192.168.90.2  Windows 7 32-bit        4 cores  3072 MB
WINDOWS-03  192.168.90.3  Windows XP SP2 32-bit   1 core   256 MB
WINDOWS-04  192.168.90.4  Windows XP SP2 32-bit   1 core   256 MB
LINUX-01    192.168.90.5  Linux Fedora 14 32-bit  1 core   256 MB
LINUX-02    192.168.90.6  Linux Fedora 14 32-bit  1 core   256 MB
From Figure 15a it can be clearly observed that there are six different nodes: four nodes with a single-core processor, a node with four cores, and a node with a dual-core processor. To test the functionality of the multiplatform Garcinia network, the distributed sort is run with 50,000 randomly generated data elements. The consumer application output can be seen in Figure 15b. Figure 15c shows the log output of the Garcinia Agent on the test machine. From the log output, it can be observed that the agent launched three threads in the thread pool: the first thread is assigned to the dispatcher service and the next two threads are assigned to process the work requests of the consumer application.
Figure 15a-c. The consumer application and agent output on the multiplatform test.
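The thread-pool layout observed in the agent log can be sketched with `java.util.concurrent`. The pool size and the placeholder task bodies below are illustrative assumptions, not the agent's actual code:

```java
import java.util.concurrent.*;

// Sketch: a fixed thread pool executing submitted assignments, in the
// spirit of the agent's pool where one thread serves the dispatcher and
// the others run incoming work requests. The squared-integer "work" is a
// placeholder for real Task execution.
public class AgentPoolSketch {
    public static int runAssignments(int poolThreads, int assignments)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        try {
            CompletionService<Integer> workers =
                    new ExecutorCompletionService<>(pool);
            for (int i = 0; i < assignments; i++) {
                final int id = i;
                workers.submit(() -> id * id); // placeholder assignment body
            }
            int completed = 0;
            for (int i = 0; i < assignments; i++) {
                workers.take().get(); // collect each assignment result
                completed++;
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAssignments(3, 10) + " assignments completed");
    }
}
```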
4) the data traffic of the peer-to-peer overlay network datagrams is low when compared to the other data traffic sharing the same network;
5) the Garcinia Framework's support for remote invocation and code migration reduces the need to manually share the code segment of the consumer program;
6) there is no total loss of computing capability when there is a total loss of network connectivity, as the Garcinia Framework is resilient to network failure by means of immediate fallback to the local agent worker service;
7) the Garcinia Framework is scalable enough to hold near-linear speedup relative to the number of processing elements for large numbers of discrete assignments, and its dynamic load balancing obtains system efficiency comparable to other peer-to-peer distributed computing systems;
8) the Garcinia Framework Application Programming Interface (API) is available to consumer applications without exposing the details of the network topology and communication protocols of the peer-to-peer distributed computing system;
9) the write-once, run-everywhere flexibility of the Java Platform is fully utilised, as the Garcinia Framework is able to run on several types of operating system platform while sharing the same library, task code segment, assignment structure, overlay network, worker services, and dispatcher services.
6.2 Recommendations
For future development, the following improvements can be made to the Garcinia Framework:
1) The extended functionality test should be done in a relatively hostile network environment consisting of hundreds of nodes, where multiple node entries, frequent leaves, and failures are common.
2) The performance of the worker service and the assignment scheduling can be optimised with a better load balancing strategy to fully utilise the potential of peer-to-peer distributed computing.
3) An error reporting service must be implemented to help monitor the operation of remote task invocations.
4) A security subsystem should be implemented to limit the privileges of incoming tasks, as currently a task can execute almost arbitrary code on the agent worker service.
5) A well-established peer-to-peer overlay network such as JXTA can be used to support advanced features such as Network Address Translation (NAT) traversal and proxy services.
6) Provision for IPv6 networks should be implemented as soon as possible to reduce the reliance on IPv4.
7. Bibliography
[1] Stanford University. (2010) Folding@home Distributed Computing. [Online]. http://folding.stanford.edu/. Last accessed 25 June 2011.
[2] University of California. (2011) SETI@home Search for Extraterrestrial Intelligence. [Online]. http://setiathome.berkeley.edu/. Last accessed 25 June 2011.
[3] Distributed.net. (2011) Distributed.net. [Online]. http://www.distributed.net/. Last accessed 25 June 2011.
[4] Asad Awan, Ronaldo A. Ferreira, Suresh Jagannathan, and Ananth Grama, "Unstructured Peer-to-Peer Networks for Sharing Processor Cycles," Parallel Computing - Parallel Matrix Algorithms and Applications (PMAA'04), pp. 115-135, 2006.
[5] Nazareno Andrade, Lauro Costa, Guilherme Germoglio, and Walfredo Cirne, "Peer-to-peer grid computing with the OurGrid Community," in 23rd Brazilian Symposium on Computer Networks (SBRC 2005) - 4th Special Tools Session, 2005.
[6] Mohamad bin Osman and Abd Rahman Milan, Mangosteen: Garcinia mangostana L., J. T. Williams et al., Eds. Southampton, United Kingdom: Southampton Centre for Underutilised Crops, 2006.
[7] Arnold L. Rosenberg, "Optimal Schedules for Cycle-Stealing in a Network of Workstations with a Bag-of-Tasks Workload," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 2, pp. 179-191, February 2002.
[8] George Coulouris, Jean Dollimore, and Tim Kindberg, Distributed Systems: Concepts and Design, 4th ed. Boston: Addison Wesley, 2005.
[9] Nazareno Andrade, Francisco Brasileiro, and Walfredo Cirne, "Discouraging Free Riding in a Peer-to-Peer CPU-Sharing Grid," in 13th IEEE International Symposium on High Performance Distributed Computing, 2004, pp. 129-137.
[10] Oracle Corporation. (2011) Java Platform, Standard Edition 6 API Specification. [Online]. http://download.oracle.com/javase/6/docs/api/index.html. Last accessed 25 June 2011.
[11] William Y. Arms, Digital Libraries. Cambridge: MIT Press, 2000.
[12] Timothy G. Mattson, Beverly A. Sanders, and Berna L. Massingill, A Pattern Language for Parallel Programming. Boston: Addison Wesley, 2004.
[13] Andrew S. Tanenbaum and Maarten van Steen, Distributed Systems: Principles and Paradigms. New Jersey: Pearson Prentice Hall, 2007.
[14] J. Risson and T. Moors, "Survey of Research towards Robust Peer-to-Peer Networks: Search Methods," Computer Networks, vol. 50, 2006.
[15] Yi Qiao and Fabian E. Bustamante, "Structured and Unstructured Overlays under the Microscope," in USENIX Annual Technical Conference, 2006, pp. 341-355.
[16] A. Fuggetta, G. P. Picco, and G. Vigna, "Understanding Code Mobility," IEEE Transactions on Software Engineering, vol. 24, no. 5, May 1998.
[17] Douglas E. Comer, Internetworking with TCP/IP: Principles, Protocols, and Architecture, 4th ed. New Jersey: Prentice Hall, 2000, vol. 1.
[18] Douglas E. Comer, Internetworking with TCP/IP: Principles, Protocols, and Architecture, 4th ed. New Jersey: Prentice Hall, 2000.
[19] Hisao Kameda, El-Zoghdy Said Fathy, Inhwan Ryu, and Jie Li, "A Performance Comparison of Dynamic vs. Static Load Balancing Policies in a Mainframe - Personal Computer Network Model," in Proceedings of the 39th IEEE Conference on Decision and Control, Sydney, NSW, Australia, 2000, pp. 1415-1420, vol. 2.
[20] Kenneth A. Berman and Jerome L. Paul, Algorithms: Sequential, Parallel, and Distributed. Boston: Thomson Course Technology, 2005.
[21] Kai Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, 2005.
[22] Ruby Lee, "Empirical Results on the Speedup, Efficiency, Redundancy, and Quality of Parallel Computations," in Proceedings of the International Conference on Parallel Processing, 1980, pp. 91-96.
[23] Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar. (2003) Analytical Modeling of Parallel Systems. Lecture Notes.
[24] Ian Foster, Designing and Building Parallel Programs. Boston: Addison-Wesley, 1995.
[25] Ulrich Drepper, "How to Write Shared Libraries," in UKUUG Linux Developers' Conference, Bristol, 2010.
[26] Greg Travis. (2001, May) Build Your Own Java Library. [Online]. http://www.ibm.com/developerworks/java/tutorials/j-javalibrary/j-javalibrary-pdf.pdf. Last accessed 25 June 2011.
[27] Oracle Corporation, Your First Cup: An Introduction to the Java EE Platform. Oracle, 2011.
[28] Internet Assigned Numbers Authority. (2011, June) IPv4 Multicast Address Space Registry. [Online]. http://www.iana.org/assignments/multicast-addresses/multicast-addresses.xml. Last accessed 25 June 2011.
[29] David Meyer. (1998, July) Administratively Scoped IP Multicast (IETF RFC 2365, Best Current Practice). [Online]. http://tools.ietf.org/html/rfc2365. Last accessed 25 June 2011.