
Effectiveness of Four Load Balancing Algorithms in Java Parallel Processing Framework

Muris Agić, Mirsad Bulbulušić, Fadila Žilić, Novica Nosović
Faculty of Electrical Engineering, University of Sarajevo, Zmaja od Bosne b.b., Sarajevo, Bosnia and Herzegovina
ma14887@etf.unsa.ba, mb13929@etf.unsa.ba, fz14787@etf.unsa.ba, novica.nosovic@etf.unsa.ba

Abstract— This paper presents how four load balancing algorithms affect the total execution time of a parallel job run with JPPF (Java Parallel Processing Framework). JPPF [1] is an open-source tool that enables applications with large processing power requirements to run on any number of nodes in order to reduce their processing time. The algorithms differ in how they balance, i.e. schedule, tasks between the nodes. One of the algorithms uses a fixed (static) number of tasks given by the user; the other three use some form of adaptive load balancing [2]. Two sets of measurements are presented in which these four algorithms execute a parallel matrix multiplication job. The results of these measurements lead to the conclusion that static load balancing algorithms spend less time on communication between the server and the nodes than the adaptive ones.

Keywords— Load balancing, JPPF, autotuned algorithm, proportional algorithm, manual algorithm, reinforcement learning algorithm

I. INTRODUCTION

Recently, Java has emerged as a language of choice for parallel programming, and its open-source tooling has aroused a lot of interest among professional software engineers. JPPF provides facilities that make parallel programming easier [3], and among the most helpful of these are its load balancing algorithms. Other frameworks that support parallel programming usually use only one algorithm to schedule jobs; JPPF offers four load balancing algorithms, which gives users the possibility to analyze their effect on the total execution time of a parallelized program. This is explored in the following sections.

II. PERFORMANCE MEASUREMENT SCENARIOS

To determine how the load balancing algorithms affect the total execution time of a parallel program, execution time was measured in four scenarios. Eleven computers with the following characteristics were used:
- CPU: Intel Core 2 Duo E7200,
- clock frequency: 2.533 GHz,
- RAM size: 3 GB (visible to the OS),
- operating system: Windows 7 Professional, and
- Java Runtime Environment (JRE) 7.

One computer was used as the server, the others as nodes. The ten nodes were administered through the instance of the administration console running on the server.

Scenario 1: The execution time of a program that multiplies two square matrices of size 750x750 was measured, using 2, 4, 6, 8 and 10 nodes and each of the four load balancing algorithms. The parameters of all four algorithms are given in Table I.

Scenario 2: The execution time was measured on the same setup as in Scenario 1, but with matrices of size 1500x1500.

TABLE I. SETTINGS FOR PROPERTIES OF 4 LOAD BALANCING ALGORITHMS

Load Balancing Algorithm | Parameters
MANUAL                   | size = 5
AUTOTUNED                | minSamplesToAnalyse = 5; minSamplesToCheckConvergence = 5; maxDeviation = 0.2; maxGuessToStable = 50; sizeRatioDeviation = 1.5; decreaseRatio = 0.2; size = 5
PROPORTIONAL             | performanceCacheSize = 2000; proportionalityFactor = 2
REINFORCEMENT LEARNING   | performanceCacheSize = 2000; maxActionRange = 10; performanceVariationThreshold = 0.001

The parameters of these load balancing algorithms are explained below:

1) Manual algorithm - divides the job into a fixed number of tasks. This algorithm has only one parameter: size - the fixed number of tasks per node.

2) Autotuned algorithm - heuristic in the sense that it determines an initial number of tasks to be sent to each node (which may not be optimal) and then seeks a better solution. It is loosely based on the Monte Carlo algorithm [6]. This algorithm has seven parameters:
a) minSamplesToAnalyse - the minimum number of samples that must be collected before an analysis is triggered;
b) minSamplesToCheckConvergence - the minimum number of samples to be collected before checking whether the performance profile has changed;
c) maxDeviation - the maximum percentage of deviation of the current mean from the mean at the time the system was considered stable;
d) maxGuessToStable - the maximum number of generated guesses that were already tested before the algorithm considers the current best solution stable;
e) sizeRatioDeviation - the multiplicity used to define the range available to the random number generator;
f) decreaseRatio - defines how fast the algorithm will stop generating random numbers; and
g) size - the fixed number of tasks per node.

3) Proportional algorithm - purely adaptive [5] and based solely on the known previous performance of the nodes. It determines the bundle size for each node according to the performance of that node in the previous run. Each bundle size is determined in proportion to the mean execution time raised to the power of N, where N is one of the algorithm's parameters, called the "proportionality factor". The mean time is computed as a moving average over the last M tasks executed by a given node; M is the other parameter, called the "performance cache size". Every time performance data is fed back to the server, all node bundle sizes are re-computed, so the bundle size of each node depends on the others - hence the name of the algorithm. The implementation is based on the following formula. With

    n      = current number of nodes attached to the driver,
    max    = maximum number of tasks in a job in the current queue state,
    mean_i = mean execution time for node i,
    s_i    = number of tasks to send to node i,
    p      = proportionality factor parameter,

the algorithm first defines the sum

    S = sum over i = 1..n of 1 / mean_i^p

and s_i is then computed as

    s_i = max * (1 / mean_i^p) / S,

so the bundle size for each node is proportional to its contribution to the sum S. This algorithm has two parameters:
a) performanceCacheSize - the maximum size of the performance samples cache, and
b) proportionalityFactor - the power p.

4) Reinforcement learning algorithm [4] - learning here refers to how to map situations to actions in order to maximize a numerical reward signal. The basic idea is simply to capture the most important aspects of the real problem facing a learning agent interacting with its environment to achieve a goal. This algorithm has three parameters:
a) performanceCacheSize - the maximum size of the performance samples cache;
b) maxActionRange - the absolute value of the maximum increase of the bundle size; and
c) performanceVariationThreshold - the variation of the mean execution time that triggers a change in bundle size.

Besides these two scenarios, there was a need for two more. Scenarios 3 and 4 were added to provide additional arguments for the conclusion based on Scenarios 1 and 2.

Scenario 3: The execution time of a program that multiplies two square matrices of size 750x750 was measured. Only the manual algorithm was used, and two parameters were modified during the measurement: the number of nodes on which the program was executed and the granularity of tasks. The bundle sizes used in this scenario are given in Table II as the granularity of tasks.

Scenario 4: A measurement was conducted to evaluate the effect of network traffic on the total execution time of a program that multiplies two square matrices of size 750x750. There were two cases: one computer acting as both server and node, and one computer as the server with another as the node. The only parameter that was changed in both cases was the bundle size.

TABLE II. GRANULARITY OF TASKS PER NODE

Number of nodes | Granularity of tasks
       1        | 750   375   250   188   150
       2        | 375   188   125    94    75
       4        | 188    94    63    47    38
       6        | 125    63    42    32    25
       8        |  94    47    32    24    19
      10        |  75    38    25    19    15
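The granularity values in Table II appear to follow a simple pattern: each column corresponds to a base bundle size for a single node (750, 375, 250, 188 and 150, i.e. the 750 rows split into 1 to 5 bundles, rounded up), divided by the number of nodes and again rounded up. The following sketch is our own reconstruction of that pattern, not code from the measured programs:

```java
// Reconstruction of Table II: granularity of tasks per node.
// Illustrative sketch only; not code from the measured programs.
public class GranularityTable {
    /** Base bundle size: 'rows' rows split into 'level' bundles, rounded up. */
    static int base(int rows, int level) {
        return (rows + level - 1) / level;          // ceil(rows / level)
    }

    /** Bundle size (granularity) when the base bundle is spread over 'nodes' nodes. */
    static int granularity(int rows, int level, int nodes) {
        int b = base(rows, level);
        return (b + nodes - 1) / nodes;             // ceil(base / nodes)
    }

    public static void main(String[] args) {
        int[] nodeCounts = {1, 2, 4, 6, 8, 10};
        for (int nodes : nodeCounts) {
            StringBuilder row = new StringBuilder(nodes + ": ");
            for (int level = 1; level <= 5; level++) {
                row.append(granularity(750, level, nodes)).append(' ');
            }
            System.out.println(row);  // e.g. "4: 188 94 63 47 38"
        }
    }
}
```

Running the sketch reproduces every row of Table II, which supports reading "granularity" as the number of matrix rows per task bundle.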

III. MEASUREMENT RESULTS

All the results represent:
- the time needed to schedule data to the nodes,
- the time for executing operations on the nodes, and
- the time it takes to return the results to the client.

A. Results in the first scenario

FIGURE 1. RESULT OF MATRIX MULTIPLICATION OF SIZE 750X750 ON 2, 4, 6, 8 AND 10 NODES USING ALL 4 LOAD BALANCING ALGORITHMS

From Figure 1 it can be seen that the best results were achieved with the manual algorithm. As previously mentioned, the manual algorithm sends a fixed number of tasks to each node, and if there are more tasks in the queue waiting for execution, the server forwards the same fixed number of tasks to the first available node. It can also be seen that the autotuned, proportional and reinforcement learning algorithms perform worse than the manual algorithm; their adaptivity leads to the conclusion that they spend more time on communication between the server and the nodes, and between the nodes, than is the case with the manual algorithm. Although the proportional algorithm comes closest to the manual algorithm, it is still slower; this anomaly can be explained by the fact that the proportional algorithm uses information about the past performance of the nodes, which requires additional network communication between the server and the nodes.

B. Results in the second scenario

FIGURE 2. RESULT OF MATRIX MULTIPLICATION OF SIZE 1500X1500 ON 2, 4, 6, 8 AND 10 NODES USING ALL 4 LOAD BALANCING ALGORITHMS

As in Figure 1, Figure 2 shows that the best execution time was achieved with the manual algorithm. Obviously, increasing the size of the problem increases the execution time as well. These conclusions need additional evidence about the influence of network traffic on execution time, so the experiment was extended by the two additional scenarios.

C. Results in the third scenario

FIGURE 4. RESULTS OBTAINED BY CHANGING THE SIZE OF GRANULARITY OF TASKS AND THE NUMBER OF NODES

This figure represents the results of conducting Scenario 3. From Figure 4 it is possible to conclude that increasing the granularity of tasks also increases the network communication, which automatically increases the total execution time. The granularity of tasks determines the amount of communication between the server and the nodes while sending tasks and receiving the results of the deployed tasks.

D. Results in the fourth scenario

FIGURE 3. INFLUENCE OF THE NETWORK TRAFFIC ON EXECUTION TIME

Figure 3 clearly shows the influence of network traffic on the total execution time of the program that multiplies two square matrices of size 750x750. Considering that in the first case there was no network communication between a server and a node, the result of that measurement can be used as the execution time on one node. By comparing the two results in Figure 3, it was possible to determine the time that has been spent on communication between a server and a node.

IV. CONCLUSION

The above comparison shows that static load balancing algorithms are more efficient than adaptive ones, and the behaviour of static algorithms is also easier to predict. Adaptive algorithms, which use a variable granularity of work, create more network communication between the server and the nodes, because they rely either on the past performance of the nodes or on the distance between them and use heuristics to find an optimal deployment of tasks. This leads to increased network traffic, which increases the total execution time of the parallel program and deteriorates the effects of parallelization. A poor granularity of work can also significantly increase the total execution time.

REFERENCES

[1] (2011) The JPPF website [Online]. Available: http://jppf.org/about.php
[2] (2011) The Wikipedia website [Online]. Available: http://en.wikipedia.org/wiki/Load_balancing(computing)
[3] (2011) The JPPF Performance website [Online]. Available: http://jppf.org/doc/v2/index.php?title=JPPF_Performance
[4] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (book), 2005.
[5] D. L. Eager, E. D. Lazowska, and J. Zahorjan, "Adaptive load sharing in homogeneous distributed systems," IEEE Transactions on Software Engineering, vol. 12, no. 5, pp. 662-675, May 1986.
[6] Monte Carlo method [Online]. Available: http://en.wikipedia.org/wiki/Monte_Carlo_method
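As a closing illustration, the proportional algorithm's bundle-size formula reconstructed in Section II can be expressed in a few lines of Java. This is our own sketch of the formula (S = sum of 1/mean_i^p, s_i = max * (1/mean_i^p) / S), not JPPF's actual implementation; all names are ours:

```java
import java.util.Arrays;

/** Sketch of the proportional algorithm's bundle-size formula from Section II.
 *  Our own reconstruction; not JPPF code. */
public class ProportionalSketch {
    /**
     * meanTimes[i] = mean execution time of node i (in JPPF, a moving average
     * over the last "performance cache size" tasks); p = proportionality
     * factor; maxTasks = number of tasks of the job in the current queue state.
     */
    static int[] bundleSizes(double[] meanTimes, int p, int maxTasks) {
        int n = meanTimes.length;
        double[] weights = new double[n];
        double sum = 0.0;                               // S = sum of 1 / mean_i^p
        for (int i = 0; i < n; i++) {
            weights[i] = 1.0 / Math.pow(meanTimes[i], p);
            sum += weights[i];
        }
        int[] sizes = new int[n];
        for (int i = 0; i < n; i++) {
            // each node's bundle is proportional to its contribution to S
            sizes[i] = (int) Math.round(maxTasks * weights[i] / sum);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // two equally fast nodes and one twice as slow, 750 queued tasks, p = 2
        int[] s = bundleSizes(new double[] {1.0, 1.0, 2.0}, 2, 750);
        System.out.println(Arrays.toString(s)); // prints "[333, 333, 83]"
    }
}
```

Note how p = 2 (the value used in Table I) penalizes the slow node quadratically: it receives roughly a quarter of the tasks given to each fast node, which matches the paper's description of faster nodes receiving larger bundles.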