
Effectiveness of Four Load Balancing Algorithms in Java Parallel Processing Framework

Muris Agić, Mirsad Bulbulušić, Fadila Žilić, Novica Nosović
Faculty of Electrical Engineering, University of Sarajevo, Zmaja od Bosne b.b., Sarajevo, Bosnia and Herzegovina
ma14887@etf.unsa.ba, mb13929@etf.unsa.ba, fz14787@etf.unsa.ba, novica.nosovic@etf.unsa.ba

Abstract— This paper presents how four load balancing algorithms affect the total execution time of a parallel job using JPPF (Java Parallel Processing Framework). JPPF [1] is an open-source tool that enables applications with large processing power requirements to be run on any number of nodes in order to reduce their processing time. The algorithms differ in the way they balance, i.e. schedule, tasks between the nodes. One of the algorithms uses a fixed (static) number of tasks given by the user. The other three algorithms use some form of adaptive load balancing [2]. Two sets of measurements are presented, using these four algorithms to execute a parallel matrix multiplication job. The results of these measurements led to the conclusion that static load balancing algorithms spend less time on communication between the server and the nodes than the other algorithms.

Keywords— Load balancing, JPPF, autotuned algorithm, proportional algorithm, manual algorithm, reinforcement learning algorithm

I. INTRODUCTION

Recently, Java has emerged as a language of choice for parallel programming. As one of the open-source tools in this space, JPPF has aroused a lot of interest among professional software engineers. By using JPPF, they gain features that make parallel programming easier [3]. Among the most helpful of these features are the load balancing algorithms. Other frameworks that support parallel programming usually use only one algorithm to schedule jobs. JPPF offers four load balancing algorithms, which gives users the possibility to analyze their effect on the total execution time of a parallelized program. This is explained in the following sections.

II. PERFORMANCE MEASUREMENT SCENARIOS

To determine how the load balancing algorithms affect the total execution time of a parallel program, execution time measurements were conducted in four scenarios. Eleven computers with the following characteristics were used:
- CPU: Intel Core 2 Duo E7200,
- clock frequency: 2.533 GHz,
- RAM size: 3 GB (visible to the OS),
- operating system: Windows 7 Professional and
- Java Runtime Environment (JRE) 7.

One computer was used as a server, while the others were used as nodes. Using the instance of the administration console on the server, ten nodes were administered.

Scenario 1: The execution time of a program that multiplies two square matrices of size 750x750 was measured. Measurements were made using 2, 4, 6, 8 and 10 nodes, with each of the four load balancing algorithms. The parameters of all four load balancing algorithms are given in Table I.

Scenario 2: The execution time was measured on the same setup as in Scenario 1, but with matrices of size 1500x1500.
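To make the measured workload concrete, the following minimal sketch shows how the matrix multiplication job from Scenarios 1 and 2 might be expressed with the JPPF 2.x client API (JPPFClient, JPPFJob, JPPFTask), the version in use at the time; class and method names may differ in later JPPF releases. It is an illustration, not the code used by the authors; the names MatrixBlockTask and MatrixJobClient and the value rowsPerTask = 5 are illustrative assumptions. Each task multiplies one block of rows of A by the full matrix B, and the driver's load balancing algorithm decides how many such tasks are bundled and sent to each node.

import java.util.Arrays;
import java.util.List;
import org.jppf.client.JPPFClient;
import org.jppf.client.JPPFJob;
import org.jppf.server.protocol.JPPFTask;

// One task multiplies a horizontal block of rows of A by the full matrix B.
class MatrixBlockTask extends JPPFTask {
  private final double[][] aRows; // block of rows taken from A
  private final double[][] b;     // full matrix B

  MatrixBlockTask(double[][] aRows, double[][] b) {
    this.aRows = aRows;
    this.b = b;
  }

  @Override
  public void run() {
    int n = b.length, m = b[0].length;
    double[][] cRows = new double[aRows.length][m];
    for (int i = 0; i < aRows.length; i++)
      for (int k = 0; k < n; k++)
        for (int j = 0; j < m; j++)
          cRows[i][j] += aRows[i][k] * b[k][j];
    setResult(cRows); // the computed result rows travel back to the client with the task
  }
}

public class MatrixJobClient {
  public static void main(String[] args) throws Exception {
    int size = 750, rowsPerTask = 5;              // illustrative values only
    double[][] a = new double[size][size];        // in a real run, filled with data
    double[][] b = new double[size][size];
    JPPFClient client = new JPPFClient();         // connects to the JPPF driver (server)
    try {
      JPPFJob job = new JPPFJob();
      for (int r = 0; r < size; r += rowsPerTask) {
        double[][] block = Arrays.copyOfRange(a, r, Math.min(r + rowsPerTask, size));
        job.addTask(new MatrixBlockTask(block, b));
      }
      // The driver's load balancing algorithm decides how many of these
      // tasks are bundled together and sent to each node.
      List<JPPFTask> results = client.submit(job); // blocking submit, returns executed tasks
      System.out.println("received " + results.size() + " task results");
    } finally {
      client.close();
    }
  }
}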
TABLE I. SETTINGS FOR PROPERTIES OF 4 LOAD BALANCING ALGORITHMS

Load Balancing Algorithm   Parameters
MANUAL                     size = 5
AUTOTUNED                  minSamplesToAnalyze = 5, minSamplesToCheckConvergence = 5, maxDeviation = 0.2, maxGuessToStable = 50, sizeRatioDeviation = 1.5, decreaseRatio = 0.2, size = 5
PROPORTIONAL               performanceCacheSize = 2000, proportionalityFactor = 2
REINFORCEMENT LEARNING     performanceCacheSize = 2000, maxActionRange = 10, performanceVariationThreshold = 0.001

Parameters of these load balancing algorithms are explained below:

1) Manual algorithm - divides the job into a fixed number of tasks. This algorithm has only one parameter, size - the fixed number of tasks per node (a short sketch of this fixed-size dispatch is given after Table II below).

2) Autotuned algorithm - is heuristic in the sense that it determines an initial number of tasks to be sent to each node (which may not be optimal) and then seeks a better solution. This algorithm is loosely based on the Monte Carlo algorithm [6]. It has seven parameters:
a) minSamplesToAnalyse - the minimum number of samples that must be collected before an analysis is triggered,
b) minSamplesToCheckConvergence - the minimum number of samples to be collected before checking whether the performance profile has changed,
c) maxDeviation - the percentage of deviation of the current mean from the mean at the time the system was considered stable,
d) maxGuessToStable - the maximum number of already tested guesses of the generated number for the algorithm to consider the current best solution stable,
e) sizeRatioDeviation - this parameter defines the multiplicity used to define the range available to the random generator,
f) decreaseRatio - this parameter defines how fast it will stop generating random numbers and
g) size - the fixed number of tasks per node.

3) Proportional algorithm - is purely adaptive [5] and based solely on the known previous performance of the nodes. It determines the bundle size for each node according to the performance of that node in the previous run. Each bundle size is determined from the mean execution time raised to the power of N, where N is one of the algorithm's parameters, called the "proportionality factor". The mean time is computed as a moving average over the last M tasks executed by a given node, where M is the other algorithm parameter, called the "performance cache size". The bundle size for each node depends on the others: every time performance data is fed back to the server, all node bundle sizes are re-computed. This algorithm has two parameters:
a) performanceCacheSize - the maximum size of the performance samples cache and
b) proportionalityFactor - the power N to which the mean execution time is raised.
The implementation of this algorithm is based on the following formula. With
n = current number of nodes attached to the driver,
max = maximum number of tasks in a job in the current queue state,
mean_i = mean execution time for node i,
s_i = number of tasks to send to node i,
p = proportionality factor parameter,
it defines
S = Σ_{i=1..n} (1 / mean_i^p).
The bundle size for each node is proportional to its contribution to the sum S, hence the name of the algorithm, and s_i is then computed as
s_i = max · (1 / mean_i^p) / S.
A sketch of this computation is given after the algorithm descriptions below.

4) Reinforcement learning algorithm [4] - here, learning refers to how to map situations to actions in order to maximize a numerical reward signal. The basic idea is simply to capture the most important aspects of the real problem facing a learning agent interacting with its environment to achieve a goal. This algorithm has three parameters:
a) performanceCacheSize - the maximum size of the performance samples cache,
b) maxActionRange - the absolute value of the maximum increase of the bundle size and
c) performanceVariationThreshold - the variation of the mean execution time that triggers a change in bundle size.
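To illustrate the proportional rule, the following plain-Java sketch (not the JPPF implementation itself; the class name ProportionalBundler is illustrative) computes the bundle sizes directly from the formula above.

// Sketch of the proportional bundle-size rule described above:
// each node's share of the 'max' queued tasks is proportional to
// 1 / mean_i^p, normalized by the sum S over all nodes.
public final class ProportionalBundler {
  public static int[] bundleSizes(double[] meanExecTime, int max, int p) {
    int n = meanExecTime.length;        // number of nodes attached to the driver
    double[] weight = new double[n];
    double s = 0.0;                     // S = sum over nodes of 1 / mean_i^p
    for (int i = 0; i < n; i++) {
      weight[i] = 1.0 / Math.pow(meanExecTime[i], p);
      s += weight[i];
    }
    int[] sizes = new int[n];
    for (int i = 0; i < n; i++) {
      // s_i = max * (1 / mean_i^p) / S, rounded to a whole number of tasks
      sizes[i] = (int) Math.round(max * weight[i] / s);
    }
    return sizes;
  }
}

For example, with two nodes whose mean execution times are 1.0 s and 2.0 s, max = 30 and p = 2 (the proportionalityFactor value from Table I), the weights are 1 and 0.25, S = 1.25, so the faster node receives 24 tasks and the slower one 6.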
Besides these two scenarios, there was a need to add two more. Scenarios 3 and 4 were added to provide additional arguments for the conclusions based on Scenarios 1 and 2.

Scenario 3: The execution time of a program that multiplies two square matrices of size 750x750 was measured. In this scenario only the manual algorithm was used. Two parameters were modified during the measurement: the number of nodes on which the program was executed and the granularity of tasks, i.e. the bundle size. The bundle sizes used in this scenario are given in Table II as the granularity of tasks.

Scenario 4: In this scenario a measurement was conducted to evaluate the effect of network traffic on the total execution time of a program which multiplies two square matrices of size 750x750. There were two cases: one computer acting as both the server and a node, and one computer acting as the server with the other as a node. The only parameter that was changed, in both cases, was the bundle size.

TABLE II. GRANULARITY OF TASKS PER NODE

Number of nodes   Granularity of tasks
1                 750   375   250   188   150
2                 375   188   125    94    75
4                 188    94    63    47    38
6                 125    63    42    32    25
8                  94    47    32    24    19
10                 75    38    25    19    15
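For contrast with the adaptive rules, the fixed-size dispatch used by the manual algorithm (and varied in Scenario 3) can be sketched as follows; this is plain, illustrative Java, not JPPF code, and the class name ManualBundler and the example values are assumptions. Every dispatch takes the same number of tasks off the job queue and hands them to the first available node, so only the granularity from Table II changes between runs.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch of fixed-size (manual) bundling: every dispatch takes the same
// number of tasks off the job queue, regardless of node performance.
public final class ManualBundler {
  public static <T> List<T> nextBundle(Queue<T> jobQueue, int size) {
    List<T> bundle = new ArrayList<>(size);
    while (bundle.size() < size && !jobQueue.isEmpty()) {
      bundle.add(jobQueue.poll());      // same fixed count for every dispatch
    }
    return bundle;                       // handed to the first available node
  }

  public static void main(String[] args) {
    Queue<Integer> tasks = new ArrayDeque<>();
    for (int i = 0; i < 750; i++) tasks.add(i);   // e.g. 750 row tasks
    int granularity = 75;                         // one of the Table II settings for 10 nodes
    while (!tasks.isEmpty()) {
      List<Integer> bundle = nextBundle(tasks, granularity);
      System.out.println("dispatch bundle of " + bundle.size() + " tasks");
    }
  }
}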

III. MEASUREMENT RESULTS

All the results represent:
- the time needed to schedule data to the nodes,
- the time for executing the operations on the nodes and
- the time it takes to return the results to the client.

A. Results in the first scenario

FIGURE 1. RESULT OF MATRIX MULTIPLICATION OF SIZE 750X750 ON 2, 4, 6, 8 AND 10 NODES USING ALL 4 LOAD BALANCING ALGORITHMS

From Figure 1 it can be seen that the best results were achieved with the manual algorithm. As previously mentioned, the manual algorithm sends a fixed number of tasks to each node, and if there are more tasks in the queue waiting for execution, the server forwards the same fixed number of tasks to the first available node. Although the proportional algorithm has the execution time closest to that of the manual algorithm, it is still slower; this can be explained by the fact that the proportional algorithm uses information about the past performance of the nodes, which requires additional network communication between the server and the nodes.

B. Results in the second scenario

FIGURE 2. RESULT OF MATRIX MULTIPLICATION OF SIZE 1500X1500 ON 2, 4, 6, 8 AND 10 NODES USING ALL 4 LOAD BALANCING ALGORITHMS

As can be seen in Figure 2, the best execution time was again achieved using the manual algorithm. The second result also shows that by increasing the size of the problem, the execution time increases too. Considering the results from Figure 1 and Figure 2, it can be seen that the autotuned, proportional and reinforcement learning algorithms are less effective as high-performance algorithms than the manual algorithm. These algorithms are based either on the past performance of the nodes or on the distance between them, and their adaptivity leads to the conclusion that they spend more time on communication between the server and the nodes, and between the nodes, than is the case with the manual algorithm. These conclusions need additional evidence about the influence of network traffic on the execution time. To prove these claims, the experiment was extended by the two additional scenarios.
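A note on the growth observed between Scenario 1 and Scenario 2: assuming the classical O(n^3) matrix multiplication (the paper does not state the algorithm explicitly), doubling the matrix dimension multiplies the arithmetic work by

(1500 / 750)^3 = 2^3 = 8,

so the longer times in Figure 2 correspond to roughly eight times as many multiply-add operations as in Figure 1, before any communication cost is counted.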

C. Results in the third scenario

FIGURE 4. RESULTS OBTAINED BY CHANGING THE SIZE OF THE GRANULARITY OF TASKS AND THE NUMBER OF NODES

This figure represents the results of conducting Scenario 3. From Figure 4 it is possible to conclude that by increasing the granularity of tasks, the network communication increases too. The granularity of tasks determines the amount of communication between the server and the nodes while sending the tasks and receiving the results of the deployed tasks. This leads to increased network traffic, which affects the total execution time.

D. Results in the fourth scenario

FIGURE 3. INFLUENCE OF THE NETWORK TRAFFIC ON EXECUTION TIME

Figure 3 clearly shows the influence of network traffic on the total execution time of the program that multiplies two square matrices of size 750x750. Considering that in the first case there was no communication between the server and the node, the results from that measurement can be used as the execution time on one node. Comparing the two results in Figure 3, it was possible to determine the time that was spent on communication between the server and the node.

IV. CONCLUSION

The above comparison shows that static load balancing algorithms are more efficient than adaptive ones, and it is also easier to predict the behaviour of static algorithms. Adaptive algorithms, which use a variable granularity of work, create more network communication between the server and the nodes; in the other algorithms, network communication comes into play because they use heuristics for finding the optimal deployment of tasks. Increased network communication leads to an increased total execution time of the parallel program, which deteriorates the effects of parallelization. As can be seen, increasing the granularity of tasks increases the network communication, which automatically increases the total execution time, and poor granularity of work can also significantly increase the total execution time.

REFERENCES

[1] (2011) The JPPF website [Online]. Available: http://jppf.org/about.php
[2] (2011) The Wikipedia website [Online]. Available: http://en.wikipedia.org/wiki/Load_balancing_(computing)
[3] (2011) The JPPF Performance website [Online]. Available: http://jppf.org/wiki/index.php?title=JPPF_Performance
[4] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Book), 2005.
[5] D. L. Eager, E. D. Lazowska, and J. Zahorjan, "Adaptive load sharing in homogeneous distributed systems," IEEE Transactions on Software Engineering, vol. 12, no. 5, pp. 662-675, May 1986.
[6] Monte Carlo method [Online]. Available: http://en.wikipedia.org/wiki/Monte_Carlo_method