
HYBRID ALGORITHM FOR WORKFLOW SCHEDULING IN
CLOUD-BASED CYBERINFRASTRUCTURES

Andrei Alexandru Nicolae, Catalin Negru, Florin Pop, Mariana Mocanu, Valentin Cristea
Computer Science and Engineering Department
University Politehnica of Bucharest
Bucharest, Romania
andrei.nicolae@hpc.pub.ro, catalin.negru@cs.pub.ro, florin.pop@cs.pub.ro,
mariana.mocanu@cs.pub.ro, valentin.cristea@cs.pub.ro

NBiS 2014, Salerno, Italy
Contents
• Introduction
• Related Work
• Hybrid Algorithm for Workflow Scheduling
• CyberWater: A Cyberinfrastructure for Water Quality Management
• Testing Scenarios and Experimental Results
• Conclusions and Future Work
INTRODUCTION
• Computational power has risen to new heights,
• but unfortunately so have the associated costs
• this has raised a new interest: efficient use of power
• Cyberinfrastructure: a relatively new concept
• a mix of technologies
• data acquisition platforms based on real-time sensor networks, Big Data, visualization tools, HPC, Web services
• an interdisciplinary approach
• various sciences, engineering and social disciplines
• Advantages of using cloud services in cyberinfrastructures:
• scalability up and down
• real-time resource provisioning
• simplified deployment and management of resources and applications
• better cost/performance ratio
INTRODUCTION (1)
• In the Cloud, it is mandatory to achieve a good balance between cost and performance
• CyberWater project
• Monitors natural resources and water-related events
• Offers a solution for water quality management with respect to pollution phenomena
HYBRID ALGORITHM FOR
WORKFLOW SCHEDULING
• The application is represented by a DAG, G(V, E, C, W):
• V is the set of v nodes, and each node vi ∈ V represents an application task;
• W is the set of computation costs, where wi ∈ W is the execution time of task vi;
• E is the set of communication edges. The directed edge ei,j joins nodes vi and vj:
• node vi is called the parent node and node vj is called the child node;
• this implies that vj cannot start until vi finishes and sends its data to vj;
• C is the set of communication costs, and the edge ei,j has a communication cost ci,j ∈ C.
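As an illustration, the DAG G(V, E, C, W) could be held in a small Python structure like the one below. This is only a sketch: the class name `Workflow`, its fields, and the example task names and costs are assumptions made for illustration and do not come from the presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Workflow:
    """Task graph G(V, E, C, W). All names here are illustrative."""
    # W: execution time w_i for each task v_i (the keys double as the set V)
    weights: Dict[str, float] = field(default_factory=dict)
    # C: communication cost c_{i,j} for each edge e_{i,j} (the keys double as the set E)
    comm: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def add_task(self, task: str, weight: float) -> None:
        self.weights[task] = weight

    def add_edge(self, parent: str, child: str, cost: float) -> None:
        if parent not in self.weights or child not in self.weights:
            raise KeyError("both endpoints must be added as tasks first")
        self.comm[(parent, child)] = cost

    def parents(self, task: str) -> List[str]:
        return [p for (p, c) in self.comm if c == task]

    def children(self, task: str) -> List[str]:
        return [c for (p, c) in self.comm if p == task]


# A four-task diamond: v0 feeds v1 and v2, which both feed v3 (made-up costs).
g = Workflow()
for name, w in [("v0", 2.0), ("v1", 3.0), ("v2", 1.0), ("v3", 4.0)]:
    g.add_task(name, w)
g.add_edge("v0", "v1", 1.0)
g.add_edge("v0", "v2", 2.0)
g.add_edge("v1", "v3", 1.5)
g.add_edge("v2", "v3", 0.5)
```

Here `g.parents("v3")` lists the tasks that must finish and send their data before v3 can start.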
HYBRID ALGORITHM FOR
WORKFLOW SCHEDULING (2)
• We consider a homogeneous computing environment model:
• a set P of p identical processors connected in a fully connected graph
• Assumptions:
• any processor can execute a task and communicate with other processors at the same time;
• once a processor has started executing a task, it continues without interruption, and
• on completing the execution, it immediately sends the output data to all child tasks in parallel.
HYBRID ALGORITHM FOR
WORKFLOW SCHEDULING (3)
• Communication cost ci,j for transferring data from task vi (scheduled on pm) to task vj:
$c_{i,j} = S + R \cdot \mu_{i,j}$
• S is the cost of starting communication between processors (in seconds)
• µi,j is the amount of data transmitted from task vi to task vj (in bytes)
• R is the cost of communication per transferred byte (in seconds/byte).

• Assumptions:
• the startup cost S is negligible and
• the unit cost R is the same for any two processors,
• so the communication cost for any two tasks is a function of the amount of transferred data only
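The cost model can be written as a small Python helper. This is a minimal sketch: the function name and parameters are hypothetical, and treating the cost as zero when both tasks run on the same processor is an assumption made here, in line with the cost describing communication between processors.

```python
def communication_cost(data_bytes: float,
                       unit_cost: float,
                       startup: float = 0.0,
                       same_processor: bool = False) -> float:
    """Return c_{i,j} = S + R * mu_{i,j} in seconds.

    data_bytes is mu_{i,j}, unit_cost is R (seconds/byte), startup is S (seconds).
    The default startup of 0.0 mirrors the assumption that S is negligible.
    """
    if data_bytes < 0 or unit_cost < 0 or startup < 0:
        raise ValueError("costs and data sizes must be non-negative")
    if same_processor:
        # Assumption: no transfer is needed between tasks on the same processor.
        return 0.0
    return startup + unit_cost * data_bytes
```

With S taken as negligible, the cost scales linearly with the amount of data: doubling `data_bytes` doubles the result.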
• We present a near-optimal scheduling approach that takes two objectives into consideration:
• minimizing the total execution time and
• minimizing the number of resources (processors) used
HYBRID ALGORITHM FOR
WORKFLOW SCHEDULING (4)
• HER algorithm description:
• Phase I:
• Nodes are grouped in such a way that the communication cost is as small as possible;
• Nodes are placed in lists. Every list will run on a different core;
• The lists are then sorted in ascending order by the total execution time of each node.
HYBRID ALGORITHM FOR
WORKFLOW SCHEDULING (5)
• Phase II:
• Node 0 is assigned priority 0;
• Nodes (excluding node 0) with 2 or more children are assigned priority 2 and are placed as early as possible in their respective lists;
• The total execution time is calculated for each list.
• Phase III:
• If all lists are balanced, jump to Phase V;
• Nodes grouped together with their parents are split into different lists;
• All nodes without children (except the final node) are placed at the end of their respective lists.
HYBRID ALGORITHM FOR
WORKFLOW SCHEDULING (6)
• Phase IV:
• the lists are split and reorganized taking load balancing into account.
• Phase V:
• all tasks are allocated to their respective processors
• since the nodes in the lists are not arranged in order of execution, we need to go over each list multiple times until all nodes are placed on their respective processors and all tasks are executed
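The repeated passes of Phase V can be sketched in Python as follows. This is not the authors' implementation: it assumes the per-processor lists produced by the earlier phases are given as input, that a parent on the same processor incurs no communication cost, and that a task starts as soon as its processor is free and all parent data has arrived. The function name `allocate` and its arguments are hypothetical.

```python
from typing import Dict, List, Tuple


def allocate(lists: List[List[str]],
             weights: Dict[str, float],
             comm: Dict[Tuple[str, str], float]) -> Dict[str, Tuple[int, float, float]]:
    """Return {task: (processor, start, finish)} for the given per-processor lists."""
    parents: Dict[str, List[str]] = {t: [] for t in weights}
    for (p, c) in comm:
        parents[c].append(p)

    placed: Dict[str, Tuple[int, float, float]] = {}
    ready_at = [0.0] * len(lists)            # time at which each processor becomes free
    pending = [list(tasks) for tasks in lists]  # copies, so the input is not mutated

    while any(pending):
        progressed = False
        for proc, tasks in enumerate(pending):
            remaining = []
            for task in tasks:
                if all(p in placed for p in parents[task]):
                    # Data from a parent on another processor arrives c_{i,j} later.
                    data_ready = max(
                        (placed[p][2] + (0.0 if placed[p][0] == proc else comm[(p, task)])
                         for p in parents[task]),
                        default=0.0,
                    )
                    start = max(ready_at[proc], data_ready)
                    finish = start + weights[task]
                    placed[task] = (proc, start, finish)
                    ready_at[proc] = finish
                    progressed = True
                else:
                    remaining.append(task)  # parents not placed yet: retry next pass
            pending[proc] = remaining
        if not progressed:
            raise ValueError("cyclic dependency or a task missing from the lists")
    return placed


# Made-up example: the first list is deliberately out of execution order.
weights = {"v0": 2.0, "v1": 3.0, "v2": 1.0, "v3": 4.0}
comm = {("v0", "v1"): 1.0, ("v0", "v2"): 2.0, ("v1", "v3"): 1.5, ("v2", "v3"): 0.5}
schedule = allocate([["v3", "v1", "v0"], ["v2"]], weights, comm)
makespan = max(finish for (_, _, finish) in schedule.values())
```

In this example three passes are needed, because v3 and v1 appear in the list before the tasks they depend on.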
CYBERWATER: A CYBERINFRASTRUCTURE FOR
WATER QUALITY MANAGEMENT
• CyberWater: Prototype Cyberinfrastructure-based System for Decision-Making Support in Water Resources Management
• Data level
• Measured data, predicted data, modeled data, subscribers' data
• Storage level
• prediction module
• propagation module
• decision support module
• alerts module
• Visualization level
• Customized web application
CYBERWATER: A CYBERINFRASTRUCTURE FOR
WATER QUALITY MANAGEMENT (1)
• CyberWater Application Workflows:
• CyberWater: Decision Support Workflow
CYBERWATER: A CYBERINFRASTRUCTURE FOR
WATER QUALITY MANAGEMENT (2)
• CyberWater: Real-Time Alerts Support Workflow
CYBERWATER: A CYBERINFRASTRUCTURE FOR
WATER QUALITY MANAGEMENT (3)
• Integration in a Cloud-Based Cyberinfrastructure
TESTING SCENARIOS AND
EXPERIMENTAL RESULTS
• The HER algorithm has been applied to various workflows, which are represented by DAGs
• each task is viewed as a unique node
• edges between nodes have a cost, which represents the communication cost between different processors
• the most important limitation is that there must always be exactly one entry node and exactly one exit node
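The single-entry, single-exit limitation is easy to check before scheduling. The sketch below is illustrative only; the function names are hypothetical, and a DAG is assumed to be given as a list of task names plus a list of (parent, child) edges.

```python
from typing import Iterable, List, Tuple


def entry_and_exit_nodes(tasks: Iterable[str],
                         edges: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (nodes without parents, nodes without children)."""
    tasks = list(tasks)
    edges = list(edges)
    with_parent = {child for (_, child) in edges}
    with_child = {parent for (parent, _) in edges}
    entries = [t for t in tasks if t not in with_parent]
    exits = [t for t in tasks if t not in with_child]
    return entries, exits


def has_single_entry_and_exit(tasks: Iterable[str],
                              edges: Iterable[Tuple[str, str]]) -> bool:
    entries, exits = entry_and_exit_nodes(tasks, edges)
    return len(entries) == 1 and len(exits) == 1


diamond_tasks = ["v0", "v1", "v2", "v3"]
diamond_edges = [("v0", "v1"), ("v0", "v2"), ("v1", "v3"), ("v2", "v3")]
ok = has_single_entry_and_exit(diamond_tasks, diamond_edges)
```

A workflow with an extra childless node, for example v4 attached only to v0, has two exit nodes and fails the check.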
TESTING SCENARIOS AND
EXPERIMENTAL RESULTS (1)
• Processor efficiency:
$E_p = \frac{\text{BusyTime}}{\text{BusyTime} + \text{IdleTime}}$

• Efficiency of a group of n processors:
$E_g = \frac{1}{n} \sum_{i=1}^{n} E_p(i)$
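Both efficiency measures translate directly into code. The following Python sketch uses hypothetical function names and made-up busy and idle times; it is not tied to any of the scenarios that follow.

```python
from typing import Iterable, Tuple


def processor_efficiency(busy_time: float, idle_time: float) -> float:
    """E_p = BusyTime / (BusyTime + IdleTime)."""
    total = busy_time + idle_time
    if busy_time < 0 or idle_time < 0 or total == 0:
        raise ValueError("times must be non-negative and not both zero")
    return busy_time / total


def group_efficiency(busy_idle_pairs: Iterable[Tuple[float, float]]) -> float:
    """E_g = (1/n) * sum of E_p(i) over the n processors."""
    efficiencies = [processor_efficiency(b, i) for (b, i) in busy_idle_pairs]
    if not efficiencies:
        raise ValueError("at least one processor is required")
    return sum(efficiencies) / len(efficiencies)


# Two processors: one busy for 3 of 4 time units, the other for 1 of 4.
e_g = group_efficiency([(3.0, 1.0), (1.0, 3.0)])  # (0.75 + 0.25) / 2 = 0.5
```

Note that E_g is the mean of the per-processor efficiencies, so an idle processor lowers the group value even when the others are fully used.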
TESTING SCENARIOS AND
EXPERIMENTAL RESULTS (2)
• Scenario 1:
• Workflow used in image processing (image filtering)
• High communication costs:
• Node without children
• Node with multiple parents
• Eg: 88% efficiency
• HLFET
• Uses 4 processors
• 39% efficiency
TESTING SCENARIOS AND
EXPERIMENTAL RESULTS (3)
• Scenario 2:
• DAG which can be used for tree indexing
• nodes without children
• small communication cost
• Eg: 61%
TESTING SCENARIOS AND
EXPERIMENTAL RESULTS (4)
• Scenario 3:
• DAG which can be used in service composition
• "fork-join" structure
• as many dependencies as possible
• because of the dependencies, there are no really efficient ways to use more processors
• Eg: 48% efficiency
TESTING SCENARIOS AND
EXPERIMENTAL RESULTS (5)
• Scenario 4:
• DAG used in HPC
• communication costs are relatively low
• the computational costs of the nodes are quite high
• Eg: 46%
CONCLUSIONS and FUTURE WORK
• HLFET and MCP are not the most efficient in the general case
• HER algorithm
• improves the overall efficiency
• by reducing the number of processors used and
• reducing the total execution time
• near-optimal algorithm
• Further improvements needed:
• better processor balancing
• a better way to take dependencies into account
• it does not work under CPU restrictions
• reduce the number of times the total execution time is calculated
Questions
• Andrei Nicolae
• Email: andrei.nicolae@hpc.pub.ro
• Catalin Negru
• Email: catalin.negru@cs.pub.ro
• Florin Pop
• Email: florin.pop@cs.pub.ro
• Mariana Mocanu
• Email: mariana.mocanu@cs.pub.ro
• Valentin Cristea
• Email: valentin.cristea@cs.pub.ro
