
INTERIM PROJECT REPORT

B.Tech (CSE) VII Sem

UTILITY BASED SCHEDULING IN A MULTIPROCESSOR SYSTEM USING MACHINE LEARNING

Under the Guidance of: Mr. Abhishek Srivastava

Submitted By: Abhishek Mishra (07206G)
Group Members: Ankur Chhaparia (07234G)
               Shivendra Mishra (07342G)

DECEMBER 2010

Submitted in partial fulfillment of the Degree of

Bachelor of Technology

Department of Computer Science & Engineering


JAYPEE UNIVERSITY OF ENGINEERING & TECHNOLOGY,
A-B ROAD, RAGHOGARH, DT. GUNA - 473226, M.P., INDIA

JAYPEE UNIVERSITY OF ENGINEERING & TECHNOLOGY


(Establishment Under MP Private University Act, 2007)

A.B. ROAD, P.B. No. 1, RAGHOGARH, DIST: GUNA (M.P.) INDIA.

Phone : 07544 267051, 267310 - 14 Fax : 07544 267011

Website: www.juet.ac.in

PROJECT CERTIFICATE

This is to certify that the work titled “Utility Based Scheduling in a Multiprocessor System Using Machine Learning”, submitted by Abhishek Mishra (07206G) along with Ankur Chhaparia (07234G) and Shivendra Mishra (07342G), is in partial fulfilment of the requirements for the award of the degree of B.Tech of Jaypee University of Engineering and Technology. This work has not been submitted partially or wholly to any other University or Institute for the award of this or any other degree or diploma.

Signature of Supervisor …………………….


Name of Supervisor Mr. ABHISHEK SRIVASTAVA
Designation Lecturer
Date ……………………

Acknowledgement

We express our gratitude and sincere thanks to Mr. Abhishek Srivastava, our project guide, who has guided us all along with his wise counsel, benevolent direction, suggestions and valuable interaction sessions.

We express our heartfelt and profound gratitude to Dr. Shishir Kumar, HOD, Department of Computer Science and Engineering, whose valuable suggestions and cooperation have encouraged us and provided the impetus to get the project off the ground.

Words of any sort will be less than sufficient to express our sincere and honest thanks for their keen and personal interest, able guidance, moral support and encouragement throughout the entire period of this project work. We will always remain thankful to them for giving their precious time and new ideas, along with the appropriate facilities.

Signature of the Student ………………..

Name of Student Abhishek Mishra

Date ………………..

Summary

This project presents a general methodology for online scheduling of parallel jobs onto multiprocessor servers in a soft real-time environment, where the final utility of each job decreases with the job completion time. A solution approach is presented in which each server uses Reinforcement Learning to tune its own value function, which predicts the average future utility per time step obtained from completed jobs based on the dynamically observed state information. The server then selects jobs from its job queue, possibly preempting some currently running jobs and “squeezing” some jobs into fewer CPUs than they ideally require, so as to maximize the value of the resulting server state. The experimental results demonstrate the feasibility and benefits of the proposed approach.

Table of Contents
1. Introduction.........................................................................................................................................2
1.1 Scheduler:..........................................................................................................................................3
1.2 Scheduling Basics:..............................................................................................................................4
1.3 Conflicting goals of Scheduling..........................................................................................................4
1.4 Multiple Processor Systems:..............................................................................................................5
1.4.1 A shared-memory multiprocessor:.............................................................................................5
1.4.2 A message-passing multicomputer:............................................................................................6
1.4.3 A wide area distributed system:.................................................................................................6
1.5 Multiprocessor Hardware..................................................................................................................7
1.6 Multiprocessor Operating System Types:..........................................................................................7
1.7 How to choose a scheduling algorithm:.............................................................................................8
1.8 Operating System Scheduler Implementations:................................................................................9
1.8.1 Windows.....................................................................................................................................9
1.8.2 Mac OS........................................................................................................................................9
1.8.3 Linux.........................................................................................................................................10
1.8.4 FreeBSD....................................................................................................................................10
1.8.5 NetBSD......................................................................................................................................10
1.8.6 Solaris.......................................................................................................................................10
2. Related Work.....................................................................................................................................12
2.1 Existing System:...............................................................................................................................12
2.2 What is Required:............................................................................................................................12
2.3 End Product.....................................................................................................................................12
2.10 Why quantum can be (quite) long in Linux:...................................................................................15
2.16 Fuzzy logic......................................................................................................................................24
2.17 Memory Architecture:...................................................................................................................25
2.18 Application Specific Memory Design:............................................................................................25
2.19 Architectural Resource Allocation:................................................................................................26
2.20 Multiprocessor vs Multicore :........................................................................................................26
2.21 Preliminary Work Done:................................................................................................................28
2.22 Resources Available.......................................................................................................................29
2.23 Scope of Work:..............................................................................................................................31

2.24 Requirement Analysis:...................................................................................................................31
3. Problem & Suggested Solution..........................................................................................................34
3.1 Problem Formulation.......................................................................................................................34
3.2 Solution Methodology.....................................................................................................................34
3.2.1 Overview...................................................................................................................................34
3.2.2 Fuzzy Rulebase..........................................................................................................................35
3.3 Value-Based Job Scheduling Algorithm............................................................................................40
3.3.1 Scheduling on a Single Machine................................................................................................40
3.3.2 Scheduling on Multiple Machines.............................................................................................41
3.4 LinSched: The Linux Scheduler Simulator.......................................................................................42
3.5 Architecture.....................................................................................................................................43
3.6 Design of Agent:..............................................................................................................................44
4. Conclusion and future work...................................................................................................................46
Activity-Time Schedule..............................................................................................................................48
References:.................................................................................................................................50
Personal Details:........................................................................................................................................52

Chapter - 1

1. Introduction

The concept of scheduling jobs based on a Time Utility Function (TUF), whose objective is to maximize the total utility accrued by the system over time, is used here. The utility accrual (UA) paradigm is a generalization of deadline scheduling in hard real-time systems. That is, if the system receives a utility of 1 for completing a job before its deadline and a utility of 0 otherwise, then the Earliest Deadline First (EDF) algorithm will maximize the total utility accrued by the system if it is possible to complete all jobs before their deadlines. An important benefit of the UA paradigm is that it allows one to optimize the productivity of the system: the average utility from completed jobs per unit of time.
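To make the connection concrete, the following minimal C sketch contrasts the step-shaped time utility function of a hard real-time job, under which EDF is optimal, with a soft real-time utility that decays as completion time grows. The function names and the linear decay shape are illustrative assumptions, not taken from the referenced work.

/* Hedged illustration: a hard-deadline job yields utility 1 before its
 * deadline and 0 after (the step TUF under which EDF is optimal), while
 * a soft real-time job's utility decays with completion time. The
 * function names and the linear decay shape are illustrative only. */
#include <stdio.h>

double hard_deadline_utility(double completion_time, double deadline) {
    return (completion_time <= deadline) ? 1.0 : 0.0;
}

/* One possible soft real-time TUF: linear decay from 1 to 0 over 'horizon'. */
double soft_utility(double completion_time, double horizon) {
    if (completion_time >= horizon)
        return 0.0;
    return 1.0 - completion_time / horizon;
}

int main(void) {
    printf("hard: %.1f  soft: %.2f\n",
           hard_deadline_utility(4.0, 5.0),   /* 1.0: deadline met          */
           soft_utility(4.0, 10.0));          /* 0.60: utility has decayed  */
    return 0;
}

Under the step function, any schedule that meets all deadlines accrues the same maximal utility, which is exactly the special case described above.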

Each job arriving into the system requires a certain number of CPUs to be executed at the maximum rate but can be executed with fewer CPUs at a slower rate. The final job utility decays as its waiting + execution time increases. In addition to deciding on the order in which the arriving jobs should be scheduled, the scheduler is also allowed to “squeeze” a job into a smaller number of CPUs than it ideally requires, thereby extending its execution time and receiving a smaller final utility. Alternatively, the scheduler can wait until the initially requested number of CPUs becomes available for a particular job in order to ensure the maximum execution rate, while risking a long waiting time for this job and, once again, a low final utility. The scheduler can also suspend some of the currently running jobs so as to schedule some of the waiting jobs, and resume the execution of the suspended jobs at a later time. If some CPUs become available before the “squeezed” job completes its execution, the scheduler assigns more CPUs to that job until it gets all its originally requested CPUs.

Sometimes tasks that need to be processed first end up waiting a long time. Existing schedulers are typically either priority based or preemptive, yet both approaches still have drawbacks. To improve performance, a new scheduler is needed so that processing can be done at a faster rate; this is why a utility based scheduler is proposed.

Scheduling is the process of deciding how to use resources between a variety of possible tasks. Time can be
specified (scheduling a flight to leave at 8:00) or floating as part of a sequence of events.

It is a key concept in computer multitasking, multiprocessing operating system and real-time operating system
designs.

Scheduling refers to the way processes are assigned to run on the available CPUs, since there are typically many
more processes running than there are available CPUs.

Scheduling is an important tool for manufacturing and engineering, where it can have a major impact on the
productivity of a process.

In the fields of databases and transaction processing (transaction management), a schedule (also called a history) of a system is an abstract model used to describe the execution of the transactions running in the system.

1.1 Scheduler:

The scheduler is the part of the kernel which selects a job to run when a context switch occurs.

For most interactive systems, the only input to the scheduler is a list of ready jobs and the things they are waiting for.

However, in real-time systems, the processes may specify some known properties useful to the scheduler:

 Processing time required
 Deadline of the process
 The importance of the process, either as a priority or as the benefit gained if the job completes by its deadline.

Given these properties, the scheduler (or scheduling algorithm) selects a job for the dispatcher to switch to; a minimal sketch of such a job descriptor is given below.
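The sketch below is a minimal, illustrative C rendering of how such per-job properties could be stored and used to pick the next job for the dispatcher. The struct fields and the scoring rule are our own assumptions, not part of any specific kernel.

/* Hedged sketch: a job descriptor holding the properties listed above,
 * plus a naive "pick the most important ready job" selection. Field
 * names and the scoring rule are illustrative assumptions only. */
#include <stdio.h>
#include <stddef.h>

struct job {
    int    id;
    double processing_time;  /* CPU time still required                   */
    double deadline;         /* absolute deadline                         */
    double importance;       /* priority, or benefit if completed on time */
    int    ready;            /* 1 if the job is in the ready list         */
};

/* Return the index of the ready job with the highest importance,
 * or -1 if no job is ready. */
int select_job(const struct job *jobs, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!jobs[i].ready)
            continue;
        if (best < 0 || jobs[i].importance > jobs[best].importance)
            best = (int)i;
    }
    return best;
}

int main(void) {
    struct job jobs[] = {
        { 1, 4.0, 20.0, 2.0, 1 },
        { 2, 1.0, 10.0, 5.0, 1 },
        { 3, 6.0, 30.0, 9.0, 0 },   /* most important, but not ready */
    };
    printf("dispatch job %d\n", jobs[select_job(jobs, 3)].id);  /* job 2 */
    return 0;
}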

Co-scheduling is the concurrent scheduling of the processes of a parallel job on the individual nodes of a time-sharing cluster. Co-scheduling for clusters is a challenging problem because it must reconcile the demands of parallel and local computations, balancing parallel efficiency against local interactive response. Ideally, a co-scheduling system would provide the efficiency of a batch-scheduled system for parallel jobs and of a private timesharing system for interactive users. In reality, the situation is much more complex, as we expect some parallel jobs to be interactive.

1.2 Scheduling Basics:

Processes switch between two states: a process might need the CPU to run, or it might need to wait for an I/O device. A period of time when a process needs the CPU is called a CPU burst, and a period of time when a process needs I/O is called an I/O burst. In a simplified view, each CPU burst requires using the CPU for an amount of time independent of the scheduling (mostly true for batch systems, vastly inaccurate on interactive systems). While we mostly talk about process scheduling, we are actually scheduling the CPU bursts of different processes (actually, threads). Intuitively, a CPU-bound process has a few long CPU bursts, while an I/O-bound process has a lot of short CPU bursts.

The Scheduler is concerned mainly with:

 CPU Utilization - to keep the CPU as busy as possible.


 Throughput - number of processes that complete their execution per time unit.
 Turnaround - total time between submission of a process and its completion.
 Waiting Time - amount of time a process has been waiting in the ready queue.
 Response Time - amount of time it takes from when a request was submitted until the first response is generated.
 Fairness - equal CPU time to each thread.

A small sketch computing turnaround, waiting and response times from per-process timestamps is given below.
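The following self-contained C sketch derives three of the metrics above from simple per-process timestamps. The field names and the numbers are illustrative assumptions.

/* Hedged sketch: computing turnaround, waiting and response time from
 * simple timestamps. Field names and values are illustrative only. */
#include <stdio.h>

struct proc_times {
    double arrival;      /* time the process entered the ready queue   */
    double first_run;    /* time it first received the CPU             */
    double completion;   /* time it finished                           */
    double burst;        /* total CPU time it actually consumed        */
};

int main(void) {
    struct proc_times p = { 0.0, 2.0, 9.0, 5.0 };

    double turnaround = p.completion - p.arrival;       /* 9.0 */
    double waiting    = turnaround - p.burst;           /* 4.0 */
    double response   = p.first_run - p.arrival;        /* 2.0 */

    printf("turnaround=%.1f waiting=%.1f response=%.1f\n",
           turnaround, waiting, response);
    return 0;
}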

1.3 Conflicting goals of Scheduling:

There are multiple goals in scheduling, many of them conflicting:

 Maximize CPU utilization, or throughput. The more work that can be completed per unit time, the better.
 Minimize completion ("turn-around") time, or waiting time. The less time required to complete each job, the better.
 Minimize response time, i.e., the time required for the program to start producing output. This is important only for interactive programs.

In most systems, we want to optimize the average of the above criteria. In interactive systems, it may also be desirable to minimize the variance. In real-time systems, the worst-case behaviors are more important. These goals conflict: for example, to increase CPU utilization we may want to delay jobs so that we always have something to do, which increases completion time.

1.4 Multiple Processor Systems:

Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them. There are many variations on this basic theme, and the definition of multiprocessing can vary with context, mostly as a function of how CPUs are defined (multiple cores on one die, multiple dies in one package, multiple packages in one system unit, etc.).

1.4.1 A shared-memory multiprocessor:

A shared-memory multiprocessor consists of a number of processors accessing one or more shared memory
modules. The processors can be physically connected to the memory modules in a variety of ways, but logically
every processor is connected to every module.

 Oldest and most popular model


 Based on timesharing
 Multiple processes can overlap (share), but ALL threads share a process address space
 Each processor can name every physical location in the machine
 Each process can name all data it shares with other processes
 Data transfer via load and store
 Data size: byte, word, ... or cache blocks.

1.4.2 A message-passing multicomputer:

In a message-passing multicomputer, each memory is local to a single CPU and can be accessed only by that CPU. The machines communicate by sending multiword messages over the interconnect.

1.4.3 A wide area distributed system:

The third model connects complete computer systems over a wide area network, such as the Internet, to form a distributed system. Each of these systems has its own memory, of course, and the systems communicate by message passing.

1.5 Multiprocessor Hardware:

Although all multiprocessors have the property that every CPU can address all of memory, some multiprocessors have the additional property that every memory word can be read as fast as every other memory word. These machines are called UMA (Uniform Memory Access) multiprocessors. In contrast, NUMA (Nonuniform Memory Access) multiprocessors do not have this property.

1.6 Multiprocessor Operating System Types:

1.6.1 Each CPU Has Its Own Operating System:

The simplest possible way to organize a multiprocessor operating system is to statically divide memory into as many
partitions as there are CPUs and give each CPU its own private memory and its own private copy of the operating
system. In effect, the n CPUs then operate as n independent computers. One obvious optimization is to allow all the
CPUs to share the operating system code and make private copies of only the data.

Figure: Partitioning multiprocessor memory among four CPUs, but sharing a single copy of the operating system code.

1.6.2 Master-Slave Multiprocessors:

One copy of the operating system and its tables is present on CPU 1 and not on any of the others. All system calls are redirected to CPU 1 for processing there. CPU 1 may also run user processes if there is CPU time left over. This model is called master-slave since CPU 1 is the master and all the others are slaves.

Figure: A master-slave multiprocessor model.

1.6.3 Symmetric Multiprocessors:

Our third model, the SMP (Symmetric MultiProcessor), eliminates this asymmetry. There is one copy of the operating system in memory, but any CPU can run it. When a system call is made, the CPU on which it was made traps to the kernel and processes the system call.

Figure: The SMP multiprocessor model.

1.7 How to choose a scheduling algorithm:

When designing an operating system, a programmer must consider which scheduling algorithm will perform best for
the use the system is going to see. There is no universal “best” scheduling algorithm, and many operating systems
use extended or combinations of the scheduling algorithms above. For example, Windows NT/XP/Vista uses a
Multilevel feedback queue, a combination of fixed priority preemptive scheduling, round-robin, and first in first out.

In this system, processes can dynamically increase or decrease in priority depending on whether they have already been serviced, or whether they have been waiting extensively. Every priority level is represented by its own queue, with round-robin scheduling amongst the high priority processes and FIFO among the lower ones. In this sense, response time is short for most processes, and short but critical system processes get completed very quickly. Since processes can only use one time unit of the round robin in the highest priority queue, starvation can be a problem for longer high priority processes.

1.8 Operating System Scheduler Implementations:

1.8.1 Windows

Very early MS-DOS and Microsoft Windows systems were non-multitasking, and as such did not feature a scheduler. Windows 3.1x used a non-preemptive scheduler, meaning that it did not interrupt programs. It relied on the program to end or to tell the OS that it did not need the processor, so that the OS could move on to another process. This is usually called cooperative multitasking. Windows 95 introduced a rudimentary preemptive scheduler; however, for legacy support it opted to let 16-bit applications run without preemption.

Windows NT-based operating systems use a multilevel feedback queue. 32 priority levels are defined, 0 through 31, with priorities 0 through 15 being "normal" priorities and priorities 16 through 31 being soft real-time priorities, requiring privileges to assign. Priority 0 is reserved for the operating system. Users can select 5 of these priorities to assign to a running application from the Task Manager application, or through thread management APIs. The kernel may change the priority level of a thread depending on its I/O and CPU usage and whether it is interactive (i.e., accepts and responds to input from humans), raising the priority of interactive and I/O-bound processes and lowering that of CPU-bound processes, to increase the responsiveness of interactive applications. The scheduler was modified in Windows Vista to use the cycle counter register of modern processors to keep track of exactly how many CPU cycles a thread has executed, rather than just using an interval-timer interrupt routine. Vista also uses a priority scheduler for the I/O queue so that disk defragmenters and other such programs do not interfere with foreground operations.

1.8.2 Mac OS

Mac OS 9 uses cooperative scheduling for threads, where one process controls multiple cooperative threads, and also provides preemptive scheduling for MP tasks. The kernel schedules MP tasks using a preemptive scheduling algorithm. All Process Manager processes run within a special MP task, called the "blue task". Those processes are scheduled cooperatively, using a round-robin scheduling algorithm; a process yields control of the processor to another process by explicitly calling a blocking function such as WaitNextEvent. Each process has its own copy of the Thread Manager that schedules that process's threads cooperatively; a thread yields control of the processor to another thread by calling YieldToAnyThread or YieldToThread.

Mac OS X uses a multilevel feedback queue, with four priority bands for threads: normal, system high priority, kernel mode only, and real time. Threads are scheduled preemptively; Mac OS X also supports cooperatively scheduled threads in its implementation of the Thread Manager in Carbon.

1.8.3 Linux

From version 2.5 of the kernel to version 2.6, Linux used a multilevel feedback queue with priority levels ranging
from 0-140. 0-99 are reserved for real-time tasks and 100-140 are considered nice task levels. For real-time tasks,
the time quantum for switching processes is approximately 200 ms, and for nice tasks approximately 10 ms. The
scheduler will run through the queue of all ready processes, letting the highest priority processes go first and run
through their time slices, after which they will be placed in an expired queue. When the active queue is empty the
expired queue will become the active queue and vice versa. From versions 2.6 to 2.6.23, the kernel used an O(1)
scheduler. In version 2.6.23, they replaced this method with the Completely Fair Scheduler that uses red-black trees
instead of queues.

1.8.4 FreeBSD

FreeBSD uses a multilevel feedback queue with priorities ranging from 0 to 255. Priorities 0-63 are reserved for interrupts, 64-127 for the top half of the kernel, 128-159 for real-time user threads, 160-223 for time-shared user threads, and 224-255 for idle user threads. Also, like Linux, it uses the active queue setup, but it also has an idle queue.

1.8.5 NetBSD

NetBSD uses a multilevel feedback queue with priorities ranging from 0 to 223. Priorities 0-63 are reserved for time-shared threads (default, SCHED_OTHER policy), 64-95 for user threads that have entered kernel space, 96-127 for kernel threads, 128-191 for user real-time threads (SCHED_FIFO and SCHED_RR policies), and 192-223 for software interrupts.

1.8.6 Solaris

Solaris uses a multilevel feedback queue with priorities ranging from 0-169. 0-59 are reserved for time-shared
threads, 60-99 for system threads, 100-159 for real-time threads, and 160-169 for low priority interrupts. Unlike
Linux, when a process is done using its time quantum, it's given a new priority and put back in the queue.

Chapter – 2

2. Related Work

The following sections summarize the most important points from the related work on utility based scheduling:

2.1 Existing System:

In earlier works, every CPU in a multiprocessor environment has its own algorithm by which it schedules tasks, such as FCFS or EDF. The problem with this is that if a new job arrives which has a higher priority, it still has to wait until the already scheduled jobs complete. Although utility based scheduling is playing an increasingly important role in modern computing systems, many important tasks have to wait for a processor to become free from an already assigned task; if an extremely important task arrives, it has to wait for the other tasks to be completed. Though a few important scheduling techniques are discussed later in this chapter, currently there are not many utility based schedulers in the market.

2.2 What is Required:

Users seek a scheduler which processes tasks according to the usefulness of the work. Suppose a user has assigned four or five tasks to be processed and in between gets a very important task to process; he or she should be able to get that task done before all those running tasks, with maximum utilization of the scheduler.

2.3 End Product:

A scheduler that works according to the utility of the user's tasks. It can be built by using machine learning techniques in a multiprocessor environment, between the local storage module and the CPU.

2.4 Technology Involved:

The technology used would be the C/C++ language, so as to make changes in the Linux scheduling code, which is open source.

2.5 Process:

Different agents would be different modules of code performing specialized functions (processes). The agent with the local storage and the one with the CPU module resources will analyze each other, where the CPU module and the local storage module are self-managing and self-optimizing.

2.6 Suggested Testing:

Testing would be done in a real environment with a few CPUs in our lab by assigning tasks to them. Then we will analyse execution time, waiting time and period to judge the final utilization. We will use a tool named LinSched for the testing in a Linux environment.

2.7 Limitations:

When a high priority task arrives, it preempts the running task and takes the processor to get executed first. So in this case, the tasks that came earlier have to wait a little longer.

2.8 Usage and Benefits:

The usage of the utility based algorithm is to maximize the utilization of processors. Benefits include fast processing, higher utilization and increased performance.

2.9 Existing Scheduling Techniques:

2.9.1 First Come First Serve (FCFS):


By far the simplest strategy, and non-preemptive.
• When processes are created or are woken up (i.e., when a CPU burst starts), they are put at the end of the ready queue.
• Whenever the scheduling algorithm is invoked, the process at the head of the ready queue is selected.
• The process runs until it completes or needs to wait (i.e., the CPU burst ends).
Good: extremely simple and easy to implement.
Bad: the algorithm is not good for most situations, as it usually results in long completion times and response times.
Worst: if one process refuses to release the CPU, other processes make no progress. A minimal C sketch of this strategy is given below.
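Below is a minimal, illustrative C sketch of an FCFS ready queue; the fixed capacity and function names are our own assumptions, not any kernel's implementation.

/* Hedged sketch of FCFS: a FIFO ready queue of process ids. The fixed
 * capacity and the function names are illustrative assumptions. */
#include <stdio.h>

#define QCAP 64

static int ready[QCAP];
static int head = 0, tail = 0;   /* valid while fewer than QCAP are queued */

/* Called when a process is created or woken up: append at the tail. */
void fcfs_enqueue(int pid) {
    ready[tail % QCAP] = pid;
    tail++;
}

/* Called when the scheduler is invoked: take the process at the head,
 * which then runs until it completes or blocks. Returns -1 if empty. */
int fcfs_pick(void) {
    if (head == tail)
        return -1;
    int pid = ready[head % QCAP];
    head++;
    return pid;
}

int main(void) {
    fcfs_enqueue(10); fcfs_enqueue(11); fcfs_enqueue(12);
    for (int i = 0; i < 3; i++)
        printf("picked %d\n", fcfs_pick());   /* 10, then 11, then 12 */
    return 0;
}

The head of the queue is always the process that has waited longest, which is exactly why a process that never yields blocks everyone behind it.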

2.9.2 Round Robin (RR):

If we add preemption to FCFS, we get Round Robin. This is the algorithm that most real-world algorithms are based on.
• When processes are created or are woken up, they are put at the end of the ready queue.
• Whenever the scheduling algorithm is invoked, the process at the head of the ready queue is selected.
• The process runs until it completes, needs to wait, or uses up its time slice.

The algorithm is better than FCFS in average waiting time. The length of the time slice determines the responsiveness and the scheduling overhead incurred by an RR algorithm. A sketch extending the FCFS idea with a time slice is shown below.
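The following self-contained C sketch simulates a few timer ticks of Round Robin over a small circular ready queue; the quantum length and all names are illustrative assumptions.

/* Hedged sketch of Round Robin: a circular ready queue plus a time-slice
 * counter decremented on each timer tick. The quantum value and the
 * names are illustrative assumptions, not any real kernel's policy. */
#include <stdio.h>

#define QCAP    8
#define QUANTUM 3                       /* ticks per time slice (assumed) */

static int q[QCAP], head = 0, count = 0;

static void enqueue(int pid) { q[(head + count) % QCAP] = pid; count++; }

static int dequeue(void) {
    if (count == 0) return -1;
    int pid = q[head % QCAP];
    head++; count--;
    return pid;
}

int main(void) {
    int current = -1, slice = 0;
    enqueue(1); enqueue(2);

    for (int tick = 0; tick < 9; tick++) {      /* simulate 9 timer ticks */
        if (current < 0 || --slice == 0) {      /* idle, or slice used up */
            if (current >= 0)
                enqueue(current);               /* preempt: back of queue */
            current = dequeue();
            slice = QUANTUM;
        }
        printf("tick %d: running pid %d\n", tick, current);
    }
    return 0;
}

A shorter QUANTUM improves responsiveness but increases the number of context switches, which is the tradeoff noted above.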

2.9.3 Shortest Job First (SJF):

Another interesting way to determine priority at run time is to use the processing time. Jobs with less processing time have higher priority and are thus serviced before jobs with more processing time. This is called SJF, and it can be preemptive or non-preemptive just like priority scheduling. An interesting property of preemptive SJF, or Shortest Remaining Processing Time (SRPT), is that it minimizes average turnaround time.
• By completely working on a shorter job before a longer one, the shorter job is completed earlier, and the longer job may need more time to complete.
• But the amount of delay in the long job is never more than the reduction in turnaround time for the short job. A small worked comparison against FCFS ordering is shown below.
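A small worked comparison, under assumed burst lengths of 8 and 2 time units arriving together, illustrates the claim.

/* Hedged worked example: two jobs with CPU bursts of 8 and 2 time units,
 * both arriving at time 0. Running the shorter one first lowers the
 * average turnaround time, as claimed above. Numbers are illustrative. */
#include <stdio.h>

int main(void) {
    /* FCFS order (long job first): completions at 8 and 8+2=10 */
    double fcfs_avg = (8.0 + 10.0) / 2.0;              /* 9.0 */

    /* SJF order (short job first): completions at 2 and 2+8=10 */
    double sjf_avg  = (2.0 + 10.0) / 2.0;              /* 6.0 */

    printf("average turnaround: FCFS order %.1f, SJF order %.1f\n",
           fcfs_avg, sjf_avg);
    return 0;
}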

2.9.4 Priority Scheduling:

A different approach considers processes to have different priorities. These priorities are specified when the processes are created.
• The ready queue is implemented as a priority queue.
• Every time the scheduling algorithm is invoked, the ready job with the highest priority is selected.
• A preemptive scheme would allow a high priority process to preempt a low priority process when it is released or is woken up.
• Priority scheduling involves no timer interrupt. While the FCFS and RR schemes advertise fairness, a priority scheme is designed to be unfair.

2.9.5 Linux scheduling algorithm:

Real-world scheduling algorithms are usually a hybrid of many algorithms. For example, the Linux uniprocessor scheduler:
• Processes are either real-time or conventional. Most processes are conventional, while real-time processes have absolute priority over them.
• Real-time processes are divided into 99 priority levels; higher priority jobs are always serviced before lower priority ones.
Processes may select the FIFO or RR algorithm; the only difference is that the scheduler will not preempt a FIFO process on a timer interrupt.

• Conventional processes use a scheme that is similar in spirit to the RR algorithm, but is implemented in a fairer way, which also favours I/O-bound processes.

2.9.6 Medium and long term schedulers:

Interactive systems usually allow every process to be admitted. This is not true for batch or real-time systems. In these systems, a long term scheduler decides whether processes are admitted, and a medium term scheduler may decide to put a process aside by completely swapping it out of memory.

2.10 Why quantum can be (quite) long in Linux:

• For a conventional RR algorithm, a long quantum means long response time.


• However, response time is really important only for interactive processes, which are mostly I/O-bound (since humans are slow).
• In Linux, those I/O-bound processes tend not to use up the time quantum allocated to them, and the unused portion is carried forward to the next epoch.
• This larger time quantum gives these processes higher goodness values. Once they become ready again (e.g., the user types a key), they can typically preempt other CPU-bound jobs immediately. The user won't experience any delay.
• Thus Linux can use a larger quantum (10 ms) to reduce the number of context switches, while maintaining a low response time.

2.11 Starvation and Process Aging:

Since priority schemes are not fair, it is possible that a low priority process never gets serviced. The process is said to be starved of the CPU.
• One solution is to gradually increase the priority of a job when it has not been serviced for a long time: aging. For example, one may raise a process's priority for every 15 minutes it waits without being serviced.
• This results in a more complicated algorithm, since the priority queue must support changing priorities. It also slightly increases the scheduling overhead, because the priority of each process must be changed periodically.
• The algorithm is more responsive for the low priority jobs.
• It is possible to add the idea of priority changing and process aging to the RR algorithm to combat its deficiency. A small sketch of aging is given below, followed by a comparison of the basic algorithms.
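A minimal C sketch of aging follows; the boost rate and field names are illustrative assumptions, not a specific OS policy. The effective priority of a long-waiting low priority process eventually overtakes a freshly arrived high priority one.

/* Hedged sketch of aging: each waiting process's effective priority is
 * boosted in proportion to how long it has waited, so low priority
 * processes are eventually selected. The boost rate is an illustrative
 * assumption. Higher numbers mean higher priority here. */
#include <stdio.h>

struct waiting_proc {
    int    base_priority;   /* static priority (higher = more important) */
    double wait_time;       /* time spent waiting in the ready queue     */
};

/* One priority point gained per 15 time units of waiting (assumed). */
double effective_priority(const struct waiting_proc *p) {
    return p->base_priority + p->wait_time / 15.0;
}

int main(void) {
    struct waiting_proc low  = { 1, 90.0 };  /* long-waiting, low priority */
    struct waiting_proc high = { 5,  0.0 };  /* freshly arrived, high prio */
    printf("low: %.1f high: %.1f\n",
           effective_priority(&low), effective_priority(&high));
    /* low now outranks high (7.0 vs 5.0), so it will not starve */
    return 0;
}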

Scheduling algorithm        CPU Utilization   Throughput   Turnaround time   Response Time   Deadline handling   Starvation free
First In First Out          Low               Low          High              Low             No                  Yes
Shortest Job First          Medium            High         Medium            Medium          No                  No
Priority Based Scheduling   Medium            Low          High              High            Yes                 No
Round Robin Scheduling      High              Medium       Medium            High            No                  Yes

2.12 Clustering Algorithm:

Clustering, or processor assignment, involves collecting tasks that exchange a large amount of data onto the same processors, while at the same time distributing the tasks in order to achieve good load balancing. Heuristics have been suggested for clustering; the dominant sequence clustering (DSC) algorithm is one part of a multi-part graph scheduling system. The algorithm clusters the dominant sequence of a directed acyclic graph (DAG), which is the critical path of a DAG whose nodes have been allocated to processors. The critical path of a DAG is the path connecting the programs which dominate the execution time of the DAG. If the programs on the critical path are not optimally scheduled, the resulting execution time for the job will be non-minimal. Once the dominant sequence of the DAG has been clustered, the remainder of the DAG is placed so as to minimize total execution time.

Algorithm:

First we describe the clustering algorithm, which breaks a given DAG of tasks into a set of m disjoint clusters depending on the number of processors in the system. The idea of this two-phase algorithm is to decouple the clustering algorithm from the scheduling algorithm, which actually maps the clusters to physical processors. Then we give the scheduling algorithm, which assigns the clusters to the processors with optimal execution time. Then we analyze the algorithm with the help of an example and with respect to complexity.
Clustering algorithm:
Step 1: Start with the task with the highest index in the DAG.
Step 2: Go to the node with the highest dependency from that node.
Step 3: Continue in this way until the bottom of the DAG is reached.
Step 4: Eliminate the set of vertices occurring on the selected path from the original set, V = V - V'. We get the cluster V', which is a subset of the set of vertices V.
Step 5: Modify the DAG and keep dividing it into further clusters until the number of clusters equals the number of physical processors or the tasks are exhausted. At the end of this algorithm we might encounter a few tasks which do not have any precedence relation and can be executed at any point of time. We keep all such tasks in a separate data structure called the Independent Task Table. Now that we have the clusters, we will map the clusters independently onto different physical processors. A sketch of the clustering loop is given below.
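The loop below is a hedged C rendering of Steps 1-5 on a small example DAG. The adjacency matrix, the edge weights, and the interpretation of "highest dependency" as the heaviest outgoing edge are illustrative assumptions.

/* Hedged sketch of the clustering steps above on a small DAG stored as a
 * weighted adjacency matrix (weight = data dependency between tasks).
 * Starting from the unclustered task with the highest index, we repeatedly
 * follow the heaviest edge to an unclustered successor until none exists,
 * and mark the visited path as one cluster. All values are illustrative. */
#include <stdio.h>

#define N 6   /* number of tasks */
#define M 3   /* number of processors / clusters wanted */

/* dep[i][j] > 0 means task i feeds task j, with that dependency weight. */
static const int dep[N][N] = {
    /* task 0 */ {0, 0, 0, 0, 0, 0},
    /* task 1 */ {2, 0, 0, 0, 0, 0},
    /* task 2 */ {6, 0, 0, 0, 0, 0},
    /* task 3 */ {0, 3, 1, 0, 0, 0},
    /* task 4 */ {0, 0, 5, 0, 0, 0},
    /* task 5 */ {0, 0, 0, 2, 4, 0},
};

int main(void) {
    int cluster[N];                      /* -1 = not yet assigned */
    for (int i = 0; i < N; i++) cluster[i] = -1;

    for (int c = 0; c < M; c++) {
        int cur = -1;                    /* Step 1: highest-index free task */
        for (int i = N - 1; i >= 0; i--)
            if (cluster[i] < 0) { cur = i; break; }
        if (cur < 0) break;              /* tasks exhausted */

        while (cur >= 0) {               /* Steps 2-4: follow heaviest edge */
            cluster[cur] = c;
            int next = -1, best = 0;
            for (int j = 0; j < N; j++)
                if (cluster[j] < 0 && dep[cur][j] > best) {
                    best = dep[cur][j];
                    next = j;
                }
            cur = next;                  /* stop when no unassigned successor */
        }
    }

    for (int i = 0; i < N; i++)
        printf("task %d -> cluster %d\n", i, cluster[i]);
    return 0;
}

Any task still unassigned once the clusters are full would go into the Independent Task Table described above.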

2.13 Multiprocessing:

Multiprocessing is a type of processing in which two or more processors work together to process more than one program simultaneously. It allows the system to do more work in a shorter period of time. Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them. A multiprocessor system is also known as a parallel system or a tightly coupled system, meaning that multiple processors are tied together in some manner. Generally, the processors are in close communication with each other. They share common data structures and a common system clock.
In a master-slave multiprocessor system there is one master processor and the others are slaves. If one slave processor fails, the master can assign its task to another slave processor; but if the master fails, the entire system fails. The central part of such a multiprocessor is the master. All of the processors share the hard disk, memory and other devices.

Advantages of Multiprocessor Systems:

• Reduced Cost: Multiple processors share the same resources. A separate power supply or motherboard for each chip is not required. This reduces the cost.

• Increased Reliability: The reliability of the system is also increased. The failure of one processor does not affect the other processors, though it will slow down the machine. Several mechanisms are required to achieve increased reliability. If a processor fails, a job running on that processor also fails. The system must be able to reschedule the failed job or to alert the user that the job was not successfully completed.

• Increased Throughput: An increase in the number of processors completes the work in less time. It is important to note that doubling the number of processors does not halve the time to complete a job. This is due to the overhead in communication between processors, contention for shared resources, etc.

Basic Multiprocessor Scheduling:

Given a set of runnable threads, and a set of CPUs, assign threads to CPUs
• Same considerations as Uniprocessor scheduling
• Fairness, efficiency, throughput, response time etc.
• But also new considerations
• Ready queue implementation
• Load balancing
• Processor affinity

Ready Queue Implementation:

Scheduling events occur per CPU


• Local timer interrupt.
• Currently-executing thread blocks or yields.
• Scheduler code executing on any CPU simply accesses shared queue.
• Synchronization is needed.

2.14 Load Balancing:

Try to keep run queue sizes balanced across system


• Main goal – CPU should not idle while other CPUs have waiting threads in their queues.
• Secondary – scheduling overhead may scale with size of run queue.
• Keep this overhead roughly the same for all CPUs.

2.15 Machine Learning:

Learning denotes a change in a system that enables the system to do the same task more efficiently the next time. Learning is an important feature of intelligence.

Major paradigms of machine learning:


 Rote learning – One-to-one mapping from inputs to stored representation. “Learning by memorization.”
Association-based storage and retrieval.
 Induction – Use specific examples to reach general conclusions
 Clustering – Unsupervised identification of natural groups in data
 Analogy – Determine correspondence between two different representations
 Discovery – Unsupervised, specific goal not given
 Genetic algorithms – “Evolutionary” search techniques, based on an analogy to “survival of the fittest”
 Reinforcement – Feedback (positive or negative reward) given at the end of a sequence of steps

2.15.1 Reinforcement Learning:

Reinforcement learning takes place in an environment where the agent cannot directly compare the results of
its action to a desired result. Instead, it is given some reward or punishment that relates to its actions. It may win
or lose a game, or be told it has made a good move or a poor one. The job of reinforcement learning is to find a
successful function using these rewards.

Reinforcement Learning refers to a class of problems in machine learning which postulate an agent exploring an environment. The agent perceives its current state and takes actions. The environment, in return, provides a reward, positive or negative. The method attempts to find a policy for maximizing the cumulative reward for the agent over the course of the problem.

Intelligent Agent:
An agent is an entity that is capable of perceiving and acting. In computer science, an agent is a software agent.
Percept: the agent's perceptual inputs.
Agent function: describes the agent's behavior.
Agent program: implements the agent's function.
A learning agent consists of four main components: the learning element, the performance element, the critic and the problem generator.

2.15.2 Components of a Learning System:

Performance Element: The Performance Element is the agent itself that acts in the world. It takes in percepts and
decides on external actions.

Learning Element: It is responsible for making improvements; it takes knowledge about the performance element and some feedback, and determines how to modify the performance element.

Critic: Tells the Learning Element how well the agent is doing (success or failure) by comparing its behaviour with a fixed standard of performance.

Problem Generator: Suggests problems or actions that will generate new examples or experiences that will aid in
training the system further.

Reward Function:
 The function defines the goal in a reinforcement learning problem.
 Maps each perceived state (or state-action pair) of the environment to a single number, indicating intrinsic
desirability of that state.
 Indicates what is good in an immediate sense.
 Tells what the good and bad events are for the agent; for example, a biological system might identify high reward with pleasure and low reward with pain.

Maximize Reward:
The agent's goal is to maximize the reward it receives in the long run. The agent-environment interaction is generally a sequence of episodes. If the interaction breaks into a sequence of separate episodes, the problem is known as an episodic task. If the interaction does not break into identifiable episodes but continues without limit, it is called a continuing task. A small sketch of a value-function update is given below.
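The sketch below shows one common way a value function can be tuned from observed rewards: a temporal-difference style update over a tiny tabular state space. The learning rate, discount factor, and state encoding are illustrative assumptions, not the exact method used in this project.

/* Hedged sketch: a temporal-difference style update of a tabular value
 * function V(s) from an observed transition (s, reward, s'). The learning
 * rate, discount factor and tiny state space are illustrative assumptions,
 * not the exact method used in this project. */
#include <stdio.h>

#define NSTATES 4

static double V[NSTATES];        /* estimated long-run value of each state */
static const double ALPHA = 0.1; /* learning rate   (assumed) */
static const double GAMMA = 0.9; /* discount factor (assumed) */

/* After acting in state s, receiving 'reward' and landing in state s2,
 * move V[s] toward the bootstrapped target reward + GAMMA * V[s2]. */
void td_update(int s, double reward, int s2) {
    double target = reward + GAMMA * V[s2];
    V[s] += ALPHA * (target - V[s]);
}

int main(void) {
    /* A few illustrative transitions: state 0 -> 1 -> 2 with rewards. */
    td_update(0, 1.0, 1);
    td_update(1, 0.0, 2);
    td_update(0, 1.0, 1);
    for (int s = 0; s < NSTATES; s++)
        printf("V[%d] = %.3f\n", s, V[s]);
    return 0;
}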

2.16 Fuzzy logic

Fuzzy logic is a form of multi-valued logic derived from fuzzy set theory to deal with reasoning that is approximate rather than precise. In contrast with "crisp logic", where binary sets have binary logic, fuzzy logic variables may have a truth value that ranges between 0 and 1 and is not constrained to the two truth values of classic propositional logic. Furthermore, when linguistic variables are used, these degrees may be managed by specific functions. Fuzzy logic emerged as a consequence of the 1965 proposal of fuzzy set theory by Lotfi Zadeh. Though fuzzy logic has been applied to many fields, from control theory to artificial intelligence, it remains controversial among most statisticians, who prefer Bayesian logic, and some control engineers, who prefer traditional two-valued logic.

A basic application might characterize subranges of a continuous variable. For instance, a temperature measurement
for anti-lock brakes might have several separate membership functions defining particular temperature ranges
needed to control the brakes properly. Each function maps the same temperature value to a truth value in the 0 to 1
range. These truth values can then be used to determine how the brakes should be controlled.

Figure: Fuzzy logic temperature membership functions (cold, warm, hot).

In this image, the meaning of the expressions cold, warm, and hot is represented by functions mapping a
temperature scale. A point on that scale has three "truth values"—one for each of the three functions. The vertical
line in the image represents a particular temperature that the three arrows (truth values) gauge. Since the red arrow
points to zero, this temperature may be interpreted as "not hot". The orange arrow (pointing at 0.2) may describe it
as "slightly warm" and the blue arrow (pointing at 0.8) "fairly cold".

Example:

Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in applying this is that the appropriate fuzzy
operator may not be known. For this reason, fuzzy logic usually uses IF-THEN rules, or constructs that are
equivalent, such as fuzzy associative matrices.

Rules are usually expressed in the form:


IF variable IS property THEN action

For example, a simple temperature regulator that uses a fan might look like this:

IF temperature IS very cold THEN stop fan


IF temperature IS cold THEN turn down fan
IF temperature IS normal THEN maintain level
IF temperature IS hot THEN speed up fan

There is no "ELSE" – all of the rules are evaluated, because the temperature might be "cold" and "normal" at the
same time to different degrees.

The AND, OR, and NOT operators of boolean logic exist in fuzzy logic, usually defined as the minimum,
maximum, and complement; when they are defined this way, they are called the Zadeh operators. So for the fuzzy
variables x and y:

NOT x = (1 - truth(x))
x AND y = minimum(truth(x), truth(y))
x OR y = maximum(truth(x), truth(y))

There are also other operators, more linguistic in nature, called hedges that can be applied. These are generally
adverbs such as "very", or "somewhat", which modify the meaning of a set using a mathematical formula.
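To tie the pieces together, the following C sketch evaluates triangular membership functions and the Zadeh operators defined above. The membership breakpoints and the temperature value are illustrative assumptions, not a standard controller design.

/* Hedged sketch: triangular membership functions for "cold" / "normal" /
 * "hot" and the Zadeh operators above, used to see which fan rules fire.
 * Breakpoints, the input value, and names are illustrative assumptions. */
#include <stdio.h>

static double f_not(double x)           { return 1.0 - x; }
static double f_and(double x, double y) { return x < y ? x : y; }  /* min */
static double f_or(double x, double y)  { return x > y ? x : y; }  /* max */

/* Triangular membership: rises from a to b, falls from b to c. */
static double tri(double t, double a, double b, double c) {
    if (t <= a || t >= c) return 0.0;
    return t < b ? (t - a) / (b - a) : (c - t) / (c - b);
}

int main(void) {
    double t = 22.0;                       /* temperature in degrees C */
    double cold   = tri(t, -10.0,  5.0, 20.0);
    double normal = tri(t,  15.0, 22.0, 30.0);
    double hot    = tri(t,  25.0, 40.0, 60.0);

    /* "IF temperature IS hot THEN speed up fan" fires to degree 'hot';
       several rules can fire at once, each to a different degree. */
    printf("cold=%.2f normal=%.2f hot=%.2f\n", cold, normal, hot);
    printf("cold OR normal = %.2f, NOT hot = %.2f, cold AND normal = %.2f\n",
           f_or(cold, normal), f_not(hot), f_and(cold, normal));
    return 0;
}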

2.17 Memory Architecture:

The processor speed has gone up manyfold over the years, but memory speed, though increasing, has not kept pace. This is also due to the fact that the higher complexity of applications demands larger memory sizes. Because of this speed mismatch, memory has become the bottleneck. One can notice that the development of techniques such as caching, paging, etc. is a result of the processor-memory speed mismatch. The problem becomes even more complicated in embedded systems because of real-time processing requirements. There are a number of factors which need to be considered in the design of a memory architecture:

 Word size, line size and number of ports.

 Interleaved/non-interleaved memory and inter-leaving factor.

 Synchronous/Asynchronous memory.

2.18 Application Specific Memory Design:

Designing such an architecture is a complex optimization problem. A designer has to decide on the following
aspects manually or using CAD tools in order to realize an application specific Multiprocessor.

2.19 Architectural Resource Allocation:

The architecture is composed of multiple processors connected to the memories. These processors could be of the same type (homogeneous multiprocessor) or of different types (heterogeneous multiprocessor). Architectural resource allocation is nothing but deciding the number and types of processors to be instantiated in the architecture. This phase is driven by the application's computation requirements. Apart from processor allocation, decisions on memories are also taken.

Interconnection Network (IN) Design: Once the processors and memories have been decided, they need to be connected through an interconnection network. The IN could be as simple as a shared bus or a cross-bar switch, or some complex IN such as a mesh. This design is driven by the communication requirements of the application.

Application Mapping onto the Architecture: Mapping various parts of the application onto the
architecture is another important design decision. Essentially, in this step, computation needs of the application are
mapped to the processors and communication requirements to the communication resources such as buses,
memories etc. This phase is again guided by the application’s requirements and performance constraints.

Scheduling: There are two possible ways in which the application can be executed. One possibility is to statically schedule the application onto the architecture such that constraints are met. Static scheduling is mostly pessimistic and uses the worst-case behavior of the application in order to provide performance guarantees. The other possibility is to use a dynamic scheduler which takes scheduling decisions at run time. This scheme does not rely on worst-case behavior, and decisions are taken dynamically.

2.20 Multiprocessor vs Multicore :

A multi-core processor is composed of two or more independent cores. One can describe it as an integrated circuit which has two or more individual processors (called cores in this sense). Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor or CMP), or onto multiple dies in a single chip package. A many-core processor is one in which the number of cores is large enough that traditional multiprocessor techniques are no longer efficient; this threshold is somewhere in the range of several tens of cores and probably requires a network on chip. (The threshold would be reached when the operating system has more processor cores than the average number of simultaneously running processes, because beyond that point, to use all cores efficiently, an operating system must be able to divide a single process across multiple cores.)

Advantages :

The proximity of multiple CPU cores on the same die allows the cache coherency circuitry to operate at a much
higher clock-rate than is possible if the signals have to travel off-chip. Combining equivalent CPUs on a single die
significantly improves the performance of cache snoop (alternative: Bus snooping) operations. Put simply, this
means that signals between different CPUs travel shorter distances, and therefore those signals degrade less. These
higher-quality signals allow more data to be sent in a given time period, since individual signals can be shorter and
do not need to be repeated as often.

The largest boost in performance will likely be noticed in improved response-time while running CPU-intensive
processes, like antivirus scans, ripping/burning media (requiring file conversion), or searching for folders. For
example, if the automatic virus-scan runs while a movie is being watched, the application running the movie is far
less likely to be starved of processor power, as the antivirus program will be assigned to a different processor core
than the one running the movie playback.

Assuming that the die can fit into the package, physically, the multi-core CPU designs require much less printed
circuit board (PCB) space than do multi-chip SMP designs. Also, a dual-core processor uses slightly less power than
two coupled single-core processors, principally because of the decreased power required to drive signals external to
the chip. Furthermore, the cores share some circuitry, like the L2 cache and the interface to the front side bus (FSB).
In terms of competing technologies for the available silicon die area, multi-core design can make use of proven CPU core library designs and produce a product with a lower risk of design error than devising a new, wider core design. Also, adding more cache suffers from diminishing returns.

Disadvantages

Maximizing the utilization of the computing resources provided by multi-core processors requires adjustments both
to the operating system (OS) support and to existing application software. Also, the ability of multi-core processors
to increase application performance depends on the use of multiple threads within applications. The situation is
improving: for example the Valve Corporation's Source engine offers multi-core support, and Crytek has developed
similar technologies for CryEngine 2, which powers their game, Crysis. Emergent Game Technologies' Gamebryo
engine includes their Floodgate technology which simplifies multicore development across game platforms. In
addition, Apple Inc.'s latest OS, Mac OS X Snow Leopard has a built-in multi-core facility called Grand Central
Dispatch for Intel CPUs.

Integration of a multi-core chip drives chip production yields down, and such chips are more difficult to manage thermally than lower-density single-chip designs. Intel has partially countered this first problem by creating its quad-core designs by combining two dual-core dies in a single package with a unified cache, hence any two working dual-core dies can be used, as opposed to producing four cores on a single die and requiring all four to work to produce a quad-core. From an architectural point of view, ultimately, single CPU designs may make better use of the silicon surface area than multiprocessing cores, so a development commitment to this architecture may carry the risk of obsolescence. Finally, raw processing power is not the only constraint on system performance. If two processing cores share the same system bus and memory bandwidth, the real-world performance advantage is limited. If a single core is close to being memory-bandwidth limited, going to dual-core might only give a 30% to 70% improvement. If memory bandwidth is not a problem, a 90% improvement can be expected. It would even be possible for an application that used two CPUs to end up running faster on one dual-core chip if communication between the CPUs was the limiting factor, which would count as more than a 100% improvement.

2.21 Preliminary Work Done:

We have taken Introduction to Computer Programming, Introduction to Computer Programming Lab, Algorithms, Operating Systems, Operating Systems Lab, and System and Network Programming Lab. We have gained knowledge which is crucial for completing this project successfully. The knowledge gained through the study and practicals of these subjects will help us implement our project.

2.22 Resources Available:
In terms of input we need:

Hardware Systems: We need any PC (a computer) whose configuration could be anything close to

 RAM: 1 GB
 Hard Disk: 40 GB and above
 Processor: Intel Pentium 3 or above or AMD processor
 Peripheral devices: Monitor, Keyboard, Mouse

Software Tools: C compiler and Linux OS

All the required resources are available in order to implement our project, i.e., an open source scheduler. The source language would be C, and knowledge of operating systems and the C/C++ programming language is also required. We have the required OS and compiler installed on our computers so that we can go ahead with the project. For information about the latter, we have various books, articles and research papers with us, as well as the faculty members and our project guide.

Resource Availability Index (RAI):

As per our knowledge, we can rank the resources available as 0.80 on the scale of 0 to 1.

CONFIDENCE FACTOR :

0<=CF<=1

This factor accounts for how much we are able to exploit the available resources. In our case, since the resources are available, their exploitation depends on the knowledge base we have about the project, so that we can go ahead and implement it. For the latter, we have done, and are continually doing, as much research on the concepts involved as possible, and we have already gone through all the basic concepts. The different parameters used to decide the confidence factor are:

 Available resources

We have all the hardware and software resources to develop this project.

CF1 = 0.8

 Knowledge of the Subject:

To develop the system, it is required that we have knowledge of the basics of operating systems, artificial intelligence and the C programming language. On the basis of our knowledge of these subjects we assume,
CF2 = 0.7

 Time Management:

The time available to accomplish the project is sufficient, so all the phases can be completed in the expected time slots. We have 9 months to complete our project and, as of now, it is expected that we will successfully do it.

CF3 = 0.8

Thus, to calculate the confidence factor (CF):

CF = avg(0.8, 0.7, 0.8) ≈ 0.77

Therefore, the confidence factor, until now, can be considered to be 0.77 on a scale of 0 to 1.

CAPABILITY INDEX (CI):

This denotes the ability required to synergize the resources available. Since we are now equipped with all the basic concepts and tools involved, we believe that synergizing the requirements into a basic design, and turning that design into good code, will not be difficult for us. Therefore, according to us, our capability index at this point of time can be ranked at about 0.6 on a scale of 0 to 1.

CI = RAI * CF = 0.80 * 0.77 ≈ 0.61

2.23 Scope of Work:

Today, enterprise computing relies increasingly on computer networks. Network environments are growing, fueled by extensive enhancements of computer hardware and software as well as the rapid growth of the Internet and World Wide Web (WWW, or the Web). More systems are being connected to networks, rapidly increasing the traffic on a server. Such growth impacts the performance of many servers in how they handle tasks, as they receive many requests at the same time. These conditions gave birth to various problems and solutions in scheduling techniques. We propose to implement a scheduler which will increase computing performance by making changes to the code of the Linux scheduler. This scheduler will increase the utilization of the CPU, meaning it can execute tasks at a faster rate; the waiting time will decrease and the execution rate will increase. Finally, we will arrive at a better scheduling technique.

2.24 Requirement Analysis:

This section analyses each functional requirement listed above in the same order. It gives the reason why each of the
requirements has been considered.

 It should be able to execute tasks according to their priority and arrival order, and to maximize CPU
utilization.
o One can never be sure what type of task will arrive, i.e. high priority or low priority. The scheduler
has to identify the type and then act accordingly.

 Scheduling of equal-priority tasks must be done according to the time at which each task arrived.
o This is very important, as the person who requested first must get the service before the one who
requested later.

 The scheduler must allocate tasks to every CPU. No CPU must remain idle if there is any service request.
o The scheduler must act so as to maximize the utilization of the connected CPUs. If there are tasks
to be processed, they must be assigned accordingly.

 An open-source scheduler is needed to perform this task.

o We can only do this with an open-source scheduler, since we have to change the scheduler code. It
cannot be done on Windows as it is not open source.

 A C compiler is needed.
o The code changes will be written in either C or C++, so a compiler must be available to compile
them; only then can our purpose be fulfilled.

 Five to six computers are required to perform the test.

o We need five to six systems. We will configure two or three of them as servers and, from the others,
send service requests to see how the CPUs are utilized.

Chapter – 3

3. Problem & Suggested Solution

3.1 Problem Formulation


Each job arriving into the system requires a certain number of CPUs to be executed at the maximum rate but can be
executed with fewer CPUs at a slower rate. The final job utility decays as its waiting + execution time increases. In
addition to deciding on the order in which the arriving jobs should be scheduled, the scheduler is also allowed to
“squeeze” a job into a smaller number of CPUs than it ideally requires, thereby extending its execution time and
receiving a smaller final utility. Alternatively, the scheduler can wait until the initially requested number of CPUs
becomes available for a particular job in order to ensure the maximum execution rate, while risking a long
waiting time for this job and a low final utility once again. The scheduler can also suspend some of the currently
running jobs so as to schedule some of the waiting jobs, and resume the execution of the suspended jobs at a later
time. If some CPUs become available before the “squeezed” job completes its execution, the scheduler assigns more
CPUs to that job until it gets all its originally requested CPUs.
This method presents a novel utility-based scheduling framework capable of adequately resolving the tradeoffs
mentioned above. An overview of the considered problem is given below.
The unit utility of each job is a decreasing function of the job completion time, which includes the job waiting time.
The final utility of each job is its unit utility multiplied by K*L, where K is its desired number of CPUs and L is its
ideal execution time if the job were to receive K CPUs. The ideal execution time can either be specified as a part of
the job description or can be predicted by the scheduler using the past execution times of similar jobs (an example of
such a prediction approach is described in [4]). The K*L factor is introduced to reflect the assumption that larger
and longer jobs should receive the same scheduling priority as smaller and shorter jobs – to make the scheduler
indifferent between scheduling one job requiring K CPUs and having the ideal execution time L and scheduling K*L
jobs each of which requires one CPU and has an ideal execution time of one unit of time. The jobs are assumed to
arrive randomly into the system. The goal is to continually schedule incoming jobs onto the system and to decide
how many CPUs should be allocated to each job so as to maximize average system productivity (final utility from
all completed jobs) per time step.
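To make the objective concrete, the following small sketch computes a job's final utility under the assumptions above. The linear shape of the unit-utility curve and all class, method and parameter names are illustrative choices of ours, not part of the proposed framework; the report only requires that the unit utility decrease with the job completion time.

// Illustrative sketch only: the linear decay of unit utility is an assumed example shape.
public class JobUtility {

    /** Unit utility: 1.0 at completion time 0, decaying linearly to 0 at maxUsefulTime. */
    static double unitUtility(double completionTime, double maxUsefulTime) {
        return Math.max(0.0, 1.0 - completionTime / maxUsefulTime);
    }

    /**
     * Final utility = unit utility * K * L, where K is the desired number of CPUs and L is the
     * ideal execution time with all K CPUs. The completion time includes the waiting time plus
     * the (possibly stretched) execution time.
     */
    static double finalUtility(double waitingTime, double actualExecutionTime,
                               int desiredCpus, double idealExecutionTime,
                               double maxUsefulTime) {
        double completionTime = waitingTime + actualExecutionTime;
        return unitUtility(completionTime, maxUsefulTime) * desiredCpus * idealExecutionTime;
    }

    public static void main(String[] args) {
        // A job that ideally needs 4 CPUs for 10 slots, waited 5 slots and was "squeezed",
        // so it actually ran for 14 slots.
        System.out.println(finalUtility(5, 14, 4, 10, 100));   // 0.81 * 4 * 10 = 32.4
    }
}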

3.2 Solution Methodology

3.2.1 Overview

The proposed utility-based scheduling framework most generally applies to the multi-machine grid environment (or
a multi-module massively parallel computer), where multiple machines may be available to accept multiple waiting
jobs. The basic scheduling algorithm uses the best-fit technique: each machine selects the next job to be scheduled

which has the tightest fit for this machine (will have the fewest free CPUs remaining after the job is scheduled). If
there is a tie among the jobs that provide the best fit to the available resources, then the highest-utility job gets
scheduled. In addition to this basic scheduling algorithm, this paper considers two additional types of decisions that
can be made by the scheduler independently on each machine, which apply if the machine cannot fit any of the
currently waiting jobs.
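Before describing these two policies, the basic best-fit rule just described can be sketched as follows. The Job class and its fields are hypothetical placeholders used only for illustration; the rule picks, among the waiting jobs that fit into the free CPUs, the one leaving the fewest CPUs free, breaking ties in favour of the higher expected utility.

import java.util.List;

// Sketch of the best-fit rule with a utility tie-break. Job is a hypothetical placeholder.
class Job {
    int requestedCpus;
    double expectedUtility;
    Job(int requestedCpus, double expectedUtility) {
        this.requestedCpus = requestedCpus;
        this.expectedUtility = expectedUtility;
    }
}

class BestFitSelector {
    static Job selectBestFit(List<Job> waiting, int freeCpus) {
        Job best = null;
        for (Job j : waiting) {
            if (j.requestedCpus > freeCpus) continue;          // job does not fit
            if (best == null) { best = j; continue; }
            int leftBest = freeCpus - best.requestedCpus;
            int leftJ = freeCpus - j.requestedCpus;
            if (leftJ < leftBest
                || (leftJ == leftBest && j.expectedUtility > best.expectedUtility)) {
                best = j;                                      // tighter fit, or tie with higher utility
            }
        }
        return best;                                           // null if nothing fits
    }
}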
The first preemption policy decides whether one or several of the jobs currently running on that machine should be
temporarily suspended in order to allow one of the waiting jobs to be scheduled with all the desired CPUs. The
suspended jobs are placed in the waiting queue for that machine and can be resumed from the place they left off.
The second oversubscribing policy decides whether any of the currently waiting jobs should be “squeezed” into the
remaining free CPUs on that machine even if there are fewer of them than the job ideally desires. Both of these
policies make scheduling decisions by selecting the job configuration that provides the best starting point for the
machine’s future operations in terms of maximizing the long-term utility (value) from all jobs completed in the
future. Thus, the key component of the proposed scheduling framework is the possibility of learning the value
function for each machine based on the state of its resources, the jobs it is executing and the jobs waiting to be
scheduled. Once such a value function is obtained, any kind of scheduling decisions can be considered on each
machine (not just preemption and oversubscribing), which can now be driven by maximization of the machine’s
state value following the scheduling decision.
The most common approach for learning functions that match inputs to outputs is to assume some flexible function
approximation architecture with tunable parameters and then adjust these parameters so as to improve the quality of
approximation. Any kind of parameterized value function approximation architectures can be used in the proposed
scheduling framework, since the only criterion for its compatibility with the reinforcement learning methodology is
function differentiability with respect to each tunable parameter. As a demonstration, a parameterized fuzzy rulebase
will be used in this paper to approximate machine value functions. The fuzzy rulebase parameters will then be tuned
using the reinforcement learning process described in Section 3.4, which consists of observing the utilities of
completed jobs following different states of the machine and changing parameters of the value function so as to
increase the value of states after which high job utilities were observed while lowering the value of the states after
which low job utilities were observed.

3.2.2 Fuzzy Rulebase

A fuzzy rulebase is a function f that maps an input vector x ∈ R^K into a scalar output y. This function is formed out
of fuzzy rules, where a fuzzy rule i is a function fi that maps an input vector x ∈ R^K into a scalar pi. The following
common form of the fuzzy rules is used in this paper:
Rule i: IF (x1 is Si1) and (x2 is Si2) and ... (xK is SiK) THEN (output=pi),
where xj is the jth component of x, Sij are the input labels in rule i and pi are the output coefficients.

The degree to which the linguistic expression (x j is Sij) is satisfied is given by a membership function μ: R→R which
maps its input xj into the degree to which this input belongs to the fuzzy category described by the corresponding
label. The output of the fuzzy rulebase f(x) is a weighted average of pi:

f(x) = [ Σ_{i=1..M} wi(x) pi ] / [ Σ_{i=1..M} wi(x) ]                (1)
where M is the number of rules and wi(x) is the weight of rule i. The product inference is commonly used for
computing the weight of each rule:

wi(x) = μSi1(x1) · μSi2(x2) · ... · μSiK(xK)
The membership functions μSij(xj), i = 1, ...,M, are usually chosen so as to jointly cover the range of possible values
of the input variable xj . Therefore, each fuzzy rule can be visualized as a box in space with fuzzy boundaries (jointly
these boxes cover the range of possible values for x), so that when x is observed within a box i, the output
recommended by the rule i is pi. The actual rulebase output is a weighted sum of pi, with the weights indicating the
“distance” between x and the center of box i. If the membership functions μ() are kept constant and only the output
coefficients pi are tuned, then the fuzzy rulebase becomes equivalent to a linear combination of basis functions – a
well known statistical regression model. If the membership functions are tuned as well, then the above form of the
fuzzy rulebase was proven to be a universal function approximator [11], just like a multi-layer perceptron neural
network. However, tuning the membership functions requires a nonlinear learning algorithm, which is much harder
to set up and use. In practice, tuning the output coefficients pi is often sufficient to come up with a good policy, and
for simplicity of exposition this approach is taken in this project.

3.2.3 Value Function Architecture for Job Scheduling

As a simple demonstration of the proposed scheduling framework, we use the following variables for computing the
value of machine state at any point in time (a more sophisticated input set could be considered):

• x1 = average unit utility expected to be received by the currently running jobs (the unit utility is
introduced in Section 3.1), weighted by the number of CPUs each job is currently
occupying,
• x2 = the expected time remaining until any of the currently running jobs is completed,
• x3 = the number of currently idle CPUs on the machine.
Our experimental results showed that the three input variables described above are sufficient for encoding the
machine state under the assumption that “squeezing” a job on a given machine affects only the execution time of

that job. If, however, the execution time of all jobs on that machine is affected, then a fourth variable should be used
by each machine – the average degree to which all of its jobs are “squeezed.”
The above variables are used as inputs to the fuzzy rulebase, and its output V̂(x) is an approximation to the
expected future utility per time step obtained from completed jobs when starting from the state x. A description of
the reinforcement learning algorithm for updating the rulebase parameters pi to improve the approximation quality
is given in section 3.2.4.
The following fuzzy rulebase was used to represent V̂(x):
Rule 1: IF (x1 is S1) and (x2 is S2) and (x3 is S3) then p1
Rule 2: IF (x1 is S1) and (x2 is S2) and (x3 is L3) then p2
Rule 3: IF (x1 is S1) and (x2 is L2) and (x3 is S3) then p3
Rule 4: IF (x1 is S1) and (x2 is L2) and (x3 is L3) then p4
Rule 5: IF (x1 is L1) and (x2 is S2) and (x3 is S3) then p5
Rule 6: IF (x1 is L1) and (x2 is S2) and (x3 is L3) then p6
Rule 7: IF (x1 is L1) and (x2 is L2) and (x3 is S3) then p7
Rule 8: IF (x1 is L1) and (x2 is L2) and (x3 is L3) then p8
The above rules softly classify each input variable xj into two fuzzy categories: small (S) and large (L). The weight
wi of the fuzzy rule i is the product of the degrees to which every precondition is satisfied. The membership
functions μ, which compute these degrees, have the following form:
• degree to which (x1 is L1): μL1(x1) = x1
• degree to which (x2 is L2): μL2(x2) = x2/MaxJobLength
• degree to which (x3 is L3): μL3(x3) = x3/N,
• degree to which (x1 is S1): μS1(x1) = 1 − x1
• degree to which (x2 is S2): μS2(x2) = 1 − x2/MaxJobLength
• degree to which (x3 is S3): μS3(x3) = 1 − x3/N
where N is the maximum number of CPUs on the machine and MaxJobLength is the maximum length of a job that
can be scheduled on this machine. The output of the above fuzzy rulebase is computed according to equation (1).
Since each membership function μ(xj) is continuous and the product inference is used, the output of the fuzzy
rulebase changes smoothly as the inputs change, conforming to the idea of "softly"
generalizing the learned experience across similar states. The rulebase parameters pi can be set using prior
knowledge or dynamically adjusted starting from any initial values using the reinforcement learning (RL) procedure
described in the next section.
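The eight rules and membership functions above translate directly into code. The sketch below is an illustrative transcription (class and variable names are ours): each rule weight is computed by product inference and the output is the weighted average of the coefficients pi, as in equation (1). The inputs are assumed to lie within their stated ranges.

// Sketch of the 8-rule fuzzy value function V(x) described above.
// n = number of CPUs on the machine, maxJobLength = longest schedulable job.
class FuzzyValueFunction {
    final double[] p = new double[8];   // tunable output coefficients p1..p8 (start at 0)
    final double n, maxJobLength;

    FuzzyValueFunction(double n, double maxJobLength) {
        this.n = n;
        this.maxJobLength = maxJobLength;
    }

    /** Rule weights by product inference; ordering follows Rules 1-8 above. */
    double[] ruleWeights(double x1, double x2, double x3) {
        double l1 = x1, s1 = 1 - x1;
        double l2 = x2 / maxJobLength, s2 = 1 - l2;
        double l3 = x3 / n, s3 = 1 - l3;
        return new double[] {
            s1 * s2 * s3, s1 * s2 * l3, s1 * l2 * s3, s1 * l2 * l3,
            l1 * s2 * s3, l1 * s2 * l3, l1 * l2 * s3, l1 * l2 * l3
        };
    }

    /** Weighted average of the pi, as in equation (1). */
    double value(double x1, double x2, double x3) {
        double[] w = ruleWeights(x1, x2, x3);
        double num = 0, den = 0;
        for (int i = 0; i < 8; i++) { num += w[i] * p[i]; den += w[i]; }
        return den == 0 ? 0 : num / den;
    }
}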

3.2.4 Reinforcement Learning Algorithm for Tuning Value Functions

We first describe the general mathematical context of the Markov Decision Process (MDP) where
reinforcement learning (RL) algorithms can be used. An MDP for a single agent (decision maker) can
be described by a quadruple (S, A,R, T) consisting of:

• A finite set of states S
• A finite set of actions A
• A reward function r : S × A × S → R
• A state transition function T : S × A → PD(S), which maps the agent’s current state and action
into the set of probability distributions over S.
At each time t, the agent observes the state st ∈ S of the system, selects an action at ∈ A, and the system changes its
state according to the probability distribution specified by T, which depends only on st and at. The agent then
receives a real-valued reward signal r(st, at, st+1). The agent's objective is to find a stationary policy π : S → A that
maximizes the expected average reward per time step starting from any initial state.
For a stationary policy π, the average reward per time step is defined as:

ρ(π) = lim_{T→∞} (1/T) E[ r(s0, π(s0), s1) + r(s1, π(s1), s2) + ... + r(sT−1, π(sT−1), sT) ]

The optimal policy π* from a class of policies Π is defined as π* = argmax_{π ∈ Π} ρ(π).
The value of a state s under a policy π is defined as:

V^π(s) = E[ Σ_{t=0..∞} ( r(st, π(st), st+1) − ρ(π) ) | s0 = s ]
A well-known procedure for iteratively approximating V^π(s) is called temporal difference (TD) learning.
Its simplest form is called TD(0):

V̂(st) ← V̂(st) + αt [ r(st, π(st), st+1) − ρt + V̂(st+1) − V̂(st) ]

where αt is the learning rate, V̂(s) is the current approximation to V^π(s) when the policy π is being
followed, and the average reward estimate ρt is updated as:

ρt+1 = (1 − βt) ρt + βt r(st, π(st), st+1)                (5)
The TD approach based on assigning a value to each state becomes impractical when the state space becomes very
large or continuous, since visits to any given state become very improbable. In this case, a function approximation
architecture needs to be used in order to generalize the value function across neighboring states. Let V̂(s, p) be an
approximation to the value function V^π(s) based on a linear combination of basis functions фi(s) with a parameter
vector p:

V̂(s, p) = Σ_{i=1..M} pi фi(s)

The parameter updating rule in this case becomes (executed for all parameters simultaneously):

pi ← pi + αt [ r(st, π(st), st+1) − ρt + V̂(st+1, p) − V̂(st, p) ] фi(st)                (6)

where the average reward estimate ρt is updated as in equation (5). The above iterative procedure is also guaranteed
to converge to the locally optimal parameter vector p* (the one giving the best approximation to V^π(s) in its
neighborhood) if certain additional conditions are satisfied. The most important ones are that the basis functions
фi(s) are linearly independent and that the states for update are sampled according to the steady-state distribution of
the underlying Markov chain for the given policy π. Note that the state description chosen for each machine in
section 3.2.3 implies that a value-maximizing scheduling decision immediately changes the machine state s to a
higher-valued state ˜s, from which the state evolution proceeds. The particular RL methodology we propose for
tuning the value function parameters makes use of this observation by specifying that the state st in equation (6) is
the one observed AFTER a scheduling decision (if any) is taken. That is, decisions made during the scheduling
process affect only the way the states are sampled for update in equation (6), but the evolution of a state st to a state
st+1 is always governed by a transition policy π embedded into the particular scheduling environment (which
depends on the job arrival rate, on the distribution of job characteristics, etc.). So far no theoretical convergence
analysis has been performed for the proposed sampling approach of choosing at every decision point the highest-
valued state out of those that can be obtained, but our positive experimental results suggest its feasibility.
Also note that the above view of the learning process as that of estimating the value function for a fixed
state transition policy eliminates the need for performing action exploration (which is necessary when we try to learn
a new state transition policy), thus making it possible to use this approach on-line without degrading performance of
the currently used policy. For example, if we initialize the machine value function to prefer some obviously bad
states, then after some period of learning and tuning the value function parameters, the values of these states will
come down to their real levels and the system will start using appropriate state values when making scheduling
decisions, without an explicit exploration having been performed.
The final defining aspect of the RL algorithm we propose for the scheduling domain described in Section 3.1 is the
timing of the value function parameter updates. The traditional implementation of RL is to update parameters
whenever the system’s state changes. However, in our case this would lead to parameter updates at every time step,
since the input variable x2 to the value function (the expected time remaining until any of the currently running jobs
is completed) constantly changes. Such frequent parameter updates would “dilute” the impact of any action taken by
the scheduler (such as preemption/ squeezing) on the future states and also create an unnecessary computational load
on the scheduler. Instead, we propose to perform the updates using equation (6) when the machine state is changed
due to a job being preempted or “squeezed” into fewer than desired CPUs. The parameters are also updated when
the machine state is changed due to a job completing or a new job being scheduled following the situation when the
decision to “squeeze” or preempt any jobs was considered but not made. In this way, the connection between the
most recent “free action” by the scheduler and the next state when another action can be taken is most clear.
A fuzzy rulebase, as described in section 3.2.2, is an instance of a linear parameterized function approximation
architecture, where the normalized weight of each rule i is a basis function фi(s) and the pi are the tunable parameters.
The reward signal r(st, π(st), st+1) is computed as the total utility received from completed jobs during the time elapsed
between states st and st+1. The vector x of section 3.2.3, used as an input to the fuzzy rulebase, is treated as the state
vector s in equation (6) when updating the parameters of the fuzzy rulebase. The vector x does not summarize all the
information needed for determining its expected future evolution, which implies that we have set up this problem as
a Partially Observable MDP (POMDP). This was a conscious decision, since accounting for all the relevant

information (the expected completion time of each running job, the number of CPUs each job occupies and the
characteristics of all currently waiting jobs) would lead to a very high-dimensional state, making the learning very
difficult. The experimental results demonstrate that the RL approach presented above is
robust enough to learn good decision policies starting from parameters pi initialized at 0, even when the
system dynamics deviates from the ideal MDP environment (when the state summarizes all relevant information and
decisions can be made at every time step).
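As an illustration of the resulting update step, the sketch below applies the average-reward TD rule of equations (5) and (6) to the fuzzy value function sketched in section 3.2.3, with the normalized rule weights playing the role of the basis functions фi(s). The constant step sizes alpha and beta are simplifying assumptions of ours.

// Average-reward TD(0) update for the fuzzy rulebase parameters (equations (5) and (6)).
// alpha and beta are illustrative constant step sizes.
class ValueFunctionLearner {
    final FuzzyValueFunction vf;
    double rho = 0;                          // running estimate of the average reward per step
    final double alpha = 0.01, beta = 0.001;

    ValueFunctionLearner(FuzzyValueFunction vf) { this.vf = vf; }

    /** One update after observing reward r between post-decision state x and next state xNext. */
    void update(double[] x, double r, double[] xNext) {
        double vNow = vf.value(x[0], x[1], x[2]);
        double vNext = vf.value(xNext[0], xNext[1], xNext[2]);
        double tdError = r - rho + vNext - vNow;

        double[] w = vf.ruleWeights(x[0], x[1], x[2]);
        double sum = 0;
        for (double wi : w) sum += wi;
        for (int i = 0; i < vf.p.length; i++) {
            vf.p[i] += alpha * tdError * (w[i] / sum);   // basis function = normalized rule weight
        }
        rho += beta * (r - rho);                          // equation (5): track the average reward
    }
}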

3.3 Value-Based Job Scheduling Algorithm

3.3.1 Scheduling on a Single Machine

Given an architecture for approximating the machine state value (which can still be in the process of being tuned by
RL), the following general steps are taken by the scheduling algorithm whenever a new job arrives into the queue or
one of the currently running jobs is completed:
1. Schedule jobs onto the free CPUs without any preemption or oversubscribing using any ”traditional” scheduling
approach such as best-fit scheduling.
If any jobs are still waiting, perform the following steps:
2. Compute the variables x1, x2, and x3 for the current machine state.
3. Use the computed state vector x as an input to the fuzzy rulebase to compute V0 – the long-term expected utility
(average utility per time step) that can be obtained by the machine if none of the currently waiting jobs are
scheduled and the same algorithm is invoked at every future decision point. If the preemption policy is enabled, the
scheduling algorithm performs the following two steps for every waiting job i:
4. Compute the alternate possible machine state in terms of the new values for x1, x2, and x3 that
would arise if job i were forcefully scheduled, preempting enough jobs to fit itself, in the order of
increasing remaining utility. The remaining utility of each running job is its expected unit utility at completion
divided by the expected time remaining to completion. If job i needs to preempt a job with a higher remaining unit
utility than its own in order to fit itself (e.g., if all the currently running jobs have high remaining unit utilities), then
it is eliminated from further consideration for forceful scheduling.
5. If job i qualifies for forceful scheduling, use the alternate state vector x0i computed in step 4 as an input to the
fuzzy rulebase to compute V1i – the long-term expected utility that can be obtained by the machine if job i were
forcefully scheduled, assuming the same algorithm will be invoked at every future decision point.
6. Let MaxV1 = maxi(V1i).
7. If MaxV1 > V0 then preempt enough lowest-utility jobs and schedule the job with the highest V1i onto the
machine; otherwise, do not preempt any jobs and do not schedule any of the currently waiting jobs.
If the oversubscribing policy is also enabled, the scheduling algorithm performs the following two
steps for every job i that is still waiting to be scheduled:
8. Compute the alternate machine state in terms of the variables x1, x2, and x3 that would result if job

i were scheduled onto the free CPUs, oversubscribing the machine.
9. Use the alternate state vector x0i computed in step 8 as an input to the fuzzy rulebase to compute
V2i – the expected long-term benefit of oversubscribing the machine with the job i, assuming the same
algorithm will be invoked at every future decision point.
10. Let MaxV2 = maxi(V2i).
11. If MaxV2 > V0 then oversubscribe the machine with the job that has the highest V2i; otherwise,
do not oversubscribe.
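The steps above can be condensed into the following skeleton. MachineState, Job and the stateAfter*/schedule helpers are hypothetical placeholders left abstract on purpose; the point is only the comparison of V0 against the best preemption alternative (V1) and the best oversubscription alternative (V2). Job is the same illustrative class used in the best-fit sketch of section 3.2.1.

import java.util.List;

// Condensed sketch of steps 2-11.
interface ValueFn { double value(MachineState s); }

class MachineState { double x1, x2, x3; }

abstract class ValueBasedScheduler {
    final ValueFn v;
    ValueBasedScheduler(ValueFn v) { this.v = v; }

    void decide(MachineState current, List<Job> waiting) {
        double v0 = v.value(current);                           // step 3: value of doing nothing

        Job bestPreempt = null; double bestV1 = Double.NEGATIVE_INFINITY;
        for (Job j : waiting) {                                 // steps 4-6
            if (!qualifiesForForcefulScheduling(current, j)) continue;
            double v1 = v.value(stateAfterPreemptingFor(current, j));
            if (v1 > bestV1) { bestV1 = v1; bestPreempt = j; }
        }
        if (bestPreempt != null && bestV1 > v0) {               // step 7
            forcefullySchedule(bestPreempt);
            return;
        }

        Job bestSqueeze = null; double bestV2 = Double.NEGATIVE_INFINITY;
        for (Job j : waiting) {                                 // steps 8-10
            double v2 = v.value(stateAfterSqueezing(current, j));
            if (v2 > bestV2) { bestV2 = v2; bestSqueeze = j; }
        }
        if (bestSqueeze != null && bestV2 > v0) {               // step 11
            oversubscribeWith(bestSqueeze);
        }
    }

    abstract boolean qualifiesForForcefulScheduling(MachineState s, Job j);
    abstract MachineState stateAfterPreemptingFor(MachineState s, Job j);
    abstract MachineState stateAfterSqueezing(MachineState s, Job j);
    abstract void forcefullySchedule(Job j);
    abstract void oversubscribeWith(Job j);
}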

3.3.2 Scheduling on Multiple Machines

If several machines are available to accept the waiting jobs, then the scheduling algorithm needs to decide how to
allocate jobs among the machines. The natural objective function for this problem is to maximize the sum of values
of all machines at the end of the allocation process. Unfortunately, the classical allocation algorithms based on
Integer Programming do not apply to this problem because the value of each machine is nonlinear in the jobs
allocated to it. However, various heuristics can be used instead, where a given allocation is perturbed in some way,
and the new allocation is implemented if it increases the total value of all machines. A study of various job
allocation heuristics and allocation perturbation methods on multiple machines is outside the scope of this paper, as
we want to demonstrate here the possibility of learning machine value functions that are required for all such
heuristics. As a simple example, the following two-stage process can be used. First, jobs are scheduled without
preemption or oversubscribing until no more jobs can fit in. This can be accomplished by scheduling jobs one at a
time by fixing a machine and finding the best-fitting job or fixing a job and finding the best-fitting machine. The
first heuristic is expected to work better when the number of machines is small relative to the number of
unscheduled jobs, while the second one is expected to work better in the opposite scenario. In the second stage of
the scheduling process, each machine needs to decide whether some of the currently running jobs should be
preempted in order to schedule any jobs that are still waiting or “squeeze” any of the currently waiting jobs into the
CPUs that are still available. In the simplest case, each machine can independently execute the algorithm described
in section 3.3.1.
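As an illustration of the first stage (fix a job, find the best-fitting machine), a minimal sketch is given below. The Machine class is a hypothetical placeholder that only tracks a free-CPU counter, and Job is the same illustrative class used earlier.

import java.util.List;

// First stage of the two-stage process: for each waiting job, place it (without preemption
// or oversubscribing) on the machine where it fits most tightly, if it fits anywhere.
class Machine {
    int freeCpus;
    Machine(int freeCpus) { this.freeCpus = freeCpus; }
}

class FirstStageAllocator {
    static void allocate(List<Job> waiting, List<Machine> machines) {
        for (Job j : waiting) {
            Machine best = null;
            for (Machine m : machines) {
                if (m.freeCpus < j.requestedCpus) continue;     // job does not fit here
                if (best == null || m.freeCpus < best.freeCpus) {
                    best = m;                                   // tightest-fitting machine so far
                }
            }
            if (best != null) {
                best.freeCpus -= j.requestedCpus;               // schedule the job there
            }
        }
    }
}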

3.4 LinSched: The Linux Scheduler Simulator

LinSched is a user-space program that hosts the Linux scheduling subsystem. Its purpose is to provide a tool for
observing and modifying the behavior of the Linux scheduler, and prototyping new Linux scheduling policies, in a
way that may be easier (or otherwise less painful or time-consuming) to many developers than direct
experimentation with the Linux kernel, especially in the initial stages of development. Due to the high degree of
code sharing between LinSched and the Linux scheduler, porting LinSched code to Linux is reasonably
straightforward. LinSched may be especially useful to those who are new to Linux scheduler development.

Architecture

LinSched consists of three main components. The simulation engine presents an API that can be
used to initialize and control a simulation, and calls the appropriate scheduler functions to simulate the Linux
scheduling policies. The environment module provides an abstraction of the Linux kernel in terms of code
dependencies such as functions and macros (most of which are supplied directly by including Linux source and
header files), and is responsible for presenting an appropriate platform topology to the simulation engine. Finally,
stimuli are provided by a scripting interpreter or another tool that uses the API of the simulation engine to create
tasks and start a simulation. This component can be parameterized, so that batch scripts can be used to run
simulations where many different types of workloads and platforms are investigated.

3.5 Architecture

3.6 Design of Agent:

Chapter - 4

4. STORM

Simulation TOol for Real-time Multiprocessor scheduling.

4.1 Introduction

STORM, which stands for “Simulation TOol for Real time Multiprocessor scheduling”, is a
multiprocessor scheduling simulation and evaluation platform. Figure 4.1 gives an overview of
the STORM platform architecture. For the time being, engineering efforts on STORM have
concerned its simulator component since it is the retaining element of the platform. For a given
“problem” i.e. a software application that has to run on a (multiprocessor) hardware architecture,
this simulator is able to “play its execution” over a specified time interval while taking into
account the requirements of tasks, the characteristics and functioning conditions of hardware
components and the scheduling rules.

As shown in Figure 4.2, the specification of the architectures and scheduling policy to be
simulated is done via an XML file. The result of simulation is a set of execution tracks that are
either made directly observable through diagrams, or recorded into files for a subsequent
computer-aided analysis. All these results allow the user to analyze the behaviour of the system
(tasks, processors, timing, performances, etc.).

STORM is a free, flexible and portable tool:

i) “free” because the STORM software is freeware under a Creative Commons License.
ii) “flexible” because it offers the user the possibility to program and add new
components through well-defined API(s) to the simulation kernel.
iii) “portable” because it can be run on various operating systems thanks to the Java
programming language.

4.2 The STORM platform

Figure 4.1 – The STORM platform (Generator, Software Architecture, Hardware Architecture, Simulator, Execution Task, Analyzer, Metrics)

4.3 The STORM simulator:

Figure 4.2 – The STORM simulator (Hardware Architecture, Software Architecture, Software Components, Configuration parameters, Hardware Components, Simulator, Diagrams, Report, Export)

4.4 The architecture of the STORM simulator:

4.4.1. The functional architecture: The architecture of the STORM simulator is
composed of a set of entities built around a simulation kernel (see Figure 4.3). Software entities
stand for the tasks and data that compose the software architecture for which the simulation is
conducted. Up to now, hardware architectures are composed of processors only; that’s why
processor entities are the only hardware ones. At the moment, system entities are the task list
manager, the scheduler and the memory manager. The scheduler entity is in charge of sharing the
processor(s) between the ready tasks; its election rules depend on the scheduling strategy it
implements.

Figure 4.3

4.4.2. The software architecture :
The STORM simulator is written in the Java programming language, which makes it independent of
any execution platform. Figure 4.5 gives a simplified view of the current UML class diagram of
STORM.

Figure 4.5

In accordance with Figure 4.5, we can find there, on the one side, the SimulationKernel final
class that implements the kernel of the simulator, and on the other side, a set of classes (Task,
Data, Processor, Scheduler, TaskListManager, etc.) and their subclasses that model the various
simulation entities (some of them correspond to the value that has to be given to the className
attributes in the input xml file). All of the subclasses inherit from the superclass Entity. It
contains the declaration of the methods involved in the implementation of the different services.
Depending on the subclasses and the behaviour they must exhibit, those methods may need to be
overridden.

4.5 The entities that interact with the scheduler component :

The scheduler entity is led to interact mainly with:


i) the task entities.
ii) the processor entities.
iii) the simulation kernel.

4.5.1 The task entity :

There are as many task entities as specified in the xml input file. It is important to note that these
entities capture the behaviour of the real components they represent only from a control
viewpoint and not a functional one, i.e. no applicative programs run for the tasks. Whatever its
type, the generic state diagram of a task entity is shown on Figure 4.6. At the very beginning, a
task is unexisting. As soon as its first activation occurs, it becomes ready and falls under the
control of the task list manager. Depending on the scheduling decisions, it may run (running
state) and possibly be pre-empted. On its definitive completion, the task goes back into the
unexisting state. On a job completion, it becomes waiting until all its execution conditions are
met, i.e. only its next release in case of an independent task, but together with the availability of
all the data it requires in case of a consumer task.

 The task state diagram

Figure 4.6
4.5.2 The processor entity : Each processor of the considered hardware architecture specified in
the xml input file has its equivalent processor entity. Up to now no control is modeled in such an entity
but instead it encapsulates some properties describing its current activity (not operational, idle, or busy
with the running task identifier) and its current functional conditions (voltage and frequency in the case of a
processor with DVFS capabilities), together with the functions for updating them.

4.5.3 The simulation kernel

A simulation is achieved in a discrete way, i.e. the overall simulation interval is cut into a
sequence of unitary slots [0,1), [1,2), …, [t,t+1), etc., and simulation moves forward at each
instant 0, 1, …t, t+1, etc. Thus, at t, the next simulation state (for instant t+1) is computed from
the current one (of instant t) and the pending simulation events at time t. From a functional point
of view, the kernel provides a very few basic services for managing simulation time and some
interactions between entities. Only the time service may concern the designer of a scheduler
component. It indicates the spending of time to the simulation entities. Thus, at each slot, all the
simulation entities are informed that a unit of time has elapsed (of course, it can be ignored by
the entity if it has no concern about time). Due to our discrete approach for the simulation
process, the behaviour of the simulation kernel is a cyclic one, as shown by Figure 4.7, one
cycle for one slot. After a necessary initial step where all the simulation entities are created and
the global time variable is initialized, the loop of cycles is entered. Any cycle is split into
successive steps that come down to:

a) manage the watchdogs, i.e. detect those watchdogs (cf. the watchdog service, not
described here) that have expired and call the specified function of the specified entity.
b) process all the currently pending kernel messages (for the most part, this informs other
simulation entities of the current situation; not described here).
c) call upon the scheduler. We underline that this does not mean that the scheduling selection rules
have to be applied at each slot: that is part of the scheduling policy type and implementation
choice.
d) manage time passing, i.e. inform all the simulation entities of a tick (cf. the
aforementioned time service).
e) ask the task list manager to operate (not described here).
f) increment the time variable.
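The cycle a) to f) can be summarized as a simple loop. The sketch below is a deliberately simplified, hypothetical rendering of that control flow; none of these method names belong to the real STORM code, they only mirror the steps listed above.

// Hypothetical, simplified rendering of the kernel's cyclic behaviour (one cycle per slot).
class KernelCycleSketch {
    int time = 0;

    void runSimulation(int durationInSlots) {
        initializeEntities();                    // initial step: create all entities, time = 0
        while (time < durationInSlots) {
            manageWatchdogs();                   // a) fire expired watchdogs
            processPendingKernelMessages();      // b) deliver pending kernel messages
            callScheduler();                     // c) let the scheduler run (it may do nothing)
            notifyTick();                        // d) inform every entity that a slot elapsed
            runTaskListManager();                // e) let the task list manager operate
            time++;                              // f) increment the time variable
        }
    }

    void initializeEntities() {}
    void manageWatchdogs() {}
    void processPendingKernelMessages() {}
    void callScheduler() {}
    void notifyTick() {}
    void runTaskListManager() {}
}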

4.5.3.1 The simulation process :

Figure 4.7

4.6 The available APIs for the scheduler component:
In this section, we give a description of the (part of the) interfaces of those classes that a designer
may be concerned with while building a new scheduler component.

4.6.1 The Task interface


Table 4.1 gives the list of the public methods of the Task class that may be called on any Task
instance.

Method Description

int getId() Returns the value of the Id attribute given to the task in
the input xml file.

int getActivationDate() Returns the value of the activationDate attribute of the


task. It represents the release date (in slots) of the first
job of the task

int getWCET() Returns the value of the WCET attribute of the task. It
represents the worst case execution time (in slots) of
the task

int getBCET() Returns the value of the BCET attribute of the task. It
represents the best case execution time (in slots) of the
task.

int getAET() Returns the value of the AET attribute of the task. It
represents the actual execution time (in slots) of the
current job of the task.

int getPeriod() Returns the value of the period attribute of the task. It
represents the period (in slots) of the task.

int getPriority() Returns the value of the priority attribute of the task. It
represents the priority of the task.

int getDeadline() Returns the value of the deadline attribute of the task. It
represents the critical delay (in slots) of the task.

void runningOn(Processor P) Assigns the task to the processor the reference of which
is P and moves the task state from Ready to Running

void preempt() Moves the task state from Running to Ready. The
processor on which the task was previously running
becomes free.
Table 4.1

4.6.2. The Processor API


Table 4.2 gives the list of the public methods of the Processor class that may be called on any
Processor instance.

Method Description
int getId() Returns the value of the Id attribute given to the
processor in the input xml file.
boolean isRunning() Returns true if the processor is presently busy,
otherwise false.
Task getrunning() Returns the reference of the task instance that is
presently running on the processor if the processor is
busy, otherwise null.
int getLoad() Returns the value of the processor load i.e. the
number of slots (since the beginning of the
simulation) where the processor has been running.
void setRunning(Task T) Assigns the task the reference of which is T to the
processor and moves the task state from Ready state
to Running.
Table 4.2
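Putting Tables 4.1 and 4.2 together, a scheduler component typically dispatches and preempts tasks through these calls. The fragment below is a sketch that assumes the STORM classes are on the classpath and, as in the example scheduler of Chapter 5, that a larger priority value means a higher priority; it only uses methods listed in the two tables.

import storm.Processors.Processor;
import storm.Tasks.Task;

// Sketch: free the processor if its running task has a lower priority than the candidate,
// then dispatch the candidate onto it.
class DispatchHelper {
    static void dispatchIfMoreUrgent(Task candidate, Processor cpu) {
        if (cpu.isRunning()) {
            Task running = cpu.getrunning();
            if (running.getPriority() >= candidate.getPriority()) {
                return;                    // keep the currently running task
            }
            running.preempt();             // moves the running task back to Ready, frees the CPU
        }
        candidate.runningOn(cpu);          // moves the candidate from Ready to Running on this CPU
    }
}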
4.6.3 User-defined dynamic attributes : The Entity class provides additional methods
that enable the use of dynamical attributes and map them on hashtables. By dynamical attributes,

we mean attributes that are not explicitly declared as members in the Java code of a class, but
that can be created, initialized and then updated thanks to specific function calls inside the
methods of this class.

Method Description
void setOwnFieldIntValue(String name, int i) Creates (or updates if yet created) the
integer dynamic attribute name with the
value i.
int getOwnFieldIntValue(String name) Returns the current value of the integer
dynamic attribute name.
void setOwnFieldStringValue(String name, Creates (or updates if yet created) the
String s) string dynamic attribute name with the
value s.
String getOwnFieldStringValue(String name) Returns the current value of the string
dynamic attribute name.
Table 4.3
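For example, a scheduler component can keep per-task bookkeeping without declaring new fields, using only the Entity methods above. The "budget" attribute below is a hypothetical example of ours; it is initialized from the task's WCET (Table 4.1) and decremented each slot.

import storm.Tasks.Task;

// Sketch: attach a hypothetical per-task counter through the dynamic-attribute methods.
class BudgetBookkeeping {
    static void initBudget(Task t) {
        t.setOwnFieldIntValue("budget", t.getWCET());      // create the attribute
    }

    static void consumeOneSlot(Task t) {
        int left = t.getOwnFieldIntValue("budget");        // read it back
        t.setOwnFieldIntValue("budget", left - 1);         // update it
    }
}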

4.6.4. The SimulationKernel interface


Table 4.4 gives the list of the public methods of the SimulationKernel class that may be called on
the SimulationKernel instance on which STORM relies.

Method Description
int getDuration() Returns the value of the duration attribute (in
slots) given to the simulation in the input xml
file.
TasksListeManager Returns the reference on the object that
getTasksListeManager() implements the task list manager.

4.6.5. The TaskListeManager interface


Table 4.5 gives the list of the public methods of the TaskListeManager class that may be called
on the TaskListeManager instance returned by the previously listed getTasksListeManager()
method.

Method Description
ArrayList getProcessors() Returns the reference on the ArrayList object that
records all the processors of the hardware
architecture.
ArrayList getTasks() Returns the reference on the ArrayList object that
records all the tasks that are presently either in the
Ready state or the Running one.
ArrayList getAllTasks() Returns the reference on the ArrayList object that
records all the tasks of the software architecture.
Table 4.5

4.7 The XML file as the specification of a simulation :

This section describes the XML file used as input with the "exec" command. This
file is organized hierarchically around some tags, as shown in Figure 4.8. Attributes
are associated with some tags. Moreover, it is possible for the developer of new
scheduler components to introduce and set new dynamic attributes.

 The present hierarchy of tags in the XML input file

Figure 4.8

 SIMULATION
The SIMULATION root tag of the XML file is associated with two attributes. The "duration"
attribute gives the upper bound of the simulation interval in number of slots. The “precision”
attribute is a floating point value which gives the absolute duration of one slot in milliseconds. By
default, the precision is equal to 1.0, and thus the duration of a slot is one millisecond.

Example:

<SIMULATION duration="75" precision=”1.0”>

...

</SIMULATION>

 SCHED

The SCHED tag enables to specify the scheduler used in the simulation. Its unique attribute
"className" must indicate the name of the scheduler Java class. A lot of classes are provided in
the library of schedulers associated with STORM. See appendix 3 for the list of presently
available scheduler components.

Example:

<SCHED className="storm.Schedulers.FP_P_Scheduler">

</SCHED>

 CPUS

The CPUS tag (for "CPU Set") begins the section relative to the specification of the set of

processors that compose the hardware architecture.

 CPU

The CPU tag enables to define an instance of processor. The attributes that must accompany this
tag are:

o className: the name of the Java class that models the type of the considered processor.
It is a reference to a processor type in the library. See appendix 4 for the list of presently
available processor types;

o name: the name given to this instance of processor. It is used as a reference in some
outputs ( for example, the title given to the windows representing temporal diagrams of
this processor instance);
o id: the numeric identifier that the simulation kernel will use to identify this processor
instance. This identifier must be unique in the XML file.

Example:

<CPU className="storm.Processors.CT11MPCore" name ="CPU B" id="1">

</CPU>

4.8 Simulation graph display

 Plotall (pa)
The command displays the execution sequences built by the simulation. For each task and
each processor, a window shows its Gantt diagram .
 Plotpower (pp) <cpu_Id>
The command plots a graph showing the power consumption for the processor <cpu_Id>.
The value of <cpu_Id> is given in the XML file or is known with the “context” or “explorer” commands.
 Plotdvfs (pd) <cpu_Id>
The command plots a graph showing the behavior of the DVFS system for the processor
<cpu_Id>. The value of <cpu_Id> is given in the XML file or is known with the “context” or
“explorer” commands.
 Plotload (pl) <cpu_Id>
The command plots a graph showing the load of the processor <cpu_Id>. The value of
<cpu_Id> is given in the XML file or is known with the “context” or “explorer” commands.
 Plotret (pt) <task_Id>
The command plots a graph showing the progress of the RET (“Remaining Execution
Time”) for the task <task_Id>. The value of <task_Id> is given in the XML file or is known
with the“context” or “explorer” commands.
 gantt (gtt) <item_Id>
The command displays the Gantt diagram for the task or processor <item_Id>. The value of
<item_Id> is given in the XML file or is known with the “context” or “explorer” commands.
 close (cl) <window_Id>
The command closes the window <window_Id>. The value of <window_Id> is known with
the “windowsinfo” command.
 closeall (cla)
The command closes all the windows except, of course, the console window.
 setpos (sp) <window_Id> <time_slot>
 setpos (sp) <window_Id> <time_slot> <zoom_value>
The command changes the origin of the graph <window_Id> to the slot <time_slot>.
Optionally the display is zoomed (in or out) depending on the value <zoom_value> (values:
0|1|2|3). The value of <window_Id> is known with the “windowsinfo” command.
 setzoom (sz) <window_Id> <slot_1> <slot_2>
 setzoom (sz) <window_Id>
The command displays the graph <window_Id> between <slot_1> and <slot_2>. If these two
values are missing, the display returns to its original form. The value to use for <window_Id>
is known with the “windowsinfo” command.

4.9 Gantt diagrams for tasks and processors:

The “plotall” command displays in several windows the execution sequences that result from the
last simulation.

 For each task, a window shows its Gantt diagram over an interval beginning at date 0 and
ending at date 50 (default values). Figure 4.9 illustrates such a task diagram. The title of
the window refers to the name given to the task in the XML file (cf. attribute “name”).

Gantt diagram for TASK T1

Figure 4.9

 For each processor, a window shows its allocation to the tasks over an interval beginning at
date 0 and ending at date 50 (default values). Figure 4.10 illustrates such a processor diagram.
The title of the window refers to the name given to the processor in the XML file ( attribute
“name”).

Gantt diagram for CPU B

Figure 4.10

4.10. On the Gantt diagram of a task, icons are used to identify special
instants:

Chapter 5

5 XML file as input:

This is an XML file with a hardware architecture composed of 2 processors and a software
architecture composed of 3 periodic independent tasks. The scheduler implements a fixed-priority
fully preemptive scheduling algorithm.

<project_input.xml>
<!-- FP_P-->
<!-- comments -->
<SIMULATION duration="50">
<SCHED className="storm.Schedulers.FP_P_Scheduler"></SCHED>
<CPUS>
<CPU className="storm.Processors.CT11MPCore" name ="CPU A" id="11" ></CPU>
<CPU className="storm.Processors.CT11MPCore" name ="CPU B" id="12" ></CPU>
</CPUS>
<TASKS>
<TASK className="storm.Tasks.PTask_NAM" name ="PTASK T1" id="1" period="10"
activationDate="0" WCET="5" priority="1"></TASK>
<TASK className="storm.Tasks.PTask_NAM" name ="PTASK T2" id="2" period="9"
activationDate="2" WCET="3" priority="5"></TASK>
<TASK className="storm.Tasks.PTask_NAM" name ="PTASK T3" id="3" period="6"
activationDate="4" WCET="2" priority="10"></TASK>
</TASKS>
</SIMULATION>

5.1 Java File:
/* A fixed-priority scheduler: tasks are allocated to CPU A and CPU B on a first-in
first-out basis within each priority level, with feedback from the machine learning agent.
*/
import storm.Schedulers.Scheduler;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Set;
import java.util.TreeMap;
import java.util.Stack;
import storm.EvtContext;
import storm.Processors.Processor;
import storm.Tasks.Task;

public class FP_P_FIFO_Scheduler extends Scheduler {

public class ordering_priority implements Comparator{

public int compare(Object arg0, Object arg1) {


// Sort keys in descending order: higher priority values come first
Integer a0 = (Integer) arg0;
Integer a1 = (Integer) arg1;
if (a0.intValue() > a1.intValue()) return -1;
else if (a1.intValue() == a0.intValue()) return 0;

else return 1;
}
}

TreeMap listePriority;
Boolean todo=false;

public void init() {

this.listePriority = new TreeMap(new ordering_priority());

}
public void onUnBlock(EvtContext c){
Task T = (Task) c.getSource();
addTask(T);

todo = true;
}
public void onBlock(EvtContext c){
Task T = (Task) c.getCible();
removeTask(T);
todo = true;
}
private void removeTask(Task t) {
// Remove the task from the FIFO list associated with its priority level
Integer A = new Integer(t.priority);

LinkedList fifo =(LinkedList) this.listePriority.get(A);


fifo.remove(t);

}
public void onActivate(EvtContext c) {

Task T =(Task) c.getCible();


addTask(T);

todo=true;

}
private void addTask(Task t) {
// Append the task to the FIFO list of its priority level, creating the list if needed
Integer A = new Integer(t.priority);
if (this.listePriority.containsKey(A)){

LinkedList fifo =(LinkedList) this.listePriority.get(A);


fifo.addLast(t);

}
else{

LinkedList fifo = new LinkedList();


fifo.addLast(t); this.listePriority.put(A,fifo );

}
}
public void sched() {
// Re-run the task election only if the task set has changed since the last call
if (todo) select();
todo=false;
}

public void select() {


Stack pileCPUS = new Stack();
Stack pileTask = new Stack();
// Election: keep only the highest-priority tasks running, then dispatch waiting tasks onto free CPUs

ArrayList CPUS= this.Kernel.getTasksListeManager().getProcessors();


int sizeCPUS= CPUS.size();
Set allKey = this.listePriority.keySet();
Iterator I= allKey.iterator();
int k=0;
//Preemption
while(I.hasNext()){
Integer a = (Integer) I.next();
LinkedList fifo = (LinkedList) this.listePriority.get(a);
Iterator F=fifo.iterator();

while(F.hasNext()){
Task T = (Task) F.next();
/*
utilization
*/

if (k>=sizeCPUS){
if (T.isIsrunning()){
T.preempt();

}
}

// System.out.println(k);
k++;

}
}
// Find the free CPUs
for (int l=0; l< sizeCPUS ; l++){
Processor p=(Processor)CPUS.get(l);
if (!(p.isRunning())) pileCPUS.push(p);
}

// Assign the free resources


I= allKey.iterator();
// task not running

while(I.hasNext()) {

if (pileCPUS.size()==pileTask.size()) break;

Integer a = (Integer) I.next();


LinkedList fifo = (LinkedList) this.listePriority.get(a);
Iterator F = fifo.iterator();
while(F.hasNext()){
Task T = (Task) F.next();
if (!(T.isIsrunning())){
pileTask.push(T);
}

}
}

System.out.println(" C -- " + pileCPUS.size());


System.out.println(" T -- " + pileTask.size());

while((pileCPUS.size()>0)){
if (pileTask.size()==0) break;
Task T = (Task) pileTask.pop();
Processor C = (Processor) pileCPUS.pop();
T.runningOn(C);
}
}
}
6. Conclusion and future work

Utility-based scheduling has become an increasingly important factor in
everyday work, as users seek high-speed processing. Some functions need to
be carried out in a real-time environment so that a particular task is
completed in minimum time. This requires a scheduling technique that
completes all tasks in time without any of them crossing its deadline. The
improvement we target is performance in terms of speed and minimal
resource consumption. We will make changes to the process-scheduling code
of Linux, which should offer better performance than what exists at present,
and use reinforcement learning to make the scheduler intelligent, so that it
can decide by itself which job should be scheduled first and how many CPUs
it requires. The result will be an intelligent, artificial-intelligence-based
scheduler that performs jobs with maximum utility on multiprocessor
systems.

References:

[1] P. Li. Utility Accrual Real-Time Scheduling: Models and Algorithms. Ph.D. Dissertation, Department of
Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 2004.

[2] P. Li, B. Ravindran, H. Wu, E. D. Jensen. “A Utility Accrual Scheduling Algorithm for Real-Time
Activities With Mutual Exclusion Resource Constraints,” The MITRE Corporation Working Paper, April 2004.

[3] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[4] H. Wu, B. Ravindran, E. D. Jensen, and U. Balli. “Utility Accrual Scheduling Under
Arbitrary Time/Utility Functions and Multiunit Resource Constraints,” In Proceedings of the 10th International
Conference on Real-Time and Embedded Computing Systems and Applications (RTCSA), Gothenburg,
Sweden, August 2004.

[5] D. Vengerov, “Reinforcement learning framework for utility-based scheduling in resource-constrained
systems.” Sun Microsystems Laboratories technical report TR-2005-141, Feb 1, 2005.

[6] R. Sutton, A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.

[7] L. P. Kaelbling, L. M. Littman, and A. W. Moore, “Reinforcement learning: a survey.” Journal of Artificial
Intelligence Research, Vol. 4, pp. 237–285, 1996.

[8] J. Kepner, “HPC Productivity: An Overarching View,” International Journal of High Performance
Computing Applications: Special Issue on HPC Productivity. J. Kepner (editor), Vol. 18, no. 4, Winter 2004.

[9] C. L. Liu and J. W. Layland. “Scheduling algorithms for multiprogramming in hard real-time environment,”
Journal of ACM, Vol. 20, No. 1, pp. 46-61, 1973.

[10] R. K. Clark. “Scheduling Dependent Real-Time Activities,” Ph.D. Dissertation, Carnegie Mellon
University, CMU-CS-90-155, 1990.

[11] L.-X. Wang. “Fuzzy systems are universal approximators,” In Proceedings of the IEEE International
Conference on Fuzzy Systems (FUZZ-IEEE ’92), pp. 1163-1169, 1992.

[12] D. Vengerov and N. Iakovlev. “Reinforcement Learning Framework for Dynamic Resource Allocation:
First Results,” In Proceedings of the 2nd IEEE International Conference on Autonomic Computing, June 13-16,
Seattle, WA, 2005.

[13] H. Wu, B. Ravindran, E. D. Jensen, and U. Balli. “Utility Accrual Scheduling Under Arbitrary
Time/Utility Functions and Multiunit Resource Constraints,” IEEE Real-Time and Embedded Computing
Systems and Applications, August 2004.

[14] L.-X. Wang, Fuzzy systems are universal approximators, in: Proceedings of the IEEE International
Conference on Fuzzy Systems, FUZZ-IEEE ’92, 1992, pp. 1163–1169.

[15] C.L. Liu, J.W. Layland, Scheduling algorithms for multiprogramming in hard real-time environment,
Journal of ACM 20 (1) (1973) 46–61.

[16] www.myreaders.info

[17] http://lxr.linux.no/linux-bk+v2.5.49

[18] http://en.wikipedia.org/wiki/Real-time_operating_system

PERSONAL DETAILS

Name: Abhishek Mishra
Enrolment No.: 07206G
Course: Bachelor of Technology
Branch: Computer Science and Engineering
Batch: 2007 – 2011
Date of Birth: 5th February, 1988
Email: abhishekmishra07206gcse@gmail.com
Phone No.: +917828030067
Address: 707,Avas Vikas Coloney,Gonda(U.P)
Pincode-271001
