
Concurrent analysis of a log using a MapReduce strategy

1. General description
One of the many functions of an operating system is to gather statistics about the use of system resources by users and processes. These statistics are written to files called logs. The goal of this task is to write a program that, in a concurrent way, lets us query a log that records the execution of jobs on a computer with multiple processors.

Once the program reads the log, it will show the user a menu from which the data in the log can be queried. Two programs will be implemented: a concurrent program using processes and a concurrent program using threads. Below you can find more information on what must be implemented.
2.1 Inputs

Two executable programs must be made, analogp and analogh, which implement the concurrency with processes (analogp) and with threads (analogh).

The following chart explains how the programs are invoked and what their arguments mean.

Invoking the programs from the shell:

$ analogp logfile lines nmappers nreducers intermediates
$ analogh logfile lines nmappers nreducers

Meaning of the arguments:

logfile: the log file that will be analyzed concurrently.

lines: the number of lines (records) that the log file contains. These lines do not include the header of the file, which will have been removed beforehand.

nmappers: number of processes that will execute the Map function (> 0).

nreducers: number of processes that will execute the Reduce function (> 0). The number of reducers must be <= the number of mappers.

intermediates: either 0 or 1, indicating whether the intermediate files left by the Map and Reduce functions must be deleted after the log is queried. If the value is 0, the intermediate files must be deleted; if the value is 1, they must be kept until a new log query is started. This flag only applies to the process program (analogp), because threads communicate with each other through shared memory.
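As a rough illustration of this invocation, here is a minimal sketch (in C) of how analogp might read and validate its arguments; the variable names and error messages are illustrative, not prescribed by the assignment:

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        if (argc != 6) {
            fprintf(stderr, "usage: %s logfile lines nmappers nreducers intermediates\n",
                    argv[0]);
            return 1;
        }

        const char *logfile = argv[1];
        long lines          = strtol(argv[2], NULL, 10);
        int nmappers        = atoi(argv[3]);
        int nreducers       = atoi(argv[4]);
        int intermediates   = atoi(argv[5]);

        /* Enforce the constraints stated in the chart above. */
        if (nmappers <= 0 || nreducers <= 0 || nreducers > nmappers ||
            (intermediates != 0 && intermediates != 1)) {
            fprintf(stderr, "invalid arguments\n");
            return 1;
        }

        printf("analyzing %s (%ld lines) with %d mappers and %d reducers\n",
               logfile, lines, nmappers, nreducers);
        return 0;
    }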

Format of the input file (logfile)

In the logfile, each line contains information about a job (process) in 18 columns. The logs that will be used for this task have already been preprocessed, so it is not necessary to worry about the meaning of the fields. The meaning of each column is listed below, just so that you get a clear view of what we are doing.

1. Job number

2. Submit time

3. Wait time

4. Run time

5. Number of processors assigned to the job

6. Mean CPU time used

7. Memory used by the job

8. Number of processes requested by the job

9. Requested time

10. Requested memory

11. Status

12. User ID

13. Group ID

14. Number that identifies the application

15. Queue number

16. Partition number

17. Previous job

18. Think time
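Since the queries compare values in these columns, a minimal sketch of extracting one column from a log line may help. It assumes the columns are whitespace-separated, and the sample record is illustrative (shaped like the job #6 described below):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Returns the value of the 1-based column `col` of a log line,
       assuming whitespace-separated columns (the line is modified). */
    static long column_value(char *line, int col)
    {
        char *tok = strtok(line, " \t\n");
        for (int i = 1; i < col && tok != NULL; i++)
            tok = strtok(NULL, " \t\n");
        return tok != NULL ? strtol(tok, NULL, 10) : -1;
    }

    int main(void)
    {
        /* Illustrative record, not taken verbatim from the log. */
        char line[] = "6 0 1 214651 24 -1 -1 24 -1 -1 1 5 -1 -1 1 -1 -1 -1";
        printf("column 5 = %ld\n", column_value(line, 5)); /* 24 processors */
        return 0;
    }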


In the following image you will find a sample of the log, from job 6 to job 26.

Looking at this image, job #6 waited 1 second before executing. Its execution lasted 214651 seconds and used 24 processors. The job finished and was run by the user identified with number 5. It is a batch job that was placed in queue #1.

2.2 Outputs

Once invoked, both analogp and analogh will show the user a simple menu for querying the log. These are the two options in the menu:

1. Consult the log

2. Exit

Each query is made on one of the log columns; this is the format of a query:

$ column, sign, value

Where:
column: the number of the column on which the query will be made
sign: >, <, <=, >=, =
value: depending on the sign field, the log records that satisfy the query will be greater than, less than, less than or equal to, greater than or equal to, or equal to the given value.
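A minimal sketch of parsing this query format with sscanf; it assumes the three fields may be separated by optional spaces, as in the examples further below:

    #include <stdio.h>

    int main(void)
    {
        char buf[128], sign[3];
        int column;
        long value;

        /* Reads a query such as "5,>,30" or "7, <=, 20" from the menu. */
        if (fgets(buf, sizeof buf, stdin) != NULL &&
            sscanf(buf, " %d , %2[<>=] , %ld", &column, sign, &value) == 3)
            printf("column=%d sign=%s value=%ld\n", column, sign, value);
        else
            fprintf(stderr, "malformed query\n");
        return 0;
    }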

The result of the query is the number of records that satisfy the given condition, followed by the total time the program took to produce the answer. To measure this time, use gettimeofday. Time must be measured in the master, from the moment the query starts until the result is shown.
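A minimal sketch of this timing in the master, using gettimeofday as required; the query itself is elided:

    #include <stdio.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timeval start, end;

        gettimeofday(&start, NULL);
        /* ... distribute the query, run the mappers and reducers,
           and collect the result here ... */
        gettimeofday(&end, NULL);

        /* Elapsed time in seconds, with microsecond resolution. */
        double elapsed = (end.tv_sec - start.tv_sec)
                       + (end.tv_usec - start.tv_usec) / 1e6;
        printf("query took %.6f seconds\n", elapsed);
        return 0;
    }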

Example:

a) If we want to know the number of jobs that were executed on more than 30 processors, the query has the following format:

5,>,30

The output for this example should be 13, followed by the time the program took to do the job.

b) If we want to know how many jobs used an amount of memory less than or equal to 20, the user should write:

7, <=, 20

Output = 11, followed by the time the program took to make the query.

2.4 MapReduce strategy to solve the problem

2.4.2 Implementation

User Program-Master: this role is played by a single process, the analogp or analogh process itself. It is in charge of creating the worker processes that run the Map and Reduce functions, and of dividing the logfile to distribute it among the workers that run the Map function. When creating the workers that implement the Reduce function, the master must pass them the files Buf0, Buf1, ..., Bufk that they will work with. Finally, the master must read the output files that the reducers generate and show the result of the query to the user.
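A minimal sketch, for the process version, of how the master might create the mapper workers with fork and wait for them; run_mapper is a hypothetical helper, and the reducers would be created the same way:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical mapper entry point: would read split<id> and write Buf<id>. */
    static void run_mapper(int id)
    {
        (void)id;
        exit(0);
    }

    int main(void)
    {
        int nmappers = 3; /* would come from the command line */

        for (int i = 0; i < nmappers; i++) {
            pid_t pid = fork();
            if (pid < 0) { perror("fork"); exit(1); }
            if (pid == 0)
                run_mapper(i); /* the child never returns */
        }

        /* The master waits for every mapper before launching the reducers. */
        for (int i = 0; i < nmappers; i++)
            wait(NULL);

        /* ... fork the reducers the same way, then read their output files ... */
        return 0;
    }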

Workers (mappers): they receive from the parent (master) the query being made and the part of the log file on which they will work. The result is left in the storage elements Buf0, Buf1, ..., Bufk. In the process case, if the intermediates flag is equal to 1, these temporary files must not be deleted until a new query is made.
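A minimal sketch of a mapper for the process version; the file names (split0, Buf0) and the hard-coded query are illustrative, and it assumes whitespace-separated columns with the job number as the key:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Returns 1 if `val` satisfies `sign` against `ref` (e.g. val > ref). */
    static int matches(long val, const char *sign, long ref)
    {
        if (strcmp(sign, ">") == 0)  return val > ref;
        if (strcmp(sign, "<") == 0)  return val < ref;
        if (strcmp(sign, ">=") == 0) return val >= ref;
        if (strcmp(sign, "<=") == 0) return val <= ref;
        return val == ref; /* "=" */
    }

    int main(void)
    {
        int col = 5; const char *sign = ">"; long ref = 30; /* illustrative query */
        FILE *in = fopen("split0", "r");   /* this mapper's chunk of the log */
        FILE *out = fopen("Buf0", "w");    /* this mapper's buffer */
        if (in == NULL || out == NULL) { perror("fopen"); return 1; }

        char line[1024];
        while (fgets(line, sizeof line, in) != NULL) {
            long fields[18];
            int n = 0;
            for (char *t = strtok(line, " \t\n"); t != NULL && n < 18;
                 t = strtok(NULL, " \t\n"))
                fields[n++] = strtol(t, NULL, 10);
            if (n >= col && matches(fields[col - 1], sign, ref))
                fprintf(out, "%ld\n", fields[0]); /* emit the job number (key) */
        }

        fclose(in);
        fclose(out);
        return 0;
    }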

Workers (reducers): these are the processes that execute the Reduce function of the algorithm. The master will assign the buffers resulting from the Map function to the reducers. The reducers must count the number of records present in their assigned buffers and then write the totals to the output files.
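A minimal sketch of a reducer for the process version; the assigned buffer names and the output file name are illustrative:

    #include <stdio.h>

    int main(void)
    {
        const char *bufs[] = { "Buf0", "Buf2" }; /* buffers assigned by the master */
        long total = 0;
        char line[64];

        for (int i = 0; i < 2; i++) {
            FILE *f = fopen(bufs[i], "r");
            if (f == NULL) { perror(bufs[i]); return 1; }
            while (fgets(line, sizeof line, f) != NULL)
                total++; /* one record (job number) per line */
            fclose(f);
        }

        FILE *out = fopen("output0", "w"); /* the master reads and adds these */
        if (out == NULL) { perror("output0"); return 1; }
        fprintf(out, "%ld\n", total);
        fclose(out);
        return 0;
    }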

Storage elements

Input files (split0, split1, split2, ..., splitk): these are the chunks into which the original logfile is divided, to be assigned to the different mappers. The division must be as equal as possible (see the sketch after this list). k is the number of mappers.

Buf0, Buf1, ..., Bufk: these are the storage entities where the mappers store their results. In the process case, these entities must be files; in the thread implementation, they must be data structures stored in memory. These buffers will be the input for the reducer processes.

Output file 0, output file 1, ..., output file N: these are the files or storage entities where each reducer process will place the total number of records it counted (in its buffers). As in the previous case, these entities are files in the process case and in-memory data structures in the threads case. N is the number of reducers created.
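A minimal sketch of the "as equal as possible" division mentioned above: with lines records and nmappers splits, the first lines % nmappers splits receive one extra record each; the values here are illustrative:

    #include <stdio.h>

    int main(void)
    {
        long lines = 100; int nmappers = 3; /* illustrative values */
        long base = lines / nmappers;
        long extra = lines % nmappers;

        /* The first `extra` splits get one extra record each. */
        for (int i = 0; i < nmappers; i++)
            printf("split%d: %ld lines\n", i, base + (i < extra ? 1 : 0));
        return 0;
    }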
Here is an example of running the process program and making a query. The key for the MapReduce algorithm will be the job number.

$ analogp log 3 2 0 // 3 mappers and 2 reducers //

Query:
$ 5, >, 30 // jobs that used more than 30 processors in their execution //

The following image shows the division of the log among the mappers.

The following image shows the result of what each mapper does.

This final image shows the input and output of the reducer processes. The master takes these outputs and adds them up to show the result to the user.
