
CS 6378 Advanced Operating Systems -- Project Description

This project lets you practice using threads and sockets and learn to set up and run web
services. You need to write your programs in Java. C++ is also allowed, but not supported.

1 Server Program
The server program provides three procedures: one to initialize itself, one to process a request,
and one to print the content of the file. The pseudocode for the three procedures is as follows.

Initialization:
dataF = open the file “data.x” for read and write;

ProcessReq (clientID, reqID, loc1, loc2)


y = read (dataF, loc1);
z = compute y! % P;
write (dataF, loc2, z);
return (z);

Print (loc, len)


print (dataF, loc, len);

You can use the configuration information to determine the server ID “x”, which is written with two
digits in the file name (e.g., “data.01”, “data.03”, etc.). Also, “data.x” is a file of integers, each of
4 bytes (in both C and Java). The parameters “loc1” and “loc2” in the ProcessReq function
refer to integer locations, counting by words. For example, loc1 = 10 means the 10th
integer in the file, which is also the 40th byte in the file.
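
The location arithmetic and the ProcessReq body can be sketched in Java as follows. This is only an illustration under stated assumptions: the class and method names are made up, locations are taken to be 0-based (so that location 10 starts at byte 40), y is assumed to be non-negative, and the file is assumed to store integers in the big-endian order that RandomAccessFile reads and writes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Locale;

// Hypothetical helper class; none of these names come from the project handout.
class DataFileSketch {
    static final int WORD_BYTES = 4; // each integer in data.x occupies 4 bytes

    // Builds the two-digit file name, e.g. serverId 1 -> "data.01".
    static String dataFileName(int serverId) {
        return String.format(Locale.ROOT, "data.%02d", serverId);
    }

    // Converts an integer location into a byte offset (assumes 0-based locations).
    static long byteOffset(int loc) {
        return (long) loc * WORD_BYTES;
    }

    // Computes y! % p step by step so the intermediate product never overflows a long.
    static int factorialMod(int y, int p) {
        long acc = 1 % p;
        for (long i = 2; i <= y && acc != 0; i++) {
            acc = (acc * i) % p;
        }
        return (int) acc;
    }

    // Reads y at loc1, writes z = y! % p at loc2, and returns z.
    // Synchronized so that concurrent requests do not interleave their seeks.
    static synchronized int processReq(RandomAccessFile dataF, int loc1, int loc2, int p)
            throws IOException {
        dataF.seek(byteOffset(loc1));
        int y = dataF.readInt();      // big-endian 4-byte read
        int z = factorialMod(y, p);
        dataF.seek(byteOffset(loc2));
        dataF.writeInt(z);            // big-endian 4-byte write
        return z;
    }
}
```

The data file would be opened once during initialization, for example with new RandomAccessFile(DataFileSketch.dataFileName(1), "rw").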

The parameter “loc” in the Print function is the start location for printing and “len” is the number
of integers to be printed. Both loc and len are of type integer. Like loc1 and loc2, loc counts
by integers, not by bytes. The web service's printout should be directed to a log file. You should
use “log.x” as the log file name, where x is the serverID (as in data.x).

2 Client Program
Each client program creates K threads to simulate K clients. You need to run M instances of the
client program to form client groups. Each client opens the file “client.x” to read the requests it
is going to send, where x is the clientID. Each clientID consists of the group ID (all digits except the
last two) and the client replica ID within the group (the last two digits of the clientID).
The client replica ID can be obtained from the command line when you run the client
program. The groupID can be given when you create the client threads. Note that the clientID x is
expressed with 6 digits in the file names (e.g., “client.000101”, “client.000302”, etc.).
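
As a small illustration of the naming rule, the clientID can be composed from the group ID and the replica ID and then formatted with six digits. The class and method names below are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper class illustrating the clientID layout described above.
class ClientIdSketch {
    // The last two digits hold the replica ID; the remaining digits hold the group ID.
    static int clientId(int groupId, int replicaId) {
        if (replicaId < 0 || replicaId > 99) {
            throw new IllegalArgumentException("replica ID must fit in two digits");
        }
        return groupId * 100 + replicaId;
    }

    static int groupIdOf(int clientId) {
        return clientId / 100;
    }

    static int replicaIdOf(int clientId) {
        return clientId % 100;
    }

    // Pads the clientID to six digits, e.g. 302 -> "client.000302".
    static String clientFileName(int clientId) {
        return String.format(Locale.ROOT, "client.%06d", clientId);
    }
}
```

For example, group 3 and replica 2 give clientID 302, which maps to the file name “client.000302”.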

File “client.x” contains the requests to be sent to the server, one request in a line. It has the
following format.
reqID command param1 param2
The “command” can be send, fail, malf, or stop. When it is send, it means to send the request to
the server and the two parameters are the locations to be read and written as discussed in the
server program. If the command is “fail”, the client fails, but fails cleanly. In this case,
the client program simply terminates. We do not consider recovery from failure here. If the
command is “malf”, the client is failing maliciously: it sends data that is different from that of
the other clients in the group. When the command is “stop”, the client thread terminates
successfully.
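
One possible way to parse a line of “client.x” is sketched below. The class names are made up, and the sketch assumes that fields are separated by whitespace and that commands other than send may omit the two parameters (they then default to 0); the handout does not say either way.

```java
import java.util.Locale;

// Hypothetical parser for one request line: "reqID command param1 param2".
class RequestLineSketch {
    enum Command { SEND, FAIL, MALF, STOP }

    static final class Request {
        final int reqId;
        final Command command;
        final int param1;
        final int param2;

        Request(int reqId, Command command, int param1, int param2) {
            this.reqId = reqId;
            this.command = command;
            this.param1 = param1;
            this.param2 = param2;
        }
    }

    static Request parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 2) {
            throw new IllegalArgumentException("expected: reqID command [param1 param2]");
        }
        int reqId = Integer.parseInt(t[0]);
        // valueOf throws IllegalArgumentException for an unknown command name.
        Command command = Command.valueOf(t[1].toUpperCase(Locale.ROOT));
        // Assumption: missing parameters default to 0.
        int p1 = t.length > 2 ? Integer.parseInt(t[2]) : 0;
        int p2 = t.length > 3 ? Integer.parseInt(t[3]) : 0;
        return new Request(reqId, command, p1, p2);
    }
}
```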

When all client threads terminate, the client program should terminate. Your program should
handle termination properly.

3 Fault Tolerance Services


The goal of the project is to develop mechanisms to make the web service system fault tolerant
via replication. We consider a web services group of N replicated web services. Similarly, clients
are also replicated. The system has K client groups. Each client group consists of M client
replicas. The fault tolerance mechanisms are implemented as web services, FT service, to
achieve encapsulation. FT service implements both client side service and server side service.
There can be different protocols to achieve fault tolerance, based on different assumptions of
faults or different performance goals. Thus, there can be different implementations of the FT
mechanism.

[Figure: Client program 1 and Client program 2, each with K (=3) threads. One client is labeled
Client id = 000302, Group id = 3, Replica id = 2. The figure shows client replicas C1, C2, C3,
FT services F1 and F2, and server replicas S1 and S2.]

Please note that in this project, we only consider one server group. The diagram contains a
second server group which is used to show that there can be another server group and the idea is
just the same.

3.1 Basic FT Protocol for the Client Side


To achieve fault tolerance, each client needs to send its request to multiple servers,
receive multiple responses, and perform majority voting. To make the FT protocol
transparent to the actual client logic, you need to implement a client stub that hides the client call to
the FT web services. The traditional stub has already been generated for you by the web service
environment you are using. You can go into the generated stub and modify it to implement
the fault tolerance protocol, or you can simply implement your own client stub (FT client stub)
on top of the client stub generated by the WS environment (SOAP client stub). The FT protocol
for the client stub is specified in the following pseudo procedure.

ProcessReq (clientID, reqID, serviceID, parameters)


locate server side FT service based on service ID
use N threads to
call QueueReq of all FT services at the server side
receive responses or time-out; -- you can treat time-out like a wrong response
register the response in a list of array;
if all responses are ready and the response has been processed then
delete the entire entry
if majority of responses are ready then
send response to the client
mark response as being processed
otherwise, do nothing
end of thread task

Instead of sending the request directly to the server, the client calls its local FT service and invokes
ProcessReq. In the client stub, you need to use multiple threads to concurrently send the N
requests to the N servers. Thus, you will get N responses from the N servers. You need to do
majority voting (more than 1/2 of the responses should be received and should be the same) to get the final
response and then send the final response to the actual client routine. After getting the response,
please let your client stub print out the number of time-outs and the number of incorrect responses,
and let your client print out the correct response.
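
The voting step can be sketched as below. This is a simplified illustration, not the required design: the names are hypothetical, each server call is modeled as a Callable that returns the integer response, and invokeAll waits for all calls or the time-out instead of answering as soon as a majority is ready, as the pseudo procedure above does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical FT client stub fragment.
class ClientStubSketch {
    // Returns the value reported by more than half of the n servers, or null if there is none.
    // A null element stands for a timed-out or failed call.
    static Integer majority(List<Integer> responses, int n) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (Integer r : responses) {
            if (r == null) continue;
            int c = counts.merge(r, 1, Integer::sum);
            if (c > n / 2) return r;
        }
        return null;
    }

    // Sends the request to all servers concurrently and votes on the results.
    static Integer callAll(List<Callable<Integer>> serverCalls, long timeoutMillis)
            throws InterruptedException {
        int n = serverCalls.size();
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Integer> responses = new ArrayList<>();
            int timeouts = 0;
            // Calls that have not finished when the time-out expires are cancelled.
            for (Future<Integer> f : pool.invokeAll(serverCalls, timeoutMillis, TimeUnit.MILLISECONDS)) {
                try {
                    responses.add(f.get());
                } catch (CancellationException | ExecutionException e) {
                    responses.add(null); // treat a time-out or error like a wrong response
                    timeouts++;
                }
            }
            Integer result = majority(responses, n);
            int incorrect = 0;
            for (Integer r : responses) {
                if (r != null && !r.equals(result)) incorrect++;
            }
            System.out.println("time-outs=" + timeouts + " incorrect=" + incorrect);
            return result;
        } finally {
            pool.shutdownNow();
        }
    }
}
```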

3.2 Basic FT Protocol for the Server Side


As on the client side, we need to keep the FT implementation transparent to the web service
logic. The server side FT support layer implements the FT protocol functions to hide
the FT implementation details. Each web service is replicated into N replicas, running on N
platforms. For each WS replica, an FT service layer is provided to achieve FT functionality.

One fundamental problem in handling requests with replication is
synchronization. All replica servers should run the requests in the same order to
maintain consistency. Thus, each server side FT service should queue the requests it
receives and check with the others to determine an agreed-upon order for processing. Since each
node will receive the same request from multiple client nodes, majority voting is
performed on these requests before a request can be considered for ordering. The FT service
implements the request queue mechanism as follows.

QueueReq (clientID, reqID, serviceID, parameters)


store (clientID, reqID, serviceID, parameters) in the search table
-- if the request is a duplicate, same clientID X reqID then simply discard it
if the request already exists then check if a sufficient number of them have been received
-- same request should have different clientID, but should have the same groupID X reqID
If a sufficient number of replicas of a request have already been received
check consistency using majority voting
if pass consistency check then
put (clientID, reqID, serviceID, parameters) in ordered queue;
else if some more are still to come then wait for more replicas
else discard the request
endif;
endif;
wait till response is ready;
send response to the FT service at the client side;

The client requests may not come in order. Assume that M=5 (so majority is 3). You may receive,
for example, one request from client group 2, then one from group 3, then one from group 2, then
one from group 1, then one from group 3, then one from group 2. So, now you have a majority of
requests from group 2 and the request can be placed in Ready queue. Since client requests can
come in any order, you need to implement a data structure to store the client requests you
received. The data structure should have (groupID, reqID) as the key and message content as the
body. When a new request with, for example, (clientID=000203, reqID=100) is received, then
the message body is entered in entry (groupID=0002, reqID=100). You need to check whether
the message body is consistent with other message bodies already there. You also need to update
“the number of requests received” and “the number of consistent requests”. If there is already a
majority of consistent requests for entry (groupID=0002, reqID=100), then the request is entered
into the Ready queue. If the number of requests received = M (including the timed out ones),
then the entry can be deleted. After you receive and process a request, please print
out the related information in the entry (number of requests received, number of consistent
requests, etc.).
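
A minimal sketch of such a table follows. It rests on several assumptions that are not in the specification: the class and field names are invented, the message body is compared as a string, the Ready queue holds only the (groupID, reqID) key, and time-outs are not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical request table keyed by (groupID, reqID).
class RequestTableSketch {
    static final class Entry {
        final Set<Integer> seenClients = new HashSet<>();          // detects duplicates
        final Map<String, Integer> bodyCounts = new HashMap<>();   // body -> number of copies
        int received = 0;
        int consistent = 0;     // size of the largest group of identical bodies
        boolean queued = false;
    }

    private final int m; // number of client replicas per group
    private final Map<String, Entry> table = new HashMap<>();
    final Deque<String> readyQueue = new ArrayDeque<>();

    RequestTableSketch(int m) {
        this.m = m;
    }

    // Records one arrival; returns true if it moved the request into the Ready queue.
    synchronized boolean record(int clientId, int reqId, String body) {
        String key = (clientId / 100) + ":" + reqId; // groupID = clientID without its last two digits
        Entry e = table.computeIfAbsent(key, k -> new Entry());
        if (!e.seenClients.add(clientId)) {
            return false; // same clientID and reqID: duplicate, discard
        }
        e.received++;
        int count = e.bodyCounts.merge(body, 1, Integer::sum);
        e.consistent = Math.max(e.consistent, count);
        boolean nowReady = !e.queued && count > m / 2;
        if (nowReady) {
            e.queued = true;
            readyQueue.add(key);
        }
        System.out.printf("entry %s: received=%d consistent=%d%n", key, e.received, e.consistent);
        if (e.received == m) {
            table.remove(key); // all replicas accounted for
        }
        return nowReady;
    }
}
```

With M = 5 and the arrival order from the example above (groups 2, 3, 2, 1, 3, 2), the sixth call is the first one that returns true.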

The argument “parameters” in QueueReq includes the original parameters “loc1” and “loc2”. It
should be a structure containing two integers. We use a single parameter to make the mechanism
general to any server side function (general function could have any number of parameters). But
in our system, there is only one case, which has two parameters.

ReadyReq ()
-- get invoked periodically;
synchronize all the requests in the queue with other servers;
ExecuteReqList (list);

ExecuteReqList (list)
-- list contains ordered requests that should be processed
loop till list is empty
remove the top request in the list from the list and from the queue;
call the appropriate web service and obtain the response;
provide the response to QueueReq;
signal response ready;
endloop;

Note that QueueReq is an external service function (to be invoked by the client). ReadyReq is an
internal function that should be invoked periodically. ExecuteReqList is also an internal function.
It is invoked by ReadyReq after ReadyReq determines the list of requests that are ready on all server
nodes and a consistent order for them.

If the FT service is the leader, then it should have a thread to periodically invoke its ReadyReq
function, which in turn, invokes the ReadyReq functions of other FT services. The period for
invoking the ReadyReq is TR, which is defined in config.sys file and it is given in milliseconds.
If the FT service is not the leader, then it should have a thread to periodically check whether the
ReadyReq is invoked properly.
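
A timer thread of this kind can be sketched with a ScheduledExecutorService, which sleeps between invocations rather than polling. The names below are hypothetical, and the period is simply whatever value your program reads from config.sys.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical timer wrapper for periodic ReadyReq invocation.
class ReadyReqTimerSketch {
    // Starts a single timer thread that runs readyReq once per period.
    // Note: if readyReq throws, scheduleAtFixedRate suppresses all later runs.
    static ScheduledExecutorService startTimer(Runnable readyReq, long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(readyReq, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }

    // Demo helper: lets the timer fire at least `ticks` times and returns how often it ran.
    static int countTicks(long periodMillis, int ticks) {
        AtomicInteger count = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(ticks);
        ScheduledExecutorService timer = startTimer(() -> {
            count.incrementAndGet();
            done.countDown();
        }, periodMillis);
        try {
            done.await(); // blocks without busy waiting
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.shutdownNow();
        }
        return count.get();
    }
}
```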

Each Web service may fail independently. You need to implement a function in the FT protocol
layer to simulate server failure. The function is:

FailureSimulator (mode, duration)


Set the FT service in “Failed” or “Malicious” mode;
mode and duration are integers. Failed = 0 and Malicious = 1. Duration is in milliseconds.

Failed FT service
QueueReq
-- Still receive the requests, but simply discard them
ReadyReq
-- If it is invoked, simply ignore the invocation
If this FT service is the leader, then it will no longer invoke others
ExecuteReqList
-- Will stop functioning, not calling its corresponding WS
If it is half way of invoking a WS, then finish the invocation,
but do not forward the response to the client

In this project, we do not consider failure recovery (it would be too complicated). So each
FT service, after going into “Failed” mode, simply stops functioning. But to keep matters
simple and avoid problems at the message sender side, the failed FT service continues to accept
messages and simply discards them. The caller will get a time-out in this case.

Malicious FT service
QueueReq
-- Remains the same
ReadyReq
-- Will generate incorrect list and send to others, you need to do the incorrect list generation
and print out what you generated
ExecuteReqList
-- Will forward the client requests to the WS like before
But after receiving WS responses, mess it up, and send to the client
You need to print out what you sent to the client

We consider a fixed duration for malicious failures. All activities during malicious failures
remain the same, except that the messages sent out by a malicious FT service may be
inconsistent with others. Once the duration is over, the malicious service will stop acting
maliciously and send out the correct messages like a normal FT service.
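
The mode handling can be sketched as a small state holder that the other FT functions consult. The names are hypothetical. Two assumptions are made here: the duration is ignored for Failed mode (since there is no recovery), and the current time is passed in explicitly as an extra argument, which is not part of the FailureSimulator signature above but makes the logic easy to test.

```java
// Hypothetical state holder behind FailureSimulator(mode, duration).
class FailureModeSketch {
    static final int FAILED = 0;
    static final int MALICIOUS = 1;

    private boolean failed = false;
    private long maliciousUntilMillis = 0;

    // nowMillis would normally be System.currentTimeMillis().
    synchronized void failureSimulator(int mode, int durationMillis, long nowMillis) {
        if (mode == FAILED) {
            failed = true; // permanent: failure recovery is not considered
        } else if (mode == MALICIOUS) {
            maliciousUntilMillis = nowMillis + durationMillis;
        } else {
            throw new IllegalArgumentException("unknown mode " + mode);
        }
    }

    synchronized boolean isFailed() {
        return failed;
    }

    // True only while the malicious period is still running.
    synchronized boolean isMalicious(long nowMillis) {
        return !failed && nowMillis < maliciousUntilMillis;
    }
}
```

QueueReq, ReadyReq, and ExecuteReqList could then check isFailed() and isMalicious(System.currentTimeMillis()) before deciding how to behave.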

The FailureSimulator function will be called by a separate client, the control client. The control
client may call FailureSimulator of any FT service to simulate failure of the corresponding WS.
The control client may also call the print function in each WS directly to obtain the current state
information (the data file content) of the WS.

To simplify deployment, the FT service functions will be deployed together with the
original web service functions “ProcessReq” and “print” within one web service. You need to
clearly print out status information (we need it to check the correctness of your program).
Note that the log file name is “log.x” for the web service with serverID = x.

3.3 Leader Based FT Service


The FT services need to synchronize their lists of requests so that the requests are executed in the
same order in all services. One of the FT services serves as the leader for the synchronization
process and the rest are called the followers. The synchronization process is as follows.

Leader FT service:
send synchronization requests to the followers;
receive the lists of ready requests from all FT services or time out;
merge the lists by selecting the minimal set;
send the final list to all other FT services;

Each follower FT service:


receive the synchronization request from the leader;
send the list of ready requests to the leader;
receive the final list of ready requests from the leader;

We assume that the leader does not fail, so no time-out is needed when waiting for a ReadyReq
invocation. But there is a time-out for the leader to receive the lists, and the time-out period is
TL. Please print out the list sent out by each server as well as by the leader.
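
The merge step on the leader might look like the sketch below. It reads “selecting the minimal set” as taking the intersection of all lists that arrived, kept in the leader's own order; that reading is an assumption, as are the names and the convention that a null list stands for a follower that timed out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Hypothetical merge of ready lists on the leader.
class LeaderMergeSketch {
    // Keeps only the requests that appear in every list that was received,
    // preserving the leader's order. A null follower list means a time-out.
    static List<String> merge(List<String> leaderList, List<List<String>> followerLists) {
        List<String> result = new ArrayList<>(leaderList);
        for (List<String> followerList : followerLists) {
            if (followerList == null) continue; // assumption: skip followers that timed out
            result.retainAll(new HashSet<>(followerList));
        }
        return result;
    }
}
```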

3.4 Time Out


When there are failures in the system, some entities in the system may wait for messages and
need to have a time-out period to stop further waiting. Time out situations include (1) the
synchronization among FT services in ReadyReq and the timeout period is TL, (2) the FT
service QueueReq waiting for the majority of the client requests and the timeout period is TR,
and (3) the client FT stub waiting for service responses and the timeout period is TQ. The period
for the ReadyReq to get activated is TH. The constants TL, TQ, TR, and TH are defined in
config.sys and they are given in milliseconds.

4 System Setup
The system has a configuration file “config.sys” which provides all the system configuration
information. Its content is specified in the following.
N K M TH TQ TR TL P
first server URL IPaddr portNum
……
N-th server URL IPaddr portNum

For the first phase, we have N = M = 1, but the entries will be there. Correspondingly, the file
only contains one server URL.
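
Reading this file can be sketched as follows. The class and field names are hypothetical, and the sketch assumes that every field is separated by whitespace and that each server line holds exactly three fields (URL, IP address, port number).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory form of config.sys.
class ConfigSketch {
    final int n, k, m, th, tq, tr, tl, p;
    final List<String[]> servers = new ArrayList<>(); // each element: {URL, IPaddr, portNum}

    ConfigSketch(List<String> lines) {
        String[] h = lines.get(0).trim().split("\\s+");
        if (h.length != 8) {
            throw new IllegalArgumentException("first line must be: N K M TH TQ TR TL P");
        }
        n = Integer.parseInt(h[0]);
        k = Integer.parseInt(h[1]);
        m = Integer.parseInt(h[2]);
        th = Integer.parseInt(h[3]);
        tq = Integer.parseInt(h[4]);
        tr = Integer.parseInt(h[5]);
        tl = Integer.parseInt(h[6]);
        p = Integer.parseInt(h[7]);
        // The next N lines describe the servers.
        for (int i = 1; i <= n; i++) {
            String[] s = lines.get(i).trim().split("\\s+");
            if (s.length != 3) {
                throw new IllegalArgumentException("server line must be: URL IPaddr portNum");
            }
            servers.add(s);
        }
    }
}
```

The lines can be obtained with java.nio.file.Files.readAllLines(java.nio.file.Paths.get("config.sys")).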

5 Project Specifics
This project consists of three phases.

5.1 First Phase


Implement a single client program (including K client threads) and a single server web service
without any fault tolerance. You still need to assign a group ID to each client, although it is
always 1.

5.2 Second Phase


In this phase, you should implement the fault tolerance web service mechanism, including the FT
service for the server and FT stub for the client. You will implement one class for client FT stub
and another class for server side FT service. Each web service uses the same FT service class you
implemented. Each client uses the same client FT stub you implemented.

In the second phase, you leave out the request ordering (synchronization) process. Each FT service
invokes its own ReadyReq periodically and ReadyReq simply calls ExecuteReqList. You can use
a separate thread to do the timekeeping and the ReadyReq invocation. Also, in this phase,
FailureSimulator for regular failure in ReadyReq is simple: ReadyReq will simply not be
invoked.

In the system, there will be M*K clients, generated from M client programs. Each client has its
own input file. There are M*K input files. For each client group, the requests are the same unless
there are failures. So, in each client group, M files are the same, except when there are failures.
Each server group has N server replicas. In our system, there is only ONE server group. Each
server replica should be deployed as a web service. This means you need to deploy N web
services. They are identical, except for their ServiceID. ServiceID is used only for the file names
(log.x and data.x). You do not need to use ServiceID when you deploy your service replicas on
different machines.

Make sure your program does not use busy waiting anywhere: not for any of the time-out
situations, and especially not for the response in QueueReq. Also make sure your program does not
use sockets anywhere. We will check your source code for these violations.
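
One way to wait for the response in QueueReq without busy waiting is to block on a CompletableFuture that ExecuteReqList completes. The sketch below uses invented names and assumes the response is an integer; giving this wait its own time-out is also an assumption, not something the specification requires.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical hand-off between ExecuteReqList (producer) and QueueReq (consumer).
class ResponseSlotSketch {
    private final CompletableFuture<Integer> response = new CompletableFuture<>();

    // Called by ExecuteReqList once the web service has returned ("signal response ready").
    void signalReady(int value) {
        response.complete(value);
    }

    // Called by QueueReq: the thread sleeps until the response arrives or the time-out expires.
    // Returns null on time-out or interruption.
    Integer await(long timeoutMillis) {
        try {
            return response.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```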

5.3 Third Phase
Implement the leader based synchronization protocol for request ordering. The synchronization
protocol should be implemented using sockets.

In Phase 2, every node has one thread that keeps a timer and invokes ReadyReq when the timer
expires. In Phase 3, only the leader has the timer thread to invoke ReadyReq. For the followers, the
timer thread from Phase 2 has to be changed: it should become a socket receive.

6 Bonus Project
In the bonus project, you need to explore different designs for fault tolerance web services and
compare their performance. You need to submit a report on the implementations and the
performance curves. You need to explore various parameter settings for performance study.

We will only grade fully working bonus projects. Partially working systems will not be considered.
The grading is based on the quality of the design, the performance studies, etc.

You will get up to 6 points added to your final grade for the bonus project.

6.1 Protocols
You need to consider three protocols: two that you have already implemented, and a
third one, RobustFT, that you will implement.

NoFT
This is the same as the first phase, N=1, M=1. But K can be large.

SimpleFT
Use the leader based protocol for synchronization. All are the same as what you have already
implemented for FT service.

RobustFT
This one considers malicious faults. For example, in the leader based approach, the leader can
send different lists to the followers and make all the replicas inconsistent. This can happen if
one of the FT services has been compromised by an attacker. Thus, the final solution is to run a
Byzantine agreement protocol among the services to get a guaranteed agreement among the
working FT services.

6.2 Response Time Measurement


You need to modify your program to study the performance of the system. Most importantly, the
client should measure the response time for each request and compute the average response time.
You need to prepare the data file and command file such that the computation time for the factorial
function varies. You need to measure the average service time as well (how long it takes for
ProcessReq to finish execution). Ideally, the service time follows an exponential
distribution with a varying mean.
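
One way to prepare such data is to draw the y values from an exponential distribution by inverse-transform sampling. This is only a sketch: the names are hypothetical, and it assumes that the service time grows roughly in proportion to y, which you would need to confirm by measurement.

```java
import java.util.Random;

// Hypothetical generator for the y values stored in the data file.
class WorkloadSketch {
    // Draws y from an exponential distribution with the given mean, rounded to an integer.
    static int sampleY(Random rng, double meanY) {
        double u = rng.nextDouble();                          // uniform in [0, 1)
        return (int) Math.round(-meanY * Math.log(1.0 - u));  // 1 - u is in (0, 1], so the log is finite
    }
}
```

On the client side, the response time of a single request can be taken as the difference between two System.nanoTime() readings around the call.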

6.3 Parameters to Consider
You need to adjust K = 1..100 (we will have to see how much the system can handle).
You need to adjust M and N.
You need to adjust the average service time by adjusting the average y value (the number for
factorial computation).
You need to adjust TR to control the frequency of synchronization, i.e., to control the tradeoffs
between synchronization overhead and the client response time.

7 Handin Procedure
You need to submit your program electronically before midnight of the due date (check the web
page for due dates). Follow the instructions below for submission.
• Prepare a directory that contains only:
  - All source code files that make up your solution to this assignment.
  - A README file that explains how to run your program.
  - The DesignDoc file that describes the major features of your program that are not
    specified in the project specification, including all problems your program may have, all design
    decisions that deviate from the specification, and all additional features you implemented.
• Zip the directory and submit your code through WebCT.
• Sign up for a project demo.
