This project lets you practice the use of threads and sockets and learn to set up and run web
services. You need to write your programs in Java. C++ is also allowed, but not supported.
1 Server Program
The server program provides three procedures: one to initialize itself, one to process a request,
and one to print the content of the file. The pseudocode for the procedures is as follows.
Initialization:
    dataF = open the file “data.x” for read and write;
You can use the configuration information to determine the server ID “x”, which is written as two
digits in the file name (e.g., “data.01”, “data.03”). “data.x” is a file of integers (in either C or
Java), and each integer is 4 bytes long. The parameters “loc1” and “loc2” in the ProcessReq function
refer to integer locations, counted in words. For example, loc1 = 10 refers to the 10-th integer in
the file, which starts at the 40-th byte (10 × 4).
The parameter “loc” in the Print function is the start location for printing and “len” is the number
of integers to be printed. Both loc and len are of type integer. Like loc1 and loc2, loc counts
by integers, not by bytes. The printout of the web service should be directed to a log file. Use
“log.x” as the log file name, where x is the server ID (as in data.x).
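The file naming and the word-to-byte conversion described above can be sketched as follows. This is a minimal sketch, not a required design: the class and method names are hypothetical, and it assumes the integers are stored in the big-endian layout that Java's RandomAccessFile uses (a data file written by a C++ program may use a different byte order).

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper; the names are illustrative and not part of the specification.
class DataFileSketch {
    static final int INT_BYTES = 4; // each integer in data.x is 4 bytes

    // Builds names such as "data.01" or "log.03": the server ID is padded to two digits.
    static String fileName(String prefix, int serverId) {
        return String.format("%s.%02d", prefix, serverId);
    }

    // Converts an integer location (counted in words) to a byte offset.
    static long byteOffset(int loc) {
        return (long) loc * INT_BYTES;
    }

    // Reads the integer stored at word location loc (assumes big-endian storage).
    static int readInt(RandomAccessFile f, int loc) throws IOException {
        f.seek(byteOffset(loc));
        return f.readInt();
    }

    // Writes value at word location loc, extending the file if needed.
    static void writeInt(RandomAccessFile f, int loc, int value) throws IOException {
        f.seek(byteOffset(loc));
        f.writeInt(value);
    }
}
```

With this helper, the initialization step would open the file with new RandomAccessFile(DataFileSketch.fileName("data", serverId), "rw").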
2 Client Program
Each client program creates K threads to simulate K clients. You need to run M instances of the
client program to form client groups. Each client opens the file “client.x” to read the requests it
is going to send, where x is the clientID. Each clientID consists of the group ID (all digits except
the last two) followed by the client replica ID within the group (the last two digits). The client
replica ID can be obtained from the command line when you run the client program. The group ID can
be assigned when you create the client threads. Note that the clientID x is written as 6 digits in
the file names (e.g., “client.000101”, “client.000302”).
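As an illustration of the clientID layout, the sketch below composes and splits an ID and builds the request file name. The class and method names are hypothetical; only the two-digit replica suffix and the 6-digit file name come from the text above.

```java
// Hypothetical helper; the names are illustrative and not part of the specification.
class ClientIdSketch {
    // The replica ID occupies the last two digits; the group ID is everything before them.
    static int clientId(int groupId, int replicaId) {
        return groupId * 100 + replicaId;
    }

    static int groupId(int clientId) {
        return clientId / 100;
    }

    static int replicaId(int clientId) {
        return clientId % 100;
    }

    // Builds names such as "client.000101": the clientID is padded to six digits.
    static String requestFileName(int clientId) {
        return String.format("client.%06d", clientId);
    }
}
```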
File “client.x” contains the requests to be sent to the server, one request per line. It has the
following format.
reqID command param1 param2
The “command” can be send, fail, malf, or stop. “send” means to send the request to the server; the
two parameters are the locations to be read and written, as discussed in the server program. “fail”
means the client fails, but fails cleanly: the client program simply terminates. We do not consider
recovery from failure here. “malf” means the client is failing maliciously: it sends data that
differs from what the other clients in the group send. “stop” means the client thread terminates
successfully.
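A request line in this format could be parsed as in the sketch below. The class name is hypothetical, and the sketch assumes that lines for fail, malf, and stop may omit the two parameters, in which case they default to 0; the specification does not say whether that happens.

```java
import java.util.Locale;

// Hypothetical parser for one line of client.x; not part of the specification.
class ClientRequestSketch {
    enum Command { SEND, FAIL, MALF, STOP }

    final int reqId;
    final Command command;
    final int param1;
    final int param2;

    ClientRequestSketch(int reqId, Command command, int param1, int param2) {
        this.reqId = reqId;
        this.command = command;
        this.param1 = param1;
        this.param2 = param2;
    }

    // Parses "reqID command param1 param2"; missing parameters default to 0 (assumption).
    static ClientRequestSketch parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 2) {
            throw new IllegalArgumentException("malformed request: " + line);
        }
        int reqId = Integer.parseInt(t[0]);
        Command cmd = Command.valueOf(t[1].toUpperCase(Locale.ROOT));
        int p1 = t.length > 2 ? Integer.parseInt(t[2]) : 0;
        int p2 = t.length > 3 ? Integer.parseInt(t[3]) : 0;
        return new ClientRequestSketch(reqId, cmd, p1, p2);
    }
}
```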
When all client threads terminate, the client program should terminate. Your program should
handle proper termination.
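One way to get clean termination is to start the K threads and then join each of them, so the program exits only after every client thread has finished. The sketch below is an assumption about structure, not a required design; the class name and the idea of passing the thread index to the client body are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

// Hypothetical launcher; not part of the specification.
class ClientLauncherSketch {
    // Starts k client threads and blocks in join() (no busy waiting) until all have finished.
    // clientBody receives the thread index, which a real client could map to its group ID.
    static int runClients(int k, IntConsumer clientBody) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        AtomicInteger finished = new AtomicInteger();
        for (int i = 0; i < k; i++) {
            final int index = i;
            Thread t = new Thread(() -> {
                try {
                    clientBody.accept(index);
                } finally {
                    finished.incrementAndGet(); // counted even if the body throws
                }
            }, "client-" + index);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return finished.get();
    }
}
```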
[Figure: system architecture diagram showing client groups C1, C2, and C3, FT service layers F2 and F1, and server groups S1 and S2, with four replicas drawn per group.]
Please note that in this project, we only consider one server group. The diagram contains a
second server group which is used to show that there can be another server group and the idea is
just the same.
Instead of sending a request directly to the server, the client calls its local FT service to invoke
ProcessReq. In the client stub, you need to use multiple threads to send the N requests to the N
servers concurrently, so you will get N responses from the N servers. You need to do majority
voting (more than half of the N responses must be received and must be identical) to get the final
response, and then pass the final response to the actual client routine. After getting the response,
let your client stub print out the number of timeouts and the number of incorrect requests,
and let your client print out the correct response.
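The voting step can be sketched as below. This is a minimal sketch under stated assumptions: the class name is hypothetical, a timed-out server is represented by a null entry, and responses are compared with equals().

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical voting helper for the client stub; not part of the specification.
class MajorityVoteSketch {
    // responses holds one entry per server; null marks a server that timed out (assumption).
    // n is the total number of servers N. A value wins only if more than n/2 servers returned it.
    static <T> Optional<T> majority(List<T> responses, int n) {
        Map<T, Integer> counts = new HashMap<>();
        for (T r : responses) {
            if (r == null) {
                continue; // timeouts never count toward a majority
            }
            int c = counts.merge(r, 1, Integer::sum);
            if (2 * c > n) {
                return Optional.of(r);
            }
        }
        return Optional.empty();
    }

    // Counts the servers that timed out.
    static <T> int timeouts(List<T> responses) {
        int count = 0;
        for (T r : responses) {
            if (r == null) {
                count++;
            }
        }
        return count;
    }
}
```

Comparing against N rather than against the number of responses received means that timeouts can prevent a majority, which matches the requirement that more than half of the responses be both received and identical.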
One fundamental problem with request handling with replication is the problem of
synchronization. Each replica server should run the requests in the same order in order to
maintain consistency. Thus, each FT service for the server side should queue the requests it
receives and check with each other to determine an agreed-upon order for processing. Since each
node will receive the same request sent from multiple client nodes, a majority voting is
performed on these requests before it can be considered for request ordering. The FT service
implements the request queue mechanism as follows.
The client requests may not come in order. Assume that M=5 (so majority is 3). You may receive,
for example, one request from client group 2, then one from group 3, then one from group 2, then
one from group 1, then one from group 3, then one from group 2. So, now you have a majority of
requests from group 2 and the request can be placed in Ready queue. Since client requests can
come in any order, you need to implement a data structure to store the client requests you
received. The data structure should have (groupID, reqID) as the key and message content as the
body. When a new request with, for example, (clientID=000203, reqID=100) is received, then
the message body is entered in entry (groupID=0002, reqID=100). You need to check whether
the message body is consistent with other message bodies already there. You also need to update
“the number of requests received” and “the number of consistent requests”. If there is already a
majority of consistent requests for entry (groupID=0002, reqID=100), then the request is entered
into the Ready queue. If the number of requests received equals M (including the timed-out ones),
then the entry can be deleted. After you receive and process a request, print out the related
information in the entry (number of requests received, number of consistent requests, etc.).
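A table of this kind might look like the sketch below. It is an illustration under assumptions: the class name, the "group:reqID" string key, and the use of a String as the message body are all hypothetical, "consistent" is taken to mean the size of the largest set of identical bodies, and timed-out requests are not modeled.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical request table keyed by (groupID, reqID); not part of the specification.
class RequestTableSketch {
    static final class Entry {
        final Map<String, Integer> bodyCounts = new HashMap<>(); // occurrences of each body
        int received;   // number of requests received
        int consistent; // size of the largest set of identical bodies
        boolean queued; // true once the request was placed in the Ready queue
    }

    private final int m; // number of client replicas per group (M)
    private final Map<String, Entry> table = new HashMap<>();
    final Queue<String> ready = new ArrayDeque<>();

    RequestTableSketch(int m) {
        this.m = m;
    }

    // Records one request and moves it to the Ready queue once a majority of bodies agree.
    synchronized Entry receive(int clientId, int reqId, String body) {
        int groupId = clientId / 100; // drop the two-digit replica ID
        String key = groupId + ":" + reqId;
        Entry e = table.computeIfAbsent(key, k -> new Entry());
        e.received++;
        int same = e.bodyCounts.merge(body, 1, Integer::sum);
        e.consistent = Math.max(e.consistent, same);
        if (!e.queued && 2 * e.consistent > m) {
            e.queued = true;
            ready.add(key);
        }
        if (e.received == m) {
            table.remove(key); // all replicas accounted for
        }
        System.out.println(key + " received=" + e.received + " consistent=" + e.consistent);
        return e;
    }
}
```

Counting each distinct body separately, rather than comparing against the first body seen, keeps a malicious first arrival from skewing the consistent count.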
The argument “parameters” in QueueReq includes the original parameters “loc1” and “loc2”. It
should be a structure containing two integers. We use a single parameter to make the mechanism
general to any server side function (general function could have any number of parameters). But
in our system, there is only one case, which has two parameters.
ReadyReq ()
    -- get invoked periodically;
    synchronize all the requests in the queue with other servers;
    ExecuteReqList (list);

ExecuteReqList (list)
    -- list contains ordered requests that should be processed
    loop till list is empty
        remove the top request in the list from the list and from the queue;
        call the appropriate web service and obtain the response;
        provide the response to QueueReq;
        signal response ready;
    endloop;
Note that QueueReq is an external service function (to be invoked by the client). ReadyReq is an
internal function that should be invoked periodically. ExecuteReqList is also an internal function.
It is invoked by ReadyReq after ReadyReq decides the list of requests that are ready on all server
nodes and their consistent order.
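The hand-off between ExecuteReqList and a waiting QueueReq can be done without polling. The sketch below uses a CompletableFuture per request as the "response ready" signal. It is one possible design, not the required one: the class name, the string request keys, the Integer response type, and the webService function standing in for the real web service call are all assumptions.

```java
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical hand-off between ExecuteReqList and QueueReq; not part of the specification.
class ExecuteReqListSketch {
    private final Map<String, CompletableFuture<Integer>> pending = new ConcurrentHashMap<>();

    // Called from the QueueReq side: returns a handle the caller can block on.
    CompletableFuture<Integer> register(String key) {
        return pending.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // Processes the ordered list until it is empty; webService stands in for the real WS call.
    void executeReqList(Deque<String> list, Function<String, Integer> webService) {
        while (!list.isEmpty()) {
            String key = list.pollFirst();                 // remove the top request
            CompletableFuture<Integer> waiter = pending.remove(key);
            if (waiter == null) {
                continue;                                  // nobody is waiting for this request
            }
            Integer response = webService.apply(key);      // call the web service
            waiter.complete(response);                     // signal response ready
        }
    }
}
```

On the QueueReq side, the caller would block in get(timeout, TimeUnit.MILLISECONDS) on the returned future, which waits without spinning and raises TimeoutException when the period expires.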
If the FT service is the leader, then it should have a thread to periodically invoke its ReadyReq
function, which in turn, invokes the ReadyReq functions of other FT services. The period for
invoking the ReadyReq is TR, which is defined in config.sys file and it is given in milliseconds.
If the FT service is not the leader, then it should have a thread to periodically check whether the
ReadyReq is invoked properly.
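The periodic invocation can be driven by a scheduler thread rather than a sleep loop. The following sketch assumes the period TR has already been read from config.sys; the class and method names are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical timer for the leader's ReadyReq invocation; not part of the specification.
class ReadyReqTimerSketch {
    // Runs readyReq every trMillis milliseconds on a single dedicated thread.
    // Note: if readyReq throws, scheduleAtFixedRate cancels all later runs,
    // so a real ReadyReq should catch its own exceptions.
    static ScheduledExecutorService start(Runnable readyReq, long trMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(readyReq, trMillis, trMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

Calling shutdownNow() on the returned executor stops the timer, for example when the FT service enters failed mode.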
Each Web service may fail independently. You need to implement a function in the FT protocol
layer to simulate server failure. The function is:
Failed FT service
    QueueReq
        -- Still receive the requests, but simply discard them
    ReadyReq
        -- If it is invoked, simply ignore the invocation
           If this FT service is the leader, then it will no longer invoke others
    ExecuteReqList
        -- Will stop functioning, not calling its corresponding WS
           If it is half way of invoking a WS, then finish the invocation,
           but do not forward the response to the client
In this project, we do not consider failure recovery (it would be too complicated). So each
FT service, after going into “Failed” mode, simply stops functioning. But to keep matters simple
and avoid problems on the message sender side, the failed FT service continues to accept
messages and simply discards them. The caller will get a timeout in this case.
Malicious FT service
    QueueReq
        -- Remains the same
    ReadyReq
        -- Will generate incorrect list and send to others, you need to do the incorrect list generation
           and print out what you generated
    ExecuteReqList
        -- Will forward the client requests to the WS like before
           But after receiving WS responses, mess it up, and send to the client
           You need to print out what you sent to the client
We consider a fixed duration for malicious failures. All activities during malicious failures
remain the same, except that the messages sent out by a malicious FT service may be
inconsistent with others. Once the duration is over, the malicious service will stop acting
maliciously and send out the correct messages like a normal FT service.
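The two failure modes can be tracked with a small state object that QueueReq, ReadyReq, and ExecuteReqList consult before acting. The sketch below is an assumption about structure: the class, the method names, and the injectable millisecond clock (used here so the fixed duration can be tested) are hypothetical.

```java
import java.util.function.LongSupplier;

// Hypothetical failure state for one FT service; not part of the specification.
class FailureModeSketch {
    enum Mode { NORMAL, FAILED, MALICIOUS }

    private final LongSupplier clock; // current time in milliseconds
    private Mode mode = Mode.NORMAL;
    private long maliciousUntil;

    FailureModeSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // Enters failed mode permanently; there is no recovery.
    synchronized void fail() {
        mode = Mode.FAILED;
    }

    // Enters malicious mode for a fixed duration; ignored if the service has already failed.
    synchronized void actMaliciously(long durationMillis) {
        if (mode == Mode.FAILED) {
            return;
        }
        mode = Mode.MALICIOUS;
        maliciousUntil = clock.getAsLong() + durationMillis;
    }

    // Returns the current mode, reverting to NORMAL once the malicious duration is over.
    synchronized Mode current() {
        if (mode == Mode.MALICIOUS && clock.getAsLong() >= maliciousUntil) {
            mode = Mode.NORMAL;
        }
        return mode;
    }
}
```

In production code the clock argument would typically be System::currentTimeMillis.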
The FailureSimulator function will be called by a separate client, the control client. The control
client may call FailureSimulator of any FT service to simulate failure of the corresponding WS.
The control client may also call the print function in each WS directly to obtain the current state
information (the data file content) of the WS.
To simplify deployment, the FT service functions will be deployed together with the
original web service functions “ProcessReq” and “print” within one web service. You need to
clearly print out status information (needed for us to check the correctness of your program).
Note that the log file name is “log.x” for the web services with serverID = x.
Leader FT service:
    send synchronization requests to the followers;
    receive the lists of ready requests from all FT services or time out;
    merge the lists by selecting the minimal set;
    send the final list to all other FT services;
We assume that the leader does not fail, so no timeout is needed when waiting for a ReadyReq
invocation. But there is a timeout for the leader to receive the lists, and the timeout period is
TL. Please print out the list sent out by each server as well as by the leader.
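One reading of "selecting the minimal set" is to keep only the requests that appear in every list received, in the order of the leader's own list. The sketch below implements that reading and nothing more: the interpretation, the class name, and the string request keys are assumptions, and lists from followers that timed out are simply left out of the input.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical merge step for the leader; the intersection rule is an assumed interpretation.
class LeaderMergeSketch {
    // Keeps the requests present in the leader's list and in every follower list received,
    // preserving the order of the leader's list.
    static List<String> merge(List<String> leaderList, Collection<List<String>> followerLists) {
        Set<String> common = new LinkedHashSet<>(leaderList);
        for (List<String> followerList : followerLists) {
            common.retainAll(followerList);
        }
        return new ArrayList<>(common);
    }
}
```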
For the first phase, we have N = M = 1, but the entries will be there. Correspondingly, the file
only contains one server URL.
5 Project Specifics
This project consists of three phases.
In the second phase, you leave out the request ordering (synchronization) process. Each FT service
invokes its own ReadyReq periodically and ReadyReq simply calls ExecuteReqList. You can use
a separate thread to do the time keeping and ReadyReq invocation. Also, in this phase,
FailureSimulator for regular failure in ReadyReq is simple. ReadyReq will simply not be
invoked.
In the system, there will be M*K clients, generated from M client programs. Each client has its
own input file. There are M*K input files. For each client group, the requests are the same unless
there are failures. So, in each client group, M files are the same, except when there are failures.
Each server group has N server replicas. In our system, there is only ONE server group. Each
server replica should be deployed as a web service. This means you need to deploy N web
services. They are identical, except for their ServiceID. ServiceID is used only for the file names
(log.x and data.x). You do not need to use ServiceID when you deploy your service replicas on
different machines.
Make sure your program does not use busy waiting anywhere, not for any of the timeout
situations, and especially not for the response in QueueReq. Also make sure your program does not
use sockets anywhere. We will check your source code for these violations.
5.3 Third Phase
Implement the leader based synchronization protocol for request ordering. The synchronization
protocol should be implemented using sockets.
In Phase 2, every node has one thread that keeps a timer and invokes ReadyReq when the timer
expires. In Phase 3, only the leader has the timer thread to invoke ReadyReq. For the followers,
the Phase 2 timer thread has to be changed: it should become a socket receive that waits for the
leader's command.
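For the socket-based exchange, the ready lists need some wire format. The sketch below shows one simple framing (an element count followed by the entries) that works over any InputStream and OutputStream, including those of a java.net.Socket. The format, the class name, and the string request keys are assumptions; the specification does not prescribe a message layout.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wire format for a list of ready requests; not part of the specification.
class SyncMessageSketch {
    // Writes the number of entries followed by each entry as a UTF string.
    static void writeList(OutputStream out, List<String> keys) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(keys.size());
        for (String key : keys) {
            data.writeUTF(key);
        }
        data.flush();
    }

    // Blocks until a complete list has arrived; no timer or polling is involved.
    static List<String> readList(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int count = data.readInt();
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            keys.add(data.readUTF());
        }
        return keys;
    }
}
```

A follower thread could call readList(socket.getInputStream()) in a loop, which blocks until the leader sends the next message and so takes the place of the Phase 2 timer.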
6 Bonus Project
In the bonus project, you need to explore different designs for fault tolerance web services and
compare their performance. You need to submit a report on the implementations and the
performance curves. You need to explore various parameter settings for performance study.
We will only grade fully working bonus projects. Partially working systems will not be considered.
The grading is based on the quality of the design and the performance studies, etc.
You will get up to 6 points added to your final grade for the bonus project.
6.1 Protocols
You need to consider three protocols: two that you have already implemented, and a third one,
RobustFT, that you implement for the bonus project.
NoFT
This is the same as the first phase, N=1, M=1. But K can be large.
SimpleFT
Use the leader based protocol for synchronization. All are the same as what you have already
implemented for FT service.
RobustFT
This one considers malicious faults. For example, in the leader-based approach, the leader can
send different lists to the followers and make all the replicas inconsistent. This can happen if
one of the FT services is compromised by an attacker. Thus, the final solution is to run a
Byzantine agreement protocol among the services to guarantee agreement among the
working FT services.
7 Handin Procedure
You need to electronically submit your program before midnight of the due date (check the web
page for due dates). Follow the instruction below for submission.
Prepare a directory that only contains:
All source code files that make up your solution to this assignment.
A README file that explains how to run your program.
The DesignDoc file that describes the major features of your program that are not
specified in the project specification, including all problems your program may have, all design
decisions that deviate from the specification, and all additional features you implemented.
Zip the directory and submit your code through WebCT.
Sign up for project demo.