4944 A 391

Trusted Sampling-Based Result Verification on Mass Data Processing
Yan Ding
1
, Huaimin Wang, Peichang Shi, Hongyi Fu
National University of Defense Technology
Changsha, China
1
yanding@nudt.edu.cn
Changguo Guo, Muhua Zhang
Chinese Electronic Equipment System Corporation Institute
Beijing, China

AbstractComputation integrity is difficult to verify when
mass data processing is outsourced. Current integrity
protection mechanisms and policies verify the results
generated by participating nodes within a computing
environment of service providers (SP), which can not
preventing the subjective cheating of SPs. This paper provides
an analysis and a modeling of computation integrity for mass
data processing services. A third-party sampling-result
verification method called trusted sampling-based third-party
result verification (TS-TRV) is proposed to prevent lazy
cheating by SPs. TS-TRV is a general solution for common
computing jobs and uses the powerful computing capability of
SPs to support verification computing, thus lessening the
computing and transmission burden of the verifier. A series of
simulation experiments and theoretical analysis indicates that
TS-TRV is an effective method of detecting the cheating
behavior of SP while ensuring the authenticity of sampling.
Compared with the transmission overhead of nave sampling
verification, which is O(N), the network transmission overhead
of TS-TRV is only O(logN). TS-TRV efficiently solves the
verification problem of the intermediate results in
MapReduce-based mass data processing.
Keywords- result verification, mass data processing,
MapReduce, trusted sampling, Merkle tree
I. INTRODUCTION
Cloud computing is currently the focus of much
research. The key feature of cloud computing, which is
based on servitization [1], provides remarkable convenience
to users. However, issues arise with outsourcing service
mode; that is, the control of resources and processing shifts
from users to cloud service provider (SP), which results in
uncontrollable service processing and hard-to-verify results.
As a result, the integrity of services is technically assured by
the SP. However, in reality, cloud computing is charged on
demand; thus, profit-driven SPs, such as data storage
[2][3], large-scale computing outsourcing [4][5], and so on,
may degrade service quality and reduce the size of the
computing problem.
Currently, mass data processing techniques are in
demand both in research and business domains. Numerous
data analysis and processing services are provided to
accommodate various requests. Actual service modes
include processing and analyzing specified data provided by
users, as well as data analyzing services for open data
platforms. Considering that concrete data processing is
accomplished by the SP, users cannot identify whether data
processing is done completely. Users do not have enough
computing resources to verify the authenticity of the results
because of the huge data amount of the computing problem.
Thus, SPs are able to cut corners in computing and cheat the
results to make profit. In practice, some doubts surround
data analysis results provided by inexpensive and minor
data analysis agents. Thus, verifying the computation
integrity of mass data processing is necessary to guarantee
service quality and to protect user interests.
In practical mass data processing techniques, the large
amount of computed data makes the traditional relational
data management unsuitable. MapReduce [6], which is a
new parallel programming framework, dynamically
organizes a large amount of nodes into a computing
environment and uses the principle of parallel computing to
fulfill mass data processing tasks. MapReduce has been
adopted by Google, Yahoo!, Amazon, and Facebook, and
has become the dominant mass data processing technique.
Thus, applying MapReduce for computation integrity
verification is practical in solving the service integrity issue
of mass data processing. Currently, researches on result
verification of mass data processing are mainly focused on
maintaining the integrity of results generated by
participating computing nodes in a computing environment.
However, in cloud computing, trust between the user and
the SP is a major factor considered by the user when
choosing services. Therefore, from the viewpoint of users,
verifying the computing result in mass data processing as
well as the computation integrity of the SP is important.
To address this challenge, this paper proposes the trusted
sampling-based third-party result verification method (TS-
TRV). By sampling the MapReduce intermediate results, we
can verify whether user data are processed completely in
map phase. TS-TRV utilizes the Merkle tree [7] to organize
the intermediate results of the SP for verification, thereby
guaranteeing the authenticity of sampling and decreasing the
overhead of result submission. Theory analysis and
simulation experiments showed that the communication
overhead of TS-TRV is O(logN), whereas that of the
common sampling techniques is O(N). The computational
overhead of verification is mainly distributed on the side of
the SP. Thus, the verifier cuts computing and network
transmission costs, which results in more flexible computing
environment requests to the verifier, in compliance with the
cloud computing principle, which states that computing
should be concentrated on the cloud side.
The paper is organized as follows. Section 2 introduces
2013 IEEE Seventh International Symposium on Service-Oriented System Engineering
978-0-7695-4944-6/12 $26.00 2012 IEEE
DOI 10.1109/SOSE.2013.65
391
related research in this domain. Section 3 defines the
MapReduce computing model and the cheating model, and
Section 4 presents TS-TRV. Section 5 presents the analysis
and experimental evaluations as well as the comparison of
TS-TRV with similar works. Section 6 concludes this paper
and presents our future research focus.

II. RELATED WORKS
The result verification of outsourcing is an emergent
topic along with distributed computing modes. Given that
computational jobs are dispatched to participation nodes,
which cannot be controlled by the job manager, the
computing results submitted by nodes need to be verified
before use. Result verification comprises three categories of
techniques. First, replication and voting features for
redundant computing are applied such that multiple
computing nodes will perform the same job; the result is
accepted when it is submitted by more than half of the total
nodes [8]. Second, sampling techniques address the resource
cost of replication, involving result-based sampling [9] and
test job injection sampling [10]. With sampling techniques,
computation results are verified and trusted with a certain
probability. Finally, checkpointing deals with result
verification for sequential computation [11]. Computation is
divided into several time slices. At the end of each slice, the
checkpoint is obtained to partially verify the computing
result.
Currently, research studies concerning result verification
for mass data processing of MapReduce focus on the
computation integrity of the inner nodes in the MapReduce
computing environment. Wei Wei et al. worked on the
verification problem of computation results in an open
MapReduce environment [12]. According to them, a
computation result generated by participation nodes from
different resource owners may not be trusted. Thus, they
proposed an integrity protection mechanism called
SecureMR, which uses two-copy replication to verify the
result in the map phase. Results can be submitted to the
reduce phase only if the results of all copies are the same.
SecureMR aims for 100% detection rate. However, this
method has increased computational costs and cannot cope
with collusion. Using this finding as basis, Yongzhi Wang
and his colleagues worked against the collusion problem
[13]. They introduced the verifier role in the MapReduce
computing model. Computation results undergo replication
verification, are sampled, and are then recomputed by the
verifier, thus solving the collusion problem to a certain
extent. However, this method is based on the assumption
that the verifier is absolutely trusted. Thus, the verifier
becomes a system bottleneck. Z. Xiao et al. worked on
result cheating which is caused by network attack on the
working nodes in the MapReduce platform

[14]. They used
a set of trusted auditing nodes to record the result generated
by various phases of MapReduce. The cheating nodes can
be located by recomputing the results. Considering the
credibility of objects, Chu Huang et al. proposed a
watermark injection method to verify if the submitted
results are completed correctly [15]. The watermarks used
for verification are inserted randomly into the job before the
job is submitted by the user. After the result is submitted,
the watermarks are first checked to determine whether they
are correctly processed. If they are, the integrity request is
assumed to be met with a certain probability. This solution
is effective in text processing jobs, which utilize substitute
encryption to generate watermarks. However, creating
watermarks is difficult for jobs that are difficult to predict,
such as statistics.
Current studies on result verification for MapReduce
focus on the inner computing environment. It is an urgent
need to provide simple and efficient computation integrity
verification for common MapReduce computing results from
the viewpoint of users. Considering the characteristic of
mass data processing jobs, verifying a whole task via
replication induces unacceptable computational overhead if
the result of the whole MapReduce framework is verified.
Thus, decreasing the overhead induced by verification
becomes a vital factor to consider.
III. RESULT CHEATING PROBLEM ON MASS DATA
PROCESSING
A. MapReduce programing model
The MapReduce programming model consists of a
single master node (job tracker) and several slave nodes
(task tracker). By taking Hadoop
[16]
as a sample
implementation, we can illustrate the MapReduce
programming model as follows.

Figure 1. Illustration of the MapReduce programming model.
As shown in Fig. 1, the MapReduce process can be
divided into two phases: map and reduce. First, during the
map phase, the input is partitioned into m splits, which are
independent from each other. The master node dispatches
these splits to several worker nodes to do parallel map
operations. These workers are called mappers. During
execution, each mapper deals with one split, casts map
operation on all input keyvalue pairs, and saves the result
on the local node. The computational result in this phase is
called intermediate result. When map computation is
completed, all intermediate results are partitioned into r
different parts according to their keys, and every partition is
assigned to a worker node to cast reduce operation; this type
of worker is called a reducer. In the reduce phase, each
392
reducer reads partitioned intermediate result from all
necessary mapper nodes and casts reduce operation on the
mapper nodes to obtain a final result, which is then stored
on the distributed file system.
On the basis of the MapReduce principle, we construct
definitions of the MapReduce programming model as
follows:
Definition 1 (MapReduce programming model).
Given a problem with an input set of D={x
1
, ..., x
n
}, where
the type of x
i
is in the <key1, value1> keyvalue pair
format. The computation procedure includes two phases,
and the results form a set R.
z Map phase
In this phase, the computation of each input keyvalue
pair is independent from each other. Thus, a map function
can be regarded as one-to-one mapping and denoted as f(x).
Hence, the computation in map phase can be expressed as
follows: for all x
i
D, y
i
= f(x
i
) is computed, where f is a
user-defined map function. The intermediate result set is Y
= {y
1
, ..., y
n
}, where y
i
is in the <key2, value2> keyvalue
pair format.
In a real process, job D is divided into m sets denoted by
D
i
, where i[1, m]. Each D
i
is dispatched to a mapper, and
a total of m mappers complete computation in parallel. The
computation results generated by all mappers form sets Y
i
,
where i[1, m] and Y=Y
1
... Y
m
.
z Reduce phase
Reduce computation is completed by reduction of results
generated in the map phase. Thus, the reduce function can
be regarded as mapping from the intermediate result set Y
to the final result R. Therefore, the computation in reduce
phase can be expressed as R=g(Y), where g is a user-
defined reduce function, and the result set R={r
1
,,r
s
},
where r
i
is in <key3, value3> keyvalue pair format.
In a real process, when the value of key2 of y
i
, Y is
divided into several sets,
, where i[1, r], each
gets a
reducer, the r reducers work in parallel and produce the
computation result set R
i
, and R=R
1
... R
m
.
B. Cheating model
During MapReduce data processing, the SP can gain
business profits by conducting partial computation to save
computational costs. The SP can also use a false result,
thereby confusing the user. Thus, computational integrity is
compromised. Based on motivation, the possible cheating
behavior of the SP is categorized into two:
z Lazy cheating
Following this model, the SP may only perform part of
the computation task and use the partial computing result as
a substitute of the real result, which must be generated by
performing all necessary computations. Thus, computational
cost is lowered and extra profits are gained. Cheating has
three types according to the period when the cheating
occurred:1) Cheating in map phase: The SP does the actual
computing only on part of the input and performs cheaper
computing
(x
i
) (for x
i

) on other parts of the input,

thus saving computational costs. ;2) Cheating in reduce
phase: The SP pretends to perform reduction on a part of the
intermediate result set. 3) Both 1) and 2) happened.
z Malicious cheating
As the name of this model suggests, the SP may cast any
malicious actions to present wrong results intensively, thus
confusing the user.
Based on the models discussed above, this paper
addresses the cheating in map phase issue in the lazy
cheating model and designs a trusted sampling-based third-
party result verification method named TS-TRV. TS-TRV
can verify the integrity of the results provided by the SP in
the map phase.
IV. TRUSTED SAMPLING-BASED RESULT VERIFICATION
A. System Model
TS-TRV is composed of the SP, user, and a third-party
verification mechanism (verifier). SP provides its
computation capability of mass data processing as a service
format, and the data to be processed may come from a user
or from open resources, such as open data platforms.
According to the data processing demand, the user commits
computing tasks to the SP and receives the result after
processing. The verifier acts on behalf of the user,
supervises the data processing performed by the SP, and
verifies whether the computation integrity is satisfied by the
result.
The proposed method involves the following procedures
(see Fig. 2): TS-TRV begins in the task commission phase,
in which SP obtains the task-relevant data input from users
or other data sources. Afterward, task execution begins. The
SP processes the data and returns the result to the user. In
the result verification phase, the user provides the input data
and descriptions of the task for verification. The verifier
performs an interactive check with SP to determine whether
computation integrity is achieved and then sends the
verification result to the user.
C
o
n
c
lu
s
io
n

o
f

v
e
r
if
ic
a
t
io
n
In
te
r
a
c
tiv
e

V
e
r
ific
a
tio
n

Figure 2. System model of TS-TRV.
According to the above model, this third-party
verification mechanism compensates for the computation
inefficiency of the user. However, compared with cloud
computing SP, the computation capability of third-party
verifiers is still weak. Verification overhead must be
considered when designing an actual verification
mechanism.
B. Mechanism design
Computation of the SP consists of two phases: map and
reduce. In the map phase, the original input data are
393
processed. Verifying the computation integrity ensures that
all data are processed on mappers. Thus, TS-TRV needs to
verify the intermediate result generated in the map phase.
Given the huge amount of data, recomputing a whole
task is difficult. Therefore, TS-TRV performs random
sampling to improve verification performance. Under this
principle, recomputing can be performed only on the
random samples, thus avoiding a large amount of
computational overhead. A high level of confidence can be
achieved by choosing a small number of samples.
Verification aims to discover the false computation of the
SP; thus, the distribution of false computation has high
subjective content and is hard to predict. We use simple
random sampling to perform verification. According to the
system model introduced previously, the verifier samples
the intermediate result for verification and obtains the input
set of the user task as reference to verify the correctness of
the sample. The correctness of a sample determines whether
the sampled value supplied by the SP is identical to the
correct result.
During verification, the verifier will randomly choose
several samples to verify. If all samples are correct, then the
data pass verification; otherwise, the SP cheated. Assuming
that the size of the intermediate result set is N, the false
results rate generated by lazy cheating is 50%. If s values
are chosen for verification in the intermediate result set,
then the probability of successful detect cheating is 1
(1/2)
s
.
However, confirming whether the SP is honest in the
verification is difficult because verification is made against
the subjective cheating actions of the SP. If cheating really
exists and the verifier chooses x
i

, then the SP has

great intention to recompute and submit f(x
i
) as the sample
value to the verifier to pass verification. Therefore, the
authenticity of the sampled data must be ensured.
Therefore, the design target of sampling verification is to
check the correctness of a sample set on the premise of
ensuring the authenticity of samples. Distinguishing forged
samples is necessary. In a simple method known as
commitment-based sampling, the SP must submit all
intermediate results to the verifier before the verifier
performs sampling, which inducing heavy network transfer
overhead of O(N).
In grid computing, to lower the data transfer overhead of
result verification, [9] proposed a sampling verification
method based on the Merkle tree, by which the network cost
of result commission is decreased to O(logN). The proposed
method organizes the results of all inputs into a Merkle tree
and submits the value of the root node in the tree for
verification. Because of the one-way feature of hash
functions, all elements in the result set must not be changed
after submission. This method deals with result verification
for a single node, whereas MapReduce is needed for
multiple nodes. Using this method as a foundation, we
designed a Merkle tree-based intermediate result
verification mechanism for map nodes. When all map nodes
are finished with their computing, the following steps are
taken for result submission and verification.
Step 1: Result submission of SP
First, Merkle trees of each node are constructed. That is,
after map nodes finish computing, a Merkle tree is
constructed, taking all results as the leaf nodes. Assuming
that map node k has n inputs, every output value
corresponding to an input will act as a leaf node, and then
the intermediate nodes are generated level by level by using
the following rules until the root node is generated.
L
fx
i n
v hashv
hashv

where represents nodes in the tree, hash is the
specified one-way hash function which may be MD5 or
SHA1 or so, || stands for conjunction of two hashed value,
and R
is the root node. After the construction, the map

node sends R
to the master node.

Figure 3. Merkle tree structure of map nodes.
As shown in Fig. 3, assuming the current node is the
third map node, its corresponding leaf nodes in Merkle tree
are the hashed values of all results, f(x
i
). Any one of the
intermediate nodes is the conjunction of the hashed values
of its two leaf nodes. In this way, the value of the root node
R
3
,
is calculated and submitted to the master.

The global Merkle tree is then constructed. The master
builds the global Merkle tree of results by using the rules
shown in (1) and takes all root values submitted by map
nodes as leaf nodes of the global tree. The root value
of this global Merkle tree is sent to the third party for
verification.
Fig. 4 shows the global view of the nodes and the global
result of Merkle trees on SP.

Figure 4. Global Merkel tree.
Once the root node of the global Merkle tree is
submitted, no values of the leaf nodes can be changed
because of the characteristic of the hash function. In other
words, after the sampling positions are determined, the
394
values provided by the SP must match the ones used in
Merkle tree; otherwise, will not match and cheating is
exposed. Such an approach ensures the authenticity of the
samples.
Step 2: Verification challenge issued by the verifier
The verifier randomly picks up s inputs, {x
1
, ..., x
s
} in all
inputs involved in the job and sends their sequence numbers
to the SP.
Step 3: SPs response
The SP accepts the sampling challenge issued by the
verifier and constructs the response information. For every
challenging input x
i
, the input number included in the
challenge is used to locate the corresponding map node k. Its
result y
i
is obtained as well as all the sibling values

along the path from the y
i
s corresponding node to the root
node in ks Merkle tree, where l is the height of ks Merkle
tree. Then, in the master node, the values
along
the path formed by
s corresponding leaf node to the

root node in the global Merkle tree are obtained, where h is
the height of the global tree. Finally, the challenge response
information set is generated as
.
For example, the input x
6
of the third map node is
sampled. As shown in Fig. 1, the sample corresponds to L
6
in the Merkle tree of the third node. To build the root node
from L
6
, all necessary nodes which form the set {L
5
, A, B,
C} are needed. As shown in Fig. 2, the necessary nodes for
calculating the global root node in the global Merkle-tree
level are {R
4
, D, E}. Therefore, the verification information
set of this sampling is

After constructing response sets for all sampling
challenges, the SP sends them to the verifier.
Step 4: Result verification
For sample input x
i
, the verifier first calculates f(x
i
), then
compares f(x
i
) and y
i
. If they are not equal, the SP cheated.
Otherwise, the verifier uses
to build the global Merkle

tree and compares the root value of to as in
Step 1. A match indicates that the SP passes the current
challenge. If the SP passes all the challenges, the
verification is successful. If not, the computation is deemed
erroneous. After verification, the verifier returns the results
to the user.
V. PERFORMANCE ANALYSIS
This section will analyze and evaluate the effectiveness
of cheat detection and the transfer overhead of verification
information.
Cheating detection rate is the probability that the
verifier can detect the SPs computation error after a certain
number of samples if any errors are present in the results.
Assumed that the input scale of the job is N, and a total of m
map nodes participate in the computation. For simplicity, all
inputs are evenly dispatched to map nodes, all mappers have
identical probability of cheating (mapper cheating
probability), and all inputs in one cheating mapper have
identical probability of cheating (intra-node input cheating
probability). The mapper cheating probability is expressed
as p
1
, and intra-node input cheating probability is expressed
as p
2
. Based on these assumptions, the verifier samples the
input set of the job and total s samples are chosen. For one
sampling, the probability of revealing cheating is
.
For s samplings, the cheating detection rate of the
verification is
Pi p
(2)
The simulation environment will then be constructed on
Matlab to test the cheating detection rate of TS-TRV. In this
case, the input size is N=150,000 and the number of mapper
nodes is M=150. Therefore, each mapper deals with 1000
inputs. First, errors are injected according to the different
mapper cheating probability p
1
and intra-node cheating
probability p
2
, and then sampling tests are conducted on the
result. To reduce the effect on the test result induced by
random factors for every similar parameter configuration,
the test is repeated 200 times, and the mean values are taken
as the final result.
Fig. 5 shows that a fixed number of samples affects the
cheating detection rate according to different p1 and p2. In
Figs. 5-a and 5-b, 100 and 300 samples were selected,
respectively, and a test was conducted to determine how
cheating detection rate follows p1 and p2 changes. When the
number of samples is 100, 15.3% of all 2601 dots have a
detection rate lower than 95%, and 77.32% of the results are
higher than 99%. When the number of samples increases to
300, only 8.3% dots have a detection rate lower than 95%,
and 87.93% of the results are higher than 99%. According to
formula (2), the effect of mapper cheating probability p1 and
intra-node cheat probability p2 are equivalent to the cheating
detection rate. However, in the experiment, errors are
introduced on specific mapper cheating probability. When
the number of mappers is rather small, the error injection is
obviously affected by randomicity. As a result, the cheating
detection could not be improved by p2s increase when p1 is
very small. When the number of mappers become larger than
a certain value (N>=100), system cheating detection rate will
follow the theoretical analysis of formula (2).

0
0.2
0.4
0.6
0.8
1
0
0.2
0.4
0.6
0.8
1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
p1
a. Sample Number = 100
p2
C
h
e
a
t D
e
te
c
tio
n
R
a
te
395

Figure 5. Relationship between cheating detection rate and cheating
probability.
For commitment-based sampling method, the time spent
on network transmission is mainly used for passing all
intermediate results to verifier. The concrete sampling and
verification steps are all done solely by the verifier, and no
interaction with SP is needed. Hence, if the input size is N,
the transfer overhead is around O(N). As to TS-TRV during
result submission, only the root value of Merkle tree is
transferred. The network overhead is barely O(1). In
addition, after sampling, the SP needs to transfer the
verification information set. If the input size is N, the
number of samples is s, and the overhead of network
transfer is O(logN).
VI. CONCLUSIONS
This paper provided an analysis and modeling of
computing integrity for mass data processing services. To
handle the lazy cheating model of SPs, we proposed a third-
party sampling result verification method called TS-TRV.
The cheating detection rate and the performance of TS-TRV
are both analyzed and simulated in an experimental
environment. The result shows that TS-TRV achieves a high
cheating detection rate and low network transfer overhead.
Its verification cost is concentrated on the SPs side, thereby
reducing the computing and transmission burden of the
verifier. In our future work, we will address the
computational integrity issue in the reduce phase in the
MapReduce framework.

ACKNOWLEDGMENT
This work was supported by the National Basic Research
Program of China under Grant No.2011CB302600, the
National Natural Science Foundation of China under Grant
No. 61161160565, the HGJ Major Project of China under
Grant No. 2012ZX01040001 and the Fund No. KJ-12-06.

REFERENCES
[1] D. G. Feng, M. Zhang, Y. Zhang and Z. Xu, Study on Cloud
Computing Security, Journal of Software, Vol. 22, Jan. 2011, pp.71-
83(in Chinese), doi: 10.3724/SP.J.1001.2011.03958.
[2] A. Juels and B. S. Kaliski, Pors: Proofs of retrievability for large
files, Proc. the 14th ACM Conf. on Computer and Communications
Security (CCS 07), ACM Press, Oct. 2007, pp.584-597,
doi:10.1145/1315245.1315317.
[3] G. Ateniese, R. Burns, R. Curtmola, R. Curtmola, J. Herring, L.
Kissner and et al, Provable data possession at untrusted stores,
Proc. the 14th ACM Conf. on Computer and Communications
Security (CCS 07), ACM Press, Oct. 2007, pp. 598-609, doi:
10.1145/1315245.1315318.
[4] C. Wang, K. Ren, J. Wang and K. M. R. Urs, Harnessing the Cloud
for Securely Outsourcing Large-scale Systems of Linear Equations,
Proc. the 31st International Conference on Distributed Computing
Systems (ICDCS 11), IEEE Press, Jun. 2011, pp 549- 558,
doi:10.1109/ICDCS.2011.41.
[5] C. Wang, K. Ren and J. Wang, Secure and Practical Outsourcing of
Linear Programming in Cloud Computing, Proc. 2011 Proceedings
IEEE INFOCOM, IEEE Press, Apr. 2011, pp. 820-828,
doi:10.1109/INFCOM.2011.5935305.
[6] J. Dean and S. Ghemawat, Mapreduce: simplified data processing on
large clusters, Proc. the 6th conference on Symposium on Opearting
Systems Design & Implementation (OSDI 04), USENIX Association
Berkeley, Mar. 2004. pp. 10-10.
[7] R. Merkle. Secrecy, authentication, and public key systems. PhD
thesis, Electrical Engineering, Stanford University, 1979.
[8] M. Taufer, D. Anderson, P. Cicotti, C. Brooks III, Homogeneous
redundancy: A technique to ensure integrity of molecular simulation
results using public computing, Proc. the 19th IEEE International
Parallel and Distributed Processing Symposium (IPDPS 05),
Workshop 1, IEEE Press, Apr. 2005, pp. 119a, doi:
10.1109/IPDPS.2005.247.
[9] W. Du, J. Jia, M. Mangal, M. Murugesan, Uncheatable grid
computing, Proc. the 24th International Conference on Distributed
Computing Systems (ICDCS 04), IEEE Press, Mar. 2004, pp. 4-11,
doi: 10.1109/ICDCS.2004.1281562.
[10] S. Zhao, V. Lo and C. G. Dickey, Result Verification and Trust-
Based Scheduling in Peer-to-Peer Grids, Proc. the 5th IEEE
International Conference on Peer-to-Peer Computing (P2P 05), IEEE
Press, Aug.2005, pp. 31-38, doi: 10.1109/P2P.2005.32.
[11] F. Monrose, P. Wycko and A. Rubin, Distributed execution with
remote audit, Proc. the Network and Distributed System Security
Symposium (NDSS 99), Internet Society, Feb. 1999, pp. 103-113.
[12] W. Wei, J. Du, T. Yu, X. Gu, SecureMR: A service integrity
assurance framework for mapreduce, Proc. the 25th Annual
Computer Security Applications Conference (ACSAC 09), IEEE
Press, Dec. 2009, pp. 73-82, doi: 10.1109/ACSAC.2009.17.
[13] Y. Wang and J. Wei, VIAF: Verification-based Integrity Assurance
Framework for MapReduce, Proc. IEEE International Conference on
Cloud Computing (Cloud 11), IEEE Press, Jul. 2011, pp. 300-307,
doi:10.1109/CLOUD.2011.33.
[14] Z. Xiao, Y. Xiao, Accountable MapReduce in cloud computing,
Proc IEEE Conference on Computer Communications Workshops
(INFOCOM WKSHPS 11), IEEE Press, Apr. 2011, pp. 1082 1087,
doi: 10.1109/INFCOMW.2011.5928788.
[15] C. Huang, S. Zhu and D. Wu, Towards Trusted Services: Result
Verification Schemes for MapReduce, Proc the 12th IEEE/ACM
International Symposium on Cluster, Cloud and Grid Computing
(CCGrid 12), IEEE Press, May 2012. pp. 41-48, doi:
10.1109/CCGrid.2012.77.
[16] Hadoop, Apache hadoop. Available:http://hadoop.apache.org

0
0.2
0.4
0.6
0.8
1
0
0.2
0.4
0.6
0.8
1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
p1
b. Sample Number = 300
p2
C
h
e
a
t D
e
te
c
tio
n
R
a
te
396

4944 A 391

Transféré par

Informations du document

Titre original

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

4944 A 391

Transféré par

Droits d'auteur :

Formats disponibles

Trusted Sampling-Based Result Verification on Mass Data Processing

, where i[1, r], each

) on other parts of the input,

, then the SP has

is the root node. After the construction, the map

to the master node.

is calculated and submitted to the master.

s corresponding leaf node to the

to build the global Merkle

Vous aimerez peut-être aussi