net/publication/260094378
4 authors, including:
Geoffrey G. Xie
Naval Postgraduate School
All content following this page was uploaded by Geoffrey G. Xie on 10 February 2014.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
MobiHoc’13, July 29–August 1, 2013, Bangalore, India.
Copyright 2013 ACM 978-1-4503-2193-8/13/07 ...$15.00.

2. RELATED WORK
Replicating data to multiple locations is a common technique for improving performance and reliability in distributed computing. Erasure coding has also been a widely used technique for enhancing data reliability [12]. Replicating data incurs storage overhead while achieving higher reliability; coding techniques reduce storage overhead while not performing well when nodes are not reliable, e.g., in dynamic networks [13].
Some researchers proposed solutions for achieving higher reliability in dynamic networks. Dimakis et al. proposed several erasure coding algorithms for maintaining a distributed storage system in a dynamic network [6]. Leong &

[Figure residue; recoverable labels: Application (data1, data2, data3; func1, func2, func3); AllocateData(), ProcessData(); k-out-of-n Computing Framework; Topology Discovery; Failure Probability Estimation; Expected Transmission Time Computation; Client; Processor; Storage.]
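Figure 2: An overview of improved MDFS. (Diagram labels: Data Creator, Data Retriever, Write, Read, Encrypted AES, Mobile Ad Hoc Network.)

\begin{align}
R_{opt} &= \arg\min_{R} \sum_{i=1}^{N} \sum_{j=1}^{N} D_{ij} R_{ij} \tag{1} \\
\text{Subject to:} \quad & \sum_{j=1}^{N} X_j = n \tag{2} \\
& \sum_{j=1}^{N} R_{ij} = k \quad \forall i \tag{3} \\
& X_j - R_{ij} \ge 0 \quad \forall i \tag{4} \\
& X_j \text{ and } R_{ij} \in \{0, 1\} \quad \forall i, j \tag{5}
\end{align}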
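To make Equations 1-5 concrete, the following is a minimal sketch that solves the allocation problem by exhaustive search on a tiny instance. It is not the solver used in this work: the function name `allocate_storage` and the example cost matrix are hypothetical, and D is assumed to hold pairwise ETT costs. For a fixed choice of storage nodes (the vector X), the cheapest feasible R simply lets every node use its k cheapest storage nodes, so enumerating all size-n subsets yields the ILP optimum for small N.

```python
from itertools import combinations


def allocate_storage(D, n, k):
    """Exhaustively solve the k-out-of-n data allocation ILP (Eqs 1-5).

    D is an N x N cost matrix (assumed here to be ETT values),
    n is the number of storage nodes, and k <= n is the number of
    storage nodes each node must be able to reach.
    Returns (storage_nodes, total_cost).
    """
    if not 0 < k <= n <= len(D):
        raise ValueError("need 0 < k <= n <= N")
    N = len(D)
    best_cost, best_nodes = float("inf"), None
    # Eq 2: exactly n nodes are selected as storage nodes (X_j = 1).
    for nodes in combinations(range(N), n):
        cost = 0.0
        for i in range(N):
            # Eqs 3-4: node i is served by exactly k storage nodes, all of
            # which must be selected; the k cheapest ones minimize Eq 1.
            cost += sum(sorted(D[i][j] for j in nodes)[:k])
        if cost < best_cost:
            best_cost, best_nodes = cost, nodes
    return best_nodes, best_cost


# Hypothetical example: four nodes on a line, cost = hop distance.
D = [[abs(i - j) for j in range(4)] for i in range(4)]
storage_nodes, total_cost = allocate_storage(D, n=2, k=2)
print(storage_nodes, total_cost)
```

The search visits C(N, n) subsets, so it is only practical for checking small instances against an ILP solver.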
sing the same task. As a result, only the transmission energy affects the energy efficiency of the final solution.

4.1 k-out-of-n Data Allocation
In this problem, we are interested in finding n storage nodes denoted by S = {s1, s2, ..., sn}, S ⊆ V, such that the total expected transmission cost from any node to its k closest storage nodes, in terms of ETT, is minimized. We formulate this problem as an ILP as shown in Equations 1-5.
The first constraint (Eq 2) selects exactly n nodes as storage nodes; the second constraint (Eq 3) indicates that each node has access to k storage nodes; the third constraint (Eq 4) ensures that the j-th column of R can have a non-zero element only if Xj is 1; and the last constraint (Eq 5) imposes binary requirements on the decision variables.

4.2 k-out-of-n Data Processing
The objective of this problem is to find n nodes in V as processor nodes such that energy consumption for processing M tasks of a single job is minimized. In addition, all tasks must complete as long as k or more processor nodes can finish their assigned tasks.
In this problem, n nodes are selected as processor nodes; each processor node is assigned one or more tasks. Each task is replicated to n − k + 1 processor nodes. However, not all instances are processed: once an instance of the task completes, all other instances are canceled. The task allocation can be formulated as an ILP as shown in Equations 6-10. In the formulation, R is an N × M matrix that defines the relationship between processor nodes and tasks; each element Rij is a binary variable indicating whether task j is assigned to processor node i. X is a binary vector indicating the processor nodes, i.e., Xi = 1 indicates that vi is a processor node. The objective function minimizes the transmission time for n processor nodes to retrieve all tasks and their instances.

\begin{align}
R_{opt} &= \arg\min_{R} \sum_{i=1}^{N} \sum_{j=1}^{M} T^{r}_{ij} R_{ij} \tag{6} \\
\text{Subject to:} \quad & \sum_{i=1}^{N} X_i = n \tag{7} \\
& \sum_{i=1}^{N} R_{ij} = n - k + 1 \quad \forall j \tag{8} \\
& X_i - R_{ij} \ge 0 \quad \forall i \tag{9} \\
& X_i \text{ and } R_{ij} \in \{0, 1\} \quad \forall i, j \tag{10}
\end{align}

The first constraint (Eq 7) indicates that n of the N nodes will be selected as processor nodes. The second constraint (Eq 8) replicates each task to (n − k + 1) different processor nodes. The third constraint (Eq 9) ensures that the i-th row of R can have a non-zero element only if Xi is 1; and the last constraint (Eq 10) imposes binary requirements on the decision variables.

5. SYSTEM IMPLEMENTATION
This section investigates the feasibility of running our framework on real hardware. We implemented a mobile distributed file system (MDFS) on top of the k-out-of-n framework. Figure 2 shows an overview of our MDFS. As shown, each file is encrypted by a secret key and partitioned into n1 file fragments. The secret key is also decomposed into n2 key fragments by using Shamir’s key sharing algorithm. The file and key fragments are distributed independently to the network. When a node needs to access a file, it must retrieve at least k1 file fragments and k2 key fragments. Our k-out-of-n data allocation lets nodes distribute the file and key fragments optimally, whereas the state-of-the-art MDFS distributes file and key fragments uniformly in the network [7]. The ratio k1/n1 determines the reliability of a file and the ratio k2/n2 indicates the security of the file. Decreasing the k1/n1 ratio improves file reliability, while increasing the k2/n2 ratio strengthens security. Consequently, our k-out-of-n enabled MDFS achieves higher reliability, energy efficiency, and security. On top of our MDFS, we also test our k-out-of-n data processing component by implementing a face recognition application that counts the number of faces appearing in a set of files stored in the network. Our k-out-of-n framework selects n processor nodes to retrieve and analyze the files in an energy-efficient and reliable way.
We implemented our system on HTC Evo 4G smartphones, which run Android 2.3 on a 1 GHz Scorpion CPU with 512 MB RAM and a Wi-Fi 802.11 b/g interface. To enable the Wi-Fi ad hoc mode, we rooted the devices and modified a config file, wpa_supplicant.conf. For this experiment we varied the network size N and set n1 = ⌈0.6N⌉, k1 = ⌈0.3n1⌉ and n2 = ⌈0.6N⌉, k2 = ⌈0.6n2⌉. Figure 3 shows the measured time of each component. As shown, distributing/retrieving fragments takes much longer than computation, which affirms that our goal of minimizing communication energy is necessary. We also observe that larger network sizes incur longer distributing/retrieving times. This is because fragments are more sparsely distributed, resulting in more hops to distribute/retrieve fragments.
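As an illustration of the key-fragment step in this section, the sketch below splits a secret into n2 shares so that any k2 of them recover it, using Shamir's scheme with the experiment's settings n2 = ⌈0.6N⌉ and k2 = ⌈0.6n2⌉. It is a minimal sketch under stated assumptions rather than the MDFS code: the field modulus, the function names (`ceil_tenths`, `split_key`, `recover_key`), and the network size N = 10 are all hypothetical choices made for the example.

```python
import secrets

# Assumed field modulus (hypothetical choice): the Mersenne prime 2^127 - 1.
# The field actually used by MDFS is not stated in the text.
PRIME = 2**127 - 1


def ceil_tenths(tenths, x):
    """Return ceil((tenths / 10) * x) using integer arithmetic only."""
    return -((-tenths * x) // 10)


def split_key(secret, n, k):
    """Split `secret` into n shares; any k of them recover it."""
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_key(shares):
    """Lagrange-interpolate the polynomial at x = 0 from k or more shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


N = 10                       # example network size (assumption)
n2 = ceil_tenths(6, N)       # n2 = ceil(0.6 * N)
k2 = ceil_tenths(6, n2)      # k2 = ceil(0.6 * n2)
key = secrets.randbelow(2**126)
shares = split_key(key, n2, k2)
recovered = recover_key(shares[:k2])   # any k2 shares are sufficient
print(n2, k2, recovered == key)
```

Integer arithmetic is used for the ceilings to avoid floating-point rounding at values such as 0.6 × N. Fewer than k2 shares interpolate a different polynomial and therefore reveal nothing useful about the key, which is why a larger k2/n2 ratio corresponds to stronger security.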
[Plot residue; recoverable legend entries: Topo Discovery, MC Simulation, Random, Greedy, KNF; axis labels: Number of Nodes, Network size.]
Figure 4: Data retrieval time.
Figure 6: Effect of node density.
We then compared the file retrieval time between our allocation and random allocation. As shown in Figure 4, our framework achieves 15% to 25% smaller data retrieval time than Random. To validate the performance of our k-out-of-n data processing, we measured the completion rate of our face-recognition algorithm by varying the number of node failures. The face recognition algorithm had an average completion rate of 95% in our experimental setting.

6. SIMULATION RESULTS
We performed simulations to evaluate the performance of our k-out-of-n framework (denoted by KNF) in larger scale networks. We consider a network of 400 × 400 m² where up to 45 mobile nodes are randomly deployed. The communication range of a node is 130 m. Two different mobility models are tested: Markovian Waypoint Model and Reference Point Group Mobility (RPGM). Markovian Waypoint is similar to the Random Waypoint Model in that it randomly selects the waypoint of a node, but it accounts for the current waypoint when it determines the next waypoint. RPGM is a group mobility model where a set of leaders is selected; leaders move based on Markovian Waypoint; and other nodes follow their closest leaders.
We compare KNF with two other schemes: a greedy algorithm (Greedy) and a random placement algorithm (Random). Greedy selects nodes with the largest number of neighbors as storage nodes. Random selects storage nodes randomly. The goal is to evaluate how the selected storage nodes impact the performance. We measure the following metrics: consumed energy for retrieving data, consumed energy for processing a job, data retrieval rate, and completion rate of a job. A node may fail due to two independent factors: depleted energy and an application-dependent failure probability. Specifically, the energy associated with a node decreases as time elapses, and the lower the energy is, the higher the failure probability. Each node is also assigned a constant application-dependent failure probability. In particular, for the simulations of our k-out-of-n data processing, we also artificially force varying numbers of nodes to fail.

6.1 k-out-of-n data allocation
Figure 5 shows that mobility causes nodes to spend more energy retrieving data than in the static network. It also shows that the energy consumption for RPGM is smaller than that for Markov. The reason is that a storage node usually serves the nodes in its proximity; thus when nodes move in a group, the impact of mobility is less severe than when all nodes move randomly. In all scenarios, KNF consumes less energy than the other schemes.
Figure 6 shows that KNF achieves a 15% to 25% higher retrieval rate than the other schemes because KNF takes the dynamic nature of the network into account when selecting storage nodes. We also observe that the retrieval rates increase with higher network density. The explanation is that, with higher network density, client nodes can find more reliable paths to storage nodes and the network is less likely to be partitioned due to node failures. Since the simulations run for 4 hours and several nodes fail in the last hour due to depleted energy, the highest retrieval rate of KNF is only 80%.

Figure 7: Effect of node failure on energy efficiency. (x-axis: Number of Node Failures; legend: KNF, Greedy, Random.)
Figure 8: Effect of node failure on completion ratio. (x-axis: Number of Node Failures; legend: KNF, Greedy, Random.)

6.2 k-out-of-n data processing
This section investigates how the failures of processor nodes affect the energy efficiency and job completion rate. In Greedy, each task is replicated to the n-k+1 processor nodes that have the lowest energy consumption for retrieving that task. In Random, the processor nodes are selected randomly and each task is also replicated to n-k+1
processor nodes randomly. When a node fails, we assume none of the tasks assigned to it can be completed.
Figure 7 shows that KNF consumes 10% to 30% less energy than Greedy and Random. We observe that the energy consumption is not sensitive to the number of node failures. When there is a node failure, a task may be executed on a less optimal processor node, causing higher energy consumption. However, this difference is small because each task is replicated to n-k+1 processor nodes, and failing an arbitrary processor may have no effect on the execution time of the job at all.
In Figure 8, we see that the completion ratio is 1 when no more than n−k nodes fail. An interesting observation is that Greedy has the highest completion ratio. The main reason is that, in Greedy, the load on each node is highly uneven: some processor nodes may have many tasks while others have none. This allocation strategy achieves a high completion ratio because all tasks can complete as long as one of these high-load processor nodes can finish all its assigned tasks. In our simulation, about 30% of processor nodes in Greedy are assigned all M tasks. Analytically, if three of the ten processor nodes contain all M tasks, the probability of completion when 9 processor nodes fail is $1 - \binom{7}{6} / \binom{10}{9} = 0.3$.

7. CONCLUSIONS
We presented the first k-out-of-n data allocation/processing framework that jointly addresses the energy-efficiency and fault-tolerance challenges in dynamic networks. It allocates data fragments to storage nodes such that other nodes retrieve data reliably with minimal energy consumption; when applications need to process data stored in the network, the framework selects a set of processor nodes to execute the job such that the job can be completed with minimal energy consumption and a high completion rate. We demonstrated the effectiveness of our framework through both system implementation and simulations in large-scale networks.

Acknowledgment
This work was supported by Naval Postgraduate School under Grant No. N00244-12-1-0035.

8. REFERENCES
[1] M. K. Aguilera, R. Janakiraman, and Lihao Xu. Using erasure codes efficiently for storage in a distributed system. In Proceedings of Dependable Systems and Networks, 2005.
[2] Byung-Gon Chun, Sunghwan Ihm, Petros Maniatis, Mayur Naik, and Ashwin Patti. CloneCloud: elastic execution between mobile device and cloud. In Proceedings of EuroSys, 2011.
[3] Douglas S. J. De Couto. High-Throughput Routing for Multi-Hop Wireless Networks. PhD dissertation, MIT, 2004.
[4] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. MAUI: making smartphones last longer with code offload. In Proceedings of MobiSys, 2010.
[5] David W. Coit and Jiachen Liu. System reliability optimization with k-out-of-n subsystems. Int. Journal of Reliability, Quality and Safety Engineering, 7(2):129–142, 2000.
[6] A. G. Dimakis, K. Ramchandran, Y. Wu, and Changho Suh. A survey on network codes for distributed storage. Proceedings of the IEEE, 99:476–489, 2011.
[7] S. Huchton, G. Xie, and R. Beverly. Building and evaluating a k-resilient mobile distributed file system resistant to device compromise. In Proceedings of Military Communications Conference, 2011.
[8] S. Kosta, A. Aucinas, Pan Hui, R. Mortier, and Xinwen Zhang. ThinkAir: Dynamic resource allocation and parallel execution in the cloud for mobile code offloading. In Proceedings of INFOCOM, 2012.
[9] D. Leong, A. G. Dimakis, and Tracey Ho. Distributed storage allocation for high reliability. In Proceedings of ICC, 2010.
[10] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies. The case for VM-based cloudlets in mobile computing. IEEE Pervasive Computing, 8:14–23, 2009.
[11] Cong Shi, Vasileios Lakafosis, Mostafa H. Ammar, and Ellen W. Zegura. Serendipity: enabling remote computing among intermittently connected mobile devices. In Proceedings of MobiHoc, 2012.
[12] A. Shokrollahi. Raptor codes. IEEE Transactions on Information Theory, 52:2551–2567, 2006.
[13] Hakim Weatherspoon and John Kubiatowicz. Erasure coding vs. replication: A quantitative comparison. In Proceedings of Peer-to-Peer Systems, 2002.