
CHAPTER 1

INTRODUCTION

The advent of cloud storage motivates enterprises and organizations to outsource data storage to third-party cloud providers, as evidenced by many real-life case studies. One critical challenge of today's cloud storage services is the management of the ever-increasing volume of data. To make data management scalable, deduplication has been a well-known technique to reduce storage space and upload bandwidth in cloud storage. Instead of keeping multiple data copies with the same content, deduplication eliminates redundant data by keeping only one physical copy and referring other redundant data to that copy.

From a user's perspective, data outsourcing raises security and privacy concerns. One must trust third-party cloud providers to properly enforce confidentiality, integrity checking, and access control mechanisms against any insider and outsider attacks. However, deduplication, while improving storage and bandwidth efficiency, is incompatible with traditional encryption. Specifically, traditional encryption requires different users to encrypt their data with their own keys. Thus, identical data copies of different users will lead to different ciphertexts, making deduplication impossible.

Convergent encryption provides a viable option to enforce data confidentiality while realizing deduplication. It encrypts/decrypts a data copy with a convergent key, which is derived by computing the cryptographic hash value of the content of the data copy itself. After key generation and data encryption, users retain the keys and send the ciphertext to the cloud. Since the encryption is deterministic, identical data copies will generate the same convergent key and the same ciphertext. This allows the cloud to perform deduplication on the ciphertexts.
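To make this concrete, here is a minimal Java sketch (an illustration, not part of the scheme described above): the convergent key is derived as the SHA-256 hash of the content, and encryption uses AES with a fixed IV so that it stays deterministic; the cipher mode and key length are assumptions of this example.

import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class ConvergentEncryptionSketch {

    // The convergent key is simply the cryptographic hash of the plaintext.
    static byte[] convergentKey(byte[] data) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(data);
    }

    // Deterministic encryption: a fixed IV makes identical plaintexts map to
    // identical ciphertexts, which is what enables deduplication on ciphertexts.
    static byte[] encrypt(byte[] data, byte[] key) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, 0, 16, "AES"),
                new IvParameterSpec(new byte[16]));
        return c.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        byte[] copy1 = "same file content".getBytes("UTF-8");
        byte[] copy2 = "same file content".getBytes("UTF-8");
        byte[] c1 = encrypt(copy1, convergentKey(copy1));
        byte[] c2 = encrypt(copy2, convergentKey(copy2));
        // Identical plaintexts yield identical ciphertexts, so the cloud can deduplicate.
        System.out.println(Arrays.equals(c1, c2));   // prints true
    }
}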

1.1 BACKGROUND CONCEPTS

1.1.1 Cloud computing

Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software, and information are provided to computers and other devices as a metered service over a network, typically the Internet. Cloud computing provides computation, software, data access, and storage resources without requiring cloud users to know the location and other details of the computing infrastructure. End users access cloud-based applications through a web browser or a lightweight desktop or mobile app, while the business software and data are stored on servers at a remote location. Cloud application providers strive to give the same or better service and performance as if the software programs were installed locally on end-user computers.

Figure 1.1 Cloud Computing



1.1.1.1 Characteristics

Cloud computing exhibits the following key characteristics:

Empowerment of end-users of computing resources by putting the provisioning of those resources in their own control, as opposed to the control of a centralized IT service.

Agility improves with users' ability to re-provision technological infrastructure resources.

API accessibility to software that enables machines to interact with cloud software in the same way the user interface facilitates interaction between humans and computers. Cloud computing systems typically use REST-based APIs.

Cost is claimed to be reduced, and in a public cloud delivery model capital expenditure is converted to operational expenditure. This is purported to lower barriers to entry, as infrastructure is typically provided by a third party and does not need to be purchased for one-time or infrequent intensive computing tasks. Pricing on a utility computing basis is fine-grained with usage-based options, and fewer IT skills are required for implementation in-house.

Device and location independence enable users to access systems using a web browser regardless of their location or what device they are using (e.g., PC, mobile phone). As infrastructure is off-site (typically provided by a third party) and accessed via the Internet, users can connect from anywhere.

Virtualization technology allows servers and storage devices to be shared and utilization to be increased. Applications can be easily migrated from one physical server to another.

Multi-tenancy enables sharing of resources and costs across a large pool of users, thus allowing for:

o Centralization of infrastructure in locations with lower costs such as real estate, electricity, etc.

o Peak-load capacity increases: users need not engineer for the highest possible load levels.

o Utilization and efficiency improvements for systems that are often only 10-20% utilized.

Reliability is improved if multiple redundant sites are used, which makes well-
designed cloud computing suitable for business continuity and disaster recovery.

Scalability and Elasticity via dynamic, on-demand provisioning of resources on a fine-grained, self-service basis in near real-time, without users having to engineer for peak loads.

Performance is monitored, and consistent and loosely coupled architectures are constructed using web services as the system interface.

Security could improve due to centralization of data, increased security-focused resources, etc., but concerns can persist about loss of control over certain sensitive data and the lack of security for stored kernels. Security is often as good as or better than in other traditional systems, in part because providers are able to devote resources to solving security issues that many customers cannot afford. However, the complexity of security is greatly increased when data is distributed over a wider area or a greater number of devices, and in multi-tenant systems that are being shared by unrelated users. In addition, user access to security audit logs may be difficult or impossible. Private cloud installations are in part motivated by users' desire to retain control over the infrastructure and avoid losing control of information security.

Maintenance of cloud computing applications is easier, because they do not need to be installed on each user's computer and can be accessed from different places.

1.1.1.2 Service Models

Cloud computing providers offer their services according to three fundamental models: IaaS, PaaS, and SaaS, where IaaS is the most basic and each higher model abstracts from the details of the lower models.

Figure 1.2 Service Models

1.1.1.2.1 Infrastructure as a Service

In this most basic cloud service model, cloud providers offer computers, as physical or more often as virtual machines, raw block storage, firewalls, load balancers, and networks. IaaS providers supply these resources on demand from their large pools installed in data centers. Local area networks, including IP addresses, are part of the offer. For wide area connectivity, the Internet can be used or, in carrier clouds, dedicated virtual private networks can be configured.

To deploy their applications, cloud users then install operating system images on
the machines as well as their application software. In this model, it is the cloud user who
is responsible for patching and maintaining the operating systems and application
software. Cloud providers typically bill IaaS services on a utility computing basis, that
is, cost will reflect the amount of resources allocated and consumed.

1.1.1.2.2 Platform as a Service

In the PaaS model, cloud providers deliver a computing platform and/or solution
stack typically including operating system, programming language execution
environment, database, and web server. Application developers can develop and run
their software solutions on a cloud platform without the cost and complexity of buying
and managing the underlying hardware and software layers. With some PaaS offers, the
underlying compute and storage resources scale automatically to match application
demand such that the cloud user does not have to allocate resources manually.

1.1.1.2.3 Software as a Service

In this model, cloud providers install and operate application software in the
cloud and cloud users access the software from cloud clients. The cloud users do not
manage the cloud infrastructure and platform on which the application is running. This
eliminates the need to install and run the application on the cloud user's own computers
simplifying maintenance and support. What makes a cloud application different from
other applications is its elasticity. This can be achieved by cloning tasks onto multiple
virtual machines at run-time to meet the changing work demand.

Load balancers distribute the work over the set of virtual machines. This
process is transparent to the cloud user who sees only a single access point. To
accommodate a large number of cloud users, cloud applications can be multitenant, that
is, any machine serves more than one cloud user organization. It is common to refer to
special types of cloud-based application software with a similar naming convention: desktop as a service, business process as a service, test environment as a service, and communication as a service.

1.1.1.3 Cloud clients

Users access cloud computing using networked client devices, such as desktop computers, laptops, tablets and smartphones. Some of these devices - cloud clients - rely on cloud computing for all or a majority of their applications, so as to be essentially useless without it. Examples are thin clients and the browser-based Chromebook. Many cloud applications do not require specific software on the client and instead use a web browser to interact with the cloud application. With AJAX and HTML5 these web user interfaces can achieve a similar or even better look and feel than native applications. Some cloud applications, however, support specific client software dedicated to these applications (e.g., virtual desktop clients and most email clients). Some legacy applications (line-of-business applications that until now have been prevalent in thin-client Windows computing) are delivered via a screen-sharing technology.

1.1.1.4 Deployment models

Figure 1.3 Deployment Models

1.1.1.4.1 Public cloud

A public cloud is one based on the standard cloud computing model, in which a
service provider makes resources, such as applications and storage, available to the
general public over the Internet. Public cloud services may be free or offered on a pay-
per-usage model.

1.1.1.4.2 Community cloud

Community cloud shares infrastructure between several organizations from a specific community with common concerns (security, compliance, jurisdiction, etc.), whether managed internally or by a third party and hosted internally or externally. The costs are spread over fewer users than a public cloud but more than a private cloud, so only some of the cost savings potential of cloud computing is realized.

1.1.1.4.3 Hybrid cloud

Hybrid cloud is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together, offering the benefits of multiple deployment models. It can also be defined as multiple cloud systems that are connected in a way that allows programs and data to be moved easily from one deployment system to another.

1.1.1.4.4 Private cloud

Private cloud is infrastructure operated solely for a single organization, whether managed internally or by a third party and hosted internally or externally. Private clouds have attracted criticism because users still have to buy, build, and manage them and thus do not benefit from less hands-on management, which is essentially the economic model that makes cloud computing such an intriguing concept.

1.1.1.5 Architecture

Figure 1.4 Cloud Computing - Sample Architecture

Cloud architecture, the systems architecture of the software systems involved in the delivery of cloud computing, typically involves multiple cloud components communicating with each other over a loose coupling mechanism such as a messaging queue. Elastic provision implies intelligence in the use of tight or loose coupling as applied to mechanisms such as these and others.

1.1.1.6 Intercloud and Cloud Engineering

The Intercloud is an interconnected global "cloud of clouds" and an extension of the Internet "network of networks" on which it is based.

Cloud engineering is the application of engineering disciplines to cloud computing. It brings a systematic approach to the high-level concerns of commercialization, standardization, and governance in conceiving, developing, operating and maintaining cloud computing systems. It is a multidisciplinary method encompassing contributions from diverse areas such as systems, software, web, performance, information, security, platform, risk, and quality engineering.

1.1.2 Deduplication

In computing, data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. Related and somewhat synonymous terms are intelligent (data) compression and single-instance (data) storage. This technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent. In the deduplication process, unique chunks of data, or byte patterns, are identified and stored during a process of analysis.

As the analysis continues, other chunks are compared to the stored copy and
whenever a match occurs, the redundant chunk is replaced with a small reference that
points to the stored chunk. Given that the same byte pattern may occur dozens,
hundreds, or even thousands of times (the match frequency is dependent on the chunk
size), the amount of data that must be stored or transferred can be greatly reduced.
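The chunk-and-reference process described above can be illustrated with a small in-memory sketch; the fixed 4 KB chunk size, the SHA-256 chunk identifiers, and the class and method names are assumptions made only for this example.

import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChunkStore {
    private static final int CHUNK_SIZE = 4096;
    private final Map<String, byte[]> uniqueChunks = new HashMap<String, byte[]>();

    // Store a file: keep each unique chunk once and represent the file as a
    // list of references (chunk hashes).
    public List<String> put(byte[] data) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        List<String> refs = new ArrayList<String>();
        for (int off = 0; off < data.length; off += CHUNK_SIZE) {
            byte[] chunk = Arrays.copyOfRange(data, off, Math.min(off + CHUNK_SIZE, data.length));
            String id = toHex(sha.digest(chunk));
            if (!uniqueChunks.containsKey(id)) {
                uniqueChunks.put(id, chunk);   // a new byte pattern is stored once
            }
            refs.add(id);                      // a duplicate is replaced by a small reference
        }
        return refs;
    }

    // Rebuild the file from its chunk references.
    public byte[] get(List<String> refs) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String id : refs) {
            out.write(uniqueChunks.get(id));
        }
        return out.toByteArray();
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}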

1.1.2.1 Benefits

Storage-based data deduplication reduces the amount of storage needed for a given set of files. It is most effective in applications where many copies of very similar or even identical data are stored on a single disk, a surprisingly common scenario. In the case of data backups, which routinely are performed to protect against data loss, most data in a given backup remain unchanged from the previous backup.

Network data deduplication is used to reduce the number of bytes that must be transferred between endpoints, which can reduce the amount of bandwidth required. See WAN optimization for more information.

Virtual servers benefit from deduplication because it allows nominally separate system files for each virtual server to be coalesced into a single storage space. At the same time, if a given server customizes a file, deduplication will not change the files on the other servers, something that alternatives like hard links or shared disks do not offer.

1.2 PROBLEM STATEMENT

If each user employed a conventional cryptosystem to encrypt its files, then two identical files encrypted with different users' keys would have different encrypted representations, and the DFC subsystem could neither recognize that the files are identical nor coalesce the encrypted files into the space of a single file, unless it had access to the users' private keys, which would be a significant security violation.

1.3 OBJECTIVE OF THE PROJECT

Data management through the cloud is viewed as a technique that can save the cost of data sharing and management. A key concept for remote data storage is client-side deduplication, in which the server stores only a single copy of each file, regardless of how many clients need to store that file; that is, only the first client needs to upload the file to the server. This design saves both communication bandwidth and storage capacity. Data deduplication is a technique for eliminating duplicate copies of data, and it has been widely used in cloud storage to reduce storage space and upload bandwidth.

1.4 SCOPE OF THE PROJECT

The scope of this project is Dekey, a construction in which users do not need to manage any keys on their own but instead securely distribute the convergent key shares across multiple servers. It preserves the semantic security of convergent keys and the confidentiality of outsourced data.

1.5 LITERATURE SURVEY

In [1], Jin Li, Xiaofeng Chen, Mingqiang Li, Jingwei Li, Patrick P.C. Lee, and Wenjing Lou propose data deduplication, a technique for eliminating duplicate copies of data that has been widely used in cloud storage to reduce storage space and upload bandwidth. Promising as it is, an arising challenge is to perform secure deduplication in cloud storage. Although convergent encryption has been extensively adopted for secure deduplication, a critical issue in making convergent encryption practical is to efficiently and reliably manage a huge number of convergent keys. This work makes the first attempt to formally address the problem of achieving efficient and reliable key management in secure deduplication.

The paper introduces a baseline approach in which each user holds an independent master key for encrypting the convergent keys and outsourcing them to the cloud. However, such a baseline key management scheme generates an enormous number of keys with the increasing number of users and requires users to dedicatedly protect the master keys. To this end, the authors propose Dekey, a new construction in which users do not need to manage any keys on their own but instead securely distribute the convergent key shares across multiple servers. Security analysis demonstrates that Dekey is secure in terms of the definitions specified in the proposed security model. As a proof of concept, they implement Dekey using the Ramp secret sharing scheme and demonstrate that Dekey incurs limited overhead in realistic environments.

In [2], A. Acharya, M. Uysal, and J. Saltz describe Active Disks: a programming model, algorithms, and an evaluation. Several application and technology trends indicate that it might be profitable and feasible to move data-intensive computation closer to the data that it processes. At the application end, the rate at which new data is being placed online is outstripping the growth in disk capacity as well as the improvement in performance of commodity processors. Furthermore, there is a change in user expectations regarding large datasets, from primarily archival storage to frequent reprocessing in their entirety.

Active Disk architectures integrate significant processing power and memory into a disk drive and allow application-specific code to be downloaded and executed on the data that is being read from or written to disk. To utilize Active Disks, an application is partitioned between a host-resident component and a disk-resident component. The key idea is to offload the bulk of the processing to the disk-resident processors and to use the host processor primarily for coordination, scheduling and combination of results from individual disks.

Active Disks present a promising architectural direction for two reasons. First, since the number of processors scales with the number of disks, active-disk architectures are better equipped to keep up with the processing requirements for rapidly growing datasets. Second, since the processing components are integrated with the drives, the processing capacity will evolve as the disk drives evolve. This is similar to the evolution of disk caches: as the drives get faster, the disk cache becomes larger.

The introduction of Active Disks raises several questions. First, how are they programmed? What is disk-resident code (i.e., a disklet) allowed to do? How does it communicate with the host-resident component? Second, how does one protect against buggy or malicious programs? Third, is it feasible to utilize Active Disks for the classes of datasets that are expected to grow rapidly, i.e., commercial data warehouses, image databases and satellite data repositories? To be able to take advantage of processing power that scales with dataset size, it should be possible to partition algorithms that process these datasets such that most of the processing can be offloaded to the disk-resident processors. Finally, how much benefit can be expected with current technology and in the foreseeable future?

In [3], T. Cholez, I. Chrisment, and O. Festor evaluate Sybil attacks in KAD. Peer-to-Peer (P2P) networks have proven their ability to host and share a large amount of resources thanks to the collaboration of many individual peers. They are known to have many advantages compared to the client-server scheme: P2P networks scale better, the cost of the infrastructure is distributed, and they are fault tolerant. Most currently deployed structured P2P networks are based on Distributed Hash Tables (DHTs).

Each peer is responsible for a subset of the network. This organization improves the efficiency of P2P networks by ensuring routing in O(log n), but it can also lead to security issues like the Sybil attack.

The Sybil attack consists in creating a large number of fake peers, called "Sybils", and placing them in a strategic way in the DHT to take control over a part of it. Douceur proved that the Sybil attack cannot be totally avoided as long as the malicious entity has enough resources to create the Sybils. This problem was not considered when designing most of the major structured P2P networks. In this context, the goal of the defense strategies described in the literature is to limit the Sybil attack, as completely stopping it is impossible.

The latest versions of the major KAD clients have introduced new protection mechanisms to limit the Sybil attack, making the previous experiments concerning the security issues of KAD inefficient. These newly implemented protection mechanisms have neither been described nor been evaluated and assessed. The purpose of this study is to evaluate the implemented security mechanisms against real attacks. We will then be able to have an updated view of KAD vulnerabilities: is the network still vulnerable to the attacks previously proposed?

This work is a first and necessary step to design improved defense mechanisms in future works. As far as we know, this paper is also the first attempt to experiment with and assess practical protections set by the P2P community to protect a real network. Even if the security mechanisms are a step forward, they are not yet sufficient. We have shown that a distributed eclipse attack focused on a particular ID still remains possible at a moderate cost. This result shows that the main weakness of KAD has shifted from the possibility to reference many KAD IDs with the same IP address to the possibility to freely choose one's KAD ID. Moreover, if we consider an attacker with many resources, particularly considering the number of IP addresses, the overall protection can be threatened due to the specific design using local rules.

Figure 1.5 KAD routing table scheme

In [4], A. Shamir introduces a method to share a secret. The problem is generalized to one in which the secret is some data D (e.g., the safe combination) and in which non-mechanical solutions that manipulate this data are also allowed. The goal is to divide D into n pieces D1, ..., Dn in such a way that:

(1) Knowledge of any k or more pieces Di makes D easily computable.

(2) Knowledge of any k - 1 or fewer pieces Di leaves D completely undetermined.

Such a scheme is called a (k, n) threshold scheme.
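For reference, the standard construction behind such a (k, n) threshold scheme is Shamir's polynomial interpolation over a prime field; the excerpt above does not spell out the formulas, so the following is only a sketch of that well-known construction:

\begin{align*}
  f(x) &= D + a_1 x + a_2 x^2 + \dots + a_{k-1} x^{k-1} \pmod{q},
         \quad a_1, \dots, a_{k-1} \text{ random}, \; q > n \text{ prime}, \\
  D_i  &= f(i), \quad i = 1, \dots, n \quad \text{(the $n$ shares)}, \\
  D    &= f(0) = \sum_{i \in S} D_i \prod_{\substack{j \in S \\ j \neq i}} \frac{j}{j - i} \pmod{q},
         \quad |S| = k \quad \text{(recovery from any $k$ shares)}.
\end{align*}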

Efficient threshold schemes can be very helpful in the management of cryptographic keys. In order to protect data we can encrypt it, but in order to protect the encryption key we need a different method: further encryptions change the problem rather than solve it. The most secure key management scheme keeps the key in a single, well-guarded location (a computer, a human brain, or a safe). This scheme is highly unreliable, since a single misfortune (a computer breakdown, sudden death, or sabotage) can make the information inaccessible.

An obvious solution is to store multiple copies of the key at different locations, but this increases the danger of security breaches (computer penetration, betrayal, or human error). By using a (k, n) threshold scheme with n = 2k - 1 we get a very robust key management scheme: we can recover the original key even when ⌊n/2⌋ = k - 1 of the n pieces are destroyed, but our opponents cannot reconstruct the key even when security breaches expose ⌊n/2⌋ = k - 1 of the remaining k pieces.

In other applications the tradeoff is not between secrecy and reliability, but
between safety and convenience of use. Consider, for example, a company that digitally
signs all its checks. If each executive is given a copy of the company's secret signature
key, the system is convenient but easy to misuse. If the cooperation of all the company's
executives is necessary in order to sign each check, the system is safe but inconvenient.
The standard solution requires at least three signatures per check, and it is easy to
implement with a (3, n) threshold scheme.

In [5], Y. Tang, P. P. C. Lee, J. C. S. Lui, and R. Perlman describe how cloud storage (e.g., Amazon S3, MyAsiaCloud) offers an abstraction of infinite storage space for clients to host data in a pay-as-you-go manner. Thus, instead of self-maintaining data centers, enterprises can now outsource the storage of a bulk amount of digitized content to those third-party cloud storage providers so as to save the financial overhead of data management. Apart from enterprises, individuals can also benefit from cloud storage as a result of the advent of mobile devices (e.g., smartphones, laptops).

Given that mobile devices have limited storage space in general, individuals can
move audio/video files to the cloud and make effective use of space in their mobile
devices. However, privacy and integrity concerns become relevant as we now count on
third parties to host possibly sensitive data. To protect outsourced data, a
straightforward approach is to apply cryptographic encryption onto sensitive data with a
set of encryption keys, yet maintaining and protecting such encryption keys will create
another security issue.

FADE generalizes time-based file assured deletion (i.e., files are assuredly deleted upon time expiration) into a more fine-grained approach called policy-based file assured deletion, in which files are associated with more flexible file access policies (e.g., time expiration, read/write permissions of authorized users) and are assuredly deleted when the associated file access policies are revoked and become obsolete. The paper makes the following contributions:

Proposes a new policy-based file assured deletion scheme that reliably deletes
files with regard to revoked file access policies. In this context, we design the
key management schemes for various file manipulation operations.

Implement a working prototype of FADE atop Amazon S3. Our implementation aims to illustrate that various applications can benefit from FADE, such as cloud-based backup systems. FADE consists of a set of API interfaces that we can export, so that we can adapt FADE to different cloud storage implementations.

Empirically evaluate the performance overhead of FADE atop Amazon S3 and, using realistic experiments, show its feasibility.

Figure 1.6 The FADE Architecture

Figure 1.6 shows the FADE architecture. In the following, we define the metadata of FADE attached to individual files. We then describe how we implement the data owner and the key manager, and how the data owner interacts with the storage cloud.

Representation of Metadata

For each file protected by FADE, we include metadata that describes the policies associated with the file as well as a set of encrypted keys. In FADE, there are two types of metadata: file metadata and policy metadata. The file metadata mainly contains two pieces of information: the file size and an HMAC. We hash the encrypted file with HMAC-SHA1 for integrity checking. The file metadata is of fixed size (8 bytes for the file size and 20 bytes for the HMAC) and is attached at the beginning of the encrypted file. Both the file metadata and the encrypted data file will then be treated as a single file to be uploaded to the storage cloud.
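As an illustration, the following Java sketch builds such a 28-byte header (an 8-byte file size followed by a 20-byte HMAC-SHA1 tag) and prepends it to the encrypted file; the big-endian size encoding and the handling of the integrity key are assumptions of this example rather than details stated in the paper.

import java.nio.ByteBuffer;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class FileMetadataSketch {

    // Returns: [8-byte size][20-byte HMAC-SHA1][encrypted payload]
    static byte[] withMetadata(byte[] encryptedFile, byte[] integrityKey) throws Exception {
        Mac hmac = Mac.getInstance("HmacSHA1");
        hmac.init(new SecretKeySpec(integrityKey, "HmacSHA1"));
        byte[] tag = hmac.doFinal(encryptedFile);               // 20 bytes

        ByteBuffer out = ByteBuffer.allocate(8 + 20 + encryptedFile.length);
        out.putLong(encryptedFile.length);                      // 8-byte file size
        out.put(tag);                                           // 20-byte HMAC
        out.put(encryptedFile);                                 // encrypted data file
        return out.array();
    }
}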

In [6], C. Wang, Q. Wang, K. Ren, and W. Lou describe how public auditing preserves storage security in cloud computing. Cloud Computing has been envisioned as the next-generation information technology architecture for enterprises, due to its long list of unprecedented advantages in IT history: on-demand self-service, ubiquitous network access, location-independent resource pooling, rapid resource elasticity, usage-based pricing and transference of risk. As a disruptive technology with profound implications, Cloud Computing is transforming the very nature of how businesses use information technology. One fundamental aspect of this paradigm shift is that data is being centralized or outsourced to the cloud.

From the users' perspective, including both individuals and IT enterprises, storing data remotely in the cloud in a flexible on-demand manner brings appealing benefits: relief of the burden of storage management, universal data access independent of geographical location, and avoidance of capital expenditure on hardware, software, and personnel maintenance. While Cloud Computing makes these advantages more appealing than ever, it also brings new and challenging security threats towards users' outsourced data. Since CSPs are separate administrative entities, data outsourcing actually relinquishes users' ultimate control over the fate of their data. As a result, the correctness of the data in the cloud is being put at risk for the following reasons.

This problem, if not properly addressed, may impede the successful deployment of the cloud architecture. As users no longer physically possess the storage of their data, traditional cryptographic primitives for the purpose of data security protection cannot be directly adopted. In particular, simply downloading all the data for its integrity verification is not a practical solution due to the expense of I/O and transmission costs across the network. Besides, it is often insufficient to detect data corruption only when accessing the data, as this does not give users correctness assurance for unaccessed data and might be too late to recover from data loss or damage.

This work is among the first few to support privacy-preserving public auditing in Cloud Computing, with a focus on data storage. Besides, with the prevalence of Cloud Computing, a foreseeable increase of auditing tasks from different users may be delegated to the TPA (third-party auditor).

To address these problems, the work utilizes the technique of a public-key-based homomorphic linear authenticator (HLA), which enables the TPA to perform the auditing without demanding a local copy of the data and thus drastically reduces the communication and computation overhead as compared to straightforward data auditing approaches. By integrating the HLA with random masking, the protocol guarantees that the TPA cannot learn any knowledge about the data content stored in the cloud server during the efficient auditing process.

The aggregation and algebraic properties of the authenticator further benefit our
design for the batch auditing. Specifically, our contribution can be summarized as the
following three aspects:

We motivate the public auditing system of data storage security in Cloud Computing and provide a privacy-preserving auditing protocol; i.e., our scheme enables an external auditor to audit users' outsourced data in the cloud without learning the data content.

To the best of our knowledge, our scheme is the first to support scalable and efficient public auditing in Cloud Computing. Specifically, our scheme achieves batch auditing, where multiple delegated auditing tasks from different users can be performed simultaneously by the TPA.

We prove the security and justify the performance of our proposed schemes.

In [7], S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn propose Ceph, a scalable, high-performance distributed file system. System designers have long sought to improve the performance of file systems, which have proved critical to the overall performance of an exceedingly broad class of applications. The scientific and high-performance computing communities in particular have driven advances in the performance and scalability of distributed storage systems, typically predicting more general-purpose needs by a few years.

Traditional solutions, exemplified by NFS, provide a straightforward model in which a server exports a file system hierarchy that clients can map into their local name space. Although widely used, the centralization inherent in the client/server model has proven a significant obstacle to scalable performance. More recent distributed file systems have adopted architectures based on object-based storage, in which conventional hard disks are replaced with intelligent object storage devices (OSDs), which combine a CPU, network interface, and local cache with an underlying disk or RAID.

The architecture is based on the assumption that systems at the petabyte scale
are inherently dynamic: large systems are inevitably built incrementally, node failures
are the norm rather than the exception, and the quality and character of workloads are
constantly shifting over time. Ceph decouples data and metadata operations by
eliminating file allocation tables and replacing them with generating functions. This
allows Ceph to leverage the intelligence present in OSDs to distribute the complexity
surrounding data access, update serialization, replication and reliability, failure
detection, and recovery. Ceph utilizes a highly adaptive distributed metadata cluster
architecture that dramatically improves the scalability of metadata access, and with it,
the scalability of the entire system.
Figure 1.7 System Architecture: Ceph


The Ceph file system has three main components: the client, each instance of which exposes a near-POSIX file system interface to a host or process; a cluster of OSDs, which collectively stores all data and metadata; and a metadata server cluster, which manages the namespace (file names and directories) while coordinating security, consistency and coherence. We say the Ceph interface is near-POSIX because we find it appropriate to extend the interface and selectively relax consistency semantics in order to better align with the needs of applications and to improve system performance. The primary goals of the architecture are scalability to hundreds of petabytes and beyond, performance, and reliability.

In [8], B. Welch, M. Unangst, Z. Abbasi, G. Gibson, B. Mueller, J. Small, J. Zelenka, and B. Zhou study the scalable performance of the Panasas parallel file system. Storage systems for high-performance computing environments must be designed to scale in performance so that they can be configured to match the required load. Clustering techniques are often used to provide scalability. In a storage cluster, many nodes each control some storage, and the overall distributed file system assembles the cluster elements into one large, seamless storage system. The storage cluster can be hosted on the same computers that perform data processing, or it can be a separate cluster that is devoted entirely to storage and accessible to the compute cluster via a network protocol.

The unique aspects of the Panasas system are its use of per-file, client-driven RAID, its parallel RAID rebuild, its treatment of different classes of metadata (block, file, system), and its commodity-parts-based blade hardware with integrated UPS. Of course, the system has many other features, such as object storage, fault tolerance, caching and cache consistency, and a simplified management model, that are not unique but are necessary for a scalable system implementation.

An object is a container for data and attributes; it is analogous to the inode inside a traditional UNIX file system implementation. Specialized storage nodes called OSDs store objects in a local OSDFS file system. The object interface addresses objects in a two-level (partition ID/object ID) namespace. The OSD wire protocol provides byte-oriented access to the data, attribute manipulation, creation and deletion of objects, and several other specialized operations [OSD04]. An iSCSI transport is used to carry OSD commands that are very similar to the OSDv2 standard currently in progress within SNIA and ANSI-T10 [SNIA].

The Panasas file system is layered over the object storage. Each file is striped
over two or more objects to provide redundancy and high bandwidth access. The file
system semantics are implemented by metadata managers that mediate access to objects
from clients of the file system. The clients access the object storage using the
iSCSI/OSD protocol for Read and Write operations. The I/O operations proceed directly
and in parallel to the storage nodes, bypassing the metadata managers.

Figure 1.8 Panasas System Components

In [9], Wickremesinghe, J. Chase, and J. Vitter propose an approach for distributed computing with load-managed active storage. Disk storage densities have been growing by over 50% per year, and storage systems are accumulating larger amounts of data than ever before. Storage is increasingly network-based and shared by many applications, including parallel applications, which access storage through high-speed interconnects such as SANs, InfiniBand, or Ethernet networks at gigabit speeds.

It is an open question how best to integrate computation with storage access in these systems. Technology trends favor migration of limited application processing capability down the network storage hierarchy into storage servers or even high-end disk drives. This "active storage" or "smart disks" model has been the focus of a significant body of research, mostly directed at data mining and other database processing tasks.

Future storage system components may offer varying degrees of computational power at varying granularities; thus we refer to computation-enabled storage nodes generically as Active Storage Units, or ASUs. ASUs correspond to what is often called a storage brick, to evoke their role as a building block for large-scale systems that store and process massive data. However, in our model an ASU could also correspond to a larger entity such as a block storage server or a site combining storage and processing.

ASUs allow processing capacity to scale naturally with the size of storage. They
also have the potential to reduce data movement across the interconnect if searching,
filtering, or read/modify/write steps execute directly on ASUs. This allows aggregation
of larger numbers of drives behind each network port, and it can improve host
processing performance since data movement in host memory is often a leading drain
on host CPU resources. However, ASUs introduce new distributed computing
challenges relating to controlling the mapping of application functions to ASUs,
coordinating functions across ASUs and hosts, and sharing of ASU resources.

This paper explores a programming model to support computation on ASUs, with an emphasis on flexible resource management at the system level. Three factors motivate the approach. First, network storage is a shared resource, and storage-based computation should not occur if it interferes with storage access for other applications. Second, ASUs represent an asymmetric parallel processing model; the processing power available in the storage hierarchy may vary widely across configurations, and applications should configure to make the best use of the available parallelism. Third, active storage offers a potential for local control over data movement and access order to optimize storage performance; application structure should expose ordering constraints precisely, so that ASUs may reorder operations when it is beneficial to do so.

In [10], Wolchok, O. S. Hofmann, N. Heninger, E. W. Felten, J. A. Halderman, C. J. Rossbach, B. Waters, and E. Witchel define a new approach for defeating Vanish with low-cost Sybil attacks against large DHTs. As storage capacities increase and applications move into the cloud, controlling the lifetime of sensitive data is becoming increasingly difficult.

A user in possession of a VDO can retrieve the plaintext prior to the expiration time T by simply reading the secret shares from at least k indices in the DHT and reconstructing the decryption key. When the expiration time passes, the DHT will expunge the stored shares, and, the Vanish authors assert, the information needed to reconstruct the key will be permanently lost. The Vanish team released an implementation based on the million-node Vuze DHT, which is used mainly for BitTorrent tracking.

Vanish is an intriguing approach to an important problem; unfortunately, in its present form, it is insecure. In this paper, we show that data stored using the deployed Vanish system can be recovered long after it is supposed to have been destroyed, that such attacks can be carried out inexpensively, and that alternative approaches to building Vanish are unlikely to be much safer. We also examine what was wrong with the Vanish paper's security analysis, which anticipated attacks like ours but concluded that they would be prohibitively expensive, and draw lessons for the design of future systems.

Attacks: Vanish's security depends on the assumption that an attacker cannot efficiently extract VDO key shares from the DHT before they expire. Suppose an adversary could continuously crawl the DHT and record a copy of everything that gets stored. Later, if he wished to decrypt a Vanish message, he could simply look up the key shares in his logs. Vanish works by encrypting each message with a random key and storing shares of the key in a large, public distributed hash table (DHT).

The paper presents two Sybil attacks against the current Vanish implementation, which stores its encryption keys in the million-node Vuze BitTorrent DHT. These attacks work by continuously crawling the DHT and saving each stored value before it ages out. The security guarantees that Vanish sets out to provide would be extremely useful, but, unfortunately, the system in its current form does not provide them in practice. As the authors have shown, efficient Sybil attacks can recover the keys to almost all Vanish data objects at low cost.

CHAPTER 2

SYSTEM DESIGN

2.1 SYSTEM CONFIGURATION

2.1.1 Hardware Requirements

Processor : Pentium IV 2.4 GHz


Hard Disk : 350 GB
RAM : 3 GB
Input device : Standard Keyboard and Mouse.
Output device : VGA and High Resolution Monitor.

2.1.2 Software Requirements

Operating System : Windows XP and above


Software : JDK 1.6.1

2.2 SYSTEM ANALYSIS

2.2.1 Existing System

In the existing system, Dekey is introduced: a new construction in which users do not need to manage any keys on their own but instead securely distribute the convergent key shares across multiple servers. Dekey is implemented using the Ramp secret sharing scheme and is demonstrated to incur limited overhead in realistic environments. It provides efficiency and reliability guarantees for convergent key management on both the user and the cloud storage sides.

A new construction, Dekey, is proposed to provide efficient and reliable convergent key management through convergent key deduplication and secret sharing. Dekey supports both file-level and block-level deduplication. Security analysis demonstrates that Dekey is secure in terms of the definitions specified in the proposed security model. In particular, Dekey remains secure even if the adversary controls a limited number of key servers. Dekey is implemented using the Ramp secret sharing scheme, which enables the key management to adapt to different reliability and confidentiality levels. The evaluation demonstrates that Dekey incurs limited overhead in normal upload/download operations in realistic cloud environments.

2.2.1.1 Disadvantages of Existing System

Deduplication is not performed by an authorized party, which may lead to security violations.

Any user can modify the data if they know the information about the files.

2.2.2 Proposed System

Convergent encryption has been proposed to enforce data confidentiality while making deduplication feasible. It encrypts/decrypts a data copy with a convergent key, which is obtained by computing the cryptographic hash value of the content of the data copy. After key generation and data encryption, users retain the keys and send the ciphertext to the cloud. Since the encryption operation is deterministic and derived from the data content, identical data copies will generate the same convergent key and hence the same ciphertext.

To prevent unauthorized access, a secure proof-of-ownership protocol is also needed to provide proof that the user indeed owns the same file when a duplicate is found. After the proof, subsequent users with the same file will be provided a pointer from the server without needing to upload the same file. A user can download the encrypted file with the pointer from the server, and it can only be decrypted by the corresponding data owners with their convergent keys. Thus, convergent encryption allows the cloud to perform deduplication on the ciphertexts, and the proof of ownership prevents unauthorized users from accessing the file.

However, previous deduplication systems cannot support differential authorization duplicate check, which is important in many applications. In such an authorized deduplication system, each user is issued a set of privileges during system initialization; each file uploaded to the cloud is also bounded by a set of privileges to specify which kind of users is allowed to perform the duplicate check and access the files.

To better protect data security, this paper makes the first attempt to formally
address the problem of authorized data deduplication. Different from traditional
deduplication systems, the differential privileges of users are further considered in
duplicate check besides the data itself. We also present several new deduplication
constructions supporting authorized duplicate check in a hybrid cloud architecture.
Security analysis demonstrates that our scheme is secure in terms of the definitions
specified in the proposed security model.

As a proof of concept, we implement a prototype of our proposed authorized duplicate check scheme and conduct testbed experiments using our prototype. We show that our proposed authorized duplicate check scheme incurs minimal overhead compared to normal operations.

This secure client-side deduplication is done as follows. Suppose User 1 is the first user who uploads file F. He will execute algorithm E with file F and a security parameter as input and obtain a short secret encryption key, a short encoding C, and a long encoding CF. User 1 will send both C and CF to the cloud storage server, User 2. User 2 will compute the hash value hash(CF), put C in secure and small primary storage, and put CF in the potentially insecure but large secondary storage. Finally, User 2 will add (key = hash(F), value = (hash(CF), C)) to his lookup database. Suppose User 3 tries to upload the same file F after User 1. User 3 will send hash(F) to the cloud storage server User 2. User 2 finds that hash(F) is already in his lookup database. Then User 2, running algorithm V with C as input, interacts with User 3, who is running algorithm P with F as input. At the end of the interaction, User 3 will learn the secret key, and User 2 will compare the hash value hash(CF) provided by User 3 with the one computed by himself. Later, User 3 is allowed to download CF from User 2 at any time and decrypt it to obtain the file F by running algorithm D with the secret key and CF as input.
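A minimal sketch of the storage server's lookup database used above, mapping hash(F) to the pair (hash(CF), C); the types and method names here are illustrative assumptions.

import java.util.HashMap;
import java.util.Map;

public class DedupLookup {

    static class Entry {
        final String hashOfCF;       // hash of the long encoding CF
        final byte[] shortEncoding;  // the short encoding C, kept in primary storage
        Entry(String hashOfCF, byte[] shortEncoding) {
            this.hashOfCF = hashOfCF;
            this.shortEncoding = shortEncoding;
        }
    }

    private final Map<String, Entry> table = new HashMap<String, Entry>();

    // First uploader: store the record under hash(F).
    void register(String hashOfF, String hashOfCF, byte[] shortEncoding) {
        table.put(hashOfF, new Entry(hashOfCF, shortEncoding));
    }

    // A later uploader sends hash(F); a non-null result means a duplicate exists,
    // so the server runs the proof-of-ownership check instead of accepting an upload.
    Entry duplicateCheck(String hashOfF) {
        return table.get(hashOfF);
    }
}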

2.2.2.1 Advantages of Proposed System

The user is only allowed to perform the duplicate check for files marked with the corresponding privileges.

We present an advanced scheme to support stronger security by encrypting the file with differential privilege keys.

The storage size of the tags for integrity check is reduced, which enhances the security of deduplication and protects data confidentiality.

2.2.3 Feasibility Study

Feasibility studies aim to objectively and rationally uncover the strengths and
weaknesses of the existing business or proposed venture, opportunities and threats as
presented by the environment, the resources required to carry through, and ultimately
the prospects for success. In its simplest term, the two criteria to judge feasibility are
cost required and value to be attained. As such, a well-designed feasibility study should
provide a historical background of the business or project, description of the product or
service, accounting statements, details of the operations and management, marketing
research and policies, financial data, legal requirements and tax obligations. Generally,
feasibility studies precede technical development and project implementation.

2.2.3.1 Economical Feasibility

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited. The expenditures must be justified. The developed system is well within the budget, and this was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.

2.2.3.2 Technical Feasibility

The technical feasibility study is carried out to check the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.

2.2.3.3 Operational Feasibility

This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but instead must accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to make constructive criticism, which is welcomed, as he is the final user of the system.

2.3 SOFTWARE SPECIFICATION

2.3.1 Front End: Java - JDK 1.6.1

Java is a high-level, third-generation programming language, like C, Fortran, Smalltalk, Perl, and many others. You can use Java to write computer applications that crunch numbers, process words, play games, store data, or do any of the thousands of other things computer software can do. Compared to other programming languages, Java is most similar to C. However, although Java shares much of C's syntax, it is not C. Knowing how to program in C or, better yet, C++, will certainly help you to learn Java more quickly, but you don't need to know C to learn Java. Unlike C++, Java is not a superset of C. A Java compiler won't compile C code, and most large C programs need to be changed substantially before they can become Java programs.

What's most special about Java in relation to other programming languages is that it lets you write special programs called applets that can be downloaded from the Internet and played safely within a web browser. Traditional computer programs have far too much access to your system to be downloaded and executed willy-nilly. Although you generally trust the maintainers of various ftp archives and bulletin boards to do basic virus checking and not to post destructive software, a lot still slips through the cracks. Even more dangerous software would be promulgated if any web page you visited could run programs on your system.

Java solves this problem by severely restricting what an applet can do. A
Java applet cannot write to your hard disk without your permission. It cannot write to
arbitrary addresses in memory and thereby introduce a virus into your computer. It
should not crash your system.

Object-oriented programming is the catchphrase of computer programming in the 1990s. Although object-oriented programming has been around in one form or another since the Simula language was invented in the 1960s, it really began to take hold in modern GUI environments like Windows, Motif and the Mac. In object-oriented programs data is represented by objects. Objects have two sections, fields (instance variables) and methods. Fields tell you what an object is. Methods tell you what an object does. These fields and methods are closely tied to the object's real-world characteristics and behavior. When a program is run, messages are passed back and forth between objects. When an object receives a message it responds accordingly as defined by its methods.

CHAPTER 3

SYSTEM DESCRIPTION

3.1 MODULES DESCRIPTION

1. System Setup
2. User Module
3. Secure Deduplication System
4. Security of Duplicate Check Token
5. Send Key
6. Performance Evaluation

3.1.1 System Setup

In this module the cloud environment setup is done: the number of users present in the cloud environment and the cloud service providers are set up. The cloud server is the entity that provides data storage access to the data owners and provides efficient access to the cloud storage. The cloud user is the one who accesses and modifies the information present in the cloud data centers.
In Dekey, we assume that the number of KM-CSPs is n.

S1: On input of a security parameter, the user initializes a convergent encryption scheme and two PoW protocols, POWF and POWB, for the file ownership proof and the block ownership proof, respectively.
S2: The S-CSP initializes both the rapid storage system and the file storage system and sets them to be empty.
S3: Each KM-CSP initializes a rapid storage system for block tags and a lightweight storage system for holding convergent key shares, and sets them to be empty.
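To illustrate the idea behind distributing convergent key shares across the KM-CSPs, the sketch below splits a key so that no single server holds it in full. Dekey itself uses the Ramp secret sharing scheme (RSSS); the simple n-out-of-n XOR sharing here is only a stand-in for that scheme.

import java.security.SecureRandom;

public class KeyShareSketch {

    // Split a convergent key into n shares (one per KM-CSP); all n are needed to recover it.
    static byte[][] split(byte[] convergentKey, int n) {
        SecureRandom rnd = new SecureRandom();
        byte[][] shares = new byte[n][convergentKey.length];
        byte[] last = convergentKey.clone();
        for (int i = 0; i < n - 1; i++) {
            rnd.nextBytes(shares[i]);                     // n-1 random shares
            for (int j = 0; j < last.length; j++) {
                last[j] ^= shares[i][j];
            }
        }
        shares[n - 1] = last;                             // final share completes the XOR
        return shares;
    }

    // Recover the key by XORing all shares together.
    static byte[] recover(byte[][] shares) {
        byte[] key = new byte[shares[0].length];
        for (byte[] share : shares) {
            for (int j = 0; j < key.length; j++) {
                key[j] ^= share[j];
            }
        }
        return key;
    }
}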

3.1.2 User Module

In this module, users must have authentication and authorization to access the details maintained by the system. Before accessing or searching the details, a user should have an account; otherwise, they should register first.

3.1.3 Secure Deduplication System

To support authorized deduplication, the tag of a file F will be determined by the file F and the privilege. To show the difference from the traditional notion of a tag, we call it a file token instead. To support authorized access, a secret key kp will be bound with a privilege p to generate a file token. Let φF,p = TagGen(F, kp) denote the token of F that is only allowed to be accessed by users with privilege p. In other words, the token φF,p can only be computed by users with privilege p. As a result, if a file has been uploaded by a user with a duplicate token φF,p, then a duplicate check sent from another user will be successful if and only if he also has the file F and privilege p. Such a token generation function could be easily implemented as H(F, kp), where H(·) denotes a cryptographic hash function.
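A minimal sketch of the token generation H(F, kp) described above, realized here with HMAC-SHA256; the concrete choice of HMAC-SHA256 is an assumption, since the text only requires a keyed cryptographic hash.

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class FileToken {

    // Token of file F under privilege key kp: only holders of kp can compute it.
    static byte[] tagGen(byte[] fileContent, byte[] privilegeKey) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(privilegeKey, "HmacSHA256"));
        return mac.doFinal(fileContent);
    }
}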

Suppose User 1 is the first user who uploads a sensitive file F to the cloud storage. She will independently choose a random AES key and produce two ciphertexts. The first ciphertext CF is generated by encrypting file F with that encryption key using AES, and the size of CF is almost equal to the size of F; the second ciphertext C is generated by encrypting the short AES key, using a key derived from the file F as the encryption key with some custom encryption method, and the size of C is on the order of the key length, which is very small.
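The two-ciphertext construction above can be sketched as follows; deriving the file key with SHA-256 and using AES/CBC with a zero IV for the second encryption are assumptions standing in for the "custom encryption method" mentioned in the text.

import java.security.MessageDigest;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class TwoCiphertexts {

    // Returns { C, CF }: C encrypts the random AES key under a file-derived key,
    // and CF encrypts the file under the random AES key.
    static byte[][] encrypt(byte[] file) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey kappa = kg.generateKey();                                // random AES key

        Cipher aes = Cipher.getInstance("AES/CBC/PKCS5Padding");
        aes.init(Cipher.ENCRYPT_MODE, kappa, new IvParameterSpec(new byte[16]));
        byte[] cF = aes.doFinal(file);                                     // long ciphertext CF

        // Key derived from the file content itself, as in convergent encryption.
        byte[] fileKey = MessageDigest.getInstance("SHA-256").digest(file);
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(fileKey, 0, 16, "AES"),
                new IvParameterSpec(new byte[16]));
        byte[] c = aes.doFinal(kappa.getEncoded());                        // short ciphertext C

        return new byte[][] { c, cF };
    }
}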

3.1.4 Security of Duplicate Check Token

We consider several types of privacy we need to protect; in particular, the unforgeability of the duplicate-check token. There are two types of adversaries: the external adversary and the internal adversary. The external adversary can be viewed as an internal adversary without any privilege.

If a user has privilege p, it is required that the adversary cannot forge and output a valid duplicate token with any other privilege p' on any file F, where p' does not match p. Furthermore, it is also required that, if the adversary does not make a request for a token with its own privilege from the private cloud server, it cannot forge and output a valid duplicate token with p on any F that has been queried.

3.1.5 Send Key

Once the key request is received, the sender can send the key or decline the request. With this key and the request ID that was generated at the time of sending the key request, the receiver can decrypt the message.

3.1.6 Performance Evaluation

The proposed method will be compared with the existing mechanism in order to
evaluate the performance of our proposed methodology.

3.2 ARCHITECTURE DIAGRAM

Figure 3.1 Architecture Diagram



3.3 DATA FLOW DIAGRAM

The data flow of the system is as follows. The user first sends a request to the cloud server to obtain a file token. If the user is not authorized, the request is rejected and the flow ends. An authorized user receives the file token and performs a duplicate check using it. If no duplicate is found, the user uploads the file; if a duplicate is found, the user instead requests authorization for the file.

Figure 3.2 Data Flow Diagram
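The following Java sketch walks through the same flow against a hypothetical in-memory cloud. Names such as requestFileToken and storedFilesByToken are illustrative only and do not appear in the project code.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch of the upload flow in Figure 3.2 (illustrative, not the networked implementation).
public class UploadFlowSketch {
    static final Set<String> authorizedUsers = new HashSet<>();
    static final Map<String, String> storedFilesByToken = new HashMap<>();

    // Steps 1-2: the cloud issues a file token only to authorized users.
    static String requestFileToken(String user, String fileContent, String privilege) {
        if (!authorizedUsers.contains(user)) {
            return null;                                                         // "Is user authorized?" -> No
        }
        return Integer.toHexString((fileContent + "|" + privilege).hashCode());  // toy token
    }

    // Steps 3-4: duplicate check, then upload or authorization request.
    static void upload(String user, String fileContent, String privilege) {
        String token = requestFileToken(user, fileContent, privilege);
        if (token == null) {
            System.out.println(user + ": not authorized, aborting.");
            return;
        }
        if (storedFilesByToken.containsKey(token)) {  // "Is duplicate found?" -> Yes
            System.out.println(user + ": duplicate found, requesting authorization for the file.");
        } else {                                      // -> No
            storedFilesByToken.put(token, fileContent);
            System.out.println(user + ": no duplicate, file uploaded.");
        }
    }

    public static void main(String[] args) {
        authorizedUsers.add("alice");
        authorizedUsers.add("bob");
        upload("alice", "report.pdf contents", "p");
        upload("bob", "report.pdf contents", "p");    // triggers the duplicate branch
    }
}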



3.4 EXPERIMENTAL RESULTS

The performance evaluation is carried out to show the effectiveness of the proposed algorithm by comparing it with the existing algorithm on two performance metrics: the confidentiality level and the reliability level.

Confidentiality Level:

The confidentiality level indicates the degree of confidentiality of the data stored in the cloud storage. The confidentiality level at both encoding time and decoding time is shown in the following figure.

Figure 3.3 Confidentiality Level

In the above graph, the x-axis plots the stage at which confidentiality is measured and the y-axis plots the confidentiality level. The graph shows that a confidentiality level of 0.68 is achieved at the encoding stage and 0.73 at the decoding stage.

Reliability Level

The reliability level defines the ability of the system to perform a particular task while satisfying user requests. The reliability level at both the encoding and decoding stages is compared in the following graph.

Figure 3.4 Reliability Level

The graph shows that encoding can be performed at a reliability rate of 0.7 and decoding at a reliability rate of 0.76.

CHAPTER 4

CONCLUSION

This work presented Dekey, an efficient and reliable convergent key management scheme for secure deduplication. Dekey applies deduplication among convergent keys and distributes convergent key shares across multiple key servers, while preserving the semantic security of the convergent keys and the confidentiality of the outsourced data. This work also addressed an important security concern in cross-user, client-side deduplication of encrypted files in cloud storage: the confidentiality of users' sensitive files against both outside adversaries and the honest-but-curious cloud storage server in the bounded leakage model. On the technical side, the convergent encryption method was enhanced and generalized, and the resulting encryption scheme supports client-side deduplication of encrypted files in the bounded leakage model.

APPENDIX 1

SOURCE CODE

STORAGE SERVER

public StorageServer()
{
    System.out.println("*********STORAGE SERVER********");
    try
    {
        // Reload the existing log entries so that earlier sessions are preserved.
        logs = new Vector();
        BufferedReader brs = new BufferedReader(new FileReader("Log.txt"));
        String lins = "";
        while ((lins = brs.readLine()) != null)
        {
            logs.add(lins.trim());
        }
        brs.close();
        // Re-open the log file in append mode for new entries.
        bws = new BufferedWriter(new FileWriter("Log.txt", true));
        db = new database_conn();

        // Build the Swing user interface of the storage server.
        pan.setLayout(null);
        firstMenu.add(firstItem);
        firstMenu.add(secondItem);
        bar.add(firstMenu);
        pan.add(bar);
        bar.setBounds(0, 0, 600, 35);
        frm.setSize(600, 650);
        frm.setLocation(350, 50);
        jsp.setBounds(5, 40, 585, 580);
        firstItem.addActionListener(this);
        secondItem.addActionListener(this);
        tarea.setBackground(new Color(210, 220, 220));
        tarea.setFont(new java.awt.Font("Bookman Old Style", 1, 18));
        pan.add(jsp);
        frm.add(pan);
        frm.setVisible(true);
        tarea.append("\t Storage Server\n\n");
        tarea.append("Secure Deduplication with Efficient and Reliable Convergent Key Management\n\n");
        tarea.setEditable(false);
    }
    catch (Exception jj)
    {
        jj.printStackTrace();
    }
}

KEY SERVER

public KeyServer()
{
    try
    {
        System.out.println("*********KEYSERVER********");
        db = new database_conn();

        // Build the Swing user interface of the key server.
        pan.setLayout(null);
        firstMenu.add(firstItem);
        firstMenu.add(secondItem);
        firstMenu.add(thirdItem);
        bar.add(firstMenu);
        pan.add(bar);
        bar.setBounds(0, 0, 600, 35);
        frm.setSize(600, 650);
        frm.setLocation(350, 50);
        jsp.setBounds(5, 40, 585, 580);
        firstItem.addActionListener(this);
        secondItem.addActionListener(this);
        thirdItem.addActionListener(this);
        tarea.setBackground(new Color(220, 220, 210));
        tarea.setFont(new java.awt.Font("Bookman Old Style", 1, 18));
        pan.add(jsp);
        frm.add(pan);
        frm.setVisible(true);
        tarea.append("\t Key Server\n\n");
        tarea.append(" Secure Deduplication with Efficient and Reliable Convergent Key Management\n\n");
        tarea.setEditable(false);
    }
    catch (Exception jj)
    {
        jj.printStackTrace();
    }
}

class storageComm implements Runnable
{
    ObjectOutputStream out = null;
    ObjectInputStream in = null;

    storageComm(Socket sc)
    {
        try
        {
            out = new ObjectOutputStream(sc.getOutputStream());
            in = new ObjectInputStream(sc.getInputStream());

            // Exchange identifiers with the storage server over the socket.
            String kid = "KS" + ID;
            out.writeObject(kid);
            String rid = (String) in.readObject();

            // Remember the streams and the connection id of this storage server.
            inputStr.add(in);
            outputStr.add(out);
            conId.add(rid);
            tarea.append("\nConnection Established With Storage Server Which Id is : " + rid);

            // Record the connection in the key server's database table.
            String iqry = "insert into Key" + ID + " values('" + rid + "','SS')";
            db.st.executeUpdate(iqry);
        }
        catch (Exception jj)
        {
            jj.printStackTrace();
        }
    }
}

USER

public user()
{
    System.out.println("*********USER********");
    try
    {
        db = new database_conn();

        // Build the user menu and the mail controls of the Swing interface.
        pan.setLayout(null);
        firstMenu.add(firstItem);
        firstMenu.add(seventhItem);
        firstMenu.add(secondItem);
        firstMenu.add(thirdItem);
        firstMenu.add(fourthItem);
        firstMenu.add(fifthItem);
        firstMenu.add(sixthItem);
        bar.add(firstMenu);
        pan.add(bar);
        bar.setBounds(0, 0, 600, 35);
        frm.setSize(600, 650);
        frm.setLocation(350, 50);
        jsp.setBounds(5, 40, 585, 580);
        lbl_mail.setBounds(620, 80, 200, 30);
        jsp1.setBounds(620, 130, 200, 100);
        btn_mail.setBounds(620, 240, 150, 30);
        firstItem.addActionListener(this);
        secondItem.addActionListener(this);
        thirdItem.addActionListener(this);
        fourthItem.addActionListener(this);
        fifthItem.addActionListener(this);
        sixthItem.addActionListener(this);
        seventhItem.addActionListener(this);
        btn_mail.addActionListener(this);
        tarea.setBackground(new Color(210, 220, 220));
        tarea.setFont(new java.awt.Font("Bookman Old Style", 1, 18));
        pan.add(jsp);
        pan.add(lbl_mail);
        pan.add(jsp1);
        pan.add(btn_mail);
        frm.add(pan);
        frm.setVisible(true);

        // The mail controls stay disabled until they are needed.
        lbl_mail.setEnabled(false);
        jsp1.setEnabled(false);
        btn_mail.setEnabled(false);
        txt_mail.setEnabled(false);

        tarea.append("\t User Process\n\n");
        tarea.append(" Secure Deduplication with Efficient and Reliable Convergent Key Management\n\n");
        //tarea.setEditable(false);
    }
    catch (Exception jj)
    {
        jj.printStackTrace();
    }
}

public int conditionChk(int prime, int a, int b)
{
    int ret = 0;
    try
    {
        // Parameter check: computes (4*a*a + 27*b*b) mod prime and rejects values where it is 0.
        int res = ((4 * (a * a)) + (27 * (b * b))) % prime;
        // System.out.println("Result " + res);
        if (res == 0)
        {
            ret = 0;
        }
        else
        {
            ret = 1;
        }
    }
    catch (Exception jj)
    {
        jj.printStackTrace();
    }
    return ret;
}

CONSUMER
public Consumer()
{
    try
    {
        System.out.println("*********CONSUMER********");
        db = new database_conn();
        pan.setLayout(null);
        firstMenu.add(firstItem);
        firstMenu.add(secondItem);
        firstMenu.add(thirdItem);
        //firstMenu.add(fourthItem);
        firstMenu.add(seventhItem);
        firstMenu.add(sixthItem);
        bar.add(firstMenu);
        pan.add(bar);
        bar.setBounds(0, 0, 600, 35);
        frm.setSize(600, 650);
        frm.setLocation(350, 50);
        jsp.setBounds(5, 40, 585, 580);
        firstItem.addActionListener(this);
        secondItem.addActionListener(this);
        thirdItem.addActionListener(this);
        fourthItem.addActionListener(this);
        sixthItem.addActionListener(this);
        seventhItem.addActionListener(this);
        tarea.setBackground(new Color(210, 220, 220));
        tarea.setFont(new java.awt.Font("Bookman Old Style", 1, 18));
        pan.add(jsp);
        frm.add(pan);
        frm.setVisible(true);
        tarea.append("\t Consumer Process\n\n");
        tarea.append(" Secure Deduplication with Efficient and Reliable Convergent Key Management\n\n");
        //tarea.setEditable(false);
    }
    catch (Exception jj)
    {
        jj.printStackTrace();
    }
}

APPENDIX 2

SCREEN SHOTS

STORAGE SERVER SETUP

STORAGE SERVER INITIALIZATION

KEY SERVER SETUP

CONNECTION TO STORAGE SERVER

KEY SERVER INITIALIZATION

USER SETUP

CONNECTION TO KEY SERVER AND STORAGE SERVER

USER PROCESS

FILE UPLOAD

CONFIRMATION

CONSUMER SETUP

FILE DOWNLOAD BY CONSUMER

REFERENCES

[1] Li, J., Chen, X., Li, M., Li, J., Lee, P.P.C., and Lou, W. (2014), "Secure Deduplication with Efficient and Reliable Convergent Key Management", IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 6.

[2] Acharya, A., Uysal, M., and Saltz, J. (1998), "Active disks: Programming model, algorithms and evaluation", in Proc. 8th Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp. 81-91.

[3] Cholez, T., Chrisment, I., and Festor, O. (2009), "Evaluation of Sybil attack protection schemes in KAD", in Proc. 3rd Int. Conf. Autonomous Infrastructure, Management and Security, Berlin, Germany, pp. 70-82.

[4] Shamir, A. (1979), "How to share a secret", Commun. ACM, vol. 22, no. 11, pp. 612-613.

[5] Tang, Y., Lee, P.P.C., Lui, J.C.S., and Perlman, R. (2010), "FADE: Secure overlay cloud storage with file assured deletion", in Proc. SecureComm.

[6] Wang, C., Wang, Q., Ren, K., and Lou, W. (2010), "Privacy-preserving public auditing for storage security in cloud computing", in Proc. IEEE INFOCOM.

[7] Weil, S.A., Brandt, S.A., Miller, E.L., Long, D.D.E., and Maltzahn, C. (2006), "Ceph: A scalable, high-performance distributed file system", in Proc. 7th Symp. Operating Systems Design and Implementation (OSDI).

[8] Welch, B., Unangst, M., Abbasi, Z., Gibson, G., Mueller, B., Small, J., Zelenka, J., and Zhou, B. (2008), "Scalable performance of the Panasas parallel file system", in Proc. 6th USENIX Conf. File and Storage Technologies (FAST).

[9] Wickremesinghe, R., Chase, J., and Vitter, J. (2002), "Distributed computing with load-managed active storage", in Proc. 11th IEEE Int. Symp. High Performance Distributed Computing (HPDC), pp. 13-23.

[10] Wolchok, S., Hofmann, O.S., Heninger, N., Felten, E.W., Halderman, J.A., Rossbach, C.J., Waters, B., and Witchel, E. (2010), "Defeating Vanish with low-cost Sybil attacks against large DHTs", in Proc. Network and Distributed System Security Symp. (NDSS).
