
INTRODUCTION

• Cloud services can be divided into three categories based on the amount of control retained by the cloud providers.

– Software as a Service (SaaS)

– Platform as a Service (PaaS)

– Infrastructure as a Service (IaaS)


INTRODUCTION (CONTD.)
• With Software as a Service (SaaS), the provider retains the most control; customers can access software functionality on demand, but little else.
• Platform as a Service (PaaS) provides customers with a choice of execution environment, development tools, etc., but not the ability to administer their own Operating System (OS).
• With Infrastructure as a Service (IaaS), the provider relinquishes the most control: customers can install and administer their own choice of OS, and install and run anything on the provided virtualised hardware.
ABSTRACT
• Data deduplication is a technique for eliminating duplicate copies of data, and has been widely used in cloud storage to reduce storage space and upload bandwidth. However, only one copy of each file is stored in the cloud even if that file is owned by a huge number of users, which makes reliability a concern.
• This project makes the first attempt to formalize the notion of a distributed reliable deduplication system, and proposes new distributed deduplication systems with higher reliability, in which the data chunks are distributed across multiple cloud servers.
• The security requirements of data confidentiality and tag consistency are also achieved by introducing a deterministic secret sharing scheme in distributed storage systems, instead of using convergent encryption as in previous deduplication systems (a simplified sketch follows).
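A minimal sketch of the idea in Python, using a simplified (n, n) XOR-based sharing; the actual scheme in the literature is a Ramp Secret Sharing Scheme (RSSS), and all names here are hypothetical. The essential property is that the shares are derived deterministically from the content, so identical files always produce identical shares and can still be deduplicated:

```python
import hashlib

def deterministic_shares(secret: bytes, n: int) -> list[bytes]:
    """Split `secret` into n XOR shares derived deterministically from it."""
    shares, acc = [], bytes(len(secret))
    for i in range(n - 1):
        # stretch a per-index mask from SHA-256(index || counter || secret)
        mask, ctr = b"", 0
        while len(mask) < len(secret):
            mask += hashlib.sha256(
                i.to_bytes(2, "big") + ctr.to_bytes(4, "big") + secret
            ).digest()
            ctr += 1
        share = mask[: len(secret)]
        shares.append(share)
        acc = bytes(a ^ b for a, b in zip(acc, share))
    # final share makes the XOR of all n shares equal the secret
    shares.append(bytes(a ^ b for a, b in zip(secret, acc)))
    return shares

def reconstruct(shares: list[bytes]) -> bytes:
    """XOR all n shares back together to recover the secret."""
    out = bytes(len(shares[0]))
    for s in shares:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out
```

Because the same file always yields the same share set, two users uploading the same file produce identical shares that the storage servers can deduplicate, while no single server ever holds enough shares to recover the file on its own.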
LITERATURE REVIEW
1. Reclaiming Space from Duplicate Files in a Serverless Distributed File System
Author’s Name: John R. Douceur, Atul Adya, William J. Bolosky, and Marvin Theimer
• This paper presents a mechanism to reclaim space from incidental duplication and make it available for controlled file replication.
• Their mechanism includes convergent encryption, which enables duplicate files to be coalesced into the space of a single file, even if the files are encrypted with different users’ keys (see the sketch below).
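As a rough illustration, convergent encryption derives the key from the file content itself. A minimal sketch in Python, assuming the third-party `cryptography` package; the fixed zero nonce is tolerable here only because each content-derived key encrypts exactly one message:

```python
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    # the key is derived from the content itself, so identical plaintexts
    # (even from different users) produce identical ciphertexts
    key = hashlib.sha256(plaintext).digest()
    ciphertext = AESGCM(key).encrypt(b"\x00" * 12, plaintext, None)
    return key, ciphertext

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return AESGCM(key).decrypt(b"\x00" * 12, ciphertext, None)
```

Two users who encrypt the same file independently obtain byte-identical ciphertexts, so the server can detect and coalesce duplicates without ever seeing the plaintext.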
Limitations:
• Relocating the replicas of files with identical content to a common set
of storage machines.
• Coalescing the identical files to reclaim storage space, while
maintaining the semantics of separate files.
2. Fast and Secure Laptop Backups with Encrypted De-duplication
Author’s Name: Paul Anderson and Le Zhang

• This paper describes an algorithm which takes advantage of the data that is common between users to increase the speed of backups and reduce the storage requirements.
• This algorithm supports client-end per-user encryption, which is necessary for confidential personal data.
Limitations:
• In particular, a change to any node implies a change to all of the ancestor nodes up to the root (illustrated in the sketch below).
• It is extremely difficult to estimate the impact of this in a production
environment.
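This limitation stems from the hash-tree layout such backup systems typically use. A small illustrative sketch in Python (not the paper’s actual code) shows how modifying a single leaf chunk changes the root hash, and hence every hash on the path up to it:

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # hash each leaf, then combine pair-wise until one root remains
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

chunks = [b"chunk-0", b"chunk-1", b"chunk-2", b"chunk-3"]
root_before = merkle_root(chunks)
chunks[2] = b"chunk-2-modified"     # one changed chunk...
root_after = merkle_root(chunks)
assert root_before != root_after    # ...changes every ancestor up to the root
```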
3. A Secure Data Deduplication Scheme for Cloud Storage
Author’s Name: Jan Stanek, Alessandro Sorniotti, Elli Androulaki

• This paper presents a novel idea that differentiates data according to its popularity.
• This way, data de-duplication can be effective for popular data, whilst semantically secure encryption protects unpopular content (see the sketch below).
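A toy sketch of the popularity idea, with hypothetical names and a made-up threshold; the real scheme uses threshold cryptography, so the server never sees the plaintext or its hash:

```python
import hashlib
import os

POPULARITY_THRESHOLD = 3            # hypothetical switch-over point
upload_counts: dict[bytes, int] = {}

def pick_encryption_key(plaintext: bytes) -> tuple[bytes, str]:
    fid = hashlib.sha256(plaintext).digest()        # content identifier
    upload_counts[fid] = upload_counts.get(fid, 0) + 1
    if upload_counts[fid] >= POPULARITY_THRESHOLD:
        # popular data: content-derived (convergent) key, deduplicable
        return hashlib.sha256(b"key|" + plaintext).digest(), "convergent"
    # unpopular data: fresh random key, semantically secure, not deduplicable
    return os.urandom(32), "random"
```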
Limitations:
• The design of storage efficiency functions in general, and of deduplication functions in particular, that do not lose their effectiveness in the presence of end-to-end security is therefore still an open problem.
EXISTING SYSTEM

• Users keep multiple data copies with the same content, which leads to data redundancy.
• In existing work, a file is eliminated based on its file name alone; the content of the file is not verified.
DISADVANTAGES OF EXISTING SYSTEM

• No existing work on secure deduplication can properly address the reliability and tag consistency problems in distributed storage systems.
• This leads to data redundancy and data loss.
PROPOSED WORK

• With the explosive growth of digital data, deduplication techniques are widely employed to back up data and to minimize network and storage overhead by detecting and eliminating redundancy among data.
• Instead of keeping multiple data copies with the same content,
deduplication eliminates redundant data by keeping only one
physical copy and referring other redundant data to that copy.
• The proposed work describes the concept of file-level deduplication, which discovers redundancies between different files and removes these redundancies to reduce capacity demands (a minimal sketch follows).
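A minimal sketch of file-level deduplication in Python, with hypothetical names (`STORE`, `INDEX`): the file’s SHA-256 digest serves as its identity, so a second upload of identical content stores nothing new:

```python
import hashlib
import shutil
from pathlib import Path

STORE = Path("dedup_store")      # hypothetical on-disk store
INDEX: dict[str, Path] = {}      # content digest -> physical copy

def upload(path: Path) -> str:
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest not in INDEX:                  # first copy: keep it physically
        STORE.mkdir(exist_ok=True)
        INDEX[digest] = STORE / digest
        shutil.copyfile(path, INDEX[digest])
    # duplicates just get a reference to the one physical copy
    return digest                            # file id handed back to the owner
```

A real system would also persist the index and reference-count each entry, so a physical copy is deleted only when its last owner removes it.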
ADVANTAGES OF PROPOSED SYSTEM

• As a result, the deduplication system improves storage utilization without sacrificing reliability.
• It increases the available storage space in the cloud environment.
• Any number of users can upload files with efficient bandwidth utilization.
• Confidentiality, reliability and integrity can be achieved in our
proposed system.
SYSTEM SPECIFICATIONS
HARDWARE DESCRIPTION

• Processor : Intel Pentium 4
• RAM : 2 GB
• Hard Disk Drive : 500 GB
• Keyboard : Standard 128 Keys
• Monitor : 19.5″ TFT Monitor
• Mouse : Logitech Serial Mouse
SOFTWARE DESCRIPTION

• Operating System : Windows 7
• Front-End : Visual Studio 2012
• Back-End : SQL Server 2005
SYSTEM FLOW DIAGRAM
IMPLEMENTATION

• User Registration and User Login

• Cloud Service Provider

• Data Users Module

• File Upload & Download


USER REGISTRATION AND USER LOGIN

• In this module, if a user wants to access the data stored in the cloud, he/she should first register his/her details.
• These details are maintained in a database. An authorized user can download a file by using the file id that was stored by the data owner when the file was uploaded.
• The owner can permit or deny access to the data, so users can access files only through the corresponding data owner.
• If the owner does not allow it, the user cannot get the data (a minimal access-check sketch follows).
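A minimal sketch of the owner-controlled access check described above, using hypothetical in-memory tables in place of the database:

```python
# hypothetical in-memory access-control tables
users: set[str] = set()
permissions: dict[str, set[str]] = {}   # file id -> user ids granted by owner

def register(user_id: str) -> None:
    users.add(user_id)

def grant(file_id: str, user_id: str) -> None:
    # called by the data owner to permit access to one of their files
    permissions.setdefault(file_id, set()).add(user_id)

def can_download(file_id: str, user_id: str) -> bool:
    # only registered users explicitly permitted by the owner may download
    return user_id in users and user_id in permissions.get(file_id, set())
```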
CLOUD SERVICE PROVIDER

• In this module, we develop the Cloud Service Provider module. This is an entity that provides a data storage service in a public cloud.
• The S-CSP (storage cloud service provider) provides the data outsourcing service and stores data on behalf of the users.
• To reduce the storage cost, the S-CSP eliminates the storage of redundant data via deduplication and keeps only unique data (a toy placement sketch follows).
• In this project, we assume that the S-CSP is always online and has abundant storage capacity and computation power.
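A toy sketch of how a file’s shares (or chunks) might be spread across several S-CSP servers, with hypothetical server names; a single server then holds at most a fraction of any file:

```python
def place_shares(shares: list[bytes], servers: list[str]) -> list[tuple[str, bytes]]:
    """Round-robin placement so no single server holds every share of a file."""
    return [(servers[i % len(servers)], share) for i, share in enumerate(shares)]

placement = place_shares([b"s0", b"s1", b"s2", b"s3"],
                         ["csp-0", "csp-1", "csp-2"])
# -> [("csp-0", b"s0"), ("csp-1", b"s1"), ("csp-2", b"s2"), ("csp-0", b"s3")]
```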
DATA USERS MODULE

• A user is an entity that wants to outsource data storage to the S-CSP and access the data later.
• In a storage system supporting deduplication, the user uploads only unique data and does not re-upload duplicate data, which may be owned by the same user or by different users; this saves upload bandwidth.
• In the authorized deduplication system, each user is issued a
set of privileges in the setup of the system.
FILE UPLOAD

• In this module, the owner uploads the file (along with its metadata) into the database; with the help of this metadata and the file contents, the end user can download the file.
Screenshots: Admin login form, File upload form, File upload result form, File existed notification, User download form
REFERENCES

• J. Gantz and D. Reinsel, “The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the Far East,” IDC, Dec. 2012.
• M. O. Rabin, “Fingerprinting by random polynomials,” Center for Res. Comput. Technol., Harvard Univ., Tech. Rep. TR-CSE-03-01, 1981.
• J. R. Douceur, A. Adya, W. J. Bolosky, and M. Theimer, “Reclaiming space from duplicate files in a serverless distributed file system,” in Proc. 22nd Int. Conf. Distrib. Comput. Syst., 2002, pp. 617–624.
• M. Bellare, S. Keelveedhi, and T. Ristenpart, “DupLESS: Server-aided encryption for deduplicated storage,” in Proc. 22nd USENIX Conf. Secur. Symp., 2013, pp. 179–194.
• M. Bellare, S. Keelveedhi, and T. Ristenpart, “Message-locked encryption and secure deduplication,” in Proc. EUROCRYPT, 2013, pp. 296–312.
