
Seminar-I Report
on
Hybrid Cloud Approach for Secure Authorized De-duplication

Submitted by
Ms. Tejaswini Bhamare
MASTER OF ENGINEERING (Computer Engineering)

Under the Guidance of
Prof. B. R. Nandwalkar

DEPARTMENT OF COMPUTER ENGINEERING
Kalyani Charitable Trust's
Late G. N. Sapkal College of Engineering, Anjaneri, Nashik - 422212
Academic Year 2014-2015

Kalyani Charitable Trust's
Late G. N. Sapkal College of Engineering

Certificate

This is to certify that the Seminar-I entitled

Hybrid Cloud Approach for Secure Authorized Deduplication

submitted by

Ms. Tejaswini Bhamare
M.E. (Computer Engineering)

has successfully completed her Seminar-I towards the partial fulfillment of the
Master's Degree in Computer Engineering of Savitribai Phule Pune University
during the year 2014-2015.

Prof. B. R. Nandwalkar (Guide)
Prof. N. R. Wankhede (HOD)
Dr. V. J. Gond (Principal)

Contents

List of Figures

1 Introduction
  1.1 Problem Definition
      1.1.1 Justification of Problem
      1.1.2 Need of System
      1.1.3 Applications
      1.1.4 Contribution

2 Literature Survey
  2.1 Study of Existing Systems / Technologies
      2.1.1 DupLESS: Server-Aided Encryption for Deduplicated Storage
      2.1.2 Proofs of Ownership in Remote Storage Systems
      2.1.3 Twin Clouds: An Architecture for Secure Cloud Computing
      2.1.4 Private Data Deduplication Protocols in Cloud Storage
  2.2 Analysis of Existing Systems / Technologies
  2.3 Proposed System
      2.3.1 Encryption Files
      2.3.2 Confidential Encryption
      2.3.3 Proof of Data

3 Technical Details
  3.1 Concept
      3.1.1 Identification Protocol
      3.1.2 Hash-based Deduplication
      3.1.3 Roles of Entities
      3.1.4 Operations Performed on Hybrid Cloud
  3.2 Design Goals
  3.3 Performance Analysis

4 Conclusion

List of Figures

1.1 Architecture of Cloud Computing
1.2 Architecture of Hybrid Cloud
1.3 Notation used in paper
2.1 Confidential Encryption
2.2 Architecture of Authorized Deduplication
3.1 Implementation of hash algorithm
3.2 Hash Work Flow Chart

Chapter 1
Introduction
1.1 Problem Definition

In computing, data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. Related and somewhat synonymous terms are intelligent (data) compression and single-instance (data) storage. This technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent [?]. In the deduplication process, unique chunks of data, or byte patterns, are identified and stored during a process of analysis. As the analysis continues, other chunks are compared to the stored copy and, whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency depends on the chunk size), the amount of data that must be stored or transferred can be greatly reduced.

A hybrid cloud is a combined form of private and public clouds, in which some critical data resides in the enterprise's private cloud while other data is stored in, and accessible from, a public cloud. Hybrid clouds seek to deliver the scalability, reliability, rapid deployment and potential cost savings of public clouds together with the security and increased control and management of private clouds. As cloud computing becomes popular, an increasing amount of data is being stored in the cloud and used by users with specified privileges, which define the access rights of the stored data.

Figure 1.1: Architecture of Cloud Computing

The critical challenge of cloud storage and cloud computing is the management of a continuously increasing volume of data. Data deduplication, or single instancing, essentially refers to the elimination of redundant data: duplicate data is deleted, leaving only one copy (single instance) of the data to be stored, although an index of all data is still retained should that data ever be required. In general, data deduplication eliminates the duplicate copies of repeating data.

Data is encrypted before being outsourced to the cloud or network. This encryption adds time and space requirements for encoding the data, and for large data stores it becomes even more complex and critical. By using data deduplication inside a hybrid cloud, the encryption becomes simpler.

The network holds an abundant amount of data, shared by the users and nodes in it. Many large-scale networks use a data cloud to store and share their data, and every node or user in the network has full rights to upload or download data. However, different users often upload the same data, which creates duplication inside the cloud. When a user then retrieves or downloads that data, the cloud has to handle two encrypted files of the same data and perform the same operations on both copies. This violates data confidentiality and the security of the cloud, and it places an unnecessary burden on cloud operations.

To avoid this duplication of data and to maintain confidentiality in the cloud, we use the concept of a hybrid cloud, a combination of public and private clouds. Hybrid cloud storage combines the scalability, reliability, rapid deployment and potential cost savings of public cloud storage with the security and full control of private cloud storage.

Figure 1.2: Architecture of Hybrid Cloud

1.1.1 Justification of Problem

Data deduplication is the process of eliminating redundant data by comparing new segments with segments already stored and keeping only one copy. The technology can lead to a significant reduction in required storage space, especially in situations where redundancy is high. As a result, data deduplication has firmly established itself in the backup market.

1.1.2 Need of System

Data deduplication is a technique for reducing the amount of storage space an organization needs to save its data. In most organizations, the storage systems contain duplicate copies of many pieces of data. For example, the same file may be saved in several different places by different users, or two or more files that aren't identical may still include much of the same data. Deduplication eliminates these extra copies by saving just one copy of the data and replacing the other copies with pointers that lead back to the original copy. Companies frequently use deduplication in backup and disaster recovery applications, but it can also be used to free up space in primary storage.

1.1.3 Applications

Hybrid clouds can be built to suit almost any IT environment or architecture, whether an enterprise-wide IT network or a single department. Stored public data, such as the statistical analyses published by social media companies or government entities, can be combined with an organization's internal corporate data, so that the organization gains the most from pursuing hybrid cloud benefits. However, big-data analysis and high-performance computing that span clouds remain challenging.

1.1.4 Contribution

To solve the problems of deduplication, we consider a hybrid cloud architecture consisting of a public cloud and a private cloud. Different privilege levels are allocated so that the duplicate check can be performed securely in the private cloud. A new deduplication system supporting a differential duplicate check is proposed under this hybrid cloud architecture, where the storage cloud service provider (S-CSP) resides in the public cloud.

Figure 1.3: Notation used in paper

Chapter 2
Literature Survey
2.1 Study of Existing Systems / Technologies

2.1.1 DupLESS: Server-Aided Encryption for Deduplicated Storage

Cloud storage service providers such as Mozy, Dropbox, and others perform deduplication to save space by storing only one copy of each uploaded file. Message-locked encryption was proposed to resolve the problem that when clients conventionally encrypt their files, these savings are lost. DupLESS provides secure deduplicated storage that also resists brute-force attacks: clients encrypt under message-based keys obtained from a key server via an oblivious PRF protocol. This allows clients to store encrypted data with an existing service, have the service perform deduplication on their behalf, and still achieve strong confidentiality guarantees. The authors show that encryption for deduplicated storage can reach the desired performance, with space savings close to those of using the storage service with plaintext data.

Characteristics:

1. Stronger security.
2. An easily deployed solution for encryption that supports deduplication.
3. User friendly: a command-line client that supports both Dropbox and Google Drive.
4. Resolves the brute-force weakness of message-locked encryption.

2.1.2 Proofs of Ownership in Remote Storage Systems

Such a system stores only a single copy of duplicated data. Client-side deduplication tries to identify deduplication opportunities already at the client and saves the bandwidth of uploading copies of existing files to the server. To overcome attacks on this mechanism, Shai Halevi, Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg propose proofs of ownership, which let a client efficiently prove to a server that the client holds a file, rather than just some short information about it. They present solutions based on Merkle trees and specific encodings, and analyse their security.

Characteristics:

1. Identifies the attacks that exploit client-side deduplication.
2. Proofs of ownership provide rigorous security.
3. Meets the rigorous efficiency requirements of petabyte-scale storage systems.

2.1.3 Twin Clouds: An Architecture for Secure Cloud Computing

S. Bugiel, S. Nurnberger, A. Sadeghi, and T. Schneider proposed an architecture for secure outsourcing of data and arbitrary computations to an untrusted commodity cloud. In their approach, the user communicates with a trusted cloud, which encrypts and verifies the data stored, and the operations performed, in the untrusted cloud. The computations are divided such that the trusted cloud handles the security-critical operations in the less time-critical setup phase, whereas queries to the outsourced data are processed in parallel by the fast commodity cloud on encrypted data.

2.1.4 Private Data Deduplication Protocols in Cloud Storage

One of the most important issues in cloud storage is utilization of the storage capacity. This paper describes two categories of data deduplication strategy and extends the fault-tolerant digital signature scheme proposed by Zhang, examining the redundancy of blocks to achieve data deduplication. The proposed scheme not only reduces the required cloud storage capacity but also improves the speed of data deduplication. Furthermore, a signature is computed for every uploaded file to verify the integrity of files.

2.2 Analysis of Existing Systems / Technologies

Our system is designed to solve the differential privilege problem in secure deduplication. The security is analysed in terms of two aspects: the authorization of the duplicate check and the confidentiality of data. Some basic tools, assumed to be secure, are used to construct the secure deduplication: the convergent encryption scheme, a symmetric encryption scheme, and the PoW (proof of ownership) scheme. Based on this assumption, the systems are secure with respect to the following security analysis.

2.3 Proposed System

In the proposed system we achieve data deduplication by requiring a proof of data from the data owner. This proof is used at the time of uploading a file. Each file uploaded to the cloud is also bound to a set of privileges that specify which kinds of users are allowed to perform the duplicate check and access the file. Before submitting a duplicate-check request for some file, the user needs to take this file and his own privileges as inputs. The user can find a duplicate for this file if and only if there is a copy of this file and a matched privilege stored in the cloud.

2.3.1 Encryption Files

Here we use a common secret key k to both encrypt and decrypt data, converting plain text to cipher text and cipher text back to plain text. Three basic functions are used:

KeyGenSE(1^λ) → k: the key generation algorithm that generates the secret key k using the security parameter 1^λ;

EncSE(k, M) → C: the symmetric encryption algorithm that takes the secret key k and message M and outputs the ciphertext C;

DecSE(k, C) → M: the symmetric decryption algorithm that takes the secret key k and ciphertext C and outputs the original message M.
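The report's prototype implements these operations in C++ with OpenSSL; as a minimal illustrative sketch only, the following Python code (assuming the third-party cryptography package, which the report does not name) realizes the same KeyGenSE / EncSE / DecSE interface with AES-256 in CBC mode.

import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def keygen_se() -> bytes:
    # KeyGenSE: generate a random 256-bit secret key k.
    return os.urandom(32)

def enc_se(k: bytes, message: bytes) -> bytes:
    # EncSE(k, M): encrypt message M under key k, returning ciphertext C.
    iv = os.urandom(16)                   # fresh IV for each encryption
    padder = padding.PKCS7(128).padder()  # pad M to the AES block size
    padded = padder.update(message) + padder.finalize()
    enc = Cipher(algorithms.AES(k), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()

def dec_se(k: bytes, ciphertext: bytes) -> bytes:
    # DecSE(k, C): decrypt ciphertext C under key k, returning message M.
    iv, body = ciphertext[:16], ciphertext[16:]
    dec = Cipher(algorithms.AES(k), modes.CBC(iv)).decryptor()
    padded = dec.update(body) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()

k = keygen_se()
assert dec_se(k, enc_se(k, b"plain text")) == b"plain text"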

2.3.2 Confidential Encryption

Convergent encryption provides data confidentiality in deduplication. A user derives a convergent key from each original data copy and encrypts the data copy with that key. In addition, the user derives a tag for the data copy, and this tag is used to detect duplicates.

Figure 2.1: Confidential Encryption
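The following sketch (Python, illustrative names only) shows the convergent part: because the key and the tag are derived deterministically from the data itself, two users holding the same file produce the same ciphertext and tag, which is exactly what allows duplicates to be detected without revealing the plaintext. It reuses enc_se from the symmetric-encryption sketch above.

import hashlib

def convergent_key(data: bytes) -> bytes:
    # Derive the 256-bit convergent key from the data copy itself.
    return hashlib.sha256(data).digest()

def derive_tag(data: bytes) -> str:
    # Derive the duplicate-detection tag as the hash of the convergent key,
    # so the tag reveals neither the data nor the encryption key.
    return hashlib.sha256(convergent_key(data)).hexdigest()

data = b"some file contents"
ciphertext = enc_se(convergent_key(data), data)  # encrypt under convergent key
assert derive_tag(data) == derive_tag(b"some file contents")  # same data, same tag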

2.3.3 Proof of Data

Proof of data (proof of ownership) lets a user efficiently prove to the storage server that he actually holds a file, rather than just some short information about it such as its hash. When a duplicate is detected at upload time, the server runs this proof with the user before granting access to the stored copy, so an attacker who knows only a file's hash cannot claim ownership of the file.

Figure 2.2: Architecture of Authorized Deduplication

Chapter 3
Technical Details
3.1 Concept

3.1.1 Identification Protocol

An identification protocol can be described in two phases: Proof and Verify. In the Proof stage, a prover/user U demonstrates his identity to a verifier by performing an identification proof related to his identity. The input of the prover/user is his private key skU, sensitive information such as the private key of a public key in his certificate or a credit card number, which he would not like to share with other users. The verifier performs the verification with input of public information pkU related to skU. At the conclusion of the protocol, the verifier outputs either accept or reject to denote whether the proof has passed. There are many efficient identification protocols in the literature, including certificate-based and identity-based identification.
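As one concrete, hedged example (the report does not fix a specific protocol), a signature-based identification protocol can be sketched in a few lines: the verifier issues a fresh random challenge, the prover signs it with skU, and the verifier checks the signature against pkU and outputs accept or reject. The sketch below uses Ed25519 from the third-party cryptography package; all of these choices are assumptions for illustration.

import os
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

sk_u = Ed25519PrivateKey.generate()  # prover's sensitive private key skU
pk_u = sk_u.public_key()             # public information pkU given to verifier

# Proof phase: the prover signs a fresh challenge chosen by the verifier.
challenge = os.urandom(32)
proof = sk_u.sign(challenge)

# Verify phase: the verifier outputs accept or reject.
try:
    pk_u.verify(proof, challenge)
    print("accept")
except InvalidSignature:
    print("reject")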

3.1.2 Hash-based Deduplication

Hash-based data deduplication methods use a hashing algorithm to identify chunks of data. Commonly used algorithms are the Secure Hash Algorithm (SHA-1) and the Message-Digest Algorithm (MD5). When data is processed by a hashing algorithm, a hash is created that represents the data. A hash is a bit string (128 bits for MD5 and 160 bits for SHA-1) that represents the data processed. If you process the same data through the hashing algorithm multiple times, the same hash is created each time.

Hash-based deduplication breaks data into chunks, either fixed or variable length, and processes each chunk with the hashing algorithm to create a hash. If the hash already exists, the data is deemed a duplicate and is not stored. If the hash does not exist, the data is stored and the hash index is updated with the new hash.

In Figure 3.1, data chunks A, B, C, D, and E are processed by the hash algorithm, creating hashes Ah, Bh, Ch, Dh, and Eh; for the purposes of this example, we assume this is all new data. Later, chunks A, B, C, D, and F are processed. F generates a new hash Fh. Since A, B, C, and D generate the same hashes as before, their data is presumed to be the same and is not stored again. Since F generates a new hash, the new hash and the new data are stored.

Figure 3.1: Implementation of hash algorithm
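A minimal sketch of this mechanism, assuming fixed-length chunks and SHA-1 as in the text (all names are illustrative):

import hashlib

CHUNK_SIZE = 4096
hash_index: dict[str, bytes] = {}  # hash -> stored chunk (single instance)

def dedup_store(data: bytes) -> list[str]:
    # Split data into chunks and store only chunks whose hash is new.
    # Returns the list of chunk hashes needed to reconstruct the data.
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        h = hashlib.sha1(chunk).hexdigest()
        if h not in hash_index:   # new hash -> store the chunk
            hash_index[h] = chunk
        recipe.append(h)          # duplicate -> keep only a reference
    return recipe

def restore(recipe: list[str]) -> bytes:
    return b"".join(hash_index[h] for h in recipe)

payload = b"A" * 4096 + b"B" * 4096 + b"A" * 4096  # chunk A occurs twice
recipe = dedup_store(payload)
assert restore(recipe) == payload
assert len(hash_index) == 2  # only the unique chunks A and B are stored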


3.1.3 Roles of Entities

S-CSP
The purpose of this entity is to provide data storage as a service in the public cloud. The S-CSP stores data on behalf of users, eliminates duplicate data by deduplication, and keeps only the unique data, which reduces storage cost. The S-CSP is assumed to have abundant storage capacity and computational power. When a user sends the token for a file he wants to access from the public cloud, the S-CSP matches the token internally; only if it matches does it send the file, i.e. the ciphertext Cf, otherwise it sends an abort signal to the user. After receiving the file, the user uses the convergent key KF to decrypt it.

Figure 3.2: Hash Work Flow Chart

Data User
A user is an entity that wants to access data or files from the S-CSP. The user generates a key and stores that key in the private cloud. In a storage system supporting deduplication, the user uploads only unique data and does not upload any duplicate data, to save upload bandwidth; the duplicate data may be owned by the same user or by different users. Each file is protected by a convergent encryption key and can be accessed only by authorized persons. In our system, a user must register with the private cloud to store a token for each of his files kept on the public cloud. When he wants to access a file, he retrieves the respective token from the private cloud and then accesses the file from the public cloud. The token is derived from the file content F and the convergent key KF.

Private Cloud
In general, for more security, the user relies on the private cloud in addition to the public cloud. The user stores the generated key in the private cloud, and at download time the system asks for this key before releasing the file. The user cannot safely store the secret key himself, so the private cloud is used to give the key proper protection. The private cloud stores only the convergent key for each file; when a user wants to access a key, it first checks the user's authority and only then provides the key.


Public Cloud
The public cloud entity is used for storage: the user uploads his files to the public cloud, which behaves like the S-CSP. When the user wants to download files from the public cloud, he is asked for the key that was generated and stored in the private cloud. Only when the user's key matches the file's key can the user download the file; without the key the file cannot be accessed, so only authorized users can access it. All files in the public cloud are stored in encrypted format, so even if an unauthorized person somehow obtains a file, he cannot recover the original without the secret (convergent) key. The public cloud stores a large number of files, and each user can access his respective file only if his token matches the token held by the S-CSP server.

3.1.4 Operations Performed on Hybrid Cloud

File Uploading:
When a user wants to upload a file to the public cloud, he first encrypts the file with a symmetric key and sends the ciphertext to the public cloud. At the same time, the user generates the key for that file and sends it to the private cloud. In this way the user uploads the file to the public cloud.

File Downloading:
When a user wants to download a file that he/she has uploaded to the public cloud, he/she makes a request to the public cloud, which provides a list of the files that its users have uploaded. The user selects one of the files from the list and chooses the download option. The private cloud then sends a message asking for the key that the user generated for that file; the user enters the key, and the private cloud checks it. Only if the key is correct, meaning the user is valid, can the user download the file from the public cloud; otherwise the download is refused. The file downloaded from the public cloud is in encrypted format, and the user decrypts it using the same symmetric key.
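The sketch below (Python, reusing convergent_key, enc_se and dec_se from the earlier sketches; all names are illustrative, not the report's actual protocol messages) models the two clouds as simple in-memory stores to make this upload/download flow concrete.

private_cloud: dict[tuple[str, str], bytes] = {}  # (user, file_id) -> key
public_cloud: dict[str, bytes] = {}               # file_id -> ciphertext

def upload(user: str, file_id: str, data: bytes) -> None:
    key = convergent_key(data)
    public_cloud[file_id] = enc_se(key, data)  # ciphertext to public cloud
    private_cloud[(user, file_id)] = key       # key to private cloud

def download(user: str, file_id: str) -> bytes:
    key = private_cloud.get((user, file_id))   # private cloud checks authority
    if key is None:
        raise PermissionError("key check failed: user is not authorized")
    return dec_se(key, public_cloud[file_id])

upload("alice", "report.doc", b"contents")
assert download("alice", "report.doc") == b"contents"
# download("bob", "report.doc") raises PermissionError: no key stored for bob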


3.2 Design Goals

1. Differential Authorization
Each authorized user is able to get an individual token for his file to perform the duplicate check based on his privileges. Under this assumption, no user can generate a token for a duplicate check outside his privileges or without the aid of the private cloud server.

2. Authorized Duplicate Check
An authorized user is able to use his/her individual private keys, with the help of the private cloud, to generate a query for a certain file and the privileges he/she owns, while the public cloud performs the duplicate check directly and tells the user whether there is a duplicate.

3. Unforgeability of File Token / Duplicate-Check Token
Unauthorized users, without the appropriate privileges or file, should be prevented from getting or generating the file tokens for the duplicate check of any file stored at the S-CSP. In our scheme, the duplicate-check tokens of users are issued by the private cloud server.

3.3 Performance Analysis

We implement a prototype of the proposed authorized deduplication system, in which we model the three entities as separate C++ programs. A Client program is used to model the data users and carry out the file upload process. A Private Server program is used to model the private cloud, which manages the private keys and handles the file token computation. A Storage Server program is used to model the S-CSP, which stores and deduplicates files. We implement the cryptographic hashing and encryption operations with the OpenSSL library. We also implement the communication between the entities over HTTP, using GNU Libmicrohttpd and libcurl, so users can issue HTTP POST requests to the servers.


Our implementation of the Client provides the following function calls to support token generation and deduplication along the file upload process.

FileTag(File) - computes the SHA-1 hash of the File as the File Tag;

TokenReq(Tag, UserID) - requests File Token generation from the Private Server with the File Tag and User ID;

DupCheckReq(Token) - requests a Duplicate Check of the File from the Storage Server by sending the file token received from the Private Server;

ShareTokenReq(Tag, Priv.) - requests the Private Server to generate the Share File Token with the File Tag and Target Sharing Privilege Set;

FileEncrypt(File) - encrypts the File with convergent encryption, using the 256-bit AES algorithm in cipher block chaining (CBC) mode, where the convergent key is the SHA-256 hash of the file;

FileUploadReq(FileID, File, Token) - uploads the File Data to the Storage Server if the file is unique, and updates the stored File Token.

Our implementation of the Private Server includes the corresponding request handlers for token generation and maintains a key storage with a hash map.

TokenGen(Tag, UserID) - loads the associated privilege keys of the user and generates the token with the HMAC-SHA-1 algorithm.
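A hedged sketch of this handler follows (Python standard library only; the key store and names are illustrative stand-ins for the prototype's hash map):

import hashlib
import hmac

privilege_keys: dict[str, bytes] = {"user42": b"privilege-key-of-user42"}

def file_tag(data: bytes) -> str:
    # FileTag(File): SHA-1 hash of the file contents, used as the File Tag.
    return hashlib.sha1(data).hexdigest()

def token_gen(tag: str, user_id: str) -> str:
    # TokenGen(Tag, UserID): load the user's privilege key and compute the
    # duplicate-check token with HMAC-SHA-1, binding tag and privilege.
    key = privilege_keys[user_id]
    return hmac.new(key, tag.encode(), hashlib.sha1).hexdigest()

token = token_gen(file_tag(b"file data"), "user42")
# The same file under the same privilege always yields the same token, so the
# S-CSP can compare tokens to detect duplicates per privilege level.
assert token == token_gen(file_tag(b"file data"), "user42")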


ADVANTAGES OF AUTHORIZED DEDUPLICATION SYSTEM

1. The client is permitted to perform the duplicate-copy check only for files bound to his particular privileges.

2. Stronger security is supported by encrypting each file with distinct privilege keys.

3. The storage space needed for the tags used in the reliability check is decreased, strengthening the security of deduplication and ensuring data privacy.


Chapter 4
Conclusion
The idea of authorized data deduplication was proposed to protect data security by including the differential privileges of users in the duplicate check. Yan Kit Li et al. also exhibited several new deduplication constructions supporting authorized duplicate check in a hybrid cloud architecture, in which the duplicate-check tokens of files are generated by the private cloud server using private keys. Security analysis shows that the schemes are secure against the insider and outsider attacks specified in the proposed security model. As a proof of concept, they implemented a prototype of the proposed authorized duplicate-check scheme and conducted testbed experiments on their model. They showed that their authorized duplicate-check scheme incurs minimal overhead compared to convergent encryption and network transfer.

We design and implement a new system that protects the security of predictable messages. The main idea of our technique is a novel encryption-key generation algorithm. For simplicity, we use hash functions to define the tag-generation functions and convergent keys in this section. In traditional convergent encryption, to support the duplicate check, the key is derived from the file F using a cryptographic hash function: kF = H(F). To avoid this deterministic key generation, the encryption key kF,p for file F in our system is generated with the aid of the private cloud server holding the privilege key kp. The encryption key takes the form kF,p = H0(H(F), kp) ⊕ H2(F), where H0, H, and H2 are all cryptographic hash functions. The file F is encrypted with another key k, while k is encrypted with kF,p. In this way, neither the private cloud server nor the S-CSP can decrypt the ciphertext.
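A short sketch of this key derivation follows (Python; instantiating H, H0 and H2 as domain-separated SHA-256 is our assumption, since the text does not fix the hash functions):

import hashlib

def H(f: bytes) -> bytes:
    return hashlib.sha256(f).digest()

def H0(h_f: bytes, kp: bytes) -> bytes:
    return hashlib.sha256(b"H0" + h_f + kp).digest()

def H2(f: bytes) -> bytes:
    return hashlib.sha256(b"H2" + f).digest()

def derive_key(f: bytes, kp: bytes) -> bytes:
    # kF,p = H0(H(F), kp) XOR H2(F): deterministic per (file, privilege key),
    # but not derivable from the file alone, which blocks offline guessing
    # attacks on predictable messages.
    a, b = H0(H(f), kp), H2(f)
    return bytes(x ^ y for x, y in zip(a, b))

kp = b"privilege-key-held-by-private-cloud"
k1 = derive_key(b"predictable message", kp)
assert k1 == derive_key(b"predictable message", kp)        # reproducible
assert k1 != derive_key(b"predictable message", b"other")  # privilege-bound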


