Certificate

This is to certify that the Seminar I entitled
Ms. Tejaswini Bhamare
M.E. (Computer Engineering)
has successfully completed her Seminar I
towards the partial fulfillment of the
Master's Degree in Computer Engineering,
Savitribai Phule Pune University,
during the year 2014-2015.

Prof. B. R. Nandwalkar (Guide)
Prof. N. R. Wankhede (HOD)
Dr. V. J. Gond (Principal)
Contents

1 Introduction
  1.1 Problem Definition
    1.1.1 Justification of Problem
    1.1.2 Need of System
    1.1.3 Applications
    1.1.4 Contribution
2 Literature Survey
  2.1
    2.1.1
    2.1.2
    2.1.3
    2.1.4
  2.2
  2.3 Proposed System
    2.3.1 Encryption Files
    2.3.2 Confidential Encryption
    2.3.3 Proof of Data
3 Technical Details
  3.1 Concept
    3.1.1 Identification protocol
    3.1.2
    3.1.3 Roles of Entities
    3.1.4
  3.2 Design Goals
  3.3 Performance Analysis
4 Conclusion
List of Figures

1.1
1.2
1.3
2.1 Confidential Encryption
2.2
3.1
3.2
Chapter 1
Introduction
1.1 Problem Definition
The critical challenge of cloud storage and cloud computing is managing the continuously increasing volume of data. Data deduplication, or single instancing, essentially refers to the elimination of redundant data. In the deduplication process, duplicate data is deleted, leaving only one copy (a single instance) of the data to be stored; however, an index of all the data is retained in case that data is ever required. In general, data deduplication eliminates duplicate copies of repeating data.
The data is encrypted before being outsourced to the cloud or network. This encryption adds time and space overhead to encode the data, and for large data stores it becomes even more complex and critical. By using data deduplication inside a hybrid cloud, the encryption becomes simpler.
A network consists of an abundant amount of data, shared by the users and nodes in the network, and many large-scale networks use a data cloud to store and share their data. A node or user in the network has full rights to upload and download data over the network. Often, however, different users upload the same data, which creates duplication inside the cloud. If a user then wants to retrieve or download that data, the cloud must work with two encrypted files of the same data every time and perform the same operation on both copies. This violates data confidentiality and the security of the cloud, and it places an extra burden on the cloud's operation.
To avoid this duplication of data and to maintain confidentiality in the cloud, we use the concept of a hybrid cloud: a combination of a public and a private cloud. Hybrid cloud storage combines the scalability, reliability, rapid deployment, and potential cost savings of public cloud storage with the security and full control of private cloud storage.
1.1.1 Justification of Problem

1.1.2 Need of System

1.1.3 Applications
Hybrid clouds are built to suit almost any IT environment or architecture, whether an enterprise-wide IT network or a single department. Stored public data, such as the statistical analyses produced by social media or government entities, can be combined with an organization's internal corporate data to get the most out of hybrid cloud benefits. However, the big-data analysis and high-performance computing involved across clouds remain challenging.
1.1.4 Contribution
Chapter 2
Literature Survey
2.1
2.1.1
Storage service providers such as Mozy, Dropbox, and others perform deduplication to save space by storing only one copy of each uploaded file. Message-locked encryption is used to resolve the problem that when clients encrypt their files, the space savings are lost. DupLESS provides secure deduplicated storage that also resists brute-force attacks. In DupLESS, clients encrypt under message-based keys obtained from a key server via an oblivious PRF protocol. This allows clients to store encrypted data with an existing service, have the service perform deduplication on their behalf, and still achieve strong confidentiality guarantees. It shows that encryption for deduplicated storage can reach the desired performance, with space savings close to those of using the storage service with plaintext data.
Characteristics:
1. Stronger security.
2. An easily deployed solution for encryption that supports deduplication.
3. User friendly: a command-line client that supports both Dropbox and Google Drive.
4. Resolves the problem of message-locked encryption.
2.1.2
It stores only a single copy of duplicate data. Client-side deduplication tries to identify deduplication opportunities already at the client, saving the bandwidth of uploading copies of existing files to the server. To defeat attacks on this approach, Shai Halevi, Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg propose Proofs of Ownership, which let a client efficiently prove to a server that the client holds a file, rather than just some short information about it. They present solutions based on Merkle trees and specific encodings, and analyse their security.
Characteristics:
2.1.3
computations such that the trusted cloud is used for security-critical operations in the
less time-critical setup phase, whereas queries to the outsourced data are processed in
parallel by the fast cloud on encrypted data.
2.1.4
The most important issue in cloud storage is utilization of the storage capacity. This paper describes two categories of data deduplication strategy and extends the fault-tolerant digital signature scheme proposed by Zhang, examining the redundancy of blocks to achieve data deduplication. The proposed scheme not only reduces the required cloud storage capacity but also improves the speed of data deduplication. Furthermore, a signature is computed for every uploaded file to verify the integrity of files.
2.2
The security is analysed in terms of two aspects: the authorization of the duplicate check and the confidentiality of the data. Some basic tools, assumed to be secure, are used to construct the secure deduplication: the convergent encryption scheme, the symmetric encryption scheme, and the PoW scheme. Under this assumption, we show that the systems are secure with respect to the following security analysis.
2.3 Proposed System
In the proposed system, we achieve data deduplication by having the data owner provide a proof of data, which is used at the time the file is uploaded. Each file uploaded to the cloud is also bound to a set of privileges that specify which kinds of users are allowed to perform the duplicate check and access the file. Before submitting a duplicate-check request for some file, the user takes the file and his own privileges as inputs. The user is able to find a duplicate for the file if and only if there is a copy of the file and a matched privilege stored in the cloud.
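The privilege-bound duplicate check described above can be sketched as follows. This is a minimal toy model, not the report's actual design: the `DedupIndex` class, its method names, and the use of SHA-256 as the file tag are all illustrative assumptions.

```python
import hashlib

class DedupIndex:
    """Toy cloud-side index of (file tag, privilege) pairs."""

    def __init__(self):
        self._entries = set()  # {(tag, privilege)}

    @staticmethod
    def tag(data: bytes) -> str:
        # The file tag is a hash of the file content.
        return hashlib.sha256(data).hexdigest()

    def store(self, data: bytes, privilege: str) -> None:
        # Record that this file exists under this privilege.
        self._entries.add((self.tag(data), privilege))

    def duplicate_check(self, data: bytes, privileges) -> bool:
        # A duplicate is found only if both the content tag AND
        # a matching privilege are already stored in the cloud.
        t = self.tag(data)
        return any((t, p) in self._entries for p in privileges)
```

For example, a file stored under the privilege "staff" is reported as a duplicate only to users who hold "staff" among their input privileges; a user with only "guest" does not find it.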
2.3.1 Encryption Files

Here we use a common secret key k both to encrypt and to decrypt data, converting plaintext to ciphertext and ciphertext back to plaintext. Three basic functions are used:

KeyGenSE(1^λ) → k: the key generation algorithm, which generates the key k using the security parameter 1^λ.
EncSE(k, M) → C: the symmetric encryption algorithm, which takes the secret key k and the message M and outputs the ciphertext C.
DecSE(k, C) → M: the symmetric decryption algorithm, which takes the secret key k and the ciphertext C and outputs the original message M.
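The three functions above can be sketched as a toy symmetric scheme. This is only an illustration of the KeyGenSE/EncSE/DecSE interface: the hash-based XOR keystream below is an assumption made for self-containedness and is not secure for real use, where an established cipher such as AES would be substituted.

```python
import hashlib
import os

def keygen_se(security_parameter: int = 16) -> bytes:
    # KeyGenSE: sample a random secret key k (the integer argument is
    # a toy stand-in for the security parameter 1^lambda).
    return os.urandom(security_parameter)

def _keystream(k: bytes, nonce: bytes, length: int) -> bytes:
    # Expand (key, nonce) into a keystream with SHA-256 in counter mode.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(k + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def enc_se(k: bytes, m: bytes) -> bytes:
    # EncSE(k, M) -> C: prepend a fresh nonce, XOR M with the keystream.
    nonce = os.urandom(16)
    return nonce + bytes(a ^ b for a, b in zip(m, _keystream(k, nonce, len(m))))

def dec_se(k: bytes, c: bytes) -> bytes:
    # DecSE(k, C) -> M: regenerate the keystream from the stored nonce.
    nonce, body = c[:16], c[16:]
    return bytes(a ^ b for a, b in zip(body, _keystream(k, nonce, len(body))))
```

A round trip `dec_se(k, enc_se(k, m))` returns the original message m, matching the interface contract stated above.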
2.3.2 Confidential Encryption

2.3.3 Proof of Data
Chapter 3
Technical Details
3.1 Concept

3.1.1 Identification protocol
An identification protocol can be described in two phases: Proof and Verify. In the Proof stage, a prover/user U demonstrates his identity to a verifier by performing an identification proof related to his identity. The prover's input is his private key skU, sensitive information, such as the private key of a public key in his certificate or a credit card number, that he would not like to share with other users. The verifier performs the verification with input the public information pkU related to skU. At the conclusion of the protocol, the verifier outputs either accept or reject to denote whether the proof has passed. There are many efficient identification protocols in the literature, including certificate-based and identity-based identification.
3.1.2
Hash-based data deduplication methods use a hashing algorithm to identify chunks of data. Commonly used algorithms are the Secure Hash Algorithm (SHA-1) and the Message-Digest Algorithm (MD5). When data is processed by a hashing algorithm, a hash is created that represents the data: a bit string (128 bits for MD5, 160 bits for SHA-1). Processing the same data through the hashing algorithm multiple times creates the same hash each time.
Hash-based deduplication breaks data into chunks, either fixed or variable length, and processes each chunk with the hashing algorithm to create a hash. If the hash already exists, the data is deemed a duplicate and is not stored. If the hash does not exist, the data is stored and the hash index is updated with the new hash.
In Figure 5, data chunks A, B, C, D, and E are processed by the hash algorithm, creating hashes Ah, Bh, Ch, Dh, and Eh; for the purposes of this example, we assume this is all new data. Later, chunks A, B, C, D, and F are processed. Since A, B, C, and D generate the same hashes as before, the data is presumed to be the same and is not stored again. Since F generates a new hash Fh, the new hash and the new data are stored.
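The chunk-level flow just described can be sketched directly; the `ChunkStore` class and its method names are illustrative, with SHA-1 chosen only because the text names it as a commonly used algorithm.

```python
import hashlib

class ChunkStore:
    """Toy hash-indexed chunk store for hash-based deduplication."""

    def __init__(self):
        self.index = {}  # hash -> chunk data

    def put(self, chunk: bytes) -> bool:
        # Hash the chunk; store it only if the hash is new.
        h = hashlib.sha1(chunk).hexdigest()
        if h in self.index:
            return False   # duplicate: not stored again
        self.index[h] = chunk
        return True        # new data: stored, index updated

# Replaying the example from the text:
store = ChunkStore()
for c in [b"A", b"B", b"C", b"D", b"E"]:
    store.put(c)                       # all new data, all stored
results = [store.put(c) for c in [b"A", b"B", b"C", b"D", b"F"]]
# A-D hash to existing entries and are skipped; only F is stored.
```

After both batches the store holds six unique chunks (A, B, C, D, E, F), matching the space saving the text describes.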
Flow Chart
3.1.3 Roles of Entities
S-CSP

The purpose of this entity is to provide a data storage service in the public cloud. The S-CSP stores data on behalf of the users, eliminates duplicate data using deduplication, and keeps only the unique data; this reduces the storage cost. The S-CSP has abundant storage capacity and computational power. When a user sends the respective token to access his file from the public cloud, the S-CSP matches the token internally; only if it matches does it send the file, the ciphertext Cf, with the token, otherwise it sends an abort signal to the user. After receiving the file, the user uses the convergent key KF to decrypt it.
Data User

A user is an entity that wants to access data or files from the S-CSP. The user generates the key and stores it in the private cloud. In a storage system supporting deduplication, the user uploads only unique data and does not upload any duplicate data (which may be owned by the same user or by different users), saving upload bandwidth. Each file is protected by a convergent encryption key and can be accessed only by an authorized person. In our system, the user must register with the private cloud in order to store a token for each file stored on the public cloud. When he wants to access a file, he obtains the respective token from the private cloud and then accesses the file from the public cloud. The token consists of the file content F and the convergent key KF.
Private Cloud

In general, to provide more security, the user stores the generated key in the private cloud instead of the public cloud. At download time, the system asks for the key before releasing the file. The user cannot store the secret key internally, so to give the key proper protection we use the private cloud, which stores only the convergent key with the respective file. When a user wants to access a key, the private cloud first checks the user's authority and only then provides the key.
Public Cloud

The public cloud entity is used for storage; users upload their files to the public cloud, which behaves like the S-CSP. When a user wants to download files from the public cloud, it asks for the key that was generated and stored in the private cloud. Only when the user's key matches the file's key can the user download the file; without the key the file cannot be accessed, so only authorized users can access it. All files in the public cloud are stored in encrypted format: even if by some chance an unauthorized person gets hold of a file, he cannot access the original file without the secret (convergent) key. Many files are stored on the public cloud, and each user accesses his respective file only if his token matches the S-CSP server's token.
3.1.4
File Uploading:

When a user wants to upload a file to the public cloud, he first encrypts the file using the symmetric key and sends the ciphertext to the public cloud. At the same time, the user generates the key for that file and sends it to the private cloud. In this way the user uploads the file to the public cloud.
File Downloading:

When a user wants to download a file that he has uploaded to the public cloud, he makes a request to the public cloud, which provides a list of the files that users have uploaded. The user selects one file from the list and chooses the download option. At that point the private cloud prompts for the key that the user generated for that file. The user enters the key, and the private cloud checks it; only if the key is correct, meaning the user is valid, can he download the file from the public cloud, otherwise he cannot. The file downloaded from the public cloud is in encrypted format, so the user decrypts it using the same symmetric key.
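The upload/download flow above can be sketched as a small simulation. Everything here is an illustrative assumption: the two dictionaries stand in for the public and private clouds, and the hash-based XOR routine is a toy stand-in for the symmetric cipher, not a secure construction.

```python
import hashlib
import os

public_cloud = {}   # file name -> ciphertext (held by the public cloud)
private_cloud = {}  # file name -> symmetric key (held by the private cloud)

def _xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy symmetric cipher: XOR with a SHA-256 counter keystream.
    blocks = (hashlib.sha256(key + i.to_bytes(4, "big")).digest()
              for i in range(len(data) // 32 + 1))
    ks = b"".join(blocks)[:len(data)]
    return bytes(a ^ b for a, b in zip(data, ks))

def upload(name: str, data: bytes) -> None:
    # Encrypt the file, send ciphertext to the public cloud and the
    # generated key to the private cloud, as in the flow above.
    key = os.urandom(16)
    public_cloud[name] = _xor_cipher(data, key)
    private_cloud[name] = key

def download(name: str, key: bytes):
    # The private cloud checks the entered key; only a valid user
    # (correct key) gets the file, which is then decrypted.
    if private_cloud.get(name) != key:
        return None
    return _xor_cipher(public_cloud[name], key)
```

With the correct key the download returns the original plaintext; with any other key the request is refused, and the copy resting in the public cloud is never stored in the clear.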
3.2 Design Goals
1. Differential Authorization
Each authorized user is able to get his individual token for his file to perform the duplicate check based on his privileges. Under this assumption, no user can generate a token for a duplicate check outside his privileges or without the aid of the private cloud server.
3.3 Performance Analysis
Our implementation of the Client provides the following function calls to support
token generation and deduplication along the file upload process.
1. The client is permitted to perform the duplicate-copy check for records bound to the particular privileges.
2. Complex privilege sets support stronger security by encoding the record with distinct privilege keys.
3. The storage space of the tags for the reliability check is decreased, strengthening the security of deduplication and ensuring data privacy.
Chapter 4
Conclusion
The idea of authorized data deduplication was proposed to protect data security by including the differential privileges of users in the duplicate check. Yan Kit Li et al. additionally presented several new deduplication constructions supporting authorized duplicate check in a hybrid cloud architecture, in which the duplicate-check tokens of files are generated by the private cloud server holding the private keys. Security analysis shows that their schemes are secure with respect to the insider and outsider attacks specified in the proposed security model. As a proof of concept, they implemented a prototype of the proposed authorized duplicate-check scheme and conducted test-bed experiments on it. They showed that their authorized duplicate-check scheme incurs minimal overhead compared with convergent encryption and network transfer.
We design and implement a new system that protects security for predictable messages. The main idea of our technique is a novel encryption-key generation algorithm. For simplicity, we use hash functions to define the tag-generation functions and convergent keys in this section. In traditional convergent encryption, to support the duplicate check, the key is derived from the file F using a cryptographic hash function: kF = H(F). To avoid this deterministic key generation, the encryption key kF for a file F in our system is generated with the aid of the private cloud server holding the privilege key kp. The encryption key can be viewed as kF,p = H0(H(F), kp) ⊕ H2(F), where H0, H, and H2 are all cryptographic hash functions. The file F is encrypted with another key k, while k is encrypted with kF,p. In this way, neither the private cloud server nor the S-CSP can decrypt the ciphertext.
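The key derivation above can be sketched as follows. Two assumptions are made for illustration: the operator combining the two terms in the printed formula is taken to be XOR (it is elided in the text), and H0, H, and H2 are instantiated as domain-separated SHA-256 calls, which the source does not specify.

```python
import hashlib

def H(data: bytes) -> bytes:
    # H: cryptographic hash of the file content F.
    return hashlib.sha256(data).digest()

def H0(file_tag: bytes, kp: bytes) -> bytes:
    # H0: binds the file tag H(F) to the privilege key k_p,
    # modeling the private cloud server's contribution.
    return hashlib.sha256(file_tag + kp).digest()

def H2(data: bytes) -> bytes:
    # H2: an independent hash of F (domain-separated from H).
    return hashlib.sha256(b"H2" + data).digest()

def derive_key(file_bytes: bytes, kp: bytes) -> bytes:
    # k_{F,p} = H0(H(F), k_p) XOR H2(F)
    a = H0(H(file_bytes), kp)
    b = H2(file_bytes)
    return bytes(x ^ y for x, y in zip(a, b))
```

The derivation is deterministic for a fixed file and privilege key, so duplicate checks still work per privilege, while a different privilege key yields a different encryption key, avoiding the fully deterministic kF = H(F) of traditional convergent encryption.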
Bibliography