
HYDRAstor: a Scalable Secondary Storage

7th TF-Storage Meeting


September 9th, 2010
Łukasz Heldt

NEC (www.nec.com)
  Largest Japanese IT company
  $43 billion in annual revenue
  143,000 staff
  Owns & sells HYDRAstor

9LivesData (www.9livesdata.com)
  Polish R&D company
  50 engineers and scientists
  R&D of critical backend component

HYDRAstor
  Scalable disk-based storage for backup with global deduplication
  Started in 2003 in NEC Labs by Cezary Dubnicki
  2007: Product of the year award by SearchStorage.com
  2008: Product innovation award by Network Products Guide
  2009/2010: FAST conference publication in San Jose
  Sold in US and Japan since 2007
  Will be sold in Poland in 2011 by 9LivesData in cooperation with NEC

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC

Backup storage

Tapes are most common, despite:


  Sensitive environment requirements
  Unreliable restore
  Low performance
  Manual labor or expensive robots
  Problematic replication

Backup storage size

Usual backup policy:
  4-12+ full backups
  7-30+ incremental backups
  Majority of data does not change
  Data compression 2:1

Secondary storage size:
  5x-20x more than primary storage
  Includes many copies of the same data
  Each data chunk stored 5-10+ times

High potential for the deduplication technology.

Deduplication

Save disk space by eliminating duplicates
Sample reduction ratio 10:1 (depends on backup policy)
Lowers price of gigabyte

Sub-file level deduplication:
  Files:          File A | File B | File A
  Blocks:         A B C  | A D E  | A B C
  Stored blocks:  A B C D E (only unique blocks are stored)
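The block example on this slide can be reproduced with a minimal sketch. It assumes one-byte blocks and SHA-256 as the content hash; neither choice comes from the slides, and the function name `deduplicate` is hypothetical.

```python
import hashlib

def deduplicate(blocks):
    """Store each distinct block once; return (store, recipe)."""
    store = {}    # content hash -> block payload (unique blocks only)
    recipe = []   # hashes in stream order, enough to rebuild the input
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()  # SHA-256 is an assumption
        store.setdefault(digest, block)             # keep only the first copy
        recipe.append(digest)
    return store, recipe

# The nine blocks from the slide: File A, File B, File A again.
stream = [b"A", b"B", b"C", b"A", b"D", b"E", b"A", b"B", b"C"]
store, recipe = deduplicate(stream)

unique_blocks = sorted(store.values())     # the blocks actually stored
restored = [store[d] for d in recipe]      # the stream rebuilt from the recipe
ratio = len(stream) / len(store)           # reduction ratio for this tiny input
```

Nine input blocks shrink to five stored blocks here. Real ratios depend on the backup policy, as the slide notes.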

Global deduplication

Prevent silos of deduped data
One system to manage

[Figure: global vs. siloed dedup]

HYDRAstor product

Provides:
  global deduplication using DataRedux
  performance, storage scalability and data resiliency using Distributed Resilient Data

HYDRAstor deployment

Interface: CIFS, NFS, Symantec OST
Marker filtering for: Tivoli, Netbackup, Networker, CommVault

HYDRAstor architecture

Accelerator Nodes (AN) realize performance
Storage Nodes (SN) realize capacity
Non-disruptive grid expansion

[Diagram: NFS / CIFS / OST over Ethernet -> Accelerator Nodes -> Internal Network -> Storage Nodes]

HYDRAstor scalability

Configuration              Storage   With DataRedux*   Performance
MiniHYDRA (single server)  12 TB     240 TB            1.3 TB/hour
2 AN + 4 SN                48 TB     960 TB            3.6 TB/hour
20 AN + 40 SN (4 racks)    480 TB    9600 TB           36 TB/hour

* assuming 20x data reduction through DataRedux
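The starred capacities are the raw capacities multiplied by the assumed 20x reduction. A small sketch of that arithmetic follows. The configuration labels and variable names are illustrative, and the per-node throughput is only the slide's numbers divided out, not a vendor specification.

```python
DATA_REDUCTION = 20  # the slide's assumed DataRedux ratio

# name -> (raw capacity in TB, throughput in TB/hour, accelerator nodes)
grids = {
    "2AN+4SN": (48, 3.6, 2),
    "20AN+40SN": (480, 36, 20),
}
MINIHYDRA_RAW_TB = 12

def effective_tb(raw_tb, reduction=DATA_REDUCTION):
    """Logical capacity once duplicates are eliminated."""
    return raw_tb * reduction

effective = {name: effective_tb(raw) for name, (raw, _, _) in grids.items()}
effective["MiniHYDRA"] = effective_tb(MINIHYDRA_RAW_TB)

# Throughput per accelerator node, to see how performance scales with the grid.
per_an = {name: tput / ans for name, (_, tput, ans) in grids.items()}
```

Both grid configurations work out to the same throughput per accelerator node, which is consistent with performance scaling with the number of ANs.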

HYDRAstor scalability

[Slide from a presentation by Curtis Preston]

Curtis Preston is a famous storage analyst who owns an independent consulting company.

HYDRAstor other features

Fully automatic/non-disruptive management:
  Recovery of lost data resiliency
  Periodic data scrubbing
Machine and disk failure recovery:
  Erasure coding, better than RAID6
  Configurable redundancy level
Optimized replication
Smart resource management

HYDRAstor backend design

Details of the design: http://www.usenix.org/events/fast09/tech/full_papers/dubnicki/dubnicki.pdf


Programming Model

Repository of blocks:
  Content-addressed
  Immutable
  Variable-sized
Exposed pointers to other blocks
Trees of blocks
DAGs due to deduplication:
  No cycles possible
Deletion of whole trees

[Diagram: blocks are identified by content hashes (e.g. hash=011..0). Root1 (hash=010..1) and Root2 (hash=110..0) are roots of trees whose blocks embed the hashes of the blocks they point to (e.g. 011..0). The two trees share a block; after the Root1 tree is deleted, the Root2 tree remains.]
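A minimal in-memory sketch of this programming model follows. It is an illustration under stated assumptions, not the product API: the class `BlockStore`, its methods, and the use of SHA-256 over data plus pointers are all hypothetical.

```python
import hashlib

class BlockStore:
    """Toy content-addressed repository of immutable blocks."""

    def __init__(self):
        self._blocks = {}  # address -> (data, tuple of child addresses)

    def put(self, data, pointers=()):
        pointers = tuple(pointers)
        for p in pointers:
            if p not in self._blocks:
                raise KeyError("pointer to unknown block")
        # The address covers the data and the exposed pointers, so a block can
        # only point at blocks written before it: cycles cannot be built.
        h = hashlib.sha256(data)
        for p in pointers:
            h.update(bytes.fromhex(p))
        address = h.hexdigest()
        self._blocks.setdefault(address, (data, pointers))  # duplicate write: no-op
        return address

    def reachable(self, root):
        """All block addresses in the tree (DAG) under `root`."""
        seen, stack = set(), [root]
        while stack:
            addr = stack.pop()
            if addr not in seen:
                seen.add(addr)
                stack.extend(self._blocks[addr][1])
        return seen

    def __len__(self):
        return len(self._blocks)

store = BlockStore()
e = store.put(b"E")
root1 = store.put(b"Root1", [e])
root2 = store.put(b"Root2", [e])
e_again = store.put(b"E")  # same content, same address
shared = store.reachable(root1) & store.reachable(root2)
```

Writing `E` twice yields one stored block, and both roots reach it, which is how deduplication turns trees into DAGs.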

Failure tolerance: erasure coding


Example: N=8, m=5

[Diagram: Encode turns the original block into N=8 fragments (m=5 original fragments plus 3 redundant fragments); Decode rebuilds the block. Any 3 fragments can be lost.]

Assuming a 12-disk array:

Scheme          Resiliency  Overhead
Mirror          1           100%
3-copy          2           200%
RAID6           2           20%
Erasure coding  3           33%
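The resiliency and overhead figures on this slide follow from one ratio: redundant units divided by original units. The sketch below recomputes them. The split of the 12-disk array into 9 original plus 3 redundant fragments is an assumption inferred from the 33% figure; RAID6 on 12 disks is taken as 10 data plus 2 parity.

```python
def overhead_percent(original, redundant):
    """Extra storage as a percentage of the original data size."""
    return round(100 * redundant / original)

# name -> (original units, redundant units); tolerated failures = redundant units
schemes = {
    "Mirror": (1, 1),
    "3-copy": (1, 2),
    "RAID6": (10, 2),           # 12 disks: 10 data + 2 parity
    "Erasure coding": (9, 3),   # assumed 9 original + 3 redundant fragments
    "N=8, m=5 example": (5, 3), # the slide's encode/decode example
}

table = {
    name: (redundant, overhead_percent(original, redundant))
    for name, (original, redundant) in schemes.items()
}
```

Under these assumptions erasure coding tolerates one more failure than RAID6 for a moderately higher overhead, and far less overhead than keeping three copies.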

Scalability with DHT: data placement

Block location: DHT with prefix routing
Block mapped to hash prefix
Prefix components:
  Hosted on SNs
  N components per prefix
  Store fragments
Distributed consensus
Load balancing

[Diagram: a prefix tree rooted at the empty prefix, with leaf prefixes 00, 01, 10 and 11. A block with hash=011..0 is mapped to prefix 01. With N=4, each prefix has components 0-3, spread over Nodes 1-6.]
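Prefix routing can be sketched in a few lines. This is a toy model: hashes are bit strings, and the helper names `route`, `split` and `components` are hypothetical. The `split` step is an assumption about how a prefix set could grow from the empty prefix; the `prefix:index` component labels follow the notation used on later slides (e.g. 01:2).

```python
def route(hash_bits, prefixes):
    """Return the single prefix in `prefixes` that `hash_bits` starts with."""
    matches = [p for p in prefixes if hash_bits.startswith(p)]
    if len(matches) != 1:
        raise ValueError("prefixes must cover the hash space exactly once")
    return matches[0]

def split(prefixes, prefix):
    """Replace one prefix by its two one-bit-longer children (assumed growth step)."""
    return (prefixes - {prefix}) | {prefix + "0", prefix + "1"}

def components(prefix, n=4):
    """Labels of the N components that serve one prefix."""
    return [f"{prefix}:{i}" for i in range(n)]

# Grow from the empty prefix to the four leaves shown in the diagram.
prefixes = {""}
prefixes = split(prefixes, "")    # {"0", "1"}
prefixes = split(prefixes, "0")   # {"00", "01", "1"}
prefixes = split(prefixes, "1")   # {"00", "01", "10", "11"}

target = route("0110", prefixes)  # the slide's block, hash=011..0
peers = components(target)
```

Because every hash matches exactly one prefix, any node can locate the components responsible for a block from the hash alone.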

Data organization: synchrun chains


Data stream split to blocks
Hashes of blocks computed
Routing through DHT
Compression
Erasure coding
Erasure-coded fragments stored by components
Grouped into synchruns
Containers stored on disks:
  Fragment metadata separately from data
Ordered synchrun chains:
  Preserve order & locality
  Manageable

Block  Hash
A      010
B      101
C      110
D      011
E      000
F      011
G      100

[Diagram: blocks A, D and F (hashes starting with 01) are routed to prefix 01. After compression and erasure coding, components 0-3 of that prefix each store one fragment of A, D and F. The fragments are grouped into a synchrun; Synchrun 1, Synchrun 2 and Synchrun 3 form a chain and are kept in containers.]
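The routing and grouping steps can be traced with the slide's own seven blocks. The sketch below is a simplification: compression and erasure coding are skipped, the fragment labels such as `A.2` are hypothetical, and the function names are illustrative.

```python
from collections import defaultdict

# (block name, hash bits) in stream order, as on the slide
blocks = [("A", "010"), ("B", "101"), ("C", "110"), ("D", "011"),
          ("E", "000"), ("F", "011"), ("G", "100")]

def group_by_prefix(blocks, prefix_len=2):
    """Route every block to the prefix its hash starts with, keeping stream order."""
    groups = defaultdict(list)
    for name, hash_bits in blocks:
        groups[hash_bits[:prefix_len]].append(name)
    return dict(groups)

def place_fragments(prefix, names, n_components=4):
    """Give each component of a prefix one fragment per block, in block order."""
    return {
        f"{prefix}:{i}": [f"{name}.{i}" for name in names]
        for i in range(n_components)
    }

groups = group_by_prefix(blocks)
synchrun = place_fragments("01", groups["01"])
```

Each component ends up with fragments of the same blocks in the same order, which is what lets the components compare and repair their chains position by position.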

Data Services: Identification of data resiliency level


Chain scanning

[Diagram: components 01:0, 01:1, 01:2 and 01:3 of prefix 01; scanning their chains identifies missing fragments.]

Data services: reconstruction

Sequential read/write of entire Containers
Erasure decoding and re-encoding

[Diagram: components 01:0, 01:1, 01:2 and 01:3.]

Data services: fast data transfer


Location of new node (DHT)
Data transfer

[Diagram: components 01:0, 01:1, 01:2 and 01:3; data is transferred from the old component 01:3 to component 01:3 at its new location.]

Data services for deduplication


Choose complete chain:
  Completeness: definitely not a duplicate
  Deletion interaction: wasn't the block scheduled for deletion?
Query
Local candidate found
Candidate verification
Successful dedup

[Diagram: a block with hash=011.. and components 01:0, 01:1, 01:2 and 01:3; the query is sent to component 01:2.]

On-demand data deletion


Distributed garbage collection
Per-block reference counter stored per fragment
Failure-tolerant:
  Block reference counter calculated independently on peer Container chains
Interference with duplicate elimination:
  Duplicates resurrection after garbage collection
  Space reclamation in background
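Reference counting over the block DAG can be illustrated with a single-process sketch. It does not model the distributed, failure-tolerant, per-fragment counters named on the slide. The block `X` and both function names are hypothetical.

```python
from collections import Counter

def live_reference_counts(pointers, live_roots):
    """Count references to every block reachable from the retained roots."""
    counts = Counter({root: 1 for root in live_roots})  # retention counts as one reference
    seen, stack = set(), list(live_roots)
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        for child in pointers.get(block, ()):
            counts[child] += 1     # one reference per live parent
            stack.append(child)
    return counts

def reclaimable(pointers, live_roots):
    """Blocks with no live reference; their space can be reclaimed."""
    counts = live_reference_counts(pointers, live_roots)
    return {block for block in pointers if counts[block] == 0}

# block -> blocks it points to; E is shared by both trees, X only by Root1
pointers = {"Root1": ["E", "X"], "Root2": ["E"], "E": [], "X": []}

before = live_reference_counts(pointers, {"Root1", "Root2"})
after = live_reference_counts(pointers, {"Root2"})  # Root1's tree was deleted
garbage = reclaimable(pointers, {"Root2"})
```

Deleting the Root1 tree frees only the blocks no other tree references; the shared block E survives with a lower count.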

Resource management

Configurable load balancing between:


  backup/restore
  background tasks (reconstruction, transfer, etc.)
  garbage collection

Shares depend on system state
Assigns priority of tasks automatically:
  e.g. reconstruction before transfer or space reclamation
Maximizes resource utilization

Topics for further discussion


Features and technical details of HYDRAstor
Sales of HYDRAstor in Poland
Cooperation with 9LivesData on other projects

Questions?

Contact: heldt@9livesdata.com
www.9livesdata.com
www.hydrastor.com
