
SSD Overview

Terminology Associated with SSDs


Write Endurance estimation example
3PAR Architectures Flash Friendliness

SSD Layout
Basic Building blocks of NAND Flash devices

A NAND flash device is made up of multiple memory cells

A NAND memory cell is a MOS transistor with a floating gate

The floating gate stores charge persistently (non-volatile)
Programming puts electrons on the floating gate
Erasing takes them off
One program/erase (p/e) cycle is a round trip by the electrons
Back-and-forth round trips gradually damage the tunnel oxide

i.e. as more p/e cycles happen, the tunnel oxide degrades. If it degrades beyond a certain point, the cell becomes unusable.

A thicker tunnel oxide layer can sustain more p/e cycles, which increases endurance. Endurance is typically measured in number of p/e cycles:

50nm MLC ~ 10,000 p/e cycles

34nm/25nm/20nm MLC ~ 3,000 to 5,000 p/e cycles
Shrinking the process geometry reduces the physical size of the die, but it also lowers drive endurance.

SSD Layout
Types of NAND Flash

Types of Memory cells

SLC (Single Level Cell): 1 bit per memory cell
MLC (Multi Level Cell): 2 bits per memory cell
TLC (Triple Level Cell): 3 bits per memory cell
16LC (16 Level Cell): 4 bits per memory cell

More bits per cell = higher density and capacity, but lower endurance

SSD Layout - Summary


Understanding the Internal Constructs of a SSD Drive

Cells

Pages (Multiple Cells)

Blocks (Multiple Pages)

Plane (Multiple Blocks)

Die (Multiple Planes)

TSOP (Multiple Dies)

SSD (Multiple TSOPs)



A basic I/O (read/write) happens at the page level. However, erase is done in terms of blocks. This leads to situations where there can be more writes on the back end than the actual writes sent by the host.

SSD Layout
Pages & Blocks

Data is accessed (Read/Write) in terms of Pages
But Erase is done in terms of Blocks

Host I/O

Pages =
4KiB

This is like earning in Rs. and spending in US$.

A Page = Multiple memory cells

One page is the smallest structure that can be read or written. The standard page size is 4 KiB.

Blocks = Multiple pages

One block is the smallest structure that can be erased

e.g. one block = 128 pages at 4 KiB = 512 KiB block

Block = 128 Pages = 512KiB
Erase at block level

SSD Layout
The next SSD construct is a Plane

Multiple blocks make up a plane

e.g. 1,024 Blocks = 1 Plane

The next higher construct is a Die

Multiple planes make up a die

E.g. 4 Planes = 1 Die
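The example figures on these slides (4 KiB pages, 128 pages per block, 1,024 blocks per plane, 4 planes per die) can be multiplied through to see what each construct holds. The following is a minimal sketch using only those example numbers; the function name and dictionary keys are illustrative, and real devices use many different geometries.

```python
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024


def ssd_geometry(page_kib=4, pages_per_block=128,
                 blocks_per_plane=1024, planes_per_die=4):
    """Return the capacity of a block, plane and die for the given layout.

    The defaults are the example values from the slides, not a
    description of any particular product.
    """
    block_kib = page_kib * pages_per_block      # smallest erasable unit
    plane_kib = block_kib * blocks_per_plane
    die_kib = plane_kib * planes_per_die
    return {
        "block_kib": block_kib,
        "plane_mib": plane_kib // KIB_PER_MIB,
        "die_gib": die_kib // KIB_PER_GIB,
    }


geometry = ssd_geometry()
print(geometry)
```

With the defaults, a block is 512 KiB, a plane is 512 MiB and a die is 2 GiB.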

SSD Layout
TSOPs (thin small outline packages)
Multiple dies make up a TSOP

typically one or two dies in a TSOP

up to eight dies are possible, giving 64 GiByte in a TSOP

SSDs
Multiple TSOPs (e.g. ten) make up an SSD
current capacities go up to 800/1400 GB

Terms associated with SSDs


Jargon explained

Over Provisioning
Wear Levelling
Garbage Collection
Write Amplification
Drive Endurance / Write Endurance
DWPD: Drive/Device Writes Per Day. This is a way of rating endurance and can be used to match an application with a specific SSD type (SLC, eMLC etc.).
The associated assumption is that this daily usage figure holds for an operating period of 5 years.

Space Utilization & Management


Techniques
Over Provisioning
Each SSD has a higher physical capacity than its advertised capacity. The extra area is the spare or over-provisioned space.

Typically between 7% and 28% of net capacity

e.g. 800 GByte visible, but actual capacity is 1200 GiByte (also
called soft capacity)
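Over-provisioning is commonly expressed as the spare capacity relative to the user-visible capacity. The sketch below assumes that common definition (the slides do not state a formula) and ignores the GByte/GiByte unit difference in the example; the function name is illustrative.

```python
def over_provisioning_pct(physical_capacity, user_capacity):
    """Spare area as a percentage of the user-visible capacity.

    Assumes the common definition (physical - user) / user; both
    arguments must be in the same unit.
    """
    if user_capacity <= 0:
        raise ValueError("user_capacity must be positive")
    return (physical_capacity - user_capacity) / user_capacity * 100.0


# Example figures from the slide: 800 visible, 1200 actual.
op_pct = over_provisioning_pct(1200, 800)
print(f"Over-provisioning: {op_pct:.0f}%")
```

For the 800/1200 example this gives 50%, which is above the 7% to 28% range quoted as typical.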

[Figure: write latency (ms) chart]

What is the extra area used for?

o keep free pages for quick writing with less impact on host latency (reduce or avoid what is known as the write cliff)
o wear leveling (ensure that all blocks are evenly utilized, so as to increase the life of the drive)

What is the write cliff? It is the point at which latency increases exponentially because there are not enough clean pages to flush writes.

Space Utilization & Management


Techniques
Wear levelling

Since flash memory cells can only be erased (and rewritten) a limited number of times, the controller/drive firmware has the intelligence to ensure that all cells are evenly utilized.
Wear leveling distributes the wear-out over all memory cells: blocks are redistributed in order to ensure all blocks are evenly utilized.
This is where the over-provisioned capacity of the drive comes into the picture.
Types of Levelling

Dynamic Levelling
Static Levelling

Example:
LBA 1 was initially associated with Block-A Page-1. When a subsequent write to the data in that block happens, LBA 1 is reassigned to Block-C Page-5.
LBA 2, which was earlier on Block-A Page-2, is remapped to Block-C Page-6.
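The remapping in this example can be sketched as a small logical-to-physical mapping table. This is a toy model and not how any particular controller is implemented: the class name, the `stale` set and the erase counts are all hypothetical, and the "pick the least-worn block" rule is just one simple way to illustrate wear leveling.

```python
class TinyFTL:
    """Toy flash translation layer: maps an LBA to a (block, page) location."""

    def __init__(self):
        self.mapping = {}   # LBA -> (block, page) holding the current data
        self.stale = set()  # physical pages that now hold outdated data

    def write(self, lba, location):
        """Write an LBA to a new physical page instead of overwriting in place."""
        old = self.mapping.get(lba)
        if old is not None:
            self.stale.add(old)  # old copy becomes dirty, awaiting erase
        self.mapping[lba] = location


def least_worn_block(erase_counts):
    """Pick the candidate block with the fewest erase cycles so far."""
    return min(erase_counts, key=erase_counts.get)


# Hypothetical erase counts for the free blocks available to the controller.
target = least_worn_block({"B": 12, "C": 3, "D": 7})

ftl = TinyFTL()
ftl.write(1, ("A", 1))
ftl.write(2, ("A", 2))
ftl.write(1, (target, 5))  # LBA 1 moves to Block-C Page-5
ftl.write(2, (target, 6))  # LBA 2 moves to Block-C Page-6
print(ftl.mapping, ftl.stale)
```

After the rewrites, both LBAs point at Block C and the two pages in Block A are marked stale.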

Space Utilization & Management


Techniques
Garbage collection
Since I/O is done at page level but erase happens at block level, there will be times when some pages in a block hold valid data while others are dirty and need to be overwritten/deleted. Garbage collection is the background process that aggregates all the used pages into a new set of blocks, leaving blocks that contain only dirty pages so that they can be erased.
How does this process work?

Periodically, at times even without I/O, the SSD controller merges partly-filled blocks.

This helps to increase the number of blocks that can be erased and kept ready so as to aid/improve writes.

This is usually a background process to merge partially filled blocks.

In this example, the orange pages are dirty and due for deletion. To erase these blocks, the green pages are first remapped to another block.
Space Utilization & Management


Techniques
Today, most storage systems are capable of handling page deletes intelligently. Deleted pages are proactively marked for garbage collection so that they can be reclaimed for future allocation. While unmap is a useful capability to have for spinning media, it is of utmost importance for SSDs!

SCSI UNMAP (commercial grade drives)

ATA TRIM commands (for drives with SATA interfaces, consumer grade drives)
Basically, the OS tells the SSD which LBAs are no longer needed and can be erased. This helps to increase the number of free blocks by initiating garbage collection, thereby increasing write performance.

Space Utilization - Write Amplification


Write Amplification (WA)

An undesirable but unavoidable phenomenon with SSDs, where the amount of data physically written to the drive is higher than the amount of data written or sent by the host.

Why does this happen?

Since flash memory must be erased before it can be rewritten, these operations result in moving (or rewriting) user data and metadata more than once. This multiplying effect increases the number of writes required over the life of the SSD.
Many factors affect write amplification

some can be controlled by the user

some are a direct result of the data written to and the usage of the SSD.

Sequential I/O has the lowest WA factor while random I/O has the highest.
Typical WA values range from 1.1 for sequential I/O to as high as 3

Source : wikipedia.org
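Write amplification is usually expressed as the ratio of data written to flash to data written by the host. A minimal sketch of that ratio follows; the function name and the byte counts are made-up inputs chosen to reproduce the 1.1 and 3 figures quoted above.

```python
def write_amplification(flash_bytes_written, host_bytes_written):
    """Ratio of data physically written to flash to data sent by the host."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written


# Illustrative inputs only: 100 units from the host in both cases.
wa_sequential = write_amplification(110, 100)
wa_random = write_amplification(300, 100)
print(wa_sequential, wa_random)
```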

Factors Impacting WA

Wear Levelling
Garbage Collection
Over Provisioning
I/O Pattern
Random/Sequential
Data Compression / De-dupe

Space Utilization & Drive Endurance


Drive Endurance/Write Endurance

Prolonged drive usage (writes) affects the life of the drive; this is referred to as drive or write endurance.
Typically quantified in terms of Device/Drive Writes Per Day (DWPD).

How does an SSD handle endurance?

Bad blocks

Over time, erase slows down with p/e cycles. If a NAND block fails to erase, it reports back and the drive controller uses another block instead (the block is remapped to another block).

No data is lost: a failed NAND block is not a problem (as long as there is enough spare capacity to remap that block).

Write data errors

Due to prolonged usage, blocks may encounter write data errors.

RBER (raw bit error rate): soft errors are usually corrected by ECC. RBER often increases gradually with p/e cycles (hardware errors).
UBER (uncorrectable bit error rate): usually very low (<1 error out of every 10^15 to 10^16 accesses); such an error usually results in a block remap.

Space Utilization & Drive Endurance


Drive Endurance/Write Endurance
The device choice (SLC, eMLC, cMLC etc.) can be arrived at based on the required write endurance, which in turn is derived from the write workload.
Write endurance is typically specified in units of device-writes-per-day (DWPD). It is defined as the amount of writes (PetaBytesWritten) that can be sustained over the entire product lifetime (DaysPerLife), normalized to the drive's capacity (Capacity):

DWPD = (PetaBytesWritten / DaysPerLife) / Capacity

This is a theoretical formula that derives how much data can be written per day per drive, based on information given in the drive manufacturer's datasheet.
Looking at a Hitachi datasheet, a 400GB eMLC SSD can endure 7.3 PB of writes over its lifetime.
Assuming the writes are sustained over 24 hours x 365 days x 5 years, the DWPD for this drive works out to

DWPD = (7.3/1825) / (0.0004) = 10

(Drive capacity is converted into PB for the calculation.) This means that on a 400GB SSD you could write 4TB per day over its life of 5 years.
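The datasheet calculation above can be reproduced in a few lines. This is a sketch of the formula as given on the slide; the function and variable names are illustrative, and decimal units (1 PB = 1,000,000 GB) are assumed.

```python
def dwpd(petabytes_written, days_per_life, capacity_pb):
    """DWPD = (PetaBytesWritten / DaysPerLife) / Capacity, capacity in PB."""
    return (petabytes_written / days_per_life) / capacity_pb


capacity_gb = 400
capacity_pb = capacity_gb / 1_000_000  # assumes decimal units
days_per_life = 5 * 365                # 5-year operating period

rated_dwpd = dwpd(7.3, days_per_life, capacity_pb)
daily_write_tb = rated_dwpd * capacity_gb / 1000
print(f"DWPD = {rated_dwpd:.1f}, about {daily_write_tb:.1f} TB/day")
```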

Space Utilization & Drive Endurance


DWPD-Calc

Calculating the Required DWPD for Sample Real-World Workloads

While the theoretical DWPD of the 400GB drive is 10, in a real-world scenario the requirement could be a lot lower.

Let's take a Core Banking System example to deliver 50 tps.

Parameter                                     Value
No. of TPS                                    100
No. of DB Transactions per Fin Transaction    20 (assumption based on a Finacle workload)
No. of I/Os per DB Transaction                5
Total Host IOPS                               10,000
Read Percentage                               70
Host Read IOPS                                7,000
Host Write IOPS                               3,000
Block Size (KB)                               8
Average Working Time (hours)                  12 (assuming a 12 hour sustained workload window)

If we know the workload, the required DWPD can be calculated:

Required DWPD = (write_physical_MBps * seconds_perday) / (Capacity * 1000)

This example assumes a single application load on the SSDs. If multiple workloads share the same SSD CPG, then the writes of all those applications have to be aggregated to arrive at the required DWPD.

At this level, even a cMLC drive can comfortably sustain the workload over 5 years.
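The workload table and the Required DWPD formula can be combined into one calculation. The sketch below uses the table values (100 TPS, 20 DB transactions per financial transaction, 5 I/Os per DB transaction, 70% reads, 8 KB blocks, 12 hours per day). It assumes a single 400 GB drive with capacity in GB, decimal KB-to-MB conversion, and a `write_amplification` parameter defaulting to 1.0 to turn host writes into physical writes; those assumptions and all names are illustrative and not taken from the slides.

```python
def required_dwpd(tps, db_tx_per_fin_tx, ios_per_db_tx, read_pct,
                  block_kb, hours_per_day, capacity_gb,
                  write_amplification=1.0):
    """Required DWPD = (write_physical_MBps * seconds_perday) / (Capacity * 1000)."""
    host_iops = tps * db_tx_per_fin_tx * ios_per_db_tx
    write_iops = host_iops * (100 - read_pct) / 100
    # Host write throughput in MB/s, scaled by the assumed write amplification.
    write_physical_mbps = write_iops * block_kb / 1000 * write_amplification
    seconds_per_day = hours_per_day * 3600
    return (write_physical_mbps * seconds_per_day) / (capacity_gb * 1000)


needed_dwpd = required_dwpd(tps=100, db_tx_per_fin_tx=20, ios_per_db_tx=5,
                            read_pct=70, block_kb=8, hours_per_day=12,
                            capacity_gb=400)
print(f"Required DWPD = {needed_dwpd:.2f}")
```

Under these assumptions the workload needs roughly 2.6 drive writes per day, well below the rated 10 DWPD of the 400GB eMLC drive. Spreading the writes across several drives, or a write amplification above 1, would lower or raise the figure accordingly.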
