
Scale In after Scaling Out

WHY, WHEN AND HOW

Who am I

- Been using MySQL since 1999
- Worked for FriendFinder, Friendster, Flickr, RockYou, SchoolFeed, Weebly
- Presented on Flickr Architecture: Doing Billions of Queries Per Day; Record Every Referral For Flickr Real-time; Scaling to 200K TPS with Open Source; Scaling a Widget Company; MySpace vs. Facebook API Load Patterns; a University of Utah presentation; and various others

Patterns from Start to Scale

- Start a project with a single MySQL DB
- Get some users; add more disks to the MySQL DB
- Get some more users; add a slave
- Add more slaves
- Then need to split up the master

Patterns: Continued

- The master is not strong enough
- Put tables on other servers, with slaves
- Constantly battle slave lag

[Diagram: one server holding all tables minus the big table; a separate server dedicated to the big table]

Still not scaling Horizontally

- Let's shard

[Diagram: user 1's data, user 2's data, user 3's data, ..., user N's data, each assigned to its own shard server]

Assign a whole section of the database to a server

- Have a layer that tells the connector which server to connect to (see the sketch after this list)
- Add slaves for redundancy
- Go master-master
- Do this N times
- Finally provided stable service
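A minimal sketch of such a lookup layer, assuming a user_shard_map table and the PyMySQL driver; the table, column, and host names are illustrative, not the production schema:

```python
# Minimal sketch of a federation lookup layer: map a user id to the shard
# (master host) that owns that user's data. Table and column names are
# illustrative assumptions, not the production schema.
import pymysql  # assumes the PyMySQL driver is installed

SHARD_MASTERS = {
    1: {"host": "shard1-master.db", "port": 3306},
    2: {"host": "shard2-master.db", "port": 3306},
    # ... one entry per shard
}

def shard_for_user(lookup_conn, user_id):
    """Ask the lookup (federation) database which shard owns this user."""
    with lookup_conn.cursor() as cur:
        cur.execute("SELECT shard_id FROM user_shard_map WHERE user_id = %s", (user_id,))
        row = cur.fetchone()
    return row[0] if row else None

def connect_for_user(lookup_conn, user_id):
    """Return a connection to the master that owns user_id's data."""
    shard_id = shard_for_user(lookup_conn, user_id)
    cfg = SHARD_MASTERS[shard_id]
    return pymysql.connect(host=cfg["host"], port=cfg["port"], db="app")
```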

Federation

This Increases Write Throughput

Now you have the ability to scale Horizontally

- What problem was solved, at its lowest level?
- Lack of IOPS: solved
- Handling of concurrency: solved

Problems introduced

- Lots of power used
- Lots of servers to manage
- Lots of rack space used
- Somewhat less than optimal hardware usage

SSD

- SSDs use NAND flash chips; each chip holds millions of cells.
- SLC cells hold a single data bit; MLC cells hold multiple data bits, yielding a higher density, i.e. more disk space.
- Typically MLC provides slower throughput than SLC due to more complicated error-correction algorithms and false-positive reads.

MLC is not all bad: major leaps in firmware improved it

- More enterprises are using MLC
- It's cheaper
- Fast enough
- Endurance improved
- Write amplification improved (fewer erasures and data re-sends)
- TRIM improvements, which solve the progressively slower writes to blocks that are rewritten over and over
- Stay on top of firmware changes

We use the Intel SSDSA2CW160 320 Series MLC SSD

- It's fast
- It's reliable
- It has advanced power-protection features
- Really big capacitors to flush buffered data
- Low power usage
- We consider it the best
- It's no longer made
- Everyone wants it

Speed of a single SSD versus a single spinning-metal drive

- 20K write IOPS reported for the SSD
- 35K read IOPS reported for the SSD
- 200 IOPS for a SEAGATE ST9146852SS (HT043TB0584C)

Now that we have more IOPS, we need space

- RAID-5 gives the best space-for-performance trade-off we can use
- 8 × 160 GB SSDs give 1 TB 613 GB of usable space
- Raw size is 149 GB per disk
- The rest is reserved for wear leveling
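For a rough check of the roughly 1 TB figure, here is a quick calculation assuming the standard RAID-5 one-disk parity overhead and the 149 GB per-disk figure above; a sketch, not the exact array layout:

```python
# Rough RAID-5 capacity check: one disk's worth of space goes to parity,
# so usable space is (n_disks - 1) * usable_per_disk. The 149 GB figure
# is the per-disk raw size quoted above; the 7-of-8 data split is the
# standard RAID-5 assumption, not a measured value.
n_disks = 8
usable_per_disk_gb = 149  # 160 GB drive minus space reserved for wear leveling

usable_gb = (n_disks - 1) * usable_per_disk_gb
print(f"~{usable_gb} GB usable across the array")  # roughly 1 TB
```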

Now that we have space we can combine Shards

- First, get the IOPS usage of the current shards
- Combined, that is 12K IOPS
- Next, get the disk-space requirements
- Do not use more than 50%, so you have room for growth
- I now use 56% of the space
- Depending on the replication traffic per shard, you may need another plan

How to combine Data

Code steps:

- Have a program that keeps a hash map of table name to federated column
- Lock the federated entity by throwing an error in the application that says this federated entity is not available
- SELECT all of the federated data (in chunks) and add it to the new combined table
- Update pointers
- If any step fails, error out and keep the data locked; otherwise unlock
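A minimal sketch of those steps; the locking helpers, table and column names, and chunk size are illustrative assumptions rather than the actual program:

```python
# Sketch of the combine step for one federated entity (e.g. one user):
# lock it at the application layer, copy its rows in chunks from the old
# shard to the new combined table, repoint the lookup row, then unlock.
# lock_entity, insert_rows, update_pointer, and unlock_entity are
# placeholder helpers; table/column names and chunk size are assumptions.
CHUNK = 5000

def migrate_entity(old_conn, new_conn, lookup_conn, table, fed_col, entity_id):
    lock_entity(entity_id)           # application-level lock: reads/writes now error
    try:
        last_pk = 0
        while True:
            with old_conn.cursor() as cur:
                cur.execute(
                    f"SELECT * FROM {table} WHERE {fed_col} = %s AND id > %s "
                    f"ORDER BY id LIMIT %s",
                    (entity_id, last_pk, CHUNK),
                )
                rows = cur.fetchall()
            if not rows:
                break
            insert_rows(new_conn, table, rows)   # bulk INSERT into the combined table
            last_pk = rows[-1][0]                # assumes the first column is the PK
        update_pointer(lookup_conn, entity_id)   # point the lookup at the new location
        new_conn.commit()
    except Exception:
        # keep the entity locked so nothing reads or writes half-migrated data
        raise
    else:
        unlock_entity(entity_id)
```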

Another way to combine, more operational

- Take a copy of the shard
- Configure multi-instance MySQL
- Run that shard off a different port

I chose to do both methods; here is how

- Let's take a case
- Support 20 million websites
- 90% of all sites get 1 or more hits, but fewer than 1,000 hits per day
- Fewer than 10% of sites get more than 1,000 hits per day
- 8 shards to handle 12K IOPS
- 64 CPU threads
- 288 GB of memory
- 64 2.5" drives
- Roughly $40K of hardware
- Multiply by 2 for redundancy

Replication was lagging

- Simply combining the data onto one server will not work
- The master needs 10K IOPS; replication, with some tricks, can use 2.5K IOPS
- innodb_fake_changes did not work
- Facebook's faker helped, but the CPU was underpowered, so it could not really saturate the IOPS and keep the replica in sync

Multi_Mysql is the answer

- Set up 4 MySQL instances
- Instead of 1 replication thread, I now have 4
- Instead of being limited to the 2.5K IOPS a single replication thread can drive on SSD, I now have 10K IOPS
- The master produces 10K IOPS from ETL that runs on all front ends
- I don't need much memory; in fact each instance only has a 4 GB buffer pool

Let's look at the details

All tables are compressed

- KEY_BLOCK_SIZE=8 (8 KB) with InnoDB

Consistent Hash for Hostname to bigint

- Remove a lookup in exchange for a small CPU computation
- MD5 the hostname and take the first 16 hex characters to produce an 8-byte bigint
- The primary key is HashId + Hostname(10)
- HashId maps to ShardId with range blocks

Run through all hostnames to assign them to a shard

Test that the hashing is even
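A minimal sketch of one plausible reading of that scheme; the unsigned 64-bit truncation of MD5 and the fixed-width range blocks are my assumptions about the details:

```python
# Sketch of the hostname -> shard mapping: MD5 the hostname, keep the
# first 16 hex characters (8 bytes) as an unsigned bigint, then map that
# hash id to a shard id by fixed-width range blocks. The block math is an
# assumption about how the ranges were carved up.
import hashlib
from collections import Counter

NUM_SHARDS = 32
MAX_HASH = 2 ** 64

def hash_id(hostname: str) -> int:
    return int(hashlib.md5(hostname.lower().encode()).hexdigest()[:16], 16)

def shard_id(hostname: str) -> int:
    return hash_id(hostname) * NUM_SHARDS // MAX_HASH  # 0..31 by range block

# Quick evenness check over a sample of hostnames:
def distribution(hostnames):
    return Counter(shard_id(h) for h in hostnames)
```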

Shard on a Single Server

- There are 8 databases per server instance
- A database represents a shard
- There are 4 MySQL server instances
- 32 shards total
- Can isolate a single DB to a single server
- Can isolate a single host to a single shard

Based on a range, go to the correct server, port, and database
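A sketch of that routing, assuming the 4 instances listen on ports 3306-3309 and the shard databases are named shard_00 through shard_31; the ports and naming are assumptions:

```python
# Sketch of routing a shard id to the right MySQL instance and database:
# 4 mysqld instances per physical server, 8 shard databases per instance,
# 32 shards total. Ports and database naming are assumptions.
INSTANCE_PORTS = [3306, 3307, 3308, 3309]
DBS_PER_INSTANCE = 8

def locate_shard(shard: int):
    instance = shard // DBS_PER_INSTANCE          # which mysqld instance (0..3)
    return {
        "port": INSTANCE_PORTS[instance],
        "database": f"shard_{shard:02d}",
    }

# Example: shard 19 lives on the third instance.
# locate_shard(19) -> {"port": 3308, "database": "shard_19"}
```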

How to switch

- Write in both locations
- Log if a write fails in a single location (none happened)
- Backfill old data to the new format
- Switch reads over to the new format once the data is verified as correct
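A sketch of the dual-write phase; write_old and write_new stand in for the real write paths:

```python
# Sketch of the dual-write phase: every write goes to both the old shard
# layout and the new combined layout; a failure on either side is logged
# rather than silently dropped. write_old/write_new are placeholder helpers.
import logging

log = logging.getLogger("dual_write")

def record_hit(hostname, hit):
    ok_old = ok_new = True
    try:
        write_old(hostname, hit)      # legacy sharded tables
    except Exception:
        ok_old = False
        log.exception("old-format write failed for %s", hostname)
    try:
        write_new(hostname, hit)      # new combined tables
    except Exception:
        ok_new = False
        log.exception("new-format write failed for %s", hostname)
    return ok_old and ok_new
```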

In Staging, switch reads to the new format

- Verify that Production and Staging render the same graph
- Verify that Production and Staging have the same referrers
- Sample random Pro accounts and make sure the numbers match
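A sketch of that check; fetch_referrers and the account sampling are hypothetical helpers:

```python
# Sketch of the staging-vs-production check: sample some Pro accounts and
# make sure both environments report the same referrer counts.
# fetch_referrers is a placeholder for however each environment is queried.
import random

def verify_sample(pro_accounts, sample_size=50):
    mismatches = []
    for account in random.sample(pro_accounts, sample_size):
        prod = fetch_referrers("production", account)   # dict: referrer -> count
        stage = fetch_referrers("staging", account)
        if prod != stage:
            mismatches.append(account)
    return mismatches   # empty list means the new format matches the old
```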

Roll out with a switch to roll back

- There was a bug: some website names were passed as raw user input to the lookup method, yet I stored everything as lowercase names
- Turn off new reads with an application config switch
- Fix the issue and turn new reads back on
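A sketch of the read-path switch and the lowercase fix; the config flag name and helper functions are hypothetical:

```python
# Sketch of the read-path switch: a config flag decides whether reads go to
# the new combined format, so a bad rollout can be turned off without a
# deploy. The flag name and read_new/read_old helpers are hypothetical;
# the lowercase normalization reflects the bug described above.
def get_stats(hostname, config):
    hostname = hostname.lower()       # lookups must match the stored lowercase names
    if config.get("read_new_format", False):
        return read_new(hostname)     # new combined shards
    return read_old(hostname)         # legacy shards (rollback path)
```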

Clean Up

- Once fully over on the new format
- Kill the old format
- Repurpose servers
- Profit

Some Stats

- $80K of potential server cost reduced to $7K
- Utilize all of the CPU
- Less memory per server, but more IOPS
- All replicas stay in sync because there is now more than 1 replication thread per physical server (there are 4)

Next Generation

- Fusion-io PCIe SSD card
- 1U form factor
- Less power: 40W-50W
- No need to RAID

Questions

- Twitter: @dathanvp
- http://mysqldba.blogspot.com
- http://facebook.com/dathan
- http://linkedIn.com/in/dathan
- http://about.me/dathan
- mailto:dathanvp@gmail.com
