
Scale In after Scaling Out

WHY, WHEN AND HOW

Who am I

- Been using MySQL since 1999
- Worked for FriendFinder, Friendster, Flickr, RockYou, SchoolFeed, Weebly
- Presented on Flickr Architecture: Doing Billions of Queries Per Day; Record Every Referral For Flickr Real-time; Scaling to 200K TPS with Open Source; Scaling a Widget Company; MySpace vs. Facebook API Load Patterns; a University of Utah presentation; and various others

Patterns from Start to Scale

- Start a project with a single MySQL DB
- Get some users; add more disks to the MySQL DB
- Get some more users; add a slave
- Add more slaves
- Then need to split up the master

Patterns: Continued

- The master is not strong enough
- Put tables on other servers, with slaves
- Constantly battle slave lag

[Diagram: one server holding all tables minus the big table; a separate server dedicated to the big table]

Still not scaling Horizontally

- Let's shard

[Diagram: user 1's data, user 2's data, user 3's data, ..., user N's data, each assigned to its own shard server]

Assign a whole section of the database to a server

- Have a layer that tells the connector which server to connect to (see the sketch after this list)
- Add slaves for redundancy
- Go master-master
- Do this N times
- Finally provided stable service
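A minimal sketch of such a lookup layer, assuming a user_shard_map table and the PyMySQL driver; the table, column, and host names are illustrative, not the production schema:

```python
# Minimal sketch of a federation lookup layer: map a user id to the shard
# (master host) that owns that user's data. Table and column names are
# illustrative assumptions, not the production schema.
import pymysql  # assumes the PyMySQL driver is installed

SHARD_MASTERS = {
    1: {"host": "shard1-master.db", "port": 3306},
    2: {"host": "shard2-master.db", "port": 3306},
    # ... one entry per shard
}

def shard_for_user(lookup_conn, user_id):
    """Ask the lookup (federation) database which shard owns this user."""
    with lookup_conn.cursor() as cur:
        cur.execute("SELECT shard_id FROM user_shard_map WHERE user_id = %s", (user_id,))
        row = cur.fetchone()
    return row[0] if row else None

def connect_for_user(lookup_conn, user_id):
    """Return a connection to the master that owns user_id's data."""
    shard_id = shard_for_user(lookup_conn, user_id)
    cfg = SHARD_MASTERS[shard_id]
    return pymysql.connect(host=cfg["host"], port=cfg["port"], db="app")
```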

Federation

This Increases Write Throughput

Now you have the ability to scale Horizontally

- What problem was solved, at its lowest level?
- Lack of IOPS: solved
- Handling of concurrency: solved

Problems introduced

- Lots of power used
- Lots of servers to manage
- Lots of rack space used
- Somewhat less than optimal hardware usage

SSD

- SSDs use NAND flash chips; each chip holds millions of cells.
- SLC cells hold a single data bit; MLC cells hold multiple data bits, yielding a higher density, i.e. more disk space.
- Typically MLC provides slower throughput than SLC due to more complicated error-correction algorithms and false-positive reads.

MLC is not all bad: major leaps in firmware improved it

- More enterprises are using MLC
- It's cheaper
- Fast enough
- Endurance improved
- Write amplification improved (fewer erasures and data re-sends)
- TRIM improvements, which solve the progressively slower writes to blocks that are rewritten over and over
- Stay on top of firmware changes

We use the Intel SSDSA2CW160 320 Series MLC SSD

- It's fast
- It's reliable
- It has advanced power-protection features
- Really big capacitors to flush buffered data
- Low power usage
- We consider it the best
- It's no longer made
- Everyone wants it

Speed of a single SSD versus a single spinning-metal drive

- 20K write IOPS reported for the SSD
- 35K read IOPS reported for the SSD
- 200 IOPS for a SEAGATE ST9146852SS (HT043TB0584C)

Now that we have more IOPS, we need space

- RAID-5 gives the best space-for-performance trade-off we can use
- 8 × 160 GB SSDs give 1 TB 613 GB of usable space
- Raw size is 149 GB per disk
- The rest is reserved for wear leveling
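For a rough check of the roughly 1 TB figure, here is a quick calculation assuming the standard RAID-5 one-disk parity overhead and the 149 GB per-disk figure above; a sketch, not the exact array layout:

```python
# Rough RAID-5 capacity check: one disk's worth of space goes to parity,
# so usable space is (n_disks - 1) * usable_per_disk. The 149 GB figure
# is the per-disk raw size quoted above; the 7-of-8 data split is the
# standard RAID-5 assumption, not a measured value.
n_disks = 8
usable_per_disk_gb = 149  # 160 GB drive minus space reserved for wear leveling

usable_gb = (n_disks - 1) * usable_per_disk_gb
print(f"~{usable_gb} GB usable across the array")  # roughly 1 TB
```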

Now that we have space we can combine Shards

- First, get the IOPS usage of the current shards
- Combined, that is 12K IOPS
- Next, get the disk-space requirements
- Do not use more than 50%, so you have room for growth
- I now use 56% of the space
- Depending on the replication traffic per shard, you may need another plan

How to combine Data

Code steps:

- Have a program that keeps a hash map of table name to federated column
- Lock the federated entity by throwing an error in the application that says this federated entity is not available
- SELECT all of the federated data (in chunks) and add it to the new combined table
- Update pointers
- If any step fails, error out and keep the data locked; otherwise unlock
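A minimal sketch of those steps; the locking helpers, table and column names, and chunk size are illustrative assumptions rather than the actual program:

```python
# Sketch of the combine step for one federated entity (e.g. one user):
# lock it at the application layer, copy its rows in chunks from the old
# shard to the new combined table, repoint the lookup row, then unlock.
# lock_entity, insert_rows, update_pointer, and unlock_entity are
# placeholder helpers; table/column names and chunk size are assumptions.
CHUNK = 5000

def migrate_entity(old_conn, new_conn, lookup_conn, table, fed_col, entity_id):
    lock_entity(entity_id)           # application-level lock: reads/writes now error
    try:
        last_pk = 0
        while True:
            with old_conn.cursor() as cur:
                cur.execute(
                    f"SELECT * FROM {table} WHERE {fed_col} = %s AND id > %s "
                    f"ORDER BY id LIMIT %s",
                    (entity_id, last_pk, CHUNK),
                )
                rows = cur.fetchall()
            if not rows:
                break
            insert_rows(new_conn, table, rows)   # bulk INSERT into the combined table
            last_pk = rows[-1][0]                # assumes the first column is the PK
        update_pointer(lookup_conn, entity_id)   # point the lookup at the new location
        new_conn.commit()
    except Exception:
        # keep the entity locked so nothing reads or writes half-migrated data
        raise
    else:
        unlock_entity(entity_id)
```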

Another way to combine, more operational

- Take a copy of the shard
- Configure multi-instance MySQL
- Run that shard off a different port

I chose to do both methods; here is how

- Let's take a case
- Support 20 million websites
- 90% of all sites get 1 or more hits, but fewer than 1,000 hits per day
- Fewer than 10% of sites get more than 1,000 hits per day
- 8 shards to handle 12K IOPS
- 64 CPU threads
- 288 GB of memory
- 64 2.5" drives
- Roughly $40K of hardware
- Multiply by 2 for redundancy

Replication was lagging

- Simply combining the data onto one server will not work
- The master needs 10K IOPS; replication, with some tricks, can use 2.5K IOPS
- innodb_fake_changes did not work
- Facebook's faker helped, but the CPU was underpowered, so it could not really saturate the IOPS and keep the replica in sync

Multi_Mysql is the answer

- Set up 4 MySQL instances
- Instead of 1 replication thread, I now have 4
- Instead of being limited to the 2.5K IOPS a single replication thread can drive on SSD, I now have 10K IOPS
- The master produces 10K IOPS from ETL that runs on all front ends
- I don't need much memory; in fact each instance only has a 4 GB buffer pool

Let's look at the details

All tables are compressed

- KEY_BLOCK_SIZE=8 (8 KB) with InnoDB

Consistent Hash for Hostname to bigint

- Remove a lookup in exchange for a small CPU computation
- MD5 the hostname and take the first 16 hex characters to produce an 8-byte bigint
- The primary key is HashId + Hostname(10)
- HashId maps to ShardId with range blocks

Run through all hostnames to assign them to a shard

Test that the hashing is even
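A minimal sketch of one plausible reading of that scheme; the unsigned 64-bit truncation of MD5 and the fixed-width range blocks are my assumptions about the details:

```python
# Sketch of the hostname -> shard mapping: MD5 the hostname, keep the
# first 16 hex characters (8 bytes) as an unsigned bigint, then map that
# hash id to a shard id by fixed-width range blocks. The block math is an
# assumption about how the ranges were carved up.
import hashlib
from collections import Counter

NUM_SHARDS = 32
MAX_HASH = 2 ** 64

def hash_id(hostname: str) -> int:
    return int(hashlib.md5(hostname.lower().encode()).hexdigest()[:16], 16)

def shard_id(hostname: str) -> int:
    return hash_id(hostname) * NUM_SHARDS // MAX_HASH  # 0..31 by range block

# Quick evenness check over a sample of hostnames:
def distribution(hostnames):
    return Counter(shard_id(h) for h in hostnames)
```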

Shard on a Single Server

- There are 8 databases per server instance
- A database represents a shard
- There are 4 MySQL server instances
- 32 shards total
- Can isolate a single DB to a single server
- Can isolate a single host to a single shard

Based on a range, go to the correct server, port, and database
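A sketch of that routing, assuming the 4 instances listen on ports 3306-3309 and the shard databases are named shard_00 through shard_31; the ports and naming are assumptions:

```python
# Sketch of routing a shard id to the right MySQL instance and database:
# 4 mysqld instances per physical server, 8 shard databases per instance,
# 32 shards total. Ports and database naming are assumptions.
INSTANCE_PORTS = [3306, 3307, 3308, 3309]
DBS_PER_INSTANCE = 8

def locate_shard(shard: int):
    instance = shard // DBS_PER_INSTANCE          # which mysqld instance (0..3)
    return {
        "port": INSTANCE_PORTS[instance],
        "database": f"shard_{shard:02d}",
    }

# Example: shard 19 lives on the third instance.
# locate_shard(19) -> {"port": 3308, "database": "shard_19"}
```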

How to switch

- Write in both locations
- Log if a write fails in a single location (none happened)
- Backfill old data to the new format
- Switch reads over to the new format once the data is verified as correct
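A sketch of the dual-write phase; write_old and write_new stand in for the real write paths:

```python
# Sketch of the dual-write phase: every write goes to both the old shard
# layout and the new combined layout; a failure on either side is logged
# rather than silently dropped. write_old/write_new are placeholder helpers.
import logging

log = logging.getLogger("dual_write")

def record_hit(hostname, hit):
    ok_old = ok_new = True
    try:
        write_old(hostname, hit)      # legacy sharded tables
    except Exception:
        ok_old = False
        log.exception("old-format write failed for %s", hostname)
    try:
        write_new(hostname, hit)      # new combined tables
    except Exception:
        ok_new = False
        log.exception("new-format write failed for %s", hostname)
    return ok_old and ok_new
```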

In Staging, switch reads to the new format

- Verify that Production and Staging render the same graph
- Verify that Production and Staging have the same referrers
- Sample random Pro accounts and make sure the numbers match
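A sketch of that check; fetch_referrers and the account sampling are hypothetical helpers:

```python
# Sketch of the staging-vs-production check: sample some Pro accounts and
# make sure both environments report the same referrer counts.
# fetch_referrers is a placeholder for however each environment is queried.
import random

def verify_sample(pro_accounts, sample_size=50):
    mismatches = []
    for account in random.sample(pro_accounts, sample_size):
        prod = fetch_referrers("production", account)   # dict: referrer -> count
        stage = fetch_referrers("staging", account)
        if prod != stage:
            mismatches.append(account)
    return mismatches   # empty list means the new format matches the old
```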

Roll out with a switch to roll back

- There was a bug: some website names were passed as raw user input to the lookup method, yet I stored everything as lowercase names
- Turn off new reads with an application config switch
- Fix the issue and turn new reads back on
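A sketch of the read-path switch and the lowercase fix; the config flag name and helper functions are hypothetical:

```python
# Sketch of the read-path switch: a config flag decides whether reads go to
# the new combined format, so a bad rollout can be turned off without a
# deploy. The flag name and read_new/read_old helpers are hypothetical;
# the lowercase normalization reflects the bug described above.
def get_stats(hostname, config):
    hostname = hostname.lower()       # lookups must match the stored lowercase names
    if config.get("read_new_format", False):
        return read_new(hostname)     # new combined shards
    return read_old(hostname)         # legacy shards (rollback path)
```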

Clean Up

- Once fully over on the new format
- Kill the old format
- Repurpose servers
- Profit

Some Stats

- $80K of potential server cost reduced to $7K
- Utilize all of the CPU
- Less memory per server, but more IOPS
- All replicas stay in sync because there is now more than 1 replication thread per physical server (there are 4)

Next Generation

- Fusion-io PCIe SSD card
- 1U form factor
- Less power: 40W-50W
- No need to RAID

Questions

- Twitter: @dathanvp
- http://mysqldba.blogspot.com
- http://facebook.com/dathan
- http://linkedIn.com/in/dathan
- http://about.me/dathan
- mailto:dathanvp@gmail.com
