
Scalable Event Analytics with Ruby on Rails & MongoDB


Ruby Conf China 2010
Jared Rosoff (@forjared)
jrosoff@yottaa.com
Yottaa!!!! (www.yottaa.com)
Overview
• Ruby at Scale

• What is Event Analytics?

• What are the different ways you could do it?

• How we did it

Ruby at Scale?
(Photo: http://www.flickr.com/photos/laughingsquid)
Event Analytics
[Diagram: several data sources continuously emit events; a user queries the collected events and gets back a report.]
High Write Volume
– Each new data source adds X requests per second
– Data never stops arriving

Continuous Data Growth
– We only add more data
– Historical data is valuable

Flexible Data Exploration
– Ad hoc queries
– Complex aggregations

Oh, and we are a startup.
Our requirements:

Metric                     On launch day   3 months later (projected)
# of data sources          15              45
# of events per minute     80              5600
Data stored (GB)           20              100
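The projected numbers imply very different growth rates per metric. The short Ruby calculation below makes that explicit; the hash keys and helper names are ours, and the figures are copied from the table.

```ruby
# Figures copied from the requirements table (launch vs. projected).
LAUNCH    = { data_sources: 15, events_per_minute: 80,   data_gb: 20  }.freeze
PROJECTED = { data_sources: 45, events_per_minute: 5600, data_gb: 100 }.freeze

# Growth factor per metric: projected value divided by launch value.
def growth_factors(before, after)
  before.to_h { |metric, value| [metric, after.fetch(metric).fdiv(value)] }
end

# Convert a per-minute rate into a per-second rate.
def per_second(per_minute)
  per_minute.fdiv(60)
end

# Average events per minute contributed by each data source.
def events_per_source(snapshot)
  snapshot[:events_per_minute].fdiv(snapshot[:data_sources])
end

growth_factors(LAUNCH, PROJECTED).each do |metric, factor|
  puts format("%-18s x%.1f", metric, factor)
end
puts format("events/second:     %.2f -> %.2f",
            per_second(LAUNCH[:events_per_minute]),
            per_second(PROJECTED[:events_per_minute]))
puts format("events/min/source: %.1f -> %.1f",
            events_per_source(LAUNCH), events_per_source(PROJECTED))
```

Event volume grows about 70x while the number of data sources only triples, so traffic per source also rises (from about 5.3 to about 124 events per minute), which fits the talk's emphasis on scaling writes.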
Rails default architecture
[Diagram: Data Source → Collection Server → MySQL; User → Reporting Server → MySQL. The collection and reporting servers are "just" a Rails app.]
Performance bottleneck: too much load
Let's add replication!
[Diagram: the Collection Server writes to a MySQL master, which replicates to several MySQL replicas; the Reporting Server reads from the replicas.]
• Off the shelf!
• Scalable reads!
Performance bottleneck: still can't scale writes
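Why replication scales reads but not writes can be seen in a minimal Ruby sketch of the read/write split an application would perform. `ReplicatedRouter` and the host names are hypothetical placeholders, not part of Rails or MySQL.

```ruby
# Hypothetical read/write splitter for one MySQL master plus replicas.
# Hosts are plain strings here; a real app would hold connections.
class ReplicatedRouter
  attr_reader :master, :replicas

  def initialize(master, replicas)
    raise ArgumentError, "need at least one replica" if replicas.empty?

    @master = master
    @replicas = replicas
    @next_replica = 0
  end

  # Every INSERT/UPDATE/DELETE must go to the single master.
  def for_write
    master
  end

  # Reads are spread round-robin over the replicas.
  def for_read
    replica = replicas[@next_replica]
    @next_replica = (@next_replica + 1) % replicas.size
    replica
  end
end

router = ReplicatedRouter.new("mysql-master", %w[replica-1 replica-2 replica-3])
reads  = Array.new(6) { router.for_read }
writes = Array.new(6) { router.for_write }
p reads.tally   # each replica serves 2 of the 6 reads
p writes.tally  # the master handles all 6 writes
```

Adding replicas spreads the reads further, but every write still lands on the one master (and is then replayed on each replica), so write capacity stays fixed.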
What about sharding?
[Diagram: the Collection Server writes through a sharding layer to several MySQL masters; the Reporting Server reads through the same sharding layer.]
• Scalable writes!
Development bottleneck: need to write custom code
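The "custom code" that application-level sharding requires looks roughly like the following sketch. `ShardRouter`, the shard names, and the choice of CRC32-modulo-N hashing on the data source are all assumptions made for illustration.

```ruby
require "zlib"

# Hypothetical application-level shard router: the app, not the database,
# decides which MySQL master stores a row, by hashing the data source id.
class ShardRouter
  attr_reader :shards

  def initialize(shards)
    @shards = shards
  end

  # CRC32 is stable across processes, unlike String#hash, which Ruby seeds
  # per process, so every app server picks the same shard for a key.
  def shard_for(key)
    shards[Zlib.crc32(key.to_s) % shards.size]
  end
end

router = ShardRouter.new(%w[mysql-a mysql-b mysql-c])
events = %w[site-1 site-2 site-3 site-1 site-4].map { |source| { source: source } }

# Writes: each event goes to exactly one shard.
events.group_by { |event| router.shard_for(event[:source]) }
      .each { |shard, rows| puts "#{shard}: #{rows.size} event(s)" }

# Reports across all sources: the app must query every shard and merge.
puts "report fans out to: #{router.shards.join(', ')}"
```

With modulo-N hashing, adding a shard changes the target of most keys, so growing the cluster also means writing and running data-migration code, on top of fanning out every cross-shard report.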
Key Value stores to the rescue?
[Diagram: the MySQL masters are replaced by a key-value store (Cassandra or Voldemort) between the Collection Server and the Reporting Server.]
• Scalable writes!
Development bottleneck: reporting is limited / hard
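The reporting limitation can be illustrated with a toy store that, like a pure key-value interface, offers only `get` and `put` by key (plus a full scan, added here to make its cost explicit). The `KVStore` class, the key naming scheme, and the report functions are hypothetical; Cassandra and Voldemort each have their own, richer APIs, and this only captures the basic trade-off.

```ruby
# Hypothetical minimal key-value store: get/put by key only, plus a full
# scan that stands for "read everything".
class KVStore
  def initialize
    @data = {}
  end

  def put(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end

  # Full scan over every stored pair: the only way to answer a question
  # that no key was designed for.
  def scan
    @data.each { |key, value| yield key, value }
  end
end

# Planned report: views per url, kept as a counter updated on every write.
# (This read-modify-write is not atomic; it is for illustration only.)
def record_view(store, id, url, referrer)
  store.put("view:#{id}", { url: url, referrer: referrer })
  counter_key = "count:url:#{url}"
  store.put(counter_key, (store.get(counter_key) || 0) + 1)
end

# Unplanned (ad hoc) report: views per referrer needs a scan of every view.
def views_per_referrer(store)
  counts = Hash.new(0)
  store.scan do |key, value|
    counts[value[:referrer]] += 1 if key.start_with?("view:")
  end
  counts
end

store = KVStore.new
record_view(store, 1, "/home", "google")
record_view(store, 2, "/home", "twitter")
record_view(store, 3, "/about", "google")
p store.get("count:url:/home")
p views_per_referrer(store)
```

Reports you anticipated are cheap because they were precomputed at write time; any new question means either another counter (plus a backfill) or a scan of every key.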
Can I Hadoop my way out of this?
[Diagram: the Collection Server writes to Cassandra or Voldemort; Hadoop processes that data into a MySQL master with slaves; the Reporting Server, "just" a Rails app, reads from MySQL.]
• Scalable writes!
• Flexible reports!
Development bottleneck: too many systems!
MongoDB!
[Diagram: MongoDB takes the place of the MySQL masters; the Collection Server and the Reporting Server both use it directly, and the whole system is "just" a Rails app.]
• Scalable writes!
• Flexible reporting!
[Diagram: data sources and users reach an Nginx load balancer in front of the app servers; each app server runs Passenger (hosting the Collection and Reporting apps) and mongos, which routes requests to several MongoD shards.]
• Sharding!
• High concurrency!
• Scale-out!
MongoDB Sharding
• Replica sets let us scale storage & transaction capacity for each shard
• Mongos routes transactions to shards based on the "shard key"
• Config servers store information about which shards exist
Inserting
1. The app sends insert { 'name': 'bob' } to mongos
2. Shard key == name, so 'bob' → Shard 2
3. Mongos forwards insert { 'name': 'bob' } to Shard 2
Querying
1. The app sends query { 'name': 'bob' } to mongos
2. Shard key == name, so 'bob' → Shard 2
3. Mongos forwards query { 'name': 'bob' } to Shard 2
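The insert and query flows can be modeled in a few lines of Ruby. MongoDB sharding at the time of this talk partitioned collections into chunks, each covering a range of shard-key values; the sketch below is a simplified model of that idea, not mongos's actual implementation. `Chunk`, `MongosModel`, and the chunk boundaries ("b" and "m", chosen so that "bob" lands on Shard 2) are hypothetical.

```ruby
# Hypothetical model of mongos routing on a range-partitioned shard key.
# Each chunk owns the shard-key values in [low, high) and lives on one shard;
# nil bounds stand in for MongoDB's MinKey / MaxKey.
Chunk = Struct.new(:low, :high, :shard) do
  def cover?(value)
    (low.nil? || value >= low) && (high.nil? || value < high)
  end
end

class MongosModel
  def initialize(shard_key, chunks)
    @shard_key = shard_key
    @chunks = chunks # in MongoDB this metadata lives on the config servers
  end

  # Inserts: the document's shard-key value selects exactly one shard.
  def route_insert(doc)
    shard_for(doc.fetch(@shard_key))
  end

  # Queries: target one shard if the filter includes the shard key,
  # otherwise broadcast to every shard.
  def route_query(filter)
    return [shard_for(filter[@shard_key])] if filter.key?(@shard_key)

    @chunks.map(&:shard).uniq
  end

  private

  def shard_for(value)
    @chunks.find { |chunk| chunk.cover?(value) }.shard
  end
end

# Illustrative chunk boundaries chosen so that "bob" lands on Shard 2.
mongos = MongosModel.new("name", [
  Chunk.new(nil, "b", "shard-1"),
  Chunk.new("b", "m", "shard-2"),
  Chunk.new("m", nil, "shard-3")
])

p mongos.route_insert({ "name" => "bob" }) # one shard
p mongos.route_query({ "name" => "bob" })  # one shard
p mongos.route_query({ "age" => 30 })      # all shards
```

A query whose filter includes the shard key touches one shard; any other query is broadcast to all of them, which is why the choice of shard key matters.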
Map Reduce
1. The app sends map-reduce(…) to mongos
2. Mongos runs the map-reduce on every shard
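Step 2 fans the job out to every shard. A pure-Ruby simulation of that scatter/gather pattern follows: each shard maps and reduces its own events, then the per-shard partial results are combined with the same reduce function. The data, method names, and this exact merge strategy are illustrative assumptions, not mongos internals.

```ruby
# Hypothetical scatter/gather simulation of a map-reduce over a sharded
# collection: count events per data source.

# Map: one (key, value) pair per event.
def map_event(event)
  [[event[:source], 1]]
end

# Reduce: sum the counts for one key. Summing is associative and
# commutative, so it can also combine partial results.
def reduce_counts(_key, values)
  values.sum
end

# Group values by key, then reduce each group.
def reduce_grouped(pairs)
  grouped = Hash.new { |hash, key| hash[key] = [] }
  pairs.each { |key, value| grouped[key] << value }
  grouped.to_h { |key, values| [key, reduce_counts(key, values)] }
end

# What each shard computes over its own documents (step 2).
def map_reduce_on_shard(events)
  reduce_grouped(events.flat_map { |event| map_event(event) })
end

# Router side: gather every shard's partial result and reduce again.
def map_reduce_sharded(shards)
  partials = shards.map { |events| map_reduce_on_shard(events) }
  reduce_grouped(partials.flat_map(&:to_a))
end

shards = [
  [{ source: "site-1" }, { source: "site-2" }],
  [{ source: "site-1" }],
  [{ source: "site-2" }, { source: "site-2" }, { source: "site-3" }]
]
p map_reduce_sharded(shards)
```

This only works because the reduce step (a sum) is associative and commutative, so reducing partial results gives the same answer as reducing everything at once.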
Working with Mongo
• MongoMapper makes it look like ActiveRecord

• Documents are more natural than rows in many cases

• Map-Reduce rocks (but needs better support in Rails)
(Photo: http://www.flickr.com/photos/elhamalawy/2526783078/)
Ruby / Mongo
[Code screenshot: a map-reduce job, annotated as follows]
• Map: runs over all the objects in the views table, counting ho…
• Reduce: adds up all the counts for a unique url /
• Run the map reduce job and return a collection
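The job described by these annotations can be mirrored in plain Ruby. `MiniMapReduce` is a hypothetical stand-in that follows MongoDB's map-reduce contract (map calls `emit(key, value)`, reduce combines the values for one key, a key with a single value is not reduced, and results come back as `{ _id, value }` documents); in MongoDB itself the map and reduce functions are written in JavaScript. The `url` field name is an assumption based on the annotations.

```ruby
# Hypothetical Ruby stand-in for MongoDB's map-reduce contract.
# (MongoDB itself takes map and reduce as JavaScript functions.)
class MiniMapReduce
  def initialize(map_fn, reduce_fn)
    @map_fn = map_fn
    @reduce_fn = reduce_fn
  end

  # Runs map over every document, reduces each key's values, and returns
  # result documents shaped like MongoDB's output: { "_id" => key, "value" => v }.
  def run(documents)
    emitted = Hash.new { |hash, key| hash[key] = [] }
    emit = ->(key, value) { emitted[key] << value }
    documents.each { |doc| @map_fn.call(doc, emit) }

    emitted.map do |key, values|
      # Like MongoDB, skip reduce when a key has a single emitted value.
      value = values.size == 1 ? values.first : @reduce_fn.call(key, values)
      { "_id" => key, "value" => value }
    end
  end
end

# Map: emit one view for the document's url (field name assumed).
count_views = ->(view, emit) { emit.call(view["url"], { "count" => 1 }) }

# Reduce: add up all the counts for one url; the output has the same
# shape as the emitted values, so it can be reduced again.
sum_counts = ->(_url, values) { { "count" => values.sum { |v| v["count"] } } }

views = [{ "url" => "/home" }, { "url" => "/about" }, { "url" => "/home" }]
p MiniMapReduce.new(count_views, sum_counts).run(views)
```

Because MongoDB may call reduce again on its own output, the reduce function must return a value shaped like the emitted values (`{ "count" => n }` here), not a bare number.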


Results
• Version 1 of our analytics system took 2 weeks with 1 engineer
– We have since added a lot more complexity, but we did it incrementally

• We replaced MySQL entirely with MongoDB
– No need for joins or transactions
– Every table is now a document collection

• It's fast!
– 63 ms: average response time for sending data to the server
– 93 ms: average response time for displaying reports
