Who Am I?
Ruby developer at Scribd.com
Lazy blogger at kpumuk.info
Experienced in Ruby & Rails, ASP.NET, MySQL, Sphinx, etc.
Author of several projects
Sphinx Ruby API maintainer
What Is Scribd.com?
Social document sharing
The largest Rails site on the Net
65th place on Quantcast (ahead of Digg)
53.5M visitors, 178M page views
10.5M users, 14M documents, over 1 PB
15 app, 17 DB, 7 search, 3 web, and 4 proxy boxes
Online Viewer
Groups
Partners
Desktop Uploader
Nginx
Delivers static content
Handles file uploads
Selects the app cluster (main, API, etc.)
Forwards document page requests to Squid
Forwards all other requests to HAProxy
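The routing decisions on this slide can be modeled as a single function. This is a minimal Python sketch, not Nginx configuration: the URL prefixes, the cluster names, and the rule that only GET requests for document pages are sent to Squid are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical URL prefixes: the slides do not list Scribd's real routes.
API_PREFIX = "/api/"
DOC_PAGE_PREFIX = "/doc/"
STATIC_PREFIXES = ("/images/", "/javascripts/", "/stylesheets/")


@dataclass(frozen=True)
class Route:
    cluster: str   # application cluster, e.g. "main" or "api"
    upstream: str  # "static" (served by Nginx itself), "squid" or "haproxy"


def route(method: str, path: str) -> Route:
    """Decide where a front-end Nginx box would send a request (sketch)."""
    cluster = "api" if path.startswith(API_PREFIX) else "main"
    if method == "GET" and path.startswith(STATIC_PREFIXES):
        return Route(cluster, "static")   # static files never reach Rails
    if method == "GET" and path.startswith(DOC_PAGE_PREFIX):
        return Route(cluster, "squid")    # cacheable document pages
    return Route(cluster, "haproxy")      # everything else goes to the app servers
```

For example, `route("GET", "/doc/123/some-title")` returns `Route("main", "squid")`, while a POST to the same path goes straight to HAProxy because writes should not be served from a cache.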
HAProxy
Performs load balancing among application servers
That's all: as easy as pie :-)
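The slide does not say which balancing algorithm is configured. HAProxy offers several (for example `roundrobin` and `leastconn`); the sketch below assumes plain round-robin over a list of hypothetical app server names, skipping servers that a health check has marked as down.

```python
class RoundRobinBalancer:
    """Hands out backends in rotation, skipping ones marked as down (sketch)."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._index = 0
        self._down = set()

    def mark_down(self, backend):
        self._down.add(backend)      # e.g. after a failed health check

    def mark_up(self, backend):
        self._down.discard(backend)

    def next_backend(self):
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self._backends)):
            backend = self._backends[self._index]
            self._index = (self._index + 1) % len(self._backends)
            if backend not in self._down:
                return backend
        raise RuntimeError("no healthy backends")
```

With `["app1", "app2", "app3"]` the balancer returns app1, app2, app3, app1, ...; once app2 is marked down, requests alternate between app3 and app1.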
Squid
Caches all document pages for bots and anonymous users
Forwards requests to HAProxy
Allows clearing the whole cache gracefully
Clears individual cached pages on request (HTCP)
Handles 90% of Scribd traffic!
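The caching policy can be sketched as two pieces: a rule deciding which requests may be served from the cache, and a cache that supports purging one URL (the job an HTCP CLR message does for Squid) and clearing everything at once. This is a Python model under stated assumptions, not Squid configuration: the bot markers, the session cookie name, and the generation-counter trick for a "graceful" full clear are all hypothetical, since the slides do not describe how Scribd implemented them.

```python
BOT_MARKERS = ("googlebot", "bingbot", "slurp")  # hypothetical, incomplete list
SESSION_COOKIE = "_session_id"                   # hypothetical cookie name
DOC_PAGE_PREFIX = "/doc/"                        # hypothetical document URL prefix


def is_cacheable(path, user_agent, cookies):
    """Serve document pages from cache for bots and anonymous users only."""
    if not path.startswith(DOC_PAGE_PREFIX):
        return False
    ua = user_agent.lower()
    if any(marker in ua for marker in BOT_MARKERS):
        return True
    return SESSION_COOKIE not in cookies  # no session means anonymous


class PageCache:
    """In-memory stand-in for the page cache (sketch)."""

    def __init__(self):
        self._store = {}
        self._generation = 0

    def _key(self, url):
        return (self._generation, url)

    def get(self, url):
        return self._store.get(self._key(url))

    def put(self, url, body):
        self._store[self._key(url)] = body

    def purge(self, url):
        # Drop a single page, as an HTCP CLR request would.
        self._store.pop(self._key(url), None)

    def clear_all(self):
        # "Graceful" clear: bump the generation so old entries are simply
        # never looked up again, instead of deleting everything at once.
        # Stale entries stay in this dict; a real cache would evict them.
        self._generation += 1
```

Logged-in users bypass the cache entirely in this model, which is one way a site can serve the bulk of its traffic from cache while keeping personalized pages fresh.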
MySQL
All writes go to the master
Almost all reads go to slaves
Document texts are in a separate, sharded DB
All tables use InnoDB
MySQL 5.0 / 5.1 with Percona patches
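The master/slave split and the separate text database can be sketched as a small connection router. The connection names, the `needs_fresh` flag, and the modulo sharding rule are assumptions for illustration; the slides only say that texts are sharded, not how. One common reason for keeping "almost all" rather than all reads on slaves is replication lag: a read that must see a write made moments ago is safer on the master.

```python
import random


class ConnectionRouter:
    """Picks a database connection name for each query (sketch)."""

    def __init__(self, master, slaves, text_shards, rng=None):
        self.master = master
        self.slaves = list(slaves)
        self.text_shards = list(text_shards)
        self._rng = rng or random.Random()

    def for_write(self):
        return self.master  # all writes go to the master

    def for_read(self, needs_fresh=False):
        # Reads that must see the latest write (or no slaves at all) use the master.
        if needs_fresh or not self.slaves:
            return self.master
        return self._rng.choice(self.slaves)

    def for_document_text(self, document_id):
        # Hypothetical shard rule: document id modulo the number of text shards.
        return self.text_shards[document_id % len(self.text_shards)]
```

For example, with three text shards, document 7 lands on the second shard (`7 % 3 == 1`).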
Application Boxes
Apache + Passenger (Ruby on Rails)
Memcached
Monit (we are looking for another monitoring tool)
Used for browsing, private document search, extended site search, and API search
We use Google Custom Search Engine, but users can switch to the internal search engine
The index consists of many small date-based chunks for fast indexing
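Splitting the index into small date-based chunks means a new or changed document only forces a rebuild of the chunk for its date, not of the whole index; a search then queries all chunks together. The sketch below assumes one chunk per day and a hypothetical `documents_YYYYMMDD` naming scheme; the slides give neither the chunk size nor the names.

```python
from datetime import date, timedelta


def chunk_name(day):
    """Name of the (hypothetical) index chunk holding documents from `day`."""
    return f"documents_{day:%Y%m%d}"


def chunks_between(start, end):
    """All daily chunk names from `start` to `end`, inclusive."""
    days = (end - start).days
    return [chunk_name(start + timedelta(days=i)) for i in range(days + 1)]


def chunks_to_reindex(changed_days):
    """Only the chunks whose documents changed need rebuilding."""
    return sorted({chunk_name(d) for d in changed_days})
```

If today's uploads all fall on one date, `chunks_to_reindex` returns a single small chunk, which is what keeps indexing fast as the total collection grows.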
Amazon Services
Amazon S3 for images and documents
Each document is stored in several formats
Background task for document backups
Amazon EC2 for document conversion
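Storing each document in several formats means one document maps to several S3 objects, and a background backup task has to work out which of them are still missing. This sketch assumes a hypothetical format list and key layout; it only computes key names and does not call the S3 API.

```python
FORMATS = ("pdf", "txt", "swf")  # hypothetical list of stored formats


def s3_keys(document_id, formats=FORMATS):
    """Hypothetical S3 key layout: one object per format of a document."""
    return [f"documents/{document_id}/{document_id}.{fmt}" for fmt in formats]


def pending_backups(document_id, stored_keys, formats=FORMATS):
    """Keys a background backup job would still need to upload."""
    stored = set(stored_keys)
    return [key for key in s3_keys(document_id, formats) if key not in stored]
```

Running this check in a background job keeps slow uploads out of the request cycle, so a document can become visible before every format has been backed up.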
Contacts
Email: kpumuk@kpumuk.info
Blog: http://kpumuk.info/
Twitter: http://twitter.com/kpumuk
Github: http://github.com/kpumuk
Scribd: http://www.scribd.com/kpumuk
We Are Hiring
Advanced knowledge of Ruby on Rails
Java, .NET, and Python gurus are welcome
Experience with heavily loaded apps
Deep understanding of MySQL and RDBMS
Ability to use Google and... Google, etc.