
Scaling with SkyTools & More

Scaling-Out Postgres with Skype's Open-Source Toolset

Gavin M. Roy
September 14th, 2011

About Me
PostgreSQL since ~6.5
CTO @myYearbook.com
Scaled initial infrastructure
Not as involved in day-to-day database operations and development

Twitter: @Crad

Scaling?

Concurrency
[Chart: requests per second, hourly breakdown from 6am to 6am]

Increasing Size-On-Disk
[Chart: size in GB by month, Jan through Dec]

Scaling and PostgreSQL Behavior

Size on Disk

Tuples, Indexes, Overhead

Table Size + Size of all combined Indexes

Relations

Indexes

Constraints

Available Memory
Disk Speed
IO Bus Speed

Keep it in memory.

Get Fast Disks & I/O.

Process Forking + Locks

Client Connections

One Connection per Concurrent Request

Apache+PHP
One connection per backend for each pg_connect

Python
One connection per connection*

ODBC
One connection to Postgres per ODBC connection

Lock Contention?
Each backend for a connected client has to check for locks

[Diagram: the Master Process with its Stats Collector, Autovacuum, and WAL Writer processes, and one Connection Backend serving one Client Connection]

New Client Connection?

Lock modes: Access Share, Row Share, Row Exclusive, Share Update Exclusive, Share, Share Row Exclusive, Exclusive, Access Exclusive

[Diagram: the Master Process, Stats Collector, Autovacuum, and WAL Writer, now with two Connection Backends, one per Client Connection]

Too many connections?

Slow performance

[Diagram: the Master Process, Stats Collector, Autovacuum, and WAL Writer, with a Connection Backend for every Client Connection, and so on]

250 Apache Backends x 1 Connection per Backend x 250 Servers = 62,500 Connections
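The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch using the slide's own figures; the variable names are illustrative and not from the talk.

```python
# Figures taken from the slide; the variable names are illustrative only.
apache_backends_per_server = 250
connections_per_backend = 1
web_servers = 250

# Without pooling, every Apache backend holds its own Postgres connection.
total_connections = (
    apache_backends_per_server * connections_per_backend * web_servers
)
print(f"{total_connections:,} connections")
```

Each of those connections would be a separate Postgres backend process, which is the problem the pooling slides address.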

Solvable Problems!

The Trailblazers

Solving Concurrency

pgBouncer

Session Pooling

Transactional Pooling

Statement Pooling

Connection Pooling
[Diagram: three groups of clients, each with hundreds of connections, connect to one pgBouncer, which holds tens of connections to each of Postgres Server #1, #2, and #3]

Add Local Pooling
[Diagram: three groups of clients, each with hundreds of connections to a local pgBouncer; each local pgBouncer holds tens of connections to a central pgBouncer, which holds tens of connections to each of Postgres Server #1, #2, and #3]

Easy to run
Usage: pgbouncer [OPTION]... config.ini
  -d, --daemon           Run in background (as a daemon)
  -R, --restart          Do a online restart
  -q, --quiet            Run quietly
  -v, --verbose          Increase verbosity
  -u, --user=<username>  Assume identity of <username>
  -V, --version          Show version
  -h, --help             Show this help screen and exit

userlist.txt
"username" "password"
"foo" "bar"

pgbouncer.ini

Specifying Connections
[databases]
; foodb over unix socket
foodb =
; redirect bardb to bazdb on localhost
bardb = host=localhost dbname=bazdb
; access to dest database will go with single user
forcedb = host=127.0.0.1 port=300 user=baz password=foo client_encoding=UNICODE datestyle=ISO connect_query='SELECT 1'

Base Daemon Config

[pgbouncer]
logfile = pgbouncer.log
pidfile = pgbouncer.pid
; ip address or * which means all ip-s
listen_addr = 127.0.0.1
listen_port = 6432
; unix socket is also used for -R.
;unix_socket_dir = /tmp

Authentication
; any, trust, plain, crypt, md5
auth_type = trust
#auth_file = 8.0/main/global/pg_auth
auth_file = etc/userlist.txt
admin_users = user2, someadmin, otheradmin
stats_users = stats, root

Stats Users?
SHOW HELP|CONFIG|DATABASES|POOLS|CLIENTS|SERVERS|VERSION
SHOW FDS|SOCKETS|ACTIVE_SOCKETS|LISTS|MEM

pgbouncer=# SHOW CLIENTS;
 type | user  | database  | state  | addr      | port  | local_addr | local_port | connect_time
------+-------+-----------+--------+-----------+-------+------------+------------+---------------------
 C    | stats | pgbouncer | active | 127.0.0.1 | 47229 | 127.0.0.1  | 6000       | 2011-09-13 17:55:46
* Truncated columns for display purposes

psql 9.0+ Problem?


psql -U stats -p 6432 pgbouncer
psql: ERROR: Unknown startup parameter

Add to pgbouncer.ini:
ignore_startup_parameters = application_name

Pooling Behavior
pool_mode = statement
server_check_query = select 1
server_check_delay = 10
max_client_conn = 1000
default_pool_size = 20
server_connect_timeout = 15
server_lifetime = 1200
server_idle_timeout = 60
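To get a feel for what these numbers imply, the settings can be read with Python's standard configparser and turned into a rough client-to-server ratio. This is a sketch only: the helper name pool_summary is made up for illustration, and the ratio assumes a single pool, since pgBouncer applies default_pool_size per user/database pair.

```python
import configparser

# Settings copied from the slide, placed under the [pgbouncer] section.
PGBOUNCER_INI = """
[pgbouncer]
pool_mode = statement
server_check_query = select 1
server_check_delay = 10
max_client_conn = 1000
default_pool_size = 20
server_connect_timeout = 15
server_lifetime = 1200
server_idle_timeout = 60
"""


def pool_summary(ini_text):
    """Hypothetical helper: summarize the client/server connection limits."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["pgbouncer"]
    max_clients = section.getint("max_client_conn")
    pool_size = section.getint("default_pool_size")
    return {
        "pool_mode": section["pool_mode"],
        "max_client_conn": max_clients,
        "default_pool_size": pool_size,
        # Assumes one user/database pair, i.e. a single pool.
        "clients_per_server_conn": max_clients / pool_size,
    }


summary = pool_summary(PGBOUNCER_INI)
print(summary)
```

With these values, up to 1000 client connections would share 20 server connections in one pool, so the ratio only holds if clients spend most of their time idle between statements.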

Skytools

Scale-Out Reads
[Diagram: clients connect through pgBouncer and a load balancer to four read-only copies of the canonical database]

PGQ

The Ticker

ticker.ini
[pgqadm]
job_name = pgopen_ticker
db = dbname=pgopen
# how often to run maintenance [seconds]
maint_delay = 600
# how often to check for activity [seconds]
loop_delay = 0.1
logfile = ~/Source/pgopen_skytools/%(job_name)s.log
pidfile = ~/Source/pgopen_skytools/%(job_name)s.pid

Getting PGQ Running


Setup our ticker:
pgqadm.py ticker.ini install

Run the ticker daemon:


pgqadm.py ticker.ini ticker -d

Londiste

replication.ini
[londiste]
job_name = pgopen_to_destination
provider_db = dbname=pgopen
subscriber_db = dbname=destination
# it will be used as sql ident so no dots/spaces
pgq_queue_name = pgopen
logfile = ~/Source/pgopen_skytools/%(job_name)s.log
pidfile = ~/Source/pgopen_skytools/%(job_name)s.pid

Install Londiste
londiste.py replication.ini provider install
londiste.py replication.ini subscriber install

Start Replication Daemon

londiste.py replication.ini replay -d

DDL?

Add the Provider Tables and Sequences


londiste.py replication.ini provider add public.auth_user

Add the Subscriber Tables and Sequences


londiste.py replication.ini subscriber add public.auth_user

Great Success!

PL/Proxy

Scale-Out Reads & Writes

plProxy Server

A-F Server

G-L Server

M-R Server

S-Z Server
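The letter ranges on this slide suggest range-based routing. A minimal sketch of that idea follows, assuming the shard is chosen by the first letter of a username; the slide only shows the server labels, so the routing key and the function name server_for are assumptions.

```python
import bisect

# Inclusive upper bound of each letter range, paired with the slide's labels.
RANGE_UPPER_BOUNDS = ["F", "L", "R", "Z"]
SERVERS = ["A-F Server", "G-L Server", "M-R Server", "S-Z Server"]


def server_for(username):
    """Hypothetical router: pick a server by the first letter of the name."""
    first = username[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"cannot route {username!r}: no A-Z first letter")
    # bisect_left finds the first range whose upper bound is >= the letter.
    return SERVERS[bisect.bisect_left(RANGE_UPPER_BOUNDS, first)]


for name in ("alice", "gavin", "mallory", "zed"):
    print(name, "->", server_for(name))
```

Range routing is easy to reason about but can leave shards unevenly loaded, which is one reason to route on a hash of the key instead.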

How does it work?

Simple Remote Connection


CREATE FUNCTION get_user_email(username text)
RETURNS SETOF text AS $$
    CONNECT 'dbname=remotedb';
    SELECT email FROM users WHERE username = $1;
$$ LANGUAGE plproxy;

Sharded Request
CREATE FUNCTION get_user_email(username text)
RETURNS SETOF text AS $$
    CLUSTER 'usercluster';
    RUN ON hashtext(username);
$$ LANGUAGE plproxy;
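RUN ON takes an integer hash and PL/Proxy uses it to choose one partition; the partition count has to be a power of two so the low bits of the hash can serve as the index. The sketch below illustrates that mapping in Python. It is not PL/Proxy's code: zlib.crc32 stands in for PostgreSQL's hashtext(), so the partition chosen here will not match what a real cluster picks, and pick_partition is a made-up name.

```python
import zlib


def pick_partition(key, partition_count):
    """Hypothetical helper: map a key to a partition index by masking a hash."""
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    if partition_count < 1 or partition_count & (partition_count - 1):
        raise ValueError("partition count must be a power of two")
    # zlib.crc32 is a stand-in hash, NOT PostgreSQL's hashtext().
    hashed = zlib.crc32(key.encode("utf-8"))
    # Keep only the low bits of the hash as the partition index.
    return hashed & (partition_count - 1)


# Two partitions, using the connection strings from the talk's cluster setup.
partitions = [
    "dbname=part00 host=127.0.0.1",
    "dbname=part01 host=127.0.0.1",
]
index = pick_partition("bob", len(partitions))
print(partitions[index])
```

Because the same key always hashes to the same index, every call for a given username lands on the same partition, as long as the partition list keeps its size and order.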

Sharding Setup
Need 3 Functions:

plproxy.get_cluster_partitions(cluster_name text)
plproxy.get_cluster_version(cluster_name text)
plproxy.get_cluster_config(in cluster_name text, out key text, out val text)

get_cluster_partitions
CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
RETURNS SETOF text AS $$
BEGIN
    IF cluster_name = 'usercluster' THEN
        RETURN NEXT 'dbname=part00 host=127.0.0.1';
        RETURN NEXT 'dbname=part01 host=127.0.0.1';
        RETURN;
    END IF;
    RAISE EXCEPTION 'Unknown cluster';
END;
$$ LANGUAGE plpgsql;

get_cluster_version
CREATE OR REPLACE FUNCTION plproxy.get_cluster_version(cluster_name text)
RETURNS int4 AS $$
BEGIN
    IF cluster_name = 'usercluster' THEN
        RETURN 1;
    END IF;
    RAISE EXCEPTION 'Unknown cluster';
END;
$$ LANGUAGE plpgsql;

get_cluster_config
CREATE OR REPLACE FUNCTION plproxy.get_cluster_config(
    in cluster_name text,
    out key text,
    out val text)
RETURNS SETOF record AS $$
BEGIN
    -- lets use same config for all clusters
    key := 'connection_lifetime';
    val := 30*60; -- 30m
    RETURN NEXT;
    RETURN;
END;
$$ LANGUAGE plpgsql;

get_cluster_config values
connection_lifetime
query_timeout
disable_binary
keepalive_idle
keepalive_interval
keepalive_count

SQL/MED

SQL/MED Cluster Definition

CREATE SERVER a_cluster FOREIGN DATA WRAPPER plproxy
OPTIONS (
    connection_lifetime '1800',
    disable_binary '1',
    p0 'dbname=part00 host=127.0.0.1',
    p1 'dbname=part01 host=127.0.0.1',
    p2 'dbname=part02 host=127.0.0.1',
    p3 'dbname=part03 host=127.0.0.1'
);

PL/Proxy + SQL/MED Behavior

PL/Proxy will prefer SQL/MED cluster definitions over the plproxy.get_* functions

PL/Proxy will fall back to the plproxy.get_* functions if there are no SQL/MED clusters

SQL/MED User Mapping


CREATE USER MAPPING FOR bob SERVER a_cluster
    OPTIONS (user 'bob', password 'secret');
CREATE USER MAPPING FOR public SERVER a_cluster
    OPTIONS (user 'plproxy', password 'foo');

plproxyrc
plpgsql-based API for table-based management of PL/Proxy

Used to manage complicated PL/Proxy infrastructure @myYearbook

BSD Licensed
https://github.com/myYearbook/plproxyrc

Server-to-Server
[Diagram: Postgres Server #1, #2, and #3 connect to one another through pgBouncer]

Complex PL/Proxy and pgBouncer Environment
[Diagram: clients with local pgBouncers, load balancers, pgBouncers, plProxy servers, and Postgres servers each fronted by a pgBouncer]

Other Tools and Methods?

Questions?
