
CouchDB A Database for the Web

Karel Minařík
Independent web designer and developer
Ruby, Rails, Git and CouchDB propagandista in .cz
Previously: Flash Developer; Art Director; Information Architect; (see LinkedIn)
@karmiq at Twitter

karmi.cz

$ couchdb
Apache CouchDB has started. Time to relax.

Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API.
http://wiki.apache.org/couchdb


Talk Outline

The NoSQL Moniker

The CouchDB Story

Schema-free Document Storage

HTTP From Top to Bottom

RESTful API

Querying With Map/reduce

Fault-tolerant, Distributed, Highly-available and Concurrent

Demo: Example Application (Address Book)


Project Voldemort

NOSQL

Reasons for NoSQL

NoSQL is neither a protest movement nor trendy bullshit.

Reasons for developing new databases are real.
Most stem from some real pain.

Database denormalization at Digg

SELECT `digdate`, `id` FROM `Diggs`
WHERE `userid` IN (1, 2, 3, 4, ... 1000000)
AND itemid = 123 ORDER BY `digdate` DESC, `id` DESC;

A full query can actually clock in at 1.5kb, which is many times larger than the actual data we want. With a cold cache, this query can take 14 seconds to execute.

Non-relational data stores reverse this model completely, because they don't have the complex read operations of SQL. The model forces you to shift your computation to the writes, while reducing most reads to simple operations, the equivalent of SELECT * FROM `Table`.

http://about.digg.com/blog/looking-future-cassandra

Redis: Big O Notation Built-In

redis> rpush mylist 1
redis> rpush mylist 2
redis> lpop mylist
"1"
...
redis> llen mylist
(integer) 1000000
redis> lpop mylist
"2"

$ redis-benchmark
...
====== LPOP ======
10025 requests completed in 0.53 seconds
...
93.43% <= 3 milliseconds

Use Case: Job Queue

RPUSH

LPOP

O(1)

Millions of items

http://github.com/defunkt/resque/blob/master/lib/resque.rb#L133-138

The CouchDB Story

Damien Katz: CouchDB and Me

Damien Katz
(RubyFringe 2008)

http://www.infoq.com/presentations/katz-couchdb-and-me

In the beginning, there was C++, XML and a custom query language.
Stuff nobody ever got fired for.
Then came Erlang, HTTP, JSON and map/reduce.

Schema-free Documents

Relational Data

OH: The world is relational!!!


17 minutes ago via Tweetie for Mac
Retweeted by 10000 people

That does not mean the world conforms to the third normal form.

In fact, it's rather the exact opposite.

The Textbook Example

Design a customer database.

People have names, e-mail, phone numbers, ...

How many phone numbers?

Relational Databases 101

Customers
  id          INTEGER   A N P
  first_name  VARCHAR
  last_name   VARCHAR
  phone       VARCHAR

Now. What about multiple phone numbers?

http://en.wikipedia.org/wiki/First_normal_form#Domains_and_values

The solution, Pt. 1

Customers
  id          INTEGER   A N P
  first_name  VARCHAR
  last_name   VARCHAR
  phone       VARCHAR

We will use the database only from the application, anyway.

The solution, Pt. 2

Customers
  id          INTEGER   A N P
  first_name  VARCHAR
  last_name   VARCHAR
  phone_1     VARCHAR
  phone_2     VARCHAR
  phone_3     VARCHAR

This is clearly better design!

Alright. Then, please answer these questions:
How do you search for a customer given a phone number?
Which customers have the same phone number?
How many phone numbers does a customer have?
Then, please add the ability to store four phone numbers. Thanks.

The Right Solution

Customers
  id           INTEGER   U A N P
  first_name   VARCHAR
  last_name    VARCHAR

CustomerPhones
  customer_id  INTEGER   N F
  phone        VARCHAR   N

mysql> SELECT * FROM Customers LEFT JOIN CustomerPhones
    -> ON Customers.id = CustomerPhones.customer_id;
+----+------------+-----------+-------------+-------+
| id | first_name | last_name | customer_id | phone |
+----+------------+-----------+-------------+-------+
|  1 | John       | Smith     |           1 | 123   |
|  1 | John       | Smith     |           1 | 456   |
+----+------------+-----------+-------------+-------+

mysql> SELECT * FROM Customers WHERE id = 1;
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
|  1 | John       | Smith     |
+----+------------+-----------+

mysql> SELECT phone FROM CustomerPhones WHERE customer_id IN (1);
+-------+
| phone |
+-------+
| 123   |
| 456   |
+-------+

Structured data

But, damn!, I want something like this:

{
  "id": 1,
  "first_name": "Clara",
  "last_name": "Rice",
  "phones": ["0000777888999", "0000123456789", "0000314181116"]
}

No problem, you just iterate over the rows and build your object. That's the way it is!
If this would be too painful, we will put some cache there.
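
"Iterate over the rows and build your object" could look roughly like the sketch below. It assumes rows shaped like the LEFT JOIN result shown earlier; the function name `buildCustomers` and the plain-object row format are illustrative, not taken from any particular database driver.

```javascript
// Rows as a driver might return them for the LEFT JOIN (one row per phone number).
const rows = [
  { id: 1, first_name: "John", last_name: "Smith", customer_id: 1, phone: "123" },
  { id: 1, first_name: "John", last_name: "Smith", customer_id: 1, phone: "456" },
];

// Fold the flat rows into one object per customer, collecting phones in an array.
function buildCustomers(rows) {
  const byId = new Map();
  for (const row of rows) {
    if (!byId.has(row.id)) {
      byId.set(row.id, {
        id: row.id,
        first_name: row.first_name,
        last_name: row.last_name,
        phones: [],
      });
    }
    // A LEFT JOIN yields a NULL phone for customers without any phone number.
    if (row.phone !== null && row.phone !== undefined) {
      byId.get(row.id).phones.push(row.phone);
    }
  }
  return Array.from(byId.values());
}

const customers = buildCustomers(rows);
console.log(JSON.stringify(customers));
```

This is exactly the translation layer a document database lets you skip: the nested object is what gets stored.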

Ephemeral data

Not everything needs to be done right. Right?

class User < ActiveRecord::Base
  serialize :preferences
end

Consistency

Does the Right Way sometimes fail?

Hell yeah.

EXAMPLE

When designing an invoicing application, you store the customer for the invoice the right way, via foreign keys.
Then, the customer address changes.
Did the address on the invoice also change?
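
One way a document model sidesteps this is to copy the address into the invoice when it is issued. The sketch below is only an illustration: the `createInvoice` helper and the field names (`billing_address`, `customer_id`) are hypothetical, not part of the talk.

```javascript
// Hypothetical customer document; field names are illustrative only.
const customer = {
  _id: "clara-rice",
  first_name: "Clara",
  last_name: "Rice",
  address: { street: "Wintheiser Ports", number: "789/23", city: "Erinshire" },
};

// Build an invoice document that carries its own snapshot of the address.
function createInvoice(customer, number, total) {
  return {
    _id: "invoice-" + number,
    customer_id: customer._id, // reference kept for lookups
    // Deep copy, so later edits to the customer do not leak into the invoice.
    billing_address: JSON.parse(JSON.stringify(customer.address)),
    total: total,
  };
}

const invoice = createInvoice(customer, "2010-001", 100);

// The customer moves; the issued invoice keeps the address it was issued with.
customer.address.city = "Murphyville";
console.log(invoice.billing_address.city); // prints "Erinshire"
```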
Documents in the Real World

[Image: business cards from several companies, each listing a different mix of phone, fax, e-mail and address details]

http://guide.couchdb.org/draft/why.html#better

{
  "_id"        : "clara-rice",
  "_rev"       : "1-def456",

  "first_name" : "Clara",
  "last_name"  : "Rice",

  "phones"     : {
    "mobile"   : "0000 777 888 999",
    "home"     : "0000 123 456 789",
    "work"     : "0000 314 181 116"
  },

  "addresses"  : {
    "home"     : {
      "street"  : "Wintheiser Ports",
      "number"  : "789/23",
      "city"    : "Erinshire",
      "country" : "United Kingdom"
    }
  },

  "occupation" : "model",
  "birthday"   : "1970/05/01",
  "groups"     : ["friends", "models"],

  "created_at" : "2010/01/01 10:00:00 +0000"
}

Procrustean Bed

RESTful HTTP


1990s

HTTP

Built "Of the Web"

Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP.

CouchDB makes Django look old-school in the same way that Django makes ASP look outdated.

HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS-agnostic.

Jacob Kaplan-Moss, Of the Web (2007)
http://jacobian.org/writing/of-the-web/

HTTP API

HOST=http://localhost:5984

curl -X GET $HOST
# {"couchdb":"Welcome","version":"0.11.0b22c551bb-git"}

curl -X GET $HOST/mydatabase
# {"error":"not_found","reason":"no_db_file"}

curl -X PUT $HOST/mydatabase
# {"ok":true}

curl -X PUT $HOST/mydatabase/abc123 -d '{"foo":"bar"}'
# {"ok":true,"id":"abc123","rev":"1-4c6114c65e295552ab1019e2b046b10e"}

curl -X GET $HOST/mydatabase/abc123
# {"_id":"abc123","_rev":"1-4c6114c65e295552ab1019e2b046b10e","foo":"bar"}

curl -X DELETE "$HOST/mydatabase/abc123?rev=1-4c6114c65e295552ab1019e2b046b10e"
# {"ok":true,"id":"abc123","rev":"2-d179f665eb01834faf192153dc72dcb3"}
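
The `rev` values in these responses are what CouchDB checks on every update: a write must name the revision it is based on, otherwise it is rejected with a conflict. The toy in-memory store below only illustrates that rule. `ToyStore` is a made-up class, and the MD5-of-JSON digest is a stand-in for however CouchDB actually derives the hash part of a revision.

```javascript
const crypto = require("crypto");

// A toy document store that mimics CouchDB's "update needs the current _rev" rule.
class ToyStore {
  constructor() {
    this.docs = new Map();
  }

  put(id, body) {
    const current = this.docs.get(id);
    // Updating an existing document requires the revision it is based on.
    if (current && current._rev !== body._rev) {
      return { status: 409, error: "conflict" };
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const { _rev, ...fields } = body; // the incoming _rev is not stored as content
    const digest = crypto
      .createHash("md5")
      .update(JSON.stringify(fields))
      .digest("hex"); // stand-in hash, not CouchDB's real algorithm
    const rev = generation + "-" + digest;
    this.docs.set(id, { ...fields, _id: id, _rev: rev });
    return { status: 201, ok: true, id: id, rev: rev };
  }

  get(id) {
    return this.docs.get(id) || { status: 404, error: "not_found" };
  }
}

const store = new ToyStore();
const first = store.put("abc123", { foo: "bar" });                    // created
const stale = store.put("abc123", { foo: "baz" });                    // no _rev given
const second = store.put("abc123", { foo: "baz", _rev: first.rev });  // based on rev 1
console.log(first.status, stale.status, second.rev.split("-")[0]);
```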

Easy To Wrap

require 'rubygems'
require 'ostruct'
require 'restclient'   # 1 HTTP library
require 'json'         # 2 JSON library

class Article < OpenStruct
  def self.db(path='')
    RestClient::Resource.new "http://localhost:5984/blog/#{path}",
                             :headers => { :content_type => :json, :accept => :json }
  end

  # Create the database; ignore the error when it already exists.
  db.put '' rescue RestClient::PreconditionFailed

  def self.create(params={})
    # Parse the JSON response (ok, id, rev) into a hash for OpenStruct.
    new JSON.parse(db.post(params.to_json))
  end

  def self.find(id)
    new JSON.parse(db(id).get)
  end

  def destroy
    self.class.db(self._id + "?rev=#{self._rev}").delete
  end
end

Article.create :_id   => 'myfirstpost',
               :title => 'CouchDB is easy',
               :body  => 'So relax!',
               :tags  => ['couchdb', 'databases'] rescue RestClient::Conflict

article = Article.find('myfirstpost')

puts "Got an article:"
p article
puts "\n"
puts "Title: %s" % article.title + " (class: #{article.title.class})"
puts "Tags: %s" % article.tags.inspect + " (class: #{article.tags.class})"
puts "\n\n"
puts "Deleting article..."
article.destroy

HTTP from Top to Bottom

$ curl -X POST http://localhost:5984/_replicate \
  -d '{"source":"database",
       "target":"http://example.org/database"}'

Making Real Use of HTTP

$ curl -i -X GET $HOST/mydatabase/abc123
HTTP/1.1 200 OK
Server: CouchDB/1.0.1 (Erlang OTP/R14B)
Etag: "4-f04f2435e031054d6b5298c5841ae052"
Date: Thu, 23 Sep 2010 12:56:37 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 73
Cache-Control: must-revalidate

{"_id":"abc123","_rev":"4-f04f2435e031054d6b5298c5841ae052","foo":"bar"}
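
Because the Etag is simply the document revision in quotes, any HTTP cache can revalidate a document by sending it back in `If-None-Match`. The function below is a sketch of that decision only; `respond` is a hypothetical name and this is not CouchDB's own code.

```javascript
// Decide between "200 with body" and "304 Not Modified" from the request headers.
function respond(doc, requestHeaders) {
  const etag = '"' + doc._rev + '"'; // the ETag is the quoted revision
  if (requestHeaders["if-none-match"] === etag) {
    // The client already holds this revision; no body needs to be sent.
    return { status: 304, headers: { Etag: etag }, body: "" };
  }
  return { status: 200, headers: { Etag: etag }, body: JSON.stringify(doc) };
}

const doc = { _id: "abc123", _rev: "4-f04f2435e031054d6b5298c5841ae052", foo: "bar" };

const fresh = respond(doc, {});
const revalidated = respond(doc, { "if-none-match": fresh.headers.Etag });
console.log(fresh.status, revalidated.status);
```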

$ cat /etc/squid3/squid.conf
cache_peer 192.168.100.2 parent 5984 0 no-query originserver name=master
acl master_acl method GET POST PUT DELETE
cache_peer_access master allow master_acl

What is RESTful?

REST is a set of principles that define how Web standards, such as HTTP and
URIs, are supposed to be used. (...) In summary, the five key principles are:
Give every thing an ID
Link things together
Use standard methods
Resources with multiple representations
Communicate statelessly

Stefan Tilkov, A Brief Introduction to REST

http://www.infoq.com/articles/rest-introduction

The basic idea is even more simple, though.

HTTP is not just a transfer protocol.
It is the interface for interacting with things itself.

Fault-Tolerant and Concurrent

CouchDB has no off switch.
CouchDB has no repair command.

$ kill -9 <PID>

Erlang

Erlang!

http://www.youtube.com/watch?v=uKfKtXYLG78

Erlang's main strength is support for concurrency. It has a small but powerful set of primitives to create processes and communicate among them.
(...) a benchmark with 20 million processes has been successfully performed.

http://en.wikipedia.org/wiki/Erlang_(programming_language)

Append-Only B-Tree

http://guide.couchdb.org/draft/btree.html

Querying With Map/reduce

The Google Paper

http://labs.google.com/papers/mapreduce.html

The Concept

module Enumerable
  alias :reduce :inject unless method_defined? :reduce
end

(1..3).map { |number| number * 2 }
# => [2, 4, 6]

(1..3).reduce(0) { |sum, number| sum + number }
# => 6

The Simplest View

function(doc) {                                        // INPUT: a document
  if (doc.last_name && doc.first_name) {
    emit(doc.last_name + ' ' + doc.first_name, doc)    // OUTPUT: key, value
  }
}

The Result of Map

Key: "Armstrong Lottie"
Value:
  _id: "lottie-armstrong",
  _rev: "2-fcb71b26096957b3ff3ffd2970f3c933",
  addresses: {
    home: {
      city: "Murphyville"
      ...
    }
  },
  first_name: "Lottie",
  last_name: "Armstrong",
  occupation: "programmer",

Key: "Bailey Kaelyn"
Value:
  _id: "kaelyn-bailey",
  _rev: "1-2e25e6c9448520fa796988894423a23b",
  addresses: {
    home: {
      city: "Lake Dedric"
      ...
    }
  },
  first_name: "Kaelyn",
  last_name: "Bailey",
  occupation: "supermodel"

...
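
To see how the map function produces such key/value rows, it can be run outside CouchDB with a stand-in `emit`. The harness below is a rough sketch: `runView` is a made-up helper, the key is assumed to join the names with a space, and plain string comparison replaces CouchDB's real key collation.

```javascript
// Stand-in for the emit() that CouchDB provides to map functions.
let rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The simplest view: one row per document that has both names.
const map = function (doc) {
  if (doc.last_name && doc.first_name) {
    emit(doc.last_name + " " + doc.first_name, doc);
  }
};

// Run the map function over every document and return the rows sorted by key.
function runView(mapFn, docs) {
  rows = [];
  docs.forEach((doc) => {
    const before = rows.length;
    mapFn(doc);
    // Tag each emitted row with the id of the document it came from.
    for (let i = before; i < rows.length; i++) rows[i].id = doc._id;
  });
  // Simplified ordering; CouchDB's own collation rules are more elaborate.
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const docs = [
  { _id: "kaelyn-bailey", first_name: "Kaelyn", last_name: "Bailey", occupation: "supermodel" },
  { _id: "lottie-armstrong", first_name: "Lottie", last_name: "Armstrong", occupation: "programmer" },
  { _id: "no-name", occupation: "ghost" }, // skipped: has no names
];

const result = runView(map, docs);
console.log(result.map((row) => row.key));
```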


Even Simpler View

function(doc) {
emit(doc.occupation, 1);
}


Result of Even Simpler View

http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_occupation

A Simple Reduce

function(keys, values) {
  return sum(values)
}


Result of a Simple Reduce

http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_occupation

Built-In Erlang Reduce Functions

_count
_sum
_stats

http://wiki.apache.org/couchdb/Built-In_Reduce_Functions#Available_Build-In_Functions

Map/Reduce for Counting Tag-like Stuff

function(doc) {
  for (var group in doc.groups) {
    emit(doc.groups[group], 1)
  }
}

_count
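
What this map function plus `_count` computes can be approximated in a few lines of plain JavaScript. In this sketch `countByKey` is a hypothetical helper, and `emit` is passed in as a second argument instead of being a global as it is inside CouchDB.

```javascript
// Emit one row per group a person belongs to (emit is injected for this sketch).
const byGroup = (doc, emit) => {
  for (const group of doc.groups || []) {
    emit(group, 1);
  }
};

// Run the map function over all documents and count the rows per key,
// which is what the built-in _count reduce does when grouping by key.
function countByKey(docs, mapFn) {
  const counts = new Map();
  const emit = (key) => counts.set(key, (counts.get(key) || 0) + 1);
  docs.forEach((doc) => mapFn(doc, emit));
  return Array.from(counts.keys())
    .sort()
    .map((key) => ({ key: key, value: counts.get(key) }));
}

const people = [
  { _id: "clara-rice", groups: ["friends", "models"] },
  { _id: "lottie-armstrong", groups: ["friends"] },
  { _id: "kaelyn-bailey" }, // no groups at all
];

const groupCounts = countByKey(people, byGroup);
console.log(groupCounts);
```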


Result of the Map phase

http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_groups


Result of the Reduce Phase

http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_groups

Group Levels

function(doc) {
  var date = new Date(doc.birthday)
  // Composite key (array)
  emit([date.getFullYear(), date.getMonth() + 1, date.getDate()], 1)
}

_count
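
With an array key, `group_level` decides how many leading elements of the key are used for grouping. The sketch below imitates that for `_count` over already-mapped rows; `reduceWithGroupLevel` and the sample birthdays are invented for illustration.

```javascript
// Rows as the map function above would emit them: [year, month, day] keys.
const birthdayRows = [
  { key: [1970, 5, 1], value: 1 },
  { key: [1970, 5, 1], value: 1 },
  { key: [1970, 6, 15], value: 1 },
  { key: [1981, 2, 3], value: 1 },
];

// Count rows per key prefix; level 3 here corresponds to exact grouping.
function reduceWithGroupLevel(rows, level) {
  const groups = new Map();
  for (const row of rows) {
    const groupKey = level === 0 ? null : row.key.slice(0, level);
    const id = JSON.stringify(groupKey);
    if (!groups.has(id)) groups.set(id, { key: groupKey, value: 0 });
    groups.get(id).value += 1; // the _count part
  }
  return Array.from(groups.values());
}

console.log(JSON.stringify(reduceWithGroupLevel(birthdayRows, 3))); // per day
console.log(JSON.stringify(reduceWithGroupLevel(birthdayRows, 2))); // per month
console.log(JSON.stringify(reduceWithGroupLevel(birthdayRows, 1))); // per year
```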


Group Level Exact

http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_birthday


Group Level 2

http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_birthday


Group Level 1

http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_birthday


QUERYING VIEWS

Parameters for querying views


key
startkey
startkey_docid
endkey
endkey_docid
limit
stale
descending
skip
group
group_level
reduce
include_docs
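
These parameters go into the query string of the view URL, and the key-like ones (`key`, `startkey`, `endkey`) must be JSON-encoded. A small helper for building such URLs might look like this; `viewUrl` is a hypothetical function, and the database and view names are borrowed from the address book examples.

```javascript
// Build a view URL; key, startkey and endkey are sent as JSON values.
function viewUrl(host, db, designDoc, view, params = {}) {
  const jsonParams = new Set(["key", "startkey", "endkey"]);
  const query = Object.entries(params)
    .map(([name, value]) => {
      const encoded = jsonParams.has(name) ? JSON.stringify(value) : String(value);
      return encodeURIComponent(name) + "=" + encodeURIComponent(encoded);
    })
    .join("&");
  const path = `${host}/${db}/_design/${designDoc}/_view/${view}`;
  return query ? path + "?" + query : path;
}

// Everyone born in 1970, counted per month.
const url = viewUrl("http://localhost:5984", "addressbook", "person", "by_birthday", {
  startkey: [1970],
  endkey: [1970, {}],
  group_level: 2,
});
console.log(url);
```

The `{}` in `endkey` relies on objects sorting after numbers in CouchDB's key ordering, so the range covers every key that starts with 1970.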

A Complex Map/Reduce

SELECT
  COUNT(*) AS count,
  DATE_FORMAT(published_at, "%Y/%m/%d") AS date,
  keywords.value AS keyword
FROM feed_entries
INNER JOIN feeds ON feed_entries.feed_id = feeds.id
INNER JOIN keywords ON feeds.keyword_id = keywords.id
WHERE DATE_SUB(CURDATE(), INTERVAL 90 DAY) <= feed_entries.published_at
GROUP BY date, keyword
ORDER BY date, keyword ASC;

But. We don't need a table. We need the data in a format like this:

Streamgraph.load_data({
  max: 170,
  keywords: ['ruby', 'python', 'erlang', 'javascript', 'haskell'],
  values: [
    { date: '2010/01/01', ruby: 50,  python: 20, erlang: 5,  javascript: 30,  haskell: 50 },
    { date: '2010/02/01', ruby: 20,  python: 20, erlang: 2,  javascript: 40,  haskell: 43 },
    { date: '2010/03/01', ruby: 70,  python: 20, erlang: 10, javascript: 80,  haskell: 15 },
    { date: '2010/04/01', ruby: 20,  python: 40, erlang: 8,  javascript: 30,  haskell: 12 },
    { date: '2010/05/01', ruby: 150, python: 30, erlang: 12, javascript: 40,  haskell: 18 },
    { date: '2010/06/01', ruby: 30,  python: 10, erlang: 14, javascript: 170, haskell: 14 }
  ]
});

The Map Phase

function(doc) {
  var fix_date = function(junk) {
    var formatted = junk.toString().replace(/-/g, "/").replace("T", " ").substring(0, 19);
    return new Date(formatted);
  };
  // Format integers to have at least two digits.
  var f = function(n) { return n < 10 ? '0' + n : n; }
  // This is a format that collates in order and tends to work with
  // JavaScript's new Date(string) date parsing capabilities, unlike rfc3339.
  Date.prototype.toJSON = function() {
    return this.getUTCFullYear() + '/' +
           f(this.getUTCMonth() + 1) + '/' +
           f(this.getUTCDate()) + ' ' +
           f(this.getUTCHours()) + ':' +
           f(this.getUTCMinutes()) + ':' +
           f(this.getUTCSeconds()) + ' +0000';
  };
  if (doc['couchrest-type'] == 'Mention') {
    for (var keyword in doc.keywords) {
      var key = fix_date(doc.published_at).toJSON().substring(0, 10);
      var value = {};
      value[doc.keywords[keyword]] = 1;
      emit(key, value);
    }
  }
}

The Reduce Phase

function(keys, values, rereduce) {
  if (rereduce) {
    var result = {}
    for (var item in values) {
      for (var prop in values[item]) {
        if (result[prop]) { result[prop] += values[item][prop] }
        else { result[prop] = values[item][prop] }
      }
    }
    return result;
  }
  else {
    // Prepare the data for the rereduce
    var date = keys[0][0];
    var result = {}
    for (var value in values) {
      var item = values[value];
      for (var prop in item) {
        if (result[prop]) { result[prop] += item[prop] }
        else { result[prop] = item[prop] }
      }
    }
    return result;
  }
}
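
The `rereduce` flag exists because CouchDB may reduce the rows in chunks and then reduce those partial results again, so the final answer must not depend on how the rows were split. The sketch below checks that property for a condensed version of the merge logic; the sample values and the chunk boundaries are made up.

```javascript
// Condensed form of the reduce above: both branches merge per-keyword counts,
// so a single loop serves for reduce and rereduce alike.
function reduce(keys, values, rereduce) {
  const result = {};
  for (const item of values) {
    for (const prop in item) {
      result[prop] = (result[prop] || 0) + item[prop];
    }
  }
  return result;
}

// Values as emitted by the map phase for a single day.
const emitted = [{ ruby: 1 }, { python: 1 }, { ruby: 1 }, { ruby: 1 }, { python: 1 }];

// Reduce everything in one pass...
const allAtOnce = reduce(null, emitted, false);

// ...or reduce two chunks separately and then rereduce the partial results.
const partials = [
  reduce(null, emitted.slice(0, 2), false),
  reduce(null, emitted.slice(2), false),
];
const viaRereduce = reduce(null, partials, true);

console.log(allAtOnce, viaRereduce);
```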

The Result

$ curl "http://localhost:5984/customer_database/_design/Mention/_view/by_date_and_keyword?group=true"

{
  "rows": [
    {
      "key": "2010/09/22",
      "value": { "ruby": 8, "python": 19 }
    },
    {
      "key": "2010/09/23",
      "value": { "ruby": 24, "python": 12 }
    },
    {
      "key": "2010/09/24",
      "value": { "ruby": 7, "python": 8 }
    }
  ]
}
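
Turning these rows into the structure the Streamgraph expects then takes only a small client-side step. The function below is a guess at that glue code: `toStreamgraphData` is a hypothetical name, and `max` is assumed to mean the largest single count, as in the sample data shown earlier.

```javascript
// Reshape view rows into { max, keywords, values } for the chart.
function toStreamgraphData(rows) {
  const keywords = [];
  let max = 0;
  const values = rows.map((row) => {
    const entry = { date: row.key };
    for (const [keyword, count] of Object.entries(row.value)) {
      if (!keywords.includes(keyword)) keywords.push(keyword);
      entry[keyword] = count;
      if (count > max) max = count;
    }
    return entry;
  });
  return { max: max, keywords: keywords, values: values };
}

// The rows from the view response above.
const viewRows = [
  { key: "2010/09/22", value: { ruby: 8, python: 19 } },
  { key: "2010/09/23", value: { ruby: 24, python: 12 } },
  { key: "2010/09/24", value: { ruby: 7, python: 8 } },
];

const chartData = toStreamgraphData(viewRows);
console.log(JSON.stringify(chartData));
```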

I ♥ JS. Or... don't?

Complex Queries

So what if you need something like:

Show me all supermodels who live in Beckerborough.

Out of luck?

CouchDB-Lucene

This guy knows.

Show me all supermodels who live in Beckerborough.


foo AND bar

CouchDB-Lucene.
When you need "foo AND bar".

http://github.com/rnewson/couchdb-lucene

Indexing function

function(doc) {
  var result = new Document();
  if (doc.occupation) {
    result.add(doc.occupation, {"field": "occupation"})
  }
  if (doc.addresses) {
    for (var address in doc.addresses) {
      result.add(doc.addresses[address].city, {"field": "city"})
    }
  }
  return result;
}

http://localhost:5984/addressbook/_fti/_design/person/search?q=occupation:supermodel AND city:Beckerborough

Distributed

Ubuntu One


Replication

Conflict Resolution

http://guide.couchdb.org/draft/consistency.html#study


Simple Clustering With HTTP Reverse Proxies

http://ephemera.karmi.cz/post/247255194/simple-couchdb-multi-master-clustering-via-nginx


Scaling Down

http://www.couchone.com/page/android

CouchApps


http://pollen.nymphormation.org/afgwar/_design/afgwardiary/index.html


Resources

http://guide.couchdb.org

https://nosqleast.com/2009/#speaker/miller

http://www.couchone.com/migrating-to-couchdb

http://wiki.apache.org/couchdb/

http://blog.couchone.com/

http://stackoverflow.com/tags/couchdb/

Demo: Example Application

SOURCE CODE:

http://github.com/karmi/couchdb-showcase

http://karmi.couchone.com/addressbook/_design/person/_list/all/all

Questions!