Karel Minak
Independent web designer and developer
Ruby, Rails, Git and CouchDB propagandista in .cz
Previously: Flash Developer; Art Director; Information Architect; (see LinkedIn)
@karmiq at Twitter
karmi.cz
$ couchdb
Apache CouchDB has started. Time to relax.
Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API.
http://wiki.apache.org/couchdb
Talk Outline
RESTful API
Project Voldemort
NOSQL
Non-relational data stores reverse this model completely, because they don't have the
complex read operations of SQL. The model forces you to shift your computation to the
writes, while reducing most reads to simple operations, the equivalent of
SELECT * FROM `Table`.
http://about.digg.com/blog/looking-future-cassandra
CouchDB A Database for the Web
redis> lpop mylist
"1"
...
redis> llen mylist
(integer) 1000000
redis> lpop mylist
"2"
$ redis-benchmark
...
====== LPOP ======
10025 requests completed in 0.53 seconds
...
RPUSH
LPOP
O(1)
Millions of items
http://github.com/defunkt/resque/blob/master/lib/resque.rb#L133-138
Damien Katz
(RubyFringe 2008)
http://www.infoq.com/presentations/katz-couchdb-and-me
Schema-free Documents
SCHEMA-FREE STORAGE
Relational Data
SCHEMA-FREE DOCUMENTS
Customers
  id          INTEGER  [A N P]
  first_name  VARCHAR
  last_name   VARCHAR
  phone       VARCHAR
http://en.wikipedia.org/wiki/First_normal_form#Domains_and_values
Customers
  id          INTEGER  [A N P]
  first_name  VARCHAR
  last_name   VARCHAR
  phone_1     VARCHAR
  phone_2     VARCHAR
  phone_3     VARCHAR
Customers
  id          INTEGER  [U A N P]
  first_name  VARCHAR
  last_name   VARCHAR
CustomerPhones
  customer_id INTEGER  [N F]
  phone       VARCHAR  [N]
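mysql> SELECT * FROM Customers LEFT JOIN CustomerPhones
    ->   ON Customers.id = CustomerPhones.customer_id;
+----+------------+-----------+-------------+-------+
| id | first_name | last_name | customer_id | phone |
+----+------------+-----------+-------------+-------+
|  1 | John       | Smith     |           1 | 123   |
|  1 | John       | Smith     |           1 | 456   |
+----+------------+-----------+-------------+-------+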
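mysql> SELECT * FROM Customers WHERE id = 1;
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
|  1 | John       | Smith     |
+----+------------+-----------+
mysql> SELECT phone FROM CustomerPhones WHERE customer_id IN (1);
+-------+
| phone |
+-------+
| 123   |
| 456   |
+-------+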
Structured data
But, damn! I want something like this:
{
  "id": 1,
  "first_name": "Clara",
  "last_name": "Rice",
  "phones": ["0000777888999", "0000123456789", "0000314181116"]
}
"No problem, you just iterate over the rows and build your object. That's the way it is!"
"If that gets too painful, we will put a cache in front of it."
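To make the "iterate over the rows and build your object" step concrete, here is a minimal Ruby sketch. It assumes the rows arrive as plain hashes shaped like the LEFT JOIN output above; the method name `build_customers` is made up for this illustration.

```ruby
require 'json'

# Rows shaped like the LEFT JOIN result: one row per customer/phone pair.
rows = [
  { 'id' => 1, 'first_name' => 'John', 'last_name' => 'Smith', 'phone' => '123' },
  { 'id' => 1, 'first_name' => 'John', 'last_name' => 'Smith', 'phone' => '456' }
]

# Fold the flat rows into one nested object per customer.
# (build_customers is a hypothetical helper, not part of any library.)
def build_customers(rows)
  rows.group_by { |row| row['id'] }.map do |id, group|
    first = group.first
    {
      'id'         => id,
      'first_name' => first['first_name'],
      'last_name'  => first['last_name'],
      # A customer without phones yields a single row with a NULL phone.
      'phones'     => group.map { |row| row['phone'] }.compact
    }
  end
end

customers = build_customers(rows)
puts JSON.pretty_generate(customers.first)
```

This is the glue code a document database lets you skip: the nested object is what gets stored in the first place.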
Ephemeral data
Consistency
[Figure: a pile of business cards, each carrying a different set of contact details, e.g. "UNDERGROUND RECORDS, info@undergroundrecords.com, F: 555.555.5555"]
http://guide.couchdb.org/draft/why.html#better
JSON
{
  "_id"        : "clara-rice",
  "_rev"       : "1-def456",
  "first_name" : "Clara",
  "last_name"  : "Rice",
  "phones"     : {
    "mobile" : "0000 777 888 999",
    "home"   : "0000 123 456 789",
    "work"   : "0000 314 181 116"
  },
  "addresses"  : {
    "home" : {
      "street"  : "Wintheiser Ports",
      "number"  : "789/23",
      "city"    : "Erinshire",
      "country" : "United Kingdom"
    }
  },
  "occupation" : "model",
  "birthday"   : "1970/05/01",
  "groups"     : ["friends", "models"]
}
Procrustean Bed
RESTful HTTP
1990s
HTTP
http://jacobian.org/writing/of-the-web/
HTTP API
HOST=http://localhost:5984

curl -X GET $HOST
# {"couchdb":"Welcome","version":"0.11.0b22c551bb-git"}

curl -X GET $HOST/mydatabase
# {"error":"not_found","reason":"no_db_file"}

curl -X PUT $HOST/mydatabase
# {"ok":true}

curl -X PUT $HOST/mydatabase/abc123 -d '{"foo":"bar"}'
# {"ok":true,"id":"abc123","rev":"1-4c6114c65e295552ab1019e2b046b10e"}

curl -X GET $HOST/mydatabase/abc123
# {"_id":"abc123","_rev":"1-4c6114c65e295552ab1019e2b046b10e","foo":"bar"}

curl -X DELETE $HOST/mydatabase/abc123?rev=2-d179f665eb01834faf192153dc72dcb3
# {"ok":true,"id":"abc123","rev":"1-4c6114c65e295552ab1019e2b046b10e"}
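Easy To Wrap
require 'rubygems'
require 'ostruct'
require 'restclient'   # 1: an HTTP library
require 'json'         # 2: a JSON library

class Article < OpenStruct
  # REST resource for the "blog" database, or for a path inside it
  def self.db(path='')
    RestClient::Resource.new "http://localhost:5984/blog/#{path}",
                             :headers => { :content_type => :json, :accept => :json }
  end

  # Create the database; ignore the error when it already exists
  db.put '' rescue RestClient::PreconditionFailed

  # POST a new document and wrap the parsed JSON response
  def self.create(params={})
    new JSON.parse(db.post(params.to_json))
  end

  # GET a document by its ID
  def self.find(id)
    new JSON.parse(db(id).get)
  end

  # DELETE the document; CouchDB requires the current revision
  def destroy
    self.class.db(self._id + "?rev=#{self._rev}").delete
  end
end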
Article.create :_id   => 'myfirstpost',
               :title => 'CouchDB is easy',
               :body  => 'So relax!',
               :tags  => ['couchdb', 'databases'] rescue RestClient::Conflict

article = Article.find('myfirstpost')
puts "Got an article:"
p article
puts "\n"
puts "Title: %s" % article.title + " (class: #{article.title.class})"
puts "Tags: %s" % article.tags.inspect + " (class: #{article.tags.class})"
puts "\n\n"
puts "Deleting article..."
article.destroy
$ curl -X POST http://localhost:5984/_replicate \
       -d '{"source":"database",
            "target":"http://example.org/database"}'
$ curl -i -X GET $HOST/mydatabase/abc123
HTTP/1.1 200 OK
Server: CouchDB/1.0.1 (Erlang OTP/R14B)
Etag: "4-f04f2435e031054d6b5298c5841ae052"
Date: Thu, 23 Sep 2010 12:56:37 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 73
Cache-Control: must-revalidate

{"_id":"abc123","_rev":"4-f04f2435e031054d6b5298c5841ae052","foo":"bar"}

$ cat /etc/squid3/squid.conf
cache_peer 192.168.100.2 parent 5984 0 no-query originserver name=master
acl master_acl method GET POST PUT DELETE
cache_peer_access master allow master_acl
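The response above shows why an ordinary HTTP cache can sit in front of CouchDB: the Etag is simply the document revision, so a revalidation is a string comparison. The Ruby sketch below models that decision only; `respond_to_get` and the second revision string are made up for illustration, and this is not CouchDB's actual server code.

```ruby
require 'json'

# Decide how to answer a GET, given the stored document and the client's
# If-None-Match header (nil when the header is absent).
# Hypothetical helper that only models the revalidation rule.
def respond_to_get(doc, if_none_match = nil)
  etag = %("#{doc['_rev']}")   # the Etag is the revision in double quotes
  if if_none_match == etag
    # Not Modified: the copy held by the client or proxy is still valid
    { :status => 304, :etag => etag, :body => nil }
  else
    { :status => 200, :etag => etag, :body => JSON.generate(doc) }
  end
end

doc = { '_id' => 'abc123', '_rev' => '4-f04f2435e031054d6b5298c5841ae052', 'foo' => 'bar' }

first  = respond_to_get(doc)                  # cold cache: full response
second = respond_to_get(doc, first[:etag])    # revalidation: nothing changed

# After an update the revision changes (this revision string is invented),
# so the old Etag no longer matches and the full document is sent again.
updated = doc.merge('_rev' => '5-0123456789abcdef0123456789abcdef', 'foo' => 'baz')
third   = respond_to_get(updated, first[:etag])

puts [first[:status], second[:status], third[:status]].inspect
```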
What is RESTful?
REST is a set of principles that define how Web standards, such as HTTP and
URIs, are supposed to be used. (...) In summary, the five key principles are:
Give every thing an ID
Link things together
Use standard methods
Resources with multiple representations
Communicate statelessly
http://www.infoq.com/articles/rest-introduction
$ kill -9 <PID>
FAULT-TOLERANT
Erlang
Erlang!
http://www.youtube.com/watch?v=uKfKtXYLG78
Erlang's main strength is support for concurrency. It has a small but powerful
set of primitives to create processes and communicate among them.
(...) a benchmark with 20 million processes has been successfully performed.
http://en.wikipedia.org/wiki/Erlang_(programming_language)
Append-Only B-Tree
http://guide.couchdb.org/draft/btree.html
MAP/REDUCE
http://labs.google.com/papers/mapreduce.html
The Concept
module Enumerable
  # Ruby 1.8.6 has no Enumerable#reduce; alias it to inject if missing
  alias :reduce :inject unless method_defined? :reduce
end

(1..3).map { |number| number * 2 }
# => [2, 4, 6]

(1..3).reduce(0) { |sum, number| sum + number }
# => 6
function(doc) {                                        // INPUT: a document
  if (doc.last_name && doc.first_name) {
    emit(doc.last_name + ' ' + doc.first_name, doc);   // OUTPUT: KEY, VALUE
  }
}
Key: "Armstrong Lottie"
Value:
{
  _id: "lottie-armstrong",
  _rev: "2-fcb71b26096957b3ff3ffd2970f3c933",
  addresses: {
    home: {
      city: "Murphyville"
      ...
    }
  },
  first_name: "Lottie",
  last_name: "Armstrong",
  occupation: "programmer",
  ...
}

Key: "Bailey Kaelyn"
Value:
{
  _id: "kaelyn-bailey",
  _rev: "1-2e25e6c9448520fa796988894423a23b",
  addresses: {
    home: {
      city: "Lake Dedric"
      ...
    }
  },
  first_name: "Kaelyn",
  last_name: "Bailey",
  occupation: "supermodel"
}

...
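What the map step produces can be imitated in a few lines of Ruby: run a function over every document, collect the emitted key/value pairs, and keep the rows sorted by key. This is only a sketch of the idea, not how CouchDB builds its index; the sample documents and the names `map_fn` and `build_view` are assumptions made for the example.

```ruby
# Hypothetical sample documents, shaped like the address book entries above.
docs = [
  { '_id' => 'kaelyn-bailey',    'first_name' => 'Kaelyn', 'last_name' => 'Bailey' },
  { '_id' => 'lottie-armstrong', 'first_name' => 'Lottie', 'last_name' => 'Armstrong' },
  { '_id' => 'no-name' }   # has no name fields, so the map emits nothing for it
]

# Ruby stand-in for the JavaScript map function: instead of calling emit(),
# it returns the list of [key, value] pairs it would have emitted.
map_fn = lambda do |doc|
  if doc['last_name'] && doc['first_name']
    [[doc['last_name'] + ' ' + doc['first_name'], doc]]
  else
    []
  end
end

# Run the map over every document and keep the rows ordered by key,
# which is what makes lookups and range queries on a view cheap.
def build_view(docs, map_fn)
  rows = docs.flat_map do |doc|
    map_fn.call(doc).map { |key, value| { 'id' => doc['_id'], 'key' => key, 'value' => value } }
  end
  rows.sort_by { |row| row['key'] }
end

view = build_view(docs, map_fn)
view.each { |row| puts "#{row['key']} => #{row['id']}" }
```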
function(doc) {
emit(doc.occupation, 1);
}
http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_occupation
A Simple Reduce
function(keys, values) {
  return sum(values);
}
_count
_sum
_stats
http://wiki.apache.org/couchdb/Built-In_Reduce_Functions#Available_Build-In_Functions
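The effect of a reduce on grouped keys can be mimicked in plain Ruby. The sketch below groups emitted rows by key and applies a reduce block to each group, with blocks that imitate `_sum`, `_count` and `_stats`. It leaves out the rereduce step entirely, and the sample rows and the helper name `reduce_by_key` are invented for the illustration.

```ruby
# Hypothetical rows, as a map function doing emit(doc.occupation, 1) might produce.
emitted = [
  ['model', 1], ['programmer', 1], ['model', 1],
  ['supermodel', 1], ['programmer', 1], ['model', 1]
]

# Group the emitted rows by key and hand each group's values to the block.
# A simplified picture of querying a view with group=true.
def reduce_by_key(emitted)
  groups = emitted.group_by { |key, _value| key }
  groups.sort.map do |key, pairs|
    values = pairs.map { |_key, value| value }
    [key, yield(values)]
  end
end

# Imitation of the custom reduce above, and of the built-in _sum
sums = reduce_by_key(emitted) { |values| values.inject(0) { |total, value| total + value } }

# Imitation of _count: the number of rows, whatever their values
counts = reduce_by_key(emitted) { |values| values.size }

# Imitation of _stats: sum, count, min, max and sum of squares
stats = reduce_by_key(emitted) do |values|
  {
    'sum'    => values.inject(0) { |total, value| total + value },
    'count'  => values.size,
    'min'    => values.min,
    'max'    => values.max,
    'sumsqr' => values.inject(0) { |total, value| total + value * value }
  }
end

sums.each { |key, value| puts "#{key}: #{value}" }
```

Because every emitted value here is 1, the sum and the count agree; they diverge as soon as the map emits other numbers.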
function(doc) {
  // emit one row per group the person belongs to
  for (var group in doc.groups) {
    emit(doc.groups[group], 1);
  }
}
_count
http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_groups
Group Levels
function(doc) {
  var date = new Date(doc.birthday);
  // COMPOSITE KEY (ARRAY)
  emit([date.getFullYear(), date.getMonth() + 1, date.getDate()], 1);
}
_count
http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_birthday
Group Level 2
http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_birthday
Group Level 1
http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_birthday
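Group levels are easiest to see on a small data set. The following Ruby sketch emits a `[year, month, day]` key per birthday and counts rows after truncating each key to the requested level, which is roughly what querying the view with `group_level=1` or `group_level=2` does. The birthdays and the helper name `count_at_group_level` are assumptions made for the example.

```ruby
require 'date'

# Hypothetical birthdays in the same "YYYY/MM/DD" format as the documents.
birthdays = ['1970/05/01', '1970/05/17', '1970/11/02', '1985/05/01']

# Map step: emit([year, month, day], 1) for every document.
emitted = birthdays.map do |birthday|
  date = Date.strptime(birthday, '%Y/%m/%d')
  [[date.year, date.month, date.day], 1]
end

# Reduce step with _count: keep only the first `level` elements of each
# composite key, then count the rows that share the shortened key.
def count_at_group_level(emitted, level)
  groups = emitted.group_by { |key, _value| key.first(level) }
  groups.sort.map { |key, pairs| { 'key' => key, 'value' => pairs.size } }
end

by_year  = count_at_group_level(emitted, 1)
by_month = count_at_group_level(emitted, 2)
by_day   = count_at_group_level(emitted, 3)

by_month.each { |row| puts "#{row['key'].inspect}: #{row['value']}" }
```

One view therefore answers three questions (per year, per month, per day) without a second index.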
QUERYING VIEWS
A Complex Map/Reduce
SELECT
  COUNT(*) AS count,
  DATE_FORMAT(published_at, "%Y/%m/%d") AS date,
  keywords.value AS keyword
FROM feed_entries
INNER JOIN feeds ON feed_entries.feed_id = feeds.id
INNER JOIN keywords ON feeds.keyword_id = keywords.id
WHERE DATE_SUB(CURDATE(), INTERVAL 90 DAY) <= feed_entries.published_at
GROUP BY date, keyword
ORDER BY date, keyword ASC;
But we don't need a table. We need the data in a format like this:
Streamgraph.load_data({
  max: 170,
  keywords: ['ruby', 'python', 'erlang', 'javascript', 'haskell'],
  values: [
    { date: '2010/01/01', ruby: 50,  python: 20, erlang: 5,  javascript: 30,  haskell: 50 },
    { date: '2010/02/01', ruby: 20,  python: 20, erlang: 2,  javascript: 40,  haskell: 43 },
    { date: '2010/03/01', ruby: 70,  python: 20, erlang: 10, javascript: 80,  haskell: 15 },
    { date: '2010/04/01', ruby: 20,  python: 40, erlang: 8,  javascript: 30,  haskell: 12 },
    { date: '2010/05/01', ruby: 150, python: 30, erlang: 12, javascript: 40,  haskell: 18 },
    { date: '2010/06/01', ruby: 30,  python: 10, erlang: 14, javascript: 170, haskell: 14 }
  ]
});
The Result
$ curl 'http://localhost:5984/customer_database/_design/Mention/_view/by_date_and_keyword?group=true'
{
  "rows": [
    {
      "key": "2010/09/22",
      "value": {"ruby": 8, "python": 19}
    },
    {
      "key": "2010/09/23",
      "value": {"ruby": 24, "python": 12}
    },
    {
      "key": "2010/09/24",
      "value": {"ruby": 7, "python": 8}
    }
  ]
}
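With rows in this shape, the step from the view result to the structure the chart expects is a small transformation. The Ruby sketch below shows one possible way to do it, assuming that `max` means the largest single count; the helper name `streamgraph_data` is invented here, and the talk does not show how its own application performs this step.

```ruby
require 'json'

# The view response shown above, parsed from JSON.
response = JSON.parse(<<-EOS)
{ "rows": [
    { "key": "2010/09/22", "value": { "ruby": 8,  "python": 19 } },
    { "key": "2010/09/23", "value": { "ruby": 24, "python": 12 } },
    { "key": "2010/09/24", "value": { "ruby": 7,  "python": 8 } }
] }
EOS

# Turn the view rows into a max / keywords / values structure.
# (streamgraph_data is a hypothetical helper written for this example.)
def streamgraph_data(rows)
  {
    'max'      => rows.flat_map { |row| row['value'].values }.max,
    'keywords' => rows.flat_map { |row| row['value'].keys }.uniq,
    'values'   => rows.map { |row| { 'date' => row['key'] }.merge(row['value']) }
  }
end

data = streamgraph_data(response['rows'])
puts JSON.pretty_generate(data)
```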
Complex Queries
Couchdb-Lucene.
When you need foo AND bar.
http://github.com/rnewson/couchdb-lucene
COUCHDB-LUCENE
Indexing function
function(doc) {
  var result = new Document();
  // index the occupation, if the document has one
  if (doc.occupation) {
    result.add(doc.occupation, {"field": "occupation"});
  }
  // index the city of every address
  if (doc.addresses) {
    for (var address in doc.addresses) {
      result.add(doc.addresses[address].city, {"field": "city"});
    }
  }
  return result;
}
Distributed
DISTRIBUTED
Ubuntu One
Replication
Conflict Resolutions
_rev1
http://guide.couchdb.org/draft/consistency.html#study
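Conflicts are detected through the `_rev` field: an update has to name the revision it was based on. The in-memory Ruby sketch below illustrates only that single-node rule and does not model replication or how a winning revision is chosen between nodes. `TinyStore` and `Conflict` are made-up names, and the way the revision hash is computed is an assumption chosen for brevity.

```ruby
require 'digest/md5'

class Conflict < StandardError; end

# A toy document store that enforces "updates must carry the current _rev".
class TinyStore
  def initialize
    @docs = {}
  end

  def get(id)
    @docs[id]
  end

  # Save a document. When it already exists, the caller must pass the
  # revision it read; otherwise somebody else has updated it meanwhile.
  def put(doc)
    current = @docs[doc['_id']]
    if current && current['_rev'] != doc['_rev']
      raise Conflict, "document update conflict for #{doc['_id']}"
    end
    generation = current ? current['_rev'].to_i + 1 : 1
    body = doc.reject { |key, _value| key == '_rev' }
    # "N-hash" mimics the look of CouchDB revisions; the hashing is invented.
    rev = "#{generation}-#{Digest::MD5.hexdigest(body.sort.inspect)}"
    @docs[doc['_id']] = body.merge('_rev' => rev)
  end
end

store = TinyStore.new
store.put('_id' => 'abc123', 'foo' => 'bar')

# Two clients read the same revision and both try to write.
alice = store.get('abc123').merge('foo' => 'baz')
bob   = store.get('abc123').merge('foo' => 'qux')

store.put(alice)          # succeeds, the document moves to generation 2
begin
  store.put(bob)          # rejected: bob still holds the generation 1 revision
rescue Conflict => error
  puts error.message
end
```

The losing client re-reads the document, merges its change and tries again, which is the same loop an application runs when CouchDB answers with a conflict error.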
http://ephemera.karmi.cz/post/247255194/simple-couchdb-multi-master-clustering-via-nginx
Scaling Down
http://www.couchone.com/page/android
CouchApps
http://pollen.nymphormation.org/afgwar/_design/afgwardiary/index.html
Resources
http://guide.couchdb.org
https://nosqleast.com/2009/#speaker/miller
http://www.couchone.com/migrating-to-couchdb
http://wiki.apache.org/couchdb/
http://blog.couchone.com/
http://stackoverflow.com/tags/couchdb/
DEMO
Application
SOURCE CODE:
http://github.com/karmi/couchdb-showcase
http://karmi.couchone.com/addressbook/_design/person/_list/all/all
Questions!