
Evaluating and Modeling
NoSQL and SQL Databases
Part 2
2016 EDW
by Michael Bowers
2016-03-14
v.4.9
mike@cssDesignPatterns.com

Abstract
We are in the middle of a database revolution. NoSQL is disrupting the database world with many innovations.
How do we model in these new paradigms?
How does the old SQL paradigm fit in this brave new world?
What paradigm is best for your project?
We are in a new data paradigm:
New database architectures (software and hardware) handle the large and ever-growing velocity and volume of data that is dispersed across geographically distant data centers
New graph, document, and wide-column modeling paradigms compete with relational and dimensional modeling
Schemaless databases enable maximum agility in software development and rapid changes to huge datasets

What will you learn?
You will be able to choose the best database to meet your needs for velocity, volume, variety, variability, relevance, productivity, data model, scale, consistency, and cost.
You will know the tradeoffs of ACID and BASE consistency models, and when it is or is not OK to compromise consistency.
You will understand the strengths and weaknesses of relational, dimensional, document, key-value, and triple models, and which SQL and NoSQL databases support which models.

About the Author
Michael Bowers
Principal Architect, LDS Church
Author:
Pro CSS and HTML Design Patterns, published by Apress, 2007
Pro HTML5 and CSS3 Design Patterns, published by Apress, 2011
mike@cssDesignPatterns.com

Church of Jesus Christ of Latter-day Saints
15 million members (29,621 congregations worldwide)
Humanitarian assistance in 185 countries
Thousands of documents in 188 published languages
192 websites and applications in production, with billions of page views annually, running on hundreds of MarkLogic servers

Agenda
1. Defining NoSQL and Big Data
2. Optimizing for Velocity or Volume
3. Optimizing for Availability or Consistency
4. Optimizing for Modeling Paradigms

Five Data Paradigms
Relational: Flexible Queries
Document: Easy Development
Dimensional: Data Warehousing
Graph: Unlimited Relationships
Wide Column/Key-Value: Fast Puts and Gets

Relational Modeling
in detail

[Figure: Database landscape, databases ranked by popularity as of 2016-03-14. Axes: latency from hours down to microseconds and throughput from 0.1 Kt to 100 Kt (low-latency operational workloads, velocity) versus data size from GBs to PBs (high-bandwidth analytical workloads, volume); and from more structure (schema) to less structure (schemaless).]
NewSQL: #58 GemFire, #69 Oracle x10
Live Analytics: #1 Oracle Exalytics, #19 SAP HANA
Relational (SQL): #1 Oracle DB, #2 MySQL, #3 SQL Server, #5 PostgreSQL, #6 DB2, #10 SQLite, #12 SAP AS, #19 SAP HANA, #21 Informix, #22 MariaDB
Dimensional (Data Warehouse): #1 Oracle Exadata, #13 Teradata, #16 Hive, #28 Netezza, #29 Vertica, #33 Greenplum, #36 Amazon Redshift
Document (JSON): #4 MongoDB, #24 Couchbase, #25 CouchDB, #32 MarkLogic, #41 OrientDB, #48 Cloudant
Doc Warehouse (XML): #11 ElasticSearch, #14 Solr, #35 MarkLogic, #37 Sphinx
Graph/RDF: #20 Neo4j, #32 MarkLogic, #41 OrientDB, #44 Titan
Wide Column (complex key): #8 Cassandra, #15 HBase
Key/Value (simple key): #9 Redis, #23 Memcached, #26 DynamoDB, #31 Riak
Big Data (raw): Hadoop, #18 Splunk

Data Source Used in Examples
Hospital Name: John Hopkins
Operation Type: Heart Transplant
Operation ID: 13
Surgeon Name: Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Relational Modeling
#1 Normalize
Give each attribute its own field
Group attributes into tables, ensuring each table has one coherent context
Assign one primary key to each table
Eliminate duplicate attributes

Source: Document ID: 1 (hospital operation record)
Normalized tables:
Hospital: HospitalID, HospitalName
Surgeon: SurgeonID, SurgeonName
Operation: OperationID, HospitalID, SurgeonID, OperationType
OperationCodes: OperationCodeID, OperationCodeType
Drugs: DrugID, DrugName, DrugManufacturer
OperationDrugs: OperationID, DrugID, DoseSize, DoseUOM
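A minimal sketch of the normalized tables, using Python's built-in sqlite3 module with an in-memory database. The HospitalID and SurgeonID values are assumptions; the drug IDs 1997, 2110, and 9448 are borrowed from the key/cell slides later in this deck, and OperationCodes is left out for brevity. The final join shows how the shredded tables are reassembled into the rows of the original document.

```python
import sqlite3

# In-memory database; HospitalID and SurgeonID values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hospital (HospitalID INTEGER PRIMARY KEY, HospitalName TEXT NOT NULL);
CREATE TABLE Surgeon (SurgeonID INTEGER PRIMARY KEY, SurgeonName TEXT NOT NULL);
CREATE TABLE Operation (
    OperationID INTEGER PRIMARY KEY,
    HospitalID INTEGER NOT NULL REFERENCES Hospital(HospitalID),
    SurgeonID INTEGER NOT NULL REFERENCES Surgeon(SurgeonID),
    OperationType TEXT NOT NULL
);
CREATE TABLE Drugs (DrugID INTEGER PRIMARY KEY, DrugName TEXT, DrugManufacturer TEXT);
CREATE TABLE OperationDrugs (
    OperationID INTEGER REFERENCES Operation(OperationID),
    DrugID INTEGER REFERENCES Drugs(DrugID),
    DoseSize INTEGER,
    DoseUOM TEXT,
    PRIMARY KEY (OperationID, DrugID)
);
INSERT INTO Hospital VALUES (1, 'John Hopkins');
INSERT INTO Surgeon VALUES (1, 'Dorothy Oz');
INSERT INTO Operation VALUES (13, 1, 1, 'Heart Transplant');
INSERT INTO Drugs VALUES (1997, 'Minicillan', 'Drugs R Us'),
                         (2110, 'Maxicillan', 'Canada4Less Drugs'),
                         (9448, 'Minicillan', 'Drug USA');
INSERT INTO OperationDrugs VALUES (13, 1997, 200, 'mg'),
                                  (13, 2110, 400, 'mg'),
                                  (13, 9448, 150, 'mg');
""")

# Joining the normalized tables reassembles the drug lines of the source document.
rows = conn.execute("""
SELECT h.HospitalName, o.OperationID, s.SurgeonName,
       d.DrugName, d.DrugManufacturer, od.DoseSize, od.DoseUOM
FROM Operation o
JOIN Hospital h ON h.HospitalID = o.HospitalID
JOIN Surgeon s ON s.SurgeonID = o.SurgeonID
JOIN OperationDrugs od ON od.OperationID = o.OperationID
JOIN Drugs d ON d.DrugID = od.DrugID
ORDER BY od.DrugID
""").fetchall()
for row in rows:
    print(row)
```

Because the hospital and surgeon names are stored once, renaming either is a single-row update, which is the "update data in one place" benefit listed later.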

Relational Modeling
#2 Orthogonalize
Create reference tables that stand independent of all contexts
This maximizes data reuse by allowing tables to be combined with other tables to create any context
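To illustrate orthogonalization, the sketch below keeps Drugs as a context-free reference table and joins it into two different contexts. The PharmacyInventory table and its stock numbers are hypothetical, invented only to show a second context reusing the same reference rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reference table: describes drugs without reference to any operation.
CREATE TABLE Drugs (DrugID INTEGER PRIMARY KEY, DrugName TEXT, DrugManufacturer TEXT);
INSERT INTO Drugs VALUES (1997, 'Minicillan', 'Drugs R Us'),
                         (2110, 'Maxicillan', 'Canada4Less Drugs'),
                         (9448, 'Minicillan', 'Drug USA');

-- Context 1: doses given during operations (from the example schema).
CREATE TABLE OperationDrugs (OperationID INTEGER, DrugID INTEGER, DoseSize INTEGER, DoseUOM TEXT);
INSERT INTO OperationDrugs VALUES (13, 1997, 200, 'mg'), (13, 2110, 400, 'mg'), (13, 9448, 150, 'mg');

-- Context 2: a hypothetical inventory table that reuses the same reference table.
CREATE TABLE PharmacyInventory (DrugID INTEGER, UnitsOnHand INTEGER);
INSERT INTO PharmacyInventory VALUES (1997, 40), (2110, 5);
""")

# The same Drugs rows answer questions in both contexts.
dose_by_name = conn.execute("""
SELECT d.DrugName, SUM(od.DoseSize) FROM OperationDrugs od
JOIN Drugs d ON d.DrugID = od.DrugID
GROUP BY d.DrugName ORDER BY d.DrugName
""").fetchall()
stock_by_manufacturer = conn.execute("""
SELECT d.DrugManufacturer, inv.UnitsOnHand FROM PharmacyInventory inv
JOIN Drugs d ON d.DrugID = inv.DrugID
ORDER BY d.DrugManufacturer
""").fetchall()
print(dose_by_name)
print(stock_by_manufacturer)
```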

Relational Modeling
#3 Generalize
Make tables more general in purpose so they can be reused in multiple contexts, such as replacing Surgeon with Person
Do not overgeneralize, because it hides the purpose of the model
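One way to generalize Surgeon into Person, sketched under stated assumptions: a Person table holds anyone involved in an operation, and an OperationStaff table records the role. The role list, the CHECK constraint, and the second person ("Pat Example") are hypothetical. Keeping the role explicit is one way to avoid overgeneralizing, since the model still says what each person did.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generalized table: any person, not just surgeons ('Pat Example' is hypothetical).
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, PersonName TEXT NOT NULL);
-- The Role column keeps the model's purpose visible instead of hiding it in a generic table.
CREATE TABLE OperationStaff (
    OperationID INTEGER,
    PersonID INTEGER REFERENCES Person(PersonID),
    Role TEXT NOT NULL CHECK (Role IN ('Surgeon', 'Anesthesiologist', 'Nurse')),
    PRIMARY KEY (OperationID, PersonID, Role)
);
INSERT INTO Person VALUES (1, 'Dorothy Oz'), (2, 'Pat Example');
INSERT INTO OperationStaff VALUES (13, 1, 'Surgeon'), (13, 2, 'Anesthesiologist');
""")

def staff_in_role(operation_id, role):
    """Return the names of people who filled a role in an operation."""
    cursor = conn.execute(
        """SELECT p.PersonName FROM OperationStaff s
           JOIN Person p ON p.PersonID = s.PersonID
           WHERE s.OperationID = ? AND s.Role = ?
           ORDER BY p.PersonName""",
        (operation_id, role),
    )
    return [name for (name,) in cursor]

print(staff_in_role(13, "Surgeon"))
print(staff_in_role(13, "Anesthesiologist"))
```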

Relational Modeling
#4 Tune
Modify tables to meet the application's performance needs for reads and writes, such as speeding reads by materializing views and duplicating attributes across tables to eliminate joins
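SQLite, used here only because it ships with Python, has no materialized views, so this sketch approximates one with a table rebuilt by CREATE TABLE ... AS SELECT. The schema is simplified (hospital and surgeon names are stored directly on Operation) to keep the example short; refresh_operation_drug_report is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Operation (OperationID INTEGER PRIMARY KEY, HospitalName TEXT,
                        SurgeonName TEXT, OperationType TEXT);
CREATE TABLE Drugs (DrugID INTEGER PRIMARY KEY, DrugName TEXT, DrugManufacturer TEXT);
CREATE TABLE OperationDrugs (OperationID INTEGER, DrugID INTEGER, DoseSize INTEGER, DoseUOM TEXT);
INSERT INTO Operation VALUES (13, 'John Hopkins', 'Dorothy Oz', 'Heart Transplant');
INSERT INTO Drugs VALUES (1997, 'Minicillan', 'Drugs R Us'), (2110, 'Maxicillan', 'Canada4Less Drugs'),
                         (9448, 'Minicillan', 'Drug USA');
INSERT INTO OperationDrugs VALUES (13, 1997, 200, 'mg'), (13, 2110, 400, 'mg'), (13, 9448, 150, 'mg');
""")

def refresh_operation_drug_report(conn):
    """Rebuild a denormalized read table that stands in for a materialized view."""
    conn.executescript("""
    DROP TABLE IF EXISTS OperationDrugReport;
    CREATE TABLE OperationDrugReport AS
    SELECT o.OperationID, o.HospitalName, o.SurgeonName, o.OperationType,
           d.DrugName, d.DrugManufacturer, od.DoseSize, od.DoseUOM
    FROM Operation o
    JOIN OperationDrugs od ON od.OperationID = o.OperationID
    JOIN Drugs d ON d.DrugID = od.DrugID;
    """)

refresh_operation_drug_report(conn)
# Reads now hit a single table with the attributes duplicated into it: no joins at query time.
report = conn.execute(
    "SELECT DrugName, DoseSize FROM OperationDrugReport WHERE OperationID = 13 ORDER BY DoseSize"
).fetchall()
print(report)
```

The trade-off is staleness: writes to the base tables are not visible in the report table until it is rebuilt.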

Relational Modeling Exercise
Document ID: 1
Order Number: 1332
Order Date: 2014-08-16
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 801-555-1212
Customer Address: Street, City, State, Postal Code

Product Name | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

PROs:
CONs:

Relational Modeling Answer
Source: Document ID: 1 (order 1332)
Orders: OrderID, CustomerID, OrderDate
OrderLineItems: OrderID, OrderLineID, ProductID, ProductPrice, ProductQuantity
Customers: CustomerID, CustomerName, CustomerPhone, CustomerAddressStreet, CustomerAddressCity, CustomerAddressState, CustomerAddressPostal
Products: ProductID, ProductName, ProductDescription, ProductListPrice

Relational Model
PROs
Most flexible queries
Update data in one place
Reuse data structures in any context
Great DB-to-DB integration
Mature tools
Standard query language (SQL)
Easy to hire expertise
CONs
Design-time, static relationships
Static data structures: design before loading data
Hard to model: must shred data into tables
Requires code to map shredded relational data back into unified object-oriented data structures
Cannot query for relevance; hard to search

Relational Model
Summary
Use for maximum flexibility in querying and updating operational data
Example: traditional data-entry apps

Dimensional Modeling
in detail


Dimensional Modeling
#1 Model Contexts
Determine the business questions you want to answer
Determine which fact will answer one or more questions
Determine the grain of the fact
Determine the dimensions needed to join with the fact to answer the business questions
Create one star schema (or OLAP model) per fact

Star schema:
DrugDoseFacts: HospitalID, SurgeonID, OperationID, DrugID, DrugDose
Hospital Dimension: HospitalID, attributes
Surgeon Dimension: SurgeonID, attributes
Drug Dimension: DrugID, attributes
Operation Dimension: OperationID, attributes
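A small star schema for the DrugDoseFacts fact, sketched in SQLite. The attribute columns on each dimension, the assumption that DrugDose is in mg, and the business question itself are illustrative choices, not taken from the slide. The grain is one row per drug dose per operation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions hold descriptive attributes (these attribute columns are assumptions).
CREATE TABLE HospitalDim (HospitalID INTEGER PRIMARY KEY, HospitalName TEXT);
CREATE TABLE SurgeonDim (SurgeonID INTEGER PRIMARY KEY, SurgeonName TEXT);
CREATE TABLE OperationDim (OperationID INTEGER PRIMARY KEY, OperationType TEXT);
CREATE TABLE DrugDim (DrugID INTEGER PRIMARY KEY, DrugName TEXT, DrugManufacturer TEXT);
-- Fact table at the grain of one drug dose per operation.
CREATE TABLE DrugDoseFacts (
    HospitalID INTEGER REFERENCES HospitalDim,
    SurgeonID INTEGER REFERENCES SurgeonDim,
    OperationID INTEGER REFERENCES OperationDim,
    DrugID INTEGER REFERENCES DrugDim,
    DrugDose INTEGER  -- assumed to be in mg
);
INSERT INTO HospitalDim VALUES (1, 'John Hopkins');
INSERT INTO SurgeonDim VALUES (1, 'Dorothy Oz');
INSERT INTO OperationDim VALUES (13, 'Heart Transplant');
INSERT INTO DrugDim VALUES (1997, 'Minicillan', 'Drugs R Us'), (2110, 'Maxicillan', 'Canada4Less Drugs'),
                           (9448, 'Minicillan', 'Drug USA');
INSERT INTO DrugDoseFacts VALUES (1, 1, 13, 1997, 200), (1, 1, 13, 2110, 400), (1, 1, 13, 9448, 150);
""")

# Assumed business question: total dose of each drug given during heart transplants, by hospital.
answer = conn.execute("""
SELECT h.HospitalName, d.DrugName, SUM(f.DrugDose) AS TotalDose
FROM DrugDoseFacts f
JOIN HospitalDim h ON h.HospitalID = f.HospitalID
JOIN OperationDim o ON o.OperationID = f.OperationID
JOIN DrugDim d ON d.DrugID = f.DrugID
WHERE o.OperationType = 'Heart Transplant'
GROUP BY h.HospitalName, d.DrugName
ORDER BY d.DrugName
""").fetchall()
print(answer)
```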

Dimensional Modeling
#2 ELT
Extract data from a source system
Load it into a staging area in the data warehouse
Transform it into the star schema
Improve data quality
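The ELT order (extract, load as-is, then transform inside the warehouse) can be sketched with SQLite standing in for the warehouse. The extracted rows and their stray whitespace are hypothetical, chosen to show a data-quality fix (TRIM) happening during the SQL transform step. Only the hospital and drug dimensions are built, to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract: hypothetical rows pulled from a source system, with untidy values.
extracted = [
    (" John Hopkins", 13, "Heart Transplant", "Dorothy Oz", "Minicillan", "Drugs R Us", "200"),
    ("John Hopkins ", 13, "Heart Transplant", "Dorothy Oz", "Maxicillan", "Canada4Less Drugs", "400"),
    ("John Hopkins", 13, "Heart Transplant", "Dorothy Oz", "Minicillan", "Drug USA", "150"),
]

# Load: copy the rows unchanged into a staging table inside the warehouse.
conn.execute("""CREATE TABLE StageOps (HospitalName TEXT, OperationID INTEGER, OperationType TEXT,
                SurgeonName TEXT, DrugName TEXT, DrugManufacturer TEXT, DoseSize TEXT)""")
conn.executemany("INSERT INTO StageOps VALUES (?, ?, ?, ?, ?, ?, ?)", extracted)

# Transform: clean the values and reshape them into the star schema with SQL.
conn.executescript("""
CREATE TABLE HospitalDim (HospitalID INTEGER PRIMARY KEY, HospitalName TEXT UNIQUE);
INSERT INTO HospitalDim (HospitalName) SELECT DISTINCT TRIM(HospitalName) FROM StageOps;

CREATE TABLE DrugDim (DrugID INTEGER PRIMARY KEY, DrugName TEXT, DrugManufacturer TEXT,
                      UNIQUE (DrugName, DrugManufacturer));
INSERT INTO DrugDim (DrugName, DrugManufacturer)
SELECT DISTINCT TRIM(DrugName), TRIM(DrugManufacturer) FROM StageOps;

CREATE TABLE DrugDoseFacts (HospitalID INTEGER, OperationID INTEGER, DrugID INTEGER, DrugDose INTEGER);
INSERT INTO DrugDoseFacts
SELECT h.HospitalID, s.OperationID, d.DrugID, CAST(s.DoseSize AS INTEGER)
FROM StageOps s
JOIN HospitalDim h ON h.HospitalName = TRIM(s.HospitalName)
JOIN DrugDim d ON d.DrugName = TRIM(s.DrugName) AND d.DrugManufacturer = TRIM(s.DrugManufacturer);
""")

hospitals = conn.execute("SELECT HospitalName FROM HospitalDim").fetchall()
total = conn.execute("SELECT SUM(DrugDose) FROM DrugDoseFacts").fetchone()[0]
print(hospitals, total)
```

Without the TRIM, the three source spellings of the hospital would have produced three separate dimension rows.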

Dimensional Modeling
#3 Semantic Layer
Define a semantic layer to enable self-service reporting
Rename columns to be business-friendly
Add descriptions to columns
Create join paths for error-free reporting
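A minimal semantic layer, sketched as a SQL view that fixes one join path and renames columns to business-friendly names. SQLite has no COMMENT ON statement, so this sketch keeps the column descriptions in a Python dictionary; BI tools usually store them in their own metadata layer. The view name and descriptions are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DrugDim (DrugID INTEGER PRIMARY KEY, DrugName TEXT, DrugManufacturer TEXT);
CREATE TABLE DrugDoseFacts (OperationID INTEGER, DrugID INTEGER, DrugDose INTEGER);
INSERT INTO DrugDim VALUES (1997, 'Minicillan', 'Drugs R Us'), (2110, 'Maxicillan', 'Canada4Less Drugs');
INSERT INTO DrugDoseFacts VALUES (13, 1997, 200), (13, 2110, 400);

-- Semantic layer: one predefined join path with business-friendly column names.
CREATE VIEW "Drug Doses" AS
SELECT f.OperationID      AS "Operation Number",
       d.DrugName         AS "Drug",
       d.DrugManufacturer AS "Manufacturer",
       f.DrugDose         AS "Dose (mg)"
FROM DrugDoseFacts f JOIN DrugDim d ON d.DrugID = f.DrugID;
""")

# Column descriptions kept outside the database (an assumption; SQLite cannot store them).
column_descriptions = {
    "Operation Number": "Identifier of the surgical operation",
    "Drug": "Name of the drug administered",
    "Manufacturer": "Company that supplied the drug",
    "Dose (mg)": "Amount administered, in milligrams (unit assumed)",
}

cursor = conn.execute('SELECT * FROM "Drug Doses" ORDER BY "Dose (mg)"')
headers = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(headers)
print(rows)
```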

Dimensional Modeling
#4 Tune
Determine query patterns
Optimize queries
Optimize indexes and full table scans
Move to specialized data warehouse technology

Dimensional Modeling Exercise
Document ID: 1
Order Number: 1332
Order Date: 2014-08-16
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 801-555-1212
Customer Address: Street, City, State, Postal Code

Product Name | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

PROs:
CONs:

Dimensional Modeling Answer
Source: Document ID: 1 (order 1332)
OrderFact: OrderID, OrderLineID, OrderDateID, CustomerID, ProductID, ProductQty, ProductPrice
DateDim: OrderDateID, OrderDate, OrderDay, OrderMonth, OrderQuarter, OrderYear
CustomerDim: CustomerID, CustomerName, CustomerPhone, CustomerAreaCode, CustomerAddressStreet, CustomerAddressCity, CustomerAddressState, CustomerAddressPostal
ProductDim: ProductID, ProductName, ProductDescription, ProductCategory, ProductListPrice
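Several dimension attributes in this answer (OrderDay, OrderMonth, OrderQuarter, OrderYear, CustomerAreaCode) are derived from a single source value during the transform step. A minimal sketch, assuming a YYYYMMDD smart key for OrderDateID, calendar (not fiscal) quarters, OrderDay as the day of the month, and North American phone numbers:

```python
from datetime import date

def date_dim_row(order_date: date) -> dict:
    """Derive DateDim attributes from a calendar date (smart key and calendar quarters assumed)."""
    return {
        "OrderDateID": order_date.year * 10000 + order_date.month * 100 + order_date.day,
        "OrderDate": order_date.isoformat(),
        "OrderDay": order_date.day,                       # day of the month (assumption)
        "OrderMonth": order_date.month,
        "OrderQuarter": (order_date.month - 1) // 3 + 1,  # months 1-3 -> Q1, ..., 10-12 -> Q4
        "OrderYear": order_date.year,
    }

def customer_area_code(phone: str) -> str:
    """Return the first three digits of a North American phone number for CustomerDim."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[:3]

row = date_dim_row(date(2014, 8, 16))
print(row)
print(customer_area_code("801-555-1212"))
```

Precomputing these attributes lets report writers group by quarter or area code without writing date or string logic in every query.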

Dimensional Model
PROs
Queries facts in context
Self-service, ad hoc queries
High-performance platforms
Mature tools and integration
Standard query language (SQL)
Turns data into information
CONs
Expensive platforms
Design-time, static structures: design structures first, then load data
Cannot query for relevance
Cannot query for answers that are not built into the model

Dimensional Model
Summary
Use to transform authoritative data into contextual information to enable self-service, ad hoc, flexible reporting
Examples: Business Intelligence, Data Warehouse

NewSQL
in detail


What's wrong with Old SQL DBs?
Relevance
Velocity
Volume
Variety
Variability
(hacky)

What's wrong with NewSQL DBs?
Relevance
Velocity
Volume
Variety
Variability
(hacky)

Relevance = meaningful to me
Narrative + Data = Contextual Information + Relationships = Meaningful Knowledge (Semantic & Structural)

A Relational Model of Data for Large Shared Data Banks
E. F. Codd, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13 / Number 6 / June, 1970
Programs should remain unaffected when the internal representation of data is changed. Tree-structured inadequacies are discussed. Relations are discussed and applied to the problems of redundancy and consistency.
KEY WORDS AND PHRASES: database, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory to formatted data. The problems are those of data independence and data inconsistency.
The relational view appears to be superior in several respects to the graph or network model.
Relational view forms a sound basis for treating derivability, redundancy, and consistency. [and] a clearer evaluation of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
Tables represent a major advance toward the goal of data independence
1.2.1. Ordering Dependence. Programs which take advantage of the stored ordering of a file are likely to fail if it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence. Can application programs remain invariant as indices come and go?
1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. These programs fail when a change in structure becomes necessary. The program is required to exploit paths to the data. Programs become dependent on the continued existence of the paths.

Variability
Managing Rapid Change
Schemas are incompatible with rapid change
Constantly evolving data structures
Can we afford to keep a large application in sync with regular changes to data structures?
Big data
Is data so large that it takes too long to modify values, structures, and indexes?
Agile development
Are requirements stable enough to create long-lasting relational data structures?
Schemaless data is ideal for rapid change
Schemaless data and languages
JSON/JavaScript, Triple/SPARQL, XML/XQuery
Defensive programming is required
You never know what queries will return
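What "defensive programming" means in practice, as a sketch: two hypothetical JSON records from the same schemaless collection, written by different application versions, disagree on types and shapes. The field names (opId, surgeon, drugs, dose) are assumptions. The reader checks types and tolerates missing or null fields instead of trusting a schema.

```python
import json
from typing import Optional

# Two records from the same hypothetical schemaless collection, written by different app versions.
raw_records = [
    '{"opId": 13, "surgeon": "Dorothy Oz", "drugs": [{"name": "Minicillan", "dose": 200}]}',
    '{"opId": "13", "surgeon": {"name": "Dorothy Oz"}, "drugs": null}',
]

def surgeon_name(record: dict) -> Optional[str]:
    """Accept either a plain string or a nested object for the surgeon field."""
    value = record.get("surgeon")
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        name = value.get("name")
        return name if isinstance(name, str) else None
    return None

def total_dose(record: dict) -> float:
    """Sum numeric doses, skipping missing, null, or non-numeric entries."""
    drugs = record.get("drugs") or []  # treats a missing key and an explicit null the same way
    return sum(d.get("dose", 0) for d in drugs
               if isinstance(d, dict) and isinstance(d.get("dose", 0), (int, float)))

for text in raw_records:
    record = json.loads(text)
    print(int(record.get("opId", -1)), surgeon_name(record), total_dose(record))
```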

Variety
Handling data in all imaginable forms
Impedance mismatch
Different data structures
Structured, unstructured, semi-structured
Different data paradigms
Relational, dimensional, document, graph, object-oriented, etc.
Different data types
JSON doesn't have a date, time, or duration type; XML Schema and SQL have a variety, etc.
Different markup standards
JSON, XML, RDF, etc.
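A quick illustration of the JSON data-type gap using Python's standard json module: dates and durations cannot be serialized directly, so the writer must pick a string or number convention and every reader must know it. Encoding the duration as seconds is an assumed convention here; ISO 8601 duration strings such as "PT6H" are another common choice.

```python
import json
from datetime import date, datetime, timedelta

operation = {"opId": 13, "performedOn": date(2014, 8, 14), "duration": timedelta(hours=6)}

# json.dumps raises TypeError for date and timedelta because JSON has no such types.
try:
    json.dumps(operation)
except TypeError as err:
    print("not serializable:", err)

def encode(value):
    """Fallback encoder: dates as ISO 8601 strings, durations as seconds (a convention, not a standard)."""
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    if isinstance(value, timedelta):
        return value.total_seconds()
    raise TypeError(f"Unsupported type: {type(value).__name__}")

text = json.dumps(operation, default=encode)
print(text)

# Reading it back yields plain strings and numbers; the reader must know to convert them.
decoded = json.loads(text)
performed_on = date.fromisoformat(decoded["performedOn"])
duration = timedelta(seconds=decoded["duration"])
```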

NewSQL
Summary
Use for in-memory, real-time SQL transactions
Old SQL databases now provide high-performance in-memory SQL, but they still cannot scale horizontally

Wide Column and
Key-Value Modeling
in detail


Wide Column and Key-Value Databases
Database | Query | Unique Feature
Wide Column or Multidimensional Key:
Cassandra | CQL | Schema-defined, colocated, composite columns
HBase | API | Massively sparse columns on HDFS
Aerospike | AQL | Schemaless with dynamically typed columns
Accumulo | API | HBase-like with cell-level security
Oracle NoSQL | API | Consistent fast performance, JSON & Avro schema, ACID shards
Simple Key:
Redis | API | In-memory data structures
MemcacheDB | API | Simple Memcache API
Riak | API | Search, MapReduce
DynamoDB | API | Schemaless: key plus flat JSON-like value
FoundationDB | API, SQL | ACID, user-defined key structures

3 Columnar/Key-Value Models
Multidimensional key plus cell value
Wide column
Simple key plus multidimensional value

Multidimensional Key and Single Cell Model
Multidimensional key plus cell value
Wide column
Simple key plus multidimensional value

Multidimensional Key
and Single Cell Model
in detail

Multidimensional Key and Single Cell Model
#1 Model Transactions
Because no joins are possible, create a denormalized hierarchical key structure that connects each attribute of the transaction

DB | Table | Hospital ID | Op ID | Drug ID | Column Type | Time Stamp | Cell Value
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | OperationType | 2014-08-14 | HeartTransplant
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | Surgeon | 2014-08-14 | DorothyOz
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | DrugName | 2014-08-14 | Minicillan
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | DrugMFG | 2014-08-14 | DrugsRUs
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | DoseSize | 2014-08-14 | 200
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | DoseUOM | 2014-08-14 | mg
Ops | OpsDrugs | JohnHopkins | 13 | 2110 | OperationType | 2014-08-14 | HeartTransplant
Ops | OpsDrugs | JohnHopkins | 13 | 2110 | Surgeon | 2014-08-14 | DorothyOz
Ops | OpsDrugs | JohnHopkins | 13 | 2110 | DrugName | 2014-08-14 | Maxicillan
Ops | OpsDrugs | JohnHopkins | 13 | 2110 | DrugMFG | 2014-08-14 | Canada4LessDrugs
Ops | OpsDrugs | JohnHopkins | 13 | 2110 | DoseSize | 2014-08-14 | 400
Ops | OpsDrugs | JohnHopkins | 13 | 2110 | DoseUOM | 2014-08-14 | mg
Ops | OpsDrugs | JohnHopkins | 13 | 9448 | OperationType | 2014-08-14 | HeartTransplant
Ops | OpsDrugs | JohnHopkins | 13 | 9448 | Surgeon | 2014-08-14 | DorothyOz
Ops | OpsDrugs | JohnHopkins | 13 | 9448 | DrugName | 2014-08-14 | Minicillan
Ops | OpsDrugs | JohnHopkins | 13 | 9448 | DrugMFG | 2014-08-14 | DrugUSA
Ops | OpsDrugs | JohnHopkins | 13 | 9448 | DoseSize | 2014-08-14 | 150
Ops | OpsDrugs | JohnHopkins | 13 | 9448 | DoseUOM | 2014-08-14 | mg
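The table above can be modeled, as a sketch, with a plain Python dictionary whose keys are tuples of (DB, Table, Hospital ID, Op ID, Drug ID, Column Type, Time Stamp). This is not any product's API; it only shows how every attribute of the transaction hangs off one hierarchical key, with the operation-level attributes repeated under each drug.

```python
from typing import Dict, Tuple

# Key: (database, table, hospital ID, operation ID, drug ID, column type, timestamp).
Key = Tuple[str, str, str, int, int, str, str]

store: Dict[Key, str] = {}

def put(db, table, hospital, op_id, drug_id, column, timestamp, value):
    """Store one cell; every attribute of the transaction is reachable through its key."""
    store[(db, table, hospital, op_id, drug_id, column, timestamp)] = value

doses = [(1997, "Minicillan", "DrugsRUs", "200"),
         (2110, "Maxicillan", "Canada4LessDrugs", "400"),
         (9448, "Minicillan", "DrugUSA", "150")]
for drug_id, name, mfg, size in doses:
    # Operation-level attributes are repeated for each drug: the structure is denormalized.
    for column, value in [("OperationType", "HeartTransplant"), ("Surgeon", "DorothyOz"),
                          ("DrugName", name), ("DrugMFG", mfg),
                          ("DoseSize", size), ("DoseUOM", "mg")]:
        put("Ops", "OpsDrugs", "JohnHopkins", 13, drug_id, column, "2014-08-14", value)

print(len(store))
print(store[("Ops", "OpsDrugs", "JohnHopkins", 13, 2110, "DrugName", "2014-08-14")])
```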

Multidimensional Key and Single Cell Model
#2 Review Colocation
Structure of Key
The key defines how data is colocated on disk.
The OpsDrugs table is colocated within the Ops DB.
Hospital IDs are colocated within the OpsDrugs table.
Op IDs are colocated within the Hospital ID rows.
Drug IDs are colocated within the Op ID rows.
Columns are colocated within each row, etc.

Multidimensional Key and Single Cell Model
#3 Verify Colocation
Effects on Queries
Queries are fast when they use the key to retrieve data
You can retrieve all values that are colocated within a portion of the key:
Ops: returns all cells in the Ops database
Ops/OpsDrugs: returns all cells in the OpsDrugs table
Ops/OpsDrugs/JohnHopkins: returns all cells for the John Hopkins hospital
Ops/OpsDrugs/JohnHopkins/13: returns all cells in Operation 13
Ops/OpsDrugs/JohnHopkins/13/1997: returns all cells for Drug ID 1997
Ops/OpsDrugs/JohnHopkins/13/1997/Surgeon: returns the value of the Surgeon cell
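Why a key prefix query is fast, sketched with a sorted list of slash-separated keys: a binary search (bisect_left) finds the first key at or after the prefix, and the matching cells then sit in one contiguous run, much as colocated cells sit together on disk. Timestamps are left out of the key to keep the sketch short, and the scan function is hypothetical, not a real database API.

```python
from bisect import bisect_left

# Hypothetical cells keyed by DB/Table/Hospital/Op/Drug/Column (timestamps omitted).
cells = {}
for drug_id, name, mfg, size in [("1997", "Minicillan", "DrugsRUs", "200"),
                                 ("2110", "Maxicillan", "Canada4LessDrugs", "400"),
                                 ("9448", "Minicillan", "DrugUSA", "150")]:
    for column, value in [("OperationType", "HeartTransplant"), ("Surgeon", "DorothyOz"),
                          ("DrugName", name), ("DrugMFG", mfg),
                          ("DoseSize", size), ("DoseUOM", "mg")]:
        cells[f"Ops/OpsDrugs/JohnHopkins/13/{drug_id}/{column}"] = value

sorted_keys = sorted(cells)  # keys sharing a prefix end up adjacent, as on disk

def scan(prefix):
    """Return every (key, value) whose key equals the prefix or extends it by whole segments."""
    result = []
    i = bisect_left(sorted_keys, prefix)          # binary search to the start of the run
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        key = sorted_keys[i]
        if key == prefix or key.startswith(prefix + "/"):  # skip e.g. ".../130" for ".../13"
            result.append((key, cells[key]))
        i += 1
    return result

print(len(scan("Ops/OpsDrugs/JohnHopkins/13")))       # every cell in operation 13
print(len(scan("Ops/OpsDrugs/JohnHopkins/13/1997")))  # every cell for drug ID 1997
print(scan("Ops/OpsDrugs/JohnHopkins/13/1997/Surgeon"))
```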

Multidimensional Key and Single Cell Model
#4 Sharding Strategy
Determine the best sharding strategy for the database based on the quantity of data in each key and the geographic distribution of data by key
Configure how the key is sharded and replicated across servers in the cluster and across data centers
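A sketch of one possible sharding strategy: hash the first few key segments (the partition key) to pick a server, then place replicas on the next servers in the list. The server names, replica count, and choice of DB/Table/Hospital as the partition key are assumptions. This is simple hash-modulo placement; production systems such as Cassandra use token rings and data-center-aware replica placement instead.

```python
import hashlib

SERVERS = ["dc1-node1", "dc1-node2", "dc2-node1", "dc2-node2"]  # hypothetical servers in two data centers
REPLICAS = 2

def server_index(partition_key: str) -> int:
    """Map a partition key to a server with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % len(SERVERS)

def placement(key_path: str, depth: int) -> list:
    """Partition on the first `depth` key segments, then replicate to the following servers."""
    partition_key = "/".join(key_path.split("/")[:depth])
    first = server_index(partition_key)
    return [SERVERS[(first + i) % len(SERVERS)] for i in range(REPLICAS)]

# Partitioning on DB/Table/Hospital keeps all of a hospital's cells on the same servers.
a = placement("Ops/OpsDrugs/JohnHopkins/13/1997/Surgeon", depth=3)
b = placement("Ops/OpsDrugs/JohnHopkins/13/9448/DoseSize", depth=3)
print(a, a == b)
```

A shallower partition key concentrates related data (good for prefix scans) but can create hot spots when one hospital has far more data than others; a deeper key spreads load at the cost of scattering related cells.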

Multidimensional Key and Single Cell Model
#5 Modify Key
Modify the key to match query needs, optimize colocation, and optimize sharding
For example, you may want to move the Drug ID before the Op ID because:
Drugs are queried more often than operations
The large amount of data within drugs makes it a good segment for sharding
Since these are natural keys, changing types and structures has a major impact

DB | Table | Hospital ID | Drug ID | Op ID | Column Type | Time Stamp | Cell Value
Ops | OpsDrugs | JohnHopkins | 1997 | 13 | OperationType | 2014-08-14 | HeartTransplant
Ops | OpsDrugs | JohnHopkins | 1997 | 13 | Surgeon | 2014-08-14 | DorothyOz
Ops | OpsDrugs | JohnHopkins | 1997 | 13 | DrugName | 2014-08-14 | Minicillan
Ops | OpsDrugs | JohnHopkins | 1997 | 13 | DrugMFG | 2014-08-14 | DrugsRUs
Ops | OpsDrugs | JohnHopkins | 1997 | 13 | DoseSize | 2014-08-14 | 200
Ops | OpsDrugs | JohnHopkins | 1997 | 13 | DoseUOM | 2014-08-14 | mg
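A sketch of moving the Drug ID ahead of the Op ID: every existing cell has to be rewritten under a new key, which is why changing a natural key is costly. Operation 14 and its dose are hypothetical, added so that a drug spans more than one operation; rekey is an illustrative helper, not a database feature.

```python
def rekey(path: str, order: list) -> str:
    """Reorder the segments of a slash-separated key; `order` lists old segment positions."""
    parts = path.split("/")
    return "/".join(parts[i] for i in order)

# Old key: DB/Table/Hospital/Op/Drug/Column. New key: DB/Table/Hospital/Drug/Op/Column.
NEW_ORDER = [0, 1, 2, 4, 3, 5]

old_cells = {
    "Ops/OpsDrugs/JohnHopkins/13/1997/DoseSize": "200",
    "Ops/OpsDrugs/JohnHopkins/13/2110/DoseSize": "400",
    "Ops/OpsDrugs/JohnHopkins/14/1997/DoseSize": "250",  # hypothetical second operation
}

# Every cell is rewritten under its new key.
new_cells = {rekey(k, NEW_ORDER): v for k, v in old_cells.items()}

# With the drug first, one key prefix now covers a drug's doses across all operations.
prefix = "Ops/OpsDrugs/JohnHopkins/1997/"
print(sorted((k, v) for k, v in new_cells.items() if k.startswith(prefix)))
```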

Multidimensional Key and Single Cell Model
#6 Create Secondary Indexes
Create secondary indexes for queries that do not follow the hierarchy of the key
For example, you need a secondary index on surgeon if you want to quickly find all operations performed by a surgeon
Secondary indexes slow down inserts, updates, and deletes because every write must also update the index, which is often stored as another copy of the data under a different key
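A sketch of a secondary index on Surgeon cells, maintained by the same put operation that writes the primary cell. The class, the extra hospital (MercyGeneral) and operation 21, and the write counter are hypothetical; the counter makes the write amplification described above visible.

```python
from collections import defaultdict

class IndexedCellStore:
    """Hypothetical cell store with one secondary index on Surgeon cells."""

    def __init__(self):
        self.cells = {}                        # primary key path -> value
        self.surgeon_index = defaultdict(set)  # surgeon -> primary keys of Surgeon cells
        self.writes = 0                        # physical writes, to show the index overhead

    def put(self, key, value):
        old = self.cells.get(key)
        self.cells[key] = value
        self.writes += 1
        if key.endswith("/Surgeon"):
            if old is not None and old != value:
                self.surgeon_index[old].discard(key)  # keep the index consistent on update
            self.surgeon_index[value].add(key)
            self.writes += 1                          # the same put also writes the index

    def operations_by(self, surgeon):
        """Find operations by surgeon without scanning every cell in the table."""
        return {key.rsplit("/", 2)[0] for key in self.surgeon_index.get(surgeon, ())}

store = IndexedCellStore()
for drug_id in ("1997", "2110", "9448"):
    store.put(f"Ops/OpsDrugs/JohnHopkins/13/{drug_id}/Surgeon", "DorothyOz")
    store.put(f"Ops/OpsDrugs/JohnHopkins/13/{drug_id}/DoseUOM", "mg")
store.put("Ops/OpsDrugs/MercyGeneral/21/1997/Surgeon", "DorothyOz")  # hypothetical hospital

print(sorted(store.operations_by("DorothyOz")))
print(store.writes, "physical writes for", len(store.cells), "cells")
```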

Multidimensional Key and Single Cell Model
#7 Materialize Views
Because no joins are possible, create materialized views across multiple tables or column families that materialize the join into a new table
This makes for very fast reads of data across multiple records
This slows inserts, because each insert is repeated multiple times behind the scenes

Base table (excerpt):
DB | Table | Hospital ID | Op ID | Drug ID | Column Type | Time Stamp | Cell Value
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | OperationType | 2014-08-14 | HeartTransplant
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | Surgeon | 2014-08-14 | DorothyOz
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | DrugName | 2014-08-14 | Minicillan

Hospital table:
HospitalID | Hospital Administrator
JohnHopkins | JohnAdams

Drug table:
DrugID | DrugName | DrugSuccessRate
1997 | Minicillan | 89%

Materialized view:
DB | Table | Hospital ID | Op ID | Drug ID | Column Type | Time Stamp | Cell Value
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | OperationType | 2014-08-14 | HeartTransplant
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | Surgeon | 2014-08-14 | DorothyOz
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | DrugName | 2014-08-14 | Minicillan
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | HospitalAdministrator | 2014-08-14 | JohnAdams
Ops | OpsDrugs | JohnHopkins | 13 | 1997 | DrugSuccessRate | 2014-08-14 | 89%
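A sketch of materializing the join at write time, using the hospital and drug reference data from this slide. The insert_operation_drug function and the dictionaries standing in for tables are hypothetical. The returned write count shows the insert amplification: three source cells cost eight writes, but one prefix read of the view returns the hospital and drug attributes with no join.

```python
hospitals = {"JohnHopkins": {"HospitalAdministrator": "JohnAdams"}}
drugs = {"1997": {"DrugName": "Minicillan", "DrugSuccessRate": "89%"}}

ops_drugs = {}       # base table: Ops/OpsDrugs
ops_drugs_view = {}  # pre-joined table that also carries hospital and drug attributes

def insert_operation_drug(hospital, op_id, drug_id, cells):
    """Write cells to the base table, then write the joined cells to the view table.

    Returns the number of cell writes, to make the insert amplification visible.
    """
    prefix = f"Ops/OpsDrugs/{hospital}/{op_id}/{drug_id}"
    writes = 0
    for column, value in cells.items():
        ops_drugs[f"{prefix}/{column}"] = value
        writes += 1
    # The "join" happens at write time, in application code, by looking up reference data.
    joined = dict(cells)
    joined.update(hospitals.get(hospital, {}))
    success = drugs.get(drug_id, {}).get("DrugSuccessRate")
    if success is not None:
        joined["DrugSuccessRate"] = success
    for column, value in joined.items():
        ops_drugs_view[f"{prefix}/{column}"] = value
        writes += 1
    return writes

n = insert_operation_drug("JohnHopkins", "13", "1997",
                          {"OperationType": "HeartTransplant", "Surgeon": "DorothyOz",
                           "DrugName": "Minicillan"})
# One prefix read of the view table returns the joined row.
view_row = {k.rsplit("/", 1)[1]: v for k, v in ops_drugs_view.items()
            if k.startswith("Ops/OpsDrugs/JohnHopkins/13/1997/")}
print(n, view_row)
```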

Multidimensional Key and Single Cell Model
#8 Write Code
The developer writes application code against the database API or DSL to:
create keys
create secondary indexes
put data
delete data
get data
join data
ensure data integrity
(NOTE: joins and data integrity are not part of the database)
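Because joins and data integrity live in the application, code along these lines ends up in the application layer. This is a hypothetical sketch: the dictionaries stand in for tables, and put_dose, delete_drug, doses_with_names, and IntegrityError are illustrative names rather than any database's API.

```python
drugs = {"1997": {"DrugName": "Minicillan"}, "2110": {"DrugName": "Maxicillan"}}
ops_drugs = {}

class IntegrityError(Exception):
    """Raised by application code; the database itself does not enforce references."""

def put_dose(hospital, op_id, drug_id, dose_size, dose_uom):
    # Application-enforced foreign key: the drug must exist before it is referenced.
    if drug_id not in drugs:
        raise IntegrityError(f"unknown drug {drug_id}")
    prefix = f"Ops/OpsDrugs/{hospital}/{op_id}/{drug_id}"
    ops_drugs[f"{prefix}/DoseSize"] = dose_size
    ops_drugs[f"{prefix}/DoseUOM"] = dose_uom

def delete_drug(drug_id):
    # Application-enforced restrict rule: refuse to delete a drug that doses still reference.
    if any(key.split("/")[4] == drug_id for key in ops_drugs):
        raise IntegrityError(f"drug {drug_id} is still referenced")
    drugs.pop(drug_id, None)

def doses_with_names(hospital, op_id):
    """Application-side join: read the dose cells, then look up each drug's name."""
    prefix = f"Ops/OpsDrugs/{hospital}/{op_id}/"
    result = {}
    for key, value in ops_drugs.items():
        if key.startswith(prefix):
            drug_id, column = key[len(prefix):].split("/")
            result.setdefault(drug_id, {"DrugName": drugs[drug_id]["DrugName"]})[column] = value
    return result

put_dose("JohnHopkins", "13", "1997", "200", "mg")
put_dose("JohnHopkins", "13", "2110", "400", "mg")
try:
    put_dose("JohnHopkins", "13", "9999", "1", "mg")
except IntegrityError as err:
    print("rejected:", err)
print(doses_with_names("JohnHopkins", "13"))
```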

Multidim Key/Cell Modeling Exercise
Document ID: 1
Order Number: 1332
Order Date: 2014-08-16
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 801-555-1212
Customer Address: Street, City, State, Postal Code

Product Name | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

PROs:
CONs:

Multidim Key/Cell Modeling Answer
Source: Document ID: 1 (order 1332)
Table | OrderID | Column Type | Cell Value
Orders | 1332 | OrderDate | 2014-08-16
Orders | 1332 | OrderTotalAmount | $40
Orders | 1332 | CustomerName | MikeBowers
Orders | 1332 | CustomerPhone | 801-555-1212
Orders | 1332 | CustomerStreet | 111 My Street
Orders | 1332 | CustomerCity | San Diego
Orders | 1332 | CustomerState | CA
Orders | 1332 | CustomerPostalCode | 92093
Orders | 1332 | Line1ProductName | CSSBook
Orders | 1332 | Line1ProductDescription | CSSandHTMLDesign
Orders | 1332 | Line1ProductPrice | $20
Orders | 1332 | Line1ProductQuantity | 1
Orders | 1332 | Line2ProductName | CSSBook
Orders | 1332 | Line2ProductDescription | HTML5andCSS3Design
Orders | 1332 | Line2ProductPrice | $20
Orders | 1332 | Line2ProductQuantity | 2
Orders | 1332 | Line3ProductName |
Orders | 1332 | Line3ProductDescription |
Orders | 1332 | Line3ProductPrice |
Orders | 1332 | Line3ProductQuantity |
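Shredding the nested order into flat cells, as in the answer above, can be sketched as follows. The nested input layout and the shred function are assumptions. Repeating groups (order lines) become numbered columns, and no cells are written for an absent third line, which is how sparse columns behave.

```python
order = {
    "OrderID": "1332", "OrderDate": "2014-08-16", "OrderTotalAmount": "$40",
    "Customer": {"Name": "MikeBowers", "Phone": "801-555-1212", "Street": "111 My Street",
                 "City": "San Diego", "State": "CA", "PostalCode": "92093"},
    "Lines": [
        {"ProductName": "CSSBook", "ProductDescription": "CSSandHTMLDesign",
         "ProductPrice": "$20", "ProductQuantity": "1"},
        {"ProductName": "CSSBook", "ProductDescription": "HTML5andCSS3Design",
         "ProductPrice": "$20", "ProductQuantity": "2"},
    ],
}

def shred(order):
    """Flatten a nested order into (table, row key, column type, cell value) tuples."""
    row_key = order["OrderID"]
    cells = [("Orders", row_key, "OrderDate", order["OrderDate"]),
             ("Orders", row_key, "OrderTotalAmount", order["OrderTotalAmount"])]
    for field, value in order["Customer"].items():
        cells.append(("Orders", row_key, f"Customer{field}", value))
    # Repeating groups become numbered columns; the column name encodes the line position.
    for n, line in enumerate(order["Lines"], start=1):
        for field, value in line.items():
            cells.append(("Orders", row_key, f"Line{n}{field}", value))
    return cells

cells = shred(order)
for cell in cells:
    print(cell)
```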

Multidimensional Key and Single Cell Value
PROs
Fast puts and gets
Massive scalability
Easy to shard & replicate
Data colocation
Sparsely populated columns
List, Map, and Set data types
CONs
No joins
Must create pre-joined tables
Create a table for each query
Shred JSON into flat columns
No standard query API or language
Immature tools and platform
Hard to integrate and hire

Multidimensional Key/Cell Model
summary
Use for maximum speed and scalability
by hand-tuning application code for queries & inserts
to create Internet-scale applications
Example:
Netflix, Google, LinkedIn, etc.
54

Wide Column
Model
in detail
55

[Figure: Databases ranked by popularity as of 2016-03-14, mapped by data model (relational, newSQL, live analytics, data warehouse, dimensional, doc warehouse, wide column, key/value, document, graph/RDF, and raw big data such as Hadoop) against low-latency operational velocity (hours to microseconds; 0.1Kt to 100Kt) and high-bandwidth analytical volume (GBs to PBs), from more structure (schema) to less structure (schemaless)]

56

Wide Column Model
Multidimensional key plus a cell value
Wide Column

Simple key plus multidimensional value

57

Wide Column Model
#1 Column model

Easier-to-use version of the Multidimensional Key/Cell model

Rows are pivoted into columns in a table

SQL-like query languages make it easy to create tables and to query them

Tuning principles are the same for wide column and multidimensional key models, but logical modeling is different

Joins are not possible

Wide column rows:
Hospital ID | Op ID | Drug ID | Operation Type | Surgeon | Drug Name | Drug MFG | Dose Size | Dose UOM
John Hopkins | 13 | 1997 | Heart Transplant | Dorothy Oz | Minicillan | Drugs R Us | 200 | mg
John Hopkins | 13 | 2110 | Heart Transplant | Dorothy Oz | Maxicillan | Canada4Less Drugs | 400 | mg
John Hopkins | 13 | 9448 | Heart Transplant | Dorothy Oz | Minicillan | Drug USA | 150 | mg

Multidimensional key/cell storage:
Table | Hospital ID | Op ID | Drug ID | Column Type | Time Stamp | Column Value
OpsDrugs | John Hopkins | 13 | 1997 | OperationType | 2014-08-14 | Heart Transplant
OpsDrugs | John Hopkins | 13 | 1997 | Surgeon | 2014-08-14 | Dorothy Oz
OpsDrugs | John Hopkins | 13 | 1997 | DrugName | 2014-08-14 | Minicillan
OpsDrugs | John Hopkins | 13 | 1997 | DrugMFG | 2014-08-14 | Drugs R Us
OpsDrugs | John Hopkins | 13 | 1997 | DoseSize | 2014-08-14 | 200
OpsDrugs | John Hopkins | 13 | 1997 | DoseUOM | 2014-08-14 | mg
OpsDrugs | John Hopkins | 13 | 2110 | OperationType | 2014-08-14 | Heart Transplant
OpsDrugs | John Hopkins | 13 | 2110 | Surgeon | 2014-08-14 | Dorothy Oz
OpsDrugs | John Hopkins | 13 | 2110 | DrugName | 2014-08-14 | Maxicillan
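Pivoting key/cell rows into wide columns can be sketched in Python, assuming the cells arrive as plain (row key, column type, time stamp, value) tuples like the storage table above. The pivot function is a hypothetical helper, and the rule that the newest time stamp wins is an assumption modeled on the last-write-wins behavior of stores such as Cassandra; it is not the exact algorithm of any product.

```python
from collections import defaultdict

# (row key, column type, time stamp, value) cells, as in the key/cell table.
cells = [
    ("1997", "OperationType", "2014-08-14", "Heart Transplant"),
    ("1997", "Surgeon", "2014-08-14", "Dorothy Oz"),
    ("1997", "DrugName", "2014-08-14", "Minicillan"),
    ("2110", "OperationType", "2014-08-14", "Heart Transplant"),
    ("2110", "Surgeon", "2014-08-14", "Dorothy Oz"),
    ("2110", "DrugName", "2014-08-14", "Maxicillan"),
]


def pivot(cells):
    """Pivot key/cell rows into one wide row (a dict) per row key.

    When a column has several versions, the newest time stamp wins
    (ISO-formatted dates compare correctly as strings).
    """
    rows = defaultdict(dict)
    newest = {}
    for row_key, column, stamp, value in cells:
        if stamp >= newest.get((row_key, column), ""):
            newest[(row_key, column)] = stamp
            rows[row_key][column] = value
    return dict(rows)


wide = pivot(cells)
print(wide["2110"])
# {'OperationType': 'Heart Transplant', 'Surgeon': 'Dorothy Oz', 'DrugName': 'Maxicillan'}
```

Because each wide row is reassembled from independent cells, adding a column to one row never touches the others, which is what keeps sparse columns cheap.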
58

Wide Column Model
Option #1
Because no joins are possible, model the "one" side of a one-to-many relationship as the key and the "many" side as the multi-column value

Hospital Name: John Hopkins
Operation Type: Heart Transplant
Operation ID: 13
Surgeon Name: Dorothy Oz
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital ID | Op ID | Operation Type | Surgeon | Drug ID | Drug Name | Drug MFG | Dose Size | Dose UOM
John Hopkins | 13 | Heart Transplant | Dorothy Oz | 1997 | Minicillan | Drugs R Us | 200 | mg
John Hopkins | 13 | Heart Transplant | Dorothy Oz | 2110 | Maxicillan | Canada4Less Drugs | 400 | mg
John Hopkins | 13 | Heart Transplant | Dorothy Oz | 9448 | Minicillan | Drug USA | 150 | mg
59

Wide Column Model
Option #2

Because no joins are possible, model one-to-many relationships as sparsely populated, nested groups of repetitive columns

Hospital Name: John Hopkins
Operation Type: Heart Transplant
Operation ID: 13
Surgeon Name: Dorothy Oz
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital ID | Op ID | Operation Type | Surgeon
John Hopkins | 13 | Heart Transplant | Dorothy Oz
(continued in the same row)
Drug ID1 | Drug Name1 | Drug MFG1 | Dose Size1 | Dose UOM1
1997 | Minicillan | Drugs R Us | 200 | mg
Drug ID2 | Drug Name2 | Drug MFG2 | Dose Size2 | Dose UOM2
2110 | Maxicillan | Canada4Less Drugs | 400 | mg
Drug ID3 | Drug Name3 | Drug MFG3 | Dose Size3 | Dose UOM3
9448 | Minicillan | Drug USA | 150 | mg
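Option #2 can be sketched as a pair of Python helpers, assuming a row is a dict of column name to value. The column names (DrugID1, DrugName1, ...) follow the numbered-suffix convention in the table above; the helper names and the DRUG_FIELDS list are hypothetical.

```python
# Field names for one drug group; the numeric suffix marks its position.
DRUG_FIELDS = ["DrugID", "DrugName", "DrugMFG", "DoseSize", "DoseUOM"]


def to_repeating_columns(key: dict, drugs: list) -> dict:
    """Flatten a one-to-many list into numbered column groups in one row."""
    row = dict(key)
    for n, drug in enumerate(drugs, start=1):
        for field in DRUG_FIELDS:
            row[f"{field}{n}"] = drug[field]
    return row


def from_repeating_columns(row: dict) -> list:
    """Rebuild the list by probing DrugID1, DrugID2, ... until one is missing."""
    drugs = []
    n = 1
    while f"DrugID{n}" in row:
        drugs.append({field: row[f"{field}{n}"] for field in DRUG_FIELDS})
        n += 1
    return drugs


key = {"HospitalID": "John Hopkins", "OpID": 13,
       "OperationType": "Heart Transplant", "Surgeon": "Dorothy Oz"}
drugs = [
    {"DrugID": 1997, "DrugName": "Minicillan", "DrugMFG": "Drugs R Us", "DoseSize": 200, "DoseUOM": "mg"},
    {"DrugID": 2110, "DrugName": "Maxicillan", "DrugMFG": "Canada4Less Drugs", "DoseSize": 400, "DoseUOM": "mg"},
    {"DrugID": 9448, "DrugName": "Minicillan", "DrugMFG": "Drug USA", "DoseSize": 150, "DoseUOM": "mg"},
]
row = to_repeating_columns(key, drugs)
print(row["DrugName2"], row["DoseSize3"])  # Maxicillan 150
```

Reading the group back means probing numbered columns one by one, so this option fits best when the number of repeats is small and bounded; an operation with fewer drugs simply leaves the higher-numbered columns unpopulated.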

60

Wide Column Model
Option #3

Because no joins are possible, create columns with embedded one-to-many relationships as nested UDTs, maps, lists, or sets

Hospital Name: John Hopkins
Operation Type: Heart Transplant
Operation ID: 13
Surgeon Name: Dorothy Oz
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital ID | Op ID | Operation Type | Surgeon | Drugs
John Hopkins | 13 | Heart Transplant | Dorothy Oz | (map below)
Drugs:
{
'drug1':{
drug_name:'Minicillan',
drug_manufacturer:'Drugs R Us',
dose_size:200, dose_uom:'mg'},
'drug2':{
drug_name:'Maxicillan',
drug_manufacturer:'Canada4Less Drugs',
dose_size:400, dose_uom:'mg'},
'drug3':{
drug_name:'Minicillan',
drug_manufacturer:'Drug USA',
dose_size:150, dose_uom:'mg'}
}

61

Wide Column Modeling Exercise
Document ID: 1
Order Number: 1332
Order Date: 2014-08-16
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 801-555-1212
Customer Address: Street
City, State, Postal Code
Product Name | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

PROs

CONs

62

Wide Column Modeling Answer #1
Document ID: 1
Order Number: 1332
Order Date: 2014-08-16
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 801-555-1212
Customer Address: Street
City, State, Postal Code
Product Category | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

Order ID | Order Date | Order Total | Customer Name | Customer Phone
1332 | 2014-08-16 | $40 | Mike Bowers | 801-555-1212
(continued in the same row)
Customer Address
{
street:'111 My Street',
city:'San Diego',
state:'CA',
postal:'92093'
}

Order Items
{
'product1': {
category:'CSS Book',
description:'CSS and HTML Design',
price:20.00,
quantity:1},
'product2': {
category:'CSS Book',
description:'HTML5 and CSS3 Design',
price:20.00,
quantity:2}
}

Cassandra UDTs look somewhat like JSON, but they are not JSON.
They have limited query capabilities.
See "Using User-Defined Types in Cassandra".
63

Wide Column Modeling Answer #2
Document ID: 1
Order Number: 1332
Order Date: 2014-08-16
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 801-555-1212
Customer Address: Street
City, State, Postal Code
Product Category | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

Order ID | Order Line ID | Order Date | Customer Name | Customer Phone
1332 |  | 2014-08-16 | Mike Bowers | 801-555-1212
1332 |  | 2014-08-16 | Mike Bowers | 801-555-1212
(continued)
Customer Street | Customer City | Customer State | Customer Postal
111 My Street | San Diego | CA | 92093
111 My Street | San Diego | CA | 92093
(continued)
Product Category | Product Description | Product Price | Product Quantity
CSS Book | CSS and HTML Design | 20.00 | 1
CSS Book | HTML5 and CSS3 Design | 20.00 | 2

64

Wide Column Modeling Answer #3
Document ID: 1
Order Number: 1332
Order Date: 2014-08-16
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 801-555-1212
Customer Address: Street
City, State, Postal Code
Product Category | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

Product ID | Order ID | Order Line ID | Product Category | Product Description
17 | 1332 |  | CSS Book | CSS and HTML Design
9466 | 1332 |  | CSS Book | HTML5 and CSS3 Design
(continued)
Product Price | Product Quantity | Order Date | Customer Name
20.00 | 1 | 2014-08-16 | Mike Bowers
20.00 | 2 | 2014-08-16 | Mike Bowers
(continued)
Customer Phone | Customer Street | Customer City | Customer State | Customer Postal
801-555-1212 | 111 My Street | San Diego | CA | 92093
801-555-1212 | 111 My Street | San Diego | CA | 92093
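Answers #2 and #3 store the same order twice, keyed for two different queries. A minimal Python sketch, assuming in-memory dicts stand in for two wide-column tables (orders_by_order_id and orders_by_product_id are hypothetical names), shows the pattern: denormalize once, then write every pre-joined row to every query table.

```python
# One order as the application sees it.
order = {
    "orderId": "1332", "orderDate": "2014-08-16", "customerName": "Mike Bowers",
    "lines": [
        {"productId": "17", "productDescription": "CSS and HTML Design",
         "productPrice": 20.00, "productQuantity": 1},
        {"productId": "9466", "productDescription": "HTML5 and CSS3 Design",
         "productPrice": 20.00, "productQuantity": 2},
    ],
}

# Hypothetical tables, one per query, each partitioned by the key it is queried on.
orders_by_order_id = {}    # Answer #2: "get the lines of order X"
orders_by_product_id = {}  # Answer #3: "get the orders that contain product Y"


def denormalize(order: dict) -> list:
    """Copy the order header into every line: one pre-joined row per line."""
    header = {k: v for k, v in order.items() if k != "lines"}
    return [{**header, **line} for line in order["lines"]]


def write_order(order: dict) -> None:
    """The application, not the database, writes each row to each query table."""
    for row in denormalize(order):
        orders_by_order_id.setdefault(row["orderId"], []).append(row)
        orders_by_product_id.setdefault(row["productId"], []).append(dict(row))


write_order(order)
print(len(orders_by_order_id["1332"]))                   # 2
print(orders_by_product_id["9466"][0]["customerName"])   # Mike Bowers
```

The price of fast single-partition reads is write amplification and keeping the copies consistent in application code, which is the model-by-query trade-off of the wide column model.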
65

Wide Column
PROs
Table-like, with SQL-like queries
Fast puts and gets
Massive scalability
Easy to shard & replicate
Data co-location
Sparsely populated columns
List, Map, and Set data types
CONs
Model by query: create a table per query
No joins: must create pre-joined tables
Shred JSON into flat columns or flat maps
No standard query API or language
Immature tools and platform
Hard to integrate and hire


66

Wide Column Model
summary
Use for maximum speed and scalability
with SQL-like code for all operations
to create Internet-scale applications
Example:
Apple, Netflix, Google, LinkedIn, etc.

67

Key/Value
Model
in detail
68

Simple Key/Multidimensional Value Model
Multidimensional key plus a cell value
Wide Column

Simple key plus multidimensional value

69


70

Simple Key/Multidimensional Value
#1 Model Transactions

Hospital Name: John Hopkins
Operation Type: Heart Transplant
Operation ID: 13
Surgeon Name: Dorothy Oz
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Key | Value Type | Hospital | Op ID | Drug ID | Time Stamp | Operation Type | Surgeon | Drug Name | Drug MFG | Dose Size | Dose UOM
1 | OpsDrugs | John Hopkins | 13 | 1997 | 2014-08-14 | Heart Transplant | Dorothy Oz | Minicillan | Drugs R Us | 200 | mg
2 | OpsDrugs | John Hopkins | 13 | 2110 | 2014-08-14 | Heart Transplant | Dorothy Oz | Maxicillan | Canada4Less Drugs | 400 | mg
3 | OpsDrugs | John Hopkins | 13 | 9448 | 2014-08-14 | Heart Transplant | Dorothy Oz | Minicillan | Drug USA | 150 | mg

Create a denormalized, flat data structure that holds each attribute in the transaction
Each record will be accessed through a simple, meaningless key
71

Simple Key/Multidimensional Value
#2 Create Secondary Indexes
Determine which attributes need to be queried and create secondary indexes on them
Primary and secondary indexes are for fast queries within records.
They do not make it possible to do joins between records.
Key/Value databases limit the number of secondary indexes:
a DynamoDB table can have up to 5 global secondary indexes and 5 local secondary indexes (limits as of 2016)
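A secondary index maps attribute values back to primary keys. In a real key/value database the database maintains it (DynamoDB's global and local secondary indexes, for example); this Python sketch, with a hypothetical build_secondary_index helper and in-memory records, only illustrates the idea and its limit.

```python
from collections import defaultdict

# Records keyed by a simple, meaningless primary key (as in the table above).
records = {
    1: {"opId": 13, "drugId": 1997, "drugName": "Minicillan", "doseSize": 200},
    2: {"opId": 13, "drugId": 2110, "drugName": "Maxicillan", "doseSize": 400},
    3: {"opId": 13, "drugId": 9448, "drugName": "Minicillan", "doseSize": 150},
}


def build_secondary_index(records: dict, attribute: str) -> dict:
    """Map each value of `attribute` to the set of primary keys holding it."""
    index = defaultdict(set)
    for key, record in records.items():
        index[record[attribute]].add(key)
    return index


by_drug_name = build_secondary_index(records, "drugName")
# The index finds matching records quickly, but it only returns keys in this
# record set; it cannot join them to other records, such as drug reference data.
minicillan = [records[k] for k in sorted(by_drug_name["Minicillan"])]
print([r["doseSize"] for r in minicillan])  # [200, 150]
```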
72

Simple Key/Multidimensional Value
#3 Write Code
Developer writes application code against the database API or DSL to create keys, create secondary indexes, put records, delete records, get records, join records, and ensure data integrity
Joins and data integrity have to be done in application code because the database can't do them
To do joins in the application, you retrieve a record, read any embedded IDs to other records, and then retrieve those records.
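The application-side join described above can be sketched in Python, assuming a dict stands in for the key/value store and get is its only read call. The key prefixes (op:, drug:) and the drugIds field are hypothetical; the point is that each referenced record costs another round trip and that dangling IDs are the application's problem.

```python
# Hypothetical key/value store: every value is reached only through get(key).
store = {
    "op:1": {"hospitalName": "John Hopkins", "operationNumber": 13,
             "drugIds": ["drug:10000", "drug:20000", "drug:30000"]},
    "drug:10000": {"drugName": "Minicillan", "drugManufacturer": "Drugs R Us"},
    "drug:20000": {"drugName": "Maxicillan", "drugManufacturer": "Canada4Less"},
    "drug:30000": {"drugName": "Minicillan", "drugManufacturer": "Drug USA"},
}


def get(key: str):
    """Stand-in for the database's get-by-key call (one round trip)."""
    return store.get(key)


def get_operation_with_drugs(op_key: str) -> dict:
    """Application-side join: get a record, read its embedded IDs, get those."""
    operation = get(op_key)
    if operation is None:
        raise KeyError(op_key)
    drugs, missing = [], []
    for drug_key in operation["drugIds"]:
        drug = get(drug_key)
        if drug is None:
            # No referential integrity: the database will not stop dangling IDs.
            missing.append(drug_key)
        else:
            drugs.append(drug)
    return {**operation, "drugs": drugs, "missingIds": missing}


joined = get_operation_with_drugs("op:1")
print([d["drugName"] for d in joined["drugs"]])  # ['Minicillan', 'Maxicillan', 'Minicillan']
```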

Key/Value databases, such as DynamoDB, are starting to support JSON-like values, which makes them more like Document Databases

{"_id": 1,
"_type":"Operation",
"operation":{
"hospitalName":"John Hopkins",
"operationTypeName":"Heart Transplant",
"surgeonName":"Dorothy Oz",
"operationNumber":13,
"administeredDrugs":[
{"drugName":"Minicillan", "drugManufacturer":"Drugs R Us", "drugDoseSize":200, "drugDoseUOM":"mg"},
{"drugName":"Maxicillan", "drugManufacturer":"Canada4Less", "drugDoseSize":400, "drugDoseUOM":"mg"},
{"drugName":"Minicillan", "drugManufacturer":"Drug USA", "drugDoseSize":150, "drugDoseUOM":"mg"}]}}
73

Simple Key Modeling Exercise
Document ID: 1
Order Number: 1332
Order Date: 2014-08-16
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 801-555-1212
Customer Address: Street
City, State, Postal Code
Product Name | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

PROs

CONs

74

Simple Key Modeling Answer #1
Document ID: 1
Order Number: 1332
Order Date: 2014-08-16
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 801-555-1212
Customer Address: Street
City, State, Postal Code
Product Category | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

ID | Order ID | Order Line ID | Order Date | Customer Name | Customer Phone
  | 1332 |  | 2014-08-16 | Mike Bowers | 801-555-1212
  | 1332 |  | 2014-08-16 | Mike Bowers | 801-555-1212
(continued)
Customer Street | Customer City | Customer State | Customer Postal
111 My Street | San Diego | CA | 92093
111 My Street | San Diego | CA | 92093
(continued)
Product Category | Product Description | Product Price | Product Quantity
CSS Book | CSS and HTML Design | 20.00 | 1
CSS Book | HTML5 and CSS3 Design | 20.00 | 2
75

Simple Key Answer #2: JSON-like Types
{
"id":"1332",
"type":"order",
"orderDate":"2014-08-16",
"customerName":"Mike Bowers",
"customerPhone":"801-555-1212",
"customerAddress":{
"customerAddressStreet":"111 My Street",
"customerAddressCity":"San Diego",
"customerAddressState":"CA",
"customerAddressPostalCode":"92093"
},
"product":[
{"productCategory":"CSS Book",
"productDescription":"CSS and HTML Design",
"productPrice":20.00,
"productQuantity":1
},
{"productCategory":"CSS Book",
"productDescription":"HTML5 and CSS3 Design",
"productPrice":20.00,
"productQuantity":2
}
]
}

Document ID: 1
Order Number: 1332
Order Date: 2014-08-16
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 801-555-1212
Customer Address: Street
City, State, Postal Code
Product Category | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

76


Simple Key and Multidimensional Value

PROs
Fast puts and gets
Massive scalability
Inexpensive
Data in transactional context
Developer in control
Easy to shard & replicate
Very simple to model

CONs
No joins; app implements joins
No referential integrity
Secondary indexes required to query values
Shred JSON value into flat columns or JSON-like types
Design queries using non-standard API or query language
Cannot query for relevance
Immature tools and platform
Hard to integrate and hire expertise

77

Key/Value
summary
Use when you need maximum speed
to retrieve a set of values by key
Typically the value is a blob or a set of flat values
A document DB is better than a simple key/value DB
because JSON & XML are true object structures
78

Document Modeling
in detail

79


80

What is a document?
A document is a nested structure referenced by key
{"_id":"1",
"_type":"Operation",
"operation":{
"hospitalName":"John Hopkins",
"operationTypeName":"Heart Transplant",
"surgeonName":"Dorothy Oz",
"operationNumber":13,
"administeredDrugs":[
{"drugName":"Minicillan","drugManufacturer":"Drugs R Us","drugDoseSize":200,"drugDoseUOM":"mg"},
{"drugName":"Maxicillan","drugManufacturer":"Canada4Less","drugDoseSize":400,"drugDoseUOM":"mg"},
{"drugName":"Minicillan","drugManufacturer":"Drug USA","drugDoseSize":150,"drugDoseUOM":"mg"}
],
"relations":{
"values":[
{"subject":"1","predicate":"opHospital","object":"10","hospitalAddress":"1057 Mayberry"},
{"subject":"1","predicate":"opType","object":"100","insuranceCode":21187},
{"subject":"1","predicate":"opSurgeon","object":"1000","surgeonSuccessRate":0.87},
{"subject":"1","predicate":"opDrug","object":"10000","drugEfficacy":0.8,"drugRecalls":1},
{"subject":"1","predicate":"opDrug","object":"20000","drugEfficacy":0.5,"drugRecalls":3},
{"subject":"1","predicate":"opDrug","object":"30000","drugEfficacy":0.7,"drugRecalls":1}
]}}}

81

JSON vs. XML
JSON
{"section":{
"heading":"Data Models",
"paragraphs":[
{"paragraph":[
{"s":"This paper shows."}]},
{"paragraph":[
"The",
{"i":"relational"},
"model is no longer,",
{"br":null},
"the only game in town."]}]}}

XML
<section>
<heading>Data Models</heading>
<paragraph>
This paper shows.</paragraph>
<paragraph>
The
<i>relational</i>
model is no longer,
<br/>
the only game in town. </paragraph></section>

JSON
1. Best for structured data (text poured into objects)
2. No document type and immature schemas
3. Objects, arrays, floats, strings, booleans, nulls
4. No namespaces, no comments, no attributes
5. Easy, simple, compact, and fast to parse

XML
1. Best for structured text (structure added on top of text)
2. Document types with optional mature schemas
3. Objects, sets, all data types: dates, durations, integers, etc.
4. Namespaces, comments, attributes
5. Attributes add metadata; namespaces embed object types
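The JSON-vs-XML contrast above can be checked with Python's standard library. This sketch reuses the two sample documents from the slide (reflowed onto fewer lines) and shows one practical difference: XML mixed content keeps text flowing around inline elements such as <i> and <br/>, while the JSON version needs an array of strings plus wrapper objects and a custom walker (json_text, a hypothetical helper) to get the same sentence back.

```python
import json
import xml.etree.ElementTree as ET

JSON_DOC = """{"section": {"heading": "Data Models", "paragraphs": [
  {"paragraph": [{"s": "This paper shows."}]},
  {"paragraph": ["The", {"i": "relational"}, "model is no longer,",
                 {"br": null}, "the only game in town."]}]}}"""

XML_DOC = """<section><heading>Data Models</heading>
<paragraph>This paper shows.</paragraph>
<paragraph>The <i>relational</i> model is no longer,<br/> the only game in town.</paragraph>
</section>"""


def json_text(node) -> str:
    """Flatten a JSON run into plain text; markup objects wrap their text."""
    if isinstance(node, str):
        return node
    if isinstance(node, list):
        parts = (json_text(n) for n in node)
    elif isinstance(node, dict):
        parts = (json_text(v) for v in node.values())
    else:  # null, as in {"br": null}, contributes no text
        return ""
    return " ".join(p for p in parts if p)


def xml_text(elem: ET.Element) -> str:
    """XML mixed content: element text and tails interleave with inline tags."""
    return "".join(elem.itertext())


section = json.loads(JSON_DOC)["section"]
json_paras = [json_text(p["paragraph"]) for p in section["paragraphs"]]
xml_paras = [xml_text(p) for p in ET.fromstring(XML_DOC).iter("paragraph")]

print(json_paras == xml_paras)  # True
print(xml_paras[1])             # The relational model is no longer, the only game in town.
```

The JSON walker has to decide where spaces go, which is the kind of convention XML's text and tail model already provides for structured text.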
82

Document Modeling
#1 Model Transactions
Each document is a transaction
JSON data in the application is the JSON data in the database
Document ID is the primary key
Each document includes all data captured during the transaction
Each document is historically accurate for its point in time
Use secondary indexes to flatten structure to make queries flexible
Use search indexes to find the most relevant documents

Document ID: 1
Hospital Name: John Hopkins
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz
Operation Number: 13
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Document modeling is the opposite of relational modeling: start denormalized and normalize to match transaction patterns
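"Use secondary indexes to flatten structure" can be illustrated by turning a nested document into path/value pairs, roughly what a document database's index sees. The path syntax and the choice to drop array positions are assumptions for this Python sketch, not the index format of any particular product.

```python
import json

operation = json.loads("""
{"_id": "1", "_type": "Operation",
 "operation": {"hospitalName": "John Hopkins", "operationNumber": 13,
   "administeredDrugs": [
     {"drugName": "Minicillan", "drugDoseSize": 200},
     {"drugName": "Maxicillan", "drugDoseSize": 400}]}}
""")


def flatten(node, path=""):
    """Yield (path, value) pairs for every leaf of a JSON document.

    Array positions are left out of the path, so every drug name lands
    under the same path and one index entry per path/value pair can
    answer "which documents mention Maxicillan?".
    """
    if isinstance(node, dict):
        for name, child in node.items():
            yield from flatten(child, f"{path}/{name}")
    elif isinstance(node, list):
        for child in node:
            yield from flatten(child, path)
    else:
        yield path, node


for path, value in flatten(operation):
    print(path, "=", value)
```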
83

Document Modeling
#2 Create Reference Docs
Create additional document types for each type of reference data, such as Hospitals, Operation Types, Surgeons, and Drugs

Hospital ID | Hospital Name
10 | John Hopkins
20 | Boston Children's

Operation Type ID | Operation Type Name
100 | Heart Transplant
200 | Appendectomy

Surgeon ID | Surgeon Name
1000 | Dorothy Oz
2000 | Van Tristic

Drug ID | Drug Name | Drug Manufacturer
10000 | Minicillan | Drugs R Us
20000 | Maxicillan | Canada4Less Drugs
30000 | Minicillan | Drug USA

Document ID: 1
Hospital Name: John Hopkins
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz
Operation Number: 13
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

84

Document Modeling
#3a Connect References

Best: Use triples to bidirectionally connect each transaction doc to its reference docs

You can query document relationships

You can join documents in the database and return the results of the join

Subject (Trans Doc) | Predicate (Relationship) | Object (Ref Doc)
1 | SurgeryInHospital | 10
1 | SurgeryOperationType | 100
1 | SurgerySurgeon | 1000
1 | SurgeryDrugsGiven | 10000
1 | SurgeryDrugsGiven | 20000
1 | SurgeryDrugsGiven | 30000

Document ID: 1
Hospital Name: John Hopkins
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz
Operation Number: 13
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
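Bidirectional connection falls out of the triple table above: the same triples answer "which drugs did operation 1 get?" and "which operations used drug 20000?". This Python sketch scans a list, whereas a triple store would use its own indexes; the second operation (doc 2) and the helper names objects_of and subjects_of are hypothetical.

```python
# (subject, predicate, object) triples from the table above, plus one
# hypothetical second operation (doc 2) that also used drug 20000.
triples = [
    ("1", "SurgeryInHospital", "10"),
    ("1", "SurgeryOperationType", "100"),
    ("1", "SurgerySurgeon", "1000"),
    ("1", "SurgeryDrugsGiven", "10000"),
    ("1", "SurgeryDrugsGiven", "20000"),
    ("1", "SurgeryDrugsGiven", "30000"),
    ("2", "SurgeryDrugsGiven", "20000"),  # hypothetical
]
reference_docs = {
    "10000": {"drugName": "Minicillan"},
    "20000": {"drugName": "Maxicillan"},
    "30000": {"drugName": "Minicillan"},
}


def objects_of(subject: str, predicate: str) -> list:
    """Forward: transaction doc -> reference docs."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects_of(predicate: str, obj: str) -> list:
    """Backward: reference doc -> transaction docs, using the same triples."""
    return [s for s, p, o in triples if p == predicate and o == obj]


# A join: operation 1 with the names of the drugs it was given.
drugs_given = [reference_docs[o]["drugName"] for o in objects_of("1", "SurgeryDrugsGiven")]
print(drugs_given)                                # ['Minicillan', 'Maxicillan', 'Minicillan']
print(subjects_of("SurgeryDrugsGiven", "20000"))  # ['1', '2']
```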

85

Document Modeling
#3b Connect References

OK: Use document references to unidirectionally connect each transaction document to its reference documents.

This makes it easy to create links from a transaction document to its reference documents.

If you index each reference ID, then queries for referenced documents can be fast

This is not a join: your application code reads in a document, finds referenced documents, and retrieves them one by one

Document ID: 1
Hospital Name: John Hopkins (Hospital ID 10)
Operation Type: Heart Transplant (Operation Type ID 100)
Surgeon Name: Dorothy Oz
Operation Number: 13
Drug ID | Drug Name | Drug Manufacturer | Dose Size | Dose UOM
10000 | Minicillan | Drugs R Us | 200 | mg
20000 | Maxicillan | Canada4Less Drugs | 400 | mg
30000 | Minicillan | Drug USA | 150 | mg

Reference documents:
Hospital ID | Hospital Name
10 | John Hopkins
Operation Type ID | Operation Type Name
100 | Heart Transplant
Drug ID | Drug Name | Drug Manufacturer
10000 | Minicillan | Drugs R Us
20000 | Maxicillan | Canada4Less Drugs
30000 | Minicillan | Drug USA

86

Document Modeling
#4 Sync References

Optionally choose to synchronize changes in reference documents into transaction documents.
To preserve history, add changes to new elements.
To maximize integrity, overwrite the original.

Changed reference documents:
Hospital ID | Hospital Name
10 | JH Research
Operation Type ID | Operation Type Name
100 | Cardiac Transplant
Surgeon ID | Surgeon Name
1000 | Dorothy Wiz
Drug ID | Drug Name | Drug Manufacturer
20000 | Maxicillan | Best Drugs

Document ID: 1 (original value | synchronized value)
Hospital Name: John Hopkins | JH Research
Operation Type: Heart Transplant | Cardiac Transplant
Surgeon Name: Dorothy Oz | Dorothy Wiz
Operation Number: 13
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Best Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
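The two sync policies can be sketched in Python. The helper sync_reference and the convention of adding the change under a field name with a Current suffix are hypothetical; the slide only says to add changes to new elements (to preserve history) or to overwrite the original (to maximize integrity).

```python
import copy


def sync_reference(txn_doc: dict, field: str, new_value, preserve_history: bool) -> dict:
    """Copy a changed reference value into a transaction document."""
    doc = copy.deepcopy(txn_doc)  # leave the caller's document untouched
    if preserve_history:
        # Keep the value captured at transaction time; add the change alongside it.
        doc[field + "Current"] = new_value
    else:
        # Overwrite so every document agrees with the reference data.
        doc[field] = new_value
    return doc


operation = {"_id": "1", "hospitalName": "John Hopkins",
             "operationTypeName": "Heart Transplant",
             "surgeonName": "Dorothy Oz", "operationNumber": 13}
kept = sync_reference(operation, "hospitalName", "JH Research", preserve_history=True)
overwritten = sync_reference(operation, "hospitalName", "JH Research", preserve_history=False)
print(kept["hospitalName"], "|", kept["hospitalNameCurrent"])  # John Hopkins | JH Research
print(overwritten["hospitalName"])                              # JH Research
```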
87

Document Modeling
#5 Projections

Optionally choose to project values from reference documents into transaction documents.
For maximum read and search performance, project data during writes.
For maximum write performance, project data during reads.

Drug ID | Drug Name | Drug Manufacturer | Drug Efficacy | Drug Recalls
10000 | Minicillan | Drugs R Us | 80% | 1
20000 | Maxicillan | Best Drugs | 50% | 3
30000 | Minicillan | Drug USA | 70% | 1

Document ID: 1
Hospital Name: John Hopkins
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz
Operation Number: 13
Drug Name | Drug Manufacturer | Drug Efficacy | Drug Recalls | Dose Size | Dose UOM
Minicillan | Drugs R Us | 80% | 1 | 200 | mg
Maxicillan | Best Drugs | 50% | 3 | 400 | mg
Minicillan | Drug USA | 70% | 1 | 150 | mg
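Write-time versus read-time projection can be sketched with a hypothetical in-memory DocStore in Python. The efficacy and recall values come from the reference table above; the drugId field on each administered drug and the class itself are assumptions for illustration, not a document database API.

```python
import copy

# Reference drug documents (values from the table above).
drug_refs = {
    "10000": {"drugEfficacy": 0.8, "drugRecalls": 1},
    "20000": {"drugEfficacy": 0.5, "drugRecalls": 3},
    "30000": {"drugEfficacy": 0.7, "drugRecalls": 1},
}


def project(txn_doc: dict) -> dict:
    """Return a copy of txn_doc with reference values copied into each drug."""
    doc = copy.deepcopy(txn_doc)
    for drug in doc["administeredDrugs"]:
        drug.update(drug_refs[drug["drugId"]])
    return doc


class DocStore:
    """Hypothetical store that projects either on write or on read."""

    def __init__(self, project_on_write: bool):
        self.project_on_write = project_on_write
        self.docs = {}

    def write(self, doc: dict) -> None:
        # Write-time projection: pay once; reads and searches see stored values.
        self.docs[doc["_id"]] = project(doc) if self.project_on_write else copy.deepcopy(doc)

    def read(self, doc_id: str) -> dict:
        doc = self.docs[doc_id]
        # Read-time projection: cheap writes, but every read does the lookups.
        return copy.deepcopy(doc) if self.project_on_write else project(doc)


operation = {"_id": "1", "operationNumber": 13, "administeredDrugs": [
    {"drugId": "10000", "drugName": "Minicillan", "drugDoseSize": 200},
    {"drugId": "20000", "drugName": "Maxicillan", "drugDoseSize": 400},
    {"drugId": "30000", "drugName": "Minicillan", "drugDoseSize": 150},
]}
on_write, on_read = DocStore(True), DocStore(False)
on_write.write(operation)
on_read.write(operation)
print(on_write.read("1") == on_read.read("1"))  # True
```

One consequence worth noting: with read-time projection a later change to a reference document shows up on the next read, while write-time projection keeps the values as of the write until the documents are re-synced.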

88

Why is the document model
best for developer productivity?
JSON is the lingua franca of the web
JSON REST web services are modeled first (relational is an afterthought)
Documents support agile development without a schema
Documents handle rapidly changing requirements
Documents handle deeply hierarchical, complex, and highly variable structures
Documents have no impedance mismatch between application and database
Documents make search relevance possible
Full-text search in the context of document structure
Full-featured queries of any data anywhere in a document
89

Document Modeling Exercise
Document ID: 1
Order Number: 1332
Order Date: 2014-08-16
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 801-555-1212
Customer Address: Street
City, State, Postal Code
Product Name | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

PROs

CONs

90

Document Modeling Answer
Document ID: 1
Order Number: 1332
Order Date: 2014-08-16
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 801-555-1212
Customer Address: Street
City, State, Postal Code
Product Name | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

{
"id":"1332",
"type":"order",
"orderDate":"2014-08-16",
"customerName":"Mike Bowers",
"customerPhone":"801-555-1212",
"customerAddress":{
"customerAddressStreet":"111 My Street",
"customerAddressCity":"San Diego",
"customerAddressState":"CA",
"customerAddressPostalCode":"92093"
},
"product":[
{"productCategory":"CSS Book",
"productDescription":"CSS and HTML Design",
"productPrice":20.00,
"productQuantity":1
},
{"productCategory":"CSS Book",
"productDescription":"HTML5 and CSS3 Design",
"productPrice":20.00,
"productQuantity":2
}
]
}

91

Document Model
PROs
Fastest development
Index everything, query anything
Self-service, ad hoc queries
Schemaless: design data at runtime
JSON and/or XML data structures
Queries everything in context with relevance
Turns data into information
CONs
Defensive programming for schemaless data
Expensive platforms, immature tools
Non-standard query languages
Not as fast as wide column and simple key/value databases

Simple Key/Document Value
{
"id":"1332",
"type":"order",
"orderDate":"2014-08-16",
"customerName":"Mike Bowers",
"customerPhone":"801-555-1212",
"customerAddress":{
"customerAddressStreet":"111 My Street",
"customerAddressCity":"San Diego",
"customerAddressState":"CA",
"customerAddressPostalCode":"92093"
},
"product":[
{"productCategory":"CSS Book",
"productDescription":"CSS and HTML Design",
"productPrice":20.00,
"productQuantity":1
},
{"productCategory":"CSS Book",
"productDescription":"HTML5 and CSS3 Design",
"productPrice":20.00,
"productQuantity":2
}
]
}

92

Document Model
summary
Use when you need maximum developer productivity
and great speed and scalability
Example:
Enterprise applications, websites, etc.
93

Document Model
tip
Use JSON for objects
Use XML for text
(to mark up structure, semantics, and data)
94

Graph Modeling
in detail

95


96

What is a Triple?
A triple is three IDs: subject, predicate, object
{"triples":[
{"subject":"docID_1","predicate":"opDrug","object":"docID_10000"},
{"subject":"docID_1","predicate":"opDrug","object":"docID_20000"},
{"subject":"docID_1","predicate":"opDrug","object":"docID_30000"}]}

Subject
It is the focus of the triple
It is a URI unique to the database

Predicate
Specifies the relationship between the subject and the object
It is a URI typically defined by external ontologies

Object
Specifies the target of the subject's relationship
It is a value or a URI: typically the URI of another subject, or a string, number, date, etc.

97

What is a Quad?
A Quad is a Triple prefixed with the collection it belongs to

[Diagram: Surgeon, Operation, Person, and Hospital nodes connected by "excels at", "performed", "operated on", "operated at", "patient at", and "works at" relationships]

{"quads":[
{"collection":"HospitalOps","subject":"surgeonDoc1","predicate":"excelsAt","object":"operationDoc13"},
{"collection":"HospitalOps","subject":"surgeonDoc1","predicate":"performed","object":"operationDoc13"},
{"collection":"HospitalOps","subject":"surgeonDoc1","predicate":"operatedOn","object":"userDoc1554"},
{"collection":"HospitalOps","subject":"surgeonDoc1","predicate":"worksAt","object":"hospitalDoc10"},
{"collection":"HospitalOps","subject":"operationDoc13","predicate":"requestingUser","object":"userDoc1554"},
{"collection":"HospitalOps","subject":"operationDoc13","predicate":"operatedAt","object":"hospitalDoc10"},
{"collection":"HospitalOps","subject":"userDoc1554","predicate":"patientAt","object":"hospitalDoc10"}]}
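Because each quad carries its collection, a store can group or filter triples by collection before querying them. A Python sketch over the quads above; the Billing quad is hypothetical, added only to show a second collection, and by_collection is an illustrative name rather than any database API.

```python
from collections import defaultdict

# (collection, subject, predicate, object) quads from the slide,
# plus one hypothetical quad in a second collection.
quads = [
    ("HospitalOps", "surgeonDoc1", "excelsAt", "operationDoc13"),
    ("HospitalOps", "surgeonDoc1", "performed", "operationDoc13"),
    ("HospitalOps", "surgeonDoc1", "operatedOn", "userDoc1554"),
    ("HospitalOps", "surgeonDoc1", "worksAt", "hospitalDoc10"),
    ("HospitalOps", "operationDoc13", "requestingUser", "userDoc1554"),
    ("HospitalOps", "operationDoc13", "operatedAt", "hospitalDoc10"),
    ("HospitalOps", "userDoc1554", "patientAt", "hospitalDoc10"),
    ("Billing", "operationDoc13", "billedTo", "userDoc1554"),  # hypothetical
]

# Strip the collection prefix, grouping the remaining triples by collection.
by_collection = defaultdict(list)
for collection, subject, predicate, obj in quads:
    by_collection[collection].append((subject, predicate, obj))

# Query one collection only: how is operationDoc13 linked to userDoc1554 there?
links = [p for s, p, o in by_collection["HospitalOps"]
         if s == "operationDoc13" and o == "userDoc1554"]
print(len(by_collection["HospitalOps"]), len(by_collection["Billing"]))  # 7 1
print(links)  # ['requestingUser']
```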

98

Triples Deconstruct Data Atomically
Triples break down data into singular items identified by IDs

An item only has meaning when it is related to other items or simple data

This is like deconstructing data into electrons, neutrons, and protons so that you can reconstruct any type of atom, and then combine atoms into molecules, and combine molecules into compounds, etc.
99

Triples Focus on Relationships, not Data
The primary focus of triples is on relationships between items:
Traversing a network of relationships
Finding items that have the same relationship patterns

Getting any information about an item requires querying its relationships to other items
To make this easier, some triple databases allow items to have properties or allow items to be documents
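Traversing a network of relationships can be sketched as a breadth-first walk over triples. This Python example reuses the triples from the "What is a Quad?" slide (collection dropped) and adds one hypothetical triple (hospitalDoc10 partOf healthSystemDoc2) so that some items are two hops away; traverse is a hypothetical helper, not a graph database API.

```python
from collections import deque

# Triples from the "What is a Quad?" example, plus one hypothetical triple.
triples = [
    ("surgeonDoc1", "excelsAt", "operationDoc13"),
    ("surgeonDoc1", "performed", "operationDoc13"),
    ("surgeonDoc1", "operatedOn", "userDoc1554"),
    ("surgeonDoc1", "worksAt", "hospitalDoc10"),
    ("operationDoc13", "requestingUser", "userDoc1554"),
    ("operationDoc13", "operatedAt", "hospitalDoc10"),
    ("userDoc1554", "patientAt", "hospitalDoc10"),
    ("hospitalDoc10", "partOf", "healthSystemDoc2"),  # hypothetical
]


def traverse(start: str) -> dict:
    """Breadth-first walk along outgoing relationships.

    Returns every reachable item with its distance in hops from start.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for subject, _predicate, obj in triples:
            if subject == item and obj not in hops:
                hops[obj] = hops[item] + 1
                queue.append(obj)
    del hops[start]
    return hops


print(traverse("userDoc1554"))  # {'hospitalDoc10': 1, 'healthSystemDoc2': 2}
```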

100

Connecting Triples and Documents
Neo4j is a property graph database
It provides properties on subjects, predicates, and objects
(i.e., nodes and relationships)

MarkLogic is an RDF semantic graph database
It allows subjects, predicates, and objects to be references to documents, and it allows documents to contain embedded triples and projections of triple data

OrientDB connects documents using triples
OrientDB connectsdocumentsusingtriples
101

Embedding Triples in a Document
Use triples to relate documents bidirectionally
{"_id":"1",
"_type":"Operation",
"operation":{
"hospitalName":"John Hopkins",
"operationTypeName":"Heart Transplant",
"surgeonName":"Dorothy Oz",
"operationNumber":13,
"administeredDrugs":[
{"drugName":"Minicillan","drugManufacturer":"Drugs R Us","drugDoseSize":200,"drugDoseUOM":"mg"},
{"drugName":"Maxicillan","drugManufacturer":"Canada4Less","drugDoseSize":400,"drugDoseUOM":"mg"},
{"drugName":"Minicillan","drugManufacturer":"Drug USA","drugDoseSize":150,"drugDoseUOM":"mg"}
],
"relations":{
"values":[
{"subject":"1","predicate":"opHospital","object":"10"},
{"subject":"1","predicate":"opType","object":"100"},
{"subject":"1","predicate":"opSurgeon","object":"1000"},
{"subject":"1","predicate":"opDrug","object":"10000"},
{"subject":"1","predicate":"opDrug","object":"20000"},
{"subject":"1","predicate":"opDrug","object":"30000"}
]}}}
102

Projecting Document Values into Triples
At write time or read time, you can project values from related items (such as drug efficacy and recalls) into the triples embedded in a document

Document ID: 1
Hospital Name: John Hopkins
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz
Operation Number: 13

Drug ID | Drug Name | Drug Manufacturer | Dose Size | Dose UOM
10000 | Minicillan | Drugs R Us | 200 | mg
20000 | Maxicillan | Canada4Less | 400 | mg
30000 | Minicillan | Drug USA | 150 | mg

Drug ID | Drug Efficacy | Drug Recalls
10000 | 80% | 1
20000 | 50% | 3
30000 | 70% | 1

{"_id": "1",
 "_type": "Operation",
 "operation": {
   "hospitalName": "John Hopkins",
   "operationTypeName": "Heart Transplant",
   "surgeonName": "Dorothy Oz",
   "operationNumber": 13,
   "administeredDrugs": [
     {"drugName": "Minicillan", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg"},
     {"drugName": "Maxicillan", "drugManufacturer": "Canada4Less", "drugDoseSize": 400, "drugDoseUOM": "mg"},
     {"drugName": "Minicillan", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg"}
   ],
   "relations": {
     "values": [
       {"subject": "1", "predicate": "opHospital", "object": "10", "hospitalAddress": "1057 Mayberry"},
       {"subject": "1", "predicate": "opType", "object": "100", "insuranceCode": 21187},
       {"subject": "1", "predicate": "opSurgeon", "object": "10000", "surgeonSuccessRate": 0.87},
       {"subject": "1", "predicate": "opDrug", "object": "10000", "drugEfficacy": 0.8, "drugRecalls": 1},
       {"subject": "1", "predicate": "opDrug", "object": "20000", "drugEfficacy": 0.5, "drugRecalls": 3},
       {"subject": "1", "predicate": "opDrug", "object": "30000", "drugEfficacy": 0.7, "drugRecalls": 1}
     ]
   }
 }
}
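The following is a minimal sketch of write-time projection. It assumes the drug facts live in a separate lookup keyed by drug ID. The `DRUG_FACTS` table and the `project_drug_facts` function are hypothetical. Each `opDrug` triple is copied and enriched with the efficacy and recall values, which produces the same shape as the triples in the JSON above. Read-time projection would call the same function when the document is served rather than when it is stored.

```python
# Hypothetical lookup of drug facts (values taken from the Drug ID table above).
DRUG_FACTS = {
    "10000": {"drugEfficacy": 0.8, "drugRecalls": 1},
    "20000": {"drugEfficacy": 0.5, "drugRecalls": 3},
    "30000": {"drugEfficacy": 0.7, "drugRecalls": 1},
}

def project_drug_facts(triples, facts):
    """Return new triples with drug facts copied onto each opDrug triple."""
    projected = []
    for t in triples:
        enriched = dict(t)  # copy so the input triples are not mutated
        if t["predicate"] == "opDrug":
            enriched.update(facts.get(t["object"], {}))
        projected.append(enriched)
    return projected

triples = [
    {"subject": "1", "predicate": "opHospital", "object": "10"},
    {"subject": "1", "predicate": "opDrug", "object": "20000"},
]
for t in project_drug_facts(triples, DRUG_FACTS):
    print(t)
```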


Power of Combining Documents and Triples
Narrative + Data = Contextual Information + Relationships = Meaningful Knowledge (Semantic & Structural)

A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13 / Number 6 / June, 1970

Programs should remain unaffected when the internal representation of data is changed. Tree-structured inadequacies are discussed. Relations are discussed and applied to the problems of redundancy and consistency.
KEY WORDS AND PHRASES: database, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory to formatted data. The problems are those of data independence and data inconsistency.
The relational view appears to be superior in several respects to the graph or network model.
Relational view forms a sound basis for treating derivability, redundancy, and consistency. [and] a clearer evaluation of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
Tables represent a major advance toward the goal of data independence
1.2.1. Ordering Dependence. Programs which take advantage of the stored ordering of a file are likely to fail if it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence. Can application programs remain invariant as indices come and go?
1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. These programs fail when a change in structure becomes necessary. The program is required to exploit paths to the data. Programs become dependent on the continued existence of the paths.


Graph Modeling
#1a Define Relationships
Define a standard set of relationships with precise meanings
This is critical because relationships assign meaning to items and make queries possible
[Diagram: Surgeon, Operation, Person, and Hospital items connected by the relationships excels at, performed, operated on, operated at, patient at, and works at. The operation document (Document ID 1: John Hopkins, Heart Transplant, Dorothy Oz, operation 13, and its administered drugs) is attached to the Operation item.]
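One way to keep relationship meanings precise is to publish the vocabulary as data and validate every new triple against it. The sketch below does this with two hypothetical names, `RELATIONSHIPS` and `check_triple`. The domain and range assigned to each predicate are assumptions read off the diagram. They are not a published schema.

```python
# Hypothetical relationship vocabulary: each predicate has one precise meaning,
# expressed here as the item type allowed at each end (domain -> range).
RELATIONSHIPS = {
    "excelsAt":   ("Surgeon", "Operation"),
    "performed":  ("Surgeon", "Operation"),
    "operatedOn": ("Surgeon", "Person"),
    "operatedAt": ("Operation", "Hospital"),
    "patientAt":  ("Person", "Hospital"),
    "worksAt":    ("Surgeon", "Hospital"),
}

def check_triple(subject_type, predicate, object_type):
    """Raise ValueError if a triple uses an undefined or misapplied relationship."""
    if predicate not in RELATIONSHIPS:
        raise ValueError(f"undefined relationship: {predicate}")
    domain, rng = RELATIONSHIPS[predicate]
    if (subject_type, object_type) != (domain, rng):
        raise ValueError(f"{predicate} expects {domain} -> {rng}")

check_triple("Surgeon", "worksAt", "Hospital")  # accepted, no exception
```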


Graph Modeling
#1b Use Existing Ontologies
Save time and make your data easier to understand by leveraging existing relationship ontologies:
Dublin Core
FOAF
TrackBack
MetaVocab
Basic Geo Vocabulary
BIO
RSS 1.0
vCard RDF
Creative Commons metadata
WOT
SIOC
GoodRelations
DOAP
Programmes Ontology
Music Ontology
OpenGUID
Provenance Vocabulary
Pedagogical diagnosis
DILIGENT Argumentation

TIP: Search for ontologies at Linked Open Vocabularies (LOV)
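Reusing an ontology mostly means using its published IRIs as predicates. The sketch below expands prefixed names with a small prefix table. The FOAF and Dublin Core Terms namespace IRIs are the published ones. The `example.org` subject IRI and the `expand` helper are hypothetical.

```python
# Namespace IRIs published by two of the ontologies listed above.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def expand(curie):
    """Expand a prefixed name such as 'foaf:name' into a full IRI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local  # KeyError for an unknown prefix

# Hypothetical subject IRI; the predicate comes from FOAF.
triple = ("https://example.org/person/155321", expand("foaf:name"), "Dorothy Oz")
print(triple)
```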

Graph Modeling
#2 Define Attributes
Create an ID for each item
The ID can be human-readable, but it is usually a variation of a UUID
[Diagram: Surgeon, Operation, Person, and Hospital items connected by the relationships excels at, performed, operated on, operated at, patient at, and works at, with the operation document (Document ID 1) attached to the Operation item.]
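Python's standard `uuid` module can generate random (version 4) UUIDs like the ones the slide mentions. In the sketch below, `new_item_id` is a hypothetical wrapper used only for illustration.

```python
import uuid

def new_item_id():
    """Return a random (version 4) UUID string to use as an item ID."""
    return str(uuid.uuid4())

surgeon_id = new_item_id()
operation_id = new_item_id()
print(surgeon_id)  # a 36-character string; the value changes on every run
```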


Graph Modeling
#3 Add Core Data to Items
Define common core data you want added to all items
Because an item is simply an ID that has no meaning, you need to add core metadata to it, such as type, name, updatedBy, updatedOn, etc.
If your database allows an item to be a document or to have properties, you can add core data directly to it

ID: 155321
  itemType: Surgeon
  itemName: Dorothy Oz
  updatedBy: ID: 622480

In a pure triple system, you must use triples to connect items to core data
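The slide describes two options: store core data as properties of the item, or store it as separate triples. The sketch below shows both side by side. The field names come from the slide. The names `core_data` and `as_triples`, and the UTC ISO-8601 timestamp used for `updatedOn`, are assumptions.

```python
from datetime import datetime, timezone

def core_data(item_id, item_type, item_name, updated_by):
    """Core metadata stored as properties on the item (document style)."""
    return {
        "id": item_id,
        "itemType": item_type,
        "itemName": item_name,
        "updatedBy": updated_by,
        "updatedOn": datetime.now(timezone.utc).isoformat(),
    }

def as_triples(item):
    """The same core data in a pure triple system: one triple per property."""
    return [(item["id"], key, value) for key, value in item.items() if key != "id"]

surgeon = core_data("155321", "Surgeon", "Dorothy Oz", "622480")
for t in as_triples(surgeon):
    print(t)
```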


Graph Modeling
#4 Add Relationships
Create a triple for every relationship between items
Graphs are schemaless: you can add more relationships at runtime
[Diagram: Surgeon, Operation, Person, and Hospital items connected by the relationships excels at, performed, operated on, operated at, patient at, and works at, with the operation document (Document ID 1) attached to the Operation item.]
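A toy in-memory triple store shows why no schema change is needed: a new relationship is just one more triple. `TripleStore` and the `mentored` predicate are hypothetical. Real triple stores index each position instead of scanning a set.

```python
class TripleStore:
    """A tiny in-memory triple store (illustrative sketch, not a product API)."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        # No schema to alter: a new predicate is simply another triple.
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

store = TripleStore()
store.add("surgeonDoc1", "performed", "operationDoc13")
store.add("surgeonDoc1", "worksAt", "hospitalDoc10")
# A relationship added at runtime, without any schema change:
store.add("surgeonDoc1", "mentored", "surgeonDoc2")
print(store.match(subject="surgeonDoc1"))
```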


Query using SPARQL
SPARQL is a triple query language
[Diagram: Surgeon, Operation, Person, and Hospital items connected by the relationships excels at, performed, operated on, operated at, patient at, and works at.]

PREFIX : <http://example.org/hospital#>   # hypothetical namespace for the example predicates
SELECT *
WHERE { ?Surgeon :excelsAt ?Operation .
        ?Operation :isNamed "HeartSurgery" }

Returns all surgeons who excel at heart surgery.

{"quads": [
 {"collection": "HospitalOps", "subject": "surgeonDoc1", "predicate": "excelsAt", "object": "operationDoc13"},
 {"collection": "HospitalOps", "subject": "surgeonDoc1", "predicate": "performed", "object": "operationDoc13"},
 {"collection": "HospitalOps", "subject": "surgeonDoc1", "predicate": "operatedOn", "object": "userDoc1554"},
 {"collection": "HospitalOps", "subject": "surgeonDoc1", "predicate": "worksAt", "object": "hospitalDoc10"},
 {"collection": "HospitalOps", "subject": "operationDoc13", "predicate": "requestingUser", "object": "userDoc1554"},
 {"collection": "HospitalOps", "subject": "operationDoc13", "predicate": "operatedAt", "object": "hospitalDoc10"},
 {"collection": "HospitalOps", "subject": "operationDoc13", "predicate": "isNamed", "object": "HeartSurgery"},
 {"collection": "HospitalOps", "subject": "userDoc1554", "predicate": "patientAt", "object": "hospitalDoc10"}
]}
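The two triple patterns in the `WHERE` clause amount to a join on `?Operation`. The hand-written Python sketch below evaluates that join over a subset of the quads above, written as (graph, subject, predicate, object). It is not how any particular SPARQL engine is implemented, and `surgeons_who_excel_at` is a hypothetical name.

```python
# A subset of the quads above: (collection, subject, predicate, object).
quads = [
    ("HospitalOps", "surgeonDoc1", "excelsAt", "operationDoc13"),
    ("HospitalOps", "surgeonDoc1", "performed", "operationDoc13"),
    ("HospitalOps", "operationDoc13", "isNamed", "HeartSurgery"),
    ("HospitalOps", "userDoc1554", "patientAt", "hospitalDoc10"),
]

def surgeons_who_excel_at(name, data):
    """Join '?Surgeon excelsAt ?Operation' with '?Operation isNamed <name>'."""
    named_ops = {s for _, s, p, o in data if p == "isNamed" and o == name}
    return [(s, o) for _, s, p, o in data if p == "excelsAt" and o in named_ops]

print(surgeons_who_excel_at("HeartSurgery", quads))
```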


Why not use 3 columns in a table as a triple?
Subject | Predicate | Object
lds.org/manual/TrueToTheFaith | isType | lds.org/type/publishedDocument
lds.org/manual/TrueToTheFaith | isDocumentType | lds.org/documentType/Pamphlet
lds.org/manual/TrueToTheFaith/Faith | isChapterIn | lds.org/manual/TrueToTheFaith
lds.org/manual/TrueToTheFaith/Faith | isRelatedToTopic | lds.org/topic/faith
lds.org/manual/TrueToTheFaith/Faith | referencesScripture | www.lds.org/scriptures/nt/heb/11
www.lds.org/scriptures/nt/heb/11 | isRelatedToTopic | lds.org/topic/faith
lds.org/topic/faith | isType | lds.org/type/topic
lds.org/topic/faith | isRelatedToTopic | lds.org/topic/salvation
lds.org/topic/salvation | isChapterIn | lds.org/manual/TrueToTheFaith

A relational database can require hundreds of recursive self-joins to resolve triple queries
Imagine the performance and complexity of a SQL query that joins the same table to itself several hundred times
Imagine the performance of a single table that contains billions of rows, where every CRUD statement in the database is executed on that table and its three indexes
How do you do inferences (such as "a father of a father is a grandfather") in SQL?
Why does Oracle DB have a triple index and license it as a separate product?
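The sketch below makes the SQL side of the comparison concrete. It uses Python's built-in `sqlite3` with a single three-column `triple` table; the table name and the people are hypothetical. The grandfather inference needs one self-join per hop. An ancestor query of unknown depth needs a recursive common table expression over the same table. These are the costs the questions above point at.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
con.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [("Alan", "fatherOf", "Bob"), ("Bob", "fatherOf", "Carl"), ("Carl", "fatherOf", "Dan")],
)

# "A father of a father is a grandfather": one self-join per hop.
grandfathers = con.execute(
    """
    SELECT f1.subject, f2.object
    FROM triple AS f1 JOIN triple AS f2 ON f1.object = f2.subject
    WHERE f1.predicate = 'fatherOf' AND f2.predicate = 'fatherOf'
    ORDER BY f1.subject
    """
).fetchall()

# Ancestors at any depth need a recursive CTE that keeps joining the same table.
ancestors = con.execute(
    """
    WITH RECURSIVE anc(ancestor, descendant) AS (
        SELECT subject, object FROM triple WHERE predicate = 'fatherOf'
        UNION
        SELECT t.subject, anc.descendant
        FROM triple AS t JOIN anc ON t.object = anc.ancestor
        WHERE t.predicate = 'fatherOf'
    )
    SELECT ancestor FROM anc WHERE descendant = 'Dan' ORDER BY ancestor
    """
).fetchall()
print(grandfathers, ancestors)
```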


Triple Modeling Exercise
Document ID: 1
Order Number: 1332
Order Date: 20140816
Total Amount: $40
Customer Name: Mike Bowers
Customer Phone: 8015551212
Customer Address: Street, City, State, Postal Code

Product Name | Product Description | Price | QTY
CSS Book | CSS and HTML Design | $20 | 1
CSS Book | HTML5 and CSS3 Design | $20 | 2

PROs:

CONs:


Triple Modeling Answer
{"relationships": [
 {"subject": "1332", "predicate": "rdf:type", "object": "order"},
 {"subject": "1332", "predicate": "orderNumber", "object": "1332"},
 {"subject": "1332", "predicate": "orderDate", "object": "20140816"},
 {"subject": "1332", "predicate": "orderTotal", "object": 40.00},
 {"subject": "1332", "predicate": "customer", "object": "1"},
 {"subject": "1332", "predicate": "productOrdered", "object": "100"},
 {"subject": "1332", "predicate": "productOrdered", "object": "200"},

 {"subject": "1", "predicate": "rdf:type", "object": "customer"},
 {"subject": "1", "predicate": "customerName", "object": "Mike Bowers"},
 {"subject": "1", "predicate": "customerPhone", "object": "8015551212"},
 {"subject": "1", "predicate": "customerAddress", "object": "10"},

 {"subject": "10", "predicate": "rdf:type", "object": "address"},
 {"subject": "10", "predicate": "addressStreet", "object": "111 My Street"},
 {"subject": "10", "predicate": "addressCity", "object": "San Diego"},
 {"subject": "10", "predicate": "addressState", "object": "CA"},
 {"subject": "10", "predicate": "addressPostal", "object": "92093"},

 {"subject": "100", "predicate": "rdf:type", "object": "product"},
 {"subject": "100", "predicate": "productCategory", "object": "CSS Book"},
 {"subject": "100", "predicate": "productDescription", "object": "CSS and HTML Design"},
 {"subject": "100", "predicate": "productPrice", "object": 20.00},
 {"subject": "100", "predicate": "productQuantity", "object": 1},

 {"subject": "200", "predicate": "rdf:type", "object": "product"},
 {"subject": "200", "predicate": "productCategory", "object": "CSS Book"},
 {"subject": "200", "predicate": "productDescription", "object": "HTML5 and CSS3 Design"},
 {"subject": "200", "predicate": "productPrice", "object": 20.00},
 {"subject": "200", "predicate": "productQuantity", "object": 2}
]}
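Going the other way, an application can reassemble the order document from the answer's triples. It groups the triples by subject and then follows object IDs from one item to the next. The sketch below uses a subset of those triples. It stores every ID as a string so that object values compare equal to subject values. `by_subject` is a hypothetical helper.

```python
from collections import defaultdict

# A subset of the answer's triples, with IDs stored as strings.
relationships = [
    {"subject": "1332", "predicate": "customer", "object": "1"},
    {"subject": "1332", "predicate": "productOrdered", "object": "100"},
    {"subject": "1332", "predicate": "productOrdered", "object": "200"},
    {"subject": "1", "predicate": "customerName", "object": "Mike Bowers"},
    {"subject": "1", "predicate": "customerAddress", "object": "10"},
    {"subject": "10", "predicate": "addressCity", "object": "San Diego"},
    {"subject": "100", "predicate": "productDescription", "object": "CSS and HTML Design"},
    {"subject": "200", "predicate": "productDescription", "object": "HTML5 and CSS3 Design"},
]

def by_subject(triples):
    """Group triples into subject -> predicate -> list of objects."""
    items = defaultdict(lambda: defaultdict(list))
    for t in triples:
        items[t["subject"]][t["predicate"]].append(t["object"])
    return items

items = by_subject(relationships)
order = items["1332"]
customer = items[order["customer"][0]]            # follow the customer ID
address = items[customer["customerAddress"][0]]   # follow the address ID
products = [items[p]["productDescription"][0] for p in order["productOrdered"]]
print(customer["customerName"][0], address["addressCity"][0], products)
```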

Graph Model Summary
Model networks, relate documents, and enrich data
Example: genetics, family history, social networks

Graph vs. Semantic Databases
Semantic databases are based on W3C RDF standards.
They are built for semantic experts to run SPARQL queries that filter, match, aggregate, and infer meaning

Graph databases are not standardized.
They are built for developers to write code that traverses graphs to filter, match, and aggregate data and to calculate meaning
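The graph-database style described above is code that walks edges. Below is a minimal breadth-first traversal over an adjacency list built from the hospital example's quads. The `GRAPH` dictionary and the `reachable` function are hypothetical.

```python
from collections import deque

# Hypothetical adjacency list derived from the hospital example's relationships.
GRAPH = {
    "surgeonDoc1": ["operationDoc13", "userDoc1554", "hospitalDoc10"],
    "operationDoc13": ["userDoc1554", "hospitalDoc10"],
    "userDoc1554": ["hospitalDoc10"],
    "hospitalDoc10": [],
}

def reachable(start, graph, max_hops):
    """Breadth-first traversal: return {node: hops} for nodes within max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

print(reachable("operationDoc13", GRAPH, max_hops=1))
```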

Five Data Paradigms
Relational: Flexible Queries
Document: Easy Development
Dimensional: Data Warehousing
Graph: Unlimited Relationships
Wide Column/Key-value: Fast Puts and Gets


Modeling Takeaway
No one physical data model meets all needs, so choose a multimodel DB
Dimensional: Business intelligence reporting and analytics
Relational: Flexible queries, joins, and updates; mature and standard
Wide Column: Simple, fast puts and gets; massively scalable
Document: Fastest development; schemaless JSON/XML; searchable
Graph/RDF: Modeling anything at runtime, including relationships
Documents combined with Graph are the future

What model best fits your next project?
Thoughts?


Multimodel Databases
in detail


Power of Combining Data Paradigms
Narrative + Data = Contextual Information + Relationships = Meaningful Knowledge (Semantic & Structural)


Multimodel SQL Databases
[Chart: data models positioned between Low Latency Operational (Velocity) and High Bandwidth Analytical (Volume), and between more structure (schema) and less structure (schemaless). Axis labels: hours, minutes, seconds, milliseconds, microseconds; PBs, TBs, GBs; 0.1Kt, 0.5Kt, 1Kt, 10Kt, 100Kt. Labeled areas: newSQL, Live Analytics, Wide Column (complex key), Key/Value (simple key), Document (JSON), Graph/RDF, SQL, Data Warehouse, Doc Warehouse (XML), Big Data (raw). Models along the structure axis: Relational, Dimensional, Wide column/Key-value, Document, Graph, Raw. The operation document (Document ID 1) sits at the center. SQL databases placed on the chart: Oracle DB, EnterpriseDB, Oracle Exadata.]


Multimodel NoSQL Databases
[Chart: data models positioned between Low Latency Operational (Velocity) and High Bandwidth Analytical (Volume), and between more structure (schema) and less structure (schemaless). Axis labels: hours, minutes, seconds, milliseconds, microseconds; PBs, TBs, GBs; 0.1Kt, 0.5Kt, 1Kt, 10Kt, 100Kt. Labeled areas: newSQL, Live Analytics, Wide Column (complex key), Key/Value (simple key), Document (JSON), Graph/RDF, SQL, Data Warehouse, Doc Warehouse (XML), Big Data (raw). Models along the structure axis: Relational, Dimensional, Wide column/Key-value, Document, Graph, Raw. The operation document (Document ID 1) sits at the center. NoSQL databases placed on the chart: Cassandra, MarkLogic, OrientDB.]


Agenda
1. Defining NoSQL and Big Data
2. Optimizing for Velocity or Volume
3. Optimizing for Availability or Consistency
4. Optimizing for Modeling Paradigms
5. Summary


Modeling Takeaway
Choose a database that meets your multiple modeling needs
Dimensional: Business intelligence reporting and analytics
Relational: Flexible queries, joins, and updates; mature and standard
Wide Column: Simple, fast puts and gets; massively scalable
Document: Fast development; schemaless JSON/XML; searchable
Graph/RDF: Modeling anything at runtime, including relationships
Documents combined with Graph are the future

Velocity Takeaway
Choose a DB that handles your required velocity

Volume Per Day | Real-world 1K Transactions Per Day | Real-world 1K Transactions Per Second | Relational | Document | Wide Column or Key-Value
8 GB | 8,640,000 | 100 | As Is | |
86 GB | 86,400,000 | 1,000 | Tuned* | As Is |
432 GB | 432,000,000 | 5,000 | Appliance | Tuned* | As Is
864 GB | 864,000,000 | 10,000 | Clustered Appliance | Clustered Servers | Tuned*
8,640 GB | 8,640,000,000 | 100,000 | | Many Clustered Servers | Clustered Servers
43,200 GB | 43,200,000,000 | 500,000 | | | Many Clustered Servers

*Tuned means tuning the model, queries, and/or hardware (more CPU, RAM, and flash)
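The first three columns follow from simple arithmetic. Transactions per day equal transactions per second times 86,400 seconds. Volume per day assumes each "1K" transaction is about 1,000 bytes; this is an assumption, but it reproduces the table. The table drops the decimals, so 8.64 GB appears as 8 GB. In the sketch below, `daily_load` is a hypothetical name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
BYTES_PER_TRANSACTION = 1_000   # assumption: a "1K" transaction is about 1,000 bytes

def daily_load(tps):
    """Return (transactions per day, gigabytes per day) for a sustained rate."""
    per_day = tps * SECONDS_PER_DAY
    gb_per_day = per_day * BYTES_PER_TRANSACTION / 1_000_000_000
    return per_day, gb_per_day

for tps in (100, 1_000, 5_000, 10_000, 100_000, 500_000):
    per_day, gb = daily_load(tps)
    print(f"{tps:>7,} tps -> {per_day:>14,} per day, {gb:>9,.2f} GB per day")
```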

Hardware Takeaway
Choose a DB designed to meet your scaling needs for velocity and volume at the lowest hardware cost
Leverages RAM when you need maximum velocity (low latency)
Leverages disk when you need massive volume (high bandwidth)
Scales horizontally for maximum parallel processing
Lets you choose the right mix of synchronous and asynchronous transactions
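Scaling horizontally usually means partitioning data across servers by key. Below is a minimal hash-partitioning sketch that assumes a fixed shard count. `shard_for` is hypothetical. The sketch uses a stable SHA-256 digest because Python's built-in `hash()` of a string is randomized per process, so different nodes could disagree on placement.

```python
import hashlib

def shard_for(key, shard_count):
    """Map a key to a shard with a stable hash so every node agrees on placement."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

keys = [f"operationDoc{n}" for n in range(1000)]
counts = [0] * 4
for k in keys:
    counts[shard_for(k, 4)] += 1
print(counts)  # keys spread across the 4 shards
```

Plain modulo placement moves most keys when the shard count changes. Consistent hashing is the usual refinement when shards are added or removed.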

Consistency Takeaway
Choose a database that meets your needs for write locality or consistency

BASE (NaOH): Multimaster Clusters
Datacenter 1 (Zone 1, Zone 2) and Datacenter 2 (Zone 1, Zone 2)
Write locality

ACID (H2SO4): Globally Consistent Clusters
Datacenter 1 (Zone 1, Zone 2) and Datacenter 2 (Zone 1, Zone 2)
Point-in-time consistency
Less data loss (durability)
More query accuracy (isolation)
More data integrity (atomicity)
Less code to compensate for data inconsistencies and conflicts

Choose a Database that is Mature Enough for You
[Chart: hype cycle with the phases Technology Trigger, Inflated Expectations, Disillusionment, Enlightenment, and Productivity. Technologies placed on the curve: DBaaS, NoSQL, DB Appliances, MapReduce, SQL. Annotations: Enterprise Ready; 1 to 5 years; 5 to 10 years.]
Derived from Gartner Hype Cycle for Data Management


Databases (Ranked by popularity as of 20160314)
[Chart: the same layout as the multimodel charts, spanning Low Latency Operational (Velocity) to High Bandwidth Analytical (Volume) and more structure (schema) to less structure (schemaless), with the operation document (Document ID 1) at the center.]
newSQL: #58 GemFire, #69 Oracle x10
Live Analytics: #1 Oracle Exalytics, #19 SAP HANA
Wide Column (complex key): #8 Cassandra, #15 HBase
Key/Value (simple key): #9 Redis, #23 Memcached, #26 DynamoDB, #31 Riak
Document (JSON): #4 MongoDB, #24 Couchbase, #25 CouchDB, #32 MarkLogic, #41 OrientDB, #48 Cloudant
Data Warehouse: #1 Oracle Exadata, #13 Teradata, #16 Hive, #28 Netezza, #29 Vertica, #33 Greenplum, #36 Amazon Redshift
Graph/RDF: #20 Neo4j, #32 MarkLogic, #41 OrientDB, #44 Titan
SQL: #1 Oracle DB, #2 MySQL, #3 SQL Server, #5 PostgreSQL, #6 DB2, #10 SQLite, #12 SAP AS, #19 SAP HANA, #21 Informix, #22 MariaDB
Doc Warehouse (XML): #11 Elasticsearch, #14 Solr, #35 MarkLogic, #37 Sphinx
Big Data (raw): Hadoop, #18 Splunk


Evaluating and Modeling
NoSQL and SQL Databases
Part 2
2016 EDW
by Michael Bowers
20160314
v.4.9
mike@cssDesignPatterns.com