NoSQL and SQL Databases
Part 2
2016 EDW
by Michael Bowers
2016-03-14
v.4.9
mike@cssDesignPatterns.com
Abstract
We are in the middle of a database revolution. NoSQL is disrupting the database world by innovating in many ways.
How do we model in these new paradigms?
How does the old SQL paradigm fit in this brave new world?
What paradigm is best for your project?
We are in a new data paradigm:
- New database architectures (software and hardware) handle the large and ever-growing velocity and volume of data that is dispersed across geographically distant data centers
- New graph, document, and wide-column modeling paradigms compete with relational and dimensional
- Schemaless databases enable maximum agility of software development and rapid changes to huge datasets
What will you learn?
- You will be able to choose the best database to meet your needs for velocity, volume, variety, variability, relevance, productivity, data model, scale, consistency, and cost.
- You will know the trade-offs of ACID or BASE consistency models and when it is OK or not OK to compromise consistency.
- You will understand the strengths and weaknesses of relational, dimensional, document, key-value, and triple models, and which SQL and NoSQL databases support which models.
About the Author
Michael Bowers
Principal Architect, LDS Church
Author:
- Pro CSS and HTML Design Patterns, published by Apress, 2007
- Pro HTML5 and CSS3 Design Patterns, published by Apress, 2011
mike@cssDesignPatterns.com
Church of Jesus Christ of Latter-day Saints
- 15 million members (29,621 congregations worldwide)
- Humanitarian assistance in 185 countries
- Thousands of documents in 188 published languages
- 192 websites and applications in production, with billions of page views annually, running on hundreds of MarkLogic servers
Agenda
1. Defining NoSQL and Big Data
2. Optimizing for Velocity or Volume
3. Optimizing for Availability or Consistency
4. Optimizing for Modeling Paradigms
Five Data Paradigms
- Relational: Flexible Queries
- Document: Easy Development
- Dimensional: Data Warehousing
- Graph: Unlimited Relationships
- Wide Column / Key-value: Fast Puts and Gets
Relational Modeling
in detail
[Figure: Databases (ranked by popularity as of 2016-03-14)]
Axes: latency from hours, minutes, seconds, and milliseconds to microseconds; high-bandwidth analytical volume (PBs, TBs, GBs) versus low-latency operational velocity (0.1 Kt, 0.5 Kt, 1 Kt, 10 Kt, 100 Kt); more structure (schema) versus less structure (schemaless).
Categories and databases (grouping inferred from the slide layout):
- newSQL: #58 GemFire, #69 Oracle x10
- Live Analytics: #1 Oracle Exalytics, #19 SAP HANA
- Wide Column (complex key): #8 Cassandra, #15 HBase
- Key/Value (simple key): #9 Redis, #23 Memcached, #26 DynamoDB, #31 Riak
- Document (JSON): #4 MongoDB, #24 Couchbase, #25 CouchDB, #32 MarkLogic, #41 OrientDB, #48 Cloudant
- SQL Data Warehouse (Dimensional): #1 Oracle Exadata, #13 Teradata, #16 Hive, #28 Netezza, #29 Vertica, #33 Greenplum, #36 Amazon Redshift
- Graph/RDF: #20 Neo4j, #32 MarkLogic, #41 OrientDB, #44 Titan
- Relational: #1 Oracle DB, #2 MySQL, #3 SQL Server, #5 PostgreSQL, #6 DB2, #10 SQLite, #12 SAP AS, #19 SAP HANA, #21 Informix, #22 MariaDB
- Doc Warehouse (XML): #11 ElasticSearch, #14 Solr, #35 MarkLogic, #37 Sphinx
- Big Data (raw): Hadoop, #18 Splunk
Data Source used in Examples
Hospital Name:   John Hopkins
Operation Type:  Heart Transplant
Operation ID:    13
Surgeon Name:    Dorothy Oz

Drug Name    Drug Manufacturer    Dose Size  Dose UOM
Minicillan   Drugs R Us           200        mg
Maxicillan   Canada4Less Drugs    400        mg
Minicillan   Drug USA             150        mg
Relational Modeling
#1 Normalize
- Give each attribute its own field
- Group attributes into tables, ensuring each table has one coherent context
- Assign one primary key to each table
- Eliminate duplicate attributes

Tables derived from the hospital operation document (Document ID: 1):
Hospital:        HospitalID, HospitalName
Surgeon:         SurgeonID, SurgeonName
Operation:       OperationID, HospitalID, SurgeonID, OperationType
OperationCodes:  OperationCodeID, OperationCodeType
Drugs:           DrugID, DrugName, DrugManufacturer
OperationDrugs:  OperationID, DrugID, DoseSize, DoseUOM
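As a rough illustration of the normalized tables on this slide, the sketch below builds them in an in-memory SQLite database and joins them back into the original document. The table and column names come from the slide and the drug IDs (1997, 2110, 9448) from the key tables later in the deck. The surrogate ID value 1 for the hospital and the surgeon, the composite primary key on OperationDrugs, and the omission of the OperationCodes table are assumptions made for brevity.

```python
import sqlite3

# In-memory database; table and column names follow the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hospital (HospitalID INTEGER PRIMARY KEY, HospitalName TEXT);
CREATE TABLE Surgeon  (SurgeonID  INTEGER PRIMARY KEY, SurgeonName  TEXT);
CREATE TABLE Operation (
    OperationID   INTEGER PRIMARY KEY,
    HospitalID    INTEGER REFERENCES Hospital(HospitalID),
    SurgeonID     INTEGER REFERENCES Surgeon(SurgeonID),
    OperationType TEXT
);
CREATE TABLE Drugs (DrugID INTEGER PRIMARY KEY, DrugName TEXT, DrugManufacturer TEXT);
CREATE TABLE OperationDrugs (
    OperationID INTEGER REFERENCES Operation(OperationID),
    DrugID      INTEGER REFERENCES Drugs(DrugID),
    DoseSize    INTEGER,
    DoseUOM     TEXT,
    PRIMARY KEY (OperationID, DrugID)  -- assumed composite key
);
""")

# Surrogate ID 1 for hospital and surgeon is an assumption.
conn.execute("INSERT INTO Hospital VALUES (1, 'John Hopkins')")
conn.execute("INSERT INTO Surgeon VALUES (1, 'Dorothy Oz')")
conn.execute("INSERT INTO Operation VALUES (13, 1, 1, 'Heart Transplant')")
conn.executemany("INSERT INTO Drugs VALUES (?, ?, ?)", [
    (1997, "Minicillan", "Drugs R Us"),
    (2110, "Maxicillan", "Canada4Less Drugs"),
    (9448, "Minicillan", "Drug USA"),
])
conn.executemany("INSERT INTO OperationDrugs VALUES (?, ?, ?, ?)", [
    (13, 1997, 200, "mg"), (13, 2110, 400, "mg"), (13, 9448, 150, "mg"),
])

# Joins reassemble the original document from the normalized tables.
rows = conn.execute("""
    SELECT h.HospitalName, s.SurgeonName, d.DrugName, od.DoseSize, od.DoseUOM
    FROM OperationDrugs od
    JOIN Operation o ON o.OperationID = od.OperationID
    JOIN Hospital  h ON h.HospitalID  = o.HospitalID
    JOIN Surgeon   s ON s.SurgeonID   = o.SurgeonID
    JOIN Drugs     d ON d.DrugID      = od.DrugID
    ORDER BY od.DrugID
""").fetchall()
for row in rows:
    print(row)
```

Each fact is stored once, so renaming the hospital touches a single row, at the price of a four-way join to read the document back.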
Relational Modeling
#2 Orthogonalize
- Create reference tables that stand independent of all contexts
- This maximizes data reuse by allowing tables to be combined with other tables to create any context
Relational Modeling
#3 Generalize
- Make tables more general in purpose so they can be reused in multiple contexts, such as replacing Surgeon with Person
- Do not overgeneralize, because it hides the purpose of the model
Relational Modeling
#4 Tune
- Modify tables to meet the application's performance needs for reads and writes
- For example, speed up reads by materializing views and duplicating attributes across tables to eliminate joins
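The tuning trade-off on this slide can be sketched with SQLite, which has no built-in materialized views. The sketch below stores a join result as an ordinary table instead, which is a swapped-in approximation of materialization. The table name OperationDrugsFlat, the index name, and the renamed drug are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Drugs (DrugID INTEGER PRIMARY KEY, DrugName TEXT, DrugManufacturer TEXT);
CREATE TABLE OperationDrugs (OperationID INTEGER, DrugID INTEGER, DoseSize INTEGER, DoseUOM TEXT);
INSERT INTO Drugs VALUES (1997, 'Minicillan', 'Drugs R Us'),
                         (2110, 'Maxicillan', 'Canada4Less Drugs');
INSERT INTO OperationDrugs VALUES (13, 1997, 200, 'mg'), (13, 2110, 400, 'mg');

-- Hand-rolled materialization: store the join result as a table.
CREATE TABLE OperationDrugsFlat AS
    SELECT od.OperationID, od.DrugID, d.DrugName, od.DoseSize, od.DoseUOM
    FROM OperationDrugs od JOIN Drugs d ON d.DrugID = od.DrugID;
CREATE INDEX idx_flat_op ON OperationDrugsFlat (OperationID);
""")

# Reads now hit one table with no join.
flat = conn.execute(
    "SELECT DrugName, DoseSize FROM OperationDrugsFlat "
    "WHERE OperationID = ? ORDER BY DrugID", (13,)
).fetchall()
print(flat)

# The cost: the duplicated DrugName goes stale unless every copy is updated.
conn.execute("UPDATE Drugs SET DrugName = 'Minicillin' WHERE DrugID = 1997")
stale = conn.execute(
    "SELECT DrugName FROM OperationDrugsFlat WHERE DrugID = 1997"
).fetchone()[0]
print(stale)  # still the old spelling until the flat table is refreshed
```

Faster reads are bought with extra write work: every change to a duplicated attribute must also be applied to the flattened copy.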
Relational Modeling Exercise
Document ID:       1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street; City, State, PostalCode

Product Name  Product Description      Price  QTY
CSS Book      CSS and HTML Design      $20    1
CSS Book      HTML5 and CSS3 Design    $20    2

PROs:
CONs:
Relational Modeling Answer
Orders:          OrderID, CustomerID, OrderDate
OrderLineItems:  OrderID, OrderLineID, ProductID, ProductPrice, ProductQuantity
Customers:       CustomerID, CustomerName, CustomerPhone, CustomerAddressStreet, CustomerAddressCity, CustomerAddressState, CustomerAddressPostal
Products:        ProductID, ProductName, ProductDescription, ProductListPrice
Relational Model
PROs
- Most flexible queries
- Update data in one place
- Reuse data structures in any context
- Great DB-to-DB integration
- Mature tools
- Standard Query Language
- Easy to hire expertise
CONs
- Design-time, static relationships
- Static data structures: design before loading data
- Hard to model: must shred data into tables
- Requires code to map shredded relational data back into unified object-oriented data structures
- Cannot query for relevance; hard to search
Relational Model
Summary
Use for maximum flexibility in querying and updating operational data.
Example: traditional data entry apps
Dimensional Modeling
in detail
Dimensional Modeling
#1 Model Contexts
- Determine the business questions you want to answer
- Determine which fact will answer one or more questions
- Determine the grain of the fact
- Determine the dimensions needed to join with the fact to answer the business questions
- Create one star schema (or OLAP model) per fact

Star schema:
DrugDoseFacts:       HospitalID, SurgeonID, OperationID, DrugID, DrugDose
HospitalDimension:   HospitalID, Attributes
SurgeonDimension:    SurgeonID, Attributes
DrugDimension:       DrugID, Attributes
OperationDimension:  OperationID, Attributes
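A minimal sketch of querying a star schema like the one on this slide, using an in-memory SQLite database. It answers one hypothetical business question: the total dose of each drug per surgeon. The fact grain (one row per drug dose per operation), the surrogate ID value 1, and the choice of DrugName and SurgeonName as dimension attributes are assumptions. Only two of the four dimensions are built, to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DrugDimension    (DrugID INTEGER PRIMARY KEY, DrugName TEXT);
CREATE TABLE SurgeonDimension (SurgeonID INTEGER PRIMARY KEY, SurgeonName TEXT);
-- Assumed grain: one fact row per drug dose administered in an operation.
CREATE TABLE DrugDoseFacts (
    HospitalID INTEGER, SurgeonID INTEGER, OperationID INTEGER,
    DrugID INTEGER, DrugDose INTEGER
);
INSERT INTO DrugDimension VALUES
    (1997, 'Minicillan'), (2110, 'Maxicillan'), (9448, 'Minicillan');
INSERT INTO SurgeonDimension VALUES (1, 'Dorothy Oz');
INSERT INTO DrugDoseFacts VALUES
    (1, 1, 13, 1997, 200), (1, 1, 13, 2110, 400), (1, 1, 13, 9448, 150);
""")

# Join the fact to its dimensions, then aggregate the measure.
totals = conn.execute("""
    SELECT s.SurgeonName, d.DrugName, SUM(f.DrugDose) AS TotalDose
    FROM DrugDoseFacts f
    JOIN SurgeonDimension s ON s.SurgeonID = f.SurgeonID
    JOIN DrugDimension    d ON d.DrugID    = f.DrugID
    GROUP BY s.SurgeonName, d.DrugName
    ORDER BY d.DrugName
""").fetchall()
for row in totals:
    print(row)
```

Every query has the same shape: join the central fact to whichever dimensions give it context, then group and aggregate.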
Dimensional Modeling
#2 ELT
- Extract data from a source system
- Load it into a staging area in the data warehouse
- Transform it into the star schema
- Improve data quality
Dimensional Modeling
#3 Semantic Layer
- Define a semantic layer to enable self-service reporting
- Rename columns to be business friendly
- Add descriptions to columns
- Create join paths for error-free reporting
Dimensional Modeling
#4 Tune
- Determine query patterns
- Optimize queries
- Optimize indexes and full table scans
- Move to specialized data warehouse technology
Dimensional Modeling Exercise
Document ID:       1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street; City, State, PostalCode

Product Name  Product Description      Price  QTY
CSS Book      CSS and HTML Design      $20    1
CSS Book      HTML5 and CSS3 Design    $20    2

PROs:
CONs:
Dimensional Modeling Answer
OrderFact:    OrderID, OrderLineID, OrderDateID, CustomerID, ProductID, ProductQty, ProductPrice
DateDim:      OrderDateID, OrderDate, OrderDay, OrderMonth, OrderQuarter, OrderYear
CustomerDim:  CustomerID, CustomerName, CustomerPhone, CustomerAreaCode, CustomerAddressStreet, CustomerAddressCity, CustomerAddressState, CustomerAddressPostal
ProductDim:   ProductID, ProductName, ProductDescription, ProductCategory, ProductListPrice
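The DateDim table in this answer holds attributes that are all derived from the order date. A small sketch of that derivation follows. The helper name date_dim_row and the use of a YYYYMMDD integer as OrderDateID are assumptions, not something the slide specifies.

```python
from datetime import date


def date_dim_row(d: date) -> dict:
    """Derive one DateDim row from a calendar date."""
    return {
        "OrderDateID": int(d.strftime("%Y%m%d")),  # assumed key format
        "OrderDate": d.isoformat(),
        "OrderDay": d.day,
        "OrderMonth": d.month,
        "OrderQuarter": (d.month - 1) // 3 + 1,  # months 1-3 -> Q1, 4-6 -> Q2, ...
        "OrderYear": d.year,
    }


# The order date from the exercise document.
row = date_dim_row(date(2014, 8, 16))
print(row)
```

Precomputing these columns once per date lets reports group by quarter or year without date arithmetic in each query.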
Dimensional Model
PROs
- Queries facts in context
- Self-service, ad hoc queries
- High-performance platforms
- Mature tools and integration
- Standard Query Language
- Turns data into information
CONs
- Expensive platforms
- Design-time, static structures: design structures first, then load data
- Cannot query for relevance
- Cannot query for answers that are not built into the model
Dimensional Model
Summary
Use to transform authoritative data into contextual information to enable self-service, ad hoc, flexible reporting.
Examples: Business Intelligence, Data Warehouse
NewSQL
in detail
What's wrong with Old SQL DBs?
- Relevance
- Velocity
- Volume
- Variety
- Variability
hacky

What's wrong with NewSQL DBs?
- Relevance
- Velocity
- Volume
- Variety
- Variability
hacky
Relevance = meaningful to me
Narrative + Data = Contextual Information + Relationships = Meaningful Knowledge (Semantic & Structural)

A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13 / Number 6 / June, 1970

Programs should remain unaffected when the internal representation of data is changed. Tree-structured inadequacies are discussed. Relations are discussed and applied to the problems of redundancy and consistency.
KEY WORDS AND PHRASES: database, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory to formatted data. The problems are those of data independence and data inconsistency.
The relational view appears to be superior in several respects to the graph or network model.
Relational view forms a sound basis for treating derivability, redundancy, and consistency. [and] a clearer evaluation of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
Tables represent a major advance toward the goal of data independence
1.2.1. Ordering Dependence. Programs which take advantage of the stored ordering of a file are likely to fail if it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence. Can application programs remain invariant as indices come and go?
1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. These programs fail when a change in structure becomes necessary. The program is required to exploit paths to the data. Programs become dependent on the continued existence of the paths.
Variability
Managing Rapid Change
Schemas are incompatible with rapid change
- Constantly evolving data structures: Can we afford to keep a large application in sync with regular changes to data structures?
- Big data: Is data so large that it takes too long to modify values, structures, and indexes?
- Agile development: Are requirements stable enough to create long-lasting relational data structures?
Schemaless data is ideal for rapid change
- Schemaless data and languages: JSON/JavaScript, Triple/SPARQL, XML/XQuery
- Defensive programming is required: you never know what queries will return
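The point about defensive programming can be illustrated with a short sketch. The three JSON documents below are hypothetical variants of the same record, with field names (surgeon, drugs, dose) invented for this example. The reader functions tolerate missing fields, nulls, and changed shapes instead of assuming a schema.

```python
import json

# Hypothetical documents: the same "kind" of record in three different shapes.
docs = [
    '{"surgeon": {"name": "Dorothy Oz"}, "drugs": [{"name": "Minicillan", "dose": 200}]}',
    '{"surgeon": "Dorothy Oz", "drugs": null}',
    '{"drugs": [{"name": "Maxicillan"}]}',
]


def surgeon_name(doc: dict) -> str:
    """Accept the surgeon as an object, a plain string, or missing."""
    s = doc.get("surgeon")
    if isinstance(s, dict):
        return str(s.get("name", "unknown"))
    if isinstance(s, str):
        return s
    return "unknown"


def total_dose(doc: dict) -> int:
    """Sum doses, treating a missing or null list and missing doses as zero."""
    drugs = doc.get("drugs") or []
    return sum(d.get("dose", 0) for d in drugs if isinstance(d, dict))


summary = [(surgeon_name(d), total_dose(d)) for d in map(json.loads, docs)]
print(summary)
```

With a schema, these checks live in the database. Without one, they move into every piece of code that reads the data.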
Variety
Handling data in all imaginable forms
Impedance mismatch
- Different data structures: structured, unstructured, semi-structured
- Different data paradigms: Relational, Dimensional, Document, Graph, Object-oriented, etc.
- Different data types: JSON doesn't have a date/time/duration type, XML Schema and SQL have a variety, etc.
- Different markup standards: JSON, XML, RDF, etc.
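The data-type mismatch mentioned on this slide shows up as soon as a date has to cross a JSON boundary. The sketch below uses Python's standard json module, with a helper name (encode_extra) chosen for this example. The date goes out as an ISO 8601 string and comes back as a plain string that the application must convert itself.

```python
import json
from datetime import date

order = {"OrderNumber": 1332, "OrderDate": date(2014, 8, 16)}


def encode_extra(obj):
    """Fallback for types JSON lacks: write dates as ISO 8601 strings."""
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")


text = json.dumps(order, default=encode_extra)
print(text)

# Decoding returns a str, not a date: the type information is lost in transit.
decoded = json.loads(text)
restored = date.fromisoformat(decoded["OrderDate"])
print(type(decoded["OrderDate"]).__name__, restored)
```

The convention that "OrderDate" holds a date exists only in application code, which is one concrete form of the impedance mismatch.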
newSQL
Summary
Use for in-memory, real-time SQL transactions.
Old SQL databases are now providing high-performance in-memory SQL, but they still cannot scale horizontally.
Wide Column and Key Value Modeling
in detail
Wide Column and Key Value Databases
Groups on the slide: Wide Column or Multidimensional Key; Simple Key

Database      Query     Unique Feature
Cassandra     CQL       Schema-defined, collocated, composite columns
HBase         API       Massively sparse columns on HDFS
Aerospike     AQL       Schemaless with dynamically typed columns
Accumulo      API       HBase-like with cell-level security
Oracle NoSQL  API       Consistent fast performance, JSON & Avro Schema, ACID shards
Redis         API       In-memory data structures
MemcacheDB    API       Simple Memcache API
Riak          API       Search, MapReduce
DynamoDB      API       Schemaless: key plus flat JSON-like value
FoundationDB  API, SQL  ACID, user-defined key structures
3 Columnar/Key-Value Models
- Multidimensional key plus cell value
- Wide column
- Simple key plus multidimensional value

Multidimensional Key and Single Cell Model
in detail
Multidimensional Key and Single Cell Model
#1 Model Transactions
Because no joins are possible, create a denormalized hierarchical key structure that connects each attribute of the transaction.

DB   Table     Hospital ID  Op ID  Drug ID  Column Type    Time Stamp  Cell Value
Ops  OpsDrugs  JohnHopkins  13     1997     OperationType  20140814    HeartTransplant
Ops  OpsDrugs  JohnHopkins  13     1997     Surgeon        20140814    DorothyOz
Ops  OpsDrugs  JohnHopkins  13     1997     DrugName       20140814    Minicillan
Ops  OpsDrugs  JohnHopkins  13     1997     DrugMFG        20140814    DrugsRUs
Ops  OpsDrugs  JohnHopkins  13     1997     DoseSize       20140814    200
Ops  OpsDrugs  JohnHopkins  13     1997     DoseUOM        20140814    mg
Ops  OpsDrugs  JohnHopkins  13     2110     OperationType  20140814    HeartTransplant
Ops  OpsDrugs  JohnHopkins  13     2110     Surgeon        20140814    DorothyOz
Ops  OpsDrugs  JohnHopkins  13     2110     DrugName       20140814    Maxicillan
Ops  OpsDrugs  JohnHopkins  13     2110     DrugMFG        20140814    Canada4LessDrugs
Ops  OpsDrugs  JohnHopkins  13     2110     DoseSize       20140814    400
Ops  OpsDrugs  JohnHopkins  13     2110     DoseUOM        20140814    mg
Ops  OpsDrugs  JohnHopkins  13     9448     OperationType  20140814    HeartTransplant
Ops  OpsDrugs  JohnHopkins  13     9448     Surgeon        20140814    DorothyOz
Ops  OpsDrugs  JohnHopkins  13     9448     DrugName       20140814    Minicillan
Ops  OpsDrugs  JohnHopkins  13     9448     DrugMFG        20140814    DrugUSA
Ops  OpsDrugs  JohnHopkins  13     9448     DoseSize       20140814    150
Ops  OpsDrugs  JohnHopkins  13     9448     DoseUOM        20140814    mg
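To make the key structure on this slide concrete, the following sketch shreds the hospital operation document into single cells addressed by a multidimensional key. A plain Python dict stands in for the database, and the input field names (db, op_id, mfg, and so on) are hypothetical. The key order (DB, Table, Hospital ID, Op ID, Drug ID, Column Type, Time Stamp) follows the slide.

```python
# The source document, with hypothetical field names.
operation = {
    "db": "Ops", "table": "OpsDrugs", "hospital": "JohnHopkins", "op_id": 13,
    "timestamp": "20140814", "type": "HeartTransplant", "surgeon": "DorothyOz",
    "drugs": [
        {"id": 1997, "name": "Minicillan", "mfg": "DrugsRUs", "size": 200, "uom": "mg"},
        {"id": 2110, "name": "Maxicillan", "mfg": "Canada4LessDrugs", "size": 400, "uom": "mg"},
        {"id": 9448, "name": "Minicillan", "mfg": "DrugUSA", "size": 150, "uom": "mg"},
    ],
}


def to_cells(op: dict) -> dict:
    """Return {multidimensional key: cell value}, one entry per attribute."""
    cells = {}
    for drug in op["drugs"]:
        prefix = (op["db"], op["table"], op["hospital"], op["op_id"], drug["id"])
        columns = {
            # Operation-level attributes are repeated for every drug (denormalized).
            "OperationType": op["type"], "Surgeon": op["surgeon"],
            "DrugName": drug["name"], "DrugMFG": drug["mfg"],
            "DoseSize": drug["size"], "DoseUOM": drug["uom"],
        }
        for column, value in columns.items():
            cells[prefix + (column, op["timestamp"])] = value
    return cells


cells = to_cells(operation)
print(len(cells))
print(cells[("Ops", "OpsDrugs", "JohnHopkins", 13, 1997, "Surgeon", "20140814")])
```

Three drugs with six columns each give eighteen cells, matching the rows of the slide's table. The surgeon and operation type are stored three times because there is no join to fetch them from elsewhere.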
Multidimensional Key and Single Cell Model
#2 Review Colocation
Structure of Key
The key defines how data is collocated on disk.
- The OpsDrugs table is collocated within the Ops DB.
- Hospital IDs are collocated within the OpsDrugs table.
- Op IDs are collocated within the Hospital ID rows.
- Drug IDs are collocated within the Op ID rows.
- Columns are collocated within each row, etc.
Multidimensional Key and Single Cell Model
#3 Verify Colocation
Effects on Queries
- Queries are fast when they use the key to retrieve data
- You can retrieve all values that are collocated within a portion of the key:

Key prefix                                  Returns
Ops                                         all cells in the Ops database
Ops/OpsDrugs                                all cells in the OpsDrugs table
Ops/OpsDrugs/JohnHopkins                    all cells in the JohnHopkins hospital
Ops/OpsDrugs/JohnHopkins/13                 all cells in Operation 13
Ops/OpsDrugs/JohnHopkins/13/1997            all cells for Drug ID 1997
Ops/OpsDrugs/JohnHopkins/13/1997/Surgeon    the value for the Surgeon cell
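The prefix lookups on this slide can be sketched as a range scan over sorted keys, which is roughly why collocated data is cheap to read. This is an illustration with a sorted Python list, not any particular database's API. Time stamps are omitted from the keys for brevity, and Operation 14 is a hypothetical extra record added to show where a scan stops.

```python
from bisect import bisect_left

# Cells sorted by key: sorting is what places related cells next to each other.
store = sorted([
    (("Ops", "OpsDrugs", "JohnHopkins", "13", "1997", "DrugName"), "Minicillan"),
    (("Ops", "OpsDrugs", "JohnHopkins", "13", "1997", "Surgeon"), "DorothyOz"),
    (("Ops", "OpsDrugs", "JohnHopkins", "13", "2110", "DrugName"), "Maxicillan"),
    (("Ops", "OpsDrugs", "JohnHopkins", "14", "1997", "DrugName"), "Minicillan"),
])
keys = [key for key, _ in store]


def scan(path: str) -> list:
    """Return every (key, value) whose key starts with the given path."""
    prefix = tuple(path.split("/"))
    start = bisect_left(keys, prefix)  # first key that is >= the prefix
    result = []
    for key, value in store[start:]:
        if key[:len(prefix)] != prefix:
            break  # keys are sorted, so the matching run has ended
        result.append((key, value))
    return result


print(len(scan("Ops/OpsDrugs/JohnHopkins/13")))
print(scan("Ops/OpsDrugs/JohnHopkins/13/1997/Surgeon"))
```

A query that follows the key hierarchy is one seek plus a sequential read. A query that ignores it, such as "all cells for surgeon DorothyOz", has to examine every key.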
Multidimensional Key and Single Cell Model
#4 Sharding Strategy
- Determine the best sharding strategy for the data based on the quantities of data in each key and the geographic distribution of data by key
- Configure how the key is sharded and replicated across servers in the cluster and across data centers
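One way to picture sharding by key is to hash a leading portion of the key and use the result to choose a server. The sketch below is a simplified illustration and not how any specific product partitions data: real systems use their own partitioners and replication settings. The function name shard_for and its parameters are hypothetical.

```python
import hashlib


def shard_for(key: tuple, prefix_len: int, num_shards: int) -> int:
    """Pick a shard from the first prefix_len parts of the key."""
    prefix = "/".join(str(part) for part in key[:prefix_len])
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


k1 = ("Ops", "OpsDrugs", "JohnHopkins", 13, 1997, "Surgeon")
k2 = ("Ops", "OpsDrugs", "JohnHopkins", 13, 2110, "DrugName")

# Sharding on DB/Table/Hospital/Op keeps all cells of one operation together.
same_op = shard_for(k1, 4, 8) == shard_for(k2, 4, 8)
print(same_op)

# Sharding one level deeper, on Drug ID, may place the two drugs of the same
# operation on different shards.
print(shard_for(k1, 5, 8), shard_for(k2, 5, 8))
```

The choice of prefix length is the design decision: a short prefix keeps more data together but risks hot shards, while a long prefix spreads load and splits related cells apart.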
Multidimensional Key and Single Cell Model
#5 Modify Key
Modify the key to match query needs, optimize collocation, and optimize sharding.
For example, you may want to move the Drug ID before Op ID. This is because:
- Drugs are queried more often than Operations
- The large amount of data within drugs makes it a good segment for sharding
Since these are natural keys, changing types and structures has major impact.

DB   Table     Hospital ID  Drug ID  Op ID  Column Type    Time Stamp  Cell Value
Ops  OpsDrugs  JohnHopkins  1997     13     OperationType  20140814    HeartTransplant
Ops  OpsDrugs  JohnHopkins  1997     13     Surgeon        20140814    DorothyOz
Ops  OpsDrugs  JohnHopkins  1997     13     DrugName       20140814    Minicillan
Ops  OpsDrugs  JohnHopkins  1997     13     DrugMFG        20140814    DrugsRUs
Ops  OpsDrugs  JohnHopkins  1997     13     DoseSize       20140814    200
Ops  OpsDrugs  JohnHopkins  1997     13     DoseUOM        20140814    mg
Multidimensional Key and Single Cell Model
#6 Create Secondary Indexes
- Create secondary indexes for queries that do not follow the hierarchy of the key
- For example, you need a secondary index on surgeon if you want to quickly find all operations performed by a surgeon
- Secondary indexes slow down inserts, updates, and deletes because they are typically copies of the entire table with a different key
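The surgeon example on this slide can be sketched as an application-maintained secondary index. This sketch keeps a lighter variant than the full table copy the slide describes: the index stores only pointers back to the operation keys. The names primary, by_surgeon, and put are hypothetical, and Operation 27 is an invented second record.

```python
from collections import defaultdict

primary = {}                    # (hospital, op_id, drug_id, column) -> value
by_surgeon = defaultdict(set)   # surgeon -> {(hospital, op_id)}


def put(hospital, op_id, drug_id, column, value):
    """Write a cell, and keep the surgeon index in step with it."""
    primary[(hospital, op_id, drug_id, column)] = value
    if column == "Surgeon":
        # The extra write on every insert is the cost of the index.
        by_surgeon[value].add((hospital, op_id))


put("JohnHopkins", 13, 1997, "Surgeon", "DorothyOz")
put("JohnHopkins", 13, 1997, "DrugName", "Minicillan")
put("JohnHopkins", 13, 2110, "Surgeon", "DorothyOz")
put("JohnHopkins", 27, 1997, "Surgeon", "DorothyOz")  # hypothetical second operation

# Find all operations by a surgeon without scanning the primary keys.
ops = sorted(by_surgeon["DorothyOz"])
print(ops)
```

Updates and deletes would need the same bookkeeping in reverse, which is where most of the slowdown and most of the bugs tend to come from.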
Multidimensional Key and Single Cell Model
#7 Materialize Views
- Because no joins are possible, create materialized views across multiple tables or column families that materialize the join into a new table
- This makes for very fast reads of data across multiple records
- This slows inserts because each insert is repeated multiple times behind the scenes

Source tables:
DB   Table     Hospital ID  Op ID  Drug ID  Column Type    Time Stamp  Cell Value
Ops  OpsDrugs  JohnHopkins  13     1997     OperationType  20140814    HeartTransplant
Ops  OpsDrugs  JohnHopkins  13     1997     Surgeon        20140814    DorothyOz
Ops  OpsDrugs  JohnHopkins  13     1997     DrugName       20140814    Minicillan

Hospital ID  Hospital Administrator
JohnHopkins  JohnAdams

Drug ID  DrugName    DrugSuccessRate
1997     Minicillan  89%

Materialized view:
DB   Table     Hospital ID  Op ID  Drug ID  Column Type            Time Stamp  Cell Value
Ops  OpsDrugs  JohnHopkins  13     1997     OperationType          20140814    HeartTransplant
Ops  OpsDrugs  JohnHopkins  13     1997     Surgeon                20140814    DorothyOz
Ops  OpsDrugs  JohnHopkins  13     1997     DrugName               20140814    Minicillan
Ops  OpsDrugs  JohnHopkins  13     1997     HospitalAdministrator  20140814    JohnAdams
Ops  OpsDrugs  JohnHopkins  13     1997     DrugSuccessRate        20140814    89%
Multidimensional Key and Single Cell Model
#8 Write Code
The developer writes application code against the database API or DSL to:
- create keys
- create secondary indexes
- put data
- delete data
- get data
- join data
- ensure data integrity
(NOTE: joins and data integrity are not part of the database)
Multidim Key/Cell Modeling Exercise
Document ID:       1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street; City, State, PostalCode

Product Name  Product Description      Price  QTY
CSS Book      CSS and HTML Design      $20    1
CSS Book      HTML5 and CSS3 Design    $20    2

PROs:
CONs:
Multidim Key/Cell Modeling Answer

Table   OrderID  Column Type              Cell Value
Orders  1332     OrderDate                20140816
Orders  1332     OrderTotalAmount         $40
Orders  1332     CustomerName             MikeBowers
Orders  1332     CustomerPhone            8015551212
Orders  1332     CustomerStreet           111 My Street
Orders  1332     CustomerCity             SanDiego
Orders  1332     CustomerState            CA
Orders  1332     CustomerPostalCode       92093
Orders  1332     Line1ProductName         CSS Book
Orders  1332     Line1ProductDescription  CSS and HTML Design
Orders  1332     Line1ProductPrice        $20
Orders  1332     Line1ProductQuantity     1
Orders  1332     Line2ProductName         CSS Book
Orders  1332     Line2ProductDescription  HTML5 and CSS3 Design
Orders  1332     Line2ProductPrice        $20
Orders  1332     Line2ProductQuantity     2
Orders  1332     Line3ProductName
Orders  1332     Line3ProductDescription
Orders  1332     Line3ProductPrice
Orders  1332     Line3ProductQuantity
Multidimensional Key and Single Cell Value
PROs
- Fast puts and gets
- Massive scalability
- Easy to shard & replicate
- Data colocation
- Sparsely populated columns
- List, Map, and Set data types
CONs
- No joins
- Must create pre-joined tables
- Create a table for each query
- Shred JSON into flat columns
- No standard query API or language
- Immature tools and platform
- Hard to integrate and hire
Multidimensional Key/Cell Model
Summary
Use for maximum speed and scalability by hand-tuning application code for queries & inserts to create Internet-scale applications.
Example: Netflix, Google, LinkedIn, etc.
Wide Column Model
in detail
Wide Column Model
Multidimensional key plus a cell value
Wide Column
Simple key plus multidimensional value
57
Wide Column Model
#1 Column model
Easier-to-use version of the Multidimensional Key/Cell model
Rows are pivoted into columns in a table
SQL-like query languages make it easy to create tables and to query them
Tuning principles are the same for wide column and multidimensional key models, but logical modeling is different
Joins are not possible
Wide column table (DB: Ops, Table: OpsDrugs):
Hospital ID   Op ID  Drug ID  Operation Type    Surgeon     Drug Name   Drug MFG           Dose Size  Dose UOM
John Hopkins  13     1997     Heart Transplant  Dorothy Oz  Minicillan  Drugs R Us         200        mg
John Hopkins  13     2110     Heart Transplant  Dorothy Oz  Maxicillan  Canada4Less Drugs  400        mg
John Hopkins  13     9448     Heart Transplant  Dorothy Oz  Minicillan  Drug USA           150        mg

The same data in the multidimensional key/cell layout:
DB   Table     Hospital ID   Op ID  Drug ID  Column Type    Time Stamp  Column Value
Ops  OpsDrugs  John Hopkins  13     1997     OperationType  20140814    Heart Transplant
Ops  OpsDrugs  John Hopkins  13     1997     Surgeon        20140814    Dorothy Oz
Ops  OpsDrugs  John Hopkins  13     1997     DrugName       20140814    Minicillan
Ops  OpsDrugs  John Hopkins  13     1997     DrugMFG        20140814    Drugs R Us
Ops  OpsDrugs  John Hopkins  13     1997     DoseSize       20140814    200
Ops  OpsDrugs  John Hopkins  13     1997     DoseUOM        20140814    mg
Ops  OpsDrugs  John Hopkins  13     2110     OperationType  20140814    Heart Transplant
Ops  OpsDrugs  John Hopkins  13     2110     Surgeon        20140814    Dorothy Oz
Ops  OpsDrugs  John Hopkins  13     2110     DrugName       20140814    Maxicillan
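The pivot from key/cell rows into wide-column rows can be sketched as follows. This is an illustration under assumptions, not how any particular database implements it: the `cells` dictionary and the `pivot` function are hypothetical, and the time stamp dimension is left out for brevity.

```python
from collections import defaultdict

# Cells keyed as (hospital_id, op_id, drug_id, column_type) -> value.
cells = {
    ("John Hopkins", 13, 1997, "OperationType"): "Heart Transplant",
    ("John Hopkins", 13, 1997, "Surgeon"): "Dorothy Oz",
    ("John Hopkins", 13, 1997, "DrugName"): "Minicillan",
    ("John Hopkins", 13, 2110, "OperationType"): "Heart Transplant",
    ("John Hopkins", 13, 2110, "DrugName"): "Maxicillan",
}


def pivot(cells):
    """Group cells by row key and turn each column type into a named column."""
    rows = defaultdict(dict)
    for (hospital_id, op_id, drug_id, column), value in cells.items():
        rows[(hospital_id, op_id, drug_id)][column] = value
    return dict(rows)


wide_rows = pivot(cells)
```

Each wide row keeps only the columns that were actually written, so the second row here has no Surgeon column.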
58
Wide Column Model
Option #1
Because no joins are possible, model the "one" as the key of a one-to-many relationship and the "many" as the multi-column value

Hospital Name:   John Hopkins
Operation Type:  Heart Transplant
Operation ID:    13
Surgeon Name:    Dorothy Oz
Drug Name   Drug Manufacturer  Dose Size  Dose UOM
Minicillan  Drugs R Us         200        mg
Maxicillan  Canada4Less Drugs  400        mg
Minicillan  Drug USA           150        mg

Hospital ID   Op ID  Operation Type    Surgeon     Drug ID  Drug Name   Drug MFG           Dose Size  Dose UOM
John Hopkins  13     Heart Transplant  Dorothy Oz  1997     Minicillan  Drugs R Us         200        mg
John Hopkins  13     Heart Transplant  Dorothy Oz  2110     Maxicillan  Canada4Less Drugs  400        mg
John Hopkins  13     Heart Transplant  Dorothy Oz  9448     Minicillan  Drug USA           150        mg
59
Wide Column Model
Option #2
Because no joins are possible, model one-to-many relationships as sparsely populated, nested groups of repetitive columns

Hospital Name:   John Hopkins
Operation Type:  Heart Transplant
Operation ID:    13
Surgeon Name:    Dorothy Oz
Drug Name   Drug Manufacturer  Dose Size  Dose UOM
Minicillan  Drugs R Us         200        mg
Maxicillan  Canada4Less Drugs  400        mg
Minicillan  Drug USA           150        mg

One row, shown here column by column:
Hospital ID     John Hopkins
Op ID           13
Operation Type  Heart Transplant
Surgeon         Dorothy Oz
Drug ID1        1997
Drug Name1      Minicillan
Drug MFG1       Drugs R Us
Dose Size1      200
Dose UOM1       mg
Drug ID2        2110
Drug Name2      Maxicillan
Drug MFG2       Canada4Less Drugs
Dose Size2      400
Dose UOM2       mg
Drug ID3        9448
Drug Name3      Minicillan
Drug MFG3       Drug USA
Dose Size3      150
Dose UOM3       mg
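Flattening the "many" side into numbered column groups can be sketched like this. The function `flatten_operation` and the column naming scheme (field name plus a running number) are assumptions made for illustration.

```python
def flatten_operation(operation, drugs):
    """Copy the 'one' side once and append a numbered column group per drug."""
    row = dict(operation)
    for n, drug in enumerate(drugs, start=1):
        for field, value in drug.items():
            row[f"{field}{n}"] = value
    return row


operation = {"HospitalID": "John Hopkins", "OpID": 13,
             "OperationType": "Heart Transplant", "Surgeon": "Dorothy Oz"}
drugs = [
    {"DrugID": 1997, "DrugName": "Minicillan", "DrugMFG": "Drugs R Us",
     "DoseSize": 200, "DoseUOM": "mg"},
    {"DrugID": 2110, "DrugName": "Maxicillan", "DrugMFG": "Canada4Less Drugs",
     "DoseSize": 400, "DoseUOM": "mg"},
    {"DrugID": 9448, "DrugName": "Minicillan", "DrugMFG": "Drug USA",
     "DoseSize": 150, "DoseUOM": "mg"},
]

row = flatten_operation(operation, drugs)
```

An operation with fewer drugs simply leaves the higher-numbered groups unwritten, which is where the sparse population comes from.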
60
Wide Column Model
Option #3
Because no joins are possible, create columns with embedded one-to-many relationships as nested UDTs, maps, lists, or sets

Hospital Name:   John Hopkins
Operation Type:  Heart Transplant
Operation ID:    13
Surgeon Name:    Dorothy Oz
Drug Name   Drug Manufacturer  Dose Size  Dose UOM
Minicillan  Drugs R Us         200        mg
Maxicillan  Canada4Less Drugs  400        mg
Minicillan  Drug USA           150        mg

One row:
Hospital ID     John Hopkins
Op ID           13
Operation Type  Heart Transplant
Surgeon Name    Dorothy Oz
Drugs           {
                  'drug1':{
                    drug_name:'Minicillan',
                    drug_manufacturer:'DrugsRUs',
                    dose_size:200,dose_uom:'mg'},
                  'drug2':{
                    drug_name:'Maxicillan',
                    drug_manufacturer:'Canada4LessDrugs',
                    dose_size:400,dose_uom:'mg'},
                  'drug3':{
                    drug_name:'Minicillan',
                    drug_manufacturer:'DrugUSA',
                    dose_size:150,dose_uom:'mg'}
                }
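A rough sketch of what working with such a nested column looks like from application code, assuming the row has already been read into a Python dictionary. The helper names `drugs_named` and `total_dose` are hypothetical; the point is that filtering inside the embedded map happens in the application after the whole row is fetched.

```python
row = {
    "hospital_id": "John Hopkins",
    "op_id": 13,
    "operation_type": "Heart Transplant",
    "surgeon": "Dorothy Oz",
    "drugs": {
        "drug1": {"drug_name": "Minicillan", "drug_manufacturer": "DrugsRUs",
                  "dose_size": 200, "dose_uom": "mg"},
        "drug2": {"drug_name": "Maxicillan", "drug_manufacturer": "Canada4LessDrugs",
                  "dose_size": 400, "dose_uom": "mg"},
        "drug3": {"drug_name": "Minicillan", "drug_manufacturer": "DrugUSA",
                  "dose_size": 150, "dose_uom": "mg"},
    },
}


def drugs_named(row, name):
    """Filter the embedded map in application code after reading the whole row."""
    return sorted(key for key, drug in row["drugs"].items()
                  if drug["drug_name"] == name)


def total_dose(row):
    """Sum dose sizes across the embedded map (all doses here are in mg)."""
    return sum(drug["dose_size"] for drug in row["drugs"].values())
```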
61
Wide Column Modeling Exercise
Document ID:       1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street; City, State, Postal Code
Product Name  Product Description    Price  QTY
CSS Book      CSS and HTML Design    $20    1
CSS Book      HTML5 and CSS3 Design  $20    2
PROs
CONs
62
Wide Column Modeling Answer #1
Document ID:       1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street; City, State, Postal Code
Product Category  Product Description    Price  QTY
CSS Book          CSS and HTML Design    $20    1
CSS Book          HTML5 and CSS3 Design  $20    2

One row:
Order ID          1332
Order Date        20140816
Customer Address  {
                    street:'111MyStreet',
                    city:'SanDiego',
                    state:'CA',
                    postal:'92093'
                  }
Order Items       {
                    'product1': {
                      category:'CSSBook',
                      description:'CSSandHTMLDesign',
                      price:20.00,
                      quantity:1},
                    'product2': {
                      category:'CSSBook',
                      description:'HTML5andCSS3Design',
                      price:20.00,
                      quantity:2}
                  }
Cassandra UDTs look somewhat like JSON, but they are not JSON.
They have limited query capabilities.
See "Using User-Defined Types in Cassandra"
63
Wide Column Modeling Answer #2
Document ID:       1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street; City, State, Postal Code
Product Category  Product Description    Price  QTY
CSS Book          CSS and HTML Design    $20    1
CSS Book          HTML5 and CSS3 Design  $20    2

One row per order line, shown here column by column:
Column               Row 1                Row 2
Order ID             1332                 1332
Order Line ID
Order Date           20140816             20140816
Customer Name        Mike Bowers          Mike Bowers
Customer Phone       8015551212           8015551212
Customer Street      111 My Street        111 My Street
Customer City        San Diego            San Diego
Customer State       CA                   CA
Customer Postal      92093                92093
Product Category     CSS Book             CSS Book
Product Description  CSS and HTML Design  HTML5 and CSS3 Design
Product Price        20.00                20.00
Product Quantity     1                    2
64
Wide Column Modeling Answer #3
Document ID:       1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street; City, State, Postal Code
Product Category  Product Description    Price  QTY
CSS Book          CSS and HTML Design    $20    1
CSS Book          HTML5 and CSS3 Design  $20    2

One row per product, shown here column by column:
Column               Row 1                Row 2
Product ID           17                   9466
Order ID             1332                 1332
Product Category     CSS Book             CSS Book
Product Description  CSS and HTML Design  HTML5 and CSS3 Design
Product Price        20.00                20.00
Product Quantity     1                    2
Order Date           20140816             20140816
Customer Name        Mike Bowers          Mike Bowers
Customer Phone       8015551212           8015551212
Customer Street      111 My Street        111 My Street
Customer City        San Diego            San Diego
Customer State       CA                   CA
Customer Postal      92093                92093
65
Wide Column
PROs
Table-like with SQL-like queries
Fast puts and gets
Massive scalability
Easy to shard & replicate
Data colocation
Sparsely populated columns
List, Map, and Set data types
CONs
Model by query: create a table per query
No joins: must create pre-joined tables
Shred JSON into flat columns or flat maps
No standard query API or language
Immature tools and platform
Hard to integrate and hire
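"Model by query" can be illustrated with a small sketch in which every query gets its own pre-joined table and each write goes to all of them. The two dictionaries and the function `insert_order_line` are hypothetical stand-ins for real tables.

```python
from collections import defaultdict

# One hypothetical "table" per query, each keyed by what that query filters on.
lines_by_order = defaultdict(list)      # query: all lines of an order
lines_by_customer = defaultdict(list)   # query: everything a customer bought


def insert_order_line(line):
    """Write the same pre-joined row into every query table."""
    lines_by_order[line["order_id"]].append(line)
    lines_by_customer[line["customer_name"]].append(line)


insert_order_line({"order_id": 1332, "customer_name": "Mike Bowers",
                   "product_description": "CSS and HTML Design", "quantity": 1})
insert_order_line({"order_id": 1332, "customer_name": "Mike Bowers",
                   "product_description": "HTML5 and CSS3 Design", "quantity": 2})

bought = [line["product_description"] for line in lines_by_customer["Mike Bowers"]]
```

The trade-off is visible in `insert_order_line`: reads become single-key lookups, while every new query pattern adds another table that each write must keep up to date.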
66
Wide Column Model
summary
Use for maximum speed and scalability
with SQL-like code for all operations
to create Internet-scale applications
Example:
Apple, Netflix, Google, LinkedIn, etc.
67
Key/Value Model in detail
68
Simple Key/Multidimensional Value Model
Multidimensional key plus a cell value
Wide Column
Simple key plus multidimensional value
69
Simple Key/Multidimensional Value
#1 Model Transactions
Create a denormalized flat data structure for each attribute in the transaction
Each record will be accessed through a simple, meaningless key

Hospital Name:   John Hopkins
Operation Type:  Heart Transplant
Operation ID:    13
Surgeon Name:    Dorothy Oz
Drug Name   Drug Manufacturer  Dose Size  Dose UOM
Minicillan  Drugs R Us         200        mg
Maxicillan  Canada4Less Drugs  400        mg
Minicillan  Drug USA           150        mg

One record per drug, shown here column by column (the slide shows key 1 for the first record):
Column          Record 1          Record 2           Record 3
Key             1
Value:
Type            OpsDrugs          OpsDrugs           OpsDrugs
Hospital ID     John Hopkins      John Hopkins       John Hopkins
Op ID           13                13                 13
Drug ID         1997              2110               9448
Time Stamp      20140814          20140814           20140814
Operation Type  Heart Transplant  Heart Transplant   Heart Transplant
Surgeon         Dorothy Oz        Dorothy Oz         Dorothy Oz
Drug Name       Minicillan        Maxicillan         Minicillan
Drug MFG        Drugs R Us        Canada4Less Drugs  Drug USA
Dose Size       200               400                150
Dose UOM        mg                mg                 mg
71
Simple Key/Multidimensional Value
#2 Create Secondary Indexes
Determine which attributes need to be queried and create secondary indexes on them
Primary and secondary indexes are for fast queries within records.
They do not make it possible to do joins between records.
Key/Value databases limit the number of secondary indexes:
A DynamoDB table can have up to 5 global secondary indexes and 5 local
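The role of a secondary index can be sketched with a toy store. This is an assumption-laden illustration, not the API of DynamoDB or any other product: the class `KeyValueStore` and its methods are hypothetical. Records are fetched by primary key, and only attributes that were declared as indexed can be queried by value.

```python
from collections import defaultdict


class KeyValueStore:
    """Toy key/value store with opt-in secondary indexes (illustrative only)."""

    def __init__(self, indexed_attributes):
        self.records = {}
        self.indexes = {attr: defaultdict(set) for attr in indexed_attributes}

    def put(self, key, value):
        old = self.records.get(key)
        if old is not None:
            # Remove stale index entries before overwriting the record.
            for attr, index in self.indexes.items():
                if attr in old:
                    index[old[attr]].discard(key)
        self.records[key] = value
        for attr, index in self.indexes.items():
            if attr in value:
                index[value[attr]].add(key)

    def get(self, key):
        return self.records.get(key)

    def query(self, attr, wanted):
        """Look up keys through a secondary index; unindexed attributes are rejected."""
        if attr not in self.indexes:
            raise KeyError(f"no secondary index on {attr!r}")
        return sorted(self.indexes[attr].get(wanted, set()))


store = KeyValueStore(indexed_attributes=["DrugName"])
store.put(1, {"DrugID": 1997, "DrugName": "Minicillan", "DoseSize": 200})
store.put(2, {"DrugID": 2110, "DrugName": "Maxicillan", "DoseSize": 400})
store.put(3, {"DrugID": 9448, "DrugName": "Minicillan", "DoseSize": 150})

minicillan_keys = store.query("DrugName", "Minicillan")
```

The index answers "which records have this value", but it still returns whole records one key at a time; it does not join records to each other.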
72
Simple Key/Multidimensional Value
#3 Write Code
Developer writes application code against the database API or DSL to create keys, create secondary indexes, put records, delete records, get records, join records, ensure data integrity
Joins and data integrity have to be done in application code because the database can't
To do joins in the application, you retrieve a record, read any embedded IDs to other records, and then retrieve those records.
Key/Value databases, such as DynamoDB, are starting to support JSON-like values, which makes them more like Document Databases
{"_id": 1,
 "_type":"Operation",
 "operation":{
  "hospitalName":"JohnHopkins",
  "operationTypeName":"HeartTransplant",
  "surgeonName":"DorothyOz",
  "operationNumber":13,
  "administeredDrugs":[
   {"drugName":"Minicillan","drugManufacturer":"DrugsRUs","drugDoseSize":200,"drugDoseUOM":"mg"},
   {"drugName":"Maxicillan","drugManufacturer":"Canada4Less","drugDoseSize":400,"drugDoseUOM":"mg"},
   {"drugName":"Minicillan","drugManufacturer":"DrugUSA","drugDoseSize":150,"drugDoseUOM":"mg"}]}}
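The application-side join described on this slide (retrieve a record, read its embedded IDs, then retrieve those records) might look like the sketch below. The key format (`op:13`, `drug:10000`), the `drugIds` field, and the function name are all hypothetical.

```python
# Hypothetical records: an operation stores only the IDs of the drugs it used.
store = {
    "op:13": {"surgeonName": "Dorothy Oz", "drugIds": ["drug:10000", "drug:20000"]},
    "drug:10000": {"drugName": "Minicillan"},
    "drug:20000": {"drugName": "Maxicillan"},
}


def get_operation_with_drugs(store, op_key):
    """Join in the application: one get for the record, then one get per embedded ID."""
    operation = dict(store[op_key])  # shallow copy so the stored record is untouched
    drugs = []
    for drug_key in operation.pop("drugIds"):
        drug = store.get(drug_key)
        if drug is None:
            # The database enforces no referential integrity, so the application decides.
            raise LookupError(f"dangling reference {drug_key!r} in {op_key!r}")
        drugs.append(drug)
    operation["drugs"] = drugs
    return operation


joined = get_operation_with_drugs(store, "op:13")
```

Each referenced record costs one more round trip, which is why the slides favor pre-joined or embedded data when read speed matters.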
73
Simple Key Modeling Exercise
Document ID:       1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street; City, State, Postal Code
Product Name  Product Description    Price  QTY
CSS Book      CSS and HTML Design    $20    1
CSS Book      HTML5 and CSS3 Design  $20    2
PROs
CONs
74
Simple Key Modeling Answer #1
Document ID:       1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street; City, State, Postal Code
Product Category  Product Description    Price  QTY
CSS Book          CSS and HTML Design    $20    1
CSS Book          HTML5 and CSS3 Design  $20    2

One record per order line, shown here column by column:
Column               Record 1             Record 2
ID
Order ID             1332                 1332
Order Line ID
Order Date           20140816             20140816
Customer Name        Mike Bowers          Mike Bowers
Customer Phone       8015551212           8015551212
Customer Street      111 My Street        111 My Street
Customer City        San Diego            San Diego
Customer State       CA                   CA
Customer Postal      92093                92093
Product Category     CSS Book             CSS Book
Product Description  CSS and HTML Design  HTML5 and CSS3 Design
Product Price        20.00                20.00
Product Quantity     1                    2
75
Simple Key Answer #2: JSON-like Types
Document ID:       1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street; City, State, Postal Code
Product Category  Product Description    Price  QTY
CSS Book          CSS and HTML Design    $20    1
CSS Book          HTML5 and CSS3 Design  $20    2

{
 "id":"1332",
 "type":"order",
 "orderDate":"20140816",
 "customerName":"MikeBowers",
 "customerPhone":"8015551212",
 "customerAddress":{
  "customerAddressStreet":"111MyStreet",
  "customerAddressCity":"SanDiego",
  "customerAddressState":"CA",
  "customerAddressPostalCode":"92093"
 },
 "product":[
  {"productCategory":"CSSBook",
   "productDescription":"CSSandHTMLDesign",
   "productPrice":20.00,
   "productQuantity":1
  },
  {"productCategory":"CSSBook",
   "productDescription":"HTML5andCSS3Design",
   "productPrice":20.00,
   "productQuantity":2
  }
 ]
}
76
Simple Key and Multidimensional Value
PROs
Fast puts and gets
Massive scalability
Inexpensive
Data in transactional context
Developer in control
Easy to shard & replicate
Very simple to model
CONs
No joins; app implements joins
No referential integrity
Secondary indexes required to query values
Shred JSON value into flat columns or JSON-like types
Design queries using nonstandard API or query language
Cannot query for relevance
Immature tools and platform
Hard to integrate and hire expertise
77
Key/Value
summary
Use when you need maximum speed to retrieve a set of values by key
Typically the value is a blob or a set of flat values
A document DB is better than a simple key/value DB because JSON & XML are true object structures
78
Document Modeling in detail
79
What is a document?
A document is a nested structure referenced by key
{"_id":"1",
"_type":"Operation",
"operation":{
"hospitalName":"JohnHopkins",
"operationTypeName":"HeartTransplant",
"surgeonName":"DorothyOz",
"operationNumber":13,
"administeredDrugs":[
{"drugName":"Minicillan","drugManufacturer":"DrugsRUs","drugDoseSize":200,"drugDoseUOM":"mg"},
{"drugName":"Maxicillan","drugManufacturer":"Canada4Less","drugDoseSize":400,"drugDoseUOM":"mg"},
{"drugName":"Minicillan","drugManufacturer":"DrugUSA","drugDoseSize":150,"drugDoseUOM":"mg"}
],
"relations":{
"values":[
{"subject":"1","predicate":"opHospital","object":"10","hospitalAddress":"1057Mayberry"},
{"subject":"1","predicate":"opType","object":"100","insuranceCode":21187},
{"subject":"1","predicate":"opSurgeon","object":"1000","surgeonSuccessRate":0.87},
{"subject":"1","predicate":"opDrug","object":"10000","drugEfficacy":0.8,"drugRecalls":1},
{"subject":"1","predicate":"opDrug","object":"20000","drugEfficacy":0.5,"drugRecalls":3},
{"subject":"1","predicate":"opDrug","object":"30000","drugEfficacy":0.7,"drugRecalls":1}
]}}}
81
JSON vs. XML

JSON
{"section":{
  "heading":"Data Models",
  "paragraphs":[
    {"paragraph":[
      {"s":"This paper shows." }]},
    {"paragraph":[
      "The",
      {"i":"relational"},
      "model is no longer,",
      {"br":null},
      "the only game in town."]}]}}
1. Best for structured data (text poured into objects)
2. No document type and immature schemas
3. Objects, arrays, floats, strings, booleans, nulls
4. No namespaces, no comments, no attributes
5. Easy, simple, compact, and fast to parse

XML
<section>
  <heading>Data Models</heading>
  <paragraph>
    This paper shows.</paragraph>
  <paragraph>
    The
    <i>relational</i>
    model is no longer,
    <br/>
    the only game in town. </paragraph></section>
1. Best for structured text (structure added on top of text)
2. Document types with optional mature schemas
3. Objects, sets, all data types: dates, durations, integers, etc.
4. Namespaces, comments, attributes
5. Attributes add metadata; namespaces embed object types
82
Document Modeling
#1 Model Transactions
Each document is a transaction
JSON data in the application is the JSON data in the database
Document ID is the primary key
Each document includes all data captured during the transaction
Each document is historically accurate for its point in time
Use secondary indexes to flatten structure to make queries flexible
Use search indexes to find the most relevant documents

Document ID:       1
Hospital Name:     John Hopkins
Operation Type:    Heart Transplant
Surgeon Name:      Dorothy Oz
Operation Number:  13
Drug Name   Drug Manufacturer  Dose Size  Dose UOM
Minicillan  Drugs R Us         200        mg
Maxicillan  Canada4Less Drugs  400        mg
Minicillan  Drug USA           150        mg

Document modeling is the opposite of relational: start denormalized and normalize to match transaction patterns
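One way to picture how a secondary index "flattens" a nested document is to walk the document and emit one (path, value) pair per leaf. The sketch below is only an illustration of that idea; the `flatten` function, the dotted path format, and the index layout are assumptions, not the mechanism of any specific document database.

```python
def flatten(node, path=""):
    """Yield (path, value) pairs for every leaf in a nested dict/list document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for child in node:
            # Array positions are dropped so that all items share one path.
            yield from flatten(child, path)
    else:
        yield path, node


document = {
    "_id": "1",
    "operation": {
        "surgeonName": "DorothyOz",
        "administeredDrugs": [
            {"drugName": "Minicillan", "drugDoseSize": 200},
            {"drugName": "Maxicillan", "drugDoseSize": 400},
        ],
    },
}

# Index: (path, value) -> set of document IDs containing that value at that path.
index = {}
for path, value in flatten(document):
    index.setdefault((path, value), set()).add(document["_id"])
```

With such an index, a query like "documents where operation.administeredDrugs.drugName is Maxicillan" becomes a single dictionary lookup regardless of how deeply the value is nested.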
83
Document Modeling
#2 Create Reference Docs
Create additional document types for each type of reference data, such as Hospitals, Operation Types, and Drugs

Transaction document:
Document ID:       1
Hospital Name:     John Hopkins
Operation Type:    Heart Transplant
Surgeon Name:      Dorothy Oz
Operation Number:  13
Drug Name   Drug Manufacturer  Dose Size  Dose UOM
Minicillan  Drugs R Us         200        mg
Maxicillan  Canada4Less Drugs  400        mg
Minicillan  Drug USA           150        mg

Reference documents:
Hospital ID  Hospital Name
10           John Hopkins
20           Boston Childrens

Operation Type ID  Operation Type Name
100                Heart Transplant
200                Appendectomy

Surgeon ID  Surgeon Name
1000        Dorothy Oz
2000        Van Tristic

Drug ID  Drug Name   Drug Manufacturer
10000    Minicillan  Drugs R Us
20000    Maxicillan  Canada4Less Drugs
30000    Minicillan  Drug USA
84
Document Modeling
#3a Connect References
You can query document relationships
You can join documents in the database and return the results of the join

Subject (Trans Doc)  Predicate (Relationship)  Object (Ref Doc)
1                    SurgeryInHospital         10
1                    SurgeryOperationType      100
1                    SurgerySurgeon            1000
1                    SurgeryDrugsGiven         10000
1                    SurgeryDrugsGiven         20000
1                    SurgeryDrugsGiven         30000

Document ID:       1
Hospital Name:     John Hopkins
Operation Type:    Heart Transplant
Surgeon Name:      Dorothy Oz
Operation Number:  13
Drug Name   Drug Manufacturer  Dose Size  Dose UOM
Minicillan  Drugs R Us         200        mg
Maxicillan  Canada4Less Drugs  400        mg
Minicillan  Drug USA           150        mg
85
Document Modeling
#3b Connect References
OK: Use document references to unidirectionally connect each transaction document to its reference documents.
This makes it easy to create links from a transaction document to its reference documents.
If you index each reference ID, then queries for referenced documents can be fast
This is not a join: your application code reads in a document, finds referenced documents, and retrieves them one by one

Transaction document with embedded reference IDs:
Document ID:       1
Hospital Name:     John Hopkins (Hospital ID 10)
Operation Type:    Heart Transplant (Operation Type ID 100)
Surgeon Name:      Dorothy Oz
Operation Number:  13
Drug ID  Drug Name   Drug Manufacturer  Dose Size  Dose UOM
10000    Minicillan  Drugs R Us         200        mg
20000    Maxicillan  Canada4Less Drugs  400        mg
30000    Minicillan  Drug USA           150        mg

Reference documents:
Hospital ID  Hospital Name
10           John Hopkins

Operation Type ID  Operation Type Name
100                Heart Transplant

Drug ID  Drug Name   Drug Manufacturer
10000    Minicillan  Drugs R Us
20000    Maxicillan  Canada4Less Drugs
30000    Minicillan  Drug USA
86
Document Modeling
#4 Sync References
Optionally choose to synchronize changes in reference documents into transaction documents.
To preserve history, add changes to new elements.
To maximize integrity, overwrite the original.

Changed reference documents:
Hospital ID  Hospital Name
10           JH Research

Operation Type ID  Operation Type Name
100                Cardiac Transplant

Surgeon ID  Surgeon Name
1000        Dorothy Wiz

Drug ID  Drug Name   Drug Manufacturer
20000    Maxicillan  Best Drugs

Transaction document (original value, then synchronized value):
Document ID:       1
Hospital Name:     John Hopkins / JH Research
Operation Type:    Heart Transplant / Cardiac Transplant
Surgeon Name:      Dorothy Oz / Dorothy Wiz
Operation Number:  13
Drug Name   Drug Manufacturer  Dose Size  Dose UOM
Minicillan  Drugs R Us         200        mg
Maxicillan  Best Drugs         400        mg
Minicillan  Drug USA           150        mg
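The two synchronization choices on this slide (add the change to a new element, or overwrite the original) can be sketched as a single function with a flag. The function name and the `Current` suffix used for the new element are assumptions made for this illustration.

```python
def sync_reference(transaction, field, new_value, preserve_history=True):
    """Apply a changed reference value to a copy of a transaction document.

    With preserve_history=True the original value is kept and the change is
    written to a new element; otherwise the original value is overwritten.
    """
    updated = dict(transaction)
    if preserve_history:
        updated[f"{field}Current"] = new_value  # original element stays untouched
    else:
        updated[field] = new_value
    return updated


transaction = {"hospitalName": "John Hopkins", "operationNumber": 13}

with_history = sync_reference(transaction, "hospitalName", "JH Research")
overwritten = sync_reference(transaction, "hospitalName", "JH Research",
                             preserve_history=False)
```

Keeping both values lets the document stay historically accurate for its point in time while still being findable under the current name.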
87
Document Modeling
#5 Projections
Optionally choose to project values from reference documents into transaction documents.
For maximum read and search performance, project data during writes.
For maximum write performance, project data during reads.

Reference documents:
Drug ID  Drug Name   Drug Manufacturer  Drug Efficacy  Drug Recalls
10000    Minicillan  Drugs R Us         80%            1
20000    Maxicillan  Best Drugs         50%            3
30000    Minicillan  Drug USA           70%            1

Transaction document with projected values:
Document ID:       1
Hospital Name:     John Hopkins
Operation Type:    Heart Transplant
Surgeon Name:      Dorothy Oz
Operation Number:  13
Drug Name   Drug Manufacturer  Drug Efficacy  Drug Recalls  Dose Size  Dose UOM
Minicillan  Drugs R Us         80%            1             200        mg
Maxicillan  Best Drugs         50%            3             400        mg
Minicillan  Drug USA           70%            1             150        mg
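The difference between projecting at write time and at read time can be sketched as follows. All names here (`drug_reference`, `project`, `write_with_projection`, `read_with_projection`, and the `drugId` field) are hypothetical; the stores are plain dictionaries standing in for a database.

```python
drug_reference = {
    10000: {"drugEfficacy": 0.8, "drugRecalls": 1},
    20000: {"drugEfficacy": 0.5, "drugRecalls": 3},
    30000: {"drugEfficacy": 0.7, "drugRecalls": 1},
}


def project(drug, reference):
    """Return a copy of the embedded drug with its reference values merged in."""
    return {**drug, **reference[drug["drugId"]]}


def write_with_projection(store, doc_id, operation, reference):
    """Project during the write: reads and searches see the extra values directly."""
    stored = dict(operation)
    stored["administeredDrugs"] = [project(d, reference)
                                   for d in operation["administeredDrugs"]]
    store[doc_id] = stored


def read_with_projection(store, doc_id, reference):
    """Project during the read: writes stay cheap, every read pays for the lookups."""
    stored = store[doc_id]
    return {**stored,
            "administeredDrugs": [project(d, reference)
                                  for d in stored["administeredDrugs"]]}


operation = {
    "operationNumber": 13,
    "administeredDrugs": [
        {"drugId": 10000, "drugName": "Minicillan"},
        {"drugId": 20000, "drugName": "Maxicillan"},
    ],
}

projected_at_write = {}
write_with_projection(projected_at_write, "1", operation, drug_reference)

projected_at_read = {"1": operation}
view = read_with_projection(projected_at_read, "1", drug_reference)
```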
88
Why is the document model best for developer productivity?
JSON is the lingua franca of the web
JSON REST Web Services are modeled first (relational is an afterthought)
Documents support agile development without a schema
Documents handle rapidly changing requirements
Documents handle deeply hierarchical, complex, and highly variable structures
Documents have no impedance mismatch between application and database
Documents make search relevance possible
Full-text search in context of document structure
Full-featured queries of any data anywhere in a document
89
Document Modeling Exercise
Document ID:       1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street; City, State, Postal Code
Product Name  Product Description    Price  QTY
CSS Book      CSS and HTML Design    $20    1
CSS Book      HTML5 and CSS3 Design  $20    2
PROs
CONs
90
Document Modeling Answer
Document ID:       1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street; City, State, Postal Code
Product Name  Product Description    Price  QTY
CSS Book      CSS and HTML Design    $20    1
CSS Book      HTML5 and CSS3 Design  $20    2

{
 "id":"1332",
 "type":"order",
 "orderDate":"20140816",
 "customerName":"MikeBowers",
 "customerPhone":"8015551212",
 "customerAddress":{
  "customerAddressStreet":"111MyStreet",
  "customerAddressCity":"SanDiego",
  "customerAddressState":"CA",
  "customerAddressPostalCode":"92093"
 },
 "product":[
  {"productCategory":"CSSBook",
   "productDescription":"CSSandHTMLDesign",
   "productPrice":20.00,
   "productQuantity":1
  },
  {"productCategory":"CSSBook",
   "productDescription":"HTML5andCSS3Design",
   "productPrice":20.00,
   "productQuantity":2
  }
 ]
}
91
Document Model
PROs
Fastest development
Index everything, query anything
Self-service, ad hoc queries
Schemaless, design data at runtime
JSON and/or XML data structures
Queries everything in context with relevance
Turns data into information
CONs
Defensive programming for schemaless data
Expensive platforms, immature tools
Nonstandard query languages
Not as fast as wide column and simple key/value databases

Simple Key/Document Value
{
 "id":"1332",
 "type":"order",
 "orderDate":"20140816",
 "customerName":"MikeBowers",
 "customerPhone":"8015551212",
 "customerAddress":{
  "customerAddressStreet":"111MyStreet",
  "customerAddressCity":"SanDiego",
  "customerAddressState":"CA",
  "customerAddressPostalCode":"92093"
 },
 "product":[
  {"productCategory":"CSSBook",
   "productDescription":"CSSandHTMLDesign",
   "productPrice":20.00,
   "productQuantity":1
  },
  {"productCategory":"CSSBook",
   "productDescription":"HTML5andCSS3Design",
   "productPrice":20.00,
   "productQuantity":2
  }
 ]
}
92
Document Model
summary
Use when you need maximum developer productivity
and great speed and scalability
Example:
Enterprise applications, Web sites, etc.
93
Document Model
tip
Use JSON for objects
Use XML for text
(to mark up structure, semantics, and data)
94
Graph Modeling in detail
95
What is a Triple?
A triple is three IDs: subject, predicate, object
{"triples":[
{"subject":"docID_1","predicate":"opDrug","object":"docID_10000"},
{"subject":"docID_1","predicate":"opDrug","object":"docID_20000"},
{"subject":"docID_1","predicate":"opDrug","object":"docID_30000"}]}
Subject
It is the focus of the triple
It is a URI unique to the database
Predicate
Specifies the relationship between the subject and the object
It is a URI typically defined by external ontologies
Object
Specifies the target of the subject's relationship
It is a value or a URI: typically the URI of another subject, or a string, number, date, etc.
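A minimal sketch of triples and pattern matching over them, assuming an in-memory list in place of a triple store. The `Triple` type and the `match` function are hypothetical; passing `None` for a position treats it as a wildcard.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


triples = [
    Triple("docID_1", "opDrug", "docID_10000"),
    Triple("docID_1", "opDrug", "docID_20000"),
    Triple("docID_1", "opDrug", "docID_30000"),
]


def match(triples, subject=None, predicate=None, object=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (object is None or t.object == object)]


drugs_of_operation = match(triples, subject="docID_1", predicate="opDrug")
operations_using_drug = match(triples, predicate="opDrug", object="docID_20000")
```

The same list answers questions in either direction (which drugs did this operation use, which operations used this drug) because neither end of a triple is privileged.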
97
What is a Quad?
A Quad is a Triple prefixed with the collection it belongs to
[Figure: graph with the nodes Surgeon, Operation, Person, and Hospital, connected by the relationships excels at, performed, operated on, operated at, patient at, and works at]
{"quads":[
{"collection":"HospitalOps","subject":"surgeonDoc1","predicate":"excelsAt","object":"operationDoc13"},
{"collection":"HospitalOps","subject":"surgeonDoc1","predicate":"performed","object":"operationDoc13"},
{"collection":"HospitalOps","subject":"surgeonDoc1","predicate":"operatedOn","object":"userDoc1554"},
{"collection":"HospitalOps","subject":"surgeonDoc1","predicate":"worksAt","object":"hospitalDoc10"},
{"collection":"HospitalOps","subject":"operationDoc13","predicate":"requestingUser","object":"userDoc1554"},
{"collection":"HospitalOps","subject":"operationDoc13","predicate":"operatedAt","object":"hospitalDoc10"},
{"collection":"HospitalOps","subject":"userDoc1554","predicate":"patientAt","object":"hospitalDoc10"}]}
98
Triples Deconstruct Data Atomically
Triples break down data into singular items identified by IDs
This is like deconstructing data into electrons, neutrons, and protons so that you can reconstruct any type of atom and then combine atoms into molecules, and combine molecules into compounds, etc.
99
Triples Focus on Relationships, not Data
The primary focus of triples is on relationships between items:
Traversing a network of relationships
Finding items that have the same relationship patterns
To get any information about an item requires querying relationships to other items
To make this easier, some Triple databases allow items to have properties or allow items to be documents
100
Connecting Triples and Documents
Neo4j is a property graph database
It provides properties on subjects, predicates, and objects (i.e. nodes and relationships)
MarkLogic is an RDF semantic graph database
It allows subjects, predicates, and objects to be references to documents, and it allows documents to contain embedded triples and projections of triple data
OrientDB connects documents using triples
101
Embedding Triples in a Document
Use triples to relate documents bidirectionally
{"_id":"1",
"_type":"Operation",
"operation":{
"hospitalName":"JohnHopkins",
"operationTypeName":"HeartTransplant",
"surgeonName":"DorothyOz",
"operationNumber":13,
"administeredDrugs":[
{"drugName":"Minicillan","drugManufacturer":"DrugsRUs","drugDoseSize":200,"drugDoseUOM":"mg"},
{"drugName":"Maxicillan","drugManufacturer":"Canada4Less","drugDoseSize":400,"drugDoseUOM":"mg"},
{"drugName":"Minicillan","drugManufacturer":"DrugUSA","drugDoseSize":150,"drugDoseUOM":"mg"}
],
"relations":{
"values":[
{"subject":"1","predicate":"opHospital","object":"10"},
{"subject":"1","predicate":"opType","object":"100"},
{"subject":"1","predicate":"opSurgeon","object":"1000"},
{"subject":"1","predicate":"opDrug","object":"10000"},
{"subject":"1","predicate":"opDrug","object":"20000"},
{"subject":"1","predicate":"opDrug","object":"30000"}
]}}}
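Relating documents bidirectionally from embedded triples can be sketched by building two lookup tables from the `relations` values: one from subject to object and one from object back to subject. The variable names are hypothetical, and only a subset of the relations is used.

```python
from collections import defaultdict

relations = [
    {"subject": "1", "predicate": "opHospital", "object": "10"},
    {"subject": "1", "predicate": "opDrug", "object": "10000"},
    {"subject": "1", "predicate": "opDrug", "object": "20000"},
]

outgoing = defaultdict(list)  # subject -> [(predicate, object)]
incoming = defaultdict(list)  # object  -> [(predicate, subject)]
for triple in relations:
    outgoing[triple["subject"]].append((triple["predicate"], triple["object"]))
    incoming[triple["object"]].append((triple["predicate"], triple["subject"]))

# Forward: which drug documents does operation document 1 point to?
drugs_given_in_operation_1 = [o for p, o in outgoing["1"] if p == "opDrug"]
# Backward: which operation documents point to drug document 20000?
operations_that_used_drug_20000 = [s for p, s in incoming["20000"] if p == "opDrug"]
```

A plain document reference only supports the forward lookup; indexing the triples in both directions is what makes the backward question cheap.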
102
Projecting Document Values into Triples
At write or read time you can project data into docs

Document ID:       1
Hospital Name:     John Hopkins
Operation Type:    Heart Transplant
Surgeon Name:      Dorothy Oz
Operation Number:  13

Drug ID  Drug Name   Drug Manufacturer  Dose Size  Dose UOM
10000    Minicillan  Drugs R Us         200        mg
20000    Maxicillan  Canada4Less        400        mg
30000    Minicillan  Drug USA           150        mg

Drug ID  Drug Efficacy  Drug Recalls
10000    80%            1
20000    50%            3
30000    70%            1

{"_id":"1",
 "_type":"Operation",
 "operation":{
  "hospitalName":"JohnHopkins",
  "operationTypeName":"HeartTransplant",
  "surgeonName":"DorothyOz",
  "operationNumber":13,
  "administeredDrugs":[
   {"drugName":"Minicillan","drugManufacturer":"DrugsRUs","drugDoseSize":200,"drugDoseUOM":"mg"},
   {"drugName":"Maxicillan","drugManufacturer":"Canada4Less","drugDoseSize":400,"drugDoseUOM":"mg"},
   {"drugName":"Minicillan","drugManufacturer":"DrugUSA","drugDoseSize":150,"drugDoseUOM":"mg"}
  ],
  "relations":{
   "values":[
    {"subject":"1","predicate":"opHospital","object":"10","hospitalAddress":"1057Mayberry"},
    {"subject":"1","predicate":"opType","object":"100","insuranceCode":21187},
    {"subject":"1","predicate":"opSurgeon","object":"1000","surgeonSuccessRate":0.87},
    {"subject":"1","predicate":"opDrug","object":"10000","drugEfficacy":0.8,"drugRecalls":1},
    {"subject":"1","predicate":"opDrug","object":"20000","drugEfficacy":0.5,"drugRecalls":3},
    {"subject":"1","predicate":"opDrug","object":"30000","drugEfficacy":0.7,"drugRecalls":1}
   ]}}}
103
Power of Combining Documents and Triples
Narrative + Data = Contextual Information + Relationships = Meaningful Knowledge
(Semantic & Structural)

A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13 / Number 6 / June, 1970
Programs should remain unaffected when the internal representation of data is changed. Tree structured inadequacies are discussed. Relations are discussed and applied to the problems of redundancy and consistency.
KEY WORDS AND PHRASES: database, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory to formatted data. The problems are those of data independence and data inconsistency.
The relational view appears to be superior in several respects to the graph or network model.
Relational view forms a sound basis for treating derivability, redundancy, and consistency. [and] a clearer evaluation of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
Tables represent a major advance toward the goal of data independence
1.2.1. Ordering Dependence. Programs which take advantage of the stored ordering of a file are likely to fail if it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence. Can application programs remain invariant as indices come and go?
1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree structured files or slightly more general network models of the data. These programs fail when a change in structure becomes necessary. The program is required to exploit paths to the data. Programs become dependent on the continued existence of the paths.
104
Graph Modeling
#1a Define Relationships
Define a standard set of relationships with precise meanings.
This is critical because relationships assign meaning to items and make queries possible.
[Figure: graph connecting the items Surgeon, Operation, Person, and Hospital with the relationships "excels at", "performed", "operated on" (twice), "operated at", "patient at", and "works at", shown beside the sample document below.]
Document ID: 1
Hospital Name:     John Hopkins
Operation Type:    Heart Transplant
Surgeon Name:      Dorothy Oz
Operation Number:  13

Drug Name    Drug Manufacturer    Dose Size  Dose UOM
Minicillan   Drugs R Us           200        mg
Maxicillan   Canada4Less Drugs    400        mg
Minicillan   Drug USA             150        mg
105
Graph Modeling
#1b Use Existing Ontologies
Save time and make your data easier to understand by leveraging existing relationship ontologies:
Dublin Core
FOAF
TrackBack
MetaVocab
Basic Geo Vocabulary
BIO
RSS 1.0
VCard RDF
Creative Commons metadata
WOT
SIOC
GoodRelations
DOAP
Programmes Ontology
Music Ontology
OpenGUID
Provenance Vocabulary
Pedagogical diagnosis
DILIGENT Argumentation
TIP: Search for ontologies at Linked Open Vocabularies (LOV)
106
Graph Modeling
#2 Define Attributes
Create an ID for each item.
The ID can be human readable, but it usually is a variation of a UUID.
[Figure: graph of Surgeon, Operation, Person, and Hospital items with their relationships, beside the sample operation document (Document ID: 1).]
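A minimal sketch of generating item IDs in Python, assuming the standard-library uuid module is acceptable for your database. The function name new_item_id and the "surgeon" prefix are hypothetical and only illustrate the "human readable variation of a UUID" idea.

```python
import uuid


def new_item_id(prefix: str = "") -> str:
    """Return an opaque item ID; the optional prefix only aids human readability."""
    raw = uuid.uuid4().hex  # 32 hexadecimal characters, random (version 4) UUID
    return f"{prefix}-{raw}" if prefix else raw


# Hypothetical usage: one ID per item in the graph.
surgeon_id = new_item_id("surgeon")
operation_id = new_item_id()
```

The prefix carries no meaning for the database; the item's type still has to be recorded as core data.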
107
Graph Modeling
#3 Add Core Data to Items
Define common core data you want added to all items.
Because an item is simply an ID that has no meaning, you need to add core metadata to it, such as type, name, updatedBy, updatedOn, etc.
If your database allows an item to be a document or to have properties, you can add core data directly to it:
ID: 155321
itemType: Surgeon
itemName: Dorothy Oz
updatedBy: ID:622480
In a pure triple system, you must use triples to connect items to core data.
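The two options above can be sketched side by side in Python. This is an illustration only: the function names are hypothetical, and the values are the ones shown on this slide.

```python
def core_data_as_properties(item_id, item_type, item_name, updated_by):
    """Property/document style: core data lives directly on the item."""
    return {
        "id": item_id,
        "itemType": item_type,
        "itemName": item_name,
        "updatedBy": updated_by,
    }


def core_data_as_triples(item_id, item_type, item_name, updated_by):
    """Pure triple style: one (subject, predicate, object) triple per core field."""
    return [
        (item_id, "itemType", item_type),
        (item_id, "itemName", item_name),
        (item_id, "updatedBy", updated_by),
    ]


props = core_data_as_properties("155321", "Surgeon", "Dorothy Oz", "622480")
triples = core_data_as_triples("155321", "Surgeon", "Dorothy Oz", "622480")
```

Both forms carry the same information; the triple form simply costs one extra triple per core field.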
108
Graph Modeling
#4 Add Relationships
Create a triple for every relationship between items.
Graphs are schemaless: you can add more relationships at runtime.
[Figure: graph of Surgeon, Operation, Person, and Hospital items with their relationships, beside the sample operation document (Document ID: 1).]
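A minimal in-memory sketch of "one triple per relationship" in Python. The TripleStore class is hypothetical and not any product's API; the "mentors" predicate and "surgeon2" item are invented to show that a new relationship type needs no schema change.

```python
from collections import defaultdict


class TripleStore:
    """Toy store: maps each subject to a set of (predicate, object) pairs."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        pairs = self._by_subject.get(subject, ())
        return sorted(o for (p, o) in pairs if p == predicate)


store = TripleStore()
store.add("surgeon1", "performed", "operation13")
store.add("surgeon1", "worksAt", "hospital10")
# A relationship type nobody planned for is added at runtime, with no migration.
store.add("surgeon1", "mentors", "surgeon2")
```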
109
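Query using SPARQL
SPARQL is a triple query language.
[Figure: graph of Surgeon, Operation, Person, and Hospital items connected by "excels at", "performed", "operated on", "operated at", "patient at", and "works at".]
PREFIX : <http://example.org/hospital#>   # placeholder namespace, not from the original slide
SELECT *
WHERE { ?surgeon   :excelsAt ?operation .
        ?operation :isNamed  "HeartSurgery" }
Returns all surgeons who excel at heart surgery.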
{"quads":[
{"collection":"HospitalOps","subject":"surgeonDoc1","predicate":"excelsAt","object":"operationDoc13"},
{"collection":"HospitalOps","subject":"surgeonDoc1","predicate":"performed","object":"operationDoc13"},
{"collection":"HospitalOps","subject":"surgeonDoc1","predicate":"operatedOn","object":"userDoc1554"},
{"collection":"HospitalOps","subject":"surgeonDoc1","predicate":"worksAt","object":"hospitalDoc10"},
{"collection":"HospitalOps","subject":"operationDoc13","predicate":"requestingUser","object":"userDoc1554"},
{"collection":"HospitalOps","subject":"operationDoc13","predicate":"operatedAt","object":"hospitalDoc10"},
{"collection":"HospitalOps","subject":"operationDoc13","predicate":"isNamed","object":"HeartSurgery"},
{"collection":"HospitalOps","subject":"userDoc1554","predicate":"patientAt","object":"hospitalDoc10"}
]}
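To make the matching behind the query concrete, here is a small Python sketch that evaluates the same two triple patterns over the quads above, ignoring the collection field. It is a hand-rolled illustration, not how a SPARQL engine is implemented; the helper names match and surgeons_who_excel_at are hypothetical.

```python
# (subject, predicate, object) taken from the quads on this slide.
TRIPLES = [
    ("surgeonDoc1", "excelsAt", "operationDoc13"),
    ("surgeonDoc1", "performed", "operationDoc13"),
    ("surgeonDoc1", "operatedOn", "userDoc1554"),
    ("surgeonDoc1", "worksAt", "hospitalDoc10"),
    ("operationDoc13", "requestingUser", "userDoc1554"),
    ("operationDoc13", "operatedAt", "hospitalDoc10"),
    ("operationDoc13", "isNamed", "HeartSurgery"),
    ("userDoc1554", "patientAt", "hospitalDoc10"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard (a ?variable)."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def surgeons_who_excel_at(triples, operation_name):
    # Pattern 2: ?operation isNamed "<operation_name>"
    operations = {s for (s, _, _) in match(triples, p="isNamed", o=operation_name)}
    # Pattern 1: ?surgeon excelsAt ?operation, joined on ?operation
    return sorted({s for (s, _, o) in match(triples, p="excelsAt") if o in operations})


print(surgeons_who_excel_at(TRIPLES, "HeartSurgery"))  # ['surgeonDoc1']
```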
110
Why not use 3 Columns in a Table as a triple?

Subject                               Predicate            Object
lds.org/manual/TrueToTheFaith         isType               lds.org/type/publishedDocument
lds.org/manual/TrueToTheFaith         isDocumentType       lds.org/documentType/Pamphlet
lds.org/manual/TrueToTheFaith/Faith   isChapterIn          lds.org/manual/TrueToTheFaith
lds.org/manual/TrueToTheFaith/Faith   isRelatedToTopic     lds.org/topic/faith
lds.org/manual/TrueToTheFaith/Faith   ReferencesScripture  www.lds.org/scriptures/nt/heb/11
www.lds.org/scriptures/nt/heb/11      isRelatedToTopic     lds.org/topic/faith
lds.org/topic/faith                   isType               lds.org/type/topic
lds.org/topic/faith                   isRelatedToTopic     lds.org/topic/salvation
lds.org/topic/salvation               isChapterIn          lds.org/manual/TrueToTheFaith

A relational database may need hundreds of recursive joins to resolve triple queries.
Imagine the performance and complexity of a SQL query that joins several hundred tables.
Imagine the performance of a single table that contains billions of rows, where every CRUD statement in the database is executed on that table and its three indexes.
How do you do inferences (such as "a father of a father is a grandfather") in SQL?
Why does Oracle DB have a triple index and license it as a separate product?
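The join cost can be seen even at toy scale. The sketch below, using Python's built-in sqlite3 module, loads the nine rows above into a three-column table and follows a two-hop path (chapter, its topic, that topic's related topic). The table name triples and the aliases are assumptions for illustration; note that every additional hop adds another self-join on the same table.

```python
import sqlite3

ROWS = [
    ("lds.org/manual/TrueToTheFaith", "isType", "lds.org/type/publishedDocument"),
    ("lds.org/manual/TrueToTheFaith", "isDocumentType", "lds.org/documentType/Pamphlet"),
    ("lds.org/manual/TrueToTheFaith/Faith", "isChapterIn", "lds.org/manual/TrueToTheFaith"),
    ("lds.org/manual/TrueToTheFaith/Faith", "isRelatedToTopic", "lds.org/topic/faith"),
    ("lds.org/manual/TrueToTheFaith/Faith", "ReferencesScripture", "www.lds.org/scriptures/nt/heb/11"),
    ("www.lds.org/scriptures/nt/heb/11", "isRelatedToTopic", "lds.org/topic/faith"),
    ("lds.org/topic/faith", "isType", "lds.org/type/topic"),
    ("lds.org/topic/faith", "isRelatedToTopic", "lds.org/topic/salvation"),
    ("lds.org/topic/salvation", "isChapterIn", "lds.org/manual/TrueToTheFaith"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", ROWS)

# Two hops beyond the starting pattern means two self-joins on the one big table.
QUERY = """
SELECT chapter.subject, topic.object, related.object
FROM triples AS chapter
JOIN triples AS topic
  ON topic.subject = chapter.subject AND topic.predicate = 'isRelatedToTopic'
JOIN triples AS related
  ON related.subject = topic.object AND related.predicate = 'isRelatedToTopic'
WHERE chapter.predicate = 'isChapterIn'
  AND chapter.object = 'lds.org/manual/TrueToTheFaith'
"""
result = conn.execute(QUERY).fetchall()
```

With billions of rows, each of those self-joins hits the same table and the same indexes, which is the concern raised above.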
111
Triple Modeling Exercise
Document ID: 1
Order Number:      1332
Order Date:        20140816
Total Amount:      $40
Customer Name:     Mike Bowers
Customer Phone:    8015551212
Customer Address:  Street
                   City, State, Postal Code

Product Name  Product Description     Price  QTY
CSS Book      CSS and HTML Design     $20    1
CSS Book      HTML5 and CSS3 Design   $20    2

PROs:
CONs:
112
Triple Modeling Answer
{"relationships":[
{"subject":"1332","predicate":"rdf:type","object":"order"},
{"subject":"1332","predicate":"orderNumber","object":"1332"},
{"subject":"1332","predicate":"orderDate","object":"20140816"},
{"subject":"1332","predicate":"orderTotal","object":40.00},
{"subject":"1332","predicate":"customer","object":"1"},
{"subject":"1332","predicate":"productOrdered","object":"100"},
{"subject":"1332","predicate":"productOrdered","object":"200"},

{"subject":"1","predicate":"rdf:type","object":"customer"},
{"subject":"1","predicate":"customerName","object":"Mike Bowers"},
{"subject":"1","predicate":"customerPhone","object":"8015551212"},
{"subject":"1","predicate":"customerAddress","object":"10"},

{"subject":"10","predicate":"rdf:type","object":"address"},
{"subject":"10","predicate":"addressStreet","object":"111 My Street"},
{"subject":"10","predicate":"addressCity","object":"San Diego"},
{"subject":"10","predicate":"addressState","object":"CA"},
{"subject":"10","predicate":"addressPostal","object":"92093"},

{"subject":"100","predicate":"rdf:type","object":"product"},
{"subject":"100","predicate":"productCategory","object":"CSS Book"},
{"subject":"100","predicate":"productDescription","object":"CSS and HTML Design"},
{"subject":"100","predicate":"productPrice","object":20.00},
{"subject":"100","predicate":"productQuantity","object":1},

{"subject":"200","predicate":"rdf:type","object":"product"},
{"subject":"200","predicate":"productCategory","object":"CSS Book"},
{"subject":"200","predicate":"productDescription","object":"HTML5 and CSS3 Design"},
{"subject":"200","predicate":"productPrice","object":20.00},
{"subject":"200","predicate":"productQuantity","object":2}
]}
113
Graph Model Summary
Model networks, relate documents, and enrich data
Examples: genetics, family history, social networks
114
Graph vs. Semantic Databases
Semantic databases are based on W3C RDF standards.
They are built for semantic experts to run SPARQL queries to filter, match, aggregate, and infer meaning.
Graph databases are not standardized.
They are built for developers to write code to traverse graphs, to filter, match, and aggregate data, and to calculate meaning.
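As a rough illustration of the "developers write code to traverse graphs" style, here is a breadth-first traversal in plain Python over the relationships from the SPARQL slide. It is a sketch, not any graph database's API; the function name hops_from is hypothetical, and the literal "HeartSurgery" is treated as just another node for simplicity.

```python
from collections import defaultdict, deque

EDGES = [
    ("surgeonDoc1", "excelsAt", "operationDoc13"),
    ("surgeonDoc1", "performed", "operationDoc13"),
    ("surgeonDoc1", "operatedOn", "userDoc1554"),
    ("surgeonDoc1", "worksAt", "hospitalDoc10"),
    ("operationDoc13", "requestingUser", "userDoc1554"),
    ("operationDoc13", "operatedAt", "hospitalDoc10"),
    ("operationDoc13", "isNamed", "HeartSurgery"),
    ("userDoc1554", "patientAt", "hospitalDoc10"),
]

# Adjacency list: node -> list of directly connected nodes (edge direction kept).
NEIGHBORS = defaultdict(list)
for subject, _predicate, obj in EDGES:
    NEIGHBORS[subject].append(obj)


def hops_from(start, max_hops):
    """Breadth-first traversal: map each reachable node to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in NEIGHBORS.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance
```

A semantic database would express the same question declaratively as a SPARQL query; here the traversal order and hop limit are spelled out in code.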
115
Five Data Paradigms
Relational: Flexible Queries
Document: Easy Development
Dimensional: Data Warehousing
Graph: Unlimited Relationships
Wide Column / Key-value: Fast Puts and Gets
116
Modeling Takeaway
No one physical data model meets all needs, so choose a multimodel DB
Dimensional: Business Intelligence reporting and analytics
Relational: Flexible queries, joins, updates, mature, standard
Wide Column: Simple, fast puts and gets, massively scalable
Document: Fastest development, schemaless JSON/XML, searchable
Graph/RDF: Modeling anything at runtime, including relationships
Documents combined with Graph are the future
117
What model best fits your next project?
Thoughts?
118
Multimodel Databases
in detail
119
Power of Combining Data Paradigms
Narrative + Data = Contextual Information + Relationships = Meaningful Knowledge
(Semantic & Structural)
[Figure: first page of E. F. Codd, "A Relational Model of Data for Large Shared Data Banks" (IBM Research Laboratory, San Jose, California; June, 1970), overlaid with single-letter markers (T, P, L, O, A, E, I, R).]
120
Multimodel SQL Databases
[Figure: database landscape chart.
Axis labels: Hours, minutes, seconds, milliseconds, microseconds; PBs, TBs, GBs; 0.1Kt, 0.5Kt, 1Kt, 10Kt, 100Kt; Low Latency Operational Velocity; High Bandwidth Analytical Volume; More structure (schema) to Less structure (schemaless).
Regions: newSQL; Live Analytics; Wide Column (complex key); Key/Value (simple key); Document (JSON); Graph/RDF; SQL Data Warehouse; Doc Warehouse (XML); Big Data (raw).
Structure scale: Relational, Dimensional, Wide column/Key-value, Document, Graph, Raw.
Products placed on the chart: Oracle DB, Enterprise DB, Oracle Exadata.
The sample operation document (John Hopkins, operation 13, Heart Transplant, Dorothy Oz, with its drug table) appears in the center.]
121
Multimodel NoSQL Databases
[Figure: database landscape chart.
Axis labels: Hours, minutes, seconds, milliseconds, microseconds; PBs, TBs, GBs; 0.1Kt, 0.5Kt, 1Kt, 10Kt, 100Kt; Low Latency Operational Velocity; High Bandwidth Analytical Volume; More structure (schema) to Less structure (schemaless).
Regions: newSQL; Live Analytics; Wide Column (complex key); Key/Value (simple key); Document (JSON); Graph/RDF; SQL Data Warehouse; Doc Warehouse (XML); Big Data (raw).
Structure scale: Relational, Dimensional, Wide column/Key-value, Document, Graph, Raw.
Products placed on the chart: Cassandra, MarkLogic, OrientDB.
The sample operation document (John Hopkins, operation 13, Heart Transplant, Dorothy Oz, with its drug table) appears in the center.]
122
Agenda
1. Defining NoSQL and Big Data
2. Optimizing for Velocity or Volume
3. Optimizing for Availability or Consistency
4. Optimizing for Modeling Paradigms
5. Summary
123
Modeling Takeaway
Choose a database that meets your multiple modeling needs
Dimensional: Business Intelligence reporting and analytics
Relational: Flexible queries, joins, updates, mature, standard
Wide Column: Simple, fast puts and gets, massively scalable
Document: Fast development, schemaless JSON/XML, searchable
Graph/RDF: Modeling anything at runtime, including relationships
Documents combined with Graph are the future
124
Velocity Takeaway
Choose a DB that handles your required velocity

Volume     Real-world 1K     Real-world 1K     Relational           Document                Wide Column
per day    transactions      transactions                                                   or Key Value
           per day           per second
8 GB       8,640,000         100               As Is                As Is                   As Is
86 GB      86,400,000        1,000             Tuned*               As Is                   As Is
432 GB     432,000,000       5,000             Appliance            Tuned*                  As Is
864 GB     864,000,000       10,000            Clustered Appliance  Clustered Servers       Tuned*
8,640 GB   8,640,000,000     100,000           -                    Many Clustered Servers  Clustered Servers
43,200 GB  43,200,000,000    500,000           -                    -                       Many Clustered Servers

*Tuned means tuning the model, queries, and/or hardware (more CPU, RAM, and Flash)
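The per-day figures follow directly from the per-second rate. A small Python sketch of the arithmetic, assuming "1K" means 1,000 bytes per transaction and decimal gigabytes (both are assumptions; the slide rounds 8.64 GB down to 8 GB). The function name daily_load is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_load(transactions_per_second, bytes_per_transaction=1_000):
    """Return (transactions per day, gigabytes per day) for a sustained rate."""
    tx_per_day = transactions_per_second * SECONDS_PER_DAY
    gb_per_day = tx_per_day * bytes_per_transaction / 1e9
    return tx_per_day, gb_per_day


for tps in (100, 1_000, 5_000, 10_000, 100_000, 500_000):
    tx, gb = daily_load(tps)
    print(f"{tps:>7,} tps -> {tx:>14,} tx/day, {gb:>9,.2f} GB/day")
```

Plugging in your own transaction size and peak rate gives a first estimate of which row of the table you are in.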
125
Hardware Takeaway
Choose a DB designed to meet your scaling needs for velocity and volume at the lowest hardware cost. Look for a database that:
Leverages RAM when you need maximum velocity (low latency)
Leverages disk when you need massive volume (high bandwidth)
Scales horizontally for maximum parallel processing
Lets you choose the right mix of synchronous and asynchronous transactions
126
Consistency Takeaway
Choose a database that meets your needs for write locality or consistency

Multimaster Clusters: BASE (pictured as NaOH)
Write locality
[Figure: Datacenter 1 (Zone 1, Zone 2) and Datacenter 2 (Zone 1, Zone 2)]

Globally Consistent Clusters: ACID (pictured as H2SO4)
Point-in-time consistency
Less data loss (durability)
More query accuracy (isolation)
More data integrity (atomicity)
Less code to compensate for data inconsistencies and conflicts
[Figure: Datacenter 1 (Zone 1, Zone 2) and Datacenter 2 (Zone 1, Zone 2)]
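To give a feel for the "code to compensate for data inconsistencies and conflicts" that a multimaster (BASE) cluster can push onto the application, here is a sketch of one common strategy, last-write-wins, in Python. This is an assumed example of such compensation code, not something prescribed by the slide; the Version type and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float   # write time as recorded by the accepting datacenter
    datacenter: str    # used only to break exact timestamp ties deterministically


def last_write_wins(a: Version, b: Version) -> Version:
    """Resolve two conflicting writes to the same record: the newest write wins."""
    return max(a, b, key=lambda v: (v.timestamp, v.datacenter))


# Two datacenters accepted different writes for the same record.
from_dc1 = Version("doseSize=200", timestamp=10.0, datacenter="dc1")
from_dc2 = Version("doseSize=150", timestamp=12.0, datacenter="dc2")
winner = last_write_wins(from_dc1, from_dc2)
```

Note the trade-off: the losing write is silently discarded, and the result depends on clocks being comparable across datacenters. A globally consistent (ACID) cluster avoids needing this kind of code at the cost of write locality.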
127
Choose a Database that is Mature Enough for You
[Figure: hype-cycle curve with the phases Technology Trigger, Inflated Expectations, Disillusionment, Enlightenment, and Productivity. Technologies plotted: DBaaS, NoSQL, DB Appliances, MapReduce, SQL. Annotations: Enterprise Ready, 1 to 5 years, 5 to 10 years.]
Derived from Gartner Hype Cycle for Data Management
128
Databases (Ranked by popularity as of 20160314)
[Figure: database landscape chart with the same axes as the multimodel charts: Low Latency Operational Velocity (hours to microseconds; 0.1Kt to 100Kt) versus High Bandwidth Analytical Volume (GBs to PBs), and More structure (schema) to Less structure (schemaless) across Relational, Dimensional, Wide column/Key-value, Document, Graph, and Raw. The sample operation document appears in the center. Rankings by region:]
newSQL: #58 GemFire, #69 Oracle x10
Live Analytics: #1 Oracle Exalytics, #19 SAP HANA
Wide Column (complex key): #8 Cassandra, #15 HBase
Key/Value (simple key): #9 Redis, #23 Memcached, #26 DynamoDB, #31 Riak
Document (JSON): #4 MongoDB, #24 Couchbase, #25 CouchDB, #32 MarkLogic, #41 OrientDB, #48 Cloudant
Graph/RDF: #20 Neo4j, #32 MarkLogic, #41 OrientDB, #44 Titan
Relational (SQL): #1 Oracle DB, #2 MySQL, #3 SQL Server, #5 PostgreSQL, #6 DB2, #10 SQLite, #12 SAP AS, #19 SAP HANA, #21 Informix, #22 MariaDB
SQL Data Warehouse (Dimensional): #1 Oracle Exadata, #13 Teradata, #16 Hive, #28 Netezza, #29 Vertica, #33 Greenplum, #36 Amazon Redshift
Doc Warehouse (XML): #11 ElasticSearch, #14 Solr, #35 MarkLogic, #37 Sphinx
Big Data (raw): Hadoop, #18 Splunk
129
Evaluating and Modeling
NoSQL and SQL Databases
Part 2
2016 EDW
by Michael Bowers
20160314
v.4.9
mike@cssDesignPatterns.com
130