Best practices
Transforming IBM Industry Models into a production data warehouse

Austin Clifford, DB2 Warehouse QA specialist, IBM Dublin Lab
David Murphy, DB2 Development lead, IBM Dublin Lab
Pat Meehan, Senior IT Specialist, IBM Dublin Lab
Rónán O'Suilleabhain, Integration & deployment specialist, Industry Models, IBM Dublin Lab
Sami Abed, DB2 Kernel developer, IBM Dublin Lab
Garrett Fitzsimons, Best practices specialist for warehouse & appliances, IBM Dublin Lab

Issued: October 2012

Transforming IBM Industry Models into a production data warehouse
Executive summary
Introduction
Using industry models
  Implementing an industry model as a physical database
  Understanding database design challenges
Scoping the logical model and transforming into a physical data model
  Scoping the logical model
Preparing the physical data model for deployment as a partitioned database
  Starting with database partition groups
  Implementing a table space strategy
  Customizing keys and data types
Using report specifications to influence your physical data model
  An excerpt from the Solvency II physical data model
  Interpreting Solvency II example reports as database design decisions
Optimizing the database architecture and design for your environment
  Choosing distribution keys
  Choosing MDC tables and columns
  Partitioning large tables for query performance and manageability
  Indexing for performance
  Using partitioned MQTs to enhance query performance
  Enabling and evaluating compression
  Replicating nonpartitioned tables
Ingesting data into the database
Using temporal tables to effectively manage change
  Implementing system-period temporal time for a dimension table
  Implementing business-period temporal time for a dimension table
Best practices for deploying IBM Industry Models
Conclusion
Best practices
Appendix A. Test environment
Appendix B. Sample Queries
  Query for report example 1: Assets by region, by counterparty, by credit rating
  Query for report example 2: Asset valuations drill down by dimension filter
Appendix C. Sample Solvency II IBM Cognos report
Further reading
  Contributors
Notices
  Trademarks

Executive summary

Implementing an industry model can help accelerate projects in a wide variety of
industry sectors by reducing the effort required to create a database design optimized for
data warehousing and business intelligence.

IBM Industry Models cover a range of industries which include banking, healthcare,
retail, and telecommunications. The example that is chosen in this document is from a
subset of the IBM Insurance Information Warehouse model pertaining to Solvency II (SII)
regulations.

Many of the most important partitioned database design decisions are dependent on the
queries that are generated by reporting and analytical applications. This paper explains
how to translate reporting needs into database design decisions.

This paper guides you through the following recommended process for transforming a
logical data model for dimensional warehousing into a physical database design for
production use in your environment. The key phases of this process are:

• Create a subset of the data model from the supplied logical data model
• Prepare the physical data model for deployment as a partitioned DB2 database
• Refine the physical data model to reflect your reporting and analytics needs
• Optimize the database architecture and design for a production environment

Implement a database architecture that is aligned with best practices for warehousing
before tuning your database design to reflect performance needs for reports and queries.
Only when you have created and populated the test database can you further optimize the
database design to reflect the anticipated query, ingest, and maintenance workload.

A poorly designed database and architecture can lead to poor query performance and a
need for outages to accommodate maintenance operations. Using the recommendations
in this paper can help you transform an IBM Industry Model dimensional warehouse
solution into a partitioned database that is ready for production use.

Introduction

IBM Industry Models provide you with an extensive and extensible data model for your
industry sector. Use the logical data model as provided by IBM to build a physical model
that is customized for your reporting requirements, then deploy and populate a
best-practice partitioned database production environment.

This paper does not discuss data modeling concepts but instead focuses on what you
must do to transform a non-vendor-specific logical data model into a best-practice
production DB2 partitioned database.

This paper is targeted at people involved in transforming the dimensional data
warehouse logical data model into a production partitioned database that is based on
DB2 Database for Linux, UNIX, and Windows software v10.1.

The goal of the IBM Insurance Information Warehouse model in addressing Solvency II is
to facilitate reporting in line with European Union directives and internal business
requirements. A subset of tables from the dimensional layer, together with a sample
SII-based Cognos report, is referenced throughout the paper. The test environment used is
described in Appendix A.

The first section of this paper looks at the deployment pattern and the main design
challenges that you must address when you implement an industry model. The process
of identifying and manipulating the components of the industry model that are relevant
to your business is outlined.

The second section of the paper looks at transforming the logical data model into a
physical data model that is aligned with best practices for a partitioned DB2 database.

The third section of the paper shows how to translate reporting requirements into
database design choices that help shape the physical data model.

The fourth section of the paper describes how to optimize the database design to reflect
the specific needs of your production environment. The temporal feature and continuous
data ingest utility, both introduced in DB2 Version 10.1, are described in the context of
how they might influence your database design decisions.

The IBM PureData for Operational Analytics System implements best practices for data
warehouse architecture that uses DB2 software. The shared-nothing architecture of the
IBM PureData for Operational Analytics System provides a platform that emphasizes
performance, scalability, and balance. The paper "Best Practices: Physical database
design for data warehouse environments", referenced in the Further reading section,
covers the recommendations for data warehouse database design in detail.

Using industry models

An industry model is a comprehensive set of predesigned models that form the basis of a
business and software solution. An industry models solution consists of a set of
industry-specific integrated models that are optimized for business challenges in a particular
sector. Domain areas include data warehousing, business intelligence, business process
management, service-oriented architecture, business terminology, and business glossary
templates.

Solvency II (SII), a European Union directive on insurance regulation, and the associated
Pillar 3 reporting requirements are extensively covered by the IBM Insurance
Information Warehouse (IIW) data models. This paper uses the SII coverage within the
IIW model to show how best practices in data warehousing for DB2 can be applied to an
industry model.

Figure 1. The dimensional warehouse model within the context of industry models

To choose the entities or scope of the logical data model that are relevant to your
business, first determine your specific reporting requirements.

Implementing an industry model as a physical database

Implementing a logical data model as a physical database presents technical challenges.
There are several steps through which you create and optimize your physical database
for production use.

Figure 2. Typical deployment patterns for creating a physical database

The phases that are involved in this process include:

• Scoping the logical model and transforming it into a physical model
  Map reporting requirements to the logical data model to determine the scope of
  the model. Include only those entities and attributes for which you have a
  reporting requirement.
• Preparing the physical data model for deployment as a partitioned DB2 database
  Update the physical data model to be compatible with the target partitioned
  database architecture.
• Refining the physical data model to reflect your individual reporting needs
  Use the details of individual report specifications to help you further define the
  physical database model.
• Optimizing the database architecture and design for a production environment
  Create and populate the physical database to determine final database design
  optimizations that are based on actual query and maintenance workloads that
  must be applied back into the physical data model.

Understanding database design challenges

The logical data model as provided contains no vendor-specific database features. You
must implement the features of your database software in the physical data model.

Focus your data warehouse design decisions on the following elements:

• Query performance: efficient query performance minimizes resource usage.
  Use database partitioning, materialized query tables (MQT), multidimensional
  clustering (MDC), and table range partitioning (RP) to help maximize query
  performance across all database partitions.
• Intelligent table space design: the ability to manage, move, and archive data as
  it ages.
  Intelligent table space design facilitates data archiving, support for
  multi-temperature storage, and flexibility in backup and recovery strategies.
• Data ingest: ingest data with minimal effect on data availability.
  Implement an architecture in which data ingest and data backup can operate
  concurrently.
• Online maintenance operations: architect and design for concurrent database
  operations.
  Reduce the number of operations that are needed to maintain data availability
  and query performance and enable online database operations.

Focus on performance, scalability, and balance when you design your database
environment.

The following information must be available or determined before you can begin to
optimize your database design:

• The volume of data predicted for each fact and dimension table, initially and over
  the lifecycle of the database.
  Know which fact and dimension tables are largest to finalize your distribution
  keys.
• Data lifecycle and multi-temperature requirements.
  The length of time data must be retained within the database. Maintaining data
  for queries and for regulatory compliance influences how you might partition
  your tables or configure storage groups.
• Sample queries from reports or analytics applications that reflect anticipated or
  real queries to be submitted in production.
  These sample queries help to determine MDC, MQT, indexing, and distribution key
  selection requirements.
• Stated objectives for backup and restore operations.
  Aggressive backup and restore time objectives can require that certain tables
  be placed in separate table spaces.

Scoping the logical model and transforming into a physical data model

Scoping the industry model is the process of selecting the business objects that you need
from the logical data model to build a valid physical data model. The physical data
model must reflect your analytical and reporting needs.

As a prerequisite to the scoping phase, you must model your business requirements and
map these to analytical requirements. The quality and availability of your data sources
must also be understood and assessed. However, these tasks are outside the scope of this
document, which focuses on the implementation of a production physical database.

When you complete the process of scoping, you can transform your logical data model
into a physical data model by selecting a menu option in IBM InfoSphere Data
Architect.

IBM InfoSphere Data Architect is a data modeling tool that you can use to scope,
transform, and customize the data models that are supplied in the Industry Models
solutions. The examples in this paper reference InfoSphere Data Architect; see the
Further reading section for more details about the product.

Scoping the logical model

The logical data model is designed to meet all aspects of reporting for an industry sector.
Your enterprise might not need all of the objects that are provided. Scoping is the
process, by using InfoSphere Data Architect or other modeling tools, of selecting those
entities from the logical data model that align with your analytical requirements.

Refine your logical data model scope to address only your current data and reporting
needs. This targets just those tables accessed by ETL, queries, and maintenance
operations.

When you scope the industry model to create your logical data model, use these steps
with InfoSphere Data Architect:

• Create a diagram into which you can drag those entities that you need to address
  your warehousing and reporting needs.
  Creating a diagram for your logical data model avoids directly changing the base
  model. This method allows you to more easily accept future industry model
  upgrades.
• Navigate through the Aggregate Facts section and drag the aggregate facts that
  you need into your new diagram.
  Aggregate facts are related to the supporting fact tables, which can be identified
  and included when you are completing the scoping process. Supporting entities
  can be identified under the heading DWM Source.
• Use InfoSphere Data Architect to identify and include all related entities (fact
  and dimension tables) in the diagram.
  Avoid manually moving individual related entities because this can affect the
  integrity of the resulting database. Let InfoSphere Data Architect identify and
  automatically add all related entities; you can then prune those entities that you
  do not need.

Transforming the logical data model into a physical data model

Since the logical data model applies to all databases, minimize the database architecture
and design changes that you make to the logical data model. Instead, implement those
changes in the physical data model. This strategy provides the following benefits:

• Easier upgrade strategy for future releases of industry models as only semantic
  differences will exist between your model and the industry model.
• Focus technical modeling effort on the physical data model and retain a logical
  data model that is suited for all databases.
• More easily control changes, in both the logical and physical data models, by
  assigning clear roles to each model. The logical model functions as a semantic
  master while the physical model is the technical master.

Entities can be added to the diagram at a later stage and merged into an existing physical
data model by using the compare and merge functionality in InfoSphere Data Architect.
Select this approach to build your physical data model incrementally over time.

Apply architecture and database design changes that are specific to DB2 databases to
the physical database model rather than the logical data model to help accommodate
future upgrades.

Transform your logical data model into a physical data model by selecting a blank area in
the diagram and, from the InfoSphere Data Architect main menu, selecting Data >
Transform > Physical Data Model.

When prompted for further details, select the DB2 database version that you require, for
example DB2 V10.1. Use the default settings that are provided and complete the
transformation process. Validate your DB2 installation and the physical data model by
generating the DDL to create a test database.

Preparing the physical data model for deployment as a partitioned database

The physical data model is a representation of your logical data model that is compatible
with DB2. However, the model does not yet reflect a partitioned database environment
or, specifically, a data warehouse architecture and design.

Several best practice recommendations for data warehousing can be applied to the
physical data model before you generate DDL that is suitable for a partitioned database
environment. These improvements include the following items:

• Introducing database partition groups
• Implementing a table space strategy
• Customizing data types
• Implementing surrogate keys

Refer to the Further reading section and the best practices paper called "Physical database
design for data warehouse environments" for detailed explanations and examples.

Starting with database partition groups

In a partitioned database, database partition groups determine which table spaces and,
by extension, which table and index objects are partitioned or nonpartitioned.

Minimize the number of database partition groups and avoid overlapping database
partition groups on the same data host to avoid adding complexity to resource
allocation and monitoring.

Adhere to the following defaults implemented in an IBM PureData for Operational
Analytics System build when you are creating and using database partition groups:

• Create just two new database partition groups: one for tables you want to
  partition and one for tables you do not want to partition.
  There are no performance gains to be made from having multiple database
  partition groups. Collocated queries are not supported where tables are in
  different database partition groups.
• Avoid overlapping database partition groups.
  Create the nonpartitioned database partition group on the coordinator database
  partition and create the partitioned database partition group across each data
  host.
• Decide which tables you want to partition and which tables do not contain
  enough rows to be partitioned. Place these tables in the appropriate database
  partition group.
  For example, a dimension table such as Country Of Origin would not
  contain enough rows to be considered for partitioning. The overhead of
  partitioning a small table would exceed any performance benefit.
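
The two database partition groups described above could be created with statements along the following lines. This is only a sketch: it assumes that database partition 0 is the coordinator partition and that partitions 1 to 8 are the data partitions, and the group names PDPG and SDPG are illustrative rather than taken from this paper.

```sql
-- Assumption: partition 0 is the coordinator partition and
-- partitions 1 to 8 are data partitions. Group names are hypothetical.

-- Group for tables that you want to partition, spanning every data partition
CREATE DATABASE PARTITION GROUP pdpg ON DBPARTITIONNUMS (1 TO 8);

-- Single-partition group for tables that you do not want to partition
CREATE DATABASE PARTITION GROUP sdpg ON DBPARTITIONNUM (0);
```

Keeping to two non-overlapping groups means that every partitioned table shares the same set of database partitions, which is a precondition for collocated joins.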

Implementing a table space strategy

Intelligent table space design, with table partitioning, unlocks many features available in
the DB2 software:

• Balanced table space size and growth enable more efficient backup performance
  and a more flexible backup and recovery strategy.
  For example, in a recovery scenario you can focus on recovering table data. You
  can opt to rebuild indexes, refresh MQTs, replicate tables, and restore older table
  data as separate operations.
• Table space maintenance operations can be targeted at active data rather than
  entire tables which include active and inactive data.
  For example, operations such as REORG can be targeted at just those table spaces
  that contain active data.
• Data lifecycle operations can take place online.
  For example, when data partitions are detached (rolled out), the dedicated table
  space can be removed and the space can be reclaimed immediately.
• A multi-temperature database strategy is possible because individual table spaces
  can be moved from one storage layer to another as an online operation.
  For example, when table partitioning is implemented, inactive data can be
  moved as an online operation to less expensive storage, releasing storage
  capacity for more active data.

Intelligent table space design facilitates concurrent database maintenance operations
and helps avoid costly reorganization tasks in production.

Correcting a poor table space design strategy post-production can have a negative effect
on resource usage and data availability:

• Significant resources are needed to physically move data from one table space to
  another post-production.
• Having too few table spaces restricts your flexibility in performing data-specific
  backup and restore operations, performing maintenance operations on specific
  ranges of data or tables, and managing the data lifecycle.
• Having too many table spaces creates unnecessary overhead for the database
  manager when you activate the database and maintain recovery history. It can
  also require too many database operations to be issued in parallel to complete
  tasks.

Table 1 describes a table space strategy that gives you the
flexibility you need to meet your service level objectives for all workloads.

Table type
    Table space strategy

Largest fact table, largest associated dimension table, mission-critical tables
    Create a separate table space for each table and for each data partition. Create a
    separate table space for indexes in line with each table and data partition. This
    process enables table-level recovery from a backup.

Medium-sized partitioned tables
    Logically group medium-sized dimension tables that are part of the same fact table
    star or snowflake schema; a group of five tables is adequate. Create a separate table
    space for this group of tables and a separate table space for the associated indexes.

Small-sized partitioned tables
    Group all small-sized dimension tables. Create a separate table space for the group
    and a separate table space for the indexes that are associated with these tables.

Partitioned materialized query tables
    Create a separate table space for MQTs and a separate table space for indexes on
    MQTs. Use this method to determine a separate backup and recovery strategy for
    the aggregation layer.

Replicated tables
    Create a separate table space for replicated tables and a separate table space for
    indexes.

Temporal history tables
    Create a separate table space to hold temporal history tables.

Nonpartitioned tables
    Create a separate table space for data and for indexes in the nonpartitioned
    database partition group. This separates user data from catalog data on the catalog
    database partition.

Staging tables
    Create a separate table space for staging tables and for other nonproduction tables.

Table 1. Guidelines for implementing a table space strategy
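
As an illustration of two rows of Table 1, the following sketch creates dedicated data and index table spaces for one data partition of a large fact table, and a data and index pair for nonpartitioned tables. All table space names are hypothetical, and the sketch assumes that the database uses automatic storage and that partition groups named PDPG (partitioned) and SDPG (nonpartitioned) already exist.

```sql
-- Hypothetical names throughout; assumes automatic storage and the
-- existing partition groups PDPG (partitioned) and SDPG (nonpartitioned).

-- Largest fact table: one data and one index table space per data partition
CREATE TABLESPACE ts_pd_astval_2012_10
  IN DATABASE PARTITION GROUP pdpg
  MANAGED BY AUTOMATIC STORAGE;
CREATE TABLESPACE ts_pd_astval_ix_2012_10
  IN DATABASE PARTITION GROUP pdpg
  MANAGED BY AUTOMATIC STORAGE;

-- Nonpartitioned tables: separate data and index table spaces
CREATE TABLESPACE ts_sd_dim
  IN DATABASE PARTITION GROUP sdpg
  MANAGED BY AUTOMATIC STORAGE;
CREATE TABLESPACE ts_sd_dim_ix
  IN DATABASE PARTITION GROUP sdpg
  MANAGED BY AUTOMATIC STORAGE;
```

Page size, extent size, and buffer pool choices are left at their defaults here; they would need to be set to suit your environment.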

Assigning tables to table spaces

The process of placing tables into table spaces is made easier by adhering to the table
space strategy in Table 1. Assign each table by type to the appropriate table space
strategy.

As a general guideline, replicate all non-collocated dimension tables. Where
non-collocated dimension tables have more than 10 million rows, consider partitioning
these tables to avoid long refresh times.

Create partitioned tables as hash-partitioned tables and assign them to the partitioned
database partition group.

Place table data and index data in separate table spaces to provide more flexibility
when you are designing your operational maintenance and data lifecycle strategy.

Specific distribution keys can be assigned later during optimization but must always be
identified and assigned before production use. Changing the distribution key requires
you to drop and recreate the table.

Customizing keys and data types

The physical data model, when generated, uses default keys, constraints, and data type
values that you need to modify based on your source data and your approach to data
ingest. Consider the recommendations in the following areas:

Primary keys

The logical data model implements a composite primary key on fact tables. The primary
key includes all dimension foreign keys that make the primary key unique.

Remove the primary key from the fact tables on the physical data model. The fact tables
in the logical data model have many dimension keys and these keys can affect data ingest
performance negatively. Replace these keys with MDC and the nonunique composite
indexing strategy that is proposed in this paper.

Referential constraints

Since the logical data model is suitable for any database, referential constraints are
created by default as enforced constraints. Enforced constraints can increase resource
usage when data is ingested, which can result in slower ingest speeds and reduced
query performance.

In a warehousing environment, change these constraints to informational constraints
because informational constraints can be used by the DB2 optimizer when compiling
access plans and this use helps improve query performance.

Use informational constraints instead of enforced constraints to minimize the effect of
unique index maintenance when you are populating fact tables.
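
An informational referential constraint might look like the following sketch. It assumes that the SLVC_AST_VAL_FCT fact table carries an FNC_AST_ID column and that FNC_AST_ID is the primary key of FNC_AST; the constraint name is hypothetical.

```sql
-- Assumption: SLVC_AST_VAL_FCT.FNC_AST_ID references the primary key
-- of FNC_AST. The constraint name is hypothetical.
ALTER TABLE slvc_ast_val_fct
  ADD CONSTRAINT fk_astval_fncast
  FOREIGN KEY (fnc_ast_id) REFERENCES fnc_ast (fnc_ast_id)
  NOT ENFORCED                 -- DB2 does not check rows against the constraint
  ENABLE QUERY OPTIMIZATION;   -- the optimizer can still exploit the relationship
```

Because DB2 does not verify a NOT ENFORCED constraint, the ingest process becomes responsible for guaranteeing that the relationship actually holds in the data.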

Identity keys and surrogate keys

On each dimension table, the logical data model implements an identity key as the
primary key; this key serves as the surrogate key.

For dimension entities, the logical data model defines certain attributes as primary and
surrogate keys.

Within the physical data model, use GENERATED BY DEFAULT AS IDENTITY for
identity columns. This allows the ingest utility to supply its own values for the surrogate
keys, if required, in the input data rather than letting DB2 generate them. If input values
are not supplied, then DB2 generates them.

When the ingest process supplies its own values for such columns, creating parent-child
relationships is easier because the key of the parent is already known.

The tools that are used to ingest data into the data warehouse generally influence where
the surrogate keys get assigned during the ingest process.

• When you are using an engine-based ETL tool such as IBM DataStage, you can
  generate surrogate key values within the ETL engine or by using sequences.
• When you are using the ingest utility with dimension tables, use identity keys to
  determine your surrogate key and omit the identity key column from the ingest
  column list.
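
The behavior of GENERATED BY DEFAULT AS IDENTITY can be sketched as follows. The table and its columns are invented for illustration and are not part of the Solvency II model.

```sql
-- Hypothetical dimension table; names and data types are assumptions.
CREATE TABLE cr_rtg_dim_example (
  cr_rtg_id   BIGINT NOT NULL
              GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1),
  cr_rtg_cd   CHAR(18) NOT NULL,
  cr_rtg_desc VARCHAR(100)
);

-- The identity column is omitted, so DB2 generates the surrogate key
INSERT INTO cr_rtg_dim_example (cr_rtg_cd, cr_rtg_desc)
  VALUES ('AAA', 'Highest rating');

-- The loading process supplies its own surrogate key value
INSERT INTO cr_rtg_dim_example (cr_rtg_id, cr_rtg_cd, cr_rtg_desc)
  VALUES (1000, 'BB', 'Speculative');
```

GENERATED BY DEFAULT does not by itself guarantee uniqueness, so a unique index or primary key on the identity column is still needed if supplied and generated values could collide.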

Refine data types

The physical data model, when generated from your logical data model, uses default
data type settings for integer and character columns. Use the following guidelines to
refine the default values but avoid over-pruning column lengths. Modifying these values
in production can incur the need to reorganize data, which can be costly.

• Change the default CHAR(x) data type to CHAR(18) for those columns you
  anticipate to be no more than 18 characters long.
• Change the default CHAR(x) data type to VARCHAR(y) for those columns you
  anticipate to be greater than 18 characters long.
• Ensure that the modified data types of columns that are used in table joins match
  for optimal query access plans.
• Use the BIGINT data type where you expect the values in the columns to be
  greater than the capacity of the integer data type.
• Use the DATE data type for the primary key of the date and time dimension. This
  use enables you to implement a table partitioning strategy that is based on
  calendar date. In addition, the DB2 optimizer can eliminate entire table partitions
  (ranges of data within a table) based on a date predicate and this elimination can
  help improve query performance.

Use the DATE data type for the primary key on the date and time dimension table. This
method enables data partitions to be date based and facilitates query performance.
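
The following sketch shows how a DATE key can drive date-based data partitions. The table name, the columns other than FNC_AST_ID, and the date range are assumptions made for illustration; table space clauses are omitted to keep the statement short.

```sql
-- Hypothetical table, column names, and date range.
CREATE TABLE slvc_ast_val_fct_example (
  tm_dimension_id DATE          NOT NULL,  -- matches the DATE key of the time dimension
  fnc_ast_id      BIGINT        NOT NULL,  -- BIGINT where INTEGER capacity may be exceeded
  ast_val_amt     DECIMAL(18,2)
)
DISTRIBUTE BY HASH (fnc_ast_id)
PARTITION BY RANGE (tm_dimension_id)
  (STARTING '2012-01-01' ENDING '2012-12-31' EVERY 1 MONTH);
```

A query with a predicate such as `tm_dimension_id BETWEEN '2012-03-01' AND '2012-03-31'` would then be a candidate for data partition elimination, because only one monthly range can contain qualifying rows.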

Using report specifications to influence your physical data model

The characteristics of your database design can be further improved by analyzing the
physical data model from the perspective of your analytical and reporting requirements.

Review your report specifications with a view to identifying and contrasting what you
see with the details of the underlying dimensional database. Consider how even
distribution of data, collocated queries, and parallelism of the query workload across the
partitioned database can be achieved.

• Contrast the granularity of the report with the granularity of data in your
  database.
  For example, if the granularity of the database fact table is one or more transactions
  per day, and the report compares daily totals, then the data in
  the fact table is a candidate for aggregation before it is presented to the report.
• Look at what dimensions are used to aggregate data on your report.
  For example, data might be aggregated by country, or by credit rating.
  Examining these dimensions can help you identify suitable candidate columns to
  be used as distribution keys. Parameters that are used frequently in reports can
  affect the degree of parallelism in queries and are therefore unsuitable as
  distribution keys.
• Examine the report parameters and filters and the order in which they are used.
  Report parameters can be viewed as dimension filters. For example, a frequently
  used report that contains few parameters can help determine both your MDC
  columns for the fact tables and potential inclusion in any MQTs.

An excerpt from the Solvency II physical data model

Figure 3 shows an excerpt from the SII data model. The center of the diagram
shows the key fact tables and the rest of the diagram shows the dimension tables.

Figure 3. Excerpt from data model showing fact and dimension entities

The SII data model presents data at the granular level of month; there is a single row for
an asset or investment for each given month. From a dimensional perspective, this means
that the fact tables are periodic snapshot tables, representing a position at the end of the
month.

Table 2 lists the fact tables and the largest dimension table, shown in Figure 3, associated
with each fact table.

Object type
    Description

SLVC_INV_HLDG_FCT
    Transaction fact table contains investment holding fact data at the granularity of
    one row per month.

AGRM_COLL
    Agreement collection is the largest dimension table that is associated with the
    investment holding fact table.

SLVC_AST_VAL_FCT
    Transaction fact table holds asset valuation at the granularity of one row per asset
    per month.

FNC_AST
    Financial asset is the largest dimension table that is associated with the asset
    valuation fact table.

QRT_INV
    Aggregate transaction table QRT Investments is a union of the two fact tables.

TM_DIMENSION
    Time dimension.

SLVC_RPT_AST_CGY_ID
    Solvency report Asset Category dimension table.

CR_RTG_ID
    Credit Rating dimension table.

CNTRY_OF_CUSTODY_OF_AST_ID
    Country of Custody dimension table.

ISSUR_OF_INV_ID
    Issuer of Investment counterparty dimension table.

Table 2. List of Solvency II data model tables referenced in this paper

Interpreting Solvency II example reports as database design decisions

The example reports referenced in this paper present information about the holding of
assets and investments for the month or series of months specified. The time periods that
are compared could include the same assets, have some assets removed, or have new
assets introduced, which triggers changes to asset prices, ratings, and so on.

The two example report requirements that are used to refine the physical data model and
database design are:

• Parent report: Assets by region, by counterparty, by credit rating
• Child report: Asset drill-through by dimension

Parent report: Assets by region by counterparty by credit rating report

The business question answered by this report is: How exposed am I geographically, by
issuers of bonds in different parts of the world, and what are their credit ratings?

The base query for this report is a UNION of both the Assets and the Investments fact
tables. The report aggregates the data by region (country of origin), by counterparty, by
credit rating, and provides filters on these and other dimensions.

The following design points can be determined from this report:

• Distribution keys
  Use the report specification to help in choosing candidate distribution keys.
  Using the filter columns of this report as distribution keys could lead to queries
  that limit the number of database partitions that work in parallel on the query.
• Table partitioning
  Table partitioning by date is the most effective method for backup, restore, aging,
  and archiving. Since the data and report both use month, this is the most suitable
  option to use for partitioning the fact table.
• Materialized query tables (MQTs)
  This is a summary report and an ideal candidate for an MQT. The data model
  provides a table, QRT_INV (quantitative reporting templates), as a template for
  reports against these tables. It is recommended that you replace the provided
  QRT_INV table with two separate MQTs, one for each side of the UNION clause.
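
One side of the UNION could be summarized by an MQT along the following lines. This is a rough sketch, not the definition used for the reports in this paper: the MQT name and the columns TM_DIMENSION_ID, CR_RTG_ID, and AST_VAL_AMT are assumed for illustration, and the distribution key and table space clauses are left out.

```sql
-- Hypothetical MQT over the asset valuation fact table.
-- Column names are assumptions, not taken from the Solvency II model.
CREATE TABLE mqt_ast_val_example AS (
  SELECT tm_dimension_id,
         cr_rtg_id,
         COUNT(*)         AS row_cnt,
         SUM(ast_val_amt) AS tot_ast_val_amt
  FROM   slvc_ast_val_fct
  GROUP  BY tm_dimension_id, cr_rtg_id
)
DATA INITIALLY DEFERRED REFRESH DEFERRED
ENABLE QUERY OPTIMIZATION
MAINTAINED BY SYSTEM;

-- Populate the MQT after the fact table has been loaded
REFRESH TABLE mqt_ast_val_example;

-- Let the optimizer consider deferred-refresh MQTs in this session
SET CURRENT REFRESH AGE ANY;
```

A second MQT of the same shape over the investment holding fact table would cover the other side of the UNION.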

Child report: Asset drill-through by dimension report

This report is effectively a drill-through report from within the previous report. The
business problem that is addressed by this report is: From within the first report, I want
to see the composition of the aggregated metrics for an entry on the first report.

This report is more granular than the previous report and allows you to interrogate the
data set right down to the granularity of the fact table and analyze data to the level of
financial asset. The report has parameters that allow data to be filtered on individual
region, counterparty, or credit rating. The following design points can be determined
from this report:

• Distribution
  Since this report involves a query which joins the Financial asset valuations
  fact to the large Financial Asset dimension, collocation is an important design
  goal. The primary key for Financial asset and Investment holding
  dimensions could be an ideal distribution key to help collocate the fact with its
  largest dimension and help ensure an even distribution of data.
• Multidimensional clustering (MDC)
  The most commonly used dimension filter columns are good candidates for
  defining the MDC columns for the fact tables.
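
As a sketch of how filter columns might become MDC dimensions, the following statement clusters a hypothetical copy of the asset valuation fact table on credit rating and country. The table name and the columns CR_RTG_ID, CNTRY_ID, and AST_VAL_AMT are assumed for illustration; which columns suit your tables depends on your own report filters and data volumes.

```sql
-- Hypothetical table; the MDC columns are assumed to be frequently
-- used, low-cardinality report filters.
CREATE TABLE slvc_ast_val_fct_mdc_example (
  tm_dimension_id DATE          NOT NULL,
  fnc_ast_id      BIGINT        NOT NULL,
  cr_rtg_id       INTEGER       NOT NULL,
  cntry_id        INTEGER       NOT NULL,
  ast_val_amt     DECIMAL(18,2)
)
DISTRIBUTE BY HASH (fnc_ast_id)
ORGANIZE BY DIMENSIONS (cr_rtg_id, cntry_id);
```

Each distinct combination of MDC column values occupies at least one block, so columns with many distinct values can waste space and are generally poor MDC candidates.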

Optimizing the database architecture and design for your environment

Optimizing a database is an iterative process and you must ensure that sufficient test
data that reflects the target production environment is available.

When you are optimizing your database design, it is critical that you qualify each
optimization by ensuring that the intended change is performing as expected in isolation
and in parallel with other expected workloads. The examples in this section include
references to how you can use the explain plan tools to determine whether your
optimizations succeed.

When you implement and successfully test changes to the database, apply the changes
back into the physical data model.

This section looks at how to optimize the database that supports the reports described.
The main design decisions are based on:

• Choosing distribution keys for partitioned tables
• Choosing MDC tables and MDC columns
• Choosing tables to be range partitioned
• Indexing for performance
• Identifying candidates for MQTs

Choosing distribution keys

Since the report does not filter by individual Financial asset, the primary key of the
Financial assets dimension table (FNC_AST.FNC_AST_ID) would be an ideal
distribution key for the Asset valuation fact table to help ensure an even distribution of
data across each database partition and promote parallelism in the query workload. The
aggregation of data effectively removes the detail of individual assets on which the table
is based.

For example, the column FNC_AST_ID makes sense as a distribution key for both the
Financial assets (FNC_AST) and Asset value (SLVC_AST_VAL_FCT) tables because the
following criteria are fulfilled:

• The distribution key is not used as a filter condition in the reports. This choice
  prevents a query from requesting data on a single database partition only and
  artificially limiting performance.
• Collocated queries for the target reports are achieved by partitioning both the
  fact and the chosen dimension on the primary key of the dimension table.
• An even distribution of data is facilitated by using the most granular dimension or
  the dimension that is closest to the granularity of the transaction table.
Design your database to support collocated queries within the constraint of having no
more than 10% skew in the distribution of data across the partitioned database.
In order to achieve an even distribution over each database partition, the distribution key
must contain a relatively high number of distinct values (that is, cardinality) and the fact
table must have an even distribution of data for the chosen distribution key. These
choices help avoid an uneven distribution, or skew, which occurs when the database
partition with the greatest number of rows has 10% more rows than the average row
count across all database partitions.
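As a minimal sketch of how the 10% threshold could be checked, the following query counts the rows that each database partition holds for the fact table and compares the largest count with the average. It assumes that the fact table is QRT_DWM.SLVC_AST_VAL_FCT and that it is distributed on FNC_AST_ID; the result column names are illustrative only.

```sql
-- Sketch only: assumes QRT_DWM.SLVC_AST_VAL_FCT is hash-distributed on FNC_AST_ID.
-- Database partitions that hold no rows for the table do not appear in part_rows.
WITH part_rows (part_num, row_cnt) AS (
  SELECT DBPARTITIONNUM(FNC_AST_ID), COUNT(*)
  FROM QRT_DWM.SLVC_AST_VAL_FCT
  GROUP BY DBPARTITIONNUM(FNC_AST_ID)
)
SELECT MAX(row_cnt)         AS max_rows,
       AVG(DOUBLE(row_cnt)) AS avg_rows,
       -- percentage by which the fullest partition exceeds the average
       DECIMAL((MAX(row_cnt) - AVG(DOUBLE(row_cnt))) * 100
               / AVG(DOUBLE(row_cnt)), 9, 2) AS skew_pct
FROM part_rows;
```

A skew_pct value above 10 would suggest that a different distribution key candidate should be tested.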
Choosing MDC tables and columns
All fact tables are candidates for multidimensional clustering since the advantages
gained in query performance and the reduction in maintenance operations are
significant. Choosing columns that are not used in the filters of your most frequent
queries can have a negative effect on query performance. Changing your MDC columns
requires a rebuild of the table, so choose and test your MDC strategy in line with report
development before you introduce the strategy into production.
Use multidimensional clustering tables to organize data in all fact tables to reduce the
need for regular indexes and associated maintenance operations.
An MDC table physically groups data pages that are based on the values for one or more
specified dimension columns. Effective use of MDC can significantly improve query
performance because queries access only those pages that have rows with the correct
dimension values.
Consider the following tasks when you are choosing MDC columns:
Create MDC tables on the columns that have low cardinality in order to have
enough rows to fill an entire cell.
Create MDC tables that have an average of five cells per unique combination of
values in the clustering keys.
Use generated columns to coarsify or reduce the number of distinct values for
MDC dimensions in tables that do not have column candidates with suitable low
cardinality.
Avoid including all filter dimensions in the MDC ORGANIZE BY clause. Although this
practice increases the flexibility of reporting, the number of cells can also be increased,
resulting in sparsely populated MDC tables.
To determine whether the chosen columns are suitable as dimension columns in the fact
table, use the following query. It divides the number of extents in the table
(NPAGES/EXTENTSIZE) by the number of distinct value combinations of the candidate
columns and checks whether the resulting average is greater than 5. Table
statistics must be current to return meaningful data.
SELECT CASE
WHEN (a.npages/extentsize)/
(SELECT COUNT(1) as num_distinct_values
These columns were the most frequently used columns in the report for filtering.
The cardinality of the two columns was suitable for MDC cell population and
could be coarsified.
The time dimension was included as an MDC column to help reduce locking
contention with data ingest operations and to enable roll-in, roll-out capability.
Coarsification
Coarsification, within a DB2 database, is the process of creating a generated column to
decrease the granularity of dimensions where your estimates show that the resulting
MDC table would be sparsely populated.
A sparsely populated MDC table exists where the number of rows per cell (chosen MDC
dimension columns) is less than the number of pages in a block (16 * 16K pages in an IBM
PureData for Operational Analytics System). Since DB2 allocates 1 block per cell, it is
important that each cell contains a healthy number of rows to avoid slower query
performance through increased disk I/O.
Use generated columns where required to avoid creating sparsely populated MDC
tables.
For example, if using the earlier code sample does not identify column candidates with
suitable low cardinality, then look to coarsify existing columns. For example, coarsifying
the asset category and credit rating columns to reduce the cardinality in order to increase
the number of rows per cell would create a more efficient MDC table.
Create a generated column in the fact table using the frequently used filter columns in
the report. Use a divisible number that creates a suitable MDC candidate column as
shown in the sample SQL statement above. The new column can then be added to the
table as an MDC column, for example:
DIM_SLVC_RPT_AST_CGY_ID SMALLINT GENERATED ALWAYS AS
(SLVC_RPT_AST_CGY_ID/10),
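Before committing to a coarsified column, the expected cell population can be estimated directly from the data. The following sketch groups the fact table by the same expression as the generated column and by the time dimension key, and reports how many rows would land in each cell. Pairing these two columns is an assumption made for illustration; substitute the dimension columns that you are evaluating.

```sql
-- Sketch only: estimates rows per MDC cell for a candidate pair of dimensions.
-- SLVC_RPT_AST_CGY_ID / 10 mirrors the generated-column expression shown above.
SELECT COUNT(*)          AS num_cells,
       MIN(rows_in_cell) AS min_rows_per_cell,
       AVG(rows_in_cell) AS avg_rows_per_cell
FROM (
  SELECT SLVC_RPT_AST_CGY_ID / 10 AS dim_cgy,
         TM_DIMENSION_ID,
         COUNT(*) AS rows_in_cell
  FROM QRT_DWM.SLVC_AST_VAL_FCT
  GROUP BY SLVC_RPT_AST_CGY_ID / 10, TM_DIMENSION_ID
) AS cells;
```

A low avg_rows_per_cell relative to the number of rows that fit in one block indicates that a larger divisor, or fewer dimension columns, should be considered.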
Partitioning large tables for query performance and manageability
Use table partitioning to separate ranges of data within a table into data partitions to take
advantage of DB2 features.
From a dimensional data warehouse perspective, a report or analytical query typically
reads a large volume of rows in order to return a few rows. An efficient design must look
to minimize the number of rows that are read to just those rows that are needed.
Build on an intelligent table space design strategy by partitioning your largest fact
tables to facilitate query performance and enable flexibility when performing
maintenance operations.
For example, by partitioning the fact table SLVC_AST_VAL_FCT by month, the following
capabilities are enabled:
Use the time dimension to determine ranges of data per month as the DATE
column was used on the time dimension key.
The DB2 optimizer can eliminate a data partition where the date range of the
query does not match the date range of the data partition. This can help
significantly reduce the number of rows read for a query.
Define each index on the fact table as a partitioned index (option in InfoSphere
Data Architect) and include the range partitioning key.
Index maintenance can take place at data partition level and be targeted at active
data partitions.
Define end dates for each data partition as EXCLUSIVE so that the first day in the
subsequent month can be specified as the end of the period range.
This method is clearer and less prone to error than manually determining the
correct end-of-month date.
Assign each table (range) partition to a separate table space; for example,
February 2012 is assigned to the PD_AST_VAL_FEB2012 table space.
This increases visibility and flexibility in maintenance operations, backup and
recovery, and data lifecycle management. Aged data can easily be detached from
the database, helping to maintain a balance between active and inactive data.
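As an illustration of detaching aged data, the following sketch rolls one month out of the fact table into a standalone table that can then be archived or dropped. The data partition name PART_2012_FEB and the target table name are hypothetical; use the partition names defined in your own physical data model.

```sql
-- Sketch only: PART_2012_FEB and the archive table name are hypothetical.
ALTER TABLE QRT_DWM.SLVC_AST_VAL_FCT
  DETACH PARTITION PART_2012_FEB
  INTO QRT_DWM.SLVC_AST_VAL_FCT_FEB2012_ARCHIVE;
```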
The following DDL excerpt was generated from the physical data model and shows the
part of the CREATE TABLE statement that included the table partitioning syntax. Data
partitions were created for January, February, March, and so on, for 2012 with a separate
data partition for rows that exist for dates before and after the ranges specified.
PARTITION BY RANGE (TM_DIMENSION_ID)
(
Confirming data partition elimination in SELECT statements
Use the EXPLAIN PLAN statement to determine that data (range) partitions are being
referenced in your queries. For example, the following explain plan output from a report
query indicates that the query identified a specific data partition (range) to satisfy the
query before the I/O operations take place.
Range 1)
Start Predicate: (Q6.TM_DIMENSION_ID = '12/01/2011')
Stop Predicate: (Q6.TM_DIMENSION_ID = '12/01/2011')
Indexing for performance
The use of MDC on fact tables reduces your need to create multiple indexes on fact
tables. Instead, use indexing to facilitate query performance when columns that are not
used in the MDC are referenced. The DB2 optimizer can take advantage of singular or
composite indexes on foreign keys.
MDC facilitates access to data in multiple dimensions by organizing the data in
dimensional blocks, which also reduces the requirement to reorganize the tables.
Indexes are primarily used to enhance query performance, but also can be used to govern
how data is organized on dimension tables or to enforce unique constraints. In a data
warehouse environment, focus on query performance by using these techniques:
Use indexes for query performance only; constraints should be checked and
enforced by the ETL process and made known to the DB2 optimizer by using
informational constraints.
Use partitioned indexes over global indexes to minimize query cost and index
maintenance.
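An informational constraint is declared like an ordinary constraint but is marked NOT ENFORCED, so that the optimizer can use it without DB2 checking it during data ingest. The following is a minimal sketch for the relationship between the asset valuation fact table and the FNC_AST dimension. The constraint name is hypothetical, and the sketch assumes that both tables are in the QRT_DWM schema and that FNC_AST_ID is the primary key of FNC_AST.

```sql
-- Sketch only: FK_AST_VAL_FNC_AST is a hypothetical constraint name.
-- The ETL process remains responsible for guaranteeing that the relationship holds.
ALTER TABLE QRT_DWM.SLVC_AST_VAL_FCT
  ADD CONSTRAINT FK_AST_VAL_FNC_AST
  FOREIGN KEY (FNC_AST_ID)
  REFERENCES QRT_DWM.FNC_AST (FNC_AST_ID)
  NOT ENFORCED
  ENABLE QUERY OPTIMIZATION;
```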
Enhancements in DB2 V10.1 allow the optimizer to recognize data warehouse
queries and use a zigzag join, which can help improve performance but requires a specific
design pattern. A zigzag join can occur where a fact table and two or more dimension
tables in a star schema are joined.
Use composite indexes to include those foreign keys that are used in query joins,
including MDC columns.
To enable the optimizer to use a zigzag join:
Create a primary key on each dimension table in the physical data model to
enforce uniqueness and provide an index for the optimizer to use. Primary keys
on the parent table are also a mandatory requirement for foreign key constraints
(informational or enforced) with the fact table.
Create a composite index on the fact table that includes all frequently used join
columns, including MDC columns. Refine the columns that are used during the
query optimization process; the explain plan output helps identify the columns
needed.
Using the Explain facility to determine zigzag join usage
To facilitate the zigzag join access plans over dimension keys not included in the MDC
columns, a non-unique composite index is created over all the foreign keys to the
dimension tables that are referenced in the example reports. For example:
CREATE INDEX QRT_DWM.IDX_SLVC_AST_VAL_FCT ON
"QRT_DWM"."SLVC_AST_VAL_FCT" ("SLVC_RPT_AST_CGY_ID", "CR_RTG_ID",
"ISSUR_OF_INV_ID", "ISSUR_CNTRY_LGL_SEAT_ID",
"CNTRY_OF_CUSTODY_OF_AST_ID", "TM_DIMENSION_ID") PARTITIONED;
To ensure that the zigzag join operator is being used, use the EXPLAIN PLAN statement
and the db2exfmt command to examine the access plan for the query and look for the
ZZJOIN operator in the access plan graph produced. For example:
EXPLAIN PLAN FOR <SELECT statement>
db2exfmt -d modeldb5 -t -g
The following is a snippet of the explain plan output that shows the ZZJOIN operator.
The output also shows a BTQ (Broadcast Table Queue) that would be a possible
candidate for table replication.
                                    |
                                0.0633706
                                 ZZJOIN
                                 (  12)
                                 57.4938
                                   24
             +---------------------+----------------------+
          3.31022               1.06757               0.0179323
          TBSCAN                TBSCAN                 IXSCAN
          (  13)                (  18)                 (  23)
          46.1982               4.52372                6.77185
            22                     1                      1
            |                      |                      |
          3.31022               1.06757                 50318
           TEMP                  TEMP            DP-INDEX: QRT_DWM
          (  14)                (  19)              TEST_INDEX
          46.1969               4.52241                  Q6
            22                     1
            |                      |
          3.31022               1.06757
            BTQ                   BTQ
          (  15)                (  20)
          46.1853               4.51201
            22                     1
            |                      |
          3.31022               1.06757
           FETCH                 FETCH
          (  16)                (  21)
          46.1597               4.48722
            22                     1
        /---+----\            /----+-----\
     1000        1000       122          122
    IXSCAN  TABLE: QRT_DWM  IXSCAN   TABLE: QRT_DWM
    (  17)     GEO_AREA     (  22)   SLVC_RPT_AST_CGY
    8.86251       Q2      0.00777168       Q4
       2                       0
       |                       |
     1000                     122
 INDEX: QRT_DWM          INDEX: QRT_DWM
   GEO_AREA_PK         SLVC_RPT_AST_CGY_PK
       Q2                      Q4
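As an alternative to reading the formatted graph, the explain tables can be queried directly for the operator. The following sketch assumes that the explain tables were created in the current schema and that the report query was explained with EXPLAIN PLAN FOR; it lists any zigzag join operators that were recorded, most recent first.

```sql
-- Sketch only: assumes the explain tables exist in the current schema.
SELECT EXPLAIN_TIME, OPERATOR_ID, OPERATOR_TYPE, TOTAL_COST
FROM EXPLAIN_OPERATOR
WHERE OPERATOR_TYPE = 'ZZJOIN'
ORDER BY EXPLAIN_TIME DESC;
```

An empty result for the most recent explain request indicates that the optimizer did not choose a zigzag join for that statement.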
Using partitioned MQTs to enhance query performance
Use materialized query tables (MQTs) to enhance query performance by precomputing
expected aggregation queries. When you are creating an aggregation layer:
Remove columns from the MQTs that are not included in your report query to
improve the performance of the MQT REFRESH operation and allow the MQT to
be used in multiple contexts.
Since columns are not removed from the underlying tables, columns can be
added to the MQT at a future point in time if needed.
Do not write queries directly against the MQTs.
Designing and implementing MQTs is an iterative process even after the data
warehouse goes live, particularly if self-service analysis queries are used.
Where possible, implement partitioned MQTs by using the same distribution
keys, MDC columns, compression setting, and range partitioning as the
underlying fact table.
This method helps ensure that the MQT is more favorable to the optimizer for
query rewrite.
Include RUNSTATS for MQTs in your overall statistics collection strategy and
always refresh statistics for an MQT after you refresh the MQT.
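The last point can be scripted so that the refresh and the statistics collection always run together. The following is a minimal sketch for the asset valuation MQT, QRT_DWM.QRT_INV_AST_VAL; the RUNSTATS options shown are one reasonable choice rather than a recommendation from this paper.

```sql
-- Sketch only: refresh the deferred MQT, then collect statistics on it
-- so that the optimizer costs the MQT against current row counts.
REFRESH TABLE QRT_DWM.QRT_INV_AST_VAL;

CALL SYSPROC.ADMIN_CMD(
  'RUNSTATS ON TABLE QRT_DWM.QRT_INV_AST_VAL WITH DISTRIBUTION AND INDEXES ALL');
```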
Defining MQTs for the sample reports and data model
For the sample reports, a nested MQT strategy was chosen and applied to the physical
data model. By nesting MQTs, the refresh of one MQT uses the contents of another MQT,
helping to reduce the refresh time for MQTs.
Two MQTs were created, one for each side of the select (union) statement that was
used in the parent report. Each MQT aggregated fact table data only.
By creating separate MQTs for each side of the union clause, other reports and
other MQTs that reference the fact tables can take advantage of the MQTs.
An MQT was created for the child report which aggregates the asset fact data by
the dimensions in the report. This MQT includes dimension columns for each
report filter in addition to the aggregated fact data.
The design goal for this MQT is to achieve nested MQTs; the refresh operation
for this MQT uses the fact table MQTs to help maximize performance and
efficiency when it is maintaining MQTs.
In our test environment, the asset dimension is aggregated in the MQT, and the
MQT had roughly 500,000 rows, compared to the 10,000,000 rows in the assets fact table.
These numbers represent a 20:1 consolidation ratio, which is in line with the data
warehouse best practice recommendation of greater than 10:1 consolidation, to allow the
optimizer to favor the MQT over the source fact table.
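The consolidation ratio can be read from catalog statistics instead of counting rows. The following sketch divides the cardinality of the fact table by the cardinality of an MQT; the choice of QRT_INV_AST_VAL as the MQT is an assumption for illustration. With the figures quoted above (10,000,000 and 500,000 rows) the ratio would be 20.

```sql
-- Sketch only: CARD is -1 until RUNSTATS has been run on a table,
-- so statistics must be current for the ratio to be meaningful.
SELECT f.CARD AS fact_rows,
       m.CARD AS mqt_rows,
       DECIMAL(DOUBLE(f.CARD) / NULLIF(m.CARD, 0), 9, 1) AS consolidation_ratio
FROM SYSCAT.TABLES f, SYSCAT.TABLES m
WHERE f.TABSCHEMA = 'QRT_DWM' AND f.TABNAME = 'SLVC_AST_VAL_FCT'
  AND m.TABSCHEMA = 'QRT_DWM' AND m.TABNAME = 'QRT_INV_AST_VAL';
```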
Using Explain Plan to determine MQT optimizations
Use the DB2 explain plan tools to help ensure that the MQTs you created are being used
by the optimizer.
Confirm that the DB2 optimizer is identifying the changes that you made to the
database before you apply the changes back into the physical model.
Confirm that the optimizer is rewriting the query to use the MQT created by running the
query with the EXPLAIN PLAN statement and then using the db2exfmt command to
format the resulting access plan. For example:
EXPLAIN PLAN FOR <SELECT statement used in report>
Issue the db2exfmt command against the target database:
db2exfmt -d modeldb5 -t -g
Confirm that the MQT is being used by looking at the resulting access plan graph or by
the comments in the Extended Diagnostic Information section of the formatted plan. In
the following example, MQTs were identified by the optimizer in preparing a plan for the
query:
Extended Diagnostic Information:
Diagnostic Identifier: 1
Diagnostic Details:EXP0148W The following MQT or statistical view
was considered in query matching: "QRT_DWM"."QRT_INV_AST_VAL".
Diagnostic Identifier: 2
Diagnostic Details: EXP0148W The following MQT or statistical
view was considered in query matching: "QRT_DWM". "QRT_INV_HLDG".
Diagnostic Identifier: 3
Diagnostic Details: EXP0149W The following MQT was used (from
those considered) in query matching: "QRT_DWM".
"MQT_REPORT1_AST".
Diagnostic Identifier: 4
Diagnostic Details: EXP0149W The following MQT was used (from
those considered) in query matching: "QRT_DWM"."QRT_INV_HLDG".
The cost for the preceding access plan, which leverages MQTs, showed a significant
improvement over the cost without the MQT, and results in a similar increase in query
performance.
Enabling and evaluating compression
Enabling compression can help reduce storage needs and increase query performance
through reduced I/O. Dimensional databases are suited to enabling compression
given the data patterns generated by conformed dimensions.
At design time, compress all tables and indexes.
Evaluate compression ratios during testing by using representative data.
The following query against the SYSCAT.TABLES catalog view shows the average row
compression ratio that is achieved for the SLVC_AST_VAL_FCT table. It shows that 77%
of the pages are saved through compression, which represents an average compression
ratio of 5.34.
SELECT substr(TABNAME,1,30), AVGROWCOMPRESSIONRATIO,
PCTPAGESSAVED FROM SYSCAT.TABLES WHERE TABNAME='SLVC_AST_VAL_FCT'
The following query against the SYSCAT.INDEXES catalog view shows the percentage of
pages that are saved for each index. For example, the composite index
SLVC_AST_VAL_FCT_IN1 saved 64% of the pages in this index.
SELECT substr(indname,1,30) as INDNAME, substr(TABNAME,1,30) AS
TABNAME, substr(colnames,1,50) AS COLNAMES, PCTPAGESSAVED FROM
syscat.indexes WHERE TABNAME = 'SLVC_AST_VAL_FCT'
Replicating nonpartitioned tables
Replicating nonpartitioned tables places a copy of the table onto each database partition.
This method enables collocated queries with partitioned fact tables, avoiding
unnecessary communication between data hosts which takes place as uncompressed data
exchanges.
Because the fact table can be collocated through sharing the distribution key with a single
major dimension table (in this case FNC_AST) only, replicate the other dimension
tables that are involved in the report queries to achieve collocation across the entire
query. For example, the GEO_AREA table is replicated by using the following DDL:
CREATE TABLE REPL_GEO_AREA AS (SELECT * FROM GEO_AREA)
IN PD_REPL_DIM_TBL;
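In DB2, a replicated table is defined as a materialized query table with the REPLICATED keyword and is populated by a refresh. The excerpt above shows only part of that definition. As a hedged sketch, a fuller form of the same statement might look as follows; the REFRESH DEFERRED option is an assumption, chosen so that the copy is refreshed explicitly after the dimension is loaded.

```sql
-- Sketch only: the refresh option is an assumption, not taken from the paper.
CREATE TABLE REPL_GEO_AREA AS (SELECT * FROM GEO_AREA)
  DATA INITIALLY DEFERRED REFRESH DEFERRED
  IN PD_REPL_DIM_TBL
  REPLICATED;

-- Populate the copy on every database partition in the table space.
REFRESH TABLE REPL_GEO_AREA;
```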
Ingesting data into the database
Tools including IBM DataStage, InfoSphere SQL Warehousing (SQW), and the DB2 load
utility are documented and provide various features for ETL processing. Where data
cleansing and transformations are needed, the choice exists to perform these tasks within
the database layer or outside of the database layer.
A staging area within the database is commonly used to hold data for validation,
cleansing, and transformation before transferring the data into the production tables.
Create the staging layer in the physical data model and place the tables within the same
database partition groups but in a separate schema and in a separate table space.
Staging tables are used under one or more of the following conditions:
The data is not table ready: significant transformations or data cleansing is
needed.
The business is not ready: you want to avoid making the data available until the
business asks for it or an event occurs. However, you want to have the data
ready to insert into the production tables.
Data referencing: data from different data sources must be cross-referenced
before it is inserted into the production tables.
In contrast to the DB2 load utility, the DB2 ingest client utility, available in DB2 V10.1, can
be used to populate the database with logged transactions, without the necessity to
use staging tables and without taking target production tables offline.
Ingest provides a scalable solution for ingesting data into a partitioned database because
data can be prepared on the client and directed to individual database partitions,
avoiding any bottleneck through the coordinator database partition.
The ingest utility has a number of features that make it an effective way to get data into
the production database. Consider using it when:
You need other applications to access or update the table while it is being
populated.
The input file contains fields that you want to transform or skip over.
You need to use MERGE or DELETE functionality.
You need to recover and continue on when the utility gets a recoverable error.
Use the ingest utility to populate the staging and production tables as a concurrent
operation; ingest inserts data as a logged operation, is concurrent with backup, query,
and other database operations, and helps maintain a recoverable database.
For example, in populating the fact table SLVC_AST_VAL_FCT the following ingest
command was used:
INGEST FROM source_SLVC_AST_VAL_FCT.del format DELIMITED
restart new (
$f_CO_ID BIGINT external,
$f_CNTRY_OF_CUSTODY_OF_AST_ID BIGINT external)
merge into SLVC_AST_VAL_FCT
on (CO_ID = $f_CO_ID)
when matched and (CNTRY_OF_CUSTODY_OF_AST_ID =
$f_CNTRY_OF_CUSTODY_OF_AST_ID) and (ESR_CCY_ID = 'USD')
then
update set UNIT_PRC = UNIT_PRC * 0.0024
when matched and (UNIT_PRC > 20) then
update set UNIT_PRC = 20
when matched and (CR_RTG_ID = 'BBB') then
delete
The example that is shown illustrates:
A conditional expression is used to merge data for specific rows. This
demonstrates how the INGEST command can combine operations and use
expressions to offload some of the analysis and data cleansing work that other
processes perform in a staging table.
When you are using the ingest utility, choose the COMMIT_COUNT or COMMIT_PERIOD
options as a method for committing rows to the database. These options can be specified
by:
The number of rows that are ingested before a commit
INGEST SET COMMIT_COUNT 100
The length of time in seconds between commits
INGEST SET COMMIT_PERIOD 90
The following example shows how an interrupted ingest process can be restarted:
INGEST FROM source_SLVC_AST_VAL_FCT.del format DELIMITED
restart new "update_SLVC_AST_VAL_FCT_001" (
$f_CO_ID BIGINT external,
$f_CNTRY_OF_CUSTODY_OF_AST_ID BIGINT external,
$f_UNIT_PRC DECIMAL external)
update SLVC_AST_VAL_FCT
set UNIT_PRC = $f_UNIT_PRC where CO_ID = $f_CO_ID and
CNTRY_OF_CUSTODY_OF_AST_ID <> $f_CNTRY_OF_CUSTODY_OF_AST_ID
<CTRL-C> Interrupt occurs
ingest from source_SLVC_AST_VAL_FCT.del format DELIMITED
restart continue "update_SLVC_AST_VAL_FCT_001" (
CNTRY_OF_CUSTODY_OF_AST_ID <>
$f_CNTRY_OF_CUSTODY_OF_AST_ID
where:
source_SLVC_AST_VAL_FCT.del is the name of the source text file
update_SLVC_AST_VAL_FCT_001 is the named recoverable ingest job
The ingest job resumes from the point of last commit, according to settings specified by
either the COMMIT_COUNT or COMMIT_PERIOD.
When used with the temporal feature, a full history of changes can be recorded in the
temporal history table.
Use identity columns for the dimension table primary keys and omit that primary key
column from the column specification list in the ingest control file.
The simple example below shows how the ingest utility can be used to populate the date
dimension in the sample industry data model used.
else 'Y'
END,
'N')
Where:
The file, source_date_dim.del, simply contains the numbers 1 to 365 to
represent each day in the year.
CDR_DT is the primary key with a data type of DATE
Using temporal tables to effectively manage change
Using temporal tables, the database can store and retrieve time-based data without more
application logic. For example, a database can store the history of a table (deleted rows or
the original values of rows that were updated) so you, or your auditors, can understand
the history of a row in a table or retrieve data as of a specific point in time.
DB2 supports three types of temporal tables:
System-period temporal tables (STTs).
DB2 transparently keeps a history of updated and deleted rows over time.
Application-period temporal tables (ATTs).
New SQL constructs allow users to insert, query, update, and delete data in the
past, present, or future. DB2 automatically applies temporal constraints and
row splits to correctly maintain the application-supplied business time, also
known as valid time.
Bitemporal tables (BTTs).
This combination enables applications to manage the business validity of their
data while DB2 keeps a full history of any updates and deletes. Every BTT is also
an STT and an ATT.
Using temporal tables can help you track data changes over time and provide an efficient
way to address Solvency II auditing and compliance requirements.
The temporal data feature is described in detail in the best practices paper titled Best
practices: Temporal data management with DB2 which is referenced in the Further reading
section.
Implementing system period temporal time for a dimension table
This example implements the STT feature on the agreement collection dimension
(AGRM_COLL) to record all changes to the table at a system date level.
To enable STT for the agreement collection table, the following CREATE TABLE statement
would be used, where the columns VLD_FM_DT (Valid From Date), VLD_TO_DT (Valid To
Date), and VLD_TX_START (Valid Transaction Start) are added to the table and the
keyword PERIOD SYSTEM_TIME determines the columns to be used:
CREATE TABLE "QRT_DWM"."AGRM_COLL" (
"AGRM_COLL_ID" BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY
(START WITH 1 INCREMENT BY 1 MINVALUE 1 MAXVALUE
9223372036854775807 NO CYCLE CACHE 20 NO ORDER ),
"ANCHOR_ID" BIGINT, "CGY_SCM_NUM" VARCHAR(20),
"DSC" VARCHAR(256), "EFF_FM_DT" TIMESTAMP, "EFF_TO_DT" TIMESTAMP,
Use the same distribution key as the parent dimension table
Use the same compression state
Use a separate table space to contain the history table
The temporal history table should use the same attributes and characteristics as the
parent table for which you are enabling the temporal feature.
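The guidelines above can be sketched as follows. The history table name AGRM_COLL_HIST and the table space TS_AGRM_COLL_HIST are hypothetical, and the sketch assumes that the parent table is compressed. Add a distribution clause that matches the parent table if the default distribution key would differ.

```sql
-- Sketch only: AGRM_COLL_HIST and TS_AGRM_COLL_HIST are hypothetical names.
-- LIKE copies the column definitions of the parent table.
CREATE TABLE QRT_DWM.AGRM_COLL_HIST LIKE QRT_DWM.AGRM_COLL
  IN TS_AGRM_COLL_HIST
  COMPRESS YES;

-- Link the history table so that DB2 starts recording prior row versions.
ALTER TABLE QRT_DWM.AGRM_COLL
  ADD VERSIONING USE HISTORY TABLE QRT_DWM.AGRM_COLL_HIST;
```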
Using temporal to determine system period temporal state
Use the system-period temporal table to facilitate as-of reporting and to determine when
data changed and what the before and after column values were.
The SQL statement below uses the system-period temporal table to return the values of a
specific column, EXT_REFR (External Reference), for a specific time frame:
SELECT AGRM_COLL_ID, REFRESH_DT, EXT_REFR, VLD_FM_DT, VLD_TO_DT
FROM QRT_DWM.AGRM_COLL
FOR SYSTEM_TIME FROM '2010-01-01' TO '2012-10-01'
WHERE AGRM_COLL_ID = 9741
The output from the SQL statement shows that three versions of the row exist for the
time period with varying values for the column retrieved:
AGRM_COLL_ID REFRESH_DT EXT_REFR VLD_FM_DT                        VLD_TO_DT
------------ ---------- -------- -------------------------------- --------------------------------
9741         04/07/2011 RQF-28/W 2010-01-01-00.00.00.000000000000 2012-09-13-08.21.28.143129000000
9741         04/07/2011 RQF-26/X 2012-09-13-08.21.28.143129000000 2012-09-13-08.23.03.818945000000
Implementing business period temporal time for a dimension table
Use business time and application-period temporal tables if you need to describe when
information is valid in the real world, outside of DB2.
When you create a temporal table to include business period temporal time, you are
allowing DB2 software to create multiple rows for a dimension table where each row
represents data for an effective business date range.
For example, to apply business period temporal time for the Agreement Collection
dimension, using the existing columns EFF_FM_DT and EFF_TO_DT, the following create
table statement would be generated:
CREATE TABLE "QRT_DWM"."AGRM_COLL" (
To accommodate multiple rows per dimension key, the business temporal columns are
added to the primary key on the dimension table to make a composite primary key.
The effect of this is that the foreign key on the fact table that is aligned with the
dimension table cannot be enforced, as there can be more than one row in the dimension
table for the row in the fact table.
When implementing business period temporal time for a dimension table, remove the
foreign key on the associated fact table.
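To make the composite key and the resulting query pattern concrete, the following is a minimal sketch on a simplified, hypothetical table (AGRM_COLL_BT_DEMO) rather than the full AGRM_COLL definition. It reuses the EFF_FM_DT and EFF_TO_DT column names from the model; the remaining details are assumptions for illustration.

```sql
-- Sketch only: AGRM_COLL_BT_DEMO is a hypothetical, cut-down table.
CREATE TABLE QRT_DWM.AGRM_COLL_BT_DEMO (
  AGRM_COLL_ID BIGINT NOT NULL,
  EXT_REFR     VARCHAR(20),
  EFF_FM_DT    TIMESTAMP NOT NULL,
  EFF_TO_DT    TIMESTAMP NOT NULL,
  PERIOD BUSINESS_TIME (EFF_FM_DT, EFF_TO_DT),
  -- The business period becomes part of the key; overlapping periods
  -- for the same AGRM_COLL_ID are rejected.
  PRIMARY KEY (AGRM_COLL_ID, BUSINESS_TIME WITHOUT OVERLAPS)
);

-- Return the single row version that was in effect on a given business date.
SELECT AGRM_COLL_ID, EXT_REFR, EFF_FM_DT, EFF_TO_DT
FROM QRT_DWM.AGRM_COLL_BT_DEMO
FOR BUSINESS_TIME AS OF '2012-06-30'
WHERE AGRM_COLL_ID = 9741;
```

Because several rows can share one AGRM_COLL_ID, a join from the fact table needs a business-time predicate such as the one shown to resolve to a single dimension row.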
Conclusion
Deploying an industry model solution in a partitioned DB2 database can be successfully
achieved by following the recommendations in this paper.
When you implement a logical data model, it is important that you scope only those
entities that relate to your business requirements. Add other entities as your business
needs change and grow. This helps to reduce your initial workload in getting the
database into production and avoids implementing a database with many empty tables.
Before you make detailed database design decisions, implement database architecture in
line with best practice recommendations. This helps to avoid costly outages in your
production environment when data movement or maintenance operations need to take
place.
When you build a partitioned database for a data warehouse environment, incorporate
features available in the DB2 software. These include multidimensional clustering, table
partitioning, compression, and partitioned indexes.
When you optimize your database design for production use, ensure that you use a
partitioned database environment populated with relevant data and use queries
generated from specific reporting requirements and intended data ingest and database
maintenance operations.
Good database architecture and design decisions applied to your physical data model
result in a production data warehouse that can accommodate business needs for
reporting, data availability, regulatory compliance, and growth.
Best practices
Apply architecture and database design changes specific to DB2
databases to the physical database model rather than the logical
data model to help accommodate future upgrades.
Refine your logical data model scope to address only your
current data and reporting needs. This targets just those tables
accessed by ETL, queries, and maintenance operations.
Minimize the number of database partition groups and avoid
overlapping database partition groups on the same data host to
avoid adding complexity to resource allocation and monitoring.
Employ intelligent table space design to facilitate concurrent
database maintenance operations and help avoid costly
reorganization tasks in production.
Place table data and index data in separate table spaces to
provide more flexibility for operational maintenance and data
lifecycle strategy.
Use informational constraints instead of enforced constraints to
minimize the effect of unique index maintenance when you are
populating fact tables.
Use the DATE data type for the primary key on the date/time
dimension table. This method enables data partitions to be date
based and facilitates query performance.
Use multidimensional clustering tables to organize data in all
fact tables to reduce the need for regular indexes and associated
maintenance operations.
Use composite indexes to include those foreign keys that are used
in query joins, including MDC columns.
Confirm that the DB2 optimizer is identifying the changes that
you made to the database before you apply the changes back into
the physical model.
Appendix A. Test environment
The test environment used in the research and development of this paper was an IBM
PureData for Operational Analytics System, which has a shared-nothing architecture.
InfoSphere Warehouse V10.1 containing DB2 V10.1 was installed. The database was
populated with over 1 TB of data to test the concepts and recommendations in this paper
at scale.
Figure 4 illustrates the architecture of the IBM PureData for Operational Analytics
System. The administration host contained five database partitions: one for the
coordinator database partition, and four data database partitions. Two data hosts with
eight database partitions on each data host completed the partitioned database.
Figure 4. IBM PureData for Operational Analytics System database architecture
Appendix B. Sample queries
This section lists the main queries that were used by sample reports in this paper.
Query for report example 1: Assets by region, by counterparty, by credit rating
This query is similar to the one used by the IBM Cognos report and submitted to DB2:
--QUERY for selecting Investment Holdings.
SELECT
SLVC_INV_HLDG_FCT.SLVC_RPT_AST_CGY_ID AS "CIC",
SLVC_RPT_AST_CGY.TP AS "ID Code Type",
SLVC_RPT_AST_CGY.NM AS "SII Category Name",
"Asset Custody Country".CNTRY AS "Country of Custody",
"Counterparty".AGRM_REFR AS "Counterparty Issuer Name",
CR_RTG.EXT_REFR AS "External rating",
CDR_DT.CDR_YR, CDR_DT.CDR_MTH,
SUM(SLVC_INV_HLDG_FCT.QTY) AS "Quantity",
SUM(SLVC_INV_HLDG_FCT.SLVC_II_VAL) AS "Unit SII price",
SUM(SLVC_INV_HLDG_FCT.ACQ_COST) AS "Acquisition cost",
SUM(SLVC_INV_HLDG_FCT.QTY * SLVC_INV_HLDG_FCT.SLVC_II_VAL) AS
"Total SII price",
SUM(SLVC_INV_HLDG_FCT.ACR_INT) AS "Accrued Interest"
FROM
SLVC_INV_HLDG_FCT, SLVC_RPT_AST_CGY, FNC_SERVICES_RL AS
"Counterparty",
GEO_AREA AS "Asset Custody Country", CR_RTG, CDR_DT
WHERE
SLVC_INV_HLDG_FCT.SLVC_RPT_AST_CGY_ID =
SLVC_RPT_AST_CGY.SLVC_RPT_AST_CGY_ID AND
SLVC_INV_HLDG_FCT.ISSUR_OF_INV_ID =
"Counterparty".FNC_SERVICES_RL_ID AND
SLVC_INV_HLDG_FCT.CNTRY_OF_CUSTODY_OF_AST_ID = "Asset Custody
Country".GEO_AREA_ID AND
SLVC_INV_HLDG_FCT.CR_RTG_ID = CR_RTG.CR_RTG_ID AND
SLVC_INV_HLDG_FCT.TM_DIMENSION_ID = CDR_DT.CDR_DT
GROUP BY
SLVC_INV_HLDG_FCT.SLVC_RPT_AST_CGY_ID,
SLVC_RPT_AST_CGY.TP, "Asset Custody Country".CNTRY,
"Counterparty".AGRM_REFR, SLVC_RPT_AST_CGY.NM,
CR_RTG.EXT_REFR, CDR_DT.CDR_YR, CDR_DT.CDR_MTH
UNION ALL
-- SELECT statement for selecting Asset Valuations looks like:
SELECT
--SLVC_AST_VAL_FCT.FNC_AST_ID,
SLVC_AST_VAL_FCT.SLVC_RPT_AST_CGY_ID AS "CIC",
SLVC_RPT_AST_CGY.TP AS "ID Code Type",
SLVC_RPT_AST_CGY.NM AS "SII Category Name",
"Asset Custody Country".CNTRY AS "Country of Custody",
"Counterparty".AGRM_REFR AS "Counterparty Issuer Name",
CR_RTG.EXT_REFR AS "External rating",
CDR_DT.CDR_YR, CDR_DT.CDR_MTH,
null AS "Quantity",
Materialized query table for asset valuations
This is the MQT created for the asset valuation side of the SQL statement that was
submitted in the previous example. The DB2 optimizer rewrites the query plan for the
SQL statement to take advantage of the MQT where a more efficient plan is estimated.
CREATE TABLE QRT_DWM.QRT_INV_AST_VAL AS (
SELECT
SLVC_AST_VAL_FCT.SLVC_RPT_AST_CGY_ID,
SLVC_AST_VAL_FCT.CR_RTG_ID,
SLVC_AST_VAL_FCT.ISSUR_OF_INV_ID,
SLVC_AST_VAL_FCT.ISSUR_CNTRY_LGL_SEAT_ID,
SLVC_AST_VAL_FCT.CNTRY_OF_CUSTODY_OF_AST_ID,
SLVC_AST_VAL_FCT.TM_DIMENSION_ID,
SLVC_AST_VAL_FCT.DIM_SLVC_RPT_AST_CGY_ID,
SLVC_AST_VAL_FCT.DIM_CR_RTG_ID,
SUM(UNIT_PRC) as UNIT_PRC
FROM SLVC_AST_VAL_FCT
GROUP BY
SLVC_AST_VAL_FCT.SLVC_RPT_AST_CGY_ID,
SLVC_AST_VAL_FCT.CR_RTG_ID,
SLVC_AST_VAL_FCT.ISSUR_OF_INV_ID,
SLVC_AST_VAL_FCT.ISSUR_CNTRY_LGL_SEAT_ID,
SLVC_AST_VAL_FCT.CNTRY_OF_CUSTODY_OF_AST_ID,
SLVC_AST_VAL_FCT.TM_DIMENSION_ID,
SLVC_AST_VAL_FCT.DIM_SLVC_RPT_AST_CGY_ID,
SLVC_AST_VAL_FCT.DIM_CR_RTG_ID
)
DATA INITIALLY DEFERRED REFRESH DEFERRED
ENABLE QUERY OPTIMIZATION MAINTAINED BY SYSTEM
COMPRESS YES
ORGANIZE BY DIMENSIONS ( (DIM_SLVC_RPT_AST_CGY_ID),
(DIM_CR_RTG_ID) )
DISTRIBUTE BY HASH(SLVC_RPT_AST_CGY_ID)
PARTITION BY RANGE (TM_DIMENSION_ID)
(
PART PAST STARTING(MINVALUE)
ENDING('2010-01-01') EXCLUSIVE IN TS_PD_MQT_QRT_INV,
PART PART_2010_JAN STARTING('2010-01-01')
ENDING('2010-02-01') EXCLUSIVE IN TS_PD_MQT_QRT_INV,
PART PART_2010_FEB STARTING ('2010-02-01')
ENDING('2010-03-01') EXCLUSIVE IN TS_PD_MQT_QRT_INV,
..
PART PART_2011_NOV STARTING ('2011-11-01')
ENDING('2011-12-01') EXCLUSIVE IN TS_PD_MQT_QRT_INV,
PART PART_2011_DEC STARTING ('2011-12-01')
ENDING('2012-01-01') EXCLUSIVE IN TS_PD_MQT_QRT_INV,
PART PART_FUTURE STARTING ('2012-01-01')
ENDING(MAXVALUE) IN TS_PD_MQT_QRT_INV);
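Because the MQT above is defined as REFRESH DEFERRED, the optimizer considers it for query rewrite only in sessions that accept deferred data. The following is a minimal sketch of the session settings involved, not a definitive configuration. It assumes the reporting session can set special registers before the report query is submitted. Whether the rewrite actually takes place still depends on the plan cost that the optimizer estimates.

```sql
-- Session configuration sketch (assumption: run in the reporting session
-- before the report query is submitted).

-- Allow the optimizer to consider REFRESH DEFERRED MQTs for query rewrite.
SET CURRENT REFRESH AGE ANY;

-- Consider system-maintained MQTs for optimization. SYSTEM is the usual
-- database default; it is set here only to make the dependency explicit.
SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION = SYSTEM;
```

An MQT that is created with DATA INITIALLY DEFERRED stays in set integrity pending state until it is populated, so these settings have no effect until the table has been loaded or refreshed.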
To prepare the MQT for use after the initial take-on of historical data into the warehouse
from operational sources, use the LOAD FROM CURSOR approach, which is a fast
method for populating the table; a logged operation is not needed here.
DECLARE C_CUR CURSOR FOR
(SELECT
SLVC_AST_VAL_FCT.SLVC_RPT_AST_CGY_ID,
SLVC_AST_VAL_FCT.CR_RTG_ID,
SLVC_AST_VAL_FCT.ISSUR_OF_INV_ID,
SLVC_AST_VAL_FCT.ISSUR_CNTRY_LGL_SEAT_ID,
SLVC_AST_VAL_FCT.CNTRY_OF_CUSTODY_OF_AST_ID,
SLVC_AST_VAL_FCT.TM_DIMENSION_ID,
SLVC_AST_VAL_FCT.DIM_SLVC_RPT_AST_CGY_ID,
SLVC_AST_VAL_FCT.DIM_CR_RTG_ID,
SUM(UNIT_PRC) as UNIT_PRC
FROM SLVC_AST_VAL_FCT
GROUP BY
SLVC_AST_VAL_FCT.SLVC_RPT_AST_CGY_ID,
SLVC_AST_VAL_FCT.CR_RTG_ID,
SLVC_AST_VAL_FCT.ISSUR_OF_INV_ID,
SLVC_AST_VAL_FCT.ISSUR_CNTRY_LGL_SEAT_ID,
SLVC_AST_VAL_FCT.CNTRY_OF_CUSTODY_OF_AST_ID,
SLVC_AST_VAL_FCT.TM_DIMENSION_ID,
SLVC_AST_VAL_FCT.DIM_SLVC_RPT_AST_CGY_ID,
SLVC_AST_VAL_FCT.DIM_CR_RTG_ID);
LOAD FROM C_CUR OF CURSOR REPLACE
1. Drop the staging table
2. Recreate the staging table
3. Populate the staging table
GROUP BY
SLVC_AST_VAL_FCT.SLVC_RPT_AST_CGY_ID,
SLVC_AST_VAL_FCT.CR_RTG_ID,
SLVC_AST_VAL_FCT.ISSUR_OF_INV_ID,
SLVC_AST_VAL_FCT.ISSUR_CNTRY_LGL_SEAT_ID,
SLVC_AST_VAL_FCT.CNTRY_OF_CUSTODY_OF_AST_ID,
SLVC_AST_VAL_FCT.TM_DIMENSION_ID,
SLVC_AST_VAL_FCT.DIM_SLVC_RPT_AST_CGY_ID,
SLVC_AST_VAL_FCT.DIM_CR_RTG_ID;
4. Remove the original MQT from the table
5. Attach the new MQT data partition
6. Set integrity for the entire MQT
7. Reintroduce the MQT to the target table
8. Set integrity and collect statistics for the target table
Create a materialized query table: dimensional joins
An additional MQT is created to materialize the dimensional joins for report 1, which
looks like the following on the assets side:
CREATE TABLE MQT_REPORT1_AST AS (
SELECT
SLVC_AST_VAL_FCT.SLVC_RPT_AST_CGY_ID,
SLVC_RPT_AST_CGY.TP, SLVC_RPT_AST_CGY.NM,
"Asset Custody Country".CNTRY, "Counterparty".AGRM_REFR, CR_RTG.EXT_REFR,
CDR_DT.CDR_YR, CDR_DT.CDR_MTH, SLVC_AST_VAL_FCT.TM_DIMENSION_ID,
SUM(SLVC_AST_VAL_FCT.UNIT_PRC) AS "Unit SII price",
SUM(SLVC_AST_VAL_FCT.UNIT_PRC) AS "Total SII price"
FROM
SLVC_AST_VAL_FCT, SLVC_RPT_AST_CGY, FNC_SERVICES_RL AS
"Counterparty",
GEO_AREA AS "Asset Custody Country", CR_RTG, CDR_DT
WHERE
SLVC_AST_VAL_FCT.SLVC_RPT_AST_CGY_ID =
SLVC_RPT_AST_CGY.SLVC_RPT_AST_CGY_ID AND
SLVC_AST_VAL_FCT.ISSUR_OF_INV_ID =
"Counterparty".FNC_SERVICES_RL_ID AND
SLVC_AST_VAL_FCT.CNTRY_OF_CUSTODY_OF_AST_ID =
"Asset Custody Country".GEO_AREA_ID AND
SLVC_AST_VAL_FCT.CR_RTG_ID = CR_RTG.CR_RTG_ID AND
SLVC_AST_VAL_FCT.TM_DIMENSION_ID = CDR_DT.CDR_DT
GROUP BY
--SLVC_AST_VAL_FCT.FNC_AST_ID,
SLVC_AST_VAL_FCT.SLVC_RPT_AST_CGY_ID,
SLVC_RPT_AST_CGY.TP, "Asset Custody Country".CNTRY,
"Counterparty".AGRM_REFR, SLVC_RPT_AST_CGY.NM,
CR_RTG.EXT_REFR, SLVC_AST_VAL_FCT.TM_DIMENSION_ID,
CDR_DT.CDR_YR, CDR_DT.CDR_MTH
)
DATA INITIALLY DEFERRED REFRESH DEFERRED
ENABLE QUERY OPTIMIZATION MAINTAINED BY SYSTEM
COMPRESS YES
DISTRIBUTE BY HASH(SLVC_RPT_AST_CGY_ID)
PARTITION BY RANGE (TM_DIMENSION_ID)
(
PART PAST STARTING(MINVALUE)
ENDING('2010-01-01') EXCLUSIVE IN TS_PD_MQT_QRT_INV,
PART PART_2010_JAN STARTING('2010-01-01')
ENDING('2010-02-01') EXCLUSIVE IN TS_PD_MQT_QRT_INV,
PART PART_2010_FEB STARTING ('2010-02-01')
ENDING('2010-03-01') EXCLUSIVE IN TS_PD_MQT_QRT_INV,
..
Query for report example 2: Asset valuations drill-down by dimension filter
This is a query similar to that used by the second IBM Cognos report and submitted to
DB2:
SELECT
FNC_AST.AST_NM as "Asset Name",
FNC_AST.DSC as "Asset Description",
SLVC_AST_VAL_FCT.SLVC_RPT_AST_CGY_ID AS "CIC",
SLVC_RPT_AST_CGY.TP AS "ID Code Type",
SLVC_RPT_AST_CGY.NM AS "SII Category Name",
"Asset Custody Country".CNTRY AS "Country of Custody",
"Counterparty".AGRM_REFR AS "Counterparty Issuer Name",
CR_RTG.EXT_REFR AS "External rating",
SLVC_AST_VAL_FCT.UNIT_PRC
FROM
SLVC_AST_VAL_FCT, FNC_AST, SLVC_RPT_AST_CGY, FNC_SERVICES_RL AS
"Counterparty", GEO_AREA AS "Asset Custody Country", CR_RTG ,
CDR_DT
WHERE
SLVC_AST_VAL_FCT.SLVC_RPT_AST_CGY_ID =
SLVC_RPT_AST_CGY.SLVC_RPT_AST_CGY_ID AND
SLVC_AST_VAL_FCT.ISSUR_OF_INV_ID =
"Counterparty".FNC_SERVICES_RL_ID AND
SLVC_AST_VAL_FCT.CNTRY_OF_CUSTODY_OF_AST_ID =
"Asset Custody Country".GEO_AREA_ID AND
SLVC_AST_VAL_FCT.CR_RTG_ID = CR_RTG.CR_RTG_ID AND
SLVC_AST_VAL_FCT.FNC_AST_ID = FNC_AST.FNC_AST_ID AND
SLVC_AST_VAL_FCT.TM_DIMENSION_ID = CDR_DT.CDR_DT AND
SLVC_RPT_AST_CGY.NM = 'Government bonds' AND
"Asset Custody Country".CNTRY = 'El Salvador' AND
CR_RTG.EXT_REFR = 'BB' AND
CDR_DT.CDR_YR = 2011 AND CDR_DT.CDR_MTH = 10;
Appendix C. Sample Solvency II IBM Cognos report
The IBM Cognos report presented here is the report referenced in this paper.
Figure 5. Sample IBM Cognos report for sample parent report
Further reading
Governing and managing enterprise models:
http://www.ibm.com/developerworks/rational/library/10/governingandmanagingenterprisemodelsseries/index.html
DB2 for Linux, UNIX, and Windows best practices:
http://www.ibm.com/developerworks/data/bestpractices/db2luw/
Best practices for DB2 data warehouse environments:
http://www.ibm.com/developerworks/data/bestpractices/db2luw/#analytics
Best practices: Temporal data management with DB2:
http://www.ibm.com/developerworks/data/bestpractices/temporal/index.html
Best practices: DB2 V10.1 multi-temperature data management:
http://www.ibm.com/developerworks/data/library/long/dm-1205multitemp/index.html
Best practices: Storage optimization with deep compression:
http://www.ibm.com/developerworks/data/bestpractices/deepcompression/index.html
Scoping the IBM Industry Model for banking using Enterprise Model Extender
and InfoSphere Data Architect:
http://www.ibm.com/developerworks/data/tutorials/dm-1003bankindustrymodel/
IBM PureData System for Operational Analytics:
http://www-01.ibm.com/software/data/puredata/analytics/operational/
Contributors
Pat G O'Sullivan
Senior Technical Staff Member, Industry Models Architecture, IBM Dublin Lab
Richard Lubell
Information Development for DB2 appliances and warehousing, IBM Dublin Lab
Paul O'Sullivan
Insurance data modeling architect, IBM Dublin Lab
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other
countries. Consult your local IBM representative for information on the products and services
currently available in your area. Any reference to an IBM product, program, or service is not
intended to state or imply that only that IBM product, program, or service may be used. Any
functionally equivalent product, program, or service that does not infringe any IBM
intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in
this document. The furnishing of this document does not grant you any license to these
patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
Armonk, NY 10504-1785
U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where
such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES
CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do
not allow disclaimer of express or implied warranties in certain transactions, therefore, this
statement may not apply to you.
Without limiting the above disclaimers, IBM provides no representations or warranties
regarding the accuracy, reliability or serviceability of any information or recommendations
provided in this publication, or with respect to any results that may be obtained by the use of
the information or observance of any recommendations provided herein. The information
contained in this document has not been submitted to any formal IBM test and is distributed
AS IS. The use of this information or the implementation of any recommendations or
techniques herein is a customer responsibility and depends on the customer's ability to
evaluate and integrate them into the customer's operational environment. While each item
may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee
that the same or similar results will be obtained elsewhere. Anyone attempting to adapt
these techniques to their own environment does so at their own risk.
This document and the information contained herein may be used solely in connection with
the IBM products discussed in this document.
This information could include technical inaccuracies or typographical errors. Changes are
periodically made to the information herein; these changes will be incorporated in new
editions of the publication. IBM may make improvements and/or changes in the product(s)
and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only
and do not in any manner serve as an endorsement of those websites. The materials at those
websites are not part of the materials for this IBM product and use of those websites is at your
own risk.
IBM may use or distribute any of the information you supply in any way it believes
appropriate without incurring any obligation to you.
Any performance data contained herein was determined in a controlled environment.
Therefore, the results obtained in other operating environments may vary significantly. Some
measurements may have been made on development-level systems and there is no
guarantee that these measurements will be the same on generally available systems.
Furthermore, some measurements may have been estimated through extrapolation. Actual
results may vary. Users of this document should verify the applicable data for their specific
environment.
Information concerning non-IBM products was obtained from the suppliers of those products,
their published announcements or other publicly available sources. IBM has not tested those
products and cannot confirm the accuracy of performance, compatibility or any other
claims related to non-IBM products. Questions on the capabilities of non-IBM products should
be addressed to the suppliers of those products.
All statements regarding IBM's future direction or intent are subject to change or withdrawal
without notice, and represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To
illustrate them as completely as possible, the examples include the names of individuals,
companies, brands, and products. All of these names are fictitious and any similarity to the
names and addresses used by an actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE: Copyright IBM Corporation 2012. All Rights Reserved.
This information contains sample application programs in source language, which illustrate
programming techniques on various operating platforms. You may copy, modify, and
distribute these sample programs in any form without payment to IBM, for the purposes of
developing, using, marketing or distributing application programs conforming to the
application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions.
IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both. If these and
other IBM trademarked terms are marked on their first occurrence in this information with a
trademark symbol (® or ™), these symbols indicate U.S. registered or common law
trademarks owned by IBM at the time this information was published. Such trademarks may
also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at "Copyright and trademark information" at
www.ibm.com/legal/copytrade.shtml
Windows is a trademark of Microsoft Corporation in the United States, other countries, or
both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.