
WHITE PAPER ON TERADATA PERFORMANCE TUNING

PERFORMANCE TUNING IN TERADATA


PREFACE
1. Performance Tuning Overview
2. Teradata Performance Tuning - Basic Tips
3. Collection of Statistics Using SAMPLE
4. Advantages of Collecting Stats Using SAMPLE
5. Performance Tuning Tips - Examples
6. Performance Improvement Tips
7. Teradata SQL Query Optimization
8. Conclusion

Performance Tuning Overview:


Teradata Corporation is an American computer company that sells database software for data warehouses and analytic applications, including Big Data. Its products are meant to consolidate data from different sources and make the data available for analysis. This paper walks you through the process of achieving good performance improvement and shows how the system can be fine-tuned using the features of Teradata.

Teradata Performance Tuning - Basic Tips


Performance tuning rules of thumb: here are the very basic steps used to performance-tune (PT) any given query in a given environment.

As a prerequisite, make sure that:
- The user has proper SELECT rights and the actual profile settings
- Enough space is available to run and test the queries

1. Run the explain plan (press F6, or prefix the query with EXPLAIN, e.g. "EXPLAIN SEL * ...").
Then look for potentially problematic information such as:
- No or low confidence
- Product join conditions
- "By way of an all-row scan" (FTS, full table scan)
- Translate
Also check for:
- DISTINCT or GROUP BY keywords in the SQL query
- IN / NOT IN keywords, and the list of values generated for them

APPROACHES

A. In case of product join scenarios, check for:
- Proper usage of aliases
- Joining on matching columns
- Usage of join keywords, such as specifying the type of join (e.g. inner or outer)
- Use of UNION in case of "OR" scenarios

Ensure statistics are collected on join columns; this is especially important if the columns you are joining on are not unique.

B. Collect stats
- Run the command "DIAGNOSTIC HELPSTATS ON FOR SESSION;"
- Gather information on the columns on which stats have to be collected
- Collect stats on the suggested columns
- Also check for stats missing on PI, SI or columns used in joins: "HELP STATS <databasename>.<tablename>"
- Make sure stats are re-collected when at least 10% of the data changes
- Remove unwanted stats, or stats which hardly improve the performance of the queries
- Collect stats on columns instead of indexes, since dropping an index drops its stats as well
- Collect stats on indexes having multiple columns; this might be helpful when these columns are used in join conditions
- Check that stats are re-created for tables whose structures have changed

C. Full table scan scenarios
- Try to avoid FTS scenarios, as it might take a very long time to access all the data in every AMP in the system
- Make sure an SI is defined on the columns which are used as part of joins or as an alternate access path
- Collect stats on SI columns, else there are chances that the optimizer might go for an FTS even when an SI is defined on that particular column

2. If intermediate tables are used to store results, make sure that they have the same PI as the source and destination tables.
3. Tune to get the optimizer to join on the Primary Index of the largest table, when possible, to ensure that the large table is not redistributed across the AMPs.
4. For a large list of values, avoid using IN / NOT IN in SQL. Write the large list of values to a temporary table and use this table in the query.
5. Be careful about when to use EXISTS / NOT EXISTS conditions, since they ignore unknown comparisons (e.g. a NULL value in the column results in unknown). This can lead to inconsistent results.

6. Inner vs. Outer Joins
Check which join works efficiently in the given scenario. Some examples are:
- Outer joins can be used in case of a large table joining with small tables (like a fact table joining with a dimension table based on a reference column)
- Inner joins can be used when we get only the actual data and no extra data is loaded into spool for processing
Please note for outer join conditions:
1. The filter condition for the inner table should be present in the "ON" condition.
2. The filter condition for the outer table should be present in the "WHERE" condition.
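The note on outer join filter placement is easy to get wrong, so here is a minimal sketch of it. It uses Python's built-in sqlite3 module purely as a stand-in, since the ON versus WHERE behaviour shown is standard SQL and not specific to Teradata; the tables fact and dim and their columns are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact (sale_id INTEGER, region_id INTEGER, amount INTEGER);
    CREATE TABLE dim  (region_id INTEGER, region_name TEXT, active INTEGER);
    INSERT INTO fact VALUES (1, 10, 100), (2, 20, 200), (3, 30, 300);
    INSERT INTO dim  VALUES (10, 'North', 1), (20, 'South', 0);
""")

# Filter on the inner (right-hand) table placed in ON:
# every fact row survives, unmatched ones get NULL for the dim columns.
filter_in_on = conn.execute("""
    SELECT f.sale_id, d.region_name
    FROM fact f
    LEFT OUTER JOIN dim d
      ON f.region_id = d.region_id AND d.active = 1
    ORDER BY f.sale_id
""").fetchall()

# Same filter moved to WHERE: rows where d.active is NULL or 0 are discarded
# after the join, so the outer join quietly behaves like an inner join.
filter_in_where = conn.execute("""
    SELECT f.sale_id, d.region_name
    FROM fact f
    LEFT OUTER JOIN dim d
      ON f.region_id = d.region_id
    WHERE d.active = 1
    ORDER BY f.sale_id
""").fetchall()

# Filter on the outer (left-hand) table belongs in WHERE:
# it simply restricts which fact rows are reported.
outer_filter_in_where = conn.execute("""
    SELECT f.sale_id, d.region_name
    FROM fact f
    LEFT OUTER JOIN dim d
      ON f.region_id = d.region_id AND d.active = 1
    WHERE f.amount >= 200
    ORDER BY f.sale_id
""").fetchall()

print(filter_in_on)
print(filter_in_where)
print(outer_filter_in_where)
```

Comparing the first two result sets on your own tables is a quick way to confirm whether a filter was put in the place you intended.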

COLLECTION OF STATISTICS USING SAMPLE:


Here is the step-by-step process for collecting statistics using SAMPLE. SAMPLE statistics are collected using random AMP sampling, and they are recommended when we do not have stats collected on an index or set of columns. As a preparation step, we check whether the table is suitable for SAMPLE STATISTICS collection using the following query:

    /* Suggested when data skew is low and the table has more rows than the system has AMPs */
    SEL TAB1.A AS TABLECOUNT,
        TAB2.B AS AMPCOUNT,
        CASE WHEN TABLECOUNT > AMPCOUNT
             THEN 'RANDOM AMP SAMPLING CAN BE SUGGESTED'
             ELSE 'RANDOM AMP SAMPLING NOT NEEDED'
        END
    FROM (SEL COUNT(*) AS A FROM TABLENAME) TAB1,
         (SEL HASHAMP() + 1 AS B) TAB2;

Below is the step-by-step process for collecting statistics using SAMPLING.

1. Check whether stats are already collected on the column of the table for which random AMP sampling is being considered:

    HELP STATISTICS ON <YOUR_DB>.<YOUR_TB>;

If YES, the situation is tricky; whether you still want to try SAMPLING or look for other recommendations is up to you.

2. If NO, check whether the column is highly skewed using the following query:

    SELECT HASHAMP (HASHBUCKET (HASHROW (<YOUR_COLUMN>))), COUNT (*)
    FROM <YOUR_DB>.<YOUR_TB>
    GROUP BY 1;

Check whether the data is equally distributed among all the AMPs (a variance of +-5% is accepted). If there is a large amount of data skew on one AMP, then SAMPLING is not a good option.

3. If you do not find data skew on any particular AMP, run sample statistics on the column of the table as follows:

    COLLECT STATISTICS ON <YOUR_DB>.<YOUR_TB> COLUMN (<YOUR_COLUMN>) USING SAMPLE;

4. Check the performance of the query after running sample stats, and also note the time taken for collecting the sample stats.
5. If not satisfied with the performance, try to run full statistics on the columns, and measure the performance and the time taken to collect full stats.
6. Decide which is the best option, "FULL STATS" or "SAMPLE", considering factors like:
- Performance
- Time taken for statistics collection
- Table size
- Data skew
- Frequency of the table being loaded
- How many times this table would be used in your environment

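The skew check in step 2 can be turned into a small helper once the per-AMP counts from the HASHAMP query are in hand. The sketch below is one possible reading of the "+-5%" tolerance, namely the deviation of each AMP's row count from the mean; the function name amp_skew_report and the dictionary input format are assumptions made for this example, not part of any Teradata tool.

```python
def amp_skew_report(rows_per_amp, tolerance_pct=5.0):
    """Summarise skew from {amp_number: row_count}, e.g. the rows returned by
    the HASHAMP(HASHBUCKET(HASHROW(col))) ... GROUP BY 1 query.

    Note: AMPs holding zero rows do not appear in that query's output, so add
    them with a count of 0 if you know the total number of AMPs.
    """
    if not rows_per_amp:
        raise ValueError("no AMP counts supplied")
    mean = sum(rows_per_amp.values()) / len(rows_per_amp)
    if mean == 0:
        raise ValueError("table is empty")
    # Signed deviation of each AMP from the mean, in percent.
    deviations = {
        amp: (count - mean) / mean * 100.0
        for amp, count in rows_per_amp.items()
    }
    worst_amp = max(deviations, key=lambda amp: abs(deviations[amp]))
    return {
        "mean_rows": mean,
        "worst_amp": worst_amp,
        "worst_deviation_pct": deviations[worst_amp],
        # Assumed rule: sampling is reasonable only if every AMP is within tolerance.
        "sampling_ok": abs(deviations[worst_amp]) <= tolerance_pct,
    }


even = amp_skew_report({0: 1030, 1: 980, 2: 990, 3: 1000})
skewed = amp_skew_report({0: 4000, 1: 1000, 2: 1000, 3: 2000})
print(even)
print(skewed)
```

A stricter or looser tolerance can be passed through tolerance_pct if +-5% does not suit your tables.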
Advantages of Collecting Stats using SAMPLE:


1. Only a sample of the table rows is scanned, the default being 2%. It is based on a random AMP sampling estimate of total rows. If you want to override the default value for a particular session, use:

    DIAGNOSTIC "COLLECTSTATS, SAMPLESIZE=n" ON FOR SESSION;

2. It uses less CPU and I/O resources compared to full statistics, hence saving a considerable amount of time and resources.

It is not recommended for:
1. Columns which are not indexed
2. Indexes which have a lot of duplicates or non-unique combinations
3. Small tables like dimension/key tables
4. Tables that have greater data skew

Please note that sample statistics cannot be collected on:
1. Global temporary tables
2. Join indexes

Performance Tuning Tips : Inlist vs Between


There are a lot of questions going around among people who do tuning. Sometimes they suggest using temporary tables instead of large in-list values, since the optimizer would otherwise expand the list into value = XXX or value = XXY or value = XXZ when generating the explain plan. What if the column compared against the large in-list was part of an index, say a PI, SI or JI? Sometimes it so happens that even after using a temp table with the list of values you still get the same performance issue. Why so? Did you ever consider using the "between" clause to check whether the query performs better?

I would say give it a try to see if this is a better option compared to the standard "temp table" against the in-list. Say, for example:

    SELECT customer_number, customer_name
    FROM customer
    WHERE customer_number IN (1000, 1001, 1002, 1003, 1004);

can be much less efficient than

    SELECT customer_number, customer_name
    FROM customer
    WHERE customer_number BETWEEN 1000 AND 1004;

We are assuming that there is an index on customer_number; the Query Optimizer can then locate a range of numbers (using BETWEEN) much faster than it can find a series of numbers using the IN clause. Check the explain plan to compare the difference. If it is still the same, refreshing/collecting stats on the index column would help. If you still find some kind of issue, try to find out the skewness of the column using the following query and try to rectify the issue:

    SEL HASHAMP(HASHBUCKET(HASHROW(customer_number))), COUNT(*)
    FROM customer
    GROUP BY 1;

Overall, we can say that trying the query with BETWEEN can be of great help. But query tuning is such a complex thing that you will never get what you want unless you understand the data.
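When the in-list is generated by a program, the BETWEEN idea above can be applied mechanically. Here is a minimal sketch that collapses runs of consecutive integers into ranges and builds the predicate text; the helper names collapse_to_ranges and build_predicate are invented for this example, and it only makes sense for integer keys.

```python
def collapse_to_ranges(values):
    """Group integers into inclusive (low, high) runs of consecutive values."""
    ordered = sorted({int(v) for v in values})
    ranges = []
    for value in ordered:
        if ranges and value == ranges[-1][1] + 1:
            # Extend the current run.
            ranges[-1] = (ranges[-1][0], value)
        else:
            # Start a new run.
            ranges.append((value, value))
    return ranges


def build_predicate(column, values):
    """Return a WHERE-clause fragment using BETWEEN for every consecutive run."""
    ranges = collapse_to_ranges(values)
    if not ranges:
        raise ValueError("empty value list")
    parts = []
    for low, high in ranges:
        if low == high:
            parts.append(f"{column} = {low}")
        else:
            parts.append(f"{column} BETWEEN {low} AND {high}")
    return " OR ".join(parts)


print(build_predicate("customer_number", [1000, 1001, 1002, 1003, 1004]))
print(build_predicate("customer_number", [1004, 1000, 1001, 1010]))
```

Whether the rewritten predicate is actually faster still has to be confirmed from the explain plan, as described above.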

Performance Tuning Tips : Join Considerations



If you are writing queries, working on performance, or helping to improve performance, you will have to take some time to go through this topic. It is all about joins, which are the most important concern in Teradata. If some attention is given to the following suggestions, most join-related issues can be taken care of.

Tip 1: Joining on PI / NUPI / non-PI columns
We should make sure the join is happening on columns that make up a UPI/NUPI. But why? Whenever we join two tables on common columns, the optimizer will try to bring data from both tables into a common spool space and join them to get results. But getting data from both tables into a common spool has overhead. What if I joined a very large table with a small table? Should the small table be redistributed, or the large table? Should the small table be duplicated across all the AMPs? Should both tables be redistributed across all the AMPs?

Here are some basic rules of thumb on joining on index columns, so that joining happens faster.

Case 1 - PI = PI joins
There is no redistribution of data across AMPs. AMP-local joins happen, as the data is present on the same AMP and need not be redistributed. These types of joins on a unique primary index are very fast.

Case 2 - PI = non-PI column joins
- Data from the second table will be redistributed on all AMPs, since the join is happening on a PI vs. NUPI column. The ideal scenario is when the small table is redistributed to be joined with the large table's records on the same AMP.
- Alternatively, data in the small table is duplicated to every AMP, where it is joined locally with the large table.

Case 3 - No index = non-PI column joins
Data from both tables is redistributed on all AMPs. This is one of the longest-running kinds of query; care should be taken to see that stats are collected on these columns.

Tip 2: The columns that are part of the join must be of the same data type (CHAR, INTEGER, ...). But why?

When trying to join columns from two tables, the optimizer makes sure that the data type is the same, or else it will translate the column in the driving table to match that of the derived table. Say, for example:

    TABLE employee  deptno (CHAR)
    TABLE dept      deptno (INTEGER)

If I am joining the employee table with dept on employee.deptno (CHAR) = dept.deptno (INTEGER), the optimizer will convert the character column to integer, resulting in translation. What would happen if the employee table had 100 million records and every deptno had to undergo translation? We have to avoid such scenarios, since translation is a cost factor and needs time and system resources. Make sure you are joining columns that have the same data types to avoid translation.

Tip 3: Do not use functions like SUBSTR, COALESCE, CASE ... on the indexes used as part of the join. Why?
It is recommended not to use functions such as SUBSTR, COALESCE, CASE and others on join columns, since they add to the cost and result in performance issues. The optimizer will not be able to use stats on columns that are wrapped in functions. This might result in product joins and spool-out issues, and the optimizer will not be able to make good decisions since no stats/demographics are available for the expression. It might assume the column has 100 values instead of 1 million values and redistribute on the wrong assumption, directly impacting performance.

Tip 4: Use NOT NULL wherever possible
Did someone say NOT NULL? Yes, we have to make sure to use NOT NULL for columns which are declared as NULLABLE in the table definition. The reason is that all the NULL values might get sent to one poor AMP, resulting in the infamous "NO SPOOL SPACE" error, as that AMP cannot accommodate any more NULL values. So remember to use NOT NULL in joins so that table skew can be avoided.

Since V2R5, Teradata automatically adds the condition "IS NOT NULL" to the query. Still, it is better to make sure that NULL values of the join columns are not included as part of the join.

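To see why NULL join keys cause the skew described in Tip 4, the following sketch simulates row distribution across AMPs. It is only an illustration: amp_for uses SHA-256 as a stand-in and is NOT Teradata's real row hash, and the helper names and the 8-AMP system are assumptions. The point it shows is generic: identical keys (here, all the NULLs) always hash to the same AMP.

```python
import hashlib
from collections import Counter


def amp_for(value, amp_count):
    """Map a join key to an AMP number (stand-in hash, not Teradata's)."""
    # All NULLs share one sentinel key, so they all land on the same AMP.
    key = b"\x00NULL" if value is None else str(value).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % amp_count


def rows_per_amp(join_keys, amp_count):
    """Count how many rows each AMP would receive after redistribution."""
    counts = Counter(amp_for(key, amp_count) for key in join_keys)
    return [counts.get(amp, 0) for amp in range(amp_count)]


AMPS = 8
keys_with_nulls = list(range(10_000)) + [None] * 10_000

# Redistributing on the raw join column: one AMP receives every NULL row.
with_nulls = rows_per_amp(keys_with_nulls, AMPS)

# Filtering NULLs out first (the effect of "col IS NOT NULL") evens things out.
without_nulls = rows_per_amp(
    [key for key in keys_with_nulls if key is not None], AMPS
)

print("with NULLs:   ", with_nulls)
print("without NULLs:", without_nulls)
```

In the first distribution one AMP holds more than half of all rows, which is exactly the situation that leads to spool space errors on a real system.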
Performance Tuning Tips : Locking Table for Access


We will have come across this statement in many queries which are run in sensitive environments like PROD and UAT. It can be used with views, or sometimes just for querying purposes. I wanted to discuss how important this statement is in real-time/active data warehouses, where a lot of users will be firing queries at the same database at the same time.

    CREATE VIEW Employee.view_employ_withLock AS
    LOCKING TABLE Employee.Dept_emp FOR ACCESS
    SELECT * FROM Employee.Dept_emp;

By using locking table for access, we make sure that only an "access" lock is applied on the table that is required to fetch the results. By doing so, there is no waiting for other locks to be released, since an access lock can be applied on a table which already has a read/write lock applied to it. This lets the query execute even when some other lock is applied, but the data accessed using this lock might not be consistent, as it might result in a dirty read due to a concurrent write on the same table.

It is always suggested to use "locking table for access", since it will not block other users from applying read/write locks on the table.

Performance Tuning Tips : LIKE Clause


While tuning queries in Teradata, we take care of the major performance issues but ignore small cases which might still cause the query to perform badly. I wanted to mention one such case, the LIKE clause, which many people who are good at performance tuning miss, assuming LIKE patterns do not harm performance. In reality this is not so.


If LIKE is used in a WHERE clause, it is better to use one or more leading characters in the pattern, if at all possible. For example, LIKE '%STRING%' will be processed differently compared to LIKE 'STRING%'.

If a leading character is used at the beginning of the LIKE pattern, as in 'STRING%', then the optimizer can make use of an index to run the query, thereby improving performance. But if the leading character, as in '%STRING%', is a wildcard (say '%'), then the optimizer will not be able to use an index, and a full table scan (FTS) must be run, which reduces performance and takes more time.

Hence it is suggested to go for '%STRING%' only if STRING is part of a larger pattern, say 'SUBSTRING'.
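The reason a leading wildcard defeats an index can be sketched with a plain sorted list standing in for a value-ordered index. This is a generic illustration, not a model of Teradata's internal index structures; the function names are invented for the example, and the upper-bound trick assumes the stored values do not themselves contain the character U+10FFFF.

```python
import bisect


def prefix_search(sorted_names, prefix):
    """LIKE 'prefix%': binary-search the ordered entries for the matching slice.

    Returns (hits, entries_read); the O(log n) binary-search probes are
    not counted in entries_read.
    """
    low = bisect.bisect_left(sorted_names, prefix)
    # Smallest string that sorts after everything starting with the prefix.
    high = bisect.bisect_left(sorted_names, prefix + "\U0010ffff")
    hits = sorted_names[low:high]
    return hits, len(hits)


def substring_search(sorted_names, fragment):
    """LIKE '%fragment%': ordering does not help, every entry is inspected."""
    hits = [name for name in sorted_names if fragment in name]
    return hits, len(sorted_names)


names = sorted(["STRING1", "STRINGS", "SUBSTRING", "ASTRING", "ZEBRA"])

prefix_hits, prefix_read = prefix_search(names, "STRING")
substring_hits, substring_read = substring_search(names, "STRING")

print(prefix_hits, prefix_read)
print(substring_hits, substring_read)
```

With a known prefix, only the matching slice is read; with a leading wildcard, the whole list has to be walked, which is the full table scan case described above.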

Performance Tuning Tips : Some More Tips


When it comes to performance tuning, we cannot stick to a fixed set of rules. It varies based on the data you are dealing with. We can, however, create a baseline and address issues based on the scenarios we face on a day-to-day basis.

1. Utilizing Teradata's parallel architecture: If you understand what happens in the background, you will be able to make your query work at its best. So, try to run the explain plan on your query before executing it and see how the PE (Parsing Engine) has planned to execute it. Understand the keywords in the explain plan. That topic deserves a more detailed write-up of its own; for now, let us go on with the highlights.

2. Understanding resource consumption: The resources that you consume can be directly related to dollars. Be aware of and frugal about the resources you use. The following are the factors you need to know and check from time to time:
a. CPU consumption
b. Parallel efficiency / hot AMP percentage
c. Spool usage

3. Help the parser: Since the architecture has been made to be intelligent, we have to give it some respect. You can help the parser understand the data you are dealing with by collecting statistics. But you need to be careful when you do so, for 2 reasons:
- Incorrect stats are worse than not collecting stats, so make sure your stats are not stale (old).
- If the dataset in your table changes rapidly, and you are dealing with a lot of data, then collecting stats itself might be resource consuming. So, based on how frequently your table will be accessed, you will have to make the call.

4. Since the same SQL can be written in different ways, you will have to know which method is better than which. For example, creating a volatile table vs. a global temporary table vs. a working table. One cannot directly point out which is the best; each has its pros and cons, which need to be compared for the case at hand.

5. Take a step back and look at the whole process. Consider how much data you need to keep, how critical it is for your business to get the data soon, and how frequently you need to run your SQL. Most of the time, the "big picture" will give you a lot of answers.
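The stale-statistics concern in item 3 can be reduced to a simple periodic check. The sketch below applies the 10% data-change threshold recommended elsewhere in this paper; the function names and the (table, rows at last collect, rows changed since) tuples are assumptions for illustration, and where those numbers come from (load logs, row counts kept by the ETL job) is left to your environment.

```python
def stats_are_stale(rows_at_last_collect, rows_changed_since, threshold_pct=10.0):
    """True when the changed rows reach threshold_pct of the table size
    recorded at the last statistics collection."""
    if rows_at_last_collect <= 0:
        # Table was empty when stats were collected: any change makes them stale.
        return rows_changed_since > 0
    # Cross-multiplied to avoid floating-point surprises at the boundary.
    return rows_changed_since * 100 >= threshold_pct * rows_at_last_collect


def tables_to_recollect(tables, threshold_pct=10.0):
    """tables: iterable of (name, rows_at_last_collect, rows_changed_since).

    Returns the names of stale tables, most heavily changed first.
    """
    stale = [
        (changed / max(rows, 1), name)
        for name, rows, changed in tables
        if stats_are_stale(rows, changed, threshold_pct)
    ]
    return [name for _, name in sorted(stale, reverse=True)]


candidates = [
    ("sales", 1_000_000, 150_000),   # 15% changed
    ("dim_region", 500, 10),         # 2% changed
    ("orders", 200_000, 20_000),     # exactly 10% changed
]
print(tables_to_recollect(candidates))
```

For very large, rapidly changing tables the cost of recollecting still has to be weighed against how often the table is queried, as noted in item 3.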

Performance Improvement Tips:


This section covers the key points to be considered while analyzing/improving Teradata ETL performance. These points generally refer to the periodic or housekeeping activities that should be carried out regularly to avoid database performance implications.

Key points:
1. Stat Collection
2. Pack Disk
3. Data Skew Analysis
4. Lock Monitoring (Locking Logger)
5. Session Tuning (DBS Control parameters)

1). Stat Collection: Statistics on the loaded tables are very important from the performance perspective. Statistics help the optimizer to generate accurate and faster query plans. Old statistics in the warehouse can lead to wrong query plans, which may take more time in query processing. Hence it is necessary to refresh the statistics periodically, at least in the warehousing environment. We can identify the tables which are frequently accessed and modified with inserts, updates, deletes etc. It is recommended to refresh the stats after every 10% of data change. We can collect the statistics at column level or at index level.

Syntax:

    COLLECT STATISTICS ON <table_name> COLUMN (column_name 1, .., column_name n);
    OR
    COLLECT STATISTICS ON <table_name> INDEX (column_name 1, .., column_name n);

2). Pack Disk: Pack disk is a utility that frees up cylinder space on the database. This utility must be run periodically, because in the warehouse environment a large number of inserts and updates take place, and this frequent data manipulation leaves the physical storage disordered. The pack disk utility allows us to restructure and physically reorder the data and free up space, much like defragmentation. Teradata also runs mini CYLPACKs automatically if cylinder space goes below the prescribed limit. Cylinder space is required for the merge operation during data inserts, deletes, updates etc.

To run pack disk we use the Ferret utility provided by Teradata, which can be run through the Teradata Manager tool or through a telnet session on a node. The set of commands that starts the packdisk utility is given below; one can create a cron job to schedule it and run it periodically.

Commands to run the defrag and packdisk utilities:

    ferret
    defrag
    Y
    packdisk fsp=8
    Y

3). Skew Analysis: The primary index of a table in Teradata is responsible for the data distribution across all the AMPs. Proper data distribution is required for parallel processing in the system. As a Teradata system follows a shared-nothing architecture, all the AMPs work in parallel. If data is evenly distributed amongst the AMPs, then the amount of work done by every AMP is equal and the time required for a particular job is lower. In contrast, if only one or two AMPs are flooded with the data (i.e. data skew), then while running that job those AMPs will be working and the others will be idle. In this case we are not utilizing the parallel processing power of the system.

To avoid such data skew, we need to analyze the primary index of the tables in the Teradata database. Over a period of time, it might happen that data accumulates on a few AMPs, which can have an adverse effect on the ETL as well as on system performance. To analyze the data distribution for a table we can use the built-in HASH functions provided by Teradata. To check the data distribution for a table, one can use the query:

    SELECT HASHAMP (HASHBUCKET (HASHROW (column 1, .., column n))) AS AMP_NUM, COUNT(*)
    FROM Table_Name
    GROUP BY 1;

This query provides the distribution of records on each AMP. We can also evaluate candidate PIs with this query, which will predict the data distribution across the AMPs.

4). Lock Monitoring: Locking Logger is a utility that enables us to monitor the locking on tables. Using this utility we can create a table that holds entries for the locks which have been applied to the tables during processing. This utility allows us to analyze the regular ETL process and the jobs being blocked at particular times when there is no one to monitor the locking. By analyzing such locking situations we can modify the jobs and avoid the waiting period they cause.

To use the locking logger, we first need to enable it via the DBS console window or the cnsterm subsystem. The setting does not take effect until the database is restarted.

LockLogger - This field defines the system default for the locking logger. It allows the DBA to log the delays caused by database locks, to help in identifying lock conflicts. To enable this feature set the field to TRUE. To disable the feature set the field to FALSE.

After a database restart with the LockLogger flag set to TRUE, the Locking Logger will begin to accumulate lock information in a circular memory buffer of 64KB. Depending on how frequently the system encounters lock contention, this buffer will wrap, but it will usually span a period of several days.
Following a period of lock contention, to analyze the lock activity, you need to run the dumplocklog utility, which moves the data from the memory buffer to a database table where it can be accessed.

5). Session Tuning: Session tuning is done for running the load utilities in parallel. This requires analyzing some DBS Control parameters and tuning them to provide the best parallel processing of the load utilities. There are two parameters, MaxLoadAWT and MaxLoadTasks, that enable parallel job management. A short note on them follows.

The MaxLoadAWT internal field serves two purposes:
1) Enabling a higher limit for the MaxLoadTasks field beyond the default limit of 15.
2) Specifying the AMP Worker Task (AWT) limit for concurrent FastLoad and MultiLoad jobs when a higher limit is enabled.

In effect, this field allows more FastLoad, MultiLoad, and FastExport utilities to run concurrently while controlling AWT usage and preventing excessive consumption and possible AWT exhaustion. The default value is zero.

When MaxLoadAWT is zero:
- The concurrency limit operates in the same manner as prior to V2R6.1.
- MaxLoadTasks specifies the concurrency limit for all three utilities: FastLoad, MultiLoad, and FastExport.
- The valid range for MaxLoadTasks is from 0 to 15.

When MaxLoadAWT is non-zero (higher limit enabled):
- It specifies the maximum number of AWTs that can be used by FastLoads and MultiLoads. The maximum allowable value is 60% of the total AWTs.
- The valid range for MaxLoadTasks is from 0 to 30.
- A new FastLoad/MultiLoad job is allowed to start only if BOTH the MaxLoadTasks AND MaxLoadAWT limits have not been reached. Therefore, jobs may be rejected before the MaxLoadTasks limit is exceeded.
- MaxLoadTasks specifies the concurrency limit for the combination of only two utilities: FastLoad and MultiLoad.
- FastExport is managed differently; it is no longer controlled by the MaxLoadTasks field. A FastExport job is only rejected if the total number of active utility jobs is 60. At least 30 FastExport jobs can run at any time. A FastExport job may be able to run even when FastLoad and MultiLoad jobs are rejected.

When a Teradata Dynamic Workload Manager (TDWM) utility throttle rule is enabled, the MaxLoadAWT field is overridden. TDWM will use the highest allowable value, which is 60% of total AWTs.

An update to MaxLoadAWT becomes effective after the DBS Control record has been written. No DBS restart is required. Note that when the total number of AWTs (specified by the internal field MaxAMPWorkerTasks) has been modified but a DBS restart has not occurred, there may be a discrepancy between the actual number of AWTs and the DBS Control record. The system may internally reduce the effective value of MaxLoadAWT to prevent AWT exhaustion.

AWT usage of load utilities: All load/unload utilities require and consume AWTs at different rates depending on the execution phase.

FastLoad:
* Phase 1 (Loading): 3 AWTs
* Phase 2 (End Loading): 1 AWT
MultiLoad*:
* Acquisition Phase (and before): 2 AWTs
* Application Phase (and after): 1 AWT
FastExport:
* All phases

* This description is for the single target table case, which is the most common.

The parameters explained above can be analyzed and tuned accordingly to achieve the expected performance on the Teradata system. We also need to have some maintenance/housekeeping activities in place to avoid the performance implications of physical data parameters such as data skew, low cylinder space etc.
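The "both limits must hold" admission rule for FastLoad and MultiLoad jobs can be sketched as follows. This is a simplified model for reasoning about settings, not how the database implements it: it charges every running job at its peak AWT usage from the list above (3 for FastLoad, 2 for MultiLoad), ignores FastExport entirely, and the names LoadLimits and can_start are made up for the example.

```python
from dataclasses import dataclass

# Peak AWTs per job, taken from the phase figures above
# (FastLoad loading phase: 3, MultiLoad acquisition phase: 2).
PEAK_AWTS = {"fastload": 3, "multiload": 2}


@dataclass
class LoadLimits:
    max_load_tasks: int
    max_load_awt: int  # 0 means the higher limit is not enabled


def can_start(utility, active_jobs, limits):
    """Decide whether a new FastLoad/MultiLoad job may start.

    active_jobs: list of utility names currently running,
                 each either 'fastload' or 'multiload'.
    """
    if utility not in PEAK_AWTS:
        raise ValueError(f"unsupported utility: {utility}")
    # Limit 1: number of concurrent load jobs.
    if len(active_jobs) >= limits.max_load_tasks:
        return False
    # With MaxLoadAWT at zero only the task count applies.
    if limits.max_load_awt == 0:
        return True
    # Limit 2: AWTs consumed by load jobs (simplified: every job at its peak).
    awts_in_use = sum(PEAK_AWTS[job] for job in active_jobs)
    return awts_in_use + PEAK_AWTS[utility] <= limits.max_load_awt


limits = LoadLimits(max_load_tasks=30, max_load_awt=8)
running = ["fastload", "fastload"]  # 6 AWTs in use under this model

print(can_start("multiload", running, limits))  # fits within 8 AWTs
print(can_start("fastload", running, limits))   # would need 9 AWTs
```

The second call illustrates the point made above: a job can be rejected on the AWT limit well before the MaxLoadTasks limit is reached.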

Teradata SQL Query Optimization:


SQL and Indexes :

1) Primary indexes: Use primary indexes for joins whenever possible, and specify in the WHERE clause all the columns of the primary indexes.

2) Secondary indexes (10% rule rumor): The optimizer does not actually use a 10% rule to determine whether a secondary index will be used. But this is a good estimate: if less than 10% of a table will be accessed when the secondary index is used, then assume the SQL will use the secondary index. Otherwise, the SQL execution will do a full table scan. The optimizer actually uses a "least cost" method: it determines whether the cost of using a secondary index is cheaper than the cost of doing a full table scan. The cost involves the CPU usage and disk I/O counts.

3) Constants: Use constants to specify index column contents whenever possible, instead of specifying the constant once and joining the tables. This may provide a small saving in performance.

4) Mathematical operations: Mathematical operations are faster than string operations (i.e. concatenation), if both can achieve the same result.

5) Variable length columns: The use of variable length columns should be minimized, and should be by exception. Fixed length columns should always be used to define tables.

6) Union: The "union" command can be used to break up a large SQL process or statement into several smaller SQL processes or statements, which would run in parallel. But these could then cause spool space limit problems. "Union all" executes the SQL statements single threaded.

7) Where in / where not in (subquery): The SQL "where in" is more efficient than the SQL "where not in". It is more efficient to specify constants in these, but if a subquery is specified, then the subquery has a direct impact on the SQL time. If there is a SQL time problem with the subquery, then the subquery could be separated from the original query. This would require 2 SQL statements and an intermediate table. The 2 SQL statements would be:
1) A new SQL statement, which does the previous subquery function and inserts into the temporary table, and
2) The modified original SQL statement, which no longer has the subquery and reads the temporary table.

8) Strategic semicolon: At the end of every SQL statement there is a semicolon. In some cases, the strategic placement of this semicolon can improve the SQL time of a group of SQL statements. But this will not improve an individual SQL statement's time. These are a couple of cases:
1) The group's SQL time could be improved if a group of SQL statements share the same tables (or spool files),
2) The group's SQL time could be improved if several SQL statements use the same unix input file.
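Besides speed, "where not in" with a subquery has a correctness trap that ties back to the earlier remark about unknown comparisons: a single NULL returned by the subquery makes every NOT IN comparison unknown. The sketch below demonstrates this with Python's built-in sqlite3 module as a stand-in, since the three-valued logic involved is standard SQL; the tables customer and blocked are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_number INTEGER);
    CREATE TABLE blocked  (customer_number INTEGER);
    INSERT INTO customer VALUES (1000), (1001), (1002);
    INSERT INTO blocked  VALUES (1001), (NULL);
""")

# NOT IN: "1000 <> NULL" is unknown, so no row can ever qualify.
not_in = conn.execute("""
    SELECT customer_number
    FROM customer
    WHERE customer_number NOT IN (SELECT customer_number FROM blocked)
    ORDER BY 1
""").fetchall()

# NOT EXISTS: the NULL row never matches the correlation, so it is ignored.
not_exists = conn.execute("""
    SELECT c.customer_number
    FROM customer c
    WHERE NOT EXISTS (
        SELECT 1 FROM blocked b
        WHERE b.customer_number = c.customer_number
    )
    ORDER BY 1
""").fetchall()

print(not_in)
print(not_exists)
```

If the subquery column is nullable, either filter the NULLs out in the subquery or decide deliberately which of the two behaviours you want.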

Reducing Large SQL's :


The following methods can be used to scope down the size of SQL statements.

1) Table denormalization: Duplicating data in another table. This provides faster access to the duplicated data, but requires more update time.

2) Table summarization: The data from one/many table(s) is summarized into commonly used summary tables. This provides faster access to the summarized data, but requires more update time.

3) SQL union: The DBC/SQL union can be used to break up a large SQL process or statement into several smaller SQL processes or statements, which would run in parallel.

4) Unix split: A large input unix file could be split into several smaller unix files, which could then be input in series, or in parallel, to create smaller SQL processing steps.

5) Unix concatenation: A large query could be broken up into smaller independent queries, whose output is written to several smaller unix files. These smaller files are then concatenated together to provide a single unix file.

6) Trigger tables: A group of tables, each containing a subset of the keys of the index of an original table. The tables could be created based on some value in the index of the original table. This provides the ability to break up a large SQL statement into multiple smaller SQL statements, but creating the trigger tables requires more update time.

7) Sorts (order by): Although sorts take time, these are always done at the end of the query, and the sort time is directly dependent on the size of the solution. Unnecessary sorts could be eliminated.

8) Export/Load: Table data could be exported (BTEQ, FastExport) to a unix file, updated, and then reloaded into the table (BTEQ, FastLoad, MultiLoad).

9) C programs / unix scripts: Some data manipulation is very difficult and time consuming in SQL. It could be replaced with C programs/unix scripts. See the "C/Embedded SQL" tip.

Reducing Table Update Time :


1) Table update time can be improved by dropping the table's indexes first, and then doing the updates. After the completion of the updates, rebuild the indexes and recollect the table's statistics on the indexes. The best improvement is obtained when the volume of table updates is large in relation to the size of the table, i.e. if more than 5% of a large table is changed.

2) Try to avoid dropping a table; instead, delete its rows. Table-related statements (i.e. create table, drop table) are single threaded through a system permissions table and become a bottleneck. They can also cause deadlocks on the dictionary tables. Also, any user permissions specific to the table are dropped when the table is dropped, and these permissions must be recreated.

Conclusion:
Teradata is a system which can process complex queries very fast. The Teradata database is linearly scalable: we can expand the database capacity by just adding more nodes to the existing database. If the data volume grows, we can add more hardware and expand the database capacity. Teradata has an extensive parallel processing capacity; it can handle multiple ad hoc requests and many concurrent users. The Teradata database has a shared-nothing architecture. It has high fault tolerance and data protection. Another advantage is the uniform distribution of data through the unique primary indexes without any overhead. The performance is just amazing for huge data. Teradata is excellent at handling huge data.
