
Oracle RAC instances are composed of the following background processes:

ACMS (11g) - Atomic Controlfile to Memory Service (ACMS)
GTX0-j (11g) - Global Transaction Process
LMON - Global Enqueue Service Monitor
LMD - Global Enqueue Service Daemon
LMS - Global Cache Service Process
LCK0 - Instance Enqueue Process
DIAG - Diagnosability Daemon
RMSn - Oracle RAC Management Processes (RMSn)
RSMN - Remote Slave Monitor
DBRM - Database Resource Manager (from 11g R2)
PING - Response Time Agent (from 11g R2)
Oracle Real Application Clusters New Features
Oracle 9i RAC
OPS (Oracle Parallel Server) was renamed RAC
CFS (Cluster File System) was supported
OCFS (Oracle Cluster File System) for Linux and Windows
watchdog timer replaced by hangcheck timer
Oracle 10g R1 RAC
Cluster Manager replaced by CRS
ASM introduced
Concept of Services expanded
ocrcheck introduced
ocrdump introduced
AWR was instance specific
Oracle 10g R2 RAC
CRS was renamed Clusterware
asmcmd introduced
CLUVFY introduced
OCR and voting disks can be mirrored
Can use FAN/FCF with TAF for OCI and ODP.NET
Oracle 11g R1 RAC
1. Oracle 11g RAC parallel upgrades - Oracle 11g has rolling upgrade features whereby a
RAC database can be upgraded without any downtime.
2. Hot patching - zero-downtime patch application.
3. Oracle RAC load balancing advisor - starting from 10g R2 we have the RAC load balancing
advisor utility. The 11g RAC load balancing advisor is only available with clients that use
.NET, ODBC, or the Oracle Call Interface (OCI).
4. ADDM for RAC - Oracle has incorporated RAC into the Automatic Database Diagnostic
Monitor, for cross-node advisories. The script addmrpt.sql reports on a single
instance and will not report on all instances in RAC; this is known as instance ADDM. Using
the new package DBMS_ADDM, we can generate a report for all instances of the RAC; this is
known as database ADDM.
5. Optimized RAC cache fusion protocols - moves on from the general cache fusion
protocols in 10g to deal with specific scenarios where the protocols could be further
optimized.
6. Oracle 11g RAC grid provisioning - the Oracle grid control provisioning pack allows us to
"blow out" a RAC node without the time-consuming install, using a pre-installed
"footprint".
Oracle 11g R2 RAC
1. We can store everything on the ASM; OCR and voting files can also be stored on the ASM.
2. ASMCA introduced.
3. Single Client Access Name (SCAN) - eliminates the need to change the tns entry when
nodes are added to or removed from the cluster. RAC instances register to SCAN
listeners as remote listeners. SCAN is a fully qualified name. Oracle recommends
assigning 3 addresses to SCAN, which creates three SCAN listeners.
4. AWR is consolidated for the database.
5. 11g Release 2 Real Application Clusters (RAC) has server pooling technologies, so it's
easier to provision and manage database grids. This update is geared toward
dynamically adjusting servers as corporations manage the ebb and flow between data
requirements for data warehousing and applications.
6. By default, LOAD_BALANCE is ON.
7. GSD (Global Service Daemon), gsdctl introduced.
8. GPnP profile.
9. Oracle RAC One Node is a new option that makes it easier to consolidate databases that
aren't mission critical but need redundancy.
10. raconeinit - to convert a database to RAC One Node.
11. raconefix - to fix a RAC One Node database in case of failure.
12. racone2rac - to convert RAC One Node back to RAC.
13. Oracle Restart - the feature of Oracle Grid Infrastructure's High Availability Services
(HAS) to manage associated listeners, ASM instances and Oracle instances.
14. Oracle Omotion - Oracle 11g Release 2 RAC introduces a new feature called Oracle
Omotion, an online migration utility. The Omotion utility relocates the instance from
one node to another whenever an instance failure happens.
15. The Omotion utility uses the Database Area Network (DAN) to move Oracle instances. Database
Area Network (DAN) technology helps seamless database relocation without losing
transactions.
16. Cluster Time Synchronization Service (CTSS) is a new feature in Oracle 11g R2
RAC, which is used to synchronize time across the nodes of the cluster. CTSS can serve as a
replacement for the NTP protocol.
17. Grid Naming Service (GNS) is a new service introduced in Oracle RAC 11g R2. With GNS,
Oracle Clusterware (CRS) can manage Dynamic Host Configuration Protocol (DHCP) and
DNS services for dynamic node registration and configuration.
18. Oracle Local Registry (OLR) - from Oracle 11g R2, the Oracle Local Registry (OLR) is
new as part of Oracle Clusterware. The OLR is a node's local repository, similar to the
OCR (but local), and is managed by OHASD. It contains data for the local node only and is not
shared among other nodes.
19. Multicasting is introduced in 11g R2 for private interconnect traffic.
20. I/O fencing prevents updates by failed instances, detecting failure and preventing
split brain in the cluster. When a cluster node fails, the failed node needs to be fenced off
from all the shared disk devices or diskgroups. This methodology is called I/O fencing,
sometimes called disk fencing or failure fencing.
21. Reboot-less node fencing (restart) - instead of fast-rebooting the node, a graceful
shutdown of the stack is attempted.
22. Virtual Oracle 11g RAC cluster - Oracle 11g RAC supports virtualization.
SPLIT BRAIN CONDITION AND IO FENCING MECHANISM IN ORACLE CLUSTERWARE
Oracle Clusterware provides the mechanisms to monitor cluster operation and detect
some potential issues with the cluster. One particular scenario that needs to be prevented is
called the split brain condition. A split brain condition occurs when a single cluster node has a failure
that results in reconfiguration of the cluster into multiple partitions, with each partition forming its
own sub-cluster without knowledge of the existence of the others. This would lead to collision and
corruption of shared data, as each sub-cluster assumes ownership of the shared data [1]. For a
cluster database like an Oracle RAC database, data corruption is a serious issue that has to be
prevented at all times. Oracle Clusterware's solution to the split brain condition is to provide IO
fencing: if a cluster node fails, Oracle Clusterware ensures the failed node is
fenced off from all IO operations on the shared storage. One IO fencing method is
called STOMITH, which stands for Shoot the Other Machine in the Head.
In this method, once a potential split brain condition is detected, Oracle Clusterware
automatically picks a cluster node as a victim to reboot, to avoid data corruption. This process is
called node eviction. DBAs and system administrators need to understand how this IO fencing
mechanism works and learn how to troubleshoot clusterware problems. When they
experience a cluster node reboot event, DBAs and system administrators need to be able to
analyze the events and identify the root cause of the clusterware failure.
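The victim-selection step above can be sketched as a small model. This is purely illustrative, not Oracle's implementation; the rule assumed here (the larger sub-cluster survives, and a tie goes to the sub-cluster containing the lowest-numbered node) is the commonly described default behavior, and the function names are hypothetical:

```python
# Illustrative sketch of split-brain victim selection (not Oracle's code).
# Assumed rule: the largest partition survives; on a tie, the partition
# containing the lowest node number survives. Nodes in losing partitions
# are evicted (rebooted) so their I/O is fenced off the shared storage.

def surviving_partition(partitions):
    """partitions: list of sets of node numbers, one set per sub-cluster."""
    # Prefer the largest partition; break ties by lowest member node number.
    return max(partitions, key=lambda p: (len(p), -min(p)))

def evicted_nodes(partitions):
    """Return the sorted node numbers that would be evicted."""
    survivor = surviving_partition(partitions)
    return sorted(n for p in partitions if p is not survivor for n in p)
```

For example, if the interconnect splits an 5-node cluster into {1, 2, 3} and {4, 5}, the two-node partition loses and nodes 4 and 5 are evicted.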
Oracle Clusterware uses two Cluster Synchronization Service (CSS) heartbeats:
1. the network heartbeat (NHB) and
2. the disk heartbeat (DHB)
and two CSS misscount values associated with these heartbeats to detect potential
split brain conditions.
The network heartbeat crosses the private interconnect to establish and confirm valid
node membership in the cluster. The disk heartbeat is between the cluster node and the voting
disk on the shared storage. Each heartbeat has its own maximal value in
seconds, called the CSS misscount, within which the heartbeat must complete; otherwise a node
eviction will be triggered.
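The misscount check just described can be sketched as follows. This is a hypothetical helper, not Oracle code; it only reproduces the arithmetic behind the "at N% heartbeat fatal, eviction in S seconds" countdown style seen in ocssd.log warnings:

```python
# Sketch of the CSS misscount check (illustrative, not Oracle's code).
# A node is evicted once its heartbeat has been missing for longer than the
# CSS misscount window; before that, the remaining time can be reported as
# a countdown.

def eviction_countdown(misscount_s, missed_s):
    """misscount_s: CSS misscount in seconds (e.g. 60 for the network
    heartbeat on 10g Linux); missed_s: seconds since the last heartbeat.
    Returns (percent of window missed, seconds until eviction, evict?)."""
    pct_missed = 100.0 * missed_s / misscount_s
    remaining = misscount_s - missed_s
    evict = remaining <= 0
    return pct_missed, max(remaining, 0.0), evict
```

With a 60-second misscount, a node that has missed heartbeats for 30.28 seconds is about 50% of the way to eviction, with roughly 29.72 seconds remaining, matching the shape of the warnings in the case study below.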
The CSS misscount for the network heartbeat has the following default values,
depending on the version of Oracle Clusterware and the operating system:
OS        10g (R1 & R2)   11g
Linux     60              30
Unix      30              30
VMS       30              30
Windows   30              30
The CSS misscount for the disk heartbeat also varies with the version of Oracle
Clusterware. For Oracle 10.2.1 and up, the default value is 200 seconds.
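The defaults above can be captured in a small lookup table. The values are exactly those tabulated in this document; the dictionary and helper names are hypothetical:

```python
# Default CSS misscount values (seconds), as tabulated in this document.
# Illustrative lookup helper; names are hypothetical.

NETWORK_MISSCOUNT_S = {
    ("linux", "10g"): 60, ("linux", "11g"): 30,
    ("unix", "10g"): 30, ("unix", "11g"): 30,
    ("vms", "10g"): 30, ("vms", "11g"): 30,
    ("windows", "10g"): 30, ("windows", "11g"): 30,
}

DISK_MISSCOUNT_S = 200  # default for Oracle 10.2.1 and up, per the text

def network_misscount(os_name, version):
    """Return the default network-heartbeat misscount in seconds."""
    return NETWORK_MISSCOUNT_S[(os_name.lower(), version)]
```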
NODE EVICTION DIAGNOSIS CASE STUDY
When a node eviction occurs, Oracle Clusterware usually records error messages into various log
files. These log files provide the evidence and the starting points for DBAs and system
administrators to troubleshoot. The following case study illustrates a troubleshooting
process based on a node eviction which occurred in an 11-node 10g RAC production database. The
symptom was that node 7 of that cluster was automatically rebooted around 11:15am. The
troubleshooting started with examining the syslog file /var/log/messages, which contained the following
error messages:
Jul 23 11:15:23 racdb7 logger: Oracle clsomon failed with fatal status 12.
Jul 23 11:15:23 racdb7 logger: Oracle CSSD failure 13.
Jul 23 11:15:23 racdb7 logger: Oracle CRS failure. Rebooting for cluster integrity.
Then the OCSSD logfile at $CRS_HOME/log/<hostname>/cssd/ocssd.log was examined; it contained
the following error
messages, which showed that node 7's network heartbeat didn't complete within the 60-second
CSS misscount and triggered a node eviction event:
[CSSD]2008-07-23 11:14:49.150 [119961800] >WARNING:
clssnmPollingThread: node racdb7 (7) at 50% heartbeat fatal, eviction in 29.720 seconds
...
clssnmPollingThread: node racdb7 (7) at 90% heartbeat fatal, eviction in 0.550 seconds
...
[CSSD]2008-07-23 11:15:19.079 [1220598112] >TRACE:
clssnmDoSyncUpdate: Terminating node 7, racdb7, misstime(60200) state(3)
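Warnings in this format can be extracted mechanically when scanning a large ocssd.log. A hedged sketch: the regex assumes the "node <name> (<n>) at <pct>% heartbeat fatal, eviction in <s> seconds" message layout shown in the excerpt above, and the function name is hypothetical:

```python
# Sketch: pull the eviction countdown out of ocssd.log clssnmPollingThread
# warnings of the form shown in this case study. Illustrative only; the
# exact message format can vary between Clusterware versions.
import re

PATTERN = re.compile(
    r"clssnmPollingThread: node (?P<node>\S+) \((?P<num>\d+)\) "
    r"at (?P<pct>\d+)% heartbeat fatal, eviction in (?P<secs>[\d.]+) seconds"
)

def parse_polling_warning(line):
    """Return (node_name, node_number, percent, seconds_left) or None."""
    m = PATTERN.search(line)
    if not m:
        return None
    return (m.group("node"), int(m.group("num")),
            int(m.group("pct")), float(m.group("secs")))
```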
CRS REBOOT TROUBLESHOOTING PROCEDURE
Besides node evictions caused by the failure of the network heartbeat or disk heartbeat,
other events may also cause a CRS node reboot. Oracle Clusterware provides several processes to
monitor the operation of the clusterware. When certain conditions occur, to protect data
integrity, these monitoring processes may automatically kill the clusterware or even reboot the node,
and leave some critical error messages in their log files. The following lists the roles of these
clusterware processes in a server reboot and where their logs are located.
Three of the clusterware processes, OCSSD, OPROCD and OCLSOMON, can initiate a CRS reboot
when they run into certain errors:
1. OCSSD (CSS daemon) monitors inter-node health, such as the interconnect and the
membership of the cluster nodes. Its log file is located at
$CRS_HOME/log/<host>/cssd/ocssd.log
2. OPROCD (Oracle Process Monitor Daemon), introduced in 10.2.0.4, detects hardware
and driver freezes that would result in node eviction, then kills the node to prevent any IO
from accessing the shared disk. Its log file is /etc/oracle/oprocd/<hostname>.oprocd.log
3. OCLSOMON monitors the CSS daemon for hangs or scheduling issues. It may
reboot the node if it sees a potential hang. Its log file is
$CRS_HOME/log/<host>/cssd/oclsomon/oclsmon.log
And one of the most important log files is the syslog file. On Linux, the syslog file is
/var/log/messages.
The CRS reboot troubleshooting procedure starts with reviewing the various log files to identify
which of the three processes above contributed to the node reboot, and then isolates the root cause of
that process's reboot. Figure 6 illustrates the CRS reboot
troubleshooting flowchart.
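The first step of that procedure, scanning the syslog for the markers each process leaves, can be sketched as follows. The marker strings come from the sample syslog messages earlier in this case study; treat them as illustrative, not an exhaustive list, and the names as hypothetical:

```python
# Sketch of the first triage step: scan syslog-style lines for markers left
# by the clusterware monitoring processes, to decide which log file to read
# next. Marker strings follow the sample messages in this case study.

MARKERS = {
    "clsomon failed": "OCLSOMON",  # CSS-monitor hang detection
    "CSSD failure": "OCSSD",       # CSS daemon / heartbeat problem
    "CRS failure": "CRS stack",    # generic clusterware failure / reboot
}

def classify_syslog(lines):
    """Return (suspected_source, line) pairs for every marker hit."""
    hits = []
    for line in lines:
        for marker, source in MARKERS.items():
            if marker in line:
                hits.append((source, line.strip()))
    return hits
```

Feeding in the /var/log/messages excerpt from this case study would flag the clsomon and CSSD lines, pointing the investigation at the OCLSOMON and OCSSD logs.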
