There are several VERITAS products that all work together when running a VERITAS
cluster.
Normally, at boot-up after a crash, a system needs to manually check the integrity of
all its filesystems. With the VERITAS File System, the system replays the journal log
and then comes up. This can save 30-60 minutes on large filesystems.
Generally, maintenance of the VERITAS File System is done with VERITAS Volume
Manager.
--------------------------------------------------------------------------------
Veritas Volume Manager Overview
Veritas Volume Manager divides disks into disk groups and partitions these groups as
desired. There is a nice GUI which helps a lot. You can even pull up a command
window to see what the GUI is running. The newest version of the GUI is vmsa.
General Commands:
Veritas Volume Manager licenses info:
/usr/sbin/vxlicense -p
What volume groups exist:
vxdg list
Import a volume group (see details on cluster debugging):
vxdg import diskgroup
Detailed volume/plex configuration:
vxprint -ht
What is veritas doing (if another command is running and hanging):
vxtask list
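As a sketch, you can pull just the group names out of vxdg list output like this. The sample output below is hand-made and its exact layout is an assumption; on a live host, pipe the real vxdg list in instead:

```shell
# Extract disk-group names from `vxdg list`-style output.
# The sample text is hard-coded so this runs anywhere (assumption:
# a NAME/STATE/ID header followed by one row per group).
vxdg_output='NAME         STATE           ID
rootdg       enabled  958123456.1025.gedb001
oradg        enabled  958123457.1026.gedb001'

# Skip the header line and keep the first column (the group name).
groups=$(printf '%s\n' "$vxdg_output" | awk 'NR > 1 { print $1 }')
printf '%s\n' "$groups"
```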
--------------------------------------------------------------------------------
Veritas Cluster Overview
Veritas Cluster enables one system to failover to the other system. All related
software processes are simply moved from one system to the other system with
minimal downtime.
Veritas Cluster does NOT have both boxes up at once servicing requests. It only
offers a hot standby system. This enables the system to keep running (with a short
transfer period) if a machine fails or system maintenance needs to be done.
Overview * Cluster Startup Processes * File Locations * Changing Configurations
If any of the critical processes fail, the whole system is faulted. The most common
reason for failing is expired licenses, so check licenses before doing work with
vxlicense -p.
File Locations (Logs, Conf, Executables)
Look at the most recent log files for debugging purposes (ls -ltr).
Conf files:
ALWAYS be very careful when changing the cluster configurations. The only time I
needed to change the cluster configuration was when Vipul upgraded Oracle versions
and ORACLE_HOME changed directories. This is a very dangerous thing to do.
There are two ways of changing the configurations. Use the first method if the
system is up (the cluster is running on at least one node, preferably on both):
haconf -makerw
run the needed commands (e.g. hasys ...)
haconf -dump -makero
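The make-rw / change / make-ro sequence can be wrapped in a small script so the config is never left writable. This is only a sketch -- the haconf and hagrp functions below are stubs that echo their arguments so it runs anywhere; on a cluster node you would delete the stubs and call the real binaries:

```shell
# Stub functions stand in for the real VCS binaries (assumption:
# you are not on a cluster node). Delete these on a real system.
haconf() { echo "haconf $*"; }
hagrp()  { echo "hagrp $*"; }

change_config() {
    # Open the cluster configuration read-write.
    haconf -makerw || return 1
    # ... run the needed hagrp/hares/hasys commands here ...
    hagrp -modify oragrp AutoStartList system1 system2
    # Dump the new config to disk and make it read-only again.
    haconf -dump -makero
}

change_config
```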
If both systems are down:
--------------------------------------------------------------------------------
This is the MOST important part of the veritas cluster installation process. If you skip
a step - you pay for it later!
Machines racked
Machines jumpstarted or equivalent (last 4 e3500's needed to be done by hand)
On one machine set scsi-initiator to 7
Have array installed, verify disks can be seen on both machines
Additional patches for veritas installed (download latest sun recommended patches)
-- these are not in jumpstart!
Send off for veritas license -- if not there in time, get temp license
Have software ready for install
Hardware meets specifications
Veritas checklist should be filled out & ready to go
Change internet/network link to qfe1; cross-over cables on hme0 & qfe0 (REMOVE
/etc/hostname.hme0 and /etc/hostname.qfe0)
/etc/init.d/volmgt start
insert cdrom for db edition 2.1 [or higher]
cd /cdrom/cdrom0
./installDBED [answer default/yes to all EXCEPT say no to single / group ]
[may need to change cdroms here - not sure]
installvx
remove cdrom
vxdiskadm -- encapsulate root - specify your 2 root drives
process will reboot twice
create mount points for your db
Once done, on ONE machine:
mount disks
add to vfstab
/etc/init.d/volmgt start
insert cluster server cd
cd /cdrom/cdrom0
pkgadd -d . (add packages 3,2,5,1,4,6; yes to everything)
eject cdrom; mount oracle cluster agent
cd /cdrom/cdrom0
pkgadd -d .
eject cdrom
cd /opt/VRTSllt
cp llttab /etc
cd /etc
vi /etc/llttab -- uncomment/change the following:
set-node 0 [on one machine set to 1]
set-cluster 0
link hme0 & qfe0
link-lowpri qfe1
start (at bottom)
vi /etc/gabtab
uncomment gabconfig -c -n 2
cd /etc/rc2.d
start llt and gab on both machines
/sbin/lltconfig -a list [check for all 3 interfaces]
gabconfig -a [check for membership]
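A quick sanity check of the gabconfig -a output can be scripted. Port a is GAB itself and port h is the VCS engine; "membership 01" means nodes 0 and 1 are both members. The sample output below is an assumption based on a healthy two-node cluster -- on a live host, pipe the real gabconfig -a in:

```shell
# Canned `gabconfig -a` output for a healthy two-node cluster
# (assumption: this general layout; generation numbers are made up).
gab_output='GAB Port Memberships
===============================================
Port a gen   a36e0003 membership 01
Port h gen   fd570002 membership 01'

# Require both node 0 and node 1 on both port a and port h.
ok=yes
for port in a h; do
    printf '%s\n' "$gab_output" |
        grep -q "^Port $port .*membership 01" || ok=no
done
echo "both ports see both nodes: $ok"
```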
add /sbin and /opt/VRTSvcs/bin to the PATH in /.profile
On ONE machine:
mkdir /etc/VRTSvcs/conf/config
cd /etc/VRTSvcs/conf
cp *.cf config
cd sample-oracle
cp main.cf ../config [may need to copy other files - check]
cd ../config
vi main.cf
update systema and systemb
update SystemList and AutoStartList
add diskgroups
IP qfe1 nic-qfe1
add mountpoints
update oracle info
build dependencies
listener
oracle
mount
volumes
diskgroup
vip
nic
hacf -verify .
manually stop listener [lsnrctl stop]
manually stop db [svrmgrl; connect internal ; shutdown immediate]
take mountpoints out of vfstab
hastart [start cluster]
hagrp -switch oragrp -to systemb [test switchover]
run veritas testing on both machines
Veritas Cluster Server is a high-availability server. This means that processes switch
between servers when a server fails. All database processes are run through this
server -- and as such, this needs to run smoothly. Note that the oracle process should
only actually be running on the server which is active. On monitoring tools, the procs
light for whichever box is secondary should be yellow, because oracle is not running
there; yet the cluster is running on both systems.
/opt/VRTSvcs/bin/hastatus -summary
This will give the general status of each machine and processes
/opt/VRTSvcs/bin/hares -display
This gives much more details - down to the resource level.
If hastatus fails on both machines (it returns that the cluster is not up or returns
nothing), try to start the cluster
/opt/VRTSvcs/bin/hastart
/opt/VRTSvcs/bin/hastatus -summary
will tell you if processes started properly. It will NOT start processes on a FAULTED
system.
Starting Single System NOT Faulted
If the system is NOT FAULTED and only one system is up, the cluster probably needs
to have gabconfig manually started. Do this by running:
/sbin/gabconfig -c -x
/opt/VRTSvcs/bin/hastart
/opt/VRTSvcs/bin/hastatus -summary
If the system is faulted, check licenses and clear the faults as described next.
To check licenses:
vxlicense -p
Make sure all licenses are current - and NOT expired! If they are expired, that is your
problem. Call VERITAS to get temporary licenses.
There is a BUG with veritas licenses. Veritas will not run if there are ANY expired
licenses -- even if you have the valid ones you need. To get veritas to run, you will
need to MOVE the expired licenses. [Note: minimally, the VxFS, VxVM and
RAID licenses must NOT be expired, from what I understand.]
vxlicense -p
Note the NUMBER after the license (ie: Feature name: DATABASE_EDITION [100])
cd /etc/vx/elm
mkdir old
mv lic.<number> old [do this for all expired licenses]
vxlicense -p [Make sure there are no expired licenses AND your good licenses are
there]
hastart
If it still fails, call Veritas for temp licenses. Otherwise, be certain to do the same on
your second machine.
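Here is a sketch of the move-expired-licenses step. It works on a throwaway temp directory standing in for /etc/vx/elm, and the expired/current statuses are hard-coded for the example (on a real system you would read them from vxlicense -p):

```shell
# Temp directory stands in for /etc/vx/elm (assumption: license
# files there are named lic.<number>, per the notes above).
licdir=$(mktemp -d)
mkdir "$licdir/old"
touch "$licdir/lic.100" "$licdir/lic.200"

# feature-number:status pairs -- hard-coded for the example;
# assume feature 100 is expired and feature 200 is current.
for entry in 100:expired 200:current; do
    num=${entry%%:*}
    status=${entry#*:}
    if [ "$status" = expired ]; then
        # Move (never delete) the expired license out of the way.
        mv "$licdir/lic.$num" "$licdir/old/"
    fi
done
ls "$licdir"
```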
To clear FAULTS:
hares -display
For each resource that is faulted run:
hares -clear resource-name -sys faulted-system
If all of these clear, then run hastatus -summary and make sure that these are clear.
If some don't clear, you MAY be able to clear them at the group level. Only do this as
a last resort:
hagrp -disableresources groupname
hagrp -flush group -sys sysname
hagrp -enableresources groupname
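The per-resource clearing can be looped. This sketch parses canned hares -display style output (Resource / Attribute / System / Value columns) and issues a clear for each FAULTED entry; hares is stubbed out so the script runs anywhere, and the resource names are made up:

```shell
# Canned `hares -display`-style output (assumption: four columns,
# Resource Attribute System Value, as in the examples above).
display_output='LISTENER  State  gedb001  FAULTED
oravol    State  gedb001  ONLINE
LISTENER  State  gedb002  ONLINE'

# Stub for the real hares binary -- remove on a cluster node.
hares() { echo "hares $*"; }

# For every FAULTED State row, clear that resource on that system.
cleared=$(printf '%s\n' "$display_output" |
    awk '$2 == "State" && $4 == "FAULTED" { print $1, $3 }' |
    while read -r res sys; do
        hares -clear "$res" -sys "$sys"
    done)
printf '%s\n' "$cleared"
```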
Veritas Product Overview
--------------------------------------------------------------------------------
Veritas Software
Home Page
Veritas Software
Technical Services
--------------------------------------------------------------------------------
Veritas Software
Veritas Clusters:
Shared Data Clusters: Scaleable,...
-- SYSTEM STATE
-- System State Frozen
A gedb001 RUNNING 0
A gedb002 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
gedb002# hares -display | grep ONLINE
nic-qfe3 State gedb001 ONLINE
nic-qfe3 State gedb002 ONLINE
Recovery Commands:
hastop -all
on one machine hastart
wait a few minutes
on other machine hastart
Reviewing Log Files:
If you are still having troubles, look at the logs in /var/VRTSvcs/log. Look at the most
recent ones for debugging purposes (ls -ltr). Here is a short description of the logs in
/var/VRTSvcs/log:
By looking at the most recent logs, you can know what failed last (or most recently).
You can also tell what did NOT run, which may be just as much of a clue. Of course, if
none of this helps, open a call with veritas tech support.
If you have tried the previously described debugging methods, call Veritas tech
support: 800-634-4747. Your company needs to have a Veritas support contract.
Restarting Services:
If a system is gracefully shut down while it was running oracle or other high-availability
services, it will NOT transfer them. It only transfers services when the system
crashes or has an error.
hastart
hastatus -summary
will tell you if processes started properly. It will NOT start processes on a FAULTED
system. If the system is faulted, clear the faults as described above.
Doing Maintenance on DBs:
BEFORE working on the DB:
hastart on the same machine as you started the work on (the first one, the system
with oracle running)
wait 3-5 minutes
then run hastart on the other system
If you need the instance to run on the other system, you can run: hagrp -switch
oragrp -to othersystem
Shutting down db machines:
If you shut down the machine that is running the veritas cluster, it will NOT start on the
other machine. It only fails over if the machine crashes. You need to manually switch
the services if you shut down the machine. To switch processes:
hagrp -switch groupname -to othersystem [ie: hagrp -switch oragrp -to systemb]
Then shut down the machine as desired. When rebooted, it will start the cluster
daemon automatically.
Doing Maintenance on Admin Network:
If the admin network is brought down (that the veritas cluster uses), veritas WILL
fault both machines AND bring down oracle (nicely). You will need to do the
following to recover:
hastop -all
On ONE machine: hastart
wait 5 minutes
On other machine: hastart
Manual start/stop WITHOUT veritas cluster:
If possible, use the section on DB Maintenance. Only use this if the system fails on
coming up AND you KNOW that it is due to a db configuration error. If you manually
start up filesystems/oracle -- manually shut them down and restart using hastart
when done.
To startup:
Make sure ONLY the rootdg volume group is active on BOTH NODEs. This is EXTREMELY
important: if the oracle disk group is active on both nodes, corruption occurs. [ie.
oradg or xxoradg is NOT present]
vxdg list
hastatus (stop it on both, as you are faulted on both machines)
hastop -all (if either was active, make sure you are truly shut down!)
Once you have confirmed that the oracle datagroup is not active, on ONE machine
do the following:
vxdg import oradg [this may be xxoradg where xx is the client 2 char code]
vxvol -g oradg startall
To shutdown:
umount /mountpoint [foreach mountpoint]
vxdg deport oradg
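The "only rootdg active" check above is worth scripting so the import refuses to run when the oracle disk group is already present. A sketch, with vxdg list stubbed out so it runs anywhere (drop the stub and use the real command on a live node):

```shell
# Stub for `vxdg list` showing only rootdg active (assumption:
# NAME/STATE/ID header layout). Replace with the real command.
vxdg_list() {
    echo 'NAME     STATE    ID'
    echo 'rootdg   enabled  958123456.1025.gedb001'
}

# Only import oradg if it is NOT already active on this node --
# importing a disk group that is active on both nodes corrupts it.
if vxdg_list | awk 'NR > 1 { print $1 }' | grep -qx oradg; then
    echo 'oradg already active -- do NOT import it again'
    result=refused
else
    # Safe to proceed: vxdg import oradg && vxvol -g oradg startall
    result=imported
fi
echo "$result"
```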
--------------------------------------------------------------------------------
If any licenses are not valid or expired -- get them FIXED before continuing! All
licenses should say "No expiration". If ANY license has an actual expiration date, the
test failed. Permanent licenses do NOT have an expiration date. Non-essential
licenses may be moved -- however, a senior admin should do this.
If your lists do NOT contain both systems, you will probably need to modify them
with commands that follow.
more /etc/VRTSvcs/conf/config/main.cf (See if it is reasonable. It is likely that the
systems aren't fully set up)
haconf -makerw (this lets you write the conf file)
hagrp -modify oragrp SystemList system1 0 system2 1
hagrp -modify oragrp AutoStartList system1 system2
haconf -dump -makero (this makes conf file read only again)
hastatus -summary
If this command cannot be found, add the following to root's PATH in /.profile:
vi /.profile
add /opt/VRTSvcs/bin to your PATH variable
If /.profile does not already exist, use this one:
PATH=/usr/bin:/usr/sbin:/usr/ucb:/usr/local/bin:/opt/VRTSvcs/bin:/sbin:$PATH
export PATH
. /.profile
Re-verify that the command now runs if you changed /.profile:
hastatus -summary
Here is the expected result (your SYSTEMs/GROUPs may vary):
One system should be OFFLINE and one system should be ONLINE ie:
# hastatus -summary
Veritas Cluster Debugging
-- SYSTEM STATE
-- System State Frozen
A e4500a RUNNING 0
A e4500b RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
If your systems do not show the above status, try these debugging steps:
If NO systems are up, run hastart on both systems and run hastatus -summary
again.
If only one system is shown, start other system with hastart. Note: one system
should ALWAYS be OFFLINE for the way we configure systems here. (If we ran
oracle parallel server, this could change -- but currently we run standard oracle
server)
If both systems are up but are OFFLINE and hastart did NOT correct the problem
and oracle filesystems are not running on either system, the cluster needs to be
reset. (This happens under strange network situations with GE Access.) [You ran
hastart and that wasn't enough to get the full cluster to work.]
Verify that the systems have the following EXACT status (though your machine
names will vary for other customers):
-- SYSTEM STATE
-- System State Frozen
A gedb001 RUNNING 0
A gedb002 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
Recovery Commands:
hastop -all
on one machine hastart
wait a few minutes
on other machine hastart
hastatus -summary (make sure one is OFFLINE && one is ONLINE)
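The "wait a few minutes" step can be made concrete with a polling loop on hastatus -summary. In this sketch the stub reports STARTING twice and then ONLINE (so the loop can be exercised without a cluster), and the interval is shortened to one second for the same reason:

```shell
# Stub for `hastatus -summary` group-state lines (assumption: the
# group row ends in its state). Flips to ONLINE on the third poll.
hastatus_stub() {
    if [ "$polls" -ge 3 ]; then
        echo 'B  oragrp  gedb001  Y  N  ONLINE'
    else
        echo 'B  oragrp  gedb001  Y  N  STARTING'
    fi
}

# Poll until the group is ONLINE, up to 10 tries, then give up.
state=TIMEOUT
polls=0
i=0
while [ "$i" -lt 10 ]; do
    polls=$((polls + 1))
    if hastatus_stub | grep -q 'ONLINE$'; then
        state=ONLINE
        break
    fi
    sleep 1        # use a longer interval on a real cluster
    i=$((i + 1))
done
echo "group state: $state"
```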
If none of these steps resolved the situation, contact Lorraine or Luke (possibly Russ
Button or Jen Redman if they made it to Veritas Cluster class) or a Veritas
Consultant.
First check if group can switch back and forth. On the system that is running
(system1), switch veritas to other system (system2):
hagrp -switch groupname -to system2 [ie: hagrp -switch oragrp -to e4500b]
Watch failover with hastatus -summary. Once it is failed over, switch it back:
hagrp -switch groupname -to system1
ssh system2
/usr/sbin/shutdown -i6 -g0 -y
Make sure that the system comes up and is running after the reboot. That is, when
the reboot is finished, the second system should say it is offline using hastatus.
hastatus -summary
Once this is done, hagrp -switch groupname -to system2 and repeat the reboot for
the other system:
hagrp -switch groupname -to system2
ssh system1
/usr/sbin/shutdown -i6 -g0 -y
Verify that system1 is in cluster once rebooted
hastatus -summary
On system that is online (should be system2), kill off ORACLE LISTENER Process
You need to CLEAR the fault before trying to fail back over.
hastatus -summary
Now do the same thing on this system... To do this, we will kill off the listener
process, which should force a failover.
hares -display | grep FAULT for the resource that is failed (in this case, LISTENER)
Clear the fault
hares -clear resource-name -sys faulted-system [ie: hares -clear LISTENER -sys
e4500a]
Run:
hastatus -summary
to make sure everything is okay.