Growing a filesystem with EMC PowerPath and LVM
1) Take the outputs before the change. Save the pre-change state to .b4 files for later comparison (the same commands used in step 4, redirected to lsscsi.b4, powermt.b4, emcpower.b4 and lvmdiskscan.b4).
2) Gather HBA information.
# /usr/sbin/hbanyware/hbacmd listhbas | grep "Port WWN" | tee hba_wwpn.txt
3) Identify the new LUNs. Reset each HBA in turn so it rescans for them.
# /usr/sbin/hbanyware/hbacmd reset [HBA_WWPN1]
***Verify that the paths through this HBA are online after the HBA reset before resetting the next HBA***
# powermt display paths
# powermt display dev=all
# /usr/sbin/hbanyware/hbacmd reset [HBA_WWPN2]
***Verify that the paths are online after the HBA reset***
# powermt display paths
# powermt display dev=all
4) Get the new LUNs
# /usr/bin/lsscsi > lsscsi.after
# /sbin/powermt config
# /sbin/powermt save
# /sbin/powermt display dev=all > powermt.after
# /sbin/powermt display dev=all | grep emcpower > emcpower.after
# /usr/sbin/lvmdiskscan > lvmdiskscan.after
# /usr/bin/diff powermt.after powermt.b4
5) Once the new LUNs/emcpower devices are visible, the next task is to add them to the volume group and extend the filesystem.
Take the needed outputs before performing the change.
# vgdisplay -v /dev/<vgname> > vgdisplay.b4
# vgs > vgs.b4
# pvs > pvs.b4
# lvs > lvs.b4
# lvdisplay -v /dev/<vgname>/<lvname> > lvdisplay.b4
6) create the PV and add it to volume group
# pvcreate /dev/emcpowerxx    (use the -f option if it throws an error)
# pvs > pvs.after
# vgextend <vgname> /dev/emcpowerxx
# vgdisplay -v /dev/<vgname>
7) Extend the logical volume and the filesystem
# lvextend /dev/<vgname>/<lvname> /dev/emcpowerxx    (extends the LV by the free space on emcpowerxx)
# lvdisplay -v /dev/<vgname>/<lvname>
# resize2fs /dev/<vgname>/<lvname>
# df -h
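For illustration, here is the same sequence end to end with hypothetical names (/dev/emcpoweraj as the newly presented device, vg_data/lv_data as the volume group and logical volume); substitute the real names from the outputs above:
# pvcreate /dev/emcpoweraj
# vgextend vg_data /dev/emcpoweraj
# lvextend /dev/vg_data/lv_data /dev/emcpoweraj
# resize2fs /dev/vg_data/lv_data
# df -h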
Oracle RAC and node evictions, RCA
Oracle node evictions under Clusterware were quite a common task in the past. They seem to have been reduced a lot in 11gR2; however, we still get eviction-related issues from time to time, and often nothing concrete eventually comes out as an RCA. I have seen both the Oracle 10g and 11g Clusterware have these issues, on Solaris as well as Linux.
Most common reasons:
* Network disconnect between nodes, causing CRS to reboot. In my experience, whenever IPMP has timeouts, mostly probe-based, Oracle CRS evicts the node.
* Disk/path (I/O) timeouts; very rare but possible. Storage/SAN related.
* High resource contention causing the server to hang, which is in turn picked up by CRS, causing node evictions.
The challenge was always to tune the kernel settings properly for Oracle RAC, ensure the OS and Oracle CRS configuration settings are in agreement, configure kdump/dumpadm correctly, and configure CRS to dump core instead of only rebooting.
For the RCA, gather the Oracle RAC logs, OS Watcher data (if enabled) and the core dump, and analyse them to arrive at a root cause.
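A hedged starting point for such an analysis on 11g Clusterware (paths differ between versions; $GRID_HOME stands for the Grid Infrastructure home and <hostname> for the node name):
# $GRID_HOME/bin/crsctl get css misscount      (network heartbeat timeout, in seconds)
# $GRID_HOME/bin/crsctl get css disktimeout    (voting disk I/O timeout, in seconds)
Logs to collect: $GRID_HOME/log/<hostname>/cssd/ocssd.log and $GRID_HOME/log/<hostname>/alert<hostname>.log, plus the OS Watcher archives and any crash dump.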
How to find the global zone name from the local zone
# pkgparam -v SUNWcsr | grep From: | grep -v fern | head -1 | cut -d: -f 5 | awk '{ print $1 }'
Use this from a non-global zone to get the global zone name.
6. Make sure that your virtual keyswitch setting is not in the LOCKED
position
sc> showkeyswitch
If the virtual key switch is in LOCKED position you can change that with
the following command:
sc> setkeyswitch -y normal
7. Flash update the downloaded Sun System Firmware image
sc> flashupdate -s 127.0.0.1
127.0.0.1 is the default address for the local host. When the download
process is finished, ILOM displays the message:
Update complete. Reset device to use new software.
8. Type the resetsc command to reset ILOM
sc> resetsc
9. After the reboot, please check the version of the firmware
sc> showhost
10. Power on the server
sc> poweron
11. Check the console and boot the server
sc> console
{ok} boot
cfgadm: Device being used by VxVM
root@abc>/> cfgadm -c unconfigure c1::dsk/c1t0d0
cfgadm: Component system is busy, try again: failed to offline:
/devices/ssm@0,0/pci@1c,700000/pci@1/SUNW,isptwo@4/sd@0,0
Resource                Information
/dev/dsk/c1t0d0s2       Device being used by VxVM
The cfgadm unconfigure command fails here.
The way to resolve this is to disable the disk's path from DMP control. Since there is only one path to this disk, the -f (force) option needs to be used:
root@abc>/> vxdmpadm -f disable path=c1t0d0s2
root@abc>/> vxdmpadm getsubpaths ctlr=c1
NAME         STATE[A]     PATH-TYPE[M]  DMPNODENAME  ENCLR-TYPE  ENCLR-NAME  ATTRS
====================================================================================
c1t0d0s2     DISABLED     -             c1t0d0s2     Disk        Disk        -
c1t6d0s2     ENABLED(A)   -             c1t6d0s2     Disk        Disk        -
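With the path out of DMP control, the unconfigure can be retried, and the path brought back under DMP once the maintenance is done:
root@abc>/> cfgadm -c unconfigure c1::dsk/c1t0d0
root@abc>/> vxdmpadm enable path=c1t0d0s2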
2. Create a user named admin, and set the admin account roles to aucro and
the CLI mode to alom
-> create /SP/users/admin role=aucro cli_mode=alom
Creating user
Enter new password: ********
Enter new password again: ********
Created /SP/users/admin
3. Log out of the root account after you have finished creating the admin
account
-> exit
4. Log in to the ALOM CLI shell (indicated by the sc> prompt) from the ILOM
login prompt.
XXXXXXXXXXXXXXXXX login: admin
Password:
Waiting for daemons to initialize
Daemons ready
Integrated Lights Out Manager
Version 3.0.x.x
Copyright 2008 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
sc>
Remember that the ALOM CMT compatibility shell is an ILOM interface
Solaris Patching Error Codes
Here is the complete list of Solaris patching error codes:
# 0 No error
# 1 Usage error
# 2 Attempt to apply a patch that's already been applied
# 3 Effective UID is not root
# 4 Attempt to save original files failed
# 5 pkgadd failed
# 6 Patch is obsoleted
# 7 Invalid package directory
# 8 Attempting to patch a package that is not installed
# 9 Cannot access /usr/sbin/pkgadd (client problem)
# 10 Package validation errors
# 11 Error adding patch to root template
# 12 Patch script terminated due to signal
# 13 Symbolic link included in patch
# 14 NOT USED
# 15 The prepatch script had a return code other than 0.
# 16 The postpatch script had a return code other than 0.
# 17 Mismatch of the -d option between a previous patch
# install and the current one.
# 18 Not enough space in the file systems that are targets
# of the patch.
# 19 $SOFTINFO/INST_RELEASE file not found
# 20 A direct instance patch was required but not found
# 21 The required patches have not been installed on the manager
# 22 A progressive instance patch was required but not found
# 23 A restricted patch is already applied to the package
# 24 An incompatible patch is applied
# 25 A required patch is not applied
# 26 The user specified backout data can't be found
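These codes are returned as patchadd's exit status, so a script can branch on them. A minimal hedged sketch (the patch ID, directory and messages are made up for illustration):
patchadd /var/spool/patch/123456-01
rc=$?
case $rc in
  0) echo "patch installed successfully" ;;
  2) echo "patch already applied, skipping" ;;
  8) echo "target package not installed" ;;
  *) echo "patchadd failed with exit code $rc" ;;
esac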
Extending Filesystems in Sun Cluster using VxVM
I did an activity to add LUNs to existing database filesystems running Sun Cluster 3.1 on Solaris 9 with VxVM. The steps I followed are:
a) Get the LUN information from Storage and update the sd.conf
accordingly on both the servers
b) Make the LUNs visible on the server. Run the command on both the nodes
# update_drv -f sd
c) Verify that the LUNs are present: # format
d) Configure emcpower devices. Run on both the servers.
# /etc/powermt config
# /etc/powermt save
e) Create the Sun Cluster DID devices. Run on both nodes if required.
# devfsadm
# scgdevs
f) Verify the Sun Cluster DID devices: # scdidadm -L
g) Add the disks to the Veritas disk group: # vxdiskadm
h) Grow the FS
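The last step is not spelled out in the post; a minimal sketch of growing the volume and its VxFS filesystem together, assuming a hypothetical disk group appdg and volume appvol (vxresize resizes both the volume and the filesystem):
# vxassist -g appdg maxgrow appvol    (check how far the volume can grow)
# /etc/vx/bin/vxresize -g appdg appvol +10g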
Updating Boot device order
You can update the boot device order using the eeprom command.
# eeprom boot-device='vx-rootdisk vx-rootmirror'
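Running eeprom with just the variable name prints the current value, which is a quick way to verify the change:
# eeprom boot-device
boot-device=vx-rootdisk vx-rootmirror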
Found duplicate PV, how to solve them
Today let's see more about LVM. The most common issue is the "Found duplicate PV" warning. How do you solve it?
Well, the message is directly related to the filter you use in
/etc/lvm/lvm.conf.
Whatever devices are included in the filter will have their LVM metadata checked.
For example, a filter like this
filter = [ "a/.*/" ]
will have all devices scanned for LVM metadata.
* When using device-mapper-multipath or other multipath software such as
EMC PowerPath or Hitachi Dynamic Link Manager (HDLM), each path to a
particular logical unit number (LUN) is registered as a different SCSI
device, such as /dev/sdb, /dev/sdc, and so on. The multipath software will
then create a new device that maps to those individual paths, such as
/dev/mapper/mpath1 or /dev/mapper/mpatha for device-mapper-multipath,
/dev/emcpowera for EMC PowerPath, or /dev/sddlmab for Hitachi HDLM. Since
each LUN has multiple device nodes in /dev that point to the same
underlying data, they all contain the same LVM metadata and thus LVM
commands will find the same metadata multiple times and report them as
duplicates.
* This is only a warning message and does not indicate a failure in LVM operation. Rather, the system alerts the administrator that only one of the devices is used as the PV and the rest are being ignored.
To avoid this situation, use a filter that includes only the needed devices.
For example, to allow the internal disks on HP (cciss) controllers and any EMC PowerPath devices, the filter would look like
filter = [ "a|/dev/cciss/.*|", "a|/dev/emcpower.*|", "r|.*|" ]
After applying the filter, remove the stale LVM cache and rescan:
# rm -f /etc/lvm/cache/.cache
# pvscan
# vgscan
# lvscan
Do a # vgs -vv to cross-check as well.
There are scenarios where a previously working filter appears to have failed and you see duplicate entries that should not be visible.
This scenario calls for an investigation of:
* The device names and the filter patterns for those devices in lvm.conf
* Any changes made to lvm.conf without the initrd being rebuilt to reflect them
Rebuild initrd in RHEL
Whenever there is an update to kernel modules or to config files that are needed at boot time, the system admin needs to rebuild the initial ramdisk. The sole purpose of the initramfs is to get the root filesystem mounted so that the transition to the real rootfs can happen.
RHEL prior to version 6 used mkinitrd for the rebuild. With RHEL 6 this has changed to dracut. The dracut tool is built to keep only minimal hardcoded information in the initramfs; dracut's initramfs depends on udev to create symbolic links to device nodes, and when the rootfs's device node appears, it mounts it and switches root to it.
Let's see how to build the initramfs using both options.
mkinitrd
1. Take a backup copy of the initrd before proceeding to rebuild.
# cp -p /boot/initrd-$(uname -r).img /boot/initrd-$(uname -r).img.bak
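2. For the rebuild itself, a minimal sketch assuming the default image names (mkinitrd on RHEL 5 and earlier, dracut on RHEL 6):
# mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)        (RHEL 5)
# dracut -f /boot/initramfs-$(uname -r).img $(uname -r)       (RHEL 6)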
Check whether a zone is global or non-global using pkgcond
Just for scripting or some other remote purpose, you may need to be sure whether the zone you are in is a global or a non-global zone. We have a straightforward solution for this: the command pkgcond.
Just check the usage of this command.
root@solaris:~# uname -a
SunOS solaris 5.11 11.1 i86pc i386 i86pc
root@solaris:~# pkgcond
no condition to check specified; usage is:
pkgcond [-nv] <condition> [ <option> ... ]
command options:
-n negate results of condition test
-v verbose output of condition testing
As it states, it works on a condition, and I assume this was developed for scripting purposes. Let us see some examples.
root@solaris:~# pkgcond is_global_zone
root@solaris:~# echo $?
0
root@solaris:~#
The exit status (echo $?) is 0, indicating the condition is true. The zone from which I executed this command was a global zone. Let us try the same from a non-global zone.
root@solaris:~# pkgcond is_nonglobal_zone
root@solaris:~# echo $?
1
root@solaris:~#
It returns false, so it is not a non-global zone. That's it.
Wait, let me get this complete. The command pkgcond can be used to check a lot of other conditions. Here you go; all 0s are true and all 1s are false. Just enjoy.
root@solaris:~# pkgcond is_what
can_add_driver=1
can_remove_driver=1
can_update_driver=1
is_alternative_root=1
is_boot_environment=1
is_diskless_client=1
is_global_zone=0
is_mounted_miniroot=1
is_netinstall_image=1
is_nonglobal_zone=1
is_path_writable=0
is_running_system=0
root@solaris:~#
PS: As you can see, I have used a Solaris 11 box; on Solaris 10 you can also check whether the zone is whole root or sparse root.
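Since pkgcond reports through its exit status, a small sketch of how it might be used in a script (the echoed messages are just illustrative):
if pkgcond is_global_zone ; then
    echo "running in the global zone"
else
    echo "running in a non-global zone"
fi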
GAB and LLT Basics
Today we will see something on Veritas Cluster. Strangely, I never felt like blogging about VCS, and this is my first post on VCS. We will see some more interesting and in-depth VCS material going forward.
In a cluster, the configuration is shared between the nodes, so there is a need for communication between them to share the configuration as well as the changes that happen on the nodes. LLT and GAB are used by VCS precisely for this reason.
LLT
* LLT (Low Latency Transport) operates at layer 2 of the network stack.
* Provides fast reliable cluster communication between Kernels/OS.
* Responsible for heartbeats between nodes.
* Important files /etc/llttab /etc/llthosts
GAB
* GAB (Group Membership Services/Atomic Broadcast) is loaded as a kernel module on each cluster node.
* Maintains cluster membership by receiving input on the status of the
heartbeat from LLT. When a system no longer receives heartbeats from a
peer, it marks the peer as DOWN and excludes the peer from the cluster.
* Guarantees delivery of point-to-point and broadcast messages to all nodes within the cluster.
* Important file /etc/gabtab
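For reference, a hedged sketch of what these files typically look like on a two-node cluster (node names, cluster ID and interface names are made up, and the llttab link syntax varies slightly between Solaris and Linux), plus the commands to check LLT/GAB status:
/etc/llthosts:
0 node1
1 node2
/etc/llttab (on node1):
set-node node1
set-cluster 100
link ce1 /dev/ce:1 - ether - -
link ce2 /dev/ce:2 - ether - -
/etc/gabtab:
/sbin/gabconfig -c -n2
# lltstat -nvv | more
# gabconfig -a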
This entry was posted in Cluster and tagged Veritas Cluster on July 26,
2013 by krishsubramanian.
Learning Python
If you guys are interested in learning Python online, there are some very useful resources. Some of the ones I found interesting were:
http://thenewboston.org/list.php?cat=36
https://www.udacity.com/course/cs101
http://www.codecademy.com/tracks/python
https://www.khanacademy.org/science/computer-science
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/600sc-introduction-to-computer-science-and-programming-spring-2011/
Please let the community know via comments if you know of any additional websites/videos for learning Python.
Cheers
This entry was posted in programming and tagged Python on July 20, 2013 by
krishsubramanian.
LDOM Basics part one
It has been some time since I wrote something. I hope to blog more frequently. Today let us see about Logical Domains (LDOMs).
Logical Domains, recently renamed Oracle VM Server for SPARC, is a server virtualization technology from Oracle Corporation. There are three important components to LDOMs. They are:
A) Hypervisor
B) Solaris OE
C) Resources like CPU, Memory, network and disks.
Physical hardware is separated into logical domains, each having its own Solaris OE, CPU, memory, OBP, console and I/O components. The domains are independent of each other and can be separately rebooted, patched and upgraded.
The LDOM manager is a piece of software used to create and manage logical domains; it is also used to map logical domains to physical resources.
Domains Explained
Logical domains are classified into:
a) Control domain: The LDOM manager runs in this domain, which makes it possible to create and manage LDOMs. You can have only one control domain per physical server. This is the first domain that is created when you install the LDOM software, and it is named primary.
b) Service domain: Provides virtual device services to other domains, such as a virtual switch, virtual console and virtual disk server. Any domain can be configured as a service domain.
c) I/O domain: Has direct access to physical I/O devices, such as a network card in a PCIe controller. An I/O domain can own a PCIe slot or an onboard PCIe device, and it can share physical I/O devices with other domains in the form of virtual devices when it is also used as a service domain.
d) Root domain: Owns a PCIe root complex and the PCIe fabric, and provides all fabric-related services such as fabric error handling.
e) Guest domain: A non-I/O domain that consumes the virtual device services provided by service domains. It has no physical I/O devices, only virtual devices.
LDOM Daemons
ldmd => the Logical Domains Manager daemon
vntsd => the virtual network terminal server daemon
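As a taste of how the LDOM manager is driven, a hedged sketch of creating a simple guest domain with the ldm command (the domain, service and device names are hypothetical, and the primary domain's vsw/vds services are assumed to already exist):
# ldm add-domain ldg1
# ldm add-vcpu 8 ldg1
# ldm add-memory 4g ldg1
# ldm add-vnet vnet1 primary-vsw0 ldg1
# ldm add-vdsdev /dev/dsk/c2t1d0s2 vol1@primary-vds0
# ldm add-vdisk vdisk1 vol1@primary-vds0 ldg1
# ldm bind ldg1
# ldm start ldg1
# ldm list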