
*** Sendmail problem: Mail is getting stuck on /var/spool/mqueue (Steve)

1. Find out which version of sendmail is running by typing: sendmail -d0.1
2. Check the version reported by the above command. If it is sendmail 8.9.x, edit /etc/sendmail.cf; if it is
8.11.x, edit /etc/mail/sendmail.cf
3. Search for DS in sendmail.cf and put brackets [ ] around the host name, for example:
DS[d02smtp01.southbury.ibm.com]. The brackets force sendmail to deliver directly to the named host instead of
looking up its MX record
4. Kill the sendmail daemon: kill -9 sendmail_pid (you can find sendmail_pid by typing ps -ef | grep sendmail)
5. Start the sendmail daemon: /usr/lib/sendmail -q15m (if it is a sendmail client) or /usr/lib/sendmail -bd -q15m
(if it is a sendmail server)
6. After a few minutes the queued mail should clear from the /var/spool/mqueue directory. To check the queue,
run the command: mailq
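
The edit in step 3 can be sketched as a small filter. This is only an illustration, assuming the DS line holds nothing but the host name; the function name bracket_ds is made up here and is not part of sendmail. It reads a sendmail.cf on standard input and writes the result to standard output, so the real file is not modified.

```sh
# bracket_ds: hypothetical helper, not part of sendmail.
# Wraps the host on a "DS" line in [ ] so that sendmail skips the
# MX lookup for it. Lines that are already bracketed, or that have
# no host at all, pass through unchanged.
bracket_ds() {
    sed 's/^DS\([^][]\{1,\}\)$/DS[\1]/'
}
```

For example, bracket_ds < /etc/mail/sendmail.cf > /tmp/sendmail.cf.new writes a candidate file that can be compared with the original using diff before it is copied into place.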

*** If the same error messages keep coming to GWA root after you have replaced the hardware, do the
following: (Steve)

1. cd /usr/bin
2. run: errpt -a > /tmp/errpt_mm_dd_yyyy (to save a copy of the current error log)
3. run: errclear 0
4. run: /usr/lib/errstop
5. run: rm /var/adm/ras/errlog
6. run: /usr/lib/errdemon
7. Perform the log repair action for the hardware that has been replaced: run diag --> Task Selection --> Log Repair
Action --> choose the device that has been replaced, press F7 to commit, and then exit
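
The mm_dd_yyyy suffix used in step 2 can be generated instead of typed. A minimal sketch, assuming a POSIX date command; the helper name dated_path is made up for illustration.

```sh
# dated_path: hypothetical helper that builds /tmp/<prefix>_mm_dd_yyyy
# from today's date, matching the file naming used in step 2.
dated_path() {
    printf '/tmp/%s_%s\n' "$1" "$(date +%m_%d_%Y)"
}

# Example (only meaningful on a system that has errpt):
#   errpt -a > "$(dated_path errpt)"
```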

*** If the wtmp file is very large and you want to compress it, follow the instructions below.
(Steve)

1. run: stopsrc -s syslogd
2. run: cd /var/adm, then compress -v wtmp
3. you will see that the wtmp.Z file has been created
4. rename this file by running: mv wtmp.Z /tmp/wtmp_mm_dd_yyyy.Z
5. run: touch wtmp (to create a new wtmp file)
6. run: chown adm wtmp
chgrp adm wtmp
chmod 664 wtmp
7. run: startsrc -s syslogd

Answered by Steve:
If the filesystem usage (%) still does not go down, run the following command to find processes that are holding open
files which have already been deleted, for example in the /var filesystem:
d02http003:/# fuser -dV /var

/var:
inode=4123 size=9567989 fd=6 13938
inode=6175 size=19190691 fd=4 29172

Then identify the process behind each PID (the last column of the output) by running:

# ps -deaf | grep 29172
root 29172 8780 0 Dec 06 - 144:09 /usr/lpp/ssp/bin/pmand

You can kill that process so that the space is released. It will be restarted automatically, but make sure you verify
that it has restarted.
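
When fuser reports many entries, it helps to see the largest ones first. A minimal sketch, assuming the line layout shown in the sample output above (inode=, size=, fd=, then the PID); the helper name deleted_file_holders is made up for illustration.

```sh
# deleted_file_holders: hypothetical helper. Reads "fuser -dV" output
# on stdin and prints "PID SIZE" for each entry, largest size first.
# Assumes each entry looks like: inode=N size=N fd=N PID
deleted_file_holders() {
    awk '/^inode=/ { split($2, s, "="); print $NF, s[2] }' | sort -k2,2nr
}
```

Usage would be along the lines of fuser -dV /var 2>&1 | deleted_file_holders; the 2>&1 is a precaution in case part of the report is written to standard error.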

*** How to replace a disk in "rootvg" volume group (Wahid)

Before you remove the logical volumes, run the following commands:

A. sysdumpdev -l (to find out which primary dump device the system is using)

B. The output of sysdumpdev -l may look like the following:

primary              /dev/lv00
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    TRUE
dump compression     OFF

C. If this is the case, redirect the primary dump device to sysdumpnull: sysdumpdev -Pp /dev/sysdumpnull
and verify the change by running sysdumpdev -l again
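
The check in steps A to C can be sketched as a filter that pulls the primary device out of the sysdumpdev -l output. It assumes the two-column layout shown in step B; the helper name primary_dump_device is made up for illustration.

```sh
# primary_dump_device: hypothetical helper. Reads "sysdumpdev -l"
# output on stdin and prints the device on the "primary" line.
primary_dump_device() {
    awk '$1 == "primary" { print $2 }'
}
```

Running sysdumpdev -l | primary_dump_device before and after step C should show the device change to /dev/sysdumpnull.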

1. Run lsvg -l rootvg and check whether each lv has 2 copies, one on each disk (PPs is twice LPs), which means it is
mirrored. If an lv has only 1 copy, try to copy that lv/fs to the good disk. If the disk content is mirrored, break the mirror
off of the failed disk (rmlvcopy lvname 1 hdisk# or simply "smitty rmlvcopy"). If the disk contains lv(s) that are
not mirrored and data is lost due to disk failure, open a sev1 and engage the owner of the data
(application, database, etc.) and a duty manager, and inform SDM, AVM and PM.

2. Once the disk contains no lv, run: reducevg rootvg hdisk# (if the disk contains lv(s) that cannot be deleted
due to the state of the disk, run: reducevg -df rootvg hdisk#; this command will remove all data on the
failed disk and remove the failed disk from the vg)

3. Issue the following commands after removing the bad disk from the volume group:

migratepv -l hd5 SourceDisk DestinationDisk
bosboot -ad /dev/hdisk#     ====> to create the boot image and device; hdisk# is the good disk that remains in the active vg
bootlist -m normal hdisk#   ====> to alter the boot device list; hdisk# is the good disk that remains in the active vg
bootlist -m normal -o       ====> to verify that the bootlist was updated successfully

lastly, issue: lsvg rootvg and verify that QUORUM is set to 1

4. rmdev -dl hdisk#

5. Run: shutdown (if the disk is in a server which facilitates internal disk replacement without shutting down the
server, do not shutdown server. If you don't know if the server has this facility, please ask the CE, they
can confirm)

6. Have the CE replace the drive

7. Reboot the server if it was shut down. (if the server was not shut down, simply run: cfgmgr)

8. Run: lspv => this command should display the new disk with "none" and "none" (no PVID and no volume group)

9. Run diagnostics on the new disk and certify it.

10. Run: extendvg rootvg hdisk#

11. mklv -y lv00 -t sysdump rootvg 32 hdisk# (here hdisk# is the newly replaced hard drive)

12. sysdumpdev -Pp /dev/lv00

13. Run: mirrorvg rootvg hdisk# (if only a part of the rootvg was mirrored to this disk, as opposed to the
whole rootvg, use the "smitty mklvcopy" fast path to mirror only those logical volumes onto the new disk)

14. Once the mirroring is complete, run: lsvg -l rootvg => this should show the lv's in "stale" state

15. Run: syncvg -v rootvg

16. Run: bosboot -ad /dev/hdisk# => this command creates a boot image on the new disk

17. Run: bootlist -m normal hdisk# hdisk# => include all disks in rootvg, separated by a space

18. Run: bootlist -m normal -o => this will display the disks that are capable of booting the server
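
The mirror check in step 1 can be sketched as a filter over lsvg -l output. This is only an illustration: it assumes the usual column order (LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT) and treats a logical volume as unmirrored when PPs equals LPs. The helper name unmirrored_lvs is made up.

```sh
# unmirrored_lvs: hypothetical helper. Reads "lsvg -l <vg>" output on
# stdin and prints the name of every lv whose PPs count equals its
# LPs count, i.e. an lv with a single copy. Header lines are skipped
# because their third and fourth fields are not numeric.
unmirrored_lvs() {
    awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $3 == $4 { print $1 }'
}
```

lsvg -l rootvg | unmirrored_lvs should print nothing once step 15 has completed and every logical volume has two copies.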

*** How to replace a shared external disk in an HACMP Clustered Environment (Wahid)

NOTE: Assume that the failed disk is in a shared volume group which is part of an HACMP cluster. For
simplicity, we assume 2 nodes in the HACMP cluster, Node A and Node B; the shared volume group is
"sharedvg" and the failed disk is pdisk10 and hdisk12 on both nodes. Keep in mind that the
failed disk can be defined differently on each node; i.e. hdisk# and pdisk# on one node can be different
on the other node because of the individual node's hardware configuration. If the hdisk# and pdisk#
are different on each node, identify the disks by their serial numbers and/or slot number in
the DASD and, needless to say, by analysing the error log. This is a complex task and you should do
this only if you are comfortable running the series of commands; otherwise please seek help, as
mistakes in the process can be unpleasant!

Steps to run on Node A:

1. lsvg -l sharedvg => this will display all lv's in the vg and will tell if they are mirrored
2. lspv -l hdisk12 => this will show which logical volumes/filesystems reside on hdisk12
3. "smitty rmlvcopy" => remove the copies of the lv's found in step 1 from the failed disk; If the disk state doesn't
allow removal of lv's from the disk, don't worry, step 5 will take care of the problem.
4. lspv -l hdisk12 => this should show no lv's in the output (provided that no error was generated from step 3)
5. reducevg -df sharedvg hdisk12
6. rmdev -dl hdisk12
7. rmdev -dl pdisk10
8. cfgmgr -vl ssar
9. lsdev -Cc pdisk => you should see the new pdisk10
10. ssaxlate -l pdisk10 => will display the associated hdisk# (ideally it should still be hdisk12, but can be different)
11. run diag and certify the new pdisk found in step 9
12. extendvg sharedvg hdisk12
13. "smitty mklvcopy" => using this fast path, re-mirror the lv's that resided on the failed disk
14. chvg -u sharedvg => this command makes the vg accessible simultaneously from the other
node as well. While the vg is in this state, it can be fatal to run any command against the filesystems or
files of sharedvg on the other node (Node B).

Steps to run on Node B:

1. exportvg sharedvg
2. rmdev -dl hdisk12
3. rmdev -dl pdisk10
4. cfgmgr -vl ssar
5. lsdev -Cc pdisk => this should display the new pdisk10
6. ssaxlate -l pdisk10 => this should display hdisk12 (but can be different)
7. importvg -y sharedvg hdisk12
8. varyoffvg sharedvg

Steps to run on Node A:

1. varyonvg sharedvg (this will disable the accessibility of this vg from Node B which was enabled in
Step14 above)

*** Problems with file sizes over 1GB (Paul)

If a user is having a problem creating a file over 1GB, check the file size limit set in /etc/security/limits for that
user ID:
smitty users
Change / Show Characteristics of a User
type in the user name (e.g. zmurrayp)
* User NAME [zmurrayp] +
scroll down to
Soft FILE size
and set a higher limit, or set it to unlimited (-1)
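
The file size limit in /etc/security/limits is expressed in 512-byte blocks, so a target size has to be converted before it is entered. A minimal sketch of the arithmetic; the helper name gb_to_fsize_blocks is made up for illustration.

```sh
# gb_to_fsize_blocks: hypothetical helper converting whole gigabytes
# to 512-byte blocks. 1 GB = 1073741824 bytes = 2097152 blocks.
gb_to_fsize_blocks() {
    echo $(( $1 * 2097152 ))
}
```

For a 4 GB limit, gb_to_fsize_blocks 4 gives the value to type into the Soft FILE size field.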

*** How to find pdisk serial number (Steve)

There are several ways you can find pdisk serial number:

1. run diag --> Task Selection --> SSA Service Aids --> Link Verification --> choose ssa0/ssa1--> pdisk# serial
number displayed
2. run diag --> Task Selection --> SSA Service Aids --> Configuration Verification --> pdisk# serial number
displayed (If you find ???? on pdisk serial number in link verification, try this option)
3. run diag --> Task Selection --> SSA Service Aids --> Certify Disk --> pdisk# serial number displayed (Do not
certify the disk unless you have to)
4. run smitty ssaraid --> Change/Show use of an SSA Physical Disk --> pdisk# serial number displayed (If raid is
configured, you can try this option)

*** How to lower high paging space usage without rebooting the server (AIX 5.0 and above) (Jessie)

1) Create a new paging space of the same size as the old one. When using smitty pgsp, select:
Start using this paging space NOW? yes
Use this paging space each time the system is RESTARTED? yes

2) Then deactivate the original paging space. It is better to do this from the command line; for example, if /dev/hd6
is the old paging space:
# nohup swapoff /dev/hd6 > /tmp/swapoff.log 2>&1 &

3) While step 2 is still running, check the paging space usage with:
# lsps -a
You will see the usage of the new paging space growing gradually while that of the old one decreases.

4) Without waiting for step 2 to finish, change the original paging space (hd6) to:
Use this paging space each time the system is RESTARTED? no

5) After step 2 finishes, activate the original paging space (hd6) again.

6) (Optional) Remove the original paging space. (Leave it if you want to swap paging spaces again next time.)

OR

1) Create a new paging space of the same size as the old one. When using smitty pgsp, select:
Start using this paging space NOW? yes
Use this paging space each time the system is RESTARTED? yes

2) Deactivate the original paging space using the smitty command and wait for an hour

3) Reactivate the original paging space

4) Remove the paging space you have created
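
While swapoff is running, the lsps -a check can be reduced to just the name and usage of each paging space. A minimal sketch, assuming the usual lsps -a layout in which the first column is the paging space name and the fifth is %Used; the helper name paging_usage is made up for illustration.

```sh
# paging_usage: hypothetical helper. Reads "lsps -a" output on stdin,
# skips the header line, and prints "<name> <used>%" for each paging
# space. Assumes column 1 is the name and column 5 is %Used.
paging_usage() {
    awk 'NR > 1 { print $1, $5 "%" }'
}
```

Repeating lsps -a | paging_usage every minute or so shows the old paging space draining and the new one filling.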

*** How to send snap to hardware support (Steve)

a. snap -r (to remove the previous snap)
b. snap -gc (to build a new snap)
c. cd /tmp/ibmsupt
d. mv snap.pax.Z 18436.snap.pax.Z
e. ftp testcase.software.ibm.com
f. Username: anonymous
g. Password: your_email_address
h. cd /aix/toibm (software support uses this directory) or cd hardware/toibm/machine_model (hardware
support uses this directory)
i. bin
j. put 18436.snap.pax.Z

OR

If the above instructions do not work because you are unable to ftp from the server, copy the file to
your local hard drive and ftp from your local machine, as follows:
a. snap -r (to remove the previous snap)
b. snap -gc (to build a new snap)
c. cd /tmp/ibmsupt
d. mv snap.pax.Z 18436.snap.pax.Z
e. cp 18436.snap.pax.Z /tmp
f. cd /tmp and chmod 777 18436.snap.pax.Z
g. use WinSCP3 to copy the snap file to your local c:\temp directory
h. open a Windows command prompt and change directory to c:\temp
i. run: c:\temp>ftp testcase.boulder.ibm.com
j. username: anonymous and password: your email address
k. cd /aix/toibm (software support uses this directory) or cd /hardware/toibm/machine_model (hardware
support uses this directory)
l. ftp>bin
m. put 18436.snap.pax.Z

*** How to Replace a Fibre Channel Host Bus Adapter (HBA): Wahid Ullah

RECOMMENDED Steps for Replacing an AIX Fibre Channel Host Bus Adapter (HBA)
(requires stopping applications, unmounting filesystems, and varying off volume groups)
Note: The following steps can be used to replace an AIX fibre channel host bus adapter (6227, 6228, or 6239) that is
attached to a 2105 Enterprise Storage Server. The steps for replacing an HBA attached to FAStT and OEM disk
subsystems may be slightly different.

Note: The hdisk definitions cannot be removed unless the filesystems are unmounted and the volume groups are varied
off first, regardless of whether or not SDD is being used.

Procedure for the ESS:

1. Stop all I/O to the ESS devices by doing the following


- Stop any applications currently running on ESS devices
- umount /mnt_point -> Unmount all filesystems that reside on 2105 devices
- varyoffvg vgname -> varyoff all volume groups that contain 2105 devices

2. Remove the adapter definition and all hdisks associated with that path
# rmdev -dl fcs# -R -> where fcs# is the adapter that is being replaced

3. Use the appropriate hardware procedures for physically replacing the host bus adapter

4. Configure the new adapter definition


# cfgmgr

5. Find the World Wide Port Name (WWPN) of the new adapter

# lsdev -Cc adapter -> look for the new device definition (fcs#)

# lscfg -vl fcs# -> Write down the value next to the Network Address field

Note: Steps #6 and #7 do not apply to us, since in GWA we (the AIX team) do not change the WWPN. Just note the WWPN
from step 5 and give it to the SAN team; they will change the WWPN and make the changes on the SAN switch.

6. Change the wwpn of the host nickname on the ESS Specialist so that the ESS volumes
will be presented to the new adapter.

- Bring up the ESS Specialist by entering an ESS cluster hostname in a web browser
- Click Storage Allocation
- Click Open System Storage
- Click Modify Host Systems
- Click on the host nickname that corresponds to the replaced adapter
- Change the World-Wide Port Name to the value that was listed in the lscfg output above
- Click Modify
- Click Perform Configuration Update

7. If the replaced HBA is attached to a Fibre Channel Switch, configuration changes on
the SAN may be necessary. If soft zoning is being used, the zoning configuration
MUST be updated to reflect the wwpn of the new adapter. If hard zoning (port
zoning) is used, no zoning changes are necessary.

8. Configure the new hdisk definitions

# cfgmgr (may have to run more than once)

9. If using Subsystem Device Driver, bring the new path into the SDD configuration

# addpaths

10. If using Subsystem Device Driver, ensure that failover protection is being used

# lsvpcfg -> ensure that the pvids and volume groups are listed next to vpaths only

If failover protection was lost during the reconfiguration (pvids show up on hdisks),
run the following for each volume group:

# dpovgfix VGName

11. Ensure that the Host Bus Adapter microcode levels are consistent between the new
adapter and any other adapters in the system

# lscfg -vl fcs# -> The ROS level of each fcs# device should match

12. Ensure that all adapters have the latest microcode by going to the following
download site and following the procedures for updating to the latest microcode
level:

http://techsupport.services.ibm.com/server/mdownload2/download.html

13. Continue normal I/O Operations on ESS Volumes

# varyonvg VGName -> varyon Volume groups

# mount /mnt_point -> mount filesystems

# Start any Applications that use ESS Volumes
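
The WWPN needed in step 5 can be pulled out of the lscfg output rather than copied by eye. A minimal sketch, assuming the value appears on a line of the form "Network Address" followed by a run of dots and 16 hexadecimal digits; the helper name wwpn_from_lscfg is made up for illustration.

```sh
# wwpn_from_lscfg: hypothetical helper. Reads "lscfg -vl fcs#" output
# on stdin and prints the 16 hex digits that follow "Network Address".
# Prints nothing if no such line is found.
wwpn_from_lscfg() {
    sed -n 's/.*Network Address\.*\([0-9A-Fa-f]\{16\}\).*/\1/p'
}
```

Usage would be lscfg -vl fcs# | wwpn_from_lscfg, with the result passed on to the SAN team as described in the note after step 5.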

*** Create a large-file-enabled /tempfs (Steve)

Here is a help file I use to delete /tempfs and recreate it large file enabled:

Check that no processes are using /tempfs:

fuser -xcV /tempfs
fuser -dV /tempfs

tar -cvf /tmp/tempfs.tar /tempfs   # Back up the contents of /tempfs

lsvg -l rootvg                     # Check the number of PP's of /tempfs and whether it is mirrored

umount /tempfs
rmfs /tempfs
mklv -c 2 -y tempfslv rootvg 48    # 48 is the number of PP's (for example); -c 2 = mirrored
crfs -v jfs -d /dev/tempfslv -A yes -a "bf=true" -m /tempfs; mount /tempfs
tar -xvf /tmp/tempfs.tar           # Restore the contents
rm /tmp/tempfs.tar
