BACKBONE
ALARMS - GUIDE
Version: 2.0
Prashanth Burugula
CONTENTS

Alarm->MaintObject_CMG_EventID_14
Alarm->MaintObject_CMG_EventID_16
CMG-19
CMG-34
CMG-21/cmgIccMissing
CMG-22/cmgIccAutoReset
CMG-23
CMG-35/VoIPOccFault
CMG-36/VoIPStats AppFault Trap
CMG-25/26
CMG-47/48/49
CMG-50/51/52
Platform Alarms
SME
Tripwire
GW_ENV_EventID_10
DAL2/DAL1/DAJ
FILESYNC ALARMS
ARBITER ALARMS/A_EventID_X
BKP_EventID_10
WD_EventID_22
WD_EventID_26
PE HEALTHCHECK/PE_EventID_1
Malformed_INADS
LOGIN_EventID_x
USB1_EventID_X
UPD_EventID_X
UPG_EventID_x
UPS_EventID_X
STD_EventID_X
ENV_EventID_x
SVC_MON_EventID_X
PKT-BUS
G3_Cabinet-Down/G3_CircuitPack-Down
SYS-LINK
TONE-BD
ETR-PT
CLAN-BD
IPMEDPRO
MEDPROPT
VAL-PT
VAL-BD
SNI-BD
SNI-PEER
SN-CONF
SNC-LINK/SNC-BD/SNC-REF
EXP-INTF
EXP-PN
FIBER-LINK
DS1C-BD
TDM-BUS
POW-SUP
PS-RGEN/RING-GEN
NR-CONN
ESS_LOCATION_C000
ESS_EventID_1
ESS_EventID_2
ESS_EventID_3
ESS_EventID_4
ESS_EventID_5
ESS_EventID_6
ASAI-PT/BD
ADJ-IP/AESV-SES/ASAI-IP
UDS1-BD/MG-DS1
MG-ANA
ANL-BD
BRI-BD/MG-BRI/TBRI-BD
BRI-PT/TBRI-PT
CO-TRK
ISDN-SGR / ISDN-TRK
H323-SGR
HARD DISK
Switch Alarms
PowerSupply_Fault
pethPsePortOnOffNotification
Interface_Fault_MIB2
ExceededMaximumUptime
HighErrorRate
Alarm Description:
If the gateway is not registered (you see n for the Registered? option), try to ping the IP address of the gateway from the main server. If it is not pingable, contact the customer and check for any network outage or any scheduled power outage at the site.
If pingable:
Run traceroute <ip address of MED GTWY> from the main server. Any errors or * values in the output indicate issues.
Log in to the media gateway with ssh init@<ip address of MED GTWY>.
Run show event-log to check for issues around the time stamp at which the alarm occurred.
For example, in one case a network fluctuation took the H.248 link down; the customer was able to find it, and once the network was back the H.248 link came up. The event log showed the link up, which cleared the alarm, and the case proceeded to closure.
In a network-flap scenario where the H.248 link goes down, the event log records the corresponding link-down and link-up entries.
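The event-log check above can be sketched as a small helper that classifies log lines and decides whether the H.248 link has recovered. The log phrasing matched here is an assumption for illustration; real G700 show event-log output differs.

```python
# Minimal sketch (not Avaya code): spotting an H.248 link flap in event-log
# lines. The line wording matched below is assumed for illustration.
def classify_h248_event(line: str) -> str:
    """Return 'down', 'up', or 'other' for a single event-log line."""
    text = line.lower()
    if "h248" in text or "h.248" in text:
        if "down" in text or "lost" in text:
            return "down"
        if "up" in text or "established" in text:
            return "up"
    return "other"

def link_recovered(lines) -> bool:
    """True if the most recent H.248 event in the log is a link-up."""
    h248 = [e for e in map(classify_h248_event, lines) if e != "other"]
    return bool(h248) and h248[-1] == "up"
```

If the last H.248 event is a link-up, the alarm has typically cleared on its own and the case can proceed to closure.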
Alarm->MaintObject_CMG_EventID_14
cmgSyncSignalFault / cmgSyncSignalClear
Understanding: If the Avaya G700 Media Gateway contains an MM710 T1/E1 Media Module, it is usually
advisable to set the MM710 up as the primary synchronization source for the G700. In so doing, clock sync signals
from the Central Office (CO) are used by the MM710 to synchronize all operations of the G700. If no MM710 is
present, it is not necessary to set synchronization. If neither primary nor secondary sources are identified, then the
local clock becomes Active. By setting the clock source to primary, normal failover will occur. Setting the source
to secondary overrides normal failover, generates a trap, and asserts a fault.
Probable cause: The alarm can be reported due to:
LAN issue / Power outage at the site OR
ESS/LSP server reloads after getting translations from main server OR
ESS/LSP is down due to bad health
Solution:
almdisplay -v / almdisplay res | more (displays resolved alarms page by page; use the spacebar to go to the next page)
show faults (check for any sync faults or DS1 board faults)
show sync timing (check for any synchronization errors)
show events (check the logs for loss-of-signal or signal-fault-clear statements)
test board <DS1 board location>
status trunk X (check whether the trunk is in service or out of service)
list measurement ds1 log <DS1 board location> (to check for any slip errors, if the trunk is in service)
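The slip-error check in the last step can be sketched as follows; the (timestamp, slip_count) row format is an assumption, since the real list measurement ds1 log output is a formatted SAT screen.

```python
# Sketch (assumed row format): flag DS1 slip errors from measurement-log rows.
# Each row is (timestamp, slip_count); any non-zero slip count is suspicious.
def slip_seconds(rows):
    """Return the timestamps whose slip count is non-zero."""
    return [ts for ts, slips in rows if slips > 0]

def sync_suspect(rows) -> bool:
    """Any slips at all suggest a sync-source problem worth investigating."""
    return len(slip_seconds(rows)) > 0
```

Repeated slips across intervals point back at the synchronization source checked with show sync timing.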
Alarm Description:
Alarm->MaintObject_CMG_EventID_16
Understanding: This trap indicates that one of the media modules (possibly a VoIP module) has undergone a change. The change could be a new media module being inserted, reseated, or busyout-released, a configuration file being uploaded, or firmware being downloaded.
Solution:
almdisplay -v / almdisplay res | more
show faults (log in to the media gateway and check for any active faults)
show event-log (check the logs for the exact event that happened on the media module)
list configuration board <board location> (confirm the board is detected)
test board <board location> (check that all tests pass for the board)
Probable Cause: The most common reason for this alarm is administrative work on one of the media modules:
a firmware update activity on the media module, OR
bad health of the media module, because of which it was reset or reseated.
Alarm Description:
Alarm->MaintObject_CMG_EventID_19/ 34
Description: The alarm indicates that an attempt to download a software module (CMG 19) or to upload a configuration file (CMG 34) has failed.
Procedure:
almdisplay -v / almdisplay res | more
show faults (log in to the media gateway and check for any active faults)
show event-log (check the logs for the exact event that happened on the media module)
list configuration board <board location> (confirm the board is detected)
test board <board location> (check that all tests pass for the board)
Retry downloading the software module (for the CMG 19 alarm) or uploading the configuration file (for the CMG 34 alarm); if this fails, follow the step below with the customer's required permission:
reset the board (i.e. busyout and release it), then reseat it, and replace it if required.
Probable Cause: The most common reason is failure of an update activity
Alarm Description:
Alarm->MaintObject_CMG_EventID_21/22
Alarm->cmgIccMissing /cmgIccAutoReset
Understanding: The alarm indicates that an ICC, expected in slot 1, is either missing or present (CMG 21), and/or that the Media Gateway automatically reset the ICC (CMG 22).
Procedure:
almdisplay -v / almdisplay res | more
restartcause (to check what initialized the server)
list survivable-processor (to check the time of the saved translation file, in case the ICC is an LSP)
show event-log (check the media gateway logs to find the exact event that happened)
* If the CM version is 5.2.x, check for PCN 1690Pu. *
A. If the first controller is a C-LAN:
status clan-port <port location> (note: port 17 is the required Ethernet port here)
display errors
If the Ethernet link is down, inform the customer and ask them to check the physical LAN connectivity to the C-LAN board. If no issues are found in connectivity, then with the required permission reset the C-LAN board (i.e. busyout and release it), then reseat it, and replace it if required.
Probable Cause: The alarm may be reported due to:
a LAN issue, OR
bad health of the C-LAN.
B. If the first controller is ICC
almdisplay -v / almdisplay res | more
restartcause (to check how and when ICC was rebooted)
statapp (check whether all services are up and fine)
If the ICC is not accessible or is down, inform the customer and ask them to reseat the ICC board. If the ICC still does not come up, try replacing the board.
Probable Cause: The alarm may be reported due to:
a LAN issue or power outage, OR
bad health of the S8300 main server, or a reboot of it.
Alarm Description:
Alarm->MaintObject_CMG_EventID_35 /36
Alarm->cmgVoipOccFault / VoIPStats AppFault Trap
Description: One or more (or all) of the VoIP engines in the media gateway is over its occupancy threshold (Channels In Use / Total Channels; CMG 35), or back below its occupancy threshold after having exceeded it (CMG 36).
Procedure:
show faults
show voip-parameters
show event-log (confirm whether occupancy is back to normal after exceeding the threshold
value)
Typically no other action is required here.
Probable Cause: The most common cause is VoIP occupancy exceeding its threshold value.
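The threshold logic behind CMG 35/36 can be sketched as below. The 0.9 default threshold and the simplification that any below-threshold reading maps to CMG 36 are assumptions for illustration; the real trap pair fires only on threshold crossings.

```python
# Sketch of the CMG 35/36 logic: occupancy = channels in use / total channels,
# compared against a threshold. The 0.9 threshold is an assumed example value.
def occupancy(channels_in_use: int, total_channels: int) -> float:
    return channels_in_use / total_channels

def voip_occ_event(channels_in_use: int, total_channels: int,
                   threshold: float = 0.9) -> str:
    """'CMG 35' when over threshold, 'CMG 36' when at or below it."""
    if occupancy(channels_in_use, total_channels) > threshold:
        return "CMG 35"
    return "CMG 36"
```

A CMG 36 following a CMG 35 means occupancy has returned to normal and typically no further action is required.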
Alarm Description:
Alarm->MaintObject_CMG_EventID_25/26/47-52
Description: Telephone services on a Media Gateway are controlled by a Media Gateway Controller (MGC). All media gateways integrate seamlessly with Avaya Media Servers. For the MGC to control the media gateways, the latter need to be registered with the media servers. If an S87xx is the primary controller, the MG has to register with a C-LAN board. For an S85xx, the MG registers either with a C-LAN board or with the Processor Ethernet port, if enabled. For an S8300, it registers with the Processor Ethernet port. The alarm simply means that the media gateway is not registered with its controller.
Procedure:
almdisplay -v / almdisplay res | more
cd /var/log/ecs
grep -R MG <file name> (to identify the media gateway, if the alarm is in the resolved state)
ping <ip-address of Media-gateway> (check whether the MG is reachable from the main server)
display media-gateway X (if the MG is pingable but not registered, check the recovery rule)
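The registration rules in the description can be summarized in a small lookup (an illustrative helper, not an Avaya API):

```python
# Which interface a media gateway registers with, by controller server type,
# per the description above. Illustrative helper only.
def mg_registration_interface(server: str) -> str:
    rules = {
        "S87xx": "C-LAN",
        "S85xx": "C-LAN or Processor Ethernet (if enabled)",
        "S8300": "Processor Ethernet",
    }
    return rules.get(server, "unknown")
```

Knowing which interface the MG should register with tells you which board or port to ping and test first.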
Platform Alarms
Alarm Description:
Alarm->MaintObject_SME_EventID_1
Description: The Server Maintenance Engine (SME) is a Linux process which provides error analysis, periodic testing, and demand testing for the server. This alarm means that alarms are not being reported by the other server in a duplex configuration, due to failure of either the GMM or the administered reporting mechanism.
Procedure:
testinads / testcustalm
If testinads or testcustalm replies affirmatively, then the cause for which the alarm was reported no longer exists.
statapp (check whether all required processes are up)
a) init@pacehqs8720b> ps -ef | grep alm (identify the PID of the almindsagt process)
b) init@pacehqs8720b> kill -9 5303 (kill the almindsagt process; 5303 is an example PID)
c) init@pacehqs8720b> ps -ef | grep alm
root 26897 3790 0 07:27 ? 00:00:00 /opt/ws/almindsagt
root 27044 26878 0 07:28 pts/0 00:00:00 grep alm
Note: the almindsagt process restarts automatically after it is killed.
testinads
If testinads fails, we need to restart the sme and mvsubagent processes, which send traps from the server, but with the customer's permission:
stop -s SME and stop -s MVSubAgent, followed by
start -s SME and start -s MVSubAgent
testinads / testcustalm
If testinads still fails, perform a warm reboot (stop -a followed by start -a), followed by a cold reboot (reboot) if required, with the customer's permission.
logger -t svc_mon[2343] atd could not be restarted
(raise a false alarm like the above and check whether it gets reported to Remedy)
Once the alarm is resolved, run almclear -a.
Probable Causes: The most common cause is that one of the duplex servers could not call out an alarm, and the other server calls out this alarm to inform the administrator of that. This could be due to:
GMM failure, OR
failure of a sub-process essential to the administered reporting mechanism, such as the sme or mvsubagent process, OR
a scheduled activity at the customer site affecting the reporting mechanism.
Alarm Description:
Alarm->MaintObject_TRIPWIRE_EventID_7
Description: Tripwire is an intrusion detection system (IDS) which constantly and automatically monitors critical system files and reports when they have been destroyed or modified by an attacker (or by mistake). It allows the system administrator to know immediately what was compromised and fix it. The first time Tripwire is run, it stores checksums, exact sizes, and other data of all the selected files in a database. Successive runs check whether every file still matches the information in the database and report all changes.
Procedure:
Probable Cause: When any of the critical system files or reports is changed or modified, this alarm is raised.
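Tripwire's baseline-and-compare cycle can be sketched conceptually with SHA-256 checksums; real Tripwire records many more attributes (sizes, permissions, timestamps) and cryptographically signs its database.

```python
# Conceptual sketch of an integrity checker in the spirit of Tripwire:
# record a checksum per file on the first run, then report mismatches.
import hashlib

def checksum(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def baseline(paths):
    """First run: store a checksum for each selected file."""
    return {p: checksum(p) for p in paths}

def changed_files(db):
    """Successive runs: list files that no longer match the baseline."""
    return [p for p, old in db.items() if checksum(p) != old]
```

Any file returned by changed_files is a candidate for the "compromised or modified" report this alarm describes.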
Alarm Description:
Alarm->MaintObject_GW_ENV_EventID_10
Description: This environmental alarm is raised in case of power-supply faults in the gateway.
Procedure: Log in to the media gateway and run the following commands:
show faults
show platform
show voltage
show event-log
show system
Alarm Description:
Alarm->MaintObject_DAL2/DAL1/DAJ1
Understanding: This MO supports each S8700 media server's Duplication Memory board, a NIC (network interface card) serving as the physical and data-link interface for an Ethernet-based duplication link between the servers. This link provides a call-status data path carrying:
TCP-based communication between each server's Process Manager
UDP-based communication between each server's Arbiter, to:
enable arbitration between the active and standby servers
provide status signaling for memory refreshes
Procedure:
almdisplay -v / almdisplay res | more
server (check the server mode, and the status of standby shadowing and the duplication link)
testdupboard
(Note: If a cable has become unplugged from either of the DAJ1 boards both boards will test ok. The dup
link will show down/not refreshed but both DAJ1 boards will test ok.)
restartcause (if alarm is in resolved state and output of step 2 and 3 are fine)
testdupboard -t localloop (only on the standby server, if the standby server is in the busy-out state)
reboot (if the test continues to fail; with the customer's permission)
If test continues to fail then replace the DAL/DAJ card
Probable Cause: The alarm may be due to:
bad health of either duplication board, OR
the duplication link being refreshed by periodic/scheduled maintenance activity, OR
the CM server being reloaded because of a save-translation activity, OR
the server being rebooted/reloaded.
Alarm Description:
Alarm->MaintObject_DUP_EventID_X
Understanding: The Duplication Link is a 10/100BaseT Ethernet link used by the Duplication Manager (ndm) process to communicate with the other server's ndm process. The Duplication Manager process (via coordination with the Arbiter process) runs on each S8700 Multi-Connect server to control data shadowing between them. Meanwhile, at the physical and data-link layers, an Ethernet duplication link provides a TCP communication path between each server's Duplication Manager to enable their control of data shadowing. The dupmgr is responsible for monitoring the status of this link. It raises a major alarm in the event that the Duplication Link is non-functional, by logging an entry into syslog that the Global Maintenance Monitor (GMM) uses to report alarms.
Procedure:
For Duplication Link for S87xxx server
testdupboard
cd /var/log/ecs (run ls -ltr to find the log file with the latest date tag, e.g. 2014-0203-070101.log)
cat <file name> (check the logs for when the duplication link went down or was refreshed; was there any scheduled/periodic maintenance running at that time, or some other activity that may affect the functioning of the duplication link)
restartcause (to check whether CM on either server was reloaded or there was a server interchange)
Probable Cause: The alarm is reported if the duplication link is non-functional.
Alarm Description:
Alarm->MaintObject_FSY_EventID_X
Description: When multiple servers (i.e. processors) are present in a network, the active server shares configuration information (translations) with all the other servers (the standby server and LSP/ESS servers) so that in the event of a failure, a surviving processor can take over with the latest information. Sharing occurs in a process known as file synchronization (filesync) and can happen once per day or whenever the translation file is changed. The system must be operated, and the network connectivity designed, to accommodate this activity.
Procedure:
For an ESS/LSP server:
almdisplay -v / almdisplay res | more
filesync -Q all (check the status of file synchronization)
statapp (check that the filesync process is running on both the main and the ESS/LSP server)
filesync -w -a ess <IP address of ess> trans (in case of an ESS server)
(check whether the manual push is successful, or else check the error-reason code)
list survivable-processor (check the connectivity of the ESS/LSP with the main server and whether translations were saved on the ESS/LSP, because CM reloads on the ESS/LSP after getting the translation file, and in that event the alarm is reported on the main server)
restartcause
In case the alarm is active and the above steps do not identify the cause:
date
(run this command on both the main server and the LSP/ESS server, because a time mismatch could be the cause)
ip_fw -q -s 21874/tcp service (check whether the TCP ports defined for filesync are open in both directions, on each server)
cat /etc/sysconfig/network-scripts/ifcfg-ethX
(check whether the Ethernet ports are locked to 100 Mbps full duplex on each server; ethX is the Ethernet port defined for the customer LAN)
/sbin/ifconfig ethX
(check whether the Ethernet port is seeing errors; ethX is the Ethernet port defined for the customer LAN)
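The date comparison step can be sketched as a clock-skew check between the two servers; the 60-second tolerance is an assumed value for illustration, not an Avaya-documented limit.

```python
# Sketch: flag a clock mismatch between the main and ESS/LSP servers that
# could interfere with filesync. The 60 s tolerance is an assumed example.
from datetime import datetime

def clock_skew_seconds(main: datetime, survivable: datetime) -> float:
    return abs((main - survivable).total_seconds())

def skew_suspect(main: datetime, survivable: datetime,
                 tolerance: float = 60.0) -> bool:
    return clock_skew_seconds(main, survivable) > tolerance
```

If the skew is flagged, correct the time (or NTP configuration) on the lagging server before retrying the manual filesync push.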
Alarm Description:
Alarm->MaintObject_A_EventID_X
Description:
The alarm indicates malfunctioning of the Arbiter process, used on duplex servers to determine the health of the servers. The Arbiter process runs on each S87xx server to:
decide which server is healthier (more able to be active)
coordinate data shadowing between them (under the Duplication Manager's control).
Meanwhile, at the physical and data-link layers, an Ethernet-based duplication link provides an inter-arbiter UDP communication path to:
enable this arbitration between the active and standby servers
provide the necessary status signaling for memory refreshes
Procedure:
Need to follow DUP Alarm and then ...
server
(If the output indicates a corrupt/failed state, inform the customer and with the desired permission restart the Arbiter process by executing the following commands):
stop SF -s arbiter
start -s arbiter
server -c
cd /var/log/ecs
grep -R Arbiter <filename>
Verify that the host name and corresponding IP address are identical in the hosts file and the configuration file:
a) more /etc/hosts
b) more /etc/opt/ecs/servers.conf
c) ifconfig -a (verify that the IP address matches the hosts and configuration files and that all Ethernet ports have an IP address assigned)
d) /sbin/arp -a (verify the MAC address is complete)
Verify whether the alarm is still active on the port using the following command:
netstat -a | grep -R 1332
Alarm->MaintObject_BKP_EventID_10
Understanding: Backups are designed to preserve off-server copies of translations, configuration files, security files, logs, and other important information. The backup command is used for both backup and restore of data sets. The above alarm is reported when a scheduled backup has failed.
Procedure:
A. If a backup to the FTP server is failing:
almdisplay -v / almdisplay res | more
sudo backup -t | more (gives the history of successful and failed backups)
ping <IP address of ftp server>
traceroute <IP address of ftp server>
If the FTP server is not pingable, check at which hop the trace fails and ask the customer to check the network integrity.
If the FTP server is pingable, take a manual backup:
a) From sroot, cd /etc/cron.d, run ls, and then open any file using cat <File name> (to find the location to which the backup should be written).
Then copy the backup string (as shown above) into a notepad and add --verbose -d to the string after -b, as shown below.
b) Or cat web* or cat back* (to get the login password for the FTP server, if required)
c) sudo backup -b --verbose -d ftp://'login':'paswd'@<IPof ftp>/ -c full
d) The backup can also be taken on the server itself using the following command:
sudo backup -b --verbose -d /var/home/ftp/pub/ -n 3 -c -x -c -- xln os security
Once the backup is successful, check backup -t | more to capture the backup logs and then proceed to case closure.
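Confirming the newest successful backup in the history can be sketched as below; the (timestamp, status) row format is an assumption, since the real backup history output is a formatted report.

```python
# Sketch (assumed row format): find the newest successful backup in a
# chronologically ordered history list like the one "backup -t" reports.
def last_successful(history):
    """Return the timestamp of the newest 'success' entry, else None."""
    ok = [ts for ts, status in history if status == "success"]
    return ok[-1] if ok else None
```

A None result means no backup has ever succeeded, which changes the troubleshooting from "one failed run" to "backup has never worked".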
Alarm->MaintObject__WD_EventID_22
Understanding: The watchdog keeps an eye on all processes in the system, maintaining heartbeats with
both Communication Manager and platform processes. The watchdog is responsible for stopping and
starting processes when necessary. This process watches over the entire system. Event ID 22 indicates that one of the watched processes was terminated.
Procedure:
grep -R terminated /var/log/messages (identify the application that was terminated and the corresponding time stamp)
Run statapp to confirm that the watchdog application is UP, along with all the other applications.
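The grep step can be sketched as a parser that extracts the terminated application's name from a matching log line; the message format matched here is an assumption about how the termination appears in /var/log/messages.

```python
# Sketch: pull the terminated application's name out of a syslog-style line,
# as the grep step above would surface it. The wording is an assumed format.
import re

def terminated_app(line: str):
    """Return the process name from '... <app> was terminated', else None."""
    m = re.search(r"(\S+)\s+was terminated", line)
    return m.group(1) if m else None
```

Once the application is identified, statapp confirms whether the watchdog has already restarted it.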
Alarm Description:
Alarm->MaintObject__WD_EventID_26
Description: Watchdog handshake error. If USB alarms are also present, this strongly points to a global SAMP or networking problem. This error implies malfunction, absence, or configuration/firmware mismatch issues of the SAMP, or it may point to a USB modem malfunction.
Procedure:
almdisplay -v / almdisplay res | more
sampdiag -v (gives the status of the SAMP)
grep -R SampEth /etc/opt/ecs/ecs.conf (to check detection of the SAMP card)
sampcmd date (to check synchronization of SAMP with host)
restartcause
testmodem
testmodem -t reset_usb (soft reset of USB modem, if any of the test is getting failed)
Switch to the sroot login and then run arping -I ethX -f -c 1 -w 1 <IP Address> (check whether arping passes; the value of X is identified in step 3)
Note: If this issue occurs 3 times in a row, it could lead to an interchange if only one server sees the failure. Many of the examples seen have been chronic issues that occur many times over a week or two. In that case additional analysis should be done to determine whether there is an underlying issue.
Probable Cause: This alarm is reported when arping fails for any Ethernet port on the server.
Alarm Description:
almdisplay -v
statapp (check whether the messaging / INADS AlarmAgent service is down)
Alarm Description:
Alarm-> MaintObject_USB1_EventID_X
Understanding: Modems are used to call out alarms to an external alarm monitoring system and also to access Avaya servers remotely by dialing through the modem (e.g. from toolsa we can access a customer's network only through the modem). The modems in the system are tested every 15 minutes to verify that dial tone can be achieved. If dial tone is not achieved, the watchdog reports an alarm.
Procedure:
almdisplay -v / almdisplay res | more
testmodem
restartcause
testmodem
testmodem -t reset_usb (soft reset of USB modem, if any of the test is getting failed)
stop -s ModemMtty followed by start -s ModemMtty (if the soft reset does not work, restart the ModemMtty process, but take the customer's permission before doing that)
If testmodem still fails, ask the customer to reseat the modem, followed by the telephone cable inserted into the modem.
Note: If the Handshake Test is failing, reseat the modem; if the Off-Hook Test is failing, get the telephone cable (inserted into the modem) reseated.
Probable Cause:
malfunctioning of the modem, OR
the ModemMtty service is hung or stopped on the server, OR
the telephone line connected to the modem is not functioning properly.
Alarm Description:
Alarm->MaintObject_UPD_EventID_X
Description: The kernel update is activated but the activation is not committed.
Procedure:
almdisplay -v
swversion -a (gives the date when the update was executed)
update_show
update_commit <filename/Update ID> (makes the update permanent)
almclear -a
Probable Cause: The kernel update was not committed and hence this alarm is reported.
Alarm Description:
Alarm->MaintObject_UPG_EventID_1
Description: The UPG MO raises an alarm if the upgrade was not made permanent within a certain amount of time after the upgrade.
Procedure:
Probable Cause: The alarm is mostly due to an upgrade activity scheduled on the customer side in which the upgrade was not made permanent within the specified time after the upgrade.
Alarm Description:
Alarm->MaintObject_UPS_EventID_X
Description: The UPS process is for monitoring the status of the UPS for each 8700 server. An alarm will be raised
when there is a loss of commercial power or there is some other power problem such as a spike, sag, brownout or
blackout.
Procedure:
ping <ip-address of UPS switch> (verify the switch is pingable, or else ask the customer to check the network integrity)
snmpwalk <ip-address of UPS switch> -c public 33 | more (to verify whether the system is currently on backup power)
If the alarm is active, inform the customer and ask them to verify that AC power is being supplied to the UPS, and coordinate with the vendor if required.
Alarm Description:
Alarm->MaintObject_STD_EventID_X
Procedure:
If Event ID is 1 or 2
almdisplay -v / almdisplay res | more
ping <ip-address> (ip-address can be identified in alarms itself)
Log in to the entity with the above IP address and check whether it has undergone a cold or warm reboot.
Note: Event ID 1 corresponds to a cold reboot and 2 to a warm reboot of the entity.
If Event ID is 3
almdisplay -v / almdisplay res | more
ping <ip-address> (ip-address can be identified in alarms itself)
Identify the entity; the alarm indicates that the communication link between the media server and that particular entity is either down or has come back up after a failure.
Note: STD_EventID_X alarms are generally in the resolved state and can be closed by stating either a cold/warm reboot of the IP entity or a communication-link flap between the media server and the IP entity, according to the X value.
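The event-ID-to-closure-reason mapping in the note can be written as a small lookup table (an illustrative helper, not part of any Avaya tool):

```python
# STD event IDs mapped to the closure reasons described above.
STD_EVENTS = {
    1: "cold reboot of the IP entity",
    2: "warm reboot of the IP entity",
    3: "communication-link flap between the media server and the IP entity",
}

def std_closure_reason(event_id: int) -> str:
    return STD_EVENTS.get(event_id, "unknown event id")
```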
Alarm Description:
Alarm->MaintObject_ENV_EventID_X
Description: The ENV MO monitors environmental variables (including temperature, voltages, and fans) within the server. The alarm indicates that one of these variables has deviated from its nominal value.
Procedure:
Alarm Description:
Alarm->MaintObject_SVC_MON_EventID_X
Description: MO-SVC_MON is a media server process, started by the watchdog, to monitor Linux services and daemons. It also starts up threads to communicate with a hardware-sanity device. This alarm indicates that one of the Linux daemons is down.
Procedure:
almdisplay -v / almdisplay res | more
cd /var/log
grep svc_mon messages (to check which daemon was affected)
service <daemon name> status (to check the status of daemon whether it is running)
service <daemon name> start (if service is not running )
service <daemon name> status (to confirm the service is running)
If the daemon still does not come up, inform the customer and get permission for a warm reboot followed by a cold reboot (only if required), during a lean period.
Probable Cause: One of the monitored daemons was stopped or restarted, possibly due to a server reboot, a CM reload, or degraded server health.
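The status-check-and-restart loop above can be sketched as a decision helper; statuses are passed in as plain strings here, whereas the real procedure shells out to service <daemon name> status.

```python
# Sketch: given daemon statuses (as "service <name> status" would report
# them), list the daemons that need a "service <name> start".
def daemons_to_start(status_by_daemon):
    return [name for name, status in status_by_daemon.items()
            if status != "running"]
```

An empty result means every monitored daemon is running and the alarm should be clearable.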
Alarm->MaintObject _PKT-INT_Location_
Alarm->MaintObject_PKT-BUS
G3_Cabinet-Down / G3_CircuitPack-Down
Description: An IPSI board contains several different functions; one of them is the PKT-INT. This is the resource on the IPSI board that manages the LAPD links travelling through the packet bus. These links include RSCLs, EALs, INLs, etc. The packet bus consists of a single bus, and one such bus appears in each port network. The packet bus in each port network is physically independent from those in other port networks, so each port network has a separate PKT-BUS MO.
In addition to affecting telephone service, a Packet Interface/packet bus failure affects the service provided by circuit packs, e.g. ISDN signaling service and the service provided by the C-LAN, VAL, or IPMEDPRO boards.
Procedure:
status cabinet X
If the alarm is active and any of the IPSIs is down, inform the customer and with the required permission reset the IPSI, then reseat it, and replace it if required.
Probable Cause: The alarm may be reported due to:
a LAN issue or power outage at the site, OR
a reboot of the port network, possibly due to too many sanity failures, OR
bad health of the IPSI.
Alarm Description:
Alarm->MaintObject_SYS-LINK
Understanding: System links are packet links that originate at the Packet Interface board and traverse various
hardware components to specific endpoints. The hardware components involved on the forward and reverse
routes can be different, depending upon the configuration and switch administration. Various types of links are
defined by their endpoints: EAL, PRI, RSCL, RSL, MBL etc.
The state of a system link depends on the state of the various hardware components it travels over. Hence, when analyzing any system-link problem, look for other active alarms on the corresponding hardware components. If present, follow the maintenance procedures for the alarmed components to clear those alarms first.
Note: All the above links originate from the PKT-INT, i.e. from an IPSI board, and terminate on the corresponding circuit packs.
If no alarms for the hardware components listed above are present, accept the sys-link and execute the steps below to clear the alarm.
Procedure:
almdisplay -v / almdisplay res | more
list sys-link (to identify the sys-link)
test sys-link <sys-link location> long clear (to clear the stale alarm and/or to identify any failing test)
Alarm Description:
Alarm->MaintObject_TONE-BD
Description: For IPSI-equipped EPNs, the TONE-BD MO consists of a module located on the IPSI circuit
pack and provides tone generation, tone detection, call classification, clock generation, and synchronization.
For non-IPSI EPNs, the TN2182B Tone-Clock circuit pack provides the functions.
Note: Check for any other IPSI related alarms, if present follow corresponding procedure to resolve the alarms. If
there are no other alarms then follow the below procedure.
Procedure:
almdisplay -v / almdisplay res | more
test tone-clock <board location>
display errors
If the alarm is still present, inform the customer and with the required permission proceed with the following steps:
busyout and release the board, then reseat it, and replace it if required.
Probable Cause: Malfunctioning of the Tone-Clock board.
Alarm Description:
Alarm->MaintObject_ETH-PT
Description: The TN799DP Control LAN (C-LAN) circuit pack provides TCP/IP connections to adjunct applications such as CMS, INTUITY, and DCS Networking. The C-LAN circuit pack has one 100BASE-T Ethernet connection and up to 16 DS0 physical interfaces for PPP connections. The C-LAN also acts as a gatekeeper for IP endpoint registration.
Procedure:
almdisplay -v / almdisplay res | more
display port <Port location> (identify the data-module/link number, say X)
status link X / status data-module X (verify the current status of the link/data-module, i.e. whether it is in service)
get ethernet-options <C-Lan board location> (check the Ethernet port settings; Avaya recommends the Ethernet port at 100 Mbps full duplex with autonegotiation off)
Alarm Description:
Alarm->MaintObject_CLAN-BD
Description: The TN799DP Control LAN (C-LAN) circuit pack provides TCP/IP connections to adjunct applications such as CMS, INTUITY, and DCS Networking. The C-LAN circuit pack has one 100BASE-T Ethernet connection and up to 16 DS0 physical interfaces for PPP connections. The C-LAN also acts as a gatekeeper for IP endpoint registration.
Procedure:
almdisplay -v / almdisplay res | more
test board <C-Lan board location> (to check whether all tests pass)
display port <Port location> (identify the data-module/link number, say X)
status link X / status data-module X (verify the current status of the link/data-module, i.e. whether it is in service)
get ethernet-options <C-Lan board location> (check the Ethernet port settings; Avaya recommends the Ethernet port at 100 Mbps full duplex with autonegotiation off)
If the alarm still does not clear, try inserting the C-LAN board into another slot and check. If the alarm clears, replace the carrier; otherwise replace the circuit pack.
Note: While resetting an IP-interface board through the SAT prompt, first busyout the board and then disable the Ethernet interface with change ip-interface <board location>. Re-enable it after resetting the board, and only then release it.
Probable Cause: The alarm may be reported due to:
a LAN issue, OR
bad health of the C-LAN board, OR
wrong configuration of the C-LAN board.
Alarm Description:
Alarm->MaintObject_IPMEDPRO
Description: In an IP telephony solution, digital signal processing (DSP) resources are used for handling media streams. DSP resources inter-work audio between the media gateway's time-division-multiplex (TDM) bus and the IP network, as well as transcoding (i.e. converting one codec to another when needed). DSP resources are dynamically allocated on a call-by-call basis and are provided by the IP Media Processor (IPMEDPRO) circuit pack for solutions using an S8100, S8500, or S8700 Media Server with G600 or G650 Media Gateways (or traditional SCC1/MCC1 gateways).
There are 2 types of IPMEDPRO circuit packs:
TN2302AP IP Media Processor
TN2602AP IP Media Processor
The TN2302/TN2602 includes a 10/100BaseT Ethernet interface to support IP audio for IP trunks and H.323 endpoints, and also for adjuncts such as a voice recording logger. The IPMEDPRO circuit pack acts as a service circuit to terminate generic RTP streams used to carry packetized audio over an IP network.
Procedure:
almdisplay -v / almdisplay res | more
list configuration board <board location> (to identify the TN2302 / TN2602 circuit pack)
display ip-interface <board location> (verify that the Ethernet port is enabled and set to 100 Mbps, full duplex, with autonegotiation disabled)
test board <board location> (check that all tests pass)
display errors (check for any errors against the IPMEDPRO circuit pack)
ping <ip-address of Medpro board> (check whether the server can ping the Medpro board)
If the alarm is active, inform the customer and ask them to check and confirm the network integrity to the Medpro's Ethernet port; if the customer replies that everything is fine, follow the procedure below with the required permission:
busyout board <board location>, followed by reset board and then release board (i.e. if the alarm is active, try resetting the Medpro board)
Get the board reseated, either with the help of the customer or by sending a technician on-site.
If the alarm still does not clear, try inserting the Medpro board into another slot and check. If the alarm clears, replace the carrier; otherwise replace the circuit pack.
Note: While resetting the ip-interface board through Sat-prompt, first busyout the board and then
disable the Ethernet interface by changing ip-interface <board location>. Also enable the same after
reseting the board and then only release it
Probable Cause: The alarm may be reported due to:
A LAN issue, OR
Wrong configuration of the Medpro board, OR
Bad health of the Medpro board.
Alarm Description:
Alarm->MaintObject_MEDPROPT_Location_X_OnBoard_Y
Description: The Media Processor Port (MEDPROPT) MO monitors the health of the Media Processor
(MEDPRO) digital signal processors (DSPs). This maintenance object resides on the TN2302/TN2602 Media
Processor circuit packs which provide audio bearer channels for H.323 voice over IP calls. One TN2302AP
has 8 MEDPROPTs; each TN2302 MEDPROPT has the processing capacity to handle 8 G.711-coded channels, for a total of 64 channels per TN2302. The capacity provided by the TN2602 is controlled by the
Avaya Communication Manager license file and may be set at either 80 G.711 channels or 320 G.711
channels. If individual DSPs on the TN2302AP or TN2602 fail, the board remains in-service at lower capacity.
The MEDPROPT is a shared service circuit. It is shared between H.323 trunk channels and H.323 stations.
An idle channel is allocated to an H.323 trunk/station on a call-by-call basis.
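The channel-capacity figures above can be expressed as a small lookup. This is an illustrative sketch only: the board names and the 64/80/320-channel figures come from the description, while the helper function itself is ours.

```python
# Illustrative sketch of the DSP capacity figures described above.
# Board capacities are taken from the text; the function name is hypothetical.

def g711_capacity(board: str, licensed_high_capacity: bool = False) -> int:
    """Return the number of G.711 channels a Media Processor board provides."""
    if board == "TN2302AP":
        # 8 MEDPROPTs x 8 G.711 channels each
        return 8 * 8
    if board == "TN2602AP":
        # Capacity is set by the Communication Manager license file
        return 320 if licensed_high_capacity else 80
    raise ValueError(f"unknown board: {board}")

print(g711_capacity("TN2302AP"))        # 64
print(g711_capacity("TN2602AP"))        # 80
print(g711_capacity("TN2602AP", True))  # 320
```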
Note: If any Medpro-board/TDM/Pkt-Int alarm is present along with the MEDPROPT alarm, follow the corresponding procedure first; otherwise follow the procedure below.
Procedure:
almdisplay -v / almdisplay res | more
status media-processor board <board location> (check whether the Ethernet and MPCL links are up and whether all DSP channels are in the in-service/idle or busy state)
test port <port location> (check that all tests against the port pass)
test board <Medpro board location> (check that all tests for the board pass)
display errors (check for any errors against the MEDPROPT to identify the cause)
test board <Medpro board location> long r 5 (to execute the long test-board sequence five times)
busyout-release port <port location> (soft reset of the MEDPROPT)
Procedure:
almdisplay -v / almdisplay res | more
display ip-interface <board location> (verify that the Ethernet port is enabled and set to 100 Mbps, full duplex, with autonegotiation disabled)
test board <board location> (check that all tests pass)
display errors (check for any errors against the VAL board to identify the cause)
ping <ip-address of VAL board> (check whether the server is able to ping the VAL board)
If the alarm is active, ask the Customer to check and confirm the network integrity to the VAL board's Ethernet port; if the Customer confirms everything is fine, follow the procedure below with the required permission:
busyout board <board location>, followed by reset board and then release board (i.e., if the alarm is active, try resetting the VAL board)
Get the board re-seated, either with the help of the Customer or by sending a technician on-site.
If the alarm still does not clear, try inserting the VAL board into another slot and check. If the alarm clears, replace the carrier; otherwise replace the circuit pack.
Note 1: While resetting an IP-interface board through the SAT prompt, first busyout the board, then disable the Ethernet interface with change ip-interface <board location>. Re-enable the interface after resetting the board, and only then release it.
Note 2: Before resetting or re-seating the VAL board, it is recommended to take a backup of the announcements present on the board, because announcement files can sometimes get erased. Confirm this with the Customer, since they may already have a scheduled VAL backup in place.
In case we are required to take the announcements backup, follow the procedure below:
Val-Backup Procedure:
list directory board <VAL board location> (run on the SAT prompt; this lists all the announcement files present on the VAL board)
enable filexfer (run on the SAT prompt; here you define a login/password, set secure to no, and mention the VAL board location)
sudo ftpserv on (run on the shell prompt; this turns on the FTP service so that we can FTP to the VAL board from the server)
ftp <ip-address of VAL board>
bin (to set binary transfer mode)
hash (to show file-transfer progress)
prompt (to get more than one file with one command)
mget .* (gets all the files from the VAL board onto the server)
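The same backup can be scripted instead of typed interactively. The sketch below mirrors the manual bin/mget steps using Python's ftplib; the FTP session (host and the filexfer login/password) and the destination directory are placeholders, not values from this guide.

```python
# Sketch: pull every announcement file off a VAL board over an open FTP session.
# The session object, credentials, and destination directory are hypothetical.
import ftplib
import os

def backup_val_announcements(ftp, dest_dir="val_backup"):
    """Download every file listed by the VAL board via an open FTP session."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for name in ftp.nlst():                 # list announcement files on the board
        path = os.path.join(dest_dir, name)
        with open(path, "wb") as f:
            # binary RETR corresponds to the manual "bin" + "mget" steps
            ftp.retrbinary(f"RETR {name}", f.write)
        saved.append(path)
    return saved

# Typical use (placeholders for the login defined with "enable filexfer"):
# with ftplib.FTP("<ip-address of VAL board>") as session:
#     session.login("<login>", "<password>")
#     backup_val_announcements(session)
```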
Probable Cause: Bad health of the SNI board or SNC board, or of the fiber link associated with the SNI-BD.
Alarm Description:
Alarm->MaintObject_SN-CONF _Location_X
Description: A switch node carrier contains:
Up to 16 Switch Node Interface (SNI) TN573 circuit packs in slots 2 through 9 and slots 13
through 20
One or two Switch Node Clock (SNC) TN572 circuit packs in slots 10 and 12
An Expansion Interface (EI) TN570 circuit pack, a DS1 Converter (DS1C) TN574 circuit
pack, or no circuit pack in slot 1
An optional DS1 converter circuit pack in slot 21
Procedure:
almdisplay -v / almdisplay res | more
test board X (check whether any test fails)
list fiber-link (to identify the fiber link and the other endpoint to which it is connected)
test fiber-link Y (check whether any test fails)
display errors (check for any errors against the board)
clear firmware-counters <location> (SNC firmware generates error reports independently of demand tests, so test board X does not affect the firmware error status; this command must be executed to clear any firmware-generated errors unconditionally)
Inform the Customer and verify that the fiber link is physically connected to the SNI board and that the other endpoint of the fiber link is properly administered.
If the alarm does not clear, inform the Customer and, with the required permission, go ahead with a soft reset (i.e., busyout, reset, then release the board), followed by a reseat of the board (i.e., removing the board from the slot and re-inserting it); if the alarm still does not clear, replace the board.
Probable Cause: SN-CONF errors and alarms are generated for two types of failures:
Failure of an SNI or SNC board, OR
A fiber link administered on CM with no physical connectivity between its two endpoints (i.e., between two SNIs, between two EIs, between an SNI and an EI, or between a DS1C and an SNI/EI), OR
Two endpoints that are physically connected but not administered in CM software.
Alarm Description:
Alarm->MaintObject_SNC-LINK _Location_X
Alarm->MaintObject_SNC-BD _Location_X
Alarm->MaintObject_SNC-REF _Location_X
Description: The Switch Node Clock (SNC) TN572 circuit pack is part of the Center Stage Switch (CSS)
configuration. It resides in a switch node carrier that, alone or with other switch node carriers, makes up a CSS. In a
high-reliability system (duplicated server and control network, unduplicated PNC), each SNC is duplicated
such that there are two SNCs in each switch node carrier. In a critical-reliability system (duplicated server,
control network, and PNC), each switch node is fully duplicated, and there is one SNC in each switch node
carrier. SNCs are placed in slots 10 and 12 of the switch node carrier. These are the alarms associated with
SNC circuit pack:
-The SNC-LINK MO reports errors in communications between the active Switch Node Clock and Switch
Node Interfaces over the serial channel (Aux Data 1) and the TPN link (Aux Data 2).
-The SNC-BD MO covers general SNC board errors and errors with the serial communication channel
between the active and standby SNCs.
-The SNC-REF MO reports errors in SNI reference signals detected by the active Switch Node Clock.
Note: If any alarm related to SNI-BD, SNI-PEER, a fiber link, or DS1C-BD is present, follow the corresponding repair procedures first.
Procedure:
almdisplay -v / almdisplay res | more
test board X (check whether all tests pass)
display errors (check for any errors against the board)
clear firmware-counters <location> (SNC firmware generates error reports independently of demand tests, so test board X does not affect the firmware error status; this command must be executed to clear any firmware-generated errors unconditionally)
If the alarm does not clear, inform the Customer and, with the required permission, go ahead with a soft reset (i.e., busyout, reset, then release the board), followed by a reseat of the board (i.e., removing the board from the slot and re-inserting it); if the alarm still does not clear, replace the board.
Probable Cause: The alarm may be reported due to:
Bad health of any of the hardware components mentioned in the note above, OR
Bad health of the SNC board, OR
A configuration issue after a new installation or some change activity at the customer site.
Alarm Description:
Alarm->MaintObject_EXP-INTF_Location_X
Description: The TN570 or TN776 Expansion Interface (EI) circuit pack provides a TDM- and packet-bus-to-fiber interface for the communication of signaling information, circuit-switched connections, and packet-switched connections between endpoints residing in separate PNs. EI circuit packs are connected via optical fiber links.
Note: If any alarm related to the IPSI acting as archangel, the fiber link, the TDM bus, or the Tone-Clock is present, follow the corresponding repair procedure first to resolve that alarm.
Procedure:
almdisplay -v / almdisplay res | more
status cabinet X (to check the status of connectivity of the EPN)
Alarm Description:
Alarm->MaintObject_EXP-PN_Location_PN X
Description: The EXP-PN MO is responsible for overall maintenance of an Expansion Port Network (EPN)
and monitors cross-cabinet administration for compatible companding across the circuit-switched connection. The
focus of EPN maintenance is on the EI or IPSI circuit pack that is acting as the Expansion Archangel link in an EPN.
Note: If alarms involving the EI board, the IPSI board acting as Expansion Archangel, or any of the hardware involved with the CSS (such as SNI-BD, SNC-BD, DS1C-BD, or a fiber link) are present, those alarms need to be repaired first.
Procedure:
almdisplay -v / almdisplay res | more
status port-network X (check the status of the EPN)
status sys-link <EI slot location> (to identify which IPSI is controlling the EPN, and to check whether any other alarm is present for the identified IPSI and/or for the EI circuit pack; if yes, follow the corresponding procedure to resolve the alarm)
If the alarm is still active and the corresponding IPSI and EI circuit packs are fine, inform the Customer about the alarm and, with the required permission, follow the procedure below:
display fiber-link X (to identify the endpoints and check whether any alarm is present for either endpoint; if yes, follow the corresponding repair procedure to resolve the alarm)
test fiber-link X (check whether all tests pass for the fiber link)
display errors (to identify the cause of the alarm)
busyout and release the fiber link (i.e., a soft reset of the fiber link, with the Customer's permission)
If the alarm is still active and the corresponding endpoints are fine, ask the Customer to check the physical connectivity, i.e., that the fiber link is properly terminated onto the endpoints, and to check the fiber cable to ensure it has no cuts.
Probable Cause: The alarm may be reported due to:
Bad health of the endpoints connected through the fiber link (i.e., an SNI/EI/DS1C board, depending on the solution deployed at the customer site), OR
A physical connectivity issue (i.e., the fiber link is not properly terminated onto the endpoints, or is broken in between), OR
A configuration issue after a new installation or some change activity at the customer site.
Alarm Description:
Alarm->MaintObject_DS1C-BD_Location_X
Description: The DS1 converter complex is part of the port-network connectivity (PNC) consisting of two
TN574 DS1 Converter or two TN1654 DS1 Converter circuit packs connected by one to four DS1 facilities. It
is used to extend the range of the 32-Mbps fiber links that connect PNs to the Center Stage Switch, allowing
PNs to be located at remote sites. The DS1 converter complex can extend a fiber link between two EIs or
between a PN EI and an SNI. Fiber links between two SNIs or between a PN and the Center Stage Switch
(CSS) cannot be extended.
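The extension rule above (a DS1 converter complex can extend an EI-to-EI or EI-to-SNI fiber link, but not an SNI-to-SNI link) can be captured as a small check. The helper below is illustrative only; the rule it encodes comes from the description.

```python
# Sketch of the fiber-link extension rule described above.
# Endpoint types: "EI" (Expansion Interface) or "SNI" (Switch Node Interface).

def ds1c_can_extend(end_a: str, end_b: str) -> bool:
    """Can a DS1 converter complex extend the fiber link between these endpoints?"""
    allowed = {frozenset({"EI"}), frozenset({"EI", "SNI"})}
    return frozenset({end_a, end_b}) in allowed

print(ds1c_can_extend("EI", "EI"))    # True  - two EIs
print(ds1c_can_extend("EI", "SNI"))   # True  - PN EI to an SNI
print(ds1c_can_extend("SNI", "SNI"))  # False - cannot be extended
```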
Note: If SYNC, TDM-CLK, SNC-BD, SNI-BD, Fiber-Link, or DS1-FAC alarms are present, follow the corresponding repair procedures first. If only a DS1C-BD alarm is present, follow the procedure below.
Procedure:
A. If the alarm is off-board:
almdisplay -v / almdisplay res | more
display errors (to identify the cause of the alarm, i.e., TDM-CLK/SYNC/SNC-BD errors, fiber-link errors, or DS1-facility errors)
If the errors are associated with the DS1 facility, follow steps 3 & 9, with the Customer's permission:
busyout & release ds1-facility <ds1c-board location> (if errors associated with the DS1 facility are present, perform a soft reset of the DS1 facility)
If the errors are associated with synchronization or the fiber link, follow steps 4 & 9:
status synchronization (to check the synchronization status; the fiber link could be the source of a synchronization issue in an EPN)
list fiber-link (to identify the corresponding fiber-link number and the far endpoints of that fiber link)
test fiber-link X (check whether any test fails; if so, follow the corresponding repair procedure)
test board <DS1C board location> (check whether any test fails)
If the alarm does not clear, inform the Customer and, with the required permission, go ahead with a soft reset (i.e., busyout, reset, then release the board), followed by a reseat of the board (i.e., removing the board from the slot and re-inserting it); if the alarm still does not clear, replace the board.
Probable Cause: The alarm may be reported due to:
A synchronization issue, or malfunctioning of the TDM-CLK or fiber link for an EPN, OR
An issue with the DS1 facility provided by the DS1C board, OR
Bad health of the DS1C board, OR
A configuration issue after a new installation or some change activity at the customer site.
B. If the alarm is on-board:
almdisplay -v / almdisplay res | more
test board <DS1C board location> (check whether any test fails)
display errors (to identify the cause)
If the alarm does not clear, inform the Customer and, with the required permission, go ahead with a soft reset (i.e., busyout, reset, then release the board), followed by a reseat of the board (i.e., removing the board from the slot and re-inserting it); if the alarm still does not clear, replace the board.
Probable Cause: The alarm may be reported due to:
Bad health of the DS1C board, OR
A configuration issue after a new installation or some change activity at the customer site.
Alarm Description:
Alarm->MaintObject_TDM-BUS_Location_PN X
Description: Each port network has a pair of TDM buses, designated TDM bus A and TDM bus B, each with 256 time slots. This division allows for duplication of control channels and dedicated tone time slots. The first five time slots on each bus are reserved for the control channel, which is active on only one bus at a time in each port network. The next 17 time slots are reserved for system tones such as dial tone, busy tone, and so on. As with the control channel, these time slots are active on only one bus, A or B, at a time. The rest of the time slots on each bus are for general system use, such as carrying call-associated voice data. The 17 dedicated tone time slots that are inactive can also be used for call processing when every other available time slot is in use. When the system initializes, the control channel is on TDM bus A and the dedicated tones are on TDM bus B in each port network. If a failure occurs on one of the two buses, the system switches any control, tone, and traffic channels to the other bus. Service is still provided, though at reduced capacity.
TDM-bus faults are usually caused by one of the following:
A defective circuit pack connected to the backplane
Bent pins on the backplane
Defective bus cables or terminators
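The time-slot budget described above works out as a simple subtraction. A minimal sketch using only the figures from the text (the constant and function names are ours):

```python
# Illustrative accounting of the TDM-bus time slots described above.
# Figures (256 slots per bus, 5 control, 17 tone) come from the text.

TOTAL_SLOTS = 256
CONTROL_SLOTS = 5   # reserved for the control channel
TONE_SLOTS = 17     # reserved for system tones (dial tone, busy tone, ...)

def general_use_slots() -> int:
    """Slots left for general use (call-associated voice data) on one TDM bus."""
    return TOTAL_SLOTS - CONTROL_SLOTS - TONE_SLOTS

print(general_use_slots())  # 234
```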
Procedure:
almdisplay -v / almdisplay res | more
status port-network X
test tdm port-network X (to check that all tests pass for the TDM bus in the port network)
If the alarm is active, follow the procedure below to isolate and detect the TDM-bus fault. Always inform the Customer about the issue and the plan of action stated below before proceeding, and always have an SFM or TM involved in the execution:
Step 1: Check for any active alarms for the Tone-Clock/Detector board, the Expansion Interface (EI) board, the Packet Interface (IPSI) board, or any other TN circuit pack. Follow the corresponding procedure to resolve those alarms, then check the TDM-Bus alarm; if it has cleared, close the case.
Step 2: If no active alarm is present for the Tone board, the EI board, the IPSI board, or any other circuit pack, then:
a) If a duplicated circuit pack is present, switch the standby circuit pack to active and check the alarm. If the alarm is resolved, remove the now-standby circuit pack and check the backplane pins. If they are bent, switch off the power to the carrier, straighten or replace the pins, re-insert the circuit pack, and restore the power.
b) Try removing the circuit packs in the Port Network one by one, in order of the criticality of each circuit pack's function. This means the IPSI/EI board should be removed last, and the Tone-Clock board last but one (removing these circuit packs disconnects the corresponding Port Network).
c) When a circuit pack is removed, determine whether the backplane pins in the slot appear bent. If yes, switch off the power to the carrier, straighten or replace the pins, then re-insert the circuit pack and restore the power. If the backplane pins are not bent, re-insert the circuit pack.
d) If all the circuit packs have been checked as above and the alarm is still active, try replacing the TDM cable assemblies and TDM bus terminators, and then, if required, replace the carrier itself.
Probable Cause: The alarm may be reported due to:
Control of system tones being switched from one bus to the other, OR
Bad health of the circuit pack providing Tone-Clock functions, OR
A physical connectivity issue, i.e., the TDM cable assemblies, TDM bus terminators, or the backplane pins that connect to the circuit pack inside the slot.
Alarm Description:
Alarm->MaintObject_POW-SUP_Location_X
Description: This MO verifies the physical presence of each power supply in a G650 and checks that its output voltage is within tolerance.
Procedure:
almdisplay -v / almdisplay res | more
test board <Power Supply Board Location> (check whether any test fails)
status environment <Cabinet No.> (check the environment of the cabinet)
test environment <Cabinet No.> (check whether all tests pass)
display errors (and select board) (check for errors to find the cause of the alarm)
Inform the Customer and, with the required permission, go ahead with a soft reset, followed by a reseat of the board and then, if required, replacement.
Probable Cause: Bad health of the power supply board, OR an issue with the power being delivered to the board.
Alarm Description:
Alarm->MaintObject_M/T-BD / MT-ANL/ M/T-DIG/ M/T-PKT
Description: The Maintenance/Test circuit pack (TN771D) supports packet-bus fault detection and bus reconfiguration for the port network where it is installed. The circuit pack also provides analog trunk testing and data loop-back testing of DCP Mode 2 endpoints and digital (ISDN) trunk facilities via the TDM bus.
Port 1 of the Maintenance/Test board is the analog test port, which provides the analog trunk testing function for the Automatic Transmission Measurement System (ATMS). M/T-ANL maintenance ensures that the analog trunk testing function is operating correctly.
Ports 2 and 3 are the digital ports, which provide the digital (ISDN) trunk-testing functions. M/T-DIG maintenance ensures that the digital trunk testing function is operating correctly.
Port 4 is the packet port, which provides the packet-bus maintenance functions: packet-bus fault detection and packet-bus reconfiguration.
Procedure:
almdisplay -v / almdisplay res | more
test board <board location> (check whether any test fails)
display errors
test board <board location> long
If the alarm does not clear, inform the Customer and, with the required permission, go ahead with a soft reset (i.e., busyout, reset, then release the board), followed by a reseat of the board (i.e., removing the board from the slot and re-inserting it); if the alarm still does not clear, replace the board.
Probable Cause: Bad health of the Maintenance/Test port or board.
Alarm Description:
Alarm->MaintObject_PS-RGEN_Location_X
Alarm->MaintObject_RING-GEN_Location_X
Understanding: The PS-RGEN maintenance object monitors the ringing voltage of each 655A power supply. The
TN2312BP IPSI uses the ring detection circuit on the 655A to monitor ring voltage for the G650. Failure of the ring
generator results in loss of ringing on analog phones. Ringing on digital and hybrid phones is not affected.
Procedure:
almdisplay -v / almdisplay res | more
test board <Power Supply Board Location> (check whether any test fails)
status environment <Cabinet No.> (check the environment of the cabinet; the results should show OK)
test environment <Cabinet No.> (check whether all tests pass)
display errors (and select board) (check for errors to find the cause of the alarm)
Inform the Customer and, with the required permission, go ahead with a soft reset, followed by a reseat of the board and then, if required, replacement.
Probable Cause: Bad health of the power supply board, OR an issue with the power being delivered to the board.
Alarm Description:
Alarm->MaintObject_NR-CONN
Understanding: The Network-Region Connect (NR-CONN) MO monitors VoIP connectivity between network regions using Test #1417, a connectivity test between IP endpoints in separate network regions. A minor alarm is raised for multiple failures: once a single failure is detected, Test #1417 is re-executed between different IP endpoints in the same pair of network regions.
Procedure:
ping ip-address A board B (where A is the IP address of an endpoint in one network region and B is the location of an IP board in the other; these must belong to the alarmed ip-network-regions)
status ip-network-region X
test failed-ip-network-region X (to clear the alarms and/or check whether all tests pass for the ip-network-region)
display errors (to identify the cause of the alarms)
display ip-network-map (to identify and confirm the entry, as required, against the failed ip-network-region, because values here may have been modified after a change/update activity)
Description: The license is either missing or has become corrupted, or the alarm is on an LSP/ESS that is controlling a Media Gateway/Port Network.
Procedure:
If the license is either missing or corrupted:
almdisplay -v / almdisplay res | more
statuslicense -v (check whether the license is corrupted/missing/normal)
Download a license copy from https://rfa.avaya.com onto your laptop (for CM 5.2).
For CM versions later than CM 5.2, download the license from the PLDS.
Stage it onto the server through sig.
Command line on sig:
scp <filename> init@<IP Address of server>:/var/home/ftp/pub
loadlicense <filename> (on the shell prompt of the server; you may have to wait up to 30 minutes until the license comes to Normal status)
Now, open the folder <your avaya handle> and drag and drop the downloaded license file into this location.
Now, open the Desktop folder and drag and drop the license file to the Notes folder under FTP odva.w.ag.60\TSH (this may be different for some).
Now, log in to WebLM with the default user name and password (admin/admin01), or check with the customer if it has been changed. Click on the Install License option and then select Browse.
Now, select the license file which was earlier saved on the FTP server.
Now, click on Install. This will install the required license on the CM.
Now, click on the Communication Manager option and see the new license details.
Note: If you receive any conflict error with the old existing certificate, try uninstalling the old license via the Uninstall License option and then install the new one.
Probable Cause: The alarm could have been reported because:
After an update/upgrade, either no license file was installed or a corrupt license file was installed, OR
The license file got corrupted due to bad health of the server.
Solution: An LSP/ESS has become active.
almdisplay -v / almdisplay res | more
list survivable-processors (check whether the LSP/ESS is registered with the main server)
list media-gateways (to check whether any Media Gateway is not registered to the main server but is registered to an LSP)
Probable Cause: The alarm could have been reported due to:
A LAN issue / power outage at the site, OR
The main server being down, causing PNs and/or MGs to register to the ESS or an LSP.
Alarm Description:
Alarm->MaintObject_ESS_Location_CL 000_OnBoard_N
Description: One or more IPSIs are not pingable from the ESS server, or the ESS server is not able to detect the serial number of an IPSI.
Procedure:
almdisplay -v / almdisplay res | more
pingall -i (check whether all IPSIs are pingable)
cd /var/log/ecs
grep -R sanity <filename>
serialnumber (check whether the serial numbers of all IPSIs are being detected by the server)
netstat -v | grep "5010" (check whether the TCP link is established between the server and the IPSI)
ipsiversion -a (check the firmware version of the IPSI and its compatibility with the CM load on the server)
Probable Cause: The common cause is that the ESS is either not able to ping one of the IPSIs or not able to detect the serial number of an IPSI, possibly because of:
A LAN issue, OR
An IPSI firmware mismatch with the CM release, OR
Bad health of an IPSI.
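The server-to-IPSI control-link check above (netstat on TCP port 5010) can also be approximated from a script. A minimal sketch, assuming direct TCP reachability from the server; the IPSI addresses in the commented usage are placeholders, and the helper is illustrative, not an Avaya tool:

```python
# Sketch: check server-to-IPSI control-link reachability on TCP port 5010,
# mirroring the manual pingall/netstat checks above.
import socket

def ipsi_reachable(host: str, port: int = 5010, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the IPSI control port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Usage with placeholder IPSI addresses:
# for ip in ("192.168.1.10", "192.168.1.11"):
#     print(ip, "ok" if ipsi_reachable(ip) else "NOT reachable")
```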
Alarm Description:
Alarm->MaintObject_ESS_EventID_X (where X=1,2,3or4)
Description:
Procedure:
almdisplay -v / almdisplay res | more
status ess port-networks (check whether any port network is being controlled by an ESS)
traceroute <ip-address of IPSI> (traceroute, from the main server, only those IPSIs which are not pingable)
get forced-takeover ipserver-interface port-network X (if the IPSI is pingable from the main server but is being controlled by an ESS, then, with the customer's permission, force control of the IPSI back to the main server)
cd /var/log/ecs
grep -R sanity <filename> (if the IPSI is pingable and being controlled by the main server, check for sanity failures, if any, which could be the cause of the alarm)
If the IPSI is not pingable, check with the customer for a network issue, if any.
Note: For EventID_3/EventID_4, either the IPSI controlling the EI link of an EPN is registered to the ESS server, or there could be some issue with the fiber link. In case no issue is found with the IPSI, check for fiber-link issues and continue with the steps below, which apply only to EventID_3/EventID_4:
cat /var/log/messages | more (to check for any fiber-link issue traces)
list fiber-link (get the details of the fiber links)
test fiber-link X (check whether any test fails)
status sys-link <slot location of the EI board> (check which IPSI is controlling the alarmed EPN)
list ipserver-interface (check for any errors on the IPSI, i.e., CPEG, and the IP address of the IPSI)
display errors
restartcause
Probable Cause:
A LAN issue, OR
Bad health of the main server, OR
An issue with the physical connections of the fiber link (only for EventID_3 and EventID_4).
Alarm Description:
Alarm->MaintObject_ESS_EventID_5
Alarm->MaintObject_ESS_EventID_6
Description: The Enterprise Survivable Server cluster is not registered, i.e., the ESS is not registered to the main server (EventID_5), or it has registered back to the main server (EventID_6).
Procedure:
almdisplay -v / almdisplay res | more
status ess cluster
list survivable-processor
cd /var/log
grep -R register messages (to check which cluster is/was not registered)
Probable Cause: The alarm could have been reported because of:
A LAN issue / power outage at the site, OR
The ESS server being down due to its bad health.
display ip-services (to find the node name used for CDR and the associated C-LAN board)
display node-names ip <node names found in the above step> (to get the IP addresses of the CDR and the C-LAN board)
test board <C-LAN board location> (check for any failing test and/or any active alarm for the C-LAN board; if yes, proceed further with investigation of the C-LAN board, as discussed in the respective section)
ping <ip-address of CDR> board <C-LAN board location (to which that CDR is connected)> (check for any issues with the LAN connectivity)
display errors (to identify the cause)
Alarm Description:
Alarm->MaintObject_ADJ-IP_Location_X
Alarm->MaintObject_AESV-SES_Location_X
Alarm->MaintObject_ASAI-IP_Location_X
Description: ASAI-IP corresponds to fault detection on a CTI link connected to an adjunct that is not of Avaya make, whereas ADJ-IP corresponds to an adjunct of Avaya make.
Procedure:
almdisplay -v / almdisplay res | more
test cti-link X
status aesvcs link (to check the remote IP and the local node name of the C-LAN board, and to identify the AE Services server number)
ping ip-address <Remote IP> board <C-LAN board location> (to check for LAN connectivity issues)
test aesvcs-server <aesvcs server number>
Alarm Description:
Alarm->MaintObject_UDS1-BD / MG-DS1
Understanding: The alarm refers to the DS1 Interface circuit packs, UDS1 Interface circuit
packs, and DS1 Interface Media Modules.
Solution:
Check for the alarm on the active server: almdisplay -v (to display active alarms).
If the alarm is not displayed in the active alarms, check the resolved alarms:
almdisplay res | more (displays resolved alarms page by page; use the spacebar to go to the next page).
Log in to autosat.
test board <board number>
Check whether any test fails; a board with no issues shows all tests passing.
Check status trunk X (the trunk number can be found by displaying any of the ports on the board).
Check whether the service state shows In-Service/idle or Out-of-Service; a trunk with no issues shows all members In-Service/idle.
list measurement ds1 log <Board location> (check whether any slip errors are present).
Example sync status output when the primary source has failed:
SOURCE      MM or VoIP   STATUS    FAILURE
Primary     v2                     <failure reason>
Secondary   v3           Active    None
Local       v0           Standby   None
The reason for a failed source can be: Loss of Signal, Locked Out, etc.
There needs to be an explicit sync source on an MG, i.e., a DS1 board; otherwise it defaults to the internal clock of the MG/PN. Please take permission from the customer before making this change, although it is not service-impacting.
Define which interfaces are primary and secondary (depending on the number of DS1 boards in the MG):
set sync interface primary <mmid>, e.g., set sync int pri v2
set sync interface secondary <mmid>, e.g., set sync int sec v3
Specify the source to use: set sync source <primary | secondary>, e.g., set sync source primary
Display the current selections and their status: show sync timing
The following is an example of the output when the sync source is not defined:
MG34(develop)# show sync timing
SYNCHRONIZATION CONTROL: --- Local ---
SOURCE      MM or VoIP   STATUS           FAILURE
---------   ----------   --------------   -------
Primary                  Not Configured            >> DS1 board should be set up as primary source
Secondary                Not Configured
Local       v0           Active           None
Active Source: v0
Done!
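When the sync output is captured to a file for review, the "Not Configured" rows can be picked out mechanically. A minimal sketch, assuming each role (Primary/Secondary) and its status share one line of the captured "show sync timing" output:

```python
# Illustrative parser for captured `show sync timing` output.
# Assumes each role and its status appear on the same line.

def sync_sources_missing(show_sync_output):
    """Return the sync roles reported as Not Configured."""
    missing = []
    for line in show_sync_output.splitlines():
        stripped = line.strip()
        for role in ("Primary", "Secondary"):
            if stripped.startswith(role) and "Not Configured" in stripped:
                missing.append(role)
    return missing
```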
If slip errors are present and consistent, notify the customer and ask them to follow up with the
service provider to clear the slip errors. Monitor the board until the slip errors clear.
If the alarm does not clear, inform the customer and, with the required permission, go ahead with a soft
reset (i.e., busyout, reset, and then release the board), followed by a reseat of the board (i.e.,
removing the board from the slot and re-inserting it). If the alarm still does not clear, replace
the board.
Probable Cause: Bad health of the DS1 media module OR any activity at the customer site OR a physical
connectivity issue.
Alarm Description:
Alarm->MaintObject_ANL-BD - (Analog TN Circuit-Pack).
Description: The alarm is related to an analog board, which can be used for analog stations as well as to terminate
analog trunks. Types of analog boards:
1. ANL-16-L (16-Port Analog Line)
The circuit packs listed below provide 16 analog line ports for single-line voice terminals.
The TN746, TN468, and TN749 support only single-line, on-premises/in-building, analog
voice terminals, and not off-premises stations, since these circuit packs are not equipped
with lightning protection.
The TN746B, TN2144, TN2149, and TN468B support both on-premises and off-premises
(that is, out-of-building) analog voice terminals.
Note: The TN746 and TN746B support the neon message-waiting feature.
2. ANL-LINE (8-Port Analog Line)
The TN411, TN443, TN467, TN712, TN742, and TN769 circuit packs provide 8 analog line ports for single-line,
on or off-premises analog endpoints such as analog voice terminals, queue warning level lamps, recorded
announcements, dictation machines, PAGEPAC paging equipment, external alerting devices, modems, fax
machines, and AUDIX voice ports.
Procedure:
almdisplay v / almdisplay res | more
test board <board location> (check whether any test fails)
display errors (to identify the cause of the issue)
If the alarm does not clear, inform the customer and, with the required permission, go ahead with a soft
reset (i.e., busyout, reset, and then release the board), followed by a reseat of the board (i.e.,
removing the board from the slot and re-inserting it). If the alarm still does not clear,
replace the board.
Probable Cause: Bad health of analog board OR any activity at customer site OR physical connectivity issue.
Alarm Description:
Alarm->MaintObject_BRI-BD /MG-BRI/TBRI-BD
Description:
BRI-BD BRI Circuit pack
The TN556, TN2198, and TN2208 ISDN-BRI Line circuit packs are packet port circuit packs that provide access to
ISDN-BRI endpoints. The ISDN-BRI Line circuit packs support 12 ports, each of which provides access to ISDN
stations. Voice and circuit-switched data from the ISDN stations are carried on the Time Division Multiplex (TDM)
bus. Signaling is carried over the packet bus. The TN2208 LGATE MFB provides the system with the interface to the
Adjunct-Switch Application Interface (ASAI) and Avaya adjuncts (for example, the CONVERSANT Voice System).
Though the TN2208 contains 12 ports for the line circuit interface, only 8 are usable by the switch.
MG-BRI -BRI Media Module
MM720 (8-port 4-wire BRI Trunk/Line Media Module),
MM722 (2-port 4-wire BRI Trunk Media Module),
TIM521 (4-port BRI Trunk Media Module in a Juniper Media Gateway) &
VMM_2BRI (2 port trunk-side integrated BRI Media Module)
The above Media Modules provide access to ISDN-BRI endpoints, where each port supports two B-channels of
64 kbps each and one D-channel of 16 kbps, carried on 144 kbps.
TBRI-BD Trunk-Side BRI Circuit-Pack/ Media Module
The TBRI-PT maintenance object is a port on both the TN2185 Trunk-Side BRI circuit pack and
the MM720 BRI Media Module. The TN2185 circuit pack and the MM720 Media Module contain eight 4-wire ports that
interface to the network at the ISDN S/T reference point over two 64-kbps channels (B1 and B2) and one 16-kbps
signaling (D) channel. The B1 and B2 channels can be simultaneously circuit-switched or individually
packet-switched; only one channel per trunk can be packet-switched due to PPE (Packet Processing Element) limitations.
The D channel is either circuit- or packet-switched.
Note: If any PKT-BUS alarm is present along with BRI-BD, follow the corresponding procedure to clear the
PKT-BUS alarm first and, if still required, follow the procedure below.
Procedure:
almdisplay v / almdisplay res | more
test board <board location> (check whether any test fails)
status trunk X (the trunk number can be found by displaying any port on the board; also, when
you run test board X, the trunk number is shown against the test of each port).
Alarm->MaintObject_BRI-PORT_Location_X
Alarm->MaintObject_TBRI-PT_Location_X
Description: Some results of maintenance testing of ISDN-BRI ports may be affected by the health of the
ISDN-BRI Line circuit pack (BRI-BD), the BRI endpoint (BRI-SET), the ASAI adjunct (ASAI-ADJ/LGATE-ADJ/LGATE-AJ), or
the Avaya adjunct (ATT-ADJ/ATTE-AJ). Keep these interactions in mind when
investigating the cause of ISDN-BRI port problems.
Note: It is quite possible that PKT-BUS, BRI-BD, or adjunct alarms are the cause of the
BRI-PT alarms. If such an alarm also exists, follow the
corresponding action to resolve it as discussed in the respective section.
Procedure:
Alarm->MaintObject_CO-TRK
Understanding: Analog CO trunks are 2-wire analog lines to the CO that support both incoming and outgoing calls.
CO trunk circuit packs have eight ports, each of which provides an interface between the 2-wire CO line and the
4-wire TDM bus.
Note: If any Tone-BD, Tone-Clk, or TDM-Bus alarm is present, it could be the cause of the
CO-TRK alarm; check the corresponding action to resolve those alarms as
discussed in the respective section.
Procedure:
almdisplay v / almdisplay res | more
test board <board location> (check whether any test fails for the board/port)
status trunk X (the trunk number can be identified against the test of each port, as seen in the
output of step 2)
test analog-testcall full (check whether all tests related to ATM transmission pass)
busyout and release the trunk/port (i.e., a soft reset of either the trunk or the port)
If the alarm is still active, inform the customer and, with the required permission, go ahead with a soft
reset (i.e., busyout, reset, and then release the board).
If the alarm does not go off, replace the Media Gateway.
Note: If test 3 fails, check with the customer whether the analog connection is in use
or has been de-activated. If the customer replies that it needs to be functional, ask the customer to detach the
analog line from the analog port and run test board <board location>. If the test passes, follow
up with the service provider for resolution; if test 3 still fails, replace the circuit pack.
Probable Cause: The alarm may be reported due to:
A physical connectivity issue or some activity at the customer site OR
An issue at the service provider side OR
Bad health of the analog port/board where the CO trunk is terminated
Alarm Description:
Alarm->MaintObject_ISDN-SGR / ISDN-TRK
Understanding: An ISDN-PRI Signaling Group is a collection of B-channels for which a given ISDN-PRI
Signaling Channel Port (D-channel) carries signaling information. B-channels carry voice or data and can be
assigned to DS1 ISDN trunks (ISDN-TRK).
Procedure:
1. almdisplay v / almdisplay res |more
2. status signaling-group X (check whether signaling group is functional or not)
Alarm Description:
Alarm->MaintObject_H323-SGR
Understanding: The H.323 signaling group (H323-SGR) is a signaling channel that physically resides on a C-LAN port
(socket) and the IP network. The MEDPRO circuit pack provides audio connectivity, working in
concert with a C-LAN (TN799DP) circuit pack that provides control signaling to support an H.323 connection. Unlike
ISDN D-channels, the H.323 channel may come up and down on a call-by-call basis. The H.323 channel is
actually a TCP/IP signaling channel.
Procedure:
almdisplay v / almdisplay res | more
status signaling-group X (check whether the signaling group is functional or not)
ping <ip-address of far-end node-name> board <C-LAN location from Step 6> (to confirm the
network integrity between the two nodes)
status media-processor board <MedPro board location in the same cabinet as that of the
C-LAN board found in Step 6> (to confirm adequate DSP resources)
Note: If any Clan-BD or Medpro-BD alarm is present, it could be the cause of the SGR alarm; check
the corresponding action to resolve those alarms as discussed in the respective section.
Probable Cause: The alarm may be reported due to:
A network issue between the two endpoints of the signaling group OR
Bad health of the C-LAN board (near end, where the signaling group is terminated) OR
Bad health of the far-end node-name
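Alongside the SAT-based checks, basic IP reachability of the far end can be probed from any host with network access. A minimal sketch, assuming the far end listens for H.225 call signaling on the conventional TCP port 1720 (an assumption; this does not replace the SAT ping ... board test, which sources the ping from the C-LAN itself):

```python
import socket

def far_end_reachable(host, port=1720, timeout=3.0):
    """Best-effort TCP probe of the far-end H.323 signaling port.
    Port 1720 (H.225 call signaling) is an assumption here; a closed
    port and an unreachable network both return False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```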
Alarm Description:
Note: This is only a notification alarm, which informs that the system load usage is more than 70%.
Alarm Description:
HARD DISK
Description: The majority of S8800 servers included the OEM MR10i RAID controller card with 256 MB Battery
Backed Write Cache for the application. The RAID battery used in these RAID controller cards is a Lithium Ion
battery. Over time, in normal operation, the RAID battery will have a reduced charge. The charging system in the
MR10i controller card extends the life of the battery but is not able to do so indefinitely.
Once the RAID battery falls below a minimum level of voltage output, it will no longer power the cache. When that
happens, the RAID controller switches from the default Write-Back to a Write-Through mode of operation. For
applications like Communication Manager, there will be no general performance degradation, although backup
time may be increased by slower read-write activity.
If the RAID battery is completely exhausted, the server is exposed to a commercial power-source failure, and no
Uninterruptible Power Supply (UPS) is being used, there is a chance that disk file corruption will occur if a write
operation is taking place at the time of the power interruption.
Procedure:
Log in as root.
To get the default cache policy: /opt/MegaRAID/MegaCli/MegaCli -LdPdInfo -aALL | egrep -i Default
To get the current cache policy: /opt/MegaRAID/MegaCli/MegaCli -LdPdInfo -aALL | egrep -i Current
If the current policy is WriteThrough:
raid_status -p -v (check the error counts for Media and Other; also check for any predictive errors)
Note: If the Current Cache Policy shows WriteThrough, the above output shows any of the errors, and the Battery
Replacement required option is Yes, notify the customer to change the RAID battery for that
particular device slot.
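The two MegaCli greps above can be checked mechanically once their output is captured. A minimal sketch, assuming the usual "Default Cache Policy: ..." / "Current Cache Policy: ..." line format of MegaCli output:

```python
# Illustrative check on captured MegaCli -LdPdInfo output.

def cache_policy_degraded(megacli_output):
    """True when the Current Cache Policy has fallen back to
    WriteThrough while the Default Cache Policy is WriteBack --
    the RAID-battery symptom described above."""
    default_wb = current_wt = False
    for line in megacli_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Default Cache Policy") and "WriteBack" in stripped:
            default_wb = True
        elif stripped.startswith("Current Cache Policy") and "WriteThrough" in stripped:
            current_wt = True
    return default_wb and current_wt
```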
Replacement Information:
The RAID battery is a consumable product on the S8800 server, generally not covered under the maintenance
agreement, and is available for purchase.
The S8800 RAID battery kit is generally available and may be ordered as a miscellaneous part. It
is customer-installable; alternatively, an order for Avaya to install the replacement battery may be placed when
calling to purchase it. (Tech Dispatch)
Switch Alarms
Alarm Description:
PowerSupply_Fault_ExtremeAlpine
Description: Indicates that the state of the power supply for this system is not normal, or that the power
supply has been shut down.
Procedure:
show power (check the power status)
show inline-power (check whether inline-power is enabled)
show log (check logs for any administration work or power outage which may be the cause of the
alarm).
Probable Cause: Some issue with power-supply of the switch.
Alarm Description:
pethPsePortOnOffNotification
Description: This alarm is generated whenever a change is observed on a PoE port of a LAN switch
and/or on a PD (Powered Device, such as an IP phone).
Note: In the heading above, peth means power ethernet, PSE means Power Sourcing Equipment, and port means a
port on the PoE switch.
Procedure:
show power (check the power status)
show port (check whether port is functional or not)
show log (check logs for any administration work or power outage or any other related logs which
could be the cause of the alarm).
Note: Typically no action is required; this is normal behaviour, since most of the time voice
devices such as IP phones are connected to these switches, and unplugging or resetting a phone can
generate this alarm.
Probable Cause: Whenever a change is observed on PoE port of a Lan switch, this alarm is generated.
Alarm Description:
Interface_Fault_MIB2/ ExceededMaximumUptime
Description: Indicates that an interface marked as Backup, Serial or Dial-on-Demand has been active for too
long.
Note: Alarm could be related to media server/switch/media-gateway etc
Procedure:
Log into the device and check the uptime.
If the system is functioning fine, ignore the alarm.
If the alarm is generated frequently, the threshold value for "Maximum uptime" is set
too low and needs to be increased in SMARTS; transfer the ticket to the NA-MS-IPT-AI team to change
the configuration.
Probable Cause: The alarm is reported when a host has not been rebooted for at least X days. The value
of X is defined in SMARTS.
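The uptime check in step 1 can be compared against the SMARTS threshold as sketched below. The parsing assumes the typical Linux `uptime` output format, and the threshold is whatever X is configured in SMARTS.

```python
import re

def uptime_days(uptime_output):
    """Extract whole days from typical `uptime` output such as
    ' 10:15:02 up 142 days,  3:27, ...'. Returns 0 when no 'days'
    field is present (host up for less than a day)."""
    m = re.search(r"\bup\s+(\d+)\s+day", uptime_output)
    return int(m.group(1)) if m else 0

def exceeded_max_uptime(uptime_output, threshold_days):
    """True when the uptime exceeds the 'Maximum uptime' threshold
    (the X configured in SMARTS)."""
    return uptime_days(uptime_output) > threshold_days
```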
Alarm Description:
HighErrorRate
Description: Indicates that the percentage of error packets for either input or output exceeds ErrorThreshold.
Procedure:
show ports rxerror (check the received error count)
show ports txerror (check the transmitted error count)
If errors are present for a specific port, notify the customer's data team by email and
keep the case under monitoring. If the error count increases, call the customer and refer the issue to the
customer's network team.
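The rxerror/txerror counts can be turned into the percentage this alarm is based on as sketched below. The 1% default threshold is an illustrative assumption; the real ErrorThreshold is whatever is configured for the switch.

```python
def error_rate_pct(error_packets, total_packets):
    """Percentage of errored packets; 0.0 when no traffic was seen."""
    return 100.0 * error_packets / total_packets if total_packets else 0.0

def high_error_rate(rx_err, rx_total, tx_err, tx_total, threshold_pct=1.0):
    """True when either direction's error percentage exceeds the
    threshold (illustrative default of 1%)."""
    return (error_rate_pct(rx_err, rx_total) > threshold_pct
            or error_rate_pct(tx_err, tx_total) > threshold_pct)
```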
Probable Cause: When error packets either received at receiver or transmitted by transmitter exceeds its