October 7, 2014
1.1
THE SPECIFICATIONS AND INFORMATION REGARDING THE PRODUCTS IN THIS MANUAL ARE SUBJECT
TO CHANGE WITHOUT NOTICE. ALL STATEMENTS, INFORMATION, AND RECOMMENDATIONS IN THIS
MANUAL ARE BELIEVED TO BE ACCURATE BUT ARE PRESENTED WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED. USERS MUST TAKE FULL RESPONSIBILITY FOR THEIR APPLICATION OF ANY
PRODUCTS.
THE SOFTWARE LICENSE AND LIMITED WARRANTY FOR THE ACCOMPANYING PRODUCT ARE SET
FORTH IN THE INFORMATION PACKET THAT SHIPPED WITH THE PRODUCT AND ARE INCORPORATED
HEREIN BY THIS REFERENCE. IF YOU ARE UNABLE TO LOCATE THE SOFTWARE LICENSE OR LIMITED
WARRANTY, CONTACT YOUR CISCO REPRESENTATIVE FOR A COPY.
The Cisco implementation of TCP header compression is an adaptation of a program developed by the University of
California, Berkeley (UCB) as part of UCB's public domain version of the UNIX operating system. All rights reserved.
Copyright © 1981, Regents of the University of California.
NOTWITHSTANDING ANY OTHER WARRANTY HEREIN, ALL DOCUMENT FILES AND SOFTWARE OF
THESE SUPPLIERS ARE PROVIDED AS IS WITH ALL FAULTS. CISCO AND THE ABOVE-NAMED
SUPPLIERS DISCLAIM ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION,
THOSE OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OR
ARISING FROM A COURSE OF DEALING, USAGE, OR TRADE PRACTICE.
IN NO EVENT SHALL CISCO OR ITS SUPPLIERS BE LIABLE FOR ANY INDIRECT, SPECIAL,
CONSEQUENTIAL, OR INCIDENTAL DAMAGES, INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR
LOSS OR DAMAGE TO DATA ARISING OUT OF THE USE OR INABILITY TO USE THIS MANUAL, EVEN IF
CISCO OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other
countries. To view a list of Cisco trademarks, go to this URL: www.cisco.com/go/trademarks. Third-party trademarks
mentioned are the property of their respective owners. The use of the word partner does not imply a partnership
relationship between Cisco and any other company. (1110R)
Any Internet Protocol (IP) addresses and phone numbers used in this document are not intended to be actual addresses and
phone numbers. Any examples, command display output, network topology diagrams, and other figures included in the
document are shown for illustrative purposes only. Any use of actual IP addresses or phone numbers in illustrative
content is unintentional and coincidental.
© 2014 Cisco Systems, Inc. All rights reserved.
Contents

1. Preface
2. Audience
1.
Preface
This document shows how to monitor QPS with operational trending information, and how to manage QPS based
on system notifications. Proactive monitoring in this way increases service availability and system usability.
Monitoring and alert notifications are provided to a Network Management System (NMS) via standard Simple Network
Management Protocol (SNMP) methodologies.
This preface covers the following topics:
2.
Audience
This guide is best used by these readers:
Deployment engineers
Implementation engineers
Network administrators
Network engineers
Network operators
System administrators
This document assumes a general understanding of network architecture and systems management. Specific
knowledge of SNMP, particularly version 2c, is required. Installation and initial configuration of QPS is a
prerequisite.
3.
3.1.
UCS MANAGEMENT NETWORK: 172.18.169.0/27

Host               IP             Protocol   User
ENTEL-QPS-CDV      172.18.169.5   HTTPS      admin
ENTEL-QPS-CDV-A    172.18.169.6   SSH        admin
ENTEL-QPS-CDV-B    172.18.169.7   SSH        admin

Table 1 Management IP
3.2.
3.2.1.
QNS URLs/IPs
Control Center:
http://172.18.169.164:8090
Unified SuM (deprecated):
http://172.18.169.164/portal/usum
Policy Builder:
http://172.18.169.164:7070/pb
Unified API WSDL:
http://172.18.169.164:8080/ua/wsdl/UnifiedApi.wsdl
HAProxy Status:
http://172.18.169.164/haproxy?stats
172.18.169.160/28 (management network)
172.18.169.161
172.18.169.162
172.18.169.163
lbvip01: 172.18.169.164
lb01: 172.18.169.165
lb02: 172.18.169.166
pcrfclient01: 172.18.169.167
pcrfclient02: 172.18.169.168

Table: IP Network CNT
Networking CNT
(Virtual Machine or VIP | Eth | Management Subnet | Eth | Signalling Subnet; inside firewall, QNS external nets)

lbvip01 (points to external lb01/02): management 172.18.169.164, signalling 172.31.234.68
lbvip02 (points to internal lb01/02): internal 172.18.175.4
Hosts: lb01, lb02, pcrfclient01, pcrfclient02, sessionmgr01 through sessionmgr06, arbiter, qns01 through qns14
Management subnet addresses: 172.18.169.164 through 172.18.169.168, plus 172.18.169.170
Internal addresses: 172.18.175.4 through 172.18.175.29
Signalling subnet addresses: 172.31.234.68 through 172.31.234.91
Figure: QPS CNT (IP Networking CNT). The diagram shows the per-VM address assignments for SM01 through SM06, QNS01 through QNS14, PCRFCLIENT01, PCRFCLIENT02, LB-01, LB-02, LBVIP-01, LBVIP-02, and the ARBITER, each with its internal (172.18.175.x), signalling (172.31.234.x), and, where applicable, management (172.18.169.x) addresses.
3.3.
3.3.5.
QNS URLs/IPs
Control Center:
http://172.18.169.36:8090
Unified SuM (deprecated):
http://172.18.169.36/portal/usum
Policy Builder:
http://172.18.169.36:7070/pb
Unified API WSDL:
http://172.18.169.36:8080/ua/wsdl/UnifiedApi.wsdl
HAProxy Status:
http://172.18.169.36/haproxy?stats
172.18.169.32/28 (management network)
172.18.169.33
172.18.169.34
172.18.169.35
lbvip01: 172.18.169.36
lb01: 172.18.169.37
lb02: 172.18.169.38
pcrfclient01: 172.18.169.39
pcrfclient02: 172.18.169.40

Table 2 IP Network CDV
Networking CDV
(Virtual Machine or VIP | Eth | Management Subnet | Eth | Signalling Subnet; inside firewall, QNS external nets)

lbvip01 (points to external lb01/02): management 172.18.169.36, signalling 172.31.235.68
lbvip02 (points to internal lb01/02): internal 172.18.175.36
Hosts: lb01, lb02, pcrfclient01, pcrfclient02, sessionmgr01 through sessionmgr06, arbiter, qns01 through qns14
Management subnet addresses: 172.18.169.36 through 172.18.169.40
Internal addresses: 172.18.175.36 through 172.18.175.61
Signalling subnet addresses: 172.31.235.68 through 172.31.235.91
Figure: QPS CDV (IP Networking CDV). The diagram shows the per-VM address assignments for SM01 through SM06, QNS01 through QNS14, PCRFCLIENT01, PCRFCLIENT02, LB-01, LB-02, LBVIP-01, and LBVIP-02, each with its internal (172.18.175.x), signalling (172.31.235.x), and, where applicable, management (172.18.169.x) addresses.
4.
4.1.
A Cisco QPS deployment comprises multiple virtual instances deployed for scaling and high availability.
The QPS Systems Monitoring and Notification Alerting system makes the entire QPS installation appear
as a single appliance: rather than having administrators deal with a multitude of device agent endpoints, a single
entry point (LB) is used for NMS operational trending and monitoring. Likewise, notification alerting from the entire
system derives from a single point.
When QPS is deployed in a High Availability (HA) configuration, monitoring and alerting endpoints are deployed as
HA as well. This is shown in the illustration below.
4.2.
Technical Architecture
The Quantum Policy Suite is deployed as a distributed virtual appliance. The standard architecture uses VMware
ESXi virtualization. Multiple physical hardware hosts run VMware ESXi, and each host runs several
virtual machines. Within each virtual machine, one to many internal QPS components can run. If you add HA
capabilities to the deployment, monitoring each QPS component individually becomes unwieldy. The QPS
monitoring and alert notification infrastructure simplifies the virtual, physical, and redundant aspects of the
architecture.
This section covers the following topics:
Facility
Severity
Categorization

4.3.
The QPS monitoring and alert notification infrastructure provides a simple, standards-based interface for network
administrators and NMS. SNMPv2 is the underlying protocol for all monitoring and alert notifications. Standard
SNMPv2 gets and notifications (traps) are used throughout the infrastructure and aggregated to an SNMP proxy. This
proxy provides a common endpoint for SNMP queries and also maps components into the Cisco Object Identifier
(OID) tree structure.
The following drawing shows the aggregation and mapping on the SNMP endpoint (LB).
4.4.
Cisco has a registered private enterprise Object Identifier (OID) of 26878. This OID is the base from which all
aggregated QPS metrics are exposed at the SNMP endpoint. The Cisco OID is fully specified and made human-readable through a set of Cisco Management Information Base (MIB) files.
The current MIBs are defined as follows:
4.5.
MIB Filename
Purpose
BROADHOP-MIB.mib
BROADHOP-QNS-MIB.mib
BROADHOP-NOTIFICATION-MIB.mib
The Monitoring and Alert Notification infrastructure provides standard SNMPv2 get and getnext access to the QPS
system. This provides access to targeted metrics to trend and view Key Performance Indicators (KPI). Metrics
available through this part of the infrastructure are as general as component load and as specific as transactions
processed per second.
SNMPv2 notifications, in the form of one-way traps, are also provided by the infrastructure. QPS notifications do
not require acknowledgments. These provide both proactive alerts that preset thresholds have been passed (for
example, disk nearing full, CPU load high) and reactive alerting when system components fail or are in a
degraded state (for example, a dead process, network connectivity outages, and so on).
Notifications and traps are categorized by a methodology similar to UNIX System Logging (syslog) with both Severity
and Facility markers. All event notifications (traps) contain these items:
Facility
Severity
Device time
These objects enable Network Operations Center (NOC) staff to identify where the issue lies via the
Facility (system layer), and its importance via the Severity of the reported issue.
4.6.
Facility
Number
Facility
Description
Hardware
Networking
Virtualization
Operating System
Application
Process
There may be overlaps in the Facility value, as well as gaps, if a particular SNMP agent does not have a full view into an
issue. The Facility reported is always as viewed from the reporting SNMP agent.
4.7.
Severity
In addition to Facility, each notification has a Severity measure. The defined severities are directly from UNIX syslog
and defined as follows:
Number  Severity    Description
0       Emergency   System is unusable.
1       Alert       Action must be taken immediately.
2       Critical    Critical conditions.
3       Error       Error conditions.
4       Warning     Warning conditions.
5       Notice      Normal but significant condition.
6       Info        Informational message.
7       Debug       Debug-level messages.
        None        Indicates no severity.
        Clear       A previously reported condition has cleared.
For the purposes of the QPS Monitoring and Alert Notifications system, severity levels of Notice, Info, and Debug
are usually not used. Warning conditions are often used for proactive threshold monitoring (for example, disk
usage or CPU load), which requires some action on the part of administrators, but not immediately. Conversely,
Emergency severity indicates that some major component of the system has failed and that core policy
processing, session management, or another major system function is impacted.
4.8.
Categorization
Combinations of Facility and Severity create many possibilities of notifications (traps) that might be sent. However,
some combinations are more likely than others. The following table lists some noteworthy Facility and Severity
categorizations.
Facility.Severity
Categorization
Possibility
Process.Emergency
Hardware.Debug
Virtualization.Emergency
It is not possible to quantify every Facility and Severity combination. However, greater experience with QPS
leads to better diagnostics. The QPS Monitoring and Alert Notification infrastructure provides a baseline for
event definition and notification by an experienced engineer.
5.
5.1.
Component
LB01/LB02
PortalLB01/PortalLB02
PCRFClient01/PCRFClient02
SessionMgr01/SessionMgr02
QNS01/QNS02/QNS03/QNS04
Portal01/Portal02

Information
CpuUser
CpuSystem
CpuIdle
LoadAverage1
LoadAverage5
LoadAverage15
MemoryTotal
MemoryAvailable
SwapTotal
SwapAvailable
Eth0InOctets
Eth0OutOctets
Eth1InOctets
Eth1OutOctets
Entel Chile
5.2.
The following information is available, and is listed per component. MIB documentation provides units of
measure.
5.3.
Current version Key Performance Indicators (KPI) information is available at the OID root of:
.1.3.6.1.4.1.26878.200.2.3.53
This corresponds to an MIB of:
.iso
.identified-organization
.dod
.internet
.private
.enterprises
.broadhop
.broadhopProducts
.broadhopProductsQNS
.broadhopProductsQNSKPIVersion
.broadhopProductsQNSKPI53
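With the default read-only community string (Cisco, described in the configuration section), the whole KPI subtree under this root can be retrieved in one walk; a minimal sketch, using the lbvip01 management address from the CNT tables (adjust for your deployment):

```shell
# Walk the QPS KPI subtree from the SNMP endpoint lbvip01.
# "Cisco" is the documented default read-only community string.
snmpwalk -v 2c -c Cisco 172.18.169.164 .1.3.6.1.4.1.26878.200.2.3.53
```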
5.3.1.
The following application KPIs are available for monitoring on each node using SNMP Get and Walk
utilities.
Component
Information
LB01/LB02
PCRFProxyExternalCurrentSessions: the number of open connections to lbvip01:8080.
PCRFProxyInternalCurrentSessions: the number of open connections to lbvip02:8080.
PortalLB01/PortalLB02
PortalProxyExternalCurrentSessions: the number of connections to sslvip01:80.
PCRFClient01/PCRFClient02
----------
SessionMgr01/SessionMgr02
----------
QNS01
QNS02
QNS03
QNS04
Portal01
----------
Portal02
6.
The QPS Monitoring and Alert Notification framework provides the following SNMPv2 notification traps (one-way).
Traps are either proactive or reactive. Proactive traps are alerts based on system events or changes
that require attention (for example, disk is filling up). Reactive traps are alerts that an event has already
occurred (for example, an application process died).
This section covers the following topics:
Component Notifications
Application Notifications

6.1.
Component Notifications
Components are the devices that make up the QPS system. These are system-level traps, generated
when predefined thresholds are crossed. Users can define these thresholds in
/etc/snmp/snmpd.conf, for example for disk full, low memory, and so on. The snmpd process runs on all the
VMs. When snmpd starts, it reads the values set in snmpd.conf; hence, whenever a user makes any
change in snmpd.conf, the snmpd process must be restarted so that it picks up the change.
For example, if a threshold is crossed, snmpd throws a trap to the LBVIP on the internal network on port 162. On
the LB, the snmptrapd process listens on port 162. When snmptrapd sees a trap on 162, it logs it in the file
/var/log/snmp/trap and re-throws it to corporate_nms_ip on port 162. This corporate NMS IP is set
inside the /etc/hosts file on LB1 and LB2. Typically, these components equate to running Virtual Machines.
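As a sketch of what such thresholds look like, standard Net-SNMP disk and load directives of the following shape could appear in /etc/snmp/snmpd.conf; the values shown are illustrative, not the QPS-shipped defaults:

```shell
# Illustrative Net-SNMP threshold directives (example values only):
#   disk /var 10%     -> raise an error flag when /var has less than 10% free
#   load 12 10 5      -> flag when 1/5/15-minute load averages exceed these
# snmpd reads snmpd.conf only at startup, so restart it after any change:
service snmpd restart
# On the LB, traps received by snmptrapd are logged locally for inspection:
tail /var/log/snmp/trap
```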
Additional information about the notification, which might be a bit of log or other information.
Component Notifications that QPS generates are shown in the following list. Any component in the QPS
system may generate these notifications.
Name: DiskFullAlert
Feature: Component
Severity: Warning
Message Text:
1.
2. /var
3. /home
4. /boot
5. /opt
Name: DiskFullClear
Feature: Component
Severity: Clear
Message Text:
1.
2. /var
3. /home
4. /boot
5. /opt
Name: HighLoadAlert
Feature: Component
Severity: Warning (1, 5 minutes), Alert (15 minutes)
Message Text: 1 minute, 5 minute, 15 minute Average
Name: HighLoadClear
Feature: Component
Severity: Clear
Message Text: 1 minute, 5 minute, 15 minute Average
Low Swap memory alarm
LowSwapAlert
LowSwapClear
<Interface Name> is Up
LowMemoryClear
6.2.
Application Notifications
Applications are running processes on a component device that make up the QPS system. These are
application-level traps. QPS processes (starting with the word java when you run "ps -ef") and some scripts (for GR
traps) generate these traps.
For example, when a trap is generated, it is thrown to the LBVIP on the internal network on port 162. On
the LB, the snmptrapd process listens on port 162. When snmptrapd sees a trap on 162, it logs it in the file
/var/log/snmp/trap and re-throws it to corporate_nms_ip on port 162. This corporate NMS IP is set
inside the /etc/hosts file on LB1 and LB2.
NOTIFICATION-TYPE
    STATUS current
    DESCRIPTION "
        Notification Trap from any QNS application i.e., runtime.
        "
    ::= { broadhopProductsQNSNotifications 2 }
Each Application Notification contains these elements:
Additional information about the notification, which might be a portion of log or other information
Application Notifications that QPS generates are shown in the following list. Any application in the QPS system
may generate these notifications.
Name
Feature
Severity
Message Text
Name: License Usage Threshold Exceeded
Feature: Application
Message Text: The license threshold is defined, and SNMP traps are sent out if the thresholds
are exceeded by the real license usage. This limits the total session count usage.
Memcached
Application
Critical
ConnectError
MemcachedConnectError
Application
Major
Alert
Application
Critical
Application
Emergency
The system license currently installed is not valid. This prevents system
operation until resolved. This is possible if no license is installed or if the
current license does not designate values. This also may occur if any system
networking MAC addresses have changed.
Application
Emergency
Critical
xxx is Expired %s
Major
License is invalid.
Application
License has expired.
Application
Critical
Major
Feature
Severity
Message Text
PolicyConfiguration
Application
Major
Application
Major
Application
Emergency
Configured
The policy engine cannot find any policies to apply while starting up. This may
occur on a new system, but requires immediate resolution for any system
services to operate.
DiameterPeerDown
Application
Major
Critical
Application
A primary member of a replica set is down and has been taken over by another member of
the same replica set.
Geo_Failover
Application
Critical
A primary member of a replica set is down and has been taken over by another member of the
same replica set.
All_replica_of_DB_down
Application
Critical
All replicas of
${SET_NAME}-SET$Loop are
down
Application
Critical
Critical
Secondary DB
%member_ip:%mem_port
(%mem_hostname) of SET $SET is
down
Application
Critical
Arbiter %member_ip:%mem_port
(%mem_hostname) of SET $SET is
down
Config Server is
Down
Application
Critical
Config Server
%member_ip:%mem_port
(%mem_hostname) of SET $SET is
down
Application
Critical
Application
Critical
The administrator is not able to ping the VM. (This alarm gets generated for all
VMs configured inside /etc/hosts of the lb.)
QPS Process Down
Application
Critical
Application
Critical
Alert
unable to connect
%INTERFACE(lbvip01/lbvip02)
VM. Not reachable
Not able to ping the virtual interface. (This alarm gets generated for lbvip01 and
lbvip02.)
Developer Mode
License traps
Application
Warning
Application
ZeroMQConnectionError
Application
Major
Application
Clear
Diameter Critical
Failure
Application
Critical
Application
Major
7.
All access to system statistics and KPIs should be collected via SNMP gets and walks from the virtual IP
lbvip01, which can be located on either the lb01 or lb02 load balancer. System notifications are also sourced
from this address.
Configuration of the system consists of the following:
7.1.
At the time of installation, SNMPv2 gets and walks can be performed against the system at lbvip01 with the
default read-only community string of Cisco, using the standard UDP port 161. The IP address of lbvip01 can be
found in the /etc/hosts file of pcrfclient01, lb01, or lb02.
The read-only community string can be changed from its default of Cisco to a new value using the following
steps:
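The exact steps did not survive in this copy of the document. On a stock Net-SNMP agent the read-only community string is the rocommunity directive in /etc/snmp/snmpd.conf, so a generic sketch would be the following (whether QPS keeps this setting in snmpd.conf or in its own configuration is an assumption here):

```shell
# Hypothetical sketch: replace the default read-only community string "Cisco".
sed -i 's/^rocommunity Cisco/rocommunity NewCommunity/' /etc/snmp/snmpd.conf
# Restart the agent so it re-reads snmpd.conf.
service snmpd restart
# Verify with a standard MIB-II object using the new string.
snmpget -v 2c -c NewCommunity 172.18.169.164 SNMPv2-MIB::sysUpTime.0
```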
7.2.
After the previous configurations have been made, notifications should be logged locally in the
/var/log/snmp/trap file as well as forwarded to the NMS destination at corporate_nms_ip. By default,
traps are sent to the destination corporate_nms_ip using the SNMPv2 community string of Cisco. The
standard SNMP UDP trap port of 162 is also used. Both of these values may be changed if required.
7.3.
This section describes the commands for validation and testing of the QPS SNMP infrastructure.
You can use these commands to validate and test your system during setup,
configuration, or at any point. The examples use MIB values because they are more descriptive, but you may
use equivalent OID values if you like, particularly when configuring an NMS.
The examples here use the Net-SNMP snmpget, snmpwalk, and snmptrap programs. Detailed configuration of
these applications is outside the scope of this document, but the examples assume that the three Cisco MIBs
are installed in the locations described on the snmpcmd man page (typically the
/home/share/<user>/.snmp/mibs or /usr/share/snmp/mibs directories).
Validation and testing are of three types, corresponding to the statistics and notifications detailed earlier in
this document:
Component Statistics
Application KPI
Notifications
Run all tests from a client with network access to the Management Network or from the lb01, lb02,
pcrfclient01 or pcrfclient02 hosts (which are also on the Management Network).
7.4.
Component Statistics
Component statistics can be obtained on a per-statistic basis with snmpget. As an example, to get the
current available memory on pcrfclient01, use the following command:
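The command itself did not survive extraction; based on the sample output below and the default community string, it would take roughly this shape:

```shell
# LBVIP01 is the IP address of lbvip01, as resolved from /etc/hosts
# (the CNT management table gives 172.18.169.164).
LBVIP01=172.18.169.164
snmpget -v 2c -c Cisco "$LBVIP01" BROADHOP-QNS-MIB::component53PCRFClient01MemoryAvailable
```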
where <lbvip01> is the IP address of lbvip01 or as resolved from the /etc/hosts file. An example of the
output from this command is:
BROADHOP-QNS-MIB::component53PCRFClient01MemoryAvailable = INTEGER: 629100
This output indicates that 629,100 MB of memory are available on this component machine.
All available component statistics in a MIB node can be walked via the snmpwalk command. This is very
similar to snmpget, as above. For example, to see all statistics on lb01, use the following command:
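The original command is missing here; a plausible shape, assuming the per-host subtree follows the component53<Host> naming seen in the snmpget output (verify the exact node name against BROADHOP-QNS-MIB.mib):

```shell
LBVIP01=172.18.169.164   # IP of lbvip01, from /etc/hosts
# "component53LB01" is an assumed subtree name based on the object-naming
# pattern component53<Host><Statistic>.
snmpwalk -v 2c -c Cisco "$LBVIP01" BROADHOP-QNS-MIB::component53LB01
```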
where <lbvip01> is the IP address of lbvip01 or as resolved from the /etc/hosts file. An example of the
output from this command is:
7.5.
Application KPI
Application KPI can be obtained on a per-statistic basis with snmpget, in a manner much like obtaining
component statistics. As an example, to get the number of sessions currently active on qns01, use the
following command:
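The command is missing in this copy; mirroring the kpi53QNS02SessionCount object shown later in the qns02 walk output, a sketch for qns01 would be:

```shell
LBVIP01=172.18.169.164   # IP of lbvip01, from /etc/hosts
# kpi53QNS01SessionCount mirrors the kpi53QNS02SessionCount object name
# shown in the sample walk output for qns02.
snmpget -v 2c -c Cisco "$LBVIP01" BROADHOP-QNS-MIB::kpi53QNS01SessionCount
```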
where <lbvip01> is the IP address of lbvip01 or as resolved from the /etc/hosts file. An example of the
output from this command would be:
Similarly, all available KPI in a MIB node can be walked via the snmpwalk command. This is very similar
to snmpget, as above. As an example, to see all statistics on qns02, use the following command:
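Again the command did not survive; assuming kpi53QNS02 is a walkable subtree (inferred from the object names in the sample output below):

```shell
LBVIP01=172.18.169.164   # IP of lbvip01, from /etc/hosts
# "kpi53QNS02" as a subtree name is inferred from the kpi53QNS02* objects
# in the sample output; verify against the BROADHOP-QNS-MIB file.
snmpwalk -v 2c -c Cisco "$LBVIP01" BROADHOP-QNS-MIB::kpi53QNS02
```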
where <lbvip01> is the IP address of lbvip01 or as resolved from the /etc/hosts file. An example of the
output from this command would be:
BROADHOP-QNS-MIB::kpi53QNS02PolicyCount = STRING:
BROADHOP-QNS-MIB::kpi53QNS02QueueSize = STRING: 0
BROADHOP-QNS-MIB::kpi53QNS02FailedEnqueueCount = STRING:
BROADHOP-QNS-MIB::kpi53QNS02ErrorCount = STRING: 0
BROADHOP-QNS-MIB::kpi53QNS02SessionCount = STRING: 937
BROADHOP-QNS-MIB::kpi53QNS02FreeMemory = STRING: 3721598032
8.
Notifications
Testing and validating notifications requires slightly more skill than testing SNMP gets and walks. Recall
the overall architecture: all components and applications in the QPS system are configured to
send notifications to lb01 or lb02 via lbvip02, the Internal Network IP. These systems log the notification
locally in /var/log/snmp/trap and then re-throw the notification to the destination configured by
corporate_nms_ip. Two testing and troubleshooting methods are illustrated below: confirming that notifications
are being sent properly from system components to lb01 or lb02, and confirming that notifications can be
sent upstream to the NMS.
8.1.
Receiving Notifications
There are several ways to confirm that lb01 or lb02 is properly receiving notifications from components.
First, determine the active load balancer: it is either lb01 or lb02, and it has multiple IP addresses per
interface, as shown by the ifconfig command.
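For example, on the active load balancer the VIP addresses appear as additional addresses on the interfaces; a quick check using the CNT values from this document (adjust the addresses for your deployment):

```shell
# The active LB holds lbvip01 (172.18.169.164) and lbvip02 (172.18.175.4);
# the standby does not. Grep the interface listing for the VIPs:
ifconfig | grep -E '172\.18\.169\.164|172\.18\.175\.4'
```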
8.2.
Upstream Notifications
Should a notification not be received by the NMS, you can manually throw a notification from the active load
balancer to the NMS using this command:
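The documented command did not survive extraction. A hand-crafted SNMPv2 test trap can be sent with Net-SNMP's snmptrap; the notification OID below is illustrative only, and any notification defined in BROADHOP-NOTIFICATION-MIB can be substituted:

```shell
CORPORATE_NMS_IP=192.0.2.10   # placeholder: use the real NMS address
# '' lets snmptrap fill in sysUpTime; the trap OID here is illustrative.
snmptrap -v 2c -c Cisco "$CORPORATE_NMS_IP":162 '' \
    .1.3.6.1.4.1.26878.200.3.2.1
```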
where <corporate_nms_ip> is the appropriate NMS IP address. This sends an SNMPv2 trap from the
active load balancer to the NMS and can be used for debugging.
9.
Logs
Most of the QPS logs are located locally in /var/log/broadhop/ on the virtual machines IOMGRxx, QNSxx,
and PCRFCLIENTxx. PCRFCLIENT01 contains the consolidated logs from all of the IOMGR, QNS, and
PCRFCLIENT virtual machines.
The QPS logs can be divided based on the application or script that produces them:
9.1.
QNS pcrfclient01:
/opt/broadhop/log/consolidated-qns.log
/opt/broadhop/log/qns-engine.log
QNS01-04 & LB01/02:
/var/log/broadhop
Mongo sessionmgr01/02:
/var/log/mongodb-277XX.log
Portal portal01/02:
/var/log/httpd/access.log
9.2.
AIO:
/var/log/broadhop/upgrade<date>_<time>.log
HA/GR:
pcrfclient01:/var/log/broadhop/upgrade<date>_<time>.log
Log Rollover: No
9.3.
subscriber has, the state of the session, and other useful information.
Log file name, format, path:
/etc/broadhop/controlcenter/logback.xml
9.4.
path: AIO:
/var/log/broadhop/qnspb.log
HA/GR: pcrfclient01: /var/log/broadhop/qns-pb.log
Log config File: NA
Log Rollover: No
9.5.
9.6.
Log Rollover: No
path: AIO:
/var/log/httpd/access_log
HA/GR: pcrfclient01: /var/log/httpd/access_log
Log config File: /etc/httpd/conf/httpd.conf
Log Rollover: Yes
during serving requests are logged to this file. This Apache log file often
contains details of what went wrong and how to fix it.
Log file name, format, path:
AIO: /var/log/httpd/error_log
HA/GR: pcrfclient01: /var/log/httpd/error_log
Log config File: /etc/httpd/conf/httpd.conf
Log Rollover: Yes
9.7.
system.
Log file name, format, path:
AIO:
/var/www/portal/app/tmp/logs/api_request.log
AIO:
/var/www/portal/app/tmp/logs/api_response.log
AIO: /var/www/portal/app/tmp/logs/error.log
HA/GR: portal01: /var/www/portal/app/tmp/logs/error.log
Log config File: NA
Log Rollover: No
AIO: /var/www/portal/app/tmp/logs/debug.log
HA/GR:
/var/www/portal/app/tmp/logs/debug.log
9.8.
9.9.
Log Rollover: No
Description: The svn log command displays commit log messages. For more
information, refer to /usr/bin/svn log --help. For example:
/usr/bin/svn log http://lbvip02/repos/run
9.10.
Log Rollover: No
Description: Contains a log of all sessions established with the QPS VM: SSH
session logs, cron job logs.
9.11.
path: AIO:
/var/log/carbon/client.log
HA/GR: pcrfclient01: /var/log/carbon/client.log
Log config File: /etc/carbon/carbon.conf
Log Rollover: No
path: AIO:
/var/log/carbon/console.log
HA/GR: pcrfclient01: /var/log/carbon/console.log
Log config File: /etc/carbon/carbon.conf
Log Rollover: No
path: AIO:
/var/log/carbon/query.log
HA/GR: pcrfclient01: /var/log/carbon/query.log
Log config File: /etc/carbon/carbon.conf
Log Rollover: No
path: AIO:
/var/log/carbon/creates.log
HA/GR: pcrfclient01: /var/log/carbon/creates.log
path: AIO:
/var/log/carbon/listener.log
HA/GR: pcrfclient01: /var/log/carbon/listener.log
Log config File: /etc/carbon/carbon.conf
Log Rollover: Yes
9.12.
Log: haproxy
10.
10.1.
It is good to understand when the call was supposed to occur, in order to narrow
down the issue.
If there are no ERRORs or exceptions, you can increase the logging
levels. Policy tracing and logs at DEBUG level can usually indicate the problem.
But, just as on a router, too much debugging can affect the performance of the system.
Use grep with usernames, MAC addresses, IP addresses, and similar identifiers in the logs to find the required
information.
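For instance, to pull every line that mentions a particular subscriber out of the consolidated log (the username and MAC address below are placeholders):

```shell
# Case-insensitive search of the consolidated log for subscriber identifiers.
grep -i 'user1@example.com' /opt/broadhop/log/consolidated-qns.log
grep -i '00:11:22:33:44:55' /opt/broadhop/log/consolidated-qns.log
```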
QNS
TRACE messages are so verbose that they are not recorded in the consolidated log in a distributed
configuration, so you need to review the individual logs on the VMs themselves
Portal
Logging level and the actual effective logging level can be two different levels because of the following
logback logging rules:
1. When a logging level is set, if the logging level of the parent process is higher than the logging level of
the child process, then the effective logging level of the child process is that of the parent process.
That is, even though the logging level of the child process is set, it cannot be below the logging level
of the parent process and is automatically overridden to the higher logging level of the parent
process.
2. There is a global root logging level that each process can inherit as an effective default logging
level. If you do not want to have a default effective logging level, then set the root level to OFF.
3. Each logging level prints the output of the lower logging levels.
The following table displays the logging levels and the message types printed, per rule 3 above.
Level
All
Trace
Debug
Info
Warn
Error
Off
The following table describes the different logging levels and what they should be used for:
Error
Database is not
available.
Warn
Info
NA
Debug
NA
NA
Trace
NA
NA
You can configure target and log rotation for consolidated logs in log configuration file
/etc/broadhop/controlcenter/logback.xml.
<minIndex>1</minIndex>
<maxIndex>5</maxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
  <maxFileSize>100MB</maxFileSize>
</triggeringPolicy>
With the above configuration, log files of up to 100 MB are generated, and the log files rotate from index 1 to
5. When the 100 MB log file trigger condition is met, the QPS system performs the file
operations in the following order:
10.2.
10.3.
Go to the bottom of the log file and search backwards for ERROR
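One way to do this without paging through the whole file is to reverse it with tac and take the first ERROR matches (log path from the QNS section above):

```shell
# Print the five most recent ERROR entries; tac emits the file last-line-first.
tac /opt/broadhop/log/consolidated-qns.log | grep -m 5 ERROR
```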
QNS pcrfclient01:
/etc/broadhop/qns.conf
/etc/broadhop/logback.xml
10.4.
Domain Troubleshooting
1. tail -f consolidated-qns.log to determine what domain is being calculated for your call flow
2. The domain calculation comes after the location query response to the portal
3. If the domain calculation is wrong either the wrong portal has been determined or the wrong domain is
associated with your desired portal page
4. You can fix your domain through the portal administration page
5. If your domain is correct but the call flow is incorrect after that, access policy builder to review your
domain configuration
10.5.
1. Test service definition requests from the ISG to the QNS by running the following command. Testing a service
does not require a password; testing a user does.
a) test aaa group <YOUR_AAA_GROUP> <YOUR_SERVICE_OR_USER_NAME>
<ANY_PASSWORD_FOR_SERVICE_OR_USER_PASSWORD> legacy
b) Use this command on all policy map services that are assigned to the interface you are working
with (e.g., PBHK, OPENGARDEN, L4 REDIRECT).
2. Listen for RADIUS traffic from the ISG by logging into lb01 and lb02. Depending on the problem, you want to
review Access-Requests (typically port 1812) and/or Accounting-Requests (typically port 1813). If you are working
with ISG Prepaid, you want to listen on all active ports (e.g., 1812, 1813, 1814, and 1815). The following lists
tcpdumps that you may want to run and then review in Wireshark. The -w option writes the output to a pcap file
and the -vvv option writes verbose output to the console.
tcpdump -i any port 1812 -s0 -vvv -w /tmp/tcpdump.pcap (Access Request Only)
tcpdump -i any port 1812 or port 1813 -s0 -vvv -w /tmp/tcpdump.pcap
tcpdump -i any port 1812 or port 1813 or port 1814 or port 1815 -s0 -vvv -w /tmp/tcpdump.pcap