QPS Rapid TroubleShooting

QPS Health Check
CiscoQuantumPolicySuite6.1.1Alarming and SNMP Guide

Version 6.1.1
October 7, 2014
Cisco Systems, Inc.

www.cisco.com
Cisco has more than 200 offices worldwide. Addresses,

phone numbers, and fax numbers are listed on the Cisco
website at www.cisco.com/go/offices.
Text Part Number: OL-31984-02
1.1
QPS Health Check
1.1
THE SPECIFICATIONS AND INFORMATION REGARDING THE PRODUCTS IN THIS MANUAL ARE SUBJECT
TO CHANGE WITHOUT NOTICE. ALL STATEMENTS, INFORMATION, AND RECOMMENDATIONS IN THIS
MANUAL ARE BELIEVED TO BE ACCURATE BUT ARE PRESENTED WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED. USERS MUST TAKE FULL RESPONSIBILITY FOR THEIR APPLICATION OF ANY
PRODUCTS.
THE SOFTWARE LICENSE AND LIMITED WARRANTY FOR THE ACCOMPANYING PRODUCT ARE SET
FORTH IN THE INFORMATION PACKET THAT SHIPPED WITH THE PRODUCT AND ARE INCORPORATED
HEREIN BY THIS REFERENCE. IF YOU ARE UNABLE TO LOCATE THE SOFTWARE LICENSE OR LIMITED
WARRANTY, CONTACT YOUR CISCO REPRESENTATIVE FOR A COPY.
The Cisco implementation of TCP header compression is an adaptation of a program developed by the University of
California, Berkeley (UCB) as part of UCBs public domain version of the UNIX operating system. All rights reserved.
Copyright 1981, Regents of the University of California.
NOTWITHSTANDING ANY OTHER WARRANTY HEREIN, ALL DOCUMENT FILES AND SOFTWARE OF
THESE SUPPLIERS ARE PROVIDED AS IS WITH ALL FAULTS. CISCO AND THE ABOVE-NAMED
SUPPLIERS DISCLAIM ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION,
THOSE OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OR
ARISING FROM A COURSE OF DEALING, USAGE, OR TRADE PRACTICE.
IN NO EVENT SHALL CISCO OR ITS SUPPLIERS BE LIABLE FOR ANY INDIRECT, SPECIAL,
CONSEQUENTIAL, OR INCIDENTAL DAMAGES, INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR
LOSS OR DAMAGE TO DATA ARISING OUT OF THE USE OR INABILITY TO USE THIS MANUAL, EVEN IF
CISCO OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other
countries. To view a list of Cisco trademarks, go to this URL: www.cisco.com/go/trademarks. Third-party trademarks
mentioned are the property of their respective owners. The use of the word partner does not imply a partnership
relationship between Cisco and any other company. (1110R)
Any Internet Protocol (IP) addresses and phone numbers used in this document are not intended to be actual addresses and
phone numbers. Any examples, command display output, network topology diagrams, and other figures included in the
document are shown for illustrative purposes only. Any use of actual IP addresses or phone numbers in illustrative
content is unintentional and coincidental.
2014 Cisco Systems, Inc. All rights reserved.
QPS Health Check
1.1
CONTENTS
Contents
C O N T E N T S ............................................................................................................................................................. 3
1.
PREFACE ................................................................................................................................................................ 5
2.
AUDIENCE .............................................................................................................................................................. 6
3.
NODE ACCESS AND IP ADDRESSING .................................................................................................................. 7
3.1. UCS. ..................................................................................................................................................................... 7

3.2. QPS Aplication Site Centro (CNT) ......................................................................................................................... 7
3.2.1. QNS URLs/IPs ................................................................................................................................................... 7
3.2.2. Networking CNT ................................................................................................................................................. 8
3.2.3. CNT Virtual Machines (VMs) Network Diagram ................................................................................................... 9
3.2.4. CNT UCS Blades Assigned .............................................................................................................................. 10
3.3. QPS Aplication Site Cuidad de los Valles (CDV) .................................................................................................. 11
3.3.5. QNS URLs/IPs ................................................................................................................................................. 11
3.3.6. Networking CDV ............................................................................................................................................... 12
3.3.7. CDV Virtual Machines (VMs) Network Diagram ................................................................................................. 13
3.3.8. CDV UCS Blades Assigned .............................................................................................................................. 14
4.
4.1.
4.2.
4.3.
4.4.
4.5.
4.6.
4.7.
4.8.
5.
MONITORING AND ALERT NOTIFICATION .......................................................................................................... 15

Architectural Overview ......................................................................................................................................... 15
Technical Architecture ......................................................................................................................................... 15
Protocols and Query Endpoints ............................................................................................................................ 15
SNMP Object Identifier and Management Information Base................................................................................... 16
SNMPv2 Data and Notifications ............................................................................................................................ 16
Facility................................................................................................................................................................. 16
Severity ............................................................................................................................................................... 17
Categorization ..................................................................................................................................................... 18
SNMP SYSTEM AND APPLICATION KPIS ............................................................................................................ 19
5.1. SNMP System KPIs ............................................................................................................................................. 19

5.2. Details of SNMP System KPIs .............................................................................................................................. 20
5.3. SNMP Application KPIs ........................................................................................................................................ 20
5.3.1. Summary of SNMP Application KPIs .................................................................................................................. 21
6.
6.1.
6.2.
7.
7.1.
7.2.
7.3.
NOTIFICATIONS AND ALERTING (TRAPS) .......................................................................................................... 22

Component Notifications ...................................................................................................................................... 22
Application Notifications ....................................................................................................................................... 24
CONFIGURATION AND USAGE ............................................................................................................................ 29
Configuration for SNMP gets and walks ................................................................................................................. 29
Configuration for Notifications (traps) .................................................................................................................... 29
Validation and Testing .......................................................................................................................................... 29
QPS Health Check

7.4.
7.5.
8.
Component Statistics ........................................................................................................................................... 30

Application KPI .................................................................................................................................................... 30
NOTIFICATIONS ................................................................................................................................................... 32
8.1.
8.2.
9.
1.1
Receiving Notifications ......................................................................................................................................... 32

Upstream Notifications ......................................................................................................................................... 32
LOGS..................................................................................................................................................................... 33
9.1.
9.2.
9.3.
9.4.
9.5.
9.6.
9.7.
9.8.
9.9.
9.10.
9.11.
9.12.
Useful logs for troubleshooting............................................................................................................................. 33

Application/Script Produces Logs: Upgrade Script, Upgrade Binary ....................................................................... 33
Application/Script Produces Logs: qns.................................................................................................................. 34
Application/Script Produces Logs: qns pb ............................................................................................................. 35
Application/Script Produces Logs: mongo ............................................................................................................. 36
Application/Script Produces Logs: httpd................................................................................................................ 36
Application/Script Produces Logs: portal............................................................................................................... 37
Application/Script Produces Logs: license manager .............................................................................................. 38
Application/Script Produces Logs: svn .................................................................................................................. 38
Application/Script Produces Logs: auditd ............................................................................................................ 38
Application/Script Produces Logs: graphite ......................................................................................................... 39
Application/Script Produces Logs: kernel ............................................................................................................ 40
10.
BASIC TROUBLESHOOTING USING QPS LOGS ............................................................................................... 41
10.1.
10.2.
10.3.
10.4.
10.5.
11.
Logging Level and Effective Logging Level ......................................................................................................... 41

Diagnostics and Status scripts ........................................................................................................................... 44
Useful Configuration Files .................................................................................................................................. 44
Domain Troubleshooting .................................................................................................................................... 44
tcpdump to Troubleshoot Interfaces ................................................................................................................... 44
MIBS AND COMPONENTS. ................................................................................................................................ 46
QPS Health Check
1.
1.1
Preface
This document shows how to monitor QPS with operational trending information, and how to manage QPS based
on system notifications. Proactive monitoring in this way increases service availability and system usability,
Monitoring and alert notifications are provided via Network Monitoring Solutions (NMS) standard Simple Network
Management Protocol (SNMP) methodologies.
This preface covers the following topics:
QPS Health Check
2.
1.1
Audience
This guide is best used by these readers:
Deployment engineers
Implementation engineers
Network administrators
Network engineers
Network operators
System administrators
This document assumes a general understanding of network architecture and systems management. Specific
knowledge of the SNMP, specifically Version 2c, is required. Installation and initial configuration of QPS is a
prerequisite.
QPS Health Check
3.
1.1
Node Access and IP Addressing
3.1.
UCS.
MANGEMENT
NETWORK:
ENTEL-QPS-CDV
ENTEL-QPS-CDV-A
ENTEL-QPS-CDV-B
172.18.169.0/27
172.18.169.5
172.18.169.6
172.18.169.7
Protocol
HTTPS
SSH
SSH
User
admin
admin
admin
Table 1 Management IP
3.2.
QPS Aplication Site Centro (CNT)
3.2.1.
QNS URLs/IPs
Control Center:
http://172.18.169.164:8090
Unified SuM (deprecated):
http://172.18.169.164/portal/usum
Policy Builder:
http://172.18.169.164:7070/pb
Unified API WSDL:
http://172.18.169.164:8080
8080/ua/wsdl/UnifiedApi.wsdl
HAProxy Status:
http://172.18.169.164/haproxy?stats
MANGEMENT PCRF CNT:

VLAN 156
172.18.169.160/28
172.18.169.161
SW1-PCRF-CDV - VLAN 156
172.18.169.162

lbvip01
lb01
lb02
172.18.169.163
172.18.169.164
172.18.169.165
172.18.169.166
pcrfclient01
172.18.169.167
pcrfclient01
172.18.169.168
IP Network CNT
Pass
NA
NA
NA
QPS Health Check

3.2.2.
1.1
Networking CNT
Inside Firewall
QNS External Nets
QNS Internal Nets

Internal
Subnet
Eth
Virtual
Machine
or VIP
QNS External Nets
Eth
Management
Subnet
Eth
Signalling
Subnet
172.18.169.164
172.31.234.68
1
1
1
1
1
172.18.169.170
172.18.169.165
172.18.169.166
172.18.169.167
172.18.169.168
2
2
2
172.31.234.77
172.31.234.69
172.31.234.70
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
172.31.234.71
172.31.234.72
172.31.234.73
172.31.234.74
172.31.234.75
172.31.234.76
172.31.234.77
172.31.234.78
172.31.234.79
172.31.234.80
172.31.234.81
172.31.234.82
172.31.234.83
172.31.234.84
172.31.234.85
172.31.234.86
172.31.234.87
172.31.234.88
172.31.234.89
172.31.234.90
172.31.234.91
lbvip01
Points to
External
lb01/02
lbvip02
172.18.175.4
Points to
Internal
lb01/02
172.18.175.15
172.18.175.5
172.18.175.6
172.18.175.7
172.18.175.8
172.18.175.9
172.18.175.10
172.18.175.11
172.18.175.12
172.18.175.13
172.18.175.14
172.18.175.15
172.18.175.16
172.18.175.17
172.18.175.18
172.18.175.19
172.18.175.20
172.18.175.21
172.18.175.22
172.18.175.23
172.18.175.24
172.18.175.25
172.18.175.26
172.18.175.27
172.18.175.28
172.18.175.29
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Arbiter
lb01
lb02
pcrfclient01
pcrfclient02
sessionmgr01
sessionmgr02
sessionmgr03
sessionmgr04
sessionmgr05
sessionmgr06
arbiter
qns01
qns02
qns03
qns04
qns05
qns06
qns07
qns08
qns09
qns10
qns11
qns12
qns13
qns14
IP Networking CNT
QPS Health Check

3.2.3.
1.1
CNT Virtual Machines (VMs) Network Diagram
SM01
QNS08
QNS01
172.18.175.9
172.18.175.23
172.18.175.16
172.31.234.71
172.31.234.85
172.31.234.78
SM02
QNS09
QNS02
172.18.175.7
172.18.175.10
172.18.175.24
172.18.175.17
172.18.169.167
172.31.234.72
172.31.234.86
172.31.234.79
SM03
QNS10
QNS03
172.18.175.8
172.18.175.11
172.18.175.25
172.18.175.18
172.18.169.168
172.31.234.73
172.31.234.87
172.31.234.80
SM04
QNS11
QNS04
172.18.175.26
172.18.175.19
172.31.234.88
172.31.234.81
QPS CNT
PCRFCLIENT01
PCRFCLIENT02
172.18.175.12
172.31.234.74
LB-01
172.18.175.5
172.18.169.165
172.31.234.69
172.18.175.4
SM05
QNS12
QNS05
172.18.175.13
172.18.175.27
172.18.175.20
172.31.234.75
172.31.234.89
172.31.234.82
SM06
QNS13
QNS06
172.18.175.14
172.18.175.28
172.18.175.21
172.31.234.76
172.31.234.90
172.31.234.83
ARBITER
QNS14
QNS07
172.18.175.29
172.18.175.22
172.31.234.91
172.31.234.84
172.18.175.15
172.31.234.77
172.169.170
LBVIP-01
LBVIP-02
LB-02
172.31.234.68
172.18.169.164
172.18.175.6
172.18.169.166
172.31.234.70
QPS Health Check

3.2.4.
CNT UCS Blades Assigned
1.1
QPS Health Check
1.1
3.3.
QPS Aplication Site Cuidad de los Valles (CDV)
3.3.5.
QNS URLs/IPs
Control Center:
http://172.18.169.36:8090
Unified SuM (deprecated):
http://172.18.169.36/portal/usum
Policy Builder:
http://172.18.169.36:7070/pb
Unified API WSDL:
http://172.18.169.36:8080
8080/ua/wsdl/UnifiedApi.wsdl
HAProxy Status:
http://172.18.169.36/haproxy?stats
MANGEMENT QPS CDV:

VLAN 156
172.18.169.32/28
172.18.169.33
172.18.169.34
172.18.169.35
lbvip01
lb01
lb02
pcrfclient01
172.18.169.36
172.18.169.37
172.18.169.38
172.18.169.39
pcrfclient01
172.18.169.40
Table 2 IP Network CDV
QPS Health Check

3.3.6.
1.1
Networking CDV
QNS Internal Nets

Internal
Subnet
Eth
Inside Firewall
QNS External Nets
Virtual
Management
Machine
Eth
Subnet
or VIP
QNS External Nets

Eth
Signalling
Subnet
lbvip01
Points to
External
lb01/02
172.18.169.36
172.31.235.68
1
1
1
1
172.18.169.37
172.18.169.38
172.18.169.39
172.18.169.40
2
2
172.31.235.69
172.31.235.70
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
172.31.235.71
172.31.235.72
172.31.235.73
172.31.235.74
172.31.235.75
172.31.235.76
172.31.235.77
172.31.235.78
172.31.235.79
172.31.235.80
172.31.235.81
172.31.235.82
172.31.235.83
172.31.235.84
172.31.235.85
172.31.235.86
172.31.235.87
172.31.235.88
172.31.235.89
172.31.235.90
172.31.235.91
lbvip02
172.18.175.36
Points to
Internal
lb01/02
172.18.175.37
172.18.175.38
172.18.175.39
172.18.175.40
172.18.175.41
172.18.175.42
172.18.175.43
172.18.175.44
172.18.175.45
172.18.175.46
172.18.175.47
172.18.175.48
172.18.175.49
172.18.175.50
172.18.175.51
172.18.175.52
172.18.175.53
172.18.175.54
172.18.175.55
172.18.175.56
172.18.175.57
172.18.175.58
172.18.175.59
172.18.175.60
172.18.175.61
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
lb01
lb02
pcrfclient01
pcrfclient02
sessionmgr01
sessionmgr02
sessionmgr03
sessionmgr04
sessionmgr05
sessionmgr06
arbiter
qns01
qns02
qns03
qns04
qns05
qns06
qns07
qns08
qns09
qns10
qns11
qns12
qns13
qns14
IP Networking CDV
QPS Health Check

3.3.7.
1.1
CDV Virtual Machines (VMs) Network Diagram
SM01
QNS08
QNS01
172.18.175.41
172.18.175.55
172.18.175.48
172.31.235.71
172.31.235.85
172.31.235.78
SM02
QNS09
QNS02
172.18.175.39
172.18.175.42
172.18.175.56
172.18.175.49
172.18.169.39
172.31.235.72
172.31.235.86
172.31.235.79
SM03
QNS10
QNS03
172.18.175.40
172.18.175.43
172.18.175.57
172.18.175.50
172.18.169.40
172.31.235.73
172.31.235.87
172.31.235.80
SM04
QNS11
QNS04
172.18.175.58
172.18.175.51
172.31.235.88
172.31.235.81
QPS CDV
PCRFCLIENT01
PCRFCLIENT02
172.18.175.44
172.31.235.74
LB-01
172.18.175.37
172.18.169.37
172.31.235.69
LBVIP-01
LBVIP-02
172.18.175.36
SM05
QNS12
QNS05
172.18.175.45
172.18.175.59
172.18.175.52
172.31.235.75
172.31.235.89
172.31.235.82
SM06
QNS13
QNS06
172.18.175.46
172.18.175.60
172.18.175.53
172.31.235.76
172.31.235.90
172.31.235.83
QNS14
QNS07
172.18.175.61
172.18.175.54
172.31.235.91
172.31.235.84
LB-02
172.31.235.68
172.18.169.36
172.18.175.38
172.18.169.38
172.31.235.70
QPS Health Check

3.3.8.
CDV UCS Blades Assigned
1.1
QPS Health Check
4.
4.1.
1.1
Monitoring and Alert Notification

Architectural Overview
A Cisco QPS deployment is comprised of multiple virtual instances deployed for scaling and high availability
purposes. The QPS Systems Monitoring and Notification Alerting system makes the entire QPS installation appear
as a single appliance. Rather than have administrators deal with a multitude of device agent endpoints, a single
entry point (LB) for NMS operational trending and monitoring is used. Likewise, notification alerting from the entire
system derives from a single point.
When QPS is deployed in a High Availability (HA) configuration, monitoring and alerting endpoints are deployed as
HA as well. This is shown in the illustration below.
4.2.
Technical Architecture
The Quantum Policy Suite is deployed as a distributed virtual appliance. The standard architecture uses VMWare
ESXi virtualization. Multiple physical hardware host components run VMWare ESXi, and each host runs several
virtual machines. Within each virtual machine, one-to-many internal QPS components can run. If you add HA
capabilities to the deployment, monitoring each QPS component individually becomes unwieldy. The QPS
monitoring and alert notification infrastructure simplifies the virtual, physical, and redundant aspects of the
architecture.
This section covers the following topics:
4.3.
Protocols and Query Endpoints
SNMP Object Identifier and Management Information Base
SNMPv2 Data and Notifications
Facility
Severity
Categorization
Emergency Severity Note
Protocols and Query Endpoints
The QPS monitoring and alert notification infrastructure provides a simple, standards-based interface for network
administrators and NMS. SNMPv2 is the underlying protocol for all monitoring and alert notifications. Standard
SNMPv2 gets and notifications (traps) are used throughout the infrastructure and aggregated to an SNMP proxy. This
proxy provides a common endpoint for SNMP queries and also maps components into the Cisco Object Identifier
(OID) tree structure.
The following drawing shows the aggregation and mapping on the SNMP endpoint (LB).
QPS Health Check
4.4.
1.1
SNMP Object Identifier and Management Information Base
Cisco has a registered private enterprise Object Identifier (OID) of 26878. This OID is the base from which all
aggregated QPS metrics are exposed at the SNMP endpoint. The Cisco OID is fully specified and made humanreadable through a set of Cisco Management Information Base (MIB-II) files.
The current MIBs are defined as follows:
4.5.
MIB Filename
Purpose
BROADHOP-MIB.mib
Defines the main structure, including structures and

codes.
BROADHOP-QNS-MIB.mib
Defines the retrievable statistics and KPI.
BROADHOP-NOTIFICATION-MIB.mib
Defines Notifications/Traps available.
SNMPv2 Data and Notifications
The Monitoring and Alert Notification infrastructure provides standard SNMPv2 get and getnext access to the QPS
system. This provides access to targeted metrics to trend and view Key Performance Indicators (KPI). Metrics
available through this part of the infrastructure are as general as component load and as specific as transactions
processed per second.
SNMPv2 Notifications, in the form of traps (one-way) are also provided by the infrastructure. QPS notifications do
not require acknowledgments. These provide both proactive alerts that pre-set thresholds have been passed (for
example, Disk is nearing full, CPU load high) and reactive alerting when system components fail or are in a
degraded state (for example, DEAD PROCESS, network connectivity outages, etc.).
Notifications and traps are categorized by a methodology similar to UNIX System Logging (syslog) with both Severity
and Facility markers. All event notifications (traps) contain these items:
Facility
Severity
Source (device name)
Device time
These objects enable Network Operations Center (NOC) staff to identify where the issue lies,
the Facility (system layer), and the Severity (importance) of the reported issue.
4.6.
Facility
The generic syslog Facility has the following definitions.
Number
Facility
Description
Hardware
Physical Hardware Servers, SAN, NIC, Switch, etc.
Networking
Connectivity in the OSI (TCP/IP) model.
Virtualization
VMWare ESXi (or other) Virtualization
Operating System
Linux, Microsoft Windows, etc.
Application
Apache httpd, load balancer, Cisco QPS, Cisco sessionmgr, etc.
Process
Particular httpd process, Cisco QPS qns01_A, etc.
QPS Health Check
1.1
There may be overlaps in the Facility value as well as gaps if a particular SNMP agent does not have full view into an
issue. The Facility reported is always shown as viewed from the reporting SNMP agent.
4.7.
Severity
In addition to Facility, each notification has a Severity measure. The defined severities are directly from UNIX syslog
and defined as follows:
Number
Severity
Description
Emergency
System is unusable.
Alert
Action must be taken immediately.
Critical
Critical conditions.
Error
Error conditions.
Warning
Warning conditions.
Notice
Normal but significant condition.
Info
Informational message.
Debug
Lower level debug messages.
None
Indicates no severity.
Clear
The occurred condition has been cleared.
For the purposes of the QPS Monitoring and Alert Notifications system, Severity levels of Notice, Info and Debug
are usually not used. Warning conditions are often used for proactive threshold monitoring (for example, Disk
usage or CPU Load), which requires some action on the part of administrators, but not immediately. Conversely,
Emergency severity indicates that some major component of the system has failed and that either core policy
processing, session management or major system function is impacted.
QPS Health Check
4.8.
1.1
Categorization
Combinations of Facility and Severity create many possibilities of notifications (traps) that might be sent. However,
some combinations are more likely than others. The following table lists some noteworthy Facility and Severity
categorizations.
Facility.Severity
Categorization
Possibility
Process.Emergency
A single part of an application

has dramatically failed.
Possible, but in an HA configuration very

unlikely.
Hardware.Debug
A hardware component has sent

a debug message.
Possible but highly unlikely.
Operating System.Alert An Operating System (kernel or

resource level) fault has
occurred.
Possible as a recoverable kernel fault (on

a vNIC for instance).
Application.Emergency An entire application component

has failed.
Unlikely but possible (load balancers

failing for instance).
Virtualization.Emergen
cy
Unlikely but possible (VM won't start, or

vSwitch fault for instance.
The virtualization system has

thrown a fault.
It is not possible quantify every Facility and Severity combination. However, greater experience with QPS
leads to better diagnostics. The QPS Monitoring and Alert Notification infrastructure provides a baseline for
event definition and notification by an experienced engineer.
QPS Health Check
5.
1.1
SNMP System and Application KPIs

Many QPS system statistics and Key Performance Indicators (KPI) are available via
SNMPv2 gets and walks. Both system device level information and application level
information is available. This information is well documented in the BROADHOP-QNSMIB. A summary of the information available is provided below. This section covers the
following topics:
5.1.
SNMP System KPIs
Details of SNMP System KPIs
SNMP Application KPIs
Summary of SNMP Application KPIs
Details of Supported KPIs
SNMP System KPIs
In this table, the system KPI information is provided.
Component
Information
LB01/LB02
CpuUser
PortalLB01/PortalLB02
CpuSystem
PCRFClient01/PCRFClient02
CpuIdle
SessionMgr01/SessionMgr02
LoadAverage1
QNS01/QNS02/QNS03/QNS04
LoadAverage5
Portal01/Portal02
LoadAverage15
MemoryTotal
MemoryAvailable
SwapTotal
SwapAvailable
Eth0InOctets
Eth0OutOctets
Eth1InOctets
Eth1OutOctets
Entel Chile
PRIVATE AND CONFIDENTIAL

Page 19 of 46
Cisco Systems, Inc.
QPS Health Check
5.2.
1.1
Details of SNMP System KPIs
The following information is available, and is listed per component. MIB documentation provides units of
measure.
5.3.
SNMP Application KPIs
Current version Key Performance Indicators (KPI) information is available at the OID root of:
.1.3.6.1.4.1.26878.200.2.3.53
This corresponds to an MIB of:
.iso
.identified-organization
.dod
.internet
.private
.enterprise
.broadhop
.broadhopProducts
.broadhopProductsQNS
.broadhopProductsQNSKPIVersion
.broadhopProductsQNSKPI53
Entel Chile

Page 20 of 46
Cisco Systems, Inc.
QPS Health Check
5.3.1.
1.1
Summary of SNMP Application KPIs
The following application KPI's are available for monitoring on each node using SNMP Get and Walk
utilities.
Component
Information
LB01/LB02
PCRFProxyExternalCurrentSessions: is the
number of open connections to lbvip01:8080.
PCRFProxyInternalCurrentSessions: is the
number of open connections to lbvip02:8080
PortalLB01/PortalLB02
PortalProxyExternalCurrentSessions: is the
number of connections to sslvip01:80.
PCRFClient01/PCRFClient02
----------
SessionMgr01/SessionMgr02
----------
QNS01
PolicyCount: The number of processed policy

messages.
QNS02
QNS03
QNS04
QueueSize: The number of entries in the

processing queue. The default queue size is 500,
but this can be configured by the customer in the
Policy Builder. You can also see the number of
dropped messages in the statistics files.
FailedEnqueueCount
ErrorCount
SessionCount
FreeMemory
Portal01
----------
Portal02
Entel Chile

Page 21 of 46
Cisco Systems, Inc.
QPS Health Check
6.
1.1
Notifications and Alerting (Traps)
The QPS Monitoring and Alert Notification framework provides the following SNMPv2 notification traps (oneway). Traps are either proactive or reactive. Proactive traps are alerts based on system events or changes
that require attention (for example, Disk is filling up). Reactive traps are alerts that an event has already
occurred (e.g., an application process died).
This section covers the following topics:
6.1.
Component Notifications
Application Notifications
Component Notifications
Components are devices that make up the QPS system. These are systems level traps. They are generated
when some predefined thresholds are crossed. User can define these thresholds in
/etc/snmp/snmpd.conf. For example, for disk full, low memory etc. Process snmpd is running on all the
VMs. When process snmpd starts, it notes the values set in snmpd.conf. Hence, whenever user makes any
change in snmpd.conf, the user must execute command
service snmpd restart
For example, if threshold crosses, snmpd throws a trap to LBVIP on the internal network on port 162. On
LB, process snmptrapd is listening on port 162. When snmptrap sees trap on 162, it logs it in the file
/var/log/snmp/trap and re-throws it on corporate_nms_ip on port 162. This corporate NMS IP is set
inside /etc/hosts file on LB1 and LB2. Typically, these components equate to running Virtual Machines.
Component notifications are defined in the BROADHOP-NOTIFICATION-MIB as follows:

broadhopQNSComponentNotification NOTIFICATION-TYPE
OBJECTS { broadhopComponentName,
broadhopComponentTime,
broadhopComponentNotificationName,
broadhopNotificationFacility,
broadhopNotificationSeverity,
broadhopComponentAdditionalInfo }
STATUS
current
DESCRIPTION "
Trap from any QNS component - i.e. device.
"
::= { broadhopProductsQNSNotifications 1 }
Entel Chile

Page 22 of 46
Cisco Systems, Inc.
QPS Health Check
1.1
Each Component Notification contains:
Name of the device throwing the notification (broadhopComponentName)
Time the notification was generated (broadhopComponentTime)
Facility or which layer the notification came from (broadhopNotificationFacility)
Severity of the error (broadhopNotificationSeverity)
Additional information about the notification, which might be a bit of log or other information.
Component Notifications that QPS generates are shown in the following list. Any component in the QPS
system may generate these notifications.
Name
Feature
Severity
Message Text
Disk Full: This alarm gets

generated for following file
system:
Component
Warning
DiskFullAlert
1.
Current disk usage has passed a designated threshold. This situation

may resolve on its own, but could be a sign of logs or database files
growing large.
2. /var
3. /home
4. /boot
5. /opt
Name
Feature
Severity
Message Text
Disk Full Clear: This alarm gets

generated for following file
system:
Component
Clear
DiskFullClear
1.
Current Disk usage has recovered the designated Threshold.
2. /var
3. /home
4. /boot
5. /opt
Load Average of local system:

The alarm gets generated for
Component
1 minute
Warning (1, 5
minutes)
HighLoadAlert
Alert (15
minutes)
5 minute
15 minute Average
Current CPU load is more than configured threshold for 1/5/15

minutes.
Load Average Clear of local
Component
Entel Chile
Clear

Page 23 of 46
HighLoadClear
Cisco Systems, Inc.
QPS Health Check
system: The alarm gets

generated for
1.1
Current CPU load has recovered from more than configured

threshold.
1 minute
5 minute
15 minute Average
Low Swap memory alarm
Operating System Warning
LowSwapAlert
Current swap usage has passed a designated threshold. This is a

warning.
Low Swap memory clear
Operating System Clear
LowSwapClear
Current swap usage has recovered a designated threshold.

Interface Down Alarm: This
alarm gets generated for all
physical interface attached to the
system.
Operating System Alert
Interface Up Alarm: This alarm

gets generated for all physical
interface attached to the system.
Operating System Clear
Low Memory Alert Alarm
Operating System Warning
<Interface Name> is Down
Not able to connect or ping to the interface.
<Interface Name> is Up
Able to ping or connect to interface.

LowMemoryAlert
Low memory alert.

Low Memory Clear Alarm
Operating System Info
LowMemoryClear
Low memory alert.
6.2.
Application Notifications
Applications are running processes on a component device that make up the QPS system. These are
application level traps. QPS process (starting with word java when we run "ps -ef") and some scripts (for GR
traps) generates these traps.
For example, when a trap is generated, it is thrown to LBVIP on internal network (can be on port 162. On
LB, process snmptrapd is listening on port 162. When snmptrap sees trap on 162, it logs it in the file
/var/log/snmpd/trap and re-throws it on corporate_nms_ip on port 162. This corporate NMS IP is set
inside /etc/hosts file on LB1 and LB2.
Application notifications are defined in the BROADHOP-NOTIFICATION-MIB as follows:

broadhopQNSApplicationNotification
NOTIFICATION-
TYPE OBJECTS { broadhopComponentName,

broadhopComponentTime,
broadhopComponentNotificationName,
broadhopNotificationFacility,
broadhopNotificationSeverity,
broadhopComponentAdditionalInfo }
Entel Chile

Page 24 of 46
Cisco Systems, Inc.
QPS Health Check
1.1
STATUS
current
DESCRIPTION "
Notification Trap from any QNS application i.e.,
runtime.
"
::= { broadhopProductsQNSNotifications 2 }
Each Application Notification contains these elements:
Name of the device throwing the notification (broadhopComponentName)
Time the notification was generated (broadhopComponentTime)
Facility or which layer the notification came from (broadhopNotificationFacility)
Severity of the error (broadhopNotificationSeverity)
Additional information about the notification, which might be a portion of log or other information
Application Notifications that QPS generates are shown in the following list. Any application in QPS system
may generate these notifications.
Name
Feature
Severity
Message Text
License Usage
Threshold Exceeded
Application
Major/Minor/War "Session Count License Usage

ning/Critical/Cle at: xx%,threshold at: xx%"
ar
The license threshold is defined and SNMP traps are sent out if the thresholds
ere exceeded by the real license usage. This limits the total session count usage.
Memcached
Application
Critical
ConnectError
Generated if the system is unable to connect to memcached server or unable to

write to it.
MemcachedConnect
Error
Application
Major
Memcached server is in error: %s
Memcached server is in error: OR

some exception generated message.
Generated if attempting to connect to or write to the memcached server causes

an exception. %s is the exception that occurred.
ApplicationStartErro Application
r
Alert
Feature %s is unable to start. Error

%s
Generated if an installed feature cannot start.

LicensedSessionCre
ation
Entel Chile
Application
Critical

Page 25 of 46
Common Services: 4:Session

creation is not allowed
Cisco Systems, Inc.
QPS Health Check
1.1
A predefined threshold of sessions covered by licensing has been passed. This

is a warning and should be reported. License limits may need to be increased
soon. This message can be generated by an invalid license, but the
AdditionalInfo portion of the notification shows root cause.
InvalidLicense
Application
Emergency
"xxx license has not been verified

yet"
The system license currently installed is not valid. This prevents system
operation until resolved. This is possible if no license is installed or if the
current license does not designate values. This also may occur if any system
networking MAC addresses have changed..
Application
Emergency
xxx license is invalid %s
Critical
xxx is Expired %s
Major
xxx will expire soon %s
License is invalid.
Application
License has expired.
Application
License is going to expire soon.

Application
Critical
xxx has exceeded the allowed

parameters %s
License has exceeded the allowed parameters.

Application
Major
xxx is nearing the allowed

parameters %s
License is nearing allowed parameters.

Name
Feature
Severity
Message Text
PolicyConfiguration
Application
Major
"Last policy configuration failed

with the following message:xxx"
A change to system policy structure has failed. The AdditionalInfo portion of

the notification contains more information. The system typically remains in a
proper state and continues core operations. Either make note of this message or
investigate more fully.
SessionManagerUna
vailable
Application
Major
"$HOST : $PORT session

management node is not available"
The reported session management node is unavailable. This should not

immediately affect core services, but may degrade service performance.
PoliciesNot
Application
Emergency
"Policies not configured "
Configured
The policy engine cannot find any policies to apply while starting up. This may
occur on a new system, but requires immediate resolution for any system
services to operate.
DiameterPeerDown
Application
Major
host $HOST realm: %s is down
Critical
HA Failover done from %s to %s of

${SET_NAME}-SET$Loop
Diameter peer is down.

HA_Failover
Application
Primary member of replica is down and taken over by other primary member of
same replica set.
Entel Chile

Page 26 of 46
Cisco Systems, Inc.
QPS Health Check
Geo_Failover
Application
1.1
Critical
Geo Failover done from %s to %s of

Primary member of replica set is down and taken over by other member of the
same replica set.
All_replica_of_DB_
down
Application
Critical
All replicas of
${SET_NAME}-SET$Loop are
down
Not able to connect to any member of replica set.

No_Primary_membe
r_found
Application
Critical
Unable to find primary member for

Replica-set
Unable to find primary member for replica-set.

Secondary DB Down Application
Critical
Secondary DB
%member_ip:%mem_port
(%mem_hostname) of SET $SET is
down
In replica set, secondary DB member is not able to connect.

Arbiter Down
Application
Critical
Arbiter %member_ip:%mem_port
down
In replica set, the administrator is not able to connect to configured arbiter.
Name
Feature
Severity
Message Text
Config Server is
Down
Application
Critical
Config Server
%member_ip:%mem_port
down
In replica set, the administrator is not able to connect to Configured Config

Server.
Site Down
Application
Critical
Site %site is down
The administrator is not able to connect to the Site.

VM Down
Application
Critical
unable to connect %member_ip

(%member) VM. It is not reachable.
The administrator is not able to ping to VM (This alarms gets generated for all
VMs configured inside /etc/hosts of lb).
QPS Process Down
Application
Critical
%server server on %vm vm is down
QPS java process is down.

Admin Logged In
Application
Critical
root user logged in on %hostname

terminal %terminal from machine
%from_system
root user logged in on %hostname terminal.

VIRTUALInterfaceE Network
rror
Entel Chile
Alert

Page 27 of 46
unable to connect
%INTERFACE(lbvip01/lbvip02)
VM. Not reachable
Cisco Systems, Inc.
QPS Health Check
1.1
Not able to ping the virtual Interface. This alarm gets generated for lbvip01,
lbvip02.
Developer Mode
License traps
Application
Warning
Common Services: 4:Using

POC/Development license (100
session limit). To use a license file,
remove
-Dcom.broadhop.developer.mode
from /etc/broadhop/qns.conf
Generated if developer mode is configured in qns.conf.

Can't create sessions
due to errors
Application
ZeroMQConnection
Error
Application
Major
Session creation is not allowed
Generated if license is not installed or session count has reached allowed

license level.
Error
QNS Failed to Send

Message:tcp://%s:%d
Internal services cannot connect to a required Java ZeroMQ queue. Although

retry logic and recovery is available, and core system functions should
continue, investigate and remedy the root cause.
ZeroMQConnection
Error(clear)
Application
Clear
Send Message Success:tcp://%s:%d
Name
Feature
Severity
Message Text
Diameter Critical
Failure
Application
Critical
Realm: <<xxx.com>> all peers are

down
Internal services can connect to a required Java ZeroMQ queue.
The realm name within << >> is the

name configured by customer which
has no connected peers.
All diameter peer connections configured in a given realm are DOWN (i.e.
connection lost). The alarm identifies which realm is down. The alarm is
cleared when at least one of the peers in that realm is available.
Diameter Critical
Failure
Application
Major
Host: <<ci-host-gx>> Realm:

<<ci-gx-client.com>> is down
First <<>> is hostname as
configured; second << >> is realm
name of that host
The connection to a diameter peer configured in a given realm is down. The

alarm identifies which peer connection is down. The alarm is cleared when the
connection to that peer is available once again. Also when peer is back UP, we
issue a Clear alarm.
Entel Chile

Page 28 of 46
Cisco Systems, Inc.
QPS Health Check
7.
1.1
Configuration and Usage
All access to system statistics and KPIs should be collected via SNMP gets and walks from the virtual IP
lbvip01, which can be located on either lb01 or lb02 load balancers. System notifications are also sourced
from this address.
Configuration of the system consists of following:
Configuration for SNMP gets and walks
Configuration for Notifications (traps)
License Usage Threshold
Validation and Testing
7.1.
Configuration for SNMP gets and walks
At the time of installation, SNMPv2 gets and walks can be performed against the system lbvip01 with the
default read-only community string of Cisco using standard UDP port 161. The IP address of lbvip01 can be
found in the /etc/hosts file of pcrfclient01, lb01 or lb02.
The read-only community string can be changed from its default of Cisco to a new value using the following
steps:
7.2.
Configuration for Notifications (traps)
After the previous configurations have been made, notifications should be logged locally in the
/var/log/snmp/trap file as well as forwarded to the NMS destination at corporate_nms_ip. By default,
traps are sent to the destination corporate_nms_ip using the SNMPv2 community string of Cisco. The
standard SNMP UDP trap port of 162 is also used. Both of these values may be changed to
7.3.
Validation and Testing
This section describes the commands for validation and testing of the QPS SNMP infrastructure during its
development. You can use these commands now to validate and test your system during setup,
configuration, or at any point. Our examples use MIB values because they are more descriptive, but you may
use equivalent OID values if you like, particularly when configuring an NMS.
The examples here use Net-SNMP snmpget, snmpwalk and snmptrap programs. Detailed configuration of
this application is outside the scope of this document, but the examples assume that the three Cisco MIBs
are installed in the locations described on the man page of snmpcmd (typically the
/home/share/<user>/.snmp/mibs or /usr/share/snmp/mibs directories).
Validation and testing is of three types and correspond to the statistics and notifications detailed earlier in
this document:
Component Statistics
Application KPI
Notifications
Entel Chile

Page 29 of 46
Cisco Systems, Inc.
QPS Health Check
1.1
Run all tests from a client with network access to the Management Network or from the lb01, lb02,
pcrfclient01 or pcrfclient02 hosts (which are also on the Management Network).
7.4.
Component Statistics
Component statistics can be obtained on a per statistic basis with snmpget. As an example, to get the
current available memory on pcrfclient01 use the following command:
snmpget -v 2c -c broadhop -m +BROADHOP-MIB:BROADHOP-QNS-MIB <lbvip01>

BROADHOP-QNS-MIB::component53PCRFClient01MemoryAvailable
where <lbvip01> is the IP address of lbvip01 or as resolved from the /etc/hosts file. An example of the
output from this command is:
BROADHOP-QNS-MIB::component53PCRFClient01MemoryAvailable = INTEGER: 629100
Interpret this output means that 629,100 MB of memory are available on this component machine.
All available component statistics in an MIB node can be walked via the snmpwalk command. This is very
similar to snmpget as above. For example, to see all statistics on lb01 use the command:
snmpwalk -v 2c -c broadhop -m +BROADHOP-MIB:BROADHOP-QNS-MIB <lbvip01>

BROADHOP-QNS-MIB::broadhopProductsQNSComponents53LB01
output from this command is:
7.5.
Application KPI
Application KPI can be obtained on a per statistic basis with snmpget in a manner much like obtaining
Component Statistics. As an example, to get the number of sessions currently active on qns01, use the
following command:
snmpget -v 2c -c broadhop -m +BROADHOP-MIB:BROADHOP-QNS-MIB <lbvip01>

BROADHOP-QNSMIB::kpi53QNS01SessionCount
output from this command would be:
Entel Chile

Page 30 of 46
Cisco Systems, Inc.
QPS Health Check
1.1
BROADHOP-QNS-MIB::kpi53QNS01SessionCount = STRING: 937

Read this output means that 937 sessions are active on qns01.
Similarly, all available KPI in an MIB node can be walked via the snmpwalk command. This is very similar
to snmpget as above. As an example, to see all statistics on qns02, use the following command:
snmpwalk -v 2c -c broadhop -m +BROADHOP-MIB:BROADHOP-QNS-MIB <lbvip01>

BROADHOP-QNS-MIB::broadhopProductsQNSKPI53QNS02
output from this command would be:
BROADHOP-QNS-MIB::kpi53QNS02PolicyCount
STRING:
BROADHOP-QNS-
MIB::kpi53QNS02QueueSize = STRING: 0
BROADHOP-QNS-MIB::kpi53QNS02FailedEnqueueCount
STRING:
BROADHOP-QNS-MIB::kpi53QNS02ErrorCount
STRING: 0 BROADHOP-QNS-MIB::kpi53QNS02SessionCount =
STRING: 937
BROADHOP-QNS-MIB::kpi53QNS02FreeMemory = STRING: 3721598032
Entel Chile

Page 31 of 46
Cisco Systems, Inc.
QPS Health Check
8.
1.1
Notifications
Testing and validating notifications requires slightly more skill than testing SNMP gets and walks. Recall
that the overall architecture is that all components and applications in the QPS system are configured to
send notifications to lb01 or lb02 via lbvip02, the Internal Network IP. These systems log the notification
locally in /var/log/snmp/trap and then re-throw the notification to the destination configured by
corporate_nms_ip. Two testing and troubleshooting methods are illustrated below: confirming notifications
are being sent properly from system components to lb01 or lb02, and confirming that notifications can be
sent upstream to the NMS.
8.1.
Receiving Notifications
There are several ways to confirm that lb01 or lb02 are properly receiving notifications from components.
First, determine the active load balancer it is either lb01 or lb02 and have multiple IP addresses per
interface as shown by the ifconfig command.
8.2.
Upstream Notifications
Should a notification not be received by the NMS, you can manually throw a notification from the active load
balancer to the NMS using this command:
snmptrap -v 2c -c broadhop <corporate_nms_ip> ""

NET-SNMP-EXAMPLES-MIB::netSnmpExampleHeartbeatNotification
netSnmpExampleHeartbeatRate i 123456
where <corporate_nms_ip> is the appropriate NMS IP address. This sends an SNMPv2 trap from the
active load balancer to the NMS and can be used for debugging.
Entel Chile

Page 32 of 46
Cisco Systems, Inc.
QPS Health Check
9.
1.1
Logs
Most of the QPS logs are located locally in /var/log/broadhop/ on virtual machines IOMGRxx, QNSxx,
and PCRFclientXX. PCRFclient01 contains the consolidated logs from all of the IOMGR, QNS and
PCRFCLIENT virtual machines.
The QPS logs can be divided based on Application/Script which produces the logs:
Application/Script Produces Logs: Upgrade Script, Upgrade Binary, page 2-64
Application/Script Produces Logs: qns, page 2-64
Application/Script Produces Logs: qns pb, page 2-65
Application/Script Produces Logs: mongo, page 2-66
Application/Script Produces Logs: httpd, page 2-66
Application/Script Produces Logs: portal, page 2-67
Application/Script Produces Logs: license manager, page 2-67
Application/Script Produces Logs: svn, page 2-68
Application/Script Produces Logs: auditd, page 2-68
Application/Script Produces Logs: graphite, page 2-68
Application/Script Produces Logs: kernel, page 2-69
9.1.
Useful logs for troubleshooting
QNS pcrfclient01:
/opt/broadhop/log/consolidated-qns.log
/opt/broadhop/log/qns-engine.log
QNS01 04 & LB01/02:
/var/log/broadhop
Mongo sessionmgr01/02:
/var/log/mongodb-277XX.log
Portal portal01/02:
/var/log/httpd/access.log
9.2.
AIO:
Application/Script Produces Logs: Upgrade Script, Upgrade Binary
Log: upgrade log
Description: Log messages generated during upgrade of QPS.
Log file name, format, path:
/var/log/broadhop/upgrade<date>_<time>.log
HA/GR:
Entel Chile
pcrfclient01:/var/log/broadhop/upgrade<date>_<time>.log
Log config File: NA
Log Rollover: No
Page 33 of 46
Cisco Systems, Inc.
QPS Health Check
9.3.
1.1
Application/Script Produces Logs: qns
Log: qns log

Description: Main and most detailed logging. Contains initialization errors
and application level errors.

AIO: /var/log/broadhop/qns-<instance no>.log

HA/GR: pcrfclient01:/var/log/broadhop/qns-<instance no>.log
Log config File: /etc/broadhop/logback.xml
Log Rollover: No
Log: qns engine logs

Description: Higher level event based logging including what services a
subscriber has, the state of the session, and other useful information.
AIO: /var/log/broadhop/qns-engine-<instance no>.log

HA/GR: /var/log/broadhop/qns-engine-<instance no>.log
Log config File: NA
Log Rollover: No
Log: qns service logs

Description: Contains start up logs. Also if the logback.xml is incorrectly
formatted all logging statements go into this log.
AIO: /var/log/broadhop/service-qns-<instance no>.log

HA/GR: qns0*: /var/log/broadhop/service-qns-<instance no>.log
Log config File: NA
Log Rollover: No
Log: consolidated qns logs

Description: Contains the consolidation of all qns logs with the IP of the
instance as part of the log event.

Log file name,
format, path: AIO:

NA
HA/GR: pcrfclient01: /var/broadhop/log/consolidated-qns.log
Log config File: /etc/broadhop/controlcenter/logback.xml
Log Rollover: No
Log: consolidated engine logs

Description: Contains the consolidation of all qns engine logs with the IP of the
instance as part of the log event.

Log file name,
Entel Chile

Page 34 of 46
Cisco Systems, Inc.
QPS Health Check
1.1
format, path: AIO:

NA
HA/GR:
/etc/broadhop/controlcenter/logback.xml
Log config File: /etc/broadhop/controlcenter/logback.xml

Log Rollover: No
Log: consolidated diagnostics logs

Description: Contains logs about errors occurred during diagnostics of QPS.
Log file name,
format, path: AIO:

NA
HA/GR: /pcrfclient01: /var/log/broadhop/ consolidated-diag.log
Log config File: NA
Log Rollover: No
9.4.
Application/Script Produces Logs: qns pb
Log: qns pb logs

Description: Policy builder startup, initialization logs get logged into this log file.
Log file name, format,
path: AIO:
/var/log/broadhop/ qnspb.log
HA/GR: pcrfclient01: /var/log/broadhop/qns-pb.log
Log config File: NA
Log Rollover: No
Log: service qns pb logs

Description: Policy builder service logs.
AIO: /var/log/broadhop/ service-qns-pb.log

HA/GR: pcrfclient01: /var/log/broadhop/service-qns-pb.log
Log config File: NA
Log Rollover: No
Log: qns engine pb logs

Description: Policy builder engine logs.
AIO: /var/log/broadhop/ qns-engine-pb.log

Entel Chile

Page 35 of 46
Cisco Systems, Inc.
QPS Health Check
1.1
HA/GR: pcrfclient01: /var/log/broadhop/qns-engine-pb.log

Log config File: NA
Log Rollover: No
9.5.
Application/Script Produces Logs: mongo
Log: mongo db logs
Description: Contains useful information about the mongo db operations including

queries, errors, warnings, and users behavior.

path: AIO:
/var/log/mongodb<port>.log
HA/GR: sessionmgr01: /var/log/mongodb-<port>.log
9.6.
Log config File: /etc/mongod.conf
Log Rollover: No
Application/Script Produces Logs: httpd
Log: httpd access logs

Description: Apache server records all incoming requests and all requests
processed to a log file.

path: AIO:
/var/log/httpd/access_l
og
HA/GR: pcrfclient01: /var/log/httpd/access_log
Log config File: /etc/httpd/conf/httpd.conf
Log Rollover: Yes
Log: httpd error logs

Description: All apache errors/diagnostic information about other errors found
during serving requests are logged to this file. This apache log file often
contain details of what went wrong and how to fix it.
AIO: /var/log/httpd/error_log
HA/GR: pcrfclient01: /var/log/httpd/error_log
Log config File: /etc/httpd/conf/httpd.conf
Log Rollover: Yes
Entel Chile

Page 36 of 46
Cisco Systems, Inc.
QPS Health Check
9.7.
1.1
Application/Script Produces Logs: portal
Log: portal request logs

Description: Contains the API requests that are sent from the portal to the QNS
system.
AIO:
/var/www/portal/app/tmp/logs/api_request.log
HA/GR: portal01: /var/www/portal/app/tmp/logs/api_requests.log

Log config File: NA
Log Rollover: No
Log: portal response logs

Description: Contains the API response that are sent from the qns to portal.
AIO:
/var/www/portal/app/tmp/logs/api_response.log
HA/GR: portal01: /var/www/portal/app/tmp/logs/api_response.log

Log config File: NA
Log Rollover: No
Log: portal error logs

Description: Contains error level logs of the application. Generally not useful to
anyone except the portal developers. It generally reveals permission issues on

installations.
AIO: /var/www/portal/app/tmp/logs/error.log
HA/GR: portal01: /var/www/portal/app/tmp/logs/error.log
Log config File: NA
Log Rollover: No
Log: portal debug logs

Description: Contains debug level logs of the application.Generally not useful to
anyone except the portal developers. It generally reveals permission issues

on installations.
AIO: /var/www/portal/app/tmp/logs/debug.log
HA/GR:
/var/www/portal/app/tmp/logs/debug.log
Log config File: NA

Log Rollover: No
Entel Chile

Page 37 of 46
Cisco Systems, Inc.
QPS Health Check
9.8.
1.1
Application/Script Produces Logs: license manager
Log: lmgrd logs
Description: Contains license file related errors.

path: AIO:
/var/log/broadhop/lmgrd.l
og
HA/GR: pcrfclient01: /var/log/broadhop/lmgrd.log
9.9.
Log config File: NA
Log Rollover: No
Application/Script Produces Logs: svn
Log: SVN log
Description: svn log command displays commit log messages. For more
information refer:
/usr/bin/svn log -help. For example,./usr/bin/svn log http://lbvip02/repos/run
Log file name,

format, path: AIO:
NA
HA/GR: NA
9.10.
Log config File: NA
Log Rollover: No
Application/Script Produces Logs: auditd
Log: audit logs
Description: Contains log of all sessions established with QPS VM. SSH
session logs, cron job logs.
Log file name,

format, path: AIO:
/var/log/audit/audit.lo
g
HA/GR: pcrfclient01: /var/log/audit/audit.log
Entel Chile
Log config File: NA
Log Rollover: Yes

Page 38 of 46
Cisco Systems, Inc.
QPS Health Check
9.11.
1.1
Application/Script Produces Logs: graphite
Log: carbon client logs

Description: Contains client connection logs.
path: AIO:
/var/log/carbon/client.l
og
HA/GR: pcrfclient01: /var/log/carbon/client.log
Log config File: /etc/carbon/carbon.conf
Log Rollover: No
Log: carbon console logs

Description: Contains process startup and initialization logs.
path: AIO:
/var/log/carbon/
console.log
HA/GR: pcrfclient01: /var/log/carbon/console.log
Log Rollover: No
Log: carbon query logs

Description: Contains log queries which are performed on the application.
path: AIO:
/var/log/carbon/query.l
og
HA/GR: pcrfclient01: /var/log/carbon/query.log
Log Rollover: No
Log: carbon creates logs

Description: This log tells you what .wsp (whisper) db files are being created.
path: AIO:
/var/log/carbon/creates.l
og
HA/GR: pcrfclient01: /var/log/carbon/creates.log
Entel Chile

Page 39 of 46
Cisco Systems, Inc.
QPS Health Check
1.1

Log Rollover: Yes
Log: carbon listener logs

Description: Contains connection related logs.
path: AIO:
/var/log/carbon/listener.l
og
HA/GR: pcrfclient01: /var/log/carbon/listener.log
Log Rollover: Yes
9.12.
Application/Script Produces Logs: kernel
Log: haproxy
Description: Contains information like if VIP handoff.
Log file name,

format, path: AIO:
/var/log/messages
HA/GR: pcrfclient01: /var/log/messages
Entel Chile
Log config File: NA
Log Rollover: Yes

Page 40 of 46
Cisco Systems, Inc.
QPS Health Check
1.1
10. Basic Troubleshooting Using QPS Logs
10.1.
Start the engine logs on pcrfclient01.

This displays issues or problems in the subscriber or services. If the event is not
found in the engine logs, check the qns logs to look for anomalies.
It is good to understand when the call was supposed to occur in order to narrow
down the issue.
If there are no ERRORs or no exceptions, etc, then we can increase the logging
levels. Policy tracing and logs at DEBUG levels can usually indicate the problem.
But just like a router, too much debugging can affect the performance of the system.
Use grep usernames, mac addresses, IP addresses, etc in logs to find required
information.
Logging Level and Effective Logging Level
QNS
Modify the file /etc/broadhop/logback.xml
Error messages can be set to INFO, WARN, DEBUG, TRACE
It takes 1 min for the error level to take effect
TRACE messages are so verbose they are not recorded in the consolidated log in a distributed
configuration so you need review individual logs on the VMs themselves
Logging should be off by default. To enable dynamically edit

/var/www/portal/app/Config/broadhop.php
Portal
Logging level and the actual effective logging level can be two different levels because of the following
logback logging rules:
1. When a logging level is set, if the logging level of the parent process is higher than the logging level of
the child process, then the effective logging level of the child process is that of the parent process.
That is, even though the logging level of the child process is set, it cannot be below the logging level
of the parent process and is automatically overridden to the higher logging level of the parent
process.
2. There is a global root logging level that each process can inherit as an effective default logging
level. If you do not want to have a default effective logging level, then set the root level to OFF.
3. Each logging level prints the output of the lower logging levels.
Entel Chile

Page 41 of 46
Cisco Systems, Inc.
QPS Health Check
1.1
This following table displays the logging level and the message types printed due to 3. above.
Level
Message Types Printed
All
Equivalent to Trace and some more messages.
Trace
Trace, Debug, Info, Warn, & Error
Debug
Debug, Info, Warn, & Error
Info
Info, Warn, & Error
Warn
Warn & Error
Error
Error
Off
The following table describes the different logging levels and what they should be used for:
Logging Level Description
Valid Use Case
Invalid Use Case
Error
Error conditions that breaks a

system feature. Errors logging level
should not be used for call flow
errors.
Database is not
available.
Subscriber not found.
Warn
Helps to understand the early signs

that will prevent the system from
functioning in the near future OR
are triggered by unexpected
preconditions in a method.
Retrieved more then

one Gx QoS profile.
Warnings should not

be used for individual
call flows.
Info
Helps to understand the life cycle of

components and subsystems, such
as plug-ins, databases.
NA
Info should not be

used for individual
call flows.
Debug
Helps to understand the flow of the

code execution at Class/Method
level. i.e. in
_createIsgDeviceSession({log...)
NA
NA
Trace
Helps to understand the values of the

statement and branch of logics
within the method for
troubleshooting.
NA
NA
No service found for

session.
You can configure target and log rotation for consolidated logs in log configuration file
/etc/broadhop/controlcenter/logback.xml.
Entel Chile

Page 42 of 46
Cisco Systems, Inc.
QPS Health Check
1.1
The following parameters can be configured for target VM and port.

<appender
name="SOCKET-BASE"
class="ch.qos.logback.classic.net.SocketAppender>"
<RemoteHost>${logging.controlcenter.host:lbvip02}</RemoteHost>
<Port>${logging.controlcenter.port:-5644}</Port>
<ReconnectionDelay>10000</ReconnectionDelay>
<IncludeCallerData>false</IncludeCallerData>
</appender>
The above configuration is used to redirect consolidated logs to lbvip02 VM on port 5644 with reconnection
delay.
Log rotation is configured using following configuration in
/etc/broadhop/controlcenter/logback.xml.
<rollingPolicy
class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy
">
<fileNamePattern>
${com.broadhop.log.dir:-/var/log/broadhop}/consolidateddiag.%i.log.g z
</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>5</maxIndex>
</rollingPolicy>
<triggeringPolicy
class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolic
y">
<maxFileSize>100MB</maxFileSize>
</triggeringPolicy>
Using the above configuration 100 MB log files are generated and after that log files rotate from index 1 to
5. When the 100 MB log file trigger condition is met, the order in which QPS system performs the file
operations is:
Entel Chile
log.5.gz > deleted
log.4.gz > log.5.gz
log.2.gz > log.3.gz
log.1.gz > log.2.gz

Page 43 of 46
Cisco Systems, Inc.
QPS Health Check
1.1
Current > log.1.gz
Similar configurations can be applied for qns logs in /etc/broadhop/logback.xml.
10.2.
Diagnostics and Status scripts
1. Run /opt/broadhop/installer/diagnostics.sh for status

2. Run /opt/broadhop/control/statusall.sh
3. tail -f /var/log/broadhop/consolidated-qns.log on pcrfclient01
10.3.
Go to the bottom of the log file and search backwards for ERROR
Useful Configuration Files
QNS pcrfclient01:
/etc/broadhop/qns.conf
/etc/broadhop/logback.xml
10.4.
Domain Troubleshooting
1. tail f consolidated-qns.log to determine what domain is being calculated for your call flow
2. The domain calculation comes after the location query response to the portal
3. If the domain calculation is wrong either the wrong portal has been determined or the wrong domain is
associated with your desired portal page
4. You can fix your domain through the portal administration page
5. If your domain is correct but the call flow is incorrect after that, access policy builder to review your
domain configuration
10.5.
tcpdump to Troubleshoot Interfaces
Test service definition requests from ISG to the QNS by running the following command. Testing a service does
not require a password. Testing a user does.
a) test aaa group <YOUR_AAA_GROUP> <YOUR_SERVICE_OR_USER_NAME>
<ANY_PASSWORD_FOR_SERVICE_OR_USER_PASSWORD> legacy
Example: test aaa group TRAINING_AAA_140

L4REDIRECT_SERVICE_TRAINING_140 servicecisco legacy
Example: test aaa group TRAINING_AAA_140 <USER_NAME> <USER_PASSWORD>

legacy
b) Use this command on all policy map services that is assigned to the interface you are working
with (e.g. PBHK, OPENGARDEN, L4 REDIRECT, ).
2. Listen for RADIUS traffic from the ISG by logging into lb01 and lb02. Depending on the problem, you want to
review Access Requests (typically port 1812) and/or Accounting Requests (typically port 1813). If you are working
ISG Prepaid, you want to listen in on all active ports (e.g. 1812,1813,1814 and 1815). The following lists
tcpdumps that you may want to run and then review in wireshark. The -w option writes the output to a pcap file
and the -vvv option writes verbose output to the console.
tcpdump -i any port 1812 -s0 -vvv -w /tmp/tcpdump.pcap (Access Request Only)
Entel Chile

Page 44 of 46
Cisco Systems, Inc.
QPS Health Check
1.1
tcpdump -i any port 1812 and port 1813 -s0 -vvv -w /tmp/tcpdump.pcap
tcpdump -i any port 1812 or port 1813 or port 1814 or port 1815 -s0 vvv -w tcpdump.pcap
Entel Chile

Page 45 of 46
Cisco Systems, Inc.
QPS Health Check
1.1
11. Mibs and Components.

Listado de MIBSvs
components.xlsx
Entel Chile

Page 46 of 46
Cisco Systems, Inc.

QPS Rapid TroubleShooting

Transféré par

Informations du document

Titre original

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

QPS Rapid TroubleShooting

Transféré par

Droits d'auteur :

Formats disponibles

QPS Health Check

CiscoQuantumPolicySuite6.1.1Alarming and SNMP Guide

Cisco Systems, Inc.

Cisco has more than 200 offices worldwide. Addresses,

QPS Health Check

QPS Health Check

NODE ACCESS AND IP ADDRESSING .................................................................................................................. 7

3.1. UCS. ..................................................................................................................................................................... 7

MONITORING AND ALERT NOTIFICATION .......................................................................................................... 15

5.1. SNMP System KPIs ............................................................................................................................................. 19

NOTIFICATIONS AND ALERTING (TRAPS) .......................................................................................................... 22

QPS Health Check

Component Statistics ........................................................................................................................................... 30

Receiving Notifications ......................................................................................................................................... 32

Useful logs for troubleshooting............................................................................................................................. 33

BASIC TROUBLESHOOTING USING QPS LOGS ............................................................................................... 41

Logging Level and Effective Logging Level ......................................................................................................... 41

QPS Health Check

QPS Health Check

QPS Health Check

Node Access and IP Addressing

QPS Aplication Site Centro (CNT)

MANGEMENT PCRF CNT:

SW1-PCRF-CDV - VLAN 156

SW2-PCRF-CDV - VLAN 156

QPS Health Check

QNS Internal Nets

QNS External Nets

QPS Health Check

CNT Virtual Machines (VMs) Network Diagram

QPS Health Check

CNT UCS Blades Assigned

QPS Health Check

QPS Aplication Site Cuidad de los Valles (CDV)

MANGEMENT QPS CDV:

QPS Health Check

QNS Internal Nets

QNS External Nets

QPS Health Check

CDV Virtual Machines (VMs) Network Diagram

QPS Health Check

CDV UCS Blades Assigned

QPS Health Check

Monitoring and Alert Notification

Protocols and Query Endpoints

SNMP Object Identifier and Management Information Base

SNMPv2 Data and Notifications

Emergency Severity Note

Protocols and Query Endpoints

QPS Health Check

SNMP Object Identifier and Management Information Base

Defines the main structure, including structures and

Defines the retrievable statistics and KPI.

Defines Notifications/Traps available.

SNMPv2 Data and Notifications

Source (device name)

The generic syslog Facility has the following definitions.

Physical Hardware Servers, SAN, NIC, Switch, etc.

Connectivity in the OSI (TCP/IP) model.

VMWare ESXi (or other) Virtualization

Linux, Microsoft Windows, etc.

Apache httpd, load balancer, Cisco QPS, Cisco sessionmgr, etc.

Particular httpd process, Cisco QPS qns01_A, etc.