
ACI Troubleshooting

BRKACI-2102
Mioljub Jovanovic, Technical Leader

Agenda

Introduction

Understanding Faults and Health status

Tools

Troubleshooting scenarios

Conclusion / Q&A

The way we're used to doing it


# show int eth 1/1 | grep input
30 seconds input rate 97064 bits/sec, 66 packets/sec
input rate 97064 bps, 66 pps; output rate 95008 bps, 57 pps
20297397 input packets  6494649266 bytes
0 input error  0 short frame  0 overrun  0 underrun  0 ignored
0 input with dribble  72 input discard

Good old CLI!

Example: checking the input rate on a specific interface

John Chambers
@CiscoLive #clus, San Diego 2015

The way we do it in APIC

Visualize interface input/output



The way we can do it with ACI


> moquery -c eqptIngrPkts5min -f 'eqpt.IngrPkts5min.unicastRate>"1000"' | egrep -e "^dn|^unicastRate"
dn           : topology/pod-1/node-101/sys/phys-[eth1/34]/CDeqptIngrPkts5min
unicastRate  : 1742.12

Example: finding interfaces with a unicast rate > 1000

> moquery -c eqptIngrPkts5min -f 'eqpt.IngrPkts5min.unicastRate>"1000"' -o xml


<eqptIngrPkts5min childAction="" cnt="18" dn="topology/pod-1/node-101/sys/phys[eth1/34]/CDeqptIngrPkts5min" status="" unicastAvg="10833" unicastBase="0"
unicastCum="2390904" unicastLast="18809" unicastMax="31630" unicastMin="2075"
unicastPer="194995" unicastRate="1089.254093" unicastSpct="0" unicastThr=""
unicastTr="0" unicastTrBase="503518"/>
</imdata>

Query any managed object (MO) for data we need!

Q: That's cool, but how do I know which object/class to query?

Q: It looks cryptic to me... how do I find the meaning of each field?

Check the next slide for the answer.

APIC Management Information Model Reference


From the WebUI

direct URL

https://apic/doc/html/

Connect to APIC (APIC cluster: apic 1, apic 2, apic 3)
- APIC UI (web browser)
- Visore
- CLI (ssh)

Connect to switch (ACI fabric: spine 1-2, leaf 1-5)

We could connect directly to switches as well:
- ssh or console
- visore
- REST

CLI Available at the Switch


AAA via TACACS+, RADIUS and LDAP is supported when logging in to the switch CLI console.
Configuration mode is not supported at the switch console.
There are two scenarios where administrators would log in to the switch console:
- From the APIC UI, the admin can log in remotely to the switch console.
- Log in directly via the serial console port on the switch front panel, or SSH to the management IP via out-of-band or in-band, using username "admin".

For the majority of use cases, the admin should use the APIC (Application Policy Infrastructure Controller).

admin@apic1:~> acidiag fnvread
 ID   Name     Serial Number   IP Address       Role    State      LastUpdMsgId
--------------------------------------------------------------------------------
 101  leaf1    SAL18CLUX85     10.0.40.66/32    leaf    active     0
 102  leaf2    SAL18CBRU00     10.0.64.69/32    leaf    active     0
 103  leaf3    SAL18CLHR05     10.0.40.95/32    leaf    active     0
 104  leaf4    SAL18CAMS14     10.0.40.65/32    leaf    active     0
 105  leaf5    SAL18CCHD53     10.0.112.69/32   leaf    active     0
 201  spine1   SAL18CMUC75     10.0.64.65/32    spine   active     0
 202  spine2   SAL18CFRA11     10.0.64.64/32    spine   active     0
 203  spine3   SAL18CSAN15     10.0.40.69/32    spine   inactive   0x4000000ef664f
 204  spine4   SAL18CSFO14     10.0.112.67/32   spine   inactive   0x4000000ef6650
Total 9 nodes
admin@apic1> ssh leaf1


Fabric Health Overview


Troubleshooting: Where do we start?


Fabric-wide monitoring

Statistics

Faults

Diagnostics

Thresholds

Faults,
Health Scores
Troubleshooting, Drill Downs

Drill-Downs
Stats

Atomic
Counters

ELAM

SPAN

On-Demand
Diagnostics

Switch
Nxos Cli


After logging in to the APIC, you'll see the initial Dashboard screen.

The APIC dashboard provides you with an at-a-glance view of the system
health and fault counts.


System Health shows you a view of the overall health of the ACI system (all nodes, tenants, etc.).
fabricHealthTotal

The graph is plotted from fabricOverallHealthHist5min.

API Inspector
enables us to see REST API calls (GET, DELETE, POST) from WebUI to APIC

82

admin@apic1> moquery -d "/topology/HDfabricOverallHealth5min-0"
Total Objects shown: 1

# fabric.OverallHealthHist5min
index          : 0
childAction    :
cnt            : 31
dn             : /topology/HDfabricOverallHealth5min-0
healthAvg      : 82
healthMax      : 82
healthMin      : 82
healthSpct     : 0
healthThr      :
healthTr       : 0
lastCollOffset : 310
modTs          : never
repIntvEnd     : 2015-04-10T19:24:03.530+01:00
repIntvStart   : 2015-04-10T19:18:53.442+01:00
rn             : HDfabricOverallHealth5min-0
status         :

Prefer JSON or XML instead of text in moquery?
-> No problem: just specify -o json or -o xml with moquery.
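When the default text output of moquery has to be processed further, a small parser is often enough. The following is a minimal sketch, assuming each object starts with a "# class.Name" line followed by "key : value" lines as in the output above; the function name `parse_moquery_text` is hypothetical and not part of any APIC tooling.

```python
def parse_moquery_text(text):
    """Parse moquery's default text output into a list of dicts (one per object)."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.rstrip()
        if line.startswith("# "):
            # A line such as "# fabric.OverallHealthHist5min" starts a new object.
            current = {"_class": line[2:].strip()}
            records.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so timestamps keep their colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return records


sample = """\
Total Objects shown: 1

# fabric.OverallHealthHist5min
index          : 0
cnt            : 31
dn             : /topology/HDfabricOverallHealth5min-0
healthAvg      : 82
repIntvEnd     : 2015-04-10T19:24:03.530+01:00
status         :
"""

records = parse_moquery_text(sample)
print(records[0]["dn"], records[0]["healthAvg"])
```

For anything beyond quick checks, the -o json or -o xml output is probably the more robust input, since it does not depend on column alignment.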

How is topology built?

APIC WebUI and API inspector


- Identify which objects are used to plot the topology
- Re-use fabricLink objects to identify the links
- We could create our own tool for topology, monitoring or troubleshooting

admin@apic1:~> moquery -c fabricLink

# fabric.Link
n1           : 203
s1           : 1
p1           : 1
n2           : 101
s2           : 1
p2           : 51
dn           : topology/pod-1/lnkcnt-101/lnk-203-1-1-to-101-1-51
lcOwn        : local
linkState    : ok
modTs        : 2015-03-13T14:26:39.526+01:00
monPolDn     : uni/fabric/monfab-default
rn           : lnk-203-1-1-to-101-1-51
status       :
wiringIssues :

admin@bdsol-aci2-apic1:~> moquery -c fabricLink | egrep -e ^dn | head -5
dn           : topology/pod-1/lnkcnt-1/lnk-102-1-2-to-1-2-2
dn           : topology/pod-1/lnkcnt-2/lnk-102-1-4-to-2-2-2
dn           : topology/pod-1/lnkcnt-3/lnk-102-1-6-to-3-2-2
dn           : topology/pod-1/lnkcnt-201/lnk-102-1-49-to-201-1-34
dn           : topology/pod-1/lnkcnt-202/lnk-102-1-50-to-202-1-34
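As a starting point for "our own tool", the fabricLink DNs can be turned into an adjacency map. This is a sketch under the assumption that the RN always has the form lnk-<n1>-<s1>-<p1>-to-<n2>-<s2>-<p2>, which matches the n1/s1/p1/n2/s2/p2 fields shown above; the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Matches the trailing RN of a fabricLink DN, e.g. lnk-203-1-1-to-101-1-51
LINK_RN = re.compile(r"lnk-(\d+)-(\d+)-(\d+)-to-(\d+)-(\d+)-(\d+)$")


def parse_link_dn(dn):
    """Return node/slot/port of both link ends as integers."""
    match = LINK_RN.search(dn)
    if match is None:
        raise ValueError("not a fabricLink DN: %r" % dn)
    n1, s1, p1, n2, s2, p2 = (int(g) for g in match.groups())
    return {"n1": n1, "s1": s1, "p1": p1, "n2": n2, "s2": s2, "p2": p2}


def build_adjacency(dns):
    """Map each node ID to the set of node IDs it has links to."""
    neighbors = defaultdict(set)
    for dn in dns:
        link = parse_link_dn(dn)
        neighbors[link["n1"]].add(link["n2"])
        neighbors[link["n2"]].add(link["n1"])
    return neighbors


dns = [
    "topology/pod-1/lnkcnt-1/lnk-102-1-2-to-1-2-2",
    "topology/pod-1/lnkcnt-2/lnk-102-1-4-to-2-2-2",
    "topology/pod-1/lnkcnt-3/lnk-102-1-6-to-3-2-2",
    "topology/pod-1/lnkcnt-201/lnk-102-1-49-to-201-1-34",
    "topology/pod-1/lnkcnt-202/lnk-102-1-50-to-202-1-34",
]
adjacency = build_adjacency(dns)
print(sorted(adjacency[102]))
```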


Visore: web-based MO query and browser tool

https://<IP>/visore.html

fabricNode
adSt              on
childAction
delayedHeartbeat  no
dn                topology/pod-1/node-101
fabricSt          active
id                101
lcOwn             local
modTs             2015-04-08T14:38:44.546+02:00
model             N9K-C9396PX
monPolDn          uni/fabric/monfab-default
name              bdsol-9396px-02
role              leaf
serial            SAL18CLUS15
status
uid               0
vendor            Cisco Systems, Inc
version

<?xml version="1.0" encoding="UTF-8"?><imdata totalCount="1"><fabricNode
adSt="on" childAction="" delayedHeartbeat="no" dn="topology/pod-1/node-101"
fabricSt="active" id="101" lcOwn="local" modTs="2015-04-08T14:38:44.546+02:00"
model="N9K-C9396PX" monPolDn="uni/fabric/monfab-default" name="bdsol-9396px-02"
role="leaf" serial="SAL18CLUS15" status="" uid="0" vendor="Cisco Systems, Inc"
version=""/></imdata>

icurl 'http://apic/api/node/class/fabricNode.xml?query-target-filter=and(eq(fabricNode.id,"101"))'
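The class query URL used with icurl above follows a regular pattern, so it can be assembled programmatically. The sketch below only builds the string and does not contact an APIC; the function name `class_query_url` and its parameters are hypothetical, and the filter is left unencoded to match the icurl example (other HTTP clients may need the double quotes percent-encoded).

```python
def class_query_url(host, cls, conditions, fmt="xml", scheme="http"):
    """Build an APIC class-query URL with an and(eq(...)) query-target-filter.

    conditions is a list of (attribute, value) pairs, all compared with eq().
    """
    terms = ['eq({0}.{1},"{2}")'.format(cls, attr, value)
             for attr, value in conditions]
    query_filter = "and({0})".format(",".join(terms))
    return "{0}://{1}/api/node/class/{2}.{3}?query-target-filter={4}".format(
        scheme, host, cls, fmt, query_filter)


url = class_query_url("apic", "fabricNode", [("id", "101")])
print(url)
```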


The lower half of the screen shows node and tenant health.

Move these sliders down to show only nodes/tenants with lower health.

On the right, you'll see the fault counts by domain (e.g. access, tenant, security) and by type (config, environmental, etc.), as well as APIC cluster health.

How to get object DN from GUI



Health Score
A number between 0 and 100.
Perfect Health Score = 100

Tools and utilities


Network Monitoring and Troubleshooting Tools


Physical Network

Abstracted Network

properties (EP / TEP / contract)

health scores / faults / events / audit

traceroute

iping, itraceroute

show (interface / table / etc)

atomic counters

syslog

statistics

diagnostics (on-demand)

SPAN

ELAM

ping

SPAN


UI Tools
Health

Faults

Audits

Events

Statistics

Call-home

Syslogs

SNMP


UI Operations Tools introduced in APIC 1.1 and 1.2

Visibility & Troubleshooting (also known as Troubleshooting Wizard - TsW)

Capacity Dashboard

ACI Optimizer

EP Tracker

Visualization


MIT access from ishell


admin@apic1:mit> cd /mit
admin@apic1:mit> ls -1l
total 3
drw-rw---- 1 admin admin 512 Apr 24 22:48 comp
drw-rw---- 1 admin admin 512 Apr 24 22:48 dbgs
drw-rw---- 1 admin admin 512 Apr 24 22:48 expcont
drw-rw---- 1 admin admin 512 Apr 24 22:48 fwrepo
drw-rw---- 1 admin admin 512 Apr 24 22:48 topology
drw-rw---- 1 admin admin 512 Apr 24 22:48 uni

moquery: CLI-based MO query tool

admin@apic1:~> moquery -c fabricNode -f 'fabric.Node.id=="1"'
Total Objects shown: 1

# fabric.Node
id               : 1
adSt             : on
delayedHeartbeat : no
dn               : topology/pod-1/node-1
fabricSt         : unknown
lcOwn            : local
modTs            : 2015-04-08T14:27:16.290+02:00
model            : APIC
monPolDn         : uni/fabric/monfab-default
name             : apic1
rn               : node-1
role             : controller
serial           : SAL18CLUS15
status           :
uid              : 0
vendor           : Cisco Systems, Inc
version          :

moquery: some examples (or simply use the WebUI)

Find all EPGs with access encapsulation VLAN 3399

moquery -c fvRsPathAtt -o json -f 'fv.RsPathAtt.encap=="vlan-3399"'

Obtain AAEP based on interface policy group

moquery -c "infraAccPortGrp" | egrep "^dn" | awk '{ print "moquery -d "$3" -x query-target=children | egrep tDn" }'

Query the actual policy group

moquery -d "uni/infra/funcprof/accportgrp-N3k_PG_ddastoli" -x query-target=children

mobrowser: CLI-based MO browser tool

DME running on switch

Switch: NXOS processes, Objectstore (shared memory)

- Get logical MO from PM and push concrete MO to configure the switch
- Delegate local faults, events, records, health score
- Opflex server for external opflex elem
- Atomic counters, core handling
- Collect stats from NXOS and push to APIC

APIC Logs

/var/log/dme/log
/var/log/dme/oldlog

admin@apic1:~> cd /var/log/dme/log
admin@apic1:log> ls -altr *
admin@apic1:log> ls -al svc_ifc_policymgr.*

Switch Logs

/var/log/dme/log
/var/log/dme/oldlog
/var/sysmgr/tmp_logs/

admin@apic1:~> cd /var/log/dme/log
admin@apic1:log> ls -altr *
admin@apic1:log> ls -al svc_ifc_policyelem.*

acidiag: your friend in tough times

admin@apic1:~> acidiag --help
...
    avread               read appliance vector
    fnvread              read fabric node vector
    fnvreadex            read fabric node vector (extended mode)
    rvread               read replica vector
    rvreadle             read replica leader summary
    crashsuspecttracker  read crash suspect tracker state
    validateimage        validate image
    version              show ISO version
    preservelogs         stash away logs in preparation for hard reboot
    platform             show platform
    verifyapic           run apic installation verify command
    bond0test            run bond0 test
    touch                touch special files
    run                  run specific commands and capture output
    installer            installer
    start                start a service
    stop                 stop a service
    restart              restart a service
    reboot               reboot

icurl: CLI utility for data transfer

mkdir /tmp/tac-655555555
cd /tmp/tac-655555555
icurl 'http://localhost:7777/api/class/faultInfo.xml' > faultInfo.xml
icurl 'http://localhost:7777/api/class/faultRecord.xml' > faultRecord.xml
icurl 'http://localhost:7777/api/class/eventRecord.xml' > eventRecord.xml
icurl 'http://localhost:7777/api/class/aaaModLR.xml' > aaaModLR.xml
icurl 'http://localhost:7777/api/class/aaaSessionLR.xml' > aaaSessionLR.xml
cd /tmp
tar zcvf tac-655555555.tgz tac-655555555
cp tac-655555555.tgz /data/techsupport

We can import and analyze active faults, fault history, events history, accounting log and login history.

Now you may download the file from the following URL:

https://apic/files/1/techsupport/tac-655555555.tgz
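Once the XML files are downloaded, they can be analyzed offline. The sketch below counts objects by one attribute using only the Python standard library. It assumes the usual <imdata> wrapper with one child element per object; the sample document is illustrative, and its severity values are made up for the example rather than taken from a real fabric.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_by_attribute(xml_text, attr):
    """Count the children of <imdata> by the value of one attribute."""
    root = ET.fromstring(xml_text)
    return Counter(child.get(attr, "") for child in root)


# Illustrative sample only; severity values are assumptions for this sketch.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<imdata totalCount="3">
  <faultInst severity="minor" descr="Ntp configuration on leaf leaf1 is Not Synchronized"/>
  <faultInst severity="minor" descr="Ntp configuration on leaf leaf2 is Not Synchronized"/>
  <faultInst severity="major" descr="Power supply shutdown. (serial number DCB18CLUS15)"/>
</imdata>"""

by_severity = count_by_attribute(sample, "severity")
for severity, count in by_severity.most_common():
    print(count, severity)
```

To run it against a collected file, read the file contents (for example faultInfo.xml) and pass the text to `count_by_attribute`.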


iShell filesystem - scriptcontainer


Linux
/ - APIC root filesystem
/var/run/bashroot
bashroot/var/log/dme/log

admin shell
/ - ishell root folder
/var/log/dme/log
/debug
/aci
/mit

/mgmt/log/scriptcontainer.log


Troubleshooting scenarios

Topology (ACI Fabric)
2 x spine
2 x leaf N9K-9396px (48 x 1/10G SFP+)
2 x leaf N9K-93128tx (96 x 1/10G Base-T)
1 x leaf N9K-C9372px (48 x 1/10G SFP+)
3 x APIC, 10Gbps

Troubleshooting Scenario


Troubleshooting Web UI performance


Open the web browser's Developer Tools, Network tab:
Ctrl + Shift + I or F12, or Cmd + Opt + I

The Network tab shows the latency of each HTTP request to the APIC server.

REST API call without webtoken

Verify if the APIC is able to process a REST API call without login / APIC-cookie:

http://apic/api/aaaListDomains.xml

Double-click on the specific request to check timing details.

10ms looks good

Note: JSON is used by the APIC WebUI, while we used XML.

How does it look from the APIC's side?

zegrep -A5 "aaaListDomains.json" /var/log/dme/log/nginx*
zegrep -A5 "aaaListDomains.xml" /var/log/dme/log/nginx.bin.log.*

We could use any other criteria for grep: IP, time stamp, etc.

nginx.bin.log.14.gz:

29701||15-05-10 23:11:05.701+02:00||nginx||DBG4||||Request received /api/aaaListDomains.xml||../common/src/rest/./Rest.cc||62
bico 56.827
29701||15-05-10 23:11:05.701+02:00||nginx||DBG4||||httpmethod=1; from 10.48.16.90; url=/api/aaaListDomains.xml; url options=||../common/src/rest/./Request.cc||103
29720||15-05-10 23:11:05.705+02:00||nginx||DBG4||co=doer:255:127:0xff00000003249f06:1||outCode: 200||../common/src/rest/./Worker.cc||357
29720||15-05-10 23:11:05.705+02:00||nginx||DBG4||co=doer:255:127:0xff00000003249f06:1||notifyEvent data ready 0x0||../common/src/rest/./Worker.cc||370
29701||15-05-10 23:11:05.706+02:00||nginx||DBG4||||Reply data (request 831 size 211) <?xml version="1.0" encoding="UTF-8"?><imdata totalCount="4"><aaaLoginDomain name="LOCAL"/><aaaLoginDomain name="RADIUS"/><aaaLoginDomain name="TACACS"/><aaaLoginDomain name="DefaultAuth" guiBanner=""/></imdata> Cookie: NONE||../common/src/rest/./Rest.cc||120
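The log entries above are "||"-delimited, which makes them easy to split when measuring how long the APIC took to handle a request. The sketch below is an assumption-laden reading of that layout: the field names (id, timestamp, process, level, context, message, source, line) are my own labels, not documented names.

```python
from datetime import datetime, timedelta


def parse_dme_log_line(line):
    """Split one '||'-delimited log entry into labeled fields."""
    # The first five fields are fixed; the message may itself contain text,
    # so the last two fields (source file, line number) are split from the right.
    ident, stamp, process, level, context, rest = line.split("||", 5)
    message, source, lineno = rest.rsplit("||", 2)
    return {
        "id": ident,
        "timestamp": datetime.strptime(stamp, "%y-%m-%d %H:%M:%S.%f%z"),
        "process": process,
        "level": level,
        "context": context,
        "message": message,
        "source": source,
        "line": int(lineno),
    }


received = parse_dme_log_line(
    "29701||15-05-10 23:11:05.701+02:00||nginx||DBG4||||Request received "
    "/api/aaaListDomains.xml||../common/src/rest/./Rest.cc||62")
answered = parse_dme_log_line(
    "29720||15-05-10 23:11:05.705+02:00||nginx||DBG4||"
    "co=doer:255:127:0xff00000003249f06:1||outCode: 200||"
    "../common/src/rest/./Worker.cc||357")

elapsed = answered["timestamp"] - received["timestamp"]
print(received["message"], "->", answered["message"], "in", elapsed)
```

The few milliseconds between the two entries are in line with the roughly 10 ms seen in the browser's Network tab.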


Debug data of DMEs is also exposed via REST


APIC

DME

Debug URL

http://apic1/api/nginx/debug/tacacs.xml


Same debug data is accessible from ishell also


admin@apic1:~> cat /debug/bdsol-aci3-apic1/nginx/tacacs/mo
RequestsDispatched : 1511
ResponsesReceived : 1498
Check all the other nifty stats by executing find /debug/*
Example:

admin@apic1:~> find /debug/* -print -type f -exec cat {} \;

You can also check logs matching certain criteria.
The examples below look for TACACS logs or a specific time.

zegrep TAC_ /var/log/dme/log/nginx*
zegrep TAC_ /var/syslog/tmp_logs/nginx*
zegrep "15-05-09 03:48" /var/log/dme/log/*

Troubleshooting Scenario

Finding changes, faults during a certain timeframe

System health change


We noticed a slight decrease in System health.

Is the cause known?
Do we need to perform Root Cause Analysis?
Were there any known changes, maintenance, etc.?

We're not sure: should we call SWAT?

We've suddenly experienced connectivity loss, yet nothing has been changed.

Déjà vu?

Let's think for a second:
What is the most common cause of all network incidents?

Change!

We noticed a slight decrease in System health

We want to check if there were any config changes.

aaaModLR: AAA audit log record, which is automatically generated whenever a user modifies an object.

moquery -c aaaModLR -f 'aaa.ModLR.created=="2015-05-10"'

Match only on May 10th 2015

moquery -c aaaModLR -f 'aaa.ModLR.created>"2015-05-07" and aaa.ModLR.created<"2015-05-10"'

Match audit records (aaaModLR) between 2015-05-07 and 2015-05-10
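When such time-window queries are scripted, the filter string can be generated from datetime values instead of being typed by hand. This is a minimal sketch that only builds the argument list and does not execute moquery; the helper name `created_between` is hypothetical, and it assumes that a timestamp with a time component ("YYYY-MM-DDTHH:MM") is accepted in the same way as the date-only form.

```python
from datetime import datetime


def created_between(start, end, cls="aaa.ModLR"):
    """Build a moquery filter matching objects created inside (start, end)."""
    fmt = "%Y-%m-%dT%H:%M"
    return '{0}.created>"{1}" and {0}.created<"{2}"'.format(
        cls, start.strftime(fmt), end.strftime(fmt))


# Thursday 17:00 until Monday 00:00
audit_filter = created_between(datetime(2015, 5, 7, 17, 0), datetime(2015, 5, 11))
command = ["moquery", "-c", "aaaModLR", "-f", audit_filter]
print(command)
```

Passing the command as a list (for example to `subprocess.run`) avoids the shell-quoting pitfalls of nested single and double quotes.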

Example: looking for audit records by date / time

We don't make changes on non-business days and the day before, so let's see who has performed any config between Thursday evening and Monday morning.

admin@bdsol-aci2-apic1:~> moquery -c aaaModLR -f 'aaa.ModLR.created>"2015-05-07T17:00" and aaa.ModLR.created<"2015-05-11"'
# aaa.ModLR
id          : 8589938110
affected    : uni/fabric/outofsvc/rsoosPath-[topology/pod-1/paths-101/pathep-[eth1/12]]
cause       : transition
changeSet   :
childAction :
code        : E4208269
created     : 2015-05-08T15:22:04.317+01:00
descr       : Interface topology/pod-1/paths-101/pathep-[eth1/12] enabled
dn          : subj-[uni/fabric/outofsvc/rsoosPath-[topology/pod-1/paths-101/pathep-[eth1/12]]]/mod-8589938110
ind         : deletion
modTs       : never
rn          : mod-8589938110
severity    : info
status      :
trig        : config
txId        : 10720396
user        : admin

admin configured interface eth1/12 on node 101

We found there were some admin changes on eth1/12

double click

faultRecord in GUI
We could also check:
eventRecord
healthRecord

Using moquery to dump/sort active faults (faultInst)

admin@apic1:~> moquery -c faultInst | egrep -e "^descr" | sort | uniq -c
      2 descr : Configuration failed for EPG default due to Not Associated With Management Zone
      3 descr : Datetime Policy Configuration for F5clock failed due to : access-epg-not-specified
      1 descr : Failed to form relation to MO AbsGraph-VEStandAloneFuncProfile of class vnsAbsGraph
      1 descr : Failed to form relation to MO fwP-default of class nwsFwPol in context uni/infra
      1 descr : Ntp configuration on leaf leaf1 is Not Synchronized
      1 descr : Ntp configuration on leaf leaf2 is Not Synchronized
      1 descr : Ntp configuration on spine spine1 is Not Synchronized
      1 descr : Power supply shutdown. (serial number DCB18CLUS15)

This quickly sorts and counts all active faults.

Now we could query all faults by criteria such as description (fault.Inst.descr):

moquery -c faultInst -f 'fault.Inst.descr=="Failed to form relation to MO AbsGraph-VEStandAloneFuncProfile of class vnsAbsGraph"'
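The same grouping can be done off-box on saved moquery output. The sketch below mirrors the egrep | sort | uniq -c pipeline with `collections.Counter`; it assumes the "descr : text" layout of moquery's text output, and the function name is hypothetical.

```python
from collections import Counter


def count_descriptions(moquery_text):
    """Count identical fault descriptions in moquery text output."""
    counts = Counter()
    for line in moquery_text.splitlines():
        if line.startswith("descr"):
            # Split on the first colon only; descriptions may contain colons.
            _, _, value = line.partition(":")
            counts[value.strip()] += 1
    return counts


sample = """\
# fault.Inst
descr : Ntp configuration on leaf leaf1 is Not Synchronized
# fault.Inst
descr : Datetime Policy Configuration for F5clock failed due to : access-epg-not-specified
# fault.Inst
descr : Datetime Policy Configuration for F5clock failed due to : access-epg-not-specified
"""

fault_counts = count_descriptions(sample)
for description, count in fault_counts.most_common():
    print(count, description)
```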

Troubleshooting Scenario


NX-OS Style CLI in APIC 1.2


show endpoints
show interface bridge-domain
show health tenant
show health leaf
show faults
show faults last-days 1 history

apic1# show cli manpage ?


WORD Command Name
apic1# show cli manpage show
Cisco APIC NX-OS Style CLI Command Reference

CLI help and a link to the CLI reference, for your convenience

show events last-hours 8 leaf 102


show audits last-minutes 59 leaf 101

show stats granularity 15min leaf 101 interface ethernet 1/2



Example show stats CLI output in APIC 1.2(1)


apic1# show stats granularity 15min leaf 101 interface ethernet 1/2
Start Time           Counter                             Value       Unit
-------------------- ----------------------------------- ----------- ------------------
2016-01-17 10:59:52  Ingress buffer drop packets         0           packets
2016-01-17 10:59:52  Ingress error drop packets          0           packets
2016-01-17 10:59:52  Ingress forwarding drop packets     0           packets
2016-01-17 10:59:52  Ingress link utilization            0           %
2016-01-17 10:59:52  Ingress load balancer drop packets  0           packets
2016-01-17 10:59:52  Total ingress bytes                 35,117,721  bytes
2016-01-17 10:59:52  Total ingress bytes rate            37,331      bytes-per-second
2016-01-17 10:59:52  Total ingress packets               101,816     packets
2016-01-17 10:59:52  Total ingress packets rate          113         packets-per-second
2016-01-17 10:59:40  Egress afd wred packets             0           packets
2016-01-17 10:59:40  Egress buffer drop packets          0           packets
2016-01-17 10:59:40  Egress error drop packets           0           packets
2016-01-17 10:59:40  Egress link utilization             0           %
2016-01-17 10:59:40  Total egress bytes                  22,850,916  bytes
2016-01-17 10:59:40  Total egress bytes rate             25,236      bytes-per-second
2016-01-17 10:59:40  Total egress packets                104,837     packets
2016-01-17 10:59:40  Total egress packets rate           117         packets-per-second

Troubleshooting Scenario


Troubleshooting:
APIC Faults / Visore / debug.log / LTM log

https://<APIC>/visore.html

APIC Faults

/data/devicescript/F5.BIGIP.1.1.0/logs/debug.log

/var/log/*

Scenario: Graph failed-to-apply

After clicking Finish to deploy the graph in a contract, under Deployed Graph Instances you may see the graph in the state failed-to-apply.

APIC Faults

Double-click on a fault. If you need more details, copy the affected object.

Example: L4-L7 fault details using the Visore tool

https://apic/visore.html

Paste the affected object in the Class or DN field.

Visore provides full details of the issue.

APIC debug.log
Locate the APIC that contains the shard configuring the BIG-IP, then go to the following location:
admin@apic1:~> cd /data/devicescript/F5.BIGIP.1.0.0/logs
You will see debug.log and periodic.log:

admin@apic1:logs> ls -al
-rw-r--r-- 2 nobody nobody 52688 Sep 30 11:31 debug.log
-rw-r--r-- 2 nobody nobody 35492 Sep 30 11:30 periodic.log
You can use tail -f debug.log to monitor the process.

APIC debug.log (faults)

Example: mcpd

2014-07-25 18:04:00,675 DEBUG 139789634365184 [172.23.76.198, 8534]: Faults: []


2014-07-25 18:05:47,466 DEBUG 139789634365184 [172.23.76.198, 8543]: result: serviceAudit {'stats':
{'max': 20.035178899765015, 'num': 2, 'last': 20.035178899765015, 'avg': 16.63836646080017, 'min':
13.241554021835327}, 'result': {'faults': [([], 82, "Line 100 apic/service.py::modify: Could not
configure service state: Server raised fault: 'Exception caught in
Networking::urn:iControl:Networking/RouteDomainV2::get_identifier()\nException:
Common::OperationFailed\n\tprimary_error_code
: 17237812 (0x01070734)\n\tsecondary_error_code :
0\n\terror_string
: 01070734:3: Configuration error: Invalid mcpd context, folder not found
(/apic_5794)'")], 'state': 3, 'health': [([], 0)]}}
2014-07-25 18:05:47,467 DEBUG 139789634365184 [172.23.76.198, 8543]: Faults: [([], 82, "Line 100
apic/service.py::modify: Could not configure service state: Server raised fault: 'Exception caught in
Networking::urn:iControl:Networking/RouteDomainV2::get_identifier()\nException:
Common::OperationFailed\n\tprimary_error_code
: 17237812 (0x01070734)\n\tsecondary_error_code :
0\n\terror_string
: 01070734:3: Configuration error: Invalid mcpd context, folder not found
(/apic_5794)'")]


APIC debug.log (faults)


Example: Tagging mismatch
2014-10-07 13:09:51,166 DEBUG 140447157077760 [198.18.128.130, 76]: Faults: []
2014-10-07 13:09:51,187 DEBUG 140447157077760 [None, None]: Waiting for task
2014-10-07 13:09:53,847 DEBUG 140447148685056 [198.18.128.130, 76]: route_domain: Allocated route
domain 907
2014-10-07 13:09:53,957 DEBUG 140447148685056 [198.18.128.130, 76]: route_domain: Setting route domain
907 on device BIGIP1
2014-10-07 13:09:54,140 INFO 140447148685056 [198.18.128.130, 76]: Line 664
apic/service.py::_modify_vlan: Target: : Creating VLAN '4663_16387' ID 202
2014-10-07 13:09:56,532 INFO 140447148685056 [198.18.128.130, 76]: Line 679
apic/service.py::_modify_vlan: Target: : Modifying VLAN '4663_16387' interface '1.1'
2014-10-07 13:09:57,304 DEBUG 140447148685056 [198.18.128.130, 76]: result: serviceModify {'stats':
{'max': 39.48741388320923, 'num': 4, 'last': 6.139014005661011, 'avg': 21.184859931468964, 'min':
6.139014005661011}, 'result': {'faults': [([(0, '', 4663), (7, '', '2752512_16387')], 81, "Line 383
apic/handlers.py::set_interface: device: : VLAN ifc update fail: Server raised fault: 'Exception
caught in Networking::urn:iControl:Networking/VLAN::add_member()\nException:
Common::OperationFailed\n\tprimary_error_code
: 17236569 (0x01070259)\n\tsecondary_error_code :
0\n\terror_string
: 01070259:3: Requested member (1.1) is untagged on another VLAN'")],
'state': 2, 'health': []}}


BIG-IP LTM log


SSH as root into BIG-IP and go to:
[root@bigip:Active:In Sync] log # cd /var/log
[root@bigip:Active:In Sync] log # ls ltm*
ltm        ltm.11.gz  ltm.2.gz  ltm.4.gz  ltm.6.gz  ltm.8.gz
ltm.10.gz  ltm.1.gz   ltm.3.gz  ltm.5.gz  ltm.7.gz  ltm.9.gz

Example output

Jul 19 11:57:53 apic-bigip2 notice mcpd[7439]: 01070638:5: Pool /apic_5668/apic_5668_webPool member /apic_5668/192.168.10.101%1295:80 monitor status down. [ /apic_5668/apic_5668_webMonitor: down ] [ was up for 20hrs:55mins:46sec ]
Jul 19 11:57:54 apic-bigip2 notice mcpd[7439]: 01070638:5: Pool /apic_5668/apic_5668_webPool member /apic_5668/192.168.10.102%1295:80 monitor status down. [ /apic_5668/apic_5668_webMonitor: down ] [ was up for 20hrs:55mins:47sec ]
Jul 19 11:57:54 apic-bigip2 notice mcpd[7439]: 01071682:5: SNMP_TRAP: Virtual /apic_5668/apic_5668_4096_Virtual-Server has become unavailable
Jul 19 11:57:54 apic-bigip2 err tmm[9357]: 01010028:3: No members available for pool /apic_5668/apic_5668_webPool
Jul 19 11:57:54 apic-bigip2 err tmm1[9357]: 01010028:3: No members available for pool /apic_5668/apic_5668_webPool
Jul 19 11:57:54 apic-bigip2 err tmm2[9357]: 01010028:3: No members available for pool /apic_5668/apic_5668_webPool
Jul 19 11:57:54 apic-bigip2 err tmm3[9357]: 01010028:3: No members available for pool /apic_5668/apic_5668_webPool
Jul 19 12:03:02 apic-bigip2 err iprepd[6725]: 015c0004:3: failed connect to 208.87.136.155 on 443
Jul 19 12:03:03 apic-bigip2 err iprepd[6725]: 015c0004:3: Certificate verification error: 18
Jul 19 12:03:03 apic-bigip2 err iprepd[6725]: 015c0004:3: nSendReceiveSsl failed SSL handshake
Jul 19 12:04:11 apic-bigip2 info pfmand[6925]: 01660009:6: Link: 2.1 is DOWN
Jul 19 12:04:11 apic-bigip2 info pfmand[6925]: 01660009:6: Link: 2.2 is DOWN

Access Encap to Fabric Encap

EP A to EP B - simplified
(EP A and EP B are attached to different leaf switches)

1 Regular L2 packet
2 iVXLAN packet
3 Regular L2 packet

How to identify VLAN mapping

Scenario:
VM A is unable to reach other endpoints connected to the Fabric
- ping doesn't work
- ARP doesn't work

Linux VM A: connected to the ACI fabric
MAC: 00:00:33:33:33:33
VLAN 3399

What happens when a packet from EP A reaches the leaf

1 The packet first comes to the Merchant ASIC (BCM)
2 It is forwarded to the destination if it's known on BCM
3 If the destination is not learned in the BCM forwarding table, the packet is sent to the Cisco ASIC

leaf 1:
- Cisco ASIC: 8/12 x 40G, to spines
- Merchant ASIC: 48/96 x 10G, to servers/blade, switches
- EP A (MAC: 00:00:33:33:33:33) is connected on eth 1/34

Linux view

VM MAC: 00:00:33:33:33:33

The VM thinks its interface is in VLAN 3399

Checking the L2 forwarding table on Broadcom (bcm-shell-hw)

switch# bcm-shell-hw "l2 show"
mac=52:54:00:b0:c4:81 vlan=57 GPORT=0x22 modid=0 port=34/xe33 Hit
mac=58:f3:9c:24:2e:87 vlan=15 GPORT=0x2 modid=0 port=2/xe1 Hit
mac=00:00:33:33:33:33 vlan=57 GPORT=0x22 modid=0 port=34/xe33 Hit
mac=52:54:00:c3:b8:2c vlan=58 GPORT=0x22 modid=0 port=34/xe33 Hit
mac=00:22:bd:e2:e2:e2 vlan=49 GPORT=0x7f modid=2 port=127 Static

Broadcom says it's VLAN 57

MAC learning from ACI switch (from the iShell command interface)

switch# show mac address-table interface ethernet 1/34
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
   VLAN     MAC Address      Type      age     Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 53       0000.3333.3333    dynamic                         eth1/34
* 53       5254.00b0.c481    dynamic                         eth1/34
* 54       5254.00c3.b82c    dynamic                         eth1/34

show interface eth 1/34 switchport
* to check if VLANs 53/54 are enabled on the eth1/34 interface

iShell CLI says it's VLAN 53

So which VLAN is it?

Note: we're in the vsh_lc CLI

module-1# show system internal eltmc info vlan access_encap_vlan 3399
vlan_id:            53       :::  hw_vlan_id:      57
vlan_type:          FD_VLAN  :::  bd_vlan:         52
access_encap_type:  802.1q   :::  access_encap:    3399
fabric_encap_type:  VXLAN    :::  fabric_encap:    9891
sclass:             16387    :::  scope:
bd_vnid:            9891     :::  untagged:
acess_encap_hex:    0xd47    :::  fabric_enc_hex:  0x26a3

It's iVXLAN 9891?

Encap VLANs and VXLANs are normalized in the ACI switch; everything in the fabric is iVXLAN.
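To keep the different identifiers apart (access encap 3399, internal VLAN 53, hardware VLAN 57, fabric encap 9891), the eltmc output can be parsed into a dictionary and cross-checked against its hex fields. This is a sketch that assumes the two-column "key: value ::: key: value" layout of the output above; the function name is hypothetical, and "acess_encap_hex" is spelled as it appears in the CLI output.

```python
def parse_eltmc(text):
    """Parse 'key: value ::: key: value' lines into a flat dict of strings."""
    fields = {}
    for line in text.splitlines():
        for part in line.split(":::"):
            key, sep, value = part.partition(":")
            if sep and key.strip():
                fields[key.strip()] = value.strip()
    return fields


sample = """\
vlan_id:            53       :::  hw_vlan_id:      57
vlan_type:          FD_VLAN  :::  bd_vlan:         52
access_encap_type:  802.1q   :::  access_encap:    3399
fabric_encap_type:  VXLAN    :::  fabric_encap:    9891
acess_encap_hex:    0xd47    :::  fabric_enc_hex:  0x26a3
"""

fields = parse_eltmc(sample)
print("access VLAN %s -> VLAN %s (hw VLAN %s) -> fabric encap %s" % (
    fields["access_encap"], fields["vlan_id"],
    fields["hw_vlan_id"], fields["fabric_encap"]))

# The hex fields are the same numbers in base 16.
assert int(fields["acess_encap_hex"], 16) == int(fields["access_encap"])
assert int(fields["fabric_enc_hex"], 16) == int(fields["fabric_encap"])
```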

Is this actually possible with ACI?


Troubleshooting Scenario


End Point Search


We can search for an End Point by IPv4, IPv6 or MAC address.

* Search by wildcard will be available in the APIC 1.2(2) release

Troubleshooting Scenario


iPing CLI

Hint: to check the list of VRF names, use: show vrf

usage:
iping [-V vrf] [-c count] [-S source ip] host

options:
-V : vrf to use for ping (management/overlay-1/Tenant VRF)
-c : # of requests to send.
-i : interval between ICMP echo packets.
-t : Timeout for responses.
-p : Data pattern in payload.
-s : Size
-S : Source Interface name / IP address.

iping internals
leaf1# iping -V tenant:vrf01 -S 64.101.1.1 64.101.1.22

1 leaf1: iping to Endpoint_A (EP_A)
2 EP_A (.22): responds to leaf1

Note: iping is initiated from leaf1. Since EP_A is learned on leaf1, the packet is sent out directly to the EP, not via the spines.

Recommended: set the source IP address to the desired GW (BD IP).

Endpoint_A IP: 64.101.1.22

iping internals
leaf4# iping -V tenant:vrf01 -S 64.101.1.1 64.101.1.22

1 leaf4: iping to Endpoint_A (EP_A) (ICMP echo request to leaf1 TEP)
2 leaf1: ping to Endpoint_A (EP_A)
3 EP_A (.22): responds to leaf4 (via leaf1 and fabric)

Note: we initiated iping from leaf4. Since EP_A is learned on leaf1, the packet is sent via the fabric (via the spines).

The ICMP echo reply packet to the remote leaf4 node is relayed by the local leaf1 node.

Endpoint_A IP: 64.101.1.22

Troubleshooting Scenario


Check ingress traffic rate from CLI multiple ports


leaf1# watch -n 5 -d bcm-shell-hw "show c All RPKT.xe0-16"
Every 5.0s: bcm-shell-hw show c All RPKT.xe0-16          Tue Feb 9 06:06:52 2016
unit is 0
RPKT.xe0   :  368,075,657   +253   84/s
RPKT.xe1   :  351,308,235   +264   87/s
RPKT.xe2   :  332,607,921   +212   70/s
RPKT.xe3   :            0     +0
RPKT.xe4   :       60,649     +0
RPKT.xe5   :       60,696     +0
RPKT.xe6   :            0     +0
RPKT.xe7   :            0     +0
RPKT.xe8   :      193,423     +0
RPKT.xe9   :    1,493,189     +1
RPKT.xe10  :   10,965,614     +5    2/s
RPKT.xe11  :            0     +0
RPKT.xe12  :            0     +0
RPKT.xe13  :            0     +0
RPKT.xe14  :    6,577,648     +0
RPKT.xe15  :            0     +0

A convenient way to check the traffic rate on multiple ports at the same time.

*try also
watch -d bcm-shell-hw "show counters All TPKT"
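If the counter output is captured to a file, the ports that are actually receiving traffic can be picked out automatically. The sketch below assumes each counter line has the form "NAME : total +delta [rate/s]", as in the watch output above; the regular expression and function name are my own and not part of bcm-shell.

```python
import re

# NAME : total +delta [rate/s], with thousands separators in the numbers
COUNTER_LINE = re.compile(
    r"^(?P<name>\S+)\s*:\s*(?P<total>[\d,]+)\s+\+(?P<delta>[\d,]+)"
    r"(?:\s+(?P<rate>[\d,]+)/s)?\s*$")


def to_int(text):
    """Convert '368,075,657' to 368075657."""
    return int(text.replace(",", ""))


def parse_counters(text):
    """Return one dict per counter line; non-matching lines are ignored."""
    rows = []
    for line in text.splitlines():
        match = COUNTER_LINE.match(line.strip())
        if match:
            rows.append({
                "name": match.group("name"),
                "total": to_int(match.group("total")),
                "delta": to_int(match.group("delta")),
                "rate": to_int(match.group("rate")) if match.group("rate") else 0,
            })
    return rows


sample = """\
unit is 0
RPKT.xe0   :  368,075,657   +253   84/s
RPKT.xe3   :            0     +0
RPKT.xe9   :    1,493,189     +1
RPKT.xe10  :   10,965,614     +5    2/s
"""

rows = parse_counters(sample)
busy = [row["name"] for row in rows if row["delta"] > 0]
print(busy)
```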



Troubleshooting Scenario


Capacity Dashboard

The Capacity Dashboard panel displays your usage by range and percentage.

In the example above, we configured a large number of contracts as a demo for this feature.

Troubleshooting Scenario


Visibility and Troubleshooting


0 define session name
1 select end point 1
2 select end point 2
3 start

We define a session name and select the End Points we'd like to troubleshoot visually.

Example connectivity diagram generated for the two selected end points.
We can further select info for a particular datapath.

Troubleshooting Scenario


ELAM

What is ELAM?

ELAM stands for Embedded Logic Analyzer Module.

It is logic present in the ASICs that provides the capability to capture and view one or more packets that match user-specified criteria, from the stream of packets processed by the ASIC.

ELAM Support in Cisco ASIC

[Diagram: Cisco ASIC pipelines.
Ingress Pipeline (Front Panel to Fabric): From BCM, Parser Block, Lookup Block, Packet RW / Sideband, To Fabric.
Egress Pipeline (Fabric to Front Panel): From Fabric, Parser Block, Lookup Block, Packet RW / Sideband, To BCM.
In each pipeline, ELAMs with Input Select Lines and Output Select Lines sit at the Lookup Block.]

ELAM Support in North Star

- The North Star data path is divided into ingress and egress pipelines
- 2 ELAMs are present in each pipeline (Input ELAM and Output ELAM)
- These ELAMs are present at the beginning and end of the lookup block
- ELAMs can be configured using the available select lines
- Packets can be captured on the input ELAM based on an output condition by configuring ELAM in reverse mode

Limitations
- Packets can be captured based on either input select lines or output select lines, but not both
- ELAM configuration should happen in single-user mode


ELAM Support
Input Select Lines Supported
3 Outerl2-outerl3-outerl4
4 Innerl2-innerl3-innerl4
5 Outerl2-innerl2
6 Outerl3-innerl3
7 Outerl4-innerl4
Output Select Lines Supported
0 Pktrw
5 Sideband

Note: Only output select lines 0 and 5 are supported for capturing packets based on output at both output and input.

ELAM Configuration
The flow during ELAM configuration:
1. Init -> 2. Config -> 3. Arm -> (trigger) -> 4. Read -> 5. Reset

Init: Initialize the ELAM: select the ASIC instance, pipeline and select lines
Config: Configure the trigger based on different fields in the packet
Arm: Arm the trigger by setting the fields to match in hardware
Read: Once the trigger has fired, read the report.
Reset: Once the process is complete, reset the trigger to restart the process
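The five steps can be pictured as a small state machine. The sketch below is only a mental model of the slide: the state names, the `step` function and the `"trigger"` event (the hardware match between Arm and Read) are assumptions made for illustration and are not part of the ELAM CLI.

```python
from enum import Enum


class ElamState(Enum):
    IDLE = "idle"
    INITIALIZED = "initialized"
    CONFIGURED = "configured"
    ARMED = "armed"
    TRIGGERED = "triggered"


# Allowed transitions: (current state, action) -> next state.
TRANSITIONS = {
    (ElamState.IDLE, "init"): ElamState.INITIALIZED,
    (ElamState.INITIALIZED, "config"): ElamState.CONFIGURED,
    (ElamState.CONFIGURED, "arm"): ElamState.ARMED,
    (ElamState.ARMED, "trigger"): ElamState.TRIGGERED,  # hardware match
    (ElamState.TRIGGERED, "read"): ElamState.TRIGGERED,  # report can be re-read
}


def step(state: ElamState, action: str) -> ElamState:
    """Apply one action; 'reset' is allowed from any state."""
    if action == "reset":
        return ElamState.IDLE
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} while {state.value}") from None


if __name__ == "__main__":
    state = ElamState.IDLE
    for action in ["init", "config", "arm", "trigger", "read", "reset"]:
        state = step(state, action)
        print(action, "->", state.value)
```

In this model, reading before the trigger has fired is rejected, which mirrors the "ELAM not triggered. No report available" message the switch prints in that situation.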
101

ELAM configuration
Show the trigger
The configured trigger can be verified using the show command
root@module-1(NS-elam-insel3)# show

102

ELAM Report Analysis


The ELAM report is very detailed and dumps many fields.
In Pktrw the important fields are

adj_index
ol_encap_idx
sclass
src_tep_idx
sup_redirect

In Sideband the important fields are

l2flood
fwddrop
bnce
103

ELAM Example
104

What happens when a packet from EP A reaches the leaf

1. The packet first comes to the merchant ASIC (BCM).
2. It is forwarded to the destination if the destination is known on the BCM.
3. If the destination is not learned in the BCM forwarding table, the packet is sent to the Cisco ASIC.

[Diagram: leaf 1 contains a Cisco ASIC (8/12 x 40G, to spines) and a merchant ASIC (48/96 x 10G, to servers/blades and switches). EP A is attached on eth 1/10.]

EP A MAC: 00:25:b5:aa:00:0a
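The three steps above amount to a table lookup with a fallback. The toy model below only illustrates that decision; the function name `ingress_path`, the table layout and the sample entry (`00:25:b5:cc:00:0c` on `eth1/11`) are hypothetical and do not describe the real hardware tables.

```python
def ingress_path(dst_mac: str, bcm_table: dict) -> str:
    """Return where a frame goes after arriving at the merchant ASIC (BCM).

    bcm_table maps a known destination MAC (lower case) to an egress port.
    """
    port = bcm_table.get(dst_mac.lower())
    if port is not None:
        # Step 2: destination is known on the BCM, forward directly.
        return f"forwarded by BCM to {port}"
    # Step 3: destination not learned in the BCM table, hand over to Cisco ASIC.
    return "sent to Cisco ASIC"


if __name__ == "__main__":
    table = {"00:25:b5:cc:00:0c": "eth1/11"}  # hypothetical learned entry
    print(ingress_path("00:25:B5:CC:00:0C", table))
    print(ingress_path("00:25:b5:bb:00:0b", table))
```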
105

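ELAM Example

Capture points along the path from EP A to EP B:
1. leaf1: input ingress, outer header
2. spine: input ingress, inner header
3. leaf4: input egress, inner header

[Diagram: spine 1 and spine 2 above leaf 1 through leaf 5; capture point 1 is at leaf 1 next to EP A, capture point 3 is at leaf 4 toward EP B.]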
106

ELAM Example
1. leaf1: input ingress, outer header
vsh_lc
debug platform internal ns elam asic 0
trigger reset
trigger init ingress in-select 3 out-select 0
set outer l2 src_mac 00:25:b5:aa:00:0a
set outer l2 dst_mac ff:ff:ff:ff:ff:ff
start
status
report
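The same command sequence is reused at each capture point with only the ASIC keyword, pipeline, header and MAC addresses changing, so it can be generated rather than retyped. The sketch below only builds the list of strings shown on these slides; the function name `elam_commands` and its parameters are hypothetical, and it does not connect to any switch.

```python
def elam_commands(asic_family, pipeline, header, src_mac, dst_mac,
                  in_select=3, out_select=0, asic=0):
    """Build the vsh_lc command list for one ELAM capture.

    asic_family is the keyword used in the debug command on the slides
    ("ns" on the leaf examples, "alp" on the spine example).
    """
    if pipeline not in ("ingress", "egress"):
        raise ValueError(f"pipeline must be ingress or egress, got {pipeline!r}")
    if header not in ("outer", "inner"):
        raise ValueError(f"header must be outer or inner, got {header!r}")
    return [
        "vsh_lc",
        f"debug platform internal {asic_family} elam asic {asic}",
        "trigger reset",
        f"trigger init {pipeline} in-select {in_select} out-select {out_select}",
        f"set {header} l2 src_mac {src_mac}",
        f"set {header} l2 dst_mac {dst_mac}",
        "start",
        "status",
        "report",
    ]


if __name__ == "__main__":
    # Reproduces the leaf1 sequence for the broadcast frame from EP A.
    for line in elam_commands("ns", "ingress", "outer",
                              "00:25:b5:aa:00:0a", "ff:ff:ff:ff:ff:ff"):
        print(line)
```

Calling it with `"egress"` and `"inner"` and the MAC of EP B yields the sequence used later for the egress leaf.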

Note (outer header): The packet is not yet encapsulated in iVXLAN, so the outer header is still the original frame from the EP.

EP A MAC: 00:25:b5:aa:00:0a
EP B MAC: 00:25:b5:bb:00:0b

107

ELAM configuration
leaf1# vsh_lc
module-1# debug platform internal ns elam asic 0
module-1(NS-elam)# trigger reset
module-1(NS-elam)# trigger init ingress in-select 3 out-select 0
module-1(NS-elam-insel3)# set outer l2 src_mac 00:25:b5:aa:00:0a
module-1(NS-elam-insel3)# set outer l2 dst_mac ff:ff:ff:ff:ff:ff
module-1(NS-elam-insel3)# start
module-1(NS-elam-insel3)# status
Status: Armed
module-1(NS-elam-insel3)# ?
report Show trigger report

module-1(NS-elam-insel3)# report
ELAM not triggered. No report available

We're looking to confirm whether a broadcast packet sourced from MAC 00:25:b5:aa:00:0a is reaching the Cisco ASIC.

NOTE:
1) Without the "reset" command, trigger buffers are never reset except by a reboot.
2) Users can move in and out of the ELAM mode, and there will be no impact on the configured triggers.

108

ELAM Report Analysis

(trigger went off)

module-1(NS-elam-insel3)# report | egrep ce_|ar_|drop|hg2_src
GBL_C++: [INFO] hg2_srcpid: 0A
GBL_C++: [INFO] ce_da: FFFFFFFFFFFF
GBL_C++: [INFO] ce_sa: 0025B5AA000A
GBL_C++: [INFO] ce_etype: 0806
GBL_C++: [INFO] ar_sha: 0025B5AA000A
GBL_C++: [INFO] ar_spa: 0A108030
GBL_C++: [INFO] ar_tha: 000000000000
GBL_C++: [INFO] ar_tpa: 0A108001
GBL_C++: [INFO] ar_spare: 0000000000000000000000000000
GBL_C++: [MSG] - pktrw is complete

Key fields:
hg2_srcpid: source port on front panel
ce_sa: Source MAC address
ce_etype: Ethertype 0x806 = ARP (Address Resolution)
ar_spa: Source IP address = 10.16.128.48
ar_tpa: Destination IP address = 10.16.128.1

Further fields from the report:
GBL_C++: [INFO] drop: 0
GBL_C++: [INFO] hg2_srcpid: 0A
GBL_C++: [INFO] hg2_vid_lo: 63
GBL_C++: [INFO] vlan0: 063
GBL_C++: [INFO] adj_index: 000C
GBL_C++: [INFO] ol_encap_idx: 2FF6
GBL_C++: [INFO] ol_ttl: 08
GBL_C++: [INFO] ol_segid: 2A8001
GBL_C++: [INFO] sclass: C005
GBL_C++: [INFO] sup_redirect: 0
GBL_C++: [INFO] mcast: 0

People that read hex on the fly appreciate this output!

VXLAN destination TEP address derived from encap: 10.0.200.127

module-1(NS-elam-insel3)# show platform internal ns forwarding encap 0x2FF6
TABLE INSTANCE : 0
Legend
MD: Mode (LUX & RWX)        LB: Loopback
LE: Loopback ECMP           LB-PT: Loopback Port
ML: MET Last                TD: TTL Dec Disable
DV: Dst Valid               DT-PT: Dest Port
ET: Encap Type              DT-NP: Dest Port Not-PC
OP: Override PIF Pinning    HR: Higig DstMod RW
HG-MD: Higig DstMode        KV: Keep VNTAG
------------------------------------------------------------
M PORT L L LB MET M T D DT DT E TST O H HG K M E
POS D FTAG B E PT PTR L D V PT NP T IDX P R MD V D T Dst MAC DIP
------------------------------------------------------------
12278 0 c00 0 1 0 0 0 0 0 0 0 3 4 0 0 0 0 0 3 00:00:00:00:00:00 10.0.200.127

109
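For those who do not read hex on the fly, the raw report values can be converted with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `hex_to_ipv4` and `hex_to_mac` are hypothetical, and the inputs are the values from the report above.

```python
import ipaddress


def hex_to_ipv4(value: str) -> str:
    """Convert an 8-digit hex string such as '0A108030' to dotted decimal."""
    return str(ipaddress.IPv4Address(int(value, 16)))


def hex_to_mac(value: str) -> str:
    """Convert a 12-digit hex string such as '0025B5AA000A' to colon notation."""
    if len(value) != 12:
        raise ValueError(f"expected 12 hex digits, got {value!r}")
    int(value, 16)  # raises ValueError if the string is not valid hex
    value = value.lower()
    return ":".join(value[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    print("ar_spa      ", hex_to_ipv4("0A108030"))   # 10.16.128.48
    print("ar_tpa      ", hex_to_ipv4("0A108001"))   # 10.16.128.1
    print("ce_sa       ", hex_to_mac("0025B5AA000A"))  # 00:25:b5:aa:00:0a
    print("ol_encap_idx", int("2FF6", 16))           # 12278
```

The decimal value of `ol_encap_idx` (12278) is the same number that appears in the POS column of the `forwarding encap` output.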

We have destination TEP address, what next?


Find which switch has a specific TEP

On APIC or switch:

acidiag fnvread | egrep 10.0.200.127

moquery -c tunnelIf -f 'tunnel.If.dest=="10.0.200.127"'

show isis dtep vrf overlay-1

Note: the output below is from a switch; the APIC is not running the IS-IS protocol.

# show isis dtep vrf overlay-1

IS-IS Dynamic Tunnel End Point (DTEP) database:
DTEP-Address    Role    Encapsulation   Type
10.0.120.95     SPINE   N/A             PHYSICAL
10.0.200.64     SPINE   N/A             PHYSICAL,PROXY-ACAST-MAC
10.0.200.65     SPINE   N/A             PHYSICAL,PROXY-ACAST-V4
10.0.8.65       SPINE   N/A             PHYSICAL,PROXY-ACAST-V6
10.0.8.64       LEAF    N/A             PHYSICAL
10.0.200.127    LEAF    N/A             PHYSICAL
10.0.200.126    SPINE   N/A             PHYSICAL
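When the DTEP database is long, a saved copy of the output can be searched programmatically. The sketch below assumes the output has been captured as text with four whitespace-separated columns per row, as in the listing on this slide; the function name `parse_dtep` is hypothetical and the sample rows are copied from the output above.

```python
def parse_dtep(output: str) -> dict:
    """Map each DTEP address to its role, encapsulation and type."""
    table = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows have four columns and start with a dotted IPv4 address;
        # the banner and the column header line are skipped by this check.
        if len(parts) == 4 and parts[0].count(".") == 3:
            address, role, encap, tep_type = parts
            table[address] = {"role": role, "encap": encap, "type": tep_type}
    return table


SAMPLE = """\
IS-IS Dynamic Tunnel End Point (DTEP) database:
DTEP-Address    Role    Encapsulation   Type
10.0.120.95     SPINE   N/A             PHYSICAL
10.0.200.64     SPINE   N/A             PHYSICAL,PROXY-ACAST-MAC
10.0.200.127    LEAF    N/A             PHYSICAL
"""

if __name__ == "__main__":
    entry = parse_dtep(SAMPLE)["10.0.200.127"]
    print(entry["role"], entry["type"])
```

This only confirms the role and type of the TEP; mapping the address to a node name still relies on the `acidiag fnvread` or `moquery` commands shown earlier.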

110

ELAM Example
2. spine: input ingress, inner header (Cisco ASIC in spine)
vsh_lc
debug platform internal alp elam asic 0 | 1
trigger init ingress in-select 3 out-select 0
set inner l2 src_mac 00:25:b5:aa:00:0a
set inner l2 dst_mac 00:25:b5:bb:00:0b
start
status
report

Hint: don't forget "trigger reset"

The packet is now encapsulated in iVXLAN, so we're looking for the inner header.

EP A MAC: 00:25:b5:aa:00:0a
EP B MAC: 00:25:b5:bb:00:0b

111

ELAM Example
3. leaf4: input egress, inner header (Cisco ASIC in leaf)
Egress because we're egressing the fabric

vsh_lc
debug platform internal ns elam asic 0
trigger init egress in-select 3 out-select 0
set inner l2 src_mac 00:25:b5:aa:00:0a
set inner l2 dst_mac 00:25:b5:bb:00:0b
start
status
report

*** The report becomes available once the trigger has gone off

host A MAC: 00:25:b5:aa:00:0a
host B MAC: 00:25:b5:bb:00:0b

112

References

113

APIC resources

Quick Start / Videos


APIC Help pages

API Documentation
Python SDK

114

Online resources

ACI Documentation - cisco.com/go/aci


Cisco.com APIC Troubleshooting
Cisco Support Forums
Cisco DevNet
GitHub/datacenter
115

GitHub: a resource for ACI scripts and tools

ACI Toolkit:
http://datacenter.github.io/acitoolkit/
https://github.com/datacenter/acitoolkit

ACI Diagram
https://github.com/cgascoig/aci-diagram

ACI Endpoint Tracker
http://datacenter.github.io/acitoolkit/docsbuild/html/endpointtracker.html

116

Troubleshooting
Cisco ACI
Available at GitHub

117

The Policy Driven Data Center with ACI: Architecture, Concepts, and Methodology
ISBN: 9781587144905

118

Designing Data Centers with Cisco's ACI LiveLessons - Networking Talks
ISBN: 978-1-58714-436-3

119

Call to Action

Visit the World of Solutions for


Cisco Campus ACI
Walk in Labs ACI
Technical Solution Clinics

Meet the Engineer

Lunch and Learn Topics

DevNet zone related sessions

120

Complete Your Online Session Evaluation

Please complete your online session evaluations after each session. Complete 4 session evaluations & the Overall Conference Evaluation (available from Thursday) to receive your Cisco Live T-shirt.

All surveys can be completed via the Cisco Live Mobile App or the Communication Stations

121

Thank you

122
