JTNOC 12.b SG Vol.1 PDF

Junos Troubleshooting in the NOC
12.b
Student Guide
Volume 1
Juniper Networks
Worldwide Education Services
1133 Innovation Way

Sunnyvale, CA 94089
USA
408-745-2000
www.juniper.net
Course Number: EDU-JUN-JTNOC

This document is produced by Juniper Networks, Inc.
This document or any part thereof may not be reproduced or transmitted in any form under penalty of law, without the prior written permission of Juniper Networks
Education Services.
Juniper Networks, the Juniper Networks logo, Junos, NetScreen, and ScreenOS are registered trademarks of Juniper Networks, Inc. in the United States and other
countries. All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners.
Junos Troubleshooting in the NOC Student Guide, Revision 12.b

Copyright© 2014 Juniper Networks, Inc. All rights reserved.
Printed in USA.
Revision History:
Revision 10.a-December 2010
Revision 11.a-June 2011
Revision 12.a-March 2013
Revision 12.b-January 2014
The information in this document is current as of the date listed above.
The information in this document has been carefully verified and is believed to be accurate for software Release 12.2R2.5. Juniper Networks assumes no
responsibilities for any inaccuracies that may appear in this document. In no event will Juniper Networks be liable for direct, indirect, special, exemplary,
incidental, or consequential damages resulting from any defect or omission in this document, even if advised of the possibility of such damages.
Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.
YEAR 2000 NOTICE
Juniper Networks hardware and software products do not suffer from Year 2000 problems and hence are Year 2000 compliant. The Junos operating system has
no known time-related limitations through the year 2038. However, the NTP application is known to have some difficulty in the year 2036.
SOFTWARE LICENSE
The terms and conditions for using Juniper Networks software are described in the software license provided with the software, or to the extent applicable, in an
agreement executed between you and Juniper Networks, or Juniper Networks agent. By using Juniper Networks software, you indicate that you understand and
agree to be bound by its license terms and conditions. Generally speaking, the software license restricts the manner in which you are permitted to use the Juniper
Networks software, may contain prohibitions against certain uses, and may state conditions under which the license is automatically terminated. You should
consult the software license for further details.
Contents
Chapter 1: Course Introduction

Chapter 2: Troubleshooting as a Process
Before You Begin
The Troubleshooting Process
Challenging Network Issues
The Troubleshooting Process Lab

Chapter 3: Junos Product Families
The Junos OS
Control Plane and Data Plane
Field-Replaceable Units
Junos Product Families
Identifying Hardware Components Lab

Chapter 4: Troubleshooting Toolkit
Troubleshooting Tools
Best-Practices
Monitoring Tools and Establishing a Baseline Lab

Chapter 5: Hardware and Environmental Conditions
Hardware Troubleshooting Overview
Memory and Storage
Boot Monitoring
Hardware-Related System Logs
Chassis and Environmental Monitoring
Monitoring Hardware and Environmental Conditions Lab

Chapter 6: Control Plane
Control Plane Review
System and User Processes
Monitoring Routing Tables and Protocols
Monitoring Bridging
Monitoring the Address Resolution Protocol
Control Plane Monitoring and Troubleshooting Lab

Chapter 7: Data Plane: Interfaces
Interface Properties
General Interface Troubleshooting
Ethernet Interface Troubleshooting
Monitoring and Troubleshooting Ethernet Interfaces Lab

Chapter 8: Data Plane: Other Components
Definition of a Data Plane Problem
Data Plane Components
Data Plane Forwarding
Load-Balancing Behavior
Firewall Filters and Policers
Data Plane Troubleshooting Case Study
Isolate and Troubleshoot PFE Issues Lab

Acronym List
Course Overview
This three-day course is designed to provide introductory troubleshooting skills for engineers in a
network operations center (NOC) environment. Key topics within this course include
troubleshooting methodology, troubleshooting tools, hardware monitoring and troubleshooting,
interface monitoring and troubleshooting, troubleshooting the data plane and control plane on
devices running the Junos operating system, staging and acceptance methodology,
troubleshooting routing protocols, monitoring the network, and working with JTAC. This course is
based on Junos operating system Release 12.2R2.5.
Objectives
After successfully completing this course, you should be able to:
Reduce the time it takes to identify and isolate the root cause of an issue impacting
your network.
Gain familiarity with Junos products as they pertain to troubleshooting.
Become familiar with online resources valuable to Junos troubleshooting.
Gain familiarity with Junos tools used in troubleshooting.
Identify and isolate hardware issues.
Troubleshoot problems with the control plane.
Troubleshoot problems with interfaces and other data plane components.
Describe the staging and acceptance methodology.
Troubleshoot routing protocols.
Describe how to monitor your network with SNMP, RMON, JFlow, and port mirroring.
Become familiar with JTAC procedures.
Intended Audience
The course content is aimed at operators of devices running the Junos OS in a NOC environment.
These operators include network engineers, administrators, support personnel, and reseller
support personnel.
Course Level
Junos Troubleshooting in the NOC is an introductory-level course.
Prerequisites
Students should have basic networking knowledge and an understanding of the Open Systems
Interconnection (OSI) reference model and the TCP/IP protocol suite. Students should also attend
the Introduction to the Junos Operating System (IJOS) course and the Junos Routing Essentials
(JRE) course, or have equivalent experience prior to attending this class.

Course Agenda
Day 1
Chapter 1: Course Introduction
Chapter 2: Troubleshooting as a Process
Lab 1: The Troubleshooting Process
Chapter 3: Junos Product Families
Lab 2: Identifying Hardware Components
Chapter 4: Troubleshooting Toolkit
Lab 3: Monitoring Tools and Establishing a Baseline
Day 2
Chapter 5: Hardware and Environmental Conditions
Lab 4: Monitoring Hardware and Environmental Conditions
Chapter 6: Control Plane
Lab 5: Control Plane Monitoring and Troubleshooting
Chapter 7: Data Plane: Interfaces
Lab 6: Monitoring and Troubleshooting Ethernet Interfaces
Chapter 8: Data Plane: Other Components
Lab 7: Isolate and Troubleshoot PFE Issues
Day 3
Chapter 9: Staging and Acceptance Testing
Chapter 10: Troubleshooting Routing Protocols
Lab 8: Troubleshooting Routing Protocols
Chapter 11: High Availability
Chapter 12: Network Monitoring
Lab 9: Monitoring the Network
Chapter 13: JTAC Procedures
Appendix A: Interface Troubleshooting

Document Conventions
CLI and GUI Text

Frequently throughout this course, we refer to text that appears in a command-line interface (CLI)
or a graphical user interface (GUI). To make the language of these documents easier to read, we
distinguish GUI and CLI text from chapter text according to the following table.
Style: Franklin Gothic
Description: Normal text.
Usage example: Most of what you read in the Lab Guide and Student Guide.

Style: Courier New
Description: Console text (screen captures, noncommand-related syntax) and GUI text elements (menu names, text field entry).
Usage example: commit complete
Usage example: Exiting configuration mode
Usage example: Select File > Open, and then click Configuration.conf in the Filename text box.
Input Text Versus Output Text

You will also frequently see cases where you must enter input text yourself. Often these instances
will be shown in the context of where you must enter them. We use bold style to distinguish text
that is input versus text that is simply displayed.
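Style: Normal CLI
Description: No distinguishing variant.
Usage example: Physical interface: fxp0, Enabled

Style: Normal GUI
Description: No distinguishing variant.
Usage example: View configuration history by clicking Configuration > History.

Style: CLI Input
Description: Text that you must enter.
Usage example: lab@San_Jose> show route

Style: GUI Input
Description: Text that you must enter.
Usage example: Select File > Save, and type config.ini in the Filename field.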
Defined and Undefined Syntax Variables

Finally, this course distinguishes between regular text and syntax variables, and it also
distinguishes between syntax variables where the value is already assigned (defined variables) and
syntax variables where you must assign the value (undefined variables). Note that these styles can
be combined with the input style as well.
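Style: CLI Variable
Description: Text where variable value is already assigned.
Usage example: policy my-peers

Style: GUI Variable
Description: Text where variable value is already assigned.
Usage example: Click my-peers in the dialog.

Style: CLI Undefined
Description: Text where the variable's value is the user's discretion and text where the variable's value as shown in the lab guide might differ from the value the user must input.
Usage example: Type set policy policy-name.
Usage example: ping 10.0.x.y

Style: GUI Undefined
Description: Text where the variable's value is the user's discretion and text where the variable's value as shown in the lab guide might differ from the value the user must input.
Usage example: Select File > Save, and type filename in the Filename field.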

Additional Information
Education Services Offerings

You can obtain information on the latest Education Services offerings, course dates, and class
locations from the World Wide Web by pointing your Web browser to:
http://www.juniper.net/training/education/.
About This Publication

The Junos Troubleshooting in the NOC Student Guide was developed and tested using software
Release 12.2R2.5. Previous and later versions of software might behave differently so you should
always consult the documentation and release notes for the version of code you are running before
reporting errors.
This document is written and maintained by the Juniper Networks Education Services development
team. Please send questions and suggestions for improvement to training@juniper.net.
Technical Publications
You can print technical manuals and release notes directly from the Internet in a variety of formats:
Go to http://www.juniper.net/techpubs/.
Locate the specific software or hardware release and title you need, and choose the
format in which you want to view or print the document.
Documentation sets and CDs are available through your local Juniper Networks sales office or
account representative.
Juniper Networks Support

For technical support, contact Juniper Networks at http://www.juniper.net/customers/support/, or
at 1-888-314-JTAC (within the United States) or 408-745-2121 (from outside the United States).

Chapter 1: Course Introduction

Objectives
• After successfully completing this content, you will be
able to:
• Get to know one another
• Identify the objectives, prerequisites, facilities, and
materials used during this course
• Identify additional Education Services courses at
Juniper Networks
• Describe the Juniper Networks Certification Program
We Will Discuss:
Objectives and course content information;
Additional Juniper Networks, Inc. courses; and
The Juniper Networks Certification Program.

Introductions
• Before we get started ...

• What is your name?
• Where do you work?
• What is your primary role in your
organization?
• What kind of network experience
do you have?
• Are you certified on Juniper Networks?
• What is the most important thing for
you to learn in this training session?
Introductions
The slide asks several questions for you to answer during class introductions.

Course Contents (1 of 2)
• Contents:
• Chapter 1: Course Introduction
• Chapter 2: Troubleshooting as a Process
• Chapter 3: Junos Product Families
• Chapter 4: Troubleshooting Toolkit
• Chapter 5: Hardware and Environmental Conditions
• Chapter 6: Control Plane
• Chapter 7: Data Plane: Interfaces
• Chapter 8: Data Plane: Other Components
Course Contents: Part 1

The slide lists the topics we discuss in this course.

Course Contents (2 of 2)
• Contents: (contd.)
• Chapter 9: Staging and Acceptance Testing
• Chapter 10: Troubleshooting Routing Protocols
• Chapter 11: High Availability
• Chapter 12: Network Monitoring
• Chapter 13: JTAC Procedures
• Appendix A: Interface Troubleshooting
Course Contents: Part 2

The slide lists the continuation of topics we discuss in this course.

Prerequisites
• The prerequisites for this course are the following:
• Basic networking knowledge
• Networking Fundamentals computer-based training, or
equivalent knowledge
• The Introduction to the Junos Operating System (IJOS)
course, or equivalent knowledge
• The Junos Routing Essentials (JRE) course, or equivalent
knowledge
Prerequisites
The slide lists the prerequisites for this course.

Course Administration
• The basics:
• Sign-in sheet
• Schedule
• Class times
• Breaks
• Lunch
• Break and restroom facilities
• Fire and safety procedures
• Communications
• Telephones and wireless devices
• Internet access
General Course Administration

The slide documents general aspects of classroom administration.

Education Materials
• Available materials for classroom-based

and instructor-led online classes:
• Lecture material
• Lab guide
• Lab equipment
• Self-paced online courses also available
• http://www.juniper.net/training/technical_education/
Training and Study Materials

The slide describes Education Services materials that are available for reference both in the classroom and online.

Additional Resources
• For those who want more:

• Juniper Networks Technical Assistance Center (JTAC)
• http://www.juniper.net/support/requesting-support.html
• Juniper Networks books
• http://www.juniper.net/books
• Hardware and software technical documentation
• Online: http://www.juniper.net/techpubs
• Portable libraries: http://www.juniper.net/techpubs/resources
• Certification resources
• http://www.juniper.net/training/certification/resources.html
Additional Resources
The slide provides links to additional resources available to assist you in the installation, configuration, and operation of
Juniper Networks products.

Satisfaction Feedback
• To receive your certificate, you must complete the

survey
• Either you will receive a survey to complete at the end of
class, or we will e-mail it to you within two weeks
• Completed surveys help us serve you better!
Satisfaction Feedback
Juniper Networks uses an electronic survey system to collect and analyze your comments and feedback. Depending on the
class you are taking, please complete the survey at the end of the class, or be sure to look for an e-mail about two weeks
from class completion that directs you to complete an online survey form. (Be sure to provide us with your current e-mail
address.)
Submitting your feedback entitles you to a certificate of class completion. We thank you in advance for taking the time to
help us improve our educational offerings.

Juniper Networks Education Services

Curriculum
• Formats:
• Classroom-based instructor-led technical courses
• Online instructor-led technical courses
• Hardware installation elearning courses as well as technical
elearning courses
• Complete list of courses:
• http://www.juniper.net/training/technical_education/
Juniper Networks Education Services Curriculum

Juniper Networks Education Services can help ensure that you have the knowledge and skills to deploy and maintain
cost-effective, high-performance networks for both enterprise and service provider environments. We have expert training
staff with deep technical and industry knowledge, providing you with instructor-led hands-on courses in the classroom and
online, as well as convenient, self-paced elearning courses.
Courses
You can access the latest Education Services offerings covering a wide range of platforms at
http://www.juniper.net/training/technical_education/.

Juniper Networks Certification Program
• Why earn a Juniper Networks certification?

• Juniper Networks certification makes you stand out
• Unleash your creativity across the entire network
• Set yourself apart from your peers
• Capitalize on the promise of the New Network
• Develop and deploy the services you need
• Lead the way and increase your value
• Unique benefits for certified individuals
Juniper Networks Certification Program

A Juniper Networks certification is the benchmark of skills and competence on Juniper Networks technologies.

Juniper Networks Certification Path
[Figure: Juniper Networks certification path. Legible level labels: Expert Level (JNCIE), Specialist Level (JNCIS).]
Juniper Networks Certification Program Overview

The Juniper Networks Certification Program (JNCP) consists of platform-specific, multitiered tracks that enable participants
to demonstrate competence with Juniper Networks technology through a combination of written proficiency exams and
hands-on configuration and troubleshooting exams. Successful candidates demonstrate thorough understanding of Internet
and security technologies and Juniper Networks platform configuration and troubleshooting skills.
The JNCP offers the following features:

Multiple tracks;
Multiple certification levels;
Written proficiency exams; and
Hands-on configuration and troubleshooting exams.
Each JNCP track has one to four certification levels: Associate-level, Specialist-level, Professional-level, and Expert-level.
Associate-level and Specialist-level exams are computer-based exams composed of multiple choice questions administered
at Prometric testing centers worldwide.
Professional-level and Expert-level exams are composed of hands-on lab exercises administered at select Juniper Networks
testing centers. Professional-level and Expert-level exams require that you first obtain the next lower certification in the track.
Please visit the JNCP Web site at
http://www.juniper.net/certification for detailed exam information, exam pricing, and exam registration.

Certification Preparation
• Training and study resources:

• Juniper Networks Certification Program website:
www.juniper.net/certification
• Education Services training classes:
www.juniper.net/training
• Juniper Networks documentation and white papers:
www.juniper.net/techpubs
• Community:
• J-Net:
http://forums.juniper.net/t5/Training-Certification-and/bd-p/Training_and_Certification
• Twitter: @JuniperCertify
Preparing and Studying

The slide lists some options for those interested in preparing for Juniper Networks certification.

Junos Genius: Certification Preparation App

Unlock your Genius...
• Practice for multiple exams in Study Mode
• Hundreds of multiple choice questions and answer explanations, many with CLI snapshots
• Simulate an exam in Time Challenge Mode
• Earn device achievements by winning in Instructor Challenge Mode
• Build a virtual network with device achievements
• Track your results in the app and Game Center; share your network through Facebook and Twitter
www.juniper.net/junosgenius
Junos Genius
The Junos Genius application takes certification exam preparation to a new level. With Junos Genius you can practice for
your exam with flashcards, simulate a live exam in a timed challenge, and even build a virtual network with device
achievements earned by challenging Juniper instructors. Download the app now and Unlock your Genius today!

Find Us Online
http://www.juniper.net/jnet
http://www.juniper.net/facebook
http://www.juniper.net/youtube
http://www.juniper.net/twitter
Find Us Online
The slide lists some online resources to learn and share information about Juniper Networks.

Questions
Any Questions?
If you have any questions or concerns about the class you are attending, we suggest that you voice them now so that your
instructor can best address your needs during class.


Chapter 2: Troubleshooting as a Process

Objectives
• After successfully completing this content, you will be able to:
• Avoid unnecessary disruptions to production environments
• Describe a troubleshooting process
• Describe troubleshooting challenging network issues
We Will Discuss:
How to avoid unnecessary disruptions to production environments;
Troubleshooting as a process; and
Situations that pose troubleshooting challenges.

Agenda: Troubleshooting as a Process
→ Before You Begin

• The Troubleshooting Process
• Challenging Network Issues
Before You Begin

The slide lists the topics we will discuss. We discuss the highlighted topic first.

Before You Begin ...
• First, do no harm:
• Know what is normal
• Use change control processes
• Plan for the worst
• Backup configurations and other key files
• Use non-disruptive practices
• Recreate in a lab environment
• Use maintenance windows
First, Do No Harm
When modern medical doctors begin practicing, they often take what is called the Hippocratic oath, which is commonly associated with the maxim primum non nocere.
This Latin phrase, traditionally attributed to Hippocrates, translates as "first, do no harm."
This should also be the concern of a network administrator working in a production network environment. Some of the
information presented in this course could be disruptive to a production network. This is true not only for corrective actions
that might be taken, but also applies to the troubleshooting process itself.
The slide lists several best practice safety precautions that can be taken to minimize unforeseen impact to the network. We
will discuss each of the listed options in more detail in the next few pages.

Know What Is Normal
• You must know what is normal for your system:

• Establish a baseline before a problem occurs
• Resource utilization
• Throughput
• Types of traffic
• Confirm the symptoms:
• Always verify a problem exists before conducting potentially
disruptive testing
Know What Is Normal

Set yourself up for success. It might seem pretty basic, but to spot anomalous behavior, you must begin with a known
reference point.
Confirm the Symptoms

Always confirm the symptom. Many problems are transient by nature and, in some cases, testing will cause more disruption
than the problem itself. If a transient condition has already cleared, there is little benefit to conducting disruptive testing. It is
better to plan on long-term monitoring with testing occurring when the problem next manifests itself.
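The idea of comparing an observation against a known reference point can be sketched in a few lines of Python. This is only an illustration, not a Junos tool: the sample values, the function name, and the three-standard-deviation threshold are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def is_anomalous(baseline, current, k=3.0):
    """Return True when `current` is more than k sample standard
    deviations away from the mean of the baseline samples."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return current != mu
    return abs(current - mu) > k * sigma


# Hypothetical CPU utilization samples (percent) collected before any problem.
cpu_baseline = [12.0, 15.0, 11.0, 14.0, 13.0]

print(is_anomalous(cpu_baseline, 14.5))  # within the normal range
print(is_anomalous(cpu_baseline, 60.0))  # far outside the baseline
```

The same comparison applies to any of the baseline categories on the slide (resource utilization, throughput, traffic mix), provided the samples were collected while the system was known to be healthy.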

Change-Control Processes
• Use change-control processes
• Formalized
• Balance needs with risks
• Coordinate scheduling to minimize impact to production
• Remember, customers might have change control policies in
place as well
Change-Control Policies
Best-in-class companies have formalized change-control policies that govern any modifications to the production
environment and define processes to use when changing it. These processes are designed to balance the need for changes
in an environment with the technical risks associated with implementing those changes. They usually allow for more than
one set of eyes to review changes before they are implemented. This built-in protection helps avoid unforeseen impact due to
oversight by a single individual or group. These processes also dictate how changes should be implemented in a way that
minimizes impact to the network.
Remember that change-control processes can also pertain to troubleshooting, because many troubleshooting steps can
change or impact a production environment as well. Give special consideration to the possible impact of all troubleshooting
steps before they are used. This step can be easily forgotten in a crisis, but failure to do so can make the situation worse.
When troubleshooting, always consider whether the troubleshooting step is something that needs to go through a formalized
change control process.

Plan for the Worst

• Plan for the worst
• Have a backout plan
Plan for the Worst

An important part of any change control process is a back-out plan. Even in situations where a formal change control process
does not apply, appropriate forethought should be given to a back-out plan. What steps will be taken to quickly and
unobtrusively revert to a previous known state? What preparation needs to take place to allow the option of a backout?
Although ideally you will not need to use a back-out plan, it is often too late to implement one after the fact if appropriate
forethought and preparations have not taken place.

Configuration
• Working with configurations:

• Use save to make backups before modifying a
configuration
• All or part of a configuration can be saved locally or remotely
• Use commit comment to add comments
• Logged comments can help when a quick rollback is needed
• Use commit confirmed to temporarily activate
• When working with remote systems
• When adding or modifying policies, firewalls, or other security
elements
Working With Configurations

You can save backups of all or part of your configuration before making changes. This process is independent of automatic
rollback files that are stored each time a commit is performed. These backups can prove invaluable in post-issue wrap-up
and documentation and also provide a return point, should the need arise. The ability to return to the last known state, even
if it was not working completely as desired, can be an important safety net.
Use save to save the current configuration locally or remotely. Remember, you are saving only the configuration statements
at the current hierarchy level and below. To save the entire candidate configuration, you must be at the top level of the
configuration hierarchy.
You can specify a filename in one of the following ways:
filename or path/filename: Locally, we recommend storing files in the /var/tmp directory so they can be
accessed by any user with sufficient rights.
ftp://user:password@host/path/filename: Puts the file in the location explicitly described by this URL using the
FTP protocol. Substituting the word "prompt" for the password causes the FTP server to prompt you for the
user's password.
scp://user@host/path/filename: Puts the file on a remote system using the SSH protocol. The software
prompts you for the user's password.
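As a rough illustration of the three destination forms listed above, the following Python sketch builds the strings that would follow the save command. The helper names, the user name, the host address, and the file paths are hypothetical and exist only for this example; the sketch formats text and does not talk to a device.

```python
def local_target(filename, directory="/var/tmp"):
    """Local destination; /var/tmp is the directory recommended in the text."""
    return f"{directory}/{filename}"


def ftp_target(user, host, path, password="prompt"):
    """FTP destination; the literal word 'prompt' requests a password prompt."""
    return f"ftp://{user}:{password}@{host}/{path}"


def scp_target(user, host, path):
    """SCP destination; the password is requested interactively."""
    return f"scp://{user}@{host}/{path}"


# Hypothetical user, host, and file names, for illustration only.
targets = [
    local_target("pre-change.conf"),
    ftp_target("lab", "198.51.100.10", "backups/pre-change.conf"),
    scp_target("lab", "198.51.100.10", "backups/pre-change.conf"),
]
for target in targets:
    print(f"save {target}")
```

Using "prompt" as the default password in the FTP form keeps the real password out of any command string that is built or logged ahead of time.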

Disruptive Practices
• Be aware of disruptive practices
• Review power-on hardware information for your equipment
• Hot-swappable FRUs
• Hot-pluggable FRUs
• Review hardware redundancy options where available
• Be careful when using hidden CLI commands
• Hidden commands are hidden for a reason
• Understand disruptive potential before using
• Be careful when using disruptive testing techniques
Avoid Disruptive Practices

Before adding, removing, or replacing hardware in a production environment, it is a good idea to review procedures for taking
your specific hardware online and offline. Also, be familiar with the available redundancy options and how using them can
impact your production situation.
Throughout this course you will be introduced to several hidden commands that provide additional troubleshooting
capabilities or resolution options. Be advised that these commands are hidden for a reason. Be aware of the potential
impacts, and consider the risks as well as the benefits, before using them in a production environment. This same
precaution applies to some of the testing techniques outlined because they could also prove disruptive to production
environments.

Recreate in Lab Environment

• Why start a lab reproduction effort?
• You can troubleshoot without affecting customer traffic
even when your actions are potentially disruptive
• You are free to experiment with possible workarounds
• One possible methodology is to start with a simple setup, and add
detail until the problem can be reproduced
• Lab reproduction is invaluable for some problems
• Protocol anomalies
• Interoperability issues
• Unexpected signaling behavior
• When is lab reproduction not useful?
• When a problem is tied to a specific device or circuit failure
• When the complexity of the network or of the potential triggers
makes reproduction unfeasible
Why Start a Lab Reproduction Effort?

When possible, reproducing a problem in a lab environment can be a valuable exercise, though occasionally very time
consuming. If you are successful, though, you have the freedom to troubleshoot and analyze the issue without any adverse
impact on the production network while getting to the root cause of the problem. The network can be simplified, and features
removed, until the problem is no longer reproducible, a situation that will indicate a direct or indirect trigger. Moreover, once
a problem is reproduced, you are free to experiment with different workarounds before choosing the best course of action for
fixing the problem. Understanding the best way to reproduce a problem is a skill that comes with experience. Any
reproduction is a simplification and an approximation of the real network, so care must be taken to ensure that you do not
simplify the setup to the point that even the problem trigger is removed.
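The simplification approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature names are hypothetical, and reproduces stands in for whatever manual or scripted lab procedure tells you whether the problem still occurs with a given set of features enabled.

```python
from typing import Callable, FrozenSet, Iterable, List


def find_triggers(features: Iterable[str],
                  reproduces: Callable[[FrozenSet[str]], bool]) -> List[str]:
    """Remove one feature at a time from the lab setup.

    A feature whose removal makes the problem disappear is a candidate
    trigger (direct or indirect). Features that can be removed without
    effect stay removed, so the setup keeps getting simpler.
    """
    active = list(features)
    triggers = []
    for feature in list(active):
        trial = frozenset(f for f in active if f != feature)
        if reproduces(trial):
            active.remove(feature)    # not needed to reproduce: simplify
        else:
            triggers.append(feature)  # problem vanished: keep it, note it
    return triggers


# Hypothetical lab: the problem appears only when both of these are enabled.
lab_features = ["ospf", "ldp", "graceful-restart", "policy-a", "cos"]
needed = {"graceful-restart", "policy-a"}
print(find_triggers(lab_features, lambda enabled: needed <= enabled))
```

Note that the loop never removes a feature the problem depends on, which is the programmatic equivalent of not simplifying the setup past the point where the trigger itself is gone.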
Lab reproduction is invaluable for diagnosing some problems, such as the following:
Protocol anomalies: Unexpected protocol behavior that might indicate a software issue. Studying the problem in
the lab and comparing the observed behavior with the protocol specifications (RFCs and so forth) allows you to
decide what is normal or not.
Interoperability issues: Protocol or interface incompatibilities with other vendors' devices and systems. In this
case, the most important thing is to find out whose implementation is behaving against the specifications, so
that a case can be opened against the correct party.
Unexpected signaling behavior: When the interaction of protocols and policies, while correct, causes
unexpected problems in the network. A typical case is a routing loop forming during a network migration. Lab
reproduction allows you to find out the cause of the loop and experiment with potential solutions without
putting the network at risk.
With certain problems, lab reproduction is not useful:
A problem that is tied to a specific device failure: If, for example, a Dense Port Concentrator (DPC) crash turns
out to be the result of faulty memory, any attempt to reproduce the problem on a different board is doomed to
fail. In the same way, if a throughput degradation is the result of an error on a specific circuit, any lab
reproduction will be a waste of effort.
Other problems that cannot be simplified in a lab environment: A problem triggered by customer traffic
patterns, or a protocol limitation that starts to emerge only when the network approaches several dozen
devices. Some problems cannot be easily reproduced for purely practical reasons, such as the lack of
available transmission devices (optical switches, add-drop multiplexers, and so forth).

Maintenance Windows
• Maintenance windows:
• Minimize impact from unforeseen issues
• Do not be distracted by perceived urgency
• Customers have maintenance windows too
Maintenance Windows
Best-in-class companies set aside time for maintenance windows. Like change-control processes, maintenance windows are
designed to balance the need for changes in an environment with the technical risks associated with implementing those
changes.
Under the best of circumstances, appropriate precautions can allow for zero downtime. It is the unexpected impacts that
can make a situation worse than the one you began with. Maintenance windows are helpful not only for handling the known
interruptions when making changes in an environment, but also for dealing with interruptions that were not foreseen.
When troubleshooting, always consider whether the troubleshooting step is something that should wait for a formalized
maintenance window.

• Before You Begin

→ The Troubleshooting Process
• Challenging Network Issues
The Troubleshooting Process

The slide highlights the topics we discuss next.

Troubleshooting
• Troubleshooting:
• The ability to identify the root cause of a problem impacting
the network
• The ability to identify the root cause of any deviation from
the normal or expected behavior of a network
Troubleshooting
The purpose of troubleshooting is to identify the root cause of an issue.
Frequently, troubleshooting is thought of only as it applies to tracking down the root cause of a clearly identifiable problem
and, generally, in terms of the impact it has on a network or a user's ability to use the network. In reality, it can extend
beyond that simple definition to include any deviation from the normal or expected behavior.
When problems do occur, or when the behavior of a network varies from the normal or expected behavior, it is necessary to
identify the root cause to resolve the issue and eliminate the negative impact. Additionally, in a production network, it is
important to do so in a manner that introduces the least disruption possible.

A Process-Based Methodology
• Process-based methodology:
• Learnable
• Repeatable
• Can be used when dealing with any of these elements of a
device running the Junos OS:
• Chassis
• Control plane
• Interfaces and circuits
• Data plane
A Process-Based Methodology
We have all known somebody we consider to be a good troubleshooter. The purpose of this chapter is to demonstrate that
the art of troubleshooting is not an art at all, but rather a learnable, repeatable, process-based methodology.
Two individuals working with the same sets of tools and a common symptom might approach the act of fault analysis in
completely different ways. For example, one person might always start with visual inspection while another opts to begin with
interface loopback tests. In the end, it is hard to say that one approach is better than another, assuming that both individuals
arrive at a similar conclusion, in a similar amount of time with similar levels of disruption.
Although many different approaches to troubleshooting exist, certain fundamental elements are involved in any sound
troubleshooting methodology. Experienced technical engineers likely already employ many of these techniques, intentionally
or otherwise. The goal of this course is to help establish a repeatable framework where experienced engineers can use their
existing knowledge of a given technology to achieve more efficient and effective support.

Where To Begin?
• The scientific method:
• Characterize a problem based on observation and
experience
• Hypothesize and propose an explanation for the observation
• Make a prediction based on past experiences
• Test and experiment to prove or disprove the accuracy of
the prediction
The Scientific Method

Over 2300 years ago the Greek philosopher Aristotle, in his quest to find truth, outlined several steps that today form the
basis for what is called the scientific method. These simple steps have been expanded and used within the scientific
community for hundreds of years to gain a better understanding of our universe.
The slide outlines the basic steps of the scientific method. These steps have been expanded and adapted to fit many
different purposes. These same steps can also be adapted for troubleshooting.

Troubleshooting Steps
• Troubleshooting steps:
• Define success
• Isolate the component preventing success
• Characterize
• Hypothesize
• Predict
• Test and experiment
• Identify a solution
• Implement the solution
The Troubleshooting Process

Although the scientific method provides a good basis to construct a repeatable methodical approach, it can be expanded to
better fit our needs for troubleshooting.
Some significant differences exist between the need for scientific explanation and troubleshooting. First, scientists must
maintain complete objectivity about the outcome of their process, whereas we have a known desired outcome. It is important to
incorporate this difference into our troubleshooting model so it becomes the first step.
Next, scientists are also dealing with an infinite number of variables, whereas we are dealing with a finite number of inputs
that can interfere with success.
Finally, once we have identified the input that is the root cause of the problem, we must also identify a resolution and figure
out how to implement the fix within a production network with minimal impact. These important steps must also be
incorporated into a troubleshooting model.
The benefit of incorporating a process-based troubleshooting methodology is a more effective and efficient approach to
resolving issues. The slide outlines the four main steps in our troubleshooting methodology. We will discuss each of these
steps in more detail in the upcoming slides.

Define Success
• Define success:
• Quantify the problem
• What is happening that should not be happening?
• What should be happening that is not happening?
• Define a desirable endpoint
• Be specific
• Define a recognizable endpoint
• Example: prefix a.b.c.d/z will be received from neighbor x
• Be careful not to define success using preconceived
solutions
Define Success
One of the most critical, and often most overlooked, steps in the troubleshooting process is defining success. Ideally, you
should define success in terms of a desired objective, rather than an encountered problem or error. This definition should be
a specific, recognizable, and desirable endpoint, not a restatement of the problem.
You should also remember that often many different ways exist to meet a given objective. Understanding the desired final
outcome is far more beneficial than simply understanding a particular problem encountered along the way.
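One way to keep a success definition specific and recognizable is to write it as a condition that is either true or false. The following Python sketch does this for the slide's example (a prefix received from a neighbor). It is an illustration only: the prefixes are documentation addresses, and the list of received prefixes is assumed to have been collected already, for instance by reading the output of an operational mode command; how you collect it is outside the scope of the sketch.

```python
import ipaddress
from typing import Iterable


def success(expected_prefix: str, received_prefixes: Iterable[str]) -> bool:
    """True when the expected prefix is among those received from the neighbor."""
    wanted = ipaddress.ip_network(expected_prefix)
    return wanted in {ipaddress.ip_network(p) for p in received_prefixes}


# Hypothetical prefixes received from neighbor x.
received = ["198.51.100.0/24", "192.0.2.0/24"]
print(success("192.0.2.0/24", received))
```

The condition says nothing about how the prefix gets there, which keeps the definition of success free of preconceived solutions.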

Verify the Problem

• Once defined, verify the problem exists before
proceeding with troubleshooting:
• Troubleshooting can be more disruptive than the problem
Always Confirm the Symptom

Once an issue has been reported and a desirable endpoint has been defined, it is a good idea to confirm the symptoms still
exist before continuing with troubleshooting steps that could be disruptive.
Many problems are transient by nature, and in some cases, testing causes more disruption than the problem itself. If a
transient condition has already cleared, conducting disruptive testing at that moment provides little benefit. It is better to
plan on long-term monitoring with testing occurring when the problem next manifests itself.

Isolate the Problem
• Isolate the component preventing success:
• Characterize
• Hypothesize
• Predict
• Test and experiment
The way to find the broken something is to find out what it's not!
-The Cat from the film The Cat in the Hat
(paraphrased liberally)
Isolate the Problem

The objective of the next four steps outlined on the slide is, ultimately, to identify that which is interfering with success as
defined in the previous step. You can process these steps recursively until you identify the ultimate root cause of the
problem.
One significant limitation of the scientific method is that it cannot empirically prove anything. It can be used only to disprove
a particular hypothesis. This limitation does not prove problematic in troubleshooting, particularly when there is a known
desired outcome, which is why defining success is such an important part of the troubleshooting process.
Once you clearly define the desired outcome, you can also define a corresponding list of components necessary to achieve
the desired outcome. Then, you simply follow a process of elimination to determine which of the required components is
interfering with success.

Characterize
• Characterize the issue:

• Collect information
• System logs
• Protocol traceoptions
• Operational mode command output
• Ask probing questions
• When did this start happening?
• Has this ever worked?
• When did this last work as desired?
• What has changed?
• What troubleshooting steps and actions have been tried already?
• Identify the knowns and unknowns
Characterize the Issue

The first step in isolating the problem component at the root of an issue is to characterize the issue itself by collecting
information and reviewing it as a whole.
Several different sources of information are available including system logs, protocol traceoptions, and output from
operational mode commands. You can also take into account customer reports. However, customer input should not be
accepted blindly-it should always be compared with other available information for confirmation.
The slide lists several questions that are helpful in most troubleshooting scenarios and whose answers should be
understood and factored into future hypotheses and predictions. Other questions can be addressed only with a wider collection
of information:
Who reported the problem first?
How many distinct source networks are affected?
What destinations are involved?
What types of traffic are affected?
Is the problem constant or sporadic?
Remember that this is a starting point only. Large-scale networks are complex and changing constantly. Do not let a
perceived correlation prematurely limit your scope.
The information collected will be compared to the possible reasons for a failure in the next step and used to make a
prediction about the most probable root cause.

Hypothesize
• Suggest possible explanations for observed behavior:
• Identify all required components and dependencies
• Use your knowledge of the technology
• Remember the OSI model
• Use online references
• When possible, reconstruct a working scenario
• Be complete
• Do not assume
• Do not overlook the obvious
Form a Hypothesis
A hypothesis is a possible explanation for behaviors observed during the characterization stage. Many possible explanations
might exist. Some explanations will correlate with the observations made earlier, whereas other explanations will be
immediately discounted based on observations made during the characterization stage. Be careful! Do not be blinded by
subjectivity. Keep an open mind when considering possible explanations. Be complete. Do not make assumptions. Do not
overlook the obvious because you have a preconceived notion about where the root cause of the problem is. Although
leveraging your memory and past experiences against a current problem is good, you should never close your mind to new
possibilities.
During this stage, try to identify as many explanations as possible. Although you might go quickly through this stage on a first
pass, it becomes particularly important to be complete if the troubleshooting process becomes recursive. One helpful
method for identifying the potential root cause of a problem is to identify all of the required components and dependencies
necessary to achieve the desired success. For example, for a host to connect to a remote HTTP server, several components
must exist. Connectivity must exist between the endpoints. Each endpoint (and all intermediate systems between) must
have appropriate routing information. Any security settings along the path must allow the traffic, and so on.

Layered Approach
OSI            TCP/IP
Application    Application
Presentation   Application
Session        Application
Transport      Transport
Network        Internet
Data Link      Link
Physical       Link
A Layered Approach
When identifying required components, remember the reference models. It does not matter whether you use the Open
Systems Interconnection (OSI) model or the RFC 1122 Internet model. Although the two views of the network are not
intended to match exactly, each provides a layered approach with dependencies on the underlying layers for the upper layers
to perform their role.
Understanding the role that each layer plays and how each layer depends upon the lower levels can greatly simplify the task
of isolating the possible root causes of a problem.
BGP adjacencies are a great example of how the different layers work together to accomplish designated objectives. BGP
forms adjacencies at the Application Layer to share routing information. To form an adjacency, BGP relies on TCP at the
Transport Layer to establish logical connections between BGP peers. TCP in turn relies on the underlying routing information
for reachability between internal BGP peers, and that routing information depends on link-level connectivity between all of
the involved devices.
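The dependency chain in the BGP example can be expressed as an ordered list of checks, lowest layer first, where the first failing check marks the layer to investigate. The Python sketch below is a minimal illustration: the check names and the observed results are hypothetical, and in practice each result would come from an operational mode command rather than from a hard-coded value.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run the checks from the lowest layer up; return the first that fails."""
    for name, check in checks:
        if not check():
            return name
    return None


# Hypothetical observations for one internal BGP session, lowest layer first.
observed = [
    ("link-level connectivity", True),
    ("IGP route to the peer loopback", False),
    ("TCP session between the peers", False),
    ("BGP adjacency established", False),
]
checks = [(name, (lambda ok=ok: ok)) for name, ok in observed]
print(first_failing_layer(checks))
```

Because every upper layer depends on the layers below it, there is little value in examining the failures listed after the first one until that first one is resolved.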

Revisiting Control and Data Planes

Control Plane (Routing Engine)
Common symptoms: missing routes
Data Plane
Common symptoms: physical errors, dropped packets (all or some)
• Generally a good idea to begin diagnosis at the control plane
Layered Approach: Another View

Recall that devices running the Junos operating system maintain a strict separation between the control plane and the data
plane. When you characterize a problem, certain types of symptoms indicate the control plane as the most probable cause,
whereas other symptoms indicate that the root cause ultimately lies in the data plane. In an established operating
environment, it is extremely rare to find a fault in both planes simultaneously due to the different role that each plays.
The control plane is responsible for installing routes into the forwarding table. This function relies on configuration, routing
protocols (interior gateway protocols [IGP], BGP, policies, RSVP, LDP, and so on), connection to peers (including keepalives
and authentication) and so on. The most common symptom of a control plane problem is the lack of one or more routes.
When in doubt, it is generally beneficial to determine whether the control plane is functioning properly before moving on to
the data plane. You can generally make this determination with a simple operational command such as show route.
The data plane uses route information provided by the control plane to perform packet forwarding. The most common
symptom of a data plane issue is dropped packets. Problems in the data plane can result from faulty hardware or
configuration-based issues such as firewall filters, policers, and so forth. Note that intermittent issues and bottlenecks
almost always trace back to the data plane.

Predict and Test

• Make a prediction:
• Identify most probable explanation
• Be complete
• Do not assume
• Do not overlook the obvious
• Test to prove (or disprove) your hypothesis
• Validity, validity, validity!
Predict and Test

Once you have identified a plausible explanation (or perhaps several plausible explanations) for the observed behavior, you
can make predictions about what will happen if a particular test is run or if certain changes are made. It is acceptable, and
in fact expected, that you proceed with what you deem the most probable cause first. You will make a determination about
the most probable cause based on the observed behaviors, the plausible explanations, and your own experience.
Select a test that brings you closer to proving (or disproving) your hypothesis. If it does not, it is not a valid test. For example, if
you are testing to ensure that a packet is using the optimal route between two endpoints, using a simple ping utility
without any options will not provide enough information to verify the objective. A standard ping only demonstrates
reachability. (It does demonstrate two-way reachability, because the destination must also have reachability information
back to the source.) A more valid test to determine whether a packet will take the optimal route between two points would
be the traceroute utility, which records the ingress interfaces encountered along the path, or the ping utility with the
record-route option, which records the egress interfaces encountered along the path. Also remember that if the goal is
HTTP connectivity, using a ping utility might not be a valid test, because firewall rules and other security precautions could
impact HTTP traffic and ping traffic differently.

Recursive Process
• "If at first you don't succeed ... "
• Divide and conquer
• Remember the reference models
• Narrow down the possibilities
• Validity, validity, validity!
• Build your own troubleshooting flowchart as you go
• Each test should reduce the number of possible causes for the
problem, regardless of pass/fail status
• Remember, more than one contributing factor could be
present (particularly in new setups)
(Diagram: each test condition leaves fewer remaining possibilities.)
If, at First, You Don't Succeed ...

So, what happens if the most probable cause does not prove to be the culprit? Over 2,500 years ago, Sun Tzu wrote a book
called The Art of War, in which he told us the way to defeat the enemy was to divide and conquer. This general approach
works well when troubleshooting a problem that is generic enough to have numerous possible causes.
Sometimes a single test demonstrates functionality within several layers. At other times, a separate test is required to rule
out each individual required element. For instance, when trying to troubleshoot why two internal BGP peers are not forming
an adjacency, a traceroute between the two loopback addresses (remember to use the source option) eliminates the need
to validate individual link-by-link connectivity and also routing information between the two nodes.
Each test should reduce the number of possible root causes for a problem-regardless of the outcome. For example, if a
device will not boot when new cards are added, one possible course of action would be to remove the nonrequired
field-replaceable units (FRUs) and see if the device boots. The remaining FRUs could then be added back in one at a time (or
in groups) to help isolate the problematic hardware.
Remember, isolation of the root cause of a problem can be a recursive process.
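The idea of adding FRUs back in groups so that every test removes possibilities, whether it passes or fails, resembles a binary search. The Python sketch below illustrates that idea under two explicit assumptions that the real world does not always honor: exactly one unit is faulty, and the device fails to boot whenever that unit is installed. The unit names are hypothetical, and boots_with stands in for the physical act of installing a group of units and powering on the device.

```python
from typing import Callable, List, Sequence


def isolate_faulty(units: Sequence[str],
                   boots_with: Callable[[Sequence[str]], bool]) -> str:
    """Halve the list of suspect units until only one remains."""
    candidates: List[str] = list(units)
    if not candidates:
        raise ValueError("at least one suspect unit is required")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if boots_with(half):
            candidates = candidates[len(half):]  # fault is in the other half
        else:
            candidates = half                    # fault is in this half
    return candidates[0]


# Hypothetical chassis in which fpc3 prevents the device from booting.
suspects = ["fpc0", "fpc1", "fpc2", "fpc3", "fpc4"]
print(isolate_faulty(suspects, lambda installed: "fpc3" not in installed))
```

If more than one contributing factor is present, which the slide warns is possible, the single-fault assumption breaks and the remaining units need to be retested after the first faulty one is removed.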

Consider the Possibilities

• Possible causes:
• Configuration
• Hardware
• Software
• Something else
• Remember, more than one contributing factor could
be present
• New installations
• Some troubleshooting has already occurred
• New, previously unnoticed issues become apparent
Consider the Possibilities

A finite list of things exists that can be responsible for problems within a network. We can reduce this list to four broad
categories.
The slide outlines the four major categories that could be interfering with success as defined in step one of the
troubleshooting process. We discuss each of these categories and when they might be suspect in the upcoming slides.
Compound Issues
We mentioned earlier that it is rare to see problems occur in both the control plane and data plane simultaneously in
established environments. However, if proper precautions are not taken, it is possible to introduce additional problems into
the environment during the troubleshooting process. It is also possible that in the process of troubleshooting one issue, you
might discover other previously unnoticed issues. Always be open to the idea that there might be more than one problem.

Configuration Errors
• Configuration:
• Most plausible in new setup or with recent changes
• Use show system commit to check for recent changes
• Use show | compare to display differences
• Remember to check all devices that could introduce a problem
• Eliminate the control plane as a possibility before focusing
on the data plane
• When configuration errors are suspected, it is OK to quickly
glance at the configuration, but rely on operational mode
commands to isolate errors
• The human brain sees what it expects to see
Configuration Errors
Configuration issues are the most likely cause of problems in new setups. They are also the most probable cause of
problems within the control plane. Problems in the control plane can even occur because changes were made to the
configuration of another device within the environment.
The Junos OS has built-in sanity checks to ensure that all configuration entries are valid. In some cases, they can also check
for completeness, such as ensuring that a referenced policy exists. This process is different from checking for accuracy
because there is no automated way to check a configuration against intent.
It is common practice to jump directly to viewing the configuration when configuration errors are suspected. This process is
not troubleshooting! It does not guarantee that you will be one step closer to finding the root cause of an issue. It is OK to
take a quick glance at a configuration to see if the configuration error is readily apparent. However, be wary of spending a lot
of time looking at a configuration for errors, because configurations can be quite long and complex. Any benefit to be
achieved from looking at a configuration is further frustrated because the human brain tends to see that which it expects to
see. It is a much better practice to rely on the output of operational mode commands when trying to isolate configuration
errors.

The Human Brain, a Funny Thing...

Take a moment and read the following paragraph:
Arocdnicg to rsceearch, it deosn't mttaer in waht
oredr the ltteers in a wrod are, the olny iprmoatnt
tihng is taht the frist and lsat ltteer are in the rghit
pclae. The rset can be a toatl mses and you can sitll
raed it wouthit pobelrm. Tihs is buseace the huamn
mnid deos not raed ervey lteter by istlef, but the wrod
as a wlohe.
The Human Brain Sees That Which It Expects To See

This paragraph, which has been widely circulated on the Internet, demonstrates how problematic relying on the process of
reviewing a configuration to identify errors can be.
Although this is an extreme case, we have probably all had the experience of overlooking a minor detail within a device
configuration. Always look for the operational mode command that demonstrates the device is operating as intended rather
than relying only on reviewing the configuration.

Operational Mode Commands

• Sample operational mode commands
• Success:
• Reachability between remote hosts using a BGP-learned route X
• Operational mode commands to help isolate the problem:
• show route protocol bgp
• show route prefix
• show bgp summary
• traceroute
• show route receive-protocol bgp
• show route advertising-protocol bgp
• Know which part of the configuration you must review
Narrowing the Focus

The same autocorrecting functionality of the human brain that allows us to read the paragraph on the previous slide also
makes it easy to overlook simple configuration errors, particularly when it is not known where in the configuration an error
resides.
By using operational mode commands to isolate individual elements from the identified required components list, you can
narrow down the possible locations to a particular configuration stanza. Once the focus has been narrowed to a known
portion of the configuration, it is much easier to recognize an error.
For example, if you have determined that an expected BGP route is not present on a device, it is more beneficial to begin with
the operational mode command show bgp summary to see all functioning BGP peering sessions than to spend time
looking at the configuration to determine which BGP peers should be up. If the output from show bgp summary indicates
a missing internal BGP peer, you might then proceed to use an operational mode command such as traceroute between
the two loopback addresses to ensure that full underlying IGP routing information exists. If a traceroute is unsuccessful, it
does not make much sense to review the BGP configuration until the underlying IGP problem is resolved.

Hardware Errors
• Hardware:
• Plausible in new out-of-box setups
• Plausible if new problems show up in established networks
• Can be a delayed effect from improper handling
• Alarms, LEDs, and log files, along with operational mode
command output all prove helpful in troubleshooting
hardware issues
• Try moving the problem
• Generally eliminate hardware as a possibility before
progressing on to software
Hardware Errors
Hardware is a plausible cause of issues that appear in established environments where configuration is not suspect. It is
also possible in new setups, or anytime that hardware has been moved or relocated, particularly if proper handling
techniques were not used.
Always use appropriate handling techniques when working with hardware, including proper grounding and other
electrostatic discharge (ESD) precautions, because even the slightest electrical discharge can damage the fine traces used in today's
electronic equipment. This damage is not always immediately evident, but it could weaken a trace and contribute to a
future failure.
Several tools are available when troubleshooting hardware, including alarms, LEDs, various log files and operational mode
command output.
Another very useful tactic when troubleshooting hardware is to attempt to move the problem. By relocating hardware within a
device or between devices, it is often possible to identify the problem component. Remember to take appropriate
precautions when a possibility of impacting production traffic exists.
Because hardware issues are more easily identified than software issues and more likely to occur in established operating
environments, it is generally more efficient to eliminate hardware as a root cause before proceeding to troubleshoot
software.

The Human Brain-Still a Funny Thing ...

Count the number of Fs:
The necessity of training farm hands for first class
farms in the fatherly handling of farm live stock is
foremost in the eyes of farm owners. Since the
forefathers of the farm owners trained the farm hands
for first class farms in the fatherly handling of farm
livestock, the farm owners feel they should carry on
with the family tradition of training farm hands of first
class farmers in the fatherly handling of farm live
stock because they believe it is the basis of good
fundamental farm management.
That Is a Lot of Information To Go Through...

Log files and operational mode commands can generate a lot of output to look through, and the human brain is prone to
make errors.
Read this paragraph quickly, without going back, and see how many occurrences of the letter F (upper case or lower case)
you can find. If you read this paragraph as quickly as you would if you were scanning through a large log file, you might miss
some of the occurrences.
When reviewing large amounts of output, remember to use the command line interface (CLI) pipe function and the
accompanying options such as find, match, and count. Remember, piped commands can be chained for additional
control when parsing large files.
Fortunately, the pipe command eliminates the need to rely on recognition.

Parsing System Log and Other Output
• The CLI's | (pipe) function makes parsing log files and
other extensive output easy
• Several options are available:
• Use the | (pipe) function to filter and manipulate output
show interfaces terse | match down
• Chain multiple options for advanced capability
show log messages | match fpc | match fail | count
• Use quotes and the pipe function as a logical "or", for example:
show log messages | match "fpc|sfm|kernel"
• Search the messages and chassisd logs for entries like fail,
kernel, core, error, and so on
The | (Pipe) Command

You can filter the output of any show command, reducing the possibility of human error when searching for specific text.
Searching for key words such as "fail", "down", or specific hardware components can help you quickly home in on faulty
hardware.
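The same filter-then-count idea can be applied to output that has been saved off the device. The Python sketch below is an illustration only and is not part of the Junos CLI: the function names are our own, the log lines are invented, and matching is done without regard to case as an assumption about what is most useful when scanning logs.

```python
import re
from typing import Iterable, Iterator


def match_lines(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Keep only the lines that contain the regular expression (any case)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return (line for line in lines if regex.search(line))


def count_lines(lines: Iterable[str]) -> int:
    """Count the lines that survived the earlier filters."""
    return sum(1 for _ in lines)


# Invented log lines, for illustration only.
sample = [
    "fpc 1 heartbeat ok",
    "FPC 2 restart failed",
    "kernel: interface ge-0/0/0 down",
    "sfm 0 online",
    "fpc 2 power fail",
    "fan tray 0 ok",
]

# Chained filters: lines that mention both fpc and fail.
print(count_lines(match_lines(match_lines(sample, "fpc"), "fail")))

# A logical "or" expressed as one regular expression.
print(count_lines(match_lines(sample, "fpc|sfm|kernel")))
```

Each filter returns a lazy iterator, so chaining them reads every line only once, much as chained pipe options process the output as a single stream.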

Hardware Troubleshooting Flowchart

• Display and view alarms
show chassis alarms
show chassis craft-interface
• View LED status and display craft interface
show chassis craft-interface
• Parse and view syslogs and act accordingly
show log messages
show log chassisd
monitor start [messages | chassisd]
• Display interface and hardware status
show chassis hardware
show chassis fpc
show pfe statistics error
show interfaces terse
show interfaces interface detail
• Investigate software faults
show log log-file-name
Hardware Troubleshooting Flowchart

The artistic aspect of troubleshooting and the myriad ways in which a modern communications device might malfunction
combine to make a definitive set of troubleshooting steps and procedures an unobtainable goal. The purpose of the
hardware troubleshooting flowchart shown on the slide is simply to provide a set of high-level steps designed to get you
started with hardware fault analysis. Note that reasonable people might disagree on the exact ordering of the steps or the
particular CLI commands that could be used to help isolate a hardware failure (for example, some might prefer the
extensive option to the show interfaces command, whereas the sample chart calls out the terse and detail
options).
Note that two individuals working with the same sets of tools and a common symptom might narrow their focus in completely
different ways. For example, one person might always start with visual inspection, whereas another opts to begin with
interface loopback tests. In the end, it is hard to say that one approach is better than another, assuming that both individuals
arrive at a similar conclusion, in a similar amount of time, with similar levels of disruption. There will always remain a certain
degree of artistic license that determines how a particular technician decides to approach a problem; however, the objective
remains the same: narrow the focus.

Software Errors
• Software
• Plausible in new setups, with recent Junos OS upgrades, or
when using new features
• View version and last Junos OS change
show version detail
show system software detail
file list /var/sw/pkg detail | match rollback
• Check online resources for known issues
• Check release notes
www.juniper.net/techpubs/software/junos/
• Search using keyword search (requires login)
www.support.juniper.net (link: Junos Defect Search)
Software Errors
Like configuration issues, software issues are not as likely to cause random failures in established functional environments.
They are more likely to appear with changes, either to the operating system, or with the utilization of new features. Software
is generally suspect once hardware has been eliminated as a probable cause.
If software errors are suspected, check online references for known issues.
If you are not running the latest version of code, be sure to check the latest release to see if an issue you are experiencing
has been identified and resolved. If an issue is identified there, remember to proceed using the normal upgrade procedures
including testing the new code in a lab setting specific to your environment. Bypassing proper testing in a rush to resolve an
issue can lead to different and possibly more disruptive issues.

Software Troubleshooting
• Troubleshooting software problems:
• First, eliminate hardware as a possible issue
• Review logs for software-related entries
• Verify required processes are running
• Move the problem:
• Can the issue be duplicated on another system using the same
version of the Junos OS?
• Can the issue be duplicated on another system using a different
version of the Junos OS?
• Core files and memory dumps might be required for
advanced troubleshooting
Troubleshooting Software Problems

Once you have eliminated configuration and hardware problems as likely causes, it is time to focus on software. Begin by
reviewing logs for software-related error messages. Also verify required processes are running using the show system
processes command.
You might be able to isolate the issue with simple testing, equivalent to the moving-the-problem approach used with
hardware. Begin by duplicating the symptoms in a nonproduction environment. If the symptoms can be duplicated, you can
test to see if the same symptoms are present in different releases of the Junos OS.
Advanced Software Troubleshooting

In other situations, more advanced troubleshooting is required.
Today's internetworking software is exceedingly complex. As a result, equally complex bugs that result from unforeseen
circumstances can result in a fatal error within a software process. Most of these software faults relate to illegal memory
operations caused by the process attempting to read or write data from a memory area outside the boundaries allocated for
that process. In some cases, faulty hardware, such as failing memory, can cause stack or register corruption that leads to a
fatal error in a software process. Core and log file analysis are used to determine whether hardware errors have led to a
software panic. A core file represents the set of memory locations and stack data that was in place at the time of the fault. A
copy of the binary image that left the core file (with debug symbols included) is then run against the core file using a
debugger to enable problem diagnosis by a Juniper Networks software engineer to help isolate the root cause.

Software Troubleshooting Flowchart
Hardware Is OK
• Parse and View Syslogs and Act Accordingly
show log messages
monitor start messages
• Display Running Processes
show system processes
show system connections
file show /etc/services
• Determine Whether Core Files Are Present
show system core-dumps
file list /var/tmp/*core*
file list /var/crash/*core*
• Investigate Interface Faults
Software Troubleshooting Flowchart

The purpose of the software troubleshooting flowchart shown on the slide is simply to provide a set of high-level steps
designed to get you started on the path of software-related troubleshooting. Note that reasonable people might disagree on
the exact ordering of the steps or on the particulars of the CLI commands that could be used to help isolate a software
failure.
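
The core-file check in the flowchart amounts to a wildcard match on file names. The following is a minimal Python sketch of that idea, working on directory listings that have already been collected; it does not talk to a device. The directory names in CORE_DIRS and the sample file names are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Assumed locations to check; adjust to match your platform.
CORE_DIRS = ("/var/tmp", "/var/crash")

def find_core_files(listing):
    """Return full paths of files whose names match *core*.

    listing maps a directory name to the list of file names found in it.
    """
    found = []
    for directory in CORE_DIRS:
        for name in listing.get(directory, []):
            if fnmatchcase(name, "*core*"):
                found.append(f"{directory}/{name}")
    return found

# Invented listing for illustration only.
listing = {
    "/var/tmp": ["rpd.core-tarball.0.tgz", "sampled.pkts"],
    "/var/crash": ["vmcore.0", "minfree"],
}
print(find_core_files(listing))
```

An empty result suggests that no process has dumped core in the checked locations, which points the investigation back toward logs and process status.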

One More Possibility...

• Something else:
• Outside influences
• Changes in traffic flow
• Changes in traffic type
• Malicious attacks
• Works as designed
• Misunderstanding of feature
• Design decision
Something Else
You should take into consideration an additional possibility: it might not be the network at all. Variations in traffic being
introduced to the network can often produce symptoms similar to those encountered with configuration, software, or
hardware problems. These variations might be intentional, such as in a denial of service (DoS) attack, or they might be
normal but unexpected changes in the type or amount of traffic traversing the network. When these types of changes are
suspected, it is important to have a baseline reference to compare against current traffic.
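
One simple way to compare current traffic against a baseline is to flag values that fall far outside the baseline's normal spread. The following Python sketch assumes the baseline is a list of earlier traffic-rate samples; the sample values, the function name, and the three-standard-deviation threshold are illustrative choices, not a Juniper recommendation.

```python
from statistics import mean, stdev

def deviates_from_baseline(baseline, current, k=3.0):
    """Return True if current lies more than k standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return abs(current - mu) > k * sigma

# Invented baseline samples (Mbps) for the same hour on previous days.
baseline_mbps = [410, 395, 402, 420, 398, 405]

print(deviates_from_baseline(baseline_mbps, 415))  # close to normal
print(deviates_from_baseline(baseline_mbps, 780))  # far above normal
```

A check like this only confirms that traffic has changed; deciding whether the change is an attack or a legitimate shift still requires looking at the traffic itself.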
Another possibility is that the network is working as designed, but differently than understood or expected. This could be
the result of trying to use a feature in a way other than intended, or it could be the result of a design decision. Modern
networks are a complex combination of standards and protocols implemented across hardware and software. Sometimes
design decisions might have been made to accommodate the complex list of features required in today's networks. Use
online documentation to verify whether the implementation of a particular feature matches your understanding.

Identify a Solution
• Identify possible solutions:

• More than one way might be possible
• Criteria
• The fix does not cause other problems
• The fix survives a reboot
• The fix is well communicated
• The fix is operationally understandable
• Short-term fixes are acceptable for quick restoration of
service (short term only)
• Test the solution
• Validity, validity, validity
• Plan how to implement solution with minimum disruption
Identify Possible Solutions

Consider the different possible solutions for the problem. Remember, a short-term fix can be an acceptable solution if it
helps restore service quickly. Your primary objective is to restore service in an operationally supportable way, and often this
involves short-term fixes. However, a short-term fix must always be followed up with a long-term fix.
The slide lists criteria that any fix should meet.

Implement the Solution
• Implement the solution:

• Remember: do no harm
• Follow change control processes
• Use maintenance windows
• Have a backout plan
• Plan for the worst
• Verify that the issue is resolved
• Success achieved?
• Monitor solution
• Confirm the absence of other negative impacts
• Document the changes
Implement the Solution

Remember to use the safety precautions mentioned earlier.
Once you have implemented the solution, verify that success, as defined earlier, has been accomplished. Monitor the results
to confirm the resolution and to confirm the absence of other negative impacts.

• Before You Begin

• The Troubleshooting Process
→ Challenging Network Issues
Challenging Network Issues

The slide highlights the topics we discuss next.


• Challenging Network Issues:
• Some situations can be particularly challenging:
• Packet loss
• Troubleshooting intermittent issues
• Isolating bottlenecks
• Information is key
• Use an out-of-band management network to ensure access
• Have a baseline for comparison
• Use appropriate logging options
• Look for patterns

Some troubleshooting situations, such as packet loss, intermittent issues, and bottlenecks, can be particularly challenging.
In these scenarios, the best defense is a good offense. Have a solid network baseline available for
comparison. A meaningful baseline is necessary to confirm the existence of a problem as well as to help isolate the root
cause of a problem.
Ensure that you are utilizing the appropriate levels of system logging. Traceoptions can be very useful when troubleshooting,
but you should disable or delete them when you are not troubleshooting to avoid unnecessary processing.
If basic traceoptions logging is not sufficient, consider increasing traceoptions logging to the debug
level to capture enough information.
Be ready to act quickly. Set up an independent out-of-band management network so you need not rely on the impacted
network to access the device and troubleshoot the problem as it occurs.
The information on which you are relying must be accurate and specific; exact dates and times are crucial. Use the Network
Time Protocol (NTP) to synchronize device clocks so that comparing logs across the network adds value.
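
The value of synchronized clocks becomes clear when logs from several devices are merged into one timeline. The following Python sketch assumes each log line starts with a traditional syslog timestamp such as "Feb 25 19:39:15"; because that format omits the year, the year is passed in as an assumption. The sample log entries and host names are invented.

```python
from datetime import datetime

def parse_ts(line, year=2013):
    """Parse the leading 15-character syslog timestamp; the year is supplied by the caller."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def merge_logs(*logs):
    """Merge lines from several device logs into a single list ordered by timestamp."""
    entries = [(parse_ts(line), line) for log in logs for line in log]
    return [line for _, line in sorted(entries)]

# Invented entries from two devices.
r1_log = [
    "Feb 25 19:39:15 r1 rpd[1500]: neighbor state changed to down",
    "Feb 25 19:39:20 r1 rpd[1500]: neighbor state changed to up",
]
r2_log = [
    "Feb 25 19:39:14 r2 /kernel: ge-0/0/0 link down",
]

for line in merge_logs(r1_log, r2_log):
    print(line)
```

If the device clocks are not synchronized, the merged order is meaningless, which is why NTP matters before any cross-device comparison.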

Troubleshooting Packet Loss

• Troubleshooting packet loss follows standard steps:
• Locate the problem-find the point where packets are lost
• Determine the cause of the loss
• Address the root cause
• To troubleshoot, you will need:
• Two endpoints
• These endpoints can be two routers, a couple of hosts in the
network, or any other type of device where the issue exists
between them
• A clear map of every device and circuit in the path
• Rule out devices and circuits until the location of the problem is
found
Troubleshooting Packet Loss Follows Standard Steps

You should follow three standard steps when troubleshooting packet loss:
1. Locate the point on the path where packets are lost. Usually, you can narrow this down to two devices and the
connection between them by sending test traffic.
2. Examine the devices between which packets are lost, and try to find the root cause. This step can be done by
process of elimination: Look at each possible cause and determine whether it is causing the problem.
3. Address the root cause. The root cause can be any of a variety of issues, including congestion, a circuit issue, configuration
problems (such as class of service [CoS], policers, or duplex settings on Ethernet interfaces), hardware faults, and so on.
To begin troubleshooting packet loss, you will need the following preliminary information:
• Two endpoints: The endpoints can be two routers, a pair of hosts in a customer network, or even traffic
generators.
Using routers as test endpoints is not ideal because host-bound traffic is often rate-limited.
Remember that on most platforms, traffic to and from the router itself is treated differently from transit
traffic. In some cases, this behavior can mask a problem on the starting or ending node.
• A clear map of every device and circuit in the path: As noted previously, you must proceed by process of
elimination to narrow down the problem to a section of the path. To perform this task, a topology map is a
necessary starting point.

Packet Loss: Locating the Problem (1 of 2)

• traceroute monitor is a useful first step
MXB (0.0.0.0) (tos=0x0 psize=1400 bitpattern=0x00) Mon Feb 25 19:39:15 2013
Keys: Help Display mode Restart statistics Order of fields quit
Host                  Loss%  Snt  Last   Avg  Best  Wrst  StDev
 1. 212.7.1            0.0%   45   0.3   0.2   0.2   0.3    0.0
 2. 212.168._          0.0%   45   2.4   2.8   2.4   4.9    0.6
 3. 212.15.10.1        0.0%   45  13.0  13.3   8.8  33.4    4.7
 4. 212.42.62.85       0.0%   45  14.4  14.7  10.8  32.2    3.6
 5. 84.16.245.153      0.0%   45  39.4  21.8  17.8  42.0    4.9
 6. 84.16.244.41       0.0%   45  22.8  22.0  17.9  42.1    4.4
 7. 84.115.130.42      0.0%   45  22.5  22.4  17.3  41.9    5.2
 8. 84.16.132.174      0.0%   45  19.7  21.3  16.8  45.6    4.7
 9. 80.80.193.132      2.2%   45  26.1  25.1  18.4  69.1    9.0
10. 82.98.9.10         0.0%   45  23.0  23.5  17.7  50.8    6.3
11. 82.98.9.146        0.0%   45  22.4  23.0  17.9  41.3    4.3
• Watch for intermediate hops that rate-limit ICMP
The traceroute monitor Command

The traceroute monitor command starts a continuous traceroute and in parallel pings each host in the path; the
result is a very good overview of delay and loss at each hop. When interpreting the output, consider that routers generally
rate-limit ICMP echo-request packets. In the slide, no real loss exists at hop 9, which is confirmed by the fact that no loss
exists on subsequent hops. The apparent loss is simply the consequence of ICMP rate-limiting.
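
The interpretation rule above (loss at one hop is real only if it persists on the hops that follow) can be written down as a small check. The following Python sketch is an illustration of that rule only; the function name is invented, and the first list reproduces the Loss% column from the slide.

```python
def first_persistent_loss_hop(loss_by_hop):
    """Return the 1-based number of the first hop whose loss persists on all later hops.

    Loss seen at a single intermediate hop, with clean hops after it, is treated
    as ICMP rate-limiting rather than real loss. Returns None if no hop qualifies.
    """
    for index, loss in enumerate(loss_by_hop):
        later_hops = loss_by_hop[index + 1:]
        if loss > 0 and all(later > 0 for later in later_hops):
            return index + 1
    return None

# Loss% per hop as shown in the slide: only hop 9 reports loss.
slide_loss = [0, 0, 0, 0, 0, 0, 0, 0, 2.2, 0, 0]
print(first_persistent_loss_hop(slide_loss))

# Invented example in which loss starts at hop 3 and continues to the destination.
real_loss = [0, 0, 1.5, 2.0, 1.8]
print(first_persistent_loss_hop(real_loss))
```

Note that loss reported only by the final hop still passes this check, because no later hop is available to contradict it; treat that case with caution, since the destination itself might rate-limit ICMP.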

Packet Loss: Locating the Problem (2 of 2)
• General troubleshooting hints

• Collecting interface and PFE error statistics for all devices in
the path is an extremely important step
• Watch for asymmetric routing
• Be careful with rapid pings-most routers rate-limit ICMP
• Consider firewall filters to count test traffic
Watch for Asymmetric Routing

The following list provides general troubleshooting tips:
• Check for error counters first: Collecting interface traffic and error counters (on the interface and on the PFE) is
usually enough to find out where a loss is taking place.
• Watch out for asymmetric routing: Always check the path from
both endpoints; otherwise, you might end up looking for a problem on the wrong device.
• Remember that many routers rate-limit ICMP traffic: This rate-limiting can lead to false conclusions.
• Use firewall filters if all else fails: If checking interface error counters does not give any definite result, one
alternative is to use firewall filters with counters matching test traffic on several routers along the path.
This approach allows you to rule out a section of the path: If counters match, no loss has taken place on that section.
However, although this step can be useful, it is still a rather disruptive operation in that it requires a potentially
service-impacting configuration change. For this reason, you should take this step only when all other steps
have failed to shed any light on the problem.
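
Once counters for the test traffic have been read from each router along the path, finding the lossy section is a matter of comparing neighboring values. The following Python sketch assumes the counters have already been collected by hand into an ordered list; the router names and packet counts are invented for illustration.

```python
def locate_loss(counters):
    """Return (upstream, downstream, packets_lost) for the first section where the count drops.

    counters is an ordered list of (router_name, packets_counted) along the path.
    Returns None when every router counted the same number of test packets.
    """
    for (up_name, up_count), (down_name, down_count) in zip(counters, counters[1:]):
        if down_count < up_count:
            return up_name, down_name, up_count - down_count
    return None

# Invented counter readings after sending 1000 test packets.
path_counters = [("r1", 1000), ("r2", 1000), ("r3", 988), ("r4", 988)]
print(locate_loss(path_counters))
```

In this invented example the counts match between r1 and r2, so that section is ruled out, and the drop first appears between r2 and r3.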

Payload-Dependent Loss
• In rare cases, the loss is dependent on payload
• The problem appears only with packets matching a specific
bit pattern
• The cause is usually faulty hardware somewhere on the path
• Bit errors in packet memory
• One way of detecting the problem is to use a rapid ping with
different payloads
• Some suggestions are 00 (all zeros), FF (all ones), AA and 55
(alternating ones and zeros), 0F and F0 (half byte ones, half zeros)
• These issues are rare; rule out other causes first
• But when they happen, they are difficult to pin down
Payload-Dependent Loss
In some (fortunately, very rare) cases, the loss is only triggered by packets matching a specific bit pattern. Generally, these
problems are caused by hardware problems within network elements (for example, because of faulty packet memory) or by
electrical issues on some transmission technologies, especially when payload scrambling is not used. Although these
problems are very rare, it is important to be aware of their existence and to know how to recognize them.
The best way to spot these issues is to run a test with different payload patterns. In general, all zeros, all ones, and various
alternating patterns are good tests to use. If you face an elusive error, take a few minutes to run a few ping tests with
different payload patterns.
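
To avoid typing each test by hand, the set of ping commands can be generated from the list of suggested patterns. The following Python sketch only builds command strings; it does not run them. The destination address, packet count, and size are illustrative assumptions, and the option names (rapid, count, size, pattern) should be verified against the ping command help in your release.

```python
# Fill patterns suggested on the slide, written as hexadecimal bytes.
PATTERNS = ["00", "ff", "aa", "55", "0f", "f0"]

def ping_commands(host, count=1000, size=1400):
    """Build one rapid-ping command string per payload pattern."""
    return [
        f"ping {host} rapid count {count} size {size} pattern {pattern}"
        for pattern in PATTERNS
    ]

# 192.0.2.1 is a documentation address used here as a placeholder.
for command in ping_commands("192.0.2.1"):
    print(command)
```

If loss appears with one pattern and not with the others, the result points toward a payload-dependent fault on the path.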

Troubleshooting Intermittent Issues (1 of 2)
• Find the constant behind intermittent:

• Determine whether a direct correlation exists between
packet loss and traffic load
• Must have a reference baseline
• Look at the system as a whole
• Look for patterns
• If packet drops impact unicast, broadcast, and multicast packets
equally, look to hardware errors
• If packet drops impact all traffic equally regardless of CoS
classification, look to hardware errors
• If traffic is impacted differently based on CoS classification, look to
circuit overutilization
Intermittent Issues, Not Random

We generally associate the term intermittent with the term random. Logic tells us, however, that there is really no such thing as a random
issue: the issue must be tied to something. Identifying the unknown variable is the challenging part.
One of the first steps in isolating the root cause is to begin to understand the patterns behind the seemingly random
behavior. To accomplish this task, you must rely on a meaningful baseline for comparison. Also, remember to consider the
entire system between endpoints as a whole.
Begin looking for constants. Identify the source and destination networks involved. If the symptoms include dropped
packets, determine whether all packets are being impacted equally or whether a distinction exists between different classes
of traffic. An unequal distribution of dropped packets could indicate an overutilized circuit along the path. Traffic that is being
dropped without regard to classification points more toward problems with physical interfaces, interface configuration, or
link errors.
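
The distinction between evenly and unevenly distributed drops can be made concrete by comparing drop rates per traffic class. The following Python sketch is an illustration only: the class names, packet counts, and the 25 percent tolerance are assumptions, and real counters would come from the devices in the path.

```python
def classify_drops(sent, dropped, tolerance=0.25):
    """Suggest where to look, based on how evenly drops are spread across traffic classes."""
    rates = {
        cls: dropped.get(cls, 0) / packets
        for cls, packets in sent.items()
        if packets > 0
    }
    if not rates or max(rates.values()) == 0:
        return "no loss observed"
    lowest, highest = min(rates.values()), max(rates.values())
    if highest - lowest <= tolerance * highest:
        return "uniform drops: look at physical interfaces, interface configuration, or link errors"
    return "uneven drops: look for an overutilized circuit along the path"

# Invented counters per forwarding class.
sent = {"best-effort": 10000, "expedited-forwarding": 10000, "network-control": 10000}
uniform = {"best-effort": 100, "expedited-forwarding": 95, "network-control": 105}
uneven = {"best-effort": 900, "expedited-forwarding": 0, "network-control": 0}

print(classify_drops(sent, uniform))
print(classify_drops(sent, uneven))
```

The output is a hint for where to start, not a diagnosis; the tolerance value in particular would need tuning against your own baseline.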

Troubleshooting Intermittent Issues (2 of 2)
• Intermittent hardware problems

• Intermittent issues not correlating to traffic load are
generally associated with OSI Layer 1 and Layer 2:
• Misconfigured interface properties (check both ends)
• Malfunctioning or failing interfaces or cables
• Layer 2 loops
• Use link-by-link isolation to narrow the focus:
• Use physical loopback testing to distinguish between interface and
circuit faults (add BERT testing on applicable Layer 2 links)
• Use Ethernet operations, administration, and maintenance on
Ethernet links
• Use operations, administration, and maintenance on SONET links
Intermittent Hardware Problems

Traffic that is being dropped without a direct correlation to traffic classification or traffic load is generally associated with OSI
Layer 1 or Layer 2 issues. The root cause of the problem could be misconfigured interfaces or malfunctioning or failing
interfaces or cables.
Layer 2 loops can also be responsible for saturated links and dropped packets. Ensure that the Spanning Tree Protocol (STP)
is running in some form in redundant topologies.
Several tools are available for troubleshooting intermittent issues. Physical loopback cables can be used to distinguish
between interface and circuit faults. Bit error rate testing (BERT) can also be used on applicable Layer 2 links (E1, E3, T1, T3,
channelized DS-3, OC-3, OC-12, STM-1, and channelized DS-3, E1, and OC-12 IQ interfaces) to confirm the integrity of the
circuits.
Remember that the objective is to narrow down the focus. Eliminating each link or interface as a possible source is an
iterative process.

Troubleshooting Bottlenecks (1 of 3)
• Bottlenecks:
• Look at the system as a whole
• Use tracert on end hosts to gather path information
10 52 ms 39 ms 42 ms xe-11-0-0.edge1.SJ3.level3.net [10.14.23.249]
11 50 ms 39 ms 37 ms ae-41-99.car1.SJ1.level3.net [10.14.27.195]
• Use traceroute on devices running the Junos OS to identify the full
path between endpoints
• Use show route on each device to identify available link speeds
root@mxC-2> show route 172.18.3.1

172.18.3.0/30      *[Direct/0] 2d 18:51:13
                    > via ge-1/0/1.256
Bottlenecks
Bottlenecks represent another unique situation that can be frustrating to troubleshoot. When troubleshooting bottlenecks,
you must remember to look at the system as a whole.
Network utilities such as tracert (on end hosts) or traceroute (on devices running the Junos OS) can be used to identify the
path between two endpoints. Sometimes the output can indicate that traffic is not taking the intended or expected path. This
scenario would represent a control plane issue, and additional troubleshooting could take place accordingly. At other times,
it could indicate a resource constraint within the data plane of devices along the path. Sometimes, the link speeds can be
obtained using a DOS tracert as indicated in the output in the slide. At other times, it is necessary to use the show route
command on each device running the Junos OS along the way to determine the available throughput capacity.
Be careful when using traceroute to determine interface link capacity. The interface information that shows up is derived
from the name associated with the link and not the actual capacity of the circuit. The information is only as accurate as the
naming is current. Rely on output from the show route command for actual circuit capacity.
You should take this information in context, however. Remember, bottlenecks are not a result of throughput capacity alone,
but rather a combination of throughput and utilization. Unlike hardware issues that can occur independently of traffic load,
bottlenecks have a direct correlation with the amount of traffic passing through the system and tend to have a correlation to
the classification of traffic passing through the circuit.

• Link utilization:
• Use link-by-link isolation to narrow the focus
• Use extended ping options such as size, do-not-fragment,
record-route, and so on
• Use show interfaces statistics
• Hardware issues can impact throughput
• Misconfigured interface properties (check both ends)
• Malfunctioning or failing interfaces or cables
• Layer 2 loops
Link Utilization
Once the full path between endpoints has been established, individual performance statistics can be collected link by link to
identify the bottleneck. The traceroute utility can be useful to determine the path between two endpoints, but the information
is generally gathered from only three Internet Control Message Protocol (ICMP) responses per hop, which is hardly sufficient
to gather an accurate sampling. Instead, use the ping utility for more meaningful sampling information. Utilize the source
option, as well as extended ping options such as size, do-not-fragment, record-route, and so on, to collect
meaningful information.
Remember that duplex mismatches and other misconfigured interface properties can cause collisions and slow the throughput of interfaces and
links. Use the show interfaces command to confirm settings and to view interface statistics that can help identify errors
in the configuration or other interface problems.
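
Link utilization can be estimated from two readings of an interface byte counter taken a known interval apart. The following Python sketch shows the arithmetic only; the counter values, interval, and link speed are invented, and the sketch ignores counter wrap or clearing between readings.

```python
def utilization(bytes_t0, bytes_t1, interval_s, link_bps):
    """Return link utilization as a fraction (0.0 to 1.0) between two counter readings."""
    bits_transferred = (bytes_t1 - bytes_t0) * 8
    return bits_transferred / (interval_s * link_bps)

# Invented readings: 3,000,000,000 bytes in 30 seconds on a 1-Gbps link.
fraction = utilization(0, 3_000_000_000, 30, 1_000_000_000)
print(f"{fraction:.0%}")
```

Repeating the calculation for each link in the path, and for both directions, makes it easier to see which link is running closest to capacity.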

• Another approach:
• Intentionally introduce constraints within the path
• Generate additional traffic on a particular segment
• Reduce bandwidth through interface settings
• Redirect flow to a different interface with less capacity
• If end-to-end throughput changes, you have isolated the
bottleneck
• Otherwise, that link was not the bottleneck
• Slow down the next portion and try again
Isolation Through Intentional Constraints

Another methodology for isolating bottlenecks uses the approach of intentionally introducing constraints within the path, one
segment at a time, and then comparing the end-to-end results. By monitoring the end-to-end throughput of the system as
a whole, you can determine whether the bottleneck has been isolated. If the end-to-end throughput does not change when
constraints are added to a particular segment, that segment was not the bottleneck. If the end-to-end throughput is
impacted when constraints are added to a particular link, and the results can be reliably toggled, you have identified the
bottleneck within the system as a whole. Remember that this approach can be disruptive to production environments and
appropriate precautions should be taken.
Isolating the bottleneck can also be very helpful when trying to improve the overall throughput of a particular system: make
sure you are spending your upgrade dollars in the appropriate spot. It does no good to upgrade a 10-Gbps link to a 100-Gbps link
if that particular link was not the bottleneck in the end-to-end system.
You can introduce a constraint on a particular segment by increasing traffic on the segment or by decreasing its capacity.
To increase traffic, you can use a traffic generator to generate additional traffic or redirect existing traffic to a particular
interface. You can decrease capacity by reducing the bandwidth of a particular segment (for example, reducing an
aggregated link from three physical links to two) or by redirecting the traffic flow through a link that already has less capacity.
In either situation, be aware of the impact you are having on production environments.
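
The reasoning behind this method can be shown with a toy model in which end-to-end throughput equals the capacity of the slowest segment. The following Python sketch is a simulation under that simplifying assumption, not a measurement tool; the segment names, capacities, and the 10 percent reduction are invented.

```python
def end_to_end(capacities):
    """Toy model: end-to-end throughput is limited by the slowest segment."""
    return min(capacities.values())

def segments_that_matter(capacities, reduction=0.10):
    """Constrain one segment at a time and report those that change the end-to-end result."""
    baseline = end_to_end(capacities)
    changed = []
    for segment, capacity in capacities.items():
        # Copy the path and reduce only the segment under test.
        trial = {**capacities, segment: capacity * (1 - reduction)}
        if end_to_end(trial) < baseline:
            changed.append(segment)
    return changed

# Invented path with capacities in Mbps.
path = {"access": 1000, "metro": 100, "core": 10000}
print(segments_that_matter(path))
```

The sketch uses a small reduction on purpose: a constraint large enough to push a healthy segment below the existing bottleneck would also change the end-to-end result and could point at the wrong segment.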

Summary
• In this content, we:
• Described ways to avoid unnecessary disruptions to
production environments
• Described a troubleshooting process
• Described troubleshooting challenging network issues
We Discussed:
How to avoid unnecessary disruptions to production environments;
Troubleshooting as a process; and
Troubleshooting challenging network issues.

Review Questions
1. What are the four main steps in the troubleshooting
process described in this chapter?
2. What are the four categories of potential root cause
problems described in this chapter?
3. What type of symptom would indicate a problem
within the control plane?
4. What type of symptoms would indicate a problem
within the data plane?

The Troubleshooting Process Lab
• Apply the troubleshooting model described in this
chapter in real-world troubleshooting scenarios.
The Troubleshooting Process Lab

The slide provides the objective for this lab.

Answers to Review Questions

1.
The four main steps in the troubleshooting process are as follows: 1.) Define "success"; 2.) Isolate the component preventing success
(characterize, hypothesize, predict, test, and experiment); 3.) Identify a solution; and 4.) Implement the solution.
2.
The four categories of potential root cause problems described in this chapter are configuration, hardware, software, and "something
else."
3.
Missing routes from routing tables are an example of a symptom that would indicate a problem within the control plane.
4.
Physical errors, MTU mismatch, firewall filters and policers, and intermittently dropped packets are examples of symptoms that would
indicate a problem within the data plane.


Chapter 3: Junos Product Families

Objectives
• After successfully completing this content, you will be able to:
• Describe the architectural philosophy of devices that run the
Junos OS and understand how this relates to
troubleshooting
• Describe traffic processing for transit and exception traffic
• Describe the function and components of the RE and PFE
within a device running the Junos OS.
• Describe FRUs
• Describe current Junos product families and understand
where to go for detailed information about your hardware
We Will Discuss:
The basic design architecture of devices that run the Junos operating system;
Traffic processing for transit and exception traffic;
The major components of the Routing Engine (RE) and the Packet Forwarding Engine (PFE);
field-replaceable units (FRUs); and
Junos product families.
Chapter 3-2 • Junos Product Families www.juniper.net

Agenda: Junos Product Families

→ The Junos OS
• Control Plane and Data Plane
• Field-Replaceable Units
• Junos Product Families
The Junos OS
The slide highlights the topics we will discuss. We discuss the highlighted topic first.

The Junos OS
• Robust, modular operating system
• Provides industry-leading performance and scalability
• Based on FreeBSD
Robust, Modular, and Scalable

The Junos OS is the trusted, secure network operating system powering the high-performance network infrastructure offered
by Juniper Networks. The Junos kernel is based on FreeBSD, an open-source, multi-user, multi-access, UNIX-like operating
system.
Junos OS functionality is compartmentalized into multiple software processes. Each process handles a portion of the
device's functionality. The independent nature of these processes provides several important benefits.
Each process runs in its own protected memory space, ensuring one process does not directly interfere with another; only
relational dependencies exist. Because of this design, a single process failure (or restart) does not necessarily cause the
entire system to fail. This functionality plays an important role in troubleshooting when isolating and recovering from faults
within specific processes.
This highly modular architecture also prevents isolated failures from bringing an entire system down and ensures new
features can be added with less likelihood of breaking current functionality.

Single Software Train

• A single software train for all platforms running the
Junos OS
• Eases management overhead by providing a consistent set
of features that are implemented in a consistent manner
• The same troubleshooting methodology can be applied to all
devices running the Junos OS
[Figure: release labels 11.4 and 12.1 on a single software train; platform labels J2320 and TX Matrix]
Single Software Source Code Base

The Junos product families include many different devices designed for different roles within the network. However, all
devices running the Junos OS are maintained from a single base source code. This common source ensures core features
work in a consistent manner across all platforms running the Junos OS. Because many features and services are configured
and managed the same way, setup tasks and ongoing maintenance and operations within your network are simplified.
Although the source code base is the same for all platforms running the Junos OS, some features are implemented
differently on different platforms. We make a strict effort, however, to ensure features are implemented in a consistent
manner when possible.
Another significant benefit of this common design architecture is that the same troubleshooting methodology can be applied
across all devices. Although function-specific and platform-specific troubleshooting for each product family might exist, the
base troubleshooting methodology remains the same across all devices running the Junos OS.

Separation of Duties
• All platforms running the Junos OS share a common
design philosophy
• Clean separation of control and forwarding functions
• Sometimes accomplished with hardware
• Sometimes implemented within software
[Figure: Control Plane connected to the Data Plane by an Internal Link; Frames In and Frames Out pass through the Packet Forwarding Engine]
Junos Architectural Philosophy

All Junos-based platforms share a common design philosophy that separates the device's control plane from the data plane.
The data plane, built around the PFE, handles the role-specific workload of the device. We refer to traffic passing through the
device as transit traffic.
The control plane, by contrast, is responsible for handling traffic destined to the device itself, such as routing updates and
system management. We call this exception traffic.
Because this architecture separates control operations from packet forwarding, the device can deliver superior performance
and highly reliable operation.
Understanding this separation of duties also plays a significant role in troubleshooting.
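
The distinction between transit and exception traffic can be illustrated with a simple destination check. The following Python sketch is a deliberately reduced model: it treats only packets addressed to the device itself as exception traffic, and the local addresses shown are invented documentation-style values.

```python
from ipaddress import ip_address

def classify(destination, local_addresses):
    """Label a packet as exception traffic if it is addressed to the device itself."""
    if ip_address(destination) in local_addresses:
        return "exception"
    return "transit"

# Invented addresses configured on the device in this example.
local = {ip_address("192.0.2.1"), ip_address("10.0.0.1")}

print(classify("192.0.2.1", local))     # for example, a routing update sent to the device
print(classify("198.51.100.7", local))  # traffic passing through the device
```

When troubleshooting, this distinction indicates where to look first: the control plane for exception traffic and the data plane for transit traffic.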


• The Junos OS
→ Control Plane and Data Plane
Control Plane and Data Plane

The slide highlights the topic we discuss next.

Control Plane Role

• The brain
• Builds and maintains routing and forwarding tables
• Controls and monitors the chassis
• Manages the PFE
• Remember, the RE only processes exception traffic
The Brain
The RE is the brain of the device. It is responsible for system management and for processing routing updates. The RE runs
various protocol and management software processes that run inside a protected memory environment. It provides the
command-line interface (CLI) and the J-Web graphical user interface (GUI). These user interfaces run on top of the Junos
kernel and provide user access and control of the device.
The RE is also responsible for building and maintaining the forwarding information necessary for the device to perform its
function within the network.
It handles all protocol processes in addition to other software processes that control the device's interfaces, the chassis
components, system management, and user access to the device. These software processes run on top of the Junos kernel,
which interacts with the PFE. The software directs all protocol traffic from the network to the RE for the required processing.
The RE controls the PFE by providing accurate, up-to-date Layer 2 and Layer 3 forwarding tables and by downloading
microcode and managing software processes that reside in the PFE's microcode. The RE receives hardware and
environmental status messages from the PFE and acts upon them as appropriate.
Separation, Revisited
Remember, the RE does not play a direct role in the forwarding of individual transit traffic packets. When troubleshooting
traffic processing, once the proper forwarding information has been validated, troubleshooting efforts can be focused on the
data plane.

Control Plane-Components
• Common components:
• Processor
• Runs the Junos OS to maintain the router's routing tables and
routing protocols
• DRAM
• Provides storage for the routing and forwarding tables
• Buffers incoming packets
• Storage
• Can be hard disk, NAND flash, or both depending on the system
• Used to store the Junos OS and also log files and memory dumps
• Visit www.juniper.net/techpubs/ for specific information
about the components in your hardware
Control Plane-A Hardware View

The slide outlines the individual components that make up the device's RE.
The specific components might vary by device. Visit www.juniper.net/techpubs for specific information about the
components of your hardware.

Control Plane-Troubleshooting
• Possible points of failure:
• Configuration errors
• Hardware errors
• Subcomponent-level failure isolation is not usually required
because faulty hardware generally results in replacing the entire
RE
• When working with platforms with redundant REs, isolation of the
faulty RE can be required
• Software errors
• Because of the design of the Junos OS, individual processes can
be restarted without impacting the entire RE
• In some situations, even subportions of processes can be
reinitialized independently
• Be sure to check latest release notes for known issues
Potential Problems with the RE

This slide discusses the possible points of failure that can occur with the control plane.

Data Plane Role
• The workhorse:
• Uses Layer 2 and Layer 3 forwarding tables provided by the
RE to forward traffic toward its destination or special
functions component
• Implements various services such as policing, stateless
firewall filtering, and class of service
[Figure: The Routing Engine sits above the data plane. Frames enter and leave through the data plane.]
The Workhorse
The data plane, built around the PFE, systematically forwards traffic based on a synchronized local copy of the forwarding
table created by the RE. Storing and using a local copy of the forwarding table allows the PFE to forward traffic more
efficiently by eliminating the need to consult the RE each time a packet needs to be processed. Using this local copy of the
forwarding table also allows platforms running the Junos OS to continue forwarding traffic during control plane instabilities.
The PFE also maintains Layer 2 bridging information.
In addition to forwarding traffic, the PFE also implements a number of advanced services. Some examples of advanced
services implemented through the PFE include policers that provide rate limiting, stateless firewall filters, and class of
service (CoS). Other services are available through special services cards that can be added to the data plane.
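The value of a local forwarding table copy can be illustrated with a small Python model. This is a sketch under stated assumptions, not the PFE's actual implementation: the class name `ForwardingEngine`, its methods, and the sample prefixes and interface names are hypothetical.

```python
import ipaddress


class ForwardingEngine:
    """Toy PFE that forwards using only its own copy of the forwarding table."""

    def __init__(self):
        self.table = {}  # ip_network -> next-hop interface name

    def sync(self, re_table):
        # The RE pushes a complete copy; the PFE stores it locally.
        self.table = {ipaddress.ip_network(p): nh for p, nh in re_table.items()}

    def lookup(self, destination):
        # Longest-prefix match against the local copy; the RE is not consulted.
        addr = ipaddress.ip_address(destination)
        matches = [net for net in self.table if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.table[best]


pfe = ForwardingEngine()
re_table = {"10.0.0.0/8": "ge-0/0/0", "10.1.0.0/16": "ge-0/0/1"}
pfe.sync(re_table)

re_table.clear()  # simulate the control plane losing state after the last sync
print(pfe.lookup("10.1.2.3"))  # still answered from the local copy
print(pfe.lookup("10.9.9.9"))
```

Because `sync` stores a separate copy, clearing the RE-side table does not affect lookups. This mirrors the point made above: forwarding can continue during control plane instability.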

Data Plane-A Logical Overview (1 of 2)

• Logically, routing functions are performed in a similar
way across all devices running the Junos OS
• A frame arrives on an ingress interface of the device
• Media-specific operations take place
• The Layer 2 header is removed
• The packet is broken into smaller chunks called J-cells for
quick storage and retrieval
• The Layer 3 and Layer 4 header information necessary to make a
forwarding decision (and stateless firewall decision) is added to a
notification cell and sent to the Internet Processor ASIC so a
forwarding decision can be made
• The remaining data is distributed into memory
The Process of Forwarding a Packet: Part 1

To accomplish the throughput and speed required for today's networks, Juniper Networks uses a divide-and-conquer
approach for forwarding packets. In this slide and the next, we cover the process of routing packets from a logical
perspective. Later, we provide a brief overview of how these functions are accomplished using different hardware.
All traffic traversing a router arrives on an ingress interface. Simple media-specific operations such as link-level
validity checks take place.
The Layer 2 frame header is removed.
Layer 3 header information is checked for validity.
CoS classification takes place.
The packet is broken into 64-byte J-cells, which allows the process of storing and retrieving the packet to be
distributed for increased performance.
A special J-cell, called the notification cell, containing the information necessary to make forwarding and
stateless firewall decisions is created and sent to the Internet Processor ASIC so a forwarding decision can be
made.
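The segmentation step described above can be sketched in Python. This is a simplified model: the 64-byte cell size comes from the text, but the `NotificationCell` fields, the function names, and the sample addresses are hypothetical. The text does not say whether the final cell is padded, so this sketch leaves it short.

```python
from dataclasses import dataclass
from typing import List

J_CELL_SIZE = 64  # bytes per J-cell, as stated in the text


@dataclass
class NotificationCell:
    # Hypothetical fields: the text says only that the cell carries what is
    # needed for the forwarding and stateless firewall decisions.
    source: str
    destination: str
    protocol: int
    cell_count: int


def segment(packet: bytes) -> List[bytes]:
    """Split a packet into 64-byte cells; the final cell may be shorter."""
    return [packet[i:i + J_CELL_SIZE] for i in range(0, len(packet), J_CELL_SIZE)]


def build_notification(source: str, destination: str, protocol: int,
                       cells: List[bytes]) -> NotificationCell:
    """Summarize the header fields so the lookup never touches the payload."""
    return NotificationCell(source, destination, protocol, len(cells))


packet = bytes(150)  # a 150-byte packet with the Layer 2 header already removed
cells = segment(packet)
note = build_notification("10.0.0.5", "10.1.2.3", 6, cells)
print([len(c) for c in cells])
print(note)
```

A 150-byte packet yields three cells of 64, 64, and 22 bytes. Only the small notification record needs to travel to the lookup stage while the cells wait in memory.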

Data Plane-A Logical Overview (2 of 2)

• Overview (contd.):
• A forwarding decision is made by the Internet Processor
ASIC and the information is relayed to the component that
will be handling the data next
• The next component could be an additional services processing
card or an egress interface
• When the component is ready to process the information,
the data is retrieved from memory and reassembled for
processing
• The reassembled data receives a new frame and is
transmitted out the egress interface
The Process of Forwarding a Packet: Part 2

The following list continues the overview of the forwarding process:
The Internet Processor ASIC makes a forwarding decision based on the PFE's local copy of the forwarding table.
The forwarding information is relayed in a results cell and the component responsible for the next processing is
notified.
The component responsible for processing indicates it is available to process the information and receives the
J-cells from memory. The J-cells are reassembled into a packet.
The J-cells are not expressly removed from memory. They are simply overwritten as new data arrives. In
the case of multicast traffic, this design allows for multiple egress interfaces to process the data as they
are able-so long as the J-cells are retrieved before they are overwritten by new data.
The reassembled data is sent to the egress interface where a new frame header is applied and the frame is
transmitted.
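The idea that J-cells are overwritten rather than explicitly removed can be modeled with a small ring of memory slots. This Python sketch is illustrative only: the class name `CellMemory` and the generation counter used to detect overwritten cells are assumptions, because the text does not describe how the hardware tracks this.

```python
class CellMemory:
    """Toy cell store: slots are reused in order and never explicitly freed."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.generation = [0] * slots  # incremented each time a slot is rewritten
        self.next_slot = 0

    def write(self, cell):
        idx = self.next_slot
        self.slots[idx] = cell
        self.generation[idx] += 1
        self.next_slot = (idx + 1) % len(self.slots)
        return idx, self.generation[idx]  # handle a reader presents later

    def read(self, handle):
        idx, gen = handle
        if self.generation[idx] != gen:
            return None  # overwritten by newer data before it was retrieved
        return self.slots[idx]  # reading does not remove the cell


mem = CellMemory(slots=2)
handle = mem.write(b"cell-A")

# Two egress interfaces (as with multicast) can each read the same cell.
first_read = mem.read(handle)
second_read = mem.read(handle)

# New data arrives and eventually reuses the slot.
mem.write(b"cell-B")
mem.write(b"cell-C")
late_read = mem.read(handle)
print(first_read, second_read, late_read)
```

The first two reads both return the cell, and the late read returns nothing because the slot has been reused. This matches the condition stated above: egress components must retrieve the J-cells before new data overwrites them.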

Data Plane-A Physical Overview
• Distributed architecture
• Made up of several ASICs, processors, or both
• Different chips reside on different components within the data
plane
• On newer-generation devices, several tasks can be accomplished
within a single chip
• On newer-generation devices, several stand-alone PFEs can exist
within a single chassis, linked together through switching fabric
• On some devices, PFE functionality is emulated within a
single processor using software
Division of Duties
When Juniper Networks first entered the market, several key improvements were introduced to network hardware. In
addition to the separation of a control plane and data plane discussed earlier, we also introduced hardware-based
forwarding utilizing application-specific integrated circuits (ASICs). ASICs are designed to do specific tasks and they do them
very quickly.
Initially, individual ASICs were designed to handle each of the tasks described in the process of forwarding a packet.
Later, as technology improvements became available, newer ASICs were designed. These new ASICs added the ability to
combine several functions into a single chip, lowering power consumption and increasing throughput, while at the same time
allowing for even greater scalability than had been available with earlier ASICs.
On some devices running the Junos OS, this functionality is implemented in software running on a single code-complete
CPU. This approach provides the same functionality for lower traffic volumes, at a price-performance balance
appropriate for the environment.
We discuss each of the available chipsets in the upcoming pages.

Data Plane-Components
• The names used for each of these vary with platform

• Interfaces
• Built-in interfaces
• Interchangeable interfaces (PICs, PIMs, MICs)
• Present in DPCs
• Additional services cards
• Line cards
• Flexible PIC Concentrators
• Modular Port Concentrators
• Dense Port Concentrators
• Switching board
• Examples include: FEB, SFM, SIB, SCB, SFB, and more
Data Plane Components

Most data planes are made up of several components. For simplicity, we group these into three main categories: interfaces,
line cards, and switching boards.
All transit traffic must pass through interfaces. These interfaces can be modular or built-in. As indicated in the slide, several
names are used when referring to interfaces. We cover each of these named options in upcoming slides. For consistency, all
interfaces are identified as PICs in operational command output.
Switching boards and line cards work together to form the PFE portion of the data plane. In some cases, the functionality is
distributed across the line cards and the switching board. In other cases, PFE functionality resides only on the line card, and
multiple PFEs are linked together through a switch fabric, which resides on the switching board. In still other cases, the line
cards play no active role in the PFE and all PFE functionality resides on the switching board.
As illustrated on the slide, the names for line cards can vary with each platform, based on the PFE role performed. For
consistency, all line cards are identified as FPCs in operational command output.
Naming conventions for switching boards can also vary by platform, based on the role they play in the PFE. We discuss each
of the possible names and their differing functionality in upcoming slides.

PFE Implementation Varies

• Different objectives, different PFE implementations
• Functionality of device
• Balance cost and performance
• Continual improvements in silicon design
• Different implementations include:
• A/B/C
• L/M/N/R
• I chip
• Trio chipset
• Express chipset
• RTOS
Different Objectives, Different PFE Implementations

Different Junos product families have different objectives. Sometimes these objectives are driven by the differing
functionality of the devices. At other times, they are driven by the need to balance cost and performance.
Additionally, we are always seeking to add functionality and increased performance to our product offerings. This goal leads
to continued improvements in silicon design.
As a result, there are multiple implementations of the PFE functionality within Junos-based devices. The differing chipsets
are listed on the slide along with the RTOS implementation, which runs on a code-complete processor. We cover each of
these implementations at a high level in the following slides.
Remember that fault isolation is only necessary to the component level. Although some ASICs can produce specific errors
that indicate which component is experiencing failure, it is not always necessary to identify where within the PFE a
breakdown occurs-only that a particular component is faulty.

Sample Distributed Architecture

Example: M40e
[Figure: A/B/C data plane on an M40e. The interfaces (PICs) contain media-specific ASICs and additional services PICs. The line cards (FPCs) contain the I/O Manager ASICs (B chip) and memory. The switching board (SFMs) contains Buffer Manager 1 and Buffer Manager 2 (A chip), the Internet Processor II (C chip), and the forwarding table. Key: data path and PFE control path.]
Speed-ASIC Based Forwarding

One of the major performance improvements Juniper Networks brought to the market was the introduction of ASIC-based
forwarding. This design eliminated the need to do a processor-intensive lookup for each packet and produced line-rate
transfer speeds.
This slide outlines the flow of a packet through an A/B/C architecture data plane.
A frame arrives on an ingress interface of the device where all media-specific operations take place.
The I/O Manager (B chip) removes the Layer 2 header information and breaks the packet into J-cells before
forwarding the information to the Buffer Manager.
The Buffer Manager (A chip) extracts the Layer 3 and Layer 4 header information necessary to make a
forwarding decision and creates a notification cell that is sent to the Internet Processor ASIC. It also distributes
the J-cells across shared memory that resides on each Flexible PIC Concentrator (FPC).
A forwarding decision is made by the Internet Processor ASIC (C chip) and the information is relayed in a results
cell to the outbound Buffer Manager, which notifies the egress I/O Manager of a pending packet for delivery.
When the I/O Manager is ready to handle the packet, the Buffer Manager retrieves the data from memory and
the I/O Manager reassembles the packet and adds a new frame header.
The reassembled and framed packet is transmitted out the egress interface.

Sample Distributed Architecture-L/M/N/R

Example: T640
[Figure: L/M/N/R data plane on a T640. The interfaces (PICs) contain media-specific ASICs and additional services PICs. The line card (FPCs) contains the L, M, N, and R chips, shown with the Layer 2/Layer 3 packet processing, queuing and memory interface, switch interface, and Internet Processor II blocks, plus RAM. The switching board (SIBs) contains the F chip switch fabric, which connects to other FPCs. Key: data path and PFE control path.]
Scalability-PFE on a Card
The L/M/N/R chipset divided the duties of the Buffer Manager into two chips, the N Chip, which is responsible for creating
and processing the notification and results cells, and the M chip, which is responsible for distributing J-cells to memory.
The L chip is responsible for all Layer 2 and Layer 3 header checking as well as breaking the packet up into J-cells.
The R chip is responsible for performing the route lookup and making a forwarding decision.
The L/M/N/R PFE also combined all PFE functionality onto a single line card. Multiple PFEs could be used within a single
system for increased performance and throughput. The multiple PFEs are linked together through a switched fabric that
resides on the switching board. Only packets that arrive on one FPC and leave on another are required to cross the switching
board.
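The last point above lends itself to a quick check. Junos physical interface names follow the type-fpc/pic/port convention (for example, so-1/0/3), so the FPC slot can be read directly from the name. The Python sketch below is a simplified model of the rule stated in the text. The helper names `fpc_slot` and `crosses_fabric` are hypothetical, and the sketch ignores platforms where a single FPC hosts more than one PFE.

```python
import re

# Junos physical interface names follow type-fpc/pic/port, for example so-1/0/3.
IFNAME = re.compile(r"^(?P<type>[a-z]+)-(?P<fpc>\d+)/(?P<pic>\d+)/(?P<port>\d+)")


def fpc_slot(ifname):
    """Extract the FPC slot number from an interface name."""
    match = IFNAME.match(ifname)
    if match is None:
        raise ValueError(f"not a type-fpc/pic/port interface name: {ifname!r}")
    return int(match.group("fpc"))


def crosses_fabric(ingress, egress):
    """True when the ingress and egress interfaces sit on different FPCs."""
    return fpc_slot(ingress) != fpc_slot(egress)


print(crosses_fabric("so-1/0/3", "so-1/2/0"))  # same FPC
print(crosses_fabric("so-1/0/3", "ge-4/0/0"))  # different FPCs
```

When a problem affects only traffic between FPCs and not traffic that stays on one FPC, a check like this helps decide whether the switching board belongs on the list of suspect components.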

Sample Distributed Architecture-I Chip

(1 of 2)
Example: M120
[Figure: I chip data plane on an M120. The interfaces (PICs) contain media-specific ASICs. The line cards (FPCs/CFPCs) connect to the switching board (FEB), which contains the I chip. The switch fabric links to other FEBs and FPCs.]
Even More Scalability-PFE on a Chip

The introduction of the I chip took the performance and scalability improvements introduced with the L/M/N/R chipset and
combined them into a single chip, further increasing performance and reducing power requirements.
In this implementation, the I chip resides on the switching board and the line cards play only a passive role in the data plane.
Multiple switching boards can be used to increase performance and add redundancy.

Sample Distributed Architecture-I Chip

(2 of 2)
Example: SRX3600
[Figure: I chip data plane on an SRX3600. The switching board (SFB) provides the switch fabric. The line cards (IOCs/NPCs/SPCs) each contain an I chip, and the interfaces contain media-specific ASICs.]
Note: IOCs and NPCs are combined onto a single board on SRX5600 and SRX5800 and share I chips.
Combine PFE Functionality with Additional Hardware for Added Functionality

In this implementation, the I chip resides on each of the available line cards in an SRX3600 Services Gateway. The
input/output card (IOC) provides the interface capabilities, whereas the Network Processing Card (NPC) and Services Processing
Card (SPC) line cards provide additional stateful security functionality.
We discuss the specifics of the SRX Series architecture on a later slide.

Data Plane-Troubleshooting
• Possible points of failure:

• Configuration errors
• Hardware errors
• Because of distributed architecture design, errors must be
correlated with a particular component of the PFE
• Subcomponent-level failure isolation is not usually required
because faulty hardware will generally result in replacing the entire
component
• Software errors
• Some components run their own microkernels
• These microkernels can be accessed independently for
troubleshooting purposes
Potential Problems with the Data Plane

This slide discusses the possible points of failure that can occur with the data plane.


• The Junos OS
→ Field-Replaceable Units
Field-Replaceable Units

Field-Replaceable Units (1 of 2)
• Juniper Networks hardware
• Most devices running the Junos OS are made up of a
chassis, containing a midplane or backplane, and several
components that can be added to the chassis
• The removable components are called field-replaceable
units
• FRUs play a role in troubleshooting
• Smallest unit required for isolation of hardware problems
• Can be relocated within chassis or between equipment to help in
the troubleshooting process
• Remove all FRUs from a chassis before shipping it for a
return materials authorization (RMA)
Modular Architecture
Most hardware running the Junos OS is made up of a central chassis containing some form of midplane and several
components that plug into it called field-replaceable units (FRUs).
An FRU is any component of the device that can be replaced. The term does not include subcomponent hardware such as the
memory or hard disk on an RE. For instance, if the hard disk on an RE fails, the entire RE is replaced, not just the hard disk.
For this reason, it is not necessary to isolate hardware faults beyond the component level.
Because FRUs can often be added or removed from a system with minimal or no impact to the forwarding functions of the
device, offlining or removing a particular component can prove very helpful in isolating hardware failures. Additionally, if
like-equipment is available, FRUs can often be relocated to help isolate the faulty hardware component. In certain cases,
FRUs can be relocated within the same chassis to assist in hardware fault isolation.
Before sending any chassis back to Juniper Networks for a return materials authorization (RMA), be sure to remove all FRUs
from the chassis.

Field-Replaceable Units (2 of 2)
• FRUs vary by platform
• Some FRUs have an individual serial number
• Some FRUs do not have serial numbers
• Varies with platform
lab@mxC-1> show chassis hardware
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                D4897             MX80
Midplane         REV 06   711-031594   YK8986            MX80
FPC 1                     BUILTIN      BUILTIN           MPC BUILTIN
  MIC 0          REV 22   750-028392   YK7402            3D 20x 1GE(LAN) SFP
Fan Tray                                                 Fan Tray
FRUs Vary Platform to Platform

The slide shows how FRUs are identified. Some FRUs have unique serial numbers, whereas others do not.
In the output displayed in the slide, taken from an MX80, notice that a serial number is provided for both the chassis and the
midplane. The chassis serial number is used to identify the device itself and is used to track service contract information.
The midplane serial number, by contrast, would be used if the entire chassis were sent in for an RMA. Remember, all other
FRUs should be removed from a chassis before it is sent in for an RMA.
Also in the output, we see that MPC 1 (displayed as FPC 1) is a built-in component and therefore does not contain a unique
serial number. If we determined MPC 1 to have experienced a hardware failure, the entire chassis would have to be replaced.
By contrast, the Modular Interface Controller (MIC) is a removable FRU and has its own part number and serial number.
Notice that the fan tray is removable but does not contain a unique serial number.
Most components also have a small rectangular serial number ID label attached to the component body. Sometimes, if the
FRU will not come up in a chassis, this tag must be used to identify the FRU.
In the following slides, we discuss how to find out which FRUs are available for a particular device running the Junos OS. We
also cover how to determine which FRUs are currently installed in your hardware.

Sample Platform FRUs
SCGs
CB
REs (under cover)
Air filter
SIBs
PEMs
T640 FRUs
Sample FRUs
The slide displays a populated T640 chassis and identifies several FRUs.

Identifying FRUs for Your Hardware
• A list of all available FRUs is available online at

www.juniper.net/techpubs
• Select product family
• Select device
• Select Maintenance tab
• Each FRU is listed under Replacing Components
[Screenshot: the MX480 3D Universal Edge Router page under Support > Technical Documentation > MX Series Routers > MX480, with tabs for Overview, Components, Planning, Safety, Installation, Maintenance, and Troubleshooting. The Maintenance tab lists Maintaining Components and Replacing Components.]
Available FRUs
Each device type has a unique list of available FRUs.
You can find a list of the FRUs available for your equipment by visiting www.juniper.net/techpubs.
Additional information on handling FRUs and all steps required to replace each FRU is also available.
Be sure to follow proper electrostatic discharge (ESD) procedures when working with hardware. Always store hardware in
appropriate ESD packaging. Failing to do so can damage hardware. Even though the damage might not be immediately
evident, any static discharge can affect the integrity of the hardware and decrease its useful life.
Also be aware that some components can be very heavy. Be prepared for the weight and use appropriate equipment to avoid
injury to yourself or the equipment.

Identifying Your Installed Hardware

• Identify FRUs installed in your equipment using the
command show chassis hardware
lab@mxC-1> show chassis hardware
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                D4897             MX80
Midplane         REV 06   711-031594   YK8986            MX80
PEM 0            Rev 03   740-028288   UG00890           AC Power Entry Module
Routing Engine            BUILTIN      BUILTIN           Routing Engine
TFEB 0                    BUILTIN      BUILTIN           Forwarding Engine Processor
  QXM 0          REV 05   711-028408   YK6544            MPC QXM
FPC 0                     BUILTIN      BUILTIN           MPC BUILTIN
  MIC 0                   BUILTIN      BUILTIN           4x 10GE XFP
    PIC 0                 BUILTIN      BUILTIN           4x 10GE XFP
Identify FRUs Installed in Your Hardware

You can identify the FRUs installed in your equipment by using the show chassis hardware command. (Remember, you
can add the pipe command | no-more to get complete output without pausing for each screen.)
The following describes the components identified in the displayed output:
Chassis-The chassis description identifies the type of hardware from which the output is obtained. Recall that
the chassis serial number is used to identify the device itself and is used to track service contract information.
This service contract information is often referred to as entitlement.
Midplane-The midplane description also identifies the type of hardware from which the output is obtained.
Additional information includes the part number and revision of the chassis. Recall that the midplane serial
number would be used if the entire chassis were sent in for an RMA. Remember, all FRUs should be removed
from a chassis before it is sent in for an RMA.
PEM-The power entry module (PEM) is a serialized part. Many devices use multiple PEMs. The
description specifies whether it is an AC or DC power entry module. The two cannot be combined within a
chassis.

Identify FRUs Installed in Your Hardware (contd.)

Routing Engine-The RE might be built-in, as it is in this output from an MX80, or it might be a removable FRU. In
this instance, if the RE were to experience a hardware failure, the entire chassis would need to be replaced as
an RMA. Remember, the midplane serial number would be used when creating the RMA. Also, be aware that
several devices use multiple REs. The serial number and revision number would be included for each
removable RE.
TFEB-The switching board used in the MX80 is called a Forwarding Engine Board (FEB). In the output
displayed, TFEB indicates the use of the Trio chipset FEB. Like the RE, the TFEB is built-in. If the TFEB were to
experience a hardware failure, the entire chassis would need to be replaced as an RMA.
FPC-The term FPC is displayed as an installation position identifier and represents the legacy identification
native to the Junos OS. The description identifies the component in this position as a Modular Port Concentrator
(MPC). As displayed in this output, the MPC is built-in. If the MPC were to experience a hardware failure, the
entire chassis would need to be replaced as an RMA.
MIC-The output displayed in the slide indicates there is a built-in MIC. This output differs from the earlier output
that showed a MIC installed in FPC 1. In the earlier output, the MIC had its own serial number and part number.
In this instance, if the MIC were to experience a hardware failure, the entire chassis would need to be replaced
as an RMA. Also notice the legacy identification of PIC. The term PIC will be used within several troubleshooting
commands.
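The distinction between built-in and individually serialized components can also be pulled out of the inventory with a short script. The following Python sketch is a hedged illustration, not a Juniper tool. The helper names `parse_inventory` and `is_builtin` are hypothetical. The sample rows are a subset of the MX80 output shown on the slide, and the column widths are an assumption, so real output might need different offsets.

```python
# Column order taken from the header row of the output shown on the slide.
COLUMNS = ["Item", "Version", "Part number", "Serial number", "Description"]

# Assumed fixed column widths; real output may use different spacing.
ROW_FORMAT = "{:<17}{:<9}{:<13}{:<18}{}"

# Rows copied from the MX80 output on the slide (a subset).
SAMPLE_ROWS = [
    ("Chassis", "", "", "D4897", "MX80"),
    ("Midplane", "REV 06", "711-031594", "YK8986", "MX80"),
    ("PEM 0", "Rev 03", "740-028288", "UG00890", "AC Power Entry Module"),
    ("Routing Engine", "", "BUILTIN", "BUILTIN", "Routing Engine"),
    ("FPC 0", "", "BUILTIN", "BUILTIN", "MPC BUILTIN"),
    ("  MIC 0", "", "BUILTIN", "BUILTIN", "4x 10GE XFP"),
]
SAMPLE = "\n".join(
    ["Hardware inventory:", ROW_FORMAT.format(*COLUMNS)]
    + [ROW_FORMAT.format(*row) for row in SAMPLE_ROWS]
)


def parse_inventory(text):
    """Slice each row at the offsets where the header names begin."""
    lines = text.splitlines()
    header_index = next(i for i, line in enumerate(lines) if line.startswith("Item"))
    header = lines[header_index]
    starts = [header.index(name) for name in COLUMNS]
    ends = starts[1:] + [None]
    rows = []
    for line in lines[header_index + 1:]:
        if line.strip():
            fields = [line[s:e].strip() for s, e in zip(starts, ends)]
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


def is_builtin(row):
    """Built-in parts report BUILTIN in place of a serial number."""
    return row["Serial number"] == "BUILTIN"


rows = parse_inventory(SAMPLE)
builtin = [row["Item"] for row in rows if is_builtin(row)]
serialized = [row["Item"] for row in rows if row["Serial number"] and not is_builtin(row)]
print("Built-in (chassis replacement on failure):", builtin)
print("Individually serialized:", serialized)
```

For this sample, the Routing Engine, FPC 0, and MIC 0 are reported as built-in, while the chassis, midplane, and PEM 0 carry their own serial numbers. That is the same conclusion reached by reading the output by hand.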

Working with FRUs (1 of 2)
• Working with FRUs:

• Offline or online procedures must be followed
• Use buttons on the hardware
- or -
• Use the CLI or GUI
• Hot-swappable
• Can remove or replace FRUs without powering down the device or
disrupting overall forwarding functions
• Also called hot-insertable or hot-removable
• Hot-pluggable
• Can remove or replace FRUs without powering down the device;
however, the associated forwarding functions are interrupted when
the component is removed
Working with FRUs

Because of the modular architecture of the Junos OS and the associated hardware, many FRUs can be removed or added
with minimal or no operational downtime. Be sure to follow the instructions provided at www.juniper.net/techpubs for adding
or removing each FRU.
FRUs are distinguished by whether they can be removed and installed without causing system disruption. Generally
speaking, the three types of FRUs are the following:
Hot-swappable FRUs-You can remove and replace these components without powering off the device or
disrupting global forwarding functions. We often refer to this type of FRU as being hot-insertable and
hot-removable.
Hot-pluggable FRUs-You can remove and replace these components without powering off the device, but you
interrupt the global forwarding functions of the system when you remove the component. Sometimes this
interruption is very minimal, but it does have the ability to impact frames traversing the device at that particular
moment.
FRUs that require power off-In rare cases, an FRU requires you to remove power from the chassis before
removing or inserting it.
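The three FRU types can be captured in a small lookup that a maintenance checklist might use. This Python sketch simply restates the definitions above. The names `FruType` and `removal_impact` are hypothetical, and the type of any real FRU must come from the hardware documentation for that device.

```python
from enum import Enum


class FruType(Enum):
    HOT_SWAPPABLE = "hot-swappable"
    HOT_PLUGGABLE = "hot-pluggable"
    REQUIRES_POWER_OFF = "requires power off"


def removal_impact(fru_type):
    """Return (power_off_required, forwarding_interrupted) for a FRU type."""
    if fru_type is FruType.HOT_SWAPPABLE:
        return (False, False)
    if fru_type is FruType.HOT_PLUGGABLE:
        return (False, True)
    return (True, True)


for fru_type in FruType:
    power_off, interrupted = removal_impact(fru_type)
    print(f"{fru_type.value}: power off={power_off}, forwarding interrupted={interrupted}")
```

Only hot-swappable FRUs can be replaced with neither a power-down nor a forwarding interruption, so replacement of the other two types should be scheduled in a maintenance window.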

Working with FRUs (2 of 2)
Hot-swappable:
• Air Filters
• Front and Rear Fan Trays
• Redundant Power Supplies
• Craft Interface
• Redundant Routing Engines
• Redundant Switching Boards (SIBs, CFEBs, SCBs)
• Line Cards (FPCs, DPCs, MPCs, IOCs)
• Interface Cards (PICs, MICs, PIMs)
Hot-pluggable:
• Nonredundant Routing Engines
• Nonredundant Control Boards
• Nonredundant Switching Boards
• A complete list of FRUs and handling and replacement

instructions for each device is available at
www.juniper.net/techpubs
Hot-Swappable and Hot-Pluggable

This slide identifies several FRUs that are hot-swappable and hot-pluggable.
Because FRUs and handling procedures can vary from device to device, visit www.juniper.net/techpubs for a complete list of
FRUs and handling instructions.


• The Junos OS
→ Junos Product Families
Junos Product Families


Overview of Devices Running the Junos OS

• Platforms running the Junos OS include switches, routers, and
security devices, and are suited for small to large networks in
both enterprise and service provider environments
• Multiservice routers
• Packet transport switches
• Ethernet services routers
• Universal access routers
• Mobile secure routers
• Ethernet switches
• Security services gateways
Junos-Based Devices-Meeting Network Needs
Juniper Networks has developed a wide range of platforms to meet your networking needs. The platform families listed
below all run the Junos OS:
Multiservice routers (T Series, M Series, and J Series);
Packet transport switches (PTX);
Ethernet services routers (MX Series);
Universal access routers (ACX Series);
Mobile secure routers (LN Series);
Ethernet switches (EX Series); and
Security services gateways (SRX Series).
We discuss each of these product families in more detail on the following pages.
Although the product list provided in this course was complete at the time of publication, note that we are constantly
releasing new hardware. It takes a constant effort to always be on the leading edge! For the most current hardware
information available, visit www.juniper.net/techpubs.

Multiservice Routing Platforms

• T Series core routers:
• T320, T640, T1600, T4000, TX Matrix, TX Matrix Plus
• M Series Multiservice Edge Routers:
• M7i, M10i, M40e, M120, and M320
• J Series Services Routers:
• J2320, J2350, J4350, and J6350
T Series Core Routers

Juniper Networks T Series core routers provide the highest possible forwarding performance density on the Internet today.
They offer a wide selection of high-speed and extremely high-speed interface options suited for service provider cores, while
maintaining feature richness and proven reliability.
M Series Multiservice Edge Routers

Juniper Networks M Series Multiservice Edge Routers uniquely combine best-in-class IP and MPLS capabilities with
unmatched reliability, stability, security, and service richness. These multiservice edge routers provide industry-leading port
density across a wide range of medium-speed to high-speed interface options and price points.
J Series Services Routers

Juniper Networks J Series Services routers offer predictable high performance and a variety of flexible interfaces that deliver
secure, reliable network connectivity that is cost effective for remote, branch, and regional offices, and for small businesses.

• T Series core routers:
• Up to 240 Gbps throughput per slot
• Up to 3.84 Tbps total throughput per chassis
• Wide range of interfaces
• T1 to 100 Gbps Ethernet
• ATM, SONET/SDH, Ethernet, Serialized
• Additional Services PICs
[Figure: T640, TX Matrix Plus, and TX Matrix chassis]

T Series core routers, which support the industry's first standards-based 100-GB interface, are ideal for infrastructure that
must scale to meet growing Internet traffic. T Series routers offer continuous operations for the core and provide the lowest
capacity-based power consumption in the industry.
T Series Interfaces
T Series routers provide a wide range of high speed interfaces for large networks and network applications, such as those
supported by Internet Service Providers (ISPs).

T Series-Data Plane Components

• PFE architecture
• L/M/N/R chipset - Type 3 and type 4 FPCs
• Trio chipset - Type 5 FPCs
• Multiple active PFE complexes available
• Data plane distribution
• Media-specific ASICs reside on PICs
• L/M/N/R chips or Trio chipset resides on FPCs
• Switch Fabric resides on SIBs
• Component-level redundancy can include:
• Routing Engines, Switch Interface Boards, SONET Clock
Generators, power supplies, cooling systems
T Series Architecture
T Series routers use multiple PFEs, each using an L/M/N/R chipset for the type 3 and type 4 FPCs and the Trio chipset for
the type 5 FPCs. These PFEs are tied together through the switch fabric. All of these components are tied together through a
midplane.
Data packets are transferred across the midplane from the PFE on the originating FPC to the Switch Interface Boards (SIBs),
and from the SIBs across the midplane to the PFE on the destination FPC.
T Series Data Plane

Each PIC includes media-specific ASICs. PICs are hot-removable and hot-insertable.
The L, M, N, and R chips or Trio chipset resides on each FPC. Each FPC also contains data memory that is managed by the
Queuing and Memory Interface ASICs. It might be necessary to determine which FPC is generating a specific chip error.
Switch Interface Boards (SIBs) create the switch fabric for the router.
T Series Redundancy
T Series routers are designed so that no single point of failure can cause the entire system to fail. The slide outlines several
options for redundancy.


• M Series routers:
• Up to 20 Gbps throughput per slot
• Up to 320 Gbps total throughput per chassis
• Wide range of interfaces
• T1 to 10 Gbps
• ATM, SONET/SDH, Ethernet, Serialized
• Additional Services PICs
• Models pictured: M7i, M10i, M40e, M120, M320

M Series routers can be deployed in various roles within your network. The M Series portfolio uniquely combines IP/MPLS
capabilities with service richness, stability, reliability, and security. The M Series routers allow service providers to
consolidate multiple networks on a single IP/MPLS infrastructure. You can deploy the M Series platforms as a multiservice
edge router, a small or a medium core router, a route reflector, or a peering device. They can also be deployed in multicast,
mobile, or data center applications.
The M Series portfolio ranges from 7 Gbps platforms to 320 Gbps platforms.
M Series Interfaces
M Series routers provide a wide range of high-speed interfaces for large networks. The slide lists the available interfaces.

M40e-Data Plane Components

• A/B/C chipset
• B chip resides on FPCs
• C and A chips reside on SFMs
• Routing Engines, Switching and Forwarding Modules, PFE
Clock Generators, power supplies, cooling systems
M40e Architecture
The M40e uses the A/B/C chipset.
M40e Data Plane

Each FPC contains the B chip and memory that becomes part of a shared memory pool.
The C chip and A chips reside on the Switching and Forwarding Module (SFM).
M40e Redundancy
The M40e provides multiple options for redundancy including redundant REs, SFMs, PFE Clock Generators (PCGs), power
supplies, and cooling systems.
When operating with two SFMs, one is active and the other acts as a hot standby.

M7i and M10i-Data Plane Components

• A/B/C within a single ASIC
• Built-in (non-removable) FPCs act as relay only
• A/B/C chip (single chip) resides on CFEB (or CFEB-E)
• Compact Forwarding Engine Board or Enhanced Compact
Forwarding Engine Board, power supplies
M7i and M10i Architecture

The M7i and M10i use an updated version of the A/B/C architecture where all ASIC functions are integrated into a single
chip. This chip resides on the Compact Forwarding Engine Board (CFEB) or an Enhanced Compact Forwarding Engine Board
(CFEB-E).
M7i and M10i Data Plane

FPCs are not removable and perform only a passive role in the data plane.
The A/B/C chip, which is combined within a single ASIC, resides on the CFEB or CFEB-E.
M7i and M10i Redundancy

The M7i and M10i provide options for redundant power supplies.
The M10i supports a backup CFEB or CFEB-E. The optional second CFEB or CFEB-E acts as a hot-standby and does not
participate with traffic forwarding while the primary CFEB or CFEB-E is operational.

M120 and M320-Data Plane Components

• I chip
• Multiple active PFE complexes available
• FPCs or CFPCs act as relay only
• I chip resides on FEBs
• Routing Engines, Forwarding Engine Boards, power supplies,
cooling systems
M120 and M320 Architecture

The M120 and M320 use the I chip architecture. The I chip resides on the FEB or a CFEB-E. You can install multiple FEBs or
CFEBs and more than one FEB or CFEB can be active at a time.
M120 and M320 Data Plane

FPCs and CFPCs perform only a passive role in the data plane.
The I chip resides on the FEB or CFEB. Multiple FEBs or CFEBs can be installed and active.
M120 and M320 Redundancy

A fully configured M120 or M320 is designed so that no single point of failure can cause the entire system to fail. Only a fully
configured router provides complete redundancy. All other configurations provide partial redundancy. The M120 and M320
offer redundancy for REs, FEBs, power supplies, and cooling systems.

J Series Services Routers
• J Series Services Routers:
• RTOS
• Security platforms
• Up to 3.5 Gbps firewall throughput per chassis
• Additional services available without additional hardware
• Interfaces:
• Ethernet, Serial, ISDN, DSL, T1/E1
• Models pictured: J2320, J2350, J4350, J6350
J Series: Balancing Cost, Performance, and Functionality

J Series routers offer a modular platform for enterprises and are used to securely connect small, branch, and regional offices
to a central site router across ISP networks.
To balance cost and performance, the Junos OS, all associated processes, and packet forwarding run on a microprocessor
under a real-time operating system (RTOS) rather than on individual ASICs. This microprocessor
provides Junos functionality, including the separation of control plane and data plane, but that separation is accomplished
with software processes rather than individual ASICs.
It also adds the capability for providing additional services without the need for additional hardware. Enhanced services
include MPLS, IP version 6 (IPv6), and security services such as stateful firewalls, Network Address Translation (NAT), and IP
Security (IPsec) tunneling.
The J Series service devices also support the WXC Integrated Services Module, which provides WAN acceleration.
J Series Interfaces
All J2320, J2350, J4350, and J6350 routers ship with four fixed 10/100/1000 Ethernet ports. You can add additional
modular LAN and WAN interfaces using Physical Interface Modules (PIMs).
J Series routers provide a large selection of connectivity options including T1 and E1, Serial, Fast Ethernet, Gigabit Ethernet,
DS3, E3, ISDN, ADSL2+, and G.SHDSL.

J Series RTOS PFE

• All PFE components are emulated within a single
processor using software
[Figure: J Series RTOS PFE. Labels: Junos OS (RE), UNIX socket, fwdd-unix (PFE), shared memory, frame in, frame out, rt threads.]
RTOS
Both the RE and the PFE functions are accomplished using software processes running a real-time operating system.
The RTOS is a virtual architecture where CPU and memory resources are dynamically allocated to processes and real-time
threads on an as-needed basis. This virtual architecture allows available resources to be used in the most efficient manner,
adjusting as necessary.
The fwdd process emulates the control board of hardware-based devices.

PTX Series Packet Transport Switches

• PTX Series
• Designed to scale beyond 2 Tbps throughput per slot
• Capable of 600 Mpps per slot
• Up to 16 Tbps total throughput per chassis
• Massive amounts of Ethernet interfaces
• 384 x 10 Gbps Ethernet interfaces
• 32 x 40 Gbps Ethernet interfaces
• 32 x 100 Gbps Ethernet interfaces
• Model pictured: PTX5000
PTX Series
Juniper Networks PTX Series Packet Transport Switches are designed for the converged supercore. The system is the first
supercore packet switch in the industry, and delivers powerful capabilities based on innovative silicon and forwarding
architecture that is focused on optimizing MPLS and Ethernet. PTX Series Packet Transport Switches deliver several critical
core functionalities and capabilities, including game changing density and scalability, cost optimization, high availability and
network simplification. They can readily adapt to today's rapidly changing traffic patterns for video, mobility and cloud-based
services.
PTX Series Packet Transport Switches are based on Juniper's patented Express chipset. Express uses state-of-the-art 40nm
fabrication technology and is built with a no packet drop assurance. The PTX Series is designed to scale up to 2 Tbps and
600 Mpps per slot and provide significant cost reduction over traditional core transport solutions.
PTX Series provides a unique combination of hardware and software features that allow service providers to manage their
supercore network more efficiently because the platforms are built from ground up for speed, scale and cost optimization.
They are the first supercore packet switches in the industry, and support a single chassis with 8 and 16 Tbps capacity. The
modular power design allows power efficiency on the order of 1 watt per Gbps per line-rate port.
That's a Lot of Interfaces!

The PTX5000 router supports up to 384 x 10 Gbps, 32 x 40 Gbps, or 32 x 100 Gbps Ethernet interfaces.
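The port counts above can be turned into aggregate interface bandwidth with simple multiplication. The following is a minimal Python sketch of that arithmetic, using only the port counts and speeds quoted above; the names PORT_OPTIONS and aggregate_gbps are illustrative and not part of any Juniper tool. The totals are one-direction sums of port speeds, and the sketch makes no claim about how they relate to the chassis capacity figures.

```python
# Port counts and speeds are the figures quoted in the text for the PTX5000.
# The dictionary and function names are illustrative only.
PORT_OPTIONS = {
    "10 Gbps Ethernet": (384, 10),
    "40 Gbps Ethernet": (32, 40),
    "100 Gbps Ethernet": (32, 100),
}


def aggregate_gbps(port_count, speed_gbps):
    """Return the summed speed, in Gbps, of port_count ports of speed_gbps each."""
    return port_count * speed_gbps


for name, (count, speed) in PORT_OPTIONS.items():
    total = aggregate_gbps(count, speed)
    print(f"{count} x {name}: {total} Gbps ({total / 1000:.2f} Tbps)")
```

With these figures, the 10 Gbps option yields the largest sum (3.84 Tbps), followed by the 100 Gbps option (3.2 Tbps).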

PTX Series - Data Plane Components

• Express chipset
• Express chipset resides on FPCs
• Switch fabric resides on SIBs
• Routing Engines, Switch Control Boards, power supplies,
cooling systems
PTX Series Architecture

PTX Series routers use multiple PFEs, each using the Express chipset. These PFEs are tied together through the switch
fabric. All of these components are tied together through a midplane.
Data packets are transferred across the midplane from the PFE on the originating FPC to the SIBs, and from the SIBs across
the midplane to the PFE on the destination FPC.
PTX Series Data Plane

The Express chipset resides on each FPC. It might be necessary to determine which FPC is generating a specific chip error.
SIBs create the switch fabric for the router.
PTX Series Redundancy

PTX Series routers are designed so that no single point of failure can cause the entire system to fail. The slide outlines
several options for redundancy.

MX Series 3D Universal Edge Routers

• MX Series routers:
• Up to 2 Tbps throughput per slot
• Up to 80 Tbps of system scaling capacity
• Interfaces
• Ethernet and SONET
• Multiservices DPC
• Models pictured: MX5, MX10, MX80, MX240, MX480, MX960, MX2010, MX2020
MX Series 3D Universal Edge Routers

Juniper Networks MX Series 3D Universal Edge Routers provide Ethernet switching capabilities without sacrificing
carrier-class routing features customers expect. MX Series routers surpass the requirements of carrier-grade Ethernet
switches as defined by the Metro Ethernet Forum, leveraging the MPLS capabilities that have made Juniper Networks routers
the platforms of choice for service providers seeking maximum performance, availability, and service agility. By extending the
carrier-class routing functionality of the Junos OS to include LAN switching functionality to facilitate migration and growth,
Juniper Networks brings its traditional advantages to Ethernet aggregation. These advantages include high-performance
routing capabilities such as nonstop active routing (NSR), MPLS, fast reroute, and unified in-service software upgrade.
Furthermore, the router's Ethernet switching separates Layer 2 and Layer 3 forwarding with the intelligence to bridge when
possible and route when needed.
MX Series Interfaces
MX Series routers support Dense Port Concentrator (DPC) interface cards, offering enhanced queuing capabilities, QoS, L2
switching, and L3 routing services. MPCs can contain Ethernet or SONET/SDH based interfaces.
Currently the Multiservices DPC supports the following Layer 3 services: stateful firewall, NAT, intrusion detection service
(IDS), IPsec, active flow monitoring, real-time performance monitoring (RPM), and generic routing encapsulation (GRE)
tunnels (including GRE key and fragmentation).

MX Series-Data Plane Components
• Trio chipset
• Media-specific ASICs reside on PICs, MICs, and DPCs
• Media-specific functionality is included in the Trio chipset on MPCs
• Trio chipset resides on MPCs
• I chip resides on FPCs and DPCs
• Switch fabric resides on SCBs
• Routing Engines, Switch Control Boards, power supplies,
cooling systems
MX Series Architecture
MPCs use the new Trio chipset for even greater performance and scalability. The I chip resides on FPCs and DPCs.
MX Series Data Plane

In the case of FPCs, media-specific ASICs reside on the installed PICs. In the case of DPCs, media-specific ASICs are part of
the DPC along with the I chip. On MPCs, the media-specific elements are integrated into the Trio chipset and reside on the
MPC.
MX Series routers use DPCs, FPCs, and MPCs. DPCs have a fixed interface architecture and blend the interface and line card
into a single piece of hardware. FPCs allow flexibility in interface options by housing PICs. MPCs also allow flexibility in
interface options by housing MICs.
MX Series Redundancy
A fully configured MX Series router is designed so that no single point of failure can cause the entire system to fail. Only a
fully configured router provides complete redundancy. All other configurations provide partial redundancy. The MX Series
platforms offer redundancy for Routing Engines, Switch Control Boards, power supplies, and cooling systems.

ACX Series Universal Access Routers
• ACX Series routers:
• Access Network
• Mobile backhaul
• Up to 60 Gbps of throughput
• 1 U to 2.5 U form factor
• Interfaces
• TDM
• Gigabit copper and SFP
• 10 Gigabit Ethernet
• Models pictured: ACX1000, ACX1100, ACX2000, ACX2100
ACX Series routers

Juniper Networks ACX Series Universal Access Routers include the fixed configuration ACX1000, ACX1100, ACX2000, and
ACX2100 Universal Access Routers in a compact 1 U form factor. These are environmentally hardened and support passive
cooling for easy deployments in outside street cabinets or environmental enclosures. The ACX4000 Universal Access Router
is a modular 2.5 U form factor with higher performance and configurable options for interface types. Powered by the Junos
OS, the ACX Series delivers industry-leading performance and simplified end-to-end provisioning with support for full
IP/MPLS with traffic engineering, and extensive Layer 2 and Layer 3 functionality.
ACX Series Universal Access Routers cost-effectively address current operator challenges to rapidly deploy new, high
bandwidth services. With industry leading performance of up to 60 Gbps for all models and the most comprehensive,
traditional, and packet timing features, the ACX Series is well positioned to address the growing bandwidth needs in the
access network. These platforms deliver the necessary scale and performance needed to support multi-generation services.
With support for extensive hardware and software features, the ACX Series extends the operational intelligence all the way to
the access network to deliver seamless end-to-end services.
ACX Interfaces
Equipped with interfaces for both time-division multiplexing (TDM) and Ethernet (1 Gbps and 10 Gbps interfaces), as well as
support for high precision clocking and synchronization, the ACX Series platforms can support the mobile network's
evolution path from 2G and 2.5G to 3G, 4G, and Long Term Evolution (LTE).

ACX Series Data Plane Components
• Data plane components

• Single board router
• Built-in RE and PFE
• Single PFE handles ingress and egress packet forwarding
• RE provides
• Layer 3 routing services and network management
• PFE performs
• Layer 2 and Layer 3 packet switching
• Route lookups
• Packet forwarding
ACX Data Plane Components

The ACX Series router is a single-board router with a built-in routing engine and one PFE. Because there is no switching
fabric, the single PFE takes care of both ingress and egress packet forwarding:
RE-Provides Layer 3 routing services and network management.
PFE-Performs Layer 2 and Layer 3 packet switching, route lookups, and packet forwarding.

LN Series Mobile Secure Routers
• LN Series routers
• High performance firewall and IDS
• IPsec features
• Favorable SWAP characteristics
• Designed for network access
• Military
• First responder
• Transportation vehicles
• Model pictured: LN1000
LN Series Routers
The Juniper Networks LN1000 Mobile Secure Router is an edge access router that delivers a high-performance routing
firewall and IDS. Packaged in the standard 4 x 6 x .85 inches VPX form factor, it consumes 35 watts of power or less and
weighs less than 1.5 lbs. The Space, Weight, and Power (SWAP) characteristics of the LN1000 make it ideal for customers
who require a secure and rugged network access router with a small footprint in a transportable package. The LN1000
provides the power of Juniper's hardware and Junos OS routing functionality across its 8 x 1 Gbps Ethernet interfaces.
The LN1000 addresses the growing demand for a network access presence in military, first responder and transportation
vehicles, mining and exploration equipment, unmanned aircraft, and power grids. Until now, many of these networks were
forced to leverage traditional routing and security boxes that were designed for equipment rack installations requiring forced
air or fans for cooling. These designs did not consider the SWAP requirements of mobile secure networks. These mobile, and
in some instances remote network endpoints, have a unique set of requirements that only the LN1000 can provide in a VPX
form factor.

LN Series Data Plane Components
• Data plane components

• Router backplane
• Install in any VITA 46.0-compliant chassis
• Optionally, install in VITA 46.0-compliant chassis with a midplane
and an LN1000-V rear transition module
• Interfaces
• 8 - 1 Gbps Ethernet interfaces
• IPMI interface
LN Series Data Plane Components

An external interface, located on the back of the LN1000-V router, connects the router to the VITA 46.0-compliant chassis.
The router's P0, P1, and P2 connectors plugging into the backplane are VITA 46.0-compatible for a 3U peripheral slot with
specific key definitions. The P0 and P2 connectors are keyed per the VITA 46.12 specification. Power to the LN1000-V router
is provided through the P0 connector.
The LN1000-V router supports up to eight ports of gigabit Ethernet traffic with up to 1024 logical interfaces. The router
supports most Layer 2 and Layer 3 protocols, route redistribution, tunneling, multicast, routine quality of service (QoS), and
security.
The LN1000-V router supports the Intelligent Platform Management Interface (IPMI) in accordance with the VITA 46.0
specification. The IPMI controller on the LN1000-V router is a secondary controller while the IPMI Shelf Manager operates as
the primary controller. The IPMI Shelf Manager is not supplied by default; it is available as an option.

EX Series Ethernet Switches

• EX Series switches:
• Up to 320 Gbps (full duplex) throughput per line card
• Up to 12.4 Tbps (full duplex) total throughput per chassis
• Up to 160,000 media access control (MAC) addresses
• Models pictured: EX2200-C, EX2200-24poe, EX2500, EX3200-48p, EX4200-48p, EX4500, EX8208
EX Series Ethernet Switches

Juniper Networks EX Series Ethernet Switches offer flexible, powerful, and modular platforms that deliver performance,
scalability, and high availability. You can deploy these products as a network access layer, as campus aggregation devices
(within high-density data centers), or as core switches.

EX Series-Data Plane Components

• Data plane components vary by model
• Refer to online documentation for model-specific
information
• Interfaces
• Ethernet 10/100/1000
• 10 Gbps uplink
• Redundancy options include:
• Virtual Chassis on EX4200
• SRE/RE on 8208 and 8216
• SF modules on 8208 and 8216
EX Series Architecture
The EX8200 line is a modular, midplane-architecture Ethernet switch designed for ultra high-density environments such
as campus aggregation, data center, or high performance core switching environments. Switch Routing Engines (SREs)
process all Layer 2 and Layer 3 protocols and manage individual chassis components, while the switch fabric module
provides the central crossbar matrix through which all data traffic passes. The SRE and switch fabric modules work together
to fulfill all RE and switch fabric functions.
Because each model uses different components to accomplish the switching and routing functions, visit
www.juniper.net/techpubs for detailed information about your specific hardware.
EX Series Interfaces
The line cards in EX8200 line switches combine a PFE and Ethernet interfaces onto a single card. All line cards are
hot-insertable and hot-removable.
EX Series Redundancy
Several different redundancy options exist for different switches. Visit www.juniper.net/techpubs for detailed information
about your specific hardware.

SRX Series Services Gateway
• Security platforms:
• Range from 700 Mbps to 150 Gbps firewall throughput
• Range from 65 Mbps to 30 Gbps IPS throughput
• Interfaces - Ethernet, Serial, DSL, T1/E1
• Branch devices pictured: SRX100, SRX110, SRX210, SRX240, SRX650
• Data center devices pictured: SRX5800
SRX Series Services Gateway

Juniper Networks SRX Series Services Gateways are the next-generation solution for securing the ever increasing network
infrastructure and application requirements for both enterprise and service providers. SRX Series devices meet the network
and security requirements of data center consolidation, managed services deployments, and aggregation of security
solutions.
The SRX Series gateways enable secure deployment of a wide range of business and residential applications and services
ranging from small to large enterprises, at service provider premises and within data centers. The gateways offer native
support for firewalls, virtual private networks (VPNs), switching and carrier-class Ethernet routing, and IDSs.
SRX Series Interfaces

SRX Series gateways offer a wide range of interfaces for WAN connectivity. The available options are listed on the slide.
The SRX210 Power over Ethernet (PoE) feature simplifies IP phone, camera, and wireless support by delivering power to
those devices without any need for external power.

SRX Data Center Devices-Data Plane Components

• I chip
• Media-specific ASICs reside on IOCs
• I chip resides on IOCs, NPCs, and SPCs
• Switch fabric resides on SCBs
• Switch Control Boards, power supplies, cooling systems
• Additional redundancy available through high-availability
clustering
Data Center SRX Series Architecture

These high-end SRX Series devices use the I chip architecture.
The I chip is present on IOCs, NPCs, and SPCs.
On the SRX5800 and SRX5600, the functions of the IOC and NPC are combined onto a single line card.
Data Center SRX Series Data Plane

Media-specific ASICs reside on IOCs along with 2 or 4 I chips. Two Network Processing Units (NPUs) reside on the NPC on the
SRX3600 and SRX3400. Four NPUs are present on the SRX5800 and SRX5600 IOCs. NPUs are responsible for keeping
track of sessions. Each high-end SRX Series chassis must also have a minimum of one SPC card to function. In addition to 2
or 4 I chips, each SPC also contains 2 or 4 Service Processing Units (SPUs).
Data Center SRX Series Redundancy

In addition to redundant components such as SCBs, power supplies, and cooling systems, SRX Series platforms support
complete redundancy by grouping two like-devices into a cluster. The two nodes back each other up providing complete
redundancy for all RE functions and also data plane functions allowing stateful session failover.

SRX Branch Devices-Data Plane Components

• RTOS
• Media-specific ASICs reside on PIMs
• PFE components are emulated within a single processor
using software
• SRX650 and SRX550 use a Services and Routing Engine
• Component-level redundancy
• Redundancy available through high availability clustering
SRX Series Branch Device Architecture

Both the RE and the PFE functions are accomplished using software processes running a real-time operating system.
The SRX650 and SRX550 use a Services and Routing Engine (SRE) for increased performance.
SRX Series Device PFE

Media-specific ASICs reside on PIMs.
All other data plane functionality is handled by individual processes within the RTOS.
SRX Series Device Redundancy

SRX Series platforms support complete redundancy by grouping two like-devices into a cluster. The two nodes back each
other up providing complete redundancy for all RE functions and also data plane functions allowing stateful session failover.

Summary
• Described the architectural philosophy of devices that run
the Junos OS and learned how this philosophy relates to
troubleshooting
• Described traffic processing for transit and exception traffic
• Described the function of the RE and the PFE within a device
running the Junos OS, along with the components of each
• Described FRUs
• Described current Junos product families and learned where
to go for detailed information about specific hardware
We Discussed:
The basic design architecture of devices that run the Junos OS;
Traffic processing for transit and exception traffic;
The major components of the RE and the PFE;
FRUs; and
Junos product families.

Review Questions
1. What is a FRU?
2. Where can you find a list of FRUs for a specific
Junos-based device?
3. What is the difference between hot-swappable and
hot-pluggable?
4. Why is it important to understand which
implementation of the PFE is implemented on a
particular device?

Identifying Hardware Components Lab
• Perform initial hardware identification.

• Use online resources to find hardware information.
Identifying Hardware Components Lab

The slide provides the objectives for this lab.


1. A field-replaceable unit (FRU) is a component of a Junos-based device that can be added, removed, or replaced. It is the smallest unit
required for isolation of hardware problems.
2. Detailed information for all hardware running the Junos OS can be obtained at www.juniper.net/techpubs.
3. Hot-swappable and hot-pluggable FRUs can both be added or removed without powering down the device. However, inserting or
removing hot-swappable FRUs (also referred to as hot-insertable or hot-removable FRUs) will not disrupt the global forwarding
function of the device - only services dependent on the FRU will be impacted. In contrast, inserting or removing hot-pluggable FRUs
will impact the global forwarding function of the device, even if only momentarily.
4. It can be beneficial to understand which version of the PFE is in use and where the individual subcomponents reside when interpreting
chip-specific messages or, as you will learn later, accessing the microkernel for additional troubleshooting information.
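One way to keep the chipset locations from this chapter at hand when interpreting chip-specific messages is a small lookup table. The following Python sketch simply transcribes the chipset and fabric locations stated in this chapter; the structure, the key names, and the where_to_look function are illustrative assumptions, and the table is not a complete hardware reference.

```python
# Chipset and fabric locations as stated in this chapter.
# The dictionary layout and function name are illustrative only.
PFE_LOCATIONS = {
    "T Series": {
        "chipset": "L/M/N/R (type 3 and type 4 FPCs) or Trio (type 5 FPCs)",
        "resides_on": "FPCs",
        "fabric": "SIBs",
    },
    "M40e": {
        "chipset": "A/B/C",
        "resides_on": "FPCs (B chip) and SFMs (A and C chips)",
        "fabric": None,
    },
    "M7i and M10i": {
        "chipset": "A/B/C within a single ASIC",
        "resides_on": "CFEB or CFEB-E",
        "fabric": None,
    },
    "PTX Series": {
        "chipset": "Express",
        "resides_on": "FPCs",
        "fabric": "SIBs",
    },
    "MX Series": {
        "chipset": "Trio (MPCs) or I chip (FPCs and DPCs)",
        "resides_on": "MPCs, FPCs, and DPCs",
        "fabric": "SCBs",
    },
    "SRX data center devices": {
        "chipset": "I chip",
        "resides_on": "IOCs, NPCs, and SPCs",
        "fabric": "SCBs",
    },
    "J Series": {
        "chipset": "none (PFE emulated in software under the RTOS)",
        "resides_on": "single processor",
        "fabric": None,
    },
}


def where_to_look(platform):
    """Return a one-line summary of where the PFE chipset resides for a platform."""
    entry = PFE_LOCATIONS[platform]
    summary = f"{platform}: {entry['chipset']} on {entry['resides_on']}"
    if entry["fabric"]:
        summary += f"; switch fabric on {entry['fabric']}"
    return summary


print(where_to_look("PTX Series"))
print(where_to_look("M40e"))
```

For model-specific detail, the hardware documentation at www.juniper.net/techpubs remains the authoritative source.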

Chapter 4: Troubleshooting Toolkit

Objectives
After completing this chapter, you will be able to:
• Describe various tools that can be used to troubleshoot
devices that run the Junos operating system
• Explain JTAC recommendations for current best-practices
that facilitate troubleshooting
We Will Discuss:
Various troubleshooting tools supported by the Junos operating system; and
Juniper Networks Technical Assistance Center (JTAC) recommended configuration settings for ease of
troubleshooting.

Agenda: Troubleshooting Toolkit
→ Troubleshooting Tools
• Best-Practices
Troubleshooting Tools

The Junos CLI

• The Junos CLI:
• A variety of operational mode commands report on
hardware, software, and protocol status
• Process restart and hardware online or offline
• HW redundancy control
• Network utilities
• Ping and traceroute utilities with a rich set of options
• Telnet, SSH, FTP, and SCP
• Monitor traffic (tcpdump)
The Junos CLI

The Junos command-line interface (CLI) is the primary mechanism for troubleshooting and operational analysis. Using the
CLI, it is easy to determine hardware, software, protocol, and general operational status. The following are some key CLI
features:
Support for piped output to functions like count or match for all commands and in all modes (configuration or
operational mode);
The ability to restart software processes and take hardware online or offline;
The ability to control redundant hardware; and
Access to various network utilities like ping and traceroute, and the ability to monitor local traffic in a manner
similar to tcpdump.
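The pipe functions mentioned above behave like simple text filters applied to command output. The following Python sketch imitates the idea of match and count on a few lines of text; it is an illustration only, not the Junos implementation. The sample lines are taken from the restart completion list shown later in this chapter, the function names are hypothetical, and treating the match pattern as a case-insensitive regular expression is an assumption made for this sketch.

```python
import re

# Sample text: a few entries from the restart completion list in this chapter.
SAMPLE = """\
adaptive-services    Adaptive services process
audit-process        Audit process
chassis-control      Chassis control process
remote-operations    Remote operations process
routing              Routing protocol process
""".splitlines()


def match(lines, pattern):
    """Keep only the lines in which the pattern is found (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if rx.search(line)]


def count(lines):
    """Return the number of lines, as a count filter would report."""
    return len(lines)


# Filters can be chained, in the same way pipes are chained at the CLI.
print(match(SAMPLE, "rout"))
print(count(match(SAMPLE, "process")))
```

Chaining the two functions mirrors the common habit of narrowing output with a match filter first and then counting what remains.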

Key Operational Mode Commands

• Key operational mode commands include:
• show chassis
• alarms, environment, firmware, fpc, hardware...
• show system
• alarms, statistics, storage, users...
• show route
• protocol, aspath-regex, community, hidden,
resolution, receive-protocol, detail...
• monitor traffic
• show interfaces
• terse, detail, filters, policers...
• monitor interface
Key Operational Mode Commands

Depending on the type of problem with which you are dealing, numerous Junos CLI commands might exist that can assist
you in problem determination. The slide calls out the main classes of operational mode commands that prove particularly
useful in most troubleshooting situations:
The various show chassis commands are well suited to assisting you in performing operational and fault
analysis of hardware-related issues.
The family of show system commands is useful in detecting the configuration and operational status of system
protocols and users.
The show route commands are invaluable when testing the control plane to determine what routes are
present, from where the router learned of them, and where they direct matching traffic.
The monitor traffic command makes tcpdump protocol analysis capabilities for local traffic available to
the user.
The show interfaces commands are useful when your focus is on physical or link-level operational
analysis, and when you suspect interface hardware-related faults.
The monitor interface command provides detailed, real-time snapshots of the traffic patterns, error
counts, and alarm status for the monitored interface.

Restarting a Software Process

• You can restart most software processes from the CLI
• Restarting other processes requires escape to a shell
user@router> restart ?
Possible completions:
adaptive-services Adaptive services process
ancpd-service Access Node Control Protocol Process
audit-process Audit process
auto-configuration Interface Auto-configuration
captive-portal-content-delivery Captive-portal-content-delivery process
chassis-control Chassis control process
redundancy-interface-process Redundancy interface management process

remote-operations Remote operations process
routing Routing protocol process
user@router> restart routing

Routing protocols process started, pid 5042
Restarting Software Processes

You can restart most Junos OS processes from the CLI. This capability leverages the modular nature of the Junos OS and
avoids the need for a system reboot when a particular process encounters a problem.
Processes not listed in the CLI output, such as the init process (which is the meta-process that controls the starting of all
other processes), require you to escape to a shell to restart them. It is also necessary to escape to a shell to pass the process a
signal such as a kill -1 (SIGHUP). The kill -1 signal forces that process to reread its configuration file but does not
terminate the process.
When restarting a process, the default behavior is a soft kill, or graceful shutdown, in which the process receives a signal
that it should terminate but is given time to clean up its state first. In contrast, a hard kill is equivalent to issuing a kill -9
<pid>, in that it terminates the process immediately.
The init process restarts any process that has failed, so after killing a process, a new instance of that process starts.
However, if a process fails repeatedly in rapid succession, the ini t process disables it to prevent thrashing. Once init
disables a process, you must reboot, or force init to reread its configuration before it allows that process to restart. Issuing
a commit with the hidden ful.l. option passes the init process a SIGH UP that causes it to restart all configured
processes, regardless of previous thrashing behavior. However, if the process still thrashes, ini t disables it.
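The restart-and-disable behavior described above can be sketched as a small model. This is not how init is implemented; the class name, the failure threshold, and the time window are all assumptions chosen for illustration, because the text does not state the values init actually uses.

```python
from collections import deque


class Supervisor:
    """Toy model of a supervisor that restarts failed processes
    but disables one that fails repeatedly in rapid succession."""

    def __init__(self, max_failures=3, window_seconds=60.0):
        # Both thresholds are illustrative assumptions, not Junos values.
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.failures = {}      # process name -> deque of failure times
        self.disabled = set()

    def process_failed(self, name, now):
        """Record a failure at time `now`; return 'restarted' or 'disabled'."""
        if name in self.disabled:
            return "disabled"
        times = self.failures.setdefault(name, deque())
        times.append(now)
        # Forget failures that fall outside the sliding window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.max_failures:
            self.disabled.add(name)
            return "disabled"
        return "restarted"

    def reread_configuration(self):
        """Model of the SIGHUP path: re-enable every process and clear history."""
        self.disabled.clear()
        self.failures.clear()


sup = Supervisor()
print([sup.process_failed("rpd", t) for t in (0.0, 1.0, 2.0)])
sup.reread_configuration()          # stands in for a commit full
print(sup.process_failed("rpd", 3.0))
```

In this model three failures within the window disable the process, and only the reread step lets it start again, which mirrors the reboot-or-SIGHUP requirement in the text.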

Bouncing an rpd Component
• The rpd process handles all routing protocols

• Bouncing rpd with a restart routing command
disrupts all rpd processes
• Use deactivate to bounce a specific rpd component; the
example bounces BGP while leaving OSPF untouched:
[edit]
user@router# show protocols
bgp {
    group test {
        vpn-apply-export;
    }
}
ospf {
    area 0.0.0.0 {
        interface ge-0/3/0.0;
        interface at-0/1/0._00;
        interface so-0/2/0.0;
    }
}

[edit]
user@router# deactivate protocols bgp

[edit]
user@router# commit
commit complete

[edit]
user@router# rollback 1
load complete

[edit]
user@router# commit
commit complete
Bouncing a Component of rpd

Currently, the routing protocol daemon (rpd) process is responsible for handling all routing protocol functions. If you detect a
problem in the OSPF protocol, for example, then a restart routing command might resolve the issue. The problem is
that restarting routing affects all routing protocols, which includes BGP, Intermediate System-to-Intermediate System (IS-IS),
RIP, and so on.
When the goal is to minimize overall disruption (which it always is), you might consider the technique shown on the slide,
which involves deactivating a particular protocol, rather than restarting all routing functionality. The downside to this
approach is that configuration privileges are necessary.
The example on the slide shows the operation bouncing BGP by deactivating the bgp stanza and issuing a commit. During
the process, the OSPF protocol remains untouched and continues to operate as before. After the commit and a rollback
1, the user issued another commit that restored the bgp stanza to its previous (active) state. The BGP protocol now
initializes, just as if you had restarted the rpd process. Rather than using the rollback function, you can also issue an
activate protocols bgp command from the [edit] hierarchy, followed by a commit, to achieve the same results.

Full Commits
• Juniper Networks optimized the commit function
• Goal is to avoid disruption to processes not affected by a
configuration change
• The hidden full option affects all processes
• Forces reread of configuration, reactivating the entire
configuration
• An excellent way to restart a process that is disabled
because of thrashing
[edit]
user@router# commit full          (hidden option)
commit complete

[edit]
user@router#
Performing a Full Commit

The Junos OS optimizes the process of committing a candidate configuration so it does not disrupt processes when their
portion of the configuration has not changed. Although a great idea, a rare situation can exist in which a particular process
fails to wake up with a commit, and as a result, the modified configuration does not go into effect.
By including the hidden full option when issuing a commit, you force all processes to reread their configuration, which
ensures the honoring of changes. A commit full also signals the init process with a kill -1 (SIGHUP) that forces it to
reread its configuration.
Shaking It Up
Because a full commit places a processing strain on a router with a complex configuration, you should perform a full commit
only when conditions warrant.

Hardware Restart
• You can restart FPCs and PICs or bring them offline or
online using the CLI:
user@router> request chassis ?
Possible completions:
  cfeb                 Change Compact Forwarding Engine Board status
  pic                  Change Physical Interface Card status
  routing-engine       Change Routing Engine status
user@router> request chassis cfeb ?
Possible completions:
  master               Set CFEB mastership
  offline              Take CFEB offline
  online               Bring CFEB online
  restart              Restart CFEB
user@router> request chassis cfeb offline
CFEB Offlined
user@router> show chassis alarms
2 alarms currently active
Alarm time               Class  Description
2013-01-08 17:40:40 UTC  Major  CFEB not online, the box is not forwarding
2013-01-08 16:47:22 UTC  Minor  Host 0 Boot from alternate media
Hardware Restart
The slide shows how you can use the Junos CLI to take a Compact Forwarding Engine Board (CFEB) (in some models),
Flexible PIC Concentrator (FPC), or PIC offline and online. In some cases, you can clear problems by bouncing a piece of
hardware, which means taking the device offline and then bringing it back online again.
The commands shown on the slide have the same effect as if you depressed the CFEB offline button on the physical router
to bring it offline.

Network Utilities-Ping and Traceroute

user@router> ping ?
  <host>               Hostname or IP address of remote host
  atm                  Ping remote Asynchronous Transfer Mode node
  bypass-routing       Bypass routing table, use specified interface
  clns                 Ping ISO node
  count                Number of ping requests to send (1..2000000000 packets)
  detail               Display incoming interface of received packet
  do-not-fragment      Don't fragment echo request packets (IPv4)
  ethernet             Ping to an ethernet host by unicast mac address
  inet                 Force ping to IPv4 destination
  inet6                Force ping to IPv6 destination
  mac-address          MAC address of the nexthop in xx:xx:xx:xx:xx:xx format
  mpls                 Ping label-switched path
  no-resolve           Don't attempt to print addresses symbolically
  pattern              Hexadecimal fill pattern
  rapid                Send requests rapidly (default count of 5)
  record-route         Record and report packet's path (IPv4)
  routing-instance     Routing instance for ping attempt
  size                 Size of request packets (0..65468 bytes)
  source               Source address of echo request
  strict               Use strict source route option (IPv4)
+ strict-source        Intermediate strict source route entry (IPv4)
  tos                  IP type-of-service value (0..255)
Note: The options highlighted on the slide are particularly useful for fault isolation
Network Utilities-Ping and Traceroute

As you might expect, the Junos OS supports standard network utilities like ping and traceroute. As shown on the slide (for the
case of ping), these utilities support a rich set of options that can prove especially useful when troubleshooting. The
following are some of the key options:
atm: Generates special Asynchronous Transfer Mode (ATM) pings that use Operation, Administration, and
Maintenance (OAM) cells.
count: Limits the number of ping attempts.
do-not-fragment: Useful in diagnosing maximum transmission unit (MTU)-related problems by preventing
the fragmentation of large packets.
pattern: By altering the payload of ping packets, you can detect error conditions that are triggered by data
patterns.
record-route: Allows you to trace the set of egress interfaces the packet encounters. Note that this process
differs from traceroute, which displays the set of ingress interfaces.
routing-instance: Use this option to provide routing instance and virtual private network (VPN) context for
a ping (or similar) command. By default, a command is issued in the context of the main routing instance unless
you use this option.

Network Utilities-Ping And Traceroute (contd.)
size: By altering the size of packets, you can detect MTU-related and capacity-related problems.
source: This option lets you control the source address placed in the resulting packet. This capability can help
diagnose routing problems because you can make the packet appear to come from any address owned by the
device (spoofing is not permitted).
tos: This option lets you alter the type-of-service (ToS) bits in the packet when testing a class-of-service (CoS)
issue.
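The size and do-not-fragment options are usually combined to probe for MTU problems, and the arithmetic involved is easy to get wrong. The following Python sketch shows the calculation and a search for the largest packet that passes unfragmented. It assumes that the size value counts only the ICMP payload (so the 8-byte ICMP header and a 20-byte IPv4 header without options are added on top), and the probe callable is a hypothetical stand-in for actually sending a do-not-fragment ping and reporting whether a reply arrived.

```python
IPV4_HEADER_BYTES = 20   # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8    # ICMP echo header


def max_ping_size(mtu):
    """Largest ping size value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def find_path_mtu(probe, low=68, high=9192):
    """Binary-search the largest IP packet size for which probe(size) is True.

    probe is a hypothetical callable: it takes a total IP packet size and
    returns True when a do-not-fragment ping of that size gets a reply.
    The search assumes a packet of size `low` always gets through.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid          # mid fits; look for something larger
        else:
            high = mid - 1     # mid is too big; look lower
    return low


# Simulated path whose smallest link MTU is 1400 bytes.
path_mtu = find_path_mtu(lambda size: size <= 1400)
print(path_mtu, max_ping_size(path_mtu))
print(max_ping_size(1500))
```

With a standard 1500-byte MTU, the sketch gives 1472 as the largest size value that should succeed with do-not-fragment set. A larger value that fails only when do-not-fragment is set points to an MTU restriction along the path.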

Network Utilities-Telnet, SSH, and FTP

• Telnet, SSH, and FTP support
• You must enable the related service under the [edit
system services] hierarchy to support incoming
connections
user@router> telnet ?
  <host>               Hostname or address of remote host
  8bit                 Use 8-bit data path
  bypass-routing       Bypass routing table, use specified interface
  inet                 Force telnet to IPv4 destination
  inet6                Force telnet to IPv6 destination
  interface            Name of interface for outgoing traffic
  logical-system       Name of logical system
  no-resolve           Don't attempt to print addresses symbolically
  port                 Port number or service name on remote host
  routing-instance     Name of routing instance for telnet session
  source               Source address to use in telnet connection
Network Utilities-Telnet, SSH, and FTP

The Junos OS offers support for Telnet, SSH or secure copy (scp), and FTP. As with the ping and traceroute utilities, these
applications support options that are useful in troubleshooting. The following are some of the key options:
no-resolve: This option disables the normal reverse lookup performed on the host address specified in a
telnet command. Use this option when sessions take a long time to open because of the inability to perform the
reverse lookup.
port: The port option allows you to specify a destination port other than the default port normally associated
with that service.
routing-instance: This option supports VPN and routing instance context for applications like Telnet and
FTP. A classic use would be to establish a Telnet connection from a provider edge (PE) router to an attached
customer edge (CE) device, which, being part of a VPN, would reside in a specific routing table and instance.
source: As with ping, altering the source address used in a connection request might uncover problems with
routing that prevent connection establishment when sourcing traffic from the egress interface (the default).

Monitor Traffic
• The monitor traffic command provides CLI
access to the tcpdump utility
• Displays traffic only originating or terminating on the local
Routing Engine
user@router> monitor traffic interface se-1/0/0 detail
Address resolution is ON. Use <no-resolve> to avoid any reverse lookup delay.
Address resolution timeout is 4s.
Listening on se-1/0/0, capture size 1514 bytes
02:18:43.121184 In IP (tos 0xc0, ttl 1, id 21998, offset 0, flags [none],
    proto: OSPF (89), length: 68) 172.18.36.2 > 224.0.0.5: OSPFv2, Hello, length 48
    Router-ID 192.168.36.1, Backbone Area, Authentication Type: none (0)
    Options [External]
    Hello Timer 10s, Dead Timer 40s, Mask 255.255.255.252, Priority 128
    Neighbor List:
    192.168.24.1
02:18:46.280403 Out LCP, Echo-Request (0x09), id 177, length 10
    encoded length 8 (=Option(s) length 4)
    Magic-Num 0x92da0b79
Monitor Traffic
The monitor traffic command provides CLI-based access to the tcpdump utility. This command monitors only traffic
originating or terminating on the local Routing Engine. This capability is the best way to monitor and diagnose problems at
Layer 2 with the Junos OS because tracing, which is similar to debug on equipment from other vendors, does not function for
Layer 2 protocols. We cover tracing on subsequent pages that deal with system logging.
Note that protocol filtering functions (for example, matching on only User Datagram Protocol (UDP) traffic sent from a
specific port) are currently not supported for real-time monitoring because in real-time mode, the Layer 2 headers are
stripped at ingress, which prevents filtering on protocol types. As a workaround, you can write the monitored traffic to a file
using the hidden write-file and read-file options and then read the file with a tcpdump-capable application like
Wireshark.

The Craft Interface Display

• LED with LCD screen indicates system and hardware
status
• Can view remotely with show chassis craft-interface
command
user@router> show chassis craft-interface
Red alarm:    LED on, relay on          (Red alarm active)
Yellow alarm: LED off, relay off
Routing Engine OK LED: On
Routing Engine fail LED: Off
FPC status
FPCs    0
Green
Red
LCD screen: +--------------------+
            |Host                |
            |1 Alarm active      |
            |R: Supply A FAIL    |
            +--------------------+
The Craft Interface

The craft interface panel for systems that support the LCD status screen is an excellent troubleshooting and operational
analysis tool because it provides component and system alarm status in a manner that is easy to interpret. When working
remotely you can issue a show chassis craft-interface command to obtain an ASCII representation of the LEDs
and messages the craft interface displays.

Displaying a Message on the Craft Interface
• Can display a user-defined message on the craft
interface panel's LCD screen
• Useful for identifying the correct system when relying on
remote hands
• User message alternates with normal display for five
minutes
• Maximum of four lines with a 20-character limit per line
user@router> set chassis display message "M320 unit for RE swap"
message sent
user@router> show chassis craft-interface
Red alarm: LED off, relay off
Yellow alarm: LED off, relay off
LCD screen: +-------------------+
            |"M320 unit for RE  |
            |swap"              |
            +-------------------+
Displaying Messages on the LCD Screen

Displaying messages on the craft interface panel's LCD screen can be helpful when you want to identify a system or
communicate in some way with a person local to that device. By default, the custom user message alternates with the
normal LCD message display (system status messages that alternate every few seconds). Use the permanent option with
the set chassis display operational mode command to force only the display of the custom message.
Note that the custom message times out after five minutes, and the display returns to the default system status message
rotation. This command is applicable only to platforms that have an LCD screen.
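Because the LCD accepts at most four lines of 20 characters each, it can help to check a message before sending it to remote hands. The following Python sketch wraps a message at word boundaries and rejects one that does not fit. The function name is hypothetical, and how the router itself breaks lines is not documented here, so treat the wrapping as an approximation.

```python
import textwrap

LCD_LINES = 4      # maximum number of lines, per the slide
LCD_COLUMNS = 20   # maximum characters per line, per the slide


def fit_lcd_message(message):
    """Wrap message to the LCD width; raise ValueError if it needs too many lines."""
    lines = textwrap.wrap(message, width=LCD_COLUMNS)
    if len(lines) > LCD_LINES:
        raise ValueError(
            f"message needs {len(lines)} lines; the LCD shows only {LCD_LINES}"
        )
    return lines


for line in fit_lcd_message("M320 unit for RE swap"):
    print(line)
```

For the message used on the slide, the sketch produces two lines, "M320 unit for RE" and "swap", which agrees with the LCD output shown there.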

Syslog
• Syslog:
• Standard UNIX syslog configuration syntax
• Primary syslog file is /var/log/messages
• Most processes also write to individual log files
• Supports numerous facilities and severity levels
• The facility defines the class of log message, whereas the severity
level determines the level of logging detail
• Local and remote syslog support
• We recommend remote logging (and archiving) for troubleshooting
Syslog
Syslog operations use a UNIX syslog-style mechanism to record system-wide, high-level operations, such as interfaces going
up or down or users logging in to or out of the router. You configure these operations by using the syslog statement at the
[edit system] hierarchy level and the options statement at the [edit routing-options] hierarchy level.
The results of tracing and logging operations go in files the router stores in the /var/log directory. You use the show log
file-name command to display the contents of these files.

Process and Miscellaneous Log Files
• Key process and miscellaneous log files include:

• apsd: Automatic protection switching process
• bfdd: Bidirectional Forwarding Detection process
• chassisd: Chassis management process
• cosd: Class-of-service process
• dcd: Device control process
• eccd: Error checking and correction process
• sampled: Sampling process (cflowd)
• snmpd: SNMP process
• vrrpd: Virtual Router Redundancy Protocol process
Process and Miscellaneous Log Files

The primary system log file is the messages file. However, some of the processes that run under the Junos OS maintain
their own log files named after their respective process. No requirement exists to configure the router to keep these logs.
Note that in many cases, the software also writes the entries found in these logs to the messages file. Key process log files
include the following:
apsd: The Automatic Protection Switching (APS) process handles events related to SONET APS. View this log
when you are dealing with an APS issue.
bfdd: The Bidirectional Forwarding Detection (BFD) process provides rapid detection of failures in the data
plane to expedite routing protocol convergence.
chassisd: The chassisd process is responsible for monitoring and managing the hardware present in the
physical router chassis, including application-specific integrated circuits (ASICs), power supplies, fans, and
temperature sensors, as well as managing hot-swap events.
cosd: The CoS process monitors class-of-service events in the chassis.

Process and Miscellaneous Log Files (contd.)
dcd: The device control process communicates with the Packet Forwarding Engine (PFE) to track the status and
condition of the router's interfaces. The dcd configures interfaces on the basis of information in the
configuration file and the hardware present in the device. You can configure physical interfaces before the
hardware is present; likewise, a router can contain unconfigured FPCs and PICs. Check the dcd log for
interface-related entries when troubleshooting interface problems.
eccd: The error correction control process deals with memory errors. If you suspect bad or failing memory,
check this log.
mastership: The mastership log records events related to hardware redundancy.
mgd: The management process controls the CLI process. No log file associated with this process exists.
sampled: The sampling process handles tasks related to packet sampling. Check this log when
troubleshooting or monitoring a sampling configuration.
snmpd: The Simple Network Management Protocol (SNMP) process handles tasks related to SNMP. Check this
log when troubleshooting or monitoring SNMP. Note that wherever possible, the SNMP ifIndex values are
persistent across reboots or in the event of hardware additions and deletions that result from PIC or FPC
insertion and removal. This persistence is the default behavior and is achieved by storing SNMP indexes in the
/var/db/dcd.snmp_ix file.
vrrpd: The Virtual Router Redundancy Protocol (VRRP) process handles the activities related to this protocol.
Check this log when troubleshooting or monitoring VRRP.
The entries written to individual process log files also write into the main syslog file (messages). Generally speaking, you
begin by analyzing the messages file for signs of trouble. Once you identify trouble relating to a particular process, you can
parse or monitor the files of that process to reduce the amount of information you must go through.

Interpreting Syslog Messages

• Standard log entries consist of the following fields:
• Timestamp, platform name, software process name or PID,
a message code, and the message text:
Apr 29 09:43:08 host chassisd[2320]: CHASSISD_FRU_EVENT:
scb recv slot detach: FPC 1 detach
• Using explicit-priority alters the message format to
include a numeric priority value:
Apr 29 09:41:27 host chassisd[2320]: %DAEMON-5-CHASSISD_FRU_EVENT:
scb recv slot detach: FPC 1 detach
• Consult the System Log Messages Reference
documentation for details on log entries
• Use help syslog message-code for help in decoding
message codes
Interpreting System Log Entries

When using the standard syslog format, each log entry written to the messages file consists of the following fields:
timestamp: Time of logging the message.
name: The configured system name.
Process name or PID: The name of the process (or the process ID [PID] when a name is not available) that
generated the log entry.
message-code: A code that identifies the general nature and purpose of the message. In the example shown,
the message code is CHASSISD_FRU_EVENT.
message-text: Additional information related to the message code.
When you add the explicit-priority statement, the syslog message format alters to include a numeric priority value.
In this case, the value 0 is for the most significant and urgent messages (emergency), while 7 denotes debug-level messages.
Consult the System Log Messages Reference documentation for a full description of the various message codes and their
meanings or, better yet, use the CLI's help function to obtain this information.
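When you need to sift through many entries offline, the fields described above can be pulled apart with a short script. The following Python sketch parses the standard-format example from the slide and, separately, decodes a %FACILITY-severity-CODE token of the kind that explicit-priority adds. The regular expressions are assumptions derived only from the sample entries shown here and may not cover every message the router generates. The names for severities 1 through 6 follow the conventional syslog severity list.

```python
import re

# Conventional syslog severity names, indexed by numeric value (0..7).
SEVERITY_NAMES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "info", "debug",
]

# Standard format: timestamp, host name, process[pid], message code, text.
STANDARD = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<process>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s+"
    r"(?P<code>[A-Z][A-Z0-9_]+):\s+"
    r"(?P<text>.*)$"
)

# Token added by explicit-priority, for example %DAEMON-5-CHASSISD_FRU_EVENT.
PRIORITY_TOKEN = re.compile(
    r"%(?P<facility>[A-Z0-9]+)-(?P<severity>[0-7])-(?P<code>[A-Z][A-Z0-9_]+)"
)


def parse_entry(line):
    """Split a standard-format entry into its fields, or return None."""
    match = STANDARD.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    if entry["pid"] is not None:
        entry["pid"] = int(entry["pid"])
    return entry


def parse_priority(line):
    """Find and decode the explicit-priority token anywhere in the line."""
    match = PRIORITY_TOKEN.search(line)
    if match is None:
        return None
    severity = int(match.group("severity"))
    return {
        "facility": match.group("facility"),
        "severity": severity,
        "severity_name": SEVERITY_NAMES[severity],
        "code": match.group("code"),
    }


print(parse_entry(
    "Apr 29 09:43:08 host chassisd[2320]: CHASSISD_FRU_EVENT: "
    "scb recv slot detach: FPC 1 detach"
))
print(parse_priority("%DAEMON-5-CHASSISD_FRU_EVENT:"))
```

Searching for the priority token rather than anchoring it to a position keeps the sketch independent of exactly where the token sits in the line.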

Tracing
• Tracing decodes protocol packets and certain router
events:
• Some other vendors refer to tracing as debug
• Tracing operations include:
• Global routing behavior
• Router interfaces
• Protocol-specific information
Tracing Operations
Tracing operations allow you to monitor the operation of routing protocols by decoding the sent and received routing protocol
packets. In many ways, tracing is synonymous with the debug function on equipment made by other vendors. Note that
because of the design of hardware-based Juniper Networks platforms, you can enable reasonably detailed tracing in a
production network without negative impact on overall performance or packet forwarding.

Tracing Overview
• Tracing is the Junos OS equivalent of debug
• You can enable tracing on a production network
• Requires configuration
• Can trace multiple options (flags) to a single file
• Generic tracing configuration syntax:
[edit protocols protocol-name]
user@router# show
traceoptions {
    file filename [size size] [files number]
        [world-readable | no-world-readable];
    flag flag [flag-modifier] [disable];
}
• Slide callouts:
• protocol-name: The protocol or function being traced
• file: Where to write the trace results
• flag: Flags identify what aspects of the protocol the software traces
and at what level of detail
Hear Tracing and Think Debug

Tracing is the Junos OS term for what other vendors sometimes call debug. In most cases when you enable tracing (through
configuration), you create a trace file that stores decoded protocol information. You analyze these files using standard CLI
log file syntax like show log logfile-name. Because of the design of Juniper Networks routing platforms, you can
enable detailed tracing in a production network without significantly impacting performance. Even so, you should always
remember to turn tracing off once you complete your testing to avoid unnecessary resource consumption.
Generic Tracing Configuration

The slide shows a generic tracing stanza that, if applied to the [edit protocols] portion of the configuration hierarchy,
would result in tracing of the specified routing protocol's events. Specified routing protocol tracing operations track the
flagged routing operations and record them in the specified log file.

Generic Tracing Configuration (contd.)

The following are configuration options for tracing:
file filename: Specifies the name of the file in which to store information.
size size: Specifies the maximum size of each trace file, in kilobytes (KB), megabytes (MB), or gigabytes
(GB). When a trace file named trace-file reaches this maximum size, it is compressed and renamed
trace-file.0.gz. When the trace file again reaches its maximum size, trace-file.0.gz is renamed
trace-file.1.gz, and trace-file is compressed and renamed trace-file.0.gz. This renaming scheme
continues until it reaches the maximum number of allowable trace files. The software then overwrites the
oldest trace file. If you do not specify a maximum number of trace files with the files option, the default
number of files to keep is ten. If you specify a maximum file size, you also must specify a maximum number of
trace files with the files option. You can use k, m, or g to specify kilobytes, megabytes, or gigabytes,
respectively. The default size is 128 KB.
flag flag: Specifies a tracing operation to perform. You can specify multiple flags.
files number: Specifies the maximum number of trace files. When a trace file named trace-file
reaches its maximum size, the Junos OS renames it trace-file.0, then trace-file.1, and so on, until it
reaches the maximum number of trace files. The software then overwrites the oldest trace file. The default is
ten files.
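The renaming scheme is easier to follow when you step through it. The following Python sketch models only the bookkeeping: which archive name holds which generation of trace data after the active file fills several times. It assumes that the files limit counts the active trace file plus its archives; the text does not say whether the active file is included, so adjust the keep calculation if your platform behaves differently.

```python
def rotate(archives, filled_contents, max_files):
    """Return the archive list after the active trace file fills once.

    archives holds the contents of trace-file.0.gz, trace-file.1.gz, ...
    in that order (newest first).
    """
    # Assumption: max_files counts the active file plus its archives,
    # so at most max_files - 1 archives are kept.
    keep = max(max_files - 1, 0)
    # The newly filled file becomes archive 0, every older archive shifts
    # up by one, and anything past the limit (the oldest) is dropped.
    return ([filled_contents] + archives)[:keep]


def archive_names(base, archives):
    """Pair each archive with the name the renaming scheme gives it."""
    return {f"{base}.{i}.gz": contents for i, contents in enumerate(archives)}


archives = []
for fill in range(1, 6):            # the active file fills five times
    archives = rotate(archives, f"fill {fill}", max_files=3)
print(archive_names("trace-file", archives))
```

After five fills with a limit of three files, only the two most recent generations remain: trace-file.0.gz holds the fifth and trace-file.1.gz holds the fourth. This is why a small files value on a busy trace can lose the entries you are looking for.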
Including the traceoptions statement at the [edit interfaces interface-name] hierarchy level allows you to trace
the operations of individual router interfaces. You can also trace the operations of the interface process, which is the device
control process.
When tracing a specific interface, the software does not support the specification of a trace file. The Junos kernel does the
logging in this case, so the software places the tracing information in the system's messages file. In contrast, global
interface tracing supports an archive file; by default it uses /var/log/dcd for global interface tracing.

Protocol Tracing
• Include the traceoptions statement at the [edit
protocols protocol-name] hierarchy
• Useful when troubleshooting configuration and
interoperability problems
• Search for Baseline Options at
www.juniper.net/techpubs/software/nog for
protocol-specific traceoptions setup
Protocol Tracing
You trace the operations of a specific protocol by including the traceoptions statement at the [edit protocols
protocol-name] hierarchy. In most cases you should be selective in what you trace because selecting the all keyword
can overwhelm you with endless lines of text.
Visit www.juniper.net/techpubs/software/nog and search for Baseline Operations Guide, then Search Log
Messages, then Track Error Conditions for a complete list of protocol-specific traceoptions setup flags.

Protocol Tracing Example

• A typical OSPF tracing configuration along with sample
output:
[edit protocols ospf]
user@router# show
traceoptions {
    file ospf-trace;
    flag hello detail;
    flag lsa-request detail;
    flag lsa-update detail;
}
user@router> show log ospf-trace
Oct 9 22:41:45.233671 OSPF built router LSA, area 0.0.0.1
Oct 9 22:41:45.233715 ospf_set_lsdb_state: Router LSA 192.168.24.1 adv-rtr
192.168.24.1 state GEN_PENDING->QUIET
Oct 9 22:41:45.233732 OSPF built router LSA, area 0.0.0.1
Oct 9 22:41:45.233865 OSPF sent Hello 10.222.100.1 -> 224.0.0.5
(ge-0/0/3.100, IFL 70)
Oct 9 22:41:45.233885 version 2, length 44, ID 192.168.24.1, area 0.0.0.1
Sample Output
The sample OSPF stanza on the slide reflects a typical tracing configuration that provides details about important events like
hello messages or OSPF link-state advertisement (LSA) exchanges. In most cases you should use the detail option with a given
protocol flag for the added information often needed in troubleshooting scenarios. Search for baseline options at
http://www.juniper.net/techpubs/software/nog for protocol-specific options.
The slide shows a sampling of the results obtained with the tracing configuration. As with any log file, enter show log
trace-file-name to view the decoded protocol entries. The sample trace output reflects the transmission of an OSPF hello
message from 10.222.100.1 and goes on to show some of the hello protocol parameters.

Monitor Log or Trace Files in Real Time
• Monitor a log or trace in real time with the CLI's
monitor command:
user@router> monitor start filename
• Shows updates to monitored files until canceled
• Works with CLI pipe command for limiting output
• Use Esc + q to enable or disable real-time output to screen
• Issue a monitor stop to cease all monitoring
Monitoring Logs and Trace Files

Use the monitor CLI command to view real-time log information. You can monitor several log files at one time. The CLI
identifies the messages from each log with a line that names the file the entries come from. This line
displays initially and whenever the CLI switches between log files.
Using Esc+q enables and disables syslog output to screen; using monitor stop ceases all monitoring. Note that you can
use the CLI's match functionality to monitor a file in real time, while displaying only entries that match your search criteria. To
make use of the functionality, use a command in the following form:
user@host> monitor start messages | match fail
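The effect of piping monitored output through match can be illustrated offline, for example against a log file copied from the router. The following Python sketch passes along only the lines that contain the search pattern. The function name is hypothetical, the second sample line is invented for illustration, and treating the pattern as a case-insensitive regular expression is an assumption about how match behaves.

```python
import re


def match_lines(lines, pattern):
    """Yield only the lines that contain a match for pattern."""
    # Case-insensitive matching is an assumption made for this sketch.
    regex = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        if regex.search(line):
            yield line


sample = [
    # First entry is the example from the syslog slide.
    "Apr 29 09:43:08 host chassisd[2320]: CHASSISD_FRU_EVENT: "
    "scb recv slot detach: FPC 1 detach",
    # Second entry is invented for illustration only.
    "Apr 29 09:43:09 host example[1]: power supply A FAIL",
]

for line in match_lines(sample, "fail"):
    print(line)
```

Because the function is a generator, it can also be fed a file object that is still being written, which is closer to the real-time behavior of monitor start.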

Stopping Tracing
• To stop a tracing operation, delete a trace flag or the
entire stanza:
[edit protocols ospf traceoptions]
user@router# delete flag hello
• Log and trace file manipulation:
• Use the clear command to truncate (clear) log and trace
files:
user@router> clear log filename
• Use the file delete command to delete log and trace
files:
user@router> file delete filename
Stopping Tracing Through Configuration

If you do not delete or disable all trace flags, tracing continues in the background, and the output continues to write to the
specified file. The file remains on the routing engine hard disk until it is either deleted manually or overwritten according to
the traceoptions file parameters. To disable all tracing at a particular hierarchy, issue a delete traceoptions
command at that hierarchy and commit the changes.
Log and Trace File Manipulation

To truncate files used for logging, use the clear log filename command. To delete a file, use the file delete
command. You can also use wildcards with delete, compare, copy, list, and rename operations.
Be careful using the delete option. Log information cannot be written to a file or displayed to the screen if the destination
file does not exist. To accomplish the UNIX equivalent of a file touch to recreate the file, use the deactivate command to
temporarily disable traceoptions. Alternatively, use commit full.

The Interactive Shell
• Interactive UNIX shell support:

• CLI users can escape to an interactive shell when permitted
by their login class
• Juniper Networks does not support the shell, and it is
potentially dangerous
• Use only under JTAC guidance
• Some things to do when in a shell:
• Access standard UNIX utilities such as ls, tar, gzip, vi,
tcpdump, and so forth
• Display and modify kernel variables using sysctl
• Establish connections (vty or cty) to data plane components
to display NVRAM and other diagnostic data
Interactive Shell Support

Based on a FreeBSD operating system, the Junos OS CLI supports an escape to a UNIX-style shell. Although the possibilities
can seem endless, we stress that designers highly customized the Junos OS, and did not design it to act as a Web server or
other type of UNIX device. You can do serious damage to the Juniper Networks platform if you do not observe great care and
caution when operating in the shell. Access to the shell is controllable through login class permissions. Once in the shell, you
can su to root, if you know the root password, or if you have not set it.
Juniper Networks does not officially support use of the shell because the CLI offers all you should need in normal
circumstances. For advanced troubleshooting activities, or for advanced functionality like automated shell scripts (for which
Juniper Networks support is not expected nor sought), the shell can be a real boon.
Users who wish to add production scripting functionality to their networks should consider operational scripts, commit
scripts, and event scripts. The coverage of these scripts is outside of the scope of this course.
From a troubleshooting and operational analysis perspective, a few good reasons for escaping to a shell exist. These reasons
include the following:
Access standard utilities and programs like tar, gzip, top, ps, kill, vi, and so on, which offer experienced UNIX
users the tools they need to perform advanced troubleshooting tasks like compressing a core file or manually
editing a configuration file when the CLI is not available;
Use sysctl to access and modify (under the guidance of JTAC) various kernel parameters like TCP window
sizes, the number of available protocol sockets, and so on; and
Establish a connection to the embedded hosts (controllers) within the data plane to access diagnostic and log
data held in NVRAM.

Connecting to Data Plane Components

[Figure: Ethernet or console-based communications path across the system midplane to the host modules comprising the data plane complex (PFE)]
• Many data plane components run their own
microkernel that can be accessed for additional
troubleshooting information
• Used to display NVRAM crash data, perform module
diagnostics, and so forth
• Access using the CLI command request pfe execute command
• Access also available through the interactive shell using an
Ethernet (vty) or Console (cty) connection
Connecting to Data Plane Components

You can use the internal connectivity between the routing engine and data plane to establish connections to embedded
hosts (controllers) within the data plane. The term embedded host refers to a data plane component with its own
microprocessor and microkernel. Examples include system Control Boards and FPCs.
In most cases, the only reason to connect to a data plane component is to access diagnostic information in the form of log
entries or core files retained in the affected component's NVRAM. Use the request pfe execute command to connect
to an embedded host and access this information.
In some cases, you might need to access data plane components using the interactive shell through a virtual teletype
terminal (vty) connection over an internal Ethernet communications channel. Some platforms also support console
(asynchronous) access using a serial-type of connection known as console teletype terminal (cty). You should only access
the interactive shell at the direction of JTAC.
By parsing entries in the syslog, you can determine which PFE component has reported a crash, and therefore to which
embedded host you must connect to obtain crash and log data for submission to JTAC.

Core Files
• Modern computing environments are complex and,
therefore, have complex bugs
• Transient software failures are extremely hard to reproduce
and, therefore, difficult to fix
• Hardware errors can also trigger software failures
• Well-written code dumps a core file for diagnostic analysis
when a fatal fault (panic) occurs
• The stack trace identifies the name of the offending process,
memory pointers, and register data at the time of the fault
• In the Junos OS numerous entities can dump a core at panic
or upon command
• The kernel, software processes, and embedded hosts in the data
plane
Complexity of Modern Computers and Operating Systems

The complexity of modern computers and operating systems leads to equally complex bugs. It is very difficult to diagnose
transient software failures (for example, a random crash or reboot), because so many potential causes for these types of
faults exist. In most cases, a crash is the result of a programming error or the failure to anticipate a particular set of events
and the software interaction that ensues. However, a crash can also stem from hardware-related causes. In the latter case, a
memory error might corrupt a memory pointer or result in an illegal instruction.
Because transient software failures are so difficult to diagnose, well-written code incorporates the ability to dump the
program's environment in the form of memory pointers, instructions, and register data to a file in the event of a panic or
other serious malfunction. A software engineer using a debugger and a version of the executable containing debugging
symbols can analyze the resulting core file. The result of this analysis is generally a very good idea of the sequence of events
that led to the crash, and armed with this information, you can take corrective actions. For example, you can perform a
software patch or hardware Return Materials Authorization (RMA).
Although it might sound bad, it is actually quite beneficial that the Junos OS has the ability to dump various types of core files
for diagnostic use. In most cases, core files generate automatically as a result of a failure, but you can also generate cores
on demand. The Junos OS can generate core files relating to the Junos kernel itself, to the processes that run above that
kernel, or to the embedded host modules within the data plane.

Three Types of Core Files

• Technical support engineers deal with three types of
core files:
• Junos kernel (also known as RE cores)
• Written to /var/crash
• Junos processes
• Written to /var/tmp
• Embedded host cores (also known as PFE cores)
• Stack traces are written to NVRAM on the affected component or
system board
• Also copied to /var/crash when chassis dump-on-panic
is enabled
• Use the CLI command show system core-dumps to
determine if any core files are present
Three Types of Core Files

Juniper Networks support engineers typically deal with three types of core files. These files are the following:
The Junos OS kernel (RE) cores: A kernel core file is left by the Junos kernel when it encounters a panic
condition. The software also saves a copy of the virtual memory state (which can be quite large). Core files
created by a kernel panic are stored in the /var/crash location when you enable the system
dump-on-panic option (hidden) at the [edit system] hierarchy. The software enables this option by default.
Junos OS process cores: Each process, such as the chassis management or automatic protection switching
processes (chassisd or apsd), is capable of leaving a core when a panic occurs. Core files generated by a
process are stored in the /var/tmp directory. This behavior is the default in all Junos OS releases.
Embedded host (PFE) cores: Various components in the data plane contain their own microprocessors that run
a microkernel. Examples include the CFEB on M7i and M10i platforms, FPCs, the Forwarding Engine Boards
(FEBs) on the M120, and others. Each of the data plane's embedded hosts is capable of dumping a core file
when a crash (panic) occurs. When a PFE component dumps a core, the resulting stack trace is written into that
component's NVRAM. If you enable chassis dump-on-panic (hidden) at the [edit chassis] hierarchy, a
copy of the core is also stored in the /var/crash directory on the Routing Engine. We recommend this option,
and it is the default.

Forcing Cores
• Forcing a running process to write a core can help
diagnose certain problems
• Use the hidden CLI command request system core-dump
to force a core dump
• Use with caution! The software creates a copy of the running
process; this copy can result in excessive memory paging if the
memory footprint of the process is large
• JTAC might direct you to force a core from the shell
• The default behavior can be modified to suspend the process
during core writing
• Uses less memory, but process suspension can lead to other
problems
Forcing Process Cores

In certain rare situations, a Juniper Networks software engineer might want to obtain a core file from a process that appears
to be running normally. Note that forcing software processes to write cores might impact system performance and operation.
Only perform these steps under the guidance of JTAC.
In most cases, you obtain a running core file by using the hidden CLI command request system core-dump
process-name. By default, this command forks off a copy of the running process (a running core), which has the upside of
leaving the original process free to continue its normal work. The downside is that if the process in question is large (for
example, rpd), it might tax system memory, because the system must support two instances of that process. A system that is
low on memory begins paging to the swap file, and this paging can slow things down.
JTAC might direct you to force a core from a root shell using the gcore utility. The main advantage to this approach is that you
can instruct gcore to suspend the process in question during the core dump. Because the software does not create a copy
of the process, the system's memory is taxed less. However, because the process is suspended for what can be a
somewhat lengthy period (10 seconds or so for a busy system with a large process), other problems might occur.

Agenda: Troubleshooting Toolkit
• Troubleshooting Tools
→ Best-Practices
Best-Practices

Best-Practices
• Take the following best-practice steps before a
problem occurs
• Set up an out-of-band management network
• Set up system logging for remote logging
• Set up clock synchronization
• Establish a baseline for reference
Recommended Best-Practices
We recommend several best-practices where network resources and topology allow. We cover each of these topics in more
detail on subsequent slides.

Out-of-Band Management Network
• An out-of-band management network is critical in
times of network outage
• Built-in out-of-band support with the fxp0 interface
• Juniper Networks does not support transit routing over fxp0
• Define a backup-router to support out-of-band routing
when rpd is not running
• Mark the default route used for the out-of-band network as
no-readvertise
• Enable remote access services (Telnet, SSH, or FTP) only as
needed
• Console access recommended for maintenance activities
Deploying an Out-of-Band Management Network

Relying on in-band methods to manage your network might seem like a good idea up until the point a circuit or hardware
outage prevents you from accessing your network and, as a result, prolongs corrective actions. We highly recommend
deploying an out-of-band management network because it provides you with a back door into your network during times of
outage or disruption.
Many Junos platforms come with a built-in out-of-band interface in the form of fxp0. Note that fxp0 is an out-of-band
interface because transit traffic cannot be routed over this interface. Put another way, if a packet arrives on fxp0 it can
never egress on a data plane interface, and vice versa. Because of this behavior, we do not recommend running a
routing protocol over the fxp0 interface in most cases. Instead, we recommend a static route flagged with
no-readvertise. This flag ensures the static route used for out-of-band connectivity is not advertised over any routing
protocol.
We also recommend the use of a backup-router, especially when your hardware supports redundant routing engines.
The system uses the backup router entry whenever rpd is not running, such as in the case of a backup routing engine or a
system that has had rpd shut down because of thrashing.
Your out-of-band connectivity should provide both Ethernet (fxp0-based) and console access to your routers. You normally
gain console access through some type of terminal server. We recommend console access whenever you perform serious
maintenance activities, like upgrading or downgrading the system software, because if something goes wrong, or the system
somehow returns to a factory default, you might no longer have Ethernet-based access to the system. Having console access
is the only way you can reload software from removable media or recover a lost root password.

Out-of-Band Management Network Configuration
[edit]
user@router# show | no-more
system {
    backup-router 10.210.15.254 destination 10.210.15.0/24;
    services {
        ftp;
        ssh;
        telnet;
    }
}
routing-options {
    static {
        route 10.210.15.0/26 {
            next-hop 10.210.15.254;
            no-readvertise;
        }
    }
}
Out-of-Band Management Network Configuration

The slide illustrates the recommended configuration syntax for the out-of-band management network.
By default, all hosts (default route) are reachable through the backup router. To eliminate the risk of installing a default route
in the forwarding table, include the destination option, specifying an address that is reachable through the backup
router. Specify the address in the format network/mask-length so the entire network is reachable through the backup
router.
The no-readvertise option prohibits the redistribution of the associated route through routing policy into a dynamic
routing protocol such as OSPF. We highly suggest you use the no-readvertise option on static routes that direct traffic
out the management Ethernet interface and through the management network.
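As a rough illustration of what the destination option implies, the following Python sketch checks which hosts fall inside the prefix reachable through the backup router. The prefix comes from the slide; the helper name covered_by is hypothetical, and the two host addresses are simply the remote syslog host and the SNMP trap target used elsewhere in this chapter.

```python
import ipaddress

# Prefix taken from the backup-router statement on the slide.
DESTINATION = ipaddress.ip_network("10.210.15.0/24")


def covered_by(prefix, host):
    """Return True if the host address (a string) lies inside the prefix."""
    return ipaddress.ip_address(host) in prefix


# Addresses borrowed from this chapter's syslog and SNMP examples.
syslog_host_reachable = covered_by(DESTINATION, "10.210.15.24")
trap_target_reachable = covered_by(DESTINATION, "10.210.14.173")

print("syslog host covered:", syslog_host_reachable)
print("trap target covered:", trap_target_reachable)
```

Under these assumptions, a management station outside the destination prefix would not be reachable through the backup router while rpd is down, which is worth knowing before an outage rather than during one.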

Recommended System Log Settings

• Where possible, configure your syslog to do the
following:
• Write entries to both a local file and to a remote host
• Remote archiving proves invaluable when the local hard drive fails
• Use archive settings for your messages file to maintain at
least 20 copies with a minimum 1-MB file size
• Default is 10 copies of files; default size is platform specific
• Especially important if remote syslog is not in effect
• Log interactive CLI commands and configuration changes
• Achieved with the interactive-commands and change-log
facilities using the info severity level
• Provides an audit trail of who did what, and when
Recommended System Log Settings

Wherever possible, you should place the following system logging recommendations into effect:
Use a remote syslog host: This recommendation helps in archiving syslog messages, and ensures these
valuable messages are available even in the event of a catastrophic failure of a router. Configure remote syslog
service to retain log entries for at least one month.
Archive logs: You should configure syslog archive settings that ensure entries are retained for at least two weeks.
This suggestion is especially important when remote system logging is not in place. We recommend configuring
20 copies of the messages file with each copy being at least 1 MB in size, except on J Series and branch SRX
Series devices, which have limited storage space.
Log CLI commands and configuration changes: We have all seen the joke about what to do if you break
something while no one is watching: just walk away. While this advice is perhaps sound, it is futile when the
system configuration logs interactive CLI commands. When combined with unique user logins, the logging of all
commands issued on the machine provides an excellent audit trail of who did what, and when.
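To judge whether a given archive setting is likely to meet the two-week target, you can estimate retention from the archive capacity and your own logging rate. The Python sketch below is only a back-of-the-envelope aid: the function name retention_days and the 1 MB per day logging rate are assumptions for illustration, and the estimate ignores any compression of archived files.

```python
MB = 1024 * 1024


def retention_days(files, file_size_bytes, bytes_per_day):
    """Estimate how many days of log history the archive can hold."""
    if bytes_per_day <= 0:
        raise ValueError("bytes_per_day must be positive")
    return (files * file_size_bytes) / bytes_per_day


# Hypothetical logging rate; measure your own system to get a real figure.
ASSUMED_RATE = 1 * MB

# Recommended setting from this section: 20 files of 1 MB each.
recommended = retention_days(20, MB, ASSUMED_RATE)

# 10 files of 128 KB each, the default cited in this chapter.
default = retention_days(10, 128 * 1024, ASSUMED_RATE)

print("recommended setting: about", recommended, "days")
print("default setting: about", default, "days")
```

At the assumed rate, the recommended setting holds roughly 20 days of entries while the smaller default holds little more than one day, which is why the larger archive matters most when no remote syslog host is in place.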

Syslog Configuration Example

[edit system syslog]
user@router# show
user * {                              <-- Emergency messages go to all logged-in users (*)
    any emergency;
}
host 10.210.15.24 {                   <-- Logs to a remote host
    authorization any;
}
file messages {                       <-- Primary syslog file (*)
    any notice;
    authorization info;
    archive size 1m files 20 no-world-readable;
}
file interactive-commands {           <-- Logs all CLI commands (*)
    interactive-commands any;
}
file config-changes {                 <-- Logs configuration changes
    change-log info;
}
file errors {
    any error;
    explicit-priority;
}
Note: (*) indicates sample factory-default settings (hardware-dependent)
Syslog Configuration Example

The slide shows various syslog configuration examples including a number of the default settings. Syslog operations can be
enabled or modified at the [edit system syslog] hierarchy level and the [edit routing-options options
syslog] hierarchy level. General syslog configuration options include the following:
host name or IP address: Sends syslog messages to a remote host, typically a UNIX device configured to
receive incoming syslog messages;
archive: Configures how to archive system logging files (default is to keep 10 archive files with a maximum
size of 128 K each);
console: Configures the types of syslog messages to log to the system console;
facility: Displays the class of log messages;
severity: Displays the severity level of log messages;
file filename: Configures the name of the log file; and
files number: Displays the maximum number of system log files.
You can also configure support for explicit-priority in syslog messages. This configuration alters the normal syslog
message format by adding a numeric priority value. The explicit priority value can simplify the task of parsing log files for
important messages. For example, you can search for all messages at priority 7.
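The following Python sketch shows one way the numeric value could be used when parsing a saved log file. It assumes the priority marker takes the form %FACILITY-SEVERITY, optionally followed by a message tag, and that lower numbers are more severe (0 for emergency through 7 for debug, as in standard syslog). The sample lines and tags are invented for illustration, so verify the exact layout against your own log files.

```python
import re

# Assumed marker layout: %FACILITY-SEVERITY[-TAG]: ahead of the message text.
PRIORITY_RE = re.compile(
    r"%(?P<facility>[A-Z]+)-(?P<severity>[0-7])(?:-(?P<tag>[A-Z0-9_]+))?:"
)


def filter_by_severity(lines, max_severity):
    """Keep lines whose numeric severity is <= max_severity (0 is most severe)."""
    kept = []
    for line in lines:
        match = PRIORITY_RE.search(line)
        if match and int(match.group("severity")) <= max_severity:
            kept.append(line)
    return kept


# Invented sample lines; the tags and message text are placeholders.
SAMPLE = [
    "Feb 20 12:24:15 router chassisd[1234]: %DAEMON-3-EXAMPLE_TAG: hypothetical error text",
    "Feb 20 12:24:16 router mgd[2345]: %INTERACT-6-EXAMPLE_TAG: hypothetical info text",
    "Feb 20 12:24:17 router /kernel: %KERN-7: hypothetical debug text",
]

errors_and_worse = filter_by_severity(SAMPLE, 3)
for line in errors_and_worse:
    print(line)
```

The same idea works interactively with the CLI pipe and a match expression; the script form is only convenient when you process archived files offline.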

Clock Synchronization
• We recommend synchronizing router clocks with NTP
• Correlated timestamps in log files assist fault analysis
• Also useful in forensic analysis of security incidents
• The Junos OS cannot provide a primary time reference
• You need an external device for synchronization
• A simple UNIX device using an undisciplined local clock
suffices
• Support for client, server, or symmetric modes, with or
without authentication

[edit system]
user@router# show
ntp {
    boot-server 10.0.1.201;
    server 10.0.1.201;
}
Synchronize Router Clock

We recommend using the Network Time Protocol (NTP) to synchronize all routers to a common, and preferably accurate, time
source. By synchronizing all routers, you ensure time stamps on log messages are both accurate and meaningful, which is
especially important when conducting security-related forensics where you must correlate events that might have occurred
on numerous devices.
The Junos OS Needs a Reference

The basis for the NTP protocol is a series of timing hierarchies, with a Stratum 1 (atomic) timing source at the very top. While
accuracy is desirable, you do not need to synchronize to a Stratum 1 reference to benefit from having synchronized views as to
the time of day. A Junos device cannot provide its own timing source because it does not support the definition of a local,
undisciplined clock source (for example, the local crystal oscillator). If needed, you can always obtain a commodity UNIX
device of some type with a configuration that provides a timing reference based on its local clock. Remember, any
synchronization, even if based on an inaccurate local clock, is better than none.
The Junos OS supports client, server, and symmetric modes of NTP operation, and can also support broadcast and
authentication. We recommend the use of authentication to ensure an attacker cannot compromise your synchronization.
Use the show ntp associations command to confirm synchronization status.
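If you capture the command output for later review, a small script can report whether a synchronization peer was selected. The Python sketch below assumes the output follows the familiar NTP peer-table layout, in which an asterisk in the first column marks the peer currently selected for synchronization. The sample text is invented (only the server address comes from the slide), and the function name selected_peer is hypothetical.

```python
# Invented sample in the usual NTP peer-table layout; values are placeholders.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.0.1.201      .LOCL.           1 u   34   64  377    0.512    0.103   0.041
"""


def selected_peer(output):
    """Return the peer marked with '*', or None if no peer is selected."""
    for line in output.splitlines():
        if line.startswith("*"):
            # Drop the marker, then take the first column (the remote peer).
            return line[1:].split()[0]
    return None


peer = selected_peer(SAMPLE)
print("synchronized to:", peer)
```

A result of None in this sketch would suggest that the device has not yet selected a peer, which is expected for a short time after NTP is first configured.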

Establish a Baseline for Reference

• You must know what is normal for your system
• Establish a baseline before a problem occurs
• Environmental conditions
• Resource utilization
• Traffic loads
• Many tools available
• Use SNMP for regular data collection
• Be sure the baseline is representative
• Requires regular monitoring over time
Know What Is Normal

It might seem pretty basic but in order to identify a problem, you must be able to identify whether an observed behavior is
normal or anomalous in your network. How can you accomplish this without a reference? It cannot be done. As an example,
is 30% CPU utilization on a system's Control Board an indication of a problem, or a normal condition?
You must establish a meaningful baseline before a problem occurs. Your baseline must include a representative sampling
gathered from your device and should include information about environmental conditions, resource utilization, traffic loads,
throughput, and so on. This information can then be used to confirm a reported problem before potentially disruptive
measures are taken to troubleshoot or resolve an issue that might, in fact, be normal.
Many tools are available to gather information from your system, including the CLI. For the baseline to be meaningful, you
must collect information at regular intervals on an ongoing basis. There is no way to look at a single snapshot and determine
whether it is representative of normal conditions. The Junos OS supports SNMP, which can be used with a wide variety of tools
to collect information and establish a meaningful baseline.
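To make the idea of a baseline concrete, the Python sketch below summarizes a series of periodic samples and then asks whether a new reading is unusual for that particular device. All sample values are invented, the function names are hypothetical, and the three-standard-deviation threshold is an assumption chosen for illustration rather than a recommended setting.

```python
import statistics


def build_baseline(samples):
    """Summarize periodic samples as (mean, sample standard deviation)."""
    return statistics.mean(samples), statistics.stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


# Invented CPU utilization samples (percent) for two different devices.
quiet_device = build_baseline([4, 5, 6, 5, 4, 6, 5, 5])
busy_device = build_baseline([28, 31, 30, 29, 32, 30, 31, 29])

# The same 30 percent reading means different things on each device.
print("quiet device, 30 percent anomalous:", is_anomalous(30, quiet_device))
print("busy device, 30 percent anomalous:", is_anomalous(30, busy_device))
```

The same 30 percent reading is flagged on the device that normally idles near 5 percent and passes on the device that normally runs near 30 percent, which is the point of collecting a representative baseline first.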

Simple Network Management Protocol

• SNMP
• SNMP is an application-layer protocol designed to monitor
and manage TCP/IP network devices
• Communication occurs between an SNMP agent running on
a managed device (such as a device that runs the Junos OS)
and a network management system
• The Junos OS supports SNMP versions 1, 2c, and 3
[Figure: The NMS sends a GetRequest to the SNMP agent on the device running the Junos OS; the agent returns a Response.]
Simple Network Management Protocol

SNMP defines a set of standards for network management including a protocol, a database structure specification, and a
set of data objects that facilitate communications between an SNMP agent running on a managed device (such as a device
that runs the Junos OS) and a network management system (NMS). SNMP can be used to monitor various parameters such
as CPU utilization, memory utilization, CPU temperature, interface throughput, and so on.
SNMP defines several basic protocol data units (PDUs): Get, Response, Trap, and Set. The Get PDU is used to retrieve
statistical information from the agent. The Response PDU is used to send the requested information to the NMS. The Trap
PDU is generated by the agent to alert the NMS of unexpected network changes, and the Set PDU is used to apply
configuration changes and operations to the agent.
By polling managed network devices, the NMS collects information about network resources and can establish a meaningful
baseline. The slide shows the NMS initiating a request with a GetRequest PDU. The SNMP agent responds with a Response
containing the requested information.
The SNMP agent can initiate a Trap PDU to notify the NMS of events and resource constraints.
As mentioned earlier, the SNMP protocol also defines a method to set system parameters with the Set PDU. However, SNMP
uses unencrypted data strings to authenticate a manager with an agent. Because this poses a potential security risk, the
Junos OS does not support this functionality.

SNMP Configuration
[edit snmp]
user@router# show
description "My Junos OS Device";
location "123 Main Street - Rack 4";      <-- Device contact information
contact "John Doe - x1865";
community myManagedDevices {
    authorization read-only;              <-- Default authorization
    clients {
        10.210.15.0/24;                   <-- SNMP requests limited to 10.210.15/24 subnet; can also restrict to an interface
    }
}
trap-group my-trap-group {
    version v2;                           <-- Sends SNMPv2 notifications regarding link or chassis events
    categories {
        chassis;
        link;
    }
    targets {
        10.210.14.173;                    <-- Defines NMS for trap delivery
    }
}
Note: Defining an SNMP community is the minimum SNMP configuration.
Sample SNMP Configuration

The slide shows a sample SNMP configuration using some common SNMP configuration options. When configuring contact
information, you should be as specific as possible. This information is useful when trying to resolve issues with a network
device. The example restricts SNMP access to the 10.210.15.0/24 network with read-only authorization. The example also
shows the configuration of an SNMP trap group, necessary for the delivery of SNMP traps to an NMS.

Summary
• Described various tools that can be used to troubleshoot
devices that run the Junos OS
• Explained JTAC recommendations for current best-practices
that facilitate troubleshooting
We Discussed:
Various troubleshooting tools supported by the Junos OS; and
JTAC recommended configuration settings for ease of troubleshooting.

Review Questions
1. What operational mode command would you use to
gather information about the local chassis
environment?
2. What does the monitor traffic command
display?
3. What is SNMP?

Monitoring Tools and Establishing a Baseline Lab
• Learn how to monitor the health of a device running
the Junos OS.
• Configure NTP to facilitate system clock
synchronization.
• Use online resources to help troubleshoot issues.
Monitoring Tools and Establishing a Baseline Lab


Answers to Review Questions
1. The operational mode command show chassis environment displays information about the local chassis environment.
2. The monitor traffic command monitors real-time traffic going to and from the control plane. If no interface is specified, it
monitors the control traffic going over fxp0.
3. SNMP is an application-layer protocol designed to monitor and manage TCP/IP network devices. An NMS requests specific
information from an SNMP agent running on the managed device. The agent can also initiate alerts to send to the NMS.


Chapter 5: Hardware and Environmental Conditions

Objectives
• After completing this chapter, you will be able to:
• Describe the key commands and features used to monitor
storage and memory issues
• Describe the key commands and features that you can use
to monitor software installations
• Determine how to find potential hardware problems using
system logs
• Describe the key commands that you can use to monitor
hardware and environmental issues
We Will Discuss:
The commands and features used to monitor storage and memory issues;
The commands and features that you can use to monitor software installations;
Finding potential hardware problems using system logs; and
The commands that you can use to monitor hardware and environmental issues.
Chapter 5-2 • Hardware and Environmental Conditions www.juniper.net

Agenda: Hardware and Environmental Conditions
→ Hardware Troubleshooting Overview
• Memory and Storage
• Boot Monitoring
• Hardware-Related System Logs
• Chassis and Environmental Monitoring
Hardware Troubleshooting Overview

This slide lists the topics we will discuss. We discuss the highlighted topic first.

Hardware Troubleshooting Tools

• The craft interface and visual indicators
• Red LEDs indicate failure
• LCD panel displays all major and minor alarms
• Issue a show chassis craft-interface command to view
the display remotely on all platforms
• Many individual components have their own status
indicators
• The Junos OS CLI and J-Web tools
• Interactive failure analysis using show commands
• Monitor log files using monitor command
• J-Web displays diagnostics about the platform
• System logs (syslog)
• Log files contain a wealth of invaluable information
• CLI show log log-file-name command
• Remember to use pipe for added functionality
Troubleshoot Using the Craft Interface

You can use the craft interface to troubleshoot chassis problems. All platforms use some form of LEDs on the craft interface to
indicate the status of various chassis components. Some of the larger platforms use the LCD to display general system status
and a listing of any alarms that are currently active.
Troubleshoot Using the Command Line Interface

The primary means of controlling and troubleshooting the Junos operating system, protocols, network connectivity, and the
router hardware is to execute various operational mode commands from the command-line interface (CLI). The CLI provides
commands that let you display information in the routing tables, display routing protocol-specific information, and check network
connectivity using the ping and traceroute commands.
Troubleshoot Using Syslog Messages

The various system logs maintained by the Junos OS and the various daemons that run on top of the Junos kernel contain a
wealth of information regarding the operational status of a given system. The information stored in system logs is normally
more detailed than that displayed on the craft interface. Do not forget to leverage the CLI's pipe function to simplify the task
of parsing through large log files for symptoms of abnormal operation or hardware failure.

CLI-Based Hardware Troubleshooting

• System status available through:
• show system... commands: Displays information about
the system and software processes
• Boot messages, Storage, Uptime
• Use system logs to determine chassis faults
• Hardware status available through:
• show chassis... commands: Display information about
the chassis components
• FPCs, PICs, FEB, SCB, SFM, SSB, fans, and power supplies
• show pfe ... commands: Display information about PFE
statistics and internal communications errors
• Use the CLI to bounce FPCs and PICs
CLI-Based Troubleshooting: Hardware Status

The slide highlights the topics covered in the next sections. We cover CLI-based troubleshooting using a variety of show
system, show log, and show chassis commands next.
Using CLI commands, you can find detailed information about the different parts of the Packet Forwarding Engine (PFE) like the
Flexible PIC Concentrators (FPCs), Physical Interface Cards (PICs), Forwarding Engine Boards (FEBs), System Control Boards
(SCBs), Switching and Forwarding Modules (SFMs), System and Switch Boards (SSBs), and more.


Agenda: Hardware and Environmental Conditions
• Hardware Troubleshooting Overview
→ Memory and Storage
• Boot Monitoring
Memory and Storage


Displaying Routing Engine Status

• View the status of the RE to determine memory and
CPU utilization
user@mx80> show chassis routing-engine
Routing Engine status:
    Temperature                 35 degrees C / 95 degrees F
    CPU temperature             46 degrees C / 114 degrees F
    DRAM                      2048 MB
    Memory utilization          30 percent
    CPU utilization:
      User                       0 percent
      Background                 0 percent
      Kernel                     1 percent
      Interrupt                  0 percent
      Idle                      99 percent
    Model                          RE-MX80
    Serial ID                      S/N YK8973
    Start time                     2013-01-31 01:10:41 UTC
    Uptime                         12 days, 47 minutes, 55 seconds
    Last reboot reason             Router rebooted after a normal shutdown.
    Load averages:                 1 minute   5 minute  15 minute
                                       0.60       0.23       0.08
Displaying Routing Engine Status

The show chassis routing-engine command displays information about the Routing Engine (RE). The output fields are
the following:
Slot: Indicates the slot number for the RE on systems that support RE redundancy.
Temperature: Displays the temperature of the air flowing past the RE.
DRAM: Displays the total DRAM, in megabytes, available to the RE's processor.

Displaying Routing Engine Status (contd.)

CPU utilization: Displays information about the RE's CPU utilization, which include the following:
User: Displays the percentage of CPU utilization being used by user processes;
Background: Displays the percentage of CPU utilization being used by background processes;
Kernel: Displays the percentage of CPU utilization being used by kernel processes;
Interrupt: Displays the percentage of CPU utilization being used by interrupt processes; and
Idle: Displays the percentage of idle CPU utilization.
Model: Displays the RE model.
Serial ID: Serial identification number of the RE.
Start time: Displays the time at which the RE started running.
Uptime: Displays how long the RE has been running.
Load averages: Displays the RE load averages for the last 1, 5, and 15 minutes.
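When you collect this output repeatedly for a baseline, it helps to reduce it to a few numbers. The Python sketch below extracts every field reported in percent from text captured from the command. The sample text reproduces part of the output shown on the slide; the function name percent_fields and the assumption that each such field appears on one line ending in the word percent are ours, so treat this as a starting point rather than a complete parser.

```python
import re

# Part of the output shown on the slide, saved as text.
SAMPLE = """\
Routing Engine status:
    Temperature                 35 degrees C / 95 degrees F
    CPU temperature             46 degrees C / 114 degrees F
    DRAM                      2048 MB
    Memory utilization          30 percent
    CPU utilization:
      User                       0 percent
      Background                 0 percent
      Kernel                     1 percent
      Interrupt                  0 percent
      Idle                      99 percent
"""

# A field name, an optional colon, a number, then the word "percent".
PERCENT_RE = re.compile(r"^\s*(?P<field>[A-Za-z ]+?):?\s+(?P<value>\d+) percent\s*$")


def percent_fields(output):
    """Return a dict mapping each percent-valued field name to its integer value."""
    fields = {}
    for line in output.splitlines():
        match = PERCENT_RE.match(line)
        if match:
            fields[match.group("field").strip()] = int(match.group("value"))
    return fields


status = percent_fields(SAMPLE)
print("memory utilization:", status["Memory utilization"])
print("CPU idle:", status["Idle"])
```

Storing these values with a timestamp at each collection interval gives you the raw material for the baseline discussed earlier in this course.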

Displaying System Storage

• Displays amount of storage available on flash and
rotating disks:
user@mx80> show system storage
Filesystem      Mounted on
...             /                                     <-- / = Partition on flash (885MB)
devfs           /dev
/dev/md0        /packages/mnt/jbase                   <-- Memory virtual disks (the md devices)
/dev/md1        /packages/mnt/jkernel-ppc-12.2R2.5
/dev/md2        /packages/mnt/jpfe-MX80-12.2R2.5
/dev/md3        /packages/mnt/jdocs-12.2R2.5
/dev/md4        /packages/mnt/jroute-ppc-12.2R2.5
/dev/md5        /packages/mnt/jcrypto-ppc-12.2R2.5
/dev/md6        /tmp
/dev/md7        ...
Displaying System Storage

The show system storage command displays the amount of storage available on the flash and hard drive storage. This
command displays statistics about the amount of free disk space in the various file systems used by the device. Values display
in 512 byte blocks. This command is equivalent to the UNIX df command.
The highlights on the slide indicate the device names used by the flash and hard drive storage devices. In this case, the flash
medium is device adO while the hard disk is device ad2. You can also see that the Junos OS makes use of FreeBSD's virtual file
system support to mount images of jbundle components on memory virtual disks. These RAM disk devices always indicate
being 100% full because of their read-only nature.
Use the command request system storage cleanup to identify and delete files that can be easily removed to free up
disk space. The command will first display the files to be deleted, then prompt the user for confirmation before deleting the files.
Note that the output of the show system storage command might list the same flash and hard disk devices with different
names due to changes in the underlying FreeBSD distribution on which the Junos OS version is based. For the curious, the
device named ad0s1a has the following meaning:
ad = IDE hard disk (the flash device emulates an IDE disk).
0 = The unit number for that device type; for example, the first IDE disk is unit 0.
s1 = Slice 1 for PC BIOS partition 1.
a = The root (/) partition. A b partition type is for swap space, while a c partition type is used in dedicated mode
(native BSD slice mode). Other partition types are for general use, such as the e designation for the /config
partition.
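The naming scheme above can be decoded mechanically. The following Python sketch is for illustration only; the function name and the role table are hypothetical and simply restate the meanings listed above.

```python
import re

# Partition-letter meanings as described in the text above.
PARTITION_ROLES = {
    "a": "root (/) partition",
    "b": "swap space",
    "c": "dedicated mode (native BSD slice mode)",
    "e": "/config partition",
}

DEVICE_NAME = re.compile(r"([a-z]+)(\d+)s(\d+)([a-h])")


def parse_bsd_device(name):
    """Split a name such as 'ad0s1a' into type, unit, slice, and partition."""
    m = DEVICE_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    dev_type, unit, slice_no, part = m.groups()
    return {
        "type": dev_type,
        "unit": int(unit),
        "slice": int(slice_no),
        "partition": part,
        # Letters not listed in the text are treated as general-use partitions.
        "role": PARTITION_ROLES.get(part, "general use"),
    }


if __name__ == "__main__":
    print(parse_bsd_device("ad0s1a"))
```

The same pattern also accepts names with a different device type prefix, such as da0s1a.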

View Directory Usage

• Use the show system directory-usage command to
verify storage levels by directory
user@mx80> show system directory-usage /var/
/var/
/var/run
2.0K /var/run/ext
/var/run/db
2.0K /var/run/db/private
2.0K /var/run/named
2.0K /var/run/ppp
/var/run/scripts
2.0K /var/run/scripts/commit
2.0K /var/run/scripts/event
2.0K /var/run/scripts/op
18K /var/run/scripts/import
2.0K /var/run/scripts/lib
/var/sw
823M /var/sw/pkg
/var/tmp
126K /var/tmp/gres-tp
Directory Usage
The slide shows the output of the show system directory-usage command. By specifying a particular directory as a
modifier to this command, you can determine how much storage space is being used by each of the underlying directories.
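If you save this output to a file, a short script can rank the directories by size. The following is a minimal Python sketch; the function names are hypothetical, and it assumes that the K, M, and G suffixes are powers of 1024 and that each usable line has the form "size path".

```python
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def size_to_bytes(text):
    """Convert a human-readable size such as '2.0K' or '823M' to bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(float(text[:-1]) * UNITS[suffix])
    return int(float(text))


def largest_directories(output, top=3):
    """Return the largest (bytes, path) pairs found in directory-usage output."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Skip prompts, blank lines, and lines that carry a path but no size.
        if len(parts) == 2 and parts[1].startswith("/"):
            rows.append((size_to_bytes(parts[0]), parts[1]))
    return sorted(rows, reverse=True)[:top]


SAMPLE = """\
2.0K /var/run/ext
18K /var/run/scripts/import
823M /var/sw/pkg
126K /var/tmp/gres-tp
"""

if __name__ == "__main__":
    for size, path in largest_directories(SAMPLE, top=2):
        print(size, path)
```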

Testing RE Memory and Storage

• You might want to confirm flash or hard disk integrity
before upgrading or performing snapshots
• Test steps:
• Use the CLI show system storage command or the shell
command df to identify the flash and hard disk devices
• Escape to a root shell and use dd to confirm that the flash,
hard drive, and memory can be read with no errors
(Must have root access)
root@mx80% dd if=/dev/da0 of=/dev/null bs=1m
3920+0 records in
3920+0 records out
4110417920 bytes transferred in 367.623269 secs (11181060 bytes/sec)
The output above shows no errors. A test that encounters errors would log a
message such as the following:
Feb 20 12:24:15 kernel: da0: FAILURE - READ status=51<READY,DSC,ERROR>
error=10<NID_NOT_FOUND> LBA=18446
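One way to sanity-check a dd summary is to confirm that the record count multiplied by the block size equals the bytes transferred, and that no I/O error was reported. The following Python sketch illustrates that arithmetic; the function names are hypothetical, and the two sample strings are copied from the dd examples in this chapter.

```python
import re

BLOCK_SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def block_size(bs):
    """Convert a dd bs= value such as '1m' or '4k' to bytes."""
    bs = bs.lower()
    if bs[-1] in BLOCK_SUFFIXES:
        return int(bs[:-1]) * BLOCK_SUFFIXES[bs[-1]]
    return int(bs)


def check_dd(output, bs):
    """Summarize dd output captured from a read test run with the given bs."""
    rec_in = re.search(r"(\d+)\+(\d+) records in", output)
    rec_out = re.search(r"(\d+)\+(\d+) records out", output)
    xfer = re.search(r"(\d+) bytes transferred in ([\d.]+) secs", output)
    if not (rec_in and rec_out and xfer):
        raise ValueError("not a complete dd summary")
    full_in, partial_in = int(rec_in.group(1)), int(rec_in.group(2))
    total = int(xfer.group(1))
    return {
        "bytes": total,
        "rate": total / float(xfer.group(2)),
        "records_match": rec_in.groups() == rec_out.groups(),
        "size_consistent": partial_in == 0 and full_in * block_size(bs) == total,
        "io_error": "Input/output error" in output,
    }


GOOD = """3920+0 records in
3920+0 records out
4110417920 bytes transferred in 367.623269 secs (11181060 bytes/sec)
"""

BAD = """dd: /dev/rad0: Input/output error
8200+0 records in
8200+0 records out
33587200 bytes transferred in 25.538337 secs (1315168 bytes/sec)
"""

if __name__ == "__main__":
    print(check_dd(GOOD, "1m"))
    print(check_dd(BAD, "4k"))
```

Note that the failing run still prints a consistent summary for the blocks it managed to read, so the error message, not the arithmetic, is what marks the test as failed.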
Verifying RE Memory Integrity

Although most types of RE memory errors are automatically detected and logged, there might be times when you want to
manually confirm the integrity of the RE's compact-flash or hard drive. For example, before doing major software maintenance,
such as an upgrade or a system snapshot, you might want to first confirm that the RE's storage devices are error free because
errors that are detected while performing software maintenance might lead to a router that will not boot.
dd Utility
The integrity of a storage device is determined by escaping to a root shell and using the dd utility to confirm that all blocks can
be read. This approach is typically used to test the compact flash, but it can also be used on the hard drive and RE memory by
specifying the correct device and switches. The following example shows a compact-flash test (device ad0 in this example) that
fails with a read error:

root@router% dd if=/dev/rad0 of=/dev/null bs=4k
ad0: HARD READ ERROR blk# 65600 status=59 error=40
ad0 removed from the Boot List
dd: /dev/rad0: Input/output error
8200+0 records in
8200+0 records out
33587200 bytes transferred in 25.538337 secs (1315168 bytes/sec)
Note that as a result of the read error, the compact-flash device has automatically been removed from the list of available boot
devices. You can use the sysctl -a command to display the current list of boot devices:
root@router% sysctl -a | grep bootdevs
machdep.bootdevs: pcmcia-flash,disk,lan
In this example you can see that the compact-flash device is no longer in the boot list. If you believe that a boot device has been
incorrectly removed from the boot list, or that the error condition has been resolved, you might have to manually add that device
back into the boot listing. Note that this addition automatically occurs when you reinstall the Junos OS from removable media.
To manually add a device back to the boot list, use the sysctl -w command at a root shell:
root@router% sysctl -w machdep.bootdevs=pcmcia-flash,compact-flash,disk,lan
machdep.bootdevs: pcmcia-flash,disk,lan -> pcmcia-flash,compact-flash,disk,lan
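When you rebuild the boot list by hand, the order of the devices matters. The following Python sketch shows one way to compose the value for sysctl -w; the function names are hypothetical, and the device order is an assumption taken from the restored list in the example above.

```python
# Assumed device order, taken from the restored boot list in the example above.
BOOT_ORDER = ["pcmcia-flash", "compact-flash", "disk", "lan"]


def restored_bootdevs(current, device):
    """Return the boot list with the device added back in its usual position."""
    if device not in BOOT_ORDER:
        raise ValueError(f"unknown boot device: {device!r}")
    devices = [d for d in current.split(",") if d]
    if device not in devices:
        devices.append(device)
    # Known devices sort by their assumed position; unknown names stay at the end.
    devices.sort(key=lambda d: BOOT_ORDER.index(d) if d in BOOT_ORDER else len(BOOT_ORDER))
    return ",".join(devices)


def sysctl_command(current, device):
    """Compose the sysctl -w command line that restores the device."""
    return "sysctl -w machdep.bootdevs=" + restored_bootdevs(current, device)


if __name__ == "__main__":
    print(sysctl_command("pcmcia-flash,disk,lan", "compact-flash"))
```

The sketch only builds the command string; you would still run the command yourself at a root shell.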

Hard Drive Testing

• SMART status checked routinely by smartd
• Enabled at boot time on all hard drives
• smartd continuously monitors the hard drive for:
• Mechanical parameters
• Data transfer operations
• Problems logged to /var/log/messages
• Chassis alarms will reflect smartd issues
• Use show chassis alarms
• You can also initiate a smartd self-test from the CLI
using a hidden CLI command
user@mx80> request chassis routing-engine hard-disk-test ?
disk Name of hard disk
Hard Drive Testing with smartd

Junos OS routers use Self-Monitoring, Analysis, and Reporting Technology (SMART)-enabled hard drives. smartd is a daemon
process included in the Junos OS that uses the SMART instrumentation on the drive to provide early warning of potential
hard-drive failure. A failing drive commonly exhibits minor symptoms that SMART catches before a total
failure occurs. When smartd detects such errors, it reports them through syslog (to the messages file by default), which gives
you the opportunity to deal with the issue gracefully:
Feb 26 01:44:26 London smartd[566]: Device: /dev/ad1a, Failed attribute: [ 18 ]
Chassis Alarms
You can also determine whether smartd has detected any problems by using the show chassis alarms command:
lab@sneaky-re0> show chassis alarms
1 alarms currently active
2010-10-13 13:04:23 PDT Minor Host 0 hard-disk drive error

Initiating Self Tests
You can use the command on the slide to initiate a SMART-based self-test of the hard drive. The following are sample tests:
Successful self-initiated test:
root@mxB-1> request chassis routing-engine hard-disk-test short disk /dev/ad2
Drive Command Successful, Short Self test has begun
Please wait 1 minutes for test to complete
Use smartd -oA to abort test
root@mxB-1> request chassis routing-engine hard-disk-test show-status disk /dev/ad2

Device: ST940817SM Supports ATA Version 7, Firmware version 3.AAB
ATA/ATAPI revision 7
device model ST940817SM
serial number 5RQ04NKV
firmware revision 3.AAB
cylinders 16383
heads 16
sectors/track 63
lba supported 21248 sectors
lba48 supported -4630047295675542784 sectors
dma supported
overlap not supported

Feature                        Support  Enable  Value       Vendor
write cache                    yes      yes
read ahead                     yes      yes
dma queued                     no       no      31/1F
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      yes     32960/80C0
automatic acoustic management  no       no      0/00        254/FE

Drive supports SMART and is enabled
Check SMART Passed

General Smart Values:
Off-line data collection status: (0x82) Offline data collection activity
                                        completed without error
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run
Total time to complete off-line
data collection:                 ( 426) Seconds
Offline data collection
Capabilities:                    (0x5b) SMART EXECUTE OFF-LINE IMMEDIATE
                                        Automatic timer ON/OFF support
                                        Suspend Offline Collection upon new
                                        command
                                        Offline surface scan supported
                                        Self-test supported
Smart Capablilities:           (0x0003) Saves SMART data before entering
                                        power-saving mode
                                        Supports SMART auto save timer
Error logging capability:        (0x01) Error logging supported
Short self-test routine
recommended polling time:        (   1) Minutes
Extended self-test routine
recommended polling time:        (  55) Minutes

Vendor Specific SMART Attributes with Thresholds:
Revision Number: 10
Attribute                     Flag    Value  Worst  Threshold  Raw Value
(1)Raw Read Error Rate        0x000f  100    253    006        000000000000
(3)Spin Up Time               0x0003  099    099    000        000000000000
(4)Start Stop Count           0x0032  097    097    020        000000000e01
(5)Reallocated Sector Ct.     0x0033  100    100    036        000000000000
(7)Seek Error Rate            0x000f  100    253    030        000000083ddc
(9)Power On Hours Count       0x0032  096    096    000        000000001032
(10)Spin Retry Count          0x0013  100    100    034        000000000000
(12)Power Cycle Count         0x0032  100    100    020        00000000000c
(187)Unknown Attribute        0x0032  100    100    000        000000000000
(189)Unknown Attribute        0x003a  100    100    000        000000000000
(190)Unknown Attribute        0x0022  061    054    015        0000291f0027
(192)Power Off Retract Count  0x0032  100    100    000        00000000000b
(193)Load/Unload Cycle Count  0x0032  099    099    000        000000000e0f
(194)Device Temperature       0x0022  039    046    000        001500000027
(195)Unknown Attribute        0x001a  068    066    000        00000646eaff
(197)Current Pending Sec. Ct  0x0012  100    100    000        000000000000
(198)Offline Uncorrectable    0x0010  100    100    000        000000000000
(199)UDMA CRC Error Count     0x003e  200    200    000        000000000000
(200)Write Error Rate         0x0000  100    253    000        000000000000
(202)Vendor Unique            0x0032  100    253    000        000000000000

SMART Error Log:
SMART Error Logging Version: 1
No Errors Logged

SMART SelfTest Log:
SMART SelfTest Logging Version: 1
Selftest Type  Status      Failure-LBA  Timestamp
Short          Successful  None         4146
Extended       Successful  None         4130
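When reading the attribute table, the general SMART convention is that an attribute is considered failed when its normalized value falls to or below its threshold. The following Python sketch applies that rule to attribute lines; the function names are hypothetical, the parser assumes the cleaned-up line format "(id)Name flag value worst threshold raw", and the failing sample line is invented for illustration.

```python
import re

# (id)Name  flag  value  worst  threshold  raw-value
ATTR_LINE = re.compile(
    r"\(\s*(\d+)\)\s*(.+?)\s+0x([0-9a-fA-F]{4})"
    r"\s+(\d{3})\s+(\d{3})\s+(\d{3})\s+([0-9a-fA-F]{12})$"
)


def parse_attribute(line):
    """Parse one attribute line; return None if the line is not an attribute."""
    m = ATTR_LINE.match(line.strip())
    if m is None:
        return None
    ident, name, _flag, value, worst, threshold, raw = m.groups()
    return {
        "id": int(ident),
        "name": name,
        "value": int(value),
        "worst": int(worst),
        "threshold": int(threshold),
        "raw": int(raw, 16),
    }


def failing_attributes(lines):
    """Return attributes whose normalized value is at or below the threshold."""
    parsed = (parse_attribute(line) for line in lines)
    return [a for a in parsed if a is not None and a["value"] <= a["threshold"]]


HEALTHY = [
    "(9)Power On Hours Count 0x0032 096 096 000 000000001032",
    "(190)Unknown Attribute 0x0022 061 054 015 0000291f0027",
]
# Invented example of a failing attribute, not taken from the capture above.
FAILING = ["(5)Reallocated Sector Ct. 0x0033 030 030 036 000000000123"]

if __name__ == "__main__":
    print(failing_attributes(HEALTHY))
    print(failing_attributes(FAILING))
```

As a cross-check on the capture, the raw Power On Hours value 0x1032 is 4146 in decimal, which matches the timestamp of the short self-test in the self-test log.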


Boot Monitoring

The Default Boot Sequence Review

[Figure: default boot sequence flowchart. The system tries the emergency boot media first, then the primary boot media (solid-state flash disk), then the secondary boot media (rotating or SSD hard disk, or solid-state flash disk).]
MX80 Boot Sequence:
1. USB (da2)
2. Internal NAND flash #1 (da0)
3. Internal NAND flash #2 (da1)
• Hardware controlled
• Software notifies hardware when boot completes
Hardware Controls the Boot Sequence

At power-on, as the router begins the boot process, it first attempts to start the software image on the emergency boot
media if it is installed in the RE. If this attempt fails or no media is installed, the router next tries to boot from the software
image on the flash drive and then, finally, from the hard drive.
This sequence is controlled by hardware that waits for a special signal from the Junos OS kernel, indicating a successful boot. If
the hardware does not receive the signal after a few minutes, it forces the system to boot from the next available device in the
boot chain.
Note that the MX80 3D Ethernet Service Router uses universal serial bus (USB) install media instead of PCMCIA flash. This
platform has the following boot sequence:
USB;
Internal NAND flash #1; and
Internal NAND flash #2.

Boot Devices and Media

• Three boot media options:
• Emergency boot media
• Used for install and upgrade, normally not present
• Flash disk
• Solid-state non-rotating media
• Primary source for booting software
• Hard disk
• Traditional rotating media or SSD, compact flash on some devices
• Secondary source for booting software
• CLI option to set boot source at next reboot:
M10i Example
user@m10i> request system reboot media ?
Possible completions:
compact-flash Standard boot off flash device
disk Boot off hard disk
SRX210 Example
user@srx210> request system reboot media ?
Possible completions:
internal Boot from internal NAND flash
usb Boot off USB device
Three Forms of Boot and Storage Media

Junos platforms generally support three forms of boot and storage media:
Emergency boot media: Depending on router model, your router might have a PCMCIA card slot (which reads flash
cards), a compact-flash slot, or a USB port. A copy of the Junos OS on removable media is shipped with most
routers.
Flash drive (nonrotating drive): Most Junos platforms are shipped with the Junos OS preinstalled on the flash drive.
The flash drive is the primary boot device for most Junos OS platforms.
Hard drive (rotating drive or SSD): On most new Junos platforms, we preinstall a backup copy of the Junos OS on
the hard drive. This drive is also used to store system log files and diagnostic dump files.
A Junos router typically boots either from the flash drive or from the hard drive. (Although it is possible to boot the router from
the removable media drive, this is not typically done.) We refer to these drives as the boot media. The drive from which the
router boots is called the primary boot medium, and other drives are secondary boot media. The primary boot medium is
generally the flash drive, and the secondary boot medium is generally the hard drive.

Viewing Boot Messages

root@mx80> show system boot-messages
platform_early_bootinit: MX-PPC Series Early Boot Initialization
mxppc_set_re_type: hw.board.type is MX80
mxppc_set_re_type: REtype:78, model:mx80, model:MX80, i2cid:2447
WDOG initialized
Copyright (c) 1996-2012, Juniper Networks, Inc.
All rights reserved.
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
JUNOS 12.2R2.5 #0: 2012-11-15 14:13:01 UTC
builder@faranth.juniper.net:/volume/build/junos/12.2/release/12.2R2.5/obj-powerpc/junos/bsd/kernels/JUNIPER-PPC/kernel
WARNING: debug.mpsafenet forced to 0 as ipsec requires Giant
Timecounter "decrementer" frequency 66666666 Hz quality 0
cpu0: Freescale e500v2 core revision 3.0
cpu0: HID0 80004000<EMCP,TBEN>
real memory = 2122317824 (2024 MB)
avail memory = 2081423360 (1985 MB)
Security policy loaded: JUNOS MAC/runasnonroot (mac_runasnonroot)
Security policy loaded: JUNOS MAC/pcap (mac_pcap)
ETHERNET SOCKET BRIDGE initialising
Initializing M/T platform properties ..
da1: <ATP ATP IG eUSB SSD 1100> Fixed Direct Access SCSI-0 device
da1: 40.000MB/s transfers
da1: 3920MB (8028160 512 byte sectors: 255H 63S/T 499C)
da0 at umass-sim0 bus 0 target 0 lun 0
da0: <ATP ATP IG eUSB SSD 1100> Fixed Direct Access SCSI-0 device
da0: 40.000MB/s transfers
da0: 3920MB (8028160 512 byte sectors: 255H 63S/T 499C)
Trying to mount root from ufs:/dev/da0s1a
Viewing Boot Messages

The slide shows an example of an operator displaying the contents of the boot log by issuing a show system
boot-messages command. The Junos OS writes this file during system boot, and the file contains the various boot-up
messages generated during the last power cycle/boot or reboot.
In some cases the Junos OS reports hardware errors and device malfunctions at boot time as the system brings itself up. The
truncated capture on the slide does not show any abnormal events.

Viewing the Installed Software
• Use show system software to verify the
installed software version
user@mx80> show system software
Information for jbase:
Comment:
JUNOS Base OS Software Suite [12.2R2.5]
Information for jcrypto:
Comment:
JUNOS Crypto Software Suite [12.2R2.5]
Information for jdocs:
Comment:
JUNOS Online Documentation [12.2R2.5]
Information for jkernel:
Comment:
JUNOS Kernel Software Suite [12.2R2.5]
Installed Software
The slide shows the use of the show system software command, which displays the details of the currently installed
version of the Junos OS.

Installation from Emergency Boot Media Is Not Common

In the vast majority of cases, you perform software upgrades and downgrades with jbundle or jinstall packages. In some cases
you might want to return a system to a factory state, or you might want to have the router's flash and hard drive devices
reformatted to recover from some type of file system corruption. In these cases you will want to reinstall the Junos OS from
removable media.
Preparation
Before you install the Junos OS, you must perform the following steps:
1. You should have console access to the router so that you can observe installation messages, and so that you can
log in (as root) after the installation. Note that a factory installation supports root logins from the console port only.
2. If you plan to use the existing configuration after software reinstallation, you must take steps to copy the existing
configuration to a remote location. The active configuration file is /config/juniper.conf. This file can be
transferred using FTP to a safe location. Alternatively, you can display the current configuration for copying into a
terminal emulation buffer where it can be pasted into a word-processing program and saved as a text file.
3. Ensure that you have a Juniper Networks installation PCMCIA card or USB drive with the desired software image.

Reinstallation from Emergency Boot Media

• Reinstallation steps:
• Insert emergency boot media into Routing Engine
• PCMCIA flash card or USB flash
• Power cycle the router
• Issue a request system halt from the console
• Power-cycle router
• Follow prompts
• System reboots automatically after installation completes
• Restore original configuration
• Reconfigure management port to facilitate transfer of saved
configuration using FTP/SCP
• Use load override file-name to load configuration file
• Or, use load override terminal to paste configuration from
a terminal emulation capture buffer
• Commit the restored configuration
• Remove the emergency boot media
Reinstallation from Emergency Boot Media

Perform the following steps to reinstall the Junos OS from the removable medium:
1. Insert installation medium into the RE (USB, PCMCIA, or compact flash).
2. Halt the operating system using the CLI from the console:
root@host> request system halt
3. Power-cycle the router and follow the on-screen prompts. The system reboots automatically after the installation
completes.
4. Restore the original configuration. The procedures will vary based on whether you plan to load a saved
configuration file versus whether you plan to paste the configuration into the router from a capture buffer. In the
former case, you must configure the fxp0 out-of-band (OoB) management interface to restore connectivity so that
you can use FTP or secure copy (scp) to transfer a copy of the original configuration back to the local router. Once
copied, use the load override file-name command to load the configuration file as a candidate
configuration. Use the load override terminal command to paste the configuration into the router from a
terminal emulation program's buffer. In both cases you must issue a commit to activate the restored
configuration.
You should always remove the removable medium after reinstallation is completed to ensure that the router will reboot normally
in the event of a shutdown or power failure. Note that the router pauses for user input when booting from the removable
medium, so leaving the medium inserted can result in disruption caused by an inoperative router.

Update Image on Removable Media

• Updating process:
• Use FTP to transfer the desired medium image to the
router's /var/tmp directory
• Insert the removable medium into the router's drive
• Escape to the shell and su to root
• Change to the /var/tmp directory, and issue the following
commands:
root@mx80% dd if=/dev/zero of=/dev/externalDrive count=20
root@mx80% dd if=/var/tmp/installMedia of=/dev/externalDrive bs=64k
externalDrive = name of removable media drive (da2 on MX80)
installMedia = name of installation media placed in /var/tmp
• See the Juniper Networks website for syntax used with other
software releases or media types
Updating the Removable Medium

Updating the removable medium on a periodic basis is a good idea. When the medium is needed, the image stored on it should
match the version of software that is currently in production. To upgrade the image on your removable medium, perform these
steps:
1. Download the desired medium image from the Juniper Networks software download site (a login account is
needed). Once the image is downloaded, it should be moved to the
/var/tmp directory on the router that will be used to perform the image update.
2. Ensure that the removable medium is correctly inserted into the router; pay special attention to the label that says
"insert this side up" because installing the medium incorrectly can damage the extractors used to remove the
PCMCIA card.
3. Escape to a shell and su to root.
4. Change into the /var/tmp directory and use the dd command to first erase the medium, and then again to write
the image file to the medium. Specifics can vary, so consult the Juniper Networks support site for media and
image-specific instructions.
5. When completed, be sure to remove the medium to ensure that it will not interfere with normal operation at the
next reboot.

Backing Up Existing Software

• Back up system software and configuration to rotating
disk:
• Before major upgrade to ensure system recovery if
necessary
• After upgrade when system is judged to be stable
• CLI request system snapshot command
• When booted from the primary boot disk, this command
copies the environment to the secondary boot disk
• When booted from the secondary boot disk, this command
copies the environment to the router's primary boot disk
• Can also direct hard drive to mirror contents of flash with
the mirror-flash-on-disk statement
• Incompatible with the request system snapshot
command
Backup Options
In the event of a failure on the flash drive, the router can boot from the hard drive. It is possible to have one version of the Junos
OS on the flash drive and another version of the Junos OS on the hard drive. What if you want to ensure that the flash
drive and hard drive versions of the Junos OS are exactly the same?
Requesting a System Snapshot

When the router boots from the flash drive, the request system snapshot command mirrors the contents of the flash
drive onto the hard drive. When the router boots from the hard drive, this command mirrors the hard drive environment to the
router's flash memory.
You should back up software before you upgrade the Junos OS. Or, after you upgrade the software on the router and are
satisfied that the new packages are successfully installed and running, you should consider issuing the request system
snapshot command to back up the software onto the
/altroot and /altconfig file systems, located on the router's hard drive.
Specifically, the root file system (/) is backed up to /altroot, and /config is backed up to
/altconfig. Normally, the root and /config file systems are on the router's flash drive, and the
/altroot and /altconfig file systems are on the router's hard drive.


In general, system snapshots are best used to preserve a known good environment when performing upgrades or downgrades
on the router's flash memory. In these cases, having the previous environment backed up on the rotating media allows you to
return the router to its previous state if the flash-based upgrade or downgrade should fail or exhibit operational problems.
You can reissue the request system snapshot command to restore the router's flash from the image saved to the hard
drive after the router has booted from the alternative medium.
In contrast, mirroring the router's flash onto the rotating medium is useful in the event that the router's flash memory becomes
corrupted or unusable, as the router can now continue operation using the mirrored image present on the hard drive. Without
disk mirroring, a failure of the router's flash memory results in a reboot. In this case, the router will reboot from alternative
media using the binary image and configuration that was written to the hard drive during the last snapshot operation.
Disk Mirroring
You can direct the hard drive to mirror the contents of the compact flash automatically. When you issue the
mirror-flash-on-disk statement at the [edit system] hierarchy, the hard drive maintains a synchronized mirror copy
of the compact-flash contents. Data written to the compact flash is simultaneously updated in the mirrored copy of the hard
drive. If the flash drive fails to read data, the hard drive automatically retrieves its mirrored copy of the flash disk.
We recommend that you disable flash disk mirroring when you upgrade or downgrade the router. You cannot issue the
request system snapshot command when you enable flash disk mirroring. After you have enabled or disabled the
mirror-flash-on-disk statement, you must reboot the router for your changes to take effect. To reboot, issue the
request system reboot command.

Verify System Snapshot
• Use the show system snapshot command to
determine the status of the current snapshot
user@mx80> show system snapshot
Information for snapshot on internal (da1s1)
creation date: Nov 13 20:37:54 2012
JUNOS version on snapshot:
jbase  : ppc-12.2R1.3
jcrypto: ppc-12.2R1.3
jdocs  : 12.2R1.3
jkernel: ppc-12.2R1.3
jpfe   : MX80-12.2R1.3
jroute : ppc-12.2R1.3
View the System Snapshot

To view the system snapshot, issue the show system snapshot command. The output of the command shows the
storage location of the snapshot and when it last occurred.


Hardware-Related System Logs


Parsing System Logs

• The CLI pipe function makes parsing log files easy
• Search the messages and chassisd logs for entries like
fail, kernel, core, error, and so forth
• Use quotes and the pipe function to search for multiple items:
show log messages | match "fpc|sfm|kernel|tnp"
• Can you describe the nature of the hardware fault from this
log entry?
user@m320> show log messages | match fail
Jan 8 16:33:16 Bangkok chassisd[2850]: snmp ipc-try-connect: connect to master (unix
sock) failed: Connection refused, retry in -1
Feb 2 21:28:12 Bangkok-re1 chassisd[4446]: CHASSISD_BLOWERS_SPEED_FULL: Fans and
impellers being set to full speed [fan/blower missing/failed]
Parsing System Logs

We covered the configuration of system logging and tracing in a previous chapter, as well as the CLI commands that you use to
view system logs and to monitor changes to log files in real time. Our intent is to simply remind you that the Junos CLI's match
and find functions make parsing system log files for signs of trouble easy and effective.
The slide shows examples of how you can use the CLI to rapidly locate signs of trouble within a given log file.
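If you work with log files that have been copied off the router, you can apply the same kind of filter offline. The following Python sketch approximates the match function with a regular expression search; the function name is hypothetical, case-insensitive matching is an assumption about how the CLI behaves, and the last sample line is invented.

```python
import re


def match(lines, pattern):
    """Return the lines that contain a match for the pattern, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if rx.search(line)]


LOG = [
    "Jan 8 16:33:16 Bangkok chassisd[2850]: connect to master (unix sock) failed: Connection refused",
    "Feb 2 21:28:12 Bangkok-re1 chassisd[4446]: CHASSISD_BLOWERS_SPEED_FULL: "
    "Fans and impellers being set to full speed [fan/blower missing/failed]",
    "Feb 20 12:24:15 kernel: da0: FAILURE - READ status=51",
    # Invented line that should not match either pattern below.
    "Feb 21 09:00:00 Bangkok mgd[5000]: user logged in",
]

if __name__ == "__main__":
    for line in match(LOG, "fpc|sfm|kernel|tnp"):
        print(line)
    print(len(match(LOG, "fail")), "lines mention a failure")
```

The alternation in the quoted pattern works the same way here: a line is kept if any one of the listed strings appears in it.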

chassisd Logs
• The chassis daemon (chassisd) maintains log
entries as chassis-related events occur
user@mx240> show log chassisd
Sep 21 18:43:43
Dec 5 20:38:32 send: clear all FPC 1 alarms
Dec 5 20:38:32 CHASSISD_FRU_EVENT: set_fpc_online: restarted FPC 1
Dec 5 20:38:33 pic online req, pic 0 type 761, fpc 0
Dec 5 20:38:33 fpc slot 0 pic_present 0x0 => 0x1
Dec 5 20:38:33 fpc_send_pic_online_ack: fpc 0 pic 0 pic_type 0x2f9 msg_len 64 tlv_len 0
Dec 5 20:38:33 pic_get_egress_shaping_overhead: 0/0 esc val = 20
Dec 5 20:38:33 CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 8,
jnxFruL1Index 1, jnxFruL2Index 1, jnxFruL3Index 0, jnxFruName PIC: @ 0/0/*, jnxFruType 11,
jnxFruSlot 0, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 5865)
user@mx240> help syslog CHASSISD_FRU_EVENT
Name: CHASSISD_FRU_EVENT
Message: <function-name>: <state> <fru-name> <fru-slot>
Help: FRU changed state
Description: The state of the indicated component (field-replaceable unit, or FRU) changed as
indicated.
Type: Event: This message reports an event, not an error
You can also view Junos OS technical documentation to determine chassisd message
definitions at: http://www.juniper.net/techpubs/software/junos/junosmsyslog-messages/
chassisd-system-log-messages.html
Chassis Daemon Logs

You can parse the chassis daemon logs to view the details and time lines for hardware events that have occurred. Each log
message generated by chassisd has a name. The slide shows that a restarting FPC was logged by chassisd using the
log type named CHASSISD_FRU_EVENT. To determine the details of any log type, use the help syslog log-name
command. Some platforms do not support this command, but the same information can be found on the Juniper Networks
support website.


Chassis and Environmental Monitoring


Visible Activity at Startup-Typical
• Craft interface LCD display:

• Idle mode: Cycles through various status displays
• Alarm mode: Displays alarms in order of severity
• Craft interface LEDs:
• LEDs for FPCs, DPCs, PICs, RE, CBs, and others
• Blinking green indicates test is in progress
• Solid green indicates success; solid red indicates failure
• Craft interface: Front panel alarm LEDs
[Figure: The T640 Craft Interface Panel]
Craft Interface LCD

As the router boots, the current status of the boot process is displayed on the craft interface LCD on platforms supporting the
LCD display.
Craft Interface LEDs

A series of diagnostic tests is performed on the FPCs during the boot process. Blinking LEDs indicate tests in progress. They
become solid after conclusion of the testing period. Depending upon the platform, the craft interface might also support LEDs
for the host module status. The host module consists of an RE and a Control Board (CB) or Switch Interface Board (SIB).
Alarm LEDs Illuminate as Needed

Should any red or yellow alarms be declared, the corresponding alarm LED is illuminated on the craft interface. To see the
specifics relating to a given alarm, you can look at the LCD on the craft interface (when present) or use the command show
chassis alarms.

Power Supply and PEM Indicators

• Power supplies and PEMs have their own status
indicators
• Some platforms require at least 60 seconds for proper status
indications
• Each power supply also has LEDs on the craft interface or
front panel
[Figure: T640 power supply status indicators]
Power Supply and Power Entry Module LEDs

Depending upon the platform and power supply model, each power entry module (PEM) has one or more status LEDs that
you can use to determine whether a power supply is functioning normally. Note that for some platforms you must wait at least
60 seconds after applying power to a power supply before you can expect to see meaningful status indications. The self-test
button present on some power supplies should never be used on a production system; this button is for engineering and Juniper
Networks Technical Assistance Center (JTAC) use only.
Power Supply LEDs and Troubleshooting

The following information is adapted from the M320 hardware manual and is representative of typical power supply status
indicators and fault-isolation procedures.
To verify that a power supply is functioning normally, perform the following steps:
Check the OUTPUT OK LED on each power supply faceplate (or the corresponding POWER OK LED on the craft
interface). If this LED is on, the power source is good and the power supply is functional.
Check the display on the craft interface. The Junos OS constantly updates the screen with status information for
each component.


If a power supply is not functioning normally, perform the following steps to diagnose and correct the problem:
If the OUTPUT OK power supply LED is off, check the red alarm LED on the craft interface. The Junos OS monitors
the system temperature, and if it exceeds a certain limit, the software triggers a red alarm, a condition that shuts
down the power supplies.
Check the display on the craft interface. The Junos OS constantly updates the screen with status information for
each component. On the display and in the CLI, the lower power supply is referred to as PEM 0, and the upper
power supply is referred to as PEM 1.
If the OUTPUT OK power supply LED is off and no red alarm condition exists, check that the circuit breaker is
switched to the on position (I).
Verify that the source circuit breaker has the proper current rating. Each power supply must be connected to a
separate source circuit breaker.
Verify that the power cords from the power source to the router are not damaged. If the insulation is cracked or
broken, immediately replace the cord or cable.
Connect the power supply to a different power source with a new power cord or power cables. If the power supply
OUTPUT OK LED still does not light, the power supply is the source of the problem. Replace the power supply.
If the OUTPUT OK LED on the installed spare lights, the replaced power supply is faulty, and you should return it for
replacement, as described in Contacting Customer Support and Returning Hardware.
If you cannot determine the cause of the problem or need additional assistance, see Juniper Networks Technical
Assistance Center.

Displaying Alarm Conditions
• Use the show chassis alarms command to display any
currently active hardware alarms
user@mx240> show chassis alarms

1 alarm is currently active

2010-11-15 21:30:07 UTC Major Power Supply B not providing power
Listing Alarm Conditions

The show chassis alarms command lists all of the alarm conditions that currently exist in the router. You can disable some
alarms; however, you cannot disable safety-related and chassis component alarms.
Pressing the alarm cutoff button located on the craft interface manually silences the alarm to an external device connected to
the alarm relay, but it does not remove the alarm messages from the display (if present on the router) nor extinguish the alarm
LEDs. In addition, new alarms that occur after silencing an external device reactivate the external device.
The show chassis alarms output fields are:
Alarm time: Displays the date and time the alarm was first recorded;
Class: Displays the severity class for this alarm (it can be minor or major); and
Description: Displays the information about the alarm.
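The three output fields listed above can be pulled out of a capture with a short script. The following is a minimal sketch, assuming each alarm line follows the layout shown on the slide (timestamp, class, description); the names Alarm and parse_chassis_alarms are illustrative and are not part of the Junos OS.

```python
import re
from typing import List, NamedTuple


class Alarm(NamedTuple):
    time: str          # "Alarm time" field, kept as text
    alarm_class: str   # "Class" field: Major or Minor
    description: str   # "Description" field


# Assumed line layout, taken from the slide capture:
# 2010-11-15 21:30:07 UTC Major Power Supply B not providing power
ALARM_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+)\s+(Major|Minor)\s+(.+)$"
)


def parse_chassis_alarms(output: str) -> List[Alarm]:
    """Return one Alarm per line that matches the assumed layout."""
    alarms = []
    for line in output.splitlines():
        match = ALARM_LINE.match(line.strip())
        if match:
            alarms.append(Alarm(*match.groups()))
    return alarms


sample = """\
1 alarm is currently active

2010-11-15 21:30:07 UTC Major Power Supply B not providing power
"""
alarms = parse_chassis_alarms(sample)
for alarm in alarms:
    print(alarm.alarm_class, "|", alarm.description)
```

Lines that do not match the assumed layout, such as the alarm count, are skipped rather than reported as errors.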

Displaying the Craft Interface

{master}
user@t640> show chassis craft-interface
[Slide capture: two-column output whose layout and LED states did not survive extraction. The FPM Display contents box shows the router name, "Up: 21+04:07", and "Temperature OK". The remaining sections are Front Panel System LEDs (Routing Engine: OK, Fail, Master), Front Panel Alarm Indicators (Red LED, Yellow LED, Major relay, Minor relay), Front Panel FPC LEDs (Red, Green), CB LEDs (Amber, Green, Blue), SCG LEDs (Amber, Green, Blue), and SIB LEDs (Red, Green).]
Displaying Contents of LCD Display

The show chassis craft-interface command shows all current messages. The capture shown was taken from a T640
system. Output fields include the following:
FPM Display contents: Displays contents of the Front Panel Module (FPM) display.
router-name: Shows the name of the router.
Up: Shows how long the router has been operational in days, hours, minutes, and seconds.
message: Displays information about the router traffic load, the power supply status, the fan status, and the
temperature status. The display of this information changes every 2 seconds.
Front Panel System LEDs: Displays status of the Front Panel System LEDs. A dot (.) indicates the LED is not
lit. An asterisk (*) indicates the LED is lit.
Front Panel Alarm Indicators: Displays status of the Front Panel Alarm Indicators. A dot indicates the
relay is off. An asterisk indicates the relay is active.
Front Panel FPC LEDs: Displays status of the Front Panel FPC LEDs. A dot indicates the LED is not lit. An
asterisk indicates the LED is lit.
MCS, SFM, SCG, CB, and SIB LEDs: Displays status of the Miscellaneous Control Subsystem (MCS), SONET Clock
Generator (SCG), CB, SFM, and SIB LEDs as supported by a given platform. A dot indicates the LED is not lit. An
asterisk indicates the LED is lit. When neither a dot nor an asterisk is displayed, no board is in that slot.

Displaying Environmental Information

{master}
user@t640> show chassis environment
Class Item                        Status   Measurement
Temp  PEM 0                       OK       27 degrees C / 80 degrees F
      PEM 1                       OK       27 degrees C / 80 degrees F
      SCG 0                       OK       35 degrees C / 95 degrees F
      SCG 1                       OK       34 degrees C / 93 degrees F
      Routing Engine 0            OK       31 degrees C / 87 degrees F
      Routing Engine 1            OK       30 degrees C / 86 degrees F
      CB 0                        OK       34 degrees C / 93 degrees F
      CB 1                        OK       36 degrees C / 96 degrees F
      SIB 0                       OK       38 degrees C / 100 degrees F
      SIB 1                       OK       33 degrees C / 100 degrees F
      SIB 2                       OK       38 degrees C / 100 degrees F
      SIB 3                       OK       39 degrees C / 102 degrees F
      SIB 4                       OK       39 degrees C / 102 degrees F
      FPC 1 Top                   Testing
      FPC 1 Bottom                Testing
      FPC 3 Top                   OK       43 degrees C / 109 degrees F
      FPC 3 Bottom                OK       30 degrees C / 86 degrees F
      FPC 5 Top                   OK       43 degrees C / 109 degrees F
      FPC 5 Bottom                OK       30 degrees C / 86 degrees F
      FPM GBUS                    OK       28 degrees C / 82 degrees F
      FPM Display                 OK       31 degrees C / 87 degrees F
Fans  Top Left Front fan          OK       Spinning at normal speed
Displaying Environmental Information

The show chassis environment command displays environmental information about the router chassis, including the
temperature and information about the fans, power supplies, and RE.
The output fields are the following:
Power: Displays information about each power supply. Status can be OK, Testing (during initial power-on),
Failed, or Absent.
Temp: Displays the temperature of air flowing through the chassis.
Fans: Displays information about the fans. Status can be OK, Testing (during initial power-on), Failed, or
Absent. Measurement indicates whether the fans are spinning at normal or high speed.
Other: Depending upon the platform, various other fields might be present. For example, for T Series Core Router
platforms, the display includes information on the CB, SIBs, and the Switch Processor Mezzanine Board (SPMB)
and the Connector Interface Panel (CIP). These fields are not shown in the sample on the slide.
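When the environment listing is long, it can help to filter it down to the items whose status is something other than OK. The following is a minimal sketch, assuming the line layout shown on the slide and the status words listed above (OK, Testing, Failed, Absent); the function name items_not_ok is illustrative only.

```python
import re
from typing import List, Tuple

# Assumption: an optional class word (Temp, Fans, Power) comes first,
# then the item name, then one of the status words named in the notes.
ENV_LINE = re.compile(
    r"^(?:(?:Temp|Fans|Power)\s+)?(?P<item>.+?)\s+"
    r"(?P<status>OK|Testing|Failed|Absent)\b"
)


def items_not_ok(output: str) -> List[Tuple[str, str]]:
    """Return (item, status) pairs for every item that is not OK."""
    flagged = []
    for line in output.splitlines():
        match = ENV_LINE.match(line.strip())
        if match and match.group("status") != "OK":
            flagged.append((match.group("item"), match.group("status")))
    return flagged


sample = """\
Class Item Status Measurement
Temp PEM 0 OK 27 degrees C / 80 degrees F
FPC 1 Top Testing
FPC 1 Bottom Testing
Fans Top Left Front fan OK Spinning at normal speed
"""
flagged = items_not_ok(sample)
for item, status in flagged:
    print(f"{item}: {status}")
```

An item in the Testing state during initial power-on is expected, so the result is a list to review rather than a list of faults.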

Displaying CPU Temperature

user@M10i> show chassis routing-engine
  Slot 0:
    Current state                  Present
  Slot 1:
    Current state                  Master
    Election priority              Backup (default)
    Temperature                 38 degrees C / 100 degrees F
    CPU temperature             39 degrees C / 102 degrees F
    DRAM                      1536 MB
    Memory utilization          21 percent
    CPU utilization:
      User                       0 percent
      Background                 0 percent
      Kernel                     3 percent
      Interrupt                  0 percent
      Idle                      97 percent
    Model                          RE-850
    Serial ID                      1000591462
    Start time                     2013-01-29 18:25:59 UTC
    Uptime                         4 days, 3 hours, 6 minutes, 2 seconds
    Last reboot reason             Router rebooted after a normal shutdown.
    Load averages:                 1 minute   5 minute  15 minute
                                       0.14       0.18       0.09
CPU Temperature
In addition to the ambient temperature surrounding the system components, you can see the actual CPU temperature of the RE.

Temperature Alarm Thresholds

• Use the show chassis temperature-thresholds command
to display thresholds in degrees Celsius
• Fan speed
• Normal: Normal speed when component at or below temperature
• High: High speed when component exceeds listed temperature
• Yellow alarm
• Normal: Component exceeds temperature
• Bad fan: Component exceeds temperature and a fan has failed
• Red alarm
• Normal: Component exceeds temperature
• Bad fan: Component exceeds temperature and a fan has failed
user@mx240> show chassis temperature-thresholds
                           Fan speed      Yellow alarm      Red alarm      Fire Shutdown
                          (degrees C)      (degrees C)     (degrees C)      (degrees C)
Item                     Normal  High   Normal  Bad fan   Normal  Bad fan     Normal
Chassis default              48    54       65       55       75       65      100
Routing Engine 0             70    80       95       95      110      110      112
Routing Engine 1             70    80       95       95      110      110      112
FPC 1                        55    60       75       65       90       80       95
Temperature Thresholds
As temperatures rise within the chassis of a router running the Junos OS, the router will begin to protect itself by increasing
fan speeds or alerting you of the higher temperatures using a yellow or red alarm. The slide shows the temperature threshold
settings for an MX240 router. Note that if a component reaches the fire shutdown temperature threshold, the router shuts
down to stop the component from becoming damaged.
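The way a reading maps onto the yellow, red, and fire shutdown thresholds can be sketched in a few lines. This is an illustration only, assuming the comparison rules stated on the slide ("exceeds" for the alarms, "reaches" for fire shutdown) and using the FPC 1 row as read from the slide; the names Thresholds and alarm_state are hypothetical and do not reflect how the Junos OS implements the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    yellow_normal: int
    yellow_bad_fan: int
    red_normal: int
    red_bad_fan: int
    fire_shutdown: int


def alarm_state(temp_c: int, t: Thresholds, fan_failed: bool = False) -> str:
    """Classify a temperature against one row of thresholds."""
    # Fire shutdown applies when the component reaches the threshold.
    if temp_c >= t.fire_shutdown:
        return "fire-shutdown"
    # With a failed fan, the lower "Bad fan" thresholds apply.
    red = t.red_bad_fan if fan_failed else t.red_normal
    yellow = t.yellow_bad_fan if fan_failed else t.yellow_normal
    if temp_c > red:
        return "red"
    if temp_c > yellow:
        return "yellow"
    return "none"


# FPC 1 row as read from the slide (degrees C); treat the values as an assumption.
FPC1 = Thresholds(yellow_normal=75, yellow_bad_fan=65,
                  red_normal=90, red_bad_fan=80, fire_shutdown=95)

print(alarm_state(70, FPC1))                   # below the normal yellow threshold
print(alarm_state(70, FPC1, fan_failed=True))  # above the bad-fan yellow threshold
```

The same 70-degree reading produces no alarm with healthy fans but a yellow alarm once a fan has failed, which is the point of the separate Bad fan columns.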

Setting Additional Alarm Thresholds

• Configure additional red and yellow alarm thresholds
under the [edit chassis alarm] hierarchy
[edit]
user@mx240# set chassis alarm ?
+ apply-groups         Groups from which to inherit configuration data
+ apply-groups-except  Don't inherit configuration data from these groups
> ds1                  DS1 alarms
> ethernet             Ethernet alarms
> integrated-services  Integrated services alarms
> management-ethernet  Management Ethernet alarms
> serial               Serial alarms
> services             Services PIC alarms
> sonet                SONET alarms
> t3                   DS3 alarms
[edit]
user@mx240# set chassis alarm ethernet link-down red
Manual Alarm Settings

It is possible to change the default red and yellow alarm settings. The slide shows the various types of alarms that can be set
including a specific example that causes a red alarm when any transit Ethernet port goes to the link-down state on the
MX240.

Other Useful Hardware Commands
• show chassis...
• fan: displays the current fan speeds
• fpc: displays status of FPCs and PICs
• tfeb: displays status of control board
• hardware: shows installed hardware part and serial
numbers
• firmware: shows installed firmware
• show system...
• uptime: displays current time and uptime
• reboot: displays any scheduled reboots
Other Commands
The slide shows some other useful commands to display information about your router's hardware.

Displaying Chassis Inventory

{master}
user@t640> show chassis hardware
Hardware inventory:
Item              Version  Part number  Serial number  Description
Chassis                                 54970          T640
Midplane          REV 05   710-002726   AX5084
FPM GBUS          REV 08   710-002901   HF5013
FPM Display       REV 04   710-002897   HF5248
CIP               REV 06   710-002895   HG0718
PEM 0             Rev 04   740-002595   ML1251S        Power Entry Module
PEM 1             Rev 04   740-002595   ML12517        Power Entry Module
SCG 0             REV 09   710-003423   HF9311
SCG 1             REV 09   710-003423   HFS302
Routing Engine 0  REV 01   740-005022   210865700267   RE-3.0
Routing Engine 1  REV 01   740-005022   210865700264   RE-3.0
CB 0              REV 10   710-002728   HF9619
CB 1              REV 10   710-002728   HF9629
FPC 1             REV 05   710-007529   HL7538         FPC Type 3
  CPU             REV 14   710-001726   HG27.SO
  MMB 0           REV 02   710-005555   HL7476         MMB-288mbit
  MMB 1           REV 02   710-005555   HL7126         MMB-288mbit
  PPB 0           REV 04   710-002845   HJ7134         PPB Type 3
  PPB 1           REV 04   710-002845   HJ7001         PPB Type 3
SPMB 0            REV 03   710-003229   HF5060
SPMB 1            REV 03   710-003229   HF5045
SIB 0             REV 01   750-005486   HG9926         SIB-I8-F16
SIB 1             REV 01   750-005486   HF7675         SIB-I8-F16
SIB 2             REV 01   750-005486   HF7734         SIB-I8-F16
SIB 3             REV 01   750-005486   HF7736         SIB-I8-F16
SIB 4             REV 01   750-005486   HG9299         SIB-I8-F16
Displaying Chassis Inventory

The output of the show chassis hardware command displays the hardware components installed in the router. This command is
useful when troubleshooting or upgrading your router.
The show chassis hardware command output fields are the following:
Item: For the chassis component, information appears about the backplane, the power supplies, the maxicab (the
connection between the RE and the backplane), the SCB, and each of the FPCs and their PICs.
Version: Displays the revision level of the chassis component.
Part number: Displays the part number of the chassis component.
Serial number: Displays the serial number of the chassis component. The serial number of the backplane is
also the serial number of the router chassis.
Description: For the power supplies, it displays the type of supply; for the PICs, it displays the type of PIC.

Summary
• Described the key commands and features used to monitor
storage and memory issues
• Described the key commands and features that you can use
to monitor software installations
• Determined how to find potential hardware problems using
system logs
• Described the key commands that you can use to monitor
hardware and environmental issues
We Discussed:
The commands and features used to monitor storage and memory issues;
The commands and features that you can use to monitor software installations;
Finding potential hardware problems using system logs; and
The commands that you can use to monitor hardware and environmental issues.

Review Questions
1. How do you force a router to boot from rotating
media?
2. Describe ways in which you can troubleshoot
Junos platforms using visual indicators.
3. List three ways that you can use the Junos CLI to
assist in hardware troubleshooting.
4. Describe three ways of determining whether any
chassis alarms are present.
5. What CLI command searches the messages file for
all lines matching fail or error?

Monitoring Hardware and Environmental Conditions Lab
• Test flash drive integrity.

• View system logs to determine hardware-related
information.
• Set alarm thresholds.
Monitoring Hardware and Environmental Conditions Lab


1. To force the router to boot from the rotating media, issue the command request system reboot media disk.
2. When standing near a router running the Junos OS, you should be able to view the LEDs on the craft interface as well as the PEMs to
determine some indication of hardware status.
3. You can use the Junos CLI to assist in hardware troubleshooting by issuing the show system, show chassis, and show log commands.
4. To determine whether there are any chassis alarms, you can look at the craft interface LCD, check the alarm LEDs, or issue the show chassis
alarms command.
5. To search the messages log file for all lines matching fail or error, issue the command show log messages | match
"fail|error".


Chapter 6: Control Plane

Junos Troubleshooting in the NOC
Objectives
• After successfully completing this content, you will be
able to:
• Monitor and troubleshoot system processes that reside in
the control plane
• Utilize a logical approach to troubleshooting routing issues
that reside in the control plane
• Monitor and troubleshoot basic bridging and ARP
functionalities
We Will Discuss:
The monitoring and troubleshooting of control plane system processes;
A logical approach to troubleshooting routing issues; and
The monitoring and troubleshooting of basic bridging and Address Resolution Protocol (ARP) functionality.
Chapter 6-2 • Control Plane www.juniper.net

Agenda: Control Plane
→Control Plane Review

• Monitoring System and User Processes
• Monitoring Routing Tables and Protocols
• Monitoring Bridging
• Monitoring the Address Resolution Protocol
Control Plane Review

The slide lists the topics we will discuss. We discuss the highlighted topic first.

Brains of a Junos Device

• Control plane resides on the Routing Engine
• Routing Engines serve as the brains of a device running the
Junos operating system
• Runs the Junos OS
• Based on an X86 or PowerPC architecture
• Contains memory and flash
• Manages protocols
• Manages system processes
• Manages user processes
• Maintains routing table
• Maintains forwarding table
• Maintains bridging table
• Connects to the data plane
• Controls some chassis components
The Control Plane Hosts the Brains of the Junos Operating System
When discussing the control plane of a device running the Junos operating system, the discussion revolves around the
Routing Engine (RE). The RE acts as the brains of a Junos device.
The RE runs various protocol and management software processes that reside inside a protected memory environment. The RE
is based on an X86 or PowerPC architecture that hosts flash memory and/or a hard disk drive, depending on the specific
platform running the Junos OS.
The RE maintains the routing tables, bridging table, and primary forwarding table and connects to the Packet Forwarding Engine
(PFE) through an internal link. It handles all protocol processes in addition to other software processes that control the device's
interfaces, the chassis components, system management, and user access to the device. These software processes run on top
of the Junos kernel, which interacts with the data plane. The software directs all protocol traffic from the network to the RE for
the required processing.
The RE provides the command-line interface (CLI) in addition to the J-Web graphical user interface (GUI). These user interfaces
run on top of the Junos kernel and provide user access and control of the device. The RE controls the data plane by providing
accurate, up-to-date Layer 2 and Layer 3 forwarding tables and by downloading microcode and managing software processes
that reside in the data plane's microcode. The RE receives hardware and environmental status messages from the data plane
and acts upon them as appropriate.

Control Plane Process Separation

• Separation within separation
• Not only are the data plane and control plane separate, but
system processes are often separate
• Provides protection from one process bringing down the
entire control plane
Separation Provides Protection

A key aspect of the Junos OS is the separation of the control plane and the forwarding, or data, plane. The processes that
control routing and switching protocols are cleanly separated from the processes that forward frames and packets through a
Junos device. This design allows you to tune each process for maximum performance and reliability. The separation of the
control and data planes is one of the key reasons why the Junos OS can support many different platforms from a common
code base.
In addition to separation of the control and data planes, Junos OS functionality is compartmentalized into multiple software
processes. Each process runs in its own protected memory space, ensuring that one process cannot directly interfere with
another. When a single process fails, the entire system does not necessarily fail. This modularity also ensures that new
features can be added with less likelihood of breaking current functionality.

Control Plane Interaction

• Kernel provides the underlying infrastructure for all
the Junos processes
[Figure: Control plane running the Junos OS. Labels: Routing Tables, Routing Protocol Process, Interface Process, Chassis Process, Forwarding Table, Layer 2 Protocol Process, Bridging Table, Kernel (Operating System), Intel PCI Platform.]

The Kernel
The Junos kernel provides the underlying infrastructure for all the Junos processes. It is responsible for scheduling and device
control. In addition, the kernel provides the link between the routing and switching tables and the RE's forwarding table. It is
responsible for all communication with the data plane, which includes keeping the PFE's copy of the forwarding table
synchronized with the master copy in the RE.


• Control Plane Review
→Monitoring System and User Processes
Monitoring System and User Processes


System Processes
• Processes in the user space interact with the kernel
and are often called daemons
• The Junos OS runs a variety of daemons:
user@router> show system processes extensive | count

Count: 125 lines
user@router> show system processes extensive | match rpd

[Slide output: one matching line for the rpd process (user root, total memory 41364K, state kqread), with callouts labeling the process ID (PID), total memory, resident memory, CPU usage, and process name.]
Junos Processes
Processes in the Junos OS run as daemons, or programs, in the background of the operating system. These processes run in
the user space of the operating system. A typical operating system is comprised of the user space and the kernel space. The
kernel space is reserved for kernel operations. Both spaces run in separate memory allocations.
Key Daemons
The show system processes command displays the processes running on the RE in a manner similar to a ps -ax
listing at a shell prompt. You can use this command to confirm that a given daemon (or process) is running, and to determine
what process ID (PID) it was assigned. In the Junos OS, the init process is a meta-daemon that starts, monitors, and, if
needed, restarts other daemons. The routing protocol process (rpd), chassis control daemon (chassisd), and the device
control daemon (dcd) are some of the key processes in the Junos OS. The following output shows a list of processes running
on an MX Series 3D Universal Edge Router, including the process name, PID, raw CPU usage, and memory allocation. The
output has been trimmed for brevity.
user@mx> show system processes extensive | no-more
last pid: 32186; load averages: 0.00, 0.00, 0.00 up 74+17:38:15 15:55:30
119 processes: 2 running, 89 sleeping, 28 waiting
Mem: 322M Active, 34M Inact, 66M Wired, 135M Cache, 112M Buf, 1432M Free
Swap: 2915M Total, 2915M Free
  PID USERNAME THR PRI NICE   SIZE    RES STATE     TIME   WCPU COMMAND
   11 root       1 171   52     0K    16K RUN     1773.8 98.05% idle
   13 root       1 -20 -139     0K    16K WAIT    594:38  0.00% swi7: clock
 1114 root       1  96    0 29872K 11624K select  137:25  0.00% chassisd
 1268 root       3  20    0 42508K 12680K sigwai   85:35  0.00% jpppd
 1271 root       1  96    0 10920K  4676K select   48:47  0.00% jdiameterd
 1276 root       2  96    0 18828K  7832K select   32:34  0.00% pfed
 1111 root       1  96    0  1968K   808K select   30:27  0.00% bslockd
 1278 root       1  96    0 15856K 11140K select   17:43  0.00% snmpd
 1132 root       1  96    0  3656K   936K select   15:43  0.00% license-check
 1115 root       1  96    0  5440K  1976K select   15:09  0.00% alarmd
   12 root       1 -40 -159     0K    16K WAIT     12:39  0.00% swi2: net
   28 root       1 -68 -187     0K    16K WAIT     12:36  0.00% irq36: tsec1
   23 root       1 -52 -171     0K    16K WAIT     10:20  0.00% irq43: i2c0 i2c1
 1275 root       1  96    0  4756K  1980K select    8:47  0.00% irsd
   56 root       1  12    0     0K    16K -         7:49  0.00% schedcpu
 1265 root       1  96    0  5932K  2736K select    7:37  0.00% cfmd
   15 root       1 -16    0     0K    16K -         7:33  0.00% yarrow
 1279 root       1  96    0 30040K  8684K select    6:35  0.00% dcd
    2 root       1  -8    0     0K    16K -         6:19  0.00% g_event
   44 root       1  20    0     0K    16K syncer    5:43  0.00% syncer
   43 root       1  20    0     0K    16K vnlrum    5:01  0.00% vnlru_mem
   27 root       1 -68 -187     0K    16K WAIT      4:37  0.00% irq35: tsec1
    3 root       1  -8    0     0K    16K -         4:36  0.00% g_up
    4 root       1  -8    0     0K    16K -         4:32  0.00% g_down
   48 root       1 -16    0     0K    16K psleep    4:09  0.00% vmkmemdaemon
 1128 root       1  96    0  7568K  3200K select    3:15  0.00% bfdd
28864 root       1  96    0 18328K  8384K select    3:09  0.00% l2ald
 1264 root       1  96    0  5108K  2144K select    2:58  0.00% lfmd
 1277 root       1  96    0 12396K  6736K select    2:34  0.00% mib2d
 5122 lab        1  96    0 22032K 13168K select    2:08  0.00% cli
 1124 root       1  96    0  6420K  3260K select    1:56  0.00% ppmd
 1261 root       1  96    0 72636K 61272K select    1:55  0.00% dfcd
 1129 root       1  96    0  8900K  3192K select    1:53  0.00% lacpd
 1167 root       1  96    0 19068K  6712K select    1:38  0.00% cosd
 1272 root       1   4    0  7728K  3792K kqread    1:32  0.00% mcsnoopd
    9 root       1 171   52     0K    16K pgzero    1:31  0.00% pagezero
 1166 root       1  96    0  6880K  2804K select    1:31  0.00% ilmid
 1116 root       1  96    0  6604K  1888K select    1:31  0.00% craftd
   46 root       1 -16    0     0K    16K sdflus    1:25  0.00% softdepflush
 1130 root       1  96    0 12812K  3372K select    1:21  0.00% bdbrepd
20755 root       1  96    0  2452K  2300K select    1:16  0.00% ntpd
 1269 root       1  96    0  4616K  1680K select    0:58  0.00% iccpd
   42 root       1 -16    0     0K    16K psleep    0:43  0.00% bufdaemon
28868 root       1   4    0 12260K  5440K kqread    0:37  0.00% l2cpd
   45 root       1  -4    0     0K    16K vlruwt    0:37  0.00% vnlru
27887 root       1   4    0 41652K 12444K kqread    0:33  0.00% rpd
   50 root       1 -16    0     0K    16K psleep    0:33  0.00% vmuncachedaemon
20763 root       1  96    0  7808K  3564K select    0:26  0.00% rmopd
 1280 root       1  96    0 19980K  8552K select    0:25  0.00% dfwd
 1117 root       1  96    0 37852K 17516K select    0:20  0.00% mgd
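Confirming that a daemon is running and finding its PID, as described above, can also be done offline against a saved capture. The following is a minimal sketch, assuming the eleven-column layout shown in the listing (PID through COMMAND); the function names parse_processes and find_pid are illustrative only.

```python
from typing import Dict, List, Optional

COLUMNS = ["pid", "username", "thr", "pri", "nice",
           "size", "res", "state", "time", "wcpu", "command"]


def parse_processes(output: str) -> List[Dict[str, str]]:
    """Parse process lines; the command may contain spaces, so split at most 10 times."""
    rows = []
    for line in output.splitlines():
        fields = line.split(None, 10)
        # Skip headers, summary lines, and anything with missing columns.
        if len(fields) == 11 and fields[0].isdigit():
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


def find_pid(output: str, command: str) -> Optional[int]:
    """Return the PID of the first process whose command matches exactly."""
    for row in parse_processes(output):
        if row["command"] == command:
            return int(row["pid"])
    return None


sample = """\
  PID USERNAME THR PRI NICE   SIZE    RES STATE     TIME   WCPU COMMAND
   13 root       1 -20 -139     0K    16K WAIT    594:38  0.00% swi7: clock
 1114 root       1  96    0 29872K 11624K select  137:25  0.00% chassisd
27887 root       1   4    0 41652K 12444K kqread    0:33  0.00% rpd
"""
print(find_pid(sample, "rpd"))
```

An exact match on the command avoids the false positives that a substring search for a short name such as rpd could produce.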

Disabling and Restarting System Processes

• Use the operational mode restart command to
restart a system process:
user@router> restart routing ?
<[Enter]> Execute this command
gracefully Gracefully restart the process
immediately Immediately restart (SIGKILL) the process
• Disabling a system process:
[edit system processes]
lab@mx0-2# show
ethernet-link-fault-management disable;
Nov 8 21:35:08 mx0-2 init: ethernet-link-fault-management (PID 1268)
terminate signal sent
Nov 8 21:35:08 mx0-2 init: ethernet-link-fault-management (PID 1268) exited
with status=0 Normal Exit
Note: Disabling system processes can cause instabilities and should only be performed as a
troubleshooting step under the guidance of JTAC.
Restarting System Processes

When performing troubleshooting steps, you can restart a system process using the restart command as shown on the slide.
Treat this measure as a last resort, and use it only after you have determined that a specific process
is unstable. Restarting an individual process might allow you to recover a troubled daemon without performing a full
system reboot. The Junos OS sometimes automatically restarts a daemon when it detects an instability. The Junos OS
allows you to restart only certain key processes:
user@mx> restart ?
adaptive-services Adaptive services process
ancpd-service Access Node Control Protocol Process
audit-process Audit process
auto-configuration Interface Auto-configuration
chassis-control Chassis control process
class-of-service Class-of-service process
database-replication Database Replication process
dhcp-service Dynamic Host Configuration Protocol process
diameter-service Diameter process
disk-monitoring Disk monitoring process
dynamic-flow-capture Dynamic flow capture service
ecc-error-logging ECC parity errors logging process

Restarting System Processes (contd.)

ethernet-connectivity-fault-management Connectivity fault management process
ethernet-link-fault-management Ethernet OAM Link-Fault-Management process
event-processing Event processing process
firewall Firewall process
general-authentication-service General authentication process
gracefully Gracefully restart the process
iccp-service Inter-Chassis Communication Protocol daemon
immediately Immediately restart (SIGKILL) the process
interface-control Interface control process
ipsec-key-management IPSec Key Management daemon
l2-learning Layer 2 address flooding and learning process
l2cpd-service Layer 2 Control Protocol process
lacp Link Aggregation Control Protocol process
license-service Feature license management process
link-management Link management process
mac-validation MAC SA Validation
mib-process Management Information Base II process
mobile-ip Mobile IP process
mountd-service Service for NFS mounts requests
mpls-traceroute MPLS Periodic Traceroute process
mspd Multiservice Daemon
multicast-snooping Multicast Snooping process
named-service DNS server process
nfsd-service Remote NFS server
packet-triggered-subscribers Packet-Triggered Dynamic Subscribers & Policy Control
peer-selection-service Peer selection service process
pgcp-service Packet gateway service process
pgm Pragmatic General Multicast process
pic-services-logging PIC services logging process
ppp PPP process
ppp-service Universal edge PPP process
pppoe Point-to-Point Protocol over Ethernet process
redundancy-interface-process Redundancy interface management process
remote-operations Remote operations process
routing Routing protocol process
sampling Traffic sampling control process
sbc-configuration-process SBC configuration process
sdk-service SDK Service Daemon
secure-neighbor-discovery Secure Neighbor Discovery Protocol process
service-deployment Service Deployment System (SDX) process
services Restart a service
snmp Simple Network Management Protocol process
soft Soft reset (SIGHUP) the process
static-subscribers Static Subscriber Client
tunnel-oamd Tunnel OAM process
vrrp Virtual Router Redundancy Protocol process
Disabling a System Process

You can also disable a system process through configuration as shown on the slide. However, disabling processes manually
can cause instabilities in the Junos OS and should only be performed under the direction of the Juniper Technical Assistance
Center (JTAC).

System Process Summary
• Use the summary command modifier for a complete
look at the device's overall process state:
user@router> show system processes summary
last pid: 21038; load averages: 0.23, 0.18, 0.07 up 55+23:28:58 21:49:06
121 processes: 2 running, 91 sleeping, 28 waiting
Mem: 306M Active, 34M Inact, 62M Wired, 139M Cache, 112M Buf, 1448M Free
Swap: 2915M Total, 2915M Free
  PID USERNAME THR PRI NICE   SIZE    RES STATE     TIME   WCPU COMMAND
   11 root       1 171   52     0K    16K RUN     1328.7 98.05% idle
Hint: Use show task memory detail over time to identify memory leaks.
Monitoring System Processes
You can monitor the overall state of system processes with the show system processes summary command as
shown on the slide. The output of the command lists the last PID to start, CPU load averages over time, total processes,
process states, and memory allocations. Note that the memory usage displayed represents allocated memory, rather than
actual memory usage.
Rarely, a Junos device might experience a memory leak. Although this issue and its resolution are usually handled by JTAC,
you might be asked to provide several outputs of show task memory detail captured at intervals. This command
provides memory utilization per process, and capturing it over time allows JTAC to examine which process or processes might
be consuming memory.
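The reason for capturing output at intervals is that a leak shows up as a figure that keeps growing and never comes back down. The following is a minimal sketch of that comparison. It does not parse real show task memory detail output; each snapshot is a hypothetical dictionary of name to allocated kilobytes that you would fill in by hand or with your own parser, and the function name growing_entries is illustrative only.

```python
from typing import Dict, List


def growing_entries(snapshots: List[Dict[str, int]],
                    min_growth_kb: int = 0) -> Dict[str, int]:
    """Return entries whose allocation never shrinks and grows overall."""
    if len(snapshots) < 2:
        return {}
    suspects = {}
    for name in snapshots[0]:
        # Only consider entries present in every snapshot.
        if not all(name in snap for snap in snapshots):
            continue
        series = [snap[name] for snap in snapshots]
        never_shrinks = all(b >= a for a, b in zip(series, series[1:]))
        growth = series[-1] - series[0]
        if never_shrinks and growth > min_growth_kb:
            suspects[name] = growth
    return suspects


# Hypothetical figures in KB, captured at three points in time.
snapshots = [
    {"rpd": 41652, "dcd": 30040},
    {"rpd": 43800, "dcd": 30040},
    {"rpd": 47100, "dcd": 29990},
]
suspects = growing_entries(snapshots)
print(suspects)
```

Steady growth is only a hint. Memory that grows while routes are being learned is normal, so the captures are still best interpreted together with JTAC.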

Other System Commands

• System connections represent active IP sockets for
which the Junos device is acting as a server
user@router> show system connections | match "10.210.15.30"
tcp4 0 0 10.210.15.8.22 10.210.15.30.61944 ESTABLISHED
• System alarms are raised for licensing violations

user@router> show system alarms
No alarms currently active
• Use show system statistics to view control
plane counters separated by protocol
System Connections
The show system connections command shows open IP sockets on the Junos device. It is equivalent to the UNIX shell
command netstat -a and also displays open ports associated with the connection.
You can use the /etc/services file to determine service-to-port mappings:
user@mx> file show /etc/services | match 22
ssh 22/tcp #Secure Shell Login
ssh 22/udp #Secure Shell Login
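The lookup from a socket in the connections output to a service name can be sketched as follows. This is an illustration only, assuming the address.port notation shown on the slide (for example 10.210.15.8.22) and the services line format shown above; the function names parse_services and split_socket are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_services(text: str) -> Dict[Tuple[int, str], str]:
    """Map (port, protocol) to a service name from /etc/services style lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, proto = fields[1].split("/", 1)
        if port.isdigit():
            mapping[(int(port), proto)] = fields[0]
    return mapping


def split_socket(addr: str) -> Tuple[str, Optional[int]]:
    """Split 'a.b.c.d.port' into the address and the port after the last dot."""
    host, _, port = addr.rpartition(".")
    return host, int(port) if port.isdigit() else None


services_text = """\
ssh 22/tcp #Secure Shell Login
ssh 22/udp #Secure Shell Login
"""
services = parse_services(services_text)
host, port = split_socket("10.210.15.8.22")
service = services.get((port, "tcp"))
print(host, port, service)
```

For the connection shown on the slide, the local socket resolves to the ssh service, which confirms that the established session is an SSH login to the device.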
System Alarms
Two types of alarms exist on a Junos device: chassis alarms and system alarms. Although chassis alarms are more common
and pertain to a wide variety of chassis alarm conditions, system alarms are reserved for licensing issues and the absence
of a rescue configuration.

System Statistics
The show system statistics command is useful for ascertaining problems with specific control plane protocol
connections and contains a number of options for specific protocols:
user@mx> show system statistics ?
arp Address Resolution Protocol
bridge IEEE 802.1 Bridging
clns Connectionless Network Service
esis End System-to-Intermediate System
ethoamcfm Ethernet OAM protocol for connectivity fault management
ethoamlfm Ethernet OAM protocol for link fault management
icmp Internet Control Message Protocol
icmp6 Internet Control Message Protocol for IPv6
igmp Internet Group Management Protocol
ip IP version 4 (IPv4)
ip6 IP version 6 (IPv6)
mpls Multiprotocol Label Switching
rdp Reliable Datagram Protocol
tcp Transmission Control Protocol
tnp Trivial Network Protocol
ttp TNP Tunneling Protocol
tudp Trivial User Datagram Protocol
udp User Datagram Protocol
vpls Virtual private LAN service
| Pipe through a command
user@mx> show system statistics tcp
Tcp:
    3800758 packets sent
        144249 data packets (7807134 bytes)
        2 data packets retransmitted (88 bytes)
        0 resends initiated by MTU discovery
        3652997 ack only packets (1193261 packets delayed)
        0 URG only packets
        9 window probe packets
        2875 window update packets
        57113 control packets
    14709613 packets received
        118730 acks (for 7799516 bytes)
        12921325 duplicate acks
        0 acks for unsent data
        1579265 packets received in-sequence (506671113 bytes)
        2271070 completely duplicate packets (40 bytes)
        0 old duplicate packets
        0 packets with some duplicate data (0 bytes duped)
        0 out-of-order packets (0 bytes)
        0 packets of data after window (0 bytes)
        0 window probes
        33686 window update packets
        197 packets received after close
        0 discarded for bad checksums
        0 discarded for bad header offset fields
        0 discarded because packet too short
The output has been trimmed for brevity.

User Processes
• Log out of the Junos device gracefully to prevent hung
user sessions:
user@router> show system users
9:46PM up 56 days, 23:25, 2 users, load averages: 0.41, 0.18, 0.07
USER     TTY      FROM             LOGIN@  IDLE WHAT
lab      u0                        21Sep10    - -cli (cli)
lab      p0       10.210.15.30     9:45PM     - -cli (cli)
• tty of u0 indicates a console session
• tty of p0 indicates a remote Telnet or SSH session
• Clearing user sessions:
user@router> request system logout user lab terminal u0
logout-user: done
User Processes
Each time a user logs in to the Junos OS, a new process associated with a PID is created just as with other system processes.
User processes can be viewed with the show system users command as shown on the slide. A teletype (tty) session with
the character "u" represents a console type of session. A tty with the character "p" represents a remote Telnet or SSH type of
session.
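The tty naming rule is simple enough to capture in a small helper, for example when reviewing saved show system users output. The following is a minimal sketch based only on the two prefixes described above; the function names session_type and logout_command are illustrative, and the command string simply mirrors the syntax shown on the slide.

```python
def session_type(tty: str) -> str:
    """Classify a tty name using the prefixes described in the notes."""
    if tty.startswith("u"):
        return "console"
    if tty.startswith("p"):
        return "remote (Telnet or SSH)"
    return "unknown"


def logout_command(user: str, tty: str) -> str:
    """Build the operational-mode command that clears one user session."""
    return f"request system logout user {user} terminal {tty}"


for user, tty in [("lab", "u0"), ("lab", "p0")]:
    print(f"{user} on {tty}: {session_type(tty)}")
print(logout_command("lab", "u0"))
```

Because both sessions on the slide belong to the same user name, the terminal argument is what distinguishes the console session from the remote one.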
Clearing User Sessions

If users do not gracefully log out of a Junos device, the Junos OS might be left with hung user sessions. You can clear user
sessions with the request system logout command shown on the slide. If multiple users are logged in with the same
user name, you can specify which user session to clear with either the tty or PID. (Use the CLI command show system
processes extensive | match cli to find the PIDs for user sessions.)
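The tty naming convention described above can also be checked mechanically. The following Python sketch is an illustration only, not a Juniper tool: it assumes the whitespace-separated column layout shown on the slide, and the function name classify_sessions is hypothetical.

```python
def classify_sessions(output: str) -> list[tuple[str, str, str]]:
    """Return (user, tty, kind) for each session row of `show system users`."""
    sessions = []
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "USER":  # the header row marks the start of the table
            in_table = True
            continue
        if not in_table or len(fields) < 2:
            continue
        user, tty = fields[0], fields[1]
        if tty.startswith("u"):
            kind = "console"
        elif tty.startswith("p"):
            kind = "remote (Telnet or SSH)"
        else:
            kind = "unknown"
        sessions.append((user, tty, kind))
    return sessions


# Sample text modeled on the slide output.
sample = """\
 9:46PM up 56 days, 23:25, 2 users, load averages: 0.41, 0.18, 0.07
USER TTY FROM LOGIN@ IDLE WHAT
lab u0 21Sep10 - -cli (cli)
lab p0 10.210.15.30 9:45PM - -cli (cli)
"""

for row in classify_sessions(sample):
    print(row)
```

A list like this makes it easier to pick the tty to pass to request system logout when the same user name is logged in more than once.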


Monitoring Routing Tables and Protocols


Routing Protocol Daemon
• Core functions:
• Controls routing protocols running on router
• Starts all configured protocols
• Handles all routing messages
• Maintains routing tables
• Implements routing policy
• Maintains its own scheduler
• Prioritizes and switches between routing tasks
[Figure: rpd, the routing tables, and the Junos kernel]
The Routing Protocol System Process
The routing protocol daemon controls the routing protocols running on the router. It starts all configured routing protocols and
handles all routing messages. It also maintains one or more routing tables, which are also called Routing Information Bases
(RIBs). These tables consolidate the routing information learned from various routing protocols into a common table.
The routing protocol process determines the active routes to network destinations and installs these routes into the RE's
forwarding table, also called the forwarding information base (FIB). Finally, it implements routing policy, which allows you to
control the routing information that is transferred between the routing protocols and the routing table. Using routing policy, you
can filter routing information or modify attributes associated with the routes, such as adding or removing BGP communities.
The Junos OS implements unicast and multicast IP routing functionality for IP version 4 (IPv4) and IP version 6 (IPv6) and also
supports MPLS signaling and switching.

The Routing Daemon Maintains Its Own Scheduler
A multitasking operating system uses the concept of a scheduler to allocate slices of CPU to various processes based on a
given process's relative priority. The Junos kernel, which is based on the FreeBSD kernel, uses a scheduler to prioritize and
service the various daemons and processes that run on the RE. The rpd is unique in that it is the only process in the Junos
OS that has its own internal task scheduler. Thus, rpd manages allocation of resources (memory and CPU time) among its
internal tasks. To the Junos kernel, rpd appears as a single process, and the kernel seamlessly switches between rpd and all
other running processes.

rpd Scheduler Slips

• Scheduler slips indicate that rpd was not able to
service its internal processes in a timely manner
• An indicator that rpd is too busy
• Slips are reported in the syslog
Nov 15 12:32:31 router rpd[309]: RPD_SCHED_SLIP: 10 sec scheduler
• Task accounting can help identify the cause of a slip
RPD Scheduler Slips

When rpd performs its tasks too slowly, perhaps because of the RE's CPU being used by another process, or because rpd has
been forced to do extra work, a scheduler slip can occur. These slips are reported in the system log, along with an indication of
the duration of the slip. The presence of scheduler slips is an indication that your device is working too hard; this indication is
definitely something that should be looked into and resolved.
One possible cause for an overworked rpd is excessive tracing, for example, using the all flag while significant route flap or
churn is occurring within that protocol.
The rpd processes sockets. When RPD_SCHED_SLIP is logged, the error indicates that rpd took too long between
opportunities to process sockets. This delay is communicated to the protocols, which causes the protocols to extend the time
they would ordinarily wait to receive an rpd response. This extension is helpful: if rpd is not responding because it is
consumed with protocol convergence, encouraging the protocols to wait before taking definitive action (in the absence of
a response from rpd) prevents an exponential impact on rpd, which would be the result if the protocols timed out and
needed to reinitiate all connections. RPD_SCHED_SLIP in the logs indicates rpd is effectively back online, and that it is able to
process sockets.
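Because slips are reported in the system log with their duration, a saved copy of the log can be scanned for them. The Python sketch below is a minimal illustration: the function name find_slips and the threshold parameter are hypothetical, the regular expression assumes the message layout shown on the slide, and the second and third sample lines are invented for the example.

```python
import re

# Matches the rpd PID and the slip duration in seconds.
SLIP_RE = re.compile(r"rpd\[(\d+)\]: RPD_SCHED_SLIP: (\d+) sec")


def find_slips(log_text: str, threshold_sec: int = 0) -> list[tuple[str, int]]:
    """Return (timestamp, seconds) for each slip of at least threshold_sec."""
    slips = []
    for line in log_text.splitlines():
        match = SLIP_RE.search(line)
        if match is None:
            continue
        seconds = int(match.group(2))
        if seconds >= threshold_sec:
            timestamp = line[:15]  # syslog prefix such as "Nov 15 12:32:31"
            slips.append((timestamp, seconds))
    return slips


# First line follows the slide; the other two are illustrative only.
sample = (
    "Nov 15 12:32:31 router rpd[309]: RPD_SCHED_SLIP: 10 sec scheduler\n"
    "Nov 15 12:40:02 router rpd[309]: RPD_SCHED_SLIP: 4 sec scheduler\n"
    "Nov 15 12:41:00 router mgd[400]: unrelated message\n"
)

print(find_slips(sample, threshold_sec=5))
```

Filtering on a threshold is a design choice for the sketch; any slip is worth investigating, but sorting by duration helps decide where to look first.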

Task Accounting
The best way to troubleshoot scheduler slips is to temporarily enable task accounting. To enable task accounting, issue the
hidden operational mode command set task accounting on. Because task accounting adds a significant processing
burden to a system that is evidently already busy enough (hence the slips), take care to turn accounting
off after a few minutes with a set task accounting off command. Note that this command is unhidden once you
have turned on task accounting.
When you enable task accounting, the rpd scheduler increases the verbosity of its system logging. The added detail should
help identify where rpd is spending all of its time. An example of the added logging detail is shown:
Nov 01 12:00:00 router rpd[609]: excessive runtime: BGP_65019.192.168.1.1+179
ran for 12.908 (12.885 user, 0.023 system)
Nov 01 12:00:01 router rpd[609]: task_monitor slip: 10s scheduler slip
From this log entry, you can determine that rpd spent over 12 seconds processing BGP updates from peer 192.168.1.1 in
AS 65019.
The output of a show task command with the hidden accounting switch confirms whether task accounting is currently
enabled, and if so, displays a list of the busiest tasks. Using the added log file detail and the output of a show task
accounting command, you should be able to at least identify the nature, if not the actual cause, of scheduler slips.
Once you have captured the output of a show task accounting command for submission to JTAC, be sure to disable
task accounting with a set task accounting off command so that additional burden is not placed on your router.
user@mx> set task accounting on
Task accounting enabled.
user@mx> show task accounting

Task accounting is enabled.
Task                         Started  User Time  System Time  Longest Run
Scheduler                         21          0        0.001        0.000
Memory                             1          0        0.000        0.000
KRT                                6          0        0.000        0.000
Redirect                           1          0        0.000        0.000
MGMT_Listen./var/run/rpd_          1          0        0.000        0.000
SNMP Subagent./var/run/sn          3          0        0.000        0.000
user@mx> set task accounting off

Task accounting disabled.
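The added logging detail lends itself to simple parsing when many entries must be reviewed. The following Python sketch is an assumption-laden illustration: the function name parse_excessive_runtime is hypothetical, and the task name is taken to be everything between "excessive runtime:" and "ran for" in an entry shaped like the example in this section.

```python
import re

RUNTIME_RE = re.compile(
    r"excessive runtime: (?P<task>.+?) ran for (?P<total>[\d.]+) "
    r"\((?P<user>[\d.]+) user, (?P<system>[\d.]+) system\)"
)


def parse_excessive_runtime(entry: str):
    """Return a dict describing the task and its runtime, or None if no match."""
    # A log entry may wrap across lines, so collapse all whitespace first.
    flat = " ".join(entry.split())
    match = RUNTIME_RE.search(flat)
    if match is None:
        return None
    return {
        "task": match.group("task"),
        "total": float(match.group("total")),
        "user": float(match.group("user")),
        "system": float(match.group("system")),
    }


entry = """Nov 01 12:00:00 router rpd[609]: excessive runtime: BGP_65019.192.168.1.1+179
ran for 12.908 (12.885 user, 0.023 system)"""

info = parse_excessive_runtime(entry)
print(info["task"], info["total"])
```

Sorting the parsed entries by total runtime gives a quick view of which rpd task to mention when opening a JTAC case.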

Core Files
• Core dump files
• Generated by system process crashes (or forcibly)
• Files should be uploaded to JTAC and associated with a JTAC
case number
• Core dumps fall into three categories:
• Process: Processes running on the Routing Engine
• Kernel: The Routing Engine kernel itself
• PFE boards: The microkernel OS running on the PFE boards
• Check for core dump files
• System syslog
• request support information
• show system core-dumps
• /var/tmp typically hosts process cores
• /var/crash typically hosts kernel and PFE cores
Core Dump Files

Today's internetworking software is exceedingly complex. As a result, equally complex bugs that result from unforeseen
circumstances can result in a fatal error within a software process. Most of these software faults relate to illegal memory
operations caused by the process attempting to read or write data from a memory area outside the boundaries allocated for
that process. In some cases, faulty hardware, such as failing memory, can cause stack or register corruption, which leads to a
fatal error in a software process. Core and log file analysis are used to determine if hardware errors have led to a software panic.
In a monolithic operating system, such a fault results in a crash of the entire operating system. In contrast, the protected
memory environment of the Junos OS ensures that other aspects of the operating system are not affected by faults in other
processes.
Even so, it can be very difficult to diagnose the exact set of events that lead up to a process crash unless a core file is left
behind for forensic analysis. A core file represents the set of memory locations and stack data that was in place at the time of
the fault. A copy of the binary image that left the core file (with debug symbols included) is then run against the core file using a
debugger to enable problem diagnosis by a software engineer.
Any core dump file should result in the creation of a case with JTAC. JTAC will request that you upload the core file to the JTAC FTP
server for analysis.

Core Dump Files (contd.)

JTAC engineers typically deal with three types of core files. These files are the following:
Junos process cores: Each daemon process, such as the chassis management or automatic protection switching
daemons (chassisd/apsd), is capable of leaving a core when a panic occurs.
Junos kernel (RE) cores: A kernel core file is left by the Junos kernel when it encounters a panic condition. A copy of
the virtual memory state (which can be quite large) is also saved.
PFE cores: Various components in the PFE contain their own microprocessors that run a microkernel. Each of the
PFE's embedded hosts is capable of dumping a core file when a crash (panic) occurs.
Core File Locations

Control plane system process core dump files are stored on the RE and a system log message is generated.
Core files created by a kernel panic are stored in the /var/crash location when the system dump-on-panic option is
enabled (hidden) at the [edit system] hierarchy. This option is enabled by default. Core files generated by a daemon
process are stored in the /var/tmp directory.
When a PFE component dumps a core, the resulting stack trace is written into that component's NVRAM. If you enable
chassis dump-on-panic (hidden) at the [edit chassis] hierarchy, a copy of the core is also stored in the /var/crash
directory on the RE. We recommend this option, and it is the default.
You can use the command show system core-dumps to quickly determine whether any core files are stored on the RE.
user@mx> show system core-dumps
/var/crash/*core*: No such file or directory
/var/tmp/*core*: No such file or directory
/var/crash/kernel.*: No such file or directory
/tftpboot/corefiles/*core*: No such file or directory
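When this check is run across many devices, the "No such file or directory" lines can be filtered out so that only real core files remain. The Python sketch below illustrates the idea under stated assumptions: the function name list_core_files is hypothetical, and the line describing an existing core file is an invented example in a directory-listing style, not captured output.

```python
def list_core_files(output: str) -> list[str]:
    """Return the output lines that do not report a missing core file."""
    found = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.endswith("No such file or directory"):
            continue
        found.append(line)
    return found


# Output modeled on the example above: no core files present.
clean = """\
/var/crash/*core*: No such file or directory
/var/tmp/*core*: No such file or directory
/var/crash/kernel.*: No such file or directory
/tftpboot/corefiles/*core*: No such file or directory
"""

# Hypothetical line for illustration only; real listings may differ.
with_core = clean + "-rw-rw---- 1 root field 1234567 Nov 1 12:00 /var/tmp/example.core.0.gz\n"

print(list_core_files(clean))
print(list_core_files(with_core))
```

An empty list means there is nothing to upload; any remaining line names a file that should be associated with a JTAC case.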

Troubleshooting Methodology (1 of 3)
• New or existing implementation?
• Understanding is important for isolating the issue
• Do no harm! (Actions listed from least severe to most severe)
• Clearing a route or database entry
• Single route must refresh
• Bouncing a protocol session or neighborship
• All learned routes must refresh
• Bouncing a protocol
• All adjacencies or peerings must re-establish
• Restarting routing (rpd)
• All routing must restart
• Rebooting the device
• All system processes must restart
New or Existing Network Element?

Before beginning to troubleshoot a routing problem, the most common question a JTAC engineer asks is whether the issue is
related to an existing network element or a new implementation. Determining this information up front will change the
approach to problem resolution. You use different sets of information gathering for an OSPF problem that arises on an
existing adjacency versus an OSPF adjacency that is brand new but not coming up.
Do No Harm!
Recall our troubleshooting methodology mantra of do no harm! When troubleshooting routing protocol issues on a live
network, this mantra becomes especially important. The slide displays some common resolutions in order of the least severe
impact to the most severe impact on a network. Although it might seem like common sense, it deserves stating that
restarting routing might force your OSPF adjacency to re-initiate, but remember that it will also bounce all those BGP
sessions your network is relying on.
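The ordering of actions can be captured as an ordered data structure so that a runbook or script always proposes the least disruptive option first. The Python sketch below is one possible encoding: the Action names and the least_severe helper are hypothetical labels for the steps listed on the slide.

```python
from enum import IntEnum


class Action(IntEnum):
    """Corrective actions ordered from least severe (1) to most severe (5)."""

    CLEAR_ROUTE_OR_DB_ENTRY = 1  # single route must refresh
    BOUNCE_SESSION = 2           # all learned routes must refresh
    BOUNCE_PROTOCOL = 3          # all adjacencies or peerings must re-establish
    RESTART_ROUTING = 4          # all routing must restart
    REBOOT_DEVICE = 5            # all system processes must restart


def least_severe(candidates):
    """Pick the least disruptive action among those expected to fix the issue."""
    return min(candidates)


choice = least_severe([Action.RESTART_ROUTING, Action.BOUNCE_SESSION])
print(choice.name)
```

Using an integer-backed enumeration keeps the comparison explicit: a higher value always means a wider blast radius.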

• Define success (and isolate)
• Route received from neighbor
• Check protocol adjacency
• Check protocol database
• Route appears in routing table
• Check preference
• Test import routing policy
• Route being advertised to neighbor
• Check protocol adjacency
• Test export policy
• Route is stable
• Check logs, interfaces, and protocol traces
Define Success
As part of the routing troubleshooting process, be sure to have a clear, unified definition of success. Some common
questions might include:
Is the device receiving expected routes from its neighbor?
Is the suspect route in the routing table?
Is the suspect route being advertised to its neighbor?
Is the route stable over time?

• Identify and implement a solution
• Repair hardware issue
• Adjust protocol configuration
• Adjacency configuration
• Metrics and preferences
• Policy
• Adjust implementation
• Prevent link overutilization
• Test in lab environment
Implementing a Solution
Once you have formulated a theory as to the nature of the issue, it is time to implement a change that, hopefully, results in
resolution. This change might involve a hardware swap, configuration changes, or network infrastructure changes (for example,
implementing a new link to share the load of an overutilized link).
If at all possible, test your change in a lab environment so that you can monitor its effects in a controlled setting
where the impact will not be shared by the live network. At the very least, implement changes in an announced
maintenance window so that end user effects are minimized.

General Path Troubleshooting (1 of 4)
• Host X is no longer able to reach host Y:
[Figure: example topology showing Host X, Host Y, and Routers A, D, E, and F]
• Assume local host-to-router connectivity is successful
Path Troubleshooting Case Study

The slide illustrates an example topology we will use over the next few slides to perform general routing path troubleshooting.
In this example, Host X, which previously was able to reach Host Y, is no longer able to communicate with Host Y.
For this example we will assume the host-to-router connectivity has been verified as functioning properly.

• Ping testing
• Router A cannot ping Router D
user@router-A> ping 192.168.30.2
PING 192.168.30.2 (192.168.30.2): 56 data bytes
^C
--- 192.168.30.2 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
user@router-A> ping 192.168.40.2
PING 192.168.40.2 (192.168.40.2): 56 data bytes
^C
--- 192.168.40.2 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
Ping
The slide illustrates the most often used method of network troubleshooting: an Internet Control Message Protocol (ICMP)
ping test. In this case, the ping is sourced from Router A and issued to both known interfaces on Router D with no success.
By default, the Junos OS sources ping packets from the egress interface and default routing instance of a device. The default
command results in a continuous ping with a data payload size of 56 bytes and can be stopped with a Ctrl+C keystroke. You
can alter many aspects of this behavior (output trimmed for brevity):
user@mx> ping ?
<host> Hostname or IP address of remote host
atm Ping remote Asynchronous Transfer Mode node
bypass-routing Bypass routing table, use specified interface
clns Ping ISO node
count Number of ping requests to send (1 .. 2000000000 packets)
detail Display incoming interface of received packet
do-not-fragment Don't fragment echo request packets (IPv4)
ethernet Ping to an ethernet host by unicast mac address
inet Force ping to IPv4 destination
inet6 Force ping to IPv6 destination
interface Source interface (multicast, all-ones, unrouted packets)
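When ping tests are repeated against several addresses, the statistics line is the part worth recording. The Python sketch below extracts it; the function name ping_summary is hypothetical, and the regular expression assumes the summary wording shown in the ping output earlier in this section.

```python
import re

STATS_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) packets received, (\d+)% packet loss"
)


def ping_summary(output: str):
    """Return sent/received/loss figures from ping output, or None if absent."""
    match = STATS_RE.search(output)
    if match is None:
        return None
    sent, received, loss = (int(group) for group in match.groups())
    return {
        "sent": sent,
        "received": received,
        "loss_pct": loss,
        "reachable": received > 0,
    }


sample = """\
PING 192.168.30.2 (192.168.30.2): 56 data bytes
^C
--- 192.168.30.2 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
"""

print(ping_summary(sample))
```

Pairing this with the count option avoids the need for a Ctrl+C keystroke when the test is scripted.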

• Traceroute results:
user@router-A> traceroute 192.168.30.2

traceroute to 192.168.30.2 (192.168.30.2), 30 hops max, 40 byte packets
1 192.168.10.2 (192.168.10.2) 0.464 ms 0.334 ms 0.330 ms
2 192.168.20.2 (192.168.20.2) 0.406 ms 0.364 ms 0.356 ms
3 * * *
4 * * *
5 * *^C
Traceroute
The second most commonly used routing troubleshooting tool is the traceroute command. Performing a traceroute
sends probe packets toward the destination (UDP probes by default in the Junos OS), incrementing the time-to-live (TTL)
value of each subsequent set of probes by one. Each hop at which the TTL expires returns an ICMP time-exceeded message.
By monitoring these responses, network operating systems such as the Junos OS can present
you with a reachability map of the network. You can use this map to isolate where a problem might reside.
In the example on the slide, Router A is performing a traceroute to one of Router D's interfaces. As shown on the slide, the
traceroute is not completely successful.
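The last hop that answers is usually the most useful fact in an incomplete traceroute. The Python sketch below finds it; the function name last_responding_hop is hypothetical, and the pattern assumes the hop layout shown in the example output (hop number, name, then address in parentheses).

```python
import re

# A responding hop looks like: "2 192.168.20.2 (192.168.20.2) 0.406 ms ..."
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)")


def last_responding_hop(output: str):
    """Return (hop_number, address) of the last hop that replied, or None."""
    last = None
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if match:
            last = (int(match.group(1)), match.group(3))
    return last


sample = """\
traceroute to 192.168.30.2 (192.168.30.2), 30 hops max, 40 byte packets
 1 192.168.10.2 (192.168.10.2) 0.464 ms 0.334 ms 0.330 ms
 2 192.168.20.2 (192.168.20.2) 0.406 ms 0.364 ms 0.356 ms
 3 * * *
 4 * * *
"""

print(last_responding_hop(sample))
```

In the example, the problem most likely lies at or beyond the device that follows the last responding hop, which is where further isolation should start.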

• Check your knowledge

• What further isolation steps would you perform?
• What are some of the possible causes?
• What are some possible solutions?
• How would you test possible solutions?
Check Your Knowledge

The slide lists some questions for discussion.

Protocol Troubleshooting Chart
[Figure: protocol troubleshooting flowchart. It starts from the assumption that the chassis, software, interface, and transmission line are OK. Its Yes/No decision branches lead to these outcomes: suspect configuration or IGP, suspect IGP configuration, no remote peer, suspect policy, investigate forwarding faults, and suspect policy or IGP configuration.]
Protocol Troubleshooting Chart

The slide illustrates a high-level chart useful for isolating routing protocol problems.

Working with Protocols (1 of 4)
• Helpful commands:
• Protocol show commands:
user@router> show ospf neighbor
Address Interface State ID Pri Dead
172.18.5.1 ge-1/0/2.144 Full 192.168.37.1 128 31
• Disabling a protocol (disabling a single interface is less drastic):
[edit protocols ospf area 0.0.0.0]
user@router# show
interface ge-1/0/2.144 {
    disable;
}
interface all;
• Restart routing (most drastic):

user@router> restart routing
Routing protocols process started, pid 28667
Commands for Protocol Troubleshooting

The slide lists some example commands used in troubleshooting routing protocols. Junos CLI show commands should be
used in lieu of configuration examination for the most efficient troubleshooting. You can examine most protocol adjacencies
using a show command. The slide illustrates the command to view an OSPF adjacency. Similar results can be obtained for
BGP and IS-IS using show bgp and show isis commands.
The slide illustrates how to disable the OSPF protocol over a single interface, which has a less drastic effect and is preferred
over disabling the entire OSPF protocol. This approach also has a much less drastic effect than restarting routing, which is
also displayed on the slide.

• Helpful commands (contd.):

• View routes from a specific protocol:
user@router> show route protocol ospf
• View routes from protocol perspective (before routing table):

user@router> show ospf database
• Clear a protocol adjacency:

user@router> clear ospf neighbor
• Clear a specific entry in a protocol database:
user@router> clear ospf database lsa-id lsa-id
(Callout: Sets the LSA to MAXAGE, resulting in re-advertisement from the originator)
Helpful OSPF CLI Commands

The slide illustrates typical commands to monitor and troubleshoot OSPF. The show route protocol ospf command narrows the
output from the routing table to only OSPF routes. The show ospf database command provides a detailed view of the
OSPF protocol database including link-state advertisement (LSA) types and entries contained in the various OSPF areas.
Note that viewing the OSPF database provides a view of OSPF routing information before it is populated in the routing table.
The slide also illustrates the commands for clearing an OSPF adjacency, which is useful when troubleshooting adjacency issues, and
for clearing an LSA from the OSPF database. The highlighted example uses the purge option, which clears the LSA by
setting it to its maximum age value. This information is sent to neighbors, resulting in a re-advertisement, which is useful
when testing changes to OSPF configuration.

• Monitoring protocol traffic:

• System logging:
user@router> show log messages | match ospf
Nov 23 23:37:59 mxD-2 rpd[20773]: RPD_OSPF_NBRDOWN: OSPF neighbor
172.18.5.1 (realm ospf-v2 ge-1/0/2.144 area 0.0.0.0) state changed from
Full to Down due to KillNbr (event reason: interface went down)
• Traceoptions:
[edit protocols ospf]
user@router# set traceoptions flag ?
all Trace everything
database-description Trace database description packets
error Trace errored packets
event Trace OSPF state machine events
flooding Trace LSA flooding
Logging and Tracing

You can monitor protocol operations with standard system logging. To include rpd messages, system logging must be
configured to include the daemon facility; the output you obtain depends on the configured logging level.
For a much more detailed analysis of protocol operations, use protocol traceoptions. Protocol traceoptions are configured
under the protocol and allow you to flag various aspects of the protocol operations for storage in a separate user-named log
file. You can append the detail configuration option for a flag for even more detailed information.
Tracing can also be set for the routing protocol process (rpd) itself; it tracks all general routing operations and records them in a
log file. Rpd tracing options can be set under the [edit routing-options traceoptions] hierarchy. (Note: Some
traceoptions flags generate an extensive amount of information. Tracing can also slow down the operation of routing
protocols. Delete or deactivate the traceoptions configuration if you no longer require it.)

• Monitoring protocol traffic (contd.):

• Monitoring interface traffic:
user@router> monitor traffic interface interface-name no-resolve
verbose output suppressed, use <detail> or <extensive> for full protocol
decode
Address resolution is OFF.
Listening on ge-1/0/2.144, capture size 100 bytes
00:24:29.389895 In IP 172.18.5.1 > 224.0.0.5: OSPFv2, Hello, length 44
00:24:33.111454 Out IP truncated-ip - 26 bytes missing! 172.18.5.2 >
224.0.0.5: OSPFv2, Hello, length 44
00:24:38.916834 In IP 172.18.5.1 > 224.0.0.5: OSPFv2, Hello, length 44
00:24:42.024384 Out IP truncated-ip - 26 bytes missing! 172.18.5.2 >
224.0.0.5: OSPFv2, Hello, length 44
^C
4 packets received by filter
0 packets dropped by kernel
Monitoring Interface Traffic

The slide illustrates example output from the monitor traffic command. Traffic monitoring allows the capture of
packets sent to and from the RE. It is another useful protocol troubleshooting tool because it allows you to monitor
routing protocol packets. The output can be viewed live, as shown on the slide, or piped (|) and saved to a file for analysis
using a third-party protocol decoder.
The monitor traffic command also provides options to alter the capture and output:
user@mx> monitor traffic interface interface-name ?
absolute-sequence Display absolute TCP sequence numbers
brief Display brief output
count Number of packets to receive (0.. 1000000 packets)
detail Display detailed output
extensive Display extensive output
layer2-headers Display link-level header on each dump line
matching Expression for headers of receive packets to match
no-domain-names Don't display domain portion of hostnames
no-promiscuous Don't put interface into promiscuous mode
no-resolve Don't attempt to print addresses symbolically
The output has been trimmed for brevity.
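A saved capture can be summarized by direction to see at a glance whether hellos flow both ways. The Python sketch below is a minimal illustration: the function name hello_directions is hypothetical, and it assumes the line layout shown in the capture above, including entries that wrap onto a second line.

```python
import re
from collections import Counter

# A new capture entry starts with a timestamp followed by In or Out.
TS_RE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d+ (In|Out) ")


def hello_directions(capture: str) -> Counter:
    """Count OSPFv2 hello packets by direction (In or Out)."""
    records = []
    for line in capture.splitlines():
        if TS_RE.match(line):
            records.append(line)
        elif records and line.strip():
            # Continuation of a wrapped entry: append to the current record.
            records[-1] += " " + line.strip()
    counts = Counter()
    for record in records:
        if "OSPFv2, Hello" in record:
            counts[TS_RE.match(record).group(1)] += 1
    return counts


sample = """\
Listening on ge-1/0/2.144, capture size 100 bytes
00:24:29.389895 In IP 172.18.5.1 > 224.0.0.5: OSPFv2, Hello, length 44
00:24:33.111454 Out IP truncated-ip - 26 bytes missing! 172.18.5.2 >
224.0.0.5: OSPFv2, Hello, length 44
00:24:38.916834 In IP 172.18.5.1 > 224.0.0.5: OSPFv2, Hello, length 44
00:24:42.024384 Out IP truncated-ip - 26 bytes missing! 172.18.5.2 >
224.0.0.5: OSPFv2, Hello, length 44
"""

print(dict(hello_directions(sample)))
```

A count of zero in either direction narrows the problem to one side of the link before any configuration is examined.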

Troubleshooting an OSPF Adjacency (1 of 5)

• New OSPF Area O adjacency not coming up:
user@mx2> show ospf neighbor
user@mx2>
[Figure: OSPF adjacency topology: VLAN 144, 172.18.5.0/30, Area 0]
OSPF Adjacency Case Study

The slide illustrates the topology for an OSPF adjacency case study that we cover over the next few slides. It also illustrates
the output of show ospf neighbor which, in our example, contains no output, indicating a problem with the adjacency
formation.


• What information can you derive without studying the
configuration?
• Other protocol show commands:
user@mx2> show ospf interface
Interface State Area DR ID BDR ID Nbrs
ge-1/0/2.144 DR 0.0.0.0 192.168.38.1 0.0.0.0 0
user@mx2> clear ospf statistics
user@mx2> show ospf statistics
Packet type       Total              Last 5 seconds
                  Sent   Received    Sent   Received
Hello                3          0       0          0
DbD                  0          0       0          0
LSReq                0          0       0          0
LSUpdate             0          0       0          0
LSAck                0          0       0          0
Troubleshooting the Adjacency with show Commands

To determine whether the correct interface is enabled for OSPF Area 0.0.0.0, we issue the show ospf interface command. The
output confirms our configuration. To shed further light on the situation, we clear the OSPF protocol statistics, thus
establishing a baseline. We then view the OSPF protocol statistics. Although the output is trimmed for brevity, the counters
show hello messages in the Sent column but none in the Received column.
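The statistics table can be reduced to a one-line verdict about traffic direction. The Python sketch below is a hedged illustration: the function names are hypothetical, and it assumes that each data row carries four counters in the order total sent, total received, sent in the last 5 seconds, and received in the last 5 seconds, as the column headings on the slide suggest.

```python
def parse_ospf_statistics(output: str) -> dict[str, dict[str, int]]:
    """Map each packet type to its total sent and received counters."""
    stats = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows: <type> <sent> <received> <sent-5s> <received-5s>
        if len(fields) == 5 and all(f.isdigit() for f in fields[1:]):
            stats[fields[0]] = {
                "sent": int(fields[1]),
                "received": int(fields[2]),
            }
    return stats


def traffic_direction(stats: dict, packet_type: str = "Hello") -> str:
    """Describe which directions show traffic for the given packet type."""
    row = stats.get(packet_type)
    if row is None:
        return "no data"
    if row["sent"] and row["received"]:
        return "both directions"
    if row["sent"]:
        return "sending only"
    if row["received"]:
        return "receiving only"
    return "no traffic"


sample = """\
Packet type Total Last 5 seconds
Sent Received Sent Received
Hello 3 0 0 0
DbD 0 0 0 0
LSReq 0 0 0 0
"""

print(traffic_direction(parse_ospf_statistics(sample)))
```

Clearing the statistics first, as done in the case study, matters here: without a fresh baseline the totals could include packets from before the problem began.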

• Other outputs to check:

• System log:
user@mx2> show system uptime | match Current
Current time: 2013-02-11 23:14:10 UTC
user@mx2> show log messages | match "Nov 22" | match ospf
Nov 22 00:15:31 mxD-2 rpd[28667]: RPD_OSPF_NBRDOWN: OSPF neighbor
172.18.5.1 (realm ospf-v2 ge-0/0/0.0 area 0.0.0.0) state changed from
Full to Down due to InActiveTimer (event reason: neighbor was inactive
and declared dead)
• No related log messages in recent past
Checking the System Log

Because we have a configured system log file named messages, we view the system log for any messages relating to our
adjacency issue by matching on OSPF. Correlating the log timestamps with the system time, we come to the realization that
no recent messages have been logged since the adjacency issue arose.
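Correlating log timestamps with the system time can be done programmatically. The Python sketch below is illustrative only: the function names and the one-hour window are hypothetical choices, it assumes an English month abbreviation in the syslog prefix, and because syslog timestamps carry no year it assumes each entry is from the most recent occurrence of that date.

```python
from datetime import datetime, timedelta


def entry_time(log_line: str, now: datetime) -> datetime:
    """Parse the syslog prefix, choosing the latest year that is not after now."""
    stamp = datetime.strptime(f"{now.year} {log_line[:15]}", "%Y %b %d %H:%M:%S")
    if stamp > now:
        # The date has not occurred yet this year, so it must be last year's.
        stamp = stamp.replace(year=now.year - 1)
    return stamp


def is_recent(log_line: str, now: datetime,
              window: timedelta = timedelta(hours=1)) -> bool:
    """Return True if the entry was logged within the window before now."""
    return now - entry_time(log_line, now) <= window


# Current time and log entry taken from the slide.
now = datetime(2013, 2, 11, 23, 14, 10)
line = "Nov 22 00:15:31 mxD-2 rpd[28667]: RPD_OSPF_NBRDOWN: OSPF neighbor"

print(entry_time(line, now), is_recent(line, now))
```

An entry that falls outside the window, as the one on the slide does, is unlikely to be related to an adjacency that failed to form today.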

• Other outputs to check (contd.):

• Configure traceoptions:
user@mx2# show traceoptions
file ospftrace;
flag hello;
Note: Bounce OSPF first to restart the adjacency formation process.
• Alas! mx2 is sending Hello messages:
user@mx2# run show log ospftrace | find hello
Nov 24 00:59:45.898113 OSPF sent Hello 172.18.5.2 -> 224.0.0.5 (ge-1/0/2.144
IFL 76 area 0.0.0.0)
Nov 24 00:59:45.898595 version 2, length 44, ID 192.168.38.1, area 0.0.0.0
Nov 24 00:59:45.898619 mask 255.255.255.252, hello_ivl 10, opts 0x2, prio
128
Nov 24 00:59:45.898639 dead_ivl 40, DR 0.0.0.0, BDR 0.0.0.0
OSPF Traceoptions
To obtain more detailed OSPF information, we configure an OSPF traceoptions file and flag for hello messages. Once we
bounce the OSPF adjacency by disabling and re-enabling OSPF on the interface, we discover that we are, indeed, sending
OSPF hello messages. However, no further light is shed on the issue.


• Expand the trace:
user@mx2# show traceoptions
file ospftrace;
flag hello;
flag error detail;
Note: Bounce OSPF first to restart the adjacency formation process.
• The culprit:
user@mx2# run show log ospftrace
Nov 24 01:31:44.779373 OSPF packet ignored: authentication failure (bad
cksum).
Nov 24 01:31:44.779554 OSPF packet ignored: authentication failure from
172.18.5.1
Note: Monitoring the interface traffic would have been helpful with a plain-text authentication mismatch, but an MD5
secret mismatch would not have been detected.
More OSPF Traceoptions

Needing more information about the issue, we enable the error flag and use the detail option for our traceoptions. A
quick check of the new log highlights the most likely culprit. There is an OSPF MD5 authentication mismatch that we can
quickly resolve through configuration changes.
Note that bouncing the OSPF adjacency was a key action in the troubleshooting process. You might also find it helpful to
clear logs in between testing to establish a new baseline and trim the log file for review.
Not listed here, but immensely helpful, is familiarization with the OSPF adjacency formation process and states. For example,
you can greatly narrow your focus on an OSPF adjacency formation issue if you are aware of conditions that lead to no listed
adjacency state as opposed to an adjacency in the Init or 2-Way state. However, this protocol-level knowledge is beyond
the scope of this course.
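Once the error flag is tracing, a long trace file can be searched for the authentication failures seen in this case study. The Python sketch below is a minimal illustration: the function name auth_failures is hypothetical, and the pattern assumes the message wording shown in the trace output on the slide.

```python
import re

# The neighbor address, when present, follows "from".
AUTH_RE = re.compile(
    r"OSPF packet ignored: authentication failure(?: from (\S+))?"
)


def auth_failures(trace: str) -> tuple[int, set[str]]:
    """Return the number of failures and the set of reported source addresses."""
    flat = " ".join(trace.split())  # trace entries may wrap across lines
    count = 0
    sources = set()
    for match in AUTH_RE.finditer(flat):
        count += 1
        if match.group(1):
            sources.add(match.group(1))
    return count, sources


sample = """\
Nov 24 01:31:44.779373 OSPF packet ignored: authentication failure (bad
cksum).
Nov 24 01:31:44.779554 OSPF packet ignored: authentication failure from
172.18.5.1
"""

print(auth_failures(sample))
```

The set of source addresses points to the neighbor whose authentication settings should be compared with the local configuration.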


Monitoring Bridging

Bridging in the Control Plane (1 of 3)

• MX Series bridging processes:
user@mx> restart l2?
l2-learning Layer 2 address flooding and learning process
l2cpd-service Layer 2 Control Protocol process
l2tp-universal-edge Universal edge Layer 2 Tunneling Protocol daemon
• EX Series bridging processes:
user@ex> restart ethernet-switching
Ethernet Switching Process signalled but still running, waiting 8 seconds more
Ethernet Switching Process started, pid 17987
user@ex> restart lldpd-service
Link Layer Discovery Protocol started, pid 17989
MX Series Bridging
Many of the newest generation of Juniper Networks devices provide expanded Layer 2 support in addition to routing. The MX
Series of Ethernet services routers allow for bridging capabilities optimized for the metro Ethernet environment. The slide
illustrates the Layer 2 processes eligible for restart using the Junos CLI. The l2-learning process maintains bridging
functionality and the bridging table. The l2cpd-service process is responsible for media access control (MAC) address system
parameters and xSTP protocols.
EX Series Bridging
EX Series Ethernet Switches provide Layer 2 functionality aimed at the enterprise environment and have slightly different
configuration and monitoring commands. The slide illustrates the Layer 2 processes eligible for restart using the Junos CLI.
The ethernet-switching process is responsible for core bridging functionality and address learning. The
lldpd-service process maintains the Link Layer Discovery Protocol (LLDP).


• MX Series and EX Series bridge tables:
user@mx> show bridge mac-table
MAC flags (S -static MAC, D -dynamic MAC,
SE -Statistics enabled, NM -Non configured MAC)
Routing instance default-switch
Bridging domain lab, VLAN NA
MAC MAC Logical
address flags interface
50:c5:8d:87:8c:86 D ge-1/0/2.0
user@ex> show ethernet-switching table
Ethernet-switching table: 6 entries, 4 learned
VLAN MAC address Type Age Interfaces
v100 * Flood - All-members
v100 00:0c:29:73:13:fe Learn 0 ge-0/0/14.0
Displaying Bridge Tables

The slide illustrates the commands used to view the bridge tables on both MX Series and EX Series devices.

• Clearing bridge entries:

• MX Series
user@mx> clear bridge mac-table ?
<address> MAC address
bridge-domain Name of bridging domain, or 'all'
instance Display information for a specified instance
interface Clear media access control table for specified interface
isid Clear MAC address learned on a specified ISID
logical-system Name of logical system, or 'all'
vlan-id Clear MAC address learned on a specified VLAN (0 .. 4095)
• EX Series
user@ex> clear ethernet-switching table
Clearing Bridge Table Entries

The slide illustrates how to clear the bridge table on both MX Series and EX Series devices. Note that you can clear individual
entries, the entire table or other granular levels of the bridging tables.


Monitoring the Address Resolution Protocol


ARP Overview
• ARP associates IP addresses with Layer 2 addresses
in an ARP table
• Once a routing issue is isolated to a broadcast segment,
monitor the ARP process for a local problem
[Figure: sample topology with router D, Host X, and Host Y. ARP table: 192.168.30.2 = 02:00:54:55:4E:01]
Address Resolution Protocol Overview

Once a routing path issue has been isolated to an Ethernet segment, you might want to monitor the ARP process for further
troubleshooting. The slide provides an overview of the ARP process in a sample network topology simplified for clarity.
The router depicted on the slide has received an IP packet destined for Host Y, which resides on a directly connected
Ethernet segment. The router consults its ARP table to determine if a MAC address is associated with Host Y's IP address.
The destination MAC address is required to forward the packet at Layer 2. Because there is no ARP entry, the router sends a
broadcast ARP request to the destination segment requesting the MAC address associated with the destination IP address.
All end hosts except for Host Y ignore the request. Host Y sends an ARP response with the appropriate destination MAC
address. The router then adds the information to its ARP table for future reference and forwards the IP packet onto the
segment with Host Y's destination MAC address in the Ethernet header.
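The lookup, request, reply, and cache sequence described above can be modeled in a few lines. The Python sketch below is a simplified simulation, not an implementation of ARP: the Router class and its attributes are hypothetical, the segment is represented as a dictionary of addresses to MAC addresses, and 192.168.30.2 is assumed to be Host Y's address for illustration.

```python
class Router:
    """Toy model of a router resolving next-hop MAC addresses with ARP."""

    def __init__(self, segment: dict[str, str]):
        self.segment = segment    # IP -> MAC of hosts attached to the segment
        self.arp_table = {}       # learned IP -> MAC entries
        self.requests_sent = 0    # number of broadcast ARP requests issued

    def resolve(self, ip: str):
        # Step 1: consult the ARP table.
        if ip in self.arp_table:
            return self.arp_table[ip]
        # Step 2: no entry, so broadcast an ARP request on the segment.
        self.requests_sent += 1
        # Step 3: only the owner of the address replies; others ignore it.
        mac = self.segment.get(ip)
        # Step 4: cache the reply for future packets.
        if mac is not None:
            self.arp_table[ip] = mac
        return mac


router = Router({"192.168.30.2": "02:00:54:55:4e:01"})
first = router.resolve("192.168.30.2")   # triggers an ARP request
second = router.resolve("192.168.30.2")  # answered from the ARP table
print(first, second, router.requests_sent)
```

The second lookup does not generate another broadcast, which is why a stale or incorrect ARP entry can keep traffic failing until the entry is cleared or ages out.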

Working with the ARP Table (1 of 2)
• Display the ARP table:

user@router> show arp
MAC Address Address Name Interface Flags
80:71:1f:c3:0d:ff 10.210.15.5 10.210.15.5 fxp0.0 none
00:1b:21:28:4f:54 10.210.15.24 10.210.15.24 fxp0.0 none
Total entries: 2
• Manually clear the ARP entry:
user@router> clear arp hostname 10.210.15.5
10.210.15.5 deleted
Working with the ARP Table

The slide illustrates the format of the Junos ARP table, viewed with the show arp command. It also depicts the clearing of
an individual ARP entry. You can also clear an entire ARP table by omitting the hostname.
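
When ARP tables from several devices must be compared, it can help to turn the show arp text into structured records. The following is a minimal sketch that assumes the five-column layout shown on the slide; the parse_show_arp function is a hypothetical helper, not part of any Juniper tool.

```python
import re
from typing import Dict, List

# Sample text in the column layout shown on the slide.
SAMPLE = """\
MAC Address       Address         Name            Interface   Flags
80:71:1f:c3:0d:ff 10.210.15.5     10.210.15.5     fxp0.0      none
00:1b:21:28:4f:54 10.210.15.24    10.210.15.24    fxp0.0      none
Total entries: 2
"""

MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$", re.IGNORECASE)


def parse_show_arp(text: str) -> List[Dict[str, str]]:
    """Return one record per ARP entry, skipping the header and the total line."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # An entry line has at least five columns and starts with a MAC address.
        if len(fields) >= 5 and MAC_RE.match(fields[0]):
            entries.append({
                "mac": fields[0],
                "address": fields[1],
                "name": fields[2],
                "interface": fields[3],
                "flags": fields[4],
            })
    return entries


entries = parse_show_arp(SAMPLE)
```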

Working with the ARP Table (2 of 2)

• Monitoring the ARP process:
user@router> monitor traffic interface ge-1/0/0.141 no-resolve
verbose output suppressed, use <detail> or <extensive> for full protocol decode
Address resolution is OFF.
Listening on ge-1/0/0.141, capture size 96 bytes

06:03:58.441952 Out arp who-has 172.18.1.1 tell 172.18.1.2
06:03:58.442425  In arp reply 172.18.1.1 is-at 50:c5:8d:87:8c:84
^C
2 packets received by filter
0 packets dropped by kernel
Monitoring the ARP Process

The best method for troubleshooting a possible ARP issue is to watch it in progress using the monitor traffic
command. The slide illustrates an outgoing ARP request and its response using the monitor traffic command. Note
that most devices that run the Junos OS allow you to specify a manual ARP entry nested under the [edit interfaces
interface-name] hierarchy, which can also provide valuable testing for ARP issues or interoperability problems.
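
When a capture contains many ARP exchanges, a short script can point out requests that were never answered. The sketch below assumes the who-has and reply line format shown on the slide; the unanswered_requests function is a hypothetical helper, and real captures may contain line variants that it does not handle.

```python
import re
from typing import Iterable, List

REQUEST_RE = re.compile(r"arp who-has (\S+) tell (\S+)")
REPLY_RE = re.compile(r"arp reply (\S+) is-at (\S+)")


def unanswered_requests(lines: Iterable[str]) -> List[str]:
    """Return target IP addresses that were requested but never answered."""
    pending: List[str] = []
    for line in lines:
        request = REQUEST_RE.search(line)
        if request:
            target = request.group(1)
            if target not in pending:
                pending.append(target)
            continue
        reply = REPLY_RE.search(line)
        if reply and reply.group(1) in pending:
            # A reply for this address arrived, so the request was answered.
            pending.remove(reply.group(1))
    return pending


capture = [
    "06:03:58.441952 Out arp who-has 172.18.1.1 tell 172.18.1.2",
    "06:03:58.442425  In arp reply 172.18.1.1 is-at 50:c5:8d:87:8c:84",
]
answered = unanswered_requests(capture)
missing = unanswered_requests(capture[:1])
```

An address that stays in the returned list suggests that the remote host is not replying or that the replies are being lost on the segment.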

Summary
• Monitored and troubleshot system processes
• Practiced a logical approach to troubleshooting control
plane routing issues
• Learned methods of monitoring and troubleshooting
bridging functionality and ARP
We Discussed:
The monitoring and troubleshooting of control plane system processes;
A logical approach to control plane routing issues; and
The monitoring and troubleshooting of bridging and ARP functionality.

Review Questions
1. Name five functions of the control plane.
2. What are Junos system processes called?
3. How can you determine whether a user is logged in
using a console session versus a Telnet session?
4. Name three functions of rpd.
5. Name three commands that you can use to
troubleshoot control plane issues.

Control Plane Monitoring and Troubleshooting Lab
• Monitor and troubleshoot control plane system and user processes.
• Generate and retrieve core dump files.
• Troubleshoot a sample routing protocol issue.
• Monitor bridging and ARP.


Answers to Review Questions
1. In addition to hosting the Junos OS, the control plane manages routing protocols, system processes, user processes, chassis
components, routing tables, bridging tables, and the forwarding table.
2. Junos processes are also known as daemons.
3. To identify whether a user is logged in using the console or remotely using Telnet or SSH, view the output of the show system users
command. A tty value beginning with a u character indicates a console session. A tty value beginning with a p character indicates a
remote Telnet or SSH session.
4. rpd is responsible for controlling routing operations, including controlling routing protocols, handling protocol messages, maintaining
routing tables, and implementing routing policy.
5. Junos show commands such as show system statistics, show system processes, show system core-dumps, show system users, and show
system alarms, among others, can be used to troubleshoot control plane issues.


Chapter 7: Data Plane: Interfaces

Objectives
After successfully completing this content, you will be able to:
• Describe physical and logical interface properties
• Deactivate and disable interfaces
• Perform loopback testing
• Use operational mode commands to monitor and
troubleshoot Ethernet interfaces
We Will Discuss:
Physical and logical interface properties;
Deactivating and disabling interfaces;
Loopback testing; and
Monitoring and troubleshooting Ethernet interfaces.

Agenda: Data Plane: Interfaces
• Interface Properties
• General Interface Troubleshooting
• Ethernet Interface Troubleshooting
Interface Properties

Interface Properties
• Physical properties:
• Ethernet options (speed, autonegotiation)
• Clocking
• Scrambling
• Frame check sequence (FCS)
• Maximum transmission unit (MTU)
• Data link layer protocol, keepalives
• Logical properties:
• Protocol family (Internet, ISO, MPLS, Bridge)
• Addresses (IP address, ISO NET address)
• Virtual circuits (VCI/VPI, DLCI)
Physical Properties
The following list provides details of the interface's physical properties:
Ethernet options: For Ethernet interfaces, refers to speed, duplex, and autonegotiation parameters.
Clocking: Refers to the interface clock source, either internal or external.
Scrambling: Refers to payload scrambling, which can be on or off.
Frame check sequence (FCS): You can modify to 32-bit mode (the default is 16-bit mode).
Maximum transmission unit (MTU): You can vary the size from 256 to 9192 bytes.
Data-link-layer protocol, keepalives: You can change the data-link-layer protocol for the particular media type (for
example, Point-to-Point Protocol [PPP] to Cisco High-Level Data Link Control [Cisco HDLC]), and you can turn
keepalives on or off.
The following list provides details of the interface's logical properties:
Protocol family: Refers to the protocol family you want to use, such as family iso, inet, or mpls.
Addresses: Refers to the address associated with the particular family (for example, IP address using family inet).
Virtual circuits: Refers to the virtual circuit identifier, such as a data-link connection identifier (DLCI), virtual path
identifier (VPI)/virtual channel identifier (VCI), or virtual LAN (VLAN) tag.
Other characteristics: Some other configurable options include Inverse Address Resolution Protocol (ARP), traps,
and accounting profiles.

Disabling and Deactivating Interfaces
• Adding the inactive tag effectively removes the statement from the configuration:
[edit interfaces]
user@router# deactivate ge-1/0/1
[edit interfaces]
user@router# show ge-1/0/1
##
## inactive: interfaces ge-1/0/1
##
• Disable an interface or a logical unit to set it as administratively down:
[edit interfaces]
user@router# set ge-1/0/1 disable
[edit interfaces]
user@router# show ge-1/0/1
disable;
Deactivating an Interface
In a configuration, you can deactivate statements and identifiers so that they do not take effect when you issue the commit
command. Any deactivated statements and identifiers are marked with the inactive tag. They remain in the configuration but
are not activated when you issue a commit command.
To deactivate a statement or identifier, use the deactivate configuration mode command: deactivate (statement |
identifier). To reactivate a statement or identifier, use the activate configuration mode command: activate
(statement | identifier). You can deactivate or disable a statement at many levels of the hierarchy.
Disable Versus Deactivate

In some portions of the configuration hierarchy, you can include a disable statement to disable functionality. One example is
disabling an interface by including the disable statement at the [edit interfaces interface-name] hierarchy level.
When you deactivate a statement, the Junos OS completely ignores that specific object or property and does not apply it at all
when you issue a commit command. When you disable a functionality, it is activated when you issue a commit command but
is treated as being down or administratively disabled.

Interface Configuration Examples

An ATM interface with multiple units:
[edit interfaces]
user@router# show at-0/2/1
description "SY to HK and DE";
atm-options {
    vpi 0;
}
unit 0 {
    description "to HK";
    vci 0.100;
    family inet {
        address 10.0.15.1/24;
    }
}
unit 101 {
    description "to DE";
    vci 0.101;
    family inet {
        address 172.16.0.1/24;
    }
}

Gigabit Ethernet with inet and mpls support:
[edit interfaces]
user@router# show ge-0/0/2
unit 0 {
    family inet {
        address 10.0.13.1/24;
    }
    family mpls;
}

A SONET interface running Frame Relay with keepalives (LMI) disabled:
[edit interfaces]
user@router# show so-0/1/3
no-keepalives;
encapsulation frame-relay;
unit 100 {
    dlci 100;
    family inet {
        address 4.4.4.4/24;
    }
}
Interface Configuration Examples

This slide shows three configuration examples for common interface types. You can use cut and paste in conjunction with the
load merge terminal command to modify these configurations for use in your router. Piping the output of a show
command to display set is an excellent way to see the commands that were used to create a given configuration stanza.
Note that each configuration example makes use of at least one logical unit, and that a protocol family and related logical
properties are specified at the unit level. The commands used to configure the Asynchronous Transfer Mode (ATM) interface
shown on the slide are shown here, using the command-line interface (CLI) support for piping output to display set:
[edit interfaces]
user@router# show at-0/2/1 | display set
set interfaces at-0/2/1 description "SY to HK and DE"
set interfaces at-0/2/1 atm-options vpi 0
set interfaces at-0/2/1 unit 0 description "to HK"
set interfaces at-0/2/1 unit 0 vci 100
set interfaces at-0/2/1 unit 0 family inet address 10.0.15.1/24
set interfaces at-0/2/1 unit 101 description "to DE"
set interfaces at-0/2/1 unit 101 vci 101
set interfaces at-0/2/1 unit 101 family inet address 172.16.0.1/24
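
The relationship between the hierarchical view and the set view can be illustrated with a toy flattening routine. This is a sketch only: it models a configuration stanza as a nested Python dictionary, which is an assumption made for illustration, and the to_set_commands function is hypothetical and unrelated to how the CLI itself produces display set output. The sample data mirrors the Gigabit Ethernet example from the slide.

```python
from typing import Iterator, List


def to_set_commands(node: dict, path: List[str]) -> Iterator[str]:
    """Flatten a nested dict into one set command per leaf statement."""
    for key, value in node.items():
        if isinstance(value, dict):
            # A container: descend and extend the statement path.
            yield from to_set_commands(value, path + [key])
        elif value is None:
            # A bare statement with no value, such as "family mpls".
            yield " ".join(["set"] + path + [key])
        else:
            yield " ".join(["set"] + path + [key, value])


config = {
    "interfaces": {
        "ge-0/0/2": {
            "unit 0": {
                "family inet": {"address": "10.0.13.1/24"},
                "family mpls": None,
            }
        }
    }
}
commands = list(to_set_commands(config, []))
```

Each leaf of the hierarchy becomes exactly one set command whose words are the statements on the path from the top of the configuration down to that leaf.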

Interface Configuration Examples (contd.)
The following configuration is somewhat complicated because it reflects a channelized DS3 Q-PIC interface that is broken down
into channelized and unchannelized DS1s. In the former case, two DS0 channel bundles are defined, along with the related ds
interfaces:
[edit interfaces]
user@router# show | no-more
ct3-0/1/0 {
    description "Q-PIC based CT3 to CT1 and non-channelized T1s.";
    t3-options {
        loopback remote;
        loop-timing;
    }
    partition 1 interface-type ct1;
    partition 2-28 interface-type t1;
}
ct1-0/1/0:1 {
    description "CT1 to NxDS0s.";
    t1-options {
        line-encoding ami;
        framing sf;
        bert-algorithm all-ones-repeating;
    }
    partition 1 timeslots 1-10 interface-type ds;
    partition 2 timeslots 11-23 interface-type ds;
}
ds-0/1/0:1:1 {
    description "first DS0 channel bundle of ct1-0/1/0:1";
    unit 0 {
        family inet {
            address 1.1.1.1/24;
        }
    }
}
ds-0/1/0:1:2 {
    description "Second DS0 channel bundle of ct1-0/1/0:1";
    unit 0 {
        family inet {
            address 2.2.2.2/24;
        }
    }
}
t1-0/1/0:2 {
    description "First full T1 from ct3-0/1/0, range is t1-0/1/0:[2-28]";
    encapsulation cisco-hdlc;
    unit 0 {
        family inet {
            address 3.3.3.3/24;
        }
    }
}


• Interface Properties
• General Interface Troubleshooting
• Ethernet Interface Troubleshooting
General Interface Troubleshooting

The slide lists the topic we discuss next.

Interface Troubleshooting Overview
• Understand the demarcation point
• Topology determines the troubleshooting approach; there are essentially three topology types to consider when
troubleshooting:
  • LAN or broadcast multi-access (Ethernet)
  • Point-to-point (SONET, SDH, T1, E1, T3, E3, PPP, or HDLC)
  • Point-to-multipoint (SONET, SDH, T1, E1, T3, E3, Frame Relay, or ATM)
• Tools available and approach for each type vary
Understanding the Demarcation

Understanding the demarcation is important when troubleshooting a given problem. The model in North America is based on
the customer providing, and thereby being responsible for, the channel service unit (CSU) and data service unit (DSU) function.
The telco in this environment does not have any means of verifying the local-loop or tail without getting the subscriber to set a
loop back to the provider.
In Europe, the telco supplies the channel service unit (CSU) device and is responsible for the verification and testing of the
local-loop in addition to whatever segments might exist between the customer premises equipment (CPE).
Topology Determines Approach

Three topology types to consider when troubleshooting are the following:
LAN/broadcast multiaccess (Fast/Gigabit Ethernet);
Point-to-point (SONET/SDH, T3/E3, T1/E1, PPP, or Cisco HDLC); and
Point-to-multipoint (SONET/SDH, T3/E3, T1/E1, Frame Relay or ATM-VC).
Tools Available
The following pages discuss the tools available in the Junos OS.

Displaying Terse Interface Status

• The show interfaces terse command provides a quick view of interface status:
user@router> show interfaces so* terse
Interface               Admin Link Proto    Local                 Remote
so-1/1/0                down  up
so-1/1/0.0              up    down inet     1.1.1.1/30
                                   iso
so-1/1/1                up    up
so-1/1/1.0              up    down inet     2.2.2.2/30
                                   iso
so-1/1/2                up    up
so-1/1/2.0              up    up   inet     3.3.3.3/30
[Callouts: so-1/1/0 is administratively disabled; so-1/1/1.0 shows the link layer down; so-1/1/2.0 shows the link layer up]

Admin  Link   Meaning
down   down   Administratively disabled
up     down   Router interface problem
              Interface misconfigured (encapsulation)
              Keepalive sequencing not incrementing
              CSU/DSU failure
              Carrier problem (noisy line, timing mismatches)
Displaying Interface Status at a Glance

Use the show interfaces terse command to display a terse listing of all interfaces installed in the router along with their
administrative and link-layer status. The table on the slide explains the meaning of the Admin and Link status indications.
When an interface is administratively disabled, the physical interface has an Admin status of down and a Link status of up,
and the logical interface has an Admin status of up and a Link status of down. The physical interface has a Link status of up
because the physical link is healthy (no alarms). The logical interface has a Link status of down because the data-link layer
cannot be established end to end.
When an interface is not administratively disabled and the data-link layer between the local router and the remote router is not
functioning, the physical interface has an Admin status of up and a Link status of up, while the logical interface has an Admin
status of up and a Link status of down. The physical interface has a Link status of up because the physical link is healthy (no
alarms). The logical interface has a Link status of down because the data-link layer cannot be established end to end.
If the data-link layer between the local router and the remote router is up and running, both the physical and logical interfaces
have an Admin status of up and a Link status of up, as shown in the case of the so-1/1/2 interface on the slide.
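
The three cases described above, plus the physical link down case from the table on the slide, can be summarized as a small decision function. This is a sketch for study purposes; the classify function and its return strings are hypothetical and simply restate the rules in this section.

```python
def classify(phys_admin: str, phys_link: str, logical_link: str) -> str:
    """Interpret Admin/Link values for a physical interface and its logical unit."""
    if phys_admin == "down":
        return "administratively disabled"
    if phys_link == "down":
        # Physical layer problem: interface, configuration, CSU/DSU, or carrier.
        return "physical link down"
    if logical_link == "down":
        return "link layer down"
    return "link layer up"


# Values read from the so-1/1/0, so-1/1/1, and so-1/1/2 rows on the slide.
so_1_1_0 = classify("down", "up", "down")
so_1_1_1 = classify("up", "up", "down")
so_1_1_2 = classify("up", "up", "up")
```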

Standard Interface Display

user@router> show interfaces ge-1/0/1
Physical interface: ge-1/0/1, Enabled, Physical link is Up
  Interface index: 141, SNMP ifIndex: 513
  Link-level type: Ethernet, MTU: 1518, Speed: 1000mbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Enabled, Auto-negotiation: Enabled, Remote fault: Online
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x0
  CoS queues     : 8 supported, 8 maximum usable queues
  Current address: 80:71:1f:c3:03:61, Hardware address: 80:71:1f:c3:03:61
  Last flapped   : 2013-01-30 17:13:52 PST (1w4d 20:45 ago)
  Input rate     : 0 bps (0 pps)
  Output rate    : 0 bps (0 pps)
  Active alarms  : None
  Active defects : None
  Interface transmit statistics: Disabled

  Logical interface ge-1/0/1.141 (Index 329) (SNMP ifIndex 655)
    Flags: SNMP-Traps 0x0 VLAN-Tag [ 0x8100.141 ] Encapsulation: ENET2
    Input packets : 15479
    Output packets: 14669
    Protocol inet, MTU: 1500
      Flags: Sendbcast-pkt-to-re
      Addresses, Flags: Is-Preferred Is-Primary
        Destination: 172.18.2.0/30, Local: 172.18.2.2, Broadcast: 172.18.2.3
    Protocol multiservice, MTU: Unlimited
[Callouts: physical device indexes; device configuration and operational flags; traffic load and alarm status; logical device
indexes; logical device settings]
Standard Interface Status

Use the show interfaces command without the terse or detail options to display standard information about the
named interface (or all interfaces when a specific interface is not identified). This slide provides sample output for a Gigabit
Ethernet interface. The callouts on the slide help illustrate how interfaces are partitioned into physical devices and logical units
in the Junos OS.
Each physical and logical interface is referenced by two index numbers within the Junos OS. An interface index is assigned to
each interface at boot time depending upon the order in which that interface is activated. The Simple Network Management
Protocol (SNMP) ifIndex is used to identify and reference that interface when performing SNMP MIB walks. Note that the
indexes assigned to the physical interface device (ifd) differ from the index used to identify the logical device (ifl). Wherever
possible, the SNMP ifIndex values are persistent across reboots or in the event of hardware additions and deletions that
result from PIC or FPC insertion and removal. This persistence is the default behavior and is achieved by storing SNMP indexes
in the /var/db/dcd.snmp_ix file. When issuing a commit synchronize command, the Junos OS copies this file to the
backup RE to ensure that the same SNMP index values are used in the event of an RE switchover.
The output of a show interfaces command also includes a section on the device-level configuration and its operational
flags.

Standard Interface Status (contd.)

The output of a show interfaces command displays the device-level configuration and provides additional information
about the device's operation through various flags. These flags include the following:
Down: Device was administratively disabled.
Hear-Own-Xmit: Device will hear its own transmissions.

Link-Layer-Down: The link-layer protocol failed to successfully connect with the remote endpoint.
Loopback: Device is in physical loopback.
Loop-Detected: The link layer received frames that it sent and suspects a physical loopback.
No-Carrier: Where the media supports carrier recognition, this indicates that no carrier is currently seen.
No-Multicast: Device does not support multicast traffic.
Present: Device is physically present and recognized.
Promiscuous: Device is in promiscuous mode and sees frames addressed to all physical addresses on the
medium.
Quench: Device is quenched because it overran its output buffer.
Recv-All-Multicasts: No multicast filtering (multicast promiscuous).
Running: Device is active and enabled.
The status of the interface is communicated with one or more flags. These flags include the following:
Admin-Test: Interface is in test mode, which means that some sanity checking, such as loop detection, is
disabled.
Disabled: Interface is administratively disabled.
Hardware-Down: Interface is nonfunctional or incorrectly connected.
Link-Layer-Down: Interface keepalives indicate that the link is incomplete.
No-Multicast: Interface does not support multicast traffic.
Point-To-Point: Interface is point to point.
Promiscuous: Interface is in promiscuous mode and sees frames addressed to all physical addresses.
Recv-All-Multicasts: No multicast filtering (multicast promiscuous).
SNMP-Traps: SNMP traps are enabled.
Up: Interface is enabled and operational.
The operational status of the device's link layer protocol might also be indicated with flags. These flags include the following:
Give-Up: Link protocol does not continue to retry to connect after repeated failures.
Keepalives: Link protocol keepalives are enabled.
Loose-LCP: PPP does not use Link Control Protocol (LCP) to indicate whether the link protocol is up.
Loose-LMI: Frame Relay will not use the Local Management Interface (LMI) to indicate whether the link protocol
is up.
Loose-NCP: PPP does not use Network Control Protocol (NCP) to indicate whether the device is up.
No-Keepalives: Link protocol keepalives are disabled.
The output from Ethernet interfaces, as shown on the slide, does not display link layer flags.
The output also summarizes the device-level traffic load, which is displayed in both bits and packets per second, as well as
any alarms that might be active. The final portion of the command output displays the configuration and status of each
logical unit defined on that device. In this example, a single unit is defined with support for the inet protocol family.

Displaying Input and Output Errors

user@router> show interfaces ge-1/0/1 extensive | find "Input errors"
  Input errors:
    Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Policed discards: 0,
    L3 incompletes: 0, L2 channel errors: 0, L2 mismatch timeouts: 0,
    FIFO errors: 0, Resource errors: 0
  Output errors:
    Carrier transitions: 1, Errors: 0, Drops: 0, Collisions: 0, Aged packets: 0,
    FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
  Egress queues: 8 supported, 4 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0 best-effort                14935                14935                    0
    1 expedited-fo                   0                    0                    0
    2 assured-forw                   0                    0                    0
    3 network-cont                 840                  840                    0
  Queue number:         Mapped forwarding classes
    0                   best-effort
    1                   expedited-forwarding
    2                   assured-forwarding
    3                   network-control
  Active alarms  : None
  Active defects : None
Displaying Input and Output Errors for the Interface

Use the show interfaces extensive command to display input errors (extensive output only) on the interface. Use the
clear interfaces statistics interface-name command to reset the counters for the specified interface; omit the
interface name to clear all interface statistics. The following list explains the nonobvious counters:
Errors: Displays the sum of the incoming frame aborts and FCS errors.
Drops: Displays the number of packets dropped by the output queue of the I/O Manager application-specific
integrated circuit (ASIC).
Framing errors: Displays the number of packets received with an invalid FCS.
Runts: Displays the number of frames received smaller than the runt threshold.
Giants: Displays the number of frames received larger than the giant threshold.
Policed discards: Displays the frames that the incoming packet match code discarded because they were
not recognized or of interest. Usually, this field reports protocols the Junos OS does not handle, such as Cisco
Discovery Protocol (CDP), or any protocol type the Junos OS does not understand. (On an Ethernet network,
numerous possibilities exist.)

Displaying Input and Output Errors for the Interface (contd.)

Carrier transitions: This counter represents the number of times the interface has gone from the down
state to the up state. If incrementing quickly, the cable, the remote system, or the interface is malfunctioning.
L3 incompletes: This counter increments when the incoming packet fails Layer 3 (usually IPv4) checks of the
header. For example, a frame with less than 20 bytes of available IP header would be discarded, and this counter
would increment.
L2 channel errors: This counter increments when the software cannot find a valid logical interface (such as
e3-1/2/3.0) for an incoming frame.
L2 mismatch timeouts: Displays the count of malformed or short packets that cause the incoming packet
handler to discard the frame as unreadable.
The show interfaces extensive command also displays the output errors on the interface. The following list explains the
nonobvious counters:
HS link CRC errors: Displays the count of errors on the high-speed links between the ASICs responsible for
handling the router interfaces.
Carrier transitions: Displays the number of times the interface has gone from down to up. This number
should not increment quickly, increasing only when the cable is unplugged, the far-end system is powered down
and up, or a similar problem occurs. If it does increment quickly (perhaps every 10 seconds), then either the
transmission line, the far-end system, or the PIC is broken.
Errors: Displays the sum of the outgoing frame aborts and FCS errors.
Drops: Displays the number of packets dropped by the output queue of the I/O Manager ASIC. If the interface is
saturated, this number increments once for every packet that is dropped by the ASIC's RED mechanism.
Resource errors: Displays the sum of transmit drops.

Aged packets: Displays the number of packets that remained in shared packet SDRAM for so long that the
system automatically purged them. The value in this field should never increment. If it does, it is most likely a
software bug or possibly malfunctioning hardware.
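
Because these counters are cumulative, what matters during troubleshooting is how they change between two readings. The following sketch compares two snapshots taken a known number of seconds apart. The suspicious_counters function, the dictionary keys, and the default threshold of one carrier transition per minute are all assumptions chosen for illustration; they are not Junos defaults.

```python
from typing import Dict, List


def suspicious_counters(
    before: Dict[str, int],
    after: Dict[str, int],
    interval_seconds: float,
    max_transitions_per_minute: float = 1.0,  # assumed threshold, tune as needed
) -> List[str]:
    """Compare two counter snapshots and report the patterns described above."""
    findings = []
    transitions = after["carrier_transitions"] - before["carrier_transitions"]
    per_minute = transitions * 60.0 / interval_seconds
    if per_minute > max_transitions_per_minute:
        findings.append("carrier transitions incrementing quickly")
    # Aged packets should never increment, so any increase is worth reporting.
    if after["aged_packets"] > before["aged_packets"]:
        findings.append("aged packets incremented")
    return findings


first = {"carrier_transitions": 1, "aged_packets": 0}
second = {"carrier_transitions": 7, "aged_packets": 0}
flapping = suspicious_counters(first, second, interval_seconds=60)
stable = suspicious_counters(first, first, interval_seconds=60)
```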

Monitoring an Interface
user@router> monitor interface ge-1/0/1
router                          Seconds: 57                  Time: 13:15:37
                                                          Delay: 21/0/79
Interface: ge-1/0/1, Enabled, Link is Up
Encapsulation: Ethernet, Speed: 1000mbps
Traffic statistics:                                           Current delta
  Input bytes:                105446910 (0 bps)                    [1386]
  Output bytes:                76075226 (368 bps)                   [720]
  Input packets:                1489787 (0 pps)                      [20]
  Output packets:                826831 (0 pps)                       [8]
Error statistics:
  Input errors:                       0                               [0]
  Input drops:                        0                               [0]
  Input framing errors:               0                               [0]
  Policed discards:                   0                               [0]
  L3 incompletes:                     0                               [0]
  L2 channel errors:                  0                               [0]
  L2 mismatch timeouts:               0                               [0]
Next='n', Quit='q' or ESC, Freeze='f', Thaw='t', Clear='c', Interface='i'
[Callout: real-time traffic and error counts]
Monitoring an Interface
The slide depicts a typical output from the monitor interface command. You must set your terminal session to VT100 for
the screen to display correctly. This command provides real-time packet and byte counters as well as displaying error and alarm
conditions. This output contrasts to the monitor traffic command, which displays a form of packet capture for control
traffic.

Loopback Testing
• Loopback testing is the primary method for
distinguishing between interface and circuit faults
Loopback Testing
The physical path of a line usually consists of a number of segments or spans interconnected by devices that repeat and
regenerate the signal. When a fault occurs on the circuit that takes the form of either a break or signal corruption due to noise,
it is possible to localize the problem by testing the line on a segment-by-segment basis or end-to-end basis, as needed.
Each circuit is symmetric in that a transmit path from one device connects to the receive path on the remote side, and vice
versa. Looping is the process of connecting the transmit path of a router or intermediate device to the receive path. If this device
is one of the routers, the loop will either be detected if the looped segment is operational, or not detected if there is a break.
This detection is achieved by the router detecting its own data-link-layer keepalive packets (for example, the magic number when
the encapsulation is PPP).
If a loop is set back towards a router and it is not detected, you can assume that the problem lies somewhere between the router
and where the loop was set by the telco or provider. The next step is to set a loop somewhere closer to the router to localize the
problem segment.
It is usually possible to loop the router's interface locally by connecting the PIC's transmit and receive ports. You should take
care to attenuate signal strength when dealing with intermediate- and long-reach fiber-optic interfaces.
You can use a similar approach to track down noise on a line by combining the looping process with a test that checks for bit
rate errors, commonly known as a bit error rate test (BERT). Many of the Juniper Networks M Series Multiservice Edge Router
and T Series Core Router interfaces support BERT testing.
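
The segment-by-segment localization described above can be sketched as a simple search. The sketch assumes a single fault and an ordered list of points at which a loop can be set, nearest to the router first; the locate_fault function and the loop point names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def locate_fault(
    loop_points: List[str],
    loop_detected: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Return the pair of adjacent points that brackets the faulty segment.

    loop_points is ordered from nearest the router to farthest away.
    loop_detected(point) reports whether the router detects a loop set there.
    """
    previous = "router"
    for point in loop_points:
        if not loop_detected(point):
            # The loop at `previous` worked but this one did not,
            # so the fault lies between the two.
            return (previous, point)
        previous = point
    return None  # Every loop was detected; no fault found on this path.


# Hypothetical path in which the fault sits beyond the local CSU/DSU.
points = ["local PIC", "local CSU/DSU", "telco central office", "remote CSU/DSU"]
results = {
    "local PIC": True,
    "local CSU/DSU": True,
    "telco central office": False,
    "remote CSU/DSU": False,
}
faulty_segment = locate_fault(points, lambda point: results[point])
```

In practice the loops are usually set starting at the far end and moved closer to the router, as the text describes; the scan direction used here is chosen only to keep the example short.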

Supported Loopback Types

• Most PICs support internal loopbacks
  • Point-to-point-type PICs also support remote loopbacks
  • A local loop does not also provide a remote loop
• Can perform only one type of loopback at any given time

[Figure: loopback types. Callouts: "Circuit can be looped anywhere in the path"; "Port is OK (internally)"; "Loopback remote"]
Supported Loopback Types

Most Junos interfaces support internal local loopback tests. Where possible, it is best to perform local loopbacks using an
external loopback plug because this also tests the PIC's transmit and receive circuitry. Point-to-point-style interfaces
(nonbroadcast technologies such as SONET or T1/DS1) also support remote loopbacks. Note that configuring an interface
for a remote loopback results in a line loop on the local router; it does not generate a remote loopback request to the remote
router. Line loops can be remotely signaled for PICs with integral CSU functionality (T1/E1 and T3/E3), but the generation of the
remote loopback code requires telco interaction or test equipment. Again, configuring a remote loopback in the Junos OS does
not signal the remote end to perform a loopback; it creates a local line-loop condition. However, this behavior is somewhat
different with the remote Ethernet loopback testing provided by Ethernet Operation, Administration, and
Maintenance (OAM), discussed later in this chapter.
For local loopback the PIC's transmit clocking should be set to internal, which is the default setting. A remote loopback (line
loop) allows telco testing on the local loop (also called the tail) and also allows testing from the remote router.

Configuring Loopbacks
• Local and remote loops require configuration on most PICs
• External local loop and telco line loops do not require configuration
[edit interfaces ge-1/0/1]
user@router# show
gigether-options {
    loopback;
}
unit 0 {
    family inet {
        address 172.22.241.1/24 {
            arp 172.22.241.10 mac 80:71:1f:c3:18:61;
        }
    }
}
[Callouts: only local loops are permitted on Ethernet interfaces; you must configure a static ARP entry]
• Display interface status to confirm that a configured loopback is in effect
user@router> show interfaces ge-1/0/1 | match loop
  MAC-REWRITE Error: None, Loopback: Enabled, Source filtering: Disabled,
  Device flags   : Present Running Loop-Detected
Configuring Loopbacks
Interface loopbacks require configuration in the Junos OS for most PICs and interface types. A small number of channelized DS3
and OC12 interfaces support the ability to initiate FEAC-based or T1 inband and FDL-based loopbacks using operational mode
commands. Note that configuration is never needed for an external local loopback with a loopback plug, or when relying on the
telco to provide a line loopback (which appears as a remote loopback to the attached router). This slide shows an example of a
local loopback configuration and the operational mode status display that confirms that the loopback is in place. The example is
based on a Gigabit Ethernet interface and displays a manually configured Address Resolution Protocol (ARP) entry. The manual
ARP configuration enables the Junos OS to send test frames without the need for ARP resolution.
Note that when the telco provides a line loopback, nothing indicates that a loopback is in place unless the configured Layer 2
protocol has built-in loopback detection (for example, PPP). The routers used in this example are running Frame Relay with
LMI-based keepalives disabled. As a result, a remote loopback goes undetected at the remote router, which is now talking to
itself as indicated by the TTL expiration messages shown here (we cover the use of ping to test loopbacks on an upcoming slide):

Configuring Loopbacks (contd.)
[edit interfaces so-0/1/1]
user@router# run ping 10.0.22.1 count 1
PING 10.0.22.1 (10.0.22.1): 56 data bytes
36 bytes from 10.0.22.2: Time to live exceeded
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 601b   0 0000  01  01 198c 10.0.22.2  10.0.22.1

--- 10.0.22.1 ping statistics ---

[edit interfaces so-0/1/1]
user@router# run show interfaces so-0/1/1 | match loop
  Link-level type: Frame-Relay, MTU: 4474, Clocking: Internal, SONET mode, Speed: OC3,
  Loopback: None, FCS: 16,

Testing a Looped Line

• Loop the line and ensure that the interface stays up
  • Needs ATM, Frame Relay, or HDLC encapsulation with keepalives disabled
• Ping the remote IP address
  • If the line is good, the ping returns to the router and is routed back out the interface with the TTL decremented
  • When the TTL expires, an error message returns; this is the expected result
  • TTL expiration indicates that no packets were lost during the test; the TTL setting determines the number of loops
    (the default TTL is 255)
• Helpful to open another session and monitor the interface under test to display such things as CRC errors
user@router> ping 172.22.241.10
PING 172.22.241.10 (172.22.241.10): 56 data bytes
36 bytes from 172.22.241.1: Time to live exceeded
 4  5  00 0054 7b3f   0 0000  01  01 0431 172.22.241.1  172.22.241.10
Testing a Looped Line

Many Layer 2 protocols make use of a keepalive mechanism that, among other things, can detect the presence of a loopback.
Whether local or remote, the detection of a loop condition results in a link down declaration for that interface. When the
interface is marked as down at the link layer, the related interface route is removed from the routing table, which prevents ping
testing for the duration of the loopback.
In most cases you can work around this issue by configuring the interface with a no-keepalives statement, but this only
works for the frame-relay, atm, and cisco-hdlc encapsulation types. Even with keepalives (LCP) disabled, PPP still
detects the presence of a loopback when the IP Network Control Protocol (IP-NCP) attempts to negotiate Layer 3 parameters.
The only way around this condition is to change the interface's encapsulation type for the duration of the loopback test.
Note that Ethernet-related technologies, by default, have no concept of a link-layer keepalive protocol, and they do not support
the concept of a remote loopback. However, Ethernet OAM allows for some remote loopback testing. We discuss Ethernet OAM
on subsequent slides.
Note that in some non-Juniper Networks equipment, you can test the operation of a WAN link by issuing pings to the router's
local IP address. Junos-based routers do not exhibit this behavior. A ping sent to the router's local IP address does not exit the
interface, and as such, cannot be used to ascertain the operational status of the line. On devices that run the Junos OS, ping a
non-local address from the directly connected subnet.
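The looped-line test described above can be modeled with a short Python sketch. This is a simplified illustration, not Junos behavior captured from a device: the function name `simulate_looped_ping` and the `lost_on_pass` parameter are hypothetical, and the model assumes the router discards a returning packet, and reports a TTL expiration, once the TTL can no longer be decremented for forwarding.

```python
from typing import Optional, Tuple


def simulate_looped_ping(ttl: int, lost_on_pass: Optional[int] = None) -> Tuple[str, int]:
    """Model one echo request circulating over a looped line.

    Returns (outcome, passes), where passes counts trips across the line.
    lost_on_pass (hypothetical) marks the trip on which the line drops the packet.
    """
    if ttl < 1:
        raise ValueError("ttl must be at least 1")
    passes = 0
    while True:
        passes += 1                          # packet crosses the looped line and returns
        if lost_on_pass == passes:
            return ("lost", passes)          # dropped on the line: no TTL error is seen
        if ttl <= 1:
            return ("ttl-exceeded", passes)  # router cannot forward the packet again
        ttl -= 1                             # router routes the packet back out the interface


print(simulate_looped_ping(255))                  # clean line: TTL expires
print(simulate_looped_ping(255, lost_on_pass=40)) # bad line: packet is lost
```

In this model, a "ttl-exceeded" outcome means every trip across the line succeeded, which is why the TTL expiration message is the expected result on a good line.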

Detecting MTU Issues
• Use ping with various packet sizes and the do-not-fragment flag:
• With size 1472, the additional 20 bytes of IP and 8 bytes of ICMP equal 1500 bytes
user@router> ping 192.168.20.1 size 1472 do-not-fragment count 1
PING 192.168.20.1 (192.168.20.1): 1472 data bytes
1480 bytes from 192.168.20.1: icmp_seq=0 ttl=254 time=2.848 ms
--- 192.168.20.1 ping statistics ---
round-trip min/avg/max/stddev = 2.848/2.848/2.848/0.000 ms
user@router> ping 192.168.20.1 size 1473 do-not-fragment count 1    <-- The byte that breaks the MTU's back
PING 192.168.20.1 (192.168.20.1): 1473 data bytes
ping: sendto: Message too long
--- 192.168.20.1 ping statistics ---

Detecting MTU Problems

The best way to detect interface problems relating to inconsistent MTU settings is to first determine the path that packets take
using traceroute, and then conduct ping testing with varying packet sizes in conjunction with the do-not-fragment
switch.
Note that for each interface, a device MTU exists, which is a function of that medium and which ultimately limits the maximum
supported protocol family MTUs. In addition, each configured protocol family has its own MTU value on a per-logical unit basis. In
this example, the Gigabit Ethernet device MTU has a value of 1500 bytes.
This slide illustrates how the Junos OS can support 1472 ICMP data bytes over a path with an MTU of only 1500 bytes; the
additional 8 bytes of ICMP header and 20 bytes of IP header yield a total IPv4 protocol data unit size of 1500 bytes. When the operator
increases the size to 1473, the packet exceeds the MTU by one byte and cannot be fragmented, so an error message is returned.
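The arithmetic behind the two pings can be sketched in Python. This is a minimal illustration only; the function names and constants are assumptions made for the example (a 20-byte IPv4 header without options and an 8-byte ICMP echo header), not part of the Junos OS.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumed for this sketch)
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_size(path_mtu: int) -> int:
    """Largest ping 'size' value that fits in path_mtu without fragmentation."""
    return path_mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def fits_without_fragmenting(size: int, path_mtu: int) -> bool:
    """True when an echo request carrying 'size' data bytes fits within path_mtu."""
    return size + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES <= path_mtu


print(max_ping_size(1500))                   # 1472
print(fits_without_fragmenting(1472, 1500))  # True
print(fits_without_fragmenting(1473, 1500))  # False
```

The same calculation gives a starting size for other protocol family MTUs when you test with the do-not-fragment switch.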

Minimize Disruption
• When links are flapping, disable or remove them from the IGP
• Limits the impact for flooding LSAs/LSPs, running SPF, etc.
• For OSPF:
[edit protocols ospf area 0]
user@router# set interface ge-0/1/1 disable
• For IS-IS:
[edit protocols isis]
user@router# set interface ge-0/1/1 disable
• Remember to enable the interfaces when the problem is fixed
Flapping Links
When you suspect intermittent failures of an interface or transmission line, you should consider removing the interface from the
routing protocol configuration. Removing the interface from OSPF or Intermediate System-to-Intermediate System (IS-IS)
advertisements limits the flapping interface's impact on the rest of the network while you isolate and correct the problem.
For OSPF, you can disable the interface using the command shown on the slide. For IS-IS, you can use the same approach, or
alternatively, you can remove the family iso from the interface configuration. In all cases, you should take care to restore
the interface's configuration when the problem is resolved.

• Interface Properties
• General Interface Troubleshooting
→ Ethernet Interface Troubleshooting
Ethernet Interface Troubleshooting


Interface Troubleshooting Chart
[Figure: interface troubleshooting flowchart; recoverable decision labels include "Suspect bad IP configuration" and "Bad L2 config"]
Interface Troubleshooting Flowchart

The purpose of the interface troubleshooting flowchart shown on the slide is simply to provide a set of high-level steps and
decision points designed to get you started on the path of interface and transmission line troubleshooting. Note that reasonable
people might disagree on the exact ordering of the steps or on the particulars of the CLI commands that could be used to help
isolate an interface or circuit problem.
Although the chart is applicable to all types of interfaces, this section of the chapter focuses only on Ethernet interface
troubleshooting.

Ethernet Topologies
• Port types:
• Gigabit Ethernet, Fast Ethernet, and so forth
• Link mode (full or half duplex)
• Tools:
• ping
• loopback (local)
• show interfaces extensive
• show interfaces media
• show arp
• monitor traffic
• monitor interface
• clear statistics
Media Types and Interface Naming

Junos devices support several flavors of Ethernet. The relevant media types are:
Fast Ethernet: designated with fe in the interface name;
Gigabit Ethernet: designated with ge in the interface name; and
10 Gigabit Ethernet: designated with xe in the interface name.
Link Mode
When troubleshooting Ethernet topologies, consider the link mode:
Full duplex;
Half duplex; or
Link bonding (802.3ad).
Fast Ethernet interfaces can support half or full duplex, but Gigabit Ethernet and 10 Gigabit interfaces function only in
full-duplex mode.
Junos Tools
The Junos OS provides the tools shown on the slide. The tools listed are various CLI commands used to troubleshoot and
monitor Ethernet interfaces. The following pages examine these tools.

Ethernet Troubleshooting (1 of 4)
• Local loopback support
• Monitor traffic while looped and look for the receipt of all
transmitted ARP messages (broadcast) and LED status
indications
[edit interfaces ge-0/0/0]
user@router# show
gigether-options {
    loopback;
}
• Better yet, use loopback plug and static ARP
• Static ARP must match looped interface's own MAC address
user@router> show interfaces ge-0/0/0 | match hardware
Current address: 00:90:69:6b:30:00, Hardware address: 00:90:69:6b:30:00
user@router> show configuration interfaces ge-0/0/0
unit 0 {
    family inet {
        address 200.0.0.1/24 {
            arp 200.0.0.20 mac 00:90:69:6b:30:00;
        }
    }
}
Configuring Loopback Mode

To place an Ethernet interface in loopback mode, issue a set gigether-options loopback at the [edit
interfaces ge-interface-name] hierarchy. You use a similar command for Fast Ethernet interfaces. When the
interface is looped, you can monitor traffic and expect to see all traffic that is sent out (for example, an ARP request) coming right
back in:
user@router> monitor traffic interface ge-0/0/0
verbose output suppressed, use <detail> or <extensive> for full protocol decode
Listening on ge-0/0/0, capture size 96 bytes
21:14:04.424904 Out arp who-has 200.0.0.30 tell 200.0.0.1
21:14:04.425328 In arp who-has 200.0.0.30 tell 200.0.0.1
When operating in the default full-duplex mode, you can also attach an external loopback plug to effect an external local
loopback. To see TTL expired messages (as expected for peer-to-peer interfaces), you must add a static ARP entry that matches
the looped interface's own MAC address for the target IP address. This step is necessary so that the returning traffic is accepted
by the interface under test because a nonpromiscuous Ethernet interface only accepts broadcast and unicast traffic sent to its
MAC address. When all is working, you should see TTL errors as shown:
user@router> ping 200.0.0.20 count 1
PING 200.0.0.20 (200.0.0.20): 56 data bytes
36 bytes from 200.0.0.1: Time to live exceeded
4 5 00 0054 4e67 0 0000 01 01 db2c 200.0.0.1 200.0.0.20
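Because the static ARP entry must carry the looped interface's own MAC address, a mistyped address silently breaks the test. The following Python sketch shows one way to compare the two values before testing. It is illustrative only: the helper names are hypothetical, and the addresses are assumed to have been copied by hand from the show interfaces output and the interface configuration.

```python
import re

# Six colon-separated hexadecimal octets, for example 00:90:69:6b:30:00.
MAC_PATTERN = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def normalize_mac(mac: str) -> str:
    """Lowercase and validate a colon-separated MAC address."""
    candidate = mac.strip().lower()
    if not MAC_PATTERN.match(candidate):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return candidate


def static_arp_matches(hardware_address: str, static_arp_mac: str) -> bool:
    """True when the static ARP entry uses the interface's own MAC address."""
    return normalize_mac(hardware_address) == normalize_mac(static_arp_mac)


print(static_arp_matches("00:90:69:6b:30:00", "00:90:69:6B:30:00"))  # True
print(static_arp_matches("00:90:69:6b:30:00", "00:90:69:6b:30:01"))  # False
```

A value that is not a well-formed MAC address raises ValueError rather than comparing as unequal, which makes transcription errors easier to spot.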

• Ping a locally connected host or router
• Use the show arp no-resolve command
user@router> show arp no-resolve
MAC Address Address Interface Flags
80:71:1f:c3:18:64 10.0.20.1 ge-1/1/4.0 none
• Cable lengths and physical layer standards:
• Cat 5/6 UTP copper: 100 meters
• Multimode/single-mode fiber: Check the port specifications
• Tips:
• Check encapsulation types (802.3 LLC, 802.3 SNAP, DIXv2)
• Use the show interfaces extensive command
• Use the monitor interfaces command
Pinging a Locally Connected Host

A reply received from the host or router typically provides verification that the link and interface are operating correctly.
Displaying ARP Table

The show arp no-resolve command displays the entries in the ARP table. Using the no-resolve option prevents the
router from attempting to determine the host name that corresponds to the IP address.
Verifying Cable Length

Ensure the cables used on the network do not exceed recommended lengths and meet all relevant specifications.

Generic Tips
We recommend the following:
Ensure that encapsulation types match those of the other hosts or routers on the link.
Use the show interfaces extensive command to check the status of the interface.
Use the monitor interface command to receive real-time statistics.
Use the monitor interface interface-name command to display real-time statistics about a
physical interface. The output is updated every second. The output of this command also shows the amount that
each field has changed since you started the command or since you cleared the counters by using the c key. This
command also checks for and displays common interface failures, such as SONET/SDH and T3 alarms, loopbacks
detected, and increases in framing errors. If the framing errors are increasing, this indicates that frames are being
corrupted. If the input errors are increasing, check the cabling to the router and have the carrier verify the integrity
of the line.

• IEEE 802.3ah OAM link-fault management:
• Standard designed to manage Ethernet as a WAN
technology
• Features include discovery, link monitoring, remote fault
detection, remote loopback
[edit protocols oam ethernet]
user@router# show
link-fault-management {
interface ge-1/0/2;
interface ge-1/0/1 {
back;
• Callout: Sets remote peer in loopback state
• Callout: Allows remote peer to set local interface in loopback state
Ethernet OAM Link Fault Management

The Institute of Electrical and Electronics Engineers (IEEE) 802.3ah specification was designed for modern Ethernet topology
monitoring and troubleshooting. As networks have evolved and the capacities of Ethernet have grown, Ethernet has moved
from existing primarily as a LAN technology to utilization as a WAN technology. Ethernet OAM link fault management provides
a set of management tools to monitor Ethernet circuits set up in a peer-to-peer fashion.
The slide illustrates the basic configuration required to enable Ethernet OAM link fault management on an interface. It also
displays the configuration for signaling the remote end of the circuit to establish a remote loopback. The configuration also
allows the device at the remote end of the circuit to signal the Junos-based device to enable a loop.
Note that only some Junos-based devices support Ethernet OAM. The link must have a speed of at least 100 Mbps and both
ends of the circuit must support Ethernet OAM.

• Monitoring OAM link-fault management:
user@router> show oam ethernet link-fault-management
Interface: ge-1/0/1
  Status: Running, Discovery state: Send Any
  Peer address: 80:71:1f:c3:18:61
  Flags:Remote-Stable Remote-State-Valid Local-Stable 0x50
  Remote loopback status: Enabled on local port, Enabled on peer port    <-- Both peers are looped!
  Remote entity information:
    Remote MUX action: forwarding, Remote parser action: loopback
    Discovery mode: active, Unidirectional mode: unsupported
    Remote loopback mode: supported, Link events: supported
    Variable requests: unsupported
Interface: ge-1/0/2
  Status: Running, Discovery state: Active Send Local
  Peer address: 00:00:00:00:00:00    <-- No peer discovered on ge-1/0/2 interface
  Flags:0x8
• Use detail option to view error counters
Monitoring Ethernet OAM Link Fault Management

You can monitor Ethernet OAM link fault management with the show oam ethernet link-fault-management
command as shown on the slide. To display the OAM link fault management counters, use the detail option as shown in
the following output:

Monitoring Ethernet OAM Link Fault Management (contd.)
user@router> show oam ethernet link-fault-management detail
Interface: ge-1/0/1
Status: Running, Discovery state: Send Any
Peer address: 80:71:1f:c3:03:61
Flags:Remote-Stable Remote-State-Valid Local-Stable 0x50
OAM receive statistics:
Information: 459, Event: 0, Variable request: 0, Variable response: 0
Loopback control: 1, Organization specific: 0
OAM flags receive statistics:
Critical event: 0, Dying gasp: 0, Link fault: 0
OAM transmit statistics:
Information: 459, Event: 0, Variable request: 0, Variable response: 0
Loopback control: 2, Organization specific: 0
OAM received symbol error event information:
Events: 0, Window: 0, Threshold: 0
Errors in period: 0, Total errors: 0
OAM received frame error event information:
Events: 0, Window: 0, Threshold: 0
OAM received frame period error event information:
OAM received frame seconds error event information:
OAM transmitted symbol error event information:
OAM current symbol error event information:
OAM transmitted frame error event information:
OAM current frame error event information:
Remote loopback status: Enabled on local port, Enabled on peer port
Remote entity information:
Remote MUX action: forwarding, Remote parser action: loopback
Discovery mode: active, Unidirectional mode: unsupported
Remote loopback mode: supported, Link events: supported
Variable requests: unsupported

Summary
• Described physical and logical interface properties
• Deactivated and disabled interfaces
• Performed loopback testing
• Used operational mode commands to monitor and
troubleshoot Ethernet interfaces
We Discussed:
Physical and logical interface properties;
Deactivating and disabling interfaces;
Loopback testing; and
Monitoring and troubleshooting Ethernet interfaces.

Review Questions
1. What is the difference between the monitor
interface and the monitor traffic
commands?
2. What is the difference between deactivating and
disabling an interface?
3. Describe how you can use loopback testing to
troubleshoot connectivity problems.
4. What is the rationale behind disabling an interface
experiencing problems?
5. What condition might lead to a policed discard?

Monitoring and Troubleshooting Ethernet Interfaces Lab
• Troubleshoot Ethernet interfaces.
• Perform loopback testing.
Monitoring and Troubleshooting Ethernet Interfaces Lab


1. The monitor interface command displays real-time statistics, while the monitor traffic command displays a packet dump of control
traffic transiting the interface.
2. Deactivating an interface causes the Junos OS to ignore the deactivated configuration, while disabling an interface results in the
interface being administratively disabled.
3. Loopback testing can be used to loop traffic back to the originator at various physical points within a circuit. This helps determine the
location of a fault.
4. Disabling a troubled interface can remove instability in routing protocols and prevent black holes.
5. A policed discard represents traffic unknown to the Junos OS, such as CDP traffic.


Chapter 8: Data Plane: Other Components

Objectives
able to:
• Recognize data plane problems and components
• Monitor and troubleshoot data plane forwarding
• Monitor load balancing
• Troubleshoot firewall filter and policer issues
We Will Discuss:
Data plane problems and components;
Monitoring and troubleshooting data plane forwarding;
Monitoring load balancing; and
Troubleshooting firewall filter and policer issues.

Agenda: Data Plane: Other Components
→ Definition of a Data Plane Problem

• Data Plane Components
• Data Plane Forwarding
• Load-Balancing Behavior
• Firewall Filters and Policers
• Data Plane Troubleshooting Case Study
Definition of a Data Plane Problem


What Is a Data Plane Problem?
• Definition of a data plane problem:

• All aspects of the control plane are normal, but traffic is still
not flowing as desired
• Can be a total lack of traffic or impaired performance
• Everything seems normal, but it just does not work
• Usually data plane problems take the form of
hardware errors, firewall filters, policers, and so forth
[Figure: Routing Engine (control plane) above the data plane; frames/packets in, frames/packets out]

Understanding Data Plane Problems

Data plane problems are rare, which is fortunate, because problems in this plane can be pretty tricky to isolate and resolve.
Technicians are often "blinded" by BGP, which is to say they are so accustomed to dealing with control and signaling plane
problems that sometimes a relatively obvious problem in the data plane simply goes unnoticed.
In general, the primary symptom of a data plane problem is that everything else looks fine (for example, all adjacencies are up
and the routes to the destination are present and active), yet, for some reason, traffic either does not flow at all or
flows toward the destination in such a manner that it is considered a problem.
Typical Data Plane Problems

In most cases a problem in the data plane is traceable, in one way or another, to one of the following items:
General hardware/application-specific integrated circuit (ASIC) failures: This type of event tends to result in
corrupted packets and, therefore, lost traffic. Check the system logs for error reports relating to the Packet
Forwarding Engine (PFE) if you suspect hardware problems.

Typical Data Plane Problems (contd.)
Firewall filters: By design, a firewall filter allows you to block matching traffic. You can also police and alter the
class-of-service (CoS) settings for matching traffic as well as direct traffic into nonstandard routing instances.
When troubleshooting a problem with forwarding, you should always confirm whether any input or output filters are
in effect. If filters have been applied, use the show configuration firewall command to display the specifics
to determine whether the filter can account for the symptoms with which you are dealing.
Policers: You can invoke policers from a firewall filter, or you can apply them directly to an interface. Because a policer
can throttle traffic with the discard of excess packets, you should always confirm whether interface policers are in
effect.
MTU mismatches: Most of the time a mismatched maximum transmission unit (MTU) simply results in diminished
efficiency as larger packets are fragmented to accommodate a smaller MTU. In some cases mismatched MTUs can
lead to OSPF adjacency formation problems or data loss when an application chooses to set the do not fragment
flag within the IP header. A classic symptom of this type of forwarding problem is the ability to send small packets,
with loss occurring as some larger packet size is reached.
Transit routing over the management interface: The fxp0 and other built-in management interfaces are designed
for out-of-band (OoB) use only. Attempting to route transit traffic from a PFE port out fxp0, or from fxp0 out a PFE
port, results in a discard and an Internet Control Message Protocol (ICMP) destination unreachable. As a general
rule, we recommend that you not run any routing protocols over built-in management interfaces. When a routing
protocol is used, it must not be the same protocol or instance that is used to build transit routing tables.

Agenda: Data Plane: Other Components
• Definition of a Data Plane Problem
→ Data Plane Components



• Data plane defined:
• Physical and logical components responsible for forwarding
packets and frames, as well as some services
• Also known as the forwarding plane, or the PFE
• Hardware residing in the data plane
• Midplane or backplane
• Line cards (FPC, IOC, DPC, and so forth)
• Interface cards (PIC, PIM, and so forth)
• Services cards (SPC, NPC, MS-DPC, and so forth)
• Control boards (SIB, SFM, CFEB, TFEB, and so forth)
• Contains various ASICs
Defining the Data Plane

The data plane is the new name for what used to be referenced as the forwarding plane. Although this chapter focuses solely
on forwarding type issues, the data plane reference is indicative of the new features available on Junos devices. With the
advent of switching, security and services, the data plane does much more than just forwarding. It provides advanced
security and subscriber features such as IP Security (IPsec), Network Address Translation (NAT), and remote access
authentication, among others.
The data plane consists of most components not directly tied to the control plane. If the control plane acts as the brains of a
Junos device, the data plane acts as the workhorse. The data plane hosts ASICs designed to perform services and forward
traffic really fast!
The slide lists the most common components of the data plane.

Working with Data Plane Components

• Offlining and onlining components:
user@router> request chassis mic mic-slot 0 fpc-slot 1 offline
fpc 1 mic 0 offline initiated, use "show chassis fpc pic-status 1" to verify
user@router> request chassis mic mic-slot 0 fpc-slot 1 online
fpc 1 mic 0 online initiated, use "show chassis fpc pic-status 1" to verify
user@router> show chassis fpc pic-status 1
Slot 1 Online MPC Type 2 3D Q
PIC 0 Online 10x 1GE(LAN) RJ45
• Obtaining component information
• Some components have console ports
• Control boards keep local state information and can be accessed in
the shell or using the CLI request pfe execute command "command"
target control-board-type command
Working with Data Plane Components

Very rarely, a data plane component such as a PIC, Flexible PIC Concentrator (FPC), or control board experiences a hardware
condition that requires clearing by offlining and onlining the component. Should you encounter such a condition, you should
immediately notify the Juniper Networks support team. The support team might request that you gather information directly
from the suspect component. You can gather this information either by connecting to the component's console port
or by accessing the UNIX shell prompt and opening a virtual terminal session to the faulty
component. Two of the most commonly requested pieces of data from line cards and control boards are the show
syslog messages and show nvram outputs.
The slide illustrates the commands used to online, offline, and verify the status of a data plane component. It also lists the
request pfe execute command, which you can use to issue shell commands on a data plane component without
explicitly logging in to the shell.


→ Data Plane Forwarding
Data Plane Forwarding


Viewing Forwarding Table Entries (1 of 3)
• Display entries in the master copy of the forwarding table
• show route forwarding-table destination
destination-prefix
[Figure: mxA (lo0: 192.168.10.1) connected to mxB (lo0: 192.168.20.1) over so-0/1/1, a P2P forwarding interface]
user@mxA> show route forwarding-table destination 192.168.20.1
Routing table: default.inet
Internet:
Destination Type RtRef Next hop Type Index NhRef Netif
192.168.20.1/32 user 0 256 2 so-0/1/1.0
P2P interfaces do not use a forwarding next hop
Viewing Forwarding Table Entries: Part 1

Use the show route forwarding-table destination destination-prefix command to display the route entries
in the kernel's forwarding table (FT). This table is the version of the FT in the Routing Engine (RE). The RE copies this table to the
PFE where it is used during hardware forwarding.
The output of this command displays only the network-layer prefixes and their next hops. You can use the show route
commands to access information about any attached communities or other route tags. However, when you run into data plane
problems, it can be necessary to use the show route forwarding-table command to verify that the routing protocol
process relayed the correct information into the forwarding table. By default, this command displays IP version 4 (IPv4)-related
entries. Specify other protocol families such as mpls or inet6 as needed.

Viewing Forwarding Table Entries: Part 1 (contd.)

The following route types are supported and indicate how the route was placed into the forwarding table:
clon/cloned: Clone route (for TCP or multicast only; displayed only if you specify the detail option).
dest/destination: Remote addresses directly reachable through an interface.
iddn/destination down: Destination route for which the interface is unreachable.
ifcl/interface cloned: Cloned route for which the interface is unreachable.
ifdn/route down: Interface route for which the interface is unreachable.
ignr/ignore: Ignore this route.
intf/interface: Installed as the result of configuring an interface.
perm/permanent: Permanent (installed by the kernel when the routing table is initialized).
user/user: Installed by the routing protocol process or as a result of the configuration.
The following next-hop types are defined:
bcst/broadcast: Broadcast.
deny/deny: Deny.
dscd/discard: Discard; no ICMP unreachable message sent.
hold/hold: Next hop is waiting to be resolved (will turn into a unicast or multicast after resolution).
idxd/indexed: Indexed next hop.
indr/indirect: Indirect next hop.
locl/local: Local address on an interface.
mcrt/routed multicast: Regular multicast next hop.
mcst/multicast: Wire multicast next hop (limited to the LAN).
mdsc/multicast discard: Multicast discard.
mgrp/multicast group: Multicast group member.
recv/receive: Receive.
rjct/reject: Reject; ICMP unreachable message sent.
rslv/resolve: Resolving the next hop.
ucst/unicast: Unicast.
ulst/unilist: List of unicast next hops. A packet sent to this next hop goes to any next hop on the list.
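When you review many forwarding-table entries, it can help to expand the next-hop abbreviations automatically. The following Python sketch is a hedged illustration, not a Juniper tool: the names `NEXT_HOP_TYPES`, `ForwardingEntry`, `parse_entry`, and `describe_next_hop` are hypothetical, and the parser assumes a single output line that contains all eight columns (Destination, Type, RtRef, Next hop, Type, Index, NhRef, Netif).

```python
from typing import Dict, NamedTuple

# Next-hop type abbreviations and their meanings, as listed above.
NEXT_HOP_TYPES: Dict[str, str] = {
    "bcst": "broadcast", "deny": "deny", "dscd": "discard", "hold": "hold",
    "idxd": "indexed", "indr": "indirect", "locl": "local",
    "mcrt": "routed multicast", "mcst": "multicast",
    "mdsc": "multicast discard", "mgrp": "multicast group",
    "recv": "receive", "rjct": "reject", "rslv": "resolve",
    "ucst": "unicast", "ulst": "unilist",
}


class ForwardingEntry(NamedTuple):
    destination: str
    route_type: str
    rt_ref: int
    next_hop: str
    next_hop_type: str
    index: int
    nh_ref: int
    netif: str


def parse_entry(line: str) -> ForwardingEntry:
    """Parse one forwarding-table line that has all eight columns present."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, found {len(fields)}")
    dest, rtype, rt_ref, next_hop, nh_type, index, nh_ref, netif = fields
    return ForwardingEntry(dest, rtype, int(rt_ref), next_hop, nh_type,
                           int(index), int(nh_ref), netif)


def describe_next_hop(entry: ForwardingEntry) -> str:
    """Expand the next-hop type abbreviation, or return 'unknown'."""
    return NEXT_HOP_TYPES.get(entry.next_hop_type, "unknown")


entry = parse_entry("192.168.36.1/32 user 0 172.18.5.2 ucst 596 4 ge-1/0/2.143")
print(entry.netif, describe_next_hop(entry))  # ge-1/0/2.143 unicast
```

Entries without a next-hop address, such as those for point-to-point interfaces, have fewer columns and are rejected by this simple parser.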

• Multiaccess interfaces are associated with a
forwarding interface that must be resolved to a next
hop
[Figure: mxA (lo0: 192.168.10.1) connected to mxB (lo0: 192.168.20.1) over ge-1/0/2, a multiaccess forwarding interface with a forwarding next hop to resolve]
user@mxA> show route forwarding-table destination 192.168.20.1
Internet:
Destination Type RtRef Next hop Index NhRef Netif
192.168.20.1/32 user 0 10.0.13.2 286 2 ge-1/0/2.0
user@mxA> show arp
MAC Address Address Name Interface Flags
00:90:69:6a:90:02 10.0.13.2 10.0.13.2 ge-1/0/2.0 none
ARP resolves forwarding next hop

The slide demonstrates how multiaccess route entries are represented in the FT. The key point is that forwarding over a
multiaccess interface requires resolution of a forwarding next hop.
In the Ethernet-based example shown on the slide, the resolution of the 10.0.13.2 next hop is performed by the Address
Resolution Protocol (ARP).

• Active routes in the RE should be installed in the PFE
• Use show route ip prefix prefix or
show route ip lookup prefix to display PFE entries
• Shell command
• Log into shell and PFE component or use request pfe
execute operational mode command
user@router> request pfe execute command "show route ip lookup 192.168.36.1" target tfeb0
SENT: Ukern command: show route ip lookup 192.168.36.1
GOT:
GOT: Route Information (192.168.36.1):
GOT: interface : ge-1/0/2.143 (74)    <-- Logical interface index (ifl)
GOT: Nexthop prefix : 172.18.5.2
GOT: Nexthop ID : 596
GOT: MTU : 1500
GOT: Class ID : 0
LOCAL: End of file

The previous coverage of FT entries showed values from the master copy of the FT maintained in the RE.
By connecting to the PFE component responsible for route lookup (for example, the TFEBs on an MX Series device), you can
display entries in the copy of the FT that resides in that PFE component. Note that on l/M/N/R platforms, each FPC is
associated with a complete PFE complex. As a result, each PFE maintains a copy of the FT.
Use the request pfe execute command to access PFE information directly without needing to log in to the component
through the shell. Pass the string show route family-name prefix prefix-value to display matching
entries in the PFE's FT.
Because the Junos kernel lives to keep the copy of the FT in the PFE synchronized with its own copy, inconsistencies are
extremely rare. You should contact Juniper Networks Technical Assistance Center (JTAC) if you find that entries in the RE's FT are
not accurately represented in the PFE copy of the FT.

Clearing Forwarding Table Entries

• Clear entries from the forwarding table
• Use clear route forwarding-table
destination-prefix (next-hop | index)
• For P2P entries, specify the next-hop index as the next hop
• For multiaccess entries, specify either the forwarding next
hop or the next-hop index
• Rarely needed! Can lead to RIB/FIB inconsistency
user@router> show route forwarding-table destination 192.168.36.1
Internet:
Destination Type RtRef Next hop Type Index NhRef Netif
192.168.36.1/32 user 0 172.18.5.2 ucst 596 4 ge-1/0/2.143
user@router> clear route forwarding-table 192.168.36.1 ?
<next-hop> Name of next hop
user@router> clear route forwarding-table 192.168.36.1 172.18.5.2
delete 192.168.36.1: gateway 172.18.5.2
Clearing FT Entries
The slide illustrates the syntax used to remove entries from the master copy of the FT. In most cases the entry is immediately
written back into the FT as this is the kernel's idea of a good time.
Note that this command is rarely used. It is only intended to recover from the rare case of an invalid next hop interfering with a
valid next hop by virtue of its remaining in the PFE after the related route is removed from the routing table.


→ Load-Balancing Behavior
Load-Balancing Behavior

PFE Load Balancing

• Default load-balancing behavior is per prefix
• The default load-balancing behavior results in a single next
hop installed in the PFE for each prefix:
user@router> show route 172.31.18.0/24
172.31.18.0/24 *[BGP/170] 13:08:25, localpref 100, from 172.18.6.1
AS path: 65000 I
to 172.18.6.1 via ge-1/0/0.141
> to 172.18.7.1 via ge-1/0/1.141    <-- A single next hop is selected based on a hash algorithm
Routing table: default.inet
Internet:
172.31.18.0/24 user 0 172.18.7.1 ucst 558 11 ge-1/0/1.141
• To enable per flow, configure forwarding table export policy
• Flow consists of source and destination addresses, transport
protocol and incoming interface index
• To include ports, configure layer-3 and layer-4 hashing
PFE Load Balancing

By default, when there are multiple, equal-cost paths to the same destination for the active route, the Junos operating system
uses a hash algorithm to choose one of the next-hop addresses to install into the forwarding table. Whenever the set of next
hops for a destination changes in any way, the next-hop address is rechosen, again using the hash algorithm.
The slide shows how one of two equal-cost next hops for the 172.31.18.0/24 prefix is installed into the forwarding table with
the default per-prefix load balancing in effect.
The default behavior can be changed by configuring and applying an export policy for the forwarding table as displayed on the
next slide.

Set Load Balancing
[Figure: two routers connected by parallel links ge-1/0/0 and ge-1/0/1]
• Define policy and apply as export to forwarding table:
[edit]
user@router# show policy-options policy-statement load-balance
term one {
    then {
        load-balance per-packet;
        accept;
    }
}
[edit]
user@router# set routing-options forwarding-table export load-balance
• Confirm the results:
• Large number of flows are needed for optimum balancing
Internet:
172.31.18.0/24 user 0 ulst 1048574 12    <-- A ulst entry confirms per-flow load balancing is in effect
172.18.6.1 ucst 557 4 ge-1/0/0.141    <-- Multiple next hops installed in the table
172.18.7.1 ucst 558 5 ge-1/0/1.141
e20141'lnlpetNetwona:1nc.AJl,tJ,ils� , • , '.,,1::JUOff�°[fw�rl'!wideEducationServk:es ......,•.._,... I 17

�d-�.i<e-... -
Export Policy Is Needed

You can configure the Junos OS so that, for the active route, multiple next-hop addresses for a destination are installed in the
forwarding table. (The number of allowed hops varies with hardware.) This feature is called per-packet load balancing. You can
use load balancing to spread traffic across multiple paths between routers. The behavior of the load-balance
per-packet command might seem a bit misleading. On routers with the Internet Processor II ASIC, when you configure
per-packet load balancing, traffic between routers with multiple paths is divided into individual traffic flows (up to a maximum of
16 equal-cost load-balanced paths). Packets for each individual flow are kept on a single interface. Therefore, the load balancing
is actually per-flow load balancing rather than per-packet load balancing.
By default, the router uses the following Layer 3 information in the packet header to load-balance:
Source IP address;
Destination IP address;
Protocol; and
Incoming interface index.

Export Policy Is Needed (contd.)

By default, the software ignores port data when determining flows. If you include both the layer-3 and layer-4 statements
at the [edit forwarding-options hash-key family inet] hierarchy, the router uses the following Layer 3 and
Layer 4 information to load balance:
Source IP address;
Destination IP address;
Protocol;
Source port number;
Destination port number; and
Incoming interface index.
The router recognizes packets in which all of these Layer 3 and Layer 4 parameters are identical and ensures that these packets
are sent out through the same interface. This step prevents problems that might otherwise occur with packets arriving at their
destination out of their original sequence.
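
The flow-to-next-hop mapping described above can be sketched in a few lines of Python. This is only an illustration: the guide does not describe the actual hash algorithm used by the PFE, so SHA-256 stands in for it here, and the function names, addresses, ports, and interface index are hypothetical.

```python
import hashlib

def flow_key(src_ip, dst_ip, protocol, in_ifindex,
             src_port=None, dst_port=None, use_layer4=False):
    """Build the tuple of header fields that identifies a flow."""
    key = (src_ip, dst_ip, protocol, in_ifindex)
    if use_layer4:
        # Ports count only when both layer-3 and layer-4 hashing are enabled.
        key += (src_port, dst_port)
    return key

def pick_next_hop(key, next_hops):
    """Map a flow key onto one next hop; equal keys always map to the same hop."""
    # SHA-256 is a stand-in for the real (undocumented here) hash algorithm.
    digest = hashlib.sha256(repr(key).encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

NEXT_HOPS = ["ge-1/0/0.141", "ge-1/0/1.141"]

# Two TCP connections between the same hosts, differing only in source port.
conn_a = dict(src_ip="10.1.1.1", dst_ip="172.31.18.5", protocol=6,
              in_ifindex=70, src_port=40000, dst_port=80)
conn_b = dict(conn_a, src_port=40001)

# Layer 3 only: both connections are one flow and share one next hop.
l3_a, l3_b = flow_key(**conn_a), flow_key(**conn_b)
# Layer 3 and Layer 4: the connections are distinct flows and may diverge.
l4_a = flow_key(**conn_a, use_layer4=True)
l4_b = flow_key(**conn_b, use_layer4=True)

print(pick_next_hop(l3_a, NEXT_HOPS), pick_next_hop(l3_b, NEXT_HOPS))
print(pick_next_hop(l4_a, NEXT_HOPS), pick_next_hop(l4_b, NEXT_HOPS))
```

Because every packet of a flow produces the same key, all of its packets leave through the same interface, which is what keeps them in sequence.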
To configure the load-balancing behavior, include the load-balance per-packet option in a then statement or a
route-filter option in a from statement in a routing policy. You must apply the routing policy to routes exported from the
routing table to the forwarding table to complete the configuration. To do this, include the export statement at the [edit
routing-options forwarding-table] hierarchy, as shown on the slide.
The slide shows the effects of applying a per-packet load-balancing policy. The 172.31.18.0 entry in the FT now contains a
ulst entry, which functions to list the set of unicast next hops that are available for per-flow load balancing. The slide also
shows that, in addition to the single ulst entry, two ucst entries relate to the parallel Gigabit Ethernet links running between
the devices.
Additional confirmation of proper load balancing is possible by monitoring the related interface counters while generating
different flows to the same destination prefix. Note that perfect load balancing (that is, a 50% split between two links) is only
likely when a large number of flows are in effect.
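
A rough way to see why the split only approaches 50% with many flows is to hash a growing number of synthetic flows onto two links and look at each link's share. The sketch below assumes a uniform hash (SHA-256 as a stand-in, not the PFE algorithm) and invents flows that differ only in source port; the function name is hypothetical.

```python
import hashlib

def link_shares(num_flows, num_links=2):
    """Return the fraction of flows that hash onto each link."""
    counts = [0] * num_links
    for i in range(num_flows):
        # Hypothetical flows: fixed addresses and protocol, varying source port.
        key = f"10.1.1.1|172.31.18.5|6|{1024 + i}|80".encode()
        digest = hashlib.sha256(key).digest()
        counts[int.from_bytes(digest[:4], "big") % num_links] += 1
    return [c / num_flows for c in counts]

for n in (4, 100, 10000):
    print(n, [round(share, 3) for share in link_shares(n)])
```

With only a handful of flows the shares can be badly skewed; with thousands they settle close to an even split. Note that this counts flows, not bytes, so a few very heavy flows can still unbalance the interface counters.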


Firewall Filters and Policers


Working with Firewall Filters
• Firewall filters can block, police, or direct packets to
nondefault routing instances
• Firewall filters (including network interface filters for
network protection and loopback interface filters for
RE protection) are recommended; however:
• Common cause of network outages
• Implicit Firewall behaviors
• Ordering issues
• Blocking protocol traffic
Why Firewall Filters Should Be Examined in PFE Troubleshooting

Firewall filters and policers deserve special consideration when troubleshooting forwarding problems because, by nature,
they are used to alter default forwarding behavior. The means by which they alter this behavior might include rejecting traffic,
discarding traffic, altering the forwarding path of traffic, or changing CoS behavior.
Firewall Filters Are Good, But They Can Be Naughty

Firewall filters are essential to providing protection for your network using network interface filtering or your Junos device
through the use of a loopback interface filter. However, implementing firewall filters requires detailed knowledge of the
behavior of the filters. Common problems that occur include neglecting to remember the default implicit discard action,
blocking protocol traffic, and ordering the terms incorrectly.
There are three significant implicit behaviors to remember when working with the stateless firewall filters available on
Juniper Networks routers:
Implicit match: In the absence of match criteria within a specified term, everything matches.
Implicit terminating action: In the absence of a terminating action within a specified term, such as when
specifying an action modifier, there is an implicit accept.
Implicit final term: Upon the creation of the first firewall filter term, an implicit final term is also put into place
that will match all remaining traffic and silently discard it.
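
The three implicit behaviors can be modeled with a short Python function. This is a simplified sketch for reasoning about term order, not a description of how the PFE evaluates filters; the dictionary layout, the function name, and the example filter are all hypothetical.

```python
TERMINATING = {"accept", "discard", "reject"}

def evaluate_filter(terms, packet):
    """Return (action, counters) for a packet checked against ordered terms.

    Each term is a dict with optional keys:
      "from": field -> required value (missing or empty matches everything)
      "then": list of actions, for example ["count icmp-rejected", "discard"]
    """
    for term in terms:
        conditions = term.get("from") or {}
        # Implicit match: a term without match criteria matches every packet.
        if all(packet.get(field) == value for field, value in conditions.items()):
            actions = term.get("then", [])
            counters = [a.split()[1] for a in actions if a.startswith("count ")]
            for action in actions:
                if action in TERMINATING:
                    return action, counters
            # Implicit terminating action: only action modifiers were given.
            return "accept", counters
    # Implicit final term: anything left over is silently discarded.
    return "discard", []

example = [
    {"from": {"protocol": "icmp"}, "then": ["count icmp-rejected", "discard"]},
    {"then": ["accept"]},
]
print(evaluate_filter(example, {"protocol": "icmp"}))      # ('discard', ['icmp-rejected'])
print(evaluate_filter(example, {"protocol": "tcp"}))       # ('accept', [])
print(evaluate_filter(example[:1], {"protocol": "tcp"}))   # ('discard', [])
```

The last call shows the classic outage: dropping the final accept-all term leaves every non-ICMP packet to the implicit discard.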

Displaying Firewall Filters

• Use show interfaces filters to determine
whether a filter is applied to an interface
user@router> show interfaces filters ge-1/0/1
Interface       Admin Link Proto Input Filter         Output Filter
ge-1/0/1        up    up
ge-1/0/1.141    up    up   inet                       test
                           multiservice
ge-1/0/1.32767  up    up
                           multiservice
(Callout: An output filter is in effect)

[edit firewall family inet filter test]
user@router# show
term one {
    from {
        protocol icmp;
    }
    then {
        count icmp-rejected;
        discard;
    }
}
term two {
    then accept;
}
(Callout: All outgoing ICMP messages are discarded (and counted))
Displaying Firewall Filters

The presence of a firewall filter can impact an interface's ability to forward certain types of traffic. Use the show interfaces
filters command to quickly determine whether any filters are in effect.
When a filter is listed, and you are having forwarding issues over the interface in question, you should double-check the applied
filter's configuration.

Displaying Interface Policers

• Interface policers act on specific protocol families to
limit their throughput
• Use show interfaces policers to display interface
policers
• Excess traffic is counted; display with show policer
policer-name
• Use show policer to see default policer counters including
default ARP policer
user@router> show interfaces policers ge-1/0/1.141
Interface       Admin Link Proto Input Policer         Output Policer
ge-1/0/1.141    up    up
                           inet                        test-ge-1/0/1.141-inet-o
                           multiservice __default_arp_policer__
(Callout: An output policer is in effect for inet family)

user@router> show policer test-ge-1/0/1.141-inet-o
Policers:
Name                                Packets
test-ge-1/0/1.141-inet-o                 10
(Callout: 10 packets have exceeded limits)
Displaying Interface-Level Policers

The presence of an interface-level policer can impact an interface's ability to forward certain protocol families at native speed.
Use the show interfaces policers command to quickly determine whether interface-level policers are in effect for the
inet, mpls, or other families. Note that locally generated or terminated RE traffic is not affected by an interface policer when
you apply such a policer to a transit interface. This behavior differs from that of a standard firewall filter, where locally generated
RE traffic is subjected to a copy of the filter maintained in the RE.
To display the default ARP policer, or any customized ARP policers, issue a show policer command:
user@mx> show policer
Policers:
Name Packets
__default_arp_policer__ 0
If you are experiencing a forwarding issue on an interface with an assigned policer, check the counters for policed packets. If
the counter shows a nonzero value, you should consider reviewing the policer configuration under the [edit firewall]
configuration hierarchy.
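
Checking policer counters by eye is easy on one interface but tedious across many devices. The following Python sketch flags policers with a nonzero counter in captured show policer text. It assumes the two-column Name/Packets layout shown above; the function name is hypothetical, and the sample text is assembled from the outputs in this chapter rather than taken from a live device.

```python
def exceeded_policers(show_policer_output):
    """Return {policer_name: packets} for policers with a nonzero counter."""
    hits = {}
    for line in show_policer_output.splitlines():
        parts = line.split()
        # Data rows have exactly two columns: a name and a packet count.
        if len(parts) == 2 and parts[1].isdigit():
            name, packets = parts[0], int(parts[1])
            if packets > 0:
                hits[name] = packets
    return hits

sample = """Policers:
Name                                Packets
__default_arp_policer__                   0
test-ge-1/0/1.141-inet-o                 10
"""
print(exceeded_policers(sample))  # {'test-ge-1/0/1.141-inet-o': 10}
```

A nonzero result is a prompt to review the policer under [edit firewall], not proof of a fault; some policing is usually intentional.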


Data Plane Troubleshooting Case Study


Data Plane Troubleshooting Chart
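[Flowchart: the starting condition is "Chassis, software, interface, transmission line, and protocols are OK". Yes/No decision points lead to outcome boxes; the labels that survive are "Kernel fault (consistency checking)", "Suspect filter", "Suspect policer", "Suspect OoB", "Adjust", and "black hole".]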
Data Plane Troubleshooting Flowchart

The purpose of the data plane troubleshooting flowchart shown on the slide is simply to provide a set of high-level steps and
decision points that should get you started on the path of data plane troubleshooting. Note that reasonable people might
disagree on the exact ordering of the steps or on the particulars of the CLI command(s) that could be used to help isolate
protocol-related problems.
One of the blocks on the sample flowchart indicates that if an active route is not installed in the FT, some type of kernel fault
has occurred. This conclusion follows because the primary job of the Junos kernel is to ensure RIB/FIB consistency and to ensure consistency
between the master copy of the FT in the RE and the copy of the FT residing in the PFE.
Finding that an active route is not correctly installed in the FT is a very rare event. You should contact JTAC for support if you
suspect such a condition has affected your device. In most cases, RIB/FIB inconsistencies are the result of an operator (or
script) issuing a clear route destination destination-prefix command, which removes entries from the FT (but
not the RE), which leads to a RIB/FIB inconsistency.

Forwarding Case Study (1 of 3)

• Case study background:
• Server farm is attached to Router-1's ge-0/0/2 interface
• Users complain of long recovery times after Layer 2 switch is
restarted
• Everything works fine after initial delay
• What is wrong?
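• Which CLI commands and fault analysis steps can help
narrow down a possible cause?
[Figure: Router-1 attached through its ge-0/0/2 interface to the 10.0.13/24 server farm subnet; other labels on the figure: .3 and MAC address 0:90:69:6a:90:2]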
Data Plane Case Study: Background
This slide sets the stage for a sample data plane troubleshooting case study. We begin with a general description of the
problem, which in this case indicates that customers are complaining of long recovery times after a reboot or power-cycle of the
Layer 2 switch that interconnects the router's ge-0/0/2 interface to a large server farm. Oddly enough, the complaints indicate
that once connectivity is finally restored,
application traffic works as expected.
Feeling Lucky?
Based on this description, you would be pretty lucky if you already knew the cause of the problem. After all, it could be a
hardware error in the PFE, an interface policer, a firewall, or perhaps an MTU issue, right? We suggest that you follow the general
steps outlined on the sample data plane troubleshooting flow chart to get things started. Put another way, it might be a good
idea to start with the determination of whether the 10.0.13/24 route associated with the server farm is correctly installed into
the FT.

• Sample course of action:

• Determine whether active route is installed in forwarding
table
user@Router-1> show route forwarding-table destination 10.0.13/24
Internet:
Destination RtRef Next hop
0 10.0.13.0 recv
(Callout: Subnet route installed with resolve next hop)

user@Router-1> show route forwarding-table destination 10.0.13.2

Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
10.0.13.2/32       dest     0 0:90:69:6a:90:2    ucst   286     2 ge-0/0/2.0
(Callout: The host route to server 10.0.13.2 was correctly installed and resolved)

• Are any filters in effect?
user@Router-1> show interfaces filters ge-0/0/2
Interface       Admin Link Proto Input Filter         Output Filter
ge-0/0/2        up    up
ge-0/0/2.0      up    up   inet
Forwarding Case Study: Part 2

This slide provides examples of troubleshooting steps based on the sample data plane troubleshooting flowchart. In this case
you begin with the determination of whether the active route associated with the server farm was correctly installed into the FT.
The output of the show route forwarding-table commands confirms that the /24 prefix representing the IP subnet and
specific /32 entries representing individual hosts were correctly installed into the FT.
The next step is to display any firewall filters that might be in effect for the ge-0/0/2 interface. The output from the show
interfaces filters command confirms that no filtering is in effect.

• Sample course of action (contd.):

• Are any interface policers at play?
user@Router-1> show interfaces policers ge-0/0/2
Interface       Admin Link Proto Input Policer         Output Policer
ge-0/0/2        up    up
ge-0/0/2.0      up    up
                           multiservice __default_arp_policer__
(Callout: No inet (or mpls) family policers are configured)

• What about ARP policers?
user@Router-1> show policer
Policers:
Name                                Packets
__default_arp_policer__                   0
arp-limit-ge-0/0/2.0-inet-arp           200
(Callout: Hmm... A user-defined ARP policer is in effect!)

user@Router-1> show configuration firewall policer arp-limit
if-exceeding {
    bandwidth-limit 32k;
    burst-size-limit 2k;
}
then discard;
(Callout: A rather harsh ARP policer, with a discard action. Bingo!)
Forwarding Case Study: Part 3

In keeping with the steps outlined on the sample troubleshooting flowchart, your next course of action is to determine whether
any interface-level policers are in effect on the router's ge-0/0/2 interface. The output of a show interfaces policers
command indicates that no inet or mpls family policers are currently in effect.
At this stage an average technician might opt to move on to MTU-related testing. But not you; you recall that the default and
user-defined ARP policers are not displayed in the output of the previous command, and so you decide to issue a show
policer command.
The output of the show policer command confirms that a user-defined ARP policer is in place; this is very interesting
because ARP transactions tend to occur in bursts just after a device resets. Once the initial learning curve completes, the rate of
ARP transactions tapers off significantly. You note that some 200 ARP messages were counted as out of profile by the ARP
policer.

Forwarding Case Study: Part 3 (contd.)

With your curiosity piqued, you display the configuration of the custom ARP policer. The rather low bandwidth and burst
tolerances of the arp-limit policer, combined with its discard action for some 200 ARP messages, cause you to utter the
phrase "bingo!"
For the curious, here is the configuration of ge-0/0/2.0:
family inet {
    policer {
        arp arp-limit;
    }
    address 10.0.13.1/24;
}
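
To get a feel for how harsh the arp-limit values are, the policer can be approximated as a single-rate token bucket. The Python sketch below rests on several assumptions that are not stated in this guide: that k means 1000, that each ARP message is charged as a 64-byte frame, that the bucket starts full, and that a simple token bucket is a fair model of the policer. The function name and arrival patterns are hypothetical.

```python
def police(arrivals, bandwidth_bps=32_000, burst_bytes=2_000, frame_bytes=64):
    """Approximate a policer as a token bucket.

    arrivals is a sorted list of frame arrival times in seconds.
    Returns (passed, discarded).
    """
    rate = bandwidth_bps / 8          # bytes of credit earned per second
    tokens = float(burst_bytes)       # assumption: the bucket starts full
    last = arrivals[0] if arrivals else 0.0
    passed = discarded = 0
    for t in arrivals:
        # Refill for the elapsed time, never beyond the burst size.
        tokens = min(burst_bytes, tokens + (t - last) * rate)
        last = t
        if tokens >= frame_bytes:
            tokens -= frame_bytes
            passed += 1
        else:
            discarded += 1            # matches the policer's discard action
    return passed, discarded

# 100 ARP frames arriving back to back, as after a switch restart.
print(police([0.0] * 100))                       # (31, 69)
# The same 100 frames spread evenly over 2 seconds.
print(police([i * 0.02 for i in range(100)]))    # (100, 0)
```

Under these assumptions the policer sustains about 62 frames per second and tolerates a burst of only 31 frames, so a server farm relearning ARP all at once loses most of its first round of messages, while the steady-state rate is never a problem. That pattern is consistent with the slow recovery followed by normal operation described in the case study.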

Summary
• Defined data plane problems and components
• Monitored and troubleshot data plane forwarding
• Monitored load balancing
• Troubleshot firewall filter and policer issues
We Discussed:
Data plane problems and components;
Monitoring and troubleshooting data plane forwarding;
Monitoring load balancing; and
Troubleshooting firewall filter and policer issues.

Review Questions
1. What are the symptoms of a data plane problem?
2. Which command do you use to display entries in the
main forwarding table?
3. Which types of load balancing does the Junos OS
support?
4. Which command do you use to display ARP
policers?

Isolate and Troubleshoot PFE Issues Lab
• View and clear forwarding table entries.

• Configure and monitor load balancing.
• Troubleshoot stateless firewall issues.
Isolate and Troubleshoot PFE Issues Lab


1.
Symptoms of a data plane problem are indicated by traffic not flowing correctly or not at all even though control plane components,
such as a route, appear to be fine.
2.
The main forwarding table is stored in the control plane and can be viewed with the show route forwarding-table operational mode
command.
3.
By default, the Junos OS performs per-prefix load balancing. When you enable per-packet load balancing, the Junos OS performs
per-flow load balancing.
4.
To display ARP policers, use the show policer operational mode command.

Acronym List
3DES ........ triple Data Encryption Standard
ABR ........ area border router
ADM ........ add/drop multiplexer
AIS ........ alarm indication signal
AIS-L ........ alarm indication signal-line
AIS-P ........ alarm indication signal-path
ANSI ........ American National Standards Institute
APS ........ Automatic Protection Switching
ARP ........ Address Resolution Protocol
AS ........ autonomous system
ASBR ........ autonomous system boundary router
ASIC ........ application-specific integrated circuits
ASN.1 ........ Abstract Syntax Notation One
ATM ........ Asynchronous Transfer Mode
BDR ........ backup designated router
BERT ........ bit error rate test
BFD ........ Bidirectional Forwarding Detection
BPV ........ bipolar violation
CB ........ Control Board
CCC ........ circuit cross-connect
CCV ........ C-bit code violation
CDP ........ Cisco Discovery Protocol
CE ........ customer edge
CES ........ C-bit errored seconds
CFEB ........ Compact Forwarding Engine Board
CFEB-E ........ Enhanced Compact Forwarding Engine Board
CHAP ........ Challenge Handshake Authentication Protocol
chassisd ........ chassis control daemon
CIP ........ Connector Interface Panel
Cisco HDLC ........ Cisco High-Level Data Link Control
CLI ........ command-line interface
CM ........ Case Manager
CoS ........ class of service
CPE ........ customer premises equipment
CSES ........ C-bit severely errored seconds
CSU ........ channel service unit
CSU/DSU ........ channel service unit/data service unit
dcd ........ device control daemon
DCE ........ data communication equipment
DES ........ Data Encryption Standard
DLCI ........ data-link connection identifier
DOA ........ dead-on-arrival
DoS ........ denial of service
DPC ........ Dense Port Concentrator
DR ........ designated router
DTE ........ data terminal equipment
EBGP ........ external BGP
ECMP ........ equal-cost multipath
ESD ........ electrostatic discharge
ES-IS ........ End System-to-Intermediate System
EXZ ........ excessive zeros
FCS ........ Frame check sequence
FEAC ........ far-end alarm and control
FEB ........ Forwarding Engine Board
FEBE ........ far-end block error
FIB ........ forwarding information base
FPC ........ Flexible PIC Concentrator
FPM ........ Front Panel Module
FRU ........ field-replaceable unit
FT ........ forwarding table
GMPLS ........ generalized MPLS
GRE ........ generic routing encapsulation
GRES ........ graceful Routing Engine switchover
GUI ........ graphical user interface
HA ........ high availability
HDB3 ........ high-density bipolar 3 code
HDLC ........ High-Level Data Link Control
IBGP ........ internal BGP
ICMP ........ Internet Control Message Protocol
IDE ........ Integrated Drive Electronics
IDS ........ intrusion detection service
IEEE ........ Institute of Electrical and Electronics Engineers
IETF ........ Internet Engineering Task Force
IGP ........ interior gateway protocol
ILMI ........ Integrated Local Management Interface
IOC ........ input/output card
IOS ........ Internetwork Operating System
IPMI ........ Intelligent Platform Management Interface
IP-NCP ........ IP Network Control Protocol
IPsec ........ IP Security
IPv4 ........ IP version 4
IPv6 ........ IP version 6
IS-IS ........ Intermediate System-to-Intermediate System
ISP ........ Internet service provider
ISSU ........ in-service software upgrade
JNCP ........ Juniper Networks Certification Program
JTAC ........ Juniper Networks Technical Assistance Center
KB ........ knowledge base
LACP ........ Link Aggregation Control Protocol
LCP ........ Link Control Protocol
LCV ........ line code violation
LES ........ line errored seconds
LLDP ........ Link Layer Discovery Protocol
LMI ........ Local Management Interface
LOF ........ loss-of-frame
LOH ........ line overhead
LOL ........ loss of light
LOS ........ loss-of-signal
LSA ........ link-state advertisement
LSDB ........ link-state database
LSP ........ label-switched path
LSR ........ label-switching router
LTE ........ line terminating equipment
LTE ........ Long Term Evolution
L2CP ........ Layer 2 Control Protocol
l2cpd ........ Layer 2 Control Protocol process
MAC ........ media access control
MAC ........ Message Authentication Code
Mbps ........ megabits per second
MCS ........ Miscellaneous Control Subsystem
MD5 ........ Message Digest 5
MIC ........ Modular Interface Controller
MPC ........ Modular Port Concentrator
MSTP ........ Multiple Spanning Tree Protocol
MTU ........ maximum transmission unit
NAT........................................................................ Network Address Translation
NBMA........................................................................ nonbroadcast multiaccess
NCP............................................................................ Network Control Protocol
NLRI................................................................ network layer reachability information
NMS....................................................................... network management system
NOC ......................................................................... network operations center
NPC........................................................................... Network Processing Card
NPU ........................................................................... Network Processing Unit
NSB ................................................................................. nonstop bridging
NSR ............................................................................. nonstop active routing
NSSA ............................................................................... not-so-stubby area
NTF...................................................................................no trouble found
NTP..............................................................................Network Time Protocol
OAM .......................................................... Operation, Administration, and Maintenance
OID................................................................................... object identifier
OoB....................................................................................... out-of-band
OSI ....................................................................... Open Systems Interconnection
OSS.......................................................................... operations support system
PAP.................................................................... Password Authentication Protocol
PCG.............................................................Packet Forwarding Engine Clock Generator
PCV.................................................................................P-bit code violation
PDU ................................................................................. protocol data unit
PE...................................................................................... provider edge
PEM ............................................................................... power entry module
PES...............................................................................P-bit errored seconds
PFE........................................................................... Packet Forwarding Engine
PIC ............................................................................. Physical Interface Card
PID ........................................................................................ Process ID
PIM........................................................................... Physical Interface Module
PIM..................•................................................... Protocol Independent Multicast
PoE............................................................................... Power over Ethernet
POH ...........................................................•••..................... path overhead
PPP.............................................................................. Point-to-Point Protocol
PSES...................................................................... P-bit severely errored seconds
PTE......................................................................... path terminating equipment
P2MP............................................................................... point to multipoint
P2P..................................................................................... point to point
QoS............................................................•..•.................. quality of service
RDI ............................................................................ remote defect indication
RE..........................•.......................................................... Routing Engine
REI ............................................................................. remote error indication
REI-L.............................................................................. remote indicator line
RIB ........................................................................... Routing Information Base
RID......................................................................................... router ID
RIPng ......................................................... Routing Information Protocol next generation
RMA ...................................................................... Return Materials Authorization
RMON.............................................................................. Remote Monitoring
rpd ............................................................................ routing protocol daemon
RPM ................................................................... real-time performance monitoring
RSTP....................................................................... Rapid Spanning Tree Protocol
RTOS......................................................................... real-time operating system
SCB.............................................................................. System Control Board
SCG.............................................................................. SONET Clock Generator
scp ...................................................................................... secure copy
SEFS................................................................... severely errored framing seconds
SFM ................................................................... Switching and Forwarding Module
SHA............................................................................. Secure Hash Algorithm
SIB ............................................................................. Switch Interface Board
SLA............................................................................ service-level agreement
SMART .............................................. Self-Monitoring, Analysis, and Reporting Technology
SMI ................................................................ Structure of Management Information
SNMP .............................................................. Simple Network Management Protocol
snmpd................................................................................. SNMP process
SNMPv1 ............................................................................... SNMP version 1
SNMPv2c ............................................................................. SNMP version 2c
SNMPv3 ............................................................................... SNMP version 3
SOH .................................................................................. section overhead
SPC........................................................................... Services Processing Card
SPE ...................................................................... SONET/SDH payload envelope
SPF ................................................................................. shortest-path-first
SPMB ................................................................. Switch Processor Mezzanine Board
SPU........................................................................... Service Processing Unit
SRE........................................................................ Services and Routing Engine
SRE............................................................................. Switch Routing Engine
SSB........................................................................... System and Switch Board
SSL .............................................................................. Secure Sockets Layer
STE ...................................................................... section terminating equipment
STP ............................................................................ Spanning Tree Protocol
SWAP......................................................................... Space, Weight, and Power
TCC ......................................................................... translational cross-connect
TCP ....................................................................... Transmission Control Protocol
TDM........................................................................... time-division multiplexing
ToS .................................................................................... type-of-service
TTL......................................................................................... time-to-live
tty ............................................................................................ teletype
UAS ............................................................................... unavailable seconds
UDP............................................................................ User Datagram Protocol
USB................................................................................ universal serial bus
USM.......................................................................... user-based security model
VACM.................................................................. view-based access control model
VCI............................................................................. virtual channel identifier
VLAN ....................................................................................... virtual LAN
VPI............................................................................... virtual path identifier
VPN.............................................................................. virtual private network
VRRP................................................................. Virtual Router Redundancy Protocol
WAD .............................................................................. working as designed

