12.b
Student Guide
Volume 1
Juniper Networks
Worldwide Education Services
Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.
YEAR 2000 NOTICE
Juniper Networks hardware and software products do not suffer from Year 2000 problems and hence are Year 2000 compliant. The Junos operating system has
no known time-related limitations through the year 2038. However, the NTP application is known to have some difficulty in the year 2036.
SOFTWARE LICENSE
The terms and conditions for using Juniper Networks software are described in the software license provided with the software, or to the extent applicable, in an
agreement executed between you and Juniper Networks, or Juniper Networks agent. By using Juniper Networks software, you indicate that you understand and
agree to be bound by its license terms and conditions. Generally speaking, the software license restricts the manner in which you are permitted to use the Juniper
Networks software, may contain prohibitions against certain uses, and may state conditions under which the license is automatically terminated. You should
consult the software license for further details.
Course Overview
This three-day course is designed to provide introductory troubleshooting skills for engineers in a
network operations center (NOC) environment. Key topics within this course include
troubleshooting methodology, troubleshooting tools, hardware monitoring and troubleshooting,
interface monitoring and troubleshooting, troubleshooting the data plane and control plane on
devices running the Junos operating system, staging and acceptance methodology,
troubleshooting routing protocols, monitoring the network, and working with JTAC. This course is
based on Junos operating system Release 12.2R2.5.
Objectives
After successfully completing this course, you should be able to:
Reduce the time it takes to identify and isolate the root cause of an issue impacting
your network.
Gain familiarity with Junos products as they pertain to troubleshooting.
Become familiar with online resources valuable to Junos troubleshooting.
Gain familiarity with Junos tools used in troubleshooting.
Identify and isolate hardware issues.
Troubleshoot problems with the control plane.
Troubleshoot problems with interfaces and other data plane components.
Describe the staging and acceptance methodology.
Troubleshoot routing protocols.
Describe how to monitor your network with SNMP, RMON, JFlow, and port mirroring.
Become familiar with JTAC procedures.
Intended Audience
The course content is aimed at operators of devices running the Junos OS in a NOC environment.
These operators include network engineers, administrators, support personnel, and reseller
support personnel.
Course Level
Junos Troubleshooting in the NOC is an introductory-level course.
Prerequisites
Students should have basic networking knowledge and an understanding of the Open Systems
Interconnection (OSI) reference model and the TCP/IP protocol suite. Students should also attend
the Introduction to the Junos Operating System (IJOS) course and the Junos Routing Essentials
(JRE) course, or have equivalent experience prior to attending this class.
Day 1
Chapter 1: Course Introduction
Chapter 2: Troubleshooting as a Process
Lab 1: The Troubleshooting Process
Chapter 3: Junos Product Families
Lab 2: Identifying Hardware Components
Chapter 4: Troubleshooting Toolkit
Lab 3: Monitoring Tools and Establishing a Baseline
Day 2
Chapter 5: Hardware and Environmental Conditions
Lab 4: Monitoring Hardware and Environmental Conditions
Chapter 6: Control Plane
Lab 5: Control Plane Monitoring and Troubleshooting
Chapter 7: Data Plane: Interfaces
Lab 6: Monitoring and Troubleshooting Ethernet Interfaces
Chapter 8: Data Plane: Other Components
Lab 7: Isolate and Troubleshoot PFE Issues
Day 3
Chapter 9: Staging and Acceptance Testing
Chapter 10: Troubleshooting Routing Protocols
Lab 8: Troubleshooting Routing Protocols
Chapter 11: High Availability
Chapter 12: Network Monitoring
Lab 9: Monitoring the Network
Chapter 13: JTAC Procedures
Appendix A: Interface Troubleshooting
Franklin Gothic: Normal text. Usage example: Most of what you read in the Lab Guide and Student Guide.
CLI Input: Text that you must enter. Usage example: lab@San_Jose> show route
GUI Input: Text that you must enter. Usage example: Select File > Save, and type config.ini in the Filename field.
CLI Undefined: Text where the variable's value is the user's discretion and text where the variable's value as shown in the lab guide might differ from the value the user must input. Usage example: Type set policy policy-name. ping 10.0.x.y
GUI Undefined: Same meaning as CLI Undefined, applied to GUI input. Usage example: Select File > Save, and type filename in the Filename field.
Technical Publications
You can print technical manuals and release notes directly from the Internet in a variety of formats:
Go to http://www.juniper.net/techpubs/.
Locate the specific software or hardware release and title you need, and choose the
format in which you want to view or print the document.
Documentation sets and CDs are available through your local Juniper Networks sales office or
account representative.
Objectives
• After successfully completing this content, you will be
able to:
• Get to know one another
• Identify the objectives, prerequisites, facilities, and
materials used during this course
• Identify additional Education Services courses at
Juniper Networks
• Describe the Juniper Networks Certification Program
We Will Discuss:
Objectives and course content information;
Additional Juniper Networks, Inc. courses; and
The Juniper Networks Certification Program.
Introductions
Introductions
The slide asks several questions for you to answer during class introductions.
Course Contents (1 of 2)
• Contents:
• Chapter 1: Course Introduction
• Chapter 2: Troubleshooting as a Process
• Chapter 3: Junos Product Families
• Chapter 4: Troubleshooting Toolkit
• Chapter 5: Hardware and Environmental Conditions
• Chapter 6: Control Plane
• Chapter 7: Data Plane: Interfaces
• Chapter 8: Data Plane: Other Components
Course Contents (2 of 2)
• Contents: (contd.)
• Chapter 9: Staging and Acceptance Testing
• Chapter 10: Troubleshooting Routing Protocols
• Chapter 11: High Availability
• Chapter 12: Network Monitoring
• Chapter 13: JTAC Procedures
• Appendix A: Interface Troubleshooting
Prerequisites
• The prerequisites for this course are the following:
• Basic networking knowledge
• Networking Fundamentals computer-based training, or
equivalent knowledge
• The Introduction to the Junos Operating System (IJOS)
course, or equivalent knowledge
• The Junos Routing Essentials (JRE) course, or equivalent
knowledge
Prerequisites
The slide lists the prerequisites for this course.
Course Administration
• The basics:
• Sign-in sheet
• Schedule
• Class times
• Breaks
• Lunch
• Break and restroom facilities
• Fire and safety procedures
• Communications
• Telephones and wireless devices
• Internet access
Education Materials
Additional Resources
Additional Resources
The slide provides links to additional resources available to assist you in the installation, configuration, and operation of
Juniper Networks products.
Satisfaction Feedback
Juniper Networks uses an electronic survey system to collect and analyze your comments and feedback. Depending on the
class you are taking, please complete the survey at the end of the class, or be sure to look for an e-mail about two weeks
from class completion that directs you to complete an online survey form. (Be sure to provide us with your current e-mail
address.)
Submitting your feedback entitles you to a certificate of class completion. We thank you in advance for taking the time to
help us improve our educational offerings.
• Formats:
• Classroom-based instructor-led technical courses
• Online instructor-led technical courses
• Hardware installation elearning courses as well as technical
elearning courses
• Complete list of courses:
• http://www.juniper.net/training/technical_education/
Courses
You can access the latest Education Services offerings covering a wide range of platforms at
http://www.juniper.net/training/technical_education/.
Certification Preparation
• Practice for multiple exams in Study Mode
• Hundreds of multiple choice questions and ...
• Build a virtual network with device achievements
• Track your results in the app and Game Center; share your network through Facebook and Twitter
www.juniper.net/junosgenius
Junos Genius
The Junos Genius application takes certification exam preparation to a new level. With Junos Genius you can practice for
your exam with flashcards, simulate a live exam in a timed challenge, and even build a virtual network with device
achievements earned by challenging Juniper instructors. Download the app now and Unlock your Genius today!
Find Us Online
http://www.juniper.net/jnet
http://www.juniper.net/facebook
http://www.juniper.net/youtube
http://www.juniper.net/twitter
Find Us Online
The slide lists some online resources to learn and share information about Juniper Networks.
Questions
Any Questions?
If you have any questions or concerns about the class you are attending, we suggest that you voice them now so that your
instructor can best address your needs during class.
Objectives
• After successfully completing this content, you will be
able to:
• Avoid unnecessary disruptions to production environments
• Describe a troubleshooting process
• Describe troubleshooting challenging network issues
We Will Discuss:
How to avoid unnecessary disruptions to production environments;
Troubleshooting as a process; and
Situations that pose troubleshooting challenges.
• First, do no harm:
• Know what is normal
• Use change control processes
• Plan for the worst
• Backup configurations and other key files
• Use non-disruptive practices
• Recreate in a lab environment
• Use maintenance windows
First, Do No Harm
When modern medical doctors begin practicing, they often take what is called the Hippocratic oath, primum non nocere.
This Latin phrase, attributed to Hippocrates, translates as "first, do no harm."
This should also be the concern of a network administrator working in a production network environment. Some of the
information presented in this course could be disruptive to a production network. This is true not only for corrective actions
that might be taken, but also applies to the troubleshooting process itself.
The slide lists several best practice safety precautions that can be taken to minimize unforeseen impact to the network. We
will discuss each of the listed options in more detail in the next few pages.
Change-Control Processes
• Use change-control processes
• Formalized
• Balance needs with risks
• Coordinate scheduling to minimize impact to production
• Remember, customers might have change control policies in
place as well
Change-Control Policies
Best-in-class companies have formalized change-control policies that govern any modifications to the production
environment and define processes to use when changing it. These processes are designed to balance the need for changes
in an environment with the technical risks associated with implementing those changes. They usually allow for more than
one set of eyes to review changes before they are implemented. This built-in protection helps avoid unforeseen impact due to
oversight by a single individual or group. These processes also dictate how changes should be implemented in a way that
minimizes impact to the network.
Remember that change-control processes can also pertain to troubleshooting, because many troubleshooting steps can
change or impact a production environment as well. Give special consideration to the possible impact of all troubleshooting
steps before they are used. This step can be easily forgotten in a crisis, but failure to do so can make the situation worse.
When troubleshooting, always consider whether the troubleshooting step is something that needs to go through a formalized
change control process.
Configuration
Disruptive Practices
• Be aware of disruptive practices
• Review power-on hardware information for your equipment
• Hot-swappable FRUs
• Hot-pluggable FRUs
• Review hardware redundancy options where available
• Be careful when using hidden CLI commands
• Hidden commands are hidden for a reason
• Understand disruptive potential before using
• Be careful when using disruptive testing techniques
Maintenance Windows
• Maintenance windows:
• Minimize impact from unforeseen issues
• Do not be distracted by perceived urgency
• Customers have maintenance windows too
Maintenance Windows
Best-in-class companies set aside time for maintenance windows. Like change-control processes, maintenance windows are
designed to balance the need for changes in an environment with the technical risks associated with implementing those
changes.
Under the best of circumstances, appropriate precautions can allow for zero downtime. It is the unexpected impacts that
can make a situation worse than the one you began with. Maintenance windows are helpful not only for handling the known
interruptions when making changes in an environment, but also for absorbing the unforeseen ones.
When troubleshooting, always consider whether the troubleshooting step is something that should wait for a formalized
maintenance window.
Troubleshooting
• Troubleshooting:
• The ability to identify the root cause of a problem impacting
the network
• The ability to identify the root cause of any deviation from
the normal or expected behavior of a network
Troubleshooting
The purpose of troubleshooting is to identify the root cause of an issue.
Frequently, troubleshooting is thought of only as it applies to tracking down the root cause of a clearly identifiable problem
and, generally, in terms of the impact it has on a network or a user's ability to use the network. In reality, it can extend
beyond that simple definition to include any deviation from the normal or expected behavior.
When problems do occur, or when the behavior of a network varies from the normal or expected behavior, it is necessary to
identify the root cause to resolve the issue and eliminate the negative impact. Additionally, in a production network, it is
important to do so in a manner that introduces the least disruption possible.
A Process-Based Methodology
• Process-based methodology:
• Learnable
• Repeatable
• Can be used when dealing within any of these elements of a
device running the Junos OS:
• Chassis
• Control plane
• Interfaces and circuits
• Data plane
A Process-Based Methodology
We have all known somebody we consider to be a good troubleshooter. The purpose of this chapter is to demonstrate that
the art of troubleshooting is not an art at all, but rather a learnable, repeatable, process-based methodology.
Two individuals working with the same sets of tools and a common symptom might approach the act of fault analysis in
completely different ways. For example, one person might always start with visual inspection while another opts to begin with
interface loopback tests. In the end, it is hard to say that one approach is better than another, assuming that both individuals
arrive at a similar conclusion, in a similar amount of time with similar levels of disruption.
Although many different approaches to troubleshooting exist, certain fundamental elements are involved in any sound
troubleshooting methodology. Experienced technical engineers likely already employ many of these techniques, intentionally
or otherwise. The goal of this course is to help establish a repeatable framework where experienced engineers can use their
existing knowledge of a given technology to achieve more efficient and effective support.
Where To Begin?
• The scientific method:
• Characterize a problem based on observation and
experience
• Hypothesize and propose an explanation for the observation
• Make a prediction based on past experiences
• Test and experiment to prove or disprove the accuracy of
the prediction
Troubleshooting Steps
• Troubleshooting steps:
• Define success
• Isolate the component preventing success
• Characterize
• Hypothesize
• Predict
• Test and experiment
• Identify a solution
• Implement the solution
Define Success
• Define success:
• Quantify the problem
• What is happening that should not be happening?
• What should be happening that is not happening?
• Define a desirable endpoint
• Be specific
• Define a recognizable endpoint
• Example: prefix a.b.c.d/z will be received from neighbor x
• Be careful not to define success using preconceived
solutions
Define Success
One of the most critical, and often most overlooked, steps in the troubleshooting process is defining success. Ideally, you
should define success in terms of a desired objective, rather than an encountered problem or error. This definition should be
a specific, recognizable, and desirable endpoint-not a restatement of the problem.
You should also remember that often many different ways exist to meet a given objective. Understanding the desired final
outcome is far more beneficial than simply understanding a particular problem encountered along the way.
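The slide's example endpoint ("prefix a.b.c.d/z will be received from neighbor x") can be written as a check that is either true or false, which keeps the definition of success specific and recognizable. The following is a minimal sketch only: the ReceivedRoute record, the function name, and the addresses are hypothetical and are not part of the Junos OS or of any Juniper tool.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class ReceivedRoute:
    """One route learned from a neighbor (hypothetical record type)."""
    prefix: str
    neighbor: str


def success_reached(routes, wanted_prefix, wanted_neighbor):
    """Return True when the wanted prefix is received from the wanted neighbor."""
    target = ip_network(wanted_prefix)
    return any(
        ip_network(route.prefix) == target and route.neighbor == wanted_neighbor
        for route in routes
    )


# Example data uses documentation address ranges, not real devices.
observed = [
    ReceivedRoute("192.0.2.0/24", "198.51.100.1"),
    ReceivedRoute("203.0.113.0/24", "198.51.100.2"),
]
print(success_reached(observed, "192.0.2.0/24", "198.51.100.1"))
```

The check states the objective, not a preconceived solution: it does not care which change makes the prefix appear, only that it does.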
• Hypothesize
• Predict
• Test and experiment
Characterize
Hypothesize
Form a Hypothesis
A hypothesis is a possible explanation for behaviors observed during the characterization stage. Many possible explanations
might exist. Some explanations will correlate with the observations made earlier, whereas other explanations will be
immediately discounted based on observations made during the characterization stage. Be careful! Do not be blinded by
subjectivity. Keep an open mind when considering possible explanations. Be complete. Do not make assumptions. Do not
overlook the obvious because you have a preconceived notion about where the root cause of the problem is. Although
leveraging your memory and past experiences against a current problem is good, you should never close your mind to new
possibilities.
During this stage, try to identify as many explanations as possible. Although you might go quickly through this stage on a first
pass, it becomes particularly important to be complete if the troubleshooting process becomes recursive. One helpful
method for identifying the potential root cause of a problem is to identify all of the required components and dependencies
necessary to achieve the desired success. For example, for a host to connect to a remote HTTP server, several components
must exist. Connectivity must exist between the endpoints. Each endpoint (and all intermediate systems between) must
have appropriate routing information. Security settings must allow the traffic (or nothing in the security settings can
block it), and so on.
Layered Approach
OSI             TCP/IP
Application     Application
Presentation    Application
Session         Application
Transport       Transport
Network         Internet
Data Link       Link
Physical        Link
A Layered Approach
When identifying required components, remember the reference models. It does not matter whether you use the Open
Systems Interconnection (OSI) model or the RFC 1122 Internet model. Although the two views of the network are not
intended to match exactly, each provides a layered approach with dependencies on the underlying layers for the upper layers
to perform their role.
Understanding the role that each layer plays and how each layer depends upon the lower levels can greatly simplify the task
of isolating the possible root causes of a problem.
BGP adjacencies are a great example of how the different layers work together to accomplish designated objectives. BGP
forms adjacencies at the Application Layer to share routing information. To form an adjacency, BGP relies on TCP at the
Transport Layer to establish logical connections between BGP peers. Those connections in turn rely on the underlying
routing information for reachability of internal BGP peers, and that reachability depends on link-level connectivity
between all of the involved devices.
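The dependency chain described for BGP can be walked from the lowest layer upward, stopping at the first layer that fails. The sketch below illustrates that idea only; the layer names and the lambda stand-ins are hypothetical, and in practice each check would wrap whatever operational verification you use for that layer.

```python
def first_failing_layer(checks):
    """Run checks from the lowest layer upward; return the first failing layer name, or None."""
    for layer, check in checks:
        if not check():
            return layer
    return None


# Hypothetical stand-ins for real checks, ordered from the bottom layer up.
bgp_dependencies = [
    ("link", lambda: True),
    ("routing to peer", lambda: True),
    ("TCP session", lambda: False),
    ("BGP adjacency", lambda: False),
]
print(first_failing_layer(bgp_dependencies))
```

Because the upper layers depend on the lower ones, the first failure found from the bottom is the place to focus; failures above it are likely consequences, not independent causes.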
Recursive Process
Test Condition
Each test should reduce the number of possible root causes for a problem-regardless of the outcome. For example, if a
device will not boot when new cards are added, one possible course of action would be to remove the nonrequired
field-replaceable units (FRUs) and see if the device boots. The remaining FRUs could then be added back in one at a time (or
in groups) to help isolate the problematic hardware.
Remember, isolation of the root cause of a problem can be a recursive process.
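The FRU example can be sketched as a halving search in which every test shrinks the list of suspects regardless of the outcome. This is an illustration under stated assumptions: boots_with is a hypothetical callback standing in for physically installing a set of optional FRUs and observing whether the device boots, at most one FRU is faulty, and the symptom is repeatable.

```python
def isolate_faulty_fru(optional_frus, boots_with):
    """Return the single optional FRU that prevents boot, or None.

    boots_with(installed) is a hypothetical callback that returns True when
    the device boots with only the given optional FRUs installed.
    """
    suspects = list(optional_frus)
    if boots_with(suspects):
        return None  # the device boots with everything installed
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        # Install only this half; either result removes half of the suspects.
        suspects = suspects[len(half):] if boots_with(half) else half
    return suspects[0] if suspects else None


# Simulated device in which one card (name is illustrative) prevents boot.
frus = ["FPC0", "FPC1", "FPC2", "PIC3", "PIC4"]
print(isolate_faulty_fru(frus, lambda installed: "PIC3" not in installed))
```

If more than one component can be faulty, or the fault is intermittent, the halving assumption no longer holds and components have to be added back individually.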
Compound Issues
We mentioned earlier that it is rare to see problems occur in both the control plane and data plane simultaneously in
established environments. However, if proper precautions are not taken, it is possible to introduce additional problems into
the environment during the troubleshooting process. It is also possible that in the process of troubleshooting one issue, you
might discover other previously unnoticed issues. Always be open to the idea that there might be more than one problem.
Configuration Errors
• Configuration:
• Most plausible in new setup or with recent changes
• Use show system commit to check for recent changes
• Use show | compare to display differences
• Remember to check all devices that could introduce a problem
• Eliminate the control plane as a possibility before focusing
on the data plane
• When configuration errors are suspected, it is OK to quickly
glance at configuration, but rely on operational mode
commands to isolate errors
• The human brain sees what it expects to see
Configuration Errors
Configuration issues are the most likely cause of problems in new setups. They are also the most probable cause of
problems within the control plane. Problems in the control plane can even occur because changes were made to the
configuration of another device within the environment.
The Junos OS has built-in sanity checks to ensure that all configuration entries are valid. In some cases, they can also check
for completeness, such as ensuring that a referenced policy exists. This process is different from checking for accuracy
because there is no automated way to check a configuration against intent.
It is common practice to jump directly to viewing the configuration when configuration errors are suspected. This process is
not troubleshooting! It does not guarantee that you will be one step closer to finding the root cause of an issue. It is OK to
take a quick glance at a configuration to see if the configuration error is readily apparent. However, be wary of spending a lot
of time looking at a configuration for errors, because configurations can be quite long and complex. Any benefit to be
achieved from looking at a configuration is further frustrated because the human brain tends to see that which it expects to
see. It is a much better practice to rely on the output of operational mode commands when trying to isolate configuration
errors.
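When two versions of a configuration have been saved as text, a mechanical comparison avoids the problem of the eye seeing what it expects to see. The sketch below uses the Python standard library to list only the changed lines. It is an offline illustration, not how the Junos OS implements its own comparison, and the configuration lines are invented for the example.

```python
import difflib


def config_changes(previous, current):
    """Return only the removed (-) and added (+) lines between two config texts."""
    diff = list(
        difflib.unified_diff(
            previous.splitlines(), current.splitlines(), lineterm="", n=0
        )
    )
    # The first two lines of a unified diff are the file headers; skip them.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


before = (
    "set protocols ospf area 0.0.0.0 interface ge-0/0/1.0\n"
    "set system host-name r1"
)
after = (
    "set protocols ospf area 0.0.0.1 interface ge-0/0/1.0\n"
    "set system host-name r1"
)
for change in config_changes(before, after):
    print(change)
```

A diff shows what changed, not whether the change matches intent, so operational mode output remains the way to confirm the effect.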
Hardware Errors
• Hardware:
• Plausible in new out-of-box setups
• Plausible if new problems show up in established networks
• Can be a delayed effect from improper handling
• Alarms, LEDs, and log files, along with operational mode
command output all prove helpful in troubleshooting
hardware issues
• Try moving the problem
• Generally eliminate hardware as a possibility before
progressing on to software
Hardware Errors
Hardware is a plausible cause of issues that appear in established environments where configuration is not suspect. It is
also possible in new setups, or anytime that hardware has been moved or relocated-particularly if proper handling
techniques were not used.
Always use appropriate handling techniques when working with hardware, including proper grounding and other
electrostatic discharge (ESD) precautions, because even the slightest electrical influx can damage the fine traces used in today's
electronic equipment. This damage is not always immediately evident-but it could weaken a trace and could contribute to a
future failure.
Several tools are available when troubleshooting hardware, including alarms, LEDs, various log files and operational mode
command output.
Another very useful tactic when troubleshooting hardware is to attempt to move the problem. By relocating hardware within a
device or between devices, it is often possible to identify the problem component. Remember to take appropriate
precautions when a possibility of impacting production traffic exists.
Because hardware issues are more easily identified than software issues and more likely to occur in established operating
environments, it is generally more efficient to eliminate hardware as a root cause before proceeding to troubleshoot
software.
Software Errors
• Software
• Plausible in new setups, with recent Junos OS upgrades, or
when using new features
• View version and last Junos OS change
show version detail
show system software detail
file list /var/sw/pkg detail | match rollback
• Check online resources for known issues
• Check release notes
www.juniper.net/techpubs/software/junos/
• Search using keyword search-requires login
www.support.juniper.net (link: Junos Defect Search)
Software Errors
Like configuration issues, software issues are not as likely to cause random failures in established functional environments.
They are more likely to appear with changes, either to the operating system, or with the utilization of new features. Software
is generally suspect once hardware has been eliminated as a probable cause.
If software errors are suspected, check online references for known issues.
If you are not running the latest version of code, be sure to check the latest release to see if an issue you are experiencing
has been identified and resolved. If an issue is identified there, remember to proceed using the normal upgrade procedures
including testing the new code in a lab setting specific to your environment. Bypassing proper testing in a rush to resolve an
issue can lead to different and possibly more disruptive issues.
Software Troubleshooting
• Troubleshooting software problems:
• First, eliminate hardware as a possible issue
• Review logs for software-related entries
• Verify required processes are running
• Move the problem:
• Can the issue be duplicated on another system using the same
version of the Junos OS?
• Can the issue be duplicated on another system using a different
version of the Junos OS?
• Core files and memory dumps might be required for
advanced troubleshooting
Hardware Is OK
Something Else
You should take into consideration an additional possibility-it might not be the network at all. Variations in traffic being
introduced to the network can often produce symptoms similar to those encountered with configuration, software, or
hardware problems. These variations might be intentional, such as in a denial of service (DoS) attack, or they might be
normal but unexpected changes in the type or amount of traffic traversing the network. When these types of changes are
suspected, it is important to have a baseline reference to compare to current traffic.
Another possibility is that the network is working as designed, but differently than understood or expected-which could be
the result of trying to use a feature in a different way than it was intended or could be the result of a design decision. Modern
networks are a complex combination of standards and protocols implemented across hardware and software. Sometimes
design decisions might have been made to accommodate the complex list of features required in today's networks. Use
online documentation to verify whether the implementation of a particular feature matches your understanding.
Identify a Solution
2. Examine the devices between which packets are lost, and try to find the root cause. This step can be done by
process of elimination: Look at each possible cause and determine whether it is causing the problem.
3. Address the root cause. This step can include a variety of issues, including congestion, a circuit issue, configuration
problems (such as class of service [CoS], policers, or duplex settings on Ethernet interfaces), hardware faults, and so on.
To begin troubleshooting packet loss, you will need the following preliminary information:
Two endpoints: The endpoints can be two routers, a pair of hosts in a customer network, or even traffic
generators.
Using routers as test endpoints is not ideal because host-bound traffic is often rate-limited.
Remember that on most platforms, traffic to and from the router itself is treated differently from transit
traffic. In some cases, this configuration can mask a problem on the starting or ending node.
A clear map of every device and circuit in the path. As noted previously, you must proceed by process of
elimination to narrow down the problem to a section of the path. To perform this task, a topology map is a
necessary starting point.
Payload-Dependent Loss
• In rare cases, the loss is dependent on payload
• The problem appears only with packets matching a specific
bit pattern
• The cause is usually faulty hardware somewhere on the path
• Bit errors in packet memory
• One way of detecting the problem is to use a rapid ping with
different payloads
• Some suggestions are 00 (all zeros), FF (all ones), AA and 55
(alternating ones and zeros), 0F and F0 (half byte ones, half zeros)
• These issues are rare; rule out other causes first
• But when they happen, they are difficult to pin down
Payload-Dependent Loss
In some (fortunately, very rare) cases, the loss is only triggered by packets matching a specific bit pattern. Generally, these
problems are caused by hardware problems within network elements (for example, because of faulty packet memory) or by
electrical issues on some transmission technologies, especially when payload scrambling is not used. Even though these
problems are very rare, it is important to be aware of their existence and to know how to recognize them.
The best way to spot these issues is to run a test with different payload patterns. In general, all zeros, all ones, and various
alternating patterns are good tests to use. If you face an elusive error, take a few minutes to run a few ping tests with
different payload patterns.
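The fill patterns suggested on the slide can be generated programmatically so that no pattern is skipped during a test run. The sketch below builds the payload bytes and, for convenience, one rapid ping command string per pattern. The command syntax shown is an assumption and should be verified against the CLI reference for your Junos OS release; the function names and the target address are hypothetical.

```python
# Fill bytes taken from the slide: all zeros, all ones, alternating bits, half bytes.
PATTERNS = {
    "all zeros": 0x00,
    "all ones": 0xFF,
    "alternating 1010": 0xAA,
    "alternating 0101": 0x55,
    "half byte 00001111": 0x0F,
    "half byte 11110000": 0xF0,
}


def build_payload(fill_byte, size):
    """Return a payload of `size` bytes, each set to `fill_byte`."""
    return bytes([fill_byte]) * size


def ping_commands(host, size=1400, count=1000):
    """Yield one rapid ping command string per pattern (syntax is an assumption)."""
    for fill_byte in PATTERNS.values():
        yield f"ping {host} rapid count {count} size {size} pattern {fill_byte:02x}"


for command in ping_commands("192.0.2.1", size=100, count=5):
    print(command)
```

If loss appears with only one of the patterns, that result points toward the payload-dependent faults described above and away from congestion or configuration.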
Troubleshooting Bottlenecks (1 of 3)
• Bottlenecks:
• Look at the system as a whole
• Use tracert on end hosts to gather path information
10 52 ms 39 ms 42 ms xe-11-0-0.edgel.SJ3.level3.net [10.14.23.249]
11 50 ms 39 ms 37 ms ae-41-99.carl.SJ1.level3.net [10.14.27.195]
Bottlenecks
Bottlenecks represent another unique situation that can be frustrating to troubleshoot. When troubleshooting bottlenecks,
you must remember to look at the system as a whole.
Network utilities such as tracert (on end hosts) or traceroute (on devices running the Junos OS) can be used to identify the
path between two endpoints. Sometimes the output can indicate that traffic is not taking the intended or expected path. This
scenario would represent a control plane issue, and additional troubleshooting could take place accordingly. At other times,
it could indicate a resource constraint within the data plane of devices along the path. Sometimes, the link speeds can be
inferred from the interface names returned by tracert at a DOS prompt, as in the output in the slide. At other times, it is
necessary to use the show interfaces command on each device running the Junos OS along the way to determine the available
throughput capacity.
Be careful when using traceroute to determine interface link capacity. The interface information that shows up is derived
from the name associated with the link and not the actual capacity of the circuit. The information is only as accurate as the
naming is current. Rely on output from the show interfaces command for actual circuit capacity.
You should take this information in context, however. Remember, bottlenecks are not a result of throughput capacity alone,
but rather a combination of throughput and utilization. Unlike hardware issues that can occur independently of traffic load,
bottlenecks have a direct correlation with the amount of traffic passing through the system and tend to have a correlation to
the classification of traffic passing through the circuit.
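The point that a bottleneck depends on both capacity and utilization can be made concrete with a small sketch. The link names and numbers below are invented for illustration, and "headroom" (capacity multiplied by the unused fraction) is our simplification, not a Junos OS metric.

```python
# Illustrative sketch: the link with the least unused capacity is the likely
# bottleneck, even if it has the highest raw capacity. All values are made up.
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    capacity_mbps: float   # circuit capacity
    utilization: float     # fraction of capacity in use, 0.0 to 1.0

    @property
    def headroom_mbps(self) -> float:
        """Capacity still available for additional traffic."""
        return self.capacity_mbps * (1.0 - self.utilization)


def likely_bottleneck(path):
    """Return the link on the path with the least headroom."""
    return min(path, key=lambda link: link.headroom_mbps)


PATH = [
    Link("access", 1_000, 0.20),    # about 800 Mbps free
    Link("core", 10_000, 0.95),     # about 500 Mbps free
    Link("peering", 2_000, 0.50),   # about 1000 Mbps free
]

if __name__ == "__main__":
    for link in PATH:
        print(f"{link.name:<8} {link.headroom_mbps:8.1f} Mbps free")
    print("likely bottleneck:", likely_bottleneck(PATH).name)
```

In this example the 10 Gbps core link is the constraint, which capacity figures alone would not reveal.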
Troubleshooting Bottlenecks (2 of 3)
• Link utilization:
• Use link-by-link isolation to narrow the focus
• Use extended ping options such as size, do-not-fragment,
record-route, and so on
• Use show interfaces statistics
• Hardware issues can impact throughput
• Misconfigured interface properties (check both ends)
• Malfunctioning or failing interfaces or cables
• Layer 2 loops
Link Utilization
Once the full path between endpoints has been established, individual performance statistics can be collected link by link to
identify the bottleneck. The traceroute utility is useful for determining the path between two endpoints, but its information
generally comes from only three Internet Control Message Protocol (ICMP) responses, which is hardly sufficient
for an accurate sample. Instead, use the ping utility for more meaningful sampling information. Use the source
option, as well as extended ping options such as size, do-not-fragment, record-route, and so on, to collect
meaningful information.
Remember that duplex mismatches and other misconfigured interface properties can cause collisions and slow the throughput of interfaces and
links. Use the show interfaces command to confirm settings and to view interface statistics that can help identify errors
in the configuration or other interface problems.
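To see why three responses make a poor sample, consider the following sketch, which summarizes a list of round-trip times where a lost probe is recorded as None. The sample values are invented and the summarize function is a hypothetical helper, not part of any Junos OS tool.

```python
# Illustrative sketch: compare a 3-probe sample with a 20-probe sample.
# RTT values are invented; None marks a lost probe.
from statistics import mean


def summarize(rtts_ms):
    """Return loss percentage and RTT statistics for a list of probes."""
    if not rtts_ms:
        raise ValueError("need at least one probe")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    stats = {"sent": len(rtts_ms), "received": len(received), "loss_pct": loss_pct}
    if received:
        stats.update(min=min(received), avg=mean(received), max=max(received))
    return stats


SAMPLES = [39.0, 42.0, 37.0, 41.0, None, 40.0, 38.0, 95.0, 39.0, 40.0,
           41.0, 38.0, None, 39.0, 42.0, 40.0, 37.0, 39.0, 41.0, 40.0]

if __name__ == "__main__":
    print("first 3 probes:", summarize(SAMPLES[:3]))
    print("all 20 probes: ", summarize(SAMPLES))
```

The first three probes show no loss and a steady delay, while the full sample reveals both loss and a latency spike.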
Troubleshooting Bottlenecks (3 of 3)
• Another approach:
• Intentionally introduce constraints within the path
• Generate additional traffic on a particular segment
• Reduce bandwidth through interface settings
• Redirect flow to a different interface with less capacity
• If end-to-end throughput changes, you have isolated the
bottleneck
• Otherwise, that link was not the bottleneck
• Slow down the next portion and try again
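The procedure in the slide can be sketched as a loop over a toy model. The model assumes that end-to-end throughput equals the capacity of the slowest link and that each constraint is a modest 10 percent reduction; real networks are messier, and the link names and speeds below are invented.

```python
# Illustrative sketch of bottleneck isolation by introducing constraints.
# Toy assumption: end-to-end throughput equals the slowest link's capacity.


def end_to_end(capacities):
    """Return end-to-end throughput for a path of link capacities (Mbps)."""
    return min(capacities.values())


def find_bottleneck(capacities, reduction=0.10):
    """Constrain one link at a time; the first one that lowers the
    end-to-end throughput is reported as the bottleneck."""
    baseline = end_to_end(capacities)
    for name in capacities:
        trial = dict(capacities)                      # leave the original untouched
        trial[name] = capacities[name] * (1.0 - reduction)
        if end_to_end(trial) < baseline:
            return name                               # throughput changed: isolated
        # Otherwise this link was not the bottleneck; try the next one.
    return None


if __name__ == "__main__":
    path = {"lan": 1000.0, "wan": 45.0, "dc": 10000.0}
    print("bottleneck:", find_bottleneck(path))
```

A small reduction matters here: a large one could push a healthy link below the true bottleneck and produce a false result.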
Summary
• In this content, we:
• Described ways to avoid unnecessary disruptions to
production environments
• Described a troubleshooting process
• Described troubleshooting challenging network issues
We Discussed:
How to avoid unnecessary disruptions to production environments;
Troubleshooting as a process; and
Troubleshooting challenging network issues.
Review Questions
1. What are the four main steps in the troubleshooting
process described in this chapter?
2. What are the four categories of potential root cause
problems described in this chapter?
3. What type of symptom would indicate a problem
within the control plane?
4. What type of symptoms would indicate a problem
within the data plane?
Objectives
• After successfully completing this content, you will be
able to:
• Describe the architectural philosophy of devices that run the
Junos OS and understand how this relates to
troubleshooting
• Describe traffic processing for transit and exception traffic
• Describe the function and components of the RE and PFE
within a device running the Junos OS.
• Describe FRUs
• Describe current Junos product families and understand
where to go for detailed information about your hardware
We Will Discuss:
The basic design architecture of devices that run the Junos operating system;
Traffic processing for transit and exception traffic;
The major components of the Routing Engine (RE) and the Packet Forwarding Engine (PFE);
Field-replaceable units (FRUs); and
Junos product families.
The Junos OS
The slide highlights the topics we will discuss. We discuss the highlighted topic first.
The Junos OS
• Robust, modular operating system
• Provides industry-leading performance and scalability
• Based on FreeBSD
(Slide graphic: Junos OS releases, such as 11.4 and 12.1, running across platforms from the J2320 to the TX Matrix.)
Although the source code base is the same for all platforms running the Junos OS, some features are implemented
differently on different platforms. We make a strict effort, however, to ensure features are implemented in a consistent
manner when possible.
Another significant benefit of this common design architecture is that the same troubleshooting methodology can be applied
across all devices. Although function-specific and platform-specific troubleshooting for each product family might exist, the
base troubleshooting methodology remains the same across all devices running the Junos OS.
Separation of Duties
• All platforms running the Junos OS share a common
design philosophy
• Clean separation of control and forwarding functions
• Sometimes accomplished with hardware
• Sometimes implemented within software
(Slide graphic: frames enter and leave the device through the Packet Forwarding Engine in the data plane.)
The Brain
The RE is the brain of the device. It is responsible for system management and for processing routing updates. The RE runs
various protocol and management software processes that run inside a protected memory environment. It provides the
command-line interface (CLI) and the J-Web graphical user interface (GUI). These user interfaces run on top of the Junos
kernel and provide user access and control of the device.
The RE is also responsible for building and maintaining the forwarding information necessary for the device to perform its
function within the network.
It handles all protocol processes in addition to other software processes that control the device's interfaces, the chassis
components, system management, and user access to the device. These software processes run on top of the Junos kernel,
which interacts with the PFE. The software directs all protocol traffic from the network to the RE for the required processing.
The RE controls the PFE by providing accurate, up-to-date Layer 2 and Layer 3 forwarding tables and by downloading
microcode and managing software processes that reside in the PFE's microcode. The RE receives hardware and
environmental status messages from the PFE and acts upon them as appropriate.
Separation, Revisited
Remember, the RE does not play a direct role in the forwarding of individual transit traffic packets. When troubleshooting
traffic processing, once the proper forwarding information has been validated, troubleshooting efforts can be focused on the
data plane.
Control Plane-Components
• Common components:
• Processor
• Runs the Junos OS to maintain the router's routing tables and
routing protocols
• DRAM
• Provides storage for the routing and forwarding tables
• Buffers incoming packets
• Storage
• Can be hard disk, NAND flash, or both depending on the system
• Used to store the Junos OS and also log files and memory dumps
• Visit www.juniper.net/techpubs/ for specific information
about the components in your hardware
Control Plane-Troubleshooting
• Possible points of failure:
• Configuration errors
• Hardware errors
• Subcomponent-level failure isolation is not usually required
because faulty hardware generally results in replacing the entire
RE
• When working with platforms with redundant REs, isolation of the
faulty RE can be required
• Software errors
• Because of the design of the Junos OS, individual processes can
be restarted without impacting the entire RE
• In some situations, even subportions of processes can be
reinitialized independently
• Be sure to check latest release notes for known issues
• The workhorse:
• Uses Layer 2 and Layer 3 forwarding tables provided by the
RE to forward traffic toward its destination or special
functions component
• Implements various services such as policing, stateless
firewall filtering, and class of service
(Slide graphic: the Routing Engine sits above the data plane; frames enter and leave through the Packet Forwarding Engine.)
The Workhorse
The data plane, built around the PFE, systematically forwards traffic based on a synchronized local copy of the forwarding
table created by the RE. Storing and using a local copy of the forwarding table allows the PFE to forward traffic more
efficiently by eliminating the need to consult the RE each time a packet needs to be processed. Using this local copy of the
forwarding table also allows platforms running the Junos OS to continue forwarding traffic during control plane instabilities.
The PFE also maintains Layer 2 bridging information.
In addition to forwarding traffic, the PFE also implements a number of advanced services. Some examples of advanced
services implemented through the PFE include policers that provide rate limiting, stateless firewall filters, and class of
service (CoS). Other services are available through special services cards that can be added to the data plane.
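The relationship between the RE's forwarding table and the PFE's local copy can be sketched as follows. This is a conceptual model only: the class names, the sync_from method, and the interface names are hypothetical, and a real PFE performs lookups in hardware or microcode rather than in a Python dictionary.

```python
# Conceptual sketch: the PFE forwards from its own synchronized copy of the
# forwarding table, so lookups keep working while the RE is unavailable.
# Class and method names are hypothetical.
import ipaddress


class RoutingEngine:
    """Builds and maintains the forwarding table (control plane)."""

    def __init__(self):
        self.forwarding_table = {}

    def add_route(self, prefix, next_hop):
        self.forwarding_table[ipaddress.ip_network(prefix)] = next_hop


class PacketForwardingEngine:
    """Forwards traffic using a local copy of the table (data plane)."""

    def __init__(self):
        self.local_table = {}

    def sync_from(self, routing_engine):
        # Copy the table so later RE changes or failures do not affect lookups.
        self.local_table = dict(routing_engine.forwarding_table)

    def forward(self, destination):
        """Longest-prefix match against the local copy only."""
        addr = ipaddress.ip_address(destination)
        matches = [net for net in self.local_table if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.local_table[best]


if __name__ == "__main__":
    re_ = RoutingEngine()
    re_.add_route("10.0.0.0/8", "ge-0/0/0.0")
    re_.add_route("10.1.0.0/16", "ge-0/0/1.0")
    pfe = PacketForwardingEngine()
    pfe.sync_from(re_)
    re_.forwarding_table.clear()        # simulate a control plane disruption
    print(pfe.forward("10.1.2.3"))      # still resolved from the local copy
```

Note that forward never consults the RoutingEngine object, which mirrors the separation of control and forwarding described above.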
• Distributed architecture
• Made up of several ASICs, processors, or both
• Different chips reside on different components within the data
plane
• On newer-generation devices, several tasks can be accomplished
within a single chip
• On newer-generation devices, several stand-alone PFEs can exist
within a single chassis, linked together through switching fabric
• On some devices, PFE functionality is emulated within a
single processor using software
Division of Duties
When Juniper Networks first entered the market, several key improvements were introduced to network hardware. In
addition to the separation of a control plane and data plane discussed earlier, we also introduced hardware-based
forwarding utilizing application-specific integrated circuits (ASICs). ASICs are designed to do specific tasks and they do them
very quickly.
Initially, individual ASICs were designed to handle each of the tasks described in the process of forwarding a packet.
Later, as technology improvements became available, newer ASICs were designed. These new ASICs added the ability to
combine several functions into a single chip, lowering power consumption and increasing throughput, while at the same time
allowing for even greater scalability than had been available with earlier ASICs.
On some devices running the Junos OS, this functionality is accomplished with software running on a single code-complete
CPU. This approach provides the same functionality for lower traffic volumes, at a price-performance balance
appropriate for the environment.
We discuss each of the available chipsets in the upcoming pages.
Data Plane-Components
All transit traffic must pass through interfaces. These interfaces can be modular or built-in. As indicated in the slide, several
names are used when referring to interfaces. We cover each of these named options in upcoming slides. For consistency, all
interfaces are identified as PICs in operational command output.
Switching boards and line cards work together to form the PFE portion of the data plane. In some cases, the functionality is
distributed across the line cards and the switching board. In other cases, PFE functionality resides only on the line card, and
multiple PFEs are linked together through a switch fabric, which resides on the switching board. In still other cases, the line
cards play no active role in the PFE and all PFE functionality resides on the switching board.
As illustrated on the slide, the names for line cards can vary with each platform, based on the PFE role performed. For
consistency, all line cards are identified as FPCs in operational command output.
Naming conventions for switching boards can also vary by platform, based on the role they play in the PFE. We discuss each
of the possible names and their differing functionality in upcoming slides.
Remember that fault isolation is only necessary to the component level. Although some ASICs can produce specific errors
that indicate which component is experiencing failure, it is not always necessary to identify where within the PFE a
breakdown occurs. It is enough to know that a particular component is faulty.
(Slide graphics: PFE block diagrams showing line cards (FPCs), interfaces (PICs) with media-specific ASICs and services PICs, the Internet Processor II, switch interfaces to other FPCs, and the L, M, N, and R chips that handle Layer 2/Layer 3 packet processing, queuing, and memory.)
Scalability-PFE on a Card
The L/M/N/R chipset divided the duties of the Buffer Manager into two chips: the N chip, which is responsible for creating
and processing the notification and results cells, and the M chip, which is responsible for distributing J-cells to memory.
The L chip is responsible for all Layer 2 and Layer 3 header checking as well as breaking the packet up into J-cells.
The R chip is responsible for performing the route lookup and making a forwarding decision.
The L/M/N/R PFE also combined all PFE functionality onto a single line card. Multiple PFEs could be used within a single
system for increased performance and throughput. The multiple PFEs are linked together through a switched fabric that
resides on the switching board. Only packets that arrive on one FPC and leave on another are required to cross the switching
board.
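The idea of breaking a packet into fixed-size cells and later reassembling it can be sketched as follows. The 64-byte cell size is an assumption made for this illustration, since the text above does not state one, and the function names are hypothetical. The sketch models only the splitting and reassembly, not notification cells or route lookups.

```python
# Conceptual sketch: split a packet into fixed-size cells and reassemble it.
# CELL_SIZE is an assumption for illustration; the text gives no cell size.

CELL_SIZE = 64


def split_into_cells(packet: bytes, cell_size: int = CELL_SIZE):
    """Return the packet as a list of cells; the last cell may be shorter."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    return [packet[i:i + cell_size] for i in range(0, len(packet), cell_size)]


def reassemble(cells) -> bytes:
    """Join cells, in order, back into the original packet."""
    return b"".join(cells)


if __name__ == "__main__":
    packet = bytes(range(200))              # a 200-byte stand-in for a packet
    cells = split_into_cells(packet)
    print("cells:", len(cells), "last cell length:", len(cells[-1]))
    print("round trip intact:", reassemble(cells) == packet)
```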
(Slide graphic: the I chip on the switching board (FEB), linked to other FEBs and to line cards (FPCs/CFPCs) whose interfaces (PICs) contain media-specific ASICs.)
Multiple switching boards can be used to increase performance and add redundancy.
(Slide graphic: the switch fabric linking line cards (IOCs, NPCs, and SPCs), each with an I chip, and interfaces with media-specific ASICs.)
Data Plane-Troubleshooting
Field-Replaceable Units
The slide highlights the topic we discuss next.
Field-Replaceable Units (1 of 2)
• Juniper Networks hardware
• Most devices running the Junos OS are made up of a
chassis, containing a midplane or backplane, and several
components that can be added to the chassis
• The removable components are called field-replaceable
units
• FRUs play a role in troubleshooting
• Smallest unit required for isolation of hardware problems
• Can be relocated within chassis or between equipment to help in
the troubleshooting process
• Remove all FRUs from a chassis before shipping it for a
return materials authorization (RMA)
Modular Architecture
Most hardware running the Junos OS is made up of a central chassis containing some form of midplane and several
components that plug into it called field-replaceable units (FRUs).
A FRU is any component of the device that can be replaced. The term does not include subcomponent hardware such as the
memory or hard disk on an RE. For instance, if the hard disk on an RE fails, the entire RE is replaced, not just the hard disk.
For this reason, it is not necessary to isolate hardware faults beyond the component level.
Because FRUs can often be added or removed from a system with minimal or no impact to the forwarding functions of the
device, offlining or removing a particular component can prove very helpful in isolating hardware failures. Additionally, if
like-equipment is available, FRUs can often be relocated to help isolate the faulty hardware component. In certain cases,
FRUs can be relocated within the same chassis to assist in hardware fault isolation.
Before sending any chassis back to Juniper Networks for a return materials authorization (RMA), be sure to remove all FRUs
from the chassis.
Field-Replaceable Units (2 of 2)
• FRUs vary by platform
• Some FRUs have an individual serial number
• Some FRUs do not have serial numbers
• Varies with platform
lab@rr�C-1> show chassis hardware
Hardware inventory:
Item             Version  Part number  Serial number  Description
Chassis                                D4897          MX80
Midplane         REV 06   711-031594   I YK898o       MX80
T640 FRUs shown on the slide: SCGs, CB, REs (under cover), air filter, SIBs, and PEMs.
Sample FRUs
The slide displays a populated T640 chassis and identifies several FRUs.
Available FRUs
Each device type has a unique list of available FRUs.
You can find a list of the FRUs available for your equipment by visiting www.juniper.net/techpubs.
Additional information on handling FRUs and the steps required to replace each FRU is also available.
Be sure to follow proper electrostatic discharge (ESD) procedures when working with hardware. Always store hardware in
appropriate ESD packaging. Failing to do so can damage hardware. Even though the damage might not be immediately
evident, any static discharge can affect the integrity of the hardware and decrease its useful life.
Also be aware that some components can be very heavy. Be prepared for the weight and use appropriate equipment to avoid
injury to yourself or the equipment.
Junos-Based Devices-Meeting Network Needs
Juniper Networks has developed a wide range of platforms to meet your networking needs. The platform families listed
below all run the Junos OS:
Multiservice routers (T Series, M Series, and J Series);
Packet transport switches (PTX);
Ethernet services routers (MX Series);
Universal access routers (ACX Series);
Mobile secure routers (LN Series);
Ethernet switches (EX Series); and
Security services gateways (SRX Series).
We discuss each of these product families in more detail on the following pages.
Although the product list provided in this course was complete at the time of publication, note that we are constantly
releasing new hardware. It takes a constant effort to always be on the leading edge! For the most current hardware
information available, visit www.juniper.net/techpubs.
per chassis
• Wide range of interfaces
• T1 to 100 Gbps Ethernet
• ATM, SONET/SDH, Ethernet,
Serialized
• Additional Services PICs
TX Matrix Plus
TX Matrix
T Series Interfaces
T Series routers provide a wide range of high speed interfaces for large networks and network applications, such as those
supported by Internet Service Providers (ISPs).
T Series Architecture
T Series routers use multiple PFEs, each using an L/M/N/R chipset for the type 3 and type 4 FPCs and the Trio chipset for
the type 5 FPCs. These PFEs are tied together through the switch fabric. All of these components are tied together through a
midplane.
Data packets are transferred across the midplane from the PFE on the originating FPC to the Switch Interface Boards (SIBs),
and from the SIBs across the midplane to the PFE on the destination FPC.
T Series Redundancy
T Series routers are designed so that no single point of failure can cause the entire system to fail. The slide outlines several
options for redundancy.
M10i
M120
M Series Interfaces
M Series routers provide a wide range of high-speed interfaces for large networks. The slide lists the available interfaces.
M40e Architecture
The M40e uses the A/B/C chipset.
M40e Redundancy
The M40e provides multiple options for redundancy including redundant REs, SFMs, PFE Clock Generators (PCGs), power
supplies, and cooling systems.
When operating with two SFMs, one is active and the other acts as a hot standby.
The A/B/C chip, which is combined within a single ASIC, resides on the CFEB or CFEB-E.
• Interfaces:
• Ethernet, Serial, ISDN, DSL, T1/E1
J6320
J Series Interfaces
All J2320, J2350, J4350, and J6350 routers ship with four fixed 10/100/1000 Ethernet ports. You can add additional
modular LAN and WAN interfaces using Physical Interface Modules (PIMs).
J Series routers provide a large selection of connectivity options including T1 and E1, Serial, Fast Ethernet, Gigabit Ethernet,
DS3, E3, ISDN, ADSL2+, and G.SHDSL.
(Slide graphic: J Series software architecture, showing the Junos OS (RE), a UNIX socket and shared memory, the fwdd-unix process (PFE), real-time threads, and frames in and out.)
RTOS
Both the RE and the PFE functions are accomplished using software processes running on a real-time operating system (RTOS).
The RTOS is a virtual architecture where CPU and memory resources are dynamically allocated to processes and real-time
threads on an as-needed basis. This virtual architecture allows available resources to be used in the most efficient manner,
adjusting as necessary.
The fwdd process emulates the control board of hardware-based devices.
PTX Series
Juniper Networks PTX Series Packet Transport Switches are designed for the converged supercore. The system is the first
supercore packet switch in the industry, and delivers powerful capabilities based on innovative silicon and forwarding
architecture that is focused on optimizing MPLS and Ethernet. PTX Series Packet Transport Switches deliver several critical
core functionalities and capabilities, including game-changing density and scalability, cost optimization, high availability, and
network simplification. They can readily adapt to today's rapidly changing traffic patterns for video, mobility and cloud-based
services.
PTX Series Packet Transport Switches are based on Juniper's patented Express chipset. Express uses state-of-the-art 40nm
fabrication technology and is built with a no packet drop assurance. The PTX Series is designed to scale up to 2 Tbps and
600 Mpps per slot and provide significant cost reduction over traditional core transport solutions.
PTX Series provides a unique combination of hardware and software features that allow service providers to manage their
supercore network more efficiently because the platforms are built from the ground up for speed, scale, and cost optimization.
They are the first supercore packet switches in the industry, and support a single chassis with 8 and 16 Tbps capacity. The
modular power design allows power efficiency in the order of 1 watt per Gbps per line rate port.
MX10
MX240
MX480
MX960
MX Series Interfaces
MX Series routers support Dense Port Concentrator (DPC) interface cards, offering enhanced queuing capabilities, QoS, L2
switching, and L3 routing services. Modular Port Concentrators (MPCs) can contain Ethernet or SONET/SDH based interfaces.
Currently the Multiservices DPC supports the following Layer 3 services: stateful firewall, NAT, intrusion detection service
(IDS), IPsec, active flow monitoring, real-time performance monitoring (RPM), and generic routing encapsulation (GRE)
tunnels (including GRE key and fragmentation).
• PFE architecture
• Trio chipset
• Data plane distribution
• Media-specific ASICs reside on PICs, MICs, and DPCs
• Media-specific functionality is included in the Trio chipset on MPCs
• Trio chipset resides on MPCs
• I chip resides on FPCs and DPCs
• Switch fabric resides on SCBs
• Component-level redundancy can include:
• Routing Engines, Switch Control Boards, power supplies,
cooling systems
MX Series Architecture
MPCs use the new Trio chipset for even greater performance and scalability. The I chip resides on FPCs and DPCs.
MX Series Redundancy
A fully configured MX Series router is designed so that no single point of failure can cause the entire system to fail. Only a
fully configured router provides complete redundancy. All other configurations provide partial redundancy. The MX Series
platforms offer redundancy for Routing Engines, Switch Control Boards, power supplies, and cooling systems.
ACX1000
ACX Interfaces
Equipped with interfaces for both time-division multiplexing (TDM) and Ethernet (1 Gbps and 10 Gbps interfaces), as well as
support for high precision clocking and synchronization, the ACX Series platforms can support the mobile network's
evolution path from 2G and 2.5G to 3G, 4G, and Long Term Evolution (LTE).
• LN Series routers
• High performance firewall and IDS
• IPsec features
• Favorable SWAP characteristics
• Designed for network access
• Military
• First responder
• Transportation vehicles
LN1000
LN Series Routers
The Juniper Networks LN1000 Mobile Secure Router is an edge access router that delivers high-performance routing,
firewall, and IDS capabilities. Packaged in the standard 4 x 6 x 0.85 inch VPX form factor, it consumes 35 watts of power or less and
weighs less than 1.5 lbs. The Space, Weight, and Power (SWAP) characteristics of the LN1000 make it ideal for customers
who require a secure and rugged network access router with a small footprint in a transportable package. The LN1000
provides the power of Juniper's hardware and Junos OS routing functionality across its 8 x 1 Gbps Ethernet interfaces.
The LN1000 addresses the growing demand for a network access presence in military, first responder and transportation
vehicles, mining and exploration equipment, unmanned aircraft, and power grids. Until now, many of these networks were
forced to leverage traditional routing and security boxes that were designed for equipment rack installations requiring forced
air or fans for cooling. These designs did not consider the SWAP requirements of mobile secure networks. These mobile, and
in some instances remote network endpoints, have a unique set of requirements that only the LN1000 can provide in a VPX
form factor.
EX4500
EX3200-48p
EX2200-24poe
EX4200-48p
EX2500
EX Series Architecture
The EX8200 line is a modular Ethernet switch with a midplane architecture, designed for ultra high-density environments such
as campus aggregation, data center, or high performance core switching environments. Switch Routing Engines (SREs)
process all Layer 2 and Layer 3 protocols and manage individual chassis components, while the switch fabric module
provides the central crossbar matrix through which all data traffic passes. The SRE and switch fabric modules work together
to fulfill all RE and switch fabric functions.
Because each model uses different components to accomplish the switching and routing functions, visit
www.juniper.net/techpubs for detailed information about your specific hardware.
EX Series Interfaces
The line cards in EX8200 line switches combine a PFE and Ethernet interfaces onto a single card. All line cards are
hot-insertable and hot-removable.
EX Series Redundancy
Several different redundancy options exist for different switches. Visit www.juniper.net/techpubs for detailed information
about your specific hardware.
• Security platforms:
• Range from 700 Mbps to 150 Gbps firewall throughput
• Range from 65 Mbps to 30 Gbps IPS throughput
• Interfaces - Ethernet, Serial, DSL, T1/E1
SRX650 SRX5800
SRX100
SRX110
• PFE architecture
• I chip
• Data plane distribution
• Media-specific ASICs reside on IOCs
• I chip resides on IOCs, NPCs, and SPCs
• Switch fabric resides on SCBs
• Component-level redundancy can include:
• Switch Control Boards, power supplies, cooling systems
• Additional redundancy available through high-availability
clustering
• PFE architecture
• RTOS
• Data plane distribution
• Media-specific ASICs reside on PIMs
• PFE components are emulated within a single processor
using software
• SRX650 and SRX550 use a Services and Routing Engine
• Component-level redundancy
• Redundancy available through high availability clustering
Summary
• In this content, we:
• Described the architectural philosophy of devices that run
the Junos OS and learned how this philosophy relates to
troubleshooting
• Described traffic processing for transit and exception traffic
• Described the function of the RE and the PFE within a device
running the Junos OS, along with the components of each
• Described FRUs
• Described current Junos product families and learned where
to go for detailed information about specific hardware
We Discussed:
The basic design architecture of devices that run the Junos OS;
Traffic processing for transit and exception traffic;
The major components of the RE and the PFE;
FRUs; and
Junos product families.
Review Questions
1. What is a FRU?
2. Where can you find a list of FRUs for a specific
Junos-based device?
3. What is the difference between hot-swappable and
hot-pluggable?
4. Why is it important to understand which
implementation of the PFE is implemented on a
particular device?
Review Questions
2.
Detailed information for all hardware running the Junos OS can be obtained at www.juniper.net/techpubs.
3.
Hot-swappable and hot-pluggable FRUs can both be added or removed without powering down the device. However, inserting or
removing hot-swappable FRUs (also referred to as hot-insertable or hot-removable FRUs) will not disrupt the global forwarding
function of the device - only services dependent on the FRU will be impacted. In contrast, inserting or removing hot-pluggable FRUs
will impact the global forwarding function of the device, even if only momentarily.
4.
It can be beneficial to understand which version of the PFE is in use and where the individual subcomponents reside when interpreting
chip-specific messages or, as you will learn later, accessing the microkernel for additional troubleshooting information.
Objectives
• After successfully completing this content, you will be
able to:
• Describe various tools that can be used to troubleshoot
devices that run the Junos operating system
• Explain JTAC recommendations for current best-practices
that facilitate troubleshooting
We Will Discuss:
Various troubleshooting tools supported by the Junos operating system; and
Juniper Networks Technical Assistance Center (JTAC) recommended configuration settings for ease of
troubleshooting.
• Troubleshooting Tools
• Best-Practices
Troubleshooting Tools
The slide lists the topics we will discuss. We discuss the highlighted topic first.
Full Commits
• Juniper Networks optimized the commit function
• Goal is to avoid disruption to processes not affected by a
configuration change
• The hidden full option affects all processes
• Forces reread of configuration, reactivating the entire
configuration
• An excellent way to restart a process that is disabled
because of thrashing
Hidden option:
[edit]
user@router# commit full
commit complete
[edit]
user@router#
Shaking It Up
Because a full commit places a processing strain on a router with a complex configuration, you should perform a full commit
only when conditions warrant.
Hardware Restart
• You can restart FPCs and PICs or bring them offline or
online using the CLI:
user@router> request chassis ?
Possible completions:
cfeb Change Compact Forwarding Engine Board status
pic Change Physical Interface Card status
routing-engine Change Routing Engine status
Hardware Restart
The slide shows how you can use the Junos CLI to take a Compact Forwarding Engine Board (CFEB) (in some models),
Flexible PIC Concentrator (FPC), or PIC offline and online. In some cases, you can clear problems by bouncing a piece of
hardware, which means taking the device offline and then bringing it back online again.
The commands shown on the slide have the same effect as pressing the CFEB offline button on the physical router
to bring it offline.
user@router> telnet ?
Possible completions:
<host> Hostname or address of remote host
8bit Use 8-bit data path
bypass-routing Bypass routing table, use specified interface
inet Force telnet to IPv4 destination
inet6 Force telnet to IPv6 destination
interface Name of interface for outgoing traffic
logical-system Name of logical system
no-resolve Don't attempt to print addresses symbolically
port Port number or service name on remote host
routing-instance Name of routing instance for telnet session
source Source address to use in telnet connection
routing-instance: This option supports VPN and routing instance context for applications like Telnet and
FTP. A classic use would be to establish a Telnet connection from a provider edge (PE) router to an attached
customer edge (CE) device, which, being part of a VPN, would reside in a specific routing table and instance.
source: As with ping, altering the source address used in a connection request might uncover problems with
routing that prevent connection establishment when sourcing traffic from the egress interface (the default).
Monitor Traffic
• The monitor traffic command provides CLI
access to the tcpdump utility
• Displays traffic only originating or terminating on the local
Routing Engine
user@router> monitor traffic interface se-1/0/0 detail
Address resolution is ON. Use <no-resolve> to avoid any reverse lookup delay.
Address resolution timeout is 4s.
Listening on se-1/0/0, capture size 1514 bytes
Monitor Traffic
The monitor traffic command provides CLI-based access to the tcpdump utility. This command monitors only traffic
originating or terminating on the local Routing Engine. This capability is the best way to monitor and diagnose problems at
Layer 2 with the Junos OS because tracing, which is similar to debug on equipment from other vendors, does not function for
Layer 2 protocols. We cover tracing on subsequent pages that deal with system logging.
Note that protocol filtering functions (for example, matching on only User Datagram Protocol (UDP) traffic sent from a
specific port) are currently not supported for real-time monitoring because in real-time mode, the Layer 2 headers are
stripped at ingress, which prevents filtering on protocol types. As a workaround, you can write the monitored traffic to a file
using the hidden write-file and read-file options and then read the file with a tcpdump-capable application like
Wireshark.
[Figure: craft interface alarm indicators. Green and red LEDs are shown; the red LED indicates that a red alarm is active.
Sample LCD screen text: "Host", "1 Alarm active", "R: Supply A FAIL".]
Syslog
• Syslog:
• Standard UNIX syslog configuration syntax
• Primary syslog file is /var/log/messages
• Most processes also write to individual log files
• Supports numerous facilities and severity levels
• The facility defines the class of log message, whereas the severity
level determines the level of logging detail
• Local and remote syslog support
• We recommend remote logging (and archiving) for troubleshooting
Syslog
Syslog operations use a UNIX syslog-style mechanism to record system-wide, high-level operations, such as interfaces going
up or down or users logging in to or out of the router. You configure these operations by using the syslog statement at the
[edit system] hierarchy level and the options statement at the [edit routing-options] hierarchy level.
The results of tracing and logging operations go in files the router stores in the /var/log directory. You use the show log
file-name command to display the contents of these files.
Process name or PID: The name of the process (or the process ID [PID] when a name is not available) that
generated the log entry.
message-code: A code that identifies the general nature and purpose of the message. In the example shown,
the message code is CHASSISD_FRU_EVENT.
message-text: Additional information related to the message code.
When you add the explicit-priority statement, the syslog message format alters to include a numeric priority value.
In this case, the value 0 is for the most significant and urgent messages (emergency), while 7 denotes debug-level messages.
Consult the System Log Messages Reference documentation for a full description of the various message codes and their
meanings, or better yet, use the CLI's help function to obtain this information.
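The numeric priority values follow the standard syslog severity scale, from 0 (emergency) through 7 (debug). As an illustration only, the following Python sketch maps a numeric value to its standard severity name. The function name severity_name is hypothetical and is not part of the Junos OS.

```python
# Standard syslog severity names, indexed by numeric value (0 = most urgent).
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "info", "debug",
]


def severity_name(level):
    """Return the standard syslog severity name for a numeric level 0-7."""
    if not 0 <= level <= 7:
        raise ValueError("severity must be in the range 0-7, got %r" % level)
    return SEVERITIES[level]


print(severity_name(0))  # emergency
print(severity_name(3))  # error
print(severity_name(7))  # debug
```

A lower number always means a more urgent message, so filtering on "3 or lower" selects error, critical, alert, and emergency messages.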
Tracing
• Tracing decodes protocol packets and certain router
events:
• Some other vendors refer to tracing as debug
• Tracing operations include:
• Global routing behavior
• Router interfaces
• Protocol-specific information
Tracing Operations
Tracing operations allow you to monitor the operation of routing protocols by decoding the sent and received routing protocol
packets. In many ways, tracing is synonymous with the debug function on equipment made by other vendors. Note that
because of the design of hardware-based Juniper Networks platforms, you can enable reasonably detailed tracing in a
production network without negative impact on overall performance or packet forwarding.
Tracing Overview
• Tracing is the Junos OS equivalent of debug
• You can enable tracing on a production network
• Requires configuration
• Can trace multiple options (flags) to a single file
• Generic tracing configuration syntax:
[edit protocols protocol-name]
user@router# show
traceoptions {
    file filename [size size] [files number]
        [world-readable | no-world-readable];
    flag flag [flag-modifier] [disable];
}
• protocol-name: The protocol or function being traced
• file: Where to write the trace results
• flag: Flags identify what aspects of the protocol the software traces and at what level of detail
Protocol Tracing
• Include the traceoptions statement at the [edit
protocols protocol-name] hierarchy
• Useful when troubleshooting configuration and
interoperability problems
• Search for Baseline Options at
www.juniper.net/techpubs/software/nog for
protocol-specific traceoptions setup
Protocol Tracing
You trace the operations of a specific protocol by including the traceoptions statement at the [edit protocols
protocol-name] hierarchy. In most cases you should be selective in what you trace because selecting the all keyword
can overwhelm you with endless lines of text.
Visit www.juniper.net/techpubs/software/nog and search for Baseline Operations Guide, then Search Log
Messages, then Track Error Conditions for a complete list of protocol-specific traceoptions setup flags.
Sample Output
The sample OSPF stanza on the slide reflects a typical tracing configuration that provides details about important events like
hello message or OSPF link-state advertisement (LSA) details. In most cases you should use the detail option with a given
protocol flag for the added information often needed in troubleshooting scenarios. Search for baseline options at
http://www.juniper.net/techpubs/software/nog for protocol-specific options.
The slide shows a sampling of the results obtained with the tracing configuration. As with any log file, enter show log
trace-file-name to view the decoded protocol entries. The sample trace output reflects the receipt of an OSPF hello
message from 10.222.100.1 and goes on to show some of the hello protocol parameters.
Stopping Tracing
• To stop a tracing operation, delete a trace flag or the
entire stanza:
[edit protocols ospf traceoptions]
user@router# delete flag hello
Core Files
• Modern computing environments are complex and,
therefore, have complex bugs
• Transient software failures are extremely hard to reproduce
and, therefore, difficult to fix
• Hardware errors can also trigger software failures
• Well-written code dumps a core file for diagnostic analysis
when a fatal fault (panic) occurs
• The stack trace identifies the name of the offending process,
memory pointers, and register data at the time of the fault
• In the Junos OS numerous entities can dump a core at panic
or upon command
• The kernel, software processes, and embedded hosts in the data
plane
Forcing Cores
• Forcing a running process to write a core can help
diagnose certain problems
• Use the hidden CLI command request system core
dump to force a core dump
• Use with caution! The software creates a copy of the running
process; this copy can result in excessive memory paging if the
memory footprint of the process is large
• JTAC might direct you to force a core from the shell
• The default behavior can be modified to suspend the process
during core writing
• Uses less memory, but process suspension can lead to other
problems
• Troubleshooting Tools
→ Best-Practices
Best-Practices
The slide highlights the topic we discuss next.
Best-Practices
• Take the following best-practice steps before a
problem occurs
• Set up an out-of-band management network
• Set up system logging for remote logging
• Set up clock synchronization
• Establish a baseline for reference
Recommended Best-Practices
We recommend several best-practices where network resources and topology allow. We cover each of these topics in more
detail on subsequent slides.
system {
    backup-router 10.210.15.254 destination 10.210.15.0/24;
    services {
        ftp;
        ssh;
        telnet;
    }
}
routing-options {
    static {
        route 10.210.15.0/26 {
            next-hop 10.210.15.254;
            no-readvertise;
        }
    }
}

file errors {
    any error;
    explicit-priority;
}
Note: (*) indicates sample factory-default settings (hardware-dependent)
Clock Synchronization
[Figure: the NMS sends a GetRequest to the SNMP agent on the device running the Junos OS; the agent returns a Response.]
SNMP Configuration
[edit snmp]
user@router# show
description "My Junos OS Device";
location "123 Main Street - Rack 4";
contact "John Doe - x1865";
community myManagedDevices {
    authorization read-only;
    clients {
        10.210.15.0/24;
    }
}
trap-group my-trap-group {
    version v2;
    categories {
        chassis;
        link;
    }
    targets {
        10.210.14.173;
    }
}
• description, location, contact: Device contact information
• community: Defining an SNMP community is the minimum SNMP configuration
• authorization read-only: Default authorization
• clients: SNMP requests limited to the 10.210.15/24 subnet; can also restrict to an interface
• trap-group: Sends SNMPv2 notifications regarding link or chassis events
• targets: Defines the NMS for trap delivery
Summary
• In this content, we:
• Described various tools that can be used to troubleshoot
devices that run the Junos OS
• Explained JTAC recommendations for current best-practices
that facilitate troubleshooting
We Discussed:
Various troubleshooting tools supported by the Junos OS; and
JTAC recommended configuration settings for ease of troubleshooting.
Review Questions
Review Questions
1.
The operational mode command show chassis environment would display information about the local chassis environment.
2.
The monitor traffic command monitors real-time traffic going to and from the control plane. If no interface is specified, it
monitors the control traffic going over fxp0.
3.
SNMP is an Application Layer protocol designed to monitor and manage TCP/IP network devices. An NMS requests specific
information from an SNMP agent running on the managed device. The agent can also initiate alerts to send to the NMS.
Objectives
• After successfully completing this content, you will be
able to:
• Describe the key commands and features used to monitor
storage and memory issues
• Describe the key commands and features that you can use
to monitor software installations
• Determine how to find potential hardware problems using
system logs
• Describe the key commands that you can use to monitor
hardware and environmental issues
We Will Discuss:
The commands and features used to monitor storage and memory issues;
The commands and features that you can use to monitor software installations;
Finding potential hardware problems using system logs; and
The commands that you can use to monitor hardware and environmental issues.
Background: Displays the percentage of CPU utilization being used by background processes;
Kernel: Displays the percentage of CPU utilization being used by kernel processes;
Interrupt: Displays the percentage of CPU utilization being used by interrupt processes; and
Idle: Displays the percentage of idle CPU utilization.
Model: Displays the RE model.
Serial ID: Serial identification number of the RE.
Start time: Displays the time at which the RE started running.
Uptime: Displays how long the RE has been running.
Load averages: Displays the RE load averages for the last 1, 5, and 15 minutes.
/var/run
2.0K /var/run/ext
/var/run/db
2.0K /var/run/db/private
2.0K /var/run/named
2.0K /var/run/ppp
/var/run/scripts
2.0K /var/run/scripts/commit
2.0K /var/run/scripts/event
2.0K /var/run/scripts/op
18K /var/run/scripts/import
2.0K /var/run/scripts/lib
/var/sw
823M /var/sw/pkg
/var/tmp
126K /var/tmp/gres-tp
Directory Usage
The slide shows the output of the show system directory-usage command. By specifying a particular directory as a
modifier to this command, you can determine how much storage space is being used by each of the underlying directories.
The output above shows no errors. A test that results in errors would appear as
follows:
Feb 20 12:24:15 kernel: da0: FAILURE - READ status=51<READY,DSC,ERROR>
error=10<NID_NOT_FOUND> LBA=18446
dd Utility
The integrity of a storage device is determined by escaping to a root shell and using the dd utility to confirm that all blocks can
be read. This approach is typically used to test the compact flash, but it can also be used on the hard drive and RE memory by
specifying the correct device and switches. The following example shows a compact-flash test (device ad0 in this example) that
fails with a read error:
root@router% dd if=/dev/rad0 of=/dev/null bs=4k
ad0: HARD READ ERROR blk# 65600 status=59 error=40
ad0 removed from the Boot List
dd: /dev/rad0: Input/output error
8200+0 records in
8200+0 records out
33587200 bytes transferred in 25.538337 secs (1315168 bytes/sec)
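The figures in the dd output can be cross-checked with simple arithmetic. The following Python sketch is an illustration only, and the variable names are ours. It confirms that 8200 records at a 4 KB block size account for the 33587200 bytes reported. It also shows that, assuming the blk# value in the kernel error counts 512-byte sectors, block 65600 corresponds to the same byte offset, which is the point at which the read stopped.

```python
BLOCK_SIZE = 4 * 1024       # bs=4k on the dd command line
RECORDS_IN = 8200           # "8200+0 records in"
ELAPSED_SECS = 25.538337    # elapsed time reported by dd
FAILED_BLK = 65600          # blk# from the HARD READ ERROR line
SECTOR_SIZE = 512           # assumption: blk# counts 512-byte sectors

bytes_read = RECORDS_IN * BLOCK_SIZE
failed_offset = FAILED_BLK * SECTOR_SIZE
rate = bytes_read / ELAPSED_SECS

print(bytes_read)                   # 33587200
print(failed_offset == bytes_read)  # True
print(round(rate))                  # 1315168
```

If the sector-size assumption holds, the byte count at which dd stops gives a quick estimate of where on the device the first unreadable block lies.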
Note that as a result of the read error, the compact-flash device has automatically been removed from the list of available boot
devices. You can use the sysctl -a command to display the current list of boot devices:
root@router% sysctl -a | grep bootdevs
machdep.bootdevs: pcmcia-flash,disk,lan
In this example you can see that the compact-flash device is no longer in the boot list. If you believe that a boot device has been
incorrectly removed from the boot list, or that the error condition has been resolved, you might have to manually add that device
back into the boot listing. Note that this addition automatically occurs when you reinstall the Junos OS from removable media.
To manually add a device back to the boot list, use the sysctl -w command at a root shell:
root@router% sysctl -w machdep.bootdevs=pcmcia-flash,compact-flash,disk,lan
machdep.bootdevs: pcmcia-flash,disk,lan -> pcmcia-flash,compact-flash,disk,lan
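The boot list is a comma-separated, ordered string, so restoring a device means inserting it at the correct position and leaving the rest of the order intact. The following Python sketch illustrates that string manipulation only; the helper name restore_boot_device is hypothetical, and the sketch does not run sysctl or change any device.

```python
def restore_boot_device(bootdevs, device, after):
    """Return the boot list with `device` re-inserted right after `after`."""
    devs = bootdevs.split(",")
    if device in devs:
        return bootdevs  # already present, nothing to change
    devs.insert(devs.index(after) + 1, device)
    return ",".join(devs)


current = "pcmcia-flash,disk,lan"  # value reported by sysctl in the example
print("compact-flash" in current.split(","))  # False
print(restore_boot_device(current, "compact-flash", after="pcmcia-flash"))
# pcmcia-flash,compact-flash,disk,lan
```

The resulting string matches the value passed to sysctl -w in the example above.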
Chassis Alarms
You can also determine whether smartd has detected any problems by using the show chassis alarms command:
lab@sneaky-re0> show chassis alarms
1 alarms currently active
Alarm time Class Description
2010-10-13 13:04:23 PDT Minor Host 0 hard-disk drive error
Boot Monitoring
The slide highlights the topic we discuss next.
[Figure: boot sequence flowchart. Boot media shown: solid-state flash disk, then secondary boot media (rotating or SSD
hard disk, or solid-state flash disk). Notes on the figure: hardware controlled; software notifies hardware.]
da1: <ATP ATP IG eUSB SSD 1100> Fixed Direct Access SCSI-0 device
da1: 40.000MB/s transfers
da1: 3920MB (8028160 512 byte sectors: 255H 63S/T 499C)
da0 at umass-sim0 bus 0 target 0 lun 0
da0: <ATP ATP IG eUSB SSD 1100> Fixed Direct Access SCSI-0 device
da0: 40.000MB/s transfers
da0: 3920MB (8028160 512 byte sectors: 255H 63S/T 499C)
Trying to mount root from ufs:/dev/da0s1a
Comment:
JUNOS Base OS Software Suite [12.2R2.5]
Comment:
JUNOS Crypto Software Suite [12.2R2.5]
Comment:
JUNOS Online Documentation [12.2R2.5]
Comment:
JUNOS Kernel Software Suite [12.2R2.5]
Installed Software
The slide shows the use of the show system software command which displays the details of the currently installed
version of the Junos OS.
Comment:
JUNOS Base OS Software Suite [12.2R2.5]
Comment:
JUNOS Crypto Software Suite [12.2R2.5]
Comment:
JUNOS Online Documentation [12.2R2.5]
Comment:
JUNOS Kernel Software Suite [12.2R2.5]
Preparation
Before you install the Junos OS, you must perform the following steps:
1. You should have console access to the router so that you can observe installation messages, and so that you can
log in (as root) after the installation. Note that a factory installation supports root logins from the console port only.
2. If you plan to use the existing configuration after software reinstallation, you must take steps to copy the existing
configuration to a remote location. The active configuration file is /config/juniper.conf. This file can be
transferred using FTP to a safe location. Alternatively, you can display the current configuration for copying into a
terminal emulation buffer where it can be pasted into a word-processing program and saved as a text file.
3. Ensure that you have a Juniper Networks installation PCMCIA card or USB drive with the desired software image.
• See the Juniper Networks website for syntax used with other
software releases or media types
Backup Options
In the event of a failure on the flash drive, the router can boot from the hard drive. It is possible to have one version of the Junos
OS on the removable media and another version of the Junos OS on the hard drive. What if you want to ensure that the flash
drive and hard drive versions of the Junos OS are exactly the same?
You should back up software before you upgrade the Junos OS. Or, after you upgrade the software on the router and are
satisfied that the new packages are successfully installed and running, you should consider issuing the request system
snapshot command to back up the software onto the /altroot and /altconfig file systems, located on the router's hard drive.
Specifically, the root file system (/) is backed up to /altroot, and /config is backed up to /altconfig. Normally, the root
and /config file systems are on the router's flash drive, and the /altroot and /altconfig file systems are on the router's
hard drive.
Disk Mirroring
You can direct the hard drive to mirror the contents of the compact flash automatically. When you issue the
mirror-flash-on-disk statement at the [edit system] hierarchy, the hard drive maintains a synchronized mirror copy
of the compact-flash contents. Data written to the compact flash is simultaneously updated in the mirrored copy of the hard
drive. If the flash drive fails to read data, the hard drive automatically retrieves its mirrored copy of the flash disk.
We recommend that you disable flash disk mirroring when you upgrade or downgrade the router. You cannot issue the
request system snapshot command when you enable flash disk mirroring. After you have enabled or disabled the
mirror-flash-on-disk statement, you must reboot the router for your changes to take effect. To reboot, issue the
request system reboot command.
Feb 2 21:28:12 Bangkok-re1 chassisd[4446]: CHASSISD_BLOWERS_SPEED_FULL: Fans and
impellers being set to full speed [fan/blower missing/failed]
chassisd Logs
• The chassis daemon (chassisd) maintains log
entries as chassis-related events occur
user@mx240> show log chassisd
Sep 21 18:43:43
user@mx240> help syslog CHASSISD_FRU_EVENT
Name: CHASSISD_FRU_EVENT
Message: <function-name>: <state> <fru-name> <fru-slot>
Help: FRU changed state
Description: The state of the indicated component (field-replaceable unit, or FRU) changed as
indicated.
Type: Event: This message reports an event, not an error
You can also view Junos OS technical documentation to determine chassisd message
definitions at: http://www.juniper.net/techpubs/software/junos/junosmsyslog-messages/
chassisd-system-log-messages.html
• Craft interface: Front panel alarm LEDs (red and green)
CPU Temperature
In addition to the ambient temperature surrounding the system components, you can see the actual CPU temperature of the RE.
Temperature Thresholds
As temperatures rise within the chassis of a router running the Junos OS, the router will begin to protect itself by increasing
fan speeds or alerting you of the higher temperatures using a yellow or red alarm. The slide shows the temperature threshold
settings for an MX240 router. Note that if a component reaches the fire shutdown temperature threshold, the router shuts
down to stop the component from becoming damaged.
[edit]
user@mx240# set chassis alarm ?
Possible completions:
+ apply-groups Groups from which to inherit configuration data
+ apply-groups-except Don't inherit configuration data from these groups
> ds1 DS1 alarms
> ethernet Ethernet alarms
> integrated-services Integrated services alarms
> management-ethernet Management Ethernet alarms
> serial Serial alarms
> services Services PIC alarms
> sonet SONET alarms
> t3 DS3 alarms
[edit]
user@mx240# set chassis alarm ethernet link-down red
• show chassis...
• fan: displays the current fan speeds
• fpc: displays status of FPCs and PICs
• tfeb: displays status of control board
• hardware: shows installed hardware part and serial
numbers
• firmware: shows installed firmware
• show system...
• uptime: displays current time and uptime
• reboot: displays any scheduled reboots
Other Commands
The slide shows some other useful commands to display information about your router's hardware.
Summary
• In this content, we:
• Described the key commands and features used to monitor
storage and memory issues
• Described the key commands and features that you can use
to monitor software installations
• Determined how to find potential hardware problems using
system logs
• Described the key commands that you can use to monitor
hardware and environmental issues
We Discussed:
The commands and features used to monitor storage and memory issues;
The commands and features that you can use to monitor software installations;
Finding potential hardware problems using system logs; and
The commands that you can use to monitor hardware and environmental issues.
Review Questions
1. How do you force a router to boot from rotating
media?
2. Describe ways in which you can troubleshoot
Junos platforms using visual indicators.
3. List three ways that you can use the Junos CLI to
assist in hardware troubleshooting.
4. Describe three ways of determining whether any
chassis alarms are present.
5. What CLI command searches the messages file for
all lines matching fail or error?
Review Questions
1.
2.
When standing near a router running the Junos OS, you should be able to view the LEDs on the craft interfaces as well as the PEMs to
determine some indication of hardware status.
3.
You can use the Junos CLI to assist in hardware troubleshooting by issuing the show system, show chassis, and show log commands.
4.
To determine whether there are any chassis alarms, you can look at the craft interface LCD, the alarm LEDs, or issue the show chassis
alarms command.
5.
To search the messages log file for all lines matching fail or error, issue the command show log messages | match
"fail|error".
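The pattern in answer 5 is a regular expression in which the vertical bar means "or", so a line is displayed if it contains either word. The following Python sketch is an illustration only: it applies the same alternation to a few hypothetical log lines. It assumes that matching is case-insensitive (hence re.IGNORECASE); verify that behavior on your own software release.

```python
import re

# Same alternation as in: show log messages | match "fail|error"
# Assumption: matching is case-insensitive, hence re.IGNORECASE.
PATTERN = re.compile(r"fail|error", re.IGNORECASE)


def match_lines(lines):
    """Return only the lines that contain 'fail' or 'error'."""
    return [line for line in lines if PATTERN.search(line)]


# Hypothetical sample lines, for illustration only.
sample = [
    "Feb 20 12:24:15 kernel: da0: FAILURE - READ status=51",
    "Feb 20 12:24:16 mgd[100]: UI_COMMIT: user 'lab' performed commit",
    "Feb 20 12:24:17 chassisd[200]: fan tray error detected",
]

for line in match_lines(sample):
    print(line)
```

Only the first and third sample lines are printed; the commit message contains neither word and is filtered out.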
Objectives
We Will Discuss:
The monitoring and troubleshooting of control plane system processes;
A logical approach to troubleshooting routing issues; and
The monitoring and troubleshooting of basic bridging and Address Resolution Protocol (ARP) functionality.
The Control Plane Hosts the Brains of the Junos Operating System
When discussing the control plane of a device running the Junos operating system, the discussion revolves around the
Routing Engine (RE). The RE acts as the brains of a Junos device.
The RE runs various protocol and management software processes that reside inside a protected memory environment. The RE
is based on an x86 or PowerPC architecture that hosts flash memory and/or a hard disk drive, depending on the specific
platform running the Junos OS.
The RE maintains the routing tables, bridging table, and primary forwarding table and connects to the Packet Forwarding Engine
(PFE) through an internal link. It handles all protocol processes in addition to other software processes that control the device's
interfaces, the chassis components, system management, and user access to the device. These software processes run on top
of the Junos kernel, which interacts with the data plane. The software directs all protocol traffic from the network to the RE for
the required processing.
The RE provides the command-line interface (CLI) in addition to the J-Web graphical user interface (GUI). These user interfaces
run on top of the Junos kernel and provide user access and control of the device. The RE controls the data plane by providing
accurate, up-to-date Layer 2 and Layer 3 forwarding tables and by downloading microcode and managing software processes
that reside in the data plane's microcode. The RE receives hardware and environmental status messages from the data plane
and acts upon them as appropriate.
[Figure: control plane architecture. Labels: Routing Tables, Routing Protocol Process, Interface Process, Chassis Process,
Forwarding Table, Layer 2 Bridging Table, Layer 2 Protocol Process, Kernel (Operating System).]
The Kernel
The Junos kernel provides the underlying infrastructure for all the Junos processes. It is responsible for scheduling and device
control. In addition, the kernel provides the link between the routing and switching tables and the RE's forwarding table. It is
responsible for all communication with the data plane, which includes keeping the PFE's copy of the forwarding table
synchronized with the master copy in the RE.
System Processes
• Processes in the user space interact with the kernel
and are often called daemons
• The Junos OS runs a variety of daemons:
Junos Processes
Processes in the Junos OS run as daemons, or programs, in the background of the operating system. These processes run in
the user space of the operating system. A typical operating system is comprised of the user space and the kernel space. The
kernel space is reserved for kernel operations. Both spaces run in separate memory allocations.
Key Daemons
The show system processes command displays the processes running on the RE in a manner similar to a ps -ax
listing at a shell prompt. You can use this command to confirm that a given daemon (or process) is running, and to determine
what process ID (PID) it was assigned. In the Junos OS, the init process is a meta-daemon that starts, monitors, and, if
needed, restarts other daemons. The routing protocol process (rpd), chassis control daemon (chassisd), and the device
control daemon (dcd) are some of the key processes in the Junos OS. The following output shows a list of processes running
on an MX Series 3D Universal Edge Router, including the process name, PID, raw CPU usage, and memory allocation. The
output has been trimmed for brevity.
user@mx> show system processes extensive | no-more
last pid: 32186; load averages: 0.00, 0.00, 0.00 up 74+17:38:15 15:55:30
119 processes: 2 running, 89 sleeping, 28 waiting
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
11 root 1 171 52 0K 16K RUN 1773.8 98.05% idle
13 root 1 -20 -139 0K 16K WAIT 594:38 0.00% swi7: clock
1114 root 1 96 0 29872K 11624K select 137:25 0.00% chassisd
1268 root 3 20 0 42508K 12680K sigwai 85:35 0.00% jpppd
1271 root 1 96 0 10920K 4676K select 48:47 0.00% jdiameterd
1276 root 2 96 0 18828K 7832K select 32:34 0.00% pfed
1111 root 1 96 0 1968K 808K select 30:27 0.00% bslockd
1278 root 1 96 0 15856K 11140K select 17:43 0.00% snmpd
1132 root 1 96 0 3656K 936K select 15:43 0.00% license-check
1115 root 1 96 0 5440K 1976K select 15:09 0.00% alarmd
12 root 1 -40 -159 0K 16K WAIT 12:39 0.00% swi2: net
28 root 1 -68 -187 0K 16K WAIT 12:36 0.00% irq36: tsec1
23 root 1 -52 -171 0K 16K WAIT 10:20 0.00% irq43: i2c0 i2c1
1275 root 1 96 0 4756K 1980K select 8:47 0.00% irsd
56 root 1 12 0 0K 16K - 7:49 0.00% schedcpu
1265 root 1 96 0 5932K 2736K select 7:37 0.00% cfmd
15 root 1 -16 0 0K 16K - 7:33 0.00% yarrow
1279 root 1 96 0 30040K 8684K select 6:35 0.00% dcd
2 root 1 -8 0 0K 16K - 6:19 0.00% g_event
44 root 1 20 0 0K 16K syncer 5:43 0.00% syncer
43 root 1 20 0 0K 16K vnlrum 5:01 0.00% vnlru_mem
27 root 1 -68 -187 0K 16K WAIT 4:37 0.00% irq35: tsec1
3 root 1 -8 0 0K 16K - 4:36 0.00% g_up
4 root 1 -8 0 0K 16K - 4:32 0.00% g_down
48 root 1 -16 0 0K 16K psleep 4:09 0.00% vmkmemdaemon
1128 root 1 96 0 7568K 3200K select 3:15 0.00% bfdd
28864 root 1 96 0 18328K 8384K select 3:09 0.00% l2ald
1264 root 1 96 0 5108K 2144K select 2:58 0.00% lfmd
1277 root 96 0 12396K 6736K select 2:34 0.00% mib2d
5122 lab 1 96 0 22032K 13168K select 2:08 0.00% cli
1124 root 1 96 0 6420K 3260K select 1:56 0.00% ppmd
1261 root 1 96 0 72636K 61272K select 1:55 0.00% dfcd
1129 root 1 96 0 8900K 3192K select 1:53 0.00% lacpd
1167 root 1 96 0 19068K 6712K select 1:38 0.00% cosd
1272 root 1 4 0 7728K 3792K kqread 1:32 0.00% mcsnoopd
9 root 1 171 52 0K 16K pgzero 1:31 0.00% pagezero
1166 root 1 96 0 6880K 2804K select 1:31 0.00% ilmid
1116 root 1 96 0 6604K 1888K select 1:31 0.00% craftd
46 root 1 -16 0 0K 16K sdflus 1:25 0.00% softdepflush
1130 root 1 96 0 12812K 3372K select 1:21 0.00% bdbrepd
20755 root 1 96 0 2452K 2300K select 1:16 0.00% ntpd
1269 root 1 96 0 4616K 1680K select 0:58 0.00% iccpd
42 root 1 -16 0 0K 16K psleep 0:43 0.00% bufdaemon
28868 root 1 4 0 12260K 5440K kqread 0:37 0.00% l2cpd
45 root 1 -4 0 0K 16K vlruwt 0:37 0.00% vnlru
27887 root 1 4 0 41652K 12444K kqread 0:33 0.00% rpd
50 root 1 -16 0 0K 16K psleep 0:33 0.00% vmuncachedaemon
20763 root 1 96 0 7808K 3564K select 0:26 0.00% rmopd
1280 root 1 96 0 19980K 8552K select 0:25 0.00% dfwd
1117 root 1 96 0 37852K 17516K select 0:20 0.00% mgd
Mem: 306M Active, 34M Inact, 62M Wired, 139M Cache, 112M Buf, 1448M Free
Swap: 2915M Total, 2915M Free
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
11 root 1 171 52 0K 16K RUN 1328.7 98.05% idle
Hint: Use show task memory detail over time to identify memory leaks.
You can monitor the overall state of system processes with the show system processes summary command as
shown on the slide. The output of the command lists the last PID to start, CPU load averages over time, total processes,
process states and memory allocations. Note that the memory usage displayed represents allocated memory, rather than
actual memory usage.
Rarely, a Junos device might experience a memory leak. Although this issue and its resolution are usually handled by JTAC,
you might be asked to provide several outputs of show task memory detail captured at intervals over time. This command
provides memory utilization per process, and capturing it over time allows JTAC to examine which process or processes might
be consuming memory.
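The value of capturing output at intervals is that a leak appears as steady growth between samples rather than as a single large number. The following Python sketch illustrates that comparison only. The function name growing_processes and the sample values are hypothetical; real data would come from memory figures noted in repeated captures.

```python
def growing_processes(snapshots):
    """Return names whose memory grew in every successive snapshot.

    snapshots: list of dicts mapping a process name to its memory use in KB,
    ordered from oldest to newest.
    """
    if len(snapshots) < 2:
        return []
    # Only compare processes that are present in every snapshot.
    names = set(snapshots[0])
    for snap in snapshots[1:]:
        names &= set(snap)
    suspects = []
    for name in sorted(names):
        values = [snap[name] for snap in snapshots]
        if all(later > earlier for earlier, later in zip(values, values[1:])):
            suspects.append(name)
    return suspects


# Hypothetical samples (KB), for illustration only.
samples = [
    {"rpd": 12444, "chassisd": 11624, "snmpd": 11140},
    {"rpd": 13890, "chassisd": 11624, "snmpd": 11140},
    {"rpd": 15320, "chassisd": 11630, "snmpd": 11140},
]
print(growing_processes(samples))  # ['rpd']
```

A process that grows in every sample is only a candidate for further analysis; normal activity, such as learning more routes, can also increase memory use.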
System Connections
The show system connections command shows open IP sockets on the Junos device. It is equivalent to the UNIX shell
command netstat -a and also displays open ports associated with the connection.
You can use the /etc/services file to determine service-to-port mappings:
user@mx> file show /etc/services | match 22
ssh 22/tcp #Secure Shell Login
ssh 22/udp #Secure Shell Login
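Each line in the services file has the form name, then port/protocol, then an optional comment that starts with #. The following Python sketch is an illustration only: it parses lines of that form, such as the two lines returned above, into structured entries. The function name parse_services is hypothetical.

```python
def parse_services(lines):
    """Parse /etc/services-style lines into (name, port, protocol) tuples."""
    entries = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        fields = line.split()
        name, port_proto = fields[0], fields[1]
        port, proto = port_proto.split("/")
        entries.append((name, int(port), proto))
    return entries


output = [
    "ssh 22/tcp #Secure Shell Login",
    "ssh 22/udp #Secure Shell Login",
]
print(parse_services(output))
# [('ssh', 22, 'tcp'), ('ssh', 22, 'udp')]
```

Note that match 22 is a text match, so on a full services file it also returns any line that merely contains the characters 22, such as port 222 or 2200.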
System Alarms
Two types of alarms exist on a Junos device: chassis alarms and system alarms. Although chassis alarms are more common
and pertain to a wide variety of chassis alarm conditions, system alarms are reserved for licensing issues and the absence
of a rescue configuration.
User Processes
• Log out of the Junos device gracefully to prevent hung
user sessions:
user@router> show system users
9:46PM up 56 days, 23:25, 2 users, load averages: 0.41, 0.18, 0.07
USER TTY FROM LOGIN@ IDLE WHAT
lab u0 21Sep10 - -cli (cli)
lab p0 10.210.15.30 9:45PM - -cli (cli)
User Processes
Each time a user logs in to the Junos OS, a new process associated with a PID is created just as with other system processes.
User processes can be viewed with the show system users command as shown on the slide. A teletype (tty) session with
the character "u" represents a console type of session. A tty with the character "p" represents a remote Telnet or SSH type of
session.
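The tty rule described above is simple enough to express as a small helper. This Python sketch is only an illustration of that rule; the function name and the returned labels are assumptions made for the example.

```python
def session_type(tty: str) -> str:
    """Classify a login by the first character of its tty name."""
    if tty.startswith("u"):
        return "console"
    if tty.startswith("p"):
        return "remote (Telnet or SSH)"
    return "unknown"


for tty in ("u0", "p0"):
    print(tty, "->", session_type(tty))
```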
• Core functions:
• Controls routing protocols running on router
• Starts all configured protocols
• Handles all routing messages
• Maintains routing tables
• Implements routing policy
• Maintains its own scheduler
• Prioritizes and switches between routing tasks
[Figure: the routing protocol process, its routing tables, and the Junos kernel]
The Routing Protocol System Process
The routing protocol daemon controls the routing protocols running on the router. It starts all configured routing protocols and
handles all routing messages. It also maintains one or more routing tables, which are also called Routing Information Bases
(RIBs). These tables consolidate the routing information learned from various routing protocols into a common table.
The routing protocol process determines the active routes to network destinations and installs these routes into the RE's
forwarding table, also called the forwarding information base (FIB). Finally, it implements routing policy, which allows you to
control the routing information that is transferred between the routing protocols and the routing table. Using routing policy, you
can filter routing information or modify attributes associated with the routes, such as adding or removing BGP communities.
The Junos OS implements unicast and multicast IP routing functionality for IP version 4 (IPv4) and IP version 6 (IPv6) and also
supports MPLS signaling and switching.
Task Accounting
The best way to troubleshoot scheduler slips is to temporarily enable task accounting. To enable task accounting, issue the
hidden operational mode command set task accounting on. Because task accounting adds a significant processing
burden to a system that is evidently already busy (hence the slips), take care to turn accounting
off after a few minutes with the set task accounting off command. Note that this command is no longer hidden once you
have turned on task accounting.
When you enable task accounting, the rpd scheduler increases the verbosity of its system logging. The added detail should
help identify where rpd is spending all of its time. An example of the added logging detail is shown:
Nov 01 12:00:00 router rpd[609]: excessive runtime: BGP 65019.192.168.1.1+179
ran for 12.908 (12.885 user, 0.023 system)
Nov 01 12:00:01 router rpd[609]: task_monitor slip: 10s scheduler slip
From this log entry, you can determine that rpd spent over 12 seconds processing BGP updates from peer 192.168.1.1 in
AS 65019.
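When many such entries accumulate, it can help to extract the task name and runtime automatically. The Python sketch below assumes the message wording shown in the example log entry; the regular expression and the function name are illustrative, not a documented log format.

```python
import re
from typing import Optional, Tuple

# Pattern modeled on the sample entry above; other releases may word it differently.
RUNTIME_RE = re.compile(
    r"excessive runtime:\s+(?P<task>.+?)\s+ran for\s+(?P<total>\d+(?:\.\d+)?)"
)


def parse_excessive_runtime(message: str) -> Optional[Tuple[str, float]]:
    """Return (task, total runtime) from an 'excessive runtime' entry."""
    match = RUNTIME_RE.search(" ".join(message.split()))  # rejoin wrapped lines
    if match is None:
        return None
    return match.group("task"), float(match.group("total"))


log = (
    "Nov 01 12:00:00 router rpd[609]: excessive runtime: "
    "BGP 65019.192.168.1.1+179\nran for 12.908 (12.885 user, 0.023 system)"
)
print(parse_excessive_runtime(log))
```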
The output of a show task command with the hidden accounting switch confirms whether task accounting is currently
enabled and, if so, displays a list of the busiest processes. Using the added log file detail and the output of a show task
accounting command, you should be able to at least identify the nature, if not the actual cause, of scheduler slips.
Once you have captured the output of a show task accounting command for submission to JTAC, be sure to disable
task accounting with a set task accounting off command so that additional burden is not placed on your router.
user@mx> set task accounting on
Task accounting enabled.
Core Files
• Core dump files
• Generated by system process crashes (or forcibly)
• Files should be uploaded to JTAC and associated with a JTAC
case number
• Core dumps fall into three categories:
• Process: Processes running on the Routing Engine
• Kernel: The Routing Engine kernel itself
• PFE boards: The microkernel OS running on the PFE boards
• Check for core dump files
• System syslog
• request support information
• show system core-dumps
• /var/tmp typically hosts process cores
• /var/crash typically hosts kernel and PFE cores
Troubleshooting Methodology (1 of 3)
• New or existing implementation?
• Understanding is important for isolating the issue
• Do no harm!
Least Severe Action
• Clearing a route or database entry
• Single route must refresh
• Bouncing a protocol session or neighborship
• All learned routes must refresh
• Bouncing a protocol
• All adjacencies or peerings must re-establish
• Restarting routing (rpd)
• All routing must restart
• Rebooting the device
• All system processes must restart
Most Severe Action
Do No Harm!
Recall our troubleshooting methodology mantra of do no harm! When troubleshooting routing protocol issues on a live
network, this mantra becomes especially important. The slides display some common resolutions in order of the least severe
impact to the most severe impact on a network. Although it might seem like common sense, it deserves stating that
restarting routing might force your OSPF adjacency to re-initiate, but remember that it will also bounce all those BGP
sessions your network is relying on.
Troubleshooting Methodology (2 of 3)
• Define success (and isolate)
• Route received from neighbor
• Check protocol adjacency
• Check protocol database
• Route appears in routing table
• Check preference
• Test import routing policy
• Route being advertised to neighbor
• Check protocol adjacency
• Test export policy
• Route is stable
• Check logs, interfaces, and protocol traces
Define Success
As part of the routing troubleshooting process, be sure to have a clear, unified definition of success. Some common
questions might include:
Is the device receiving expected routes from its neighbor?
Is the suspect route in the routing table?
Is the suspect route being advertised to its neighbor?
Is the route stable over time?
Troubleshooting Methodology (3 of 3)
• Identify and implement a solution
• Repair hardware issue
• Adjust protocol configuration
• Adjacency configuration
• Metrics and preferences
• Policy
• Adjust implementation
• Prevent link overutilization
• Test in lab environment
Implementing a Solution
Once you have formulated a theory as to the nature of the issue, it is time to implement a change that, hopefully, results in
resolution. This change might involve a hardware swap, configuration changes, or network infrastructure changes such as
implementing a new link to share the load of an overutilized link.
If at all possible, test your change in a lab environment so that you can monitor its effects in a controlled setting
where the impact will not be shared by the live network. At the very least, implement changes in an announced
maintenance window so that end-user effects are minimized.
[Figure: network topology showing Routers A, D, E, and F and Hosts X and Y]
^C
--- 192.168.30.2 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
Ping
The slide illustrates the most often used method of network troubleshooting: an Internet Control Message Protocol (ICMP)
ping test. In this case, the ping is sourced from Router A and issued to both known interfaces on Router D with no success.
By default, the Junos OS sources ping packets from the egress interface and default routing instance of a device. The default
command results in a continuous ping with a data payload size of 56 bytes and can be stopped with a Ctrl+C keystroke. You
can alter many aspects of this behavior (output trimmed for brevity):
user@mx> ping ?
Possible completions:
<host> Hostname or IP address of remote host
atm Ping remote Asynchronous Transfer Mode node
bypass-routing Bypass routing table, use specified interface
clns Ping ISO node
count Number of ping requests to send (1 .. 2000000000 packets)
detail Display incoming interface of received packet
do-not-fragment Don't fragment echo request packets (IPv4)
ethernet Ping to an ethernet host by unicast mac address
inet Force ping to IPv4 destination
inet6 Force ping to IPv6 destination
interface Source interface (multicast, all-ones, unrouted packets)
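When ping results are gathered from several devices, the closing summary line is the part most worth checking automatically. The following Python sketch assumes the summary wording shown on the slide ("packets transmitted ... packets received ... packet loss"); the function name and result keys are made up for this example.

```python
import re
from typing import Dict, Optional, Union

SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) packets received, "
    r"(\d+(?:\.\d+)?)% packet loss"
)


def ping_summary(line: str) -> Optional[Dict[str, Union[int, float, bool]]]:
    """Extract counters from a ping summary line, or None if it does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    return {
        "sent": sent,
        "received": received,
        "loss_percent": float(match.group(3)),
        "reachable": received > 0,
    }


print(ping_summary("6 packets transmitted, 0 packets received, 100% packet loss"))
```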
Traceroute
The second most commonly used routing troubleshooting tool is the traceroute command. Performing a traceroute
results in ICMP packets sent to each hop in a path by incrementing the time-to-live (TTL) value of each subsequent packet
by one. By monitoring the responses of each host in the path, network operating systems such as the Junos OS can present
you with a reachability map of the network. You can use this map to isolate where a problem might reside.
In the example on the slide, Router A is performing a traceroute to one of Router D's interfaces. As shown on the slide, the
traceroute is not completely successful.
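The TTL mechanism can be illustrated with a toy simulation. This Python sketch does not send any packets; it assumes a known list of hops and a point after which nothing answers, and the router names and the broken_after parameter are hypothetical.

```python
from typing import List, Optional


def simulate_traceroute(
    path: List[str], broken_after: Optional[int] = None, max_ttl: int = 30
) -> List[str]:
    """Return the responder seen for each TTL value, or '*' when nothing answers.

    path lists the hops in order, ending at the destination.
    broken_after is the number of hops that still respond (None means all do).
    """
    results = []
    for ttl in range(1, min(max_ttl, len(path)) + 1):
        # A probe sent with this TTL expires at hop number ttl, which replies.
        reachable = broken_after is None or ttl <= broken_after
        results.append(path[ttl - 1] if reachable else "*")
    return results


hops = ["Router B", "Router C", "Router D"]
print(simulate_traceroute(hops, broken_after=2))
```

The first hop that fails to answer marks where to begin looking, which is the reachability map idea described above.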
[Figure: routing troubleshooting flowchart; its outcomes include suspecting the IGP or its configuration, suspecting policy, finding no remote peer, and investigating forwarding faults]
• Helpful commands:
• Protocol show commands:
user@router> show ospf neighbor
Address Interface state ID Pri Dead
172.18.5.1 ge-1/0/2.144 Full 192.168.37.1 128 31
interface all;
Sets LSA to MAXAGE, resulting
in re-advertisement from
originator
• Traceoptions:
[edit protocols ospf]
user@router# set traceoptions flag ?
Possible completions:
all Trace everything
database-description Trace database description packets
error Trace errored packets
event Trace OSPF state machine events
flooding Trace LSA flooding
user@mx2>
VLAN 144
172.18.5.0/30
Area 0
LSReq 0 0 0 0
LSUpdate 0 0 0 0
LSAck 0 0 0 0
OSPF Traceoptions
To obtain more detailed OSPF information, we configure an OSPF traceoptions file and a flag for hello messages. Once we
bounce the OSPF adjacency by disabling and re-enabling OSPF on the interface, we discover that we are, indeed, sending
OSPF hello messages. However, no further light is shed on the issue.
Note: Monitoring the interface traffic would have been helpful with a plain-text authentication mismatch, but an MD5
secret mismatch would not have been detected.
Monitoring Bridging
The slide highlights the topic we discuss next.
MX Series Bridging
Many of the newest generation of Juniper Networks devices provide expanded Layer 2 support in addition to routing. The MX
Series of Ethernet services routers allow for bridging capabilities optimized for the metro Ethernet environment. The slide
illustrates the Layer 2 processes eligible for restart using the Junos CLI. The l2-learning process maintains bridging
functionality and the bridging table. The l2cpd-service is responsible for media access control (MAC) address system
parameters and xSTP protocols.
EX Series Bridging
EX Series Ethernet Switches provide Layer 2 functionality aimed at the enterprise environment and have slightly different
configuration and monitoring commands. The slide illustrates the Layer 2 processes eligible for restart using the Junos CLI.
The ethernet-switching process is responsible for core bridging functionality and address learning. The
lldp-service maintains the Link Layer Discovery Protocol (LLDP) process.
• EX Series
user@ex> clear ethernet-switching table
ARP Overview
• ARP associates IP addresses with Layer 2 addresses
in an ARP table
• Once a routing issue is isolated to a broadcast segment,
monitor the ARP process for a local problem
[Figure: Host X and Host Y attached to a shared broadcast segment]
ARP Table:
192.168.30.2 = 02:00:54:55:4E:01
Summary
• In this content, we:
• Monitored and troubleshot system processes
• Practiced a logical approach to troubleshooting control
plane routing issues
• Learned methods of monitoring and troubleshooting
bridging functionality and ARP
We Discussed:
The monitoring and troubleshooting of control plane system processes;
A logical approach to control plane routing issues; and
The monitoring and troubleshooting of bridging and ARP functionality.
Review Questions
1. Name five functions of the control plane.
2. What are Junos system processes called?
3. How can you determine whether a user is logged in
using a console session versus a Telnet session?
4. Name three functions of rpd.
5. Name three commands that you can use to
troubleshoot control plane issues.
Review Questions
3.
To identify whether a user is logged in using the console or remotely using Telnet or SSH, view the output of the show system users
command. A tty value beginning with a u character indicates a console session. A tty value beginning with a p character indicates a
remote Telnet or SSH session.
4.
rpd is responsible for controlling routing operations including controlling routing protocols, handling protocol messages, maintaining
routing tables, and implementing routing policy.
5.
Junos show commands such as show system statistics, show system processes, show system core-dumps, show system users, and show
system alarms, among others, can be used to troubleshoot control plane issues.
Objectives
• After successfully completing this content, you will be
able to:
• Describe physical and logical interface properties
• Deactivate and disable interfaces
• Perform loopback testing
• Use operational mode commands to monitor and
troubleshoot Ethernet interfaces
We Will Discuss:
Physical and logical interface properties;
Deactivating and disabling interfaces;
Loopback testing; and
Monitoring and troubleshooting Ethernet interfaces.
→ Interface Properties
• General Interface Troubleshooting
• Ethernet Interface Troubleshooting
Interface Properties
The slide lists the topics we will discuss. We discuss the highlighted topic first.
Interface Properties
• Physical properties:
• Ethernet options (speed, autonegotiation)
• Clocking
• Scrambling
• Frame check sequence (FCS)
• Maximum transmission unit (MTU)
• Data link layer protocol, keepalives
• Logical properties:
• Protocol family (Internet, ISO, MPLS, Bridge)
• Addresses (IP address, ISO NET address)
• Virtual circuits (VCI/VPI, DLCI)
Physical Properties
The following list provides details of the interface's physical properties:
Ethernet options: For Ethernet interfaces, refers to speed, duplex, and autonegotiation parameters.
Clocking: Refers to the interface clock source, either internal or external.
Scrambling: Refers to payload scrambling, which can be on or off.
Frame check sequence (FCS): You can modify to 32-bit mode (the default is 16-bit mode).
Maximum transmission unit (MTU): You can vary the size from 256 to 9192 bytes.
Data-link-layer protocol, keepalives: You can change the data-link-layer protocol for the particular media type (for
example, Point-to-Point Protocol [PPP] to Cisco High-Level Data Link Control [Cisco HDLC]), and you can turn
keepalives on or off.
The following list provides details of the interface's logical properties:
Protocol family: Refers to the protocol family you want to use, such as family iso, inet, or mpls.
Addresses: Refers to the address associated with the particular family (for example, IP address using family inet).
Virtual circuits: Refers to the virtual circuit identifier, such as a data-link connection identifier (DLCI), virtual path
identifier (VPI)/virtual channel identifier (VCI), or virtual LAN (VLAN) tag.
Other characteristics: Some other configurable options include Inverse Address Resolution Protocol (ARP), traps,
and accounting profiles.
[edit interfaces]
user@router# show ge-1/0/1
##
## inactive: interfaces ge-1/0/1
##
[edit interfaces]
user@router# show ge-1/0/1
disable;
Deactivating an Interface
In a configuration, you can deactivate statements and identifiers so that they do not take effect when you issue the commit
command. Any deactivated statements and identifiers are marked with the inactive tag. They remain in the configuration but
are not activated when you issue a commit command.
To deactivate a statement or identifier, use the deactivate configuration mode command: deactivate (statement |
identifier). To reactivate a statement or identifier, use the activate configuration mode command: activate
(statement | identifier). You can deactivate or disable a statement at many levels of the hierarchy.
ct1-0/1/0:1
description "CT1 to NxDS0s.";
t1-options {
line-encoding ami;
framing sf;
bert-algorithm all-ones-repeating;
ds-0/1/0:1:2
description "Second DS0 channel bundle of ct1-0/1/0:1";
unit 0 {
family inet {
address 2.2.2.2/24;
}
t1-0/1/0:2
description "First full T1 from ct3-0/1/0, range is t1-0/1/0:[2-28]";
encapsulation cisco-hdlc;
unit 0 {
family inet
address 3.3.3.3/24;
Tools Available
The following pages discuss the tools available in the Junos OS.
Interface      Admin Link Proto  Local         Remote
so-1/1/0       down  up
so-1/1/0.0     up    down inet   1.1.1.1/30
                          iso
so-1/1/1       up    down
so-1/1/1.0     up    down inet   2.2.2.2/30    Link layer down
                          iso
so-1/1/2       up    up
so-1/1/2.0     up    up   inet   3.3.3.3/30    Link layer up

Admin  Link  Meaning
down   down  Administratively disabled
up     down  Router interface problem
             Interface misconfigured (encapsulation)
             Keepalive sequencing not incrementing
             CSU/DSU failure
             Carrier problem (noisy line, timing mismatches)
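The state table on the slide works as a lookup from the Admin and Link columns to a list of likely causes. A minimal Python sketch of that lookup follows; it encodes only the two rows shown on the slide, and the fallback text for other combinations is an assumption made for the example.

```python
from typing import Dict, List, Tuple

# Rows copied from the slide's Admin/Link table.
MEANINGS: Dict[Tuple[str, str], List[str]] = {
    ("down", "down"): ["Administratively disabled"],
    ("up", "down"): [
        "Router interface problem",
        "Interface misconfigured (encapsulation)",
        "Keepalive sequencing not incrementing",
        "CSU/DSU failure",
        "Carrier problem (noisy line, timing mismatches)",
    ],
}


def explain(admin: str, link: str) -> List[str]:
    """Return the candidate causes for an Admin/Link state pair."""
    return MEANINGS.get(
        (admin.lower(), link.lower()), ["No entry in the slide's table"]
    )


print(explain("up", "down"))
```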
Logical interface ge-1/0/1.141 (Index 329) (SNMP ifIndex 655)   Logical device indexes
Flags: SNMP-Traps 0x0 VLAN-Tag [ 0x8100.141 ] Encapsulation: ENET2
Input packets : 15479
Output packets: 14669
Protocol inet, MTU: 1500
Flags: sendbcast-pkt-to-re
Addresses, Flags: Is-Preferred Is-Primary   Logical device settings
Destination: 172.18.2.0/30, Local: 172.18.2.2, Broadcast: 172.18.2.3
Protocol multiservice, MTU: Unlimited
Loose-LMI: Frame Relay will not use the Local Management Interface (LMI) to indicate whether the link protocol
is up.
Loose-NCP: PPP does not use Network Control Protocol (NCP) to indicate whether the device is up.
No-Keepalives: Link protocol keepalives are disabled.
The output from Ethernet interfaces, as shown on the slide, does not display link layer flags.
The output also summarizes the device-level traffic load, which is displayed in both bits and packets per second, as well as
any alarms that might be active. The final portion of the command output displays the configuration and status of each
logical unit defined on that device. In this example, a single unit is defined with support for the inet protocol family.
FIFO errors: 0, Resource errors: 0
Output errors:
Carrier transitions: 1, Errors: 0, Drops: 0, Collisions: 0, Aged packets: 0,
FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
Egress queues: 8 supported, 4 in use
Queue counters: Queued packets Transmitted packets Dropped packets
0 best-effort 14935 14935 0
1 expedited-fo 0 0 0
2 assured-forw 0 0 0
3 network-cont 840 840 0
Queue number: Mapped forwarding classes
0 best-effort
1 expedited-forwarding
2 assured-forwarding
3 network-control
Active alarms : None
Active defects : None
Monitoring an Interface
user@router> monitor interface ge-1/0/1
router Seconds: 57 Time: 13:15:37
Delay: 21/0/79
Interface: ge-1/0/1, Enabled, Link is Up
Encapsulation: Ethernet, Speed: 1000mbps
Traffic statistics: Current delta
Input bytes: 10544c910 (0 bps) [1386]
Output bytes: 76075226 (368 bps) [720]
Input packets: 1489787 (0 pps) [20]
Output packets: 826831 (0 pps) [8]
Error statistics:   Real-time traffic and error counts
Input errors: 0 [0]
Input drops: 0 [0]
Input framing errors: 0 [0]
Policed discards: 0 [0]
L3 incompletes: 0 [0]
L2 channel errors: 0 [0]
L2 mismatch timeouts: 0 [0]
Carrier transiti
Monitoring an Interface
The slide depicts a typical output from the monitor interface command. You must set your terminal session to VT100 for
the screen to display correctly. This command provides real-time packet and byte counters as well as displaying error and alarm
conditions. This output contrasts to the monitor traffic command, which displays a form of packet capture for control
traffic.
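The rate figures in this kind of display come from comparing two counter readings. As a rough illustration, the Python sketch below converts a byte-counter difference into bits per second; the counter values and the 10-second interval are made up and are not taken from the slide.

```python
def rate_bps(bytes_before: int, bytes_after: int, seconds: float) -> float:
    """Average bit rate between two byte-counter readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_after - bytes_before) * 8 / seconds


def delta(before: int, after: int) -> int:
    """Change in a counter since the earlier reading."""
    return after - before


# Hypothetical readings taken 10 seconds apart.
print(rate_bps(1000, 1460, 10))
print(delta(0, 3))
```

An error counter whose delta keeps growing between readings is the signal to look for; a large but static total usually reflects an old event.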
Loopback Testing
• Loopback testing is the primary method for
distinguishing between interface and circuit faults
Loopback Testing
The physical path of a line usually consists of a number of segments or spans interconnected by devices that repeat and
regenerate the signal. When a fault occurs on the circuit that takes the form of either a break or signal corruption due to noise,
it is possible to localize the problem by testing the line on a segment-by-segment basis or end-to-end basis, as needed.
Each circuit is symmetric in that a transmit path from one device connects to the receive path on the remote side, and vice
versa. Looping is the process of connecting the transmit path of a router or intermediate device to the receive path. If this device
is one of the routers, the loop will either be detected if the looped segment is operational, or not detected if there is a break.
This detection is achieved by the router detecting its own data-link-layer keepalive packets (for example, the magic number when
the encapsulation is PPP).
If a loop is set back towards a router and it is not detected, you can assume that the problem lies somewhere between the router
and where the loop was set by the telco or provider. The next step is to set a loop somewhere closer to the router to localize the
problem segment.
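The segment-by-segment reasoning above amounts to finding the first loop point that is not detected. The Python sketch below illustrates that logic only; the loop-point names are hypothetical, and in practice each result comes from setting a loop and checking whether the router detects it.

```python
from typing import List, Optional, Tuple


def localize_fault(
    loop_results: List[Tuple[str, bool]]
) -> Optional[Tuple[str, str]]:
    """Bound the suspect segment from ordered loopback results.

    loop_results is ordered from the loop point nearest the router to the
    farthest; each entry is (loop point, loop detected by the router).
    Returns (last good point, first failing point), or None if every loop
    was detected.
    """
    previous = "router"
    for point, detected in loop_results:
        if not detected:
            return previous, point
        previous = point
    return None


results = [
    ("local PIC loop", True),
    ("provider near-end loop", True),
    ("provider far-end loop", False),
]
print(localize_fault(results))
```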
It is usually possible to loop the router's interface locally by connecting the PIC's transmit and receive ports. You should take
care to attenuate signal strength when dealing with intermediate- and long-reach fiber-optic interfaces.
You can use a similar approach to track down noise on a line by combining the looping process with a test that checks for bit
rate errors, commonly known as a bit error rate test (BERT). Many of the Juniper Networks M Series Multiservice Edge Router
and T Series Core Router interfaces support BERT testing.
Port is OK
(Internally)
Loopback remote
Configuring Loopbacks
• Local and remote loops require configuration on most
PICs
• External local loop and telco line loops do not require
configuration
[edit interfaces ge-1/0/1]
user@router# show
gigether-options {
    loopback;   Only local loops permitted on Ethernet interfaces
}
unit 0 {
    family inet {
        address 172.22.241.1/24 {
            arp 172.22.241.10 mac 80:71:1f:c3:18:61;
        }
    }
}
Configuring Loopbacks
Interface loopbacks require configuration in the Junos OS for most PICs and interface types. A small number of channelized DS3
and OC12 interfaces support the ability to initiate FEAC-based or T1 inband and FDL-based loopbacks using operational mode
commands. Note that configuration is never needed for an external local-loopback with a loopback plug, or when relying on the
telco to provide a line loopback (which appears as a remote loopback to the attached router). This slide shows an example of a
local-loopback configuration and the operational mode status display that confirms that the loopback is in place. The example is
based on a Gigabit Ethernet interface and displays a manually configured Address Resolution Protocol (ARP) entry. The manual
ARP configuration enables the Junos OS to send test frames without the need for ARP resolution.
Note that when the telco provides a line loopback, nothing indicates that a loopback is in place, unless the configured Layer 2
protocol has built-in loopback detection-for example, PPP. The routers used in this example are running Frame Relay with
LMI-based keepalives disabled. As a result, a remote loopback goes undetected at the remote router, which is now talking to
itself as indicated by the TTL expiration messages shown here (we cover the use of ping to test loopbacks on an upcoming slide):
Minimize Disruption
Flapping Links
When you suspect intermittent failures of an interface or transmission line, you should consider removing the interface from the
routing protocol configuration. Removing the interface from OSPF or Intermediate System-to-Intermediate System (IS-IS)
advertisements limits the flapping interface's impact on the rest of the network while you isolate and correct the problem.
For OSPF, you can disable the interface using the command shown on the slide. For IS-IS, you can use the same approach, or
alternatively, you can remove the family iso from the interface configuration. In all cases, you should take care to restore
the interface's configuration when the problem is resolved.
• Interface Properties
• General Interface Troubleshooting
→ Ethernet Interface Troubleshooting
[Figure: Ethernet troubleshooting flowchart; its outcomes include suspecting a bad IP configuration and a bad Layer 2 configuration]
Ethernet Topologies
• Port types:
• Gigabit Ethernet, Fast Ethernet, and so forth
• Link mode (full or half duplex)
• Tools:
• ping
• loopback (local)
• show interfaces extensive
• show interfaces media
• show arp
• monitor traffic
• monitor interface
• clear statistics
Link Mode
When troubleshooting Ethernet topologies, consider the link mode:
Full duplex;
Half duplex; or
Link bonding (802.3ad).
Fast Ethernet interfaces can support half or full duplex, but Gigabit Ethernet and 10 Gigabit interfaces function only in
full-duplex mode.
Junos Tools
The Junos OS provides the tools shown on the slide. The tools listed depict various CLI commands used to troubleshoot and
monitor Ethernet interfaces. The following pages examine these tools.
Ethernet Troubleshooting (1 of 4)
Ethernet Troubleshooting (2 of 4)
Generic Tips
We recommend the following:
Ensure that the encapsulation type matches that of the other hosts or routers on the link.
Use the show interfaces extensive command to check the status of the interface.
Use the monitor interface command to receive real-time statistics.
Use the monitor interface interface-name traffic command to display real-time statistics about a
physical interface. The output is updated every second. The output of this command also shows the amount that
each field has changed since you started the command or since you cleared the counters by using the c key. This
command also checks for and displays common interface failures, such as SONET/SDH and T3 alarms, loopbacks
detected, and increases in framing errors. If the framing errors are increasing, this indicates that frames are being
corrupted. If the input errors are increasing, check the cabling to the router and have the carrier verify the integrity
of the line.
Ethernet Troubleshooting (3 of 4)
interface ge-1/0/2;
interface ge-1/0/1 {
back;
Allows remote peer to
set local interface in
loopback state
Ethernet Troubleshooting (4 of 4)
Summary
• In this content, we:
• Described physical and logical interface properties
• Deactivated and disabled interfaces
• Performed loopback testing
• Used operational mode commands to monitor and
troubleshoot Ethernet interfaces
We Discussed:
Physical and logical interface properties;
Deactivating and disabling interfaces;
Loopback testing; and
Monitoring and troubleshooting Ethernet interfaces.
Review Questions
Review Questions
1.
The monitor interface command displays real-time statistics, while the monitor traffic command displays a packet dump of control
traffic transiting the interface.
2.
Deactivating an interface causes the Junos OS to ignore the deactivated configuration, while disabling an interface results in the
interface being administratively disabled.
3.
Loopback testing can be used to loop traffic back to the originator at various physical points within a circuit. This helps determine the
location of a fault.
4.
Disabling a troubled interface can remove instability in routing protocols and prevent black holes.
5.
A policed discard represents traffic unknown to the Junos OS, such as CDP traffic.
Objectives
• After successfully completing this content, you will be
able to:
• Recognize data plane problems and components
• Monitor and troubleshoot data plane forwarding
• Monitor load balancing
• Troubleshoot firewall filter and policer issues
We Will Discuss:
Data plane problems and components;
Monitoring and troubleshooting data plane forwarding;
Monitoring load balancing; and
Troubleshooting firewall filter and policer issues.
Data Plane
[Figure: mxA (lo0: 192.168.10.1) connected to mxB (lo0: 192.168.20.1) over the point-to-point interface so-0/1/1]
user@mxA> show route forwarding-table destination 192.168.20.1
Routing table: default.inet
Internet:
Destination Type RtRef Next hop Type Index NhRef Netif
192.168.20.1/32 user 0 256 2 so-0/1/1.0   P2P forwarding interface
P2P interfaces do not use a forwarding next hop
ARP resolves forwarding next hop
Clearing FT Entries
The slide illustrates the syntax used to remove entries from the master copy of the FT. In most cases the entry is immediately
written back into the FT as this is the kernel's idea of a good time.
Note that this command is rarely used. It is only intended to recover from the rare case of an invalid next hop interfering with a
valid next hop by virtue of its remaining in the PFE after the related route is removed from the routing table.
Load-Balancing Behavior
The slide highlights the topic we discuss next.
[edit]
user@router# set routing-options forwarding-table export load-balance
Destination IP address;
Protocol;
Source port number;
Destination port number; and
Incoming interface index.
The router recognizes packets in which all of these Layer 3 and Layer 4 parameters are identical and ensures that these packets
are sent out through the same interface. This step prevents problems that might otherwise occur with packets arriving at their
destination out of their original sequence.
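The idea that packets with identical Layer 3 and Layer 4 parameters always leave through the same interface can be sketched with a hash over the flow fields. This Python example is an illustration only: the hash function is a stand-in and not the algorithm used by the forwarding hardware, the source address is included in the key as an assumption, and the interface names are hypothetical.

```python
import hashlib
from typing import List, Tuple

# (source IP, destination IP, protocol, source port, destination port,
#  incoming interface index); the source IP field is an assumption.
FlowKey = Tuple[str, str, int, int, int, int]


def pick_next_hop(flow: FlowKey, next_hops: List[str]) -> str:
    """Choose a next hop deterministically from the flow's fields."""
    digest = hashlib.sha256(repr(flow).encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


links = ["ge-1/0/1.0", "ge-1/0/2.0"]
flow_a = ("10.0.0.1", "172.31.18.5", 6, 40001, 80, 70)

# Every packet of the same flow maps to the same link, so ordering is kept.
first = pick_next_hop(flow_a, links)
second = pick_next_hop(flow_a, links)
print(first == second)
```

Because the choice depends only on the flow fields, a handful of flows can land unevenly across the links; the split evens out only as the number of distinct flows grows.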
To configure the load-balancing behavior, include the load-balance per-packet option in a then statement or a
route-filter option in a from statement in a routing policy. You must apply the routing policy to routes exported from the
routing table to the forwarding table to complete the configuration. To do this, include the export statement at the [edit
routing-options forwarding-table] hierarchy, as shown on the slide.
The slide shows the effects of applying a per-packet load-balancing policy. The 172.31.18.0 entry in the FT now contains a
ulst entry, which functions to list the set of unicast next hops that are available for per-flow load balancing. The slide also
shows that, in addition to the single ulst entry, two ucst entries relate to the parallel Gigabit Ethernet links running between
the devices.
Additional confirmation of proper load balancing is possible by monitoring the related interface counters while generating
different flows to the same destination prefix. Note that perfect load balancing-that is, a 50% split between two links-is only
likely when a large number of flows are in effect.
term two {
then accept;
}
user@router> show interfaces policers ge-1/0/1.141
Interface Admin Link Proto Input Policer Output Policer
ge-1/0/1.141 up up
inet test-ge-1/0/1.141-inet-o
multiservice __default_arp_policer__
[Figure: data plane troubleshooting flowchart, including a kernel fault (consistency checking) branch]
• What is wrong?
• Which CLI commands and fault analysis steps can help
narrow down a possible cause?
[Figure: Router-1 (.3) attached to the 10.0.13/24 subnet through its ge-0/0/2 interface]
Data Plane Case Study: Background
This slide sets the stage for a sample data plane troubleshooting case study. We begin with a general description of the
problem, which in this case indicates that customers are complaining of long recovery times after a reboot or power-cycle of the
Layer 2 switch that interconnects the router's ge-0/0/2 interface to a large server farm. Oddly enough, the complaints indicate
that once connectivity is finally restored, application traffic works as expected.
Feeling Lucky?
Based on this description, you would be pretty lucky if you already knew the cause of the problem. After all, it could be a
hardware error in the PFE, an interface policer, a firewall, or perhaps an MTU issue, right? We suggest that you follow the general
steps outlined on the sample data plane troubleshooting flow chart to get things started. Put another way, it might be a good
idea to start with the determination of whether the 10.0.13/24 route associated with the server farm is correctly installed into
the FT.
0 10.0.13.0 recv
family inet {
policer {
arp arp-limit;
address 10.0.13.1/24;
Summary
• In this content, we:
• Defined data plane problems and components
• Monitored and troubleshot data plane forwarding
• Monitored load balancing
• Troubleshot firewall filter and policer issues
We Discussed:
Data plane problems and components;
Monitoring and troubleshooting data plane forwarding;
Monitoring load balancing; and
Troubleshooting firewall filter and policer issues.
Review Questions
Review Questions
2.
The main forwarding table is stored in the control plane and can be viewed with the show route forwarding-table operational mode
command.
3.
By default, the Junos OS performs per-prefix load balancing. When you enable per-packet load balancing, the Junos OS performs
per-flow load balancing.
4.
To display ARP policers, use the show policer operational mode command.