Index: BGP Routing Part I: BGP and Multi-Homing

BGP ROUTING PART I: BGP AND MULTI-HOMING http://wwwin-people.cisco.com/%7Emarkt/avi.html
NOTES:
Diagrams will be up in a couple of days.

The HTMLizing of this document is NOT finished.
I haven't gone through to re-check it for accuracy.
This is intended for the first-time multi-homing small ISP. Feel free to give this to
any of your customers, and send me comments and updates to bgp@netaxs.com if
you think something can be illustrated or explained more clearly.
Have fun,
Avi Freedman Net Access
BGP ROUTING PART I: BGP AND MULTI-HOMING

Everyone wants to know about BGP. What is it? How do you use it? What is it used for? We'll try to
explain at least the basics of BGP in this document.
This document is Copyright Avi Freedman, 1997. Distribution of the original or modified versions
for profit is prohibited, but please feel free to give it away.
Index
BGP
A WARNING
PREREQUISITES
BGP ROUTING: INTERNAL (INTERIOR) AND EXTERNAL
SO WHY IS BGP INTERESTING?
BEING "CONNECTED" TO THE INTERNET
HARDWARE AND SOFTWARE FOR SPEAKING BGP
PEERING SESSIONS AND ASNs: PART I
WHAT DO YOU DO WITH BGP?
PEERING SESSIONS
eBGP vs. iBGP
BGP AND THE SINGLE-HOMED
AS-PATHS
AS-PATH LENGTH AND BGP ROUTE SELECTION
AS-PATH ACCESS LISTS (FILTERS)
ENTERING, MODIFYING, AND DELETING as-path access-lists
BGP METRICS (ATTRIBUTES) AND ROUTE SELECTION: INTRODUCTION
BGP PATH SELECTION PROCESS ACCORDING TO CISCO
BGP ATTRIBUTE TYPES
EGP vs. IGP
WHAT IS ROUTE FLAP AND WHY IS IT BAD?
WHAT TO KEEP IN MIND WHEN CONFIGURING BGP
BGP AND PEERING
INTERNET CONNECTIVITY WITHOUT BGP

BGP AND THE MULTI-HOMED
MULTI-HOMING AND LOAD-BALANCING
HOW TO ANNOUNCE YOUR NETWORKS
BEING ADVERTISED BY MULTIPLE PROVIDERS WITHOUT PI-SPACE
CONTROLLING OUTGOING DATA FLOW: "FULL ROUTING"
CONTROLLING OUTGOING DATA FLOW: "PARTIAL ROUTING": "CUSTOMER ROUTES ONLY"
SO WHAT'S TO BE DONE?
AS-PATH PADDING
QUESTIONS AND COMMENTS
THANKS TO
TO BE DONE
Sidebars
Sidebar on Cisco BGP commands
Sidebar on next-hop-self
Sidebar on Outgoing Data Flow Control Without BGP
A WARNING
This is dangerous stuff. It's always best if you can test BGP configurations in a "lab" made up of a
few Cisco 2501s before implementing them in a live network connected to the Internet.
Unfortunately, there's no good reference on "using BGP" to refer people to. Reading the RFCs (the
Request For Comment documents that define the protocol at a low-to-mid-level), or even Cisco
documentation (Cisco did not invent BGP, but Cisco's BGP implementation is almost definitely the
most widely-used) does not really tell you enough. Many of the "routing gurus" out there got started
by looking at and working on running networks, where the architecture and implementation were
already done. Most of the rest, however, started with the basics and expanded their knowledge and
experience as their networks grew.
PREREQUISITES
You need to know a bit about IP routing to digest this material. It also doesn't hurt to have a few of
the aforementioned test routers (at least two, one configured as you and one configured as your
provider). Don't be afraid to ask for help. Read your vendor's BGP documentation - all of it, even the
parts you don't understand. Try to get a number of "live configs" for whatever router you're using -
preferably from someone with a similar topology and similar goals.
BGP
BGP stands for Border Gateway Protocol. The "BGP" that people speak of ("Can a Cisco 2501 speak
BGP?") is actually BGP4. BGP4 differs from BGP3 in the same way that RIPv2 (the result of what some
call "unsuccessful brain surgery" on the original RIP protocol) differs from the old RIP protocol:
both BGP4 and RIPv2 allow the announcement of "classless routes" - routes that aren't strictly on
"Class A", "Class B", or "Class C" boundaries, but can also be "subnets" or "supernets". For more
information on "classless" or "CIDR" routes, see April's Boardwatch column.
ROUTING: INTERNAL (INTERIOR) AND EXTERNAL

Internal routing is the art of getting each router in your network to know how to get to every location
(destination) in your network. You can do this simply, with static routes, or in a more complicated
but robust way, with active internal routing protocols such as RIP, RIPv2, OSPF, and IS-IS.
It's obviously critical that any box inside your network know how to get (directly or indirectly) to any
other box inside your network. Before you invite people to send data to your network, you've got to
have a running and happy network to take the data.
If you default route into one or more providers, external routing isn't something you have in your
network. But if you do want to "peer" with someone - or to "multi-home" to multiple providers and
have a little bit more control over where your data goes on the Internet, you will be taking at least
some external routes into your network (and will do so with BGP).
SO WHY IS BGP INTERESTING?

Well, as mentioned above, it's nice to have routing data for parts of the Internet in your routers.
But it is much more useful to tell people outside your network (upstream providers or "peers") about
what routes (or portions of the IP address space) you "know how to get to" inside your network. The
primary purpose of BGP4 (as we're studying it here) is to advertise routes to other networks
("Autonomous Systems").
An AS, or Autonomous System, is a way of referring to "someone's network". That network could be
yours; a friend's; MCI's; Sprintlink's; or anyone's. Normally an AS will have one or more people
responsible for it (a point of contact, typically called a NOC, or Network Operations Center) and one
or more "border routers" (where routers in that AS peer and exchange routes with other ASs), as
well as a simple or complicated internal routing scheme so that every router in that AS knows how to
get to every other router and destination within that AS.
When you "advertise" routes to other entities (ASs), one way of thinking of those route
"advertisements" is as "promises" to carry data to the IP space represented in the route being
advertised. For example, if you advertise 192.204.4.0/24 (the "Class C" starting at 192.204.4.0 and
ending at 192.204.4.255), you promise that if someone sends you data destined for any address in
192.204.4.0/24, you know how to carry that data to its ultimate destination. The cardinal sin of BGP
routing is advertising routes that you don't know how to get to. This is called "black-holing" someone
- because if you advertise, or promise to carry data to, some part of the IP space that is owned by
someone else, and that advertisement is more specific than the one made by the owner of that IP
space, all of the data on the Internet destined for the black-holed IP space will flow to your border
router. Needless to say, this makes that address space "disconnected from the 'net" for the provider
that owns the space, and makes many people unhappy. The second most heinous sin of BGP routing
is not having strict enough filters on the routes you advertise (more on this later). Anyway, the
bottom line: Test your configs and watch out for typos. Think through everything you do in terms
of how it could screw up.
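To see why a more specific advertisement pulls traffic away from the owner of the space, here is a minimal Python sketch of "most specific route wins" using the standard ipaddress module. The 192.204.0.0/16 aggregate and the "owner" / "black hole" labels are made up for the illustration, and a real router's lookup is far more elaborate than this.

```python
import ipaddress

def best_route(table, address):
    """Return the (prefix, label) entry that covers address with the longest prefix."""
    addr = ipaddress.ip_address(address)
    covering = [(net, label) for net, label in table if addr in net]
    if not covering:
        return None  # no route covers this address, so it is unreachable
    return max(covering, key=lambda entry: entry[0].prefixlen)

# Hypothetical table: the owner announces an aggregate, and someone else
# (by typo or otherwise) announces a more specific piece of it.
table = [
    (ipaddress.ip_network("192.204.0.0/16"), "owner"),
    (ipaddress.ip_network("192.204.4.0/24"), "black hole"),
]

print(best_route(table, "192.204.4.17")[1])  # covered by both; the /24 wins
print(best_route(table, "192.204.9.1")[1])   # only the /16 covers this one
print(best_route(table, "10.10.20.1"))       # nothing covers this one
```

Everything inside the /24 follows the more specific entry even though the aggregate is still being announced, which is exactly how a stray advertisement black-holes someone else's space.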
Also, one terminology note: Classless routes are sometimes called "prefixes". When someone talks
about a prefix they're talking about a route with a particular starting point and a particular specificity
(length). So 207.8.96.0/24 and 207.8.96.0/20 are not the same prefix (route). We'll mostly use "route"
in this document.
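If the "starting point plus length" idea is new to you, the following short Python sketch (standard ipaddress module only) pokes at the two prefixes mentioned above. It is just an illustration of the terminology, not something a router runs.

```python
import ipaddress

a = ipaddress.ip_network("207.8.96.0/24")
b = ipaddress.ip_network("207.8.96.0/20")

same_start = a.network_address == b.network_address  # same starting point
same_prefix = a == b                                 # but a different length
a_inside_b = a.subnet_of(b)                          # the /24 is a piece of the /20
sizes = (a.num_addresses, b.num_addresses)

print(same_start, same_prefix, a_inside_b, sizes)

# The "Class C"-sized example from earlier: first and last address of the /24.
c = ipaddress.ip_network("192.204.4.0/24")
print(c[0], c[-1])
```

Two routes are the same prefix only when both the starting address and the length agree.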
BEING "CONNECTED" TO THE INTERNET

Throughout this discussion it's critical to think about what it means to be "connected" to the Internet.
In order to be connected to the Internet, two things have to be true for each host that is "on the Internet":
You have to be able to send a packet out a path that will ultimately wind up at that host, and, just as critically,
That host has to have a path back to you. This means that whoever provides "Internet connectivity" to
that host has to have a path to you - which, ultimately, means that they have to "hear a route" which
covers the section of the IP space you're using, or you will not have connectivity to the host in
question.
Take a look at Figure 1. We'll explain more of the details below, but note the "Home Dialup User".
He's connected to AOL, which is served by ANS (AOL actually owns ANS). We're using
10.10.20.0/24 as an example.
The 10.10.x.x IP addresses are often used in examples because they're "reserved" space. Most
networks will "filter" the RFC 1918 reserved space (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16),
so people use them in examples because they don't get you into too much trouble if you accidentally
try to use them (sort of like the film industry's yyy-555-xxxx phone number convention).
In this example, the reason that an AOL dialup user can send a packet to 10.10.20.1 (for example) is
that the ISP (AS 64512) advertised that route to the two upstream providers (AS 4969 and AS 701),
who in turn advertised that route to AS 690 (ANS, which provides IP service for AOL).
Every IP address that you can get to on the Internet is reachable because someone, somewhere, has
advertised a route that covers it. The corollary to this is that if there is not a generally-advertised
route to cover an IP address, no one on the Internet will be able to reach it.
HARDWARE AND SOFTWARE FOR SPEAKING BGP

The most commonly used implementations of BGP are Cisco routers, Bay routers, and PC clones
running Linux, BSD, or some other Unix variant - and a program called gated to manage BGP.
I recommend using Cisco routers (for many reasons). In particular, the Cisco implementation of BGP
is relatively easy to use, get examples for, and debug - and there's a huge community of routing
engineers that's familiar with the Cisco implementation and algorithms (there's much that isn't
specified in the RFCs and is left up to the vendor to decide). Cisco's online documentation
(UniverCD) isn't the best (it lacks a large number of case studies) but is a very good learning tool.
PC-compatibles using gated are either the second- or third-largest community of BGP-speaking
computers. You can build cheap PC routers that route Ethernet and T1 and have more than enough
CPU and memory to handle all the routes you'd need for quite some time - but you've then got
hardware that's not really as tested or reliable as a Cisco or Bay router. Trust me on this - the cost
savings is usually not worth doing it this way. (Apologies to Riscom and ET, the leading vendors of
T1 cards-for-PCs).
Bay routers are the second-largest community of BGP-speaking boxes - but we're talking about a very
small percentage of the number of BGP-speaking Ciscos out there. Bay is cheaper than Cisco; pretty
responsive to customers (though Cisco is as well); and almost all configuration is done through a GUI
(windowing) interface that drives most routing engineers nuts. Bay claims they're working on a
command-line interface (BCC, or "Blatant Cisco Clone"), but in the meantime most are throwing
money at Cisco. (It's much easier to debug BGP or other routing problems from a telnet session or
over the phone than it is to have to guide someone through a GUI to examine or reconfigure a router).
On the other hand, the Bays do have a better architecture and are finally showing themselves to be
more or less as stable as Ciscos. What I've seen of BCC looks quite promising, and I promise to
retract in print my slam of Bay when their command line interface looks featureful, fast, and solid.
We're going to talk about Cisco routers in these documents (and in this document in particular).
PEERING SESSIONS AND ASNs: PART I

There's a bunch of terminology associated with BGP. We already talked about Autonomous Systems
(ASs). An ASN, or Autonomous System Number, is just that - a number used to represent that
Autonomous System to the world. That number "identifies" your network to the world. Except for
Sprintlink, most networks out there use (or at least show to the world) only one ASN.
BGP-speaking routers exchange routes with other BGP-speaking routers via peering sessions. At a
technical level, this is what it means to "peer with someone". A snippet of a Cisco "BGP clause" is:
router bgp 64512
neighbor 137.39.10.46 remote-as 701
(omitted lines)
neighbor 207.106.127.122 remote-as 4969
(omitted lines)
The "clause" starts out by saying "router bgp 64512". This means "What follows is a list of
commands that describe how to speak BGP on behalf of ASN 64512". 64512 is also a "reserved"
number - it's a number in the "reserved" section of ASNs (ASNs go from 1-65535).
In order to bring up a "peering session", all you need is one "remote-as" line per neighbor. In this example,
137.39.10.46 is the remote IP address of a UUNET router (UUNET is ASN 701). Remote, that is,
with respect to the customer's router. 207.106.127.122 is the remote IP address of a Net Access router
(Net Access is ASN 4969). See Fig 1 for a diagram of the network layout used in this example.
In practice, however, you almost always use more than that one line to tell BGP how to exchange
routes with that "neighbor" via that "peering session". A typical "neighbor clause" is:
router bgp 64512

(omitted lines)
neighbor 207.106.127.122 next-hop-self
neighbor 207.106.127.122 send-community
neighbor 207.106.127.122 route-map prepend-once out
neighbor 207.106.127.122 filter-list 2 in
(omitted lines)
WHAT DO YOU DO WITH BGP?
Speaking BGP to your provider(s) and/or peers lets you do two things:
Make (semi-)intelligent routing decisions (decide what is the "best" path for a particular route to take
outbound from your network, as opposed to simply setting a default route from your border router(s)
into your provider(s)), and, more importantly,
Announce your routes to those providers, for them in turn to announce to others (transit) or just
use internally (in the case of peers).
PEERING SESSIONS
The purpose of the "neighbor" clauses is to bring up "peering sessions" with neighbors. For the
purposes of this document, all neighbors must be either on the other end of a leased-line from you -
or on a LAN interface (Ethernet, Fast Ethernet, FDDI). It is possible to have BGP peering sessions
that go over multiple "hops" - but "eBGP multihop" is a more advanced topic and has many potential
pitfalls.
Every time a neighbor session comes up, each router will evaluate every BGP route it has by
running it through any filters you specify in the "neighbor" clause. Any routes that "pass" the
filter are sent to the remote end.
While the session is up, "BGP Updates" will be sent from one router to the other each time one of the
routers knows about a new BGP route or needs to "withdraw" a previous announcement ("promise").
The "sho ip bgp summ" command will show you a list of all peering sessions:
brain.netaxs.com#sho ip bgp summ

BGP table version is 1159873, main routing table version 1159873
44796 network entries (98292/144814 paths) using 9596344 bytes of memory
16308 BGP path attribute entries using 2075736 bytes of memory
12967 BGP route-map cache entries using 207472 bytes of memory
16200 BGP filter-list cache entries using 259200 bytes of memory
Neighbor        V    AS MsgRcvd MsgSent  TblVer InQ OutQ Up/Down  State
205.160.5.1     4  6313       0       0       0   0    0 never    Active
207.106.90.1    4 64514 1145670  237369 1159873   0    0 4d03h
207.106.91.5    4 64515    6078    5960 1159869   0    0 4d03h
207.106.92.16   4 64512    6128    6782 1159870   0    0 4d03h
207.106.92.17   4 64512    5962    6894 1159870   0    0 10:08:46
206.245.159.17  4  4231  161072  276660 1159870   0    0 2d05h
207.44.7.25     4  3564    6109  310292 1159867   0    0 22:40:50
207.106.33.3    4 64513  164708  724571 1159866   0    0 3d23h
207.106.33.4    4  3564    6086  274182 1159853   0    0 4d03h
207.106.127.6   4  6078    5793  310011 1159869   0    0 2d03h
This is a session summary from one of Net Access's core routers. The 6451X ASes are BGP sessions
to other Net Access routers (using confederations, which we'll talk about in a future document) -
those ASNs are not shown to the world.
Most of it is pretty self-explanatory; briefly:
The "V" column is the BGP version number. If it is not 4, something is very wrong! BGP version 3
doesn't understand about Classless ("CIDR") routing and is thus dangerous.
The AS column is the remote ASN.
InQ is the number of messages from that neighbor waiting to be processed.
OutQ is the number of messages waiting to be sent to that neighbor.
The Up/Down column is the time that the session has been up (if nothing is in the State column) or
down (if something is).
Anything in the State column indicates that the session is not up. Note: A State of Active means that
the session is inactive. Just one of the nomenclature flaws of BGP.
More on all of this below.
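If you want to check a summary like this from a script, the column rules above can be sketched in a few lines of Python. This assumes the column layout shown in this document (nothing in the State column when a session is up); the function names are made up, and the three sample rows are copied from the listing above.

```python
SAMPLE = """\
205.160.5.1 4 6313 0 0 0 0 0 never Active
207.106.90.1 4 64514 1145670 237369 1159873 0 0 4d03h
207.106.92.17 4 64512 5962 6894 1159870 0 0 10:08:46
"""

def parse_summary(text):
    """Turn neighbor rows into dicts; skip anything that isn't a neighbor row."""
    sessions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 9 or not fields[1].isdigit():
            continue  # header line, blank line, or something unexpected
        sessions.append({
            "neighbor": fields[0],
            "version": int(fields[1]),
            "asn": int(fields[2]),
            "up_down": fields[8],
            # A tenth field is the State column; its presence means "not up".
            "state": fields[9] if len(fields) > 9 else "",
        })
    return sessions

def needs_attention(sessions):
    """Neighbors whose session is down or is not speaking BGP version 4."""
    return [s["neighbor"] for s in sessions
            if s["state"] or s["version"] != 4]

sessions = parse_summary(SAMPLE)
print(needs_attention(sessions))
```

With the sample rows, only 205.160.5.1 (State "Active", which means the session is not up) is reported.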
eBGP vs. iBGP

We're talking about eBGP in this document. eBGP and iBGP share the same low-level protocol for
exchanging routes, and also share some of the algorithms, but eBGP is used to exchange routes
between different Autonomous Systems, while iBGP is used to exchange routes within the same
Autonomous System. In fact, iBGP is one of the "interior routing protocols" that you can use to do
"active routing" inside your network. We'll talk more about iBGP in a future document when we
cover all of the major interior routing protocols: OSPF, iBGP, IS-IS, RIP, RIPv2.
The major difference between eBGP and iBGP is that eBGP tries like crazy to advertise every BGP
route it knows to everyone - you have to put "filters" in place to stop it from doing so. iBGP is
actually pretty difficult to get working because it tries like crazy not to redistribute routes - in fact, all
iBGP-speakers inside your network have to peer with all other iBGP "speakers" in order to make it
work. This is called a "routing mesh" and, as you can imagine, is quite a mess. If you have 20 routers,
each router has to peer with every other router. The solution to this is "BGP confederations", also a
topic for a future document.
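To get a feel for how quickly the mesh grows, here is a tiny Python sketch that counts the peering sessions a full mesh needs: every pair of routers needs one session, which is n*(n-1)/2. The function name is ours, not anything from a router.

```python
def full_mesh_sessions(routers):
    """Number of iBGP sessions when every router peers with every other router."""
    return routers * (routers - 1) // 2

for n in (3, 5, 10, 20):
    print(n, "routers ->", full_mesh_sessions(n), "sessions")
```

For the 20-router case mentioned above that works out to 190 sessions, and each of the 20 routers carries 19 "neighbor" clauses.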
Also, iBGP has major drawbacks as an IGP. The main one is the necessity to "peer up" every set of
routers in your network (or in one POP if you're using confederations). Protocols like OSPF and IS-IS
just "find" each other over serial and Ethernet interfaces (they're "broadcast" protocols). This can be a
pain (you don't want to accidentally merge your IGP with a customer's or peer's) but turning off
broadcasting on certain ports is easier than turning on peering sessions between a new router and
every other router on your network. Also, iBGP doesn't do as good a job at "convergence" (closing
the gap and re-routing around failed network segments) as OSPF and IS-IS.
BGP AND THE SINGLE-HOMED

When you have one upstream provider, it is rarely desirable to speak BGP to them. Why? Well, you
only have one path out of your network. So filling your router with 45,000 BGP routes isn't going to
do you any good, since all of those routes point to the same place (your one upstream provider).
And if you have one upstream provider, it's almost guaranteed that you are using sub-allocations
(CIDR delegations, to be precise) of their larger IP blocks ("aggregates"). In this case your provider is
not going to advertise your more "specific" routes because:
It's pointless to waste slots in thousands of routers around the world - if you are in your provider's
address space, other networks will get to you just as well by following the announcements of the
aggregate blocks as if they also saw your more specific routes being advertised. For example, if you
are using 207.106.96.0/20 out of your provider's 207.106.0.0/16 netblock, having the 207.106.96.0/20 route be
"out there" is redundant, since the 207.106.0.0/16 route covers that space as well. The only way to
reach you is going to be through your provider - whether the outside world sends a packet to that
provider based on a 207.106.0.0/16 or 207.106.96.0/20 route makes no difference - the packet still
goes to the same place. So the world would prefer not to see that 207.106.96.0/20 route, since it takes up an
extra slot in the global routing tables. (Hearing another "view" of a route takes up almost 10 times
less memory than hearing another route. And only a route of the same specificity can be
considered another "view" of a route.)
If there's always one and only one path to your network, your provider should always advertise your
routes (specific or in the aggregate) to minimize CPU consumption on routers world-wide due to
"route flap". Also, enough routers out there severely penalize you if your route(s) "flap" that you
want your provider to always advertise you (and thus not make internal instability reflect itself on a
global level). Why? If your T1 goes down and your provider is advertising you as 207.106.96.0/20,
they have to withdraw that routing assertion. If you go up and down enough times to flap, you'll be
"black-holed" from large sections of the Internet. But if you're behind 207.106.0.0/16, you won't be
black-holed unless your provider flaps their /16 announcement (which should in theory be less likely
- if it isn't, choose another provider).
AS-PATHS
Every time a route is advertised via BGP, it is "stamped" with the ASN of the router doing the
advertising. As a route moves from Autonomous System to Autonomous System (network to
network), it builds up an "AS-PATH". Each route starts out with a "null AS-PATH", represented by
the regular expression "^$". See Fig 1 - the blocks that show the routes as they move from hop to
hop show you the AS-PATH accumulating as the route moves from network to network.
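The "stamping" can be sketched in a few lines of Python. This is only a model of the bookkeeping, using the ASNs from the example network in this document (64512 for the ISP, 701 and 4969 for its two upstreams); the function names are made up.

```python
def advertise(as_path, asn):
    """AS-PATH as the neighbor hears it once `asn` has stamped the route."""
    return [asn] + as_path  # the newest ASN goes on the left

def show(as_path):
    """Render the path the way it is usually printed: ASNs separated by spaces."""
    return " ".join(str(asn) for asn in as_path)

origin = []  # the "null AS-PATH" a route starts out with (matches ^$)

# The ISP (AS 64512) advertises to each upstream, which advertises it onward.
via_uunet = advertise(advertise(origin, 64512), 701)
via_netaxs = advertise(advertise(origin, 64512), 4969)

print(repr(show(origin)))
print(show(via_uunet))
print(show(via_netaxs))
```

A network hearing both announcements sees "701 64512" and "4969 64512": the same route, reached through two different neighbors, with the originating AS on the right.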
The AS-PATH is useful for a number of reasons:
It provides a "diagnostic trace" of routing on the 'net. If you have "full routes" in one of your routers,
or have "query access" to a router that does (such as telnet://route-server.cerf.net), you can find the
route that encompasses a particular IP address and see which ASNs have advertised it. If you do some
poking around, you can even see how a provider is actually connected (as opposed to what they might
claim...)
It is one of a number of metrics that determine how routes "heard" via BGP are inserted into the
actual IP routing table.
It is something that allows you to do "policy routing" of sorts (though policy routing has many
different definitions, so watch out) - basically, you use the AS-PATH to filter routes. Why would you
want to do this? Perhaps you only want to take UUNET, MCI, and ANS routes from one provider
(because of limited memory in your router). Or perhaps you want to make sure you only send routes
originating in your network. There are many reasons (which will become clear as you read on) why
you'd want to filter based on the AS-PATH. While it's true that most filtering is now done with
communities (a community is another number which you can stamp on a route heard or to be
announced via BGP - we'll go into communities shortly), AS-PATH filtering is the best "first step" that
you can work with to get comfortable with filtering routes. And if your network is fairly simple (as
90% of the networks out there are), you won't need anything fancier for quite some time.
AS-PATH LENGTH AND BGP ROUTE SELECTION

For routes of the same specificity, as-path length is going to be the deciding factor in choosing which
of multiple routes gets used by the router (i.e. put into the IP routing table) when you're just starting
out.
See Fig 2 for a sample list of routes from an actual BGP routing table - and further explanation.
Notice, though, the >'s to the left of some of the routes. The ">" indicates the route that the router
currently thinks is "best" when there are multiple choices.
Fig 2.
A SNIPPET OF A BGP ROUTING TABLE
COMING SOON TO A TUTORIAL NEAR YOU.
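Until the figure shows up, here is a rough Python sketch of the "shortest AS-PATH wins" idea for routes of the same specificity. The candidate paths and the destination AS 64600 are invented for the illustration, and the sketch ignores every other attribute (and any tie-breaking) that a real router would look at.

```python
def pick_by_as_path_length(candidates):
    """candidates: list of (label, as_path). The shortest AS-PATH wins."""
    return min(candidates, key=lambda c: len(c[1]))

# Two hypothetical views of the same route, heard from two providers.
candidates = [
    ("via AS 701", [701, 1239, 64600]),
    ("via AS 4969", [4969, 64600]),
]

best = pick_by_as_path_length(candidates)
for label, path in candidates:
    marker = ">" if (label, path) == best else " "
    print(marker, label, " ".join(str(asn) for asn in path))
```

The two-hop path through AS 4969 gets the ">" because it is one AS shorter than the path through AS 701.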
AS-PATH ACCESS LISTS (FILTERS)

We'll use Cisco commands to illustrate AS-PATH filtering and "regexp matching". Each line of a
Cisco AS-PATH filter looks like:
ip as-path access-list NNN permit regexp
or:
ip as-path access-list NNN deny regexp
Where NNN is the number (same as the name in the case of as-path access-lists), and regexp is very
similar to Unix "regular expressions". (See Fig 3 for a summary of regexp characters, and the
O'Reilly and Associates Regexp book for more information about regular expressions).
Fig 3
Regexp characters:
NNN                match the characters NNN (where each digit of NNN is
                   from 0-9)
^                  match the beginning of a string
$                  match the end of a string
_                  match any of {space, beginning of a string, or end of a
                   string}
_NNN_              match the "word" or "distinct number" NNN. Thus, the regexp
                   "_1_" will match the string "3561 1 64000" but not "3561".
                   (The problem is that if you don't anchor NNN with "_"s on
                   either side, you might match something you don't really
                   want to).
(regexp)           enclosing a regexp in parens groups it, so that an operator
                   that follows it (such as *) applies to the whole regexp
*                  the * operator means that the previous regexp can be matched
                   0, 1, 2, or any number of times. To be safe, only use * in
                   conjunction with parens. Thus, (regexp)* matches the regexp
                   inside the parens 0 or any number of times.
[char1char2char3]  matches any one of char1, char2, char3, etc...
                   Each charN expression can be an actual number or other
                   symbol, or a range (i.e. 0-9, a-z).
If you want to match any of the special symbols, you can escape them by
putting a \ in front of them. The only special symbols you'll want to
escape when matching against AS-PATHs are the parens, which pop up in
AS-PATHs when you use BGP confederations.
We'll explore regular expressions and as-path access-lists by example. Remember the first rule of
Cisco access-lists: There's an implicit deny .* at the end of every access list. Even so, it never hurts
to add one just to be safe (we'll do that below).
Important note: On Ciscos, regexps are matched against the AS-PATH as if the whole thing is a
string, not a sequence of numbers. Thus, as you'll see below, you need to enclose ASNs within
underscores to be sure of matching only the ASN you're looking for.
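You can get a feel for the "whole thing is a string" behavior without a router. The Python sketch below imitates the "_" character with Python's re module, modeling it only as described in Fig 3 (a space, the beginning of the string, or the end of the string); a real Cisco's "_" matches a few more delimiters than this, and the helper names here are made up.

```python
import re

def cisco_to_python(regexp):
    """Translate "_" into: beginning of string, end of string, or a space."""
    return regexp.replace("_", "(?:^|$| )")

def matches(regexp, as_path):
    """True if the regexp matches anywhere in the AS-PATH string."""
    return re.search(cisco_to_python(regexp), as_path) is not None

print(matches("_1_", "3561 1 64000"))  # ASN 1 is a distinct word
print(matches("_1_", "3561"))          # no stand-alone 1 in this path
print(matches("1", "3561"))            # unanchored: matches the last digit of 3561
print(matches("^$", ""))               # the null AS-PATH
```

The third call is the trap: without the underscores, "1" happily matches inside 3561, which is almost never what you meant.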
How do access-lists work? When used as a filter, each route is passed through the access-list. Each
rule is listed in the order it will be applied. Once a route has been matched by any rule, the
decision on whether to pass the route through the filter or to drop it (and thus not let it pass) is
made immediately, and no further rules are processed.
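The first-match behavior, including the implicit deny at the end, can be modeled in a few lines of Python. This is a sketch of the rule-processing logic only: the rule lists are written as (action, regexp) pairs of our own invention, and the regexps used here contain nothing Cisco-specific, so Python's re module can evaluate them directly.

```python
import re

def evaluate(rules, as_path):
    """Return "permit" or "deny" for an AS-PATH string, first match wins."""
    for action, regexp in rules:
        if re.search(regexp, as_path):
            return action  # decision made; later rules are never looked at
    return "deny"          # the implicit "deny .*" at the end of every list

everything = [("permit", ".*"), ("deny", ".*")]
only_null_path = [("permit", "^$")]

print(evaluate(everything, "701 3561"))   # first rule matches
print(evaluate(only_null_path, ""))       # null AS-PATH matches ^$
print(evaluate(only_null_path, "701"))    # falls through to the implicit deny
```

Note that the second rule of the first list can never fire, because ".*" on the first line already matches every path.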
Example 1:
ip as-path access-list 1 permit .*

ip as-path access-list 1 deny .*
This is a good one to have around; it permits every route to flow through the filter. The "deny .*" is
completely extraneous to the filter - every route has already passed through the first line and the
second line is never actually used.
Example 2:
ip as-path access-list 2 deny .*
This is also a handy one to have around; you might well want to always remember the number of this
"deny everything" access-list - the opposite of the "permit everything" list above.
Example 3:
ip as-path access-list 3 permit ^$
ip as-path access-list 3 deny .*
This access-list is the other of the triad of ever-handy ones: It permits only routes that originate
within your AS (because of network statements or "redistribute" statements in "router bgp" clauses
somewhere within your network).
If you have these three as-path access-lists installed and remember their numbers you'll save yourself
a lot of time you'd otherwise spend searching online or through config files to find where you put
your "send everything"; "send nothing"; or "send only my routes" filter.
Remember: BGP between different ASNs (eBGP) will, by default, cause a router to redistribute every
BGP route that the router knows about. This could lead to VERY BAD THINGS happening. (If you
redistributed all of Sprintlink's routes into UUNET, a portion of UUNET could start sending all of its
Sprintlink traffic through your T1 and you'd hurt a reasonable chunk of the Internet. Both Sprintlink
and UUNET do things to prevent you from doing this, but you should always be paranoid when
dealing with BGP.)
Again, the "deny .*" rule is useless here, except as a safety precaution, since the router would insert
that rule anyway (remember, there's an implicit "deny .*" at the end of every Cisco filter list).
A quick note: For those playing with BGP confederations on your own (a topic we'll talk about in a
future document) note that your "permit internal routes only" filter might have to look somewhat
different ("permit ^$" will no longer be enough) - something like: "ip as-path access-list 30 permit
^(\([0-9 ]*\))*$". Or you'll be using BGP communities instead of AS-PATH filtering to control which
routes you redistribute. Everyone else please ignore this paragraph, unless you want to try to parse the
regexp above as an exercise.
For Examples 4 and 5, please consult Fig 4 for a list of common ASNs you'll see when examining
routes. To find out who "owns" an ASN (funny concept - owning a 16-bit integer), issue a WHOIS
query on "ASN NNN", where NNN is the ASN. Note: You may actually need to put quotes around
the "ASN NNN", especially if you're doing the whois query from a command line.
-----------------------------------------------------------------------------
Fig 4 Common ASNs
3561   MCI
1239   Sprintlink (Sprintlink also uses other ASNs, but 1239 will always
       appear somewhere in the AS-PATH when looking at Sprintlink
       routes from some other provider)
701    UUNET
174    PSI
1673   ANS (the old ANS ASN, 690, should be retired by now)
1      BBN
4200   AGIS (the old Net99 ASN, 3830, should be retired by now)
4969   Net Access (which will appear in the examples)
There are hundreds of ASNs in use in the Internet, and thousands of ASNs in use in internal networks
all over the world. If you want to take a look at live ASN info, check out
http://www.merit.edu/ipma/routing_table or telnet to route-server.cerf.net, a Cisco that cerf.net loads
with multiple full BGP routing tables.
-----------------------------------------------------------------------------
Example 4:
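ip as-path access-list 20 permit _1_
ip as-path access-list 20 permit _701_
ip as-path access-list 20 permit _174_
ip as-path access-list 20 permit _1673_
ip as-path access-list 20 permit _4200_
ip as-path access-list 20 deny .*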
The _NNN_ notation means "match NNN as a distinct word". This means that NNN must have
whitespace on either side of it (or must be the first or last word - or both - in the AS-PATH).
"_1_" would match "1"; "3561 1 6000"; and "3561 1" - but not "701". (ASN 1 is used by BBN, which
has a bit of history in the Internet...)
So - this as-path access list permits, in order, BBN, UUNET, PSI, ANS, and AGIS routes, and denies
all other routes. If you had a Cisco 2501, you might want to do this to accept some routes from one of
your providers in an attempt to load-balance traffic a certain way (perhaps you've noticed that
provider B gets better BBN connectivity than provider A...)
Example 5:
ip as-path access-list 20 deny _3561_
ip as-path access-list 20 deny _1239_
ip as-path access-list 20 permit .*
This filter denies any MCI or Sprintlink route, and permits all other routes. As of 4/97, this should
yield about 45,000 routes.
This will fill up a 2501 with absolutely all of the routes it can take and still function well. It used to
be that all routes on the 'net fit in a 2501 with 16MB - and that the 2501 could still function. Then, the
routes would fit in but the 2501 didn't have enough CPU. Now, all of the routes on the 'net except for
MCI, Sprintlink, or both will fit in a 2501 and still let it function at at least a single T1's worth of
throughput.
ENTERING, MODIFYING, AND DELETING as-path access-lists

The major reason we usually append an explicit "deny .*" at the end of as-path access-lists (actually,
all filter-lists in Ciscos) is that if you already have an as-path access-list of a certain number (say,
"as-path access-list 3" above), and you try to re-enter it, the Cisco has no way of knowing that you
want to delete the old list. It simply adds the lines you type to the end of the existing list.
So, as a security blanket, appending an explicit "deny .*" to a list ensures that you will at least not be
able to accidentally change an existing list's functionality by adding lines to it.
Let's say you had:
And then you configured (perhaps as a typo, perhaps as a brain-o):
You would alter the functionality of an existing filter list and potentially start redistributing
Sprintlink routes to your peers and/or upstream providers.
But if you had:

Then adding a third rule of:
Would have no effect, since every route would either be permitted or denied by the time the router
had finished evaluating the second rule (the "deny .*") and the third rule would never be looked at.
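The "first matching rule wins" behaviour described above can be sketched in a few lines of Python. This is only a model of the evaluation order, not Cisco's implementation; the example lists are hypothetical (they are not the ones from the original examples), and "_" is again approximated as "start, end, or a space".

```python
import re

def evaluate(rules, as_path):
    """Return the action of the first rule whose regex matches the AS-PATH.

    If no rule matches, the route is denied (the implicit deny at the end
    of every list).
    """
    for action, cisco_regex in rules:
        # Assumption: "_" is approximated as "start, end, or a space".
        pattern = cisco_regex.replace("_", r"(?:^|$| )")
        if re.search(pattern, as_path):
            return action
    return "deny"

# Hypothetical list: permit only locally originated routes (empty AS-PATH).
guarded = [("permit", "^$"), ("deny", ".*")]
# A rule entered later by mistake lands after the "deny .*".
guarded_plus_typo = guarded + [("permit", "_1239_")]
# The same mistake made on a list without the explicit "deny .*" guard.
unguarded_plus_typo = [("permit", "^$"), ("permit", "_1239_")]

for path in ["", "1239 3561", "701"]:
    print(repr(path),
          evaluate(guarded, path),
          evaluate(guarded_plus_typo, path),
          evaluate(unguarded_plus_typo, path))
```

With the guard in place the stray rule changes nothing; without it, routes with 1239 in the path suddenly become permitted.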
So, to modify an existing access list, either:
Enter a new list with a different number; modify the "router bgp" clause's "neighbor a.b.c.d filter-list
NNN in" clause by just typing "neighbor a.b.c.d filter-list new-number in" (use the same method for
outbound as-path filter-lists). Then, replace the old as-path access-list and change the "neighbor
a.b.c.d filter-list ..." clause back to its original state. This is the safe way to do things; or:
If you know what you're doing, you can just enter "no ip as-path access-list NNN" to delete the list,
then enter the new list (preferably via cut-and-paste or tftp, as opposed to simply typing the new list
in, since any filter that refers to that list will be in a "deny .*" mode until the new list is in place).
Please use the first method. If you have anything but "permit" clauses in your access-lists, you can do
damage (redistribute routes you shouldn't) by not using the first method.
BGP METRICS (ATTRIBUTES) AND ROUTE SELECTION: INTRODUCTION

First, remember the primary rule of IP routing: The most specific route always wins.
There are, however, rules for how a Cisco will select the "best BGP" route when there are multiple
BGP route possibilities of the same specificity.
It goes (basically):
Route specificity and reachability
BGP weight metric
BGP local_pref metric
Internally originated vs. Externally originated
AS-PATH length
BGP metric (MED)
BGP weight, MED, and local_pref metrics are just integers associated with each
route. They can be left at their defaults or set explicitly. Unless you set them yourself, it's unlikely that you'll
have to worry about them.
For "competing" BGP routes, the most likely way the router's going to pick the best route (if you
aren't playing games with weights) is by looking at the AS-PATH lengths.
BGP PATH SELECTION PROCESS ACCORDING TO CISCO

It is:
"BGP selects only one path as the best path. When the path is selected,
BGP puts the selected path in its routing table and propagates the path to
its neighbors. BGP uses the following criteria, in the order presented, to
select a path for a destination:
1. If the path specifies a next hop that is inaccessible, drop the update.
2. Prefer the path with the largest weight.
3. If the weights are the same, prefer the path with the largest local
preference.
4. If the local preferences are the same, prefer the path that was
originated by BGP running on this router.
5. If no route was originated, prefer the route that has the shortest
AS_path.
6. If all paths have the same AS_path length, prefer the path with the
lowest origin type (where IGP is lower than EGP, and EGP is lower than
Incomplete).
7. If the origin codes are the same, prefer the path with the lowest MED
attribute.
8. If the paths have the same MED, prefer the external path over the
internal path.
9. If the paths are still the same, prefer the path through the closest
IGP neighbor.
10. Prefer the path with the lowest IP address, as specified by the BGP
router ID."
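One way to picture the ten rules quoted above is as a sort key: drop unusable paths, then compare the rest field by field in rule order. The sketch below is a simplified model in Python, not Cisco's code. The Path class, its field names, and the example ASNs and router IDs are all made up for illustration, and it ignores details such as MEDs normally being compared only between routes from the same neighboring AS.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Path:
    name: str
    as_path: List[int]
    next_hop_reachable: bool = True
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    origin: int = 0          # 0 = IGP, 1 = EGP, 2 = Incomplete
    med: int = 0
    external: bool = True    # learned via eBGP rather than iBGP
    igp_metric: int = 0      # distance to the next hop inside our network
    router_id: str = "0.0.0.0"

def preference_key(p: Path):
    """Smaller tuples are better; each element mirrors one quoted rule."""
    return (
        -p.weight,                   # rule 2: largest weight
        -p.local_pref,               # rule 3: largest local preference
        not p.locally_originated,    # rule 4: originated on this router
        len(p.as_path),              # rule 5: shortest AS_path
        p.origin,                    # rule 6: lowest origin type
        p.med,                       # rule 7: lowest MED
        not p.external,              # rule 8: external over internal
        p.igp_metric,                # rule 9: closest IGP neighbor
        tuple(int(x) for x in p.router_id.split(".")),  # rule 10
    )

def best_path(paths: List[Path]) -> Optional[Path]:
    usable = [p for p in paths if p.next_hop_reachable]   # rule 1
    return min(usable, key=preference_key) if usable else None

via_a = Path("provider A", as_path=[701, 3561, 64700], router_id="10.0.0.1")
via_b = Path("provider B", as_path=[1239, 64700], router_id="10.0.0.2")
print(best_path([via_a, via_b]).name)   # shorter AS-PATH wins
via_a.weight = 100
print(best_path([via_a, via_b]).name)   # weight is checked before AS-PATH
```

The second call shows why setting a weight overrides everything further down the list, including AS-PATH length.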
In addition to the "core" data about a route (where in the IP space it starts; how long it is (the
"specificity"); and what the next hop is), there is other data embedded in BGP routes, most of which
is used either for route selection or as additional debugging information for humans.
Fig 8: BGP attributes
For more info, see:
RFC 2042: Registering New BGP Attribute Types

RFC 1997: BGP Communities Attribute
RFC 1773: Experience with the BGP-4 protocol
RFC 1771: A Border Gateway Protocol 4 (BGP-4)
To get an RFC, go to: http://www.internic.net/rfc/rfcXXXX.txt
BGP ATTRIBUTE TYPES

Value  Code              Possible Values
-----  ----------------  -----------------------------------------------
1      ORIGIN            0 (IGP); 1 (EGP); 2 (Incomplete)
                         This attribute specifies the origin of a route.
                         Straightforward except that "Incomplete" means
                         that the route got into BGP by redistribution from
                         an IGP.
2      AS_PATH           0-N 2-byte values
                         A list of the ASNs of all ASs the route has traversed.
3      NEXT_HOP          IP address
                         The most critical attribute; where to send data destined
                         for this route.
4      MULTI_EXIT_DISC   0 to 2^32-1
                         A weight; designed to go outside and inside of an ASN.
5      LOCAL_PREF        0 to 2^32-1
                         A weight; not designed to go outside of an ASN.
6      ATOMIC_AGGREGATE  TRUE/FALSE: If present, true; otherwise, false.
                         Present if this route was not the most specific one
                         known by the advertiser. Dangerous stuff.
7      AGGREGATOR        {ASN, IP address} pair
                         Data to indicate who formed the route if the route
                         is an aggregate of smaller routes.
8      COMMUNITY         0-N 4-byte values ("communities")
                         To be covered in a future document.
9      ORIGINATOR_ID     Used for BGP Route Reflection
10     CLUSTER_LIST      Used for BGP Route Reflection
The rules above are fairly straightforward, but use some of the route attributes that we'll be getting
into in more detail in the future.
Briefly:
(Rule 2)
If you don't set them explicitly, BGP weights are 32768 for routes originated by the local router, and 0 for routes
coming from other routers. The BGP weight is not actually an attribute (in that it's not redistributed
from one router to another as part of a BGP route update). A higher weight is "better" (means the
route will be preferred over a route with a lower weight).
(Rule 3)
The local_pref is a BGP attribute, and is set to 100 by default. Again, a higher value is better.
(Rules 2-3,5)
Setting weights and local_prefs gives you some control over "routing policy", but for beginners,
filtering based on AS-PATH data should be more than sufficient.
(Rule 6)
Origin isn't something you get to play around with. IGP means a route was injected into BGP with a
"network" statement; EGP means it was learned from the old EGP protocol (rarely seen today); and incomplete means it
was injected into BGP some other way, typically by "redistributing" from an IGP.
(Rules 7-8)
A MED (or "BGP metric") is Yet Another Weight you get to play with. We use MEDs internally at
Net Access to tune things (because we prefer to let the router first pick the route with the shortest
AS-PATH, and BGP weights and local_prefs are looked at before AS-PATH length). Again, you
typically won't be setting this until you have worked more with BGP.
(Rule 9)
If you run "active routing" internally (an IGP other than static routes), there's some notion kept with
each route of the "distance" for each route as it's passed around your network. Let's say you have two
border routers and you're selecting between two equal-specificity, equal-AS-PATH-length routes -
one from each border - and that no weights, local_prefs, or MEDs have been set. This rule ensures
that the router will do what is most natural - to send the packet towards the closest router of the two
routers advertising the route. We'll explain this more, with diagrams, in a future document, as it
involves an understanding of how IGPs such as OSPF and IS-IS function.
(Rule 10)
Now we're down to guessing. There has to be some tie-breaker, and since BGP router ID should be
unique, Cisco chose to make this the final factor.
For further reading, see for more details.
We'll be talking about using these metrics in the near future. If you want to experiment in the
mean-time, that document shows you how to set these metrics. Please experiment first on test or lab
networks! If you've got proper filters in place, experimenting with these things won't affect the
outside world - but it could make your customers very unhappy...
Another very big caution: BGP weights and local_prefs are very powerful. Realize that if you
advertise routes for a customer that you hear via BGP, you could wind up preferring an external route
for that customer if you set the BGP weight or local_pref too high (or at all) for external routes. The
customer won't like this - if you prefer an external route for that customer, you're not going to
advertise them to your transit providers any more, which will probably not please that customer...
EGP vs. IGP

EGP usually means "Exterior Gateway Protocol". IGP usually means "Interior Gateway Protocol",
though it can get confusing, because different people and vendors use different terminology for the
same thing. Since I am a Cisco proponent, these documents use terminology used by the routing
community, with a Cisco dialect.
Routers which route IP packets have to have an "IP routing table". In that table are one or more routes
of a particular {starting point, length, metric}. This IP routing table gets filled with routes heard from
various sources - or configured statically (in the router's configuration store). BGP routes migrate into
the IP routing table only if:
They are more specific than any other route of "lower preference"; or
They are the only route of a particular specificity.
Here's a brief outline of the "order of preference" for filling the IP routing table. The exact order can
be found in the Cisco documentation.
Connected routes (IP addresses and routes of router interfaces) first; then
Static routes (routes configured in router configurations with 'ip route' statements); then
Routes learned via an IGP (RIP, RIPv2, OSPF, IS-IS, ...); then
Routes learned via BGP and other EGPs.
One note, though: Since static routes are really considered an "IGP" routing mechanism, there are
ways to get other IGP-learned routes (say, via OSPF) to be preferred over static routes, but again - if
you don't play with weights, this shouldn't be a worry.
WHAT IS ROUTE FLAP AND WHY IS IT BAD?

When you "assert" a route - saying "I know how to get to 192.204.4.0/24" based on some internal
knowledge that you actually do know how to get to 192.204.4.0/24 - the natural (and
previously-thought-to-be-correct) thing to do is to "withdraw" that assertion if you in fact no longer
know how to get to 192.204.4.0/24.
But look at what happens when you withdraw that assertion. Your provider(s) must then also
withdraw that assertion. And then their provider(s) and peer(s) must do the same. All in all,
thousands of routers around the world now have to look at that route and decide if they have a
next-best path in their BGP (or other routing) table, and insert it as the current best path in their IP
routing table. This consumes many CPU-seconds on routers that are sometimes very busy.
In fact, it was consuming so much CPU time a few years ago that Sean Doran of Sprintlink said "this
must stop" and a few people came up with an idea (which Cisco implemented in record time) to
"damp"(en) the "route flap"s. You'll hear people say "damp" and "dampen". There's no real consensus
about which is the correct term.
What this means in practice today is that if your routes flap more than one or two complete
up-down-up cycles, you will be dampened by many providers for at least an hour or so. So even if
you're only "single-homed", you will be dampened if your provider withdraws your routes every time
your t1 flips up and down a few times because some Bell guy tripped over a wire.
So do not ask your upstream provider to announce you unless it makes a difference (the benefit of
being multiply-announced outweighs the possible negative effects of being dampened due to
instability in either your or your provider's network).
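To get a feel for how quickly a few flaps add up, here is a rough sketch of the usual damping idea: every flap adds a fixed penalty, the penalty decays exponentially, and the route is suppressed while the penalty is above a threshold. The numbers below are assumed, illustrative parameters (each provider picks its own), and the function names are made up for this example.

```python
import math

# Assumed, illustrative parameters (not taken from this document).
PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000
REUSE_LIMIT = 750
HALF_LIFE_MIN = 15.0

def penalty_after(flap_times_min, now_min):
    """Sum of per-flap penalties, each decayed exponentially since the flap."""
    return sum(
        PENALTY_PER_FLAP * 0.5 ** ((now_min - t) / HALF_LIFE_MIN)
        for t in flap_times_min
        if t <= now_min
    )

def minutes_until_reuse(penalty):
    """Time for a penalty to decay down to the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)

flaps = [0.0, 1.0, 2.0]          # three flaps, one minute apart
p = penalty_after(flaps, 2.0)
print(f"penalty {p:.0f}, suppressed: {p > SUPPRESS_LIMIT}")
print(f"usable again in about {minutes_until_reuse(p):.0f} minutes")
```

With these assumed values two quick flaps stay just under the suppress limit and a third pushes the route over it; a provider using a longer half-life or a lower reuse limit would keep you suppressed for longer.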
WHAT TO KEEP IN MIND WHEN CONFIGURING BGP

When you're bringing up a new BGP session, or considering how to do BGP in general, the things to
keep in mind for each peer are:
What routes do you want them to hear? Do you want to "tune" your announcements somehow (more
on this later)? The most important thing is to ensure that you do not redistribute routes that you are
not providing "Internet connectivity" to; and
What do you want to do with the routes that you hear via the session? Do you want to "tune them"?
Only take some? Take them all? Do you have the memory and CPU in your router to really do what
you want?
BGP AND PEERING

Actually, we'll devote a whole document to this in a month or two.
What we're talking about in this document is BGP and transit - getting global transit from upstream
providers as opposed to peering, which is just mutual sharing of customer routes.
INTERNET CONNECTIVITY WITHOUT BGP

Let's review what happens when you are connected to the Internet without speaking BGP to your
provider.
You create a default route towards your upstream provider, and all non-local packets go out the
interface specified by the route; and
Your provider probably put static routes towards you on their side, and redistributes those static
routes into their IGP, and then probably redistributes their IGP into BGP - unless all of their BGP is
done statically (more on this in a future document).
Basically, if you have any address space "inside" of your provider's larger "netblock" or "aggregate",
you won't be advertised to the outside world specifically - your provider will just advertise their
larger block. If you have any other networks (an old Class C; customers with address space; etc...)
your provider will just statically announce those routes to the world and statically route them inside
their network to your leased-line/router interface(s).
With BGP, your provider gives you all of the routes they have (the easy part), and listens to your
route announcements and then redistributes some or all of those to their peers and customers. This is
the hard part (for them - just worry about understanding and configuring your end for now). The net
difference is "just" that they may start advertising a more specific route (no mean task in a
complicated network designed, as most networks are, to prevent the accidental "leaking" of more
specific routes) or that the routes that they normally advertise for you under just their ASN will now
have your ASN attached as well.

If you've only got one upstream provider, why speak BGP to them? Well, you could say "practice",
but in general, no upstream provider's going to waste their time configuring BGP with you (since it
generally involves a fair amount of behind-the-scenes work on their part) unless you have a good
reason.
And you don't really need "full routes" so that you can "run defaultless" if you're single-homed. Since
every packet destined for the Internet (as opposed to your internal network) is going to go out the
same router interface, it doesn't matter whether it's via one default route or via searching a list of
45,000 or more routes heard via BGP.
The only really valid reason is that you want to be able to have more control in advertising your
routes. Of course, you'll have to argue around the flap argument even if you have your own
provider-independent address space (if you're singly-connected to the 'net, why bother all of the
routers in the world by telling them whether you're currently reachable or not?) and the routing-table
space argument (if you're in your provider's IP space or "aggregate announcement", why pollute the
routing tables with an extra few routes by announcing your routes more specifically?).
You're on your own for the answers to these questions. If you think you have a good case, either talk
to your current or potential provider, or perhaps send a question off to the inet-access list and see if
anyone can help.
If you do want to configure BGP and are single-homed, follow the instructions on how to announce
your networks (routes), and either filter all incoming routes - or accept them if you feel you really
want to.
BGP AND THE MULTI-HOMED

OK, so you're multi-homed. What is the most important thing about BGP to you? The ability to have
it announce routes. Getting "full" or "partial" routes from your providers is "cool" - and may even be
useful - but you can do almost as well by just load-balancing all outgoing traffic in either a
"round-robin" or "route-caching" manner. (More on this later in this document).
So the most important thing about being multi-homed is the ability to have your routes advertised to
your providers - and by them to their providers and peers (i.e. to "the rest of the Internet"). Doing this
basic level of route advertisement is not hard. You just have to do it in a paranoid way.
If you screw up BGP routing you may get slapped down pretty hard. Screwups with BGP route
advertisements can be felt all over the Internet. To repeat: Screwups with BGP route
advertisements can be felt all over the Internet. If your provider is smart, they will also implement
"filters" to prevent you from screwing them and the Internet up. But don't count on it.
If you were to announce a route that was more specific than, say, the otherwise-best route for Yahoo's
web servers, you would black-hole Yahoo for a period of time. Needless to say, they would not be
very happy with you. The solution is to do good filtering on your end - and for your provider to also
do excellent filtering wherever possible.
Before you start playing with BGP, you might really want to wait and read the "Configuring a Cisco
Router" document (also coming out in the next few months). If you do go ahead and are
implementing BGP for the first time, get a friend or another provider to review your proposed configs
for you before implementing them. And for a summary of BGP-related Cisco commands, see the BGP
Cisco Commands sidebar.
MULTI-HOMING AND LOAD-BALANCING

Generally, the goal of multi-homing is to use both connections in a sane manner and "load-balance"
them somehow. Ideally, you'd like roughly half the traffic to go in and out of each connection. You'd
also like "fail-over" routing, where if one connection goes down the other one keeps you connected to
the Internet. In an ideal network, you'd be able to have any one of your connections to the 'net go
down and still maintain connectivity and speed.
We'll talk a bit about how you load-balance incoming and outgoing traffic to and from your network.
Incoming traffic is controlled by how you announce your routes to the world (packets will flow into
your network because someone out there heard and is using a route announcement). Outgoing traffic
is controlled by the routes that you allow to flow into your border router(s) - and is thus much easier
to control and tune.
HOW TO ANNOUNCE YOUR NETWORKS

We'll now describe the safest way to announce your routes via BGP.
There are many other ways, some of which we'll talk about in future documents. The way we at Net
Access do it is by redistributing from our IGP (IS-IS), through a filter list, into BGP. While we do run
BGP inside our network, it's strictly to pass external route announcements through the various parts
of our network - no internal routes are ever passed from one of our routers to another one of our
routers with BGP. But when we first started speaking BGP, we set our routers up the way described
below.
You'll always set "next-hop-self" on all peering sessions. See the sidebar on next-hop-self for an
explanation.
The safest way to announce your routes with BGP is to configure everything statically. You can think
of the process described below as turning networks into route announcements.
To do this:
For each network, add a static route pointing to the interface Loopback0 with a weight higher than any other static route
for that network (higher numbers for static route weights mean that the routes are less preferred).
Configure a router BGP clause like the one below, with static network statements to announce your
routes, and "sanity filters" in place to make sure you only announce your routes and only take the
routes you want.
For example, let's say you're routing the following networks (also called "netblocks" sometimes):
170.100.0.0/16 (a /16 has a netmask of 255.255.0.0)
192.204.44.0/24 (a /24 has a netmask of 255.255.255.0)
206.8.128.0/17 (a /17 has a netmask of 255.255.128.0)
207.126.0.0/18 (a /18 has a netmask of 255.255.192.0)
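If you'd rather not convert prefix lengths to netmasks by hand, Python's standard ipaddress module can do it. The sketch below prints each of the netblocks above in the "network ... mask ..." form that a router bgp clause expects; the helper name network_statement is made up for this example.

```python
import ipaddress

netblocks = ["170.100.0.0/16", "192.204.44.0/24", "206.8.128.0/17", "207.126.0.0/18"]

def network_statement(cidr: str) -> str:
    """Turn CIDR notation into the "network ... mask ..." form."""
    net = ipaddress.ip_network(cidr)   # raises ValueError if host bits are set
    return f"network {net.network_address} mask {net.netmask}"

for block in netblocks:
    print(network_statement(block))
```

A side benefit is that ip_network() rejects a block whose base address doesn't line up with its length (for example 206.8.1.0/17), which is an easy typo to make.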
You'd first configure your router with:
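interface Loopback0
description Loopback interface for routes to be nailed to.
ip route 170.100.0.0 255.255.0.0 Loopback0 10
ip route 192.204.44.0 255.255.255.0 Loopback0 10
ip route 206.8.128.0 255.255.128.0 Loopback0 10
ip route 207.126.0.0 255.255.192.0 Loopback0 10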
Then:

router bgp 64512
network 170.100.0.0 mask 255.255.0.0
network 192.204.44.0 mask 255.255.255.0
network 206.8.128.0 mask 255.255.128.0
network 207.126.0.0 mask 255.255.192.0
neighbor a.b.c.d remote-as NNN
neighbor a.b.c.d next-hop-self
neighbor a.b.c.d filter-list 3 out
neighbor a.b.c.d filter-list 2 in
Explanation:
This method "statically nails down" the route announcements being advertised with the "network"
statements. In order to nail them down, there must be: (1) Underlying static routes with the same
netmask as each route being advertised with a network statement; and (2) Those underlying static
routes must not go away. The purpose of the Loopback0 routes is to ensure that even if an existing
primary route which matches the netmask of the route being announced (and this is often not the
case) goes away, the Loopback0 route (with a weight of 10, which means it's only a "backup" route to
any route without a weight at the end) will kick in and keep the BGP route advertisement stable.
(Loopback0 routes always stay installed since there's no physical interface to go down and cause the
route to be withdrawn - the interface Loopback0 will always be up, so the routes pointed to them will
always be installed.)
This example uses a "deny everything" incoming filter, so it will only announce routes - it won't
accept any. If you want to accept all incoming routes, replace the "filter-list 2 in" with "filter-list 1
in". Actually, you could just not specify an "inbound as-path filter" - and the effect would be the same
- but it's better by far to be explicit about these things.
To add more peers, just create another similar neighbor statement. Ciscos give you 30 seconds to
finish typing the neighbor statement before they start trying to establish the session. It is critical that
you get those "neighbor somebody filter-list xxx .." statements in there by then. The best way by far
to do it is to either cut and paste or tftp in a complete neighbor statement to the router.
Here's an example of a completely filled-in bgp clause, based on the example above (note that
64512 is a fictitious ASN, taken from the range reserved for private use).
router bgp 64512

network 170.100.0.0 mask 255.255.0.0
network 192.204.44.0 mask 255.255.255.0
network 206.8.128.0 mask 255.255.128.0
network 207.126.0.0 mask 255.255.192.0
neighbor 207.106.127.45 filter-list 3 out
neighbor 137.10.10.121 filter-list 3 out
BEING ADVERTISED BY MULTIPLE PROVIDERS WITHOUT PI-SPACE

Remember April 1997's document on getting provider-independent (PI) space? The reason it's so
important to have "your own" ip space is that without it multi-homing is quite tricky and requires a
lot of cooperation from your original provider. Why?
Let's say you are using 207.106.96.0/20. Your provider (let's call him oldprovider) has
207.106.0.0/16. So oldprovider announces only 207.106.0.0/16 to the world. There is no
advertisement for 207.106.96.0/20 in this case - any packet destined to 207.106.96.0/20 will be
picked up by the less specific (more general) route 207.106.0.0/16.
Now you want to multi-home. So you buy a T1 from newprovider. You set up BGP with both
oldprovider and newprovider. Suddenly, the world sees two routes for you:
207.106.0.0/16, advertised by oldprovider; and 207.106.96.0/20, advertised by newprovider.
Remember, the most specific route always wins, so newprovider will wind up carrying almost all, if not
all, of your incoming traffic! In fact, certain parts of oldprovider's network may actually prefer
newprovider's t1 to get to you!
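The effect is easy to reproduce with a toy longest-prefix lookup. The sketch below uses Python's ipaddress module and the two announcements from this example; the function name and the test addresses are made up for illustration, and a real router does this lookup in its forwarding table rather than by scanning a list.

```python
import ipaddress

# The two announcements the rest of the world sees in the example above.
announcements = {
    "207.106.0.0/16": "oldprovider",
    "207.106.96.0/20": "newprovider",
}

def chosen_provider(destination: str) -> str:
    """Pick the announcement with the longest prefix that covers the address."""
    addr = ipaddress.ip_address(destination)
    covering = [
        ipaddress.ip_network(prefix)
        for prefix in announcements
        if addr in ipaddress.ip_network(prefix)
    ]
    best = max(covering, key=lambda net: net.prefixlen)
    return announcements[str(best)]

print(chosen_provider("207.106.100.1"))   # inside your /20
print(chosen_provider("207.106.1.1"))     # elsewhere in oldprovider's /16
```

Every address inside your /20 resolves to newprovider, no matter how short oldprovider's AS-PATH is, because specificity is checked before anything else.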
The problem is that most large-ish providers use something called "aggregate-address statements" -
and they certainly have some sort of filter to keep the more specific routes floating around inside of
their networks from being advertised to the world. Remember, the world only wants to hear about
207.106.0.0/16 if the little, more specific routes inside of 207.106.0.0 are not multi-homed.
So what does oldprovider have to do? Blow holes in their "filter". One way or another, it's going to
take modifications in oldprovider's 'border' routers to make incoming load-balancing work properly
for you - and oldprovider may not want to do this. Basically, everywhere that oldprovider peers with
anyone else (and this is usually at least 5-10 places), they have to modify their aggregation statements
or other filters to "allow" your more specific route announcement to pass through.
This is why it's important to choose a primary provider based on how cooperative they'll be when you
want to multi-home.
CONTROLLING OUTGOING DATA FLOW: "FULL ROUTING"

Believe it or not, you don't need BGP to balance the flow of traffic from your network (outbound
traffic). There are many arguments for and against, but it's true that if you are multi-homed and have a
sufficiently studly router (a Cisco 4500, 4700, 70x0, 720x, or 75xx will do, but Cisco 4000s and
2501s will not), accepting full BGP routing from your multiple providers is a Good Thing. See the
sidebar for an explanation of how to balance outbound traffic without BGP.
There are a couple of reasons. First, each provider obviously knows best the way to get to its
customers. Meaning, if you're multi-homed to Sprintlink and UUNET, you always want to send data
to Sprintlink customers out your Sprintlink T1 and data to UUNET customers out your UUNET T1.
Second, though AS-PATH length is a pretty poor selection tool, it's what we've got right now - and it
does bear some relation to an indicator of how "close" a given provider is to some other provider.
So filling your router with routes from all of your upstream providers means that, for routes of the
same specificity, AS-PATH length will decide which one actually gets used. See Fig 7 for examples
and explanation.
CONTROLLING OUTGOING DATA FLOW: "PARTIAL ROUTING": "CUSTOMER ROUTES ONLY"

If you can't take full routes from your providers, you're going to have to either not use BGP to balance
outbound traffic - or take less than full routes.
The minimum set of "less than full" routes you'll want to take is customer routes from each provider
(from each provider, get only the routes for them and their customers). This is a problem if your
providers include Sprintlink and MCI, however, since Sprintlink and MCI customer routes together
are such a large percentage of "full routes" that you can't really put Sprintlink and MCI routes in
Cisco 2501s or 4000s either. You should, however, be able to put Sprintlink and any other few sets
of customer routes or MCI and any other few sets in even a 2501 or 4000.
The problem is getting just customer routes (also called "peering routes"). You can tell your
providers to only send you customer routes - and most providers that do a significant amount of BGP
can do this pretty easily - but if any one of your providers screws up (changes a filter list slowly, for
example) then they may blast more than enough routes at you to "melt your router". Unfortunately,
when many brands of routers (Ciscos included) run out of memory, they don't just shut down BGP
routing - or crash and restart. Ciscos, in particular, do not handle running out of memory gracefully at
all, and will gleefully consume so much memory with routing data that basic command functionality
gets trashed and someone needs to physically power cycle the router.
SO WHAT'S TO BE DONE?
Get customer routes from your providers - but put sanity filters in place to protect yourself. For each
provider, build an as-path access-list to use as a filter of what you will not accept from them. Let's say
you're triply-homed to Sprintlink, UUNET, and Net Access. Use something like the following:
(Ciscos use ! at the beginning of a line to denote a comment line.)
! Filter everything but Sprintlink (ASN 1239) from Sprintlink

! Filter everything but UUNET (ASN 701) from UUNET
! Filter the major providers from Net Access
router bgp 64512
neighbor a.b.c.d remote-as 1239

That will ensure that even if Sprintlink, UUNET, or Net Access screw up and blow you all of the
routes they know about, you'll still take their customer routes but won't take the vast majority of
other routes from them. Together, Sprintlink, MCI, UUNET, ANS, PSI, BBN, and AGIS make up the vast
majority of routes - well over 80-85% of the routes out there.
Note: If you're a Sprintlink customer, you'll probably be peering with AS 179x - or at least some ASN
other than 1239. Sprintlink uses ASNs for each major POP (as do many other providers) - but unlike
other providers, these ASNs are visible to the outside world. Any non-Sprintlink customer route
(any route from the outside world) will still have the ASN 1239 (which is Sprintlink's
"peering" ASN) in the AS-PATH, though. The bottom line is that instead of 1239 in the "remote-as" statement you'll have
whatever ASN Sprintlink actually has you peer with.
AS-PATH PADDING
Some people just aren't content to leave things the way nature intended them. Bored routing
engineers are very dangerous. If you don't give them work to do they'll either sit and read news or
Cisco documentation - or start optimizing ("tuning") routing.
AS-PATH padding is probably the most widely-used BGP tuning method, and we'll go into it in more
detail next month.
Basically, if you make sure not to set weights or local_prefs, AS-PATH length is going to decide
which of multiple BGP routes of the same specificity will be preferred. So if you want to make one
path preferred or another one not preferred, you can "pad" the AS-PATH with extra ASNs to make
one path look longer than another. This is done with route-maps, which we'll talk more about next
month.
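Here is a small sketch of what padding does to the AS-PATHs the outside world sees. It is plain Python rather than router configuration, and it assumes a site with the fictitious ASN 64512 that is homed to Sprintlink (1239) and UUNET (701) and pads only its UUNET announcement; the function names are made up for this example.

```python
def announce(origin_asn: int, pad: int = 0) -> list:
    """AS-PATH as sent to a provider: our ASN, repeated pad extra times."""
    return [origin_asn] * (1 + pad)

def path_seen_by_world(provider_asn: int, our_announcement: list) -> list:
    """The provider puts its own ASN in front when passing the route on."""
    return [provider_asn] + our_announcement

OUR_ASN = 64512   # the fictitious ASN used earlier in this document

via_sprintlink = path_seen_by_world(1239, announce(OUR_ASN))
via_uunet = path_seen_by_world(701, announce(OUR_ASN, pad=2))

for name, path in [("Sprintlink", via_sprintlink), ("UUNET", via_uunet)]:
    print(f"{name:10} {' '.join(map(str, path))}  (length {len(path)})")
```

A remote router that hears both routes, and has no weights or local_prefs set, will prefer the shorter Sprintlink path, which nudges incoming traffic towards that link.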
QUESTIONS AND COMMENTS

I expect that this document will generate a lot of questions. Please do not send them to
freedman@netaxs.com. Please use either the inet-access list, which I and many of my routing-geek
friends patrol regularly, or bgp@netaxs.com. Thanks.
THANKS TO
In no particular order:
Thanks to Alexis Rosen at Panix (alexis@panix.com), who sent me some last-minute suggestions for
clarification and pointed out an ugly factual error. Thanks to John Hawkinson (jhawk@panix.com) of
BBN, who told me about something new called BGP in 1993 at a Science Fiction convention in the
DC area. Thanks to Dave Siegel (dsiegel@rtd.net) who's shared his BGP experience with others since
1995. And thanks to Alec Peterson (ahp@hilander.com) for reviewing this document - and who
explored some of the more advanced BGP features (oh, the joy of route-maps) using my network
when I didn't have the time.
Sidebar on next-hop-self
If you've followed the "peering and transit" discussions, you may have heard of the "next-hop-self
issue". Here's the problem.
Ciscos keep the originating address of a route intact in the next-hop field when they pass it from
eBGP peer to eBGP peer. (And ditto for iBGP, but we're talking about eBGP here). It turns out that
this behavior is sometimes useful in large networks where there's an IGP running to tell every router
which way to send a packet that says it came from 192.41.177.x (some other provider's MAE-East
router); 192.157.69.x (some other provider's Pennsauken router); etc...
But this is really subtle and can screw you up big-time. In the best case you'll piss someone off (if you
forget to set "next-hop-self" in an exchange-point peering environment). In the worst case you'll cause
routing loops for yourself (examples of this will be given when we talk more about IGPs).
Setting next-hop-self causes a Cisco to override the originating address of a route and stamp instead
its own address as the "next-hop" part of the route.
Remember that the critical parts of a route are: What the base IP address is; how big the route is (the
specificity or netmask); and what destination (next-hop) to use to send data to the IP space
represented by the route.
We'll use an exchange point environment to illustrate next-hop-self. Refer to the figure (XXX) below.
When AS 4969 advertises 250.20.0.0/16 to AS 64500, AS 4969 sets next-hop-self, so the next-hop is
192.41.177.87 (AS 4969's mae-east IP address).
Now, AS 64500 advertises it to AS 64600 (see the top diagram) without next-hop-self. When AS
64600 processes the route and installs it into the IP routing table, the next-hop used will be
192.41.177.87.
But AS 64600 doesn't peer with AS 4969 - yet it's going to send data to a route advertised by AS
4969 - right to AS 4969's router. People generally do not like this. In this case, AS 4969 might
discover this "behavior" by running a few careful probes of other routers at mae-east. AS 4969 would
then look to see how it hears AS 64600 (who is announcing AS 64600 to AS 4969) and see if they're
the culprits. If AS 4969 really wants to, it can find out who the culprit is by passing a bogus route or
two to each peer in turn, and seeing when AS 64600's router starts using the bogus route.
The solution is for 64500 to use next-hop-self as well (see the bottom diagram). In this case, the route
as heard by 64600 has 192.41.177.NNN (AS 64500's mae-east IP address) in the next-hop field -
though the AS-PATH and certain other fields still show that AS 4969 is the origin of the route. So
when AS 64600 wants to send data to AS 4969 based on this route it'll "bounce the traffic off of" AS
64500's router. Some people don't even like this (since it's a form of providing service to downstream
customers over the "shared medium" of the exchange-point switches), but it's not going to be as
strenuously objected to as not using next-hop-self.
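As a rough sketch, AS 64500's side of this might look something like the following on a Cisco. The
address 192.41.177.87 is AS 4969's mae-east address from the example above; the address
192.41.177.200 for AS 64600's router is made up purely for illustration, and a real configuration
would carry filters and other statements not shown here.

```
router bgp 64500
 ! eBGP session to AS 4969 at the exchange point
 neighbor 192.41.177.87 remote-as 4969
 neighbor 192.41.177.87 next-hop-self
 ! eBGP session to AS 64600 (hypothetical address, for illustration only)
 neighbor 192.41.177.200 remote-as 64600
 ! stamp our own address as the next-hop on routes we send to AS 64600
 neighbor 192.41.177.200 next-hop-self
```

With the last line in place, the routes AS 64600 hears from AS 64500 carry AS 64500's own mae-east
address in the next-hop field rather than 192.41.177.87.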
Sidebar on Outgoing Data Flow Control Without BGP
Without BGP, your only way to send data out (and the way 90% or more of the ISPs out there run
their networks) is to default route into your provider(s).
Any packet not destined to the inside of the ISP's network will then hit the "wildcard", or "default"
route, and be sent out the router interface towards the provider(s).
There are a few ways you can do this.
Outgoing Data Flow: Option 1
Option 1 is to default to one provider and install a "backup default" to your other provider. On a
Cisco, this is done with:
ip route 0.0.0.0 0.0.0.0 Serial0

ip route 0.0.0.0 0.0.0.0 Serial1 10
This says: "The default route (0.0.0.0/0, or 0.0.0.0, netmask 0.0.0.0) goes out Serial0 with a
weight of 0 (if you don't put a 4th field in an "ip route" statement on a Cisco, it'll assume a weight
of 0)." "Another default route is out Serial1, with a weight of 10". (Cisco's documentation calls this
4th field the administrative distance; the route with the lower value is preferred.)
If you do it this way, the route with the lower weight will be the one in use while Serial0 is up. If
Serial0 goes down for some reason (actually, if the "line protocol" on Serial0 goes down), the route
will be invalidated and will go away, so the Cisco will look for the next-best route, which will be the
route through Serial1. Even though it has a higher weight (and so is less preferred), it's the only valid
route left to consider, so it'll "win".
Outgoing Data Flow: Option 2
Option 2 is to default equally to both providers. However, there's a catch. If you just do:
ip route 0.0.0.0 0.0.0.0 Serial0
ip route 0.0.0.0 0.0.0.0 Serial1
You will almost certainly not be happy with the result! Unless "ip route-cache" is set on the
interfaces in question, the Cisco will simply "round-robin" outgoing packets, sending packet N out
Serial0 and packet N+1 out Serial1. Why is this bad? Well, if you are sending data to site X, and site
X is on Provider A's network (and let's say that Provider A is at the other end of Serial0), data sent to
site X out Serial0 may arrive in 10ms. Data sent to site X out Serial1 may arrive in 30-100ms. This
means that packets 1 and 3 could arrive before packet 2 in a pathologically worst-case scenario.
Or even packets 1, 3, 5, and 7 could arrive before packet 2 does. This kind of out-of-order (or even
worse, packet-lossy) performance spells doom for IP traffic.
The fix is easy, however:
int Serial0
ip route-cache
int Serial1
ip route-cache
Note, though, that if you are using any Cisco bigger than a 2500 series, the "ip route-cache" command
might be "ip route-cache cbus" or "ip route-cache optimum" or some other command.
And actually, many Ciscos come pre-configured with "ip route-cache" set on all of the interfaces - but
even so, it doesn't hurt to be explicit.
If you do this, the Cisco will keep a cache of all destinations you're sending packets to, and will "lock
in" each destination to one specific interface. In general, this method leads to decent load-balancing
(in the 40/60 to 50/50 split range). The worst case in this scenario is not IP degradation, but poor use
of your additional bandwidth (which can, of course, lead to IP degradation if you need your second
outgoing pipe because your first has a tendency to get full). Anyway, this kind of load-balancing
works pretty well and is what people use when they can't accept "full BGP routes" from multiple
providers.
TO BE DONE
aggregate-address
transit
bgp and peering
bgp: the provider's side: filtering
as-path padding
sync