This material is copyrighted and licensed for the sole use by Nebojsa Marjanac (Nebojsa.Marjanac@mtel.ba [81.93.84.66]). More information at http://www.ipSpace.net/Webinars
CONTENT AT A GLANCE
FOREWORD
INTRODUCTION
FOREWORD
Ivan asked me to write the intro for his latest book on Software Defined Networking and I'm a bit
mystified why. Granted, he's like the control plane to my forwarding plane. The brilliant technical
insights I've gathered from Ivan's web site and webinars have provided me with valuable content
and creative inspiration ever since I first discovered it. In fact, I almost feel like I'm cheating at my
job. Every time I clarify SDN in a conversation with, "It's the decoupling of the logical from the
physical," I want to insert a footnote referencing him.
I remember the first time I heard him on a podcast, I thought to myself "This guy must be super
smart, because he sounds like a Bond villain and I can only grasp 50% of what he's saying." I
started telling colleagues about him, "Hey, check this guy out. His webinars will make your brain
bleed out of your ears!" Trust me, in my circle that's a HUGE compliment.
When I was chosen to attend my first Tech Field Day event, I was most excited because I would
finally get to meet Ivan in person. All my engineering friends were jealous and I was almost
apoplectic when the moment finally arrived, fearful I would do something foolish like confuse SMTP
and SNMP. This is when I discovered a really wonderful aspect of Ivan: if you're ever lucky enough
to interact with him personally (stalking doesn't count), you'll find him to be witty, friendly,
generous and gracious. He never makes you feel stupid for not understanding a protocol, the details
of an RFC or an IEEE standard.
He's the consummate educator and a giving mentor to almost anyone who asks. The more I know
him, the more I admire and respect his dedication to engineering. It truly is a vocation for him.
I guess I need to say something about SDN now, so here goes. While it could be the idea that finally
revolutionizes networking, data centers and even security, I advise caution. Vendors will latch onto
this new buzzword like a pitbull and promote it like the industry's new secret sauce. With this book,
you'll be able to separate facts from hype and make some educated decisions regarding your own
infrastructure.
Michele Chubirka
Security architect, analyst, writer and podcaster
December 2013
INTRODUCTION
OpenFlow and Software Defined Networks (SDN) entered mainstream awareness in March 2011
when several large cloud providers and Internet Service Providers formed the Open Networking
Foundation.
More than three years later, the media still doesn't understand the basics of SDN, and many
networking engineers feel threatened by what they see as a fundamental shift in the way they do
their jobs.
In the meantime, I published over a hundred blog posts on ipSpace.net trying to debunk the myths,
explain how SDN and OpenFlow work, and what their advantages and limitations are. Most of the
posts were responses to external triggers: false claims, vendor launches, or questions I received
from my readers.
This book contains a collection of the most relevant blog posts describing the concepts of SDN and
OpenFlow. I cleaned up the blog posts and corrected obvious errors and omissions, but also tried to
leave most of the content intact. The commentaries between the individual blog posts will help you
understand the timeline or the context in which a particular blog post was written.
The book covers these topics:
The debunking of the initial hype surrounding the OpenFlow public launch and the most blatant
misconceptions (Chapter 1);
An overview of what SDN is, what its benefits might be, and deliberations on whether or not it makes
sense (Chapter 2);
Introduction to OpenFlow, from architectural basics to protocol details, and deployment and
forwarding models (Chapter 3);
OpenFlow scalability challenges, from control-plane complexity to packet punting and limitations
of flow table updates (Chapter 5);
SDN beyond OpenFlow (Chapter 7), covering BGP-based SDN, NETCONF, I2RS, Cisco's onePK
and Plexxi's controller-based data center fabrics.
You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:
Numerous ipSpace.net webinars describe SDN, network programmability and automation, and
OpenFlow (some of them are freely available thanks to industry sponsors);
The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function
virtualization and SDDC technologies in your network.
As always, please do feel free to send me any questions you might have; the best way to reach me
is to use the contact form on my web site (www.ipSpace.net).
Happy reading!
Ivan Pepelnjak
July 2014
Academic researchers had been working on OpenFlow concepts (distributed data plane with a
centralized controller) for years, but in early 2011 a fundamental marketing shift happened: major
cloud providers (Google) and Internet Service Providers (Deutsche Telekom) created the Open
Networking Foundation (ONF) to push forward commercial adoption of OpenFlow and Software
Defined Networking (SDN), or at least their definition of it.
Since then, every single vendor has started offering SDN products. Almost none of them come even
close to the (narrow) vision promoted by the Open Networking Foundation (centralized control plane
with distributed data plane), NEC's ProgrammableFlow being a notable exception.
Most vendors decided to SDN-wash their existing products, branding their existing APIs "open" and
claiming they have SDN-enabled products.
MORE INFORMATION
You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:
Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
Numerous ipSpace.net webinars describe SDN, network programmability and automation, and
OpenFlow (some of them are freely available thanks to industry sponsors);
The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function
virtualization and SDDC technologies in your network.
As usual, the industry media didn't help: they enthusiastically jumped onto the OpenFlow/SDN
bandwagon and started propagating myths. More than two years later they still don't understand the
fundamentals of SDN, and tend to focus exclusively on how SDN is supposed to hurt Cisco (or not).
IN THIS CHAPTER:
OPEN NETWORKING FOUNDATION FABRIC CRAZINESS REACHES NEW HEIGHTS
OPENFLOW FAQ: WILL THE HYPE EVER STOP?
OPENFLOW IS LIKE IPV6
FOR THE RECORD: I AM NOT AGAINST OPENFLOW
NETWORK FIELD DAY FIRST IMPRESSIONS
I APOLOGIZE, BUT I'M EXCITED
THE REALITY TWO YEARS LATER
CONTROL AND DATA PLANE SEPARATION THREE YEARS LATER
TWO AND A HALF YEARS AFTER OPENFLOW DEBUT, THE MEDIA REMAINS CLUELESS
WHERE'S THE REVOLUTIONARY NETWORKING INNOVATION?
FALLACIES OF GUI
In March 2011, industry media quickly picked up the buzz created by the Open Networking
Foundation (ONF) press releases and started exaggerating the already extravagant claims made by
ONF, prompting me to write the following blog post.
the cost associated with operating networks. Now we're getting somewhere: I told you it was all
about reducing costs (starting with the networking vendors' margins).
(Some of) the industry media happily joined the craze, parroting meaningless phrases from various
press releases. Consider, for example, this article from IT World Canada.
SDN would give network operators the ability to virtualize network resources, being able to
dynamically improve latency or security on demand. If you want to do it, you can do it today, using
dynamic routing protocols or QoS (latency), vShield/VSG (on-demand security) or a number of
virtualized networking appliances.
Also, protocols like RSVP to signal per-session bandwidth needs have been around for more than a
decade, but somehow never caught on. Must be the fault of those stupid networking vendors.
Sites like Facebook, Google or Yahoo would be able to tailor their networks so searches would be
blindingly fast. I never realized the main search problem was network bandwidth; I always somehow
thought it was related to large datasets, CPU, database indices ... Anyhow, if the network bandwidth
is the bottleneck, why don't they upgrade to next-generation Ethernet (10G/40G)? Ah, yes, it
might be expensive. How about deploying a Clos network architecture? Ouch, might be a nightmare to
configure and manage. How exactly will SDN solve this problem?
Stock exchanges could assure brokerage customers on the other side of the globe they'd get
financial data as fast as a dealer beside the exchange. Will SDN manage to flatten & shrink the
earth, will it change the speed of light, or will it use large-scale quantum entanglement?
It could be programmed to order certain routers to be powered down during off-peak power
periods. What stops you from doing that today?
Don't get me wrong: OpenFlow might be a good idea and it will probably lead to interesting new
opportunities (assuming they can solve the scalability and resilience issues) ... and I'm absolutely
looking forward to the podcast we're recording later today (available on the Packet Pushers web site).
However, there are plenty of open standards in the networking industry (including XML-based
network configuration and management) waiting to be used. There are also (existing, standard)
technologies that you can use to solve most of the problems these people are complaining about.
The problem is that these standards and technologies are not used by operating systems or
applications (when was the last time you deployed a server running OSPF to have seamless
multihoming?).
The main problems we're facing today arise primarily from non-scalable application architectures
and a broken TCP/IP stack. In a world with scale-out applications you don't need fancy combinations
of routing, bridging and whatever else; you just need fast L3 transport between endpoints. In an
Internet with a decent session layer or a multipath transport layer (be it SCTP, Multipath TCP or
something else) you don't need load balancers, BGP sessions with end-customers to support
multihoming, or LISP. All these kludges were invented to support OS/App people firmly believing in
the fallacies of distributed computing. How is SDN supposed to change that? I'm anxiously waiting to
see an answer beyond marketing/positioning/negotiating bullshit bingo.
Not surprisingly, the OpenFlow hype did not subside, and totally inaccurate articles started
appearing in industry press, prompting me to write yet another rant in April 2011.
NW: The programmability of the MPLS capabilities of a particular vendor's platform is specific to
that vendor. And the OpenFlow-related capabilities of individual switches will depend on specific
implementations by specific vendors. Likewise, the capabilities of an OpenFlow controller will be
specific to that vendor. What exactly is the fundamental change?
NW: MPLS is a Layer 3 technique while OpenFlow is a Layer 2 method. Do I need to elaborate on
this gem? Let's just point out that OpenFlow works with MAC addresses, IP subnets, IP flow 5-tuples, VLANs or MPLS labels. Whatever a switch can do, OpenFlow can control it.
But wait ... OpenFlow has no provision for IPv6 at all. Maybe Network World is so futuristic they
consider a technology without IPv6 support a layer-2 technology.
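The match flexibility (and the IPv6 gap) described above is easy to illustrate. Below is a minimal Python sketch of an OpenFlow 1.0-style flow entry; the field names loosely follow the twelve-tuple match of the OpenFlow 1.0 specification (dl_* for layer 2, nw_* for layer 3, tp_* for layer 4), while the helper function and dictionary layout are purely illustrative, not a real controller API:

```python
# Illustrative sketch of an OpenFlow 1.0-style flow entry (not a real
# controller API). Field names loosely follow the twelve-tuple match of
# the OpenFlow 1.0 spec: dl_* = layer 2, nw_* = layer 3, tp_* = layer 4.

ALLOWED_FIELDS = {
    "in_port", "dl_src", "dl_dst", "dl_vlan", "dl_vlan_pcp", "dl_type",
    "nw_src", "nw_dst", "nw_proto", "nw_tos", "tp_src", "tp_dst",
}

def make_flow_entry(match, actions, priority=100):
    """Combine a match on any subset of the 12 header fields with actions."""
    unknown = set(match) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported match fields: {unknown}")
    return {"priority": priority, "match": dict(match), "actions": list(actions)}

# The same mechanism matches on a MAC address (layer 2) ...
l2_flow = make_flow_entry({"dl_dst": "00:11:22:33:44:55"}, ["output:3"])

# ... or on an IP 5-tuple (layers 3 and 4)
l4_flow = make_flow_entry(
    {"dl_type": 0x0800, "nw_src": "10.0.0.1", "nw_dst": "10.0.0.2",
     "nw_proto": 6, "tp_src": 12345, "tp_dst": 80},
    ["output:7"],
)
```

Note that there is no way to express an IPv6 match in this field set, which is exactly the gap pointed out above (IPv6 match fields only arrived with OpenFlow 1.2).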
In another blog post, I compared OpenFlow to IPv6: the evangelists of both technologies promised
way more than the technologies were ever capable of delivering.
Learn from the past bubble bursts. Whenever someone makes an extraordinary claim about
OpenFlow, remember the "it can't do anything you couldn't do before" fact and ask yourself:
Did we have similar functionality in the past? If not, why not? Was there no need or were the
vendors too lazy to implement it (don't forget they usually follow the money)?
Did it get used? If not, why not? What were the roadblocks? Why would OpenFlow remove them?
Repeat this exercise regularly and you'll probably discover the new emperor's clothes aren't nearly
as shiny as some people would make you believe.
The OpenFlow pundits quickly labeled me as an OpenFlow hater, but I was just being my grumpy old
self ;) Here's the blog post (from May 2011) that tried to set the record straight (not that such
things would ever work).
In just a few months, everyone was talking about OpenFlow and SDN, and Stephen Foskett, the
mastermind behind GestaltIT, decided to organize the first ever OpenFlow symposium in September
2011.
The vendor and user presentations we've seen at that symposium, combined with the vendor
presentations we've attended during Networking Tech Field Day 2, seemed very promising:
everyone was talking about the right topics and tried to address real-life scalability concerns.
Most vendors have sensible answers. They are addressing different parts of the big problem, they
talk about different technologies, but the answers arent bad. For example, every time I spotted a
scalability issue, they were aware of it and/or had good answers (if not a solution).
Layer-2 is fading away (again). While every switching vendor will tell you how you can build large L2
domains with their fabric, nobody is actually pushing them anymore. And the only time layer-2 Data
Center Interconnect (DCI) appeared on a slide, there was a unicorn image next to it. Even more,
two vendors actually said they think long-distance VM mobility is not a good idea (you'll have to
watch the videos to figure out who they were).
We're cutting through the hype. Even the OpenFlow symposium was hypeless. It's so nice being able
to spend three days with highly intelligent people who are excited about the next great thing
(whatever it is), while being perfectly realistic about its current state and its limitations.
You'll see lots of new things in the future. Even if you're working in an SMB environment, you might
get exposed to OpenFlow in the not-too-distant future (more about that in an upcoming post).
Get ready for a bumpy ride. Lots of exciting technologies are being developed. Some of them make
perfect sense, some others less so. Some of them might work, some might fade away (not because
they would be inherently bad, but because of bad execution). Now is the time to jump on those
bandwagons and get involved (hint: you just might start with IPv6), build a test lab, kick the tires,
and figure out whether the new technologies might be a good fit for your environment when they
become stable.
Disclosure: vendors mentioned in this post indirectly covered my travel expenses. Read the full
disclosure (or a more precise one by Tony Bourke).
Even more, the real-life approach of the numerous vendors I've seen during those two events made
me overly optimistic: I thought we just might be able to get to real-life OpenFlow and SDN use cases
without the usual vendor jousting and get-rich-quick startup mentality. This is what I wrote in
October 2011:
their OpenFlow products (watch the video, PDF is not online) and finally David Ward from Juniper
presented the hybrid approach: use OpenFlow in combination (not as a replacement) with existing
technologies.
The afternoon technical Q&A panel just confirmed that numerous vendors are well aware of the
challenges associated with OpenFlow deployments outside of small lab setups, and that they're
actively working on solving those problems and making OpenFlow a viable technology.
Two vendors expanded their coverage of OpenFlow during the Network Field Day: David Ward from
Juniper did a technical deep dive (don't skip the Junos automation part at the beginning of the
video, it's interesting ... and you just might spot the VRF Smurf) and NEC even showed us a demo
of their OpenFlow-based switched network.
Luckily there are still some coolheaded people around (read Ethan Banks' OpenFlow State of the
Union and Derick Winkworth's More Open Flow Symposium Notes), but I can't help myself. The
grumpy old man from the L3 ivory tower is excited (listen to the Packet Pushers OpenFlow/SDN
podcast if you don't believe me), and not just about OpenFlow. I still can't believe that I stumbled
upon so many interesting or cool technologies or solutions in the last few days. Could be that it's
just vendors adapting to the blogging audience, or there actually might be something fundamentally
new coming to light, like MPLS (then known as tag switching) was in the late 1990s.
Disclosure: vendors mentioned in this post indirectly covered my travel expenses. Read the full
disclosure (or a more precise one by Tony Bourke).
The hard reality of the intervening two years crushed all my high hopes. This is the reality of
OpenFlow and SDN as I see it in November 2013:
In January 2014 I took another look at what the Open Networking Foundation founding members
managed to achieve between March 2011 (the beginning of OpenFlow/SDN hype) and early 2014.
The only one that made significant progress on the centralized control plane front was Google.
Since I wrote this blog post, Facebook launched their own switch operating system, which seems to
be working along the same lines as classical network operating systems (one device, one control
plane).
Google implemented their inter-DC WAN network with switches that use OpenFlow within a
switching fabric and BGP/IS-IS and something akin to PCEP between sites;
Facebook is working on the networking platform for their Open Compute Project. It seems
they've got the switch hardware specs; I haven't heard about software running on those switches
yet, or maybe they'll go down the same path as Google ("We got cheap switches, and we have
our own software. Goodbye and thank you!")
Yahoo! was talking about custom changes to standard networking protocols. I haven't heard about
their progress since the first OpenFlow Symposium; the April 2012 presentation from Igor
Gashinsky still concluded with "Where's My Pony?"
Deutsche Telekom is still using traditional routers and a great NFV platform.
Microsoft implemented SDN using BGP, using a central controller, but not a centralized control
plane.
In the networking vendor world, NEC seems to be the only company with a mature commercial
product that matches the ONF definition of SDN. Cisco has just shipped the initial version of their
controller, as did HP, and those products seem pretty limited at the moment.
Wondering why I didn't include Big Switch Networks in the above list? My definition of shipping
includes publicly available product documentation, or (at the very minimum) something resembling
a data sheet with feature descriptions, system requirements and maximum limits. I couldn't find
either on the Big Switch web site.
On the other hand, the virtual networking world was always full of solutions with separate control
and data planes, starting with the venerable VMware Distributed vSwitch and Nexus 1000V, and
continuing with newer entrants, from the Hyper-V extensible switch and VMware NSX to Juniper
Contrail and IBM's 5000V and DOVE. Some of these solutions were used years before the explosion
of OpenFlow/SDN hype (only we didn't know we should call them SDN).
In the meantime, the industry media still hasn't grasped the basics of SDN. Here's my response to a
particularly misleading article written in November 2013:
Does the above paragraph sound like Latin to you? Don't worry; just keep in mind that software
usually costs about as much as (or more than) the hardware it runs on, but you don't see that.
Corporations can buy fewer routers and switches. It can't get any better than this. If you need
100 10GE ports, you need 100 10GE ports. If you need two devices for two WAN uplinks (for
redundancy), you need two devices. SDN won't change the port count, redundancy requirements, or
the laws of physics.
Corporations can buy cheaper [routers and switches]. Guess what: you still need the
software to run them, and until we see the price tags of SDN controllers and do a TCO calculation,
claims like this one remain wishful thinking (you did notice I'm extremely diplomatic today, didn't
you?).
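The missing TCO calculation can at least be sketched. All the numbers below (switch prices, controller license, support percentage) are hypothetical, invented purely for illustration; the point is that the "cheaper boxes" claim cannot be evaluated without them:

```python
# Back-of-envelope TCO sketch. Every figure here is hypothetical and
# exists only to show the shape of the calculation -- without real SDN
# controller price tags, the "cheaper switches" claim can't be evaluated.

def tco(switch_price, num_switches, controller_license=0.0,
        annual_support_pct=0.15, years=5):
    """Capital cost plus a flat annual support fee over the given period."""
    capex = switch_price * num_switches + controller_license
    opex = capex * annual_support_pct * years
    return capex + opex

# Traditional switches: pricier boxes, no controller license
traditional = tco(switch_price=20_000, num_switches=100)

# White-label switches: cheaper boxes, plus a (made-up) controller license
sdn = tco(switch_price=8_000, num_switches=100, controller_license=1_500_000)

print(f"traditional: {traditional:,.0f}  SDN: {sdn:,.0f}")
```

With these made-up numbers the "cheaper" option actually comes out more expensive, which is precisely why hand-waving about hardware prices proves nothing.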
Finally, numerous marketers and SDN/OpenFlow pundits keep repeating how they'll save the
(networking) world and bring true nirvana to network operations with their flashy new gadgets.
Nothing could be further from the truth, because we cannot get rid of the legacy permeating the
whole TCP/IP stack, as I explained in this post written in July 2013:
The final bit of hype I want to dispel is the misleading focus on CLI that we use to configure
networking devices. CLI is not the problem, and GUI will not save the world.
FALLACIES OF GUI
I love Greg Ferro's characterization of CLI:
We need to realise that the CLI is a power tool for specialist tradespeople and not a
knife and fork for everyday use.
However, you do know that most devices' GUIs offer nothing more than the CLI does, don't you?
Where's the catch?
For whatever reason, people find colorful screens full of clickable items less intimidating than a
blinking cursor on a black background. Makes sense: after all, you can see all the options you have;
you can try pulling down things to explore possible values, and commit the changes once you think
you've enabled the right set of options. Does that make a product easier to use? Probably. Will it
result in a better-performing product? Hardly.
Have you ever tried to configure OSPF through a GUI? How about trying to configure usernames and
passwords for individual wireless users? In both cases you're left with the same options you'd have
in the CLI (because most vendors implement the GUI as eye candy in front of the CLI or API). If you
know how to configure OSPF or a RADIUS server, the GUI helps you break the language barrier
(example: moving from Cisco IOS to Junos); if you don't know what OSPF is, the GUI still won't save
the day ... or it might, if you try clicking all the possible options until you get one that seems to
work (expect a few meltdowns on the way if you're practicing your clicking skills on a live network).
What casual network admins need are GUI wizards: tools that help you achieve a goal while
keeping your involvement to a minimum. For example, "I need IP routing between these three
boxes. Go do it!" should translate into "Configure OSPF in area 0 on all transit interfaces." When you
see a GUI offering this level of abstraction, please let me know. In the meantime, I'm positive that
the engineers who have to get a job done quickly prefer using the CLI over a clickety-click GUI (and
I'm not the only one), regardless of whether they have to configure a network device, Linux server,
Apache, MySQL, MongoDB or a zillion other products. Why do you think Microsoft invested so heavily
in PowerShell?
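The wizard described above is, at its core, a tiny intent-to-configuration translator, and sketching one takes a few lines. The device names, interface names, and the Cisco-IOS-style output below are purely illustrative:

```python
# Sketch of the "IP routing between these boxes -- go do it!" wizard:
# translate a one-line intent into per-device OSPF configuration.
# Device/interface names and the IOS-style syntax are illustrative only.

def ospf_wizard(devices, area=0, process_id=1):
    """devices: {router: [transit interfaces]} -> {router: config text}."""
    configs = {}
    for router, interfaces in devices.items():
        lines = [f"router ospf {process_id}"]
        for intf in interfaces:
            # Enable OSPF per transit interface in the requested area
            lines += [f"interface {intf}",
                      f" ip ospf {process_id} area {area}"]
        configs[router] = "\n".join(lines)
    return configs

configs = ospf_wizard({
    "r1": ["GigabitEthernet0/1", "GigabitEthernet0/2"],
    "r2": ["GigabitEthernet0/1"],
    "r3": ["GigabitEthernet0/1"],
})
print(configs["r1"])
```

The hard part a real wizard would have to solve is the "all transit interfaces" clause: discovering which interfaces are transit links requires topology knowledge the script above simply takes as input.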
The Open Networking Foundation (ONF), launched in March 2011, quickly defined Software Defined
Networking (SDN) as an architecture with a centralized control plane that controls multiple
physically distinct devices.
That definition definitely suits one of the ONF founding members (Google), but is it relevant to the
networking community at large? Or does it make more sense to focus on network programmability,
or using existing protocols (BGP) in novel ways?
This chapter contains my introductory posts on SDN-related topics, musings on what makes
sense, and a few thoughts on career changes we might experience in the upcoming years. You'll find
more details in subsequent chapters, including an overview of OpenFlow, an in-depth analysis of
OpenFlow-based architectures, some real-life OpenFlow and SDN deployments, and alternate
approaches to SDN.
IN THIS CHAPTER:
WHAT EXACTLY IS SDN (AND DOES IT MAKE SENSE)?
BENEFITS OF SDN
DOES CENTRALIZED CONTROL PLANE MAKE SENSE?
HOW DID SOFTWARE DEFINED NETWORKING START?
WE HAD SDN IN 1993 AND DIDN'T KNOW IT
STILL WAITING FOR THE STUPID NETWORK
IS CLI IN MY WAY OR IS IT JUST A SYMPTOM OF A BIGGER PROBLEM?
OPENFLOW AND SDN: DO YOU WANT TO BUILD YOUR OWN RACING CAR?
SDN, WINDOWS AND FRUITY ALTERNATIVES
SDN, CAREER CHOICES AND MAGIC GRAPHS
RESPONSE: SDN'S CASUALTIES
The very strict definition of SDN as understood by the Open Networking Foundation promotes an
architecture with strict separation between a controller and totally dumb devices that cannot do
more than forward packets based on forwarding rules downloaded from the controller. Does that
definition make sense? This is what I wrote in January 2014:
A BIT OF HISTORY
It's worth looking at the founding members of ONF and their interests: most of them are large cloud
providers looking for the cheapest possible hardware, preferably using a standard API so it can be
sourced from multiple suppliers, driving prices even lower. Most of them are big enough to write
their own control plane software (and Google already did).
A separation of control plane (running their own software) and data plane (implemented in low-cost white-label switches) was exactly what they wanted to see, and the Stanford team working on
OpenFlow provided the architectural framework they could use. No wonder ONF pushes this
particular definition of SDN.
The need for programmable network elements and vendor-neutral programming mechanisms
(I'm looking at you, netmod working group);
Will physical separation of the control and forwarding planes solve any of these? It might, but there
are numerous tools out there that can do the same without overhauling everything we've been doing
for the last 30 years.
We don't need the physical separation of the control plane to solve our problems (although the
ability to control individual forwarding entries does help), and it will probably take a decade before
we glimpse the promised savings of white-label switches and open-source software (even Greg
Ferro stopped believing that).
NOW WHAT?
Does it make sense to accept a definition of SDN that works for the ONF founding members but
not for your environment? Shall we strive for a different definition of SDN, or just move on, declare it
as meaningless as the clouds, and focus on solving our problems? Would it be better to talk about
NetOps?
Maybe we should stop talking and start doing: there are plenty of things you can do within existing
networks using existing protocols.
Every new networking technology is supposed to solve most of our headaches. SDN is no exception.
The reality might be a bit different.
BENEFITS OF SDN
Paul Stewart wrote a fantastic blog post in May 2014 listing the potential business benefits of SDN
(as promoted by SDN evangelists and SDN-washing vendors).
Here's his list:
Easier troubleshooting/visibility
I have just one problem with this list: I've seen a similar list of benefits for IPv6:
Unfortunately, the reality of IT in general and IPv6 in particular is a bit different. The overly hyped
IPv6 benefits remain myths and legends; all we got were longer addresses, incompatible protocols
(OSPFv3, anyone?), and half-thought-out implementations (example: DNS autoconfiguration) riddled
with religious wars (try asking why we don't have a first-hop router option in DHCPv6 on any IPv6 mailing
list ;).
For more information, watch the fantastically cynical presentation Enno Rey gave at the Troopers 2014
IPv6 Security Summit, or browse my IPv6 resources.
With the Open Networking Foundation adamantly promoting its definition of SDN, and based on
experiences with previous (now mostly extinct) centralized architectures, one has to ask a simple
question: does it make sense? Here's what I thought in May 2014:
A BIT OF HISTORY
As always, let's start with one of the greatest teachers: history. We've had centralized architectures
for decades, from SNA to various WAN technologies (SDH/SONET, Frame Relay and ATM). They all
share a common problem: when the network partitions, the nodes cut off from the central
intelligence stop functioning (in the SNA case) or remain in a frozen state (WAN technologies).
One might be tempted to conclude that the ONF version of SDN won't fare any better than the
switched WAN technologies. Reality is far worse:
WAN technologies had little control-plane interaction with the outside world (example: Frame
Relay LMI), and those interactions were run by the local devices, not from the centralized control
plane;
WAN devices (SONET/SDH multiplexers, or ATM and Frame Relay switches) had local OAM
functionality that allowed them to detect link or node failures and reroute around them using
preconfigured backup paths. One could argue that those devices had a local control plane,
although it was never as independent as the control planes used in today's routers.
Interestingly, MPLS-TP wants to reinvent the glorious past and re-introduce centralized path
management, yet again proving RFC 1925 section 2.11.
The last architecture (that I remember) that used a truly centralized control plane was SNA, and if
you're old enough you know how well that ended.
Juniper XRE can control up to four EX8200 switches, or a total of 512 10GE ports;
Nexus 7700 can control 64 fabric extenders with 3072 ports, plus a few hundred directly
attached 10GE ports;
HP IRF can bind together two 12916 switches for a total of 1536 10GE ports;
QFabric Network Node Group could control eight nodes, for a total of 384 10GE ports.
NEC ProgrammableFlow seems to be an outlier: it can control up to 200 switches, for a total of
over 9000 GE (not 10GE) ports, but those switches don't run any control-plane protocol (apart from ARP and
dynamic MAC learning) with the outside world. No STP, LACP, LLDP, BFD or routing protocols.
One could argue that we could get an order of magnitude beyond those numbers if only we were
using proper control plane hardware (Xeon CPUs, for example). I don't buy that argument till I
actually see a production deployment, and do keep in mind that the NEC ProgrammableFlow Controller
already uses decent Intel-based hardware. Real-time distributed systems with fast feedback loops are way
more complex than most people looking from the outside realize (see also RFC 1925, section 2.4).
I absolutely understand why NEC went down this path: they did something extraordinary
to differentiate themselves in a very crowded market. I also understand why Google decided
to use this approach, and why they evangelize it as much as they do. I'm just saying that it
doesn't make that much sense for the rest of us.
Finally, do keep in mind that the whole world of IT is moving toward scale-out architectures. Netflix
& Co. are already there, and the enterprise world is grudgingly taking its first steps. In the
meantime, OpenFlow evangelists talk about the immeasurable revolutionary merits of a centralized
scale-up architecture. They must be living on a different planet.
Just in case you're wondering how the OpenFlow/SDN movement started, here's a bit of pre-2011
history.
Assuming we forget the ONF-promoted definition of SDN and define SDN as a network programmed
from a central controller, it's obvious we've had SDN for at least 20 years.
An HTML user interface (written in Perl) gave the operators easy access to the user database (probably
implemented as a text file; we were true believers in the NoSQL movement in those days), and a back-end Perl script generated router configuration commands from the user definitions and downloaded
them (probably through rcp; the details are a bit sketchy) to the dial-up access servers.
The next revision of the software included support for leased-line users: the script generated interface
configurations and static routes for our core router (it was actually an MGS, but I found no good
MGS images on the Internet) or one of the access servers (for users using asynchronous modems).
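The workflow described above (user definitions in, router configuration commands out) can be sketched in a few lines. This is an illustrative Python reconstruction, not the original Perl code; the record fields and the configuration template are my assumptions:

```python
# Hypothetical sketch of the config-generation back-end described above:
# take user records and render interface + static-route commands.
def generate_config(user):
    """Render router configuration commands for one leased-line user."""
    return "\n".join([
        f"interface {user['interface']}",
        f" description {user['name']}",
        f" ip address {user['ip']} {user['mask']}",
        # static route pointing the user's LAN prefix at the WAN address
        f"ip route {user['lan_prefix']} {user['lan_mask']} {user['ip']}",
    ])

users = [
    {"name": "acme", "interface": "Serial0/1", "ip": "192.0.2.1",
     "mask": "255.255.255.252", "lan_prefix": "198.51.100.0",
     "lan_mask": "255.255.255.0"},
]

config = "\n".join(generate_config(u) for u in users)
print(config)
```

The generated text would then be pushed to the device (in those days via rcp; today you would use NETCONF or SSH).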
How is that different from all the shiny new stuff vendors are excitedly talking about? Beats me, I
can't figure it out ;) and as I said before, you don't always need new protocols to solve old
problems.
While we're happily arguing the merits of reinvented architectures, we keep forgetting that the
basics of sound network architecture have been known for over a decade, and we still haven't made any
progress getting closer to them.
Some SDN proponents claim that the way we configure networking devices (using CLI) is the biggest
networking problem we're facing today. They also conveniently forget that every scalable IT solution
uses automation, text files and CLI, because they work and allow experienced operators to work
faster.
CLI generates vendor lock-in. Another pile of startup hype, in this case coming from startups
that want to replace network device lock-in with controller lock-in (here's a similar story).
It's reasonably easy to add automation and orchestration on top of an existing network implementation.
Throwing away decades of field experience and replacing existing solutions with an OpenFlow-based
controller is a totally different story, as I explained in May 2013:
Till then, it might make sense to focus on more down-to-earth technologies; after all, you don't
exactly need OpenFlow and a central controller to solve real-life problems, like Tail-f clearly
demonstrated with their NCS software.
Openness (for whatever value of open) is another perceived benefit of SDN. In reality, you're
trading hardware vendor lock-in for controller vendor lock-in.
Now, what do you want to have in your mission-critical SDN/OpenFlow data center networking
infrastructure: a Mac-like tightly controlled and vendor-tested mix of equipment and associated
controller, or a Windows-like hodgepodge of boxes from numerous vendors, controlled by third-party software that might never have encountered the exact mix of equipment you have?
If you're young and brazen (like I was two decades ago), go ahead and be your own system
integrator. If you're too old and covered with vendor-inflicted scars, you might prefer a tested end-to-end solution, regardless of what Gartner says in vendor-sponsored reports (and even solutions
that vendor X claims were tested don't always work). Just don't forget to consider the cost of
downtime in your total-cost-of-ownership calculations.
SDN controllers will replace networking engineers, at least if you believe what the SDN or
virtualization vendors are telling you. I don't think we have to worry about that happening in the
foreseeable future (and nothing has changed since I wrote the following blog post in late 2012).
Networking in general is clearly in the late majority/laggards phase. What's important for our
discussion is the destruction of value-add through the diffusion process. Oh my, I sound like a
freshly-baked MBA whiz-kid; let's reword it: as a technology gets adopted, more people understand
it, job market competition increases, and thus it's harder to get a well-paying job in that
particular technology area. Supporting Windows desktops might be a good example.
As a successful technology matures, it moves through the four quadrants of another magic matrix (this
one from the Boston Consulting Group).
Initially, every new idea is a great unknown, with only a few people brave enough to invest time in it
(CCIE R&S before Cisco made it mandatory for Silver/Gold partner status). After a while, a
successful idea explodes into a star with huge opportunities and fat margins (example: CCIE R&S a
decade ago, or Nicira-style SDN today, at least for Nicira's founders), degenerates into a cash cow as
the market slowly gets saturated (CCIE R&S is probably at this stage by now) and finally (when
everyone starts doing it) becomes an old dog not worth bothering with.
Does it make sense to invest in something that's probably in the cash cow stage? The theory says:
only as much as needed to keep it alive. But don't forget that CCIE R&S will likely remain very relevant
for a long time:
The protocol stacks we're using haven't changed in the last three decades (apart from extending
the address field from 32 to 128 bits), and although people are working on proposals like MPTCP, those proposals are still in the experimental stage;
Regardless of all the SDN hoopla, neither OpenFlow nor other SDN technologies address the real
problems we're facing today: the lack of a session layer in TCP and the use of IP addresses in the
application layer. They just give you different tools to implement today's kludges.
Cisco is doing constant refreshes of its CCIE programs to keep them in the early adopters or
early majority technology space, so the CCIE certification is not getting commoditized.
If you approach the networking certifications the right way, you'll learn a lot about the principles
and fundamentals, and you'll need that knowledge regardless of the daily hype.
Now that I've mentioned experimental technologies: don't forget that not all of them get adopted
(even by early adopters). Geoffrey Moore made millions writing a book that pointed out that obvious
fact. Of course, he was smart enough to invent a great-looking wrapper; he called it Crossing the
Chasm.
Figure 2-5: The chasm before the mainstream market adoption (source: Crossing the Chasm & Inside the
Tornado)
The crossing-the-chasm dilemma is best illustrated with Gartner Hype Cycles. After all the initial
hype (that we've seen with OpenFlow and SDN) resulting in the peak of inflated expectations, there's
the ubiquitous trough of disillusionment. Some technologies die in that quagmire; in other, more
successful cases we eventually figure out how to use them (the slope of enlightenment).
We still don't know how well SDN will do crossing the chasm (according to the latest Gartner
charts, OpenFlow still hasn't reached the hype peak, and I dread what's still lying ahead of us); we've
seen only a few commercial products, and none of them has anything close to widespread adoption
(not to mention the reality of the three IT geographies).
Anyhow, since you've decided you want to work in networking, one thing is certain: technology will
change (whatever the change will be), and it will happen with or without you. At every point in your
career you have to invest some of your time into learning something new. Some of those new things
will be duds; others might turn into stars. See also Private Clouds Will Change IT Jobs, Not Eliminate
Them by Mike Fratto.

Finally, don't ask me for what-will-the-next-big-thing-be advice. Browse through the six years of
my blog posts. You might notice a clear shift in focus; it's there for a reason.
While everyone talks about SDN, the products are scarce, and it will take years before they appear
in a typical enterprise network. Apart from NEC's ProgrammableFlow and overlay
networks, most other SDN-washed things I've seen are still point products.

Overlay virtual networks seem to be the killer app of the moment. They are extremely useful and
versatile ... if you're not bound to VLANs by physical appliances. We'll have to wait for at least
another refresh cycle before we get rid of those.

Data center networking is hot and sexy, but it's only a part of what networking is. I haven't seen
a commercial SDN app for enterprise WAN, campus or wireless (I'm positive I'm wrong; write a
comment to correct me), because that's not where the VCs are looking at the moment.
Also, consider that the my-job-will-be-lost-to-technology sentiments started approximately 200 years
ago, and yet the population has increased by almost an order of magnitude in the meantime, there
are obviously way more jobs now (in absolute terms) than there were in those days, and nobody in
his right mind wants to do the menial chores that the technology took over.
Obviously you should be worried if you're a VLAN provisioning technician. However, with everyone
writing about SDN you know what's coming down the pipe, and you have a few years to adapt,
expand the scope of your knowledge, and figure out where it makes sense to move (and don't forget
to focus on where you can add value, not what job openings you see today). If you don't do any of
the above, don't blame SDN when the VLANs (finally) join the dinosaurs and you have nothing left to
configure.
Finally, I'm positive there will be places using VLANs 20 years from now. After all, AS/400s and
APPN are still kicking, and people are still fixing COBOL apps (which IBM just made sexier with XML
and Java support).
OPENFLOW BASICS
Based on the exorbitant claims made by the industry press, you might have concluded there must be
some revolutionary concepts in the OpenFlow technology. Nothing could be further from the truth:
OpenFlow is a very simple technology that allows a controller to program forwarding entries in a
networking device.
Did you ever encounter a Catalyst 5000 with a Route Switch Module (RSM), or a combination of a Catalyst
5000 and an external router using Multilayer Switching (MLS)? Those products used an architecture
identical to OpenFlow's almost 20 years ago, the only difference being the relative openness of
the OpenFlow protocol.
This chapter will answer a number of basic OpenFlow questions, including:
What is OpenFlow?
How can a controller implement control-plane protocols (like LACP, STP or routing protocols),
and does it have to?
IN THIS CHAPTER:
MANAGEMENT, CONTROL AND DATA PLANES IN NETWORK DEVICES AND SYSTEMS
WHAT EXACTLY IS THE CONTROL PLANE?
WHAT IS OPENFLOW?
WHAT IS OPENFLOW (PART 2)?
OPENFLOW PACKET MATCHING CAPABILITIES
OPENFLOW ACTIONS
OPENFLOW DEPLOYMENT MODELS
FORWARDING MODELS IN OPENFLOW NETWORKS
YOU DON'T NEED OPENFLOW TO SOLVE EVERY AGE-OLD PROBLEM
OPENFLOW AND IPSILON: NOTHING NEW UNDER THE SUN
MORE INFORMATION
You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:
Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
Numerous ipSpace.net webinars describe SDN, network programmability and automation, and
OpenFlow (some of them are freely available thanks to industry sponsors);
The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function
virtualization and SDDC technologies in your network;
The fundamental principle underlying OpenFlow and Software Defined Networking (as defined by the
Open Networking Foundation) is the decoupling of the control and data planes, with the data (forwarding)
plane running in a networking device (switch or router) and the control plane implemented in a
central controller, which controls numerous dumb devices. Let's start with the basics: what are the
data, control and management planes? Every networking device has to:
Process the transit traffic (that's why we buy them) in the data plane;
Figure out what's going on around it with the control plane protocols;
Interact with its owner (or a Network Management System, NMS) through the management
plane.
Routers are used as a typical example in every text describing the three planes of operation, so let's
stick to this time-honored tradition:
Interfaces, IP subnets and routing protocols are configured through management plane
protocols, ranging from CLI to NETCONF and the latest buzzword: the northbound RESTful API;
The router runs control plane routing protocols (OSPF, EIGRP, BGP ...) to discover adjacent devices
and the overall network topology (or reachability information in the case of distance/path vector
protocols);
The router inserts the results of the control-plane protocols into the Routing Information Base (RIB) and
the Forwarding Information Base (FIB). Data plane software or ASICs use the FIB structures to forward
the transit traffic;
Management plane protocols like SNMP can be used to monitor the device operation, its
performance, interface counters, and so on.
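The RIB-to-FIB relationship described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: real devices use tries or TCAM instead of a linear scan, the RIB holds multiple candidate routes per prefix, and the prefixes and interface names below are made up:

```python
# Sketch: control-plane results land in the RIB; the best routes are
# installed into the FIB; the data plane does longest-prefix matching.
import ipaddress

# RIB entries: (prefix, source protocol, administrative distance, next hop)
rib = [
    ("10.0.0.0/8",  "ospf",   110, "GigabitEthernet0/1"),
    ("10.1.0.0/16", "bgp",    200, "GigabitEthernet0/2"),
    ("0.0.0.0/0",   "static",   1, "GigabitEthernet0/3"),
]

# FIB: one winning entry per prefix, optimized for data-plane lookups
fib = {ipaddress.ip_network(prefix): intf for prefix, _, _, intf in rib}

def lookup(dst):
    """Longest-prefix match, as the data plane performs on the FIB.
    The default route guarantees at least one match."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    return fib[max(matches, key=lambda net: net.prefixlen)]

print(lookup("10.1.2.3"))   # the /16 route wins over the /8 and the default
```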
The management plane is pretty straightforward, so let's focus on a few intricacies of the control
and data planes.

We usually have routing protocols in mind when talking about control plane protocols, but in reality
the control plane protocols perform numerous other functions, including:
Adjacent device discovery (hello mechanisms present in most routing protocols, ES-IS, ARP, IPv6
ND, uPnP SSDP);
The data plane should be focused on forwarding packets, but is commonly burdened by other activities:
Neighbor address gleaning (example: dynamic MAC address learning in bridging, IPv6 SAVI);
ACL logging;
Data plane forwarding is hopefully performed in dedicated hardware or in high-speed code (within
the interrupt handler on low-end Cisco IOS routers), while the overhead activities usually happen on
the device CPU (sometimes even in userspace processes; the switch from high-speed forwarding to
user-mode processing is commonly called punting).
In reactive OpenFlow architectures a punting decision sends a packet all the way to the
OpenFlow controller.
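The punting decision can be sketched like this (the flow-table keys and port names are hypothetical; in a reactive OpenFlow network the punted packet would travel to the controller instead of the local CPU):

```python
# Sketch of the fast path / punt split: packets that hit an installed
# forwarding entry are handled in hardware; everything else is punted.
flow_table = {("10.0.0.5", 80): "port1"}   # hypothetical installed entries

punted = []   # packets handed to the (much slower) CPU or controller

def forward(packet):
    key = (packet["dst"], packet["dport"])
    if key in flow_table:
        return flow_table[key]   # hardware fast path: output port
    punted.append(packet)        # punt: let the CPU/controller decide
    return None

forward({"dst": "10.0.0.5", "dport": 80})   # matched: stays in hardware
forward({"dst": "10.9.9.9", "dport": 22})   # no entry: punted
```

The performance gap between the two paths is exactly why the CPU must be protected from excessive punting.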
Regardless of the implementation details, it's obvious the device CPU represents a significant
bottleneck (in some cases the switch to CPU-based forwarding causes several orders of magnitude lower
performance), which is the main reason one has to rate-limit ACL logging and protect the device CPU with
Control Plane Protection features.
It seems easy to define what a network device control plane is (and how it's different from the
data plane) until someone starts unearthing the interesting corner cases.
Trying to classify protocols based on where they're run is also misleading. It's true that the
networking device CPU almost always generates ICMP requests and responses (it doesn't make
sense to spend silicon real estate on generating ICMP responses). In some cases, ICMP packets might
be generated in the slow path, but that's just how a particular network operating system works.
Let's ignore those dirty details for the moment; just because a device's CPU touches a packet
doesn't make that packet a control plane packet.
Vendor terminology doesn't help us either: most vendors talk about Control Plane Policing or
Protection. These mechanisms usually apply to control plane protocols as well as data plane packets
punted from ASICs to the device CPU.
Even IETF terminology isn't exactly helpful: while the C in ICMP does stand for Control, it doesn't
necessarily imply control plane involvement. ICMP is simply a protocol that passes control messages
(as opposed to user data) between IP devices.
Honestly, I'm stuck. Is ICMP a control plane protocol that's triggered by data plane activity, or is it a
data plane protocol? Can you point me to an authoritative source explaining what ICMP is? Share
your thoughts in the comments!
Now that we know what the data, control and management planes are, let's see how OpenFlow fits into
the picture.
WHAT IS OPENFLOW?
A typical networking device (bridge, router, switch, LSR ...) runs all the control protocols (including
port aggregation, STP, TRILL, MAC address learning and routing protocols) in the control plane
(usually implemented in a central CPU or supervisor module), and downloads the forwarding
instructions into the data plane structures, which can be simple lookup tables or specialized
hardware (hash tables or TCAMs).
In architectures with distributed forwarding hardware, the control plane has to use a communications
protocol to download the forwarding information into the data plane instances. Every vendor uses its
own proprietary protocol (Cisco uses IPC, InterProcess Communication, to implement distributed
CEF); OpenFlow tries to define a standard protocol between the control plane and the associated data plane
elements.
The OpenFlow zealots would like you to believe that we're just one small step away from
implementing Skynet; the reality is a bit more sobering. You need a protocol between control and
data plane elements in all distributed architectures, starting with modular high-end routers and
switches. Almost every modular high-end switch that you can buy today has one or more supervisor
modules and numerous linecards performing distributed switching (preferably over a crossbar
matrix, not over a shared bus). In such a switch, an OpenFlow-like protocol runs between the supervisor
module(s) and the linecards.
Moving into more distributed space, the fabric architectures with a central control plane (HP's IRF,
Cisco's VSS) use an OpenFlow-like protocol between the central control plane and the forwarding
instances.

You might have noticed that all vendors support a limited number of high-end switches in a central
control plane architecture (Cisco's VSS cluster has two nodes, and HP's IRF cluster can have up to
four high-end switches). This decision has nothing to do with vendor lock-in and lack of open
protocols, but rather reflects the practical challenges of implementing a high-speed distributed
architecture (alternatively, you might decide to believe the whole networking industry is a
confusopoly of morons who are unable to implement what every post-graduate student can simulate
with open source tools).
Moving deeper into the technical details, the OpenFlow Specs page on the OpenFlow web site
contains a link to the OpenFlow Switch Specification v1.1.0, which defines:
OpenFlow channel (the session between an OpenFlow switch and an OpenFlow controller);
The designers of OpenFlow had to make the TCAM structure very generic if they wanted to offer an
alternative to numerous forwarding mechanisms implemented today. Each entry in the flow tables
contains the following fields: ingress port, source and destination MAC address, ethertype, VLAN tag
& priority bits, MPLS label & traffic class (starting with OpenFlow 1.1), IP source and destination
address (and masks), layer-4 IP protocol, IP ToS bits and TCP/UDP port numbers.
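The match fields listed above can be modeled as a simple data structure. The following Python sketch uses field names of my own choosing (the OpenFlow 1.1 specification defines the authoritative wire format); an unset field acts as a wildcard:

```python
# Sketch of a generic OpenFlow 1.1-style flow match: every field from
# the list above, with None meaning "wildcard (match anything)".
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class FlowMatch:
    in_port: Optional[int] = None
    eth_src: Optional[str] = None
    eth_dst: Optional[str] = None
    eth_type: Optional[int] = None
    vlan_id: Optional[int] = None
    vlan_priority: Optional[int] = None
    mpls_label: Optional[int] = None      # OpenFlow 1.1 and later
    mpls_tc: Optional[int] = None
    ip_src: Optional[str] = None          # address plus mask in practice
    ip_dst: Optional[str] = None
    ip_proto: Optional[int] = None
    ip_tos: Optional[int] = None
    tp_src: Optional[int] = None          # TCP/UDP source port
    tp_dst: Optional[int] = None

    def matches(self, packet: dict) -> bool:
        """Unset (None) fields are wildcards; set fields must be equal."""
        return all(value == packet.get(field)
                   for field, value in asdict(self).items()
                   if value is not None)

# Example: match all IPv4 TCP traffic to port 80, wildcarding the rest
web = FlowMatch(eth_type=0x0800, ip_proto=6, tp_dst=80)
```

The generality of this structure is exactly what makes hardware implementations expensive: a TCAM entry has to reserve space for every one of these fields.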
To make the data plane structures scalable, OpenFlow 1.1 introduces a concept of multiple flow
tables linked into a tree (and group tables to support multicasts and broadcasts). This concept
allows you to implement multi-step forwarding, for example:
Match local MAC addresses and move into L3/MPLS table; perform L2 forwarding otherwise
(table #3)
You can pass metadata between tables to make the architecture even more versatile.
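The multi-step forwarding idea above might look like this in much-simplified Python; the table numbering, the metadata key and the actions are illustrative assumptions, not part of the OpenFlow specification:

```python
# Sketch of a multi-table pipeline: table 0 classifies the packet and
# passes metadata to a later table (L3 for local MACs, L2 otherwise).
LOCAL_MACS = {"00:11:22:33:44:55"}   # the device's own router MACs

def table0(packet, metadata):
    """Classification table: goto-table decision on destination MAC."""
    if packet["eth_dst"] in LOCAL_MACS:
        metadata["routed"] = True        # metadata carried between tables
        return table_l3(packet, metadata)
    return table_l2(packet, metadata)

def table_l3(packet, metadata):
    """L3/MPLS table: forward based on destination IP."""
    return ("route", packet["ip_dst"], metadata)

def table_l2(packet, metadata):
    """L2 table: bridge based on destination MAC."""
    return ("bridge", packet["eth_dst"], metadata)

action, key, md = table0(
    {"eth_dst": "00:11:22:33:44:55", "ip_dst": "10.0.0.1"}, {})
```

Splitting the lookup into per-table steps keeps each table small instead of requiring one huge cross-product of all match fields.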
The proposed flow table architecture is extremely versatile (and I'm positive there's a PhD thesis
being written proving that it is a superset of every known and imaginable forwarding paradigm), but
it will have to meet the harsh reality before we'll see full-blown OpenFlow switch products. You can
implement the flow tables in software (in which case the versatility never hurts, but you'll have to
wait a few years before the Moore's Law curve catches up with terabit speeds) or in hardware, where
the large TCAM entries will drive the price up.
I started getting more detailed OpenFlow questions after the initial What is OpenFlow post, and
tried to answer the most common ones in a follow-up post.
Among other things, the OpenFlow protocol allows the controller to:
Send control protocol (or data) packets through any port of any controlled data-plane device;
Receive (and process) packets that cannot be handled by the data plane forwarding rules. These
packets could be control-plane protocol packets (for example, LLDP) or user data packets that
need special processing.
As part of the protocol, OpenFlow defines abstract data plane structures (forwarding table entries)
that have to be implemented by OpenFlow-compliant forwarding devices (switches).
Is it an abstraction of the forwarding plane? Yes, insofar as it defines data structures that can be used
in OpenFlow messages to update data plane forwarding structures.
Is it an automation technology? No, but it can be used to automate network deployments.
Imagine a cluster of OpenFlow controllers with shared configuration rules that use the packet-carrying
capabilities of the OpenFlow protocol to discover the network topology (using LLDP or a similar
protocol), build a shared topology map of the network, and use it to download forwarding entries into
the controlled data planes (switches). Such a setup would definitely automate new device provisioning in
a large-scale network.
Alternatively, you could use OpenFlow to create additional forwarding (actually, packet-dropping)
entries in access switches or wireless access points deployed throughout your network, resulting in a
scalable multi-vendor ACL solution.
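A controller implementing such a distributed ACL would essentially translate one rule set into identical drop entries on every access device. A minimal sketch (the data structures and function names are hypothetical; a real controller would emit OpenFlow flow-mod messages over the control session):

```python
# Sketch: turn one ACL into per-switch OpenFlow drop entries.
# In OpenFlow, a flow entry with an empty action list drops matching packets.

ACL = [
    {"nw_src": "192.0.2.0/24", "tp_dst": 23},    # block Telnet from one subnet
    {"nw_src": "0.0.0.0/0",    "tp_dst": 135},   # block MS-RPC everywhere
]

def build_flow_mod(rule, priority):
    """One drop entry: a match specification with no actions."""
    return {"match": dict(rule), "priority": priority, "actions": []}

def deploy_acl(switches, acl):
    """Push the same drop entries to every access switch or AP."""
    flow_mods = [build_flow_mod(rule, 32768 - i) for i, rule in enumerate(acl)]
    return {switch: flow_mods for switch in switches}

entries = deploy_acl(["sw1", "sw2", "ap1"], ACL)
```

The vendor-independence comes from the protocol: as long as every device speaks OpenFlow, the controller does not care whose logo is on the front panel.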
Is it a virtualization technology? Of course not. However, its data structures can be used to perform
MAC address, IP address or MPLS label lookups and push user packets into VLANs (or push additional
VLAN tags to implement Q-in-Q) or MPLS-labeled frames, so you can implement the most commonly
used virtualization techniques (VLANs, Q-in-Q VLANs, L2 MPLS-based VPNs or L3 MPLS-based VPNs)
with it.
There's no reason you couldn't control a soft switch (embedded in the hypervisor) with OpenFlow. An
open-source hypervisor switch implementation (Open vSwitch) that has many extensions for
virtualization is already available and can be used with Xen/XenServer (it's the default networking
stack in XenServer 6.0) or KVM.
Open vSwitch became the de-facto OpenFlow switch reference implementation. It's used by
many hardware and software vendors, including VMware, which uses Open vSwitch in the
multi-hypervisor version of NSX.
I'm positive the list of Open vSwitch extensions is hidden somewhere in its somewhat cryptic
documentation (or you could try to find them in the source code), but the list of OpenFlow 1.2
proposals implemented by Open vSwitch or sponsored by Nicira should give you some clues:
Summary: OpenFlow is like C++. You can use it to implement all sorts of interesting solutions, but
it's just a tool.
OpenFlow can match on almost any field in layer-2 (Ethernet, 802.1Q, PBB, MPLS), layer-3
(IPv4 and IPv6) and layer-4 (TCP and UDP) headers. Here's an overview covering OpenFlow versions
1.0 through 1.3.
Match condition                                        Version
Input port                                             1.0
VLAN tag                                               1.0
802.1p value                                           1.0
MPLS tags                                              1.1
ToS/DSCP bits                                          1.0
Layer-4 IP protocol                                    1.0
ICMPv6 support                                         1.2
OTHER OPTIONS
Extensible matching (matching on any bit pattern)      1.2
OpenFlow switches might not support all match conditions specified in the OpenFlow version
they support. For example, most data center switches don't support MPLS or PBB matching.
Furthermore, some switches might implement certain matching actions in software. For
example, early OpenFlow code for HP ProCurve switches implemented layer-3 forwarding in
hardware and layer-2 forwarding in software, resulting in significantly reduced forwarding
performance.
After matching a packet, an OpenFlow forwarding entry performs a list of actions on the matched
packet. This blog post lists actions supported in OpenFlow versions 1.0 through 1.3.
OPENFLOW ACTIONS
Every OpenFlow forwarding entry has two components:
Flow match specification, which can use any combination of fields listed in the previous table;
List of actions performed on matched packets.
The initial OpenFlow specification contained the basic actions one needs to implement MAC- and IPv4
forwarding, as well as actions one might need to implement NAT or load balancing. Later versions of
the OpenFlow protocol added support for MPLS, IPv6 and Provider Backbone Bridging (PBB).
OpenFlow switches might not support all actions specified in the
OpenFlow version they support. For example, most switches don't support MAC, IP address
or TCP/UDP port number rewrites.
OpenFlow action                                                              Version
Process the packet through specified group (example: LAG or fast failover)   1.1
Drop packet                                                                  1.0
Decrement TTL                                                                1.1
OTHER OPTIONS
Extensible rewriting (rewriting any bit pattern)                             1.2
The all-or-nothing approach to OpenFlow was quickly replaced with a more realistic approach. An
OpenFlow-only deployment is potentially viable in dedicated greenfield environments, but even there
it's sometimes better to rely on functionality already available in networking devices instead of
reinventing all the features and protocols that were designed, programmed, tested and deployed in
the last 20 years.
Not surprisingly, the traditional networking vendors quickly moved from the OpenFlow-only approach to
a plethora of hybrid solutions.
NATIVE OPENFLOW
The switches are totally dumb; the controller performs all control-plane functions, including running
control-plane protocols with the outside world. For example, the controller has to use packet-out
messages to send LACP, LLDP and CDP packets to adjacent servers, and packet-in messages to
process inbound control-plane packets from attached devices.
This model has at least two serious drawbacks, even if we ignore the load placed on the controller by
periodic control-plane protocols:
The switches need IP connectivity to the controller for the OpenFlow control session. They can
use an out-of-band network (where OpenFlow switches appear as IP hosts), similar to the QFabric
architecture. They could also use in-band communication sufficiently isolated from the OpenFlow
network to prevent misconfigurations (VLAN 1, for example), in which case they would probably
have to run STP (at least in VLAN 1) to prevent bridging loops.
Fast control loops like BFD are hard to implement with a central controller, even more so if you
want very fast response times.
NEC seems to be using this model quite successfully (although they probably have a few
extensions), but has already encountered its inherent limitations: a single controller can control up to
~50 switches, and rerouting around failed links takes around 200 msec (depending on the network
size). For more details, watch their Networking Tech Field Day presentation.
NEC has since enhanced the scalability of their controller: a single controller cluster can
manage over 200 switches.
Some controller vendors went down that route and significantly extended OpenFlow 1.1. For
example, Nicira has added support for generic pattern matching, IPv6 and load balancing.
Needless to say, the moment you start using OpenFlow extensions or functionality implemented
locally on the switch, you destroy the mirage of the nirvana described at the beginning of the article:
we're back in the muddy waters of incompatible extensions and hardware compatibility lists. The
specter of Fibre Channel looms large.
to-switch communication problem is also solved: the TCP session between them traverses the
non-OpenFlow part of the network.
This approach is commonly used in academic environments where OpenFlow runs in parallel
with the production network. It's also one of the viable pilot deployment models.
INTEGRATED OPENFLOW
OpenFlow classifiers and forwarding entries are integrated with the traditional control plane. For
example, Juniper's OpenFlow implementation inserts compatible flow entries (those that contain only
destination IP address matching) as ephemeral static routes into the RIB (Routing Information Base).
OpenFlow-configured static routes can also be redistributed into other routing protocols.
Going a step further, Juniper's OpenFlow model presents routing tables (including VRFs) as virtual
interfaces to the OpenFlow controller (or so it was explained to me). It's thus possible to use
OpenFlow on the network edge (on user-facing ports), and combine the flexibility it offers with
traditional routing and forwarding mechanisms.
From my perspective, this approach makes the most sense: don't rip-and-replace the existing network
with a totally new control plane, but augment the existing well-known mechanisms with functionality
that's currently hard (or impossible) to implement. You'll obviously lose the vaguely promised benefits
of Software Defined Networking, but I guess the ability to retain field-proven mechanisms while
adding customized functionality and new SDN applications more than outweighs that.
An OpenFlow network can emulate any network behavior supported by its components (hardware or
virtual switches), from hop-by-hop forwarding to path-based forwarding paradigms.
Edge security policy: authenticate users (or VMs) and deploy per-user ACLs before
connecting a user to the network (example: IPv6 first-hop security);
Programmable SPAN ports: use OpenFlow entries on a single switch to mirror selected traffic
to a SPAN port;
DoS traffic blackholing: use OpenFlow to block DoS traffic as close to the source as possible,
using N-tuples for more selective traffic targeting than the more traditional RTBH approach.
Traffic redirection: use OpenFlow to redirect an interesting subset of traffic to a network services
appliance (example: IDS).
Using OpenFlow on one or more isolated devices is simple (no interaction with adjacent devices) and
linearly scalable: you can add more devices and controllers as needed because there's no tight
coupling anywhere in the system.
The dirty details of path-based forwarding vary based on the hardware capabilities of the switches
you use and your programming preferences. Using MPLS or PBB would be the cleanest option:
those packet formats are well understood by network troubleshooting tools, so an unlucky engineer
trying to fix a problem in an OpenFlow-based fabric would have a fighting chance.
Unfortunately, you won't see much PBB or MPLS in OpenFlow products any time soon: they require
OpenFlow 1.3 (or vendor extensions) and hardware support that's often lacking in the switches used
for OpenFlow forwarding these days. OpenFlow controller developers are trying to bypass those
problems with creative uses of packet headers (VLAN or MAC rewrite comes to mind), making a
troubleshooter's job much more interesting.
Hop-by-hop forwarding. Install flow-matching N-tuples in every switch along the path. This results
in an architecture that works great in PowerPoint and lab tests, but breaks down in anything remotely
similar to a production network due to scalability problems, primarily FIB update challenges.
If an OpenFlow controller using the hop-by-hop forwarding paradigm implements proactive flow
installation (install N-tuples based on configuration and topology), it just might work in small
deployments. If it uses reactive flow installation (punt new flows to the controller, install microflow
entries on every hop for each new flow), it deserves a Darwin Award nomination.
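A back-of-the-envelope calculation shows why reactive microflow installation falls apart (the numbers are illustrative assumptions, not measurements):

```python
# Why reactive microflow installation doesn't scale: rough flow-table math.
# All input numbers are hypothetical; plug in your own environment's figures.

hosts = 10_000
flows_per_host = 20       # concurrent flows per host (conservative)
path_length = 4           # switches touched by an average flow

# Reactive: one microflow entry per flow on every hop along the path...
reactive_entries = hosts * flows_per_host * path_length

# ...versus proactive: one entry per destination prefix per switch.
prefixes, switches = 500, 100
proactive_entries = prefixes * switches

print(reactive_entries, proactive_entries)   # 800000 vs 50000
```

Even with these conservative assumptions, the reactive approach needs an order of magnitude more hardware flow entries, and every one of them starts with a round-trip to the controller.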
OpenFlow is an emerging technology, and you'll stumble upon numerous vendors (from startups to
major brand names) selling you OpenFlow-based solutions (and pixie dust). It's important to
understand how these solutions work behind the scenes when evaluating them. Everything will work
great in your 2-node proof-of-concept lab, but you might encounter severe scalability limitations in
real-life deployments.
Networking engineers' reactions to OpenFlow were easy to predict, ranging from "this will never work" to
"here's how I can solve my problem with OpenFlow." It turns out we can solve many problems
without involving OpenFlow; the traditional networking protocols are often good enough.
The only problem with Brad's reasoning is that we already have the tools to do exactly what he's
looking for. The magic acronym is LLDP (802.1AB).
LLDP was standardized years ago and is available on numerous platforms, including Catalyst
and Nexus switches, and the Linux operating system (for example, lldpad is part of the standard Fedora
distribution). Not to mention that every DCB-compliant switch must support LLDP, as the DCBX
protocol uses LLDP to advertise DCB settings between adjacent nodes.
The LLDP MIB is standard and allows anyone with SNMP read access to discover the exact local LAN
topology: the connected port names, adjacent nodes (and their names), and their management
addresses (IPv4 or IPv6). The management addresses present in LLDP
advertisements can then be used to expand the topology discovery beyond the initial set of nodes
(assuming your switches do include them in LLDP advertisements; for example, NX-OS does but Force10
doesn't).
Building the exact network topology from the LLDP MIB is a trivial exercise. Even a somewhat
reasonable API is available (yeah, having an API returning a network topology graph would be even
cooler). Mapping the Hadoop Data Nodes to ToR switches and Name Nodes can thus be done on
existing gear using existing protocols.
Would OpenFlow bring anything to the table? Not really: it also needs packets exchanged between
adjacent devices to discover the topology, and the easiest thing for OpenFlow controllers to use is ...
ta-da ... LLDP ... oops, OFDP, because LLDP just wasn't good enough. The only difference is that in
a traditional network the devices would send LLDP packets themselves, whereas in the OpenFlow
world the controller would use Packet-Out messages of the OpenFlow control session to send LLDP
packets from individual controlled devices, and wait for Packet-In messages from other devices to
discover which device received them.
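The whole discovery dance can be condensed into a few lines (grossly simplified: the probe is a dictionary instead of an LLDP frame, and the cabling is a lookup table that stands in for the physical wires the controller cannot see directly):

```python
# Toy OFDP: the controller sends an "LLDP" probe out of every port of every
# switch (packet-out) and learns a link whenever the probe comes back as a
# packet-in from another switch.

WIRES = {                      # physical cabling, unknown to the controller
    ("s1", 1): ("s2", 2),
    ("s2", 2): ("s1", 1),
    ("s2", 3): ("s3", 1),
    ("s3", 1): ("s2", 3),
}

def discover_topology(switch_ports):
    links = set()
    for switch, ports in switch_ports.items():
        for port in ports:                       # packet-out on every port
            probe = {"src_switch": switch, "src_port": port}
            far_end = WIRES.get((switch, port))  # the wire delivers the probe
            if far_end:                          # packet-in from the far end
                links.add((probe["src_switch"], probe["src_port"], *far_end))
    return links

topology = discover_topology({"s1": [1], "s2": [2, 3], "s3": [1]})
```

The controller-driven version discovers exactly what the devices would have discovered on their own; the packets merely take a detour through the control session.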
The Linux configuration wouldn't change much. If you want the switches to see the hosts, you still
have to run an LLDP (or OFDP or whatever you call it) daemon on the hosts.
Last but definitely not least, you could use the well-defined SNMP protocol with a number of
readily-available Linux or Windows libraries to read the LLDP results available in the SNMP MIB of the
old-world devices. I'm still waiting to see the high-level SDN/OpenFlow API; everything I've seen so far
are OpenFlow virtualization attempts (multiple controllers accessing the same devices) and
discussions indicating a standard API isn't necessarily a good idea. Really? Haven't you learned
anything from the database world?
So, why did I mention the two posts at the beginning of this article? Because Bob pointed out that
"those who cannot remember the past are condemned to fulfill it." At the moment, OpenFlow seems
to fit the bill perfectly.
We're now coming to the skeptic part of this chapter. Let's start with an easy observation: ideas
similar to OpenFlow were floated in the 1990s (and failed miserably).
Likewise, some people propose downloading 5-tuples or 12-tuples into all the switches along the flow
path. The only difference is that 15 years ago engineers understood that virtual circuit labels use fewer
resources than 5-to-12-tuple policy-based routing.
As expected, Ipsilon's approach had a few scaling issues. From the same article:
The bulk of the criticism, however, relates to Ipsilon's use of virtual circuits. Flows are
associated with application-to-application conversations and each flow gets its very own
VC. Large environments like the Internet with millions of individual flows would exhaust
VC tables.
Not surprisingly, a number of people (myself included) who still remember a bit of networking
history are making the exact same argument about the use of microflows in OpenFlow environments
... but it seems RFC 1925 (section 2.11) will yet again carry the day.
An hour after publishing this blog post, I realized (reading an article by W.R. Koss) that Ed
Crabbe mentioned that Ipsilon was the first attempt at SDN during his OpenFlow Symposium
presentation.
Continuing the skeptic streak: do you really expect to get a network operating system just because
you have a protocol that allows you to download forwarding tables into a switch?
The blog post was written in 2011, when the shortcomings of OpenFlow weren't that well
understood. Three years later (August 2014), all we have is a single production-grade commercial
controller (NEC ProgrammableFlow).
have "How to Build an OpenFlow Switch with Our Chipset" application notes available as soon as they
find OpenFlow commercially viable. Hopefully we'll see another Dell (or HP) emerge, producing
low-cost, reasonable-quality products in the low-end to mid-range market ... but all these switches will
still need networking software controlling them.
If you're old enough to remember the original PCs from IBM, you'll easily recognize the parallels.
IBM documented the PC hardware architecture and BIOS API (you even got the BIOS source code),
allowing numerous third-party vendors to build adapter cards (and later PC clones), but all those
machines had to run an operating system ... and most of them used MS-DOS (and later Windows).
Almost three decades later, the vast majority of PCs still run Microsoft's operating systems.
Some people think that the potential adoption of the OpenFlow protocol will magically materialize
open-source software to control the OpenFlow switches, breaking the bonds of proprietary networking
solutions. In reality, the companies that invested heavily in networking software (Cisco, Juniper, HP
and a few others) might be the big winners ... if they figure out fast enough that they should morph
into software-focused companies.
Cisco has clearly realized the winds are changing and started talking about the inclusion of OpenFlow
in the NX-OS operating system. I would bet their first OpenFlow implementation won't be an
OpenFlow-enabled Nexus switch.
Moving a bit further, you cannot program a controller unless it has a well-defined API you can use
(the northbound API). More than two years after the creation of the Open Networking Foundation,
we still don't have a specification (not even a public draft), and every controller vendor uses a
different API. The situation might improve with the release of Open Daylight, an open-source
OpenFlow controller that will (if it becomes widely used) set a de-facto standard.
and create an ingress Forwarding Equivalence Class (FEC) to map the backup traffic to that path. In
short, we need what's called an SDN Controller Northbound API.
LET'S SPECULATE
There might be several good reasons for the current state of affairs:
The only people truly interested in OpenFlow are the Googles of the world (Nicira is using
OpenFlow purely as an information transfer tool to get MAC-to-IP mappings into their
vSwitches);
Developers figure out all sorts of excellent reasons why their dynamic and creative work couldn't
possibly be hammered into the tight confines of a standard API;
The reality is probably a random mixture of all four (and a few others), but that doesn't change the
basic facts: until there's a somewhat standard and stable API (like SQL-86) that I could use with
SDN controllers from multiple vendors, I'm better off using Cisco ONE or the Junos XML API; otherwise
I'm just trading lock-ins (as ecstatic users of umbrella network management systems would be more
than happy to tell you).
On the other hand, if I stick with Cisco or Juniper (and implement a simple abstraction layer in my
application to work with both APIs), at least I can be pretty positive they'll still be around in a year
or two.
When you have a hammer, every problem seems like a nail. Nicira and later Open Daylight tried to
implement network virtualization with OpenFlow. As it turns out, they might have used the wrong tool.
The OpenFlow controller can thus proactively download the forwarding information to the switches
and stay out of the forwarding path, ensuring reasonable scalability.
BTW, even this picture isn't all rosy: Nicira had to implement virtual tunnels to work around the
OpenFlow point-to-point interface model.
Perform dynamic MAC learning in the OpenFlow controller: all frames with unknown source MAC
addresses are punted to the controller, which builds the dynamic MAC address table and
downloads the modified forwarding information to all switches participating in a layer-2 segment.
This is the approach used by NEC's ProgrammableFlow solution.
Drawback: the controller gets involved in the data plane, which limits the scalability of the solution.
Offload dynamic MAC learning to specialized service nodes, which serve as an intermediary
between the predictive static world of virtual switching and the dynamic world of VLANs. It
seems NVP used this approach in one of its early releases.
Drawback: the service nodes become an obvious chokepoint; an additional hop through a
service node increases latency.
Give up, half-ditch OpenFlow, and implement either dynamic MAC learning in virtual switches in
parallel with OpenFlow, or reporting of dynamic MAC addresses to the controller using a
non-OpenFlow protocol (to avoid data path punting to the controller). It seems recent versions of
VMware NSX use this approach.
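The first approach (dynamic MAC learning in the controller) boils down to surprisingly little logic. Here's a sketch that ignores timeouts, MAC moves and topology changes; all names and data structures are invented for illustration:

```python
# Controller-side dynamic MAC learning: unknown source MACs are punted to
# the controller (packet-in), which updates the shared MAC table and pushes
# a forwarding entry to every switch in the layer-2 segment.

mac_table = {}     # MAC -> (switch, port) where the MAC was first seen
pushed = []        # flow entries "downloaded" to switches

def packet_in(switch, port, src_mac, segment_switches):
    if src_mac in mac_table:
        return                          # already learned, nothing to do
    mac_table[src_mac] = (switch, port)
    for sw in segment_switches:         # download the entry to all members
        pushed.append({"switch": sw,
                       "match": {"dl_dst": src_mac},
                       "action": ("forward_toward", switch, port)})

packet_in("s1", 3, "aa:aa:aa:aa:aa:aa", ["s1", "s2", "s3"])
```

The scalability problem is visible even in the toy version: every first frame from every new MAC address crosses the controller, and every learning event fans out into per-switch flow updates.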
ARP is a nasty beast in an OpenFlow world: it's a control-plane protocol, and thus not
implementable in pure OpenFlow switches. The implementers have (yet again) two choices:
Punt the ARP packets to the controller, which yet again places the OpenFlow controller in the
forwarding path (and limits its scalability);
Solve layer-3 forwarding with a different tool (the approach used by VMware NSX and distributed
layer-3 forwarding in OpenStack Icehouse).
Instead of continuously adjusting the tool to make it fit for the job, let's step back a bit and ask
another question: what information do we really need to implement layer-2 and layer-3 forwarding
in an overlay virtual network? All we need are three simple lookup tables that can be installed via
any API mechanism of your choice (Hyper-V uses PowerShell):
IP forwarding table;
ARP table;
VM MAC-to-underlay IP table.
Some implementations would have a separate connected-interfaces table; other
implementations would merge that with the forwarding table. There are also
implementations merging ARP and IP forwarding tables.
These three tables, combined with local layer-2 and layer-3 forwarding, are all you need. Wouldn't it
be better to keep things simple instead of introducing yet another less-than-perfect abstraction
layer?
The blog post explaining how OpenFlow doesn't fit the needs of overlay virtual networks triggered a
flurry of questions along the lines of "do you think there's no need for OpenFlow?" Here's the
response:
IS OPENFLOW USEFUL?
OpenFlow is just a tool that allows you to install PBR-like forwarding entries into networking devices
using a standard protocol that should work across multiple vendors (more about that in another blog
post). From this perspective OpenFlow offers the same functionality as BGP FlowSpec or ForCES,
with a major advantage: it's already implemented in networking gear from numerous vendors.
Where could you use PBR-like functionality? I'm positive you already have a dozen ideas with
various levels of craziness; here are a few more:
Intelligent SPAN ports that collect only the traffic you're interested in;
OpenFlow has another advantage over BGP FlowSpec: it has the packet-in and packet-out
functionality that allows the controller to communicate with devices outside of the OpenFlow
network. You could use this functionality to implement new control-plane protocols or (for example)
an interesting layered authentication scheme that is not available in off-the-shelf switches.
Summary: OpenFlow is a great low-level tool that can help you implement numerous interesting
ideas, but I wouldn't spend my time reinventing the switching fabric wheel (or other things we
already do well).
It's easy to say "OpenFlow allows you to separate the forwarding and control planes, and control
multiple devices from a single controller," but how do you implement the control plane? How does
the control plane interact with the outside world? How do you implement legacy protocols in an
OpenFlow controller, and do you have to implement them? You'll get answers to all these questions
in this chapter.
Can you build an OpenFlow-based network with existing hardware? Is it possible to build a
multi-vendor network? These questions are answered in the second half of the chapter, which focuses
on vendor-specific implementation details.
MORE INFORMATION
You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:
Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
Numerous ipSpace.net webinars describe SDN, network programmability and automation, and
OpenFlow (some of them are freely available thanks to industry sponsors);
The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function
virtualization and SDDC technologies in your network;
IN THIS CHAPTER:
CONTROL PLANE IN OPENFLOW NETWORKS
IS OPEN VSWITCH CONTROL PLANE IN-BAND OR OUT-OF-BAND?
IMPLEMENTING CONTROL-PLANE PROTOCOLS WITH OPENFLOW
LEGACY PROTOCOLS IN OPENFLOW-BASED NETWORKS
OPENFLOW 1.1 IN HARDWARE: I WAS WRONG
OPTIMIZING OPENFLOW HARDWARE TABLES
OPENFLOW SUPPORT IN DATA CENTER SWITCHES
MULTI-VENDOR OPENFLOW: MYTH OR REALITY?
HYBRID OPENFLOW, THE BROCADE WAY
OPEN DAYLIGHT: INTERNET EXPLORER OR LINUX OF THE SDN WORLD?
How do you build a control plane network in a distributed controller-based system? How does the
controller communicate with the devices it controls? Should it use in-band or out-of-band
communication? This blog post, written in late 2013, tries to provide some answers.
OpenFlow is an application-level protocol running on top of TCP (and optionally TLS); the controller
and the controlled device are IP hosts using the IP connectivity services of some unspecified control-plane
network. Does that bring back fond memories of SDH/SONET days? It should.
You could (in theory) build another OpenFlow-controlled network to implement the control-plane network you need, but you'd quickly end up with turtles all the way down.
On the other hand, an out-of-band control-plane network is safe: we know how to build a robust L3
network with traditional gear, and a controller bug cannot disrupt the control-plane communication.
I would definitely use this approach in a data center environment, where the costs of implementing a
dedicated 1GE control-plane network wouldn't be prohibitively high.
Would the same approach work in WAN/Service Provider environments? Of course it would; after
all, we've been using it forever to manage traditional optical gear. Does it make sense? It definitely
does if you already have an out-of-band network, less so if someone asks you to build a new one to
support their bleeding-edge SDN solution.
A few days after I wrote the Control Plane in OpenFlow Networks blog post, I got a comment
saying "we worked really hard to implement numerous safeguards that make Open vSwitch in-band
control plane safe". Here's the whole story:
If you buy servers with a half dozen interfaces (I wouldn't), then it makes perfect sense to follow the
usual design best practices published by hypervisor vendors, and allocate a pair of interfaces to user
traffic, another pair to management/control plane/vMotion traffic, and a third pair to storage traffic.
Problem solved.
Buying servers with two 10GE uplinks (what I would do) definitely makes your cabling friend happy,
and reduces the overall networking costs, but does result in a slightly more interesting hypervisor
configuration.
Best case, you split the 10GE uplinks into multiple virtual uplink NICs (examples: Cisco's Adapter
FEX, Broadcom's NIC Embedded Switch, or SR-IOV) and transform the problem into a known
problem (see above); but what if you're stuck with two uplinks?
Figure 4-4: Logical interfaces created on physical NICs appear as physical interfaces to the hypervisor
Figure 4-5: Overlay virtual networks are not connected to the physical NICs
Figure 4-6: Hypervisor TCP/IP stack running in parallel with the Open vSwitch
For example, the OVS Neutron agent creates a dedicated bridge for each uplink, and connects the OVS
uplinks and the host TCP/IP stack to the physical uplinks through the per-interface bridge. That
setup ensures the control-plane traffic continues to flow even when a bug in the Neutron agent or OVS
breaks VM connectivity across OVS. For more details, see the OpenStack Networking in Too Much Detail
blog post published on the Red Hat OpenStack site.
Needless to say, this approach usually won't result in better forwarding behavior. For example, it
would be hard to implement layer-2 multipathing in a hybrid OpenFlow network if the switches rely on
STP to detect and break the loops.
The Packet-out message is used by the OpenFlow controller to send packets through any port of
any controlled switch.
The Packet-in message is used to send messages from the switches to the OpenFlow controller.
You could configure the switches to send all unknown packets to the controller, or set up flow
matching entries (based on the controller's MAC/IP address and/or TCP/UDP port numbers) to select
only those packets the controller is truly interested in.
For example, you could write a very simple implementation of STP (similar to what Avaya is doing
on their ERS-series switches when they run MLAG) where the OpenFlow controller would always
pretend to be the root bridge and shut down any ports where inbound BPDUs would indicate
someone else is the root bridge:
Send BPDUs through all the ports claiming the controller is the root bridge with very high
priority;
Configure flow entries that match the multicast destination address used by STP and forward
those packets to the controller;
Inspect incoming BPDUs, and shut down the port if the BPDU indicates someone else claims to
be a root bridge.
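The three-step recipe above fits in a few lines of Python. This is a minimal sketch: the bridge-ID layout and the lower-ID-wins comparison follow IEEE 802.1D, but the controller's bridge ID and the function names are invented for the illustration, not taken from any real controller.

```python
# Minimal sketch of the BPDU port guard described above. An 802.1D
# bridge ID is 8 bytes: a 2-byte priority followed by a 6-byte MAC
# address; the lower ID wins the root-bridge election.

# The controller claims root with the best (lowest) priority, so only
# a bridge with priority 0 and a lower MAC address could outbid it.
CONTROLLER_BRIDGE_ID = (0, "02:00:00:00:00:01")

def parse_bridge_id(raw):
    """Split an 8-byte 802.1D bridge ID into (priority, MAC string)."""
    priority = int.from_bytes(raw[:2], "big")
    mac = ":".join(f"{b:02x}" for b in raw[2:8])
    return priority, mac

def should_shut_down(bpdu_root_id):
    """Shut the port if the received BPDU advertises a root bridge
    with a better (lower) bridge ID than the controller's own."""
    return parse_bridge_id(bpdu_root_id) < CONTROLLER_BRIDGE_ID
```

A real controller would additionally install a flow entry matching the STP destination MAC address (01:80:c2:00:00:00) so that only BPDUs get punted, and issue a port-down request when should_shut_down() returns True.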
SUMMARY
The OpenFlow protocol allows you to implement any control-plane protocol you wish in the OpenFlow
controller; if a controller does not implement the protocols you need in your data center, it's not due
to a lack of OpenFlow functionality, but due to other factors (fill in the blanks).
If the OpenFlow product you're interested in uses hybrid-mode OpenFlow (where the control plane
resides in the traditional switch software) or uses OpenFlow to program overlay networks (example:
Nicira's NVP), you don't have to worry about its control-plane protocols.
If, however, someone tries to sell you software that's supposed to control your physical switches,
and does not support the usual set of protocols you need to integrate the OpenFlow-controlled
switches with the rest of your network (example: STP, LACP and LLDP on L2, and some routing protocol
on L3), think twice. If you use the OpenFlow-controlled part of the network in an isolated fabric or
small-scale environment, you probably don't care whether the new toy supports STP or OSPF; if you
want to integrate it with the rest of your existing data center network, be very careful.
Most OpenFlow controller vendors try to ignore the legacy control-plane protocols; after all, there's
no glory to be had in implementing LACP, LLDP or STP. Their myopic vision might hinder the success
of your OpenFlow deployment, as you'll have to integrate the new network with the legacy
equipment.
Overlay solutions (like VMware NSX) don't interact with the existing network at all. A hypervisor
running Open vSwitch and using STT or GRE appears as an IP host to the network, and uses existing
Linux mechanisms (including NIC bonding and LACP) to solve the L2 connectivity issues.
Layer-2 gateways included with VMware NSX for multiple hypervisors support STP and
LACP. VM-based gateways included with VMware NSX for vSphere run routing protocols
(BGP, OSPF and IS-IS) and rely on the underlying hypervisor's support of layer-2 control-plane
protocols (LACP and LLDP).
Hybrid OpenFlow solutions that only modify the behavior of the user-facing network edge (example:
per-user access control) are also OK. You should closely inspect what the product does and ensure it
doesn't modify the network device behavior you rely upon in your network, but in principle you
should be fine. For example, the XenServer vSwitch Controller modifies just the VM-facing behavior,
but not the behavior configured on uplink ports.
Rip-and-replace OpenFlow-based network fabrics are the truly interesting problem. You'll have to
connect existing hosts to them, so you'd probably want to have LACP support (unless you're a
VMware-only shop), and they'll have to integrate with the rest of the network, so you should ask for
at least:
LACP, if you plan to connect anything but vSphere hosts to the fabric; and you'll probably need
a device to connect the OpenFlow-based part of the network to the outside world;
LLDP or CDP. If nothing else, they simplify troubleshooting, and they are implemented on almost
everything, including the vSphere vSwitch;
STP, unless the OpenFlow controller implements split-horizon bridging like vSphere's vSwitch, but
even then we need basic things like BPDU guard.
Call me a grumpy old man, but I wouldn't touch an OpenFlow controller that doesn't support the
above-mentioned protocols. Worst case, if I were forced to implement a network using such a
controller, I would make sure it's totally isolated from the rest of my network. Even then a single
point of failure wouldn't make much sense, so I would need two firewalls or routers, and static
routing in redundant scenarios breaks sooner or later. You get the picture.
To summarize: dynamic link-status and routing protocols were created for a reason. Don't allow
glitzy new-age solutions to daze you, or you just might experience a major headache down the road.
In 2011 I thought we might have to wait a few years before seeing the first products supporting the
multiple lookup tables introduced by OpenFlow 1.1. I was wrong about the lack of hardware support
for OpenFlow 1.1: the first proof-of-concept products appeared a few months later. Unfortunately
that product never became mainstream because the hardware it uses is too expensive; we had to
wait until September 2013 to get the first production-grade OpenFlow 1.3 switches (almost all vendors
decided to skip OpenFlow versions 1.1 and 1.2).
Initial hardware OpenFlow implementations installed OpenFlow forwarding rules in TCAM (the
specialized memory used to implement packet filters and policy-based routing), resulting in a dismally
low maximum number of forwarding entries. Most vendors quickly realized it's possible to combine
multiple hardware tables available in their switching silicon, and present them as a single table to an
OpenFlow controller.
some variant of a binary tree for L3 switching). The two or three switching tables would appear as a
single OpenFlow table to the controller, and the hardware switch would be able to install more flows.
Quite ingenious ;)
The vendors using this approach include Arista (L2), Cisco (L2), and Dell Force10 (L2 and L3). HP is
using both the MAC table and TCAM in its 5900 switch, but presents them as two separate tables to the
OpenFlow controller (at least that was my understanding of their documentation; please do correct
me if I got it wrong), pushing the optimization challenge back to the controller.
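The table-combining trick boils down to a classifier in the switch agent: an entry matching only the destination MAC fits the large exact-match L2 table, a destination-IP-only entry fits the L3 lookup table, and anything else must burn a TCAM entry. A rough sketch with invented field names (not any vendor's actual API):

```python
# Sketch of the table-combining trick: the switch agent shows the
# controller one flow table, but places each entry into the cheapest
# hardware table that can hold it. Field names are invented for this
# illustration.

MAC_TABLE_FIELDS = {"eth_dst"}     # large exact-match L2 table
L3_TABLE_FIELDS = {"ipv4_dst"}     # LPM table normally used for routing

def pick_hw_table(match_fields):
    """Return the hardware table a flow entry should land in."""
    if match_fields and match_fields <= MAC_TABLE_FIELDS:
        return "mac-table"
    if match_fields and match_fields <= L3_TABLE_FIELDS:
        return "l3-table"
    return "tcam"   # small and power-hungry, but matches anything
```

Only the entries that genuinely need multi-field matching consume the scarce TCAM space; everything else lands in the much larger L2/L3 tables.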
In spring 2014 most data center switching vendors supported OpenFlow on at least some of their
products. Here's an overview documenting the state of the data center switching market in May 2014:
Most vendors have a single OpenFlow lookup table (one of the limitations of OpenFlow 1.0); HP
has a single table on the 12500, two tables on the 5900, and a totally convoluted schema on ProCurve
switches.
Most vendors work with a single controller. Cisco's Nexus switches can work with up to 8
concurrent controllers, HP switches with up to 64 concurrent controllers.
Many vendors optimize the OpenFlow lookup table by installing L2-only or L3-only flow entries in
dedicated hardware (which still looks like the same table to the OpenFlow controller);
OpenFlow table sizes remain dismal: most switches support low thousands of 12-tuple flows.
Exception: NEC edge switches support between 64K and 160K 12-tuple flows.
While everyone supports full 12-tuple matching (additionally, HP supports IPv6, MPLS, and PBB),
almost no one (apart from HP) offers significant packet rewrite functionality. Most vendors can
set the destination MAC address or push a VLAN tag; HP's 5900 can set any field in the packet,
copy/decrement the IP or MPLS TTL, and push VLAN, PBB or MPLS tags.
Summary: It's nigh impossible to implement anything but destination-only L2+L3 switching at
scale using existing hardware (the latest chipsets from Intel or Broadcom aren't much better), and
I wouldn't want to be a controller vendor dealing with the idiosyncrasies of all the hardware out there;
all you can do consistently across most hardware switches is forward packets (without rewrites),
drop packets, or set VLAN tags.
Based on the state of OpenFlow support in existing data center switches (see the previous post), it's
fair to ask the question: is it realistic to expect multi-vendor OpenFlow deployments? The answer I
got in May 2013 was "no, unless you want to live with extremely baseline functionality". The
situation wasn't any better in August 2014, when this chapter was last updated.
Figure 4-8: Interop 2013 OpenFlow demo network (source: NEC Corporation of America)
In a mixed-vendor environment, the ProgrammableFlow controller obviously cannot use all the smarts
of the PF5240 switches; it has to fall back to the least common denominator (vanilla OpenFlow 1.0)
and install granular flows in every single switch along the path, significantly increasing the time it
takes to install new flows after a core link failure.
Will multi-vendor OpenFlow get any better? It might: OpenFlow 1.3 has enough functionality to
implement the Edge+Core design, but of course there aren't too many OpenFlow 1.3 products out
there ... and even the products that have been announced might not have the features the
ProgrammableFlow controller needs to scale the OpenFlow fabric.
For the moment, the best advice I can give you is: if you want to have a working OpenFlow data
center fabric, stick with an NEC-only solution.
Most traditional data center switching vendors implemented hybrid OpenFlow functionality that
allows an OpenFlow controller to manage individual ports or VLANs instead of the whole switch.
Brocade was probably the first vendor that shipped a working solution (in June 2012).
Protected hybrid port mode uses the OpenFlow FIB for certain VLANs or packets matching a packet
filter (ACL). This mode allows you to run OpenFlow in parallel (ships-in-the-night) with the
traditional forwarding over the same port: a major win if you're not willing to spend money on
two 100GE ports (one for OpenFlow traffic, another for regular traffic).
Unprotected hybrid port mode performs a lookup in the OpenFlow FIB first and uses the traditional
FIB as a fallback mechanism (in case there's no match in the OpenFlow table). This mode can be
used to augment the traditional forwarding mechanisms (example: OpenFlow-controlled PBR) or
create value-added services on top of (not in parallel with) the traditional network.
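The lookup order of the unprotected hybrid port mode can be sketched like this (a toy model with invented data structures, not Brocade's actual implementation):

```python
# Toy model of the unprotected hybrid port lookup order: OpenFlow
# flow table first, traditional destination-based FIB as the fallback.
# Keys and table shapes are invented for this sketch.

def hybrid_lookup(packet, openflow_fib, traditional_fib):
    """Return the forwarding action for a packet entering an
    unprotected hybrid port."""
    # A granular OpenFlow match (e.g. PBR keyed on destination port)
    key = (packet["dst_ip"], packet.get("dst_port"))
    if key in openflow_fib:
        return openflow_fib[key]            # OpenFlow FIB wins
    # Miss: fall back to traditional destination-only forwarding
    return traditional_fib.get(packet["dst_ip"], "drop")
```

This is exactly what makes the mode useful for augmentation: a handful of granular OpenFlow entries steer interesting traffic, while everything else follows the routing table as before.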
The set of applications one can build with hybrid OpenFlow is well known: from policy-based
routing and traffic engineering to bandwidth-on-demand. However, Brocade MLX has one
more trick up its sleeve: it supports packet replication actions that can be used to implement
behavior similar to IP multicast or SPAN port functionality. You can use that feature in environments
that need reliable packet delivery over UDP, to increase the chance that at least a single copy of the
packet will reach the destination.
I like the hybrid approach Brocade took (it's quite similar to what Juniper is doing with its integrated
OpenFlow) and the interesting new features (like packet replication), but the big question
remains unanswered: where are the applications (aka OpenFlow controllers)? At the moment,
everyone (Brocade included) is partnering with NEC or demoing their gear with public-domain
controllers. Is this really the best the traditional networking vendors can do? I sincerely hope not.
Is OpenDaylight the right answer to the controller wars that seemed inevitable in early 2013?
Here's my take (written in February 2013):
pay for the operating systems, the more money will be left to buy hardware. For more details, you
absolutely have to read Be Wary of Geeks Bearing Gifts by Simon Wardley.
So what will OpenDaylight be? Another Internet Explorer (killing the OpenFlow controller market, Big
Switch in particular) or another Linux (a good product ensuring OpenFlow believers continue
spending money on hardware, not software)? I'm hoping we'll get a robust networking Linux, but
your guess is as good as mine.
An architecture in which a central controller runs the control plane and uses the attached devices as pure
forwarding elements has numerous scalability challenges, including:
Existing hardware (data center switches) supports low thousands of full OpenFlow entries,
making it useless for large-scale deployments;
Existing hardware switches can install at most a few thousand new flow entries per second;
Punting packets from the data plane to the controller in existing switches is extremely slow
compared to the regular data-plane forwarding performance.
MORE INFORMATION
You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:
Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
Numerous ipSpace.net webinars describe SDN, network programmability and automation, and
OpenFlow (some of them are freely available thanks to industry sponsors);
The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function
virtualization and SDDC technologies in your network.
This chapter describes numerous challenges every OpenFlow controller implementation has to
overcome to work well in large-scale environments. Use it as a (partial) checklist when evaluating
OpenFlow controller products and solutions.
IN THIS CHAPTER:
OPENFLOW FABRIC CONTROLLERS ARE LIGHT-YEARS AWAY FROM WIRELESS ONES
OPENFLOW AND FERMI ESTIMATES
50 SHADES OF STATEFULNESS
FLOW TABLE EXPLOSION WITH OPENFLOW 1.0 (AND WHY WE NEED OPENFLOW 1.3)
FLOW-BASED FORWARDING DOESN'T WORK WELL IN VIRTUAL SWITCHES
PROCESS, FAST AND CEF SWITCHING AND PACKET PUNTING
CONTROLLER-BASED PACKET FORWARDING IN OPENFLOW NETWORKS
CONTROL-PLANE POLICING IN OPENFLOW NETWORKS
PREFIX-INDEPENDENT CONVERGENCE (PIC): FIXING THE FIB BOTTLENECK
FIB UPDATE CHALLENGES IN OPENFLOW NETWORKS
OpenFlow controllers are usually compared with wireless controllers (particularly when someone
tries to prove that they're a good idea). Nothing could be further from the truth.
TOPOLOGY MANAGEMENT
Wireless controllers work with the devices on the network edge. A typical wireless access point has
two interfaces: a wireless interface and an Ethernet uplink, and the wireless controller isn't
managing the Ethernet interface or any control-plane protocols that interface might have to run. The
wireless access point communicates with the controller through an IP tunnel and expects someone
else to provide IP connectivity, routing and failure recovery. The underlying physical topology of the
network is thus totally abstracted and invisible to the wireless controller.
Data center fabrics are built from high-speed switches with tens of 10/40GE ports, and the
OpenFlow controller must manage topology discovery, topology calculation, flow placement, failure
detection and fast rerouting. There are zillions of things you have to do in data center fabrics that
you never see in a controller-based wireless network.
TRAFFIC FLOW
In traditional wireless networks all traffic flows through the controller (there are some exceptions,
but let's ignore them for the moment). The hub-and-spoke tunnels between the controller and the
individual access points carry all the user traffic, and the controller makes all the smart forwarding
decisions.
In an OpenFlow-based fabric the controller should make a minimal number of data-plane decisions
(ideally: none), because every time you have to punt packets to the controller, you reduce the
overall network performance (not to mention the dismal capabilities of today's switches when they
have to do CPU-based packet forwarding across an SSL session).
AMOUNT OF TRAFFIC
Wireless access points handle megabits of traffic, making hub-and-spoke controller-based
forwarding a viable alternative.
Data center fabrics are usually multi-terabit structures (every single pizza-box ToR switch has over a
terabit of forwarding capacity), three to four orders of magnitude faster than the wireless networks
we're comparing them with. Controller-based forwarding is totally unrealistic.
FORWARDING INFORMATION
In a traditional controller-based wireless network, the access-point forwarding is totally stupid: the
access points forward the data between directly connected clients (if allowed to do so) or send the
data received from them into the IP tunnel established with the controller (and vice versa). There's
no forwarding state to distribute; all an access point needs to know are the MAC addresses of the
wireless clients.
In an OpenFlow-based fabric the controller must distribute as much forwarding, filtering and
rewriting (example: decrease TTL) information as possible to the OpenFlow-enabled switches to
minimize the amount of traffic flowing through the controller.
Furthermore, smart OpenFlow controllers build forwarding information in a way that allows the
switches to cope with link failures (the controller has to install backup entries with lower
matching priority); you wouldn't want to have an overloaded controller and a burnt-out switch CPU
every time a link goes down, network topology is lost, and the switch (in deep panic) forwards all
the traffic to the controller.
The functionality of a good OpenFlow controller that proactively pre-programs backup forwarding
entries (example: NEC ProgrammableFlow) is very similar to MPLS Traffic Engineering with Fast
Reroute; you cannot expect its complexity to be significantly lower than that.
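Here's a minimal sketch of the backup-entry idea, assuming (as a simplification) that entries whose output port went down no longer match, so the packet falls through to the next-highest priority; the flow-entry format is invented for the illustration:

```python
# Sketch of proactively installed backup entries. Simplifying
# assumption: flows pointing at a dead output port stop matching, so
# the switch falls through to the next-highest priority on its own.

def select_flow(flows, dead_ports):
    """Return the highest-priority flow whose output port is still up,
    or None when every path is gone (time to panic and punt)."""
    for flow in sorted(flows, key=lambda f: f["priority"], reverse=True):
        if flow["out_port"] not in dead_ports:
            return flow
    return None

# Primary and pre-installed backup entry for the same destination
flows_for_dst = [
    {"priority": 100, "out_port": 1},   # primary path
    {"priority": 50,  "out_port": 2},   # backup, lower priority
]
```

The key point: the backup entry sits in the hardware table before the failure happens, so failover needs no controller round-trip.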
REAL-TIME EVENTS
User roaming is the only real-time event in a controller-based wireless network (remember: access
point uplink failure is not handled by the controller). Access points do most of the work on their own
(the expected behavior is specified in IEEE standards anyway), and the controller just updates the
MAC forwarding information. The worst thing that can happen if the controller is too slow is a slight
delay experienced by the user (noticeable only on voice calls and by players of WoW sessions
running around large buildings).
The other near-real-time wireless event is user authentication, which often takes seconds (or my
wireless network is severely misconfigured). Yet again, nothing critical; the controller can take its
time.
In data center fabrics, you have to react to a failure in milliseconds and reprogram the forwarding
entries on tens of switches (unless you know what you're doing and have already installed the pre-computed backup entries; see above).
SUMMARY
As you can see, wireless controllers have nothing to do with OpenFlow controllers; they aren't even
remotely similar in requirements or complexity (the only exception being OpenFlow controllers that
program just the network edge, like Nicira's NVP).
Comparing the two is misleading and hides the real scope of the problem; no wonder some people
would love you to believe otherwise, because that makes selling controller-based fabrics easier.
In reality, an OpenFlow controller managing a physical data center fabric is a complex piece of real-time software, as anyone who tried to build a high-end switch or router has learned the hard way.
Before going into the details of OpenFlow scalability challenges, let's try to estimate the size of the
problem we're dealing with.
And now for the real question that triggered this blog post: some people still think we can
implement stateful OpenFlow-based network services (NAT, FW, LB) in hardware. How realistic is
that?
Scenario: web application(s) hosted in a data center with 10GE WAN uplink.
Questions:
How many new sessions are established per second (how many OpenFlow flows does the
controller have to install in the hardware)?
How many parallel sessions will there be (how many OpenFlow flows does the hardware have to
support)?
Assuming a constant stream of users with these characteristics, we get 125,000 new sessions over a
10GE uplink every 5 seconds, or 25,000 new sessions per second per 10 Gbps.
Always do a reality check: is this number realistic? Load balancing vendors support way more
connections per second (cps) at 10 Gbps speeds: F5 BIG-IP 4000s claims 150K cps @ 10 Gbps, and
VMware claims its NSX Edge Services Router (improved vShield Edge) will support 30K cps @ 4
Gbps. It seems my guesstimate is on the lower end of reality (if you have real-life numbers, please
do share them in the comments!).
Modern web browsers use persistent HTTP sessions. Browsers want to keep sessions established as
long as possible; web servers serving high-volume content commonly drop them after ~15 seconds
to reduce the server load (Apache is notoriously bad at handling a very high number of concurrent
sessions). 25,000 cps x 15 seconds = 375,000 flow records.
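Spelled out, the arithmetic behind the estimate looks like this:

```python
# Reproducing the Fermi estimate from the text: new flows the
# controller must install per second on a loaded 10GE uplink, and the
# number of concurrent flow entries the hardware must hold given a
# ~15-second session lifetime.

new_sessions_per_sec = 125_000 // 5   # 125,000 new sessions every 5 s
session_lifetime_s = 15               # typical keepalive drop timer

concurrent_flows = new_sessions_per_sec * session_lifetime_s

print(new_sessions_per_sec)   # 25000 flow setups per second
print(concurrent_flows)       # 375000 concurrent flow records
```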
Trident-2-based switches can handle 100K+ L4 OpenFlow entries (at least Big Switch claimed so
when we met @ NFD6). That's definitely on the low end of the required number of sessions at 10
Gbps; do keep in mind that the total throughput of a typical Trident-2 switch is above 1 Tbps, or
three orders of magnitude higher. Enterasys switches support 64M concurrent flows @ 1 Tbps, which
seems to be enough.
The flow setup rate on Trident-2-based switches is supposedly still in the low thousands, or an order of
magnitude too low to support a single 10 Gbps link (the switches based on this chipset usually have
64 10GE interfaces).
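To see how far off the setup rate is, compare the required rate from the Fermi estimate with what the hardware can sustain (the 2,000 flows/second figure is my assumed stand-in for "low thousands", not a measured value):

```python
# Order-of-magnitude check on the flow setup rate discussed above.
# 25,000 flows/second per loaded 10GE port comes from the Fermi
# estimate earlier in the chapter; the 2,000 flows/second hardware
# figure is an assumed representative value for "low thousands".

required_per_10ge = 25_000   # new flows/s on one fully loaded 10GE port
switch_setup_rate = 2_000    # assumed flow-mod rate of the whole switch

shortfall = required_per_10ge / switch_setup_rate
print(shortfall)   # 12.5 -> an order of magnitude short for one port,
                   # before even considering the other 63 ports
```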
Now is the time for someone to invoke the ultimate Moore's Law spell and claim that the hardware
will support whatever number of flow entries in the not-so-distant future. Good luck with that; I'll settle
for an Intel Xeon server that can be pushed to 25 Mpps. OpenFlow has its uses, but large-scale
stateful services are obviously not one of them.
State kept by networking devices is obviously one of the factors impacting scalability. Let's see how
much state we might need, how we can reduce the amount of state kept in a device, and how we
can get rid of real-time state changes.
50 SHADES OF STATEFULNESS
A while ago Greg Ferro wrote a great article describing integration of overlay and physical networks
in which he wrote that an overlay network tunnel has no state in the physical network, triggering
an almost-immediate reaction from Marten Terpstra (of RIPE fame, now @ Plexxi) arguing that the
network (at least the first ToR switch) knows the MAC and IP address of the hypervisor host and
thus has at least some state associated with the tunnel.
Marten is correct from a purely scholastic perspective (using his argument, the network keeps some
state about TCP sessions as well), but what really matters is how much state is kept, which
device keeps it, how it's created and how often it changes.
Decades ago we had a truly reliable system that kept session state in every single network
node; it never lost a packet, but it barely coped with 2 Mbps links (the old-timers might
remember it as X.25).
The state granularity should get ever coarser as you go deeper into the network core: edge
switches keep MAC address tables and ARP/ND caches of adjacent end hosts, core routers know
about IP subnets, routers in the public Internet know about the publicly advertised prefixes
(including every prefix Bell South ever assigned to one of its single-homed customers), while the
high-speed MPLS routers know only about BGP next hops and other forwarding equivalence classes
(FECs).
SUMMARY
Whenever you're evaluating a network architecture or reading a vendor whitepaper describing a
next-generation unicorn-tears-blessed solution, try to identify how much state the individual
components keep, how it's created and how often it changes. Hardware devices storing plenty of
state tend to be complex and expensive (keep that in mind when evaluating the next
application-aware fabric).
Not surprisingly, RFC 3439 (Some Internet Architectural Guidelines and Philosophy) gives
you similar advice, although in a way more eloquent form.
In the initial What is OpenFlow blog post I mentioned multi-table support and why it's crucial to a
scalable OpenFlow implementation. It took me almost two years to write a follow-up blog post
explaining the scalability problems of OpenFlow 1.0.
We'll focus on a single layer-2 segment (you really don't want to get me started on the complexities
of scalable OpenFlow-based layer-3 forwarding) implemented on a single hardware switch. Our
segment will have two web servers (ports 1 and 2), a MySQL server (port 3), and a default gateway
on port 4.
The default gateway could be a firewall, a router, or a load balancer; it really doesn't
matter as long as we stay focused on layer-2 forwarding.
Flow match          Action
DMAC = Web-1        Forward to port 1
DMAC = Web-2        Forward to port 2
DMAC = MYSQL-1      Forward to port 3
DMAC = GW           Forward to port 4
Smart switches wouldn't store the MAC-only flow rules in TCAM; they would use other
forwarding structures available in the switch, like MAC hash tables.
Perform packet forwarding based on destination MAC address and VLAN tag.
Switches using the OpenFlow 1.0 forwarding model cannot perform more than one lookup during the
packet forwarding process; they must match the input port and destination MAC address in a single
flow rule, resulting in a flow table similar to this one:
Flow match                   Action
In port = 2, DMAC = Web-1    Forward to port 1
In port = 3, DMAC = Web-1    Forward to port 1
In port = 4, DMAC = Web-1    Forward to port 1
In port = 1, DMAC = Web-2    Forward to port 2
In port = 3, DMAC = Web-2    Forward to port 2
In port = 4, DMAC = Web-2    Forward to port 2
The number of TCAM entries needed to support multi-tenant layer-2 forwarding has exploded:
Flow match          Action
TCP SRC = 80        Permit
…                   Permit
…                   Permit
…                   Permit
…                   Permit
Anything else       Drop
By now you've probably realized what happens when you try to combine the input ACL with other
forwarding rules. The OpenFlow controller has to generate a Cartesian product of all three
requirements: the switch needs a flow entry for every possible combination of input port, ACL entry
and destination MAC address.
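The Cartesian-product explosion is easy to quantify. A minimal sketch (the numbers are hypothetical, not from any vendor data sheet):

```python
# With a single OpenFlow 1.0 flow table, every combination of input port,
# ACL entry and destination MAC address needs its own flow rule.
def tcam_entries(in_ports: int, acl_entries: int, dmacs: int) -> int:
    return in_ports * acl_entries * dmacs

# Even a tiny 4-port segment with a 6-entry ACL and 4 known MAC
# addresses already needs 96 rules ...
assert tcam_entries(4, 6, 4) == 96

# ... and a 48-port switch with a 100-entry ACL and 1,000 MAC addresses
# would need 4.8 million TCAM entries.
print(tcam_entries(48, 100, 1000))
```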
Table #1: ACL and tenant classification table. This table would match input ports (for tenant
classification) and ACL entries, drop the packets not matched by input ACLs, and redirect the
forwarding logic to the correct per-tenant table.
A typical switch would probably have to implement the first table with a TCAM. All the other tables
could use the regular MAC forwarding logic (the MAC forwarding table is usually orders of magnitude
bigger than the TCAM). Scalability problem solved.
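The two-stage pipeline described above can be illustrated with a toy sketch (not a real OpenFlow agent; all port numbers, tenant names and MAC addresses are made up):

```python
# Table #1 combines tenant classification and the input ACL, then hands
# the packet to a per-tenant MAC table (table #2) or drops it.

classifier = {              # input port -> tenant (in TCAM with the ACL)
    1: "tenant-A", 2: "tenant-A",
    3: "tenant-B", 4: "tenant-B",
}

def acl_permit(pkt: dict) -> bool:
    return pkt.get("tcp_dst") in (80, 443)       # toy input ACL

mac_tables = {              # per-tenant MAC tables (cheap hash lookups)
    "tenant-A": {"aa:aa:aa:aa:aa:01": 2},
    "tenant-B": {"bb:bb:bb:bb:bb:01": 4},
}

def forward(pkt: dict):
    tenant = classifier.get(pkt["in_port"])      # table #1: classification
    if tenant is None or not acl_permit(pkt):    # table #1: ACL drop
        return None
    return mac_tables[tenant].get(pkt["dmac"])   # table #2: MAC forwarding

print(forward({"in_port": 1, "dmac": "aa:aa:aa:aa:aa:01", "tcp_dst": 80}))  # -> 2
print(forward({"in_port": 1, "dmac": "aa:aa:aa:aa:aa:01", "tcp_dst": 22}))  # -> None
```

The total number of entries is now the sum of the table sizes (ports × ACL plus MAC entries), not their product.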
Summary: Buy switches and controllers that support OpenFlow 1.3
After you spend a few minutes researching the data sheets of existing OpenFlow-capable switches
from major networking vendors, it becomes painfully obvious that flow-based forwarding makes no
sense on hardware switching platforms. Surprisingly, the virtual switches aren't much better.
If you're old enough to remember the Catalyst 5000, you're probably getting unpleasant flashbacks
of Netflow switching... but the problems we experienced with that solution must have been caused
by poor hardware and an underperforming CPU, right? Well, it turns out virtual switches don't fare
much better.
Digging deep into the bowels of Open vSwitch reveals an interesting behavior: flow eviction. Once
the kernel module hits the maximum number of microflows, it starts throwing out old flows. Makes
perfect sense (after all, that's how every caching system works) until you realize the default limit
is 2500 microflows, which is barely good enough for a single web server and definitely orders of
magnitude too low for a hypervisor hosting 50 or 100 virtual machines.
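The eviction behavior can be modeled with a toy cache. This is a sketch illustrating (not reproducing) the kernel-module behavior described above; the LRU policy and the flow keys are assumptions for illustration:

```python
# A toy microflow cache: once full, the least-recently-used flow is
# evicted, and the next packet of that flow takes the slow path again.
from collections import OrderedDict

class MicroflowCache:
    def __init__(self, limit: int = 2500):      # the dismally low default
        self.limit, self.flows = limit, OrderedDict()

    def lookup(self, key) -> bool:
        """Return True on a cache hit; install the flow (evicting) on a miss."""
        if key in self.flows:
            self.flows.move_to_end(key)         # mark as recently used
            return True
        if len(self.flows) >= self.limit:
            self.flows.popitem(last=False)      # evict the oldest flow
        self.flows[key] = True
        return False

cache = MicroflowCache(limit=3)
for flow in ["a", "b", "c", "d", "a"]:          # "d" evicts "a" ...
    cache.lookup(flow)
print(len(cache.flows))                         # ... cache stays at 3 entries
```

With 50 or 100 VMs behind one cache, the working set easily exceeds the limit and every eviction turns into another slow-path trip.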
WHY, OH WHY?
The very small microflow cache size doesn't make any obvious sense. After all, web servers easily
handle 10,000 sessions, and some Linux-based load balancers handle an order of magnitude more
sessions per server. While you can increase the default cache size, one's bound to wonder what the
reason for the dismally low default value is.
I wasn't able to figure out the underlying root cause, but I suspect it has to do with per-flow
accounting: flow counters have to be transferred from the kernel module to the user-mode daemon
periodically, and copying hundreds of thousands of flow counters over a user-to-kernel socket at
short intervals might result in somewhat noticeable CPU utilization.
After establishing the size of the problem, let's move forward to the first scalability obstacle:
controller-based packet forwarding. A review of existing network platform behavior (Cisco IOS)
might help you understand the challenges of large-scale OpenFlow implementations.
Interrupt switching (packet forwarding within the interrupt handler) is much faster, as it doesn't
involve context switching and potential process preemption. There's a gotcha, though: if you spend
too much time in an interrupt handler, the device becomes non-responsive, starts adding
unnecessary latency to forwarded packets, and eventually starts dropping packets due to receive
queue overflows (you don't believe me? Configure debug all on the console interface of a Cisco
router).
There's not much you can do to speed up ACLs (which have to be evaluated sequentially), and NAT
is usually not a big deal (assuming the programmers were smart enough to use hash tables).
Destination address lookup might be a real problem, more so if you have to do it numerous times
(example: the destination is a BGP route with a BGP next hop based on a static route with its next
hop learnt from OSPF). Welcome to fast switching.
Fast switching is a reactive cache-based IP forwarding mechanism. The address lookup within the
interrupt handler uses a cache of destinations to find the IP next hop, outgoing interface, and
outbound layer-2 header. If the destination is not found in the fast switching cache, the packet is
punted to the IP(v6) Input process, which eventually performs full-blown destination address lookup
(including ARP/ND resolution) and stores the results in the fast switching cache.
Fast switching worked great two decades ago (there were even hardware implementations of fast
switching) ... until the bad guys started spraying the Internet with vulnerability scans. No caching
code works well with miss rates approaching 100% (because every packet is sent to a different
destination) and very high cache churn (because nobody designed the cache to hold 100,000 or
more entries).
When faced with simple host scanning activity, routers using fast switching in combination with a
high number of IP routes (read: Internet core routers) experienced severe brownouts, because most
of the received packets had destination addresses that were not yet in the fast switching cache, and
so the packets had to be punted to process switching. Welcome to CEF switching.
CEF switching (Cisco Express Forwarding) is a proactive, deterministic IP forwarding mechanism.
The routing table (RIB) computed by the routing protocols is copied into the forwarding table (FIB),
where it's combined with adjacency information (ARP or ND table) to form a deterministic lookup
table. When a router uses CEF switching, there's (almost) no need to punt packets sent to unknown
destinations to the IP Input process; if a destination is not in the FIB, it does not exist.
There are still cases where CEF switching cannot do its job. For example, packets sent to IP
addresses on directly connected interfaces cannot be sent to the destination hosts until the router
performs ARP/ND MAC address resolution; these packets have to be sent to the IP Input process.
The directly connected prefixes are thus entered as glean adjacencies in the FIB, and as the router
learns the MAC address of the target host (through an ARP or ND reply), it creates a dynamic host
route in the FIB pointing to the adjacency entry for the newly-discovered directly-connected host.
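The glean-then-host-route mechanism can be sketched with a toy FIB. The longest-prefix-match is hard-coded to three prefix lengths and all addresses are illustrative; a real FIB obviously does a proper trie lookup:

```python
# Proactive (CEF-style) model: the FIB is built up front; a connected
# subnet starts as a "glean" entry (punt for ARP/ND), and a /32 host
# route with a real adjacency is installed once the host is resolved.
fib = {
    "10.1.1.0/24": {"type": "glean"},                  # connected, unresolved
    "0.0.0.0/0":   {"type": "adjacency", "nh": "upstream"},
}

def lookup(dst: str):
    # toy longest-prefix match: host route, then connected /24, then default
    for prefix in (dst + "/32", dst.rsplit(".", 1)[0] + ".0/24", "0.0.0.0/0"):
        if prefix in fib:
            return prefix, fib[prefix]
    raise KeyError(dst)

def arp_resolved(host: str, mac: str):
    fib[host + "/32"] = {"type": "adjacency", "nh": mac}   # dynamic host route

_, entry = lookup("10.1.1.5")
print(entry["type"])                 # glean -> punt, trigger ARP/ND
arp_resolved("10.1.1.5", "00:00:5e:00:53:01")
_, entry = lookup("10.1.1.5")
print(entry["type"])                 # adjacency -> stays in the fast path
```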
Actually, you wouldn't want to send too many packets to the IP Input process; it's better to create
the host route in the FIB (pointing to the bit bucket, /dev/null, or something equivalent) even
before the ARP/ND reply is received, to ensure subsequent packets sent to the same destination are
dropped, not punted (punting them would be nicely exploitable by an ND exhaustion attack).
It's pretty obvious that the CEF table must stay current. For example, if the adjacency information
is lost (due to ARP/ND aging), the packets sent to that destination are yet again punted to process
switching. No wonder the router periodically refreshes ARP entries to ensure they never expire.
Proactive flow table setup, where the controller downloads flow entries into the switches based
on user configuration (ex: ports, VLANs, subnets, ACLs) and network topology;
Reactive flow table setup (or flow-driven forwarding), where the controller downloads flow
entries into the switches based on the unknown traffic the OpenFlow switches forward to the
controller.
Even though I write about flow tables, don't confuse them with the per-flow forwarding that Doug
Gourlay loves almost as much as I do. A flow entry might match solely on the destination MAC
address, making flow tables equivalent to MAC address tables, or it might match the destination IP
address against the longest IP prefix in the flow table, making the flow table equivalent to a routing
table or FIB.
The controller must know the topology of the network and all the endpoint addresses (MAC
addresses, IP addresses or IP subnets) for the proactive (predictive?) flow setup to work. If you had
an OpenFlow controller emulating an OSPF or BGP router, it would be easy to use proactive flow
setup; after all, IP routes never change based on the application traffic observed by the switches.
Intra-subnet L3 forwarding is already a different beast. One could declare ARP/ND to be an
authoritative control-plane protocol (please don't get me started on the shortcomings of ARP and
whether ES-IS would be a better solution), in which case you could use proactive flow setup to
create host routes toward IP hosts (using an approach similar to Mobile ARP; what did I just say
about nothing being really new?).
However, most vendors' marketing departments (with a few notable exceptions) think their gear
needs to support every bridging-abusing stupidity ever invented, from load balancing schemes that
work best with hubs to floating IP or MAC addresses used to implement high-availability solutions.
End result: the network has to support dynamic MAC learning, which makes OpenFlow-based
networks reactive. Nobody can predict when and where a new MAC address will appear (and it's not
guaranteed that the first packet sent from a new MAC address will be an ARP packet), so the
switches have to send user traffic with unknown source or destination MAC addresses to the
controller, and we're back to packet punting.
Some bridges (lovingly called layer-2 switches) don't punt packets with unknown MAC addresses to
the CPU, but perform dynamic MAC address learning and unknown unicast flooding in hardware...
but that's not how OpenFlow is supposed to work.
Within a single device, the software punts packets from hardware (or interrupt) switching to
CPU/process switching; in a controller-based network, the switches punt packets to the controller.
Plus ça change, plus c'est la même chose.
Packets punted to the controller from the data plane of an OpenFlow switch represent a significant
burden on the switch CPU. A large number of punted packets (triggered, for example, by an address
scan) can easily result in a denial-of-service attack. It's time to reinvent another wheel:
control-plane policing (CoPP).
Unfortunately, only a few hardware switches available on the market support OpenFlow 1.3 yet,
and some of them might not support meters (or meters on flows sent to the controller). In the
meantime, proprietary extensions galore: NEC used one to limit unicast flooding in its
ProgrammableFlow switches.
Time to move forward to another scalability roadblock: the number of flows you can install in a
hardware device per second. This limitation has nothing to do with OpenFlow; the choke point is the
communication path between the switch CPU and the forwarding hardware. Traditional switches and
routers had the same problems and solved them with Prefix Independent Convergence.
It's relatively easy to fine-tune OSPF or IS-IS and get convergence times in tens of milliseconds.
SPF runs reasonably fast on modern processors, more so with incremental SPF optimizations.
A platform using software-based switching can use the SPF results immediately (thus there's no
real need for LFA on a Cisco 7200).
The true bottleneck is the process of updating the distributed forwarding tables (FIBs) from the IP
routing table (RIB) on platforms that use hardware switching. That operation can take a
relatively long time if you have to update many prefixes.
Adding support for OpenFlow to an existing switch doesn't change the underlying hardware. An
OpenFlow agent on a hardware device has to deal with the same challenges as the traditional
control-plane software.
OpenFlow 1.0 could use flow matching priorities to implement primary/backup forwarding entries,
and OpenFlow 1.1 provides a fast failover mechanism in its group tables that could be used for
prefix-independent convergence, but it's questionable how far you can get with existing hardware
devices, and PIC doesn't work in all topologies anyway.
Just in case you're wondering how existing L2 networks work at all: the data plane in high-speed
switches performs dynamic MAC learning and populates the forwarding table in hardware; the
communication between the control and the data plane is limited to the bare minimum (which is
another reason why implementing OpenFlow agents on existing switches is like attaching a jetpack
to a camel).
Is there another option? Sure, it's called forwarding state abstraction or, for those more familiar
with MPLS terminology, Forwarding Equivalence Class (FEC). While you might have thousands of
servers or VMs in your network, you have only hundreds of possible paths between switches. The
trick every single OpenFlow controller vendor has to use is to replace endpoint-based forwarding
entries in the core switches with path-indicating forwarding entries. Welcome back to virtual
circuits and the BGP-free MPLS core. It's amazing how the old tricks keep resurfacing in new
disguises every few years.
Forwarding state abstraction (known as Forwarding Equivalence Classes in MPLS lingo) is the only
way toward scalable OpenFlow fabrics. The following blog post (written in February 2012) has some
of the details:
All the traffic that expects the same forwarding behavior gets the same label;
The intermediate nodes no longer have to inspect the individual packet/frame headers; they
forward the traffic solely based on the FEC indicated by the label.
The grouping/labeling operation thus greatly reduces the forwarding state in the core nodes (you
can call them P-routers, backbone bridges, or whatever other terminology you prefer) and improves
the core network convergence due to the significantly reduced number of forwarding entries in the
core nodes.
Figure 5-2: MPLS forwarding diagram from the Enterprise MPLS/VPN Deployment webinar
The core network convergence is improved due to reduced state, not due to the pre-computed
alternate paths that Prefix-Independent Convergence or MPLS Fast Reroute use.
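The state reduction is easy to quantify with a tiny sketch (the numbers are hypothetical: 10,000 VMs behind 8 egress edge switches):

```python
# Forwarding equivalence classes: edge nodes map many endpoints onto few
# labels (one per egress node), so core nodes keep only per-label state.

# 10,000 VMs spread across 8 edge switches (round-robin for illustration).
endpoints = {f"vm-{i}": f"edge-{i % 8}" for i in range(10_000)}

# Edge: classify each endpoint into a FEC identified by its egress node.
fec_of = dict(endpoints)

# Core: one forwarding entry (label) per FEC, not per endpoint.
core_table = {egress: f"label-{n}"
              for n, egress in enumerate(sorted(set(endpoints.values())))}

print(len(fec_of))      # -> 10000 entries at the edge
print(len(core_table))  # -> 8 entries in the core
```

The edge still has to know every endpoint, but the expensive, fast-converging core shrinks by three orders of magnitude.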
When you use tunneling, the FEC is the tunnel endpoint: all traffic going to the same tunnel
egress node uses the same tunnel destination address.
All sorts of tunneling mechanisms have been proposed to scale layer-2 broadcast domains and
virtualized networks (IP-based layer-3 networks scale way better by design):
Provider Backbone Bridges (PBB, 802.1ah), Shortest Path Bridging-MAC (SPBM, 802.1aq) and
vCDNI use MAC-in-MAC tunneling: the destination MAC address used to forward user traffic
across the network core is the egress bridge (or the destination physical server for vCDNI).
Figure 5-3: SPBM forwarding diagram from the Data Center 3.0 for Networking Engineers webinar
VXLAN, NVGRE and GRE (used by Open vSwitch) use MAC-over-IP tunneling, which scales way
better than MAC-over-MAC tunneling because the core switches can do another layer of state
abstraction (subnet-based forwarding and IP prefix aggregation).
Figure 5-4: Typical VXLAN architecture from the Introduction to Virtual Networking webinar
TRILL is closer to VXLAN/NVGRE than to SPB/vCDNI as it uses full L3 tunneling between TRILL
endpoints with L3 forwarding inside RBridges and L2 forwarding between RBridges.
Figure 5-5: TRILL forwarding diagram from the Data Center 3.0 for Networking Engineers webinar
With tagging or labeling, a short tag is attached in front of the data (ATM VPI/VCI, MPLS label
stack on point-to-point links) or somewhere in the header (VLAN tags) instead of encapsulating the
user's data into a full L2/L3 header. The core network devices perform packet/frame forwarding
based exclusively on the tags. That's how SPBV, MPLS and ATM work.
Figure 5-6: MPLS-over-Ethernet frame format from the Enterprise MPLS/VPN Deployment webinar
A few months after I wrote the Forwarding State Abstraction blog post, Martin Casado and his
team presented an article with similar ideas at the HotSDN conference. Here's my summary of that
article (written in August 2012):
THE PROBLEM
Contrary to what some pundits claim, flow-based forwarding will never scale. If you've been around
long enough to experience the ATM-to-the-desktop failure, Multi-Layer Switching (MLS) kludges,
the demise of end-to-end X.25, or the cost of traditional circuit-switched telephony, you know what
I'm talking about. If not, supposedly it's best to learn from your own mistakes; be my guest.
Before someone starts Moore's Law incantations: software-based forwarding will always be more
expensive than predefined hardware-based forwarding. Yes, you can push tens of gigabits through
a highly optimized multi-core Intel server. You can also push 1.2 Tbps through a Broadcom chipset
at a comparable price. The ratios haven't changed much in the last decades, and I don't expect
them to change in the near future.
SCALABLE ARCHITECTURES
The scalability challenges of flow-based forwarding have been well understood (at least within the
IETF; the ITU is living on a different planet) for decades. That's why we have destination-only
forwarding, variable-length subnet masks and summarization, and Diffserv (with a limited number
of traffic classes) instead of Intserv (with per-flow QoS).
The limitations of destination-only hop-by-hop forwarding have also been well understood for at
least two decades, and resulted in the MPLS architecture and various MPLS-based applications
(including MPLS Traffic Engineering).
There's a huge difference between the MPLS TE forwarding mechanism (which is the right tool for
the job) and the distributed MPLS TE control plane (which sucks big time). Traffic engineering is
ultimately an NP-complete knapsack problem, best solved with centralized end-to-end visibility.
MPLS architecture solves the forwarding rigidity problems while maintaining core network scalability
by recognizing that while each flow might be special, numerous flows share the same forwarding
behavior.
Edge MPLS routers (edge LSR) thus sort the incoming packets into forwarding equivalence classes
(FEC), and use a different Label Switched Path (LSP) across the network for each of the forwarding
classes.
Please note that this is a gross oversimplification. I'm trying to explain the fundamentals
and (following the great example of physicists) ignore all the details... oops, take the ideal
case.
The simplest classification, implemented in all MPLS-capable devices today, is destination-prefix-based
classification (equivalent to traditional IP forwarding), but there's nothing in the MPLS architecture
that would prevent you from using N-tuples to classify the traffic based on source addresses, port
numbers, or any other packet attribute (yet again, ignoring the reality of having to use PBR with
the infinitely disgusting route-map CLI to achieve that).
It's hard to build resilient networks with a centralized control plane and unreliable transport
between the controller and the controlled devices (this problem was well known in the days of
Frame Relay and ATM);
Martin Casado, Teemu Koponen, Scott Shenker and Amin Tootoonchian addressed the second
challenge in their Fabric: A Retrospective on Evolving SDN paper, where they propose two layers in
an SDN architectural framework:
Edge switches, which classify the packets, perform network services, and send the packets
across the core fabric toward the egress edge switch;
Not surprisingly, they're also proposing to use MPLS labels as the fabric forwarding mechanism.
Existing MPLS implementations or protocols have no equivalent mechanism, and a mechanism for a
consistent implementation of a distributed network edge policy would be highly welcome (all of my
enterprise OpenFlow use cases fall into this category).
Using Forwarding Equivalence Classes (FECs) and path-based forwarding in an OpenFlow network
results in another simplification: core switches don't have to support the same rich functionality as
the edge switches.
To introduce a new protocol (example: IPv6) you have to deploy it on every single router
throughout the network, including all core routers.
On the other hand, you can introduce IPv6, IPX or AppleTalk (not really), or anything else in an
MPLS network, without upgrading the core routers. The core routers continue to provide a single
function: optimal transport based on MPLS paths signaled by the edge routers (either through LDP,
MPLS-TE, MPLS-TP or more creative approaches, including NETCONF-configured static MPLS labels).
The same ideas apply to OpenFlow-configured networks. The edge devices have to be smart and
support a rich set of flow matching and manipulation functionality; the core (fabric) devices have to
match on simple packet tags (VLAN tags, MAC addresses with PBB encapsulation, MPLS tags ...) and
provide fast packet forwarding.
Microsoft's Hyper-V Network Virtualization uses a similar architecture, with PowerShell instead of
OpenFlow/OVSDB as the hypervisor configuration API;
NEC's ProgrammableFlow solution uses the PF5240 (with 160K OpenFlow entries) at the edge and
the PF5820 (with 750 full OpenFlow entries and 80K MAC entries) at the core.
Before you mention (multicast-based) VXLAN in the comments: I fail to see something
software-defined in a technology that uses flooding to learn dynamic VM-MAC-to-VTEP-IP mappings.
The idea of edge and core OpenFlow makes perfect sense, but OpenFlow 1.0 doesn't support MPLS.
Could we use something else to make it work?
The following blog post was written in February 2012; in summer 2014 I inserted a few comments
to illustrate how we got nowhere in more than two years.
Tunneling support within existing OpenFlow-enabled data center switches is virtually non-existent
(Juniper's MX routers with the OpenFlow add-on might be an exception), primarily due to hardware
constraints.
We will probably see VXLAN/NVGRE/GRE implementations in data center switches in the next few
months, but I expect most of those implementations to be software-based and thus useless for
anything but a proof-of-concept (August 2014: no major data center switching vendor supports
OpenFlow over any tunneling technology).
Cisco already has a VXLAN-capable chipset in the M-series linecards; believers in merchant silicon will
have to wait for the next-generation chipsets (August 2014: Broadcom's and Intel's chipsets support
VXLAN, but so far no vendor has shipped VXLAN termination that would work with OpenFlow).
destination-MAC combination at the egress node to recreate the original VLAN tag, but the solution
is messy, hard to troubleshoot, and immense fun to audit. But wait, it gets worse.
THE REALITY
I had the virtual circuits discussion with multiple vendors during the OpenFlow symposium and
Networking Tech Field Day and we always came to the same conclusions:
Everyone uses their own secret awesomesauce to solve the problem ... often with proprietary
OpenFlow extensions.
Someone was also kind enough to give me a hint that solved the secret awesomesauce riddle: "We
can use any field in the frame header in any way we like."
Looking at the OpenFlow 1.0 specs (assuming no proprietary extensions are used), you can rewrite
source and destination MAC addresses to indicate whatever you wish; you have 96 bits to work
with. Assuming the hardware devices support wildcard matches on MAC addresses (either by
supporting OpenFlow 1.1 or a proprietary extension to OpenFlow 1.0), you could use the 48 bits of
the destination MAC address to indicate egress node, egress port, and egress MAC address.
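As an illustration, here's a minimal Python sketch of such an encoding. The bit layout (16-bit egress node ID, 8-bit egress port, 24-bit locally-significant host ID) is purely hypothetical; the schemes the vendors actually use are proprietary:

```python
# Hypothetical layout of the rewritten destination MAC address:
# 16 bits egress node ID | 8 bits egress port | 24 bits local host ID
def encode_mac(node_id: int, port: int, host_id: int) -> str:
    assert node_id < 2**16 and port < 2**8 and host_id < 2**24
    value = (node_id << 32) | (port << 24) | host_id
    # Emit the 48-bit value as a colon-separated MAC address string
    return ":".join(f"{(value >> s) & 0xFF:02x}" for s in range(40, -8, -8))

def decode_mac(mac: str) -> tuple:
    value = int(mac.replace(":", ""), 16)
    return (value >> 32) & 0xFFFF, (value >> 24) & 0xFF, value & 0xFFFFFF
```

The egress switch would then match on the node-ID bits with a single wildcard entry and use the remaining bits to select the output port and restore the real destination MAC.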
I might have doubts about the VLAN translation mechanism described in the previous paragraph (I
am positive many security-focused engineers will have doubts), but the "reuse header fields"
approach is even more interesting to support. How can you troubleshoot a network if you never
know what the source/destination MAC addresses really mean?
SUMMARY
Before buying an OpenFlow-based data center network, figure out what the vendors are doing (they
will probably ask you to sign an NDA, which is fine), including:
What are the mechanisms used to reduce forwarding state in the OpenFlow-based network core?
What's the actual packet format used in the network core (or: how are the fields in the packet
header really used?)
Will you be able to use standard network analysis tools to troubleshoot the network?
Let's conclude the forwarding scalability part of this chapter with a slightly irrelevant detour: is
MPLS tunneling?
Unless configured otherwise, the IP routing protocol performs topology autodiscovery and LDP establishes a full
mesh of virtual circuits across the core.
VC merge: Virtual circuits from multiple ingress points to the same egress point can merge within
the network. VC merge significantly reduces the overall number of VCs (and the amount of state the
core switches have to keep) in fully meshed networks.
It's interesting to note that ITU wants to cripple MPLS to the point of being equivalent to
ATM/Frame Relay. MPLS-TP introduces an out-of-band management network and management
plane-based virtual circuit establishment.
DOES IT MATTER?
It might seem like I'm splitting hairs just for the fun of it, but there's a significant scalability
difference between virtual circuits and tunnels: devices using tunnels appear as hosts to the
underlying network and require no in-network state, while solutions using virtual circuits (including
MPLS) require per-VC state entries (MPLS: inbound-to-outbound label mapping in the LFIB) on every
forwarding device in the path. Even worse, end-to-end virtual circuits (like MPLS-TE) require state
maintenance (provided by periodic RSVP signaling in MPLS-TE) involving every single switch in the
VC path.
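A back-of-the-envelope comparison (my own illustrative model, not a formula from any standard) shows why the difference matters as the network grows:

```python
def vc_core_state(edge_nodes: int, avg_path_hops: int) -> int:
    # Full mesh of unidirectional virtual circuits: every edge node has a VC
    # to every other edge node, and each VC consumes one LFIB-style entry
    # on every forwarding device along its path
    return edge_nodes * (edge_nodes - 1) * avg_path_hops

def tunnel_core_state(edge_nodes: int) -> int:
    # Tunnel endpoints look like IP hosts: the core keeps only a route per
    # endpoint (or even fewer, if the endpoint addresses can be summarized)
    return edge_nodes

print(vc_core_state(100, 4))    # 39600 entries spread across the core
print(tunnel_core_state(100))   # 100 routes (or fewer, with summarization)
```

With 100 edge nodes, per-VC state grows roughly with the square of the number of edges, while tunnel-based solutions keep the core state linear.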
You can find scalability differences even within the MPLS world: MPLS/VPN-over-mGRE (tunneling)
scales better than pure label-based MPLS/VPN (virtual circuits) because MPLS/VPN-over-mGRE relies
on IP transport and not on end-to-end LSPs between PE-routers. You can summarize loopback
addresses if you use MPLS/VPN-over-mGRE; doing the same in end-to-end-LSP-based MPLS/VPN
networks breaks them. L2TPv3 scales better than AToM for the same reason.
All VC-based solutions require a signaling protocol between the end devices and the core switches
(or an out-of-band layer-8+ communication and management-plane provisioning). Two common
protocols used in MPLS networks are LDP (for IP routing-based MPLS) and RSVP (for traffic
engineering). Secure and scalable inter-domain signaling protocols are rare; VC-based solutions are
thus usually limited to a single management domain (state explosion is another problem that limits
the size of a VC-based network).
The only global networks using on-demand virtual circuits were the telephone system and X.25; one
of them already died because of its high per-bit costs, and the other one is surviving primarily
because we're replacing virtual circuits (TDM voice calls) with tunnels (VoIP).
TANGENTIAL AFTERTHOUGHTS
Don't be sloppy with your terminology. There's a reason we use different terms to indicate different
behavior: it helps us understand the implications (e.g., scalability) of the technology. For example,
it's important to understand why bridging differs from routing and why it's wrong to call them both
switching, and it helps if you understand that Fibre Channel actually uses routing (hidden deep
inside switching terminology).
Based on all the limitations documented in this chapter, it's easy to see why nobody tries to use
OpenFlow to solve problems that reside above the transport layer (the following blog post was
written in autumn of 2012; nothing has changed in the meantime).
like to program those appliances from orchestration software. They have already solved the L4-7
appliance problem with existing open-source tools running on commodity x86 hardware.
Does it make sense to use OpenFlow on virtual switches, or is its usability limited to hardware
devices? I tried to give a few hints in July 2012 while answering questions from David Le Goff, who
was at that time working for 6WIND.
Now, assuming you've cleaned up your design, you have switches that do fast packet forwarding
and have little need for additional services, and the services-focused elements (firewalls, caches,
load balancers) that work on L4-7. These two sets of network elements have totally different
requirements:
Implementing fast (and dumb) packet forwarding on L2 (bridge) or L3 (router) on generic x86
hardware makes no sense. It makes perfect sense to implement the control plane on generic x86
hardware (almost all switch vendors use this approach) and a generic OS platform, but it definitely
doesn't make sense to let the x86 CPU get involved with packet forwarding. Broadcom's chipset
can do a way better job for less money.
L4-7 services are usually complex enough to require lots of CPU power anyway. Firewalls
configured to perform deep packet inspection and load balancers inspecting HTTP sessions must
process the first few packets of every session in the CPU anyway, and only then potentially
offload the flow record to dedicated hardware. With optimized networking stacks, it's possible to
get reasonable forwarding performance on well-designed x86 platforms, so there's little reason
to use dedicated hardware in L4-7 appliances today (SSL offload is still a grey area).
On top of everything else, the shortsighted design of dedicated hardware used by L4-7 appliances
severely limits your options. Just ask a major vendor that needed years to roll out IPv6-enabled load
balancers and high-performance IPv6-enabled firewall blades ... and still doesn't have hardware-based
deep packet inspection of IPv6 traffic.
SUMMARY
While it's nice to have high-performance packet forwarding on a generic x86 architecture, the
performance of software switching is definitely not an SDN showstopper. Also, keep in mind that a
software appliance running on a single vCPU can provide up to a few gigabits of forwarding
performance, there are plenty of cores in today's Xeon-based servers (10Gbps per physical server is
thus very realistic), and not that many people have multiple 10GE uplinks from their data centers.
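The arithmetic behind that claim is trivial; the per-vCPU throughput figure below is an illustrative assumption, not a benchmark result:

```python
# A few Gbps per vCPU times a handful of cores dedicated to forwarding
# easily fills a 10GE uplink (both numbers are illustrative assumptions)
gbps_per_vcpu = 2.5
forwarding_vcpus = 4
print(gbps_per_vcpu * forwarding_vcpus)   # 10.0 Gbps per physical server
```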
The final blog post in this chapter illustrates what happens when overexcited engineers forget the
harsh limits of reality. I hope this chapter gave you enough information to analyze how bad the idea
described in this blog post is (the blog post was written in late 2011, but there are still people
proposing similar solutions in 2014).
A quick look back confirms that hunch: all technologies that required per-session state in every
network device have failed. IntServ (with RSVP) never really took off on a global scale, and ATM-to-the-desktop failed miserably. The only two exceptions are global X.25 networks (they were so
expensive that nobody ever established more than a few sessions) and voice networks (where
sessions usually last for minutes ... or hours if teenagers get involved).
Load balancers work as well as they do because a single device in the whole path (the load balancer)
keeps the per-session state, and because you can scale them out: if they become overloaded, you
just add another pair of redundant devices with new IP addresses to the load balancing pool (and
use DNS-based load balancing on top of them).
Some researchers have quickly figured out the scaling problem and there's work being done to make
OpenFlow-based load balancing scale better, but one has to wonder: after they're done and their
solution scales, will it be any better than what we have today, or will it just be different?
Moral of the story: every time you hear about an incredible solution to a well-known problem, ask
yourself: why weren't we using it in the past? Were we really that stupid, or are there some inherent
limitations that are not immediately visible? Will it scale? Is it resilient? Will it survive device or link
failures? And don't forget: history is a great teacher.
Traditional networking architectures and protocols are a perfect solution to a specific set of
problems: shortest-path destination-only layer-2 and layer-3 forwarding. It's amazing how many
problems one can solve with such a specific toolset, from scale-out data center fabrics to the global
Internet.
More complex challenges (example: traffic engineering) have been solved using the traditional
architecture of distributed loosely coupled independent nodes (example: MPLS-TE), but could benefit
from centralized network visibility.
Finally, the traditional solutions haven't even tried to tackle some of the harder networking problems
(example: megaflow-based forwarding or centralized policies with on-demand deployment) that
could be solved with a controller-based architecture.
MORE INFORMATION
You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:
Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
Numerous ipSpace.net webinars describe SDN, network programmability and automation, and
OpenFlow (some of them are freely available thanks to industry sponsors);
The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function
virtualization and SDDC technologies in your network;
This chapter contains several real-life SDN solutions, most of them OpenFlow-based. For alternate
approaches see the SDN Beyond OpenFlow chapter; for even more use cases watch the publicly
available videos from my OpenFlow-based SDN Use Cases webinar.
IN THIS CHAPTER:
OPENFLOW: ENTERPRISE USE CASES
OPENFLOW @ GOOGLE: BRILLIANT, BUT NOT REVOLUTIONARY
COULD IXPS USE OPENFLOW TO SCALE?
IPV6 FIRST-HOP SECURITY: IDEAL OPENFLOW USE CASE
OPENFLOW: A PERFECT TOOL TO BUILD SMB DATA CENTER
SCALING DOS MITIGATION WITH OPENFLOW
NEC+IBM: ENTERPRISE OPENFLOW YOU CAN ACTUALLY TOUCH
BANDWIDTH-ON-DEMAND: IS OPENFLOW THE SILVER BULLET?
OPENSTACK/QUANTUM SDN-BASED VIRTUAL NETWORKS WITH FLOODLIGHT
NICIRA, BIGSWITCH, NEC, OPENFLOW AND SDN
Half a year after the public launch of OpenFlow and SDN (in autumn 2011), we had already identified
numerous enterprise use cases. Most of them are still largely ignored as every startup and major
networking vendor rushes toward the (supposedly) low-hanging fruit of data center fabrics and
cloud-scale virtual networks.
Policy-based routing: a flow classifier followed by an outgoing interface and/or VLAN tag push;
Combine that with the ephemeral nature of OpenFlow (whatever the controller downloads into the
networking device does not affect the running/startup configuration and disappears when it's no longer
needed), and the ability to use the same protocol with multiple product families, either from one or
multiple vendors, and you have a pretty interesting combo.
Actually, I don't care whether the mechanism used to change networking devices' forwarding tables is OpenFlow
or something completely different, as long as it's programmable, multi-vendor and integrated with
the existing networking technologies. As I wrote a number of times, OpenFlow is just a
TCAM/FIB/packet classifier download tool.
Remember one of OpenFlow's primary use cases: adding functionality where the vendor is lacking it (see
Igor Gashinsky's presentation from the OpenFlow Symposium for good coverage of that topic).
Now stop for a minute and remember how many times you badly needed some functionality along
the lines of the four functions I mentioned above (packet filters, PBR, static routes, NAT) that you
couldn't implement at all, or that required a hodgepodge of expect scripts (or XML/NETCONF requests
if you're a Junos automation fan) that you have to modify every time you deploy a different device
type or a different software release.
Here are a few ideas I got in the first 30 seconds (if you get other ideas, please do write a
comment):
Per-user access control (I guess NAC is the popular buzzword) that works identically on dial-up,
VPN, wireless and wired access devices;
Push a user into a specific VLAN based on whatever he's doing (or based on customized user
authentication);
Give users controlled access to a single application in another VLAN (combine that with NAT to
solve return-path problems);
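The second idea above could be sketched as controller-side logic; the data structures below are an abstract illustration (not real OpenFlow messages or any vendor's API), and the user names and VLAN IDs are hypothetical:

```python
# Hypothetical user-to-VLAN policy; unknown users land in a quarantine VLAN
USER_VLANS = {"alice": 10, "bob": 20}
QUARANTINE_VLAN = 999

def flow_for_user(user: str, in_port: int, mac: str) -> dict:
    """Build an abstract flow entry the controller would push to the
    user's access switch after successful authentication."""
    vlan = USER_VLANS.get(user, QUARANTINE_VLAN)
    return {
        "match":   {"in_port": in_port, "dl_src": mac},
        "actions": [{"type": "SET_VLAN_VID", "vlan_vid": vlan},
                    {"type": "OUTPUT", "port": "NORMAL"}],
    }
```

The same policy logic would work regardless of the access technology, which is precisely what makes the idea attractive.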
Looking at my short list, it seems @beaker was right: security just might be the killer app for
OpenFlow/SDN. OpenFlow could be used either to implement some security features (packet filters
and traffic steering), to help integrate traditional security functions with the rest of the network, or
to implement dynamic security services insertion at any point in the network, something we badly
need but almost never get.
Google uses OpenFlow to control the WAN edge routers they built from commodity switching
components. The details of their implementation are proprietary (and they haven't open-sourced
their solution); here's what I was able to deduce from publicly available information in May 2012:
A G-router is used as a WAN edge device in their data centers and runs traditional routing protocols:
EBGP with the data center routers and IBGP+IS-IS across the WAN with other G-routers (or traditional
gear during the transition phase).
On top of that, every G-router has a (proprietary, I would assume) northbound API that is used by
Google's Traffic Engineering (G-TE), a centralized application that analyzes the application
requirements, computes the optimal paths across the network, and creates those paths through the
network of G-routers using the above-mentioned API.
I wouldn't be surprised if G-TE used MPLS forwarding instead of installing 5-tuples into mid-path
switches. Doing Forwarding Equivalence Class (FEC) classification at the head-end device
instead of at every hop is way simpler and less loop-prone.
Like MPLS-TE, G-TE runs in parallel with the traditional routing protocols. If it fails (or an end-to-end
path is broken), G-routers can always fall back to traditional BGP+IGP-based forwarding, and like
with MPLS-TE+IGP, you'll still have a loop-free (although potentially suboptimal) forwarding
topology.
IS IT SO DIFFERENT?
Not really. Similar concepts (central path computation) were used in ATM and Frame Relay
networks, as well as early MPLS-TE implementations (before Cisco implemented the OSPF/IS-IS traffic
engineering extensions and RSVP, that was all you had).
Some networks are supposedly still running offline TE computations and static MPLS TE tunnels
because they give you way better results than the distributed MPLS-TE/autobandwidth/automesh
kludges.
MPLS-TP is also going in the same direction: paths are computed by the NMS, which then installs in/out
label mappings (and fast failover alternatives if desired) into the Label Switch Routers (LSRs).
than IS-IS with default timers), and has average link utilization above 90% (which in itself is a huge
money-saver).
HYPE GALORE
Based on the information from the Open Networking Summit (which is all the information I have at the
moment), you might wonder what all the hype is about. In one word: OpenFlow. Let's try to debunk
those claims a bit.
Google is running an OpenFlow network. Get lost. Google is using OpenFlow between the controller and
adjacent chassis switches because (like everyone else) they need a protocol between the control
plane and forwarding planes, and they decided to use an already-documented one instead of
inventing their own (the extra OpenFlow hype could also persuade hardware vendors and chipset
manufacturers to implement more OpenFlow capabilities in their next-generation products).
Google built their own routers ... and so can you. Really? Based on the scarce information from ONS
talks and the interview in Wired, Google probably threw more money and resources at the problem than
a typical successful startup. They effectively decided to become a router manufacturer, and they did.
Can you repeat their feat? Maybe, if you have comparable resources.
Google used open-source software ... so the monopolistic Ciscos of the world are doomed. Just in
case you believe the fairy-tale conclusion, let me point out that many Internet exchanges use open-source software for BGP route servers, and almost all networking appliances and most switches built
today run on open-source software (namely Linux or FreeBSD). It's the added value that matters, in
Google's case their traffic engineering solution.
Google built an open network. Really? They use standard protocols (BGP and IS-IS) like everyone
else, and their traffic engineering implementation (and probably the northbound API) is proprietary.
How is that different (from the openness perspective) from networks built from Juniper's or Cisco's
gear?
CONCLUSIONS
Google's engineers did a great job: it seems they built a modern routing platform that everyone
would love to have, and an awesome traffic engineering application. Does it matter to you and me?
Probably not; I don't expect them to give their crown jewels away. Does it matter that they used
OpenFlow? Not really; it's a small piece of their whole puzzle. Will someone else repeat their feat
and bring a low-cost high-end router to the market? I doubt it, but I hope to be wrong.
OpenFlow might be an ideal tool to solve interesting problems that are too rare to merit the attention of
traditional networking vendors. Internet Exchange Points (IXPs) might be one of those scenarios.
On a somewhat tangential topic, Dean Pemberton runs OpenFlow in production in the New Zealand
Internet Exchange. His deployment model is totally different: the IXP is a layer-3 fabric (not a layer-2 fabric like most Internet exchanges), and his route server is the only way to exchange BGP routes
between members. He's using Quagga and RouteFlow to program Pica8 switches.
A note from a grumpy skeptic: his deployment works great because he's carrying a pretty
limited number of BGP routes; the Pica8 switches he's using support up to 12K routes.
IPv4 or IPv6? Who knows; the data sheet ignores that nasty detail.
First-hop IPv6 security is another morass lacking a systemic solution. Could we solve it with
OpenFlow? Yes, we could, but there's nobody approaching this problem from the controller-based
perspective (at least based on my knowledge in August 2014).
SHORT SUMMARY
Many layer-2 switches still lack feature parity with IPv4;
IPv6 uses three address allocation algorithms (SLAAC, privacy extensions, DHCPv6) and it's quite
hard to enforce a specific one;
Host implementations are wildly different (aka: "The nice thing about standards is that you have
so many to choose from.").
Whenever a new end-host appears on the network, it's authenticated, and its MAC address is
logged. Only that MAC address can be used on that port (many switches already implement this
functionality).
Whenever an end-host starts using a new IPv6 source address, the packets are not matched by
any existing OpenFlow entries and thus get forwarded to the OpenFlow controller.
The OpenFlow controller decides whether the new source IPv6 address is legal (enforcing DHCPv6-only
address allocation if needed), logs the new IPv6-to-MAC address mapping, and modifies the flow
entries in the first-hop switch. The IPv6 end-host can use many IPv6 addresses; each one of
them is logged immediately.
Ideally, if the first-hop switches support all the nuances introduced in OpenFlow 1.2, the
controller can install neighbor advertisement (NA) filters, effectively blocking ND spoofing.
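The controller's decision logic described in the steps above could be sketched like this; the DHCPv6 lease table and binding log are my own assumptions about how such a controller would track state, not any product's implementation:

```python
# dhcpv6_leases: IPv6 address -> MAC, learned by snooping DHCPv6 replies
dhcpv6_leases = {}
# bindings: (switch_port, MAC) -> set of logged IPv6 source addresses
bindings = {}

def handle_new_source(port: str, mac: str, src_ip: str,
                      enforce_dhcpv6: bool) -> bool:
    """Called on packet-in for an unknown IPv6 source address.
    Returns True if a forwarding entry should be installed."""
    if enforce_dhcpv6 and dhcpv6_leases.get(src_ip) != mac:
        return False                     # address was not assigned via DHCPv6
    bindings.setdefault((port, mac), set()).add(src_ip)   # log the mapping
    return True
```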
Will this nirvana appear anytime soon? Not likely. Most switch vendors support only OpenFlow 1.0,
which is totally IPv6-ignorant. Also, solving real-life operational issues is never as sexy as promoting
the next unicorn-powered fountain of youth.
Imagine a world where you can buy a prepackaged data center (or a pod for your private cloud
deployment), with compute, storage and networking handled from a single central management
console.
As of August 2014, NEC is still the only vendor with a commercial-grade data center fabric product
using OpenFlow. Most other vendors use more traditional architectures, and the virtualization world
is quickly moving toward overlay virtual networks.
Anyhow, this is how I envisioned potential OpenFlow use in a small data center in 2012:
THE DREAM
As you can imagine, it's extremely simple to configure an OpenFlow-controlled switch: configure its
own IP address, management VLAN, and the controller's IP address, and let the controller do the rest.
Once the networking vendors figure out the fine details, they could use dedicated management
ports for an out-of-band OpenFlow control plane (similar to what QFabric is doing today), DHCP to
assign an IP address to the switch, and a new DHCP option to tell the switch where the controller is.
The DHCP server would obviously run on the OpenFlow controller, and the whole control plane
infrastructure would be completely isolated from the outside world, making it pretty secure.
The extra hardware cost for significantly reduced complexity (no per-switch configuration and a
single management/SNMP IP address): two dumb 1GE switches (to make the setup redundant),
hopefully running MLAG (to get rid of STP).
Finally, assuming server virtualization is the most common use case in an SMB data center, you could
tightly couple the OpenFlow controller with VMware's vCenter, and let vCenter configure the whole
network:
The OpenFlow controller would automatically download port group information from vCenter and
automatically provision VLANs on server-to-switch links.
Going a step further, the OpenFlow controller could automatically configure static port channels
based on load balancing settings configured on port groups.
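The first step boils down to a simple mapping; the port-group data below is hypothetical (a real integration would pull it through the vSphere API):

```python
# Hypothetical port-group inventory downloaded from vCenter
port_groups = [
    {"name": "Web", "vlan": 100, "hosts": ["esx1", "esx2"]},
    {"name": "DB",  "vlan": 200, "hosts": ["esx2"]},
]

def vlans_for_host(host: str) -> list:
    """Return the sorted VLAN list to provision on the switch ports
    facing a given hypervisor host."""
    return sorted({pg["vlan"] for pg in port_groups if host in pg["hosts"]})
```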
End result: decently large layer-2 network with no STP, automatic multipathing, and automatic
adjustment to VLAN changes, with a single management interface, and the minimum number of
moving parts. How cool is that?
If you want true converged storage with DCB, you have to use IBM's switches (NEC does not
have DCB), and even then I'm not sure how DCB would work with OpenFlow.
PF5820 (NEC) and G8264 (IBM) have 40GE uplinks, but I have yet to see a 40GE OpenFlow-enabled switch with enough port density to serve as the spine node. At the moment, it seems
that bundles of 10GE uplinks are the way to go.
It seems (according to data sheets, but I could be wrong) NEC supports 8-way multipathing, and
we'd need at least 16-way multipathing to get 3:1 oversubscription.
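The multipathing requirement follows directly from the oversubscription arithmetic, assuming a 48-port 10GE leaf switch (my illustrative assumption):

```python
def oversubscription(server_ports: int, uplinks: int) -> float:
    # All ports assumed to be 10GE, so the ratio is simply downlinks:uplinks
    return server_ports / uplinks

print(oversubscription(48, 16))   # 3.0 -> 16 uplinks give the 3:1 ratio
print(oversubscription(48, 8))    # 6.0 -> 8-way multipathing doubles it
```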
Anyhow, assuming all the bumps eventually do get ironed out, you could have a very easy-to-manage network connecting a few hundred 10GE-attached servers.
OpenFlow is an ideal tool when you want to augment software-based networking services with
packet forwarding at hardware speeds. This post describes a DoS prevention solution demonstrated
by NEC and Radware in spring 2013:
Fortunately, Ron Meyran provided more details on the Radware blog, as did Lior Cohen in his SDN Central
Demo Friday presentation:
DefenseFlow software monitors the flow entries and counters provided by an OpenFlow
controller, and tries to identify abnormal traffic patterns;
The abnormal traffic is diverted to a Radware DefensePro appliance that scrubs the traffic before
it's returned to the data center.
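The two steps above could be sketched as a simple control loop; the threshold, the flow-stats structure and the scrubber next-hop name are all illustrative assumptions, not the actual DefenseFlow logic:

```python
PPS_THRESHOLD = 100_000   # illustrative anomaly threshold (packets/second)

def abnormal_flows(flow_stats: list) -> list:
    """Pick out the flow entries whose packet rate looks abnormal."""
    return [f for f in flow_stats if f["pps"] > PPS_THRESHOLD]

def divert_rule(flow: dict) -> dict:
    # Abstract redirect entry: same match, next hop pointing at the scrubber
    return {"match": flow["match"], "next_hop": "defensepro-appliance"}

# Sample stats as the controller might report them
stats = [{"match": {"nw_dst": "192.0.2.10"}, "pps": 250_000},
         {"match": {"nw_dst": "192.0.2.20"}, "pps": 800}]
rules = [divert_rule(f) for f in abnormal_flows(stats)]
print(rules)   # only the 192.0.2.10 flow gets diverted
```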
Both operations are easily done with the ProgrammableFlow API: it provides both flow data and the
ability to redirect traffic to a third-party next hop (or MAC address) based on a dynamically
configured access list. Here's a CLI example from the ProgrammableFlow webinar; the API call would be
very similar (but formatted as a JSON or XML object):
WILL IT SCALE?
You should be aware of the major OpenFlow scaling issues by now, and I hope you've realized that
real-life switches have real-life limitations. Most of the existing hardware reuses ACL entries when
you ask for full-blown OpenFlow flow entries. Now go and check the ACL table size on your favorite
switch, and imagine you need one entry for each flow spec you want to monitor or divert to the DPI
appliance.
Done? Disappointed? Pleasantly surprised?
However, a well-tuned solution using the right combination of hardware and software (example:
NEC's PF5240, which can handle 160,000 L2, IPv4 or IPv6 flows in hardware) just might work. Still,
we're early in the development cycle, so make sure you do thorough (stress) testing before buying
anything ... and just in case you need a rock-solid traffic generator, Spirent will be more than happy
to sell you one (or a few).
NEC and IBM gave me access to one of their early ProgrammableFlow customers. This is what I got
out of that discussion which took place in February 2012.
In the meantime, I've encountered at least one large-scale production deployment of
ProgrammableFlow, proving that NEC's solution works in large data centers.
A BIT OF A BACKGROUND
Tervela's data fabric solutions typically run on top of traditional networking infrastructure, and an
underperforming network (particularly long outages triggered by suboptimal STP implementations)
can severely impact the behavior of the services running on their platform.
They were looking for a solution that would perform way better than what their customers are
typically using today (large layer-2 networks), while at the same time being easy to design,
provision and operate. It seems that they found a viable alternative to existing networks in a
combination of NEC's ProgrammableFlow Controller and IBM's BNT 8264 switches.
EASY TO DEPLOY?
As long as your network is not too big (NEC claimed their controller can manage up to 50 switches in
their Networking Tech Field Day presentation, and the later releases of ProgrammableFlow increased
that limit to 200), the design and deployment isn't too hard according to Tervela's engineers:
They decided to use an out-of-band management network and connected the management port of
the BNT 8264 to the management network (they could also use any other switch port).
All you have to configure on the individual switch is the management VLAN, a management IP
address, and the IP address of the OpenFlow controllers.
The ProgrammableFlow controller automatically discovers the network topology using LLDP
packets sent from the controller through individual switch interfaces.
After those basic steps, you can start configuring virtual networks in the OpenFlow controller
(see the demo NEC made during the Networking Tech Field Day).
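The LLDP-based discovery step can be sketched in a few lines of Python (the data structures are mine; the real controller obviously works with OpenFlow PacketOut/PacketIn messages instead of dictionaries):

```python
# Sketch of controller-side topology discovery, the way an OpenFlow
# controller might do it: send an LLDP probe out of every switch port,
# and record a link whenever the probe re-enters the controller through
# a PacketIn from another switch.

def discover_links(probes_sent, packet_ins):
    """probes_sent: {probe_id: (switch, port)} for every LLDP probe emitted.
    packet_ins: [(switch, port, probe_id), ...] for probes that came back.
    Returns the set of discovered unidirectional links."""
    links = set()
    for dst_switch, dst_port, probe_id in packet_ins:
        src_switch, src_port = probes_sent[probe_id]
        if src_switch != dst_switch:
            links.add(((src_switch, src_port), (dst_switch, dst_port)))
    return links
```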
Obviously, you'd want to follow some basic design rules, for example:
Make the management network fully redundant (read the QFabric documentation to see how
that's done properly);
Connect the switches into a structure somewhat resembling a Clos fabric, not in a ring or a
random mess of cables.
There were a few outliers (10-15 seconds), probably caused by the lack of failure detection on the
physical layer. As I wrote before, detecting link failures via control packets sent by the OpenFlow
controller doesn't scale; you need distributed linecard protocols (LACP, BFD) if you want to have a
scalable solution.
NEC added OAM functionality in later releases of ProgrammableFlow, probably solving this
problem.
Finally, assuming their test bed allowed the ProgrammableFlow controller to prepopulate the backup
entries, it would be interesting to observe the behavior of a four-node square network, where it's
impossible to find a loop-free alternate path unless you use virtual circuits like MPLS Fast Reroute
does.
NEXT STEPS?
Tervela's engineers said the test results made them confident in the OpenFlow solution from NEC
and IBM. They plan to run more extensive tests, and if those tests work out, they'll start
recommending OpenFlow-based solutions as a Proof-of-Concept-level alternative to their customers.
Every time a new networking technology appears, someone tries to solve the Bandwidth-on-Demand
problem with it. OpenFlow is no exception.
Per-flow (or per-granular-FEC) state in the network core never scales. This is what killed RSVP
and ATM SVCs.
It's pretty hard to traffic engineer just the elephant flows. Either you do it properly and traffic
engineer all traffic, or you end up with a suboptimal network.
Nobody above the network layer really cares; it's way simpler to blame the network when the
bandwidth fairy fails to deliver.
You don't think the last bullet is real? Then tell me how many off-the-shelf applications have RSVP
support ... even though RSVP has been available in Windows and Unix/Linux servers for ages. How
many applications can mark their packets properly? How many of them allow you to configure the
DSCP value to use (apart from IP phones)?
Similarly, it's not hard to implement bandwidth-on-demand for specific elephant flows (inter-DC
backup, for example) with a pretty simple combination of MPLS-TE and PBR, potentially configured
with NETCONF (assuming you have a platform with a decent API). You could even do it with SNMP:
pre-instantiate the tunnels and PBR rules, and enable the tunnel interface by changing ifAdminStatus.
When have you last seen it done?
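For the SNMP variant, the only interesting bit is the varbind you'd send in the SET request. A minimal Python sketch (the tunnel ifIndex is a made-up example; actually sending the SET is left to whatever SNMP library you use):

```python
# Sketch of the SNMP side of poor man's bandwidth-on-demand: enabling a
# pre-instantiated MPLS-TE tunnel interface by setting its ifAdminStatus
# to up(1). Only the varbind construction is shown.

IF_ADMIN_STATUS = "1.3.6.1.2.1.2.2.1.7"   # ifAdminStatus from IF-MIB
UP, DOWN = 1, 2

def tunnel_varbind(if_index, enable=True):
    """Return the (OID, value) pair that brings the tunnel up or down."""
    return (f"{IF_ADMIN_STATUS}.{if_index}", UP if enable else DOWN)
```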
So, although I'm the first one to admit OpenFlow is an elegant tool to integrate flow classification
(previously done with PBR) with traffic engineering (using MPLS-TE or any of the novel technologies
proposed by Juniper) using the hybrid deployment model, being a seasoned skeptic, I just don't
believe we'll reach the holy grail of bandwidth-on-demand during this hype cycle. However, being an
eternal optimist, I sincerely hope I'm wrong.
In one of their pivoting phases, Big Switch Networks proposed to implement virtual networking with
MAC-layer access control lists installed through OpenFlow. I'm not aware of any commercial
deployment of this idea.
OpenStack virtual networks are created with the REST API of the Quantum (networking)
component of OpenStack;
Quantum uses back-end plug-ins to create the virtual networks in the actual underlying network
fabric. Quantum (and the rest of OpenStack) does not care how the virtual networks are
implemented as long as they provide isolated L2 domains.
Big Switch decided to implement virtual networks with dynamic OpenFlow-based L2 ACLs instead
of using VLAN tags.
The REST API offered by Floodlight's VirtualNetworkFilter module offers simple methods that
create virtual networks and assign MAC addresses to them.
The VirtualNetworkFilter intercepts new flow setup requests (PacketIn messages to the Floodlight
controller), checks that the source and destination MAC addresses belong to the same virtual
network, and permits or drops the packet.
If the VirtualNetworkFilter accepts the flow, Floodlight's Forwarding module installs the flow
entries for the newly-created flow throughout the network.
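The isolation check itself is trivial. Here's a Python sketch of the logic (the data structures are mine, not Floodlight's Java internals):

```python
# Sketch of the isolation check the VirtualNetworkFilter performs on each
# flow setup request: permit the flow only when both MAC addresses belong
# to the same virtual network. The mapping table is illustrative.

def permit_flow(mac_to_network, src_mac, dst_mac):
    src_net = mac_to_network.get(src_mac)
    dst_net = mac_to_network.get(dst_mac)
    # Unknown MAC addresses belong to no virtual network -> drop.
    return src_net is not None and src_net == dst_net
```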
The current release of Floodlight installs per-flow entries throughout the network. I'm not
particularly impressed with the scalability of this approach (and I'm not the only one).
The Floodlight controller is a single point of failure (there's no provision for a redundant
controller);
Unless I can't read Java code (which wouldn't surprise me at all), the VirtualNetworkFilter stores
all mappings (including MAC membership information) in in-memory structures that are lost if
the controller or the server on which it runs crashes;
As mentioned above, the per-flow entries used by the Floodlight controller don't scale at all (more about
that in an upcoming post).
The whole thing is thus a nice proof-of-concept tool that will require significant effort (probably
including a major rewrite of the forwarding module) before it becomes production-ready.
However, we should not use Floodlight to judge the quality of the yet-to-be-released commercial
OpenFlow controller from Big Switch Networks. This is how Mike Cohen explained the differences:
I want to highlight that all of the points you raised around production deployability and
flow scalability (and some you didn't around how isolation is managed / enforced) are
indeed addressed in significant ways in our commercial products. There's a separation
between what's in Floodlight and the code folks will eventually see from Big Switch.
As always, I might become a believer once I see the product and its documentation.
The final blog post in this chapter was written in early 2012, when the industry press still wasn't able
to figure out what individual companies using OpenFlow were doing. Although it's a bit old, it still
provides an overview of different solutions that use OpenFlow as a low-level forwarding table
programming tool.
In the meantime, VMware bought Nicira (as I predicted in the last paragraph), and Nicira's NVP
became the basis for VMware's NSX.
Deployment paradigm: complexity belongs to the hypervisor soft switches, let's keep the network
simple. It should provide no more and no less than optimal transport between equidistant hypervisor
hosts (Clos fabrics come to mind).
Target environment: Large cloud builders and other organizations leaning toward Xen/OpenStack.
NEC and BigSwitch are building virtual networks by rearranging the forwarding tables in the physical
switches. Their OpenFlow controllers are actively reconfiguring the physical network, creating virtual
networks out of VLANs, interfaces, or sets of MAC/IP addresses.
Deployment paradigm: we know hypervisor switches are stupid and can't see beyond VLANs, so
we'll make the network smarter (aka VM-aware networking).
Target environment: large enterprise networks and those that build cloud solutions with existing
software using VLAN-based virtual switches.
won't change because of Nicira. No wonder Michael Bushong from Juniper embraced Nicira's
solution.
Between Nicira and Cisco's Nexus 1000V: not at the moment. Open vSwitch runs on Xen/KVM,
Nexus 1000V runs on VMware/Hyper-V. Open vSwitch runs on vSphere, but with way lower
throughput than Nexus 1000V. Obviously Cisco could easily turn the Nexus 1000V VSM into an
OpenFlow controller (I predicted that would be their first move into the OpenFlow world, and was proven
dead wrong) and manage Open vSwitches, but there's nothing at the moment to indicate they're
considering it.
Between BigSwitch/NEC and Cisco/Juniper. This one will be fun to watch, more so with IBM, Brocade
and HP clearly joining the OpenFlow camp and Juniper cautiously being on the sidelines.
However, Nicira might trigger an interesting mindset shift in the cloud aspirant community: all of a
sudden, Xen/OpenStack/Quantum makes more sense from the scalability perspective. A certain
virtualization vendor will indubitably notice that ... unless they already focused their true efforts on
PaaS (at which point all of the above becomes a moot point).
The SDN = Centralized Control Plane (preferably using OpenFlow) definition promoted by Open
Networking Foundation (ONF) is too narrow for most real-life use cases, as it forces a controller
vendor to reinvent all the mechanisms we had in networking devices for the last 30 years, and make
them work within a distributed system with unreliable communication paths.
Many end-users (including Microsoft, a founding member of ONF) and vendors took a different
approach, and created solutions that use traditional networking protocols in a different way, rely on
overlays to reduce the complexity through decoupling, or use a hierarchy of control planes to
achieve better resilience.
This chapter starts with a blog post describing the alternate approaches to SDN and documents
several potentially usable protocols and solutions.
MORE INFORMATION
You'll find additional SDN- and OpenFlow-related information on the ipSpace.net web site:
Read SDN- and OpenFlow-related blog posts and listen to the Software Gone Wild podcast;
Numerous ipSpace.net webinars describe SDN, network programmability and automation, and
OpenFlow (some of them are freely available thanks to industry sponsors);
The 2-day SDN, NFV and SDDC workshop will help you figure out how to use SDN, network function
virtualization and SDDC technologies in your network;
IN THIS CHAPTER:
THE FOUR PATHS TO SDN
THE MANY PROTOCOLS OF SDN
EXCEPTION ROUTING WITH BGP: SDN DONE RIGHT
NETCONF = EXPECT ON STEROIDS
DEAR $VENDOR, NETCONF != SDN
WE NEED BOTH OPENFLOW AND NETCONF
CISCO ONE: MORE THAN JUST OPENFLOW/SDN
THE PLEXXI CHALLENGE (OR: DON'T BLAME THE TOOLS)
I2RS: JUST WHAT THE SDN GOLDILOCKS IS LOOKING FOR?
The very strict definition of SDN as understood by Open Networking Foundation promotes an
architecture with strict separation between a controller and totally dumb devices that cannot do
more than forward packets based on forwarding rules downloaded from the controller.
This definition is too narrow for most use cases, resulting in numerous solutions and architectures
being branded as SDN. Most of these solutions fall into one of the four categories described in the
blog post I wrote in August 2014.
resources NEC invested in ProgrammableFlow over the last years, it's not realistic to expect that
we'll be able to use OpenDaylight in production environments any time soon (assuming you'd want
to use it in an architecture with a single central failure point in the first place).
FYI, I'm not blaming OpenFlow. OpenFlow is just a low-level tool that can be extremely
handy when you're trying to implement unusual ideas.
Topology discovery;
Fast failure detection (including detection of bad links, not just lost links);
VENDOR-SPECIFIC APIS
After the initial magical dust of SDN-washing settled down, few vendors remained standing (I'm
skipping those that allow you to send configuration commands in an XML envelope and call that
programmability):
Arista has eAPI (access to the EOS command line through REST) as well as the capability to install
any Linux component on their switches, and use programmatic access to EOS data structures
(sysdb);
Cisco's onePK gives you extensive access to the inner workings of Cisco IOS and IOS XE (I haven't
found anything NX-OS-related on DevNet);
Juniper has some SDK that's safely tucked behind a partner-only regwall. Just the right thing to
do in 2014.
F5 has had iRules and iControl for years (and there's a Perl library to use it, which is totally
awesome).
Not surprisingly, vendors would love you to use their API. After all, that's the ultimate lock-in they can get.
The following text is a slightly reworded blog post I wrote in April 2013:
NETCONF, OF-Config (a YANG data model used to configure OpenFlow devices through NETCONF)
and XMPP (a chat protocol creatively used by Arista EOS) operate at the management plane: they
can change the network device configuration or monitor its state.
Remote Triggered Black Holes is one of the oldest solutions using BGP as the mechanism to modify
a network's forwarding behavior from a central controller.
Some network virtualization vendors use BGP to build MPLS/VPN-like overlay virtual networking
solutions.
I2RS and PCEP (a protocol used to create MPLS-TE tunnels from a central controller) operate on the
control plane (parallel to traditional routing protocols). BGP-LS exports link state topology and MPLS-TE
data through BGP.
OVSDB is a protocol that treats control-plane data structures as database tables and enables a
controller to query and modify those structures. It's used extensively in VMware's NSX, but could be
used to modify any data structure (assuming one defines an additional schema that describes the
data).
OpenFlow, MPLS-TP, ForCES and Flowspec (PBR through BGP used by creative network operators
like CloudFlare) work on the data plane and can modify the forwarding behavior of a controlled
device. OpenFlow is the only one of them that defines data-to-control-plane interactions (with the
Packet In and Packet Out OpenFlow messages).
Microsoft was one of the first companies to document their use of BGP to implement a controller-based
architecture. Numerous similar solutions have been described since the time I wrote this blog
post (October 2013); it seems BGP is becoming one of the most popular SDN implementation tools.
THE PROBLEM
I'll use a well-known suboptimal network to illustrate the problem: a ring of four nodes (it could be
anything, from a monkey-designed fabric to a stack of switches) with heavy traffic between nodes A
and D.
In a shortest-path forwarding environment you cannot spread the traffic between A and D across all
links (although you might get close with a large bag of tricks).
Can we do any better with controller-based forwarding? We definitely should. Let's see how we can
tweak BGP to serve our SDN purposes.
Obviously I'm handwaving over lots of moving parts: you need topology discovery, reliable next
hops, and a few other things. If you really want to know all those details, listen to the Packet
Pushers podcast where we deep dive around them (hint: you could also engage me to help you build
it).
Two identical BGP paths (with next hops B and D) to A (to ensure the BGP route selection
process in A uses BGP multipathing);
A BGP path with next hop C to B (B might otherwise send some of the traffic for D to A, resulting
in a forwarding loop between B and A).
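If you used ExaBGP as the controller's BGP speaker, the whole thing could boil down to a few announce commands (ExaBGP reads them from the standard output of a helper process). Here's a Python sketch with made-up prefixes and next-hop addresses:

```python
# Sketch of the controller side expressed as ExaBGP "announce" commands.
# The prefix and next-hop addresses are made up for the four-node example.

def announcements(prefix, next_hops):
    """One announce line per next hop; identical attributes keep the
    paths equal so the receiving router can run BGP multipath on them."""
    return [f"announce route {prefix} next-hop {nh}" for nh in next_hops]

# Paths pushed toward node A: two equal paths (via B and via D).
to_a = announcements("10.0.4.0/24", ["10.0.0.2", "10.0.0.4"])
# Path pushed toward node B: reach the destination via C, avoiding the
# forwarding loop through A.
to_b = announcements("10.0.4.0/24", ["10.0.0.3"])
```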
You can get even fancier results if you run MPLS in your network (hint: read the IETF draft on
remote LFA to get a few crazy ideas).
MORE INFORMATION
Routing Design for Large-Scale Data Centers (Petr's presentation @ NANOG 55)
Not surprisingly, the SDN-washing (labeling whatever you have as SDN) started just a few months
after the initial SDN hype, with some people calling their NETCONF implementation SDN. This is
what NETCONF really is.
WHAT IS NETCONF?
NETCONF (RFC 6241) is an XML-based protocol used to manage the configuration of networking
equipment. It allows the management console (manager) to issue commands and change the
configuration of networking devices (NETCONF agents). In this respect, it's somewhat similar to
SNMP, but since it uses XML, it provides a much richer set of functionality than the simple key/value
pairs of SNMP.
For more details, I would strongly suggest you listen to the NETCONF Packet Pushers podcast.
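To get a feeling for the protocol, here's a Python sketch of the kind of XML an edit-config operation carries (the interface data model is a made-up illustration, not a real vendor or IETF YANG model, and I'm skipping the NETCONF hello/framing details):

```python
# Sketch of a NETCONF <edit-config> RPC payload. The <interface> data
# model below is purely illustrative; real devices use vendor or IETF
# YANG models, and a library like ncclient handles the session for you.
import xml.etree.ElementTree as ET

def edit_config_rpc(if_name, description):
    rpc = ET.Element("rpc", {"message-id": "101"})
    edit = ET.SubElement(rpc, "edit-config")
    ET.SubElement(edit, "target").append(ET.Element("running"))
    config = ET.SubElement(edit, "config")
    iface = ET.SubElement(config, "interface")
    ET.SubElement(iface, "name").text = if_name
    ET.SubElement(iface, "description").text = description
    return ET.tostring(rpc, encoding="unicode")
```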
NETCONF (or XMPP as used by Arista) operates solely on the management plane, making it an
interesting device configuration mechanism, but we might need more to implement something that
could rightfully be called SDN. This is my response (written in October of 2012) to SDN-washing
activities performed by a large data center vendor.
Most NETCONF implementations don't allow you to go below the device configuration level. On the
other hand, OpenFlow by itself isn't enough to implement a self-sufficient SDN solution, as it doesn't
allow the controller to configure the initial state of the attached devices. In a solution that
implements novel forwarding functionality we might need both.
your favorite vendor, have you, even though its development has been slowly progressing (or not,
depending on your point of view) for the last decade.
EPHEMERAL STATE
The NETCONF protocol modifies device configuration. Whatever you configure with NETCONF appears in
the device configuration and can be saved from the running configuration to the permanent (or startup) one
when you decide to save the changes. You might not want that to happen if all you want to do is
apply a temporary ACL on an interface or create an MPLS-TP-like traffic engineering tunnel
(computed externally, not signaled through RSVP).
OpenFlow-created entries in the forwarding table are by definition temporary. They don't appear in
the device configuration (and are probably fun to troubleshoot because they only appear in the
forwarding table) and are lost on device reload or link loss.
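The flow entry lifetime logic is easy to illustrate. Here's a Python sketch of a flow table with hard timeouts (a toy model, not an actual switch implementation):

```python
# Sketch of why OpenFlow-created state is ephemeral: every flow entry
# can carry a hard timeout, after which the switch silently removes it,
# and nothing about it ever lands in the device configuration.

class FlowTable:
    def __init__(self):
        self.entries = {}                    # match -> (action, expires_at)

    def add(self, match, action, now, hard_timeout):
        self.entries[match] = (action, now + hard_timeout)

    def lookup(self, match, now):
        """Return the action, expiring the entry if its timeout passed."""
        entry = self.entries.get(match)
        if entry is None:
            return None
        action, expires_at = entry
        if now >= expires_at:
            del self.entries[match]          # ages out, no config trace
            return None
        return action
```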
Not surprisingly, some vendors reacted to the SDN movement by launching their own proprietary
APIs. Cisco's onePK is (in August 2014) by far the most comprehensive one.
The second, even more important message is: let's not reinvent the wheel. Google might have the
needs and resources to write their own OpenFlow controllers, northbound API, and custom
applications on top of that API; the rest of us would just like to get our job done with minimum
hassle. To help us get there, Cisco plans to add the One Platform Kit (onePK) API to IOS, IOS-XR and
NX-OS.
with low-level details, you can (hopefully; we have to see the API first) focus on getting your
job done.
OPEN OR PROPRIETARY?
No doubt the OpenFlow camp will be quick to claim onePK is proprietary. Of course it is, but so is
almost every other SDK or API in this industry. If you decide to develop an iOS application, you
cannot run it on Windows 7; if your orchestration software works with VMware's API, you cannot use
it to manage Hyper-V.
The real difference between networking and most of the other parts of the IT is that in networking
you have a choice. You can use onePK, in which case your application will only work with Cisco IOS
and its cousins, or you could write your own application stack (or use a third party one) using
OpenFlow to communicate with the networking gear. The choice is yours.
MORE DETAILS
You can get more details about Cisco ONE on Cisco's web site and its data center blog, and a
number of bloggers published really good reviews:
Jason Edelman did an initial analysis of Cisco's SDN material and is waiting to see the results of
the Cisco ONE announcement.
Four lambdas (40 Gbps) are used to connect to the adjacent (east and west) switch;
Two lambdas (20 Gbps) are used to connect to four additional switches in both directions.
The CWDM lambdas established by Plexxi switches build a chordal ring. Here's the topology you get
in a 25-node network:
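Here's a Python sketch that builds that connectivity matrix; the exact chord pattern (immediate neighbors at 40 Gbps, the next four switches in each direction at 20 Gbps) is my reading of the description above, not Plexxi's specification:

```python
# Sketch of the chordal-ring connectivity: switch i gets 40 Gbps (four
# lambdas) to its immediate neighbors and 20 Gbps (two lambdas) to the
# next four switches in each direction. The chord pattern is assumed.

def chordal_ring(n):
    """Return {(i, j): gbps} for an n-switch Plexxi-style ring."""
    links = {}
    for i in range(n):
        for hop in range(1, 6):              # switches 1..5 positions away
            j = (i + hop) % n
            links[(min(i, j), max(i, j))] = 40 if hop == 1 else 20
    return links
```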
The beauty of the Plexxi ring is the ease of horizontal expansion: assuming you got the wiring right, all
you need to do to add a new ToR switch to the fabric is to disconnect a cable between two switches
and insert the new switch between them as shown in the next diagram. You could do it in a live
network if the network survives a short-term drop in fabric bandwidth while the CWDM ring is
reconfigured.
There are at least two well-known solutions to the non-SPF routing challenge:
Central controllers (well known from SONET/SDH, Frame Relay and ATM days);
Distributed traffic engineering (thoroughly hated by anyone who had to operate a large MPLS TE
network close to its maximum capacity).
Plexxi decided to use a central controller, not to provision the virtual circuits (like we did in ATM
days) but to program the UCMP (Unequal Cost Multipath) forwarding entries in their switches.
Does that mean that we should forget all we know about routing algorithms and SPF-based ECMP
and rush into controller-based fabrics? Of course not. SPF and ECMP are just tools. They have well-known
characteristics and well-understood use cases (for example, they work great in leaf-and-spine
fabrics). In other words, don't blame the hammer if you decided to buy screws instead of nails.
In the summer of 2012, IETF launched yet another working group to develop a protocol that could
interact with routers on the control plane. I2RS (initially called IRS) might be exactly what a resilient
SDN solution needs, assuming it ever gets off the ground.