Overview
JTAC Slides
Agenda
[Block diagram: Elit FPC. Six Paradise PE ASICs (PE0 through PE5) with 2 x PF chips on Elit, or three PEs with 1 x PF chip on Elit-Lite. Each PE serves two port groups (GRP0/GRP1) of QSFP front-panel ports (72 x 40G on Elit, 36 x 40G on Elit-Lite, with 10G channelization and 100G support), uses HMC memory devices, and runs 16TX+16RX serdes links (2x3 bundles of 4TX,4RX) to each PF (PF0/PF1). A MEZZ board carrying the Intel CPU and PTP FPGA attaches through the MEZZ CONN.]
HW Overview: MTIP on PE
• Four port groups on the network side (aka wanio in the PE) on each PE.
• Per an ASIC erratum, when the speed of port 0 in a port group changes, traffic on the other ports (1 and 2) in the same PG can be disrupted.
100GE Ports
QFX10002 100GE port layout (front panel, two banks of 36 ports; even-numbered ports on the top row, odd-numbered on the bottom):
Top:    0  2  4  6  8  10 12 14 16 18 20 22 24 26 28 30 32 34
Bottom: 1  3  5  7  9  11 13 15 17 19 21 23 25 27 29 31 33 35
Top:    36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70
Bottom: 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71
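The port-0 erratum above matters when you need to know which neighbours a speed change can disturb. As a rough illustration, a helper can list the sibling ports of a given port's PG. The 3-consecutive-ports-per-PG grouping used here is an assumption for the sketch, not taken from the slide:

```python
# Hedged sketch: ports within one MTIP port group (PG) on a PE.
# Assumption (illustrative, not from the slide): front-panel ports map
# to PGs in consecutive triples, so PG n holds local ports 0, 1 and 2.
PORTS_PER_PG = 3

def pg_siblings(port: int) -> list[int]:
    """Return ports 1 and 2 of this port's PG, i.e. the ports that can
    see disruption when the PG's port 0 changes speed (per the erratum)."""
    base = (port // PORTS_PER_PG) * PORTS_PER_PG
    return [base + 1, base + 2]

print(pg_siblings(0))   # ports sharing PG 0 with port 0: [1, 2]
```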
Elit and Elit-Lite PE and PF
• Paradise PE/PF ASIC.
• The integrated FPC contains 6 PEs (Elit) or 3 PEs (Elit-Lite), with each PE forming a PFE.
• The Elit fabric consists of 2 PFs, the Elit-Lite fabric of 1 PF; each fabric plane contains 1 PF.
• Each PFE connects to the PF in X1P mode using 21G serdes.
• Avago 28nm serdes.
• CCL 2.0.
• Each PF in Elit has 96 X1P CCL links to the PFEs.
• Elit – each of the 6 PFEs has 16 links to PF-0 and 16 links to PF-1.
• Elit-Lite – each of the 3 PFEs has 32 links to PF-0.
• The PE uses CCL2 links to connect to the fabric planes.
• Cell-based forwarding.
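The link counts above are self-consistent, which a quick sketch can check. The per-PFE bandwidth figure is a derived raw number, assuming every 21G serdes link is usable:

```python
# Sketch: CCL link budget from the slide's numbers.
SERDES_GBPS = 21  # X1P serdes speed per the slide

ELIT = {"pfes": 6, "links_per_pf": 16, "pfs": 2}
ELIT_LITE = {"pfes": 3, "links_per_pf": 32, "pfs": 1}

def pf_links(cfg):
    # Links terminating on one PF = PFEs x links each PFE runs to that PF.
    return cfg["pfes"] * cfg["links_per_pf"]

def pfe_fabric_gbps(cfg):
    # Raw fabric capacity per PFE across all PFs (assumes all links usable).
    return cfg["links_per_pf"] * cfg["pfs"] * SERDES_GBPS

print(pf_links(ELIT), pf_links(ELIT_LITE))              # 96 96
print(pfe_fabric_gbps(ELIT), pfe_fabric_gbps(ELIT_LITE))  # 672 672
```

Both variants land on 96 CCL links per PF and the same raw per-PFE fabric capacity, which is consistent with the two form factors sharing one PF design.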
Software Architecture
• The modular software architecture of QFX10000 switches provides the
following specific benefits:
• Multicore Linux kernel (based on Wind River Yocto Linux)
• Higher control plane performance (running on four cores)
• Topology-independent in-service-software upgrades (TISSU)
• Hosting third-party apps in a virtual machine
• Zero touch provisioning (ZTP)
• Automation with Puppet, Chef, Ansible, and Python
ULTIMAT
• Midplane-less chassis
• Ultimat Control Board (UCB) / Ultimat Routing Engine (URE)
– URE and UCB are within the same FRU (QFX10000-RE) in Ultimat
QFX10000 system architecture
Control Plane
• Individual I2C bus segments from both Control Boards to all FRUs in the system, used for FRU identification and power up/down.
• Gen 2 PCI Express connectivity from both Control Boards to all six SIBs in the system, used to configure and control the PF ASICs on the SIBs.
• 10G Ethernet connectivity between both CBs, all LC CPUs, and all PE ASICs, used for code download and IPC.
• RS232 connectivity between the CB and all LCPUs, used for CTY.
Data Path
• Packet forwarding is based on Paradise PFE (PE) ASICs on line cards.
• The WAN interface on the PE ASIC drives the front-panel ports on line cards, either directly or through retimers.
• Each PE ASIC uses HMC memory devices for data and table storage.
• Each PE ASIC in the system connects to the Switching Fabric, either directly or through retimers.
• PE ASICs on a line card are initialized and controlled by the local CPU.
Switching Fabric
• The Switching Fabric consists of up to six SIB boards, each with two Paradise Fabric (PF) ASICs.
• Every PE ASIC in the system is connected to every PF ASIC, resulting in single-hop connectivity from any PE to any other PE.
• The switch fabric is configured and controlled by the Master CB in the system.
UCB/URE Functionality:
• Handles system control functions
• Maintains hardware forwarding table
• Maintains routing protocol states
• Handles environmental monitoring
• Handles integrated Precision Time Protocol (PTP)
• UCB/URE Components summary:
– Intel Ivy Bridge 4-core 2.5GHz CPU
– 32GB DDR3 SDRAM – Four 8GB DIMMs
– 50 GB Internal SSD Storage
– One 2.5” External SSD slot
– 10G Ethernet Switch for Control Plane connectivity with Line Cards
– PCI Express Switch for Control Plane Connectivity with SIBs
– I2C bus segments from CB FPGA to all FRUs
– RS232 Console Port & USB Port
– RJ45 and SFP Management Ethernet
– Includes PTP logic, and SMB connectors for PTP
UCB/URE Components Details
• CPU (Intel Ivy Bridge Gladden)
– Four execution cores.
root@localhost:~# cat /proc/cpuinfo
root@localhost:~# vmstat -s
32470716 total memory
14177184 used memory
13440860 active memory
188460 inactive memory
18293532 free memory
– One or two channels of DDR3 memory, with a maximum of two UDIMMs per channel
– Supports single-channel and dual-channel modes
– 72-bit wide channels: 64-bit data + 8-bit ECC
• Direct Media Interface (DMI) x4 (10G full duplex) to PCH
• PCIe Root Complex (Gen1 4x1) interface.
root@localhost:~# lspci
00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM
Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core
processor PCI Express Root Port (rev 09)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core
processor PCI Express Root Port (rev 09)
00:01.2 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core
processor PCI Express Root Port (rev 09)
00:06.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core
processor PCI Express Root Port (rev 09)
• Platform Controller Hub (PCH) (Intel Cave Creek)
• PCIe Root Complex interface (10G full duplex) (connects to CPU)
• PCIe EndPoint interface (5GT/s) (connects to UCBC FPGA (Control FPGA), PTP FPGA,
GbE interface)
• Integrated GbE – different LANs for different purposes
– EM 0/SGMII0 – Marvell 88E1112 GbE PHY - Mgmt RJ45
– EM 1/SGMII1 – Mgmt SFP
– EM 2 – Host (192.168.1.x)
root@Ultimat-re0:RE:0% sysctl -a | grep hypervisor_ip
hw.re.hypervisor_ip: 192.168.1.1
[Diagram: RE host and JUNOS VM networking. The JUNOS VM's em0-em7 interfaces attach through Linux bridges (br) on the Yocto Linux host to the secondary RE (192.168.2.x), the host/hypervisor (192.168.1.x; 128.0.0.1 and 128.0.0.48 on the internal network), the line-card PFEMs (LC-1 at 128.0.0.16, LC-2 at 128.0.0.17), the CB PFEM, and mgmt0. The host also runs the Fab MGR guest VM and, via the SW/NIC path, drives FANs, power supplies, sensors, and LEDs. Line cards run Yocto Linux with the Forwarding Daemon controlling the PE and DST sensors; fabric cards carry the PF and its sensors.]
TVP Chassis Device Ownership

Component          Run Location          Devices
JUNOS              Master & Standby RE   FPGA (mastership), Console
HostOS on RE       Master & Standby RE   USB, Mgmt ports, Disk, PCIe controllers, NIC, PCH
HostOS on LC       LC                    Forwarding Daemon - PE
Forwarding Daemon  LC                    Data port optics, LEDs, ASIC
LCM                Master & Backup RE    FPGA, I2C, fan controller, PSU, temp sensors on CBs & LCs, FPM, mid-plane, power mgmt.
The JUNOS VM owns the console and mgmt. ports, and runs chassisd and the PFE process (lookup control information).
[Diagram: PE packet pipeline. Ingress: the ingress packet processor parses headers; the ingress buffer manager stores the packet; the VOQ manager issues fabric requests and receives grants from the fabric scheduler; the policer/filter and port-group (loopback) counters apply at ingress. Egress: header parsing, descriptor fetch, egress filter, and egress header rewrite run in the egress packet processor; the egress buffer manager and output-queue manager deliver the packet, with updated headers, to the WAN port or host.]
Power wastage
• In a traditional (non-VOQ) design, packets arriving on input ports at the ingress PFE (PFE-B) and leaving on output ports at the egress PFE (PFE-Y) are written to and read from the off-chip buffer twice.
• The fabric switch also suffers head-of-line (HoL) blocking.
JUNIPER NETWORKS RESTRICTED & CONFIDENTIAL
VoQ – Better latency, power & congestion mgmt.
• Virtual output queues: off-chip buffers (~40 ms) at the ingress PFE (PFE-A), small on-chip buffers (~10s of µs) at the egress PFE (PFE-X).
• Buffering only at ingress.
• All the VOQs at the ingress PFEs corresponding to an egress port together form the buffer queue for that port.
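The power contrast between the two designs reduces to memory-touch arithmetic. The byte counts below are illustrative only:

```python
# Sketch: off-chip memory touches per packet (illustrative byte counts).
PKT = 1000  # bytes

# Traditional crossbar: packet buffered off-chip at ingress AND egress,
# so it is written and read twice.
traditional = PKT * 2 * 2   # (write + read) at two PFEs

# VOQ: packet buffered off-chip only at ingress; egress uses small
# on-chip buffers, so one write and one read suffice.
voq = PKT * 2               # one write + one read

print(traditional, voq)     # 4000 2000 -> half the DRAM bandwidth
```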
FAB – Requests and Grants for Packet Groups
• Effective fabric flow control: a request and grant is issued for every page of the DRAM packet buffer, and each page carries a group of multiple packets, so there is no saw-tooth effect.
• Multiple cell sizes for better alignment: a page needs to fit into fixed chunks, leaving only a little padding for roughly every huge-size page.
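A toy calculation shows why multiple cell sizes reduce padding. The sizes here are made up for illustration and are not the ASIC's real cell sizes:

```python
# Hedged sketch: padding cost of carving a packet into fabric cells.
def padding(pkt_len, cell_sizes):
    """Bytes of padding when pkt_len is carved into cells.

    Full cells use the largest size; the remainder goes into the
    smallest available cell that can still hold it.
    """
    big = max(cell_sizes)
    rem = pkt_len % big
    if rem == 0:
        return 0
    tail = min(c for c in cell_sizes if c >= rem)
    return tail - rem

# With a single cell size, the last partial cell can waste almost a
# whole cell; a second, smaller cell size shrinks that waste.
print(padding(700, [512]))        # 324 bytes wasted
print(padding(700, [512, 256]))   # 68 bytes wasted
```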
VOQ ARCHITECTURE SUMMARY
The JUNOS Express chipset fabric design aims for efficiency, scalability, and simplicity:
• Virtual output queueing technology
• Non-blocking
• Each packet written to and read from external memory only once
• Egress-driven scheduling hierarchy
• No sustained congestion in the fabric
• Distributed congestion management with global buffer size control
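The summary above can be reduced to a toy request/grant loop: each ingress PFE keeps one queue per egress port, and each egress port grants at most one queue per cycle. This is purely illustrative; the real scheduler lives in the PE/PF hardware and is far more elaborate:

```python
from collections import defaultdict, deque

class IngressPFE:
    def __init__(self):
        # One virtual output queue per egress port.
        self.voq = defaultdict(deque)

    def enqueue(self, egress_port, pkt):
        self.voq[egress_port].append(pkt)

def schedule(ingress_pfes, egress_ports):
    """One grant cycle: each egress port grants at most one ingress VOQ,
    so no egress port is oversubscribed inside the fabric."""
    delivered = []
    for port in egress_ports:
        for pfe in ingress_pfes:          # simple fixed-order sweep
            if pfe.voq[port]:
                delivered.append((port, pfe.voq[port].popleft()))
                break                      # one grant per port per cycle
    return delivered

a, b = IngressPFE(), IngressPFE()
a.enqueue(1, "p1"); b.enqueue(1, "p2"); b.enqueue(2, "p3")
print(schedule([a, b], [1, 2]))  # [(1, 'p1'), (2, 'p3')]
```

Note that "p2" stays queued at ingress rather than congesting the fabric, which is the egress-driven, buffer-only-at-ingress behaviour the slides describe.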
• Lport: logical port, per IFD. LAG members use the parent's lport.
• Gport Id: global port, a system-wide unique number for a port.
• L2domain: Layer 2 forwarding domain capable of switching, learning, flooding, routable MAC, etc.
• Gl2domain Id: global L2domain id, a system-wide identifier.
• L3vpn: Layer 3 forwarding domain.
• Flabel: fabric label, an identifier for header rewrite information within the scope of the egress PE.
• Egress NHId: same as flabel, see above.