
Annex A: Specifications COBALT 2.0 GPU Cluster

March 11, 2018

Contents
1 Introduction
2 List of Abbreviations
3 Cluster Overview
4 Enterprise/Server grade GPU types
5 Acceptance Criteria
6 Requirements Conformity Tables
7 To do list

List of Tables
1 Rack requirements
2 General node requirements
3 Head node: additional requirements
4 Production nodes: additional requirements
5 General network requirements
6 Raw data network requirements
7 Batch data network: additional requirements
8 Access & control network: additional requirements
9 General software and firmware requirements
10 Support requirements
11 Award Criteria (incomplete: not for review)



1 Introduction
The COBALT 2.0 GPU cluster to be offered will execute the correlator and beam-former
application for Mega mode observations of the LOFAR (Low Frequency Array,
http://www.lofar.org) radio telescope. In “LOFAR Mega mode”, simultaneous observations
can be carried out to serve half a dozen scientific surveys and space weather applications
in parallel. This simultaneity will greatly increase the efficiency with which LOFAR is
used for science.
LOFAR is the world’s largest radio telescope. It consists of a central core of densely
located Dutch stations in the Netherlands and international stations spread across several
countries in Europe. The real-time data streams from the stations arrive via dedicated
fiber links at the central LOFAR network switches at CIT, Groningen. The COBALT 2.0
GPU cluster will be directly connected to the central LOFAR network and will receive
and process the incoming data streams in real time. LOFAR is a 24×7 instrument
observing the sky throughout the year, so COBALT 2.0 will operate in the same working
environment. It will be physically installed in the CIT Smitsborg data center of the
University of Groningen (https://www.rug.nl/).

2 List of Abbreviations
Throughout this document the following abbreviations are used unless stated otherwise.
Some well-known acronyms may not be listed.

GbE = Gigabit Ethernet

1 Gb = 1 Gbit = 1 Gigabit = 10⁹ bit
1 Gbps = 10⁹ bit/second
1 MB = 10⁶ Byte
1 GB = 10⁹ Byte (except when used in the context of RAM, where 1 GB = 2³⁰ Byte)
1 TB = 10¹² Byte

BIOS Basic Input Output System


CIT Center for Information Technology (University of Groningen), The Netherlands
COBALT 2.0 COrrelator and Beam-former Application for the LOFAR Telescope (Mega mode)
CPU Central Processing Unit
DRAM Dynamic Random Access Memory
ECC Error Correcting Code memory
EDR Enhanced Data Rate (100 Gbps - In context of Infiniband network)
EFI Extensible Firmware Interface
FC Fully Compliant
FDR Fourteen Data Rate (54 Gbps - In context of Infiniband network)
GPU Graphics Processing Unit
HDD Hard Disk Drive
IB InfiniBand
IGMP Internet Group Management Protocol
IPMI Intelligent Platform Management Interface
IPoIB IP-over-InfiniBand
KVM Keyboard Video Mouse
LACP Link Aggregation Control Protocol
LOFAR Low Frequency Array (http://www.lofar.org)

MLD Multicast Listener Discovery (In context of snooping)
NC Not Compliant
NFS Network File System
OoBM Out of Band Management
OS Operating System
PCIe Peripheral Component Interconnect Express
PXE Preboot Execution Environment
RAM Random Access Memory
RAID Redundant Array of Independent Disks
RID Requirement IDentifier
RU Rack Unit
SNMP Simple Network Management Protocol
UTP Unshielded Twisted Pair
VLAN Virtual Local Area Network

3 Cluster Overview
An overview of the cluster layout and its external connectivity is shown in Figure 1. We
consider three (physical) networks.
• “raw data network”, which is mainly used to get the real-time raw data (from the
LOFAR stations) into the cluster via the already existing switches (Brocade RX16,
10 GbE) of the LOFAR network.
• “access & control network”, with an “access & control network switch” (1 GbE).
All the nodes are connected twice to the access & control network switch, once for
user access (login) and once for control (OoBM/IPMI).
• “batch data network”, with a batch data network switch (100 Gbit/s infiniband
EDR). The external connections from this switch to the already existing infiniband
switches of the LOFAR network (FDR, 54 Gbit/s) are shown in Dark Magenta in
Figure 1.

The offered cluster must consist of:

(a) One head node.
(b) The offered number of production nodes. The number of offered production nodes
must be between 11 and 15. All production nodes offered must be identical.
(c) A batch data infiniband EDR network (including the 100 Gbps infiniband EDR switch,
fiber patches and optics).
(d) An access & control network (including the 1 GbE switch, cables and optics).
(e) A raw data network (excluding the 10 GbE switch but including the fiber patches
and optics).
(f) All power cords, rack mounting materials, and the OoBM system components.
(g) The cluster, including switches and all internal and external connections, must be
installed by the vendor (Tenderer). This has to be done in coordination with the
COBALT 2.0 team, consisting of ASTRON (the Netherlands Institute for Radio
Astronomy, http://www.astron.nl) and CIT personnel.



Figure 1: An overview of the COBALT 2.0 cluster layout. The networks are depicted with
distinct colors. The raw data network (10 GbE) is shown in Blue, while the batch data
network (100 Gbit/s infiniband EDR) is shown in Green. The external connections to the
already existing infiniband switches (54 Gbit/s) are shown in Dark Magenta. The user
access (login) connections are shown in Brown and the control connections in Red.



The cluster has two types of nodes:
• “Head node”: The head node will act as a login node, host the home directories,
serve as the cluster job manager and as an NFS server, and may be used to compile
software. The head node needs to be a dual-socket machine containing a total of two
physical CPUs. It does not have a GPU for computing. The head node is connected
to the batch data network with a single infiniband (EDR) link. The infiniband
interface on the head node may be attached to either of the two sockets/CPUs.

• “Production nodes”: Each production node is a dual-socket machine consisting of
a total of two physical CPUs. For each socket/CPU there is a directly linked GPU,
a 3×10 GbE interface (ports) and a 1×100 Gb EDR infiniband interface (port). Thus,
in total, a production node has 2 GPUs, 6×10 GbE ports and 2×100 Gbps (EDR)
infiniband ports. It is emphasized that each socket/CPU has its own direct
3×10 GbE link (ports). The simultaneously achievable sustained raw data network
rate (aggregate) to the two sockets/CPUs must be 25 Gbps and 30 Gbps respectively.
In the main usage scenario, multiple input streams of raw data arrive via the central
LOFAR network at each production node independently. Each stream is destined
for a specific 10 GbE port on the production node. The production nodes process
the received data in real time on GPUs and CPUs, while heavily using the batch
data network (infiniband EDR) to exchange the suitably formatted raw data amongst
each other. The CPU and GPU corresponding to each socket independently process
the raw data received through their own 3×10 GbE interfaces (ports). Thus the two
sets of CPU, GPU, 10 GbE interfaces and infiniband interface within a production
node are mutually exclusive in terms of computing.
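To make the throughput figures above concrete, the short sketch below (illustrative only, not part of the specification) tallies the per-node interface capacity and the required sustained raw data ingest stated above, and scales them to the allowed range of 11 to 15 production nodes.

```python
# Illustrative throughput-budget arithmetic for the production nodes,
# based only on the figures stated in this section (not a normative requirement).

GBE_PORTS_PER_SOCKET = 3            # 3 x 10 GbE ports per socket/CPU
GBE_PORT_RATE_GBPS = 10
EDR_PORTS_PER_SOCKET = 1            # 1 x 100 Gbps EDR infiniband port per socket/CPU
EDR_RATE_GBPS = 100
SUSTAINED_RAW_RATE_GBPS = (25, 30)  # required sustained aggregate per socket/CPU

raw_capacity_per_node = 2 * GBE_PORTS_PER_SOCKET * GBE_PORT_RATE_GBPS   # 60 Gbps
sustained_raw_per_node = sum(SUSTAINED_RAW_RATE_GBPS)                   # 55 Gbps
batch_capacity_per_node = 2 * EDR_PORTS_PER_SOCKET * EDR_RATE_GBPS      # 200 Gbps

for nodes in (11, 15):  # allowed range of offered production nodes
    print(f"{nodes} nodes: sustained raw ingest {nodes * sustained_raw_per_node} Gbps, "
          f"raw 10 GbE capacity {nodes * raw_capacity_per_node} Gbps, "
          f"batch (EDR) capacity {nodes * batch_capacity_per_node} Gbps")
```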

Access & control network: The head node and all the production nodes are also
connected twice to the access & control network switch (1 GbE), once for login/NFS/PXE
and once for OoBM. The access and control interfaces may be linked to either of the
two sockets/CPUs on the head node. The access & control network switch should be
remotely manageable and needs to have an uplink port to connect to the rest of the
LOFAR management network.

Batch data network: The head node (once) and all the production nodes (twice per
node) are connected (at 100 Gbps) to the batch data network switch (EDR infiniband). In
addition, the switch must also have 9 uplink ports to connect to the already existing
infiniband (FDR, 54 Gbps) switches of the LOFAR network (as shown in Figure 1). Thus
the vendor needs to ensure that appropriate fiber patches, including optics, for these 9
external connections are included in the offer. The batch data network switch must be
remotely manageable and needs to have an uplink port to connect to the rest of the
LOFAR management network.

4 Enterprise/Server grade GPU types

As described in the Request for Offer (“Verzoek tot Offerte”), the Tenderer should only
offer cluster configurations with Enterprise/Server grade NVIDIA CUDA compatible GPUs.
This is necessary for compatibility with our in-house developed software, which needs to
run on the cluster. Given the GPU RAM and memory bandwidth requirements, only two
GPU models comply, namely the NVIDIA Tesla P100 (16 GB version) and the NVIDIA
Tesla V100 (16 GB version). Thus the offer should include only one of these two models.
All the GPUs in the cluster must be identical.

5 Acceptance Criteria
Several cluster properties, most notably the number of production nodes, CPU model,
GPU model, storage capacity and the amount of DRAM, are flexible, so that vendors
can offer the best quality and combination within the budget. All components supplied
must be new; the use of any old, second-hand or refurbished components is not allowed.

Introduction: For each requirement, an acceptance test will be conducted. The test
is to be carried out after installation and during or after configuration by the applicant
(Tenderer). Each test result is to be reviewed by the COBALT 2.0 team, comprising
ASTRON and CIT personnel.

The basic procedure is as follows:


1. Vendors will have a period of one week to:

(a) deliver and install equipment at Center for Information Technology, Nettelbosje 1
(Smitsborg), 9747 AJ Groningen, The Netherlands;

(b) perform all configuration and tuning necessary to meet the acceptance criteria;

(c) make the system operational to the extent required to carry out the tests.

2. The applicant will have one week to perform the acceptance tests, with
COBALT 2.0 technicians in attendance.
3. If the equipment proves to be non-compliant, or does not meet the acceptance or
award criteria, the vendor has a maximum of four weeks to meet them (unless another
time period is agreed upon with the COBALT 2.0 team in writing), e.g. through an
exchange, extension or reconfiguration of the system.

Step 1 is to be planned immediately before the offered delivery date. Step 2 is to be
planned immediately following the delivery date. The acceptance tests must be performed
with CentOS-7.4-1708, Kernel-3.10.0-693 (or newer).

Inspection: The applicant must prove that all offered hardware has been delivered;
this includes counting all hardware and confirming it on a checklist to be signed by the
COBALT 2.0 team.

I/O Performance Test: It is the applicant’s responsibility to prove that the system
meets all the requirements.



Memory Speed Test: Review of the offer and visual inspection of (a selection of) the
delivered head node and production nodes must confirm that the memory module speed
matches the maximum speed possible with the installed CPU and chip set.

When the BIOS, in the boot menu or via other means, displays the actual memory
speed, this speed must be recorded and documented. Where the BIOS allows setting
the actual memory speed, this speed must be set to the highest speed possible with the
installed CPU and chip set, and this speed setting must be verified and documented for
a true random selection of at least 2 nodes. The actual maximum memory speed must
be recorded. The memory settings are to be applied before any of the acceptance tests.
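As one possible (non-normative) way of recording the installed versus configured memory module speeds on a delivered node, a small script along the following lines could parse the output of the standard dmidecode utility; the exact field names vary slightly between BIOS versions, which the pattern below tries to accommodate.

```python
# Sketch: record rated vs. configured DIMM speeds via dmidecode (requires root).
# Non-normative illustration for documenting the memory speed on selected nodes.
import re
import subprocess

def dimm_speeds():
    """Return (rated_speed, configured_speed) strings for every populated DIMM."""
    out = subprocess.run(["dmidecode", "-t", "memory"],
                         capture_output=True, text=True, check=True).stdout
    speeds = []
    for block in out.split("\n\n"):
        if "Memory Device" not in block or "No Module Installed" in block:
            continue
        rated = re.search(r"^\s*Speed:\s*(.+)$", block, re.M)
        conf = re.search(r"^\s*Configured (?:Memory|Clock) Speed:\s*(.+)$", block, re.M)
        speeds.append((rated.group(1) if rated else "unknown",
                       conf.group(1) if conf else "unknown"))
    return speeds

if __name__ == "__main__":
    for i, (rated, configured) in enumerate(dimm_speeds()):
        print(f"DIMM {i}: rated {rated}, configured {configured}")
```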

OS Support: An x86-64 bit version of CentOS-7.4-1708, Kernel-3.10.0-693 (or newer)
must be installed. The installation must succeed and the node must reboot into a
functional state. This test can be part of the Power Consumption Test or the Reliability
Test.

Network Benchmark Test: The vendor needs to demonstrate that the designed PCIe
interconnect bandwidth to both GPUs, both (2×1×100 Gb) infiniband (EDR) interfaces
and all 6×10 GbE interfaces can be achieved when all are in operation simultaneously.
The sustained data rate across each of the interfaces must be measured while the nodes
are under normal operation. Corresponding to each socket/CPU, the measured PCIe
interconnect bandwidth (aggregate of both in and out) to the GPUs should be ≈32 GB/s,
and to the EDR interface it should be at least 80 Gbps (individually for both in and out).
The simultaneously achievable sustained raw data network rate (aggregate) to the two
sockets/CPUs must be 25 Gbps and 30 Gbps respectively.
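A minimal, non-normative sketch of how the per-socket sustained raw data rates could be exercised and checked is given below. It assumes commonly used tools (iperf3 for the 10 GbE links; ib_write_bw from the perftest suite and the CUDA bandwidthTest sample would cover the EDR and GPU PCIe measurements in the same fashion) and placeholder host names; it is not the prescribed test procedure.

```python
# Sketch: compare measured sustained raw-data rates against the per-socket targets.
# Assumes iperf3 traffic generators feeding each socket's 3x10 GbE ports; the host
# names are placeholders. EDR and GPU PCIe bandwidth would be measured analogously
# (e.g. with ib_write_bw and the CUDA bandwidthTest sample) while all interfaces
# are loaded simultaneously.
import json
import subprocess

RAW_TARGETS_GBPS = {0: 25.0, 1: 30.0}   # required sustained raw rate per socket/CPU

def iperf3_gbps(server, seconds=30, streams=3):
    """Run an iperf3 client against a traffic generator; return received Gbps."""
    out = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-P", str(streams), "-J"],
        capture_output=True, text=True, check=True).stdout
    return json.loads(out)["end"]["sum_received"]["bits_per_second"] / 1e9

if __name__ == "__main__":
    generators = {0: "loadgen-socket0.example", 1: "loadgen-socket1.example"}
    for socket_id, target in RAW_TARGETS_GBPS.items():
        measured = iperf3_gbps(generators[socket_id])
        verdict = "OK" if measured >= target else "FAIL"
        print(f"socket {socket_id}: {measured:.1f} Gbps (target {target} Gbps) {verdict}")
```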

Power Consumption Test: Power consumption measurement will be performed using
a power measurement device that has been calibrated and found to be accurate within
a tolerance of no more than 5% by an accredited European certification institute. The
power measurement will be performed under a ‘reference load’ of the equipment, and
under standard data center conditions. The ‘reference load’ is defined as: a single
instance of ‘mprime -t’ (mprime will spawn as many threads as there are CPU cores
available); plus one instance of ‘dd’ for each disk available to the OS; plus the more
power consuming of ‘gpu burn’ or ‘gpu burn -d’; all running simultaneously.
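A non-normative sketch of how such a reference load could be launched on a single node is given below. It assumes the mprime, dd and gpu_burn utilities (the last being the commonly available GPU burn-in tool referred to above as ‘gpu burn’) are installed; the disk device names are placeholders.

```python
# Sketch: start the 'reference load' described above on one node.
# Assumptions: mprime (torture test), dd and gpu_burn are installed; the disk
# list is a placeholder and must match the disks actually visible to the OS.
import subprocess

def start_reference_load(disks=("/dev/sda", "/dev/sdb"), double_precision=False,
                         duration_s=3600):
    procs = []
    # One mprime torture test; it spawns one worker thread per available CPU core.
    procs.append(subprocess.Popen(["mprime", "-t"]))
    # One sequential 'dd' reader per disk available to the OS.
    for disk in disks:
        procs.append(subprocess.Popen(["dd", f"if={disk}", "of=/dev/null", "bs=1M"]))
    # GPU load: use whichever of 'gpu_burn' / 'gpu_burn -d' draws more power.
    gpu_cmd = ["gpu_burn"] + (["-d"] if double_precision else []) + [str(duration_s)]
    procs.append(subprocess.Popen(gpu_cmd))
    return procs

if __name__ == "__main__":
    # In practice the load is left running for the duration of the measurement
    # and then terminated; here we simply wait for the processes.
    for p in start_reference_load():
        p.wait()
```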

Reliability Test: The reliability test will be conducted similarly to the Power
Consumption Test. However, it will start twice as many ‘mprime -t’ and ‘dd’ processes
per node, and run simultaneously on all nodes. The test will run for 72 hours continuously.
During this time no hardware or firmware failures should occur, the temperature of all
components should remain within their nominal operational temperature range without
compromising performance, and all components must continue to operate at their nominal
maximum sustainable speed.
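During the 72-hour run, component temperatures could for instance be logged over IPMI (cf. requirement GN.19). The sketch below is illustrative only and assumes the standard ipmitool utility and a BMC that exposes temperature sensors.

```python
# Sketch: periodically log IPMI temperature sensors during the reliability test.
# Illustrative only; assumes ipmitool is installed and the node's BMC exposes
# temperature sensors (cf. GN.19).
import subprocess
import time

def log_temperatures(interval_s=60, duration_s=72 * 3600, logfile="temps.log"):
    end = time.time() + duration_s
    with open(logfile, "a") as log:
        while time.time() < end:
            out = subprocess.run(["ipmitool", "sdr", "type", "temperature"],
                                 capture_output=True, text=True).stdout
            log.write(f"--- {time.strftime('%Y-%m-%d %H:%M:%S')} ---\n{out}\n")
            log.flush()
            time.sleep(interval_s)

if __name__ == "__main__":
    log_temperatures()
```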

6 Requirements Conformity Tables


The following tables list the cluster requirements, in addition to any other requirements
mentioned elsewhere in this document. Tenderers are requested to mark each requirement
appropriately for the offered configuration, i.e. in the Fully Compliant / Not Compliant
(FC/NC) column. Tenderers must also fill in Table 11. Any explanations or comments
may be provided on additional pages.



Table 1: Rack Requirements

RID Requirement FC NC
Rk. 1 The cluster should be mountable in the 19 inch racks which are
available at the site location. Racks are dimensioned at
800 mm×1000 mm×44 RU (w×d×h), of which only 42 RU is available
per rack. Any additional components necessary for integration
into these racks must be supplied.
Rk. 2 The cluster (including all switches) must fit physically in at most 2
racks.
Rk. 3 Rack mount kits (with sliding rails) which fit the available racks
must be included in the offer.
Rk. 4 The offer should include the power cords, all rack mounting materials,
and the OoBM system components.
Rk. 5 The weight of the equipment must not exceed 1500 kg per rack.
Rk. 6 The systems, including all the components used, must be designed
for continuous (24×7) operation.
Rk. 7 The cluster must operate using 50 Hz AC and 230 V.
Rk. 8 The maximum power consumed should be less than 16 kW per rack.
Rk. 9 All systems must dissipate their heat directly into the air (i.e. no
liquid cooling solutions).
Rk. 10 All systems must be able to operate continuously to nominal specifications
(under full load) with an intake air temperature of (up to) 30 °C.
Rk. 11 The direction of the airflow in the switches must be the same as
the direction of the airflow in the nodes.
Rk. 12 All the nodes and switches should be installed by the vendor in the
racks.
Rk. 13 The vendor is responsible for all internal connections within the
cluster. The connections must be clearly labelled.
Rk. 14 The vendor is responsible for all external connections as well. This
has to be done in coordination with the COBALT 2.0 team consisting
of ASTRON and CIT personnel. The connections must be clearly
labelled.



Table 2: General Node Requirements
The following list applies to each head node as well as each production node. It is
followed by additional requirements specific to each of the two node types.

RID Requirement FC NC
GN. 1 The node must have 2 CPUs and sockets installed.
GN. 2 The CPUs should be from the Intel Xeon Gold (or Xeon Platinum)
family.
GN. 3 The base clock frequency of the CPU cores must be at least
2.1 GHz.
GN. 4 Turbo Boost will be disabled for all CPU cores and they will be
set to the highest possible base clock frequency.
GN. 5 The number of physical CPU cores per socket must be at least
12.
GN. 6 The CPUs must support a minimum of 6 memory channels.
GN. 7 All memory modules must have ECC and run at (at least) DDR4-
2666.
GN. 8 The CPUs must support DDR4-2666 memory.
GN. 9 Memory modules must be equally divided across all memory channels
(e.g. six or twelve modules for six channels), and all memory
channels must be used.
GN. 10 The minimum amount of memory per socket/CPU is at least
96 GB (1 GB = 2³⁰ bytes in the context of RAM).
GN. 11 The amount of memory for both the sockets must be equal.
GN. 12 The minimum combined memory (including for both CPUs) is at
least 192 GB.
GN. 13 The node must have (at least) one general-purpose, server-grade
dedicated Ethernet interface with a speed of (at least) 1 Gbit/s
(full duplex) for user access (login) as shown in Figure 1.
GN. 14 The node must also have a dedicated 0.1/1 Gbit/s OoBM connection
for IPMI and remote management (for control as shown in
Figure 1).
GN. 15 The access & control network switch connections to the node must
be UTP (1 GbE).
GN. 16 The access and control interfaces may be linked to any one of the
two sockets/CPUs on the node.
GN. 17 The node must support PXE version 2.0 (or later) on the network
interface mentioned in the previous requirement.
GN. 18 The node must support IPMI version 2.0 over Ethernet.
GN. 19 Ambient temperature and at least one temperature reading per
socket/CPU must be available over IPMI.
GN. 20 Remote (or “virtual”) KVM over IP (IPKVM) must be provided
(This may use vendor specific tools, but must be accessible over
the same management network as used for IPMI).
GN. 21 Virtual device support (at least CD-ROM/DVD-ROM) must be
provided in the OoBM solution.
GN. 22 Remote configuration of BIOS or EFI settings (device boot order,
etc.), and access to “magic” SysRq functionality via either the
IPMI SoL or rKVM must be possible.
GN. 23 The OoBM solution (package) must be usable through TCP/IP.
GN. 24 The node must be accessible via a local KVM (i.e. each node
must provide a display port and 2 USB ports for use with a local
monitor cart, keyboard/mouse).
GN. 25 It must be possible to remotely install and boot all nodes using an
unmodified x86-64 bit version of CentOS-7.4-1708 (Kernel-3.10.0-
693) or newer.
GN. 26 The head node and the production nodes must have the same CPUs
and identical firmware revision levels.
GN. 27 The nodes must have “Dual Hot-plug, Redundant Power Supply (1+1)”.



Table 3: Head Node - additional requirements
The following list applies to the head node in addition to the requirements in Table 2.
The head node will act as a login node, host the home directories, serve as the cluster
job manager and as an NFS server, and may be used to compile software.

RID Requirement FC NC
HN. 1 The head node must meet the general node requirements.
HN. 2 In the head node, both CPUs, both GPUs, all memory modules,
hard drives, etc. should be identical.
HN. 3 The head node does not need to have a GPU card for computing.
HN. 4 The head node must have (at least) one general-purpose, server-grade
100 Gbit/s full-duplex infiniband EDR interface (connected to
any socket/CPU) for the batch data network. It must use PCIe 3.0
and operate at its full designed speed.
HN. 5 There should be a minimum of 8 HDDs (enterprise grade) and the
total storage capacity must be at least 32 TB.
HN. 6 The minimum capacity of each storage disk should be at least 4 TB.
HN. 7 All the storage disks should be identical.
HN. 8 The disks must be suitable to operate in software RAID, levels 0,
1, and 6, and as individual disks.
HN. 9 In the head node this disk storage must appear as a single drive,
configured in software RAID 0.
HN. 10 In the head node, it must be possible to achieve sustained local
read and write speeds of 10 Gbit/s in the RAID 0 configuration. The
reads and writes do not need to happen simultaneously. If there is
no RAID controller in the system, software RAID can be used.
HN. 11 No expansion card connected via PCIe may have its throughput
speed bottlenecked. (Some PCIe riser cards and/or too many
connectors may interfere.)



Table 4: Production Nodes: additional requirements
The following list applies to all the production nodes in addition to the requirements
in Table 2. In the main usage scenario, multiple input data streams arrive via the
central LOFAR network (10 GbE) at each production node independently. Several
raw data streams are destined for a specific production node. The production nodes
process the received data on GPUs and CPUs, while heavily relying on the batch data
network (infiniband EDR) to exchange the suitably formatted raw data amongst each
other.

RID Requirement FC NC
PN. 1 The offer must contain a minimum of 11 production nodes.
PN. 2 The offer must contain a maximum of 15 production nodes.
PN. 3 All production nodes must meet the general node requirements.
PN. 4 The same components across the production nodes must be identical
(same make, model, capacity and speed for CPUs, memory
modules, hard drives, GPUs, networking interfaces, etc.), equal in
number per node, and with identical firmware revision levels. In
other words, all production nodes must be identical to each other.
PN. 5 All nodes must have either (a) an NVIDIA Tesla V100 (16 GB
version) GPU OR (b) an NVIDIA Tesla P100 (16 GB version) GPU
installed for each socket/CPU. So, there must be two identical
GPUs installed in each production node. Mixing of P100 and
V100 models is not allowed, either within or across the production
nodes.
PN. 6 The GPUs must be CUDA compatible.
PN. 7 Each individual GPU must have 16 GB dedicated RAM and its
memory bandwidth should be at least 732 GB/s.
PN. 8 Corresponding to each socket/CPU, a production node must
have at least 3 (ports)×10 GbE server-grade full-duplex Ethernet
interfaces for the raw data network (see Figure 1). The simultaneously
achievable sustained raw data network rate (aggregate)
to the two CPUs must be 25 Gbps and 30 Gbps respectively. This
is an important requirement and the vendor is required to carry
out network benchmark tests as well as demonstrate this at the
CIT when the node is operational.
PN. 9 To be able to connect the production nodes to the raw data
network (10 GbE), usage of SR optics (multimode) is needed. It
is the sole responsibility of the Tenderer to ensure that the proper
optics and fiber patches are used.
PN. 10 For each socket/CPU in a production node, there must be
(at least) one server-grade 100 Gbps full-duplex infiniband
(EDR) interface as shown in Figure 1. These interfaces have
to be connected to the batch data network via the infiniband
switch (100 Gbps EDR). The components corresponding to
each socket/CPU must be able to communicate independently
via their own infiniband EDR interface with at least a sustained
rate of 80 Gbps.
PN. 11 In general, the use of Broadcom cards as well as Broadcom cables
is not allowed. The Tenderer is encouraged to use high quality
cables/optics/fiber patches/cards (e.g. Mellanox) to minimize the
overheads.
PN. 12 There should be no loss of PCIe interconnect bandwidth to
both GPUs, both (2×1×100 Gb) infiniband (EDR) interfaces
and the 6×10 GbE interfaces when all are in operation simultaneously.
PN. 13 Both sets of CPU, GPU, 10 GbE interfaces and infiniband interface
within a production node are mutually exclusive in terms of
computing, as explained in the Cluster Overview (Section 3).
PN. 14 There should be a minimum of 2 HDDs (enterprise grade) and
the total storage capacity must be at least 8 TB.
PN. 15 The minimum capacity of each storage disk should be at least
4 TB.
PN. 16 All the storage disks should be identical.
PN. 17 The disks must be suitable to operate in software RAID, levels
0 and 1, and as individual disks.
PN. 18 The disk storage must appear as a single drive, configured in
software RAID 1.
PN. 19 Each individual disk must be able to write at a sustained speed
of 150 MB/s.
PN. 20 No expansion card connected via PCIe may have its throughput
bottlenecked. (Some PCIe riser cards and/or too many connectors
may interfere.) Compliant exception: if, for only one
socket/CPU, the connected multi-port 10 GbE card uses only 4
lanes of PCIe 3.0 (instead of 8), and that configuration does
deliver a sustained raw data network rate of 25 Gbps, it is also
considered compliant. Note that a sustained raw data network
rate of 30 Gbps (via 3×10 GbE) must be achievable for the
other socket/CPU. Thus the exception applies to only one
socket/CPU per production node.
PN. 21 For each socket/CPU, the GPU, the infiniband interface (100 Gb
EDR) and the 3×10 GbE interfaces must use PCIe 3.0.



Table 5: General Network Requirements
The cluster has an access & control network (1 GbE), a batch data network
(infiniband EDR) and a raw data network (10 GbE). The ‘access & control
network’ and ‘batch data network’ should be supplied with their own switch. The
supplied switches should be remotely manageable via one or more uplinks. The
production nodes in the cluster need to be connected to the raw data network switches
already existing in the central LOFAR network. The following list applies to both
the ‘access & control network’ and the ‘batch data network’. It is followed by additional
requirements specific to each of the two types of network.

RID Requirement FC NC
Net. 1 The supplied switches must be configurable over the network and
must support management over IPv4 and IPv6.
Net. 2 The supplied switches must support 802.3x (flow control) or have
equivalent flow control measures on all ports.
Net. 3 The supplied switches must support SNMP (v3) readouts of
packets and bytes, both inbound and outbound, as well as readout
of the currently connected MAC address(es) and propagated
VLANs on all ports.
Net. 4 Both the access & control network switch (1 GbE) as well as the
batch data network Infiniband switch (EDR 100 Gbps) need to
be remotely manageable. They need to have an uplink via which
they can be managed remotely.
Net. 5 The supplied switches need to have “Dual Hot-plug, Redundant
Power Supply (1+1)”.
Net. 6 Switches and network interface cards must support jumbo
frames.
Net. 7 Switches must support 802.1q (tagged VLANs) on all ports, and
must support the concurrent configuration of multiple tagged
VLANs on any port.
Net. 8 Switches must support both tagged and untagged VLANs on the
same physical interface.
Net. 9 Switches must support both enabling as well as disabling of
IGMP and MLD(v2) snooping.
Net. 10 Switches must support 802.3ad (aggregation) and LACP on all
uplink ports.



Table 6: Raw Data Network Requirements
The following list applies to the raw data network (10 GbE) between the central
LOFAR network and the production nodes.

RID Requirement FC NC
NR. 1 As shown in Figure 1, each production node is connected with
2×3×10 GbE (3 corresponding to each socket/CPU) to the exist-
ing Foundry/Brocade RX16 switches.
NR. 2 The fiber patches, including optics (SR 850 nm multimode), for
the connections between the production nodes and the existing
Foundry/Brocade RX-16 switches must be included in the offer.
The maximum required length is 11 m.



Table 7: Batch Data Network : additional requirements
The batch data network connects the head node as well as the production nodes
to the infiniband (EDR 100 Gbit/s) switch as shown in Figure 1. This infiniband
switch is also connected to the three existing infiniband switches of the LOFAR
network. The already existing switches in the LOFAR network are all infiniband
FDR (54 Gbit/sec) switches.

RID Requirement FC NC
NB. 1 The batch data network has to match the general network require-
ments.
NB. 2 The batch data network switch must be an infiniband (EDR) switch
with enough interfaces for all the required links: (a) two (each
infiniband EDR) to each of the production nodes, plus (b) one
(infiniband EDR) to the head node, (c) four to the existing Mellanox
SX6036 FDR infiniband switches, (d) five to an existing Mellanox
SX6025 FDR infiniband switch, and (e) one for the remote
management interface of the batch data network switch.
NB. 3 The aggregated infiniband link to the Mellanox SX6036 FDR infiniband
switch(es) must be at least 180 Gbit/s (e.g. 4×54 Gbit/s).
The maximum required patch fiber length for each connection is
12 m.
NB. 4 The aggregated infiniband link to the Mellanox SX6025 FDR
infiniband switch must be at least 225 Gbit/s (e.g. 5×54 Gbit/s). The
maximum required patch fiber length for each connection is 7 m.
NB. 5 The infiniband transport must also support IPoIB in “Connected
mode”.
NB. 6 The patch fibers and optics necessary for all the connections must
be included in the offer.
NB. 7 The batch network switch should be remotely manageable. It needs
to have an uplink via which it can be managed remotely.
NB. 8 The backplane of the batch data network must be sufficient so that
all connections can work at full designed speed simultaneously.



Table 8: Access & Control Network - additional requirements
The access & control network connects all the nodes twice (login/NFS/PXE and
OoBM) to a single switch, which will be hooked up to the LOFAR 1 Gbit/s network.
The OoBM interfaces will be configured in their own VLAN, idem for the interfaces
used for login/NFS/PXE. We intend to connect all user access/management related
interfaces to a single switch.

RID Requirement FC NC
NUC. 1 The access & control network has to match the General Network
Requirements.
NUC. 2 The access & control network switch must be a 1 Gbit/s Ethernet
switch with enough 1 Gbit/s RJ45 full duplex interfaces for
the required links: login/NFS/PXE, OoBM, and uplink (i.e. 2
per node including the head node and all production nodes + 1
for uplink + 1 for the management interface of the batch data
network switch).
NUC. 3 The cables necessary for all the required connections must be
included in the offer.
NUC. 4 An extra network cable with RJ45 connectors suitable for
1 Gbit/s Ethernet and a length of at least 30 m must be included
in the offer (access & control network switch uplink).



Table 9: General Software and Firmware Requirements
RID Requirement FC NC
SW. 1 All software related to the OoBM solution and the remote KVM
over IP setup, and all software/drivers related to those components
not controllable via the stock OS, must be included in the
offer.
SW. 2 Wherever applicable, upgrades for the software and firmware must
be provided for 5 years and licenses must not have an expiration
date.
SW. 3 All the components must work with an x86-64 bit version of CentOS-
7.4-1708 (Kernel-3.10.0-693) or newer. Any additional device
drivers or software needed for this must be provided by the vendor.
SW. 4 The offer must include detailed technical documentation of the
proposed cluster, e.g. a detailed design of the motherboard layout,
manuals, etc.
SW. 5 Documentation (electronic or otherwise) must be provided in English
or Dutch.
SW. 6 All software licenses required for cluster operation must be included
(for at least 5 years) in the offer.



Table 10: Support Requirements
The Tenderer is requested to offer a support contract. For some cases listed below,
an acceptable support type is outlined and the Tenderer is requested to propose a
detailed support solution.

RID Requirement FC NC
Sup. 1 A support contract must be included in the offer for a period of
5 years. The offered contract must fulfill all requirements in this
section and in the framework agreement.
Sup. 2 When the nodes and networking components are supplied by the
same vendor, the Tenderer must provide a single contact point
for support on the delivered hardware. When the nodes and
networking components are supplied by two different vendors,
the Tenderer must supply at least one contact point for support
on the delivered hardware. In this case, the Tenderer may supply
two separate contact points for support on the nodes and network
components, respectively. The Tenderer remains the single
responsible entity for this offer.
Sup. 3 Replacement of components may be performed by site personnel
from the CIT data center or ASTRON personnel without impact
on support. The Tenderer is requested to indicate for what type
of components this is acceptable (e.g. none, hot-swappable +
cables/optics, nodes/switches, or all).
Sup. 4 Replacement parts must be on site (including on-site replacement
if needed; see the previous item) within 5 working days when
needed to solve an incident.
Sup. 5 Support must be given on complete systems.
Sup. 6 All components must have the same service level. None of the
components may be withdrawn from support for the entire support
period.
Sup. 7 In case of an unexpected data center cooling outage, systems may
have been operating outside the supported operating temperature
range for a short duration. The Tenderer is requested to indicate
the temperature range and support impact for this situation. If
support is impacted, the Tenderer is requested to indicate a procedure
(e.g. a system stress test or a period of continuous system
operation within the indicated temperature range) after which
support is reinstated for still correctly functioning systems and
components.
Sup. 8 Access to second-level support must be included in the offer. Any
support given must appropriately take into account and accept as
authoritative any and all trouble-shooting done by site personnel
from the CIT data center and ASTRON personnel before the call
was placed or performed later during incident handling.



Table 11: Award Criteria
The Tenderer must fill in these quantities truthfully.
Number of offered production nodes
CPU Model
GPU Model
Total storage capacity of head node (TB)
Total storage capacity of a production node (TB)
Main Memory size of the head node (GB)
Main Memory size of a production node (GB)
Memory bandwidth of a GPU (without ECC) (GB/s)
Single precision (multiply-add (MAD)) compute power of a GPU (Tflop/s)
Half precision (multiply-add (MAD)) compute power of a GPU (Tflop/s)
Delivery time (weeks)



7 To do list
• Acronyms which have not been listed: shall we get rid of some of the very well
known acronyms, like CPU?

• Check which is the latest CentOS release and kernel suitable for mounting the Lustre
file system; custom kernel?

• Acceptance tests need more thought. Do we have any specific tests in mind?

• Vague terms such as “normal load/operation”, “quality over cost” and “to the extent
required” must be replaced with specifics.

• Virtual device support (at least CD-ROM/DVD-ROM) must be provided in the
OoBM solution. Do we need to include that?

• Remote configuration of BIOS or EFI settings (device boot order, etc.), and access
to “magic” SysRq functionality via either the IPMI SoL or rKVM must be possible.
Does this need to be there?

• Do we also need some justification for requiring Intel CPUs?

• PN.13: a very difficult test. Maybe place it in the acceptance criteria after more
thought, or rephrase it more specifically.

• Support: what support exactly do we need?

• The strange drive must not be supplied: the one which is cheap and can only change
stuff slowly.

• Can we make the 25 Gbps a desired feature instead of a mandatory feature and
place some award criteria on it?

• Do we need to specify for the switches too whether they are data center class, etc.?

• Award criteria table needs to be completed (To be done after sending the document
for internal review.)

• Check for any typos mixing bit and byte, or Giga with Mega, etc.

• repetition of the conditions within the tables.

• PN.12: is it legal, and is this needed?

