This document describes the Dolphin Express software stack version 3.3.0. Published November 13th, 2007. Copyright 2007 Dolphin Interconnect Solutions ASA.
Published under the GNU General Public License v2
Table of Contents

Abstract
1. Introduction & Overview
    1. Who needs Dolphin Express and SuperSockets?
    2. How do Dolphin Express and SuperSockets work?
    3. Terminology
    4. Contact & Feedback: Dolphin Support
2. Requirements & Planning
    1. Supported Platforms
        1.1. Hardware
            1.1.1. Supported Platforms
            1.1.2. Recommended Node Hardware
            1.1.3. Recommended Frontend Hardware
        1.2. Software
            1.2.1. Linux
            1.2.2. Windows
            1.2.3. Solaris
            1.2.4. Others
    2. Interconnect Planning
        2.1. Nodes to Equip with Dolphin Express Interconnect
            2.1.1. MySQL Cluster
        2.2. Interconnect Topology
        2.3. Physical Node Placement
3. Initial Installation
    1. Overview
    2. Installation Requirements
        2.1. Live Installation
        2.2. Non-GUI Installation
            2.2.1. No X / GUI on Frontend
            2.2.2. No X / GUI Anywhere
    3. Adapter Card Installation
    4. Software and Cable Installation
        4.1. Overview
        4.2. Starting the Software Installation
        4.3. Working with the dishostseditor
            4.3.1. Cluster Edit
            4.3.2. Node Arrangement
            4.3.3. Cabling Instructions
        4.4. Cluster Cabling
            4.4.1. Connecting the Cables
            4.4.2. Verifying the Cabling
        4.5. Finalising the Software Installation
            4.5.1. Static SCI Connectivity Test
            4.5.2. SuperSockets Configuration Test
            4.5.3. SuperSockets Performance Test
        4.6. Handling Installation Problems
        4.7. Interconnect Validation with sciadmin
            4.7.1. Installing sciadmin
            4.7.2. Starting sciadmin
            4.7.3. Cluster Overview
            4.7.4. Cabling Correctness Test
            4.7.5. Fabric Quality Test
        4.8. Making Cluster Applications use Dolphin Express
            4.8.1. Generic Socket Applications
            4.8.2. Native SCI Applications
            4.8.3. Kernel Socket Services
4. Update Installation
    1. Complete Update
    2. Rolling Update
5. Manual Installation
    1. Installation under Load
    2. Installation of a Heterogeneous Cluster
    3. Manual RPM Installation
        3.1. RPM Package Structure
        3.2. RPM Build and Installation
    4. Unpackaged Installation
6. Interconnect and Software Maintenance
    1. Verifying Functionality and Performance
        1.1. Low-level Functionality and Performance
            1.1.1. Availability of Drivers and Services
            1.1.2. Cable Connection Test
            1.1.3. Static Interconnect Test
            1.1.4. Interconnect Load Test
            1.1.5. Interconnect Performance Test
        1.2. SuperSockets Functionality and Performance
            1.2.1. SuperSockets Status
            1.2.2. SuperSockets Functionality
        1.3. SuperSockets Utilization
    2. Replacing SCI Cables
    3. Replacing a PCI-SCI Adapter
    4. Physically Moving Nodes
    5. Replacing a Node
    6. Adding Nodes
    7. Removing Nodes
7. MySQL Operation
    1. MySQL Cluster
        1.1. SuperSockets Poll Optimization
        1.2. NDBD Deadlock Timeout
        1.3. SCI Transporter
    2. MySQL Replication
8. Advanced Topics
    1. Notification on Interconnect Status Changes
        1.1. Interconnect Status
        1.2. Notification Interface
        1.3. Setting Up and Controlling Notification
            1.3.1. Configure Notification via the dishostseditor
            1.3.2. Configure Notification Manually
            1.3.3. Verifying Notification
            1.3.4. Disabling and Enabling Notification Temporarily
    2. Managing IRM Resources
        2.1. Updates with Modified IRM Configuration
9. FAQ
    1. Hardware
    2. Software
A. Self-Installing Archive (SIA) Reference
    1. SIA Operating Modes
        1.1. Full Cluster Installation
        1.2. Node Installation
        1.3. Frontend Installation
        1.4. Installation of Configuration File Editor
        1.5. Building RPM Packages Only
        1.6. Extraction of Source Archive
    2. SIA Options
        2.1. Node Specification
        2.2. Installation Path Specification
        2.3. Installing from Binary RPMs
        2.4. Preallocation of SCI Memory
        2.5. Enforce Installation
        2.6. Configuration File Specification
        2.7. Batch Mode
        2.8. Non-GUI Build Mode
        2.9. Software Removal
B. sciadmin Reference
    1. Startup
    2. Interconnect Status View
        2.1. Icons
        2.2. Operation
            2.2.1. Cluster Status
            2.2.2. Node Status
    3. Node and Interconnect Control
        3.1. Admin Menu
        3.2. Cluster Menu
        3.3. Node Menu
        3.4. Cluster Settings
        3.5. Adapter Settings
    4. Interconnect Testing & Diagnosis
        4.1. Cable Test
        4.2. Traffic Test
C. Configuration Files
    1. Cluster Configuration
        1.1. dishosts.conf
            1.1.1. Basic settings
            1.1.2. SuperSockets settings
            1.1.3. Miscellaneous Notes
        1.2. networkmanager.conf
        1.3. cluster.conf
    2. SuperSockets Configuration
        2.1. supersockets_profiles.conf
        2.2. supersockets_ports.conf
    3. Driver Configuration
        3.1. dis_irm.conf
            3.1.1. Resource Limitations
            3.1.2. Memory Preallocation
            3.1.3. Logging and Messages
        3.2. dis_ssocks.conf
D. Platform Issues and Software Limitations
    1. Platforms with Known Problems
    2. IRM
    3. SuperSockets
Abstract
This document describes the installation of the Dolphin Interconnect Solutions (DIS) Dolphin Express interconnect hardware and the DIS software stack, including SuperSockets, on single machines or on a cluster of machines. This software stack is needed to use Dolphin's Dolphin Express high-performance interconnect products and consists of drivers (kernel modules), user space libraries and applications, an SDK, documentation and more. SuperSockets drastically accelerate generic socket communication as used by clustered applications.
3. Terminology
We define some terms that will be used throughout this document.

adapter
    A PCI-to-SCI (D33x series), PCI-Express-to-SCI (D35x series) or PCI-Express fabric (DXH series) adapter. This is the Dolphin Express hardware installed in the cluster nodes.

node
    A computer which is part of the Dolphin Express interconnect, which means it has an adapter installed. All nodes together constitute the cluster.

CPU architecture
    The CPU architecture relevant in this guide is characterized by the addressing width of the CPU (32 or 64 bit) and the instruction set (x86, Sparc, etc.). If these two characteristics are identical, the CPU architecture is identical for the scope of this guide.

link
    A directed point-to-point connection in the SCI interconnect. Physically, a link is the cable leading from the output of one adapter to the input of another adapter.

ringlet
    For an SCI interconnect configured in a torus topology, the links are connected as multiple closed rings. For a two-dimensional torus topology, as used with D352 adapters, these rings can be considered the columns and rows. These rings are called ringlets.

frontend
    The single computer that runs the software that monitors and controls the nodes in the cluster. For increased fault tolerance, the frontend should not be part of the Dolphin Express interconnect it controls, although this is possible. Instead, the frontend should communicate with the nodes out-of-band, which means via Ethernet.

installation machine
    The installation script is typically executed on the frontend, but can also be executed on another machine that is neither a node nor the frontend, but has network (ssh) access to all nodes and the frontend. This machine is the installation machine.

kernel build machine
    The interconnect drivers are kernel modules and thus need to be built for the exact kernel running on the nodes (otherwise, the kernel will refuse to load them). To build kernel modules on a machine, the kernel-specific include files and kernel configuration have to be installed; these are not installed by default on most distributions. You will need one kernel build machine which has these files installed (contained in the kernel-devel RPM that matches the installed kernel version) and that runs the exact same kernel version as the nodes. Typically, the kernel build machine is one of the nodes itself, but you can choose to build the kernel modules on any other machine that fulfills these requirements.

cluster
    All nodes constitute the cluster.

network manager
    A daemon process named dis_networkmgr running on the frontend. It is part of the Dolphin software stack and manages and controls the cluster using the node managers running on all nodes. The network manager knows the interconnect status of all nodes.

node manager
    A daemon process that runs on each node and provides remote access to the interconnect driver and other node status to the network manager. It reports status and performs actions like configuring the installed adapter or changing the interconnect routing table if necessary.

self-installing archive (SIA)
    A single executable shell command file (for Linux and Solaris) that is used to compile and install the Dolphin software stack in all required variants. It largely simplifies the deployment and management of a Dolphin Express-based cluster.

SCI
    Scalable Coherent Interface is one of the interconnect implementations that can be used with the Dolphin Express software, like SuperSockets and SISCI. SCI is an IEEE standard; the implementations offered by Dolphin are the D33x and D35x series of adapter cards.

SISCI
    SISCI (Software Infrastructure for SCI) is the user-level API to create applications that make direct use of the Dolphin Express interconnect capabilities. Despite its inherited name, it also supports other interconnect implementations offered by Dolphin, like DSX.
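To see whether the two management daemons described above are running on a given machine, a quick check like the following can be used. This is only a sketch: dis_networkmgr is the name given in the text, but the node manager's process name (dis_nodemgr here) is an assumption and may differ on your installation.

```shell
#!/bin/sh
# Liveness check for the Dolphin management daemons.
# NOTE: dis_nodemgr is an assumed process name - verify on your system.
check_daemons() {
  for d in dis_networkmgr dis_nodemgr; do
    if pgrep -x "$d" >/dev/null 2>&1; then
      echo "running: $d"
    else
      echo "not running: $d"
    fi
  done
}
check_daemons
```

On a node you would expect only the node manager; on the frontend only the network manager.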
1. Supported Platforms
The Dolphin Express software stack is designed to run on all current cluster hardware and software platforms, and also supports and adapts to platforms that are several years old to ensure long-term support. Generally, Dolphin strives to support every platform that can run any version of Windows, Linux or Solaris and offers a PCI (Express) slot. In addition to this general approach, we qualify certain platforms together with our partners; these platforms are then guaranteed to run the qualified application with optimal performance. We also test platforms internally and externally for general functionality and performance. For details, please see Appendix D, Platform Issues and Software Limitations.
1.1. Hardware
1.1.1. Supported Platforms
The Dolphin Express hardware (interconnect adapters) complies with the PCI industry standard (either PCI 2.2 64bit/66MHz or PCI-Express 1.0a) and will thus operate in any machine that offers compliant slots. Supported CPU architectures are x86 (32 and 64 bit), PowerPC and PowerPC64, Sparc and IA-64. However, some combinations of CPU and chipset implementations offer sub-optimal performance, which should be considered when planning a new system. A few cases are documented in which chipset bugs have shown up with our interconnect, as it puts a lot of load onto the related components. For the hardware platforms qualified or tested with Dolphin Express, please see Appendix D, Platform Issues and Software Limitations. If you have questions about your specific hardware platform, please contact Dolphin support.
Note
Half-height slots can be used with the Dolphin DXH series of adapters. A half-height version of the SCI adapters will be available soon; please contact Dolphin support for availability. The Dolphin Express interconnect is fully inter-operable between all supported hardware platforms, even with different PCI or CPU architectures. As usual, care must be taken by the applications if data with different endianness is communicated.
1.2. Software
Dolphin Express supports a variety of operating systems that are listed below.
1.2.1. Linux
The Dolphin Express software stack can be compiled for all 2.6 kernel versions and most 2.4 kernel versions. A few extra packages (such as the kernel include files and configuration) need to be installed for the compilation. Dolphin provides only source-based distributions, which are compiled for the exact kernel and hardware version you are using. Software stacks operating on different kernel versions are of course fully inter-operable for inter-node communication. Dolphin Express fully supports native 32-bit and 64-bit platforms. On 64-bit platforms offering both 32-bit and 64-bit runtime environments, SuperSockets will support 32-bit applications if the 32-bit compilation environment is also installed. Otherwise, only the native 64-bit runtime environment is supported. For more information, please refer to the FAQ chapter, Q: 2.1.6. Please refer to the release notes of the software stack version you are about to install for the current list of tested Linux distributions and kernel versions. Installation and operation on Linux distributions and kernel versions that are not in this list will usually work as well, but the most recent Linux versions in particular may cause problems if they have not yet been qualified by Dolphin.
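Since the kernel modules must be built against headers matching the running kernel exactly, it can be worth checking this up front on the kernel build machine. A minimal sketch, assuming an RPM-based distribution (on other systems the package is named differently, e.g. kernel-source):

```shell
#!/bin/sh
# Check that the installed kernel-devel package matches the running kernel.
check_kernel_headers() {
  running=$(uname -r)
  if command -v rpm >/dev/null 2>&1; then
    headers=$(rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel 2>/dev/null | head -n1)
  else
    headers=""
  fi
  if [ "$headers" = "$running" ]; then
    echo "ok: kernel-devel matches running kernel ($running)"
  else
    echo "warning: kernel-devel ('$headers') does not match running kernel ($running)"
  fi
}
check_kernel_headers
```

A mismatch here means the built modules will be refused by the kernel at load time.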
1.2.2. Windows
The Dolphin Express software stack operates on 32- and 64-bit versions of Windows NT 4.0, Windows 2000, Windows Server 2003 and Windows XP. We provide MSI binary installer packages for each Windows version. TBA: More information on the available components under Windows.
1.2.3. Solaris
Solaris 2.6 through 9 on Sparc is supported (excluding SuperSockets). MySQL Cluster can be used via the SCI Transporter provided by MySQL, using the SISCI interface provided by Dolphin.
Note
Support for Solaris 10 on Sparc and AMD64 (x86_64) including SuperSockets is under development. Ask Dolphin support for the current status.
1.2.4. Others
The Dolphin Express software stack, excluding SuperSockets, also runs on VxWorks, LynxOS and HP-UX. Contact Dolphin support with your requirements.
2. Interconnect Planning
This section discusses the decisions that are necessary when planning to install a Dolphin Express Interconnect.
Machines that serve as MySQL frontends (e.g. those running the MySQL Cluster management daemon ndb_mgmd) do not benefit from the Dolphin Express interconnect.
Note
The machine that runs the Dolphin network manager should not be equipped with the Dolphin Express interconnect, as this reduces the level of fault tolerance.
Note
For some chipsets, PCI performance does not scale well, reducing the performance improvement of a second fabric. If this feature is important to you, contact Dolphin support to make sure that you choose a chipset that delivers the full performance.
1. Overview
The initial installation of Dolphin Express hardware and software follows these steps, which are described in detail in the following sections:

1. Verification of installation requirements: study Section 2, Installation Requirements. The setup script itself will also verify that these requirements are met and indicate what is missing.

2. Installation of interconnect adapters in the nodes (see Section 3, Adapter Card Installation).

Note

The cables should not be installed in this step!

3. Installation of software and cables. This step is refined in Section 4, Software and Cable Installation.
2. Installation Requirements
For the SIA-based installation of the full cluster and the frontend, the following requirements have to be met:

Homogeneous cluster nodes: All nodes of the cluster are of the same CPU architecture and run the same kernel version. The frontend machine may be of a different CPU architecture and kernel version.
Note
The installation of the Dolphin Express software on a system that does not satisfy this requirement is described in Section 2, Installation of a Heterogeneous Cluster.

RPM support: The Linux distribution on the nodes, the frontend and the installation machine needs to support RPM packages. Both major distributions from Red Hat and Novell (SuSE) use RPM packages.
Note
On platforms that do not support RPM packages, it is also possible to install the Dolphin Express software. Please see Section 4, Unpackaged Installation for instructions.

Installed RPM packages: To build the Dolphin Express software stack, a few RPM packages that are often not installed by default are required:
qt and qt-devel (version 3.0.5 or later), glibc-devel and libgcc (32- and/or 64-bit, depending on which binary formats should be supported), rpm-build, and the kernel header files and configuration (typically a kernel-devel or kernel-source RPM that exactly(!) matches the version of the installed kernel)
Initial Installation
Note
The SIA will check for these packages, report which packages are missing, and offer to install them if the yum RPM management system is supported on the affected machine. All required RPM packages are within the standard set of RPM packages offered for your Linux distribution, but may not be installed by default. If the qt RPMs are not available, the Dolphin Express software stack can still be built, but the GUI applications to configure and manage the cluster will not be available. Please see Section 2.2, Non-GUI Installation on how to install the software stack in this case.

GUI support: For the initial installation, the installation machine should be able to run GUI applications via X.
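The prerequisite check the SIA performs can also be done by hand before starting. A minimal sketch, assuming an RPM-based system and using the package names from the list above (the function name is ours):

```shell
#!/bin/sh
# Check for the build prerequisites listed above on an RPM-based system.
check_build_prereqs() {
  missing=0
  for pkg in qt qt-devel glibc-devel libgcc rpm-build; do
    if command -v rpm >/dev/null 2>&1 && rpm -q "$pkg" >/dev/null 2>&1; then
      echo "found: $pkg"
    else
      echo "missing (or rpm unavailable): $pkg"
      missing=$((missing + 1))
    fi
  done
  echo "$missing package(s) left to install"
}
check_build_prereqs
```

On yum-based systems, any reported packages can then be installed with yum before re-running the SIA.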
Note
If the required configuration files are already available prior to the installation, a GUI is not required (see Section 2.2, Non-GUI Installation).

Disk space: To build the RPM packages, about 500 MB of free disk space in the system's temporary directory (typically /tmp on Linux) is required on the kernel build machine and the frontend.
Note
It is possible to instruct the SIA to use a specific temporary directory for building, using the --build-root option.
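A quick way to check the available space before starting the build is df. This sketch assumes a POSIX `df -P` (1K blocks) and treats the 500MB figure from above as the threshold:

```shell
#!/bin/sh
# Sketch: check free space in the build directory before running the SIA.
# BUILD_ROOT mirrors the directory you would pass via --build-root.
BUILD_ROOT=${BUILD_ROOT:-/tmp}
avail_kb=$(df -P "$BUILD_ROOT" | awk 'NR==2 {print $4}')
if [ "$avail_kb" -lt 512000 ]; then
    echo "only $((avail_kb / 1024))MB free in $BUILD_ROOT; consider --build-root"
else
    echo "enough space in $BUILD_ROOT for the RPM build"
fi
```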
In this scenario, no GUI application is run at all during the installation. To create the configuration files on another machine, you can either run the SIA with the --install-editor option if it is a Linux machine, or install a binary version of the dishostseditor if it is a Windows-based machine. Alternatively, you can send the necessary information to Dolphin support, which will then provide you with the matching configuration files and the cabling instructions. This information includes:

- external hostnames (or IP addresses) of all nodes
- adapter type and number of fabrics (1 or 2)
- hostnames (or IP addresses/subnet) which should be accelerated with SuperSockets (default is the list of hostnames provided above)
- planned interconnect topology (default is derived from the number of nodes and adapter type)
- a description of how the nodes are physically located (to avoid cabling problems)
Note
If you know how to connect the cables, you can do so now. However, please inspect and verify the cabling for correctness as described in the remainder of this chapter.
4.1. Overview
The integrated cluster and frontend installation is the default operation of SIA, but can be specified explicitly with the --install-all option. It works as follows:
1. The SIA is executed on the installation machine with root permissions. The installation machine is typically the machine to serve as frontend, but can be any other machine if necessary (see Section 2.2.1, No X / GUI on Frontend).
2. The SIA controls the building, installation and test operations on the remote nodes via ssh. Therefore, password-less ssh to all remote nodes is required. If password-less ssh access is not set up between the installation machine, frontend and nodes, SIA offers to set this up during the installation. The root passwords for all machines are required for this.
3. The binary RPMs for the nodes and the frontend are built on the kernel build machine and the frontend, respectively. The kernel build machine needs to have the kernel headers and configuration installed, while the frontend and the installation machine only compile user-space applications.
4. The node RPMs with the kernel modules are installed on all nodes, the kernel modules are loaded and the node manager is started. At this stage, the interconnect is not yet configured.
5. On an initial installation, the dishostseditor is installed and executed on the installation machine to create the cluster configuration files. This requires user interaction.
6. The cluster configuration files are transferred to the frontend, and the network manager is installed and started on the frontend. It will in turn configure all nodes according to the configuration files. The cluster is now ready to utilize the Dolphin Express interconnect.
7. A number of tests are executed to verify that the cluster is functional and to get basic performance numbers.

For other operation modes, such as installing specific components on the local machine only, please refer to Appendix A, Self-Installing Archive (SIA) Reference.
The script will ask questions to retrieve information for the installation. You will notice that all questions are Yes/no questions, and that the default answer is marked by a capital letter, which can be chosen by just pressing Enter. A typical installation looks like this:
[root@scimple tmp]# sh DIS_install_3.3.0.sh Verifying archive integrity... All good. Uncompressing Dolphin DIS 3.3.0 #* Logfile is /tmp/DIS_install.log_140 on tiger-0 #* #+ Dolphin ICS - Software installation (version: 1.52 $ of: 2007/11/09 16:31:32 $) #+ #* Installing a full cluster (nodes and frontend) . #* This script will install Dolphin Express drivers, tools and services #+ on all nodes of the cluster and on the frontend node. #+ #+ All available options of this script are shown with option '--help' # >>> OK to proceed with cluster installation? [Y/n]y # >>> Will the local machine <tiger-0> serve as frontend? [Y/n]y
The default choice is to use the local machine as frontend. If you answer n, the installer will ask you for the hostname of the designated frontend machine. Each cluster needs its own frontend machine. Please note that the complete installation is logged to a file which is shown at the very top (here: /tmp/DIS_install.log_140). In case of installation problems, this file is very useful to Dolphin support.
#* NOTE: Cluster configuration files can be specified now, or be generated #+ ..... during the installation. # >>> Do you have a 'dishosts.conf' file that you want to use for installation? [y/N]n
Because this is the initial installation, no installed configuration files could be found. If you have prepared or received configuration files, they can be specified now by answering y. In this case, no GUI application needs to run during the installation, allowing for a shell-only installation. For the default answer, the hostnames of the nodes need to be specified (see below), and the cluster configuration is created later on using the GUI application dishostseditor.
#* NOTE: #+ No cluster configuration file (dishosts.conf) available. #+ You can now specify the nodes that are attached to the Dolphin #+ Express interconnect. The necessary configuration files can then #+ be created based on this list of nodes. #+ #+ Please enter hostname or IP addresses of the nodes one per line. #* When done, enter a single colon ('.'). #+ (proposed hostname is given in [brackets]) # >>> node hostname/IP address <colon '.' when done> []tiger-1 # >>> node hostname/IP address <colon '.' when done> [tiger-2] -> tiger-2 # >>> node hostname/IP address <colon '.' when done> [tiger-3] -> tiger-3 # >>> node hostname/IP address <colon '.' when done> [tiger-4] -> tiger-4 # >>> node hostname/IP address <colon '.' when done> [tiger-5] -> tiger-5 # >>> node hostname/IP address <colon '.' when done> [tiger-6] -> tiger-6 # >>> node hostname/IP address <colon '.' when done> [tiger-7] -> tiger-7 # >>> node hostname/IP address <colon '.' when done> [tiger-8] -> tiger-8 # >>> node hostname/IP address <colon '.' when done> [tiger-9] -> tiger-9 # >>> node hostname/IP address <colon '.' when done> [tiger-10].
The hostnames or IP addresses of all nodes need to be entered. Where possible, the installer suggests a hostname in [brackets]; to accept a suggestion, just press Enter. Otherwise, enter the hostname or IP address. Each entry is verified to represent an accessible hostname. If a node has multiple IP addresses / hostnames, make sure you specify the one that is visible to the installation machine and the frontend. When all hostnames are entered, enter a single '.' to finish.
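Since the installer rejects entries that do not resolve, you can pre-check your node list before starting. A minimal sketch (the tiger-* names are the example hostnames from the transcript; getent is assumed available, as on any glibc-based distribution):

```shell
#!/bin/sh
# Sketch: pre-check that every planned hostname resolves from the
# installation machine. Substitute your own node names for the examples.
NODES="tiger-1 tiger-2 tiger-3"
unresolved=""
for n in $NODES; do
    getent hosts "$n" >/dev/null 2>&1 || unresolved="$unresolved $n"
done
if [ -z "$unresolved" ]; then
    echo "all nodes resolve"
else
    echo "fix before installing:$unresolved"
fi
```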
#* NOTE: #+ The kernel modules need to be built on a machine with the same kernel #* version and architecture of the interconnect node. By default, the first #* given interconnect node is used for this. You can specify another build #* machine now. # >>> Build kernel modules on node tiger-1 ? [Y/n]y
If you answer n at this point, you can enter the hostname of another machine on which the kernel modules are built. Make sure it matches the nodes for CPU architecture and kernel version.
# >>> Can you access all machines (local and remote) via password-less ssh? [Y/n]y
The installer will later on verify that the password-less ssh access actually works. If you answer n, the installer will set up password-less ssh for you on all nodes and the frontend. You will need to enter the root password once for each machine. The password-less ssh access remains active after the installation. To disable it again, remove the file /root/.ssh/authorized_keys from all nodes and the frontend.
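For reference, the manual equivalent of what the installer automates looks roughly like the following dry-run sketch (hostnames are examples; the commands are printed rather than executed, so you can review them first and pipe the output to sh to run them):

```shell
#!/bin/sh
# Dry-run sketch: one key pair on the installation machine, public key
# copied to every node via ssh-copy-id (asks for each root password once).
NODES="tiger-0 tiger-1 tiger-2"
plan="ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"
for n in $NODES; do
    plan="$plan
ssh-copy-id root@$n"
done
echo "$plan"
```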
#* NOTE:
#+ It is recommended that interconnect nodes are rebooted after the #+ initial driver installation to ensure that large memory allocations will succeed. #+ You can omit this reboot, or do it anytime later if necessary. # >>> Reboot all interconnect nodes (tiger-1 tiger-2 tiger-3 tiger-4 tiger-5 tiger-6 tiger-7 tiger-8 tiger
For optimal performance, the low-level driver needs to allocate some amount of kernel memory. This allocation can fail on a system that has been under load for a long time. If you are not installing on a live system, rebooting the nodes is therefore offered here. You can perform the reboot manually later on to achieve the same effect. If chosen, the reboot will be performed by the installer without interrupting the installation procedure.
#* NOTE: #+ About to INSTALL Dolphin Express interconnect drivers on these nodes: ... tiger-1 ... tiger-2 ... tiger-3 ... tiger-4 ... tiger-5 ... tiger-6 ... tiger-7 ... tiger-8 ... tiger-9 #+ About to BUILD Dolphin Express interconnect drivers on this node: ... tiger-1 #+ About to install management and control services on the frontend machine: ... tiger-0 #* Installing to default target path /opt/DIS on all machines .. (or the current installation path if this is an update installation). # >>> OK to proceed? [Y/n]y
The installer presents an installation summary and asks for confirmation. If you answer n at this point, the installer will exit and the installation needs to be restarted.
#* NOTE: #+ Testing ssh-access to all cluster nodes and gathering configuration. #+ #+ If you are asked for a password, the ssh access to this node without #+ password is not working. In this case, you need to interrupt with CTRL-c #+ and restart the script answering 'no' to the intial question about ssh. ... testing ssh to tiger-1 ... testing ssh to tiger-2 ... testing ssh to tiger-3 ... testing ssh to tiger-4 ... testing ssh to tiger-5 ... testing ssh to tiger-6 ... testing ssh to tiger-7 ... testing ssh to tiger-8 ... testing ssh to tiger-9 #+ OK: ssh access is working #+ OK: nodes are homogenous #* OK: found 1 interconnect fabric(s). #* Testing ssh to other nodes ... testing ssh to tiger-1 ... testing ssh to tiger-0 ... testing ssh to tiger-0 #* OK.
The ssh access is tested, and some basic information is gathered from the nodes to verify that the nodes are homogeneous, are equipped with at least one Dolphin Express adapter, and meet the other requirements. If a required RPM package were missing, it would be indicated here with the option to install it (if yum can be used), or to fix the problem manually and retry. If the test for homogeneous nodes fails, please refer to Section 2, Installation of a Heterogeneous Cluster for information on how to install the software stack.
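The homogeneity check boils down to all nodes reporting the same kernel version and architecture. A self-contained sketch of that comparison (the sample `uname -rm` output is an assumed placeholder; on a real cluster you would gather it via ssh, as hinted in the last comment):

```shell
#!/bin/sh
# Sketch of the homogeneity test: every node must report the same
# kernel version and architecture; count the distinct combinations.
collected="2.6.18-8.el5 x86_64
2.6.18-8.el5 x86_64
2.6.18-8.el5 x86_64"
distinct=$(printf '%s\n' "$collected" | sort -u | wc -l | tr -d ' ')
if [ "$distinct" -eq 1 ]; then
    echo "nodes are homogeneous"
else
    echo "heterogeneous: $distinct kernel/arch combinations"
fi
# Real cluster:  for n in $NODES; do ssh "$n" uname -rm; done | sort -u
```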
#* Building node RPM packages on tiger-1 in /tmp/tmp.AEgiO27908
#+ This will take some minutes... #* Logfile is /tmp/DIS_install.log_983 on tiger-1 #* OK, node RPMs have been built. #* Building frontend RPM packages on scimple in /tmp/tmp.dQdwS17511 #+ This will take some minutes... #* Logfile is /tmp/DIS_install.log_607 on scimple #* OK, frontend RPMs have been built. #* Copying RPMs that have been built: /tmp/frontend_RPMS/Dolphin-NetworkAdmin-3.3.0-1.x86_64.rpm /tmp/frontend_RPMS/Dolphin-NetworkHosts-3.3.0-1.x86_64.rpm /tmp/frontend_RPMS/Dolphin-SISCI-devel-3.3.0-1.x86_64.rpm /tmp/frontend_RPMS/Dolphin-NetworkManager-3.3.0-1.x86_64.rpm /tmp/node_RPMS/Dolphin-SISCI-3.3.0-1.x86_64.rpm /tmp/node_RPMS/Dolphin-SISCI-devel-3.3.0-1.x86_64.rpm /tmp/node_RPMS/Dolphin-SCI-3.3.0-1.x86_64.rpm /tmp/node_RPMS/Dolphin-SuperSockets-3.3.0-1.x86_64.rpm
The binary RPM packages matching the nodes and frontend are built and copied to the directory from where the installer was invoked. They are placed into the subdirectories node_RPMS and frontend_RPMS for later use (see the SIA option --use-rpms).
#* To install/update the Dolphin Express services like SuperSockets, all running #+ Dolphin Express services needs to be stopped. This requires that all user #+ applications using SuperSockets (if any) need to be stopped NOW. # >>> Stop all DolpinExpress services (SuperSockets) NOW? [Y/n]y #* OK: all Dolphin Express services (if any) stopped for upgrade.
On an initial installation, there will be no user applications using SuperSockets yet, so you can safely answer y right away.
#* Installing node tiger-1
#* OK.
#* Installing node tiger-2
#* OK.
#* Installing node tiger-3
#* OK.
#* Installing node tiger-4
#* OK.
#* Installing node tiger-5
#* OK.
#* Installing node tiger-6
#* OK.
#* Installing node tiger-7
#* OK.
#* Installing node tiger-8
#* OK.
#* Installing node tiger-9
#* OK.
#* Installing machine scimple as frontend.
#* NOTE:
#+ You need to create the cluster configuration files 'dishosts.conf'
#+ and 'networkmanager.conf' using the graphical tool 'dishostseditor'
#+ which will be launched now.
#+ If the interconnect cables are not yet installed, you can create
#+ detailed cabling instruction within this tool
#+ (File -> Get Cabling Instructions). Then install the cables while
#+ this script is waiting.
# >>> Are all cables connected, and do all LEDs on the SCI adapters light green? [Y/n]
The nodes get installed, and the drivers and the node manager are started. Then the basic packages are installed on the frontend, and the dishostseditor application is launched to create the required configuration files /etc/dis/dishosts.conf and /etc/dis/networkmanager.conf if they do not already exist. The script will wait at this
point until the configuration files have been created with dishostseditor, and until you confirm that all cables have been connected according to the cabling instructions. This is described in the next section. For typical problems at this point of the installation, please refer to Chapter 9, FAQ.
4.3.1.1. Topology

The dialog will let you enter the selected topology information (number of nodes in the X-, Y- and Z-dimensions) according to the topology type you selected. The total number of nodes needs to be equal to the product of the nodes in all dimensions (for regular topologies) or less (for irregular topology variants). The number of fabrics needs to be set to the minimum number of adapters in any node. The topology settings should already be correct by default if dishostseditor is launched by the installation script. If the cables are not yet mounted (which is the recommended way of doing it), simply choose the settings that match the way you plan to install. However, if the cables are already in place, it is critical to verify that the actual cable installation matches the dimensions shown here if you install a cluster with a 2D- or 3D-torus interconnect topology: a 12-node cluster
can be set up as 3 by 4, 4 by 3, or even 2 by 6, and the setup script cannot verify that the cabling matches the dimensions you selected. Remember that link 0 on the adapter boards (the one where the plug is right on the PCB of the adapter board) is mapped to the X-dimension, and link 1 (the one where the plug is on the piggy-back board) is mapped to the Y-dimension.

4.3.1.2. SuperSockets Network Address

If your cluster operates within its own subnet and you want all nodes within this subnet to use SuperSockets (having Dolphin Express installed), you can simplify the configuration by specifying the address of this subnet in this dialog. To do so, activate the Network Address field and enter the cluster IP subnet address including the mask. E.g., if all your nodes communicate via an IP interface with an address of the form 192.168.4.*, you would enter 192.168.4.0/24 here. If the cluster has its own subnet, this option is recommended. SuperSockets will try to use Dolphin Express whenever a node in this subnet connects to another node of this subnet. If using Dolphin Express is not possible, e.g. because one or both nodes are only equipped with an Ethernet interface, SuperSockets will automatically fall back to Ethernet. Also, if a node gets assigned a new IP address within this subnet, you don't need to change the SuperSockets configuration. Assigning more than one subnet to SuperSockets is also possible, but this type of configuration is not yet supported by dishostseditor; see Section 1.1, dishosts.conf on how to edit dishosts.conf accordingly. If this type of configuration is not possible in your environment, you need to configure SuperSockets for each node as described in the following section.

4.3.1.3. Status Notification

In case you want to be informed of any change of the interconnect status (e.g. an interconnect link was disabled due to errors, or a node has gone down and the interconnect traffic was rerouted), activate the checkbox Alert target and enter the alert target and the alert script to be executed. The default alert script is alert.sh and will send an e-mail to the address specified as alert target. Other alert scripts can be created and used, which may require another type of alert target (e.g. a cell phone number to send an SMS). For more information on using status notification, please refer to Section 1, Notification on Interconnect Status Changes.
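The subnet-based acceleration decision described in the Network Address section is plain prefix matching. A hedged sketch in shell (example addresses from the text, using the /24 prefix that corresponds to a 192.168.4.* network):

```shell
#!/bin/sh
# Sketch: how a network address plus prefix length decides whether a
# peer is eligible for SuperSockets acceleration.
ip_to_int() {
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}
in_subnet() {  # in_subnet <ip> <network> <prefixlen>
    ip=$(ip_to_int "$1"); net=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}
in_subnet 192.168.4.17 192.168.4.0 24 && echo "use SuperSockets"
in_subnet 192.168.5.17 192.168.4.0 24 || echo "fall back to Ethernet"
```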
At this point, you need to arrange the nodes (marked by their hostnames) such that the placement of each node in the torus as shown by dishostseditor matches its placement in the physical torus. You do this by assigning the correct hostname for each node by double-clicking its node icon which will open the configuration dialog of this node. In this dialog, select the correct machine name, which is the hostname as seen from the frontend, from the drop-down list. You can also type a hostname if a hostname that you specified during the installation was wrong.
After you have assigned the correct hostname to this machine, you may need to configure SuperSockets on this node. If you selected the Network Address in the cluster configuration dialog (see above), then SuperSockets will use this subnet address and will not allow editing this property on the nodes. Otherwise, you can choose between 3 different options for each of the currently supported 2 SuperSockets-accelerated IP interfaces per node:

disable
Do not use SuperSockets. If you set this option for both fields, SuperSockets cannot be used with this node, although the related kernel modules will still be loaded.

static
Enter the hostname or IP address for which SuperSockets should be used. This hostname or IP address will be statically assigned to this physical node (its Dolphin Express interconnect adapter). Choosing a static socket means that the mapping between the node (its adapters) and the specified hostname/IP address is static and is specified within the configuration file dishosts.conf. All nodes use this identical file (which is automatically distributed from the frontend to the nodes by the network manager) to perform this mapping. This option works fine if the nodes in your cluster don't change their IP addresses over time.

dynamic
Enter the hostname or IP address for which SuperSockets should be used. This hostname or IP address will be dynamically resolved to the Dolphin Express interconnect adapter that is installed in the machine with this hostname/IP address. SuperSockets will therefore resolve the mapping between adapters and hostnames/IP addresses dynamically. This incurs a certain overhead when the first connection is set up, but as the mapping is cached, this is hardly relevant. This option is similar to using a subnet, but resolves only the explicitly specified IP addresses and not all IP addresses of a subnet. Use this option if nodes change their IP addresses, or if node identities move between physical machines, e.g. in a fail-over setup.
Note
In order to achieve a trouble-free operation of your cluster, setting up the cables correctly is critical. Please take your time to perform this task properly. The cables can be installed while nodes are powered up. The setup script will wait with a question for you to continue:
# >>> Are all cables connected, and do all LEDs on the SCI adapters light green? [Y/n]
Please consider the hints below when connecting the cables:

Never apply force: The plugs of the cable will move into the sockets easily. Make sure the orientation is correct. The cables have a minimum bend diameter of 5cm.
Note
This specification applies to black All-Best cables (part number D706), but not to the grey CHH cables (part number D707). With the CHH cables, the minimum bend diameter is 10cm.

Fasten evenly: When fastening the screws of the plugs, make sure you fasten both lightly before tightening them. Do not tighten one screw of the plug fully and then the other one, as this is likely to tilt the plug within the connector.

Fasten gently: Use a torque screwdriver if possible, and apply a maximum of 0.4 Nm. As a rule of thumb: do not apply more torque with the screwdriver than you could using only your fingers (if there was enough space to grip the screw).

Observe LEDs: When an adapter has both input and output of a link connected to its neighboring adapter, the LED should turn green and emit a steady light (not blinking).

Don't mix up links: When using a 2D-torus topology, it is important not to connect link 0 of one adapter with link 1 of another adapter. As described above, link 0 is the left pair of connectors on the Dolphin Express SCI interconnect adapter when the adapter is placed in a vertical position. To determine the left side, hold the adapter in a vertical position: the blue "O" (indicating the OUT port) should be located at the top, the LEDs are also placed on the top of the adapter, and the PCI/PCI-X/PCI-Express bus connector is on the lower side of the adapter. The left pair of connectors in this orientation is what we refer to as link 0; link 1 is the right pair of connectors.
Note
If the links have been mixed up, the LED will still turn green, but packet routing will fail. The cabling test of sciadmin will reveal such cabling errors.
Important
A green link LED indicates that the link between the output plug and input plug could be established and synchronized. It does not assure that the cable is actually placed correctly! It is therefore important to verify once more that the cables are plugged according to the cabling instructions generated by the dishostseditor!

If a pair of LEDs does not turn green, please perform the following steps:
1. Disconnect the cables. Make sure you connect an Output with an Input plug. Re-insert and fasten the plugs according to the guidelines above.
2. If the LEDs still do not turn green, use a different cable.
3. If the LEDs still do not turn green, swap the cable of the problematic connection with a working one and observe if the problem moves with the cable.
4. Power-cycle the nodes with the orange LEDs according to Q: 1.1.1.
5. Contact Dolphin support if you cannot make the LEDs turn green after trying all proposed measures.
When you are done connecting the cables, all LEDs have turned green, and you have verified the connections, you can answer "Yes" to the question "Are all cables connected, and do all LEDs on the SCI adapters light green?" and proceed with the next section to finalize the software installation.
#* Installing remaining frontend packages #* NOTE: #+ To compile SISCI applications (like NMPI), the SISCI-devel RPM needs to be #+ installed. It is located in the frontend_RPMS and node_RPMS directories. #* OK.
If no problems are reported (like in the example above), you are done with the installation and can start to use your Dolphin Express accelerated cluster. Otherwise, refer to the next subsections and Section 4.7, Interconnect Validation with sciadmin to learn about the individual tests and how to fix problems reported by each test.
If this test reports errors or warnings, you are offered to re-run dishostseditor to validate and possibly fix the interconnect configuration. If the problems persist, you should let the installer continue and analyse the problems using sciadmin after the installation finishes (see Section 4.7, Interconnect Validation with sciadmin).
Success in this test means that the SuperSockets service dis_supersockets is running and is configured identically on all nodes. If a failure is reported, it means that the interconnect configuration did not propagate correctly to this node. You should check whether the dis_nodemgr service is running on this node. If not, start it, wait for a minute, and then configure SuperSockets by calling dis_ssocks_cfg.
The SuperSockets latency is rated based on our platform validation experience. If the rating indicates that SuperSockets are not performing as expected, or if it shows that a fallback to Ethernet has occurred, please contact Dolphin Support. In this case, it is important that you supply the installation log (see above). The installation finishes with the option to start the administration GUI tool sciadmin, a hint to use LD_PRELOAD to make use of SuperSockets and a pointer to the binary RPMs that have been used for the installation.
#* OK: Cluster installation completed.
#+ Remember to use LD_PRELOAD=libksupersockets.so for all applications that
#+ should use Dolphin Express SuperSockets.
# >>> Do you want to start the GUI tool for interconnect administration (sciadmin)? [y/N]n
#* RPM packages that were used for installation are stored in
#+ /tmp/node_RPMS and /tmp/frontend_RPMS.
cluster and allows you to perform detailed status queries. It also provides means to manually control the interconnect, inspect and set options, and perform interconnect tests. For a complete description of sciadmin, please refer to Appendix B, sciadmin Reference. Here, we will only describe how to use sciadmin to verify the newly installed Dolphin Express interconnect.
should tell you that the node manager is running. If this is not the case: a. Try to start the node manager: On Red Hat:
# service dis_nodemgr start
should tell you that the node manager has started successfully. b. If the node manager fails to start, please see /var/log/dis_nodemgr.log
c. Make sure that the service is configured to start in the correct runlevels (the Dolphin installation makes sure this is the case). On Red Hat:
# chkconfig --level 2345 dis_nodemgr on
On other Linux variants, please refer to the system documentation to determine the required steps.
Warning
Running this test will stop the normal traffic over the interconnect as the routing needs to be changed. If you run this test while your cluster is in production, you might experience communication timeouts. SuperSockets in operation will fall back to Ethernet during this test, which also leads to increased communication delays.

If the test detects a problem, it will inform you that node A cannot communicate with node B although they are supposed to be within the same ringlet. You will typically get more than one error message in case of a cabling problem, as such a problem in most cases affects more than one pair of nodes. Please proceed as follows:

1. Try to fix the first reported problem by tracing the cable connections from node A to node B:
   a. Verify that the cable connections are placed within one ringlet:
      i. Look up the path of cable connections between node A and node B in the Cabling Instructions that you created (or still can create at this point) using dishostseditor.
      ii. When you arrive at node B, do the same check for the path back from node B to node A.
   b. Along the path, make sure:
      i. that each cable plug is securely fitted into the socket of the adapter;
      ii. that each cable plug is connected to the right link (0 or 1) as indicated by the cabling instructions.
2. If you can't find a problem for the first problem reported, verify the cable connections for all following pairs of nodes reported bad.
3. After the first change, re-run the cable test to verify whether this change solves all problems. If this is not the case, start over with this verification loop.
Warning
Running this test will stop the normal traffic over the interconnect as the routing needs to be changed. If you run this test while your cluster is in production, you might experience communication timeouts.
SuperSockets in operation will fall back to a second fabric (if installed) or to Ethernet during this test, which also leads to increased communication delays. This test will run for a few minutes, depending on the size of your cluster, as it tests communication for about 20 seconds between each pair of nodes within the same ring. This means, for a 4 by 4 2D-torus cluster which features 8 rings with 4 nodes each, it will take 8 * ( 3 + 2 +1) * 20 seconds = 16 minutes. It will then report if any CRC errors or other problems have occurred between any pairs of nodes.
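The quoted runtime follows from simple arithmetic over the node pairs within each ring:

```shell
#!/bin/sh
# Worked example: fabric test runtime for a 4 by 4 2D torus
# (8 rings of 4 nodes, ~20 seconds per pair within a ring).
rings=8
nodes_per_ring=4
pairs_per_ring=$(( nodes_per_ring * (nodes_per_ring - 1) / 2 ))  # 3+2+1 = 6
total_s=$(( rings * pairs_per_ring * 20 ))
echo "$(( total_s / 60 )) minutes"     # 960 seconds = 16 minutes
```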
Note
Any communication errors reported here are either corrected automatically by retrying a data transfer (like for CRC errors), or are reported. Thus, a communication error does not mean data might get lost. However, every communication error reduces the performance, and an optimally set up Dolphin Express interconnect should not show any communication errors. A small number of communication errors is acceptable, though. Please contact Dolphin support if in doubt.

If the test reports communication errors, please proceed as follows:

1. If errors are reported between multiple pairs of nodes, locate the pair of nodes which is located most closely (has the smallest number of cable connections between them). Normally, if any errors are reported, a pair of nodes located next to each other will show up.
2. Check the cable connection on the shortest path between these two nodes (a single cable, if the nodes are located next to each other) for being properly mounted:
   a. No excessive stress on the cable, like bending it too sharply or too much force on the plugs.
   b. Cable plugs need to be placed in the connectors on the adapters evenly (not tilted) and securely fastened. If in doubt, unplug the cable and re-fasten it.
3. Perform the previous check for all other node pairs; then re-run the test.
4. If communication errors persist, change cables to locate a possibly damaged cable:
   a. Exchange the cables between the closest pair of nodes one-by-one with a cable of a connection for which no errors have been reported. Remember (note down) which cables you exchanged.
   b. Run the Fabric Quality Test after each cable exchange.
      i. If the communication errors move with the cable you just exchanged, then this cable might be damaged. Please contact your sales representative for an exchange.
      ii. If the communication errors remain unchanged, the problem might be with one of the adapters. Please contact Dolphin support for further analysis.
4.8.1.1. Launch via Wrapper Script

To let generic socket applications use SuperSockets, you just need to run them via the wrapper script dis_ssocks_run, which sets the LD_PRELOAD environment variable accordingly. This script is installed to the bin directory of the installation (default is /opt/DIS/bin), which is added to the default PATH environment variable. To have, for example, the socket benchmark netperf run via SuperSockets, start the server process on node server_name like
dis_ssocks_run netperf
and the client process on any other node in the cluster like
dis_ssocks_run netperf -h server_name
4.8.1.2. Launch with LD_PRELOAD

As an alternative to using this wrapper script, you can also set LD_PRELOAD yourself to preload the SuperSockets library, e.g. for sh-style shells such as bash:
export LD_PRELOAD=libksupersockets.so
4.8.1.3. Troubleshooting

If the applications you are using do not show increased performance, please verify that they use SuperSockets as follows: 1. To verify that the preloading works, use the ldd command on any executable, e.g. the netperf binary mentioned above:
$ export LD_PRELOAD=libksupersockets.so
$ ldd netperf
        libksupersockets.so => /opt/DIS/lib64/libksupersockets.so (0x0000002a95577000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x00000033ed300000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x00000033ec800000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00000033ecb00000)
        /lib64/ld-linux-x86-64.so.2 (0x00000033ec600000)
The library libksupersockets.so has to be listed in the top position. If this is not the case, make sure the library file actually exists. The default locations are /opt/DIS/lib/libksupersockets.so and /opt/DIS/lib64/libksupersockets.so on 64-bit platforms, and libksupersockets.so actually is a symbolic link to a library with the same name and a version suffix:
$ ls -lR /opt/DIS/lib*/*ksupersockets*
-rw-r--r-- 1 root root 29498 Nov 14 12:43 /opt/DIS/lib64/libksupersockets.a
-rw-r--r-- 1 root root   901 Nov 14 12:43 /opt/DIS/lib64/libksupersockets.la
lrwxrwxrwx 1 root root    25 Nov 14 12:50 /opt/DIS/lib64/libksupersockets.so -> libksupersockets.so.3.3.0
lrwxrwxrwx 1 root root    25 Nov 14 12:50 /opt/DIS/lib64/libksupersockets.so.3 -> libksupersockets.so.3.3.0
-rw-r--r-- 1 root root 65160 Nov 14 12:43 /opt/DIS/lib64/libksupersockets.so.3.3.0
-rw-r--r-- 1 root root 19746 Nov 14 12:43 /opt/DIS/lib/libksupersockets.a
-rw-r--r-- 1 root root   899 Nov 14 12:43 /opt/DIS/lib/libksupersockets.la
lrwxrwxrwx 1 root root    25 Nov 14 12:50 /opt/DIS/lib/libksupersockets.so -> libksupersockets.so.3.3.0
lrwxrwxrwx 1 root root    25 Nov 14 12:50 /opt/DIS/lib/libksupersockets.so.3 -> libksupersockets.so.3.3.0
-rw-r--r-- 1 root root 48731 Nov 14 12:43 /opt/DIS/lib/libksupersockets.so.3.3.0
Also, make sure that the dynamic linker is configured to find the library in this place. The dynamic linker is configured accordingly on installation of the RPM; if you did not install via RPM, you need to configure the dynamic linker manually. To verify whether dynamic linking is the problem, set LD_LIBRARY_PATH to include the path to libksupersockets.so and verify again with ldd:
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/DIS/lib:/opt/DIS/lib64
$ echo $LD_PRELOAD
libksupersockets.so
$ ldd netperf
....
A better solution than setting LD_LIBRARY_PATH is to configure the dynamic linker ld to include these directories in its search path. Use man ldconfig to learn how to achieve this.
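Configuring the dynamic linker permanently is typically done via a drop-in configuration file; a sketch assuming the default installation paths from above (the file name dis.conf is our choice, and root privileges are required):

```shell
# Add the Dolphin library directories to the dynamic linker's search
# path, then rebuild the linker cache so ld.so can find the libraries.
printf '/opt/DIS/lib\n/opt/DIS/lib64\n' > /etc/ld.so.conf.d/dis.conf
ldconfig
```

After this, LD_LIBRARY_PATH is no longer needed for libksupersockets.so; `ldconfig -p` can be used to confirm the library is in the cache.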
2. Make sure that the preloading of the SuperSockets library described above is effective on both nodes, for both applications that should communicate via SuperSockets.

3. Make sure that the SuperSockets kernel module (and the kernel modules it depends on) is loaded and configured correctly on both nodes.

   1. Check the status of all Dolphin kernel modules via the dis_services script (default location /opt/DIS/sbin):

# dis_services status
Dolphin IRM 3.3.0 ( November 13th 2007 ) is running.
Dolphin Node Manager is running (pid 3172).
Dolphin SISCI 3.3.0 ( November 13th 2007 ) is running.
Dolphin SuperSockets 3.3.0 "St.Martin", Nov 7th 2007 (built Nov 14 2007) running.

      At least the services dis_irm and dis_supersockets need to be running, and you should not see a message about SuperSockets not being configured.

   2. Verify the configuration of SuperSockets to make sure that all cluster nodes will connect and communicate via SuperSockets. The active configuration is shown in /proc/net/af_sci/socket_maps:
# cat /proc/net/af_sci/socket_maps
IP/net          Adapter   NodeId  List
-----------------------------------------------
172.16.5.1/32   0x0000     4  0  0
172.16.5.2/32   0x0000     8  0  0
172.16.5.3/32   0x0000    68  0  0
172.16.5.4/32   0x0000    72  0  0
Depending on the configuration variant you used to set up SuperSockets, the content of this file may look different, but it must never be empty and should be identical on all nodes. The example above shows a four-node cluster with a single fabric and a static SuperSockets configuration, which will accelerate one socket interface per node. For more information on the configuration of SuperSockets, please refer to Section 1.1, dishosts.conf.

   3. Make sure that the host names/IP addresses used effectively by the application are the ones that are configured for SuperSockets, especially if the nodes have multiple Ethernet interfaces configured.
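The two conditions above (the file must not be empty, and it should be identical on all nodes) can be checked mechanically; a minimal sketch, where check_socket_maps is our own helper name and the dump files would first be collected from each node (e.g. `ssh nodeN cat /proc/net/af_sci/socket_maps > nodeN.map`):

```shell
# check_socket_maps FILE1 [FILE2 ...]
# Succeed only if every socket-map dump is non-empty and identical
# to the first one.
check_socket_maps() {
    ref="$1"
    [ -s "$ref" ] || return 1    # empty map means SuperSockets is unconfigured
    shift
    for f in "$@"; do
        [ -s "$f" ] && cmp -s "$ref" "$f" || return 1
    done
}
```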
4. Check the system log for messages from the SuperSockets kernel module. It will report all problems, e.g. when it runs out of resources:
# cat /var/log/messages | grep dis_ssocks
It is a good idea to monitor the system log while you try to connect to a remote node if you suspect problems being reported there:
# tail -f /var/log/messages
For an explanation of typical error messages, please refer to Section 2, Software.

5. Don't forget to check whether the port numbers used by this application, or the application itself, have been explicitly excluded from using SuperSockets. By default, only the system port numbers below 1024 are excluded from using SuperSockets, but you should verify the current configuration (see Section 2, SuperSockets Configuration).

6. If you can't solve the problem, please contact Dolphin Support.
Note
The SISCI library is only available in the native bit width of a machine. This implies that on 64-bit machines, only 64-bit SISCI applications can be created and executed, as there is no 32-bit version of the SISCI library on 64-bit machines. To compile and link SISCI applications like the MPI implementation NMPI, the SISCI-devel RPM needs to be installed on the respective machine. This RPM is built during installation and placed in the node_RPMS and frontend_RPMS directory, respectively.
1. Complete Update
Opposed to the initial installation, the update installation can be performed in a fully automatic manner without manual intervention. Therefore, this convenient update method is recommended if you can afford some downtime of the whole cluster. Typically, the update of a 16-node cluster takes about 30 minutes.

A complete update is also required in case of protocol incompatibilities between the installed version and the version to be installed. Such incompatibilities are rare and will be described in the release notes. If this applies, a rolling update is not possible, but you will need to update the system completely in one operation. This will make Dolphin Express functionality unavailable for the duration of this update.

Proceed as follows to perform the complete update installation:

1. Stop the applications using Dolphin Express on all nodes. This step can be omitted if you choose the --reboot option below.

2. Become superuser on the frontend.

3. Run the SIA on the frontend with any combination of the following options:

--install-all
    This is the default installation variant and will update all nodes and the frontend. You can specify --install-node or --install-frontend here to update only the current node or the frontend (you need to execute the SIA on the respective node in these cases!).

--batch
    Using this option, the script will run without any user interaction, assuming the default answers to all questions which would otherwise be posed to the user. This option can safely be used if no configuration changes are needed, and if you know that all services/applications using Dolphin Express are stopped on the nodes.

--reboot
    Rebooting the nodes in the course of the installation will avoid any problems when loading the updated drivers. Such problems can occur because the drivers are currently in use, or due to resource problems. This option is recommended.

--enforce
    By default, packages on a node or the frontend will only be updated if the new package has a more recent version than the installed package. This option will enforce the uninstallation of the installed package, followed by the installation of the new package. This option is recommended if you are unsure about the state of the installation.
As an example, the complete, non-interactive and enforced installation of a specific driver version (provided via the SIA) with a reboot of all nodes will be invoked as follows:
# sh DIS_install_<version>.sh --install-all --batch --reboot --enforce
4. Wait for the SIA to complete. The updated Dolphin Express services will be running on the nodes and the frontend.
2. Rolling Update
A rolling update will keep your cluster and all its services available on all but one node. This kind of update needs to be performed node by node. It requires that you stop all applications which use the Dolphin Express software stack (like a database server using SuperSockets) on the node you intend to update. This means your system needs to tolerate applications going down on a single node. Before performing a rolling update, please refer to the release notes of the new version to check whether it supports a rolling update from the version currently installed. If this is not the case, you need to perform a complete update (see previous section).
Note
It is possible to install the updated files while the applications are still using Dolphin Express services. However, in this case the updated Dolphin Express services will not become active until you restart them (or reboot the machine).

Perform the following steps on each node:

1. Log into the node and become superuser (root).

2. Build the new binary RPM packages for this node:
# sh DIS_install_<version>.sh --build-rpm
The created binary RPM packages will be stored in the subdirectories node_RPMS and frontend_RPMS which will be created in the current working directory.
Tip
To save a lot of time, you can re-use the binary RPM packages built on the first updated node on all other nodes (if they have the same CPU architecture and Linux version). Please see Section 2.3, Installing from Binary RPMs for more information.

3. Stop all applications on this node that use Dolphin Express services, like a MySQL server or NDB process.

4. Stop all Dolphin Express services on this node using the dis_services command:
# dis_services stop
Stopping Dolphin SuperSockets drivers                      [  OK  ]
Stopping Dolphin SISCI driver                              [  OK  ]
Stopping Dolphin Node Manager                              [  OK  ]
Stopping Dolphin IRM driver                                [  OK  ]
If you run sciadmin, you will notice that this node will show up as disabled (not active).
Note
The SIA will also try to stop all services when doing an update installation. Performing this step explicitly will just assure that the services can be stopped, and that the applications are shut down properly. If the services cannot be stopped for some reason, you can still update the node, but you have to reboot it to enable the updated services. See the --reboot option in the next step.

5. Run the SIA with the --install-node --use-rpms <path> options to install the updated RPM packages and start the updated drivers and services. The <path> parameter to the --use-rpms option has to point to the directory where the binary RPM packages have been built (see step 2). If you had run the SIA in /tmp in step 2, you would issue the following command:
# sh DIS_install_<version>.sh --install-node --use-rpms /tmp
Adding the option --reboot will reboot the node after the installation has been successful. A reboot is not required if the services were shut down successfully in step 4, but it is recommended to allow the low-level driver to allocate sufficient memory resources for remote-memory access communication.
Update Installation
Important
If the services could not be stopped in step 4, a reboot is required to allow the updated drivers to be loaded. Otherwise, the new drivers will only be installed on disk, but will not be loaded and used.

If for some reason you want to re-install the same version, or even an older version of the Dolphin Express software stack than the one currently installed, you need to use the --enforce option.

6. The updated services will be started by the installation and are available for use by the applications. Make sure that the node has shown up as active (green) in sciadmin again before updating the next node. If the services failed to start, a reboot of the node will fix the problem. Such a failure can be caused by situations where the memory is too fragmented for the low-level driver (see above).
Tip
You can speed up this node installation by re-using binary RPMs that have been built on another node with the same kernel version and the same CPU architecture. To do so, proceed as follows:

1. After the first installation on a node, the binary RPMs are located in the directories node_RPMS and frontend_RPMS, located in the directory where you launched the SIA. Copy these sub-directories to a path that is accessible from the other nodes.

2. When installing on another node with the same Linux kernel version and CPU architecture, use the --use-rpms option to tell the SIA where it can find matching RPMs for this node, so it does not have to build them once more.
2. Installing the Dolphin Express hardware

For an installation under load, perform the following steps for each node one by one:

1. Shut down your application processes on the current node.

2. Power off the node, and install the Dolphin Express adapter (see Section 3, Adapter Card Installation). Do not yet connect any cables!

3. Power on the node and boot it up. The Dolphin Express drivers should load successfully now, although the SuperSockets service will not be configured. Verify this via dis_services:
# dis_services status
Dolphin IRM 3.3.0 ( November 13th 2007 ) is running.
Dolphin Node Manager is running (pid 3172).
Dolphin SISCI 3.3.0 ( November 13th 2007 ) is running.
Dolphin SuperSockets 3.3.0 "St.Martin", Nov 7th 2007 (built Nov 14 2007) loaded, but not configured.
4. Start all your own applications on the current node and make sure the whole cluster operates normally.

5. Proceed with the next node until all nodes have the Dolphin Express hardware and software installed.
Manual Installation
3. Creating the cluster configuration files

If you have a Linux machine with X available which can run GUI applications, run the SIA with the --install-editor option to install the tool dishostseditor. Ideally, this step is performed on the frontend. If this is the case, you should create the directory /etc/dis and make it writable for root:
# mkdir /etc/dis
# chmod 755 /etc/dis
After the SIA has completed the installation, start the tool dishostseditor (default installation location is /opt/DIS/sbin):
# /opt/DIS/sbin/dishostseditor
Information on how to work with this tool can be found in Section 4.3, Working with the dishostseditor. Make sure you create the cabling instructions needed in the next step. If the dishostseditor was run as root on the frontend, proceed with the next step. Otherwise, copy the configuration files dishosts.conf and networkmanager.conf which you have just created to the frontend and place them there under /etc/dis (you may need to create this directory, see above).

4. Cable Installation

Using the cabling instructions created by dishostseditor in the previous step, the interconnect cables should now be connected (see Section 4.4, Cluster Cabling).

5. On the frontend machine, run the SIA with the --install-frontend option. This will start the network manager, which will then configure the whole cluster according to the configuration files created in the previous steps.

6. Start all services on all the nodes:
# dis_services start
Starting Dolphin IRM 3.3.0 ( November 13th 2007 )          [  OK  ]
Starting Dolphin Node Manager                              [  OK  ]
Starting Dolphin SISCI 3.3.0 ( November 13th 2007 )        [  OK  ]
Starting Dolphin SuperSockets drivers                      [  OK  ]
7. Verify the functionality and performance according to Section 1, Verifying Functionality and Performance.

8. At this point, Dolphin Express and SuperSockets are ready to use, but your application is still running on Ethernet. To make your application use SuperSockets, you need to perform the following steps on each node one-by-one:

   1. Shut down your application processes on the current node.

   2. Refer to Section 4.8, Making Cluster Applications use Dolphin Express to determine the best way to have your application use SuperSockets. Typically, this can be achieved by simply starting the process via the dis_ssocks_run wrapper script (located in /opt/DIS/bin by default), like:
$ dis_ssocks_run mysqld_safe
   3. Start all your own applications on the current node and make sure the whole cluster operates normally. Because SuperSockets fall back to Ethernet transparently, your applications will start up normally, independent of whether applications on the other nodes are already using SuperSockets or not.
After you have performed these steps on all nodes, all applications that have been started accordingly will now communicate via SuperSockets.
Note
This single-node installation mode will not adapt the driver configuration dis_irm.conf to optimally fit your cluster. This might be necessary for clusters with more than 4 nodes. Please refer to Section 3.1, dis_irm.conf to perform recommended changes, or contact Dolphin support.
1. Installing the Dolphin Express hardware

Power off all nodes, and install the Dolphin Express adapter (see Section 3, Adapter Card Installation). Do not yet connect any cables! Then, power up all nodes again.

2. Installing the drivers on the nodes

1. On all nodes, run the SIA with the option --install-node. This is a local operation which will build and install the drivers on the local machine only.
Tip
You can speed up this node installation by re-using binary RPMs that have been built on another node with the same kernel version and the same CPU architecture. To do so, proceed as follows:

1. After the first installation on a node, the binary RPMs are located in the directories node_RPMS and frontend_RPMS, located in the directory where you launched the SIA. Copy these sub-directories to a path that is accessible from the other nodes.

2. When installing on another node with the same Linux kernel version and CPU architecture, use the --use-rpms option to tell the SIA where it can find matching RPMs for this node, so it does not have to build them once more.
2. The Dolphin Express drivers should load successfully now, although the SuperSockets service will not be configured. Verify this via dis_services:
# dis_services status
Dolphin IRM 3.3.0 ( November 13th 2007 ) is running.
Dolphin Node Manager is running (pid 3172).
Dolphin SISCI 3.3.0 ( November 13th 2007 ) is running.
Dolphin SuperSockets 3.3.0 "St.Martin", Nov 7th 2007 (built Nov 14 2007) loaded, but not configured.
3. Creating the cluster configuration files

If you have a Linux machine with X available which can run GUI applications, run the SIA with the --install-editor option to install the tool dishostseditor. Ideally, this step is performed on the frontend. If this is the case, you should create the directory /etc/dis and make it writable for root:
# mkdir /etc/dis
# chmod 755 /etc/dis
After the SIA has completed the installation, start the tool dishostseditor (default installation location is /opt/DIS/sbin):
# /opt/DIS/sbin/dishostseditor
Information on how to work with this tool can be found in Section 4.3, Working with the dishostseditor. Make sure you create the cabling instructions needed in the next step. If the dishostseditor was run as root on the frontend, proceed with the next step. Otherwise, copy the configuration files dishosts.conf and networkmanager.conf which you have just created to the frontend and place them there under /etc/dis (you may need to create this directory).

4. Cable Installation

Using the cabling instructions created by dishostseditor in the previous step, the interconnect cables should now be connected (see Section 4.4, Cluster Cabling).

5. On the frontend machine, run the SIA with the --install-frontend option. This will start the network manager, which will then configure the whole cluster according to the configuration files created in the previous steps.

6. Start all services on all the nodes:
# dis_services start
Starting Dolphin IRM 3.3.0 ( November 13th 2007 )          [  OK  ]
Starting Dolphin Node Manager                              [  OK  ]
Starting Dolphin SISCI 3.3.0 ( November 13th 2007 )        [  OK  ]
Starting Dolphin SuperSockets drivers                      [  OK  ]
7. Verify the functionality and performance according to Section 1, Verifying Functionality and Performance.
To be installed on the frontend (and additionally on other machines that should run dishostseditor).

Dolphin-NetworkManager
    Contains the network manager on the frontend, which talks to all node managers on the nodes. Installs the service dis_networkmgr. To be installed on the frontend. Depends on Dolphin-NetworkHosts.

Dolphin-NetworkAdmin
    Contains the GUI application sciadmin for managing and monitoring the interconnect. sciadmin talks to the network manager and can be installed on any machine that has a connection to the frontend. To be installed on the frontend (or any other machine).

Dolphin-SISCI-devel
    To compile and link applications that use the SISCI API on machines other than the nodes, this RPM installs the header files and library plus examples and documentation on any machine. To be installed on the frontend, or any other machine on which SISCI applications should be compiled and linked.
frontend_RPMS
source_RPMS
To install the packages from one directory, just enter the directory and install them all with a single call of the rpm command, like:
# cd node_RPMS # rpm -Uhv *.rpm
4. Unpackaged Installation
Not all target operating systems are supported with native software packages. In this case, a non-package based installation via a tar archive is supported. This type of installation will build all software for both node and frontend, and install it to a path that you specify. From there, you have to perform the actual driver and service installation using scripts provided with the installation. This type of installation installs the complete software into a directory on the local machine. Depending on whether this machine will be a node or the frontend, you have to install different drivers or services from there.

To install using this method, please proceed as follows:

1. Become superuser:
$ su
#
2. Create the tar archive from the SIA, and unpack it:
# sh DIS_install_<version>.sh --get-tarball
#* Logfile is /tmp/DIS_install.log_260 on node1
#*
#+ Dolphin ICS - Software installation (version: 1.31 $ of: 2007/09/27 15:05:05 $)
#+
#* Generating tarball distribution of the source code
#* NOTE: source tarball is /tmp/DIS.tar.gz
# tar xzf DIS.tar.gz
3. Enter the created directory and configure the build system, specifying the target path <install_path> for the installation. We recommend that you use the standard path /opt/DIS, but you can use any other path. The installation procedure will create subdirectories (like bin, sbin, lib, lib64, doc, man, etc.) relative to this path and install into them.
# cd DIS
# ./configure --prefix=/opt/DIS
4. Build the software stack using make. Check the output when the command returns to see if the build operation was successful.
# make
...
# make supersockets
...
5.
Tip
You can speed up the installation on multiple nodes if you copy over the installation directory to the other nodes, provided they feature the same Linux kernel version and CPU architecture. The best way is to create a tar archive:
# cd /opt
# tar czf DIS_binary.tar.gz DIS
6. Install the drivers and services depending on whether the local machine should be a node or the frontend. It is recommended to first install all nodes, then the frontend, then configure and test the cluster from the frontend.

For a node, install the necessary drivers and services as follows:

1. Change to the sbin directory in your installation path:
# cd /opt/DIS/sbin
2. Invoke the scripts for driver installation using the option -i. The option --start will start the service after a successful installation:
# ./irm_setup -i --start
# ./nodemgr_setup -i --start
# ./sisci_setup -i --start
# ./ssocks_setup -i
Note
Please make sure that SuperSockets are not started yet (do not provide the option --start to the setup script). You can remove the driver from the system by calling the script with the option -e. Help is available via -h.

Repeat this procedure for each node.

For the frontend, install the necessary services and perform the cluster configuration and test as follows:

1. Change to the sbin directory in your installation path:
# cd /opt/DIS/sbin
2. Create the cluster configuration files using dishostseditor. For more information on using dishostseditor, please refer to Section 4.3, Working with the dishostseditor.

3. Invoke the script for service installation using the option -i:
# ./networkmgr_setup -i --start
You can remove the service from the system by calling the script with the option -e.

4. Test the cluster via the GUI tool sciadmin:
# ./sciadmin
For more information on using sciadmin to test your cluster installation, please refer to Appendix B, sciadmin Reference and Section 1, Verifying Functionality and Performance.

5. Enable all services, including SuperSockets, on all nodes:
# dis_services start
Starting Dolphin IRM 3.3.0 ( November 13th 2007 )          [  OK  ]
Starting Dolphin Node Manager                              [  OK  ]
Starting Dolphin SISCI 3.3.0 ( November 13th 2007 )        [  OK  ]
Starting Dolphin SuperSockets drivers                      [  OK  ]
Note
This command has to be executed on the nodes, not only on the frontend!
Dolphin provides a script dis_services that performs this task for all Dolphin services installed on a machine. It is used in the same way as the individual service commands provided by the distribution:
# dis_services status
Dolphin IRM 3.3.0 ( November 13th 2007 ) is running.
Dolphin Node Manager is running (pid 3172).
Dolphin SISCI 3.3.0 ( November 13th 2007 ) is running.
Dolphin SuperSockets 3.3.0 "St.Martin", Nov 7th 2007 (built Nov 14 2007) running.
If any of the required services is not running, you will find more information on the problem that may have occurred in the system log facilities. Call dmesg to inspect the kernel messages, or check /var/log/messages for related messages.
Interconnect and Software Maintenance

Running scidiag on a node will perform a self test on the local adapter(s) and list all remote adapters that this adapter can see via the Dolphin Express interconnect. This means, to perform the static interconnect test on a full cluster, you basically need to run scidiag on each node and see if any problems with the adapter are reported, and if the adapters in each node can see all remote adapters installed in the other nodes. An example output of scidiag for a node which is part of a 9-node cluster configured as a 3 by 3 2D torus, using one adapter per node, looks like this:
===========================================================================
 SCI diagnostic tool -- SciDiag version 3.2.6d ( September 6th 2007 )
===========================================================================

******************** VARIOUS INFORMATION ********************

Scidiag compiled in 64 bit mode
Driver : Dolphin IRM 3.2.6d ( September 6th 2007 )
Date   : Thu Oct 4 14:20:45 CEST 2007
System : Linux tiger-9 2.6.9-42.0.3.ELsmp #1 SMP Fri Oct 6 06:28:26 CDT 2006 x86_64 x86_64 x86_64 GNU/Linux
Number of configured local adapters found: 1
Hostbridge : NVIDIA nForce 570 - MCP55 , 0x37610de

Local adapter 0 >
  Type                 : D352
  NodeId(log)          : 140
  NodeId(phys)         : 0x2204
  SerialNum            : 200284
  PSB Version          : 0x0d66706d
  LC Version           : 0x1066606d
  PLD Firmware         : 0x0001
  SCI Link frequency   : 166 MHz
  B-Link frequency     : 80 MHz
  Card Revision        : CD
  Switch Type          : not present
  Topology Type        : 2D Torus
  Topology Autodetect  : No

OK:  Psb chip alive in adapter 0.
     SCI Link 0 - uptime 11356 seconds
     SCI Link 0 - downtime 0 seconds
     SCI Link 1 - uptime 11356 seconds
     SCI Link 1 - downtime 0 seconds
OK:  Cable insertion ok.
OK:  Probe of local node ok.
OK:  Link alive in adapter 0.
OK:  SRAM test ok for Adapter 0
OK:  LC-3 chip accessible from blink in adapter 0.
==>  Local adapter 0 ok.
******************** TOPOLOGY SEEN FROM ADAPTER 0 ********************
Adapters found:     9
Switch ports found: 0
----- List of all ranges (rings) found:
In range 0: 0004 0008 0012
In range 1: 0068 0072 0076
In range 2: 0132 0136 0140

REMOTE NODE INFO SEEN FROM ADAPTER 0
----------------------------------------------------------------------
   Log |   Phys |     resp |    resp |  resp |  resp |     req |
nodeId | nodeId | conflict | address |  type |  data | timeout | TOTAL
     4 | 0x0004 |        0 |       0 |     0 |     0 |       0 |     4
     8 | 0x0104 |        0 |       0 |     0 |     0 |       0 |     1
    12 | 0x0204 |        0 |       0 |     0 |     0 |       0 |     0
    68 | 0x1004 |        0 |       0 |     0 |     0 |       0 |     2
    72 | 0x1104 |        0 |       0 |     0 |     0 |       0 |     0
    76 | 0x1204 |        0 |       0 |     0 |     0 |       0 |     0
   132 | 0x2004 |        0 |       0 |     0 |     0 |       0 |     1
   136 | 0x2104 |        0 |       0 |     0 |     0 |       0 |     1
   140 | 0x2200 |        0 |       0 |     0 |     0 |       0 |     0
The static interconnect test passes if scidiag delivers TEST RESULT: *PASSED* and reports the same topology (remote adapters) on all nodes. More information on running scidiag is provided in ???, where you will also find hints on what to do if scidiag reports warnings or errors, or reports a different topology on different nodes.
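When the test is scripted across the cluster (e.g. looping over the nodes with ssh and collecting each node's scidiag output), the verdict can be checked mechanically. A tiny sketch; the helper name scidiag_passed is ours, and it only greps for the PASSED marker quoted above:

```shell
# scidiag_passed < scidiag-output
# Succeed if the scidiag output on stdin contains the PASSED verdict.
scidiag_passed() {
    grep -q 'TEST RESULT: \*PASSED\*'
}
```

A cluster-wide check would then pipe each node's output through this helper and additionally compare the "TOPOLOGY SEEN FROM ADAPTER" sections for equality across nodes.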
Note
It is recommended to run this test from the sciadmin GUI (see previous section) because it will perform a more controlled variant of this test and give more helpful results.

All instances of sciconntest will connect and start to exchange data, which can take up to 30 seconds. The output of sciconntest on one node which is part of a 9-node cluster looks like this:
/opt/DIS/bin/sciconntest compiled Oct 2 2007 : 22:29:09
----------------------------
Local node-id     : 76
Local adapter no. : 0
Segment size      : 8192
MinSize           : 4
Time to run (sec) : 10
Idelay            : 0
No Write          : 0
Loopdelay         : 0
Delay             : 0
Bad               : 0
Check             : 0
Mcheck            : 0
Max nodes         : 256
rnl               : 0
Callbacks         : Yes
----------------------------
Probing all nodes
Response from remote node 4
Response from remote node 8
Response from remote node 12
Response from remote node 68
Response from remote node 72
Response from remote node 132
Response from remote node 136
Response from remote node 140
Local segment (id=4, size=8192) is created.
Local segment (id=4, size=8192) is shared.
Local segment (id=8, size=8192) is created.
Local segment (id=8, size=8192) is shared.
The test passes if all nodes report 0 failures for all remote nodes. If the test reports failures, you can determine the closest pair(s) of nodes for which these failures are reported and check the cabled connection between them. The numerical node identifiers shown in this output are the node ID numbers of the adapters (which identify an adapter in the Dolphin Express interconnect). This test can be run while a system is in production, but you have to take into account that the performance of the production applications will be reduced significantly while the test is running. If links actually show problems, they might be temporarily disabled, stopping all communication until rerouting takes place.
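For locating the closest pair, it can help to translate node IDs into grid positions. In the 9-node example above, the logical node IDs follow the pattern 4 + 4*x + 64*y (rows 4/8/12, 68/72/76, 132/136/140). A throwaway helper for exactly that example layout; the formula is inferred from the listing and is not a general rule for all topologies:

```shell
# torus_coord LOGICAL_NODEID
# Print "x y" for the example 3x3 torus above, assuming IDs of the
# form 4 + 4*x + 64*y (an assumption valid for this listing only).
torus_coord() {
    id=$(( $1 - 4 ))
    echo "$(( (id % 64) / 4 )) $(( id / 64 ))"
}
```

With this, nodes 72 and 76 map to positions "1 1" and "2 1", i.e. neighbors in the same ring, so a failure reported between them points at a single cable.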
2.
3. On node B, start the client-side benchmark with the options -client and -rn <node id of A>, like:
$ scibench2 -client -rn 4
4. To simply gather all relevant low-level performance data, the script sisci_benchmarks.sh can be called in the same way. It will run all of the described tests.

For the D33x and D35x series of Dolphin Express adapters, the following results can be expected for each test using a single adapter:

scibench2
    minimal latency to write 4 bytes to remote memory: 0.2 us
    maximal bandwidth for streaming writes to remote memory: 340 MB/s
---------------------------------------------------------------
Segment Size:    Average Send Latency:    Throughput:
---------------------------------------------------------------
    4                  0.20 us             19.72 MBytes/s
    8                  0.20 us             40.44 MBytes/s
   16                  0.20 us             80.89 MBytes/s
   32                  0.39 us             81.09 MBytes/s
   64                  0.25 us            254.22 MBytes/s
  128                  0.37 us            348.17 MBytes/s
  256                  0.74 us            344.89 MBytes/s
  512                  1.49 us            343.05 MBytes/s
 1024                  3.00 us            341.90 MBytes/s
 2048                  6.00 us            341.39 MBytes/s
 4096                 12.00 us            341.45 MBytes/s
 8192                 24.00 us            341.32 MBytes/s
16384                 48.04 us            341.03 MBytes/s
32768                 96.03 us            341.24 MBytes/s
65536                192.56 us            340.33 MBytes/s
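The throughput column follows directly from the segment size divided by the average send latency; as a consistency check on the numbers above, for the largest segment:

```latex
\mathrm{Throughput} = \frac{\mathrm{Segment\ Size}}{\mathrm{Send\ Latency}}
                    = \frac{65536\ \mathrm{B}}{192.56\ \mu\mathrm{s}}
                    \approx 340.3\ \mathrm{MB/s}
```

This also explains why the throughput saturates around 340 MB/s: beyond 128-byte segments, the latency grows almost linearly with the segment size.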
scipp
The minimal round-trip latency for writing to remote memory should be below 4 us. The average number of retries is not a performance metric and can vary from run to run.
Ping Pong round trip latency for    0 bytes, average retries= 1292    3.69 us
Ping Pong round trip latency for    4 bytes, average retries=  365    3.94 us
Ping Pong round trip latency for    8 bytes, average retries=  359    3.98 us
Ping Pong round trip latency for   16 bytes, average retries=  357    4.01 us
Ping Pong round trip latency for   32 bytes, average retries=    4    4.58 us
Ping Pong round trip latency for   64 bytes, average retries=  346    4.30 us
Ping Pong round trip latency for  128 bytes, average retries=  871    6.26 us
Ping Pong round trip latency for  256 bytes, average retries=  832    6.49 us
Ping Pong round trip latency for  512 bytes, average retries= 1072    7.99 us
Ping Pong round trip latency for 1024 bytes, average retries= 1643   10.99 us
Ping Pong round trip latency for 2048 bytes, average retries= 2738   17.00 us
intr_bench
The interrupt latency is the only performance metric of these tests that is affected by the operating system, which always handles the interrupts, and can therefore vary. The following numbers have been measured with RHEL 4 (Linux kernel 2.6.9):
Average unidirectional interrupt time :  7.665 us.
Average round trip interrupt time     : 15.330 us.
dma_bench
The typical DMA bandwidth achieved for 64kB transfers is 240MB/s, while the maximum bandwidth (for larger blocks) is at about 250MB/s:
   64     19.63 us      3.26 MBytes/s
  128     19.69 us      6.50 MBytes/s
  256     20.36 us     12.57 MBytes/s
  512     21.08 us     24.29 MBytes/s
 1024     23.25 us     44.05 MBytes/s
 2048     26.80 us     76.42 MBytes/s
 4096     34.60 us    118.40 MBytes/s
 8192     50.30 us    162.85 MBytes/s
16384     81.74 us    200.43 MBytes/s
32768    144.73 us    226.41 MBytes/s
65536    270.82 us    241.99 MBytes/s
which should show a status of running. If the status shown here is loaded, but not configured, the SuperSockets configuration failed for some reason. Typically, this means that a configuration file could not be parsed correctly. The configuration can be performed manually like this:
# /opt/DIS/sbin/dis_ssocks_cfg
If this indicates that a configuration file is corrupted, you can verify the files against the reference in Section 2, SuperSockets Configuration. At any time, you can re-create dishosts.conf using the dishostseditor and restore modified SuperSockets configuration files (supersockets_ports.conf and supersockets_profiles.conf) from the default versions that have been installed in /opt/DIS/etc/dis. Once the status of SuperSockets is running, you can verify their actual configuration via the files in /proc/net/af_sci. Here, the file socket_maps shows which IP addresses (or network masks) the local node's SuperSockets know about. This file should be non-empty and identical on all nodes in the cluster.
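To check that socket_maps is indeed identical on all nodes, you can compare checksums of copies fetched from each node (for example via scp). The following helper is a sketch of our own (the function name is not a Dolphin tool); it counts how many distinct contents a set of files has:

```shell
# Count distinct contents among a set of files by comparing md5 checksums.
# Useful for copies of /proc/net/af_sci/socket_maps fetched from all nodes.
distinct_contents() {
    md5sum "$@" | awk '{print $1}' | sort -u | wc -l
}
```

A result of 1 means all copies agree; anything larger indicates that at least one node has a diverging SuperSockets configuration.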
Interconnect and Software Maintenance

The output for a working setup should look like this:
# sockperf 1.35 - test stream socket performance and system impact
# LD_PRELOAD: libksupersockets.so
# address family: 2
# client node: n2
# server nodes: n1
# sockets per process: 1 - pattern: sequential
# wait for data: blocking recv()
# send mode: blocking
# client/server pairs: 1 (running on 2 cores)
# socket options: nodelay 1
# communication pattern: PINGPONG (back-and-forth)
bytes  loops  avg_RTT/2[us]  min_RTT/2[us]  max_RTT/2[us]    msg/s    MB/s
    1   1000       4.26           3.67          18.67       117247    0.12
    4   1000       4.16           3.87          11.32       120177    0.48
    8   1000       4.31           4.17          11.81       115889    0.93
   12   1000       4.29           4.17           9.08       116537    1.40
   16   1000       4.29           4.16          10.17       116468    1.86
   24   1000       4.30           4.18           7.16       116251    2.79
   32   1000       4.38           4.21          44.20       114233    3.66
   48   1000       4.50           4.24         102.91       111112    5.33
   64   1000       5.28           5.16           7.54        94687    6.06
   80   1000       5.37           5.20          11.08        93170    7.45
   96   1000       5.41           5.20          11.29        92473    8.88
  112   1000       5.53           5.27          11.04        90400   10.12
  128   1000       5.74           5.59          11.96        87033   11.14
  160   1000       5.85           5.68          10.65        85411   13.67
  192   1000       6.30           6.01          11.24        79383   15.24
  224   1000       6.47           6.20          80.47        77291   17.31
  256   1000       6.82           6.55          17.41        73314   18.77
  512   1000       8.37           8.05          14.52        59766   30.60
 1024   1000      11.69          11.38          17.66        42764   43.79
 2048   1000      15.25          14.90          59.72        32792   67.16
 4096   1000      22.40          22.03          33.08        22318   91.41
 8192    512      47.19          46.39          52.45        10596   86.80
16384    256      72.87          72.20          78.05         6862  112.43
32768    128     124.56         123.52         132.97         4014  131.54
65536     64     225.73         224.68         230.26         2215  145.17
The latency in this example starts around 4 us. Recent machines deliver latencies below 3 us, and on older machines, the latency may be higher. Latencies above 10 us indicate a problem; typical Ethernet latencies start at 20 us and more. In case of latencies being too high, please verify that SuperSockets are running and configured as described in the previous section. Also, verify that the environment variable LD_PRELOAD is set to libksupersockets.so. This is reported for the client in the second line of the output (see above), but LD_PRELOAD also needs to be set correctly on the server side. See Section 4.8, Making Cluster Application use Dolphin Express for more information on how to make generic socket applications (like sockperf) use SuperSockets.
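A quick way to check the LD_PRELOAD setting on either side is a small shell helper like the following sketch (the function name is our own; it only inspects the current environment, mirroring what sockperf reports in its LD_PRELOAD output line):

```shell
# Report whether LD_PRELOAD will preload the SuperSockets library in the
# current environment.
check_preload() {
    case ":${LD_PRELOAD:-}:" in
        *libksupersockets.so*) echo "SuperSockets preload active" ;;
        *)                     echo "LD_PRELOAD not set for SuperSockets" ;;
    esac
}
```

Run it in the same shell (on both client and server) from which you start the benchmark.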
The first line shows the number of open TCP (STREAM) and UDP (DGRAM) sockets that are using SuperSockets. For more detailed information, the extended statistics need to be enabled. Only the root user can do this:
# echo enable >/proc/net/af_sci/stats
With enabled statistics, /proc/net/af_sci/stats will display a message size histogram (next to some internal information). When looking at this histogram, please keep in mind that the listed receive sizes (RX) may be inaccurate, as they refer to the maximal number of bytes that a process wanted to receive when calling the related socket function. Many applications use larger buffers than actually required. Thus, only the send (TX) values are reliable. To observe the current throughput on all SuperSockets-driven sockets, the tool dis_ssocks_stat can be used. Supported options are:
-d   Delay in seconds between measurements. This will cause dis_ssocks_stat to loop until interrupted.
-t   Print time stamp next to measurement point.
-w   Print all output to a single line.
-h   Show available options.
Example:
# dis_ssocks_stat -d 1 -t
(1 s) RX: 162.82 MB/s TX: 165.43 MB/s ( 0 B/s 0 B/s ) Mon Nov 12 17:59:33 CET 2007
(1 s) RX: 149.83 MB/s TX: 168.65 MB/s ( 0 B/s 0 B/s ) Mon Nov 12 17:59:34 CET 2007
...
The first two pairs show the receive (RX) and send (TX) throughput via Dolphin Express of all sockets. The number pair in parentheses shows the throughput of sockets that are operated by SuperSockets, but are currently in fallback (Ethernet) mode. Typically, there will be no fallback traffic.
2.
3. If any of the verifications did report errors, make sure that the cable is plugged in cleanly and that the screws are secured. If the error persists, swap the position of the cable with another one that is known to be working, and observe whether the problem moves with the cable. If it does, the cable is likely to be bad. Otherwise, one of the adapters might have a problem.
6. Perform the cable test from within sciadmin to ensure that the cabling is correct (see Section 4.7.4, Cabling Correctness Test).
Warning
Running the cable test will stop other traffic on the interconnect for the time the test is running, which can be up to a minute. If this is not an option, please use scidiag from the commandline to verify the functionality of the interconnect (see Section 1.1.3, Static Interconnect Test). Communication between all other nodes will continue uninterrupted during this procedure.
Warning
Powering down more than one node will make other nodes inaccessible via SCI if the powered-down nodes are not located within one ringlet.

2. Move the nodes to the new location and connect the cables to the adapters. Do not power them up yet!
3. Update the cluster configuration file /etc/dis/dishosts.conf on the frontend, either using dishostseditor or a plain text editor: If using dishostseditor, load the original configuration (when running dishostseditor on the frontend, it will be loaded automatically), change the positions of the nodes within the torus, and save the new configuration. When using a plain text editor, exchange the hostnames of the nodes in this file. You can also change the adapter and socket names accordingly (they typically contain the hostnames), but this will not affect functionality.
4. Restart the network manager on the frontend to make the changed configuration effective.

5. Power up the nodes. Once they come up, their configuration will be changed by the network manager to reflect their new position within the interconnect.
5. Replacing a Node
In case a node needs to be replaced, proceed as follows concerning the SCI interconnect:

1. Power down the node. The network manager will automatically reroute the interconnect traffic. When you run sciadmin on the frontend, you will see the icon of the node turn red within the GUI representation of the cluster nodes.

2. Unplug all cables from the adapter. Remember (or mark the cables) into which plug on the adapter each cable belongs.

3. Unmount the adapter from the node to be replaced, and insert it into the new node.

4. Power up the node; then connect the SCI cables in the same way they had been connected before. Make sure that all LEDs on all adapters in the affected ringlets light green again.

5. Run the SIA with the option --install-node. To verify the installation after the SIA has finished:
   1. The icon of the node in the sciadmin GUI must have turned green again.
   2. The output of the dis_services script should list all services as running.

6. Perform the cable test from within sciadmin to ensure that the cabling is correct (see Section 4.7.4, Cabling Correctness Test).
Warning
Running the cable test will stop other traffic on the interconnect for the time the test is running, which can be up to a minute. If this is not an option, please use scidiag from the commandline to verify the functionality of the interconnect (see Section 1.1.3, Static Interconnect Test).

In case more than one node needs to be replaced, please consider the following advice: to ensure that all nodes that are not being replaced can continue to communicate via the SCI interconnect, replace nodes in a ring-by-ring manner: power down nodes within one ringlet only, and bring this group of nodes back to operation before powering down the next group. Communication between all other nodes will continue uninterrupted during this procedure.
6. Adding Nodes
To add new nodes to the cluster, please proceed as follows: 1. Install the adapter in the nodes to be added and power the nodes up.
Important
Do not yet connect the SCI cables, but make sure that Ethernet communication towards the frontend is working.

2. Install the DIS software stack on all nodes via the --install-node option of the SIA.

3. Change the cluster configuration using dishostseditor:
   1. Load the existing configuration.
   2. In the cluster settings, change the topology to match the topology with all new nodes added.
   3. Change the hostnames of the newly added nodes via the node settings of each node. Also make sure that the socket configuration matches that of the existing nodes.
   4. Save the new cluster configuration. If desired, create and save or print the cabling instructions for the extended cluster.
   5. If you are not running dishostseditor on the frontend, transfer the saved files dishosts.conf and cluster.conf to the directory /etc/dis on the frontend.
4. Restart the network manager on the frontend. If you run sciadmin, the new nodes should show up as red icons. All other nodes should continue to stay green.

5. Cable the new nodes:
   1. First create the cable connections between the new nodes.
   2. Then connect the new nodes to the cluster. Proceed cable-by-cable, which means you disconnect a cable at an "old" node and immediately connect it to a new node (without disconnecting another cable first). This will ensure continued operation of all "old" nodes.
   3. When you are done, all LEDs on all adapters should light green, and all node icons in sciadmin should also light green.
6. Perform the cable test from within sciadmin to ensure that the cabling is correct (see Section 4.7.4, Cabling Correctness Test).
Warning
Running the cable test will stop other traffic on the interconnect for the time the test is running, which can be up to a minute. If this is not an option, please use scidiag from the commandline to verify the functionality of the interconnect (see Section 1.1.3, Static Interconnect Test).
7. Removing Nodes
To permanently remove nodes from the cluster, please proceed as follows:

1. Change the cluster configuration using dishostseditor:
   1. Load the existing configuration.
   2. In the cluster settings, change the topology to match the topology with all nodes removed.
   3. Because the topology change might cut off nodes from the cluster at the "wrong" end, you have to make sure that the hostnames and the placement within the new topology are correct for the remaining nodes. To do this, change the hostnames of nodes by double-clicking their icon and changing the hostname in the displayed dialog box. If the SuperSockets configuration is based on the hostnames (not on the subnet addresses), make sure that the name of the socket interface matches a modified hostname.
   4. Save the new cluster configuration. If desired, create and save or print the cabling instructions for the reduced cluster.
   5. If you are not running dishostseditor on the frontend, transfer the saved files dishosts.conf and cluster.conf to the directory /etc/dis on the frontend.
2. Restart the network manager on the frontend. If you run sciadmin, the removed nodes should no longer show up. All other nodes should continue to stay green.

3. Uncable the nodes to be removed one by one, making sure that the remaining nodes are cabled according to the cabling instructions generated above.

4. On the nodes that have been removed from the cluster, the Dolphin Express software can easily be removed using the SIA option --wipe, like:
# sh DIS_install_<version>.sh --wipe
This will remove all Dolphin software packages, services and configuration data from the node. If no SIA is available, the same effect can be achieved by manually uninstalling all packages that start with Dolphin-, removing any remaining installation directories (like /opt/DIS), and removing the configuration directory /etc/dis.

5. Perform the cable test from within sciadmin to ensure that the cabling is correct (see Section 4.7.4, Cabling Correctness Test).
Warning
Running the cable test will stop other traffic on the interconnect for the time the test is running, which can be up to a minute. If this is not an option, please use scidiag from the commandline to verify the functionality of the interconnect (see Section 1.1.3, Static Interconnect Test).
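The manual cleanup described above (for nodes where no SIA is available) can be sketched as a guarded shell helper. This is our own illustration, not a Dolphin-provided tool; it only prints what would be removed unless explicitly told otherwise:

```shell
# Dry-run sketch of the manual Dolphin software cleanup on RPM-based systems.
# Nothing is removed unless the first argument is "really".
wipe_dolphin() {
    if [ "$1" = "really" ]; then
        # Uninstall all packages whose name starts with Dolphin-,
        # then remove the installation and configuration directories.
        rpm -qa | grep '^Dolphin-' | xargs -r rpm -e
        rm -rf /opt/DIS /etc/dis
    else
        echo "would remove: Dolphin-* packages, /opt/DIS, /etc/dis"
    fi
}
```

Run `wipe_dolphin` first to review the plan, then `wipe_dolphin really` on the node to be cleaned.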
1. MySQL Cluster
Using MySQL Cluster with Dolphin Express and SuperSockets will significantly increase throughput and reduce the response time. Generally, SuperSockets operate fully transparently and no change in the MySQL Cluster configuration is necessary for a working setup. Please read below for some hints on performance tuning and trouble-shooting specific to SuperSockets with MySQL Cluster.
Such timeout problems can have various causes, such as dead or overloaded nodes. Another possible reason is that the time for a socket failover between Dolphin Express and Ethernet exceeded the current timeout, which by default is 1200 ms. This should rarely happen, but to solve this problem, please proceed as follows:

1. Increase the value of the NDBD configuration parameter TransactionDeadlockDetectionTimeout (see the MySQL reference manual, section 16.4.4.5). As the default value is 1200 ms, increasing it to 5000 ms might be a good start. You will need to add this line to the NDBD default section in your cluster configuration file:
[NDBD DEFAULT]
TransactionDeadlockDetectionTimeout: 5000
2. Verify the state of the Dolphin Express interconnect by checking the logfile of the network manager (/var/log/scinetworkmanager.log) to see if any interconnect events have been logged. If there are repeated logged error events for which no clear reason (such as manual intervention or node shutdown) can be determined, you should test the interconnect using sciadmin (see Section 4.2, Traffic Test).

3. If no events have been logged, it is very unlikely that the interconnect or SuperSockets are the cause of the problem. Instead, you should try to verify that no node is overloaded or stuck.
MySQL Operation
2. MySQL Replication
SuperSockets also significantly increase the performance of replication setups (speedups of up to a factor of 3 have been reported). All that is necessary is to make sure that all MySQL server processes involved run with the LD_PRELOAD variable set accordingly. No MySQL configuration changes are necessary.
FAILED
UNSTABLE
DIS_OLDSTATE
DIS_ALERT_TARGET
    This variable contains the target address for the notification. This target address is provided by the user when the notification is enabled (see below), and the user needs to make sure that the content of this variable is useful for the chosen alert script. I.e., if the alert script should send an email, the content of this variable needs to be an email address.

DIS_ALERT_VERSION
    The version number of this interface (currently 1). It will be increased if incompatible changes to the interface need to be introduced, which could be a change in the possible content of an existing environment variable, or the removal of an environment variable. This is unlikely and does not necessarily make an alert script fail, but a script that relies on this interface in a way where this matters needs to verify the content of this variable.
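As an illustration of this interface, a minimal alert handler might look like the following sketch. This is our own example, not the Dolphin-provided alert.sh: the environment variables are the ones described above, but the function name and output format are hypothetical (a real script would send mail to $DIS_ALERT_TARGET instead of printing):

```shell
# Hypothetical alert handler using the notification interface described above.
dis_alert() {
    # Refuse to run against an interface version we do not understand.
    if [ "${DIS_ALERT_VERSION:-1}" != "1" ]; then
        echo "unsupported alert interface version: $DIS_ALERT_VERSION" >&2
        return 1
    fi
    # A real script would send mail here, e.g. to "$DIS_ALERT_TARGET".
    echo "alert for $DIS_ALERT_TARGET (previous state: ${DIS_OLDSTATE:-unknown})"
}
```

The version check at the top is the kind of verification the interface description recommends for scripts that depend on variable semantics.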
Then enter the alert target and choose the alert script by pressing the button and selecting the script in the file dialog. Dolphin provides an alert script /opt/DIS/etc/dis/alert.sh (for the default installation path) which sends out an email to the specified alert target. Any other executable can be specified here. Please consider that this script will be executed in the context of the user running the network manager (typically root), so the permissions to change this file should be set accordingly. To make the changes done in this dialog effective, you need to save the configuration files (to /etc/dis on the frontend) and then restart the network manager:
# service dis_networkmgr restart
To disable notification, these lines can be commented out (precede them with a #). After the file has been edited, the network manager needs to be restarted to make the changes effective:
Advanced Topics
This is a per-session setting and will be lost if the network manager is restarted.
Warning
Make sure that the messages are enabled again before you quit sciadmin. Otherwise, interconnect status changes will not trigger notifications until the network manager is restarted.
be moved to dis_irm.conf.rpmsave, and the default dis_irm.conf will replace the previously modified version. You will need to undo this file renaming. If you update your system with the SIA as described in Chapter 4, Update Installation, the SIA will take care that the existing dis_irm.conf is preserved and stays effective.
Chapter 9. FAQ
This chapter lists problems that have been reported by customers and the proposed solutions.
1. Hardware
1.1.1. Although I have properly installed the adapter in a node and its LEDs light orange, I am told (i.e. during the installation) that this node does not contain an SCI adapter!

The SCI adapter might not have been recognized by the node during the power-up initialization after power was applied again. The specification requires that a node is powered down for at least 5 seconds before being powered up again. To make the adapter be recognized again, you will need to power down the node (restarting or resetting is not sufficient!), wait for at least 5 seconds, and power it up again. If this does not fix the problem, please contact Dolphin support.

1.1.2. All cables are connected, all LEDs shine green on the adapter boards, and all required services and drivers are running on all nodes. However, some nodes can not see some other nodes via the SCI interconnect. Between some other pairs of nodes, the communication works fine.

These symptoms indicate that the cabling is not correct, i.e. the links 0 and 1 (x- and y-direction in a 2D torus) are exchanged. To resolve the problem, proceed as follows:
1. Run the cable test from sciadmin (Server -> Test Cable Connections). If no problem is reported, please contact Dolphin Support.
2. To fix the cable problem, create a cabling description via dishostseditor (File -> Get Cabling Instructions) and correct the cabling between the nodes that have been reported in the cable test.
3. Repeat steps 1 and 2 until no more problems are reported.
1.1.3. The SCI driver dis_irm refuses to load, or the driver installation never completes. Running dmesg shows that the syslog contains the line Out of vmalloc space. What's wrong?

The problem is that the SCI adapter requires more virtual PCI address space than the installed kernel supports. This problem has so far only been observed on 32-bit operating systems. There are two alternative solutions:

1. If you are building a small cluster, you may be able to run your application with less SCI address space. You can change the SCI address space size for the adapter card by using sciconfig with the command set-prefetch-mem-size. A value of 64 or 16 will most likely overcome the problem. This operation can also be performed from the command line using the options -c to specify the card number (1 or 2) and -spms to specify the prefetch memory size in megabytes:
# sciconfig -c 1 -spms 64
Card 1 - Prefetch space memory size is set to 64 MB

A reboot of the machine is required to make the changes take effect.
After rebooting the machine, the problem should be solved.

2. If reducing the prefetch memory size is not desired, the related resources in the kernel have to be increased. For x86-based machines, this is achieved by passing the kernel option vmalloc=256m and the parameter uppermem=524288 at boot time. This is done by editing /boot/grub/grub.conf as shown in the following example:
title CentOS-4 i386 (2.6.9-11.ELsmp)
    root (hd0,0)
    uppermem 524288
    kernel /i386/vmlinuz-2.6.9-11.ELsmp ro root=/dev/sda6 rhgb quiet vmalloc=256m
    initrd /i386/initrd-2.6.9-11.ELsmp.img
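To confirm that the node is actually short on vmalloc space before changing boot parameters, the kernel's vmalloc counters can be inspected (Linux-specific; this is a diagnostic sketch, the function name is our own):

```shell
# Show the kernel's vmalloc accounting from /proc/meminfo. A VmallocChunk
# value far below the adapter's prefetch space requirement confirms the
# "Out of vmalloc space" condition.
vmalloc_info() {
    grep '^Vmalloc' /proc/meminfo
}
vmalloc_info
```

After rebooting with the new parameters, run it again and check that VmallocTotal has grown accordingly.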
2. Software
2.1.1. The service dis_irm (for the low-level driver of the same name) fails to load after it has been installed for the first time.

Please follow the procedure below to determine the cause of the problem.

1. Verify that the adapter card has been recognized by the machine. This can be done as follows:
[root@n1 ~]# lspci -v | grep Dolphin
03:0c.0 Bridge: Dolphin Interconnect Solutions AS PSB66 SCI-Adapter D33x
        Subsystem: Dolphin Interconnect Solutions AS: Unknown device 2200
If this command does not show any output similar to the example above, the adapter card has not been recognized. Please try to power-cycle the system according to FAQ Q: 1.1.1. If this does not solve the issue, a hardware failure is possible. Please contact Dolphin support in such a case. 2. Check the syslog for relevant messages. This can be done as follows:
# dmesg | grep SCI
Depending on which messages you see, proceed as described below:

SCI Driver : Preallocation failed
    The driver failed to preallocate memory which will be used to export memory to remote nodes. Rebooting the node is the simplest solution to defragment the physical memory space. If this is not possible, or if the message appears even after a reboot, you need to adapt the preallocation settings (see Section 3.1, dis_irm.conf). See FAQ Q: 1.1.3.
3. If the driver still fails to load, please contact Dolphin support and provide the driver's syslog messages:
# dmesg > /tmp/syslog_messages.txt
2.1.2. Although the Network Manager is running on the frontend, and all nodes run the Node Manager, configuration changes are not applied to the adapters. I.e., the node ID is not changed according to what is specified in /etc/dis/dishosts.conf on the frontend.

The adapters in a node can only be re-configured when they are not in use. This means that no adapter resources may be allocated via the dis_irm kernel module. To achieve this, make sure that upper layer services that use dis_irm (like dis_sisci and dis_supersockets) are stopped. On most Linux installations, this can be achieved like this (dis_services is a convenience script that comes with the Dolphin software stack):
# dis_services stop
...
# service dis_irm start
...
# service dis_nodemgr start
...
2.1.3. The Network Manager on the frontend refuses to start. In most cases, the interconnect configuration /etc/dis/dishosts.conf is corrupted. This can be verified with the command testdishosts. It will report problems in this configuration file, as in the example below:
# testdishosts
socket member node-1_0 does not represent a physical adapter in dishosts.conf
In this case, the adapter name in the socket definition was misspelled. If testdishosts reports a problem, you can either try to fix /etc/dis/dishosts.conf manually, or re-create it with dishostseditor. If this does not solve the problem, please check /var/log/scinetworkmanager.log for error messages. If you can not fix the problem reported in this logfile, please contact Dolphin support providing the content of the logfile.

2.1.4. After a node has booted, or after I restarted the Dolphin drivers on a node, the first connection to a remote node using SuperSockets delivers only Ethernet performance. Retrying the connection then delivers the expected SuperSockets performance. Why does this happen?

Make sure you run the node manager on all nodes of the cluster, and that the network manager on the frontend is correctly set up to include all nodes in the configuration (/etc/dis/dishosts.conf). The option Automatic Create Session must be enabled. This will ensure that the low-level "sessions" (Dolphin-internal) are set up between all nodes of the cluster, and a SuperSockets connection will immediately succeed. Otherwise, the set-up of the sessions will not be done until the first connection between two nodes is tried, but this is too late for the first connection to be established via SuperSockets.

2.1.5. Socket benchmarks show that SuperSockets are not active as the minimal latency is much more than 10 us.

The half roundtrip latency (ping-pong latency) with SuperSockets typically starts between 3 and 4 us for very small messages. Any value above 7 us for the minimal latency indicates a problem with the SuperSockets configuration, the benchmark methodology or something else. Please proceed as follows to determine the reason:

1. Is the SuperSockets service running on both nodes? /etc/init.d/dis_supersockets status should report the status running. If the status is stopped, try to start the SuperSockets service with /etc/init.d/dis_supersockets start.

2. Is LD_PRELOAD=libksupersockets.so set on both nodes? You can check using the ldd command. Assuming the benchmark you want to run is named sockperf, do ldd sockperf. The libksupersockets.so should appear at the very top of the listing.

3. Are the SuperSockets configured for the interface you are using? This is a possible problem if you have multiple Ethernet interfaces in your nodes with the nodes having different hostnames for each interface. SuperSockets may be configured to accelerate only some of the available interfaces. To verify this, check which IP addresses (or subnet masks) are accelerated by SuperSockets by looking at /proc/net/af_sci/socket_maps (Linux) and use those IP addresses (or related hostnames) that are listed in this file.

4. If the SuperSockets service refuses to start, or only starts into the mode running, but not configured, you probably have a corrupted configuration file /etc/dis/dishosts.conf: verify that this file is identical to the same file on the frontend. If not, make sure that the Network Manager is running on the frontend (/etc/init.d/dis_networkmgr start).

5. If the dishosts.conf files are identical on frontend and node, they could still be corrupted. Please run the dishostseditor on the frontend to have it load /etc/dis/dishosts.conf; then save it again (dishostseditor will always create syntactically correct files).

6. Please check the system log using the dmesg command. Any output there from either dis_ssocks or af_sci should be noted and reported to <support@dolphinics.com>.

2.1.6. I am running a mixed 32/64-bit platform, and while the benchmarks latency_bench and sockperf from the DIS installation show good performance of SuperSockets, other applications only show Ethernet performance for socket communication.
Please use the file command to verify whether the applications that fail to use SuperSockets are 32-bit applications. If they are, please verify whether the 32-bit SuperSockets library can be found as /opt/DIS/lib/libksupersockets.so (this is a link). If this file is not found, it could not be built due to a missing or incomplete 32-bit compilation environment on your build machine. This problem is indicated by the SIA message #* WARNING: 32-bit applications may not be able to use SuperSockets. If 32-bit libraries can not be built on a 64-bit platform, the RPM packages will still be built successfully (without 32-bit libraries included), as many users of 64-bit platforms run only 64-bit applications. To fix this problem, make sure that the 32-bit versions of the glibc and libgcc-devel packages are installed on your build machine, and re-build the binary RPM packages using the SIA option --build-rpm, making sure that the warning message shown above does not appear. Then, replace the existing RPM package DolphinSuperSockets with the one you have just built. Alternatively, you can perform a complete re-installation.

2.1.7. I have added the statement export LD_PRELOAD=libksupersockets.so to my shell profile to enable the use of SuperSockets. This works well on some machines, but on other machines, I get the error message
ERROR: ld.so object 'libksupersockets.so' from LD_PRELOAD cannot be preloaded : ignore
whenever I log in. How can this be fixed?

This error message is generated on machines that do not have SuperSockets installed. On these machines, the linker can not find the libksupersockets.so library. This can be fixed by setting the LD_PRELOAD environment variable only if SuperSockets are running. For a sh-type shell such as bash, use the following statement in the shell profile ($HOME/.bashrc):
[ -d /proc/net/af_sci ] && export LD_PRELOAD=libksupersockets.so
2.1.8. How can I permanently enable the use of SuperSockets for a user?

This can be achieved by setting the LD_PRELOAD environment variable in the user's shell profile (i.e. $HOME/.bashrc for the bash shell). This should be done conditionally by checking if SuperSockets are running on this machine:
[ -d /proc/net/af_sci ] && export LD_PRELOAD=libksupersockets.so
Of course, it is also possible to perform this setting globally (in /etc/profile).

2.1.9. I can not build SISCI applications that are able to run on my cluster because the frontend (where the SISCI-devel package was installed by the SIA) is a 32-bit machine, while my cluster nodes are 64-bit machines (or vice versa). I fail to build the SISCI applications on the nodes as the SISCI header files are missing. How can this deadlock be solved?

When the SIA installed the cluster, it stored the binary RPM packages in the directories node_RPMS, frontend_RPMS and source_RPMS. You will find a SISCI-devel RPM that can be installed on the nodes in the node_RPMS directory. If you can not find this RPM file, you can recreate it on one of the nodes using the SIA with the --build-rpm option. Once you have the Dolphin-SISCI-devel binary RPM, you need to install it on the nodes using the --force option of rpm because the library files conflict between the installed SISCI and the SISCI-devel RPM:
# rpm -i --force Dolphin-SISCI-devel.<arch>.<version>.rpm
2. SIA Options
In addition to the different operating modes, a number of options are available that influence the operation. Not all options have an impact on all operating modes.
If this option is provided, existing configuration files like /etc/dis/dishosts.conf will not be considered.
This will install into /usr/dolphin. It is recommended to install into a dedicated directory that is located on a local storage device (not mounted via the network). When doing a full cluster install (--install-all, or default operation), the same installation path will be used on all nodes, the frontend and potentially the installation machine (if different from the frontend).
The installer does not verify if the provided packages match the installation target, but the RPM installation itself will fail in this case.
On a 16 node cluster, this will make the dis_irm allocate 8 + 16*8 = 136MB on each node.
Note
The operating system can not use preallocated memory for other purposes - it is effectively invisible. Setting MB to -1 will disable all modifications to this configuration option, and the fixed default of 16MB will be preallocated independently of the number of nodes. Setting MB to 0 is also valid (8 MB will be allocated). This option changes a value in the module configuration file dis_irm.conf. It is only effective on an initial installation. An existing configuration file dis_irm.conf will never be changed, i.e. when upgrading an existing installation.
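The sizing rules above can be sketched as two small helpers (the function names are illustrative, not part of the SIA): the installer default is 8MB base plus 8MB per node, while the MB option values -1 and 0 map to a fixed 16MB and 8MB, respectively.

```shell
# Default preallocation: 8MB base plus 8MB per node (values from the text)
default_prealloc_mb() { echo $((8 + $1 * 8)); }   # $1 = number of nodes

# Effective preallocation for an explicitly configured MB value
effective_prealloc_mb() {
    case "$1" in
        -1) echo 16 ;;     # modifications disabled: fixed 16MB default
        0)  echo 8 ;;      # 0 is valid: 8MB will be allocated
        *)  echo "$1" ;;   # otherwise the configured number of MB
    esac
}

default_prealloc_mb 16   # prints 136, matching the 16 node example above
```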
The script will look for both configuration files in /tmp. If you need to specify the two configuration files being stored in different locations, use the options --dishostsconf <filename> and --networkmgr-conf <filename>, respectively, to specify where each of the configuration files can be found.
After this command returns, your cluster is freshly installed, provided that no error messages can be found in the file install.log.
This will remove all packages from the node and stop all drivers (if they are not in use). A more thorough cleanup, including all configuration data and possible remnants of non-SIA installations, can be achieved with the --wipe option:
--wipe
Note
Only one sciadmin process can connect to the network manager at any time. If you should ever need to connect to the network manager while another sciadmin process is blocking the connection, you can restart the network manager to terminate this connection. Afterwards, you can connect to the network manager from your sciadmin process (which needs to be running on a different machine than the other sciadmin process).
sciadmin Reference
Red pencil strokes indicate that a link is broken. Typically, a cable is unplugged, or not seated well.
Yellow pencil strokes indicate that links have been disabled. Links are typically disabled when there are broken cables somewhere else in the ringlet and automatic fail over recovery has been enabled.

A red dot (cranberry) indicates that the node has lost connectivity to other nodes in the cluster.
Blue pencil strokes indicate that an administrator has chosen to disable this link in sciadmin. This may be done if you want to debug the cluster.
2.2. Operation
2.2.1. Cluster Status
The area at the top right informs about the current cluster status and shows settings of sciadmin and the connected network manager. A number of settings can be changed in the Cluster Settings dialog that is shown when pressing the Settings button.

Fabric status shows the current status of the fabric: UP, DEGRADED, FAILED or UNSTABLE (see below).

Check Interval SCIAdmin shows the number of seconds between each time the Network Manager sends updates to the Dolphin Admin GUI.

Check Interval Network Manager shows the number of seconds between each time the Network Manager receives updates from the Node Managers.

Topology shows the current topology of the fabric.

Auto Rerouting shows the current automatic fail over recovery setting (On, Off or Default).

Fabric is UP when all nodes are operational and all links are ok; everything is therefore plotted in green.
Fabric is DEGRADED when all nodes are operational and some links are broken, but we still have full connectivity. In the snapshot below, the input cable of link 0 of node tiger-1, which is the output cable at node tiger-3, is defunct (this typically means unplugged) and the link is therefore plotted in red. The other links on this ring have been disabled by the network manager and are therefore plotted in yellow. All other links are functional and plotted in green. To get more information on the interconnect status for node tiger-1, get its diagnostics via Node Diag -V 1.
Fabric is in status FAILED if several links are broken in a way that breaks the full connectivity. In this case, the input cable of link 0 and the output cable of link 1 are defunct. Node tiger-1 can not communicate via SCI in this situation, and SuperSockets-driven sockets will have fallen back to Ethernet. Because the cluster is cabled in an interleaved pattern, the link 1 output cable of tiger-1 is the link 1 input cable of tiger-4, and not tiger-7 as it would be the case for non-interleaved cabling.
The fabric status is also set to FAILED if one or more nodes are dead, as a dead node can not be reached via SCI. The reasons for a node being dead can be:

Node is not powered up. Solution: power up the node.

Node has crashed. Solution: reboot the node.

The IRM low-level driver is not running. Solution: start the IRM driver like
# service dis_irm start
The node manager is not running. Solution: start the node manager like
# service dis_nodemgr start
The adapter is in an invalid state or is missing. Please check the node, and also consider the related topic in the FAQ (Q: 1.1.1).
Connect connects to the network manager running on the local or a remote machine.

Disconnect disconnects from the network manager.

Refresh refreshes the status of the nodes and the interconnect (instead of waiting for the update interval to expire).

Switch to Debug Statistics View shows the value of selected counters of each adapter instead of the node icons, which is useful for debugging fabric problems.
Each fabric in the cluster has a sub-menu Fabric <X>. Within this sub-menu, Diag (-V 0), Diag (-V 1) and Diag (-V 9) are diagnostic functions that can be used to get more detailed information about a fabric that shows problem symptoms. Diag (-V 0) prints only errors that have been found. Diag (-V 1) prints more verbose status information (verbosity level 1). Diag (-V 9) prints the full diagnostic information including all error counters (verbosity level 9).
Diag -clear clears all the error counters in the Dolphin Express interconnect adapters. This helps to observe if error counters are changing. Diag -prod prints production information about the Dolphin Express interconnect adapters (serial number, card type, firmware revision etc.). The Test option is described in Section 4.2, Traffic Test.

The other commands in the Cluster menu are:

Settings displays the Cluster Settings dialog (see below).

Reboot cluster nodes reboots all cluster nodes after a confirmation.

Power down cluster nodes powers down all cluster nodes after a confirmation.

Toggle Network Manager Verbose Settings increases/decreases the amount of logging from the Dolphin Network Manager.

Select the Arrange Fabrics option to make sure that the different adapters in your hosts are connected to the same fabric. This option is only displayed for clusters with more than one fabric.

Test Cable Connections is described in Section 4.1, Cable Test.
Check Interval Admin alters the number of seconds between each time the Network Manager sends updates to the SCIAdmin GUI.

Check Interval Network Manager alters the number of seconds between each time the Network Manager receives updates from the Node Managers.

Topology lets you select the topology of the cluster, while Topology found displays the auto-determined topology. Changes to the topology setting can be performed with dishostseditor.

Auto Rerouting lets you enable automatic fail over recovery (On), freeze the routing in its current state (Off), or use the default routing tables in the driver (Default); the latter also means that no automatic rerouting will take place.

Nodes in X,Y,Z dimension shows how the interconnect is currently dimensioned. Changes to the dimension settings can be performed with dishostseditor.

Remove Session to dead nodes lets you decide whether to remove the session to nodes that are unavailable.

Wait before removing session defines the number of seconds to wait until removing sessions to a node that has died or become inaccessible by other means.

Automatic Create Sessions to new nodes lets you decide if the Network Manager shall create sessions to all available nodes.

Alert script lets you enable/disable the use of a script that reports the cluster status to an administrator.
Link Frequency sets the frequency of a link. It is not recommended to change the default setting.

Prefetch Memsize shows the maximum amount of remote memory that can be accessed by this node. A changed value will not become effective until the IRM driver is restarted on the node, which has to be done outside of sciadmin. Setting this value too high (> 512MB) can cause problems with some machines, especially on 32-bit platforms.

SCI LINK 0 / 1 / 2 allows setting the way a link is controlled: Automatic lets the network manager control the link, enabling and disabling it as required by the link and the interconnect status. Disabled forces a link down. This is a per-session setting (the link will be under control of the network manager if it is restarted), and only required as a temporary measure for trouble shooting. The disable link option can also be used as a temporary measure to disable an unstable adapter or ringlet so that it does not impose unnecessary noise on the adapters. If such an unlikely event occurs, please contact Dolphin support. A manually disabled link is marked blue in the sciadmin interconnect display, as shown in the screenshot below.
Warning
Please note that when Auto Rerouting is enabled (default setting), disabling a link within a ringlet will disable the complete ringlet. Disabling too many links can thus isolate nodes from access to the Dolphin Express interconnect.
Figure B.10. Link disabled by administrator (Disabling the links on the machine with hostname tiger-5 takes down the corresponding links on the other machines that share the same ringlet.).
Warning
Please note that while this test is running, all traffic over the Dolphin Express interconnect will be blocked. Although this introduces no communication errors other than the delay, it is recommended to run the test on an idle cluster. SuperSockets will fall back to Ethernet while this test is running.
Note
To perform this test, the SISCI RPM has to be installed on all nodes. This is the case if the installation was performed via SIA. If SISCI is not installed on a node, an error will be logged and displayed as shown below.
Warning
Please note that while this test is running, all traffic over the Dolphin Express interconnect will be blocked. Although this introduces no communication errors other than the delay, it is recommended to run the test on an idle cluster. SuperSockets will fall back to Ethernet while this test is running.
Figure B.13. Result of fabric test without installing all the necessary RPMs
1.1. dishosts.conf
The file dishosts.conf is used as a specification of the Dolphin Express interconnect (in a way, just like /etc/hosts specifies nodes on a plain IP-based network). It is a system-wide configuration file and should be located with its full path on all nodes at /etc/dis/dishosts.conf. Templates of this file can be found in /opt/DIS/etc/dis/. A syntactic and semantic validation of dishosts.conf can be done with the tool testdishosts. The Dolphin network manager and diagnostic tools will always assume that the current file dishosts.conf is valid. If dynamic information read from the network contradicts the information read from the dishosts.conf file, the Dolphin network manager and diagnostic tools will assume that components are misconfigured, faulty, or removed for repair.
dishosts.conf is by default automatically distributed to all nodes in the cluster when the Dolphin network management software is started. Therefore, edit and maintain this file on the frontend only. You should create and maintain dishosts.conf using the dishostseditor GUI (Unix: /opt/DIS/sbin/dishostseditor). Normally, there is no reason to edit this file manually. To make changes in dishosts.conf effective, the network manager needs to be restarted like
# service dis_networkmgr restart
If SuperSockets settings have been changed, dis_ssocks_cfg needs to be run on every node, as SuperSockets are not controlled by the network manager. The following sections describe the keywords used.
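Re-running dis_ssocks_cfg on every node can be scripted from the frontend. The sketch below uses assumed hostnames and prints the remote commands instead of executing them; replace the echo with a real ssh invocation for your cluster:

```shell
# Apply changed SuperSockets settings on each node (hostnames are examples).
run_ssocks_cfg() {
    for node in "$@"; do
        # Replace 'echo' with the real remote invocation to execute it.
        echo ssh "$node" /opt/DIS/sbin/dis_ssocks_cfg
    done
}

run_ssocks_cfg tiger-1 tiger-2 tiger-3 tiger-4
```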
HOSTNAME: <hostname/IP>
A Dolphin network node may hold several physical adapters. Information about a node's physical adapters is listed right below the hostname. All nodes specified by a HOSTNAME need at least one physical adapter. This
physical adapter has to be specified on the next line after the HOSTNAME. The physical adapters are associated with the keyword ADAPTER.
#Keyword   name       nodeid   adapter
ADAPTER:   host1_a0   4        0
ADAPTER:   host1_a1   4        1
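Putting HOSTNAME and ADAPTER together, a minimal dishosts.conf fragment for two single-adapter nodes might look like the following (hostnames and node IDs are illustrative; the real file should be generated with dishostseditor):

```
HOSTNAME: node-1
ADAPTER: node1_a0 4 0

HOSTNAME: node-2
ADAPTER: node2_a0 8 0
```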
Defines a virtual striping adapter comprising two physical adapters, which will be used for automatic data striping (also referred to as channel bonding). Striping adapters will also be used as redundant adapters in the case of network failure.
STRIPE: host1_s host1_a0 host1_a1
Defines a virtual redundant adapter comprising two physical adapters, which will be used for automatic fail over in case one of the fabrics fails.
REDUNDANT: host1_r host1_a0 host1_a1
Starting with DISHOSTVERSION 2, SuperSockets can handle dynamic IP-to-nodeId mappings, i.e. a certain IP address does not need to be bound to a fixed machine but can roam in a pool of machines. The address resolution is done at runtime. For such a configuration, a new type of adapter must be specified:

SOCKETADAPTER: <socket adaptername> [ SINGLE | STRIPE | REDUNDANT ] <adapterNo> [ <adapterNo> ... ]

This keyword basically only defines an adapter number which is not associated with any nodeId. Example:
SOCKETADAPTER: sockad_s STRIPE 0 1
Defines socket adapter "sockad_s" in striping mode using physical adapters 0 and 1. The resulting internal adapter number is 0x2003. Such socket adapters can now be used to define dynamic mappings, and, as an extension over DISHOSTVERSION 1, whole networks can be specified for dynamic mappings:

SOCKET: [ <ip> | <hostname> | <network/mask_bits> ] <socket adapter>

Enables the given address/network for SuperSockets and associates it with a socket adapter. It is possible to mix dynamic and static mappings, but there must be no conflicting entries. Example:
SOCKET: host1 sockad_s
SOCKET: host2 sockad_s
SOCKET: host3 host3_s
SOCKET: 192.168.10.0/24 sockad_s
1.2. networkmanager.conf
The networkmanager.conf specifies the startup parameters for the Dolphin Network Manager. It is created by the dishostseditor.
1.3. cluster.conf
This file must not be edited by the user. It is a configuration file of the Network Manager that consists of the user-specified settings from networkmanager.conf and derived settings of the cluster (nodes). It is created by the Network Manager.
2. SuperSockets Configuration
The following sections describe the configuration files that specifically control the behaviour of Dolphin SuperSockets. Next to these files, SuperSockets also retrieve important configuration information from dishosts.conf. To make changes in any of these files effective, you need to run dis_ssocks_cfg on every node. Changes do not apply to sockets that are already open.
2.1. supersockets_profiles.conf
This file defines system-wide settings for all SuperSockets applications using LD_PRELOAD. All settings can be overridden by environment variables named SSOCKS_<option> (like export SSOCKS_DISABLE_FALLBACK=1).

SYSTEM_POLL [ 0 | 1 ]   Usage of poll/select optimization. Default is 0, which means that the SuperSockets optimization for the poll() and select() system calls is used. This optimization typically reduces the latency without increasing the CPU load. To only use the native system methods for poll() and select(), set this value to 1.

RX_POLL_TIME <int>   Receive poll time [s]. Default is 30. Increasing this value may reduce the latency, as the CPU will spin longer waiting for new data before it blocks sleeping. Reducing this value will send the CPU to sleep earlier, but this may increase message latency.

TX_POLL_TIME <int>   Transmit poll time [s]. Default is 0, which means that the CPU does not spin at all when no buffers are available at the receiving side. Instead, it will immediately block until the receiver reads data from these buffers (which makes buffer space available again for sending). The situation of no available receive buffers rarely occurs, and increasing this value is not recommended.

MSQ_BUF_SIZE <int>   Message buffer size [byte]. Default is 128KB. This value determines how much data can be sent without the receiver reading it. It has no significant impact on bandwidth.
MIN_DMA_SIZE <int>   Minimum message size for DMA [byte]. Default is 0 (DMA disabled).

MAX_DMA_GATHER <int>   Maximum number of messages gathered into a single DMA transfer. Default is 1.

MIN_SHORT_SIZE <int>   Switch point [byte] from INLINE to SHORT protocol. Default depends on driver.

MIN_LONG_SIZE <int>   Switch point [byte] from SHORT to LONG protocol. Default depends on driver.

FAST_GTOD [ 0 | 1 ]   Usage of accelerated gettimeofday(). Default is 0, which disables this optimization. Set to 1 to enable it.

DISABLE_FALLBACK [ 0 | 1 ]   Controls fallback from SuperSockets to native sockets. Default is 0, which means fallback (and fallforward) is enabled. To ensure that only SuperSockets are used (i.e. for benchmarking), set it to 1.

ASYNC_PIO [ 0 | 1 ]   Usage of fully asynchronous transfers. Default is 1, which means that the SHORT and LONG protocols are processed by a dedicated kernel thread. By this, the sending process is available immediately, and the actual data transfer is performed asynchronously. This generally increases throughput and reduces CPU load without affecting small message latency. To disable asynchronous transfers, set this option to 0; in this case, all data transfers are performed by the CPU that runs the process that called the send function.
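For a single benchmark run, a setting can be overridden per shell rather than system-wide, using the SSOCKS_<option> environment naming described above:

```shell
# Force SuperSockets-only operation for this shell (no Ethernet fallback),
# then enable the preload library; start the benchmark from the same shell.
export SSOCKS_DISABLE_FALLBACK=1
export LD_PRELOAD=libksupersockets.so
# ./my_benchmark   (hypothetical application, started from this shell)
```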
2.2. supersockets_ports.conf
This file is used to configure the port filter for SuperSockets. If no such file exists, all ports will be enabled by default. It is, however, recommended to exclude all system ports. A suitable port configuration file is part of the SuperSockets software package. You can adjust it to your specific needs.
# Default port configuration for Dolphin SuperSockets
# Ports specifically enabled or disabled to run over SuperSockets.
# Any socket not specifically covered, is handled by the default:
EnablePortsByDefault yes

# Recommended settings:
# Disable the privileged ports used by system services.
DisablePortRange tcp 1 1023
DisablePortRange udp 1 1023

# Disable Dolphin Interconnect Manager service ports.
DisablePortRange tcp 3443 3445
The following keywords are valid:

EnablePortsByDefault [ yes | no ]   Determines the policy for unspecified ports.

DisablePortRange [ tcp | udp ] <from> <to>   Explicitly disables the given port range for the given socket type.

EnablePortRange [ tcp | udp ] <from> <to>   Explicitly enables the given port range for the given socket type.
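As a variant of the recommended settings above, a whitelist-style policy can be expressed with the same keywords. The port range below is an assumption for illustration, not a recommendation:

```
# Illustrative whitelist policy: everything disabled except an assumed
# application port range 5000-5999.
EnablePortsByDefault no
EnablePortRange tcp 5000 5999
```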
3. Driver Configuration
The Dolphin drivers are designed to adapt to the environment they are operating in; therefore, manual configuration is rarely required. The upper limit for memory allocation of the low-level driver is the only setting that may need to be adapted for a cluster, but this is also done automatically during the installation.
Warning
Changing parameters in these files can affect reliability and performance of the Dolphin Express interconnect.
3.1. dis_irm.conf
dis_irm.conf is located in the lib/modules directory of the DIS installation (default /opt/DIS) and contains options for the hardware driver (dis_irm kernel module). Only a few options are to be modified by the user. These options deal with the memory pre-allocation in the driver:
Warning
Changing values in dis_irm.conf other than those described below may cause the interconnect to malfunction. Only do so if instructed by Dolphin support. Whenever a setting in this file is changed, the driver needs to be reloaded to make the new settings effective.
dis_max_segment_size_megabytes
    Sets the maximum size of a memory segment that can be allocated for remote access. Some systems may lock up if too much memory is requested. Unit: MiB.

max-vc-number
    Maximum number of virtual channels (one virtual channel is needed per remote memory connection, i.e. 2 per SuperSockets connection). Valid values: integers > 0; the upper limit is given by the memory consumed, and values > 16384 are typically not necessary. Default: 1024.

number-of-megabytes-preallocated
    Defines the number of MiB of memory the IRM shall try to allocate upon initialization, in as few blocks as possible. Valid values: 0 disables preallocation; a value > 0 is the number of MiB to preallocate. Default: 16 (may be increased by the installer script).

use-sub-pools-for-preallocation
    If the IRM fails to allocate the amount of memory specified by number-of-megabytes-preallocated, it will by default repeatedly decrease the amount and retry until success. By enabling use-sub-pools-for-preallocation, the IRM will instead continue to allocate memory (possibly in small chunks) until the amount specified by number-of-megabytes-preallocated is reached. Valid values: 0 disables sub-pools; 1 enables sub-pools. Default: 1.

block-size-of-preallocated-blocks
    To preallocate not a single large block, but multiple blocks of the same size, this parameter has to be set to a value > 0. Preallocating memory this way is useful if the application to be run on the cluster uses many memory segments of the same (relatively small) size. Valid values: 0 means memory is not preallocated in this manner; a value > 0 gives the size in bytes (aligned upwards to a page size boundary) of each memory block. Default: 0.

number-of-preallocated-blocks
    The number of blocks to be preallocated (see previous parameter).

minimum-size-to-allocate-from-preallocated-pool
    Sets a lower limit on the size of memory segments the IRM may try to allocate from the preallocated pool: memory requests smaller than this size are satisfied by requesting additional memory from the system instead. Due to the way the preallocation mechanism works, there is a "hard" lower limit of one SCI_PAGE (currently 8K). The minimum size is defined in 1K blocks (unit: KiB). Valid values: 0 means always allocate from preallocated memory; a value > 0 means memory requests smaller than this value are served from non-preallocated memory. Default: 0.

try-first-to-allocate-from-preallocated-pool
    Directs the IRM when to try to use preallocated memory. Valid values: 0 means preallocated memory is only used when the system can't honor a request for additional memory; 1 means the IRM prefers to allocate memory from the preallocated pool when possible.

notes-disabled

notes-on-log-file-only
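As an illustration, a dis_irm.conf fragment raising the preallocation could look like the following. The values are examples, not recommendations, the name=value; form is assumed to mirror the dis_ssocks.conf template in the next section, and the driver must be reloaded for the change to take effect:

```
# Example only: preallocate 64 MiB, filling up via sub-pools if needed
number-of-megabytes-preallocated=64;
use-sub-pools-for-preallocation=1;
```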
3.2. dis_ssocks.conf
Configuration file for SuperSockets (dis_ssocks) kernel module. If a value different from the default is required, edit and uncomment the appropriate line.
#tx_poll_time=0;
#rx_poll_time=30;
#min_dma_size=0;
#min_short_size=1009;
#min_long_size=8192;
#address_family=27;
#rds_compat=0;
The following keywords are valid:

tx_poll_time   Transmit poll time [s]. Default is 0, which means that the CPU does not spin at all when no buffers are available at the receiving side. Instead, it will immediately block until the receiver reads data from these buffers (which makes buffer space available again for sending). The situation of no available receive buffers rarely occurs, and increasing this value is not recommended.

rx_poll_time   Receive poll time [s]. Default is 30. Increasing this value may reduce the latency, as the CPU will spin longer waiting for new data before it blocks sleeping. Reducing this value will send the CPU to sleep earlier, but this may increase message latency.
min_dma_size   Minimum message size for using DMA (0 means no DMA). Default is 0.

min_short_size   Minimum message size for using the SHORT protocol. Default and maximum is 1009.

min_long_size   Minimum message size for using the LONG protocol. Default is 8192.

address_family   AF_SCI address family index. Default value is 27. If not set, the driver will automatically choose another index between 27 and 32 until it finds an unused index. The chosen index can be retrieved via the /proc file system like cat /proc/net/af_sci/family. If this value is set explicitly, this value will be chosen, and no search for unused values is performed. Generally, this value is only required if SuperSockets should be used explicitly without the preload library.

rds_compat
2. IRM Resource Limitations

The IRM (Interconnect Resource Manager) manages the hardware and related software resources of the Dolphin Express interconnect. Some resources are allocated once when the IRM is loaded. The default settings are sufficient for typical cluster sizes and usage scenarios. However, if you hit a resource limit, it is possible to increase the
3. SuperSockets
UDP Broadcasts

SuperSockets do not support UDP broadcast operations.

Heterogeneous Cluster Operation (Endianness)

By default, SuperSockets are configured to operate in clusters where all nodes use the same endian representation (either little endian or big endian). This avoids costly endian conversions and works fine in typical clusters where all nodes use the same CPU architecture. Only if you mix nodes with Intel or AMD (x86 or x86_64) CPUs with nodes using PowerPC- or SPARC-based CPUs will this default setting not work. In this case, an internal flag needs to be set. Please contact Dolphin support if this situation applies to you.

Sending and Receiving Vectors

The vector length for the writev() and sendmsg() functions is limited to 16. For readv() and recvmsg(), the vector length is not limited.

Socket Options

The following socket options are supported by SuperSockets for communication over Dolphin Express:

SO_DONTROUTE (implicit, as SuperSockets don't use IP packets for data transport and thus are never routable)

TCP_NODELAY

SO_REUSEADDR

SO_TYPE

The following socket options are passed to the native (fallback) socket: SO_SNDBUF and SO_RCVBUF (the buffer size for SuperSockets is fixed). All other socket options are not supported (ignored).

Fallforward for Stream Sockets

While SuperSockets offer fully transparent fall-back and fall-forward between Dolphin Express-based communication and native (Ethernet) communication for any socket (TCP or UDP) while it is open and used, there is currently a limitation on sockets when they connect: a socket that has been created via SuperSockets and connected to a remote socket while the Dolphin Express interconnect was not operational will not fall forward to Dolphin Express communication when the interconnect comes up again. Instead, it will continue to communicate via the native network (Ethernet). This is a rare condition that typically will not affect operation. If you suspect that a node does not perform up to expectations, you can either contact Dolphin support to help you diagnose the problem, or restart the application, making sure that the Dolphin Express interconnect is up. Removal of this limitation, as well as a simple way to diagnose the precise state of a SuperSockets-driven socket, is scheduled for updated versions of SuperSockets.

Resource Limitations

SuperSockets allocate resources for communication via Dolphin Express by means of the IRM. Therefore, the resource limitations listed for the IRM indirectly apply to SuperSockets as well. To resolve such limitations if they occur (i.e. when using a very large number of sockets per node), please refer to the relevant IRM section above. SuperSockets log messages to the syslog for two typical out-of-resources situations:

No more VCs available. The maximum number of virtual channels needs to be increased (see Section 2, IRM).

No more segment memory available. The amount of pre-allocated memory needs to be increased (see Section 2, IRM).