Once the adapters are installed, the software is installed next. On the nodes, this comprises the hardware driver and additional kernel modules, the user-space libraries and the node manager. On the frontend, the network manager and the cluster administration tool are installed.
An additional RPM for SISCI development (SISCI-devel) is created for both the frontend and the nodes, but is not installed. It can be installed as needed if SISCI-based applications or libraries (like NMPI) need to be compiled from source.
The integrated cluster and frontend installation is the default operation of SIA, but can be specified explicitly with the --install-all option. It works as follows:
The SIA is executed on the installation machine with root permissions. The installation machine is typically the machine to serve as frontend, but can be any other machine if necessary (see Section 1.2.1, “No X / GUI on Frontend”). The SIA controls the building, installation and test operations on the remote nodes via ssh. Therefore, password-less ssh to all remote nodes is required.
If password-less ssh access is not set up between the installation machine, frontend and nodes, SIA offers to set this up during the installation. The root passwords for all machines are required for this.
The binary RPMs for the nodes and the frontend are built on the kernel build machine and the frontend, respectively. The kernel build machine needs to have the kernel headers and configuration installed, while the frontend and the installation machine only compile user-space applications.
The node RPMs with the kernel modules are installed on all nodes, the kernel modules are loaded and the node manager is started. At this stage, the interconnect is not yet configured.
On an initial installation, the dishostseditor is installed and executed on the installation machine to create the cluster configuration files. This requires user interaction.
The cluster configuration files are transferred to the frontend, and the network manager is installed and started on the frontend. It will in turn configure all nodes according to the configuration files. The cluster is now ready to utilize the Dolphin Express interconnect.
A number of tests are executed to verify that the cluster is functional and to get basic performance numbers.
For other operation modes, such as installing specific components on the local machine, please refer to Appendix A, Self-Installing Archive (SIA) Reference.
Log into the chosen installation machine, become root and make sure that the SIA file is stored in a directory with write access (/tmp is fine). Execute the script:
# sh DIS_install_<version>.sh
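Before launching the SIA, it can help to confirm that the directory holding the archive is writable. The following is a minimal sketch of such a check; the function name check_sia_dir is hypothetical and not part of the Dolphin software.

```shell
# Hypothetical helper (not part of the SIA): succeeds if the directory
# that holds the SIA file exists and is writable by the current user.
check_sia_dir() {
    dir=$1
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir is writable"
    else
        echo "error: $dir is missing or not writable"
        return 1
    fi
}
```

For example, check_sia_dir /tmp could be run as root right before invoking the installer.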
The script will ask questions to retrieve information for the installation. You will notice that all questions are Yes/no questions, and that the default answer is marked by a capital letter, which can be chosen by just pressing Enter. A typical installation looks like this:
[root@scimple tmp]# sh DIS_install_3.3.0.sh
Verifying archive integrity... All good.
Uncompressing Dolphin DIS 3.3.0
#* Logfile is /tmp/DIS_install.log_140 on tiger-0
#*
#+ Dolphin ICS - Software installation (version: 1.52 $ of: 2007/11/09 16:31:32 $)
#+
#* Installing a full cluster (nodes and frontend) .
#* This script will install Dolphin Express drivers, tools and services
#+ on all nodes of the cluster and on the frontend node.
#+
#+ All available options of this script are shown with option '--help'
# >>> OK to proceed with cluster installation? [Y/n]y
# >>> Will the local machine <tiger-0> serve as frontend? [Y/n]y
The default choice is to use the local machine as frontend. If you answer n, the installer will ask you for the hostname of the designated frontend machine. Each cluster needs its own frontend machine.
Please note that the complete installation is logged to a file which is shown at the very top (here: /tmp/DIS_install.log_140). In case of installation problems, this file is very useful to Dolphin support.
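Since the installer also reports logfiles on other machines later on, it can be convenient to collect all logfile locations from a saved copy of the installer output. The sketch below assumes that the output was captured to a file with one message per line; the function name extract_logfiles is hypothetical.

```shell
# Hypothetical helper: reads captured installer output on stdin and prints
# "<host>:<logfile>" for every "#* Logfile is <file> on <host>" message.
extract_logfiles() {
    sed -n 's/^#\* Logfile is \([^ ]*\) on \([^ ]*\)$/\2:\1/p'
}
```

Given the line "#* Logfile is /tmp/DIS_install.log_140 on tiger-0", this prints "tiger-0:/tmp/DIS_install.log_140".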
#* NOTE: Cluster configuration files can be specified now, or be generated
#+ ..... during the installation.
# >>> Do you have a 'dishosts.conf' file that you want to use for installation? [y/N]n
Because this is the initial installation, no installed configuration files could be found. If you have prepared or received configuration files, they can be specified now by answering y. In this case, no GUI application needs to run during the installation, allowing for a shell-only installation.
For the default answer, the hostnames of the nodes need to be specified (see below), and the cluster configuration is created later on using the GUI application dishostseditor.
#* NOTE:
#+ No cluster configuration file (dishosts.conf) available.
#+ You can now specify the nodes that are attached to the Dolphin
#+ Express interconnect. The necessary configuration files can then
#+ be created based on this list of nodes.
#+
#+ Please enter hostname or IP addresses of the nodes one per line.
#* When done, enter a single full period ('.').
#+ (proposed hostname is given in [brackets])
# >>> node hostname/IP address <full period '.' when done> []tiger-1
# >>> node hostname/IP address <full period '.' when done> [tiger-2]
-> tiger-2
# >>> node hostname/IP address <full period '.' when done> [tiger-3]
-> tiger-3
# >>> node hostname/IP address <full period '.' when done> [tiger-4]
-> tiger-4
# >>> node hostname/IP address <full period '.' when done> [tiger-5]
-> tiger-5
# >>> node hostname/IP address <full period '.' when done> [tiger-6]
-> tiger-6
# >>> node hostname/IP address <full period '.' when done> [tiger-7]
-> tiger-7
# >>> node hostname/IP address <full period '.' when done> [tiger-8]
-> tiger-8
# >>> node hostname/IP address <full period '.' when done> [tiger-9]
-> tiger-9
# >>> node hostname/IP address <full period '.' when done> [tiger-10].
The hostnames or IP addresses of all nodes need to be entered. Where possible, the installer suggests a hostname in brackets. To accept a suggestion, just press Enter. Otherwise, enter the hostname or IP address. The data entered is verified to represent an accessible hostname. If a node has multiple IP addresses or hostnames, make sure you specify the one that is visible to the installation machine and the frontend.
When all hostnames are entered, enter a single full period . to finish.
#* NOTE:
#+ The kernel modules need to be built on a machine with the same kernel
#* version and architecture of the interconnect node. By default, the first
#* given interconnect node is used for this. You can specify another build
#* machine now.
# >>> Build kernel modules on node tiger-1 ? [Y/n]y
If you answer n at this point, you can enter the hostname of another machine on which the kernel modules are built. Make sure it matches the nodes for CPU architecture and kernel version.
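Whether a candidate build machine matches the nodes can be checked in advance by comparing the kernel release and architecture reported by uname. The following is a sketch under the assumption that this information has been gathered into lines of the form "<host> <kernel-release> <arch>"; the function name check_homogeneous is hypothetical, and the installer performs its own checks independently of this.

```shell
# Hypothetical helper: reads "<host> <kernel-release> <arch>" lines on stdin
# and reports whether all hosts share the same kernel release and architecture.
check_homogeneous() {
    awk '{ key = $2 " " $3
           if (NR == 1) first = key
           else if (key != first) bad = 1 }
         END { if (bad) print "heterogeneous"; else print "homogeneous" }'
}

# Possible way to gather the input (requires working ssh access):
#   for h in tiger-1 tiger-2; do
#       echo "$h $(ssh "root@$h" uname -r -m)"
#   done | check_homogeneous
```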
# >>> Can you access all machines (local and remote) via password-less ssh? [Y/n]y
The installer will later verify whether the password-less ssh access actually works. If you answer n, the installer will set up password-less ssh for you on all nodes and the frontend. You will need to enter the root password once for each node and the frontend.
The password-less ssh access remains active after the installation. To disable it again, remove the file /root/.ssh/authorized_keys from all nodes and the frontend.
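If you are unsure whether password-less ssh is already in place, you can test it before answering the question. The sketch below uses the standard OpenSSH options BatchMode and ConnectTimeout so that ssh fails instead of prompting for a password; the function name check_ssh is hypothetical, and the installer's own test may work differently.

```shell
# Hypothetical helper: tries a password-less root login on each given host.
# BatchMode=yes makes ssh fail rather than ask for a password.
check_ssh() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true 2>/dev/null; then
            echo "OK: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    return "$failed"
}
```

A call such as check_ssh tiger-1 tiger-2 prints one status line per host and returns a non-zero exit code if any host could not be reached without a password.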
#* NOTE:
#+ It is recommnended that interconnect nodes are rebooted after the
#+ initial driver installation to ensure that large memory allocations will succeed.
#+ You can omitt this reboot, or do it anytime later if necesary.
# >>> Reboot all interconnect nodes (tiger-1 tiger-2 tiger-3 tiger-4 tiger-5 tiger-6 tiger-7 tiger-8 tiger-9)? [y/N]n
For optimal performance, the low-level driver needs to allocate some amount of kernel memory. This allocation can fail on a system that has been under load for a long time. If you are not installing on a live system, rebooting the nodes is therefore offered here. You can perform the reboot manually later on to achieve the same effect.
If chosen, the reboot will be performed by the installer without interrupting the installation procedure.
#* NOTE:
#+ About to INSTALL Dolphin Express interconnect drivers on these nodes:
... tiger-1
... tiger-2
... tiger-3
... tiger-4
... tiger-5
... tiger-6
... tiger-7
... tiger-8
... tiger-9
#+ About to BUILD Dolphin Express interconnect drivers on this node:
... tiger-1
#+ About to install management and control services on the frontend machine:
... tiger-0
#* Installing to default target path /opt/DIS on all machines .. (or the current installation path if this is an update installation).
# >>> OK to proceed? [Y/n]y
The installer presents an installation summary and asks for confirmation. If you answer n at this point, the installer will exit and the installation needs to be restarted.
#* NOTE:
#+ Testing ssh-access to all cluster nodes and gathering configuration.
#+
#+ If you are asked for a password, the ssh access to this node without
#+ password is not working. In this case, you need to interrupt with CTRL-c
#+ and restart the script answering 'no' to the intial question about ssh.
... testing ssh to tiger-1
... testing ssh to tiger-2
... testing ssh to tiger-3
... testing ssh to tiger-4
... testing ssh to tiger-5
... testing ssh to tiger-6
... testing ssh to tiger-7
... testing ssh to tiger-8
... testing ssh to tiger-9
#+ OK: ssh access is working
#+ OK: nodes are homogenous
#* OK: found 1 interconnect fabric(s).
#* Testing ssh to other nodes
... testing ssh to tiger-1
... testing ssh to tiger-0
... testing ssh to tiger-0
#* OK.
The ssh access is tested, and some basic information is gathered from the nodes to verify that they are homogeneous, are equipped with at least one Dolphin Express adapter and meet the other requirements. If a required RPM package is missing, this is indicated here, with the option to install it (if yum can be used) or to fix the problem manually and retry.
If the test for homogeneous nodes fails, please refer to Section 2, “Installation of a Heterogeneous Cluster” for information on how to install the software stack.
#* Building node RPM packages on tiger-1 in /tmp/tmp.AEgiO27908
#+ This will take some minutes...
#* Logfile is /tmp/DIS_install.log_983 on tiger-1
#* OK, node RPMs have been built.
#* Building frontend RPM packages on scimple in /tmp/tmp.dQdwS17511
#+ This will take some minutes...
#* Logfile is /tmp/DIS_install.log_607 on scimple
#* OK, frontend RPMs have been built.
#* Copying RPMs that have been built:
/tmp/frontend_RPMS/Dolphin-NetworkAdmin-3.3.0-1.x86_64.rpm
/tmp/frontend_RPMS/Dolphin-NetworkHosts-3.3.0-1.x86_64.rpm
/tmp/frontend_RPMS/Dolphin-SISCI-devel-3.3.0-1.x86_64.rpm
/tmp/frontend_RPMS/Dolphin-NetworkManager-3.3.0-1.x86_64.rpm
/tmp/node_RPMS/Dolphin-SISCI-3.3.0-1.x86_64.rpm
/tmp/node_RPMS/Dolphin-SISCI-devel-3.3.0-1.x86_64.rpm
/tmp/node_RPMS/Dolphin-SCI-3.3.0-1.x86_64.rpm
/tmp/node_RPMS/Dolphin-SuperSockets-3.3.0-1.x86_64.rpm
The binary RPM packages matching the nodes and frontend are built and copied to the directory from where the installer was invoked. They are placed into the subdirectories node_RPMS and frontend_RPMS for later use (see the SIA option --use-rpms).
#* To install/update the Dolphin Express services like SuperSockets, all running
#+ Dolphin Express services needs to be stopped. This requires that all user
#+ applications using SuperSockets (if any) need to be stopped NOW.
# >>> Stop all DolpinExpress services (SuperSockets) NOW? [Y/n]y
#* OK: all Dolphin Express services (if any) stopped for upgrade.
On an initial installation, there will be no user applications using SuperSockets, so you can easily answer y right away.
#* Installing node tiger-1
#* OK.
#* Installing node tiger-2
#* OK.
#* Installing node tiger-3
#* OK.
#* Installing node tiger-4
#* OK.
#* Installing node tiger-5
#* OK.
#* Installing node tiger-6
#* OK.
#* Installing node tiger-7
#* OK.
#* Installing node tiger-8
#* OK.
#* Installing node tiger-9
#* OK.
#* Installing machine scimple as frontend.
#* NOTE:
#+ You need to create the cluster configuration files 'dishosts.conf'
#+ and 'networkmanager.conf' using the graphical tool 'dishostseditor'
#+ which will be launched now.
#+
#+ If the interconnect cables are not yet installed, you can create
#+ detailed cabling instruction within this tool (File -> Get Cabling Instructions).
#+ Then install the cables while this script is waiting.
# >>> Are all cables connected, and do all LEDs on the SCI adapters light green? [Y/n]
The nodes are installed, and the drivers and the node manager are started. Then, the basic packages are installed on the frontend, and the dishostseditor application is launched to create the required configuration files /etc/dis/dishosts.conf and /etc/dis/networkmanager.conf if they do not already exist. The script will wait at this point until the configuration files have been created with dishostseditor, and until you confirm that all cables have been connected according to the cabling instructions. This is described in the next section.
For typical problems at this point of the installation, please refer to Chapter 11, FAQ.
dishostseditor is a GUI tool that helps gathering the cluster configuration (and is used to create the cluster configuration file /etc/dis/dishosts.conf and the network manager configuration file /etc/dis/networkmanager.conf). A few global interconnect properties need to be set, and the position of each node within the interconnect topology needs to be specified.
When dishostseditor is launched, it first displays a dialog box where the global interconnect properties need to be specified (see Figure 4.1, “Cluster Edit dialog of dishostseditor”).
The dialog will let you enter the selected topology information (number of nodes in X-, Y- and Z- dimension) according to the topology type you selected. The product of all nodes in every dimension needs to be equal (for regular topologies) or less (for irregular topology variants).
The number of fabrics needs to be set to the minimum number of adapters in every node.
The topology settings should already be correct by default if dishostseditor is launched by the installation script. If the cables are not yet mounted (which is the recommended way of doing it), simply choose the settings that match the way you plan to install them.
However, if the cables are already in place and you are installing a cluster with a 2D- or 3D-torus interconnect topology, it is critical to verify that the actual cable installation matches the dimensions shown here. For example, a 12-node cluster can be set up as 3 by 4, 4 by 3 or even 2 by 6, and the setup script cannot verify that the cabling matches the dimensions that you selected. Remember that link 0 on the adapter boards (the one where the plug is right on the PCB of the adapter board) is mapped to the X-dimension, and link 1 on the adapter board (the one where the plug is on the piggy-back board) is mapped to the Y-dimension.
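To see which 2D dimensions are candidates for a given number of nodes in a regular topology, the factor pairs of the node count can be enumerated. The sketch below is only an illustration of that arithmetic; the function name list_2d_layouts is hypothetical and not part of dishostseditor.

```shell
# Hypothetical helper: prints every X-by-Y layout (both dimensions at least 2)
# whose product equals the given number of nodes.
list_2d_layouts() {
    n=$1
    x=2
    while [ "$x" -lt "$n" ]; do
        if [ $((n % x)) -eq 0 ]; then
            echo "${x}x$((n / x))"
        fi
        x=$((x + 1))
    done
}
```

For 12 nodes, this lists 2x6, 3x4, 4x3 and 6x2. Only one of these matches a given physical cabling, which is why the selected dimensions must be compared against the cables by hand.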
If your cluster operates within its own subnet and you want all nodes within this subnet to use SuperSockets (having Dolphin Express installed), you can simplify the configuration by specifying the address of this subnet in this dialog. To do so, activate the Network Address field and enter the cluster IP subnet address including the mask. For example, if all your nodes communicate via an IP interface with the address 192.168.4.*, you would enter 192.168.4.0/8 here.
SuperSockets will try to use Dolphin Express for any node in this subnet when it connects to another node of this subnet. If using Dolphin Express is not possible, e.g. because one or both nodes are only equipped with an Ethernet interface, SuperSockets will automatically fall back to Ethernet. Also, if a node gets assigned a new IP address within this subnet, you don't need to change the SuperSockets configuration. Assigning more than one subnet to SuperSockets is also possible, but this type of configuration is not yet supported by dishostseditor. See section ??? on how to edit dishosts.conf accordingly.
This type of configuration is required if the same node can be assigned varying IP addresses over time, as it is done for fail-over purposes where one machine takes over the identity of a machine that has failed. For standard setups where the assignment of IP addresses to nodes is static, it is recommended to not use this type of configuration, but instead use the default static SuperSockets configuration type.
If you want to be informed of any change of the interconnect status (e.g. an interconnect link was disabled due to errors, or a node has gone down and the interconnect traffic was rerouted), activate the checkbox Alert target and enter the alert target and the alert script to be executed. The default alert script is alert.sh and will send an e-mail to the address specified as alert target.
Other alert scripts can be created and used, which may require another type of alert target (e.g. a cell phone number to send an SMS). For more information on using status notification, please refer to Chapter 10, Advanced Topics, Section 1, “Notification on Interconnect Status Changes”.
In the next step, the main pane of the dishostseditor will present the nodes in the cluster arranged in the topology that was selected in the previous dialog. To change this topology and other general interconnect settings, you can always click in the Cluster Configuration area which will bring up the Cluster Edit dialog again.
If the font settings of your X server cause dishostseditor to print unreadable characters, you can change the font size and type with the drop-down box at the top of the window, next to the floppy disk icon.
At this point, you need to arrange the nodes (marked by their hostnames) such that the placement of each node in the torus as shown by dishostseditor matches its placement in the physical torus. You do this by assigning the correct hostname for each node by double-clicking its node icon which will open the configuration dialog of this node. In this dialog, select the correct machine name, which is the hostname as seen from the frontend, from the drop-down list. You can also type a hostname if a hostname that you specified during the installation was wrong.
After you have assigned the correct hostname to this machine, you may need to configure SuperSockets on this node. If you selected the Network Address in the cluster configuration dialog (see above), then SuperSockets will use this subnet address and will not allow for editing this property on the nodes. Otherwise, you can choose between 3 different options for each of the currently supported 2 SuperSockets-accelerated IP interfaces per node:
Do not use SuperSockets. If you set this option for both fields, SuperSockets can not be used with this node, although the related kernel modules will still be loaded.
Enter the hostname or IP address for which SuperSockets should be used. This hostname or IP address will be statically assigned to this physical node (its Dolphin Express interconnect adapter).
Choosing a static socket means that the mapping between the node (its adapters) and the specified hostname/IP address is static and will be specified within the configuration file dishosts.conf. All nodes will use this identical file (which is automatically distributed from the frontend to the nodes by the network manager) to perform this mapping.
This option works fine if the nodes in your cluster don't change their IP addresses over time, and is recommended as it does not incur any name resolution overhead.
Enter the hostname or IP address for which SuperSockets should be used. This hostname or IP address will be dynamically resolved to the Dolphin Express interconnect adapter that is installed in the machine with this hostname/IP address. SuperSockets will therefore resolve the mapping between adapters and hostnames/IP addresses dynamically. This incurs a certain initial overhead when the first connection between two nodes is set up and in some other specific cases.
This option is similar to using a subnet (see Section 3.3.1., “SuperSockets Network Address”), but resolves only the explicitly specified IP addresses (for all nodes) and not all possible IP addresses of a subnet. Use this option if nodes change their IP addresses or node identities move between physical machines, e.g. in a fail-over setup.
You should now generate the cabling instructions for your cluster. Please do this even if the cables are already installed: you really want to verify that the actual cable setup matches the topology you just specified. To create the cabling instructions, choose the menu item File -> Get Cabling Instructions. You can save and/or print the instructions. It is a good idea to print the instructions so you can take them with you to the cluster.
If the cables are already connected, please proceed with Section 3.4.2, “Verifying the Cabling”.
In order to achieve a trouble-free operation of your cluster, setting up the cables correctly is critical. Please take your time to perform this task properly.
The cables can be installed while nodes are powered up. The setup script will wait with a question for you to continue:
# >>> Are all cables connected, and do all LEDs on the SCI adapters light green? [Y/n]
Please proceed by connecting the nodes as described by the cabling instructions generated by the dishostseditor. The cabling instructions refer to link 0 and link 1 if you are using D352 adapters (for 2D-torus topology), and channel A and channel B in case of D350 adapters being used (for dual-channel operation). Each of the two links/channels will form an independent ring with its adjacent adapters, and thus has an IN and OUT connector to connect to these adjacent adapters. It is critical that you correctly locate the different links/channels on the back of the card, and the IN and OUT connectors.
For D352 (D350), link 0 (channel A) is formed by the connectors that are directly connected to the PCB (printed circuit board) of the adapter, while the connectors for link 1 (channel B) are located on the piggy-back board. This is illustrated in Figure 4.4, “Location of link 0 and link 1 on D352 adapter” and Figure 4.5, “Location of channel A and channel B on D350 adapter”, respectively. For both links (channels), the IN connectors are located at the lower end of the adapter, and the OUT connectors at the top of the adapter. The D351 adapter has only a single link (0), and the same location of IN and OUT connectors.
Please consider the hints below for connecting the cables:
Never apply force:
The plugs of the cable will move into the sockets easily. Make sure the orientation is correct.
The cables have a minimum bend diameter of 5cm.
This specification applies to black All-Best cables (part number D706), but not to the grey CHH cables (part number D707). With the CHH cables, the minimum bend diameter is 10cm.
Fasten evenly. When fastening the screws of the plugs, make sure you fasten both lightly before tightening them. Do not tighten only one screw of the plug, and then the other one, as this is likely to tilt the plug within the connector.
Fasten gently. Use a torque screw driver if possible, and apply a maximum of 0.4 Nm. As a rule of thumb: do not apply more torque with the screw driver than you possibly could using only your finger (if there was enough space to grip the screw).
Observe LEDs: When an adapter has both the input and the output of a link connected to its neighboring adapter, the LED should turn green and emit a steady light (not blinking).
Don't mix up links: When using a 2D-torus topology, it is important not to connect link 0 of one adapter with link 1 of another adapter. As described above, link 0 is the left pair of connectors on the Dolphin Express SCI interconnect adapter when the adapter is placed in a vertical position. To determine which side is the left one, hold the Dolphin Express interconnect adapter in a vertical position such that:
the blue "O" (indicating the OUT port) should be located at the top.
LEDs are also placed on the top of the adapter
the PCI/PCI-X/PCI-Express bus connector is mounted on the lower side of the adapter
The left pair of connectors on the Dolphin Express interconnect adapter is what we refer to as Link 0. Link 1 is the right pair of connectors on the Dolphin Express interconnect adapter when the adapter is placed in a vertical position.
If the links have been mixed up, the LED will still turn green, but packet routing will fail. The cabling test of sciadmin will reveal such cabling errors.
A green link LED indicates that the link between the output plug and input plug could be established and synchronized. It does not assure that the cable is actually placed correctly! It is therefore important to verify once more that the cables are plugged according to the cabling instructions generated by the dishostseditor!
If a pair of LEDs do not turn green, please perform the following steps:
Disconnect the cables. Make sure you connect an Output with an Input plug. Re-insert and fasten the plug according to the guidelines above.
If the LEDs still do not turn green, use a different cable.
If the LEDs still do not turn green, swap the cable of the problematic connection with a working one and observe if the problem moves with the cable.
Power-cycle the nodes with the orange LEDs according to Chapter 11, FAQ, Q: 1.1.1.
Contact Dolphin support if you can not make the LEDs turn green after trying all proposed measures.
When you are done connecting the cables, all LEDs have turned green and you have verified the connections, you can answer "Yes" to the question "Are all cables connected, and do all LEDs on the adapters light green?" and proceed with the next section to finalize the software installation.
Once the cables are connected, no more user interaction is required. Please confirm that all cables are connected and all LEDs are green, and the installation will proceed. The network manager will be started on the frontend, configuring all cluster nodes according to the configuration specified in dishosts.conf. After this, a number of tests are run on the cluster to verify that the interconnect was set up correctly and delivers the expected performance. You will see output like this:
#* NOTE: checking for cluster configuration to take effect:
... node tiger-1:
... node tiger-2:
... node tiger-3:
... node tiger-4:
... node tiger-5:
... node tiger-6:
... node tiger-7:
... node tiger-8:
... node tiger-9:
#* OK.
#* Installing remaining frontend packages
#* NOTE:
#+ To compile SISCI applications (like NMPI), the SISCI-devel RPM needs to be
#+ installed. It is located in the frontend_RPMS and node_RPMS directories.
#* OK.
If no problems are reported (like in the example above), you are done with the installation and can start to use your Dolphin Express accelerated cluster. Otherwise, refer to the next subsections and Section 3.7, “Interconnect Validation using the management GUI” to learn about the individual tests and how to fix problems reported by each test.
The Static Connectivity Test verifies that links are up and all nodes can see each other via the interconnect. Success in this test means that all adapters have been configured correctly, and that the cables are inserted properly. It should report TEST RESULT: *PASSED* for all nodes:
#* NOTE: Testing static interconnect connectivity between nodes.
... node tiger-1: TEST RESULT: *PASSED*
... node tiger-2: TEST RESULT: *PASSED*
... node tiger-3: TEST RESULT: *PASSED*
... node tiger-4: TEST RESULT: *PASSED*
... node tiger-5: TEST RESULT: *PASSED*
... node tiger-6: TEST RESULT: *PASSED*
... node tiger-7: TEST RESULT: *PASSED*
... node tiger-8: TEST RESULT: *PASSED*
... node tiger-9: TEST RESULT: *PASSED*
If this test reports errors or warnings, the installer offers to re-run dishostseditor so that you can validate and possibly fix the interconnect configuration. If the problems persist, you should let the installer continue and analyse the problems using sciadmin after the installation finishes (see Section 3.7, “Interconnect Validation using the management GUI”).
The SuperSockets Configuration Test verifies that all nodes have the same valid SuperSockets configuration (as shown by /proc/net/af_sci/socket_maps).
#* NOTE: Verifying SuperSockets configuration on all nodes.
#+ No SuperSocket configuration problems found.
Success in this test means that the SuperSockets service dis_supersockets is running and is configured identically on all nodes. If a failure is reported, it means that the interconnect configuration did not propagate correctly to this node. You should check whether the dis_nodemgr service is running on this node. If not, start it, wait for a minute, and then configure SuperSockets by calling dis_ssocks_cfg.
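A manual cross-check of the SuperSockets configuration could compare a checksum of /proc/net/af_sci/socket_maps across the nodes. The sketch below assumes that this file is byte-identical on correctly configured nodes, which may not hold in every setup; the function name find_config_mismatches is hypothetical and this is not how the installer itself implements the test.

```shell
# Hypothetical helper: reads "<host> <checksum>" lines on stdin and prints
# every host whose checksum differs from that of the first host listed.
find_config_mismatches() {
    awk 'NR == 1 { ref = $2; next } $2 != ref { print $1 }'
}

# Possible way to gather the input (requires working ssh access):
#   for h in tiger-1 tiger-2 tiger-3; do
#       echo "$h $(ssh "root@$h" md5sum /proc/net/af_sci/socket_maps | cut -d' ' -f1)"
#   done | find_config_mismatches
```

An empty result means that all listed nodes report the same checksum as the first one.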
The SuperSockets Performance Test runs a simple socket benchmark between two of the nodes. The benchmark is run once via Ethernet and once via SuperSockets, and performance is reported for both cases.
#* NOTE:
#+ Verifying SuperSockets performance for tiger-2 (testing via tiger-1).
#+ Checking Ethernet performance
... single-byte latency: 56.63 us
#+ Checking Dolphin Express SuperSockets performance
... single-byte latency: 3.00 us
... Latency rating: Very good. SuperSockets are working well.
#+ SuperSockets performance tests done.
The SuperSockets latency is rated based on our platform validation experience. If the rating indicates that SuperSockets are not performing as expected, or if it shows that a fallback to Ethernet has occurred, please contact Dolphin Support. In this case, it is important that you supply the installation log (see above).
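To put the two latency figures into relation, the ratio between the Ethernet and the SuperSockets latency can be computed from the installation log. The sketch below assumes that the log contains one message per line and that the Ethernet result is reported before the SuperSockets result, as in the example output; the function name latency_ratio is hypothetical.

```shell
# Hypothetical helper: reads installer output on stdin, takes the first two
# "single-byte latency: <value> us" lines (Ethernet first, SuperSockets
# second) and prints the ratio Ethernet / SuperSockets with one decimal.
latency_ratio() {
    awk '/single-byte latency:/ { v[++n] = $(NF - 1) }
         END { if (n >= 2 && v[2] > 0) printf "%.1f\n", v[1] / v[2] }'
}
```

For the values from the example (56.63 us and 3.00 us), this prints 18.9.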
The installation finishes with the option to start the administration GUI tool sciadmin, a hint to use LD_PRELOAD to make use of SuperSockets and a pointer to the binary RPMs that have been used for the installation.
#* OK: Cluster installation completed.
#+ Remember to use LD_PRELOAD=libksupersockets.so for all applications that
#+ should use Dolphin Express SuperSockets.
# >>> Do you want to start the GUI tool for interconnect adminstration (sciadmin)? [y/N]n
#* RPM packages that were used for installation are stored in
#+ /tmp/node_PRMS and /tmp/frontend_PRMS.
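The LD_PRELOAD hint can be wrapped in a small helper so that it only applies to the applications you choose. This is a minimal sketch; the function name with_supersockets is hypothetical, and only the LD_PRELOAD value itself is taken from the installer output.

```shell
# Hypothetical wrapper: runs the given command with the SuperSockets library
# preloaded, without changing the environment of the calling shell.
with_supersockets() {
    env LD_PRELOAD=libksupersockets.so "$@"
}
```

For example, with_supersockets ./my_server would start a (hypothetical) application my_server with SuperSockets enabled, while other commands in the same shell continue to run without it.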
If for some reason the installation was not successful, you can easily and safely repeat it by simply invoking the SIA again. Please consider:
By default, existing RPM packages of the same or even more recent version will not be replaced. To enforce re-installation with the version provided by the SIA, you need to specify --enforce.
To avoid rebuilding the binary RPMs, use the option --use-rpms or simply run the SIA in the same directory as before, where it can find the RPMs in the node_RPMS and frontend_RPMS subdirectories.
To start an installation from scratch, you can run the SIA on each node and the frontend using the option --wipe to remove all traces of the Dolphin Express software stack and start again.
If you still fail to install the software successfully, you should contact Dolphin support. Please provide all installation logfiles. Every installation attempt creates a differently named logfile; its name is printed at the very beginning of the installation. Please also include the configuration files that can be found in /etc/dis on the frontend.
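The logfiles and configuration files can be packed into a single archive before sending them. The following is a sketch using standard tar options; the function name make_support_bundle and the archive name are hypothetical, and Dolphin support may prefer a different format.

```shell
# Hypothetical helper: packs the given logfiles and directories into a
# gzip-compressed tar archive.
# Usage: make_support_bundle <archive.tar.gz> <file-or-dir>...
make_support_bundle() {
    archive=$1
    shift
    tar -czf "$archive" "$@"
}
```

On the frontend, a call could look like make_support_bundle /tmp/dis_support.tar.gz /tmp/DIS_install.log_* /etc/dis. Logfiles written on other machines (such as the kernel build machine) need to be copied over first.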
Dolphin provides a graphical tool named sciadmin. sciadmin serves as a single point of control to manage the Dolphin Express interconnect in your cluster. It shows an overview of the status of all adapters and links of a cluster and lets you perform detailed status queries. It also provides means to manually control the interconnect, inspect and set options, and perform interconnect tests. For a complete description of sciadmin, please refer to Appendix B, sciadmin Reference. Here, we will only describe how to use sciadmin to verify the newly installed Dolphin Express interconnect.
sciadmin is installed on the frontend machine by the SIA if this machine is capable of running X applications and has the Qt toolkit installed. If the frontend does not have these capabilities, you can install sciadmin on any other machine that has them using the SIA with the --install-frontend option, or use the Dolphin-NetworkAdmin RPM package from the frontend_RPMS directory (this RPM will only be there if it could be built for the frontend).
It is also possible to download a binary version for Windows that runs without the need for extra compilation or installation.
You can use sciadmin on any machine that can connect to the network manager on the frontend via a standard TCP/IP socket. You have to make sure that connections towards the frontend using the ports 3444 (network manager) and 3443 (node manager) are possible (potentially firewall settings need to be changed).
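The reachability of the two ports named above can be probed from the machine that will run sciadmin before starting the tool. The following is only a sketch: the function names are made up for illustration, and it assumes that an nc (netcat) binary supporting the -z and -w options is installed.

```shell
# probe_port HOST PORT: succeed if a TCP connection to HOST:PORT can be opened.
# Assumption: an nc (netcat) that supports -z (connect only) and -w (timeout).
probe_port() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# check_dolphin_ports HOST: probe the network manager and node manager ports
# on HOST and print one result line per port. Returns non-zero if any port
# could not be reached.
check_dolphin_ports() {
    host=$1
    status=0
    for entry in "3444:network manager" "3443:node manager"; do
        port=${entry%%:*}
        name=${entry#*:}
        if probe_port "$host" "$port"; then
            echo "$port ($name): reachable"
        else
            echo "$port ($name): NOT reachable"
            status=1
        fi
    done
    return $status
}

# Usage (the hostname is a placeholder):
#   check_dolphin_ports frontend_name
```

If a port is reported as not reachable although the service is running, a firewall between the two machines is the most likely cause.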
sciadmin will be installed in the sbin directory of the installation path (default: /opt/DIS/sbin/). This directory will be within the PATH after you log in as root, but sciadmin can also be run by non-root users. After it has been started, you need to connect to the network manager controlling your cluster. Click the button in the tool bar and enter the appropriate hostname or IP address of the network manager. sciadmin will then present a graphical representation of the cluster nodes and the interconnect links between them.
Normally, all nodes and interconnect links should be shown green, meaning that their status is OK. This is a requirement for a correctly installed and configured cluster and you may proceed to Section 3.7.4, “Cabling Correctness Test”.
If a node is plotted red, it means that the network manager cannot connect to the node manager on this node. To solve this problem:
Make sure that the node is powered and has booted the operating system.
Verify that the node manager service is running:
On Red Hat:
# service dis_nodemgr status
On other Linux variants:
# /etc/init.d/dis_nodemgr status
should tell you that the node manager is running. If this is not the case:
Try to start the node manager:
On Red Hat:
# service dis_nodemgr start
On other Linux variants:
# /etc/init.d/dis_nodemgr start
should tell you that the node manager has started successfully.
If the node manager fails to start, please see /var/log/dis_nodemgr.log
Make sure that the service is configured to start in the correct runlevel (Dolphin installation makes sure this is the case).
On Red Hat:
# chkconfig --level 2345 dis_nodemgr on
On other Linux variants, please refer to the system documentation to determine the required steps.
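The status and start steps above can be combined into a small helper that is run as root on the affected node. This is a sketch under two assumptions that the text does not confirm: that the presence of the service command is a usable way to tell the Red Hat variant from the others, and that the dis_nodemgr init script returns a non-zero exit code for status when the node manager is not running. The function names are hypothetical.

```shell
# nodemgr_command ACTION: print the command line that controls the node
# manager. Assumption: if a "service" command exists, use it (Red Hat style),
# otherwise call the init script directly.
nodemgr_command() {
    action=$1
    if command -v service >/dev/null 2>&1; then
        echo "service dis_nodemgr $action"
    else
        echo "/etc/init.d/dis_nodemgr $action"
    fi
}

# ensure_nodemgr_running: check the node manager and try to start it if the
# status query fails. Assumption: "status" exits non-zero when the service
# is not running.
ensure_nodemgr_running() {
    if $(nodemgr_command status) >/dev/null 2>&1; then
        echo "dis_nodemgr is running"
        return 0
    fi
    echo "dis_nodemgr is not running, trying to start it"
    if $(nodemgr_command start); then
        return 0
    fi
    echo "start failed, see /var/log/dis_nodemgr.log" >&2
    return 1
}
```

Calling ensure_nodemgr_running does not replace the runlevel check; it only covers the case that the service is currently stopped.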
sciadmin can validate that all cables are connected according to the configuration that was specified in the dishostseditor, and which is now stored in /etc/dis/dishosts.conf on all nodes and the frontend. To perform the cable test, select . This Cabling Correctness Test runs for only a few seconds and will verify that the nodes are cabled according to the configuration provided by the dishostseditor.
Running this test will stop the normal traffic over the interconnect as the routing needs to be changed. If you run this test while your cluster is in production, you might experience communication timeouts. SuperSockets in operation will fall back to Ethernet during this test, which also leads to increased communication delays.
If the test detects a problem, it will inform you that node A cannot communicate with node B although they are supposed to be within the same ringlet. You will typically get more than one error message in case of a cabling problem, as such a problem usually affects more than one pair of nodes. Please proceed as follows:
Try to fix the first reported problem by tracing the cable connections from node A to node B:
Verify that the cable connections are placed within one ringlet:
Look up the path of cable connections between node A and node B in the Cabling Instructions that you created (or still can create at this point) using dishostseditor.
When you arrive at node B, do the same check for the path back from node B to node A.
Along the path, make sure:
That each cable plug is securely fitted into the socket of the adapter.
Each cable plug is connected to the right link (0 or 1) as indicated by the cabling instructions.
If you can't find the cause of the first reported problem, verify the cable connections for all following pairs of nodes reported as bad.
After the first change, re-run the cable test to verify if this change solves all problems. If this is not the case, start over with this verification loop.
The Cabling Correctness Test performs only minimal communication between two nodes to determine the functionality of the fabric between them. To verify the actual signal quality of the interconnect fabric, a more intensive test is required. Such a Fabric Quality Test can be started for each installed interconnect fabric (0 or 1) from within sciadmin via .
Running this test will stop the normal traffic over the interconnect as the routing needs to be changed. If you run this test while your cluster is in production, you might experience communication timeouts. SuperSockets in operation will fall back to a second fabric (if installed) or to Ethernet during this test, which also leads to increased communication delays.
This test will run for a few minutes, depending on the size of your cluster, as it tests communication for about 20 seconds between each pair of nodes within the same ring. For example, a 4 by 4 2D-torus cluster features 8 rings with 4 nodes each; each ring contains 3 + 2 + 1 = 6 pairs of nodes, so the test will take 8 * 6 * 20 seconds = 960 seconds = 16 minutes. It will then report whether any errors or other problems have occurred between any pairs of nodes.
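The run time estimate can be computed for other cluster sizes in the same way. The helper below is a sketch; the function name is made up, and the 20 seconds per node pair is the approximate value given above, so the result is only an estimate.

```shell
# fabric_test_seconds RINGS NODES_PER_RING [SECONDS_PER_PAIR]
# Estimate the run time of the Fabric Quality Test in seconds.
# Each ring with n nodes contains n * (n - 1) / 2 pairs of nodes, and every
# pair is tested for about SECONDS_PER_PAIR seconds (default: 20).
fabric_test_seconds() {
    rings=$1
    nodes=$2
    per_pair=${3:-20}
    pairs=$(( nodes * (nodes - 1) / 2 ))
    echo $(( rings * pairs * per_pair ))
}

# Usage: the 4 by 4 2D-torus with 8 rings of 4 nodes each
#   fabric_test_seconds 8 4     # prints 960 (seconds, i.e. 16 minutes)
```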
Any communication error is either corrected automatically by retrying the data transfer, or is reported. Thus, a communication error does not mean that data might get lost. However, every communication error reduces performance, and an optimally set up Dolphin Express interconnect should not show any communication errors.
A small number of communication errors is acceptable, though. Please contact Dolphin support if in doubt.
If the test reports communication errors, please proceed as follows:
If errors are reported between multiple pairs of nodes, locate the pair of nodes that are closest to each other (with the smallest number of cable connections between them). Normally, if any errors are reported, a pair of nodes located next to each other will show up.
Check the cable connection on the shortest path between these two nodes (a single cable, if nodes are located next to each other) for being properly mounted:
No excessive stress on the cable, like bending it too sharply or too much force on the plugs.
Cable plugs need to be placed in the connectors on the adapters evenly (not tilted) and securely fastened. If in doubt, unplug the cable and re-fasten it.
Perform the previous check for all other node pairs; then re-run the test.
If communication errors persist, change cables to locate a possibly damaged cable:
Exchange the cables between the closest pair of nodes one-by-one with a cable of a connection for which no errors have been reported. Remember (note down) which cables you exchanged.
Run the Fabric Quality Test after each cable exchange.
If the communication errors move with the cable you just exchanged, then this cable might be damaged. Please contact your sales representative for exchange.
If the communication error remains unchanged, the problem might be with one of the adapters. Please contact Dolphin support for further analysis.
After the Dolphin Express hardware and software have been installed and tested, you will want your cluster application to make use of the increased performance.
All applications that use generic BSD sockets for communication will be accelerated by SuperSockets. No configuration change is required for the application as the same host names/IPv4 addresses can be used. All relevant socket types are supported by SuperSockets: TCP stream sockets as well as UDP and RDS datagram sockets.
SuperSockets will use the Dolphin Express interconnect for low-latency, high-bandwidth communication inside the cluster, and will transparently fall back to Ethernet when connecting to nodes outside the cluster.
To make an application use SuperSockets, you need to preload a dynamic library on application start. This can be achieved by two means as described in the next two sections.
To let generic socket applications use SuperSockets, you just need to run them via the wrapper script dis_ssocks_run that sets the LD_PRELOAD environment variable accordingly. This script is installed to the bin directory of the installation (default is /opt/DIS/bin) which is added to the default PATH environment variable.
To run, e.g., the socket benchmark netperf via SuperSockets, start the server process on node server_name like
dis_ssocks_run netserver
and the client process on any other node in the cluster like
dis_ssocks_run netperf -H server_name
As an alternative to using this wrapper script, you can set LD_PRELOAD yourself to preload the SuperSockets library, e.g. for sh-style shells such as bash:
export LD_PRELOAD=libksupersockets.so
If the applications you are using do not show increased performance, please verify that they use SuperSockets as follows:
To verify that the preloading works, use the ldd command on any executable, e.g. the netperf binary mentioned above:
$ export LD_PRELOAD=libksupersockets.so
$ ldd netperf
libksupersockets.so => /opt/DIS/lib64/libksupersockets.so (0x0000002a95577000)
libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x00000033ed300000)
libc.so.6 => /lib64/tls/libc.so.6 (0x00000033ec800000)
libdl.so.2 => /lib64/libdl.so.2 (0x00000033ecb00000)
/lib64/ld-linux-x86-64.so.2 (0x00000033ec600000)
The library libksupersockets.so has to be listed at the top position. If this is not the case, make sure the library file actually exists. The default locations are /opt/DIS/lib/libksupersockets.so and, on 64-bit platforms, /opt/DIS/lib64/libksupersockets.so; libksupersockets.so is actually a symbolic link to a library with the same name and a version suffix:
$ ls -lR /opt/DIS/lib*/*ksupersockets*
-rw-r--r-- 1 root root 29498 Nov 14 12:43 /opt/DIS/lib64/libksupersockets.a
-rw-r--r-- 1 root root 901 Nov 14 12:43 /opt/DIS/lib64/libksupersockets.la
lrwxrwxrwx 1 root root 25 Nov 14 12:50 /opt/DIS/lib64/libksupersockets.so -> libksupersockets.so.3.3.0
lrwxrwxrwx 1 root root 25 Nov 14 12:50 /opt/DIS/lib64/libksupersockets.so.3 -> libksupersockets.so.3.3.0
-rw-r--r-- 1 root root 65160 Nov 14 12:43 /opt/DIS/lib64/libksupersockets.so.3.3.0
-rw-r--r-- 1 root root 19746 Nov 14 12:43 /opt/DIS/lib/libksupersockets.a
-rw-r--r-- 1 root root 899 Nov 14 12:43 /opt/DIS/lib/libksupersockets.la
lrwxrwxrwx 1 root root 25 Nov 14 12:50 /opt/DIS/lib/libksupersockets.so -> libksupersockets.so.3.3.0
lrwxrwxrwx 1 root root 25 Nov 14 12:50 /opt/DIS/lib/libksupersockets.so.3 -> libksupersockets.so.3.3.0
-rw-r--r-- 1 root root 48731 Nov 14 12:43 /opt/DIS/lib/libksupersockets.so.3.3.0
Also, make sure that the dynamic linker is configured to find it in this place. The dynamic linker is configured accordingly on installation of the RPM; if you did not install via RPM, you need to configure the dynamic linker manually. To verify that the dynamic linking is the problem, set LD_LIBRARY_PATH to include the path to libksupersockets.so and verify again with ldd:
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/DIS/lib:/opt/DIS/lib64
$ echo $LD_PRELOAD
libksupersockets.so
$ ldd netperf
....
A better solution than setting LD_LIBRARY_PATH is to configure the dynamic linker (ld.so) to include these directories in its search path. Use man ldconfig to learn how to achieve this.
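The check that libksupersockets.so is listed at the top position of the ldd output can be scripted, which is convenient when several executables or nodes have to be verified. The following is a sketch with a hypothetical function name. It assumes that entries for the kernel-provided virtual library (linux-vdso or linux-gate), which some systems print before all other libraries, may be skipped when looking for the first entry.

```shell
# preload_is_first: read ldd output on stdin and succeed only if the first
# listed library is libksupersockets.so.
# Assumption: linux-vdso / linux-gate entries do not count as "first".
preload_is_first() {
    first=$(awk '$1 !~ /^linux-(vdso|gate)\.so/ { print $1; exit }')
    case $first in
        *libksupersockets.so*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   LD_PRELOAD=libksupersockets.so ldd netperf | preload_is_first \
#       && echo "SuperSockets library is preloaded"
```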
You need to make sure that the preloading of the SuperSockets library described above is effective on both nodes, for both applications that should communicate via SuperSockets.
Make sure that the SuperSockets kernel module (and the kernel modules it depends on) are loaded and configured correctly on both nodes.
Check the status of all Dolphin kernel modules via the dis_services script (default location /opt/DIS/sbin):
# dis_services status
Dolphin IRM 3.3.0 ( November 13th 2007 ) is running.
Dolphin Node Manager is running (pid 3172).
Dolphin SISCI 3.3.0 ( November 13th 2007 ) is running.
Dolphin SuperSockets 3.3.0 "St.Martin", Nov 7th 2007 (built Nov 14 2007) running.
At least the services dis_irm and dis_supersockets need to be running, and you should not see a message about SuperSockets not being configured.
Verify the configuration of the SuperSockets to make sure that all cluster nodes will connect and communicate via SuperSockets. The active configuration is shown in /proc/net/af_sci/socket_maps:
# cat /proc/net/af_sci/socket_maps
IP/net           Adapter  NodeId List
-----------------------------------------------
172.16.5.1/32    0x0000   4 0 0
172.16.5.2/32    0x0000   8 0 0
172.16.5.3/32    0x0000   68 0 0
172.16.5.4/32    0x0000   72 0 0
Depending on the configuration variant you used to set up SuperSockets, the content of this file may look different, but it must never be empty and should be identical on all nodes. The example above shows a four-node cluster with a single fabric and a static SuperSockets configuration, which will accelerate one socket interface per node.
For more information on the configuration of SuperSockets, please refer to ???.
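Both checks on the kernel modules can be automated by filtering the output shown above. The two functions below are a sketch with hypothetical names. They rely on the output formats printed in this section; the wording that dis_services uses for a stopped or unconfigured service is not shown here, so the patterns "not running" and "not ... configured" are assumptions.

```shell
# required_services_running: read "dis_services status" output on stdin and
# succeed only if both the IRM and the SuperSockets service report "running".
# Assumption: a stopped service says "not running"; an unconfigured
# SuperSockets prints a line containing "not" and "configured".
required_services_running() {
    awk '
        /^Dolphin IRM/          && /running/ && !/not running/ { irm = 1 }
        /^Dolphin SuperSockets/ && /running/ && !/not running/ { ssocks = 1 }
        /SuperSockets/ && /not/ && /configured/                { unconfigured = 1 }
        END { exit !(irm && ssocks && !unconfigured) }
    '
}

# socket_map_entries: read the content of /proc/net/af_sci/socket_maps on
# stdin and print the number of address entries (lines starting with an
# IPv4 address and prefix length). 0 means the map is empty.
socket_map_entries() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+$/ { n++ } END { print n + 0 }'
}

# Usage (as root on each node):
#   dis_services status | required_services_running || echo "check services"
#   socket_map_entries < /proc/net/af_sci/socket_maps
```

Because the map should be identical on all nodes, comparing the printed entry count (or a checksum of the file) between nodes is a quick way to spot a node with a deviating configuration.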
Make sure that the host names/IP addresses used effectively by the application are the ones that are configured for SuperSockets, especially if the nodes have multiple Ethernet interfaces configured.
Check the system log for messages of the SuperSockets kernel module. It will report all problems, e.g. when running out of resources.
# cat /var/log/messages | grep dis_ssocks
It is a good idea to monitor the system log while you try to connect to a remote node if you suspect problems being reported there:
# tail -f /var/log/messages
For an explanation of typical error messages, please refer to Section 2, “Software”.
Don't forget to check whether the port numbers used by this application, or the application itself, have been explicitly excluded from using SuperSockets. By default, only the system port numbers below 1024 are excluded from using SuperSockets, but you should verify the current configuration (see Section 1, “SuperSockets Configuration”).
If you can't solve the problem, please contact Dolphin Support.
SuperSockets can also be used to accelerate kernel services that communicate via sockets. However, such services need to be adapted to actually use SuperSockets (a minor modification to make them use a different address family when opening new sockets).
If you are interested in accelerated kernel services like iSCSI, GNBD or others, please contact Dolphin Support.