3. Software and Cable Installation

After the adapters are installed, the software needs to be installed. On the nodes, this comprises the hardware driver, additional kernel modules, user space libraries and the node manager. On the frontend, the network manager and the cluster administration tool are installed.

3.1. Running the Dolphin Installer

The software installation on Windows is plug and play with the Windows Installer package (*.msi). The installer is executed on all nodes in the system, either in parallel or one by one. The same installer image is used for all Dolphin software components.

Start the software installation by double-clicking the installer image.

Figure 4.1. Windows Installer: Icon

You will be met by a welcome message.

Figure 4.2. Windows Installer: Welcome Message

Click Next to select which components to install.

Figure 4.3. Windows Installer: Setup Type

Choose your setup type.

  • Graphical installation can be used to:

    • Generate configuration files on a machine away from the running cluster.

    • Monitor the cluster with the Dolphin Admin GUI from a loosely connected machine.

  • Management installs the centralized management tools. You may run the centralized management daemon, the Dolphin Network Manager, on one of the nodes in the cluster, or on a separate node that does not require Dolphin hardware.

  • Compute node installs drivers, the node manager daemon and other software needed on nodes with Dolphin adapters.

  • Custom installation is typically used if you want to install more than one Dolphin software module, e.g. both Compute and Management on a single node, if you want to de-select some components, or if you have other special requirements.

Figure 4.4. Windows Installer: Custom Setup

Select the features that match your installation and click Next.

Figure 4.5. Windows Installer: Installing Software

Installing new hardware automatically opens some Windows wizard pop-ups that are not needed for the Dolphin installation; please Cancel them.

Figure 4.6. Windows Installer: Cancel Wizard

Windows may warn that the driver software has not passed Windows Logo testing. We ask you to trust our software even though it is not part of the Windows Logo program: select Continue Anyway.

Figure 4.7. Windows Installer: Continue Anyway

If a firewall is enabled on your machine, you need to allow connections between the modules of the Dolphin Admin (node manager, network manager and admin).

Figure 4.8. Windows Installer: Configure Firewall

Click Next to complete the installation.

Figure 4.9. Windows Installer: Completed

Figure 4.10. Windows Installer: Network Connection

If the installation was successful, the Dolphin adapters will appear in the Device Manager (Control Panel -> Administrative Tools -> Computer Management -> Device Manager).

Figure 4.11. Windows Installer: Computer Management

The Dolphin Network Configurator, dis_netconfig, is only opened if the Management or Graphical tools are selected. Typically, the Dolphin Network Manager, dis_networkmgr, is started after you have exited dis_netconfig.

Whenever you uninstall and install again, the Device Manager needs a rescan of the devices before the new installation. You don't have to reboot the machine.

Figure 4.12. Rescan devices

3.2. Commandline installation

You can also run the Dolphin Windows installer package from the command line. The same installer image is used for all Dolphin software components. To install from the command line, use the msiexec command.

The msiexec switch for installing is /i. If you also set the ADDLOCAL property (ADDLOCAL= followed by a comma-separated list of modules), you can select the specific modules that you want to install; running without ADDLOCAL installs all modules. These are the modules that you may preselect:

  • Drivers

  • Winsock2

  • Manager

  • Graphical

  • Demos

  • Development

  • NMPI

  • ExpressWay

Another useful msiexec switch is /qb, which runs the installation with a basic user interface (a progress bar only) instead of the interactive dialogs shown in Section 3.1, “Running the Dolphin Installer”. Run msiexec without options to display its help.
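If you script the installation of many nodes, it can help to assemble the msiexec command line from the module list above instead of typing it by hand. The following is a minimal sketch, assuming the module names are exactly those listed above; the function name build_msiexec_args is hypothetical and not part of the Dolphin software. It only uses the msiexec elements described in this chapter (/i, the ADDLOCAL property, /qb and the /l*xv logging switch from Section 3.6).

```python
# Sketch: build an msiexec argument list for the Dolphin MSI package.
# Assumption: the module names below are exactly those listed above.
from typing import Iterable, List, Optional

KNOWN_MODULES = {
    "Drivers", "Winsock2", "Manager", "Graphical",
    "Demos", "Development", "NMPI", "ExpressWay",
}


def build_msiexec_args(msi_path: str,
                       modules: Optional[Iterable[str]] = None,
                       basic_ui: bool = True,
                       log_file: Optional[str] = None) -> List[str]:
    """Return the argument list for installing msi_path with msiexec.

    modules=None installs all modules (ADDLOCAL is not set at all).
    """
    args = ["msiexec", "/i", msi_path]
    if modules is not None:
        selected = list(modules)
        unknown = [m for m in selected if m not in KNOWN_MODULES]
        if unknown:
            raise ValueError("unknown module(s): " + ", ".join(unknown))
        if not selected:
            raise ValueError("ADDLOCAL needs at least one module")
        # ADDLOCAL is an MSI property, not a switch: no leading slash.
        args.append("ADDLOCAL=" + ",".join(selected))
    if basic_ui:
        args.append("/qb")  # progress bar only, no interactive dialogs
    if log_file is not None:
        args += ["/l*xv", log_file]  # verbose log for Dolphin support
    return args
```

On a Windows node, the resulting list could be passed to `subprocess.run(...)`. Note that even with /qb, the driver confirmation dialog described below still has to be answered by hand.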

Figure 4.13. msiexec help menu

3.2.1. Install Standard

A full compute node installation would typically contain these components:

  • Drivers

  • Winsock2

  • Demos

  • Development

  • NMPI

  • ExpressWay

C:\Documents and Settings\tester\Desktop>msiexec /i Dolphin_DX_WinXP_X86_chk_Ver_3.2.1_SCOPE.msi ADDLOCAL=Drivers,Winsock2,Demos,Development,NMPI,ExpressWay /qb

The installation is not 100% silent. You need to perform the step described in Figure 4.7, “Windows Installer: Continue Anyway”.

For a frontend node, you would typically install all packages by omitting ADDLOCAL:

C:\Documents and Settings\tester\Desktop>msiexec /i Dolphin_DX_WinXP_X86_chk_Ver_3.2.1.msi /qb

The installation is not 100% silent. You need to perform the step described in Figure 4.7, “Windows Installer: Continue Anyway” and Figure 4.6, “Windows Installer: Cancel Wizard”.

3.2.2. Manager

If you install the Manager component from the command line, you need to start the network manager service manually:

C:\Documents and Settings\tester\Desktop>sc start dis_networkmgr

SERVICE_NAME: dis_networkmgr
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE,NOT_PAUSABLE,ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0
        PID                : 3284
        FLAGS              :

C:\Documents and Settings\tester\Desktop>
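In an installation script you may want to check the STATE line of this output instead of reading it by eye. The sketch below assumes the English sc output format shown above ("STATE : 4  RUNNING"); the helper names are hypothetical. Windows service state 4 means RUNNING, while 2 (START_PENDING) means the service is still starting and should be queried again.

```python
# Sketch: extract the service state from "sc start" / "sc query" output.
# Assumes the English output format shown above.
import re
from typing import Optional, Tuple

_STATE_RE = re.compile(r"^\s*STATE\s*:\s*(\d+)\s+(\w+)", re.MULTILINE)


def service_state(sc_output: str) -> Optional[Tuple[int, str]]:
    """Return (state_code, state_name), or None if no STATE line is found."""
    match = _STATE_RE.search(sc_output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def is_running(sc_output: str) -> bool:
    """True only if the service reports the RUNNING state."""
    state = service_state(sc_output)
    return state is not None and state[1] == "RUNNING"
```

On Windows, the text to parse could come from `subprocess.run(["sc", "query", "dis_networkmgr"], capture_output=True, text=True).stdout`.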

3.2.3. Uninstall

Whenever you uninstall and install again, the Device Manager needs to rescan the devices (see Figure 4.12, “Rescan devices”) before the new installation; you don't have to reboot the machine. To uninstall from the command line, use the /uninstall switch:

C:\Documents and Settings\tester\Desktop>msiexec /uninstall Dolphin_DX_WinXP_X86_chk_Ver_3.2.1_SCOPE.msi /qb 

3.3. Post installation

First, the software is installed on the nodes, and the drivers and the node manager are started. Then the basic packages are installed on the frontend, and the dis_netconfig application is launched to create the required configuration files c:\WINDOWS\system32\drivers\etc\dis\dishosts.conf and c:\WINDOWS\system32\drivers\etc\dis\networkmanager.conf if they do not already exist.

For typical problems at this point of the installation, please refer to Chapter 7, FAQ.

3.4. Working with the Dolphin Network Configurator, dis_netconfig

The Dolphin Network Configurator, dis_netconfig, is a GUI tool that helps gather the cluster configuration; it creates the cluster configuration file c:\WINDOWS\system32\drivers\etc\dis\dishosts.conf and the network manager configuration file c:\WINDOWS\system32\drivers\etc\dis\networkmanager.conf. You need to set a few global interconnect properties and specify the position of each node within the interconnect topology.

3.4.1. Cluster Edit

When dis_netconfig is launched, it first displays a dialog box where the global interconnect properties need to be specified (see Figure 4.14, “Cluster Edit dialog of dis_netconfig”).

Figure 4.14. Cluster Edit dialog of dis_netconfig

3.4.1.1. Interconnect Topology

In the upper half of the Cluster Edit dialog, you need to specify the interconnect topology of your cluster. If dis_netconfig is launched by the installer, it tries to set these values correctly, but you need to verify the settings.

First, select the Topology of your cluster: either you use a single DXS switch for 2-10 nodes, two connected DXS switches for up to 16 nodes, or 2 or 3 nodes with direct connection. Then, specify the Number of nodes in your cluster.

The Number of fabrics needs to be set to the minimum number of adapters in every node (typically, this value is 1).

The Socketadapter setting determines which of the available adapters is used for SuperSockets:

  • SINGLE 0: only adapter 0 is used

  • SINGLE 1: only adapter 1 is used (only valid for more than one fabric)

  • Channel Bonding: SuperSockets distributes the traffic across both adapters 0 and 1 (only valid for more than one fabric)

  • NONE: SuperSockets should not be used.
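The rules for the Number of fabrics and the Socketadapter setting can be checked mechanically before you enter them. The following sketch only encodes the rules stated above (fabrics = minimum adapter count per node; SINGLE 1 and Channel Bonding need more than one fabric); the function names are hypothetical and dis_netconfig does not expose such an API.

```python
# Sketch: sanity-check the fabric count and Socketadapter setting
# using only the rules described in this section.
from typing import Sequence


def number_of_fabrics(adapters_per_node: Sequence[int]) -> int:
    """Number of fabrics = minimum number of adapters in any node."""
    if not adapters_per_node:
        raise ValueError("no nodes given")
    return min(adapters_per_node)


def socket_adapter_valid(setting: str, fabrics: int) -> bool:
    """Apply the validity rules listed above for the Socketadapter setting."""
    if setting == "NONE":
        return True              # SuperSockets disabled: always allowed
    if setting == "SINGLE 0":
        return fabrics >= 1      # adapter 0 must exist
    if setting in ("SINGLE 1", "Channel Bonding"):
        return fabrics > 1       # adapter 1 must exist in every node
    return False                 # unknown option
```

For example, a cluster where one node has a single adapter has only one fabric, so Channel Bonding is not a valid choice.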

You then need to Set Link Widths for each node. This can be either x4 (connected with a single cable) or x8 (connected with two cables). A mix of x4 and x8 within one cluster is possible.
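Since x4 uses one cable and x8 uses two, the link widths determine how many node cables you need per fabric. This is a minimal sketch, assuming a switched topology where every node is cabled to a DXS switch; cables between cascaded switches are not counted, and the hostnames in the usage line are placeholders.

```python
# Sketch: count node-to-switch cables per fabric from the link widths
# (x4 = one cable, x8 = two cables). Switch-to-switch cables are not counted.
from typing import Dict

CABLES_PER_LINK = {"x4": 1, "x8": 2}


def cables_per_fabric(link_widths: Dict[str, str]) -> int:
    """link_widths maps a hostname to "x4" or "x8"."""
    total = 0
    for host, width in link_widths.items():
        if width not in CABLES_PER_LINK:
            raise ValueError(f"{host}: unsupported link width {width!r}")
        total += CABLES_PER_LINK[width]
    return total
```

For instance, `cables_per_fabric({"node1": "x4", "node2": "x8", "node3": "x8"})` yields 5 cables, which also equals the number of switch ports used.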

The Advanced Edit option does not need to be changed: the session between the nodes should typically always be set up automatically.

3.4.2. Node Arrangement

In the next step, the main pane of dis_netconfig presents the nodes in the cluster arranged in the topology that was selected in the previous dialog. To change this topology and other general interconnect settings, you can always click Edit in the Cluster Configuration area, which brings up the Cluster Edit dialog again.

If your font settings cause dis_netconfig to print unreadable characters, you can change the font size and type with the drop-down box at the top of the window, next to the floppy disk icon.

Figure 4.15. Main dialog of dis_netconfig

At this point, you need to arrange the nodes (marked by their hostnames) so that the placement of each node in the topology shown by dis_netconfig matches its placement in the physical interconnect. To do this, double-click each node icon to open the configuration dialog of that node and assign the correct hostname. In this dialog, select the correct machine name, which is the hostname as seen from the frontend, from the drop-down list. You can also type a hostname if a hostname that you specified during the installation was wrong.

Figure 4.16. Node dialog of dis_netconfig

In the node dialog you specify whether you want to use 4 or 8 PCI Express lanes.

After you have assigned the correct hostname to this machine, you may need to configure SuperSockets on this node. If you selected the Network Address in the cluster configuration dialog (see above), SuperSockets will use this subnet address and will not allow editing this property on the nodes. Otherwise, you can choose between three options for each of the two SuperSockets-accelerated IP interfaces currently supported per node:

disable

Do not use SuperSockets. If you set this option for both fields, SuperSockets cannot be used with this node, although the related kernel modules will still be loaded.

static

Enter the hostname or IP address for which SuperSockets should be used. This hostname or IP address will be statically assigned to this physical node (its Dolphin Express interconnect adapter).

Choosing a static socket means that the mapping between the node (its adapters) and the specified hostname/IP address is static and will be specified within the configuration file dishosts.conf. All nodes will use this identical file (which is automatically distributed from the frontend to the nodes by the network manager) to perform this mapping.

This option works well if the nodes in your cluster don't change their IP addresses over time, and it is recommended because it does not incur any name resolution overhead.

dynamic

Enter the hostname or IP address for which SuperSockets should be used. This hostname or IP address will be dynamically resolved to the Dolphin Express interconnect adapter that is installed in the machine with this hostname/IP address. SuperSockets will therefore resolve the mapping between adapters and hostnames/IP addresses dynamically. This incurs a certain initial overhead when the first connection between two nodes is set up and in some other specific cases.

This option is similar to using a subnet, but resolves only the explicitly specified IP addresses (for all nodes) instead of all possible IP addresses of a subnet. Use this option if nodes change their IP addresses or if node identities move between physical machines, e.g. in a fail-over setup.
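The difference between static and dynamic mappings can be illustrated with a small model. This is a hypothetical sketch, not the SuperSockets implementation: the static dictionary stands in for the entries in dishosts.conf, the resolver is injected so no real name service is needed, and the assumption that a dynamic result is cached after the first lookup reflects the "initial overhead" described above. Cache invalidation (the "other specific cases") is not modeled.

```python
# Hypothetical model of static vs. dynamic address-to-node mapping.
from typing import Callable, Dict, Optional


class AddressMapper:
    def __init__(self, static_map: Dict[str, int],
                 resolve: Callable[[str], Optional[int]]):
        self.static_map = dict(static_map)  # IP -> node id, fixed (like dishosts.conf)
        self.resolve = resolve              # dynamic lookup, assumed to be costly
        self.cache: Dict[str, int] = {}     # remembered dynamic results
        self.lookups = 0                    # number of resolve() calls made

    def node_for(self, ip: str) -> Optional[int]:
        if ip in self.static_map:           # static: no resolution overhead
            return self.static_map[ip]
        if ip in self.cache:                # dynamic, already resolved once
            return self.cache[ip]
        self.lookups += 1                   # dynamic: pay the cost on first use
        node = self.resolve(ip)
        if node is not None:
            self.cache[ip] = node
        return node
```

The static path never calls the resolver, which is why the static option is recommended when addresses do not change.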

3.4.3. Cabling Instructions

You should now generate the cabling instructions for your cluster. Do this even if the cables are already installed: you want to verify that the actual cabling matches the topology you just specified. To create the cabling instructions, choose the menu item File -> Create Cabling Instructions. You can save and/or print the instructions; printing them is a good idea so that you can take them with you to the cluster.

3.5. Cluster Cabling

If the cables are already connected, please proceed with Section 3.5.2, “Verifying the Cabling”.

Note

In order to achieve a trouble-free operation of your cluster, setting up the cables correctly is critical. Please take your time to perform this task properly.

The cables can be installed while nodes are powered up.

3.5.1. Connecting the Dolphin DX cables

Connect the nodes as described by the cabling instructions generated by dis_netconfig. Insert one or more cables into the connectors on the front of the DXH510. The connectors are labeled P0 and P1.

Generally, port 0 and port 1 (for x8 operation) need to be connected to port 0 and port 1 of the directly connected machine in a 2-node cluster, or to one or two ports of the DXS switch.

Note

Each of the ports on the DX adapter has an LED that should glow green if the port is connected. However, if the two ports are used as a single x8 connection, only the LED of port 0 will glow green, because in this case the two ports are bonded into one at the hardware level.

Figure 4.17. DX x8 connection

Note

When connecting both ports of a DX adapter to a DXS switch, make sure that if port 0 of the adapter connects to port N of the switch, port 1 of the adapter connects to port N+1 of the switch. N must be an even number.
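The port rule in this note is easy to get wrong when cabling many nodes, so it can be worth checking a cabling plan against it. The sketch assumes switch ports are numbered from 0 (consistent with "N must be an even number"); the function names are hypothetical.

```python
# Sketch: check the x8 switch-port rule (adapter P0 -> switch port N with
# N even, adapter P1 -> switch port N+1). Assumes ports are numbered from 0.
from typing import List, Tuple


def x8_ports_ok(switch_port_for_p0: int, switch_port_for_p1: int) -> bool:
    """True if the two adapter ports land on an allowed switch-port pair."""
    return (switch_port_for_p0 % 2 == 0
            and switch_port_for_p1 == switch_port_for_p0 + 1)


def x8_port_pairs(num_switch_ports: int) -> List[Tuple[int, int]]:
    """All (P0, P1) switch-port pairs allowed by the rule."""
    return [(n, n + 1) for n in range(0, num_switch_ports - 1, 2)]
```

For example, adapter ports cabled to switch ports 2 and 3 are valid, while 3 and 4 violate the rule because N is odd.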

Additional information can be found in the DXS410 - Dolphin DX Switch Quick Start Guide, available from the Dolphin web site.

3.5.2. Verifying the Cabling

Important

A green link LED indicates that the link between the output plug and the input plug could be established and synchronized. It does not ensure that the cable is actually placed correctly! It is therefore important to verify once more that the cables are plugged in according to the cabling instructions generated by dis_netconfig!

If a pair of LEDs does not turn green, please perform the following steps:

  • Power-cycle the nodes with the orange LEDs (see Chapter 7, FAQ).

  • If the LEDs still do not turn green, use a different cable.

  • If the LEDs still do not turn green, swap the cable of the problematic connection with a working one and observe whether the problem moves with the cable.

  • Contact Dolphin support, www.dolphinics.com, if you cannot make the LEDs turn green after trying all proposed measures.

When you are done connecting the cables, confirm that all cables are connected and all LEDs are green. No further user interaction is required: the network manager will be started on the frontend and configure all cluster nodes according to the configuration specified in dishosts.conf.

3.6. Handling Installation Problems

If you still fail to install the software successfully, refer to Chapter 5, Interconnect Maintenance. If the problem persists, contact Dolphin support, www.dolphinics.com, and provide all installation log files. To create an installation log, run each MSI with the /l*xv log_install.txt switch, for example:

C:\Documents and Settings\tester\Desktop>msiexec /i Dolphin_DX_WinXP_X86_chk_Ver_3.2.1_SCOPE.msi /l*xv log_install.txt

Please also include the configuration files that can be found in c:\WINDOWS\system32\drivers\etc\dis on the frontend.
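Collecting the log files and the configuration directory into a single archive makes it easier to send everything to support in one go. This is a minimal sketch using only the Python standard library; the default directory is the one named above, while the function name and the archive layout (logs/ and config/ folders) are our own choices.

```python
# Sketch: bundle installation logs and the Dolphin configuration directory
# into one zip archive for Dolphin support.
import zipfile
from pathlib import Path
from typing import Iterable, List

DEFAULT_CONFIG_DIR = Path(r"c:\WINDOWS\system32\drivers\etc\dis")


def bundle_for_support(archive: Path, log_files: Iterable[Path],
                       config_dir: Path = DEFAULT_CONFIG_DIR) -> List[str]:
    """Write the archive and return the sorted list of member names."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for log in log_files:
            zf.write(log, arcname="logs/" + log.name)
        if config_dir.is_dir():
            # Include every file below the config directory (e.g. log\).
            for path in config_dir.rglob("*"):
                if path.is_file():
                    rel = path.relative_to(config_dir).as_posix()
                    zf.write(path, arcname="config/" + rel)
        return sorted(zf.namelist())
```

Run it on the frontend with the log files produced by the /l*xv switch.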

3.7. Interconnect Validation using the management GUI

Dolphin provides a graphical tool named dxadmin. dxadmin serves as a single point of control to manage the Dolphin Express interconnect in your cluster. It shows an overview of the status of all adapters and links of a cluster and allows you to perform detailed status queries. It also provides means to manually control the interconnect, inspect and set options, and perform interconnect tests. For a complete description of dxadmin, please refer to Appendix A, dxadmin Reference. Here, we only describe how to use dxadmin to verify the newly installed Dolphin Express interconnect.

3.7.1. Installing dxadmin

The installer installs dxadmin on the frontend machine if you select the Management or Graphical package. If you did not select the Management or Graphical tools during installation, you can rerun the installer on the frontend to add this package.

You can use dxadmin on any machine that can connect to the network manager on the frontend via a standard TCP/IP socket. Make sure that connections to the frontend on ports 3444 (network manager) and 3443 (node manager) are possible; you may need to change firewall settings.
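Before starting dxadmin on a remote machine, you can quickly test whether these ports are reachable through any firewalls in between. The sketch below uses a plain TCP connect from the Python standard library; the port numbers come from the paragraph above, and "frontend" in the usage line is a placeholder for your frontend's hostname.

```python
# Sketch: test TCP reachability of the Dolphin manager ports from the
# machine that will run dxadmin.
import socket
from typing import Dict, Iterable

DOLPHIN_PORTS = {3444: "network manager", 3443: "node manager"}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False


def check_ports(host: str, ports: Iterable[int] = tuple(DOLPHIN_PORTS)) -> Dict[int, bool]:
    """Map each port to whether it is reachable on host."""
    return {port: port_open(host, port) for port in ports}
```

For example, `check_ports("frontend")` returns a dictionary such as {3444: True, 3443: False}; a False entry points to a firewall or a service that is not running.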

3.7.2. Starting dxadmin

Start dxadmin from the Windows menu.

Figure 4.18. Starting dxadmin

After it has been started, you will need to connect to the network manager controlling your cluster. Click the Connect button in the tool bar and enter the appropriate hostname or IP address of the network manager.

Figure 4.19. dxadmin, connect

dxadmin presents a graphical representation of the cluster nodes and the interconnect links between them.

3.7.3. Cluster Overview

Normally, all nodes and interconnect links should be shown green, meaning that their status is OK. This is a requirement for a correctly installed and configured cluster and you may proceed to Section 3.7.4, “Cabling Correctness Test”.

If a node is shown in red, the network manager cannot connect to the node manager on this node. To solve this problem:

  1. Make sure that the node is powered and has booted the operating system.

  2. Verify that the node manager service is running, for example in the Windows Services console; it should be shown as started. If this is not the case:

    1. Try to start the node manager with the startup script in the Windows menu.

    2. If the node manager fails to start, check the log file c:\WINDOWS\system32\drivers\etc\dis\log\dis_nodemgr.log.

3.7.4. Cabling Correctness Test

dxadmin can validate that all cables are connected according to the configuration that was specified in dis_netconfig, which is now stored in c:\WINDOWS\system32\drivers\etc\dis\dishosts.conf on all nodes and the frontend. To perform the cable test, select Cluster -> Test Cable Connections. This Cabling Correctness Test runs for only a few seconds and verifies that the nodes are cabled according to the configuration provided by dis_netconfig.

Warning

Running this test will stop the normal traffic over the interconnect as the routing needs to be changed.

If the test detects a problem, it will inform you that node A cannot communicate with node B although they are supposed to be within the same ringlet. You will typically get more than one error message in case of a cabling problem, as such a problem in most cases affects more than one pair of nodes. Please proceed as follows:

  1. Try to fix the first reported problem by tracing the cable connections from node A to node B:

    1. Verify that the cable connections are placed within one ringlet:

      1. Look up the path of cable connections between node A and node B in the Cabling Instructions that you created (or still can create at this point) using dis_netconfig.

      2. When you arrive at node B, do the same check for the path back from node B to node A.

    2. Along the path, make sure:

      1. That each cable plug is securely fitted into the socket of the adapter.

      2. Each cable plug is connected to the right link (0 or 1) as indicated by the cabling instructions.

  2. If you cannot find the cause of the first reported problem, verify the cable connections for all following pairs of nodes reported as bad.

  3. After the first change, re-run the cable test to verify if this change solves all problems. If this is not the case, start over with this verification loop.

3.7.5. Fabric Quality Test

The Cabling Correctness Test performs only minimal communication between two nodes to determine the functionality of the fabric between them. To verify the actual signal quality of the interconnect fabric, a more intense test is required. Such a Fabric Quality Test can be started for each installed interconnect fabric (0 or 1) from within dxadmin via Cluster -> Fabric * -> Test, where * is the fabric number.

Warning

Running this test will stop the normal traffic over the interconnect as the routing needs to be changed.

This test will run for a few minutes, depending on the size of your cluster.

Note

Communication errors reported here are either corrected automatically by retrying the data transfer, or reported. Thus, a communication error does not mean that data is lost. However, every communication error reduces performance, and an optimally set up Dolphin Express interconnect should not show any communication errors.

If the test reports communication errors, please proceed as follows:

  1. If errors are reported between multiple pairs of nodes, find the pair of nodes located closest to each other (with the smallest number of cable connections between them). Normally, if any errors are reported, a pair of adjacent nodes will be among them.

  2. Check the cable connection on the shortest path between these two nodes (a single cable, if nodes are located next to each other) for being properly mounted:

    1. No excessive stress on the cable, such as bending it too sharply or too much force on the plugs.

    2. Cable plugs need to be placed in the connectors on the adapters evenly (not tilted) and securely fastened. If in doubt, unplug cable and re-fasten it.

  3. Perform the previous check for all other node pairs; then re-run the test.

  4. If communication errors persist, change cables to locate a possibly damaged cable:

    1. Exchange the cables between the closest pair of nodes one by one with a cable of a connection for which no errors have been reported. Note down which cables you exchanged.

    2. Run the Fabric Quality Test after each cable exchange.

      1. If the communication errors move with the cable you just exchanged, then this cable might be damaged. Please contact your sales representative for exchange.

      2. If the communication error remains unchanged, the problem might be with one of the adapters. Please contact Dolphin support for further analysis.
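The decision steps of this procedure can be written down as two small functions: pick the reported pair with the fewest cable connections, and interpret the result after a cable swap. This is a hypothetical sketch; the cable counts you feed in are an assumption you derive from your own topology (for example 1 cable for a direct connection, 2 via one switch, 3 via two cascaded switches), and the link labels are placeholders.

```python
# Hypothetical sketch of the Fabric Quality Test troubleshooting logic.
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def _key(pair: Pair) -> Pair:
    """Normalize a node pair so (A, B) and (B, A) are the same key."""
    a, b = sorted(pair)
    return (a, b)


def closest_pair(error_pairs: List[Pair], cables_between: Dict[Pair, int]) -> Pair:
    """Reported pair with the fewest cables between them (keys are sorted pairs)."""
    if not error_pairs:
        raise ValueError("no error pairs reported")
    return min(error_pairs, key=lambda p: cables_between[_key(p)])


def diagnose_swap(faulty_link: str, good_link: str, link_with_errors_after: str) -> str:
    """Interpret a test run after swapping the cables of faulty_link and good_link."""
    if link_with_errors_after == good_link:
        return "cable"         # errors moved with the cable: likely damaged cable
    if link_with_errors_after == faulty_link:
        return "adapter"       # errors stayed: contact Dolphin support
    return "inconclusive"      # errors appeared elsewhere: re-check the cabling
```

A "cable" result corresponds to contacting your sales representative for an exchange; an "adapter" result corresponds to contacting Dolphin support.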

3.8. Making Cluster Application use Dolphin Express

After the Dolphin Express hard- and software has been installed and tested, you will want your cluster application to make use of the increased performance.