When installing the Dolphin Express software stack (which includes SuperSockets) via the SIA, the basic functionality and performance are verified at the end of the installation process by some of the same tests described in the following sections. This means that if the tests performed by the SIA did not report any errors, it is very likely that both the software and the hardware work correctly.
The following sections describe how to verify that the interconnect is set up correctly, which means that all nodes can communicate with all other nodes via the Dolphin Express interconnect by sending low-level control packets and performing remote memory access.
Without the required drivers and services running on all nodes and the frontend, the cluster will fail to operate. On the nodes, the kernel services dis_irm (low-level hardware driver), dis_sisci (upper-level hardware services) and dis_ssocks (SuperSockets) need to be running. In addition to these kernel drivers, the user-space service dis_nodemgr (node manager, which talks to the central network manager) needs to be active for configuration and monitoring. On the frontend, only the user-space service dis_networkmgr (the central network manager) needs to be running.
Because the drivers also appear as services, you can query their status with the usual tools of the installed operating system distribution. For example, on Red Hat-based Linux distributions, you can run
# service dis_irm status
Dolphin IRM 3.3.0 ( November 13th 2007 ) is running.
Dolphin provides a script dis_services that performs this task for all Dolphin services installed on a machine. It is used in the same way as the individual service command provided by the distribution:
# dis_services status
Dolphin DX 3.3.0 ( November 13th 2007 ) is running.
Dolphin IRM 3.3.0 ( November 13th 2007 ) is running.
Dolphin Node Manager is running (pid 3172).
Dolphin SISCI 3.3.0 ( November 13th 2007 ) is running.
Dolphin SuperSockets 3.3.0 "St.Martin", Nov 7th 2007 (built Nov 14 2007) running.
If any of the required services is not running, you will find more information on the problem in the system log facilities. On Linux, call dmesg to inspect the kernel messages; on Windows, check c:\WINDOWS\system32\drivers\etc\dis\log for related messages.
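On Linux, a filtered dmesg call can narrow the output down to driver-related messages; the grep pattern below is only an assumption about how the messages are tagged and may need adjusting:

dmesg | grep -i -e dolphin -e dis_    # filter pattern is an assumption; adjust to the actual log prefix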
To obtain installation logs on Windows, run each MSI installer with the /l*xv log_install.txt switch.
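For example, a logged installation of one package could look like this (the MSI file name Dolphin_DX.msi is only a placeholder; use the actual installer names shipped with your release):

:: "Dolphin_DX.msi" is a placeholder name; substitute the actual MSI package
msiexec /i Dolphin_DX.msi /l*xv log_install.txt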
The static interconnect test makes sure that all adapters are working correctly by performing a self-test, and determines whether the routing setup in the adapters is correct (i.e. matches the actual hardware topology). It also checks whether all cables are plugged into the adapters, but this has already been covered by the Cable Connection Test. The tool to perform this test is dxdiag (default location
Running dxdiag on a node performs a self-test on the local adapter(s) and lists all remote adapters that this adapter can see via the Dolphin Express interconnect. This means that to perform the static interconnect test on a full cluster, you basically need to run dxdiag on each node and check whether any problems with the adapter are reported and whether the adapters in each node can see all remote adapters installed in the other nodes.
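A minimal sketch of how this can be automated from the frontend is shown below. It assumes password-less ssh access to the nodes, placeholder host names node01 to node04, and that dxdiag is installed under /opt/DIS/bin; adjust these to your installation:

#!/bin/sh
# Run dxdiag on every node, save the full output and print the test verdict.
# Host names and the dxdiag path are assumptions; adjust them to your cluster.
for host in node01 node02 node03 node04; do
    echo "=== $host ==="
    ssh "$host" /opt/DIS/bin/dxdiag | tee "dxdiag_$host.log" | grep "TEST RESULT"
done

The saved per-node logs can then be compared to verify that every adapter reports the same topology.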
An example output of dxdiag for a node that is part of a 2-node cluster with one adapter per node looks like this:
===========================================================================
Dolphin diagnostic tool -- Dxdiag version 3.3.1d ( Mon Apr 21 2008 )
===========================================================================

******************** VARIOUS INFORMATION ********************

Dxdiag compiled in 64 bit mode
Driver : Dolphin IRM (DX) 3.3.1d Dec 5th 2007 (rev unknown)
Date   : Mon Apr 21 13:30:38 CEST 2008
System : Linux jelen-07 2.6.9-55.0.9.ELsmp #1 SMP Thu Sep 27 18:28:00 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Number of configured local adapters found: 1
Hostbridge : , 0xbffff870

Local adapter 0 >
  Type                   : DXH510
  NodeId(log)            : 4
  Serial number          : 0xad015107061
  DXH chipId             : 0x01019902
  DXH chip revision      : 0x00000002
  Assembly part revision : 0x01b1
  Firmware version       : 0x0157
  SROM version           : 0x01a0
  Switch type            : unknown
  Topology type          : DX direct 2 nodes
  PCIe link width        : x8

OK: Link alive in adapter 0.
==> Local adapter 0 ok.

******************** TOPOLOGY SEEN FROM ADAPTER 0 ********************

Adapters found: 2
Switch ports found: 0

----- List of all ranges (rings) found:
In range 0: 0004 0008
----------------------------------

dxdiag discovered 0 note(s).
dxdiag discovered 0 warning(s).
dxdiag discovered 0 error(s).

TEST RESULT: *PASSED*
The static interconnect test passes if dxdiag delivers
TEST RESULT: *PASSED* and reports the same topology (remote adapters) on all nodes.
While the static interconnect test sends only a few packets over the links to probe remote adapters, the Interconnect Load Test puts significant stress on the interconnect and observes whether any data transmissions have to be retried due to link errors. This can happen if cables are not correctly connected, e.g. plugged in without the screws being tightened. Before running this test, make sure your cluster is cabled and configured correctly by running the tests described in the previous sections.
This test can be performed from within the dxadmin GUI tool. Please refer to Appendix B, dxadmin Reference, for details.
To run this test from the command line, simply invoke sciconntest (default location
c:\Program Files\Dolphin Express DX\demo\sciconntest) on all nodes.
It is recommended to run this test from the dxadmin GUI (see the previous section) because it performs a more controlled variant of the test and gives more helpful results.
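If you prefer the command line, a minimal sketch for launching sciconntest on all nodes at once from the frontend is shown below. It again assumes password-less ssh, placeholder host names and the Linux install path /opt/DIS/bin; the output of each instance is saved for later inspection:

#!/bin/sh
# Start sciconntest on all nodes in parallel and collect each node's output.
# Host names and the install path are assumptions; adjust them to your cluster.
for host in node01 node02 node03 node04; do
    ssh "$host" /opt/DIS/bin/sciconntest > "sciconntest_$host.log" 2>&1 &
done
wait   # sciconntest runs for a fixed time and exits, so this returns eventually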
All instances of sciconntest will connect and start to exchange data, which can take up to 30 seconds. The output of sciconntest on one node which is part of a 9-node cluster looks like this:
/opt/DIS/bin/sciconntest compiled Oct 2 2007 : 22:29:09
----------------------------
Local node-id     : 76
Local adapter no. : 0
Segment size      : 8192
MinSize           : 4
Time to run (sec) : 10
Idelay            : 0
No Write          : 0
Loopdelay         : 0
Delay             : 0
Bad               : 0
Check             : 0
Mcheck            : 0
Max nodes         : 256
rnl               : 0
Callbacks         : Yes
----------------------------
Probing all nodes
Response from remote node 4
Response from remote node 8
Response from remote node 12
Response from remote node 68
Response from remote node 72
Response from remote node 132
Response from remote node 136
Response from remote node 140
Local segment (id=4, size=8192) is created.
Local segment (id=4, size=8192) is shared.
Local segment (id=8, size=8192) is created.
Local segment (id=8, size=8192) is shared.
Local segment (id=12, size=8192) is created.
Local segment (id=12, size=8192) is shared.
Local segment (id=68, size=8192) is created.
Local segment (id=68, size=8192) is shared.
Local segment (id=72, size=8192) is created.
Local segment (id=72, size=8192) is shared.
Local segment (id=132, size=8192) is created.
Local segment (id=132, size=8192) is shared.
Local segment (id=136, size=8192) is created.
Local segment (id=136, size=8192) is shared.
Local segment (id=140, size=8192) is created.
Local segment (id=140, size=8192) is shared.
Connecting to 8 nodes
Connect to remote segment, node 4
Remote segment on node 4 is connected.
Connect to remote segment, node 8
Remote segment on node 8 is connected.
Connect to remote segment, node 12
Remote segment on node 12 is connected.
Connect to remote segment, node 68
Remote segment on node 68 is connected.
Connect to remote segment, node 72
Remote segment on node 72 is connected.
Connect to remote segment, node 132
Remote segment on node 132 is connected.
Connect to remote segment, node 136
Remote segment on node 136 is connected.
Connect to remote segment, node 140
Remote segment on node 140 is connected.
SCICONNTEST_REPORT
NUM_TESTLOOPS_EXECUTED 1
NUM_NODES_FOUND 8
NUM_ERRORS_DETECTED 0
node 4 : Found
node 4 : Number of failiures : 0
node 4 : Longest failiure : 0.00 (ms)
node 8 : Found
node 8 : Number of failiures : 0
node 8 : Longest failiure : 0.00 (ms)
node 12 : Found
node 12 : Number of failiures : 0
node 12 : Longest failiure : 0.00 (ms)
node 68 : Found
node 68 : Number of failiures : 0
node 68 : Longest failiure : 0.00 (ms)
node 72 : Found
node 72 : Number of failiures : 0
node 72 : Longest failiure : 0.00 (ms)
node 132 : Found
node 132 : Number of failiures : 0
node 132 : Longest failiure : 0.00 (ms)
node 136 : Found
node 136 : Number of failiures : 0
node 136 : Longest failiure : 0.00 (ms)
node 140 : Found
node 140 : Number of failiures : 0
node 140 : Longest failiure : 0.00 (ms)
SCICONNTEST_REPORT_END
SCI_CB_DISCONNECT: Segment removed on the other node disconnecting.....
The test passes if all nodes report 0 failures for all remote nodes. If the test identifies any failures, you can determine the closest pair(s) of nodes for which these failures are reported and check the cable connection between them. The numerical node identifiers shown in this output are the node ID numbers of the adapters (which identify an adapter in the Dolphin Express interconnect).
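If the per-node outputs were saved to files (for example with the sketch above), the reports can be filtered for non-zero failure counts; the file name pattern is an assumption matching that sketch:

# Print the file name and report line for every non-zero failure count
# ("failiures" matches the spelling used in the sciconntest report)
awk '/Number of failiures/ && $NF != 0 { print FILENAME ": " $0 }' sciconntest_*.log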
This test can be run while a system is in production, but you have to take into account that the performance of production applications will be reduced significantly while the test is running. If links actually show problems, they may be temporarily disabled, stopping all communication until rerouting takes place.
Once the correct installation and setup and the basic functionality of the interconnect have been verified, it is possible to perform a set of low-level benchmarks to determine the baseline performance of the interconnect without any additional software layers. The relevant tests are scibench2 (streaming remote memory PIO access performance), scipp (request-response remote memory PIO write performance), dma_bench (streaming remote memory DMA access performance) and intr_bench (remote interrupt performance).
All these tests need to run on two nodes (A and B) and are started in the same manner:
On node A, start the server-side benchmark with the option -rn <node id of B>, for example:
$ scibench2 -server -rn 8
On node B, start the client-side benchmark with the option -rn <node id of A>, for example:
$ scibench2 -client -rn 4
The test results are reported by the client.
To gather all relevant low-level performance data in one go, the script sisci_benchmarks.sh can be called in the same way. It will run all of the described tests.
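Assuming the script accepts the same -server/-client and -rn options as the individual benchmarks (which is what "in the same way" implies), a run between the two example nodes could look like this:

$ sisci_benchmarks.sh -server -rn 8    # on node A (node ID 4); options assumed to match scibench2
$ sisci_benchmarks.sh -client -rn 4    # on node B (node ID 8)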
The minimal round-trip latency for writing to remote memory, as reported by scipp, should be below 4 µs. The average number of retries is not a performance metric and can vary from run to run.
Ping Pong round trip latency for 0 bytes, average retries= 1292     3.69 us
Ping Pong round trip latency for 4 bytes, average retries= 365      3.94 us
Ping Pong round trip latency for 8 bytes, average retries= 359      3.98 us
Ping Pong round trip latency for 16 bytes, average retries= 357     4.01 us
Ping Pong round trip latency for 32 bytes, average retries= 4       4.58 us
Ping Pong round trip latency for 64 bytes, average retries= 346     4.30 us
Ping Pong round trip latency for 128 bytes, average retries= 871    6.26 us
Ping Pong round trip latency for 256 bytes, average retries= 832    6.49 us
Ping Pong round trip latency for 512 bytes, average retries= 1072   7.99 us
Ping Pong round trip latency for 1024 bytes, average retries= 1643  10.99 us
Ping Pong round trip latency for 2048 bytes, average retries= 2738  17.00 us
Ping Pong round trip latency for 4096 bytes, average retries= 4974  29.00 us
Ping Pong round trip latency for 8192 bytes, average retries= 9401  53.06 us
The interrupt latency is the only performance metric of these tests that is affected by the operating system (which always handles the interrupts) and can therefore vary. The following numbers have been measured with RHEL 4 (Linux kernel 2.6.9):
Average unidirectional interrupt time : 7.665 us.
Average round trip interrupt time : 15.330 us.