If a node needs to be replaced, proceed as follows for the Dolphin interconnect:
Power down the node. The network manager will automatically reroute the interconnect traffic. When you run sciadmin on the frontend, you will see the icon of the node turn red within the GUI representation of the cluster nodes.
Unplug all cables from the adapter. Note (or mark on the cables) which plug on the adapter each cable belongs to.
Remove the adapter from the node being replaced and insert it into the new node.
Power up the node, then connect the SCI cables exactly as they were connected before. Make sure that all LEDs on all adapters in the affected ringlets light green again.
Run the SIA with the option --install-node. To verify the installation after the SIA has finished:
The icon of the node in the sciadmin GUI must have turned green again.
The output of the dis_services script should list all services as running.
Perform the cable test from within sciadmin to ensure that the cabling is correct (see Chapter 4, Initial Installation, Section 3.7.4, “Cabling Correctness Test”).
Running the cable test stops all other traffic on the interconnect while the test is running, which can take up to a minute. If this is not acceptable, use scidiag from the command line to verify the functionality of the interconnect instead (see Chapter 7, Interconnect Maintenance, Section 1.1.3, “Static Interconnect Test”).
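The cable bookkeeping from the unplug and reconnect steps above can be sketched in a few lines of Python. This is only an illustration and not part of the Dolphin software: the plug names (`L0-IN`, `L0-OUT`) and cable labels are hypothetical, and the "after" mapping is assumed to be entered by hand once the cables have been reconnected.

```python
def cabling_mismatches(before, after):
    """Compare two {plug: cable_label} mappings.

    Returns {plug: (expected_cable, actual_cable)} for every plug whose
    cable differs between the recorded and the reconnected state.
    """
    plugs = sorted(set(before) | set(after))
    return {
        plug: (before.get(plug), after.get(plug))
        for plug in plugs
        if before.get(plug) != after.get(plug)
    }


# Hypothetical plug names and cable labels, recorded before unplugging.
before = {"L0-IN": "cable-A", "L0-OUT": "cable-B"}
# State noted after reconnecting; here the two cables were swapped.
after = {"L0-IN": "cable-B", "L0-OUT": "cable-A"}

for plug, (expected, actual) in cabling_mismatches(before, after).items():
    print(f"{plug}: expected {expected}, found {actual}")
```

An empty result means the recorded and reconnected cabling agree. It does not replace the LED check or the cable test described above.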
To ensure that all nodes that are not being replaced can continue to communicate via the SCI interconnect while other nodes are replaced, you should replace nodes in a ring-by-ring manner: power down nodes within one ringlet only. Bring this group of nodes back to operation before powering down the next group of nodes.
Communication between all other nodes will continue uninterrupted during this procedure.
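The ring-by-ring order described above can be planned ahead with a small script. The following is a minimal sketch, assuming you maintain your own mapping from node name to ringlet number; the node names and the `ringlet_of` mapping are hypothetical and are not read from sciadmin or the network manager. In a 2D or 3D torus a node belongs to more than one ringlet, so the mapping is assumed to use the ringlets of a single dimension.

```python
from collections import defaultdict


def plan_replacement_batches(ringlet_of, nodes_to_replace):
    """Group the nodes to replace into batches, one batch per ringlet.

    ringlet_of: {node_name: ringlet_id} (maintained by the administrator)
    nodes_to_replace: iterable of node names
    """
    batches = defaultdict(list)
    for node in nodes_to_replace:
        if node not in ringlet_of:
            raise KeyError(f"no ringlet recorded for node {node!r}")
        batches[ringlet_of[node]].append(node)
    # Sort ringlets and node names so the plan is stable between runs.
    return [sorted(batches[ringlet]) for ringlet in sorted(batches)]


# Hypothetical cluster: two ringlets with two nodes each.
ringlet_of = {"node1": 0, "node2": 0, "node3": 1, "node4": 1}
plan = plan_replacement_batches(ringlet_of, ["node4", "node1", "node2"])

for number, batch in enumerate(plan, start=1):
    print(f"batch {number}: power down, replace and verify {batch}")
```

Each batch contains nodes from one ringlet only. Complete and verify one batch before starting the next.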