- 1.1.1. Although I have properly installed the adapter in a node and its LEDs light orange, I am told (i.e. during the installation) that this node does not contain an SCI adapter!
- 1.1.2. All cables are connected, all LEDs shine green on the adapter boards, and all required services and drivers are running on all nodes. However, some nodes cannot see some other nodes via the SCI interconnect. Between some other pairs of nodes, the communication works fine.
- 1.1.3. The SCI driver dis_irm refuses to load, or driver install never completes. Running dmesg shows that the syslog contains the line Out of vmalloc space. What's wrong?
|
| 1.1.1. | Although I have properly installed the adapter in a node and its LEDs light orange, I am told (i.e. during the installation) that this node does not contain an SCI adapter! |
| The SCI adapter might not have been recognized by the node during the power-up initialization after power was applied again. The specification requires that a node be powered down for at least 5 seconds before it is powered up again. To have the adapter recognized, power down the node (restarting or resetting is not sufficient!), wait at least 5 seconds, and power it up again. If this does not fix the problem, please contact Dolphin support. |
| 1.1.2. | All cables are connected, all LEDs shine green on the adapter boards, and all required services and drivers are running on all nodes. However, some nodes cannot see some other nodes via the SCI interconnect. Between some other pairs of nodes, the communication works fine. |
| These symptoms indicate that the cabling is not correct, i.e. the links 0 and 1 (x- and y-direction in a 2D-torus) are exchanged. To resolve the problem, proceed as follows:
1. Run the cable test from sciadmin. If no problem is reported, please contact Dolphin Support.
2. To fix the cable problem, create a cabling description via dishostseditor and fix the cabling between the nodes that have been reported in the cable test.
3. Repeat steps 1 and 2 until no more problems are reported.
|
| 1.1.3. | The SCI driver dis_irm refuses to load, or driver install never completes. Running dmesg shows that the syslog contains the line Out of vmalloc space. What's wrong? |
| The problem is that the SCI adapter requires more virtual PCI address space than the installed kernel supports. This problem has so far only been observed on 32-bit operating systems. There are two alternative solutions:
1. If you are building a small cluster, you may be able to run your application with less SCI address space. You can change the SCI address space size for the adapter card by using sciconfig with the command set-prefetch-mem-size. A value of 64 or 16 will most likely overcome the problem. This operation can also be performed from the command line using the options -c to specify the card number (1 or 2) and -spms to specify the prefetch memory size in megabytes:
# sciconfig -c 1 -spms 64
Card 1 - Prefetch space memory size is set to 64 MB
A reboot of the machine is required for the change to take effect. After the reboot, the problem should be solved.
2. If reducing the prefetch memory size is not desired, the related resources in the kernel have to be increased. For x86-based machines, this is achieved by passing the kernel option vmalloc=256m and the parameter uppermem=524288 at boot time. This is done by editing /boot/grub/grub.conf as shown in the following example:
title CentOS-4 i386 (2.6.9-11.ELsmp)
root (hd0,0)
uppermem 524288
kernel /i386/vmlinuz-2.6.9-11.ELsmp ro root=/dev/sda6 rhgb quiet vmalloc=256m
initrd /i386/initrd-2.6.9-11.ELsmp.img
|
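The diagnosis in 1.1.3 and the effect of the vmalloc boot option can be checked with standard Linux tools. The following is only a minimal sketch and is not part of the Dolphin software: the function names has_vmalloc_error and vmalloc_option are hypothetical helpers introduced here for illustration, and the sketch assumes a Linux node where dmesg, /proc/cmdline and /proc/meminfo are available.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with the SCI software.

# Succeeds (exit status 0) if the text on stdin contains the kernel
# message quoted in question 1.1.3.
has_vmalloc_error() {
    grep -q "Out of vmalloc space"
}

# Prints the value of the vmalloc= option found in the kernel command
# line passed as $1, or prints nothing if the option is absent.
vmalloc_option() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^vmalloc=//p'
}

# Typical use on a live node (dmesg may require root privileges):
#   dmesg | has_vmalloc_error && echo "driver ran out of vmalloc space"
#   vmalloc_option "$(cat /proc/cmdline)"   # value the kernel booted with
#   grep VmallocTotal /proc/meminfo         # vmalloc area size reported by the kernel
```

If vmalloc_option prints nothing after a reboot, the kernel line edited in grub.conf was probably not the entry that was booted.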