Setting up your network for MPI can be problematic if you have fixed dummy (private) IP addresses. Your dummy IP address might look like 192.168.1.xx, where xx is in the range of 1-254. This happens in two common situations:
In either case you might not have a nameserver that knows about your machines. For example, assume that I have a machine named geight and the IP address for geight is 192.168.1.16. I run the unix utility nslookup and get something like:
    [geight:~] tkaiser% nslookup 192.168.1.16
    Server: ns1.sdsc.edu
    Address: 188.8.131.52
    *** ns1.sdsc.edu can't find 192.168.1.16: Non-existent host/domain
    [geight:~] tkaiser%
What is happening is that the machine (ns1.sdsc.edu) that does the mapping between numbers (192.168.1.16) and names (geight) does not know about my machine.
When you launch an MPI job, mpirun might not know how to find the other nodes.
If nslookup gives you a valid mapping for all of your machines, then you don't have the problem this note was written to address and you can stop reading.
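One way to run this check for several machines at once is a small shell function. This is only a sketch: the function name check_hosts and the LOOKUP variable are made up for illustration, and it assumes that a failed lookup either exits with a non-zero status or prints "can't find" as in the transcript above.

```shell
#!/bin/sh
# LOOKUP is the command used to query the nameserver.
# It defaults to nslookup; the variable itself is a hypothetical
# convenience so the command can be swapped out.
LOOKUP=${LOOKUP:-nslookup}

# check_hosts NAME_OR_ADDRESS...
# Prints one line per argument and returns non-zero if any lookup failed.
check_hosts() {
    failed=0
    for host in "$@"; do
        # A lookup counts as good only if the command succeeds AND
        # its output does not contain the "can't find" error text.
        if output=$($LOOKUP "$host" 2>&1) &&
           ! printf '%s\n' "$output" | grep -q "can't find"; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    return $failed
}
```

For example, "check_hosts white geight silver blue" prints one line per machine. If every line says ok, the nameserver already knows about your machines.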
I have found the following solves the problem. I am not saying this is the only or best way, just that it worked for me.
I have 4 machines in my local area network.
There are two things that I did on each machine:
There is a file, /etc/hostconfig, that "gives" the name of your machine. Close to the top of this file there is a line that reads HOSTNAME=-AUTOMATIC-.
On each of my machines, I replaced -AUTOMATIC- with the name of the machine. Note that /etc/hostconfig is owned by root, so you will need to use sudo to edit the file. (Actually, you can use BBEdit and it will ask you for your password when you save the file.) Also, the changes do not take effect until you reboot. For me, after the edits, the tops of the /etc/hostconfig files look like the following:
    [white:/etc] tkaiser% head hostconfig
    ##
    # /etc/hostconfig
    ##
    # This file is maintained by the system control panels
    ##
    # Network configuration
    #HOSTNAME=-AUTOMATIC-
    HOSTNAME=white
    ROUTER=-AUTOMATIC-
    [white:/etc] tkaiser%

    [geight:/etc] tkaiser% head hostconfig
    ##
    # /etc/hostconfig
    ##
    # This file is maintained by the system control panels
    ##
    # Network configuration
    HOSTNAME=geight
    ROUTER=-AUTOMATIC-
    [geight:/etc] tkaiser%

    [silver:/etc] tkaiser% head hostconfig
    ##
    # /etc/hostconfig
    ##
    # This file is maintained by the system control panels
    ##
    # Network configuration
    HOSTNAME=silver
    ROUTER=-AUTOMATIC-
    [silver:/etc] tkaiser%

    [blue:/etc] tkaiser% head hostconfig
    ##
    # /etc/hostconfig
    ##
    # This file is maintained by the system control panels
    ##
    # Network configuration
    HOSTNAME=blue
    ROUTER=-AUTOMATIC-
    [blue:/etc] tkaiser%
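The same edit could also be scripted rather than made by hand. The following is a sketch only: the function name set_hostname and the /tmp/hostconfig.new path are made up for illustration, it assumes the file still contains the line HOSTNAME=-AUTOMATIC-, and it assumes the machine name contains no characters that are special to sed (such as / or &). It prints the edited file instead of changing it, so you can look at the result before installing it.

```shell
#!/bin/sh
# set_hostname FILE NAME
# Prints FILE with the line HOSTNAME=-AUTOMATIC- replaced by HOSTNAME=NAME.
# Every other line, including ROUTER=-AUTOMATIC-, is passed through unchanged.
# FILE itself is not modified.
set_hostname() {
    sed "s/^HOSTNAME=-AUTOMATIC-\$/HOSTNAME=$2/" "$1"
}

# Possible use (hypothetical temporary path; the copy needs root):
#   set_hostname /etc/hostconfig white > /tmp/hostconfig.new
#   head /tmp/hostconfig.new
#   sudo cp /tmp/hostconfig.new /etc/hostconfig
```

As with the hand edit, the new name does not take effect until you reboot.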
NetInfo Manager is in /Applications/Utilities.
Launch NetInfo Manager and you will get something that looks like:
Click on "Click the lock to make changes" and enter your password.
We are going to add mappings between machine names and addresses. Click on machines and then localhost and hit Duplicate. This will create a new entry called "localhost copy." Edit the "Value(s)" on the right hand side so that they represent one of your machines.
For the "serves" property you will need to add ../network. New values are added using "Insert Value" under the "Directory" menu. The entry for machine "blue" looks like:
Add all of your machines to the machines list. Then do a "Save Changes" under the "Domain" menu and "Restart local Netinfo Domains" under "Management."
This needs to be done to all of your machines.
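Once the entries are in place on every machine, a quick way to confirm that the names are usable is to ping each machine by name. A minimal sketch, assuming ping resolves names through the system's normal lookup (and so sees the new machines entries) and accepts -c 1 to send a single packet. The function name check_reachable and the PING variable are made up for illustration.

```shell
#!/bin/sh
# PING is the command used to contact a host once.
# The variable is a hypothetical convenience so the command can be swapped out.
PING=${PING:-ping -c 1}

# check_reachable NAME...
# Prints one line per machine name and returns non-zero
# if any machine could not be contacted by name.
check_reachable() {
    unreachable=""
    for host in "$@"; do
        if $PING "$host" >/dev/null 2>&1; then
            echo "reachable: $host"
        else
            echo "unreachable: $host"
            unreachable="$unreachable $host"
        fi
    done
    # Succeed only if the list of unreachable machines is empty.
    [ -z "$unreachable" ]
}
```

Running "check_reachable white geight silver blue" on each of the four machines should report every name as reachable. A name that is reported unreachable may mean a missing or mistyped entry on that machine, or simply that the other machine is switched off.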
Click here for a movie of a machine being added.