Creating and running parallel applications with MPI under OS X

Author: Timothy Kaiser, Ph.D.
tkaiser@sdsc.edu
Revised: July 24, 2003

Much of the information on this page is dated, but the information about password-less ssh and network address resolution is still important. For information about more recent developments in running MPI on OS X see:

http://www.sdsc.edu/~tkaiser/mac_stuff/new_mpi.html

There is also a new note on how I set up my local area network to enable MPI to work without having a domain name server. See:

http://www.sdsc.edu/~tkaiser/mac_stuff/mac_setup.html





This document discusses compiling and running MPI under Macintosh OS X 10.1, in particular, the MPICH version of MPI using P4. If you don't know what P4 is, don't worry.

These directions apply only to mpich-1.2.2.3. There are some link issues associated with version 1.2.3. They don't look too serious, but I have not had a chance to work them out. If you get these resolved before I do, please let me know and I'll add the changes to this page. It looks like the problem has something to do with the compile scripts wanting to link in the profiling versions of the libraries.



For historical interest, check out the page I used for a 1995 presentation on running parallel jobs on the Macintosh. I believe I was the first person to do so.

There are a few issues associated with compiling and running MPICH under Macintosh OS X. None of them are major.
These issues are related to network setup, ssh logins, and configuring the build, each covered in a section below.

We will use the P4 device when building MPICH. I have compiled with the "shared" memory option of P4 turned off.

Instructions are included for building the Fortran interface using Absoft's compiler. If you don't use Fortran just skip those parts.

I also assume that you are fairly comfortable using Unix and you have untarred the MPICH distribution from http://www.mcs.anl.gov/mpi/mpich/download.html to get the directory mpich-1.2.2.3.


Do not use "StuffIt Expander" or similar software to untar the file. You must use
tar -xf mpich.tar
on the command line.

In a few of the commands and directory names you will see my user name, "tkaiser". Please replace tkaiser with your user name as needed.

Network setup

Things work best if you have a static IP address. DHCP also works if there is a nameserver mapping between your address and a name for your machine. That is, you must have access to a nameserver that knows about your machine.

Most of the network setup for OS X is done in the Network panel of System Preferences. Open this up and select the TCP/IP tab. When I am using DHCP over AirPort mine looks like:


Note your IP address. Open up a terminal window and type

nslookup "your IP address"

For me I get...

[mac:~] tkaiser% nslookup 132.249.65.138
Server:  ns1.sdsc.edu
Address:  198.202.75.26

Name:    dhcp-65-138.sdsc.edu
Address:  132.249.65.138
There is a mapping between my address, 132.249.65.138 and dhcp-65-138.sdsc.edu. The P4 device will not work without such a mapping.
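
The check above can also be scripted. What follows is only a minimal sketch, written for /bin/sh rather than the csh used elsewhere on this page. The function name mapped_name is hypothetical, and the sketch assumes that nslookup prints a "Name:" line in the format shown above, which other versions of nslookup may not do.

```sh
#!/bin/sh
# mapped_name: read nslookup output on stdin and print the host name
# from the first "Name:" line. Returns non-zero if there is no such
# line, i.e. no address-to-name mapping was reported.
# (Hypothetical helper; assumes the output format shown above.)
mapped_name() {
    awk '$1 == "Name:" { print $2; found = 1; exit }
         END { exit !found }'
}
```

It could be used as "nslookup 132.249.65.138 | mapped_name", which prints the name if a mapping exists and fails otherwise.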


SSH logins

For your parallel programs to run on a machine you need to be able to log into the machine. The best way to do this is to set up ssh password-less logins. Use ssh-keygen to generate the file
~/.ssh/identity.pub
Do this on all of the machines of interest. Concatenate the identity.pub files from each machine into a single file and put a copy in
~/.ssh/authorized_keys
on all of your machines.
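
The concatenation step can be sketched as a small /bin/sh function (again, not csh). The name merge_keys and its arguments are hypothetical. It assumes you have already collected the identity.pub file from each machine into one directory under distinct names, and it overwrites the output file rather than appending to it. The chmod is there because ssh is commonly strict about the permissions on this file.

```sh
#!/bin/sh
# merge_keys OUTFILE KEYFILE...
# Concatenate the given public-key files into OUTFILE and make it
# readable and writable by the owner only.
# (Hypothetical helper; the name and arguments are illustrative.)
merge_keys() {
    out=$1
    shift
    cat "$@" > "$out" && chmod 600 "$out"
}
```

For example, with the collected keys saved under the assumed names mac.pub, peloton.pub and ozark.pub, "merge_keys authorized_keys mac.pub peloton.pub ozark.pub" builds the file that then goes into ~/.ssh on each machine.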



Problems have been reported with ssh protocol 2 and password-less logins. To have ssh try protocol 1 first, falling back to protocol 2, put the following configuration file in your .ssh directory

[mac:~/.ssh] tkaiser% cat config
Protocol 1,2


See man ssh and man ssh-keygen for more information.

Configuration

If you are using Fortran, check that the "Unix" library links correctly.
To do this, copy the program

/Applications/Absoft/examples/GetArgs/GetArgs.f

to your home directory and compile it using the command given in the source.

If you get errors about needing to run "ranlib" on some of the libraries, do it.

If you copy and paste the lines below, make sure you get the whole line.

In the directory mpich-1.2.2.3, do a

setenv CFLAGS -fno-common
setenv RSHCOMMAND /usr/bin/ssh
setenv USERF77 /Applications/Absoft/bin/f77
setenv F77 "f77 -N109"
./configure  --with-device=ch_p4 -with-arch=LINUX --with-comm=ch_p4 --without-romio --enable-f77

or if you are not using Fortran

setenv CFLAGS -fno-common
setenv RSHCOMMAND /usr/bin/ssh
./configure  --with-device=ch_p4 -with-arch=LINUX --with-comm=ch_p4 --without-romio --disable-f77

The first setenv sets a "C" flag for building the library. This is needed so that the symbols in one of the object files, p4_globals.o, link correctly when put in the library. As an alternative to this setenv you can add the "-c" option to the ranlib that is done after the libraries are built.

/usr/bin/ssh is used by the library to launch tasks on remote nodes.

/Applications/Absoft/bin/f77 is the path to the Fortran compiler.

Setting F77 to f77 -N109 is done so that one of the tests in the configure process is successful.


Next do a

make mpilib
You will get messages like

ranlib: file: $HOME/mpich-1.2.2.3/lib/libmpich.a(sendutil.o) has no symbols
This does not appear to be a problem.

To make the Fortran library do

[mac:~/mpich-1.2.2.3] tkaiser% cd src/fortran
[mac:~/mpich-1.2.2.3/src/fortran] tkaiser% make flibs
When finished, your mpi library should be in lib

[mac:~/mpich-1.2.2.3] tkaiser% ls -lt $HOME/mpich-1.2.2.3/lib/*a
-rw-r--r--  1 tkaiser  staff   197692 Nov 20 20:28 libfmpich.a
-rw-r--r--  1 tkaiser  staff  1376332 Nov 20 20:28 libmpich.a
-rw-r--r--  1 tkaiser  staff     9048 Nov 20 20:28 libmpichfsup.a
-rw-r--r--  1 tkaiser  staff   198076 Nov 20 20:28 libpmpich.a

The files libfmpich.a and libmpichfsup.a are for Fortran.

Next "update" the libraries by doing a ranlib

[mac:~/mpich-1.2.2.3] tkaiser% ranlib $HOME/mpich-1.2.2.3/lib/*a
or, if you did not "setenv CFLAGS -fno-common", add the "-c" option

[mac:~/mpich-1.2.2.3] tkaiser% ranlib -c $HOME/mpich-1.2.2.3/lib/*a

Making the examples

The makefiles in $HOME/mpich-1.2.2.3/examples expect the file "mpicc" to be in the bin directory. It's not there, so do a

cp $HOME/mpich-1.2.2.3/util/mpiCC $HOME/mpich-1.2.2.3/bin/mpicc
(Note the case in the above command: mpiCC is copied to mpicc.)

Next, go to

cd $HOME/mpich-1.2.2.3/examples/basic

If you do a make of "cpi" you should get an executable. For example:

[mac:~/mpich-1.2.2.3/examples/basic] tkaiser% make cpi
/Users/tkaiser/mpich-1.2.2.3/bin/mpicc -c cpi.c
/Users/tkaiser/mpich-1.2.2.3/bin/mpicc -o cpi cpi.o -lm
[mac:~/mpich-1.2.2.3/examples/basic] tkaiser%
For Fortran you get

[mac:~/mpich-1.2.2.3/examples/basic] tkaiser% make fpi
/Users/tkaiser/mpich-1.2.2.3/bin/mpif77  -c fpi.f
FORTRAN 77 Compiler 7.0, Copyright (c) 1987-2001, Absoft Corp.
/Users/tkaiser/mpich-1.2.2.3/bin/mpif77  -o fpi fpi.o
[mac:~/mpich-1.2.2.3/examples/basic] tkaiser% 

I captured the steps given above (except checking that the Fortran Unix library links) and put them in a script. The script also copies p4_globals.o into the lib directory. It assumes that it and the mpich.tar file are in the same directory.


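#!/bin/csh -f
# Unpack the distribution next to this script and build inside it.
setenv MPI_DIR $PWD/mpich-1.2.2.3
tar -xf mpich.tar
cd $MPI_DIR
# Build settings described in the Configuration section.
setenv CFLAGS -fno-common
setenv RSHCOMMAND /usr/bin/ssh
setenv USERF77 /Applications/Absoft/bin/f77
setenv F77 "f77 -N109"
./configure --with-device=ch_p4 -with-arch=LINUX --with-comm=ch_p4 --without-romio --enable-f77
# Build the C library, then the Fortran library.
make mpilib
cd src/fortran
make flibs
# Update the tables of contents of the libraries.
ranlib $MPI_DIR/lib/*a
# The example makefiles expect bin/mpicc (note the case).
cp $MPI_DIR/util/mpiCC $MPI_DIR/bin/mpicc
cp $MPI_DIR/mpid/ch_p4/p4/lib/p4_globals.o $MPI_DIR/lib/p4_globals.o
# Build the C and Fortran examples.
cd $MPI_DIR/examples/basic
make cpi
make fpi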

Running an application

Copy your application "cpi" to your home directory

cp cpi ~
Then go there.

cd ~
Now use scp to copy your application to all of the machines that you want to use to run your job. For example, I will copy to a machine called peloton.

scp cpi peloton:cpi
where peloton is the name of a machine. If you are asked for a password when you do this go back and look at the section for setting up password-less logins.
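
With more than one remote machine, the copy can be put in a loop. This is only a sketch for /bin/sh (not csh). The function name copy_to_hosts and the SCP variable are hypothetical; SCP exists just so that you can substitute "echo" for a dry run that prints the arguments instead of copying anything.

```sh
#!/bin/sh
# copy_to_hosts FILE HOST...
# Copy FILE to the same relative path on each HOST, stopping at the
# first failure. Set SCP=echo to print the arguments instead of copying.
# (Hypothetical helper; the name and the SCP variable are illustrative.)
copy_to_hosts() {
    file=$1
    shift
    for host in "$@"; do
        ${SCP:-scp} "$file" "$host:$file" || return 1
    done
}
```

For the machines used on this page, "copy_to_hosts cpi peloton ozark" runs the same scp command as above once per machine.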

To run a parallel application you need to specify where the various tasks will be run. This is done using a "Process Group" file. You then specify the process group file on the command line when you run your application, like

cpi -p4pg cpi.p4pg

where cpi.p4pg is the file.

The format for the pg file is a little odd. Say I want to run my job using three tasks: one on my local machine, one on peloton, and one on a machine called ozark. Here is a process group file


local 0
peloton        1 /Users/tkaiser/cpi
ozark          1 /Users/tkaiser/cpi
In theory, we can run multiple tasks on a single node and have the tasks swap messages using shared memory. (I don't have a multiple-processor Mac, so there is not much use in doing this, and I built MPICH with shared memory turned off.)

The first line specifies that I want to run a job locally with "0" additional tasks sharing memory. A number other than 0 will give an error.

The next lines specify that I want to run additional tasks on the given nodes (peloton and ozark), each running "1" copy of /Users/tkaiser/cpi.

I need to give the full path name to the application. For your process group file change /Users/tkaiser to your home directory name.
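
A process group file in this layout can be generated rather than typed. The following is a minimal /bin/sh sketch (not csh). The function name make_p4pg is hypothetical, and it assumes, as the example above suggests, that the fields may be separated by any amount of whitespace. It always writes "local 0", matching a build with shared memory turned off.

```sh
#!/bin/sh
# make_p4pg PROGRAM HOST...
# Print a process group file: the local machine with 0 additional
# tasks, then one task running PROGRAM (a full path) on each HOST.
# (Hypothetical helper; the name and arguments are illustrative.)
make_p4pg() {
    prog=$1
    shift
    echo "local 0"
    for host in "$@"; do
        printf '%s 1 %s\n' "$host" "$prog"
    done
}
```

For example, "make_p4pg /Users/tkaiser/cpi peloton ozark > cpi.p4pg" writes a file equivalent to the one shown above.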

This yields

[mac:~] tkaiser% cpi -p4pg cpi.p4pg
Process 0 of 3 on mac.sdsc.edu
pi is approximately 3.1416009869231254, Error is 0.0000083333333323
wall clock time = 0.010493
Process 1 of 3 on ozark.sdsc.edu
Process 2 of 3 on peloton.sdsc.edu
[mac:~] tkaiser%