The message-passing version of TPHOT was implemented on Lawrence Livermore National Laboratory's 128-processor BBN Butterfly TC2000 using the Livermore Message-Passing System (LMPS), a library of message-passing routines. Each processor of the BBN has 16 MBytes of memory that can be "shared" by all nodes via a "butterfly switch." Under LMPS, however, each node's memory belongs only to that node from the perspective of the application program. For the test problem run with 8 tasks, the code yielded identical results on the BBN and on the Cray. Many different runs were made on the BBN, varying the number of processors from 1 to 116 and the number of particles (i.e., the workload W) from 2400 to 24,000,000.
Table 5.1 gives the simulation times for the Butterfly as a function of the number of processors N and the workload W. We have arbitrarily assigned W = 1.0 to the case with approximately 240,000 particles. Dashes appear in the table for two reasons: (1) large workloads are prohibitively expensive on few processors, and (2) small workloads on a large number of processors yield erratic timings.

| Number of processors (N) | W = 0.01 time (sec) | W = 0.1 time (sec) | W = 1.0 time (sec) | W = 10.0 time (sec) | W = 100.0 time (sec) |
|---|---|---|---|---|---|
| 1 | 17 | 144 | 1407 | - | - |
| 4 | 6 | 38 | 357 | - | - |
| 8 | 5 | 22 | 181 | 1769 | - |
| 9 | 5 | 20 | 161 | 1595 | - |
| 10 | 5 | 18 | 145 | 1416 | - |
| 16 | 7 | 13 | 94 | 888 | - |
| 32 | 15 | 15 | 54 | 450 | - |
| 64 | - | 31 | 53 | 251 | 2364 |
| 80 | - | - | - | 223 | 1813 |
| 100 | - | - | - | 215 | 1493 |
| 116 | - | - | - | 224 | 1366 |
The speedups for each case in Table 5.1 are computed using equation (5.1), with the N = 1 run of each workload serving as the reference serial time. This is not strictly correct, because the parallel code run on a single processor is not the optimal serial code. The effect is probably small, but because the reference time is somewhat larger than the best achievable serial time, it tends to make the speedups appear better than they really are.
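As a rough illustration, the following sketch computes speedups and parallel efficiencies directly from the Table 5.1 times. It assumes that equation (5.1) has the conventional form S_N = T_1 / T_N, with T_1 the N = 1 time for the same workload; that form is not reproduced in this section, and the names `TIMES`, `speedup`, `efficiency`, and `speedup_table` are hypothetical. Workloads W = 10.0 and 100.0 are carried in the data but skipped, because Table 5.1 has no N = 1 entry for them.

```python
# Sketch: speedup S_N = T_1 / T_N and parallel efficiency E_N = S_N / N,
# using the N = 1 run of each workload as the reference serial time.
# ASSUMPTION: equation (5.1) has this form; it is not reproduced here.
# All names below are hypothetical, chosen for illustration only.

# Table 5.1 times in seconds, keyed by workload W, then by processor count N.
# Dashed (missing) entries are simply omitted.
TIMES = {
    0.01: {1: 17, 4: 6, 8: 5, 9: 5, 10: 5, 16: 7, 32: 15},
    0.1: {1: 144, 4: 38, 8: 22, 9: 20, 10: 18, 16: 13, 32: 15, 64: 31},
    1.0: {1: 1407, 4: 357, 8: 181, 9: 161, 10: 145, 16: 94, 32: 54, 64: 53},
    # No N = 1 measurement exists for these two workloads.
    10.0: {8: 1769, 9: 1595, 10: 1416, 16: 888, 32: 450, 64: 251,
           80: 223, 100: 215, 116: 224},
    100.0: {64: 2364, 80: 1813, 100: 1493, 116: 1366},
}


def speedup(t_ref: float, t_n: float) -> float:
    """Speedup of an N-processor run relative to the reference time."""
    return t_ref / t_n


def efficiency(t_ref: float, t_n: float, n: int) -> float:
    """Fraction of ideal linear speedup achieved on n processors."""
    return speedup(t_ref, t_n) / n


def speedup_table(times: dict) -> dict:
    """Return {W: {N: (S_N, E_N)}} for every workload that has an N = 1 entry."""
    result = {}
    for w, row in times.items():
        if 1 not in row:
            continue  # skipped: no serial reference time for this workload
        t1 = row[1]
        result[w] = {
            n: (speedup(t1, t), efficiency(t1, t, n))
            for n, t in sorted(row.items())
        }
    return result


if __name__ == "__main__":
    for w, row in speedup_table(TIMES).items():
        for n, (s, e) in row.items():
            print(f"W={w:<5} N={n:<3} S={s:6.2f} E={e:5.2f}")
```

For W = 1.0, for example, this gives S_32 = 1407/54 ≈ 26.1 and an efficiency of about 0.81. Because T_1 here is the parallel code's own single-processor time, these figures inherit the optimistic bias described above.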
| Workload (W) | Number of histories (N_h) | Model single-processor execution time (sec) | Observed single-processor execution time (sec) | Serial fraction (f) |
|---|---|---|---|---|
| 0.01 | 2347 | 17.2 | 17 | 0.19 |
| 0.10 | 23843 | 143.8 | 144 | 0.023 |
| 1.00 | 238232 | 1407 | 1407 | 0.0024 |
| 10.0 | 2382320 | 14070 | - | 0.00024 |
| 100.0 | 23823200 | 140700 | - | 0.000024 |
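The serial fraction f in the last column can be turned into a predicted speedup. The sketch below assumes an Amdahl-type model, S(N) = 1 / (f + (1 - f) / N), in which f is the fraction of the single-processor time that does not parallelize. The text does not state this model explicitly, and `SERIAL_FRACTION`, `amdahl_speedup`, and `asymptotic_speedup` are hypothetical names.

```python
# Sketch of an Amdahl-type speedup model driven by the serial fraction f.
# ASSUMPTION: f is the fraction of single-processor time that does not
# parallelize, so S(N) = 1 / (f + (1 - f) / N). This form is not stated
# in the text; all names here are hypothetical.

# Serial fractions from the table above, keyed by workload W.
SERIAL_FRACTION = {0.01: 0.19, 0.10: 0.023, 1.00: 0.0024, 10.0: 0.00024, 100.0: 0.000024}


def amdahl_speedup(f: float, n: int) -> float:
    """Predicted speedup on n processors for serial fraction f (0 <= f <= 1)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("serial fraction must lie in [0, 1]")
    if n < 1:
        raise ValueError("processor count must be at least 1")
    return 1.0 / (f + (1.0 - f) / n)


def asymptotic_speedup(f: float) -> float:
    """Upper bound on speedup as n grows without limit (1 / f)."""
    return float("inf") if f == 0.0 else 1.0 / f


if __name__ == "__main__":
    for w, f in SERIAL_FRACTION.items():
        print(f"W={w:<6} S(32)={amdahl_speedup(f, 32):8.2f} "
              f"S(116)={amdahl_speedup(f, 116):8.2f} "
              f"limit={asymptotic_speedup(f):10.1f}")
```

Under this assumption, W = 1.0 (f = 0.0024) would allow at most about 29.8 on 32 processors, compared with 1407/54 ≈ 26.1 from Table 5.1, and no more than 1/f ≈ 417 however many processors are used.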