A Parallel Monte Carlo Code for Planar and SPECT Imaging: Implementation, Verification and Applications in I-131 SPECT

Yuni Dewaraja1, Michael Ljungberg2, Amitava Majumdar3, Abhijit Bose1, Kenneth F Koral1

1Internal Medicine Department, Division of Nuclear Medicine, University of Michigan, USA, 2Department of Radiation Physics, Lund University, Sweden, 3San Diego Supercomputer Center, University of California San Diego, USA.

 

This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization equally partitions the photon histories among the processors and uses the MPI library for interprocessor communication and the SPRNG library to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory parallel architectures. A linear increase in computing speed with the number of processors was demonstrated for up to 32 processors. This speed-up is especially significant in computationally demanding SPECT simulations involving higher-energy photon emitters, which require explicit modeling of the phantom and collimator. For I-131 SPECT, the code was validated by comparing images of an experimental heart/thorax phantom with those from a simulation of a computerized version of the same phantom. Applications were demonstrated by assessing scatter and attenuation correction as well as quantification accuracy in simulations of the voxel-man phantom. In conclusion, the speed-up demonstrated by the present parallel implementation and the recent increase in accessibility of advanced computer architectures make it feasible to carry out accurate, clinically realistic SPECT simulations of higher-energy photon emitters.

1. Introduction

There is renewed interest in I-131 quantification for dosimetry due to the success of radioimmunotherapy in treating B-cell non-Hodgkin's lymphoma. There has long been a need for a suitable Monte Carlo (M.C.) code to assess and solve the difficult problems associated with imaging high-energy photon emitters such as I-131. Only a handful of the M.C. codes applicable to nuclear medicine imaging include complete photon transport through both the phantom and collimator, which is required for accurate modeling of high-energy photon emitters (1-3). Because physical modeling of the collimator and phantom is computationally demanding, one of the limitations of these codes is speed, even when variance reduction is utilized. We have previously reported on I-131 SPECT simulations of simple analytical phantoms (4) using the SIMIND M.C. code (5), which was combined with a collimator routine that explicitly models scatter and penetration (1). Even with this relatively fast code, when modeling realistic imaging situations using voxel-based phantoms, the simulation times for obtaining statistically acceptable data become unrealistic (several months). Computational speed is especially important in SPECT since typically 60 projections must be simulated. Due to the recent increase in accessibility of advanced computer architectures, there has been much interest in vectorized and parallel M.C. The M.C. method for particle simulation is inherently parallel because particle histories are independent (6). In two review articles on M.C. in nuclear medicine (7) and radiation physics (8), parallel processing is recognized as the ideal solution for such simulations. However, development of either vectorized or parallel M.C. codes for nuclear medicine applications has been limited (9, 10). Here we describe the implementation of the SIMIND M.C. code on a distributed memory parallel machine. Verification and applications of the parallel code are presented for I-131 SPECT with voxel-based phantoms.

2. Implementation of SIMIND on the parallel system

Parallel architecture: The parallel code was implemented on the RS/6000 Scalable Power Parallel 2 (SP2) machine from IBM. The distributed memory architecture of the SP2 is based on a "shared nothing" model, with each node having its own CPU, operating system, memory and disk. The SP2 at the University of Michigan has forty-eight 160 MHz processors with 1 GB of memory each.

Random Number Generator: A critical part of a history-based parallel algorithm is a good parallel random number generator that provides uncorrelated random number streams to each processor. In the present work we use a parallel version of the linear congruential generator from the recently developed SPRNG (Scalable Parallel Random Number Generator) software library.

Parallelization: The photon histories are distributed equally among the processors, and each processor performs the entire simulation and reports its results to the host processor. For interprocessor communication we used the MPI library, a standard message-passing library for distributed memory machines. The MPI calls are incorporated at the linking stage, which made it possible to maintain a single version of SIMIND for both the serial and parallel implementations; this is a significant advantage since the program is under continuous development. A simplified flow chart for simulating a SPECT image is given in Fig. 1, with the diagram for the host processor on the left and the diagram for all other processors on the right. The host also contributes a complete simulation set, but has the additional tasks of summing the partial results and storing the global results. We are presently working on another version of the parallel algorithm, which would split the simulation of the different SPECT projection angles among the processors.
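As an illustration of this history-splitting scheme, the following minimal C sketch (not the SIMIND source) shows each MPI rank simulating its share of the photon histories with its own SPRNG stream and the host summing the partial projection images with MPI_Reduce. It assumes the SPRNG 1.x default C interface (init_sprng/sprng) and replaces the actual photon transport through the phantom and collimator with a placeholder.

    /* Minimal sketch of the history-splitting scheme: each MPI rank simulates
       an equal share of the photon histories with its own SPRNG stream, and
       rank 0 (the host) sums the partial projection images.  Not the SIMIND
       source; the transport routine is only a placeholder. */
    #include <mpi.h>
    #include <stdlib.h>
    #include "sprng.h"                /* SPRNG parallel random number library */

    #define SEED  985456376
    #define NPIX  (64 * 64)           /* pixels per projection (illustrative) */

    static void simulate_history(double *proj, int *stream)
    {
        /* Placeholder for photon transport through phantom and collimator.
           sprng(stream) returns a uniform deviate in [0,1) from this rank's
           independent stream. */
        int bin = (int)(sprng(stream) * NPIX);
        proj[bin] += 1.0;
    }

    int main(int argc, char **argv)
    {
        int rank, nproc;
        long total_histories = 84000000L;  /* photon histories, one projection */
        double *proj, *global = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        /* One independent, uncorrelated random number stream per processor. */
        int *stream = init_sprng(rank, nproc, SEED, SPRNG_DEFAULT);

        /* Equal partitioning of the photon histories among the processors. */
        long my_histories = total_histories / nproc;

        proj = calloc(NPIX, sizeof(double));
        for (long i = 0; i < my_histories; i++)
            simulate_history(proj, stream);

        /* The host (rank 0) sums the partial projections and stores them. */
        if (rank == 0)
            global = calloc(NPIX, sizeof(double));
        MPI_Reduce(proj, global, NPIX, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        /* ... rank 0 would write "global" to disk here ... */

        free(proj);
        free(global);
        MPI_Finalize();
        return 0;
    }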

3. Results

For the results presented here, the simulated SPECT camera was a Picker Prism 3000 with an ultra-high-energy collimator. SIMIND includes access to the following voxel phantoms: 1) the voxel-man (11) and 2) the Radiological Support Devices (RSD) heart/thorax phantom. The RSD phantom is also commercially available as an experimental phantom. SIMIND allows tumors to be superimposed onto these phantoms. Reconstruction used the SAGE algorithm, which incorporated an attenuation-weighted strip integral, with scatter estimates based on the triple energy window (TEW) method.
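For reference, a minimal C sketch of the standard TEW scatter estimate is given below; the inputs (side-window counts and window widths) are illustrative placeholders, not the acquisition settings used in this work.

    /* Standard triple energy window (TEW) scatter estimate: the scatter counts
       in the photopeak window are approximated from the count densities in two
       narrow windows placed on either side of the photopeak (trapezoid rule). */
    double tew_scatter_estimate(double c_lower, double c_upper,   /* side-window counts  */
                                double w_lower, double w_upper,   /* side-window widths  */
                                double w_peak)                    /* photopeak width     */
    {
        return 0.5 * (c_lower / w_lower + c_upper / w_upper) * w_peak;
    }

The estimate is evaluated per projection bin and subtracted from (or modeled within) the photopeak data during reconstruction.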

                Small (8.4x10^7 photons/proj.)        Medium (8.4x10^8 photons/proj.)
  N       Time (sec)   Speedup   Efficiency      Time (sec)   Speedup   Efficiency
  1         17131       1.000      1.000           171640      1.000      1.000
  2          8581       1.997      0.998            85922      1.998      0.999
  4          4298       3.986      0.996            42959      3.995      0.999
  8          2162       7.923      0.990            21598      7.947      0.993
 16          1085      15.785      0.987            10749     15.968      0.998
 32           547      31.304      0.978             5408     31.739      0.992

Table 1: Measured timing results for the SP2.

Fig 1: Flow chart of the parallel simulation.

Timing Results: Table 1 shows the performance results as a function of the number of nodes and the problem size for the voxel-man simulation. The speed-up, S, is defined as the execution time on a single processor divided by the execution time on N processors, and the efficiency is defined as S/N. According to Table 1, linear speed-up, i.e. S approaching N, is observed. This is to be expected because in our implementation the sequential fraction is negligible and the interprocessor communication requirements are minimal. There is a very small degradation in speed-up and efficiency as the number of processors increases, because of parallelization overhead attributed to communication and synchronization.
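For example, S and E can be recomputed from the execution times in Table 1 as in the short C program below (here for the medium problem); small discrepancies with the tabulated values arise because the times in Table 1 are rounded to whole seconds.

    /* Speed-up S = T(1)/T(N) and efficiency E = S/N, evaluated from the
       rounded execution times of Table 1 (medium problem). */
    #include <stdio.h>

    int main(void)
    {
        const int    nproc[]    = {1, 2, 4, 8, 16, 32};
        const double t_medium[] = {171640, 85922, 42959, 21598, 10749, 5408};
        const int    npoints    = 6;

        for (int i = 0; i < npoints; i++) {
            double speedup    = t_medium[0] / t_medium[i];   /* S = T(1)/T(N) */
            double efficiency = speedup / nproc[i];          /* E = S/N       */
            printf("N=%2d  S=%6.3f  E=%5.3f\n", nproc[i], speedup, efficiency);
        }
        return 0;
    }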

Validation of the Parallel Code: I-131 energy spectra and point spread functions obtained with the parallel version of SIMIND compared very well with those obtained with the serial code and by experimental measurement. SPECT validation is demonstrated in Figure 2, which compares typical reconstructed slices of the experimental and simulated RSD heart/thorax phantom. The physical phantom was augmented by inserting a 100-cc water-filled plastic sphere close to the liver to represent a tumor.

Applications of the Parallel Code: We demonstrate applications of the parallel code by using it to evaluate scatter and attenuation correction as well as quantification accuracy. A qualitative assessment of the corrections is given in Fig. 3, which compares a scatter- and attenuation-corrected image (right) with the ideal image (center) obtained by simulating the case with no scatter or attenuation. Using our clinical SPECT quantification procedure (12), VOI counts corresponding to three tumors in the reconstructed voxel-man image were converted to activity. Since the true simulated activity was known, the quantification error could be determined. The errors were < 11%, which shows that reasonable quantification accuracy is possible even in the presence of non-uniform activity and non-uniform scatter and attenuation. It is worth noting that the tumors simulated in this work had the same size as the sphere used in the calibration simulation; the effects of varying tumor size were not evaluated.
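As a simple illustration of the count-to-activity step and the error calculation (this is not the calibration procedure of reference 12; the calibration factor, acquisition time and counts below are hypothetical values):

    /* Illustrative count-to-activity conversion and quantification error;
       all numerical values are hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        double cal_factor    = 12.5;   /* hypothetical: counts/s per MBq        */
        double scan_time     = 1800.0; /* hypothetical acquisition time (s)     */
        double voi_counts    = 4.2e5;  /* hypothetical VOI counts for one tumor */
        double true_activity = 20.0;   /* hypothetical true activity (MBq)      */

        double estimated = voi_counts / (cal_factor * scan_time);
        double error_pct = 100.0 * (estimated - true_activity) / true_activity;

        printf("estimated activity = %.2f MBq, error = %.1f%%\n",
               estimated, error_pct);
        return 0;
    }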

The phantom simulations of Figures 2 and 3 were carried out on 16 processors of the SP2. The total computational time was approximately 220 hours for the entire SPECT simulation (with 1.4x10^9 photons/projection). Since parallel processing speed and power are steadily increasing, we can expect the computational time for such simulations to decrease rapidly. The IBM SP2 at the University of Michigan is presently being upgraded to 128 processors. Recently, the San Diego Supercomputer Center acquired the new teraflops IBM RS/6000 SP, configured with 1152 processors running at 222 MHz. These supercomputing facilities are accessible to researchers at other institutions via the National Partnership for Advanced Computational Infrastructure (NPACI). We have also successfully run the parallel SIMIND code on an SGI Origin 2000 parallel computer configured with one hundred 300 MHz processors at Lund University, Sweden.

4. Summary and Conclusions

The linear speed-up achieved with the parallel SIMIND code makes it feasible to carry out accurate and highly realistic SPECT simulations of higher-energy photon emitters. The parallelization techniques used in this work with the IBM SP2 are also applicable to other distributed memory parallel architectures.

Fig 2: True activity and the simulated and measured images for the RSD phantom. The profile is across the center of the tumor.

Fig 3: True activity distribution and the ideal and corrected images for the voxel-man. The profile is across the center of Tumor 3.

References

1. de Vries D, Moore S, Zimmerman C, et al. IEEE Trans Med Imag 9:430-438, 1990.

2. Bice A, Durack L, Pollard K, Early J. J Nucl Med 32:1058-1059, 1991.

3. Yanch J, Dobrzeniecki A. Proc IEEE Nuclear Science Symposium and Medical Imaging Conference, Santa Fe, NM, pp. 1809-1813, 1991.

4. Dewaraja YK, Ljungberg M, Koral KF. Accepted to J Nucl Med, 2000.

5. Ljungberg M, Strand S-E. Comput Methods Programs Biomed 29:257-272, 1989.

6. Martin WR. Adv Nucl Sci Tech 22:105-164, 1992.

7. Zaidi H. Med Phys 26:574-608, 1999.

8. Andreo P. Phys Med Biol 36:861-920, 1991.

9. Zaidi H, Labbe C, Morel C. Parallel Computing 24:1523-1536, 1998.

10. Smith MF, Floyd CE, Jaszczak RJ. Med Phys 20:1121-1127, 1993.

11. Zubal IG, et al. Med Phys 21:299-302, 1994.

12. Koral KF, et al. Accepted to J Nucl Med, 2000.