Skip to content

COSMOLOGY | Contents | Next

PARTREE and SCF: A Strategic Applications Collaboration for the Study of Galaxies

Lars E. Hernquist
Harvard-Smithsonian Center for Astrophysics
Jay Boisseau
SAC Program Coordinator, SDSC

Stuart Johnson
Scientific Computing Group, SDSC
Robert Leary
Computational Science Research Group, SDSC

John Dubinski, University of Toronto
Steinn Sigurdsson, Pennsylvania State University
Chris Mihos, Case Western Reserve University
Romeel Davé, Princeton University
Kathryn V. Johnston, The Institute for Advanced Study
Roeland van der Marel, Space Telescope Science Institute
Stephen Vine, Ohio University

W hen astronomers survey the sky, they see that most stars are members of galaxies, collections of hundreds of millions to hundreds of billions of stars arrayed across the universe in a variety of shapes and sizes. Pictures from the Hubble Space Telescope have captured snapshots of the life cycle of galaxies from their cosmic infancy (in the farthest galaxies) to the present. How have these large families of stars formed and interacted to produce what Hubble sees? Scientists at SDSC recently helped a group of astrophysicists to model galaxy dynamics and evolution faster and more accurately.

"To sort out the history of the universe, we study the ways all of these galaxies have changed over time, using the fundamental physics that governs their gravitational evolution," said Lars Hernquist, professor at the Harvard-Smithsonian Center for Astrophysics. "Our computer codes are designed to answer galactic evolutionary questions. Now that they run two to three times faster on the fastest supercomputers, we can ask more precise questions and get better answers. This is work we could not have accomplished ourselves; it required the dedicated expertise that SDSC brought to it."






The optimization and speed-up of two of the group's codes, PARTREE and SCF, was part of NPACI's Strategic Applications Collaborations (SAC) program, originated by Jay Boisseau, Associate Director for Scientific Computing at SDSC. "We work with leading-edge researchers whose codes have wide application," Boisseau said, "to enhance the effectiveness of research on our HPC systems." Stuart Johnson of the Scientific Computing Group and Robert Leary of the Computational Science Research Group led the work on the Hernquist codes. Johnson optimized the code PARTREE for the IBM SP and CRAY T3E; Leary worked on the SCF code.

Galactic dynamics can be modeled as the collisionless interaction of a large collection of particles, the more the better. "PARTREE stands for parallel tree code," Johnson said. "In a nutshell, it solves the collisionless N-body gravitational interaction problem by approximating the effects of distant particles."

With no approximations, since every particle must interact with every other particle, solutions for large N-body systems rapidly become computationally intractable. When the forces due to more distant particles are approximated, the number of operations can be reduced to a more manageable scale (from order N2 to N log N).

In the parallel tree code, data distribution across processors is accomplished by successively halving the particles, a process called orthogonal recursive bisection (ORB). Then the particles are placed in a so-called Barnes-Hut tree, according to their locations in nested, rectangular subdivisions of the space. At each time step, the code performs an ORB domain decomposition, a local Barnes-Hut construction, and a local essential tree construction (the combination of decompositions on each processor). PARTREE then "walks" through the essential trees to calculate the forces and move the particles, according to Johnson.

Top| Contents | Next

Figure 1 - Cluster of Bright Galaxies
Figure 1. Cluster of Bright Galaxies
John Dubinski of the Canadian Institute for Theoretical Astronomy at the University of Toronto simulated this bright galactic cluster with PARTREE. It shows cluster development 7.5 million years after the Big Bang. The large elliptical galaxy at the center has formed from the merger of several galaxies. The speedup of PARTREE accomplished at SDSC will assist Dubinski and others in making such simulations.


"Optimization proceeds from an understanding of the architecture," Johnson said, "down to the way in which the processors do division and multiplication." He used a test problem, consisting of 2 million particles distributed in a galactic disk and halo of stars, supplied by John Dubinski of the Canadian Institute for Theoretical Astrophysics at the University of Toronto, the main developer of the PARTREE code.

"I analyzed PARTREE performance on a single processor of the SP to profile the code," Johnson said. "And I found that it was spending about 80 percent of the time in two subroutines that accomplish the next-to-last step in the operation." For PARTREE, then, the optimization steps focused on the two key force calculation subroutines.

Johnson introduced inverse square roots for the force calculations, avoiding costly division operations. He also took advantage of the native precision of the calculations. Both machines perform all calculations in 64-bit arithmetic, so the groups of particles on which the routines operate can be copied and converted from single to double precision for a net gain in computational speed. Another tactic was to block the results in sizes convenient for cache operations, anticipating these sizes in the declarations of working storage.

As he worked with the code, Johnson predicted cycle counts for particular operations, then tested the predictions after changing the code. "All of this tuning increased the performance of the routines," Johnson said. "Eventually the performance came within 6% of the theoretical peak speeds of the SP processors. For in-cache data, the tuned code on the SP is about 3.7 times faster than the original, and the tuned subroutines speed up the whole code by a factor of two."

Top| Contents | Next

Figure 2 - Milky Way as a Cannibal
Figure 2. The Milky Way as a Cannibal
Images from an animation of a satellite galaxy being torn apart by the Milky Way's tidal field over the course of several billion years. The Milky Way is represented in blue in the center, with the satellite orbiting around it. This group of images was color coded by the time when particles are lost from the satellite (i.e., the particles that are lost the first time the satellite goes around the Milky Way are shown in red, and each subsequent time in green, blue, yellow....). Studies like this, by Kathryn V. Johnson of The Institute for Advanced Study, will be facilitated by the Strategic Application Collaboration acceleration of the codes that were used.


The other code, SCF, simulates galactic motions due to the gravitational influence of large sets of particles arranged in symmetrical fashion about a small number of centers, as in stellar clusters. "We can also use SCF for spherical galactic clusters," Hernquist said. "We can ask how the perturbation of one galaxy by others proceeds and how external forces shape cluster evolution as a whole."

"The optimizations were generally of two types," said Leary, who optimized SCF. "First, I made arithmetic and algebraic optimizations that improved computational performance by increasing the computational rate in Mflops and by reducing the total operation count. Then I worked on communications optimizations that improved parallel efficiency for the SCF code, generally by condensing many short messages into a few long messages."

Leary changed the coding for trigonometric function calls, obtaining sine and cosine results from tables constructed recursively from a single trigonometric identity, rather than by direct evaluation. He also reduced the work on exponentiation and global summation.

The net effect of the optimizations is illustrated by the performance gains on a 16,000-particle test set included in the original distribution. Here, the single-processor performance on the T3E was 73 Mflops, and the workload was 9,760 floating-point operations per body per time step. The optimizations increased the performance to 88 Mflops and reduced the workload to 6,620 operations per particle per time step. Overall, the speedup was a factor of 1.78 for a single T3E processor, and the parallel efficiencies brought the combined speedup on 32 processors to 2.61.

Leary emphasized that the increase in parallel efficiency, while not vital on the T3E, would be particularly important for the teraflops IBM SP to be installed at SDSC. "From short runs done on SDSC's 128-processor SP, the code appears to be highly scalable and should run well on the new machine," Leary said. "There should be enough memory for several billion bodies, and I would expect the full machine to be able to sustain a significant fraction of its teraflops peak speed on the SCF code."

Top| Contents | Next


"Our codes, originally developed by me with Jeremiah Ostriker at Princeton, are being used by a large, loosely federated group of collaborators," Hernquist said. "In addition to John Dubinski and me, the group includes Steinn Sigurdsson of Pennsylvania State University, Chris Mihos of Case Western Reserve University, Romeel Davé of Princeton, Kathryn V. Johnston of the Institute for Advanced Study, and Roeland van der Marel of the Space Telescope Science Institute. The optimizations accomplished at SDSC will enable all of these people to answer a variety of different questions."

Figures 1 and 2 illustrate some of the studies that can now be carried out with higher resolution or in larger systems. "We are learning about the important role of tidal interactions and galactic collisions and mergers in driving disk galaxy evolution and the formation of elliptical galaxies," Dubinski said. "Ultimately our modeling may help us distinguish between different cosmological models through direct comparison to observations. The dynamical evolution of galaxies will differ in universes that are either open, flat, or closed, with or without a cosmological constant. We are really eager to try our codes on the teraflops-class machine." --MMend note

Top| Contents | Next
Top| Contents | Next