With current technology, chemists can study molecular interactions that happen in picoseconds: chemical reactions in liquids, transitions between phases of condensed matter, and the interactions between water and dissolved proteins are just a few possibilities. These picosecond events, however, are only the starting points for processes--the folding of an entire protein, for example--that take a million times longer, microseconds, to complete.
David Chandler, chemistry professor at UC Berkeley, is pursuing ways to make detailed simulations of these longer processes and the key rare events within them. As a member of NPACI's Molecular Science thrust area, Chandler and his group will be working with the Programming Tools and Environments thrust area to implement algorithms and software that will make such simulations feasible on the highest-performance computing systems.
Rare events can be defined in practical terms as dynamic processes that occur so infrequently that they cannot be studied by "brute force" trajectory calculations. For example, a simulation to study how water molecules interact with a dissolved protein during the folding process would have to simulate every picosecond of the process for a full microsecond. If it takes one minute of film to record one picosecond of the dynamics, a film of an entire microsecond would take nearly two years.
Most of this microsecond, however, consists of less important interactions between water molecules and the protein. The "good parts" for chemists are the points where the protein changes from one relatively long-lived shape to another, called an isomerization. These isomerization events might happen once every thousand picoseconds, and Chandler's work concentrates on how to focus simulations on these rare events without wasting cycles during the long periods between the rare events.
"We need to be able to focus simulations on these 'good parts' so that we can perform efficient quantitative studies of rare events," Chandler said. "Once we have such simulations, we can run them over and over to collect enough events from different examples of the process for a meaningful statistical analysis."
In a chemical reaction, for example, the dynamics can be viewed as a rugged energy landscape. In the valleys--low-energy sites--the reactants are in stable states. The peaks represent high-energy barriers that tend to keep the reactants in stable states. But the peaks are broken up by accessible mountain passes, bottlenecks through which the reactants travel as the reaction proceeds. The rare events that Chandler's group studies are the moments when the reactants are caught in the act of traversing these passes, or transition pathways.
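The landscape picture can be made concrete with a toy model. The sketch below uses a hypothetical 1-D potential of my own choosing--a double well with a small oscillation added to mimic ruggedness--to locate the two stable states (valleys) and the "mountain pass" (the barrier top) between them. It is purely illustrative; it is not a potential from Chandler's work.

```python
import numpy as np

# Hypothetical rugged 1-D energy landscape: a double well (valleys near
# x = -1 and x = +1) plus a small cosine ripple for "ruggedness".
def energy(x):
    return (x**2 - 1.0)**2 + 0.05 * np.cos(20.0 * x)

xs = np.linspace(-1.5, 1.5, 2001)
es = energy(xs)

# Stable states: the lowest point in each half of the landscape.
left_mask = xs < 0.0
right_mask = xs >= 0.0
x_left = xs[left_mask][np.argmin(es[left_mask])]     # valley near -1
x_right = xs[right_mask][np.argmin(es[right_mask])]  # valley near +1

# The "mountain pass": the highest point on the stretch between the
# two valleys, which a reactive trajectory must surmount.
between = (xs >= x_left) & (xs <= x_right)
x_barrier = xs[between][np.argmax(es[between])]
```

In one dimension the barrier top is easy to find by inspection; the difficulty the article describes arises because real reactions live in landscapes with thousands of dimensions, where no such exhaustive scan is possible.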
If the location of these mountain passes were known, the problem would be much easier. Straightforward trajectory calculations could reconstruct the path from one stable state through the bottleneck to the next stable state. For most chemical reactions, however, the location of these energy mountain passes is unknown (Figure 1).
Figure 1: "Mountain Pass" Between Stable States
In the energy landscape of chemical reactions, the transition from one stable state to another (circles) must traverse a lower-energy "pass." If the landscape were smooth, the problem would be a minimization, but in reality, the landscape is much more rugged. David Chandler likens finding these transition pathways to throwing ropes over mountain passes in the dark. Rather than searching for explicit saddle points, Chandler's methods sample a set of possible pathways and find those in which the 'rope' settles in the vicinity of the lowest energy pass.
Without any information about the possible paths, there would be no tractable way to compute the dynamics of the transitions short of brute force. However, while the locations of the energy mountain passes are unknown, it is much easier to specify the locations of the stable states in the energy landscape. And the transition pathways must go from one stable state to another. The trick is to link the stable states with the lowest energy path.
"I have likened this task to throwing ropes over rough mountain passes in the dark," Chandler said. "Rather than searching for explicit saddle points, a useful strategy must effectively sample a set of transition pathways and find those in which the 'rope' connecting the stable states settles in the vicinity of the lowest energy pass."
Building on an idea from Lawrence Pratt at Los Alamos National Laboratory, Chandler's group treats the pathways as a "chain" of discrete states, rather than a continuous rope. The links of the chain are time slices that snake across the energy landscape. (It is convenient to think of the landscape as 3-D, but the true energy "action-scapes" for complex reactions generally have many more dimensions, and thus far greater computing demands.)
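In code, such a chain amounts to an array indexed by time slice, with each link holding the full configuration of the system at one instant. The sketch below is a minimal illustration of that bookkeeping for an invented 2-particle, 2-D system; the names, shapes, and the linear interpolation used to build the starting chain are my assumptions, not Chandler's actual data layout.

```python
import numpy as np

# Illustrative dimensions (assumed, not from the article's simulations).
n_slices = 100      # links in the chain: one time slice per link
n_particles = 2
dim = 2

# Stable-state configurations A and B (arbitrary placeholder values).
config_A = np.zeros((n_particles, dim))
config_B = np.ones((n_particles, dim))

# An arbitrary initial chain: linear interpolation from A to B.
# Physically unrealistic, but the sampling procedure described next
# only needs *some* path connecting the stable states to start from.
ts = np.linspace(0.0, 1.0, n_slices)[:, None, None]
chain = (1.0 - ts) * config_A + ts * config_B   # shape (100, 2, 2)
```

Note how the chain multiplies the system size: every time slice carries a full copy of all particle coordinates, which is the cost discussed later in the article.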
New chains or paths are created by initiating trajectories, forward and backward in time, from the configuration of a stochastically chosen intermediate time slice on a prior path (Figure 2). The new chain is accepted or rejected, in accord with the Monte Carlo requirements of detailed balance, to be consistent with the statistical characterizations of the initial and final stable or metastable states. Equilibration to physically relevant paths is rapid, even when the initial path is chosen arbitrarily and is unphysical.
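The move just described--pick a random time slice, perturb it, integrate forward and backward, then accept or reject--can be sketched in a few lines for a 1-D double-well system. This is a simplified caricature under several assumptions of mine: overdamped Langevin dynamics with a naive time reversal, an invented perturbation size, and an acceptance rule that only enforces the endpoint constraint (the full method also weights by the path probability to satisfy detailed balance).

```python
import numpy as np

rng = np.random.default_rng(0)

def force(x):
    return -4.0 * x * (x * x - 1.0)   # -dV/dx for V(x) = (x^2 - 1)^2

DT, NOISE = 1e-3, 0.5                 # illustrative time step and noise

def step(x, direction):
    # One overdamped Langevin step, forward (+1) or backward (-1) in
    # time; reversing time this crudely is a simplification for the sketch.
    return x + direction * DT * force(x) + NOISE * np.sqrt(DT) * rng.normal()

def in_A(x): return x < -0.8          # indicator for stable state A
def in_B(x): return x > 0.8           # indicator for stable state B

def shoot(path):
    """Propose a new path by shooting from a random intermediate slice."""
    n = len(path)
    i = rng.integers(1, n - 1)        # stochastically chosen time slice
    new = np.empty(n)
    new[i] = path[i] + 0.05 * rng.normal()     # small perturbation
    for j in range(i + 1, n):                  # integrate forward in time
        new[j] = step(new[j - 1], +1)
    for j in range(i - 1, -1, -1):             # integrate backward in time
        new[j] = step(new[j + 1], -1)
    # Keep the new path only if it still connects A to B; otherwise
    # retain the old path, as in a Metropolis-style rejection.
    return new if in_A(new[0]) and in_B(new[-1]) else path
```

Starting from an arbitrary straight-line path, repeated shooting moves relax the chain toward physically plausible transition pathways, which is the rapid equilibration the article mentions.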
Massive parallelization enters such procedures in a number of ways. For example, because the dynamics is sampled by Monte Carlo methods, multiple processors can be used as independent sources for harvesting statistics. In an alternative scheme, all time slices are analyzed simultaneously, with a few contiguous time slices assigned to one processor, the next few time slices to the next processor, and so on. Because the dynamics is Markovian, communication is necessary only between nearest-neighbor processors.
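The bookkeeping behind the second scheme can be sketched briefly. The functions below--my own illustrative names, not from any real code--show how contiguous blocks of time slices would be assigned to successive processor ranks, and why the Markovian property limits communication to adjacent ranks: each block depends only on the boundary slices of its two neighbors. A production code would do the exchange with a message-passing library such as MPI.

```python
# Illustrative decomposition of a chain of time slices across processors.
n_slices, n_procs = 1000, 8   # assumed sizes, for demonstration only

def block(rank):
    """Contiguous range [lo, hi) of time slices owned by this rank."""
    per = n_slices // n_procs
    lo = rank * per
    hi = n_slices if rank == n_procs - 1 else lo + per
    return lo, hi

def neighbors(rank):
    """Ranks this processor must exchange boundary slices with.

    Because the dynamics is Markovian, a slice depends only on the
    slices immediately before and after it, so only the ranks holding
    the adjacent blocks need to communicate.
    """
    return [r for r in (rank - 1, rank + 1) if 0 <= r < n_procs]
```

With this layout, doubling the processor count halves each block, and the communication volume per processor (two boundary slices) stays constant--the property that makes the scheme scale.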
Figure 2: Particle Paths as Polymers
Adding time as a third dimension to a 2-D particle system, the movements of particles can be viewed as directed polymers in a 3-D system. Using such "chains" allows Chandler to apply different computational methods to find transition pathways. However, with on the order of a hundred or a thousand links, the chains multiply the complexity of a given reaction. While higher-performance computers might make such calculations more feasible, new algorithms will help reduce the computation time. The box represents a region in the 2-D system that is unoccupied in the initial stable state and occupied in the final stable state.
While Chandler's method enables dynamics simulations to focus on the rare events, scientists still must pay the computational cost of treating a very large system. Using chains with on the order of a hundred or a thousand links multiplies the complexity of a given reaction. For example, with a thousand links, studying a 100-particle system requires simulating the equivalent of a 100,000-particle system. While access to a teraflops-class computer might make such calculations more feasible, algorithmic development will help reduce the computation time. Chandler's research, in particular, would benefit from a more efficient, general Monte Carlo package.
Such algorithmic development is one focus of NPACI's Programming Tools and Environments thrust area. Jack Dongarra of the Center for Research on Parallel Computation and the University of Tennessee, James Demmel of UC Berkeley, and their colleagues have produced widely used software for linear algebra computations on vector and parallel high-performance computers, including the LAPACK and ScaLAPACK libraries. Yet several areas in linear algebra remain to be addressed in order to allow wide use of parallel computing. These areas include software libraries for distributed-memory machines, dense nonsymmetric eigenvalue problems, parallel algorithms for large-scale eigenvalue problems, sparse linear least squares, multigrid algorithms, sparse linear systems, and linear algebra for signal processing.
Algorithm development has become increasingly valuable as scientists such as Chandler build more demanding models at the same time that computer architectures--faster processors, deeper memory hierarchies, clusters of processors, and networked metasystems--are proliferating. Dongarra and Demmel will apply their experience to designing algorithms and software to help Chandler and other NPACI application scientists tune their codes. The resulting algorithms will be packaged so that other scientists can take advantage of them.
The research in Chandler's group will benefit from the collaboration with the Programming Tools and Environments thrust area, but through NPACI, the general Monte Carlo package will be made available to other researchers in chemical physics. Such a general package might also be linked to quantum chemistry calculations to study chemical reactions in solution and other complex systems. --DH