Kim's 'K'omputational 'K'emistry Software 'K'orner
Summary of Available Software


Chemical calculations that predict structures, energetics, and other properties of experimentally known or unknown molecules provide a fundamental resource for chemical research today. The basis of these calculations lies in an area of theoretical chemistry called molecular quantum mechanics. This science relates molecular properties to the motion and interaction of electrons and nuclei. Since the chemical properties of atoms and molecules are determined by their electronic structure, it is necessary to understand the nature of the motions and energies of the electrons and nuclei.

The understanding of molecular systems at this level of detail requires high-level mathematical formulations that govern, and allow prediction of, molecular structure and properties. The ultimate goal of these calculations is their application to problems of general chemical and experimental interest, such as a) the determination of reaction mechanisms, b) the study of the details of molecular forces and their role in structure determination, and c) the calculation of detailed potential energy surfaces and dynamics for reaction processes. Progress in these areas in turn leads to advances in fields such as materials chemistry, electronics, environmental chemistry, and medicinal chemistry.

The degree of complexity in these types of molecular applications has driven a greater focus on advanced high-performance computing methods, such as massive parallelization, more sophisticated visualization, and robust networked communications for data transfer. The importance of parallel computers in increasing the size and complexity of chemical systems that can be treated with quantum mechanical techniques has been recognized for some time by many groups. Here we report on results of GAMESS (General Atomic and Molecular Electronic Structure System), a quantum chemistry program that solves the Schrödinger equation at various levels of theory, leading to direct quantitative predictions of chemical phenomena from first principles (ab initio).

Parallelization Efforts

Techniques for parallelization of ab initio codes are not new. Investigators have parallelized many aspects of the quantum mechanical algorithmic procedures, including the two-electron repulsion and gradient integrals, the integral transformation, and the MP2, configuration interaction, and coupled-cluster energies. Much of the detail of the parallelization strategies used for GAMESS can be found in the original papers. Originally, GAMESS was parallelized using the toolkit TCGMSG (Theoretical Chemistry Group Message Passing), a message-passing library developed by the theoretical chemistry group at Argonne National Laboratory [9]. This package implements a distributed-memory MIMD programming model on distributed-memory hardware, as well as on some shared-memory multiprocessor systems.
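
The replicated-data MIMD message-passing pattern can be pictured with a minimal sketch (this is not the GAMESS/TCGMSG source; mpi4py is used as a stand-in message-passing library, and the batch counts and matrix contributions are placeholders): every node holds the full matrices, computes only its statically assigned share of the integral work, and a global sum then replicates the completed result on every node.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, nproc = comm.Get_rank(), comm.Get_size()

    nbatch, nbf = 1000, 50                  # hypothetical batch and basis-set sizes
    fock_partial = np.zeros((nbf, nbf))

    for ibatch in range(nbatch):
        if ibatch % nproc != rank:          # static round-robin work distribution
            continue
        # placeholder for this node's two-electron integral batch contribution
        fock_partial += np.full((nbf, nbf), 1.0e-6)

    # global sum leaves an identical, complete matrix on every node
    fock = np.zeros_like(fock_partial)
    comm.Allreduce(fock_partial, fock, op=MPI.SUM)
    if rank == 0:
        print("matrix assembled across", nproc, "nodes")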

Since the original parallel implementation via TCGMSG, several modifications have been made in order to implement GAMESS on other multiprocessor machines. To run on Alpha clusters, DEC in Ireland produced a version of TCGMSG for AXP clusters that uses 32-bit integers. To run on the CRAY/T3D platform, Martin Feyereisen has written special code to translate TCGMSG calls to PVM. The Paragon and SP2 parallel versions of GAMESS are now also being implemented without TCGMSG, using only the native message-passing environments of these machines. Recently, we have ported GAMESS to our 256-node CRAY/T3E parallel processor using PVM.

GAMESS will currently evaluate the energy and gradient of any of the Hartree-Fock wavefunctions (RHF through GVB) in parallel. In order to parallelize these SCF wavefunctions, the following sections of the program were modified to run in parallel: 1e- integrals, 1e- ECP integrals, 2e- integrals, the matrix manipulations that set up the SCF equations, "semiparallelization" of the matrix diagonalization, the 1e- gradient, the 1e- ECP gradient, and the 2e- gradient. Once the energy and gradient run in parallel, many other run types are effectively parallelized as well, for example geometry searches or numerical hessians.
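
Why a parallel gradient effectively "buys" a parallel numerical hessian can be seen in a short sketch (assumptions: Python, a hypothetical gradient(coords) function standing in for a parallel GAMESS gradient run, and a simple central-difference formula): the hessian is assembled from gradients at displaced geometries, so every gradient call can itself run in parallel.

    import numpy as np

    def numerical_hessian(coords, gradient, step=1.0e-3):
        """Central-difference hessian; coords is a flat array of 3N coordinates."""
        n = coords.size
        hessian = np.zeros((n, n))
        for i in range(n):
            plus, minus = coords.copy(), coords.copy()
            plus[i] += step
            minus[i] -= step
            # each gradient() call below is itself a (parallel) gradient evaluation
            hessian[:, i] = (gradient(plus) - gradient(minus)) / (2.0 * step)
        # symmetrize to remove numerical noise
        return 0.5 * (hessian + hessian.T)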

For illustration of performance, we have chosen the cyclophane HSi[(CH2)3]3C6H3 (Figure 1: sicage.pict), as we have comparisons with several other parallel processors. Although this cyclophane has C3 symmetry, the test data were deliberately obtained by running in C1 to simulate an asymmetric case. Using the 6-31G(d) basis set, the calculation involves 288 basis functions. Reported here are the total time (wall-clock time), the speedup ratio, given as s(p) = t(1)/t(p), and the efficiency of the parallelization, given as s(p)/p * 100%, where p is the number of processors.
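
These two quantities are simple to compute from measured wall-clock times; in the sketch below only the formulas come from the text, while the timing values are hypothetical placeholders rather than measured T3E numbers.

    # s(p) = t(1)/t(p) and efficiency = s(p)/p * 100%
    timings = {1: 36000.0, 16: 2500.0, 64: 750.0, 96: 620.0}   # seconds (hypothetical)

    t1 = timings[1]
    for p in sorted(timings):
        speedup = t1 / timings[p]
        efficiency = 100.0 * speedup / p
        print(f"p={p:4d}  speedup={speedup:6.1f}  efficiency={efficiency:5.1f}%")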

Figure 2 shows a log plot of a partial task distribution for the silicon cage molecule. In general, the bulk of the computational cost lies in the RHF step and the calculation of the 2-electron gradients. Both of these steps are nearly linearly parallel until the amount of work per node is greatly reduced, at which point one sees a leveling off in efficiency. For the cyclophane, this happens relatively quickly (~64 nodes), but for larger molecular systems, this linear parallelism extends out to many more nodes.


It is interesting to look at the balance between the CPU time and the efficiency of the run. The crossover point between the speedup and efficiency curves gives a general guideline for the optimal number of nodes on which to run efficiently. Figure 3 gives a view of the compromise between speedup and efficiency for the silicon cage benchmark. At 64 nodes, one still sees over 50% efficiency and a continued increase in speedup; at 96 nodes, the two curves are nearly identical, with the efficiency now dwindling to around 45%.
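
One simple way to turn this guideline into a rule of thumb (a sketch under the assumption that a 50% efficiency floor is acceptable; the timing values are again hypothetical) is to take the largest node count whose parallel efficiency stays above a chosen threshold:

    def optimal_nodes(timings, threshold=50.0):
        # timings maps node count p -> wall-clock time t(p); timings[1] must be present
        t1 = timings[1]
        best = 1
        for p in sorted(timings):
            efficiency = 100.0 * (t1 / timings[p]) / p
            if efficiency >= threshold:
                best = p
        return best

    print(optimal_nodes({1: 36000.0, 16: 2500.0, 64: 750.0, 96: 620.0}))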

A number of larger test cases have been calculated to test the bounds of the parallel environments. The calculations have been done with a variety of basis sets, spanning 320-1100 basis functions, and on a variety of node counts (96-192). In particular, a series of buckminsterfullerene fragments has been studied using the DZV(2d,p) basis set. Figure 4 is an illustration of one of the largest calculations in this set, C50H10, which involves 1100 basis functions and was run on 192 nodes of the T3E.

Hardware Comparisons

The following figure illustrates graphically the comparison of five parallel platforms for the calculation of the silicon cage cyclophane. The T3D and Paragon platforms behave similarly. The workstation-based parallel platforms show very good performance for GAMESS in general. One sees that the DEC/AXP (SP2) is at least 3 (7) times faster than a Paragon node. These machines in principle have much better I/O capability than a Paragon, since the scratch disks can be directly connected to each node. A comparison of the T3D versus the T3E over many molecular constructions shows that the T3E is about 6 times faster than the T3D. These calculations again show the significance of targeting the right memory size and number of nodes for a particular calculation: too little memory per node causes a significant decrease in efficiency due to paging, while too many nodes cause a degradation of efficiency due to Amdahl's Law.
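
The degradation at large node counts follows the familiar Amdahl's Law bound, s(p) = 1 / (f + (1 - f)/p) for a serial fraction f. The short sketch below uses an assumed 1% serial fraction purely for illustration, not a measured GAMESS value, to show how efficiency falls off as nodes are added.

    def amdahl_speedup(p, serial_fraction):
        # Amdahl's Law: speedup is bounded by 1 / (f + (1 - f)/p)
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

    f = 0.01                      # assume 1% of the work does not parallelize
    for p in (16, 64, 96, 192, 256):
        s = amdahl_speedup(p, f)
        print(f"p={p:4d}  speedup={s:6.1f}  efficiency={100.0 * s / p:5.1f}%")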

Analysis of Results

With larger calculations come larger data sets to analyze, and visual analysis of such data sets becomes crucial to interpreting the results. Examples include the energy-minimized structure of a molecule, reaction path trajectories, 3-D molecular electrostatic potential maps, and 3-D molecular orbital data. To address the individual molecular quantum chemical modeling needs of chemists, SDSC principal scientist Kim Baldridge and staff scientist Jerry Greenberg created a molecular visual analysis software package, called QMView, that allows flexibility in the types of input and output data formats used by different calculation programs. QMView runs on any of a variety of workstations: the original version was written specifically for the SGI platform using the IRIS GL graphics library, while a newer version, written with the OpenGL application programming interface and the GLUT utility library, will run on any UNIX platform that has an OpenGL server. QMView can either connect directly with GAMESS running on a parallel computer via a socket connection or read output files created by GAMESS. A QMView library is also available to link with a FORTRAN program running on a remote supercomputer.
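
The direct socket connection can be pictured with a generic sketch of the pattern (this is not QMView's actual protocol; the port and the one-line-per-update format are assumptions for illustration): the visualization front end listens on a TCP port and reads data streamed from the remote calculation.

    import socket

    HOST, PORT = "", 5555          # listen on all interfaces; arbitrary, assumed port

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((HOST, PORT))
        server.listen(1)
        conn, addr = server.accept()
        with conn, conn.makefile("r") as stream:
            for line in stream:                 # e.g. one geometry update per line
                print("received from", addr, ":", line.strip())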