

Beyond MPI and HPF:
KeLP Benefits Dynamic Structured Applications

Scott B. Baden
Associate Professor, Department of Computer Science and Engineering, UC San Diego

Over the past 10 years, writing programs for high-performance parallel computers has become more commonplace. However, the standard approaches do not work well for all types of applications, particularly those that describe complex, changing phenomena, such as turbulent flows and the flow of oil underground. Such behavior is highly irregular, which means scientists must resort to a nonuniform numerical model, and standard parallel programming tools do not offer much support for describing such nonuniform models. Scott Baden of UC San Diego has developed a programming library (a set of related procedures) called KeLP to make it easier to write programs for a wider variety of irregular, dynamic simulations.

"Nonuniform representations are often the most effective means of resolving and simulating diverse physical phenomena, but they require sophisticated runtime support to manage their underlying complexity, especially on parallel computers," said Baden, an associate professor in the Computer Science and Engineering department.

KeLP, short for Kernel Lattice Parallelism, is a C++ runtime library that can handle elaborate dynamic data structures with complicated communication patterns. "We have already applied KeLP to good advantage in a wide variety of applications with nonuniform data representations," Baden said, "including codes that model ab initio molecular dynamics and turbulent flows--and we have done this on many different platforms."



Support for high-performance parallel computing has improved greatly over the past 10 years. There are now standard approaches for implementing certain kinds of applications, such as the Message Passing Interface (MPI) and High Performance Fortran (HPF). However, when applications exhibit significant spatial or temporal irregularity, these approaches may not be appropriate. MPI exposes more details than programmers may wish to worry about, and extending HPF to handle general irregular data structures remains an open problem.

"There is as yet no universal high-level approach to high-performance parallel programming," Baden said. "The best implementation of an application may well depend on both the underlying representations of data and the target hardware itself." Moreover, multi-tiered computers (symmetric multi-processor clusters, for example)--hierarchically constructed parallel computers with several levels of locality and parallelism--require a much more sophisticated orchestration of parallelism and locality to match the hardware capabilities.

"The advantage of KeLP is that it presents a small set of programming abstractions that simplify the implementation of efficient algorithms for irregular or block-structured scientific calculations," Baden said. "KeLP can work alongside MPI or HPF, and using KeLP can make an application significantly more efficient and extensible."

Two Parallel Tools and Environments efforts funded under NPACI include work with Joel Saltz and Alan Sussman at the University of Maryland to couple KeLP applications using Maryland's MetaChaos tool, and the provision of runtime support for UC Berkeley's Titanium project. Baden is working on that with UC Berkeley computer scientists Susan Graham and Katherine Yelick.

KeLP has been developed by Baden and graduate student Stephen J. Fink (Ph.D. expected June 1998). It is based in part on earlier work by former graduate student Scott Kohn (now at the Center for Applied Scientific Computation of Lawrence Livermore National Laboratory). To make KeLP interoperate with HPF, they have been joined by John H. Merlin of the Institute for Software Technology and Parallel Systems at the University of Vienna.

"Our design goals were to answer the special needs of applications that adapt to data-dependent or hardware-dependent conditions at run time," Baden said. "We've supplied a class library that is easy to use, delivers portable performance, provides an appropriate level of abstraction, and enables the re-use of software, including legacy code."



KeLP was designed to be simple and intuitive, and in several applications implemented so far by Baden's group and others, programmers have mastered it quickly and benefited from the system's advantages:

  • KeLP defines only a handful of new data types plus a small number of primitives to manipulate them. This yields an easily understood model of locality that ensures portable performance over a diverse range of platforms, from high-performance workstations to mainframes.
  • KeLP supports a task-parallel model, recently extended to two levels for SMP-based computers, that isolates parallel coordination activities from numerical computation. Existing serial kernels with known numerical properties that leverage mature compiler technology can be plugged in and do not need to be extensively rewritten.
  • KeLP applications may be written in a dimension-independent form, which reduces development time for complex 3-D applications if there exists a 2-D formulation of the problem.
  • KeLP supports general blocked decompositions that arise in finite-difference applications. Users can customize load-balancing activity to the application and treat interprocessor communication at a very high level.
  • KeLP supports run-time metadata, useful in managing multi-resolution data sets distributed irregularly over processors, as well as classes to encapsulate communications activity.

In effect, KeLP is a middle-level implementation between the application and a low-level communication substrate like MPI. The application libraries associated with KeLP let computational scientists concentrate on the application and mathematics instead of low-level concerns of data distribution and interprocessor communication. The new multi-tier version of KeLP encapsulates threads and message passing activity under a single model.

"We must overlap communication with computation to reduce idle times," Baden said. "And nonuniform data decompositions are needed to support this activity, even if the computations are uniform--say, a simple finite-differencing scheme."


The KeLP group is collaborating on two current projects with applications scientists to achieve new levels of performance for large-scale codes with nonuniform data structures. Baden and graduate student Jeffrey Howe worked with Keiko Nomura of the Department of Applied Mechanics and Engineering Sciences at UC San Diego and her graduate student Tamara Grimmett to upgrade a code optimized for the CRAY T90 at SDSC and run it on SDSC's CRAY T3E and IBM SP-2.

The code, DISTUF, performs direct numerical simulations of incompressible, homogeneous, sheared and unsheared turbulent flows. "In its pre-KeLP form, it consisted of more than 10,000 lines of code developed over 14 years, with comments in both English and German," Baden said. "But the vectorized version would not permit the analysis of small-scale turbulent flow at high Reynolds numbers without beggaring the resources of even the largest vector-class computer."

In "KeLP-ifying" and parallelizing DISTUF, Howe and Grimmett employed a principle of minimum disturbance: change as little code as possible, working around the limitations of the Fortran 77 code. The KeLP code added just 500 lines of C++ and KeLP wrapper to the original application. The resulting KDISTUF code has now been run on larger problems that motivated the effort, and Baden, Nomura, and the students will report on the work at the PARA-98 conference in Sweden this June.

The most recent effort joins NPACI's Programming Tools and Environments thrust area, in which Baden participates, with the Engineering thrust area. The KeLP team is working with the Center for Subsurface Modeling (CSM) at the University of Texas at Austin to incorporate the flow, transport, and chemistry portions of their ParSSim reservoir simulation code (see page 4) within the KeLP framework. Baden and programmer Max Orgiyan have made preliminary visits to CSM, and the design of an applications interface for the flow portion of the code (ParCel) is under way.

"Mary Wheeler, Steve Bryant, and their colleagues at CSM are well aware of the need to be able not only to use parallel machines, but also to use them very efficiently to advance the causes of oil reservoir simulation, enhanced oil recovery, and subsurface pollution remediation," Baden said. "By helping to test the infrastructure for parallel computing under development within NPACI, they are moving towards cost-effective and computationally efficient solutions for real-world problems. Our participation in NPACI is facilitating our development of KeLP as an aid to tomorrow's scientific discovery." --MM