Barton Miller
Computer Sciences Department
University of Wisconsin
http://www.cs.wisc.edu/~bart/
A Path to Performance Diagnosis on 1000+ Nodes
Current performance profiling tools are limited in the size
of the system on which they can measure application programs. These
limitations stem from (1) the amount of data that must be transferred
between the components of the tool and (2) the centralized control
and monitoring performed by the tool's front-end process.
The Paradyn project is developing four new techniques which, when
used together, will allow Paradyn to effectively profile
applications on 1000+ node systems. The four major components of this
effort are:
- A software multicast/reduction
network layer (MCRNL) that allows for efficient distribution of
control operations and gathering of results from the nodes. MCRNL
incorporates innovative reduction operations and will support
a fault-tolerant recovery facility.
- A scalable start-up protocol
that reduces the amount of front-end to daemon communication
needed. In systems with thousands of nodes, the initial interaction
with the corresponding thousands of daemons can be overwhelming.
- A distributed Performance Consultant
that uses MCRNL to efficiently evaluate global (all-node) bottlenecks
and distributes evaluation of local (node-specific) bottlenecks
to local Performance Consultant agents. As part of this effort,
we have developed a new, more detailed model of instrumentation
overhead and feedback.
- Sub-Graph Folding, a scalable
visualization technique for displaying the results of the Performance
Consultant's bottleneck search. This technique exploits the regular
structure of SPMD programs, combining results into equivalence
classes and presenting only an exemplar of each class.
I will describe the design based on these features and present
some initial results as to their effectiveness.
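
To make the idea behind a reduction network concrete, here is a minimal
sketch in Python. It is not MCRNL's interface: the function name
tree_reduce and the fanout parameter are hypothetical, and the sketch
assumes only that intermediate processes combine partial results on the
way to the front end, so the front end never has to read one message
per daemon.

```python
from typing import Callable, List, Tuple


def tree_reduce(
    values: List[int],
    fanout: int,
    op: Callable[[List[int]], int],
) -> Tuple[int, int]:
    """Combine one value per daemon through a tree with the given fan-out.

    Hypothetical helper for illustration only. Returns (result, inputs),
    where inputs is the number of partial results the front end itself
    had to combine in the final step.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not values:
        raise ValueError("need at least one value")

    level = list(values)
    # Each pass models one layer of intermediate processes, each of which
    # combines up to `fanout` children into a single partial result.
    while len(level) > fanout:
        level = [op(level[i:i + fanout]) for i in range(0, len(level), fanout)]

    # The front end combines whatever remains (at most `fanout` inputs).
    return op(level), len(level)


if __name__ == "__main__":
    # 1024 daemons each report one counter; the front end reads 32 partial
    # sums instead of 1024 individual messages.
    total, inputs = tree_reduce(list(range(1024)), 32, sum)
    print(total, inputs)  # 523776 32
```

The operator must be associative for the tree to give the same answer as
a flat reduction; sum, max, and min qualify. Fault recovery and multicast
of control operations, both mentioned above, are outside the scope of
this sketch.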