COMPLECS: Batch Computing (Part III) High-Throughput and Many-Task Computing

Not all computational problems call for the kinds of parallel applications traditionally designed to run on high-performance computing (HPC) systems. Today, many workloads running on these systems require only a modest amount of computing resources for any given job or task. For some research workloads, however, a more important consideration is how much aggregate compute power can be consistently and reliably leveraged against a problem over time. These high-throughput computing (HTC) workloads aim to solve larger problems over extended periods by completing numerous smaller computational subtasks. Examples include large parameter sweeps over simulation inputs and the regular processing and analysis of data collected from specialized instruments. In some cases, these problems are also composed of numerous distinct computational subtasks linked together in highly structured, complex workflows, which can themselves be a challenge to design and manage effectively. If your research problem can leverage a high-throughput or many-task computing (MTC) model, then learning how to build and run these types of workflows safely and effectively on HPC systems is vital.

In this third part of our series on Batch Computing, we introduce you to high-throughput and many-task computing using the Slurm Workload Manager. In particular, you will learn how to use Slurm job arrays and job dependencies, which can be used to build these more structured computational workflows. We will also highlight some problems you are likely to encounter when you start running HTC and/or MTC workloads on HPC systems, including a discussion of job bundling strategies: what they are and when to use them. Additional topics about high-throughput and many-task computing workflows will be covered as time permits.
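
As a rough illustration of the two Slurm features named above, the following Python sketch builds (but does not run) the `sbatch` command lines for a job array and for a follow-up job that depends on it. It is not taken from the session materials. The script names `sweep.sh` and `collect.sh`, the job ID `123456`, and both helper functions are hypothetical; only the `sbatch` options `--array`, `--dependency=afterok:` and `--parsable` are standard Slurm.

```python
from typing import List, Optional


def array_job_command(script: str, first: int, last: int,
                      max_concurrent: Optional[int] = None) -> List[str]:
    """Build an sbatch command that submits `script` as a job array.

    Each array task sees its own index in SLURM_ARRAY_TASK_ID, which the
    (hypothetical) script can use to pick one point of a parameter sweep.
    """
    spec = f"{first}-{last}"
    if max_concurrent is not None:
        # A '%N' suffix limits how many array tasks run at the same time.
        spec += f"%{max_concurrent}"
    # --parsable asks sbatch to print the job ID in a machine-readable form.
    return ["sbatch", "--parsable", f"--array={spec}", script]


def dependent_job_command(script: str, parent_job_id: str) -> List[str]:
    """Build an sbatch command that may start only after the parent job
    has completed successfully (the 'afterok' dependency type)."""
    return ["sbatch", "--parsable",
            f"--dependency=afterok:{parent_job_id}", script]


if __name__ == "__main__":
    # Hypothetical script names and job ID, for illustration only.
    print(" ".join(array_job_command("sweep.sh", 0, 99, max_concurrent=10)))
    print(" ".join(dependent_job_command("collect.sh", "123456")))
```

In practice, the job ID passed to the second command would be the one printed by the first submission, so that the collection step waits for the sweep to finish.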

---

COMPLECS (COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure) is a new SDSC training program that covers the non-programming skills needed to use supercomputers effectively. Topics include parallel computing concepts, Linux tools and bash scripting, security, batch computing, how to get help, data management, and interactive computing. Each session offers 1 hour of instruction followed by a 30-minute Q&A. COMPLECS is supported by NSF award 2320934.

Instructor

Marty Kandes, PhD

Computational and Data Science Research Specialist, SDSC

Marty Kandes is a Computational and Data Science Research Specialist in the High-Performance Computing User Services Group at SDSC. He currently helps manage user support for Comet, SDSC’s largest supercomputer. Marty obtained his Ph.D. in Computational Science in 2015 from the Computational Science Research Center at San Diego State University, where his research focused on studying quantum systems in rotating frames of reference using numerical simulation. He also holds an M.S. in Physics from San Diego State University and a B.S. in Applied Mathematics and Physics from the University of Michigan, Ann Arbor. His current research interests include problems in Bayesian statistics, combinatorial optimization, nonlinear dynamical systems, and numerical partial differential equations.

Questions?

Contact SDSC Events Coordinator
