Co-scheduling between SDSC and Other Sites

Co-scheduling is the process of making and synchronizing reservations to run jobs on computational resources at multiple sites. It is one function within a suite of processes known under the umbrella term "meta-scheduling." Although the two terms are sometimes used interchangeably, co-scheduling refers specifically to reserving time for running jobs.

Grid Universal Remote (GUR)

The GUR tool is a Python script that uses the ssh and scp commands to help users make reservations, compile programs, and co-schedule jobs. GUR is installed on the IA-64 clusters at NCSA (Mercury) and at SDSC. Co-scheduling is expected to be available on resources at other sites in the near future. A Web interface is in development.

Paths and Policies

GUR is invoked from the command line and requires allocations at both resources. A softenv key (+gur) has been defined on both systems. Invoking the softenv command places the executable in the user's path. Policies at SDSC for co-scheduling are the same as for other reservations, which are documented in the SDSC User Portal under the Reservations Documentation tab.

Site   Path                          Policies                                     Softenv command
NCSA   /usr/local/GUR/gur.py         NCSA policy regarding co-scheduling          softenv add +gur
SDSC   /usr/local/apps/gur/gur.py    Reservations policies in SDSC User Portal    softenv add +gur
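
After running softenv add +gur, one way to confirm that the script is reachable is to look it up on the search path. The following is a minimal sketch in Python, not part of GUR itself: the helper name find_gur is hypothetical, and it assumes the script is exposed on PATH under the name gur.py, as in the commands on this page.

```python
import shutil


def find_gur(name="gur.py"):
    """Return the full path of the GUR script if it is on PATH, else None.

    The default name "gur.py" is an assumption taken from the commands
    shown on this page.
    """
    return shutil.which(name)


if __name__ == "__main__":
    location = find_gur()
    if location is None:
        print("gur.py not found on PATH; try: softenv add +gur")
    else:
        print("gur.py found at", location)
```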

GUR Workflow

  1. User runs grid-proxy-init to establish a grid credential
    grid-proxy-init
  2. User constructs an appropriate jobfile (See example jobfiles)
    vi jobfile
    or
    gur.py --dumpjobfile --output=metajob.script
  3. User runs gur, with the jobfile as the input
    gur.py --reserve --jobfile=jobfile
    GUR returns the path to a file containing reservation information
    GUR: metajob submitted:
    /<working directory path>/<username>/info/gur/test/gurdata/metajob.1190763126.7100041
  4. GUR makes reservations at remote clusters. GUR uses gsissh to invoke commands on remote machines.
  5. User runs jobs on remote clusters (See example rsl files)
    mpirun -globusrsl job.rsl
  6. User cancels the reservation, with the metajob file as the input
    gur.py --cancel --metajobfile=/rmount/users01/sdsc/<username>/info/gur/test/gurdata/metajob.1190763126.7100041
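
The reserve and cancel steps in the workflow above are linked by the metajob path that GUR prints, so a small wrapper can carry that path from one step to the other. The sketch below is illustrative only and is not part of GUR. It uses only the options shown on this page (--reserve, --jobfile, --cancel, --metajobfile). The function names are hypothetical, and the parser assumes the path is the first non-empty text after the "GUR: metajob submitted:" marker. The sample path in the demo is made up.

```python
MARKER = "GUR: metajob submitted:"


def reserve_command(jobfile):
    """Build the argument list for the reservation step (step 3)."""
    return ["gur.py", "--reserve", "--jobfile=" + jobfile]


def cancel_command(metajobfile):
    """Build the argument list for the cancel step (step 6)."""
    return ["gur.py", "--cancel", "--metajobfile=" + metajobfile]


def metajob_path(output):
    """Extract the metajob file path from the text printed by --reserve.

    Assumption: the path is the first non-empty text after the marker,
    either on the same line or on the next one. Returns None if the
    marker or the path is missing.
    """
    _, found, rest = output.partition(MARKER)
    if not found:
        return None
    for line in rest.splitlines():
        line = line.strip()
        if line:
            return line
    return None


if __name__ == "__main__":
    # Hypothetical captured output; a real run would print a site-specific path.
    sample = MARKER + "\n/tmp/example/metajob.1190763126.7100041\n"
    path = metajob_path(sample)
    print(" ".join(reserve_command("jobfile")))
    print(" ".join(cancel_command(path)))
```

The commands are built as argument lists so that they can be passed to subprocess.run without shell quoting issues, should the sketch be extended to invoke gur.py directly.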
