Co-scheduling between SDSC and Other Sites
Co-scheduling is the process of making and synchronizing reservations to run jobs on computation resources at multiple sites. It is one function within the suite of processes known collectively as "meta-scheduling." Although the two terms are sometimes used interchangeably, co-scheduling refers specifically to reserving time for running jobs.
Grid Universal Remote (GUR)
The GUR tool is a Python script that uses the ssh and scp commands to help users make reservations, compile programs, and co-schedule jobs. GUR is installed on the IA-64 clusters at NCSA (Mercury) and at SDSC. Co-scheduling is expected to be available on resources at other sites in the near future. A Web interface is in development.
Paths and Policies
GUR is invoked from the command line and requires allocations at both resources. A softenv key (+gur) has been defined on both systems. Invoking the softenv command places the executable in the user's path. Policies at SDSC for co-scheduling are the same as for other reservations, which are documented in the SDSC User Portal under the Reservations Documentation tab.
| Site | Path | Policies | Softenv command |
| NCSA | /usr/local/GUR/gur.py | NCSA policy regarding co-scheduling | softenv add +gur |
| SDSC | /usr/local/apps/gur/gur.py | Reservations policies in SDSC User Portal | softenv add +gur |
GUR Workflow
- User runs grid-proxy-init to establish grid credential
    grid-proxy-init
- User constructs an appropriate jobfile (See example jobfiles)
    vi jobfile
  or
    gur.py --dumpjobfile --output=metajob.script
- User runs gur, with jobfile as the input
    gur.py --reserve --jobfile=jobfile
  GUR returns path to file containing reservation information
    GUR: metajob submitted:
    /<working directory path>/<username>/info/gur/test/gurdata/metajob.1190763126.7100041
- GUR makes reservations at remote clusters. GUR uses gsissh to invoke commands on remote machines.
- User runs jobs on remote clusters (See example rsl files)
    mpirun -globusrsl job.rsl
- User cancels reservation, with metajob script as the input
    gur.py --cancel --metajobfile=/rmount/users01/sdsc/<username>/info/gur/test/gurdata/metajob.1190763126.7100041
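The reserve and cancel steps are linked by the metajob file path that GUR prints, so a small wrapper can carry that path from one step to the next. The following is a minimal sketch, not part of GUR itself. The helper names (reserve_command, cancel_command, parse_metajob_path) are hypothetical, and the parser assumes the path appears unwrapped, either on the "GUR: metajob submitted:" line or on the next non-empty line, as the sample output above suggests. The sketch only builds command lines from the flags shown in the workflow and does not run them.

```python
import shlex

# Assumption: gur.py is on the PATH after the +gur softenv key is added.
GUR = "gur.py"
MARKER = "GUR: metajob submitted:"


def reserve_command(jobfile):
    """Build the argument list for the reservation step."""
    return [GUR, "--reserve", f"--jobfile={jobfile}"]


def cancel_command(metajobfile):
    """Build the argument list for the cancellation step."""
    return [GUR, "--cancel", f"--metajobfile={metajobfile}"]


def parse_metajob_path(output):
    """Return the metajob file path found in GUR's output, or None.

    Assumed format: the marker line, with the path either on the
    same line or on the first non-empty line that follows it.
    """
    lines = [line.strip() for line in output.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith(MARKER):
            rest = line[len(MARKER):].strip()
            if rest:
                return rest
            for following in lines[i + 1:]:
                if following:
                    return following
    return None


# Hypothetical output text; a real path depends on the user's working directory.
sample = "GUR: metajob submitted:\n/tmp/gurdata/metajob.1190763126.7100041\n"
path = parse_metajob_path(sample)
print(shlex.join(reserve_command("jobfile")))
print(shlex.join(cancel_command(path)))
```

Keeping the commands as argument lists means they can later be passed to subprocess.run without shell quoting problems. shlex.join is used here only to display them.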


