(DRAFT 7/11/95)

SDSC - T3D Systems Software Plan...

Several software systems must be altered in order to support the T3D:

1) The resource management system ("res")

2) The quarterly account update software (upfile, qtr_update, etc.)

3) The daily accounting software ("csarun")

4) The user interface to daily accounting ("acctrep")

5) The default login scripts may have to be modified

6) The NQS queue configuration

7) The UNICOS user database ("udb")

8) UNICOS must be upgraded to release 8.0.4

9) The OWS and MWS Sparc systems may require software updates

10) The operator scripts such as "daily" and "opi"

11) The new user utility ("nu")

12) The qsub command

13) UNICOS re-configuration (filesystems, devices, etc.)

14) Miscellaneous notes

Also, new utilities will be required:

15) A utility similar to xpartinfo which displays MPP nodes in use. A command called "mppmon" is available and may provide some of this information.

SPECIAL NOTES:

a) After compiling this list I am not certain whether charging for memory on the T3D is possible, or even sensible. I am tempted to ignore the issue of memory tracking and charging until we gain a better understanding of shared memory on the T3D.

b) This is certainly a draft document. Many of these comments have been made without first analyzing the existing software. The major purpose of the document is to list some of the changes which we expect to make in the coming weeks... and hopefully as a result the T3D systems software will be consistent with our planned use of the T3D.

c) Proposed action items for the above tasks:

1) Wayne (coord w/ Larry, Rachel)

2) Wayne

3) MikeV

4) Larry (coord w/ Wayne)

5) Larry (coord w/ consult)

6) Larry (coord w/ consult)

7) Larry (coord w/ Cindy, Wayne)

8) Cindy, MikeM (coord w/ consult)

9) Cindy, MikeM

10) MikeV

11) Cindy (coord w/ Donna)

12) Larry

13) Cindy, MikeM

14) gcc=consult, ARSC's res=Wayne

15) George, DavidM, MikeV

================================================================

1) THE RESOURCE MANAGEMENT SYSTEM...

The resource management system ("res") must be altered to support allocations for the T3D. T3D allocations will be handled as follows:

* T3D usage will come primarily from special T3D allocations

* These allocations will be determined by the allocation committee.

* Allocations may be granted throughout the quarter (using the expedited allocation process)

* Traditional C90 users (using their C90 allocations) may, on a case-by-case basis, be granted access to the T3D. Such access will require the approval of (someone). There is great concern that such non-allocated access to the T3D may result in over-subscription of the system. However, the accounting software will be designed to handle this capability. (DOUBLE CHECK THIS POINT)

The following resource management system commands require modification:

/usr/local/etc/resd:

a) Add a bit field which designates a given record as being C90, T3D, or both. Normally, traditional C90 allocations will be marked "C90," new T3D allocations will be designated "T3D," and special cases where traditional C90 users are granted (sometimes on a temporary basis) access to the T3D will be designated as type "both."

This will allow a single record format to be used for C90 and T3D allocations.

(QUESTION: SHOULD THIS BIT BE SET FOR AN ENTIRE ACCOUNT, OR JUST FOR SPECIFIC USERS??? I THINK IT WOULD BE BEST IF THIS COULD BE DONE ON USER/ACCOUNT PAIRS. THIS WOULD BE MOST FLEXIBLE)

b) Add the following new fields to the standard user/account record:

* T3D node hours scheduled (DOUBLE CHECK THIS)

* T3D node hours used

* T3D Service Units used

* T3D memory integral (DOES THIS MAKE SENSE???)

(The idea behind the "hours scheduled" field is to be able to distinguish between nodes allocated and nodes used. That is, we would be able to easily detect a user who, for example, is sleeping on T3D processors.)

These new fields will be debited from the allocation just as CPU, disk and memory charges are currently handled by res.

c) Add the ability for resd to accept T3D node usage information using a mechanism similar to that used to incorporate disk charges. That is, a script or daemon (/usr/local/etc/resmpp) from outside of resd will determine T3D node usage and will then incorporate those charges into resd.
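Items a) and b) above can be pictured with a small sketch. This is illustrative Python only, not the resd record layout: the names Machine, AllocationRecord and idle_fraction are assumptions made for this example, and the memory integral is left out because it is still an open question. The designation is kept per user/account pair, which is the more flexible of the two options raised in the question above.

```python
from dataclasses import dataclass
from enum import IntFlag


class Machine(IntFlag):
    """Hypothetical designation bits for a user/account record."""
    C90 = 1
    T3D = 2
    BOTH = C90 | T3D  # traditional C90 allocation that was also granted T3D access


@dataclass
class AllocationRecord:
    """Hypothetical user/account record carrying the proposed T3D fields."""
    uid: int
    acid: int
    machine: Machine = Machine.C90
    t3d_node_hours_scheduled: float = 0.0
    t3d_node_hours_used: float = 0.0
    t3d_service_units_used: float = 0.0

    def t3d_enabled(self) -> bool:
        """True when the T3D bit is set (type "T3D" or "both")."""
        return bool(self.machine & Machine.T3D)

    def idle_fraction(self) -> float:
        """Share of scheduled node hours that were not used (0.0 if none scheduled)."""
        if self.t3d_node_hours_scheduled <= 0.0:
            return 0.0
        idle = self.t3d_node_hours_scheduled - self.t3d_node_hours_used
        return max(idle, 0.0) / self.t3d_node_hours_scheduled


# A C90 user granted T3D access who used only a quarter of the scheduled node hours.
rec = AllocationRecord(uid=1002, acid=50, machine=Machine.BOTH,
                       t3d_node_hours_scheduled=64.0, t3d_node_hours_used=16.0,
                       t3d_service_units_used=16.0)
print(rec.t3d_enabled(), rec.idle_fraction())
```

A high idle fraction is the kind of signal the "hours scheduled" field is meant to expose, for example a user sleeping on T3D processors.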

/usr/local/bin/reslist:

a) Add the following fields to the standard reslist output:

* T3D Service Units used

Maybe only T3D fields should be displayed for user/accounts which have the "T3D" bit set.

b) Add the following fields to the output when the "-t" option is used:

* T3D node hours scheduled (DOUBLE CHECK THIS)

* T3D node hours used

* T3D Service Units used

* T3D memory integral (DOES THIS REALLY MAKE SENSE???)

Only T3D fields should be displayed for user/accounts which have the "T3D" bit set.

c) Add a "-T" option:

* Add the command line option "-T" to force reslist to print the T3D fields regardless of whether the special "T3D" bit is set.
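The display rule described in items a) through c) is small enough to state as code. The following is a sketch in Python, assuming a hypothetical helper show_t3d_fields; it models only the proposed "-T" flag and says nothing about how reslist actually parses its options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for the proposed -T flag only (other reslist options are omitted)."""
    parser = argparse.ArgumentParser(prog="reslist")
    parser.add_argument("-T", dest="force_t3d", action="store_true",
                        help="print T3D fields even if the T3D bit is not set")
    return parser


def show_t3d_fields(t3d_bit_set: bool, force_t3d: bool) -> bool:
    """T3D fields are printed when the record is T3D-enabled or -T was given."""
    return t3d_bit_set or force_t3d


args = build_parser().parse_args(["-T"])
print(show_t3d_fields(t3d_bit_set=False, force_t3d=args.force_t3d))
```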

/usr/local/bin/resalloc:

a) Add the following commands within resalloc:

* Do we want a special command for transferring allocations from a traditional C90 account to a T3D account ???

* Do we want a special command for transferring allocations from a T3D account to a traditional C90 account ???

/usr/local/bin/resadmin:

a) Add the following commands:

* Do we want a special command for designating an allocation as being either a T3D or traditional C90 account ???

The following new commands will have to be developed:

/usr/local/etc/resmpp:

This command (script or executable) will generate input for resd. It will generate a report which shows the service unit charge for each user/account active on the T3D. Its output will contain the fields:

User id (integer)

Account id (integer)

Service units charged (floating point)

Node-hours used (floating point)

Node-hours scheduled (floating point)

Memory used (floating point) (DOES THIS MAKE SENSE???)

Memory charged (floating point) (DOES THIS MAKE SENSE???)

This command should be designed to be executed at 5, 10 or 15 minute intervals.
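As a sketch of what one resmpp output record might look like, the Python below writes and reads the seven fields listed above as a single whitespace-separated line. The line layout, the field order on the line, and the names MppRecord, format_record and parse_record are all assumptions made for illustration; the real interface to resd has not been designed yet, and the two memory fields may not survive.

```python
from typing import NamedTuple


class MppRecord(NamedTuple):
    """One hypothetical resmpp report line for a user/account pair."""
    uid: int
    acid: int
    service_units: float
    node_hours_used: float
    node_hours_scheduled: float
    memory_used: float
    memory_charged: float


def format_record(rec: MppRecord) -> str:
    """Render a record as one whitespace-separated line."""
    return "%d %d %.4f %.4f %.4f %.4f %.4f" % rec


def parse_record(line: str) -> MppRecord:
    """Parse a line produced by format_record; reject lines with a wrong field count."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    return MppRecord(int(fields[0]), int(fields[1]), *(float(f) for f in fields[2:]))


line = format_record(MppRecord(1001, 77, 12.5, 12.5, 16.0, 0.0, 0.0))
print(line)
print(parse_record(line))
```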

/usr/local/bin/t3d:

This command should be designed for use by all users. Some points to consider:

* Provide additional information if user is T3D-enabled

* Disguise user-specific info for users from other sites.

* Provide special motd-like info for T3D users.

This command should have an option which prints a table showing T3D node usage, time each job has been running, time limit (if any) for each job, etc. The information should be similar to the output of the xpartinfo command.

This command should have an option which shows which jobs are scheduled but not running, the priority of each job, the length of time each job has been queued, etc.

NOTES:

Since the UDB grants access to the T3D on a per-user basis... and we wish instead to grant access to user/account pairs... we need some way to control users. For batch jobs such control can be made within the "qsub" command. A user will not be allowed to submit a batch job to an account unless the account/user pair is enabled to allow such access.

However, to prevent an interactive T3D-enabled user from using the T3D while running under a non-T3D-enabled account, the resource management system must be able to detect this... and to kill such processes. Or... maybe the "mppexec" utility can be modified to check that the user is running the T3D program under an enabled account. I do not know whether SDSC can get source code for the "mppexec" utility.

================================================================

2) THE QUARTERLY ACCOUNT UPDATE SOFTWARE...

The quarterly account update software will have to be modified to support the new fields which have been added to the database. Donna's account update scripts in her home directory may also require updating.

================================================================

3) THE DAILY ACCOUNTING SOFTWARE...

The daily accounting software ("csarun") may need to be modified. The script will have to be reviewed before the T3D begins to operate in a production mode.

================================================================

4) THE USER INTERFACE TO DAILY ACCOUNTING...

/usr/local/bin/acctrep must now support the following:

* The "nsfrep," "usages" and "usagesl" reports must handle the new machine type. The NSF designation for a 128 node T3D must be obtained by Rachel.

* The "user," "account," "project," and "site" report types must be updated to include T3D usage information.

* The "summary" report type must be modified to include T3D summary usage information.

The following output fields must be added to the "user," "account," "project," and "site" reports:

* Service units (minutes) charged (floating point)

* Node-hours used (floating point)

Note that priority-based charging in acctrep must be done by multiplying the PE (processor element) usage by the priority at which the job was run. In the beginning we can assume that all jobs are run at equal (static) priorities... but eventually we will want to configure NQS to schedule based on priority... and acctrep will have to be able to handle this. This may require local modification of the "csajrep" or "csacrep" commands. It is possible that the priority-based charge can be placed into the raw accounting data files by using some otherwise unused field. acctrep (which calls csacrep) could then extract the priority-based information.
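The charging rule in the note above amounts to one multiplication per job. A minimal Python sketch follows, assuming the priority is available as a numeric factor and that a factor of 1.0 stands for the initial equal (static) priority; the function names are hypothetical and nothing here reflects the csajrep or csacrep record formats.

```python
from typing import Iterable, Tuple


def priority_charge(pe_hours: float, priority: float = 1.0) -> float:
    """Charge for one job: PE usage multiplied by the priority it ran at."""
    if pe_hours < 0.0 or priority < 0.0:
        raise ValueError("usage and priority must be non-negative")
    return pe_hours * priority


def total_charge(jobs: Iterable[Tuple[float, float]]) -> float:
    """Sum the charges for (pe_hours, priority) pairs."""
    return sum(priority_charge(hours, prio) for hours, prio in jobs)


# Two jobs: 10 PE-hours at the default priority, 4 PE-hours at double priority.
print(total_charge([(10.0, 1.0), (4.0, 2.0)]))
```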

================================================================

5) THE DEFAULT LOGIN SCRIPTS...

Changes may be required in the default login scripts:

* Special motd-like messages for T3D users

* Special environment variables may need to be set for T3D users

================================================================

6) THE NQS QUEUE CONFIGURATION...

To begin with, a special T3D-only queue will be created. The queue will be called "t3d." All T3D codes must be run via this queue.

As we gain more experience with the T3D we may decide to increase the points of access to the T3D. This will allow priority-based scheduling.

================================================================

7) THE UNICOS USER DATABASE...

Access to the T3D is controlled via bits in each user's UDB record. The following command (already available) can be used to see if a given user has access to the T3D:

udbsee -v (username) | egrep -e "pelimit|mpp"

By default, interactive users will be limited to 16 PEs and batch jobs will be limited to 64 PEs.
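The default limits can be expressed as a simple lookup. The sketch below is illustrative Python; the table PE_LIMITS and the function within_pe_limit are assumed names, and the actual enforcement is done by UNICOS through the UDB, not by a script like this.

```python
# Default PE limits from this plan: 16 for interactive work, 64 for batch jobs.
PE_LIMITS = {"interactive": 16, "batch": 64}


def within_pe_limit(mode: str, requested_pes: int) -> bool:
    """True when a request for requested_pes PEs fits the default limit for mode."""
    return 0 < requested_pes <= PE_LIMITS[mode]


print(within_pe_limit("interactive", 32))
print(within_pe_limit("batch", 32))
```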

As we gain more experience with scheduling on the T3D we may decide to try things such as:

* Allowing interactive T3D processing only during prime time

* Running a standby queue

* Supporting a special dedicated queue

* Supporting a special industrial queue for preferred scheduling

Ideally, we would like to maintain a list of accounts to which any given user is T3D-enabled. We could do this by duplicating the "acids" field in the UDB and calling it "t3d_acids." This list would include those accounts under which the given user may execute T3D jobs. This would allow us to prevent a T3D-enabled user from running T3D jobs from a traditional C90 allocation. This method however would require modification of the UDB structure... and could potentially affect other software which relies on the UDB.

This same functionality could be done outside of the UDB. That is, we could maintain a file which includes user/acid pairs for those users who have access to the T3D. This is similar to how the /usr/local/etc/aciddesc file is used by some of the accounting tools.

Finally, the resource management system could be used to determine whether a given user/acid pair should have access to the T3D. A stand-alone utility could query the "res" database for this information.
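To make the external-file option concrete, here is a sketch in Python of how a list of user/acid pairs might be read and queried. The file format (one "username acid" pair per line, with "#" starting a comment) is an assumption made for this example, as are the names load_pairs and t3d_enabled; it is not the format of /usr/local/etc/aciddesc.

```python
from typing import Iterable, Set, Tuple


def load_pairs(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Read (username, acid) pairs, skipping blank lines and '#' comments."""
    pairs = set()
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        user, acid = line.split()
        pairs.add((user, int(acid)))
    return pairs


def t3d_enabled(pairs: Set[Tuple[str, int]], user: str, acid: int) -> bool:
    """True when the user may run T3D jobs under the given account."""
    return (user, acid) in pairs


sample = [
    "# user   acid",
    "alice    77",
    "bob      50   # temporary access",
    "",
]
pairs = load_pairs(sample)
print(t3d_enabled(pairs, "alice", 77), t3d_enabled(pairs, "alice", 50))
```

Keeping the list outside the UDB avoids changing the UDB structure, at the cost of a second file that must stay in step with it.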

================================================================

8) UNICOS SOFTWARE MUST BE UPGRADED...

The following software must be installed:

* UNICOS must be upgraded to release 8.0.4

* UNICOS MAX Operating System installed on the T3D

* CF90 "M" compiler installed on the C90

* Standard C "M" compiler installed on the C90

* C++ "M" Compiling System installed on the C90

* C++ "M" Tools Library installed on the C90

* C++ "M" Mathpack Library installed on the C90

The following software will have to be installed in order to support the ATM interface which we plan to install (on a temporary basis) in September:

* UNICOS-under-UNICOS

================================================================

9) THE OWS AND MWS SYSTEMS MAY REQUIRE UPDATES...

The following software may need to be installed:

* OS upgrade (SunOS) on the existing OWS and MWS systems

* OS installation (Solaris???) on a new OWS and/or MWS system

================================================================

10) THE OPERATOR MONITORING SCRIPTS...

The operators will require the following:

* The "daily" script will have to be modified to report T3D usage/uptime statistics

* A run-time script will need to be developed which will allow the operators to determine whether the T3D is operational.

================================================================

11) THE NEW USER ("nu") UTILITY...

The nu utility will have to be updated to begin using a new set of defaults for access to the T3D. By default, traditional users will be denied access to the T3D. Special "T3D" accounts will be granted batch and interactive access. The default limit is 16 PEs for interactive work and 64 PEs for batch jobs.

A new question ("Is this a T3D account?") may have to be asked during the data-entry portion of the program.

================================================================

12) THE QSUB COMMAND...

The qsub command must be modified to prevent a T3D-enabled user from submitting a job under an account which is not also T3D-enabled.
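The check itself could be as simple as the following Python sketch. It assumes that T3D jobs are recognized by being submitted to the "t3d" queue and that a set of enabled user/account pairs is available from somewhere (the UDB, an external file, or res); the function may_submit is hypothetical and does not describe the qsub source.

```python
from typing import Set, Tuple


def may_submit(queue: str, user: str, acid: int,
               enabled_pairs: Set[Tuple[str, int]]) -> bool:
    """Gate only the "t3d" queue; other queues are not affected by this check."""
    if queue != "t3d":
        return True
    return (user, acid) in enabled_pairs


enabled = {("alice", 77)}
print(may_submit("t3d", "alice", 77, enabled))  # enabled pair
print(may_submit("t3d", "alice", 50, enabled))  # same user, C90-only account
```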

================================================================

13) UNICOS SOFTWARE CONFIGURATION CHANGES...

A number of re-configurations must be performed in order to support the new hardware. The following list is VERY preliminary... and is likely to change in the coming weeks:

* Filesystem layout will change due to the addition of 50GB of disk space and the installation of UNICOS-under-UNICOS

* Determine best filesystem layout for new configuration. Consider massive I/O required to perform roll-in / roll-out on MPP.

* Move the production FDDI network interface to another I/O cluster

* Remove (or relocate) the old network interface

* Remove (or relocate) the tape controller

* Relocate one of the two HIPPI interfaces

* Install a new DD-62 disk array and remove the older DD-42 disk drives

================================================================

14) MISC. NOTES...

a) The GNU "gcc" compiler has been ported to the T3D. SDSC should install this in /usr/local/apps/gcc.

b) ARSC has a 128 node T3D and uses SDSC's resource management system. We should get a copy of their version of "res" for use in our development efforts.

================================================================