SAM-QFS User Guide: Computation Allocations

Introduction

SAM-QFS is a high-performance archival storage system primarily for users housing data collections. See the main SAM-QFS User Guide page for information about using SAM-QFS with a data allocation. Users with computation allocations whose projects require long-term disk cache residency may contact SDSC Consulting to request access to SAM-QFS. However, SAM-QFS is considered to be in experimental mode for this use. This page contains login and data transfer information for those special computation projects.

It is your responsibility to back up critical data. This storage system is very reliable; however, data can be lost or damaged due to media failures, system software bugs, hardware failures, and user mistakes. Because of the enormous amount of data involved, SDSC maintains only one copy of SAM-QFS data on tape. For dual-copy capabilities or offsite backups, send a request to datacentral-allocations@sdsc.edu.

Access From DataStar

On DataStar, SAM-QFS acts as a mounted drive with the root name /archive. The environment variable $ARCHIVE is defined in your SDSC .login and .cshrc files. To transfer data, use standard UNIX commands such as copy (cp) and move (mv). For high-performance file transfers, use a GridFTP client.
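As an illustration of the cp-based approach, the following is a minimal Bourne-shell sketch that copies one file into the archive and checks that the copy matches the original. It assumes $ARCHIVE is set as described above; the function name archive_copy and the subdirectory argument are hypothetical and not part of any SDSC tool.

```shell
#!/bin/sh
# Sketch: copy one file into the archive and confirm the copy matches.
# ARCHIVE is the environment variable described above. The function name
# and the subdirectory argument are hypothetical examples.

archive_copy() {
    src=$1
    subdir=$2

    if [ -z "${ARCHIVE:-}" ]; then
        echo "archive_copy: ARCHIVE is not set" >&2
        return 1
    fi

    dest_dir="$ARCHIVE/$subdir"
    mkdir -p "$dest_dir" || return 1

    # -p preserves the modification time and permissions of the original.
    cp -p "$src" "$dest_dir/" || return 1

    # Compare byte for byte so that a truncated copy is noticed while the
    # original file still exists.
    cmp -s "$src" "$dest_dir/$(basename "$src")"
}
```

For example, `archive_copy results.tar run42 && echo "archived"` prints a confirmation only when the copy in $ARCHIVE/run42 is identical to the original.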

For special projects involving more than 1 TB of data or requiring long-term disk cache residency, contact SDSC Consulting.

SAM-QFS is not intended for interactive use. It can be accessed from DataStar's interactive p690 node, but job scripts should not depend on interactive-speed access to it. Although files appear to reside on a live online disk cache file system, the data may actually be on tape, and automatic retrieval from tape is not fast enough to support interactive use.
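One way to keep a job from waiting on tape retrieval in the middle of a run is to copy its input files out of the archive before the compute step starts. The sketch below shows that pattern under stated assumptions: the function name stage_in, the scratch directory, and the file names are hypothetical, and $ARCHIVE is the variable defined in your SDSC login files.

```shell
#!/bin/sh
# Sketch: copy input files from the archive to a working directory before
# the compute step begins. The function name and all paths are hypothetical.

stage_in() {
    scratch=$1
    shift

    if [ -z "${ARCHIVE:-}" ]; then
        echo "stage_in: ARCHIVE is not set" >&2
        return 1
    fi

    mkdir -p "$scratch" || return 1

    for f in "$@"; do
        # Reading the file is what triggers any retrieval from tape, so
        # the wait happens here rather than during the computation.
        cp "$ARCHIVE/$f" "$scratch/" || {
            echo "stage_in: could not copy $f" >&2
            return 1
        }
    done
}
```

A job script could call `stage_in "$SCRATCH_DIR" inputs/mesh.dat inputs/params.txt` and start the computation only if the call succeeds. SCRATCH_DIR is a placeholder for whatever fast working directory your job uses.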


Access From the IA-64 Linux Cluster (TeraGrid Cluster)

Authentication: GSI-SSH

Grid computing users must use GSI-SSH authentication to access SAM-QFS. To begin using gsissh, you must first set up your certificates and DNs:

  1. Log in to any SDSC machine. To obtain a direct login to SAM-QFS, send a request to SDSC Data Central.
  2. Obtain a certificate from the machine:
    % cacl
  3. Add a Distinguished Name (DN) entry to the machine's gridmap file:
    % gx-map -interactive
  4. For each session, create a temporary proxy certificate:
    % grid-proxy-init
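Because step 4 is repeated for each session, it can be convenient to renew the proxy only when it is missing or close to expiring. The following sketch assumes the standard Globus command grid-proxy-info is available alongside grid-proxy-init; the one-hour threshold and the function names are illustrative choices, not SDSC requirements.

```shell
#!/bin/sh
# Sketch: renew the proxy certificate only when it is missing or about to
# expire. MIN_SECONDS is an assumed threshold, not an SDSC requirement.

MIN_SECONDS=${MIN_SECONDS:-3600}

# Succeeds (exit status 0) when the remaining lifetime is below the
# threshold or is not a usable number.
proxy_needs_renewal() {
    timeleft=$1
    case $timeleft in
        ''|*[!0-9]*) return 0 ;;   # empty or non-numeric: treat as expired
    esac
    [ "$timeleft" -lt "$MIN_SECONDS" ]
}

ensure_proxy() {
    # grid-proxy-info -timeleft prints the remaining lifetime in seconds.
    timeleft=$(grid-proxy-info -timeleft 2>/dev/null)
    if proxy_needs_renewal "$timeleft"; then
        grid-proxy-init
    fi
}
```

Calling `ensure_proxy` at the start of a session prompts for your pass phrase only when a new proxy is actually needed.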

For more detailed information on TeraGrid authentication methods and using gsissh, visit the TeraGrid User Guide.



Access From SRB

SDSC's Storage Resource Broker (SRB) is a client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and accessing replicated data sets. SRB, in conjunction with the Metadata Catalog (MCAT), provides a way to access data sets and resources based on their attributes and/or logical names rather than their names or physical locations. For more detailed information about SRB, visit the SRB User Guide.

To begin using SRB:

  1. Obtain an SRB account.
  2. For any non-SDSC machines, download and install the latest version of SRB.
  3. From UNIX, check your environment file for the proper settings:
    % cat ~/.srb/.MdasEnv
  4. Your .MdasEnv file should look similar to:
    mdasCollectionHome <home_collection_name>
    mdasDomainHome <MCAT_user_home_domain_name>
    srbUser <MCAT_username>
    srbHost archive.sdsc.edu

    AUTH_SCHEME <PASSWD_AUTH | ENCRYPT1 | SEA_AUTH | GSI_AUTH>
    SERVER_DN <server_user_distinguished_name> (for GSI authentication only)
  5. Check to see that you have a password:
    % cat ~/.srb/.MdasAuth
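The check in steps 3 and 4 can also be scripted. The sketch below reports which of the settings shown above are missing from an .MdasEnv file. The function name check_mdasenv is hypothetical, and SERVER_DN is left out of the list because it applies to GSI authentication only.

```shell
#!/bin/sh
# Sketch: report which expected settings are missing from an .MdasEnv file.
# The key list mirrors the example file above. SERVER_DN is omitted because
# it is needed for GSI authentication only.

check_mdasenv() {
    env_file=${1:-$HOME/.srb/.MdasEnv}
    missing=0

    if [ ! -r "$env_file" ]; then
        echo "check_mdasenv: cannot read $env_file" >&2
        return 2
    fi

    for key in mdasCollectionHome mdasDomainHome srbUser srbHost AUTH_SCHEME; do
        # A setting counts as present when a line starts with the key,
        # followed by whitespace and at least one character of value.
        if ! grep -Eq "^[[:space:]]*$key[[:space:]]+[^[:space:]]" "$env_file"; then
            echo "missing: $key"
            missing=1
        fi
    done

    return "$missing"
}
```

Run with no argument, `check_mdasenv` inspects ~/.srb/.MdasEnv, prints one line per missing setting, and returns a nonzero status if any setting is absent.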

For each SRB session:

  1. Initiate a session:
       % Sinit
  2. Use Scommands such as Scp, Sget, and Sput to transfer files.
  3. Transfer using the host name archive.sdsc.edu.
  4. For SAM-QFS only, you may also use Sstage to stage data.
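The session steps above can be sketched as a small script. This example assumes the Scommands are on your PATH and that your .MdasEnv and .MdasAuth files are already set up. The collection name is hypothetical, and Sexit (the Scommand that ends a session) is not mentioned above. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: one SRB session that uploads a file and then ends the session.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
# The function names and the collection name are hypothetical.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

srb_upload() {
    local_file=$1
    collection=$2

    run Sinit || return 1

    run Sput "$local_file" "$collection"
    status=$?

    # End the session even when the transfer failed.
    run Sexit
    return "$status"
}
```

Running `srb_upload results.tar run42` prints the three commands in order. Setting DRY_RUN=0 executes them instead.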


Transferring Data to and from Compute Nodes

Many methods are available for transferring data. A brief summary is presented below. For more information on TeraGrid data transfer, see the help on tgcp and globus.

Moving & Porting Data: GridFTP

GridFTP is a high-performance secure File Transfer Protocol optimized for high-bandwidth wide-area networks. The following table describes some GridFTP clients you may use to access SAM-QFS, a.k.a. archive.sdsc.edu.

Client             Description
uberftp            • A wrapper for globus-url-copy with an interactive interface.
                   • It supports parallel data channels and striping.
                   • The syntax is less verbose and less prone to typographical errors.
                   • uberftp is the recommended client for SAM-QFS.
                   For more information, visit the TeraGrid User Guide.
tgcp               • A wrapper for globus-url-copy with a command line interface.
                   • It simplifies efficient copying of files and directories between and within GridFTP-enabled clusters.
                   For more information, visit TeraGrid Data: Transfer Examples.
globus-url-copy    • A command line interface.
                   • It is not interactive, but it is the client of choice for embedding transfers in job scripts.
                   For more information, visit the TeraGrid Data User Guide.
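For transfers embedded in a job script, a globus-url-copy call might look like the sketch below. It assumes a valid proxy certificate already exists. The function name, the remote path, and the default of four parallel streams (the -p option) are illustrative assumptions, so check the appropriate values for your project. By default the function prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: build a globus-url-copy command for use inside a job script.
# The function name, remote path, and stream count are hypothetical.
# -p sets the number of parallel data streams. With DRY_RUN=1 (the default
# here) the command is printed instead of executed.

ARCHIVE_HOST=archive.sdsc.edu

gridftp_put() {
    local_path=$1      # absolute path of the local file
    remote_path=$2     # absolute path on the archive host
    streams=${3:-4}

    # Both URLs are built by appending the path to a scheme and host, so
    # each path must begin with a slash.
    case $local_path in
        /*) ;;
        *) echo "gridftp_put: local path must be absolute" >&2; return 1 ;;
    esac
    case $remote_path in
        /*) ;;
        *) echo "gridftp_put: remote path must be absolute" >&2; return 1 ;;
    esac

    set -- globus-url-copy -p "$streams" \
        "file://$local_path" "gsiftp://$ARCHIVE_HOST$remote_path"

    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Calling `gridftp_put /tmp/a.dat /archive/x/a.dat` prints the full command so that you can inspect it before setting DRY_RUN=0.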



