Data Oasis User Guide

System Overview


SDSC Data Oasis is an online, high-performance, Lustre-based storage resource with a 5 PB capacity that is available to all users of SDSC's Comet compute resource. It was designed to meet the needs of data-intensive research by providing easy-to-use, high-capacity, short- to medium-term storage with usable bandwidth on the order of 100 GB/s and latencies far lower than those of near-line and tape-based storage systems. However, it is not an archival system: stored data is single-copy and not backed up.

Data Oasis is divided into several file systems, including local scratch space on Comet and a shared, persistent 2.5 PB Project space that is available to users with an allocation. All projects on Comet receive a default allocation of 500 GB.



The default allocation is 500 GB of Project Storage, shared among all users of a project. Projects that require more than 500 GB of Project Storage must request additional space by e-mail. The e-mail, sent by the project PI, should provide a justification of 500 words or fewer that includes:

  1. Amount of storage being requested
  2. How the storage will be used
  3. How this augments other non-SDSC storage resources available to the project

Storage requests are reviewed by SDSC staff, and a decision is made within 5 business days.

Methods of Access

The Data Oasis Project Storage space can be accessed as a filesystem on all Comet login and compute nodes. Each user's personal space can be found in


where allocationname is the project's six-character allocation name (found by running the show_accounts command) and username is the user's local login name.

Since Data Oasis is mounted as a standard filesystem, UNIX file transfer utilities such as scp, sftp, and rsync can be used for transfers of modest size or scale. To enable the efficient transfer of larger amounts of data, Data Oasis is also mounted on the SDSC data mover servers:

  • (for users with accounts on Comet)

These data movers can be used in conjunction with Globus Online and GridFTP, which are discussed in the Data Transfer Methods section below and the XSEDE Data Transfers & Management page.

Checking your Quota

You can review your group's Project Storage utilization with the following command:

    $ lfs quota -g <project> /oasis/projects/nsf
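Because the "lfs" command may not be on your path until the Lustre module is loaded, the quota check can be wrapped in a small guard. The following is only a sketch: the function name check_project_quota is hypothetical, and the project argument stands in for your own allocation name.

```shell
# Hypothetical helper: run the quota query only if the lfs command is available.
check_project_quota() {
    project="$1"    # allocation name, as reported by show_accounts
    if ! command -v lfs >/dev/null 2>&1; then
        # lfs is not on the PATH; on Comet it is enabled with "module load lustre"
        echo "lfs not found; try 'module load lustre' first" >&2
        return 1
    fi
    lfs quota -g "$project" /oasis/projects/nsf
}
```

The function returns a non-zero status when "lfs" is missing, so it can be used safely inside job scripts that check exit codes.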

Transferring Data to/from SDSC Data Oasis

Data Transfer Methods

While using the standard UNIX file transfer tools (scp, sftp, rsync) is acceptable for simple and small file transfers (< 1 GB) to and from Data Oasis, they cannot realize the maximum performance of the Data Oasis storage resource because of their limited internal buffers and inability to stripe transfers across multiple data mover servers. The preferred method for transferring big data (both large file sizes and large numbers of files) is using GridFTP (a part of the Globus Toolkit). Keep in mind that attempting to transfer large numbers of small files will result in poor performance. Whenever possible, create archives of directories with large file counts before initiating the data transfers.
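As an illustration of the archiving advice above, a directory of many small files can be packed into a single tar file before it is transferred. This is a minimal sketch, not a prescribed procedure: the function name archive_dir and the example directory name are hypothetical.

```shell
# Hypothetical helper: pack one directory into a single tar archive so that
# the transfer moves one large file instead of many small ones.
archive_dir() {
    src_dir="$1"    # directory holding many small files
    archive="$2"    # path of the tar file to create
    # -C switches to the parent directory so the archive stores relative paths.
    tar -cf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}
```

For example, "archive_dir results results.tar" would produce one file to transfer, which can be unpacked at the destination with "tar -xf results.tar".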

The XSEDE Data Transfers & Management page provides a detailed explanation of how to use GridFTP and its associated GUI- and terminal-based tools in XSEDE. To facilitate GridFTP with SDSC Data Oasis, the following data movers have Data Oasis mounted under /oasis/projects/nsf:

  • Comet: gsi (XUP File Manager/globus-url-copy) or xsede#comet (Globus)

These data movers are load-balanced in a round-robin fashion, but advanced users may wish to access the individual data movers explicitly via oasis-dm1, oasis-dm2, oasis-dm3 and oasis-dm4.


globus-url-copy provides the greatest flexibility for optimizing transfers between XSEDE resources. To transfer a file from another XSEDE resource (e.g., TACC Stampede) to SDSC Comet,

    $ module load globus
    $ myproxy-login -l xsedeusername

This will load the commands to use GridFTP and generate the GSI credential needed to access xsedeusername's accounts across XSEDE Resources. Then,

    $ globus-url-copy -vb -stripe -tcp-bs 8m -p 4 \
        gsi \


  • "-vb" enables verbosity (report transfer rate, among other things)
  • "-stripe" enables striped transfers
  • "-tcp-bs 8m" specifies an 8 megabyte TCP buffer. The optimal value for this will vary; Globus provides a way to estimate the optimal tcp-bs value in its documentation
  • "-p 4" indicates that four parallel data connections should be used
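One common rule of thumb for sizing a TCP buffer is the bandwidth-delay product (link bandwidth multiplied by round-trip time). The sketch below computes it; this is a general networking estimate and may differ from the method in the Globus documentation. The function name bdp_bytes and the example figures (1000 Mbit/s, 64 ms) are hypothetical, not measurements of any XSEDE link.

```shell
# Hypothetical helper: bandwidth-delay product in bytes.
bdp_bytes() {
    mbps="$1"      # link bandwidth in megabits per second
    rtt_ms="$2"    # round-trip time in milliseconds
    # (mbps * 1e6 / 8) bytes/s * (rtt_ms / 1000) s = mbps * rtt_ms * 125 bytes
    echo $(( mbps * rtt_ms * 125 ))
}

bdp_bytes 1000 64    # prints 8000000, i.e. about 8 MB
```

With these assumed figures the estimate happens to match the "8m" value used in the example, but the right value for a given pair of sites should be determined from measured bandwidth and latency.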

By comparison, the equivalent transfer using scp would be:

    $ scp \

In a test transfer of a 341 MB file, GridFTP achieved an average of 171 MB/s while scp achieved only 34.1 MB/s, roughly a fivefold difference. When transferring terabytes of data, GridFTP is clearly preferable.
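To put the difference in perspective, the transfer time for a larger data set can be estimated from the two measured rates. This sketch assumes, purely for illustration, that the rates from the 341 MB test would hold for a 1 TB (1,000,000 MB) transfer, which may not be true in practice; the function name transfer_hours is hypothetical.

```shell
# Hypothetical helper: estimated transfer time in hours.
transfer_hours() {
    size_mb="$1"      # amount of data in MB
    rate_mb_s="$2"    # sustained transfer rate in MB/s
    LC_ALL=C awk -v s="$size_mb" -v r="$rate_mb_s" \
        'BEGIN { printf "%.1f\n", s / r / 3600 }'
}

transfer_hours 1000000 171     # GridFTP rate: prints 1.6
transfer_hours 1000000 34.1    # scp rate: prints 8.1
```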

Caveats to Users

This resource is based on a Lustre filesystem, which has some limitations. A comprehensive list of Lustre best practices is beyond the scope of this guide, but it is important to minimize unnecessary access to file metadata. For example,

  • avoid performing many small file operations: opens/closes, random reads/writes
  • avoid putting too many (e.g., more than several hundred) files in one directory
  • avoid using "ls -l" unnecessarily, and consider using "ls --color=no -U" when navigating Data Oasis
  • limit unnecessary use of wildcards on the command line
  • avoid using the "find" and "du" commands. Use "lfs find" and "lfs du" instead

The "lfs" command can be enabled using "module load lustre" on Comet.
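To check whether a directory is approaching the file-count guideline above without triggering a metadata lookup for every entry, the recommended "ls" options can be combined with a line count. This is a sketch only: the function name count_entries is hypothetical, it assumes GNU ls, it does not count hidden files, and it will miscount file names that contain newlines.

```shell
# Hypothetical helper: count the entries in one directory.
count_entries() {
    dir="$1"
    # -U lists entries unsorted; --color=no avoids per-file lookups for coloring.
    ls --color=no -U "$dir" | wc -l
}
```

For example, "count_entries ." reports the number of visible entries in the current directory; a result well above several hundred suggests splitting the files across subdirectories.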

Troubleshooting / Common Errors

  • Problem: Attempts to access files on Data Oasis hang, or access is extremely sluggish or unresponsive
    Solution: This can occur on both login nodes and compute nodes and typically results from Data Oasis being overloaded. These conditions typically clear ("un-hang") within a few minutes; if they persist longer, report the problem along with the system (or specific compute nodes) on which it is occurring.
  • Problem: /oasis/projects/nsf exists but is empty
    Solution: This problem is infrequent and should be reported to the XSEDE helpdesk with the system (or specific compute nodes) on which this is occurring.


SDSC Data Oasis Project Storage is provided on a per-project basis and is available for the duration of the associated compute allocation period. Data will be retained for three months beyond the end of the project, by which time the data must be migrated elsewhere.

Data Oasis Project Storage is not subject to automatic purges, but be aware that the data stored there is single-copy and not backed up! Users are responsible for ensuring that critical data are duplicated elsewhere. Data accidentally deleted from Data Oasis cannot be recovered.