SDSC Data Oasis is an on-line, high-performance, Lustre-based storage resource with a 4 PB capacity that is available to all users of SDSC Gordon and Comet. It was designed to meet the needs of data-intensive research by providing easy-to-use, high capacity, short- to medium-term storage with useable bandwidth on the order of 100 GB/s and latencies that are far lower than near-line and tape-based storage systems. However, it is not an archival system and stored data is single-copy and not backed up.
Data Oasis is divided into several file systems, including local scratch spaces for Comet and Gordon and a shared, persistent 2 PB Project space that is available to users with an allocation on either machine. All projects on Comet or Gordon receive a default allocation of 500 GB.
Data Oasis consists of 64 Object Storage Servers (OSSs) each contributing 72 TB of capacity and connected to two Arista 7508 switches via 2×10GigE links.
On Gordon, the Data Oasis Project Storage is mounted via 64 I/O nodes, which also act as routers. The Gordon compute nodes are connected to the I/O nodes using QDR InfiniBand, and each I/O node is connected to Data Oasis's Arista 7508 switches via 2×10GigE links.
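As a rough sanity check, the hardware figures above can be multiplied out. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted in this guide (64 OSSs, 72 TB each, 2×10GigE links per OSS). The variable names are illustrative, and the results are nominal upper bounds, not measured values.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope figures derived from the numbers quoted in this guide.
oss_count=64          # Object Storage Servers
tb_per_oss=72         # TB contributed by each OSS
links_per_oss=2       # 10GigE links per OSS
gbit_per_link=10      # nominal link speed in Gbit/s

# Raw capacity before any file system or redundancy overhead.
raw_capacity_tb=$(( oss_count * tb_per_oss ))

# Nominal aggregate network bandwidth of all OSS links, in Gbit/s and GB/s.
aggregate_gbit=$(( oss_count * links_per_oss * gbit_per_link ))
aggregate_gbyte=$(( aggregate_gbit / 8 ))

echo "Raw capacity: ${raw_capacity_tb} TB"
echo "Aggregate OSS link bandwidth: ${aggregate_gbit} Gbit/s (~${aggregate_gbyte} GB/s)"
```

The raw total of 4608 TB is in line with the quoted 4 PB capacity, and the nominal 160 GB/s of aggregate link bandwidth sits above the roughly 100 GB/s of usable bandwidth cited earlier. This guide does not say how those differences break down.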
The default allocation is 500 GB of Project Storage, shared among all users of a project. Projects that require more than 500 GB of Project Storage must request additional space by sending an email to firstname.lastname@example.org. The email, sent by the project PI, should provide a justification of 500 words or fewer that includes:
The project's storage requests will be reviewed by SDSC staff, and a decision will be made within 5 business days.
The Data Oasis Project Storage space is mounted on both Gordon and Comet and can be accessed as a filesystem on all login and compute nodes. Each user's personal space can be found in a directory of the form /oasis/projects/nsf/allocationname/username (the layout used in the transfer examples in this guide), where allocationname is the project's six-character allocation name (found by running the show_accounts command) and username is the user's local login name.
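A minimal sketch of how that directory name can be assembled in a script. It assumes the /oasis/projects/nsf/allocation/username layout that appears in this guide's transfer examples. The function name project_dir and the allocation name abc123 are hypothetical placeholders.

```shell
# Hypothetical helper: print the path to a user's Project Storage directory.
# Assumes the /oasis/projects/nsf/<allocation>/<user> layout used in this guide.
project_dir() {
    local allocation="$1"        # six-character allocation name from show_accounts
    local user="${2:-$USER}"     # defaults to the current login name
    printf '/oasis/projects/nsf/%s/%s\n' "$allocation" "$user"
}

# Example usage (abc123 is a placeholder, not a real allocation):
#   cd "$(project_dir abc123)"
```

Passing the user name explicitly is useful when the local login name differs from the name the script runs under.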
Since Data Oasis is mounted as a standard filesystem, UNIX file transfer utilities such as rsync can be used for transfers of modest size or scale. To enable the efficient transfer of larger amounts of data, Data Oasis is also mounted on the SDSC data mover servers:
While the standard UNIX file transfer tools (such as rsync) are acceptable for simple and small file transfers (< 1 GB) to and from Data Oasis, they cannot realize the maximum performance of the Data Oasis storage resource because of their limited internal buffers and inability to stripe transfers across multiple data mover servers. The preferred method for transferring large data sets (both large files and large numbers of files) is GridFTP (a part of the Globus Toolkit). Keep in mind that transferring large numbers of small files will result in poor performance. Whenever possible, create archives of directories with large file counts before initiating the transfer.
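The advice to archive directories with large file counts can be sketched with tar. This is one possible approach, not a prescribed procedure. The function name pack_for_transfer is hypothetical, and gzip compression (the z flag) is an optional choice that can be dropped for data that does not compress well.

```shell
# Hypothetical helper: pack a directory holding many small files into a
# single archive so that only one large file has to be transferred.
pack_for_transfer() {
    local src_dir="$1"     # directory to pack
    local archive="$2"     # archive file to create, e.g. results.tar.gz
    # -C changes into the parent directory so the archive stores relative paths.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Example usage:
#   pack_for_transfer ./results ./results.tar.gz
# On the receiving side, unpack with:
#   tar -xzf results.tar.gz
```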
The XSEDE Data Transfers & Management page provides a detailed explanation of how to use GridFTP and its associated GUI- and terminal-based tools in XSEDE. To facilitate GridFTP with SDSC Data Oasis, the following data movers have Data Oasis mounted under
These data movers are load-balanced in a round-robin fashion, but advanced users may wish to access the individual data movers explicitly via oasis-dm1, oasis-dm2, oasis-dm3 and oasis-dm4.
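For scripted transfers it can help to build the GridFTP URL in one place. The sketch below follows the host name, port, and URL form used in this guide's globus-url-copy example. The function name gsiftp_url is hypothetical, and the second argument is where an individual data mover's fully qualified host name would go if explicit selection is wanted.

```shell
# Hypothetical helper: build a gsiftp:// URL for a path on a data mover.
# The default host is the round-robin alias used in this guide's example.
gsiftp_url() {
    local path="$1"                               # absolute path on Data Oasis
    local host="${2:-oasis-dm.sdsc.xsede.org}"    # data mover host name
    # Port 2811 and the "//" before the absolute path mirror the guide's example.
    printf 'gsiftp://%s:2811//%s\n' "$host" "$path"
}

# Example usage:
#   gsiftp_url /oasis/projects/nsf/allocation/username/somefile.bin
```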
globus-url-copy provides the greatest flexibility for optimizing transfers between XSEDE resources. To transfer a file from another XSEDE resource (e.g., TACC Stampede) to SDSC Gordon, first run:
$ module load globus
$ myproxy-login -l xsedeusername
This will load the commands to use GridFTP and generate the GSI credential needed to access xsedeusername's accounts across XSEDE Resources. Then,
$ globus-url-copy -vb -stripe -tcp-bs 8m -p 4 \
    gsiftp://data1.stampede.tacc.utexas.edu:2811///home1/02255/username/somefile.bin \
    gsiftp://oasis-dm.sdsc.xsede.org:2811///oasis/projects/nsf/allocation/username/somefile.bin
"-vb" enables verbose output (reports the transfer rate, among other things)
"-stripe" enables striped transfers
"-tcp-bs 8m" specifies an 8 megabyte TCP buffer. The optimal value will vary; Globus provides a way to estimate the optimal tcp-bs value in its documentation
"-p 4" indicates that four parallel data connections should be used
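One common starting point for a TCP buffer size is the bandwidth-delay product of the path. The sketch below computes it from a link speed and a round-trip time. It is a general networking rule of thumb and an assumption here, not necessarily the exact formula in the Globus documentation. The function name bdp_bytes and the example figures are hypothetical.

```shell
# Hypothetical helper: bandwidth-delay product in bytes.
#   bytes = (Mbit/s * 1e6 bits/s) * (ms * 1e-3 s) / 8 = Mbit/s * ms * 125
bdp_bytes() {
    local mbit_per_s="$1"    # bottleneck bandwidth in Mbit/s (integer)
    local rtt_ms="$2"        # round-trip time in milliseconds (integer)
    echo $(( mbit_per_s * rtt_ms * 125 ))
}

# Example: a 1000 Mbit/s path with a 64 ms round-trip time.
bdp_bytes 1000 64    # prints 8000000, i.e. about 8 MB
```

The round-trip time can be measured with ping between the two endpoints. The result is only a first estimate, and the best value for a given transfer still has to be found by testing.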
By comparison, the equivalent transfer using scp would be:
$ scp login1.stampede.tacc.utexas.edu:/home1/02255/username/somefile.bin \
    /oasis/projects/nsf/allocation/username/
In a test transfer of a 341 MB file, GridFTP achieved an average of 171 MB/s while scp achieved only 34.1 MB/s. When transferring terabytes of data, GridFTP is clearly preferable.
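To see what those rates would mean at terabyte scale, the sketch below converts a data size and a transfer rate into hours. It assumes the rates measured on the 341 MB test file would be sustained for a much larger transfer, which is an extrapolation and not a measured result. The function name transfer_hours is hypothetical.

```shell
# Hypothetical helper: hours needed to move size_mb megabytes at rate_mb_s MB/s.
transfer_hours() {
    local size_mb="$1"      # data size in MB
    local rate_mb_s="$2"    # sustained transfer rate in MB/s
    # awk handles the floating-point division; LC_ALL=C keeps "." as the decimal mark.
    LC_ALL=C awk -v s="$size_mb" -v r="$rate_mb_s" \
        'BEGIN { printf "%.1f\n", s / r / 3600 }'
}

# 1 TB (1,000,000 MB) at the two rates quoted in this guide:
transfer_hours 1000000 171     # GridFTP: prints 1.6
transfer_hours 1000000 34.1    # scp:     prints 8.1
```

Under that assumption, each terabyte takes roughly an hour and a half with GridFTP and about eight hours with scp.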
This resource is based on a Lustre filesystem, which has some limitations. A comprehensive list of Lustre best practices is beyond the scope of this guide, but it is important to minimize unnecessary access to file metadata. For example,
Avoid running "ls -l" unnecessarily, and consider using "ls --color=no -U" when navigating Data Oasis.
Avoid the standard "find" and "du" commands. Use "lfs find" and "lfs du" instead.
The "lfs" command is available by default on Gordon and can be loaded using "module load lustre" on Comet.
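Scripts that run both on Data Oasis and on ordinary file systems can wrap the choice between lfs find and find. The sketch below is one way to do that, under the assumption that only options accepted by both tools (such as -type and -name) are passed through. The function name oasis_find is hypothetical.

```shell
# Hypothetical wrapper: prefer "lfs find" when the lfs command is available,
# and fall back to the standard find elsewhere (e.g. on a workstation).
oasis_find() {
    if command -v lfs >/dev/null 2>&1; then
        lfs find "$@"
    else
        find "$@"
    fi
}

# Example usage: list regular files ending in .dat under a directory.
#   oasis_find /path/to/dir -type f -name '*.dat'
```

The fallback exists only so that the same script runs where Lustre tools are not installed; on Data Oasis itself the lfs branch is the one that should be taken.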
/oasis/projects/nsf exists but is empty
SDSC Data Oasis Project Storage is provided on a per-project basis and is available for the duration of the associated compute allocation period. Data will be retained for three months beyond the end of the project, by which time the data must be migrated elsewhere.
Data Oasis Project Storage is not subject to automatic purges, but be aware that the data stored there is single-copy and not backed up! Users are responsible for ensuring that critical data are duplicated elsewhere. Data accidentally deleted from Data Oasis cannot be recovered.