Mass Store

Revision as of 20:05, 7 April 2006

This is a description of how the SRB can be used as a complete Mass Storage System itself, as implemented in SRB 2.0.2 (also see Release_Notes_2_0_2).

IMPORTANT NOTE: The source of the tape library server for the STK silo (item 1c below) CANNOT be released because it was developed using StorageTek's CSC Developer's Toolkit, which requires a software license for any access to source code written with the Toolkit. A binary release is available only for the Solaris platform and will be included in the next patch. Please email srb@sdsc.edu if you need a copy of the binary release before the next patch.

1) Components of the MSS:

a) A new type of resource, the "compound resource" - A compound resource may be configured to contain a pool of cache resources and a tape resource. When a user creates a file using a compound resource, the object created becomes a "compound object". The actual data of a "compound object" may reside on cache, on tape, or on both. Unlike an SRB replica, a "compound object" always appears as a single object even though there may be multiple copies of the data. It is a simple hierarchical system in which data migrates between cache and tape. Data is always staged to cache automatically whenever it is accessed, and is migrated to tape by the system administrator when more cache space is needed.

b) A set of driver functions for basic tape I/O operations - These functions have been incorporated into the SRB server and include mount, dismount, open, close, read, write, seek, etc. Currently, the driver has only been tested with 3590 tape drives.

c) A tape library server - A tape library server for the STK silo running ACSLS software has been incorporated into the SRB system. Its primary function is to schedule and perform the mounting and dismounting of tapes. It uses the same authentication system and server framework as other SRB servers. Currently, only the IBM 3590 tape drive has been tested.

d) A set of tape and cache management utilities.

inittape - Label a tape and register the tape with the MCAT.
lstape - List the tapes and their associated metadata in MCAT.
tapemeta - Modify the metadata of a tape that has been registered with MCAT.
dumptape - Dump and purge files in the cache system to tape.

e) A set of tape and cache management APIs. These are all privileged calls.

srbTapelibMntCart() - Request the Tape Library server to mount a tape.
srbTapelibDismntCart() - Request the Tape Library server to dismount a tape.
srbGetTapeCartPri() - Get the priorities for each tape type depending on the availability of tape drives.
srbDumpFileList() - Dump a list of files to tape.
srbStageCompObj() - Stage a compound object from tape to cache.

f) A set of APIs that deal with compound object metadata. These are all privileged calls. Raja did the internal MCAT calls.

srbRegInternalCompObj() - Register an internal compound object.
srbRmIntCompObj() - Unregister an internal compound object.
srbModInternalCompObj() - Modify the metadata of an internal compound object.
srbRmCompObj() - Unregister a compound object.

2) Building and configuring the SRB server for tape Mass storage

a) Building the SRB server on the host where the tape drives are attached, with the tape driver enabled:

After running ./configure, edit the mk/mk.config file and uncomment the TAPE_DRIVE parameter to switch on the tape drivers in the server.

Build the SRB software with the normal build procedures given in README.build

b) Configuring the SRB server with the tape driver enabled. Two files need to be configured:

data/tapeLibConfig - This file configures the SRB server to communicate with one or more tape library servers for tape mounts. In this file, each tape library server is represented by a triplet - (tapeLibInx, tapeLibHost, tapeLibPort). TapeLibInx is an arbitrary unique integer identifying the tape library server. TapeLibHost and tapeLibPort are, respectively, the address of the host where the server runs and the port number used to connect to it.

data/tapeDevConfig - This file configures all the tape drives known to the Tape Mass Storage system. It is used by both the tape enabled SRB server and the tape library server for drive configuration. In this file, each tape drive is represented by a triplet - (devicePath, addressInSilo, driveType). DevicePath is the Unix device path (e.g., /dev/rmt/0) of the drive. AddressInSilo is the address of the tape device in the STK silo (e.g., 0:0:7:1) and driveType is the drive type (e.g., T_3590).

c) Creating a compound resource for the Tape Mass Storage system. Typically, a compound resource contains a pool (one or more) of cache resources and a single tape resource. The SRB Java Admin Tool (SAT) is best suited for this purpose. The following shows the steps for creating a compound resource using SAT:

Create an empty Compound Resource - Use the "Create a New Compound Resource" menu to create an empty compound resource. The "location" of the resource MUST be the network location where the tape enabled SRB server is located. The "Resource Class" can be anything, but is typically set to "archival". This setting determines which resource class the compound resource is treated as when viewed from outside.

Create one or more cache resources - Use the "Create a New Physical Resource" menu to create one or more cache resources. The "Resource Type" must be set to "tapeCache" and the "Resource class" must be set to "cache".

Create one tape resource - Use the "Create a New Physical Resource" menu to create one tape resource. The "Resource Type" must be set to "tape" and the "Resource class" must be "archival". The "location" of the resource MUST be the network location where the tape enabled SRB server is located.

Add the cache and tape resources to the empty compound resource - Use the "Add resource to compound resource" menu to add these newly created cache and tape resources to the compound resource.

To create a compound resource from the command line, do the following:

1) First a new resource is registered to be of the compound type:

     ingestResource 'myCmpRsrc' 'compound' 'sdsc' 'compound' 'compound' 0

Note that no pathname is given.

2) Second, other physical resources are designated as components of the new compound resource:

     makeCompoundResource myCmpRsrc comp-cache-sdsc
     makeCompoundResource myCmpRsrc comp-tape-sdsc
     makeCompoundResource myCmpRsrc comp-cache-caltech
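
The two command-line steps above can be generated for any set of component resources. The following is a minimal sketch that only prints the commands and executes nothing. The function name compound_resource_cmds is hypothetical, and the fixed ingestResource arguments are copied verbatim from the example above; the third argument is assumed (this page does not confirm it) to be the location.

```shell
#!/bin/sh
# Sketch: print the commands that register a compound resource and attach
# its component resources. Nothing is executed here; review the output
# before running it on a host where the SRB admin commands are installed.

compound_resource_cmds() {
    compound=$1   # name of the new compound resource
    location=$2   # ASSUMPTION: third ingestResource argument is the location
    shift 2
    # Step 1: register the compound resource (no pathname is given).
    printf "ingestResource '%s' 'compound' '%s' 'compound' 'compound' 0\n" \
        "$compound" "$location"
    # Step 2: one makeCompoundResource call per component resource.
    for component in "$@"; do
        printf 'makeCompoundResource %s %s\n' "$compound" "$component"
    done
}

compound_resource_cmds myCmpRsrc sdsc \
    comp-cache-sdsc comp-tape-sdsc comp-cache-caltech
```

With the arguments shown, the sketch prints the same four commands as steps 1 and 2 above.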

3) Using the tape and cache management utilities.

The MCAT maintains a set of metadata for every tape in the Tape Mass Storage system. Before a tape can be used by the system, it must be labeled and registered with the MCAT. The "inittape" utility in the tape/bin directory can be used for this purpose. In addition, the "tapemeta" utility can be used to modify the metadata of a tape that has been registered with MCAT, and "lstape" is used to list the tapes and their associated metadata in MCAT.

The "dumptape" utility is used by the system administrator to migrate files from the cache resources of a compound resource to the tape resource. The purge option (-p) can also be selected to purge a file from cache once it has been migrated to tape.

These utilities are easy to use, and their usage is explained in the manpages in the tape/man directory.

An example of using dumptape:

The dumptape utility in the "tape/bin" directory can be used to do the migration. I think the name dumptape is misleading.

Usage: dumptape [-p] [-f filelistFile] file1 file2 ...

file1, file2 ... - the physical paths of the files to be dumped to tape.
-p - purge the file after dumping it to tape.
-f filelistFile - instead of specifying file1, file2 ..., the list of files to be dumped can be specified in a file named filelistFile.

Please note that file1 file2 ... are the actual UNIX file names stored in the "cache" resource of the compound resource.

In this example, we have a compound resource, "test-compound-sdsc", with 2 physical resources:

 "test-cache-sdsc" - cache class
 "test-sdsc" - archival class

e.g.,

miner-1875% SgetR -c test-compound-sdsc
--------------------------- RESULTS ------------------------------
netprefix :srb.sdsc.edu:NULL:NULL
default_path :/misc/srb/srb/test/testVault/?DATANAME.?RANDOM.?TIMESEC
phy_default_path :/misc/srb/srb/test/testVault/?DATANAME.?RANDOM.?TIMESEC
phy_rsrc_name :test-sdsc
rsrc_typ_name :unix file system
rsrc_class_name :archival
max_obj_size :0
phy_rsrc_name :test-compound-sdsc
-----------------------------------------------------------------
netprefix :srb.sdsc.edu:NULL:NULL
default_path :/misc/srb/srb/test/testCacheVault/?USER.?DOMAIN/?SPLITPATH/?PATH?DATANAME.?RANDOM.?TIMESEC
phy_default_path :/misc/srb/srb/test/testCacheVault/?USER.?DOMAIN/?SPLITPATH/?PATH?DATANAME.?RANDOM.?TIMESEC
phy_rsrc_name :test-cache-sdsc
rsrc_typ_name :unix file system
rsrc_class_name :cache
max_obj_size :0
phy_rsrc_name :test-compound-sdsc
-----------------------------------------------------------------
netprefix :srb.sdsc.edu:NULL:NULL
default_path :
phy_default_path :/misc/srb/srb/test/testCacheVault/?USER.?DOMAIN/?SPLITPATH/?PATH?DATANAME.?RANDOM.?TIMESEC
phy_rsrc_name :test-cache-sdsc
rsrc_typ_name :unix file system
rsrc_class_name :cache
max_obj_size :0
phy_rsrc_name :test-compound-sdsc
-----------------------------------------------------------------

If we put a file in resource "test-compound-sdsc":

miner-1877% Sput -S test-compound-sdsc Sls foo1

Then drill down and find out where the file is located:

miner-1879% SgetD -t foo1
repl_enum :0
data_name :foo1
is_dirty :1
seg_num :-1
int_repl_num :0
int_seg_num :-1
cmpd_path_name :/misc/srb/srb/test/testCacheVault/srb.sdsc/39/48/foo1.222146445.1076611627
phy_rsrc_name :test-cache-sdsc
rsrc_typ_name :unix file system
is_dirty :1
offset :0
data_size :2593496
-----------------------------------------------------------------

You can see that the file is in /misc/srb/srb/test/testCacheVault/srb.sdsc/39/48/foo1.222146445.1076611627 of the "test-cache-sdsc" resource.

A way to find all the files in the vault of "test-cache-sdsc" is:

find /misc/srb/srb/test/testCacheVault -type f > mylist

The -ctime, -mtime or -atime option of the find command can be used to filter the listed files by age.
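
As an illustration of the age filter, the sketch below lists only regular files that were last modified more than a given number of days ago. The function name list_old_cache_files and the 7-day threshold are hypothetical choices; -atime or -ctime can be substituted for -mtime depending on site policy.

```shell
#!/bin/sh
# Sketch: list regular files under a cache vault that were last modified
# more than DAYS days ago, one physical path per line.

list_old_cache_files() {
    vault_dir=$1   # top of the cache vault
    days=$2        # age threshold in days (hypothetical policy value)
    find "$vault_dir" -type f -mtime +"$days"
}

# Example, using the vault path from this page:
#   list_old_cache_files /misc/srb/srb/test/testCacheVault 7 > mylist
```

The resulting mylist has the same one-path-per-line layout as the unfiltered list, so it can be passed to dumptape with -f in the same way.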

We can then force a migration of all the files in mylist:

dumptape -f mylist
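
Because dumptape takes physical paths, it can help to check the list before handing it over. The following is a minimal sketch, not part of the SRB distribution; the function name check_filelist is hypothetical. It only verifies that the list is non-empty and that every line names an existing regular file.

```shell
#!/bin/sh
# Sketch: sanity-check a file list before passing it to "dumptape -f".
# Returns 0 if the list is non-empty and every path is a regular file.

check_filelist() {
    list=$1
    if [ ! -s "$list" ]; then
        echo "empty or missing list: $list" >&2
        return 1
    fi
    status=0
    while IFS= read -r path; do
        [ -n "$path" ] || continue          # skip blank lines
        if [ ! -f "$path" ]; then
            echo "not a regular file: $path" >&2
            status=1
        fi
    done < "$list"
    return $status
}

# Example (run on the host where the cache vault and dumptape live):
#   check_filelist mylist && dumptape -f mylist
#   check_filelist mylist && dumptape -p -f mylist   # also purge from cache
```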

Now if we list the file again:

miner-1880% SgetD -t foo1
repl_enum :0
data_name :foo1
is_dirty :1
seg_num :-1
int_repl_num :0
int_seg_num :-1
cmpd_path_name :/misc/srb/srb/test/testCacheVault/srb.sdsc/39/48/foo1.222146445.1076611627
phy_rsrc_name :test-cache-sdsc
rsrc_typ_name :unix file system
is_dirty :0
offset :0
data_size :2593496
-----------------------------------------------------------------
repl_enum :0
data_name :foo1
is_dirty :1
seg_num :-1
int_repl_num :1
int_seg_num :-1
cmpd_path_name :/misc/srb/srb/test/testVault/foo1.1079999574.1076611868
phy_rsrc_name :test-sdsc
rsrc_typ_name :unix file system
is_dirty :0
offset :0
data_size :2593496
-----------------------------------------------------------------

It now has two copies. The -p option can be used to purge the cache copy when the migration is done.
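
To see at a glance where the copies of a compound object live, the listing can be reduced to one line per internal copy. This is a minimal sketch that assumes the "SgetD -t" output has exactly the layout shown above ("name :value" lines, with each record closed by a line of dashes); the function name summarize_copies is hypothetical.

```shell
#!/bin/sh
# Sketch: reduce "SgetD -t" output to "<int_repl_num> <phy_rsrc_name>" lines.
# ASSUMPTION: records end with a line consisting only of dashes, and values
# are prefixed with ":" as in the listings on this page.

summarize_copies() {
    awk '
        /^-+$/ {                                  # end of one record
            if (rsrc != "") print repl, rsrc
            repl = ""; rsrc = ""
            next
        }
        $1 == "int_repl_num"  { sub(/^:/, "", $2); repl = $2 }
        $1 == "phy_rsrc_name" { sub(/^:/, "", $2); rsrc = $2 }
    '
}

# Example:
#   SgetD -t foo1 | summarize_copies
```

For the second listing above, this would report internal copy 0 on test-cache-sdsc and internal copy 1 on test-sdsc.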