Release Notes 3.4.1

These are the release notes for SRB 3.4.1, released Friday, April 28, 2006.

Any valid ASCII characters are now acceptable in SRB filenames, except a string of two quotes in a row

Previously, many special characters could be included in the names of files Sput into SRB, but the quote (') and the ampersand (&) could not. Now even these are allowed, and any valid file name on Unix or Windows systems should be ingestible, except when the name contains two quotes one after the other. This was particularly a problem on Windows and for certain projects. Previously, users would get errors when storing (Sput'ing) such files, but any file successfully stored into SRB would be fine. As you might imagine, this required considerable effort and significant revisions to the source code. SRB does not currently support Unicode. See Bug 198.
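The naming rule above can be illustrated with a small check. This is only a sketch and not SRB code: the function name is hypothetical, and it assumes that "quote" means the single-quote character (') and that "valid ASCII" means every character is in the ASCII range.

```python
def is_acceptable_srb_name(name: str) -> bool:
    """Return True if `name` is pure ASCII and has no two quotes in a row.

    Hypothetical helper for illustration; SRB's own check may differ.
    """
    # Assumption: "two quotes in a row" means two consecutive ' characters.
    return name.isascii() and "''" not in name
```

Under these assumptions, a name such as O'Brien & Sons.dat passes, while a name containing '' or any non-ASCII character is rejected.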

Data integrity and vault management

A number of features and bug fixes were added in this release to improve the integrity checking of data stored in SRB, and to identify and remove orphan files in the SRB storage vault.

1) srbFileChk server - The srbFileChk server is a new file check server that checks the integrity of newly uploaded files. By default, it runs on the MCAT-enabled host. It can also be configured to run on other servers by changing the fileChkOnMes and fileChkOnThisServer parameters in the runsrb script.

A fairly nasty loss of data integrity can occur in recently uploaded files if the UNIX OS of a resource server crashes. Even though the data for recently uploaded files (within about half an hour of the crash) appeared to be properly written to the resource, data can still be lost because some of it was still in the OS's buffer cache and had not yet been written to disk. This problem is particularly bad because the SRB system has already registered the files in the MCAT and is not aware of the integrity problem until someone tries to retrieve the data. A typical symptom of this mode of corruption is that the size of the corrupted file differs from the size registered in MCAT.

One way to fix this problem is to call the UNIX fsync() function when the upload completes, which forces the buffer cache to be synchronized to disk. However, a performance test showed that the fsync() call can slow down the upload by as much as a factor of 3.
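For readers unfamiliar with fsync(), the following minimal sketch shows the general pattern of forcing written data to disk before reporting success. It is not taken from the SRB source; the function name is hypothetical and Python's os.fsync is used as a stand-in for the UNIX call.

```python
import os


def write_durably(path: str, data: bytes) -> int:
    """Write `data` to `path`, force it to disk, and return the file size.

    Hypothetical helper for illustration only.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's user-space buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush its buffer cache to disk
    return os.path.getsize(path)
```

The extra cost comes from waiting for the disk on every file, which is consistent with the slowdown reported above.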

The srbFileChk server checks newly created SRB files in the home zone. By default, it wakes up once per day to perform the check. Please read the srbFileChk manpage for more information.
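The symptom described above (a file whose size on disk differs from the size registered in MCAT) suggests the kind of check involved. The sketch below is an illustration only, not the srbFileChk implementation: the function name is hypothetical, and the registered sizes are passed in as a plain dictionary rather than queried from MCAT.

```python
import os
from typing import Dict, List, Tuple


def find_size_mismatches(registered: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (path, registered_size, actual_size) for each mismatch.

    `registered` maps a physical path to the size recorded in the catalog
    (an assumption for this sketch). A missing file is reported with an
    actual size of -1.
    """
    problems = []
    for path, expected in sorted(registered.items()):
        try:
            actual = os.path.getsize(path)
        except OSError:
            actual = -1  # file is missing or unreadable
        if actual != expected:
            problems.append((path, expected, actual))
    return problems
```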

2) The vaultchk utility (Bug 197) - The new vaultchk utility can be used by the sysadmin to identify and delete orphan files in the UNIX vaults. Orphan files (files in SRB vaults with no entry in the MCAT) can appear in the SRB vault when the SRB server goes down at certain critical points (such as during a bulk upload operation) because of a system crash or server shutdown. Please read the vaultchk manpage for more information.
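The definition of an orphan file given above amounts to a set difference between the files found in the vault and the paths known to the catalog. A minimal sketch of that idea follows; it is not the vaultchk implementation, the function name is hypothetical, and the list of registered paths is assumed to be supplied by the caller rather than read from MCAT.

```python
import os
from typing import Iterable, List


def find_orphans(vault_root: str, registered_paths: Iterable[str]) -> List[str]:
    """List files under `vault_root` that are not in `registered_paths`.

    Hypothetical helper for illustration only.
    """
    known = {os.path.normpath(p) for p in registered_paths}
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(vault_root):
        for name in filenames:
            full = os.path.normpath(os.path.join(dirpath, name))
            if full not in known:
                orphans.append(full)
    return sorted(orphans)
```

A real tool would report the candidates first and delete only on request, since a file that is mid-upload can look like an orphan.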

3) Fix Bug 209 - Sls -V now verifies in-container files correctly. Sls -V can be used to compare the file size in the vault with the size registered in MCAT.

4) Add -K option for Sbkupsrb (Bug 199) to verify that the backup copy was copied correctly.

5) Fix Bug 200 - Add capability to Schksum so that it checks the consistency of checksum values across replicas. The output of "Schksum -l" was also enhanced to print the checksum value of each replica and the resource associated with it.
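Checking consistency across replicas means grouping the replicas of each object and flagging any object whose replicas do not all share one checksum value. The following sketch shows that grouping under stated assumptions: it is not the Schksum implementation, the function name is hypothetical, and replicas are given as simple (object name, resource name, checksum) tuples.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Replica = Tuple[str, str, str]  # (object name, resource name, checksum)


def inconsistent_objects(replicas: Iterable[Replica]) -> Dict[str, List[Tuple[str, str]]]:
    """Return objects whose replicas carry more than one distinct checksum.

    The result maps each such object to its (resource, checksum) pairs.
    """
    by_object: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for name, resource, checksum in replicas:
        by_object[name].append((resource, checksum))
    return {
        name: copies
        for name, copies in by_object.items()
        if len({checksum for _resource, checksum in copies}) > 1
    }
```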

6) Call fsync after each phyMove - The Sphymove command is used mostly by the sysadmin to move users' files from one resource to another to manage the resources. The Sphymove operation is subject to the same data corruption mechanism mentioned above. We felt that the phyMove operation is more critical than other operations because the original copy is removed after the move, so we must make sure that the new copy is good before the original is deleted. Therefore, the UNIX fsync call is added after each successful phyMove.
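The ordering that matters for a move is: write the new copy, force it to disk, verify it, and only then remove the original. A minimal sketch of that ordering follows. It is not the Sphymove implementation; the function name is hypothetical, local file paths stand in for SRB resources, and the size comparison is an added illustration rather than something the notes describe.

```python
import os
import shutil


def safe_move(src: str, dst: str) -> None:
    """Copy `src` to `dst`, fsync the copy, then delete `src`.

    Hypothetical helper for illustration only.
    """
    expected = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # new copy must reach disk first
    if os.path.getsize(dst) != expected:
        raise OSError(f"size mismatch after copying {src} to {dst}")
    os.remove(src)  # the original is removed only after the checks pass
```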

Quota System

A new quota system has been incorporated in the SRB that provides a means to manage usage at a per-user level. The quota information is stored in MCAT along with usage information at the <user,resource> level. Currently, quotas work at the individual user and physical resource levels. They do not operate at the group users and logical users level. A few commands for administration are provided as part of the Scommands.

- SmodR -q can be used to set the quota and/or current usage for a <user,resource> pair. SmodR -q -1 -1 ALL ALL computes current usage for all users at every resource based on the sizes of files stored in the MCAT.

- SmodR -Q can be used to enforce quotas by reducing user access to null for the resource where their usage exceeds the quota limit. The user can still read a file from such a restricted resource but will not be able to write a file there.

- SgetU -q can be used to display quotas and usage for a given user or users.

SRBAdmin can run the two SmodR options one after the other at well-defined intervals in a cron job to enforce the quota requirements. Immediate imposition of quota checking is not implemented currently. For some database systems one can define a set of triggers to handle immediate quota checking and enforcement.
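The two periodic steps (recompute usage per <user,resource> pair, then find the pairs that exceed their quota) can be sketched as follows. This only illustrates the logic and is not how SmodR is implemented: the function names are hypothetical, and file records and quotas are passed in as plain Python values instead of being read from MCAT.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (user, resource)


def compute_usage(files: Iterable[Tuple[str, str, int]]) -> Dict[Pair, int]:
    """Sum file sizes per (user, resource) from (user, resource, size) records."""
    usage: Dict[Pair, int] = defaultdict(int)
    for user, resource, size in files:
        usage[(user, resource)] += size
    return dict(usage)


def over_quota(usage: Dict[Pair, int], quotas: Dict[Pair, int]) -> Set[Pair]:
    """Return pairs whose usage exceeds their quota; pairs without a quota are ignored."""
    return {pair for pair, used in usage.items()
            if pair in quotas and used > quotas[pair]}
```

Because the check runs at intervals, a user can exceed a quota between two runs, which matches the note above that immediate enforcement is not implemented.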

SRB Web Perl Portal

Perl code for a general purpose SRB web interface (or portal)

1) Can be run with a single SRB account or with individual SRB account logins

2) Can be used with either ENCRYPT1 or GSI_AUTH

3) Will run in html or shtml configuration

4) Can easily be extended for your project's needs

5) Location in tree: SRB-Web-Portals/SRB-Perl-Portal

SRB account management via grid-mapfile

Running this simple script via cronjob will keep SRB accounts synced with entries in a grid-mapfile. When new DNs are added, the DNs will be inserted into the matching SRB account. If no matching SRB account is present, one will be created. If DNs are removed, the matching DN in the MCAT will be removed as well.

Location in tree: admin/Auto-Accounts-via-Grid-mapfile
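The synchronization described above can be pictured as a diff between the grid-mapfile and the DNs currently held in the MCAT. The sketch below is not the shipped script: the function names are hypothetical, it assumes the common grid-mapfile line format of a quoted DN followed by a single account name, and it takes the current MCAT state as a plain dictionary.

```python
import shlex
from typing import Dict, List, Set, Tuple


def parse_grid_mapfile(text: str) -> Dict[str, Set[str]]:
    """Map each account name to the set of DNs listed for it."""
    accounts: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # handles the quoted DN
        if len(parts) < 2:
            continue  # malformed line, ignored in this sketch
        dn, user = parts[0], parts[1]
        accounts.setdefault(user, set()).add(dn)
    return accounts


def plan_sync(desired: Dict[str, Set[str]], current: Dict[str, Set[str]]
              ) -> Tuple[List[str], List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (accounts to create, DNs to add, DNs to remove)."""
    create = sorted(u for u in desired if u not in current)
    add = sorted((u, dn) for u, dns in desired.items()
                 for dn in dns - current.get(u, set()))
    remove = sorted((u, dn) for u, dns in current.items()
                    for dn in dns - desired.get(u, set()))
    return create, add, remove
```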

Real time data management

1) Introducing new Scommand Sds2db. Sds2db converts and synchronizes a BRTT Datascope database, a popular database for real-time data, to a SQL-based database, such as Oracle. This feature had been requested by many in the real-time data management community.

2) Introducing new Scommand Ssql. Ssql allows a user to query a database converted by Sds2db, using standard SQL as well as customized SQL specific to real-time waveform data.

New driver for NCAR MSS

A new driver has been developed for the NCAR Mass Storage System. Since the MSS API operates at the get/put level, rather than at the open/read/write/close level, this driver includes a simple cache management system. Since the transfers to/from MSS will not be done in parallel with the SRB network transfer, it will be somewhat slower than many other resources, but should work well as a deep archive. This was bug 124.
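To show why a get/put-level API calls for a cache, here is a minimal sketch of the general idea: the whole object is staged into a local buffer on open, reads and writes operate on that buffer, and a modified buffer is put back on close. It is not the NCAR MSS driver; both classes are hypothetical, and an in-memory dictionary stands in for the archive.

```python
import io
from typing import Dict


class GetPutStore:
    """Hypothetical stand-in for an archive that only moves whole objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def get(self, name: str) -> bytes:
        return self._objects[name]

    def put(self, name: str, data: bytes) -> None:
        self._objects[name] = data


class CachedFile:
    """Open/read/write/close interface layered on a get/put store."""

    def __init__(self, store: GetPutStore, name: str, mode: str = "r") -> None:
        self._store = store
        self._name = name
        self._dirty = False
        # open: stage the whole object into the cache ("w" starts empty)
        initial = store.get(name) if mode == "r" else b""
        self._buf = io.BytesIO(initial)

    def read(self, n: int = -1) -> bytes:
        return self._buf.read(n)

    def write(self, data: bytes) -> int:
        self._dirty = True
        return self._buf.write(data)

    def close(self) -> None:
        if self._dirty:
            # close: push the whole cached object back in one put
            self._store.put(self._name, self._buf.getvalue())
        self._buf.close()
```

In this arrangement the transfer to or from the archive happens as a separate step from the client transfer, which is in line with the note above that such a resource will be somewhat slower than others.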

Completely reworked web site/documentation system (MediaWiki)

Our SRB home page/documentation system has been completely reworked using a MediaWiki infrastructure (as used by Wikipedia) to significantly improve the quality and accessibility of our documentation. In addition, this makes it much easier for the SRB team to update and revise the documentation and, much like Wikipedia, allows our user community to participate in extending and refining the information available.

The current system includes many key pages from the old SRB web site, most of our "readme" documents from the distribution, the Scommand man pages, plus some new pages such as a glossary and organization pages. The information is presented in a clean and direct manner, is extensively hyperlinked, and is immediately searchable. This addresses bug 25 (There are too many README files), bug 30 (Basic concepts are confusing and foreign), and bug 31 (Site is hard to navigate).

The release no longer contains the readme.dir subdirectory and files; see the SRB home page instead.

Other new features

  • Add timeout to the open call in the UNIX driver (Bug 222). This feature is primarily for the SAM-QFS resources, which can hang on open. This feature can be switched on with the UnixOpenTimeout parameter in the runsrb script.
  • Add -A option for Sstage to allow the sysadmin to stage other users' files stored in SAM-QFS resources.
  • Require only "read" permission (previously "write" permission was required) to back up data using Sbkupsrb. Some users might have switched off the "write" permission on their files to ensure that the data cannot be changed, but this prevented the sysadmin from backing up these files on behalf of the user. Therefore, the required permission has been changed to "read".
  • The server will no longer forward the entire svrReplContainer call to the MES. Now, only the getContainerInfo query will be sent to the MES.
  • Python C functions were added for user-defined metadata, developed by Robert Sanderson.

Critical bug patches for 3.4.0 included

The critical bug fixes that were released as patches to 3.4.0 are, of course, included in the 3.4.1 release. These are bug 190 "Sput -b of individual files fails", bug 193 "Cross zone connections fail", and bug 196 "configure with --enable-psglobj fails to build".

Other bug fixes

  • bug 62 Multiple metadata delete in inQ now works correctly.
  • bug 99 MCAT cleanup script (v3.3) + compilation opt for MCAT adm...
  • bug 101 inQ can now replicate recursively across collections.
  • bug 122 SmodR -D (set resource down) does not seem to have any effect.
  • Bug 126 Fix a problem with srbObjStat where the call is too slow and does not work if the path is a collection.
  • bug 179 Trailing / in mdasCollectionHome should be allowed.
  • bug 181 Sauth fails (with error -14) on Windows
  • bug 183 Srsync with no-follow symbolic link option.
  • bug 191 Sbkupsrb of a collection with more than 300 subcollections failed.
  • bug 192 Singesttoken not creating group for new domain.
  • Bug 193 Fix a cross-zone connect problem where the ticket user should be used to check the encrypt1 authentication.
  • bug 195 (Jargon) SRB Classes too verbose by default.
  • bug 201 MCAT Java admin tool. The admin tool was extended in a few ways to make it easier to connect to different servers. It will read what it can from the default or specified MdasEnv file, show you the values, and let you update fields. See the help window for more information.
  • bug 202 SRB3.4 and 3.3.1 SgetD -A output differs from SgetD.
  • bug 203 Bad permissions on files in tarball
  • bug 204 collection permissions for groups.
  • bug 205 aidi_rcv_token error, token is too large for buffer.
  • Bug 206 Fix a problem where Sput -f from a Mac to a Linux resource appends to the file instead of replacing it.
  • bug 207 * in SgetD does not work when used with -A option.
  • bug 208 SmodR -A does not make a logical resource if it is not there.
  • Bug 210 Fix a problem with Sput -bc where the resource associated with the full container should be used instead of using the default resource.
  • bug 211 Build fails when using --enable-psglobj for non-MCAT servers.
  • bug 212 Build fails when using --enable-myslobj for non-MCAT servers.
  • Bug 213 Fix a problem with Sput -bc seg faulting on some Linux platforms when the current container is full and a new container needs to be made.
  • bug 215 gsi-enabled Sinit incorrectly prompts for SRB password.
  • Bug 216 Request for new GSI authentication mode without srbUser. Via Jargon, one can now connect to the SRB using only a GSI certificate, no other user information is needed.
  • bug 217 mdasEnvFile env variable on Windows
  • Bug 218 Sls with Oracle 10g returns rows out of order SRB 3.3.1
  • bug 219 Sput -b with empty directories in the tree can get rcv_error or a socket connection broken message and fail. A workaround was also implemented for a similar problem with the Spcommand where the socket connection timed out with an ETIMEDOUT error.
  • bug 220 Srsync from srb to srb does not work if space character in collection.
  • Bug 220 Allow Srsync from a SRB collection to another SRB collection to work when the collection or dataName contains white spaces.
  • Bug 221 Create empty collections for Bulk Copy.
  • Bug 227 thread failure message appears rarely.
  • Bug 228 drag and drop only method to ingest folders

As always, a number of smaller bugs have also been fixed.

MCAT Patch

A simple patch for upgrading MCAT from version 3.4 to 3.4.1 is also part of the release. It creates two new synonyms/aliases/views in the database.