Release Notes 3.4


This document describes changes for SRB 3.4, released October 31, 2005.

Please note that 3.4 servers can be used with 3.3 clients, but 3.3 servers are not fully compatible with 3.4 clients, because we have added new client APIs such as srbExecCommandC() and new options such as bulk registration into a compound resource. The 3.4 server remains backward compatible with the 3.3 client protocol.

Major new features:

Master/Slave MCAT

An SRB federation (zone) can be configured to run with a single Master MCAT plus zero or more Slave MCATs. The purpose of the Slave MCATs is to improve responsiveness across a wide-area network. The Slave MCATs are used for read-only queries. The following Scommands have been converted to use the Slave MCAT by default:

Scat, SgetColl, Sget, SgetD, SgetR, SgetU, Sls, Slscont and Stoken.

The -X option or the environment variable "masterMcat" can be used to force the query to be run on the Master MCAT instead. Please read the readme file in readme.dir/README.MasterSlaveMcat for more information.
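The routing rule above (listed read-only commands go to a Slave MCAT unless -X or "masterMcat" forces the Master) can be sketched as follows. This is a hypothetical illustration, not SRB source: the function and type names are invented, and it assumes that any non-empty value of "masterMcat" forces the Master MCAT and that commands not on the list always use the Master MCAT.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not SRB code: pick the MCAT a query should use. */
typedef enum { MCAT_SLAVE, MCAT_MASTER } mcat_target;

/* Scommands that the 3.4 release notes list as using the Slave MCAT by default. */
static const char *const slave_default_cmds[] = {
    "Scat", "SgetColl", "Sget", "SgetD", "SgetR",
    "SgetU", "Sls", "Slscont", "Stoken"
};

/* x_flag: nonzero if -X was given.
 * master_env: value of the "masterMcat" environment variable, or NULL.
 * Assumption: any non-empty value forces the Master MCAT. */
mcat_target choose_mcat(const char *cmd, int x_flag, const char *master_env)
{
    if (x_flag || (master_env != NULL && master_env[0] != '\0'))
        return MCAT_MASTER;

    for (size_t i = 0; i < sizeof slave_default_cmds / sizeof slave_default_cmds[0]; i++)
        if (strcmp(cmd, slave_default_cmds[i]) == 0)
            return MCAT_SLAVE;   /* read-only command: Slave MCAT by default */

    return MCAT_MASTER;          /* assumption: everything else uses the Master */
}
```

A caller would typically pass getenv("masterMcat") (from <stdlib.h>) as the third argument, so the environment variable and the -X option share one decision point.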

A prototype integration of HDF5 and SRB

HDF5, developed by NCSA, is a general-purpose library and file format for storing scientific data. In the past year, NCSA and DICE collaborated to provide efficient access to objects in HDF5 files stored in the SRB. A prototype was developed that demonstrates the feasibility of this approach and shows that significant performance gains can be achieved for clients that need to access only parts of a file, such as individual objects, subsets of large arrays, or metadata.

The integration was carried out by incorporating the HDF5 library into the server, so that HDF5 functions run directly where the data is stored. A set of HDF5-specific thin-client APIs is provided for accessing HDF5 data stored in SRB. A user guide is included in the release, in readme.dir/HDF-SRB-UG.pdf.

SRB GridStatus, a monitoring and alert system

GridStatus can monitor one or more SRB grids. All you need to do is add connection parameters for one SRB account per grid, and it does the rest: it discovers all of the hosts and resources that are part of the grid and begins monitoring them. It sends email alerts if something goes down or if a resource goes above 90% utilization, and it can optionally store all downtime information in a MySQL database. Package: admin/GridStatus

More extensive pre-release (QA) testing

The 3.4 release has gone through much more rigorous Quality Assurance testing than previous releases, due in large part to a number of extensions to our automatic testing environment and scripts.

We wish to thank Adil Hasan of the UK E-Science Data Management group for providing a set of Python Scommand test scripts (see our contributed software page), which are now run as part of this testing.

A large set of tests is run continually on four hosts under our Tinderbox system (available from the SRB home page), which now includes a Solaris system and a configuration of two cooperating hosts that create and use network-based resources. Also see Bugzilla item 91 below.

Most of the other new features and bug fixes are tracked as Bugzilla items; the item number is listed at the beginning of each entry below. Please check the SRB Bugzilla system for more information.

New features and bug fixes:

44 - SPCommand / Proxy cannot be used across a firewall: client-initiated proxy command and API. To get around firewall issues on the client side, a "-c" option has been added to the Spcommand command for a client-initiated connection; the default remains a server-initiated connection. A new API, srbExecCommandC(), has been added for issuing client-initiated proxy operations.

63 - Disconnect from the database when a spawned srbServer is idle.

65 - handle port scans better.

72 - Rare and intermittent Sput -b data corruption on a Mac

73 - Rare and intermittent Sput -b file loss

75 - SgetR crashes with: glibc detected free(): invalid pointer:

78 - Have Sinit prompt for the password instead of reading it from .MdasAuth. If no .MdasAuth or .srbAuthFile file is available, Sinit prompts for the user's password and creates a temporary scrambled password file (as Sauth does), which Sexit then deletes; Sexit removes such a file only if it is temporary.

80 - srbLog should not be purged but stored by date. The srbLog files will now be stored in data/log/ where: mm = month, dd = day, yy = year. The "logfileInt" parameter in bin/runsrb script can be used to specify the interval in days for switching to a new logfile. The default interval is 5 days.
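The interval-based switching controlled by "logfileInt" can be sketched as a simple elapsed-time check. This is a hypothetical illustration rather than the runsrb or srbServer logic: the function name is invented, and treating a non-positive interval as the documented 5-day default is an assumption made for the sketch. It only decides when to switch; it does not construct the date-stamped file name.

```c
#include <time.h>

/* Hypothetical sketch, not SRB code. */
#define SECONDS_PER_DAY 86400L
#define DEFAULT_LOGFILE_INT 5   /* documented default interval, in days */

/* Returns 1 when at least logfile_int days have elapsed since the current
 * log file was opened, i.e. it is time to start a new date-stamped file. */
int should_switch_log(time_t opened_at, time_t now, int logfile_int)
{
    if (logfile_int <= 0)
        logfile_int = DEFAULT_LOGFILE_INT;   /* assumption: fall back to default */
    return difftime(now, opened_at) >= (double)logfile_int * SECONDS_PER_DAY;
}
```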

81 - SRB web-page, broken link

85 - Schmod -i seems to require -r

86 - Schmod -i -r doesn't actually recurse

91 - Need a much more extensive set of test (QA) scripts. Many extensions were made to our automatic testing system, including the integration of an additional set of tests provided by Adil Hasan of the UK E-Science Data Management group (see our contributed software page).

102 - add a function - Change resourceName


105 - problem with accentuated letter. Collection names and data names containing single-quote characters are now handled. Getting this to work for all SRB operations required a lot of debugging (more than 3 weeks). Single quotes in user-defined metadata are double-quoted before insertion, and the same is done for queries. This work also resolved bug 148.
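Assuming "double-quoted" here means that each single quote is written twice (the standard way to embed a quote inside an SQL string literal, so O'Brien becomes O''Brien), the transformation looks like the minimal sketch below. The function name is hypothetical and the actual SRB code may differ; where a DBMS interface supports bound parameters, they avoid the need for this escaping altogether.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not SRB code.
 * Returns a newly allocated copy of s in which every single quote is
 * doubled. The caller frees the result; returns NULL if malloc fails. */
char *sql_double_quotes(const char *s)
{
    size_t len = strlen(s), quotes = 0;
    for (const char *p = s; *p; p++)
        if (*p == '\'')
            quotes++;

    char *out = malloc(len + quotes + 1);   /* one extra byte per quote */
    if (out == NULL)
        return NULL;

    char *q = out;
    for (const char *p = s; *p; p++) {
        *q++ = *p;
        if (*p == '\'')
            *q++ = '\'';                    /* emit the quote a second time */
    }
    *q = '\0';
    return out;
}
```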

106 - Sufmeta -R -c option returns COLLECTION_NOT_IN_CAT

108 - proxy command arguments cannot contain blanks

111 - After a while, SRBServers fail trying to open mcatHost

112 - jargon file transfer problems

113 - should detect, explain, and quit on 64-bit hosts

115 - perl rindex problem on some hosts causing failure

116 - Build assumes GNU make is installed as "gmake". The build system now uses gmake if it is available and 'make' otherwise (assuming that 'make' is GNU make).

121 - Java Admin tool should handle more SmodR functions. A new set of miscellaneous resource operations has been added.

125 - Sget needs a better error message when the local directory is full.

127 - Configuration check for Globus flavors is arbitrary

128 - Build failure, PPC MacOS X, when using GSI

129 - Physical Move fails for non-admin users

130 - Build failure, AIX (libtool problem?)

135 - In link commands, use "-L/dir", not "-L /dir"

136 - Sls -R "DATA_CHESUM='0'" does not work

137 - Array indexing error in clStub.c

138 - srbMaster crashes on startup

139 - jargon intermittent bulkload error on Linux

141 - auth_scheme should be written, esp if env var mdasEnvFile set

143 - GSI-enabled server gets inconsistent user_id from Sput/Sget. This could result in a NO_ACCESS error.

145 - logEval.c does not build on latest OS X (8.0.0)

146 - MCAT not handling bulk load properly if a file already exists

147 - annotations of previous same-name collections reappear

148 - Apostrophes in metadata

149 - Add ability to modify resource/location netprefix

152 - Add a new input parameter, newPathName, to srbObjCopy

153 - SgetD misleading answer

154/155 - Sbload command problems in SRB 3.3.1 (Sbload segmentation fault). The mcatHost parameter needed to be initialized before being used.

156 - Add federation(s) to attributes kept by the ZoneAuthority

158 - SgetR -l no longer works

159 - problem deleting DN strings for a given user

161 - SmodColl -d and then -c failure; perhaps OS X specific

162 - Srsync failed if user only has read permission

163 - mcatAdmin, refreshed windows will revert if closed and reopened

164 - add srbPort to

165 - Scommands fail to build on some Macs. The build stopped with a fatal error message in commExtern.h. We later found the same failure on other systems (Linux), and this fix should correct it there as well.

168 - SERVER_DN is not read when mdasEnvFile is used

169 - Rare and intermittent Sphymove problem on a Linux host

170 - mcatAdmin.jar hang

172 - Sphymove -P option does not seem to work; should it be removed? The problem was that the input path was not passed along.

173 - Sput -b with invalid directoryname segfaults

174 - Ssh dies with segmentation fault

176 - Apostrophe in filename breaks download on SRB 3.3.1

180 - Sauth scrambled password failures

184 - Scp -b -r does not work on OS X using CVS version

185 - Compilation failure under gcc 4 (Fedora Core 4)

186 - Very slow operations with Oracle. Srsync, "Sget -b", Sphymove, and "Scp -b" operations could become very slow, taking up to 10 minutes, when Oracle was used for the MCAT. The problem was traced to an Oracle bug in which the ESCAPE character can make a query extremely slow. Oracle has fixed this bug, but SRB now also works around it by not using the ESCAPE character in these queries. This problem does not exist for other DBMSes.

187 - Sreplicate broken pipe to server. Fix a couple of obscure problems that could cause server-to-server connections to fail. One is specific to certain versions of Linux; the other occurred when servers were restarted. For the Linux problem, the code now uses both gethostbyname_r (reentrant) and gethostbyname, and checks for the type of failure (which depends on the Linux version) that each can return.
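A resolver that tries gethostbyname_r first and falls back to gethostbyname can be sketched as follows. This is an illustration under stated assumptions, not the SRB fix itself: resolve_ipv4 is a hypothetical name, the "reentrant first, plain call as fallback" ordering is assumed, and the six-argument gethostbyname_r used here is the glibc (Linux) form; Solaris, for example, has a different five-argument version that returns a struct hostent pointer.

```c
#define _DEFAULT_SOURCE          /* expose gethostbyname_r on glibc */
#include <errno.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical sketch, not SRB code. Resolves an IPv4 host name into *addr.
 * Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *name, struct in_addr *addr)
{
    struct hostent he, *result = NULL;
    int herr = 0, rc;
    size_t buflen = 1024;
    char *buf = malloc(buflen);

    if (buf == NULL)
        return -1;
    for (;;) {
        rc = gethostbyname_r(name, &he, buf, buflen, &result, &herr);
        if (rc != ERANGE || buflen >= 65536)
            break;                         /* done, or give up growing */
        char *bigger = realloc(buf, buflen * 2);
        if (bigger == NULL) {
            free(buf);
            return -1;
        }
        buf = bigger;
        buflen *= 2;                       /* scratch buffer was too small */
    }
    if (rc == 0 && result != NULL && result->h_addrtype == AF_INET &&
        result->h_addr_list[0] != NULL) {
        memcpy(addr, result->h_addr_list[0], sizeof *addr);
        free(buf);
        return 0;
    }
    free(buf);

    /* Fallback: the non-reentrant call (not thread-safe). */
    struct hostent *hp = gethostbyname(name);
    if (hp == NULL || hp->h_addrtype != AF_INET || hp->h_addr_list[0] == NULL)
        return -1;
    memcpy(addr, hp->h_addr_list[0], sizeof *addr);
    return 0;
}
```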

188 - Srsync -a issues. There were some cases where Srsync -a would not update all replicas properly.

189 - lowLevelClose error from 9

na - File sizes of containers and of files in containers. The maximum size of a container has been increased from 2 Gbytes to 200 Gbytes. The maximum size of a file that can be stored in a container has also been increased from 2 Gbytes to 200 Gbytes.
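The notes do not say where the old 2 Gbyte limit came from. One common source of that exact figure is a signed 32-bit size or offset field, because 2 Gbytes (2^31 bytes) is one more than INT32_MAX. Assuming binary gigabytes and that kind of constraint, the arithmetic below shows why 200 Gbytes requires a 64-bit quantity. The helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch: does a byte count fit in a signed 32-bit field? */
#define GBYTE ((int64_t)1 << 30)   /* assumption: 1 Gbyte = 2^30 bytes */

int fits_in_int32(int64_t size)
{
    return size >= 0 && size <= INT32_MAX;
}

/* 2 * GBYTE   = 2147483648   = INT32_MAX + 1 -> does not fit
 * 200 * GBYTE = 214748364800               -> needs 64 bits */
```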

na - Bulk Scp into container. "Scp -bc container" now works and can be used to recursively copy a whole collection into a container.

na - Sbkupsrb - added -v option for verbose mode.

na - Sbkupsrb and Srsync now continue operating even if one or more errors occur.

na - Sregister - added "-C" option for registering files into compound resource.

na - Sbregister - Allow bulk registration into compound resource with the "-f" option.

na - srbBulkUnload() has been reworked so that it downloads inContainer files in addition to normal files, so a separate download for inContainer files is no longer needed.

na - Fix a problem with "Sput -n" where the copy number is ignored when it is used with the m/M option.

na - Fix a problem where Scp did not work properly when the source files have replicas.

na - Fix a problem where the "Sget -n" option could download the wrong copy.

na - Fix a problem where Sget could incorrectly conclude that the downloaded size was wrong, reporting an OBJ_ERR_COPY_LEN error, when the source file has multiple copies of different sizes.

na - Resource access permission was not checked when the request came from a foreign zone. The problem has been fixed.

na - Include the value of errno in the Sget error message to make it easier to identify the cause of some problems (bug 125). One example is when the local file system runs out of space.

na - Spcommands now accept arguments that contain space characters.

na - When overwriting SRB files, set the size to zero before the overwrite, so that if the write fails partway through, the registered file size is zero instead of an undetermined value.
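The ordering that this change relies on (register a size of zero first, record the real size only after a successful write) can be sketched with a hypothetical catalog record. None of these names are SRB structures or APIs; the sketch only shows why a failed write leaves a size of zero rather than a stale or partial value.

```c
/* Hypothetical sketch, not SRB code. */
struct catalog_entry {
    long long size;   /* registered size in bytes */
};

/* Step 1: before overwriting, register the size as zero. */
void begin_overwrite(struct catalog_entry *ent)
{
    ent->size = 0;
}

/* Step 2: after the write, record the new size only if it succeeded.
 * write_status is 0 on success; on failure the size stays at zero. */
int finish_overwrite(struct catalog_entry *ent, long long written, int write_status)
{
    if (write_status != 0)
        return -1;
    ent->size = written;
    return 0;
}
```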

na - Remove the size verification after file transfer for ADS-type files, because ADS does not support the stat() call.

na - Fix a problem where Sput/Srsync printed a bogus COLLECTION_NOT_IN_CAT error when the collection does not exist.

na - Fix a problem with Sphymove of a single file into a container.