FAQ

From SRB

(Difference between revisions)
Revision as of 22:15, 15 February 2006
Wayne (Talk | contribs)

Revision as of 00:12, 16 February 2006
Wayne (Talk | contribs)

Line 27: Line 27:
SRB systems to interact with each other and allow for seamless access of data SRB systems to interact with each other and allow for seamless access of data
and metadata across these SRB systems. These systems are called the 'zones. and metadata across these SRB systems. These systems are called the 'zones.
-More information about zoneSRB can be found at:+More information about zoneSRB can be found at: [[FedMcat]] and [[README.zones]].
-[[FedMcat]]+
- and+
-[[README.zones]].+
===What kinds of resources does the SRB support?=== ===What kinds of resources does the SRB support?===
->Storage resources can be directories in Unix file systems,+ 
 +Storage resources can be directories in Unix file systems,
directories in Windows file systems, directories in Windows file systems,
archival storage systems such as HPSS (and, previously, archival storage systems such as HPSS (and, previously,
Line 55: Line 53:
For commercial applications, please For commercial applications, please
contact the UCSD Technology Transfer & Intellectual Property Services contact the UCSD Technology Transfer & Intellectual Property Services
-at invent@ucsd.edu. See <a+at invent@ucsd.edu. See [Is_SRB_Open_Source]
-href=http://www.sdsc.edu/srb/srbOpenSource.html>+
-http://www.sdsc.edu/srb/srbOpenSource.html+
</a> for more information. </a> for more information.
===How does the SRB compare to commercial software?=== ===How does the SRB compare to commercial software?===
 +
As far as we know, there is no commercial product much like As far as we know, there is no commercial product much like
the SRB (except for the commercial version of the SRB, see below). the SRB (except for the commercial version of the SRB, see below).
Line 105: Line 102:
NARA, and the Library of Congress. NARA, and the Library of Congress.
See See
-<a href=http://www.sdsc.edu/srb/Projects/main.html>+[SRB_Projects]
-http://www.sdsc.edu/srb/Projects/main.html</a>.+
-<br>+
c) We support the Globus Grid Security Infrastructure (GSI) as an c) We support the Globus Grid Security Infrastructure (GSI) as an
optional method of authentication. optional method of authentication.
<br> <br>
-d) The <A HREF="#T9">SDSC Matrix</A> workflow management system is+d) The [[SDSC Matrix]] workflow management system is
a grid-based system and uses a Web Service Definition Language (WSDL) a grid-based system and uses a Web Service Definition Language (WSDL)
interface. interface.
Line 163: Line 158:
compute-intensive operation and there will be some unavoidable performance compute-intensive operation and there will be some unavoidable performance
penalty. penalty.
-See <a href=http://www.sdsc.edu/srb/SecureAndOrCompressedData.html>+See [[SecureAndOrCompressedData]]
-http://www.sdsc.edu/srb/SecureAndOrCompressedData.html</a>+
for more information. for more information.
Line 207: Line 201:
We will freely provide answers and provide some limited support to We will freely provide answers and provide some limited support to
-help get sites up and running with the SRB. There is now a+help get sites up and running with the SRB. There is a
-<a href=https://lists.sdsc.edu/mailman/listinfo.cgi/srb-chat>+[https://lists.sdsc.edu/mailman/listinfo.cgi/srb-chat srb-chat email list]
-srb-chat email list</a> for SRB admins, developers and users to discuss+for SRB admins, developers and users to discuss
questions, problems, and solutions (it includes an archive of previous posts). questions, problems, and solutions (it includes an archive of previous posts).
-<a href=http://www.sdsc.edu/srb>Our web site</a> +[http://www.sdsc.edu/srb Our web site]
includes information includes information
on current bugs, future plans, current projects, etc. The SRB tar on current bugs, future plans, current projects, etc. The SRB tar
Line 221: Line 215:
===What operating systems does the SRB run on?=== ===What operating systems does the SRB run on?===
-SRB has been ported to +SRB has been ported to a variety of Unix platforms including Linux,
- a variety of Unix+Mac OS X, AIX (ex. SP-2 machines), Solaris, SunOS, SGI Irix and to
- platforms including Linux, Mac OS X, AIX (ex. SP-2 machines),+Windows. The Windows version of the Server cannot be configured with
- Solaris, SunOS, SGI Irix+an MCAT (so it talks to one that is), but can store and retrieve data
- and to Windows.+from the Windows file system. SRB is easily portable to Unix-type
- The Windows version of the Server cannot+OSes.
- be configured with an MCAT (so it talks to one that is), but can+
- store and retrieve data from the Windows file system.+
- SRB is easily portable to Unix-type OSes.+
===What authentication mechanisms are available for SRB?=== ===What authentication mechanisms are available for SRB?===
-> SRB supports three types of authentication: 1) A basic password-based + 
- authentication, 2) password-based authentication in which the password+SRB supports three types of authentication: 1) A basic password-based
- is used in a challenge-response protocol so no plain-text password+authentication, 2) password-based authentication in which the password
- is sent on the network ("encrypt1"), and 3) GSI authentication.+is used in a challenge-response protocol so no plain-text password is
- Encrypt1 is a simple and secure stand-alone authentication system.+sent on the network ("encrypt1"), and 3) GSI authentication. Encrypt1
-In both password-based systems, user passwords are stored in the MCAT and+is a simple and secure stand-alone authentication system. In both
-users can record their passwords into their ~/.srb/.MdasAuth file to provide+password-based systems, user passwords are stored in the MCAT and
-convenient and reasonably-secure access.+users can record their passwords into their ~/.srb/.MdasAuth file to
- <a href=http://www.globus.org/security/>GSI (Globus Grid +provide convenient and reasonably-secure access. <a
- Security Infrastructure)</a> is convenient when using other Globus+href=http://www.globus.org/security/>GSI (Globus Grid Security
- tools but requires users to acquire Certificates (i.e. a Public+Infrastructure)</a> is convenient when using other Globus tools but
- Key Infrastructure is needed).+requires users to acquire Certificates (i.e. a Public Key
- Previously we also supported SEA authentication (SDSC+Infrastructure is needed). Previously we also supported SEA
- Encryption and Authentication system) but now GSI provides similar+authentication (SDSC Encryption and Authentication system) but now GSI
- functionality.+provides similar functionality.
Line 258: Line 249:
agencies, and have every reason to believe that this will continue agencies, and have every reason to believe that this will continue
long term. See our plans for the near future in our long term. See our plans for the near future in our
-<a href=http://srb.npaci.edu/bugzilla/>+[Bugzilla] system which we use to track bugs and pending features.
-Bugzilla system</a>+
-which we use to track bugs and pending features.+
===Where can I find more information about SRB and related systems?=== ===Where can I find more information about SRB and related systems?===
-<DD> We maintain a set of + 
- web-pages at <a href=http://www.sdsc.edu/srb>http://www.sdsc.edu/srb</a>+We maintain a set of
- where a lot information about the SRB is available. This FAQ also+web-pages at <a href=http://www.sdsc.edu/srb>http://www.sdsc.edu/srb</a>
- contains many links to additional information on specific topics. +where a lot information about the SRB is available. This FAQ also
- There are also many documents included with the release under+contains many links to additional information on specific topics.
- the MCAT and readme.dir directories.+There are also many documents included with the release under
-<p>+the MCAT and readme.dir directories.
 + 
Some general information is also available in Some general information is also available in
-<A HREF=http://en.wikipedia.org/wiki/Storage_resource_broker>+[http://en.wikipedia.org/wiki/Storage_resource_broker>
-wikipedia</A>.+wikipedia].
-<p>+or
 +*http://en.wikipedia.org/wiki/Storage_resource_broker.
 + 
===What kind of query capabilities are available?=== ===What kind of query capabilities are available?===
Line 287: Line 279:
See See
-<a href=http://www.sdsc.edu/srb/howToSrbEnable.html>+[[howToSrbEnable.html]
-http://www.sdsc.edu/srb/howToSrbEnable.html</a> for a list of options+for a list of options
and links to additional information. and links to additional information.
Line 319: Line 311:
</li> </li>
</UL> </UL>
 +
 +==Interfaces and Tools==
 +
 +===What are the Scommands?===
 +
 +Scommands are a set of utility routines that can be used in a
 +Unix shell or Windows DOS command shell to access data and meta data
 +information from SRB and MCAT.
 +For more information on Scommands see README.utilities. Each Scommand
 +also has a man page describing it.
 +One first logs in via Sinit, and can then run Sls, Scd, Sput, Sget, etc.
 +Man pages are available at
 +[http://nbirn.ucsd.edu/ForUsers/Tutorials/SRB/manpagesv20.html]
 +and
 +[http://www.sdsc.edu/srb/srbcommands.html].
 +
 +===What is inQ?===
 +
 +inQ is a graphical SRB client for Windows 98/Me/NT/2k/XP.
 +
 +In a nutshell, inQ provides a familiar file-manager-like interface that SRB
 +users can use to manage their data stored on SRB; actually it's more like a
 +file-manager interface on steroids. inQ looks and acts a lot like Windows
 +Explorer or Nautilus but also throws in features found in several web
 +browsers like Internet Explorer or Netscape Navigator. It offers an easy way
 +to manage metadata and access permissions, as well as a query builder
 +capable of performing nested queries. It also throws in friendly,
 +context-sensitive buttons that show you which actions can be performed on
 +any given item in SRB.
 +
 +For more information, see [inQ].
 +
 +===What is MySRB? ===
 +
 +MySRB is a web-based browse and search interface to the SRB.
 +See the MySRB home page at
 +[http://www.sdsc.edu/srb/mySRB/mySRB.html http://www.sdsc.edu/srb/mySRB/mySRB.html]
 +for more information.
 +
 +===What APIs are available?===
 +
 +The most comprehensive programmatic API is the SRB
 +C library, which can be linked with any application program.
 +We also have a pure Java client
 +library, [Jargon], which contains the most commonly used function calls.
 +Almost all of the
 +C library calls can be accessed through our Python binding.
 +Some sample programs for using the API can be found in
 +the release under test/examples.
 +Also see the API description at
 +[README.clientAPI]
 +(http://www.sdsc.edu/srb/install/README.clientAPI)
 +and
 +the SRB Technical Information page at
 +[http://www.sdsc.edu/srb/srb.html http://www.sdsc.edu/srb/srb.html].
 +
 +===What is the srbBrowser? ===
 +
 +The srbBrowser is a Java-based graphical SRB client. It provides a subset
 +of the functionality of inQ but can be used as a graphical client on Unix
 +systems.
 +
 +
 +===What is the mcatAdmin (Java Admin Tool)?===
 +
 +mcatAdmin (also commonly called the Java Admin Tool) is a Java-based
 +graphical (GUI) SRB-MCAT administration tool. It assists in
 +administration by making clear the available functions (like most
 +GUIs) and presenting available values from which to choose. For
 +example, when adding a new user, the existing domains are listed and
 +the administrator clicks on the domain to use for the new user.
 +When modifying a user, one clicks on a domain and is given a list of
 +the users in that domain to choose from. The GUI includes windows to
 +create, display, and modify zones, users, resources, locations,
 +domains, and other tokens. There are also command-line utilities that
 +perform administrative functions.
 +
 +
 +===What is Jargon (Java API)?===
 +
 +JARGON is a pure Java API for developing SRB (or other) datagrid interfaces.
 +The API currently handles file I/O for
 +local and SRB file systems and is easily extensible to other file
 +systems. File handling with JARGON closely matches file handling in
 +Sun's java.io API, which is familiar to most Java programmers.
 +See [Jargon] for more information.
 +
 +===What is the SDSC Matrix?===
 +
 +SDSC Matrix is a data grid workflow management system. Matrix can be used to
 +create, access and manage workflow process pipelines. Matrix internally uses
 +the Data Grid Language, which can be used to describe, query and control
 +process-flow pipelines.
 +See [Matrix] for more information.
 +
 +The Matrix API can be used to define multiple SRB commands (and non-SRB grid
 +services) as a single dataflow process and execute it on multiple servers.
 +Matrix is available as a (SOAP/WSDL) web service. Matrix client programming
 +for SRB is made simple by a developer-friendly Java API with a gentle
 +learning curve.
 +
 +===Does MCAT functionality vary from one client to another?===
 +
 +All functionality is supported in the Scommand utilities for
 +Unix/Linux/MacOSX and Windows. This is because we do all development on
 +Unix clients, which then get ported to other platforms. MySRB provides a
 +different perspective on metadata management, at both the single-file level
 +and the collection level. It provides a good way of browsing and querying
 +metadata across collections, and also allows for ingesting,
 +extracting, updating and deleting metadata and user annotations for a
 +single SRB object or SRB collection. inQ provides a unique capability
 +where one can associate metadata with SRB objects and collections in an
 +intuitive way and also query across collections and form (temporary)
 +query-collections. This allows one to query based on attribute-metadata,
 +get a collection, and then gradually refine the query to drill down to
 +a sub-collection that is of interest.
 +
 +Hence, each client provides a unique way of handling and managing
 +metadata. One of our goals is to provide uniform functionality across
 +all client interfaces, but this requires a huge amount of programming effort,
 +which we are unable to dedicate at this time.
 +
 +===Is one client better than another for entering metadata?===
 +
 +The Scommands client is very good for entering metadata. As mentioned
 +before, one can use inQ or MySRB for entering/updating the metadata of
 +an individual SRB object or SRB collection. But the Scommands provide
 +for bulk ingestion of metadata for multiple SRB objects, possibly
 +residing in more than one SRB collection.
 +
 +For cutting and pasting, there are utilities in both MySRB and
 +Scommands for copying metadata from one SRB object to another, from one
 +SRB collection to another and from one SRB collection to an SRB
 +object. This is different from cutting and pasting, as it is done
 +internally by the SRB and not in the user's GUI.
 +
 +===Is there a way to load attribute/value pairs from another application into the MCAT?===
 +
 +Yes. SRB allows one to bulk ingest metadata associated with one or
 +more SRB objects. This is done by writing a metadata file in a
 +particular format. Hence, if an application can generate a file in
 +that format or one can write a wrapper which takes the application
 +output and creates the file in the SRB metadataFile format then we can
 +ingest the metadata attribute/value pairs. On Unix-based systems
 +you can do this by writing simple scripts
 +or by piping multiple applications together, with the final pipe going
 +to the SRB Scommand for ingesting metadata.
 +
 +Another unique way of associating metadata with SRB objects is
 +automatic extraction INSIDE the SRB, with the results stored in the MCAT. This is
 +done by writing simple templates (basically rules) that allow one to
 +identify the metadata values in the SRB object, extract them,
 +and store them as attribute-value pairs in the MCAT. We have written
 +templates of this type for multiple file formats including DICOM, FITS, email,
 +NSFAwardAbstracts and HTML files. Extraction can be launched through MySRB
 +or through the Scommands.
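
The rule-based extraction described above can be illustrated with a small sketch. This is not the SRB template syntax, which this FAQ does not show; the `EMAIL_TEMPLATE` rules and the `extract_metadata` helper are hypothetical names chosen only to show how a set of rules can turn file content into attribute-value pairs.

```python
import re

# Hypothetical template: each rule maps an attribute name to a regular
# expression whose first group captures the value. This is an assumption
# made for illustration, not the real SRB template format.
EMAIL_TEMPLATE = {
    "sender": re.compile(r"^From:\s*(.+)$", re.MULTILINE),
    "subject": re.compile(r"^Subject:\s*(.+)$", re.MULTILINE),
    "date": re.compile(r"^Date:\s*(.+)$", re.MULTILINE),
}


def extract_metadata(text, template):
    """Return attribute/value pairs for every rule that matches the text."""
    pairs = {}
    for attribute, pattern in template.items():
        match = pattern.search(text)
        if match:
            pairs[attribute] = match.group(1).strip()
    return pairs


# Sample object content; it has no Date header, so no "date" pair is produced.
message = "From: smith@example.org\nSubject: Survey data\n\nBody text.\n"
metadata = extract_metadata(message, EMAIL_TEMPLATE)
```

In a real deployment the resulting pairs would be stored in the MCAT by the SRB itself; the sketch stops at producing them.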
 +
 +==MetaData Catalog (MCAT)==
 +
 +===What is MCAT?===
 +
 +MCAT, or Meta data Catalog, is a meta data repository system
 +implemented at SDSC to provide a mechanism for storing and querying
 +system-level and domain-dependent meta data using a uniform interface.
 +MCAT provides a resource and data object discovery mechanism that can be
 +effectively used to identify and discover resources and data objects
 +of interest using a combination of their characteristic attributes
 +instead of their physical names and/or locations.
 +
 +===What is meta data?===
 +Meta data is information about data.
 +
 +
 +===What is system-level meta data?===
 +
 +MCAT considers five kinds of entities as primitive objects on which it
 +keeps additional information. These are: data objects, resources,
 +collections, users and methods. The system-level MCAT meta data items
 +are these primitive objects and others derived from these.
 +
 +
 +===Since there are primitive MCAT objects, are there other MCAT objects?===
 +
 +There are many derived MCAT objects. For example, MCAT, in the current
 +release, supports notions of logical resources, compound resources,
 +user groups, etc.
 +
 +
 +===What is application-level meta data?===
 +
 +Application-level meta data are information about data objects that
 +pertain to the non-systemic description of the data objects.
 +Application-level meta data are characterized by information that is
 +particular to the data for that application and are not generalizable
 +across all data objects. For example, location, size, and creation date
 +are systemic, as they are available for every data object,
 +whereas information about how the data object was created and what
 +parameters were used in its creation may not be easily generalized
 +across all data objects and hence form part of application-level meta
 +data. Also, certain applications might have metadata specific to the
 +data object such as FITS metadata used in Astronomy and DICOM metadata
 +for medical images.
 +
 +===What is domain-dependent meta data?===
 +
 +Domain-dependent meta data is another name for application-level meta
 +data.
 +
 +===Does SRB/MCAT support application-level meta data?===
 +
 +Yes, the SRB does support application-level meta data. There are two
 +ways in which the SRB can support application-level meta data: First,
 +as user-defined metadata and second as schema-extended metadata.
 +
 +===What databases can be used for installing MCAT?===
 +
 +MCAT can be installed on Oracle, DB2, Sybase, Postgres, or
 +Informix. SQLServer, since it is so similar to Sybase, should be
 +fairly straightforward to support too.
 +
 +==Administration/Operation==
 +
 +===What do I need to run SRB?===
 +
 +As noted elsewhere, one can have many different setups of SRB. You can
 +get the source code for any of these setups and build your SRB server
 +or client as needed. SRB has been ported to several platforms (see
 +the FAQ question on operating systems) and we recommend that you use one of
 +these. If you port to other platforms, we would be glad to include the port
 +in our subsequent releases. If you are setting up an MCAT-enabled
 +SRB, you will require an Oracle, DB2, Sybase, or Postgres
 +database to which MCAT has been ported. We also recommend having a
 +separate user account called 'srb' (or any variant such as "ucsdsrb")
 +which can be used for setting up, administering and running the
 +system. Once you have the source for SRB and/or MCAT, separate readme
 +files are included to take you through the build, setup and test
 +process.
 +
 +
 +===What are the hardware requirements (disk space, CPU speed, memory) for an SRB Server host?===
 +
 +The hard disk size depends upon how much storage you want to
 +broker. The SRB software system itself requires only about 200 MBytes
 +of storage. For MCAT-enabled servers, the DBMS will require
 +additional space; on Linux, for example, the SRB with Postgres
 +and ODBC together take about 700 MB.
 +
 +We normally recommend 1 to 6 TBytes, depending on the usage. We have
 +specs for a system called the SRB Brick which costs around $15K for 6
 +TBytes (January, 2005).
 +
 +As for CPU speed, any Linux system with a CPU faster than 1.5 GHz should be fine.
 +Memory of 1/2 GB or 1 GB will be sufficient.
 +
 +
 +===Which DB system should I use?===
 +
 +For a large and/or heavy-load instance of SRB, you will probably want
 +to use a commercial DBMS like Oracle. It does have better
 +performance, at least in many cases. It costs money though, and you
 +really should have a DBA to manage it. We also have a project planned
 +for the fall 2005 (with some UK folks) that would make use of some
 +Oracle features (including some stored procedures) that will further
 +enhance performance when using Oracle.
 +
 +PostgreSQL works fine for initial testing (for "getting your feet
 +wet"), and it works fine for light to moderate data loads. It is also
 +relatively easy to install via our install.pl script. It is used in
 +production for some projects (for example, the SIOExplorer project, which
 +takes SRB shipboard for ocean surveys). Some PostgreSQL tuning is
 +available via the 'install.pl vacuum' and 'install.pl index' commands
 +(see install.pl for documentation).
 +
 +For any DBMS, the performance decreases as the size of the MCAT
 +increases.
 +
 +===What is a data object (data set)?===
 +
 +In the terminology of SRB, a data object is a "stream-of-bytes" entity
 +that can be uniquely identified. For example, a file in HPSS or Unix
 +is a data object, or a LOB stored in a SRB Vault database is a data
 +object. Importantly, note that a data object is not a set of data
 +objects/files. Each data object in SRB is given a unique internal
 +identifier by SRB. A data object is associated with a collection (see
 +below). Previously, we used the term "data set" for this, but are
 +phasing it out (as it was often confusing) and instead using "SRB data
 +object".
 +
 +
 +===Who is a registered SRB user?===
 +
 +SRB users are registered in the MCAT catalog and are given unique SRB ids.
 +These identifiers are independent of the location or system ids, such as
 +Unix ids.
 +
 +
 +===What is a method?===
 +
 +In the terminology of SRB, a method is any executable piece of code
 +that is registered in the MCAT catalog.
 +
 +Methods can be defined to operate on data on the server before it is
 +returned to the client. This can be quite efficient in cases where
 +the data object is being reduced by the method, for example when the method
 +selects a subset of the data object based on its inputs, as metadata
 +extractors (FITS, DICOM, etc.) do. Format converters, such as
 +tiff2gif and tex2ps, can also be useful SRB methods.
 +
 +===Who is SRBadmin?===
 +
 +SRBadmin is the person who creates and manages SRB and
 +MCAT systems. A SRBadmin is also a registered SRB user who
 +has additional privileges compared to normal users. A SRBadmin
 +does NOT need to have "root" privilege.
 +
 +
 +===What is a (data object) collection?===
 +
 +A collection is a logical name given to a set of data objects. All
 +data objects stored in SRB/MCAT are stored in some collection. A
 +collection can have sub-collections, and hence provides a hierarchical
 +structure. As a simple analogy, a collection in SRB/MCAT can be
 +equated to a directory in a Unix file system. But unlike a file
 +system, a collection is not limited to a single device (or
 +partition). A collection is logical but the data objects grouped under
 +a collection can be stored in heterogeneous storage devices. There is
 +one obvious restriction: the name given to a data object in a
 +collection or sub-collection must be unique in that collection.
 +
 +===What is the SRB logical name space?===
 +
 +It is easy to think of SRB Collections as Unix directories (or Windows
 +folders), but there is a fundamental difference. Each individual data
 +object (file) in a collection can be stored on a different physical
 +device. Unix directories and Windows folders use space from the
 +physical device on which they reside, but SRB collections are part of
 +a "logical name space" that exists in the MCAT and maps individual
 +data objects (files) to physical files.
 +
 +The logical name space is the set of names of collections
 +(directories) and data objects (files) maintained by the SRB. Users
 +see and interact with the logical name space, and the physical
 +location is handled by the SRB system and administrators. The SRB
 +system adds this logical name space on top of the physical name space,
 +and derives much of its power and functionality from that.
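
As a rough illustration of the idea, the logical name space can be pictured as a catalog that maps each logical name to a physical location. The sketch below is a toy model in Python, not the MCAT schema: the `PhysicalLocation` type, the resource names and all paths are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    resource: str  # hypothetical resource name
    path: str      # hypothetical physical path on that resource


# Toy catalog: logical name -> physical location. Two objects in the same
# collection live on different physical resources.
catalog = {
    "/home/smith.npaci/survey/a.dat": PhysicalLocation("unix-disk", "/vault/01/a.dat"),
    "/home/smith.npaci/survey/b.dat": PhysicalLocation("hpss-archive", "/hpss/srb/77/b.dat"),
}


def list_collection(catalog, collection):
    """List the data objects directly inside a logical collection."""
    prefix = collection.rstrip("/") + "/"
    return sorted(
        name[len(prefix):]
        for name in catalog
        if name.startswith(prefix) and "/" not in name[len(prefix):]
    )


def resolve(catalog, logical_name):
    """Map a logical name to where the bytes are actually stored."""
    return catalog[logical_name]
```

Users work only with the keys of such a catalog; the values can change (for example when data is moved to another device) without the logical names changing.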
 +
 +
 +===What is a resource?===
 +
 +In the terminology of SRB, a resource is a software/hardware system
 +that provides the storage functionalities. The term is equivalent to
 +"physical resource". For example, HPSS can be a resource, as can a
 +Unix file system.
 +
 +
 +===What is a physical SRB resource?===
 +
 +A physical SRB resource is a system that is capable of storing data
 +objects and is accessible to the SRB (see [FAQ:What kinds
 +of resources does the SRB support?]). It is registered in SRB with
 +its physical characteristics such as its physical location, resource
 +type, latency, and maximum file size.
 +
 +
 +===What is a logical SRB resource?===
 +
 +A logical SRB resource is a SRB resource that is derived from physical
 +SRB resources. A logical SRB resource might be derived with further
 +constraints on a registered physical resource or by combining more
 +than one physical resource as an entity. For example, if a physical
 +resource 'A' is defined using a particular directory in HPSS, a
 +logical resource A-bar might be defined as a resource that is restricted
 +to a sub-directory of 'A'.
 +
 +===What is a logical SRB resource set?===
 +
 +A 'logical SRB resource set' is a kind of logical SRB resource. It is
 +defined as a set of physical SRB resources. The aim is to
 +give a unique (logical) name to a set of resources, so that when SRB opens
 +or writes a buffer to the logical resource it opens or writes to every
 +resource in that set. A logical resource containing multiple physical
 +resources can be treated as a 'single' resource when using it.
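
The fan-out behaviour can be sketched in a few lines. The classes below are hypothetical stand-ins written for this illustration (they are not SRB code, and the in-memory storage is an assumption); they only show a single write call reaching every member of the set.

```python
class InMemoryResource:
    """Stand-in for a physical resource; keeps written bytes in a dict."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def write(self, path, data):
        self.files[path] = data


class LogicalResourceSet:
    """One logical name covering several physical resources."""

    def __init__(self, name, members):
        self.name = name
        self.members = list(members)

    def write(self, path, data):
        # A write to the logical resource is applied to every member.
        for resource in self.members:
            resource.write(path, data)
        return len(self.members)


disk = InMemoryResource("disk-a")
archive = InMemoryResource("archive-b")
pair = LogicalResourceSet("disk-plus-archive", [disk, archive])
copies = pair.write("survey/a.dat", b"payload")
```

The caller addresses only `pair`, which is the sense in which the set behaves like a single resource.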
 +
 +===What is a compound SRB resource?===
 +
 +A compound resource allows the SRB to function as a complete (although
 +basic) archival storage system (also known as a hierarchical storage
 +system (HSM)). A compound resource may be configured to contain a
 +pool of cache resources and a tape resource. When a user creates a
 +file using a compound resource, the object created becomes a "compound
 +object". The actual data of a "compound object" may reside on cache or
 +tape or both. Unlike the SRB replica, a "compound object" always
 +appears as a single object even though there may be multiple copies of
 +the data. It is a simple hierarchical system where data migrate
 +automatically between cache and tape. Data is always staged to cache
 +automatically whenever it is accessed and is migrated to tape by the
 +system administrator when more cache space is needed. The cache and
 +tape resources can be distributed across a WAN.
 +
 +===What is a user group?===
 +A user group is a uniquely identifiable name given to a set
 +of SRB registered users.
 +
 +===Who can form a user group?===
 +Any set of mutually agreeable users can form a user group.
 +
 +===Who can register a user group?===
 +SRBadmin has the authority to register user groups.
 +
 +===What is a domain?===
 +
 +A domain is a string used to identify a site or project. Users are
 +uniquely identified by their usernames combined with their domain,
 +for example 'smith@npaci'. SRBadmin has the authority to create domains.
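
A minimal sketch of why the username/domain pair, and not the username alone, is the unique key. The `parse_user` and `register` helpers and the second domain `sdsc` are hypothetical and used only for illustration.

```python
def parse_user(identity):
    """Split 'username@domain' into its two parts."""
    username, separator, domain = identity.partition("@")
    if not separator or not username or not domain:
        raise ValueError(f"expected username@domain, got {identity!r}")
    return username, domain


registered = set()


def register(identity):
    """Record a user; the (username, domain) pair must be unique."""
    key = parse_user(identity)
    if key in registered:
        raise ValueError(f"{identity} is already registered")
    registered.add(key)
    return key


# The same username may exist in two domains without conflict.
first = register("smith@npaci")
second = register("smith@sdsc")
```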
 +
 +
 +===What are tokens?===
 +
 +Tokens are string items stored in the MCAT used as root items when
 +creating other items (resources, etc). We have quite a few predefined
 +tokens. SRBadmin has the authority to create tokens.
 +
 +===What is a replicated data object? ===
 +
 +In SRB, one can make copies of a data object and store the copies in
 +different locations. All these copies in SRB are considered to be
 +identifiable by the same identifier. That is, all copies are considered
 +to be equivalent to one another.
 +
 +===How can one read a replicated data object?===
 +
 +When a user reads a replicated data object, SRB cycles through all the
 +copies of the data object and reads the one that is accessible at that
 +time. It uses a simple replica identification mechanism to order this
 +list of replicated data objects.
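
The read path can be sketched as a loop over the ordered replicas that returns the first one that can be read. This is an illustration only: ordering by an integer replica number is an assumption, and `read_first_accessible` and the reader callables are hypothetical names, not SRB API calls.

```python
def read_first_accessible(replicas):
    """Try replicas in replica-number order and return the first readable one.

    `replicas` is a list of (replica_number, reader) pairs, where reader()
    returns the bytes or raises OSError if its resource is unavailable.
    """
    failures = []
    for number, reader in sorted(replicas, key=lambda pair: pair[0]):
        try:
            return number, reader()
        except OSError as error:
            failures.append((number, str(error)))
    raise OSError(f"no accessible replica: {failures}")


def offline_copy():
    raise OSError("resource unavailable")


def online_copy():
    return b"payload"


# Replica 0 is unreachable, so the read falls through to replica 1.
number, data = read_first_accessible([(1, online_copy), (0, offline_copy)])
```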
 +
 +===How can one create a replicated data object?===
 +
 +There are three ways of creating a replicated data object.
 +In the first method, which can be viewed as asynchronous replication,
 +one can create a data object (using the Sput Scommand or the srbObjCreate API),
 +and then replicate the data object using the Sreplicate Scommand or the
 +srbObjReplicate API. In the second method, which can be viewed as
 +synchronous replication, one can define a 'logical resource set'
 +as a set of resources and then create a data object in that logical
 +resource set (using the Sput Scommand or the srbObjCreate API). SRB
 +automatically replicates the data object as it gets written.
 +
 +In the third method, one can create two data objects separately, off-line,
 +in a physical resource and then register them as replicas of each other. This is called
 +out-of-band replication. SRB also provides the means to replicate
 +collections of data objects recursively.
 +
 +Also see "Replicated Data Management user SRB" (GGF-4, February, 2002) at
 +[http://www.npaci.edu/dice/Pubs/SRBReplication.ppt http://www.npaci.edu/dice/Pubs/SRBReplication.ppt].
 +
 +===What is a Container?===
 +
 +A Container is a way to put together a lot of small files into one
 +larger file to improve performance. This works very well with
 +resources that include tapes (such as HPSS). The whole container
 +is retrieved from tape, cached on SRB disk, and then multiple files
 +can be quickly read and written on the container copy on disk.
 +The SRB handles the book-keeping for the container.
 +
 +===How do Containers work?===
 +
 +The SRB container is like a tarball in the sense that
 +it stores multiple files as one single file. It grows the container on
 +the fly by adding new files as they are ingested into the container.
 +Hence, unlike a tarball, the container can be grown as needed. Also,
 +unlike a tarball, users can read individual files without downloading
 +the container on to their desktops.
 +
 +The SRB keeps all the information about how the container is laid out in
 +its Metadata Catalog (MCAT) and uses it when retrieving individual files.
 +One can also modify and delete files in a container as though performing
 +these operations on normal files, and the SRB takes care of the rest.
 +
 +To answer a related question, the container is not "made" on the
 +desktop and then loaded into the SRB. Instead it is constructed in situ
 +on the resource. Containers are normally
 +assigned a logical resource which has two physical components: an archive
 +resource such as the HPSS or roadnet-sam, and a cache resource
 +such as a Unix file system (e.g. roadnet-unix). All the construction,
 +file access and modifications are done on the cache resource and the
 +storage of a full container or an unneeded container is done on the
 +archive resource.
 +
 +Hence, the archive sees a single file, and the construction is done on the
 +cache resource (not on the user's desktop) before the container reaches the
 +archive; the cache is also a resource controlled by the SRB.
 +
 +Containers grow in size and are pinched off into physical pieces by the
 +SRB, so that a container might look really long but is actually made up of multiple
 +files of smaller sizes. Normally we recommend that these pieces be
 +around 100 MBytes or 200 MBytes, but they can be in the GB range also.
 +This is akin to blocks in a tape system.
 +
 +What this means is that the user sees one container where they "put" in
 +their data, but like a goods train, the container is physically divided.
 +Obviously individual files are much smaller than the container size.
 +To give an example, in one of our collections, we have containers of size
 +around 50 MBytes, storing files of sizes 2 MBytes each. Each container
 +stores about 25 files in its physical blocks.
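 +
 +The arithmetic in this example can be checked with a short sketch. The
 +pack_into_segments function below is hypothetical, and the rule it uses
 +(start a new physical piece when the next file would exceed the size limit)
 +is an assumption; the text above does not say exactly how the SRB decides
 +where to pinch off.

```python
def pack_into_segments(file_sizes_mb, segment_limit_mb):
    """Group file sizes into physical pieces of at most segment_limit_mb.

    Assumed rule (not taken from the SRB source): a new piece is started
    when adding the next file would exceed the limit.
    """
    segments = [[]]
    used = 0
    for size in file_sizes_mb:
        if segments[-1] and used + size > segment_limit_mb:
            segments.append([])  # pinch off and start a new piece
            used = 0
        segments[-1].append(size)
        used += size
    return segments


# 60 files of 2 MBytes each, packed into pieces of about 50 MBytes.
segments = pack_into_segments([2] * 60, 50)
print([len(piece) for piece in segments])
```

 +With 2 MByte files and a 50 MByte limit, each full piece holds 25 files,
 +which matches the figure quoted above.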
 +
 +
 +===Does Sget work properly for files that are in containers?===
 +
 +Yes, Sget works fine for files in containers. The MCAT stores all
 +the file offsets for each file in a container, and Sget will download
 +just the portion of the container that has the file you are trying to
 +download. Since Sget (currently) doesn't support any bulk operations
 +it's still slow trying to download a lot of small files.
 +
 +===How do you discover the container information?===
 +If you're on a Windows machine, InQ is the easiest: the file details
 +show the container information. With the Scommands, SgetD on a file prints
 +a container_name field naming the container (if any) that the file is
 +contained in.
 +
 +===Once I know which container a file is in, what is the most efficient way to download the data?===
 +
 +If you just need a few small files, then running Sget on each will be
 +the quickest. If you want all or most of the container, and you know
 +the container you want to download, then simply running 'Sbunload
 +<container>' will be much faster.
 +
 +===Who can register SRB users?===
 +SRBadmin can add new users to
 +the MCAT catalog.
 +
 +===Who can register physical or logical resources?===
 +SRBadmin has the authority and the required privileged utilities to register
 +physical and logical resources.
 +
 +===How does SRB provide access to remote storage systems?===
 +SRB provides access to remote storage systems through a proxy
 +mechanism. When one stores a data object under SRB, the data object is
 +stored and accessed by SRB acting as a proxy for the user. Because
 +of this mechanism, a user can store data objects on remote storage
 +systems without having personal accounts at these sites. In this mode,
 +SRB acts as a 'system privileged proxy' user. The above proxy mode
 +also allows for SRB to SRB authentication enabling servers to
 +access files that are under the control of another SRB server.
 +
 +===Can multiple SRB servers be federated?===
 +Yes. SRB servers can communicate with other servers and
 +can form a federation. More than one federation can also exist
 +with one SRB federation being unaware of another.
 +A user can access data objects stored under any SRB in the federation
 +provided the user has proper permissions.
 +
 +In 3.0 we released a Federated MCAT capability, where complete
 +MCAT-enabled SRB systems can be integrated with other SRB federations.
 +Each MCAT member of such a federation is called an SRB Zone.
 +
 +===How does one SRB know about another SRB?===
 +
 +A SRB server knows about another SRB through the MCAT. When the
 +SRBAdmin creates a location, a SRB host is specified. When the
 +SRBAdmin ingests a resource at a location, that resource is associated
 +with that SRB host.
 +
 +===What are the different setup configurations that I can have at my site?===
 +First, there are three basic configurations for the SRB/MCAT system:
 + (1) client-only,
 + (2) server without MCAT and
 + (3) server with MCAT.
 +
 +Each of these three setups can be enabled with password,
 +password-encrypt1, and/or GSI authentication.
 +
 +In the simplest configuration, one can use the SRB client components
 +(client utilities, GUI applications, and libraries) at a site and use
 +SRB servers running at remote sites or hosts. A SRB client can connect
 +to a specific (possibly remote) SRB server and access data objects
 +that are under the control of that server and/or other servers in the
 +federation. With the client-only setup one cannot access any data
 +object at the local site through SRB. In the second setup, a site can
 +have a SRB server running locally but without any MCAT service. In
 +this setup, the local SRB server can provide access to local resources
 +and contacts another SRB server that has MCAT service for retrieving
 +the meta data about data objects. In the third configuration, one can
 +have a SRB server and a MCAT database running locally. Any client can
 +talk to any SRB server and need not necessarily talk to a local or
 +'nearest' server.
 +
 +===Is MCAT needed to run SRB?===
 +Yes, an MCAT is needed but you do not need to install one yourself.
 +Many sites use the SDSC MCAT-enabled SRB to support their SRB system.
 +
 +===Where can I get SRB?===
 +Source code and related material for SRB and MCAT can be obtained
 +from the web site at http://www.sdsc.edu/srb. The tar files
 +are PGP encrypted and one can get the passwords for decrypting them by
 +sending email to srb@sdsc.edu.
 +
 +
 +===What is a SRB Vault?===
 +
 +SRB vault is a data repository system that SRB can maintain in any
 +of the storage systems that it can access. For example, the SRB
 +running at SDSC (host: srb.sdsc.edu) runs a SRB vault in its Unix
 +file system, and another SRB running at SDSC (host: hpss47.sdsc.edu)
 +runs SRB vaults in HPSS, a Unix file system and a DB2 database.
 +SRB vaults provide a convenient storage area for storing data objects.
 +A data object stored in a SRB vault is stored as a SRB-written object
 +and its access is controlled through the MCAT catalog. This is
 +different to legacy data objects that can be accessed by SRB but which
 +are still owned by previous owners of the data. One can define SRB
 +vaults in any storage device that can be accessed by a SRB server.
 +In the case of file systems such as Unix and HPSS, a separate
 +directory is used for the purpose, and in the case of databases such as
 +Oracle, DB2 or Illustra, a system-defined table with LOB-space is
 +used for the purpose.
 +
 +===What is a SRB Space?===
 +SRB space is a union of all SRB Vaults that can
 +be accessed by a system of SRB servers. Users registered in the
 +system can store, retrieve and modify data objects (provided owners of
 +the data objects grant appropriate permits) in this space. Hence, one
 +can visualize SRB space as a logical storage volume that is
 +distributed and heterogeneous.
 +
 +
 +===What are the various data object interfaces supported by SRB?===
 +The SRB supports four types of interfaces. The first type is a
 +stream interface. It allows Unix file operations such as open, close,
 +read, write and seek on SRB data objects. The second is an
 +object-level interface. It provides means to
 +create, modify and destroy collections of objects, move, copy
 +and replicate objects, and apply user-defined proxy operations on
 +objects to obtain a new type of the object. The third type is a
 +discovery-level interface for obtaining
 +metadata about data objects (e.g., replication information,
 +ownership, access, location, type information, etc.), resources and
 +users. These operations access the information located in the MCAT
 +catalog. Finally, SRB provides an interface for modifying the
 +data about data objects in SRB, and for performing access
 +control and auditing on various SRB objects.
 +
 +===How do I back up SRB space, both SRB data and MCAT metadata?===
 +
 +It is a good idea to back up the MCAT database daily. If your MCAT
 +DBMS (Oracle, DB2, etc) can do hot backups, then you can do it when
 +the system is being used. Otherwise, you will need to stop the SRB
 +(killsrb), do the cold backup and then restart the SRB.
 +
 +As for the files stored under the SRB, one can do it in multiple ways.
 +The first and easiest is to back up the storage resource directory (for
 +example, the SRBVault directory), as an incremental backup. Depending
 +upon your system, you can do it on the fly or during PMs. Weekly PMs
 +will be helpful.
 +
 +A second strategy is to make sure that there are replicated copies of
 +the file in two distributed storage systems which hopefully don't
 +share any hardware and are geographically separated. This can be done
 +either under user-control (replicate only those that are needed) or
 +under srbAdmin control (possible with 3.0.2 release soon) which will
 +replicate all files that are modified to a particular backup resource.
 +
 +A third strategy is to use zoneSRB: run a backupZone at a
 +remote site and back up to this zone from your zone. We are testing and
 +finalizing some protocols for doing this.
 +
 +===What ports does the SRB use? What ports do I need to open in a firewall to run the SRB?===
 +
 +The firewall needs to open some ports on the server side, and possibly
 +on the client side too.
 +
 +On the SRB server side, you must open the port that the
 +srbMaster is listening on, plus a set of 100 or so configurable
 +data ports (described below).
 +
 +The srbMaster listens on the port defined in srb.h or specified in
 +srbPort environment variable (often set in the runsrb script). By
 +default, srb.h has DefaultPort "5544" but this can be changed via the
 +configure --enable-srbport=value option. Regardless of the
 +DefaultPort value, the srbMaster will listen on the port specified in
 +the srbPort environment variable value if it is defined. You can edit
 +the runsrb script to change this.
 +
 +(The clients also need to know the port number to connect to. For the
 +Scommands they will default to the value in srb.h or will use the
 +number specified in the srbPort line in each user's ~/.MdasEnv file.)
 +
 +
 +To configure a specific range of ports for the servers, include
 +'--enable-commports' on the configure line. Without this, the
 +srbServers will use arbitrary ports.
 +
 +By default, the configurable ports are 20000 to 20199 (see
 +mk/mk.config COMM_PORT_NUM_START and COMM_PORT_NUM_COUNT). So
 +'./configure --enable-commports' will restrict the servers to using
 +ports 20000 to 20199 (plus the DefaultPort). You can change the start
 +and number of the ports via configure options --enable-commstart and
 +--enable-commnum, for example: ./configure --enable-commports
 +--enable-commstart=21000 --enable-commnum=200
 +
 +
 +Based on our experience, we recommend using at least 100 ports, but
 +you may need more or possibly fewer. It depends on how many transfers
 +(especially parallel transfers), you will have going at the same time.
 +There's a MaxThread parameter in runsrb that specifies how many
 +threads to use; 4 is default, but you may wish to increase that. Each
 +of these threads uses a port, so each parallel transfer could take 4
 +(or MaxThread) ports. If the servers run out of ports, the later
 +transfers will fail. So it's a trade-off. If you don't set commnum
 +high enough, and you get quite a few transfers going at the same time,
 +some of them could fail.
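 +
 +As a rough way to size the port range, the trade-off above can be turned into
 +simple arithmetic. The helper names below are hypothetical; the defaults
 +(start 20000, count 200, MaxThread 4) are the values quoted in this answer.
 +The result is only an upper bound on simultaneous parallel transfers, under
 +the assumption that each one uses MaxThread ports and nothing else is using
 +the range.

```python
def comm_port_range(start=20000, count=200):
    """Return the first and last port of the configurable range (inclusive)."""
    return start, start + count - 1


def max_parallel_transfers(comm_port_count, max_thread=4):
    """Upper bound on simultaneous parallel transfers.

    Assumes each parallel transfer occupies max_thread ports and that no
    other connections are consuming ports from the same range.
    """
    return comm_port_count // max_thread


first, last = comm_port_range()
print(first, last)                       # default range
print(max_parallel_transfers(200))       # default range, default MaxThread
print(max_parallel_transfers(100))       # the recommended minimum of 100 ports
```

 +If the expected number of simultaneous parallel transfers is above this
 +bound, either raise --enable-commnum or expect some transfers to fail when
 +the range is exhausted.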
 +
 +
 +On the SRB client side, if you use the Sput or Sget -m option (server
 +initiated connection for parallel I/O), the client's firewall needs to
 +open at least 16 configurable ports. These are the same ports as the
 +server uses, i.e. 20000 to 20199 if just --enable-commports is used.
 +As with the server, the port range can be modified, for example:
 +./configure --enable-commports --enable-commstart=21000
 +--enable-commnum=16
 +
 +But starting with 3.1.1, users can now use the -M option (client
 +initiated connection for parallel I/O) which does not require the
 +opening of ports on the client side.
 +
 +The commports are used for data transfer and are also normally used
 +for non-data connections too. If the server system (the srbMaster
 +server) is prespawning server processes (PRE_SPAWN_CNT is greater than
 +0 in runsrb, which is normally the case), then the server will use one
 +of these ports for its communication connection. The client will
 +connect to the srbMaster on the defaultPort (5544 or whatever) and
 +then will be told which port to connect back to for the particular
 +pre-spawned server that has been assigned.
 +
 +
 +===Does SRB exploit any specific interfaces to HPSS to optimize performance?===
 +
 +Yes, the SRB uses both the sequential POSIX-like interface as well as the
 +parallel mover interface of HPSS for data movement. We have a set of SRB
 +APIs for both types of interfaces. In addition, all our data
 +movement utilities have a switch that allows users to choose between these
 +two modes.
 +
 +===What kind of ingest rates can SRB get?===
 +
 +In terms of number of files per second, for files stored in HPSS, it
 +is limited by the ingestion rate of HPSS (we are still using the
 +ENCINA based HPSS). The SRB bulk load mechanism can do 30-50 files
 +per second.
 +
 +For I/O rate, it pretty much depends on the hardware and network.
 +We can get up to 50 Mb/sec with 3-4 mover threads while the rate
 +from HSI is about 30-40% higher because data does not have to go
 +through any SRB server.
 +
 +===How well does SRB handle packaging small data packets into "bundles", and store?===