FAQ
From SRB
(Difference between revisions)
| Revision as of 22:15, 15 February 2006 Wayne (Talk | contribs) ← Previous diff |
Revision as of 00:12, 16 February 2006 Wayne (Talk | contribs) Next diff → |
||
| Line 27: | Line 27: | ||
| SRB systems to interact with each other and allow for seamless access of data | SRB systems to interact with each other and allow for seamless access of data | ||
| and metadata across these SRB systems. These systems are called 'zones'. | and metadata across these SRB systems. These systems are called 'zones'. | ||
| - | More information about zoneSRB can be found at: | + | More information about zoneSRB can be found at: [[FedMcat]] and [[README.zones]]. |
| - | [[FedMcat]] | + | |
| - | and | + | |
| - | [[README.zones]]. | + | |
| ===What kinds of resources does the SRB support?=== | ===What kinds of resources does the SRB support?=== | ||
| - | >Storage resources can be directories in Unix file systems, | + | |
| + | Storage resources can be directories in Unix file systems, | ||
| directories in Windows file systems, | directories in Windows file systems, | ||
| archival storage systems such as HPSS (and, previously, | archival storage systems such as HPSS (and, previously, | ||
| Line 55: | Line 53: | ||
| For commercial applications, please | For commercial applications, please | ||
| contact the UCSD Technology Transfer & Intellectual Property Services | contact the UCSD Technology Transfer & Intellectual Property Services | ||
| - | at invent@ucsd.edu. See <a | + | at invent@ucsd.edu. See [Is_SRB_Open_Source] |
| - | href=http://www.sdsc.edu/srb/srbOpenSource.html> | + | |
| - | http://www.sdsc.edu/srb/srbOpenSource.html | + | |
| </a> for more information. | </a> for more information. | ||
| ===How does the SRB compare to commercial software?=== | ===How does the SRB compare to commercial software?=== | ||
| + | |||
| As far as we know, there is no commercial product much like | As far as we know, there is no commercial product much like | ||
| the SRB (except for the commercial version of the SRB, see below). | the SRB (except for the commercial version of the SRB, see below). | ||
| Line 105: | Line 102: | ||
| NARA, and the Library of Congress. | NARA, and the Library of Congress. | ||
| See | See | ||
| - | <a href=http://www.sdsc.edu/srb/Projects/main.html> | + | [SRB_Projects] |
| - | http://www.sdsc.edu/srb/Projects/main.html</a>. | + | |
| - | <br> | + | |
| c) We support the Globus Grid Security Infrastructure (GSI) as an | c) We support the Globus Grid Security Infrastructure (GSI) as an | ||
| optional method of authentication. | optional method of authentication. | ||
| <br> | <br> | ||
| - | d) The <A HREF="#T9">SDSC Matrix</A> workflow management system is | + | d) The [[SDSC Matrix]] workflow management system is |
| a grid-based system and uses a Web Service Definition Language (WSDL) | a grid-based system and uses a Web Service Definition Language (WSDL) | ||
| interface. | interface. | ||
| Line 163: | Line 158: | ||
| compute-intensive operation and there will be some unavoidable performance | compute-intensive operation and there will be some unavoidable performance | ||
| penalty. | penalty. | ||
| - | See <a href=http://www.sdsc.edu/srb/SecureAndOrCompressedData.html> | + | See [[SecureAndOrCompressedData]] |
| - | http://www.sdsc.edu/srb/SecureAndOrCompressedData.html</a> | + | |
| for more information. | for more information. | ||
| Line 207: | Line 201: | ||
| We will freely provide answers and provide some limited support to | We will freely provide answers and provide some limited support to | ||
| - | help get sites up and running with the SRB. There is now a | + | help get sites up and running with the SRB. There is a |
| - | <a href=https://lists.sdsc.edu/mailman/listinfo.cgi/srb-chat> | + | [https://lists.sdsc.edu/mailman/listinfo.cgi/srb-chat srb-chat email list] |
| - | srb-chat email list</a> for SRB admins, developers and users to discuss | + | for SRB admins, developers and users to discuss |
| questions, problems, and solutions (it includes an archive of previous posts). | questions, problems, and solutions (it includes an archive of previous posts). | ||
| - | <a href=http://www.sdsc.edu/srb>Our web site</a> | + | [http://www.sdsc.edu/srb Our web site] |
| includes information | includes information | ||
| on current bugs, future plans, current projects, etc. The SRB tar | on current bugs, future plans, current projects, etc. The SRB tar | ||
| Line 221: | Line 215: | ||
| ===What operating systems does the SRB run on?=== | ===What operating systems does the SRB run on?=== | ||
| - | SRB has been ported to | + | SRB has been ported to a variety of Unix platforms including Linux, |
| - | a variety of Unix | + | Mac OS X, AIX (e.g., SP-2 machines), Solaris, SunOS, SGI Irix and to |
| - | platforms including Linux, Mac OS X, AIX (ex. SP-2 machines), | + | Windows. The Windows version of the Server cannot be configured with |
| - | Solaris, SunOS, SGI Irix | + | an MCAT (so it talks to one that is), but can store and retrieve data |
| - | and to Windows. | + | from the Windows file system. SRB is easily portable to Unix-type |
| - | The Windows version of the Server cannot | + | OSes. |
| - | be configured with an MCAT (so it talks to one that is), but can | + | |
| - | store and retrieve data from the Windows file system. | + | |
| - | SRB is easily portable to Unix-type OSes. | + | |
| ===What authentication mechanisms are available for SRB?=== | ===What authentication mechanisms are available for SRB?=== | ||
| - | > SRB supports three types of authentication: 1) A basic password-based | + | |
| - | authentication, 2) password-based authentication in which the password | + | SRB supports three types of authentication: 1) A basic password-based |
| - | is used in a challenge-response protocol so no plain-text password | + | authentication, 2) password-based authentication in which the password |
| - | is sent on the network ("encrypt1"), and 3) GSI authentication. | + | is used in a challenge-response protocol so no plain-text password is |
| - | Encrypt1 is a simple and secure stand-alone authentication system. | + | sent on the network ("encrypt1"), and 3) GSI authentication. Encrypt1 |
| - | In both password-based systems, user passwords are stored in the MCAT and | + | is a simple and secure stand-alone authentication system. In both |
| - | users can record their passwords into their ~/.srb/.MdasAuth file to provide | + | password-based systems, user passwords are stored in the MCAT and |
| - | convenient and reasonably-secure access. | + | users can record their passwords into their ~/.srb/.MdasAuth file to |
| - | <a href=http://www.globus.org/security/>GSI (Globus Grid | + | provide convenient and reasonably-secure access. |
| - | Security Infrastructure)</a> is convenient when using other Globus | + | [http://www.globus.org/security/ GSI (Globus Grid Security Infrastructure)] |
| - | tools but requires users to acquire Certificates (i.e. a Public | + | is convenient when using other Globus tools but |
| - | Key Infrastructure is needed). | + | requires users to acquire Certificates (i.e. a Public Key |
| - | Previously we also supported SEA authentication (SDSC | + | Infrastructure is needed). Previously we also supported SEA |
| - | Encryption and Authentication system) but now GSI provides similar | + | authentication (SDSC Encryption and Authentication system) but now GSI |
| - | functionality. | + | provides similar functionality. |
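The challenge-response idea behind the second mechanism can be illustrated with a minimal sketch. This is not the actual encrypt1 algorithm, whose details are not given in this FAQ; the HMAC-SHA256 construction and all function names below are assumptions chosen only to show why the plain-text password never has to cross the network.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: generate a random, single-use challenge."""
    return secrets.token_bytes(32)


def compute_response(password: str, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password.encode("utf-8"), challenge, hashlib.sha256).digest()


def verify_response(stored_password: str, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    expected = compute_response(stored_password, challenge)
    return hmac.compare_digest(expected, response)
```

In this sketch the server sends `make_challenge()` to the client, the client returns `compute_response(password, challenge)`, and the server checks it with `verify_response`. Only the challenge and the keyed digest are transmitted.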
| Line 258: | Line 249: | ||
| agencies, and have every reason to believe that this will continue | agencies, and have every reason to believe that this will continue | ||
| long term. See our plans for the near future in our | long term. See our plans for the near future in our | ||
| - | <a href=http://srb.npaci.edu/bugzilla/> | + | [Bugzilla] system which we use to track bugs and pending features. |
| - | Bugzilla system</a> | + | |
| - | which we use to track bugs and pending features. | + | |
| ===Where can I find more information about SRB and related systems?=== | ===Where can I find more information about SRB and related systems?=== | ||
| - | <DD> We maintain a set of | + | |
| - | web-pages at <a href=http://www.sdsc.edu/srb>http://www.sdsc.edu/srb</a> | + | We maintain a set of |
| - | where a lot information about the SRB is available. This FAQ also | + | web pages at http://www.sdsc.edu/srb |
| - | contains many links to additional information on specific topics. | + | where a lot of information about the SRB is available. This FAQ also |
| - | There are also many documents included with the release under | + | contains many links to additional information on specific topics. |
| - | the MCAT and readme.dir directories. | + | There are also many documents included with the release under |
| - | <p> | + | the MCAT and readme.dir directories. |
| + | |||
| Some general information is also available in | Some general information is also available in | ||
| - | <A HREF=http://en.wikipedia.org/wiki/Storage_resource_broker> | + | [http://en.wikipedia.org/wiki/Storage_resource_broker |
| - | wikipedia</A>. | + | Wikipedia] |
| - | <p> | + | or |
| + | *http://en.wikipedia.org/wiki/Storage_resource_broker. | ||
| + | |||
| ===What kind of query capabilities are available?=== | ===What kind of query capabilities are available?=== | ||
| Line 287: | Line 279: | ||
| See | See | ||
| - | <a href=http://www.sdsc.edu/srb/howToSrbEnable.html> | + | [[howToSrbEnable.html]] |
| - | http://www.sdsc.edu/srb/howToSrbEnable.html</a> for a list of options | + | for a list of options |
| and links to additional information. | and links to additional information. | ||
| Line 319: | Line 311: | ||
| </li> | </li> | ||
| </UL> | </UL> | ||
| + | |||
| + | ==Interfaces and Tools== | ||
| + | |||
| + | ===What are the Scommands?=== | ||
| + | |||
| + | Scommands are a set of command-line utilities that can be run in a | ||
| + | Unix shell or Windows DOS command shell to access data and meta data | ||
| + | information from SRB and MCAT. | ||
| + | For more information on Scommands see README.utilities. Scommands | ||
| + | also have a set of man pages describing each of the commands. | ||
| + | One first logs in via Sinit, and can then do Sls, Scd, Sput, Sget, etc. | ||
| + | Man pages are available at | ||
| + | [http://nbirn.ucsd.edu/ForUsers/Tutorials/SRB/manpagesv20.html] | ||
| + | and | ||
| + | [http://www.sdsc.edu/srb/srbcommands.html]. | ||
| + | |||
| + | ===What is inQ?=== | ||
| + | |||
| + | inQ is a graphical SRB client for Windows 98/Me/NT/2k/XP. | ||
| + | |||
| + | In a nutshell, inQ provides a familiar file-manager-like interface that SRB | ||
| + | users can use to manage their data stored on SRB; actually it's more like a | ||
| + | file-manager interface on steroids. inQ looks and acts a lot like Windows | ||
| + | Explorer or Nautilus but also throws in features found in several web | ||
| + | browsers like Internet Explorer or Netscape Navigator. It offers an easy way | ||
| + | to manage metadata and access permissions, as well as a query builder | ||
| + | capable of performing nested queries. It also throws in friendly, | ||
| + | context-sensitive buttons that show you which actions can be performed on | ||
| + | any given item in SRB. | ||
| + | |||
| + | For more | ||
| + | information, see [inQ]. | ||
| + | |||
| + | ===What is MySRB? === | ||
| + | |||
| + | MySRB is a web-based browse and search interface to the SRB. | ||
| + | See the MySRB home page at | ||
| + | http://www.sdsc.edu/srb/mySRB/mySRB.html | ||
| + | for more information. | ||
| + | |||
| + | ===What APIs are available?=== | ||
| + | |||
| + | The most comprehensive programmatic API is the SRB | ||
| + | C library which can be linked with any application program. | ||
| + | We also have a pure Java client | ||
| + | library, [Jargon], which contains the most commonly used function calls. | ||
| + | Almost all of the | ||
| + | C library calls can be accessed through our Python binding. | ||
| + | Some sample programs for using the API can be found in | ||
| + | the release under test/examples. | ||
| + | Also see the API description at | ||
| + | [README.clientAPI] | ||
| + | (http://www.sdsc.edu/srb/install/README.clientAPI) | ||
| + | and | ||
| + | the SRB Technical Information page at | ||
| + | http://www.sdsc.edu/srb/srb.html. | ||
| + | |||
| + | ===What is the srbBrowser? === | ||
| + | |||
| + | The srbBrowser is a Java-based graphical SRB client. It provides a subset | ||
| + | of the functionality of inQ but can be used as a graphical client on Unix | ||
| + | systems. | ||
| + | |||
| + | |||
| + | ===What is the mcatAdmin (Java Admin Tool) ? === | ||
| + | |||
| + | mcatAdmin (also commonly called the Java Admin Tool) is a Java-based | ||
| + | graphical (GUI) srb-mcat administration tool. It assists in | ||
| + | administration by making clear the available functions (like most | ||
| + | GUIs) and presenting available values from which to choose. For | ||
| + | example, when adding a new user, the existing domains are listed and | ||
| + | the administrator clicks on the domain to use for the new user. And | ||
| + | when modifying a user, one clicks on a domain and is given a list of | ||
| + | the users in that domain to choose from. The GUI includes windows to | ||
| + | create, display, and modify zones, users, resources, locations, | ||
| + | domains, and other tokens. There are also command-line utilities that | ||
| + | perform administrative functions. | ||
| + | |||
| + | |||
| + | ===What is Jargon (Java API)?=== | ||
| + | |||
| + | JARGON is a pure Java API for developing SRB (or other) datagrid interfaces. | ||
| + | The API currently handles file I/O for | ||
| + | local and SRB file systems and is easily extensible to other file | ||
| + | systems. File handling with JARGON closely matches file handling in | ||
| + | Sun's java.io API, an API familiar to most Java programmers. | ||
| + | [Jargon] | ||
| + | |||
| + | ===What is the SDSC Matrix?=== | ||
| + | |||
| + | SDSC Matrix is a data grid workflow management system. Matrix can be used to | ||
| + | create, access and manage workflow process pipelines. Matrix internally uses | ||
| + | the Data Grid Language, which can be used to describe, query and control | ||
| + | process-flow pipelines. | ||
| + | See [Matrix] for more information. | ||
| + | |||
| + | The Matrix API can be used to define multiple SRB commands (and non-SRB grid | ||
| + | services) as a single dataflow process and execute it on multiple servers. | ||
| + | Matrix is available as a (SOAP/WSDL) web service. Matrix client programming | ||
| + | for SRB is made simple by a developer-friendly Java API with a gentle | ||
| + | learning curve. | ||
| + | |||
| + | ===Does MCAT functionality vary from one client to another?=== | ||
| + | |||
| + | All functionalities are supported in the Scommand utilities for | ||
| + | Unix/Linux/MacOSX and Windows. This is because we do all development on | ||
| + | Unix clients and they get ported to other platforms. MySRB provides a | ||
| + | different perspective on metadata management, working at the level of a | ||
| + | single file or collection. It provides a good way of browsing and querying | ||
| + | metadata across collections, and also allows for ingesting, | ||
| + | extracting, updating and deleting metadata and user annotations for a | ||
| + | single SRB object or SRB collection. inQ provides a unique capability | ||
| + | where one can associate metadata with SRB objects and collections in an | ||
| + | intuitive way and also query across collections and form (temporary) | ||
| + | query-collections. This allows one to query based on attribute-metadata, | ||
| + | get a collection, and then gradually refine the query to drill down to | ||
| + | a sub-collection that is of interest. | ||
| + | |||
| + | Hence, each client provides a unique way of handling metadata and its | ||
| + | management. One of our goals is to provide uniform functionality across | ||
| + | all client interfaces. But this requires a huge amount of programming | ||
| + | effort, which we are unable to dedicate at this time. | ||
| + | |||
| + | ===Is one client better than another for entering metadata?=== | ||
| + | |||
| + | The Scommands client is very good for entering metadata. As mentioned | ||
| + | before, one can use inQ or MySRB for entering/updating metadata of an | ||
| + | individual SRB object or SRB collection. But the Scommands provide | ||
| + | for bulk ingestion of metadata for multiple SRB objects possibly | ||
| + | residing in more than one SRB collection. | ||
| + | |||
| + | For cutting and pasting, there are utilities in both MySRB and in | ||
| + | Scommands for copying metadata from one SRB object to another, from one | ||
| + | SRB collection to another and from one SRB collection to an SRB | ||
| + | object. This is different from cutting and pasting as it is done | ||
| + | internally by the SRB and not in the user's GUI. | ||
| + | |||
| + | ===Is there a way to load attribute/value pairs from another application into the MCAT?=== | ||
| + | |||
| + | Yes. SRB allows one to bulk ingest metadata associated with one or | ||
| + | more SRB objects. This is done by writing a metadata file in a | ||
| + | particular format. Hence, if an application can generate a file in | ||
| + | that format or one can write a wrapper which takes the application | ||
| + | output and creates the file in the SRB metadataFile format then we can | ||
| + | ingest the metadata attribute/value pairs. Actually, if you are doing | ||
| + | this in Unix-based systems you can do that by writing simple scripts | ||
| + | or by piping multiple applications together with the final pipe going | ||
| + | to the SRB Scommand for ingesting metadata. | ||
| + | |||
| + | Another unique way of associating metadata with SRB objects is | ||
| + | automatic extraction INSIDE SRB, with the results stored in the MCAT. This is | ||
| + | done by writing simple templates (basically rules) that allow one to | ||
| + | identify the metadata values in the SRB object, extract them, | ||
| + | and store them as attribute-value pairs in the MCAT. We have written | ||
| + | templates of this type for multiple file formats including DICOM, FITS, email, | ||
| + | NSFAwardAbstracts and HTML files. Extraction can be launched through MySRB | ||
| + | or through the Scommands. | ||
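The idea of rule-based extraction can be sketched as follows. This is not the SRB template language, which is not described here; the regular-expression rules, the `extract_metadata` name, and the email-style sample are assumptions used purely to illustrate how rules can turn object content into attribute-value pairs.

```python
import re


def extract_metadata(text: str, rules: dict) -> dict:
    """Apply each rule (attribute name -> regex with one capturing group)
    to the text and collect the matches as attribute-value pairs."""
    metadata = {}
    for attribute, pattern in rules.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            metadata[attribute] = match.group(1).strip()
    return metadata


# Hypothetical rules for an email-like object.
EMAIL_RULES = {
    "sender": r"^From:\s*(.+)$",
    "subject": r"^Subject:\s*(.+)$",
}
```

Attributes whose rule does not match are simply omitted, so one rule set can be applied to objects that carry only some of the expected fields.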
| + | |||
| + | ==MetaData Catalog (MCAT)== | ||
| + | |||
| + | ===What is MCAT?=== | ||
| + | |||
| + | MCAT, or Meta data Catalog, is a meta data repository system | ||
| + | implemented at SDSC to provide a mechanism for storing and querying | ||
| + | system-level and domain-dependent meta data using a uniform interface. | ||
| + | MCAT provides a resource and data object discovery mechanism that can be | ||
| + | effectively used to identify and discover resources and data objects | ||
| + | of interest using a combination of their characteristic attributes | ||
| + | instead of their physical names and/or locations. | ||
| + | |||
| + | ===What is meta data?=== | ||
| + | Meta data is information about data. | ||
| + | |||
| + | |||
| + | ===What is system-level meta data?=== | ||
| + | |||
| + | MCAT considers five kinds of entities as primitive objects on which it | ||
| + | keeps additional information. These are: data objects, resources, | ||
| + | collections, users and methods. The system-level MCAT meta data items | ||
| + | are these primitive objects and others derived from these. | ||
| + | |||
| + | |||
| + | ===Since there are primitive MCAT objects, are there other MCAT objects?=== | ||
| + | |||
| + | There are many derived MCAT objects. For example, MCAT, in the current | ||
| + | release, supports notions of logical resources, compound resources, | ||
| + | user groups, etc. | ||
| + | |||
| + | |||
| + | ===What is application-level meta data?=== | ||
| + | |||
| + | Application-level meta data are information about data objects that | ||
| + | pertain to the non-systemic description of the data objects. | ||
| + | Application-level meta data are characterized by information that is | ||
| + | particular to the data for that application and are not generalizable | ||
| + | across all data objects. For example, location, size, creation date | ||
| + | information are systemic as they are available for every data object | ||
| + | whereas information about how the data object was created and what | ||
| + | parameters were used in its creation may not be easily generalized | ||
| + | across all data objects and hence form part of application-level meta | ||
| + | data. Also, certain applications might have metadata specific to the | ||
| + | data object such as FITS metadata used in Astronomy and DICOM metadata | ||
| + | for medical images. | ||
| + | |||
| + | ===What is domain-dependent meta data?=== | ||
| + | |||
| + | Domain-dependent meta data is another name for application-level meta | ||
| + | data. | ||
| + | |||
| + | ===Does SRB/MCAT support application-level meta data?=== | ||
| + | |||
| + | Yes, the SRB does support application-level meta data. There are two | ||
| + | ways in which the SRB can support application-level meta data: First, | ||
| + | as user-defined metadata and second as schema-extended metadata. | ||
| + | |||
| + | ===What databases can be used for installing MCAT?=== | ||
| + | |||
| + | MCAT can be installed on Oracle, DB2, Sybase, Postgres, or | ||
| + | Informix. SQLServer, since it is so similar to Sybase, should be | ||
| + | fairly straight-forward to implement too. | ||
| + | |||
| + | ==Administration/Operation== | ||
| + | |||
| + | ===What do I need to run SRB?=== | ||
| + | |||
| + | As noted elsewhere, one can have many different setups of SRB. You can | ||
| + | get the source code for any of these setups and build your SRB server | ||
| + | or client as needed. SRB has been ported to several platforms (see | ||
| + | appropriate FAQ question) and we recommend that you use one of | ||
| + | these. If you port to other platforms, we would be glad to include it | ||
| + | in our subsequent releases. If you are setting up an MCAT-enabled | ||
| + | SRB, you will require either an Oracle, DB2, Sybase, or Postgres | ||
| + | database to which MCAT has been ported. We also recommend having a | ||
| + | separate user-account called 'srb' (or any variant such as "ucsdsrb") | ||
| + | which can be used for setting up, administering, and running the | ||
| + | system. Once you have the source for SRB and/or MCAT, separate readme | ||
| + | files are included to take you through the build, setup and test | ||
| + | process. | ||
| + | |||
| + | |||
| + | ===What are the hardware requirements (disk space, CPU speed, memory) for an SRB Server host?=== | ||
| + | |||
| + | The hard disk size depends upon how much storage you want to | ||
| + | broker. The SRB software system itself requires only about 200 MBytes | ||
| + | of storage. For MCAT-enabled servers, the DBMS will require | ||
| + | additional space; on Linux, for example, the SRB with Postgres | ||
| + | and ODBC take about 700 MB. | ||
| + | |||
| + | We normally recommend 1 to 6 TBytes, depending on the usage. We have | ||
| + | specs for a system called the SRB Brick which costs around $15K for 6 | ||
| + | TBytes (January, 2005). | ||
| + | |||
| + | As for CPU speeds, any Linux system with more than 1.5 GHz should be fine. | ||
| + | Memory of 1/2 GB or 1 GB will be sufficient. | ||
| + | |||
| + | |||
| + | ===Which DB system should I use?=== | ||
| + | |||
| + | For a large and/or heavy-load instance of SRB, you will probably want | ||
| + | to use a commercial DBMS like Oracle. It does have better | ||
| + | performance, at least in many cases. It costs money though, and you | ||
| + | really should have a DBA to manage it. We also have a project planned | ||
| + | for the fall 2005 (with some UK folks) that would make use of some | ||
| + | Oracle features (including some stored procedures) that will further | ||
| + | enhance performance when using Oracle. | ||
| + | |||
| + | PostgreSQL works fine for initial testing (for "getting your feet | ||
| + | wet"), and it works fine for light to moderate data loads. It is also | ||
| + | relatively easy to install via our install.pl script. It is used in | ||
| + | production for some projects (for example, the SIOExplorer project, which | ||
| + | takes SRB ship-board for ocean surveys). Some PostgreSQL tuning is | ||
| + | available via the 'install.pl vacuum' and 'install.pl index' commands | ||
| + | (see install.pl for documentation). | ||
| + | |||
| + | For any DBMS system, the performance decreases as the size of the MCAT | ||
| + | increases. | ||
| + | |||
| + | ===What is a data object (data set)?=== | ||
| + | |||
| + | In the terminology of SRB, a data object is a "stream-of-bytes" entity | ||
| + | that can be uniquely identified. For example, a file in HPSS or Unix | ||
| + | is a data object, or a LOB stored in a SRB Vault database is a data | ||
| + | object. Importantly, note that a data object is not a set of data | ||
| + | objects/files. Each data object in SRB is given a unique internal | ||
| + | identifier by SRB. A data object is associated with a collection (see | ||
| + | below). Previously, we used the term "data set" for this, but are | ||
| + | phasing it out (as it was often confusing) and instead using "SRB data | ||
| + | object". | ||
| + | |||
| + | |||
| + | ===Who is a registered SRB user?=== | ||
| + | |||
| + | SRB users are registered in the MCAT catalog and are given unique SRB ids. | ||
| + | These identifiers are independent of the location or system ids, such as | ||
| + | Unix ids. | ||
| + | |||
| + | |||
| + | ===What is a method?=== | ||
| + | |||
| + | In the terminology of SRB, a method is any executable piece of code | ||
| + | that is registered in the MCAT catalog. | ||
| + | |||
| + | Methods can be defined to operate on data on the server before it is | ||
| + | returned to the client. This can be quite efficient in cases where | ||
| + | the method reduces the data object (for example, by selecting a subset | ||
| + | of the data object based on inputs, as metadata extractors for | ||
| + | FITS, DICOM, etc. do). Format converters, such as | ||
| + | tiff2gif and tex2ps, can also be useful SRB methods. | ||
| + | |||
| + | ===Who is SRBadmin?=== | ||
| + | |||
| + | SRBadmin is the person who creates and manages SRB and | ||
| + | MCAT systems. A SRBadmin is also a registered SRB user who | ||
| + | has additional privileges compared to normal users. A SRBadmin | ||
| + | does NOT need to have "root" privilege. | ||
| + | |||
| + | |||
| + | ===What is a (data object) collection?=== | ||
| + | |||
| + | A collection is a logical name given to a set of data objects. All | ||
| + | data objects stored in SRB/MCAT are stored in some collection. A | ||
| + | collection can have sub-collections, and hence provides a hierarchical | ||
| + | structure. As a simple analogy, a collection in SRB/MCAT can be | ||
| + | equated to a directory in a Unix file system. But unlike a file | ||
| + | system, a collection is not limited to a single device (or | ||
| + | partition). A collection is logical but the data objects grouped under | ||
| + | a collection can be stored in heterogeneous storage devices. There is | ||
| + | one obvious restriction: the name given to a data object in a | ||
| + | collection or sub-collection should be unique in that collection. | ||
| + | |||
| + | ===What is the SRB logical name space?=== | ||
| + | |||
| + | It is easy to think of SRB Collections as Unix directories (or Windows | ||
| + | folders), but there is a fundamental difference. Each individual data | ||
| + | object (file) in a collection can be stored on a different physical | ||
| + | device. Unix directories and Windows folders use space from the | ||
| + | physical device on which they reside, but SRB collections are part of | ||
| + | a "logical name space" that exists in the MCAT and maps individual | ||
| + | data objects (files) to physical files. | ||
| + | |||
| + | The logical name space is the set of names of collections | ||
| + | (directories) and data objects (files) maintained by the SRB. Users | ||
| + | see and interact with the logical name space, and the physical | ||
| + | location is handled by the SRB system and administrators. The SRB | ||
| + | system adds this logical name space on top of the physical name space, | ||
| + | and derives much of its power and functionality from that. | ||
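The mapping from logical names to physical files can be pictured with a small sketch. It is an illustration only and does not reflect the MCAT schema; the class names, resource names, and paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    resource: str  # hypothetical resource name, e.g. a Unix file system or HPSS
    path: str      # physical path on that resource


class LogicalNamespace:
    """Toy catalog mapping logical object names to physical locations."""

    def __init__(self):
        self._catalog = {}

    def register(self, logical_path: str, location: PhysicalLocation) -> None:
        # A name must be unique within its collection.
        if logical_path in self._catalog:
            raise ValueError(f"{logical_path} already exists")
        self._catalog[logical_path] = location

    def resolve(self, logical_path: str) -> PhysicalLocation:
        """Return where the object physically lives."""
        return self._catalog[logical_path]

    def list_collection(self, collection: str) -> list:
        """List the data objects directly inside a collection."""
        prefix = collection.rstrip("/") + "/"
        return sorted(
            name for name in self._catalog
            if name.startswith(prefix) and "/" not in name[len(prefix):]
        )
```

Two objects registered under the same collection can resolve to different resources, which is the difference from a Unix directory that the text describes.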
| + | |||
| + | |||
| + | ===What is a resource?=== | ||
| + | |||
| + | In the terminology of SRB, a resource is a software/hardware system | ||
| + | that provides the storage functionalities. The term is equivalent to | ||
| + | "physical resource". For example, HPSS can be a resource, as can a | ||
| + | Unix file system. | ||
| + | |||
| + | |||
| + | ===What is a physical SRB resource?=== | ||
| + | |||
| + | A physical SRB resource is a system that is capable of storing data | ||
| + | objects and is accessible to the SRB (see [FAQ:What kinds | ||
| + | of resources does the SRB support?]). It is registered in SRB with | ||
| + | its physical characteristics such as its physical location, resource | ||
| + | type, latency, and maximum file size. | ||
| + | |||
| + | |||
| + | ===What is a logical SRB resource?=== | ||
| + | |||
| + | A logical SRB resource is a SRB resource that is derived from physical | ||
| + | SRB resources. A logical SRB resource might be derived with further | ||
| + | constraints on a registered physical resource or by combining more | ||
| + | than one physical resource as an entity. For example, if a physical | ||
| + | resource 'A' is defined using a particular directory in a HPSS, a | ||
| + | logical resource A-bar might be defined as a resource that restricts | ||
| + | to a further sub-directory in 'A'. | ||
| + | |||
| + | ===What is a logical SRB resource set?=== | ||
| + | |||
| + | A 'logical SRB resource set' is a kind of logical SRB resource. It is | ||
| + | defined as a set of physical SRB resources. The aim is to | ||
| + | give a unique (logical) name to a set of resources, so that when SRB opens | ||
| + | or writes a buffer to the logical resource it opens or writes to every | ||
| + | resource in that set. A logical resource containing multiple physical | ||
| + | resources can be treated as a 'single' resource when using it. | ||
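The fan-out behaviour can be sketched as below. This is a toy model under stated assumptions: the in-memory resources and all class and method names are hypothetical and stand in for real storage systems.

```python
class InMemoryResource:
    """Stand-in for a physical resource; stores objects in a dict."""

    def __init__(self, name: str):
        self.name = name
        self.objects = {}

    def write(self, key: str, data: bytes) -> None:
        self.objects[key] = data


class LogicalResourceSet:
    """One logical name for several physical resources."""

    def __init__(self, name: str, members):
        self.name = name
        self.members = list(members)

    def write(self, key: str, data: bytes) -> int:
        # A write to the logical resource goes to every member of the set.
        for resource in self.members:
            resource.write(key, data)
        return len(self.members)
```

A caller addresses only the logical name, and each member resource ends up holding a copy of the written data.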
| + | |||
| + | ===What is a compound SRB resource?=== | ||
| + | |||
| + | A compound resource allows the SRB to function as a complete (although | ||
| + | basic) archival storage system (also known as a hierarchical storage | ||
| + | system (HSM)). A compound resource may be configured to contain a | ||
| + | pool of cache resources and a tape resource. When a user creates a | ||
| + | file using a compound resource, the object created becomes a "compound | ||
| + | object". The actual data of a "compound object" may reside on cache or | ||
| + | tape or both. Unlike the SRB replica, a "compound object" always | ||
| + | appears as a single object even though there may be multiple copies of | ||
| + | the data. It is a simple hierarchical system where data migrate | ||
| + | automatically between cache and tape. Data is always staged on cache | ||
| + | automatically whenever it is accessed and is migrated to tape by the | ||
| + | system administrator when more cache space is needed. The cache and | ||
| + | tape resources can be distributed across a WAN. | ||
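The staging and migration behaviour can be modelled with a short sketch. It is a simplification under stated assumptions: the class, its method names, and the dict-backed cache and tape are hypothetical, and real migration policy is not shown.

```python
class CompoundResource:
    """Toy model of a compound resource: one cache tier and one tape tier."""

    def __init__(self):
        self.cache = {}
        self.tape = {}

    def create(self, name: str, data: bytes) -> None:
        # New compound objects are written to cache first.
        self.cache[name] = data

    def read(self, name: str) -> bytes:
        # Data is staged to cache on access if it lives only on tape.
        if name not in self.cache:
            self.cache[name] = self.tape[name]
        return self.cache[name]

    def migrate_to_tape(self, name: str) -> None:
        # Administrator action: copy to tape and free the cache space.
        self.tape[name] = self.cache.pop(name)

    def locations(self, name: str) -> list:
        """Report which tiers currently hold the object's data."""
        tiers = []
        if name in self.cache:
            tiers.append("cache")
        if name in self.tape:
            tiers.append("tape")
        return tiers
```

The caller always sees a single object name, while `locations` shows that the data may sit on cache, on tape, or on both after a staged read.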
| + | |||
| + | ===What is a user group?=== | ||
| + | A user group is a uniquely identifiable name given to a set | ||
| + | of SRB registered users. | ||
| + | |||
| + | ===Who can form a user group?=== | ||
| + | Any set of mutually agreeable users can form a user group. | ||
| + | |||
| + | ===Who can register a user group?=== | ||
| + | SRBadmin has the authority to register user groups. | ||
| + | |||
| + | ===What is a domain?=== | ||
| + | |||
| + | A domain is a string used to identify a site or project. Users are | ||
| + | uniquely identified by their usernames combined with their domain, for | ||
| + | example 'smith@npaci'. SRBadmin has the authority to create domains. | ||
| + | |||
| + | |||
| + | ===What are tokens?=== | ||
| + | |||
| + | Tokens are string items stored in the MCAT used as root items when | ||
| + | creating other items (resources, etc). We have quite a few predefined | ||
| + | tokens. SRBadmin has the authority to create tokens. | ||
| + | |||
| + | ===What is a replicated data object? === | ||
| + | |||
| + | In SRB, one can make copies of a data object and store the copies in | ||
| + | different locations. All of these copies are identified in SRB by the | ||
| + | same identifier. That is, the copies are considered | ||
| + | to be equivalent to one another. | ||
| + | |||
| + | ===How can one read a replicated data object?=== | ||
| + | |||
| + | When a user reads a replicated data object, SRB cycles through all the | ||
| + | copies of the data object and reads the one that is accessible at that | ||
| + | time. It uses a simple replica identification mechanism to order this | ||
| + | list of replicated data objects. | ||
| + | |||
| + | ===How can one create a replicated data object?=== | ||
| + | |||
| + | There are three ways of creating a replicated data object. | ||
| + | In the first method, which can be viewed as asynchronous replication, | ||
| + | one creates a data object (using the Sput Scommand or the srbObjCreate API) | ||
| + | and then replicates it using the Sreplicate Scommand or the | ||
| + | srbObjReplicate API. In the second method, which can be viewed as | ||
| + | synchronous replication, one defines a 'logical resource set' | ||
| + | as a set of resources and then creates a data object in that logical | ||
| + | resource set (using the Sput Scommand or the srbObjCreate API). SRB | ||
| + | automatically replicates the data object as it gets written. | ||
| + | |||
| + | In the third method, called out-of-band replication, one creates two data | ||
| + | objects separately, off-line, in a physical resource and then registers them | ||
| + | as replicas of each other. SRB also provides the means to replicate | ||
| + | collections of data objects recursively. | ||
| + | |||
| + | Also see "Replicated Data Management user SRB" (GGF-4, February, 2002) at | ||
| + | http://www.npaci.edu/dice/Pubs/SRBReplication.ppt. | ||
| + | |||
| + | ===What is a Container?=== | ||
| + | |||
| + | A Container is a way to put together a lot of small files into one | ||
| + | larger file to improve performance. This works very well with | ||
| + | resources that include tapes (such as HPSS). The whole container | ||
| + | is retrieved from tape, cached on SRB disk, and then multiple files | ||
| + | can be quickly read and written on the container copy on disk. | ||
| + | The SRB handles the book-keeping for the container. | ||
| + | |||
| + | ===How do Containers work?=== | ||
| + | |||
| + | The SRB container is like a tarball in the sense that | ||
| + | it stores multiple files as a single file. The SRB grows the container on | ||
| + | the fly by adding new files as they are ingested into the container. | ||
| + | Hence, unlike a tarball, the container can be grown as needed. Also, | ||
| + | unlike a tarball, users can read individual files without downloading | ||
| + | the container onto their desktops. | ||
| + | |||
| + | The SRB keeps all the information about how the container is laid out in | ||
| + | its Metadata Catalog (MCAT) and uses it when retrieving individual files. | ||
| + | One can also modify and delete files in a container as though they were | ||
| + | normal files; the SRB takes care of the operation. | ||
| + | |||
| + | To answer a related question, the container is not "made" on the | ||
| + | desktop and then loaded into the SRB. Instead it is constructed in situ | ||
| + | on the resource. Containers are normally assigned a logical resource | ||
| + | that has two physical components: an archive | ||
| + | resource such as the HPSS or roadnet-sam, and a cache resource | ||
| + | such as a Unix file system (e.g. roadnet-unix). All the construction, | ||
| + | file access and modifications are done on the cache resource, and the | ||
| + | storage of a full container or of a container that is not currently needed | ||
| + | is done on the archive resource. | ||
| + | |||
| + | Hence, the archive sees a single file, and the construction is done on the | ||
| + | cache resource (not on the user's desktop) before the file reaches the archive. | ||
| + | The cache is also a resource controlled by the SRB. | ||
| + | |||
| + | Containers grow in size and are pinched off into physical pieces by the | ||
| + | SRB, so that a container might look very long but actually consists of | ||
| + | multiple smaller files. Normally we recommend pinching off pieces of | ||
| + | around 100 MBytes or 200 MBytes, but they can also be in the GB range. | ||
| + | This is akin to blocks in a tape system. | ||
| + | |||
| + | What this means is that the user sees one container where they "put" in | ||
| + | their data, but like a goods-train, the container is physically divided. | ||
| + | Obviously individual files are much smaller than the container size. | ||
| + | To give an example, in one of our collections, we have containers of size | ||
| + | around 50 MBytes, storing files of sizes 2 MBytes each. Each container | ||
| + | stores about 25 files in its physical blocks. | ||
| + | |||
| + | |||
| + | ===Does Sget work properly for files that are in containers?=== | ||
| + | |||
| + | Yes, Sget works fine for files in containers. The MCAT stores all | ||
| + | the file offsets for each file in a container, and Sget will download | ||
| + | just the portion of the container that has the file you are trying to | ||
| + | download. Since Sget (currently) doesn't support any bulk operations, | ||
| + | downloading a lot of small files is still slow. | ||
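
To make the offset-based retrieval concrete, here is a minimal Python sketch of reading one member out of a container using a catalog of (offset, size) entries. The catalog layout, the helper names, and the in-memory container are assumptions for illustration; the actual MCAT schema and the Sget implementation are not shown in this FAQ.

```python
import io
from typing import BinaryIO, Dict, Tuple

# Hypothetical catalog: logical file name -> (byte offset, size) inside the container.
Catalog = Dict[str, Tuple[int, int]]


def build_container(files: Dict[str, bytes]) -> Tuple[io.BytesIO, Catalog]:
    """Append each file to one buffer and record where it landed."""
    container = io.BytesIO()
    catalog: Catalog = {}
    for name, payload in files.items():
        catalog[name] = (container.tell(), len(payload))
        container.write(payload)
    container.seek(0)
    return container, catalog


def read_member(container: BinaryIO, catalog: Catalog, name: str) -> bytes:
    """Read only the byte range of one member, not the whole container."""
    offset, size = catalog[name]
    container.seek(offset)
    data = container.read(size)
    if len(data) != size:
        raise IOError(f"short read for {name!r}")
    return data


if __name__ == "__main__":
    container, catalog = build_container({"a.dat": b"AAAA", "b.dat": b"BB"})
    print(read_member(container, catalog, "b.dat"))
```

Each lookup still costs one catalog query and one ranged read, which is consistent with the remark that fetching many small files one at a time remains slow.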
| + | |||
| + | ===How do you discover the container information?=== | ||
| + | If you're on a Windows machine, InQ is the easiest: the file details | ||
| + | show the container information. With the Scommands, running SgetD on a | ||
| + | file prints container_name together with the container (if any) that | ||
| + | holds the file. | ||
| + | |||
| + | ===Once I know which container a file is in, what is the most efficient way to download the data?=== | ||
| + | |||
| + | If you just need a few small files, then running Sget on each will be | ||
| + | the quickest. If you want all or most of the container, and you know | ||
| + | the container you want to download, then simply running 'Sbunload | ||
| + | <container>' will be much faster. | ||
| + | |||
| + | ===Who can register SRB users?=== | ||
| + | SRBadmin can add new users to | ||
| + | the MCAT catalog. | ||
| + | |||
| + | ===Who can register physical or logical resources?=== | ||
| + | SRBadmin has the authority and the required privileged utilities to register | ||
| + | physical and logical resources. | ||
| + | |||
| + | ===How does SRB provide access to remote storage systems?=== | ||
| + | SRB provides access to remote storage systems through a proxy | ||
| + | mechanism. When one stores a data object under SRB, the data object is | ||
| + | stored and accessed by SRB acting as a proxy for the user. Because | ||
| + | of this mechanism, a user can store data objects on remote storage | ||
| + | systems without having personal accounts at these sites. In this mode, | ||
| + | SRB acts as a 'system privileged proxy' user. The proxy mode | ||
| + | also allows for SRB-to-SRB authentication, enabling servers to | ||
| + | access files that are under the control of another SRB server. | ||
| + | |||
| + | ===Can multiple SRB servers be federated?=== | ||
| + | Yes. SRB servers can communicate with other servers and | ||
| + | can form a federation. More than one federation can also exist, | ||
| + | with one SRB federation being unaware of another. | ||
| + | A user can access data objects stored under any SRB in the federation | ||
| + | provided the user has proper permissions. | ||
| + | |||
| + | In 3.0 we released a Federated MCAT capability, where complete | ||
| + | MCAT-enabled SRB systems can be integrated with other SRB federations. | ||
| + | Each MCAT member of such a federation is called an SRB Zone. | ||
| + | |||
| + | ===How does one SRB know about another SRB?=== | ||
| + | |||
| + | A SRB server knows about another SRB through the MCAT. When the | ||
| + | SRBAdmin creates a location, a SRB host is specified. When the | ||
| + | SRBAdmin ingests a resource at a location, that resource is associated | ||
| + | with that SRB host. | ||
| + | |||
| + | ===What are the different setup configurations that I can have at my site?=== | ||
| + | First, there are three basic configurations for the SRB/MCAT system: | ||
| + | (1) client-only, | ||
| + | (2) server without MCAT and | ||
| + | (3) server with MCAT. | ||
| + | |||
| + | Each of these three setups can be enabled with password, | ||
| + | password-encrypt1, and/or GSI authentication. | ||
| + | |||
| + | In the simplest configuration, one can use the SRB client components | ||
| + | (client utilities, GUI applications, and libraries) at a site and use | ||
| + | SRB servers running at remote sites or hosts. A SRB client can connect | ||
| + | to a specific (possibly remote) SRB server and access data objects | ||
| + | that are under the control of that server and/or other servers in the | ||
| + | federation. With the client-only setup one cannot access any data | ||
| + | object at the local site through SRB. In the second setup, a site can | ||
| + | have a SRB server running locally but without any MCAT service. In | ||
| + | this setup, the local SRB server can provide access to local resources | ||
| + | and contacts another SRB server that has MCAT service to retrieve | ||
| + | the metadata about data objects. In the third configuration, one can | ||
| + | have a SRB server and a MCAT database running locally. Any client can | ||
| + | talk to any SRB server and need not necessarily talk to a local or | ||
| + | 'nearest' server. | ||
| + | |||
| + | ===Is MCAT needed to run SRB?=== | ||
| + | Yes, an MCAT is needed but you do not need to install one yourself. | ||
| + | Many sites use the SDSC MCAT-enabled SRB to support their SRB system. | ||
| + | |||
| + | ===Where can I get SRB?=== | ||
| + | Source code and related material for SRB and MCAT can be obtained | ||
| + | from the web site at http://www.sdsc.edu/srb. The tar files | ||
| + | are PGP encrypted and one can get the passwords for decrypting them by | ||
| + | sending email to srb@sdsc.edu. | ||
| + | |||
| + | |||
| + | ===What is a SRB Vault?=== | ||
| + | |||
| + | SRB vault is a data repository system that SRB can maintain in any | ||
| + | of the storage systems that it can access. For example, the SRB | ||
| + | running at SDSC (host: srb.sdsc.edu) runs a SRB vault in its Unix | ||
| + | file system, and another SRB running at SDSC (host: hpss47.sdsc.edu) | ||
| + | runs SRB vaults in HPSS, a Unix file system and a DB2 database. | ||
| + | SRB vaults provide a convenient storage area for storing data objects. | ||
| + | A data object stored in a SRB vault is stored as a SRB-written object | ||
| + | and its access is controlled through the MCAT catalog. This differs | ||
| + | from legacy data objects, which can be accessed by SRB but | ||
| + | are still owned by the previous owners of the data. One can define SRB | ||
| + | vaults in any storage device that can be accessed by a SRB server. | ||
| + | In the case of file systems such as Unix and HPSS, a separate | ||
| + | directory is used for the purpose, and in the case of databases such as | ||
| + | Oracle, DB2 or Illustra, a system-defined table with LOB-space is | ||
| + | used for the purpose. | ||
| + | |||
| + | ===What is a SRB Space?=== | ||
| + | SRB space is a union of all SRB Vaults that can | ||
| + | be accessed by a system of SRB servers. Users registered in the | ||
| + | system can store, retrieve and modify data objects (provided owners of | ||
| + | the data objects grant appropriate permissions) in this space. Hence, one | ||
| + | can visualize SRB space as a logical storage volume that is | ||
| + | distributed and heterogeneous. | ||
| + | |||
| + | |||
| + | ===What are the various data object interfaces supported by SRB?=== | ||
| + | The SRB supports four types of interfaces. The first type is a | ||
| + | stream interface. It allows Unix file operations such as open, close, | ||
| + | read, write and seek on SRB data objects. The second is an | ||
| + | object-level interface. It provides the means to | ||
| + | create, modify and destroy collections of objects, to move, copy | ||
| + | and replicate objects, and to apply user-defined proxy operations on | ||
| + | objects to obtain a new type of the object. The third type is a | ||
| + | discovery-level interface for obtaining | ||
| + | metadata about data objects (e.g., replication information, | ||
| + | ownership, access, location, type information, etc.), resources and | ||
| + | users. These operations access the information located in the MCAT | ||
| + | catalog. Finally, SRB provides an interface for modifying the | ||
| + | metadata about data objects in SRB, and for performing access | ||
| + | control and auditing on various SRB objects. | ||
| + | |||
| + | ===How do I backup SRB space, both SRB data and MCAT metadata?=== | ||
| + | |||
| + | It is a good idea to back up the MCAT database daily. If your MCAT | ||
| + | DBMS (Oracle, DB2, etc.) can do hot backups, then you can do it while | ||
| + | the system is being used. Otherwise, you will need to stop the SRB | ||
| + | (killsrb), do the cold backup and then restart the SRB. | ||
| + | |||
| + | As for the files stored under the SRB, one can do it in multiple ways. | ||
| + | The first and easiest is to back up the storage resource directory (for | ||
| + | example, the SRBVault directory) as an incremental backup. Depending | ||
| + | upon your system, you can do it on the fly or during PMs. Weekly PMs | ||
| + | will be helpful. | ||
| + | |||
| + | A second strategy is to make sure that there are replicated copies of | ||
| + | each file in two distributed storage systems, which ideally don't | ||
| + | share any hardware and are geographically separated. This can be done | ||
| + | either under user control (replicate only those that are needed) or | ||
| + | under srbAdmin control (possible with 3.0.2 release soon), which will | ||
| + | replicate all files that are modified to a particular backup resource. | ||
| + | |||
| + | A third strategy is to use zoneSRB to run a backupZone at a | ||
| + | remote site and back up to this zone from your zone. We are testing and | ||
| + | finalizing some protocols for doing this. | ||
| + | |||
| + | ===What ports does the SRB use? What ports do I need to open in a firewall to run the SRB?=== | ||
| + | |||
| + | The firewall needs to open some ports on the server side, and possibly | ||
| + | on the client side too. | ||
| + | |||
| + | On the SRB server side, you must open the port that the | ||
| + | srbMaster is listening on, plus a set of 100 or so configurable | ||
| + | data ports (described below). | ||
| + | |||
| + | The srbMaster listens on the port defined in srb.h or specified in the | ||
| + | srbPort environment variable (often set in the runsrb script). By | ||
| + | default, srb.h has DefaultPort "5544" but this can be changed via the | ||
| + | configure --enable-srbport=value option. Regardless of the | ||
| + | DefaultPort value, the srbMaster will listen on the port specified in | ||
| + | the srbPort environment variable value if it is defined. You can edit | ||
| + | the runsrb script to change this. | ||
| + | |||
| + | (The clients also need to know the port number to connect to. For the | ||
| + | Scommands they will default to the value in srb.h or will use the | ||
| + | number specified in the srbPort line in each user's ~/.MdasEnv file.) | ||
| + | |||
| + | |||
| + | To configure a specific range of ports for the servers, include | ||
| + | '--enable-commports' on the configure line. Without this, the | ||
| + | srbServers will use arbitrary ports. | ||
| + | |||
| + | By default, the configurable ports are 20000 to 20199 (see | ||
| + | mk/mk.config COMM_PORT_NUM_START and COMM_PORT_NUM_COUNT). So | ||
| + | './configure --enable-commports' will restrict the servers to using | ||
| + | ports 20000 to 20199 (plus the DefaultPort). You can change the start | ||
| + | and number of the ports via configure options --enable-commstart and | ||
| + | --enable-commnum, for example: ./configure --enable-commports | ||
| + | --enable-commstart=21000 --enable-commnum=200 | ||
| + | |||
| + | |||
| + | Based on our experience, we recommend using at least 100 ports, but | ||
| + | you may need more or possibly fewer. It depends on how many transfers | ||
| + | (especially parallel transfers) you will have going at the same time. | ||
| + | There's a MaxThread parameter in runsrb that specifies how many | ||
| + | threads to use; 4 is default, but you may wish to increase that. Each | ||
| + | of these threads uses a port, so each parallel transfer could take 4 | ||
| + | (or MaxThread) ports. If the servers run out of ports, the later | ||
| + | transfers will fail. So it's a trade-off. If you don't set commnum | ||
| + | high enough, and you get quite a few transfers going at the same time, | ||
| + | some of them could fail. | ||
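
The sizing trade-off above comes down to simple arithmetic, sketched here in Python. The function names are made up for this example; the default values (start 20000, count 200, MaxThread 4) are the ones quoted in this answer. The estimate is deliberately simplified: it assumes every transfer uses the full MaxThread ports and ignores any ports taken by non-data connections.

```python
def comm_port_range(start: int = 20000, count: int = 200) -> range:
    """Ports covered by --enable-commstart=start and --enable-commnum=count."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return range(start, start + count)


def max_concurrent_parallel_transfers(port_count: int, max_thread: int = 4) -> int:
    """Rough upper bound on simultaneous parallel transfers, one port per thread."""
    if max_thread < 1:
        raise ValueError("max_thread must be at least 1")
    return port_count // max_thread


if __name__ == "__main__":
    ports = comm_port_range()
    print(f"open ports {ports[0]}-{ports[-1]} in the firewall")
    print(max_concurrent_parallel_transfers(len(ports)), "parallel transfers at most")
```

Raising MaxThread without raising the port count lowers the number of transfers that can run side by side, which is the trade-off the paragraph above describes.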
| + | |||
| + | |||
| + | On the SRB client side, if you use the Sput or Sget -m option (server | ||
| + | initiated connection for parallel I/O), the client's firewall needs to | ||
| + | open at least 16 configurable ports. These are the same ports as the | ||
| + | server uses, i.e. 20000 to 20199 if just --enable-commports is used. | ||
| + | As with the server, the port range can be modified, for example: | ||
| + | ./configure --enable-commports --enable-commstart=21000 | ||
| + | --enable-commnum=16 | ||
| + | |||
| + | But starting with 3.1.1, users can now use the -M option (client | ||
| + | initiated connection for parallel I/O) which does not require the | ||
| + | opening of ports on the client side. | ||
| + | |||
| + | The commports are used for data transfer and are normally used | ||
| + | for non-data connections too. If the server system (the srbMaster | ||
| + | server) is prespawning server processes (PRE_SPAWN_CNT is greater than | ||
| + | 0 in runsrb, which is normally the case), then the server will use one | ||
| + | of these ports for its communication connection. The client will | ||
| + | connect to the srbMaster on the DefaultPort (5544 or whatever value is | ||
| + | configured) and will then be told which port to connect back to for the | ||
| + | particular pre-spawned server that has been assigned. | ||
| + | |||
| + | |||
| + | ===Does SRB exploit any specific interfaces to HPSS to optimize performance?=== | ||
| + | |||
| + | Yes, the SRB uses both the sequential POSIX-like interface and the | ||
| + | parallel mover interface of HPSS for data movement. We have a set of SRB | ||
| + | APIs for both types of interfaces. In addition, all our data | ||
| + | movement utilities have a switch that allows users to choose between these | ||
| + | two modes. | ||
| + | |||
| + | ===What kind of ingest rates can SRB get?=== | ||
| + | |||
| + | In terms of number of files per second, for files stored in HPSS, it | ||
| + | is limited by the ingestion rate of HPSS (we are still using the | ||
| + | ENCINA based HPSS). The SRB bulk load mechanism can do 30-50 files | ||
| + | per second. | ||
| + | |||
| + | For I/O rate, it pretty much depends on the hardware and network. | ||
| + | We can get up to 50 Mb/sec with 3-4 mover threads while the rate | ||
| + | from HSI is about 30-40 % higher because data does not have to go | ||
| + | through any SRB server. | ||
| + | |||
| + | ===How well does SRB handle packaging small data packets into "bundles" and storing them?=== | ||