MCAT

Introduction

With the emergence of ubiquitous access to information, researchers are increasingly publishing scientific results on the Web and providing Web-based access mechanisms for querying data repositories and applying analysis algorithms. Hence, the need for scalable information discovery systems that catalog large numbers of data sets and enable analysis of terabyte-sized data collections is becoming urgent. Information sources such as scientific data generated by simulations, observational data, standard reference test case results, published scientific algorithms for analyzing data, published research literature, data collections, and domain-specific databases are among the drivers of this new information-based computing phenomenon. In this climate, meta information catalog systems play a vital role in publishing authenticated information and in storing and disseminating such information through a controlled but uniform interface.

MCAT is a meta information catalog system implemented at SDSC as part of the Data Intensive Computing Environment (DICE), with requirements based mainly on the Storage Resource Broker (SRB) system. The MCAT catalog is designed to serve both core-level meta information and domain-dependent meta information. Part of the system is operational and is used as a catalog service by SRB Version 1.1, released for use in the NPACI partnership. MCAT Version 1.1 was released along with SRB Version 1.1 in March 1998. A similar design and implementation document about SRB can be found under the SRB home page. A FAQ is also available for SRB and MCAT.



Architecture of MCAT System

Metadata can be defined as data about data. Meta information can be defined as data about entities that form first class objects in a system. From our experience in the MDAS and SRB projects, we have inferred that four types of elements are of interest in a Data Intensive Computing Environment (DICE). They are:

Resources
such as hardware systems including computing platforms, communications networks, storage systems, peripherals; system software including DBMSs, file systems, operating systems, schedulers; and application systems including digital libraries, search engines and logical groupings of resources.
Methods
such as access methods for using standardized and non-standard APIs, system and user-defined functions for manipulating datasets, data mining, data sub-setting and format-conversion routines and composition of methods
Data Objects
such as individual data sets and collections of data sets
Users and groups
who can create/update/access the resources, methods and datasets, and the metadata on all four entities.

We briefly describe the MCAT architecture and how it can interact with other catalogs and support extensible (that is, application-dependent) meta information. Figure 1 shows the extensible architecture of the MCAT system.

Figure 1: Architecture of MCAT

As Figure 1 shows, MCAT provides an interface protocol through which applications interact with MCAT. The protocol exchanges data in a structure called MAPS (Metadata Attribute Presentation Structure). This structure, which will have a wire format for communication and a data format for computation, provides an extensible model for communicating metadata. Internal to MCAT, the schema for storing metadata may differ from MAPS (e.g., a database schema may be used), and hence mappings between the internal format and MAPS are required. The data interchanged using MAPS is identified through queries to the catalog, so a query generation mechanism is part of the process. The query generator uses internal schema definitions, relationships and semantics to develop an appropriate query, and the type of query generated depends on the catalog. For example, if the catalog resides in a database, then SQL is the target language of the query generator; if the catalog is in LDAP, then an LDAP-specific query is produced. In its extensions, this query generation would be able to deal with multiple catalogs as well as heterogeneous catalogs, both internal and external to MCAT. Figure 2 shows the components of a system that can handle multiple catalogs, both internal and external to MCAT.
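The backend-dependent query generation described above can be illustrated with a minimal sketch in Python. It is not MCAT code: the function names `to_sql` and `to_ldap_filter`, the table name `mcat_core` and the attribute names are all hypothetical, and value escaping for LDAP filters is omitted for brevity.

```python
def to_sql(out_attrs, conditions, table="mcat_core"):
    """Render attribute-level equality conditions as a parameterized SQL SELECT.

    `table` is a hypothetical stand-in for the catalog's internal schema.
    """
    where = " AND ".join(f"{attr} = ?" for attr, _ in conditions)
    sql = f"SELECT {', '.join(out_attrs)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql, [value for _, value in conditions]


def to_ldap_filter(conditions):
    """Render the same conditions as an LDAP search filter string (no escaping)."""
    parts = "".join(f"({attr}={value})" for attr, value in conditions)
    return f"(&{parts})" if len(conditions) > 1 else parts


# The same attribute-level request, rendered for two kinds of catalog.
conditions = [("owner", "alice"), ("type", "image")]
sql, params = to_sql(["name", "size"], conditions)
ldap = to_ldap_filter(conditions)
```

The point of the sketch is only that the caller states attributes and conditions once, and the target query language is chosen by the catalog side.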


Figure 2: Multiple Catalog Interface

Figure 2 shows a mechanism for serving multiple catalogs through MCAT. Assume that a group has its own database with a large quantity of metadata. The organization of that metadata might be peculiar to the group's field of activity, and the metadata may reside in different types of objects (e.g., files, tables, etc.). The MCAT-to-catalog interaction is facilitated by a uniform abstract interface (to be defined, and possibly to be called the Catalog Interface Definition) that allows external catalogs to communicate with MCAT. The communication would be of two types: meta-metadata communication, in which the semantics of the metadata in the external catalog are communicated to MCAT (and vice versa), and metadata communication, in which metadata is transferred between the two catalogs in response to queries and updates. With such an abstraction defined, including a new catalog becomes an exercise in writing middleware components (similar to what is done for media drivers in SRB). We presume that the architecture is simple but powerful enough to deal with extensible metadata schemata and with multiple heterogeneous metadata services.



Entities in MCAT and their Attributes

One can view the meta information stored in MCAT from any of these four entity-centric views, though MCAT, by design, is predominantly data-centric, being a supporting tool in the data intensive computing environment. Hence, we start with the data view and briefly touch upon the other views afterwards.


Data View

Data objects are first class objects in MCAT. Meta information or metadata about a data object can be internal to the object (i.e., it can be part, or even the whole, of the object) or external and formed separately from the data object. Internal metadata are mostly extracted, derived or mined, whereas external metadata are annotated or logged from external sources.

Metadata about data objects describes the properties and attributes of the objects. We discuss the metadata of data objects as implemented in MCAT Version 1.1. Note that there are many other properties not listed here, some of which may be hidden, and more will be added. We show mainly those that are visible to the user of MCAT.

  1. identifier (internal - not seen by user)
  2. name
  3. types and formats (a type-hierarchy is supported)
  4. size
  5. comments
  6. liveness (i.e., deleted or exists or locked or under construction, etc)
  7. replica-number (a data object can have many clones)
  8. creation-time stamp
  9. creation-owner

Even though a data object is a first class object, it cannot reside by itself in MCAT. Every data object is associated with a (data) collection. A collection, in MCAT, can be viewed as a set of data objects and other collections (which are called sub-collections to the collection). The collection lattice is strictly hierarchical, and notions such as navigation through a collection, searching recursively through a collection hierarchy, etc are possible within MCAT.

Hence a data object has another property:

  1. collection name (to which it belongs)
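As an illustration of navigating and searching such a strictly hierarchical collection structure, here is a minimal in-memory sketch. The `Collection` class and the `find_recursive` helper are hypothetical names introduced for this example; MCAT itself stores this information in its catalog rather than in Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str
    data_objects: list = field(default_factory=list)     # names of data objects
    sub_collections: list = field(default_factory=list)  # child Collection instances


def find_recursive(collection, predicate, path=""):
    """Yield 'path/name' for every data object under `collection` matching `predicate`."""
    here = f"{path}/{collection.name}"
    for obj in collection.data_objects:
        if predicate(obj):
            yield f"{here}/{obj}"
    # Each data object belongs to exactly one collection, so a plain
    # depth-first walk visits every object exactly once.
    for sub in collection.sub_collections:
        yield from find_recursive(sub, predicate, here)


root = Collection("home", ["readme.txt"], [
    Collection("images", ["rose.gif", "tulip.gif"]),
    Collection("sim", ["run1.dat"], [Collection("plots", ["run1.gif"])]),
])
gifs = list(find_recursive(root, lambda name: name.endswith(".gif")))
```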

A question may arise: can a data object belong to more than one collection? In MCAT 1.1, the answer is no. This may change in later releases.

Next, we look at orthogonal connections of data objects with other first class entities in MCAT. Every replica of a data object resides at some location within a physical storage resource. Hence, we have the following types of meta information:

  1. physical resource where it is located
  2. location inside the resource

Each data object has an access control mechanism imposed on it through an access control list (ACL) which connects it to the user entity. An entry in the ACL is a triple:

  1. < dataObjectId, userId, PermissionId >.

Each user is given exactly one permission per data object. Each PermissionId has an associated list of actions that are permitted, so this one-permission restriction should not cause any problem. New PermissionIds are added at the discretion of the SRBAdmin. With respect to a data object, the permissible actions include read, write, control and grantTicket. More information on access control in MCAT can be found in a paper in the DICE publication section.

The notion of an ACL is also associated with collections, with the permissible actions being createDataObject, createSubCollection, control and grantTicket. The paper mentioned above provides more details.
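The ACL triple and the one-permission-per-object rule can be sketched as follows. The permission names (`read_only`, `read_write`, `owner`), the object and user ids, and the `is_allowed` function are assumptions made for illustration; only the action names come from the text above.

```python
# Hypothetical permission table: each PermissionId maps to a set of allowed actions.
PERMISSIONS = {
    "read_only": {"read"},
    "read_write": {"read", "write"},
    "owner": {"read", "write", "control", "grantTicket"},
}

# ACL keyed by (dataObjectId, userId), so a user can hold
# at most one PermissionId per data object.
acl = {
    (101, "alice"): "owner",
    (101, "bob"): "read_only",
}


def is_allowed(acl, object_id, user_id, action):
    """Return True if the user's single permission on the object covers `action`."""
    permission_id = acl.get((object_id, user_id))
    return permission_id is not None and action in PERMISSIONS[permission_id]
```

Keying the table by the (object, user) pair enforces the single-permission restriction structurally, while the set of actions behind each PermissionId keeps that restriction from being limiting.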

MCAT/SRB also provides auditing. Each action on a data object can be audited and the action's success or failure noted in an audit trail. The owner of a data object (or anyone with control permission on the object) can impose an audit on actions on the data object for specific users. Then, whenever such a user accesses the data object, the action is noted in the audit trail. A user can restrict the audit by stating that he or she does not want to be audited; in such a case, an anonymous-user audit trail entry is written. An audit trail entry is a 5-tuple:

  1. < objectId, userId, actionId, timeStamp, Comments >

Apart from obtaining a read/write permission on a data object, a person can get a "ticket" on a data object. A ticket (an internally generated 10-character string) provides its holder with a permit for an action (currently only read) on the data object. The ticket giver can impose restrictions on the ticket, such as who can use it (a registered user, a registered user group, or anyone in the world), when it can be used (a time period) and how many times it can be used. Hence a ticket on a data object is an 8-tuple:

  1. < ticketValue, objectId, userId, actionId, beginTime, endTime, accessCount, ticketGiver >

A ticket can be issued at the collection level, but it allows the holder access only to data objects that are controlled by the ticketGiver. A collection-level ticket can be given with a recursive option, allowing the holder to drill down the collection hierarchy.

Currently, MCAT does not support the method entity, and hence connections with it from a data-centric view are not available. The schema of MCAT is designed for such connections, and we plan to enable these links incrementally in further releases.
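A ticket check along the lines of the 8-tuple above might look like the following sketch. The field names mirror the tuple, but the `redeem` logic, the use of `"public"` to mean anyone, and the random generator are assumptions; the source does not document how MCAT generates or validates tickets, and group membership and collection-level tickets are not modeled here.

```python
import secrets
import string
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_value: str
    object_id: int
    user_id: str        # a user name, or "public" for anyone (an assumption here)
    action_id: str
    begin_time: float
    end_time: float
    access_count: int   # remaining number of uses
    ticket_giver: str


def new_ticket_value(length=10):
    """Generate a random alphanumeric ticket string (illustrative generator only)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def redeem(ticket, user_id, action_id, now):
    """Check holder, action, time window and remaining uses; consume one use on success."""
    ok = (
        ticket.user_id in (user_id, "public")
        and ticket.action_id == action_id
        and ticket.begin_time <= now <= ticket.end_time
        and ticket.access_count > 0
    )
    if ok:
        ticket.access_count -= 1
    return ok
```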

Apart from these properties, which apply to all data objects (and hence can be called system-oriented meta information), one can associate properties that are specific to a data object or to collections holding data objects of the same or similar type. This set of properties can be termed application-oriented, domain-oriented or user-defined meta information. Domain-oriented meta information is not supported in release 1.1, but we plan to include facilities for

  • creating application-oriented meta information schema,
  • supporting insertion, deletion, update operations on the schema, and
  • supporting data discovery through the schema.

Domain-oriented meta information can come from two types of sources: it might reside inside MCAT (as stated above) or come from other catalogs that hold such information. We plan to consider extending the MCAT system to interface with other catalogs in order to acquire, ingest and publish meta information from and to external catalogs.

There are other concepts and attributes that MCAT currently does not support for data objects. These include partitioning of data objects, versioning, multi-valences, lineage (both in terms of data objects and methods used), derivatives, applicable methods, triggers, locking, public and private access keys and encryption keys on data objects or collections, summaries or aggregations (e.g., a thumbnail sketch of an image), performance measurements, commerce attributes, etc. These will be added incrementally in later releases.



Resource View

As mentioned earlier, MCAT is mainly data-centric. But other views are also possible. Coming from the resource-centric view point, we have the following meta information on resources.

  1. name
  2. type
  3. access address
  4. default location template in resource for storing data objects. (useful mainly for storage resources)
  5. replica-number (copies of same resource, any one of the copies would do)
  6. comments

An ACL for resources is also maintained in MCAT. (Currently all registered users have access to all resources registered in MCAT). MCAT also has facilities for auditing access to resources (currently no auditing is performed on resources).

Several other kinds of meta information will be added incrementally to this list in later releases. For example, a set of static and dynamic properties is being gathered for inclusion as part of our study on utilizing the Network Weather Service and other resource weather services. Once the domain-oriented meta information facility has been incorporated into MCAT, one can also use it to ingest resource-specific meta information.

From a resource-centric view, apart from the above properties of resources (which are supported and used currently), MCAT also provides an abstract definition for resources. In this abstraction, one can gather a set of physical resources to form a single logical resource with some abstract properties. For example, in our current release, we have defined the concept of a 'replicated-resource' as a logical resource formed from a set of physical (possibly heterogeneous) resources. When one creates a data object in such a replicated-resource, the object is replicated in each of the component physical resources. Hence a user has the view of writing to a single resource, but the data gets written to multiple resources under the covers. This is useful for enhancing throughput and for fault tolerance. Other concepts can be defined similarly, such as 'striped-resources' (blocks written to different resources on a round-robin basis), 'logical stripes' (data written to different resources based on some underlying logical composition, e.g., image files in one place, data files in another place and metadata files in yet another place), 'write-once resources', 'read-only resources', etc., and combinations thereof. The same physical resource can be bundled into several logical resources and even tailored to individual users' needs. Such a logical resource abstraction also allows one to define classes of service (as in HPSS), mimic other resource types (e.g., a printer is a write-only storage resource), define new kinds of parallel I/O (e.g., striped-resources can provide parallel channels to a data object), etc. Note that in order to use these abstractions, one not only needs to define them in MCAT, but also needs to have them enabled in SRB (not an easy task).
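The replicated-resource idea can be sketched as a logical resource that fans a single write out to its components. Both classes below are hypothetical illustrations (an in-memory dictionary stands in for real storage); they do not reflect how SRB drivers are actually written.

```python
class MemoryResource:
    """Stand-in for a physical storage resource; keeps objects in a dict."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def write(self, path, data):
        self.store[path] = data

    def read(self, path):
        return self.store[path]


class ReplicatedResource:
    """Logical resource: one write call is applied to every component resource."""

    def __init__(self, name, components):
        self.name = name
        self.components = list(components)

    def write(self, path, data):
        for resource in self.components:
            resource.write(path, data)

    def read(self, path):
        # Any replica will do; fall through to the next one on a miss.
        for resource in self.components:
            try:
                return resource.read(path)
            except KeyError:
                continue
        raise KeyError(path)
```

Because the logical resource exposes the same `write` and `read` calls as a physical one, the caller sees a single resource while the data lands on every component, which is where the fault tolerance comes from.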

In addition to the data-centric view, one can also view data objects from the resource point of view as to what is stored in each resource and which users have created or have access to objects in a resource.



User View

From a user-view, the following attributes are stored:

  1. name
  2. type (discussed below)
  3. address
  4. email
  5. phone
  6. pass phrase

A user is given a type. We have classes of users (privileged (e.g., srbadmin), normal, projects) that can be used to define different capabilities.

A user is also associated with at least one domain:

  1. domain

A domain is a logical notion in MCAT, used to distinguish users at some higher level. Domains can be physical, such as sdsc, ucsd or caltech users, or abstract, such as the project domains 'npaci' and 'adl'. Domains are defined in a hierarchical lattice. Currently, domains are used as a vehicle for authenticating users, allowing the same name to be used for users in different domains.

MCAT also supports the concept of a group of users. One can register a user-group and associate individual users with that group. Access control can be applied at the group level. Recursive definitions of user-groups are currently not allowed. User-group names are treated as individual users, and hence groups can also create, delete and/or update data objects. A user-group called 'public' is defined by default and all users are members of it. A user can belong to more than one group:

  1. user-group(s)



Method View

(not implemented in Version 1.1)
Methods can be viewed as any executable code that has a well-defined list of input and output parameters. Methods may be of two kinds: (1) executable under the control of MCAT/SRB and (2) executable outside MCAT/SRB. The first kind would be used as proxies, user-defined functions, conversion and translation programs, dynamic methods, etc., that SRB or MCAT can use to perform some operation at the server level. The second kind of method is for execution either at the client side or by a third-party computation daemon such as Globus or Legion. This section will be modified as and when methods are incorporated into MCAT/SRB.



Dynamic Meta Information

(not implemented in Version 1.1)
Apart from the properties and attributes listed above (both implemented and otherwise), there are properties of the above entities that are dynamic, i.e., that need to be computed at the time of access. For example, the location of a data object in a hierarchical storage system might be dynamic, since the object may reside in any of the caches, on tape or on the shelf; likewise, the available space in a disk resource may need to be calculated dynamically. Such properties are associated with a method that is invoked when the value is accessed. We plan to include facilities for such dynamic attributes in a future release. Incorporation of metadata from the Network Weather Service and other resource weather services would possibly use such dynamic methods.
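One way to picture an attribute that is computed at access time is an attribute store in which a value may be either a constant or a method. The `Attributes` class, the `free_blocks` variable and the 4096-byte block size below are hypothetical; since this feature is not implemented in Version 1.1, the sketch only illustrates the concept.

```python
class Attributes:
    """Attribute bag where a value may be static or a zero-argument callable."""

    def __init__(self, **values):
        self._values = values

    def get(self, name):
        value = self._values[name]
        # Dynamic attributes are stored as callables and evaluated on each access.
        return value() if callable(value) else value


free_blocks = [500]  # mutable stand-in for a live measurement

disk = Attributes(
    name="disk1",                                       # static attribute
    available_space=lambda: free_blocks[0] * 4096,      # dynamic attribute
)
```

Reading `available_space` twice can give different answers if the underlying measurement changes in between, which is exactly the behaviour that distinguishes dynamic from static meta information.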



Data Models and Data Exchange Formats for Meta Information

Data models are standards used for structuring information that can be used by a group. Dublin Core and FGDC are two examples of data models. Data exchange formats provide a standard means to communicate metadata; Z39.50 and XML are examples of such communication standards. For MCAT to be widely accepted, it should be able to deal with widely used data models and exchange formats. Currently, MCAT uses its own data model and exchange format, called MAPS (Metadata Attribute Presentation Structure). Hence, mappings from MAPS to other models and exchange formats would be highly useful. Such two-way mappings would be useful for presenting MCAT meta information in the formats of other models and also for accessing (and possibly ingesting) meta information from repositories that use other models (e.g., to access domain-oriented meta information from other catalogs). In Version 1.1, no such mapping is provided. A mapping to the Dublin Core format is under implementation. Other mappings would be incorporated in later releases.



Uniform Access Interface to Meta Information Attributes

We discuss a uniform interface for obtaining meta information from the MCAT catalog. The interface uses a structure called Metadata Attribute Presentation Structure (or MAPS) that is independent of the internal representation of the attributes inside the catalog. In fact, as currently implemented, the attribute structure is stored as a set of relational tables numbering around 30. In an LDAP framework, the structure might be stored in a tree format and in a file system, the information might be stored in a hashed, hierarchical file system. MAPS provides a uniform interface specification that can be used between user applications and the MCAT catalog and also between the MCAT catalog and external catalog systems.

There are four different structures which form the MAPS:

  • MAPS_Query_Struct, a structure used in querying the meta catalog,
  • MAPS_Result_Struct, a structure for transferring results from the catalog,
  • MAPS_Update_Struct, a structure for ingesting and modifying meta information in meta catalog
  • MAPS_Definition_Struct, a structure used for defining and exchanging meta information schemata

We define below the grammar for the four structures. A query to MCAT is expressed in the MAPS query structure format. The MAPS format provides a flexible means of querying for meta information, but does not allow arbitrarily complicated queries. The rules shown in highlighted color correspond to the set currently implemented in Version 1.1.


MAPS Query Structure

The MAPS Query Structure is defined by the following grammar:

< MCAT_Query_Struct >  ::= < MAPS_Query_Struct >
 ::= < Union_Criteria > < MAPS_Query_Struct > union < MAPS_Query_List >
< MAPS_Query_List >  ::= < MAPS_Query_Struct >
 ::= < MAPS_Query_Struct > union < MAPS_Query_List >
< Union_Criteria >  ::= function | other
< MAPS_Query_Struct >  ::= < Query_Id > : < Schema_List > : < Query_Restriction > : < Out_List > :< Q_Condition_List >
 ::= < Auxiliary_Query_Struct >
< Query_Id >  ::= null
 ::= number
< Schema_List >  ::= < Schema_Name >
 ::= < Schema_Name > , < Schema_List >
< Schema_Name >  ::= null /* implicitly uses MCAT's core schema*/
 ::= name
< Query_Restriction >  ::= [distinct] count number
 ::= others
< Out_List >  ::= < Attribute_Id > < Aggregate_Oper > < Sort_Oper >
 ::= < Attribute_Id > < Aggregate_Oper > < Sort_Oper >, < Out_List >
< Attribute_Id >  ::= identifier for attribute in schema list
< Sort_Oper >  ::= null
 ::= ordering_number asc | ordering_number desc | other
< Aggregate_Oper >  ::= max | min | avg | sum | count | group
 ::= other
< Q_Condition_List >  ::= null
 ::= < Q_Condition >
 ::= < Q_Condition > & < Q_Condition_List >
< Q_Condition >  ::= < Attribute_Function > < Attribute_Id > < Comparison_Struct >
< Attribute_Function >  ::= null
 ::= other
< Comparison_Struct >  ::= < Comparator > < Const_Struct >
 ::= < Not_Value > < Special_Comp_Struct >
< Comparator >  ::= <> | < | > | <= | >= | other
< Const_Struct >  ::= < Const_Value > < Const_Type >
< Const_Value >  ::= quoted string
 ::= (number , ... , number ) /* list of numbers */
 ::= (quoted string , ... , quoted string ) /* list of strings */
 ::= other
< Const_Type >  ::= null
 ::= ( logical_type_definition , ontology , language , other )
< Not_Value >  ::= not
< Special_Comp_Struct >  ::= between < Const_Value > and < Const_Value >
 ::= in ( < In_List > )
 ::= like < Like_Value > < Escape_Expr > /* value in catalog should be a string */
 ::= includes < Const_Value > /* value in catalog should be a list */
 ::= superset ( < In_List > ) /* value in catalog should be a list */
 ::= subset ( < In_List > ) /* value in catalog should be a list */
 ::= other
< In_List >  ::= < Const_Value >
 ::= < Const_Value >, < In_List >
< Like_Value >  ::= quoted string with '_' used as the wild card for a single character and '%' as the wild card for a string
< Escape_Expr >  ::= escape character
< Auxiliary_Query_Struct >  ::= count number
 ::= skip number count number

Note that the MAPS query format is a derivative of SQL syntax. We followed SQL for two reasons:

  • large metadata catalogs require database systems (and our MCAT is also implemented in DB2 and Oracle), and
  • metadata are normally given as attribute-value pairs whose search can easily be translated into SQL-like query.

The MAPS format also has properties similar to those of Z39.50. The definition of MAPS also does not preclude a hierarchical structure in the attribute set. Such a hierarchical relationship (common in the object-oriented paradigm), if it exists, can be captured in the schema definition and used by the meta catalog that provides the answers; the query view can still be flat.

A query to MCAT consists of a list of queries in MAPS format whose answer sets are unioned to obtain the final result. The answer sets from the list should be union-compatible, and the union criteria determine how the union is performed. An example of a criterion might be a function that interleaves data on a certain percentage basis. The current implementation deals with a single MCAT query. An MCAT query can span more than one schema, which by default includes MCAT's core schema described in the third section. An example of a query that spans more than one schema is given below:

Let ELIB_FLORA be a schema that holds metadata about pictures of flowers, including when and where each picture was taken. Let WEATHER be a schema that holds information about rainfall. Then a query across the two schemas might ask for a picture of a particular flower taken during a month when the average rainfall was more than 1 inch. Another query might ask for all pictures of flowers taken in a particular region that had sparse rainfall in the month when the picture was taken. If there is another catalog called ADL_REGION holding metadata about aerial maps of regions, then another query may span the three schemas and ask for pictures taken of the region during that time.

An important point to note is that even though the MAPS query structure has a condition list and a selection attribute list, as in SQL statements, there is no concept of a table. The condition list provides only conditions on individual attributes and does not provide any criteria for joining tables, even if the schema is stored in multiple tables in a database. We chose this approach for three reasons:

  • A user may not be conversant in SQL and hence, even if aware of the tabular structure of the meta catalog, may not be able to write a SQL statement. This may lead to canning of queries written by expert SQL programmers, again defeating the purpose of general data discovery.
  • The query language should not be dependent upon the implementation of the schema. Hence, even if the underlying schema implementation changes, the user should be immune from such changes.
  • Existing and emerging metadata models such as Dublin Core and FGDC are flat structures and are amenable to use through MAPS data structures.

Our MAPS model is based on the premise that the user should be aware only of the attributes that are of relevance and be able to query based on those attributes. Intuitively, a user has some conditions on some attribute values and would like to query the corresponding values of other attributes. The system should take this information, dynamically generate a query against the implementation and produce a result. The features of MCAT's automatic query generation system, which is based on this information, are discussed in detail in the section "Implementation of the MCAT catalog".

As can be seen, the MAPS query structure is somewhat restrictive compared to full-fledged SQL. For example, one is not allowed to nest queries. We consider that such complex query structures, though needed in OLTP and OLAP applications, are not relevant in a data or resource discovery framework. One can express disjunctive conditions on an attribute through in-lists, conjunctive conditions through the condition list, and combine the answer sets of separate queries through the union operation.
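To make the idea of table-free querying concrete, the sketch below takes only attribute names and per-attribute conditions and infers the tables and join predicates itself. The internal layout (`ATTRIBUTE_HOME`, `JOINS`, the table and column names) and the function `generate_sql` are invented for illustration; the actual MCAT query generator and its internal schema are not shown in this document.

```python
# Hypothetical internal layout: where each user-visible attribute is stored.
ATTRIBUTE_HOME = {
    "data_name": ("data_object", "name"),
    "data_size": ("data_object", "size"),
    "collection_name": ("collection", "name"),
}
# Hypothetical join predicates between pairs of internal tables.
JOINS = {
    frozenset({"data_object", "collection"}): "data_object.coll_id = collection.id",
}
ALLOWED_OPS = {"=", "<>", "<", ">", "<=", ">="}


def column(attr):
    table, col = ATTRIBUTE_HOME[attr]
    return f"{table}.{col}"


def generate_sql(out_attrs, conditions):
    """Build a SELECT from attribute names only; tables and joins are inferred."""
    tables = []
    for attr in list(out_attrs) + [attr for attr, _, _ in conditions]:
        table = ATTRIBUTE_HOME[attr][0]
        if table not in tables:
            tables.append(table)

    where = []
    for attr, op, _ in conditions:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported comparator: {op}")
        where.append(f"{column(attr)} {op} ?")
    # Add a join predicate for every pair of involved tables with a known link.
    for i, left in enumerate(tables):
        for right in tables[i + 1:]:
            link = JOINS.get(frozenset({left, right}))
            if link:
                where.append(link)

    sql = f"SELECT {', '.join(column(a) for a in out_attrs)} FROM {', '.join(tables)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, [value for _, _, value in conditions]
```

A caller asking for `data_name` and `data_size` where `collection_name = 'flowers'` never mentions a table; the join between the two hypothetical tables is added by the generator.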



MAPS Result Structure

The MAPS Result structure is defined by the following grammar:

< MCAT_Result_Struct >  ::= < MAPS_Result_Struct >
< MAPS_Result_Struct >  ::= < Query_Id > : < Result_Height > < Result_Width > < Continuation_Index > < Result_Struct >
< Result_Struct >  ::= < Attribute_Result_Struct >
 ::= < Attribute_Result_Struct > : < Result_Struct >
< Attribute_Result_Struct >  ::= < Schema_Name > : < Attribute_Id > : < Value_List >
< Value_List >  ::= < Const_Struct >
 ::= < Const_Struct > < Delimiter > < Value_List >
< Delimiter >  ::= special character
< Continuation_Index >  ::= flag/number /* denoting if more rows are available */

The result structure provides a means to transfer tabular data from the catalog to a client or from one catalog to another. Because it carries schema names and attribute identifiers, the result structure can be processed independently of the query information. The same result structure is used to return results from a main MAPS query or an auxiliary MAPS query. The query identifier is a provision for non-blocking operation, but query and answer processing are currently serialized in Version 1.1.
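The column-by-column layout implied by the result grammar (one value list per schema name and attribute identifier) can be sketched as below. The dictionary representation, the function names and the `|` delimiter are illustrative choices, not the MCAT wire format, and values are assumed not to contain the delimiter.

```python
def build_result(query_id, rows, columns, more=False, delimiter="|"):
    """Arrange row-oriented results column by column.

    `columns` is a list of (schema_name, attribute_id) pairs, one per column.
    """
    return {
        "query_id": query_id,
        "result_height": len(rows),
        "result_width": len(columns),
        "continuation_index": 1 if more else 0,  # 1 means more rows are available
        "attributes": [
            {
                "schema_name": schema,
                "attribute_id": attr,
                "value_list": delimiter.join(str(row[i]) for row in rows),
            }
            for i, (schema, attr) in enumerate(columns)
        ],
    }


def rows_from_result(result, delimiter="|"):
    """Recover the rows (as tuples of strings) from the column-oriented structure."""
    if result["result_height"] == 0:
        return []
    columns = [a["value_list"].split(delimiter) for a in result["attributes"]]
    return list(zip(*columns))
```

Since each column is labelled with its schema and attribute, a receiver can rebuild the table without access to the original query.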



MAPS Update Structure

The MAPS Update structure is defined using the following grammar:

< MCAT_Update_Struct >  ::= < MAPS_Update_Struct >
< MAPS_Update_Struct >  ::= insert < Schema_Name > : < Insert_List >
 ::= delete < Schema_Name > : < Delete_List > : < Q_Condition_List >
 ::= update < Schema_Name > : < Insert_List > : < Q_Condition_List >
 ::= insert token < Schema_Name > < Token_Name > < Const_Value > [ < Parent_Const_Value > ]
< Insert_List >  ::= < Attr_Value_Pair >
 ::= < Attr_Value_Pair > , < Insert_List >
< Attr_Value_Pair >  ::= ( < Attribute_Id > , < Const_Struct > )
< Delete_List >  ::= < Attribute_Id >
 ::= < Attribute_Id > , < Delete_List >
< Parent_Const_Value >  ::= < Const_Value >

The implementation of the insert and delete lists in Version 1.1 differs slightly from the grammar. Instead of a list of attribute-value pairs, we use two lists: one of attribute ids and another of values.
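The relationship between the two representations is simple to state in code: the two parallel lists used by the implementation zip into the attribute-value pairs of the grammar. The function name `build_insert`, the dictionary layout and the numeric attribute ids in the usage are hypothetical.

```python
def build_insert(schema_name, attribute_ids, values):
    """Pair up two parallel lists into the (attribute_id, value) pairs of the grammar."""
    if len(attribute_ids) != len(values):
        raise ValueError("attribute and value lists must have the same length")
    return {
        "operation": "insert",
        "schema_name": schema_name,
        "insert_list": list(zip(attribute_ids, values)),
    }
```

For example, `build_insert("MCAT", [2, 4], ["rose.gif", 1024])` yields an insert list of `[(2, "rose.gif"), (4, 1024)]`.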



MAPS Definition Structure

The MAPS Definition structure is formed as follows:

< MCAT_Definition_Struct >  ::= < MAPS_Definition_Struct >
< MAPS_Definition_Struct >  ::= < Schema_Name > < Schema_Definition >
 ::= < Schema_Name > < Schema_Meta_Information >
 ::= < Schema_Name > < Schema_Constraints >
 ::= < Schema_Name > < Intra_Schema_Relation_Definition >
 ::= < Schema_Name > < Inter_Schema_Relation_Definition >
< Schema_Definition >  ::= < Token_Definitions > : < Attribute_Definitions >
< Token_Definitions >  ::= null
 ::= < Single_Token_Definition >
 ::= < Single_Token_Definition > , < Token_Definitions >
< Attribute_Definitions >  ::= < Single_Attribute_Definition >
 ::= < Single_Attribute_Definition > , < Attribute_Definitions >
< Single_Token_Definition >  ::= < Token_Name > < Token_Identifier_Type > < Token_Data_Type > < Token_Root_Value >
< Token_Identifier_Type >  ::= hierarchical | list
< Token_Name >  ::= name
< Token_Data_Type >  ::= < Data_Type >
< Data_Type >  ::= double | char( integer ) | varchar( integer ) | binary( integer ) | blob( integer [K|M|G] ) | date | timestamp
< Token_Root_Value >  ::= < Const_Value >
< Single_Attribute_Definition >  ::= token < Attribute_Name > [ < Schema_Name > . ] < Token_Name > [< Attribute_Default_Value > ]
< Single_Attribute_Definition >  ::= < Attribute_Name > < Attribute_Data_Type > [nullable] [< Attribute_Default_Value > ]
< Attribute_Name >  ::= name
< Attribute_Default_Value >  ::= < Const_Value >
< Attribute_Data_Type >  ::= < Data_Type >
< Schema_Meta_Information >  ::= null
 ::= resource_name
 ::= database_name
 ::= owner_name
 ::= subject of schema
 ::= comments
 ::= < Schema_Meta_Information > < Schema_Meta_Information >
< Schema_Constraints >  ::= null
 ::= < Q_Condition_List >
< Intra_Schema_Relation_Definition >  ::= [ < Attribute_Function > .] < Attribute_Set > < Tabular_Relation > [ < Attribute_Function > .] < Attribute_Set >
< Tabular_Relation >  ::= 1:1 | 1:n | n:m /* 1:1, 1:n or n:m relationships between two attributes; all others are considered as unrelated */
< Inter_Schema_Relation_Definition >  ::= [ < Attribute_Function > .] < Attribute_Set > < Comparator > [ < Attribute_Function > .] < Schema_Name > . < Attribute_Set >
< Attribute_Set >  ::= < Attribute_Name>
 ::= ( < Attribute_Name> ... < Attribute_Name> )

The MAPS definition structure provides the grammar for defining new meta catalog schema and also for interrelating two schemas through their attributes. Even though, one is tempted to use SQL's create table command as a means of defining the schema of a meta catalog, we refrained from that approach because of the imposition that users be SQL-conversant. Instead, we chose an approach that ties in with well with the MAPS query and result structures that deal with lists of attributes and comparisons among them. Hence, to define a new meta catalog schema, the user specifies an attribute list along with the types of each attribute and further provide how the attributes are related using 1:1, 1:n and n:m relationships. This information should be intuitive to the creator of the meta information and is independent of the underlying database or other catalog language (eg. LDAP) implementation. The inter-schema relationship definition provides a means defining the means of interoperating two independent schemas that may have common attributes or expressions.

The Token Definitions provide a means of defining the domain and range characteristics of some of the attribute values and, if necessary, of defining an ontology for the attribute values. We discuss how the token identifier type is used in MAPS query and MAPS update generation in the section on [#Implementation of the MCAT catalog Implementation of the MCAT catalog].

The schema meta information provides a means of defining physical characteristics of the schema, such as where the schema is stored and who its owner is. The schema constraints can be seen as providing a high-level integrity constraint on the schema being created. Such a constraint is a condition that can be checked when inserting new values into the schema database or when answering a query. For example, if a schema has the constraint that subject = 'flora' or subject = 'flowers', then this information can be used to decide whether a query is applicable to this schema or not. If a query has 'fauna' as its subject, then one can avoid using this schema. The schema constraint can also be used for resource discovery.
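
The flora/fauna example above can be sketched in C as follows. This is only an illustration: the schema_constraint type and the schema_applies function are hypothetical names and not part of MCAT, and the sketch assumes that a constraint is a simple disjunction of permitted subject values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of a schema constraint of the form
   subject = v1 or subject = v2 or ... (not part of the MCAT API). */
typedef struct
{
  const char  *schema_name;        /* schema the constraint belongs to */
  const char **allowed_subjects;   /* values permitted for 'subject' */
  size_t       allowed_count;      /* number of permitted values */
} schema_constraint;

/* Returns 1 if a query on query_subject is applicable to the schema,
   0 if the schema can be skipped for this query. */
int schema_applies(const schema_constraint *sc, const char *query_subject)
{
  size_t i;

  assert(sc != NULL && query_subject != NULL);
  for (i = 0; i < sc->allowed_count; i++)
    if (strcmp(sc->allowed_subjects[i], query_subject) == 0)
      return 1;
  return 0;
}
```

With the permitted values 'flora' and 'flowers', a query whose subject is 'fauna' yields 0, so the schema is not consulted.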

Note that, given the way the grammar is defined, the different parts of a MAPS definition can be supplied at different times. The schema name must be unique across the MCAT system. The MAPS definition structures are planned to be part of upcoming releases.



Implementation of the MCAT catalog

In this section, we describe the current implementation of MCAT in detail. The implementation closely follows the MCAT architecture detailed in [#mcatimg001.gif Figure 1] and the grammar laid out in the [#Uniform Access Interface to Meta Information Attributes Uniform Access Interface to Meta Information Attributes] section. Version 1.1 provides two types of APIs for querying and administering the MCAT catalog. The first type provides general functionality for accessing and updating information in MCAT, whereas the second type provides such functionality for specific information in MCAT and is built using the first type. The first type of API provides a means to insert, delete and query the information in MCAT using the MAPS structure. The APIs are described in the [#MCAT API MCAT API] appendix section. Three data structures are used for exchanging information to and from application programs. The MAPS Query Structure as used in the MCAT library is defined as follows:


  typedef struct                                /* Q_condition_List */
  {
    char *tab_name;                             /* both these names together form */
    char *att_name;                             /* the attribute identifier */
    char *att_val;                              /* value used in the condition */
  }mdasC_conditioninfo;
  typedef struct                                /* Out_List */
  {
    char *tab_name;
    char *att_name;
    char *aggr_val;                             /* aggregation operator */
  }mdasC_selectinfo;
  
  typedef struct                                /* MAPS_Query_Struct */
  {
    int condition_count;                             /* size of Q_condition_List */
    mdasC_conditioninfo sqlwhere[MAX_SQL_LIMIT];
    int select_count;                                /* size of  Out_List */
    mdasC_selectinfo sqlselect[MAX_SQL_LIMIT];
  }mdasC_sql_query_struct;
  int answer_count;                             /* number of answer tuples requested */
  int distinct;                                 /* distinct flag */

Note that the query structure mirrors the highlighted region of the [#MAPS Query Structure query grammar]. The constant MAX_SQL_LIMIT limits both the number of conditions given and the number of attribute values expected in the result. It is set to 250 in the current implementation. The answer_count is also used by the auxiliary query structure.
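
As an illustration of how an application might fill this structure, consider the minimal sketch below. The helper functions add_condition and add_select are hypothetical and not part of the MCAT library, the table and attribute names used with them are made up, and the exact format expected in att_val is an assumption. The structure definitions are repeated so that the fragment compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SQL_LIMIT 250   /* value stated in the text */

typedef struct { char *tab_name; char *att_name; char *att_val;  } mdasC_conditioninfo;
typedef struct { char *tab_name; char *att_name; char *aggr_val; } mdasC_selectinfo;

typedef struct
{
  int condition_count;
  mdasC_conditioninfo sqlwhere[MAX_SQL_LIMIT];
  int select_count;
  mdasC_selectinfo sqlselect[MAX_SQL_LIMIT];
} mdasC_sql_query_struct;

/* Hypothetical helper: appends one condition to the Q_condition_List.
   Returns 0 on success, -1 if MAX_SQL_LIMIT conditions are already set. */
int add_condition(mdasC_sql_query_struct *q, char *tab, char *att, char *val)
{
  assert(q != NULL);
  if (q->condition_count >= MAX_SQL_LIMIT)
    return -1;
  q->sqlwhere[q->condition_count].tab_name = tab;
  q->sqlwhere[q->condition_count].att_name = att;
  q->sqlwhere[q->condition_count].att_val  = val;
  q->condition_count++;
  return 0;
}

/* Hypothetical helper: appends one attribute to the Out_List.
   aggr may be NULL when no aggregation operator is wanted. */
int add_select(mdasC_sql_query_struct *q, char *tab, char *att, char *aggr)
{
  assert(q != NULL);
  if (q->select_count >= MAX_SQL_LIMIT)
    return -1;
  q->sqlselect[q->select_count].tab_name = tab;
  q->sqlselect[q->select_count].att_name = att;
  q->sqlselect[q->select_count].aggr_val = aggr;
  q->select_count++;
  return 0;
}
```

A caller would start from a zero-initialized mdasC_sql_query_struct, add its conditions and output attributes, and pass the structure together with answer_count and distinct to the library.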

The query structure at the client level is much simpler than the MAPS query structure at the library level.


  int sel_val[]                                   
  char query_val[][]
  int answer_count                              /* number of answer tuples requested */
  int distinct                                   /* distinct flag */  

The sel_val array of integers has a one-to-one mapping onto the mdasC_selectinfo structure of the mdasC_sql_query_struct, and the query_val array of strings has a one-to-one mapping onto the mdasC_conditioninfo structure of the mdasC_sql_query_struct. This mapping makes for efficient client-server communication. The client can learn about the schema structure by querying the server schema. The MAPS to Schema Converter module in [#mcatimg001.gif Figure 1] converts this external format to the internal MAPS format and then partitions the query according to the schemata being queried. The Dynamic Query Generator module takes this internal representation and generates queries for the underlying database. In the case of Oracle or DB2, the system generates SQL queries according to their idiosyncrasies. (A similar query generation can be done for other catalog systems such as LDAP.) The query generation is an involved process and uses Steiner tree generation algorithms to generate an efficient query for the schema based on the given conditions and selection attributes. A detailed description of the query generation will be made available separately.
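
The one-to-one mapping from the client-level arrays to the library-level structure could look roughly like the sketch below. It rests on several assumptions that the text does not confirm: that position i in both arrays stands for attribute i of a shared attribute table (the hypothetical attr_entry type), that a non-zero sel_val[i] requests attribute i in the output, and that a non-empty query_val[i] carries the condition on attribute i. The function client_to_maps and the constant MAX_VAL_LEN are likewise made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SQL_LIMIT 250
#define MAX_VAL_LEN   64    /* hypothetical width of one query_val entry */

typedef struct { char *tab_name; char *att_name; char *att_val;  } mdasC_conditioninfo;
typedef struct { char *tab_name; char *att_name; char *aggr_val; } mdasC_selectinfo;

typedef struct
{
  int condition_count;
  mdasC_conditioninfo sqlwhere[MAX_SQL_LIMIT];
  int select_count;
  mdasC_selectinfo sqlselect[MAX_SQL_LIMIT];
} mdasC_sql_query_struct;

/* Hypothetical attribute table: entry i names the attribute that
   position i of sel_val and query_val refers to. */
typedef struct { char *tab_name; char *att_name; } attr_entry;

/* Converts the client-level arrays into a MAPS query structure.
   Returns 0 on success, -1 if there are too many attributes. */
int client_to_maps(const attr_entry *attrs, int attr_count,
                   const int sel_val[], char query_val[][MAX_VAL_LEN],
                   mdasC_sql_query_struct *q)
{
  int i;

  assert(attrs != NULL && q != NULL);
  if (attr_count < 0 || attr_count > MAX_SQL_LIMIT)
    return -1;
  q->condition_count = 0;
  q->select_count = 0;
  for (i = 0; i < attr_count; i++) {
    if (sel_val[i] != 0) {                 /* attribute wanted in the result */
      mdasC_selectinfo *s = &q->sqlselect[q->select_count++];
      s->tab_name = attrs[i].tab_name;
      s->att_name = attrs[i].att_name;
      s->aggr_val = NULL;
    }
    if (query_val[i][0] != '\0') {         /* condition given for attribute */
      mdasC_conditioninfo *c = &q->sqlwhere[q->condition_count++];
      c->tab_name = attrs[i].tab_name;
      c->att_name = attrs[i].att_name;
      c->att_val  = query_val[i];
    }
  }
  return 0;
}
```

Under these assumptions only two flat arrays need to be shipped to the server, while the table and attribute names are resolved there.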

The MAPS result structure is defined as follows:


  typedef struct                               /* Attribute_Result_Struct */
  {
    char *tab_name;   
    char *att_name;   
    void *values;                              /* pointer to an array containing a column of information   */
  }mdasC_resultinfo;
  typedef struct                               /* MAPS_Result_Struct  */
  {
    int result_count;                          /* number of columns in the result */
    int row_count;                             /* number of rows returned */
    mdasC_resultinfo sqlresult[MAX_SQL_LIMIT];
    int  continuation_index;                   /* positive number indicates that more rows exist */
  } mdasC_sql_result_struct;

Again note the similarity with the corresponding [#MAPS Result Structure result grammar]. The continuation_index can be used for requesting more data from the catalog after a query (or auxiliary query). The number of rows requested can differ between the query and the auxiliary query and also between auxiliary queries. Note that in Version 1.1 we are not using the query identifier, since parallel queries are not permitted in this version and the queries (and auxiliary queries) are blocking.
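
Reading values out of the result structure might look like the sketch below. The layout of the values array is not specified in this document, so the sketch assumes that each column is a contiguous array of NUL-terminated strings of fixed width COL_WIDTH; that constant and the helper functions result_value and more_rows_available are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SQL_LIMIT 250
#define COL_WIDTH     64    /* hypothetical fixed width of one value */

typedef struct
{
  char *tab_name;
  char *att_name;
  void *values;             /* one column of the result */
} mdasC_resultinfo;

typedef struct
{
  int result_count;         /* number of columns */
  int row_count;            /* number of rows */
  mdasC_resultinfo sqlresult[MAX_SQL_LIMIT];
  int continuation_index;   /* positive if more rows exist */
} mdasC_sql_result_struct;

/* Returns the value of attribute att_name in the given row, or NULL if
   the row is out of range or the attribute is not part of the result. */
const char *result_value(const mdasC_sql_result_struct *r,
                         const char *att_name, int row)
{
  int col;

  assert(r != NULL && att_name != NULL);
  if (row < 0 || row >= r->row_count)
    return NULL;
  for (col = 0; col < r->result_count; col++)
    if (strcmp(r->sqlresult[col].att_name, att_name) == 0)
      return (const char *) r->sqlresult[col].values
             + (size_t) row * COL_WIDTH;
  return NULL;
}

/* Returns 1 if another request can fetch further rows for this query. */
int more_rows_available(const mdasC_sql_result_struct *r)
{
  assert(r != NULL);
  return r->continuation_index > 0;
}
```

When more_rows_available returns 1, the continuation_index is the value to pass as result_descr to get_more_rows (see the MCAT API appendix).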

The Answer Extractor and Cursor Control module in [#mcatimg001.gif Figure 1] deals with the database system and controls the answer sets given back to the application. The Schema to MAPS Converter module packages the answers and ships them to the client.



Querying MCAT using Scommands

Scommands is a utility that provides Unix-type shell-level commands for accessing SRB and MCAT. A few of the commands are directed towards MCAT only, and the rest are directed towards SRB, which in turn uses MCAT to perform internal data and resource discovery. Man pages for these commands are available both at the Unix level and on the web.

  MCAT-specific Commands   Comments
  Scd                      Changes working SRB collection
  Schmod                   Modifies access permits for SRB objects and collections
  Senv                     Displays environmental file content
  Sexit                    Clears environmental files created during SRB operation at the client-level
  SgetD                    Displays information about SRB objects
  SgetR                    Displays information about SRB resources
  SgetT                    Displays information about SRB tickets
  SgetU                    Displays information about SRB users
  Shelp                    Displays one-line help messages for all Scommands
  Sinit                    Initializes environmental files for SRB operation and connects to SRB/MCAT to authenticate the user
  Sls                      Lists SRB objects and collections
  Smkdir                   Creates a new SRB collection
  SmodD                    Modifies metadata information about SRB objects
  Smv                      Changes the collection for objects in SRB space
  Spasswd                  Changes user passwords
  Spwd                     Displays current working SRB collection
  Sregister                Registers an object as a SRB object in MCAT
  Srename                  Changes names of a SRB object or collection
  Srmdir                   Removes an existing SRB collection
  Srmticket                Removes a ticket
  Stcat                    Streams ticketed SRB objects to standard output
  Sticket                  Issues tickets for SRB objects and collections
  Stls                     Lists ticketed SRB objects and collections
  Stoken                   Displays information about SRB-aware types

  SRB-directed Commands    Comments
  Scat                     Streams SRB objects to standard output
  Scp                      Copies an object to a new collection in SRB space
  Sget                     Exports SRB objects into local file system
  Sphymove                 Moves a SRB object to a new SRB resource
  Sput                     Imports local files into SRB space
  Sreplicate               Replicates an existing SRB object in a (possibly) new resource
  Srm                      Removes SRB objects



Installation of MCAT

To install the Metadata Catalog (MCAT), follow the steps below (for testing only, see the section below on that topic):

  • In the following, we assume that a database called MCAT is installed on either DB2 or ORACLE. We also assume that the user-id installing SRB and MCAT has privileges for creating objects in the database. The database should have roll-forward recovery set and backups done so as to enable recovery from crashes. If you want another name for the database, please refer to the section on MCAT CONFIGURATION.
  • We assume that the SRB-MCAT package has been compiled successfully. In the discussion, we call the bin directory of SRB SRBBINDIR and the data directory SRBDATADIR. We assume that they are sibling directories in the file system.

The following files should be available in SRBDATADIR directory:

  • MdasConfig
  • metadata.fkrel
  • catalog.install.???
  • catalog.cleanup.???
  • install.results.???
  • test.results.???
  • startmcat.???


where ??? is the database-specific extension (e.g. db2, ora).

The following files should be available in the SRBBINDIR directory:

  • test.catalog
  • test_srb_mdas_*****

Installation Procedure

  1. Set environmental variable 'srbData' to SRBDATADIR
  2. Change the file MdasConfig to suit your installation. For more details refer to the section below on MCAT CONFIGURATION.
  3. Change directory to SRBBINDIR/..
  4. Run startmcat.???, choosing the extension for the database on which you are installing MCAT. The startmcat routine creates all necessary MCAT catalog tables, indices, aliases and views, and populates the database with some minimal data. The routine also tests the installation and the interface between MCAT and SRB.
  5. The startmcat routine compares the results of its execution with the expected results. If there are any diff messages in the output of the routine, make sure that the database setup and steps (1)-(3) have been completed properly. If the problems persist, comparing the files myinstall.results.??? and mytest.results.??? with the files install.results.??? and test.results.??? provides clues as to which routines failed during the installation. If the problems are deeper than expected, contact the SRB developers.
  6. If the installation completed successfully, then the MCAT is primed and ready to be used.

MCAT Configuration

The file MdasConfig holds all the system-level details tailoring each site's MCAT installation. The file contains the following information:

MDASDBTYPE < database-type >
MDASSCHEMENAME < schema-name >
MDASDBNAME < database-name >
MDASINSERTSFILE < insertion-log-file-name >
METADATA_FKREL_FILE < schema-semantics-file >
DB2INSTANCE < database-instance-name >
DB2PASSWORD < owner-password >
DB2USER < owner-name >
DB2LOGFILE < log-file-name >
DBHOME < database-home-directory >

< database-type > can be either db2 or oracle.
< schema-name > currently set to 'MCAT.' for db2 and 'sekar.' for oracle.
< database-name > mcat or another name that you want to give the catalog database.
< database-instance-name > instance name, if needed; can be null otherwise.
< owner-name > user-id creating and using MCAT.
< owner-password > password of the above. A few systems allow a null password and use unix-authentication. If a password is given here, make sure that the file is not readable by others.
< database-home-directory > home directory of the database.
< schema-semantics-file > set to 'metadata.fkrel'.
< insertion-log-file-name > file for logging insertions and updates; can be /dev/null.
< log-file-name > file for logging database error messages. Recommended to be in SRBDATADIR.
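
A reader for this file could be sketched as follows. The exact file syntax is not spelled out here, so the sketch assumes one keyword per line, separated from its value by whitespace, with the value allowed to be missing (as for a null instance name). The function parse_config_line and the buffer sizes are hypothetical; the keyword list is the one given above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define KEY_LEN   64    /* hypothetical buffer sizes */
#define VALUE_LEN 256

/* Keywords listed in the MCAT Configuration section. */
static const char *known_keys[] = {
  "MDASDBTYPE", "MDASSCHEMENAME", "MDASDBNAME", "MDASINSERTSFILE",
  "METADATA_FKREL_FILE", "DB2INSTANCE", "DB2PASSWORD", "DB2USER",
  "DB2LOGFILE", "DBHOME"
};

/* Splits one line of the form "KEYWORD value" into key and value.
   Returns 0 for a known keyword (value may be empty), -1 for a blank
   line or an unknown keyword. */
int parse_config_line(const char *line, char key[KEY_LEN],
                      char value[VALUE_LEN])
{
  size_t i;

  assert(line != NULL && key != NULL && value != NULL);
  key[0] = '\0';
  value[0] = '\0';
  if (sscanf(line, "%63s %255s", key, value) < 1)
    return -1;
  for (i = 0; i < sizeof known_keys / sizeof known_keys[0]; i++)
    if (strcmp(key, known_keys[i]) == 0)
      return 0;
  return -1;
}
```

Rejecting unknown keywords catches misspelled entries early, before startmcat or testmcat is run against the database.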


Testing of the MCAT Catalog

For testing without performing a cleanup and installation, perform the following steps:

    1. Set environmental variable 'srbData' to SRBDATADIR
    2. Change the file MdasConfig to suit your installation. For more details refer to the section above on MCAT CONFIGURATION.
    3. Change directory to SRBBINDIR/..
    4. Run testmcat
    5. If the tests did not complete successfully, check the files test.results and mytest.results for differences. If the differences cannot be explained by already existing MCAT objects or cannot be removed by modifying the configuration, contact the MCAT developers or reinstall the database (note that this is an extreme step which will clear all your data in MCAT).

MCAT Administrator Service

We have a set of tools that allow an administrator to ingest new system types into the MCAT catalog. Administrative tools for deleting and updating various system metadata elements are under development. There is a web form for applying to register a user or group. The form is automatically mailed to srb@sdsc.edu on submission. Ingestion routines have no client counterparts and need to be run on SRBHOST. All the commands require 'sysadmin' privileges.

Ingesting New User Names:

    1. Find the values for < UserDomain > and < UserType > that are appropriate for the user.
    2. The user name should be unique in the domain.
    3. Run ingestUser command as follows:
ingestUser < RegistrarName >  < RegistrarDomain >  < RegistrarPassword >  < UserName >  < UserPassword >  < UserDomain >  < UserType >  < UserAddress >  < UserPhone >  < UserEmail >

Ingesting New User Group Names:

    1. This is very similar to ingesting a new user name.
    2. Find the values for < GroupType > that are appropriate for the group. These are the same as < userType > group.
    3. Run ingestUsergroup command as follows:
ingestUsergroup   < RegistrarName >  < RegistrarDomain >  < RegistrarPassword >  < GroupName >  < GroupPassword >  < ContactAddress >  < ContactPhone >  < ContactEmail >
    4. The groups are placed in a separate domain called 'groups'.

Ingesting New (Physical) Resources:

    1. Find the values of < Location > and < ResourceType > that conform to the properties of the new resource. These can be obtained by using Stoken command of client utilities.
    2. Make sure the name of the resource is unique in MCAT. Use SgetR command of client utilities to check the uniqueness.
    3. Run ingestResource command as follows:
ingestResource  < RegistrarName >  < RegistrarDomain >  < RegistrarPassword >  < ResourceName >  < ResourceType >  < Location >  < DefaultPath >
    4. If either < Location > or < ResourceType > is not in MCAT or if < ResourceName > is not unique in MCAT, errors may result.
    5. < DefaultPath > is used for storing SRB data objects in the resource when the user does not provide an address for placing the object. Samples of default paths for file systems and database large objects are:
      Files: /users/sdsc/srb/SRBVault/?USER.?DOMAIN/?PATH?DATANAME.?RANDOM
      DBLOBs: /srbVault/obj_locator/obj_id='?PATH/?DATANAME.?RANDOM'
      When the default path is used, the variables such as ?USER are filled with actual names (e.g. the owner of the SRB object). In the case of files, we obtain the path name of the file in the file system. In the case of database LOBs, the three parts of the path are to be interpreted as follows:
      /table_name/LOB_attribute_name/LOB_selection_condition
    6. A resource is also registered automatically as a singleton logical resource under this name.
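
The way the ?-variables in a default path are filled in can be illustrated with the sketch below. It is not the SRB implementation: the path_var type and the function expand_default_path are hypothetical, the sample values in the usage note are made up, and the sketch simply replaces each ?NAME it recognizes and copies every other character unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair, e.g. { "USER", "jdoe" }. */
typedef struct { const char *name; const char *value; } path_var;

/* Expands ?NAME variables of tmpl into out (at most outlen bytes,
   including the terminating NUL). Unknown ?NAME sequences are copied
   as they are. Returns 0 on success, -1 if out is too small. */
int expand_default_path(const char *tmpl, const path_var *vars, size_t nvars,
                        char *out, size_t outlen)
{
  size_t used = 0;

  assert(tmpl != NULL && out != NULL);
  while (*tmpl != '\0') {
    const char *piece = NULL;
    size_t piece_len = 1, i;

    if (*tmpl == '?') {
      for (i = 0; i < nvars; i++) {
        size_t n = strlen(vars[i].name);
        if (strncmp(tmpl + 1, vars[i].name, n) == 0) {
          piece = vars[i].value;          /* substitute the variable */
          piece_len = strlen(piece);
          tmpl += n + 1;
          break;
        }
      }
    }
    if (piece == NULL) {                  /* ordinary character */
      piece = tmpl;
      piece_len = 1;
      tmpl++;
    }
    if (used + piece_len >= outlen)       /* keep room for the NUL */
      return -1;
    memcpy(out + used, piece, piece_len);
    used += piece_len;
  }
  if (outlen == 0)
    return -1;
  out[used] = '\0';
  return 0;
}
```

For instance, with USER = jdoe, DOMAIN = sdsc, PATH = img/, DATANAME = a.gif and RANDOM = 1234, the template /v/?USER.?DOMAIN/?PATH?DATANAME.?RANDOM expands to /v/jdoe.sdsc/img/a.gif.1234.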

Ingesting New Logical Resources:

    1. This command creates a new logical resource out of an existing physical resource. The addLogicalResource command adds further physical resources under the same logical name. In Version 1, the purpose of a logical resource (as mentioned in the section on [# ]) is to define replication resources. Hence, if a data object is written into a logical resource, it is written into all the underlying physical resources. If a data set residing in a logical resource is read, then only one copy of it is read.
    2. Run ingestLogicalResource command as follows:
ingestLogicalResource  < RegistrarName >  < RegistrarDomain >  < RegistrarPassword >  < LogicalResourceName >  < PhysicalResourceName >   < LogicalResourceType >  < LogicalDefaultPath >
    3. The < LogicalDefaultPath > can be different from the physical resource path. If the < LogicalDefaultPath > is an empty string, then the physical default path is used. If the < LogicalDefaultPath > is a relative path name (i.e., does not begin with a '/'), then it is concatenated with the physical path to form the default path for placing a SRB object.
    4. A logical resource cannot be made from or added into another logical resource.
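
The rule for combining the logical and physical default paths can be written down as the following sketch. The function name resolve_default_path is hypothetical, and the use of a single '/' as separator when concatenating is an assumption, since the text only says that the two paths are concatenated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Chooses the default path for an object placed in a logical resource:
   - empty logical path:    the physical default path is used;
   - relative logical path: it is appended to the physical path;
   - absolute logical path: it is used as given.
   Returns 0 on success, -1 if out is too small. */
int resolve_default_path(const char *physical, const char *logical,
                         char *out, size_t outlen)
{
  int n;
  size_t plen;

  assert(physical != NULL && out != NULL);
  plen = strlen(physical);
  if (logical == NULL || logical[0] == '\0')
    n = snprintf(out, outlen, "%s", physical);
  else if (logical[0] == '/')
    n = snprintf(out, outlen, "%s", logical);
  else if (plen > 0 && physical[plen - 1] == '/')
    n = snprintf(out, outlen, "%s%s", physical, logical);
  else
    n = snprintf(out, outlen, "%s/%s", physical, logical);
  return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
}
```

For example, a physical path /vault combined with the relative logical path proj gives /vault/proj, whereas the logical path /other is used unchanged.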

Adding Physical Resources to Logical Resources:

    1. Run addLogicalResource command as follows:
addLogicalResource < RegistrarName >  < RegistrarDomain >  < RegistrarPassword >  < oldLogicalResourceName >  < NewLogicalResourceType >   < PhysicalResourceName >  < NewLogicalDefaultPath >

Ingesting New Data Collections:

    1. Find the < ParentCollectionName > under which the new collection needs to be created.
    2. The registrar should have 'all' permit in the < ParentCollectionName > .
    3. Run ingestCollection command as follows:
ingestCollection < RegistrarName >  < RegistrarDomain >  < RegistrarPassword >  < CollectionName >  < ParentCollectionName >

Ingesting New Type-Descriptions:

    1. Find the parent Type value for the new Type being ingested. It can be obtained using Stoken command of client utilities.
    2. Make sure that the new Type is not already in MCAT.
    3. Run as follows:
      • for ingesting Location use:
ingestLocation < RegistrarName >  < RegistrarDomain >  < RegistrarPassword >  < LocationName >  < NetPrefix >  < ParentLocation >

The < NetPrefix > argument is of the form
< hostName > :NULL:NULL for file, archival and ftp systems
< hostName > :< databaseName > :< InstanceName > for databases.
(< InstanceName > can be NULL or can be used for < PortNumber > )

      • for ingesting other types use:
ingestToken  < RegistrarName >  < RegistrarDomain >  < RegistrarPassword >  < TokenName >  < NewTypeName >  < ParentTypeName >

valid TokenNames are:
ResourceType
DataType
UserType
Domain
Action
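
The < NetPrefix > argument of ingestLocation described above has three colon-separated parts. A small helper for composing it might look like the sketch below; the function format_net_prefix is hypothetical and the host names in the usage note are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Builds a NetPrefix of the form hostName:databaseName:InstanceName.
   A NULL database or instance is written as the literal string NULL,
   which gives hostName:NULL:NULL for file, archival and ftp systems.
   Returns 0 on success, -1 on a bad host name or a too small buffer. */
int format_net_prefix(const char *host, const char *database,
                      const char *instance, char *out, size_t outlen)
{
  int n;

  assert(out != NULL);
  if (host == NULL || host[0] == '\0' || strchr(host, ':') != NULL)
    return -1;
  n = snprintf(out, outlen, "%s:%s:%s", host,
               database != NULL ? database : "NULL",
               instance != NULL ? instance : "NULL");
  return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
}
```

Calling it with the host srb.example.org and no database yields srb.example.org:NULL:NULL; with the host db.example.org and the database mcat it yields db.example.org:mcat:NULL.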


Beyond MCAT Version 1.1

We have several avenues to follow beyond the Version 1.1 releases of MCAT and SRB. Some of the tasks that we plan to do in the near future are as follows:

    1. expand the core systemic meta data to include other well-defined data models and provide interoperability between MAPS format and other models;
    2. provide an interface for creating new application-level meta data schemas so that domain-dependent meta schemata can be integrated with core MCAT schema;
    3. develop interfaces and specification for interfaces to external meta catalogs;
    4. integrate well-defined functions for extracting meta information from documents, images and other objects;
    5. integrate meta information for method objects and relate them with other systemic objects and application-oriented meta information;
    6. develop a graphical user interface to ease the use of SRB and MCAT;
    7. expand core systemic meta catalog to include lineage, derivative, associative information among objects in the meta catalog;
    8. expand meta catalog to handle partitioned objects, subset objects, objects at different resolutions, object sequences and versions of objects;
    9. provide other functionalities such as locking, triggers, active rules, full-fledged access control mechanism, etc. to enhance the database capability of the catalog;
    10. integrate with other services to provide dynamic meta information about data and resource objects.

We discuss a few of them in brief in the next few paragraphs.


Standard Data Models

Dublin Core provides a standard set of meta information for documents. The meta data includes a set of 15 attributes along with augmentation given by the Warwick Framework. These attributes provide a complementary set of attributes that, together with the systemic-core attributes in MCAT, will provide comprehensive meta information for searching, discovering, locating and transporting data objects. Our plan is to include the Dublin Core attributes as a separate schema in the MCAT framework. The 'token identifier' facility in MCAT provides a convenient way of incorporating the semantic aspects of the Dublin Core attributes with strong typing and controlled vocabulary. For example, the name attribute in the Dublin Core is a string, but one can declare the name to be of type organization, author, illustrator, etc., which gives a semantics to that string. This kind of typing can be easily provided in the MCAT framework. We will also consider including the FGDC data model standard for geo-spatial data objects in MCAT. The Enabling Technology group, in collaboration with the Interactive Environments Group at SDSC, is currently formulating a set of meta data elements as a core set for describing information about images and visualization objects. We plan to incorporate these "image core meta elements" in MCAT.


Domain-dependent Meta Information

There are several groups who are planning to define metadata schemas specific to their domains of interest (e.g. Neuroscience group, environmental sciences group). We plan to implement facilities for describing and creating application-specific and domain-specific meta information catalogs using the language grammar defined in the [#MAPS Definition Structure MAPS Definition Structure] section. A module that can associate and allow interaction between the core meta information in MCAT and specific meta information would be built by extending current MAPS semantics and Schema semantics.


Meta Information for Methods

Methods, as discussed in the [#Method View Method View] subsection, are first-class objects in the MCAT catalog. Even though the current implementation of MCAT does not support facilities for using meta information about methods, the design of the MCAT system included methods and the implementation was made with the future inclusion of methods in mind. Several meta descriptions of methods, including descriptions for composition of methods, lineage for compiled methods, and performance attributes, would be part of the method schemata. We also plan to store in the schemata the location of the methods themselves, which can be ported and executed on client systems. Also, meta information about proxy functions in SRB would be stored in MCAT, facilitating their use.


Extraction of Meta Information

In the case of several kinds of objects, such as documents, texts and images, it is feasible to have routines that can extract meta information about individual objects. Virage's Image Read/Write Toolkit and the ImageTools developed at SDSC are two examples of tools for such meta data extraction. We plan to integrate the functionality of such tools into the MCAT and SRB systems and provide automatic ways of extracting meta information from ingested objects. MCAT would not only hold the meta data through specialized schemas and allow searching them using a uniform query interface, it would also store information about the extraction engines, thereby facilitating their use by SRB.


Graphical User Interface

Currently, we have a minimal graphical interface called SRBTool that allows one to interact with MCAT and SRB using point-and-click mechanisms. We plan to augment this to provide a full-fledged GUI system for MCAT and SRB. With respect to MCAT, the facility would have functionality for querying the MCAT catalog (a MAPS to GUI translation would be provided) as well as for ingestion and update of meta information in the catalog. The GUI would provide an interface whereby the user can query across several schemata, and the interface would adapt to a changing set of MCAT attributes.


Languages for Digital Libraries

Digital libraries can be viewed as providing access to digital objects through their metadata. The [#Architecture of MCAT System architecture] and the [#Uniform Access Interface to Meta Information Attributes uniform access interface] of the MCAT system provide an infrastructural framework for developing digital libraries. With the concepts of collections, data objects, resources and methods, along with facilities for domain-specific meta information creation and management, MCAT provides a generic platform for building digital libraries. Coupled with SRB, which facilitates distributed resource handling and provides heterogeneous storage access, we have a powerful system for building distributed (and autonomous) digital libraries. The grammar defined in the [#Uniform Access Interface to Meta Information Attributes uniform access interface] section provides a language for defining, querying, updating and presenting results from a meta information catalog. In order to be more general and useful, one can develop a full-fledged system of languages for a digital library system. The following chart provides such a design and also points out where such work is being undertaken under the NPACI partnership. Our aim is to further this language-based scheme to define a generic digital library infrastructure that can be used across multiple domains.

Digital Libraries - A Matrix of Functionality

                               Definition Language                   Manipulation Language                   Language Compilers        NPACI Partners
  Ontology                     Dewey Decimal System, Gloss, Buckets  ?                                       Infobus, ADL              Stanford, UCSB
  Schema (semantics included)  Schema Definition Language            Schema Manipulation Language            new MCAT                  SDSC
  Metadata                     Metadata Definition Language          Metadata Manipulation Language, STARTS  MCAT, Infobus             SDSC, Stanford
  Server Side Proxies          Proxy Description Language, IDL       Method MCAT, Program Graphs             Infobus, new SRB, Legion  Stanford, SDSC, UVa
  Collection                   ?                                     access API                              SRB/MCAT                  SDSC
  Resources                    Globus RDL, NWS, Resource MCAT        none                                    Globus, NWS, MCAT         ANL, UCSD, SDSC



Appendix 1: MCAT API

MCAT APIs at the client level are described in the A API Appendix of the SRB description web page.

#ifndef  MCAT_PROTOTYPES_H
#define  MCAT_PROTOTYPES_H

#include "mdasC_db2_externs.h"


/***************************************************************************
 NAME   : get_dataset_info
 PURPOSE: To ping the Metadata Catalog for information about an 
          object-dataset. Call  get_more_rows at least once
 INPUT  :  cat_type         - catalog type (ex. MDAS_CATALOG)
           data_name        - name of the dataset (or object)
           obj_user         - user name
	    access_name      - access permission (eg. read, write, append, 
	                       execute)
	    domain_name      - security domain name (eg. legion, sdsc)
	                       user and dataset should belong to this domain
	    collection_name  - collection to which the dataset belongs
	    myresult         - holder for pointer to result structure that 
	                       returns the results.
	    rows_required    - maximum number of result rows required
	                       for this call
			       (see also get_more_rows)
 OUTPUT : myresult          - updated with object information, 
 RETURN : 0 for SUCCESS negative for failure
****************************************************************************/

extern int get_dataset_info(int                      cat_type,
			    char                      *data_name, 
			    char                      *obj_user, 
			    char                      *access_name, 
			    char                      *domain_name,
			    char                      *collection_name,
			    mdasC_sql_result_struct  **myresult,
			    int                       rows_required);



/*-------------------------------------------------------------------------*/
/***************************************************************************
 NAME   : get_more_rows
 PURPOSE: To get additional rows after a get_* call
          Call get_more_rows with rows_required = 0 for ending
	   the query.
 INPUT  :  cat_type         - catalog type (ex. MDAS_CATALOG)
           result_descr     - index into array  maintaining the 
	                       query continuation information
			       use myresult->continuation_index from any
			       get_* call (including most recent get_more_rows
			       call).
	    myresult         - holder for pointer to result structure that 
	                       returns the results.
	    rows_required    - maximum number of result rows required
	                       for this call
			       if zero then no rows are returned and
			       the query structures are closed.
			       (see also get_more_rows)
  OUTPUT : myresult          - updated with object information, 
  RETURN : 0 for SUCCESS negative for failure
****************************************************************************/

extern int get_more_rows(int                        cat_type,
			 int                        result_descr,
			 mdasC_sql_result_struct  **myresult,
			 int                        rows_required);
 
 

/*-------------------------------------------------------------------------*/

/***************************************************************************
  NAME   :  register_dataset_info
  PURPOSE: To register an object-dataset into a Metadata Catalog. Information 
	   about the operation is  also logged in the audit  trail.
  INPUT  :  cat_type         - catalog type (ex. MDAS_CATALOG)
            data_name        - name of the dataset (or object)
            data_user         - user name
	    access_name      - access permission (eg. read, write, append, 
	                       execute)
	    domain_name      - security domain name (eg. legion, sdsc)
	                       user and dataset should belong to this domain 
	    data_type_name    - object type name (eg. legion-object, html, etc)
	    data_path_name    - full path name of the object if a file
	                       table-name/object-id if a DB LOB
	    resource_name    - name of the storage resource.
	    collection_name  - collection to which the object belongs 
	                       (eg.'adl')
	    data_size	     - size of the object
  OUTPUT : none
  RETURN : 0 for SUCCESS negative for failure
****************************************************************************/

extern int register_dataset_info(int                 cat_type,
		   char                      *data_name, 
		   char                      *data_user, 
		   char                      *access_name, 
		   char                      *domain_name,
                   char                      *data_type_name,
		   char                      *data_path_name,   
		   char                      *resource_name,
                   char                      *collection_name,
		   int                        data_size);

/*-------------------------------------------------------------------------*/

/***************************************************************************
  NAME   :  modify_dataset_info
  PURPOSE: To modify, add or delete metadata information about an
           object-dataset into a Metadata Catalog. Information 
	   about the operation (except D_INSERT_AUDIT) performed is 
	   also logged in the audit  trail.
  INPUT  :  cat_type         - catalog type (ex. MDAS_CATALOG)
            data_name        - name of the dataset (or object)
	    collection_name  - collection to which the dataset belongs
	    data_path_name    - full path name of the object if a file
	                       table-name/object-id if a DB LOB
	    resource_name    - name of the storage resource
            data_value_1     - data used for addition/deletion
                               or modification
                                  D_DELETE_ONE      (not used)
                                  D_DELETE_DOMN     (domain_desc)
                                  D_INSERT_DOMN     (domain_desc)
                                  D_CHANGE_SIZE     (size)
                                  D_CHANGE_TYPE     (data_typ_name)
                                  D_CHANGE_GROUP    (data_grp_name)
                                  D_CHANGE_SCHEMA   (schema_name)
                                  D_INSERT_ACCS     (user_name) given access
                                  D_DELETE_ACCS     (user_name) given access
                                  D_DELETE_ALIAS    (user_name) for whom the
                                                    alias is provided
                                  D_INSERT_ALIAS    (user_name)
                                  D_DELETE_COMMENTS (comments)
                                  D_INSERT_COMMENTS (comments)
                                  D_INSERT_AUDIT    (action_desc)
            data_value_2     - additional data used for addition/deletion
                               or modification
                                  D_DELETE_ONE      (not used)
                                  D_DELETE_DOMN     (not used)
                                  D_INSERT_DOMN     (not used)
                                  D_CHANGE_SIZE     (not used)
                                  D_CHANGE_TYPE     (not used)
                                  D_CHANGE_GROUP    (not used)
                                  D_CHANGE_SCHEMA   (not used)
                                  D_INSERT_ACCS     (access_constraint)
                                  D_DELETE_ACCS     (access_constraint)
                                  D_DELETE_ALIAS    (alias_data_name)
                                  D_INSERT_ALIAS    (alias_data_name)
                                  D_DELETE_COMMENTS (not used)
                                  D_INSERT_COMMENTS (not used)
                                  D_INSERT_AUDIT    (comments, if any)
            retraction_type  - type of operation to be performed
                                  #define D_DELETE_ONE       1
                                  #define D_DELETE_DOMN      2
                                  #define D_INSERT_DOMN      3
                                  #define D_CHANGE_SIZE      4
                                  #define D_CHANGE_TYPE      5
                                  #define D_CHANGE_GROUP     6
                                  #define D_CHANGE_SCHEMA    7
                                  #define D_INSERT_ACCS      8
                                  #define D_DELETE_ACCS      9
                                  #define D_DELETE_ALIAS    10
                                  #define D_INSERT_ALIAS    11
                                  #define D_DELETE_COMMENTS 12
                                  #define D_INSERT_COMMENTS 13
                                  #define D_INSERT_AUDIT    14
	    data_user_name    - name of the user
	    user_domain_name  - domain to which the user belongs

  OUTPUT : none
  RETURN : 0 for SUCCESS, negative for failure
****************************************************************************/


extern int modify_dataset_info(int                   cat_type,
			       char                      *data_name, 
			       char                      *collection_name,
			       char                      *data_path_name,   
			       char                      *resource_name,
			       char                      *data_value_1,
			       char                      *data_value_2,
			       int                        retraction_type,
			       char                      *data_user_name,
			       char                      *user_domain_name);


/*-------------------------------------------------------------------------*/
/***************************************************************************
  NAME   :  copy_dataset
  PURPOSE: To record the copy of an object-dataset in a Metadata Catalog.
           Information about the operation is also logged in the audit trail.
  INPUT  :  cat_type             - catalog type (ex. MDAS_CATALOG)
            data_name            - name of the dataset (or object)
            collection_name      - collection to which the dataset belongs
            old_resource_name    - name of the storage resource from which
                                   the object was copied
            old_data_path_name   - full path name of the object in
                                   old_resource_name
            new_resource_name    - name of the storage resource to which the
                                   object has been copied
            new_data_path_name   - full path name of the object in
                                   new_resource_name
            data_user_name       - user name
            user_domain_name     - domain to which the user belongs
  OUTPUT : none
  RETURN : 0 for SUCCESS, negative for failure
****************************************************************************/

extern int copy_dataset(int                          cat_type,
			char                      *data_name, 
			char                      *collection_name,
			char                      *old_resource_name,
			char                      *old_data_path_name,   
			char                      *new_resource_name,
			char                      *new_data_path_name, 
			char                      *data_user_name,
			char                      *user_domain_name);

/*-------------------------------------------------------------------------*/

/***************************************************************************
  NAME   :  move_dataset
  PURPOSE: To record the move of an object-dataset in a Metadata Catalog.
           Information about the operation is also logged in the audit trail.
  INPUT  :  cat_type             - catalog type (ex. MDAS_CATALOG)
            data_name            - name of the dataset (or object)
            collection_name      - collection to which the dataset belongs
            old_resource_name    - name of the storage resource in which the
                                   object resided previously
            old_data_path_name   - full path name of the object in
                                   old_resource_name
            new_resource_name    - name of the storage resource to which the
                                   object has been moved
            new_data_path_name   - full path name of the object in
                                   new_resource_name
            data_user_name       - user name
            user_domain_name     - domain to which the user belongs
  OUTPUT : none
  RETURN : 0 for SUCCESS, negative for failure
****************************************************************************/

extern int move_dataset(int                          cat_type,
			char                      *data_name, 
			char                      *collection_name,
			char                      *old_resource_name,
			char                      *old_data_path_name,   
			char                      *new_resource_name,
			char                      *new_data_path_name, 
			char                      *data_user_name,
			char                      *user_domain_name);
/*-------------------------------------------------------------------------*/


extern int auditDatasetAccess(int cat_type, char *user_name,
                  char *data_name, char *collection_name,
		  char *data_path_name,
                  char *resource_name, char *data_access_name, 
                  char *comments, int success, char *user_domain_name);
 

extern int get_mdas_authorization(char *user_name, 
				 char *user_password, 
				 char* domain_name);

extern int get_mdas_sys_authorization(char *user_name, 
				      char *user_password, 
				      char* domain_name);

extern int make_new_collection(int cat_type, char *parent_col_name, 
                    char *new_col_name,
                    char *user_name, char *user_domain);

extern int register_user_group(int cat_type, char *group_name, 
                        char *group_password,
                        char *group_type_name, 
                        char* group_address, char* group_phone, 
                        char* group_email,
                        char *registrar_name, char * registrar_password,
                        char *registrar_domain_name);
 
extern int register_user_info(int cat_type, char *user_name, 
		       char *user_password, char *user_domain_name, 
		       char *user_type_name, 
		       char* user_address, char* user_phone, char* user_email,
		        char *registrar_name, char *registrar_password,
		       char *registrar_domain_name);


 
extern int get_data_dir_info(int cat_type, char qval[][MAX_TOKEN],
                             int selval[], 
                             mdasC_sql_result_struct **myresult,
                             char tname[][MAX_TOKEN],
                             char aname[][MAX_TOKEN],
			     int  rows_required);
 
 
extern int get_collections(int cat_type, 
			   char * col_name,
			   char *flag,
			   mdasC_sql_result_struct **myresult,
			   int  rows_required);
 
extern int resAttrLookup(int catType, char *resourceName,
                       char *domainName, char **resourceType,
                       char **resourceLoc, int  rows_required, int *row_count);
 

extern int modify_collection_info(int cat_type, char *obj_user_name,
			   char *group_name, 
			 char *data_value_1, char *data_value_2,
			   char *datavalue_3,
			 int retraction_type, char *user_domain_name);

extern int modify_user_info(int cat_type, char *obj_registrar,
                         char *data_value_1, char *data_value_2,
                         int retraction_type, char *registrar_password,
                            char *registrar_domain_name);

extern int commit_db2_interaction(int  transaction_end_code);

extern int close_db2_interaction(int transaction_end_code);

extern int open_db2_interaction(int cat_type);

extern void rmGlbFromResult (mdasC_sql_result_struct  *myresult);

extern int removeTicket(char *ticketId, char *userName, char *domainName);

extern int issueTicket(char *objName, char *collection, char *flag,
                       char *begTime, char *endTime, int AccessCount,
                       char *userSet, char **ticketId,
                       char *userName, char *domainName);

#endif	/* MCAT_PROTOTYPES_H */
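
Example: pairing data_value_1 and data_value_2 with a retraction type

The comment for modify_dataset_info lists which values each retraction type expects. The sketch below illustrates that pairing for two of the types, D_CHANGE_SIZE and D_INSERT_ACCS. It is not taken from the MCAT sources. The function modify_dataset_info defined here is a hypothetical stand-in that only records its arguments, so that the sketch compiles without the MCAT library. The wrappers set_dataset_size and grant_dataset_access, the value 0 for MDAS_CATALOG, the decimal-string encoding of the size, and the empty string for an unused data_value_2 are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Retraction types, with the values listed in the modify_dataset_info comment. */
#define D_CHANGE_SIZE   4
#define D_INSERT_ACCS   8

/* ASSUMPTION: the numeric value of MDAS_CATALOG is not given in this
   document; 0 is a placeholder. */
#define MDAS_CATALOG    0

/* Arguments of the most recent stub call, kept so the effect can be inspected. */
static struct {
    int  retraction_type;
    char value_1[64];
    char value_2[64];
} last_call;

/* HYPOTHETICAL stand-in for the MCAT library function of the same name.
   It performs no catalog access: it rejects obviously bad input with a
   negative code and otherwise records the values it was given. */
int modify_dataset_info(int cat_type, char *data_name, char *collection_name,
                        char *data_path_name, char *resource_name,
                        char *data_value_1, char *data_value_2,
                        int retraction_type, char *data_user_name,
                        char *user_domain_name)
{
    (void)cat_type; (void)collection_name; (void)data_path_name;
    (void)resource_name; (void)data_user_name; (void)user_domain_name;

    if (data_name == NULL || data_value_1 == NULL || data_value_2 == NULL)
        return -1;
    if (retraction_type < 1 || retraction_type > 14)
        return -2;

    last_call.retraction_type = retraction_type;
    snprintf(last_call.value_1, sizeof last_call.value_1, "%s", data_value_1);
    snprintf(last_call.value_2, sizeof last_call.value_2, "%s", data_value_2);
    return 0;
}

/* D_CHANGE_SIZE: data_value_1 carries the size, data_value_2 is not used.
   ASSUMPTION: the size travels as a decimal string and the unused value
   is passed as an empty string. */
int set_dataset_size(char *data_name, char *collection_name,
                     char *data_path_name, char *resource_name,
                     long new_size, char *user_name, char *user_domain)
{
    char size_text[32];
    char unused[] = "";

    snprintf(size_text, sizeof size_text, "%ld", new_size);
    return modify_dataset_info(MDAS_CATALOG, data_name, collection_name,
                               data_path_name, resource_name,
                               size_text, unused, D_CHANGE_SIZE,
                               user_name, user_domain);
}

/* D_INSERT_ACCS: data_value_1 is the user given access,
   data_value_2 is the access constraint (for example "read"). */
int grant_dataset_access(char *data_name, char *collection_name,
                         char *data_path_name, char *resource_name,
                         char *grantee, char *access_constraint,
                         char *user_name, char *user_domain)
{
    return modify_dataset_info(MDAS_CATALOG, data_name, collection_name,
                               data_path_name, resource_name,
                               grantee, access_constraint, D_INSERT_ACCS,
                               user_name, user_domain);
}
```

With the stub in place, a call such as set_dataset_size("img1", "adl", "/data/img1", "unix-sdsc", 2048L, "sekar", "sdsc") returns 0, following the header's convention of 0 for success and a negative value for failure. The dataset, path and resource names in that call are made up. A real client would drop the stub and link against the MCAT library instead.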




Last Updated: May 11, 1998

Contact: Arcot Rajasekar sekar@sdsc.edu
National Partnership for Advanced Computational Infrastructure
San Diego Supercomputer Center, MC 0505
University of California, San Diego
9500 Gilman Drive, Bldg 109
La Jolla, CA 92093-0505