MCAT Attributes

From SRB

This document provides a list of MCAT attributes that are currently exposed to the user through the srbGetDataDirInfo Client API.

The srbGetDataDirInfo call provides a means to query the MCAT catalog. The call has the prototype:

extern int srbGetDataDirInfo(srbConn* conn, int catType, char qval[][MAX_TOKEN], int *selval, mdasC_sql_result_struct *myresult, int rowsWanted);

The MCAT uses the input qval[][] and selval[] arrays to compose and execute SQL queries, and returns the query result in myresult. The selval[] array specifies the list of attributes to retrieve, and qval[][] specifies a list of predicate conditions to search on; the conditions are combined conjunctively (ANDed). The rowsWanted argument gives the number of rows requested per call; as the example below shows, remaining rows are fetched with srbGetMoreRows using myresult.continuation_index. Both selval[] and qval[][] must be arrays of size MAX_DCS_NUM and are indexed by the values given in mdasC_db2_externs.h under the heading DCS-ATTRIBUTE-INDEX DEFINES. Please check the list there for the most recent set of attributes available for querying.
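
The conjunctive combination can be pictured with a small C model. This is a toy sketch, not MCAT's actual query generator: the index names TOY_DATA_NAME, TOY_COLLECTION_NAME and TOY_SIZE, the column-name table, and the array sizes are hypothetical stand-ins for the definitions in mdasC_db2_externs.h. It appends each non-empty qval[] entry to its column name and joins the resulting terms with AND.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; the real values live in mdasC_db2_externs.h. */
#define TOY_DCS_NUM 3
#define TOY_TOKEN   64
enum { TOY_DATA_NAME, TOY_COLLECTION_NAME, TOY_SIZE };

/* Hypothetical column name for each attribute index. */
static const char *toyColumn[TOY_DCS_NUM] = {
    "data_name", "collection_name", "size"
};

/* Build "WHERE c1<pred1> AND c2<pred2> ..." from the non-empty qval
 * entries (each entry starts with its own leading space, e.g. " = 'x'").
 * Returns the number of predicates used, or -1 if out is too small. */
int toyWhereClause(char qval[][TOY_TOKEN], char *out, size_t outLen)
{
    int used = 0;
    size_t len = 0;
    int n;

    assert(out != NULL && outLen > 0);
    out[0] = '\0';
    for (int i = 0; i < TOY_DCS_NUM; i++) {
        if (qval[i][0] == '\0')
            continue;                   /* empty entry: no condition */
        n = snprintf(out + len, outLen - len, "%s%s%s",
                     used == 0 ? "WHERE " : " AND ", toyColumn[i], qval[i]);
        if (n < 0 || (size_t) n >= outLen - len)
            return -1;                  /* output would be truncated */
        len += (size_t) n;
        used++;
    }
    return used;
}
```

Since every non-empty entry adds one more AND term, alternatives for a single attribute belong inside that attribute's own entry, for example as an in (...) list.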


Each selval[] element can take one of the following values, whose semantics follow the corresponding general relational database functions (an element left at 0 is not retrieved):

	 1 : select value
	 2 : count()
	 3 : max()
	 4 : min()
	 5 : avg()
	 6 : sum()
	 7 : variance()
	 8 : stddev()
	 9 : count(distinct)
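
The selval[] codes can be read as wrappers around the selected column. The following sketch is illustrative only: selvalToSql is a hypothetical helper and the column names passed to it are assumptions; only the code-to-function mapping is taken from the table above. Code 0 yields no expression, matching the example program's convention of leaving unselected elements at 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render the SQL select-list expression that a selval[] code stands for,
 * e.g. code 2 on "user_name" -> "count(user_name)".
 * Returns 1 on success, 0 for code 0 (not selected), and -1 for an
 * unknown code or an output buffer that is too small. */
int selvalToSql(int code, const char *column, char *out, size_t outLen)
{
    /* Array index = selval code. */
    static const char *fmt[] = {
        NULL,                   /* 0: attribute not retrieved */
        "%s",                   /* 1: select value */
        "count(%s)",            /* 2 */
        "max(%s)",              /* 3 */
        "min(%s)",              /* 4 */
        "avg(%s)",              /* 5 */
        "sum(%s)",              /* 6 */
        "variance(%s)",         /* 7 */
        "stddev(%s)",           /* 8 */
        "count(distinct %s)"    /* 9 */
    };
    int n;

    assert(column != NULL && out != NULL && outLen > 0);
    if (code < 0 || code >= (int) (sizeof fmt / sizeof fmt[0]))
        return -1;
    if (fmt[code] == NULL) {
        out[0] = '\0';
        return 0;
    }
    n = snprintf(out, outLen, fmt[code], column);
    return (n < 0 || (size_t) n >= outLen) ? -1 : 1;
}
```

Under this reading, selval[SIZE] = 6 would presumably ask for sum(size) over the rows that match the qval[] conditions.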

Each qval[] entry provides a retrieval condition for its attribute. The values can be quoted strings or numbers, as SQL requires. The following comparison operators are supported in qval values:

	=		 : equal
	<>		 : not equal
	>		 : greater than
	<		 : less than
	>=		 : greater than or equal
	<=		 : less than or equal
	in		 : in a list [e.g. in ('alpha','beta','gamma')]
	not in		 : not in a list
	between		 : between two items [e.g. between 24 and 32]
	not between	 : not between two items
	like		 : pattern match; use % and _ as wildcards for any
			   string and any single character, respectively.
			   ESCAPE can be used as in SQL.
	not like	 : does not match a pattern
	sounds like	 : soundex match, if soundex is built into the
			   database (Oracle supports it)
	sounds not like	 : does not sound like an item

For the selval[] array, setting an element to 1 means that the attribute associated with that element is to be retrieved; e.g., selval[USER_NAME] = 1; means the "user_name" attribute is to be retrieved. Each qval[][] entry holds a comparison predicate to search on; e.g., sprintf(qval[DATA_NAME], " = '%s'", "unixFileObj1"); means that the search condition includes the term (data_name = 'unixFileObj1').
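
Because each qval[] entry is the text that follows the attribute name, the multi-valued operators and like need a little string assembly. The helpers qvalIn, qvalBetween and qvalLike below are hypothetical (they are not part of the SRB client library), and MAX_TOKEN is given a placeholder value when the SRB headers are absent. They use snprintf so that a long value is reported as truncation instead of overflowing the entry, and qvalLike passes the pattern as an argument so that its % wildcards are not read as printf conversions. Embedded single quotes are not escaped; the caller must escape them as SQL requires.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifndef MAX_TOKEN
#define MAX_TOKEN 500   /* placeholder; use the value from the SRB headers */
#endif

/* Write " in ('v1','v2',...)" into entry. Returns 0, or -1 on truncation. */
int qvalIn(char entry[MAX_TOKEN], const char *const *vals, int nvals)
{
    size_t len;
    int n;

    assert(vals != NULL && nvals > 0);
    n = snprintf(entry, MAX_TOKEN, " in (");
    len = (size_t) n;
    for (int i = 0; i < nvals; i++) {
        n = snprintf(entry + len, MAX_TOKEN - len, "%s'%s'",
                     i == 0 ? "" : ",", vals[i]);
        if (n < 0 || (size_t) n >= MAX_TOKEN - len)
            return -1;
        len += (size_t) n;
    }
    n = snprintf(entry + len, MAX_TOKEN - len, ")");
    return (n < 0 || (size_t) n >= MAX_TOKEN - len) ? -1 : 0;
}

/* Write " between lo and hi" for numeric bounds. Returns 0, or -1. */
int qvalBetween(char entry[MAX_TOKEN], long lo, long hi)
{
    int n = snprintf(entry, MAX_TOKEN, " between %ld and %ld", lo, hi);
    return (n < 0 || n >= MAX_TOKEN) ? -1 : 0;
}

/* Write " like 'pattern'"; the pattern is an argument, not part of the
 * format string, so its % wildcards are copied literally. Returns 0, or -1. */
int qvalLike(char entry[MAX_TOKEN], const char *pattern)
{
    int n = snprintf(entry, MAX_TOKEN, " like '%s'", pattern);
    return (n < 0 || n >= MAX_TOKEN) ? -1 : 0;
}
```

For instance, qvalIn(qval[DATA_TYP_NAME], types, ntypes) could replace a hand-written sprintf for a list of data types.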

Once the query is successful, individual columns of the resulting structure can be retrieved using the getAttributeColumn function whose prototype is given below:

char *getAttributeColumn(mdasC_sql_result_struct *result, int attrIndex)
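
The example program that follows walks each returned column by advancing a pointer MAX_DATA_SIZE bytes per row and prints each slot with %s, which suggests that a column holds myresult.row_count fixed-width, NUL-terminated values. The self-contained sketch here models that layout under that assumption; columnValue and printColumn are hypothetical helpers, and MAX_DATA_SIZE gets a placeholder value. Indexing from the base pointer, instead of moving it, keeps the original pointer available for free().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifndef MAX_DATA_SIZE
#define MAX_DATA_SIZE 64   /* placeholder; use the value from the SRB headers */
#endif

/* Return row `row` of a column laid out as rowCount consecutive
 * MAX_DATA_SIZE-byte slots, or NULL if the row is out of range. */
const char *columnValue(const char *column, int rowCount, int row)
{
    if (column == NULL || row < 0 || row >= rowCount)
        return NULL;
    return column + (size_t) row * MAX_DATA_SIZE;
}

/* Print every row of a column; the base pointer itself is never advanced. */
void printColumn(const char *column, int rowCount)
{
    const char *value;

    for (int i = 0; i < rowCount; i++) {
        value = columnValue(column, rowCount, i);
        if (value == NULL)
            return;
        printf("%s\n", value);
    }
}
```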

An example showing the usage of srbGetDataDirInfo and getAttributeColumn (the example is also in test/examples):

#include <stdio.h>
#include <stdlib.h>    /* exit(), free() */
#include <sys/file.h>
#include <sys/stat.h>
#ifdef PORTNAME_solaris
#include <fcntl.h>
#endif

#include "srbClient.h"

#define SRB_HOST "torah.sdsc.edu"
#define MY_PASSWORD "CCCCC"

int main(int argc, char **argv)
{
    int i, status;
    srbConn *conn;
    mdasC_sql_result_struct myresult;
    char qval[MAX_DCS_NUM][MAX_TOKEN];
    int selval[MAX_DCS_NUM];
    char *pathName, *rsrcName;   /* column buffers from getAttributeColumn */
    char *pathPtr, *rsrcPtr;     /* cursors that walk the rows */
    int numOfRows = 20;          /* rows requested per call */

    if (argc != 3) {
        fprintf(stderr, "Usage: %s collectionName dataName\n", argv[0]);
        exit(1);
    }

    /* connect to an SRB server */
    conn = clConnect(SRB_HOST, NULL, MY_PASSWORD);
    if (clStatus(conn) != CLI_CONNECTION_OK) {
        fprintf(stderr, "Connection to srbMaster failed.\n");
        fprintf(stderr, "%s", clErrorMessage(conn));
        srb_perror(2, clStatus(conn), "", SRB_RCMD_ACTION|SRB_LONG_MSG);
        clFinish(conn);
        exit(1);
    }

    /* initialize the input structures: nothing selected, no conditions */
    for (i = 0; i < MAX_DCS_NUM; i++) {
        selval[i] = 0;
        qval[i][0] = '\0';
    }

    /* set user requirements - in this example, the user is asking for
       the resource name and path name for a given dataset. snprintf
       keeps long arguments from overflowing a qval entry. */
    snprintf(qval[DATA_NAME], MAX_TOKEN, " = '%s'", argv[2]);
    snprintf(qval[COLLECTION_NAME], MAX_TOKEN, " = '%s'", argv[1]);
    selval[PATH_NAME] = 1;
    selval[RSRC_NAME] = 1;

    /* perform the MCAT query */
    status = srbGetDataDirInfo(conn, MDAS_CATALOG,
                               qval, selval, &myresult, numOfRows);
    while (status == 0) {
        /* retrieve the values */
        pathName = (char *) getAttributeColumn(&myresult, PATH_NAME);
        rsrcName = (char *) getAttributeColumn(&myresult, RSRC_NAME);
        /* print the result; each row value occupies MAX_DATA_SIZE bytes */
        pathPtr = pathName;
        rsrcPtr = rsrcName;
        for (i = 0; i < myresult.row_count; i++) {
            fprintf(stdout, "%20.20s    %s\n", rsrcPtr, pathPtr);
            pathPtr += MAX_DATA_SIZE;
            rsrcPtr += MAX_DATA_SIZE;
        }
        /* free SRB allocated query column structures through the original
           pointers (freeing the advanced cursors would be undefined) */
        free(pathName);
        free(rsrcName);
        /* if there are more rows, retrieve them */
        if (myresult.continuation_index >= 0) {
            status = srbGetMoreRows(conn, MDAS_CATALOG,
                                    myresult.continuation_index,
                                    &myresult, numOfRows);
        } else {
            break;
        }
    }
    /* disconnect from SRB Server */
    clFinish(conn);
    exit(0);
}

The list below gives the attributes that can be queried or used as conditions in a srbGetDataDirInfo query. The list is not exhaustive: the MCAT design is far more comprehensive, and even in the implementation some attributes are not exposed at the client level, while a few are exposed only through specific routines. A few sets of attributes, such as the IV Core, are also not exposed through this interface at this time. Please check mdasC_db2_externs.h for the set of attributes exposed to clients in the current implementation of the SRB client library:

Caveat: a few attributes appear more than once under different names, depending on the role they play (e.g., DATA_OWNER_EMAIL and USER_EMAIL). These names point to the same data internally, so this virtual replication introduces no inconsistency.

Dataset Information (core meta-information about datasets):

  DATA_NAME                    /* data name */
  DATA_REPL_ENUM               /* replica copy number */
  COLLECTION_NAME              /* collection name in which the data resides */
  SIZE                         /* size of data */
  DATA_TYP_NAME                /* data type (mostly data formats) */
  DATA_CLASS_NAME              /* classification name for data */
  DATA_CLASS_TYPE              /* classification type */
  ACCESS_CONSTRAINT            /* access restriction on data */
                               currently supported access constraints are:
                               'execute', 'read audit', 'read',
                               'annotate audit', 'annotate',
                               'write audit', 'write',
                               'create audit', 'create',
                               'all audit', 'all',  /* 'all' is like ownership
                                                       and allows granting and
                                                       revoking access for
                                                       other users */
                               'curate audit', 'curate'  /* used at the
                                                       collection level to give
                                                       ownership of objects in
                                                       and below the collection.
                                                       This access constraint is
                                                       still under design, and
                                                       its properties or usage
                                                       may be modified in future
                                                       releases */

  DATA_COMMENTS                /* comments on data */
  DATA_COMMENTS_TIMESTAMP      /* time stamp for comments on data */
  REPL_TIMESTAMP               /* data modification time stamp */
  PATH_NAME                    /* physical path name of data object */
  DATA_CREATE_TIMESTAMP        /* data creation time stamp */
  DATA_IS_DELETED              /* data liveness */
  DATA_OWNER                   /* data creator name */
  DATA_OWNER_DOMAIN            /* domain of data creator */
  DATA_OWNER_EMAIL             /* email of data creator */

Collection Information (core meta-information about collections):

  COLLECTION_NAME              /* collection name in which the data resides */
  PARENT_COLLECTION_NAME       /* name of parent collection (15) */
  ACCESS_COLLECTION_NAME       /* use this as collection name
					for checking access to collection */
  COLLECTION_ACCESS_CONSTRAINT /* access restriction on collection */
  CONTAINER_FOR_COLLECTION     /* default container for collection */

User Information (this contains core information about SRB-registered users):

  USER_NAME                    /* user name */
  DOMAIN_DESC                  /* user domain name */
  USER_TYP_NAME                /* user type */
  USER_GROUP_NAME              /* name of user group */

  USER_ADDRESS                 /* user address */
  USER_PHONE                   /* user phone number */
  USER_EMAIL                   /* user email */

  USER_DISTIN_NAME             /* distinguished name of user 
					(used by authentication systems ) */
  USER_AUTH_SCHEME             /* user authentication scheme associated 
					with the user_distin_name */

Physical Resource Information (core meta-information about physical resources including SRB servers):

  SERVER_LOCATION              /* location of SRB server */
  SERVER_NETPREFIX             /* net address of SRB server */
  PHY_RSRC_NAME                /* physical resource name */
  PHY_RSRC_TYP_NAME            /* physical resource type */
  RSRC_CLASS                   /* classification of physical resource 
					(inherited by logical) */
  MAX_OBJ_SIZE                 /* maximum size of data object allowed in
				 physical resource (not enforced by MCAT) */
  PHY_RSRC_DEFAULT_PATH        /* default path in physical resource */
  LOCATION_NAME                /* registered name of location 
					(of resource) in MCAT */
  RSRC_ADDR_NETPREFIX          /* net address of resource */
  RESOURCE_MAX_LATENCY         /* physical resource estimated latency (max) */
  RESOURCE_MIN_LATENCY         /* physical resource estimated latency (min) */
  RESOURCE_BANDWIDTH           /* physical resource estimated bandwidth */
  RESOURCE_MAX_CONCURRENCY     /* physical resource maximum concurrent
                                  requests */
  RESOURCE_NUM_OF_HIERARCHIES  /* number of hierarchies in the
                                  physical resource */
  RESOURCE_NUM_OF_STRIPES      /* number of stripes of data in the
                                  physical resource */
  RESOURCE_CAPACITY            /* capacity of the physical resource */
  RESOURCE_DESCRIPTION         /* comments on the resource */

Logical Resource Information (core meta-information about logical resources):

  RSRC_NAME                     /* name of logical resource; logical resources
                                   inherit many of the attributes from the
                                   associated physical resource */
  RSRC_REPL_ENUM                /* index of physical rsrc in logical rsrc */
  PHY_RSRC_NAME                 /* associated physical resource name */
  RSRC_ACCESS_LIST              /* access list for resource */
  RSRC_TYP_NAME                 /* logical resource type */
  RSRC_DEFAULT_PATH             /* default path in logical resource */

Container Information (core meta-information about containers):

  CONTAINER_NAME                /* name of container - container has all
				   the properties of a dataset */
  CONTAINER_REPL_ENUM           /* container copy number */
  CONTAINER_MAX_SIZE            /* maximum size of container */
  IS_DIRTY                      /* data has been changed in the container
					compared to other copies  */
  OFFSET                        /* position of data in container */
  CONTAINER_SIZE                /* current size of container */
  CONTAINER_RSRC_NAME           /* name of physical resource of container */
  CONTAINER_LOG_RSRC_NAME       /* logical resource associated with a
					container */
  CONTAINER_RSRC_CLASS          /* class of physical resource associated
					with a container */

Ticket-based Access Control Information (information about ticket-based access control for datasets, collections as well as recursively under a collection):

  TICKET_D                      /* identifier for ticket given for data*/
  TICKET_BEGIN_TIME_D           /* data ticket validity start time */
  TICKET_END_TIME_D             /* data ticket validity end  time */
  TICKET_ACC_COUNT_D            /* valid number of opens allowed on data ticket*/
  TICKET_ACC_LIST_D             /* valid access allowed on data ticket 
					(currently readonly) */
  TICKET_OWNER_D                /* data ticket creator */
  TICKET_OWNER_DOMAIN_D         /* data ticket creator domain */
  TICKET_USER_D                 /* allowed ticket user or user group */
  TICKET_USER_DOMAIN_D          /* data ticket user domain */

  TICKET_C                      /* identifier for ticket given for 
					collection and sub collections*/
  TICKET_BEGIN_TIME_C           /* collection ticket validity start time*/
  TICKET_END_TIME_C             /* collection ticket validity end time*/
  TICKET_ACC_COUNT_C            /* valid number of opens allowed on 
					data in collection */
  TICKET_ACC_LIST_C             /* valid access allowed on data in 
					collection  (currently readonly) */
  TICKET_OWNER_C                /* collection ticket creator */
  TICKET_OWNER_DOMAIN_C         /* collection ticket creator domain */
  TICKET_USER_C                 /* allowed collection ticket user */
  TICKET_USER_DOMAIN_C          /* collection ticket user domain */

Audit Information (audit information on users and on datasets):

  AUDIT_USER                    /* audit user name */
  AUDIT_USER_DOMAIN             /* audit user domain */
  USER_AUDIT_TIME_STAMP         /* audit on user time stamp */
  USER_AUDIT_COMMENTS           /* audit on user comments */
  AUDIT_ACTION_DESC             /* audited action on data */
  AUDIT_TIMESTAMP               /* audit time stamp for data */
  AUDIT_COMMENTS                /* audit comments for data */

Annotation Information (core meta-information on annotating datasets; see also the ACCESS_CONSTRAINT attribute for access control):

  DATA_ANNOTATION_USERNAME      /* name of annotator */
  DATA_ANNOTATION_USERDOMAIN    /* domain of annotator */
  DATA_ANNOTATION               /* actual annotation string */
  DATA_ANNOTATION_TIMESTAMP     /* time of annotation */
  DATA_ANNOTATION_POSITION      /* user-defined location for the annotation */

Structured Metadata Information (users can store structured metadata, which is treated as a blob):

 STRUCTURED_METADATA_TYPE      /* type of user-inserted structured metadata  */
 STRUCTURED_METADATA_COMMENTS  /* comments on the structured metadata  */ 
 STRUCTURED_METADATA_DATA_NAME /* data name of structured metadata
			          stored as another data object inside SRB */
 STRUCTURED_METADATA_COLLNAME  /* collection name of structured metadata 
                                  stored as another data object inside SRB */
 INTERNAL_STRUCTURED_METADATA  /* structured metadata stored as string in MCAT */

Index Information (users can index a dataset, the datasets of a given type, or the datasets in a collection. The index is treated as an SRB-registered dataset, so the user can download it and search on it, and it inherits all meta-information about datasets, including structured metadata, which can be used to store information about the index. The index location can be collection information (i.e., the index is stored as several datasets inside a collection) or a URL. See the DataCutter proxy for more information):

 INDEX_NAME_FOR_DATASET        /* data name of index on data */
 IX_COLL_NAME_FOR_DATASET      /* collection name of index on data */
 IX_DATATYPE_FOR_DATASET       /* index type*/
 IX_LOCATION_FOR_DATASET       /* path name of index*/

 INDEX_NAME_FOR_DATATYPE       /* data name of index on data type */
 IX_COLLNAME_FOR_DATATYPE      /* collection name of index on data type */
 IX_DATATYPE_FOR_DATATYPE      /* index type*/
 IX_LOCATION_FOR_DATATYPE      /* path name of index*/

 INDEX_NAME_FOR_COLLECTION     /* data name of index on collection */
 IX_COLLNAME_FOR_COLLECTION    /* collection name of index on collection */
 IX_DATATYPE_FOR_COLLECTION    /* index type */
 IX_LOCATION_FOR_COLLECTION    /* path name of index*/

Method Information (users can associate methods with a dataset, the datasets of a given type, or the datasets in a collection. The method is treated as an SRB-registered dataset and hence inherits all meta-information about datasets, including structured metadata, which can be used to store information about the method's arguments and return values. See the DataCutter proxy for more information):

 METHOD_NAME_FOR_DATASET       /* data name of method on data */
 MTH_COLLNAME_FOR_DATASET      /* collection name of method on data */
 MTH_DATATYPE_FOR_DATASET      /* method type */

 METHOD_NAME_FOR_DATATYPE      /* data name of method on data type */
 MTH_COLLNAME_FOR_DATATYPE     /* collection name of method on data type*/
 MTH_DATATYPE_FOR_DATATYPE     /* method type */

 METHOD_NAME_FOR_COLLECTION    /* data name of method on collection */
 MTH_COLLNAME_FOR_COLLECTION   /* collection name of method on collection */
 MTH_DATATYPE_FOR_COLLECTION   /* method type */

Pre-Allocated User-defined Metadata Information for Datasets (MCAT pre-defines some attributes in which users can store metadata about their datasets. These metadata can be used in whatever form the user desires, including but not limited to: user-mapped attributes, (variable,value) pairs storing an arbitrary list of metadata, small structured metadata, sorted lists of values, etc. Note that each string attribute holds up to 350 characters.):

 UDMS0                         /* user-defined string metadata 0 for data */
 UDMS1                         /* user-defined string metadata 1 for data */
 UDMS2                         /* user-defined string metadata 2 for data */
 UDMS3                         /* user-defined string metadata 3 for data */
 UDMS4                         /* user-defined string metadata 4 for data */
 UDMS5                         /* user-defined string metadata 5 for data */
 UDMS6                         /* user-defined string metadata 6 for data */
 UDMS7                         /* user-defined string metadata 7 for data */
 UDMS8                         /* user-defined string metadata 8 for data */
 UDMS9                         /* user-defined string metadata 9 for data */
 UDMI0                         /* user-defined integer metadata 0 for data */
 UDMI1                         /* user-defined integer metadata 1 for data */
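
As one possible realization of the (variable,value) usage mentioned above, a user could store "name=value" strings in a UDMS attribute and search them with like. The convention and the helpers udmsPair and udmsNameQuery are hypothetical; only the 350-character limit comes from the text above. The sketch formats such a pair, rejects pairs longer than 350 characters, and builds a qval[] fragment matching every value stored under a given name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UDMS_MAX 350   /* maximum string length of a UDMS attribute */

/* Format "name=value" into out, which must hold at least UDMS_MAX + 1
 * bytes. Returns 0, or -1 if the pair would not fit in a UDMS attribute. */
int udmsPair(char *out, const char *name, const char *value)
{
    int n;

    assert(out != NULL && name != NULL && value != NULL);
    n = snprintf(out, UDMS_MAX + 1, "%s=%s", name, value);
    return (n < 0 || n > UDMS_MAX) ? -1 : 0;
}

/* Build a qval[] fragment " like 'name=%'" matching any value stored
 * under `name`; the literal % is written as %% in the format string. */
int udmsNameQuery(char *entry, size_t entryLen, const char *name)
{
    int n = snprintf(entry, entryLen, " like '%s=%%'", name);
    return (n < 0 || (size_t) n >= entryLen) ? -1 : 0;
}
```

Under like, an underscore in the variable name also matches any single character, so names containing _ may match more than intended unless escaped.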

Pre-Allocated User-defined Metadata Information for Collections:

 UDSMD_COLL0                   /* user-defined string metadata 0 for collection */
 UDSMD_COLL1                   /* user-defined string metadata 1 for collection */
 UDSMD_COLL2                   /* user-defined string metadata 2 for collection */
 UDSMD_COLL3                   /* user-defined string metadata 3 for collection */
 UDSMD_COLL4                   /* user-defined string metadata 4 for collection */
 UDSMD_COLL5                   /* user-defined string metadata 5 for collection */
 UDSMD_COLL6                   /* user-defined string metadata 6 for collection */
 UDSMD_COLL7                   /* user-defined string metadata 7 for collection */
 UDSMD_COLL8                   /* user-defined string metadata 8 for collection */
 UDSMD_COLL9                   /* user-defined string metadata 9 for collection */
 UDIMD_COLL0                   /* user-defined integer metadata 0 for collection */
 UDIMD_COLL1                   /* user-defined integer metadata 1 for collection */

Dublin Core Metadata for Datasets (for more information, please check http://www.dublincore.org/; this set of metadata, even though part of the MCAT core, is normally turned off in order to speed up processing, and patches need to be applied if this option is to be used):

 DC_DATA_NAME                   /* Dublin Core Data Name same as DATA_NAME */
 DC_COLLECTION                  /* DC: Collection Name same as COLLECTION_NAME */
 DC_CONTRIBUTOR_TYPE            /* DC: Contributor Type: Eg. Author, Illustrator */
 DC_SUBJECT_CLASS               /* DC: Subject Classification */
 DC_DESCRIPTION_TYPE            /* DC: Type of Description */
 DC_TYPE                        /* DC: Type of the Object */
 DC_SOURCE_TYPE                 /* DC: Type of the Source */
 DC_LANGUAGE                    /* DC: Language of the Object */
 DC_RELATION_TYPE               /* DC: Relation with another Object in (170,171) */
 DC_COVERAGE_TYPE               /* DC: Coverage Type */
 DC_RIGHTS_TYPE                 /* DC: Rights Type */
 DC_TITLE                       /* DC: Title of the Object */
 DC_CONTRIBUTOR_NAME            /* DC: Contributor Name. NOT same as (7) */
 DC_CONTRIBUTOR_ADDR            /* DC: Contributor Address */
 DC_CONTRIBUTOR_EMAIL           /* DC: Contributor Email */
 DC_CONTRIBUTOR_PHONE           /* DC: Contributor Phone */
 DC_CONTRIBUTOR_WEB             /* DC: Contributor Web Address */
 DC_CONTRIBUTOR_CORPNAME        /* DC: Contributor Affiliation */
 DC_SUBJECT_NAME                /* DC: Subject */
 DC_DESCRIPTION                 /* DC: Description */
 DC_PUBLISHER                   /* DC: Publisher Name */
 DC_PUBLISHER_ADDR              /* DC: Publisher Address */
 DC_SOURCE                      /* DC: Source Name */
 DC_RELATED_DATA_DESCR          /* DC: Related Data Description */
 DC_RELATED_DATA                /* DC: Data Related to (152,153) */
 DC_RELATED_COLL                /* DC: Related Collection */
 DC_COVERAGE                    /* DC: Coverage Information */
 DC_RIGHTS                      /* DC: Rights Information */