This document provides a brief note on usage of the database access interface (DAI). The DAI provides access to tabular data stored in databases using the SRB. This document details how to use the SRB API to ingest into and retrieve from a database tables. The DAI includes mechanisms for tailoring the input and output streams. This is performed by associating a 'template' file with the input which can be used to interpret the data in the input file and convert them into SQL statements for inserting into the database. On the way out, the 'template' file can be used to construct forms and marked-up documents using the tabular data produced by SQL statements. These conversion operations occur on the fly and inside the SRB and is conveniently transparent to the user. We have implemented a template language called T-language which is used for building these templates. We discuss briefly how to use the T-language templates. A separate primer on the T-language is also available in the TLANG.primer file.
We provide brief notes on various stages in using the DAI interface:
1. Registering a database object with MCAT:
(a) Register the location of the database using ingestLocation command using the string "<hostname>:<databasename>:<instancename>" as the netprefix parameter.
<hostname> is the name of the host where the SRBserver brokering the database is located. <databasename> is the name of the database. <instancename> is the name of the database instance or it can also be the name of the user under which the database is going to be accessed.
ingestLocation 'catharsis dlib1 DB' 'catharsis.sdsc.edu:dlib1:srb' 'catharsis sdsc' srb sdsc
Currently we provide only one user per resource (i.e., per location). We plan to provide a more flexible combination in a later release.
(b) Register the resource using ingestResource command with as the default path name, '??? table database' as the resource type, where ??? can be oracle, db2 or illustra.
ingestResource 'ora-dlib1-sdsc' 'oracle table database' 'catharsis dlib1 DB' permanent 0
(c) Register the database itself as a SRB dataset using the Sregister command and use options -D 'database shadow object' /* for datatype */ and (empty string) for RegisteringObjectPath value.
Sregister -D "database shadow object" -R ora-dlib1-sdsc "" dlib1
The shadow object is a unique concept in SRB. Normally, when we ingest or register an object in SRB, the object is considered to be an atomic object and is retrieved as such when accessed. When we register (shadow objects cannot be ingested) a shadow object in SRB, it implies that the object is not a 'real' object (with its own bits and bytes), but stands for (or is a pointer to) a set or structure of objects. Using the data type 'directory shadow object' one can register a directory in SRB without registering (or ingesting) its component files and sub-directories). In this sense the path name registered for that object is partial and gives a prefix. During actual access, the user provides the suffix portion of the path that is appended to the path. The shadow object in this sense is not a real object but stands for a structured set of objects whose contents/access paths are defined at access time.
The 'database shadow object' provides a data type for a database. With the path name being empty, it will allow any query to be sent to the database. One can provide for a more controlled querying by giving a partial path such as 'select ' instead of . Note that the default path registered as RegisteringObjectPath can contain variables in them and hence a very flexible but uniquely controllable querying is possible.
Sregister -D 'database shadow object' -R ora-dlib1-sdsc 'select * from AA ' dlib1 with the where condition optionally specified by the user.
Sregister -D 'database shadow object' -R ora-dlib1-sdsc '<TEMPLATETYPE>HTMLREL</TEMPLATETYPE> select ' dlib1 which predefines a template to be used for accessing results from this database through this object.
(d) Enter the three lines (for Oracle) in data/TableConfig file: OracleUser '<databasename>:<instancename>' '<username>' OraclePasswd '<databasename>:<instancename>' '<password>' OracleHome '<databasename>:<instancename>' '<home-dir-of-oracle>' <username> is the name of the user under which the database is going to be accessed. <password> is the password to be used when connecting to the database. <home-dir-of-oracle> is the path name of Oracle Homeas given in ORACLE_HOME Example: OracleUser 'dlib1:srb' 'sekar'
OraclePasswd 'dlib1:srb' 'sekarpassword' OracleHome 'dlib1:srb' '/usr/local/apps/oracle/product/9.2.0'
In case of DB2, use the phrases Db2User and Db2Passwd and for Illustra use the phrases IllusUser and IllusPasswd.
The PostgreSQL database driver is compiled by adding the --enable-psqlobj to the configure flags. Also, check the patches page, and ensure you don't need the patches for Bug 196, which prevents SRB from compiling.
Enter the three lines (for PostgreSQL ) in data/TableConfig file:
PostgresUser 'PostgreSQL:emp_test' 'osUser'
PostgresHome 'PostgreSQL:emp_test' '/scratch/slocal/osUser/pgsql'
For postgreSQL database, if your 'database shadow object' was from another postgreSQL database, (not SRB metadata MCAT database, for SRB user)the OS user who runs DAI-enabled SRB server, in this user's home directory should have an .odbc.ini file, look like this:
[PostgreSQL] Servername = your-MCAT-hostname Database = MCAT Port = 5432 Driver = PostgreSQL UserName = srb [sampleDB] Servername = your-DAI-enable-SRBserver-hostname Database = sampleDB Port = 5432 Driver = PostgreSQL UserName = srb
2. Retrieving data from databases:
There are three ways of accessing data from a database through the DAI, each of increasing complexity.
(a) Packed Output
In the first and simplest method, one provides a SQL select query as value for the SHADOW variable. The output is in the form of a file with every value string in the row-order output is terminated with a ASCII NULL symbol. We call this mode of presentation as the PACKED mode, since there is no mark up involved except for the delimiting of values using the NULL symbol ('\0' of C).
Scat 'dlib1&SHADOW= select * from AA'
Sget '/home/sekar.sdsc/dlib1&SHADOW= select A,B from AA where B >= 2' outFile
Scat 'dlib1&SHADOW= select distinct T.A, S.C from AA T, BB S where T.B = S.B order by 1 asc, 2 desc'
Note how, the value of the SHADOW variable is given using the ampersand as part of the object name in the Scommand. Also there should be no blanks between the object name and the '&' symbol unless the object name has been registered with blanks. In these samples we are assuming that the tables are pre-created and have a few values inserted appropriately before hand.
(b) Built-In Output Templates
In the second method, the user can apply any of the few built-in templates that are provided in the DAI package. These templates give a a set of output independent of the attributes involved. The templates currently provided by DAI are:
HTMLREL - result table encoded as an HTML tabular data. The attribute names are used for heading the column names. HTMLTREE - result table is given in a directory-like tree form with one value per line of the table. HTMLNEST - result table is given as a nested relational table. This requires that the columns be ordered. The difference between this and HTMLTREE is that in here the values in a subsequent column are nested into the same box of the HTML table if there column values are the same. XMLREL - result table is marked up in XML tags. The tag structures used are very simple and intuitive. XMLTREE - the result table is nested and tagged in XML. This captures the semi-structures found in the result table. This also requires that the columns be ordered.
Scat 'dlib1&SHADOW= <TEMPLATETYPE>HTMLREL</TEMPLATETYPE> select * from AA'
Sget '/home/sekar.sdsc/dlib1&SHADOW= <TEMPLATETYPE>HTMLNEST</TEMPLATETYPE> select A,B from AA where B >= 2 order by 1,2' outFile
Scat 'dlib1&SHADOW= <TEMPLATETYPE>XMLTREE</TEMPLATETYPE> select distinct T.A, S.C from AA T, BB S where T.B = S.B order by 1 asc, 2 desc'
(c) User-defined Templates
In this method, the user creates a template in a special language called the T-language (cf. TLANG.primer) and registers that template as a SRB object. The user can give read permissions for the template to other users and groups if desired. The template can then be used to format the output of the result table. The template can take parameters which can also be given by the user when performing the access. These templates can be viewed as style sheets for displaying the tabular data.Note that templates, more often than not, are tailored to specific queries or class of queries.
Scat 'dlib1&SHADOW= <TEMPLATETYPE>TEMPLATE /styles/userInfoForm1.html</TEMPLATETYPE> select rsrc_id,rsrc_name,rsrc_typ_name from mdas_cd_rsrc A,mdas_td_rsrc_typ B where A.rsrc_typ_id=B.rsrc_typ_id'
Sget '/home/sekar.sdsc/dlib1&SHADOW= <TEMPLATETYPE>TEMPLATE /styles/userInfoForm2</TEMPLATETYPE><TEMPLATEPARAM>alpha=2</TEMPLATEPARAM> select A,B from AA ' outFile
In the first example, the template userInfoForm1.html is being used for configuring the output of the query. The second example uses a template which requires a parametric value for alpha at access time. The user supplied value of 2 is used during this access. The value is used in computation when formatting the output. For more information on the template creation and execution, please see the primer in file TLANG.primer.
The section on registering a dataset provides an example of how one can pre-associate a template to an object.
3.Inserting data into databases:
The Database Access Interface provides four ways of inserting data into a database. One important point to note is that when one inserts data into a database using the DAI, the database and the associated tables are expected to exist before hand and registered with SRB. Because of this one needs to open the srbObject (using srbObjOpen) instead of creating the object (using srbObjCreate). In the case of Scommands, one should use the Sappend utility instead of the Sput utility. In the current release Scp or Smv are not supported for this type of resource. But, using the DAI, one can copy a set of tables (possibly one) from one database into another set of tables (possibly arranged in a different schema) in another database of possibly different vendor. The Scat and Sappend are used inconnected by a pipe to achieve this.
(a) The simple method : using SQL insert statements. In this method we send insert statements through the write-buffers. These insert statements get executed and committed when the file is closed. For this type of ingestion SHADOW value = insert [into <tab-name> ] and the data value sent through buffers is a set of insert statements separated by commas.
Sappend -i 'dlib1&SHADOW=insert into AA' insert into AA values (30,40,30,'aa','bb'); insert into AA values (40,40,30,'aa','bb')
Sappend -i 'dlib1&SHADOW=insert' insert into AA values (30,40,30,'aa','bb'); insert into AA values (30,20,30,'aa','bb'); insert into AA values (40,40,30,'aa','bb')
Sappend -i 'dlib1&SHADOW=insert' insert into AA values (30,40,30,'aa','bb'); insert into BB values (40,'aa','bb')
Sappend -i "dlib1&SHADOW=insert" insert into AA select A,A,A,B,C from BB;
The -i option of Sappend diverts the terminal input into SRB object. Hence, the lines following that statement are the ones that are filled into a buffer and sent to SRB. As can be seen one can ingest multiple tables and execute complex insert statements using this method.
(b) Insertion using PACKED data and VALUES data This method allows one to insert data into a single table where the data is in PACKed form (i.e., each value is ASCII NULL separated). For example, the output from a table using the packed format (cf. Section 2(a)) is in this format and can be inserted into another database table if necessary using this format.
Sappend packFile 'dlib1&SHADOW=<TEMPLATETYPE>PACK</TEMPLATETYPE>insert into AA'
Scat '/home/sekar.sdsc/dlib1&SHADOW= select distinct * from AA' | Sappend -i 'dlib1&SHADOW=<TEMPLATETYPE>PACK</TEMPLATETYPE>insert into AA'
The second example shows how one can copy tabular data from one table to another (in this case itself) possibly on different databases from different vendors. Also, The ingestion takes into account whether the value goes into a character column or a numeric column and adds surrounding quotes if needed. The table name in the insert statement of SHADOW is used for this purpose.
One can insert the PACKED values into multiple tables also. This is accomplished using the MULTIINSERT preamble being added to the SQL statements in the SHADOW value.
Sappend packFile "dlib1&SHADOW=<TEMPLATETYPE>PACK</TEMPLATETYPE>MULTIINSERT insert into CC1 values (***0:,***1:,***2:);insert into CC2 values (***0:,'***3:','***4:');"
The DAI executes the set of SQL statements given in the MULTIINSERT part one after the other. The variables (predefined from ***0: onwards upto ***255: ) are bound with values from the input buffers. Note that if the field is a character field, the user has to specifically add the quotes unless the input buffer has quoted the appropriate field values. The technique can be used to copy data from one database to another and into an entirely different schema.
The PACKED format takes in NULL separated data and is useful when the data is generated by a machine. In case, data is hand generated one can use the VALUES template. With this tag, each row of values to be inserted is separated from the next row of values using the ';' symbol. DAI takes these values and plugs them into insert statements and executes them.
Sappend -i 'dlib1&SHADOW=<TEMPLATETYPE>VALUES</TEMPLATETYPE> insert into AA' 300,400,300,'eaa','ebb'; 300,200,300,'eaa','ebb'; 400,400,300,'eaa','ebb';
Note that each value can be in separate lines as long as each value is separated by a ',' and each row separated by a ';'. Multi table inserts are also possible (as in previous example) with the VALUES type.
(c) Insertion of Data in Built-In Template formats This method allows the user to insert data that exist in any of the built-in formats supported by DAI.
Sappend xmlrelfile 'dlib1&SHADOW= <TEMPLATETYPE>XMLREL</TEMPLATETYPE> insert into AA'
Scat '/home/sekar.sdsc/dlib1&SHADOW= <TEMPLATETYPE>XMLREL</TEMPLATETYPE>select distinct * from AA'
| Sappend -i 'dlib1&SHADOW=<TEMPLATETYPE>XMLREL</TEMPLATETYPE>insert into AA'
The same techniques will work for other built-in templates also, Multi table inserts are also allowed in this method.
Sappend htmlnestFile "dlib1&SHADOW=<TEMPLATETYPE>HTMLNEST</TEMPLATETYPE> MULTIINSERT insert into CC1 values (***0:,***1:,***2:); insert into CC2 values (***0:,'***3:','***4:');"
Note that the DAI automatically recovers all the values even when the data is in the nested relational formats.
(d) Insertion of Data in User-Defined Template formats
This method allows the user to insert data that was created/output using a user-defined template. Note that for performing the ingestion, the template should have ingestion rules as part of the template. Additional information on these rules are given in the TLANG.primer file.
Sappend templfile 'dlib1&SHADOW= <TEMPLATETYPE>TEMPLATE /styles/userInfoForm1.html</TEMPLATETYPE> <TEMPLATEPARAM> alpha = 24 , beta = 34 </TEMPLATEPARAM> insert into BB'
Note that two parameters alpha and beta are being used by the template routine when calculating the data to be stored. Multiple table inserts are also possible with user-defined templates.
Sappend templfile2 "dlib1&SHADOW=<TEMPLATETYPE>TEMPLATE /styles/userInfoForm1.html</TEMPLATETYPE><TEMPLATEPARAM>alpha = 2, beta = 34 </TEMPLATEPARAM> MULTIINSERT insert into B1 values (***RSRC_ID:,'***RSRC_NAME:'); insert into B2 values (***RSRC_ID:,'***RSRC_TYP_NAME:')"
Since in a template, the order in which the values are placed in the template may differ from the order of target fields in the tables, user-defined variable-values, such as ***RSRC_ID: are used. More details of how the user-defined variable-values are gathered from the data using the template can be found in the TLANG.primer file.
4. Performing other database functions:
The DAI provides a way for performing other database functions. But this requires the user to give the exact statements needed by the native database being accessed. Note that in most cases if many such statements are issued during a single access, all of them will be performed in order but in a bunch (i.e., even if typed in, they probably will be executed only when a ^-D is pressed). Hence one looses interactivity as seen from a SQL query client. Because of this we recommend that users do not use this for executing ad hoc data definition functions through this API. But if there is a script that has been tested, this may be a good vehicle to execute them.
Sappend -i "dlib1&SHADOW=" create table D1 (AA char(25), BB int); insert into D1 values('aaa',25);
Sappend execFile "dlib1&SHADOW="
Note that the SHADOW variable is given an empty string. Do not use "dlib1&SHADOW=" as this will be taken as a null quoted string and will result in an error. Note also that even though we are creating a new table, we still use the Sappend command and not the Sput command.