Volume 1 Chapter 13 Customising your QUEST/QUEST3D Implementation
GSTAT is a multi-functional geometry program within the Cambridge Structural
Database System (CSDS). It will perform:
- Geometrical calculations for complete CSD entries.
- Location of 3D substructural fragments (intramolecular or intermolecular)
within the crystallographic connectivity records of the CSD.
- Geometrical calculations for substructural fragments.
- Statistical and numerical analyses of fragment geometry.
- Output of atomic coordinate data, in a variety of forms, for complete CSD
entries or for substructural fragments.
Fragment location and geometrical calculations are of two types:
- Intramolecular: based on the connectivity of bonded units (residues) in the
crystal chemical unit of each CSD entry.
- Intermolecular: based on the extended connectivity of the complete crystal
structure, constructed using van der Waals radii supplied by the user.
The GSTAT program uses two mandatory input files:
- A subfile of CSD entries in FDAT format. This subfile is generated by the
program QUEST in response to a particular search query.
- A file containing alphanumeric instructions to control the operation of the
GSTAT program.
The GSTAT program can optionally generate a number of output files, of which the
principal ones are:
- A file of atomic coordinates for use by external modelling software.
- A file of fragment geometry, as defined by the user, for use by external
statistical, numerical or visualization software.
- Files of information generated by one (or more) of the statistical
methodologies contained within GSTAT itself.
The GSTAT program is supplied with both the basic and graphics implementations
of the CSDS. In the former case, GSTAT provides the sole ability to perform 3D
substructure searches.
Within the graphics version, much of the functionality of GSTAT (except the
ability to perform statistical analyses) has been transferred to the upgraded
QUEST3D program. In this version, the processes of statistical analysis and
data visualization are being transferred to a new menu-driven, interactive
program called VISTA.
All releases of GSTAT issued after October 1st 1992 are interfaced to the
graphics-version QUEST3D program via the "Fragment" file generated by that
program. This interface permits GSTAT to take advantage of the improved
precision of 2D/3D substructure searches afforded by QUEST3D.
Volume 4 (Chapters 1 - 9) of the CSD System Documentation describes the
functions of GSTAT and the structure of the alphanumeric "Instruction File"
through which these functions are accessed.
A central part of Volume 4 is Chapter 9, which presents a definitive glossary
of GSTAT keywords and qualifiers, arranged in keyword order.
The earlier Chapters 2 through 4 of Volume 4 describe the operation of
individual sections of the program, together with the associated keyword subset
and illustrations of program input and output. Chapters 5 through 8 deal with
miscellaneous background information essential to a full understanding of the
GSTAT program.
The remainder of this introduction presents brief summaries of the
functionality of GSTAT and its mode of operation. The introduction is ordered
according to the Chapter titles of Volume 4. Items in upper case in Sections
14.2 - 14.4 below are the relevant major GSTAT keywords.
Chapter 2 of Volume 4 describes the instructions required for calculations
involving the complete crystal chemical unit, consisting of discrete bonded
units or 'residues'. Functions available are:
- CALCulation of standard INTRAmolecular geometry: bond
lengths, valence angles and torsion angles.
- CALCulation of COORDination sphere geometry: distances and
angles about a selected element within some specified radius.
- CALCulation of INTERmolecular distances involving specified
elements within specified distance limits.
- OUTPUT of atomic COORDinates for the complete crystal
chemical unit in the following forms:
- Fractional coordinates referred to crystallographic axes.
- Cartesian coordinates based on crystallographic axes.
- Cartesian coordinates referred to inertial axes.
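The relationship between the first two coordinate forms above, and the way bond lengths follow from them, can be sketched as below. This is a minimal illustration, assuming one common orthogonalisation convention (x along a, y in the a-b plane, z completing a right-handed set); the axis convention actually used by GSTAT is not stated here and may differ. The function names are hypothetical.

```python
import math

def orthogonalisation_matrix(a, b, c, alpha, beta, gamma):
    """Return a 3x3 matrix M such that cart = M @ frac.

    Assumed convention (one common choice, not necessarily GSTAT's):
    x along a, y in the a-b plane, z completing a right-handed set.
    Cell angles are in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Dimensionless volume factor: cell volume V = a * b * c * v
    v = math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]

def frac_to_cart(frac, cell):
    """Convert fractional (x, y, z) to Cartesian coordinates in Angstrom."""
    m = orthogonalisation_matrix(*cell)
    return tuple(sum(m[i][j] * frac[j] for j in range(3)) for i in range(3))

def distance(frac1, frac2, cell):
    """Interatomic distance computed from two sets of fractional coordinates."""
    return math.dist(frac_to_cart(frac1, cell), frac_to_cart(frac2, cell))
```

For a cubic cell with a = 10 Angstrom, the fractional point (0.1, 0.2, 0.3) maps to approximately (1, 2, 3). Coordinates referred to inertial axes would require a further rotation onto the principal axes of the atom set, which is not shown.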
Chapter 2 of Volume 4 also describes how the user may control (a) which entries
are processed, (b) which elements are treated in the calculations, and (c) the
content and layout of the results file.
Chapter 3 of Volume 4 describes the facilities for 3D search and for the
calculation of molecular geometry for intramolecular and intermolecular
chemical fragments. These comprise:
- FRAG: specification of the substructural fragment.
- SETUP: specification of geometrical objects such as centroids,
vectors and planes.
- DEFine: definition and naming of numerical parameters to be
generated for each occurrence of the specified chemical fragment. These
parameters may be:
- Data-entry parameters, e.g. R-factor, space group number, etc.
- Simple geometrical parameters calculated from the atomic positions and/or the
geometrical objects established via the SETUP command.
- Special geometrical parameters, such as ring puckering parameters, generated
via special functions within GSTAT.
- TRANS: formation of new parameters from DEFINE'd parameters
via simple FORTRAN-like statements, e.g. the sum of two
parameters, the absolute value of a given parameter, etc.
- SELect: selection of located fragments on the basis of their 3D
geometrical characteristics, e.g. a specified torsion angle must fall
within a specified numerical range, etc.
- KILL/KEEP: elimination/retention of specific fragments by use
of their reference number(s).
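A torsion angle is a typical simple geometrical parameter of the kind produced by DEFine, and a range test on it is a typical SELect criterion. The sketch below is illustrative only: the function names are hypothetical, it does not reproduce GSTAT's instruction syntax, and it assumes the usual sign convention (positive when, viewed along the central bond p2 to p3, the front bond must rotate clockwise to eclipse the back bond).

```python
import math

def _sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))

def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def torsion(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # atan2 of (|b2| b1.(b2 x b3), (b1 x b2).(b2 x b3)) gives the signed angle.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def select(fragments, lo, hi):
    """Keep fragments whose |torsion| lies in [lo, hi].

    fragments: iterable of (reference_number, (p1, p2, p3, p4)) pairs
    with Cartesian coordinates (a hypothetical layout).
    """
    kept = []
    for ref, atoms in fragments:
        tor = torsion(*atoms)        # DEFine-like parameter
        abs_tor = abs(tor)           # TRANS-like derived parameter
        if lo <= abs_tor <= hi:      # SELect-like range test
            kept.append((ref, tor, abs_tor))
    return kept
```

Taking the absolute value before the range test is one way to treat enantiomeric conformations (torsions of +t and -t) together, which relates to the chirality issues discussed below.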
Later sections of Chapter 3 of Volume 4 describe how to control (a) the CSD
entries to be used in these calculations, and (b) the content and layout of the
results file.
Special and more detailed sections discuss the facilities available for (a)
treating fragments that may exist in 'chiral' and 'achiral' chemical
environments, and (b) handling the effects of topological symmetry that may
occur in some simple chemical fragments.
The processes described in Chapter 3 of Volume 4 result in the generation of a
data matrix of Nf rows and Np columns, where Nf is the number of fragments that
are located and which pass the 3D selection procedures, and Np is the (fixed)
number of numerical parameters specified for each fragment.
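The Nf x Np data matrix can be pictured as a list of rows with a fixed column order. The sketch below assumes that fragments are numbered 1, 2, ... in the order located and that KILL removes rows by that reference number; the parameter names (e.g. "RFAC", "TOR1") and the function names are hypothetical illustrations, not GSTAT identifiers.

```python
from typing import Dict, List, Sequence

def build_data_matrix(fragments: Sequence[Dict[str, float]],
                      parameters: Sequence[str],
                      kill: Sequence[int] = ()) -> List[List[float]]:
    """Assemble an Nf x Np data matrix.

    fragments:  one dict per located fragment, mapping parameter name to value.
    parameters: the Np parameter names; this fixes the column order.
    kill:       reference numbers (1-based, assumed) of fragments to drop.
    """
    killed = set(kill)
    matrix = []
    for ref, frag in enumerate(fragments, start=1):
        if ref in killed:
            continue
        # Every row supplies all Np parameters, so the row length is fixed.
        matrix.append([frag[name] for name in parameters])
    return matrix

def column(matrix: List[List[float]], j: int) -> List[float]:
    """Extract column j (one variable) for univariate analysis."""
    return [row[j] for row in matrix]
```

Keeping Np fixed means each column can be treated as a single variable by the analyses summarized next.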
Chapter 4 of Volume 4 is primarily concerned with the analysis of the data
matrix generated using the procedures described in Chapter 3 (Volume 4).
Analyses may use one (univariate), two (bivariate) or many (multivariate)
columns of the stored matrix to derive systematic results concerning the
chemical fragment. Current GSTAT facilities are:
- Simple Descriptive Statistics: provides a listing of the data matrix and,
for each column (variable) calculates the mean, the maximum and minimum values,
the sample standard deviation, the standard deviation of the mean, and the
number of observations.
- Visual Display of Geometrical Data: generation of
- HISTogram(s) for individual variable(s).
- SCATtergram(s) for pair(s) of variable(s).
- Visual Display of 3D Structures:
- OUTPUT of atomic COORDinates for each occurrence of the fragment in
the forms described in Section 14.2 above, for use by external plotting packages.
- SUPerposition of retrieved fragments using inertial axis coordinates
and least-squares fitting, for use by external plotting packages.
- CHI-squared analyses of distribution(s) of individual variable(s).
- Generation of CORrelations, covariances and individual parameter
variances in matrix form.
- Simple linear REGRession of one parameter on another.
- Principal Component (FACtor) Analysis of a
number of variables selected from the complete data matrix. Scattergrams of
principal component scores often provide a valuable visual overview of the
multivariate data matrix, and can indicate the presence of groups (clusters) of
fragments having similar geometry.
- CLUSTer Analysis based on a number of variables selected from the
complete data matrix: numerical dissection of the dataset into groups
(clusters) of fragments having similar geometry.
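The descriptive statistics, covariance/correlation and simple regression items listed above can be sketched from their textbook definitions as follows. This is an illustration under the assumption that GSTAT uses the standard (n - 1) sample estimators; the function names are hypothetical and the output layout of GSTAT is not reproduced.

```python
import math
from typing import Dict, List, Sequence, Tuple

def describe(values: Sequence[float]) -> Dict[str, float]:
    """Descriptive statistics for one column (variable) of the data matrix."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation uses the (n - 1) divisor.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return {"n": n, "mean": mean, "min": min(values), "max": max(values),
            "sd": sd, "sd_mean": sd / math.sqrt(n)}

def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance of two columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def covariance_matrix(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Covariance matrix; the diagonal holds the individual variances."""
    return [[covariance(ci, cj) for cj in columns] for ci in columns]

def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (undefined for a constant column)."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

def regress(y: Sequence[float], x: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y on x: returns (intercept, slope)."""
    slope = covariance(x, y) / covariance(x, x)
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return intercept, slope
```

The standard deviation of the mean is the sample standard deviation divided by the square root of the number of observations, which is why it shrinks as more fragments are retrieved.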
Chapter 4 of Volume 4 is NOT intended as a textbook on statistical and
numerical methods. It does, however, contain many references to suitable
background texts, and also to the application of statistical and numerical
techniques to crystallographic data retrieved from the CSD.
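As a small example of the HISTogram and CHI-squared items listed above, the sketch below bins one variable and computes the Pearson chi-squared statistic against a uniform expected distribution. The uniform null hypothesis, equal-width bins and function names are assumptions made for illustration; the expected distributions and binning actually offered by GSTAT are defined in Volume 4.

```python
from typing import List, Sequence

def histogram(values: Sequence[float], lo: float, hi: float,
              nbins: int) -> List[int]:
    """Count values in nbins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        if lo <= v <= hi:
            # A value equal to hi is placed in the last bin.
            k = min(int((v - lo) / width), nbins - 1)
            counts[k] += 1
    return counts

def chi_squared_uniform(counts: Sequence[int]) -> float:
    """Pearson chi-squared statistic sum((O - E)^2 / E) with uniform E."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)
```

The statistic would be compared with the chi-squared distribution on (number of bins - 1) degrees of freedom; for a torsion angle spanning -180 to 180 degrees, a large value suggests preferred conformations rather than a uniform spread.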
Substructure search facilities within GSTAT operate on a distance-based
crystallographic connectivity representation. Distance-based connectivity for
the bonded residues of the crystal chemical unit is established using a set of
'standard' covalent radii during the building of each CSD entry. Details of
this intramolecular connectivity representation are included in each FDAT entry
and will be used as a default by GSTAT.
GSTAT has the ability to override this default so as to construct a
distance-based intramolecular connectivity table which uses covalent radii that
are input by the user.
GSTAT also has the ability to construct an intermolecular connectivity
representation using van der Waals radii for specific elements that are input
by the user. This intermolecular connectivity can then be used to locate
fragments that involve hydrogen-bonded and non-bonded interactions between
specified elements.
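Both kinds of distance-based connectivity described above reduce to comparing an interatomic distance with a sum of radii. The sketch below assumes a bond criterion of d <= r(A) + r(B) + 0.4 Angstrom for covalent radii and d <= R(A) + R(B) + tolerance for van der Waals radii; the tolerance values, the radii in the usage example and the function names are illustrative assumptions, not GSTAT defaults.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# (element symbol, Cartesian coordinates in Angstrom)
Atom = Tuple[str, Tuple[float, float, float]]

def bonded_pairs(atoms: Sequence[Atom], radii: Dict[str, float],
                 tolerance: float = 0.4) -> List[Tuple[int, int]]:
    """Intramolecular connectivity: pairs within r_i + r_j + tolerance."""
    pairs = []
    for (i, (ei, pi)), (j, (ej, pj)) in combinations(enumerate(atoms), 2):
        if math.dist(pi, pj) <= radii[ei] + radii[ej] + tolerance:
            pairs.append((i, j))
    return pairs

def contacts(mol_a: Sequence[Atom], mol_b: Sequence[Atom],
             vdw: Dict[str, float],
             tolerance: float = 0.0) -> List[Tuple[int, int]]:
    """Intermolecular contacts between two residues.

    Only elements present in the user-supplied vdw dict are considered,
    mimicking a search restricted to specified elements.
    """
    return [(i, j)
            for i, (ei, pi) in enumerate(mol_a)
            for j, (ej, pj) in enumerate(mol_b)
            if ei in vdw and ej in vdw
            and math.dist(pi, pj) <= vdw[ei] + vdw[ej] + tolerance]
```

Supplying a different radii dictionary to bonded_pairs corresponds to overriding the default covalent radii; restricting the vdw dictionary to, for example, O and H focuses the contact search on possible hydrogen bonds.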
Chapter 5 of Volume 4 describes how to modify the connectivity representations
that are used by GSTAT in the location of chemical fragments. It also
describes the differences that exist between the crystallographic connectivity
tables employed in GSTAT, and the chemical connectivity tables that are used by
the CSD programs QUEST (basic) and QUEST3D (graphics).
The various functions that may be performed by program GSTAT lead, inevitably,
to a large number of possible instructions. Chapter 6 of Volume 4 breaks these
instructions down into subsets that are related to broad areas of GSTAT
operations. It indicates any rules that govern the ordering of these subsets
within the complete instruction set and, in some cases, the required ordering
of individual keywords within a given subset.
The integration of 2D and 3D substructural searching within the graphics search
program QUEST3D represents a fundamental advance for the CSD System.
Specifically, it provides absolute precision in the identification of the
constituent atoms of a substructural fragment, in terms of both the 2D chemical
constraints and the 3D geometrical constraints that must be satisfied. This
precision is not always available in the search mechanisms of GSTAT, as
summarized in Section 14.5.
The QUEST3D-GSTAT link described in Chapter 7 of Volume 4 provides a mechanism
through which GSTAT may take advantage of the integrated fragment location
algorithms of QUEST3D. By this means, the fragment search mechanisms of GSTAT
are circumvented, and the program will operate on the atoms identified in the
more precise QUEST3D search.
Chapter 8 of Volume 4 describes the information content and format of the
various files that can be output by the GSTAT program.
These files consist of (a) files of atomic coordinates, (b) files of fragment
geometry (the data matrix referred to in Sections 14.3 and 14.4), and (c)
various files that are specific to one or more of the statistical and numerical
techniques summarized in Section 14.4.
Chapter 9 of Volume 4 is the core of the GSTAT manual. It provides fully
detailed definitions of each of the keywords, subkeywords and numerical values
that can be input as part of a GSTAT instruction file.
The glossary is ordered alphabetically by keyword for ease of reference.