Back to PreQuest User Guide
To show how PreQuest can be used to create a small private database, we have provided some example files with the program distribution
(see $CSDHOME/examples/prequest). The most common formats presented by
users are CIF (example.[1-4]) and SHELX (example.[5-8]). This section will illustrate what you will need to do to process these examples into a useful CSD searchable database.
- start up PreQuest, specifying X-Windows and the input file example.1:
prequest -term x -if example.1
- the input file is recognised as being
CIF,
and the Main Menu appears with the example loaded:
[Figure: Schematic of PreQuest Main Window]
- the program has automatically created a 2D chemical diagram from the given
3D coordinates
(see section 2.14).
This diagram is essential for the search
routines of the Quest program to function correctly, as it provides the
chemical connectivity, bond types and atomic charges. This diagram does not
need to be elegant, but it can be visually improved by use of the
2D Edit
function if required (see section 2.16).
- a small amount of editing is required for this entry before it can be
stored in a database. The Message Box
contains colour-coded
error messages
where:
- red: "serious" error, must be dealt with
- yellow: "warning", should be dealt with, but less imperative
- grey: "information only", does not require attention
(Note: the level of checking can be controlled -
see section 1.5)
- the red error message for this entry states that there is an incomplete
JRNL record. No journal reference was given in the input file. Structures
that do not have valid journal references can be recorded as "Private
Communications". In order to classify this structure as a Private
Communication, use the mouse to click anywhere in the Reference
text box, then at the cursor type 1078,,,1996 - where 1078
refers to the
journal code
number for Private Communications, and 1996 is the year
(see section 4.5 for a
full explanation of the format of the reference line - #JRNL). Hit
Return and
the red error message will be replaced by the grey information message stating
that the journal reference has successfully been set to a Private
Communication.
- the compound name should also be given, despite the absence of this not
being an error. Use the mouse to position the cursor in the Compound
text box, then type the compound name for this structure:
Methyl 4-iodo-1-cubane
carboxylate.
- in the Refcode List the entry will now have a green Checking
Status. This means that there are no significant errors and this structure
can be written to a searchable database file. To save the entry to a private
database select Export, choose the ASER format, then type a
filename (your intended database name e.g. mybase). Once
successfully exported the "database" can be searched by Quest,
using the command
quest problemname -db mybase
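The strings typed in this walkthrough can be assembled programmatically, for example when preparing many entries. The following is a minimal sketch only: the function names are hypothetical, and only the command-line flags and the 1078,,,1996 pattern shown above are taken from this guide. The meaning of the two empty fields in the reference line is not covered here (see section 4.5).

```python
from typing import List

PRIVATE_COMMUNICATION_CODE = 1078  # journal code for Private Communications (from this guide)


def private_communication_reference(year: int) -> str:
    """Build a reference line in the same pattern as the guide's '1078,,,1996'.

    The two fields between the journal code and the year are left empty,
    exactly as in the example; their meaning is described in section 4.5.
    """
    return f"{PRIVATE_COMMUNICATION_CODE},,,{year}"


def prequest_command(input_file: str, terminal: str = "x") -> List[str]:
    """Argument list equivalent to: prequest -term x -if <input_file>."""
    return ["prequest", "-term", terminal, "-if", input_file]


def quest_command(problem_name: str, database: str) -> List[str]:
    """Argument list equivalent to: quest <problem_name> -db <database>."""
    return ["quest", problem_name, "-db", database]


if __name__ == "__main__":
    print(private_communication_reference(1996))
    print(" ".join(prequest_command("example.1")))
    print(" ".join(quest_command("problemname", "mybase")))
```

The functions only build strings and argument lists; nothing is executed, so the sketch can be tried without PreQuest installed.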
PreQuest performs a large number of checks on the data fields of every entry. For convenience these can be divided into:
- 1D: Text and Numerical Items
- 2D: Chemical Diagram
- 3D: Crystallographic Connectivity
The individual data fields in each of these distinct classes can be easily
edited from within the PreQuest program. This can be done using the
Editing Function buttons available in the
Main Menu.
1D Edit
This allows the user to edit the actual text of the record (in BCCAB format -
see section 4.5).
However, many entries can be satisfactorily changed by direct
overtyping in the Data Boxes provided, e.g. Author, Compound etc.
When editing text fields in PreQuest:
- CONTROL-U erases all text
- ESCAPE reverts to previous contents
- DELETE deletes the character left of cursor
- RETURN completes the edit
- TAB moves cursor to next field
Note that on starting PreQuest the default editor is set to be
"textedit" for Suns. If your computer is not a Sun you must select
another editor from the list in the Preferences menu
(see section 2.19).
We recommend that Silicon Graphics (SGI) users choose "jot", while other Unix users
choose "xedit".
2D Edit
Allows modification of the chemical diagram. Specific editing of the diagram
may be necessary to ensure that chemical bond types and atomic charges are
correctly assigned. Users also have the option of completely redrawing the
chemical diagram. This may be required, for example, if certain chemical
conventions are to be adhered to. See section
2.16.
3D Check
The user can change the coordinates to correct any errors in the structure, and
has the ability to suppress any unwanted atomic sites in disordered
structures. See section 2.18.
When to Edit
As a general guide, if there is a "red" error status then some
editorial action should be taken. PreQuest will allow erroneous entries
to be archived, for example when they are missing fields or contain mistakes.
However, it should be remembered that unchecked data greatly reduces the value
of your database, and may result in search failures when using Quest. It
is, therefore, well worth the effort at this stage to create records of as high
a standard as possible.
There is an option to control the levels of error checking in the
Preferences menu
(see section 2.19).
It is normally sufficient to
use the Relaxed (minimal rules) option.
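The colour-coded scheme described earlier (red, yellow, grey) can be modelled as a simple lookup. The sketch below is illustrative only and is not how PreQuest works internally: the message texts and function names are hypothetical, and the only rule it encodes is the one stated above, that a red message calls for editorial action.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Colour meanings as described in this guide.
SEVERITY: Dict[str, str] = {
    "red": "serious error, must be dealt with",
    "yellow": "warning, should be dealt with, but less imperative",
    "grey": "information only, does not require attention",
}

Message = Tuple[str, str]  # (colour, text); the texts used below are made up


def count_by_colour(messages: List[Message]) -> Dict[str, int]:
    """Count messages per colour, reporting zero for colours that are absent."""
    counts = Counter(colour for colour, _ in messages)
    return {colour: counts.get(colour, 0) for colour in SEVERITY}


def needs_editing(messages: List[Message]) -> bool:
    """Return True if any red (serious) message is present."""
    return any(colour == "red" for colour, _ in messages)


if __name__ == "__main__":
    example = [
        ("red", "incomplete JRNL record"),
        ("grey", "2D diagram generated from 3D coordinates"),
    ]
    print(count_by_colour(example))
    print(needs_editing(example))
```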
It is essential to edit the atomic coordinates for disordered sites. This is
because the current Version 5 CSD software does not have storage facilities for
disorder groups, site numbers, and occupancy factors. If a structure contains
disorder (as occurs for ~12% of all CSD entries) then the minor occupancy sites
must be suppressed to leave a single representative atomic position,
that will then be matched against the chemical diagram. Suppressed
atoms are
not deleted, they simply take no further part in the establishment of the
crystallographic connectivity, and therefore will not be used in Quest 3D
searches.
How to deal with a typical example of a disordered structure is now illustrated using another example:
- Open example.2
- there are a number of error messages shown in the dialogue box for this
structure. The summary of the number and status of these is also shown (6 red
errors, 16 yellow errors). The error message about the missing journal
reference can be fixed as for the first example. The remaining messages
are all a result
of coordinates for a disordered CF3 group being present. This is
immediately apparent from the chemical diagram which is nonsensical.
- the disorder must be treated before the entry
is in a satisfactory state to be archived to a
database file.
- Select 3D Check.
- it is clear that two sites are given for each CF3 group.
Assuming that the occupancy factors are 0.5 we can choose to suppress one of
those sites. Select SUPP (suppress) then click on the atoms F26', F27',
F28', F30', F31' and F32' (you may have to use the 3D Controls to
manipulate the view of the structure). Finally re-select SUPP to finish
the selection process. The selected atoms are suppressed - their atom labels
remain on the screen but the bonds to them are removed.
- to save these edits and return to the main menu select QuitS (quit
and save). To complete the corrections for this entry the chemical diagram
needs to be re-made. Select
Make2D. PreQuest re-makes the
chemical
diagram based on the new crystallographic connectivity.
- all of the error messages now disappear and the entry goes
"green". This is now in a suitable state for saving to a database
file.
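The key point of the steps above is that suppressed atoms are flagged rather than deleted, and only the unsuppressed atoms take part in the connectivity. A minimal sketch of that idea follows. The Atom class and function names are hypothetical and do not reflect PreQuest's internal data structures; the atom labels are those of example.2.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Atom:
    label: str
    suppressed: bool = False  # suppressed atoms stay in the list


def suppress(atoms: List[Atom], labels: Iterable[str]) -> List[str]:
    """Flag the named atoms as suppressed; return any labels that were not found."""
    wanted = set(labels)
    found = set()
    for atom in atoms:
        if atom.label in wanted:
            atom.suppressed = True
            found.add(atom.label)
    return sorted(wanted - found)


def active_atoms(atoms: List[Atom]) -> List[Atom]:
    """Atoms that would still contribute to the crystallographic connectivity."""
    return [atom for atom in atoms if not atom.suppressed]


if __name__ == "__main__":
    numbers = (26, 27, 28, 30, 31, 32)
    # Two alternative sites per fluorine: the unprimed and the primed label.
    atoms = [Atom(f"F{n}") for n in numbers] + [Atom(f"F{n}'") for n in numbers]
    not_found = suppress(atoms, [f"F{n}'" for n in numbers])
    print(not_found)
    print([atom.label for atom in active_atoms(atoms)])
    print(len(atoms))  # unchanged: nothing has been deleted
```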
The assignment of reference codes (refcodes) for your private database is
controlled by an auxiliary file called prequest.refcodes. This file
should be present in your root directory. If it is not present PreQuest
will assign a 6-digit number in sequence starting at 000001.
prequest.refcodes consists of 8 lines, each of which defines the
sequence of characters that one position of the refcode can take as each new
data entry is read. This example shows the 8 lines as:
S
0123456789
0123456789
0123456789
0123456789
0123456789
<blank>
<blank>
This has the effect of always keeping character 1 as "S" and
characters 7 and 8 blank. The sequence of the codes will then be
S00001, S00002, S00003 etc.
Between sessions PreQuest records the last refcode it generated in the
9th line of this file. A filename other than prequest.refcodes can be
specified by using the environment variable CCDCNEWREFCODES.
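The refcode sequence described above behaves like an odometer in which each character position has its own alphabet. The sketch below reproduces the S00001, S00002, ... sequence under several assumptions that this guide does not confirm: blank lines contribute no character, the rightmost position changes fastest, and the first code is one step after the combination of first characters. All function names are hypothetical, and this is not PreQuest's own implementation.

```python
import os
from typing import List, Optional, Tuple


def refcodes_filename() -> str:
    """File name to use: CCDCNEWREFCODES if set, else prequest.refcodes."""
    return os.environ.get("CCDCNEWREFCODES", "prequest.refcodes")


def parse_refcodes(text: str) -> Tuple[List[str], Optional[str]]:
    """Split file contents into 8 per-position alphabets and the optional 9th line."""
    lines = text.split("\n")
    alphabets = [(lines[i] if i < len(lines) else "").strip() for i in range(8)]
    last = lines[8].strip() if len(lines) > 8 and lines[8].strip() else None
    return alphabets, last


def next_refcode(alphabets: List[str], last: Optional[str] = None) -> str:
    """Return the code that follows `last` (or the first code if `last` is None)."""
    active = [a for a in alphabets if a]  # assumption: blank positions are dropped
    if last is None:
        indices = [0] * len(active)
    else:
        if len(last) != len(active):
            raise ValueError("last refcode does not match the number of positions")
        indices = [a.index(ch) for a, ch in zip(active, last)]
    # Odometer-style increment, rightmost position first.
    for pos in range(len(active) - 1, -1, -1):
        if indices[pos] + 1 < len(active[pos]):
            indices[pos] += 1
            break
        indices[pos] = 0
    else:
        raise OverflowError("refcode sequence exhausted")
    return "".join(a[i] for a, i in zip(active, indices))


if __name__ == "__main__":
    digits = "0123456789"
    text = "\n".join(["S"] + [digits] * 5 + ["", ""])
    alphabets, last = parse_refcodes(text)
    code = next_refcode(alphabets, last)
    print(code)
    print(next_refcode(alphabets, code))
```

With the 8-line example shown above and no 9th line, successive calls yield S00001 and then S00002, matching the sequence given in the text.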
It should also be noted that the CIF field _database_code_CSD
can be used from
within a CIF to specify a refcode for the structure as it is read into PreQuest.
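To check in advance which refcode a CIF would supply, the _database_code_CSD item can be read with a few lines of Python. This is a deliberately simplified sketch, not a full CIF parser: it assumes the item appears as a plain tag/value pair, and it treats the CIF placeholders "?" and "." as "no value". The function name is hypothetical.

```python
import re
from typing import Optional

_TAG = re.compile(r"^[ \t]*_database_code_CSD\s+(\S+)", re.MULTILINE | re.IGNORECASE)


def csd_refcode_from_cif(cif_text: str) -> Optional[str]:
    """Return the value of _database_code_CSD, or None if absent or unset."""
    match = _TAG.search(cif_text)
    if match is None:
        return None
    value = match.group(1).strip("'\"")  # drop optional CIF quoting
    if value in ("?", "."):
        return None
    return value


if __name__ == "__main__":
    sample = "data_example\n_database_code_CSD  S00042\n_cell_length_a 10.0\n"
    print(csd_refcode_from_cif(sample))
```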
At the CCDC, production of the main database requires that all entries pass a
series of detailed and elaborate checks. This ensures that the data is of the
highest standard and integrity. PreQuest incorporates two levels of
checking that reflect this: Strict (full CCDC rules) and Relaxed
(minimal rules). Most users will find that the Relaxed setting is adequate
for private database creation, and this is the default setting in the
Preferences menu.
However, some words of warning are needed. It is, for example, not necessary to
give an author or compound name, but this means that these records will never
be retrievable by Quest using these search parameters. The absolute
minimum for an entry is to have a journal reference. If you switch on the
Strict setting you will find that the author, compound name, formula, cell
and class are also required at the CCDC.
The Check
menu presents options to switch on/off checking at 1D, 2D and
3D levels. We advise keeping all of these checks ON. Switching off these
checks can result in important errors being missed, which may make the
entry unsearchable by Quest.
The recommended minimum data for private databases are:
- Journal
- Author
- Compound Name
- Cell, Space Group, z-value
- R-factor
- Atomic Coordinates
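A simple pre-check against this list can be scripted before entries are loaded. The sketch below assumes each entry is held as a Python dictionary; the key names are hypothetical and are not PreQuest or BCCAB field names.

```python
from typing import Any, Dict, List

# One key per item in the recommended-minimum list above (names are illustrative).
RECOMMENDED_FIELDS = (
    "journal",
    "author",
    "compound_name",
    "cell",
    "space_group",
    "z_value",
    "r_factor",
    "coordinates",
)


def is_missing(value: Any) -> bool:
    """Treat None, empty strings and empty lists as missing; 0 and 0.0 are valid."""
    return value is None or value == "" or value == []


def missing_recommended(entry: Dict[str, Any]) -> List[str]:
    """Return the recommended fields that the entry does not provide."""
    return [name for name in RECOMMENDED_FIELDS if is_missing(entry.get(name))]


if __name__ == "__main__":
    entry = {
        "journal": "1078,,,1996",
        "compound_name": "Methyl 4-iodo-1-cubane carboxylate",
        "coordinates": [("I1", 0.1, 0.2, 0.3)],
    }
    print(missing_recommended(entry))
```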
Note that for space groups which require exact cell angle values (e.g. for P2₁2₁2₁ the cell angles alpha =
beta = gamma = 90 degrees) the alpha, beta and gamma
data fields for the entry may show no values. This is because they are exact
and are therefore considered as "redundant" or "assumed"
for that system. (For more information see #CELL -
section 4.5.)
We also recommend that any textual information useful for retrieving items
within your local context be added. For example:
Synonym      Compound 2317P Lab number 567894
Qualifier    antibiotic activity
Remarks      Refinement incomplete - see file wxyz.res
Properties   Phi 56.7
Most of the data fields that are presented to the user have obvious meanings.
They relate to the data fields of the BCCAB format specification
(see section 4.5). Of the slightly less
obvious 1D data fields:
- sys is the crystallographic system
(see #SYSCAT) e.g. sys T =
tetragonal
- cat is the determination category
(see #SYSCAT) e.g. cat 3 = full
structure determination
- int refers to the intensity measurement
(see #UNIS) e.g. int 3 =
diffractometer data
- class refers to the chemical class assignments made for the compound
(see #CLASS)