4.5 BCCAB Format

This is the format used in-house at the CCDC for the main-file database creation. It is also the internal format used in PreQuest; each entry is stored in a temporary working file using this format. When an entry is edited in PreQuest using a text editor, this BCCAB format will appear, regardless of the original input format (SHELX etc.). The description given here is sufficient to allow the user to achieve two objectives:

An entry in BCCAB is defined as a set of text lines of maximum length 80 characters. The first line must begin with the "?" character, followed by the reference code. The last line is #END.

For private databases it is best to use numeric reference codes which will not be confused with main-file records. The reference code must be the single unique identifier for each record, e.g. 00001234. This is the key reference number used in the CSD database system.

A typical input record for a published structure is shown below:

?HEMTEY 
#JRNL 035,59,2787,1994
#AUTHOR R.Gleiter, B.Treptow, H.Irngartinger, T.Oeser
#QUAL at 243 deg.K 
#PROPS "Color: colorless.
#SYSCAT sys O cat 3
#CELL   a 15.284,2 b 14.882,2 c 7.805,1
  z 4 sg Pnma v 1775.  cent 1
#DENSITY dx 1.24 fw 330.4
#RFACT R=  0.054
#COMPND Dimethyl hexacyclo(9.5.0.0$1,3!.0$2,10!.0$3,9!.0$9,11!)hexadeca 2,10-dicarboxylate
#SYNONM Dimethyl propella(3)prismane dicarboxylate
#FORMUL C20 H26 O4 
#ATOM O1 0.8519,1 0.4546,1 0.0761,2
O2 0.7778,1 0.3475,1 -0.0662,2
C1 0.8856,1 0.3022,1 0.1253,3
C2 0.9865,1 0.3024,1 0.1573,3
C3 0.9251,1 0.3025,1 0.3078,3
C4 0.9150,1 0.3661,1 0.4540,3
C5 0.9732,2 0.3380,2 0.6058,3
C6 0.9476,3 0.2500 0.6921,5
C7 1.0560,1 0.3662,2 0.1005,3
C8 1.1455,1 0.3382,2 0.1703,4
C9 1.1831,2 0.2500 0.0997,6
C10 0.8376,1 0.3760,1 0.0449,3
C11 0.7316,2 0.4181,2 -0.1575,4
#BOND O1 C10 1.214,3
C2 C3 1.504,3
O2 C10 1.329,3
C2 C7 1.492,3
O2 C11 1.453,3
C3 C3* 1.562,3
C1 C10 1.462,3
C3 C4 1.491,3
C1 C1* 1.554,3
C4 C5 1.539,4
C1 C2 1.562,3
C5 C6 1.525,3
C1 C3 1.548,3
C7 C8 1.530,3
C2 C2* 1.560,3
C8 C9 1.535,3
#END
The entry consists of a number of data fields. Each field begins on a new line with the character "#" followed by the field name. There are no restrictions on the order of fields, or the spacing within the text. The suggested minimum requirement for a private database is the following:

It is strongly recommended that as much data as possible is entered at the time of input as it will enrich the database and make the record much more useful to future users. You should check through the fields listed below and provide data if you can. Remember that #QUAL and #RMARKS text is "searchable" by Quest, and you can use your own keywords here.
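
The entry layout described above (a first line beginning with "?", fields introduced by "#", and a closing #END) can be split into fields with a short routine. The Python sketch below is illustrative only: the name split_entry is hypothetical and not part of PreQuest, and it assumes that any line not starting with "#" continues the preceding field.

```python
def split_entry(text):
    """Split one BCCAB entry into (refcode, {field name: [text lines]})."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("?"):
        raise ValueError("entry must begin with '?' followed by the reference code")
    refcode = lines[0][1:].strip()

    fields = {}
    current = None
    for line in lines[1:]:
        if len(line) > 80:
            raise ValueError("line longer than 80 characters")
        if line.startswith("#"):
            name, _, rest = line[1:].rstrip().partition(" ")
            if name == "END":
                break
            current = name
            # A field name may occur more than once; all its lines are collected.
            fields.setdefault(current, []).append(rest.strip())
        elif current is not None:
            # Assumed: a line without '#' continues the current field.
            fields[current].append(line.strip())
    else:
        raise ValueError("entry has no #END line")
    return refcode, fields
```

Applied to the HEMTEY record above, this would return the reference code HEMTEY and a dictionary keyed by JRNL, AUTHOR, CELL and so on, with the two-line #CELL field stored as two strings.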

A Note on Chemical Diagrams
At the CCDC a 2D chemical diagram is constructed using the 2D Edit function of PreQuest and is stored as the two fields #CONN and #DIAG. There is no need to describe the format of these fields here - if they appear in a working record, please do not edit them. In general you do not input the diagram by typing these fields. PreQuest will make the diagram either automatically (see Make 2D), or using the graphical interface (see 2D Edit).


#ATOM : Atomic information

Each atom has an atom label and fractional coordinates x/a, y/b, z/c with optional estimated standard deviations (e.s.d.), e.g.

   #ATOM C1  0.1234,3  -0.3456,12  0.4567,8
   C1'       0.3456,2   0.2345,13  0.3456,7
Atom labels
Maximum length 8 characters. Must begin with a valid element symbol, usually followed by numbers, e.g. Br2, C123, but any string of alphabetic characters or quote marks is allowed, e.g. C1' C1" H11a Ow1

Atom coordinates
Input these with a decimal point. If an e.s.d. is given, type it after a comma with no space, e.g. 0.1234,12 means the e.s.d. is 0.0012.
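
The e.s.d. convention can be made concrete with a small decoding routine. This is a minimal Python sketch, assuming the e.s.d. digits always apply to the last decimal places of the value as in the example above; the name parse_value is hypothetical.

```python
from decimal import Decimal


def parse_value(token):
    """Return (value, esd) for a token such as '0.1234,12'; esd is None if absent."""
    number, _, esd_digits = token.partition(",")
    value = Decimal(number)
    if not esd_digits:
        return value, None
    # The e.s.d. digits are scaled to the last decimal place of the value,
    # so '0.1234,12' has an exponent of -4 and an e.s.d. of 12 x 10^-4.
    exponent = value.as_tuple().exponent
    return value, Decimal(esd_digits).scaleb(exponent)
```

Decimal is used rather than float so that the number of decimal places typed in the record is preserved exactly.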

Suppressed flag
Suppressed atoms have an "S" after the z-coordinate. These can be manually written using the text editor. In the example below C1' is suppressed.

   #ATOM C1  0.1234,3  -0.3456,12  0.4567,8
   C1'       0.3456,2   0.2345,13  0.3456,7  S
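
A single atom line, including the optional suppressed flag, could be read as follows. This Python sketch is an illustration under stated assumptions: the name parse_atom_line is hypothetical, the e.s.d. digits are discarded, and the label check only tests the length and the first character rather than looking up a full table of element symbols.

```python
def parse_atom_line(line):
    """Parse one atom line into label, fractional coordinates and suppressed flag."""
    tokens = line.split()
    if tokens and tokens[0] == "#ATOM":
        tokens = tokens[1:]  # the first atom shares its line with the field name

    suppressed = len(tokens) == 5 and tokens[4] == "S"
    if len(tokens) != 4 and not suppressed:
        raise ValueError("expected: label x y z [S]")

    label = tokens[0]
    # Simplified check: full element-symbol validation is not attempted here.
    if len(label) > 8 or not label[0].isupper():
        raise ValueError("invalid atom label: " + label)

    # Keep only the value part of each 'value,esd' token.
    x, y, z = (float(t.split(",")[0]) for t in tokens[1:4])
    return {"label": label, "xyz": (x, y, z), "suppressed": suppressed}
```

For the example above, the second line would be returned with label C1' and suppressed set to True.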

#AUTHOR : Authors' names

Author names should be written in the following style:

   #AUTHOR A.B.Smith, J.-P.Mornon, P.Van Stappen, G.L'Abbe, P.Murray-Rust, 
   D. van der Helm, Yu.T.Struchkov, R.King III, E.F.Meyer Junior, Shao Mei-cheng
Note
Give initials with full stops and no spaces, and use commas to separate names. This is consistent with the main CSD file and enables concurrent searching.


#BOND : Bond lengths

This field contains the bond lengths reported by the author, which correspond to the atomic coordinates listed in #ATOM. Each bond length is described using the appropriate pair of atom labels followed by the value of the distance. If the e.s.d. of the bond length is available then it follows the value and is separated from it by a comma. E.g.

   #BOND  C1 C12  1.451,3
          C1 H1  0.98,1 
Note
This is optional input. These author-given bonds are used as a consistency check in Check-3D, which compares calculated and given values; they are not vital in PreQuest.
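
The label-label-distance layout of #BOND lends itself to reading in groups of three tokens. The Python sketch below is illustrative; the name parse_bonds is hypothetical, and the e.s.d. digits are returned as typed rather than scaled.

```python
def parse_bonds(text):
    """Parse the text of a #BOND field into (atom1, atom2, length, esd_digits) tuples."""
    tokens = text.split()
    if tokens and tokens[0] == "#BOND":
        tokens = tokens[1:]
    if len(tokens) % 3:
        raise ValueError("bonds must be given as: label label length[,esd]")

    bonds = []
    for i in range(0, len(tokens), 3):
        atom1, atom2, raw = tokens[i:i + 3]
        length, _, esd_digits = raw.partition(",")
        # esd_digits is None when the author gave no e.s.d. for this bond.
        bonds.append((atom1, atom2, float(length), esd_digits or None))
    return bonds
```

Because the tokens are grouped in threes, the routine accepts one bond per line as in the HEMTEY record or several bonds on a line.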


#CAS : Chemical Abstracts Service Registry Number

This field contains the Chemical Abstracts Service Registry Number. It takes the numeric form AAAAAA-BB-C where the first number AAAAAA can have up to 6 digits, BB has 2 digits, C is a single check digit, e.g. 699-98-9

Note
Optional input.
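
The check digit allows a registry number to be verified before it is entered. The rule used in this Python sketch is the standard CAS one, which is not stated in the text above: each digit of AAAAAA and BB is multiplied by its position counted from the right, and the sum modulo 10 must equal C. The name cas_is_valid is hypothetical, and the pattern follows the 6-digit limit given above.

```python
import re


def cas_is_valid(cas):
    """Check the format AAAAAA-BB-C and the CAS check digit."""
    match = re.fullmatch(r"(\d{1,6})-(\d{2})-(\d)", cas)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight 1 for the rightmost digit of BB, 2 for the next digit to the left, and so on.
    total = sum(int(d) * weight for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))
```

For 699-98-9 the weighted sum is 8x1 + 9x2 + 9x3 + 9x4 + 6x5 = 119, and 119 modulo 10 is 9, so the number passes.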


#CELL : Unit Cell Data

This field contains all of the unit cell information using a variety of keywords:

An example of a #CELL field is:

   #CELL   a 6.3746,5   b 15.8638,8   c 7.7460,6 
           alpha 87.12,1   beta 91.34,4   gamma 93.67,4   v 776.42 
           z 4   sg P-1   cent 1
This example of an anorthic cell can be used to illustrate various details:

In most reported studies the monoclinic cell is chosen with the b-axis unique so beta is recorded. However, if the a-axis is unique then alpha is recorded and likewise if the c-axis is unique then gamma is recorded.

Special conventions are used for the recording of monoclinic space group symbols for a- or c-axis unique:

   a-axis unique     P21 is recorded as P2111
                     P21/n is recorded as P21/n11

   c-axis unique     P21 is recorded as P1121
                     P21/n is recorded as P1121/n
If a trigonal space group is described in terms of a rhombohedral unit cell (a and alpha recorded) then the conventional space group symbol, e.g. R3, is recorded as R3r.

The cent flag is directly linked to the set of general equivalent positions (#SYMM field) which is program-generated from the space group symbol. For cent 1 the #SYMM field contains only one half of the general equivalent positions - those not related by the centre of symmetry at the origin.

Certain space groups allow a choice of origin and the program default always chooses the setting with a centre of symmetry at the space group origin. If this choice is incorrect for a particular structure determination then cent 2 should be manually set and the appropriate #SYMM field input manually.
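
The keyword-value layout of #CELL can be read as alternating token pairs, and the recorded volume v can then be compared with the value implied by the cell lengths and angles. The Python sketch below is an illustration, not a description of what PreQuest does: the names parse_cell and cell_volume are hypothetical, e.s.d. digits are discarded, and angles that are not recorded are assumed to be 90 degrees (as in the orthorhombic HEMTEY record, which lists none).

```python
import math


def parse_cell(text):
    """Parse the text of a #CELL field (without the field name) into a dictionary."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected alternating keyword/value pairs")
    cell = {}
    for key, raw in zip(tokens[0::2], tokens[1::2]):
        if key == "sg":
            cell[key] = raw                         # space group symbol, kept as text
        elif key in ("z", "cent"):
            cell[key] = int(raw)
        else:
            cell[key] = float(raw.split(",")[0])    # drop the e.s.d. digits
    return cell


def cell_volume(cell):
    """General (triclinic) cell volume; missing angles are assumed to be 90 degrees."""
    a, b, c = cell["a"], cell["b"], cell["c"]
    ca, cb, cg = (math.cos(math.radians(cell.get(k, 90.0)))
                  for k in ("alpha", "beta", "gamma"))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
```

For the HEMTEY record shown earlier, the computed volume is about 1775.3, in agreement with the recorded v 1775.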


#CLASS : Chemical Class

This field contains the chemical class assignment for the compound. These classes are listed below or can be seen in Quest by typing HELP CLASS. Example:

   #CLASS   5  9
Note
Each entry can be assigned up to 4 class numbers.

This is optional data but can be very useful especially for classifying natural products.

   Chemical Class                          Class Number
   Carbohydrates                            1
   Nucleosides & nucleotides                2
   Amino-acids, peptides and complexes      3
   Porphyrins, corrins & complexes          4
   Antibiotics                              5
   Steroids                                 6
   Terpenes                                 7
   Alkaloids                                8
   Miscellaneous natural products           9
   Supramolecular entities                 10
   High polymers                           11


#COMPND : Compound Name

This field contains the chemical compound name following standard rules if possible. For natural products the trivial name is usually recorded - a field exists to supplement the name by one or more synonyms e.g. drug or trade name (see #SYNONM).

CSD conventions are:

Note
You can use trivial names or local names e.g. Compound A12387


#FORMUL : Molecular Formula

This field contains the molecular formula, represented as the sum of the individual formulae for each of the residues. A residue is defined here as being a discrete bonded unit. For example, sodium acetate monohydrate consists of 3 residues viz. acetate anion, sodium cation, water molecule - its formula is recorded as:

   #FORMUL  C2 H3 O2 1-,Na1 1+,H2 O1
The general expression for a residue formula is:

Elements are listed in the order:

Note
This is always used at CCDC for consistency checking. If you use PreQuest to construct the 2D chemical diagram (Make 2D or 2D Edit) the #FORMUL field is generated automatically. A #FORMUL field is necessary for screen generation by PreQuest and without it the entry may be rendered unsearchable in the database.

In the example:

   #FORMUL  C2 H3 O2 1-,Na1 1+,H2 O1
notice that spaces separate the element items and commas separate the residues. Charges are given at the end of each residue in the style 1-, 2+ etc.
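
These separator rules are enough to break a #FORMUL field into residues. The Python sketch below is illustrative only: the name parse_formul is hypothetical, every element item is assumed to carry an explicit count as in the example, and any other notation (such as residue multipliers) is not handled.

```python
import re


def parse_formul(text):
    """Parse '#FORMUL' text into a list of (element counts, charge), one per residue."""
    residues = []
    for residue in text.split(","):              # commas separate residues
        counts = {}
        charge = 0
        for item in residue.split():             # spaces separate items
            is_charge = re.fullmatch(r"(\d+)([+-])", item)
            if is_charge:
                sign = 1 if is_charge.group(2) == "+" else -1
                charge = sign * int(is_charge.group(1))
                continue
            element = re.fullmatch(r"([A-Z][a-z]?)(\d+)", item)
            if not element:
                raise ValueError("unrecognised formula item: " + item)
            counts[element.group(1)] = int(element.group(2))
        residues.append((counts, charge))
    return residues
```

For the sodium acetate monohydrate example this gives three residues: the acetate anion with charge -1, the sodium cation with charge +1 and a neutral water molecule.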


#JRNL : Journal Reference

This field contains the journal reference for a published structure. It takes the form: coden, volume, page, year. E.g.

   #JRNL  591,39,136,1983
   #JRNL  1078,,,1995
Note
Optional input for private databases. If #JRNL is not input, PreQuest will treat the record as "Private Communication", coden 1078, and fill in the current year from the computer system date. The coden is a code number that points to the text name of a journal in a lookup table. The table can be seen in Quest by typing HELP CODEN.
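
The four-part reference and the private-communication default can be sketched as follows. This Python example is illustrative; the name parse_jrnl is hypothetical, the coden is kept as text so that a leading zero such as 035 survives, and the default branch only mimics the behaviour described in the note above.

```python
import datetime

PRIVATE_COMMUNICATION = "1078"  # coden quoted in the note above


def parse_jrnl(text=None):
    """Parse 'coden,volume,page,year'; with no text, return the private-communication default."""
    if text is None:
        return {"coden": PRIVATE_COMMUNICATION, "volume": None, "page": None,
                "year": datetime.date.today().year}
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 4:
        raise ValueError("expected: coden,volume,page,year")
    coden, volume, page, year = parts
    # Empty volume and page (as in '1078,,,1995') are stored as None.
    return {"coden": coden, "volume": volume or None, "page": page or None,
            "year": int(year)}
```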


#PROPS : Crystal Properties

This field stores crystal property information using four keywords

Example:

   #PROPS "Mp. 123deg.C
   "Color: red-brown.
   "Source: bark of Japoni
   "Note: Hygroscopic and decomposes on exposure to x-ra
Note
Optional input but potentially contains valuable information - include if relevant data are available.


#QUAL : Qualifier

This field contains important attributes of the compound, or of the crystallographic study if it differs from an X-ray study at room temperature.

The types of data recorded in main-file CSD are:

Examples:

   #QUAL neutron study, at 123 deg.C
   #QUAL neuroleptic drug, absolute configuration
   #QUAL blue monoclinic form
   #QUAL alpha form, high-order data only
   #QUAL refinement of data of Cannas et al.,Inorg.Chem. 16,228,1977
   #QUAL refinement in space group no. 33
Note
Optional for private databases - but recommended input.


#RADIUS : Elemental Radius for bonding

This field contains the radius used to determine the crystal connectivity for each element present in the list of atomic coordinates (#ATOM). The distance Dij between two atoms i and j is defined to be a bonding distance if

where Ri, Rj are the radii of the atoms and Tol is a tolerance, normally 0.4 Å.

Note
Not normally input. The radius values are obtained from a standard table by PreQuest. The user will not normally need to edit these values as radius adjustment can be performed by Radj in Check 3D. The table is printed in Vol 3. Appendix 10.
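
The inequality itself does not appear in the text above. As an illustration, the Python sketch below assumes the two-sided form commonly quoted for this kind of test, Ri + Rj - Tol <= Dij <= Ri + Rj + Tol. That form, the name is_bonded and the radius values used in the usage note are assumptions; they are not taken from the PreQuest radius table.

```python
def is_bonded(distance, radius_i, radius_j, tol=0.4):
    """Return True if the distance (in Angstroms) lies within tol of the sum of the radii.

    Assumed criterion: Ri + Rj - Tol <= Dij <= Ri + Rj + Tol.
    """
    radius_sum = radius_i + radius_j
    return radius_sum - tol <= distance <= radius_sum + tol
```

With placeholder radii of 0.68 Å for both atoms, a distance of 1.214 Å (the O1-C10 bond in the HEMTEY record) falls inside the window of 0.96 to 1.76 Å, whereas a 3.0 Å contact does not.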


#RFACT : R-Factor (crystallographic refinement accuracy)

This field contains the crystallographic R-factor as a decimal number e.g.

   #RFACT R=0.0410
The sub-keyword R= has the value 0.0410, indicating an R-factor of 4.1%

Note
Optional input - but a very valuable indicator of accuracy.


#RMARKS : Remarks

This field contains general remarks not catered for by the other keywords. Examples:

   #RMARKS  The coordinates of P(8) seem to be in error.
   #RMARKS  Unresolved problems with the coordinates of solvent.
Note
Optional input - but remember this is searchable text in Quest.


#SYNONM : Compound Synonym

This field contains any appropriate synonym(s) for the compound name. It is often used to record trade or drug names. If more than one synonym is required, separate them with semicolons. Examples:

   #SYNONM  Aspirin
   #SYNONM   Ampicillin; Nuvapen; Totapen
   #SYNONM   1,8-Dihydroxy-2,4,5,7-tetranitro-9,10-anthracenedione
Note
Optional input. Local company names could be given e.g. A1234C5.


#SYSCAT : Crystal System and Entry Category

This field contains the crystal system information using the keywords sys and cat e.g. #SYSCAT sys A cat

Crystal system is recorded as:

For trigonal space groups the conventions used are:

Category is always recorded as 3, meaning full structure determination.

Note
Not normally input - generated automatically by PreQuest.


#TOLER : Bonding Tolerance

This field contains the tolerance, in Angstroms, used in conjunction with #RADIUS for the determination of the crystal connectivity. See #RADIUS. E.g.

   #TOLER 0.40
Note
Not normally input - generated automatically by PreQuest.


#UNIS : Processing Information

This field is used to control the level of processing of the data, causing certain tests to be bypassed. It is chiefly for use by the CCDC editorial staff to allow unresolved errors to be flagged in the main database, and to ensure that no incorrect connectivity is stored. The data consists of keywords followed by values.

   int  3  diffractometer data
        2  densitometer photographic data
        1  visual photographic data

   ig   1  ignore atom valence checks
        2  serious error - no crystal connectivity output

   rpa  1  refer problem to author - by letter

   pd   1  disorder is present (set by PreQuest)