DERIVING SYBYL ATOM AND BOND TYPES FROM THE CSD

Contents

A. Preamble
B. Index to Atom Type Rules
C. Rules for Determining Atom Types
D. Bond Types

A. Preamble

An entry in the CSD has a 2D domain (CONN) and a 3D domain (DATA). The 2D domain contains the chemical connectivity of the structure, i.e., the element symbol of each atom and the bond types connecting them. The 3D domain contains the atom coordinates and the crystal connectivity of the structure, i.e., the element symbol of each atom and which atoms are connected to which; it contains no information about bond types.

The 2D and 3D domains are linked by matching numbers, which indicate which atom in the 3D domain corresponds to which atom in the 2D domain. For some entries this mapping cannot be made; such an entry is said to be unmatched. For an unmatched entry, the bond types stored in the 2D domain cannot be related to the crystal connectivity and coordinates stored in the 3D domain.

If an entry is matched, atom-typing is based predominantly on the 2D element symbols and bond types. Only one 3D test is used: the sum of the bond angles around nitrogen, which distinguishes planar from pyramidal nitrogen (rule 1.8.6.2.2).

If an entry is unmatched, atom-typing can only be done using the 3D crystal connectivity and the geometry of the structure. Because no bond types are available and hydrogen atoms may be missing from the 3D domain, atom-typing is less accurate.

Some entries are partially matched. In these entries, if an atom is matched then the 2D atom-type is used; otherwise, the 3D atom-type is used.

Some entries have coordinates but no crystal connectivity (and are consequently unmatched). For these entries, the atom type can only be a crude guess based on the element symbol alone (rule 3).

Reference

Clark, M., Cramer, R.D. III, Van Opdenbosch, N., J. Comp. Chem., 10, 982-1012 (1989).


B. Index to Atom Type Rules

Carbon sp3 (C.3)				1.6.1, 2.5.1, 2.5.2, 2.5.3, 3.5
Carbon sp2 (C.2)				1.6.5, 2.5.2, 2.5.3
Carbon sp (C.1)					1.6.4, 2.5.2, 2.5.3
Carbon aromatic (C.ar)				1.6.3
Carbocation (guanidinium) (C.cat)		1.6.2
Nitrogen sp3 (N.3)				1.8.6, 2.7.2, 2.7.4, 3.7
Nitrogen sp2 (N.2)				1.8.7, 2.7.5
Nitrogen sp (N.1)				1.8.3, 1.8.4, 2.7.2, 2.7.5
Nitrogen aromatic (N.ar)			1.8.2
Nitrogen amide (N.am)				1.8.5, 2.7.3
Nitrogen trigonal planar (N.pl3)	 	1.8.6, 2.7.4	
Nitrogen sp3 positively charged (N.4)		1.8.1, 2.7.1
Oxygen sp3 (O.3)				1.7.2, 2.6.2, 2.6.3, 3.6
Oxygen sp2 (O.2)				1.7.3, 2.6.4
Oxygen in carboxylates and phosphates (O.co2)	1.7.1, 2.6.1
Sulphur sp3 (S.3)				1.9.3, 2.8.3, 3.8
Sulphur sp2 (S.2)				1.9.4, 2.8.4
Sulphoxide sulphur (S.o)			1.9.1, 2.8.1
Sulphone sulphur (S.o2)				1.9.2, 2.8.2
Phosphorus sp3 (P.3)				1.4, 2.3, 3.3
Titanium (Ti.th, Ti.oh)				1.10, 2.9, 3.4
Chromium (Cr.th, Cr.oh)				1.10, 2.9, 3.4
Cobalt (Co.oh)					1.5, 2.4, 3.4
Ruthenium (Ru.oh)				1.5, 2.4, 3.4
Deuterium 					1.3, 2.2, 3.2
Polymeric atoms					1.2
Suppressed atoms				1.1, 2.1, 3.1
Other atoms					1.11, 2.10, 3.9
Matched entries					1
Unmatched entries				2
Entries with no crystal connectivity		3

C. Rules for Determining Atom Types

num_bond is the number of bonds an atom forms excluding pi-bonds.
num_nonmet is the number of non-metal bonds an atom forms excluding pi-bonds.

A non-metal bond is a bond to one of

   H D B C N O F Si P S Cl As Se Br Te I At He Ne Ar Kr Xe Rn

The default atom-type is dummy (Du).
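
The num_bond and num_nonmet definitions above can be computed as in the following Python sketch (Python 3.8 or later). The data layout is an assumption made for illustration, not a CSD file or API structure: `elements` is a list of element symbols indexed by atom, and `bonds` is a list of `(i, j, csd_type)` tuples using the CSD bond-type codes from section D. For unmatched entries, which carry no bond types, `None` can be passed as `csd_type`, in which case every bond is counted.

```python
# Sketch of num_bond / num_nonmet. Hypothetical data layout: `elements` is a
# list of element symbols indexed by atom number; `bonds` is a list of
# (i, j, csd_type) tuples using the CSD codes of section D (9 = pi).
NON_METALS = {
    "H", "D", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "As",
    "Se", "Br", "Te", "I", "At", "He", "Ne", "Ar", "Kr", "Xe", "Rn",
}
PI = 9


def bond_counts(atom, elements, bonds):
    """Return (num_bond, num_nonmet) for the atom with index `atom`."""
    num_bond = num_nonmet = 0
    for i, j, csd_type in bonds:
        if atom not in (i, j) or csd_type == PI:
            continue  # not this atom's bond, or a pi bond (excluded)
        partner = j if i == atom else i
        num_bond += 1
        if elements[partner] in NON_METALS:
            num_nonmet += 1
    return num_bond, num_nonmet


# A carbonyl oxygen (O=C, CSD type 2) that is also bonded to a copper atom.
elements = ["O", "C", "Cu"]
bonds = [(0, 1, 2), (0, 2, 1)]
print(bond_counts(0, elements, bonds))  # (2, 1)
```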

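The order of the rules is important: within each block the rules are tested in the order given, and the first rule that applies determines the atom type.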

 
1. If entry is matched then for each atom
 
1.1 	If atom is suppressed then atom_type is Du
 
1.2 	If atom is at the end of a polymeric bond then atom_type is Du
 
1.3 	If element_symbol is D then atom_type is H
 
1.4 	If element_symbol is P then atom_type is P.3
 
1.5 	If element_symbol is Co .OR. element_symbol is Ru then 
	atom_type is Co.oh or Ru.oh
 
1.6 	If element_symbol is C then

1.6.1 		If num_bond .ge. 4 .AND. all bonds are single then 
		atom_type is C.3

1.6.2 		If num_bond .eq. 3 .AND. all bonds are acyclic .AND. 
		all bonds are to nitrogen .AND. each nitrogen forms bonds to 2 
		other atoms, neither of which is oxygen, then atom_type 
		is C.cat

1.6.3 		If num_bond .ge. 2 .AND. 2 bonds are aromatic then atom_type 
		is C.ar

1.6.4 		If ( num_bond .eq. 1 .OR. num_bond .eq. 2 ) .AND. one bond is 
		triple then atom_type is C.1

1.6.5 		If element_symbol is C and none of the above then atom_type 
		is C.2
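
Rule 1.6 can be sketched in Python under the same hypothetical layout (element symbols plus `(i, j, csd_type)` bond tuples). Two readings are assumptions rather than statements of the rules: "2 bonds are aromatic" is taken to mean at least two, and a bond counts as cyclic when its two atoms remain connected after the bond itself is ignored.

```python
from collections import deque

# Hypothetical layout: `elements` holds element symbols; `bonds` holds
# (i, j, csd_type) tuples with the CSD codes of section D.
SINGLE, TRIPLE, AROMATIC, PI = 1, 3, 5, 9


def neighbours(atom, bonds):
    """(partner, csd_type) pairs for `atom`, ignoring pi bonds."""
    return [(j if i == atom else i, t) for i, j, t in bonds
            if atom in (i, j) and t != PI]


def in_ring(a, b, bonds):
    """True if a and b stay connected when the direct a-b bond is ignored."""
    seen, queue = {a}, deque([a])
    while queue:
        x = queue.popleft()
        for y, _ in neighbours(x, bonds):
            if {x, y} == {a, b}:
                continue  # skip the bond being tested
            if y == b:
                return True
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def guanidinium_n(n, carbon, elements, bonds):
    """N bonded to exactly two atoms besides `carbon`, neither of them O."""
    others = [q for q, _ in neighbours(n, bonds) if q != carbon]
    return len(others) == 2 and all(elements[q] != "O" for q in others)


def carbon_type_matched(atom, elements, bonds):
    nbrs = neighbours(atom, bonds)
    types = [t for _, t in nbrs]
    if len(nbrs) >= 4 and all(t == SINGLE for t in types):
        return "C.3"  # 1.6.1
    if len(nbrs) == 3 and all(
            elements[p] == "N" and not in_ring(atom, p, bonds)
            and guanidinium_n(p, atom, elements, bonds) for p, _ in nbrs):
        return "C.cat"  # 1.6.2
    if len(nbrs) >= 2 and types.count(AROMATIC) >= 2:
        return "C.ar"  # 1.6.3, read as "at least two aromatic bonds" (assumption)
    if len(nbrs) in (1, 2) and TRIPLE in types:
        return "C.1"  # 1.6.4
    return "C.2"  # 1.6.5


# Guanidinium carbon: C bonded to three NH2 groups.
elements = ["C", "N", "N", "N"] + ["H"] * 6
bonds = [(0, 1, 1), (0, 2, 2), (0, 3, 1),
         (1, 4, 1), (1, 5, 1), (2, 6, 1), (2, 7, 1), (3, 8, 1), (3, 9, 1)]
print(carbon_type_matched(0, elements, bonds))  # C.cat
```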
 
1.7 	If element_symbol is O then

1.7.1 		If num_nonmet .eq. 1 then

1.7.1.1 		If bond is to carbon .AND. carbon forms a total of 3 
			bonds, 2 of which are to an oxygen forming only 1 
			non-metal bond then atom_type is O.co2

1.7.1.2 		If bond is to phosphorus .AND. phosphorus forms at 
			least 2 bonds to an oxygen forming only 1 non-metal 
			bond then atom_type is O.co2

1.7.2 		If num_bond .ge. 2 .AND. all bonds are single then atom_type 
		is O.3

1.7.3 		If element_symbol is O and none of the above then atom_type 
		is O.2
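
A sketch of rule 1.7 under the same assumed layout follows; the helper `is_terminal_o` is a hypothetical name for "an oxygen forming only 1 non-metal bond". Following the wording of the rules, the count in 1.7.1.1 is read as exactly two such oxygens and the count in 1.7.1.2 as at least two; "the bond" in 1.7.1 is taken to be the oxygen's single non-metal bond.

```python
# Hypothetical layout: element symbols plus (i, j, csd_type) bond tuples.
NON_METALS = {
    "H", "D", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "As",
    "Se", "Br", "Te", "I", "At", "He", "Ne", "Ar", "Kr", "Xe", "Rn",
}
SINGLE, PI = 1, 9


def neighbours(atom, bonds):
    """(partner, csd_type) pairs for `atom`, ignoring pi bonds."""
    return [(j if i == atom else i, t) for i, j, t in bonds
            if atom in (i, j) and t != PI]


def num_nonmet(atom, elements, bonds):
    return sum(1 for p, _ in neighbours(atom, bonds) if elements[p] in NON_METALS)


def is_terminal_o(atom, elements, bonds):
    """An oxygen forming only one non-metal bond."""
    return elements[atom] == "O" and num_nonmet(atom, elements, bonds) == 1


def oxygen_type_matched(atom, elements, bonds):
    nbrs = neighbours(atom, bonds)
    if num_nonmet(atom, elements, bonds) == 1:  # 1.7.1
        # The single non-metal partner; bonds to metals are ignored here.
        partner = next(p for p, _ in nbrs if elements[p] in NON_METALS)
        p_nbrs = neighbours(partner, bonds)
        n_term = sum(1 for q, _ in p_nbrs if is_terminal_o(q, elements, bonds))
        if elements[partner] == "C" and len(p_nbrs) == 3 and n_term == 2:
            return "O.co2"  # 1.7.1.1 (carboxylate)
        if elements[partner] == "P" and n_term >= 2:
            return "O.co2"  # 1.7.1.2 (phosphate)
    if len(nbrs) >= 2 and all(t == SINGLE for _, t in nbrs):
        return "O.3"  # 1.7.2
    return "O.2"  # 1.7.3


# Acetate, CH3-C(=O)-O(-), with hydrogens omitted for brevity.
elements = ["C", "C", "O", "O"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
print(oxygen_type_matched(2, elements, bonds))  # O.co2
```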
 
1.8 	If element_symbol is N then

1.8.1 		If num_nonmet .eq. 4 .AND. all bonds are single then atom_type 
		is N.4

1.8.2 		If num_bond .ge. 2 .AND. 2 bonds are aromatic then atom_type 
		is N.ar

1.8.3 		If num_nonmet .eq. 1 .AND. bond is triple then atom_type is N.1

1.8.4 		If num_nonmet .eq. 2 .AND. ( bonds are double, double .OR. 
		bonds are single, triple ) then atom_type is N.1

1.8.5 		If num_nonmet .eq. 3 .AND. one bond is to C=O or C=S then 
		atom_type is N.am

1.8.6 		If num_nonmet .eq. 3 then

1.8.6.1 		If one bond is not single then atom_type is N.pl3

1.8.6.2 		If all bonds are single then

1.8.6.2.1 			If one single bond is to an atom that forms a 
				bond of type double, triple, aromatic or 
				delocalised .AND. one other single bond is to H 
				then atom_type is N.pl3

1.8.6.2.2 			If one single bond is to an atom that forms a 
				bond of type double, triple, aromatic or 
				delocalised .AND. neither of the other single 
				bonds is to H .AND. sum_of_angles around 
				N .ge. 350 deg then atom_type is N.pl3

1.8.6.3 		Otherwise atom_type is N.3

1.8.7 		If element_symbol is N and none of the above then atom_type 
		is N.2
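
Rule 1.8, including the single 3D test mentioned in the preamble, might look like this in Python. The layout is the same hypothetical one, plus `coords`, a list of (x, y, z) tuples. Two assumptions: the bonds examined in 1.8.3 to 1.8.6 are the non-metal bonds, and because hydrogen cannot be the neighbour that forms a double, triple, aromatic or delocalised bond, "one other single bond is to H" in 1.8.6.2 is tested simply as "some neighbour is H".

```python
import math
from itertools import combinations

# Hypothetical layout: element symbols, (i, j, csd_type) bonds, (x, y, z) coords.
NON_METALS = {
    "H", "D", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "As",
    "Se", "Br", "Te", "I", "At", "He", "Ne", "Ar", "Kr", "Xe", "Rn",
}
SINGLE, DOUBLE, TRIPLE, AROMATIC, DELOCALISED, PI = 1, 2, 3, 5, 7, 9


def neighbours(atom, bonds):
    """(partner, csd_type) pairs for `atom`, ignoring pi bonds."""
    return [(j if i == atom else i, t) for i, j, t in bonds
            if atom in (i, j) and t != PI]


def sum_of_angles(centre, partners, coords):
    """Sum of the partner-centre-partner angles in degrees."""
    total = 0.0
    for a, b in combinations(partners, 2):
        u = [coords[a][k] - coords[centre][k] for k in range(3)]
        v = [coords[b][k] - coords[centre][k] for k in range(3)]
        cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
        total += math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    return total


def forms_multiple_bond(atom, bonds):
    """True if `atom` has a double, triple, aromatic or delocalised bond."""
    return any(t in (DOUBLE, TRIPLE, AROMATIC, DELOCALISED)
               for _, t in neighbours(atom, bonds))


def nitrogen_type_matched(atom, elements, bonds, coords):
    nbrs = neighbours(atom, bonds)
    nm = [(p, t) for p, t in nbrs if elements[p] in NON_METALS]  # assumption
    nm_types = sorted(t for _, t in nm)
    if len(nm) == 4 and all(t == SINGLE for _, t in nbrs):
        return "N.4"  # 1.8.1
    if len(nbrs) >= 2 and sum(t == AROMATIC for _, t in nbrs) >= 2:
        return "N.ar"  # 1.8.2
    if nm_types in ([TRIPLE], [DOUBLE, DOUBLE], [SINGLE, TRIPLE]):
        return "N.1"  # 1.8.3, 1.8.4
    if len(nm) != 3:
        return "N.2"  # 1.8.7
    # 1.8.5: a neighbouring carbon carries C=O or C=S
    for p, _ in nm:
        if elements[p] == "C" and any(elements[q] in ("O", "S") and t == DOUBLE
                                      for q, t in neighbours(p, bonds)):
            return "N.am"
    if any(t != SINGLE for _, t in nm):
        return "N.pl3"  # 1.8.6.1
    if any(forms_multiple_bond(p, bonds) for p, _ in nm):  # 1.8.6.2
        if any(elements[p] == "H" for p, _ in nm):
            return "N.pl3"  # 1.8.6.2.1
        if sum_of_angles(atom, [p for p, _ in nm], coords) >= 350.0:
            return "N.pl3"  # 1.8.6.2.2 (the only 3D test)
    return "N.3"  # 1.8.6.3


# Aniline-like N: single bond to an aromatic carbon plus two N-H bonds.
elements = ["N", "C", "C", "H", "H"]
bonds = [(0, 1, 1), (1, 2, 5), (0, 3, 1), (0, 4, 1)]
print(nitrogen_type_matched(0, elements, bonds, coords=None))  # N.pl3
```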
 
1.9 	If element_symbol is S then

1.9.1 		If num_nonmet .eq. 3 .AND. 1 bond is to an oxygen with only one 
		non-metal bond then atom_type is S.o

1.9.2 		If num_nonmet .eq. 4 .AND. 2 bonds are to an oxygen with only 
		one non-metal bond then atom_type is S.o2

1.9.3 		If num_bond .ge. 2 .AND. all bonds are single then atom_type 
		is S.3

1.9.4 		If element_symbol is S and none of the above then atom_type 
		is S.2
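
A matching sketch of rule 1.9 under the same hypothetical layout; the counts "1 bond" and "2 bonds" in 1.9.1 and 1.9.2 are read as exact counts, which is an interpretation of the rules.

```python
# Hypothetical layout: element symbols plus (i, j, csd_type) bond tuples.
NON_METALS = {
    "H", "D", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "As",
    "Se", "Br", "Te", "I", "At", "He", "Ne", "Ar", "Kr", "Xe", "Rn",
}
SINGLE, PI = 1, 9


def neighbours(atom, bonds):
    """(partner, csd_type) pairs for `atom`, ignoring pi bonds."""
    return [(j if i == atom else i, t) for i, j, t in bonds
            if atom in (i, j) and t != PI]


def num_nonmet(atom, elements, bonds):
    return sum(1 for p, _ in neighbours(atom, bonds) if elements[p] in NON_METALS)


def sulphur_type_matched(atom, elements, bonds):
    nbrs = neighbours(atom, bonds)
    nm = [p for p, _ in nbrs if elements[p] in NON_METALS]
    # Oxygens on S forming only one non-metal bond (counts read as exact).
    n_term = sum(1 for p in nm
                 if elements[p] == "O" and num_nonmet(p, elements, bonds) == 1)
    if len(nm) == 3 and n_term == 1:
        return "S.o"  # 1.9.1
    if len(nm) == 4 and n_term == 2:
        return "S.o2"  # 1.9.2
    if len(nbrs) >= 2 and all(t == SINGLE for _, t in nbrs):
        return "S.3"  # 1.9.3
    return "S.2"  # 1.9.4


# Dimethyl sulphoxide with hydrogens omitted: S=O plus two S-C bonds.
print(sulphur_type_matched(0, ["S", "O", "C", "C"], [(0, 1, 2), (0, 2, 1), (0, 3, 1)]))  # S.o
```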
 
1.10 	If element_symbol is Ti .OR. element_symbol is Cr then

1.10.1 		If num_bond .le. 4 then atom_type is Ti.th or Cr.th

1.10.2 		If num_bond .gt. 4 then atom_type is Ti.oh or Cr.oh
 
1.11 	If element_symbol is none of the above then atom_type is element_symbol
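
The remaining rules of block 1 need nothing beyond the element symbol, num_bond and two per-atom flags. In this sketch (the function name and flag arguments are hypothetical), C, N, O and S are rejected after the suppressed/polymeric test because they go through rules 1.6 to 1.9. The unmatched rules 2.1 to 2.4, 2.9 and 2.10 are identical apart from the polymeric test.

```python
def simple_type(element, num_bond, suppressed=False, polymeric_end=False):
    """Rules 1.1-1.5, 1.10 and 1.11 for one atom of a matched entry.

    `suppressed` and `polymeric_end` are hypothetical per-atom flags.
    """
    if suppressed or polymeric_end:
        return "Du"  # 1.1, 1.2
    if element in ("C", "N", "O", "S"):
        raise ValueError(f"{element} is typed by rules 1.6-1.9")
    if element == "D":
        return "H"  # 1.3
    if element == "P":
        return "P.3"  # 1.4
    if element in ("Co", "Ru"):
        return element + ".oh"  # 1.5
    if element in ("Ti", "Cr"):
        return element + (".th" if num_bond <= 4 else ".oh")  # 1.10
    return element  # 1.11


print(simple_type("Ti", 6), simple_type("Cr", 4), simple_type("D", 1))  # Ti.oh Cr.th H
```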

 
2. If entry is not matched then for each atom
 
2.1 	If atom is suppressed then atom_type is Du
 
2.2 	If element_symbol is D then atom_type is H
 
2.3 	If element_symbol is P then atom_type is P.3
 
2.4 	If element_symbol is Co .OR. element_symbol is Ru then 
	atom_type is Co.oh or Ru.oh
 
2.5 	If element_symbol is C then

2.5.1 		If num_bond .ge. 4 then atom_type is C.3

2.5.2		If num_bond .eq. 1 then calculate bond_distance

2.5.2.1			If bond_distance .gt. 1.41A then atom_type is C.3

2.5.2.2			If bond_distance .le. 1.22A then atom_type is C.1

2.5.2.3			If bond_distance is none of the above then atom_type 
			is C.2

2.5.3		If element_symbol is C and none of the above then calculate 
		average_angle about C

2.5.3.1			If average_angle .le. 115 deg then atom_type is C.3

2.5.3.2			If average_angle .gt. 160 deg then atom_type is C.1

2.5.3.3			If average_angle is none of the above then atom_type 
			is C.2
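
Rule 2.5 depends only on the crystal connectivity and the coordinates. A Python sketch follows (Python 3.8 or later for `math.dist`); `partners` and `coords` are hypothetical inputs, with distances in angstroms. The rules do not cover a carbon with no bonded partners, so the sketch raises an error there rather than guess.

```python
import math
from itertools import combinations


def angle(a, centre, b):
    """Angle a-centre-b in degrees."""
    u = [a[k] - centre[k] for k in range(3)]
    v = [b[k] - centre[k] for k in range(3)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def carbon_type_unmatched(atom, partners, coords):
    """Rule 2.5. `partners` lists the atoms bonded to `atom` in the crystal
    connectivity; `coords` maps atom index -> (x, y, z) in angstroms."""
    num_bond = len(partners)
    if num_bond >= 4:
        return "C.3"  # 2.5.1
    if num_bond == 1:  # 2.5.2
        d = math.dist(coords[atom], coords[partners[0]])
        if d > 1.41:
            return "C.3"
        if d <= 1.22:
            return "C.1"
        return "C.2"
    if num_bond == 0:
        # Not covered by the rules: average_angle is undefined.
        raise ValueError("average_angle is undefined for an unbonded carbon")
    angles = [angle(coords[a], coords[atom], coords[b])
              for a, b in combinations(partners, 2)]
    average = sum(angles) / len(angles)  # 2.5.3
    if average <= 115.0:
        return "C.3"
    if average > 160.0:
        return "C.1"
    return "C.2"


# A carbon whose only crystal-connectivity partner lies 1.15 A away.
print(carbon_type_unmatched(0, [1], [(0.0, 0.0, 0.0), (1.15, 0.0, 0.0)]))  # C.1
```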
 
2.6 	If element_symbol is O then

2.6.1 		If num_nonmet .eq. 1 then

2.6.1.1 		If bond is to carbon .AND. carbon forms a total of 3 
			bonds, 2 of which are to an oxygen forming only 1 
			non-metal bond then atom_type is O.co2

2.6.1.2 		If bond is to phosphorus .AND. phosphorus forms at 
			least 2 bonds to an oxygen forming only 1 non-metal 
			bond then atom_type is O.co2

2.6.2		If num_nonmet .eq. 0 then atom_type is O.3

2.6.3		If num_bond .ge. 2 then atom_type is O.3

2.6.4		If element_symbol is O and none of the above then atom_type 
		is O.2
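
A sketch of rule 2.6 using only crystal connectivity, under a hypothetical adjacency-list layout (`adj[i]` lists the atoms bonded to atom i). The usage line shows the loss of accuracy noted in the preamble: when the hydrogen atoms of methanol are missing from the 3D domain, the rules give O.2 rather than O.3.

```python
# Hypothetical layout: element symbols plus adjacency lists (no bond types).
NON_METALS = {
    "H", "D", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "As",
    "Se", "Br", "Te", "I", "At", "He", "Ne", "Ar", "Kr", "Xe", "Rn",
}


def num_nonmet(atom, elements, adj):
    return sum(1 for p in adj[atom] if elements[p] in NON_METALS)


def oxygen_type_unmatched(atom, elements, adj):
    """Rule 2.6; `adj[i]` lists the atoms bonded to atom i."""
    nonmet = num_nonmet(atom, elements, adj)
    if nonmet == 1:  # 2.6.1
        partner = next(p for p in adj[atom] if elements[p] in NON_METALS)
        n_term = sum(1 for q in adj[partner]
                     if elements[q] == "O" and num_nonmet(q, elements, adj) == 1)
        if elements[partner] == "C" and len(adj[partner]) == 3 and n_term == 2:
            return "O.co2"  # 2.6.1.1
        if elements[partner] == "P" and n_term >= 2:
            return "O.co2"  # 2.6.1.2
    if nonmet == 0:
        return "O.3"  # 2.6.2 (no non-metal partners)
    if len(adj[atom]) >= 2:
        return "O.3"  # 2.6.3
    return "O.2"  # 2.6.4


# Methanol with its H atoms absent from the 3D domain: O bonded only to C.
print(oxygen_type_unmatched(0, ["O", "C"], [[1], [0]]))  # O.2
```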
 
2.7 	If element_symbol is N then

2.7.1 		If num_nonmet .eq. 4 then atom_type is N.4

2.7.2		If num_nonmet .eq. 1 then calculate bond_distance

2.7.2.1			If bond_distance .le. 1.2A then atom_type is N.1

2.7.2.2			If bond_distance .gt. 1.2A then atom_type is N.3

2.7.3		If num_nonmet .eq. 3 .AND. one bond is to C--O or C--S then 
		atom_type is N.am

2.7.4		Otherwise, if num_nonmet .eq. 3 then calculate 
		sum_of_angles around N

2.7.4.1			If sum_of_angles .ge. 350 deg then atom_type is N.pl3

2.7.4.2			If sum_of_angles .lt. 350 deg then atom_type is N.3

2.7.5		If element_symbol is N and none of the above then calculate 
		average_angle about N

2.7.5.1			If average_angle .gt. 160 deg then atom_type is N.1

2.7.5.2			If average_angle .le. 160 deg then atom_type is N.2
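
Rule 2.7 in the same style, with a hypothetical adjacency-list layout and `coords` in angstroms. Two points are assumptions: "C--O or C--S" in 2.7.3 is read as a carbon bonded to O or S in the crystal connectivity, since bond orders are unknown, and an error is raised when fewer than two bonded partners leave average_angle undefined, a case the rules do not cover.

```python
import math
from itertools import combinations

# Hypothetical layout: element symbols, adjacency lists, (x, y, z) coords.
NON_METALS = {
    "H", "D", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "As",
    "Se", "Br", "Te", "I", "At", "He", "Ne", "Ar", "Kr", "Xe", "Rn",
}


def angle(a, centre, b):
    """Angle a-centre-b in degrees."""
    u = [a[k] - centre[k] for k in range(3)]
    v = [b[k] - centre[k] for k in range(3)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def nitrogen_type_unmatched(atom, elements, adj, coords):
    """Rule 2.7 for an unmatched entry (no bond types available)."""
    nm = [p for p in adj[atom] if elements[p] in NON_METALS]
    if len(nm) == 4:
        return "N.4"  # 2.7.1
    if len(nm) == 1:  # 2.7.2
        d = math.dist(coords[atom], coords[nm[0]])
        return "N.1" if d <= 1.2 else "N.3"
    if len(nm) == 3:
        # 2.7.3, reading C--O / C--S as "a carbon bonded to O or S"
        if any(elements[p] == "C" and any(elements[q] in ("O", "S") for q in adj[p])
               for p in nm):
            return "N.am"
        # 2.7.4: sum of the three angles around N
        total = sum(angle(coords[a], coords[atom], coords[b])
                    for a, b in combinations(nm, 2))
        return "N.pl3" if total >= 350.0 else "N.3"
    # 2.7.5: average angle over all bonded partners
    if len(adj[atom]) < 2:
        raise ValueError("average_angle needs at least two bonded partners")
    angles = [angle(coords[a], coords[atom], coords[b])
              for a, b in combinations(adj[atom], 2)]
    return "N.1" if sum(angles) / len(angles) > 160.0 else "N.2"


# Central nitrogen of a linear azide, N-N-N.
azide = (["N", "N", "N"], [[1, 2], [0], [0]], [(0, 0, 0), (1.15, 0, 0), (-1.15, 0, 0)])
print(nitrogen_type_unmatched(0, *azide))  # N.1
```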
 
2.8 	If element_symbol is S then

2.8.1 		If num_nonmet .eq. 3 .AND. 1 bond is to an oxygen with only one 
		non-metal bond then atom_type is S.o

2.8.2 		If num_nonmet .eq. 4 .AND. 2 bonds are to an oxygen with only 
		one non-metal bond then atom_type is S.o2

2.8.3		If num_bond .ge. 2 then atom_type is S.3

2.8.4		If element_symbol is S and none of the above then atom_type 
		is S.2
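
Rule 2.8 is rule 1.9 without the test that all bonds are single, which cannot be made without bond types. A sketch under the same hypothetical adjacency-list layout, again reading the oxygen counts as exact:

```python
# Hypothetical layout: element symbols plus adjacency lists (no bond types).
NON_METALS = {
    "H", "D", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "As",
    "Se", "Br", "Te", "I", "At", "He", "Ne", "Ar", "Kr", "Xe", "Rn",
}


def num_nonmet(atom, elements, adj):
    return sum(1 for p in adj[atom] if elements[p] in NON_METALS)


def sulphur_type_unmatched(atom, elements, adj):
    """Rule 2.8; `adj[i]` lists the atoms bonded to atom i."""
    nm = [p for p in adj[atom] if elements[p] in NON_METALS]
    # Oxygens on S forming only one non-metal bond (counts read as exact).
    n_term = sum(1 for p in nm
                 if elements[p] == "O" and num_nonmet(p, elements, adj) == 1)
    if len(nm) == 3 and n_term == 1:
        return "S.o"  # 2.8.1
    if len(nm) == 4 and n_term == 2:
        return "S.o2"  # 2.8.2
    if len(adj[atom]) >= 2:
        return "S.3"  # 2.8.3
    return "S.2"  # 2.8.4


# Dimethyl sulphone with hydrogens omitted.
print(sulphur_type_unmatched(0, ["S", "O", "O", "C", "C"], [[1, 2, 3, 4], [0], [0], [0], [0]]))  # S.o2
```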
 
2.9 	If element_symbol is Ti .OR. element_symbol is Cr then

2.9.1 		If num_bond .le. 4 then atom_type is Ti.th or Cr.th

2.9.2 		If num_bond .gt. 4 then atom_type is Ti.oh or Cr.oh
 
2.10 	If element_symbol is none of the above then atom_type is element_symbol


 
3. If entry has no crystal connectivity then for each atom
 
3.1 	If atom is suppressed then atom_type is Du
 
3.2 	If element_symbol is D then atom_type is H
 
3.3 	If element_symbol is P then atom_type is P.3
 
3.4 	If element_symbol is Co .OR. element_symbol is Ru 
	.OR. element_symbol is Ti .OR. element_symbol is Cr then
	atom_type is Co.oh or Ru.oh or Ti.oh or Cr.oh
 
3.5 	If element_symbol is C then atom_type is C.3
 
3.6 	If element_symbol is O then atom_type is O.3
 
3.7 	If element_symbol is N then atom_type is N.3
 
3.8 	If element_symbol is S then atom_type is S.3
 
3.9 	If element_symbol is none of the above then atom_type is element_symbol
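
Rule 3 reduces to a lookup on the element symbol, as in this sketch (the table and function names are hypothetical):

```python
# Rule 3: element-only defaults for entries with no crystal connectivity.
NO_CONNECTIVITY_TYPES = {
    "D": "H", "P": "P.3",                                        # 3.2, 3.3
    "Co": "Co.oh", "Ru": "Ru.oh", "Ti": "Ti.oh", "Cr": "Cr.oh",  # 3.4
    "C": "C.3", "O": "O.3", "N": "N.3", "S": "S.3",              # 3.5-3.8
}


def type_without_connectivity(element, suppressed=False):
    if suppressed:
        return "Du"  # 3.1
    # 3.9: any other element keeps its element symbol as the atom type
    return NO_CONNECTIVITY_TYPES.get(element, element)


print([type_without_connectivity(e) for e in ("C", "N", "Ti", "Cl")])
# ['C.3', 'N.3', 'Ti.oh', 'Cl']
```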

D. Bond Types

CSD bond types in matched entries are mapped to SYBYL bond types as follows:

   CSD bond type		SYBYL bond type

	1	(single)		1
	2	(double)		2
	3	(triple)		3
	4	(quadruple)		un
	5	(aromatic)		ar
	6	(polymeric)		un
	7	(delocalised double)	un
	9	(pi)			un

In addition, the SYBYL bond type amide (am) is assigned to any bond from an amide nitrogen (N.am) to a carbon atom that is double-bonded to oxygen.
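
The mapping above, plus the amide override, can be sketched as follows. The function name and arguments are hypothetical, and CSD codes missing from the table (for example 8) are mapped to "un", which is an assumption rather than part of the table.

```python
# CSD bond type code -> SYBYL bond type, as tabulated above.
CSD_TO_SYBYL = {1: "1", 2: "2", 3: "3", 4: "un", 5: "ar", 6: "un", 7: "un", 9: "un"}


def sybyl_bond_matched(i, j, csd_type, elements, atom_types, bonds):
    """SYBYL type for bond i-j of a matched entry.

    `bonds` holds (a, b, csd_type) tuples; it is used to look for C=O.
    """
    for n, c in ((i, j), (j, i)):
        if atom_types[n] == "N.am" and elements[c] == "C":
            # Does carbon c carry a double bond (CSD type 2) to oxygen?
            for a, b, t in bonds:
                if t == 2 and c in (a, b) and elements[b if a == c else a] == "O":
                    return "am"
    # Codes not in the table default to "un" (an assumption).
    return CSD_TO_SYBYL.get(csd_type, "un")


# Formamide: H-C(=O)-NH2, with the N-H hydrogens omitted.
elements = ["N", "C", "O", "H"]
atom_types = ["N.am", "C.2", "O.2", "H"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
print([sybyl_bond_matched(i, j, t, elements, atom_types, bonds) for i, j, t in bonds])
# ['am', '2', '1']
```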

For unmatched entries, bond types are assigned as follows.

	Any bond from the following SYBYL atom types is single (1):

		H F Cl Br I C.3 S.3 N.3 N.4

	A bond between SYBYL atom types O.2 and C.2 is double (2).

	A bond between N.am and C--O or C--S is amide (am)

	All other bonds are unspecified (un).

No bond record is written for entries with no crystal connectivity.
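
For unmatched entries, the bond rules depend only on the two atom types and the crystal connectivity. This sketch applies them in the order listed, which is an assumption, and reads "C--O or C--S" as a carbon bonded to O or S; the function name and the adjacency-list layout are hypothetical.

```python
# SYBYL atom types whose bonds are always single in unmatched entries.
SINGLE_TYPES = {"H", "F", "Cl", "Br", "I", "C.3", "S.3", "N.3", "N.4"}


def sybyl_bond_unmatched(i, j, elements, atom_types, adj):
    """SYBYL type for crystal-connectivity bond i-j of an unmatched entry.

    The rules are tried in the order listed (an assumption).
    """
    ti, tj = atom_types[i], atom_types[j]
    if ti in SINGLE_TYPES or tj in SINGLE_TYPES:
        return "1"
    if {ti, tj} == {"O.2", "C.2"}:
        return "2"
    for n, c in ((i, j), (j, i)):
        # N.am to a carbon bonded to O or S (C--O / C--S, order unknown)
        if (atom_types[n] == "N.am" and elements[c] == "C"
                and any(elements[q] in ("O", "S") for q in adj[c])):
            return "am"
    return "un"


# N-methylformamide skeleton with hydrogens missing: C3-N0-C1(-O2).
elements = ["N", "C", "O", "C"]
atom_types = ["N.am", "C.2", "O.2", "C.3"]
adj = [[1, 3], [0, 2], [1], [0]]
print([sybyl_bond_unmatched(i, j, elements, atom_types, adj)
       for i, j in ((0, 1), (1, 2), (0, 3))])  # ['am', '2', '1']
```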