Volume 1 Chapter 1 The CSD


1.3 A Statistical Survey of the CSD

The following graphs show the growth in the contents of the database.

The first graph shows the increase in the number of new entries input to the CSD by publication year from 1965 to 1990.

The second graph shows the increase in the average number of atoms in a structure for each publication year from 1965 to 1990.

The final graph gives the best picture of the increasing size of the CSD: it plots the number of megabytes of information stored in the master archive for each publication year. It shows, for example, that the amount of information processed for 1990 entries was approximately equal to the total for the period 1965-1977, and almost double that for 1984.

The CSD is continually growing, so the graphs above and the statistics presented below can only reflect its content at a single point in time, October 1992. However, with over 100,000 structures reported over a period of some 60 years, gross changes in the relative statistics are unlikely to occur overnight. The information in this section therefore provides a general overview of the growth of small-molecule crystallography, the types of compounds that have been studied, and a variety of other global facts concerning the CSD itself.

Overall Statistics

Number of entries                                            102589
Number of compounds                                           91325
 
Number of entries with 3D-coordinates                         90315
Number of error-free 3D coordinate sets                       88565
Number of entries with errors corrected by CCDC               11520
 
Number of atoms with 3D coordinates                         4737722
 
Number of X-ray studies                                      101803
Number of neutron studies                                       786
Absolute configuration by X-ray methods                        2635
Low temperature studies                                       10795
 
Number of different literature sources                          634
 
Number of entries with perfect connectivity matching          76537
Number of entries with partial connectivity matching           8793
Number of entries with no connectivity matching                2990
Number of entries for which matching is impossible            14269
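
As a quick cross-check of the figures above, the following Python sketch (not part of the CSD software; all variable names are illustrative) confirms that the X-ray and neutron counts, and the four connectivity-matching counts, each sum to the total number of entries. It also derives a mean number of atoms per entry, on the assumption that the atom count refers to the entries with 3D coordinates.

```python
# Figures copied from the Overall Statistics table (October 1992).
total_entries = 102589
xray, neutron = 101803, 786
matching = {
    "perfect": 76537,
    "partial": 8793,
    "none": 2990,
    "impossible": 14269,
}
entries_with_3d = 90315
atoms_with_3d = 4737722

# Each breakdown should account for every entry exactly once.
radiation_total = xray + neutron
matching_total = sum(matching.values())

# Assumption: the atom count covers only entries with 3D coordinates.
mean_atoms = atoms_with_3d / entries_with_3d

print("X-ray + neutron equals total:", radiation_total == total_entries)
print("Matching classes equal total:", matching_total == total_entries)
print(f"Mean atoms per 3D entry: {mean_atoms:.1f}")
```

Both breakdowns account for all 102589 entries, and the mean works out at roughly 52 atoms per entry with coordinates.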

Chemical Class Statistics

 
Classes     Types of Compounds             Number of Entries    %
 
1 - 12      Simple aliphatics                           5471   5.3
13 - 23     Monocyclic hydrocarbons                     5223   5.1
24 - 31     Polycyclic hydrocarbons                     4337   4.2
32 - 42     Heterocyclic compounds                     16543  16.1
43 - 59     Natural products                           12356  12.0
60 - 61     Molecular complexes, clathrates             2811   2.7
62 - 70     Main group compounds                       11837  11.5
71 - 75     Transition metal complexes (sigma,pi)      17603  17.2
76 - 86     Transition metal complexes (coordination)  26408  25.7

Precision of Structural Results

R (%)            Precision  Number of Entries     %
 
1 - 3            Exceptional       6778          6.6
3 - 4            Very high        16976         16.5
4 - 5            High             21225         20.7
5 - 7            Good             29400         28.7
7 - 9            Average          13422         13.1
9 - 10           Fair              3564          3.5
10 - 15          Poor              6482          6.3
15 and over      Bad               1656          1.6
Not reported     ?                 3086          3.0

Thus some 72% of the entries in the CSD would be judged to be of good precision or better (R < 7%). A further 17% (making a total of 89% with R < 10%) would be judged adequate for most molecular modelling applications.
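
The cumulative percentages quoted above can be reproduced from the table. The following is a minimal Python sketch, with the counts copied from the table and the function and variable names chosen purely for illustration:

```python
# Entry counts per R-factor band, copied from the precision table.
total_entries = 102589
r_bands = [
    ("1 - 3", 6778),
    ("3 - 4", 16976),
    ("4 - 5", 21225),
    ("5 - 7", 29400),
    ("7 - 9", 13422),
    ("9 - 10", 3564),
]

def cumulative_percent(bands, total):
    """Return (label, cumulative percentage of total) for each band in order."""
    running = 0
    result = []
    for label, count in bands:
        running += count
        result.append((label, 100.0 * running / total))
    return result

cumulative = cumulative_percent(r_bands, total_entries)
below_7 = cumulative[3][1]    # all bands up to and including "5 - 7"
below_10 = cumulative[5][1]   # all bands up to and including "9 - 10"

print(f"R < 7%:  {below_7:.1f}% of entries")
print(f"R < 10%: {below_10:.1f}% of entries")
print(f"7 <= R < 10%: {below_10 - below_7:.1f}% of entries")
```

The results, about 72.5% and 89.1%, agree with the rounded figures of 72% and 89% given in the text.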


