Volume 1 Chapter 6 Menu and Sub-menu Summary
Back to Table of Contents
Qualitatively, one can understand that o-aminophenol, m-aminophenol and
p-aminophenol are very similar to each other, differing only in the relative
positions of the functional groups; likewise the nitrophenols are similar to
each other and to the aminophenols. Intuitively, then, similarity between
connectivity representations involves an assessment of the rings present, the
number and nature of the substituents, their relative positions, and so on.
Similarity searching has certain advantages over normal *CONN searches:
- It produces relevant results whether or not the query structure exists in
the database.
- It yields close chemical neighbours of the query structure without the user
having to make decisions about the level of substitution to be allowed.
- It offers the possibility of "browsing" in the database and the chance of
finding unexpected structural relationships.
In computer terms, 2D similarity searching compares an input connectivity
record, referred to as the query, with the connectivity records of every
entry in the database, the targets.
A quantitative measure of the similarity between the query and each of the
targets, known as a similarity coefficient, is calculated. The procedure
implemented within QUEST and QUEST3D is based on the work reported by Willett
et al. (J. Chem. Inf. Comput. Sci. 26, 36, 1986).
The similarity coefficient is derived from the chemical connectivity bit
screens (bits 249-682 in Appendix 1) of the query structure and each of the
target structures.
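As a minimal sketch of the restriction to the connectivity screens, the
fragment below assumes that the screens set for an entry are available as a
Python set of bit numbers, numbered as in Appendix 1. The names
CONNECTIVITY_SCREENS and connectivity_bits are hypothetical and are not part
of QUEST; the bit numbers in the example are invented for illustration.

```python
# Bits 249-682 inclusive, as quoted in the text (numbering assumed to
# follow Appendix 1).
CONNECTIVITY_SCREENS = range(249, 683)


def connectivity_bits(all_set_bits):
    """Keep only the bit numbers that fall in the connectivity screens."""
    return {bit for bit in all_set_bits if bit in CONNECTIVITY_SCREENS}


# Invented example: bits 12 and 683 lie outside the connectivity range.
example = connectivity_bits({12, 249, 400, 682, 683})
print(sorted(example))
```

Only the bits that survive this filter would contribute to the counts used
in the similarity coefficient.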
A variety of similarity coefficients can be used to express the similarity
between two bit-maps of equal length; a full discussion is presented by
B. Everitt (Cluster Analysis; Halsted-Heinemann: London, 1980).
Two similarity coefficients are available in QUEST and QUEST3D:
- (i) the Tanimoto coefficient (also known as the Jaccard coefficient) -
this is the default.
- (ii) the Dice coefficient
Suppose:
Nq = the number of bit screens set in the query structure
Nt = the number of bit screens set in the target structure
Nc = the number of bit screens common to both the query and target
structures

The Tanimoto coefficient is: T = Nc/(Nq + Nt - Nc)
The Dice coefficient is: D = 2Nc/(Nq + Nt)
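The two formulas can be illustrated with a short Python sketch. It assumes
each structure is represented by the set of bit-screen numbers that are set;
the function names and the example bit numbers are hypothetical, and
returning 0.0 when no bits are set at all is a convention chosen here, not
something the QUEST documentation specifies.

```python
def tanimoto(query_bits, target_bits):
    """T = Nc / (Nq + Nt - Nc) for two sets of set-bit numbers."""
    nq, nt = len(query_bits), len(target_bits)
    nc = len(query_bits & target_bits)  # screens common to both
    denominator = nq + nt - nc
    # Assumed convention: two empty bit-maps score 0.0.
    return nc / denominator if denominator else 0.0


def dice(query_bits, target_bits):
    """D = 2Nc / (Nq + Nt) for two sets of set-bit numbers."""
    nq, nt = len(query_bits), len(target_bits)
    nc = len(query_bits & target_bits)
    denominator = nq + nt
    return 2 * nc / denominator if denominator else 0.0


# Invented bit numbers: Nq = 6, Nt = 5, Nc = 4.
query = {250, 251, 300, 412, 500, 681}
target = {250, 251, 300, 412, 640}
print(tanimoto(query, target))  # 4/7
print(dice(query, target))      # 8/11
```

With Nq = 6, Nt = 5 and Nc = 4, the Tanimoto coefficient is 4/(6 + 5 - 4) =
4/7 and the Dice coefficient is 8/(6 + 5) = 8/11. Both coefficients range
from 0 (no common screens) to 1 (identical bit-maps), and the two are related
by D = 2T/(1 + T).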
The similarity search procedure can be summarised thus:
- Define the query structure as precisely as possible, specifying exact
hydrogen counts, bond types etc.
- Specify the number of hits required, e.g. the 50 target structures most
similar to the query structure.
- Specify the similarity coefficient to be used, Tanimoto or Dice.
- Initiate the search process.
- At the end of the search the top 50 hits are displayed, ranked by
decreasing values of the similarity coefficient, and a sub-database of these
50 hits is saved for subsequent searching.
It is strongly recommended that the examples described under BASIC QUEST are
studied before proceeding to GRAPHICS QUEST3D.
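The search procedure summarised above can be sketched in Python as a ranking
of every target by its similarity coefficient, keeping the requested number
of hits. This is an illustration under stated assumptions, not the QUEST
implementation: the database is modelled as a dictionary mapping a
hypothetical entry identifier to its set of bit-screen numbers, and all names
and data are invented.

```python
import heapq


def tanimoto(query_bits, target_bits):
    """T = Nc / (Nq + Nt - Nc); 0.0 for two empty bit-maps (assumed)."""
    nc = len(query_bits & target_bits)
    denominator = len(query_bits) + len(target_bits) - nc
    return nc / denominator if denominator else 0.0


def dice(query_bits, target_bits):
    """D = 2Nc / (Nq + Nt); 0.0 for two empty bit-maps (assumed)."""
    nc = len(query_bits & target_bits)
    denominator = len(query_bits) + len(target_bits)
    return 2 * nc / denominator if denominator else 0.0


def similarity_search(query_bits, database, n_hits=50, coefficient=tanimoto):
    """Return the n_hits best (score, entry) pairs, highest score first."""
    scored = (
        (coefficient(query_bits, bits), entry)
        for entry, bits in database.items()
    )
    return heapq.nlargest(n_hits, scored)


# Hypothetical entries and bit numbers, for illustration only.
database = {
    "ENTRY_A": {1, 2, 3, 4},
    "ENTRY_B": {1, 2, 3},
    "ENTRY_C": {7, 8},
    "ENTRY_D": {1, 2, 3, 4, 5},
}
query = {1, 2, 3, 4}

for score, entry in similarity_search(query, database, n_hits=3):
    print(f"{entry}  {score:.3f}")
```

Because every target receives a score, the search returns ranked neighbours
even when no entry matches the query exactly. The returned list plays the
role of the saved sub-database of hits.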