Volume 1 Chapter 6 Menu and Sub-menu Summary


6.4 2D Similarity Searching

Qualitatively, one can understand that o-aminophenol, m-aminophenol and p-aminophenol are very similar to each other, differing only in the relative positions of the functional groups; likewise the nitrophenols are similar to each other and to the aminophenols. Thus, intuitively, similarity in respect of connectivity representations involves an assessment of the rings present and of the number, nature and relative positions of the substituents.

Similarity searching has certain advantages over normal *CONN searches:

In computer terms, 2D similarity searching involves the comparison of an input connectivity record, referred to as the query, with the connectivity record of every entry in the database, referred to as the targets.

A quantitative measure of the similarity between the query and each of the targets can be calculated.

This measure is known as a similarity coefficient. The procedure implemented within QUEST and QUEST3D is based on the work reported by Willett et al. (J. Chem. Inf. Comput. Sci. 26, 36, 1986).

The similarity coefficient is derived from the chemical connectivity bit screens (bits 249-682 in Appendix 1) of the query structure and each of the target structures.

A variety of similarity coefficients can be used to express the similarity between two bit-maps of equal length, and a full discussion is presented by B. Everitt (Cluster Analysis; Halsted-Heinemann: London, 1980).

Two similarity coefficients are available in QUEST and QUEST3D:


Nq = the number of bit screens set in the query structure

Nt = the number of bit screens set in the target structure

Nc = the number of bit screens which are common to both the query and target structures.

The Tanimoto coefficient is: T = Nc/(Nq + Nt - Nc)

The Dice coefficient is: D = 2Nc/(Nq + Nt)
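The two formulas can be illustrated with a short Python sketch. This is not the QUEST implementation: the function names, the representation of a structure's screens as a set of bit numbers, the example bit numbers and the target labels are all assumptions made for illustration. Only the bit range 249-682 and the two formulas are taken from the text above.

```python
# Illustrative sketch only; not the QUEST/QUEST3D code.
# Each structure is represented (by assumption) as the set of bit numbers
# that are set in its screen record.

# Connectivity screen range quoted in the text (bits 249-682, inclusive).
CONN_BITS = range(249, 683)


def screen_counts(query, target):
    """Return (Nq, Nt, Nc), counting connectivity screens only."""
    q = {bit for bit in query if bit in CONN_BITS}
    t = {bit for bit in target if bit in CONN_BITS}
    return len(q), len(t), len(q & t)


def tanimoto(nq, nt, nc):
    """T = Nc/(Nq + Nt - Nc); returning 0.0 when no bits are set is an assumption."""
    denom = nq + nt - nc
    return nc / denom if denom else 0.0


def dice(nq, nt, nc):
    """D = 2Nc/(Nq + Nt); returning 0.0 when no bits are set is an assumption."""
    denom = nq + nt
    return 2 * nc / denom if denom else 0.0


# Hypothetical query and targets; the bit numbers are invented.
query = {250, 251, 300, 412, 500}
targets = {
    "TARGET_A": {250, 251, 300, 412, 501},
    "TARGET_B": {250, 600},
}

# Score every target against the query, then list the most similar first.
scores = {
    name: tanimoto(*screen_counts(query, bits)) for name, bits in targets.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    nq, nt, nc = screen_counts(query, targets[name])
    print(name, round(tanimoto(nq, nt, nc), 3), round(dice(nq, nt, nc), 3))
```

For TARGET_A, Nq = 5, Nt = 5 and Nc = 4, so T = 4/6 (about 0.667) and D = 8/10 = 0.8. The two coefficients are related by D = 2T/(1 + T), so they rank a set of targets in the same order and differ only in the numerical values reported.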

The similarity search procedure can be summarised thus:

