3.6 Similarity Searching
A similarity search compares the chemical attributes of a query structure with
those of each database entry (Vol.1, p.6-39 to 6-52).
The technique is useful since:
- Relevant results are produced whether or not the query structure exists in
the database.
- Close chemical neighbours are `hit' without any decisions about chemical
substitution having to be made.
- It allows `browsing' of CSD, and offers a chance of finding the unexpected.
Two coefficients are available:
- TANImoto (Jaccard) coefficient
- DICE coefficient
Both yield values ranging from 1.0 for maximum similarity (identity) down to
0.0 for no similarity.
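As an illustration of how the two coefficients behave, the sketch below computes both from the standard set-based definitions. It assumes each structure is reduced to a set of attribute labels; that representation, the label names, and the function names are hypothetical and are not taken from QUEST3D itself.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient: shared attributes / all distinct attributes."""
    common = len(a & b)
    union = len(a) + len(b) - common
    # Assumed convention: two empty attribute sets count as identical.
    return common / union if union else 1.0


def dice(a: set, b: set) -> float:
    """Dice coefficient: twice the shared attributes / total attribute count."""
    total = len(a) + len(b)
    # Assumed convention: two empty attribute sets count as identical.
    return 2 * len(a & b) / total if total else 1.0


# Hypothetical attribute sets for a query and one database entry.
query = {"aromatic_ring", "C-O", "C-N", "OH", "NH2"}
entry = {"aromatic_ring", "C-O", "OH", "CH3"}

print(tanimoto(query, entry))  # 3 shared / 6 distinct = 0.5
print(dice(query, entry))      # 2 * 3 shared / 9 total = 0.666...
```

For the same pair of structures the Dice value is never lower than the Tanimoto value, and both reach 1.0 only when the two attribute sets are identical.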
2D Similarity Search Procedure
- Draw structure in BUILD menu and define it as precisely as possible by specifying
exact hydrogen counts, bond types, etc. using commands in 2D-CONSTRAIN sub-menu.
- Select the similarity coefficient to be used: TANI or DICE in 2D-CONSTRAIN sub-menu.
- Select SIMIL in the 2D-CONSTRAIN sub-menu to define fragment for similarity search.
- Select STOP-LIMIT in SEARCH menu to specify the number of hits, e.g. 30.
- Start search by selecting HITALL and START commands.
- At the end of the search the top n hits are ranked by decreasing value of the
similarity coefficient, and a sub-database of these n hits is saved for subsequent searching.
- Initiate QUEST3D in graphics mode (Section 3.1).
- Search CSD to find the 30 structures which are most similar to
o-aminophenol using the TANI similarity coefficient:
1. Draw fragment in BUILD menu:
- Select AROM bond type and RING to draw phenyl group.
2. Constrain O and N to be OH and NH2:
- Select 2D-CONSTRAIN
- Select HYDROGENS, type 1, <R>, select O
- Reselect HYDROGENS, type 2, <R>, select N
3. Specify fragment as precisely as possible:
- Select GENERATE-HYDS to add phenyl hydrogens.
- Select ALL-EXACT.
- Select CYCLICITY, and reselect it so that A appears in the cyclicity box; then
select the C-O and C-N bonds.
4. Fragment definition is now complete. Initiate CSD Similarity Search:
follow steps 4 to 6 in the `search procedure' described above.
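The ranking step of the search procedure (keep the top n hits in order of decreasing similarity and save them as a sub-database) can be sketched as follows. This is a minimal illustration only: the attribute-set representation, the entry names, and the `top_hits` function are hypothetical, and nothing here reflects how QUEST3D stores or scores entries internally.

```python
import heapq


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient on attribute sets (assumed representation)."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 1.0


def top_hits(query: set, database: dict, limit: int = 30) -> list:
    """Return up to `limit` (name, score) pairs, highest similarity first."""
    scored = ((tanimoto(query, attrs), name) for name, attrs in database.items())
    best = heapq.nlargest(limit, scored)  # already sorted in decreasing order
    return [(name, score) for score, name in best]


# Hypothetical miniature database; names and attribute labels are invented.
database = {
    "ENTRY_A": {"ring", "OH", "NH2"},
    "ENTRY_B": {"ring", "OH"},
    "ENTRY_C": {"ring", "Cl"},
    "ENTRY_D": {"Br"},
}
query = {"ring", "OH", "NH2"}

hits = top_hits(query, database, limit=2)  # plays the role of STOP-LIMIT
sub_database = {name: database[name] for name, _ in hits}

for name, score in hits:
    print(f"{name}  {score:.3f}")
```

Here `limit` stands in for the STOP-LIMIT value, and `sub_database` holds only the retained hits, so a later search could be run against it instead of the full set.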