MMQLlib - MMQL C++ class library

The basic components of the MMQL query language map onto the MMQLlib class hierarchy. The two most important components of MMQLlib are the Set and Expression classes (Figure. MMQLlib classes.). Set classes describe the results of both intermediate and final search requests, whereas Expression classes model the search request itself.

Set Class Design

The ListElement class plays the most important role in supporting the Set class, since it is responsible for maintaining the hierarchy of macromolecular components (Figure. Set object design (a).). ListElement represents macromolecular objects from the following classes: Compound, Entity, SubEntity, and Atom. In principle, ListElement could represent any other macromolecular component, for example a functional site, which PDBlib (Chang et al., 1994) represents as a NonLinearSelectList but which MMQLlib does not currently represent. The current limitation is imposed by the implementation of Expression, which recognizes only the ListElement constructs listed above. This makes it easier to maintain a uniform protocol between the various query methods, but leads to inefficiencies for queries that have no ListElement representation. A practical example is finding all structures that exhibit a catalytic triad, as found in the active site of serine proteases. The PDB entry reports this information in the SITE records, which PDBlib represents in the Site class (derived from the MultiEntitySubStruc class) but which MMQLlib does not currently use.

The ListElement class for Atom is shown in detail (Figure. Set object design (b).). The class has a member object_, which points to the appropriate macromolecular data object and so links a query result to the data it represents, as modelled by PDBlib. Communication with PDBlib is via pointers at each corresponding level (Figure. Set object design (a).).

The member listSelection holds handles to Variables and to the Selections within those Variables. The handles are integer codes for Selections, which are themselves defined in the SelectionList of the Set class (Figure. Set object design (a).). The member nSelection gives the total number of Variables for this particular macromolecular object, ListElement. Four pointers of type ListElement (prev_, next_, up_, and down_) provide communication between instances of ListElement.

When a macromolecular object is no longer needed (i.e. nSelection is zero), the corresponding ListElement object is simply removed from the list, which improves performance and reduces memory usage. At the conclusion of the query, the Set object includes only those ListElement objects that represent the results of the query.
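The linking and removal behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MMQLlib source: the member names object_, nSelection, prev_, next_, up_, and down_ come from the text, while the struct layout, the void pointer type of object_, and the helper functions appendChild, unlinkElement, and pruneUnselected are hypothetical names introduced for the example.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the ListElement class.
struct ListElement {
    void* object_ = nullptr;       // PDBlib data object (real type not shown in the text)
    int nSelection = 0;            // number of Variables that select this object
    ListElement* prev_ = nullptr;  // previous sibling at the same level
    ListElement* next_ = nullptr;  // next sibling at the same level
    ListElement* up_ = nullptr;    // parent, one level above
    ListElement* down_ = nullptr;  // first child, one level below
};

// Append child to the end of parent's child list (assumed helper).
inline void appendChild(ListElement* parent, ListElement* child) {
    assert(parent && child);
    child->up_ = parent;
    if (!parent->down_) {
        parent->down_ = child;
        return;
    }
    ListElement* last = parent->down_;
    while (last->next_) last = last->next_;
    last->next_ = child;
    child->prev_ = last;
}

// Detach e from its siblings and parent. Memory is not freed here;
// ownership is left to the caller in this sketch.
inline void unlinkElement(ListElement* e) {
    assert(e);
    if (e->prev_) e->prev_->next_ = e->next_;
    else if (e->up_) e->up_->down_ = e->next_;
    if (e->next_) e->next_->prev_ = e->prev_;
    e->prev_ = e->next_ = e->up_ = nullptr;
}

// Unlink every direct child of parent whose nSelection is zero.
// Returns the number of elements removed.
inline int pruneUnselected(ListElement* parent) {
    assert(parent);
    int removed = 0;
    ListElement* e = parent->down_;
    while (e) {
        ListElement* next = e->next_;  // remember the successor before unlinking
        if (e->nSelection == 0) {
            unlinkElement(e);
            ++removed;
        }
        e = next;
    }
    return removed;
}
```

Unlinking only touches the neighbouring pointers, so removing an unselected element costs constant time regardless of how large the structure is.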

Another important feature of the ListElement object is the propagation of a selection into related objects. Thus, when a Compound object is selected, all of its Entity, SubEntity, and Atom objects are also selected. From a query perspective this implies an unnecessary overhead. For example, when querying for all structures at better than a certain resolution, why not return just the compound names rather than the complete structure representation down to the level of the atom? The premise is that selected structures will be displayed with PDBquery, so the Entity, SubEntity, and Atom objects will be needed and are also returned. A proposed compression mechanism for managing a subset of data on each compound is discussed subsequently.
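Downward propagation of a selection can be illustrated with a short recursive sketch. The names listSelection, nSelection, next_, and down_ are taken from the text; representing listSelection as a vector of integer codes, counting one Variable per stored code, and the function name propagateSelection are assumptions made only for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified ListElement carrying only what the sketch needs.
struct ListElement {
    std::vector<int> listSelection;  // integer Selection codes (assumed container)
    int nSelection = 0;              // kept equal to listSelection.size() in this sketch
    ListElement* next_ = nullptr;    // next sibling
    ListElement* down_ = nullptr;    // first child
};

// Mark e and every element below it with selectionCode.
// Returns the number of elements that were marked.
inline int propagateSelection(ListElement* e, int selectionCode) {
    assert(e);
    e->listSelection.push_back(selectionCode);
    e->nSelection = static_cast<int>(e->listSelection.size());
    int marked = 1;
    for (ListElement* child = e->down_; child; child = child->next_) {
        marked += propagateSelection(child, selectionCode);
    }
    return marked;
}
```

The return value makes the overhead discussed above visible: selecting one Compound marks every Entity, SubEntity, and Atom beneath it.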

Expression Class Design

The Expression class design (Figure. MMQLlib classes.) seeks to provide an extensible framework in which, from a user's perspective, a large variety of queries can be posed and, from a programmer's perspective, new methods can be added.

At the Expression class level, only members representing input and output Variables are provided. At the next level are three types of query expression: Pattern, Variable, and Filter. From the user's perspective, Pattern provides the most substantive queries. Generic methods for parsing the appropriate query script statements, including verification of Specificators and Elements, are provided here:


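	class Pattern : public Expression
	{
	public:
	  // One list of Specificators per allowable data type
	  PatternTextSpecificator * textSpecificator;
	  PatternIntSpecificator * intSpecificator;
	  PatternDoubleSpecificator * doubleSpecificator;
	  // First Element of the pattern
	  PatternElement * firstElement;
		...
	  // Generic search; takes a ListElement representing a Compound
	  virtual void search(ListElement *) {}
	};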

This means that any pattern type derived from Pattern needs only to provide the definitions of its Specificators and Elements to become a functional part of the query language. Specificators are held in three lists, textSpecificator, intSpecificator, and doubleSpecificator, corresponding to the three allowable data types. The constructor of the PropertyPattern class shows how Specificators are defined:


	class PropertyPattern : public SubentityPattern
	{
	public:
	  char * propertyType;
	  int minLength;
	  int maxLength;
	  int averaging;

	  PropertyPattern();
	  void addDoubleElement();
	  void addDoubleItem(double);
	  double propertyValue(SubEntity *);
	  void searchSubentities(ListElement **, int);
	};

	PropertyPattern::PropertyPattern()
	{
	  static char * name_={"PropertyPattern"};
		...
	  // One row per Specificator; the all-NULL row terminates the table
	  static PatternIntSpecificator intSpecificator_[]=
	    {"minLength", NULL,  5, 1, 1000,
	     "maxLength", NULL, 15, 1, 1000,
	     "averaging", NULL,  5, 1, 1000,
	     NULL, NULL, NULL, NULL, NULL};
		...
	}
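To illustrate how a NULL-terminated Specificator table like the one above might be consulted during parsing, here is a minimal sketch. It does not reproduce the real PatternIntSpecificator type, whose definition is not shown. The struct IntSpecificator, the reading of the three numeric columns as default, minimum, and maximum, the treatment of the second column as an unused pointer, and the functions findSpecificator and checkSpecificator are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical row layout; the column meanings are assumed, not documented.
struct IntSpecificator {
    const char* name;   // Specificator keyword, nullptr in the terminating row
    void* reserved;     // second column is NULL in the original; purpose not stated
    int defaultValue;   // assumed: value used when the script gives none
    int minValue;       // assumed: smallest allowed value
    int maxValue;       // assumed: largest allowed value
};

// Same values as the PropertyPattern table, ending with a sentinel row.
static const IntSpecificator propertySpecificators[] = {
    {"minLength", nullptr,  5, 1, 1000},
    {"maxLength", nullptr, 15, 1, 1000},
    {"averaging", nullptr,  5, 1, 1000},
    {nullptr,     nullptr,  0, 0, 0}
};

// Walk the table until the sentinel row; return the matching row or nullptr.
inline const IntSpecificator* findSpecificator(const IntSpecificator* table,
                                               const char* name) {
    assert(table && name);
    for (const IntSpecificator* s = table; s->name != nullptr; ++s) {
        if (std::strcmp(s->name, name) == 0) return s;
    }
    return nullptr;
}

// True when name is a known Specificator and value lies within its range.
inline bool checkSpecificator(const IntSpecificator* table,
                              const char* name, int value) {
    const IntSpecificator* s = findSpecificator(table, name);
    return s != nullptr && value >= s->minValue && value <= s->maxValue;
}
```

Because the verification logic only reads the table, a new pattern class would need to supply nothing more than its own rows, which is consistent with the extensibility goal described in the text.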

Elements of patterns are arranged according to the specific features of the pattern. In PropertyElement, for instance, the two members minValue and maxValue define the range of property values:


	class PropertyElement : public PatternElement
	{
	public:
	  int itemCount;
	  double minValue;
	  double maxValue;
		...
	};

The search method of the Pattern class is a generic base method used when searching with specialized Pattern classes. The method takes one ListElement representing a Compound and organizes the iteration through all descending components (Entity, SubEntity, and Atom) down to the level required for this type of pattern. During the iteration, the data required by the specific pattern are passed to the appropriate method of that pattern. If a new occurrence of the pattern is detected in the data, a new Selection is formed for the pattern's output Variable and all applicable data components are selected.
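The division of labour described above, with a generic driver in the base class and pattern-specific callbacks in derived classes, can be sketched roughly as follows. This is an illustration only: in the listing shown earlier, search is declared virtual with an empty body, whereas this sketch uses a non-virtual driver for brevity. The Level enumeration, the level member, the callbacks deepestLevel and visit, and the class CountingSubentityPattern are hypothetical.

```cpp
#include <cassert>

// Assumed ordering of hierarchy levels, from the top down.
enum Level { COMPOUND = 0, ENTITY = 1, SUBENTITY = 2, ATOM = 3 };

// Hypothetical, simplified ListElement carrying only what the sketch needs.
struct ListElement {
    Level level;
    ListElement* next_ = nullptr;  // next sibling
    ListElement* down_ = nullptr;  // first child
    explicit ListElement(Level l) : level(l) {}
};

class Pattern {
public:
    virtual ~Pattern() {}
    // Generic driver: walk from a Compound down to the level the pattern needs.
    void search(ListElement* compound) {
        assert(compound && compound->level == COMPOUND);
        descend(compound);
    }
protected:
    virtual Level deepestLevel() const = 0;  // how far down this pattern looks
    virtual void visit(ListElement* e) = 0;  // pattern-specific handling
private:
    void descend(ListElement* e) {
        visit(e);
        if (e->level >= deepestLevel()) return;  // stop below the required level
        for (ListElement* c = e->down_; c; c = c->next_) descend(c);
    }
};

// Example pattern that only needs SubEntity-level data.
class CountingSubentityPattern : public Pattern {
public:
    int subentities = 0;
    int atoms = 0;
protected:
    Level deepestLevel() const override { return SUBENTITY; }
    void visit(ListElement* e) override {
        if (e->level == SUBENTITY) ++subentities;
        if (e->level == ATOM) ++atoms;  // never reached for this pattern
    }
};
```

Stopping the descent at the level a pattern requires means that a SubEntity-level pattern never touches Atom objects.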

A generic Filter class handles common features of filter-type expressions for extracting subsets of data based on features such as compound name and resolution:


	class Filter : public Expression
	{
	public:
	  char ** operators;
	  char * filterObject;
	  char * filterAttribute;
	  char * filterOperator;
	  ValueType type;
	  char * textValue;
	  int intValue;
	  double doubleValue;
		...
	};

The class provides the operators member to store the list of allowable operators for the specific filter, and the strings filterObject and filterAttribute to store the names of the filter object and filter attribute, respectively. These strings are not required for pattern searching but are required for parsing and script generation. The current operator is stored in the filterOperator member. The member type indicates which type of value the filter uses, and the value itself is supplied in one of the members textValue, intValue, or doubleValue, depending on the data type.
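A filter of this kind could be evaluated along the following lines. This is a hedged sketch, not the MMQLlib implementation: the member names filterOperator, type, textValue, intValue, and doubleValue come from the text, but the enumerators of ValueType, the operator spellings "<", ">", and "=", and the functions applyOperator, matchesDouble, and matchesText are assumptions introduced for illustration.

```cpp
#include <cassert>
#include <cstring>

// Assumed enumerators; the real ValueType definition is not shown.
enum ValueType { TEXT_VALUE, INT_VALUE, DOUBLE_VALUE };

// Hypothetical, reduced Filter holding only the members used for evaluation.
struct Filter {
    const char* filterOperator;  // current operator, e.g. "<"
    ValueType type;              // which of the value members is in use
    const char* textValue;
    int intValue;
    double doubleValue;
};

// Interpret a three-way comparison result (negative, zero, positive)
// according to the operator string. Unknown operators never match.
inline bool applyOperator(const char* op, int cmp) {
    if (std::strcmp(op, "<") == 0) return cmp < 0;
    if (std::strcmp(op, ">") == 0) return cmp > 0;
    if (std::strcmp(op, "=") == 0) return cmp == 0;
    return false;
}

// Test a numeric candidate (e.g. a resolution) against a double-valued filter.
inline bool matchesDouble(const Filter& f, double candidate) {
    assert(f.type == DOUBLE_VALUE);
    int cmp = (candidate < f.doubleValue) ? -1
            : (candidate > f.doubleValue) ? 1 : 0;
    return applyOperator(f.filterOperator, cmp);
}

// Test a text candidate (e.g. a compound name) against a text-valued filter.
inline bool matchesText(const Filter& f, const char* candidate) {
    assert(f.type == TEXT_VALUE && candidate);
    return applyOperator(f.filterOperator, std::strcmp(candidate, f.textValue));
}
```

Keeping the operator as a string mirrors the description above, in which the same member serves both evaluation and script generation.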

