In general terms, future development of MMQL aims to provide generic features for a more rigorous treatment of the detection and classification of structural features. To achieve this goal we propose to extend MMQLlib to support new types of methods and variables, and to extend the functionality of existing variables as well. The current variable is a composite object representing macromolecular data and various selections of those data according to query specifications. MMQLlib currently supports multiple selections of data, as required by multiple variables and by multiple pattern occurrences within one variable. A further extension of variable functionality is to support searches for parametrized patterns, i.e. patterns with multiple states determined by alterations in their possible "degrees of freedom". This provides a basis for producing various kinds of structural and functional assignments (e.g. secondary structure assignment, topological motif assignment), i.e. established classifications of features matching a predefined group of generalized patterns.
Another direction is the introduction of new variable types to support different types of relationships in macromolecular data. The first candidate is a specialized object for representing identity relationships between macromolecules, the ##ALIGNMENT variable type. (We reserve two '#' symbols for variable types in MMQL; the only currently existing type of variable would become the ##COMPOUND type.) Another example of a new variable type is a specialized object for representing macromolecules in terms of domain topology. This may result in a ##MOTIF variable type, which provides a basis for an adequate description of features such as the topological class of a beta-sheet and beta-sheet to beta-sheet or beta-sheet to alpha-helix interactions. A similar approach can also be taken for handling macromolecular surface representations.
As new variable types are added, it becomes important to maintain the completeness of MMQL syntax, meaning that any expression valid under the MMQL grammar has biological meaning and can be executed on the available data. To handle this, the concept of dynamic binding between heterogeneous objects and methods is to be employed.
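As an illustration of dynamic binding between heterogeneous objects and methods, the following minimal Python sketch resolves the implementation of a method at execution time from the types of its arguments. It is not MMQLlib code: the classes Compound and Alignment, the registry, and the functions register and execute are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, Tuple


# Hypothetical stand-ins for MMQL variable types; not part of MMQLlib.
class Compound:
    def __init__(self, name: str):
        self.name = name


class Alignment:
    def __init__(self, members):
        self.members = list(members)


# Registry mapping (method name, argument types) to an implementation.
_REGISTRY: Dict[Tuple[str, Tuple[type, ...]], Callable] = {}


def register(method: str, *arg_types: type):
    """Decorator that binds an implementation to a method name and argument types."""
    def decorator(func: Callable) -> Callable:
        _REGISTRY[(method, arg_types)] = func
        return func
    return decorator


def execute(method: str, *args):
    """Look up the implementation from the runtime types of the arguments."""
    key = (method, tuple(type(a) for a in args))
    try:
        impl = _REGISTRY[key]
    except KeyError:
        names = ", ".join(t.__name__ for t in key[1])
        raise TypeError(f"{method} is not defined for ({names})") from None
    return impl(*args)


@register("Describe", Compound)
def describe_compound(c: Compound) -> str:
    return f"compound {c.name}"


@register("Describe", Alignment)
def describe_alignment(a: Alignment) -> str:
    return f"alignment of {len(a.members)} compounds"
```

In this sketch, a call such as execute("Describe", some_variable) succeeds for any registered variable type and fails with a clear error otherwise, which is one way a query executor could verify that a grammatically valid expression is also executable.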
In extending the method repertoire of MMQL, the focus is now not just on covering basic functionality but on providing well-specialized methods that take advantage of the composite data model. First, further generalization of existing methods should be considered. The PropertyPattern, HBondingPattern and DihedralAnglePattern methods already present in MMQL provide a basis for developing a generalized PropertyPatternAssignment method, which will combine properties as a linear form and so allow searches for sequence and/or structural fragments matching patterns described by a schema of weighted properties. Further generalization of the ContactPattern method of MMQL may result in a more advanced InteractionPattern method, which does not simply require two polypeptide fragments to be in contact but rather specifies fine details of the interaction, in particular distinguishing main-chain/side-chain interactions, the clustering peculiarities of residues in helical and extended conformations, and the specificity of inter-residue contacts as described by statistical interaction preferences.
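The idea of combining properties as a linear form can be sketched as follows. This is only an illustration under stated assumptions, not the proposed PropertyPatternAssignment method: the two indicator properties, the sliding-window averaging, and the names PROPERTIES, residue_score and assign_pattern are all hypothetical choices made for the example.

```python
from typing import Dict, List, Set, Tuple

# Toy indicator properties, for illustration only (not MMQL's property set).
PROPERTIES: Dict[str, Set[str]] = {
    "hydrophobic": set("AVLIMFWC"),
    "charged": set("DEKR"),
}


def residue_score(residue: str, weights: Dict[str, float]) -> float:
    """Linear form for one residue: sum over properties of weight * value,
    where the value is 1.0 if the residue has the property and 0.0 otherwise."""
    return sum(
        w * (1.0 if residue in PROPERTIES[prop] else 0.0)
        for prop, w in weights.items()
    )


def assign_pattern(
    sequence: str,
    weights: Dict[str, float],
    window: int,
    threshold: float,
) -> List[Tuple[int, float]]:
    """Return (start index, mean score) for every window of the given length
    whose mean linear-form score is at least the threshold."""
    scores = [residue_score(r, weights) for r in sequence]
    hits = []
    for start in range(len(sequence) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append((start, mean))
    return hits


# Example: reward hydrophobic residues, penalize charged ones.
hits = assign_pattern(
    "DEKLLVVAAKD", {"hydrophobic": 1.0, "charged": -1.0}, window=4, threshold=1.0
)
print(hits)  # [(3, 1.0), (4, 1.0), (5, 1.0)]
```

A structural variant of the same scheme would replace the per-residue indicator values with structure-derived quantities (for example hydrogen-bonding or dihedral-angle terms), leaving the weighted sum unchanged.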
Another group of methods will be developed to facilitate superfamily classifications and the handling of relationships between heterogeneous data. Note that "developed" here does not necessarily mean the development of new algorithms; it also covers the adaptation of existing ones to the framework. For example, there is no particular need to develop a new algorithm for detecting identity relationships (alignment), since an ample set of algorithms is already available. By incorporating such methods into the framework, however, we gain the advantage of a universal protocol for querying heterogeneous data.
Some specialized methods for data analysis are also required to complete the query functionality. A Statistics method is therefore proposed to evaluate the significance of a match between two variables. This can be used, for instance, to develop a new protein structure prediction method (if one variable is an assignment based on structural properties and the other is based on sequence alone) or to assess the significance of a similarity. The actual statistical procedure used depends on the input variables, i.e. it is bound during query execution; the results can be stored (in a specialized ##Statistics variable) and eventually visualized with an appropriate viewing tool, which still needs to be developed.
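One possible statistical procedure for comparing two assignments is a permutation test on their per-position agreement. The sketch below is an assumption chosen for illustration, not the procedure the Statistics method would necessarily bind: the text leaves that choice open. The function names and the string encoding of assignments (one label per residue) are hypothetical.

```python
import random
from typing import Sequence


def agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of positions at which two equal-length assignments agree."""
    if len(a) != len(b):
        raise ValueError("assignments must have equal length")
    if not a:
        raise ValueError("assignments must be non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def permutation_p_value(
    a: Sequence[str],
    b: Sequence[str],
    n_permutations: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate how often a random shuffle of b agrees with a at least as
    well as the observed b does. Uses the (count + 1) / (n + 1) estimator,
    so the returned value is never exactly zero."""
    rng = random.Random(seed)
    observed = agreement(a, b)
    shuffled = list(b)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if agreement(a, shuffled) >= observed:
            count += 1
    return (count + 1) / (n_permutations + 1)


# Example: a structure-based and a sequence-based assignment (toy data).
structure_based = "HHHHEEEECCCC" * 3
sequence_based = "HHHHEEEECCCC" * 3
p = permutation_p_value(structure_based, sequence_based)
```

Shuffling individual positions ignores the fact that secondary structure labels occur in contiguous runs, so a segment-aware null model would be more appropriate in practice; the example only shows the shape of such a method.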
Other methods required to support data conversion and manipulation remain to be identified and implemented.