ACA Meeting 1995 CIF Workshop Final Report

Prepared by (SDSC). Approximately 40 people were in attendance.

Syd Hall (U. of Western Australia), the founder of CIF, then elaborated on the history of the Self-defining Text Archival and Retrieval (STAR) format, from which CIF is derived, as well as on CIF itself. STAR is a simple set of rules from which a Dictionary Definition Language (DDL) can be derived. The DDL defines the form of the various dictionaries and, as Hall pointed out, is critical to the development of good software. He then went on to describe CIFtbx (CIF Toolbox), a set of Fortran routines for the basic reading and writing of CIF files [SH URL, ftp site?]. He emphasized the need for further software development, particularly in the reading and browsing of CIF files.

Brian McMahon (IUCr) then provided some insight into CIF processing at the IUCr offices in Chester and the software that has been developed to process CIF-based submissions to Acta C. Of the 582 submissions to Acta C this year, 76% were CIFs that included the text of the paper. Any submissions in hardcopy form are first turned into a CIF, and all submissions undergo data checking and subsequent conversion into an easily read form. Once accepted, a paper may automatically be converted into a final typeset version in the style of Acta C. McMahon also reported on three recent developments of great benefit to authors. The first is the availability of a booklet titled "A Guide to CIF for Authors." The second is the ability to submit a CIF file by email to checkcif@iucr.ac.uk. The submitter receives back a detailed report of errors and potential errors found. This includes syntax errors, data items not conforming to the official dictionaries (a potential error), missing data items, data items with unusual values, derived values, and an indication that higher symmetry than reported may exist. The third, available once the entry has passed checkcif, is the ability to send the CIF to printcif@iucr.ac.uk. Printcif produces a PostScript formatted version of the file with any problematic values highlighted. This formatted version is not the one used by the journal, but it is easy to read and correct.
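To make the categories in the checkcif report concrete, here is a minimal sketch of three of them (syntax errors, data names not found in a dictionary, and missing items). It is not the IUCr's software: the function names, the tiny `KNOWN_ITEMS` and `REQUIRED_ITEMS` sets, and the sample file are assumptions made for illustration, and the parser handles only single-line `_name value` pairs, not loops or multi-line text fields.

```python
# Hypothetical miniature "dictionary"; the official dictionaries are far larger.
KNOWN_ITEMS = {
    "_cell_length_a",
    "_cell_length_b",
    "_cell_length_c",
    "_symmetry_space_group_name_H-M",
}
# Hypothetical list of items a submission is expected to contain.
REQUIRED_ITEMS = {"_cell_length_a", "_cell_length_b", "_cell_length_c"}


def parse_simple_cif(text):
    """Parse single-line `_name value` pairs.

    Returns (items, syntax_errors). Loops and multi-line text fields
    are deliberately not handled in this sketch.
    """
    items, errors = {}, []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines, comments, and the data block header.
        if not line or line.startswith("#") or line.startswith("data_"):
            continue
        if not line.startswith("_"):
            errors.append(f"line {lineno}: expected a data name")
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            errors.append(f"line {lineno}: {parts[0]} has no value")
            continue
        name, value = parts
        items[name] = value.strip().strip("'\"")
    return items, errors


def check(text):
    """Build a small report in the spirit of the categories listed above."""
    items, errors = parse_simple_cif(text)
    return {
        "syntax_errors": errors,
        "unknown": sorted(n for n in items if n not in KNOWN_ITEMS),
        "missing": sorted(REQUIRED_ITEMS - items.keys()),
    }


SAMPLE = """\
data_example
_cell_length_a 5.43
_cell_length_b 5.43
_cell_lenght_c 5.43
_symmetry_space_group_name_H-M 'F d -3 m'
"""

report = check(SAMPLE)
print(report)
```

In the sample, the misspelled `_cell_lenght_c` is reported as an unknown data name and, as a consequence, `_cell_length_c` is reported as missing. This shows why a nonconforming name is treated as a potential error and not a definite one.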

Paula Fitzgerald (Merck), chairperson of the mmCIF Working Group, presented an overview of the mmCIF dictionary and an update on recent progress. The current mmCIF dictionary is over 20,000 lines long and contains descriptions of several thousand data items organized into over 300 categories. The categories are further subdivided into category groups to provide a hierarchical representation tracing the progress of the crystallographic experiment and the subsequent structure. While the development of the dictionary has taken five years of volunteer effort, it should be acknowledged that the dictionary is comprehensive and provides a very useful reference work. The draft mmCIF dictionary is finished except for the incorporation of changes recognized by the community and final editorial checking by COMCIFS. The current draft dictionary can be found as an ASCII file at the Web site http://ndbserver.rutgers.edu/mmCIF. Limited examples and introductory material are also available there. The contents of the site will be improved in the near future. A list server has also been established; to subscribe, send an email message to [address missing].

Eldon Ulrich (Univ. of Wisconsin) described progress with the NMR dictionary (NMRif). This project began on April 10, 1994 with a workshop, "Biological Macromolecular NMR Data Exchange and Archiving," organized by members of BioMagResBank and the mmCIF committee. The CIF and mmCIF concepts and their development were described to scientists from the NMR community and proposed as a format for macromolecular NMR data exchange. BioMagResBank, under the direction of Eldon Ulrich, agreed to undertake the task of developing the dictionary with the help and advice of volunteers from the NMR community. The dictionary is represented as a relational database using a schema design tool called Opossum. Development of the relational schema has been carried out in collaboration with Miron Livny and Yannis Ioannidis, computer scientists at the Univ. of Wisconsin. The current dictionary contains over 260 tables or categories comprising more than 750 unique data names. While no estimate was given as to when the dictionary will be completed, it is expected to grow to 2-3 times its current size. It was suggested that Opossum could be used as a graphical tool for representing and browsing the formidable mmCIF dictionary.

Gotzon Madariaga (Uni. del Pais Vasco) presented the main features of the modulated structures dictionary that he is developing. A draft has been submitted to the IUCr, and the dictionary is a superset of all the data items defined by the Commission on Aperiodic Crystals. Interestingly, he raised several issues relating to problems with CIF which had also been noted by the mmCIF developers. Notable was the need to provide a reference between related blocks of data, possibly contained in different files. He also raised the lack of nested loops in CIF (they are available in STAR), which makes the data more cumbersome to represent.
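The nested-loop point can be illustrated with a small sketch. Without nested loops, a variable-length list attached to each parent item has to be written as one flat table in which the parent key is repeated on every row. The atom labels, wave orders, and amplitudes below are invented for illustration and are not taken from the modulated structures dictionary.

```python
# Hypothetical nested data: each atom carries a variable-length list of
# (harmonic order, amplitude) pairs describing its modulation.
nested = {
    "Na1": [(1, 0.012), (2, 0.003)],
    "Cl1": [(1, 0.020)],
}


def flatten(nested_data):
    """Return one flat row per (atom, wave) pair, repeating the atom label.

    This is the shape a single non-nested loop forces on the data.
    """
    return [
        (atom, order, amplitude)
        for atom, waves in nested_data.items()
        for order, amplitude in waves
    ]


rows = flatten(nested)
for row in rows:
    print(*row)
```

The repeated atom label in the flat rows is the redundancy that makes the non-nested representation more cumbersome.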

The final morning session was given by [speaker and topic missing]. Vivian Stojanoff (Brookhaven National Laboratory) began the afternoon session with a discussion of an extension to the mmCIF dictionary for representing structure factors. While the current extension is not considered complete, it does describe more than the basic h, k, l, F and sigmaF found in many existing PDB submissions. Currently there is support for data from multiple derivatives and wavelengths.
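As a rough illustration of the basic h, k, l, F, sigmaF data mentioned above, the following sketch reads a single CIF `loop_` into rows. The `_refln.*` data names follow mmCIF-style naming but are used here as assumptions. They may not match the draft extension discussed at the workshop, and the reflection values are invented. The reader handles only whitespace-separated values in one loop.

```python
def read_loop(text):
    """Read the first loop_ in `text` into a list of {data_name: value} rows.

    Simplified: values are whitespace-separated tokens, with no quoted
    strings or multi-line text fields.
    """
    names, values = [], []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "loop_":
            in_loop = True
            continue
        if not in_loop:
            continue
        # Data names come first; once values start, everything is a value.
        if line.startswith("_") and not values:
            names.append(line)
        else:
            values.extend(line.split())
    if not names or len(values) % len(names) != 0:
        raise ValueError("loop is empty or value count does not match columns")
    width = len(names)
    return [
        dict(zip(names, values[i:i + width]))
        for i in range(0, len(values), width)
    ]


# Invented reflections; data names are assumed, mmCIF-style.
SAMPLE_LOOP = """\
data_sf_example
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.F_meas_sigma_au
1 0 0 12.5 0.4
1 1 0 30.2 0.9
2 0 1 7.8 0.3
"""

reflections = read_loop(SAMPLE_LOOP)
for r in reflections:
    print(r["_refln.index_h"], r["_refln.index_k"], r["_refln.index_l"],
          r["_refln.F_meas_au"], r["_refln.F_meas_sigma_au"])
```

Supporting multiple derivatives or wavelengths would mean adding further columns or related categories to such a loop, which this sketch does not attempt.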

John Westbrook (Rutgers Uni.) described the Dictionary Definition Language [remainder missing].

Syd Hall returned to describe the program Xtal_GX, which provides a general approach to processing CIFs and uses the CIFtbx he had described earlier in the day. It also has a graphics feature for display purposes.

The final formal presentation of the day was given by Weider Chang (Columbia Uni.), who described the next-generation object-oriented tools he is developing with Bourne for basic CIF manipulation. Chang presented the basic layout of the library, which is written in Objective-C and from which browsers, a CIF2HTML convertor and several dictionary checking tools have been developed. The CIF2HTML convertor was used in the ACA abstract procedure.

A lively discussion followed the formal presentations. Prominent in it was the need to identify the subset of mmCIF data items that constitutes a formal entry to the PDB. Attendees seemed well satisfied with the day's proceedings, and hopefully some of them will be stimulated to develop other dictionaries and new and useful software.