C.MOL
The content model of a C.MOL (molecule) allows for considerable flexibility
in storage.
Among the molecular properties and data it can handle are (in order):
- A description (X.HTML).
- Any of the following (in any order):
- Molecular formula and/or connection table (C.FORM) (Repeatable).
- Molecular symmetry (C.SYMM).
- Crystallographic data, including cell dimensions, space group and
experimental data (C.CRYS).
- Molecular chirality information (C.CHIR).
- Macromolecular sequence (C.SEQ) (Repeatable).
- Macromolecular features (C.FEAT) (Repeatable).
- Atoms and their attributes (C.AT), followed by optional bond
information (C.BO) and any number of coordinate sets (C.COOR), e.g. for
conformations, dynamics or NMR.
- Any mixture of any of the following any number of times:
- Bibliography (X.BIB).
- Data blocks (X.LIST).
- Figures (X.FIG).
- Free text or foreign files (X.FRE).
Although many of these could also be held in an XML file without MOL.DTD,
the containment within a molecule is very well suited to molecular
databases (e.g. crystallography) where all data is "attached" to a molecule.
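The ordering described above can be sketched as a small check. This is only an illustration in Python, not part of MOL.DTD: the content declaration itself is not reproduced in this section, so the grouping below is one reading of the prose. It assumes lower-case element names, that X.HTML may only be the first child, that the general containers (X.BIB, X.LIST, X.FIG, X.FRE) come last, and that C.BO and C.COOR must come after C.AT. The function name check_child_order is hypothetical.

```python
import xml.etree.ElementTree as ET

# Element groups taken from the prose description of C.MOL.
# Assumption: the grouping is an interpretation, not the DTD declaration.
DESCRIPTION = {"x.html"}
MOLECULAR = {"c.form", "c.symm", "c.crys", "c.chir", "c.seq", "c.feat",
             "c.at", "c.bo", "c.coor"}
GENERAL = {"x.bib", "x.list", "x.fig", "x.fre"}


def check_child_order(mol):
    """Return a list of problems found in the child order of a c.mol element."""
    problems = []
    seen_general = False  # becomes True once a general container appears
    seen_atoms = False    # becomes True once c.at appears
    for position, child in enumerate(mol):
        tag = child.tag.lower()
        if tag in DESCRIPTION:
            if position != 0:
                problems.append(f"{tag} must be the first child")
        elif tag in MOLECULAR:
            if seen_general:
                problems.append(f"{tag} appears after a general container")
            if tag == "c.at":
                seen_atoms = True
            elif tag in {"c.bo", "c.coor"} and not seen_atoms:
                problems.append(f"{tag} appears before c.at")
        elif tag in GENERAL:
            seen_general = True
        else:
            problems.append(f"{tag} is not listed as c.mol content")
    return problems


good = ET.fromstring(
    "<c.mol><x.html/><c.form/><c.at/><c.bo/><c.coor/><x.bib/></c.mol>")
bad = ET.fromstring("<c.mol><x.bib/><c.bo/></c.mol>")

print(check_child_order(good))
print(check_child_order(bad))
```

A validating parser working from the DTD would normally perform this check; the sketch is only meant to make the ordering rules concrete.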
NOTE: The use of the term 'molecule' is not meant to imply anything
about the bonding model or physical nature of the thing in question.
C.MOL can be used to hold data on extended solids (such as NaCl) or
van der Waals
complexes. The bonding model is kept simple to emphasise that for many
molecules there need to be additional semantics to specify it adequately.
The simple model may be refined over time.
The primary use of C.MOL is to provide at least one way of accurately
conveying the precise nature and identity of the substance. This way may
not always be the best or most efficient one.
The present limitations of C.MOL are:
- Only one molecule can be stored per C.MOL, although disjoint molecules,
such as complexes or salts with simple ratios, can be stored. Mixtures
are best described by defining two or more molecules and using links
(A) embedded in hypertext.
- There are no descriptors for generic molecules, such as substructures,
Markush, search queries, etc. For these we shall need a grammar.
- It cannot deal with reactions. These can be partially dealt with by
hypertext and references, but this needs to be developed.
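The suggestion for mixtures (one C.MOL per component, tied together by hypertext links) might look roughly like the following Python sketch. Several details are assumptions made only for illustration and are not confirmed by this section: that each c.mol carries an id attribute, that links use an a element with an href attribute, and that the x.html block holding the links sits next to the molecules under cml. The function name describe_mixture and the component names are hypothetical.

```python
import xml.etree.ElementTree as ET


def describe_mixture(component_ids):
    """Build a cml tree with one c.mol per component and a linking note."""
    root = ET.Element("cml")
    for cid in component_ids:
        # Assumption: an "id" attribute identifies each molecule.
        ET.SubElement(root, "c.mol", {"id": cid})
    # Assumption: the hypertext note is a sibling of the molecules.
    note = ET.SubElement(root, "x.html")
    note.text = "Mixture of: "
    for cid in component_ids:
        # Assumption: HTML-style anchors point at the molecule ids.
        link = ET.SubElement(note, "a", {"href": "#" + cid})
        link.text = cid
        link.tail = " "
    return root


mixture = describe_mixture(["ethanol", "water"])
print(ET.tostring(mixture, encoding="unicode"))
```

Each component stays a separate molecule, so the one-molecule-per-C.MOL limitation is respected and the mixture exists only as the set of links.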
Content
- c.at -- A generic container for atomic coordinates and properties
- c.bo -- A generic container for bonds and their properties
- c.chir -- Representation of the molecular chirality.
- c.coor -- Additional coordinates for the atoms.
- c.crys -- Crystallographic data, especially unit cell and symmetry.
- c.feat -- Features of macromolecules (e.g. SITE, MUTATION).
- c.form -- Chemical formula.
- c.seq -- Represents a macromolecular sequence.
- c.symm -- Molecular symmetry.
- x.bib -- A bibliographic entry.
- x.fig -- A figure, possibly in encoded binary.
- x.fre -- Free text for any purpose, including foreign or encoded files.
- x.html -- A hypertext container for use in XML and CML.
- x.list -- A very flexible generic list/tree/table container.
ATTRIBUTES
CONTENT DECLARATION
Tag Minimization
- Open Tag: REQUIRED
- Close Tag: REQUIRED
Parent Elements
- cml -- A toplevel DTD encompassing HTML 2.0, XML and MOL.
- x.list -- A very flexible generic list/tree/table container.