C.AT
Table of atoms. This is the heart of C.MOL and represents an atom-centred
description with optional bonds (C.BO). This is perhaps driven by my
background as an inorganic crystallographer.
Elemental identity, atomic
positions and spacegroup are often necessary and sufficient to describe
what the substance is. Many theoretical chemists would agree that,
with the addition of the total electron count, everything else is opinion.
C.AT and C.BO can be used to give the molecular formula
(connectivity) by the use of attributes such as formal ligand count, number
of attached hydrogen atoms, formal charge, etc. Where possible, however,
we recommend that C.FORM is used since standardisation is likely to
be clearer in that format. The current conventions (SMILES and MOL) could be
expanded to include others.
C.AT/C.BO may be difficult to relate to C.FORM. Where C.AT
represents coordinate data, this might relate to multiple copies of a
molecule (as in crystallography, where an asymmetric unit can contain
several identical molecules and all the coordinates must be included so that
the crystal structure can be recreated). A related problem arises when some of
the atomic coordinates are not determined, a frequent occurrence with some
techniques.
C.AT and C.BO are linked by the SERID attribute. This need not be an
integer, and could be a construct such as CA15. If the tables are edited or
modified it is important to keep them consistent: SERIDs must remain unique,
and every SERID referenced from C.BO must still exist in C.AT.
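The consistency requirement above can be checked mechanically. The following is a minimal Python sketch, not part of the CML DTD. It assumes the SERID column of C.AT has already been read into a list of strings and that each C.BO row has been reduced to a pair of SERIDs; the function name check_serids and that bond layout are hypothetical.

```python
from collections import Counter


def check_serids(atom_serids, bonds):
    """Return a list of problems found; an empty list means the tables agree.

    atom_serids: SERID values from C.AT, in table order (any strings).
    bonds: (serid_a, serid_b) pairs, an assumed reduction of the C.BO rows.
    """
    problems = []

    # SERIDs must be unique within the atom table.
    counts = Counter(atom_serids)
    for serid, n in counts.items():
        if n > 1:
            problems.append(f"duplicate SERID {serid!r} ({n} times)")

    # Every bond end must name an atom that exists in C.AT.
    known = set(atom_serids)
    for a, b in bonds:
        for end in (a, b):
            if end not in known:
                problems.append(f"bond {a}-{b} refers to unknown SERID {end!r}")
    return problems


atoms = ["CA15", "N1", "O2", "N1"]       # "N1" is deliberately repeated
bonds = [("CA15", "N1"), ("O2", "H9")]   # "H9" is not in the atom table
report = check_serids(atoms, bonds)
for line in report:
    print(line)
```

Because SERIDs are compared as opaque strings, the check works equally for integer-like ids and for constructs such as CA15.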
The content model is simple: an optional description (X.DESC), followed
by a number of (column) arrays all of length equivalent to the number of atoms.
Each X.ARR corresponds to an atomic attribute. The semantics of the
attribute is given by one of two mechanisms:
- hardcoded: A number of key attributes describe
what a molecule is rather than our opinion or calculations.
- links to glossaries (use of HREF with REL='glossary').
The actual enumeration of the attributes is given in the file 'builtin.ent',
which is definitive, rather than what is written below (although hopefully
they are in sync!). It contains:
&mol_arr_builtin;
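Because the content model requires every column array to have one entry per atom, a reader can reject a malformed table before interpreting any attribute. A minimal Python sketch, assuming the X.ARR columns have been parsed into a dictionary keyed by attribute name (the dictionary layout and the name atom_count are illustrative, not part of the DTD):

```python
def atom_count(columns):
    """Return the number of atoms, or raise ValueError if column lengths differ.

    columns: mapping of attribute name (e.g. "ELSYM") to a list of values,
    an assumed in-memory form of the X.ARR children of C.AT.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    distinct = set(lengths.values())
    if len(distinct) > 1:
        raise ValueError(f"column arrays differ in length: {lengths}")
    # A table with no columns describes no atoms.
    return distinct.pop() if distinct else 0


water = {
    "SERID": ["O1", "H1", "H2"],
    "ELSYM": ["O", "H", "H"],
    "FCHG": [0, 0, 0],
}
print(atom_count(water))
```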
The semantics of the hardcoded atom attributes are:
- SERID (optional) serial number/id. A serial number is
a valuable way of identifying an atom and should be unique. It is not
mandatory unless C.BO is used. SERID has no implied semantics and could be
C123A (CCDC), GLU13CA (PDB), etc.
- ELSYM ELEMENT symbol. The element symbol is very important
and takes precedence over other methods of specifying the element (such as
number). It MUST relate to a standard table of elements and is at most two
letters long.
Additionally allowed elements are: D, T (hydrogen); * (any atom); ? (unknown);
DD (dummy); EP (electron pair); E (electron).
- ELNUM ELEMENT (atomic) number. The atomic number is
subordinate to the element symbol, and cannot represent dummy atoms,
electron pairs, etc.
- X2 2-D X-coordinate. The X-coordinate of an atom in a
conventional chemical structure diagram. This is in arbitrary units and
will have no relation to the 3-D coordinates.
- Y2 2-D Y-coordinate. The corresponding Y-coordinate.
- X3 3-D X-coordinate (Cartesian, A). The 3-dimensional
Cartesian coordinate of the atom in Angstrom units. Note that without an
orthogonalisation matrix it is normally impossible to recreate Fractional
coordinates from Cartesian ones, where this is meaningful.
- Y3 3-D Y-coordinate (Cartesian, A).
The corresponding Y-coordinate.
- Z3 3-D Z-coordinate (Cartesian, A).
The corresponding Z-coordinate.
- XF 3-D X-coordinate (Fractional).
The X-coordinate of an atom in fractions of the corresponding unit cell length.
Fractional coordinates only have meaning for a molecule located in a
unit cell (C.CRYS). They are required if the symmetry operations of the
unit cell are to be applied to the molecule. Cartesian coordinates can
be obtained from Fractional with an orthogonalisation matrix, but there
are several conventions and you must state which one is used.
- YF 3-D Y-coordinate (Fractional).
The corresponding Y-coordinate.
- ZF 3-D Z-coordinate (Fractional).
The corresponding Z-coordinate.
- ZL Z-matrix length. The molecular geometry can be
represented by internal coordinates (bond lengths, valence angles and torsional
angles). Note that these do not have to involve atoms bonded in the
conventional way. Each atom requires the SERIDs of three other atoms
(Q1, Q2 and Q3). The position of the current atom is such that
length SERID-Q3 is ZL, angle SERID-Q3-Q2 is ZA and torsion SERID-Q3-Q2-Q1 is
ZT. Some atoms may require some or all of Q1, Q2 or Q3 to be dummy atoms.
- ZA Z-matrix angle. See above.
- ZT Z-matrix torsion. See above.
- Q1 First Z-matrix atom. See above.
- Q2 Second Z-matrix atom. See above.
- Q3 Third Z-matrix atom. See above.
- DISORD Disorder code. Application dependent at present.
- PROTAT Protein atom type. For compatibility with PDB.
PROTAT has values of "SG", "CG1", etc. In the fullness of time this might
become obsolete. AT PRESENT PDB TYPES ARE USED AS WELL.
- CHAIN Chain ID. Application dependent at present.
- RESID Residue ID. Application dependent at present.
- OCC Occupancy. At present this is application dependent.
(There is often confusion between atoms which are not at full occupancy, and
atoms on symmetry elements. The present value is the value of the
occupancy after any symmetry elements have been applied.)
- TOTL Total number of ligands. When C.AT is being used to
describe connectivity, the formal number of ligands may be useful. This
is what might appear in a chemical structure diagram and may bear no relation
to the proximity of atoms in 3-Dimensional space.
It is often conventional to split the ligands into hydrogen atoms and others
because many chemical structure diagrams and many connection tables are
hydrogen-suppressed. Note that bridging hydrogens (as in electron-deficient
compounds) and isotopically substituted hydrogen atoms may need explicit
inclusion here.
- NONH number of NON-H ligands. See above.
- NUMH TERMINAL hydrogen count. See above.
- PARITY ATOM parity (-1,0,+1). We strongly recommend that
stereochemistry and chirality are approached through the use of chiral volumes
rather than descriptors such as CIP or annotations to a 2-Dimensional diagram.
For a chiral volume, 4 atoms must be described, and it is most common that
these represent 4 ligands to a central atom such as C. However some or all of
the atoms (including the central one) may be dummy atoms as, for example, in
the biphenyls where a dummy atom could be placed halfway between the benzene
rings. Since only the sign of the volume is required, accurate placement of
dummy atoms is unimportant.
The chiral volume of a tetrahedron with vertices at points 1 to 4, where
point i has Cartesian coordinates (xi, yi, zi), is one sixth of the determinant:
| 1  1  1  1  |
| x1 x2 x3 x4 |
| y1 y2 y3 y4 |
| z1 z2 z3 z4 |
The four atoms representing the corners of the tetrahedron (PID1-PID4)
must be specified. For atoms without described parity, these fields
should be NULL.
- PID1 SERID of atom at vertex 1. See above.
- PID2 SERID of atom at vertex 2. See above.
- PID3 SERID of atom at vertex 3. See above.
- PID4 SERID of atom at vertex 4. See above.
- FCHG FORMAL atom charge. The formal (integral)
charge on the atom,
used only to determine the chemical identity of the molecule. The sum of the
formal charges should represent the charge on the molecule.
- ISOT Isotope number. The isotope number of the element.
This will normally be an integer, rather than an accurate atomic mass.
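For the XF, YF and ZF attributes above, the conversion to X3, Y3 and Z3 can be sketched as follows. This is an illustration under one stated convention, not the convention mandated by this DTD: the a axis is placed along x, b in the xy plane, and z completes a right-handed set. The function names are hypothetical, and the cell lengths and angles are assumed to come from C.CRYS.

```python
import math


def orthogonalisation_matrix(a, b, c, alpha, beta, gamma):
    """Rows of the matrix M such that cartesian = M . fractional.

    Cell lengths in Angstrom, angles in degrees. Assumed convention:
    a along x, b in the xy plane, z completing a right-handed set.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit cell volume from the general (triclinic) formula.
    volume = a * b * c * math.sqrt(
        1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg
    )
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def frac_to_cart(frac, matrix):
    """Convert one (XF, YF, ZF) triple to (X3, Y3, Z3)."""
    return tuple(sum(m * f for m, f in zip(row, frac)) for row in matrix)


# Orthorhombic example cell: the matrix is diagonal (to rounding), so each
# Cartesian coordinate is the fraction times the matching cell length.
m = orthogonalisation_matrix(10.0, 12.0, 8.0, 90.0, 90.0, 90.0)
x3, y3, z3 = frac_to_cart((0.5, 0.25, 0.125), m)
print(round(x3, 6), round(y3, 6), round(z3, 6))
```

A different convention (for example c along z) gives a different matrix for the same cell, which is why the convention has to be recorded alongside the coordinates.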
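For the PARITY and PID1-PID4 attributes above, the chiral volume determinant can be evaluated with a few lines of Python. This is a sketch only: it expands the 4x4 determinant as a 3x3 determinant of edge vectors from vertex 1, which gives the same value. Mapping the sign of the volume directly onto the PARITY values -1, 0 and +1, the tolerance used for "zero", and the function names are assumptions made for the example.

```python
def chiral_volume(p1, p2, p3, p4):
    """Signed volume of the tetrahedron with vertices p1..p4 (x, y, z tuples).

    One sixth of the 4x4 determinant with a first row of ones, computed as
    the scalar triple product (p2-p1) . ((p3-p1) x (p4-p1)).
    """
    ax, ay, az = (p2[i] - p1[i] for i in range(3))
    bx, by, bz = (p3[i] - p1[i] for i in range(3))
    cx, cy, cz = (p4[i] - p1[i] for i in range(3))
    det = (ax * (by * cz - bz * cy)
           - ay * (bx * cz - bz * cx)
           + az * (bx * cy - by * cx))
    return det / 6.0


def parity(volume, tol=1e-6):
    """Reduce a chiral volume to -1, 0 or +1 (assumed PARITY mapping)."""
    if abs(volume) <= tol:
        return 0
    return 1 if volume > 0 else -1


# Coordinates keyed by SERID; the four ids play the role of PID1..PID4.
coords = {
    "C1": (0.0, 0.0, 0.0),
    "F1": (1.0, 0.0, 0.0),
    "CL1": (0.0, 1.0, 0.0),
    "BR1": (0.0, 0.0, 1.0),
}
pids = ("C1", "F1", "CL1", "BR1")
v = chiral_volume(*(coords[s] for s in pids))

# Exchanging two vertices inverts the sign, i.e. the other enantiomer.
swapped = ("C1", "CL1", "F1", "BR1")
v_swapped = chiral_volume(*(coords[s] for s in swapped))
print(parity(v), parity(v_swapped))
```

Since only the sign is used, rough coordinates (including those of dummy atoms) are sufficient, as noted for PARITY above.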
Content
- x.arr -- A very flexible matrix/array/geometry container.
- x.html -- A hypertext container for use in XML and CML.
ATTRIBUTES
CONTENT DECLARATION
- Tag Minimization
Open Tag: REQUIRED
Close Tag: REQUIRED
Parent Elements
- c.mol -- Toplevel container for molecular information.