pdb-l: Machine detection of sequence microheterogeneity
Eric Martz
emartz at microbio.umass.edu
Wed Dec 7 10:21:07 PST 2005
What is the best way for a program, reading a PDB file, to distinguish
sequence microheterogeneity from insertions in the sequence numbering? Both
involve "insertion codes" in the ATOM records (column 27), but in the
former case they mark alternate residues at the same position in the
chain (sometimes with insertion codes, sometimes just sharing the same
residue number), while in the latter they mark sequential residues.
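For reference, here is a minimal Python sketch of pulling the residue number and insertion code out of the fixed columns of an ATOM record (resSeq in columns 23-26, iCode in column 27). The function name parse_atom_record and the sample record are illustrative only and are not taken from any existing tool or PDB entry.

```python
def parse_atom_record(line):
    """Return the residue-identifying fields of one ATOM/HETATM line, or None."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    return {
        "atom": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "icode": line[26].strip(),        # column 27, "" when blank
    }


# Made-up record, built with a format string so the columns are unambiguous.
sample = "ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f" % (
    1, " CA ", " ", "SER", "A", 22, "A", 1.0, 2.0, 3.0)
rec = parse_atom_record(sample)
```

A non-empty "icode" only says that the residue shares its number with a neighbor; by itself it cannot tell an insertion from microheterogeneity, which is the question here.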
The guide at http://www.rcsb.org/pdb/docs/format/pdbguide2.2/part_35.html
states for SEQRES:
"In case of microheterogeneity, only one of the sequences is presented. A
REMARK is generated to explain this and a SEQADV is also generated."
PDBv3 (as of 2009.05.24) says at http://www.wwpdb.org/documentation/format32/sect3.html#SEQRES:
"Microheterogeneity is to be represented as a variant with one of the possible
residues in the site being selected (arbitrarily) as the primary residue.
The residues which do not match to the UNP reference will be listed in
SEQADV records with the explanation of “microheterogeneity”."
1H9H describes microheterogeneity in SEQADV and REMARK 999, but has none in
its ATOM records. Instead, it has three sequence insertions!
I don't see that now (maybe due to PDB remediation?); it has ATOM
records for residues C & S at several positions along the chains
that share the residue number without an insertion code.
The pre-2007 remediation file* is in the same state.
(*) obtained from http://www.umass.edu/microbio/chime/pe_beta/pe/protexpl/unremed.htm
1DIN specifies microheterogeneity in REMARK 6, but gives no ATOM
coordinates for the alternate residue.
No; it has both CSD 123 and CYS 123 in the ATOM section, but only CSD in
SEQRES. Even in the pre-2007 remediation file*, REMARK 6 says that
coordinates are provided for both residues.
1AL4, 1CBN, and 1ETA have microheterogeneity in their ATOM records, but no
mention of it in SEQADV. Instead, 1AL4 describes it in COMPND
OTHER_DETAILS, while 1ETA and 1CBN describe it in REMARK 4, and 1ETA also
in FTNOTE 1.
1TAB describes microheterogeneity in REMARK 4 for three positions, 184,
188, and 221. In the ATOM records, GLY 184A precedes(!) TYR 184, but both
are in the SEQRES, as though an insertion rather than microheterogeneity.
The alpha carbons have different positions, and the two residues are
peptide-bonded. The same pattern occurs at the other two positions. I don't
understand why this is described as microheterogeneity!
Agreed. This is still so in the current PDB, and they are processed and
displayed correctly as insertions. The same happens at Gly188a/Lys188
and Ala221a/Gln221.
Clearly, SEQADV cannot be relied upon to indicate the presence of
microheterogeneity in the ATOM records.
Possible Method I: Compare the sequence in SEQRES with the sequence in ATOM
records. In the few cases I have examined, residues with insertion codes
representing microheterogeneity do not appear in SEQRES. In contrast, for
sequence insertions (e.g. 1QKZ, 1H9H) the SEQRES contains all the residues
with insertion codes.
Implemented. Microheterogeneity is inferred when a residue in ATOM
is absent from SEQRES, and it is grouped inside [] with the previous (and
next) residue(s) based on their identical residue number,
with or without an insertion code present.
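A rough Python sketch of the idea behind Method I, for one chain. This is not the code that was actually implemented; it is a greedy walk under stated assumptions: residues arrive in ATOM order, a residue that shares its number with the previous one is called an insertion only if it matches the next unconsumed SEQRES residue, and otherwise it is called microheterogeneity. The function name, the labels, and the toy data are all hypothetical.

```python
def classify_shared_numbers(seqres, atom_residues):
    """Label each ATOM residue of one chain.

    seqres: residue names in SEQRES order.
    atom_residues: (res_name, res_seq, icode) tuples in ATOM order.
    Returns (res_name, res_seq, icode, label) tuples, where label is
    "primary", "insertion", or "microheterogeneity".
    """
    labeled = []
    pos = 0          # index of the next unconsumed SEQRES residue
    prev_num = None
    for name, num, icode in atom_residues:
        if num == prev_num:
            # Shares its number with the previous residue.
            if pos < len(seqres) and seqres[pos] == name:
                label = "insertion"      # SEQRES lists it as sequential
                pos += 1
            else:
                label = "microheterogeneity"  # absent from SEQRES here
        else:
            label = "primary"
            # Skip SEQRES residues that have no coordinates (assumption:
            # the next match by name is the right one).
            try:
                pos = seqres.index(name, pos) + 1
            except ValueError:
                pass  # not in SEQRES at all; leave the pointer alone
        labeled.append((name, num, icode, label))
        prev_num = num
    return labeled


# Toy data, not taken from any PDB entry.
het = classify_shared_numbers(
    ["THR", "PRO", "GLU"],
    [("THR", 21, ""), ("PRO", 22, ""), ("SER", 22, "A"), ("GLU", 23, "")])
ins = classify_shared_numbers(
    ["GLY", "TYR", "LEU"],
    [("GLY", 184, "A"), ("TYR", 184, ""), ("LEU", 185, "")])
```

A known weakness of this sketch: if the alternate residue happens to have the same name as the next residue in SEQRES, it will be mislabeled as an insertion. A real implementation would likely need a proper alignment between the SEQRES and ATOM sequences.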
Possible Method II: Compare the coordinates of the alpha carbon atoms. In
1CBN, they are identical for 22 vs. 22A, and for 25 vs. 25A. But in 1ETA,
they differ slightly at position 30, by 0.484 Angstroms. So, if the alpha
carbon distance is less than 1 Angstrom, consider it microheterogeneity?
Not implemented.
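Although Method II was not implemented, the test itself is small. Here is a sketch in Python (3.8 or later, for math.dist), assuming the alpha-carbon coordinates of the two residues sharing a number have already been extracted as (x, y, z) tuples. The function name is hypothetical, the 1 Angstrom default is the cutoff proposed in the question above, and the coordinates are made up.

```python
import math


def ca_suggests_microheterogeneity(ca_a, ca_b, cutoff=1.0):
    """True if two alpha carbons lie closer together than `cutoff` Angstroms.

    ca_a, ca_b: (x, y, z) coordinates of the CA atoms of two residues
    that share a residue number.
    """
    return math.dist(ca_a, ca_b) < cutoff


# Made-up coordinates: nearly superimposed CAs vs. CAs about one
# peptide-bond step (roughly 3.8 Angstroms) apart.
overlapping = ca_suggests_microheterogeneity((0.0, 0.0, 0.0), (0.484, 0.0, 0.0))
sequential = ca_suggests_microheterogeneity((0.0, 0.0, 0.0), (3.8, 0.0, 0.0))
```

Consecutive, peptide-bonded residues have alpha carbons roughly 3.8 Angstroms apart, so a 1 Angstrom cutoff leaves a wide margin between the two cases.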
Method II seems simpler to implement, and likely more robust.
Don't know, but fulfilling the other specifications led me to implement
Method I.
One can also wonder how to determine the number of entries in the PDB that
have sequence microheterogeneity. Searching for the word gives 24 hits, but
some of the hits lack actual sequence heterogeneity in the coordinates
(e.g. 1CN4, 180D, 1UCS, 1BGN).
A simple search for 'microheterogeneity' now returns 65 entries.
Advice will be appreciated.
Thanks, -Eric
/* - - - - - - - - - - - - - - - - - - - - - - - - - - -
Eric Martz, Professor Emeritus, Dept Microbiology
U Mass, Amherst -- http://www.umass.edu/molvis/martz
Protein Explorer - 3D Visualization: http://proteinexplorer.org
FirstGlance in Jmol - http://firstglance.jmol.org
Workshops: http://www.umass.edu/molvis/workshop
Biochem 3D Education Resources http://MolviZ.org
World Index of Molecular Visualization Resources: http://molvisindex.org
ConSurf - Find Conserved Patches in Proteins: http://consurf.tau.ac.il
Atlas of Macromolecules: http://molvis.sdsc.edu/atlas/atlas.htm
PDB Lite Macromolecule Finder: http://pdblite.org
Molecular Visualization EMail List (molvis-list):
http://bioinformatics.org/mailman/listinfo/molvis-list
- - - - - - - - - - - - - - - - - - - - - - - - - - - */