Jmol Sequence Tool

Progress on implementing Eric Martz's proposal (described at http://molvis.sdsc.edu/fgij/seqspecs.htm)
in the prototype at http://biomodel.uah.es/Jmol/sequence_info/.

Specifications for Interactive Sequence Listings
for FirstGlance in Jmol (FG)
Created March 26, 2006. Revised March 30, 2006 thanks to feedback from Frieda Reichsman and Jaime Prilusky.

This is a pre-implementation draft proposal subject to change. Please share your ideas, suggestions, and additional juicy examples with the FGiJ Development Team.

A. Functions of Sequence Listing

The purpose of the sequence listing in FG is to make it easy to relate sequence to 3D structure within the FG application. Therefore the listing should be kept very simple. Sequence annotations such as secondary structure or sequence motifs can be viewed in other databases, and need not be shown in the FG sequence listing.

no problem

1. The sequence listing will be displayed in the lower left division of FG.

done

2. Residues will be listed in one-letter code, with x for non-standard residues.
I've decided to omit water and solvent from the listing and to leave the other hetero groups in the sequence.

done

3. Touching the one-letter code for an amino acid will display, in a form slot, its chain name, ATOM sequence number (and insertion code when present) and three-letter code.
OK. The insertion code is shown in lowercase, which in my opinion is more visible. Amino acids use the standard Uppercase-lowercase-lowercase format; nucleotides are converted to a single letter even if deoxy (PDBv3 uses DG etc.).
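
The reporting rule above could be sketched roughly as follows. This is only an illustration in Python, not the prototype's code: the function name slot_label and the exact layout of the returned text are assumptions, since the spec does not fix how the slot arranges its fields.

```python
# Standard amino acids, shown in Uppercase-lowercase-lowercase form.
AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
# PDBv3 deoxynucleotide names, reduced to a single letter.
DEOXY_TO_ONE = {"DA": "A", "DC": "C", "DG": "G", "DT": "T"}


def slot_label(chain, seq_num, ins_code, res_name):
    """Build the text reported when a residue is touched.

    The layout "<name> <number><insertion code>, chain <chain>" is a
    hypothetical choice made for this sketch.
    """
    name = res_name.strip().upper()
    if name in DEOXY_TO_ONE:
        shown = DEOXY_TO_ONE[name]
    elif name in AMINO_ACIDS:
        shown = name.capitalize()
    else:
        shown = name  # other groups are reported unchanged
    # The insertion code, when present, is appended in lowercase.
    number = f"{seq_num}{ins_code.strip().lower()}"
    return f"{shown} {number}, chain {chain}"
```

For residue 27, insertion code A, in chain L, a call such as slot_label("L", 27, "A", "SER") would report the residue as Ser 27a under these assumptions.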

1. done

2. how?

4. Seq->3D: Clicking the one-letter code for a residue will highlight it in the 3D view in Jmol.
Done using halos; other highlights are feasible

Optionally, the residue could automatically be brought to the front of the molecule (by rotating the molecule) (don't know how to do this), slid smoothly to the center, and zoomed.

done

5. 3D->Seq: Clicking a residue in the 3D view will highlight it in the sequence listing.

The following are optional and need not be in the initial release.

done

6. Entering a sequence fragment will highlight the locations of any matches in the sequence listing
and also in the model. This may fail if gaps or microheterogeneity are involved.
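
A minimal sketch of the fragment search, in Python, assuming the listing is available as a plain string with exactly one character per residue. The name find_fragment is hypothetical. Gap markers and bracketed alternates are not handled, which is one way the failure mentioned above could arise.

```python
def find_fragment(listing, fragment):
    """Return the start index of every match, overlapping matches included.

    Matching ignores case, so residues shown in lowercase are still found.
    """
    text, query = listing.upper(), fragment.upper()
    if not query:
        return []
    starts = []
    pos = text.find(query)
    while pos != -1:
        starts.append(pos)
        # Resume one position later so overlapping matches are not skipped.
        pos = text.find(query, pos + 1)
    return starts
```

Each returned index, together with the fragment length, identifies the run of residues to highlight in the listing and in the model.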

done

7. Entering a residue name (e.g. PRO or CYS or A or U) will highlight the locations of that residue in the sequence listing
and also in the model. However, the one-letter code must be used (or a different slot implemented).

--how / when to trigger?

8. Coloring the sequence listing automatically, according to the color scheme in the 3D view. This would be appropriate for all views:

Secondary structure
Cartoon, Vines (color by chain)
N->C Rainbow
Composition
Hydrophobic/Polar
Charge
Contacts (contacting residues highlighted)

B. Contents & Format of the Sequence Listing

done

1. There will be a single sequence list taken from the ATOM records (with residues that lack coordinates taken from SEQRES). The SEQRES residues will not be listed separately, but discrepancies between aligned SEQRES and ATOM records will be indicated in the single list as detailed below.

See example below for 1QKZ:L.

done

2. Non-standard residues will be listed as x (lowercase). Touching the x's will display their 1-3 letter ATOM record abbreviation codes in a form slot.

2SOC (NMR) has DPN, DTR, THO. Its listing would be xCFxKTCx.
1BKX:A has TPO197 and SEP338.
1AL4 has D amino acids DLE, DVA, also ETA etc.
1EVV has many non-standard nucleotides.
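
The conversion to one-letter codes, with x for non-standard residues, might look like the sketch below. It is written in Python for illustration only. The function names and the set of solvent names to skip are assumptions, not part of the spec.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
# Standard ribo- and deoxyribonucleotides map to a single letter.
NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "DA": "A", "DC": "C", "DG": "G", "DT": "T",
}
# Assumed residue names for water; these are left out of the listing.
SOLVENT = {"HOH", "DOD", "WAT"}


def one_letter(res_name):
    """Return the one-letter code, or lowercase x for a non-standard residue."""
    name = res_name.strip().upper()
    return THREE_TO_ONE.get(name) or NUCLEOTIDES.get(name) or "x"


def listing_codes(res_names):
    """Join the one-letter codes of all residues that are not solvent."""
    return "".join(
        one_letter(n) for n in res_names if n.strip().upper() not in SOLVENT
    )
```

Applied to the 2SOC residues DPN, CYS, PHE, DTR, LYS, THR, CYS, THO, this produces the listing xCFxKTCx given above.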

done

3. Sequence numbers will be those in the ATOM records. Thus, some listings will start with a sequence number that is negative, zero, or 2 or higher.

2FSR starts at -9, skips -2 through 3, and resumes at 4.
1D5T:A starts at -2 and skips 0.
1AVQ:A starts at -1 and includes 0.
1BXW:A starts at 0.
1UCY:K starts at 16.
163D:A starts at 43.

Additionally, some listings will have numbers in decreasing order, or large discontinuities in numbering.

1NSA contains a single (unnamed) protein chain with sequence 7A-95A that continues 4-308.
1IAO:B contains (in this order) 1S, 323P-334P, 6-94, 94A, 95-188, 1T, 2T.

done

4. Inserted residues will be listed in-line with other residues, but distinguished by having their one-letter codes displayed as ^superscript letters.

See example below for 1QKZ:L.
1UCY:L starts with 8 inserted residues in reverse alphabetic order, and has 13 inserted residues near the end. Other chains have many insertions as well.

done

5. When there is a numbering gap (due to numbering according to a reference sequence) but no residues are missing in the 3D structure (SEQRES and ATOM records match), the position of the gap will be indicated by two hyphens surrounding a number indicating the size of the gap, e.g. -1-, -2-, -3-, -23-, and so forth. The prototype uses ~ instead of hyphens.

See example below for 1QKZ:L.
1IGT:B has 23 such gaps, e.g. following 97, 130, 154, 157, etc.
163D:A (RNA) has a gap following 58.
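
The gap notation can be illustrated with a short Python sketch. It uses the hyphen form from the spec, and it assumes the caller has already established that SEQRES and ATOM records match, so that every jump in numbering is a numbering gap and not a physical one. The function name is hypothetical.

```python
def with_numbering_gaps(residues):
    """Insert -n- wherever the sequence number jumps forward by more than 1.

    residues: list of (sequence_number, one_letter_code) in ATOM-record order.
    Inserted residues repeat a number and decreasing numbers go backward,
    so neither produces a gap marker.
    """
    parts = []
    previous = None
    for number, code in residues:
        if previous is not None and number - previous > 1:
            parts.append(f"-{number - previous - 1}-")
        parts.append(code)
        previous = number
    return "".join(parts)
```

For a chain in which P95 is peptide bonded to T97, the residues numbered 94, 95, 97, 98 with codes F, P, T, F would be listed as FP-1-TF.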

mostly done

6. When there is a physical gap in the 3D model (residues in SEQRES that are absent in ATOM records, typically due to crystallographic disorder), the residues with no coordinates will be listed in lower case. Touching such a residue will report an interpolated sequence number. Clicking on such a residue will produce a message* explaining that the residue lacks coordinates. Interpolation still needs to be implemented (?); right now, the number is simply incremented by 1.
(*) tooltip (onMouseOver) and alert box (onClick)

See example below for 1QKZ:L.
2ACE has leading, embedded, and trailing physical gaps.

done

7. When there is sequence microheterogeneity (residues in ATOM records that are absent in SEQRES), the alternate residues at the same sequence position will be enclosed in square brackets.

For example, in 1CBN, residues starting at number 20 would be represented ...GT[PS]EA.... At position 22, PRO and SER are alternate residues.
1AL4 and 1ETA have sequence microheterogeneity.
More on sequence microheterogeneity. Sequence microheterogeneity appears to be quite rare in the PDB, but I have not found a definitive website-based search strategy.
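
The bracket notation could be produced along the lines of the Python sketch below. It assumes that alternate residues at one position are adjacent in the input and share the same position key; the function name is hypothetical.

```python
from itertools import groupby


def with_microheterogeneity(residues):
    """Enclose alternate residues at the same position in square brackets.

    residues: list of (position, one_letter_code). The position may be a
    plain number or a (number, insertion_code) pair.
    """
    parts = []
    for _, group in groupby(residues, key=lambda r: r[0]):
        codes = "".join(code for _, code in group)
        parts.append(f"[{codes}]" if len(codes) > 1 else codes)
    return "".join(parts)
```

With the 1CBN residues 20 to 24 given as G, T, P and S (both at 22), E, A, the result is GT[PS]EA.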

The following are optional and need not be in the initial release.

--how to check?
--sure a checkbox?

8. A checkbox to highlight residues with missing atoms. For example, some crystallographic results have the alpha and beta carbons of certain amino acids, but lack the remainder of the sidechains.

(@@ examples needed) and a criterion

done (w/ link)
--sure a checkbox?

9. A checkbox to highlight residues with alternate sidechain conformations (rotamers; multiple sets of coordinates for sidechain atoms). Alternate sidechain conformations are quite common in the PDB.

5HVP:A, 1AL4 have alternate sidechain conformations at 12,45,60,65... and ...

C. Listing Examples

Experience with Protein Explorer has shown that it is quite easy to find any sequence number by moving the mouse over the listing, and watching the number reports in the form slot. Here is the slot as it appears in Protein Explorer when residue 27, insertion code A, is touched in the sequence listing for the example below, 1QKZ chain L:

done:

This reporting slot will be immediately above the sequence listing. Thus, it is not important that the sequence number of a residue be apparent from inspection of the sequence listing table itself. Indeed, maintaining a correspondence between column and row and sequence number is not always feasible because of the anomalous sequence numberings used by some authors (see examples listed above). In the listing below, residues are divided into three groups of ten per line. However, the listing may start at a number <1 or >1, so line 1 need not be numbered 1-30. In the example below, it includes residues numbered 1-27C (1-27 plus 27A, 27B, 27C).
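
The line layout described above (three groups of ten, each line prefixed with the chain name) can be sketched in Python as follows. This is an illustration under stated assumptions, not the prototype's code: the listing is passed as a list of display tokens, one per residue, and the function name and parameters are hypothetical.

```python
def format_listing(chain, tokens, group_size=10, groups_per_line=3):
    """Split residue tokens into groups and lines, each line prefixed by the chain."""
    groups = [
        "".join(tokens[i:i + group_size])
        for i in range(0, len(tokens), group_size)
    ]
    lines = []
    for i in range(0, len(groups), groups_per_line):
        lines.append(f"{chain} " + " ".join(groups[i:i + groups_per_line]))
    return lines
```

Because the lines carry no sequence numbers, the same routine works whether the chain starts at -9, 1, or 43; the numbers are read from the reporting slot.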

Proposed listing format for FirstGlance
1QKZ chain L

Comments

L NIVMTQTPLS LPVSLGDQAS ISCRSSQ^SLV
L ^HSNGNTYLHW YLQKPGQSPK LLIYTVSNRF
L SGVPDLRFSG SGSGTDFTLK ISRVEAEDLG
L VYFCSQSTHF P-1-TFGGGTKL EIKRADAA
L PTVSIFPPSSEQ LTSGGASVVC FLNNFYPK
L DINVKWKIDGKE RQNGVLNSWT DQdskDST
L YSMSSTLTLTKD EYERHNSYTC EATHKTST
L SPIVKSFNRNE

Insertions at ...SSQ^SLVHSNGN... (27, 27A, 27B, ...)

Numbering gap: ...FP-TF... (P95 peptide bonded to T97)

Physical gap: residues without coordinates "dsk"

The green color reflects the color assigned to chain L by Jmol. Other color schemes could also be applied (see below).

The above listing lacks instances of sequence microheterogeneity and non-standard residues. For examples, see numbered items above on those topics.

The above listing is shown largely in a single color to emphasize that insertions, numbering gaps, and physical gaps can be indicated without the use of color. Thus, colors can be applied to the listing to indicate other desired characteristics. Below are two Seq3D listings from Protein Explorer (in a slightly different format than that proposed for FirstGlance) illustrating two of many possible uses of color.

Protein Explorer Seq3D Snapshots Illustrating Uses of Color