About Functional Details page
September 16, 2005
The information comes from several databases and literature sources.
1) Functional Information from GO Data
The GO (Gene Ontology) Consortium produces a controlled vocabulary that can be applied to all
organisms. The information concerning a particular protein's Biological Process, Molecular Function,
and Cellular Component is collected and described in GO
( http://geneontology.org/ontology/gene_ontology_edit.obo ). PDBMLplus at PDBj includes GO
annotations for protein chains. The correspondence between PDB chains and GO IDs is extracted from
the ID mapping file provided by UniProt
( ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz ).
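As an illustration of how such a chain-to-GO correspondence could be read from the mapping file, here is a minimal Python sketch. It is not the PDBj implementation. It assumes that the file is tab-separated, that the PDB column holds entries such as "1ABC:A; 1ABC:B" and the GO column holds entries such as "GO:0005524; GO:0004672", and that these are the sixth and seventh columns; the column positions should be checked against the README distributed with the file. The function names are hypothetical.

```python
import gzip
from collections import defaultdict

# Assumed 0-based column positions in idmapping_selected.tab.
# Verify them against the README shipped alongside the file.
PDB_COL = 5
GO_COL = 6


def parse_idmapping(lines, pdb_col=PDB_COL, go_col=GO_COL):
    """Map (pdb_id, chain_id) to the set of GO IDs found on the same line."""
    chain_to_go = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(pdb_col, go_col):
            continue  # short or malformed line
        go_ids = [g.strip() for g in fields[go_col].split(";") if g.strip()]
        if not go_ids:
            continue  # no GO annotation for this UniProt entry
        for entry in fields[pdb_col].split(";"):
            entry = entry.strip()
            if ":" not in entry:
                continue  # empty column, or no chain given
            pdb_id, chain_id = entry.split(":", 1)
            chain_to_go[(pdb_id.lower(), chain_id)].update(go_ids)
    return dict(chain_to_go)


def load_idmapping(path):
    """Read a gzip-compressed local copy of the mapping file."""
    with gzip.open(path, "rt") as handle:
        return parse_idmapping(handle)
```

Passing the column positions as parameters keeps the sketch usable if the file layout differs from the assumption above.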
2) Functional Information from PDB Data
The SITE information in the original PDB data is displayed with the type name "SITE". The chain and
residue identifiers are those defined by the authors, as they appear in the original PDB flat files.
3) Functional Information from PDB atom coordinates for the "HETATM" binding sites
The ligand binding site information is extracted from every PDB structure as the residues that have
any atom closer than 5.0 Angstrom to an atom identified as "HETATM". "HETATM" records with the
residue names "HOH", "WAT", "PO4", "SO4", "MSE", "TPO", "SEP", "PTR", "HIP", "PAS", "ASQ", "NA ",
and "CL " are excluded. "HETATM" residues composed of fewer than three atoms are also not counted.
Every functional site is displayed with the type name "binding site". The chain names and residue
numbers are those defined by the authors, as they appear in the original PDB flat files.
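The distance criterion above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used by PDBj: it reads the standard fixed-column ATOM/HETATM records of a PDB flat file, ignores alternate locations and multiple models, and uses a plain all-against-all distance search. The function names and the layout of the returned dictionary are hypothetical.

```python
import math
from collections import defaultdict

# Residue names excluded from the HETATM search, as listed in the text.
EXCLUDED = {"HOH", "WAT", "PO4", "SO4", "MSE", "TPO", "SEP",
            "PTR", "HIP", "PAS", "ASQ", "NA", "CL"}
CUTOFF = 5.0        # Angstrom
MIN_HET_ATOMS = 3   # HETATM residues with fewer atoms are not counted


def parse_atoms(pdb_lines):
    """Group coordinates by residue, separately for ATOM and HETATM records."""
    protein = defaultdict(list)
    hetero = defaultdict(list)
    for line in pdb_lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        res_name = line[17:20].strip()
        # Key: chain ID, residue number plus insertion code, residue name.
        key = (line[21], line[22:27].strip(), res_name)
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if record == "ATOM":
            protein[key].append(xyz)
        elif res_name not in EXCLUDED:
            hetero[key].append(xyz)
    hetero = {k: v for k, v in hetero.items() if len(v) >= MIN_HET_ATOMS}
    return protein, hetero


def binding_sites(pdb_lines, cutoff=CUTOFF):
    """Return {hetero residue: sorted list of protein residues within cutoff}."""
    protein, hetero = parse_atoms(pdb_lines)
    sites = {}
    for het_key, het_atoms in hetero.items():
        near = [res_key for res_key, atoms in protein.items()
                if any(math.dist(a, h) < cutoff
                       for a in atoms for h in het_atoms)]
        if near:
            sites[het_key] = sorted(near)
    return sites
```

For large structures a spatial grid or k-d tree would be faster than the all-against-all loop, which is kept here for readability.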
4) Functional Information from PROSITE/UniProt
The amino acid residues identified as motif sequences by PROSITE/UniProt are annotated using
ScanProsite ( http://www.expasy.org/tools/scanprosite/ ) and displayed with the type name
"PROSITE".
5) Functional Information from SwissProt/UniProt
The Feature Table (FT) information of the corresponding amino acid sequence data in
SwissProt/UniProt relating to the molecular function of each protein is collected and displayed
with the following type names, respectively:
- enzyme active site(ACT_SITE) for ACT_SITE
- metal binding site(METAL) for METAL
- calcium binding site(CA_BIND) for CA_BIND
- binding site(BINDING) for BINDING
- other interesting site(SITE) for SITE
- DNA-binding region(DNA_BIND) for DNA_BIND
- Nucleotide phosphate binding region(NP_BIND) for NP_BIND
- Zn Finger region(ZN_FING) for ZN_FING
- Transmembrane region(TRANSMEM) for TRANSMEM
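The correspondence listed above amounts to a lookup table from FT keys to displayed type names. The following Python sketch shows one way it could be applied to the FT lines of a SwissProt/UniProt flat file. It is an illustration only: it assumes the older whitespace-separated FT layout ("FT", key, start, end, description), skips any line whose positions are not plain integers, and uses hypothetical names throughout.

```python
# FT key -> type name shown on the Functional Details page (from the list above).
FT_TYPE_NAMES = {
    "ACT_SITE": "enzyme active site(ACT_SITE)",
    "METAL": "metal binding site(METAL)",
    "CA_BIND": "calcium binding site(CA_BIND)",
    "BINDING": "binding site(BINDING)",
    "SITE": "other interesting site(SITE)",
    "DNA_BIND": "DNA-binding region(DNA_BIND)",
    "NP_BIND": "Nucleotide phosphate binding region(NP_BIND)",
    "ZN_FING": "Zn Finger region(ZN_FING)",
    "TRANSMEM": "Transmembrane region(TRANSMEM)",
}


def site_annotations(flat_file_lines):
    """Return (type name, start, end) for every FT line with a listed key."""
    found = []
    for line in flat_file_lines:
        parts = line.split()
        if len(parts) < 4 or parts[0] != "FT":
            continue  # not a feature line, or a continuation line
        key, start, end = parts[1], parts[2], parts[3]
        # Assumption: uncertain positions such as "<1" or "?" are skipped.
        if key in FT_TYPE_NAMES and start.isdigit() and end.isdigit():
            found.append((FT_TYPE_NAMES[key], int(start), int(end)))
    return found
```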
6) Catalytic Information from CSA
The catalytic information of enzymes is collected and distributed as the CSA (Catalytic Site Atlas)
database, in which the catalytic sites of representative enzyme proteins are annotated by
C. T. Porter, G. J. Bartlett, and J. M. Thornton (http://www.ebi.ac.uk/thornton-srv/databases/CSA/).
The CSA information is annotated to the individual protein with the site_id "CSA#" and the type
name "catalytic site", where # is the ID number. The chain names and residue numbers are those
defined by the authors, as they appear in the original PDB flat files.
7) Catalytic Information from CATRES
The catalytic information of enzymes is also collected and distributed as the CATRES (Catalytic
Residue Dataset) database, in which the catalytic sites of representative enzyme proteins are
manually annotated by G. J. Bartlett, C. T. Porter, N. Borkakoti, and J. M. Thornton
(http://www.ebi.ac.uk/thornton-srv/databases/CATRES/). The CATRES information is annotated to the
individual protein with the site_id "CATRES#" and the type name "catalytic site", where # is the ID
number. In addition, proteins homologous to the representative enzymes in the original CATRES
database are automatically extracted from all the PDB entries, and the corresponding chain and
residue IDs are identified by sequence alignment and displayed. The extended CATRES information is
annotated to the individual protein with the site_id "extCATRES#" and the type name "catalytic
site", where # is the ID number. The chain names and residue numbers are those defined by the
authors, as they appear in the original PDB flat files. The procedure to extract the catalytic
residues from homologous enzymes is as follows:
1. Extract the PDB sequence from the original CATRES file. (Entries whose catalytic residues span
more than one chain are currently skipped.)
2. Align the query sequence to all the PDB sequences using BLAST.
3. If all catalytic residues are contained within the BLAST alignment and all catalytic residues
are conserved (100% identity), the function of the new (template) sequence is likely to be the same
as that in the CATRES file.
4. The final determination is made based on the structural similarity of the active-site residues.
When the distance RMSD value is less than 3 Angstrom, the active site is considered to have the
same catalytic function, and the extCATRES file, written in XML, is stored. The distance RMSD of
the active-site atoms is computed as follows:
For each pair of residues (i) in the query, find the pair of atoms with the smallest distance
(dq_i). Compute the same distance in the template (dt_i). Compute the RMSD as
sqrt( sum_i( (dq_i - dt_i)^2 ) / Npair ), where Npair is the number of residue pairs.
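The distance RMSD described in step 4 can be sketched in Python as below. This is one reading of the description, not the original program: it assumes that "the same distance" means the distance between the same two named atoms in the corresponding template residues, that query and template residues are supplied in corresponding order, and that each residue is given as a dictionary from atom name to coordinates. The function names and the data layout are hypothetical.

```python
import math
from itertools import combinations

RMSD_THRESHOLD = 3.0  # Angstrom, as stated in step 4


def closest_atom_pair(res_a, res_b):
    """Return (distance, atom name in res_a, atom name in res_b) for the
    closest pair of atoms between two residues."""
    return min((math.dist(xa, xb), na, nb)
               for na, xa in res_a.items()
               for nb, xb in res_b.items())


def distance_rmsd(query, template):
    """query, template: equal-length lists of catalytic residues in
    corresponding order; each residue maps atom name -> (x, y, z)."""
    sq_sum = 0.0
    n_pair = 0
    for i, j in combinations(range(len(query)), 2):
        dq, name_i, name_j = closest_atom_pair(query[i], query[j])
        # Assumption: measure the same two atoms in the template residues.
        dt = math.dist(template[i][name_i], template[j][name_j])
        sq_sum += (dq - dt) ** 2
        n_pair += 1
    if n_pair == 0:
        raise ValueError("at least two residues are needed")
    return math.sqrt(sq_sum / n_pair)


def same_catalytic_function(query, template, threshold=RMSD_THRESHOLD):
    """Apply the threshold from step 4 to the distance RMSD."""
    return distance_rmsd(query, template) < threshold
```

Because the measure compares internal distances only, no superposition of the two structures is needed. A template residue lacking one of the selected atoms raises a KeyError in this sketch.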
Questions and comments about the Miner should be sent by e-mail to pdbj-master.