SIFTS

What is SIFTS?

SIFTS^*1 (Structure integration with function, taxonomy and sequence) is an up-to-date resource for residue-level mapping between UniProt and PDB entries. The resource also provides residue-level annotation from the IntEnz, GO, Pfam, InterPro, SCOP, CATH and PubMed resources. The information is updated and released every week at the same time as the release of new PDB entries and is widely used by resources such as RCSB, PDBsum, Pfam, SCOP, InterPro, and DAS server providers.

Search PDB entries

By combining PDBj Mine2 and SIFTS, users can retrieve, with SQL, various annotations of protein sequences in the PDB, such as Gene Ontology terms, taxonomy (biological species), structural classification (SCOP and CATH), enzyme (EC) numbers, and correspondence with UniProt sequences.

Example SQL queries

Please refer to the Mine2 help page for SQL examples that use SIFTS data.
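
As a quick illustration of the kind of query this enables, the sketch below lists the PDB chains mapped to one UniProt entry and then the GO terms attached to those chains. It assumes a PostgreSQL-compatible SQL dialect; the accession P00698 (hen egg-white lysozyme) is only an illustrative choice, and the table and column names are those given in the table definitions on this page.

```sql
-- PDB chains aligned to one UniProt entry, with the aligned UniProt range.
-- 'P00698' is an illustrative accession, not one taken from this page.
SELECT pdbid, chain, sp_beg, sp_end
FROM sifts.pdb_chain_uniprot
WHERE sp_primary = 'P00698'
ORDER BY pdbid, chain;

-- GO terms attached to those chains. DISTINCT is used because a chain
-- may appear in pdb_chain_uniprot once per aligned segment.
SELECT DISTINCT u.pdbid, u.chain, g.go_id, g.evidence
FROM sifts.pdb_chain_uniprot AS u
JOIN sifts.pdb_chain_go AS g
  ON g.pdbid = u.pdbid
 AND g.chain = u.chain
WHERE u.sp_primary = 'P00698'
ORDER BY u.pdbid, u.chain, g.go_id;
```

If the database is PostgreSQL, unquoted identifiers are case-insensitive, so SP_PRIMARY and sp_primary refer to the same column.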

SIFTS data

To integrate PDBe's SIFTS data into a PDBj Mine2 database, please refer to the FTP site:
ftp://ftp.pdbj.org/mine2/sifts/

Table definitions for the SIFTS data

Currently, the SIFTS data available in TSV (tab-separated values) format are stored in RDB tables. The table definitions are shown below. As you can see, the table names reflect the names of the files provided at SIFTS (Quick Access). Note that these tables are in the "sifts" schema, so table names must be prefixed with "sifts." (e.g., sifts.pdb_chain_uniprot).


CREATE TABLE pdb_chain_uniprot (
  pdbid CHARACTER(4),  -- PDB ID.
  chain TEXT,          -- Chain ID (auth_asym_id).
  SP_PRIMARY TEXT,     -- UniProt accession ID.
  RES_BEG INTEGER,     -- Beginning of the alignment in wwPDB's canonical sequence numbering scheme (pdbx_poly_seq_scheme.seq_id).
  RES_END INTEGER,     -- End of the alignment (see above).
  PDB_BEG TEXT,        -- Beginning of the alignment in author's sequence numbering scheme (pdbx_poly_seq_scheme.auth_seq_num).
  PDB_END TEXT,        -- End of the alignment (see above).
  SP_BEG INTEGER,      -- Beginning of the alignment in the UniProt sequence.
  SP_END INTEGER       -- End of the alignment in the UniProt sequence.
  );

CREATE TABLE pdb_chain_taxonomy(
  pdbid CHARACTER(4),
  CHAIN TEXT,
  TAX_ID TEXT,  -- NCBI taxonomy code.
  SCIENTIFIC_NAME TEXT -- NOT necessarily a scientific name (may include common names, etc.).
  );

CREATE TABLE pdb_pubmed(
 pdbid CHARACTER(4),
 ordinal TEXT,
 pubmed_id TEXT
 );

CREATE TABLE pdb_chain_enzyme (
 pdbid CHARACTER(4),
 chain TEXT,
 accession TEXT,  -- UniProt accession.
 EC_number TEXT   -- EC number.
 );

CREATE TABLE pdb_chain_go (
 pdbid CHARACTER(4),
 chain TEXT,
 SP_primary TEXT,
 WITH_STRING TEXT,
 EVIDENCE TEXT, -- Evidence code.
 GO_ID TEXT     -- GO (Gene Ontology) ID.
 );

CREATE TABLE pdb_chain_interpro(
  pdbid CHARACTER(4),
  CHAIN TEXT,
  interpro_ID TEXT -- InterPro ID.
  );

CREATE TABLE pdb_chain_pfam (
  pdbid CHARACTER(4),
  chain TEXT,
  SP_PRIMARY TEXT,
  PFAM_ID TEXT -- Pfam ID.
  );

CREATE TABLE pdb_chain_cath_uniprot(
  pdbid CHARACTER(4),
  CHAIN TEXT,
  SP_PRIMARY TEXT, -- UniProt accession
  CATH_ID TEXT     -- CATH ID (see http://www.cathdb.info/).
  );

CREATE TABLE pdb_chain_scop_uniprot(
  pdbid CHARACTER(4),
  CHAIN TEXT,
  SP_PRIMARY TEXT,  -- UniProt accession
  SUNID TEXT,       -- SCOP's SUN ID (see http://scop.berkeley.edu/).
  SCOP_ID TEXT      -- SCOP ID.
  );

CREATE TABLE uniprot_pdb (
  SP_PRIMARY TEXT,
  PDBIDS TEXT,  -- A list of PDB IDs (as a single text value).
  PDBIDS_arr TEXT[] -- The same as above but in an array of text (for convenience).
  );
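
The definitions above might be queried along the following lines. This is a sketch only: it assumes a PostgreSQL-compatible dialect (the TEXT[] column and the array functions array_length, unnest and ANY depend on it), and the accession P00698, the Pfam ID PF00062 and the PDB ID 1lyz are illustrative values rather than ones taken from this page.

```sql
-- Chains annotated with a given Pfam family (illustrative Pfam ID).
SELECT DISTINCT pdbid, chain, sp_primary
FROM sifts.pdb_chain_pfam
WHERE pfam_id = 'PF00062';

-- UniProt entries with the most PDB entries, using the array column.
SELECT sp_primary, array_length(pdbids_arr, 1) AS n_entries
FROM sifts.uniprot_pdb
ORDER BY n_entries DESC NULLS LAST
LIMIT 10;

-- Expand the array into one row per PDB ID for a single accession.
SELECT sp_primary, unnest(pdbids_arr) AS pdbid
FROM sifts.uniprot_pdb
WHERE sp_primary = 'P00698';

-- Reverse lookup: UniProt entries whose array contains a given PDB ID.
-- The literal must match the stored letter case (lowercase is assumed here).
SELECT sp_primary
FROM sifts.uniprot_pdb
WHERE '1lyz' = ANY (pdbids_arr);
```

Note that PDB_BEG and PDB_END in pdb_chain_uniprot are TEXT rather than INTEGER, so range comparisons on residue numbers are safer on RES_BEG/RES_END or SP_BEG/SP_END.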

*1) Velankar et al., Nucleic Acids Research 41, D483 (2013)

Created: 2016-04-12 (last edited: more than 1 year ago)