Loading
PDBj
MenuPDBj@FacebookPDBj@TwitterPDBj@YouTubewwPDB FoundationwwPDB
RCSB PDBPDBeBMRBAdv. SearchSearch help

PDBj Mine 2: SQL Search

本頁面還有其他語言版本: English日本語

This page provides some examples of SQL queries that can be executed from the PDBj Mine interface at https://pdbj.org/rdb/search.
You can use these samples at SQL Search of PDBj Mine or PDBj Mine:REST API. Documentation describing the SQL schema can be found on the Mine 2 RDB Documentation page.
The result are returned in web pages or a test file such as XML, CSV, and TSV, respectively. (See also PDBj Mine: search result.)

Search for SIFTS and PDB

Using Chemical Component Dictionary (CCD)

Using Chemical Component Model (CC Model)

Using Biologically Interesting Molecule Reference Dictionary (BIRD)


[Search for all PDB IDs:]

SELECT pdbid FROM brief_summary

[Search for all count (PDB IDs):]

SELECT count(pdbid) FROM brief_summary

[Search by keyword:]

SELECT pdbid FROM keyword_search('ubiquitin')

[Search for all PDB IDs (1g*):]

SELECT pdbid FROM brief_summary where pdbid like '1g%'

[Search for citation_author category (1gof):]

SELECT * FROM citation_author where pdbid='1gof'

[Search for citation_author ('Ito, N.'):]

SELECT pdbid
FROM brief_summary b 
WHERE '{"Ito, N."}' <@ citation_author

[Search for citation_author ('Ito, N.' and 'Phillips, S.E.V.'):]

SELECT pdbid
FROM brief_summary b 
WHERE '{"Ito, N.","Phillips, S.E.V."}' <@ citation_author

[Search for citation_author ['Ito, N.'] (primary):]

SELECT pdbid
FROM brief_summary b 
WHERE '{"Ito, N."}' <@ citation_author_pri

[Search for journal year ['2001']:]

SELECT pdbid
FROM brief_summary b 
WHERE '{2001}' && citation_year

[Search for journal volume ['555']:]

SELECT pdbid
FROM brief_summary b 
WHERE '{"555"}' && citation_volume

[Search for exprimental method [SOLUTION NMR]:]

SELECT pdbid
FROM brief_summary b
WHERE 6 = ANY(exptl_method_ids)

Note on experimental methods

In the brief_summary table, experimental method (exptl_method column) is defined as an enumeration of keywords defined as follows:

  1. X-RAY DIFFRACTION
  2. NEUTRON DIFFRACTION
  3. FIBER DIFFRACTION
  4. ELECTRON CRYSTALLOGRAPHY
  5. ELECTRON MICROSCOPY
  6. SOLUTION NMR
  7. SOLID-STATE NMR
  8. SOLUTION SCATTERING
  9. POWDER DIFFRACTION
  10. INFRARED SPECTROSCOPY
  11. EPR
  12. THEORETICAL MODEL
  13. Others

[Search for journal year[1991] volume[266] exp_method [X-RAY DIFFRACTION]:]

SELECT pdbid
FROM brief_summary b 
WHERE '{1991}' && citation_year AND '{"266"}' && citation_volume 
AND '{1}' && exptl_method_ids

[Search for exptl_crystal/density_percent_sol (1iqq):]

select density_percent_sol from exptl_crystal where pdbid='1iqq'

[Search for citation/pdbx_database_id_DOI (1iqq):]

select "pdbx_database_id_DOI" from citation where pdbid = '1iqq'

[Search the PubMed ID of 1iqq (citation/pdbx_database_id_PubMed):]

select "pdbx_database_id_PubMed" from citation where pdbid = '1iqq'

[Result]

 pdbx_database_id_PubMed: 
11577107

[Search the PubMed ID of 1ao8 and 1ai9 (citation/pdbx_database_id_PubMed):]

select pdbid, "pdbx_database_id_PubMed" from citation where pdbid='1ai9' or pdbid='1ao8'

[Result]

 1ao8 
 pdbx_database_id_PubMed: 
9514736 1ai9 
 pdbx_database_id_PubMed: 
9374515

[Search for citation/pdbx_database_id_DOI (1i*):]

select pdbid, "pdbx_database_id_DOI" from citation where pdbid like '1i%'

[Search for all entries with release date [after 2000-01-01]:]

SELECT pdbid
FROM brief_summary 
WHERE release_date >= '2000-01-01'

[Search for all entries with release date [after 2003-01-01] and deposition date [after 2001-01-01] and PDB ID (2k*) :]

SELECT pdbid
FROM brief_summary b 
WHERE release_date >= '2003-01-01' 
AND deposition_date >= '2001-01-01' 
AND b.pdbid LIKE '2k%'

[Search for all entries with chain type [polypeptide(D)]:]

SELECT pdbid FROM brief_summary WHERE '{1}' <@ chain_type_ids

Note on chain types

The chain types of polymers in the brief_summary table are defined as an enumeration of specific keywords. The following chain types are currently available:

  1. polypeptide(D)
  2. polypeptide(L)
  3. polydeoxyribonucleotide
  4. polyribonucleotide
  5. polysaccharide(D)
  6. polysaccharide(L)
  7. polydeoxyribonucleotide/polyribonucleotide hybrid
  8. cyclic-pseudo-peptide
  9. other

[Search for all entries containing polypeptide chains with both type D-form and L-form:]

SELECT pdbid FROM brief_summary WHERE '{1,2}' <@ chain_type_ids

[Search for all entries containing polypeptide chains with either type D-form or L-form:]

SELECT pdbid FROM brief_summary WHERE ('{1,2}' && chain_type_ids)

[Search for all entries with chain type [no polydeoxyribonucleotide]:]

SELECT pdbid FROM brief_summary WHERE NOT ('{3}' && chain_type_ids)

[Search for all entries not containing both the chain types [polydeoxyribonucleotide and polyribonucleotide]:]

SELECT pdbid
FROM brief_summary 
WHERE NOT ('{3,4}' && chain_type_ids)

[Search for all entries with chain type [all no except polysaccharide(D)]:]

SELECT pdbid
FROM brief_summary 
WHERE ('{5}' <@ chain_type_ids) AND NOT ('{1,2,3,4,6,7,8,9}' && chain_type_ids)

[Search for exptl/method, unit axes, unit angles, exptl_crystal/density_percent_sol, exptl_crystal/density_percent_sol (1iqq, 1ioo):]

SELECT exptl.pdbid, exptl.method, cell.length_a, cell.length_b, cell.length_c, exptl_crystal.density_percent_sol 
from exptl
join cell on exptl.pdbid = cell.pdbid
join exptl_crystal on exptl.pdbid = exptl_crystal.pdbid where exptl.pdbid = '1iqq' or exptl.pdbid = '1ioo'

[Search for entity names with EC number 1.1.1.1:]

select pdbid, pdbx_description from entity where pdbx_ec='1.1.1.1'

[Search for all entries with resolution less than or equal to 2.0 angstroms:]

select pdbid, ls_d_res_high as "resolution" from refine where ls_d_res_high <= 2.0

[List PDB ID and total polymer molecular weights:]

select pdbid, sum(pdbx_number_of_molecules*formula_weight) as weight 
from entity
where type = 'polymer'
group by pdbid
order by weight desc

[Search for all entries with pdbx_descriptor, pdbx_keywords, unit axes, unit angles, number of polymer molecules, and polymer weight:]

select cell.pdbid, struct.pdbx_descriptor, struct_keywords.pdbx_keywords, 
cell.length_a, cell.length_b, cell.length_c,
cell.angle_alpha, cell.angle_beta, cell.angle_gamma,
sum(entity.pdbx_number_of_molecules) as number_of_molecules,
sum(entity.pdbx_number_of_molecules * entity.formula_weight) as weight
from cell
join struct on cell.pdbid = struct.pdbid
join struct_keywords on cell.pdbid = struct_keywords.pdbid
join entity on cell.pdbid = entity.pdbid
where entity.type = 'polymer'
group by cell.pdbid, struct.pdbx_descriptor, struct_keywords.pdbx_keywords, cell.length_a, cell.length_b, cell.length_c, cell.angle_alpha, cell.angle_beta, cell.angle_gamma

[Retrieve biological unit information]

select pdbx_struct_assembly.pdbid, pdbx_struct_assembly.details, pdbx_struct_assembly.oligomeric_count, pdbx_struct_assembly.oligomeric_details,
pdbx_struct_assembly_gen.assembly_id, pdbx_struct_assembly_gen.asym_id_list, pdbx_struct_assembly_gen.oper_expression
from pdbx_struct_assembly
join pdbx_struct_assembly_gen on pdbx_struct_assembly.pdbid = pdbx_struct_assembly_gen.pdbid and pdbx_struct_assembly.id = pdbx_struct_assembly_gen.assembly_id
where pdbx_struct_assembly.details is not null

[Retrieve biological unit information for PDB entry 1nov]

select pdbx_struct_assembly.pdbid, pdbx_struct_assembly.details, pdbx_struct_assembly.oligomeric_count, pdbx_struct_assembly.oligomeric_details,
pdbx_struct_assembly_gen.assembly_id, pdbx_struct_assembly_gen.asym_id_list, pdbx_struct_assembly_gen.oper_expression
from pdbx_struct_assembly
join pdbx_struct_assembly_gen on pdbx_struct_assembly.pdbid = pdbx_struct_assembly_gen.pdbid and pdbx_struct_assembly.id = pdbx_struct_assembly_gen.assembly_id
where pdbx_struct_assembly.pdbid='1nov'

[Result]

 1nov 
 details: 
complete icosahedral assembly oligomeric_count: 
360 oligomeric_details: 
360-MERIC assembly_id: 
1 asym_id_list: 
A,B,C,D,E,F oper_expression: 
(1-60) 1nov 
 details: 
icosahedral asymmetric unit assembly_id: 
2 asym_id_list: 
A,B,C,D,E,F oper_expression: 
1 1nov 
 details: 
icosahedral pentamer assembly_id: 
3 asym_id_list: 
A,B,C,D,E,F oper_expression: 
(1-5) 1nov 
 details: 
icosahedral 23 hexamer assembly_id: 
4 asym_id_list: 
A,B,C,D,E,F oper_expression: 
(1,2,6,10,23,24) 1nov 
 details: 
icosahedral asymmetric unit, std point frame assembly_id: 
PAU asym_id_list: 
A,B,C,D,E,F oper_expression: 
P

[Counting the total number of residues in entries]

with slen(pdbid, entity_id, len) as
(select pdbid, entity_id, count(*) 
 from entity_poly_seq 
 group by pdbid, entity_id)
select entity.pdbid, sum(entity.pdbx_number_of_molecules*s.len)
from entity
join slen s on s.pdbid = entity.pdbid and s.entity_id = entity.id
group by entity.pdbid

[Map between PDB entity and UniProt accession]


Search for SIFTS and PDB

[Assign UniProt accession(s) to each polymer entity]

Run SQL query
SELECT e.pdbid, e.entity_id, s.chain, s.sp_primary
FROM entity_poly e
JOIN sifts.pdb_chain_uniprot s
  ON s.pdbid = e.pdbid AND s.chain = ANY(STRING_TO_ARRAY(e.pdbx_strand_id,','))

[Find polymer entities with GO:0006220 (pyrimidine nucleotide metabolic process)]

Run SQL query
SELECT DISTINCT e.pdbid, e.entity_id, s.sp_primary
FROM entity_poly e
JOIN sifts.pdb_chain_go s
  ON s.pdbid = e.pdbid AND s.chain = ANY(STRING_TO_ARRAY(e.pdbx_strand_id,','))
WHERE s.go_id = 'GO:0006220'

[Same as above, but limit to human proteins]

Run SQL query
SELECT DISTINCT e.pdbid, e.entity_id, s.sp_primary
FROM entity_poly e
JOIN sifts.pdb_chain_go s
  ON s.pdbid = e.pdbid AND s.chain = ANY(STRING_TO_ARRAY(e.pdbx_strand_id,','))
JOIN sifts.pdb_chain_taxonomy t
  ON t.pdbid = e.pdbid AND t.chain = s.chain
WHERE s.go_id = 'GO:0006220' AND t.scientific_name = 'Homo sapiens'

[Finding polymer entities assigned to multiple species (i.e., chimeric proteins)]

** Caution ** This query may take a long time to finish.
SELECT e.pdbid,e.entity_id, COUNT(DISTINCT t.tax_id) AS n, ARRAY_AGG(DISTINCT t.tax_id) AS tax_ids
FROM entity_poly e
JOIN sifts.pdb_chain_taxonomy t
  ON t.pdbid = e.pdbid AND t.chain = ANY(STRING_TO_ARRAY(e.pdbx_strand_id,','))
GROUP BY e.pdbid,e.entity_id
HAVING COUNT(DISTINCT t.tax_id) > 1
ORDER BY n DESC

[Retrieve PDB chain sequences matching to the Pfam accession "PF00046" (Homeobox) and having resolution better than 2.0 Angstrom and sequence length greater than or equal to 58 (residues)]

Run SQL query
SELECT s.*, r.ls_d_res_high as reso,
     LENGTH(p.pdbx_seq_one_letter_code_can) as len,
     ('>' || s.pdbid || s.chain) as header,
     p.pdbx_seq_one_letter_code_can as aaseq
FROM sifts.pdb_chain_pfam s
JOIN refine r on r.pdbid = s.pdbid
JOIN entity_poly p on p.pdbid = s.pdbid
     AND s.chain = ANY(regexp_split_to_array(p.pdbx_strand_id,','))
WHERE pfam_id = 'PF00046'
AND r.ls_d_res_high < 2.0
AND LENGTH(p.pdbx_seq_one_letter_code_can) >= 58
ORDER BY reso, len,s.chain

Using Chemical Component Dictionary (CCD)

[Find PDB entries containing monomers with the given InChIKey.]

Run SQL query
SELECT p.pdbid, p.id
FROM pdbj.chem_comp p
JOIN cc.pdbx_chem_comp_descriptor cc ON cc.comp_id = p.id
WHERE cc.type = 'InChIKey'
AND   cc.descriptor = 'ZKHQWZAMYRWXGA-KQYNXXCUSA-N'

Using Chemical Component Model (CC Model)

[Find PDB entries containing a compound corresponding to "YARXEW " in the Cambridge Structure Database.]

Run SQL query
SELECT p.pdbid, p.id, p.name, r.db_code
FROM pdbj.chem_comp p
JOIN ccmodel.pdbx_chem_comp_model m ON m.comp_id = p.id
JOIN ccmodel.pdbx_chem_comp_model_reference r ON r.model_id = m.model_id
WHERE r.db_name = 'CSD' AND r.db_code = 'YARXEW'

Using Biologically Interesting Molecule Reference Dictionary (BIRD)

[Find PDB entries containing antibiotics of molecular weight less than 1000 Da.]

Run SQL query
SELECT mf.pdbid, rm.name
FROM pdbj.pdbx_molecule_features mf
JOIN prd.pdbx_reference_molecule rm ON rm.prd_id = mf.prd_id
WHERE rm.class = 'Antibiotic'
AND rm.formula_weight < 1000.0

Here, the following subquery is used to extract the SIFTS data:
select distinct pdbid from sifts.pdb_chain_go where go_id='GO:0097180'


Created: 2019-10-04 (last edited: more than 1 year ago)2019-10-04

218500

數據於2024-04-17公開中

PDB statisticsPDBj update infoContact PDBjnumon