PDBj Mine 2: SQL Search
This page provides some examples of SQL queries that can be executed from the PDBj Mine interface at https://pdbj.org/rdb/search.
You can use these samples at SQL Search of PDBj Mine or PDBj Mine:REST API.
Documentation describing the SQL schema can be found on the Mine 2 RDB Documentation page.
The result are returned in web pages or a test file such as XML, CSV, and TSV, respectively.
(See also PDBj Mine: search result.)
- Search for all PDB IDs:
- Search for all count (PDB IDs):
- Search by keyword:
- Search for all PDB IDs (1g*):
- Search for citation_author category (1gof):
- Search for citation_author ('Ito, N.'):
- Search for citation_author ('Ito, N.' and 'Phillips, S.E.V.'):
- Search for citation_author ['Ito, N.'] (primary):
- Search for journal name ['molecular cell']:
- Search for journal year ['2001']:
- Search for journal volume ['555']:
- Search for exprimental method [SOLUTION NMR]:
- Search for journal year[1991] volume[266] exp_method [X-RAY DIFFRACTION]:
- Search for exptl_crystal/density_percent_sol (1iqq):
- Search for citation/pdbx_database_id_DOI (1iqq):
- Search the PubMed ID of 1iqq (citation/pdbx_database_id_PubMed):
- Search the PubMed ID of 1ao8 and 1ai9 (citation/pdbx_database_id_PubMed):
- Search for citation/pdbx_database_id_DOI (1i*):
- Search for all entries with release date [after 2000-01-01]:
- Search for all entries with release date [after 2003-01-01] and deposition date [after 2001-01-01] and PDB ID (2k*) :
- Search for all entries with chain type [polypeptide(D)]:
- Search for all entries containing polypeptide chains with both type D-form and L-form:
- Search for all entries containing polypeptide chains with either type D-form or L-form:
- Search for all entries with chain type [no polydeoxyribonucleotide]:
- Search for all entries not containing both the chain types [polydeoxyribonucleotide and polyribonucleotide]:
- Search for all entries with chain type [all no except polysaccharide(D)]:
- Search for exptl/method, unit axes, unit angles, exptl_crystal/density_percent_sol, exptl_crystal/density_percent_sol (1iqq, 1ioo):
- Search for entity names with EC number 1.1.1.1:
- Search for all entries with resolution less than or equal to 2.0 angstroms:
- List PDB ID and total polymer molecular weights:
- Search for all entries with pdbx_descriptor, pdbx_keywords, unit axes, unit angles, number of polymer molecules, and polymer weight:
- Retrieve biological unit information
- Retrieve biological unit information for PDB entry 1nov
- Counting the total number of residues in entries
- Map between PDB entity and UniProt accession
Search for SIFTS and PDB
- Assign UniProt accession(s) to each polymer entity
- Find polymer entities with GO:0006220 (pyrimidine nucleotide metabolic process)
- Same as above, but limit to human proteins.
- Finding polymer entities assigned to multiple species (i.e., chimeric proteins)
- Retrieve PDB chain sequences matching to the Pfam accession "PF00046" (Homeobox) and having resolution better than 2.0 Angstrom and sequence length greater than or equal to 58 (residues)
Using Chemical Component Dictionary (CCD)
Using Chemical Component Model (CC Model)
Using Biologically Interesting Molecule Reference Dictionary (BIRD)
[Search for all PDB IDs:]
SELECT pdbid FROM brief_summary
[Search for all count (PDB IDs):]
SELECT count(pdbid) FROM brief_summary
[Search by keyword:]
SELECT pdbid FROM keyword_search('ubiquitin')
[Search for all PDB IDs (1g*):]
SELECT pdbid FROM brief_summary where pdbid like '1g%'
[Search for citation_author category (1gof):]
SELECT * FROM citation_author where pdbid='1gof'
[Search for citation_author ('Ito, N.'):]
SELECT pdbid FROM brief_summary b WHERE '{"Ito, N."}' <@ citation_author
[Search for citation_author ('Ito, N.' and 'Phillips, S.E.V.'):]
SELECT pdbid FROM brief_summary b WHERE '{"Ito, N.","Phillips, S.E.V."}' <@ citation_author
[Search for citation_author ['Ito, N.'] (primary):]
SELECT pdbid FROM brief_summary b WHERE '{"Ito, N."}' <@ citation_author_pri
[Search for journal year ['2001']:]
SELECT pdbid FROM brief_summary b WHERE '{2001}' && citation_year
[Search for journal volume ['555']:]
SELECT pdbid FROM brief_summary b WHERE '{"555"}' && citation_volume
[Search for exprimental method [SOLUTION NMR]:]
SELECT pdbid FROM brief_summary b WHERE 6 = ANY(exptl_method_ids)
Note on experimental methods
In the brief_summary table, experimental method (exptl_method column) is defined as an enumeration of keywords defined as follows:
- X-RAY DIFFRACTION
- NEUTRON DIFFRACTION
- FIBER DIFFRACTION
- ELECTRON CRYSTALLOGRAPHY
- ELECTRON MICROSCOPY
- SOLUTION NMR
- SOLID-STATE NMR
- SOLUTION SCATTERING
- POWDER DIFFRACTION
- INFRARED SPECTROSCOPY
- EPR
- THEORETICAL MODEL
- Others
[Search for journal year[1991] volume[266] exp_method [X-RAY DIFFRACTION]:]
SELECT pdbid FROM brief_summary b WHERE '{1991}' && citation_year AND '{"266"}' && citation_volume AND '{1}' && exptl_method_ids
[Search for exptl_crystal/density_percent_sol (1iqq):]
select density_percent_sol from exptl_crystal where pdbid='1iqq'
[Search for citation/pdbx_database_id_DOI (1iqq):]
select "pdbx_database_id_DOI" from citation where pdbid = '1iqq'
[Search the PubMed ID of 1iqq (citation/pdbx_database_id_PubMed):]
select "pdbx_database_id_PubMed" from citation where pdbid = '1iqq'
[Result]
pdbx_database_id_PubMed: 11577107
[Search the PubMed ID of 1ao8 and 1ai9 (citation/pdbx_database_id_PubMed):]
select pdbid, "pdbx_database_id_PubMed" from citation where pdbid='1ai9' or pdbid='1ao8'
[Result]
1ao8 pdbx_database_id_PubMed: 9514736 1ai9 pdbx_database_id_PubMed: 9374515
[Search for citation/pdbx_database_id_DOI (1i*):]
select pdbid, "pdbx_database_id_DOI" from citation where pdbid like '1i%'
[Search for all entries with release date [after 2000-01-01]:]
SELECT pdbid FROM brief_summary WHERE release_date >= '2000-01-01'
[Search for all entries with release date [after 2003-01-01] and deposition date [after 2001-01-01] and PDB ID (2k*) :]
SELECT pdbid FROM brief_summary b WHERE release_date >= '2003-01-01' AND deposition_date >= '2001-01-01' AND b.pdbid LIKE '2k%'
[Search for all entries with chain type [polypeptide(D)]:]
SELECT pdbid FROM brief_summary WHERE '{1}' <@ chain_type_ids
Note on chain types
The chain types of polymers in the brief_summary table are defined as an enumeration of specific keywords. The following chain types are currently available:
- polypeptide(D)
- polypeptide(L)
- polydeoxyribonucleotide
- polyribonucleotide
- polysaccharide(D)
- polysaccharide(L)
- polydeoxyribonucleotide/polyribonucleotide hybrid
- cyclic-pseudo-peptide
- other
[Search for all entries containing polypeptide chains with both type D-form and L-form:]
SELECT pdbid FROM brief_summary WHERE '{1,2}' <@ chain_type_ids
[Search for all entries containing polypeptide chains with either type D-form or L-form:]
SELECT pdbid FROM brief_summary WHERE ('{1,2}' && chain_type_ids)
[Search for all entries with chain type [no polydeoxyribonucleotide]:]
SELECT pdbid FROM brief_summary WHERE NOT ('{3}' && chain_type_ids)
[Search for all entries not containing both the chain types [polydeoxyribonucleotide and polyribonucleotide]:]
SELECT pdbid FROM brief_summary WHERE NOT ('{3,4}' && chain_type_ids)
[Search for all entries with chain type [all no except polysaccharide(D)]:]
SELECT pdbid FROM brief_summary WHERE ('{5}' <@ chain_type_ids) AND NOT ('{1,2,3,4,6,7,8,9}' && chain_type_ids)
[Search for exptl/method, unit axes, unit angles, exptl_crystal/density_percent_sol, exptl_crystal/density_percent_sol (1iqq, 1ioo):]
SELECT exptl.pdbid, exptl.method, cell.length_a, cell.length_b, cell.length_c, exptl_crystal.density_percent_sol from exptl join cell on exptl.pdbid = cell.pdbid join exptl_crystal on exptl.pdbid = exptl_crystal.pdbid where exptl.pdbid = '1iqq' or exptl.pdbid = '1ioo'
[Search for entity names with EC number 1.1.1.1:]
select pdbid, pdbx_description from entity where pdbx_ec='1.1.1.1'
[Search for all entries with resolution less than or equal to 2.0 angstroms:]
select pdbid, ls_d_res_high as "resolution" from refine where ls_d_res_high <= 2.0
[List PDB ID and total polymer molecular weights:]
select pdbid, sum(pdbx_number_of_molecules*formula_weight) as weight from entity where type = 'polymer' group by pdbid order by weight desc
[Search for all entries with pdbx_descriptor, pdbx_keywords, unit axes, unit angles, number of polymer molecules, and polymer weight:]
select cell.pdbid, struct.pdbx_descriptor, struct_keywords.pdbx_keywords, cell.length_a, cell.length_b, cell.length_c, cell.angle_alpha, cell.angle_beta, cell.angle_gamma, sum(entity.pdbx_number_of_molecules) as number_of_molecules, sum(entity.pdbx_number_of_molecules * entity.formula_weight) as weight from cell join struct on cell.pdbid = struct.pdbid join struct_keywords on cell.pdbid = struct_keywords.pdbid join entity on cell.pdbid = entity.pdbid where entity.type = 'polymer' group by cell.pdbid, struct.pdbx_descriptor, struct_keywords.pdbx_keywords, cell.length_a, cell.length_b, cell.length_c, cell.angle_alpha, cell.angle_beta, cell.angle_gamma
[Retrieve biological unit information]
select pdbx_struct_assembly.pdbid, pdbx_struct_assembly.details, pdbx_struct_assembly.oligomeric_count, pdbx_struct_assembly.oligomeric_details, pdbx_struct_assembly_gen.assembly_id, pdbx_struct_assembly_gen.asym_id_list, pdbx_struct_assembly_gen.oper_expression from pdbx_struct_assembly join pdbx_struct_assembly_gen on pdbx_struct_assembly.pdbid = pdbx_struct_assembly_gen.pdbid and pdbx_struct_assembly.id = pdbx_struct_assembly_gen.assembly_id where pdbx_struct_assembly.details is not null
[Retrieve biological unit information for PDB entry 1nov]
select pdbx_struct_assembly.pdbid, pdbx_struct_assembly.details, pdbx_struct_assembly.oligomeric_count, pdbx_struct_assembly.oligomeric_details, pdbx_struct_assembly_gen.assembly_id, pdbx_struct_assembly_gen.asym_id_list, pdbx_struct_assembly_gen.oper_expression from pdbx_struct_assembly join pdbx_struct_assembly_gen on pdbx_struct_assembly.pdbid = pdbx_struct_assembly_gen.pdbid and pdbx_struct_assembly.id = pdbx_struct_assembly_gen.assembly_id where pdbx_struct_assembly.pdbid='1nov'
[Result]
1nov details: complete icosahedral assembly oligomeric_count: 360 oligomeric_details: 360-MERIC assembly_id: 1 asym_id_list: A,B,C,D,E,F oper_expression: (1-60) 1nov details: icosahedral asymmetric unit assembly_id: 2 asym_id_list: A,B,C,D,E,F oper_expression: 1 1nov details: icosahedral pentamer assembly_id: 3 asym_id_list: A,B,C,D,E,F oper_expression: (1-5) 1nov details: icosahedral 23 hexamer assembly_id: 4 asym_id_list: A,B,C,D,E,F oper_expression: (1,2,6,10,23,24) 1nov details: icosahedral asymmetric unit, std point frame assembly_id: PAU asym_id_list: A,B,C,D,E,F oper_expression: P
[Counting the total number of residues in entries]
with slen(pdbid, entity_id, len) as (select pdbid, entity_id, count(*) from entity_poly_seq group by pdbid, entity_id) select entity.pdbid, sum(entity.pdbx_number_of_molecules*s.len) from entity join slen s on s.pdbid = entity.pdbid and s.entity_id = entity.id group by entity.pdbid
[Map between PDB entity and UniProt accession]
Search for SIFTS and PDB
[Assign UniProt accession(s) to each polymer entity]
Run SQL querySELECT e.pdbid, e.entity_id, s.chain, s.sp_primary FROM entity_poly e JOIN sifts.pdb_chain_uniprot s ON s.pdbid = e.pdbid AND s.chain = ANY(STRING_TO_ARRAY(e.pdbx_strand_id,','))
[Find polymer entities with GO:0006220 (pyrimidine nucleotide metabolic process)]
Run SQL querySELECT DISTINCT e.pdbid, e.entity_id, s.sp_primary FROM entity_poly e JOIN sifts.pdb_chain_go s ON s.pdbid = e.pdbid AND s.chain = ANY(STRING_TO_ARRAY(e.pdbx_strand_id,',')) WHERE s.go_id = 'GO:0006220'
[Same as above, but limit to human proteins]
Run SQL querySELECT DISTINCT e.pdbid, e.entity_id, s.sp_primary FROM entity_poly e JOIN sifts.pdb_chain_go s ON s.pdbid = e.pdbid AND s.chain = ANY(STRING_TO_ARRAY(e.pdbx_strand_id,',')) JOIN sifts.pdb_chain_taxonomy t ON t.pdbid = e.pdbid AND t.chain = s.chain WHERE s.go_id = 'GO:0006220' AND t.scientific_name = 'Homo sapiens'
[Finding polymer entities assigned to multiple species (i.e., chimeric proteins)]
** Caution ** This query may take a long time to finish.SELECT e.pdbid,e.entity_id, COUNT(DISTINCT t.tax_id) AS n, ARRAY_AGG(DISTINCT t.tax_id) AS tax_ids FROM entity_poly e JOIN sifts.pdb_chain_taxonomy t ON t.pdbid = e.pdbid AND t.chain = ANY(STRING_TO_ARRAY(e.pdbx_strand_id,',')) GROUP BY e.pdbid,e.entity_id HAVING COUNT(DISTINCT t.tax_id) > 1 ORDER BY n DESC
[Retrieve PDB chain sequences matching to the Pfam accession "PF00046" (Homeobox) and having resolution better than 2.0 Angstrom and sequence length greater than or equal to 58 (residues)]
Run SQL querySELECT s.*, r.ls_d_res_high as reso, LENGTH(p.pdbx_seq_one_letter_code_can) as len, ('>' || s.pdbid || s.chain) as header, p.pdbx_seq_one_letter_code_can as aaseq FROM sifts.pdb_chain_pfam s JOIN refine r on r.pdbid = s.pdbid JOIN entity_poly p on p.pdbid = s.pdbid AND s.chain = ANY(regexp_split_to_array(p.pdbx_strand_id,',')) WHERE pfam_id = 'PF00046' AND r.ls_d_res_high < 2.0 AND LENGTH(p.pdbx_seq_one_letter_code_can) >= 58 ORDER BY reso, len,s.chain
Using Chemical Component Dictionary (CCD)
[Find PDB entries containing monomers with the given InChIKey.]
Run SQL querySELECT p.pdbid, p.id FROM pdbj.chem_comp p JOIN cc.pdbx_chem_comp_descriptor cc ON cc.comp_id = p.id WHERE cc.type = 'InChIKey' AND cc.descriptor = 'ZKHQWZAMYRWXGA-KQYNXXCUSA-N'
Using Chemical Component Model (CC Model)
[Find PDB entries containing a compound corresponding to "YARXEW " in the Cambridge Structure Database.]
Run SQL querySELECT p.pdbid, p.id, p.name, r.db_code FROM pdbj.chem_comp p JOIN ccmodel.pdbx_chem_comp_model m ON m.comp_id = p.id JOIN ccmodel.pdbx_chem_comp_model_reference r ON r.model_id = m.model_id WHERE r.db_name = 'CSD' AND r.db_code = 'YARXEW'
Using Biologically Interesting Molecule Reference Dictionary (BIRD)
[Find PDB entries containing antibiotics of molecular weight less than 1000 Da.]
Run SQL querySELECT mf.pdbid, rm.name FROM pdbj.pdbx_molecule_features mf JOIN prd.pdbx_reference_molecule rm ON rm.prd_id = mf.prd_id WHERE rm.class = 'Antibiotic' AND rm.formula_weight < 1000.0
Here, the following subquery is used to extract the SIFTS data:
select distinct pdbid from sifts.pdb_chain_go where go_id='GO:0097180'