National Natural Science Foundation of China (NSFC)
22177073
China
Citation
Journal: Nat Commun / Year: 2025
Title: Discovery of CRISPR-Cas12a clades using a large language model.
Authors: Yuanyuan Feng / Junchao Shi / Zhanwei Li / Yongqian Li / Jiaxi Yang / Shisheng Huang / Jinfang Zheng / Wei Han / Yunbo Qiao / Jun Zhang / Qi Liu / Yao Yang / Chunyi Hu / Lina Wu / Xiaokang Zhang / Jin Tang / Xingxu Huang / Peixiang Ma
Abstract: CRISPR-Cas systems revolutionize life science. Metagenomes contain millions of unknown Cas proteins. Traditional mining relies on protein sequence alignments. In this work, we employ an evolutionary scale language model (ESM) to learn the information beyond sequences. Trained with CRISPR-Cas data, ESM accurately identifies Cas proteins without alignment. Limited experimental data restricts feature prediction, but integrating with machine learning enables trans-cleavage activity prediction of uncharacterized Cas12a. We discover 7 undocumented Cas12a subtypes with unique CRISPR loci. Structural analyses reveal 8 subtypes of Cas1, Cas2, and Cas4. Cas12a subtypes display distinct 3D-folds. CryoEM analyses unveil unique RNA interactions with the uncharacterized Cas12a. These proteins show distinct double-strand and single-strand DNA cleavage preferences and broad PAM recognition. Finally, we establish a specific detection strategy for the oncogene SNP without traditional Cas12a PAM. This study highlights the potential of language models in exploring undocumented Cas protein function via gene cluster classification.
History
Deposition
Aug 18, 2023
Deposition site: PDBJ / Processing site: PDBJ
Revision 1.0
Sep 4, 2024
Provider: repository / Type: Initial release
Revision 1.0
Sep 4, 2024
Data content type: EM metadata / Provider: repository / Type: Initial release
Revision 1.0
Sep 4, 2024
Data content type: FSC / Provider: repository / Type: Initial release
Revision 1.0
Sep 4, 2024
Data content type: Half map / Part number: 1 / Provider: repository / Type: Initial release
Revision 1.0
Sep 4, 2024
Data content type: Half map / Part number: 2 / Provider: repository / Type: Initial release
Revision 1.0
Sep 4, 2024
Data content type: Image / Provider: repository / Type: Initial release
Revision 1.0
Sep 4, 2024
Data content type: Mask / Provider: repository / Type: Initial release
Revision 1.0
Sep 4, 2024
Data content type: Primary map / Provider: repository / Type: Initial release
Data content type: EM metadata / Group: Data processing / Experimental summary / Category: em_admin / em_software / Item: _em_admin.last_update / _em_software.name