- What is CRNPRED?
- How to use CRNPRED server
- How does CRNPRED predict 1D structures?
- How accurate is CRNPRED?
- Locally installing CRNPRED
- REST interface to CRNPRED
[What is CRNPRED?]
CRNPRED is a web-based service that predicts one-dimensional (1D) protein structures including secondary structures, contact numbers, and residue-wise contact orders from amino acid sequence. Prediction results are returned by email.
[How to use CRNPRED server]
Paste a FASTA-formatted amino acid sequence, select one of the three available predictors, supply your email address, and click the submit button. Once the prediction has finished, the result is sent to you by email.
Currently, three prediction methods are available:
- Linear: A simple linear regression method based on the PSSM. It is fast, but less accurate than the other methods.
- CRN2000: A CRN-based method with 2000-dimensional state vectors. It is slower than the linear predictor, but more accurate. This is the default predictor.
- CRN5000: Same as CRN2000, but with 5000-dimensional state vectors. This is the slowest, but the most accurate predictor.
Note, however, that the most time-consuming part of the prediction is the generation of the PSSM by PSI-BLAST. Depending on the length of your protein sequence, the CPU time for the whole prediction process may therefore differ little between the predictors.
How should I interpret the results?
After you have submitted a protein sequence to the CRNPRED server, you will receive a result similar to the following:
References:
0. CRNPRED prediction server: https://pdbj.org//crnpred/
1. ......
;;BEGIN
jobid: F2.1259394568.774714
# Input sequence
>1GOFA:GALACTOSE OXIDASE
ASAPIGSAISRNNWAVTCDSAQSGNECNKAIDGNKDTFWHTFYGANGDPKPPHTYTIDMK
TTQNVNGLSMLPRQDGNQNGWIGRHEVYLSSDGTNWGSPVASGSWFADSTTKYSNFETRP
.....
# prediction by CRN2000
# * * * * * *
AA: ASAPIGSAISRNNWAVTCDSAQSGNECNKAIDGNKDTFWHTFYGANGDPKPPHTYTIDMK
SS: CCCCCCCCCCCCCEEEEECCCCCCCCCCEEECCCCCCCECCCCCCCCCCCCCEEEEEECC
CN: BBBBBBBBEBBBBEEEEEEEBEBBBEEEEEEEEEEEEEEEEBEBBBBBBEEEEEEEEEEE
# * * * * * *
AA: TTQNVNGLSMLPRQDGNQNGWIGRHEVYLSSDGTNWGSPVASGSWFADSTTKYSNFETRP
SS: CCEEEEEEEEECCCCCCCCCCCEEEEEEECCCCCCCCEEEECCCCCCCCCCEEEEECCCC
CN: BEEEEEEEEEEEEEEBBBBEEEEEEEEEEEEEBBEEEEEEEEEEEBBBBBEEEEEEEEEE
......
//
># AA : SS P_H P_E P_C : CN : RWCO
1 A : C 5 6 89 : B 13 : 2402
2 S : C 6 7 87 : B 10 : 2287
3 A : C 7 8 85 : B 12 : 2757
4 P : C 7 8 85 : B 17 : 3635
5 I : C 7 7 86 : B 16 : 3643
6 G : C 7 7 86 : B 16 : 3868
7 S : C 7 7 86 : B 16 : 3692
8 A : C 7 8 85 : B 22 : 4265
9 I : C 7 9 84 : E 33 : 5742
10 S : C 7 8 85 : B 20 : 3646
11 R : C 6 7 87 : B 21 : 3978
12 N : C 6 7 86 : B 16 : 3615
13 N : C 9 20 71 : B 20 : 3969
14 W : E 10 63 27 : E 38 : 5945
15 A : E 6 87 7 : E 32 : 4798
16 V : E 5 91 4 : E 47 : 6656
- References point to the web site and papers that are relevant to CRNPRED.
- jobid is the identifier of this job.
  - You can access the (same) result at the web site https://pdbj.org//crnpred/crnpred.cgi?jobid=... by specifying the jobid in ...
- Then comes your protein sequence.
- "prediction by CRN2000" summarizes the prediction results.
  - AA means the amino acid sequence (input).
  - SS means predicted secondary structures where H is (alpha) helix, E is (beta) strand, and C is coil (all others).
  - CN means predicted contact numbers encoded in 2 states where B indicates buried and E means exposed.
- The lines after "# AA : SS P_H P_E P_C : CN : RWCO" give the details of the predicted quantities.
  - AA again means the amino acid residues;
  - SS, secondary structures
  - P_H, P_E, and P_C mean the probability of finding helix (H), strand (E) and coil (C) structures at each sequence position.
  - CN column gives the predicted contact numbers (integers and 2-state encoding).
  - RWCO column gives the residue-wise contact order (in integers).
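If you want to process the per-residue details in a program, the rows can be read with a small parser. The following is a minimal Python sketch, assuming each detail row has exactly the layout shown in the sample result above; the names Residue, ROW and parse_detail_rows are hypothetical and not part of CRNPRED.

```python
import re
from typing import List, NamedTuple


class Residue(NamedTuple):
    index: int      # sequence position (1-based, as in the result)
    aa: str         # amino acid residue
    ss: str         # predicted secondary structure: H, E or C
    p_h: int        # value in the P_H column
    p_e: int        # value in the P_E column
    p_c: int        # value in the P_C column
    cn_state: str   # 2-state contact number encoding: B or E
    cn: int         # predicted contact number
    rwco: int       # predicted residue-wise contact order


# Assumed row layout, e.g. "14 W : E 10 63 27 : E 38 : 5945"
ROW = re.compile(
    r"^\s*(\d+)\s+([A-Z])\s*:\s*([HEC])\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s*:\s*([BE])\s+(\d+)\s*:\s*(\d+)\s*$"
)


def parse_detail_rows(text: str) -> List[Residue]:
    """Collect every line that looks like a detail row; skip all other lines."""
    residues = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        idx, aa, ss, ph, pe, pc, cn_state, cn, rwco = m.groups()
        residues.append(
            Residue(int(idx), aa, ss, int(ph), int(pe), int(pc),
                    cn_state, int(cn), int(rwco))
        )
    return residues


# Three rows copied from the sample result above.
sample = """\
# AA : SS P_H P_E P_C : CN : RWCO
13 N : C 9 20 71 : B 20 : 3969
14 W : E 10 63 27 : E 38 : 5945
15 A : E 6 87 7 : E 32 : 4798
"""
rows = parse_detail_rows(sample)
for r in rows:
    print(r.index, r.aa, r.ss, r.cn_state, r.cn, r.rwco)
```

In the sample rows, the SS column coincides with the largest of P_H, P_E and P_C, so a parsed Residue can be used to check how clear-cut each secondary structure call is.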
I don't receive any email from CRNPRED!
One possible reason is that the email returned by CRNPRED has been classified as spam. Please check your spam folder and mark CRNPRED emails as not spam in your email client.
Otherwise, our server may be busy with many requests. Please wait for some time.
If you suspect that CRNPRED failed for some reason, please contact us using the query form at this site. Please include the word CRNPRED in the subject field.
[How does CRNPRED predict 1D structures?]
CRNPRED uses a position-specific scoring matrix (PSSM) generated by PSI-BLAST as its input, and based on the PSSM, a machine-learning method called Critical Random Network (CRN) is applied. CRN can extract some hidden patterns in the PSSM. The patterns are expressed as a set of high-dimensional state vectors which are used as input variables for a simple linear regression to produce predicted secondary structures, contact numbers and residue-wise contact orders.
More concretely, the following procedure is applied:
- Run PSI-BLAST (3 iterations) against UniRef90 sequence database to generate a PSSM.
- Run CRNPRED.
  - In the case of the linear predictor, this takes much less than 1 second.
  - In the case of CRN2000 and CRN5000, this typically takes 1-5 minutes. (The required CPU time increases approximately linearly with the sequence length.)
Except for the linear predictor, CRNPRED combines predictions from 20 independent predictors which were trained differently. This ensemble prediction significantly increases the accuracy.
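The text above does not say how the 20 predictions are combined. As an illustration only, the following Python sketch assumes the simplest possible rule, averaging the per-residue class probabilities of all predictors and then taking the most probable state. The function names and the toy numbers are hypothetical and this is not necessarily what CRNPRED does internally.

```python
from typing import List, Sequence

STATES = "HEC"  # helix, strand, coil


def ensemble_average(
    predictions: Sequence[Sequence[Sequence[float]]]
) -> List[List[float]]:
    """Average probabilities over predictors.

    predictions[k][i][s] is the probability of state s at residue i
    according to predictor k. Assumption: plain unweighted averaging.
    """
    n_pred = len(predictions)
    n_res = len(predictions[0])
    return [
        [
            sum(predictions[k][i][s] for k in range(n_pred)) / n_pred
            for s in range(len(STATES))
        ]
        for i in range(n_res)
    ]


def call_states(probs: Sequence[Sequence[float]]) -> str:
    """Pick the most probable state at every residue."""
    return "".join(
        STATES[max(range(len(STATES)), key=lambda s: row[s])] for row in probs
    )


# Toy example: three hypothetical predictors, two residues.
p1 = [[0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
p2 = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
p3 = [[0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]
avg = ensemble_average([p1, p2, p3])
print(call_states(avg))
```

In the toy data, the second predictor alone would call helix at the first residue, while the averaged probabilities favour strand. This shows how an ensemble can smooth out the errors of individual predictors.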
[How accurate is CRNPRED?]
According to the benchmarks (Refs. 1 and 2, below), the average accuracies are as follows:
CRN5000 (from Ref. 1)
| SS | Q3 = 80.5% | SOV = 80.0% |
| CN | Cor = 0.746 | DevA = 0.686 |
| RWCO | Cor = 0.613 (0.646) | DevA = 0.877 |
Linear (from Ref. 2)
| SS | Q3 = 75.2% | SOV = 72.7% |
| CN | Cor = 0.701 | DevA = 0.735 |
| RWCO | Cor = 0.584 | DevA = 0.902 |
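To make two of the measures in these tables concrete, here is a short Python sketch. It assumes that Q3 is the usual percentage of residues whose three-state secondary structure is predicted correctly, and that Cor is the Pearson correlation coefficient between predicted and observed values. See the references for the exact definitions used in the benchmarks; SOV and DevA are not covered here, and the function names and toy data are hypothetical.

```python
import math
from typing import Sequence


def q3(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long value lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy data, not taken from any benchmark.
print(q3("CCEEH", "CCEEC"))                        # 4 of 5 states agree
print(pearson([13, 10, 12, 17], [14, 9, 12, 18]))  # predicted vs. observed CN
```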
[Locally installing CRNPRED]
You may download the source code of CRNPRED from
You will also need to install BLAST and set up an appropriate sequence database for it. See the instructions on the above web site.
The following articles are all open-access (i.e., freely downloadable).
1. Highly accurate prediction of one-dimensional protein structures by large-scale critical random networks. Kinjo, A. R.; Nishikawa, K. BMC Bioinformatics 7:401 (2006) [Primary reference]
2. Predicting secondary structures, contact numbers, and residue-wise contact orders of native protein structure from amino acid sequence using critical random networks. Kinjo, A. R.; Nishikawa, K. BIOPHYSICS 1:67-74 (2005) [Basic methodology]
3. Recoverable one-dimensional encoding of three-dimensional protein structures. Kinjo, A. R.; Nishikawa, K. Bioinformatics 21:2167-2170 (2005) [1D structures]
[REST interface to CRNPRED]
You can call the CRNPRED web service from your program. The service uses the following parameters:
- The content of the FASTA file.
- method: The type of predictor. Possible values are lin (for the linear predictor), crnpred2k (for CRN2000), and crnpred5k (for CRN5000).
- jobid: The job ID for your query. It is issued once you have submitted a query, and it is used for retrieving the prediction result.
Submitting a query
You should use the HTTP POST method. Using the curl program, a query may be submitted like:
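curl -F "FASTA_PARAM=@hoge.seq" -F "method=lin" "https://pdbj.org//crnpred/" > myquery.xml
where the sequence is saved in the file hoge.seq and FASTA_PARAM is a placeholder for the name of the parameter that carries the FASTA file content.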
An XML (XHTML 1.1) page containing the job ID of your query is then saved in the file named myquery.xml. The job ID is in the h2 tag whose class attribute has the value jobid. That is, something like
<h2 class="jobid">DC.1244676954.282566</h2>
where DC.1244676954.282566 is the job ID.
If you are writing a program that calls the CRNPRED REST service, you should be able to write something equivalent to the above example which uses curl.
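For example, the job ID could be pulled out of the returned page with Python's standard html.parser module. This is a minimal sketch that relies only on what is stated above, namely that the job ID is the text of an h2 element whose class is jobid. The rest of the sample page and the names JobIdParser and extract_job_id are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class JobIdParser(HTMLParser):
    """Remember the text of the first <h2 class="jobid"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jobid = False
        self.job_id: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "jobid" in classes:
            self._in_jobid = True

    def handle_data(self, data):
        if self._in_jobid and self.job_id is None and data.strip():
            self.job_id = data.strip()

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_jobid = False


def extract_job_id(page: str) -> Optional[str]:
    """Return the job ID found in the page, or None if there is none."""
    parser = JobIdParser()
    parser.feed(page)
    parser.close()
    return parser.job_id


# Hypothetical page: only the h2 element follows the description above.
sample_page = (
    "<html><body><h1>CRNPRED</h1>"
    '<h2 class="jobid">DC.1244676954.282566</h2>'
    "</body></html>"
)
print(extract_job_id(sample_page))
```

In a real client, the string passed to extract_job_id would be the content of myquery.xml.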
Retrieving the result
You can retrieve the result of your query by specifying the jobid parameter. Again, using the curl program,
curl "https://pdbj.org//crnpred/?jobid=DC.1244676954.282566" > result.txt
If the job has been completed, the prediction result is saved in the file named result.txt. The result is plain text:
;;BEGIN
(your result here)
;;END
If the job has not yet finished, the result will be "In progress." In that case, try again after some time. If the job ID doesn't exist, "Not found." is returned.
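A program can repeat the retrieval step until the job has finished. The Python sketch below assumes the three kinds of response described above (a result containing ;;BEGIN, "In progress." or "Not found."). The function names, the polling interval and the retry limit are arbitrary choices for illustration. The demonstration at the end uses canned replies instead of contacting the server.

```python
import time
import urllib.parse
import urllib.request
from typing import Callable

BASE_URL = "https://pdbj.org//crnpred/"  # URL used in the curl examples


def fetch_result(job_id: str) -> str:
    """GET the result page for a job ID (performs a real HTTP request)."""
    query = urllib.parse.urlencode({"jobid": job_id})
    with urllib.request.urlopen(f"{BASE_URL}?{query}") as resp:
        return resp.read().decode("utf-8", errors="replace")


def job_status(body: str) -> str:
    """Classify a response body as done, in_progress, not_found or unknown."""
    text = body.strip()
    if text.startswith("In progress"):
        return "in_progress"
    if text.startswith("Not found"):
        return "not_found"
    if ";;BEGIN" in text:
        return "done"
    return "unknown"


def wait_for_result(
    job_id: str,
    fetch: Callable[[str], str] = fetch_result,
    interval: float = 60.0,   # seconds between tries (arbitrary choice)
    max_tries: int = 30,      # give up after this many tries (arbitrary choice)
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job is done and return the plain-text result."""
    for attempt in range(max_tries):
        body = fetch(job_id)
        status = job_status(body)
        if status == "done":
            return body
        if status == "not_found":
            raise KeyError(f"job ID not found: {job_id}")
        if status == "unknown":
            raise RuntimeError("unexpected response from the server")
        if attempt + 1 < max_tries:
            sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish in time")


# Offline demonstration with canned replies instead of the real server.
canned = iter(["In progress.", ";;BEGIN\n(your result here)\n;;END"])
demo = wait_for_result(
    "DC.1244676954.282566",
    fetch=lambda job_id: next(canned),
    sleep=lambda seconds: None,
)
print(job_status(demo))
```

Passing the fetch and sleep functions as arguments keeps the polling logic testable without network access. In real use, the defaults call the server and wait between tries.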