CRNPRED

[What is CRNPRED?]

CRNPRED is a web-based service that predicts one-dimensional (1D) protein structures, including secondary structures, contact numbers, and residue-wise contact orders, from an amino acid sequence. Prediction results are returned by email.

[How to use CRNPRED server]

Paste a FASTA-formatted amino acid sequence, select one of the three available predictors, supply your email address, and click the submit button. The result will be sent to you by email once the job has finished.

Currently, three prediction methods are available:

linear: A simple linear regression method based on the PSSM. It is fast, but less accurate than the other methods.
CRN2000: CRN-based method with 2000-dimensional state vectors. This is slower than the linear predictor, but generates more accurate predictions. This is the default predictor.
CRN5000: Same as CRN2000, but with 5000-dimensional state vectors. This is the slowest, but the most accurate predictor.

Note, however, that the most time-consuming part of the prediction is the generation of the PSSM by PSI-BLAST. Depending on the length of your protein sequence, the CPU time for the whole prediction process may therefore differ little between predictors.

How should I interpret the results?

After you have submitted a protein sequence to the CRNPRED server, you will get a result like the following:

References:
0. CRNPRED prediction server: https://pdbj.org//crnpred/

1. ......

;;BEGIN
jobid: F2.1259394568.774714
# Input sequence
>1GOFA:GALACTOSE OXIDASE
ASAPIGSAISRNNWAVTCDSAQSGNECNKAIDGNKDTFWHTFYGANGDPKPPHTYTIDMK
TTQNVNGLSMLPRQDGNQNGWIGRHEVYLSSDGTNWGSPVASGSWFADSTTKYSNFETRP
.....

#   prediction by CRN2000
#                  *         *         *         *         *         *
AA:       ASAPIGSAISRNNWAVTCDSAQSGNECNKAIDGNKDTFWHTFYGANGDPKPPHTYTIDMK
SS:       CCCCCCCCCCCCCEEEEECCCCCCCCCCEEECCCCCCCECCCCCCCCCCCCCEEEEEECC
CN:       BBBBBBBBEBBBBEEEEEEEBEBBBEEEEEEEEEEEEEEEEBEBBBBBBEEEEEEEEEEE
#                  *         *         *         *         *         *
AA:       TTQNVNGLSMLPRQDGNQNGWIGRHEVYLSSDGTNWGSPVASGSWFADSTTKYSNFETRP
SS:       CCEEEEEEEEECCCCCCCCCCCEEEEEEECCCCCCCCEEEECCCCCCCCCCEEEEECCCC
CN:       BEEEEEEEEEEEEEEBBBBEEEEEEEEEEEEEBBEEEEEEEEEEEBBBBBEEEEEEEEEE
......
//

>#   AA : SS P_H P_E P_C : CN     : RWCO
   1 A : C    5   6  89 : B   13 : 2402
   2 S : C    6   7  87 : B   10 : 2287
   3 A : C    7   8  85 : B   12 : 2757
   4 P : C    7   8  85 : B   17 : 3635
   5 I : C    7   7  86 : B   16 : 3643
   6 G : C    7   7  86 : B   16 : 3868
   7 S : C    7   7  86 : B   16 : 3692
   8 A : C    7   8  85 : B   22 : 4265
   9 I : C    7   9  84 : E   33 : 5742
  10 S : C    7   8  85 : B   20 : 3646
  11 R : C    6   7  87 : B   21 : 3978
  12 N : C    6   7  86 : B   16 : 3615
  13 N : C    9  20  71 : B   20 : 3969
  14 W : E   10  63  27 : E   38 : 5945
  15 A : E    6  87   7 : E   32 : 4798
  16 V : E    5  91   4 : E   47 : 6656

where

References point to the web site and papers that are relevant to CRNPRED.
jobid is the identifier of this job.
- You can access the (same) result at the web site https://pdbj.org//crnpred/crnpred.cgi?jobid=... by specifying the jobid in ...
Then comes your protein sequence.
prediction by CRN2000 summarizes the prediction results.
- AA means the amino acid sequence (input).
- SS means predicted secondary structures where H is (alpha) helix, E is (beta) strand, and C is coil (all others).
- CN means predicted contact numbers encoded in 2 states where B indicates buried and E means exposed.
The lines after # AA : SS P_H P_E P_C : CN : RWCO give the details of the predicted quantities.
- AA again means the amino acid residues.
- SS means the predicted secondary structures.
- P_H, P_E, and P_C mean the probabilities of finding helix (H), strand (E), and coil (C) structures at each sequence position (in the example above, the three values in a row sum to about 100).
- The CN column gives the predicted contact numbers (2-state encoding and integers).
- The RWCO column gives the predicted residue-wise contact orders (in integers).
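
If you want to process the per-residue lines in a program, they can be read with a short parser. The following Python code is a minimal sketch that assumes the column layout of the example above; the names ResiduePrediction, ROW_PATTERN, and parse_residue_table are hypothetical and are not part of CRNPRED.

```python
import re
from typing import List, NamedTuple


class ResiduePrediction(NamedTuple):
    index: int      # sequence position (first column)
    aa: str         # amino acid residue
    ss: str         # predicted secondary structure: H, E, or C
    p_h: int        # P_H column
    p_e: int        # P_E column
    p_c: int        # P_C column
    cn_state: str   # 2-state contact number: B (buried) or E (exposed)
    cn: int         # predicted contact number (integer column)
    rwco: int       # predicted residue-wise contact order


# One data row, e.g. "  14 W : E   10  63  27 : E   38 : 5945".
# The layout is inferred from the example output only (assumption).
ROW_PATTERN = re.compile(
    r"^\s*(\d+)\s+([A-Za-z])\s*:\s*([HEC])\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s*:\s*([BE])\s+(-?\d+)\s*:\s*(-?\d+)\s*$"
)


def parse_residue_table(text: str) -> List[ResiduePrediction]:
    """Collect every line that looks like a per-residue row; skip the rest."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match is None:
            continue  # header line, summary block, or other text
        idx, aa, ss, p_h, p_e, p_c, cn_state, cn, rwco = match.groups()
        rows.append(
            ResiduePrediction(
                int(idx), aa, ss, int(p_h), int(p_e), int(p_c),
                cn_state, int(cn), int(rwco),
            )
        )
    return rows
```

For example, "".join(r.ss for r in parse_residue_table(result_text)) rebuilds the predicted secondary structure string from the detailed rows, where result_text is the text of the result you received.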

I don't receive any email from CRNPRED!

One possible reason is that the email returned by CRNPRED has been classified as spam. Please check your spam folder and mark CRNPRED emails as not spam in your email client.

Otherwise, our server may be overcrowded with many requests. Please wait for some time.

If you suspect that CRNPRED has failed for some reason, please contact us using the query form at this site. Please include the word CRNPRED in the subject field.

[How does CRNPRED predict 1D structures?]

CRNPRED takes as its input a position-specific scoring matrix (PSSM) generated by PSI-BLAST and applies to it a machine-learning method called Critical Random Network (CRN). The CRN extracts hidden patterns in the PSSM. These patterns are expressed as a set of high-dimensional state vectors, which are used as input variables for a simple linear regression that produces the predicted secondary structures, contact numbers, and residue-wise contact orders.

More concretely, the following procedure is applied:

Run PSI-BLAST (3 iterations) against the UniRef90 sequence database to generate a PSSM.
Run CRNPRED.
- In the case of the linear predictor, this takes much less than 1 second.
- In the case of CRN2000 and CRN5000, this typically takes 1-5 minutes. (The required CPU time increases approximately linearly with the sequence length.)

Except for the linear predictor, CRNPRED combines predictions from 20 independent predictors which were trained differently. This ensemble prediction significantly increases the accuracy.
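
This page does not say how the 20 predictions are combined. As an illustration only, the following Python sketch assumes the simplest scheme, namely averaging the per-residue class scores of all predictors and then choosing the highest-scoring class; the actual combination rule used by CRNPRED may differ (see the references). The function names average_ensemble and assign_states are hypothetical.

```python
from typing import List, Sequence


def average_ensemble(
    predictions: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average class scores over predictors.

    predictions[m][i][k] is the score given by predictor m to class k at
    residue i. Averaging is an assumption made for this sketch.
    """
    n_models = len(predictions)
    n_residues = len(predictions[0])
    n_classes = len(predictions[0][0])
    return [
        [sum(p[i][k] for p in predictions) / n_models for k in range(n_classes)]
        for i in range(n_residues)
    ]


def assign_states(scores: Sequence[Sequence[float]], labels: str = "HEC") -> str:
    """Pick the highest-scoring class label for each residue."""
    return "".join(
        labels[max(range(len(row)), key=row.__getitem__)] for row in scores
    )
```

Two predictors can disagree on a residue while their average still gives a clear call, which is the usual motivation for ensemble prediction.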

[How accurate is CRNPRED?]

According to benchmarks (Refs. 1 and 2, below), the average accuracies are as follows:

CRN5000 (from Ref. 1)

SS      Q3 = 80.5%             SOV = 80.0%
CN      Cor = 0.746            DevA = 0.686
RWCO    Cor = 0.613 (0.646)    DevA = 0.877

Linear (from Ref. 2)

SS      Q3 = 75.2%             SOV = 72.7%
CN      Cor = 0.701            DevA = 0.735
RWCO    Cor = 0.584            DevA = 0.902
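
To make two of these measures concrete, the following Python sketch computes Q3 and Cor for a single protein. It assumes the usual definitions: Q3 as the percentage of residues whose three-state secondary structure is predicted correctly, and Cor as the Pearson correlation coefficient between predicted and observed values. SOV and DevA are not covered here, and the exact benchmark protocol is described in the references. The function names q3 and pearson are hypothetical.

```python
import math
from typing import Sequence


def q3(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted state (H, E, or C) matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

For instance, q3("CCEEH", "CCEEC") compares five residues, four of which agree.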

[Locally installing CRNPRED]

You may download the source code of CRNPRED from

http://www.bioinformatics.org/crnpred/

You will also need to install BLAST and set up an appropriate sequence database for it. See the instructions on the above website.

[References]

The following articles are all open-access (i.e., freely downloadable).

1. CRNPRED: Highly accurate prediction of one-dimensional protein structures by large-scale critical random networks.
Kinjo, A. R.; Nishikawa, K. BMC Bioinformatics 7:401 (2006) [Primary reference]
2. Predicting secondary structures, contact numbers, and residue-wise contact orders of native protein structure from amino acid sequence using critical random networks.
Kinjo, A. R.; Nishikawa, K. BIOPHYSICS 1:67-74 (2005) [Basic methodology]
3. Recoverable one-dimensional encoding of three-dimensional protein structures.
Kinjo, A. R.; Nishikawa, K. Bioinformatics 21:2167-2170 (2005) [1D structures]

[REST interface to CRNPRED]

You can call the CRNPRED web service from your program.

CGI parameters

fasta: The content of the FASTA file.
method: The predictor to use. Possible values are lin (for the linear predictor), crnpred2k (for CRN2000), and crnpred5k (for CRN5000).
jobid: The job ID for your query. This is issued once you have submitted a query. This parameter is used for retrieving the prediction result.

Submitting a query

Use the HTTP POST method. With the curl program, a query can be submitted as follows:

curl -F "fasta=@hoge.seq" -F "method=lin" "https://pdbj.org//crnpred/"  > myquery.xml

where the sequence is saved in the file hoge.seq.

This saves an XML (XHTML 1.1) page containing the job ID of your query in the file named myquery.xml. The job ID is in the h2 element whose class attribute has the value jobid. That is, something like

<h2 class="jobid">DC.1244676954.282566</h2>

where DC.1244676954.282566 is the job ID.

If you are writing a program that calls the CRNPRED REST service, you should be able to write something equivalent to the above example which uses curl.
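
As one possible equivalent, the following Python sketch uses only the standard library. It builds a multipart/form-data body similar to what curl -F sends, posts it, and extracts the job ID from the returned page. The helper names (build_multipart, extract_jobid, submit), the upload file name query.seq, and the regular expression for the h2 element are assumptions made for illustration; only the URL, the parameter names, and the method values come from this page.

```python
import re
import urllib.request
import uuid
from typing import Optional, Tuple

CRNPRED_URL = "https://pdbj.org//crnpred/"  # same URL as in the curl example
METHODS = ("lin", "crnpred2k", "crnpred5k")


def build_multipart(fasta_text: str, method: str,
                    filename: str = "query.seq") -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data POST."""
    if method not in METHODS:
        raise ValueError("method must be one of %s" % (METHODS,))
    boundary = uuid.uuid4().hex
    lines = [
        "--" + boundary,
        'Content-Disposition: form-data; name="fasta"; filename="%s"' % filename,
        "Content-Type: application/octet-stream",
        "",
        fasta_text,
        "--" + boundary,
        'Content-Disposition: form-data; name="method"',
        "",
        method,
        "--" + boundary + "--",
        "",
    ]
    body = "\r\n".join(lines).encode("utf-8")
    return body, "multipart/form-data; boundary=" + boundary


def extract_jobid(page: str) -> Optional[str]:
    """Find the job ID in <h2 class="jobid">...</h2>, or return None."""
    match = re.search(r'<h2\s+class="jobid"\s*>\s*([^<\s]+)\s*</h2>', page)
    return match.group(1) if match else None


def submit(fasta_text: str, method: str) -> Optional[str]:
    """POST a query to the server and return the job ID (needs network access)."""
    body, content_type = build_multipart(fasta_text, method)
    request = urllib.request.Request(
        CRNPRED_URL, data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_jobid(response.read().decode("utf-8", errors="replace"))
```

A call such as submit(open("hoge.seq").read(), "lin") would then correspond to the curl command above.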

Retrieving the result

You can retrieve the result of your query by specifying the jobid parameter. Again, using the curl program,

curl "https://pdbj.org//crnpred/?jobid=DC.1244676954.282566" > result.txt

If the job has been completed, the prediction result is saved in the file named result.txt. The result is plain text:

;;BEGIN
(your result here)
;;END

If the job has not yet finished, the response will be "In progress." In that case, try again after some time. If the job ID doesn't exist, "Not found." is returned.
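
A program can therefore poll the server until the result is ready. The following Python sketch shows one way to do this with the standard library. It assumes that a finished result contains the ;;BEGIN marker and that the other two responses start with the texts quoted above; the function names, the 60-second interval, and the retry limit are hypothetical choices.

```python
import time
import urllib.parse
import urllib.request

CRNPRED_URL = "https://pdbj.org//crnpred/"  # same URL as in the curl example


def classify_response(text: str) -> str:
    """Map a response body to 'done', 'in_progress', 'not_found', or 'unknown'."""
    stripped = text.strip()
    if ";;BEGIN" in stripped:
        return "done"
    if stripped.startswith("In progress"):
        return "in_progress"
    if stripped.startswith("Not found"):
        return "not_found"
    return "unknown"


def fetch_once(jobid: str) -> str:
    """GET the current response for a job ID (needs network access)."""
    url = CRNPRED_URL + "?" + urllib.parse.urlencode({"jobid": jobid})
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def wait_for_result(jobid: str, interval: float = 60.0, max_tries: int = 30,
                    fetch=fetch_once, sleep=time.sleep) -> str:
    """Poll until the job is done; raise if it is unknown or takes too long."""
    for _ in range(max_tries):
        text = fetch(jobid)
        status = classify_response(text)
        if status == "done":
            return text
        if status == "not_found":
            raise KeyError("job ID not found: " + jobid)
        if status == "unknown":
            raise RuntimeError("unexpected response from server")
        sleep(interval)  # still in progress: wait before asking again
    raise TimeoutError("job %s did not finish in time" % jobid)
```

The fetch and sleep arguments exist only so that the polling loop can be tried out without contacting the server. Please keep the interval long enough not to overload the service.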

Created: 2012-07-13 (last edited: more than 1 year ago)