
NER


NER stands for the Number of Equivalent Residues [1], an empirical measure of the similarity of two protein structures. The NER superposition is the rotation and translation that maximize the NER score. The NER score is bounded between 0 and the number of aligned residues.

In the discussion below, we will describe NER superposition and scoring. We assume an alignment exists and is of the form:

jA = m(j,A)
jB = m(j,B)

where jA and jB are residue indices in proteins A and B, respectively, and m is a map between the two sequences.

Given two structures A and B, with residue indices iA and iB and coordinates RA and RB, we assume that there exists an alignment that lists equivalent residue pairs in the two structures. That is, for the kth residue pair, RA[m(k,A)] is aligned to RB[m(k,B)].
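As an illustration only, the alignment map m can be held as a list of index pairs, one per aligned residue pair k. The coordinates, residue numbers, and the helper name aligned_pairs below are hypothetical and are not taken from the NER implementation.

```python
# Hypothetical toy data: one representative atom position per residue index.
coords_a = {10: (0.0, 0.0, 0.0), 11: (3.8, 0.0, 0.0), 12: (7.6, 0.0, 0.0)}
coords_b = {5: (0.1, 0.0, 0.0), 7: (7.5, 0.2, 0.0)}

# The alignment map m: entry k holds (m(k,A), m(k,B)).
# Residue 11 of A has no partner in B, so it does not appear.
alignment = [(10, 5), (12, 7)]


def aligned_pairs(coords_a, coords_b, alignment):
    """Return (RA[m(k,A)], RB[m(k,B)]) for every aligned pair k."""
    return [(coords_a[ia], coords_b[ib]) for ia, ib in alignment]


pairs = aligned_pairs(coords_a, coords_b, alignment)
for k, (ra, rb) in enumerate(pairs):
    print(k, ra, rb)
```

Only residues listed in the alignment enter the score; unaligned residues are ignored.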

We further assume that we are free to rotate and translate structure A using matrix M and translation vector V, respectively:

RA' = MRA + V

M is a function of three Euler angles and V is composed of three Cartesian coordinates. In our case the score is a function only of the inter-residue Euclidean distances:

(1) dk = sqrt[(RA'[m(k,A)] - RB[m(k,B)])^2]

Thus, the maximization of NER represents a six-dimensional optimization problem, the goal of which is to rotate and translate structure A such that the NER score is a maximum.
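The rigid-body transformation and equation 1 can be sketched in a few lines of Python. This is a minimal sketch, not the NER code itself: the text does not say which Euler angle convention is used, so the Z-Y-X order below is an assumption, and the function names are hypothetical.

```python
import math


def euler_to_matrix(alpha, beta, gamma):
    """Rotation matrix M = Rz(alpha) Ry(beta) Rx(gamma), angles in radians.

    The Z-Y-X convention is an assumption; the source does not specify one.
    """
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]


def transform(point, M, V):
    """Apply RA' = M RA + V to a single 3D point."""
    return tuple(
        sum(M[i][j] * point[j] for j in range(3)) + V[i] for i in range(3)
    )


def pair_distances(points_a, points_b, M, V):
    """Equation 1: Euclidean distance d_k for each aligned pair k."""
    return [
        math.dist(transform(a, M, V), b) for a, b in zip(points_a, points_b)
    ]


# Hypothetical example: rotate A by 90 degrees about z, then shift along x.
M = euler_to_matrix(math.pi / 2, 0.0, 0.0)
V = (1.0, 0.0, 0.0)
distances = pair_distances([(1.0, 0.0, 0.0)], [(1.0, 1.0, 0.0)], M, V)
print(distances)
```

The three angles and three translation components are the six free parameters of the optimization.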

In the NER method, the score is given by the sum over all aligned residue pairs of a similarity function:

(2) NER = SUM_k S_dcut(dk),

where k corresponds to an aligned residue pair, dk is the distance between atoms in the two residues (equation 1), and the residue-based score is a Gaussian curve with unit amplitude at zero distance:

(3) S_dcut(dk) = exp[-(dk/dcut)^2].

Note that the residue-based score does not represent a statistical distribution; rather, it is a convenient functional form for rewarding residue pairs separated by a small distance, much in the way contact potentials are used in protein folding simulations [5]. Equation 3 decreases monotonically and smoothly as the distance increases, so residue pairs beyond the cutoff distance still contribute to the sum, with a weight that falls off rapidly. Since the value of equation 3 lies within [0, 1], the sum in equation 2 represents the effective number of equivalent residues. For all calculations dcut was set to 4 Å. Note that if the similarity score in equation 3 is replaced by a step function, then equation 2 reduces to the scoring function used in the GDT method [3,4]. In a similar manner, Gerstein and Levitt have employed an inverse distance function in conjunction with iterative dynamic programming for structural alignment [6].
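Equations 2 and 3 translate directly into code. The sketch below uses hypothetical function names and made-up distances, and sets the step-function variant beside the Gaussian score for comparison. The step function is only a simple counting analogue, not the full GDT procedure.

```python
import math

D_CUT = 4.0  # angstroms, the value stated in the text


def residue_score(d, d_cut=D_CUT):
    """Equation 3: Gaussian score with unit amplitude at zero distance."""
    return math.exp(-((d / d_cut) ** 2))


def ner_score(distances, d_cut=D_CUT):
    """Equation 2: sum of residue scores over all aligned pairs."""
    return sum(residue_score(d, d_cut) for d in distances)


def step_count(distances, d_cut=D_CUT):
    """Step-function variant: count the pairs lying within d_cut."""
    return sum(1 for d in distances if d <= d_cut)


# Hypothetical distances (angstroms) for four aligned residue pairs.
distances = [0.0, 2.0, 4.0, 8.0]
print(ner_score(distances))   # smooth: the 8 A pair still adds a little
print(step_count(distances))  # hard cutoff: the 8 A pair adds nothing
```

For these four distances the Gaussian score is about 2.16, while the step function counts 3 pairs. The pair at exactly dcut contributes exp(-1), or roughly 0.37, rather than a full unit.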

We can maximize the target function directly by optimizing the relative rotational and translational degrees of freedom between the two structures. Since equation 2 is differentiable, many techniques may be employed to maximize its value. We found that if the initial alignment is close to optimal, the NER score can be maximized by first minimizing the RMSD of the aligned residues, then refining the superposition by conjugate gradient optimization. The method of McLachlan is used for RMSD minimization [2]. For the maximization step we use the Fletcher-Reeves-Polak-Ribiere method as implemented in the Numerical Recipes routine frprmn [7].
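The two-stage idea (a cheap starting superposition, then gradient-based refinement of the six parameters) can be sketched as follows. This sketch deliberately swaps in simpler techniques and is not the method described above: the starting point only matches centroids instead of performing McLachlan's RMSD fit, and the refinement is plain gradient ascent with finite-difference gradients and step halving instead of Polak-Ribiere conjugate gradients. All function names, the Euler convention, and the toy coordinates are assumptions.

```python
import math

D_CUT = 4.0  # angstroms, as stated in the text


def rotation(alpha, beta, gamma):
    """M = Rz(alpha) Ry(beta) Rx(gamma); the convention is an assumption."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]


def move(point, params):
    """Apply RA' = M RA + V; params = [alpha, beta, gamma, vx, vy, vz]."""
    M = rotation(*params[:3])
    return [
        sum(M[i][j] * point[j] for j in range(3)) + params[3 + i]
        for i in range(3)
    ]


def ner(params, A, B):
    """Equation 2 evaluated for paired coordinate lists A and B."""
    return sum(
        math.exp(-((math.dist(move(a, params), b) / D_CUT) ** 2))
        for a, b in zip(A, B)
    )


def centroid_start(A, B):
    """Stand-in for the RMSD fit: identity rotation, centroids matched."""
    n = len(A)
    shift = [sum(b[i] for b in B) / n - sum(a[i] for a in A) / n for i in range(3)]
    return [0.0, 0.0, 0.0] + shift


def gradient(f, p, h=1e-5):
    """Central finite-difference gradient of f at p."""
    grad = []
    for i in range(len(p)):
        up = p[:i] + [p[i] + h] + p[i + 1:]
        down = p[:i] + [p[i] - h] + p[i + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def maximize_ner(A, B, iters=200):
    """Gradient ascent with step halving; the score never decreases."""
    def f(q):
        return ner(q, A, B)

    p = centroid_start(A, B)
    best = f(p)
    for _ in range(iters):
        g = gradient(f, p)
        step = 1.0
        while step > 1e-8:
            trial = [pi + step * gi for pi, gi in zip(p, g)]
            value = f(trial)
            if value > best:
                p, best = trial, value
                break
            step *= 0.5
        else:
            break  # no improving step found: treat as converged
    return p, best


# Hypothetical toy structures: B is a rotated and shifted copy of A.
A = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (5.0, 5.0, 3.5)]
B = [move(a, [0.2, 0.0, 0.0, 1.0, -2.0, 0.5]) for a in A]
start = centroid_start(A, B)
params, score = maximize_ner(A, B)
print(ner(start, A, B), score)
```

Because a trial step is accepted only when it raises the score, the refined score is never lower than the starting one, and it cannot exceed the number of aligned pairs. A production implementation would more likely use analytic gradients and a conjugate gradient routine, as the text describes.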

Bibliography

1. Standley DM, Toh H, Nakamura H. Detecting local structural similarity in proteins by maximizing number of equivalent residues. Proteins 2004;57(2):381-391.

2. McLachlan AD. Gene duplications in the structural evolution of chymotrypsin. J Mol Biol 1979;128(1):49-79.

3. Zemla A, Venclovas C, Reinhardt A, Fidelis K, Hubbard TJ. Numerical criteria for the evaluation of ab initio predictions of protein structure. Proteins 1997;Suppl 1:140-150.

4. Zemla A. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res 2003;31(13):3370-3374.

5. Maiorov VN, Crippen GM. Contact potential that recognizes the correct folding of globular proteins. J Mol Biol 1992;227(3):876-888.

6. Gerstein M, Levitt M. Using iterative dynamic programming to obtain accurate pairwise and multiple alignments of protein structures. Proc Int Conf Intell Syst Mol Biol 1996;4:59-67.

7. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes in C. Cambridge: Cambridge Univ. Press; 1992.


Created: 2012-07-13 (last edited: 2014-08-12)
