NER

NER, the Number of Equivalent Residues [1], is an empirical measure of the similarity of two protein structures. The NER superposition is the rotation and translation that maximize the NER score. The NER score is bounded between 0 and the number of aligned residues.

In the discussion below, we will describe NER superposition and scoring. We assume an alignment exists and is of the form:

jA = m(j,A)
jB = m(j,B)

where jA and jB are the residue indices in proteins A and B, respectively, of the jth aligned pair, and m is the map that defines the alignment between the two sequences.

Given two structures A and B, with residue indices iA and iB and coordinates RA and RB, we assume that there exists an alignment that lists equivalent residue pairs in the two structures. That is, for the kth aligned residue pair, RA[m(k,A)] is aligned to RB[m(k,B)].
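As a minimal sketch of this bookkeeping, the alignment can be held as a list of index pairs. The coordinates, the variable names, and the choice of one representative atom per residue are assumptions made for illustration; the text does not specify them.

```python
# Hypothetical toy data: one (x, y, z) coordinate per residue.
RA = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
RB = [(0.1, 0.0, 0.0), (3.9, 0.2, 0.0), (7.5, 0.1, 0.0)]

# The alignment m as a list of (index in A, index in B) pairs;
# entry k plays the role of (m(k,A), m(k,B)). Residue 1 of A is unaligned here.
alignment = [(0, 0), (2, 1), (3, 2)]

def aligned_pairs(RA, RB, alignment):
    """Return the coordinate pairs (RA[m(k,A)], RB[m(k,B)]) for every k."""
    return [(RA[ia], RB[ib]) for ia, ib in alignment]

pairs = aligned_pairs(RA, RB, alignment)
```

Only the residues listed in the alignment enter the score; unaligned residues are ignored.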

We further assume that we are free to rotate and translate structure A using matrix M and translation vector V, respectively:

RA' = MRA + V

M is a function of three Euler angles and V is composed of three Cartesian coordinates. In our case the score is a function only of the inter-residue Euclidean distances:

(1) dk = sqrt[(RA'[m(k,A)] - RB[m(k,B)])^2]

Thus, the maximization of NER represents a six-dimensional optimization problem, the goal of which is to rotate and translate structure A such that the NER score is a maximum.
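The six degrees of freedom can be made concrete with a short sketch that builds M from three Euler angles, applies RA' = MRA + V, and evaluates equation 1. The ZYZ Euler convention, the function names, and the toy coordinates are assumptions; the text does not say which convention is used.

```python
import math

def rotation_matrix(alpha, beta, gamma):
    """Rotation M = Rz(alpha) * Ry(beta) * Rz(gamma) (ZYZ convention, assumed)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
        [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
        [-sb * cg, sb * sg, cb],
    ]

def transform(point, M, V):
    """Return M * point + V for one 3D point."""
    return tuple(sum(M[i][j] * point[j] for j in range(3)) + V[i] for i in range(3))

def pair_distances(RA, RB, alignment, M, V):
    """Equation 1: distance between each transformed A residue and its B partner."""
    return [math.dist(transform(RA[ia], M, V), RB[ib]) for ia, ib in alignment]

# Hypothetical toy coordinates, chosen so that a quarter turn about z
# followed by the translation V maps A exactly onto B.
RA = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
RB = [(1.0, 3.0, 3.0), (-1.0, 2.0, 3.0)]
alignment = [(0, 0), (1, 1)]
M = rotation_matrix(math.pi / 2, 0.0, 0.0)
V = (1.0, 2.0, 3.0)
d = pair_distances(RA, RB, alignment, M, V)
```

With this choice of M and V every distance in d is zero to within rounding, which corresponds to a perfect superposition of the aligned residues.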

In the NER method, the score is given by the sum over all aligned residue pairs of a similarity function:

(2) NER = SUM_k S_dcut(dk),

where k corresponds to an aligned residue pair, dk is the distance between atoms in the two residues (equation 1), and the residue-based score is a Gaussian curve with unit amplitude at zero distance:

(3) S_dcut(dk) = exp[-(dk/dcut)^2].

Note that the residue-based score does not represent a statistical distribution; rather, it is a convenient functional form for rewarding residue pairs separated by a small distance, much in the way contact potentials are used in protein folding simulations [5]. Equation 3 decreases monotonically as the distance increases, so residue pairs beyond the cutoff distance still contribute to the sum, though with rapidly diminishing weight. Since the value of equation 3 is bounded within [0, 1], the sum in equation 2 represents the effective number of equivalent residues. For all calculations dcut was set to 4 Å. Note that if the similarity score in equation 3 is replaced by a step function, then equation 2 reduces to the scoring function used in the GDT method [3,4]. In a similar manner, Gerstein and Levitt have employed an inverse distance function in conjunction with iterative dynamic programming for structural alignment [6].
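Equations 2 and 3 can be sketched in a few lines. The function names and the toy distances are assumptions for illustration; the step-function variant is included only to show the contrast described above and is not the GDT implementation itself.

```python
import math

D_CUT = 4.0  # cutoff distance in angstroms, the value quoted in the text

def residue_score(d, d_cut=D_CUT):
    """Equation 3: Gaussian-shaped score with unit amplitude at zero distance."""
    return math.exp(-(d / d_cut) ** 2)

def ner_score(distances, d_cut=D_CUT):
    """Equation 2: sum of the residue scores over all aligned pairs."""
    return sum(residue_score(d, d_cut) for d in distances)

def step_count(distances, d_cut=D_CUT):
    """Step-function variant: count the pairs that lie within the cutoff."""
    return sum(1 for d in distances if d <= d_cut)

# Hypothetical distances (in angstroms) for four aligned residue pairs.
distances = [0.0, 1.0, 4.0, 8.0]
smooth = ner_score(distances)
hard = step_count(distances)
```

The pair at 8 Å is ignored by the step function but still adds a small amount, exp(-4), to the smooth score, so the NER score varies continuously as the structures move relative to each other.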

We can maximize the target function directly by optimizing the relative rotational and translational degrees of freedom between the two structures. Since equation 2 is differentiable, many techniques may be employed to maximize its value. We found that if the initial alignment is close to optimal, the NER score can be maximized by first minimizing the RMSD of the aligned residues, then refining the superposition by conjugate gradient optimization. The method of McLachlan is used for RMSD minimization [2]. For the maximization step we use the Fletcher-Reeves-Polak-Ribiere method as implemented in the Numerical Recipes program frprmn [7].
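To illustrate that the score can be improved by following its gradient in the six rigid-body parameters, here is a simplified sketch. It is not the procedure described above: it skips the RMSD pre-fit and swaps conjugate gradient for plain gradient ascent with a finite-difference gradient and step halving. The ZYZ Euler convention, the function names, and the toy coordinates are assumptions.

```python
import math

D_CUT = 4.0  # angstroms, as in the text

def transform(p, params):
    """Rotate p by ZYZ Euler angles (a, b, g), then translate by (vx, vy, vz)."""
    a, b, g, vx, vy, vz = params
    ca, sa, cb, sb = math.cos(a), math.sin(a), math.cos(b), math.sin(b)
    cg, sg = math.cos(g), math.sin(g)
    x, y, z = p
    return (
        (ca * cb * cg - sa * sg) * x + (-ca * cb * sg - sa * cg) * y + ca * sb * z + vx,
        (sa * cb * cg + ca * sg) * x + (-sa * cb * sg + ca * cg) * y + sa * sb * z + vy,
        -sb * cg * x + sb * sg * y + cb * z + vz,
    )

def ner(params, pairs):
    """Equation 2 for a list of aligned (point in A, point in B) pairs."""
    return sum(math.exp(-(math.dist(transform(pa, params), pb) / D_CUT) ** 2)
               for pa, pb in pairs)

def gradient(params, pairs, h=1e-6):
    """Central finite-difference gradient of the score in all six parameters."""
    grad = []
    for i in range(6):
        up = list(params); up[i] += h
        dn = list(params); dn[i] -= h
        grad.append((ner(up, pairs) - ner(dn, pairs)) / (2 * h))
    return grad

def maximize(params, pairs, iterations=200):
    """Gradient ascent with step halving; only score-raising steps are accepted."""
    params = list(params)
    best = ner(params, pairs)
    for _ in range(iterations):
        grad = gradient(params, pairs)
        step = 1.0
        while step > 1e-8:
            trial = [p + step * g for p, g in zip(params, grad)]
            score = ner(trial, pairs)
            if score > best:
                params, best = trial, score
                break
            step /= 2
        else:
            break  # no improving step found: treat as converged
    return params, best

# Hypothetical toy structures: B is A shifted by (1.0, -0.5, 0.5).
A = [(-3.0, 0.0, 1.0), (0.0, 2.0, -1.0), (3.0, 0.0, 1.0), (0.0, -2.0, -1.0)]
pairs = [(p, (p[0] + 1.0, p[1] - 0.5, p[2] + 0.5)) for p in A]
start = ner([0.0] * 6, pairs)
params, best = maximize([0.0] * 6, pairs)
```

Because every accepted step raises the score, best is never lower than start, and it cannot exceed the number of aligned pairs. A conjugate gradient routine such as the one named above would typically need fewer iterations on harder cases.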

Bibliography

1. Standley DM, Toh H, Nakamura H. Detecting local structural similarity in proteins by maximizing number of equivalent residues. Proteins 2004;57(2):381-391.

2. McLachlan AD. Gene duplications in the structural evolution of chymotrypsin. J Mol Biol 1979;128(1):49-79.

3. Zemla A, Venclovas C, Reinhardt A, Fidelis K, Hubbard TJ. Numerical criteria for the evaluation of ab initio predictions of protein structure. Proteins 1997;Suppl 1:140-150.

4. Zemla A. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res 2003;31(13):3370-3374.

5. Maiorov VN, Crippen GM. Contact potential that recognizes the correct folding of globular proteins. J Mol Biol 1992;227(3):876-888.

6. Gerstein M, Levitt M. Using iterative dynamic programming to obtain accurate pairwise and multiple alignments of protein structures. Proc Int Conf Intell Syst Mol Biol 1996;4:59-67.

7. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes in C. Cambridge: Cambridge Univ. Press; 1992.

Created: 2012-07-13 (last edited: more than 1 year ago)