The rigid body fitting problem is classified by several points. The first point is the number of subunits to be fitted on the map. Fitting only one subunit on the map is called the single subunit fitting problem, whereas the fitting of more than one subunit is called the multiple subunit fitting problem. Another point is the locality of the map to be fitted by the subunits. The fitting of one or multiple subunits on the entire region of the given map is called as a global fitting problem, whereas the fitting on part of the given map, is called a partial fitting problem. In view of the computation costs, the single fitting problem is much easier than the multiple problem. An exhaustive search is often possible for the single problem, if the six degrees of freedom are properly discretized. The single global problem is also solved by the principal-axes transformations. When the number of subunits becomes large (such as > 10), the computation cost for the search increases exponentially.
The program gmfit superimposes molecular subunits into the density map of their complex. To reduce computational costs for the superimposiing, both subunits and complexes are transformed into GMM (Gaussian Mixture Model). Transformation of molecular models into GMM can be done usint another program gmconvert. The program is designed to superimpose multiple atomic models of subunits into a low-resolution 3D density map, however, it is also applied to superimposing two 3D density maps, or two atomic models.
The source code of gmfit is written in C assuming the compiler "gcc" in Linux environment. After you download the file "gmfit-src-[date].tar.gz", just type following commands:
tar zxvf gmfit-src-[date].tar.gz cd src makeThen you will find the execute file "gmfit" in the upper directory (../src).
We prepare two methods to input parameters for the program gmfit. The first method is using arguments of the command gmfit, the second method is preparing the parameter file for the program. An example of the argument is input a command as follows:
gmfit -cg 1afwAB_r10_g16.gmm -sg1 1afwA_g8.gmm -sg2 1afwB_g8.gmm -sa1 1afwA.pdb -sa2 1afwB.pdb -NI 1000The same calculation can be done using the parameter file method. If you prepare the following file "1afw.gmfit":
## <1afw.gmfit> ## -cg 1afwAB_r10_g16.gmm -sg1 1afwA_g8.gmm -sg2 1afwB_g8.gmm -sa1 1afwA.pdb -sa2 1afwB.pdb -NI 1000Lines starting with '#' are comments, not read by the program. You input just a following command, then the gmfit will start the calculation.
gmfit -script 1afw.gmfitYou can use both argument and a parameter file, such as follows:
gmfit -script 1afw.gmfit -NI 10If the same option (in this case '
-NI
') is assigned by both argument and script file, the option value given by the argument has a priority:
'-NI
' is set to '10
', not '100
'.
Details of these options and parameter files will be shown using the option '-h
'.
gmfit -h
The system for the gmfit is composed of two types of molecule:COMPLEX and SUBUNIT.
-cg
' option.
-cg : input GMM for the complex []Other options about the COMPLEX molecule are as follows. They are not obligatory.
-cm : input 3D density map files in CCP4 (*.map) format for the complex [] -ca : input atomic PDB file for the complex [] -cutoff : cutoff density for the map (-cm) if density < [-cutoff], it is regarded as zero density. [-10.000000]The SUBUNIT molecule is one of the element molecule composing the COMPLEX molecule. Their positions and orientations are optimezed by the program. We assume the SUBUNIT molecule is represented by both atomic model and GMM. The file for atomic model of the i-th SUBUNIT is assined by the option '
-sa[
i]
' (-sa1, -sa2, -sa3
,...).
The file for GMM of the i-th SUBUNIT is assined by the option '-sg[
i]
' (-sg1, -sg2, -sg3
,...).
-sa1,-sa2,-sa[x]: input atomic PDB file for 1st,2nd,x-th subunits (MAX_SUBUNIT 21)[ ] -sg1,-sg2,-sg[x]: input GMM for 1st,2nd,x-th subunits (MAX_SUBUNIT 21)[ ]
The program gmfit searches the optimal configurations of subunits by four steps: I-step (initialization), S-step (search), R-step (refinement) and D-step (detailed fitting). Among the four, the last D-step is optional; this step has better sensitivity than GMM-based fitting, but requires much larger computatinonal costs.
NI
configuration, only the best NS
configurations are optimized by the next
S-step (search).
Finaly, among NS
configuration, only the best NR
configurations are optimized by
the final R-step (refinement).
Number of initial configuration,
that of search configuration,
and that of refinement configuration,
are assigned by the options
'-NI
'.
'-NS
'.
and '-NR
', respectively.
For each step, the best NIO
, NSO
, NRO
, NDO
configurations are written.
-NI : number of initial configurations. [10] -NS : number of configurations for searching [10] -NR : number of configurations for refinement [10] -ND : number of configurations for detailed fitting[-1]
-NIO : number of output configuraions for I-step [4] -NSO : number of output configuraions for S-step [4] -NRO : number of output configuraions for R-step [4] -NDO : number of output configuraions for D-step [1]Detals for each step are summarized as follows.
The method for generating random initial configuraions is assigned by the option '-I
'.
-I : Initial Configuration. 'R':random, 'P':principal (pca) axis fittng, 'GL':Grid-Layout : 'CSS': Combination of Single-subunit Searches. 'F':from PDB files : 'X':do nothing. use input subunit config as the initial[R] : 'numeric': test for analytic forace/torque by numeric calculation [R]
-NI
is equal to 4,
the 4 orientations of the subunit are genrated by matching three principal axis of the subunit with those of the complex
by the order of their eigen values.
If the number -NI
is equal to 24,
orientations of the subunit are genrated by matching three principal axis of the subunit with those of the complex
without considering the order of their eigen values.
If the number -NI
is more than 24,
the random generation procedure (same as -I R
) is performed to generate aditional random initial configurations with the number more than 24.
-sg[x]
and -sa[x]
)
is used as the initial configuration.
-S : Search method [X]. 'SF':segmentation-fitting (SegFit), 'MSF':masked segmentaion-fitting : 'X': do nothingSegmentation-fitting(
-S SF
) or masked segmentation-fitting(-S MSF
)
is performed.
For local fitting or multiple-subunits fitting, we recommend to use masked segmentation-fitting(-S MSF
).
The method for the refinement is assigned by the option -R
.
-R : Refinement method. 'S'teepest Descent, 'X': do nothing [S] : 'STR':SteepestDescent_ForceTorque_TransRot_CoordinateDescentIn the detault setting,the local search will be performed by the steepest descent method (
-R S
).
-D : Detailed fitting method for Atom-vs-Voxel or Voxel-vs-Voxel.[X] : 'S'teepest Descent, 'X':do nothing : 'E':just evaluate detailed energy for configurations of R-step. : 'STR':SteepestDescent_ForceTorque_TransRot_CoordinateDescentThe D-step does not use GMM; it uses 3D map for the complex, atomic model or 3D map for the subunits. The option
-cm
for the 3D map file for the complex is necessary.
For the subuntis, atomic model files -sa1, -sa2, -sa[x]
, or
3D map files -sm1, -sm2, -sm[x]
are necessary.
Because the D-step employs more detailed models than GMM,
its results should be more accurate, but it requires a longer computational time,
especially steepest descent method (-D S
).
For faster compuatation, we recommend to use the option -D E
.
it just evaluate detailed energy for configurations of R-step without changing them,
and sort these condifugurations using the detrailed energy.
All the results will be written in the output directory, assigned by the option '-odir
'. The default output directory is the current directory (.
).
If -script [filename].gmfit
is given, output directroy assigned as [filename]
.
-odir : Output directory for optimizing results [.]In the output directory assigned by
-odir
, following files will be written.
init.sum.txt
: summary of iniital I-step configutations.
srch.sum.txt
: summary of search S-step configutations.
refn.sum.txt
: summary of refine R-step configutations.
detl.sum.txt
: summary of detail D-step configutations.
init_1.pdb, init_1.gmm, ... , init_[NIO].pdb, init_[NIO].gmm
: PDB and GMM files generated by I-step
srch_1.pdb, srch_1.gmm, ... , srch_[NSO].pdb, srch_[NSO].gmm
: PDB and GMM files generated by S-step
refn_1.pdb, refn_1.gmm, ... , refn_[NRO].pdb, refn_[NRO].gmm
: PDB and GMM files generated by R-step
detl_1.pdb, detl_1.gmm, ... , detl_[NDO].pdb, detl_[NDO].gmm
: PDB and GMM files generated by D-step
Symmetrical constrants can be assigned by the options
-csym
and -homounit[x]
:
The option -csym
assigns the symmetry group of the complex.
-csym :symmetric group type for the complex. 'C2','C3',..., 'C7','D2','D3'.[]
The option -homounit[n]
assigns id numbers of the homologous multimer group.
-homounit[n]:subunit id numbers for [n]-th homo multimer group. [n] is 1,2,3,...,[]
:subunit id numbers are given by camma-splited-form, such as '1,2,3','2,4,6,8,10'.
For example, if the subunits 1,2,3 are idential, and they have 120-degree rotational symmetry,
you should assign "-csym C3 -symunit1 1,2,3
".
If the complex is composed of two types of chains (alpha and beta), is composed of three alpha chains (subunit 1,2,3) and
three beta chains (subunits 4,5,6), and these three chains have 120-degree rotational symmetry,
you should assign "-csym C3 -homounit1 1,2,3 -homounit2 4,5,6
".
Please keep in mind that the symmetric constraint only works for subunit-GMMs with the same shape.
The "same" means the very strict sense; the subunit GMMs should contain
the "same" number of the "same"-shape gaussian distribution functions
with the "same" configurations. The pose (orientation and position) of the subunit GMM can be different.
If you do not have a special reason, you would be better assign the same GMM file for subunits under symmetric constraints.
If you use -tpdb
option of gmconvert
program, you can make the same shape GMM with the different pose
from a homo oligomeric PDB file (see tutorial).
For the first simple calculation, we will show procedures for superimpose two identical subunits into a simulated low resolution map of dimeric complex structure, using PDB entry 1afw. We have to prepare three PDB files, "1afwAB.pdb":the 3D structure for the dimer, "1afwA.pdb" :the 3D structure of the subunit A, "1afwB.pdb" :the 3D structure of the subunit B. First the atomic model 1afwAB.pdb is transformed into a low resotluion 3D density map with 10.0 angstrom resolution using the program gmconvert.
gmconvert A2V -ipdb 1afwAB.pdb -reso 10.0 -gw 2.0 -omap 1afwAB_r10.mapNotice the line
MIN_DENSITY_AMONG_ATOM_CENTERS 4.625308e-01 0.462531
is shown as the stdout output by "gmconvert A2V".
This value will be used to convert the map to GMM as the option -cutoff
.
Next, the map will be transformed into GMMs with Ngauss = 16 using the program gmconvert.
gmconvert V2G -imap 1afwAB_r10.map -cutoff 0.462531 -ng 16 -ogmm 1afwAB_r10_g16.gmmThe
-cutoff
value for the map is always important to make a GMM with a proper size.
gmconvert A2G -ipdb 1afwA.pdb -ng 8 -ogmm 1afwA_g8.gmmThen a following parameter file "1afw.gmfit" is prepared:
## <1afw.gmfit> ## -cg 1afwAB_r10_g16.gmm -sg1 1afwA_g8.gmm -sg2 1afwA_g8.gmm -sa1 1afwA.pdb -sa2 1afwA.pdb -I R -S SF -R S -D X -NI 100 -NS 100 -NR 10 -ND 0 -NRO 5Finally, the following command generates several result files into the directory '1afw'.
gmfit -script 1afw.gmfit
In the directory 1afw
, the file refn_1.pdb
is the best configuration.
The summary of candidate configs is described in refn_sum.txt
.
The script with C2 symmetry constraint is as follows.
The additional options -csym C2
and -homounit1 1,2
were added.
## <1afw_symC2.gmfit> ## -cg 1afwAB_r10_g16.gmm -csym C2 -sg1 1afwA_g8.gmm -sg2 1afwA_g8.gmm -sa1 1afwA.pdb -sa2 1afwA.pdb -homounit1 1,2 -I R -S SF -R S -D X -NI 100 -NS 100 -NR 10 -ND 0 -NRO 5The script file with Dstep
-D E
is as follows.
The additional options -cm
, -sa1
, -sa2
, -sreso1
, -sreoso2
are necessary.
## <1afw_D_E.gmfit> ## -cg 1afwAB_r10_g16.gmm -cm 1afwAB_r10.map -sg1 1afwA_g8.gmm -sg2 1afwA_g8.gmm -sa1 1afwA.pdb -sa2 1afwA.pdb -sreso1 10 -sreso2 10 -I R -S SF -R S -D E -NI 100 -NS 100 -NR 10 -ND 5 -NRO 5 -NDO 5
The next problem is fitting an atomic model of a complex into a density map of the same complex.
The yeast's ribosome structure is taken for the example.
The atomic model taken from PDB code 4v7r
is going to be fitted into the density map of EMDB-1067.
The mmCIF file for biological unit 1 "4v7r-assembly1.cif" should be used for our purpose.
The atomic model "4v7r-assembly1.cif" is converted to GMM using a following command.
Because the ribosome contains too many atoms, we recommend to add the option -atmsel R -atmrw R
to use residue-based model instead of atom-based model for fast GMM estimation.
gmconvert A2G -icif 4v7r-assembly1.cif -atmsel R -atmrw R -ng 10 -ogmm 4v7r-assembly1_g10.gmmFor the large atomic model such as ribosome, a simple down sampling algorithms of atoms is much faster than the EM algorithm. A following command considers a 100-angstrom-width cube corresponds to one Gaussian function.
gmconvert A2G -icif 4v7r-assembly1.cif -emalg D -dsatm E -wdsatm 100 -ogmm 4v7r-assembly1_w100.gmmBecause the current gmfit cannot read mmCIF format file, a PDB file has to be obtained from the mmCIF file using the gmconvert as follows:
gmconvert A -icif 4v7r-assembly1.cif -opdb 4v7r-assembly1.pdbNext, the density map "emdb_1067.map" is converted into GMM. The cutoff value 0.000477 was taken from contour_list level in emd-1067.xml.
gmconvert V2G -imap emd_1067.map -cutoff 0.000477 -ng 10 -ogmm emd_1067_g10.gmmIf you want to quick calculate GMM with small number of Gaussians for the large map, the options
-emalg DG
and -fdsmap 2
may be usefull.
gmconvert V2G -imap emd_1067.map -cutoff 0.000477 -emalg DG -fdsmap 2 -ng 10 -ogmm emd_1067_g10.gmmFinally, the program gmfit is executed by the following arguments:
## 4v7r-emd_1067.gmfit ## -cg emd_1067_g10.gmm -sg1 4v7r-assembly1_g10.gmm -sa1 4v7r-assembly1.pdb -I P -S X -R S -D X -NI 24 -NS 10 -NR 10 -NRO 5
gmfit -script 4v7r-emd_1067.gmfitYou will find the fitted PDB file "
refn_1.pdb
" in the directory "4v7r-emd_1067/".
In this tutorial, fitting a density map of E.coli ribosome into another density map of yeast ribosome.
Before the calculation, two data of density maps,
emd_1915.map
and
emd_1067.map
have to be downloaded from EMDB.
Next, these two density maps are converted into GMM with 10 Gaussian functions. To enhance the computational speed, we limit maximum voxel length to 64.
The cutoff values 254.0 and 0.000477 was taken from contour_list level in emd-1915.xml and emd-1067.xml, respectively.
gmconvert V2G -imap emd_1915.map -cutoff 254.0 -ng 10 -maxsize 64 -ogmm emd_1915_g10.gmm gmconvert V2G -imap emd_1067.map -cutoff 0.000477 -ng 10 -maxsize 64 -ogmm emd_1067_g10.gmmThe map
emd_1915.map
is superimposed into the map emd_1067.map
by a following command.
## emd-1915_emd-1067.gmfit ## -cg emd_1067_g10.gmm -sg1 emd_1915_g10.gmm -ochimera T -I P -S X -R S -D X -NI 24 -NR 10 -NRO 5
gmfit -script emd-1067_emd-1915.gmfitAfter the fitting calculation, the file "
refn_1.com
" is generated in the directory emd-1067_emd-1915/
containing the transforming commands for UCSF chimera.
To visualiaze superimposed density maps using UCSF Chimera, following three steps are required.
emd_1067.map
" by the menu [File]->[Open...].
emd_1915.map
" by the menu [File]->[Open...].
refn_1.com
" by the menu [File]->[Open...].
This tutotial is also a test calculation;superimposing three atomic subunit models into a simulated low resolution map of
their complex. In this case, we impose a C3 rotational symmetric constraint. First, you have to download PDB entry
1qu9 as
pdb1qu9.ent
, which is a homo trimeric structure. Next split this file into each chain 'A','B','C', and save as
1qu9A.pdb
, 1qu9B.pdb
, 1qu9C.pdb
, respectively.
The PDB file of the complex (pdb1qu9.ent
) is converted into a low resolution map as follows:
gmconvert A2V -ipdb pdb1qu9.ent -reso 10 -gw 2.0 -omap pdb1qu9_r10.mapThe mapfile "pdb1qu9_r10.map" is then converted into a GMM file "pdb1qu9_r10_g12.gmm". The cutoff value 0.708823 was decided by the standard output of the previous command "MIN_DENSITY_AMONG_ATOM_CENTERS 7.088230e-01 0.708823".
gmconvert V2G -imap pdb1qu9_r10.map -cutoff 0.708823 -ng 12 -ogmm pdb1qu9_r10_g12.gmmNext, the atomic model of the subunits with chain 'A' is converted into GMM.
gmconvert A2G -ipdb 1qu9A.pdb -ng 8 -ogmm 1qu9A_g8.gmmSimilarly, the models chains 'B' and 'C' can be converted into GMM, however, we assume that chains 'A','B' and 'C' are identical. Therefore, we can use "
1qu9A_g8.gmm
" for both chains B and C.
A following gmfit file is prepared as "1qu9.gmfit
".
Note that this script assumes no symmetrical constraint.
## 1qu9.gmfit ## -cg pdb1qu9_r10_g12.gmm -sg1 1qu9A_g8.gmm -sg2 1qu9A_g8.gmm -sg3 1qu9A_g8.gmm -sa1 1qu9A.pdb -sa2 1qu9A.pdb -sa3 1qu9A.pdb -I R -S SF -R S -D X -NI 1000 -NS 10 -NR 10 -NRO 5
Finally, the following command generates several result files into the directory '1qu9/
'.
gmfit -script 1qu9.gmfitYou will find the fitted PDB files as
1qu9/refn_1.pdb
.
To introduce C3-symmetry constraint, the options -csym C3
and -homounit1 1,2,3
should be added.
## 1qu9_symC3.gmfit ## -cg pdb1qu9_r10_g12.gmm -sg1 1qu9A_g8.gmm -sg2 1qu9A_g8.gmm -sg3 1qu9A_g8.gmm -sa1 1qu9A.pdb -sa2 1qu9A.pdb -sa3 1qu9A.pdb -csym C3 -homounit1 1,2,3 -I R -S SF -R S -D X -NI 1000 -NS 10 -NR 10 -NRO 5Then, the following command generates several result files into the directory '
1qu9_symC3/
'.
gmfit -script 1qu9.gmfitYou will find the fitted PDB files as
1qu9_symC3/refn_1.pdb
, which may be more fitted into the map
pdb1qu9_r10.map
and the trimeric model pdb1qu9.ent
.