(1)
where P(i -> j) is the transition probability that structure i changes to structure j during the evolutionary process, and P(i) is the probability that structure i appears by chance. i and j can represent any kind of 3D structural feature, such as secondary structure or the distance between residues. We estimated the transition probability with a Markov transition model, which is similar to Dayhoff's substitution model for amino acids. Matras uses the following three kinds of similarity scores.
SSE Score (Ssse)
A secondary structure element (SSE) is a continuous group of residues that is defined as an alpha-helix or a beta-strand. It is represented by a single vector defined by the principal inertial axis with the smallest moment. The spatial arrangement of a pair of SSEs is described by six parameters: the numbers of residues L1 and L2, the closest distance d between the two SSEs, the bond angles theta1 and theta2, and the dihedral angle phi. We made a log-odds score for each of the six parameters, and the total SSE score is the sum of the six terms.
Environment Score (Senv)
This score was defined for the environment states, which are a combination of local structure and solvent accessibility. The ten kinds of ``environment'' are defined by combining the five local structures and the two accessibility classes.
Distance Score (Sdis)
This score focuses on the distance between the Cbeta atoms of the i-th and j-th residues. The distance is discretized into histogram bins of 1 A (angstrom) width. A separate score is prepared for each residue separation k (=|i-j|). It is used in the final stage of alignment in our program, because it is the most sensitive of our three scores for detecting structural similarity.
% gunzip Matras[version].tar.gz
% tar xvf Matras[version].tar
A new directory ``Matras[version]'' appears.
% cd Matras[version]/src
% make
If you succeed, an executable file ``Matras'' is made in the parent directory of ``src''.
Matras reads an environment file named '.matras'.
You must put the '.matras' file in (1) your current directory or (2) your home directory.
A sample environment file, which is stored as "dot.matras" in the base directory, is shown below.
###############################
### MATRAS ENVIRONMENT FILE ###
###############################
BASE_DIR  /home/takawaba/work/Matras12
SCORE_DIR /home/takawaba/work/Matras12/data_sc/ROM-04JAN29
TMP_DIR   /home/takawaba/work/Matras12/tmpout
BSSP_DIR  /DB/BSSP
PDB_DIR   /DB/PDB
A line whose head is '#' is a comment, which Matras skips. Other lines are combinations of [Variable Name] [Value of Variable]. We will explain the important variables.
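The environment file format described above (comment lines starting with '#', otherwise a variable name followed by its value) can be read with a few lines of code. The following Python sketch is not part of Matras; the function name `parse_matras_env` is hypothetical, and it assumes that the name and the value are separated by whitespace, as in the sample file.

```python
def parse_matras_env(text):
    """Return {variable: value} for a '.matras'-style file (illustrative only)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines whose head is '#'.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # [Variable Name] [Value of Variable]
        if len(parts) == 2:
            env[parts[0]] = parts[1].strip()
    return env


# Sample content copied from the environment file shown in this section.
SAMPLE = """\
###############################
### MATRAS ENVIRONMENT FILE ###
###############################
BASE_DIR  /home/takawaba/work/Matras12
SCORE_DIR /home/takawaba/work/Matras12/data_sc/ROM-04JAN29
TMP_DIR   /home/takawaba/work/Matras12/tmpout
BSSP_DIR  /DB/BSSP
PDB_DIR   /DB/PDB
"""

env = parse_matras_env(SAMPLE)
print(env["SCORE_DIR"])
```

Such a helper may be handy for checking that the directories named in the file exist before running Matras.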
% dsspcmbi -c [pdb_file] [dssp_file]
% bssp.pl [dssp_file] [pdb_file] > [bssp_file]
% dsspcmbi -c pdb1mbd.ent 1mbd-.dssp
% bssp.pl 1mbd-.dssp pdb1mbd.ent > 1mbd-.bssp
I recommend adding ".bssp" to the end of the BSSP file name as a suffix. In principle, Matras assumes that one BSSP file contains only one protein chain. For a PDB file with multiple chains, you must make a new PDB file that contains only the chain you want to compare. The BSSP files must be located in one of the following three locations.
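For the single-chain requirement mentioned above, a new PDB file containing only one chain can be prepared with a small script. The following Python sketch is not part of Matras; the function name `extract_chain` and the file names in the comments are hypothetical. It relies only on the standard PDB convention that the chain identifier is in column 22 of ATOM and HETATM records.

```python
def extract_chain(pdb_lines, chain_id):
    """Keep the ATOM/HETATM records whose chain identifier (column 22) matches."""
    kept = []
    for line in pdb_lines:
        is_coordinate = line.startswith(("ATOM", "HETATM"))
        if is_coordinate and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    return kept


# Usage sketch (file names are examples only):
#   with open("pdb4hhb.ent") as f:
#       chain_a = extract_chain(f, "A")
#   with open("4hhbA.pdb", "w") as f:
#       f.writelines(chain_a)
```

Note that this sketch drops header, TER, and END records; whether dsspcmbi needs any of them is not covered here.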
% Matras
Matras shows simple help messages. More detailed help messages are shown using the following command:
% Matras H
% Matras P -A [bsspfileA] -B [bsspfileB]
For example, if you want to compare myoglobin (1mbdA.bssp) and the hemoglobin alpha chain (4hhbA.bssp), enter the following command:
% Matras P -A 1mbdA.bssp -B 4hhbA.bssp
If you want to read PDB files directly, you can execute Matras as follows:
% Matras P -A [pdbfileA] -Ac [ChainID for proteinA] -B [pdbfileB] -Bc [ChainID for proteinB]
When a PDB file is provided, Matras assigns its secondary structures from the dihedral angles (phi, psi) and the positions of the Calpha atoms. For example, if you want to compare the A chain of myoglobin (pdb1mbd.ent) and the A chain of hemoglobin (pdb4hhb.ent), enter the following command:
% Matras P -A pdb1mbd.ent -Ac A -B pdb4hhb.ent -Bc A
Then you get the following output in standard output.
#### MATRAS VER 1.2: PROGRAM FOR PROTEIN 3D STRUCTURE COMPARISON ####
# coded by Takeshi Kawabata. Last Modified : May 8, 2014
## Takeshi Kawabata and Ken Nishikawa.
# "Protein Structure Comparison Using the Markov Transition Model of Evolution".
# Proteins vol.41:108-122(2000).
# "Matras P -A 1mbdA.bssp -B 4hhbA.bssp "
# "Jul 16,2014 11:54:54"
# P:PAIRWISE COMPARISON
#
# ProAFile "1mbdA.bssp" ProBFile "4hhbA.bssp"
# SseAliType T EnvAliType T AlgType L
# ssescfile "3U" envscfile "T10-3U.rom" DisSc N disscfile "3U" DisScE - EnvType T Nenvstate 10
# GapExtE -6.0 GapExtD -100.0 Nkeep 35 Nrep 10 SseOffsetD 0.00
[ALIGN_RANK] 1
[PROTEIN A] 1mbdA Naa 153 Nsse 8 "MYOGLOBIN"
[PROTEIN B] 4hhbA Naa 141 Nsse 7 "HEMOGLOBIN (DEOXY) (ALPHA CHAIN)"
[ALIGNMENT] Ncomp_aa 141 Ncomp_sse 6
[SIMILARITY] Seq 27.0 % Sec 88.7 % Exp 82.3 % CRMS 1.56 A DRMS 1.40 A
[SCORE] ScSSE 676.0 ScEnv 4007.8 ScDis 149318.8 Rdis 70.7 (%) Rsse 44.8 (%)
[RELIABILITY] Superfamily 99.4 % Fold 99.4 %

: H1 H2 H3 H4 H5
SecA : HHHHHHHHHHHHHHGGGHHHHHHHHHHHHHHH HHHHTT TTTTT SHHHHHH HH
1 :VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASED: 60
*** * * ** * * * * * * * * * * * *
1 :VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFD-L----SHG-SAQ: 54
SecB : HHHHHHHHHHHHHHTTTHHHHHHHHHHHHHHH GGGGGG TTS - ----STT- HH
: H1 H2 - ---- - H3

: H6 H7
SecA :HHHHHHHHHHHHHHHHTTTT HHHHHHHHHHHHHTS HHHHHHHHHHHHHHHHHH G
61 :LKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHP: 120
* ** * ** * * ** * * * *
55 :VKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLP: 114
SecB :HHHHHHHHHHHHHHHHHTGGGHHHHTHHHHHHHHHTT THHHHHHHHHHHHHHHHH T
: H4 H5 H6

: H8
SecA :GG HHHHHHHHHHHHHHHHHHHHHHH
121 :GDFGADAQGAMNKALELFRKDIAAKYK: 147
* * * **
115 :AEFTPAVHASLDKFLASVSTVLTSKYR: 141
SecB :TT HHHHHHHHHHHHHHHHHHTTT
: H7
[ALIGN_RANK] 1
[PROTEIN A] 1mbd- Naa 153 Nsse 8 "MYOGLOBIN (DEOXY, $P*H 8.4)"
[PROTEIN B] 4hhbA Naa 141 Nsse 7 "HEMOGLOBIN (DEOXY)"
[ALIGNMENT] Ncomp_aa 141 Ncomp_sse 6
[SIMILARITY] Seq 27.0 % Sec 88.7 % Exp 82.3 % CRMS 1.56 A DRMS 1.40 A
[SCORE] ScSSE 676.0 ScEnv 4007.8 ScDis 149312.6 Rdis 70.7 (%) Rsse 44.8 (%)
[RELIABILITY] Superfamily 100.0 % Fold 100.0 %
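The bracketed summary lines of the pairwise output, such as [SIMILARITY], [SCORE], and [RELIABILITY], consist of a tag followed by name/value pairs, so they are easy to post-process. The following Python sketch is not part of Matras; the function name `parse_tagged_line` is hypothetical, and it is meant only for lines whose values are numeric (it is not suitable for the [PROTEIN A] lines, which contain free text).

```python
import re


def parse_tagged_line(line):
    """Split '[TAG] name value name value ...' into (tag, {name: float})."""
    m = re.match(r"\[([A-Z_ ]+)\]\s*(.*)", line)
    if m is None:
        return None
    tag, rest = m.groups()
    # A name is letters/underscores; units such as '%', '(%)' and 'A' are skipped.
    pairs = re.findall(r"([A-Za-z_]+)\s+(-?\d+(?:\.\d+)?)", rest)
    return tag, {name: float(value) for name, value in pairs}


tag, values = parse_tagged_line(
    "[SCORE] ScSSE 676.0 ScEnv 4007.8 ScDis 149312.6 Rdis 70.7 (%) Rsse 44.8 (%)"
)
print(tag, values["ScDis"], values["Rdis"])
```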
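$$ R_{dis}(A,B) = 100 \times \frac{S_{dis}(A,B) - S_{min}}{S_{max} - S_{min}} \qquad (2) $$
where Sdis(A,B) is the raw distance score between proteins A and B, and Smax and Smin are the maximum and minimum values of the score, respectively. We set Smin = 0, and Smax is defined as the average of the two self scores:
$$ S_{max} = \frac{S_{dis}(A,A) + S_{dis}(B,B)}{2} \qquad (3) $$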
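As a worked example of this normalization, the following Python sketch computes the normalized distance score from a raw score and the two self scores, with Smin = 0 and Smax taken as the average of the self scores. The function name `rdis` is hypothetical, and the formula is our reading of the description above; the input numbers are the ScDis values of 1mbd- and 4hhbA from the all-vs-all example output later in this document, which lists Rdis = 70.68 for this pair.

```python
def rdis(s_ab, s_aa, s_bb, s_min=0.0):
    """Normalized distance score (%), assuming Smax is the mean of the self scores."""
    s_max = (s_aa + s_bb) / 2.0
    return 100.0 * (s_ab - s_min) / (s_max - s_min)


# ScDis(1mbd-, 4hhbA), ScDis(1mbd-, 1mbd-), ScDis(4hhbA, 4hhbA)
value = rdis(149312.6, 227844.9, 194641.9)
print(round(value, 2))  # 70.68
```

The same calculation gives 65.14 for 1mbd-/1ecd- and 76.91 for 4hhbA/4hhbB, in agreement with the listed Rdis values.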
The protein assigned by '-A' has the chain identifier 'A', and the one assigned by '-B' has 'B'.
If you don't want to write this file, add the option '-op -'.
% rasmol 1.pdb
RasMol> script "1.ras"
Then RasMol shows the colored structure pair. If you don't want to write this file, add the option '-or -'.
The molecule with the chain ID given by '-Ac' is superimposed onto the molecule with the chain ID given by '-Bc'.
However, other ligand molecules in the file '-A' are also transformed and saved in the file '-opdb' or '-opdbA'.
This function is useful for studies of ligand-protein interactions (such as kcombu and fkcombu).
Both PDB files assigned by '-A' and '-B' are written.
The molecules in the file '-A' are transformed to superimpose on the molecule in the file '-B'.
The molecules in the file '-B' are fixed.
Only the '-A' PDB file is written.
The file contains only the molecules in the file '-A', which are transformed to superimpose on the molecule in the file '-B'.
% Matras P -A pdb1mbd.ent -Ac A -B pdb4hhb.ent -Bc A -opdbA super_1mbd.pdb
Then you will find the superimposed myoglobin's HEM in the file 'super_1mbd.pdb'.
When ligand molecules are stored in a file separate from the protein file, the following options are useful.
'-A').
'-B').
This corresponds to the input ligand PDB file assigned by '-ilgA'.
% Matras P -A pdb1mbd.ent -Ac A -B pdb4hhb.ent -Bc A -ilgA HEM_1mbdA.pdb -olgA super_HEM_1mbdA.pdb
If you add the option '-SO T', Matras calculates sub-optimal alignments. The default is ``F'' (false).
If Matras recognizes more than one alignment, it shows all the alignments in stdout, and writes
the corresponding RasMol scripts as ``1.ras'', ``2.ras'', ..., ``n.ras'' and
the corresponding matrix files as ``1.mat'', ``2.mat'', ..., ``n.mat''.
If you add the option '-SQ T', Matras calculates an alignment
considering only amino acid types, not any 3D structural features.
The BLOSUM62 score is used; the gap extension penalty is -1, and the gap opening penalty is -11.
Assign the same file to both the `-A' and `-B' options, and add the `-SA T' option.
% Matras P -A [bsspfile] -B [bsspfile] -SA T
For example, if you want to find repeated structures of triose phosphate isomerase (TIM), run the following command.
% Matras P -A 1timA.bssp -B 1timA.bssp -SA T
You can use any other options for pairwise 3D alignments.
% Matras L -Q [query_bsspfile] -L [library_listfile] > [result_file]
The file [library_listfile] contains the names of the BSSP files for the structure library. The format of the library list file is shown in the Appendix.
$$ Z(q,l) = \frac{S(q,l) - E(q)}{\sigma(q)} \qquad (4) $$
where q and l represent proteins, S(q,l) is the similarity score of proteins q and l, and E(q) and sigma(q) are the average value and the standard deviation of the score of protein q over the database. Our distance score Sdis correlates with the square of Ncomp, which is the number of compared residues. We therefore employed a quadratic normalization, in which E(q) and sigma(q) in the equation are determined by least-squares fitting of the similarity score. The regression line
$$ S_{qreg}(q,l) = A_q \, N_{comp}(q,l)^2 + B_q \qquad (5) $$
is calculated for the score S(q,l) of the query protein q against proteins l stored in the database, by the fitting parameters Aq and Bq. Then, the E(q) in the equation is replaced by Sqreg(q,l). sigma(q) is obtained by the averaged error of the regression line,
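$$ \sigma(q) = \sqrt{ \frac{1}{N_{pro}} \sum_{l} \left[ S(q,l) - S_{qreg}(q,l) \right]^2 } \qquad (6) $$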
where Npro is the number of protein chains in the database. Because we assume that all the parameters Aq, Bq, and sigma(q) should be derived from non-homologous proteins, we repeat the estimation twice. First, all the proteins in the library are used to estimate the parameters. Using these parameters, we calculate the Z-score for each protein. To extract non-homologous proteins, we choose the proteins with Z-score <= 4.0, and re-estimate the parameters and Z-scores using the extracted non-homologous proteins.
% Matras L -Q 4azuA.bssp -L 30scop1.71nm.list -R q -zt 5
The query structure is ``4azuA'', and the library list file is ``30scop1.71nm.list'', which is the 30% representative list of structural domains registered in SCOP 1.71.
#### MATRAS VER 1.2: PROGRAM FOR PROTEIN 3D STRUCTURE COMPARISON #### # coded by Takeshi Kawabata. Last Modified : Apr 15, 2007 ## Takeshi Kawabata and Ken Nishikawa. # "Protein Structure Comparison Using the Markov Transition Model of Evolution". # Proteins vol.41:108-122(2000). # "Matras L -Q 4azuA.bssp -L 30scop1.71nm.list -R q -zt 5 " # "Apr 15,2007 11:40:0" # L:ONE-VS-LIBRARY COMPARISON # # QueFile "L" LibFile "4azuA.bssp" # SseAliType T EnvAliType T AlgType L # ssescfile "3U" envscfile "T10-3U.rom" DisSc N disscfile "3U" DisScE - EnvType T Nenvstate 10 # GapExtE -6.0 GapExtD -100.0 Nkeep 35 Nrep 10 SseOffsetD 1 [QUERY_PROTEIN] 4azuA.bssp [QUERY_COMPND] AZURIN (PH 5.5) [QUERY_SIZE] Naa 128 Nsse 10 [MAX_NAAB] 1450 [LIBRARY_FILE] 30scop1.71nm.list [LIBRARY_SIZE] 5931 [WAY_OF_RANKING] DisSc Ncmp^2-fit Zsc after plain Zsc filter( 2.219025 *x*x + -68.313 S D 1781.572 ) [Z_THRESHOLD] 5.000000 [NRANK] 123 BEST_SCORE_RANKING] rk entry start end Rdis Zsc SqID MOLECULAR NAME 1 1jzgA 1 128 93.7 71.54 100.0 "AZURIN" 2 1e30A 37 155 35.2 28.46 16.3 "RUSTICYANIN" 3 1fwxA1 486 579 34.4 24.13 18.5 "NITROUS OXIDE REDUCTASE" 4 1oe1A1 31 151 26.6 21.19 19.4 "DISSIMILATORY COPPER-CONTAINING NITRITE" 5 1gskA1 25 175 24.5 21.10 6.9 "SPORE COAT PROTEIN A" 6 1cyx- 126 225 23.8 19.44 10.2 "CYOA" 7 1aozA1 3 122 33.2 19.39 13.5 "ASCORBATE OXIDASE (E.C.1.10.3.3)" 8 1kv7A1 43 163 29.6 19.30 15.0 "PROBABLE BLUE-COPPER PROTEIN YACK" 9 2cuaA 78 167 30.0 18.32 17.0 "CUA" 10 1hfuA1 5 127 32.6 18.27 12.0 "LACCASE 1" : 121 1ulvA2 689 771 18.1 5.10 6.2 "GLUCODEXTRANASE" 122 1ti6B1 196 263 15.9 5.04 13.2 "PYROGALLOL HYDROXYTRANSFERASE SMALL SUBUNI T" 123 1wmdA1 319 434 18.0 5.02 9.5 "PROTEASE" [BEST_SCORE_RANKING_WITH_DETAILED_INFORMATION] rk entry Naa Ncmp SqID rms Ssse Rsse Sdis Rdis Zsc RelS RelO TAXONOMY 1 1jzgA 128 128 100.0 0.8 2618 94.7 163738 93.7 71.54 87.7 94.2 [b.6.1.1] 2 1e30A 153 104 16.3 2.7 1136 26.7 74639 35.2 28.46 83.2 92.1 [b.6.1.1] 3 1fwxA1 132 92 18.5 2.0 1015 33.6 61703 34.4 
24.13 82.3 91.5 [b.6.1.4] 4 1oe1A1 159 98 19.4 3.3 669 23.6 59002 26.6 21.19 80.5 91.0 [b.6.1.3] 5 1gskA1 174 102 6.9 3.8 741 26.1 60602 24.5 21.10 80.4 91.0 [b.6.1.3] 6 1cyx- 158 88 10.2 2.3 932 22.6 51757 23.8 19.44 78.8 90.8 [b.6.1.2] 7 1aozA1 129 104 13.5 5.1 835 36.7 58476 33.2 19.39 78.8 90.8 [b.6.1.3] 8 1kv7A1 140 100 15.0 4.5 831 25.9 56499 29.6 19.30 78.7 90.7 [b.6.1.3] 9 2cuaA 122 88 17.0 2.3 896 32.7 49748 30.0 18.32 77.4 90.6 [b.6.1.2] 10 1hfuA1 131 108 12.0 5.3 805 31.7 58371 32.6 18.27 77.3 90.6 [b.6.1.3] : 121 1ulvA2 89 81 6.2 4.1 619 28.3 23580 18.1 5.10 28.0 53.6 [b.1.18.2] 122 1ti6B1 79 68 13.2 3.9 407 18.4 19175 15.9 5.04 27.5 53.1 [b.3.5.1] 123 1wmdA1 116 95 9.5 5.9 465 21.2 28902 18.0 5.02 27.4 52.8 [b.18.1.20] [BEST_RANKING_WITH_ONE_LINE_SECONDARY_STRUCTURE] rk entry staQ |staL | . . . . : . . . . + . . | endL| endQ (query ) 1| ---|ccEEEccccEEccEEEEccHHHcEEcHHHHHHccccccEEcccEEEEccccEEccccEEE|--- |128 1 1jzgA 1| 1|ccEEEccccEEccEEEEccHHHcEEcHHHHHcccccccEEcccEEEEccccEEccccEEE|128 |128 2 1e30A 2| 37|-EEEEccEEEEccEEEEcc--ccEEcccccc---ccccEEcccEEEc-ccEEEccccEEE|155 |128 3 1fwxA1 2| 486|-EEEEEcEcEEEcEEEEcccccEEc-------------EEEEcEEEEc-cEEEccccEEE|579 |128 4 1oe1A1 1| 31|cEEEEEEEcEEEcEEEEcc-cccEccc----------ccccccEEEEc-cEEEcHHHEEE|151 |128 5 1gskA1 2| 25|-EEEEEEEcEEEccEEEccccccEEcc----------cccccccEEcc-cEEEcHHHEEE|175 |128 6 1cyx- 2| 126|-EEEEEcEEEEEccEEEc---cEEc-------------EEEEccEEc-ccEEEcccccEE|225 |128 7 1aozA1 2| 3|-EEEEEEEcEEEcEEEEcc-cccEEccccc---ccc---ccccEEEE-ccEEEccccEEE|122 |128 8 1kv7A1 1| 43|ccEEEEEEcEEEcEEEEc--cccEEcccc----cc-----cccEEEEcccEEEcHHHcEE|163 |128 9 2cuaA 2| 78|-EEEEEcEcEEEcEEEEc---cEEc-------------cEEEccEEEcccEEEccccEEE|167 |128 10 1hfuA1 3| 5|-cEEEEEEcEEEccEEEccccccEEccccc---ccc---ccccEEEccccEEEccccEEE|127 |128 : 121 1ulvA2 10| 689|-----ccEEEEEcEEEE----cEEEcE-----------EEcc-EEEE-cEEEEEccEEEE|771 |128 122 1ti6B1 28| 196|-------------cEEEEcEccEEEc-----------EEEEccEEEEEE--EEEcE-EEE|263 |128 123 1wmdA1 5| 
319|--cEEEcEEEEcccEEEccccEEEEc-----------cEEcccEEEccc-EEEEEcEEEE|434 |127 [CLUSTALW_STYLE_ALIGNMENT] CLUSTAL W (1.82) multiple sequence alignment QUERY AECSVDIQGNDQMQFNTNAITVDKSCKQFTVNLSHPGNLPKNVMGHNWVLSTAADMQGVV 1jzgA AECSVDIQGNDQMQFNTNAITVDKSCKQFTVNLSHPGNLPKNVMGHNWVLSTAADMQGVV 1e30A -TVHVVAAAVPFpSFEVPTLEIPAGA-TVDVTFINTNKG----FGHSFDITKK-GPp--Y 1fwxA1 -KVRVYMSSV-ApSFSIESFTVKEGD-EVTVIVTNLDEID--DLTHGFTMGN-------- 1oe1A1 KVVEFTMTIEEKMTFNGPTLVVHEGD-YVQLTLVNPATN---AMPHNVDFHGATG----- 1gskA1 -KTYYEVTMEECWGYNGPTIEVKRNE-NVYVKWMNNLPSTHPEVKTVVHLHGGVT----- 1cyx- -PITIEVVSM-DWKWFFNEIAFPANT-PVYFKVTSNS------VMHSFFIPR-------- 1aozA1 -IRHYKWEVEYMMGINGPTIRANAGD-SVVVELTNKLH----TEGVVIHWHGILQRGTPW 1kv7A1 DRNRIQLTIGAGWGYNGPAVKLQRGK-AVTVDIYNQL-----TEETTLHWHGLEVPGEVD 2cuaA -QYTVYVLAF-AfGYQpNpIEVPQGA-EIVFKITSPD------VIHGFHVEG-------- 1hfuA1 --SVDTMTLTNAILVNGPLIRGGKND-NFELNVVNDLDNPTMLRPTSIHWHGLFQRGTNW : 1ulvA2 ---------LSSPELSVTApESTADSATAVVRGTT--------NAAKVYVSVNGT----- 1ti6B1 ---------------------------KNYVTAGILVQGDCF-EGAKVVLKSGG------ 1wmdA1 ----AYVSSLSTSQKATYSFTATAGK-PLKISLVWSDAPVTLVNDLDLVITAPN------ QUERY TDGMASGLDKDYLKPDDSRVIAHTKLIGSGEKDSVTFDVSKLKEGEQYMFFCTFPGHSAL 1jzgA TDGMASGLDKDYLKPDDSRVIAHTKLIGSGEKDSVTFDVSKLKEGEQYMFFCTFPGHSAL 1e30A AV-M--------PV--IDpIVAGTGFSPVPGYTNFTWH---PTA-GTYYYVCQIPGHAAG 1fwxA1 -------------------YGVAME-IGPQMTSSVTFVAAN---PGVYWYYCQWFALHME 1oe1A1 ------------------ALGGALTNVNPGEQATLRFKADR---SGTFVYHCAPMWHVVG 1gskA1 -----------------PDDSDGYAWFSKDFREVYHYPNQQ--RGAILWYHDHARLNVYG 1cyx- -------------------LGSQIY-AMAGMQTRLHLI---ANEPGTYDGICAEIPGHSG 1aozA1 ADG-------TASI--------SQCAINPGETFFYNFT---VDNPGTFFYHGHLGMQRSG 1kv7A1 G---------GPQ-----------GIIPPGGKRSVTLNVD--QPAATCWFHPHQHRQVAG 2cuaA -------------------TNINVE-VLPGEVSTVRYTFK--RP-GEYRIICNQYLGHQN 1hfuA1 ADG-------ADGV--------NQCPISPGHAFLYKFTPA--GHAGTFWYHSHFGTQYCG : 1ulvA2 --------------------ATEAPVTD--GTFSLDVAL--TGAKNKVTVAAVAADG-GT 1ti6B1 ------------------KEVASAETNFF-GEFKFDALDNGE-----YTVEIDADGKS-- 1wmdA1 
------------------GTQYVGNWDGRNNVENVFIN-APQS--GTYTIEVQAYNVpQT QUERY MKGTLTLK 1jzgA MKGTLTLK 1e30A QFGKIVVK 1fwxA1 MRGRMLVE 1oe1A1 MSGTLMVL 1gskA1 LVGAYIIH 1cyx- MKFKAIAT 1aozA1 LYGSLIVD 1kv7A1 LAGLVVIE 2cuaA MFGTIVVK 1hfuA1 LRGPMVIY : 1ulvA2 AVEDRTVL 1ti6B1 YSDTVVID 1wmdA1 FSLAIVN- [ALIGNMENTS] >1 1jzgA [b.6.1.1] "AZURIN" #Naa 128 start 1 end 128 SqID 100 % crms 0.8 Ssse 2618 Sdis 163738 Rdis 93.7 Z 71.54 : E1 E2 E3 H1 E4 H2 SecA : TTEEEEEB TTS BS SEEEE TT SEEEEEEE SS HHHH B EEEETTTHHHHH 1:AECSVDIQGNDQMQFNTNAITVDKSCKQFTVNLSHPGNLPKNVMGHNWVLSTAADMQGVV:60 ************************************************************ 1:AECSVDIQGNDQMQFNTNAITVDKSCKQFTVNLSHPGNLPKNVMGHNWVLSTAADMQGVV:60 SecB : EEEEEB TTS BS SEEEE TT SEEEEEEE SSS HHHH B EEEEGGGHHHHH : E1 E2 E3 H1 E4 H2 : E5 E6 E7 SecA :HHHHHH GGGTTS TT TT SEE B TT EEEEEEEGGGS TT EEEE STTTTTT 61:TDGMASGLDKDYLKPDDSRVIAHTKLIGSGEKDSVTFDVSKLKEGEQYMFFCTFPGHSAL:120 ************************************************************ 61:TDGMASGLDKDYLKPDDSRVIAHTKLIGSGEKDSVTFDVSKLKEGEQYMFFCTFPGHSAL:120 SecB :HHHHTT GGGTTS TT TT EE B TT EEEEEEEGGG TT EEEE STTGGGT : E5 E6 E7 : E8 SecA :SEEEEEE 121:MKGTLTLK:128 ******** 121:MKGTLTLK:128 SecB :SEEEEEE : E8 // >2 1e30A [b.6.1.1] "RUSTICYANIN" #Naa 153 start 37 end 155 SqID 16 % crms 2.7 Ssse 1136 Sdis 74639 Rdis 35.2 Z 28.46 : E1 ---- ----- E2 E3 H1 E4 SecA :TTEEEEEB ----TTS BS -----SEEEE TT SEEEEEEE SS HHHH B EEEE 2:ECSVDIQGN----DQMQFNT-----NAITVDKSCKQFTVNLSHPGNLPKNVMGHNWVLST:52 * * * ** 37:TVHVVAAAVLPGFPFpSFEVHDKKNPTLEIPAGA-TVDVTFINTNKG----FGHSFDITK:91 SecB :EEEEEEEES TTS SS EEETTEES EEEE TT -EEEEEEEE TT---- EEES :E4 E5 E6 E7 -E8 ---- E9 : H2 E5 ----- E6 SecA :TTTHHHHHHHHHHH GGGTTS TT TT SEE B TT----- EEEEEEEGGGS TT 53:AADMQGVVTDGMASGLDKDYLKPDDSRVIAHTKLIGSG-----EKDSVTFDVSKLKEGEQ:107 * * * * 92:K-GPp--YAV-M--------PV--IDpIVAGTGFSPVPKDGKFGYTNFTWH---PTA-GT:133 SecB : - SS-- S-S-------- -- SEEEEB BTTEEEEEEEEE --- S-EE : - -- - -------- -- E10 E11 --- -E1 :E7 - E8 SecA :EEEE STTTTT-TSEEEEEE 
108:YMFFCTFPGHSA-LMKGTLTLK:128 * * *** * * * 134:YYYVCQIPGHAATGQFGKIVVK:155 SecB :EEEE STTTTTTT EEEEEE :2 E13 // ### : ### ### : ### ### SKIPPING THE PAIRWISE ALIGNMENTS FROM THE 3RD TO THE 122-TH ### ### : ### ### : ### >123 1wmdA1 [b.18.1.20] "PROTEASE" #Naa 116 start 319 end 434 SqID 9 % crms 5.9 Ssse 465 Sdis 28902 Rdis 18.0 Z 5.02 :1 --- E2 E3 ------ H1 E4 SecA :EEE---EB TTS BS SEEEE TT SEEEEEEE SS ------ HHHH B EEEETTT 5:VDI---QGNDQMQFNTNAITVDKSCKQFTVNLSHPGNL------PKNVMGHNWVLSTAAD:55 * * * * * * 319:AYVNESSSLSTSQKATYSFTATAGK-PLKISLVWSDAPASTTASVTLVNDLDLVITAPN-:376 SecB : EEEEEEEE TT EEEEEEEE TTS- EEEEEE TT S SEEEEEEE TT- : E1 E2 - E3 E4 - :H2 E5 --------- E6 SecA :HHHHHHHHHHH GGGTTS TT TT SEE --------- B TT EEEEEEEGGGS TT 56:MQGVVTDGMASGLDKDYLKPDDSRVIAHTK---------LIGSGEKDSVTFDVSKLKEGE:106 * * :-----------------------GTQYVGNDFTSpYNDNWDGRNNVENVFIN-APQS--G:410 SecB :-----------------------S EEETT SSSTTS SS SEEEEEES-S S--E :----------------------- E5 E6 - --E : E7 --- E8 SecA : EEEE STTT---TTTSEEEEEE 107:QYMFFCTFPGH---SALMKGTLTL:127 * 411:TYTIEVQAYNVPVGpQTFSLAIVN:434 SecB :EEEEEEEEEE SS EEEEEEEE :7 E8 //
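The quadratic Z-score normalization used for the ranking above can be sketched in a few lines of Python. This is not the Matras source code: the function names are hypothetical, the regression is assumed to be S = A * Ncomp^2 + B, and sigma is assumed to be the root-mean-square residual divided by the number of proteins. With the fit reported in the [WAY_OF_RANKING] line of the output (2.219025 *x*x + -68.313, SD 1781.572), the sketch reproduces the listed Z-scores of the first two hits.

```python
import math


def fit_quadratic(ncomps, scores):
    """Least-squares fit of S = A * Ncomp**2 + B. Returns (A, B, sigma)."""
    xs = [n * n for n in ncomps]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(scores) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Assumption: sigma is the RMS residual over all fitted proteins.
    residuals = [(y - (a * x + b)) ** 2 for x, y in zip(xs, scores)]
    return a, b, math.sqrt(sum(residuals) / count)


def z_score(score, ncomp, a, b, sigma):
    """Z-score with E(q) replaced by the regression value at Ncomp."""
    return (score - (a * ncomp * ncomp + b)) / sigma


def two_pass_z_scores(ncomps, scores, z_cut=4.0):
    """Fit on all proteins, then refit on those with Z <= z_cut and rescore all."""
    a, b, sigma = fit_quadratic(ncomps, scores)
    keep = [i for i in range(len(scores))
            if z_score(scores[i], ncomps[i], a, b, sigma) <= z_cut]
    a, b, sigma = fit_quadratic([ncomps[i] for i in keep],
                                [scores[i] for i in keep])
    return [z_score(s, n, a, b, sigma) for s, n in zip(scores, ncomps)]


# Parameters reported in the 4azuA example output.
A, B, SIGMA = 2.219025, -68.313, 1781.572
print(round(z_score(163738, 128, A, B, SIGMA), 2))  # 1jzgA: listed Zsc 71.54
print(round(z_score(74639, 104, A, B, SIGMA), 2))   # 1e30A: listed Zsc 28.46
```

The sketch does not guard against degenerate input, such as a library in which all Ncomp values are equal or in which the fit is exact (sigma = 0).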
% Matras A -L [library_listfile]
The format of the library list file is described in the appendix. If the following list file is used as input,
1mbd-
1ecd-
4hhbA
4hhbB
the following result will be obtained:
#### MATRAS VER 1.2: PROGRAM FOR PROTEIN 3D STRUCTURE COMPARISON ####
# coded by Takeshi Kawabata. Last Modified : Feb 6, 2004
## Takeshi Kawabata and Ken Nishikawa.
# "Protein Structure Comparison Using the Markov Transition Model of Evolution".
# Proteins vol.41:108-122(2000).
# "Matras A -L globinlist "
# "May 9,2004 11:20:5"
# A:ALL-VS-ALL COMPARISON
#
# LibFile "globinlist"
# SseAliType T EnvAliType T AlgType L
# ssescfile "3U" envscfile "T10-3U.rom" DisSc N disscfile "3U" DisScE - EnvType T Nenvstate 10
# GapExtE -6.0 GapExtD -100.0 Nkeep 35 Nrep 10 SseOffsetD 0
#[Matras A -L globinlist ]
#Nlibrary 4 MaxNaaLib 0
#READ ALL THE STRUCTURE
#Nlib 4 Ncomb 10
#AVAbunshi/bunbo [0]/[1]
#Npair_start 0 Npair_end 10 Npair_to_be_calculated 9
#MALLOC FOR DP:MaxNaaA 153 MaxNaaB 153
#COLDEF [proA] [proB] [NaaA] [NaaB] [Ncomp] [ScSSE] [ScEnv] [ScDis] [SqID] [CRMS] [Rdis] [Rsse]
1mbd- 1mbd- 153 153 153 1715.7 5526.4 227844.9 100.00 0.00 100.00 100.00
1mbd- 1ecd- 153 136 136 1313.6 4125.9 132476.6  20.59 1.65  65.14  76.11
1mbd- 4hhbA 153 141 141  676.0 4007.8 149312.6  26.95 1.56  70.68  44.82
1mbd- 4hhbB 153 146 145 1017.1 4428.8 157392.8  24.83 1.62  72.03  67.47
1ecd- 1ecd- 136 136 136 1736.2 4798.5 178893.6 100.00 0.00 100.00 100.00
1ecd- 4hhbA 136 141 131  610.5 3704.9 106591.4  18.32 2.40  57.07  40.21
1ecd- 4hhbB 136 146 136  889.4 3730.7 114607.0  19.12 2.25  59.07  58.61
4hhbA 4hhbA 141 141 141 1300.4 4817.5 194641.9 100.00 0.00 100.00 100.00
4hhbA 4hhbB 141 146 139  685.5 4124.9 155279.5  43.88 1.45  76.91  52.74
4hhbB 4hhbB 146 146 146 1299.1 5031.7 209173.0 100.00 0.00 100.00 100.00
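The all-vs-all result is a whitespace-separated table whose column names are given in the #COLDEF line, so it can be loaded into a script for further analysis. The following Python sketch is not part of Matras; the function name `parse_all_vs_all` is hypothetical, and it assumes that every data line has one field per #COLDEF column and that all columns after the two protein names are numeric.

```python
import re


def parse_all_vs_all(text):
    """Return a list of dicts, one per data row, keyed by the #COLDEF names."""
    columns, rows = None, []
    for line in text.splitlines():
        if line.startswith("#COLDEF"):
            columns = re.findall(r"\[(\w+)\]", line)
        elif columns and line.strip() and not line.startswith("#"):
            fields = line.split()
            if len(fields) != len(columns):
                continue  # not a data row
            row = dict(zip(columns[:2], fields[:2]))  # proA, proB stay strings
            row.update((name, float(value))
                       for name, value in zip(columns[2:], fields[2:]))
            rows.append(row)
    return rows


# Three rows copied from the example output above.
SAMPLE = """\
#COLDEF [proA] [proB] [NaaA] [NaaB] [Ncomp] [ScSSE] [ScEnv] [ScDis] [SqID] [CRMS] [Rdis] [Rsse]
1mbd- 1mbd- 153 153 153 1715.7 5526.4 227844.9 100.00 0.00 100.00 100.00
1mbd- 4hhbA 153 141 141 676.0 4007.8 149312.6 26.95 1.56 70.68 44.82
4hhbA 4hhbA 141 141 141 1300.4 4817.5 194641.9 100.00 0.00 100.00 100.00
"""

rows = parse_all_vs_all(SAMPLE)
print(rows[1]["proA"], rows[1]["proB"], rows[1]["Rdis"])
```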
Multiple 3D alignments are made with the script `mulmat.pl', which is
in the BASE_DIR directory. This script calls the Matras program
several times to get pairwise alignments, and it makes a multiple
alignment by assembling these pairwise alignments.
`mulmat.pl' executes the Matras program,
and stores all the results of pairwise alignment in the
TMP_DIR directory (assigned in the `.matras' file).
If you run `mulmat.pl' without any arguments,
the following help messages are shown:
% mulmat.pl [str1] [str2]... [strN] (-options)
 for 'mul'tiple 3D alignment using 'Mat'ras
 written by Takeshi Kawabata. LastModDate :Dec 26, 2003
 -F : strucutre list file[]
 -TMP_DIR : temporary output dir[/home/takawaba/work/Matras12/tmpout]
 -RES_DIR : result output dir[.]
 -ad : alignment file directory[]
 -ow : Outfile in ClustalW[-]
 -ov : Outfile in Vertical style[]
 -ovp : Outfile in Vertical style with Plain Residue Num[]
 -oh : Outfile in Horizontal style[]
 -ohs : Outfile in Horizontal SecStr[]
 -ohtml : Outfile in Horizontal SecStr HTMLfile []
 -ocon : Outfile for consensus sequence []
 -opdb : Outfile for sup-imposed PDBs[]
 -oph : Outputfile for guided UPGMA tree[]
 -ops : Outputfile PSI-BLAST multiple alignment[]
 -OS : Output StrType 'B'ssp, 'P'db [B]
 -rhead : header of all the result outputfile[]
 -thead : header of all the temporary outputfile[]
 -so : Matras SubOptimal[F]
 -QO : seQuence Order ('T'ree)[T]
 -dmat : Output distance matrix file[]
 -smat : Output similarity matrix file[smat]
 -rm : Remove Temoporary File (T|F) [F]
 -M : do MATRAS (T|F) [T]
 -T : do Tree (T|F) [T]
Its basic procedure to run is as follows:
% mulmat.pl [bsspfile1] [bsspfile2] [bsspfile3] ....
We show an example of a multiple alignment of 1mbd-.bssp, 1ecd-.bssp, 4hhbA.bssp and 4hhbB.bssp.
You can omit their tail string `.bssp'.
% mulmat.pl 1mbd- 1ecd- 4hhbA 4hhbB
If you want to compare many structures, we recommend that you make a file that contains the protein names (one protein per line), and assign the file using the ``-F'' option. For example, first make the following file named ``listfile'',
1mbd-
1ecd-
4hhbA
4hhbB
and execute ``mulmat.pl'' using the following option.
% mulmat.pl -F listfile
Please note that the current version of the program mulmat.pl reads only BSSP files, not PDB files. We will address this limitation in the next version of Matras.
% rasmol [output pdbfile]
RasMol> script "mulgrp.ras"
[RDIS(%)]
1mbd-   0.0  65.1  70.7  72.0
1ecd-  65.1   0.0  57.1  59.1
4hhbA  70.7  57.1   0.0  76.9
4hhbB  72.0  59.1  76.9   0.0
[RMS(A)] #for aligned Calpha atoms
1mbd- 0.000 1.652 1.560 1.617
1ecd- 1.652 0.000 2.397 2.252
4hhbA 1.560 2.397 0.000 1.451
4hhbB 1.617 2.252 1.451 0.000
[DRMS(A)] #for aligned Cbeta atoms
1mbd- 0.000 1.462 1.398 1.417
1ecd- 1.462 0.000 1.916 1.875
4hhbA 1.398 1.916 0.000 1.125
4hhbB 1.417 1.875 1.125 0.000
[SqID(%)]
1mbd- 100.0  20.6  27.0  24.8
1ecd-  20.6 100.0  18.3  19.1
4hhbA  27.0  18.3 100.0  43.9
4hhbB  24.8  19.1  43.9 100.0
1 1 1 1 2 3 4 5 6 7 8 9 0 1 2
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
# RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA
1 1 A V 0 0 155 0, 0.0 2,-0.4 0, 0.0 127,-0.1 0.000 360.0 360.0 360.0 144.8 6.9 17.8 4.6
2 2 A L - 0 0 20 71,-0.1 122, 0.0 1,-0.1 0, 0.0 -0.791 360.0-141.9 -92.9 121.5 10.6 17.9 4.3
3 3 A S > - 0 0 44 -2,-0.4 4,-2.8 1, 0.0 5,-0.2 -0.150 29.4-103.9 -60.8-176.0 12.3 19.9 7.1
4 4 A P H > S+ 0 0 99 0, 0.0 4,-2.9 0, 0.0 5,-0.3 0.997 124.4 56.4-100.2 -1.8 15.0 21.9 6.2

1 1 1 1 2 3 4 5 6 7 8 9 0 1 2
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678
# RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N X-CB Y-CB Z-CB PHI PSI X-CA Y-CA Z-CA
1 1 A V 0 0 155 0, 0.0 2,-0.4 0, 0.0 127,-0.1 6.4 19.0 5.8 360.0 144.8 6.9 17.8 4.6
2 2 A L - 0 0 20 71,-0.1 122, 0.0 1,-0.1 0, 0.0 11.1 18.0 2.8 -92.9 121.5 10.6 17.9 4.3
3 3 A S > - 0 0 44 -2,-0.4 4,-2.8 1, 0.0 5,-0.2 12.7 19.0 8.2 -60.8-176.0 12.3 19.9 7.1
4 4 A P H > S+ 0 0 99 0, 0.0 4,-2.9 0, 0.0 5,-0.3 15.8 23.1 7.2-100.2 -1.8 15.0 21.9 6.2
NPRO 2
PRO1 [Protein 1]
PRO2 [Protein 2]
COMMENT [Comment Line]
ALIGNMENT
[ResNum1] [ResName1] [ResNum2] [ResName2]
:
END
The residue names [ResName] must be written as one-letter codes. The residue numbers [ResNum] must be identical to those in the PDB file (columns 23-27). If the "RNUMPLAIN" line appears, plain residue numbers (integers counted from 1) are used. The residue name for an inserted/deleted position must be '-', and the residue number for an indel position must be ``-1''. Note that in the examples in this section each position is written as the residue name followed by the residue number. Matras also outputs the parameters for superimposing in this file. The following is an example.
NPRO 2
PRO1 1timA.bssp
PRO2 1kv8A.bssp
COMMENT Naa1 247 Naa2 213
COMMENT Ncomp 195 SqID 10.3 RMS 3.388 DRMS 2.755
COMMENT ScDis 188198.0 Rdis 34.8
PARAM_FOR_SUPERIMPOSING #Afit=R*(A-Ga)+Gb
Ga 43.78974 29.88718 2.43385
Gb 64.59436 12.80051 25.07641
R0 0.86940 -0.47082 -0.14987
R1 0.23641 0.13004 0.96291
R2 -0.43387 -0.87259 0.22437
ALIGNMENT
K 5 L 3
F 6 P 4
F 7 M 5
V 8 L 6
G 9 Q 7
G 10 V 8
:
K 237 D 196
P 238 A 197
- -1 A 198
- -1 S 199
- -1 P 200
- -1 V 201
- -1 E 202
E 239 A 203
F 240 A 204
V 241 R 205
D 242 Q 206
I 243 F 207
I 244 K 208
- -1 R 209
N 245 S 210
A 246 I 211
K 247 A 212
H 248 E 213
END
The following is an example with "RNUMPLAIN".
NPRO 2
PRO1 1timA.bssp
PRO2 1kv8A.bssp
COMMENT Naa1 247 Naa2 213
COMMENT Ncomp 195 SqID 10.3 RMS 3.388 DRMS 2.755
COMMENT ScDis 188198.0 Rdis 34.8
RNUMPLAIN
PARAM_FOR_SUPERIMPOSING #Afit=R*(A-Ga)+Gb
Ga 43.78974 29.88718 2.43385
Gb 64.59436 12.80051 25.07641
R0 0.86940 -0.47082 -0.14987
R1 0.23641 0.13004 0.96291
R2 -0.43387 -0.87259 0.22437
ALIGNMENT
K 4 L 1
F 5 P 2
F 6 M 3
V 7 L 4
G 8 Q 5
G 9 V 6
:
K 236 D 194
P 237 A 195
- -1 A 196
- -1 S 197
- -1 P 198
- -1 V 199
- -1 E 200
E 238 A 201
F 239 A 202
V 240 R 203
D 241 Q 204
I 242 F 205
I 243 K 206
- -1 R 207
N 244 S 208
A 245 I 209
K 246 A 210
H 247 E 211
END
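The superimposing parameters in these files follow the comment "#Afit=R*(A-Ga)+Gb". The following Python sketch applies that transformation to a single coordinate, using the Ga, Gb, and R values of the example above. It is not part of Matras; the function name `superimpose` is hypothetical, and it assumes that the lines R0, R1, and R2 are the rows of the rotation matrix R.

```python
def superimpose(a, ga, gb, r):
    """Return Afit = R * (A - Ga) + Gb for one 3D point A."""
    d = [a[i] - ga[i] for i in range(3)]
    return tuple(sum(r[i][j] * d[j] for j in range(3)) + gb[i]
                 for i in range(3))


# Values copied from the 1timA/1kv8A example.
GA = (43.78974, 29.88718, 2.43385)
GB = (64.59436, 12.80051, 25.07641)
R = (
    (0.86940, -0.47082, -0.14987),   # R0 (assumed to be the first row)
    (0.23641, 0.13004, 0.96291),     # R1
    (-0.43387, -0.87259, 0.22437),   # R2
)

# By construction, the point Ga of protein A is mapped onto Gb.
print(superimpose(GA, GA, GB, R))
```

Applying the function to every atom of the '-A' structure gives coordinates in the frame of the '-B' structure, under the row assumption stated above.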
NPRO [Number of Proteins]
PRO1 [Protein Name 1]
PRO2 [Protein Name 2]
:
PRO[N] [Protein Name N]
COMMENT [Comment]
ALIGNMENT
[ResNum1] [ResName1] [ResNum2] [ResName2].... [ResNumN] [ResNameN]
:
END
The residue names [ResName] must be written as one-letter codes. The residue numbers [ResNum] must be identical to those in the PDB files (columns 23-27). If the "RNUMPLAIN" line appears, plain residue numbers (integers counted from 1) are used. The residue name for an inserted/deleted position must be '-', and the residue number for an indel position must be ``-1''. The following is an example.
NPRO 4
PRO1 1mbd-
PRO2 1ecd-
PRO3 4hhbA
PRO4 4hhbB
ALIGNMENT
- -   - -   - -   V 1
V 1   - -   V 1   H 2
L 2   L 1   L 2   L 3
S 3   S 2   S 3   T 4
E 4   A 3   P 4   P 5
G 5   D 4   A 5   E 6
E 6   Q 5   D 6   E 7
W 7   I 6   K 7   K 8
Q 8   S 7   T 8   S 9
L 9   T 8   N 9   A 10
:
:
Y 151 - -   - -   - -
Q 152 - -   - -   - -
G 153 - -   - -   - -
END
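Both the pairwise and the multiple alignment files share the same keyword layout (NPRO, PRO1..PRO[N], COMMENT, ALIGNMENT, END), so one small reader can handle both. The following Python sketch is not part of Matras; the function name `parse_alignment` is hypothetical. It assumes one alignment column per line, with each protein contributing a residue name followed by a residue number as in the examples, and it treats any position whose name is '-' as a gap (the examples use both ``- -1'' and ``- -'').

```python
def parse_alignment(text):
    """Return (names, rows, plain) from a Matras-style alignment file.

    rows is a list of columns; each column holds one (name, number) tuple per
    protein, or None for a gap. Residue numbers are kept as strings.
    """
    names, rows, plain, in_alignment = [], [], False, False
    for line in text.splitlines():
        f = line.split()
        if not f:
            continue
        if f[0] == "END":
            break
        if in_alignment:
            if f[0] == ":":
                continue  # elision marker used in the examples
            pairs = zip(f[0::2], f[1::2])
            rows.append([None if name == "-" else (name, num)
                         for name, num in pairs])
        elif f[0].startswith("PRO") and f[0][3:].isdigit():
            names.append(f[1])
        elif f[0] == "RNUMPLAIN":
            plain = True
        elif f[0] == "ALIGNMENT":
            in_alignment = True
    return names, rows, plain


# A shortened version of the pairwise example above.
SAMPLE = """\
NPRO 2
PRO1 1timA.bssp
PRO2 1kv8A.bssp
COMMENT Ncomp 195 SqID 10.3 RMS 3.388 DRMS 2.755
PARAM_FOR_SUPERIMPOSING #Afit=R*(A-Ga)+Gb
ALIGNMENT
K 5 L 3
F 6 P 4
:
P 238 A 197
- -1 A 198
END
"""

names, rows, plain = parse_alignment(SAMPLE)
print(names, len(rows), rows[-1])
```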
#[COMMENT]
bsspfile_head1 comment_for_bsspfile1
bsspfile_head2 comment_for_bsspfile2
:              :
#MAXLENGTH [MAXIMUM_AA_LENGTH_IN_LIBRARY]
The first field, separated by spaces, is the name of a library BSSP file. The following fields are comments on each library structure. You can put anything, such as a protein name or a taxonomy ID, in these fields. The bottom line, which starts with ``#MAXLENGTH'', gives the maximum length of the proteins in the library. If you omit this line, Matras uses the default value (1500 amino acids). We show an example of a list file using SCOP taxonomy IDs as comments.
119l- d.2.1
1a02F h.1.3
1a04A c.23.1 - a.4.6
1a0aA a.38.1
1a0i- d.142.2 - b.40.4
1a0p- a.60.9 - d.163.1
:
#MAXLENGTH 1419
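The library list format described above can be read as follows. This Python sketch is not part of Matras; the function name `parse_library_list` is hypothetical. It follows the description in this appendix: the first field is the BSSP file name, the remaining fields are a free comment, other lines starting with '#' are skipped, and the maximum length defaults to 1500 when the ``#MAXLENGTH'' line is omitted.

```python
DEFAULT_MAXLENGTH = 1500  # default value stated in the text


def parse_library_list(text):
    """Return ([(bssp_name, comment), ...], maxlength) for a library list file."""
    entries, maxlength = [], DEFAULT_MAXLENGTH
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == ":":
            continue  # blank line or elision marker
        if fields[0] == "#MAXLENGTH":
            maxlength = int(fields[1])
        elif fields[0].startswith("#"):
            continue  # comment line
        else:
            entries.append((fields[0], " ".join(fields[1:])))
    return entries, maxlength


# Lines copied from the example list file above.
SAMPLE = """\
119l- d.2.1
1a02F h.1.3
1a04A c.23.1 - a.4.6
#MAXLENGTH 1419
"""

entries, maxlength = parse_library_list(SAMPLE)
print(entries[2], maxlength)
```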