[Reference for KCOMBU]
Kawabata T. Build-up algorithm for atomic correspondence between chemical structures.
J. Chem. Inf. Model., 2011, 51, 1775-1787.
The source code of kcombu is mainly written in C, and is developed and run in a Linux environment. Some additional programs for 2D graphics and statistical analyses are written in Python. For installation, you need the gcc compiler. If you want to use another compiler, please change the "Makefile" in the "src" directory. The standard installation procedure is as follows:
tar zxvf kcombu-src-[date].tar.gz
cd src
make -f Makefile.pkcombu
If the sources are compiled successfully, an executable file "pkcombu" will appear in the "../src" directory.
For 2D molecular graphics, we provide Python scripts in the directory "src/moldraw". The script "moldraw.cgi" makes image/PDF files of 2D molecules; it requires Python and the Python Imaging Library (PIL) ( http://www.pythonware.com/products/pil/ ).
$pkcombu -A [molecule_fileA] -B [molecule_fileB] -oam [atom-matching file]
$pkcombu -A [molecule_fileA] -B [molecule_fileB] -con T -mtd 1
$pkcombu -A [molecule_fileA] -B [molecule_fileB] -con C -alg X
$pkcombu -h
$pkcombu -hmcs
< input options for 'pkcombu'>
 -A : molecule A (molA)(*.sdf|*.mol2|*.pdb|*.kcf|*.smi)[]
 -B : molecule B (molB)(*.sdf|*.mol2|*.pdb|*.kcf|*.smi)[]
 -fA,-fB : file formats.'P'db,'S'df,'K'cf, '2':MOL2, 'M':smiles [--]
 -aA,-aB : AtomHetero type. 'A'tom 'H'etatm 'B'oth[BB]
The program "pkcombu" can read several file formats of chemical structures: sdf, mol2, pdb, kcf and smi. The file format is distinguished by the file extension: *.sdf -> SDF, *.mol2 -> MOL2, *.pdb -> PDB, *.kcf -> KCF, *.smi -> SMILES. For molecular files without a proper file extension, users should assign the format type directly with the option '-fA' or '-fB'. For example, if a file "hoge" is for molecule A and is in SDF format, users should add the option "-A hoge -fA S". For a molecule in PDB format, a bond connection table is automatically generated from the 3D coordinates of the atoms, even if the file does not have 'CONECT' lines.
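The extension rule above can be sketched in Python, the language of the package's helper scripts. This is only an illustration and not part of kcombu: the helper names `format_args` and `build_command` are hypothetical, the extension table is copied from the option list, and exact (case-sensitive) extension matching is an assumption.

```python
import os

# Extension -> format code, copied from the '-fA,-fB' option description.
EXTENSION_TO_FORMAT = {
    ".sdf": "S",
    ".mol2": "2",
    ".pdb": "P",
    ".kcf": "K",
    ".smi": "M",
}


def format_args(path, which, fallback=None):
    """Return the extra arguments needed for one molecule file.

    which    : 'A' or 'B'
    fallback : format code to pass when the extension is not recognized
    """
    ext = os.path.splitext(path)[1]
    if ext in EXTENSION_TO_FORMAT:
        # Recognized extension: the format is inferred from the file name.
        return []
    if fallback is None:
        raise ValueError("unknown extension for %r; give a format code" % path)
    return ["-f" + which, fallback]


def build_command(file_a, file_b, format_a=None, format_b=None):
    """Build an argument list for pkcombu (hypothetical helper)."""
    cmd = ["pkcombu", "-A", file_a, "-B", file_b]
    cmd += format_args(file_a, "A", format_a)
    cmd += format_args(file_b, "B", format_b)
    return cmd


# The "hoge" example from the text: an SDF file without an extension.
print(" ".join(build_command("hoge", "molB.sdf", format_a="S")))
```

The resulting list can be passed to `subprocess.run` once "pkcombu" is on the search path.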
The options "-oam" and "-oAm" output detailed descriptions of the calculated atom matching (atom correspondence). The option "-oam" outputs the best match; the option "-oAm" outputs all the calculated candidate matches. The format of the atom matching file is described in another section.
The option '-sup3' is for superimposing molecules. The pkcombu program can superimpose molecule A onto molecule B so that the RMSD over the matched atoms is smallest. For example, if the user assigns "-sup3 T -opA outA.pdb", the structure of molecule A superimposed onto molecule B is written to "outA.pdb" in PDB format.
The option "-oras" is for showing matched atoms using the molecular visualization program RasMol. If the user assigns "-oras out", the pkcombu program generates two files, "out-A.ras" and "out-B.ras". To visualize the matched atoms in molecule A, a user enters the following commands:
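To check a superimposed structure, the RMSD over the matched atoms can be recomputed from the coordinates. The sketch below is not pkcombu's superposition routine; it only evaluates the RMSD of two coordinate lists that are already superimposed. The function name `rmsd` and the input layout (two equal-length lists of (x, y, z) tuples, given in matched order) are assumptions made for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched atom coordinates.

    coords_a, coords_b : lists of (x, y, z) tuples; the i-th atom of A
    is assumed to be matched to the i-th atom of B.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two atoms, each displaced by 1.0 along z.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
```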
$ rasmol -pdb [moleculeA(in PDB)]
or
$ rasmol -mdl [moleculeA(in SDF)]
or
$ rasmol -mol2 [moleculeA(in MOL2)]
RasMol> script "out-A.ras"
The option "-ops" outputs a PostScript file showing the corresponding atom pairs.
This option basically assumes that the input molecules have two-dimensional structures.
Such a figure is generated by the following commands:
$pkcombu -A SIA.sdf -B G39.sdf -ops out.ps
$evince out.ps
$pkcombu -A molA.pdb -B molB.pdb -oam out.am
$moldraw.cgi -iam out.am
$display out.png
$eog out.png
The program "moldraw.cgi" can also generate a PDF file with the following command:
$moldraw.cgi -iam out.am -G P -of out.pdf
The pairwise atom matching calculated by the program pkcombu can be written to a file with the "-oam" option. Its format is as follows:
>[number for atom matchings]
[numfileAB(1,2)] [numAB(3,4)] [atomnameAB(5,6)] [atomtype(7)] [EC_AB(8,9)] [ECdiff(10)] [Nnei_diff(11)]
:
//
The numbers [numfileAB] are the atom numbers described in the file. The numbers [numAB] are atom numbers that start from '1' and increase one by one. For SDF and MOL2 files, [numfileAB] and [numAB] are the same. For a PDB file, however, [numfileAB] and [numAB] can be different if the file is taken from part of a ligand-protein complex; in that case the numbers in the file may start from a large value (2939 in the example below). An example of the atom matching from the comparison of a PDB file (ATP_1atpE.pdb) and an SDF file (ATP.sdf) is shown as follows:
#>> Atom_Number MATCHing file <<
#COMMAND 'pkcombu -T ATP_1atpE.pdb -B ATP.sdf -oam oam'
#DATE_START Dec 13,2013 15:17:12
#DATE_END Dec 13,2013 15:17:12
#COMP_TIME 0.021546 seconds
#AlgoType B
#ConnectGraphType C
#Weight Wneiatm 1.00 Wextcon 1.00 Wtopodis 0.00
#CalcFinished F
#MoleculeA ATP_1atpE.pdb
#MoleculeB ATP.sdf
#FiletypeA P
#FiletypeB S
#NatomA 31
#NatomB 47
#NheavyatomA 31
#NheavyatomB 31
#TotalNatompair 199
#NpermuA 0
#NpermuB 0
#Len_of_MATCHlist 1
#RankMatchOutput 1
>1
#Npair_atom 31
#tanimoto 1.000000
#select_dis 0.000000
#Ncomponent 1
#Maxdiff_topodis 0
#[numfileAB][numAB] [atomnameAB][atomtype] [EC_AB] [ECdiff] [Nnei_diff]
2939 1 1 1 PG P1 P 20 20 0 0
2940 2 2 2 O1G O1 O1 5 5 0 0
2941 3 3 3 O2G O2 O1 5 5 0 0
2942 4 4 4 O3G O3 O1 5 5 0 0
2943 5 5 5 PB P2 P 24 24 0 0
2944 6 6 6 O1B O4 O1 6 6 0 0
2945 7 7 7 O2B O5 O1 6 6 0 0
2946 8 8 8 O3B O6 O 11 11 0 0
2947 9 9 9 PA P3 P 22 22 0 0
2948 10 10 10 O1A O7 O1 6 6 0 0
2949 11 11 11 O2A O8 O1 6 6 0 0
2950 12 12 12 O3A O9 O 12 12 0 0
2951 13 13 13 O5' O10 O 11 11 0 0
2952 14 14 14 C5' C1 C 13 13 0 0
2953 15 15 15 C4' C2 C@ 18 18 0 0
2954 16 16 16 O4' O11 O@ 15 15 0 0
2955 17 17 17 C3' C3 C@ 17 17 0 0
2956 18 18 18 O3' O12 O1 7 7 0 0
2957 19 19 19 C2' C4 C@ 18 18 0 0
2958 20 20 20 O2' O13 O1 7 7 0 0
2959 21 21 21 C1' C5 C@ 21 21 0 0
2960 22 22 22 N9 N1 N@ 21 21 0 0
2961 23 23 23 C8 C6 C@ 13 13 0 0
2962 24 24 24 N7 N2 N@ 13 13 0 0
2963 25 25 25 C5 C7 C@ 19 19 0 0
2964 26 26 26 C6 C8 C@ 16 16 0 0
2965 27 27 27 N6 N3 N1 6 6 0 0
2966 28 28 28 N1 N4 N@ 10 10 0 0
2967 29 29 29 C2 C9 C@ 10 10 0 0
2968 30 30 30 N3 N5 N@ 12 12 0 0
2969 31 31 31 C4 C10 C@ 21 21 0 0
//
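A file in this format can be read with a short Python routine. The sketch below is an illustration, not the parser used by pkcombu/fkcombu. It assumes one record per line, 11 whitespace-separated columns per atom pair as in the example above, and "#KEY value" header lines; the function name `parse_atom_matching` and the dictionary keys are hypothetical. The sample text is a shortened excerpt of the example.

```python
PAIR_FIELDS = ("num_file_a", "num_file_b", "num_a", "num_b",
               "name_a", "name_b", "atomtype",
               "ec_a", "ec_b", "ec_diff", "nnei_diff")
TEXT_FIELDS = ("name_a", "name_b", "atomtype")


def parse_atom_matching(text):
    """Parse '-oam' style text into (header, matches).

    header  : dict of '#KEY value' lines that precede the first '>' line
    matches : one dict per '>' block with keys 'rank', 'info', 'pairs'
    """
    header, matches, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "//":                      # end of a match block
            current = None
        elif line.startswith(">"):            # start of a match block
            current = {"rank": int(line[1:].split()[0]),
                       "info": {}, "pairs": []}
            matches.append(current)
        elif line.startswith("#"):
            key, _, value = line[1:].partition(" ")
            if key[:1].isalpha():             # skip banner/column comments
                target = header if current is None else current["info"]
                target[key] = value.strip()
        elif current is not None:
            cols = line.split()
            if len(cols) != len(PAIR_FIELDS):
                raise ValueError("expected 11 columns: %r" % line)
            pair = {}
            for name, col in zip(PAIR_FIELDS, cols):
                pair[name] = col if name in TEXT_FIELDS else int(col)
            current["pairs"].append(pair)
    return header, matches


SAMPLE = """\
#>> Atom_Number MATCHing file <<
#MoleculeA ATP_1atpE.pdb
#MoleculeB ATP.sdf
#NatomA 31
>1
#Npair_atom 31
#tanimoto 1.000000
2939 1 1 1 PG P1 P 20 20 0 0
2940 2 2 2 O1G O1 O1 5 5 0 0
//
"""

header, matches = parse_atom_matching(SAMPLE)
for p in matches[0]["pairs"]:
    print(p["num_file_a"], p["name_a"], "<->", p["num_file_b"], p["name_b"])
```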
When users want pkcombu/fkcombu to read their own atom matching file with the option "-iam", a much simpler format is preferable. The pkcombu/fkcombu programs can read the following format, which contains only two columns: [numfile] for molecule A and [numfile] for molecule B. The [numfile] is the atom number described in the molecular file.
2939 1
2940 2
2941 3
2942 4
2943 5
2944 6
2945 7
2946 8
2947 9
2948 10
2949 11
2950 12
2951 13
2952 14
2953 15
2954 16
2955 17
2956 18
2957 19
2958 20
2959 21
2960 22
2961 23
2962 24
2963 25
2964 26
2965 27
2966 28
2967 29
2968 30
2969 31
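A two-column file of this kind is easy to produce from a list of atom-number pairs. The following Python sketch assumes one pair per line separated by whitespace; whether pkcombu accepts blank lines or comments in such a file is not stated here, so none are written. The function names `write_simple_matching` and `read_simple_matching` are hypothetical.

```python
def write_simple_matching(pairs):
    """Return two-column text: [numfile] of molecule A, [numfile] of molecule B."""
    lines = ["%d %d" % (num_a, num_b) for num_a, num_b in pairs]
    return "\n".join(lines) + "\n"


def read_simple_matching(text):
    """Read two-column text back into a list of (numfileA, numfileB) tuples."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError("expected two columns: %r" % line)
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


# First three pairs of the example above.
text = write_simple_matching([(2939, 1), (2940, 2), (2941, 3)])
print(text, end="")
```

The text can then be saved to a file and passed to pkcombu/fkcombu with the "-iam" option.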