Instructions for the 'dkcombu' command in the "KCOMBU" program

Apr 19, 2015
Takeshi Kawabata
(kawabata@protein.osaka-u.ac.jp)
Laboratory of Protein Informatics
Institute for Protein Research, Osaka University

[Reference for KCOMBU]
Kawabata T. Build-up algorithm for atomic correspondence between chemical structures. J. Chem. Inf. Model., 2011, 51, 1775-1787.


The program 'dkcombu' was developed for faster chemical structure comparison than maximum common substructure (MCS) comparison programs such as 'lkcombu'. To search much faster than MCS, an atom pair descriptor search is used as the first filtering step.

Installation

The source code of KCOMBU is written mainly in C, and was developed and run in a Linux environment (specifically CentOS). Some additional programs for 2D graphics and statistical analyses are written in Python. Installation requires the gcc compiler. If you want to use another compiler, please change the "Makefile" in the "src" directory. The standard installation procedure is as follows:

  1. download the file "kcombu-src-[date].tar.gz"
  2. tar zxvf kcombu-src-[date].tar.gz
  3. cd src
  4. make -f Makefile.dkcombu

If the sources are compiled successfully, an executable file "dkcombu" will appear in the "../src" directory.


Simple Usage of 'dkcombu' program


File Format of Output Result

The format of the search result file (-osc) is described in README_lkcombu.html.
The format of the similarity matrix (-osm) is described in README_lkcombu.html.

Options for calculating MCS

Input options for calculating the MCS are the same as those of the program pkcombu (see README_pkcombu.html).

Atom pair descriptor

The atom pair descriptor was proposed by Carhart et al. (1985). It encodes each pair of atoms by their two atom types and their separation, i.e. the number of bonds on the shortest path between them ( [atom type1]-[distance]-[atom type2] ). The vector of observed counts of each atom pair pattern is used as the descriptor. The default atom classification is the "KCOMBU"-recommended one (-at K). It employs 12 classes for the classification (C,C@,C1,O,O@,O1,N,N@,N1,P,S,X). The default separation (-sep) is set to 10. Therefore, the number of atom pair patterns is 720 (= 12 x 12 / 2 x 10).

An example of the atom pair descriptor for a serine molecule is shown as follows.


Atom pair descriptor vector for a serine molecule.
Count  Atom Pair ( [atom type1]-[distance]-[atom type2] )
2 [C ]-[1]-[C ]
3 [O1]-[1]-[C ]
1 [N1]-[1]-[C ]
1 [C ]-[2]-[C ]
3 [O1]-[2]-[C ]
1 [O1]-[2]-[O1]
2 [N1]-[2]-[C ]
3 [O1]-[3]-[C ]
3 [N1]-[3]-[O1]
2 [O1]-[4]-[O1]
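
The counts in the table above can be reproduced with a short script. The following is only a minimal sketch in Python, not the code used inside 'dkcombu'. It assumes that the distance is the number of bonds on the shortest path, that hydrogen atoms are ignored, and that the separation limit keeps distances from 1 up to and including the limit. The atom types of serine are copied by hand from the table rather than derived from the classification rules, and the names atom_pair_descriptor, SERINE_TYPES and SERINE_BONDS are hypothetical. The two atom types of each key are sorted alphabetically, so [O1]-[1]-[C ] in the table appears as ("C", 1, "O1") here.

```python
from collections import Counter, deque


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: number of bonds from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in adjacency[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def atom_pair_descriptor(atom_types, bonds, max_sep=10):
    """Count (type1, distance, type2) patterns over all unordered atom pairs.

    Assumption: pairs with 1 <= distance <= max_sep are kept.
    """
    adjacency = {a: [] for a in atom_types}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    atoms = sorted(atom_types)
    counts = Counter()
    for i, a in enumerate(atoms):
        dist = shortest_path_lengths(adjacency, a)
        for b in atoms[i + 1:]:
            d = dist.get(b)  # None if the atoms are not connected
            if d is not None and d <= max_sep:
                # Sort the two types so that A-d-B and B-d-A share one key.
                t1, t2 = sorted((atom_types[a], atom_types[b]))
                counts[(t1, d, t2)] += 1
    return counts


# Heavy atoms of serine; atom types taken from the table above (assumption).
SERINE_TYPES = {
    "N": "N1", "CA": "C", "C": "C", "O": "O1",
    "OXT": "O1", "CB": "C", "OG": "O1",
}
SERINE_BONDS = [
    ("N", "CA"), ("CA", "C"), ("C", "O"),
    ("C", "OXT"), ("CA", "CB"), ("CB", "OG"),
]

if __name__ == "__main__":
    descriptor = atom_pair_descriptor(SERINE_TYPES, SERINE_BONDS)
    for (t1, d, t2), n in sorted(descriptor.items(), key=lambda kv: kv[0][1]):
        print(f"{n} [{t1:<2}]-[{d}]-[{t2:<2}]")
```

Serine has 7 heavy atoms and therefore 21 atom pairs, which matches the sum of the counts in the table.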


File format of descriptor file

[Reference for Atom pair descriptor]
Carhart, R.E., Smith, D.H., Venkataraghavan, R. Atom pairs as molecular features in structure-activity studies: definition and applications. J. Chem. Inf. Comput. Sci., 25, 64-73 (1985).