MAFFTash URL: http://sysimm.org/MAFFTash//
MAFFTash is a server that calculates multiple sequence alignments from
sequences and structures. It consists of two existing programs, MAFFT and
ASH. ASH is a structural alignment program that
utilizes an extension of the double dynamic programming algorithm to maximize the number of
structurally equivalent residues between two proteins [1-3]. The pairwise structural alignments are
then subjected to MAFFT, a widely-used multiple sequence alignment program [4-9]. MAFFT uses the
structural alignments to construct an overall multiple alignment that is consistent with the
pairwise structural alignments as much as possible. Sequence homologs with no structural
information can also be included in the alignment.
To run MAFFTash you must provide a list of sequences and/or PDB and chain identifiers. The list may
be pasted into the text area or uploaded from an external file. In either case, the sequences must
be input in FASTA format, and the PDB and chain identifier must be joined as a string of length 5
(e.g. 1nagA). Each PDB and chain identifier line must be preceded by a line containing the string
'PDBID' and nothing else. For example, a line containing only 'PDBID' followed by a line containing
only '1nagA' is valid input. Note that chain identifiers are now mandatory for all PDB entries. Whitespace
(' '), dashes ('-'), and underscores ('_') are not acceptable chain identifiers. If you are uncertain
about which chain IDs to use, please use PDBj Mine (the PDBj search engine). Type in your PDB ID, then click on
'sequence information (FASTA format)'. You will see the PDB sequence for each chain in FASTA
format. Note also that MAFFTash provides a tool for automatically picking up a set of PDB IDs,
given a set of (FASTA-formatted) sequences. To use this feature, click 'Prep-MAFFTash' under the
Example on the MAFFTash top page.
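The input rules above can be checked locally before submitting a job. The following Python sketch is only an illustration and is not part of MAFFTash: the function name parse_mafftash_input and the example sequence are hypothetical, and it assumes that each 'PDBID' line is followed immediately by its 5-character identifier.

```python
from typing import List, Tuple

# Chain identifiers that the text above lists as not acceptable.
FORBIDDEN_CHAINS = {" ", "-", "_"}


def parse_mafftash_input(text: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split MAFFTash-style input into FASTA records and PDB+chain identifiers."""
    sequences: List[Tuple[str, str]] = []  # (header, residues)
    pdb_ids: List[str] = []
    expect_pdb = False  # True on the line right after a 'PDBID' line
    for line in text.splitlines():
        if expect_pdb:
            # The identifier must be 4 PDB characters plus 1 chain character.
            if len(line) != 5 or line[4] in FORBIDDEN_CHAINS:
                raise ValueError(f"bad PDB/chain identifier: {line!r}")
            pdb_ids.append(line)
            expect_pdb = False
        elif line == "PDBID":
            expect_pdb = True
        elif line.startswith(">"):
            sequences.append((line[1:].strip(), ""))
        elif line.strip():
            if not sequences:
                raise ValueError("sequence data before any FASTA header")
            header, residues = sequences[-1]
            sequences[-1] = (header, residues + line.strip())
    if expect_pdb:
        raise ValueError("'PDBID' line not followed by an identifier")
    return sequences, pdb_ids


if __name__ == "__main__":
    example = ">query1\nMKVLLA\nPDBID\n1nagA\n"  # hypothetical sequence
    print(parse_mafftash_input(example))
```

A ValueError here only means the input breaks one of the rules stated above; the server itself may report problems differently.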
You are not limited to PDB entries and may provide your own PDB-formatted structures. To upload
your own structures, first specify the number of files to be uploaded and a new form will be
generated. The ‘Structure weight’ (default value 0.2) controls how much influence ASH has on the
MAFFT alignment. You may need to experiment with different values, depending on the ratio of
structures to sequences.
MAFFTash works by first aligning all pairs of structures using a modified version of the program
ASH, then extracting the aligned residue pairs and constructing a
multiple sequence alignment of all sequences with a reward for the structurally aligned residue pairs.
ASH was modified so that each structure is first partitioned into domains using Protein
Domain Parser, then all pairs of domains are aligned using conventional ASH. Finally, a complete
pairwise alignment of the whole structure is formed from a dynamic programming calculation
constructed from the complete set of domain-domain alignments. In this way, the ASH alignment is
'rigid' within domains but 'flexible' between domains.
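The final step, building a whole-structure alignment from domain-domain alignments by dynamic programming, can be illustrated with a small Python sketch. This is not the ASH implementation. It assumes that domains are numbered in chain order, that each domain pair has a non-negative score (for example, its number of equivalent residues), and that domain pairs must be chosen in an order-preserving way. The function name chain_domain_alignments and the scores are hypothetical.

```python
from typing import Dict, List, Tuple


def chain_domain_alignments(
    n_a: int, n_b: int, score: Dict[Tuple[int, int], float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Choose an order-preserving set of domain pairs with maximal total score."""
    # best[i][j]: best total using the first i domains of A and the first j of B.
    best = [[0.0] * (n_b + 1) for _ in range(n_a + 1)]
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            paired = best[i - 1][j - 1] + score.get((i - 1, j - 1), 0.0)
            best[i][j] = max(paired, best[i - 1][j], best[i][j - 1])

    # Trace back to recover which domain pairs were used.
    pairs: List[Tuple[int, int]] = []
    i, j = n_a, n_b
    while i > 0 and j > 0:
        s = score.get((i - 1, j - 1), 0.0)
        if s > 0 and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return best[n_a][n_b], pairs[::-1]


if __name__ == "__main__":
    # Hypothetical scores for two structures with two domains each.
    scores = {(0, 0): 50, (1, 1): 40, (0, 1): 70}
    print(chain_domain_alignments(2, 2, scores))
```

Each selected pair would keep its own rigid domain-domain alignment, while the relative placement of different domains is left free, which is one way to read the 'rigid within, flexible between' description.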
A multiple sequence alignment is computed using a modified version of the program MAFFT.
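The idea of rewarding structurally aligned residue pairs can be shown on a toy pairwise case. The Python sketch below is a plain global (Needleman-Wunsch style) alignment with an added bonus for listed residue pairs. It is not MAFFT's algorithm; the function name, the scoring values, and the reward parameter are all hypothetical, and reward is not on the same scale as the server's 'Structure weight'.

```python
from typing import List, Set, Tuple


def align_with_reward(
    a: str,
    b: str,
    structural_pairs: Set[Tuple[int, int]],
    reward: float = 2.0,  # hypothetical bonus per structurally aligned pair
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> Tuple[str, str]:
    """Globally align a and b, favouring pairs listed in structural_pairs."""

    def sub(i: int, j: int) -> float:
        base = match if a[i] == b[j] else mismatch
        return base + (reward if (i, j) in structural_pairs else 0.0)

    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub(i - 1, j - 1),
                dp[i - 1][j] + gap,
                dp[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover the aligned strings.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sub(i - 1, j - 1):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))
```

With the pairs {(0, 0), (1, 1)} rewarded, the inputs "AB" and "BA" are aligned without gaps; without any rewarded pairs, the same inputs are aligned with gaps so that identical letters line up.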
MAFFTash provides a tool for automatically preparing valid MAFFTash input from a limited set of
sequences or PDB IDs. To use this feature, click 'Prep-MAFFTash' under the Example on the MAFFTash top page.
The Prep-MAFFTash entry form looks like the MAFFTash page. There is a text window where
sequences and/or PDB IDs can be pasted; however, there are a number of additional options. These are
grouped into three sections:
1. Add structures. This feature uses BLAST to search the PDB, using the input
you type in the text box as the query. Three parameters control what Prep-MAFFTash adds:
- a. Max seq ID between added structures (default 90%). This parameter
prevents many instances of a particular structure from being retrieved. If you want fewer
structures, lower the value; if you want more, increase it. A value of 100 will add all PDB
entries that are homologous to your input. The pruning of sequences is performed using the
program cd-hit.
- b. Min seq ID from original input (default 20%). This parameter controls
what BLAST considers a sequence homolog. Increasing this parameter will reduce the number of
PDB entries retrieved; decreasing it will increase the number retrieved. However, an internal
parameter prevents PDB entries with E-values greater than 0.01 from being included.
- c. Min coverage of original input (default 50%). This parameter
determines how much of the query sequence a particular PDB entry must ‘cover’. Ideally, the
structure would cover all or most of the query; if it does not, you might consider breaking
your query sequences into domains.
2. Add ASH structural neighbors. This feature allows you to pull in structural
homologs of your query sequence(s). We maintain a database of ASH structural alignments. If one
or more of your queries can be matched to one or more of the structures for which pre-computed
alignments are available, the list of structural ‘neighbors’ can be added subject to the
following parameters:
- a. Max seq ID between added structures (default 90%). This parameter is
analogous to 1.a (above) except that it applies to the ASH structural neighbors.
- b. Min seq ID from original input (default 0). This parameter is
analogous to 1.b (above) except that it applies to the ASH structural neighbors.
3. Add sequences. This feature allows you to pull in sequences from the UniRef
database. The options are similar to those above.
- a. Max seq ID between added sequences (default 90%). This option is
analogous to 1.a (above) except that it applies to the UniRef100 sequences. Be careful about
making this too large as there are potentially many homologous sequences.
- b. Min seq ID from original input (default 0). This parameter is
analogous to 1.b (above) except that it applies to the UniRef100 sequences. Again, be careful
about adding too many sequences, unless you are sure that is what you want.
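The thresholds described for adding structures (minimum sequence identity, minimum coverage, the internal E-value cutoff, and the maximum identity between added entries) can be pictured as two simple filters. The Python sketch below is an assumption-laden illustration, not the Prep-MAFFTash code: the Hit class, both function names, and the pairwise_identity callback are hypothetical, the inclusive comparisons are guesses, and the greedy pruning only mimics the effect of cd-hit rather than calling it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    pdb_id: str      # PDB and chain identifier, e.g. "1nagA"
    identity: float  # percent sequence identity to the query
    coverage: float  # percent of the query covered by the hit
    evalue: float    # BLAST E-value


def select_hits(
    hits: List[Hit],
    min_identity: float = 20.0,
    min_coverage: float = 50.0,
    max_evalue: float = 0.01,
) -> List[Hit]:
    """Keep hits that pass the identity, coverage, and E-value thresholds."""
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    ]


def prune_redundant(
    ids: List[str],
    pairwise_identity: Callable[[str, str], float],
    max_identity: float = 90.0,
) -> List[str]:
    """Greedily keep entries no more than max_identity percent identical to kept ones."""
    kept: List[str] = []
    for candidate in ids:
        if all(pairwise_identity(candidate, k) <= max_identity for k in kept):
            kept.append(candidate)
    return kept
```

With max_identity set to 100 the pruning step keeps every entry, which matches the statement above that a value of 100 adds all homologous PDB entries.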
The output of Prep-MAFFTash is just a MAFFTash-formatted input file. You can paste it into the
MAFFTash text window or upload it as a file.
MAFFTash will send an email containing a link to your results. The results consist of a
FASTA-formatted multiple sequence alignment (a text file) as well as a Jalview link from which
you can view the multiple sequence alignment in your web browser.
Figure 1. MAFFTash alignment viewed through Jalview.
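Because the result is a FASTA-formatted alignment in a text file, it can also be read with a few lines of Python for downstream processing. The sketch below is a generic aligned-FASTA reader, not a MAFFTash tool; the function name read_aligned_fasta is hypothetical, and it assumes gaps are written as '-' and that every row of a valid alignment has the same length.

```python
from typing import Dict, Optional


def read_aligned_fasta(text: str) -> Dict[str, str]:
    """Return {header: aligned row}, checking that all rows have equal length."""
    rows: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            rows[name] = ""
        elif line and name is not None:
            rows[name] += line
    if len({len(row) for row in rows.values()}) > 1:
        raise ValueError("rows differ in length; not a valid alignment")
    return rows
```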
- Standley, Toh, Nakamura, ASH structure alignment package: sensitivity and selectivity in
domain classification, BMC Bioinformatics 8 (4), 116 (2007)
- Standley, Toh, Nakamura, GASH: an improved algorithm for maximizing the number of equivalent
residues between two protein structures, BMC Bioinformatics 6
- Standley, Toh, Nakamura, Detecting local structural similarity in proteins by maximizing
number of equivalent residues, Proteins 57 (2), 381-91 (2004)
- Katoh, Asimenos, Toh, Multiple Alignment of DNA Sequences with MAFFT. In Bioinformatics for
DNA Sequence Analysis edited by D. Posada, Methods in Molecular Biology
537, 39-64 (2009)
- Katoh, Toh, Improved accuracy of multiple ncRNA alignment by incorporating structural
information into a MAFFT-based framework, BMC Bioinformatics 9, 212
- Katoh, Toh, Recent developments in the MAFFT multiple sequence alignment program,
Briefings in Bioinformatics 9, 286-298 (2008)
- Katoh, Toh, PartTree: an algorithm to build an approximate tree from a large number of
unaligned sequences, Bioinformatics 23, 372-374 (2007)
- Katoh, Kuma, Toh, Miyata, MAFFT version 5: improvement in accuracy of multiple sequence
alignment, Nucleic Acids Res. 33, 511-518 (2005)
- Katoh, Misawa, Kuma, Miyata, MAFFT: a novel method for rapid multiple sequence alignment based
on fast Fourier transform, Nucleic Acids Res. 30, 3059-3066 (2002)
- Li, Jaroszewski, Godzik, Clustering of highly homologous sequences to reduce the size of large
protein databases, Bioinformatics 17, 282-283 (2001)
- Waterhouse, Procter, Martin, Clamp, Barton, Jalview Version 2--a multiple sequence alignment
editor and analysis workbench, Bioinformatics 25 (9), 1189-119 (2009)