University of Göttingen | Faculty of Biology | Inst. of Microbiology and Genetics | Dep. of Bioinformatics

DIALIGN-SEC [manual]

Multiple sequence alignment using secondary structure prediction

Multiple alignment programs are usually based solely on primary sequence information. However, since proteins are more conserved at the structural level than at the primary-sequence level, attempts have been made to use predicted secondary structure information for improved alignment, e.g. by Heringa (1999) or Kim and Xie (2006).

Here, we offer a WWW-based multiple-alignment tool that uses protein secondary structure predictions produced by PSIPRED v2 (Jones, 2004). Alignments are computed with DIALIGN 2, which builds multiple alignments from local pairwise homologies, so-called fragments (Morgenstern et al., 1996; Morgenstern, 2004). Unlike the standard version of DIALIGN, our server scores these fragments using similarities at the primary-sequence level as well as at the secondary-structure level.

Input sequence file:

The input for our program is a single ASCII file containing the sequences to be aligned in multiple FASTA format. As a running example, we use the dataset BB11001 from the BAliBASE 3 benchmark:

        
>1aab_
GKGDPKKPRGKMSSYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGE
>1j46_A
MQDRVKRPMNAFIVWSRDQRRKMALENPRMRNSEISKQLGYQWKMLTEAEKWPFFQEAQKLQAMHREKYPNYKYRPRRKAKMLPK
>1k99_A
MKKLKKHPDFPKKPLTPYFRFFMEKRAKYAKLHPEMSNLDLTKILSKKYKELPEKKKMKYIQDFQREKQEFERNLARFREDHPDLIQNAKK
>lef_A
MHIKKPLNAFMLYMKEMRANVVAESTLKESAAINQILGRRWHALSREEQAKYYELARKERQLHMQLYPGWSARDNYGKKKKRKREK
For each sequence, the first line starts with ">" and contains the name of the sequence.
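
Before submitting a file, it can be useful to check locally that it parses as multiple FASTA. The following is a minimal Python sketch of such a check, not part of DIALIGN or of our server; the function name parse_fasta and the error messages are assumptions made for illustration.

```python
def parse_fasta(text):
    """Return a dict mapping sequence names to sequences, in input order."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            if not name:
                raise ValueError("header line without a sequence name")
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name].append(line)  # a sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


# First two sequences of the running example.
example = """>1aab_
GKGDPKKPRGKMSSYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGE
>1j46_A
MQDRVKRPMNAFIVWSRDQRRKMALENPRMRNSEISKQLGYQWKMLTEAEKWPFFQEAQKLQAMHREKYPNYKYRPRRKAKMLPK
"""
sequences = parse_fasta(example)
for seq_name, seq in sequences.items():
    print(seq_name, len(seq))
```

The sketch joins wrapped sequence lines and rejects duplicate names, since duplicate names would make the rows of the output alignment ambiguous.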

Approach:

Given a set of input sequences, our method first calculates all pairwise alignments using the standard version of DIALIGN. Our web server allows the user to apply a threshold T to remove low-scoring local similarities from these pairwise alignments.
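
The effect of the threshold can be illustrated with a short Python sketch. The Fragment record and its field names are hypothetical and do not reflect DIALIGN's internal data structures; whether a fragment scoring exactly T is kept is also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record for one gap-free local similarity."""
    seq_a: str
    start_a: int
    seq_b: str
    start_b: int
    length: int
    score: float


def apply_threshold(fragments, threshold):
    """Keep only fragments whose score reaches the threshold T.

    Assumption: the comparison is inclusive (score >= T).
    """
    return [f for f in fragments if f.score >= threshold]


fragments = [
    Fragment("1aab_", 10, "1j46_A", 8, 12, 0.5),
    Fragment("1aab_", 30, "1j46_A", 25, 20, 3.5),
    Fragment("1aab_", 55, "1k99_A", 56, 9, 2.0),
]
kept = apply_threshold(fragments, threshold=2.0)
```

With T = 0, as in the program call shown further below, no fragment with a non-negative score would be removed in this sketch.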

The standard version of DIALIGN greedily incorporates the fragments contained in the respective pairwise alignments into a growing multiple alignment, provided they are consistent with each other, i.e. as long as they fit together in a single output MSA. The priority of the fragments in the greedy algorithm depends on their degree of similarity at the primary-sequence level.
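
The greedy idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not DIALIGN's algorithm: the Fragment record is hypothetical, and the consistency test only compares fragments between the same pair of sequences, requiring that one lies entirely before the other in both sequences. The real consistency check also accounts for constraints implied through other sequences, and it can accept overlapping fragments that this sketch rejects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical gap-free fragment between two sequences (0-based starts)."""
    seq_a: int
    start_a: int
    seq_b: int
    start_b: int
    length: int
    weight: float


def compatible(f, g):
    """Simplified pairwise rule: same-pair fragments must not cross or overlap."""
    if {f.seq_a, f.seq_b} != {g.seq_a, g.seq_b}:
        return True  # different sequence pairs are not checked in this sketch
    if (g.seq_a, g.seq_b) != (f.seq_a, f.seq_b):
        # Bring g into the same orientation as f.
        g = Fragment(g.seq_b, g.start_b, g.seq_a, g.start_a, g.length, g.weight)
    f_before_g = (f.start_a + f.length <= g.start_a
                  and f.start_b + f.length <= g.start_b)
    g_before_f = (g.start_a + g.length <= f.start_a
                  and g.start_b + g.length <= f.start_b)
    return f_before_g or g_before_f


def greedy_select(fragments):
    """Accept fragments in order of decreasing weight if they fit the rest."""
    accepted = []
    for frag in sorted(fragments, key=lambda f: f.weight, reverse=True):
        if all(compatible(frag, other) for other in accepted):
            accepted.append(frag)
    return accepted


a = Fragment(0, 0, 1, 0, 5, 10.0)
b = Fragment(0, 10, 1, 2, 5, 8.0)   # conflicts with a in sequence 1
c = Fragment(0, 6, 1, 7, 4, 3.0)    # lies after a in both sequences
d = Fragment(0, 0, 2, 3, 5, 1.0)    # different sequence pair
selected = greedy_select([d, c, b, a])
```

Because fragments are visited by decreasing weight, a high-scoring fragment can exclude lower-scoring ones that conflict with it, but never the other way round.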

In our structure-based version, the priority of the fragments in the greedy procedure depends not only on primary-sequence similarity but also on the degree of similarity between the predicted secondary structures. Details are explained in a forthcoming paper (Subramanian et al., submitted).
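
To make the idea concrete, the following Python sketch shows one conceivable way to let secondary-structure agreement influence a fragment's priority. It is purely illustrative: the actual scoring scheme is the one described in the cited paper and is not reproduced here. The functions ss_agreement and combined_priority, the weighting parameter alpha, and the formula itself are assumptions.

```python
def ss_agreement(ss_a, ss_b, start_a, start_b, length):
    """Fraction of fragment positions with identical predicted states (H/E/C)."""
    if length <= 0:
        raise ValueError("fragment length must be positive")
    seg_a = ss_a[start_a:start_a + length]
    seg_b = ss_b[start_b:start_b + length]
    if len(seg_a) != length or len(seg_b) != length:
        raise ValueError("fragment extends beyond the predicted structure")
    matches = sum(x == y for x, y in zip(seg_a, seg_b))
    return matches / length


def combined_priority(primary_score, agreement, alpha=0.5):
    """Hypothetical combination: boost the primary score by structural agreement."""
    return primary_score * (1.0 + alpha * agreement)


agreement = ss_agreement("CCHHHH", "CHHHHC", 0, 0, 6)
priority = combined_priority(10.0, agreement, alpha=0.5)
```

In this toy scheme, two fragments with equal primary-sequence scores are ranked by how well their predicted structures agree, which is the qualitative behaviour described above.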

Program Output:

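Our web server creates several output files, containing the multiple alignment in DIALIGN format, the secondary structures predicted by PSIPRED, and a tree in PHYLIP format. Each is shown below for our running example.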

This is DIALIGN alignment format:

The output of DIALIGN with secondary structure in our running example is as follows:
program call:  /c1/scratch/disec/libexec/dialign2 -thr 0 -sec input.fa

1aab_      1   gkgd------ PKKPrgkmss yafFVQTSRE EHKKK---HP DASV-NFSEF
1j46_A     1   mq------DR VKRPMNA--- ---FIVWSRD QRRKMALENP RM---RNSEI
1k99_A     1   mkklkkhpDF PKKPLTP--- ---YFRFFME KRAKYAKLHP EM---SNLDL
lef_A      1   m--------H IKKPLNA--- ---FMLYMKE MRANV---VA ESTLkESAAI
               0000000000 0000000000 0000000000 0000000000 0000013399

1aab_     41   SKKCSERWKT MSAKEKGKFE DMAKADKARY EREMktyipp kge-------
1j46_A    36   SKQLGYQWKM LTEAEKWPFF QEAQKLQAMH REk------- YP------NY
1k99_A    42   TKILSKKYKE LPEKKKMKYI QDFQREKQEF ERNLarfred HP------DL
lef_A     34   NQILGRRWHA LSREEQAKYY ELARKERQLH MQl------- YPgwsardNY
               9999999999 9999999999 9999966666 6655000000 0000000000

1aab_     84   ---------- ---
1j46_A    73   KYRPRRKakm lpk
1k99_A    86   IQNAKK---- ---
lef_A     77   GKKKKRKrek ---
               0000000000 000
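
For downstream processing, the blocks of this format can be joined into one gapped row per sequence. The following Python sketch is not part of DIALIGN. It assumes that the output file lists one row per line (sequence name, start position, then columns in groups of ten) and that lines consisting only of digits carry the per-column similarity values. The function names parse_dialign and ungap are hypothetical.

```python
import re

ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\S.*)$")   # name, start position, columns
DIGITS_ONLY = re.compile(r"^[\d\s]+$")          # per-column similarity line


def parse_dialign(text):
    """Return ({name: gapped row}, similarity string) from DIALIGN-format text."""
    rows, weights = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if DIGITS_ONLY.match(line):
            weights.append("".join(line.split()))
            continue
        match = ROW.match(line)
        if match:  # other lines, e.g. the program call, are skipped
            name, _start, columns = match.groups()
            rows.setdefault(name, []).append("".join(columns.split()))
    return {n: "".join(parts) for n, parts in rows.items()}, "".join(weights)


def ungap(row):
    """Recover the plain sequence: drop gap characters, normalise case."""
    return row.replace("-", "").upper()


dialign_output = """
program call:  /c1/scratch/disec/libexec/dialign2 -thr 0 -sec input.fa

1aab_      1   gkgd------ PKKPrgkmss yafFVQTSRE EHKKK---HP DASV-NFSEF
1j46_A     1   mq------DR VKRPMNA--- ---FIVWSRD QRRKMALENP RM---RNSEI
1k99_A     1   mkklkkhpDF PKKPLTP--- ---YFRFFME KRAKYAKLHP EM---SNLDL
lef_A      1   m--------H IKKPLNA--- ---FMLYMKE MRANV---VA ESTLkESAAI
               0000000000 0000000000 0000000000 0000000000 0000013399

1aab_     41   SKKCSERWKT MSAKEKGKFE DMAKADKARY EREMktyipp kge-------
1j46_A    36   SKQLGYQWKM LTEAEKWPFF QEAQKLQAMH REk------- YP------NY
1k99_A    42   TKILSKKYKE LPEKKKMKYI QDFQREKQEF ERNLarfred HP------DL
lef_A     34   NQILGRRWHA LSREEQAKYY ELARKERQLH MQl------- YPgwsardNY
               9999999999 9999999999 9999966666 6655000000 0000000000

1aab_     84   ---------- ---
1j46_A    73   KYRPRRKakm lpk
1k99_A    86   IQNAKK---- ---
lef_A     77   GKKKKRKrek ---
               0000000000 000
"""
rows, similarity = parse_dialign(dialign_output)
for seq_name, row in rows.items():
    print(seq_name, len(row), len(ungap(row)))
```

Removing the gap characters from a row should give back the corresponding input sequence, which makes a convenient sanity check after parsing.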

Secondary structures predicted by PSIPRED are given in the following format:

83 CCCCCCCCCCCCCHHHHHHHHHHHHHHHHCCCCCCCHHHHHHHHHHHHHHCCHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCC
85 CCCCCCCCCCHHHHHHHHHHHHHHHHCCCCCHHHHHHHHHHHHHCCCHHHHHHHHHHHHHHHHHHHHHCHCCCCCCCCCCCCCCC
91 CCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHCCCCCHHHHHHHHHHHHHHCCHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCHCCCCCC
86 CCCCCCCCHHHHHHHHHHHHHHHHCCCCCHHHHHHHHHHHHHCCCHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCC
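
Each line of this file can be read as a length followed by one predicted state per residue, where PSIPRED's three states are H (helix), E (strand) and C (coil). The following Python sketch parses such a line and summarises its composition. It assumes that the leading number is the sequence length; the function names are hypothetical, and the short record in the usage example is made up for illustration.

```python
def parse_structure_line(line):
    """Split a 'length states' line and check that the two parts agree."""
    fields = line.split()
    if len(fields) != 2:
        raise ValueError("expected a length followed by a state string")
    length, states = int(fields[0]), fields[1]
    if len(states) != length:
        raise ValueError(f"declared length {length}, found {len(states)} states")
    unexpected = set(states) - set("HEC")
    if unexpected:
        raise ValueError(f"unexpected state symbols: {sorted(unexpected)}")
    return states


def state_fractions(states):
    """Fraction of residues predicted as helix (H), strand (E) and coil (C)."""
    return {symbol: states.count(symbol) / len(states) for symbol in "HEC"}


states = parse_structure_line("6 CHHHHC")   # made-up record, not server output
fractions = state_fractions(states)
```

The length check catches files in which a structure string was truncated or wrapped during copying.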

This is PHYLIP tree format:

 
(((1aab_       :0.000019,
1k99_A      :0.000019):0.000055,
1j46_A      :0.000075):0.000213,
lef_A       :0.000287);
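
The tree is written in the parenthesised (Newick) notation used by PHYLIP, so it can also be read programmatically. The following Python sketch is a minimal recursive parser for trees of the simple form shown above (no quoted names, no comments). The function names parse_newick and leaf_depths are hypothetical.

```python
import re


def parse_newick(text):
    """Parse a simple Newick string into nested (name, length, children) tuples."""
    # Tokens are single punctuation marks or runs of any other non-space text.
    tokens = re.findall(r"[(),:;]|[^\s(),:;]+", text)
    punctuation = {"(", ")", ",", ":", ";"}
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
        name = ""
        if pos < len(tokens) and tokens[pos] not in punctuation:
            name = tokens[pos]
            pos += 1
        length = 0.0
        if pos < len(tokens) and tokens[pos] == ":":
            length = float(tokens[pos + 1])
            pos += 2
        return (name, length, children)

    return node()


def leaf_depths(tree, depth=0.0):
    """Return {leaf name: summed branch length from the root}."""
    name, length, children = tree
    depth += length
    if not children:
        return {name: depth}
    result = {}
    for child in children:
        result.update(leaf_depths(child, depth))
    return result


tree_text = """(((1aab_       :0.000019,
1k99_A      :0.000019):0.000055,
1j46_A      :0.000075):0.000213,
lef_A       :0.000287);"""
depths = leaf_depths(parse_newick(tree_text))
for leaf, distance in depths.items():
    print(leaf, f"{distance:.6f}")
```

Summing branch lengths from the root to each leaf is a quick way to compare how far the sequences lie from the root of the tree.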

Trees can be visualized using the drawtree program contained in Joe Felsenstein's PHYLIP software package.
