--------------------------------------------------------------

     Flexible Multiple Alignment Program - Version 0.34
         - Suboptimal and Parametric Analysis -

--------------------------------------------------------------

1. This archive

This archive consists of the following files:

FMA/*      :  This directory consists of programs.

README
Makefile
align.c
align.h
astar.c
astar_pairopt.c
astar_pairopt_step.c
astar_pairopt_ub.c
dp.c
eppstein.c
eppstein_heap.c
estimator.c
heap.c
heap.h
histogram.c
io.c
node.c
node_list.c
node_set.c
parametric.c
score.c
weight_generator.c

FMA/dat/* :  This directory consists of data, such as score tables and
             sample protein sequences.

BLOSUM62.score
MUTMTX.score
PAM-120.score
PAM-200.score
PAM-250.score
PAM-320.score
PAM-40.score
PAM-80.score
PET-250.score
PET-91.score
PROPERTIES.score
README.scores
STRUCTURE.score
athb.dat
chitin-synthase.dat
ef1a.dat
ef1a.txt
ef1as.dat
ef2.dat
globin.dat
hat.dat
rhodopsin.dat
tcrb.dat
tcrd.dat
tnfa.dat


2. How to compile programs.

Only you have to do is:

  % make

Then you have two programs, `align' and `weight_generator'.
I checked these programs only on Sun OS 4.1.x (Sun SS5, SS10, SS20),
Solaris 2.5 (Sun Ultra 1), and linux (Dell 466/L).
If you have any problems or questions, please contact Tetsuo Shibuya
(tshibuya@trl.ibm.co.jp).


3. Features of these programs

`align' is the flexible multiple alignment program which can deal with
optimal solution of multiple alignment, enumeration of suboptimal solutions,
parametric analysis, and so on.
`weight_generator' is a toy program which can generate weight tables easily.


4. How to use

Examples :

  If you want ....

a) to see help

 % align -h

b) to compute the optimal alignment of the top 7 sequences in dat/ef1a.dat
 using the default score table (PAM-250)

 % align -7 -f dat/ef1a.dat

Notes.
  Only the top 7 sequences in dat/ef1a.dat can be aligned without using
the enhanced A* algorithm (-u option) on SS-20 with 128 megabyte memory.

c) to compute the optimal alignment of the top 8 sequences in dat/ef1a.dat
using the default score table (PAM-250) using the enhanced A* algorithm
with the lower bound of the optimal score, 33129 (optimal score).

 % align -8 -f dat/ef1a.dat -u 33129

Notes.
 You can obtain this lower bound using heuristic algorithm such
as A algorithm [1] as in example e), iterative methods which you may
have, and so on.
 The top 9 sequences in dat/ef1a.dat cannot be aligned even with this
enhanced A* algorithm.

d) to compute the optimal alignment of the 2nd, 3rd, and 7th sequences in
dat/ef1a.dat using the score table PAM-320, and letting internal gap cost
-3 and external gap cost -4

 % align -3 2 3 7 -c dat/PAM-320.score -f dat/ef1a.dat -g -3 -4

e) to compute an alignment heuristically using A algorithm [1] of 1.02
of top 10 sequences in dat/ef1a.dat

 % align -10 -f dat/ef1a.dat -p -k 50 51

f) to compute the histogram of suboptimal solutions of top 7 sequences
 in dat/ef1a.dat using the default score table (PAM-250) up to Delta = 20.

 % align -7 -D 20 -f dat/ef1a.dat

g) to compute the histogram of suboptimal solutions of top 7 sequences
 in dat/ef1a.dat using the default score table (PAM-250) up to Delta = 10
 and output the all the computed suboptimal solutions

 % align -7 -D 20 -f dat/ef1a.dat -P

h) to compute the histogram of suboptimal solutions of top 7 sequences
 in dat/ef1a.dat using the default score table (PAM-250) up to Delta = 20
 with more efficient enumeration method [3,4]

 % align -7 -D 20 -d -f dat/ef1a.dat

i) to compute the histogram of suboptimal solutions of top 8 sequences
 in dat/ef1a.dat using the default score table (PAM-250) up to Delta = 20
 with more efficient enumeration method [3,4] and the enhanced A* algorithm

 % align -8 -D 20 -d -f dat/ef1a.dat -u 33129

j) to compute the histogram of suboptimal solutions of top 7 sequences
 in dat/ef1a.dat using the default score table (PAM-250) up to Delta = 10
 with more smart enumeration method [3,4] and output the all the
 computed suboptimal solutions

 % align -7 -D 10 -d -f dat/ef1a.dat -P

k) to do parametric analysis between the gap penalties -1.5 and -24

 % align -5 -gg -3 2 -24 1 -f dat/ef1a.dat

l) to do parametric analysis between the score tables foo.score
 and bar.score

 % align -5 -cc foo.score bar.score -f dat/ef1a.dat

m)  to do parametric analysis between the weight tables foo.weight
 and bar.weight

 % align -5 -ww foo.weight bar.weight -f dat/ef1a.dat


5. History of this program

  The engine for computation of the optimal alignment by A* algorithm
is written by Takahiro Ikeda in 1994 [1].
  The engine for enumeration of suboptimal alignments is written by
Tetsuo Shibuya in 1995 (Version 0.20 - 0.29) [3,4].
  The engine for parametric optimization is written by Tetsuo Shibuya
in 1996 (Version 0.30 - 0.34) [2,4].


6. Future Works

  These problems remains as future works:

  This program does not support computing the optimal solution
using weights, though it supports it in the suboptimal analysis.
(It may be only an hour's work...)
  As for the parametric analysis, the user interface must be improved.
  For all the procedures, this program does not support floating points
and big numbers, which causes inconvenience especially parametric analyses.
  Furthermore, this program does not use affine gap costs.

  This program consists a lot of unnecessary codes in it, such as
bidirectional A* algorithm. Thus the efficiency of this program will
be much higher if we rewrite the code.


7. Copyrights

  You can freely redistribute or modify this archive, but T. Shibuya
will be very happy if you kindly tell him(tshibuya@trl.ibm.co.jp) what
you did or will do using this program.

Copyrights 1997 Tetsuo Shibuya and Takahiro Ikeda


8. References

You can get most of the following papers from the web page:

   http://naomi.is.s.u-tokyo.ac.jp/papers.html


[1] T. Ikeda and H. Imai:
Fast A* Algorithms for Multiple Sequence Alignment.
Proceedings of the Genome Informatics Workshop V, Yokohama,
Universal Academy Press, 1994, pp.90-99.

[2] T. Shibuya and H. Imai:
Parametric Alignment of Multiple Biological Sequences,
Proceedings of Genome Informatics Workshop VII, Tokyo,
Universal Academy Press, December 1996, pp.41-50.

[3] T. Shibuya and H. Imai:
Enumerating Suboptimal Alignments of Multiple Biological Sequences Efficiently.
Proceedings of the Pacific Symposium on Biocomputing '97,
Hawaii, January 1997, pp. 267-276.

[4] T. Shibuya and H. Imai:
New Flexible Approaches for Multiple Sequence Alignment,
Journal of Computational Biology, Vol 4, No. 3, 1997,
Mary Ann Liebert, Inc., pp. 385-413
(Conference version: Proceedings of 1st Annual International Conference
 on Computational Molecular Biology (RECOMB '97), Santa Fe,
 January 1997, pp. 409-420.)
