Math 574 -- Optimization Models in Computational Biology (Spring 2008)
Bioinformatics and computational biology (BCB) is one of the ``hottest'' interdisciplinary areas of science today. Roughly put, BCB uses the theory and practice of computational techniques to analyze biochemical data at the molecular level. Some of the widely studied problems in this area include DNA sequencing, sequence comparisons, phylogeny and evolution, and protein structure prediction. Most such problems are inherently combinatorial in nature, and involve the selection of one or more ideal options from a huge number of candidates. Hence, they can naturally be modeled as discrete or combinatorial optimization problems. At the same time, the relatively large sizes of these problems meant that research has been focused mostly on heuristic solution techniques, until recently. Advances made in the area of computational optimization in the last few years have opened up the possibility of modeling and solving several BCB problems using optimization techniques.
This graduate course will introduce a series of optimization models that find applications to various problems from the BCB area. We will cover the basics of the following topics: dynamic programming; graph algorithms (paths and flows); clustering and trees; linear, non-linear, and integer programming (including convex polytopes); certain probabilistic models; and very limited algebraic statistics. Emphasis will be on problem formulations, and not on the underlying theory of the models. We will also introduce relevant software tools (AMPL/CPLEX, MATLAB etc.). The applications of these optimization models to BCB will be illustrated by studying problems such as sequence motif search, DNA sequence alignment (including parametric sequence alignment), recombinations and other related phylogenetic problems, protein sequencing, and protein structure prediction (including side-chain positioning, scoring functions for threading, molecular dynamics etc.).
We will not follow any particular book, and will rely mostly on handouts and class notes. Material from several recent (and not-so recent) papers will also be covered. Since the main goal of this course is to expose the audience to this interdisciplinary research area, evaluation will be done through homeworks (around six assignments of moderate length) and a course project. No exams will be given.
AnnouncementsFriday, Feb 15: Hw3 is posted (see below). Its due in around 10 days (Tuesday, Feb 26)
Tutorial from Mathworks page
Another guide to MATLAB from UBC CS.
Partial Digest algorithm (M-files): PartialDigest.m Place.m IsSubset.m
AMPL Handout 1
Farmer Jones example: model file data file
Inventory model: model file data file Output from AMPL
Ugly Chucky problem: model file data file Output from AMPL
DDP IP: model file data file Output from AMPL