Math 574  -- Optimization Models in Computational Biology (Spring 2008)

Course Description

Bioinformatics and computational biology (BCB) is one of the ``hottest'' interdisciplinary areas of science today. Roughly put, BCB uses the theory and practice of computational techniques to analyze biochemical data at the molecular level. Some of the widely studied problems in this area include DNA sequencing, sequence comparisons, phylogeny and evolution, and protein structure prediction. Most such problems are inherently combinatorial in nature, and involve the selection of one or more ideal options from a huge number of candidates. Hence, they can naturally be modeled as discrete or combinatorial optimization problems. At the same time, the relatively large sizes of these problems meant that research has been focused mostly on heuristic solution techniques, until recently. Advances made in the area of computational optimization in the last few years have opened up the possibility of modeling and solving several BCB problems using optimization techniques.

This graduate course will introduce a series of optimization models that find applications to various problems from the BCB area. We will cover the basics of the following topics: dynamic programming; graph algorithms (paths and flows); clustering and trees; linear, non-linear, and integer programming (including convex polytopes); certain probabilistic models; and very limited algebraic statistics. Emphasis will be on problem formulations, and not on the underlying theory of the models. We will also introduce relevant software tools (AMPL/CPLEX, MATLAB etc.). The applications of these optimization models to BCB will be illustrated by studying problems such as sequence motif search, DNA sequence alignment (including parametric sequence alignment), recombinations and other related phylogenetic problems, protein sequencing, and protein structure prediction (including side-chain positioning, scoring functions for threading, molecular dynamics etc.).

We will not follow any particular book, and will rely mostly on handouts and class notes. Material from several recent (and not-so recent) papers will also be covered. Since the main goal of this course is to expose the audience to this interdisciplinary research area, evaluation will be done through homeworks (around six assignments of moderate length) and a course project. No exams will be given.

Syllabus

Web page for one of the reference books -   An Introduction to Bioinformatics Algorithms.

Announcements

Friday, Feb 15: Hw3 is posted (see below). Its due in around 10 days (Tuesday, Feb 26)

Handouts

Slides of short summary on Molecular Biology   (see above web page for full slides)

Homeworks    
Homework 1     -- Due on Tuesday, Jan 29.
    Solutions to Homework 1
Homework 2     -- Due on Tuesday, Feb 12.
    Solutions to Homework 2
Homework 3     -- Due on Tuesday, Feb 26.
    Solutions to Homework 3
Homework 4     -- Due on Thursday, Mar 6.
    Solutions to Homework 4
Homework 5     -- Due on Thursday, Apr 3.
    Solutions to Homework 5
Homework 6     -- Due on Thursday, Apr 17.     ListDecoys_Math574.txt    Scores2B.txt
    Solutions to Homework 6

Projects

PCR-based Sequencing (Andy Wu)
ILP for side-chain positioning (Matt Labrum)
Temporal Analysis of Bio Networks (Michael Brunner and Chang Hun You)
LP for Protein Classification (Ye Tian)
Metabolic flux Elucidation (Roberto Echeverry and Rigoberto Rios)
Stochastic SDP for side-chain positioning (Limin Yang)

Papers

Dynamic Programming model for Mass Spectrometry   (explains spectrum graph)  
Approximation Algorithms for SCS problem  (Shortest Common Superstring)
Four-body scoring function for protein decoy discrimination
Linear Programming Techniques for Fold Recognition
Integer Programming model for side-chain positioning
Semidefinite Programming model for side-chain positioning   NEW!!
Protein Threading by Linear Programming
Branch-and-Cut method for Multiple Sequence Alignment    NEW!!
Protein domain decomposition using max-flow problem   NEW!!
Integer Programming for the closest string problem   NEW!!

Software

MATLAB Tutorial from Mathworks page
Another guide to MATLAB from UBC CS.
   Partial Digest algorithm (M-files): PartialDigest.m   Place.m   IsSubset.m

AMPL
AMPL Handout 1
   Farmer Jones example: model file   data file
   Inventory model: model file   data file    Output from AMPL
   Ugly Chucky problem: model file   data file    Output from AMPL
   DDP IP: model file   data file    Output from AMPL

VMD

Links

Introduction to Computational Molecular Biology in MIT.
NCBI's genome data base.
A good animation showing DNA microarray methodology.
Bioinformatics journal web site (free access to articles on campus).
Nathan Edwards' lectures. The first one on Proteomics and Mass Spectrometry is recommended.


Last modified: Tue Feb 15 15:52:47 PST 2011