Mathematics Colloquium: A Linear Optimization Based Predictor for Solubility Mutagenesis
4:10 p.m. Neill Hall 5W
Mutagenesis is the process of changing one or more amino acids in a protein to others. It is a standard tool for engineering proteins with desirable properties such as increased stability, decreased solubility, and so on. Computational tools are invaluable for narrowing down the number of candidate mutations to try in the lab in order to identify one that works. We have developed a classification model to predict whether a mutation will increase or decrease protein solubility. We build models using concepts from computational geometry to capture relationships between the protein structure and its sequence. We then define a weighted log-likelihood scoring function for making predictions. The weights for this predictor are obtained through linear optimization (LO). Model robustness and prediction accuracy are demonstrated using various cross-validation techniques. We also compare our LO model to predictors built with other standard machine learning methods, such as Support Vector Machines (SVM) and the Least Absolute Shrinkage and Selection Operator (LASSO). On the dataset of mutations we have assembled, the LO model performs best overall.
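
As a rough illustration of how weights for a linear scoring function can be obtained by linear optimization, one common formulation is sketched below. It is an assumption made for illustration, not necessarily the model presented in the talk. The symbols are hypothetical: x_i is the vector of log-likelihood terms for mutation i, y_i is +1 if solubility increased and -1 if it decreased, w is the weight vector, and xi_i is a slack variable measuring how badly mutation i is misclassified.

```latex
% Illustrative soft-margin linear program (assumed form, not the speakers' exact model).
% Decision variables: weights w \in \mathbb{R}^d and slacks \xi \in \mathbb{R}^n.
\begin{align*}
\min_{w,\,\xi} \quad & \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i \, (w^{\top} x_i) \ \ge\ 1 - \xi_i, && i = 1, \dots, n, \\
                        & \xi_i \ \ge\ 0,                          && i = 1, \dots, n.
\end{align*}
% Prediction for a new mutation with feature vector x:
% solubility is predicted to increase if w^{\top} x > 0 and to decrease otherwise.
```

Both the objective and the constraints are linear in (w, xi), so the problem can be handed to any standard LP solver. Minimizing the total slack penalizes training mutations whose score falls on the wrong side of the margin.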