The success of deep learning in computational biology has so far been largely limited to prediction problems, such as protein structure prediction and gene expression prediction. Nevertheless, these successes show that deep neural networks can extract useful information from datasets of biological sequences, and this has recently motivated research into applying deep learning to biological sequence design. In this paper, we tackle two important synthetic biology problems: (1) designing promoter sequences that are differentially expressed, and (2) the inverse protein folding problem of recovering protein sequences from a given three-dimensional structure. We frame both as black-box computational design problems and adapt conservative objective models (COMs), a data-driven offline model-based optimization (MBO) technique that has been used successfully on a wide range of design problems, to design biological sequences in these settings. On both problems, we demonstrate that our approach significantly outperforms standard offline MBO techniques.
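To make the COMs idea more concrete, here is a minimal sketch of a conservative surrogate objective. It is not the implementation used in this work. It assumes a toy linear surrogate over continuous feature vectors, whereas real promoter or protein designs are discrete sequences that would need an encoding and a neural-network surrogate. All function names (`predict`, `adversarial_designs`, `com_loss_and_grad`, `train_com`) and hyperparameters are hypothetical. The surrogate is fit with a squared-error loss plus a penalty, weighted by `alpha`, that pushes predictions down on designs found by gradient ascent on the surrogate itself and pushes them up on dataset points.

```python
import numpy as np

def predict(w, x):
    """Hypothetical toy surrogate f_hat(x) = x @ w (stand-in for a neural network)."""
    return x @ w

def adversarial_designs(w, x_start, step=0.1, n_steps=5):
    """Gradient ascent on the surrogate, starting from dataset points.
    For a linear model, grad_x f_hat(x) = w at every x."""
    x = x_start.copy()
    for _ in range(n_steps):
        x = x + step * w
    return x

def com_loss_and_grad(w, x, y, alpha=0.5):
    """Conservative objective (assumed form):
    0.5 * mean squared error + alpha * (mean f_hat(x_adv) - mean f_hat(x_data)).
    The adversarial designs are treated as fixed samples (no gradient through ascent)."""
    x_adv = adversarial_designs(w, x)
    resid = predict(w, x) - y
    mse = 0.5 * np.mean(resid ** 2)
    gap = np.mean(predict(w, x_adv)) - np.mean(predict(w, x))
    loss = mse + alpha * gap
    # d(mse)/dw = x^T resid / n; d(gap)/dw with x_adv held fixed = mean(x_adv) - mean(x)
    grad = x.T @ resid / len(y) + alpha * (x_adv.mean(axis=0) - x.mean(axis=0))
    return loss, grad

def train_com(x, y, alpha=0.5, lr=0.1, n_iters=200):
    """Plain gradient descent on the conservative objective."""
    w = np.zeros(x.shape[1])
    for _ in range(n_iters):
        _, g = com_loss_and_grad(w, x, y, alpha)
        w -= lr * g
    return w

# Usage on synthetic data (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w_plain = train_com(x, y, alpha=0.0)  # ordinary least-squares fit
w_com = train_com(x, y, alpha=0.5)    # conservative fit
```

On this linear toy, the penalty acts like ridge shrinkage, so `w_com` ends up with a smaller norm than `w_plain`. With a neural-network surrogate, the same term is meant to discourage the model from overestimating designs that lie outside the training data, which is the failure mode offline MBO must guard against.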