Description
Quantification and regulation of gene expression are central areas of inquiry in genomic analysis. Ribosome profiling experiments allow us to measure gene expression directly at the point of protein production. We present a feedforward neural network model that predicts the local translation rate of a protein as a function of the sequence context undergoing translation and of the RNA structure in that region. The model's predictions correlate with the translation rates observed by ribosome profiling (Pearson’s r = 0.58). We describe a procedure for processing ribosome profiling data for this model, discuss the assumptions underlying the model formulation, and present a series of model selection analyses. Finally, we present an algorithm that optimizes the coding sequence of a gene for fast protein production, as predicted by our neural network regression model.
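As a rough illustration of the two ingredients named above, the sketch below shows one way a sequence context could be turned into input features for a regression network, and how Pearson's r between predicted and observed rates is computed. It is a minimal sketch, not the authors' implementation: the one-hot codon encoding, the window sizes `left` and `right`, and the function names `one_hot_window` and `pearson_r` are all assumptions made for this example, and RNA structure features are omitted.

```python
from itertools import product
from math import sqrt

BASES = "ACGU"
# All 64 codons in a fixed order; a codon's index is its one-hot position.
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def one_hot_window(codons, center, left=5, right=4):
    """One-hot encode the codons in a window around position `center`.

    `left` and `right` are hypothetical window sizes chosen for this
    example. Window positions that fall outside the sequence are
    encoded as all-zero blocks.
    """
    features = []
    for pos in range(center - left, center + right + 1):
        block = [0.0] * len(CODONS)
        if 0 <= pos < len(codons):
            block[CODON_INDEX[codons[pos]]] = 1.0
        features.extend(block)
    return features


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy transcript of four codons; a three-codon window centered on codon 1.
    toy = ["AUG", "GCU", "GAA", "UAA"]
    x = one_hot_window(toy, center=1, left=1, right=1)
    print(len(x), sum(x))
```

A feature vector of this kind would be the input to the regression network, and `pearson_r` would be applied to the network's outputs and the measured rates on held-out positions.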