Resampling Methods for Protein Structure Prediction

Blum, Benjamin Norman; EECS Department, University of California

Description

Ab initio protein structure prediction entails predicting the three-dimensional conformation of a protein from its amino acid sequence without the use of an experimentally determined template structure. In this thesis, I present a new approach to ab initio protein structure prediction that divides the search problem into two parts: sampling in a space of discrete-valued structural features, and continuous search over conformations while constraining the desired features. Both parts are carried out using Rosetta, a leading structure prediction algorithm. Rosetta is a Monte Carlo energy minimization method that requires many random restarts to find structures near the correct, or "native," structure. Our methods, which we call "resampling" methods, use an initial round of Rosetta-generated local minima to learn properties of the energy landscape that guide a subsequent resampling round of Rosetta search toward better predictions. One of the main innovations of this thesis is to try to deduce from the initial set of Rosetta models not the entire native conformation but only a few specific features of it. Features include backbone torsion angles, per-residue secondary structure, exposure of residues to solvent, and a three-tiered hierarchy of beta-pairing features. For each feature there is one "native" value: the one found in the native structure. Native feature values are generally enriched in low-energy structures, because the native structure of a protein is significantly lower in energy than non-native structures and the energy of a protein is, to some extent, a sum of spatially local contributions. We have developed two methods for feature-space resampling based on this observation. The first method employs feature selection methods to identify structural feature values that give rise to low energy; these values are then enriched in the resampling round.
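The enrichment idea behind the first method can be illustrated with a minimal sketch. It assumes each first-round Rosetta model has already been summarized as a dict of discrete feature values (for example, secondary structure at residue 5 or burial of residue 12) paired with its energy. In place of the thesis's feature selection methods it swaps in a simple ratio: how often a value appears among the lowest-energy models versus among all models. The function names and the top_fraction, threshold, and boost parameters are hypothetical illustrations, not the thesis's actual procedure.

```python
from collections import Counter, defaultdict

def low_energy_enrichment(models, energies, top_fraction=0.2):
    """Hypothetical score: frequency of each (feature, value) among the
    lowest-energy models divided by its frequency among all models."""
    n = len(models)
    k = max(1, int(round(top_fraction * n)))
    order = sorted(range(n), key=lambda i: energies[i])  # ascending energy
    low = [models[i] for i in order[:k]]
    overall = Counter((f, v) for m in models for f, v in m.items())
    in_low = Counter((f, v) for m in low for f, v in m.items())
    return {fv: (in_low[fv] / k) / (overall[fv] / n) for fv in overall}

def select_enriched(enrichment, threshold=1.5):
    """Stand-in for feature selection: keep values above a fixed ratio."""
    return sorted(fv for fv, r in enrichment.items() if r >= threshold)

def resampling_distribution(models, selected, boost=3.0):
    """Per-feature sampling distribution from first-round counts, with the
    selected (presumed low-energy) values up-weighted by `boost`."""
    counts = defaultdict(Counter)
    for m in models:
        for f, v in m.items():
            counts[f][v] += 1
    dist = {}
    for f, c in counts.items():
        w = {v: n * (boost if (f, v) in selected else 1.0) for v, n in c.items()}
        z = sum(w.values())
        dist[f] = {v: x / z for v, x in w.items()}
    return dist

# Toy first-round data (invented for illustration only).
models = [
    {"ss5": "H", "bur12": "buried"},
    {"ss5": "H", "bur12": "exposed"},
    {"ss5": "E", "bur12": "exposed"},
    {"ss5": "E", "bur12": "exposed"},
    {"ss5": "H", "bur12": "buried"},
]
energies = [-120.0, -95.0, -80.0, -70.0, -115.0]

enr = low_energy_enrichment(models, energies, top_fraction=0.4)
selected = select_enriched(enr)
dist = resampling_distribution(models, set(selected))
print(selected)
print(dist)
```

With these toy numbers the two lowest-energy models both have "ss5" = "H" and "bur12" = "buried", so those two values are selected and their sampling probabilities rise (to 9/11 and 2/3) for the next round. A multiplicative boost is used here only because it is easy to read; any scheme that shifts probability mass toward the selected values would fit the description above.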
The second, more sophisticated method updates the sampling distribution for all features at once, not just a selected few, by predicting the likelihood that each feature value is native. Our results indicate that both methods, especially the second one, yield structure predictions significantly better than those produced by Rosetta alone.
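The second method's "update every feature at once" idea can be sketched as follows, with the caveat that the thesis predicts native likelihoods with its own model, which is not reproduced here. As a stand-in, this sketch estimates the probability that a value is native as its share of Boltzmann-style weights exp(-(E - E_min)/T) over the first-round models. It then blends that estimate with the previous sampling distribution for every feature. The temperature and mix parameters and the function names are hypothetical, and the sketch assumes every model assigns a value to every feature.

```python
import math
from collections import defaultdict

def native_probabilities(models, energies, temperature=10.0):
    """Hypothetical estimate of P(value is native) for every feature:
    the energy-weighted fraction of models that carry each value."""
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    total = sum(weights)
    probs = defaultdict(lambda: defaultdict(float))
    for m, w in zip(models, weights):
        for f, v in m.items():
            probs[f][v] += w / total
    return {f: dict(vals) for f, vals in probs.items()}

def update_sampling(prior, native_probs, mix=0.5):
    """Blend each feature's old distribution with the predicted native
    probabilities, then renormalize so each feature sums to 1."""
    updated = {}
    for f, p_old in prior.items():
        p_nat = native_probs.get(f, {})
        values = set(p_old) | set(p_nat)
        new = {v: (1 - mix) * p_old.get(v, 0.0) + mix * p_nat.get(v, 0.0)
               for v in values}
        z = sum(new.values())
        updated[f] = {v: x / z for v, x in new.items()}
    return updated

# Toy example: two first-round models differing at one feature.
models = [{"ss5": "H"}, {"ss5": "E"}]
energies = [-100.0, -90.0]
prior = {"ss5": {"H": 0.5, "E": 0.5}}

p_native = native_probabilities(models, energies, temperature=10.0)
new_dist = update_sampling(prior, p_native, mix=0.5)
print(p_native)
print(new_dist)
```

Mixing with the prior rather than replacing it keeps every previously sampled value at nonzero probability. This is one simple way to avoid committing too early to a wrong prediction, and it is an assumption of this sketch rather than a detail taken from the thesis.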

Details

Title

Resampling Methods for Protein Structure Prediction

Creator

Blum, Benjamin Norman, Author
EECS Department, University of California, Publisher

Published

2008-12-22

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Other Identifiers

EECS-2008-184

Type

Text

Format

technical reports

Extent

86 p

Archive

The Engineering Library

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports
