Description
We consider the performance of the bootstrap in high-dimensional linear regression, where p < n but p/n is not close to zero. We study ordinary least squares as well as robust regression methods and adopt a minimalist performance requirement: can the bootstrap give us good confidence intervals for a single coordinate of $\beta$ (where $\beta$ is the true regression vector)? We show through a mix of numerical and theoretical work that the bootstrap is fraught with problems. Both of the most commonly used bootstrap methods for regression, the residual bootstrap and the pairs bootstrap, give very poor inference on $\beta$ as the ratio p/n grows: the residual bootstrap tends to give anti-conservative intervals (inflated Type I error), while the pairs bootstrap gives very conservative intervals (severe loss of power). We also show that the jackknife resampling technique for estimating the variance of $\hat{\beta}$ severely overestimates the variance in high dimensions. Based on our theoretical results, we propose alternative bootstrap procedures that mitigate these problems. However, the corrections depend on assumptions about the underlying data-generating model, suggesting that in high dimensions it may be difficult to have universal, robust bootstrapping techniques.
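As a rough illustration of the two resampling schemes compared above, the sketch below (not taken from the paper; the Gaussian design, the dimensions n = 500, p = 250, and the number of replicates B are illustrative assumptions) builds percentile confidence intervals for a single OLS coordinate under both the residual bootstrap and the pairs bootstrap when p/n = 0.5.

```python
# Minimal sketch, assuming a Gaussian design and i.i.d. Gaussian noise:
# percentile bootstrap CIs for one coordinate of the OLS estimate when
# p/n is not small. All dimensions and B are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, p, B = 500, 250, 1000            # p/n = 0.5
X = rng.standard_normal((n, p))
beta = np.zeros(p)                  # true coefficients; target coordinate is beta_1 = 0
y = X @ beta + rng.standard_normal(n)

def ols(X, y):
    # Least-squares fit (p < n, so the solution is unique a.s.).
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta_hat = ols(X, y)
resid = y - X @ beta_hat

def residual_bootstrap(X, beta_hat, resid, B, rng):
    # Resample residuals with replacement, regenerate responses, refit.
    out = np.empty(B)
    for b in range(B):
        y_star = X @ beta_hat + rng.choice(resid, size=len(resid), replace=True)
        out[b] = ols(X, y_star)[0]
    return out

def pairs_bootstrap(X, y, B, rng):
    # Resample (x_i, y_i) pairs with replacement, refit.
    n = len(y)
    out = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        out[b] = ols(X[idx], y[idx])[0]
    return out

for name, draws in [("residual", residual_bootstrap(X, beta_hat, resid, B, rng)),
                    ("pairs", pairs_bootstrap(X, y, B, rng))]:
    lo, hi = np.percentile(draws, [2.5, 97.5])
    print(f"{name:8s} bootstrap 95% CI for beta_1: [{lo:.3f}, {hi:.3f}]")
```

Repeating such a simulation over many datasets and recording how often the intervals cover the true coordinate is one way to observe the anti-conservative versus conservative behavior described above; the sketch shows only a single realization.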