Multiple Optimality Guarantees in Statistical Learning

Duchi, John; EECS Department, University of California

Description

Classically, the performance of estimators in statistical learning problems is measured in terms of their predictive ability or estimation error as the sample size n grows. In modern statistical and machine learning applications, however, computer scientists, statisticians, and analysts have a variety of additional criteria they must balance: estimators must be efficiently computable, data providers may wish to maintain anonymity, and large datasets must be stored and accessed. In this thesis, we consider the fundamental questions that arise when trading between multiple such criteria--computation, communication, privacy--while maintaining statistical performance. Can we develop lower bounds that show there must be tradeoffs? Can we develop new procedures that are both theoretically optimal and practically useful? To answer these questions, we explore examples from optimization, confidentiality-preserving statistical inference, and distributed estimation under communication constraints. Viewing our examples through a general lens of constrained minimax theory, we prove fundamental lower bounds on the statistical performance of any algorithm subject to the specified constraints--computational, confidentiality, or communication. These lower bounds allow us to guarantee the optimality of the new algorithms we develop to address the additional criteria we consider, and we also show some of the practical benefits that a focus on multiple optimality criteria brings.

In somewhat more detail, the central contributions of this thesis include the following: we (i) develop several new stochastic optimization algorithms, applicable to general classes of stochastic convex optimization problems, including methods that are automatically adaptive to the structure of the underlying problem, parallelize naturally to attain linear speedup in the number of processors available, and may be used asynchronously, (ii) prove lower bounds demonstrating the optimality of these methods, (iii) provide a variety of information-theoretic tools--strong data processing inequalities--useful for proving lower bounds in privacy-preserving statistical inference, communication-constrained estimation, and optimization, (iv) develop new algorithms for private learning and estimation, guaranteeing their optimality, and (v) give simple distributed estimation algorithms and prove fundamental limits showing that they (nearly) optimally trade off between communication (in terms of the number of bits distributed processors may send) and statistical risk.
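
For orientation, the constrained minimax framing mentioned in the description can be written in the standard form sketched below. This is conventional notation supplied here for illustration and is not taken from this record: the family of distributions \mathcal{P}, the parameter \theta(P), the loss \ell, and the class \mathcal{C} of estimators satisfying a given constraint are all assumed symbols, and the thesis itself may use different ones.

```latex
% Classical minimax risk: best worst-case expected loss over all estimators
% (notation assumed for illustration).
\mathfrak{M}_n(\theta(\mathcal{P}), \ell)
  = \inf_{\hat{\theta}} \, \sup_{P \in \mathcal{P}} \,
    \mathbb{E}_P\!\left[ \ell\bigl(\hat{\theta}(X_1, \ldots, X_n), \theta(P)\bigr) \right]

% Constrained minimax risk: the infimum is restricted to estimators in a
% class \mathcal{C} (e.g. private, communication-limited, or
% computation-limited procedures).
\mathfrak{M}_n(\theta(\mathcal{P}), \ell, \mathcal{C})
  = \inf_{\hat{\theta} \in \mathcal{C}} \, \sup_{P \in \mathcal{P}} \,
    \mathbb{E}_P\!\left[ \ell\bigl(\hat{\theta}(X_1, \ldots, X_n), \theta(P)\bigr) \right]

% Restricting the infimum can only increase the risk:
\mathfrak{M}_n(\theta(\mathcal{P}), \ell, \mathcal{C})
  \;\ge\; \mathfrak{M}_n(\theta(\mathcal{P}), \ell)
```

Read this way, a lower bound on the constrained quantity that matches the risk achieved by a specific procedure in \mathcal{C} is what certifies that procedure as optimal under the constraint, and the gap between the two quantities measures the statistical price of the constraint.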

Details

Title

Multiple Optimality Guarantees in Statistical Learning

Creator

Duchi, John, Author
EECS Department, University of California, Publisher

Published

2014-05-15

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Other Identifiers

EECS-2014-79

Type

Text

Format

technical reports

Extent

268 p

Archive

The Engineering Library

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports
