In real-world data science and machine learning, data are inevitably imperfect. Data contamination arises from many sources. It may come from human error, which can be avoided with more care. However, it may also come from sources such as systematic measurement error and adversarial data poisoning, which are hard to avoid and even to detect. Consequently, there is a need for statistical methods that remain reliable despite such contamination. Formally speaking, we want to design efficient algorithms that provide provable guarantees for learning problems under certain models of contamination.

In this article, we examine some important techniques in the recent development of efficient algorithms for robust statistics, namely filtering-based methods and sum-of-squares techniques. Specifically, we focus on the problem of learning linear models (including linear regression, generalized linear models, etc.) under the strong contamination model. We fully present and analyze the assumptions and guarantees of SEVER [DKK+19] and of the sum-of-squares-based algorithm for robust linear regression in [KKM20]. SEVER is a meta-algorithm that takes a well-conditioned base learner and outputs an outlier-robust version of it. The [KKM20] robust linear regression algorithm is an elegant and simple application of sum-of-squares techniques to robust regression in general, including l1, l2, and polynomial regression. Both algorithms have an O(√ε) error dependence on the outlier fraction ε. We present and prove the theoretical guarantees of these algorithms, which shed light on future directions in which the error dependence and the required assumptions can be improved.
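To make the filtering idea concrete, the following is a minimal sketch of a SEVER-style filtering loop for least-squares regression: fit the base learner, score each point by its projection onto the top singular direction of the centered per-point gradients, and discard the highest-scoring points. The function name, the fixed number of rounds, and the hard quantile-based removal rule are illustrative simplifications for this sketch, not the exact (randomized) filter analyzed in [DKK+19].

```python
import numpy as np

def sever_filter_sketch(X, y, eps=0.1, rounds=4):
    """Simplified SEVER-style filtering for least squares (illustrative only)."""
    idx = np.arange(len(y))
    w = None
    for _ in range(rounds):
        Xs, ys = X[idx], y[idx]
        # Base learner: ordinary least squares on the currently kept points.
        w, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        # Per-point gradients of the squared loss at w.
        grads = 2 * (Xs @ w - ys)[:, None] * Xs
        centered = grads - grads.mean(axis=0)
        # Top right-singular vector of the centered gradient matrix.
        v = np.linalg.svd(centered, full_matrices=False)[2][0]
        # Outlier scores: squared projections onto that direction.
        scores = (centered @ v) ** 2
        # Crude filter: drop the eps-fraction of points with the largest scores.
        keep = scores <= np.quantile(scores, 1 - eps)
        if keep.all():
            break
        idx = idx[keep]
    return w

# Usage: a clean linear model plus a small fraction of gross outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)
y[:10] += 50.0  # contaminate 5% of the responses
w_hat = sever_filter_sketch(X, y, eps=0.1)
```

Because the contaminated points have anomalously large gradients, they dominate the top singular direction and are removed in the first round, after which the base learner recovers the clean fit.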



