With the greater adoption of statistical and machine learning methods across science and industry, a greater awareness of the need to align statistical theory ever more closely with the demands of applications is developing. One recurring theme within this process is the re-examination of basic questions and core assumptions through the lens of modern mathematical statistics. This thesis targets two such basic questions in two different contexts: posterior simulation using Markov chain Monte Carlo (MCMC), on the one hand; and multiple hypothesis testing, on the other. For MCMC, we analyze convergence in terms of the expectations of a limited number of query functions, rather than the entire posterior. We show both theoretically and via simulations that the resultant theory predicts the required chain length more sharply than global convergence criteria. Furthermore, we provide matching lower bounds that show our bounds are essentially optimal in their dependence on chain parameters and target accuracy. For multiple testing, we provide the first general framework for establishing the optimal tradeoff between false positives (measured by the False Discovery Rate, or FDR) and false negatives (measured by the False Non-discovery rate, or FNR). We begin by proving some of the first non-asymptotic results available on this topic -- initially in the context of Gaussian-like test statistics. We then go on to develop the more general framework. The latter applies to test statistics with essentially arbitrary analytic forms and dependence, recovers previous results as special cases, and yields numerically simulable lower bounds that can be evaluated in almost any model of interest.




Download Full History