Description
We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification problem in which we wish to predict a response variable $Y$ from an explanatory variable $X$, we treat the problem of dimensionality reduction as that of finding a low-dimensional ``effective subspace'' of $X$ that retains the statistical relationship between $X$ and $Y$. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem, we establish a general nonparametric characterization of conditional independence using covariance operators on a reproducing kernel Hilbert space. This characterization allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods for dimensionality reduction in supervised learning, the proposed method requires neither assumptions on the marginal distribution of $X$ nor a parametric model of the conditional distribution of $Y$. We present experiments that compare the performance of our method with that of conventional methods.
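As a rough sketch of this formulation (the matrix $B$, the projected variable $U = B^{\top}X$, and the operators $\Sigma_{\cdot\cdot}$ below are notation introduced here for illustration and are not defined in the text above), the effective subspace can be described through a matrix $B$ with orthonormal columns such that
\[
  Y \mathrel{\perp\!\!\!\perp} X \mid B^{\top} X ,
\]
and the covariance operators on the reproducing kernel Hilbert spaces enter through the conditional covariance operator
\[
  \Sigma_{YY\mid U} \;=\; \Sigma_{YY} - \Sigma_{YU}\,\Sigma_{UU}^{-1}\,\Sigma_{UY},
  \qquad U = B^{\top} X ,
\]
which, for sufficiently rich kernels, is smallest (in the partial order of self-adjoint operators) exactly when the conditional independence above holds; a contrast function for estimating $B$ can then be obtained from an empirical, regularized version of this operator.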