Modern learning problems in natural language processing, computer vision, computational biology, and other fields often involve large-scale datasets with millions of samples and/or millions of features, and are therefore challenging to solve. Simply replacing the original data with a simpler approximation, such as a sparse or low-rank matrix, can dramatically reduce the computational effort required to solve the problem; however, some information in the original data is lost during the approximation. In some cases, the solution obtained by directly solving the learning problem on the approximated data may be infeasible for the original problem or may have undesirable properties. In this thesis, we present a new approach that uses data approximation techniques while accounting, via robust optimization, for the error introduced by the approximation, in order to obtain learning algorithms that solve large-scale learning problems efficiently while preserving learning quality. In the first part of this thesis, we give a brief review of robust optimization and its appearances in the machine learning literature. In the second part, we examine two data approximation techniques, namely data thresholding and low-rank approximation, and discuss their connection to robust optimization.
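As a concrete illustration (not taken from the thesis itself), the two approximation techniques named above can be sketched in a few lines of NumPy. The tolerance `tol` and target rank `k` below are arbitrary choices for the example; the approximation error computed at the end is the quantity a robust formulation would account for.

```python
import numpy as np

rng = np.random.default_rng(0)
# Dense data matrix: rows are samples, columns are features.
X = rng.standard_normal((100, 50))

# Data thresholding: zero out small entries to obtain a sparse matrix.
tol = 1.0  # arbitrary threshold for this sketch
X_sparse = np.where(np.abs(X) >= tol, X, 0.0)

# Low-rank approximation: keep the top-k singular triplets (truncated SVD).
k = 10  # arbitrary target rank for this sketch
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_lowrank = (U[:, :k] * s[:k]) @ Vt[:k, :]

# The spectral-norm error ||X - X_k||_2 of the best rank-k approximation
# equals the (k+1)-th singular value (Eckart-Young theorem).
err = np.linalg.norm(X - X_lowrank, ord=2)
print(f"nonzeros kept: {np.count_nonzero(X_sparse)} / {X.size}")
print(f"rank-{k} error: {err:.4f}  (sigma_{k+1} = {s[k]:.4f})")
```

A learning algorithm that trains directly on `X_sparse` or `X_lowrank` ignores the discarded information; the robust-optimization viewpoint instead treats the original `X` as lying within a known distance (such as `err`) of the approximation and optimizes against that uncertainty.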