Sparse machine learning has recently emerged as a powerful tool for obtaining interpretable models of high-dimensional data at low computational cost. The approach has been successfully used in many areas, such as signal and image processing. In sparse learning classification, for example, prediction accuracy or some other classical measure of performance is not the sole concern: we also wish to understand which few features are relevant as markers for classification. Furthermore, many sparse learning tasks in practice, including cross-validation, parameter search, and leave-one-out analysis, involve multiple instances of similar problems, each instance sharing a large part of the learning data with the others.

In this thesis, we introduce a robust framework for solving these multiple sparse regressions, formulated as square-root LASSO problems, based on a sketch of the learning data that uses low-rank approximations. Our approach allows a dramatic reduction in computational effort while not sacrificing, and sometimes even improving, the statistical performance. We present our technique by first studying sparse optimization with applications in different domains of interest, from text analytics to system design, and then developing a theory of robust solutions for sparse regression in the multi-instance setting. We also provide comparisons with other heuristics for obtaining sparse models in various applications.

In more detail, the central contributions of this thesis are: (i) identifying key tasks in the domains of interest under real-world settings; (ii) proposing models suited to these tasks along the axes of computational complexity and model interpretability; (iii) exploiting problem structure when working with multiple instances to robustly reduce computation while maintaining high learning performance; and (iv) proposing applications of our robust solutions in high-dimensional settings.
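To make the two ingredients named above concrete, the following is a minimal sketch, not the thesis' actual algorithm: it writes down the square-root LASSO objective ||Xw - y||_2 + λ||w||_1, minimizes it with plain subgradient descent, and then re-solves against a rank-k truncated-SVD approximation of X, standing in for the shared low-rank sketch. All function names, step sizes, and data dimensions here are illustrative assumptions.

```python
import numpy as np

def sqrt_lasso_objective(X, y, w, lam):
    """Square-root LASSO objective: ||Xw - y||_2 + lam * ||w||_1."""
    return np.linalg.norm(X @ w - y) + lam * np.abs(w).sum()

def sqrt_lasso_subgradient(X, y, lam, steps=2000, lr=0.01):
    """Crude subgradient descent; enough to illustrate the objective."""
    n, d = X.shape
    w = np.zeros(d)
    best_w, best_obj = w.copy(), sqrt_lasso_objective(X, y, w, lam)
    for t in range(steps):
        r = X @ w - y
        norm_r = np.linalg.norm(r)
        # Subgradient of ||Xw - y||_2 (the zero-residual case handled separately)
        g = X.T @ (r / norm_r) if norm_r > 1e-12 else np.zeros(d)
        g = g + lam * np.sign(w)
        w = w - lr / np.sqrt(t + 1) * g
        obj = sqrt_lasso_objective(X, y, w, lam)
        if obj < best_obj:
            best_w, best_obj = w.copy(), obj
    return best_w, best_obj

rng = np.random.default_rng(0)
n, d, k = 100, 30, 5
# Synthetic data: approximately rank-k plus small noise, the regime in
# which a shared low-rank sketch is useful.
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
X += 0.01 * rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [1.0, -2.0, 0.5]
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Low-rank sketch via truncated SVD: solve against X_k instead of X.
# In the multi-instance setting, X_k would be computed once and reused.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = (U[:, :k] * s[:k]) @ Vt[:k]  # rank-k approximation of X

lam = 0.5
w_full, obj_full = sqrt_lasso_subgradient(X, y, lam)
w_sketch, obj_sketch = sqrt_lasso_subgradient(X_k, y, lam)
print(obj_full, obj_sketch)
```

The point of the sketch is that the SVD is computed once and amortized: each of the many similar instances (e.g., cross-validation folds) is then solved against the cheap rank-k factor rather than the full data matrix.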