Description
We address two challenges in AutoML research: first, how to represent ML programs in a way suitable for metalearning; and second, how to improve the evaluation of AutoML systems so that approaches, and not just predictions, can be compared.
To this end, we have designed and implemented a framework for ML programs that provides all the components needed to describe ML programs in a standard way. The framework is extensible and its components are decoupled from each other; for example, it can be used to describe ML programs that use neural networks. We provide reference tooling for executing programs described in the framework. We have also designed and implemented a metalearning database, a service that stores information about executed ML programs generated by different AutoML systems.
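To give a sense of what a standard, declarative description of an ML program might look like, the sketch below shows one possible structure as plain data. The field and primitive names are hypothetical illustrations and do not reproduce the framework's actual schema.

```python
# A minimal sketch of a standard ML program (pipeline) description as plain data.
# All field and primitive names are hypothetical, not the framework's real schema.
import json

pipeline_description = {
    "id": "example-pipeline-0001",
    "inputs": [{"name": "dataset"}],
    "steps": [
        {
            "primitive": "example.preprocessing.Imputer",        # hypothetical primitive
            "hyperparams": {"strategy": "mean"},
            "arguments": {"inputs": "inputs.0"},
        },
        {
            "primitive": "example.classification.RandomForest",  # hypothetical primitive
            "hyperparams": {"n_estimators": 100},
            "arguments": {"inputs": "steps.0.produce"},
        },
    ],
    "outputs": [{"data": "steps.1.produce"}],
}

# Serializing the description makes it straightforward to store executed
# programs in a metalearning database and to compare pipelines produced
# by different AutoML systems.
print(json.dumps(pipeline_description, indent=2))
```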
We evaluate our framework by measuring the computational overhead of using it compared to executing ML programs that call underlying libraries directly. We observe that execution of ML programs through the framework is an order of magnitude slower, and that memory usage is twice that of equivalent ML programs which do not use the framework.
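The kind of overhead comparison described above can be sketched with standard-library instrumentation alone. In the sketch below, `run_direct` and `run_via_framework` are hypothetical placeholders standing in for the same ML program executed directly and through the framework; they are not the programs used in the actual evaluation.

```python
# Generic sketch of measuring runtime and peak-memory overhead.
# The two workload functions are placeholders, not real ML programs.
import time
import tracemalloc

def run_direct():
    # Stand-in for an ML program that calls underlying libraries directly.
    return sum(i * i for i in range(1_000_000))

def run_via_framework():
    # Stand-in for the same ML program executed through the framework.
    return sum(i * i for i in range(1_000_000))

def measure(workload):
    """Return (wall-clock seconds, peak memory in bytes) for one run."""
    tracemalloc.start()
    start = time.perf_counter()
    workload()
    elapsed = time.perf_counter() - start
    _, peak_memory = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak_memory

direct_time, direct_mem = measure(run_direct)
framework_time, framework_mem = measure(run_via_framework)
print(f"time overhead: {framework_time / direct_time:.1f}x, "
      f"memory overhead: {framework_mem / direct_mem:.1f}x")
```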
We demonstrate our framework's ability to evaluate AutoML systems by comparing 10 different AutoML systems that use the framework. The results show that the framework can be used both to describe a diverse set of ML programs and to determine unambiguously which AutoML system produced the best ML programs. In many cases, the produced ML programs outperformed ML programs written by human experts.