Fine-tuned language models (LMs) provide the backbone for popular services such as ChatGPT, GitHub Copilot, and Cohere AI. The competitive edge of these systems often arises from their proprietary fine-tuning data (e.g., user-submitted prompts), and thus companies invest substantial resources into collecting and protecting this data. In this work, we study model imitation as a method to close the gap between open-source LMs and their closed-source counterparts. In the first part, we propose a framework for cheaply imitating proprietary language models in specific domains. In particular, we create a prompting pipeline that first asks what tasks a particular LM can solve and then asks for input-output examples of those tasks. We then fine-tune open-source LMs on these supervised input-output examples to create imitation models. We show that human evaluators rate the outputs of these imitation models more highly as the models grow larger and the querying budget increases. In the second part, we apply this general framework to ChatGPT and release Koala, our strongest imitation model. Initial evaluations show that Koala achieves impressive qualitative performance relative to ChatGPT in specific domains.
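The two-stage querying pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `query_model` is a hypothetical stand-in for a call to the proprietary LM's API, stubbed here with canned responses so the flow runs offline, and the prompt wording and data format are assumptions.

```python
# Minimal sketch of the two-stage imitation-data pipeline (illustrative only).

def query_model(prompt: str) -> str:
    # Hypothetical stub: a real pipeline would send `prompt` to the target
    # proprietary LM's API. Canned responses keep this sketch self-contained.
    if "List tasks" in prompt:
        return "summarization\nquestion answering"
    return "Input: <example input>\nOutput: <example output>"

def collect_imitation_data(domain: str, examples_per_task: int = 2) -> list[dict]:
    # Stage 1: ask the target LM which tasks it can solve in the domain.
    tasks = query_model(f"List tasks you can solve in {domain}.").splitlines()
    # Stage 2: ask for input-output examples of each task.
    data = []
    for task in tasks:
        for _ in range(examples_per_task):
            demo = query_model(f"Give an input-output example for: {task}")
            inp, out = demo.split("\nOutput: ")
            data.append({
                "task": task,
                "input": inp.removeprefix("Input: "),
                "output": out,
            })
    return data

pairs = collect_imitation_data("customer support")
# The resulting (input, output) pairs become the supervised fine-tuning set
# for the open-source imitation model.
```

In a real run, the same loop would simply point `query_model` at the target model's API and deduplicate or filter the returned examples before fine-tuning.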