Description
Human modeling is a crucial step toward effective human-AI collaboration, and human data plays an important role in it because it provides information on how people actually behave. Existing methods work well on a single task when plenty of on-task human data is available, but real-world human-AI collaboration usually involves a distribution of disjoint tasks, and collecting human data on every one of them is unrealistic. Consequently, naive human modeling can fail on tasks that lack human data. However, as long as the task distribution is known, we can still train a multi-task policy through self-play. Because this policy must learn robust representations of all tasks, it can serve as an effective initialization for human models. We provide theoretical justification for this technique and show its benefits in a challenging multi-task setting: multi-layout Overcooked-AI.