Description

We present progress in developing stable, scalable, and transferable generative models for visual data. We first learn expressive image priors using autoregressive models that generate high-quality and diverse images. We then explore transfer learning to generalize visual representation models to new data modalities with limited available data. We propose two methods to generate high-quality 3D graphics from sparse input images or natural language descriptions by distilling knowledge from pretrained discriminative vision models. We briefly summarize our work on improving generation quality with a Denoising Diffusion Probabilistic Model, and demonstrate how to transfer it to new modalities, including high-quality text-to-3D synthesis using Score Distillation Sampling. Finally, we generate 2D vector graphics from text by optimizing a vector graphics renderer with knowledge distilled from a pretrained text-to-image diffusion model, without any vector graphics training data. Our models enable high-quality generation across many modalities, and continue to be broadly applied in subsequent work.
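
Two of the techniques named above have standard published formulations that may help orient the reader; the equations below are the conventional forms from the diffusion literature, not reproduced from this thesis, and the notation (a noise schedule \(\alpha_t, \sigma_t\), a differentiable renderer \(g(\theta)\), text conditioning \(y\)) is assumed rather than taken from the full text. A Denoising Diffusion Probabilistic Model trains a denoiser \(\hat{\epsilon}_\phi\) to predict the noise added to data:

\[
\mathcal{L}_{\mathrm{DDPM}} = \mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)} \left[ \left\lVert \epsilon - \hat{\epsilon}_\phi(\alpha_t x_0 + \sigma_t \epsilon;\, t) \right\rVert^2 \right]
\]

Score Distillation Sampling then optimizes the parameters \(\theta\) of a renderer (a 3D scene or a vector graphic) by backpropagating the pretrained denoiser's prediction error through the rendered image \(x = g(\theta)\), with a timestep weighting \(w(t)\):

\[
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta) = \mathbb{E}_{t,\, \epsilon} \left[ w(t) \left( \hat{\epsilon}_\phi(\alpha_t\, g(\theta) + \sigma_t \epsilon;\, y, t) - \epsilon \right) \frac{\partial g(\theta)}{\partial \theta} \right]
\]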
