Visual Content Creation by Generative Adversarial Networks

Azadi, Samaneh

Description

We live in a world made up of different objects, people, and environments interacting with each other: people who work, write, eat, and drink; vehicles that move on land, water, or in the air; rooms that are furnished with chairs, tables, and carpets. This vast amount of information can be easily collected from the videos and photographs shared online. However, teaching an intelligent machine agent to reliably analyze and understand this extensive collection of data remains a challenge. Generative models, which operate by teaching a machine to create new content, are among the most compelling methods for modeling visual realism from the large corpus of available images. These models are beneficial not only for understanding the visual world but, more deeply, for visual synthesis and content creation. They can assist human users in manipulating and editing existing visual content. In the last few years, Generative Adversarial Networks (GANs), an important type of generative model, have made remarkable progress in learning complex data manifolds by generating data points from scratch. The GAN training procedure pits two neural networks against each other: a generator and a discriminator. The discriminator is trained to distinguish between real samples and generated ones. The generator is trained to fool the discriminator into thinking its outputs are real. In doing so, the network learns the real-world distribution while generating high-quality images, translating a text phrase into an image, or transforming images from one domain to another. This dissertation investigates algorithms to improve the performance of such models in creating new visual content, specifically in structural and compositional domains ranging from hand-designed fonts to complex natural scenes.
In Chapter 2, we consider text as a visual element and propose tools to synthesize new glyphs in a font domain and transfer the style of the seen characters to the generated ones. Starting in Chapter 3, we focus on the domain of natural images and propose GAN models capable of synthesizing complex scene images with wide variation in the number of objects, their locations, shapes, and other attributes. In Chapter 4, we explore the role of compositionality in GAN frameworks and propose a new method to learn a function that maps images of different objects, sampled from their marginal distributions, into a combined sample that captures the joint distribution of object pairs. Despite all the improvements in training GANs, fully optimizing the GAN generator in a two-player adversarial game remains a challenge, resulting in samples that do not always follow the target distribution. In Chapter 5, instead of trying to improve the training procedure, we propose an approach to improve the quality of the trained generator by post-processing its generated samples using information from the optimized discriminator.
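
The adversarial game between generator and discriminator described in the abstract can be illustrated with a minimal sketch. It assumes the common binary cross-entropy formulation with a non-saturating generator loss, which may differ from the losses used in the dissertation itself. The function names `discriminator_loss` and `generator_loss` are hypothetical, and the inputs are taken to be discriminator outputs already squashed to probabilities in (0, 1).

```python
import math


def discriminator_loss(d_real, d_fake):
    """Cross-entropy loss for the discriminator (assumed formulation).

    d_real: discriminator probabilities assigned to real samples.
    d_fake: discriminator probabilities assigned to generated samples.
    The loss is low when real samples score near 1 and fakes near 0.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss (assumed formulation).

    The loss is low when the discriminator scores generated
    samples as real, i.e. when the generator fools it.
    """
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])
# A discriminator that separates the two sets well has a lower loss.
confident = discriminator_loss([0.9, 0.9], [0.1, 0.1])
```

In training, the two losses are minimized alternately with respect to the discriminator's and the generator's parameters; this sketch only evaluates the losses and omits the networks and the optimizer.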

Details

Title

Visual Content Creation by Generative Adversarial Networks

Creator

Azadi, Samaneh, Author

Published

2021-05-13

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Type

Text

Format

technical reports

Extent

118 p

Language

eng

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports
