Multimedia network video services, such as multi-point video conferencing and multimedia desktop editing/publishing, require real-time, high-performance video signal compositing and manipulation. This dissertation investigates three degrees of freedom in the design of video compositing/manipulation systems: feature, location, and data format. Our goal is to provide a systematic approach to network video compositing/manipulation by integrating the explorations in all three degrees of freedom, and by accounting for their interactions with one another and with other multimedia technologies, in particular video compression. Representative compositing features include geometrical transformations, linear filtering, opaque/semi-transparent overlapping, pixel multiplication, and arbitrarily-shaped (AS) video objects. We propose a structured video model, on which we base several hierarchical structures for representing compositing functions, and we study their restructuring properties. We characterize various performance factors for different compositing locations throughout the network. We also propose a shared distributed compositing principle to match various user/service requirements and optimize the overall system performance in multimedia networks.