Proceedings Article | 25 January 2011
KEYWORDS: Printing, Visualization, Computer programming, Computer architecture, Video acceleration, Video, Image processing, Information visualization, Clouds, Graphics processing units
This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of
this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality,
compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness
techniques, applied toward parallel acceleration. We will define this class of algorithm as "GPU-Complete" and will
postulate the necessary properties of the algorithms for admission into this class. We will also formally relate his
algorithmic space and imaging algorithms space. This concept is based upon our experience in the print production area
where GPUs (Graphic Processing Units) have shown a substantial cost/performance advantage within the context of HPdelivered
enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their
underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system
design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited
to each architecture: for CPU: language compilation, word processing, operating systems, and other applications that are
highly sequential in nature; for GPU: video rendering, particle simulation, pixel color conversion, and other problems
clearly amenable to massive parallelization. While GPUs establishing themselves as a second, distinct computing
architecture from CPUs, their end-to-end system cost/performance advantage in certain parts of computation inform the
structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for
parallelization, we show that their introduction into the design space of printing systems demonstrate the trade-offs
against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application,
we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a welldefined
subset of algorithms, GPU-Completeness is intended to connect the parallelism, algorithms and efficient architectures into a unified framework to show that multiple layers of parallel implementation are guided by the same underlying trade-off.