Energy-Efficient Reduce-and-Rank Using Input-Adaptive Approximations
Approximate computing is an emerging design paradigm that exploits the intrinsic ability of applications to produce acceptable outputs even when their computations are executed approximately. In this paper, we explore approximate computing for a key computation pattern, reduce-and-rank (RnR), which is prevalent in a wide range of workloads, including video processing, recognition, search, and data mining. An RnR kernel performs a reduction operation (e.g., distance computation, dot product, and L1-norm) between an input vector and each of a set of reference vectors, and ranks the reduction outputs to select the top reference vectors for the current input. We propose three complementary approximation strategies for the RnR computation pattern. The first is interleaved reduction-and-ranking, wherein the vector reductions are decomposed into multiple partial reductions and interleaved with the rank computation. Leveraging this transformation, we propose the use of intermediate reduction results and ranks to identify future computations that are likely to have a low impact on the output, and can, hence, be approximated. The second strategy, input-similarity-based approximation, exploits the spatial or temporal correlation of inputs (e.g., pixels of an image or frames of a video) to identify computations that are amenable to approximation. The third strategy, reference vector reordering, rearranges the order in which the reference vectors are processed such that vectors that are relatively more critical in evaluating the correct output, are processed at the beginning of RnR operation. The number of these critical reference vectors is usually small, which renders a substantial portion of the total computation to be amenable to approximation. These strategies address a key challenge in approximate computing—identification of which computations to approximate—and may be used to drive any approximation mechanism, such as computation skipping or precision scaling to realize performance and energy improvements. A second key challenge in approximate computing is that the extent to which computations can be approximated varies significantly from application to application, and across inputs for even a single application. Hence, input-adaptive approximation, or the ability to automatically modulate the degree of approximation based on the nature of each individual input, is essential for obtaining optimal energy savings. In addition, to enable quality configurability in RnR kernels, we propose a kernel-level quality metric that correlates well to application-level quality, and identify key parameters that can be used to tune the proposed approximation strategies dynamically. We develop a runtime framework that modulates the identified parameters during the execution of RnR kernels to minimize their energy while meeting a given target quality. To evaluate the proposed concepts, we designed quality-configurable hardware implementations of six RnR-based applications from the recognition, mining, search, and video processing application domains in 45-nm technology. Our experiments demonstrate a $1.13\times $ – $3.18\times $ reduction in energy consumption with virtually no loss in output quality (<0.5%) at the application level. The energy benefits further improve up to $3.43\times $ and $3.9\times $ when the quality constraints are relaxed to 2.5% and 5%, respectively.