Efficient Partial Online Synthesis of Special Instructions for Reconfigurable Processors
Reconfigurable processors with fine-grained runtime-reconfigurable fabrics are used to speed up applications from different domains. Such a reconfigurable fabric allows loading of application-specific accelerators, where multiple accelerators can be combined using a coarse-grained runtime-reconfigurable $\mu$Program to speed up complex, computationally intensive kernels. To allow the large degree of adaptivity in the reconfigurable fabric that is required by, e.g., multitasking systems, the $\mu$Program for a kernel should not be generated at compile time, as this would constrain the adaptivity of the system. To enable flexible and efficient use of the reconfigurable fabric, we propose the necessary runtime algorithms for: 1) accelerator placement (i.e., deciding where on the fabric an accelerator should be reconfigured at runtime); 2) $\mu$Program generation; and 3) $\mu$Program caching. Accelerator synthesis and implementation are done at compile time to reduce the runtime overhead of generating accelerators. We evaluate the proposed algorithms using different application scenarios and demonstrate the proposed concepts on a field-programmable gate array-based prototype of a reconfigurable processor. In comparison with state-of-the-art reconfigurable processors that generate $\mu$Programs at compile time, we obtain an average speedup of $1.29\times$ (up to $1.84\times$).
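To illustrate how runtime $\mu$Program generation and caching might interact, the following is a minimal sketch, not the algorithm proposed in this work. It assumes that a $\mu$Program depends only on the kernel and on the current accelerator placement, and that a least-recently-used policy is an acceptable eviction strategy. The names `MuProgramCache`, `toy_generate`, and the slot-to-accelerator dictionary are hypothetical and serve only as stand-ins.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a placement maps a fabric slot index to the name of
# the accelerator loaded there; a muProgram is a plain list of step strings.
Placement = Dict[int, str]
MuProgram = List[str]


class MuProgramCache:
    """LRU cache of muPrograms keyed by (kernel, accelerator placement)."""

    def __init__(self, capacity: int,
                 generate: Callable[[str, Placement], MuProgram]) -> None:
        self.capacity = capacity
        self.generate = generate  # stand-in for the runtime generator
        self.entries: "OrderedDict[Tuple, MuProgram]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, kernel: str, placement: Placement) -> MuProgram:
        # Sort the placement so equal placements produce the same key.
        key = (kernel, tuple(sorted(placement.items())))
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        # Miss: generate a muProgram for the current placement and store it.
        self.misses += 1
        program = self.generate(kernel, placement)
        self.entries[key] = program
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return program


def toy_generate(kernel: str, placement: Placement) -> MuProgram:
    """Assumed placeholder generator: one step per placed accelerator."""
    return [f"{kernel}: run {acc} in slot {slot}"
            for slot, acc in sorted(placement.items())]
```

In this sketch, a repeated request for the same kernel under an unchanged placement returns the cached $\mu$Program without regeneration, while a changed placement produces a new key and triggers generation again.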