Eliminating Timing Errors Through Collaborative Design to Maximize the Throughput
In advanced technology nodes, large timing margins must be added to tolerate increasingly severe process, voltage, temperature, and aging variations. The error detection and correction (EDAC) technique effectively eliminates these margins through timing speculation, but high design complexity and high hardware cost make many existing EDAC systems unsuitable for commercial processors. Based on the instruction-level locality of timing errors, a collaborative EDAC approach is proposed to address this issue. The hardware layer adopts simple, low-cost EDAC circuits to ensure correct operation when a timing error occurs, while a runtime software layer prevents recurring errors of the same instruction by sending timing error alarms to the hardware layer. Cooperation of the two layers, together with the proposed profile-guided timing error avoidance algorithm, eliminates more than 95% of timing errors with small runtime overhead. This significantly improves overall performance and alleviates pressure on the EDAC circuits. Experimental results on the three-stage commercial CK802 processor in the SMIC 40LL process show that the approach improves the peak performance of the baseline EDAC system (Razor-Lite + half-frequency replay) by 8% and reduces energy consumption by 25%, with less than 1.4% area overhead.
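The interplay of the two layers can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the proposed algorithm. The class `CollaborativeEdac`, the alarm table keyed by program counter, and the two penalty constants are all hypothetical. The model also assumes that an alarmed instruction is simply given extra time, which the abstract does not specify. It shows only why instruction-level locality lets a recorded error suppress later errors of the same instruction.

```python
from dataclasses import dataclass, field

# Hypothetical costs, chosen only for illustration (not taken from the paper).
REPLAY_PENALTY = 4  # extra cycles when the hardware detects an error and replays
AVOID_PENALTY = 1   # extra cycles when an alarmed instruction is given more time


@dataclass
class CollaborativeEdac:
    """Toy model of a hardware EDAC layer plus a software alarm layer."""
    critical_pcs: set                 # PCs assumed to violate timing at the speculative clock
    use_software_layer: bool = True   # False models a hardware-only baseline
    alarm_table: set = field(default_factory=set)  # PCs already known to be error-prone
    errors: int = 0
    cycles: int = 0

    def execute(self, pc):
        self.cycles += 1  # nominal cost of one instruction
        if pc in self.alarm_table:
            # Software layer raised an alarm for this PC: the error is avoided
            # at a small cost instead of being detected and replayed.
            self.cycles += AVOID_PENALTY
            return
        if pc in self.critical_pcs:
            # Hardware layer detects the timing error and corrects it by replay.
            self.errors += 1
            self.cycles += REPLAY_PENALTY
            if self.use_software_layer:
                # Record the offending instruction so it does not fail again.
                self.alarm_table.add(pc)


def run(trace, critical_pcs, use_software_layer):
    """Execute a PC trace and return (detected_errors, total_cycles)."""
    model = CollaborativeEdac(set(critical_pcs), use_software_layer)
    for pc in trace:
        model.execute(pc)
    return model.errors, model.cycles


if __name__ == "__main__":
    # A four-instruction loop executed 50 times, with one timing-critical instruction.
    loop_trace = [0x100, 0x104, 0x108, 0x10C] * 50
    print("hardware only:", run(loop_trace, {0x108}, use_software_layer=False))
    print("collaborative:", run(loop_trace, {0x108}, use_software_layer=True))
```

In this toy loop the hardware-only configuration replays the critical instruction on every iteration (50 errors, 400 cycles), whereas the collaborative configuration takes a single error and then avoids the rest (1 error, 253 cycles). These figures follow from the assumed penalties alone and are unrelated to the measured results reported above.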