Dynamic Memory Instruction Bypassing

Ortega, Daniel; Valero, Mateo; Ayguadé, Eduard
June 2004
International Journal of Parallel Programming;Jun2004, Vol. 32 Issue 3, p199
Academic Journal
Reducing the latency of load instructions is one of the most critical factors in achieving high performance on current and future microarchitectures. Deep pipelining lengthens load-to-use latency even for loads that hit in the cache. In this paper we present a dynamic mechanism that detects relations between address-producing instructions and the loads that consume these addresses, and uses this information to access data before the load is even fetched from the I-cache. This mechanism is not intended to prefetch from outside the chip but to move data from L1 and L2 silently and ahead of time into the register file, allowing the load instruction itself to be bypassed (hence the name). An average performance improvement of 22.24% is achieved on the SPECint95 benchmarks.
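The abstract's idea of linking an address producer to its dependent load can be illustrated with a toy simulation. This is a conceptual sketch only, not the authors' hardware design: the link table, function names, and memory contents below are illustrative assumptions. A table learns which load consumes the address a given instruction produces; once trained, executing the producer moves the data into the register file early, so the load itself can be bypassed.

```python
# Toy model of dynamic memory instruction bypassing (illustrative only).
# A "link table" maps an address-producing instruction's PC to the load
# that consumes its result. When the producer executes, data is pulled
# from (simulated) cache into the register file ahead of the load.

MEMORY = {0x1000: 42, 0x1008: 7}   # stand-in for L1/L2 contents
regfile = {}                        # architectural register file
link_table = {}                     # producer PC -> (load PC, load dest reg)

def train(producer_pc, load_pc, dest_reg):
    """Record that the load at `load_pc` consumes the address
    produced by the instruction at `producer_pc`."""
    link_table[producer_pc] = (load_pc, dest_reg)

def execute_producer(pc, dest_reg, address):
    """Execute an address-producing instruction; if a linked load is
    known, also perform the early data move and report the bypassed PC."""
    regfile[dest_reg] = address
    if pc in link_table:                      # bypass opportunity detected
        load_pc, load_dest = link_table[pc]
        regfile[load_dest] = MEMORY[address]  # data arrives before the load
        return load_pc                        # this load can now be bypassed
    return None

# Usage: after one training pass, the load at PC 0x24 is satisfied early.
train(producer_pc=0x20, load_pc=0x24, dest_reg="r2")
bypassed = execute_producer(0x20, "r1", 0x1000)
```

Note that, as in the paper's premise, the data movement here stays on-chip (cache to register file); nothing is fetched from memory beyond what the load would have touched anyway.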


Related Articles

  • Performance Without A Power Penalty. Davis, Jessica // Electronic News;3/13/2006, Vol. 52 Issue 11, p61 

    The article reports on the new Core Microarchitecture launched by Intel in March 2006. Intel focuses on power in creating the microarchitecture. It discusses the five separate areas designed to improve performance on which the microarchitecture relies including Intelligent Power Capability. A...

  • Cache memory energy minimization in VLIW processors. Mohamed, Nagm; Botros, Nazeih; Alweh, Mohamad // Journal of Communication & Computer;Dec2009, Vol. 6 Issue 12, p70 

    This is a comparative study of cache energy dissipation in Very Long Instruction Word (VLIW) and classical superscalar microprocessors. While architecturally different, the two types are analyzed in this work under the assumption of having similar underlying silicon fabrication platforms....

  • Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach. Savadi, Abdorreza; Deldari, Hossein // Journal of Supercomputing;Feb2014, Vol. 67 Issue 2, p565 

    Computer benchmarking is a common method for measuring the parameters of a computational model, and it can be applied to measure the parameters of any computer. With the emergence of multicore computers, their evaluation has come under renewed consideration. Since these types of computers can be viewed...

  • Complex Systems Require Complex Benchmarks.  // Database Trends & Applications;Jan2004, Vol. 18 Issue 1, p26 

    Focuses on the IOzone benchmark project tool. Examination of the performance of two hard-disk drives; Assessment of the database community; Observation of the caching mechanisms.

  • Execution History Guided Instruction Prefetching. Zhang, Yi; Haga, Steve; Barua, Rajeev // Journal of Supercomputing;Feb2004, Vol. 27 Issue 2, p129 

    The increasing gap in performance between processors and main memory has made effective instruction prefetching techniques more important than ever. A major deficiency of existing prefetching methods is that most of them require an extra port to the I-cache. A recent study by Rivers et al. [19]...

  • Benchmark results posted for 208MHz 90nm ARM9.  // Electronics Weekly;7/4/2007, Issue 2295, p7 

    The article focuses on the results of the evaluation conducted by the Embedded Microprocessor Benchmarking Consortium (EEMBC) for NXP Semiconductors' LPC3180 microcontroller in Great Britain. EEMBC claims that NXP's microcontroller was the first device to demonstrate the effect of an integrated...

  • 486SX chip PC.  // Management Services;Jun91, Vol. 35 Issue 6, p42 

    The first personal computer to use the new Intel 486SX microprocessor is the Premium II/20 from AST Computers. AST's upgradable architecture means that the existing Premium II does not need a complete computer architecture redesign; the new processor technology is rapidly incorporated onto a new...

  • Data Cache Prefetching With Dynamic Adaptation. Khan, Minhaj Ahmad // Computer Journal;May2011, Vol. 54 Issue 5, p815 

    Modern processors based on VLIW architecture rely heavily on software cache prefetching incorporated by the compiler. For accurate prefetching different factors such as latencies of the loop iterations need to be taken into account, which cannot be determined at (static) compile time....

  • Dual Independent Bus.  // Network Dictionary;2007, p164 

    An encyclopedia entry on the "Dual Independent Bus" (DIB) computer architecture is presented. DIB is introduced in Intel's Pentium II to connect the processor, memory and L2 cache. The processor is connected by one bus to L2 cache, and a second connects the processor to main memory. Among the...

