TITLE

Optimized Unrolling of Nested Loops

AUTHOR(S)
Sarkar, Vivek
PUB. DATE
October 2001
SOURCE
International Journal of Parallel Programming;Oct2001, Vol. 29 Issue 5, p545
SOURCE TYPE
Academic Journal
DOC. TYPE
Article
ABSTRACT
Loop unrolling is a well known loop transformation that has been used in optimizing compilers for over three decades. In this paper, we address the problems of automatically selecting unroll factors for perfectly nested loops, and generating compact code for the selected unroll factors. Compared to past work, the contributions of our work include (i) a more detailed cost model that includes register locality, instruction-level parallelism and instruction-cache considerations; (ii) a new code generation algorithm that generates more compact code than the unroll-and-jam transformation; and (iii) a new algorithm for efficiently enumerating feasible unroll vectors. Our experimental results confirm the wide applicability of our approach by showing a 2.2� speedup on matrix multiply, and an average 1.08� speedup on seven of the SPEC95fp benchmarks (with a 1.2� speedup for two benchmarks). Larger performance improvements can be expected on processors that have larger numbers of registers and larger degrees of instruction-level parallelism than the processor used for our measurements (PowerPC 604).
ACCESSION #
17143285

 

Related Articles

  • Optimized Unrolling of Nested Loops. Sarkar, Vivek // International Journal of Parallel Programming;Oct2001, Vol. 29 Issue 5, p545 

    Loop unrolling is a well known loop transformation that has been used in optimizing compilers for over three decades. In this paper, we address the problems of automatically selecting unroll factors for perfectly nested loops, and generating compact code for the selected unroll factors. Compared...

  • Loop Transformation for Nested Loops with n-dimensional Space. Jeong Samjin // International Journal of Advancements in Computing Technology;Jul2013, Vol. 5 Issue 11, p504 

    Partitioning loops into blocks is a very important optimization issue. We review some loop partitioning techniques such as loop splitting method by thresholds and Polychronopoulos' loop splitting method for exploiting parallelism from single loop which already developed. And then, we propose...

  • Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays. Dimitroulakos, Grigorios; Kostaras, Nikos; Galanis, Michalis D.; Goutis, Costas E. // Journal of Supercomputing;May2009, Vol. 48 Issue 2, p115 

    Coarse Grain Reconfigurable Array (CGRA) architectures have been extensively used for accelerating time consuming loops. The design of such systems requires good balance between the architecture abilities and the loops’ characteristics. A reliable design is characterized by optimized...

  • Sparse Conditional Constant Propagation.  // Network Dictionary;2007, p457 

    A definition of the term "Sparse Conditional Constant Propagation" is presented. It refers to an optimization frequently utilized in compilers after conversion to static single assignment form (SSA). This algorithm simultaneously removes dead code and propagates constants throughout a program....

  • Guest Editorial: Parallel and Distributed Computing. Ozturan, Can; Grigoras, Dan // International Journal of Parallel Programming;Oct2011, Vol. 39 Issue 5, p582 

    An introduction is presented in which the editor discusses various reports within the issue on topics including algorithm programming in mapping pipelined applications, compiler algorithm in many-core systems, and Graphics Processing Units (GPU).

  • EVALUATION OF VARIOUS COMPILER OPTIMIZATION TECHNIQUES RELATED TO MIBENCH BENCHMARK APPLICATIONS. Andrews, Jeyaraj; Sasikala, Thangappan // Journal of Computer Science;Jun2013, Vol. 9 Issue 6, p749 

    Tuning compiler optimization for a given application of particular computer architecture is not an easy task, because modern computer architecture reaches higher levels of compiler optimization. These modern compilers usually provide a larger number of optimization techniques. By applying all...

  • Fast enumeration of words generated by Dyck grammars. Medvedeva, Yu. // Mathematical Notes;Jul2014, Vol. 96 Issue 1/2, p68 

    The problem of enumerating and denumerating words generated by Dyck grammars arises in the work of compilers for high-level programming languages and a number of other applications. The present paper proposes an algorithm for the fast enumeration and denumeration of words of Dyck languages; the...

  • Implementation of Power Efficient Vedic Multiplier. Sree Nivas, A.; Kayalvizhi, N. // International Journal of Computer Applications;Apr2012, Vol. 43, p21 

    Vedic multiplier is based on the ancient algorithms (sutras) followed in INDIA for multiplication. This work is based on one of the sutras called "Nikhilam Sutra". These sutras are meant for faster mental calculation. Though faster when implemented in hardware, it consumes more power than the...

  • A New Technique to Calculate the Exact Process Execution Time with the Help of the Compiler. Danesh, Ehsan; Rahmani, Amir Masoud // Journal of Applied Sciences;2007, Vol. 7 Issue 21, p3217 

    The scheduling algorithms using the exact process execution time have not been put into practice and execution time is treated as a random variable and some of the methods are statistically estimate it from past observations and some of them collect more data like source process algorithm for...

Share

Read the Article

Courtesy of VIRGINIA BEACH PUBLIC LIBRARY AND SYSTEM

Sorry, but this item is not currently available from your library.

Try another library?
Sign out of this library

Other Topics