TITLE

Scalarization Using Loop Alignment and Loop Skewing

AUTHOR(S)
Zhao, Yuan; Kennedy, Ken
PUB. DATE
January 2005
SOURCE
Journal of Supercomputing;Jan2005, Vol. 31 Issue 1, p5
SOURCE TYPE
Academic Journal
DOC. TYPE
Article
ABSTRACT
Array syntax, which is supported in many technical programming languages, adds expressive power by allowing operations on and assignments to whole arrays and array sections. To compile an array assignment statement to a uniprocessor, the language processor must convert the statement into a loop that has the same meaning. This process is calledscalarization.Scalarization presents a significant technical problem because an array assignment needs to be implemented as if all inputs are fetched before any outputs are stored. Since a loop intermixes loads and stores, the compiler typically allocates a temporary array to hold the intermediate result. Because these extra temporary arrays can cause performance problems in cache, many techniques have been developed to avoid their use or minimize their size.In this paper, we present a novel application of two compiler strategies-loop alignment and loop skewing-to address this problem. We show that these strategies can achieve the asymptotically minimal memory allocation for stencil computations. Our experiments with loop alignment and loop skewing demonstrate that it is extremely effective in improving memory hierarchy performance of Fortran 90 array code on standard uniprocessors. The result should be applicable to other array languages, such as MATLAB.
ACCESSION #
16398761

 

Related Articles

  • Optimizing AST Node for JavaScript Compiler A lightweight Interpreter for Embedded Device. Patra, Sambit Kumar; Pattanayak, Binod Kumar; Puthal, Bhagabat // Journal of Computers;Feb2013, Vol. 8 Issue 2, p349 

    Limited memory in mobile devices presents the bottleneck to execution speed. Abstract Syntax Tree (AST) could be a better option for mobile codes as it is compiled only once. Hence, to improve the execution speed of a mobile device, AST node should be designed in such a way that it would consume...

  • Syntax-Tree Regular Expression Based DFA Formal Construction. Zafar, Nazir Ahmad; Alsaade, Fawaz // Intelligent Information Management;Jul2012, Vol. 4 Issue 4, p138 

    Compiler is a program whose functionality is to translate a computer program written in source language into an equivalent machine code. Compiler construction is an advanced research area because of its size and complexity. The source codes are in higher level languages which are usually complex...

  • PATHSCALE EKO COMPILER SUITE ADOPTION ACCELERATES.  // UNIX Update;Nov2004, Vol. 15 Issue 11, p5 

    This article focuses on the PathScale EKO Compiler Suite of PathScale, a developer of the world's fastest compilers for AMD(R) Opteron processor-based Linux clusters. Many of the highest-profile research and development organizations in North America, Europe and Asia have adopted AMD Opteron...

  • ARM maximises code density without performance loss in mobile designs. Wilson, Richard // Electronics Weekly;6/29/2005, Issue 2200, p7 

    The article reports that ARM Holdings PLC has addressed code density in Java-based processor designs with a runtime compiler target for its Jazelle DBX mobile device architecture The intention is to enable run time compilers in a Java environment to maximize code density without loss of...

  • A Generic Framework for Automated Quality Assurance of Software Models -Implementation of an Abstract Syntax Tree. Owens, Darryl; Anderson, Mark // International Journal of Advanced Computer Science & Application;Jan2014, Vol. 5 Issue 1, p32 

    Syntax Tree's (AST) are used in language tools, such as compilers, language translators and transformers as well as analysers; to remove syntax and are therefore an ideal construct for a language independent tool. AST's are also commonly used in static analysis. This increases the value of ASTs...

  • Optimizing CUDA code by kernel fusion: application on BLAS. Filipovič, Jiří; Madzin, Matúš; Fousek, Jan; Matyska, Luděk // Journal of Supercomputing;Oct2015, Vol. 71 Issue 10, p3934 

    Contemporary GPUs have significantly higher arithmetic throughput than a memory throughput. Hence, many GPU kernels are memory bound and cannot exploit arithmetic power of the GPU. Examples of memory-bound kernels are BLAS-1 (vector-vector) and BLAS-2 (matrix-vector) operations. However, when...

  • Grammatical evolution based nonlinear data automatic fitting bee colony algorithm. CHEN Jian; MA Guang-zhi // Application Research of Computers / Jisuanji Yingyong Yanjiu;Nov2013, Vol. 30 Issue 11, p3257 

    This paper introduced grammatical evolution into bee colony algorithm, and proposed a new nonlinear data automatic fitting bee colony algorithm BCGE, which was based on multiple functions defined in context free grammar, and presented speedup method for BCGE by using gene truncation and...

  • Array Organization in Parallel Memories. Al-Mouhamed, Mayez // International Journal of Parallel Programming;Apr2004, Vol. 32 Issue 2, p123 

    The bandwidth mismatch between processor and main memory is one major throughput limiting problem. Although streamed computations have predictable access patterns their references have little temporal locality and are generally too long to cache. A memory and compiler co-optimization aimed at...

  • Array Organization in Parallel Memories. Al-Mouhamed, Mayez // International Journal of Parallel Programming;Apr2004, Vol. 32 Issue 2, p123 

    The bandwidth mismatch between processor and main memory is one major throughput limiting problem. Although streamed computations have predictable access patterns their references have little temporal locality and are generally too long to cache. A memory and compiler co-optimization aimed at...

Share

Read the Article

Courtesy of THE LIBRARY OF VIRGINIA

Sorry, but this item is not currently available from your library.

Try another library?
Sign out of this library

Other Topics