TITLE

Array Organization in Parallel Memories

AUTHOR(S)
Al-Mouhamed, Mayez
PUB. DATE
April 2004
SOURCE
International Journal of Parallel Programming;Apr2004, Vol. 32 Issue 2, p123
SOURCE TYPE
Academic Journal
DOC. TYPE
Article
ABSTRACT
The bandwidth mismatch between processor and main memory is a major throughput-limiting problem. Although streamed computations have predictable access patterns, their references have little temporal locality and are generally too long to cache. A memory and compiler co-optimization aimed at reducing low-level memory accesses using software and hardware locality optimizations is presented. We propose a scalable and predictable parallel memory based on a compiler synthesis of storage schemes for multi-dimensional arrays that are accessed by an arbitrary but known set of data access patterns. Using the algebra of non-singular Boolean matrices, we present an analysis of conflict-free access to (1) parallel memories and (2) alignment networks. Finding a multi-pattern storage scheme is an NP-complete problem. An effective compiler heuristic is proposed for finding a storage matrix that minimizes overall memory access time. This applies to arbitrary linear patterns and arbitrary alignment networks. It is shown that the proposed heuristic finds an optimal storage scheme for (1) parallel FFT and (2) bitonic sorting. The proposed storage scheme outperforms statically optimized storages in the case of power-of-2 multi-stride access. The case of non-power-of-2 strides is also addressed. The performance and scalability of the proposed parallel memory and its predictable access time are presented using numerical and multimedia algorithms. It is shown that a memory utilization above 83% is achieved by our storage scheme for 64 memories, which substantially outperforms previous proposals. Our approach provides a tool for matching the storage pattern with the data access patterns needed for embedded systems running streamed computations with predictable data access patterns.
ACCESSION #
17039618
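
ILLUSTRATION
The abstract describes storage schemes defined by Boolean matrices and their conflict-free access properties. The sketch below is not taken from the article: it is a minimal, textbook-style XOR example, assuming a 4x4 array stored in row-major order across 4 memory banks. The names bank_of, is_conflict_free, INTERLEAVED and XOR_SCHEME are hypothetical. Each bank bit is the parity (a GF(2) dot product) of the address bits selected by one row of a full-rank Boolean storage matrix.

```python
# Illustrative sketch only; not the article's heuristic.
# A storage matrix is given as a list of bitmasks, one per bank bit.

def bank_of(address, rows):
    """Return the bank index M * address over GF(2)."""
    bank = 0
    for i, mask in enumerate(rows):
        # Parity of the selected address bits = one GF(2) dot product.
        parity = bin(address & mask).count("1") & 1
        bank |= parity << i
    return bank

def is_conflict_free(rows, base, stride, count):
    """True if `count` strided accesses all land in distinct banks."""
    banks = {bank_of(base + k * stride, rows) for k in range(count)}
    return len(banks) == count

# 4-bit addresses (address = 4 * row + col), 2 bank bits.
INTERLEAVED = [0b0001, 0b0010]  # bank = low two address bits
XOR_SCHEME = [0b0101, 0b1010]   # bank = col XOR row

if __name__ == "__main__":
    for name, scheme in (("interleaved", INTERLEAVED), ("xor", XOR_SCHEME)):
        rows_ok = all(is_conflict_free(scheme, 4 * r, 1, 4) for r in range(4))
        cols_ok = all(is_conflict_free(scheme, c, 4, 4) for c in range(4))
        print(name, "rows conflict-free:", rows_ok, "columns conflict-free:", cols_ok)
```

Under these assumptions, plain low-order interleaving serves a row (stride 1) without conflicts but sends every element of a column (stride 4) to the same bank, whereas the XOR matrix serves both patterns without conflicts. Choosing one matrix that suits several access patterns at once is the kind of problem the abstract refers to.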

 

Related Articles

  • Scalarization Using Loop Alignment and Loop Skewing. Zhao, Yuan; Kennedy, Ken // Journal of Supercomputing;Jan2005, Vol. 31 Issue 1, p5 

    Array syntax, which is supported in many technical programming languages, adds expressive power by allowing operations on and assignments to whole arrays and array sections. To compile an array assignment statement to a uniprocessor, the language processor must convert the statement into a loop...

  • Improved Results for a Memory Allocation Problem. Epstein, Leah; Stee, Rob // Theory of Computing Systems;Jan2011, Vol. 48 Issue 1, p79 

    We consider a memory allocation problem. This problem can be modeled as a version of bin packing where items may be split, but each bin may contain at most two (parts of) items. This problem was recently introduced by Chung et al. (Theory Comput. Syst. 39(6):829-849, ). We give a simple...

  • Modern and Digitalized USB Device With Extendable Memory Capacity. Meeraa, J. Nandini; Abirami, S. Devi; Indhuja, N.; Aravind, R.; Chithiraikkayalvizhi, C.; Kumar, K. Rathina // International Journal of Advanced Computer Science & Application;Nov2012, Vol. 3 Issue 11, p184 

    This paper proposes an advanced technology which is completely innovative and creative. The urge to invent this proposal rests on the idea of making a pen drive with an extendable memory capacity and a modern, digitalized look. This device can operate without the use of a...

  • Smart home control system based on Internet of Things. Jun Zhu; Xiao Jia; Xiaoqing Mei // Applied Mechanics & Materials;2015, Vol. 738-739, p233 

    A set of smart home control system based on Internet of Things is proposed due to the rapid development of smart home. This paper designs a system platform which people can use to make a remote query and control the home applications. Constrained Application Protocol is utilized in the...

  • Ruggedised Pentium M board.  // Electronics Weekly;6/30/2004, Issue 2153, p25 

    Intel's Pentium M mobile processor is available in an ATX form factor motherboard designed for harsh operating conditions. Kontron Embedded Computers has introduced the ATX-855GME board with a 400MHz front side bus. Main features include a passively cooled l.GGHz processor, the ICH4 I/O hub with...

  • Dell cranks up low-end PowerEdge servers. Burt, Jeffrey // eWeek;2/23/2004, Vol. 21 Issue 8, p22 

    Dell Computer Corp. is rolling out two single-processor servers designed to give small-business and corporate workgroup customers advanced features, such as more powerful chips and remote management capabilities. Such customers have been trying to cobble together an IT infrastructure through...

  • Front Side Bus.  // Network Dictionary;2007, p205 

    An encyclopedia entry for "Front Side Bus (FSB)" is presented. This refers to the bus between the processor and the system memory. It is also known as the Central Processing Unit (CPU) bus, memory bus, and system bus. Generally, a faster FSB means higher processing speeds and a faster computer.

  • Storage: A Line In The SAN. Babcock, Charles // Inter@ctive Week;09/18/2000, Vol. 7 Issue 37, p64 

    Discusses the pros and cons of using Storage Area Network (SAN) and Network-Attached Storage (NAS) for computer storage. Data transmission speed of SAN; Price of SAN products; Consolidation of storage through NAS based on Ethernet; Steps in building SAN.
