Non-Strict Execution in Parallel and Distributed Computing

Cristobal-Salas, Alfredo; Tchernykh, Andrei; Gaudiot, Jean-Luc; Lin, Wen-Yen
April 2003
International Journal of Parallel Programming;Apr2003, Vol. 31 Issue 2, p77
Academic Journal
This paper surveys and demonstrates the power of non-strict evaluation in applications executed on distributed architectures. We present the design, implementation, and experimental evaluation of single-assignment, incomplete data structures in a distributed-memory architecture and an Abstract Network Machine (ANM). Incremental Structures (IS), Incremental Structure Software Cache (ISSC), and Dynamic Incremental Structures (DIS) provide non-strict data access and fully asynchronous operations that make them highly suited to the exploitation of fine-grain parallelism in distributed-memory systems. We focus on split-phase memory operations and non-strict information processing under a distributed address space to improve overall system performance. We propose and describe a novel optimization technique at the communication level. We use partial evaluation of local and remote memory accesses not only to remove much of the excess overhead of message passing, but also to reduce the number of messages when some information about the input, or part of the input, is known. We show that split-phase transactions of IS, together with the ability to defer reads, allow partial evaluation of distributed programs without losing determinacy. Our experimental evaluation indicates that commodity PC clusters with both IS and a caching mechanism, ISSC, are more robust. The system can deliver speedup for both regular and irregular applications. We also show that partial evaluation of memory accesses decreases the traffic in the interconnection network and improves the performance of MPI IS and MPI ISSC applications.
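
The single-assignment, deferred-read behavior that the abstract attributes to Incremental Structures can be illustrated with a small sketch. This is not the authors' implementation: the class name `IStructureCell`, the callback-style `read`, and the `demo` function are hypothetical, and the sketch runs in a single process with no MPI or network layer. It only shows why a read issued before the write can be deferred without losing determinacy: each cell is written at most once, so every reader receives the same value no matter when its request arrives.

```python
from typing import Any, Callable, List

_EMPTY = object()  # sentinel: distinguishes "not yet written" from a stored None


class IStructureCell:
    """One single-assignment slot with deferred reads (hypothetical sketch)."""

    def __init__(self) -> None:
        self._value: Any = _EMPTY
        self._deferred: List[Callable[[Any], None]] = []

    def read(self, on_value: Callable[[Any], None]) -> None:
        """Split-phase read: issue the request now, get the value via callback."""
        if self._value is _EMPTY:
            self._deferred.append(on_value)  # defer until the write arrives
        else:
            on_value(self._value)            # value present: reply at once

    def write(self, value: Any) -> None:
        """Single assignment: a second write to the same cell is an error."""
        if self._value is not _EMPTY:
            raise RuntimeError("cell already written (single assignment violated)")
        self._value = value
        pending, self._deferred = self._deferred, []
        for on_value in pending:             # satisfy every deferred read
            on_value(value)


def demo() -> List[int]:
    """Two reads arrive before the write, one after; all observe the same value."""
    cell = IStructureCell()
    seen: List[int] = []
    cell.read(seen.append)                   # read before write: deferred
    cell.read(lambda v: seen.append(v * 2))  # second deferred read
    cell.write(21)                           # releases both deferred reads
    cell.read(seen.append)                   # read after write: immediate
    return seen
```

In a distributed setting the callback would correspond to the reply message of a split-phase transaction, and the immutability of a written cell is what would make a software cache of remote values safe to keep without invalidation. Both points are stated here as assumptions of the sketch rather than as details taken from the paper.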


Related Articles

  • A Multi-Level WEB Based Parallel Processing System: A Hierarchical Volunteer Computing Approach. Mohamed Osman, Abdelrahman Ahmed // Enformatika;2006, Vol. 13, p66

    Over the past few years, a number of efforts have been made to build parallel processing systems that utilize the idle power of LANs and PCs available in many homes and corporations. The main advantage of these approaches is that they provide cheap parallel processing environments for those...

  • Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing. Takizawa, Hiroyuki; Kobayashi, Hiroaki // Journal of Supercomputing;Jun2006, Vol. 36 Issue 3, p219 

    This paper presents an effective scheme for clustering a huge data set using a PC cluster system, in which each PC is equipped with a commodity programmable graphics processing unit (GPU). The proposed scheme is devised to achieve three-level hierarchical parallel processing of massive data...

  • Guest Editor's Introduction. Joe, Kazuki // International Journal of Parallel Programming;Feb2003, Vol. 31 Issue 1, p1 

    This article introduces papers and studies published in the February 2003 issue of the periodical "International Journal of Parallel Programming." The publication contains four of the best papers presented in the 2002 International Symposium on High Performance Computing in Japan. Moreover, the...

  • Two Round Scheduling (TRS) Scheme for Linearly Extensible Multiprocessor Systems. Samad, Abdus; Rafiq, M. Qasim; Farooq, Omar // International Journal of Computer Applications;Jan2012, Vol. 38, p34 

    Balancing the computational load over multiprocessor networks is an important problem in massively parallel systems. The key advantage of such systems is to allow concurrent execution of workload characterized by computation units known as processes or tasks. The scheduling problem is to...

  • Need Performance? Don't Bet on Dual-Core. Davis, Jessica // Electronic News;5/22/2006, Vol. 52 Issue 21, p43 

    Focuses on the challenges facing computer companies in addressing the demand of customers for increased performance through the use of dual-core and multi-core processors in a single chip. Problems to be faced by firms in developing applications for parallel-oriented workloads; Efforts of...

  • Performance Prediction and Evaluation of Parallel Processing on a NUMA Multiprocessor. Xiaodong Zhang; Xiaohan Qin // IEEE Transactions on Software Engineering;Oct91, Vol. 17 Issue 10, p1059

    Non-Uniform Memory Access (NUMA) architectures make it possible to build large-scale, shared-memory multiprocessor systems, in comparison with nonscalable Uniform Memory Access (UMA) architectures. Most NUMA multiprocessor operations such as scheduling and synchronizing processes, accessing data...

  • An Efficient Hardware-oriented Algorithm of Spatial Motion Vector Prediction for AVS HD Video Encoder. Minghui Yang; Xiaodong Xie // Applied Mechanics & Materials;2014, Issue 556-562, p4365 

    Motion Vector Prediction (MVP) plays an important role in improving coding efficiency in the HEVC, H.264/AVC and AVS video coding standards. MVP is implemented by exploiting redundancy of adjacent-block optimal coding information under the constraint that MVP must be performed in a serial way. The...

  • Multiprocessor Scheduling by Simulated Evolution. // Journal of Software (1796217X);Oct2010, Vol. 5 Issue 10, p1128

    No abstract available.

  • A Scheduling Algorithm for Asymmetric Processor Architecture. // International Journal of Computer Applications;Dec2010, Vol. 11, p44

    The article presents a study that determines the performance of a scheduling algorithm for heterogeneous multiprocessors and asymmetric processor architecture. The study develops an optimization function that allocates processes to processors so as to reduce the overall execution time of the...

