Comparison of OpenMP 3.0 and Other Task Parallel Frameworks on Unbalanced Task Graphs

Olivier, Stephen; Prins, Jan
October 2010
International Journal of Parallel Programming;Oct2010, Vol. 38 Issue 5/6, p341
Academic Journal
The UTS benchmark is used to evaluate the expression and performance of task parallelism in OpenMP 3.0 as implemented in a number of recently released compilers and run-time systems. UTS performs a parallel search of an irregular and unpredictable search space, as arises, e.g., in combinatorial optimization problems. As such, UTS presents a highly unbalanced task graph that challenges scheduling, load balancing, termination detection, and task coarsening strategies. Expressiveness and scalability are compared for OpenMP 3.0, Cilk, Cilk++, and Intel Threading Building Blocks, as well as an OpenMP implementation of the benchmark without tasks that performs all scheduling, load balancing, and termination detection explicitly. Current OpenMP 3.0 run-time implementations generally exhibit poor behavior on the UTS benchmark. We identify inadequate load-balancing strategies and overhead costs as the primary factors contributing to poor performance and scalability.


Related Articles

  • Keep up with multiple cores. Weltzin, Casey; Bell, Ian // Electronics Weekly;5/27/2009, Issue 2385, p12 

    The article focuses on harnessing developments in parallel processing. Most commercial compilers do not automatically optimize code for parallel execution, so embedded designers turn to parallel programming techniques, which add overhead and are difficult to debug. The most apparent solution to the...

  • Evolution-Based Scheduling of Computations and Communications on Distributed-Memory Multicomputers. Al-Mouhamed, Mayez // Computer Journal;Jul1999, Vol. 42 Issue 5, p373 

    We present a compiler optimization approach that uses the simulated evolution (SE) paradigm to enhance the finish time of heuristically scheduled computations with communications. This is especially beneficial to the class of synchronous dataflow computations which are generally compiled once...

  • Improvements to First-Come-First-Served Multiprocessor Scheduling with Gang Scheduling. Siyambalapitiya, R.; Sandirigama, M. // IUP Journal of Computer Sciences;Jul2011, Vol. 5 Issue 3, p11 

    This paper proposes an improved algorithm for the multiprocessor job scheduling problem based on First-Come-First-Served (FCFS) strategy. Depending on the job processing times, some jobs are divided into multithreads while others remain as single thread jobs. Multithread jobs are processed based...

  • Extending goal-oriented parallel computer job scheduling policies to heterogeneous systems. Vasupongayya, S.; Prasitsupparote, A. // Journal of Supercomputing;Sep2013, Vol. 65 Issue 3, p1223 

    Several existing parallel computer systems partition their system resources or consist of systems from different geographical locations. This work is focusing on extending the original goal-oriented parallel computer job scheduling policies to cover such systems. The goal-oriented parallel...

  • Parallel Processing Is Speedy and Efficient.  // USA Today Magazine;Jun1998 Science, Vol. 126 Issue 2637, p13 

    Reports on the development of a compiler computer program by electrical and computer engineering professor Rudolf Eigenmann which translates conventional programs so they can run on a parallel processing computer.

  • An Automatic Parallization in Multicore Computer Using OMP. Sen, Bharti; Kumar, Sanjay // International Journal of Computer Science Engineering & Technolo;Oct2012, Vol. 2 Issue 10, p1465 

    The past several years have witnessed an ever-increasing acceptance and adoption of parallel processing. We propose dynamically adaptive programs, programs that adapt themselves to their runtime environment. We discuss the key issues in successfully applying this approach and show examples of its...

  • Practical Compiler Techniques on Efficient Multithreaded Code Generation for OpenMP Programs. Xinmin Tian; Girkar, Milind; Bik, Aart; Saito, Hideki // Computer Journal;Sep2005, Vol. 48 Issue 5, p588 

    State-of-the-art multiprocessor systems pose several difficulties: (i) the user has to parallelize the existing serial code; (ii) explicitly threaded programs using a thread library are not portable; (iii) writing efficient multi-threaded programs requires intimate knowledge of machine's...

  • Model, Design, and Evaluation of a Compiler for a Parallel Processing Environment. Baer, Jean-Loup; Ellis, Carla S. // IEEE Transactions on Software Engineering;Nov77, Vol. 3 Issue 6, p394 

    We approach the problem of designing compilers for a multiprocessing environment. We show that by modeling an existing sequential compiler, we gain an understanding of the modifications necessary to transform the sequential structure into a pipeline of processes. The pipelined compiler is then...

  • Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies. Girbal, Sylvain; Vasilache, Nicolas; Bastoul, Cédric; Cohen, Albert; Parello, David; Sigler, Marc; Temam, Olivier // International Journal of Parallel Programming;Jun2006, Vol. 34 Issue 3, p261 

    Modern compilers are responsible for translating the idealistic operational semantics of the source program into a form that makes efficient use of a highly complex heterogeneous machine. Since optimization problems are associated with huge and unstructured search spaces, this combinatorial task...
