Reducing communication costs in collective I/O in multi-core cluster systems with non-exclusive scheduling

Cha, Kwangho; Maeng, Seungryoul
September 2012
Journal of Supercomputing
As the number of nodes in high performance computing (HPC) systems increases, collective I/O becomes an important issue, and I/O aggregators are a key factor in improving its performance. When an HPC system uses non-exclusive scheduling, a different number of CPU cores per node can be assigned to an MPI job; thus, I/O aggregators experience disparities in their workloads and communication costs. Because communication behavior is influenced by the sequence of the I/O aggregators and by the number of CPU cores in neighboring nodes, changing the order of the nodes affects the communication costs of collective I/O. Few studies, however, have considered how to determine an adequate node sequence. In this study, it was found that an inappropriate node order increases the communication costs of collective I/O. To address this problem, we propose heuristic methods that regulate the node sequence. We also develop a prediction function to estimate the MPI-IO performance obtained with the proposed heuristics. The performance measurements indicated that the proposed scheme achieved its goal of preventing performance degradation in collective I/O. For instance, in a multi-core cluster system with the Lustre file system, the read bandwidth of MPI-Tile-IO improved by 7.61% to 17.21%, and the write bandwidth of the benchmark increased by 17.05% to 26.49%.


Related Articles

  • Performance Evaluation of Inter-Processor Communication for an Embedded Heterogeneous Multi-Core Processor. Shiao-Li Tsao; Sung-Yuan Lee // Journal of Information Science & Engineering;May2012, Vol. 28 Issue 3, p537 

    Embedded systems often use a heterogeneous multi-core processor to improve performance and energy efficiency. This multi-core processor is composed of a general purpose processor (GPP), which manages the program flow and I/O, and a digital signal processor (DSP), which processes mass data. An...

  • cFireworks: a Tool for Measuring the Communication Costs in Collective I/O. Kwangho Cha // International Journal of Advanced Computer Science & Application;Aug2014, Vol. 5 Issue 8, p192 

    Nowadays, many HPC systems use multi-core systems as computational nodes. Predicting the communication performance of multi-core cluster systems is a complicated job, but doing so is important for using multi-core systems efficiently. In the previous study, we introduced the simple linear...

  • Networking processors decrease cost, increase integration. Cravotta, Robert // EDN;2/2/2006, Vol. 51 Issue 3, p26 

    The article introduces the Octeon CN31XX and CN30XX networking processors from Cavium Networks Inc. The processors integrate a custom MIPS64 processor core with hardware-acceleration options for layers 3 to 7 data. They offer content processing and security services. They enable lower system...

  • Scheduling nonlinear divisible loads in a single level tree network. Suresh, S.; Kim, H.; Run, Cui; Robertazzi, T. // Journal of Supercomputing

    In this paper, we study the scheduling problem for polynomial time complexity computational loads in a single level tree network with a collective communication model. The problem of minimizing the processing time is investigated when the computational loads require polynomial order of...

  • MC-MIPOG: A Parallel t-Way Test Generation Strategy for Multicore Systems. Younis, Mohammed I.; Zamli, Kamal Z. // ETRI Journal;2010, Vol. 32 Issue 1, p73 

    Combinatorial testing has been an active research area in recent years. One challenge in this area is dealing with the combinatorial explosion problem, which typically requires a very expensive computational process to find a good test set that covers all the combinations for a given interaction...

  • Multicore Patents. Cass, Stephen // Technology Review;May/Jun2010, Vol. 113 Issue 3, p76 

    The article reports on the awarding of U.S. patent 5,617,537 to NTT Communications Corp. in 1997 for its multiple processing units, also known as cores.

  • Efficient and Scalable Parallel Algorithm for Sorting Multisets on Multi-core Systems. Cheng Zhong; Zeng-Yan Qu; Feng Yang; Meng-Xiao Yin; Xia Li // Journal of Computers;Jan2012, Vol. 7 Issue 1, p30 

    By distributing adaptively the data blocks to the processing cores to balance their computation loads and applying the strategy of "the extremum of the extremums" to select the data with the same keys, a cache-efficient and thread-level parallel algorithm for sorting Multisets on the multi-core...

  • Reducing Shared Cache Misses via dynamic Grouping and Scheduling on Multicores. Hossam El Din, Wael Amr; ElSayed, Hany Mohamed; Talkhan, Ihab ElSayed // International Journal of Advanced Computer Science & Application;Sep2014, Vol. 5 Issue 9, p135 

    Multicore technology enables the system to perform more tasks with higher overall system performance. However, this performance cannot be fully exploited because of the high miss rate in the second-level cache shared among the cores, which is one of the challenges of multicore systems. This paper...

  • DESIGN FOR REAL-TIME CONTROL: Embedded Computing on Multicore Processors. HARTMAN, KIM // Machine Design;9/4/2014, Vol. 86 Issue 10, p52 

    The article offers information on embedded computing platforms based on multicore processors that can help cut the cost of manufacturing and engineering development. Topics covered include the challenges of consolidating processing tasks, the virtualization approaches that work for this platform,...

