Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach

Savadi, Abdorreza; Deldari, Hossein
February 2014
Journal of Supercomputing;Feb2014, Vol. 67 Issue 2, p565
Academic Journal
Computer benchmarking is a common method for measuring the parameters of a computational model on any given machine. With the emergence of multicore computers, their evaluation has come under renewed consideration. Since these machines can be viewed as parallel computers, evaluation methods developed for parallel computers may also suit multicore computers. However, because multicore architectures depend heavily on the cache hierarchy, new and different benchmarks are needed to evaluate them correctly. To this end, this paper presents a method for measuring the parameters of one of the best-known multicore computational models, Multi-Bulk Synchronous Parallel (Multi-BSP). The method measures the hardware latency parameters of multicore computers, namely communication latency (g) and synchronization latency (L), for all levels of the cache memory hierarchy in a bottom-up manner. Once these parameters are determined, the performance of algorithms on multicore architectures can be evaluated level by level.
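The abstract describes timing two latency parameters: g (communication) and L (synchronization). As an illustration only, and not the paper's actual benchmark, the sketch below measures rough proxies for both in Python: L as the mean time for a group of threads to cross a barrier (one superstep boundary), and g as the per-word cost of copying a buffer through the memory hierarchy. All function names and parameters here are hypothetical, and absolute numbers from Python threads are far noisier than the hardware-level measurements the paper targets.

```python
import threading
import time

def measure_sync_latency(num_threads=4, rounds=1000):
    """Rough proxy for Multi-BSP's L: mean wall-clock time per
    barrier crossing when num_threads synchronize repeatedly."""
    barrier = threading.Barrier(num_threads)
    elapsed = [0.0]  # written by the timing thread only

    def worker(is_timer):
        if is_timer:
            start = time.perf_counter()
        for _ in range(rounds):
            barrier.wait()  # one superstep boundary
        if is_timer:
            elapsed[0] = time.perf_counter() - start

    threads = [threading.Thread(target=worker, args=(i == 0,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return elapsed[0] / rounds  # seconds per barrier

def measure_comm_cost(n_words=1_000_000, rounds=10):
    """Rough proxy for Multi-BSP's g: seconds per 8-byte word
    to move a buffer through the cache/memory hierarchy."""
    src = bytearray(8 * n_words)
    start = time.perf_counter()
    for _ in range(rounds):
        dst = bytes(src)  # forces a full copy of the buffer
    total = time.perf_counter() - start
    return total / (rounds * n_words)
```

Varying `n_words` so the buffer fits in L1, L2, L3, or only in DRAM would give a crude bottom-up, per-level profile in the spirit of the paper's method, though the paper's own benchmarks are considerably more careful about pinning, warm-up, and cache effects.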


Related Articles

  • A Matrix-Matrix Multiplication methodology for single/multi-core architectures using SIMD. Kelefouras, Vasilios; Kritikakou, Angeliki; Goutis, Costas // Journal of Supercomputing;Jun2014, Vol. 68 Issue 3, p1418 

    In this paper, a new methodology for speeding up Matrix-Matrix Multiplication using Single Instruction Multiple Data unit, at one and more cores having a shared cache, is presented. This methodology achieves higher execution speed than ATLAS state of the art library (speedup from 1.08 up to...

  • Performance analysis and optimization of MPI collective operations on multi-core clusters. Tu, Bibo; Fan, Jianping; Zhan, Jianfeng; Zhao, Xiaofang // Journal of Supercomputing;Apr2012, Vol. 60 Issue 1, p141 

    Memory hierarchy on multi-core clusters has twofold characteristics: vertical memory hierarchy and horizontal memory hierarchy. This paper proposes new parallel computation model to unitedly abstract memory hierarchy on multi-core clusters in vertical and horizontal levels. Experimental results...

  • Hierarchical Binary Set Partitioning in Cache Memories. Zarandi, Hamid Reza; Sarbazi-Azad, Hamid // Journal of Supercomputing;Feb2005, Vol. 31 Issue 2, p185 

    In this paper, a new cache placement scheme is proposed to achieve higher hit ratios with respect to the two conventional schemes namely set-associative and direct mapping. Similar to set-associative, in this scheme, cache space is divided into sets of different sizes. Hence, the length of tag...

  • Optimizing the Management of Reference Prediction Table for Prefetching and Prepromotion. Junjie Wu; Xuejun Yang // Journal of Computers;Feb2010, Vol. 5 Issue 2, p242 

    Prefetching and prepromotion are two important techniques for hiding the memory access latency. Reference prediction tables (RPT) plays a significant role in the process of prefetching or prepromoting data with linear memory access patterns. The traditional RPT management, LRU replacement...

  • Data Classification Management with its Interfacing Structure for Hybrid SLC/MLC PRAM Main Memory. SUNG-IN JANG; SU-KYUNG YOON; KIHYUN PARK; GI-HO PARK; SHIN-DUG KIM // Computer Journal;Nov2015, Vol. 58 Issue 11, p2852 

    This research aims to design a new phase-change RAM (PRAM)-based main memory structure, supporting the advantages of PRAM while providing performance similar to that of conventional DRAM main memory. To replace conventional DRAMs with non-volatile PRAMs as the main memory components, comparable...

  • Dynamic Memory Instruction Bypassing. Ortega, Daniel; Valero, Mateo; Ayguadé, Eduard // International Journal of Parallel Programming;Jun2004, Vol. 32 Issue 3, p199 

    Reducing the latency of load instructions is among the most crucial aspects to achieve high performance for current and future microarchitectures. Deep pipelining impacts load-to-use latency even for loads that hit in cache. In this paper we present a dynamic mechanism which detects relations...

  • The Cache Assignment Problem and Its Application to Database Buffer Management. Levy, Hanoch; Messinger, Ted G.; Morris, Robert J. T. // IEEE Transactions on Software Engineering;Nov96, Vol. 22 Issue 11, p827 

    Given N request streams and L ≤ N LRU caches, the cache assignment problem asks to which cache each stream should be assigned in order to minimize the overall miss rate. An efficient solution to this problem is provided, based on characterizing each stream using the stack reference model...

  • AN INTEGRATED SIMULATION INFRASTRUCTURE FOR THE ENTIRE MEMORY HIERARCHY: CACHE, DRAM, NONVOLATILE MEMORY, AND DISK. Stevens, Jim; Tschirhart, Paul; Mu-Tien Chang; Bhati, Ishwar; Enns, Peter; Greensky, James; Chisti, Zeshan; Shih-Lien Lu; Jacob, Bruce // Intel Technology Journal;2013, Vol. 17 Issue 1, p184 

    As computer systems evolve towards exascale and attempt to meet new application requirements such as big data, conventional memory technologies and architectures are no longer adequate in terms of bandwidth, power, capacity, or resilience. In order to understand these problems and analyze...

  • A New Transmission Gate based Single Ended SRAM Cell. Kumar, T. Siva; Kumar, Arvind // International Journal of Applied Engineering Research;2010, Vol. 5 Issue 2, p261 

    The cache memories are an important part of a typical memory hierarchy. They help to match the fast processor, with a slower secondary memory, which acts as a bottleneck for high speed computing. The DRAM, another important memory, with its small, one transistor cell, looks very tempting to be...

