TITLE

Data flow analysis for anomaly detection and identification toward resiliency in extreme scale systems

AUTHOR(S)
Kim, Byoung
PUB. DATE
July 2012
SOURCE
Journal of Supercomputing;Jul2012, Vol. 61 Issue 1, p6
SOURCE TYPE
Academic Journal
DOC. TYPE
Article
ABSTRACT
The increased complexity and scale of high performance computing and future extreme-scale systems have made resilience a key issue, since future systems are expected to experience various faults during critical operations. Current solutions for resiliency, which rely mainly on checkpointing in hardware and applications, are also expected to become infeasible because of the unacceptable recovery time of checkpointing and restarting. In this paper, we present innovative concepts for anomaly detection and identification that analyze the duration of pattern transition sequences in an execution window. We use a three-dimensional array of features to capture spatial and temporal variability. An anomaly analysis system uses this array to generate an alert immediately and identify the source of faults when it captures an abnormal behavior pattern, which indicates some kind of software or hardware failure. The main contributions of this paper include the innovative analysis methodology and the feature selection used to detect and identify anomalous behavior. An evaluation of this approach on asynchronously injected faults shows a detection rate above 99.9% with no false alarms for a wide range of scenarios, and an accuracy rate of 100% with a short root cause analysis time.
ACCESSION #
76373448

 

Related Articles

  • NoCs Design for Verification. Hahanov, V.; Yegorov, O.; Mostova, K. // Electronics & Electrical Engineering;2007, Issue 75, p45 

    Deploying high performance computing on a chip requires placing a number of processors in networks on chip (NoC). To fulfill growing market demands, the number of processors and other IPs on a chip is also increasing. Because of that, a general purpose bus is not efficient to provide communication...

  • Scheduling contention-free irregular redistributions in parallelizing compilers. Hsu, Ching-Hsien; Chen, Shih-Chang; Lan, Chao-Yang // Journal of Supercomputing;Jun2007, Vol. 40 Issue 3, p229 

    Irregular array redistribution has received attention recently because it can distribute data segments of different sizes to heterogeneous processors according to their computational ability. For the same reason, it has been studied with an eye on load balance. High Performance Fortran Version 2...

  • Non Linear Average Model of Switching Loss Using in a Virtual Prototyping. Benachour, R.; Latreche, S.; Latreche, M. E. H.; Gontrand, C. // International Review on Modelling & Simulations;Oct2010, Vol. 3 Issue 5, p759 

    In power electronic integration design, the virtual prototyping is the most important stage during the design flow. To obtain accurate results from the prototyping, it is necessary to estimate the losses in different operating function modes. This paper presents a new approach, which consists of...

  • Pattern Matching and Membership for Hierarchical Message Sequence Charts. Genest, Blaise; Muscholl, Anca // Theory of Computing Systems;May2008, Vol. 42 Issue 4, p536 

    Several formalisms and tools for software development use hierarchy in system design, for instance statecharts and diagrams in UML. Message sequence charts (MSCs) are a standardized notation for asynchronously communicating processes. The norm Z.120 also includes hierarchical HMSCs. Algorithms...

  • TOWARDS TASK DYNAMIC RECONFIGURATION OVER ASYMMETRIC COMPUTING PLATFORMS FOR UAVS SURVEILLANCE SYSTEMS. BINOTTO, ALECIO P. D.; DE FREITAS, EDISON P.; WEHRMEISTER, MARCO A.; PEREIRA, CARLOS E.; STORK, ANDRE; LARSSON, TONY // Scalable Computing: Practice & Experience;2009, Vol. 10 Issue 3, p277 

    High-performance platforms are required by modern applications that make use of massive calculations. Actually, low-cost and high-performance specific hardware (e.g. GPU) can be a good alternative along with CPUs, which turned to multiple cores, forming powerful heterogeneous desktop execution...

  • Meeting the Challenges of Data-intensive Science. Tracy, Suzanne // Scientific Computing;Nov2011 Supplement 2, p4 

    An introduction is presented in which the editor discusses various reports within the issue on topics including data intensive science, high performance computing and large-scale molecular simulations.

  • Welcome to Front Line: HPC. Tracy, Suzanne // Scientific Computing;Oct2011, Vol. 28 Issue 5, pF3 

    The article presents information about this issue of the journal that includes growth and demands in the generation and analysis of data, and formation of new collaborations related to innovating high performance computing applications and advancing scientific discovery and scholarship.

  • The data deluge. Clark, Warren // Scientific Computing World;Feb/Mar2008, Issue 98, p3 

    The article discusses various reports published within the issue including one by Clare Sansom on microarrays and another on the high-performance computing for scientists and engineers.

  • Provably Fast Training Algorithms for Support Vector Machines. Balcázar, José; Dai, Yang; Tanaka, Junichi; Watanabe, Osamu // Theory of Computing Systems;May2008, Vol. 42 Issue 4, p568 

    Support Vector Machines are a family of algorithms for the analysis of data based on convex Quadratic Programming. We derive randomized algorithms for training SVMs, based on a variation of Random Sampling Techniques; these have been successfully used for similar problems. We formally prove an...
