Analysis of the Impact Factors on Data Error Propagation in HPC Applications
- Resource Type
- Conference
- Authors
- Utrera, Gladys; Gil, Marisa; Martorell, Xavier
- Source
- 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), pp. 546-549, Mar. 2018
- Subject
- Computing and Processing
Benchmark testing
Software
Libraries
Toy manufacturing industry
Runtime
Sparse matrices
Computer architecture
data error propagation
reliability
MPI resilience
- Language
- ISSN
- 2377-5750
Algorithmic codes for scientific computing may exhibit different levels of tolerance to memory errors, depending on how the program accesses its data. Some factors that can be controlled in an HPC program may influence its degree of tolerance to memory errors. Characterizing how vulnerable an application is can help improve its security and save time and resources. In this work, we study some of the main factors that may affect the propagation of errors originating from memory accesses.