Fault-tolerant computing options based on restart information stored on and off node and on reserve processes have been developed, implemented, and tested in a large-scale production field solver from the domain of computational fluid dynamics. Tests conducted to date have shown good results, with recovery rates approaching 100% under realistic node failure scenarios. Even though the computational cost of the field solver itself is very low (explicit time-marching and finite differences), so that any added work is proportionally more visible, the fault-tolerant implementation adds a run-time penalty of only 6–12%, depending on the spatial and temporal approximation used. The procedures developed are generally applicable and could easily be ported to other codes. [ABSTRACT FROM AUTHOR]