CJTCS Volume 1998 Article 4

Multitolerance in Distributed Reset

Sandeep S. Kulkarni (The Ohio State University) and Anish Arora (The Ohio State University)
7 December 1998
Abstract

A reset of a distributed system is safe if it does not complete "prematurely," i.e., without having reset some process in the system. Safe resets are possible in the presence of certain faults, such as process fail-stops and repairs, but are not always possible in the presence of more general faults, such as arbitrary transients. In this paper, we design a bounded-memory distributed-reset program that possesses two tolerances: (1) in the presence of fail-stops and repairs, it always executes resets safely, and (2) in the presence of a finite number of transient faults, it eventually executes resets safely. Designing this multitolerance in the reset program introduces the novel concern of designing a safety detector that is itself multitolerant. A broad application of our multitolerant safety detector is to make any total program likewise multitolerant.
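
As a reading aid only, the safety condition defined in the abstract can be written as a simple predicate over a toy model. The sketch below is not the paper's program or its safety detector: it assumes a global view of the system that no real distributed process has, and the names `unreset_processes` and `completion_is_safe`, as well as the representation of processes as strings, are hypothetical.

```python
from typing import Iterable, Set


def unreset_processes(system: Iterable[str], reset_done: Iterable[str]) -> Set[str]:
    """Return the processes in the system that the reset has not reached.

    Assumption: `system` lists the processes currently in the system and
    `reset_done` lists the processes the current reset has already reset.
    """
    return set(system) - set(reset_done)


def completion_is_safe(system: Iterable[str], reset_done: Iterable[str]) -> bool:
    """A reset completes safely only if no process in the system was missed."""
    return not unreset_processes(system, reset_done)
```

Under this toy model, completing a reset of processes `p`, `q`, `r` after resetting only `p` and `q` would be premature, because `r` was missed.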



Last modified: Tue Feb 9 20:50:58 CST 1999