

Type of Document: Master's Thesis
Author: Tadepalli, Sriram Satish
URN: etd-12292003-134023
Title: GEMS: A Fault Tolerant Grid Job Management System
Degree: Master of Science
Department: Computer Science

Advisory Committee
- Dr. Calvin J. Ribbens (Committee Chair)
- Dr. Dennis G. Kafura (Committee Member)
- Dr. Srinidhi Varadarajan (Committee Member)

Keywords
- fault tolerance
- grid computing
- grid job management systems
- local resource manager
- job migration
Date of Defense: 2003-12-19
Availability: restricted

Abstract
Grid environments are inherently unstable. Resources join and leave
the environment without prior notification. Application fault
detection, checkpointing, and restart are therefore of foremost
importance in Grid environments. The need for fault tolerance is
especially acute for large parallel applications, since the failure
rate grows with the number of processors and the duration of the
computation.
A Grid job management system hides the heterogeneity of the Grid and the
complexity of the Grid protocols from the user. The user submits a job
to the Grid job management system, which finds an appropriate
resource, submits the job, and transfers the output files to the user
upon job completion. However, current Grid job management systems do
not detect application failures.
The goal of this research is to develop a Grid job management system
that can efficiently detect application failures. Failed jobs are
either restarted on the same resource or migrated to another
resource and restarted there. The research also aims to identify the
role of local resource managers in the fault detection and migration
of Grid applications.
Files
Filename: thesis.pdf
Size: 328.16 Kb
Approximate Download Time (Hours:Minutes:Seconds):
- 28.8 Modem: 00:01:31
- 56K Modem: 00:00:46
- ISDN (64 Kb): 00:00:41
- ISDN (128 Kb): 00:00:20
- Higher-speed Access: 00:00:01
Restricted availability indicates that a file or directory is accessible from the Virginia Tech campus network only.