Under certain conditions, an application's optimal checkpoint interval can be determined as a function of the dump time and the application mean time to interrupt (AMTTI). In practice, assigning an optimal checkpoint interval therefore requires an estimate of each application's AMTTI. This estimate depends on a number of job and system parameters that can be difficult to determine and may even change over time. Errors in estimating the AMTTI lead to errors in the assigned checkpoint intervals, which in turn degrade average application efficiency. Using BeoSim, a discrete-event-driven multi-cluster simulator parameterized with LANL's Pink cluster and its workload, we study the impact of non-optimal checkpoint intervals on overall application efficiency. We find that even dramatically overestimating the AMTTI has a fairly minor impact on application efficiency. The first two-thirds of the talk will introduce BeoSim and this recent study of non-optimal checkpoint intervals; the latter third will detail previous work on a checkpoint-migration scheme that mitigates network over-subscription in a grid environment.
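The abstract does not state the underlying formula, but a standard first-order model of this tradeoff is Young's approximation: time lost per interval is the dump cost plus, on average, half an interval of rework after each interrupt, which is minimized at an interval of sqrt(2 * dump_time * AMTTI). The sketch below uses that model, with hypothetical dump-time and AMTTI values that are not taken from the talk, to illustrate why overestimating the AMTTI is relatively forgiving:

```python
import math

def efficiency(tau, delta, mtti):
    """First-order efficiency model (Young 1974): each interval of
    length tau pays the dump cost delta, and each interrupt costs,
    on average, half an interval of lost work."""
    return 1.0 - delta / tau - tau / (2.0 * mtti)

def optimal_interval(delta, mtti):
    """Interval maximizing the model above: sqrt(2 * delta * mtti)."""
    return math.sqrt(2.0 * delta * mtti)

# Hypothetical numbers, not from the talk: 5-minute dumps and a
# true AMTTI of 24 hours, with an estimate that is 4x too high.
delta, true_mtti = 5.0 / 60.0, 24.0
tau_opt = optimal_interval(delta, true_mtti)        # 2 hours
tau_over = optimal_interval(delta, 4 * true_mtti)   # 4 hours

print(efficiency(tau_opt, delta, true_mtti))   # ~0.917
print(efficiency(tau_over, delta, true_mtti))  # ~0.896
```

Under these assumed parameters, a 4x overestimate of the AMTTI doubles the checkpoint interval yet costs only about two percentage points of efficiency, consistent with the abstract's qualitative finding.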
Dr. Will Jones is an assistant professor of Computer Science at Coastal Carolina University in Myrtle Beach, South Carolina. He was an assistant professor of Electrical Engineering at the United States Naval Academy for two years before joining CCU. His research interests include parallel job scheduling and resilience in computational clusters. He earned a Ph.D. in Computer Engineering from Clemson University in 2005.