System-level fault-tolerance in large-scale parallel machines with buffered coscheduling
- 10 June 2004
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
No abstract available.

This publication has 6 references indexed in Scilit:
- Use of predictive performance modeling during large-scale system installation. Parallel Processing Letters, 2005
- On the feasibility of incremental checkpointing for scientific computing. Published by Institute of Electrical and Electronics Engineers (IEEE), 2004
- BCS-MPI. Published by Association for Computing Machinery (ACM), 2003
- A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys, 2002
- The Quadrics network: high-performance clustering technology. IEEE Micro, 2002
- BProc. Published by Association for Computing Machinery (ACM), 2002