What is Disaster Tolerance?
Disaster Tolerance (DT) in computing and communications systems refers to the ability of IT systems, communications infrastructure, and business or organizational processes to maintain a degree of functionality after a disaster has occurred. Disaster Tolerance provides the ability to continue operations uninterrupted despite a disaster that significantly disrupts normal organizational operations. Specifically, within DT, critical business functions and technologies continue operating rather than being resumed afterward.
Disaster tolerance is a superset of fault tolerance methods: a disaster may cause rapid, almost simultaneous failures at multiple points in a system, as well as single points of failure, that escalate into wide, catastrophic system failure.
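The distinction between tolerating a single fault and tolerating a disaster can be illustrated with a minimal sketch. It is not taken from the paper: the `Replica` type, the `service_available` function, and the site labels `site-1` and `site-2` are hypothetical names chosen for illustration, and the model assumes a service stays up as long as at least one replica survives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    site: str  # hypothetical label for the physical location of this replica


def service_available(replicas, failed_replicas=frozenset(), failed_sites=frozenset()):
    """Return True if at least one replica survives the given failures."""
    return any(
        r.name not in failed_replicas and r.site not in failed_sites
        for r in replicas
    )


# Fault tolerant only: two replicas, both hosted at the same site.
single_site = [Replica("a", "site-1"), Replica("b", "site-1")]

# Disaster tolerant (in this toy model): replicas spread across two sites.
multi_site = [Replica("a", "site-1"), Replica("b", "site-2")]

# A single component failure is survived by both layouts.
print(service_available(single_site, failed_replicas={"a"}))  # True
print(service_available(multi_site, failed_replicas={"a"}))   # True

# A disaster removes a whole site: several components fail almost at once.
print(service_available(single_site, failed_sites={"site-1"}))  # False
print(service_available(multi_site, failed_sites={"site-1"}))   # True
```

In this simplified model, redundancy within one site covers single points of failure, but only geographic separation keeps the service running when an entire site is lost.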
Download this Paper - IT Availability & Disaster Tolerant Computing
Download this presentation on Disaster Tolerant Computing 2008 new!
The Disaster Tolerance Problem
Studies of regional traumatic events conducted over the last several years indicate that a business that experiences a significantly disruptive event without being adequately prepared to mitigate the risk is highly unlikely to survive the disaster. To manage the risk of disaster, businesses need the ability to tolerate such an occurrence, ensure the availability of human resources, protect business data, and maintain information systems availability.
Recent studies indicate that only 6% of companies that suffer significant data loss survive: 43% of such companies never reopen, and 51% close within two years. The real problem with the traditional Disaster Recovery (DR) approach is that “DR Doesn't Work”: not very well and not very often. Conventional, established DR practices have proven insufficient to protect businesses against disasters and do not enable continued operations while a disaster is occurring. As a result, a large investment in traditional Disaster Recovery and Business Continuity Planning (BCP) approaches is wasted on failed recovery and lost capital expenditure. These practices do not produce the desired result: continued organizational and business operations. Research in Disaster Tolerance seeks to address these challenges.
Increased Threat
The terrorist attacks of September 11, 2001, the US Northeast power outage of August 2003, and Hurricane Katrina in 2005 are recent examples of devastating man-made disasters and massively destructive natural disasters in the US. They demonstrate a shared corporate and governmental inability to successfully resume or continue normal operations after events of this kind. Businesses have become aware that they cannot rely on government institutions alone to facilitate this process. Man-made disasters continue to represent an increasing portion of the total causes of IT interruptions. Additional data from Gartner suggests that almost 80 percent of application downtime is due to people- or process-related issues caused by application and operation error.
Challenges with Disaster Recovery
Organizations are challenged by the complexity and difficulty of successfully formulating, implementing, and executing traditional DR plans and BCP initiatives. High-availability IT applications and hardware for DR processes are increasingly complex and require significant manual human interaction. Mounting evidence regarding the execution of DR plans suggests that errors in executive management strategy, personnel management within a crisis, and impact and risk assessment, together with incorrect assumptions and inadequately tested processes, commonly result in deficient DR and BCP efforts.
Unfortunately, DR and BCP efforts are often undertaken after an IT solution has been designed and implemented rather than before, when they could have had the most beneficial effect on IT architecture and implementation. In effect, traditional DR plans commonly attempt to force IT applications, technology infrastructure, and organizational business processes to function in ways for which they were not designed. Additionally, the organizational, management, people, and process-based aspects of DR and BCP commonly prove faulty.
Inadequate Disaster Preparedness
Data indicates that disaster preparedness is not a business priority for most US and UK companies, and that a lack of executive visibility, responsibility, and investment in corporate DR and BCP is prevalent. Collectively, these factors indicate that businesses are unable to reach the goal of organizational continuance in the event of a traumatic disaster. In practice, a large portion of organizational investment in disaster recovery and business continuity is wasted in the failed recovery process itself, reducing the value of this investment for the small percentage of companies that make it. This lost investment does not produce the desired result: disaster tolerant business processes, IT applications, systems, and related infrastructure.
Research in Disaster Tolerant Computing seeks to address these complex organizational, business and technology challenges.
For more information, visit the SMU Disaster Tolerant Papers and Documents section.


