Questions tagged [disaster-recovery]

Disaster recovery and preparedness are an unfortunate aspect of systems administration. This tag should be used for help with planning, implementation and best practices related to recovering from a catastrophic event on a server or in a datacenter environment.

Recovering from an unplanned, catastrophic outage is a painful process whether you are managing a single server or an entire datacenter. Roof leaks, broken water lines, power outages and any number of other events can take what was a great day and turn it into a living nightmare when you are responsible for keeping systems others rely on available.

The key to recovering from any disaster is preparedness. Knowing the steps required to bring the network and systems back online is critical. Before one can properly prepare for a disaster, it is necessary to understand the risks, bottlenecks and other critical components of the overall system, e.g. who controls the power, internet, etc. at your site. Understanding which aspects of disaster recovery are within one's control is very important when planning; if there is no one on staff who can fix the power, HVAC, etc., make sure that the contact info for someone who can is written down somewhere. Having a large amount of information available before a disaster occurs will help keep everyone calm, cool and on-task when something actually does happen.

Once the risks are assessed and a plan is created, print out physical copies, email it, and make sure everyone with admin-level access to the systems/datacenter has read it and is familiar with it. The best plan in the world is worthless if it is on a system that is down and cannot be easily restored without following the plan. After everyone is familiar with the plan, practice when possible; in many situations it may not be realistic, but if possible take advantage of planned downtimes or natural outages to go through the recovery plan and refine it.

In summary, when a disaster happens:

  1. Don't Panic! Panic turns a debacle into a catastrophe every time.
  2. Plan ahead, understand the risks, and know what is within your control.
  3. Follow the plan but be flexible; a recovery plan is more of a jazz tune than a military march.
  4. Stay calm and organized, use checklists, keep notes.
  5. If you are working in a team or group, communicate and collaborate.
  6. Be vigilant; update your plan as the environment changes.
  7. Check your backups; make sure they happen at regular intervals and that the data contained therein is still good.
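
As a rough illustration of item 7, here is a minimal Python sketch of an automated backup check. It assumes each backup is a single file and that a SHA-256 checksum was recorded when the backup was written; the function names (`sha256_of`, `check_backup`) and the 24-hour threshold are hypothetical choices for the example, not part of any particular backup tool.

```python
import hashlib
import os
import time


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_backup(path, expected_sha256, max_age_hours=24, now=None):
    """Return a list of problems found with a backup file (empty means OK).

    expected_sha256 is assumed to have been recorded at backup time.
    max_age_hours is an assumed policy value; adjust to your schedule.
    """
    if not os.path.isfile(path):
        return ["missing: " + path]
    problems = []
    now = time.time() if now is None else now
    age_hours = (now - os.path.getmtime(path)) / 3600
    if age_hours > max_age_hours:
        problems.append("stale: last written %.1f hours ago" % age_hours)
    if sha256_of(path) != expected_sha256:
        problems.append("checksum mismatch")
    return problems
```

A scheduled job could call `check_backup` for each expected file and alert when the returned list is not empty. A matching checksum only shows that the file has not changed since the checksum was recorded; it does not replace an actual test restore.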
359 questions
40
votes
20 answers

What's your checklist for when everything blows up?

Users can't get to their e-mail, the CEO can't get to the company's home page, and your pager just went off with a "911" code. What do you do when everything blows up?
Jon Galloway
  • 1,506
  • 1
  • 18
  • 20
29
votes
11 answers

Disaster recovery plan development best practicies or resources?

I have been tasked with leading a project to update an old and somewhat one-sided disaster recovery plan. For now we're just looking at getting the IT side of DR sorted out. The last time they did this they set their scope by making up a…
Laura Thomas
  • 2,855
6
votes
2 answers

WHEN to put the contingency plan into action in case of a main server failure?

We have a production SQL Server database server shipping transactional log backups to two standby servers. The disaster recovery plan is already finished: we have a well documented procedure and people trained to put the standby server into…
IT2
  • 63
6
votes
2 answers

Green System Administrator looking for helpful tips

I have just been promoted to Systems Administrator for our product. We are designing an application that communicates with the cloud (Amazon EC2). I will be in charge of maintaining all instances and their underlying components. So far this involves a…
3
votes
1 answer

What Windows Server Roles should system state be backed up on?

Currently I have the following types of servers, all with a single role per server: AD (DNS, AD, Sysvol, Com+, Certificate stores, Registry), IIS (Metabase), Exchange, FileServer, MSSQL, FileServer, TerminalServer, HyperV Host. What are the drawbacks of not…
3
votes
1 answer

What is considered a disaster?

In doing research on a disaster recovery plan and trying to develop scenarios that must be accounted for, I realize that there are a number of different events that qualify as disasters. For example, all of these can be considered…
3
votes
2 answers

How best to handle end user notification in the event of system failure incl. email?

I've been asked to research ways of handling end user notifications when systems such as email are experiencing problems. Perhaps an example will make this a little clearer. We have a number of sites in different countries. Recently email was…
Brian Lyttle
  • 1,757
2
votes
2 answers

Data Recovery

I am looking for a way to recover data off a perfectly good external HDD. Not a problem, usually. The drive was being used as a backup drive for computers that I was working on during a reformat. Several problems occurred during my installation of…
Eric Rich
  • 155
2
votes
2 answers

Physical proximity of Disaster Recovery site

We are researching a hosting company to hold our DR site. The problem is that they are in neighboring states. One host is in Herndon, Virginia. The other is in Charlotte, NC. Are these too close together for a primary and DR? Does anyone know of…
0
votes
2 answers

Is it possible to create recovery CDs/DVDs from a recovery partition?

I recently acquired an HP laptop that has no recovery CDs or DVDs, nor is there any software on the laptop that will allow me to create them. Furthermore, the laptop is infested with every imaginable virus and trojan. The laptop does however have a…
0
votes
3 answers

Disaster Recovery Standby Server

I work for a small business with 25 users and 2 servers. 1 server is the DC running Windows Server 2003/Exchange 2003. We want a reliable disaster recovery strategy for this server without having to spend a lot of money. We take regular backups but…
0
votes
1 answer

RTO and RPO - Data Loaders, CSG will have only RTO whereas Databases will have RPO as well as RTO. Is it correct?

Data Loaders, CSG will have only RTO whereas Databases will have RPO as well as RTO. Is it correct?
user60551
0
votes
1 answer

Disaster recovery advice

Possible Duplicate: Disaster recovery plan development best practicies or resources? We are in the process of starting an internal evaluation about the disaster recovery procedures for our datacenter. Can you suggest any good book/site that you…