Are You Prepared for a Technical Disaster?

Whenever I ask a business owner, CEO, or CFO this question, the answer is always an emphatic YES! However, my review of IT operations typically reveals that the IT disaster plan needs to be fortified.

The following recommendations are based upon my thirty-plus years specializing in construction IT:

All IT disaster planning is built around the following objectives, which are determined in the course of developing a Business Continuity Plan.

SLA – Service Level Agreement

Although the SLA between IT and management includes many measurable objectives, we usually focus on the uptime objective and on how quickly IT staff and IT vendors respond to an outage.
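To make the uptime objective concrete, it helps to translate a percentage into hours of downtime per year. The short Python sketch below does that arithmetic. The function name and the sample targets are my own illustrations, not figures from any particular SLA.

```python
def allowed_downtime_hours(uptime_percent, period_hours=365 * 24):
    """Return the hours of downtime permitted by an uptime objective.

    period_hours defaults to one non-leap year (8,760 hours).
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (1 - uptime_percent / 100)


# Illustrative targets only; use the figure from your own SLA.
for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime allows about {hours:.2f} hours of downtime per year")
```

At 99% uptime that works out to roughly 87.6 hours a year, which is usually a useful number to put in front of management when discussing budget.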

RPO – Recovery Point Objective

This is the maximum tolerable period in which data might be lost from an IT service due to a major incident. It is usually determined as part of Business Continuity planning, through a business impact analysis.

RTO – Recovery Time Objective

This is the length of time, and the service level, within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences from a break in business continuity.
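As a rough illustration of how RPO and RTO are measured against a real incident, here is a minimal Python sketch. The objectives, timestamps, and function names are all hypothetical. RPO is compared against the gap between the last usable backup and the incident, and RTO against the gap between the incident and the restored service.

```python
from datetime import datetime, timedelta


def data_loss_window(last_good_backup, incident_time):
    """Time between the last usable backup and the incident (data at risk)."""
    return incident_time - last_good_backup


def meets_objective(actual, objective):
    """True when the measured duration is within the agreed objective."""
    return actual <= objective


# Hypothetical objectives and timestamps, for illustration only.
rpo = timedelta(hours=4)
rto = timedelta(hours=8)
last_backup = datetime(2024, 1, 15, 22, 0)
incident = datetime(2024, 1, 16, 3, 30)
restored = datetime(2024, 1, 16, 10, 0)

loss = data_loss_window(last_backup, incident)  # 5 hours 30 minutes
outage = restored - incident                    # 6 hours 30 minutes

print("RPO met:", meets_objective(loss, rpo))    # False: 5.5 h exceeds 4 h
print("RTO met:", meets_objective(outage, rto))  # True: 6.5 h is within 8 h
```

In this made-up case the service came back in time, but a nightly backup could not satisfy a four-hour RPO. That kind of gap is exactly what these discussions with management are meant to surface.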

I once had a client who interrupted me when I brought up these objectives, only to loudly proclaim, “WE’RE NOT F’n DELTA AIRLINES!!”

Trust me on this: after working with clients on these issues for over thirty years, I can tell you that when disaster strikes they ALL act like they are F’n Delta Airlines!!

The SLA, RPO, and RTO objectives are a discussion that ALL IT professionals should keep pressing with management.

What types of disaster events are we talking about?

  • Fire
  • Flood
  • Theft
  • Power outage
  • Natural Disaster
  • Disgruntled Employee
  • Data Corruption
  • Malware/Ransomware
  • Many other possibilities

When disaster strikes, and it WILL, everyone on the IT and management team needs to be ready to execute the plan. To the extent that your budget will allow, we need to fortify every area of IT operations. The following guidelines are general suggestions meant to shorten the time it takes to recover from a disaster.

On-Site Data Center or IT Closet:

[checklist]

  • Secure location (Restricted Access)
  • Room walls fortified
  • No Sprinklers above equipment
  • Large enough to provide full walk-around access to equipment
  • Room is accessible during a power outage (physical key for building access)
  • After hours contact with Building Management
  • Battery powered lighting
  • Fire Suppression
  • EVERYTHING is labeled
  • Startup and shutdown procedures document is physically attached to the rack or server
  • Consider appliance to monitor room
  • Keep room organized, Neat, Clean[/checklist]

Air Conditioning & Ventilation:

[checklist]

  • Independent from Building Air Conditioning
  • Schedule regular maintenance
  • Alerting when temp starts rising above a determined level
  • Vents and/or battery-powered fans that are triggered by an outage
  • Condensation drainage is away from equipment[/checklist]
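For the temperature alerting item above, here is a minimal Python sketch of the kind of rule a monitoring appliance or script might apply. The threshold, the number of consecutive readings, and the function name are my assumptions for illustration. Your monitoring product will have its own settings.

```python
def temperature_alert(readings_f, threshold_f, consecutive=3):
    """Return True when the most recent readings all exceed the threshold.

    Requiring several consecutive high readings (an assumed default of 3)
    avoids paging someone over a single momentary spike.
    """
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    if len(readings_f) < consecutive:
        return False
    return all(reading > threshold_f for reading in readings_f[-consecutive:])


# Hypothetical readings in degrees Fahrenheit, oldest first.
steady_climb = [72, 74, 81, 82, 83]
single_spike = [72, 81, 79, 82]

print(temperature_alert(steady_climb, threshold_f=80))  # True
print(temperature_alert(single_spike, threshold_f=80))  # False
```

The point of the design is to alert on a sustained rise, which is what a failed air conditioner looks like, while ignoring a one-off blip.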

Power:

[checklist]

  • Dedicated power circuit
  • Sufficient uptime on battery backup to support onsite response
  • Remote access to UPS
  • Alerting and Notification from UPS including temp thresholds
  • Use PDUs, or power distribution units (enterprise-grade power strips)
  • Ideally the building has a generator to fortify battery backup
  • Consider graceful shutdown with power backup software
  • Emergency Power Shutdown switch[/checklist]

Equipment:

[checklist]

  • USE BUSINESS CLASS OR ENTERPRISE CLASS EQUIPMENT!!
  • ALL equipment MUST have an ON-SITE maintenance agreement
  • All equipment MUST be able to be remotely managed.
  • All equipment MUST be monitored with alerting and notification software
  • Every business should consider virtual server technology for better HA and disaster recovery options
  • All equipment should be optimized for energy and cooling impact.
  • All equipment and cables should be clearly labeled.
  • OLD EQUIPMENT AND CABLING REMOVED FROM ROOM
  • Single Monitor on KVM switch
  • Redundant equipment for active or passive failover[/checklist]

Data Communications:

[checklist]

  • Choose a service provider with an SLA
  • Consider having failover to secondary provider
  • Use firewall with Internet appliance, active subscription for filtering and DLP
  • Back up router and Internet appliance to a USB drive
  • Use analog POTS lines for alerting and notification and for access to PDUs
  • Crisis link for routing calls to a service or remote office during an outage
  • Have phone numbers for telcos and circuit IDs included in disaster documentation
  • Have a secondary way of communicating with all employees (text message service)
  • Phone message service that all employees can call for the status of systems[/checklist]

Off-Site Location:

[checklist]

  • Do NOT assume that the hosting data center is following all best practices. Request documentation of their testing procedures
  • Data Backup requirements are the same as onsite
  • Plan a contingency for the hosting provider going dark
  • Consider advanced services with geo-redundant backup
  • Access to Data center after a natural disaster[/checklist]

Backup:

[checklist]

  • Backup of files and images should be Onsite and Offsite
  • Multiple strategies for different tiers of data
  • Establish retention policies
  • Backup critical desktop files
  • Use inexpensive cloud storage for dormant archived data
  • Use products that create images in addition to file backup and can be mounted
  • Check to see if Open files or databases are being backed up
  • Critical files and systems backed up locally to SAN or NAS and to the cloud
  • Have a schedule to verify and test backups. NEVER ASSUME[/checklist]
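One simple way to test a backup rather than assume it works is to restore it to a scratch folder and compare every file against the original by checksum. The Python sketch below is one way to do that with the standard library. The function names are mine, and this is an illustration of the idea, not a replacement for your backup product's own verification features.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return relative paths that are missing or different in the restored copy."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            problems.append(relative.as_posix())
    return problems
```

Calling verify_restore with the live folder and the test-restore folder returns an empty list when everything matches. Keep in mind that files changed since the backup ran will show up as differences, so this works best against data that is static or on a copy taken at backup time.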

Who’s Watching?

[checklist]

  • Consider using a service provider to Manage Servers (At minimum)
  • These interruptions typically happen on weekends and late in the evening. Who’s assigned?
  • Hurricanes or other natural disasters may require IT staff to relocate out of area with family.
  • Contract in advance with IT service provider for assistance during outages[/checklist]

Summary:

[checklist]

  • Have the disaster plan fully documented and available in hard copy. Make the instructions simple enough for the boss to follow. They will really appreciate this if you get hit by a bus or a coconut!
  • Plan and budget to your SLA, RPO and RTO
  • Follow best practices
  • Don’t ASSUME!
  • Have an annual meeting with Management to refresh Disaster plan
  • Have quarterly review with IT staff
  • Drill Baby Drill![/checklist]

[author image=”http://www.constructionconnection.com/blog/wp-content/uploads/2014/05/Ken-Kohl_Stratus-Data-Solutions.jpg” ]Ken Kohl, President and founder of Stratus Data Solutions, has more than 35 years of experience specializing in construction industry IT. Having worked with hundreds of construction firms and served as IT Director of one of the largest GCs in Florida, he understands how to deliver a successful IT strategy to a construction firm of any size. Contact Ken at Kkohl@Stratus-Data.com [/author]