Failing Gracefully: Using AWS for Web Site Failover
When it comes to the Internet, keeping your organization's presence online is crucial so that customers, both potential and existing, can reach your resources. At NetWorks Group, we understand that despite the best of intentions and planning, downtime will likely still occur, at least for a few minutes per year. Many teams set a goal of 100% uptime for their web site, but often get a dose of reality when a large storm hits their data center or other issues pop up that are out of their direct control. To this end, we wanted a way to keep full downtime to a minimum, so that our presence on the Internet would be offline as briefly as possible, without going over the top on infrastructure to do so.
Amazon Web Services (AWS) provides a plethora of cloud services that help teams do more for their environment with less capital expenditure. By cherry-picking only the AWS services you need, you can find cost-saving solutions to otherwise expensive or complicated problems. In the case of a web site, the cost and management overhead of a second (or third?) data center to avoid an hour of downtime a year may be overkill for many organizations. For NetWorks Group, our web site being down, while not desirable, is not so critical that it will impede our ability to provide amazing service to our customers. With that in mind, we wanted an approach to web site downtime that would be economical and easy to manage while still keeping interruptions to our Internet presence as short as possible.
By using the AWS services Route 53 and S3, we're able to provide a great failover solution when our primary web server is unreachable or down. In February 2013, Route 53 added a DNS Failover feature that works together with S3 website hosting. The idea is that a simple health check (for example, AWS verifies that it can receive a 200 response code from your web server) decides whether or not to fail over your web site from its regular home to a special S3 bucket holding your "downtime" page. If you configure a low DNS Time-to-Live (TTL), your DNS record can be switched to point to this failover endpoint within a minute or two. With this S3 bucket at the ready, you can automatically fail over to a static-content site that provides critical information to customers, such as contact details and the expected time to recovery.
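To make the moving parts concrete, here is a minimal sketch in Python of what such a failover pair might look like. It is not our exact configuration. It only builds the request body for a primary record (the web server, tied to a health check) and a secondary alias record (the S3 downtime page), and it gives a rough estimate of how long a switch could take. The domain, IP address, health check ID, S3 endpoint, and zone ID are all hypothetical placeholders. The 30-second check interval and threshold of three failed checks are assumed values for illustration, so check your own health check settings.

```python
# Sketch only: builds the request body for a Route 53 failover record pair.
# Every name, ID, and address used here is a hypothetical placeholder.

def failover_change_batch(domain, primary_ip, health_check_id,
                          s3_website_endpoint, s3_zone_id, ttl=60):
    """Return a change batch with one PRIMARY and one SECONDARY record."""
    primary = {
        "Name": domain,
        "Type": "A",
        "SetIdentifier": "primary-web-server",
        "Failover": "PRIMARY",
        "TTL": ttl,  # low TTL so resolvers notice the switch quickly
        "ResourceRecords": [{"Value": primary_ip}],
        "HealthCheckId": health_check_id,  # record is served only while healthy
    }
    secondary = {
        "Name": domain,
        "Type": "A",
        "SetIdentifier": "s3-downtime-page",
        "Failover": "SECONDARY",
        # Alias records carry no TTL of their own, so none is set here.
        "AliasTarget": {
            "HostedZoneId": s3_zone_id,
            "DNSName": s3_website_endpoint,
            "EvaluateTargetHealth": False,
        },
    }
    return {
        "Comment": "Fail over to a static downtime page on S3",
        "Changes": [
            {"Action": "UPSERT", "ResourceRecordSet": record}
            for record in (primary, secondary)
        ],
    }


def worst_case_failover_seconds(check_interval, failure_threshold, ttl):
    """Rough upper bound: time to detect the outage plus time for caches to expire.

    Ignores the duration of each check and resolvers that disregard TTLs.
    """
    return check_interval * failure_threshold + ttl


batch = failover_change_batch(
    domain="www.example.com.",
    primary_ip="203.0.113.10",
    health_check_id="hypothetical-health-check-id",
    s3_website_endpoint="s3-website-us-east-1.amazonaws.com.",
    s3_zone_id="HYPOTHETICAL_S3_ZONE_ID",
)

# Assumed settings: 30-second checks, 3 failures to trip, 60-second TTL.
print(worst_case_failover_seconds(30, 3, 60))  # 150
```

A dictionary like this could be passed as the ChangeBatch argument to boto3's change_resource_record_sets call on a Route 53 client, along with your own hosted zone ID. Under these assumed settings the estimate comes to about two and a half minutes, in the same ballpark as the "minute or two" above. A shorter check interval or TTL would bring it down.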
So the next time your team is considering spending double or triple its budget to handle a few annoying minutes of downtime, think about using Amazon or another cloud service provider to handle the problem gracefully and economically.