Failover? What failover?

Recently, our phone company, grasshopper.com, had a massive outage that lasted over 24 hours. The original outage was caused by a RAID failure, which is bound to happen sooner or later. Hard drives are the part that fails most often. That’s why you make backups, right?

Grasshopper.com decided to fail over to their secondary site while they tried to fix the RAID, but the secondary site didn’t work as expected. I’ve personally seen this time and time again: companies put thousands (sometimes millions) into a redundant setup, and at the moment they need it most, it doesn’t work. It’s easy to forget about a secondary site. Once it’s set up and working, you usually never log into it, and people just let it run itself. Is that the right thing to do? Of course not.

So how do we prepare for failures? There are three major types of failure we’re concerned about, outlined below:

Data loss – As I mentioned earlier, hard drives are not reliable, and at times even RAID arrays aren’t reliable, but we do our best to make sure customer data is safe. We run RAID on all but three of our servers; those three are very old and will be replaced shortly. RAID gives us a safety net: if one drive fails, the server stays up and continues processing. So, what happens if we lose two drives at once? In the unlikely event that we do lose multiple drives at once, we have local backups on the server. Several times a week, we sync customer websites to a secondary drive on the server that is not part of the RAID. Therefore, if the RAID dies, we still have local copies of the data. So, what happens if both the RAID and the local backup drive die? That’s when we pack up and go home for the day. We wish ;-) In reality, we have yet another method to make sure your data is safe: we also sync our local backup drives to a secondary backup server weekly. We hope you now understand how seriously we take keeping customer data safe. We have developed custom, in-house scripts to monitor that data and make sure it’s always in sync on each server. Lastly, we always encourage our customers to take backups of their own data as well. You can download a full backup of your site by logging into cPanel -> Backup -> Generate Full Backup.
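The in-house monitoring scripts themselves aren’t shown here, so the following is only a minimal sketch of how a sync check like this could work, not our actual tooling. It assumes the primary data and the backup copy are both reachable as local directories; the function names `hash_tree` and `find_drift` are made up for illustration.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            digests[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def find_drift(primary: Path, backup: Path) -> dict[str, list[str]]:
    """Compare a primary tree with its backup (hypothetical helper).

    Returns files missing from the backup, files whose contents differ,
    and files that exist only in the backup.
    """
    src, dst = hash_tree(primary), hash_tree(backup)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "changed": sorted(n for n in src.keys() & dst.keys() if src[n] != dst[n]),
        "extra": sorted(dst.keys() - src.keys()),
    }
```

Because the backups run on a schedule rather than continuously, some drift between runs is expected. A check like this is most useful right after a sync finishes, when all three lists should be empty.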

System Failure – Although hard drive failures are the most common, an entire system can fail as well. Yes, we have solutions for these types of failures too. In Europe, where we lease dedicated servers, the datacenter will replace the faulty parts for us and we’ll be back up and running in no time. In Phoenix, where we buy all of our own hardware, it’s a bit different. We have spare servers with the exact same specs sitting in our cabinets, waiting in case of a failure. We buy premium Dell equipment brand new, and to date we’ve never had a system failure. Does that mean we shouldn’t spend the money to have extra machines sitting there doing nothing? No! If a system does fail, we can swap the drives into a spare and the customer will be back up in no time. We can then spend the rest of the day troubleshooting with Dell to find out what went wrong and fix the bad hardware, without the customer suffering.

Networking Failure – The last type of failure we are concerned about is a networking failure. We don’t run our own network in Europe, so we don’t need to worry about it there, but we do in Phoenix. In Phoenix, our entire network was designed by us and all of the equipment is owned by us. We prefer to have complete control over our network, but it is still prone to failures. To guard against a networking failure, we keep redundant switches and routers sitting in our cabinets, similar to our server setup. If one dies, we move everything to a spare and customers are back up. As for providers, we currently have three network providers and have configured our network so that if one has an outage, traffic automatically switches to another within seconds. Unlike our phone company, we test our failover and make sure it works as designed.
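To make the idea of testing failover concrete, here is a small sketch of a failover drill. It only models the decision logic; it does not describe how our network is actually configured, and in practice this kind of switching is typically handled by the routers themselves. The provider names and the functions `pick_provider` and `failover_drill` are hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_provider(
    providers: Sequence[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the first provider, in priority order, that passes its health check."""
    for name in providers:
        if is_healthy(name):
            return name
    return None  # every provider is down


def failover_drill(providers: Sequence[str]) -> dict[str, Optional[str]]:
    """Simulate each provider failing on its own and record which one takes over."""
    results = {}
    for failed in providers:
        # Treat exactly one provider as down for this round of the drill.
        results[failed] = pick_provider(providers, lambda name: name != failed)
    return results


if __name__ == "__main__":
    # Placeholder names, listed in priority order.
    for failed, active in failover_drill(["provider-a", "provider-b", "provider-c"]).items():
        print(f"{failed} down -> traffic moves to {active}")
```

The point of a drill like this is that every single-provider outage is exercised on purpose, rather than waiting for a real outage to reveal that the fallback path never worked.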

We hope you have a better understanding of how we stand by our name, Stable Host. It’s not cheap by any means to have redundant everything, but it’s worth it in an outage. Every company will have outages from time to time; what matters is who is prepared for them.

