More than you wanted to know about backups
Backups came up for discussion several times at the Morro Bay Photo Expo. They’re a recurring topic on blogs, too, and a recurring problem: the solutions seem simple in theory, but in practice…
I’ve written about backups before, but while I was at the expo I realized my own backups weren’t in great shape (theory vs. practice again). When I got back, I fixed that, so here’s a snapshot of what I do and why I do it.
George Barr at Behind the Lens has written a couple of pieces on backups that I found really interesting:
- A good computer system requires two separate and distinct components – both reliability and backup
- What are you thinking about reliable painless off site backup?
But first, here’s a good take on backups by Steven Frank, written from a developer’s point of view.
And now my take. I try to be pretty anal about backups. Despite that, in the last year we lost some data off Laurie’s disk when it failed because (ta da!) at some point I turned Time Machine off on that machine and forgot to turn it back on, and in Leopard, Time Machine’s ability to notify you of a problem like this is, well, nonexistent. Apple partially fixed that in Snow Leopard, but still: that tells you just how easy it is to screw this up and not know until it’s too late.
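Since the real failure there was silence, here’s a minimal Python sketch of the kind of staleness check I wish Leopard had done for me. It’s hypothetical: it assumes you can get the time of the most recent backup from somewhere (reading that out of Time Machine or SuperDuper isn’t shown), and the one-day threshold is my own assumption, not anything Apple uses.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: complain if the newest backup is more than a day old.
MAX_BACKUP_AGE = timedelta(days=1)


def backup_is_stale(last_backup: datetime, now: datetime,
                    max_age: timedelta = MAX_BACKUP_AGE) -> bool:
    """Return True if the most recent backup is older than max_age."""
    return now - last_backup > max_age


def check_backup(name: str, last_backup: datetime, now: datetime) -> str:
    """Build a human-readable status line for one backup target."""
    age = now - last_backup
    if backup_is_stale(last_backup, now):
        return f"WARNING: {name} last backed up {age.days} day(s) ago"
    return f"OK: {name} backed up {age} ago"


# A machine whose backups were quietly switched off months ago.
print(check_backup("Laurie's Mac", datetime(2009, 6, 1), datetime(2009, 11, 1)))
```

The point isn’t the code; it’s that something has to run this check on a schedule and shout when it fails, because you won’t remember to look.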
The quest for the perfect backup system continues. It doesn’t exist. For me, “perfect” would imply:
- Fast
- Reliable
- Turn-key and non-invasive
- Cost-effective
Preferably, it works out of the box without requiring me to figure out how to make it work; Time Machine actually takes a huge step forward in this regard, but still has weaknesses. All other solutions simply aren’t close.
Here’s what I do to back up my laptop:
- I split my data between two disks. My key data lives on my laptop’s internal disk and goes with me everywhere. My “secondary” data lives on a FireWire drive that sits on my desk. That second disk is effectively an archive of things I may want to use (installers, movies and videos retired from iTunes, etc.) but don’t need on a regular basis. My Lightroom catalog and my iTunes library both live on the laptop disk.
- I have a second FireWire drive that I plug in when I’m at my desk. That is my primary backup disk: I use SuperDuper to make a bootable clone of my laptop drive onto it. SuperDuper can share a disk with Time Machine, so that disk also holds a Time Machine backup. SuperDuper runs nightly, so that bootable clone is generally less than 24 hours out of date. This is a feature, not a bug.
- I have a third drive, a bus-powered FireWire drive, that I carry with me on the road. It’s also a bootable SuperDuper clone. When I’m at home, I update it weekly; when I’m on the road, I update it nightly. I do NOT run Time Machine on this drive, just SuperDuper.
- I’ve been backing up online for about the last year, using Amazon S3 and JungleDisk. I back up my key data (photos, documents, and iTunes) to S3 over the network.
That is, if you’re counting, up to five copies of my key data, including an automated off-site backup online to S3. A few thoughts on why I do this:
- Time Machine is good for recovering A FILE. For several reasons, I don’t consider it an acceptable way to recover a whole disk.
- This goes double for Time Capsule. When Apple released the Time Capsule, I bought two: one for my house, one for my mom’s house. I think it does a great job backing up my mom’s laptop at her house (that’s the primary use case for this device, I think: a non-technical user, light data usage, catastrophic-recovery needs). For my uses, it fell short for many reasons. It doesn’t do well in data-intensive situations or multi-computer environments, and recovering over the network is beyond a pain (trust me, I tried). When Laurie’s disk failed, the first thing I did was clone the backups off the Time Capsule onto a disk (to preserve a copy, just in case), and then found I couldn’t use that clone to recover the disk after plugging it into the computer. That forced me into a recovery over the network, and even with a long Ethernet cable running from the Time Capsule to the Mac, that recovery was pretty painful. That alone makes Time Capsule unacceptable in my environment.
- If you use Time Machine, plan on your backup drive being 3X the size of the drive you’re backing up. Anything less, and it’ll probably end up thrashing for space and not keeping backups around as long as you’d like.
- Why backing up with SuperDuper nightly is a feature, not a bug: if something gets corrupted and you don’t notice right away, one of the best ways to ruin your day is to discover that your backups are so efficient they sucked in all of the corruption as well. That’s why it’s good to have a backup that’s only updated once in a while, and even better, a backup that you have to PLUG IN and MANUALLY update. That way you know you have a good backup even if your disk controller fries and writes gibberish over everything plugged into your computer. A week-old backup in a drawer is a REALLY good thing; train yourself to maintain it.
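To put numbers on the last two points, here’s a small Python sketch. The 3X factor comes straight from the rule of thumb above; the schedule model is a simplifying assumption (each clone runs on a fixed interval of days and overwrites its previous copy), and the backup names are made up for illustration.

```python
def recommended_tm_disk_gb(source_gb: float, factor: float = 3.0) -> float:
    """Rule of thumb from the text: Time Machine disk ~3x the drive it backs up."""
    return source_gb * factor


def clean_backups(corruption_day: int, noticed_day: int, schedules: dict) -> list:
    """Names of clone-style backups that still hold a pre-corruption copy.

    schedules maps a backup name to its interval in days. Each backup is
    assumed to run on every day that is a multiple of its interval and to
    overwrite its previous copy (like a clone). A run on or after
    corruption_day is assumed to have copied the damage.
    """
    clean = []
    for name, interval in schedules.items():
        last_run = (noticed_day // interval) * interval  # latest run so far
        if last_run < corruption_day:
            clean.append(name)
    return clean


schedules = {"nightly clone": 1, "weekly clone": 7}
print(recommended_tm_disk_gb(500))        # 1500.0
print(clean_backups(10, 12, schedules))   # ['weekly clone']
print(clean_backups(10, 15, schedules))   # []
```

Corruption lands on day 10. If you notice on day 12, the nightly clone has already copied the damage, but the weekly clone last ran on day 7 and is still clean. Wait until day 15 to notice, and the weekly clone has swallowed the corruption too. That’s the whole argument for a copy you update less often, and by hand.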
Bootable backups rock, as do bus-powered drives (i.e., ones that don’t need an electrical outlet). If you’re on the road and your laptop fries (or runs away from home), a bootable, bus-powered drive means all you need is a Mac: plug in and boot YOUR system. If you’re traveling with another Mac user, that makes surviving a lost computer a lot less painful (been there, done that). Depending on circumstances, you could even find a cybercafe, or overnight a new laptop from Amazon or Apple if that’s what it takes, and be back on the road right away. That’s another reason why Time Machine shouldn’t be your only (or primary) backup.
I’ve bought my disks from Other World Computing for years. I use their Mercury On-The-Go drives for my bus-powered carry-arounds, and their Elite AL-Pro drives for my desk and archival drives. Laurie has an Elite AL-Pro dual-mechanism enclosure as the primary data disk on her Mac mini, running mirrored drives via SoftRAID. SoftRAID rocks. The only reason I don’t use that configuration myself is that the ONLY drive-enclosure failure I’ve had in the last half-dozen years was my RAID/mirror enclosure, when its FireWire interface died. I lost no data, but I never got around to replacing the enclosure; by the time it failed, I’d outgrown the drives anyway, so I just bought a single, much larger drive. Funny how data grows to fit available space.
That’s the first warning on RAID systems; there are still single points of failure in them. Two drives in a box is nice, unless the box itself fails.
RAID? SAN? NAS? Drobo? WTF?
It’s really easy to get lost and confused in the jargon. RAID? RAID 0? RAID 1? SAN? NAS? Drobo?
My view of all of this is simple: unless you NEED it, stay away. Keep it simple. The more complex you make your environment, the more pieces exist to go wrong, usually on deadline.
- RAID: RAID is a set of technologies that combine multiple drives in various ways. RAID 0 stripes data across them so they act as one really large virtual drive (lose any one disk, and you lose the whole set). RAID 1 mirrors them, writing the same data to each disk, which in theory means the data survives even if one of the drives fails. Because theory rarely works as well in practice, they invented a bunch of other RAID levels (RAID 5, RAID 6, RAID 10, RAID WTF) to try to accomplish in practice what RAID 0 and RAID 1 promise in theory.
RAID is not a backup. RAID adds redundancy, but it is not a backup. If you don’t understand that concept, give the people at Microsoft and Danger a call and ask about their Sidekick “oopsie”. Ask for Roz Ho (Hi, Roz!). RAID 1 has no way to recover from many problems, including deleting a file off the disk (and wanting it back) or corrupting your data, because the deletion or corruption hits all of your copies at once. That’s a bad thing.
RAID saves you from a drive failure, but the drive whose failure will hurt you most is the one in your laptop or desktop computer, and that one isn’t set up for RAID. To me, RAID serves many useful purposes; I just don’t consider backup to be one of them. RAID can make it less likely that you’ll need to restore from a backup, but it doesn’t create or replace backups.
- SAN: Storage Area Network: If your data needs are complex enough to need a SAN, you either have an IT department, or you’d better budget for one, or at least for an IT guy on retainer.
- NAS: Network Attached Storage: NAS boxes are hot among the geeks right now, for good reason. A NAS is basically a big file server that lives on your network. Your files are wherever you are, which, in the days of Wi-Fi and laptops you carry around the house, is a very tempting option. You can find NAS boxes that are compatible with Time Machine, NAS boxes that are compatible with Windows machines, and NAS boxes that make breakfast and brew coffee in the morning.
I have a couple of problems with NAS boxes. First, all of your I/O goes over the network. I don’t care how you’ve built your network: unless it’s fiber optic, it’s slower than a disk attached to your computer.
The other big problem I have with NAS boxes: you are buying a computer that has a bunch of disks attached. That adds cost and complexity (and things that can fail) to the mix. The more complex your environment, the harder it is to make it work reliably and the more likely something will fail along the way. I like SIMPLE. Plugging disks into my computer is simple. NAS is a lot less simple.
Now, if you are in a multi-computer, multi-user environment where sharing files happens regularly, then the cost and complexity of a NAS may well make sense for you. Buying a NAS just to back up one or two computers? To me, that makes no sense. Using it for backups as well as file sharing and storage for a small office of a few people? Different story. But for me, a NAS would make backups slower and less reliable, and not bring much to the equation to offset that. Your mileage will likely vary.
- Drobo: The Drobo is another toy that a lot of geeks are drooling over. Basically, it’s a really smart, RAID-capable disk enclosure that worries about the details of data storage, tells you when you need to feed it more drives, and handles data migration and all of that. When they work, they work great. When they don’t, well, I know people who’ve gone insane dealing with them (though most of those were early adopters; Drobo’s done a good job of dealing with this stuff).
My complaints about Drobo are similar to my NAS complaints: it adds cost and complexity, and for my needs, I just don’t see that I need it. Not now, anyway. I’m going to buy a Drobo at some point, unless something better comes along, but not as long as I can reasonably live on (and back up with) simple drives, which right now means 2 terabytes. When my backup and data needs outstrip the size of a standard large hard drive, Drobo is a good option. Until then, I’ll go with SIMPLE (and cheaper).
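To make the “RAID is not a backup” point concrete, here’s a toy model of a RAID 1 mirror in Python. It’s a deliberately simplified sketch (each “disk” is just a dictionary; no real RAID software works like this internally), but it shows the key property: a mirror survives a dead drive, and faithfully copies your mistakes to every drive.

```python
class Mirror:
    """Toy RAID 1 array: every write and delete goes to all member disks."""

    def __init__(self, n_disks: int = 2):
        self.disks = [{} for _ in range(n_disks)]

    def write(self, path: str, data: bytes) -> None:
        for disk in self.disks:
            disk[path] = data

    def delete(self, path: str) -> None:
        for disk in self.disks:
            disk.pop(path, None)

    def fail_disk(self, index: int) -> None:
        """Simulate one drive dying by removing it from the array."""
        del self.disks[index]

    def read(self, path: str):
        """Return the file's data from any surviving disk, or None."""
        for disk in self.disks:
            if path in disk:
                return disk[path]
        return None


# A drive failure: the surviving disk still has the file.
array = Mirror()
array.write("photo.dng", b"original")
array.fail_disk(0)
print(array.read("photo.dng"))  # b'original'

# An overwrite or delete: every copy changes together.
array = Mirror()
array.write("photo.dng", b"original")
array.write("photo.dng", b"corrupted")
print(array.read("photo.dng"))  # b'corrupted'
array.delete("photo.dng")
print(array.read("photo.dng"))  # None
```

There’s no “yesterday’s version” anywhere in that model, and that’s exactly what a backup is supposed to give you.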
It is, honestly, hard to argue with inexpensive and simple, and inexpensive and simple means taking a nice 2 terabyte drive, plugging it into your computer via USB or FireWire, and backing up with Time Machine and SuperDuper.
The combination of Time Machine (for short-term backups and getting “that one file” back) and SuperDuper (nightly to a disk that stays plugged in, and weekly to a disk that lives unplugged) is simple, manageable, and it works. It protects you from just about any kind of data or hardware failure short of massive catastrophes like your house burning down.
If you can store your backups on a 2 terabyte disk (the largest standard drive generally available right now), then all you really need is a couple of 2 terabyte drives. Anything else adds cost and complexity, not reliability or better backups.
If you CAN’T live on a couple of 2 terabyte drives, the first thing to ask yourself is whether you really need access to all of that data all of the time. If not, come up with a plan to split your data into active data and archived data. Data you know won’t change much is a lot easier to back up, and by figuring out what you don’t need to carry around, you can probably reach the point where you don’t need complex solutions to make your backups work. You also lower your risk of losing data: whatever you don’t carry around can’t be lost along with your laptop.
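Here’s a rough Python sketch of a first pass at that active/archive split, sorting files by when they were last modified. The one-year cutoff is an arbitrary assumption, and last-modified time is only a proxy for “do I still need this”; treat the result as a candidate list to review, not something to act on blindly.

```python
import os
from datetime import datetime, timedelta


def scan_mtimes(root: str) -> dict:
    """Map every file under root to its last-modified time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtimes[path] = datetime.fromtimestamp(os.path.getmtime(path))
    return mtimes


def split_active_archive(mtimes: dict, now: datetime, active_days: int = 365):
    """Split {path: mtime} into (active, archive) path lists.

    active_days is a hypothetical cutoff: files untouched for longer are
    archive candidates.
    """
    cutoff = now - timedelta(days=active_days)
    active, archive = [], []
    for path, modified in sorted(mtimes.items()):
        (active if modified >= cutoff else archive).append(path)
    return active, archive
```

For example, `split_active_archive(scan_mtimes(some_folder), datetime.now())` returns the two lists for `some_folder`, a placeholder for whatever directory you want to survey.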
Offsite backups
What about that catastrophic problem? Your house just burnt down. Your office is underneath that mudslide. Now what?
For that, you need a copy of your data in a safe place. I’ve been using Amazon S3; others use a safe deposit box. Literally any place where you can reasonably say “the chances of both places being destroyed at the same time are very small” works. I’m comfortable leaving a disk in a locked drawer at work, for instance. If some disaster takes out my house AND my office, I probably have bigger worries, if I’m around to worry about them.
The easiest way to handle an offsite backup (there’s that word again, SIMPLE): buy two 2 terabyte FireWire/USB disks. Plug one into your computer and do a Time Machine and SuperDuper backup. Unplug it, take it to work, and lock it in a drawer. Plug in the other disk and run your backups on it.
Then, once a month, take your home backup disk to work, bring the disk from work home, and plug it in.
How tough is that? So why don’t we? (hint: just do it)
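It really is that simple. As a sketch in Python (the disk labels are made up), the whole schedule is two disks trading places once a month:

```python
def rotation_plan(months: int, disks=("Disk A", "Disk B")):
    """Yield (month, at_home, at_work) for a monthly two-disk swap.

    The disk at home receives the regular backups; the one at work is the
    off-site copy, at most about one swap interval out of date.
    """
    at_home, at_work = disks
    for month in range(1, months + 1):
        yield month, at_home, at_work
        at_home, at_work = at_work, at_home  # carry one in, bring the other home


for month, home, work in rotation_plan(3):
    print(f"month {month}: {home} at home, {work} at work")
```

The off-site disk is never much more than a month behind, which is the trade you make for not paying anyone a monthly fee.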
I’ve been doing offsite backups to Amazon S3 for the last year. There are some nice advantages to it: it’s trivially easy (when you, say, don’t forget to turn the backups back on, when the network doesn’t fail, when…). I’ve had no complaints (zero, none, nada) with Amazon S3 and JungleDisk. It works great.
But I’m going to stop doing it, in favor of the “buy another disk, swap it with the one at work” method. Here’s why:
- Cost: I’ve got about 45 gigabytes backed up on S3. That’s not all of my data. That’s the data that I can’t afford to lose. That data is growing every time I take a photo, and it’s not going to shrink. Currently, this is costing me $15-20/mo in storage and access charges. That’s roughly $200 a year. That’s a couple of terabytes of disk a year I can buy. This isn’t the cheaper option.
- Reliability: it’s only as reliable as the vendor you entrust your data to. That’s why I’m using Amazon S3: I know I won’t wake up some morning to find out my backup storage vendor ran out of funding and is shutting down (or shut down without notice). It’s happened. As soon as you start bringing in services like this, you have to qualify your vendors (i.e., all that nasty stuff I.T. does for you at work), monitor their operations, validate their services, and pay their bills. Do you want to be your own IT department more than absolutely necessary?
- Recovery: Okay, pop quiz: how long will it take to download 45 gigabytes? If I ever do need to recover from a catastrophic failure using S3, not only will my data set be incomplete, it’ll take me weeks (I’m guessing 2-3) to pull that data down. That’s assuming nothing goes funky and my ISP doesn’t decide I’m pirating music and turn me over to the RIAA or throttle my connection.
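The arithmetic behind the cost and recovery bullets is easy to check. Here’s a minimal Python sketch; the 1 Mbit/s download rate below is a hypothetical sustained speed, not a measurement of my connection, and the calculation ignores protocol overhead and throttling, so a real restore would be slower.

```python
SECONDS_PER_DAY = 86_400


def annual_cost(monthly_dollars: float) -> float:
    """Yearly spend for a flat monthly bill."""
    return monthly_dollars * 12


def download_days(size_gb: float, mbit_per_s: float) -> float:
    """Days to pull size_gb (decimal gigabytes) at a sustained Mbit/s rate.

    Ignores protocol overhead, retries and throttling.
    """
    bits = size_gb * 1e9 * 8
    return bits / (mbit_per_s * 1e6) / SECONDS_PER_DAY


print(annual_cost(15), annual_cost(20))  # 180 240
print(round(download_days(45, 1.0), 1))  # 4.2
```

So $15-20 a month is $180-240 a year, and at an assumed 1 Mbit/s, 45 GB is a bit over four days of nonstop downloading. By the same arithmetic, a 2-3 week restore corresponds to a sustained rate of roughly 0.2-0.3 Mbit/s.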
That latter’s a killer. I could handle “really slow” if it were cheaper, but the cost-benefit of online backups doesn’t match simply buying a couple of disks and stuffing them in a drawer at work. It’s slower, it’s more expensive, it won’t scale, your recovery will be more painful, AND you’re adding complexity and the need to manage a vendor relationship or two.
So I’m doing away with online backups. Convenient in some ways, but not cost effective, not simple, and if you ever need to recover more than one or two files, incredibly painful. And for one or two files, Time Machine works.
Some day, online backup will make sense. But not now. If you’re considering it, think long and hard about the costs and hassles, and go buy another disk.
Some final thoughts
- Here’s a hint many people don’t think about: you don’t need to keep buying disks with enclosures. It’s quite easy to replace the mechanism INSIDE the enclosure (an easy operation even for a non-techie), and that can save you $50-100 per drive, which really adds up over time. Or use a drive dock, which lets you buy bare drives and plug them in as needed without opening an enclosure. Simply wrap each drive back in its anti-static bag (or buy some bags to keep them in), and they take up less space and cost you less money.
- Don’t wait for a drive to fail before you retire it. If you copy your data to a new drive and retire the old one BEFORE it fails, you can stick the old drive in a drawer (as an emergency backup!) and save yourself the pain of a drive failure. A little preventive maintenance does wonders here.
- Always buy bigger than you think you need. If you are currently on a 500 gigabyte drive, replace it with a 1 terabyte drive, or better yet, a 2TB drive. You’ll find ways to use it.
- Think in terms of “active” data, “accessible” data, and “archival” data. You don’t need instant access to every file every moment. If you come to grips with a plan for “what’s available”, “what’s handy if I need it” and “what I might need”, you can REALLY simplify your life and your backups.
- I handle archival data really simply: it lives on my secondary drive until I get around to copying it to an archival drive. I make a clone of the archival drive. One copy lives in a drawer at work; the other lives in a drawer at home. Once every year or two, I retire the oldest drive and copy all of its data to a brand new one (probably larger, because I’ll need it). That way, you keep migrating that data to new media and minimize the chances of “it died sometime when we weren’t looking” or “wow, we can’t READ that Zip drive any more”. By making migration to new media part of your backup/archival plan, you limit the problems you’ll have going back to old data down the road, at minimal cost and with no real pain.
- Burning to DVD? CDs? Don’t bother. First, unless you’re burning gold archival DVDs, the chances of bit errors down the road are high, especially after a few years, and even gold archival DVDs have longevity issues. Second, when you compare the cost per gigabyte of DVDs with buying another hard drive, it’s a no-brainer. We outgrew burnable media years ago.
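If you want something to nag you about retirement dates, here’s a hypothetical Python helper that flags drives that have been in service too long. The 24-month default reflects the “every year or two” for archival drives above; pass a shorter interval for the drives you work on every day. The drive labels and dates are made up.

```python
from datetime import date


def months_in_service(since: date, today: date) -> int:
    """Whole calendar months between two dates (day of month ignored)."""
    return (today.year - since.year) * 12 + (today.month - since.month)


def drives_due(drives: dict, today: date, retire_after_months: int = 24) -> list:
    """Labels of drives in service at least retire_after_months, sorted.

    drives maps a hypothetical drive label to the date it went into service.
    """
    return sorted(
        label
        for label, since in drives.items()
        if months_in_service(since, today) >= retire_after_months
    )


drives = {"archive-2007": date(2007, 6, 1), "archive-2009": date(2009, 3, 1)}
print(drives_due(drives, date(2009, 11, 15)))  # ['archive-2007']
```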
If you take nothing else away from this article, do these two things:
- Keep your backups as simple as you can while still doing the job: at least two copies of your data (three is better, four better still), with at least one copy off-site.
- The best way to make backups painless is to never need them, and the best way to do that is to retire and replace your PRIMARY drives every 12 to 18 months. This is especially true for laptop users, whose drives get bumped around. Upgrade your working drives on a regular schedule, and you’ll significantly reduce the chance of a drive failing on you at a bad time. You’ll also get a bigger (and probably faster) hard drive in the bargain. A 500 gigabyte, 7200 RPM Seagate laptop drive will run you under $100. You can clone your data to it with SuperDuper (using one of the bus-powered enclosures, say, or the drive dock…), and even if you pay someone to install it in the laptop, that’s still $150. That $150 could well make sure you never NEED the backups in the first place, and people never seem to think about doing this. Do it. To me, that’s money well spent.
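As a last sanity check, here’s a minimal Python sketch of the two-copies, one-off-site rule, run against the laptop setup described near the top of this article (counting the working disk as one copy, which is how I get to five). The data model is hypothetical; the point is that the rule is simple enough to check by hand.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    description: str
    offsite: bool


def policy_problems(copies: list, min_copies: int = 2, min_offsite: int = 1) -> list:
    """Return a list of rule violations; an empty list means the policy holds."""
    problems = []
    if len(copies) < min_copies:
        problems.append(f"{len(copies)} copies, want at least {min_copies}")
    offsite = sum(1 for c in copies if c.offsite)
    if offsite < min_offsite:
        problems.append(f"{offsite} off-site copies, want at least {min_offsite}")
    return problems


# The laptop setup described earlier, counting the working disk as a copy.
inventory = [
    Copy("laptop internal disk", offsite=False),
    Copy("SuperDuper clone on desk drive", offsite=False),
    Copy("Time Machine on desk drive", offsite=False),
    Copy("SuperDuper clone on travel drive", offsite=False),
    Copy("JungleDisk backup on Amazon S3", offsite=True),
]
print(policy_problems(inventory))            # []
print(len(policy_problems(inventory[:1])))   # 2
```

Swap the S3 entry for the disk in the drawer at work, as planned, and the check still passes; drop the off-site copy entirely and it fails.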