Back in 2009, I wrote a series of articles on why it’s important to back up your disks, and my strategy for doing so. It’s now 2011, and my needs and strategies have changed somewhat, so I felt it was a good time to revisit and revise that advice.
The original article was “More than you wanted to know about backups,” and it goes into the philosophical background behind the strategy. I still feel this way, so I recommend you go ahead and read it as background for all of this. The quick summary:
- Turn-key and non-invasive
and my final takeaway on that piece is still valid:
- Keep your backups as simple as you can while still doing the job: two copies of your data (three is better. Four is even better), at least one copy of data off-site.
- The best way to make backups painless is to never need them, and the best way to do that is to retire/replace your PRIMARY drives every year to 18 months. This is especially true for laptop users, where drives get bumped around. Upgrade your working drives on a regular schedule, and you’ll significantly reduce the chance of a drive failing on you at a bad time. And you’ll get a bigger (and probably faster) hard drive in the bargain. A 500 gigabyte, 7200 RPM Seagate laptop drive will run you under $100. You can clone your data to it via SuperDuper (using one of the bus-powered enclosures, say, or the disk dock…) and then even if you pay someone to install it in the laptop, that’s still $150. That $150 could well make sure you never NEED the backups in the first place, yet people never seem to think about doing this. Do it. To me, that’s money well spent.
And that’s really how I continue to push my backup strategy: every task I have to do myself (instead of automating it) and every chunk of time I have to spend (instead of computers doing it without my intervention) is an excuse to let it slide or lapse, and the more you let it slide and lapse, the more chances you have to finally get bitten by the “oh, now what?” problem. So I’m a huge fan of not over-complicating my backups, because if I do, I know at some point I won’t follow the plan, and then at some point I’ll be in trouble.
Back in April, I made some changes to how I did things and switched from using FireWire drives for everything to a NAS. I talk about that decision in “I have committed NAS; updating my data storage and backup strategy.” I’ve refined things a bit since then, so here I’ve decided to pull it all together and talk about the strategy in one place.
Here’s a diagram of my current computer and disk setup
To give you a sense of scale: this setup consists of two 500GB 2 1/2″ drives and five 2TB 3 1/2″ SATA Caviar Green drives. That’s 11 TERABYTES of disk space allocated here. In 2009, it was 3 x 320GB plus 2 x 500GB for a total of 2.2 terabytes. In 2006, it was around half a terabyte. And honestly? My data needs are relatively modest compared to a lot of photographers, and absolutely tiny when you start talking to video geeks. But yes, 11 terabytes of disk (which cost me about $500) freaks me out when I think too hard.
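As a quick check on that total, the arithmetic can be sketched in a few lines of Python. The drive counts and sizes come from the paragraph above, reading the two small laptop-size drives as 500 gigabytes each; the variable names are just for illustration.

```python
# Drive inventory from the paragraph above: (count, size in terabytes).
drives = [
    (2, 0.5),  # two 500 GB 2.5-inch drives
    (5, 2.0),  # five 2 TB 3.5-inch Caviar Green drives
]

# Raw capacity is simply count times size, summed over every drive type.
total_tb = sum(count * size_tb for count, size_tb in drives)
print(f"Total raw capacity: {total_tb:g} TB")
```

That is raw capacity, not usable space: mirrored and backup drives hold duplicates of the same data.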
The core change I’ve made in the last two years is deciding to break with the idea of keeping all of my data with me at all times. This complicates some things: you have to think through what has to be with you and what you can leave behind, and then implement that split in a way you’ll actually stick to. But it simplifies things as well, because you can be somewhat less paranoid about losing the data you leave behind if someone walks off with your laptop or you drop it in a river.
I decided to use the NAS model over FireWire drives because it allows me to unplug the laptop from the desk and still have that data available as long as I’m in wifi range of the NAS. I chose the NAS I did (D-Link DNS-323) because it has hardware mirrored RAID, giving me thought-free redundancy on that data. The big issue is that data on the NAS is no longer indexed by Spotlight; in practice, I find this not a huge problem, but for some people, that could be hell.
Oh, and you can’t back it up with either Time Machine (not that I would) or SuperDuper. And remember, RAID is not a backup. A mirrored RAID still needs a backup. The NAS makes it easy to swap drives on the fly; it’s possible to do your offsite backup by removing one drive, swapping in another, and rebuilding the RAID.
I decided not to do it that way. Instead, I plop a drive in a FireWire housing, mount it on the Mac, and rsync the data onto the disk. That’s fairly simple: “cd /Volumes/Volume_1; rsync -az . /Volumes/san_backup” and then sit back and watch the show.
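If you would rather script that than type it each month, here is a minimal sketch in Python that builds the same rsync command and can add rsync’s `--dry-run` flag, so you can see what would be copied before committing to a long transfer. The function names `build_rsync_command` and `run_backup` and the `dry_run` parameter are hypothetical, made up for this illustration; the two paths are the ones from the command above.

```python
import subprocess


def build_rsync_command(source, destination, dry_run=False):
    """Return the rsync argument list for copying source's contents to destination."""
    # A trailing slash on the source tells rsync to copy the directory's
    # contents, which matches running "rsync -az ." from inside that directory.
    if not source.endswith("/"):
        source += "/"
    command = ["rsync", "-az"]
    if dry_run:
        # --dry-run lists what would be transferred without copying anything.
        command += ["--dry-run", "--verbose"]
    command += [source, destination]
    return command


def run_backup(source, destination, dry_run=False):
    """Run rsync; raises CalledProcessError if rsync exits with an error."""
    subprocess.run(build_rsync_command(source, destination, dry_run), check=True)


# Show the command for a dry run without executing anything.
print(" ".join(build_rsync_command("/Volumes/Volume_1", "/Volumes/san_backup", dry_run=True)))
```

Calling `run_backup("/Volumes/Volume_1", "/Volumes/san_backup")` would perform the real copy, assuming rsync is installed and both volumes are mounted.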
To move about 8/10 of a terabyte off the drive via ethernet (wired megabit) onto FireWire took about 16 hours. The nice thing about rsync is that after this initial copy, it’ll simply copy changes, massively reducing the data moved; I’m estimating a typical “once a month rsync” will take 3-4 hours max.
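To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The 800 gigabytes and 16 hours come from the paragraph above; the 150 gigabytes of monthly changes is a made-up figure chosen only for illustration, and the function names are hypothetical. It assumes decimal units (1 GB = 1000 MB) and a steady transfer rate.

```python
def effective_rate_mb_per_s(size_gb, hours):
    """Average throughput in megabytes per second (decimal units)."""
    return size_gb * 1000 / (hours * 3600)


def hours_to_copy(size_gb, rate_mb_per_s):
    """Hours needed to move size_gb at a steady rate."""
    return size_gb * 1000 / rate_mb_per_s / 3600


# Initial copy from the text: about 0.8 TB (800 GB) in about 16 hours.
rate = effective_rate_mb_per_s(800, 16)
print(f"Initial copy averaged about {rate:.1f} MB/s")

# Hypothetical monthly change of 150 GB, chosen only for illustration.
monthly_hours = hours_to_copy(150, rate)
print(f"150 GB of changes would take about {monthly_hours:.1f} hours")
```

The initial copy works out to roughly 14 megabytes per second. At that rate, a 3-4 hour monthly run corresponds to somewhere around 150 to 200 gigabytes of changed data, ignoring the time rsync spends scanning for what changed.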
The advantage of rsyncing to a Mac-formatted, mounted disk is this: if the NAS fails, I have the data in a native format that I can mount on the computer and use. I don’t need to figure out how to get it off the RAID-formatted disk or get another NAS box to make the data usable. And on a Mac-formatted disk, if for some reason I need to feed it to Spotlight, I can. It gives me the data in an easily usable, standard format that I don’t need special hardware to access, which makes it more future-proof and less dependent on any specific product or technology. Never hurts.
My primary dataset is about 200GB on the laptop drive, with about 250GB free. That gives me space to go on a photo trip without worrying about running out of space (or I can buy a couple of disks for the trip if it’s extended). That drive gets backed up to the primary backup disk BOTH by SuperDuper (a bootable clone) and Time Machine (for individual file recovery). A separate bus-powered 500GB drive is also used for SuperDuper, and I update it about once a week at home, and nightly while on the road (it travels in a bag other than the computer…). Once a month, I pop out the Time Machine drive, take it offsite, and replace it with the OTHER Time Machine drive, which updates and gets back in sync within a few hours.
So the 500GB of “most important” data lives on the main drive, is backed up hourly to Time Machine, nightly by SuperDuper, weekly to the bus-powered backup, and lives offsite in two forms updated once a month. That’s SIX copies of that data on four drives, one of which may be a week old, one a month old. And having a week-old drive is important: what if a drive fails but corrupts stuff and you don’t notice for a few days? Updating all your drives immediately isn’t a feature.
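One way to sanity-check that count is to write the copies down as data. The sketch below is one reading of the paragraph above, assuming the offsite disk carries both a Time Machine backup and a SuperDuper clone (the “two forms”); the `BackupCopy` class, its field names, and the drive labels are hypothetical, and the ages are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    method: str           # how the copy is made
    drive: str            # which physical drive holds it
    max_age_days: float   # roughly how stale the copy can get
    offsite: bool = False


copies = [
    BackupCopy("working copy", "laptop drive", 0),
    BackupCopy("Time Machine", "primary backup disk", 1 / 24),  # hourly
    BackupCopy("SuperDuper clone", "primary backup disk", 1),   # nightly
    BackupCopy("SuperDuper clone", "bus-powered drive", 7),     # weekly
    BackupCopy("Time Machine", "offsite disk", 30, offsite=True),
    BackupCopy("SuperDuper clone", "offsite disk", 30, offsite=True),
]

# Distinct physical drives, offsite copies, and the stalest copy.
drives = {c.drive for c in copies}
offsite_copies = [c for c in copies if c.offsite]
oldest = max(c.max_age_days for c in copies)

print(f"{len(copies)} copies on {len(drives)} drives, "
      f"{len(offsite_copies)} of them offsite, oldest about {oldest:g} days")
```

Listing the copies this way also makes the staggered ages visible: a problem that goes unnoticed for a few days has already reached the hourly and nightly copies, but not the weekly or monthly ones.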
My secondary data (about 8/10 of a terabyte) lives on two drives in the NAS (mirrored RAID) and on an offsite drive, updated monthly. So it’s on three drives. Since the data isn’t as important (by definition, sort of) and, more importantly, doesn’t travel and can’t be dropped in a river, I don’t feel I have to be AS paranoid as I am with the traveling data.
I’m estimating this setup will handle my data needs for the next 18 months, which means if it lasts a year, I’ll be happy. Even better, it scales easily: if the NAS fills, I can either upgrade to larger drives (3TB drives are available now, but pricey) or add a second NAS and simply scale that way. All that would take is a unit, 4 drives (about $500 at current prices) and some thought about how to partition the data across them. It’s never a bad idea to think about how your setup will scale, because it will have to, sooner than you plan for it.
And now that I have this all down and automated, the time it takes me to manage this is about 1 hour a month to swap drives around and reset the backups. Everything else runs without intervention on a normal basis.
Don’t forget my basic premise of backups: try to never need them. And the best way to do that is to replace your drives before they fail. For a laptop, I try to replace them every year or so. I usually use the old one as the carry-around backup drive. For the bigger drives, every couple of years seems rational. At $90 for 2 terabytes for Caviar Greens, that’s CHEAP insurance compared to recovering data from a crashed drive. So be proactive here. Build a good, simple, “run without help” backup system, but also take steps to never need it by replacing drives well before their normal lifespan ends, especially the ones that travel with you and get bumped and bruised over time.
If you want to go back and see some of my previous writing on all of this, here are the key articles:
- More than you wanted to know about backups (2009)
- Some more thoughts on backups (2009)
- Following my own advice on backups…. (2009)
- What to do when you realize you’re running out of disk… (2010)
- Why I don’t depend on Time Machine (and other followups to the backup note) (2010)
- I have committed NAS; updating my data storage and backup strategy (2011)
- Backing Up the Modern House (2006)