What to do when you realize you’re running out of disk…

One of the things that became painfully obvious during my trip to Yosemite was that I was rapidly running out of hard disk. Being out on the road is not a good time to realize you need a bigger disk, so when I came back, I decided to fix things before it became a real problem. Here’s what my overall “bits on things” setup looked like:

Now, there’s one obvious problem there that I’d let slide: the backup disk is smaller than the main disk. I knew about that, knew I needed to fix it, and forgot. Not a huge problem, but one of those details you need to keep an eye on or they’ll bite you at an inconvenient moment. Even though I had 3/4 of a terabyte for my backup disk, Time Machine was only storing backups for about 3 weeks, which means it was no longer large enough. It was time to update and grow and upgrade.

The biggest problem: the new Canon 7D creates much larger images. That’s good, but creates ripples. It also does video, which I’m starting to experiment with. By the time I convert the 7D RAW image to DNG and store it on disk, it grows to about 49 megabytes in size. Pile up a few hundred of those, and “Hell, disk is cheap” starts ringing a little hollow. To give an idea of the change going from the 30D to the 7D, on the 30D I use a 4Gb memory card and get 400+ images on it. On the 7D, I upgraded to 16Gb cards, and I get 500 images on one. Moderate upgrade in number of images, big upgrade in amount of disk taken. Also, since the 7D shoots 8 frames a second sustained where the 30D shot 4FPS with limited bursts, the opportunity to generate LOTS MORE images quickly exists. And it definitely happens, so at the end of the day, I have more, larger images to store. This is, as they say, a good problem to have.
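To put rough numbers on that, here's a back-of-the-envelope sketch in Python. The 49 megabyte DNG size and the 500 images per card come from the paragraph above; the function name and the 2,000-frame trip are made up purely for illustration.

```python
# Back-of-the-envelope estimate of the disk space a shoot consumes.
# DNG_MB comes from the text above; the 2,000-frame trip is a made-up example.

DNG_MB = 49  # approximate size of one converted 7D DNG, in megabytes


def shoot_size_gb(frames: int, mb_per_frame: float = DNG_MB) -> float:
    """Return the disk space, in gigabytes, used by `frames` images."""
    return frames * mb_per_frame / 1000  # treating 1 GB as 1000 MB


if __name__ == "__main__":
    # One full card of about 500 images, once converted to DNG
    print(f"One card:  {shoot_size_gb(500):.1f} GB")
    # A hypothetical busy trip of 2,000 frames
    print(f"Busy trip: {shoot_size_gb(2000):.1f} GB")
```

Swap in your own camera's file size and shooting habits; the point is only that a handful of full cards eats a noticeable slice of a 320Gb laptop disk.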

The easy answer — upgrade the laptop to a bigger disk — won’t work here. The biggest laptop disks now available are 500 Gigabytes. Larger than my 320Gb, but not by that much. Upgrading delays the problem by a period of time, but it doesn’t solve it. I considered doing that, then decided to bite the bullet and shift into the “it no longer fits on the laptop” universe.

I mumbled about this on Twitter, and immediately got back the “install a NAS!” response. NAS (or Drobo, or RAID, or name your favorite disk packaging setup) isn’t a solution — it’s a technology. You don’t start by choosing a technology, you start by figuring out the solution and then choosing things that implement them well.

I’ve written about backups and my philosophy on how to do them before; check out this piece as well as this followup, as well as this piece where I talk about why I stopped using an online backup solution in favor of sneakernetting an offsite backup somewhere. (I am, for the record, looking forward to when the price/performance and the network broadband make this worth doing again, but not right now…)

So for me it’s time to shift my data into a multi-disk environment. I live on a laptop, which gets carried around. If your data no longer all lives on the laptop disk, then when you need that data, you have a problem. It behooves you to then think about your data and how you use it, and figure out how to store your data across your disks so that you have access to what you want when you want it.

For my purposes, “data” can be defined as “everything on your disk”, but in practice, I see no reason to think about shifting apps out of the Application folder or similar “optimizations”. You might be able to free up a gig or two of space, but why? That’s not significant, and it can lead to potential complications later, especially if you start mucking in your Libraries, preferences, caches, etc. The savings aren’t significant — or worth the future hassles or possible compatibility issues. So for me, unless you’re a font geek with 50 gigs of fonts or something like that, just worry about the data folders: Documents, Pictures, Music, Movies. (in case it’s not painfully obvious: this info is Mac specific. General concepts work for Windows as well — the nutty details are your problem on that platform).

A few key goals

Here are a few key goals of all of this:

  • Scales infinitely. Or close enough I don’t have to go through this again for a while
  • My data is available when I need it, wherever I am
  • Easy and intuitive. I don’t want something that’s difficult to do, or I won’t.
  • Reliable and easy backups: if your backups are difficult, you won’t. Keep it simple. Make it reliable.
  • Fast catastrophic recovery. I don’t want to spend days getting my data usable again
  • Recover a file or a disk. Some backup schemes work best for a crashed disk, others for a lost file. You really need both.
  • Backups on the road are even more important, not less. So make sure you can do them. And do.

Here’s what I ended up with. It’s not hugely different than before, but the changes create significant challenges to understand:

I took the bus-powered disk and upgraded it with a 500 gig drive. This means that instead of having 320Gb available, I now have 3/4 of a terabyte I can carry around and use without needing an electrical outlet. This is a significant detail: you really mess up the concept of a “laptop” if you have to plug it in to use it… Or worse, can’t because the data you need is inaccessible because you didn’t bring it.

Digression: for those of you about to tell me “just live in the cloud”, please don’t. The dataset we’re talking about is measured in gigabytes trending to terabytes, and it’s not practical. In reality I am using Google Docs and Dropbox more for some things, but for the set of things “the cloud” solves for me, they also live happily on my internal laptop disk. This is about figuring out now how to scale from having 1,000 photos in my portfolio and 10,000 in my collection to having 20,000 photos in my portfolio and 100,000 in my collection without everything collapsing in a heap, and those kinds of data sets aren’t going to live online any time soon, nor do I particularly want them to.

So anyway, I now have three drives going. The internal laptop drive (320Gb) is where everything I need 100% of the time has to live. The external bus powered drive can store other files that I need access to on the road — but which I probably can live without for more casual usage. And my desktop drive (AC powered) stays at home and holds the data that I need easily accessible but don’t need to travel with.
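If it helps to see the rule written down, here's a tiny sketch of that three-way decision in Python. The enum and function names are my own invention for illustration; only the three tiers come from the paragraph above.

```python
from enum import Enum


class Disk(Enum):
    """The three drives described above (names are illustrative)."""
    INTERNAL = "internal laptop drive"
    BUS_POWERED = "external bus-powered drive"
    HOME = "AC-powered desktop drive at home"


def where_it_lives(needed_all_the_time: bool, needed_on_the_road: bool) -> Disk:
    """Pick a drive for a piece of data based on when I need access to it."""
    if needed_all_the_time:
        return Disk.INTERNAL      # has to be there 100% of the time
    if needed_on_the_road:
        return Disk.BUS_POWERED   # travels with me, but optional for casual use
    return Disk.HOME              # easily accessible, but never leaves the house


if __name__ == "__main__":
    print(where_it_lives(needed_all_the_time=False, needed_on_the_road=True).value)
```

The order of the checks matters: anything needed all the time goes on the internal drive even if it is also "road" data.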

I went through all of my data and figured out where it needed to live. There’s also an unlisted “fourth category”, which is data that lives offline, or on a disk that I maybe need access to once in a while but not keep plugged in, and I spent some time pulling all of that data off my disks and sticking it in a corner to archive into a drawer. (One could also say there’s a fifth category, the “why the hell am I hanging on to THIS?” category of things that ended up in the trash. Things like the Parallels virtual image of Ubuntu I haven’t booted since I installed it five months ago, which deleting freed up multiple gigabytes. And why did I feel the need for an Ubuntu distro in Parallels on MacOS, which is just a different flavor of the same thing? I don’t remember, but it seemed a good idea at the time…)

I can hear some of you groaning at the thought of sorting through all of your data. I sympathize. If you don’t want to commit the time to that, I understand, but putting some time and energy into it now helps you understand what you have and how to organize it. It also means that moving forward you’ll have a good sense of where stuff belongs, meaning you’ll spend less time thinking it through and organizing on the fly. And if you do it now, you probably won’t need to do it again for a few years. It’s little more than virtually filing everything in your office, and it never hurts to do that every so often.

It shouldn’t be assumed that you need to turn “Save File” into a “Getting Things Done” adventure — I’m definitely not interested in being that anal about all of this, but it is important to understand how you want to manage your data well enough to know if it’ll do what you need it to do and how well it scales. Scaling was the big issue for me. If I’m seriously having to worry about data in terms of terabytes, I’d just as soon not have to architect this all out again in six months. Once it’s settled down, it’s back to the “that pile on the desk is in the way, let’s put it in the files” mode again…

So here’s how I finally settled on filing things. My internal laptop disk:

And here’s what my secondary disk looks like. Note that it only has Music and Pictures folders.

The Music folder is where I’m storing the video files in my iTunes library. The audio (aka “music”) lives on the main laptop disk.  As my creation of video grows, I’ll add a “Movies” folder and split it up the way I do photos, but right now, there’s not much there.

And finally, my third disk, the one that stays at home:

The blue highlighted folders are folders on that disk that I exclude from the Time Machine backup:

which is an option more people should think about if they use Time Machine (or other backups) — some stuff you can live without if you need to, so why back it up? All it does is make it harder to do backups reliably. I flag them with color labels so I don’t forget which ones were excluded — I did that once and had to restore a disk, and spent half a day freaking over “missing data” until I remembered I’d excluded that data from the backups. Oops. It goes without saying, of course, that you should only exclude stuff you really don’t need back if there’s a failure, don’t exclude it because it’s large…

A big part of how this works (or won’t) is splitting up the photo library. In general, I split up my photos into four big piles:

  • flickr or better:  images I liked enough to post to my Flickr account
  • 2nd tier: photos which are technically fine, but which aren’t something I think should be posted on Flickr. Most of these are effectively duplicates of ones that go on Flickr (think “eight frames per second burst rate”); you want them around in case you want to use them; you stick them somewhere out of the way because you have no plans to actually do so. In theory, these photos are all good enough to publish, except I have some other photo I think is better. But you never know when you might want some specific expression or a left profile instead of a right profile, and so they’re here if you need them.
  • archive and forget: photos that are clearly not as good as the candidates I’d publish, but not bad enough to throw away. To be honest, as I’m getting more comfortable about my abilities as a photographer, I’m doing less keeping photos around that “someday I might try to fix this”. Instead, I ding them and throw them out. These are flagged to be taken offline and stored, and I fully believe I’ll never look at them again and some day throw them out. More and more, I’m comfortable with my choices and simply throwing them out and saving a step…
  • dings: And finally, the dings. As I do edits, the ones that are clearly flawed get thrown out and deleted. There are people who tell you to keep everything. I’m not one of those people. Disk is cheap, but it’s not free. Maybe some day those images will be usable (or fixable in photoshop, or whatever), but the reality is I have thousands of BETTER images I could spend that time on, so why bother? So count me in the camp of tossing the crap, especially when it quickly starts turning into gigabytes and terabytes of crap. Why make it harder to find the good images by having to wade through crap, or worse, create a filing system for offline images to keep around stuff you know in your heart you’ll never use? Let it go. Just because you CAN keep everything doesn’t mean it’s a good idea. It’s not.

This setup looks like it’ll scale for a good long time. I can, if I need to, move some flickr or better onto the 2nd disk and prioritize the internal drive to active projects; 2nd tier data easily moves to the “live at home” disk when I need to. I can subset my iTunes library the same way if I want to, and the rest of my data isn’t going to grow faster than disk technology seems to be progressing. As long as I keep my folder structure sane, I can tell at a glance what’s going on, both within the Finder and Lightroom. I can use Lightroom and Spotlight searching to find things if I need to, but with a bit of care the naming structure will let me browse into it quickly as well. It looks pretty solid.

I’ve spent the last couple of days migrating the data to this new setup and I’m happy with it, at least for now. As I’ve settled in, I’ve made some changes. Originally all three disks had Documents folders, but I finally realized that either a document lived on the internal laptop disk or it lived on the “stay at home” drive; there was no need for a middle phase, and it just complicated things. You’ll notice there are folders on the travelling disks to act as placeholders for the stay at home disk. This makes staging stuff to sweep over there easy, so I can stuff files places on the road and then go home and move them off of the travel disks. It may seem unnecessary or trivial, but I’ve found lots of people don’t think about that kind of detail, and when I explain it, they love the idea: it lets me make a filing decision at the time I’m using the data, and merely shove it into the file when I get home and not have to “remember” what needs to be filed days later. Make those decisions while you’re using something and then forget it. It’s a great hint for simplifying things.
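I do this sweep by hand in the Finder, but the idea is simple enough to sketch in Python. This is a minimal illustration, assuming the placeholder folders on the travel disk mirror the folder layout of the stay at home disk; the function name and the volume paths in the usage note are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def sweep_staging(staging_root: Path, home_root: Path) -> List[Path]:
    """Move every file under staging_root to the same relative path under home_root.

    The placeholder folders themselves are left in place for the next trip.
    A file that already exists at the destination is skipped, not overwritten.
    """
    moved = []
    # Build the full file list first so moving files doesn't disturb the walk.
    for src in sorted(p for p in staging_root.rglob("*") if p.is_file()):
        dest = home_root / src.relative_to(staging_root)
        if dest.exists():
            continue  # never clobber something already filed at home
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Usage would look something like `sweep_staging(Path("/Volumes/Travel/StayAtHome"), Path("/Volumes/Home"))`, with both paths replaced by wherever your own disks mount. Skipping existing files is a deliberately cautious choice; anything left behind in the staging folder is a name collision to sort out by hand.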

And once my backups finally sync up and my data is fully redundant again, I’ll be happier. Currently, I have my SuperDuper! backups in place, and I’m letting Time Machine sync up now. It can be butt slow at times…

Some technical details on implementing this

The drive I bought for the bus powered disk was the Hitachi Traveler 500G. I’ve been using Hitachi drives for my laptop drives for a while and find them pretty reliable. That doesn’t mean others aren’t, it means these have worked well for me, so I continue to use them. The bus-powered enclosure I use is the Mercury On-The-Go Pro from Other World Computing. I’ve bought RAM and disk from OWC for years and have been very happy with their price, quality and service. I’ve used that enclosure for a long time with never a failure. Their stuff is well-engineered and solid and I feel it’s well priced, and I haven’t been in the mood to explore other vendors because this one works for me.

For my external drives, I use the OWC Mercury Elite-Pro housing. It’s solid, it’s built like a rock, it works reliably. As part of this rework, I’ve retired the last of my IDE systems and I only buy/use drives that have SATA interfaces.

Digression: Every so often, it makes sense to see how technology is moving and migrate away from stuff that’s aging and heading towards end of life. If you refresh your data onto modern storage, you won’t go looking for it some day and find out you no longer have a way to access it. I’m a big fan of refreshing all of my offline storage every couple of years so the chances of having a stored drive fail are minimized. I’m also a fan of keeping two copies of all offline data, preferably one offsite, just in case. Since I’m also a fan of refreshing my active drives on a regular basis (because the best way to never need your backups is to never run your disks until they die!), a nice way to do this is to replace your active drives every 18 months or so, then use the retired drives and copy all of your archived data onto them, and then take the oldest drives and stick them in your files somewhere.

Digression on the digression: I see no reason to ever give a used drive to someone else, either by selling, giving, or donating. I pull the drives out of computers and housings and file them with my tax papers and other files. Once in a while, I pull the really old mechanisms and “retire” them with a big hammer. That way, there’s absolutely no way someone can recover files off of a drive they bought at Goodwill and end up with your data, because it never leaves your hands. If you trust seven-way zeroing and are willing to spend the time to do so, bless you. I just don’t think a used disk drive is worth the time and hassle to recycle for re-use…

The drive I’m using as my backup drive now is the 2Tb Western Digital “greenpower” Caviar Green with 64 Mb cache. There are cheaper drives out there, but this one has good reviews and is built for server service. In all honesty, there’s nothing quite so painful as finding out your backup drive has failed, especially if you find out while trying to restore something. I don’t want to overpay for this stuff, but cheaping out bites you down the road.

My backup drive is living in a NewerTech Voyager Hard Drive Dock, which allows you to insert and eject SATA drives easily. This means if I want to I can easily pull this mechanism and replace it with another if I need to “do something” with another disk. I’m just starting to use it so I don’t have reliability data on it, but so far, I like it. It’s solid and well-built at first use. I plan on using it for managing my offline archives as well, saving me paying for multiple enclosures down the road.

Geeky details on backups

The 2Tb disk is split into two partitions, one 500Gb and one 1.5Tb. I use two backup technologies, SuperDuper! and Time Machine. I love SuperDuper for system backups because it makes bootable clones. That makes catastrophic recovery a lot simpler: take your backup drive, plug it into a Mac, and boot from it (then make a backup of it before something bad happens!). SuperDuper runs nightly and refreshes copies of my two travel disks, which is why the 2Tb is split into two partitions. The 500Gb syncs up the 500Gb external disk, and the 1.5Tb is the clone of the internal boot disk and also is where my Time Machine backups live.
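A quick sanity check on that layout, sketched in Python. The sizes are the ones quoted above and in the drive descriptions earlier (500Gb external, 320Gb internal); the dictionary layout and function name are illustrative only and have nothing to do with how SuperDuper! or Time Machine work internally. It assumes the worst case, where each source disk is completely full.

```python
# Sizes in gigabytes, as quoted in the text. Names are illustrative.
PARTITION_GB = {
    "travel clone": 500,
    "boot clone + Time Machine": 1500,
}

# Which source disks are cloned onto which partition, and how big they are.
CLONE_SOURCES_GB = {
    "travel clone": {"bus-powered external": 500},
    "boot clone + Time Machine": {"internal boot disk": 320},
}


def headroom_gb(partition: str) -> int:
    """Space left on a partition after the full-size clones it has to hold."""
    return PARTITION_GB[partition] - sum(CLONE_SOURCES_GB[partition].values())


if __name__ == "__main__":
    for name in PARTITION_GB:
        print(f"{name}: {headroom_gb(name)} GB left after clones")
```

The leftover space on the larger partition is what Time Machine gets for its history, so that number is the one to watch as the disks fill up.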

Superduper doesn’t do versioning or archival over time, it makes a snapshot of now. For the “I need that file I threw out two weeks ago” problem, I use Time Machine. It backs up all three disks (minus the exclusions I mention above) to the 1.5 Terabyte partition of the backup disk. Time Machine is useful for casual backups (it’s better than nothing and pretty good for get-single-file recoveries) but I don’t like it for complete disk recovery and after working with a Time Capsule for a while, I really don’t like Time Machine over a network. If anyone really cares why, that’s a whole different blog posting.

The good news is that SuperDuper and Time Machine co-exist nicely on one disk (thank you, Dave!) so I can do both easily, so I’m set up to clone my two key disks onto the backup disk, and then do a time machine backup onto it for incremental backups as well. If my boot disk crashes, recovery is (almost) as simple as booting the backup disk. Wonderful, since crashes almost always happen on deadline…

What this doesn’t cover yet…

There are a few details this new setup doesn’t cover yet. None of them are time critical, but all of them need to be considered and solved, and it’s important you know how to solve them before you implement (lest they blow up your work when you go “oh, damn, didn’t think of that” later). Fortunately, they all are solvable…

  • The new setup doesn’t include “on the road” backups. Since I no longer can carry a bus-powered drive big enough to back up my systems, the answer is to carry a bigger, plug-in drive. I’m not worried about Time Machine backups on the road, so the easiest solution is a 1Tb external drive in one of my Elite-Pro housings. Even better, that’s cheap, and if I set it up, gives me an easy “spare backup” setup, because I love having a set of backups I only update every week or so, just in case something corrupts that I don’t recognize right away. So that’s probably what I’ll do. The other option would be to carry the 2Tb backup disk with me in the Elite-Pro housing, which also works, but which limits the number of redundant copies I end up having. I don’t like carrying my backup on the road if I can help it, I’d rather carry a “road” backup and leave the main backup at home. But both are options.
  • The new setup doesn’t make explicit the off-site backup storage. What I’m doing in the short term is taking my old backup disk offsite. In 4-6 weeks, I’ll buy a 2nd 2Tb disk, plug it into a dock, build it the same as my new backup disk, and run backups onto it, and then swap between the two (the other going offsite) every 4-6 weeks. That’ll fix this for a good while at reasonable cost.
  • The setup for moving files onto offline disks (aka “in the drawer”) isn’t spelled out, but is pretty simple: buy a pair of 500Gb SATA drives, plug them into the dock, copy the files to each, carry one offsite. Iterate until full, and then either start another set or decide some of the files can be deleted (or both). Every couple of years, take all of your offline disks, copy them to new (fewer, bigger) disks, and store them again.
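For that last item, here's a minimal Python sketch of the “copy to each drive of the pair” step, with a checksum comparison added so a bad copy gets caught before the original is trusted to the drawer. The verification step is my own addition, not something spelled out above, and the function names and paths are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MB chunks so large images don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_to_pair(src: Path, drive_a: Path, drive_b: Path) -> str:
    """Copy src onto both archive drives and verify each copy against the original.

    The original is left where it is; delete it yourself once both copies check out.
    Returns the checksum so it can be written down alongside the archive.
    """
    expected = sha256_of(src)
    for drive in (drive_a, drive_b):
        dest = drive / src.name
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        if sha256_of(dest) != expected:
            raise IOError(f"copy on {drive} does not match the original")
    return expected
```

Something like `archive_to_pair(Path("old-shoot.dng"), Path("/Volumes/ArchiveA"), Path("/Volumes/ArchiveB"))` would cover one file; looping over a folder is left as an exercise. Keeping the checksum around also gives you something to compare against when you refresh the archive onto new disks a couple of years later.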

But what about “install a NAS?”

I have to admit I’m not a huge fan of NAS in my environment, but I also realize that over time, the amount of data I’m storing on my “stay at home” disk is going to grow without bounds. My plan at this time is to convert that into a Drobo at some point, but not until I need to, so I’ll hold that off until later this year. I realize that at some point the percentage of data I can keep local to the laptop, even with 1 terabyte (500gig internal + 500gig bus powered) is finite, but I’m only using about 275Gb on those two combined right now, so I have some time before I have to worry about that…

Things like Drobo and a NAS add some capabilities, but they also add complexity, cost and new ways for interesting failures, which always seem to happen on deadline when you least can afford the issues. A NAS works best if you’re sharing data among multiple machines. Since I’m not, it adds more complexity than it solves problems. Drobo is different, being locally hooked up (and there’s a NAS enabler you can buy for it), but it adds its own set of complexities and administration. So as long as (a) a single disk works and (b) I can back it up reliably, I’ll stick with a good single disk. Once you start getting into multiple disk environments and/or your backups start being tougher to keep reliable, the addition of mirrored RAID and some of the other features of NAS or Drobo become good to have, but again, I’m not at that point yet.

Finally — speaking of Terabytes

I’ve been around long enough that the thought of buying disk in terabyte sizes amuses me. My first hard drive was ten megabytes — MEGAbytes, not GIGAbytes — and I remember a time when a terabyte would probably store all of the data at Apple, and perhaps all of the data in the state of California. Today, I’m using it for backups of my personal data set. That amount of scaling in the last 30 years or so amazes me when I step back and consider it. But then, my phone has a lot more processing power and memory and disk than my first Mac did. I think my KEYBOARD has a more powerful CPU than my first home computer did….

This entry was posted in Computers and Technology.
  • Andrew

    If you need more internal space, you can sacrifice your optical drive and replace it with another 500G disk.

  • VegasTom

    What are your thoughts on off-site storage schemes?

    • http://www.chuqui.com chuqui

      If you're talking about over-the-net setups, I used them ( http://www.chuqui.com/2009/12/following-my-own-… ) and stopped because I didn't think the pricing was good enough and the disadvantages (primarily because of broadband speeds) made them not ready for me. I expect that to change as prices per gig come down and broadband speeds go up, unless you get stuck with a broadband cap that's too low.

      If you're talking about physical off-site, for my purposes, I store my off-site disks at work. My general thinking is that anything that takes out both my house and my office at the same time will give me bigger issues to worry about, and it's good enough for now. If I was independent and didn't have a convenient lockable drawer, I'd go with a secure box somewhere rather than depend on them living at a friend's place. If you get really paranoid, you could fedex them to someone you know in another timezone, I guess. Mostly I'm worried about the building-catastrophe (house burning down, etc) and being reasonably recoverable; larger natural disasters that destroy a large region? I guess it comes down to just how far along the six sigma run you feel you need to go, and every decimal point gets increasingly expensive...

  • Marshall

    In late 1984, I bought a Corvus Omnidrive off the show floor at Comdex. Since it was a show demo, I got it “used”, at half price: $1250 for 11 MB.

    Yesterday, I went to Fry's (hardly a low price leader) and bought three 1.5 TB drives for $270+tax.

    That's 400,000x more storage for 1/5th the price.
    That's an improvement of two million in 25 1/2 years.
    That's doubling about every 15 months.

  • http://decafbad.com/ lmorchard

    Thanks for the rundown! I've been eyeing up those drive toasters for awhile, and been wondering if they'd be useful enough for keeping around and rotating a handful of backup drives.

    Also… as for the future shock from storage media – I'm still somewhat floored that I could probably swallow a 16gb MicroSD accidentally and not really notice it. Not terabytes there, but offset by the tiny tininess.

  • evanrobinson

    Regarding size of drives:
    Like you, I started with a 10MB drive (in a Compaq Deskpro running at … 8MHz?).

    When I joined Adobe, one job I got was “interfacing” with IT on behalf of Graphics Products. We spec'd a bunch of stuff, but the relevant funny is trying to figure out what we needed in the way of backup for the twin towers in San Jose. We surveyed a bunch of people to find out what HDs they had (IIRC, a couple of GB was pretty standard), did some multiplication, and concluded that there was something like 3 terabytes in the towers. That was about 1997, IIRC. I have that much on my desk now: 500GB in the notebook, 1 TB in the main TimeMachine drive, three 500GB bus-powered drives for various uses, and more….