Looking forward into 2014

Where will 2014 take me? To be honest, I don’t know. I have some plans, but I also expect those plans to change. Mostly it’s about getting started, then iterating until I’m happy with them.

There was a time when I was a serious planner-of-things. I couldn’t get started until I could figure out where the end was and I knew the path to get there. The end result of that is too often lots of planning and not so much actual stuff to point at. It’s taken me some time to learn to be comfortable with going and course correcting along the way — but now that I’ve done it a few times, I much prefer it. The trick continues to be to figure out what to do; more correctly, have a strong understanding of what things NOT to do so you have enough time and resources to do things properly.

This list is always subject to change, but here are some of the things currently on the docket for 2014:

Morro Bay Winter Bird Festival. I’ve finally carved out time to go to a birding festival, in a favorite place. As part of this, I’ve put myself on my first pelagic tour, and I’ll be taking a guided birding tour of Carrizo as well as being able to explore the Morro Bay area with some of the local experts. If you’re going to be at the Festival and you see me, make sure you say hello. I’ll be down in the Morro Bay area for a couple of extra days, giving me time, I hope, to shoot the elephant seals again, and just hang out a bit and enjoy the area.

I’m planning a trip to Yellowstone in spring, probably just after Memorial Day when the roads are plowed and the park is starting to wake up from winter. The current plan is to take a week and go, just me and my cameras, and spend that time in Yellowstone and the Tetons intensively shooting whatever I find.

Laurie and I have decided our fall vacation together will probably be to the Eastern Sierras for a fall foliage shoot. Way too early to figure out the timing or location for it but it’s something we’ve both been hoping to do.

The blog is going to be busier, as I have a number of writing projects I plan on kicking off. I am hoping to exit 2014 with two eBooks available based on the writing plans in place so far. On top of that, I still want to push the refuge project forward. I have a major worry there, though, in the drought, because even though we’ll get through this winter with water for the birds in the refuges, if we continue to not get rain I’m not sure there’ll be water available to flood the refuges properly next winter. Or for people, either.

And coming to some understanding of the drought and water challenges in the state has caused me to get interested in the “don’t call it the peripheral canal” central valley water tunnel project, a massive engineering beast that may dwarf the California high speed rail project in scope and cost before it’s done. It’s way too early for me to write about the project coherently, other than to say I don’t like what I see, and what I’ve read so far suggests the science and research being done for it is scoped to validate that it ought to be built rather than to investigate whether or not it’s a viable and needed project.

The scary thing is, I think an argument can be made there is no choice and it needs to be built; the water situation in California is that screwed up and all of the choices are bad ones. Which is why every time I wander through the central valley and see one of these signs all it generates is a hollow laugh:

Congress Created Dust Bowl

I’ve seen the water levels at the various reservoirs this winter, and you can argue politics all you want, but if there’s no water, there’s no water. And if something doesn’t change, we’re getting painfully close to the point where argument about water allocations is going to be meaningless. And it’s hard to feel too sorry for the agricultural interests when they’re planting so many acres in rice and cotton and other water-thirsty crops (but if they don’t, where are those crops going to be grown? And watered? It’s a nasty, difficult set of problems we’ve allowed to happen…).

So I enter 2014 looking forward to much, but also seeing some significant challenges that may directly impact things I’m interested in protecting, and for which there may be no answers, much less good ones. We’ll see as the year plays out.

And may it start raining soon.


Posted in About Chuq

Looking back at 2013

2013 was a fascinating year for me, on balance one of the best years I’ve survived in the last decade. It was also a year of transitions.

My photography has been in transition. In 2013 I shifted my wide angle work from Canon to Fuji, and so far, I really love the result. Beyond that, though, my work in general has been shifting from trying to take good and interesting images to trying to take good and interesting images that help to tell a story or share something with the viewer. Conceptually a simple change, in reality, the devil is in the details — and a lot of it boiled down to “I’ll know what it is when I find it”. A lot of my time and energy in the last year or so has gone into figuring out what it is. I’m still not sure I know — but I know where to explore and experiment. That’s enough of a direction for now, and we’ll see where it takes me in 2014.

The blog got a complete redesign, a major upgrade to the underlying guts, a new hosting server with more power and a growth path when I need it and a major edit, taking it from > 2000 articles to < 1000 as I removed obsolete and crappy content and tried to focus it on better material that lives longer than three or four days before disappearing into the archive never to be seen again. Out of that work popped my new For Your Consideration site, which I launched quietly in December and which I will be nurturing and working to grow in 2014. You’ll hear more about that soon (hint: don’t try to make noise around Christmas, nobody’s listening).

I added fewer new photographs to my collection in 2013 than in the previous two years, but the ones I did add were of higher quality, I believe. I had about 20% fewer shooting days, but the days I did go shooting were more productive in terms of usable images. I spent some time studying high end printing in February, and that was a great investment: it’s a lot easier to make an image look good online than on paper, and the act of making an image stand out printed makes it even better when looked at on a screen. You can hide many image flaws in relatively low-res online images; you can’t when you put them on paper. I continue to believe that printing images should be a final step in preparing your best images to be as good as they can be. If you can’t make them things you proudly put on your wall, they aren’t very good images.

One place I’ve put a fair amount of time and effort this year is Google+. When they released the capability to build communities, I founded the Bird Photography group. It has grown beyond 5,500 members and continues to grow, and it’s generating a lot of really high quality imagery from some amazingly talented photographers around the world. Along with that I manage the Bird Photography Today page, which can best be described as the TL;DR version of the Bird Photography group. Both of them have a lot of potential, and the other moderators and I have been talking about how to manage the growth and keep it an interesting and useful environment for both experienced and newer bird photographers. If this is of interest to you, I encourage you to stop by and get involved, because I think 2014 is going to have some interesting new things added to the community.

I entered 2013 searching for some answers, a direction to take, and some projects to make. I leave 2013 with the foundations in place, the projects started, and a vision for where I want to take my writing, my photography and my life. That sounds to me like a pretty good year overall… So we’ll see you in 2014 and we’ll see how it comes together, together.

Posted in About Chuq

Birding in 2013

It looks like my birding year is more or less done; I don’t expect to get out and do any significant birding until the new year. My year list closes at 191 species, the highest since 2010. That’s a small number compared to many birders (it looks like the big year record has just been broken at an unconfirmed 746 species), but it’s about where I expect my birding to end up.

I’m not a birder who chases rarities (twitching) or goes out of my way to build the list count. I tend to return to a smaller number of locations because I enjoy watching how they change over the seasons. To me, birding is a recreation and a relaxation, not a competition, so my lists are there just to see what I did, not something to drive what I ought to be doing. There’s no right or wrong way to bird; it’s whatever makes you happy. My preference is to go out and relax and have a good time and see what happens.

That said, you can gain understanding from looking at the information in the lists, and they can help you plan for future trips. Since I don’t travel much, most of my birding is local to home: Santa Clara (102 species), Merced (94), San Luis Obispo (78), San Mateo (59), and Monterey (48) were the top counties for the year. Interestingly, I never birded in San Benito county (Panoche Valley), primarily because the drought has hurt habitat out there. Still, that’s something I’ll fix before spring. I also don’t bird to my north: San Francisco, Marin, and out into Alameda and Contra Costa. I don’t really know why, either, and those are areas I really ought to be exploring.

My life list for Santa Clara County is now 206 with 5 new birds: Phainopepla, Yellow-bellied Sapsucker, Horned Lark, Wrentit (finally), and Lazuli Bunting. eBird shows 282 species for the year in the county, so at 102 species I only found 36% of the county birds this year (the top birder for the county hit 262 species). My rank in the county is 64.

It was a good year for seeing new species. I was able to add 12 new species this year: Pacific Wren (although it may have been a Winter Wren; either would be a lifer), Bell’s Sparrow, Mountain Plover (finally; they’ve been hiding from me for four years now), Yellow-bellied Sapsucker, Swainson’s Hawk, Black-chinned Hummingbird, Lazuli Bunting, Pileated Woodpecker, Williamson’s Sapsucker, Black-capped Chickadee, Swainson’s Thrush, and, while sitting in Mom’s back yard after Christmas, Nutmeg Mannikin, when a small flock wandered by to say hi.

The Mannikins were an interesting find, since I wasn’t looking for them (or anything). I noticed what I took for a flock of Bushtits had wandered into the yard and were flitting back and forth among the trees bugging, only to realize the flock was silent. Bushtits are many things, but quiet isn’t one of them. Out popped the binoculars, and one last bird got added to the lists… Just a reminder that so much of birding is about behaviors and sounds, not how the birds look…

In 2014? Probably more of the same, although I expect the Yellowstone trip I’m planning will give me some species outside my normal range. It’s all about fun, not about numbers… At least it is for me.


Posted in Birdwatching

Apple Can’t Ban “Rate This App” Dialogs

Apple Can’t Ban “Rate This App” Dialogs – Marco.org:

We could all rate these apps lower as a form of protest, but it’s unlikely to have a meaningful impact. The App Store is a big place.

We could vote with our feet and delete any app that interrupts us with these, but we won’t. Are you really going to delete Instagram and stop using it? Yeah, exactly.

We’re stuck with these annoying dialogs. All we can really do is avoid using them ourselves and stigmatize them as akin to spam, popup ads, and telemarketing — techniques only used by the greedy, desperate, shameless, and disrespectful.

Sure they can… This is an engineering problem. It’s a classic biggish-data problem.

Attach a “report this as inappropriate” button to the things you want to police. Any time someone reports something, it’s stuffed into the database as a record (thing, datestamp, userID that reported it).

Now, hire a small team. Their job: manage the biggest problems, where we can define “biggest” in various ways, but typically as a combination of the number of reports and the velocity of those reports (how quickly they come in; six reports over a month is a lot less urgent than six reports in an hour). Their job, to start, is to evaluate reports and either validate them or reject them. If a report is validated, then whatever we decide is appropriate happens (the ‘thing’ is removed from view, the developer gets a yap letter, the developer loses privileges, etc.). If they reject the report, then nothing happens.
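To make the “number of reports plus velocity” idea concrete, here’s a minimal sketch of how a review queue might be ordered. Everything in it is an assumption for illustration: the Report fields mirror the (thing, datestamp, userID) record, and the one-hour window and the weight of 10 are arbitrary numbers, not tuned ones.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Report:
    thing_id: str          # the item being reported (app, review, post, ...)
    reporter_id: str       # the user who filed the report
    reported_at: datetime  # when the report came in


def urgency(reports, now, window=timedelta(hours=1), velocity_weight=10.0):
    """Score one thing: total report count plus a bonus for recent reports.

    The window and weight are illustrative guesses, not recommended values.
    """
    recent = sum(1 for r in reports if now - r.reported_at <= window)
    return len(reports) + velocity_weight * recent


def review_queue(all_reports, now):
    """Return thing_ids ordered from most urgent to least urgent."""
    by_thing = defaultdict(list)
    for report in all_reports:
        by_thing[report.thing_id].append(report)
    return sorted(by_thing, key=lambda t: urgency(by_thing[t], now), reverse=True)
```

With this scoring, six reports spread over a month land well below six reports that arrived in the last hour, which is the ordering the review team wants to see.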

Sort of.

What you want to build here is a reputation system. Every time this team validates a report, everyone who made that report gets their reputation value incremented. Every time a report is rejected, those who reported it get their reputation value decremented. Over time, you’ll build a data set that will tell you how reliably a person giving a report is in sync with the standards of those judging the reports.

You can use that data to build automation into the evaluation process. As someone’s reputation value goes up, we’re creating a trust metric that those reports are valuable and accurate, so those reports get bumped up the queue into the evaluation team. As someone’s reputation value goes down, you de-prioritize those reports, and at some point, you simply throw them out: Once someone’s proven themselves to be reliably inaccurate about reporting, you simply filter them out of the system (this will, as a side effect, do a good job of neutralizing the trolls that use the abuse reporting system as an attack vector; that’s something Facebook is amazingly bad at dealing with…)
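A rough sketch of that bookkeeping, assuming a simple integer score per reporter. The class name, the plus or minus one step, and the cutoff of -5 are all hypothetical choices made for the example; a real system would pick its own.

```python
from collections import defaultdict

DISCARD_AT = -5  # hypothetical cutoff: reporters at or below this are ignored


class ReputationBook:
    def __init__(self):
        self.scores = defaultdict(int)  # reporter_id -> reputation value

    def record_decision(self, reporter_ids, validated):
        """Apply one review-team decision to everyone who reported the item."""
        delta = 1 if validated else -1
        for reporter_id in set(reporter_ids):
            self.scores[reporter_id] += delta

    def is_ignored(self, reporter_id):
        """True once a reporter has proven reliably inaccurate."""
        return self.scores.get(reporter_id, 0) <= DISCARD_AT

    def usable_reporters(self, reporter_ids):
        """Filter out reports from ignored reporters, keeping input order."""
        return [r for r in reporter_ids if not self.is_ignored(r)]
```

New reporters start at zero, so they are neither trusted nor filtered until the review team has judged some of their reports.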

You can take this to the next level. Once someone gets to a certain level of reliability, you can trust their reports. There will be a smallish set of reporters where you can assume those reports are correct and act on them without intervention by the review team. These reporters become an extension of the team in effect. 

Expand this reputation management one level further: anyone who reports a violation that one of these “extended team” reporters reports gets their reputation extended as well; you can go the other way as well — anyone who flags as a violation material that the known “troll team” anti-reputation group flags gets their reputation dropped.
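Here’s one way that “reputation by association” step could look, as a sketch only. The thresholds (20 for the extended team, -5 for the troll group) and the one-point nudge are assumptions picked for the example.

```python
def propagate(scores, reporters_of_item, trusted_at=20, troll_at=-5):
    """Return updated scores after one item's reports are compared.

    scores: dict of reporter_id -> reputation value.
    Ordinary reporters gain a point for agreeing with a trusted reporter
    and lose a point for agreeing with a known troll.
    """
    has_trusted = any(scores.get(r, 0) >= trusted_at for r in reporters_of_item)
    has_troll = any(scores.get(r, 0) <= troll_at for r in reporters_of_item)
    updated = dict(scores)
    for reporter in set(reporters_of_item):
        current = scores.get(reporter, 0)
        if current >= trusted_at or current <= troll_at:
            continue  # the anchors themselves are not adjusted here
        bonus = 1 if has_trusted else 0
        penalty = 1 if has_troll else 0
        updated[reporter] = current + bonus - penalty
    return updated
```

The function returns a new dict rather than changing the input, which makes it easy to inspect what a single item did to everyone’s standing.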

What this will do, over time, is create a reputation metric for every user reporting violations in the system. The highest rated users can be trusted implicitly and their reports are acted upon automatically. The reports in the next group down are prioritized for the evaluation team by the likelihood that each is a valid report (based on the combination of the number of reports, the velocity at which those reports come in, and the consensus reputation of those doing the reporting).
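Putting the tiers together, a triage step might look something like the sketch below. The thresholds and the way count, velocity and consensus reputation are blended are illustrative assumptions; the point is only the three-way routing.

```python
def triage(reporter_scores, reports_last_hour, trusted_at=20, ignored_at=-5):
    """Decide what happens to one reported item.

    reporter_scores: reputation value of each user who reported the item.
    Returns ("auto_act", None), ("discard", None) or ("review", priority).
    """
    usable = [s for s in reporter_scores if s > ignored_at]
    if not usable:
        return ("discard", None)      # only known-unreliable reporters
    if any(s >= trusted_at for s in usable):
        return ("auto_act", None)     # an implicitly trusted reporter flagged it
    consensus = sum(usable) / len(usable)  # average reputation of reporters
    # Arbitrary illustrative blend of count, velocity and consensus.
    priority = len(usable) + 10 * reports_last_hour + consensus
    return ("review", priority)
```

Items routed to "review" would then be sorted by priority so the human team always works the most likely and most urgent problems first.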

The trolls will tend to report way out of sync with the mainstream of the community, and as they get identified, their actions will allow you to identify the clique they’re working in and over time they’ll trash the reputations within that clique and all of that data will get minimized or ignored, effectively neutering them. 

There’s no need for the in-person evaluation team to scale massively, it needs to be big enough to manage the important problems, but more importantly, they need to be able to understand and consistently implement the policies, because what they’re doing isn’t necessarily policing, but identifying the extended team that will be doing the policing; this system depends on reputations being built over time based on appropriate implementation of the policies.

Over time, with a good database and some number crunching, you can create a policing system that is a combination of community-self-policing (because if the community isn’t reporting it, it’s not a problem), and administrative oversight (because reports are initially judged by the owners of the system, and the reputation is built around how well issue-reporters report in sync with the administration policies). What you end up with is a system where the most trusted users are automatically identified and then used to police the system based on the policy decisions made by the core administration team. 

And as a nice side effect, the worst trolls and abusers are neutered, and if you really want to, you can have the system identify them and take them out behind the shed and Old Yeller them out of the community if you want… 

This is a variant of a well-solved problem: email spam. (A flavor of this kind of system has been used by Amazon for years to float the best reviews to the top and the worst reviews out of view; I’m always amazed that companies don’t borrow from them more often.) The problem isn’t that it can’t be done, it’s that the companies involved have decided they can get away with minimal effort and avoid responsibility for policing their communities (Facebook, staring at you big time again) instead of digging in and solving the problem. The recent Twitter kerfuffle with their well-intended (but stupidly thought out) block policy is another example of how the people running these systems don’t see managing these problems as a priority.

When I was at Palm, the number one issue I heard about from developers was abusive, irrelevant and 1-star reviews. This is a big issue because that star rating is, if not the number one deciding factor in buy/not-buy, in the top two or three. A shift in your average rating from 4.8 to 4.5 could kill half your sales.
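To get a feel for how few bad reviews it takes to cause that kind of shift, here’s a small back-of-the-envelope calculation. It assumes the displayed rating is a plain average of all star ratings, which a real store may or may not use, and the function name is made up for the example.

```python
import math


def one_star_reviews_to_reach(review_count, current_avg, target_avg):
    """Smallest number of new 1-star reviews that drags a plain average
    from current_avg down to target_avg or below."""
    if not 1.0 < target_avg <= current_avg:
        raise ValueError("target must be above 1 star and at most the current average")
    # Solve (current_avg * n + 1 * k) / (n + k) <= target_avg for k.
    needed = review_count * (current_avg - target_avg) / (target_avg - 1.0)
    return math.ceil(needed - 1e-9)  # small epsilon guards against float noise
```

Under that assumption, an app with 100 reviews averaging 4.8 needs only 9 one-star reviews to fall to 4.5 or lower, which is why a handful of abusive reviews matters so much to a developer.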

Unfortunately, nobody at Palm cared or wanted to. When they shipped the WebOS App store, in fact, there was no interface to view reviews, much less police them. Two years later, when I left, there was still no interface to deal with them (and no plans to build one) other than some hand-built crap I did on the fly to give me some ability to deal with the worst of it. That hand-built crap involved mysql dumps of the production data, perl scripts to implement blacklists, a web site to let me bring up and delete stuff manually, and then creating a script that the DBA would run against production to implement the changes. Not exactly real time.

Unfortunately, some variation of that “we don’t care what’s screwing over the developers as long as they get their apps into the store” seems to exist on most of these platforms (and I, for one, don’t miss trying to fight that fight much these days). 

This stuff can be fixed, it’s just not a priority. I couldn’t even get the product managers to look at possible fixes at Palm, even though I volunteered to build the damned thing on the side. 

These are all communities, and they’re all social systems. That’s something a lot of organizations don’t recognize. As a result, many times they’re designed and built by people who don’t understand (or use) social systems, and so all of the necessary management and feedback systems aren’t there.

So rule one: don’t let people who don’t grok social systems build social systems. 

That seems like a simple one, but honestly, it’s amazing how often it gets ignored.

In any event, it’s not that Apple can’t fix these problems. It’s that they don’t. There are known ways to manage these problems that will scale without throwing huge staffs at them. 

They’re just not priorities.



Posted in Computers and Technology, The Internet

More than you want to know about backups (the 2013 edition)

I think computer users can be broken down into three camps:

  • Computer users who haven’t had a hard disk fail and haven’t yet figured out they need to back up their systems.
  • Computer users who have had a disk fail but still don’t back up their systems reliably (or at all), even though they know they should.
  • Grouchy old computer geeks who yell at the first two groups because we’re the ones who get that call at 10PM because a disk failed and they need a file back because they’re on deadline and oh my god please help me I don’t have a backup what do I do?

I warn you up front, I am one of that last group. My goal is to convince you to start backing up your computer before it’s too late, because I want those late night on deadline oh my god I’m doomed please help me phone calls to stop. Even though I know it’ll never happen in my lifetime.

A hard drive is a spinning mechanical device with motors and magnets and bearings and a read-write head that flies a tiny fraction of a millimeter above the surface of the platter where the data is stored. It is inevitable this device will fail. Not IF, but WHEN. Newer computers use SSDs, which are solid state devices instead of spinning mechanical ones, but they, too, fail.

That’s the reality: whatever you store your data on is going to fail some day. If you don’t plan for that, bad things will happen. And when bad things happen, you call your geek friend late at night blubbering and crying and asking for help. Neither of us want that.

You can’t prevent the failure, but you can reduce the chances of it happening, and you can back up your data so that if a disk fails, it’s not a big deal, because that data also exists on another hard disk. Or two. Or three. The more the merrier.

This article will help you understand how to reduce the chance of that failure and to limit the pain and damage when it happens.

The Best Backup is Never Needing your Backup

The best and most reliable backup is never needing to recover data from your backup. You can never guarantee that a drive will never fail — but you can reduce the chances of it happening.

How? Simple: replace your drives before they fail. Backblaze is a company that will back up your data over the internet to their servers. They have lots of data on lots (and lots) of hard drives, and it’s their job for that data to never be missing. They’ve got lots of experience with failing hard drives and how long it takes for one to fail, and they’ve been nice enough to provide the data. If you’re interested in the details, read their study. The executive summary is that after a hard drive is three years old, the failure rate starts to rise rapidly. So the first thing you can do to reduce the chance of a hard drive failing on you is retiring it and replacing it with a new one before it gets to be four years old.

I take this one step further: if you have a laptop that you carry around, that laptop tends to get bounced and jostled. Inside that laptop is a hard drive, which is also getting jostled and bounced around. My experience is that laptop hard drives have a tendency to die younger than hard drives in machines that don’t move around, so if you have a laptop, you really want to replace that hard drive earlier.

My hard drive policy is simple:

  • Any hard drive I use as a working drive (attached to a computer and powered up for use on a daily basis) is replaced when it is between two and three years old.
  • Any hard drive installed inside a laptop is replaced earlier: between 18 months and two years.

That doesn’t mean their useful life is over: the drives I used as my day to day drives get turned into backup drives (unless they’re too small). They’re used as backups until they’re around four years old, and then they’re retired.
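The policy boils down to a small lookup by age. Here’s a sketch of it, with one assumption worth flagging: it uses the upper end of each range (three years for a working drive, two for a laptop drive) as the hard deadline, and the function and role names are just labels for the example.

```python
def drive_role(age_years, in_laptop=False):
    """Suggest a role for a spinning hard drive of the given age.

    Uses the upper end of each replacement range as the deadline:
    3 years for a desktop working drive, 2 years for a laptop drive.
    """
    working_limit = 2.0 if in_laptop else 3.0
    if age_years < working_limit:
        return "working"
    if age_years < 4.0:
        return "backup"   # demoted to lower-stress backup duty
    return "retire"


# Example: a 2.5-year-old laptop drive should already be on backup duty.
print(drive_role(2.5, in_laptop=True))
```

If you’d rather be conservative, swap in the lower end of each range (2 years and 18 months) and the rest of the logic stays the same.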

Backup drives tend to be powered off a lot more, their usage is much lower, and you don’t put them under stress. That reduced stress means they’re less likely to fail. You use a drive hard when it’s new, give it a reduced role as it ages, and retire it before it hits that point in time where failure becomes likely.

If you do that, you will rarely have a drive fail on you. It costs a little money, but the cost of a new laptop drive these days is under $100, so it’s not that expensive. It’s a lot less expensive than the time and stress of recovering from a failure, that’s for sure.

A note on SSDs: As SSDs (solid state drives, with no moving parts) mature, they’re rapidly replacing spinning drives for data storage. The failure tendencies of SSDs are a lot different than for hard disks, and they can differ a lot from one manufacturer to another. So what should you do about replacing aging SSDs? I don’t know yet. My current (tentative) plan is to let the SSD in my laptop go for three years and then replace and retire it rather than make it a backup drive, but that’s subject to change once I do more research. I still think the 3 years and out concept works for them, but I don’t think you need to be as aggressive moving them out of a high use mode.

A note on Hybrid Drives: Apple and some other companies are shipping computers with what they call a hybrid drive, which is both a hard disk and an SSD merged together. My view right now is that you treat them like hard drives and replace them like one, but I haven’t looked into the real-world failure tendencies of them yet.

Setting up backups

Even if you never have a hard drive fail, you still need backups. There are many ways for your data to disappear other than a drive failure: your house or office could burn down. Your computer could fail and scribble Shakespeare’s Sonnets all over your disks and data. You could be sitting in Starbucks and watch as someone grabs your laptop and runs out the door. You could drop your laptop (yes, I know, that never happens, right?). There are many bad things that can happen to your data.

The only way to protect yourself from these bad things is to keep multiple copies of your data. And since a fire that burns down your house may destroy everything inside it, not just your computer, you need to keep those copies in multiple places. This can turn into a hassle quickly, and one reality of backups is that the more hassle they are to do, the less likely it is you’re going to do them. So we need to keep doing and managing backups as simple as possible (but not too simple to be useful).

The basic goal of your backups is therefore to have at least three copies of your data, and have those copies exist in two independent locations.

My basic setup: back up data to a separate disk on a regular basis, and then swap that drive with one kept at an offsite location once a month. This gives you three copies of your data: on your computer, on your backup drive, and on your offsite drive. It minimizes cost, because you only need two backup drives that you swap. It limits the hassle factor, because as long as your backups are run automatically, you only need to intervene once a month to swap drives and take the updated one off-site.
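If you want to sanity-check your own setup against the “three copies, two locations” goal, a tiny check like this does it. The labels and locations are made-up examples; only the counting rule comes from the goal above.

```python
def meets_backup_goal(copies):
    """copies: list of (label, location) pairs, one per copy of the data.

    True when there are at least three copies in at least two locations.
    """
    locations = {location for _label, location in copies}
    return len(copies) >= 3 and len(locations) >= 2


# The basic setup: computer and backup drive at home, swapped drive elsewhere.
basic_setup = [
    ("computer", "home"),
    ("backup drive", "home"),
    ("offsite drive", "office"),
]
print(meets_backup_goal(basic_setup))
```

Three drives sitting on the same desk fail the check, which is exactly the house-fire case the offsite copy exists for.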

One of the tradeoffs: not all of your data will be in all three places; your newest data won’t get out to the offsite until you swap disks at the end of the month. Remember, though, that the offsite backup is there to recover from catastrophic disasters (house burned down! oops!); the compromise between reduced hassle of constantly swapping that drive and losing some data in that situation is a reasonable one; in reality, you are unlikely to ever need that catastrophic backup. But if you do, you’ll be glad it’s there.

That said, it never hurts to have more copies of your data. You can do this in a number of ways. Using an online backup service is one: our friends at Backblaze, for instance, or Crashplan. There are other companies doing this as well. The downside is that these services use your internet connection and that connection can be slow; if you have a lot of data, it can take a long time to upload it to the remote backup server, and if your drive fails before the data is backed up, you’re hosed. That’s one reason why I like to use these services as a supplemental backup and not a primary one.

Some ISPs put data caps on your internet connection. If yours does, doing an online backup could cause you to use more data than the cap allows, and you can find your network throttled to a really slow speed or turned off completely. Before you go online, you need to understand how big the data set you want to back up is, how long it will take to upload, how long it might take to recover if you need to, and whether you have a data cap to worry about. I generally recommend that people consider using these online services to back up the important data, but not everything.
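Those questions are easy to answer with a little arithmetic. The sketch below is a rough estimate only: it assumes the upload runs continuously at your full upstream speed with no protocol overhead, uses decimal units (1 GB = 8,000 megabits), and the function names are invented for the example.

```python
import math


def upload_days(data_gb, upload_mbps):
    """Days of continuous uploading needed for data_gb at upload_mbps."""
    megabits = data_gb * 8 * 1000      # decimal units: 1 GB = 8000 megabits
    seconds = megabits / upload_mbps
    return seconds / 86400             # seconds per day


def months_under_cap(data_gb, monthly_cap_gb, other_usage_gb=0):
    """Months needed to upload data_gb without going over a monthly cap."""
    budget = monthly_cap_gb - other_usage_gb
    if budget <= 0:
        raise ValueError("no headroom left under the cap for backups")
    return math.ceil(data_gb / budget)
```

As an example, a terabyte of photos on a 5 Mbps upstream connection works out to roughly 18 and a half days of nonstop uploading, and restoring it later is bounded the same way by your download speed.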

Another online option is a service like Dropbox or box.net or Google Drive. These services turn a part of your hard drive into a virtual folder that gets copied onto their servers, and then copied down to any other computer that you set up to share that virtual folder. This can be quite useful if you use multiple computers at different times, but it can also act as a kind of backup because the data gets copied to multiple places. It’s not something you should use as your primary backup, and like the other online backup services, slow network connections and data caps may impact its usefulness.

These are all ways to create multiple copies of your important data in relatively painless ways that you don’t need to spend time managing.

How to back up your data

This section assumes you’re using a Macintosh. If you aren’t, there are other equivalent tools you can use to back up your computer, but I’m not the person to tell you which one to use.

Backing up a Macintosh can actually be very simple: use Time Machine. For a lot of people, this will work quite well, and it’s free with all copies of Mac OS X. I use Time Machine for part of my backup system because I like its incremental backups, which let you go back and find a file and its data as of a given time.

Time Machine’s big weakness is large data sets. Because it’s doing incremental backups, it is going to want a backup drive larger than the amount of data you have created. I’ve found that it works best when the backup drive is at least 2X the data being backed up, and I prefer 3X. This means if you have, say, a 500GB boot drive in a laptop and a firewire drive with 1.2 Terabytes on it, your total data set is 1.7 Terabytes. Time Machine is going to struggle keeping that backed up on a 2 Terabyte drive, so you really need 3TB for your backup at a minimum. If you update large parts of your stored data, you can really give it indigestion (for instance: take 1000 photos in Adobe Lightroom, assign a new keyword to each, and make sure the updated metadata is flushed to the DNG with an embedded XML sidecar. You just created a 60-70 gigabyte backup). The larger the data set, the larger the disk Time Machine needs to back it up and work efficiently, and as your data set continues to grow, this is going to be a challenge.
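The 2X and 3X rule of thumb is simple enough to put in a helper. This is just the arithmetic from the example above; the function name is made up, and the multiplier is my rule of thumb, not anything Time Machine itself enforces.

```python
def time_machine_drive_tb(volumes_tb, factor=2.0):
    """Suggested backup drive size in TB for the given source volumes.

    factor is the rule-of-thumb multiplier: 2.0 as a floor, 3.0 preferred.
    """
    return sum(volumes_tb) * factor


volumes = [0.5, 1.2]  # 500GB boot drive plus a 1.2 TB external drive
print(round(time_machine_drive_tb(volumes, factor=2.0), 1))  # floor
print(round(time_machine_drive_tb(volumes, factor=3.0), 1))  # preferred
```

Applied strictly, the rule gives about 3.4 TB at 2X and about 5.1 TB at 3X for that 1.7 TB data set, so a 3 TB drive really is the bare minimum and the next size up is the more comfortable choice.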

I am not a big fan of Time Machine to recover a failed disk. I’ve done it, and sometimes it works fine, and sometimes it’s fought me and taken forever to get the data restored. Apple’s done a lot of work improving Time Machine since the early days of Mac OS X so a lot of my reservations about it aren’t true if you’re running Snow Leopard or Mavericks — but I still prefer to have a way to recover an entire disk as well.

For that I use Superduper. This tool makes an exact clone of a disk, one that you can plug into a computer and use without any work; even boot the computer from it. I use it to make bootable copies of my computer’s main drives; so if I lose one, I can clone a copy quickly, or just boot the backup drive and get back to work. And it creates another copy of my data for me (never a bad thing).

Do you need this? How badly do you want to protect your data? How quickly do you want to recover from a drive failure and get back to work? How many hard drives are you willing to buy and manage? If your data is really worth the effort, it’s a good way to create a reliable and quick-to-recover copy of it — but it does entail more time, energy and money. Whether it’s worth it to you is a decision you’ll have to make. It’s worth it to me.

I am not a fan of Apple’s Time Capsule for backups. It’s very simple, but offsite backups are effectively impossible. Recovering a failed drive from it takes time, and it’s hard (to impossible) to replace the drive as it ages. I want the ability to upgrade my WIFI router separately from my backup drives. And Time Capsule is not a good solution if the number of computers to be backed up is two or greater. I do use one in one specific situation: my mother’s house with my mother’s Mac, where absolute simplicity is the prime directive. If your needs are simple and you’re willing to forgo offsite copies of your backup, it’ll do the job, but I think for most uses, it’s not the right solution.

What if I have big data sets?

As your data set grows, it gets more complicated. As the number of computers you need to back up grows, it gets more complicated. As it gets more complicated you’ll need to spend more time (and money) making sure you have good reliable backups and that the backups work. If you’re a serious photographer or a videographer, you’ve probably stopped thinking about gigabytes and now think about terabytes.

You can keep plugging disks into your computer to store all of that data, but that’s expensive, unwieldy, and backing them up is a horror (so chances are, you’ll stop and pray nothing bad happens). That’s a disaster waiting to happen. So at some point, you need to start thinking about disk subsystems, or network-based disks, or some other setup designed to handle large sets of data.

I’ve recently hit that point, and my choice was to go to a NAS, or a Network Attached Storage device. I talk about that in some detail in Should you consider upgrading your home network to a NAS?

Is this an option you need to consider? Here are my general guidelines:

If you’re managing a single computer, a NAS probably doesn’t buy you much until your data set starts growing past 4 Terabytes. At that point, you’re talking about plugging in multiple drives and multiple backup drives, and things start getting complex; the NAS will make your life easier and you’ll end up buying less hardware over time. If you’re someone who wanders the house or office with a laptop on wireless, a NAS starts making sense sooner because your data can live on the network and you don’t need to plug in to work on that project as often.

If you’re in a multi-computer environment, managing your data and keeping your backups going reliably is going to get harder and harder. The NAS helps a lot with that, so you should consider it. I think a good general metric is that when you hit 2-3 computers and the total data you have to manage hits around 4 Terabytes, it’s a good time, and cost effective, to start considering a NAS. If your data requirements are small, you may not need one, but if you’re a photographer or videographer, your data requirements aren’t small any more.

Once you hit 5-6 computers in the installation, the advantages of centralized online backups to the NAS seem to be overwhelming. You’re an idiot to not consider it. IMHO.
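
The guidelines above can be boiled down to a few lines of Python. This is only a rough sketch: the function name, the labels it returns, and the hard cutoffs (4 Terabytes, 5 computers) are assumptions made for illustration, and the real decision is fuzzier than any if-statement.

```python
def nas_advice(computers: int, total_data_tb: float, roaming_laptop: bool = False) -> str:
    """Rough rule of thumb for whether a NAS is worth considering.

    The cutoffs are simplifications of the guidelines in the text:
    - 5 or more computers: centralized backups win outright.
    - Around 4 Terabytes of data or more: worth considering, one computer or several.
    - A laptop that roams on wireless: a NAS can make sense sooner.
    """
    if computers >= 5:
        return "yes"
    if total_data_tb >= 4:
        return "consider"
    if roaming_laptop:
        return "maybe"
    return "not yet"


# A few hypothetical households.
print(nas_advice(computers=1, total_data_tb=1.7))
print(nas_advice(computers=3, total_data_tb=4.5))
print(nas_advice(computers=6, total_data_tb=1.0))
```

The checks run from the strongest signal to the weakest, so a six-computer office gets a "yes" even if its data set is small.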

If you’re in a single-computer setup, another option is a dedicated disk array that connects via Thunderbolt or Firewire, like the Drobo. I personally think the NAS is a better option and most of the time will be less expensive and more flexible, because I like the ability to connect to it over WIFI if I grab the laptop and wander around the house. The direct-connect systems like Drobos, on the other hand, will win on pure performance, so if you need absolute max performance, they’re your better option.

My backup strategy

I’m going to close out by documenting my current backup strategy. Not everyone is going to want to implement all of this, but I want people to see what I do and understand why, and have the ability to adopt the pieces that make sense. My data situation is moderately large and I expect that growth to accelerate. We’re a three-computer family, two of us are photographers, and I’m starting to work with some video. My photo collection is well past 30,000 images, and my wife’s is 20,000+. So we have a big hunk o’ data.

I’ve just migrated to using a NAS, and I no longer have a second (or third, or fourth) drive attached to my computer. I have the boot drive on the laptop, which is a 500GB SSD and everything else lives on the NAS.

My wife keeps her data on a mirrored RAID drive (in part because she hasn’t had time to sort out what should get moved to the NAS). All three computers are backed up to the NAS via Time Machine. The NAS has a backup capability, so I back up all of the data onto two external drives, and it’s those drives that get swapped offsite monthly.

Here’s a diagram that shows everything involving data on the home network


Here’s what’s going on:

  • Each machine uses Time Machine to back itself up to the NAS. Each machine has its own partition with a quota set on it, because otherwise, Time Machine will grow the backup to infinite size. The quotas are around 3X the size of the backed up data.
  • Each machine uses Superduper to update a disk image on the NAS, kept in the data volume as a Sparse Bundle. I can load that onto a drive if I ever need to do a recovery.
  • Each machine has access to a personal data volume and a shared data volume we both use. I have my iTunes library out there and shared, and I keep a morgue, which is data I hang on to but won’t die without, so it doesn’t need to be backed up (I currently do back it up, but as my data set grows, I’ll stop that).
  • The NAS backs up to two disks. I don’t need two today, but this gives me breathing room so I don’t have to update this for a while. A second pair of disks lives offsite and is swapped by sneakernet monthly. (For what it’s worth, a full backup of the NAS currently takes about 3 days.)
  • I have two other disks hanging off my MacBook; these are my travel disks. One is a 500GB drive that is bus powered (no need for external power); I use that to clone my laptop drive every night when I’m on the road, and I plug it in once every week or two and update the clone via Superduper, so there is one more copy of that data hanging around. The other travel disk is a 500GB mirrored RAID that’s bus powered, and I use that to store data on longer trips when the size of my created data is larger than my internal drive can handle. With photos and video, that’s not hard… Both of these drives are from Other World Computing and built like tanks.
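
The 3X quota rule from the first bullet is easy to work out ahead of time. Here is a small Python sketch; the machine names and the two sizes other than the 500GB laptop are made up for illustration, and the quota itself is something you would set in your NAS’s own admin tools, which this code does not touch.

```python
def time_machine_quotas(data_gb_by_machine, factor=3):
    """Return a per-machine Time Machine quota in GB, at `factor` times the data size."""
    return {name: size_gb * factor for name, size_gb in data_gb_by_machine.items()}


# Hypothetical three-computer household; only the 500GB laptop comes from the text.
machines_gb = {"laptop": 500, "desktop": 750, "second_laptop": 250}

quotas = time_machine_quotas(machines_gb)
for name, quota_gb in quotas.items():
    print(f"{name}: {quota_gb} GB quota")

# The sum is how much NAS space to set aside for Time Machine alone.
print(f"Total reserved for Time Machine: {sum(quotas.values())} GB")
```

Adding up the quotas before you buy disks tells you how much of the NAS the Time Machine partitions will eat, separate from the data volumes.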

What this means is that once it gets copied offsite, all of my data lives in at least three places (NAS, backup, offsite backup). 24 hours after creation, it’s in two places at the minimum. Any data that lives on my laptop drive ends up with at least five copies, and I also use Dropbox for some data, which makes even more copies including at least two computers at work…

That’s my comfort level for trying to prevent data loss. Do you need to do all this? Depends; how bad would it be to lose your data? Choose the pieces that get you to your comfort level. The really good news is that once this is all set up and running, it takes almost no time to keep going; other than swapping the backup disks (which takes up an evening, roughly) on the NAS, it’s all automated.

Setting it up takes time; getting fully running on the NAS took me two and a half weeks. And it takes some money to invest in the gear you need to add to get things going. But those are investments in not having that freak-out panic attack later when a disk fails.

And you’ll sleep better at night. I know I do.

How comfortable are you at the thought that someone just grabbed your computer and ran out the front door of that Starbucks you’re sitting in? Will your backups protect you? If not, you have some work to do…

