Silent email filtering makes iCloud an unreliable option

Posted on Mar 2, 2013

Silent email filtering makes iCloud an unreliable option | Macworld:

Last November, our friends at Infoworld reported that Apple’s iCloud email system silently blocks emails containing certain phrases. And that hasn’t changed in the intervening months, as Macworld UK reports. Granted, the phrases in question may not be the kind that you’re likely to exchange with your correspondents. Through our own rigorous testing, we’ve managed to confirm that emails containing the phrase “barely legal teen” are simply never delivered to iCloud inboxes. In fact, we found that even emails with the offending phrase contained in an attached PDF—even a zipped PDF—were blocked. Even if you, like us, would almost never receive a legitimate email with such a phrase, this could still be problematic.

Back in the day when I was designing and building the original (oh my god…. see note below), one of the things I wanted to do was keep those occasional disagreements from flaring up into full-fledged flamefests (this is, of course, still one of the holy grails of community management). I decided to see if we could catch them as they escalated by adding a “PG-13” filter to the incoming email, the idea being that once the language escalated into profanities, things were probably getting out of hand. The hope was that if users got their nasty words bounced back, it’d make them back off and think twice. Or at least give the admins some warning and time to wander in, see what was going on, and intercede.

The filter was a set of pretty simple regex checks, looking primarily for the “seven deadlies”. And it worked pretty well, except when it didn’t.

I soon got to know a great Mac programmer by the name of Igor Livshits. We had a number of great conversations about the strengths and weaknesses of simplistic pattern matching in spam filtering. I started tweaking the filters so that Igor could actually use the mailing lists again (you DO see the problem, right?), and spent the next few months testing and tweaking and tuning. And ultimately, I removed all filters except for the Big One, because there were just too many false positives.
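For anyone who doesn't see the problem, here is a minimal sketch of it in Python. This is not the filter I actually ran; the word list, the function names, and the word-boundary variant are all assumptions made up for illustration. The naive pattern matches a blocked word anywhere, including inside someone's surname, while the bounded one only matches it as a whole word.

```python
import re

# Hypothetical stand-in word list; the real filter's list is not reproduced here.
BLOCKED_WORDS = ["shit", "damn"]

_alternatives = "|".join(re.escape(w) for w in BLOCKED_WORDS)

# Naive check: matches a blocked word anywhere, even inside a longer word.
NAIVE = re.compile(_alternatives, re.IGNORECASE)

# Slightly less naive: only matches blocked words that stand alone.
BOUNDED = re.compile(r"\b(?:" + _alternatives + r")\b", re.IGNORECASE)


def naive_flags(text: str) -> bool:
    """Return True if the substring-style filter would bounce this text."""
    return NAIVE.search(text) is not None


def bounded_flags(text: str) -> bool:
    """Return True if the whole-word filter would bounce this text."""
    return BOUNDED.search(text) is not None


if __name__ == "__main__":
    header = "From: Igor Livshits"
    print(naive_flags(header))    # the surname trips the substring match
    print(bounded_flags(header))  # the whole-word match lets it through
```

Word boundaries fix this particular false positive, but they are easy to evade with creative spelling, and every loosening to catch the evasions brings new false positives with it. That is the treadmill I ended up on.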

And that’s the problem. Users hate spam, and want it to go away. Until their email starts disappearing or being rejected by over-aggressive filters. And then everyone learns that the only thing worse than spam is a false positive. So if there are any questions about legitimacy, the email needs to be let through, and honestly, reputation systems have really solved this problem to a couple of decimal points.

So filters like this seem like a good idea, but if they start trapping real email, they need to be turned off. And blackholing emails makes it even worse. Yes, it’s a hassle and a resource suck to reject spam and return it as bounced email, but if you don’t, you lose any chance of a feedback loop to let you know when your system is throwing these false positives. And that’s bad.
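To make the difference concrete, here is a rough sketch of the two behaviors, written as a toy function and not as a real mail server. The function name, the `blackhole` flag, and the `stats` counter are invented for this example; the 250 and 550 reply codes are standard SMTP, and the reply text is only illustrative.

```python
from collections import Counter
from typing import Tuple

# Hypothetical in-memory counters, standing in for real logging or metrics.
stats = Counter()


def smtp_reply(flagged: bool, blackhole: bool = False) -> Tuple[int, str]:
    """Return the (code, text) a server might send once it has the message."""
    if not flagged:
        stats["accepted"] += 1
        return 250, "2.0.0 OK: queued"
    if blackhole:
        # The sender is told everything is fine and the message is dropped,
        # so neither side ever finds out about a false positive.
        stats["blackholed"] += 1
        return 250, "2.0.0 OK: queued"
    # Rejecting makes the sending server produce a bounce, which is the
    # signal a legitimate sender needs in order to complain.
    stats["rejected"] += 1
    return 550, "5.7.1 Message rejected by content filter"


if __name__ == "__main__":
    print(smtp_reply(flagged=True))                  # sender sees the rejection
    print(smtp_reply(flagged=True, blackhole=True))  # sender sees success
```

From the sender's side, the blackholed message is indistinguishable from a delivered one, which is exactly why nobody reports it.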

And the bottom line? Be really, really careful building systems where there aren’t good metrics on accuracy and feedback loops that can tell you if the system is misbehaving. Even if this filter is 99% effective in trapping spam, blackholing that other 1% is a really bad thing because it impacts the reputation of your entire service. And since you don’t have feedback loops in place, you don’t know, until way too late…
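As a sketch of what "good metrics on accuracy" could look like, the function below turns four counts into the usual rates. The function name and the counts in the usage example are invented for illustration, not measurements from any real system, and getting the `legit_blocked` count at all depends on having the feedback loop in the first place.

```python
def filter_metrics(spam_caught: int, spam_missed: int,
                   legit_blocked: int, legit_delivered: int) -> dict:
    """Compute basic accuracy metrics for a spam filter from raw counts."""

    def ratio(part: int, whole: int) -> float:
        # Avoid dividing by zero when a category has no messages yet.
        return part / whole if whole else 0.0

    return {
        # Share of all spam that the filter trapped.
        "recall": ratio(spam_caught, spam_caught + spam_missed),
        # Share of all legitimate mail that the filter wrongly blocked.
        "false_positive_rate": ratio(legit_blocked,
                                     legit_blocked + legit_delivered),
        # Share of everything blocked that really was spam.
        "precision": ratio(spam_caught, spam_caught + legit_blocked),
    }


if __name__ == "__main__":
    # Made-up counts: 1,000 spam messages and 5,000 legitimate ones.
    for name, value in filter_metrics(990, 10, 50, 4950).items():
        print(f"{name}: {value:.3f}")
```

With these made-up numbers the filter traps 99% of spam and still eats 50 real messages, and if those 50 are blackholed, the `legit_blocked` count is the one number you never get to see.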

(note below: taking a look at for the first time in many years, I see it’s still basically the setup I built and handed off, including using Mailman 2.x. Part of that is sad, because the reality is that email systems simply haven’t been innovating much over the last 15 years or so. But mostly I think this is neat, because it’s rare and awesome to see a system you built still humming away years later, where nobody saw any big urgency to rearchitect it or throw it out and replace it. When stuff just works, that’s the best result you can hope for…)