New and interesting uses for webmail…

For the last couple of weeks people at work have heard me muttering in the halls about “those damn geeks”. I’ve been chasing down and cleaning up after a group that’s been using our webmail system as a distribution network for… stuff. Mostly warez cracks and video, from what I can tell.

Since this seems to be fairly widespread, and has been flying under the radar at most of the sites I’ve talked to, I thought I’d give it some wider visibility and go into some of the details.

I want to emphasize this part:

Let me say right up front: no system cracking involved here, no security issues, no hacks, no cracks, no leaks, no bugs. They are simply using these systems as designed, not doing anything to penetrate or compromise the system.

Nothing was hacked in any way. This is purely (in its way) a social engineering hack that takes advantage of free webmail sites all around the internet; my investigation turned up at least 15 of them.

I’d noticed some changes in network usage on the site over the previous couple of months: bandwidth usage had doubled in both May and June, far beyond what I’d expect given the growth in new users we’re seeing. It didn’t seem too serious, though, so I filed it in the back of my head to investigate at some point.

Early July hits and I look at the numbers again, and in the first 7 days of July we’ve used 10X the network bandwidth we used in all of June. We’re talking an orders-of-magnitude change, for no good reason.
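
The “10X” comparison actually understates the jump, because it sets seven days of July against all thirty days of June. A quick per-day normalization (a sketch using only the ratio quoted above; June’s total is normalized to 1.0 because the absolute numbers aren’t given, and the function name is hypothetical) puts the daily rate at roughly 43 times June’s:

```python
def daily_rate_multiple(new_total: float, old_total: float,
                        new_days: int = 7, old_days: int = 30) -> float:
    """How many times higher the new period's average daily usage is than the old one's."""
    new_daily = new_total / new_days
    old_daily = old_total / old_days
    return new_daily / old_daily


# 7 days of July used 10x all of June (30 days); units cancel, so June = 1.0.
multiple = daily_rate_multiple(new_total=10.0, old_total=1.0)
print(f"{multiple:.1f}x June's daily rate")  # 42.9x June's daily rate
```

That is somewhere between one and two orders of magnitude per day.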

That’s generally a bad thing. So I went looking….

What I found was both fascinating and a little depressing: a group of people based in Poland who have turned public webmail systems into the equivalent of a BitTorrent network.

Here’s how it seems to work: when they have a package to distribute, they split it into pieces small enough to be attached to and sent as emails. Most webmail systems allow attachments up to about 10 megabytes. The pieces were encoded as standard MIME attachments, although the name and type details seemed to be ignored (in theory, lots of PowerPoint files).
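
One detail worth keeping in mind when reading attachment limits: MIME attachments are normally base64-encoded, which turns every 3 bytes into 4, so a limit of about 10 megabytes on the encoded message carries only about 7.5 MB of raw data. The sketch below is a hypothetical helper that estimates how many messages a package needs; it assumes the limit applies to the encoded size, uses decimal megabytes, and ignores headers and line breaks.

```python
def parts_needed(package_bytes: int, limit_bytes: int = 10 * 1000 * 1000) -> int:
    """Estimate how many messages a package needs under a per-message size limit.

    Assumes the limit applies to the base64-encoded attachment, which is 4/3 the
    size of the raw data. Headers and the line breaks MIME inserts every 76
    characters are ignored, so real counts can run slightly higher.
    """
    raw_per_part = (limit_bytes * 3) // 4
    # Integer ceiling division: a partial last piece still needs its own message.
    return (package_bytes + raw_per_part - 1) // raw_per_part


# A hypothetical 700 MB video:
print(parts_needed(700 * 1000 * 1000))  # 94
```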

Then accounts are created on various webmail sites; in my sample of addresses, I see over a dozen different sites being used. The person doing all of this then emails the files to those mailboxes, where they sit. Now anyone who wants that set of files only has to get the access information for one of those accounts, log in via IMAP, and let their email client download them. It looks like any given package is stored on between 3 and 8 different webmail accounts.

Account creation seems to be semi-automated. All the accounts share a similar format: a semi-random “word” followed by a 1-3 digit number. Passwords use the same format (but are never the same), as do the “From” address and the “Return-Path” in the email headers. Sometimes the files are stored in more than one account on a single webmail site (another reason I think this is at least semi-automated), but generally a package is sent to 4-6 accounts on 4-6 different sites.
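
The shared naming format makes a cheap first-pass filter. A minimal sketch, assuming the “word” is letters only and the suffix is 1-3 digits (the exact alphabet isn’t given), and that addresses arrive bare or in angle brackets: it flags a message only when the account name, the From address, and the Return-Path all fit the pattern. The names and example addresses are hypothetical, and plenty of legitimate users pick names like mike77, so on its own this is a weak signal.

```python
import re

# Assumed pattern: letters followed by a 1-3 digit number, e.g. "trelvak42".
NAME_PATTERN = re.compile(r"[a-z]+[0-9]{1,3}", re.IGNORECASE)


def local_part(address: str) -> str:
    """Return the part before '@' of a bare or <angle-bracketed> address."""
    return address.strip().strip("<>").split("@", 1)[0]


def fits_naming_pattern(account: str, mail_from: str, return_path: str) -> bool:
    """True only when all three names match the pattern exactly."""
    return all(
        NAME_PATTERN.fullmatch(local_part(addr)) is not None
        for addr in (account, mail_from, return_path)
    )


# Hypothetical examples:
print(fits_naming_pattern("trelvak42", "<zomirat7@example.net>", "orpavel311@example.org"))  # True
print(fits_naming_pattern("john.smith", "<john.smith@example.com>", "john.smith@example.com"))  # False
```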

It looks like the actual account creation is manual or semi-manual, because some of the sites involved use CAPTCHA on account creation and that isn’t stopping them. I don’t think this setup is sophisticated enough to have cracked CAPTCHA, so there are people involved in the setup. My guess is that the account naming and packaging are automated, while people handle account creation and uploading. Once someone downloads the emails, another script seems to put it all back together again, because it doesn’t depend on the MIME data in the message for naming or decoding. In fact, that data is set up to make the content itself look innocent, at least to a casual glance.

There’s obviously a web site somewhere that tells you how to access the mailbox to get the content, but I haven’t gone looking for it.

If you think about it, this is a pretty nice hack. With BitTorrent being scrutinized by many ISPs, they’ve set up a fairly low-tech, under-the-radar way of distributing “stuff” without easy detection. The original distributor only has to upload the files once, and the rest of the resource costs are borne by the mail systems: the webmail site pays for the network to bring the files into the system, pays for the disk to store them, and pays for the network to distribute them back out.
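
The cost shift is easy to put in numbers. In this sketch (hypothetical names; it assumes one stored copy on one site, one backup copy, and made-up figures of a 1 GB package fetched 500 times), the uploader sends the bytes once while the site carries everything else:

```python
from dataclasses import dataclass


@dataclass
class SiteCost:
    bytes_in: int      # accepted over SMTP, once per stored copy
    bytes_out: int     # served back out over IMAP, once per download
    bytes_stored: int  # disk used, including backup copies


def cost_of_package(package_bytes: int, downloads: int, backup_copies: int = 1) -> SiteCost:
    """Resources one stored copy of a package consumes on the webmail site."""
    return SiteCost(
        bytes_in=package_bytes,
        bytes_out=package_bytes * downloads,
        bytes_stored=package_bytes * (1 + backup_copies),
    )


GB = 1000 ** 3
cost = cost_of_package(1 * GB, downloads=500)
print(cost.bytes_out // GB, "GB out for", cost.bytes_in // GB, "GB in")  # 500 GB out for 1 GB in
```

Outbound traffic grows with every download; nothing on the uploader’s side does.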

Needless to say, I spent some time shutting all of this down. All told, I identified and closed a couple of hundred accounts that held over 200 gigabytes of disk storage, and the network bandwidth they were starting to consume was on track to be measured in terabytes. We’re a fairly small webmail site right now; one can only wonder what they’re doing to some other sites….

The group is based in Poland, and 99% of the access to these files also came from Polish IP ranges. Fortunately, once you know what to look for, these accounts are fairly easy to find, given the standardized naming, the limited IP range they’re coming from, and the exceptionally large average message size. The last is the easiest way to identify them: no “real” webmail account (at least on our system) has an average message size over 5 MB. Even accounts where users park files in IMAP for storage tend to average no more than 1 MB per message.
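
The average-message-size test is simple to script. A minimal sketch, assuming you can get a list of stored message sizes in bytes for each account (how you pull those from the mail store depends on the server) and reading “5 MB” as 5 MiB; the function name and sample data are hypothetical:

```python
MIB = 1024 * 1024


def flag_large_average(sizes_by_account: dict[str, list[int]],
                       threshold: int = 5 * MIB) -> list[str]:
    """Return account names whose mean stored message size exceeds threshold, sorted."""
    flagged = []
    for account, sizes in sizes_by_account.items():
        if not sizes:  # empty mailbox: nothing to average
            continue
        if sum(sizes) / len(sizes) > threshold:
            flagged.append(account)
    return sorted(flagged)


# Hypothetical sample; real sizes would come from the mail store.
sample = {
    "alice": [12_000, 48_000, 2_300_000],
    "zomirat7": [9_800_000, 9_750_000, 9_900_000],
    "bob": [],
}
print(flag_large_average(sample))  # ['zomirat7']
```

The naming pattern and source IP ranges can then be checked on the short list this produces rather than on every account.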

This group spent some time experimenting with the site, apparently to see if we were paying attention. The earliest record I can find of them accessing the site is in April. In June, they ramped up their volume significantly, and in July they opened the floodgates (and I found it four days later, fortunately). From the outside it’s hard to tell whether they were experimenting to see if we’d catch them and then ramped up when they felt safe, or whether this is a new network that finally ramped up as they finished building it. Either way, it’s clear these guys are using a lot of network on a lot of webmail systems globally.

How to stop this? No easy answers. They aren’t really “doing” anything we don’t allow; it’s more of a Terms of Service issue about content, and about policing it. If their account creation were fully automated we could possibly plug that hole, and we probably should on general principles: CAPTCHA might not stop this, but it can’t hurt. (Some of the webmail sites being used do have CAPTCHA enabled, and it didn’t stop them.) On the other hand, there’s no reason we should let them pass around warez on our dime. They only have to use network to upload it once, and then the webmail sites pay for the bandwidth to accept it and deliver it as often as it gets downloaded, plus disk storage and the usual overhead of backups and so on.

What it really goes to show is that people will find interesting uses for any publicly available technology, whether or not you intended it to be used that way. It also means, I think, that we should be aware of what those possible uses might be and see if we can shape our systems to discourage the ones we don’t like. For instance, a 5 megabyte limit on attachments might have discouraged these guys, but doesn’t seem likely to significantly impact “normal” users: I found very, very few emails on the system that large.
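
Before picking a cap, it’s worth checking what it would touch on your own system. A sketch, assuming the same per-account lists of stored message sizes a mailbox-size report would give you (the function name and data below are hypothetical): it reports the share of messages over a candidate cap and the share of accounts holding at least one. Stored sizes are usually the encoded sizes, so compare them against the cap as your server measures it.

```python
MIB = 1024 * 1024


def cap_impact(sizes_by_account: dict[str, list[int]], cap_bytes: int) -> tuple[float, float]:
    """Return (share of messages over cap, share of accounts with any such message)."""
    total_msgs = sum(len(sizes) for sizes in sizes_by_account.values())
    if total_msgs == 0:
        return 0.0, 0.0
    over_msgs = sum(1 for sizes in sizes_by_account.values() for s in sizes if s > cap_bytes)
    hit_accounts = sum(
        1 for sizes in sizes_by_account.values() if any(s > cap_bytes for s in sizes)
    )
    return over_msgs / total_msgs, hit_accounts / len(sizes_by_account)


# Hypothetical mailboxes: two ordinary users and one bulk uploader.
mailboxes = {
    "alice": [12_000, 48_000, 2_300_000],
    "bob": [30_000, 95_000],
    "zomirat7": [9_800_000, 9_750_000],
}
msg_share, acct_share = cap_impact(mailboxes, 5 * MIB)
print(f"{msg_share:.0%} of messages, {acct_share:.0%} of accounts")  # 29% of messages, 33% of accounts
```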

One of the things I’ve been pondering is ways to automate finding, or setting alarms for, this kind of “non-standard” behavior; quotas solve some problems, but not this one. I wrote a script that finds these accounts by their really large average message sizes. Something that automates that process would help; other options are monitoring or rate-limiting network usage on a per-account basis, or simply looking at the accounts with the highest network usage.
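
Two of those ideas can be sketched briefly. Below, a per-account token bucket (a standard rate-limiting technique, swapped in here for illustration; the post doesn’t say how a limit would be enforced) caps outbound bytes, and a top-N query lists the heaviest accounts. The class and function names and the default rates are hypothetical. The burst size must exceed the largest message you serve, since a single request larger than the burst is never allowed.

```python
import heapq
import time
from typing import Optional


class ByteBucket:
    """Token bucket in bytes: refills at `rate` bytes/s and holds at most `burst` bytes."""

    def __init__(self, rate: float, burst: float, now: Optional[float] = None):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start full
        self.last = time.monotonic() if now is None else now

    def allow(self, nbytes: int, now: Optional[float] = None) -> bool:
        """Spend nbytes if the bucket holds enough; otherwise refuse and spend nothing."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


buckets: dict[str, ByteBucket] = {}


def allow_download(account: str, nbytes: int, rate: float = 25_000.0,
                   burst: float = 50_000_000.0) -> bool:
    """One bucket per account; hypothetical defaults of ~2 GB/day sustained, 50 MB burst."""
    bucket = buckets.get(account)
    if bucket is None:
        bucket = buckets[account] = ByteBucket(rate, burst)
    return bucket.allow(nbytes)


def top_talkers(bytes_out: dict[str, int], n: int = 10) -> list[tuple[str, int]]:
    """The n accounts with the most outbound bytes, largest first."""
    return heapq.nlargest(n, bytes_out.items(), key=lambda kv: kv[1])
```

In use, `allow_download(account, message_size)` would sit in front of each fetch, while `top_talkers` is the cheaper first step: run it daily over whatever per-account byte counts your logs already provide.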

Things that definitely don’t help with this kind of problem: quotas, looking for accounts at or close to quota, accounts with large numbers of log-ins, or even usage from many different IP addresses. None of those applied here. I also didn’t see any significant sign of multiple simultaneous users. The things I think of as “obvious” signs of abuse are missing here; it’s a different set of parameters that become visible once you look.

One option I’m just starting to investigate is coming up with some kind of “typical” network usage per user, sort of a capacity-planning number; if the system deviates from it significantly, that’s a hint you need to look in more detail. I want to avoid monitoring at the per-user level as much as possible, and instead find system-level metrics that tell me whether the system is within expected usage ranges.
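
A system-level version of that idea might look like the sketch below: normalize each day’s total outbound bytes by the number of active users, then flag a day that sits more than k standard deviations above recent history. The mean-plus-k-sigma rule, the default k of 3, the function names, and the sample numbers are assumptions chosen for illustration, not a tuned alarm.

```python
from statistics import mean, stdev


def per_user(total_bytes: float, active_users: int) -> float:
    """Normalize a day's system-wide traffic by that day's active users."""
    return total_bytes / max(1, active_users)


def outside_baseline(history: list[float], today: float, k: float = 3.0) -> bool:
    """Flag `today` if it exceeds the historical mean by more than k standard deviations."""
    if len(history) < 2:
        return False  # stdev() needs at least two data points
    return today > mean(history) + k * stdev(history)


# Hypothetical bytes-per-active-user for the past week, then two candidate days.
history = [100.0, 110.0, 95.0, 105.0, 90.0, 100.0, 102.0]
print(outside_baseline(history, 108.0))   # False
print(outside_baseline(history, 4300.0))  # True
```

Dividing by active users keeps ordinary growth in sign-ups from tripping the alarm, while a jump like July’s would stand out immediately.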

In reality, there’s nothing “wrong” going on here other than the sheer size of the operation and the costs it involves (and the fact that most of the content is likely illegal). Technically it’s pretty simple and straightforward: a nice hack that shifts the cost of distribution onto others in a way that’s (in theory) low-key enough not to be noticed, at least until they get greedy about resource consumption. If they hadn’t spiked usage in July like they did, I might not have gotten around to chasing them for a while.

My ultimate take-away, though, is that users’ “use cases” for a technology are rarely the same as the developers’. Sometimes users innovate in really interesting and positive ways, sometimes they distribute warez; either way, people are going to see opportunities in your technology, and that should be part of the discussion when designing it.

My suggestion: if you run a webmail site that allows users to create accounts, you might just want to look and see what you find. Might surprise you.

Oh, for what it’s worth, I’ve held off posting on this for a bit because I gave advance warning to the other sites I found involved. Of the 15 or so abuse@ addresses I sent the details to (accounts, IP ranges, Received header data, etc.), one responded immediately and started their own search-and-destroy operation. They happen to be one of the larger “white label” webmail providers, so that’ll shut down any number of the domains involved.

But three of the webmail sites had their abuse@ addresses bounce as user unknown. One sent me email (in Italian) letting me know he was on holiday for a few weeks. And from the rest, including the two Polish ISPs where all of the upload activity originated, total silence. Oh well. Kinda sad, but hey, it’s their network bill; if they don’t mind paying it, I shouldn’t complain… And I just checked our site to see if they took the hint, and I see no sign of them creating new accounts or doing any other activity, so I think they’re gone. Well, for now. I’ll know if they come back…

This entry was posted in The Internet.