(It was obvious.)
As part of my mission to bring clarity to the world, let me explain the so-called “302 exploit” you may have heard scare stories about.
Background
HTTP, the protocol used to serve web pages, has two numeric codes that can be returned by the web server to direct the client (browser) to a new URL: 301 and 302.
A 301 redirection means “The page you requested has moved permanently. Please go to the new address I am providing you with, and update your bookmarks.”
A 302 redirection means “The page you requested is temporarily being hosted somewhere else; please fetch the content from the URL I am providing you with. This is still the correct URL for future access, however.”
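To make the difference concrete, here is a minimal sketch of what a well-behaved client might do with each code. The function name `url_to_remember` and the example URLs are my own inventions for illustration; no real browser or crawler is structured exactly like this.

```python
def url_to_remember(requested_url, status, location):
    """Return the URL a client should keep for future visits.

    requested_url: the URL the client originally asked for
    status:        the HTTP status code the server returned
    location:      the URL given in the redirect (ignored if not a redirect)
    """
    if status == 301:
        # Permanent move: the new address replaces the old one.
        return location
    if status == 302:
        # Temporary move: fetch from `location` this time,
        # but keep using the original URL in future.
        return requested_url
    # Not one of the two redirects discussed here: nothing changes.
    return requested_url


# Hypothetical URLs, purely for illustration.
old = "http://old.example/page"
new = "http://new.example/page"
print(url_to_remember(old, 301, new))
print(url_to_remember(old, 302, new))
```

The only difference is which URL gets remembered. In both cases the content is fetched from the new location.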
The “exploit”
Suppose a malicious person has a spam site they wish to promote; let’s call it SPAMSITE. He looks at Google’s search results for his choice of keywords, and copies the URL of a site listed quite high up; call it GOODSITE.
Next, he sets up his web server to detect when the GoogleBot crawls his web pages. When that happens, he has his web server issue a 302 redirection. That is, SPAMSITE says to GoogleBot: “The page you requested at SPAMSITE is temporarily hosted at GOODSITE. However, SPAMSITE is still the place you should visit in future to get the content.”
The idea is that GoogleBot then indexes SPAMSITE as if it was the real GOODSITE, and GOODSITE gets dumped from the rankings. Users who search for GOODSITE via Google click on the link for SPAMSITE, which looks like it contains the real content from GOODSITE, but instead they get ads for penis enlargers, Texas Hold’em Poker, and Asian amputee lesbians shaving each other.
The reality
That’s the nightmare scenario being screamed to the media. Reality is not quite that simple, however.
Google rates sites according to their “pagerank”–a magic number proportional to how many other sites link to them. The more sites with high pagerank link to a particular page, the higher that page’s pagerank.
So, let’s go back to our earlier scenario. Google has been told that SPAMSITE is the proper URL for GOODSITE. However, chances are there are a lot of sites linking to GOODSITE; and if SPAMSITE has just been set up for hijacking, there won’t be anything pointing at it. So, Google will ignore what SPAMSITE told it, and report GOODSITE as the URL, because it has a much higher pagerank. (Or, so the Google guys say.)
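As a toy model of that decision (and it is only a toy: I have no idea what Google's actual code looks like), you could imagine the indexer doing something like the following. The function name, the pagerank numbers and the tie-breaking rule are all assumptions of mine.

```python
def choose_listed_url(source_url, target_url, pagerank):
    """Pick which URL to list for content reached via a 302.

    source_url: the URL that issued the 302 (SPAMSITE in the scenario)
    target_url: the URL the 302 pointed at (GOODSITE in the scenario)
    pagerank:   dict mapping URL -> rank; unknown URLs count as 0.0
    """
    source_rank = pagerank.get(source_url, 0.0)
    target_rank = pagerank.get(target_url, 0.0)
    # A 302 claims "keep listing the source". Only believe that claim
    # when the source is at least as well linked as the target.
    if source_rank >= target_rank:
        return source_url
    return target_url


# Made-up numbers: a freshly set up hijack site has nothing linking to it.
ranks = {"http://goodsite.example/": 6.0, "http://spamsite.example/": 0.0}
print(choose_listed_url("http://spamsite.example/",
                        "http://goodsite.example/", ranks))
```

With those numbers the listed URL stays GOODSITE. The hijack only pays off if the source's rank is at least as high as the target's.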
The problem, such as it is
This still leaves one problem scenario: if SPAMSITE can somehow get its pagerank to be higher than GOODSITE, it can push GOODSITE off of the listings and take its place. Of course, it could do that anyway, by serving up a copy of GOODSITE’s content, but 302 allows it to do it without actually committing copyright violation.
So that’s the sum total of the “exploit”: there’s a way to spam Google exactly as if you were using other people’s content, without actually copying that content. Whoop-de-doo.
Fixes which aren’t
One thing a lot of people ask is, why not just ignore 302s and always index the destination URL?
Answer: Because it would break a lot of links. For example, the canonical URL of my web site is http://www.pobox.com/~meta/. That URL issues a 302 redirect to wherever I happen to be hosting my web site at the time.
Similarly, lots of commercial sites have canonical URLs which they publish, and then redirect to some dynamically generated page in a content management system. For example, IBM.
New Windows / Internet Explorer security hole:
- Upload any Windows executable you like to a web server.
- Set up the web server to send .exe files as text/html.
- Post links to the file, cloaking them as http://www.innocenturl.com%01%00@www.yoursite.com/virus/whatever via the previously announced URL cloaking bug.
- Wait for anyone using Internet Explorer to click on the innocent-looking link and get asked if they want to open the HTML web page.
- Cackle as their computer downloads the executable and runs it, without prompting them further.
Solution: Switch to Mozilla, or don’t click on “Open” to open files.
Verisign, possibly the most incompetent name registrar on the Internet (but that’s another story), have decided to leverage their monopoly control over the current de facto standard root DNS servers.
They’ve set things up so that any nonexistent domain name now maps to one of their servers. If you type a random bogus domain name like xyloturbot.com into your web browser, you now get Verisign ads and a pay-for-hits search engine.
This is bad for many reasons. Firstly, they’re violating at least four different RFCs, including the Requirements For Internet Hosts. Secondly, they made the change without warning, breaking many anti-spam systems that were checking to see if alleged sender e-mail addresses look valid.
As if that wasn’t bad enough, spam sent with completely bogus addresses now ends up queued indefinitely on many mail servers. Rather than bouncing it immediately because the address is invalid, those servers can now resolve every single bogus address, so they’ll queue the mail and try delivering it for a couple of weeks. There are probably lots of servers out there that aren’t given much attention, that are now gradually filling up with spam thanks to Verisign.
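For the curious, here is a rough sketch of the kind of sender-domain check that just got broken, plus one way of working around a wildcard. It is not taken from any particular mail server or DNS patch. The function name, the `resolve` callback and the fake resolver with its addresses are all made up so the example runs without touching the network.

```python
import uuid


def sender_domain_exists(domain, resolve):
    """Guess whether `domain` is real, even if its TLD has a wildcard.

    resolve: a function taking a host name and returning an IP address
             string, or None when the name does not resolve.
    """
    address = resolve(domain)
    if address is None:
        # The old, simple check: no DNS answer means a bogus sender.
        return False
    # Probe a random name under the same TLD that cannot plausibly exist.
    tld = domain.rsplit(".", 1)[-1]
    probe = "%s.%s" % (uuid.uuid4().hex, tld)
    wildcard_address = resolve(probe)
    # If the nonsense name resolves to the same place, the answer we got
    # for `domain` is just the wildcard, so treat the domain as bogus.
    return address != wildcard_address


# A fake resolver standing in for DNS. All addresses are illustrative.
KNOWN = {"example.com": "198.51.100.7"}


def fake_resolve(name):
    if name in KNOWN:
        return KNOWN[name]
    # Simulate a wildcard on .com only: every unknown name "exists".
    return "192.0.2.1" if name.endswith(".com") else None


print(sender_domain_exists("example.com", fake_resolve))
print(sender_domain_exists("xyloturbot.com", fake_resolve))
```

The obvious catch is that this costs an extra lookup per message, and that any real domain hosted on the wildcard address itself would be misjudged as bogus.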
Another problem is that it gives the Internet a single vector for massive virus infestation. Imagine if a hacker cracks the Verisign web server and puts a new Windows virus on that server for download—it could spread across the entire Internet in seconds.
Finally, what they’re doing is probably illegal under the anti-’cybersquatting’ laws passed in the USA. They are, after all, squatting on other people’s trademarked names, in order to make cash.
There are already patches for most DNS servers to permanently blackhole the Verisign machine in question. It took IBM less than a day to decide to blackhole all traffic to that server, and according to the software authors the clamor for patches has been enormous. It’ll be interesting to see how the crooks respond.
In the meantime, it seems to me that the best thing to do is take advantage of the situation. Since every bogus e-mail address now resolves, and since all the incompetently-managed open relay servers will end up sitting delivering e-mail to Verisign 24×7, why not generate a few hundred bogus e-mail addresses every day, link to them on well-trafficked pages (like this one), and wait for the spambots to harvest them? In fact, you may already have spotted me doing just that…
The Register has published the details of how the RIAA web site got hacked.
It turns out that the RIAA left the admin tools of their web server active, without any password protection. To “secure” the site they set their ROBOTS.TXT file to ask search engines not to index the admin tools directory.
The hackers looked at ROBOTS.TXT to see what the RIAA didn’t want people finding via Google, saw the /admin directory listed, went there, and found they had total back-door access to the site.
I hope the fuckwit RIAA webmaster got fired. I mean, it barely even qualifies as a hack. It’s like walking through an open door that has a big neon sign next to it saying “Please do not walk through this door”.
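To show just how little effort is involved, here is a sketch of pulling the “please don’t look here” list out of a robots.txt file. The sample file contents are invented for illustration (I haven’t seen the RIAA’s actual file), and the parsing is deliberately simplistic.

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path found in robots.txt text."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        value = value.strip()
        if sep and field.strip().lower() == "disallow" and value:
            paths.append(value)
    return paths


# Invented example file, not the real one.
SAMPLE = """\
User-agent: *
Disallow: /admin
Disallow: /private/  # nothing to see here
"""

print(disallowed_paths(SAMPLE))
```

robots.txt is a public file that anyone can fetch, so everything on that list is effectively advertised. It’s worth running something like this against your own site to see what you’re telling the world.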