Protecting Your Content


I originally started writing this post to tell you about an awesome find of Matt’s (OpenDNS), but then I noticed an even more important topic in my feed aggregator. Weblog Tools Collection pointed to an awesome article about keeping your content intact. I dove headlong into the article, digging ever deeper through the links. There is so much information on this subject that one hardly knows where to begin. But armed with what I’ve learned over the past week, and the information in that article, I’m prepared to do something about it.

A couple of days ago I bought a magazine at Books-A-Million because it had a swell article about using PHP to keep up with some basic stats. The magazine was Blacklisted!411 and the article was “Spying A Spy” (Volume 8, Issue 1, Winter 2005–2006, page 76) by Israel Torres. In the article, Israel walks the reader through creating the files needed to snoop on your visitors. The file is called spy.php. Using this file, you can get some stats about who’s actually coming to your page. But after reading through an article at ha.ckers.org I realized this stats page could have a much broader use.

I typed out all the code from the magazine article, visited the file, and then visited the results page. Using the Reload Every extension I set that page to reload every five minutes, basically giving me a stats ticker. It was after lunch that I noticed a curious entry (actually, about two dozen of them, all at one time). They all came from the same IP, and the user agent on all of them was a Java program. The address wasn’t listed in the Spamhaus database, but the site required authentication, so I couldn’t visit it to see what they were about. A Whois lookup identified the IP as belonging to a host in Great Britain. I don’t think the host is doing anything wrong, but that doesn’t mean one of their clients isn’t. Quickly opening my .htaccess file, I added a <Limit> block to deny all access from that host’s entire range of IPs. Hey, I know it’s probably not their fault, but it is their responsibility to take care of it. Now, if I weren’t such a nice guy, I could have elected to use a more colorful means of getting back at them for trying to scrape my content (à la ha.ckers.org). But after all, this is a Christian blog. :o)
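For readers who would rather not watch a reloading stats page by eye, here is a minimal sketch in Python of how that same pattern (a burst of hits from one IP, all with a Java user agent) could be flagged automatically. This is not the PHP from the magazine article. The record layout, the function name find_suspect_ips, the thresholds, and the sample addresses are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_suspect_ips(entries, min_hits=20, window=timedelta(minutes=1),
                     agent_hint="java"):
    """Return the IPs that sent at least `min_hits` requests inside one
    `window`, counting only requests whose user agent contains `agent_hint`.

    `entries` is an iterable of (timestamp, ip, user_agent) tuples. That
    layout is hypothetical; the real spy.php output format is not shown
    in this post.
    """
    hits_by_ip = defaultdict(list)
    for timestamp, ip, user_agent in entries:
        # Case-insensitive match, so "Java/1.5.0_06" and "java" both count.
        if agent_hint in user_agent.lower():
            hits_by_ip[ip].append(timestamp)

    suspects = []
    for ip, times in hits_by_ip.items():
        times.sort()
        start = 0
        # Sliding window over the sorted timestamps for this IP.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= min_hits:
                suspects.append(ip)
                break
    return sorted(suspects)


# Made-up sample data: two dozen Java hits from one address within seconds,
# plus one ordinary browser visit. Both IPs are documentation-only addresses.
base = datetime(2006, 7, 19, 13, 0, 0)
sample = [(base + timedelta(seconds=i), "192.0.2.10", "Java/1.5.0_06")
          for i in range(24)]
sample.append((base, "198.51.100.7", "Mozilla/5.0 (Windows; U) Firefox/1.5"))

print(find_suspect_ips(sample))
```

A script like this only tells you which addresses deserve a closer look. Checking them against a blocklist and a Whois lookup, as described above, is still worth doing before blocking a whole range.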

~Jonathan

Update

I found far more than one offending bot via the spy.php file, and they were all sending a Java user-agent header. So I hopped over to the Spamhuntress site and found the answer. I haven’t seen any Java bots since then.

July 19th, 2006 · Tagged: Security, Webmastery
