I was just curious how many images Google had indexed for my site. A simple query is all it takes: site:jonlandrum.com on Google image search. But that query returned an alarming result.
My search returned no results. What had I done? When did this start? Well, I thought I had noticed a sharp decline in visitors, and my stats showed the very moment it happened. The cause was an .htaccess rule I had added to thwart bandwidth theft. A large number of sites were hotlinking various images on my site, so I thought I'd put an end to that with this rule:
# Pwning image scrapers
SetEnvIfNoCase Referer "^http://(.*\.)?google\.com/?(.*)?" ok=1
SetEnvIfNoCase Referer "^http://(.*\.)?jonlandrum\.com/?(.*)?" ok=1
SetEnvIfNoCase Referer "^http://(.*\.)?live\.com/?(.*)?" ok=1
SetEnvIfNoCase Referer "^http://(.*\.)?stumbleupon\.com/?(.*)?" ok=1
SetEnvIfNoCase Referer "^http://(.*\.)?yahoo\.com/?(.*)?" ok=1
<FilesMatch "\.(bmp|gif|ico|jpe?g|png)">
Order Allow,Deny
Allow from env=ok
</FilesMatch>
This allows image referrals from only five sites: Google, Yahoo!, MSN (live.com), StumbleUpon, and my own. And it worked beautifully. Except for one thing: even though the search engines are allowed to display the images with this .htaccess rule, they are not allowed to visit the images directly. Notice that the rule makes no allowance for requests that arrive without a referrer. Visiting the images directly is how the search engines spider them, and crawlers typically send no Referer header when they do. Without that ability (in this case because the requests were forbidden) the images in the search results will simply disappear over time. It was rather easy to fix, but I definitely had a Homer Simpson moment. Add one line to the beginning of the rule and you have it made:
# Pwning image scrapers
SetEnvIfNoCase Referer "^$" ok=1
SetEnvIfNoCase Referer "^http://(.*\.)?google\.com/?(.*)?" ok=1
SetEnvIfNoCase Referer "^http://(.*\.)?jonlandrum\.com/?(.*)?" ok=1
SetEnvIfNoCase Referer "^http://(.*\.)?live\.com/?(.*)?" ok=1
SetEnvIfNoCase Referer "^http://(.*\.)?stumbleupon\.com/?(.*)?" ok=1
SetEnvIfNoCase Referer "^http://(.*\.)?yahoo\.com/?(.*)?" ok=1
<FilesMatch "\.(bmp|gif|ico|jpe?g|png)">
Order Allow,Deny
Allow from env=ok
</FilesMatch>
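To see why that one line matters, here is a rough Python model of the two rule sets. It is only a sketch of the matching logic, not Apache itself: the function name is_allowed, the list names, and the path /images/photo.jpg are made up for illustration, and it assumes a missing Referer header is matched as an empty string, which is what lets "^$" catch it.

```python
import re

# Referer patterns copied from the .htaccess rules above.
ALLOWED_SITES = ["google", "jonlandrum", "live", "stumbleupon", "yahoo"]
SITE_PATTERNS = [rf"^http://(.*\.)?{name}\.com/?(.*)?" for name in ALLOWED_SITES]

# Same expression as the FilesMatch directive.
IMAGE_FILE = re.compile(r"\.(bmp|gif|ico|jpe?g|png)")


def is_allowed(path, referer, patterns):
    """Model the FilesMatch block: image requests need ok=1, other files pass."""
    if not IMAGE_FILE.search(path):
        return True
    # Assumption: an absent Referer header is treated as an empty string.
    value = referer or ""
    # SetEnvIfNoCase: case-insensitive match; any hit sets ok=1.
    return any(re.search(p, value, re.IGNORECASE) for p in patterns)


ORIGINAL_RULE = SITE_PATTERNS             # first version of the rule
FIXED_RULE = [r"^$"] + SITE_PATTERNS      # with the empty-referrer line added

# A crawler (or a direct visit) requests the image with no Referer at all.
for label, rule in (("original", ORIGINAL_RULE), ("fixed", FIXED_RULE)):
    print(label, is_allowed("/images/photo.jpg", None, rule))
```

Under these assumptions the original rule refuses the referrer-less request and the fixed rule lets it through, while a hotlinking page on some other domain is still refused by both.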
We will now return you to your normal traffic.
~Jonathan