Wednesday, September 30, 2015

Spider Attacks

One thing that's been bothering me ever since I started the comics is how my visitor stats keep getting skewed by spider attacks.

Not actual arachnids. I'm talking about the automated systems that are constantly out there reading web pages and cataloging the data they extract. All the search engines use them to index the web, but spammers use them as well - seeking out web pages with comment forms or other input fields into which they can inject links to Russian pornography sites. I get these things all the time.

While most of my traffic is legitimate, every once in a while I'll have a day where there's a huge spike in "visitors" to the comics, and 90% of them are not real people. I can tell because I record the user agent strings and then check them for legitimacy.
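A rough sketch of that kind of user agent check might look like the following. The keyword list and the function name `looks_like_bot` are made up for illustration, not what actually runs on this site, and a spider that fakes a browser user agent would slip right past it.

```python
import re

# Hypothetical keyword list: substrings that commonly appear in the
# user agent strings of crawlers and scripted HTTP clients.
BOT_PATTERN = re.compile(
    r"bot|crawl|spider|slurp|curl|wget|python-requests",
    re.IGNORECASE,
)


def looks_like_bot(user_agent):
    """Return True if the user agent string looks automated."""
    # A missing or blank user agent is treated as suspicious.
    if not user_agent or not user_agent.strip():
        return True
    return bool(BOT_PATTERN.search(user_agent))


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0",
    ]
    for ua in samples:
        print(looks_like_bot(ua), ua)
```

A keyword match like this only catches spiders that identify themselves honestly, which is part of why it can't be the whole answer.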

While I can't stop the spiders from crawling, I can at least make sure I don't count them in my stats. I want to know how many real humans viewed the comics. Today I tried something new with my stats that maybe I should have done months ago but didn't think of until recently. Instead of counting accesses to one of the comic URLs directly, I'll count accesses to a small 1x1 pixel image with the comic number as part of its name. I put this image on every comic. Real browsers operated by humans will load the image, while Russian porno spiders won't. I checked the logs: except for the Google Image Search bot, images are largely ignored by spiders.
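For the curious, counting pixel hits from a web server access log could be sketched roughly like this. The path scheme `/pixel/comic-<number>.gif` and the function name `count_pixel_hits` are assumptions made up for the example, not the real image name used here, and the log lines are assumed to be in the common "combined" format, where the request and user agent appear in quotes.

```python
import re
from collections import Counter

# Hypothetical naming scheme for the tracking image:
#   /pixel/comic-<number>.gif
PIXEL_REQUEST = re.compile(r'"GET /pixel/comic-(\d+)\.gif[ ?]')

# Google's image crawler identifies itself with this token, and it
# does fetch images, so its requests are skipped.
IMAGE_BOT_TOKEN = "Googlebot-Image"


def count_pixel_hits(log_lines):
    """Count tracking-pixel requests per comic number."""
    hits = Counter()
    for line in log_lines:
        if IMAGE_BOT_TOKEN in line:
            continue
        match = PIXEL_REQUEST.search(line)
        if match:
            hits[int(match.group(1))] += 1
    return hits


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [30/Sep/2015:10:00:00 -0400] "GET /pixel/comic-42.gif HTTP/1.1" 200 43 "-" "Mozilla/5.0"',
        '9.9.9.9 - - [30/Sep/2015:10:00:05 -0400] "GET /comics/42 HTTP/1.1" 200 5120 "-" "SomeSpider/1.0"',
    ]
    for comic, views in sorted(count_pixel_hits(sample).items()):
        print(comic, views)
```

This counts image loads rather than unique people, so one reader refreshing the page would still show up more than once.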

We'll see if that helps. So if you're a human, please read my comic.

Comic for 30 September 2015