New Feature: Detecting Non-Human Hits
As of just now, Clig Details pages have a new feature: analytics that show you bot traffic.
What’s a bot? A bot (also called a robot, crawler, or spider) is software that follows links automatically. There are good bots, like Googlebot, which crawls the web to build Google’s search index, and there are bad bots that scrape content and do other nasties.
Since Cligs launched, the analytics have shown you all traffic clumped together, without breaking it down into humans clicking through versus bots automatically requesting the clig. Today’s update is a first, basic step toward building out this kind of breakdown.
So what does today’s update do? Go to any Clig Details page (the bar graph icon) and under the Total Hits section you’ll see the breakdown; for a recent clig, this is what I see:

As you can see, it’s a very simple breakdown: the total number of hits (as was shown previously), with two items underneath it showing how many of those hits came from bots and how many came from humans.
So how does Cligs detect bots? Well, some of them are very easy to detect and some are not. Given the millions of clig requests Cligs has seen since launching, finding bots is easier than usual: there is a lot more data to mine for interesting traffic patterns.
The traffic analysis I did identified a lot of IP addresses that exhibit bot-like behavior. I manually checked the top 100 IP addresses (yes, manually) and confirmed they are bots. These IP addresses were then added to a special list that the Cligs analytics check against to produce the breakdown.
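As a rough illustration of the first step, here is a minimal Python sketch of ranking IP addresses by hit count so the busiest ones can be reviewed by hand. The log format (a list of (ip, clig_id) tuples) and the name top_candidates are assumptions made for this example, not how Cligs actually stores or analyses its traffic, and raw hit count is only one possible signal of bot-like behavior.

```python
from collections import Counter

def top_candidates(hits, n=100):
    """Return the n IP addresses with the most hits, busiest first.

    `hits` is a hypothetical iterable of (ip, clig_id) tuples; the real
    Cligs log format is not described in this post.
    """
    counts = Counter(ip for ip, _clig_id in hits)
    return counts.most_common(n)

# Made-up sample data using documentation-only IP ranges.
sample_hits = [
    ("203.0.113.5", "abc"),
    ("203.0.113.5", "def"),
    ("203.0.113.5", "ghi"),
    ("198.51.100.7", "abc"),
]
candidates = top_candidates(sample_hits, n=2)
```

The output is only a shortlist. Each address on it still needs a manual check before it goes on the bot list.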
This technique has one very important side effect: not all bots will be detected, so the number of bots Cligs gives you is a guaranteed minimum. The rest of the traffic may contain bot hits that went undetected. No technique can guarantee 100% accuracy in separating human from bot traffic, but we can have an honest crack at it.
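To make the "minimum guaranteed" point concrete, here is a small Python sketch of a list-based breakdown. The function name hit_breakdown, the input shapes, and the sample addresses are all hypothetical; this is an illustration of the idea, not the actual Cligs code.

```python
def hit_breakdown(hit_ips, known_bot_ips):
    """Split hits into bots and humans using a list of confirmed bot IPs.

    `hit_ips` is a hypothetical list with one IP address per hit.
    `known_bot_ips` is a set of manually confirmed bot addresses.
    """
    total = len(hit_ips)
    # Only hits from listed addresses count as bots, so this is a lower bound.
    bots = sum(1 for ip in hit_ips if ip in known_bot_ips)
    # "humans" really means "not on the list": it can still hide unlisted bots.
    return {"total": total, "bots": bots, "humans": total - bots}

# Made-up sample data using documentation-only IP ranges.
known_bots = {"203.0.113.5"}
hits = ["203.0.113.5", "198.51.100.7", "203.0.113.5", "192.0.2.44"]
breakdown = hit_breakdown(hits, known_bots)
```

If 192.0.2.44 were later confirmed as a bot and added to the list, the bot count for the same hits would rise from 2 to 3 and the human count would fall to 1. That is why the bot figure can only grow as the list improves.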
With time, more IP addresses will be added to detect more bots. Also, Cligs will soon be able to show you which bots have generated traffic on your cligs. Eventually, I’d like to work out how best to merge the Latest Search Engine Bot Sightings section with the detailed bot analytics, since search engine bots are one type of bot. Ideas and thoughts welcome.

January 2nd, 2009 at 6:09 pm
Thanks for the feature and for the tweet
January 4th, 2009 at 8:07 pm
Thank you very much for this update. I find it fascinating and very important to be able to distinguish between bot traffic and human traffic. Depending on the goals of a link posting, it will take away the speculation and make for better measuring.
I would be interested to see a list of bots that are harvesting social media sites, where the cli.gs url most likely has been posted.
Which site actually has more bot traffic than others? If I publish a link on Twitter and on Facebook with different cli.gs IDs, which one produces more bot hits?
Bots are completely independent from your own set of followers or friends…and might be able to further the reach of a social media engagement beyond the known set of readers, listeners and participants.
I am rambling, forgive me. I am just trying to wrap my head around all the metric opportunities and their purpose in reaching my ultimate goal: measuring quality as well as I am able to measure quantity.