As of just now, Clig Details pages have a new feature: analytics that show you bot traffic.
What’s a bot? A bot/robot/crawler/spider is software that follows links. There are good bots, like GoogleBot which indexes the web to build Google’s search index, and there are bad bots that scrape content and do other nasties.
Since Cligs launched, the analytics have shown you all traffic lumped together, without separating humans clicking through from bots automatically requesting the clig. Today’s update is a first, basic step toward that kind of breakdown.
So what does today’s update do? Go to any Clig Details page (the bar graph icon) and under the Total Hits section you’ll see the breakdown; for a recent clig, this is what I see:
As you can see, it’s a very simple breakdown: the total number of hits (as was shown previously), with two items underneath showing how many of those hits came from bots and how many from humans.
So how does Cligs detect bots? Well, some of them are very easy to detect and some are not. Given the millions of clig requests Cligs has seen since launching, finding bots is easier than usual: there is a lot more data to mine for interesting traffic patterns.
The traffic analysis I did identified a lot of IP addresses that exhibit bot-like behavior. I manually checked the top 100 IP addresses (yes, manually) and confirmed they are bots. These IP addresses were then added to a special list that the Cligs analytics check to produce the breakdown.
This technique has one very important side effect: not all bots will be detected, so the number of bots Cligs gives you is a guaranteed minimum. The rest of the traffic may well contain bot traffic that isn’t detected. No technique can guarantee 100% accurate separation of human and bot traffic, but we can have an honest crack at it.
With time, more IP addresses will be added to detect more bots. Also, Cligs will soon be able to show you which bots have generated traffic on your cligs. Eventually, I’d like to see how best to merge the Latest Search Engine Bot Sightings section with the detailed bot analytics, as search engine bots are one type of bot. Ideas and thoughts welcome.
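For the curious, here is a rough Python sketch of the idea, not the actual Cligs code. It assumes each hit is logged with its IP address. The names `KNOWN_BOT_IPS`, `top_ips` and `hit_breakdown` are made up for illustration, and the IP addresses are documentation placeholders. Ranking by raw hit count is also just one plausible way to pick candidates for manual review.

```python
from collections import Counter

# Hypothetical list of IP addresses that were manually confirmed as bots.
KNOWN_BOT_IPS = {"203.0.113.7", "198.51.100.23"}


def top_ips(hit_ips, n=100):
    """Return the n busiest IP addresses as (ip, hits) pairs, for manual review."""
    return Counter(hit_ips).most_common(n)


def hit_breakdown(hit_ips, bot_ips=KNOWN_BOT_IPS):
    """Split a clig's hits into bots (IP is on the list) and everything else."""
    total = len(hit_ips)
    bots = sum(1 for ip in hit_ips if ip in bot_ips)
    # "humans" really means "not on the bot list".
    return {"total": total, "bots": bots, "humans": total - bots}


# Example: five hits on one clig, three of them from listed bot IPs.
hits = ["203.0.113.7", "192.0.2.10", "203.0.113.7", "192.0.2.44", "198.51.100.23"]
print(top_ips(hits, n=1))   # [('203.0.113.7', 2)]
print(hit_breakdown(hits))  # {'total': 5, 'bots': 3, 'humans': 2}
```

Keeping the list as a set makes each lookup cheap, which matters when the same check runs over millions of hits.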