a lightweight, secure, fast page view tracker
A year ago, I designed a lightweight, anonymous page-view counter for my website; 75,000 views later, I'm sharing it with the world.
First, the design considerations:
- It must not count 'bot hits, and it must not double-count secondary requests like CSS files and images.
- It must be fast: it has to return immediately, and it can't load the visitor down with unnecessary extras like tracking images.
- It must be cache-friendly, by which I mean it shouldn't have to fight the technologies that cache a website's content to speed delivery across the Internet.
- It can't change the design of the site. I'm not claiming my website is beautiful, but why put the cart before the horse?
- It can't rely on JavaScript. While JavaScript seems to be enjoying a new life as the go-to solution for many things, I simply didn't want the complexity. I also didn't want to trigger the browser plugins that detect and block trackers, which I myself use religiously.
That's a lot of conflicting requirements!
After trying several designs, I realized that what I was looking for was:
- A small CSS file that contained some useful part of the site's design. But not too much.
- A subdomain of emuu.net (and my other website, risktopics.com) that allowed me to cache all of the main content, and put a strict no-cache configuration on the tracker-lite CSS. A simple nginx configuration block was all I needed to separate the tracker from the content.
The CSS code loaded from the subdomain can be anything. In my case, I went with the CSS that loads the fonts.
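On the page side, nothing changes beyond pointing that one stylesheet link at the tracker subdomain instead of the main host. A minimal sketch of the markup, borrowing the subdomain and file name from the config below (the protocol-relative URL is just one way to write it):

<!-- in each page's <head>: the only request that reaches the tracker subdomain -->
<link rel="stylesheet" href="//ww.emuu.net/fonts.css">

Since that file is served with a strict no-cache header, every real page view fetches it once, and that fetch is the log line that gets counted.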
Here's the nginx.conf bit that identifies 'bot traffic:
map $http_user_agent $bot {
    # flag search engines, monitoring services, and HTTP libraries
    ~*(google|bing|pingdom|monitis.com|Zend_Http_Client) 1;
    # flag generic crawlers, spiders, and scripted clients
    ~*(http|crawler|spider|bot|search|ForusP|Wget/|Python-urllib|PHPCrawl|bGenius) 1;
    # everything else is treated as a real visitor
    default 0;
}
log_format ww '$time_iso8601 $host $http_referer $bot $remote_addr';
Note that this doesn't actually stop 'bot traffic from being logged. But it does flag bad page views with a '1' whereas good traffic is a '0'. I then discard the bot "page views" in the script that reports on the traffic.
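For illustration, here's a minimal sketch of that kind of reporting script in Python, assuming the 'ww' log format above and the log path from the config below (the counting and output choices are illustrative):

#!/usr/bin/env python3
# A sketch of a reporting script for the 'ww' log format defined above:
#   $time_iso8601 $host $http_referer $bot $remote_addr
from collections import Counter

LOG_PATH = "/yadda/yadda/emuu.net/logs/ww_access.log"  # matches the access_log below

views_by_page = Counter()  # keyed on $http_referer: the page (or, depending on
                           # referrer policy, just the site) that pulled fonts.css
bot_hits = 0

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        fields = line.split()
        if len(fields) != 5:
            continue                 # skip anything malformed
        _time, _host, referer, bot, _addr = fields
        if bot == "1":
            bot_hits += 1            # flagged by the map block above; discard
            continue
        views_by_page[referer] += 1

print(f"total page views: {sum(views_by_page.values())} ({bot_hits} 'bot hits discarded)")
for page, views in views_by_page.most_common(20):
    print(f"{views:8d}  {page}")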
Here's the nginx config block for the subdomain(s):
server {
    listen 80;
    server_name ww.emuu.net ww.risktopics.com;
    charset utf-8;

    root /yadda/yadda/emuu.net/ww;
    index index.html;

    # errors are discarded; page views are logged in the 'ww' format defined above
    error_log /dev/null;
    access_log /yadda/yadda/emuu.net/logs/ww_access.log ww;

    # the tracker CSS must never be cached, so every page view fetches it anew
    location /fonts.css {
        expires 0;
        add_header Cache-Control private;
    }
}
Note that this contrasts with the caching directives for the main site, which look like:
add_header Cache-Control "public";
expires 1y;