a lightweight, secure, fast page view tracker

Toronto, 2017.10.13

A year ago, I designed a lightweight, anonymous page-view counter for my website. 75,000 views later, I'm sharing it with the world.

First, the design considerations:

  • It must not count 'bot hits. It must not double-count a page view by also counting the "hits" for its CSS files and images.
  • It must be fast. For instance, it has to return immediately, and it can't load down the visitor with unnecessary stuff like images.
  • It must be cache friendly, by which I mean the solution shouldn't have to fight with the technologies out there that cache a website's content to speed delivery across the Internet.
  • It can't change the design of the site. I'm not claiming my website is beautiful, but why put the cart before the horse?
  • It must not rely on JavaScript. While JavaScript seems to be enjoying a new life as the go-to solution for many things, I simply didn't want the complexity. I also didn't want to trigger the browser plugins that detect and block trackers, which I myself use religiously.

That's a lot of conflicting requirements!

After trying several designs, I realized that what I was looking for was:

  1. A small CSS file that contained some useful part of the site's design. But not too much.
  2. A subdomain of emuu.net (and my other website, risktopics.com) that allowed me to cache all of the main content, and put a strict no-cache configuration on the tracker-lite CSS. A simple nginx configuration block was all I needed to separate the tracker from the content.

The CSS code loaded from the subdomain can be anything. In my case, I went with the CSS that loads the fonts.

Here's the nginx.conf bit that identifies 'bot traffic:

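    # Set $bot to 1 when the User-Agent looks like a crawler or monitoring tool.
    # Patterns starting with ~* are case-insensitive regular expressions.
    map $http_user_agent $bot {
        ~*(google|bing|pingdom|monitis.com|Zend_Http_Client) 1;
        ~*(http|crawler|spider|bot|search|ForusP|Wget/|Python-urllib|PHPCrawl|bGenius) 1;
        default 0;
    }

    # Assumed to belong at the http level: inside the map block, nginx would
    # read "access_log off;" as a pattern/value pair rather than a directive.
    access_log off;

    log_format ww '$time_iso8601 $host $http_referer $bot $remote_addr';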
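
To see which user agents those two patterns would flag, here is a rough Python sketch of the same matching logic. It is only an approximation for experimenting: nginx uses PCRE rather than Python's `re` module (the two should agree for patterns this simple), and the name `bot_flag` is my own invention, not part of nginx.

```python
import re

# The two regular expressions from the map block, matched case-insensitively
# (the ~* prefix in nginx). Copied verbatim, unescaped dot included.
BOT_PATTERNS = [
    re.compile(r"(google|bing|pingdom|monitis.com|Zend_Http_Client)", re.IGNORECASE),
    re.compile(
        r"(http|crawler|spider|bot|search|ForusP|Wget/|Python-urllib|PHPCrawl|bGenius)",
        re.IGNORECASE,
    ),
]


def bot_flag(user_agent):
    """Return 1 if either pattern matches anywhere in the User-Agent, else 0."""
    return 1 if any(p.search(user_agent) for p in BOT_PATTERNS) else 0


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0",
        "Wget/1.19.1 (linux-gnu)",
    ]
    for ua in samples:
        print(bot_flag(ua), ua)
```

The `http` alternative does a lot of the work, since many crawlers put a URL in their User-Agent string. On the other hand, a client that identifies itself only as something like `curl/7.55.1` matches neither pattern and would be counted as a real visitor.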

Note that the map block doesn't actually stop 'bot traffic from being logged. It does flag 'bot page views with a '1', whereas good traffic gets a '0', through the $bot field in the log format. I then discard the 'bot "page views" in the script that reports on the traffic.
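
The reporting side can be quite small. What follows is a minimal sketch of such a script, not the one I actually run. It assumes the `ww` log format shown above (time, host, referer, 'bot flag, address, separated by spaces) and that the referer is a good enough stand-in for "the page that was viewed", since that is the page that pulled in the stylesheet. The function names and the sample log lines are made up for illustration.

```python
from collections import Counter


def parse_line(line):
    """Split one `ww` log line into (time, host, referer, bot, addr).

    Returns None for lines that don't fit the expected five-field layout.
    """
    parts = line.split()
    if len(parts) < 5:
        return None
    time, host = parts[0], parts[1]
    bot, addr = parts[-2], parts[-1]
    # Everything between host and the bot flag is the referer; nginx logs
    # "-" when the header is absent.
    referer = " ".join(parts[2:-2])
    if bot not in ("0", "1"):
        return None
    return time, host, referer, bot, addr


def count_page_views(lines):
    """Count hits per referring page, skipping anything flagged as a 'bot."""
    views = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        _time, _host, referer, bot, _addr = record
        if bot == "1":
            continue  # flagged by the map block: not a real page view
        views[referer] += 1
    return views


# Hypothetical log lines, using documentation-only IP addresses.
SAMPLE_LOG = [
    "2017-10-13T09:15:02-04:00 ww.emuu.net http://emuu.net/a.html 0 203.0.113.7",
    "2017-10-13T09:15:40-04:00 ww.emuu.net http://emuu.net/a.html 1 198.51.100.9",
    "2017-10-13T09:16:11-04:00 ww.emuu.net http://emuu.net/b.html 0 203.0.113.8",
    "2017-10-13T09:17:30-04:00 ww.risktopics.com http://risktopics.com/ 0 192.0.2.44",
    "2017-10-13T09:18:00-04:00 ww.emuu.net http://emuu.net/a.html 0 192.0.2.45",
]

if __name__ == "__main__":
    for page, hits in count_page_views(SAMPLE_LOG).most_common():
        print(hits, page)
```

To run it against a real log, pass an open file instead of `SAMPLE_LOG`; `count_page_views` only needs something it can iterate over line by line.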

Here's the nginx config block for the subdomain(s):

    server {
        listen 80;
        server_name ww.emuu.net ww.risktopics.com;
        charset utf-8;
        root /yadda/yadda/emuu.net/ww;
        index index.html;

        error_log /dev/null;
        access_log /yadda/yadda/emuu.net/logs/ww_access.log ww;

        # Expire the tracker stylesheet immediately and keep it out of shared caches.
        location /fonts.css {
            expires 0;
            add_header Cache-Control private;
        }
    }

Note that this contrasts with the caching directives for the main site, which look like:

    add_header Cache-Control "public";
    expires 1y;
