owen

Over the past couple of months, and particularly over the last two days, I have become interested in what kinds of visitors I am getting to the site and where they’re coming from. There are a few stats packages out there, and I’ve been working out their benefits and flaws.

I was able to jump on the Google Analytics train before they closed their doors. I installed the required script in the necessary places and waited for stats to roll in. And I waited. And I waited.

Eventually they had enough data to play with (do I not even register in the Google Analytics world as anything but a blip?) and they started showing me some statistics. But. How do you use this thing? Even as a former Urchin 5.0 user, I was still somewhat confused by the interface. Suffice it to say, I never really figured it out, and I started to suspect that using Analytics was adversely affecting my AdSense (though I can offer no evidence that this is anything but paranoia), so I simply removed the tags.

Isn’t there something where you can just push a button and get the stats you need? And exactly what stats do you need?

I also have a copy of Mint that I use pretty regularly.

There are things I like about Mint. Peppers (Mint’s plug-ins) are great. I can add the functionality I want, provided someone writes the pepper for it. The default peppers aren’t all that informative by themselves, but combined with outgoing-click and referrer data, they become very useful.

There are two problems I have with Mint, though. First (and this was something that I liked when I started using it), I hate the UI.

No, really. It’s fundamentally a pretty and elegant design. But I can’t stand looking at it any more. It’s a big long page of data and it’s all green. Green! OK, so that’s just an aesthetic issue, but shouldn’t the important data be right there in front of me? Yeah, sure, I can drag things around to put important stuff near the top, but shouldn’t this type of software know which stats I’m going to be interested in and put them up front? OK, so maybe that’s just me, but my second issue isn’t.

There is no historical data in Mint. Or very little. I haven’t figured out a way to see trends over a longer period than the “right now” that Mint shows me.

What was the most frequently searched term yesterday? Last week? Last month? Year? Who knows?!

It’s possible that Mint is storing all of this information somewhere that I can’t see, and that I can access it via a pepper, but I haven’t found the pepper that releases that functionality yet.

There is at least one WordPress plugin that tracks statistics, too.

StatTraq is one such plugin. It seems to create a new record in the database for each hit (which is to be expected) but never cleans any of them up. So I can get historical data as granular as a single hit, at the expense of a pretty darn large database. I haven’t looked very deeply, but although I expected some kind of data cleanup, I could not find a SQL “DELETE” statement anywhere in the code.
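
A common remedy for a database that only grows is to keep raw hits for a recent window and fold anything older into per-day totals before deleting it. Here is a minimal sketch of that idea in Python. It assumes a hypothetical two-table SQLite schema (`hits` and `daily_counts`), and it is not StatTraq’s actual schema or code.

```python
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per raw hit, plus a compact per-day summary.
    conn.execute("CREATE TABLE IF NOT EXISTS hits (hit_time TEXT, url TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_counts (day TEXT, url TEXT, hit_count INTEGER)"
    )


def roll_up_and_prune(conn: sqlite3.Connection, cutoff_day: str) -> int:
    """Summarize raw hits from days before cutoff_day ('YYYY-MM-DD') into
    daily_counts, then delete those raw rows. Returns how many were deleted."""
    with conn:  # one transaction: the summary and the DELETE commit together
        conn.execute(
            """INSERT INTO daily_counts (day, url, hit_count)
               SELECT date(hit_time), url, COUNT(*) FROM hits
               WHERE date(hit_time) < ?
               GROUP BY date(hit_time), url""",
            (cutoff_day,),
        )
        cur = conn.execute("DELETE FROM hits WHERE date(hit_time) < ?", (cutoff_day,))
    return cur.rowcount
```

Run periodically, for example from a scheduled job, this keeps day-level history for trend graphs while the raw-hit table stays small.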

StatTraq also doesn’t seem to record hits for pages outside of WordPress. Again, I’m not sure this is strictly true, but I don’t see a way for the hit-recording code to be executed from outside WordPress.

A nice thing about StatTraq is that it is integrated with WordPress. It stores actual post IDs, not just URLs, so you know when a specific post is being read, whether through a friendly permalink or the crufty “?p=2482” URLs that sometimes sneak by.
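
Storing post IDs pays off because differently shaped URLs for the same post collapse to one key. Here is a rough sketch in Python. The `PERMALINKS` table and the `post_id_for` helper are hypothetical stand-ins for asking WordPress itself which post a URL belongs to.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical lookup of friendly permalinks; a real plugin would ask WordPress.
PERMALINKS = {"/2006/03/sample-post/": 2482}


def post_id_for(url: str) -> Optional[int]:
    """Map either URL form for a post to the same post ID, or None if unknown."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    values = query.get("p")
    if values and values[0].isdigit():
        return int(values[0])  # the crufty "?p=2482" form
    return PERMALINKS.get(parts.path)  # the friendly permalink form
```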

StatTraq also records a good range of tracking data, which you don’t get from the simple 3rd-party counter services that display a hit count as a graphic at the bottom of the page.

I don’t really trust 3rd parties with my stats. Even in the case of Google, I was worried that they were using my stats to influence the ads placed on my site. It could be paranoia, but if the stats are hosted locally, then I don’t have to worry about any of that.

Of course, there are log-reading scripts that will accomplish similar tasks by running on your server. Webalizer is an example of such a script.

What makes scripts like Webalizer different from Google Analytics or Mint is that they don’t collect data on their own, but use the existing server logs. This has benefits and drawbacks.

One benefit is that the web server usually keeps the most accurate account of what files it served. Using that log, you should be able to create a perfect picture of exactly what requests were made. This is particularly so when comparing Webalizer to Google Analytics or Mint, since both of those products rely on JavaScript. If a visitor has JavaScript disabled, or uses a browser whose JavaScript can’t execute the logging code (like a mobile phone browser), then no hit is recorded!
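
Every log analyzer begins with the same step: splitting each line of the server log into fields. Here is a minimal sketch of that step in Python. It assumes the Apache/NCSA “combined” log format, which many hosts use, so check your server’s configuration. The names `parse_line` and `Hit` are illustrative and are not taken from Webalizer.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    host: str
    time: datetime
    path: str
    status: int
    referer: str
    agent: str


def parse_line(line: str) -> Optional[Hit]:
    """Parse one combined-format log line; return None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None  # malformed lines or other log formats are skipped
    return Hit(
        host=m["host"],
        time=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        path=m["path"],
        status=int(m["status"]),
        referer=m["referer"],
        agent=m["agent"],
    )
```

Note what the format does not contain: anything about screen size or other browser details that only client-side JavaScript can report.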

Another benefit is that because the server is likely already storing these logs, it doesn’t take an additional chunk of space to store hits in a database. Usually programs like Webalizer generate their reports at set intervals as static HTML pages of results. And that brings us to the downside.

Because Webalizer isn’t typically run in a “live” mode, you don’t see real-time statistics. Instead, you get periodic snapshots of particular ranges of time. Perhaps that’s enough of a view of statistics for you, but I prefer to see trend information as graphs over time, not just an aggregate snapshot of data during a range of time.

Also, because it doesn’t use anything beyond the standard server logs to hold data, the results lack any information that only the JavaScript used by other programs can collect. For example, the browser’s screen resolution is not typically stored in the server logs. If you’re looking to your stats to determine the best size for a redesign of your site, a program like Webalizer isn’t going to provide that information.

Another possibly troubling aspect of using only server logs is that there really isn’t a great way to reliably track sessions (what Webalizer calls “visits”). Knowing the number of unique visitors versus the number of total hits can tell you if people are reading more than one page from your site when they visit. This is important if you care about “stickiness”.
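
Server logs carry no session identifier, so a log analyzer has to guess where one visit ends and the next begins. The sketch below uses a common heuristic under assumed rules: a visitor is an IP address plus a user-agent string, and 30 idle minutes ends a visit. It shows why the result is only an estimate. Visitors behind a shared proxy look like one person, and a visitor whose IP address changes looks like two. `count_visits` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Assumed timeout: a new visit starts after 30 idle minutes (a common convention).
VISIT_TIMEOUT = timedelta(minutes=30)


def count_visits(hits: List[Tuple[str, str, datetime]]) -> int:
    """hits: (ip, user_agent, timestamp) tuples in any order.
    Returns the estimated number of visits."""
    last_seen: Dict[Tuple[str, str], datetime] = {}
    visits = 0
    for ip, agent, when in sorted(hits, key=lambda h: h[2]):
        key = (ip, agent)  # crude identity; shared IPs and proxies blur this
        previous = last_seen.get(key)
        if previous is None or when - previous > VISIT_TIMEOUT:
            visits += 1  # first hit from this visitor, or they were idle too long
        last_seen[key] = when
    return visits
```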

Plus, aren’t those stats kind of ugly? I guess they’re functional.

AWStats is another stats package in the group of log analyzers. Once again, it’s relying on the server logs for periodic publication of reports. It does look a whole lot better, though.

There are a couple of other projects worth mentioning in statistics tracking.

Blogbeat is a relative newcomer to the statistics arena. The project is still in development, although it seems complete.

Blogbeat uses the JavaScript method (like Google Analytics) to record statistics. It is also a 3rd-party service, so if you’re uncomfortable with your stats being fed to some other site, this one isn’t for you either.

Blogbeat has a very simplified interface for providing just the stats that it thinks you will want. I like the idea of reducing the clutter of overwhelming and often incomprehensible information provided by stats packages. I’m all for it. Still, Blogbeat seems a tad light on per-page content.

Something I really like about Blogbeat is the integration of FeedBurner statistics. FeedBurner has an API that lets you obtain minimal information about feed readers, and Blogbeat integrates that into its graphs. That’s pretty cool.

Something that bugs the hell out of me is how Blogbeat uses links to itself for every logged page. For example, imagine that it’s telling you that 300 people read the page titled “A Nose to Honk By”. You don’t remember what that post is about at all. You want to read that post to see what they’re talking about. So you click the link. Guess where it doesn’t go. In fact, there is no link that goes to that page of your site. There is a URL displayed, but it’s not linked. Perhaps this is just a small oversight.

Blogbeat has reasonable granularity. It can give you statistics for progressively longer date ranges: Today, Week, Month, Year, All. So you can find out what the most popular post is for this month or last month, but not the month before that, unless you also toss in all the popular posts for the last year. I don’t see this as a problem, really, and if you’re looking for more detail than that, then you probably need a higher-end service.
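
Keeping per-day totals, instead of only a fixed set of buckets, is what would make a question like “the month before last” answerable. Here is a minimal sketch in Python that assumes those daily totals already exist as `(day, page, count)` rows. `top_pages` is a hypothetical helper.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple


def top_pages(daily_counts: Iterable[Tuple[date, str, int]],
              start: date, end: date, n: int = 5) -> List[Tuple[str, int]]:
    """Most-read pages between start and end (inclusive), from per-day totals."""
    totals = Counter()
    for day, page, count in daily_counts:
        if start <= day <= end:  # any range works, not just preset buckets
            totals[page] += count
    return totals.most_common(n)
```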

Also, everything I’ve mentioned so far has been free to use and/or install. Blogbeat is free for the first 30 days, then it’s anywhere from $6 to $79 per month, depending on how many hits your site gets.

Another option for stats is Measure Map, which doesn’t seem to be available to the public yet. Most people who have received invitations to use it have been saying good things about it. I signed up for an invite months ago and haven’t heard anything, so I suppose I’ve been snubbed by automation.

From what reviews I’ve read and screenshots I’ve seen, the nicest thing about Measure Map is that it’s pretty. There are fewer endless dull charts of itty-bitty stats crap here, and things seem to be visualized a bit better than they are in other packages.

But I haven’t personally reviewed the software, so I’ll reserve any opinion until that happens.

So what’s all this talk of statistics about, anyway?

I’m trying to come up with some dream features and the technical requirements to support them.

When you’re looking at stats, why are you looking at them? What are you hoping to see?

I’ve had some people already tell me what they use stats for. Here are a few things that I’ve learned:

  • People want to see who is linking to them and which of those links are being followed.
  • They want to know which of their posts are pulling AdSense ads that generate clicks so that they can write similar content.
  • They want to see search trends: which search terms from Google and Yahoo become more popular over time.
  • They want to learn which pages are the most popular during specific time ranges.
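
Search trends can be mined from the referrer field, which both server logs and JavaScript trackers capture, because a click from a search results page carries the query in its URL. Here is a minimal sketch in Python. It assumes that Google passes the search terms in the `q` parameter and Yahoo passes them in `p`; treat both as assumptions. The helper names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Assumed mapping of search-engine host fragments to their query parameter.
SEARCH_PARAMS = {"google.": "q", "search.yahoo.": "p"}


def search_terms(referer: str) -> Optional[str]:
    """Return the normalized search terms in a referrer URL, or None."""
    parts = urlsplit(referer)
    host = parts.netloc.lower()
    for marker, param in SEARCH_PARAMS.items():
        if marker in host:
            values = parse_qs(parts.query).get(param)  # decodes "+" to spaces
            if values:
                return values[0].strip().lower()
    return None


def monthly_search_trends(hits: Iterable[Tuple[datetime, str]]) -> Dict[str, Counter]:
    """Count search terms per 'YYYY-MM' month from (timestamp, referer) pairs."""
    trends: Dict[str, Counter] = defaultdict(Counter)
    for when, referer in hits:
        terms = search_terms(referer)
        if terms:
            trends[when.strftime("%Y-%m")][terms] += 1
    return trends
```

Comparing one month’s counter against the next shows which terms are rising.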

I can think of a few more things that might be interesting to correlate. I’m interested in knowing anything else that people might want to get out of their statistics.

In my opinion, it would be ideal if a statistics package tracked both WordPress and non-WordPress page requests, was as easy to get up and running as installing a new WordPress plugin, and presented essential data up front and in a more visual way whenever possible. It should allow at least some historical data to be recalled, and while it should not rely on a third party for that information, it should also not chew through server resources to store it. It should also be extensible, so that when it doesn’t do exactly what you want, you can easily add the missing piece.

Have any other ideas? Even if you’re satisfied with the solution you’re using now, it would be helpful for me to know what about that solution satisfies you. Because, as the good salesman says:

“Does your stats program satisfy you? Good, I’m glad it does, because my stats program isn’t going to satisfy you. It’s going to make you ecstatic.”