On page 74 of Web Designer magazine #130 is an article on making tag clouds in PHP. I suppose you could do it their way, but a few little things left me puzzled about their implementation, and I thought I would give it a go myself.
I’m a big, big fan of associative arrays in PHP. Most people who know arrays know that they are variables that contain a set of elements indexed numerically, but PHP can index arrays in two ways. An associative array allows you to use a string key to identify each element of the array, rather than a number. The keys have to be unique, but for the purposes of creating a tag cloud, this is perfect because we only want to list each tag once.
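If the difference between the two kinds of index is new to you, here is a minimal sketch. The variable names are made up for illustration and aren’t used elsewhere in this article.

```php
<?php
// A numerically indexed array: PHP assigns the keys 0, 1, 2 for us.
$tag_list = array( 'foo', 'bar', 'baz' );

// An associative array: each tag name is a key, each count is its value.
$tag_counts = array(
    'foo' => 2,
    'bar' => 3,
);

// Assigning to an existing key overwrites the value rather than adding
// a second element, so a tag can only ever appear once.
$tag_counts['foo'] = 5;

// Count a new sighting of a tag, creating the key the first time it is seen.
$seen = 'baz';
$tag_counts[$seen] = isset( $tag_counts[$seen] ) ? $tag_counts[$seen] + 1 : 1;
```

That uniqueness is the property the rest of the code leans on: one key per tag, one value per key.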
Using an associative array for the tags makes a huge difference in the amount of work you need to do. Using some smarts in other areas can offer some improvement, too. Let’s step through the process and document it a bit to see how it goes.
First, let’s define some CSS class names that represent the frequency of use of each tag:
// These are the class names that we’re going to use for frequency
$frequencies = array( 'used_never', 'used_infrequently', 'used_frequently', 'used_continuously' );
You may prefer to use other class names than these, but you get the idea. ‘used_never’ is for tags that are hardly ever used, ranging up to ‘used_continuously’, which is, uh, used continuously. One thing to note here that’s going to be different from the Web Designer article is that this code will let you add new frequencies, as long as you keep them in order. You’d have to adjust parts of the article’s code to make this happen. Moving on, let’s look at loading the data, which is where the biggest change is.
We want to get our data loaded into an associative array, rather than a strangely-indexed multi-dimensional array. Here’s some PHP that would load the data directly:
// Here are some predefined tag frequencies
$tags = array(
'foo' => 2,
'bar' => 3,
'baz' => 14,
'qux' => 10,
'Freddy Mercury' => 1,
);
Let’s figure out how to get the data into the array, something the Web Designer article completely overlooks.
If you’re using PEAR’s MDB2 classes, you can load this array super-easy:
// Get the tags using MDB2
$tags = $database->getAssoc('SELECT tag, count(tag) AS used_times FROM tags GROUP BY tag ORDER BY tag');
The getAssoc() function returns an associative array, using the first column as the key and the second column as the value. It’s pretty slick, and we should probably add something like that to Habari’s DatabaseConnection class, because it’s pretty useful.
If you’re using something like Habari’s DatabaseConnection class, you need to do a little more work:
// Get the tag data using Habari’s default DatabaseConnection
$temp_tags = DB::get_results('
SELECT tag_text, count(id) AS used_times
FROM habari__tags
INNER JOIN habari__tag2post ON habari__tag2post.tag_id = habari__tags.id
GROUP BY id
ORDER BY tag_text
');
// Convert the tag data into an associative array:
$tags = array();
foreach($temp_tags as $tag) {
$tags[$tag->tag_text] = $tag->used_times;
}
That’s how to get the data out of Habari’s database, if you’re using the default table prefixes, but you can use the same general idea for any database that stores tags.
Note that the foreach() loop above can be used to convert any existing array of data into an associative array for our purposes, as long as the keys are unique. Ok, so you’ve got the data into the right format – now things get much easier.
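If your rows come back as plain arrays rather than objects, PHP’s built-in array_column() (PHP 5.5 or later) can do the same conversion without a loop. This is only a sketch: the $rows data and its column names are hypothetical stand-ins for whatever your database layer returns.

```php
<?php
// Hypothetical rows, shaped like what a database layer might hand back.
$rows = array(
    array( 'tag' => 'foo', 'used_times' => 2 ),
    array( 'tag' => 'bar', 'used_times' => 3 ),
    array( 'tag' => 'baz', 'used_times' => 14 ),
);

// array_column( $input, $value_column, $key_column ) builds the
// tag => count array in one call.
$tags = array_column( $rows, 'used_times', 'tag' );
// $tags is now array( 'foo' => 2, 'bar' => 3, 'baz' => 14 )
```

If two rows share the same tag, the later one overwrites the earlier one, which is the same thing the foreach() version would do.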
Steps 6 through 9 of the Web Designer article talk about looping through the array values to get the number of times the most-used tag was used, and the number of times the least-used tag was used. This is silly. Try this instead:
// Get the max and min used frequencies.
$used_max = max( $tags );
$used_min = min( $tags );
// Get the range of frequencies
$range = $used_max - $used_min;
We don’t need to know which tags were most and least used, just the numbers. That makes it simpler to use the above than to write out a whole loop. The only reason I can see that you’d assume you had to use your own loop is that your editor’s intellisense doesn’t mention that you can use an array as an argument for min()/max(). Too bad.
In the step above we’ve also calculated the range of frequencies used. Both the minimum and the range will be useful later.
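To make those numbers concrete, here is the same calculation run against four of the sample counts from earlier. It’s a quick worked example, nothing more.

```php
<?php
// Four of the sample tag counts from the predefined array.
$tags = array( 'foo' => 2, 'bar' => 3, 'baz' => 14, 'qux' => 10 );

// max() and min() accept an array and compare its values; keys are ignored.
$used_max = max( $tags ); // 14
$used_min = min( $tags ); // 2

// The spread between the most-used and least-used tags.
$range = $used_max - $used_min; // 12
```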
Steps 10 through 18 in the original article are utter insanity. They create a couple of nested loops that assign classnames to a temporary array, indexed on the key of the multi-dimensional array… It’s even hard to describe! Let’s try a more sane approach.
We’re going to go through all of our tags, figure out which size range they fit into, and change the value in the array from a number into the actual class name we want that tag to use on output. How do we do this? A callback function!
There are a few really useful functions in PHP that require callbacks. A callback is simply a function that you provide as an argument to another function. That other function calls your callback as required to produce its own output. In this case, we’re going to use array_map().
array_map() is a PHP function that passes each member of an array through a callback function and returns a new array of the results. The callback has one input argument (actually, this is not entirely true, but for our purposes it’s good enough) which is a single array element, and the value it returns takes that element’s place in the new array.
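Here is array_map() on its own before we point it at the tags. The double_it() callback is a throwaway I made up for this illustration.

```php
<?php
// A throwaway callback for illustration: doubles whatever it is given.
function double_it( $value )
{
    return $value * 2;
}

$counts = array( 'foo' => 2, 'bar' => 3 );

// array_map() calls double_it() once per element and returns a new array.
// With a single input array, the string keys are preserved.
$doubled = array_map( 'double_it', $counts );
// $doubled is array( 'foo' => 4, 'bar' => 6 ); $counts is unchanged.
```

The key preservation matters for us, because the keys are the tag names.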
Our callback function needs to know a few things, so we’re going to have to use a couple of global variables. I don’t like doing this. The best way around using globals in this case is to use a class. If you were developing in Habari, this would be a cinch, since nearly everything is a class. But for this example, it’s globals.
Our callback function:
/**
 * Returns a class name relating to the frequency of a used tag
 * @param integer $used_times Number of times a tag was used
 * @return string The associated class name
 **/
function count_to_class( $used_times )
{
// Make the variables we created outside of this function
// available inside this function
global $used_min, $range, $frequencies;
// Get the number of frequency classes, zero-indexed
$frequency_count = count( $frequencies ) - 1;
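// Get the index of the frequency array that holds our classname.
// If every tag was used equally often, $range is 0, so skip the division.
$frequency_index = ( $range == 0 ) ? 0 : (int) round( ($used_times - $used_min) / $range * $frequency_count );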
// Return the frequency class name
return $frequencies[ $frequency_index ];
}
The $frequency_index is calculated using a formula that looks complicated, but is actually pretty simple. It first figures out how far along the overall range the tag’s frequency falls, as a fraction between 0 and 1. Then it scales that fraction to an index into the $frequencies array. I used round() to get a nice distribution between the class names; if you used floor() instead, only the most-used tag would ever get your highest frequency class.
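A quick worked example of the round() versus floor() difference, using the minimum of 1 and maximum of 14 from the sample data. The tag used 12 times is hypothetical; it isn’t in the sample array.

```php
<?php
// Values matching the sample data: counts run from 1 to 14,
// and four class names give a highest index of 3.
$used_min = 1;
$range = 13;
$frequency_count = 3;

// A hypothetical tag used 12 times sits about 85% of the way up the range.
$position = ( 12 - $used_min ) / $range * $frequency_count; // roughly 2.54

$with_round = (int) round( $position ); // 3, the top class
$with_floor = (int) floor( $position ); // 2, one class lower

// Only the maximum count itself reaches the top index under floor().
$top_with_floor = (int) floor( ( 14 - $used_min ) / $range * $frequency_count ); // 3
```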
Ok, so the callback function is defined. Let’s use it:
// Process the tag array using the callback
$tags = array_map( 'count_to_class', $tags );
Too simple? Maybe. Note that we have to pass the callback function name as a string. PHP is weird.
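If the globals bother you as much as they bother me, the class-based approach might look something like this. It’s a rough sketch: TagCloudSizer is a name I invented, not part of Habari or the magazine’s code, and it adds a guard for the case where every tag has the same count.

```php
<?php
// A hypothetical class, not part of Habari or the magazine's code.
class TagCloudSizer
{
    private $frequencies;
    private $used_min;
    private $range;

    public function __construct( array $frequencies, array $tags )
    {
        $this->frequencies = $frequencies;
        $this->used_min = min( $tags );
        $this->range = max( $tags ) - $this->used_min;
    }

    // Same formula as the global version, reading object properties instead.
    public function count_to_class( $used_times )
    {
        if ( $this->range == 0 ) {
            // Every tag has the same count; fall back to the first class.
            return $this->frequencies[0];
        }
        $frequency_count = count( $this->frequencies ) - 1;
        $index = (int) round( ( $used_times - $this->used_min ) / $this->range * $frequency_count );
        return $this->frequencies[ $index ];
    }
}

$frequencies = array( 'used_never', 'used_infrequently', 'used_frequently', 'used_continuously' );
$tags = array( 'foo' => 2, 'bar' => 3, 'baz' => 14, 'qux' => 10 );

$sizer = new TagCloudSizer( $frequencies, $tags );
// A method callback is passed as array( object, 'method name' ).
$classes = array_map( array( $sizer, 'count_to_class' ), $tags );
```

The callback is still passed by name, only now as an array holding the object and the method name.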
At this point we have an associative array with keys that are our tags, and values that are the classnames for their output. We can output all of that very easily, like this:
// Output the tags as list items
echo '<ul id="tagcloud">';
foreach( $tags as $tag => $class ) {
echo "<li class=\"{$class}\">{$tag}</li>
";
}
echo '</ul>
';
The foreach() loop construct allows us easy access to the associative keys in the array, using the double-arrow. The key is on the left, the value is on the right. We output them in the unordered list, and presto! Tag cloud.
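One thing worth considering if your tags come from user input: a tag containing & or < would be written into the page as-is. A variation that escapes each tag with htmlspecialchars() might look like this. The sample tags are invented for the example, and it builds a string rather than echoing directly so you can inspect the result.

```php
<?php
// Hypothetical tag => class pairs; one tag contains characters HTML cares about.
$tags = array(
    'foo' => 'used_never',
    'R&B <live>' => 'used_frequently',
);

$html = '<ul id="tagcloud">' . "\n";
foreach ( $tags as $tag => $class ) {
    // htmlspecialchars() turns &, <, > and double quotes into entities.
    $html .= '<li class="' . htmlspecialchars( $class ) . '">' . htmlspecialchars( $tag ) . "</li>\n";
}
$html .= "</ul>\n";

echo $html;
```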
Apply a little CSS, and you’re home free:
#tagcloud {
list-style:none;
}
#tagcloud li {
display: inline;
padding: 0.2em;
}
#tagcloud li.used_never {
font-size: 0.8em;
}
#tagcloud li.used_infrequently {
font-size: 1.0em;
}
#tagcloud li.used_frequently {
font-size: 1.5em;
}
#tagcloud li.used_continuously {
font-size: 2.0em;
font-weight: bold;
}
Web Designer magazine styles in font-size percentages. The other Owen says to try percent. I’ve given you ems, in case you’re into that.
Enjoy.