Categories vs. tags

This entry is geared towards web authors in particular, but may be of interest to any philosophy enthusiast.

Mankind has always sorted information. Whether through literary tables of contents and book chapters, social hierarchies and castes, or mathematical sets and dichotomy algorithms, we segregate data into quantifiable chunks, either to digest the parts more easily or to scan the whole. Formally, this has been the case since , though some prehistoric cave paintings already show some animal classification.

Naturally, the Internet and its collection of websites are no exception. When it comes to blogs, the two prominent methods of classification are categories and tags. It is interesting to see how bloggers and blogging software deal with classification, because blogs are updated frequently and interact often, thus forming a live blogosphere. Inappropriate classification could quickly lead to a tangled mess.

Web authors first tackled the classification problem by mimicking printed documents, thereby creating tables of contents, headings, sections… The Internet started by mirroring the printed form, but its noble quest for a truly semantic web aims much higher. If done right, the semantic web will lead us to a new form of correlation and thought.

Categories vs. tags

Categories are pretty straightforward and need little explanation. They are just divisions in a web author’s conceptual system. Tags, on the other hand, are lists of keywords that the web author attaches to some data, usually blog posts or pictures. While categories and tags seem to perform the same task, I believe these concepts are very different in definition and goal.

Categories are first and foremost a logical view of some data. They have a very large scope and, when defined correctly, their names provide universality. By definition, categories exist to separate content, so a list of categories closely resembles a table of contents.

As for tags, they are keywords within a heap of data. They have little scope, so little in fact that they can sometimes relate to a single entry. Actually, a closer look shows that tags do not separate data; rather, they consolidate it. It is the sum of tags that mimics classification, but tags alone do not achieve segregation. Proof of this is that web authors rarely tag content with just one tag. More often than not, we see a collection of three to ten tags per entry. As a result, multiple tagging creates connections similar to the synapses of our brains.
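To make the contrast concrete, here is a minimal sketch in Python. The entries, category names and tags are all invented for illustration, and the two helper functions are hypothetical, not taken from any blogging software. Grouping by category splits the entries into separate piles, while shared tags draw links between entries that sit in different piles.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical entries: exactly one category each, several tags each.
entries = {
    "sunset-photo": {"category": "pictures", "tags": {"landscape", "color", "nature"}},
    "color-theory": {"category": "design", "tags": {"color", "theory"}},
    "hiking-trip": {"category": "travel", "tags": {"nature", "landscape"}},
}

def group_by_category(entries):
    """Categories segregate: each entry lands in exactly one group."""
    groups = defaultdict(list)
    for name, entry in entries.items():
        groups[entry["category"]].append(name)
    return dict(groups)

def links_by_shared_tags(entries):
    """Tags consolidate: any two entries sharing a tag become connected."""
    links = {}
    for a, b in combinations(sorted(entries), 2):
        shared = entries[a]["tags"] & entries[b]["tags"]
        if shared:
            links[(a, b)] = shared
    return links

print(group_by_category(entries))
print(links_by_shared_tags(entries))
```

With this sample data the three entries end up in three separate categories, yet the tags still connect the photo to both of the other entries.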

Where categories segregate, tags unify. Furthermore, categories segregate by both content and context, whereas tags only touch upon content, not context. This is why an increasing number of tags gets applied to an entry: what is supposed to help the reader turns into redundant data and a maintenance nightmare. Anyone familiar with relational models knows that it is normalisation, not redundancy, that lies at the core of a solid model. Normalisation is key, and solves many subsequent problems. For example, I may tag this entry as “category tag blogging”, while another blogger may tag a similar entry “categories tags web issue”. How will there be a match between both entries if the keywords are different? To make it worse, add misspellings! How about “catregory tags weblogs”?
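The mismatch is easy to reproduce. The sketch below, in Python, compares the three tag strings quoted above, first as typed and then after a deliberately crude normalisation. The function names are hypothetical, and the plural-stripping rule is an assumption made for this example only; it is not a real stemmer and will mangle many English words.

```python
def parse_tags(tag_string):
    """Split a space-separated tag string into a set of lowercase tags."""
    return set(tag_string.lower().split())

def crude_normalise(tag):
    """Very rough singular form: 'categories' -> 'category', 'tags' -> 'tag'.

    Assumption for illustration only; it does nothing about misspellings.
    """
    if tag.endswith("ies"):
        return tag[:-3] + "y"
    if tag.endswith("s"):
        return tag[:-1]
    return tag

def shared_tags(a, b, normalise=None):
    """Return the tags two tag strings have in common."""
    tags_a, tags_b = parse_tags(a), parse_tags(b)
    if normalise is not None:
        tags_a = {normalise(t) for t in tags_a}
        tags_b = {normalise(t) for t in tags_b}
    return tags_a & tags_b

mine = "category tag blogging"
other = "categories tags web issue"
misspelled = "catregory tags weblogs"

print(shared_tags(mine, other))                        # no exact match at all
print(shared_tags(mine, other, crude_normalise))       # 'category' and 'tag' now match
print(shared_tags(mine, misspelled, crude_normalise))  # only 'tag'; the typo still misses
```

Compared as typed, the tag sets have nothing in common. Folding plurals recovers two matches, but the misspelled “catregory” stays unmatched, which is exactly the maintenance burden that free-form keywords bring.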

So why, oh why, do people tag? The only reason I can see is the synapse metaphor presented earlier. Our brain makes lightning-speed connections between terms, and we are trying to mimic this on the web. What we forget is that we don’t all share the same brain: our opinions and cultural backgrounds will not always lead to the same tags, whereas proper categorisation will seldom differ. Aristotle introduced categories for a reason, and we cannot merely dismiss more than 2000 years of philosophical work with a wave of the hand.

In defense of the lowly tags, I want to add that they are actually priceless for pictures, perhaps because one picture may be reused in different contexts. For example, a landscape with vivid colors could be used in a nature set or in a color theory set. Texts, on the other hand, do not follow the same principles… Perhaps we should leave categories for texts and tags for pictures? Or perhaps this will sooner or later evolve into a compromise solution. I’ll be on the lookout.

Further reading: