Designing a CMS Architecture

When faced with the choice between an off-the-shelf CMS and a custom development, many companies pick solutions like ezPublish or Drupal. In addition to being free, these CMSs seem to fulfill every possible requirement. But while choosing open-source software is a great idea, going for a full-featured CMS may prove more expensive than designing and developing your own Custom Management System.

Hidden Costs

What does it cost to integrate and deploy a website based on an open-source CMS? At first sight, not much. As with any CMS, you have to design your own templates and fill your website with initial data. But additional costs pop up as soon as you need a little more than plain content management.

Think about adding a blog or a forum to a website managed by a CMS. There are modules or plugins for that, but they never provide the same flexibility as standalone blogging engines such as WordPress, or standalone forum engines like phpBB. So even if a module fulfills the basic requirement, you will always, always need to adapt its code.

And this is where it gets ugly. The code base of open-source CMS engines and their plugins is nowhere near as good as what you see in RAD frameworks these days. Most of them are based on a very old architecture (PHP4, no object orientation, no proper error handling, direct access to the database, etc.). That means that changing anything will be very painful, and very expensive. You will encounter numerous bugs. You will switch blogging plugins three times because none of the ones you tested can do what you need. You will upgrade your CMS to the latest version to benefit from the single bug fix that should save your life, only to find that you need to change all your existing configuration…

This is as bad as it sounds. Change one single line of code in an application built on top of Drupal or ezPublish, to name only the two major ones, and you are in trouble. The moment you need something that is not natively supported, you enter the Dark Zone of CMS hell. You are going to spend a lot of money on development. You will never see the end of the tunnel. That is, until someone says, a few years from now, “Do we need all that crap? Let’s build something that fits our needs and that actually works”.

Making Your Own CMS

Given the number of available open-source CMS solutions, building one of your own sounds like a stupid idea. But if your website is 50% content management and 50% something else, you probably need to start with a web application framework like symfony or Django rather than a CMS. These frameworks provide plugins that already do part of the content management job, so creating a CMS today is like assembling Lego bricks to build something that fits your needs exactly.

Take symfony, for instance. It provides native support, or support through plugins, for:

Symfony doesn’t yet provide an Access Control List or a Workflow plugin, but you can already put all of the above together and have a pretty powerful CMS engine.

A tailor-made CMS will always have less code and show better performance than any of the existing full-featured solutions. Also, you will be able to tweak it completely, since all the components are decoupled, and built with extensibility in mind.

Your custom CMS will cost you more during the first year, but if you expect your website(s) to live longer than that, then the benefit will become obvious after a year and a half. Plugging the CMS features into other parts of the website, adding features unrelated to content management, scaling to a larger audience, replacing the database engine or the caching backend, all that will be painless.

That is, if you design your custom CMS carefully, and with the future in mind.

Environments

When you add features to an application, you need a testing environment: a place where you can check that the additions work and don’t break the rest of the application. That means that developers have a version of the website on their desktop computers, where they make changes. Then they upload the application to a test server, check that everything is OK, and only then deploy it to the production server. This is a very common practice, often backed by source version control and continuous integration tools.

But what happens when a new feature is made of data rather than code? In ezPublish, for instance, to define a new type of content (they call it a “Class”), you have to use the backend web interface and fill in a few forms. The properties of the new type of content are stored in the database. To deploy this new type of content from the testing environment to the production environment, developers need to transfer data from one database to another, without wiping out unrelated information in the production database, such as user comments, statistics, etc.

Deploying new features in this context means executing some SQL code on each server. This is much more dangerous than just pushing a new version of the codebase, especially when the data model is made of many tables glued together in complex joins. That’s why, in many websites based on ezPublish, developers add features directly on the production environment, or repeat the configuration using the backend interface on every environment. This is either a high risk or a large waste of time.
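To make the problem concrete, here is a minimal sketch, in Python with SQLite, of the simplest possible case: copying content class definitions from a testing database to a production database without touching user data. The table names (`content_class`, `comment`), their columns and the helper functions are hypothetical and do not reflect ezPublish’s actual schema, which spreads a class over many related tables.

```python
import json
import sqlite3

# Hypothetical schema: content class definitions live in the same database
# as user-generated data such as comments.
SCHEMA = """
CREATE TABLE IF NOT EXISTS content_class (
    identifier TEXT PRIMARY KEY,
    name       TEXT NOT NULL,
    attributes TEXT NOT NULL  -- JSON-encoded list of attribute names
);
CREATE TABLE IF NOT EXISTS comment (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);
"""

def export_classes(source):
    """Read every content class definition from the testing database."""
    return source.execute(
        "SELECT identifier, name, attributes FROM content_class"
    ).fetchall()

def deploy_classes(target, rows):
    """Upsert class definitions into production; no other table is touched.

    ON CONFLICT ... DO UPDATE requires SQLite 3.24 or later.
    """
    with target:  # commits on success, rolls back on error
        target.executemany(
            """INSERT INTO content_class (identifier, name, attributes)
               VALUES (?, ?, ?)
               ON CONFLICT(identifier) DO UPDATE SET
                   name = excluded.name,
                   attributes = excluded.attributes""",
            rows,
        )

testing = sqlite3.connect(":memory:")
production = sqlite3.connect(":memory:")
for db in (testing, production):
    db.executescript(SCHEMA)

testing.execute(
    "INSERT INTO content_class VALUES (?, ?, ?)",
    ("recipe", "Recipe", json.dumps(["title", "ingredients", "time"])),
)
production.execute("INSERT INTO comment (body) VALUES ('Great article!')")

deploy_classes(production, export_classes(testing))
```

Even this one-table case needs a hand-written, transactional and idempotent script; every additional joined table makes it longer and riskier.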

Data, or Code?

This environment drawback has a major influence on the choice of features a CMS should provide. For almost every CMS feature, you should ask: can the user do that through the backend interface, or do we need a programmer to add a new element? In other words, is the feature made of data, or of code?

Off-the-shelf CMS engines will almost always answer ‘Data’. In my opinion, that is wrong in many cases. Content types are just one example; think about workflows or page layouts, for instance. They define complex logic that always translates to code, and giving the user the ability to change them via a backend interface means storing code in the database and evaluating it at runtime. Then you can’t use opcode cache engines like APC to increase your website’s performance. And deploying that to production is a nightmare.
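By contrast, a workflow kept in code can be as simple as a transition table in a module. The sketch below is in Python rather than PHP, but the idea carries over: the workflow is versioned and deployed with the rest of the application instead of being evaluated from the database. The four states and the `transition()` helper are hypothetical, not taken from any particular CMS.

```python
from enum import Enum

class State(Enum):
    DRAFT = "draft"
    PENDING_REVIEW = "pending_review"
    PUBLISHED = "published"
    ARCHIVED = "archived"

# Hypothetical validation workflow: each state maps to the states it may move to.
# Changing it is a code change that goes through version control and testing.
TRANSITIONS = {
    State.DRAFT: {State.PENDING_REVIEW},
    State.PENDING_REVIEW: {State.DRAFT, State.PUBLISHED},
    State.PUBLISHED: {State.ARCHIVED},
    State.ARCHIVED: set(),
}

class WorkflowError(Exception):
    """Raised when a content item tries a transition the workflow forbids."""

def transition(current: State, target: State) -> State:
    """Return the new state, or raise WorkflowError if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise WorkflowError(f"cannot go from {current.value} to {target.value}")
    return target
```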

Some companies think that most CMS features should be accessible via a backend interface, so that the application can be enhanced without additional development. But this is an illusion. For one, the configuration of content classes in ezPublish is so complex that it does indeed require a PHP developer, and an expensive one, since experience with ezPublish is one of the most in-demand skills on the IT market (at least in France). More features mean more development, and there is no CMS out there that replaces the power of a programming language with a web interface.

So that leads to a good rule of thumb: design your features so that they are made of code rather than data. That applies to elements that could be modified either through a graphical user interface or programmatically:

    • Content classes

    • “Widgets” or “Components” for pages

    • Page layouts or “templates”

    • Content validation workflow

    • Tasks
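As an illustration of content classes made of code, here is a minimal Python sketch of a hypothetical registry in which each class is a plain dataclass. The `content_class` decorator, the `CONTENT_CLASSES` dictionary and the attribute names are assumptions for the example. A backend could still read this registry to build its forms, but adding a class becomes a reviewed, deployable code change.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Type

# Hypothetical registry: every content class is declared in source code and
# registered by name, so adding one is a code change, not a database change.
CONTENT_CLASSES: Dict[str, Type] = {}

def content_class(cls):
    """Class decorator that registers a content class under its lowercased name."""
    CONTENT_CLASSES[cls.__name__.lower()] = cls
    return cls

@content_class
@dataclass
class Article:
    title: str
    author: str
    summary: str = ""
    body: str = ""

@content_class
@dataclass
class Recipe:
    title: str
    ingredients: List[str] = field(default_factory=list)
    preparation_minutes: int = 0

def attribute_names(type_name: str) -> List[str]:
    """List the attributes of a registered content class, e.g. to build a backend form."""
    return [f.name for f in fields(CONTENT_CLASSES[type_name])]
```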

Fundamental Questions

The complexity of a CMS engine depends greatly on the answer you give to a few fundamental questions:

    • Can contents exist independently of a page?

    • Can contents exist at more than one place in the website?

    • Are there several views for a single piece of content?

    • Can contents have different versions simultaneously?

    • Can contents be modified in the backend and remain unchanged in the frontend?

    • Can users compose a page with “widgets” or “components” in a WYSIWYG interface?

    • Can predefined zones in a template contain more than one “widget” or “component”?

    • Can section pages have different templates?

    • Can section pages have different versions simultaneously?

    • Can users schedule the publication of a section page, or of contents, in advance?

    • Can the CMS remember previous URLs for a piece of content whose title changed?

If the answer to the first question is no, then the concepts of “page” and “content” coincide. You probably don’t need to develop anything, since your CMS will be quite simple.

If you answer yes to all these questions, the CMS might take three times as long to develop as it would otherwise.

That’s why the idea of a tailor-made CMS is not that stupid. No existing CMS will be able to answer these questions in every possible way. But designing your own relational schema based on your answers to these questions makes sense, economically speaking. Don’t make it complex if you don’t need to; or, to put it another way, Keep It Simple, Stupid.
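To see how the answers shape the relational schema, here is a sketch assuming “yes” to three of the questions: contents exist independently of pages, can appear in several places, and keep their previous URLs. All table names, column names and helpers are hypothetical; answering “no” to the second or the last question would simply drop the `placement` or the `url_history` table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE content (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    slug  TEXT NOT NULL UNIQUE
);
CREATE TABLE page (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- A content can be placed on any number of pages, in a given zone and order.
CREATE TABLE placement (
    page_id    INTEGER NOT NULL REFERENCES page(id),
    content_id INTEGER NOT NULL REFERENCES content(id),
    zone       TEXT NOT NULL,
    position   INTEGER NOT NULL,
    PRIMARY KEY (page_id, zone, position)
);
-- Old slugs are kept so that outdated links can be redirected.
CREATE TABLE url_history (
    old_slug   TEXT PRIMARY KEY,
    content_id INTEGER NOT NULL REFERENCES content(id)
);
"""

def rename(db, content_id, new_title, new_slug):
    """Change a content's title and slug, remembering the previous slug."""
    with db:
        (old_slug,) = db.execute(
            "SELECT slug FROM content WHERE id = ?", (content_id,)
        ).fetchone()
        db.execute("INSERT OR REPLACE INTO url_history VALUES (?, ?)",
                   (old_slug, content_id))
        db.execute("UPDATE content SET title = ?, slug = ? WHERE id = ?",
                   (new_title, new_slug, content_id))

def resolve(db, slug):
    """Return the current slug for a possibly outdated one, or None if unknown."""
    row = db.execute("SELECT slug FROM content WHERE slug = ?", (slug,)).fetchone()
    if row:
        return row[0]
    row = db.execute(
        """SELECT c.slug FROM url_history h JOIN content c ON c.id = h.content_id
           WHERE h.old_slug = ?""",
        (slug,),
    ).fetchone()
    return row[0] if row else None

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO content VALUES (1, 'Tarte Tatin', 'tarte-tatin')")
db.executemany("INSERT INTO page VALUES (?, ?)", [(1, "Home"), (2, "Desserts")])
# The same content appears on two pages.
db.executemany("INSERT INTO placement VALUES (?, 1, 'main', 0)", [(1,), (2,)])
rename(db, 1, "Apple Tarte Tatin", "apple-tarte-tatin")
```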

Bootstrapping the Reflection

Now that you’re trying to imagine what you actually need for your own CMS, here is a glimpse of the kind of technical challenge you will face all the time.

The question revolves around the concept of content types. In a CMS, you mostly deal with “articles”. This type of content has a title, an author, a summary, a body, and a few other attributes. But you probably also need to deal with other content types, like movies, slide shows, quiz games, polls, or recipes. These content types are defined by properties distinct from those of an article. Some of them fit in a single structure; others require several structures related to each other. For instance, quiz games require a structure for the quiz itself, one for the questions, one for the answers to each question, and one for the quiz results.

The question is: do you store the data for all these content types in a single table, or do you create a table for each content type? The most “normalized” choice is probably to create one data structure for each. You could have an “article” table, a “recipe” table, and even a “quiz” table with foreign keys to a “quiz_question” and a “quiz_result” table. That would allow you to query specific attributes of a specific content type. You could build a custom search engine for your recipes and search by ingredient, cuisine, and preparation time.
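A minimal sketch of the normalized option, using SQLite from Python. The table and column names are hypothetical; the point is the type-specific query in `find_recipes()`, which a single generic table could not express as easily.

```python
import sqlite3

# Hypothetical "one table per content type" schema: article, recipe, and a
# quiz split across several related tables.
SCHEMA = """
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author TEXT, body TEXT);
CREATE TABLE recipe (
    id INTEGER PRIMARY KEY, title TEXT, cuisine TEXT, preparation_minutes INTEGER
);
CREATE TABLE recipe_ingredient (recipe_id INTEGER REFERENCES recipe(id), name TEXT);
CREATE TABLE quiz (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE quiz_question (
    id INTEGER PRIMARY KEY, quiz_id INTEGER REFERENCES quiz(id), text TEXT
);
CREATE TABLE quiz_answer (
    id INTEGER PRIMARY KEY, question_id INTEGER REFERENCES quiz_question(id),
    text TEXT, is_correct INTEGER
);
CREATE TABLE quiz_result (
    id INTEGER PRIMARY KEY, quiz_id INTEGER REFERENCES quiz(id),
    min_score INTEGER, message TEXT
);
"""

def find_recipes(db, ingredient, cuisine, max_minutes):
    """Type-specific search that a dedicated recipe table makes straightforward."""
    return [row[0] for row in db.execute(
        """SELECT DISTINCT r.title FROM recipe r
           JOIN recipe_ingredient i ON i.recipe_id = r.id
           WHERE i.name = ? AND r.cuisine = ? AND r.preparation_minutes <= ?
           ORDER BY r.title""",
        (ingredient, cuisine, max_minutes),
    )]

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO recipe VALUES (?, ?, ?, ?)", [
    (1, "Pad Thai", "thai", 30),
    (2, "Green Curry", "thai", 45),
])
db.executemany("INSERT INTO recipe_ingredient VALUES (?, ?)", [
    (1, "peanuts"), (1, "rice noodles"), (2, "coconut milk"), (2, "peanuts"),
])
```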

But then, if each content type has its own table(s), what do you do when you have to list all the contents of a section, or worse (that happens in the backend) all the contents of the website? Does that mean that, in order to display a list of contents, you must query several tables and aggregate the results together? This solution simply doesn’t scale, and a CMS built like that will become slower and slower as you add new content types.

So that probably means that you should store a reference to each content in a separate table, with a copy of the data that is generic to all content types (like title, publication date, section, etc.). Pages displaying a list of contents would use this aggregate table, while pages displaying content details would use the specific tables.
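Here is a sketch of that layout, again in SQLite with hypothetical names: a generic `content` table, indexed for section listings, that points to rows in the type-specific tables. Listing a section touches one table however many types exist; only the detail page reads the specific one.

```python
import sqlite3

SCHEMA = """
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE recipe (id INTEGER PRIMARY KEY, title TEXT, preparation_minutes INTEGER);
CREATE TABLE content (
    id           INTEGER PRIMARY KEY,
    content_type TEXT NOT NULL,     -- name of the specific table
    type_id      INTEGER NOT NULL,  -- primary key in that table
    title        TEXT NOT NULL,
    section      TEXT NOT NULL,
    published_at TEXT NOT NULL,
    UNIQUE (content_type, type_id)
);
CREATE INDEX content_by_section ON content (section, published_at);
"""

# Whitelist of specific tables, so a table name never comes from user input.
DETAIL_TABLES = {"article", "recipe"}

def list_section(db, section):
    """Section listing: one query on the generic table only."""
    return db.execute(
        """SELECT content_type, type_id, title FROM content
           WHERE section = ? ORDER BY published_at DESC""",
        (section,),
    ).fetchall()

def load_detail(db, content_type, type_id):
    """Detail page: read the full row from the type-specific table."""
    if content_type not in DETAIL_TABLES:
        raise ValueError(f"unknown content type: {content_type}")
    cur = db.execute(f"SELECT * FROM {content_type} WHERE id = ?", (type_id,))
    names = [col[0] for col in cur.description]
    row = cur.fetchone()
    return dict(zip(names, row)) if row else None

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO article VALUES (1, 'Spring menu', 'Seasonal ideas')")
db.execute("INSERT INTO recipe VALUES (1, 'Asparagus risotto', 35)")
db.executemany("INSERT INTO content VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "article", 1, "Spring menu", "food", "2008-09-01"),
    (2, "recipe", 1, "Asparagus risotto", "food", "2008-09-02"),
])
```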

And that means that you must find a way to synchronize the specific tables and the generic tables whenever data changes in content. That’s not a big deal, but it gives you an idea of the kind of complexity you will encounter in a large scale CMS.
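One way to do that synchronization, sketched here under the assumption that the database is SQLite, is to let triggers copy the shared columns whenever a row of a specific table is inserted, updated, or deleted. Names are hypothetical and only the `recipe` type is wired up; each additional type would need the same three triggers. Updating the generic table from application code, inside the same transaction as the save, is an equally valid alternative; triggers have the advantage that no code path can forget to do it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE recipe (
    id INTEGER PRIMARY KEY, title TEXT NOT NULL, section TEXT NOT NULL,
    published_at TEXT NOT NULL, preparation_minutes INTEGER
);
CREATE TABLE content (
    content_type TEXT NOT NULL, type_id INTEGER NOT NULL,
    title TEXT NOT NULL, section TEXT NOT NULL, published_at TEXT NOT NULL,
    PRIMARY KEY (content_type, type_id)
);
-- Copy the generic columns into "content" on every change to "recipe".
CREATE TRIGGER recipe_created AFTER INSERT ON recipe BEGIN
    INSERT INTO content
    VALUES ('recipe', NEW.id, NEW.title, NEW.section, NEW.published_at);
END;
CREATE TRIGGER recipe_changed AFTER UPDATE OF title, section, published_at ON recipe
BEGIN
    UPDATE content
       SET title = NEW.title, section = NEW.section, published_at = NEW.published_at
     WHERE content_type = 'recipe' AND type_id = NEW.id;
END;
CREATE TRIGGER recipe_deleted AFTER DELETE ON recipe BEGIN
    DELETE FROM content WHERE content_type = 'recipe' AND type_id = OLD.id;
END;
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO recipe VALUES (1, 'Ratatouille', 'food', '2008-09-19', 60)")
db.execute("UPDATE recipe SET title = 'Summer Ratatouille' WHERE id = 1")
```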

A Challenging Exercise

Designing a CMS is difficult and fun, and you’ll probably do it more than once. Every CMS is different, because every content management need is different, and mostly because every customer wants more than just plain content management.

If you are a developer, whenever you meet a client who asks you for a Drupal integration, try to sell your knowledge of CMS architectures rather than a few hours of developer time. Raise the important questions, and talk about the possible problems of using off-the-shelf solutions. If you have ever used one of those, you will have plenty of issues to talk about. Then, try to convince your customer to trust you with a custom development. Keep it small at the beginning, so that the customer can start using it right away and refine their requirements incrementally.

This will be a very satisfying experience, and the client will thank you later for leading them onto the right path. And this will give you a lot to talk about for the next CMS you build…

http://redotheweb.com/2008/09/19/designing-a-cms-architecture/