I am currently involved in a project that seeks to provide a unified view of disparate content on a person’s desktop, along with a cloud-based repository where such assets can be stored, searched, aggregated, published or served in various ways, and backed up.
You may ask “Is that not what Apple’s iLife/iCloud, Windows 8, Google Docs/Drive, Adobe, etc… are trying to do?” Well yes, but they don’t have a great solution yet, and our project has more specific needs that we are trying to address. In particular, it must be possible to manage the content while offline, and a version of the content must be published and synched across peer devices. The solutions offered by Apple/Windows/Google target a very broad spectrum of the consumer market, and as such are not tailored to the requirements and workflow of our project.
“Ahh, are you talking about a Content Management System (CMS) or a Digital Asset Management System (DAM) then?”
Well, perhaps; we are not sure. Before proceeding with evaluating CouchDB, let’s digress into a brief discussion of such systems.
How is content managed nowadays?
The main functions that such systems aim to provide are:
- Storage of the assets and content (text, documents, images, movies, etc…)
- Capture, storage and search of disparate metadata contained in the content (image size, version information, location, etc…)
- In the case of text documents, indexing and search of the content of the document
- Versioning of the documents in time
- Processing/aggregation/publication of the assets in various formats (text, web, print, zip files, etc…)
- Fine-grained control of access to the various assets
There is a wide array of software that meets these requirements in various areas. In particular, systems such as WordPress, Drupal and Joomla are often labeled CMSs because they offer much of the functionality above. Nevertheless, their original purpose was more strictly focused on the creation and maintenance of online content and web sites (blogs, HTML pages, articles/posts, etc…), rather than the management of digital files containing content. As such, I prefer to think of them as Web Management Systems (WMS).
Such WMSs have grown over the years, their functionality has been extended via plug-ins and third-party contributions, and many consider them to be full-fledged application development platforms. I would not be surprised to hear that plug-ins exist for any of these systems that extend their functionality further toward a pure CMS.
What are the current options for content management?
Existing systems have typically encapsulated most of their functionality on the server side, and only the most advanced systems provide integrated client-side tools for editing or processing documents on the desktop. Many of the existing systems use traditional SQL databases for storage. Because SQL databases are best suited to storing discrete data with fixed data structures, they are not well-suited to storing metadata whose structure may change from one context to another, or large documents whose content has to be indexed to be searched effectively.
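To make the fixed-structure point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `assets` table, its columns and the `insert_asset` helper are hypothetical names chosen for illustration, not part of any system mentioned here. An image fits the table, while a PDF carrying a different metadata field is rejected until the schema is changed.

```python
import sqlite3

# In-memory database with a fixed schema designed around image metadata.
# Table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (filename TEXT, width INTEGER, height INTEGER)")


def insert_asset(conn, metadata):
    """Insert a metadata dict, using its keys as column names.

    Column names are interpolated directly, which is acceptable only in a
    sketch like this one where the keys are hard-coded.
    """
    columns = ", ".join(metadata)
    placeholders = ", ".join("?" for _ in metadata)
    conn.execute(
        f"INSERT INTO assets ({columns}) VALUES ({placeholders})",
        tuple(metadata.values()),
    )


# An image matches the schema and is stored.
insert_asset(conn, {"filename": "beach.jpg", "width": 4000, "height": 3000})

# A PDF has a field the schema does not know about, so the insert fails.
try:
    insert_asset(conn, {"filename": "report.pdf", "page_count": 12})
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

stored_rows = conn.execute("SELECT COUNT(*) FROM assets").fetchone()[0]
```

The usual workarounds are an ALTER TABLE for every new kind of metadata or a generic key/value side table, and both add friction when the metadata varies from one asset type to the next.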
The more advanced and mature DAM/CMS have eschewed the use of SQL in favor of proprietary storage engines. Newer DAM/CMS are adopting emerging standards such as Java’s JSR-170 (the Content Repository for Java API) and its reference implementation, Apache Jackrabbit, which are better suited to the requirements of content management.
At the same time, there has been a lot of progress in new storage technologies that aim to make it easier to store semi-structured data for which traditional SQL databases are not well-suited: the so-called ‘NoSQL’ or schema-less datastores, which encompass a wide breadth of technologies. CouchDB falls into this category, as it stores data as JSON documents that can have various structures, and where new fields can be added dynamically as the need arises.
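As a rough illustration of what schema-less means in practice, the sketch below builds two asset documents with different fields and adds a new field to one of them. It uses only Python's json module and does not talk to CouchDB; the field names and the `add_field` helper are assumptions made up for this example, apart from `_id`, which is the key CouchDB uses for a document's identifier.

```python
import json

# Two hypothetical asset documents with different structures.
photo = {
    "_id": "asset-001",
    "type": "image",
    "filename": "beach.jpg",
    "width": 4000,
    "height": 3000,
    "location": {"lat": 36.6, "lon": -121.9},
}

report = {
    "_id": "asset-002",
    "type": "document",
    "filename": "q3-report.pdf",
    "page_count": 12,
    "version": 3,
}


def add_field(doc, key, value):
    """Return a copy of doc with one extra field; no schema change is involved."""
    updated = dict(doc)
    updated[key] = value
    return updated


# A new field appears on one document only; the other is unaffected.
photo_v2 = add_field(photo, "tags", ["holiday", "2012"])

# Both documents serialize to JSON side by side despite their different shapes.
serialized = [json.dumps(d, sort_keys=True) for d in (photo_v2, report)]
for line in serialized:
    print(line)
```

Documents of either shape could sit in the same database, which is what makes this model attractive for metadata that differs between images, text documents and other assets.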