Content Management Concepts
“Content management will be one of the top ten technology trends in 2002:
Determining what an organization actually knows is only half the battle. Getting that
knowledge to the right place at the right time is the other half.”
Source: InfoWorld, January 8, 2002
Once a content domain has been established and there is a clear idea of all
of the types of content, the content can then be broken up into its component pieces.
Components divide information into convenient and manageable chunks. They are a set of
discrete objects whose creation, maintenance, and distribution can be automated. They
typically share some common attributes, such as format or length, and they should be able
to "stand on their own." In other words, a component should have meaning in and of itself,
without needing the context of other components to make it meaningful.
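The idea of a component as a discrete, standalone object can be sketched in code. The following is a minimal illustration only; the class name, field names, and the sample content type are assumptions made for this example and do not come from any particular CMS.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A standalone chunk of content plus the attributes shared by its type."""
    component_id: str
    content_type: str                      # hypothetical type name, e.g. "press_release"
    body: str
    metadata: dict = field(default_factory=dict)

    def is_standalone(self, required_fields):
        """True when the body is non-empty and every required metadata field is present."""
        has_body = bool(self.body.strip())
        return has_body and all(name in self.metadata for name in required_fields)


# Example component with invented sample data.
release = Component(
    component_id="pr-001",
    content_type="press_release",
    body="Acme ships version 2 of its widget.",
    metadata={"title": "Acme ships v2", "author": "J. Doe"},
)
```

Because every component of a given type carries the same attributes, a check such as is_standalone can be run automatically across the whole repository, which is what makes automated creation, maintenance, and distribution feasible.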
Understanding content domain, from which all of the structural decisions flow
The notion of content components, which allow content processes (collection,
management, and publication) to be automated
Target publications, which are the end result of any content system
A framework, which unites all of the content into a single system of meta
information
In the broader sense, content management is a suite of applications that
allow corporations to effectively manage and deliver large amounts of diverse
information to different media through the most effective and timely means.
It takes a month to sign off the site's Terms & Conditions because every time any
one of your organization's lawyers changes a full stop, all the other ones need to
sign it off.
You realize that your site's visual design isn't working, but it will take a month to
wrap a new design around the same words.
Your web design agency insists on all content being signed off two months before
it goes live... and then transcribes it incorrectly.
In a parting gesture, the Web publisher you fired replaced photos of board
members with sheep.
You can't update one section of the site because another section has a major
overhaul underway. You can either publish the entire site, with both complete and
incomplete updates, or hold until both are completed.
You have to work through the night to publish the company's results at market
opening time because you don't have a secure area to develop them in advance.
You send email promotions about 'upgrading' to Windows2000 to registered Mac
users.
You're employing an army of skilled web publishers just to update the system
requirements of your software.
1.3 Summary:
The entire introduction can be summarized as follows:
Content:
Content is, in essence, any type or 'unit' of digital information that is used to populate a
page. It can be text, images, graphics, video, sound, and so on; in other words, anything that
is likely to be published across an internet, intranet, or extranet.
Content Management:
Content Management is effectively the management of the content described above, by
combining rules, process and/or workflows in such a way that centralized webmasters and
decentralized web authors/editors can create, edit, manage and publish all the content of a
web page in accordance with a given framework or requirements.
product that she needs. Our system of food creation, management and consumption, just
as a CMS does, depends on well-packaged, standalone chunks that you can mix and
match in a variety of ways.
The world of the webmaster or web team being the sole method of getting
information onto a web site is over. It is not so much a case of whether you should
implement a CMS - but more a case of when and which one...
Non-technical authoring
Authors must not be required to use HTML (or other technical knowledge)
when creating pages.
Ease of use & efficiency
For a CMS to be successful, it must be easy to create and maintain content.
The functions (constituents) of a CMS can be divided into four main categories:
Collection (Authoring, Aggregation, Conversion)
Workflow
Storage
Publishing
A CMS manages the path from authoring through to publishing using a
scheme of workflow and by providing a system for content storage and integration.
Fig 2.2 CMS functional scope and the content life cycle
The collection system is the tools, procedures, and staff that are employed to
gather content and provide editorial processing. When content is collected, it is brought
inside the content management system. The content collection process is one of adding
new components to the existing repository. Content collection can be broken into these
categories:
1) Authoring:
Authoring is the process by which many users can create Web content
within a managed and authorized environment. It is basically the process of creating
content from scratch. Authors almost always work within an editorial framework that
allows them to fit their content into the structures of a target publication. Authors should
also be made aware of the framework that has been developed for the downstream use of
the content. Authors are in the best position to tag their creations with meta information.
So, to whatever extent possible, authors should be encouraged and empowered to
implement the meta information framework within their content.
2) Aggregation:
3) Conversion:
This is the process of changing the elementation scheme (i.e., the tagging
structure) of the content. In this process the structural as well as the format related codes
must be handled. One conversion problem comes in identifying structural elements
(sidebars or footers, for example) that have only format codes marking them in the source
content. Another problem comes in transforming formatting elements that don't exist in
the target environment.
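The two conversion problems described above can be illustrated with a small sketch. It assumes, purely for illustration, that the source content arrives as (format code, text) runs and that a hand-built table maps format codes to structural elements; the code names and element names are invented.

```python
# Hypothetical mapping from source format codes to target structural elements.
FORMAT_TO_STRUCTURE = {
    "bold-18pt": "title",
    "italic-boxed": "sidebar",
    "small-centered": "footer",
}


def convert(runs, default="para"):
    """Convert (format_code, text) runs into (structural_element, text) pairs.

    Format codes with no equivalent in the target scheme fall back to
    `default` and are collected so that an editor can review them.
    """
    converted, unmapped = [], []
    for format_code, text in runs:
        element = FORMAT_TO_STRUCTURE.get(format_code)
        if element is None:
            unmapped.append(format_code)
            element = default
        converted.append((element, text))
    return converted, unmapped
```

The first problem (structure marked only by formatting) is handled by the lookup table; the second (formatting that does not exist in the target) is handled by the fallback and the list of unmapped codes. A real conversion would need far richer rules than a single table.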
2.4.2 Workflow
Workflow is the management of steps taken by the content between
authoring and publishing. Typical steps could be link checking and review/signoff by a
manager or legal team. If workflow has existed at all in traditional Web site management,
it has been an off-line affair and not built into software processes.
The workflow system is the tools, procedures, and staff that you employ to
assure that the entire process of collection, storage, and publication runs effectively and
efficiently, according to well-defined timelines and actions. A workflow system supports
the creation and management of business processes. In the context of a content
management system, the workflow system sets and administers the chain of events around
collecting, "repositing", and publishing.
Extend over the entire process. Every step of the process, from authoring
through final deployment of each publication, should be able to be modeled and
tracked within the same system.
Represent all of the significant parts of the process including:
o Staff members
o Standard processes
o Standard tools and their functions
o Time and data flow with a variety of transitions and charting
representations
Represent any number of small cycles within larger cycles, with some sort of drill
down to the appropriate level of detail.
Have a visual interface that shows cycles and players in the process graphically.
Make meta information in the repository available. The workflow system
should not have to store its own staff members, content types, outlines, and other
meta information. It should be able to read the data that is stored in the repository,
and make it available when appropriate in its dialogs and selection screens. For
example, an editor might select a content type for an article in a workflow screen in
order to forward it to the next reviewer. The list of content type selections should
come from the repository, not from the workflow system's own internal data store.
As an alternative, the workflow's data store (which would need to be some sort of
open database) could be considered part of the repository that is responsible for
storing certain meta information.
Provide a conduit to the repository for bottom up meta information. Whether
or not the workflow system stores meta information, its screens will be a natural
place for staff to enter meta information. Data such as author, status, and type are
naturally entered in workflow screens. This data must be able to be transmitted
into the repository from the workflow system.
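The workflow requirements above can be sketched as a simple chain of steps whose status data is written back to the repository. This is an illustrative sketch under stated assumptions: the step names are invented, and a plain dictionary stands in for the repository.

```python
# Hypothetical ordered steps between authoring and publishing.
WORKFLOW_STEPS = ["authored", "link_checked", "reviewed", "signed_off", "published"]


class WorkflowItem:
    """Tracks one component through the workflow and mirrors its status
    into the repository (here a plain dict standing in for a real store)."""

    def __init__(self, component_id, repository):
        self.component_id = component_id
        self.repository = repository
        self.history = []                  # list of (status, actor) pairs
        self._set_status(WORKFLOW_STEPS[0], actor="author")

    def _set_status(self, status, actor):
        self.history.append((status, actor))
        # Bottom-up meta information: status and actor flow into the repository.
        record = self.repository.setdefault(self.component_id, {})
        record["status"] = status
        record["last_actor"] = actor

    @property
    def status(self):
        return self.history[-1][0]

    def advance(self, actor):
        """Move the item to the next step and return the new status."""
        position = WORKFLOW_STEPS.index(self.status)
        if position == len(WORKFLOW_STEPS) - 1:
            raise ValueError("item is already published")
        self._set_status(WORKFLOW_STEPS[position + 1], actor)
        return self.status
```

The design choice worth noting is that the workflow object keeps no private copy of status data; every transition is written to the repository, in line with the requirement that workflow screens act as a conduit for meta information.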
2.4.3 Storage:
It is also the repository of all content and meta information, as well as the
processes and tools employed to access and manage the collected content and meta
information. The repository holds all of the content and meta information of the system.
Store content. The repository may be one or a set of databases of various kinds. It
can include the file system and network resources of the host computer. If the
repository is distributed among databases, one database is often in a master
position, organizing the information in the others. The repository must be able to
store:
o Textual content- This content is either flat text, or more often markup. In a
relational database the markup is usually saved as text within fields. In an
object database, the markup is broken into all its elements and made
accessible.
o Components- The repository must be able to link content into manageable
components. The better the repository, the greater the ability to create,
modify, and find components.
o Binaries and file-based data- Whether in the file system or inside a custom
data store, the repository needs to be able to effectively manage a range of
data, media, and executable files.
o Meta information- The repository must be an effective store of the variety
of meta information that needs to be collected. Some of this meta
information is coded into the structure of the repository itself (for example,
a database table can be created especially to store meta information for a
particular component type). However meta information is stored, the
repository must provide for the amount and kind of meta information
needed to describe your content.
Select content. The repository must allow access and selection of content from
within itself. The repository should offer fielded querying to find components with
particular meta information associated with them, as well as full text querying
against text in the system. In repositories with multiple databases it can be difficult
to issue a search that queries all databases in a consistent way.
Manage content. The repository must facilitate these management tasks:
o Security, including read and write access permissions for components
o User maintenance that interfaces to system user management resources
o Content status keeping and tracking for staging publications, workflow
triggers, and maintenance operations
o Transaction logging and rollback of major changes in individual databases
or to the repository as a whole
o Bulk automated processes that run periodically against subsets of the
repository
o Input/output processes that load in and push out information
Connect to other systems. The repository must be able to communicate over the
network with a variety of clients. Ideally, the repository should be able to
communicate with LAN-based Web browsers, Internet-based Web browsers, and
LAN- or internet-based non-Web client applications. Internet connectivity to the
repository enables authoring and other publishing processes to take place from
multiple locations, a frequent requirement for today's content-intensive Web sites.
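The storing and selecting functions described above can be sketched with a small relational repository. This is a minimal illustration using Python's built-in sqlite3 module; the table layout, column names, and function names are assumptions made for the example, and a substring match stands in for a real full-text engine.

```python
import sqlite3


def open_repository():
    """Create an in-memory repository with one table for components
    and one for their meta information."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE component (id TEXT PRIMARY KEY, type TEXT, body TEXT);
        CREATE TABLE meta (component_id TEXT, name TEXT, value TEXT);
    """)
    return conn


def store(conn, component_id, content_type, body, metadata):
    """Store a component together with its meta information."""
    conn.execute("INSERT INTO component VALUES (?, ?, ?)",
                 (component_id, content_type, body))
    conn.executemany("INSERT INTO meta VALUES (?, ?, ?)",
                     [(component_id, name, value) for name, value in metadata.items()])


def fielded_query(conn, name, value):
    """Find components whose meta field `name` equals `value`."""
    rows = conn.execute(
        "SELECT component_id FROM meta WHERE name = ? AND value = ? "
        "ORDER BY component_id", (name, value))
    return [row[0] for row in rows]


def full_text_query(conn, phrase):
    """Find components whose body contains `phrase` (simple substring match;
    note that % and _ in `phrase` act as SQL wildcards)."""
    rows = conn.execute(
        "SELECT id FROM component WHERE body LIKE ? ORDER BY id",
        (f"%{phrase}%",))
    return [row[0] for row in rows]
```

With a single database both kinds of query are straightforward. The difficulty noted above arises when the repository spans several databases, because each query then has to be issued and merged consistently across all of them.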
2.4.4 Publishing
Publication templates. These templates draw content into the appropriate context
for each particular publication. The templates must instantiate:
o The formatting syntax and surrounding standard text and media elements
of the target publication platform
o The page structure and syntax of the target publication platform
o Content components and meta information on the target pages
o Standard text and binary files from the repository onto the target pages
A full programming language. The wider the range of publications and the more open the
repository, the more complexity there will be in transforming content in the
repository into a publication. The system needs to have complete programming
abilities so that this complexity can be managed. The language should provide:
o All of the standard variable types and control structures of major
programming languages.
o Complete access to the repository databases and files.
o Access to external objects and libraries.
Runtime dependency resolution. When content is added to the repository it
cannot be determined where and when it will be used in a publication. Therefore,
the publication system must be able to read and resolve content links when the
publication is being produced. For example, if component A has a link to
component B in the repository, but component B is not being published, then A's
link must be suppressed by the publication system to avoid a bad link in the
publication.
File and directory creation. The publication system must be able to create the
appropriate file and directory set for the target publication. Additionally, the
system must have some mechanism for deploying the built publication to its final
storage location.
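The runtime dependency resolution described above (component A links to component B, but B is not being published) can be sketched as follows. The link notation [[id|label]] and the ".html" target naming are assumptions invented for this example; a real system would use its own link markup.

```python
import re

# Hypothetical in-repository link notation: [[component-id|visible label]]
LINK = re.compile(r"\[\[([\w-]+)\|([^\]]+)\]\]")


def resolve_links(body, published_ids):
    """Render links to published components and suppress the rest.

    A link whose target is not part of the current publication is
    replaced by its plain label, so no bad link reaches the output.
    """
    def replace(match):
        target, label = match.group(1), match.group(2)
        if target in published_ids:
            return f'<a href="{target}.html">{label}</a>'
        return label

    return LINK.sub(replace, body)
```

Resolving links at publication time, rather than when content is added, means the same component can appear in several publications with different sets of neighbours and still produce valid output in each.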
happens on the content side. The end result is that even though it's easier for content and
design to publish, there are still strict controls as to what makes it to the live server.
2.6 Selection
2.6.1 Choosing a Content Management System
With the multitude of CMS solutions that now exist on the market, it is
imperative that you choose your solution very wisely. Equally, they may have another list
of 'features' that is not so favorable to your environment. Unless you have a very specific
set of requirements to present them with, the likelihood is that neither party will find out
whether the product is a true fit - until it is too late.
A few issues to ponder when selecting a CMS:
Multilevel security. Generally, one person per department should have the
clearance to post content to a staging server. In all cases, the authority to
actually post content to the live site should rest with one or two people.
gets served.
Offline integration. If your company produces lots of print material, you may
be a candidate for a system that integrates offline and online publishing. Both
Openpages' ContentWare and Worldweb.net's Expressroom I/O hook into
QuarkXPress so that master documents can ensure consistent offline and
online content.
The fact that an organization's requirements are what should determine the choice
of CMS is also one of the reasons why there is no such thing as THE content
management solution.
product manuals). They have been quick to adopt XML, but often don't make much room
for relational data in their models.
Versioning
So that groups of individuals can work safely on a document and also
recall older versions.
Workflow
So that content goes through an assessment, review or quality assurance process.
Integration
So that content can be stored in a manageable way, separate from web site design
'templates', and then delivered as web pages or re-used in different web pages and
different document types.
The re-use of content across multiple web sites or pages creates an enhanced
productivity value.
The re-use of web output for broadcast over channels such as DTV, mobile phones, and kiosks
creates new audiences.
The syndication and re-use of content from other suppliers is made easier.
A CMS ensures enhanced productivity & job satisfaction of the web team
Webmasters can focus on technology and areas such as redesign and functionality.
A more appropriate use of the web team results in lowered production costs.
Enables a quick response to changes on competitors' web sites.
Increasingly the web site is the window that investors use to evaluate a company.
A dynamic, changing website creates the impression of a forward thinking
company.
It enables a 'speed-boat' response to changes in the competitive environment.
“XML is a great way to store data in a way your organization can digest and
manage it.”
Source- Dell Case Study
The second way in which HTML is too rigid is, quite simply, the
impossibility of separating format and content in a meaningful way. It is true that with
CSS you can radically alter the way page elements are presented, but a table is still a table
and a list is still a list. More importantly, a page is still a page, and that is the problem
to be looked at next.
template. In the system, we still have a kind of template, but it is at a very high level, and
completely empty. There is no fixed content at all. This is what we call a data-driven site.
It enables universal data interchange. Disparate systems and enterprises can share
content via XML without having to expose internal data models or invest in
complex integrations. Therefore, XML is most useful for "data in motion." Of
course, companies are trying to get more value from content precisely by putting it
in motion. By the same token, this means that if your content is going to be
delivered in only one format (e.g. Web), there is no need to invest in XML – it will
add little or no value.
Flexible tagging means more sophisticated searches with tag-aware search engines.
XML enables you to more easily assign meaning to your content. Search engines
that can leverage the tagging in your XML repository, as well as the inherent
structure of your content, will generate far superior results compared to simple
keyword queries. Users can search within particular nodes within your content
hierarchy, and enjoy more relevant results that have taken advantage of all the
metatagging you did.
XML has become the "lingua franca" for aggregating disparate content
elements. XML makes it substantially easier for Web publishers to assemble
atomic bits of content (in all its varied forms) in an organized way within one site,
or indeed, on a single page. There are two sides to this. First, with respect to
accessing source content from within a CMS, XML can provide a common
unification environment to work within – a single layer between source content
and its actual management. Among other useful features, XML can provide a file
system type interface to database data. Then, on the output side, XML can provide
a sole source (and, in fact, a single paradigm) for generating diverse consumable
formats, such as HTML, PDF, WML, and even (with some wrangling) print.
In short, XML can add value across the WCM lifecycle.
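Two of the points above, tag-aware searching within particular nodes and a single XML source feeding several output formats, can be sketched with Python's standard xml.etree.ElementTree module. The element names, attribute names, and sample text below are invented for illustration and do not represent any standard schema.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented sample document; the tag names are assumptions, not a standard.
SOURCE = """<article>
  <title>Release notes</title>
  <section name="summary"><para>Version 2 adds search.</para></section>
  <section name="requirements"><para>Requires 64 MB of memory.</para></section>
</article>"""


def to_html(root):
    """Render the single XML source as an HTML fragment."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    for para in root.iter("para"):
        parts.append(f"<p>{escape(para.text)}</p>")
    return "\n".join(parts)


def to_text(root):
    """Render the same source as plain text (for example, for email)."""
    lines = [root.findtext("title").upper()]
    lines += [para.text for para in root.iter("para")]
    return "\n".join(lines)


def search_section(root, section_name, word):
    """Tag-aware search: look for `word` only inside the named section."""
    hits = []
    for section in root.findall("section"):
        if section.get("name") != section_name:
            continue
        hits += [p.text for p in section.iter("para") if word.lower() in p.text.lower()]
    return hits


root = ET.fromstring(SOURCE)
```

Here the content is written once and each output format is just another function over the same tree. The search function shows why tagging helps: a query for "memory" restricted to the requirements section cannot return hits from unrelated parts of the document.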
breaking documents into components, using the XML platform, it is possible to update
one document and have all the related documents, no matter where they are on the
corporate intranet, also be updated.
The site must have a viable engine for building the knowledge that
members want to get to. As with any content management system, the site must be able to
collect, reposit, and publish content. Specifically, in the context of a community site, the
content management engine must:
Allow members to actively contribute to the knowledgebase of the site. Not only
does this provide a wide base of knowledge flowing into the site, it also brings
affiliation to its maximum depth. Members are most affiliated when they
contribute as much as they receive from the community. Lots of member
contributions are good only as long as they are pertinent, well-structured content.
Thus, it is particularly important in a community site to build a strong, simple
metadata framework that naturally guides members to contribute relevant well-
tagged information.
Have a repository with a fine level of granularity to support maximum
personalization.
Be able to take feeds from the semi-structured sources that will come out of the
message center.
Allow the repository to grow in a constrained way with content expiring when
needed, missing or scanty information clearly identifiable, and new content areas
able to be presented and pushed out to the members and host as needed.
The message center is the communication hub for the site. It includes any or
all of the following technologies:
Basic email
Chat and hosted forums
Threaded message boards
Net meetings
Net presentations
Member location services
Member classifieds and goods or services exchanges
The exact number and types of technologies used depend on the common
interest domain, the computer savvy of the members, and their degree of affiliation.
Generally, the more affiliation your community can muster, the more members will put
the time and energy into these communication channels. The clearest sign of a low-affiliation
community is that all of its bulletin boards are empty.
For all of these purposes, the site needs a strong and extendable user data
management system.
The Members
Members join the community for affiliation and knowledge. Again, when
they are fully immersed in the community, they contribute as much of these goods as they
receive. The purpose of the site and its underlying system is to facilitate the exchange of
affiliation and knowledge among the members. Much more tangibly, members come to
the site to:
The Host
A commercial host has the members as a target market for goods or services. This
host is willing to trade the cost of maintaining the site in return for exposure to the
members. In a typical scenario, the host has the original idea for the community,
creates an initial implementation of the site's system, fills the system with enough
content to be viable, and then launches the site and opens it to members. The host
continues to feed content in, administer user data, and create communication
events. The major issue to resolve in this circumstance is what rights the host will
need in order to use member data outside of the community. There is a delicate
balance here between the members leaving because they feel exploited and the
host leaving because they see a lot of cost and little return from the community.
A member host is one or more potential members who decide to create a web
presence. Typically, there is some existing trade or interest organization with a
current membership that organizes and funds the initial system. As with the
commercial host, the member host creates an initial implementation of the site's
system, fills the system with enough content to be viable, and then launches the
site and opens it to members. The key issues here are continued funding of the site
from often cash-strapped organizations and sufficient attention paid to the site
maintenance by what are often volunteer-run organizations.
For both the commercial host and the member host the primary issue is to
make the site truly belong to the members. As time goes on, members should be the major
contributors to the site, with the host having to supply less and less content. In a high
affiliation community, members even plan and execute the communication events (chats,
net meetings, etc.). If the community is successful, it is because the host has created the
system that promotes affiliation and targeted knowledge gathering among a group of
people who naturally gravitate to a clearly stated and well founded common interest.
Conclusion
Web content management has evolved far beyond the management of static
HTML pages. Content management is more than presenting internal or external
communications, more than publishing newsletters or event listings. Content management
today is a complex set of processes, oftentimes involving a geographically distributed
production team from diverse functional areas, multiple process steps, and exceptional
amounts of information regarding publishing requirements and the targeting of content. In
this paper, I have tried to give the reader an insight into the relevance of content
management systems and their abstraction using XML.