Content Management Concepts

The document provides an introduction to content management concepts. It defines content as information put to use by packaging and presenting it for a specific purpose. Content management involves collecting, managing, and publishing content. A content management system is a database that organizes digital assets like images, video, and text, and provides access to them. It allows for efficient production of web pages using managed content. Content management systems help address issues that can arise from inefficient processes for developing and updating website content.

Chapter 1: Content Management Concepts

“Content management will be one of the top ten technology trends in 2002:
Determining what an organization actually knows is only half the battle. Getting that
knowledge to the right place at the right time is the other half.”
Source- INFOWORLD, JAN 8, 2002

1.1 Introduction to Content


Computers have only recently become ubiquitous in the world of
information. Traditionally, computers have been tasked with handling data. As opposed to
data, which is a fairly concrete term, information is a very vague term. Just about any
communication (including data) can be described as information. For the purposes of this
discussion, information will be taken to mean all the common forms of recorded
communication: writing, recorded sound, images, video, and animations.

1.1.1 Information and Content


Content, stated as simply as possible, is information put to use. Content is
whatever information you want to share, in whatever form you want to share it. Although
some content is better suited to Content Management Systems (CMS) than other content,
all content can be managed in this way. Information is put to use when it is packaged and
presented (published) for a specific purpose. More often than not, content is not a single
“piece” of information, but a conglomeration of pieces of information put together to form
a cohesive whole. A book has content, which is composed of multiple chapters,
paragraphs, and sentences. Newspapers contain content: articles, advertisements, indexes,
and pictures. The newest entry to the media world, the Web, is no different; sites are
made of articles, advertisements, indexes, and pictures, all organized into a coherent
presentation.

1.1.2 Types of Content


For an e-commerce site, content includes:
 Web pages and page elements such as text, graphics, controls, multimedia,
advertisements, and scripts
 Applications, middle-tier components, database procedures, and other
programming logic that enables and supports e-commerce
 Database information that directly supports the creation of dynamic Web pages or
enables the customer to execute business transactions
 Downloadable or online viewable files of all types
 Content on ancillary support sites in addition to the primary public site

Content that can be managed easily includes:


 Basic information
 News
 Data
 Community discussions
 Rapidly changing information
 Large amounts of complex content

1.1.3 Content Domain and Components


The content domain is the scope or range of information that is intended to
be captured, managed, and published. The content domain is directly related to the overall
goals of your content management system. In fact, the content domain is the realm of
information that needs to be controlled in order to meet the stated goals. Conversely, one can
ask, “How will the stated goals be met?” The answer is, “By providing content and
functionality.” Functionality, which is not covered in this discussion, is the set of features
and abilities provided to an audience for getting to content and for performing transactions
(monetary or information transfer) with an organization. Content is of interest only if it
falls within the stated content domain.

Once a content domain has been established and there is a clear idea of all
of the types of content, the content can be broken up into its component pieces.
Components divide information into convenient and manageable chunks. They are a set of
discrete objects whose creation, maintenance, and distribution can be automated. They
typically share some common attributes, such as format or length, and they should be able
to “stand on their own.” In other words, a component should have meaning in and of itself,
without needing the context of other components to make it meaningful.
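As a minimal sketch of the component idea, assuming a Python representation, a component can bundle its self-contained content with the shared attributes that make automated handling possible. The `Component` class, its fields, and `make_news_component` are hypothetical illustrations, not part of any particular CMS.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A self-contained chunk of content (hypothetical structure)."""
    component_id: str                          # unique key in the repository
    component_type: str                        # e.g. "news", "image", "product"
    body: str                                  # meaningful without other components
    meta: dict = field(default_factory=dict)   # shared attributes such as format or length


def make_news_component(component_id: str, headline: str, text: str) -> Component:
    """Create a news component and fill in its shared attributes automatically."""
    return Component(
        component_id=component_id,
        component_type="news",
        body=f"{headline}\n\n{text}",
        meta={"format": "plain-text", "length": len(text.split())},
    )


item = make_news_component("news-001", "Store opens", "The new store opens on Monday.")
print(item.component_type, item.meta["length"])  # news 6
```

Because every news component is built the same way, its creation and later distribution can be automated without knowing anything about neighbouring components.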

1.1.4 The Web Content Lifecycle


Both vendor software categories and your particular requirements can be
understood in relation to the “Web Content Lifecycle.” Web content is managed in
two phases:
 Production, where content goes “from thought to click.”
 Delivery, where content is actually consumed by end users.
Both phases have specific attributes that need to be addressed in any web content
management (WCM) plan. The key system attributes for both phases of the Content
Lifecycle are:

Fig 1.1 Comparison of Production and Delivery of web content



1.2 Introduction to Content Management and CMS

1.2.1 Content Management

Stated as simply as possible, content management is a discipline that
involves the collection, management, and publication of content. Content management
concepts include the following:

 Understanding the content domain, from which all of the structural decisions flow
 The notion of content components, which allow content processes (collection,
management, and publication) to be automated
 Target publications, which are the end result of any content system
 A framework, which unites all of the content into a single system of meta
information
In the broader sense, content management is a suite of applications that
allow corporations to effectively manage and deliver large amounts of diverse
information to different media through the most effective and timely means.

Fig 1.2 How content management works



Importance of Content Management:


 One of the keys to the success of any e-commerce web site is to present fresh,
consistent, high-quality content to customers. Effective content management
processes can therefore improve customer retention and lead to increased revenue.
 Inefficient, broken, or inconsistent content management processes drive
production costs up. This is due to poorly coordinated efforts, lack of repeatable
processes, and use of incompatible tools. Online retailers need to find every possible
way to contain these costs.
 Online retailers must move quickly to develop and deploy new promotional
campaigns to take advantage of current market and product conditions. Failing
to respond promptly to these changes can cost the company time, money, and
market share.
 Posting incorrect information, such as errors in product pricing, can lead to
tremendous customer dissatisfaction, which is then compounded by bad
publicity.
 Publishing inaccurate, misleading, or untimely information may also have legal
consequences.
 Poor testing processes can lead to lower site availability, slow performance, and
ultimately fewer site visitors.

1.2.2 Content Management Systems

A content management system (CMS) is a database that organizes and
provides access to all types of digital content: files containing images, graphics,
animation, sound, video, or text. It contains information about these files (known as 'digital
assets'), and may also contain links to the files themselves so that they can be
located or accessed individually. A content management system is usually used to manage
digital assets during the development of a digital resource, such as a website or
multimedia production. It might be used by staff who digitize images, by authors and editors,
or by those responsible for managing the content development process (content
managers). Content management systems range from very basic databases to
sophisticated tailor-made applications. These more complex systems can be integrated
with the eventual digital resource in order to enable access to digital assets and to allow
regular updating.
The system itself can be defined as a tool or combination of tools that
facilitates the efficient and effective production of the desired web pages using the
managed content.
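To make the "database of digital assets" definition concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `asset` table, its columns, and the sample rows are hypothetical; the point is that the database holds information about each asset plus an optional link to the file itself, so assets can be located individually.

```python
import sqlite3
from typing import List, Optional, Tuple

# An in-memory SQLite database stands in for the CMS; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE asset (
           id        INTEGER PRIMARY KEY,
           kind      TEXT NOT NULL,  -- image, graphic, animation, sound, video, text
           title     TEXT NOT NULL,
           owner     TEXT,
           file_path TEXT            -- link to the file itself, NULL if held elsewhere
       )"""
)
conn.executemany(
    "INSERT INTO asset (kind, title, owner, file_path) VALUES (?, ?, ?, ?)",
    [
        ("image", "Board photo", "Communications", "media/board.jpg"),
        ("video", "Product demo", "Marketing", "media/demo.mp4"),
        ("text", "Terms & Conditions", "Legal", None),
    ],
)


def assets_of_kind(kind: str) -> List[Tuple[str, Optional[str]]]:
    """Return (title, file_path) for every asset of the given kind."""
    cursor = conn.execute(
        "SELECT title, file_path FROM asset WHERE kind = ? ORDER BY title", (kind,)
    )
    return cursor.fetchall()


print(assets_of_kind("image"))  # [('Board photo', 'media/board.jpg')]
print(assets_of_kind("text"))   # [('Terms & Conditions', None)]
```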

Possible situations where a CMS is required:

 It takes a month to sign off the site's Terms & Conditions because every time any
one of your organization's lawyers changes a full stop, all the other ones need to
sign it off.
 You realize that your site's visual design isn't working, but it will take a month to
wrap a new design around the same words.
 Your web design agency insists on all content being signed off two months before
it goes live... and then transcribes it incorrectly.
 In a parting gesture, the Web publisher you fired replaced photos of board
members with sheep.
 You can't update one section of the site because another section has a major
overhaul underway. You can either publish the entire site, with both complete and
incomplete updates, or hold until both are completed.
 You have to work through the night to publish the company's results at market
opening time because you don't have a secure area to develop them in advance.
 You send email promotions about 'upgrading' to Windows 2000 to registered Mac
users.
 You're employing an army of skilled web publishers just to update the system
requirements of your software.

1.3 Summary:
The entire introduction can be summarized as follows:
Content:
Content is, in essence, any type or 'unit' of digital information that is used to populate a
page. It can be text, images, graphics, video, sound, etc.; in other words, anything that
is likely to be published across the Internet, an intranet, and/or an extranet.

Content Management:
Content Management is effectively the management of the content described above by
combining rules, processes, and/or workflows in such a way that centralized webmasters and
decentralized web authors/editors can create, edit, manage, and publish all the content of a
web page in accordance with a given framework or set of requirements.

Content Management System:
A CMS is a tool that enables a variety of (centralized) technical and (decentralized)
non-technical staff to create, edit, manage, and finally publish a variety of content (such as
text, graphics, video, etc.), whilst being constrained by a centralized set of rules, processes,
and workflows that ensure a coherent, validated website appearance.

Chapter 2: CMS Dissected

"Content management software extends the capabilities of pre-Web document
management tools to make data available, as it is generated, to employees, business
partners and consumers across intranets, extranets and the Internet."
Source-Stephen Phillips, INFORMATION AGE

2.1 Innovative viewpoint of CMS:


Ideally, your CMS is like a supermarket of content. Manufacturers (content
contributors, that is) package their products (content) in containers (components) that they
clearly and consistently label. The manufacturer knows generally what you can use the
product for but not what any particular cook wants to do with it. The supermarket
managers (the CMS administrators) organize, categorize, and display products in a way that
enables shoppers to easily find and select the most appropriate products. This overall
organization lies on top of the organization that the manufacturers of the products impose
inside the individual containers. They organize a box of macaroni and cheese, for example,
into a package of cheese powder and an exact portion of macaroni. The store displays the
box of macaroni and cheese in the pasta section next to the other packaged pastas. The
containers organize their contents, and the store organizes the containers.
A consumer (a publication creator) comes in and selects just the right containers
(components). The consumer reorganizes and blends the particular products into a unique
and tasty dish (the publication). Some of the products are recognizable within the dish,
and some aren't. All the products are out of their original containers and appear as a single
unified whole.
Without the original chunking of the product into standard containers, the consumer can't
count on the amount or composition of the product. Without the further organization of
the containers into an overall storage and management system, the consumer can't find the
product that she needs. Like a CMS, our system of food creation, management, and
consumption depends on well-packaged, standalone chunks that you can mix and
match in a variety of ways.

2.2 Necessity of CMS


Traditional tools and methods of building web pages were (and still are) not only
labour-intensive but also inefficient and extremely costly. For example, with traditional
methods, something as simple as changing a single word in a piece of text on a web page
had to be done by someone who understood HTML. This process not only
bottlenecked all creation of information and content through IT departments, but it also
prevented more effective use of the IT skills within that department (usually acquired at
considerable cost).

Content management systems are essential for large or even small-scale
projects that involve the capture or creation of digital assets. They are also increasingly
necessary for the creation of any but the most basic websites. Managing the capture or
creation of digital images requires recording metadata that documents the capture,
ownership, location, and licensing conditions of each image. Even for a few dozen
images, this may add up to hundreds of different pieces of information, which could
not be managed without some automated assistance.
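The kind of automated assistance meant here can be as simple as a completeness check. The sketch below assumes the four metadata fields named in the text (capture, ownership, location, licensing) and hypothetical image records; it reports which images still lack which fields and shows how quickly the number of values grows.

```python
# Metadata fields the text says must be documented for each captured image.
REQUIRED_FIELDS = ("capture", "ownership", "location", "licensing")

# Hypothetical records for two scanned images.
images = {
    "img-001.tif": {"capture": "scanner A, 600 dpi", "ownership": "Archive",
                    "location": "Box 4", "licensing": "internal use only"},
    "img-002.tif": {"capture": "scanner A, 600 dpi", "ownership": "Archive"},
}


def missing_metadata(records: dict) -> dict:
    """Map each image name to the required fields it still lacks."""
    report = {}
    for name, meta in records.items():
        gaps = [f for f in REQUIRED_FIELDS if not meta.get(f)]
        if gaps:
            report[name] = gaps
    return report


print(missing_metadata(images))  # {'img-002.tif': ['location', 'licensing']}

# Even 36 images with only these four fields mean 144 values to keep accurate.
print(36 * len(REQUIRED_FIELDS))  # 144
```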

The desire to increase the amount of information contained in web
pages and the need to include an ever-widening circle of groups in the 'modern' web
publishing process have exacerbated this situation to the point that many web management
teams can no longer cope with the growing demand on their resources. For this
reason, the use of templates that draw on content held in a database is a vital management
tool. Websites that don't use a CMS will become choked and out of date. Most
importantly, in a world where other websites contain more information that changes
more frequently, they will look stale in comparison, and visitors (both internal and
external) will stop coming.

The days when the webmaster or web team was the only way of getting
information onto a web site are over. It is not so much a question of whether you should
implement a CMS, but rather of when, and which one.

2.3 Required capabilities of CMS

Key requirements may include:


 Integrated authoring environment
The CMS must provide a seamless and powerful environment for content creators.
This ensures that authors have easy access to the full range of features provided by the
CMS.
 Separation of content and presentation
It is not possible to publish to multiple formats without a strict separation of
content and presentation. Authoring must be style-based, with all formatting applied
during publishing.
 Multi-user authoring
The CMS will have many simultaneous users. Features such as record locking
prevent conflicting changes.
 Single-sourcing (content re-use)
A single page (or even paragraph) will often be used in different contexts, or
delivered to different user groups. Such re-use is a prerequisite for managing different
platforms (intranet, internet) from the same content source.
 Metadata creation
Capturing metadata (creator, subject, keywords, etc.) is critical when managing a
large content repository. This also includes keyword indexes, subject taxonomies and
topic maps.
 Powerful linking
Authors will create many cross-links between pages, and these must be stable
against restructuring.
 Non-technical authoring
Authors must not be required to know HTML (or have other technical knowledge)
when creating pages.
 Ease of use & efficiency
For a CMS to be successful, it must be easy to create and maintain content.
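Two of the requirements above, separation of content and presentation and single-sourcing, can be illustrated together. In this minimal sketch (the component fields and both rendering functions are hypothetical), the component carries structure only, and each output channel applies its own formatting at publishing time, so one source feeds two platforms.

```python
import html

# A style-free component: structure only, no formatting (hypothetical fields).
component = {"title": "Opening hours", "paragraphs": ["Mon-Fri 9-17", "Sat 10-14"]}


def render_html(c: dict) -> str:
    """Presentation for the web site; formatting is applied only at this stage."""
    parts = [f"<h2>{html.escape(c['title'])}</h2>"]
    parts += [f"<p>{html.escape(p)}</p>" for p in c["paragraphs"]]
    return "\n".join(parts)


def render_text(c: dict) -> str:
    """Presentation for a plain-text channel such as an e-mail newsletter."""
    return "\n".join([c["title"].upper(), *c["paragraphs"]])


print(render_html(component))
print(render_text(component))
```

Changing the site's look means editing `render_html` only; the words in the component, and every other channel, stay untouched.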

2.4 Typical Functions (Constituents) of CMS

Fig 2.1 Content management procedure



The functions (constituents) of a CMS can be divided into four main categories:
 Collection (Authoring, Aggregation, Conversion)
 Workflow
 Storage
 Publishing
A CMS manages the path from authoring through to publishing using a
scheme of workflow and by providing a system for content storage and integration.

Fig 2.2 CMS functional scope and the content life cycle

2.4.1 Collection (Authoring, Aggregation and Conversion)

The collection system consists of the tools, procedures, and staff that are employed to
gather content and provide editorial processing. When content is collected, it is brought
inside the content management system. The content collection process is one of adding
new components to the existing repository. Content collection can be broken into these
categories:

1) Authoring:

Authoring is the process by which many users can create Web content
within a managed and authorized environment. It is basically the process of creating
content from scratch. Authors almost always work within an editorial framework that
allows them to fit their content into the structures of a target publication. Authors should
also be made aware of the framework that has been developed for the downstream use of
the content. Authors are in the best position to tag their creations with meta information.
So, to whatever extent possible, authors should be encouraged and empowered to
implement the meta information framework within their content.

The role of authoring is performed by graphic artists, videotape production
crews, photographers, technical writers, advertising writers, application developers, Web
page developers, lawyers, human resource personnel, marketers, or anyone else who
produces original material for the Web site. Authored content is often put under version
control through the use of document management systems or source code management
systems.

2) Aggregation:

Aggregation is the process of gathering pre-existing content together for
inclusion in the system. Aggregation is generally a process of format conversion followed
by intensive editorial processing. The conversion changes the formatting of the content,
while the editorial processing segments and tags the content for inclusion in the
repository. The closer the original content is editorially to the content management
system's framework (in its style, “elementation”, componentization, and the meta
information already entered), the easier the aggregation is.

3) Conversion:

This is the process of changing the elementation scheme (i.e., the tagging
structure) of the content. In this process, both the structural and the format-related codes
must be handled. One conversion problem lies in identifying structural elements
(sidebars or footers, for example) that are marked only by format codes in the source
content. Another problem lies in transforming formatting elements that don't exist in
the target environment.
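Both conversion problems can be seen in a small sketch. Assuming source content that carries only formatting codes (the codes and the mapping table below are hypothetical), conversion maps each code to a structural element in the target scheme; a sidebar is recognized purely from its formatting, and any code with no target equivalent is set aside for an editor rather than guessed.

```python
# Source content marked only with formatting codes (hypothetical codes).
source = [
    ("bold-14pt", "Annual Report"),
    ("plain-10pt", "Sales rose this year."),
    ("boxed-8pt", "See also: last year's report"),
    ("small-caps", "Printed in-house"),
]

# Hypothetical conversion table from source format codes to target structural elements.
FORMAT_TO_ELEMENT = {
    "bold-14pt": "title",
    "plain-10pt": "paragraph",
    "boxed-8pt": "sidebar",  # a structural element identified only by its formatting
}


def convert(chunks):
    """Split chunks into converted (element, text) pairs and unmapped leftovers."""
    converted, unmapped = [], []
    for code, text in chunks:
        element = FORMAT_TO_ELEMENT.get(code)
        if element is None:
            unmapped.append((code, text))  # no target equivalent: needs an editor
        else:
            converted.append((element, text))
    return converted, unmapped


elements, leftovers = convert(source)
print(elements[2])  # ('sidebar', "See also: last year's report")
print(leftovers)    # [('small-caps', 'Printed in-house')]
```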

2.4.2 Workflow
Workflow is the management of the steps that content goes through between
authoring and publishing. Typical steps might include link checking and review or sign-off
by a manager or legal team. Where workflow has existed at all in traditional Web site
management, it has been an off-line affair, not built into software processes.
The workflow system consists of the tools, procedures, and staff that you employ to
ensure that the entire process of collection, storage, and publication runs effectively and
efficiently, according to well-defined timelines and actions. A workflow system supports
the creation and management of business processes. In the context of a content
management system, the workflow system sets and administers the chain of events around
collecting, “repositing”, and publishing.

To be successful, the workflow system should:

 Extend over the entire process. Every step of the process, from authoring
through final deployment of each publication, should be able to be modeled and
tracked within the same system.
 Represent all of the significant parts of the process including:
o Staff members
o Standard processes
o Standard tools and their functions
o Time and data flow with a variety of transitions and charting
representations

 Represent any number of small cycles within larger cycles, with some sort of drill
down to the appropriate level of detail.
 Have a visual interface that shows cycles and players in the process graphically.
 Make meta information in the repository available. The workflow system
should not have to store its own staff members, content types, outlines, and other
meta information. It should be able to read the data that is stored in the repository,
and make it available when appropriate in its dialogs and selection screens. For
example, an editor might select a content type for an article in a workflow screen
in order to forward it to the next reviewer. The list of content type selections should
come from the repository, not from the workflow system's own internal data store.
Alternatively, the workflow's data store (which would need to be some sort of
open database) could be considered part of the repository that is responsible for
storing certain meta information.
 Provide a conduit to the repository for bottom-up meta information. Whether
or not the workflow system stores meta information, its screens will be a natural
place for staff to enter meta information. Data such as author, status, and type are
naturally entered in workflow screens. This data must be able to be transmitted
into the repository from the workflow system.

Fig 2.3 Workflow management



A CMS must meet the following minimum workflow requirements:

 A publisher can assign users to a small number of predefined roles, such as
"Author," "Editor," "Designer" and "Manager." He/she may modify the predefined
roles as necessary.
 A publisher can formalize a production process into a checklist consisting of a set
of tasks. He/she can specify dependencies among tasks to guide the order in which
the production process occurs. The system must provide a simple default checklist,
which the publisher can substitute or modify as desired.
 A publisher can start a new production process based on a checklist. A typical
process centers on creating, producing and deploying a single item. The publisher
can assign production tasks (e.g., "Author," "Edit" and "Deploy") in the checklist
either to roles or to individual users.
 Staff users can receive notification of their tasks via e-mail. They can also review
and execute their assignments from their workspace.
 Finally, and most important for many organizations, the system must be flexible
enough to deviate from the process as needed. Content items may need to be
reworked and returned to a previous user even when no such iteration is defined, may
need to be seen by additional personnel for approval, or may need to skip steps if
publishing deadlines are brought forward (see Fig 2.3).
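The checklist requirement, a set of tasks with dependencies that guide their order and with each task assigned to a role or a user, maps naturally onto a dependency graph. This sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the task names follow the examples above, while the extra "Design" and "Approve" tasks and all assignees are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical default checklist: each task maps to the tasks it depends on.
checklist = {
    "Author": set(),
    "Edit": {"Author"},
    "Design": {"Author"},
    "Approve": {"Edit", "Design"},
    "Deploy": {"Approve"},
}

# Tasks may be assigned to a role or to an individual user (names are invented).
assignments = {
    "Author": "role:Author",
    "Edit": "role:Editor",
    "Design": "user:kim",
    "Approve": "role:Manager",
    "Deploy": "role:Manager",
}

# A dependency-respecting order in which the production tasks can run.
order = list(TopologicalSorter(checklist).static_order())
for task in order:
    print(f"{task:8} -> {assignments[task]}")
```

The graph only fixes the default order; the flexibility the last requirement asks for (sending work back, adding approvers, skipping steps) would sit on top of it as explicit overrides.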

2.4.3 Storage

Storage is the placing of authored content into a repository. Content is
usually stored directly in file systems or version control systems. Storage also covers
versioning of the content, so that access conflicts between multiple authors cannot arise
and so that previous versions can be found and restored if required. It can also mean
breaking down content into structured, meaningful components such as <job title>,
<course> or <description>, which are stored as separate elements. These can be stored as
records in a database or as Extensible Markup Language (XML) files.
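As a minimal sketch of storing components as separate XML elements, the standard library's `xml.etree.ElementTree` can build and re-read such a record. The element names and values are hypothetical; note that XML names cannot contain spaces, so `<job title>` has to become something like `<job_title>`.

```python
import xml.etree.ElementTree as ET

# A job advert broken into separately stored elements (hypothetical names and values).
record = ET.Element("job")
ET.SubElement(record, "job_title").text = "Web Editor"
ET.SubElement(record, "course").text = "Content Management 101"
ET.SubElement(record, "description").text = "Maintains the intranet news pages."

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Later, any single element can be retrieved on its own.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("job_title"))  # Web Editor
```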

Storage also covers the repository itself, together with the processes and tools employed
to access and manage the collected content and meta information. The repository holds
all of the content and meta information of the system.

Repositories perform the following functions:

 Store content. The repository may be one or a set of databases of various kinds. It
can include the file system and network resources of the host computer. If the
repository is distributed among databases, one database is often in a master
position, organizing the information in the others. The repository must be able to
store:
o Textual content- This content is either flat text, or more often markup. In a
relational database the markup is usually saved as text within fields. In an
object database, the markup is broken into all its elements and made
accessible.
o Components- The repository must be able to link content into manageable
components. The better the repository, the greater the ability to create,
modify, and find components.
o Binaries and file-based data- Whether in the file system or inside a custom
data store, the repository needs to be able to effectively manage a range of
data, media, and executable files.
o Meta information- The repository must be an effective store of the variety
of meta information that needs to be collected. Some of this meta
information is coded into the structure of the repository itself (for example,
a database table can be created especially to store meta information for a
particular component type). However meta information is stored, the
repository must provide for the amount and kind of meta information
needed to describe your content.
 Select content. The repository must allow the content stored within it to be accessed
and selected. The repository should offer fielded querying to find components with
particular meta information associated with them, as well as full-text querying
against text in the system. In repositories with multiple databases, it can be difficult
to issue a search that queries all databases in a consistent way.
 Manage content. The repository must facilitate these management tasks:
o Security, including read and write access permissions for components
o User maintenance that interfaces to system user management resources
o Content status keeping and tracking for staging publications, workflow
triggers, and maintenance operations
o Transaction logging and rollback of major changes in individual databases
or to the repository as a whole
o Bulk automated processes that run periodically against subsets of the
repository
o Input/output processes that load in and push out information
 Connect to other systems. The repository must be able to communicate over the
network with a variety of clients. Ideally, the repository should be able to
communicate with LAN-based Web browsers, Internet-based Web browsers, and
LAN- or Internet-based non-Web client applications. Internet connectivity to the
repository enables authoring and other publishing processes to take place from
multiple locations, a frequent requirement for today's content-intensive Web sites.
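The difference between the two kinds of selection described under "Select content" can be sketched over a hypothetical in-memory repository. Fielded querying matches meta information exactly; full-text querying searches inside the text. Real repositories would use database indexes rather than these linear scans.

```python
# Hypothetical in-memory repository of components with meta information.
repository = [
    {"id": 1, "type": "news", "author": "lee", "status": "published",
     "text": "New store opens in Leeds"},
    {"id": 2, "type": "news", "author": "kim", "status": "draft",
     "text": "Price changes for spring"},
    {"id": 3, "type": "faq", "author": "lee", "status": "published",
     "text": "How do I change my store password?"},
]


def fielded_query(**criteria):
    """Select components whose meta information matches every given field."""
    return [c["id"] for c in repository
            if all(c.get(k) == v for k, v in criteria.items())]


def full_text_query(term: str):
    """Select components whose text contains the term (case-insensitive)."""
    term = term.lower()
    return [c["id"] for c in repository if term in c["text"].lower()]


print(fielded_query(author="lee", status="published"))  # [1, 3]
print(full_text_query("change"))                         # [2, 3]
```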

2.4.4 Publishing

Publishing is the process by which stored content is delivered. Traditionally
this has meant 'delivered to the Web site as HTML'. However, it could also mean delivered
as an e-mail message, as an Adobe PDF file, or as Wireless Markup Language (to name but a
few). In the near future, multiple delivery mechanisms will be required, particularly as
accessibility legislation starts to take effect.

Fig 2.4 Content publishing to PDA and web browser

Content publishing describes the process by which content is drawn out of
the repository and formatted into Web sites and other publications. To be flexible enough
to produce a wide range of publications, the publishing system must include:

Fig 2.5 Types of Content available for templates



 Publication templates. These templates draw content into the appropriate context
for each particular publication. The templates must instantiate:
o The formatting syntax and surrounding standard text and media elements
of the target publication platform
o The page structure and syntax of the target publication platform
o Content components and meta information on the target pages
o Standard text and binary files from the repository onto the target pages
 A full programming language. The wider the range of publications and the more open the
repository, the more complex it becomes to transform content in the
repository into a publication. The system needs complete programming
abilities so that this complexity can be managed. The language should provide:
o All of the standard variable types and control structures of major
programming languages.
o Complete access to the repository databases and files.
o Access to external objects and libraries.
 Runtime dependency resolution. When content is added to the repository it
cannot be determined where and when it will be used in a publication. Therefore,
the publication system must be able to read and resolve content links when the
publication is being produced. For example, if component A has a link to
component B in the repository, but component B is not being published, then A's
link must be suppressed by the publication system to avoid a bad link in the
publication.
 File and directory creation. The publication system must be able to create the
appropriate file and directory set for the target publication. Additionally, the
system must have some mechanism for deploying the built publication to its final
storage location.
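Runtime dependency resolution, as described in the component A and B example above, can be sketched with a regular expression. The `[[target|label]]` link notation and the component contents are assumptions made for illustration; the function keeps links whose target is in the current publication and reduces the others to plain text, so no bad link is produced.

```python
import re

# Hypothetical components; [[target|label]] is an assumed link notation.
components = {
    "A": "Read [[B|our guide]] and [[C|the FAQ]].",
    "B": "The guide text.",
    "C": "The FAQ text.",
}

LINK = re.compile(r"\[\[(\w+)\|([^\]]+)\]\]")


def resolve_links(text: str, published: set) -> str:
    """At publication time, keep links to published components and suppress the rest."""
    def replace(match: re.Match) -> str:
        target, label = match.group(1), match.group(2)
        if target in published:
            return f'<a href="{target}.html">{label}</a>'
        return label  # target not in this publication: keep the words, drop the link
    return LINK.sub(replace, text)


# Component B is not part of this publication run.
print(resolve_links(components["A"], published={"A", "C"}))
# Read our guide and <a href="C.html">the FAQ</a>.
```

Because the check runs when the publication is built, the same stored component A can produce a fully linked page in a run that includes B.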

2.5 Working of CMS

Fig 2.6 How content management works

Subject experts build content in a separate environment. The server takes
the content, inserts it into the correct template, and sends it all, neatly wrapped up, to end
users. But that's just the technology side of CM systems. The other aspect of CM is the way
it addresses workflow: CM streamlines how your design gets approved and onto the
server.

Fig 2.7 Content management work flow

Create a design in whatever tool or environment you are comfortable
with. Once it is tested and ready to go, you pass it to your manager, editor, boss, or
whoever approves your design. If it's approved, it's sent on to the server. If not, you get notes
and it is sent back to you, all within the CM environment: no e-mail, no voice mail, no
printouts of your design with red ink and yellow sticky notes all over it. The same process
happens on the content side. The end result is that even though content and design are
easier to publish, there are still strict controls on what makes it to the live server.
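The approve-or-send-back loop just described behaves like a small state machine. In this sketch the state names, actions, and the reviewer note are hypothetical; the useful property is that an item can only reach "live" through an explicit approval, and feedback is recorded inside the system rather than on sticky notes.

```python
# Hypothetical states and allowed transitions for one design or content item.
TRANSITIONS = {
    ("draft", "submit"): "in_review",
    ("in_review", "approve"): "live",
    ("in_review", "reject"): "draft",  # sent back to the creator with notes
}


def step(state: str, action: str) -> str:
    """Apply an action, refusing any transition the workflow does not define."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed from {state!r}") from None


notes = []
state = step("draft", "submit")
state = step(state, "reject")
notes.append("Logo too small")  # reviewer feedback stays inside the system
state = step(state, "submit")
state = step(state, "approve")
print(state, notes)  # live ['Logo too small']
```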

2.6 Selection
2.6.1 Choosing a Content Management System

With the multitude of CMS solutions that now exist on the market, it is
imperative that you choose your solution very wisely. Vendors will present their lists of
features; equally, their products may have another list of 'features' that is not so favorable
to your environment. Unless you have a very specific set of requirements to present to
them, it is likely that neither party will find out whether the product is a true fit until it
is too late.
A few issues to ponder when selecting a CMS:

• Workflow and scheduling. Large organizations need a CMS that sends
automatically triggered e-mails to everyone who needs to see a document
before it posts. A CMS should also let back-end users choose the posting date
and time in advance; otherwise IS staff members will eventually end up posting stuff
in the wee hours of the night.

• Database compatibility. The whole point of the Web is to leverage your
existing data and use it to sell the company along with its products or services.
Don't accept any solution that demands you restructure existing databases to
make it easier for a CMS to handle the data.

• Multilevel security. Generally, one person per department should have the
clearance to post content to a staging server. In all cases, the authority to
actually post content to the live site should rest with one or two people.

• Syndication and personalization. To distribute content around the Web,
you'll need a CMS that maps content objects to XML data types. And if your
site will deliver custom pages based on user preferences, you'll need a CMS
that breaks documents down to a granular level so that only relevant material
gets served.

• Offline integration. If your company produces lots of print material, you may
be a candidate for a system that integrates offline and online publishing. Both
Openpages' ContentWare and Worldweb.net's Expressroom I/O hook into
QuarkXPress so that master documents can ensure consistent offline and
online content.
The fact that an organization's requirements are what should determine the choice
of CMS is also one of the reasons why there is no such thing as THE content
management solution.
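Two of the criteria above, multilevel security and scheduled posting, can be sketched in a few lines. The role names, the `staging` and `live` targets, and the queue format are assumptions made for illustration, not features of any particular product.

```python
from datetime import datetime

# Hypothetical role model: department editors may post to staging only;
# one or two designated publishers may also post to the live site.
PERMISSIONS = {
    "editor": {"staging"},
    "publisher": {"staging", "live"},
}

def can_post(role, target):
    """True if the role is cleared to post to the given server."""
    return target in PERMISSIONS.get(role, set())

def due_for_posting(queue, now):
    """Return queued items whose scheduled posting time has arrived, earliest first."""
    ordered = sorted(queue, key=lambda item: item["post_at"])
    return [item for item in ordered if item["post_at"] <= now]
```

A scheduler process could call `due_for_posting` periodically, so content set to appear at 6 a.m. goes live without anyone staying up to post it.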

2.6.2 Choosing the Right Content Management Tool

Organizations are turning to a wide range of tools to handle content
management tasks, from document management systems to portals and groupware to
content management solutions for contributor-intensive sites to full lifecycle solutions. To
determine the best choice, organizations must understand their business objectives for the
site and how their site needs may change over time.
For example, if the primary goal of the Web site is to give access to
documents that may be stored in an underlying document management system then a
document management solution with Web capabilities is the right choice. Or if the
objective is to aggregate multiple data sources, both internal and external, then a
portal or syndication product with strong data interface capabilities is the most likely
candidate.

2.7 Types of CMS


• Enterprise Platforms
• Upper Mid-tier Packages
• Publishing-Oriented Portals and Application Servers
• Departmental / Mid-market Products
• Low-Cost Products
• Open-Source Packages
• ASPs

It is also meaningful to divide products according to:


• Their roots, and
• How they address the WCM lifecycle.

2.7.1 Product Roots


Understanding the origins of different packages enables you to see deeper
into their relative strengths and weaknesses. The WCM universe can be divided into three
ancestries:
• Pure-play Web Content Management Packages
These currently predominate in the marketplace.
• Application Servers and Enterprise Portals
• Document Management / Workflow Origins
After a sluggish start, established Document Management companies have
moved aggressively into the Web Content Management arena in the past two years. These
companies come out of client-server roots (sometimes using SGML – a precursor to
XML), and therefore bring experience with large document and asset repositories and
complex publishing processes. Indeed, they sometimes store content in its native file
format as opposed to a database. After indexing and gathering metadata, the package then
converts it to HTML only for publishing.
The former DM vendors are especially well suited to reference-oriented
projects or other requirements that call for long, complex, hierarchical documents (like
product manuals). They have been quick to adopt XML, but often don't make much room
for relational data in their models.

2.7.2 Lifecycle Focus


Some packages strive to address the full WCM lifecycle, often through a
“suite” of modules. Others focus principally on the Production end of the cycle, or
alternatively, the Delivery facets of Content Management. There is no inherent advantage
to any category; your needs should drive you one direction or another.
• Production-Oriented
• Delivery-Oriented
• Full-Cycle Packages
Production-Oriented:
These products focus primarily (though not always exclusively) on the
Production phase of the WCM lifecycle. They address everything from Role Management
to Library Services, Workflow, and Indexing, but then “hand off” content to other
software – application servers or web servers – to do the actual publishing and distribution.
This model is increasingly popular. Workflow and Library Services are trendy right now
because that is where potential time- and cost-savings lie. (Although even full-cycle
products increasingly recognize the value of specialization and are integrating with
application servers for publishing.) For a complete WCM solution, you may want to bundle a
Production-oriented product with a Delivery-oriented package.
Delivery-Oriented:
These products focus principally on run-time aggregation of content and
other services. The field includes portals, application servers, and combinations of both.

2.8 Features of CMS

Fig 2.8 CMS Feature onion

The three core features of a CMS are:

• Versioning
So that groups of individuals can work safely on a document and also
recall older versions.

• Workflow
So that content goes through an assessment, review or quality assurance process.

• Integration
So that content can be stored in a manageable way, separate from web site design
'templates', and then delivered as web pages or re-used in different web pages and
different document types.
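The versioning feature can be illustrated with a minimal, append-only revision history. This is a hypothetical sketch: it shows only how older versions can be recalled and restored, and omits the check-out or locking that real systems use to let several people work safely on one document.

```python
from dataclasses import dataclass, field

@dataclass
class VersionedDocument:
    """Every save appends a new revision; nothing is ever overwritten."""
    history: list = field(default_factory=list)  # (author, text) tuples, oldest first

    def save(self, author, text):
        self.history.append((author, text))
        return len(self.history)  # 1-based version number of the new revision

    def current(self):
        return self.history[-1][1] if self.history else ""

    def recall(self, version):
        """Return the text of an older version (1-based)."""
        return self.history[version - 1][1]

    def revert(self, version, author):
        """Reverting is itself a new revision, so the full trail is preserved."""
        return self.save(author, self.recall(version))
```

Treating a revert as a new save, rather than deleting later revisions, keeps a record of who changed what, which also supports the audit trail listed among the benefits below.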

2.9 Benefits of CMS

A CMS enables online information to be fresh, consistent and of high quality.

• Reduced customer (internal & external) dissatisfaction created by having incorrect
information.
• Reduction in legal issues created by displaying incorrect information.
• Increased value perception of the information provided.
• There is a higher likelihood of a customer re-visiting the site.
• Some search engines rank pages that change frequently higher in search results.

A CMS facilitates the re-use of content

• The re-use of content across multiple web sites or pages enhances
productivity.
• The re-use of web output to broadcast over e.g. DTV, Mobile Phones, Kiosks
creates new audiences.
• The syndication and re-use of content from other suppliers is made easier.

A CMS ensures enhanced productivity & job satisfaction of the web team

• Webmasters can focus on technology and areas such as redesign and functionality.
• A more appropriate use of the web team results in lowered production costs.
• Enables a quick response to changes on competitors' web sites.

A CMS enables decentralised content creation

• This enables global contribution of content and information.
• The 'speed to market' of changes and new content is improved by avoiding the IT
bottleneck.
• Content creators/editors are able to take ownership/responsibility for the
information they provide.

A CMS facilitates centralized workflow, approval processes and rules

• Enables decentralized contribution without loss of controlled centralized process.
• Provides an effective audit trail that allows production with accountability.
• Ensures a controlled flow of content around internal processes.

A CMS provides either a competitive advantage or eliminates a competitive
disadvantage

• Increasingly the web site is the window that investors use to evaluate a company.
• A dynamic, changing website creates the impression of a forward thinking
company.
• It enables a 'speed-boat' response to changes in the competitive environment.

Chapter 3: CMS and XML

“XML is a great way to store data in a way your organization can digest and
manage it.”
Source- Dell Case Study

3.1 The HTML Problem

HTML is great, but when it comes to running a large-scale Web site it
presents real problems. HTML is needed to express creativity and describe the user
experience, but as a way of describing the data behind a Web site so that it can be reused
and manipulated, HTML doesn't make the grade. However hard the standard setters try, it
simply isn't possible to reverse the trend toward using HTML tags for visual effect.

3.1.1 HTML: Too Rigid and Too Flexible


One aspect of the problem is that HTML is both too rigid and too flexible.
Another is the conflict, inefficiency, and duplication inherent in a page-oriented
publishing paradigm. Let's look at both of these problems. What do we mean by too rigid
and too flexible? It is too flexible in the sense that browsers only loosely enforce the
SGML definition of HTML. They tolerate badly written documents and do their best to
present them as intended. Browser idiosyncrasies and unique features are too numerous
to count, and achieving cross-browser compatibility is a time-consuming affair. If
the content of a page is to be treated as data with any integrity, this kind of looseness
can't be tolerated.
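The difference in tolerance can be demonstrated with Python's standard XML parser. A browser will usually render the mis-nested markup below without complaint, while an XML parser rejects it outright; that strictness is what allows page content to be treated as data. The helper name `is_well_formed` is illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Mis-nested tags that a browser would quietly repair and display.
sloppy = "<p>Price: <b>$10</p></b>"
# The same content, correctly nested.
strict = "<p>Price: <b>$10</b></p>"
```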
On the other hand, HTML is too rigid in two ways. The first is that there is
no formalized method for extending HTML in specific applications, and in a sense
there are already too many tags in the language. The addition of custom tags by Netscape
and Microsoft to achieve specific effects was widely condemned, yet at the same time the
abuse of the existing tags in the service of tightly controlled formatting became the norm.

The second way in which HTML is too rigid is, quite simply, the
impossibility of separating format and content in a meaningful way. It is true that with
CSS you can radically alter the way page elements are presented, but a table is still a table
and a list is still a list. More importantly, a page is still a page, and that is the problem
to be looked at next.

3.1.2 Breaking the Page Paradigm


When you browse a Web site, you experience it as a set of pages. True, with
dynamic HTML those pages might have application-like functionality built into them, but
there will still be a sequence of screens or pages. In the conventional HTML paradigm,
authors prepare pages or they program ASP or CGI scripts that map closely to
pages, perhaps pulling in a set of data from a database on the fly. Databases may be
used for repetitive tabular data on the site, or to store data submitted via a form, but
for everything else, pages are hard-coded.
As highlighted earlier, the data owned by any one part of an organization
will span many pages, while at the same time any one page may incorporate data from
many groups. Two things can happen: either the page structure ends up being modified
to mirror the internal structure of the organization rather than the information
needs of users, or the organization has to build processes to handle the matrix of
ownership. In the second case, it isn't always clear who owns the pages, and they
may not be maintained properly.
The bottom line is that as long as we work in a page-oriented publishing
paradigm, there will be conflict, inefficiency, and duplication, and the Internet will fail to
live up to its promise. Information owners need to be able to maintain their data
easily without combing the site for every instance of data relating to their domain,
and without worrying about formatting issues. Site designers need to be able to
set presentational standards that will be consistently applied across the site without
worrying about the data they are dealing with. In other words, the page paradigm
has to be broken!

3.2 The SQL Problem


The answer to the HTML problems, according to some content
management suppliers, is to break down the site into HTML/ASP templates and a set of
SQL tables. That way you can isolate look and feel, easily manage the data, and enable
yourself to publish far more data. It is fast, highly scalable, robust, and the data is held in a
completely media-neutral format. If the majority of your site is highly consistent in format
(for example, you may have thousands of news articles, classified advertisements, or
product specifications), and you don't have the challenges of providing localized content
to a variety of markets, then SQL might be the right answer. However, in many cases
where this technique is used, much secondary data is incorporated in the templates, and
localization and maintainability are lost.
If you are determined, you can model the structure of most classes of Web
content using SQL. However, the more complex the page type, the more tables, keys,
and joins it requires, and the more performance suffers. What's more, the more complex
the initial design, the more difficult it is to adapt later. If you want flexibility of design
and ease of reuse, SQL rapidly shows its limitations and XML shows its strength.

3.3 Data-Backed Web Sites vs. Data-Driven Web Sites:


The Template Approach
When XML is applied properly to Web site design, there is a big difference
from the template-driven approach. Templates basically consist of fixed (or slowly
changing) content with slots to be filled from a regularly updated database or even a live
data source. SQL databases are great for storing the things you want to place in the slots,
even if they contain some markup (preferably XML markup). This is what we call a
data-backed site.
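A data-backed page can be sketched with Python's built-in `sqlite3` and `string.Template`: the fixed page text stays in the template, and only the slots are filled from the database. The table, column names and page wording are hypothetical.

```python
import sqlite3
from html import escape
from string import Template

# Fixed page content lives in the template; only the $title and $body slots
# are filled from the database. Schema and wording are hypothetical.
PAGE = Template("<html><h1>Company News</h1><h2>$title</h2><p>$body</p></html>")

def render_article(conn, article_id):
    """Fill the template's slots from one database row, escaping the values for HTML."""
    row = conn.execute(
        "SELECT title, body FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    if row is None:
        raise KeyError(article_id)
    return PAGE.substitute(title=escape(row[0]), body=escape(row[1]))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO articles VALUES (1, 'Q1 results', 'Revenue rose & costs fell.')")
```

Note that the page structure (heading, order of elements) is fixed by the template; the database only supplies values for the slots.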
XML (in tandem with XSL) puts control in the hands of the data author.
The recursive processing of the data document by the style sheet means that the data
structure is the primary driver of the document assembly process, not a script or a
template. In the system, we still have a kind of template, but it is at a very high level, and
completely empty. There is no fixed content at all. This is what we call a data-driven site.
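Python's standard library has no XSLT processor, so the sketch below uses a small recursive dispatcher as a stand-in for XSL template rules. It is an analogy, not XSL itself: one rule per element name is applied recursively, so the shape of the data document, rather than a fixed page, determines what is assembled. The tag names and the `RULES` table are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# One rule per element name, loosely analogous to XSL template rules.
RULES = {
    "article": lambda inner: f"<div class='article'>{inner}</div>",
    "title": lambda inner: f"<h2>{inner}</h2>",
    "para": lambda inner: f"<p>{inner}</p>",
    "em": lambda inner: f"<i>{inner}</i>",
}

def apply_templates(el):
    """Render an element by recursively rendering its text and children in document order."""
    parts = [escape(el.text or "")]
    for child in el:
        parts.append(apply_templates(child))
        parts.append(escape(child.tail or ""))  # text that follows the child element
    inner = "".join(parts)
    rule = RULES.get(el.tag)
    return rule(inner) if rule else inner  # unknown tags pass their content through
```

Adding a new element type to the data only requires a new rule; no page template has to be edited, which is the sense in which the site becomes data-driven.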

3.4 Why XML for CMS?

XML has a natural place in Web and Enterprise Content Management.
Here's why:

Fig 3.1 Content in/out using XML abstraction

• It is a completely open standard based on common syntax, but infinite semantics.
That means everyone who uses it needs to follow basic rules (that makes it
portable), but you don't have to bend your business to a predefined data model.
Your content will probably have its own unique structure – or “semantics” – and
XML will extend with you.

• It enables universal data interchange. Disparate systems and enterprises can share
content via XML without having to expose internal data models or invest in
complex integrations. Therefore, XML is most useful for “data in motion.” Of
course, companies are trying to get more value from content precisely by putting it
in motion. By the same token, this means that if your content is going to be
delivered in only one format (e.g. Web), there is no need to invest in XML – it will
add little or no value.

• The “eXtensible” approach typically enables more granular control and
adaptation. The holy grail of content management is separating content from site
map (“where it lives”) and content from layout (“what it looks like”). This enables
you to repurpose and redeploy the same content to multiple locations, devices, and
skins. XML does precisely this: it tells you what the content is, not where it resides
or how it appears. Databases can accomplish this too, but it is generally easier to
update an XML document or schema, and in any case, XML is better suited to
hierarchical content structures that you typically find in text documents (as
opposed to relational structures that typify catalogs).

• Flexible tagging means more sophisticated searches with tag-aware search engines.
XML enables you to more easily assign meaning to your content. Search engines
that can leverage the tagging in your XML repository, as well as the inherent
structure of your content, will generate far superior results compared to simple
keyword queries. Users can search within particular nodes within your content
hierarchy, and enjoy more relevant results that have taken advantage of all the
metatagging you did.

• XML has become the “Lingua Franca” for aggregating disparate content
elements. XML makes it substantially easier for Web publishers to assemble
atomic bits of content (in all its varied forms) in an organized way within one site,
or indeed, on a single page. There are two sides to this. First, with respect to
accessing source content from within a CMS, XML can provide a common
unification environment to work within – a single layer between source content
and its actual management. Among other useful features, XML can provide a file
system type interface to database data. Then, on the output side, XML can provide
a sole source, and in fact a single paradigm, for generating diverse consumable
formats, such as HTML, PDF, WML, and even (with some wrangling) print.
In short, XML can add value across the WCM lifecycle.
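The search point above can be illustrated with the limited XPath support in Python's `xml.etree.ElementTree`. The catalog markup, the `line` attribute, and both function names are hypothetical; the sketch contrasts a plain keyword match with a search confined to one node type within one branch of the hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository fragment; element and attribute names are illustrative.
REPO = ET.fromstring("""
<catalog>
  <product id="p1" line="printers">
    <name>LaserJet X</name>
    <spec>Prints 20 pages per minute</spec>
    <marketing>Twice as fast as our inkjets</marketing>
  </product>
  <product id="p2" line="scanners">
    <name>ScanPro</name>
    <spec>Scans 20 pages per minute</spec>
  </product>
</catalog>
""".strip())

def keyword_search(root, word):
    """Naive search: any product whose text contains the word anywhere."""
    return [p.get("id") for p in root.findall("product")
            if word.lower() in "".join(p.itertext()).lower()]

def tagged_search(root, word, line, node):
    """Tag-aware search: only look inside one node type within one product line."""
    hits = []
    for p in root.findall(f"product[@line='{line}']"):
        if any(word.lower() in (el.text or "").lower() for el in p.findall(node)):
            hits.append(p.get("id"))
    return hits
```

A keyword query for "pages" matches both products, whereas restricting it to `spec` nodes in the printer line returns only the printer.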

3.5 Using XML for CMS


XML can stand behind most electronic information initiatives like content
management. XML enables you to add the structure that you need to content to find it and
deliver it. Suppose, for example, that you're a manufacturer and have a Web site that tells
your distributors all about the products that you provide. By using XML, you can
create a system behind the site that matches what you know about a distributor to all
product information that distributor may want.

In XML parlance, the product information is tagged in such a way that it
can be matched to a distributor's profile. If you create a strong XML framework, it not
only serves this personalization feature, but it can also form the basis of knowing how to
bring new content into the site, how and when to update information, and how to build a
variety of outputs, not just a Web site, from your content. Obviously, as the size and
complexity of your content increases, so does your need for the organization that XML
gives you. See Figure 3.2.
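The matching of tagged product information to a distributor profile might look like the following sketch. The `<tag>` element and the representation of a profile as a set of interest keywords are assumptions; ranking by the number of shared tags is only one simple choice among many.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog in which each product carries descriptive tags.
CATALOG = ET.fromstring("""
<catalog>
  <product name="LaserJet X"><tag>printers</tag><tag>europe</tag></product>
  <product name="ScanPro"><tag>scanners</tag><tag>asia</tag></product>
  <product name="InkJet Y"><tag>printers</tag><tag>asia</tag></product>
</catalog>
""".strip())

def products_for(profile, catalog):
    """Return products sharing at least one tag with the profile, best matches first."""
    scored = []
    for p in catalog.findall("product"):
        tags = {t.text for t in p.findall("tag")}
        score = len(tags & profile)
        if score:
            scored.append((-score, p.get("name")))  # negative score sorts best first
    return [name for _, name in sorted(scored)]
```

A distributor whose profile is `{"printers", "asia"}` would see InkJet Y first, because it matches both interests, followed by the products that match one.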

Such a system essentially creates a repository to take existing content in its various
forms (such as Word documents, PDFs, PowerPoint presentations, etc.) and turn it into
an XML format. Once broken down into XML, the document can be componentized,
making it much easier to link to other documents and update. Why is that important?
Consider what happens to a company that produces products that are constantly being
updated. Specifications related to a product's features might be contained in documents
scattered throughout a company's intranet, such as marketing materials. Each time the
product is updated, someone has to go and find all of the related postings and update them,
or, as is often the case, the information is not updated, resulting in an inconsistent message.
By breaking documents into components using the XML platform, it is possible to update
one document and have all the related documents, no matter where they are on the
corporate intranet, also be updated.
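Componentized update can be sketched as documents that reference a shared fragment instead of copying it. The `<ref id="..."/>` element and the in-memory `components` store are hypothetical; the W3C XInclude standard provides a comparable inclusion mechanism for real XML documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical component store: each reusable fragment is kept once, keyed by id.
components = {"spec-x100": "The X100 prints 20 pages per minute."}

# Two documents reuse the spec by reference instead of carrying their own copies.
brochure = ET.fromstring('<doc><p>Meet the X100.</p><ref id="spec-x100"/></doc>')
manual = ET.fromstring('<doc><h>Specifications</h><ref id="spec-x100"/></doc>')

def assemble(doc, store):
    """Return the document's text with every <ref> resolved from the store at build time."""
    parts = []
    for el in doc.iter():  # document order
        if el.tag == "ref":
            parts.append(store[el.get("id")])
        elif el.text and el.text.strip():
            parts.append(el.text.strip())
    return " ".join(parts)
```

Changing the single entry in `components` changes both the brochure and the manual the next time they are assembled, which is the "update one, update all" effect described above.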

Data is stored in two ways: either in XML or in a relational database such as
Oracle or SQL Server. And even if it's stored as XML, it's still probably kept inside a
relational database. It's going to be a long time before we see relational databases
phased out in any way.

Does it matter if it's stored in XML or directly in a relational database?
It depends on the vendor. XML tends to be more easily re-purposed: if you
want to move your PR text from a browser to a set-top box to a WAP-enabled
device, it's probably going to take your XML-savvy designers and developers very little
time to do it.
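Re-purposing from a single XML source can be sketched as one record with several renderers. The `<release>` markup is hypothetical, and `to_small_screen` merely stands in for a WAP-sized output; it does not produce real WML.

```python
import xml.etree.ElementTree as ET
from html import escape

# A hypothetical press-release record; the same source feeds every output.
RELEASE = ET.fromstring(
    "<release><headline>Acme opens new office</headline>"
    "<summary>Acme Corp has opened a new office.</summary></release>"
)

def to_html(rel):
    """Full browser rendering: headline and summary."""
    return (f"<h1>{escape(rel.findtext('headline'))}</h1>"
            f"<p>{escape(rel.findtext('summary'))}</p>")

def to_small_screen(rel, width=24):
    """Terse rendering for a small device: headline only, truncated to fit the width."""
    text = rel.findtext("headline")
    return text if len(text) <= width else text[: width - 3] + "..."
```

Adding another target device means writing one more renderer; the press-release content itself is never edited.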

By opening the benefits of content management to a growing middle
market, XML delivers on the unrealized promise of the technology. Instead of simply
managing content as a production process or overhead expense, companies can deploy
content strategically in both internal and external applications. The supply chain shown in
Figure 3.3 allows business partners to share content through a common repository. Its
XML capabilities allow content to be exchanged automatically among cooperating
enterprises.

Fig 3.3 Supply chain for automatic content exchange

Central to the concept of content management, and one of the things that
XML is designed to handle well, is separating content from its presentation. Hence, a
new technology wave based on XML standards is sweeping through the world of content
management.

Chapter 4: Case Study of CMS


Online Communities and Content Management

What are Online Communities?

Communities are groups of people tied together by some common purpose
or kinship. It is no different online. While lots of people talk about creating online
communities, few do so from this sort of understanding. Generally, their concept is to be
"the place" to go for some sort of information. They add a chat or a threaded discussion,
collect user data, and call it a community. However, without the core of common purpose
or some sort of kinship (in the widest sense of sharing some important aspect of life) these
sites will never fulfill their goals. To succeed, an online community needs to fulfill its
members' needs for affiliation and knowledge. Affiliation is the members' desire to belong
to something. Knowledge is the members' desire to know something. The web system
behind the online community needs to support affiliation and knowledge.

What are the components of an online community site?

Fig 4.1 Major components of online community system



The Common Interest Domain


The common interest domain is the boundary around the community. It is
the realm of content and interaction. It is the basis of a community. For all of the members,
there is a reason why they would come together. Specifically, the domain is a statement of
purpose for the community. The statement can be as general as "We love Barbie dolls" or
as specific as "We are all female C++ developers working at atomic accelerators on
software designed to track the trails of sub-nuclear particles." Whether specific or general,
the statement must clearly define the entrance requirement of the community. It is
absolutely the first thing that must be determined about the community and the rest of the
structuring of the system should spring naturally from it. The common interest domain
defines what members will become affiliated to and on what subject they want knowledge.

The Personalization/Data Gathering System

Personalization, in general (see Personalization and Content Management),
is the process of collecting user data and using it to sub-select content to present to that
user. This is true for the community site too. However, in addition to using member data
to direct content to the member, on a community site member data can be used to
match the member with other members. This member match-up provides for a much greater
sense of affiliation. Members, in the end, want to be affiliated to other members, not to a
Web site. To perform this match, the system must collect member data that falls within the
constraints of the common interest domain but narrows the focus so that members who
share specific interests can be found. Of course, there is more to making friends than
answering questions the same way. So a successful member-matching system will need
to be open and configurable enough to let in some subjectivity.
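One simple, hypothetical way to match members is to compare their interest sets with the Jaccard index (shared interests divided by all interests held by either member). The profiles below borrow the examples from the common interest domain discussion; the adjustable `threshold` is one place where the configurability mentioned above could be exposed.

```python
def jaccard(a, b):
    """Share of interests two members have in common, from 0.0 to 1.0."""
    return len(a & b) / len(a | b) if a | b else 0.0

def suggest_members(member, profiles, threshold=0.3):
    """Rank other members by interest overlap; keep those at or above the threshold."""
    mine = profiles[member]
    scored = [(jaccard(mine, theirs), other)
              for other, theirs in profiles.items() if other != member]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [other for score, other in ranked if score >= threshold]

# Hypothetical member profiles gathered within the common interest domain.
profiles = {
    "ana": {"c++", "particle-tracking", "accelerators"},
    "ben": {"c++", "particle-tracking"},
    "cho": {"barbie", "doll-restoration"},
    "dee": {"accelerators", "detector-design", "c++", "python"},
}
```

Here `ana` would be introduced first to `ben` and then to `dee`; lowering or raising the threshold loosens or tightens the match, and a real system might also let members accept or reject suggestions to capture the subjective side.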

On the more mundane side of matching members to content, as in any
personalization system, the site must have mechanisms for:

• Gathering member data
• Tagging content
• Mapping the type of data gathered to the appropriate tags in the content
• Dynamically rendering the selected content within a standardized page

The Content Management Engine and Knowledgebase

The site must have a viable engine for building the knowledge that
members want to get to. As with any content management system, the site must be able to
collect, reposit, and publish content. Specifically, in the context of a community site, the
content management engine must:

• Allow members to actively contribute to the knowledgebase of the site. Not only
does this provide a wide base of knowledge flowing into the site, it also brings
affiliation to its maximum depth. Members are most affiliated when they
contribute as much as they receive from the community. Lots of member
contributions are good only as long as they are pertinent, well-structured content.
Thus, it is particularly important in a community site to build a strong, simple
metadata framework that naturally guides members to contribute relevant,
well-tagged information.
• Have a repository with a fine level of granularity to support maximum
personalization.
• Be able to take feeds from the semi-structured sources that will come out of the
message center.
• Allow the repository to grow in a constrained way with content expiring when
needed, missing or scanty information clearly identifiable, and new content areas
able to be presented and pushed out to the members and host as needed.
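The last requirement, content that expires and gaps that are easy to spot, could be supported by checks like the ones below. The item fields, the expiry rule and the three-word minimum are assumptions for illustration only.

```python
from datetime import date

# Hypothetical knowledgebase items; field names are illustrative.
ITEMS = [
    {"id": "faq-1", "topic": "compilers", "body": "Use -O2 for release builds.", "expires": date(2003, 1, 1)},
    {"id": "faq-2", "topic": "compilers", "body": "", "expires": None},
    {"id": "evt-9", "topic": "meetups", "body": "Chat on Friday.", "expires": date(2002, 3, 1)},
]

def live_items(items, today):
    """Drop content whose expiry date has passed; None means it never expires."""
    return [i for i in items if i["expires"] is None or i["expires"] > today]

def scanty_items(items, min_words=3):
    """Flag items with missing or very thin bodies so they can be filled in."""
    return [i["id"] for i in items if len(i["body"].split()) < min_words]
```

The flagged list could be shown to the host, or pushed to members as a request for contributions, so that thin areas of the knowledgebase are visible rather than silently empty.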

The Message Center

The message center is the communication hub for the site. It includes any or
all of the following technologies:

• Basic email
• Chat and hosted forums
• Threaded message boards
• Net meetings
• Net presentations
• Member location services
• Member classifieds and goods or services exchanges

The exact number and types of technologies used depend on the common
interest domain, the computer literacy of the members, and their degree of affiliation.
Generally, the more affiliation your community can muster, the more members will put
the time and energy into these communication channels. The best sign of a low-affiliation
community is one where all of the bulletin boards are empty.

The system behind the community must obviously support these
communication vehicles. In addition, it must harvest from them and successfully transition
their semi-structured, real-time output into more enduring knowledge that can be
delivered along with the rest of the content in the site's knowledgebase.

User Management System

Member data is essential to the system behind the community. In addition
to being the basis for personalization, the system needs this data for a variety of other
purposes such as:

• Member bulletins and global emails
• Member rights to particular content in the knowledgebase
• Member rights to the communication services in the message center
• Member rights to submit and modify content
• Administration of member fees or other initiation rites

For all of these purposes, the site needs a strong and extendable user data
management system.

The Members

Members join the community for affiliation and knowledge. Again, when
they are fully immersed in the community, they contribute as much of these goods as they
receive. The purpose of the site and its underlying system is to facilitate the exchange of
affiliation and knowledge among the members. Much more tangibly, members come to
the site to:

• Find new content of interest
• Participate in a communication forum
• Find members to interact with
• Contribute content
• Get updates on content that they have previously stated is of interest
• See what is going on

All of these activities consist of uploading and downloading messages, files,
content, and data. The goal of the site's system is to use these mundane upload and
download actions in such a way that they create a sense of place and belonging in the
members.

The Host

The community's host is the organization that is in charge of the site's
infrastructure and maintenance. There are two typical hosts:

• A commercial host has the members as a target market for goods or services. This
host is willing to trade the cost of maintaining the site in return for exposure to the
members. In a typical scenario, the host has the original idea for the community,
creates an initial implementation of the site's system, fills the system with enough
content to be viable, and then launches the site and opens it to members. The host
continues to feed content in, administer user data, and create communication
events. The major issue to resolve in this circumstance is what rights the host will
need in order to use member data outside of the community. There is a delicate
balance here between the members leaving because they feel exploited and the
host leaving because it sees a lot of cost and little return from the community.
• A member host is one or more potential members who decide to create a web
presence. Typically, there is some existing trade or interest organization with a
current membership that organizes and funds the initial system. As with the
commercial host, the member host creates an initial implementation of the site's
system, fills the system with enough content to be viable, and then launches the
site and opens it to members. The key issues here are continued funding of the site
from often cash-strapped organizations and sufficient attention to site
maintenance by organizations that are often run by volunteers.

For both the commercial host and the member host, the primary issue is to
make the site truly belong to the members. As time goes on, members should be the major
contributors to the site, with the host having to supply less and less content. In a
high-affiliation community, members even plan and execute the communication events
(chats, net meetings, etc.). If the community is successful, it is because the host has
created a system that promotes affiliation and targeted knowledge gathering among a group
of people who naturally gravitate to a clearly stated and well-founded common interest.

Conclusion
Web content management has evolved far beyond the management of static
HTML pages. Content management is more than presenting internal or external
communications, more than publishing newsletters or event listings. Content management
today is a complex set of processes, oftentimes involving a geographically distributed
production team from diverse functional areas, multiple process steps, and exceptional
amounts of information regarding publishing requirements and the targeting of content. In
this paper, I have tried to give the reader an insight into the relevance of content
management systems and their abstraction using XML.
