Metadata and the World Wide Web
Jane Greenberg
The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A.
a
These other communities are not entirely separate from the information resource community in that they all deal with objects, often information objects. The difference is in the protocols and emphases that direct the acquisition and organization of, and access to, the objects found in these different communities.
b
In another work Buckland explains that information is an entity or process that can be either tangible or intangible (see Ref. [3]). Metadata is applicable to the information entity (object) presented in all of Buckland’s models.
transaction” metadata capture the purchase activity for a consumer good. In both of these examples, metadata promotes specified functions surrounding the life of the designated object: the information resource in the first case and the purchase activity in the latter case. (Note that the purchase activity is the primary object in the latter case and that the consumer good is a secondary object, which has its own set of metadata.) Metadata supports many different types of functions, among which object use, authentication, and administration are fairly common.[4]

METADATA: BEYOND THE LIBRARY CATALOG AND CATALOGING?

Metadata discussions focusing on the Web frequently turn to the library catalog and cataloging as a way to define what metadata “is” or “is not.” The result is a growing body of literature that has both equated creating metadata with cataloging (e.g., Refs. [5,6]) and distinguished the two (e.g., Ref. [7]). These discussions depict both individual and community knowledge and perception about the types of activities that fall under the cataloging umbrella.

Equating creating metadata and cataloging makes sense because these activities have the same end goal: to produce a set of structured descriptive data that will facilitate object discovery and other desired functions. In this view the metadata record and the catalog record are seen as synonymous products. The metadata/cataloging analogy is further supported by the fact that many Web-oriented metadata schemas are similar to traditional cataloging and indexing standards and include “author/creator,” “title,” “subject,” “publication date,” and other metadata elements that have historically served as access points in bibliographic information systems. Related to this practice is the fact that many metadata schema specifications prescribe a content syntax that is similar to that of the library catalog. A common example is that “author” metadata follows the syntax of last name, first name, middle initial. A final factor to note in this metadata/cataloging comparison is that many Web-oriented metadata schemas have adopted and promote the use of attribute value schemas (e.g., controlled vocabulary, classificatory system, etc., see “Metadata Vocabulary”) that were originally developed for cataloging and indexing in traditional information systems.

Distinguishing metadata from cataloging presents another scenario. One of the most frequently heard arguments separating these two activities is that cataloging is for physical objects and metadata is exclusively for electronic resources. This argument is not well founded given that it is common practice for libraries to catalog computer disks, CD-ROMs, videos, multimedia resources, and other electronic resources, and they have been doing so for close to two decades. In fact, a series of specialized tools have been developed to support library cataloging of electronic resources (e.g., Refs. [8–11]). The Anglo-American Cataloging Rules, 2nd Rev. Ed. (AACR2)[12] have been successfully adapted to support Web-resource cataloging.[13] And the MARC (machine readable cataloging) bibliographic format (http://lcweb.loc.gov/marc/), which underlies most online library catalogs, has been enhanced over the last few years with new fields and codes to support the cataloging of electronic formats (e.g., the MARC field 856 was introduced to record electronic resource location).

Even with these developments, many individuals and communities still see a distinction between creating metadata and cataloging. They emphasize that the Web has introduced a set of functional needs extending well beyond item-level description and the determination of “author,” “title,” and “subject” access points that facilitate resource (object) discovery. This view can be challenged by the fact that cataloging in the broadest sense has always involved, and continues to involve, much more than these basic descriptive activities. Acquisition librarians and catalogers, particularly serials catalogers, regularly record the “price,” “date received,” and “publisher contact” metadata for commercially produced resources to assist with administrative functions. Archival and artifactual custodians document resource “format,” “genre,” “history,” “reproduction rights,” and a variety of other metadata to assist not only with resource identification and retrieval, but also with the use, evaluation, administration, and authentication of collection holdings. Map curators, film archivists, and school media librarians, to name a few more specialists, also work with materials that have metadata needs extending beyond basic resource description.

What is new is that the Internet and Web technology have introduced novel information formats, new encoding languages [e.g., hypertext markup language (HTML) and extensible markup language (XML)], and original attribute value schemas [e.g., multipurpose internet mail extensions (MIME)]. Moreover, the Web, as a communication mechanism, has sparked the development of metadata schemas by many different communities operating beyond the library environment (e.g., commerce, scientific, and educational communities). These factors, and what appears to be an unprecedented emphasis on metadata schema standardization and interoperability, have forced cataloging into a new domain. It is these developments that permit, to some degree, a distinction between what has traditionally been labeled as cataloging and what is now viewed as a metadata activity, and underlying this evolution is an emerging vocabulary.
c
AACR2 is a metadata schema/content standard, whereas MARC primarily is an encoding or communications format. The two are so entwined in AACR2-MARC cataloging that MARC is generally referred to as a schema.
d
The use of multiple namespaces is considered unwieldy for current practices, but theoretically, and with future technological innovations, it might offer the best possible way to control schema development.
Metadata Qualification (or Qualifiers)

Information that helps to define the metadata element content. The Dublin Core Metadata Initiative (http://dublincore.org/), an international and interdisciplinary metadata community, has identified two facets of qualification.[19]

(a) Type qualifiers refine the meaning of the metadata element content. For example, the metadata element “creator” can be refined through the qualifiers of personal name or corporate body.
(b) Schema qualifiers identify the attribute value schema (e.g., thesauri, classification system, etc.) providing the metadata element content (see attribute value schema definition directly below).

Attribute Value Schema

A schema that provides valid metadata element content values. Among popular attribute value schemas are subject heading lists, thesauri, name authority files, and classificatory systems. There are standard (official) and locally produced attribute value schemas; these schemas represent one type of content standard (see the glossary in Ref. [16]). Code lists, such as ISO 639-2 Codes for the

The term metadata vocabulary is used in two distinct ways: for metadata schemas and metadata specifications (e.g., Ref. [22]), and for controlled vocabulary tools, specifically thesauri (e.g., Ref. [23]).f Although there is a clear difference between these two examples, both conceptual applications of metadata vocabulary are acceptable.

Metadata Language

One of the newest terms appearing in the literature is metadata language. The use of this term is synonymous with metadata schema or specification, with an emphasis on the grammar.[24] The use of the term metadata language in this context is related to XML developed for communication within specialized communities (e.g., Refs. [25,26]). Metadata languages and XML are similar because they both include semantics, but they differ in their expression options. An XML always conforms to an XML DTD, whereas metadata languages can conform to an XML DTD or can be expressed via other programmatic or markup languages. A fuzzy boundary exists because there are metadata languages that were initially published as official DTDs and satisfy the definition of an XML [e.g., encoded archival description/document type definition (EAD/DTD)[27] for archival finding aids]. Metadata languages (as schema) defined this way can be represented via different means, although it is not likely.

e
For example, a property may be viewed as a characteristic that is common to all members of an object class (e.g., all books have titles), whereas an attribute may refer to a characteristic of an instance (e.g., the title for a designated book). Another distinction is offered in that properties refer to the physical characteristics of a class, which are then expressed as attributes in the information system (Discussion on September 11, 2000, with Stephanie Haas, Associate Professor, School of Information and Library Sciences, The University of North Carolina at Chapel Hill).
f
These definitions refer to the specific conceptual meaning of the term metadata vocabulary, and are distinct from the use of this article’s header for Appendix, Metadata Vocabulary, which refers to the terminology used to discuss and communicate about the larger topic of metadata.
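The two facets of qualification and the notion of an attribute value schema, as defined in this glossary, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the class QualifiedElement, the qualifier labels "personalName" and "ISO639-2", and the registry ATTRIBUTE_VALUE_SCHEMAS are hypothetical and do not come from the Dublin Core specification. The three language codes are real ISO 639-2 codes, but the list is deliberately incomplete.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical attribute value schema: a tiny, incomplete list of language codes.
LANGUAGE_CODES = {"eng", "fre", "ger"}

# Hypothetical registry mapping a schema qualifier name to its valid values.
ATTRIBUTE_VALUE_SCHEMAS = {"ISO639-2": LANGUAGE_CODES}


@dataclass(frozen=True)
class QualifiedElement:
    element: str                            # metadata element, e.g. "creator"
    value: str                              # metadata element content
    type_qualifier: Optional[str] = None    # refines the meaning of the content
    schema_qualifier: Optional[str] = None  # names the attribute value schema

    def is_valid(self) -> bool:
        """Check the content against the named attribute value schema, if any."""
        if self.schema_qualifier is None:
            return True  # free-text content: nothing to validate against
        allowed = ATTRIBUTE_VALUE_SCHEMAS.get(self.schema_qualifier)
        return allowed is not None and self.value in allowed


# A type qualifier refines "creator" to a personal name.
creator = QualifiedElement("creator", "McPhee, John", type_qualifier="personalName")
# A schema qualifier says the content must come from the named code list.
language = QualifiedElement("language", "eng", schema_qualifier="ISO639-2")
not_in_list = QualifiedElement("language", "english", schema_qualifier="ISO639-2")
```

In this sketch the type qualifier only labels the content, whereas the schema qualifier makes the content checkable: language.is_valid() holds because "eng" is in the code list, and not_in_list.is_valid() does not.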
Metadata Syntax

Syntax denotes grammar or ordering of data or symbols. Metadata conditions address arrangement, content, and encoding syntaxes.

(a) Arrangement syntax specifies the sequencing for metadata element deployment. For example, the metadata element “product number” must precede the metadata element of “price” on an invoice. Some specifications dictate an order, while others do not.
(b) Content syntax specifies the content ordering for individual metadata elements. For example, a specification may recommend that “author” metadata follow the syntax of last name, first name, middle initial, or that “date” metadata follow the syntax of year, month, date (YYYY-MM-DD).
(c) Encoding syntax refers to the ordering of the symbols that comprise the encoding language. Examples of encoding languages used for metadata element identification are MARC, XML, and SGML (XML and SGML are also known as markup languages or, more accurately, metalanguages). Each of these languages has syntactical rules for encoding metadata elements. In XML and SGML, all metadata content must be preceded by a start tag and followed by an end tag, as viewed with the example for the author John McPhee: <author>John McPhee</author>.

Metadata Tags

Encoding that identifies metadata elements.

The MARC format uses three-digit numbers (also known as tags) to identify bibliographic metadata that is placed in control fields and variable fields. For example, a 245 MARC tag precedes “title and statement of responsibility” metadata for a bibliographic record. The MARC format also includes fixed fields. The MARC tagging conventions vary among bibliographic systems and can include alphabetical codes or abbreviations in addition to numbers.

The HTML tags mainly specify web page format and appearance (e.g., text indentations, alignment, color, size, style, and placement of graphics). There are also a series of HTML tags that identify a web page’s structural components and content, such as the “title” tag (<TITLE>) that appears in the header and the “description” and “keywords” META tags. A number of commercial search engines give a higher weighted value to HTML TITLE metadata during indexing and retrieval operations, and a few extract and publicly display the content of the description and keyword META tags (e.g., HotBot and AltaVista). The HTML heading tags, such as <H1> or <H2>, indicate font size, but are structural in that they identify sections (component parts) of a Web resource.

The SGML and XML metadata tags identify metadata elements with a representative vocabulary term or an intelligible abbreviation. For example, the term “author” may tag a web page author (<author>), or the abbreviation “productno” may tag a product number (<productno>) in an electronic commerce application.

Metadata Label

The public name for a metadata element. The label identifies the metadata for the end user and supports searching, administrative activities, and other functions that involve user interaction.

Metadata Record

An organized collection of metadata elements with content values that represent an object. Bibliographic or catalog records represent bibliographic objects, patient records represent people objects, and finding aids represent archival collections. A bibliographic record produced for a finding aid, which is itself an object, results in meta-metadata (see definition below).

Metadata Registry

An official location that collects and provides access to metadata specifications in a systematic way. Selected examples include:

• BizTalk Library (http://www.biztalk.org/library/library.asp), a Microsoft initiative that aims to provide global access to XML metadata schemas;
• XML.ORG Registry (http://www.xml.org/registry/index.shtml), an OASIS (Organization for the Advancement of Structured Information Standards) initiative that functions as a central clearinghouse for the publication and exchange of XML schemas and documents for industry-related metadata;
• Schemas-Forum (http://www-forum.org/), a European-based initiative that functions as a registry and assists with schema development; and
• Open Metadata Registry (http://wip.dublincore.org/registry/jsp/schema.jsp), a database for the registration, navigation, and reuse of metadata element semantics for various resource description framework (RDF) schemas developed by and used in different resource description communities (architectural framework definition given below).
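The three kinds of syntax defined under Metadata Syntax can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than an implementation of any particular specification: the function names format_author, format_date, encode_element, and encode_record are hypothetical, and the tag names simply reuse the “author” example from the text.

```python
from datetime import date
from xml.sax.saxutils import escape


def format_author(last: str, first: str, middle_initial: str = "") -> str:
    """Content syntax: last name, first name, middle initial."""
    name = f"{last}, {first}"
    if middle_initial:
        name += f" {middle_initial.rstrip('.')}."
    return name


def format_date(d: date) -> str:
    """Content syntax: year, month, day (YYYY-MM-DD)."""
    return d.isoformat()


def encode_element(tag: str, content: str) -> str:
    """Encoding syntax: a start tag, the escaped content, then an end tag."""
    return f"<{tag}>{escape(content)}</{tag}>"


def encode_record(elements: list) -> str:
    """Arrangement syntax: elements are emitted in exactly the order given."""
    return "\n".join(encode_element(tag, content) for tag, content in elements)


# Hypothetical record: the author element is placed before the date element.
record = encode_record([
    ("author", format_author("McPhee", "John")),
    ("date", format_date(date(2000, 9, 11))),
])
```

Keeping the three concerns in separate functions mirrors the glossary's distinction: a specification could change the required element order or the date format without touching the tagging rules.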
Architectural Framework

An architectural framework is the design that guides the implementation and the aggregation of the underlying metadata schema(s). The architectural framework of a metadata project is often an aggregation of a number of metadata packages that adhere to different schemas. Multiple packages are needed because many environments, such as digital libraries, contain an array of objects with different functional needs.

One of the most influential developments in this area is the Kahn/Wilensky Framework,[31] an infrastructure proposed to support a large, diverse, and extensible class of distributed digital information services. Another key metadata architecture is the Warwick Framework, a “container architecture” designed to support the coexistence of different metadata packages.[32,33] This framework is modular in that metadata packages are connected like Legos, a children’s toy whereby single plastic bits, or smaller objects composed of single plastic bits, are snapped together to form larger objects.

The Kahn/Wilensky and Warwick Frameworks, together with the Platform for Internet Content Selection[34] metadata schema, provided the foundation for the RDF (1999), a syntax-independent data model that emphasizes object properties and their values through the coexistence of different metadata schemas and attribute value schemas (see Miller[35] for an excellent overview and Beckett[36] for up-to-date links to RDF resources). The RDF is endorsed by the W3C and is likely to become one of the most used metadata architectures.

Metadata Block

A chunk or segment of one or more selected metadata elements that assist with organizing, accessing, and other functions of an object. Metadata blocks differ from schemas because they involve the use of elements created without adherence to a formal specification. The use of the HTML “title” tag and the “description” and “keyword” META tags in web pages may be thought of as a metadata block because these elements are supported by a markup language, not a formal metadata specification. (Similar to the above discussion on “metadata languages,” the notion of a metadata block can be used to raise questions about the distinction between a metadata schema and a markup language.)

METADATA GENERATION

Metadata generation is the act of creating or producing metadata. Metadata can be generated via different classes of persons, tools, and processes.

Classes of Persons

Among the classes of persons involved in metadata generation are professional metadata creators, technical metadata creators, content creators, and community or subject enthusiasts. The distinction between these classes of persons is not absolute, but they are defined separately here for reasons of clarity.
Professional metadata creators include catalogers, indexers, database administrators, and selected webmasters who have had high-level training through a formal educational curriculum and/or an official on-the-job training program. This class of persons is known as third-party metadata creators because they produce metadata for content created by other individuals. Professional metadata creators have the intellectual capacity to make sophisticated interpretative metadata-related decisions and work with classificatory systems and other complex attribute value schemas. On the more technical side, they may also have the ability to manipulate programmatic applications for automatic metadata generation; this applies to database and Web programmers. Given the professional’s expert knowledge and valuable skills, their greatest contribution in this area may be in working with more complex schemas, instituting or overseeing an established metadata production operation, instructing less-skilled persons, or helping to develop tools that facilitate metadata production.

Technical metadata creators can include webmasters, data inputters, paraprofessionals, encoders, and other persons who create metadata and may have had basic training, but have not participated in a structured or certified learning program. This class of persons is not expected to exercise discretion anywhere near the same degree as the metadata professional, although they may take on more sophisticated tasks over time. Technical metadata creators generally work with simpler schemas or they are trained in routine processes that enable them to complete or contribute to metadata records that satisfy more complex schemas. For example, paraprofessionals working in the library acquisition department create “acquisition level” (acq level) MARC bibliographic records, which are basic bibliographic descriptions that do not have authorized subject or name headings. The acq level bibliographic data are used by the metadata professional at a later date to create full-level AACR2 MARC records, which are quite complex. A similar sharing of metadata creation tasks is found with the creation of patient records. Generally, an office assistant, with the aid of the patient, first records basic metadata, such as the patient’s name, address, contact information, allergies, and reason for the appointment. More substantive information, such as the patient’s height, blood pressure, condition, treatment, and prognosis, is added to the patient’s record by a medical assistant and/or a doctor either during or after the actual appointment. The distinction between professional and technical metadata creators described here is not absolute because, frequently, there are persons who are identified as and paid as though they were technical metadata creators but who perform professional-like activities.

Content creators include persons who create (or created) the intellectual content of an object and corresponding metadata. Content creators as metadata generators may seem like a novel consideration because this task has historically been viewed as the province of professionally trained persons. However, an examination of this activity shows that authors of scientific and scholarly articles regularly produce abstracts, keywords, and name qualification metadata (e.g., they provide a middle initial or an institutional affiliation to distinguish their name from others that appear identical). Furthermore, these author-generated data are used for part of the surrogate representation in many commercial abstracting and indexing information systems.

The Web permits anyone to be an author and has contributed to the emergence of a remarkably more diverse and expanding population of content creators compared to the community that was supported by print, graphic, audio, and other more traditional forms of communication. Connected with this growth is a new metadata creator population for HTML, GIF, JPEG, or other types of Web objects. In fact, there are a host of projects that facilitate content creator metadata via templates and editors.g Examples are found with arXiv.org (http://tw.arxiv.org/) and the Networked Digital Library of Theses and Dissertations (NDLTD) (http://www.ndltd.org), both of which are part of the Open Archives Initiative (http://www.openarchives.org/) for electronic preprint (e-print) services. arXiv.org has a Web link entitled Professional Help that guides authors in the submission of e-prints and corresponding metadata, and the NDLTD includes a collection of official university nodes, each of which provides instructions for uploading theses and corresponding metadata. Due to a lack of Web skills or time, content creators may turn to webmasters or qualified persons to webify their documents (to make their document(s) Web accessible), but, in such cases, they can still provide certain descriptive metadata.

g
Tools, further defines templates and editors and is related to the discussion covered in this section.

Community or subject enthusiasts are persons who have not had any formal metadata-creation training (or at least are not employed in the professional sense) but have special subject knowledge and want to assist with documentation. The Open Source Metadata Framework (OMF; http://www.ibiblio.org/osrt/omf/) provides an example of this class of metadata creators. The OMF is based on the Dublin Core and it is used by both resource authors and Linux enthusiasts to produce metadata for Linux documentation. Another example is found with the Fine Arts Museums of San Francisco’s Thinker ImageBase (http://www.thinker.org/fam/thinker.html), which
was initiated during the Legion of Honor renovation, following the Loma Prieta earthquake. ImageBase contains images and corresponding metadata for objects from the collections of the Fine Arts Museums of San Francisco (the de Young Museum and the Legion of Honor). Through a collaborative arrangement, museum staff provided the artist’s name, date of creation, technique, and other types of official museum registration metadata, and community enthusiasts (volunteers) assigned keywords to approximately 20,000 images. Community enthusiasts were used for two key reasons: to assist museum staff with object documentation and to enhance access through the provision of additional subject terms. The OMF and ImageBase projects are exceptions rather than the norm. However, it is likely that community enthusiasts will increasingly be called upon to produce metadata, particularly with the Web’s growing connectivity, the increase in efforts to document both physical and virtual communities, and the limited availability and high cost of metadata professionals. Furthermore, it is likely that there will be more collaborative efforts between metadata professionals and community enthusiasts, as demonstrated by the ImageBase project.

Tools

Metadata generation is supported by a variety of tools. There are standards and various forms of documentation, such as specifications, qualification lists, and attribute value schemas, developed for the production of consistent, high quality, and accurate metadata. And there are human beings, intellectual tools with the capacity to exercise discretion and perform data input. Beyond these examples are templates, editors, and generators. These are devices that assist with the initial metadata generation and then capture metadata for storage in either a database or a resource header (e.g., the header of an HTML or XML document). Metadata literature and web pages that discuss or provide access to these devices label them inconsistently and make it difficult to discriminate among their various offerings. This article addresses this problem by providing refined definitions for metadata templates, editors, and generators. The discussion that follows also introduces the concept of hybrid metadata tools and explores document editors as metadata generation tools.

Templates should be viewed as basic cribsheets that sketch a framework or provide an outline of schema elements without linking to supporting documentation. Templates, in both print and electronic format, have predominated in metadata generation, most likely because they are simple to produce and maintain. These tools simply guide metadata creation through the provision of a form without the bells and whistles. An example is found with the Linux Software Map (LSM) Entry Template (ftp://ftp.execpc.com/pub/lsm/LSM.README) for metadata about Linux software packages. Guidelines associated with the LSM schema refer to the RFC822 standard for author name content syntax, among other standards, but the official template provides no linking mechanisms. Persons using this template generally work in a text editor, seek standards documentation on their own, and submit their LSM records to a Linux repository via file transfer protocol.

The MARC bibliographic form supporting cataloging in many second-generation online catalogs provides another template example. Catalogers working in these systems are presented with a form that outlines specific metadata fields (tags) for the MARC bibliographic format, but the syntactical encoding is far from complete. Additionally, these cataloging forms do not provide an immediate link to subject and name authority files, AACR2 (the electronic version), or MARC documentation, and consulting cataloging documentation is an additional task. This facility is fortunately changing as many catalogs become Web-based and hyperlink to cataloging documentation, thus functioning more like an editor (see definition for metadata editors given directly below). In short, templates are easy to maintain, but they are limited because they do not link to needed documentation; and, as noted above, the metadata creator is required to manually enter data according to the proper content and encoding syntaxes.

Editors are similar to templates, but more sophisticated in that they take advantage of technology to provide direct access to specifications, attribute value schemas, and other documentation. Furthermore, they assist with syntactical aspects of metadata creation, often via automatic means. One of the most popular Dublin Core editors is the Nordic Dublin Core Metadata Template (http://www.lub.lu.se/cgi-bin/nmdc.pl).h This editor provides a preview option that allows metadata to be examined without its syntactical encoding, which is far easier on the eye; it also supports the generation of metadata records with HTML META tags for embedding in the header of a resource. The Nordic Template has been adapted to many different Dublin Core projects, a partial list of which is found at http://dublincore.org/tools/. Another Dublin Core example is the Reggie Metadata Editor (http://metadata.net/dstc/), which allows for metadata to be generated according to the HTML 3.2 standard, the HTML 4.0 standard, and within RDF.

h
Although the term template appears in the official name of the Nordic Dublin Core Metadata Template, it is an editor according to the definitions offered in this article.
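The editor behavior described above (a plain preview of the record, plus generation of HTML META tags for embedding in a resource header) can be sketched in a few lines of Python. This is a hedged illustration only: it does not reproduce the output of the Nordic editor or of Reggie, the function names dc_meta_tags and preview are hypothetical, and the element list is a small subset of the Dublin Core element set. The "DC." prefix on the name attribute follows the common convention for Dublin Core in HTML and is assumed here rather than taken from the article.

```python
from html import escape

# A subset of Dublin Core element names, in the order they will be emitted.
DC_ELEMENTS = ("title", "creator", "subject", "description", "date", "language")


def dc_meta_tags(record: dict) -> str:
    """Render a record as HTML META tags, one per populated element."""
    lines = []
    for element in DC_ELEMENTS:
        value = record.get(element)
        if value:
            # Escape quotes and angle brackets so the attribute stays well formed.
            content = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


def preview(record: dict) -> str:
    """Show the same record without its syntactical encoding."""
    return "\n".join(
        f"{element}: {record[element]}"
        for element in DC_ELEMENTS
        if record.get(element)
    )


# Hypothetical record entered by a metadata creator.
sample = {"title": "Metadata & the Web", "creator": "Greenberg, Jane"}
```

Calling preview(sample) gives the human-readable view, while dc_meta_tags(sample) gives the encoded form that would be pasted into the header of an HTML document. The creator types the content once and the tool supplies the encoding syntax.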
Editors (or editor-like tools) have been developed for many different metadata schemas with hyperlinks to specifications, controlled vocabulary tools, and name authority files. There is even off-the-shelf software like Metabrowser (http://metabrowser.spirit.net.au/), which functions like an editor by hyperlinking to important documentation for several standard schemas and by generating the correct syntactical encoding via automatic means. This particular software enables a person to view the actual object during the metadata creation process and can also be used to develop or customize a schema. Beyond resource description, there are forms that people use daily for activities, such as joining an organization, posting information on an online community bulletin board, or purchasing a product over the Internet, and all of these forms require various types of metadata. For example, Amazon.com requires a client to submit a mailing address, product information, credit card number, and other types of information to purchase a book. The Amazon.com form, like many other Web-based forms, includes drop-down menus that help to standardize data input, and the metadata is processed and stored in a consumer database. Web forms exemplifying such features may be viewed as editors, at least in a general sense, because the data produced represents an object.

Generators support automatic metadata production.i In the context of the Web, generators first require the submission of a uniform resource locator (URL), a persistent uniform resource identifier, or another Web address in order to locate and visit an object. An algorithm is then used to comb an object’s content, including its HTML source code, and automatically assign metadata. An example is found with the DC.dot generator (http://www.ukoln.ac.uk/metadata/dcdot/), which requires the submission of a URL to locate and scan the resource’s content. This generator then automatically produces a Dublin Core record with HTML META tags or XML metadata tags within RDF. With the former option, the metadata can be embedded in the <HEAD> . . . </HEAD> section of an HTML document, and, with the latter option, the metadata can be embedded in the header of an XML document. DC.dot supports metadata generation according to a number of different schemas (e.g., Government Information Locator Service,[39] the

metadata for the “date a resource was last updated,” its “MIME type,” and other easily processed information, but the results vary greatly for more intellectually demanding metadata such as “subject descriptors.” One approach to dealing with the experimental and unpredictable nature of generators has been the creation of hybrid metadata tools that combine aspects of both editors and generators. An example is offered with Klarity’s betta meta service (http://www.klarity.com.au/), which requests the submission of a URL or Web address for automatic metadata generation, but also allows a metadata creator to complete a form that corresponds fairly well to the Dublin Core. Among several questions that the Klarity metadata form asks are: Who wrote the document? Who published the document? What type of document is it? And, in which language is it written?

Document editors that support the creation of documents according to a specified format also need to be considered in this immediate discussion. Examples include Microsoft’s Front Page (http://office.microsoft.com/features/astFrontPage.asp) and Netscape Communicator (http://home.netscape.com/communicator/v4.5/index.html) for the production of HTML documents, and XML Spy (http://www.xmlspy.com/) and Xeena IBM Alphaworks (http://www.alphaworks.ibm.com/tech/xeena) for the production of XML documents.j Nearly all document editors automatically produce certain types of metadata as part of the document-creation process (e.g., “date document was produced” is among the most common), and function, at least partially, as metadata generators. A caveat needs to be added here in that the distinction between a document editor and a metadata editor or metadata generator is somewhat fuzzy, given that metadata records may be viewed as documents. To clarify, metadata generators and metadata editors may be viewed as tools that are limited to the creation of metadata as defined in this article (that is, data about an object that facilitates functions associated with the designated object), whereas document editors aid in the production of the complete document (object), although they include metadata editor or generator-like features. In concluding this discussion, it should be emphasized that the metadata generation tools reviewed here are important because they test new technological
TEI header). capabilities and may contribute to the identification
The majority of schema-specific generators are con- more efficient and effective means of metadata produc-
sidered experimental because of their reliance on machine tion. These tools are very much intertwined with the
processing. These tools can produce fairly accurate different metadata generation processes—an overlapping
topic explored below.
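The generator workflow described in this section (scan a resource's HTML source, then emit a Dublin Core record as HTML META tags) can be illustrated with a minimal Python sketch. It is not the algorithm used by DC.dot or any other tool named here; the class name HeadScanner, the function generate_dc_meta, and the META_TO_DC mapping are hypothetical, and the sketch works on an HTML string that has already been retrieved rather than fetching a URL.

```python
from html import escape
from html.parser import HTMLParser


class HeadScanner(HTMLParser):
    """Collects the <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            fields = dict(attrs)
            name = (fields.get("name") or "").lower()
            content = fields.get("content")
            if name and content:
                self.meta[name] = content.strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical mapping from common HTML META names to Dublin Core elements.
META_TO_DC = {
    "author": "DC.creator",
    "keywords": "DC.subject",
    "description": "DC.description",
}


def generate_dc_meta(html_source):
    """Return Dublin Core META tag strings derived from an HTML string."""
    scanner = HeadScanner()
    scanner.feed(html_source)
    record = []
    if scanner.title.strip():
        record.append(("DC.title", scanner.title.strip()))
    for name, dc_element in META_TO_DC.items():
        if name in scanner.meta:
            record.append((dc_element, scanner.meta[name]))
    # Escape values so the generated tags are safe to embed in <HEAD>.
    return [
        '<meta name="%s" content="%s">' % (element, escape(value, quote=True))
        for element, value in record
    ]
```

The returned strings could be pasted into the <HEAD> section of the scanned document. A sketch like this only re-labels metadata that is already present in the source, which is one reason output from generators of this kind is usually reviewed by a person.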
i. The distinction given in this article between editors and generators is based loosely on those found under the "tools" link on the Meta Matters [Web site] produced by the National Library of Australia (http://www.nla.gov.au/meta/).
j. Only HTML and XML examples are provided here, but there are editors for many other types of document encoding and markup languages, such as SGML and TeX/LaTeX, that also generate metadata via automatic means.
metadata found in the keyword and description HTML META tags, when it is available in a resource header.k Leading from this last example, there are a number of other scenarios where manual and automatic metadata generation processes can be combined for a Web-resource description. For example, web page subject metadata can be enhanced by automatically mapping manually produced keyword and description metadata against a term list or a controlled vocabulary. Similarly, authority control can be established for named entities, such as a person's name or a geographic body, by automatically mapping manually produced name metadata against a name authority file. Another scenario combining both generation processes involves manually editing metadata that was initially produced via automatic means by a generator or a search engine's indexing algorithm. Metadata can also be generated via both automatic and manual processes at virtually the same time. Klarity's betta meta service discussed above comes close to this model because it allows for automatic metadata generation via a URL and includes a Web form that enables a person to manually create metadata. There are very few good examples of tools that combine both automatic and manual metadata generation, but it is likely that more tools will be available in the near future, particularly as more communities undertake metadata initiatives and more is learned about the effectiveness of each process.
groups, monographs, and even a forthcoming research journal entitled Metadata (sponsored by Kluwer Academic Publishers: http://www.wkap.nl/kaphtml.htm/HOMEPAGE) devoted to this topic. Additionally, the topic of metadata, with special attention to the Web, has become an official part of many information and library science curricula, and research is underway at Catholic University (http://research.cua.edu/metadata/) to further educational knowledge and developments in this area. These events clearly demonstrate that the study and exploration of metadata is of fundamental importance for the future organization of, and access to, Web resources. Metadata is a vast topic, and it is unreasonable to expect a single article to adequately introduce all of its facets. It is therefore recommended that persons interested in the basic foundations and range of issues in this area consult the growing body of metadata literature, much of which is accessible via the Web (e.g., the Dublin Core homepage (http://dublincore.org/) and the IFLA Digital Libraries: Metadata Resources web page (http://www.ifla.org/II/metadata.htm)). While there are many outstanding resources that cover metadata, this article concludes by listing four selected key readings: Dempsey and Heery,[42] Hudgins et al.,[43] Introduction to Metadata,[16] and Vellucci.[44]
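The combined scenario described in this section, in which manually produced keywords are automatically mapped against a term list or controlled vocabulary, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a description of any tool named in this article: the function names normalize and map_to_vocabulary and the small VOCABULARY table are hypothetical, and a real vocabulary would be far larger and would likely need fuzzier matching.

```python
# Hypothetical term list: normalized entry term (preferred or variant)
# mapped to the preferred heading.
VOCABULARY = {
    "metadata": "Metadata",
    "meta data": "Metadata",
    "cataloging": "Cataloging",
    "cataloguing": "Cataloging",
    "www": "World Wide Web",
    "world wide web": "World Wide Web",
}


def normalize(term):
    """Lowercase a term and collapse runs of whitespace for lookup."""
    return " ".join(term.lower().split())


def map_to_vocabulary(keywords, vocabulary):
    """Split keywords into (matched preferred terms, unmatched originals)."""
    matched, unmatched = [], []
    for keyword in keywords:
        preferred = vocabulary.get(normalize(keyword))
        if preferred is None:
            # No entry found: keep the original for manual review.
            unmatched.append(keyword)
        elif preferred not in matched:
            matched.append(preferred)
    return matched, unmatched


matched, unmatched = map_to_vocabulary(
    ["Meta  Data", "cataloguing", "metadata", "folksonomy"], VOCABULARY
)
print(matched)    # ['Metadata', 'Cataloging']
print(unmatched)  # ['folksonomy']
```

The same lookup pattern could serve for authority control by swapping the term list for a name authority file keyed on normalized name variants. Returning the unmatched terms separately keeps a person in the loop, in line with the manual editing step described above.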
APPENDIX A
Report TR96-1593, July 1996. http://cs-tr.cs.cornell.edu:80/Dienst/UI/2.0/Describe/ncstrl.cornell/TR96-1593.
34. Platform for Internet Content Selection (PICS), Technical Specifications & Completed Specifications for PICS-1.1; 1997. http://www.w3.org/PICS/.
35. Miller, E. An introduction to the resource description framework. D-Lib Mag. 1998. http://www.dlib.org/dlib/may98/miller/05miller.html.
36. Beckett, D. Dave Beckett's Resource Description Framework (RDF) Resource Guide; ILRT University of Bristol, 2001 [last update]. http://www.ilrt.bris.ac.uk/discovery/rdf/resources/.
37. Iannella, R.; Campbell, D. The A-Core (A-CORE): Metadata about Content Metadata; 1999. http://metadata.net/admin/draft-iannella-admin-01.txt.
38. TEI Guidelines for Electronic Text Encoding and Interchange (TEI) (P3); Sperberg-McQueen, C.M., Burnard, L., Eds.; 1994, Chapter 5. http://etext.lib.virginia.edu/TEI.html (Printed copy: Guidelines for Electronic Text Encoding and Interchange; Sperberg-McQueen, C.M., Burnard, L., Eds.; Text Encoding Initiative: Chicago, 1994).
39. Government Information Locator Service (GILS). 1997. http://www.gils.net/prof_v2.html.
40. Vinyard, P.E. An Analysis of Embedded Metadata Usage on the World Wide Web. In A Master's Paper for the M.S. in L.S. Degree; School of Information and Library Science, University of North Carolina at Chapel Hill, 2001.
41. Federal Geographic Metadata Committee, Content Standard for Digital Geospatial Metadata (FGDC/CSDGM). 1998. http://fgdc.er.usgs.gov/metadata/csdgm/.
42. Dempsey, L.; Heery, R. Metadata: A current view of practice and issues. J. Doc. 1998, 54 (2), 154–173.
43. Hudgins, J.; Agnew, G.; Brown, E. Getting Mileage Out of Metadata; American Library Association, 1999.
44. Vellucci, S. Metadata. Annu. Rev. Inf. Sci. Technol. 1998, 33, 187–222.