Semantic Web: Department of Computer Science, CUSAT
ABSTRACT
The word "semantic" refers to meaning: the semantics of something is its meaning. The Semantic Web is a web that is able to describe things in a way that computers can understand.
Statements such as "A is a part of B" or "Y is a member of Z" can be understood by people. But how can they be understood by computers? Statements are built with syntax rules, and the syntax of a language defines the rules for building its statements. But how can syntax become semantics? This is what the Semantic Web is all about: describing things in a way that computer applications can understand. The Semantic Web is not about links between web pages. It describes the relationships between things (such as A is a part of B, or Y is a member of Z) and the properties of things (such as size, weight, age, and price).
CONTENTS
1. INTRODUCTION
1.1 What is Semantic Web?
3. PROJECTS
4. BROWSERS
5. CASE STUDY
6. CONCLUSION
7. REFERENCES
1. INTRODUCTION
The Web was designed as an information space, with the goal that it should be useful not only for human-to-human communication, but also that machines would be able to participate and help. One of the major obstacles has been the fact that most information on the Web is designed for human consumption; even if it was derived from a database with well-defined meanings for its columns, the structure of the data is not evident to a robot browsing the Web. Humans are capable of using the Web to carry out tasks such as finding the Finnish word for "car", reserving a library book, or searching for the cheapest DVD and buying it. However, a computer cannot accomplish the same tasks without human direction, because web pages are designed to be read by people, not machines.
Tim Berners-Lee originally expressed his vision of the Semantic Web as follows:
“I have a dream for the Web [in which computers] become capable of analyzing
all the data on the Web – the content, links, and transactions between people and
computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but
when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be
handled by machines talking to machines. The ‘intelligent agents’ people have touted for
ages will finally materialize”
For example, with HTML and a tool to render it (perhaps Web browser software,
perhaps another user agent), one can create and present a page that lists items for sale.
The HTML of this catalog page can make simple, document-level assertions such as "this
document's title is 'Widget Superstore'". But there is no capability within the HTML itself
to assert unambiguously that, for example, item number X586172 is an Acme Gizmo
with a retail price of €199, or that it is a consumer product. Rather, HTML can only say
that the span of text "X586172" is something that should be positioned near "Acme
Gizmo" and "€ 199", etc. There is no way to say "this is a catalog" or even to establish
that "Acme Gizmo" is a kind of title or that "€ 199" is a price. There is also no way to
express that these pieces of information are bound together in describing a discrete item,
distinct from other items perhaps listed on the page.
The semantic web addresses this shortcoming, using the descriptive technologies
Resource Description Framework (RDF) and Web Ontology Language (OWL), and the
data-centric, customizable Extensible Markup Language (XML). These technologies are
combined in order to provide descriptions that supplement or replace the content of Web
documents. Thus, content may manifest as descriptive data stored in Web-accessible
databases, or as markup within documents (particularly, in Extensible HTML (XHTML)
interspersed with XML, or, more often, purely in XML, with layout/rendering cues stored
separately). The machine-readable descriptions enable content managers to add meaning
to the content, i.e. to describe the structure of the knowledge we have about that content.
In this way, a machine can process knowledge itself, instead of text, using processes
similar to human deductive reasoning and inference, thereby obtaining more meaningful
results and facilitating automated information gathering and research by computers.
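For instance, the catalogue assertions mentioned above might be captured in RDF, written here in the Notation3 syntax introduced later in this report; the shop: vocabulary and the catalogue URI are purely illustrative:
@prefix shop: <http://example.org/shop/terms#> .
<http://example.org/catalog#X586172> a shop:ConsumerProduct ;
shop:name "Acme Gizmo" ;
shop:retailPrice "199 EUR" .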
The basic building block for naming things on the Semantic Web is the URI (Uniform Resource Identifier). Because the Web is far too large for any one organization to control, URIs are decentralized: no one person or organization controls who makes them or how they can be used. While some URI schemes (such as http:) depend on centralized systems (such as DNS), other schemes (such as freenet:) are completely decentralized.
This means that we don't need anyone's permission to create a URI. We can even
create URIs for things we don't own. While this flexibility makes URIs powerful, it
brings with it more than a few problems. Because anyone can create a URI, we will
inevitably end up with multiple URIs representing the same thing. Worse, there will be
no way to figure out whether two URIs refer to exactly the same resource. Thus, we'll
never be able to say with certainty exactly what a given URI means. But these are trade-offs that must be made if we are to create something as enormous as the Semantic Web.
A URI is not a set of directions telling your computer how to get to a specific file
on the Web (though it may also do this). It is a name for a "resource" (a thing). This
resource may or may not be accessible over the Internet. The URI may or may not
provide a way for our computer to get more information about that resource. A URL is a type of URI that does provide a way to get information about a resource, or perhaps to retrieve the resource itself, and other methods for providing information about URIs and the resources they identify are under development. The ability to say things about URIs is an important part of the Semantic Web.
Consider a simple example: if a document contains certain words that are marked
as "emphasized," the way those words are rendered can be adapted to the context. A Web
browser might simply display them in italics, whereas a voice browser (which reads Web
pages aloud) might indicate the emphasis by changing the tone or the volume of its voice.
Each program can respond appropriately to the meaning encoded in the markup. In contrast, if we simply marked the words as "in italics", the computer would have no way of knowing why those words are in italics. Is it for emphasis, or simply for a visual effect? How should a voice browser render such an effect?
Consider the simple sentence "I just got a new pet dog." As far as our computer is concerned, this is just text; it has no particular meaning to the computer. But now consider this same passage marked up using an XML-based markup language (we'll make one up for this example):
<sentence>
<person href="http://aaronsw.com/"> I </person> just got a new pet <animal> dog
</animal>.
</sentence>
Notice that this has the same content, but that parts of that content are labeled.
Each label consists of two "tags": an opening tag (e.g., <sentence>) and a closing tag
(e.g., </sentence>). The name of the tag ("sentence") is the label for the content enclosed
by the tags. We call this collection of tags and content an "element." Thus, the sentence
element in the above document contains the sentence, "I just got a new pet dog." This
tells the computer that "I just got a new pet dog" is a "sentence," but -- importantly -- it
does not tell the computer what a sentence is. Still, the computer now has some
information about the document, and we can put this information to use. Similarly, the computer now knows that "I" is a "person" (whatever that is) and that "dog" is an "animal." We can go even further and use URIs to point to more information about the specific things we are talking about:
<sentence>
<person href="http://aaronsw.com">I</person> just got a new pet <animal type="dog"
href="http://aaronsw.com/myDog">dog</animal>.
</sentence>
A problem with this is that we've used the words "sentence," "person," and
"animal" in the markup language. But these are pretty common words. What if others
have used these same words in their own markup languages? What if those words have
different meanings in those languages? Perhaps "sentence" in another markup language
refers to the amount of time that a convicted criminal must serve in a penal institution.
The solution is to identify each tag with a URI of its own, which is exactly what XML namespaces provide. Since everyone's tags then have their own URIs, we don't have to worry about tag names conflicting. XML, of course, lets us abbreviate these URIs and set defaults so we don't have to type them out each time.
RDF is really quite simple. An RDF statement is a lot like a simple sentence,
except that almost all the words are URIs. Each RDF statement has three parts: a subject,
a predicate and an object. Let's look at a simple RDF statement:
<http://aaron.com/>
<http://love.example.org/terms/reallyLikes>
<http://www.w3.org/People/Berners-Lee/Weaving/> .
The first URI is the subject; in this instance, the subject is Aaron. The second URI is the predicate; it relates the subject to the object. In this instance, the predicate is "reallyLikes." The third URI is the object; here, the object is Tim Berners-Lee's book "Weaving the Web." So the RDF statement above says that Aaron really likes "Weaving the Web."
Once information is in RDF form, it becomes easy to process it, since RDF is a
generic format, which already has many parsers. XML RDF is quite a verbose
specification, and it can take some getting used to (for example, to learn XML RDF
properly, you need to understand a little about XML and namespaces beforehand...), but
let's take a quick look at an example of XML RDF right now:-
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:foaf="http://xmlns.com/0.1/foaf/" >
<rdf:Description rdf:about="">
<dc:creator rdf:parseType="Resource">
<foaf:name>Sean B. Palmer</foaf:name>
</dc:creator>
<dc:title>The Semantic Web: An Introduction</dc:title>
</rdf:Description>
</rdf:RDF>
This piece of RDF basically says that this article has the title "The Semantic Web:
An Introduction", and was written by someone whose name is "Sean B. Palmer". Here
are the triples that this RDF produces:-
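Written out as a sketch, with the document itself denoted by <> and the unnamed creator denoted by the blank node _:x, the triples are roughly:
<> <http://purl.org/dc/elements/1.1/creator> _:x .
_:x <http://xmlns.com/0.1/foaf/name> "Sean B. Palmer" .
<> <http://purl.org/dc/elements/1.1/title> "The Semantic Web: An Introduction" .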
This format is actually a plain text serialization of RDF called "Notation3", which we shall be covering later on. Some people prefer XML RDF to Notation3, but it is generally accepted that Notation3 is easier to use, and it is of course convertible to XML RDF anyway. RDF triples can thus be written with XML tags or as plain text, and they are often drawn graphically as a directed graph, with the subject and object as nodes and the predicate as a labelled arc between them.
Why RDF?
When people are confronted with XML RDF for the first time, they usually have
two questions: "why use RDF rather than XML?", and "do we use XML Schema in
conjunction with RDF?".
The answer to "why use RDF rather than XML?" is quite simple, and is twofold.
Firstly, the benefit that one gets from drafting a language in RDF is that the information
maps directly and unambiguously to a model, a model which is decentralized, and for
which there are many generic parsers already available. This means that when you have
an RDF application, you know which bits of data are the semantics of the application,
and which bits are just syntactic fluff. And not only do you know that, everyone knows
that, often implicitly without even reading a specification because RDF is so well known.
The second part of the twofold answer is that we hope RDF data will become a part of the Semantic Web, so the benefit of drafting your data in RDF now parallels the benefit of drafting your information in HTML in the early days of the Web.
The answer to "do we use XML Schema in conjunction with RDF?" is almost as
brief. XML Schema is a language for restricting the syntax of XML applications. RDF
already has a built-in BNF that sets out how the language is to be used, so on the face of it
the answer is a solid "no". However, using XML Schema in conjunction with RDF may
be useful for creating datatypes and so on. Therefore the answer is "possibly", with a
caveat that it is not really used to control the syntax of RDF. This is a common
misunderstanding, perpetuated for too long now.
For the Semantic Web to reach its full potential, many people need to start
publishing data as RDF. Where is this information going to come from? A lot of it can be
derived from many data publications that exist today, using a process called "screen
scraping". Screen scraping is the act of literally getting the data from a source into a more
manageable form (in this case, RDF) using whatever means come to hand. Two useful tools for screen scraping are XSLT (an XML transformation language) and regular expressions (in Perl, Python, and so on).
XML RDF can be rather difficult, but thankfully, there are simpler teaching forms
of RDF. One of these is called "Notation3", and was developed by Tim Berners-Lee.
There is some documentation covering N3, including a specification and an excellent
Primer.
The design criteria behind Notation3 were fairly simple: design a simple, easy-to-learn, scribblable RDF format that is easy to parse and to build larger applications on top of. In Notation3, we can simply write out the URIs in a triple, delimiting them with "<" and ">" symbols. For example, here is a simple triple consisting of three URIs:-
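For instance, the earlier statement about Aaron can be written directly in Notation3 as:
<http://aaron.com/> <http://love.example.org/terms/reallyLikes> <http://www.w3.org/People/Berners-Lee/Weaving/> .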
To use literal values, simply enclose the value in double quote marks, thus:-
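For example (the book URI here is illustrative, and the predicate is the Dublin Core title property used earlier):
<http://example.org/books/WeavingTheWeb> <http://purl.org/dc/elements/1.1/title> "Weaving the Web" .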
Notation3 does have many other little constructs including contexts, DAML lists,
and alternative ways of representing anonymous nodes, but we need not concern
ourselves with them here.
RDF Schema
RDF Schema was designed to be a simple datatyping model for RDF. Using RDF Schema, we can say that "Fido" is a type of "Dog", and that "Dog" is a subclass of "Animal". We can also create properties and classes, as well as doing some slightly more "advanced" things such as creating ranges and domains for properties.
The three most important concepts that RDF and RDF Schema give us are the "Resource" (rdfs:Resource), the "Class" (rdfs:Class), and the "Property" (rdf:Property). These are all "classes", in that terms may belong to them; for example, all terms in RDF are types of resource. To declare that something is a "type" of something else, we just use the rdf:type property:-
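In Notation3, and assuming the usual @prefix declarations for the rdf: and rdfs: namespaces, this looks roughly like:
rdfs:Resource rdf:type rdfs:Class .
rdfs:Class rdf:type rdfs:Class .
rdf:Property rdf:type rdfs:Class .
rdf:type rdf:type rdf:Property .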
This simply says that "Resource is a type of Class, Class is a type of Class,
Property is a type of Class, and type is a type of Property". These are all true statements.
It is quite easy to make up our own classes. For example, let's create a class called "Dog",
which contains all of the dogs in the world:-
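A one-line Notation3 sketch, using ":" as an illustrative local namespace prefix:
:Dog rdf:type rdfs:Class .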
We can also create properties quite easily by saying that a term is a type of
rdf:Property, and then use those properties in our RDF:-
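Continuing the sketch with the same illustrative prefix:
:name rdf:type rdf:Property .
:Fido rdf:type :Dog .
:Fido :name "Fido" .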
Why have we said that Fido's name is "Fido"? Because the term ":Fido" is a URI,
and we could quite easily have chosen any URI for Fido, including ":Squiggle" or
":n508s0srh". We just happened to use the URI ":Fido" because it's easier to remember.
However, we still have to tell machines that his name is Fido, because although people
can guess that from the URI (even though they probably shouldn't), machines can't.
RDF Schema also has a few more properties that we can make use of: rdfs:subClassOf
and rdfs:subPropertyOf. These allow us to say that one class or property is a sub class or
sub property of another. For example, we might want to say that the class "Dog" is a sub
class of the class "Animal". To do that, we simply say:-
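For instance (again a sketch with illustrative class names):
:Dog rdfs:subClassOf :Animal .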
Hence, when we say that Fido is a Dog, we are also saying that Fido is an Animal.
We can also say that there are other sub classes of Animal:-
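For example (the particular classes chosen here are illustrative):
:Human rdfs:subClassOf :Animal .
:Duck rdfs:subClassOf :Animal .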
And so on. RDF Schema allows one to build up knowledge bases of data in RDF very quickly.
The next concepts which RDF Schema provides us, which are important to
mention, are ranges and domains. Ranges and domains let us say what classes the subject
and object of each property must belong to. For example, we might want to say that the
property ":bookTitle" must always apply to a book, and have a literal value:-
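A sketch, with :Book as an illustrative class and rdfs:Literal as the class of literal values:
:bookTitle rdfs:domain :Book .
:bookTitle rdfs:range rdfs:Literal .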
rdfs:domain always says what class the subject of a triple using that property
belongs to, and rdfs:range always says what class the object of a triple using that property
belongs to.
RDF Schema also contains a set of properties for annotating schemata, providing
comments, labels, and the like. The two properties for doing this are rdfs:label and
rdfs:comment, and an example of their use is:-
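For instance (a sketch annotating the illustrative Dog class from earlier):
:Dog rdfs:label "Dog" .
:Dog rdfs:comment "The class of all dogs." .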
DAML+OIL
DAML+OIL takes RDF Schema a step further, giving us more expressive constructs for describing classes and properties: inverses, equivalences, unambiguous and unique properties, datatypes, and so on. We shall run through a couple of these here, but armed with the knowledge that you have already gained from this introduction (assuming that you haven't skipped any of it!), it should be just as beneficial to go through the DAML+OIL walkthrough.
One DAML+OIL construct that we shall run through is the daml:inverseOf property. Using this property, we can say that one property is the inverse of another; both the rdfs:domain and the rdfs:range of daml:inverseOf are rdf:Property. Here is an example of daml:inverseOf being used:-
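A sketch, with illustrative property names and assuming a daml: prefix bound to the DAML+OIL namespace:
:hasDog daml:inverseOf :isDogOf .
:Aaron :hasDog :Fido .
# from which a processor may infer: :Fido :isDogOf :Aaron .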
Another DAML+OIL construct that we shall make use of below is daml:equivalentTo, which states that two resources are identical. For example, the statement:-
:x daml:equivalentTo :y .
says that whatever is true of :x is also true of :y, and vice versa.
Inference
The principle of "inference" is quite simple: being able to derive new data from data that we already know. In a mathematical sense, querying is a form of inference (being able to infer some search results from a mass of data, for example). Inference is one of the driving principles of the Semantic Web, because it will allow us to create SW applications quite easily.
To demonstrate the power of inference, let us take a simple car example: we can say that:-
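A sketch, where the de: prefix stands for an imagined German automotive vocabulary and :MyCar is an illustrative resource:
:MyCar de:macht "160KW" .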
Now, a German Semantic Web processor may well have the term ":macht" built into it, but an English processor, even if it has an equivalent term of its own, will not understand data that uses a term it does not recognize. Here, then, is a piece of inference data that makes things clearer to the processor:-
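Again a sketch, with en: standing for the corresponding English vocabulary:
de:macht daml:equivalentTo en:power .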
We have used the DAML "equivalentTo" property to say that "macht" in the
German system is equivalent to "power" in the English system. Now, using an inference
engine, a Semantic Web client could successfully determine that:-
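Continuing the sketch, the inferred statement would be:
:MyCar en:power "160KW" .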
2.5 Logic
For the Semantic Web to become expressive enough to help us in a wide range of
situations, it will become necessary to construct a powerful logical language for making
inferences. There is a raging debate as to how and even whether this can be
accomplished, with people pointing out that RDF lacks the power to quantify, and that
the scope of quantification is not well defined. Predicate logic is discussed in greater depth in John Sowa's excellent Mathematical Background (Predicate Logic).
At any rate, we already have a great range of tools with which to build the Semantic Web: assertion (i.e. "and") and quoting (reification) in RDF; classes, properties, ranges and documentation in RDF Schema; and disjoint classes, unambiguous and unique properties, datatypes, inverses, equivalences, lists, and much more in DAML+OIL.
For example, a negation over a conjunction of statements can be sketched in Notation3 using contexts (the curly brackets) and the log: vocabulary; the particular terms below are illustrative:
{ { :Joe :loves :TheSimpsons } a log:Falsehood .
{ :Joe :is :Nuts } a log:Falsehood . } a log:Falsehood .
Which can be read as "it is not true that Joe does not love The Simpsons and is not nuts". We have resisted the temptation to make Joe a universally quantified variable.
Note that the above example does not serialize "properly" into XML RDF, because XML
RDF does not have the context construct as denoted by the curly brackets in the example
above. However, a similar effect can be achieved using reification and containers.
For example:
All permanent employees of SAS get a 50% discount at all Radisson hotels.
Jack is a permanent employee of SAS.
THEREFORE, Jack gets a 50% discount at all Radisson hotels.
2.6 Proof
Once we begin to build systems that follow logic, it makes sense to use them to
prove things. People all around the world could write logic statements. Then your
machine could follow these Semantic "links" to construct proofs.
Example: Corporate sales records show that Jane has sold 55 widgets and 66
sprockets. The inventory system states that widgets and sprockets are both different
company products. The built-in math rules state that 55 + 66 = 121 and that 121 is more
than 100. And, as we know, someone who sells more than 100 products is a member of
the Super Salesman club. The computer puts all these logical rules together into a proof
that Jane is a Super Salesman.
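A rule of this kind might be sketched in Notation3 as follows; the :totalSales and :SuperSalesman terms are illustrative, and the math: prefix refers to the SWAP math vocabulary used by engines such as cwm:
@prefix math: <http://www.w3.org/2000/10/swap/math#> .
:Jane :totalSales 121 .
{ ?who :totalSales ?n . ?n math:greaterThan 100 . } => { ?who a :SuperSalesman } .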
While it's very difficult to create these proofs (it can require following thousands, or perhaps millions, of links in the Semantic Web), it's very easy to check them. In this
way, we begin to build a Web of information processors. Some of them merely provide
data for others to use. Others are smarter, and can use this data to build rules. The
smartest are "heuristic engines" which follow all these rules and statements to draw
conclusions, and kindly place their results back on the Web as proofs, as well as plain old
data.
Now it's highly unlikely that you'll trust enough people to make use of most of the
things on the Web. That's where the "Web of Trust" comes in. You tell your computer
that you trust your best friend, Robert. Robert happens to be a rather popular guy on the
Net, and trusts quite a number of people. And of course, all the people he trusts, trust
another set of people. Each of those people trust another set of people, and so on. As
these trust relationships fan out from you, they form a "Web of Trust." And each of these
relationships has a degree of trust (or distrust) associated with it.
Note that distrust can be as useful as trust. Suppose your computer discovers a document that no one explicitly trusts, but that no one explicitly distrusts either. Most likely, your computer will trust this document more than it trusts one that has been explicitly labeled as untrustworthy.
The computer takes all these factors into account when deciding how trustworthy
a piece of information is. It can also make this process as transparent or opaque as you
desire.
3. PROJECTS
FOAF (Friend of a Friend): an RDF vocabulary for describing people, their activities, and their relationships to other people and things.
SIOC (Semantically-Interlinked Online Communities): an ontology for describing the content and structure of online community sites such as blogs and forums.
SIMILE: an MIT project developing tools for managing and integrating heterogeneous metadata using Semantic Web technologies.
The Linking Open Data project is a community-led effort to create openly accessible, interlinked RDF data on the Web. The data in question takes the form of RDF data sets drawn from a broad collection of data sources. The project is one of several sponsored by the W3C's Semantic Web Education and Outreach (SWEO) Interest Group.
4. BROWSERS
A Semantic Web browser is a form of Web user agent that expressly requests RDF data from Web servers using the best practice known as "content negotiation". These tools provide a user interface that enables data-link-oriented navigation of RDF data by dereferencing the data links (URIs) in the RDF data sets returned by Web servers.
5. CASE STUDY
Semantic-based Search and Query System for the Traditional Chinese
Medicine Community
General Description
The long-standing curation effort in the Chinese medicine community has accumulated huge amounts of data, which are typically stored in relational database management systems such as Oracle and published as HTML pages for public presentation. The China Academy of Chinese Medicine Sciences (CACMS) hosts much of the data. However, it has become increasingly difficult and time-consuming to manage the data and the links to data sources from other institutions. Although the data sets can be physically put together, the logical links among them are usually implicit or lost altogether. Moreover, the informal way in which names are chosen for relational tables, table columns, and record values makes the data understandable only to the original database designer and data curator, and exclusively controlled by ad hoc applications. This severely hinders the sharing and reuse of data across database and organizational boundaries.
Figure 1: This figure shows the architecture of the Semantic Web layer and its role in
unifying and linking heterogeneous relational data.
They have applied Semantic Web technologies to relational data to make it more sharable and machine-processable, and have developed a semantic-based search and query system for the traditional Chinese medicine community in China (TCM Search), which has been deployed for real-life usage since fall 2005. For the TCM system, a TCM ontology and a semantic layer have been constructed to unify and link the legacy databases, which typically have heterogeneous logical structures and physical properties. Users and applications now only need to interact with the semantic layer, and the semantic interconnections allow searching, querying, and navigating across an extensible set of databases without awareness of the boundaries (Figure 1). Additional deductive capabilities can then be implemented at the semantic layer to increase the usability and re-usability of the data. In addition, a visual mapping tool has been developed to facilitate the mapping from relational data to the TCM ontology, and an ontology-based query and search portal has been implemented to support semantic interaction with the system.
The informal approach taken to the selection of names and values within relational databases makes the data understandable only by specific applications. Mapping the relational data to the Semantic Web layer makes the semantics of the data more formal, explicit, and ready for sharing and reuse by other applications. However, because of the inherent difference between the relational data model and the Semantic Web languages, mapping is always a complicated task and can be time-consuming and error-prone. They have therefore developed a visual mapping tool to simplify the work as much as possible, as Figure 2 shows. The tool generates mapping rules that are used when a SPARQL query is rewritten into a set of SQL queries.
Figure 2: This figure shows a visualized mapping from a TCM relational database
to the TCM ontology.
The ontology plays an important role in mediating query, search, and navigation. First, it serves as a logical layer for users in constructing semantic queries: the form-based query interface is automatically generated from the ontological structure, and the constructed semantic query is then translated into SQL queries based on the mapping rules generated by the mapping tool. Second, it enables semantic navigation across database boundaries during query and search. Third, it serves as a controlled vocabulary that facilitates search by making semantic suggestions such as synonyms and related concepts.
Figure 3: This figure shows the ontology-based query and search portal of the TCM
search system.
Key Benefits of Using Semantic Web Technology
• Legacy data is exposed through a semantic layer so that it can be more easily reused and recombined.
• Data is linked across database boundaries, enabling more intuitive query, search, and navigation without awareness of those boundaries.
• The ontology serves as a controlled vocabulary that makes semantic suggestions, such as synonyms and related concepts, to facilitate query and search.
6. CONCLUSION
One of the best things about the Web is that it's so many different things to so
many different people. The coming Semantic Web will multiply this versatility a
thousandfold. For some, the defining feature of the Semantic Web will be the ease with
which your PDA, your laptop, your desktop, your server, and your car will communicate
with each other. For others, it will be the automation of corporate decisions that
previously had to be laboriously hand-processed. For still others, it will be the ability to
assess the trustworthiness of documents on the Web and the remarkable ease with which
we'll be able to find the answers to our questions -- a process that is currently fraught
with frustration.
Whatever the reason, almost everyone can find a reason to support this grand vision of the Semantic Web. Sure, it's a long way from here to there. The possibilities are
endless, and even if we don't ever achieve all of them, the journey will most certainly be
its own reward.
7. REFERENCES
http://www.w3.org/2001/sw/
http://www.w3schools.com/rdf/
http://www.w3.org/RDF/FAQ
http://en.wikipedia.org/wiki/Semantic_Web
http://www.hpl.hp.com/semweb/sw-technology.htm
http://www.w3.org/2001/sw/sweo/public/UseCases/UniZheijang/
The Semantic Web in Action, by Lee Feigenbaum, Ivan Herman, Tonya Hongsermeier, Eric Neumann, and Susie Stephens, Scientific American, December 2007.