DiploCloud: Efficient and Scalable Management of RDF Data
In the Cloud
Aim:
The main aim of this project is to provide an efficient and scalable distributed RDF
data management system for the cloud.
Synopsis:
Despite recent advances in distributed RDF data management, processing large
amounts of RDF data in the cloud is still very challenging. In spite of its seemingly simple
data model, RDF actually encodes rich and complex graphs mixing both instance- and
schema-level data. Sharding such data using classical techniques, or partitioning the graph
using traditional min-cut algorithms, leads to very inefficient distributed operations and to
a high number of joins. In this paper, we describe DiploCloud, an efficient and scalable
distributed RDF data management system for the cloud. Contrary to previous approaches,
DiploCloud runs a physiological analysis of both instance and schema information prior to
partitioning the data. We describe the architecture of DiploCloud, its main data structures,
and the new algorithms we use to partition and distribute data. We also present an
extensive evaluation of DiploCloud showing that our system is often two orders of
magnitude faster than state-of-the-art systems on standard workloads.
Existing System:
Because database retrieval is heavyweight and time-consuming, information
increasingly needs to be addressed through RDF-based representations, in which variables
can be connected in arbitrary ways. Query processing over RDF corresponds to the notion
of basic graph pattern (BGP) matching in SPARQL. Every query represents a graph pattern
consisting of a set of triple patterns over distinguished variables, undistinguished variables,
and constants; hence retrieval based on simple keyword-style search alone is ineffective. A
solution to a graph pattern q on a graph G is a mapping from the variables in q to vertices
in G such that substituting the variables yields a subgraph of G. The substitutions of the
distinguished variables constitute the answers. In fact, such a mapping can be interpreted
as a homomorphism (i.e., a structure-preserving mapping) from the query graph to the data
graph. This task of matching a query graph pattern against the data graph is supported by
various RDF stores, which retrieve data for every triple pattern and join it along the query
edges. While the efficiency of retrieval depends on the physical data organization and
indexing, the efficiency of joins is largely determined by the join implementation and the
join-order optimization strategy. We discuss these performance drivers, which distinguish
existing RDF stores. There is no single dominant system; rather, the state of the art in RDF
data management is constituted by a combination of different concepts.
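The retrieve-and-join scheme described above can be sketched in a few lines. This is an illustrative in-memory version, not any particular RDF store's implementation: triples are plain string arrays, variables are terms starting with `?`, and the class and method names are invented for the example.

```java
import java.util.*;

// Minimal sketch of BGP matching: per-pattern retrieval followed by a join
// on the shared query variables (illustrative, not an actual RDF store).
public class BgpMatch {

    static boolean isVar(String term) { return term.startsWith("?"); }

    // Retrieve variable bindings for a single triple pattern {s, p, o}.
    static List<Map<String, String>> scan(List<String[]> data, String[] pat) {
        List<Map<String, String>> out = new ArrayList<>();
        for (String[] t : data) {
            Map<String, String> binding = new HashMap<>();
            boolean matches = true;
            for (int i = 0; i < 3; i++) {
                if (isVar(pat[i])) binding.put(pat[i], t[i]);
                else if (!pat[i].equals(t[i])) { matches = false; break; }
            }
            if (matches) out.add(binding);
        }
        return out;
    }

    // Join two binding sets along the query edges (shared variables).
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> a : left) {
            for (Map<String, String> b : right) {
                boolean compatible = true;
                for (Map.Entry<String, String> e : a.entrySet()) {
                    String other = b.get(e.getKey());
                    if (other != null && !other.equals(e.getValue())) {
                        compatible = false;
                        break;
                    }
                }
                if (compatible) {
                    Map<String, String> merged = new HashMap<>(a);
                    merged.putAll(b);
                    out.add(merged);
                }
            }
        }
        return out;
    }
}
```

The join-order choice this sketch leaves open is exactly the performance driver discussed above: evaluating the most selective pattern first keeps the intermediate binding sets small.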
Proposed System:
We propose an efficient and scalable distributed RDF data management system for
the cloud: a structure-oriented approach that exploits the structure patterns exhibited by
the underlying data, captured using a height- and label-parameterized structure index for
RDF. To capture the structure of the underlying data, we propose to use the structure
index, a concept that has been successfully applied in the area of XML and
semi-structured data management. It is basically a graph whose vertices represent groups
of data elements that are similar in structure. For constructing this index, we consider
structure patterns that are paths exhibiting certain edge labels. A structure index can be
used as a pseudo-schema for querying and browsing semi-structured RDF data on the
web. Further, we propose to leverage it for RDF data partitioning. To obtain a contiguous
storage of data elements that are structurally similar, vertices of the structure index are
mapped to tables: triples with the same property label, and triples whose subjects share
the same structure, are physically grouped. In such fine-granular groups that match a
given query, a higher proportion of the stored elements are candidate answers. Standard
query processing relies on what we call data-level processing, which consists of
operations that are executed against the data only. We suggest using the structure index
for structure-level query processing. A basic strategy is to match the query against the
structure index first, to identify groups of data that satisfy the query structure. Then, via
standard data-level processing, data in these relevant groups are retrieved and joined.
However, this needs to be performed only for those parts of the query that, in addition to
the structure constraints, also contain constants and distinguished variables representing
more specific constraints that can only be validated against the actual data. Instead of
performing structure- and data-level operations successively and independently of each
other, as in this basic strategy, we further propose an integrated strategy that aims at an
optimal combination of these two types of operations.
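A much-simplified, label-only version of this idea can be sketched as follows. This is not the paper's exact algorithm (it ignores the height parameter and multi-hop paths): subjects are grouped by the set of property labels they exhibit, and structure-level processing prunes to the groups whose structure covers all of the query's properties before any data-level work happens.

```java
import java.util.*;

// Illustrative label-parameterized structure index: group subjects by the
// set of properties they exhibit, then prune groups at the structure level.
public class StructureIndex {

    // Maps each distinct property-label set ("structure") to its subject group.
    static Map<Set<String>, Set<String>> build(List<String[]> triples) {
        Map<String, Set<String>> propsOf = new HashMap<>();
        for (String[] t : triples) {
            // t[0] = subject, t[1] = property label
            propsOf.computeIfAbsent(t[0], k -> new TreeSet<>()).add(t[1]);
        }
        Map<Set<String>, Set<String>> groups = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : propsOf.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new HashSet<>())
                  .add(e.getKey());
        }
        return groups;
    }

    // Structure-level pruning: only groups whose structure exhibits all the
    // query's property labels can contain answers; other groups are skipped
    // entirely, so no data-level retrieval is performed for them.
    static Set<String> candidates(Map<Set<String>, Set<String>> index,
                                  Set<String> queryProps) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<Set<String>, Set<String>> e : index.entrySet()) {
            if (e.getKey().containsAll(queryProps)) result.addAll(e.getValue());
        }
        return result;
    }
}
```

Mapping each group (each key of the index) to its own table is what yields the contiguous storage of structurally similar elements described above.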
Modules:
1. Semantic DB RDF Generation
2. Semantic Web RDF Generation
3. Data Partitioning and Indexing
4. Query Processing over Indexed Data
Semantic DB RDF Generation:
The Resource Description Framework (RDF) representation is constructed for semantic
data over a relational database containing both structured and unstructured data. A schema
is identified for the relational database, and an RDF graph representing the schema of the
database is constructed through the Model provided by the Jena API. The Model contains
all the information about the data linkages in the schema. In this process, the schema can
also be altered according to the administrator's requirements so that the search process is
effective.
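As an illustration of the schema-to-RDF mapping this module performs, the sketch below turns one relational row into triples using plain strings; the project itself builds a Jena Model instead, and the base URI and naming scheme here are assumptions made up for the example.

```java
import java.util.*;

// Hypothetical mapping of a relational row to RDF triples: the primary-key
// value identifies the subject resource, and each remaining column becomes
// one (subject, predicate, object) triple. URIs are illustrative only.
public class DbToRdf {
    static final String BASE = "http://example.org/"; // assumed base URI

    static List<String[]> rowToTriples(String table, String pkColumn,
                                       Map<String, String> row) {
        String subject = BASE + table + "/" + row.get(pkColumn);
        List<String[]> triples = new ArrayList<>();
        for (Map.Entry<String, String> col : row.entrySet()) {
            if (col.getKey().equals(pkColumn)) continue; // key names the subject
            triples.add(new String[]{subject,
                                     BASE + table + "#" + col.getKey(),
                                     col.getValue()});
        }
        return triples;
    }
}
```

With Jena, each emitted array would correspond to one `Statement` added to the Model, which then also records the linkages (foreign keys) between tables.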
Semantic Web RDF Generation:
The Semantic Web RDF generation module generates RDF data for user-entered data.
The user-entered data is converted to an RDF file and stored on the respective server; the
RDF file is constructed using the Jena API. The converted RDF file is exposed as a web
service, and when the RDF file is required, it is sent as a web service response.
Data Partitioning and Indexing:
RDF is also generated by mining the text content uploaded by users in blogs: the contents
of each file are analyzed and the meta-content is derived. The meta-content is the key to
the search process, so that a file can be rendered on demand. The text-mining process
analyzes the text word by word and also picks up the literal meaning behind the groups of
words that constitute each sentence. The words are looked up through the WordNet API so
that related terms can be found and used in the meta-content when generating RDF.
Generally, RDF is served by web services on servers all over the world, making the data
available for distributed access over the web. Hence this process operates in real time, and
the text is also analyzed by a web service provided by an open-source project deployed on
a production server. The user-uploaded content is thus analyzed on real servers with their
own natural-language-processing strategies, and the results are obtained in RDF format so
that they can be understood by other servers.
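A toy version of the meta-content step can be sketched as below. The WordNet lookup is stubbed out as a fixed synonym table (an assumption for the example; the module calls the real WordNet API), and the "analysis" is plain tokenization plus term counting rather than full NLP.

```java
import java.util.*;

// Toy meta-content extraction: tokenize uploaded text and count terms,
// folding related terms together. The synonym table stands in for a
// WordNet API lookup and is invented for this example.
public class MetaContent {

    // Hypothetical stand-in for WordNet: maps a word to a canonical term.
    static final Map<String, String> SYNONYMS =
        Map.of("weblog", "blog", "automobile", "car");

    static Map<String, Integer> extract(String text) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String raw : text.toLowerCase().split("[^a-z]+")) {
            if (raw.isEmpty()) continue;
            String term = SYNONYMS.getOrDefault(raw, raw); // fold related terms
            freq.merge(term, 1, Integer::sum);
        }
        return freq;
    }
}
```

The resulting term frequencies are what would be written into the generated RDF as meta-content, so that a query for "blog" also finds documents that only say "weblog".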
Query Processing over Indexed Data:
Similar data items that relate to the same resource are grouped together. The data-level
processes are subjected to structure-level processing by indexing the semantic data
elements. Multiple RDF files are grouped and structured together to form a master RDF
dataset that holds all the semantic information of a server and supports reasoning for any
form of query processing. The different resources are strongly interlinked by the
predicates in the triples. Query processing is handled directly on the RDF file by iterating
over its triples and matching them against the service query, and the URI representing the
location of the resource is returned.
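The final lookup step described above can be sketched as a straightforward iteration over the master RDF's triples; class and method names here are illustrative, and a real deployment would of course use an index rather than a linear scan.

```java
import java.util.*;

// Illustrative final step: iterate the master RDF's triples, match the
// service query against predicate and object, and return the URIs of the
// resources found (deduplicated, in first-seen order).
public class ResourceLookup {

    static List<String> locate(List<String[]> masterRdf,
                               String predicate, String object) {
        List<String> uris = new ArrayList<>();
        for (String[] triple : masterRdf) {
            // triple[0] = subject URI, triple[1] = predicate, triple[2] = object
            if (triple[1].equals(predicate) && triple[2].equals(object)
                    && !uris.contains(triple[0])) {
                uris.add(triple[0]);
            }
        }
        return uris;
    }
}
```

The returned URIs are what the web service response hands back to the caller, which can then dereference them to fetch the resources themselves.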
Software Requirements
Windows XP/7
JDK 1.6
J2EE
Tomcat 6.0
MySQL
Hardware Requirements
Hard Disk : 80GB and Above
RAM : 2GB and Above
Processor : P IV and Above
Architecture Diagram:
[Architecture diagram: the admin builds a user-defined schema over the relational DB,
and a schema is also derived from text; blog users upload content over the web to the
server; text mining and indexing over both sources produce the master RDF file, which
is exposed through a web service.]