How To Map Relational Data To A Graph DB in Four Steps
By Steven Yang
Today, the relational data storage model is probably the most popular way of
storing data. Like its physical ancestor, the index card, relational data storage groups
data records of the same type (those with the same properties or attributes). Storing
data this way makes it easy to retrieve a single record, because all attributes of
a record are stored in the same location. Searching a large set of records can be
very efficient when the searched attribute is indexed.
Considering database history, it’s not surprising that most of today’s data is still
stored in relational models. However, in the relational data storage model, there
is no physical link to indicate the relation among different data records. They
are only linked logically by a special attribute called a foreign key. As a result,
searching for related records requires lookups on the entire target table, which
becomes very inefficient as the target table grows.
HOW IT WORKS
Mapping data from a relational database to a graph database is fundamentally a
task of converting the representation of the data from one database to the other,
from table structure to graph structure. A data entity in a table of the relational
database is a “row” (that contains columns), and in the graph database it’s a node
(that contains attributes). More specifically, we can use the foreign keys of the
relational data model to build edges, thus transforming loosely coupled data
records into a tightly connected group of nodes.
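The row-to-node and foreign-key-to-edge idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample row, the dictionary layout of a node and an edge, and the helper names `row_to_node` and `foreign_key_to_edge` are hypothetical and not part of any particular graph database API.

```python
# Hypothetical sample row; the column names follow the Employee table in Figure 1.
employee_row = {"EmployeeID": 2, "ReportsTo": 1, "Name": "James Bond", "Region": "EMEA"}


def row_to_node(table, pk_column, row):
    """Turn one table row into a node: a label, a key, and its attributes."""
    return {"label": table, "key": (table, row[pk_column]), "attributes": dict(row)}


def foreign_key_to_edge(edge_type, source_node, fk_column, target_table):
    """Turn one foreign-key value into an edge; return None when the key is NULL."""
    target_id = source_node["attributes"].get(fk_column)
    if target_id is None:
        return None
    return {
        "type": edge_type,
        "source": source_node["key"],
        "target": (target_table, target_id),
    }


node = row_to_node("Employee", "EmployeeID", employee_row)
edge = foreign_key_to_edge("ReportsTo", node, "ReportsTo", "Employee")
print(node["key"], edge)
```

The edge stores the key of the target node directly, which is what lets a graph follow a relation without searching the target table.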
Let’s start with a simplified knowledge domain created from a Microsoft
Northwind database sample and walk step by step through mapping the data from a
relational to a graph database.
Figure 1 shows five tables that are a modified version of the Microsoft
Northwind database, a typical relational data model.
Table       Columns
Employee    EmployeeID (PK), ReportsTo, Name, Region
Customer    CustomerID (PK), CompanyName, ContactName, Region
Order       OrderID (PK), CustomerID, EmployeeID, ProductID
Product     ProductID (PK), SupplierID
Supplier    SupplierID (PK), CompanyName, Region
STEP ONE
The first step is to identify the data entity types. In this example there are five
types of data entities:
• Employee
• Order
• Customer
• Product
• Supplier
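When the source database is available, the entity types can be read from its catalog instead of being listed by hand. The sketch below assumes a SQLite copy of the schema in Figure 1; the column types and the use of SQLite are assumptions made for illustration, and other databases expose their catalog differently.

```python
import sqlite3

# Assumed SQLite rendering of the five tables in Figure 1 (column types are guesses).
SCHEMA = """
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY,
                       ReportsTo INTEGER REFERENCES Employee(EmployeeID),
                       Name TEXT, Region TEXT);
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY,
                       CompanyName TEXT, ContactName TEXT, Region TEXT);
CREATE TABLE Supplier (SupplierID INTEGER PRIMARY KEY,
                       CompanyName TEXT, Region TEXT);
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY,
                      SupplierID INTEGER REFERENCES Supplier(SupplierID));
CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY,
                      CustomerID INTEGER REFERENCES Customer(CustomerID),
                      EmployeeID INTEGER REFERENCES Employee(EmployeeID),
                      ProductID INTEGER REFERENCES Product(ProductID));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Each user table in the catalog is a candidate entity type (a node label).
entity_types = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
]
print(entity_types)
```

In a real schema, not every table is an entity type (pure join tables, for example, usually become edges), so the list is a starting point to review rather than a final answer.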
STEP TWO
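In the second step, we need to find the semantic relations between those entities.
We find five relations among them:
• Reports To: an Employee reports to another Employee (Employee.ReportsTo)
• Sold By: an Order is sold by an Employee (Order.EmployeeID)
• Sold To: an Order is sold to a Customer (Order.CustomerID)
• Item Sold: an Order contains a Product (Order.ProductID)
• Supplies: a Supplier supplies a Product (Product.SupplierID)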
This step is less obvious. As we mentioned before, a relational database does not
treat relations as first-class citizens. The “Reports To” relation can be found as an
attribute in the Employee table, but the “Sold By” semantic is not in the schema. There is
a logical link between Order and Employee through a foreign key, but the meaning
of this relation is not encoded in the database.
Figure 2. The schema diagram from Figure 1 with the five relations highlighted:
ReportsTo, SoldBy, SoldTo, ItemSold, and Supplies.
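The split described in this step, where the database knows the foreign keys but not what they mean, can be sketched as follows. The foreign keys are read from the catalog, while the semantic names come from a hand-written mapping. The SQLite schema is an assumed rendering of Figure 1 (only key columns are declared), and `RELATION_NAMES` is a hypothetical name for that mapping.

```python
import sqlite3

# Assumed SQLite rendering of Figure 1, reduced to primary and foreign keys.
SCHEMA = """
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY,
                       ReportsTo INTEGER REFERENCES Employee(EmployeeID));
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY);
CREATE TABLE Supplier (SupplierID INTEGER PRIMARY KEY);
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY,
                      SupplierID INTEGER REFERENCES Supplier(SupplierID));
CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY,
                      CustomerID INTEGER REFERENCES Customer(CustomerID),
                      EmployeeID INTEGER REFERENCES Employee(EmployeeID),
                      ProductID INTEGER REFERENCES Product(ProductID));
"""

# The meaning of each foreign key is not stored in the schema; a person supplies it.
RELATION_NAMES = {
    ("Employee", "ReportsTo"): "ReportsTo",
    ("Order", "EmployeeID"): "SoldBy",
    ("Order", "CustomerID"): "SoldTo",
    ("Order", "ProductID"): "ItemSold",
    ("Product", "SupplierID"): "Supplies",
}

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

relations = []
for table in ["Employee", "Customer", "Supplier", "Product", "Order"]:
    # PRAGMA foreign_key_list rows are: id, seq, referenced table, from column, to column, ...
    for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
        referenced_table, fk_column = fk[2], fk[3]
        name = RELATION_NAMES[(table, fk_column)]
        relations.append((name, table, fk_column, referenced_table))

relations.sort()
for relation in relations:
    print(relation)
```

Each tuple records the table that holds the foreign key and the table it references. Which end is treated as the source of the edge (for example, Supplier to Product for Supplies) remains a modeling decision.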
STEP THREE
Once the data entities and relations (listed above and shown in Figure 2) are correctly
identified, we can create a node for each data entity and an edge for each relation
we just discovered. We have now created a graph from the relational data model
based on the predefined relations (the foreign keys in the relational schema).
Figure 3. In a graph, all related nodes (data records in relational data model)
are physically linked by the edges.
Nodes and attributes shown in Figure 3:
• Employee: ReportsTo, Name, Region
• Customer: CustomerID, CompanyName, Region
• Order: OrderID, CustomerID, EmployeeID, ProductID
• Product: ProductID, SupplierID
• Supplier: SupplierID, CompanyName, Region
Edges shown in Figure 3: ReportsTo, SoldBy, SoldTo, ItemSold, Supplies
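A minimal in-memory version of step three might look like the sketch below. It is an illustration rather than a graph database: the sample rows are invented, the edges follow the direction of the foreign keys, and the names `PRIMARY_KEYS`, `RELATIONS`, and `follow` are hypothetical.

```python
from collections import defaultdict

# Invented sample rows; column names follow Figure 1.
rows = {
    "Employee": [
        {"EmployeeID": 1, "ReportsTo": None, "Name": "M", "Region": "EMEA"},
        {"EmployeeID": 2, "ReportsTo": 1, "Name": "James Bond", "Region": "EMEA"},
    ],
    "Customer": [{"CustomerID": 10, "CompanyName": "Universal Exports", "Region": "EMEA"}],
    "Supplier": [{"SupplierID": 20, "CompanyName": "Q Branch", "Region": "EMEA"}],
    "Product": [{"ProductID": 30, "SupplierID": 20}],
    "Order": [{"OrderID": 100, "CustomerID": 10, "EmployeeID": 2, "ProductID": 30}],
}
PRIMARY_KEYS = {
    "Employee": "EmployeeID", "Customer": "CustomerID", "Supplier": "SupplierID",
    "Product": "ProductID", "Order": "OrderID",
}
# (edge name, table holding the foreign key, foreign-key column, referenced table)
RELATIONS = [
    ("ReportsTo", "Employee", "ReportsTo", "Employee"),
    ("SoldBy", "Order", "EmployeeID", "Employee"),
    ("SoldTo", "Order", "CustomerID", "Customer"),
    ("ItemSold", "Order", "ProductID", "Product"),
    ("Supplies", "Product", "SupplierID", "Supplier"),
]

# One node per row, keyed by (table, primary-key value).
nodes = {}
for table, table_rows in rows.items():
    for row in table_rows:
        nodes[(table, row[PRIMARY_KEYS[table]])] = row

# One edge per non-NULL foreign-key value, stored on the source node.
out_edges = defaultdict(list)
for name, table, fk_column, target_table in RELATIONS:
    for row in rows[table]:
        if row[fk_column] is not None:
            source = (table, row[PRIMARY_KEYS[table]])
            out_edges[source].append((name, (target_table, row[fk_column])))


def follow(node_key, edge_name):
    """Return the nodes reached from node_key over edges of the given type."""
    return [target for name, target in out_edges.get(node_key, []) if name == edge_name]


seller = follow(("Order", 100), "SoldBy")[0]
manager = follow(seller, "ReportsTo")[0]
print(nodes[seller]["Name"], "reports to", nodes[manager]["Name"])
```

Following SoldBy and then ReportsTo touches only the edges stored on the two nodes involved, with no scan of the Employee table.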
STEP FOUR
In Figure 4, you can see that two attributes are used in more than one node.
Region is used in Employee, Customer, and Supplier, and CompanyName is used in
Customer and Supplier.
Figure 4. The graph from Figure 3. Region appears on the Employee, Customer, and
Supplier nodes, and CompanyName appears on the Customer and Supplier nodes.
Promoting those common attributes to nodes with proper relations to the entities
could add value to our graph. In our example, after promoting Region and
Company to nodes, we can easily answer questions like the ones below:
• Question 1: Give me all customers that are in the same region as James Bond.
• Question 2: Which companies are both our customer and our supplier?
Figure 5. The graph now has two new nodes, Region and Company, linked to the
entities that previously carried those attributes. The diagram includes an edge labeled “In”.
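Promoting Region and Company to nodes, and answering the two questions above, can be sketched as follows. The sample data is invented, James Bond is assumed to be an employee, and the two dictionaries stand in for the new Region and Company nodes together with the edges that point at them.

```python
from collections import defaultdict

# Invented sample rows; James Bond is assumed to be an employee.
employees = [{"EmployeeID": 2, "Name": "James Bond", "Region": "EMEA"}]
customers = [
    {"CustomerID": 10, "CompanyName": "Universal Exports", "Region": "EMEA"},
    {"CustomerID": 11, "CompanyName": "Acme", "Region": "APAC"},
]
suppliers = [
    {"SupplierID": 20, "CompanyName": "Universal Exports", "Region": "EMEA"},
    {"SupplierID": 21, "CompanyName": "Q Branch", "Region": "EMEA"},
]

# Each distinct attribute value becomes one node; the set holds the entities linked to it.
region_members = defaultdict(set)
company_members = defaultdict(set)

for label, pk_column, table in (
    ("Employee", "EmployeeID", employees),
    ("Customer", "CustomerID", customers),
    ("Supplier", "SupplierID", suppliers),
):
    for row in table:
        entity = (label, row[pk_column])
        region_members[row["Region"]].add(entity)
        if "CompanyName" in row:
            company_members[row["CompanyName"]].add(entity)

# Question 1: customers in the same region as James Bond.
bond = next(e for e in employees if e["Name"] == "James Bond")
same_region_customers = sorted(
    entity for entity in region_members[bond["Region"]] if entity[0] == "Customer"
)

# Question 2: companies linked to both a Customer node and a Supplier node.
customer_and_supplier = sorted(
    name
    for name, members in company_members.items()
    if {"Customer", "Supplier"} <= {label for label, _ in members}
)

print(same_region_customers)
print(customer_and_supplier)
```

Both questions become a single hop through a shared node instead of a comparison of attribute values across several tables.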
Finding and building the right edges in the graph is a key activity. These steps
determine how efficiently the data structure can respond to end users’
questions. They consist of transforming logical relations (foreign keys) and
implicit relations (common attributes) into explicit edges that are physically
stored in the graph structure. We explained this approach using a very simple use
case. With real enterprise data, we would need dedicated tools to facilitate
the design of this mapping and to automate the data migration. We will describe
what such a tool could look like in a follow-up paper.
Global Headquarters
3307 Hillview Avenue
Palo Alto, CA 94304
+1 650-846-1000 TEL
+1 800-420-8450
+1 650-846-1005 FAX
www.tibco.com
TIBCO fuels digital business by enabling better decisions and faster, smarter actions through the TIBCO Connected Intelligence Cloud. From APIs and systems to devices and people, we interconnect everything, capture data in real time wherever it is, and augment the intelligence of your business through analytical insights. Thousands of customers around the globe rely on us to build compelling experiences, energize operations, and propel innovation. Learn how TIBCO makes digital smarter at www.tibco.com.
©2018, TIBCO Software Inc. All rights reserved. TIBCO and the TIBCO logo are trademarks or registered trademarks of TIBCO Software Inc. or its subsidiaries in the United States and/or other countries. All other product and company names and marks in this document are the property of their respective owners and mentioned for identification purposes only.
03/15/18