Neo4j Graph Database Modeling Guide


Copyrighted Material
Graph Database Modeling with neo4j
Copyright © 2020-21 by Ajit Singh, All Rights Reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means (electronic, mechanical, photocopying, recording or otherwise) without prior written permission from the author, except for the inclusion of brief quotations in a review.

For information about this title or to order other books and/or electronic media, contact the publisher:

Ajit Singh & Anant Kumar
e:ajit_singh24@[Link]
e:anant@[Link]
w:[Link]
Preface

This book is designed to walk you through graph data modeling. You will be introduced to the basic process of designing a graph data model that can answer a wide range of business questions across a variety of domains.

Graphdatamodelingistheprocessinwhichauserdescribesanarbitrarydomainasa
[Link]
odelis designedtoanswerquestionsintheform
ofCypherqueriesandsolvebusinessandtechnical
problemsbyorganizingadatastructureforthegraphdatabase.

This book is simply the introduction to data modeling using a simple, straightforward scenario. There are plenty of opportunities throughout the upcoming guides to practice modeling domains and analyzing changes to the model that might need to be made.

[Link]’sprobablyuselessifyoudon’[Link]
pe,size
andfunctionalityofthatcontainerdependsonyourintendeduse,butingeneral,aconta
ineris necessary.

[Link],
you
[Link]
eling.

Oftenreservedsolelyforseniordatabaseadministrators(DBAs)orprincipaldevel
opers,data
[Link]
umay worshiptheexpertdatamodelerfromafar.

Whilesomedatamodelingscenariosreallyarebestleftuptotheexperts,itdoesn’thav
etobe
[Link],datamodelingisasmuchabusinessconcernasatechnologi
calone. Soifyoudon’tknowasinglelineofcode,you’reinluck.

Anyonecandobasicdatamodeling,andwiththeadventofgraphdatabasetechnology,
[Link]
ion
[Link](i.e.,whatyouwantyourapplicat
iontodo).
Then,inthemodelingprocessyoumapthoseneedsintoastructureforstoringandorga
nizing yourdata.

Everydatamodelisunique,dependingontheusecaseandthetypesofquestionsthatus
ers [Link],thereisno“one-size-fits-
all”approachtodata
[Link]
esultin producinganaccuratedatamodelthatbenefitsyourprocessesandusecase.

Thegraphdatabasesarenecessaryforaveryconcretedatasets:hugeamounts
ofdataofhigh
complexity,[Link],they
efficientlyquerythroughtherelationshipsamongentities,incontrasttorelational
databases.
Graphdatabasessupportalgorithmstoperform concretequeriesthatareoutof
reachtorelationaldatabases,[Link],the
biggerthevolumeofdata,theslowerthequerieswouldbeinSQL,becausethey
would requireto lookup joined tableswith [Link]
databasesallow totraversethroughthegraphandreachahighlevelofdepth,
withouthavingtoreadallthedatastored.
[Link]
(ACID)thatstoresdatastructuredasgraphsconsistingofnodes,connectedby
[Link],itallowsforhighquery
performanceoncomplexdata,whileremainingintuitiveandsimpleforthedeveloper
.

Neo4jis,byfar,[Link]
[Link]
[Link];evendatasizegrow exponentially,
performanceofNeo4jdoesnotaffectedbyit.

Usingthisbook,you'llgetto learnthetheoryofgraphdatabaseandhowtouse
Neo4jtobuilduprecommendations,relationships,andcalculatetheshortestroute
[Link],bestpractices,use-cases,andan
applicationputtingeverythingtogether,thisbookwillgiveyoueverythingyouneedto
[Link],this
bookwillshow youtheadvantagesofusinggraphdatabasesalongwithdata
[Link]'llgainpracticalhands-onexperience
withcommonlyusedandlesserknownfeaturesforupdatinggraphstorewith
Neo4j'[Link],
helpsyougraspthefundamentalconceptsbehindthisradicalnewwayofdealing with
connected data,and willgiveyou lotsofexamplesofuse casesand
environmentswhereagraphdatabasewouldbeagreatinterest.

[Link]
advantageofNeo4j'spowerfulfeaturesandbenefits-addBeginningNeo4jtoyour
librarytoday.

Contents

[Link]
Graph databases
[Link]
Selecting vertex labels
Examples of label selection
Drawing a graph schema
Summary
[Link]
ER models and diagrams
Example
Procedure to convert an ER model to a graph schema
Rule #1: Entity types become vertex types
Rule #2: Binary relationship types become edge types
Rule #3: N-ary relationship types become vertex types
Conversion example
Vertices are vertices, and edges are edges
Summary
[Link]
Normalization of relational databases
Transformation rules that produce equivalent schemas
Rule A: Renaming properties and labels
Rule B: Reversing edge directions
Rule C: Property displacement
Rule D: Specialization and generalization
Rule E: Edge promotion
Rule F: Property promotion
Rule G: Property expansion
Summary
[Link]
Schemas and constraints
Graph universes, transformations and equivalence
Derived types
Metarule: Adding and removing derived types
Proving the metarule
Proving the 7 rules: Renaming, Reversing, Property Displacement,
Beyond transformation rules
Summary
[Link]
[Link]: First order logic on graph databases
Background
On SQL
On first order logic
On Gremlin
Pixy: First order logic with Gremlin
ER models in Pixy
Query requirements don't usually matter while modeling
[Link]
State of the art of Databases
Types of DBMS
NoSQL DBMS
Comparison of DBMS
Current trends
[Link]
Graph Theory and Its Applications
Concepts of Graph Databases
Query performance
10. Neo4j
Introduction of Neo4j
Advantages of Neo4j
Properties of Neo4j
Performance In Neo4j
How To Increase Performance Of Neo4j?
Cypher Query Language
Structure
Operations In Cypher
Loading Data With Cypher
Use Cases of Neo4j
11. Getting started with neo4j
Installation or Setup
Installation & Starting a Neo4j server
Start Neo4j from console (headless, without web server)
Start Neo4j web server
Start Neo4j web server
Delete one of the databases
Cypher Query Language
RDBMS Vs Graph Database
Cypher - Implementation
Creation
Create a node
Create a relationship
Query Templates
Create an Edge
Deletion
Delete all nodes
Delete all nodes of a specific label
Match (capture group) and link matched nodes
Update a Node
Delete All Orphan Nodes
Python & Neo4j
12. Neo4j Application
Use Case Selected
Data
Implementing Data
Export data
Query Examples (Neo4j - SQL)
Shortest Path
Betweenness centrality:
Closeness centrality:
PageRank:
Community Detection:
Possible queries on SQL
Bibliography

Part I
Chapter 1

Graph Data Model

[Link],and
[Link].
Tablesarerelatedbyforeign-keyconstraints,whichishowyoucanconnectone
table’sinformationtoanother,[Link]-leveljoinsareoften
involvedwhenqueryingrelationaldatabases.

Foragraph,specificallyascatterplot,thinkoftheelementsasnodesor,[Link]
[Link]
e pairs and a [Link] are connected by relationships oredges.
Relationshipshaveatypeandadirection,[Link]
[Link]
and more powerful when the meaning is in the relationships between the data.
Relational databases can easily handle direct relationships, but indirect relationships are more difficult to deal with in relational databases.
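
The property graph idea described above (nodes carrying labels and key-value properties, connected by typed, directed relationships) can be sketched in a few lines of Python. This is only an illustrative sketch, not Neo4j's API: the class names, the sample people and city, and the helper functions `neighbours` and `friends_cities` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    start: int       # id of the node the relationship leaves
    end: int         # id of the node the relationship points to
    rel_type: str    # every relationship has exactly one type
    properties: dict = field(default_factory=dict)


# Hypothetical sample data: two people and one city.
nodes = {
    1: Node(1, {"Person"}, {"name": "Ann"}),
    2: Node(2, {"Person"}, {"name": "Bob"}),
    3: Node(3, {"City"}, {"name": "Topeka"}),
}
relationships = [
    Relationship(1, 2, "KNOWS", {"since": 2019}),
    Relationship(2, 3, "LIVES_IN"),
]


def neighbours(node_id, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [nodes[r.end] for r in relationships
            if r.start == node_id and r.rel_type == rel_type]


def friends_cities(node_id):
    """An indirect relationship: cities where a person's acquaintances live."""
    return sorted(city.properties["name"]
                  for friend in neighbours(node_id, "KNOWS")
                  for city in neighbours(friend.node_id, "LIVES_IN"))
```

The indirect question ("where do the people Ann knows live?") is answered by chaining two traversals, with no join table involved.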

Figure1a

Whenbuildingarelationaldatabase,[Link]
questionswillwebewantingtoanswer?Forexample,youwanttoknowhowmany
peoplewhoboughtatoaster,liveinKansas,haveacriminalrecord,anduseda
[Link],orthepersonwhocreated
thedatabasedidnotanticipateaquestionlikethis,itmaybeverydifficulttoretrieve
thatinformationfrom [Link],itispossibleto
[Link],youcanansweranyquestionaslong
[Link]
[Link]
[Link]
datapoints,rather,[Link]
information.
There are two properties of graph databases we should consider when investigating graph database technologies:

The underlying storage
Some graph databases use native graph storage that is optimized and designed for storing [Link], [Link], an object-oriented database, or some other general-purpose data store.

The processing engine
Some definitions require that a graph database use index-free adjacency, meaning that connected nodes physically “point” [Link] ly broader view: any database that from the user’s perspective behaves like a graph database (i.e., exposes a graph data model through CRUD operations) qualifies as a graph database. We do acknowledge, however, the significant performance advantages of index-free adjacency, and therefore use the term native graph processing to describe graph databases that leverage index-free adjacency.
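
To make the distinction concrete, the following Python sketch contrasts two ways of finding a node's neighbours. It is a toy model under stated assumptions, not how any particular product stores data: `edge_table`, `NodeRecord` and both helper functions are hypothetical names invented for this illustration.

```python
# Style 1: relationships live in one global structure and are found by
# searching it (a real system would use an index instead of a full scan).
edge_table = [("a", "b"), ("a", "c"), ("b", "d")]


def neighbours_by_lookup(node):
    """Scan the global edge list for edges that start at `node`."""
    return [dst for src, dst in edge_table if src == node]


# Style 2: index-free adjacency. Each node record holds direct
# references to its adjacent records, so no global lookup is needed.
class NodeRecord:
    def __init__(self, name):
        self.name = name
        self.out = []   # direct references to adjacent NodeRecord objects


records = {name: NodeRecord(name) for name in "abcd"}
for src, dst in edge_table:
    records[src].out.append(records[dst])


def neighbours_by_pointer(record):
    """Follow the references stored on the record itself."""
    return [adjacent.name for adjacent in record.out]
```

In the first style the cost of one hop depends on the size of the global structure; in the second it depends only on the number of relationships attached to the node being visited.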

From adatabasepointofview,theconceptualtoolsdefiningaDB-Modelshould
addressatleastthestructuringanddescriptionofthedata,itsmaintainabilityand
theform [Link],aDB-Modelis
definedasacombinationofthreecomponents,firstacollectionofdatastructure
types,secondacollectionofoperatorsorinferencerulesandthirdacollectionof
[Link]-Modelsdefineonlythe
datastructures,omittingsometimesoperatorsand/orintegrityrules.

Duetotheimportanceofmodelingconceptually,philosophicallyandinpractice,DB
[Link]-
Model
are:Toolforspecifyingthekindsofdatapermissible;generaldesignmethodologyfor
databases;copingwithevolutionofdatabases;developmentoffamiliesofhighlevel
languagesforqueryanddatamanipulation;focusinDBMSarchitecture;vehiclefor
researchintothebehavioralpropertiesofalternativeorganizationsofdata.
Sincetheemergenceofdatabasemanagementsystems,therehasbeenanongoing
debateaboutwhattheDB-Modelforsuchasystem [Link]
diversityofexistentDB-Modelsshowthatthereisnosilverbulletfordatamodeling.
Theparametersinfluencingtheirdevelopmentaremanifold,andamongthemost
importantwecanmentionthecharacteristicsorstructureofthedomaintobe
modeled,thetypeofintellectualtoolsthatappealstheuser,andofcourse,the
[Link],eachDB-
Modelproposal
isgroundedoncertaintheoreticaltools,andservesasbaseforthedevelopmentof
relatedmodels.
Figure1b:[Link],arrowsindicateinfluences,and
[Link].
Database Models Evolution - Brief Historical Overview
InthebeginningsofthedesignofDB-Models,physical(hardware)constraintswere
[Link]
relationalmodel,mostDB-Modelfocusedessentiallyinthespecificationofthe
[Link].˜cite50130developeda
taxonomyofDB-Modelspriorto1976,comparingessentiallytheirmathematical
structuresandfoundation,andthelevelsofabstractionused.

Two representative DB-Models are the hierarchical and network models, which emphasize the physical level and offer the user the means to navigate the database at the record level, thus providing low-level operations to derive more abstract structures.

TherelationalDB-ModelwasintroducedbyCoddandhighlightsthe
conceptoflevelofabstractionbyintroducingtheideaofseparation
[Link]
[Link],itgainedawidepopularity
amongbusinessapplications.

Semantic DB-Models allow database designers to represent objects and their relations in a natural and clear manner to the user (as opposed to previous models). They were intended to provide the user with tools that could faithfully capture the semantics of the information to be modeled. A well-known example is the entity relationship model.

ObjectorientedDB-Modelsappearedintheeighties,whenmostofthe
researchwasconcernedwithsocalled“advancedsystemsfornewtypesof
[Link]-Modelsarebasedontheobjectorientedparadigm
andtheirgoalisrepresentingdataasacollectionofobjectsthatare
organizedinclassesandhavecomplexvaluesassociatedwiththem.

SemistructuredDB-Modelsaredesignedtomodeldatawithaflexible
structure,e.g.,[Link](also called
unstructured data)is neitherraw norstrictly typed as in
[Link],dataismixedwiththe
schema,[Link]
redintheninetiesandarecurrentlyinevolution.
TheXML(eXtendedMarkupLanguage)modeldidnotoriginateinthe
[Link]
exchangeandmodeldocuments,soonitbecameageneralpurpose
model,[Link]
semistructuredmodel,schemeanddataaremixed.SeeSection2.3fora
moreindepthcomparisonamongthesemodels.

[Link]-Models
designedforparticularapplications,aswellasmodelingframeworksnot directly
focusing in database issues,which indirectly concern graph database
modeling. Among the DB-Models are Spatial databases, Geographical
Information Systems (GIS), Temporal DB-Models], MultidimensionalDB-
Models].Frameworksrelatedtoourtopic,butnot
directlyfocusingindatabaseissuesareSemanticNetworks.

Graph Database Models - Brief Historical Overview

The notion of graph DB-Model made its appearance almost in parallel with the object oriented DB-Models, as an alternative to the limitations of traditional DB-Models for capturing the inherent graph structure of data appearing in applications such as hypertext or geographic database systems, where the interconnectivity of data is an important aspect.

Activityaroundgraphdatabasesflourishedinthefirsthalfoftheninetiesandthenthe
[Link]:thedatabase
communitymovedtowardsemistructureddata(aresearchtopicwhichdidnothave
linkstothegraphdatabaseworkinthenineties);theemergenceofXMLcapturedallth
e attentionoftheworkonhypertext;peopleworkingongraphdatabasesmovedto
particularapplicationslikespatialdata,web,documents;thetreelikestructureiseno
ugh
formostapplications.Figure2reflectsthisevolutionbymeansofpaperspublishedin
mainconferencesandjournals.

GraphDB-Modelsemergedwiththeobjectiveofmodelinginformationwhose
[Link],RoussopoulosandMylopoulosfacingthe
failureofcurrent(atthetime)systemstotakeintoaccountthesemanticsofthe
database,[Link]
implicitstructureofgraphsforthedataitselfwaspresentedintheFunctionalData
Model,whosegoalwastoprovidea“conceptuallynatural”databaseinterface.A
differentapproachproposedtheLogicalDataModel,whereanexplicitgraphDBMo
delintendedtogeneralizetherelational,[Link]
laterKuniiproposedagraphDB-Modelforrepresentingcomplexstructuresof
knowledgecalledGBASE.

Graph Data Modeling
What is a Graph Data Model?

A graph DB-Model is conceptualized according to the three basic components of a DB-Model, namely data structures, transformation language, and integrity constraints. A graph DB-Model is characterized by:

The data and/or the schema are represented by graphs, or by data structures generalizing the notion of graph (hypergraphs, hypernodes, hygraphs, etc.). Almost everybody agrees on this point, modulo slight variations.

[Link]
istomodelthedatabasedirectlyandentirelyasagraph[58].Agraph DB-
Modelisonewhosesingleunderlyingdatastructureisalabeled
directedgraph;[Link]
schemainthismodelisadirectedgraph,whereleavesrepresentdata
[Link]
labeledgraphsareusedastheformalism tospecifyandrepresent database
schemes,instances,and [Link] modelis basically
[Link],adatabaseis
describedintermsofalabeleddirectedgraphcalledschemagraph.A graphDB-
Modelformalizestherepresentationofthedatastructures
[Link]
[Link]
[Link]
instancesanddatabasesschemesaredescribedbycertaintypesof
labeledgraphs[68].Themodelfordataisorganizedasgraphs.
Labeledgraphsareusedtorepresentschemesandinstances.
Ontopofthesedescriptions,onecouldaddthefactthatsometimestheschema
andthedata(instances)aredifficulttodifferentiateinthesemodels,afactthat
[Link]
instancesareseparated.

Data manipulation is expressed by graph transformations or by operations whose main primitives directly address typical features of graphs, like paths, neighborhoods, subgraphs, graph patterns, connectivity, and statistics about graphs (diameter, centrality, etc.). The DB-Model defines a flexible collection of type constructors and operators which create and access the graph data structures or, in other terms, the approach is to express all queries in terms of a few [Link] can be based on pattern matching, [Link] prototypical piece of an instance graph.

The existence of integrity constraints enforcing the consistency of the data, which are directly related to the graph [Link] example, labels with unique names, typing constraints on nodes, functional dependencies, domain and range of properties.

Summarizing, a graph DB-Model is a model where the data structures for the schema and/or instances are modeled as a (labeled) (directed) graph, or generalizations of the graph data structure, where data manipulation is expressed by graph-oriented operations and type constructors, and which has integrity constraints appropriate for the graph structure.

Why a Graph Data Model?

TheapplicationareasofgraphDB-
Modelmodelsarethosewereinformationaboutthe
interconnectivityorthetopologyofthedataismoreimportant,orasimportantas,the
[Link]
[Link],introducinggraphsasamodelingtoolhasseveral
advantagesforthistypeofdata.
First,itleadstoamorenaturalmodeling:[Link]
allow anaturalwayofhandlingdataappearinginapplications([Link]
geographicdatabases).Graphshaveanimportantadvantage:theycankeepallthe
informationaboutanentityinasinglenodeandshow relatedinformationbyarcs
connected to [Link] objects(likepaths,neighborhoods)mayhavefirstorder
citizenship;auser
Type of Model    Abstract. level     Base data structure   Main Focus          Data complex./homogeneity

Network          physical            point+rec.            records             simple/hom.
Relational       logical             relations             data/attributes     simple/hom.
Semantic         user                graphs                schema/relations    medium/hom.
Object           logical/physical    objects               object/methods      high/het.
Semistructured   logical             tree                  data/components     medium/het.
Graph            logical             graph                 data/relations      medium/het.

Table 1: A coarse-granularity comparative view among different general [Link]: abstraction level, base data structure used, what types of information objects the DB-Model focuses on, complexity and homogeneity of the data items modeled.

Second, [Link] specific graph operations in the query language algebra, such as finding shortest paths, determining certain subgraphs, [Link] [Link], this is in contrast to graph manipulation in deductive databases, where often fairly complex rule programs need to be written. Last but not least, for purposes of browsing it may be convenient to forget the schema.
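
Shortest paths are one of the graph operations named above. As a rough illustration of what such a primitive does, here is a breadth-first search over an adjacency dictionary in Python. The function name `shortest_path` and the small `roads` graph are assumptions made for this sketch; a graph database would expose the operation through its own query language rather than through code like this.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Return a path with the fewest edges from source to target, or None."""
    if source == target:
        return [source]
    previous = {source: None}          # also serves as the visited set
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in adjacency.get(current, ()):
            if nxt in previous:
                continue
            previous[nxt] = current
            if nxt == target:
                # Walk the predecessor links back to the source.
                path = [nxt]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Hypothetical directed graph used only for this example.
roads = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
```

Calling `shortest_path(roads, "A", "E")` follows edges outward level by level, so the first time the target is reached the path found has the minimum number of hops.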

Third,asfarasimplementationisconcerned,graphdatabasesmayprovidespecial
storagegraphstructuresfortherepresentationofgraphsandthemostefficient
[Link]
havesomestructure,thestructureisnotasrigid,regularorcompleteastraditional
[Link]
[Link] canuseefficientgraphalgorithmsdesignedto
utilizethespecialgraphdatastructures[58].

Comparison with other Database Models
InthissectionwecomparethemostinfluentialDB-ModelswithgraphDB-Models.
[Link]
wepresentthedetails.

[Link]
[Link] the
hierarchicaland network [Link] models lack good
[Link]
datastructuring isnotflexibleand notaptto modelnontraditional
[Link].

RelationalDB-ModelwasintroducedbyCoddtohighlighttheconcept
oflevelofabstractionbyintroducingacleanseparationbetweenphysical
[Link]
[Link]
therelationalmodel,inatimewherethedomainofapplicationwere
basicallysimpledata(banks,payments,commercialandadministrative
applications).

The relationalmodelwas a landmark developmentbecause it


[Link]
basedonthesimplenotionofrelation,whichtogetherwithitsassociated algebraand
logic,madetherelationalmodelaprimarymodelfor
[Link],itsstandardqueryandtransformation
language,SQL,becameaparadigmaticlanguageforquerying.

ThedifferencesbetweengraphDB-ModelsandtherelationalDB-Model
[Link]:therelationalmodelwas
directedtosimplerecordtypedatawithastructureknowninadvance
(airlinereservations,accounting,inventories,etc.).Theschemaisfixedand
[Link] nor
automatizable. The query language does not support paths,
neighborhoodsandseveralothergraphoperations,likeconnectivity(an
exceptionistransitivity).Therearenoobjectsidentifiers,butvalues.

SemanticDB-Modelshavetheirorigininthenecessitytoprovidemore
expressivenessandincorporatearichersetofsemanticsintothedatabase from
[Link] databasedesignerstorepresent
objectsandtheirrelationsinanaturalandclearmanner(similartothewaythe
userviewanapplication)byusinghighlevelabstractionconceptssuchas
aggregation,classificationandinstantiation,subandsuperclassing,attribute
[Link]
[Link],butdue
tolackofprecisenesscannotreplacemodelslikerelationalorObjectOriented.
OtherexamplesofsemanticDB-ModelsareIFO
[Link],semanticDB-
Modelsarerelevantbecausetheyarebasedon
agraphlikestructurewhichhighlightstherelationsbetweentheentitiestobe
modeled.

Objectoriented(OO)DB-Models[75]appearedintheeighties,when the database


community realized thatthe relationalmodelwas
inadequatefordataintensivedomains(Knowledgebase,engineering
applications).OO databases were motivated bythe emergence of
nonconventionaldatabaseapplicationsconsistingofcomplexobjects systems
with many semantically interrelated components as in
CAD/CAM,[Link]
OOprogrammingparadigm onwhichtheyarebased,theirobjectiveis
representingdataasacollectionofobjectsthatareorganizedinclasses
[Link] OO DB-
ModelspermitmuchricherstructuresthantherelationalDBModel,theystillrequiret
hatalldataconformtoapredefinedschema.

OO DB-ModelshavebeenrelatedtographDB-Modelsduetothe
[Link],there
remainimportantdifferencesrootedintheform thateachofthem
[Link]-Modelsviewtheworldasasetofcomplex
objectshavingcertainstate(data)andinteractingamongthem by
[Link],graphDB-Models,viewtheworldasanetwork
ofrelations,emphasizing theinterconnection ofthedata,and the
[Link]-Modelsisonthe
dynamicsoftheobjects,[Link],graphDBModelsemph
asizestheinterconnectionwhilemaintainingthestructural
andsemanticcomplexityofthedata.

[Link](alsocalled
unstructureddata)wasmotivatedby:theincreasedexistenceofunstructured
data,dataexchangeand,[Link]
irregular,implicitandpartial;theschemadoesnotrestrictthedata,only
describesit,isverylargeandrapidlyevolving;theinformationassociatedwitha
schemaiscontainedwithinthedata(datacontainsdataanditsdescription,soit
isselfdescribing).AmongthemostrepresentativemodelsareOEM,Lorel,UnQL,
[Link],semistructureddataisrepresentedbyatreelike
[Link],establishinginthis
[Link]
semistructureddataasrooteddirectedconnectedgraphs.

Graph Data Model Motivations and Applications

GraphDB-Modelsaremotivatedbyreallifeapplicationswhereinformationabout
[Link]
areasinClassicalandComplexnetworks.

Classical Applications. The applications that motivated the introduction of the notion of graph databases were manifold:

Generalizations [Link] were criticized for their lack of semantics, the flat structure of the data they allow, the difficulties for the user to “see” the connectivity of the data, and the difficulty of modeling complex objects.

In the same direction, the observation that graphs have been an integral part of the database design process in semantic and object oriented DB-Models brought the idea of introducing a model in which both data manipulation and data representation were graph based.

Limitations of the expressive power of languages for complex applications also motivated the search for models that resemble such applications more closely.

Limitations (at the time) of knowledge representation systems, and the need for intricate but flexible knowledge representation and derivation techniques.

[Link] this
direction the application in mind were CASE,CAD,image
processing,andscientificdataanalysis.

Graphicalandvisualinterfaces,geographical,pictorialandmultimedia systems.

ApplicationswheredatacomplexityexceededtherelationalDB-Model
[Link],managing
transportnetworks(train,plane,water,telecommunications),spatially
embeddednetworkslikehighway,[Link]
applicationsarenowinthefieldofGeographicalinformationsystems
andspatialdatabases.

There are other applications that motivated graph DB-Models: software systems, integration.

[Link] huge
networksofdata which share some particularmathematical parameters, called
complex networks. The need for database
managementforsomeclassesofthesenetworkshasbeenrecently
[Link] thepointofviewof
databasesonecantreatthemasawhole,wewilldescribethemtogether
[Link],wewillgroup them in four
categories:socialnetworks,information networks,
[Link]
specificexamplesforeachofthem.

In social networks, nodes are people and groups while links show [Link], business relationships, patterns of sexual contacts, research networks (collaboration, co-authorship), communication records (mail, telephone calls, email), computer networks, [Link] activity in the area of Social Network analysis, visualization and data processing in such networks.

Ininformationnetworksoccurrelationssuchascitationsbetween
academicpapers,WorldWideWeb(hypertext,hypermedia),peertopeer
networks,relationsbetweenwordclassesinathesaurus,preference networks.
Intechnologicalnetworksthestructureismainlygovernedbyspaceand
[Link](asnetworkofcomputers),Electric
powergrids,airlineroutes,telephonenetworks,deliverynetwork(postoffice).
TheareaofGeographicInformationSystems(GIS)istodaycoveringabigpart
ofthisarea(roads,railways,pedestriantraffic,rivers).

Biologicalnetworks represent biologicalinformation whose volume,


managementandanalysishasbecomeanissueduetotheautomationofthe
[Link],where
networksoccuringeneregulation,metabolicpathways,chemicalstructure,map
orderand homologyrelationshipsbetween [Link] otherkindsof
biologicalnetworks,suchasfoodwebs,neuralnetworks,[Link]
tremendousgrowth [Link] readercan consultdatabase proposalsfor
genomics,anoverview ofmodelsforbiochemicalpathways,atutorialon
GraphDataManagementforBiology,andamodelforChemistry.

It is important to stress that classical query languages offer little help [Link], data processing in GIS includes geometric operations (area or boundary, intersection, inclusions, etc.), topological operations (connectedness, paths, neighbors, etc.) and metric operations (distance between entities, diameter of the network, etc.). In genetic regulatory networks, examples of measures are connected components (interactions between proteins) and degrees of nearest neighbors (strong pair correlations). In social networks, distance, neighborhoods, clustering coefficient of a vertex, clustering coefficient of a network, betweenness, size of giant connected components, size distribution [Link], where querying RDF data increasingly needs graph features.
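
As an example of one of the measures listed above, the clustering coefficient of a vertex can be computed as the fraction of pairs of its neighbours that are themselves connected. The Python sketch below assumes an undirected graph stored as a dictionary of neighbour sets; the function name and the small `friends` network are hypothetical and only serve the illustration.

```python
from itertools import combinations


def clustering_coefficient(graph, vertex):
    """Fraction of neighbour pairs of `vertex` that are linked to each other."""
    neighbours = graph[vertex]
    k = len(neighbours)
    if k < 2:
        return 0.0   # fewer than two neighbours: no pairs to check
    linked_pairs = sum(1 for a, b in combinations(neighbours, 2)
                       if b in graph[a])
    return 2 * linked_pairs / (k * (k - 1))


# Hypothetical undirected friendship network (each edge stored both ways).
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
}
```

Here `ann` has three neighbours and only one of the three possible pairs among them (`bob` and `cat`) is connected, so her coefficient is one third, while `bob` scores 1.0 because his two neighbours know each other.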

Representative Graph Database Models

InthissectionwedescribeinsomedetailthemostrepresentativegraphDB-Models,
choosingthosethatdefineanduseexplicitlygraphstructuresorgeneralizationsof
[Link],donotfit
[Link],graphsareused,forexample,fornavigation,
fordefiningviews,oraslanguagerepresentation.
Foreachproposal,wepresenttheirdatastructuresand,whenavailable,theirquery
[Link],therearefewimplementationsan
d nostandardbenchmarks,[Link]
modelingineachproposal,wewillrunthefollowingexampleaboutatoygenealogy
showninFigure3.

Figure 2: A genealogy diagram (right-hand side) represented as two tables (left-hand side), NAME-LASTNAME and PERSON-PARENT.
(Children inherit the last name of the father just for modeling purposes.)

Figure 3: [Link] (on the left) uses two basic type nodes for representing data values (N and L), and two product type nodes (NL and PP) [Link] (on the right) is a collection of tables, [Link] internal nodes use pointers (names) to make reference to basic and set data values defined by other nodes.

Logical Data Model (LDM)

MotivatedbythelackofsemanticsintherelationalDB-
Model,KuperandVardiproposed aDB-
Modelthatgeneralizestherelational,[Link]
describesmechanismstorestructuredata,alogicalquerylanguageandanalgebraic
querylanguage.

In LDM a schema is an arbitrary directed graph where each node has one of the following types: the Basic type describes a node that contains the data stored; the Composition type ⊗ describes a node that contains tuples whose components are taken from its children; the Collection type ⊛ describes a node that contains sets, [Link], internal nodes are of type ⊗ or ⊛ representing structured data, terminal nodes are of the Basic type and represent atomic data, and edges represent connections between data.

A second version of the model, besides renaming the nodes ⊗ and ⊛ as product and power respectively, incorporates a new type, the Union type ∪, intended to represent a collection whose domain is the union of the domains of its children (see the example in Figure 4).

ALDMdatabaseinstanceconsistsofanassignmentofvaluestoeachnodeofthe
[Link],theinstanceofanodeisasetofelementsfrom the
underlyingdomain(forbasictypenodes)andtuplesorsetstakenfromtheinstance
ofthenode’schildren(for⊗,⊛andtypes).

Withtheobjectiveofavoidingcyclicityattheinstancelevel,themodelproposestoke
ep
[Link],instancesconsistofa
setoflvalues(theaddressspace),plusanrvalue(thedataspace)assignedtoeachof
[Link]
gies.
[Link],
[Link],andalgebraic
language–equivalenttothelogicallanguage–isproposed,providingoperations
fornodeandrelationcreation,transformationandreductionofinstances,andother
operationslikeunion,differenceandprojection.

LDM isacompleteDB-Model([Link]
constraints)Themodelsupportsmodelingofcomplexrelations([Link],
recursiverelations).Thenotionorvirtualrecords(pointerstophysicalrecords)pro
ves
usefultoavoidredundancyofdatabyallowingcyclicityattheschemaandinstancelev
el. Duetothefactthatthemodelisageneralizationofothermodels(liketherelational
model),theirtechniquesorpropertiescanbetranslatedintothegeneralizedmodel.A
relevantexampleisthedefinitionofintegrityconstraints.

Figure4:[Link](left)definesapersonasacomplexobject
withthepropertiesnameandlastnameoftypestring,andparentoftypeperson
(recursivelydefined).Theinstance(ontheright)showstherelationsinthegenealogy
amongdifferentinstancesofperson.

Chapter 2

Graph Schemas

Selecting vertex labels

[Link]
of
[Link]
edgescanhavepropertieswhicharekeyvaluepairswithStringkeysandprettymucha
ny valuethattheunderlyingdatabasesupports.

Sofar,themodellooksschemalesssinceverticesandedgescan'tbedistinguishedfro
m otherverticesandedgeswithoutknowingwhatthepropertiesmean
.However,edgeshave
alwayshadlabels.AndwithTinkerpop3,[Link]
strue withNeo4J'slatestmajorversion.

If every vertex must be labeled, what is the correct method to select a label?
What should a label say about a vertex or an edge, from the application's perspective?
We think a vertex label should represent the most granular type of the vertex, where each "vertex type" is associated with a unique combination of:
meaning (semantics),
set of property key names and value types, and
set of outgoing edge labels, where each label type is annotated with the possible directions of the edge (in/out/both) and cardinality.

Why so?
Because labels representing vertex types give the application the most detailed information about the behavior of that vertex, thereby ensuring that the application can [Link], one should not be able to subdivide a vertex type to get two vertex types that behave differently from the application's standpoint.

Examples of label selection

Let's go through the label selection exercise with the classic 6-vertex TinkerGraph shown in the property graph model page. Since this is a TinkerPop 2 style graph, it doesn't have vertex [Link]'ll now try to come up with the vertex labels by simply looking at the vertex behavior.
Figure 1: TinkerGraph example

If you look closely, there are two types of vertices: ones with 'name' and 'age', and ones with 'name' and 'lang'. Let us label the former vertex type as 'Person' and the latter vertex type as 'Software'. In other words, you have persons named 'marko', 'vadas', 'peter' and 'josh' and software named 'lop' and 'ripple'.

After analyzing the edge labels and direction, you could say that the 'Person' vertex type has:
Property keys 'name' and 'age'
Edges labeled 'knows' in the OUT direction
Edges labeled 'created' in the OUT direction
The 'Software' vertex type has:

Property keys 'name' and 'lang'
Edges labeled 'created' in the IN direction
Now, an application looking at this graph automatically knows what to expect when it reads a vertex labeled 'Person' or 'Software'. We can define two different indexes on 'name', one for Person and one for Software, to make sure that software searches don't pick up people, or vice versa.
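
The label selection exercise above can be mimicked mechanically. The Python sketch below groups vertices by their set of property keys and then collects the edge labels and directions seen for each resulting type. The vertex and edge values are reproduced from the TinkerPop toy graph as commonly documented and should be treated as an assumption here, as should the helper names `group_by_property_keys`, `label_of` and `edge_signature`.

```python
# Vertices of the classic toy graph: id -> properties (assumed values).
vertices = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
    5: {"name": "ripple", "lang": "java"},
    6: {"name": "peter", "age": 35},
}
# Edges as (out vertex, label, in vertex).
edges = [
    (1, "knows", 2), (1, "knows", 4), (1, "created", 3),
    (4, "created", 5), (4, "created", 3), (6, "created", 3),
]


def group_by_property_keys(vertices):
    """Group vertex names by the set of property keys they carry."""
    groups = {}
    for props in vertices.values():
        groups.setdefault(frozenset(props), []).append(props["name"])
    return groups


# Labels chosen by a person after inspecting the groups, as in the text.
LABELS = {
    frozenset({"name", "age"}): "Person",
    frozenset({"name", "lang"}): "Software",
}


def label_of(vertex_id):
    return LABELS[frozenset(vertices[vertex_id])]


def edge_signature(label):
    """Set of (edge label, direction) pairs observed on vertices of a type."""
    signature = set()
    for out_v, edge_label, in_v in edges:
        if label_of(out_v) == label:
            signature.add((edge_label, "OUT"))
        if label_of(in_v) == label:
            signature.add((edge_label, "IN"))
    return signature
```

Run on this data, the sketch also reports `('knows', 'IN')` for `Person`, because the people who are known are themselves `Person` vertices. Grouping by property keys is only a starting heuristic; as the text goes on to argue, the final choice of types depends on what makes sense for the application.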

The label selection process can't [Link], a person with no friends can be thought of as a separate vertex type, because there are no adjacent 'knows' [Link], unless this makes sense in the context of the application or the data model, there is no point in subdividing the 'Person' vertex type as 'Loner' and 'Person with Friends'. The same argument goes for subdividing the person vertex type as 'Developer' and 'NonDeveloper' based on whether that person created a software.

Torecap,therightwaytoselectvertexlabelsforapropertygraphistofirstfigureoutthe
[Link]
graphschema.

Drawing a graph schema
The best way to represent a graph schema is, of course, [Link] graph schema looks for the classic TinkerPop graph.
Figure 2: Example graph schema shown as a property graph

[Link]
pes,
[Link]
[Link]
e nameofthemostspefcsuperass representingthecorrespondingpropertyvalues
[Link]'?'aftertheclassname(notshown
here).

Edge properties are like vertex properties, except that there is a special property named '#' that holds the cardinality from [Link] cardinality is M:N, i.e., many-to-many, for both 'knows' and 'created'. One could be misled to think that some of these relationships are 1:[Link] other reason for not fully relying on reverse engineering methods to derive schemas.
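
The warning about reverse engineering can be illustrated with a small Python sketch that derives a cardinality purely from the edges present in the data. The edge list is the toy graph as commonly documented (an assumption here), and `observed_cardinality` is a hypothetical helper written for this example.

```python
from collections import Counter

# Edges as (out vertex, label, in vertex), assumed from the toy graph.
edges = [
    (1, "knows", 2), (1, "knows", 4), (1, "created", 3),
    (4, "created", 5), (4, "created", 3), (6, "created", 3),
]


def observed_cardinality(edges, label):
    """Cardinality suggested by the instance data alone, as 'source:target'."""
    out_degree = Counter(o for o, l, _ in edges if l == label)
    in_degree = Counter(i for _, l, i in edges if l == label)
    many_targets = max(out_degree.values(), default=0) > 1
    many_sources = max(in_degree.values(), default=0) > 1
    left = "M" if many_sources else "1"
    right = "N" if many_targets else "1"
    return f"{left}:{right}"
```

On this data, 'created' comes out as M:N, but 'knows' comes out as 1:N simply because no vertex in the sample happens to be known by two people. The intended cardinality stated in the text is M:N for both, which is why instance data alone is not a reliable source for the schema.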

[Link],th
egraph
schemaisverysimple,althoughthevisualizationofthegraphshowninthelinklooks
complicated.

Figure 3: Grateful Dead graph schema
[Link], the schema is extremely simple (simplistic given recent US Supreme Court rulings).
Figure 4: Family tree graph schema

Note that in the Pixy schema, the property lists are the same for 'Man' and 'Woman', but the direction of the 'wife' edge is functionally dependent on the value of the 'sex' property. This is very interesting because this means that graph schemas could be normalized using rules like [Link]!

Summary

This section introduced the idea of schemas for property graphs and described how the [Link], it described a method to develop the graph schema for an existing property graph by finding the most granular division of its vertices into vertex types.

Graph schemas (or schema graphs) help application developers better understand the graph's structure.
In the next section, we will look at the problem [Link] develop a graph schema from a higher level conceptual model such as an Entity Relationship model? Could this be a systematic method to select vertex and edge labels, and property keys when designing a graph database application?

Chapter 3

Converting ER models to graph schemas

This section will describe a general method to convert an entity relationship model to a
[Link], a database designer can develop ER models using standard conceptual
modeling practices, but store the data in a graph database instead of a relational
database.

ER models and diagrams

The entity relationship model was proposed by Peter Chen in his 1976 paper titled "The
Entity Relationship Model: Toward a Unified View of Data". The ideas in this paper are
[Link] model.

Conceptual modeling is a particularly useful exercise when embarking on a project that
[Link] [Link] modeling is to look at the natural language description of an
application's requirements.
These requirements can be analyzed to identify the entity and relationship types, using
Chen's "rules of thumb" (quoted from Wikipedia):

Common noun         Entity type
Proper noun         Entity
Transitive verb     Relationship type
Intransitive verb   Attribute type
Adjective           Attribute for entity
Adverb              Attribute for relationship
Example
Let us consider the following requirements:

Model a system where users create pages, [Link] [Link] which are then used to
recommend other sections to the authors and invited readers.
You could analyze this requirement and come up with three entity types, namely User,
Page and Tag. Owns, Invites and TaggedAs capture the relationships. Note that not all
verbs become relationships (create, for example, does not). Similarly, the fact that
invitations only apply to pages that a user owns is lost in this model.

Figure 5: Example ER diagram

The square shaped boxes show entity types, [Link] diamond shaped boxes show
relationship types, which represent sets of similar relationships. A relationship type
relates two or more entity types to each other.

The diagram shows the cardinality of each entity's contribution to a relationship, such
as 1:N (one to many) or N:N (many to many). The cardinality is specified using the
'look across' [Link], a User owns N pages, [Link]own limitations of look across
cardinality for ternary relationships like Invites.

The diagram also shows some oval shaped attributes, [Link] [Link]ust be underlined.

Now, [Link] from an ER perspective, it makes sense to model tag as an entity,
especially if tags are used to establish relationships across users for recommendations.

Procedure to convert an ER model to a graph schema

The procedure to convert an ER model to a relational model is well known and discussed
in [Link]r procedure the ER diagram with the above example.

Rule #1: Entity types become vertex types
Entity types such as User, Page and Tag become vertex types.
The name of the entity type becomes the label of the vertex type.
The associated attributes become the properties of the vertex type.

Note that we are drawing a graph schema, [Link]any [Link]erm "vertex type"
[Link] "entity types" (like User) and entities (like John Doe, the user).

Rule #2: Binary relationship types become edge types
All binary relationship types in the ER diagram can be converted to edge types in the
graph schema.
The name of the relationship type becomes the label of the edge type.
The associated attributes become the properties of the edge type.
The endpoints of the edge type are the vertex types corresponding to the related entity
[Link]'t matter.

Here is an example showing the "Owns" relationship type translated to an "owns" edge
type:
Note that one-to-many and many-to-many binary relationships can be modeled as edges
without [Link], you would need an additional table to capture many-to-many
relationships.

Figure 7: Owns relationship converted to an owns edge

A minor point is that the cardinality is written as 1:N because the User (out vertex
type) to Page (in vertex type) relationship is a 1:N relationship, using the look across
method. In other words, [Link]d, the cardinality would be N:1.

Rule #3: N-ary relationship types become vertex types
[Link]ome vertex types in the property graph model.
The name of the relationship type becomes the label of the vertex type.
The associated attributes become the properties of the vertex type.

The new vertex type includes edges to the vertex types corresponding to the related
entity types (see example). These edge types are labeled after the role of the
participating [Link]'t matter for any of these edges.

Here is an example showing the ternary relationship Invites translated to the vertex
type Invitation:
The cardinality in the graph schema is N:1 because the Invitation to Page relationship
is an N:1 relationship, [Link], an invitation could be issued to 1 page, and a page
(in vertex) [Link]ome of the role types, like invitee, [Link]ity will be 1:N.

Figure 8: Invites relationship converted to an Invitation vertex type

We haven't shown the process for weak entity types and identifying relationship types
but [Link] more forgiving than relational databases in that they allow two vertices
to have the same label and [Link]fying relationship types into the property graph
model.

Conversion example

[Link], this diagram provides enough information for an application developer to
work with the graph database.

Figure 9: Graph schema for User-Page-Tag ER diagram

This is the "logical model" for the example conceptual model introduced in the first
fig[Link] can tweak this model further by renaming the labels, changing directions of
the edges, and so [Link].
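
The three conversion rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Relationship` class, the `er_to_graph_schema` function and the role names `owner`, `page` and `tag` are hypothetical, the ternary Invites relationship is passed in under its vertex label `Invitation`, and cardinalities are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    name: str
    roles: dict                      # role name -> entity type name (insertion ordered)
    attributes: list = field(default_factory=list)


def er_to_graph_schema(entities, relationships):
    """Return (vertex_types, edge_types) for an ER model.

    entities: dict mapping entity type name -> list of attribute names.
    edge_types: list of (label, out_vertex_type, in_vertex_type, properties).
    """
    # Rule #1: every entity type becomes a vertex type with the same properties.
    vertex_types = {name: list(attrs) for name, attrs in entities.items()}
    edge_types = []
    for rel in relationships:
        if len(rel.roles) == 2:
            # Rule #2: a binary relationship type becomes an edge type.
            out_type, in_type = rel.roles.values()
            edge_types.append((rel.name.lower(), out_type, in_type, list(rel.attributes)))
        else:
            # Rule #3: an N-ary relationship type becomes a vertex type with
            # one outgoing edge type per role.
            vertex_types[rel.name] = list(rel.attributes)
            for role, entity in rel.roles.items():
                edge_types.append((role, rel.name, entity, []))
    return vertex_types, edge_types


entities = {
    "User": ["name", "login"],
    "Page": ["uri", "html", "createTs"],
    "Tag": ["hashtag", "description"],
}
relationships = [
    Relationship("Owns", {"owner": "User", "page": "Page"}),
    Relationship("TaggedAs", {"page": "Page", "tag": "Tag"}),
    Relationship("Invitation", {"inviter": "User", "invitee": "User", "page": "Page"}),
]
vertex_types, edge_types = er_to_graph_schema(entities, relationships)
for edge in edge_types:
    print(edge)
```

The result has four vertex types (User, Page, Tag, Invitation) and five edge types, which matches the shape of the schema described in this section.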

Vertices are vertices, and edges are... edges

[Link], "Joe bought a headphone at Target" is an example of a "Bought" relationship
that relates a User to a Product [Link], not edges (unless you are using
hypergraphs). Hence we think it is misleading to think of edges as relationships and
vertices as entities.

It is better to think of graphs as visualizable representations [Link] emphasize the
visual nature of graphs because drawing and thinking in terms of graphs is easy.
For instance, if you go to the Wikipedia entry for hypergraphs, you will see why
visualizing hypergraphs isn't as easy as visualizing (binary) graphs.
Summary

This section showed that it is possible to convert any entity relationship model to a
[Link], a data architect can use standard methods to model a domain as an ER diagram
and then follow this procedure to convert it to a property graph [Link] key value
stores and document stores.

Chapter 4

Normalizing Graph Schemas

This section looks at how graph schemas can be manipulated and transformed to
[Link] relational data models, typically performed to normalize or denormalize a
relational schema.

Normalization of relational databases

The goal of database normalization is to make sure that relational schemas are easy to
modify, easy to extend, [Link]ous normal forms, such as 1NF, 2NF, and so on, define
constraints that a table must satisfy to be [Link] mathematical, [Link] example
from the Wikipedia page on 3NF:
The previous figure breaks up the tournament winners table into two tables, one with
player [Link] "functional dependencies" and "non-prime attributes" are hard to
remember, but the process of splitting and merging tables [Link], if there was an
existing table which had one row per player, we'd probably move the "date of birth" to
that table.

Transformation rules that produce equivalent schemas

This section lists some transformation rules that produce equivalent graph schemas.
A graph schema is equivalent to another graph schema if the data stored in one schema,
along with the applications that access it, can be ported to the other schema,
[Link] rules are like splitting and merging tables in relational models.
The transformation rules in this section can be mechanically applied to any schema, and
has [Link], you could simplify the semantics and improve the usability of your
graph model.

Rule A: Renaming properties and labels
This rule consists of three transformations that result in equivalent schemas:
Any vertex label can be renamed, so long as the new name doesn't refer to an existing
vertex label.
Any edge label can be renamed, so long as the new name doesn't refer to an existing edge
label between the out and in vertex types.
Any vertex/edge property can be renamed so long as the new name doesn't refer to an
existing property of the vertex/edge type.
The following figure illustrates some example applications of this rule on vertex and
edge labels:

Figure 10: Renaming properties and labels

The schema shown in the top is a simple graph schema showing family relationships. This
schema is transformed to the schema shown in the bottom of the figure using the
following transformations:
Vertex labels Man and Woman are renamed to Male and Female.
Edge labels mother (2 instances), father (2 instances) are renamed to parent.

Even though it seems like some information is lost by renaming mother/father to parent,
this isn't true because the vertex labels at the endpoints (Male/Female) have that
informati[Link] same transformation wouldn't be so obvious while looking at an instance
of this graph like the Kennedy family tree.

Note that you cannot rename 'wife' to 'parent' [Link] there already exists a parent
edge type from Male to Female.
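
The edge-label case of Rule A, including its precondition, can be sketched as follows. The representation of a schema as a set of `(label, out_type, in_type)` triples and the function name `rename_edge_label` are assumptions made for this illustration only.

```python
def rename_edge_label(edge_types, old, new):
    """Rename edge label `old` to `new` in a set of (label, out_type, in_type)."""
    renamed = set()
    for label, out_type, in_type in edge_types:
        if label != old:
            renamed.add((label, out_type, in_type))
        elif (new, out_type, in_type) in edge_types:
            # Rule A precondition: the new label must not already exist
            # between the same out and in vertex types.
            raise ValueError(f"{new!r} already exists from {out_type} to {in_type}")
        else:
            renamed.add((new, out_type, in_type))
    return renamed


family = {
    ("mother", "Male", "Female"), ("mother", "Female", "Female"),
    ("father", "Male", "Male"), ("father", "Female", "Male"),
    ("wife", "Male", "Female"),
}
family = rename_edge_label(family, "mother", "parent")
family = rename_edge_label(family, "father", "parent")
```

With this schema, a further call `rename_edge_label(family, "wife", "parent")` raises `ValueError`, which is the situation described in the note above.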
Rule B: Reversing edge directions

This rule states that an edge type can be reversed provided it is a self loop, or there
is no edge [Link]rsed as well.

Figure 11: Reversing edge directions
The following figure illustrates an example transformation using this rule and the
previous [Link]:
The 'wife' edge is renamed to 'husband' (rule A) and then reversed.
Each parent edge is renamed to 'son' or 'daughter' and reversed.

[Link], JFKJr parent> [Link]> JFKSr.
You could always rename the four 'son' and 'daughter' edge types to 'child' using rule
A. Again, no information is lost since the vertex labels are still unique.

You would, however, not be able to rename 'husband' to 'son'. You could rename 'husband'
to 'daughter' (though absurd). The application will have to interpret "male daughters"
as [Link], you would not be able to reverse its direction.

As you can see already, some applications of these rules may be quite hard to derive if
you are thinking in terms of graph instances, rather than graph schemas.
Rule C: Property displacement

Figure 12: Property displacement

This rule states that a property on an edge type can be moved to either adjacent vertex
type, [Link]extype can be moved to an adjacent edge type with look across cardinality
of 1, provided the edge always exists when the property exists.

The adjoining figure clarifies the rule, where the 'dateOfBirth' property is moved to
the 'mother' relationship because there is exactly one mother relationship per Man/Woman
and it [Link] 'dateOfBirth' to 'deliveryDate', one could argue that the property
belongs in the edge and not the vertex.

Note that a Man's dateOfBirth cannot be displaced to the wife relationship because that
[Link]rly, the dateOfBirth in the edge type labeled 'mother' from Man to Woman in the
bottom schema cannot be moved to Woman because of the cardinality restrictions in the
rule.

Using this rule, you can move the properties around the schema to come up with a better
[Link] graph [Link], if a graph database only supports indexes on vertex
properties, you [Link], if a graph database supports vertex centric indexes based on
properties on adjacent edges/vertices, you can use this rule to bring the indexed
property closer to the vertex type of interest.
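
As a rough illustration of property displacement on instance data, the sketch below moves a vertex property onto the vertex's single outgoing edge and back. The dictionary layout, the vertex ids and both function names are hypothetical; the check that exactly one edge exists stands in for the "look across cardinality of 1" condition.

```python
import copy


def move_property_to_edge(vertices, edges, key, edge_label):
    """Move vertices[v][key] onto v's single outgoing `edge_label` edge."""
    plan = []
    for vid, props in vertices.items():
        if key not in props:
            continue
        targets = [e for e in edges if e["out"] == vid and e["label"] == edge_label]
        if len(targets) != 1:
            raise ValueError(f"{vid} has {len(targets)} '{edge_label}' edges, expected 1")
        plan.append((props, targets[0]))
    for props, edge in plan:          # mutate only after every precondition held
        edge["props"][key] = props.pop(key)


def move_property_to_vertex(vertices, edges, key, edge_label):
    """Inverse: move the property from each `edge_label` edge to its out vertex."""
    for edge in edges:
        if edge["label"] == edge_label and key in edge["props"]:
            vertices[edge["out"]][key] = edge["props"].pop(key)


vertices = {"child": {"name": "C", "dateOfBirth": "2001-02-03"}, "mom": {"name": "M"}}
edges = [{"label": "mother", "out": "child", "in": "mom", "props": {}}]
original = copy.deepcopy(vertices)

move_property_to_edge(vertices, edges, "dateOfBirth", "mother")
moved = copy.deepcopy(edges)
move_property_to_vertex(vertices, edges, "dateOfBirth", "mother")
```

Because the second call restores the original vertices, the two schemas hold the same information, which is what makes the displacement an equivalence.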

Rule D: Specialization and generalization
This rule states that:
Any vertex type can be divided into two disjoint vertex types based on a Boolean test on
the properties and adjacent edge labels of a vertex belonging to that type.
Any edge type can be divided into two disjoint edge types based on a Boolean test on
the properties and adjacent vertex labels of an edge belonging to that type.
Figure 13: Generalization
In other words, if we provide a boolean function that can give a T/F result given a
vertex/edge, we can use that function to divide a vertex/edge type into two different
types.
The reverse rule states that:
Any vertex/edge type can be merged into another vertex/edge type provided there is a
Boolean test that can distinguish its vertices/edges from the merged vertices.
The adjoining figure shows an example transformation involving the following steps:
Male and Female are generalized as Person, because the boolean test, sex equals 'M',
can distinguish Male from Female.
After that, son and daughter edge types are generalized as child because the boolean
test, sex of in vertex equals 'M', can distinguish son from daughter.

This rule is useful in increasing the specificity, or reducing the complexity, of the
graph schema.
As a general principle, it is better to use this rule for specialization, i.e.,
increasing the specificity, because that allows the different vertex and edge types to
embrace different [Link], there are instances where the differences between the
vertex types are so minor that specialization only results in application [Link]e to
Person.
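
A small sketch of Rule D on instance data follows. The list-of-dictionaries representation and the names `specialize` and `generalize` are assumptions for illustration; the Boolean test is the one from the example, sex equals 'M'.

```python
def specialize(vertices, label, test, true_label, false_label):
    """Split vertex type `label` in two using the boolean function `test`."""
    return [
        {**v, "label": true_label if test(v) else false_label}
        if v["label"] == label else dict(v)
        for v in vertices
    ]


def generalize(vertices, labels, merged_label):
    """Merge every vertex type in `labels` into `merged_label`."""
    return [
        {**v, "label": merged_label} if v["label"] in labels else dict(v)
        for v in vertices
    ]


people = [
    {"label": "Male", "sex": "M", "name": "A"},
    {"label": "Female", "sex": "F", "name": "B"},
]
persons = generalize(people, {"Male", "Female"}, "Person")
restored = specialize(persons, "Person", lambda v: v["sex"] == "M", "Male", "Female")
```

The generalization loses nothing here only because the `sex` property can tell the merged vertices apart, so `restored` equals `people`.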

Rule E: Edge promotion

Figure 14: Edge promotion

This rule states that an edge type can be promoted to a vertex type by adding two "out"
edge [Link]ype.
The cardinality of the new edge types is N:1 or 1:1 depending on the look across
cardinality of the original endpoint vertex's type.

Note that the direction of the new edge types can be changed using the rename and
[Link] "out" direction to simplify the way in which cardinality for the new edge
types is derived.

The adjoining figure shows the husband edge promoted to a vertex type called 'Marriage'.
The edge types 'husband' and 'wife' point to the two endpoints of the vertex type.
The edge promotion rule is useful in preparing a binary relationship to become an N-ary
relationship.

The reverse rule states that any vertex type with two propertyless edge types, with same
side cardinality of exactly 1, can be demoted [Link] [Link] (rule C) to move
properties out of edges.
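
Edge promotion on instance data might look like the sketch below. The data layout, the generated ids such as `Marriage-0`, the `since` property and the function name `promote_edge` are hypothetical, and the sketch assumes the generated ids do not clash with existing vertex ids.

```python
import itertools


def promote_edge(vertices, edges, label, vertex_label, out_role, in_role):
    """Replace every `label` edge by a new vertex with two outgoing edges."""
    new_vertices = dict(vertices)
    new_edges = []
    counter = itertools.count()
    for e in edges:
        if e["label"] != label:
            new_edges.append(e)
            continue
        vid = f"{vertex_label}-{next(counter)}"
        # The promoted vertex inherits the properties of the old edge.
        new_vertices[vid] = {"label": vertex_label, "props": dict(e["props"])}
        # Two propertyless "out" edges point at the old endpoints.
        new_edges.append({"label": out_role, "out": vid, "in": e["out"], "props": {}})
        new_edges.append({"label": in_role, "out": vid, "in": e["in"], "props": {}})
    return new_vertices, new_edges


vertices = {
    "w1": {"label": "Female", "props": {}},
    "h1": {"label": "Male", "props": {}},
}
edges = [{"label": "husband", "out": "w1", "in": "h1", "props": {"since": 2010}}]
new_vertices, new_edges = promote_edge(
    vertices, edges, "husband", "Marriage", out_role="wife", in_role="husband"
)
```

Once the relationship is a vertex, further participants (for example a witness) could be attached with additional edges, which is the N-ary preparation mentioned above.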

Rule F: Property promotion

Figure 15: Property promotion

This rule states that any group of properties can be promoted to a new vertex type with
those properties, provided the new vertex type has edges connecting it to all existing
vertex types that include the property group. The same side cardinality of the new edge
type is 1.
The adjoining figure shows the 'sex' [Link]extype [Link], every person in the new
graph will have an outgoing 'isa' edge to one of the two new vertices.

This rule is equivalent to the splitting of a relation into two relations, as shown in
the first [Link], typically ones that repeat, can be promoted to a vertex.

While applying this rule, it is better to include all vertex types that have the same
group of [Link], if there is a 'sex' property in a different Animal type, it is
better to [Link]up, you can first promote those edge types to vertices.

The reverse of this rule is that a vertex type that has propertyless edge types with
same side cardinality of 1, [Link]s must [Link]atable in the relational model, which
is useful to reduce the number of joins (or traversals in the case of graph databases).
Rule G: Property expansion

Figure 16: Property expansion

This rule states that a property of a vertex type that represents a list of values can
be moved [Link] "in" edge type from the existing vertex type with cardinality
1:[Link]ws this rule applied to the nickname property which holds a list of Strings.
The reverse rule states that any vertex type with exactly one propertyless edge type
with look across cardinality of exactly 1 can be removed after moving its properties to
a list in the adjacent vertex type.

[Link], however, many [Link]g nicknames as a List or a separate vertex type is up to
the designer.
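
Property expansion and its reverse can be sketched as a pair of functions. Everything named here is an assumption for illustration: the `value` property on the new vertices, the generated ids, and the use of a list to keep edge order so that the list order survives the round trip.

```python
def expand_list_property(vertices, edges, key, new_label, edge_label):
    """Move each value of list property `key` to its own vertex."""
    new_vertices = {
        vid: {k: v for k, v in props.items() if k != key}
        for vid, props in vertices.items()
    }
    new_edges = list(edges)
    for vid, props in vertices.items():
        for i, value in enumerate(props.get(key, [])):
            new_id = f"{vid}/{key}/{i}"
            new_vertices[new_id] = {"label": new_label, "value": value}
            new_edges.append((edge_label, vid, new_id))   # edge into the new vertex
    return new_vertices, new_edges


def collapse_list_property(vertices, edges, key, new_label, edge_label):
    """Reverse rule: fold the `new_label` vertices back into a list property."""
    new_vertices = {
        vid: dict(p) for vid, p in vertices.items() if p.get("label") != new_label
    }
    new_edges = []
    for label, out_id, in_id in edges:
        if label == edge_label:
            new_vertices[out_id].setdefault(key, []).append(vertices[in_id]["value"])
        else:
            new_edges.append((label, out_id, in_id))
    return new_vertices, new_edges


people = {"p1": {"label": "Person", "name": "Robert", "nickname": ["Bob", "Bobby"]}}
expanded_v, expanded_e = expand_list_property(people, [], "nickname", "Nickname", "nickname")
collapsed_v, collapsed_e = collapse_list_property(
    expanded_v, expanded_e, "nickname", "Nickname", "nickname"
)
```

One design caveat of this sketch: an empty list expands to no vertices at all, so it would not be restored by the reverse step.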
Summary

Rule based schema transformations are tools that a data model designer can use to
rewrite a graph schema, [Link], a data model designer can use these rules to select
the directions of edges, the names of different labels and keys, the locations of
various properties, [Link]'t matter from a pure information perspective, but could
make a big difference in the usability and efficiency.
In that sense, a data model designer can go back to Codd's original goals for
normalization: designing schemas that are easy to modify, easy to extend, informative to
users and supportive of various query patterns.

Chapter 5

One metarule for normalization

The previous section listed seven rule based schema transformations such as renaming
labels, reversing edges, promoting edges and properties to vertices, [Link]d
transformations can be mechanically applied to any graph schema, without losing any
[Link], a graph database designer can start with a design generated from an entity
relationship model and tweak it to get a final design.

This section describes a single metarule from which the seven previously described
[Link] sections using set theory.
Schemas and constraints

Figure 17: Example graph schema
The above figure shows an example graph schema describing constraints on the graph data
model such as:

What are the legal labels for vertices?
What are the legal edge labels between two vertex types?
What are the legal property keys and value types at each edge or vertex type?

The reality, however, is that a graph model could have other constraints that aren't
expressed [Link], the 'inviter' edge in every Invitation must be to the User who has
an 'owns' edge to the 'page' [Link]'t captured in the above schema.

The question is: How can we model complex constraints in a graph model?
Graph universes, transformations and equivalence
A graph universe U is a set of graphs, [Link] data model in the sense that it
captures every valid graph that belongs to the data model.

A graph universe U is compatible with a graph schema S, if every graph in the universe
is [Link], although the graph universe is a precise description of the model, it can
still be understood as a refinement of a more loosely defined graph schema.

Redefining equivalence using transformation functions

Figure 18: An invertible function
A graph transformation T is a function that takes graphs from one universe U to
[Link], T: U → V.

A universe U is equivalent to a universe V if there is a transformation function
T: U → V, [Link] establish a one-to-one correspondence between two sets, which in
this case are graph universes.

In other words, given any graph G ∈ U, we can use T(G) to get a graph G' ∈ [Link] use
the inverse function T⁻¹(G') [Link].
A programming perspective

If we are upgrading from one graph model to another, the transformation function is the
upgrade script [Link] downgrade script, then we have two equivalent models (or
universes). In other words, two graph models, represented as universes or schemas, are
equivalent if they are forward and backward compatible.
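
The upgrade/downgrade view can be made concrete with a toy pair of scripts. This is a sketch under assumptions: a graph is modeled as a set of `(label, out, in)` edge triples, the vertex ids are made up, and the old universe is assumed to contain no 'husband' edges (otherwise the pair would not be invertible).

```python
def upgrade(graph):
    """Rename every 'wife' edge to 'husband' and reverse its direction."""
    return {
        ("husband", dst, src) if label == "wife" else (label, src, dst)
        for (label, src, dst) in graph
    }


def downgrade(graph):
    """Inverse script: turn 'husband' edges back into reversed 'wife' edges."""
    return {
        ("wife", dst, src) if label == "husband" else (label, src, dst)
        for (label, src, dst) in graph
    }


g = {("wife", "m1", "f1"), ("mother", "c1", "f1")}
g_new = upgrade(g)
```

Here `downgrade(upgrade(g))` returns the original graph, which is the round-trip property that makes the two models equivalent.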

Derived types

[Link]e called a derived vertex type in U, if every graph G ∈ U is such that its
vertices (and adjacent edges) belonging to the vertex type can be calculated from the
rest of the graph.

In other words, given any graph in the universe U, after we remove all vertices
corresponding to the derived vertex type, [Link]dge and [Link]d in graph schemas, but
are specific to graph universes that are compatible with that schema.

Metarule: Adding and removing derived types
Finally, here is the metarule behind all schema transformations:
Given any graph universe U compatible with a schema S, we can add a derived
vertex/edge/property type to produce an equivalent graph universe V compatible with the
schema S ∪ {derived type}.

The reverse rule states that:
Given any graph universe U compatible with a schema S, we can remove a derived
vertex/edge/property type to produce an equivalent graph universe V compatible with the
schema S − {derived type}.
Figure 19: Modified graph schema

The 'inviter' edge type in the graph schema shown in the first figure is a derived edge
type. This is because the 'inviter' edges can be calculated by going from the Invitation
vertices to the Page and back to the user through the 'owns' edge (reverse direction).
We can simplify the original schema to the version shown in the adjoining figure by
applying these rules:

(Metarule) Remove derived edge type 'inviter'.
(Edge promotion) Demote the binary relationship Invitation to an edge called 'invited'.
As you can see, the updated schema is simpler than the original schema derived from an
ER diagram.
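
The following sketch illustrates removing and re-adding a derived edge type. It relies on the constraint stated earlier in this chapter, that the 'inviter' edge of an Invitation points to the User who owns the invited page, and on each page having exactly one owner. The edge-triple representation, ids and function names are hypothetical.

```python
def derive_inviter_edges(edges):
    """Compute 'inviter' edges: Invitation -> page -> (reverse 'owns') -> User."""
    owner_of = {page: user for (label, user, page) in edges if label == "owns"}
    return {
        ("inviter", invitation, owner_of[page])
        for (label, invitation, page) in edges
        if label == "page"
    }


def remove_derived(edges):
    """Transformation to the smaller universe: drop the derived edge type."""
    return {e for e in edges if e[0] != "inviter"}


def add_derived(edges):
    """Inverse transformation: recompute the derived edges from the rest."""
    return edges | derive_inviter_edges(edges)


g = {
    ("owns", "alice", "p1"),
    ("page", "inv1", "p1"),
    ("invitee", "inv1", "bob"),
    ("inviter", "inv1", "alice"),
}
smaller = remove_derived(g)
```

Since `add_derived(smaller)` reproduces `g`, the universe with the derived edge type and the universe without it are in one-to-one correspondence.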
Proving the metarule

The metarule is easy to prove. [Link] transformation function to remove a derived type
simply removes all elements that belong [Link] the remaining graph.
Hence the universe with the derived type is equivalent to the universe without it.

Proving the 7 rules: Renaming, Reversing, Property Displacement, ...
[Link]s:
Any edge label can be renamed, so long as the new name doesn't refer to an existing
edge label between the out and in vertex types.
We can prove this in two steps:
Add a derived edge type with the new name as a copy of the old edge type.
Remove the old edge type, which is now derivable from the new edge type.

Of course, step 1 requires that the edge type with the new name doesn't already exist in
the [Link], all edges of the edge type can't [Link] "as long as the new name doesn't
refer to an existing edge label."

In this manner, we can prove each rule by performing some steps to first add new derived
types and then remove the existing types which become derived types themselves.
Beyond transformation rules

Thinking in terms of graph universes, derived types and transformation functions lets us
do [Link]ese rules or transformations depends on our overall strategy for data
modeling.

One strategy is to minimize the number of implicit constraints not captured by the
sche[Link] instance, the schema shown in the second figure doesn't have the implicit
constraint on the 'inviter' [Link], fewer implicit constraints means less duplication
of [Link]on in relational databases.

[Link]s have been popularized by "denormalization" techniques such as dimensional
modeling. For instance, we could add a "shortcut" derived edge type called 'latest' from
User to Page to show the last [Link]t of [Link]n the graph must be designed with these
constraints in mind.

Summary
This section introduced set theoretic representations of graph models called graph
universes, [Link], this section showed that two graph universes are equivalent if
there is an invertible graph transformation function between them.
Finally, this section showed that all schema transformation rules presented in the
earlier section can be derived from one metarule that deals with adding and removing
derived types.

Validating graph schemas

The last few sections have discussed how property graph schemas can help design graph
databases from [Link] reading this thread on the Gremlin users group, we realized
that it is easy to validate graphs against schemas with Gremlin and Groovy.

Figure 20: Tinker graph schema
This gist on Github shows how you can take an instance graph and check to see if it is
compatible [Link]x and edge [Link]'s the code to create a schema graph inside a
Gremlin shell for the classic Tinkerpop schema shown here:

sg = new TinkerGraph()
person = [Link]()
[Link]('_label', 'person')
[Link]('name', '[Link]')
[Link]('age', '[Link]')
software = [Link]()
[Link]('_label', 'software')
[Link]('name', '[Link]')
[Link]('lang', '[Link]')
knows = [Link]('knows', person)
[Link]('weight', '[Link]')
created = [Link]('created', software)
[Link]('weight', '[Link]')
[Link]('_minIn', 1) // Someone must create the software

The properties have values corresponding to the Java Class of the property values in the
[Link] '?' to indicate the property is optional. The edges in the schema graph can
have 4 special properties, viz. _minIn, _maxIn, _minOut and _maxOut, to indicate
cardinality restrictions for various edge types.

Any instance graph, g, can be validated against the schema stored in sg, using the
Gremlin script:
[Link]({checkVertex(it,sg)})
You can look at the full Github gist to see how the validation is done.
The current version of Tinkerpop doesn't [Link] the vertex to the vertex type is
specific to the graph, like this:
vertexType = { v, sg -> v.age ?
    sg.V('_label','person').next() : sg.V('_label','software').next() }
Most graph schemas typically have a property named 'type' that would make this mapping
easier.
However with Tinkerpop 3, this method can be standardized to use the label:
vertexType = { v, sg -> sg.V('label', [Link]).next() }
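
The validation idea does not depend on Gremlin. The sketch below restates it in plain Python and is not the code from the gist: the `SCHEMA` layout, the Python types standing in for Java classes, the `min_in` key (playing the role of `_minIn`) and the function names are all assumptions, and optional ('?') properties and the other three cardinality bounds are left out.

```python
SCHEMA = {
    "vertex_types": {
        "person": {"name": str, "age": int},
        "software": {"name": str, "lang": str},
    },
    # (edge label, out vertex label, in vertex label) -> constraints
    "edge_types": {
        ("knows", "person", "person"): {"props": {"weight": float}, "min_in": 0},
        ("created", "person", "software"): {"props": {"weight": float}, "min_in": 1},
    },
}


def check_props(where, props, expected):
    errors = []
    for key, value in props.items():
        if key not in expected:
            errors.append(f"{where}: unexpected property {key!r}")
        elif not isinstance(value, expected[key]):
            errors.append(f"{where}: {key!r} should be {expected[key].__name__}")
    errors += [f"{where}: missing property {k!r}" for k in expected if k not in props]
    return errors


def validate(vertices, edges, schema):
    """vertices: id -> (label, props); edges: list of (label, out_id, in_id, props)."""
    errors = []
    for vid, (label, props) in vertices.items():
        if label not in schema["vertex_types"]:
            errors.append(f"{vid}: unknown label {label!r}")
        else:
            errors += check_props(vid, props, schema["vertex_types"][label])
    incoming = {}
    for label, out_id, in_id, props in edges:
        key = (label, vertices[out_id][0], vertices[in_id][0])
        if key not in schema["edge_types"]:
            errors.append(f"edge {key} is not allowed by the schema")
            continue
        where = f"{out_id}-{label}->{in_id}"
        errors += check_props(where, props, schema["edge_types"][key]["props"])
        incoming[(key, in_id)] = incoming.get((key, in_id), 0) + 1
    for key, rule in schema["edge_types"].items():
        for vid, (label, _) in vertices.items():
            if label == key[2] and incoming.get((key, vid), 0) < rule["min_in"]:
                errors.append(f"{vid}: needs at least {rule['min_in']} incoming {key[0]!r} edge(s)")
    return errors


vertices = {
    "marko": ("person", {"name": "marko", "age": 29}),
    "lop": ("software", {"name": "lop", "lang": "java"}),
}
edges = [("created", "marko", "lop", {"weight": 0.4})]
errors = validate(vertices, edges, SCHEMA)
```

For this instance graph `errors` is empty; dropping the 'created' edge would produce one error, because every software vertex needs at least one creator.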

Pixy: First order logic on graph databases

The previous sections have shown that any ER model can be converted to a property graph
schema, [Link], one key question remains:

Do graph databases offer the same querying capabilities as relational databases?
In other words, [Link] can data stuffed in this fashion be queried effectively?
This is the subject of this section.
Background
On SQL

SQL is the query standard for relational databases. It first appeared in the 1970s and
was [Link] [Link] showed that relational algebra is equivalent to relational
calculus, a form of first order [Link] theorem is the bedrock of SQL's expressive
power.

On first order logic

Using relational algebra, we can write any query of the form "Find all rows from tables
A, B, C, ..., matching some predicate", as long as the predicate can be expressed in
first order logic.
Specifically, the predicate is formed using:

various comparisons on rows and columns,
logical operations "and" (∧), "or" (∨) and "not" (¬), and
the universal "for every" (∀) and existential "there exists" (∃) quantifiers that
operate on rows of a given table.

Let's consider tables named person, [Link] "find me people who own only BMW cars, but
have at least one speeding ticket". The predicate can be written as:
my_query(person) =
(∀ car, person owns car ⇒ [Link] = 'BMW') ∧ (∃ ticket, person has ticket)

On Gremlin
[Link]rks [Link]:

Gremlin Wiki on Github
GremlinDocs
The Pathological Gremlin (presentation)

[Link].g., something like "find the friend of a friend of vertex v"
[Link]('friend').out('friend'). This style of traversal with vertices and edges isn't
natural in SQL with tuples.

The declarative querying style of SQL is, however, different from Gremlin. The
SQL2Gremlin [Link]'t obvious.
Pixy: First order logic with Gremlin

Pixy is a bridge from [Link] [Link] "Find vertices and edges that match some
predicate" where the predicate is formed by

various comparisons on vertex and edge properties,
logical operations "and" (∧), "or" (∨) and "not" (¬), and
the universal "for every" (∀) and existential "there exists" (∃) quantifiers that
operate on vertices and edges.
Pixy queries are expressed using Prolog rules, [Link] as Horn clauses.
Prolog, like SQL, has the full expressive power of first order logic.
Let's take the predicate from the earlier discussion,
my_query(person) =
(∀ car, person owns car ⇒ [Link] = 'BMW') ∧ (∃ ticket, person has ticket)
Let's say that we represent people as vertices with outgoing edge types named 'car' and
'ticket' [Link], we could express the above predicate using Horn clauses as follows:

my_query(Person, Ticket) :- out(Person, 'ticket', Ticket),
    not(not_all_bmw(Person)).
not_all_bmw(Person) :- out(Person, 'car', Car),
    property(Car, 'make', Make),
    Make <> 'BMW'.
[Link] ∃ [Link] [Link] ∀ [Link], saying "every car is a BMW" is the same as saying
"there is no car that isn't a BMW".
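
The equivalence between "every car is a BMW" and "there is no car that isn't a BMW" can be checked with a small Python sketch. The dictionaries `cars` and `tickets`, the people in them and both function names are made up for illustration; they are not part of Pixy.

```python
def only_bmw_and_ticket(person, cars, tickets):
    """Direct form: a universal quantifier plus an existential one."""
    every_car_is_bmw = all(car["make"] == "BMW" for car in cars.get(person, []))
    has_ticket = any(True for _ in tickets.get(person, []))
    return every_car_is_bmw and has_ticket


def only_bmw_and_ticket_negated(person, cars, tickets):
    """Horn-clause style: 'for all' rewritten as 'not exists ... not'."""
    not_all_bmw = any(car["make"] != "BMW" for car in cars.get(person, []))
    return bool(tickets.get(person)) and not not_all_bmw


cars = {
    "ann": [{"make": "BMW"}],
    "bob": [{"make": "BMW"}, {"make": "Audi"}],
    "cy": [{"make": "BMW"}],
}
tickets = {"ann": ["t1"], "bob": ["t2"]}
matches = [p for p in cars if only_bmw_and_ticket(p, cars, tickets)]
```

Note that, as in first order logic, the universal quantifier is vacuously true for a person with no cars, so such a person matches as long as they have a ticket.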

ER models in Pixy

If you use an ER model as a starting point for your design, you can reconstitute the ER
model from [Link] entities named User, Page and Tag and relationships named Owns,
Invites and TaggedAs.
Figure 21: ER model for User-Page-Tag application
This was translated to a graph schema with four types of vertices, namely User, Page,
Tag and Invitation.

Figure 22: Graph schema for User-Page-Tag application
Now, we can reconstitute the ER model from the graph schema using Pixy with the
following clauses:

% Entities
user(User, Name, Login) :- property(User, 'name', Name), property(User, 'login', Login).
page(Page, Uri, Html, CreateTs) :- property(Page, 'uri', Uri), ...
tag(Tag, Hashtag, Description) :- property(Tag, 'hashtag', Hashtag), ...
% Relationships
owns(User, Page) :- out(User, 'owns', Page).
taggedAs(Page, Tag) :- out(Page, 'taggedas', Tag).

invites(Invitation, Inviter, Invitee, Page) :-
    out(Invitation, 'invitee', Invitee),
    out(Invitation, 'inviter', Inviter),
    out(Invitation, 'page', Page).

[Link] vertices, [Link], you get the full power of [Link], any first order
predicate that applies to entities and relationships can be written as a Pixy query that
uses the above clauses.

Let's take an example predicate that matches all users invited to pages tagged
'tinkerpop' [Link]:
tinkerpop_invitee(User, Page) :- invites(_, _, User, Page),
    page(Page, _, _, CreateTs),
    CreateTs > 1388534400L, % Unix timestamp for 1/1/2014
    taggedAs(Page, Tag),
    tag(Tag, 'tinkerpop', _).
Note that '_' is used to represent anonymous variables.
Query requirements don't usually matter while modeling

It isn't surprising that queries in first order logic can be compiled to Gremlin, since
Gremlin is [Link]n an ER model to something that executes "efficiently" on the
corresponding graph database.

By "efficiently", we mean that the Pixy/Gremlin query will always traverse edges to go
from one entity/[Link] typically orders of magnitude faster than index based joins in
relational databases.
Queries on properties will, of course, [Link] as your starting ER model is accurate,
your application will not have to simulate joins using these [Link], the graph schema
design is independent of the query requirements.

Part II
Chapter 8: Introduction to Database

Database Systems evolution: Databases and database technology are vital to modern
organizations, supporting both the daily operations and decision making.
[Link] dominance to the enterprise DBMS marketplace by Oracle, the industry remains
highly competitive with a continued high level of innovation [12].
Figure 1: Evolution of database technology
Major periods of database technology evolution [12]:

1st Generation (1960’s): File oriented – Supported sequential and random searching of
files, but the user was required to write computer programs to [Link] during this
period.

2nd Generation (1970’s): Navigational – Could manage multiple entity types [Link]
[Link] standards.

3rd Generation (1980’s): Relational with non-procedural access – Foundation based on
mathematical relations and associated operators. Optimization technology was
[Link] performed pioneering research to enable commercialization of relational
database technology.

4th Generation (1990’s+): Object oriented – Are extending the boundaries [Link] kinds
of distributed processing and data [Link] [Link].
DBMS marketplace: Despite the dominance of Oracle in the enterprise DBMS marketplace,
with more than 40% overall market share, the industry remains highly [Link], its
competition is Microsoft SQL Server, IBM DB2, Teradata, SAP [Link] source DBMS
products have begun to challenge the commercial DBMS products [Link]-source DBMS is
led by MySQL, followed by MongoDB, [Link]he desktop DBMS market, Microsoft Access
dominates because of the dominance of Microsoft Office.

Figure 2: DBMS marketplace
Innovation in the industry: The advances in DBMS in recent years support business
[Link] technology has been developed to support the needs of Big Data, to be modern
web scale [Link] 2009, the most accepted definition of NoSQL is next generation
databases being non-relational, distributed, open-source and horizontally [Link]-free,
scalability, global availability, easy replication support, simple API, eventually
consistent/BASE (not ACID), and large scale data. [5][19]

Types of DBMS

[Link]-Engines is an initiative that provides information on the popularity of the
DBMS available in [Link], which are updated monthly.
Figure 3: DBMS developed by database model pie chart

The pie chart above represents the categories of DBMS that comprise more [Link], where
[Link]-value stores, with 63 systems, Document stores, with 43 systems, and Graph
DBMS, with 27 systems.
In the overall classification of database models, the following DBMS types are
distinguished.
Types of DBMS:
Relational DBMS                    Graph DBMS
Key-value stores                   Time Series DBMS
Document stores                    RDF stores
Object oriented DBMS (Atkinson)    Native XML DBMS
Search engines                     Content stores
Multivalue DBMS                    Event Stores
Wide column stores                 Navigational DBMS

Above these lines, [Link] instead of counting the systems developed, the database
models are ranked by popularity, [Link] relational DBMS, the 79.5%, followed by
document stores, 7.3%, search engines, 4.3%, key-value stores, 3.5%, wide column stores,
3.1%, and graph DBMS, 1.1%.
The pie chart below represents the most recent popularity rank.
Figure 4: DBMS popularity by database model pie chart

In the pie chart above, it is clear that relational DBMS are the ones used by [Link],
the state of the art is changing by the innovations in the [Link] though the
percentages of popularity of NoSQL databases are minimal compared to relational DBMS,
the fact that they are recent technologies in growth is enough to evaluate them more
deeply.
NoSQL DBMS

Many different NoSQL DBMS have been developed, but they are generally classified in four
types [5]:
Key-value stores: [Link] performing a change in a value, the entire value other than
the key must be [Link], it can limit the complexity of the queries and other advanced
features. [18] Examples: Dynamo, Azure Table Storage, BerkeleyDB

Document Stores: The records stored are called documents, which consist [Link]. [18]
Examples: Elastic, MongoDB, Azure DocumentDB

Wide Column Stores: While RDBMS store all the data in a particular table’s rows together
on-disk, being able to retrieve a particular row fast, column family databases are able
to retrieve a large amount of a specific attribute fast by serializing all the values
of a particular column together on-disk. This approach is useful for aggregate queries.
[18] Examples: Hadoop/HBase, Cassandra, Amazon SimpleDB

Graph Databases: [Link]-ture consist of connections, or edges, [Link] [Link] [Link]
downside is that they generally require all data to fit on one machine, limiting their
scalability. [18] Examples: Neo4J, InfiniteGraph, TITAN

Other types: Multimodel Databases, Object Databases, Grid & Cloud Database Solutions,
XML Databases, Multidimensional Databases, Multivalue Databases, Event Sources, Time
Series/Streaming Databases
(a) Example of Key-Value Store   (b) Example of Document Store

Figure 5: Four main types of NoSQL databases

Consistency Models for NoSQL databases: Before NoSQL, ACID was the [Link] ACID
properties:

Atomicity: All operations in a transaction succeed or every operation is rolled back.
Consistent: On the completion of a transaction, the database is structurally sound.

Isolated: [Link] moderated by the database so that transactions appear to run
sequentially.
Durable: The results of applying a transaction are permanent, even in the presence of
failures.
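
Atomicity is easy to observe with Python's built-in `sqlite3` module. The sketch below is only an illustration: the `account` table, the two accounts and the helper names are invented, and SQLite is used simply because it ships with Python, not because the text discusses it.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


conn = make_db()
ok = transfer(conn, "alice", "bob", 500)   # debit violates the CHECK constraint
after_failed_transfer = balances(conn)
```

The credit to `bob` is executed first and succeeds, yet it is undone when the debit fails, so both balances are unchanged: either every operation in the transaction takes effect or none does.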

However, NoSQL databases break with the topicality of SQL models with ACID [Link] to
adequate better to most NoSQL databases, and they are as follows:

Basic Availability: The database appears to work most of the time.
Soft-state: Stores don’t have to be write-consistent, nor do different replicas have to
be mutually consistent all the time.
Eventual consistency: Stores exhibit consistency at some later point (e.g., lazily at
read time).

ACID transactions can be considered stricter than needed for many NoSQL cases,
[Link], BASE transactions guarantee scale and [Link] BASE model is used by
aggregate stores, such as column family, [Link] contrast, graph databases use the ACID
[Link] databases promise availability of the data at the expense of data consistency
(the consistency of the data is only assured at concrete snapshots). [16] Graph
databases differentiate themselves from other NoSQL databases by focusing more on data
consistency.
The comparison made in the lines above is shown in the table below:

Model   Properties                                              NoSQL DBMS
ACID    Atomicity, Consistent, Isolated, Durable                Graph Databases
BASE    Basic Availability, Soft-state, Eventual consistency    Aggregate stores

Table 1: Comparison of ACID and BASE Consistency Models
Comparison of DBMS

[Link] adoption of this DBMS type is an important factor for choosing it as the main
system [Link], current trends show that the four main types of NoSQL [Link] more
objective point of view of the benefits of using each model, the use cases for which
they perform better and the ones for which they perform the worst are listed below.

Use cases for relational databases

Positive use cases: transaction-oriented databases (banking applications, on-line
reservations), where the concurrency of many transactions must be supported and the
integrity of the data must be protected.

Negative use cases: data warehouses, which are analytically-oriented [Link]
constraints of the relational database wouldn’t support the scalability.

Use cases for key-value stores
Positive use cases:
– Storing user session data
– Maintaining schema-less user profiles
– Storing user preferences
– Storing shopping cart data
Negative use cases:
– Querying the database by a specific data value
– Data with relationships between values
– Operating on multiple unique keys
– Businesses that need to update part of a value frequently
Use cases for document stores
Positive use cases:
– E-commerce platforms
– Content management systems
– Analytics platforms
– Blogging platforms
Negative use cases:
– Running complex search queries
– Applications that require complex multi-operation transactions
Use cases for wide-column stores
Positive use cases:
– Content management systems
– Blogging platforms
– Systems that maintain counters
– Services that have expiring usage
– Systems that require heavy write requests (like log aggregators)
Negative use cases:
– Complex querying
– Query patterns that change frequently
– Projects without an established database requirement
Use cases for graph databases [19]
Positive use cases:
– Fraud detection
– Graph-based search
– Network and IT operations
– Social networks
Negative use cases:
– Data warehouses so big that they require the BASE model
Figure 6: Positions of NoSQL databases (source: Neo4j)

The figure above displays the five types of DBMS that were compared [...]. It can be concluded that each of those DBMS works for some specific use cases, depending on the amount and complexity of the data that is going to be stored. Their use cases do not overlap, which is why all five of them must be considered before implementing a DBMS in a company.

Chapter 9: Graph Databases

Graph databases are databases whose specific purpose is the storage of graph-oriented data structures; therefore, an introduction to graph theory is needed to be consistent when using its terminology.

Concepts of Graph Databases

Positioning: It has previously been explained that NoSQL databases address several issues that relational databases do not: availability for the processing of large datasets, partitioning, flexibility of the schema, and modelling and processing complex structures like trees and graphs. Graph databases are specialized in processing highly connected data, managing complex and flexible data models and improving the performance of complex queries by traversing the graph.

The figures below show the difference in modeling the same use case in a relational database and in a graph database. The graph model is more similar to the business model, which makes it more accessible to non-technical profiles. [8]
(a) Relational Database Model    (b) Graph Database Model
Figure 7: Model Comparison

A graph is a pictorial representation of objects which are connected by some pairs of links. A graph contains two elements: nodes (vertices) and relationships (edges).

What is a Graph Database
A graph database is a database used to model data in the form of a graph. It stores any kind of data using:

Nodes
Relationships
Properties

Nodes: Nodes are the records/data in graph databases. Data is stored as properties, and properties are simple name/value pairs. [...] Storing data in Neo4j is similar to adding more records in other databases.

Relationships: Relationships are used to connect nodes. They specify how the nodes are related.

Relationships always have a direction.
Relationships always have a type.
Relationships form patterns of data.

Properties: Properties are named data values.
Popular Graph Databases
Neo4j is the most popular graph database. Other graph databases are:

Oracle NoSQL Database
OrientDB
HyperGraphDB
GraphBase
InfiniteGraph
AllegroGraph, etc.
Why GraphDB

Graph databases are very useful nowadays because in graph databases data exists in the form of relationships between different objects. The relationship between data is more valuable than the data itself.

Relational databases store highly structured data, with many records storing the same type of data, so they can be used to store structured data, but they do not store the relationships between the data. Graph databases, in contrast, store relationships and connections as first-class entities.

The data model for graph databases is simple compared to other databases, and they can be used with OLTP systems. They provide features like transactional integrity and operational availability.

GraphDB vs NoSQL Database
The following points explain why a graph DB is better than other NoSQL databases for connected data:

Most NoSQL databases are aggregate stores (key-value, column-family and document stores), so it is difficult to use them for connected data and graphs.
One well-known strategy for adding relationships to such stores is to embed an aggregate's identifier inside the field belonging to another aggregate, effectively introducing foreign keys.

But this requires joining aggregates at the application level, which quickly becomes prohibitively expensive.
See the use cases of the different types of databases:
Relational database: Data is represented in tabular form, so it is best for calculating income.
Key-value store: It is best for building a shopping cart.
Document store: Data is stored as a document, so it is best for storing structured product information.
GraphDB: It is best for describing how a user got from point A to point B.
Neo4j Data Model
The Neo4j database follows the Property Graph Model for storing and managing its data. Neo4j is a graph database which has the following features of the Property Graph Model:

The graph model contains nodes, relationships and properties, which specify data and its operation.
Properties are key-value pairs.
Nodes are represented using circles and relationships are represented using arrows.
A relationship specifies the relation between two nodes.
There are two types of relationships between nodes according to their direction: unidirectional and bidirectional.
Each relationship connects two nodes: a "Start Node" (or "From Node") and an "End Node" (or "To Node").
Both nodes and relationships contain properties.
Relationships should be directional. If we try to create a relationship without a direction, it will throw an error message.

There are three main building blocks of a graph DB data model:

Nodes
Relationships
Properties
Following is a simple example of a property graph.

Figure 8: Simple Graph

Here, we have represented nodes using circles and relationships using arrows.
Relationships are directional. We can represent a node's data in terms of properties (key-value pairs). In this example, we have represented each node's Id property within the node's circle.
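The three building blocks described above can be sketched as plain data structures. This is only an illustrative model, not how Neo4j stores data internally; the class and field names are hypothetical, and the sample values are borrowed from the Cypher examples later in this book.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Node:
    """A node with optional labels and key-value properties."""
    node_id: int
    labels: List[str] = field(default_factory=list)
    properties: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Relationship:
    """A typed, directed connection from a start node to an end node."""
    rel_type: str
    start: Node  # the "From Node"
    end: Node    # the "To Node"
    properties: Dict[str, Any] = field(default_factory=dict)

ajit = Node(1, ["Person"], {"name": "Ajit Singh"})
anna = Node(2, ["Person"], {"name": "Anna Turu Pi"})
friendship = Relationship("FRIENDS_WITH", ajit, anna, {"since": "17/09/2017"})
```

Both nodes and relationships carry their own property dictionaries, and a relationship always records which node it starts from and which it ends at.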

Query performance

Graph databases' competitive advantage: It has been said that graph databases have a reason to exist because they outperform [...]. The use case that is best suited for graph databases is "find all entities of a kind" ([...]). The execution of such a query starts with an index lookup to find the starting node(s), from which the graph is traversed [...]. Moreover, the bigger the volume of data, the more it outperforms relational databases.
Figure 9: Query execution in graph databases

[...] querying through different tables, following foreign keys and other indexes, and it would [...] performed by following physical pointers, while foreign keys are logical pointers. [8] The query in the figure includes the time of each index lookup. [...], the larger the execution time will become.
Figure 10: Query execution in relational databases
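The difference between following stored pointers and looking rows up in a table can be sketched in a few lines. This is a toy model under simplifying assumptions (an unindexed edge table versus an in-memory adjacency list); the data and function names are hypothetical and it says nothing about real engine internals.

```python
from collections import defaultdict

# Hypothetical friendship data, stored as rows of an edge table
edge_table = [("ann", "bob"), ("bob", "cid"), ("cid", "dee"), ("ann", "eve")]

# Graph-style storage: each node keeps direct references to its neighbours
adjacency = defaultdict(list)
for src, dst in edge_table:
    adjacency[src].append(dst)

def neighbours_by_scan(node):
    """Table-style lookup without an index: every row is inspected."""
    return [dst for src, dst in edge_table if src == node]

def neighbours_by_pointer(node):
    """Graph-style lookup: follow the stored references of this node only."""
    return adjacency.get(node, [])

def friends_of_friends(node, neighbours):
    """Two-hop traversal using whichever neighbour lookup is supplied."""
    return sorted({fof for f in neighbours(node) for fof in neighbours(f)})
```

Both lookups return the same answer, but the scan touches the whole table on every hop, while the pointer version only touches the neighbours of the current node, which is why its cost does not grow with the total data volume.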

Relational databases' competitive advantage: On the other hand, because of the internal structure of the tables, relational databases would outperform graph databases when the output requires all the attributes of a table (findAll-like queries). Their ideal use case is to aggregate over a complete dataset. [8]

Graph databases ranking: The figure below shows the DB-Engines Ranking of graph DBMS. Neo4j leads the ranking, and its score triples that of the next DBMS, Microsoft Azure Cosmos DB. Neo4j has been leading the graph database sector for some years [...]. It should be taken into account that the score is displayed on a logarithmic scale; therefore the difference in popularity is really significant.

It can also be seen in the trend scatter plot that Microsoft Azure Cosmos DB appeared in the graph database landscape in 2014, and since then its rise in [...] well integrated in the software marketplace.
Success factor: It has been stated, when comparing the NoSQL DBMS, that graph [...]. Hence, it is a competitive advantage to work [...] they accomplished so, Neo4j seems to be the DBMS that is most successfully improving graph partitioning. [8]

Figure 11: Graph DBMS Ranking
Figure 12: Trend of graph DBMS popularity (scatter plot)

Chapter 10: Neo4j
Necessity of Neo4j

Why Neo4j? By using a graph database like Neo4j, which focuses on data relationships, [...]. With today's growing business demands and competitive atmosphere, using the right tool is very important, and when it comes to widely connected data Neo4j is a strong choice because it can be orders of magnitude faster than traditional databases for such data. Neo4j analyzes and traverses all the data in real time and gives results very fast. Neo4j is widely used by lots of big companies like eBay, Walmart, Cisco, UBS and many more.

What is Neo4j? Neo4j is an open-source NoSQL graph database written in Java and [...]. Neo4j is currently the world's leading graph [...]. First of all, Neo4j provides ACID transaction compliance, cluster support, runtime failover, high availability and high-speed querying [...] interface, and it is easy to learn because there are lots of free online resources on the [...]. Neo4j is designed for linking relationships and it handles these relationships with speed, ease, and extreme flexibility. With Neo4j, models can easily be converted to database [...] is needed for the data, then Neo4j is the solution.

Neo4j Versions

Version              Release Date
Neo4j Version 1.0    February 2010
Neo4j Version 2.0    December 2013
Neo4j Version 3.0    April 2016

Graph databases use a relationship-first approach to storing and querying your data. They store data in a much more logical fashion, a way that represents the real world and prioritizes the representation, discoverability and maintainability of data [...] relationship, so the ACID property was brought back to at least one NoSQL database, called [...] business data.

Graph databases give developers a more intuitive data model, faster queries and better agility to adapt to changes in the business.
Figure 13: Neo4j as a Leading Graph Database

How is Neo4j different from traditional databases? Graph databases are much [...] rows and columns, graph databases use a graph with nodes and relationships. [...] relationships in a relational database, it can get very complicated, with join tables and join queries, and we need all kinds of primary and foreign keys. It can be really hard to deal with and, even worse, it can be really costly on the system, so graph databases are built to fix that problem and to work with data that is much more closely related and more dynamic.
Thus, because of the reasons stated above, we chose Neo4j as our database.
Figure 14: eBay's comment about Neo4j
Neo4j Working
Neo4j stores and displays data in the form of a graph. In Neo4j, data is represented by nodes and relationships between those nodes.

Neo4j databases (as with any graph database) are a lot different from relational databases such as MS Access, SQL Server, MySQL, etc. Relational databases use tables, rows, and columns to store data. They also present data in a tabular fashion.

Neo4j doesn't use tables, rows, or columns to store or present data.
Neo4j is best for storing data that has many interconnecting relationships. That is why graph databases like Neo4j have an advantage and are much better at dealing with relational data than relational databases are.

The graph model doesn't usually require a predefined schema. You don't need to create the database structure before you load the data (like you do in a relational database). In Neo4j, the data is the structure. Neo4j is a "schema-optional" DBMS.
In Neo4j, there is no need to set up primary key/foreign key constraints to predetermine which fields can have a relationship, and to which data. You just define the relationships between the nodes you need.

Features of Neo4j Graph Database

It provides a simple SQL-like query language, Neo4j CQL
It supports indexes by using Apache Lucene
It contains a UI to execute CQL commands, i.e., the Neo4j Data Browser
It supports UNIQUE constraints
It supports full ACID properties
It uses native graph storage with a native GPE (Graph Processing Engine)
It follows the Property Graph Data Model
It provides a REST API that can be accessed from any programming language or framework, such as Java, Spring, Scala and so forth
It supports exporting query data to JSON and XLS format
Advantages of Neo4j
Properties of Neo4j

Figure 15: General Look at Neo4j
Following are the properties of Neo4j:
Data model (flexible schema): Neo4j follows the property graph model. It can be explained like this: a graph has nodes, and these nodes are connected with each other by relationships. Nodes and relationships store data in key-value pairs known as properties. Neo4j also has a flexible schema, which means properties can be added or removed when necessary.

ACID properties: Neo4j supports full ACID (Atomicity, Consistency, Isolation, and Durability) rules.

Scalability and reliability: The database can be scaled by increasing the number of reads/writes and the volume without affecting the query processing speed and data integrity. Neo4j also provides support for replication for data safety and reliability.

The traversal of the graph: Traversal is the operation of visiting a set of nodes in the graph by moving between nodes connected with relationships. [...] Using a traversal only takes into account the data that is required; therefore it is not necessary to query the entire dataset in an expensive operation, as is the case with join operations on relational data. [1]

Cypher Query Language: Neo4j provides a powerful declarative query language known as Cypher. It is easy to learn and can be used to create and retrieve relations between data without using complex queries like joins. [9]

Built-in web application: Neo4j provides a built-in Neo4j Browser web application. With it, graph data can be created and queried.
Drivers: Neo4j can work with:
A REST API, to work with programming languages and frameworks such as Java, Spring, Scala, etc.
JavaScript, to work with UI MVC frameworks such as Node JS.
Two kinds of Java API, the Cypher API and the native Java API, to develop Java applications.
Indexing: Neo4j supports indexes by using Apache Lucene.
Advantages of Neo4j Graph Database

[...] complex data connections as a result of the increased volume and strength in the data [...]. The following are the advantages of Neo4j.

Easy to represent connected data: It makes it both easy and fast to traverse or navigate large amounts of data that have some sort of relationship.
Can represent semi-structured data easily: Data that does not fall into a natural structure can be easily represented in a graph database.

Cypher commands: Cypher commands are human-readable and very easy to learn.
Simple and powerful data model: The property graph data model is [...] and they can contain data in the form of key-value pairs or properties, unlike the relational model.

Join aspect: There is no need for complex and costly joins to retrieve connected or related [...]. [...] or traversing a graph involves following those paths, and because of the path-oriented nature of the graph data model, the majority of path-based operations are extremely efficient.

Performance: Traversing a relationship is done in constant time, so query performance does not decrease when data grows, and Cypher is designed for graphs, so it is very simple to write graph traversals based on pattern matching.

Neo4j is the only graph database that combines native graph storage, a scalable architecture optimized for speed, and ACID compliance to ensure predictability of relationship-based queries. [10]

Real-time insights: Neo4j provides results based on real-time data.
High availability: Neo4j is highly available for large enterprise real-time applications with transactional guarantees. [15]
Biggest graph community in the world: Neo4j has the largest and most active community of graph contributors.
Easy to learn: Mature UI with intuitive interaction and built-in learning. [10]
Performance in Neo4j
Neo4j provides a fast and efficient graph experience, and the strongest part of it is: Neo4j [...]. Increasing data size does not affect the performance of Neo4j, unlike relational databases.

Volker Pacher, eBay developer and Neo4j client: "Our Neo4j solution is literally a thousand times faster than the previous MySQL solution, with searches that require between 10 and 100 times less code."

Figure 16: Query times for Oracle Exadata vs Neo4j
Figure 17: TomTom's comparison of Neo4j with MySQL
How to Increase the Performance of Neo4j?
Increase the size of the available heap memory (between 8 GB and 16 GB).
Increase the open file limit from the default 1024 to at least 40000.
To avoid costly disk access, make sure the relevant graph data is cached in memory.
Reserve sufficient memory (at least 16 GB) for the non-Neo4j tasks running on the computer.
Prefer simple algorithms, which lead to increased performance.
Keep all related nodes and edges in server memory before returning results.
Keep traversals independent.
Use indexes.
What can Neo4j be used for?

Neo4j is highly suitable for storing data that has many interconnecting relationships. In fact, graph databases like Neo4j are much better at dealing with relational data than relational databases are.
This is in part due to the fact that the graph model doesn't usually require a predefined schema. You don't need to create the database structure before you load the data (like you do in a relational database). In Neo4j, the data is the structure. Neo4j is a "schema-optional" DBMS.

But the main reason Neo4j is better for relational data is the way it allows you to create relationships. Neo4j is built around relationships. There is no need to set up primary key/foreign key constraints to predetermine which fields can have a relationship, and to which data. With Neo4j, you just add any relationship between any nodes whenever you need.

This makes Neo4j extremely well suited for social networking applications like Facebook, Twitter, etc. Neo4j can be used for:

● Social networks
● Real-time product recommendations
● Network diagrams
● Fraud detection
● Access management
● Graph-based search of digital assets
● Master data management

Cypher Query Language

Cypher is a declarative language for working with graphs and graph data, for both [...]. Cypher defines patterns in the given graph data.

Cypher is a declarative language: this means that we specify the data that we are looking for, not how to retrieve it. Cypher is a very human-readable language, and it is accessible not just to developers; everyone can easily learn and use it.

Cypher has expressions similar to SQL, like WHERE and ORDER BY, and simple comparison operators like <, =, >. Its difference from SQL is that Cypher is designed to represent graph data patterns; for example, it has the MATCH clause, which is built on finding and specifying patterns in the data.
Structure
Nodes: Nodes represent data entities. They can have labels and properties, both of which are optional. In Cypher, nodes are shown with parentheses, like (p:Product).

Figure 18: Node Representation

Relationships: In Cypher, between the nodes we have lines which represent the relationships. A relationship is shown as an arrow, -->, between two nodes.

Operations in Cypher
Create: It is used to create nodes and relationships between them.
We created a node representing us with five properties:
Name: 'Ajit Singh'
Country: 'India'
City: 'Patna'
DateOfBirth: '21.05.1984'
School: 'PWC'
With this Cypher code:

CREATE (n:Person {name: 'Ajit Singh', country: 'India', city: 'Patna', DateOfBirth: '21.05.1984', School: 'PWC'}) RETURN n

We created a second node with these properties:
Name: 'Anna Turu Pi'
Country: 'Spain'
City: 'Barcelona'
DateOfBirth: '30.07.1995'
School: 'PWC'
With this Cypher code:

CREATE (n:Person {name: 'Anna Turu Pi', country: 'Spain', city: 'Barcelona', DateOfBirth: '30.07.1995', School: 'PWC'}) RETURN n

We created a relationship called "FRIENDS_WITH" with the property "SINCE", with this Cypher code:

MATCH (a:Person), (b:Person) WHERE a.name = 'Ajit Singh' AND b.name = 'Anna Turu Pi' CREATE (a)-[r:FRIENDS_WITH {SINCE: "17/09/2017"}]->(b) RETURN r
(a) Result in Console    (b) After Creating Relationship
Figure 19: Create Relationship Between Two Nodes
Match: MATCH finds specified patterns in the data.
Figure 20: Relationships
With this Cypher code we showed all the people whom Esteban Zimányi teaches:
MATCH (a:Person)<-[:TEACHES_TO]-(b:Person {name: 'Esteban Zimányi'}) RETURN a.name
Set: SET is used to update properties on nodes and relationships.
With this Cypher code we changed Esteban Zimányi's date of birth to '01.01.1966':
MATCH (n {name: 'Esteban Zimányi'}) SET n.DateOfBirth = '01.01.1966' RETURN n
Delete: This operator deletes nodes or relationships in the data.
With this Cypher code we deleted Ajit Singh. DETACH is needed because the node still has the FRIENDS_WITH relationship created above, and a node with relationships cannot be removed by a plain DELETE:
MATCH (n:Person {name: 'Ajit Singh'}) DETACH DELETE n
Loading Data With Cypher

There are lots of ways to import data into Neo4j, but the most common way is to upload it as a CSV file. The LOAD CSV operator is built into Neo4j, and this operator is used for small [...] more than 10 million records, then we should use [USING PERIODIC COMMIT [n]] [...] one run and creating everything in one transaction.

LOAD CSV: This operator is used for importing CSV files into Neo4j.
Figure 21: LOAD CSV Operator Structure
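The idea behind committing every n rows, instead of creating everything in one transaction, can be sketched on the client side. The following is a minimal illustration only: it splits CSV rows into batches and leaves out the step that would send each batch to the database. The function name and the sample rows are hypothetical.

```python
import csv
import io
from itertools import islice

def batched_rows(csv_text, batch_size):
    """Yield lists of at most `batch_size` row dicts parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        # In a real import, each batch would be written in its own transaction
        yield batch

# Hypothetical sample file content with a header row
sample = "name,country\nAjit,India\nAnna,Spain\nCarol,Canada\n"
batches = list(batched_rows(sample, batch_size=2))
```

With a batch size of 2, the three sample rows are split into one batch of two rows and one batch of one row, so no single transaction has to hold the whole file.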
Use Cases of Neo4j

Figure 22: Use Cases of Neo4j
The common use cases are:

Real-Time Recommendations: Recommendation algorithms find relationships between people, products and other related services based on the user's previous behavior. Neo4j is able to store interconnected data about customers and products, and since Neo4j doesn't need an index lookup at every suggestion, it provides very fast and effective [...]. [...] uses Neo4j for this purpose.
Master Data Management: In large organizations, different systems store information about customers, employees, [...]. With a graph model it is easy to bring data from different systems together, create views about customers or keep track of all the information about the organizational system itself. Cisco uses Neo4j for this purpose, and the company also uses Neo4j for its help desk solution.
Figure 23: Master Data Management Graph Design

Fraud Detection: [...] Nowadays, in order not to be detected by banks' fraud algorithms, people use different approaches, like opening several bank accounts with valid information and doing normal [...]. [...] detect that behavior, but it is very easy to see it with a graph, because people opening bank accounts using the same identity token can be easily detected as a pattern in a graph.

Graph-Based Search: Metadata is available for things like products, articles, etc. Being able to model metadata as a graph allows us to enhance search meaning [...]. When a search is executed, we don't see random or alphabetically sorted results; we first see the relevant ones. Lufthansa uses Neo4j for this purpose.

Network & IT Operations: If a data center is modelled as a graph, then dependency analysis can easily be applied to network systems to reach conclusions such as how many applications will be affected if one virtual machine goes down. HP uses Neo4j to model their network for some large telecommunication providers.

Figure 24: Network IT Operations Graph Design
Identity & Access Management: Within large organizations there are hundreds of users, and controlling who can access which information is crucial for [...]. [...] handled by Neo4j. UPC London uses Neo4j for that, and it received a 2014 Graphie award for "Best identity and access management app".
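The dependency analysis mentioned under Network & IT Operations (which applications are affected if one virtual machine goes down) amounts to a reachability search over "depends on" relationships. The sketch below assumes a tiny, made-up set of dependencies; all component names and the function name are hypothetical.

```python
from collections import deque

# Hypothetical dependency edges: (component, the thing it depends on)
DEPENDS_ON = [
    ("app1", "vm1"),
    ("app2", "vm1"),
    ("app3", "vm2"),
    ("portal", "app1"),
]

def affected_by(failed, depends_on):
    """Return every component that directly or indirectly depends on `failed`."""
    dependants = {}
    for component, dependency in depends_on:
        dependants.setdefault(dependency, []).append(component)
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for component in dependants.get(current, []):
            if component not in seen:
                seen.add(component)
                queue.append(component)
    return seen
```

Here a failure of vm1 reaches app1 and app2 directly and portal indirectly, while a failure of vm2 only reaches app3. In a graph database the same question is a variable-length traversal over the dependency relationships.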

Chapter 11: Getting started with Neo4j
Requirements
[...] requires:
JDK Version 8 and above.
Neo4j Graph Database 3.1 and above.
Spring Framework {springVersion} and above.
If you plan on altering the version of the Neo4j-OGM, make sure it is a 3.0.0+ release.
Download Neo4j
First download Neo4j from its official website.
You can choose either a free Enterprise trial or the Community Edition. Here, we are using the Community Edition.
Run the downloaded file and follow the instructions given below:
Start Neo4j:
Start the Server

Click on the installed Neo4j Community Edition.
Initialization started:
[...]
Open a browser and go to http://localhost:7474 (the default address of the Neo4j Browser).

Start Neo4j web server

Visit the sub-directory /bin of the extracted folder and execute in a terminal: ./neo4j start
Then open the Neo4j Browser address in your web browser.

The first time only, you will have to sign in with the default account and change the default password. As of community version 3.0.3, the default username and password are neo4j and neo4j.

You can now enter Neo4j queries in the console provided in your web browser and visually investigate the results of each query.
Set up a new database

Each Neo4j server (in the community edition) can currently host a single Neo4j database, so in order to set up a new database:
Visit the sub-directory /bin and execute ./neo4j stop to stop the server.

Visit the sub-directory /conf and edit the file neo4j.conf, changing the value of the parameter dbms.active_database to the name of the new database that you want to create.

Visit the sub-directory /bin again and execute ./neo4j start
[...] visit the Neo4j Browser address again.
The created database is located in the sub-directory /data/databases, under a folder with the name specified in the parameter dbms.active_database.

Delete one of the databases

Make sure the Neo4j server is not running: go to the sub-directory /bin and execute ./neo4j status. If the output message shows that the server is running, also execute ./neo4j stop.
Then go to the sub-directory /data/databases and delete the folder of the database you want to remove.
Cypher Query Language

This is Cypher, Neo4j's query language. [...] Cypher is similar to SQL if you are familiar with it, except that SQL refers to items stored in a table while Cypher refers to items stored in a graph.

First, we should start out by learning how to create a graph and add relationships, since that is essentially what Neo4j is all about.
CREATE (ab:Object {age: 30, destination: "England", weight: 99})
You use CREATE to create data.
To indicate a node, you use parentheses: ()

The ab:Object part can be broken down as follows: a variable 'ab' and a label 'Object'. The variable name can be anything, but you have to be consistent within a Cypher query.
To add properties to the node, use braces: {}
Next, we will learn about finding MATCHes:
MATCH (abc:Object) WHERE abc.destination = "England" RETURN abc;

MATCH specifies that you want to search for a certain node/relationship pattern. (abc:Object) refers to one node pattern (with the label Object), and the matches are stored in the variable abc. The whole line can be read as follows:

abc = find the matches that are an Object WHERE the destination is England.

In this case, WHERE adds a constraint, which is that the destination must be England. A RETURN is needed at the end (Neo4j will not accept just a MATCH; your query must always return some value [this also depends on what type of query you are writing; we will talk more about this later as we introduce the other types of queries you can make]).

The next line will be explained in the future, after we go over some more [...] can do with this language! Below, you will find an example which gets the cast of movies whose title starts with 'T':

MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)
WHERE movie.title STARTS WITH "T"
RETURN movie.title AS title, collect(actor.name) AS cast
ORDER BY title ASC LIMIT 10;
A complete list of commands and their syntax can be found in the official Neo4j Cypher Reference Card.
RDBMS vs Graph Database

RDBMS               Graph Database
Tables              Graphs
Rows                Nodes
Columns and data    Properties and their values
Constraints         Relationships
Joins               Traversal

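Cypher
Introduction
Cypher is the query language used by Neo4j. You can use Cypher to perform tasks and matches against a Neo4j graph.
Cypher is "inspired by SQL" and is designed to be intuitive in the way you describe the relationships, i.e., typically the drawing of the pattern will look similar to the Cypher representation of the pattern.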

Examples
Creation

Create a node
CREATE (neo:Company) // create node with label 'Company'
CREATE (neo:Company {name: 'Neo4j', hq: 'San Mateo'}) // create node with properties

Create a relationship
CREATE (beginning_node)-[:edge_name {attribute_one: 1, attribute_two: 'two'}]->(ending_node)

Query Templates
Running Neo4j locally, in the browser GUI you can run the following command to get a palette of queries.
:play query template
This helps you get started creating and merging nodes and relationships by typing queries.
Create an Edge
CREATE (beginning_node)-[:edge_name {attribute_one: 1, attribute_two: 'two'}]->(ending_node)

Delete all nodes
MATCH (n)
DETACH DELETE n
DETACH doesn't work in older versions (less than 2.3); for previous versions use
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n, r

Delete all nodes of a specific label
MATCH (n:Book)
DELETE n
Match (capture group) and link matched nodes
MATCH (node_name:node_type {}), (node_name_two:node_type_two {})
CREATE (node_name)-[:edge_name {}]->(node_name_two)
Update a Node
MATCH (n)
WHERE n.some_attribute = "some identifier"
SET n.other_attribute = "a new value"
Delete All Orphan Nodes
Orphan nodes/vertices are those lacking all relationships/edges.
MATCH (n)
WHERE NOT (n)--()
DELETE n
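Python & Neo4j
Examples

Install neo4jrestclient
pip install neo4jrestclient
Connect to Neo4j
from neo4jrestclient.client import GraphDatabase
# Assumes a local server on the default port; pass username=... and password=... if authentication is enabled
db = GraphDatabase("http://localhost:7474")
Create some nodes with labels
user = db.labels.create("User")
u1 = db.nodes.create(name="user1")
user.add(u1)
u2 = db.nodes.create(name="user2")
user.add(u2)

You can associate a label with many nodes in one go
Language = db.labels.create("Language")
b1 = db.nodes.create(name="C++")
b2 = db.nodes.create(name="Python")
Language.add(b1, b2)
Create relationships
u1.relationships.create("likes", b1)
u1.relationships.create("likes", b2)
u2.relationships.create("likes", b1)

Bi-directional relationships
u1.relationships.create("friends", u2)
Match using neo4jrestclient
from neo4jrestclient import client
# Labels are case-sensitive, so the query uses "Language" as created above.
# It assumes that a User node named "Marco" with "likes" relationships exists.
q = 'MATCH (u:User)-[r:likes]->(m:Language) WHERE u.name="Marco" RETURN u, type(r), m'

# "db" as defined above
results = db.query(q, returns=(client.Node, str, client.Node))

# Print results
for r in results:
    print("(%s)-[%s]->(%s)" % (r[0]["name"], r[1], r[2]["name"]))

Output:
(Marco)-[likes]->(C++)
(Marco)-[likes]->(Python)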

Chapter 12: Neo4j Application
Software: For the graph database, Neo4j Community Edition 3.2.5 has been used, and for the relational database, SQL Server 2017.
Use Case Selected

As proposed in graph database benchmark guidelines [4], the best tests to benchmark a graph database are: traversal (which includes the calculation of the shortest path), graph analysis, connected components, communities, centrality measures, [...] among the domains where graph databases prove to be more beneficial are the [...] implementation, we are going to model flight routes, as they have the ideal [...] where the information lies in their intercommunications.

Data

The dataset selected to perform the benchmark was a dataset of flight routes provided by [...] [13]. It provided three flat files [...].

Because of size concerns, we created synthetic data in addition to our existing data tables. Before creating the new data we had 67663 different routes, and now we have 1193413 [...], they do not have any [...] have, the more accurate the benchmark [...], adding more data to Neo4j does not affect its performance.
Implementing Data

Figure 25: [...]
Neo4j: [...] py2neo library to access the Neo4j database, and it reads our data (external source) to create nodes, relationships, properties and indexes.
Figure 26: Structure of the Python code

[...] better visualization, we created a function that calculates the distance between two connected airports. Route data has source_airport and destination_airport, so we created a route node and assigned the distance between source_airport and destination_airport as a name attribute of the route node. In the end, the three types of nodes are Airlines, Airports and Routes, and they have the following communications:

Route -> TO -> Airport
Route -> FROM -> Airport
Route -> OF -> Airline

Table 2: Graph database schema
We implemented our data in Neo4j with this schema:

Figure 27: Initial Schema
Figure 28: Example of a query in Neo4j
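The text mentions a function that calculates the distance between two connected airports but does not show it. One common way to do this, assumed here purely for illustration, is the haversine great-circle formula applied to the airports' latitude and longitude. The function name, the parameters and the Earth radius constant are assumptions and not the authors' code.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```

The result could then be stored as a property on each Route node. The formula treats the Earth as a sphere, so it is an approximation of the real flight distance.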
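SQL: A relational database was created by importing each flat file as a table, and then we created foreign key references between the tables.
Export data

To export the Neo4j database, the APOC export procedures are used. To enable exporting to files, this line has to be added to neo4j.conf: apoc.export.file.enabled=true.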

Export to CSV

apoc.export.csv.query(query, file, config): exports results from the Cypher statement as CSV to the provided file
apoc.export.csv.all(file, config): exports whole database as CSV to the provided file
apoc.export.csv.data(nodes, rels, file, config): exports given nodes and relationships as CSV to the provided file
apoc.export.csv.graph(graph, file, config): exports given graph object as CSV to the provided file
We exported the entire database by executing the following command in Cypher:
CALL apoc.export.csv.all("/temp/neo4j_database_csv_file.csv", {batchSize: 10}) YIELD file, source, format, nodes, relationships, properties, time, rows

Export to Cypher script

apoc.export.cypher.all(file, config): exports whole database incl. indexes as Cypher statements to the provided file
apoc.export.cypher.data(nodes, rels, file, config): exports given nodes and relationships incl. indexes as Cypher statements to the provided file
apoc.export.cypher.graph(graph, file, config): exports given graph object incl. indexes as Cypher statements to the provided file
apoc.export.cypher.query(query, file, config): exports nodes and relationships from the Cypher statement incl. indexes as Cypher statements to the provided file
apoc.export.cypher.schema(file, config): exports all schema indexes and constraints to Cypher

The database was also exported to a Cypher script:

CALL apoc.export.cypher.all("/temp/neo4j_database_cypher_file.cypher", {batchSize: 10}) YIELD file, source, format, nodes, relationships, properties, time, rows
Figure 29: Exporting Neo4j database to Cypher script
Query Examples (Neo4j-SQL)
Figure 30: Algorithms for graph databases

Add libraries: It has been commented that Neo4j includes graph algorithms that allow us to perform queries that would be impossible to perform in SQL. Libraries of algorithms can be downloaded and added to Neo4j as plugins.
Figure 31: Add jar files in plugin folder

[...], this line of code has to be added to neo4j.conf: dbms.security.procedures.unrestricted=apoc.* (e.g., for the apoc library).

After that, Neo4j needs to be restarted, and it can be verified that the plugin is working by writing the following command in the Neo4j Browser:
CALL dbms.procedures() YIELD name, signature, description
WHERE name STARTS WITH "apoc"
RETURN name, signature, description
Shortest Path

This algorithm is the one that best justifies the existence of graph databases. [...] of layers the route has.
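The idea of a shortest path, measured in number of hops, can be sketched with a breadth-first search over an adjacency list. This is an illustrative sketch and not Neo4j's implementation; the airport codes and the connections between them are made up for the example.

```python
from collections import deque

def shortest_path(adjacency, source, target):
    """Return one fewest-hops path from source to target, or None if unreachable."""
    previous = {source: None}  # node -> node it was first reached from
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in adjacency.get(current, ()):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None

# Hypothetical direct connections between airports
flights = {"MAD": ["FRA", "LHR"], "FRA": ["ICN"], "LHR": ["FRA"], "ICN": []}
```

Because breadth-first search visits nodes in order of increasing distance, the first time the target is reached the path found uses the fewest possible hops.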
First query example: find the shortest path to go from an airport in Madrid to an airport in Seoul.
MATCH p = shortestPath((src:Airport {city: 'Madrid'})-[r:FROM|TO*..15]-(dest:Airport {city: 'Seoul'})) RETURN p

Figure 32: Shortest path query from Madrid to Seoul
Figure 33: Pipeline of the shortest path query
The nodes can be expanded, and we see the airline to which each route belongs.
Figure 34: Expanded shortest path query
Second query example: find the shortest path between an airport in Seoul and an airport in Antwerp.
MATCH p = shortestPath((src:Airport {city: 'Seoul'})-[r:FROM|TO*..15]-(dest:Airport {city: 'Antwerp'})) RETURN p
Figure 35: Shortest path query from Seoul to Antwerp

Paying attention to the relationships, it can be seen that the query doesn't output a physically possible traveling route from [...] query, one of the paths ends up in Seoul, but the other has two sources, Madrid and Seoul, [...], one in Antwerp and two in Seoul, and all the routes finish in Geneva.

The purpose of the algorithm is to find the shortest path to connect two nodes, independently of the physical meaning, but real routes can be created with the following modification:

Persistent inferred relationships: For each route going from one airport to another, [...], the [...] to find physically possible paths between two airports (e.g., not stepping into an airline), looking for that inferred relationship ensures that airports are being connected to airports.

[...], and is propo[...] shortest path queries and community detection queries.

Cypher code to create the relationship:
MATCH (ap1:Airport)<-[:FROM]-(r:Route)-[:TO]->(ap2:Airport)
WHERE id(ap1) <> id(ap2)
WITH ap1, ap2, COUNT(*) AS weight
CREATE (ap1)-[c:CONNECTED]->(ap2)
SET c.weight = weight
In the figure below, the database schema after adding the inferred relationship is displayed:
Figure 36: Neo4j DB schema after adding CONNECTED relationships
Cypher code to delete the relationship:
MATCH (ap1:Airport)-[r:CONNECTED]->(ap2:Airport) DELETE r
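What the CONNECTED query computes, one weighted airport-to-airport edge per ordered pair with the weight equal to the number of routes between them, can be mirrored in a few lines. The route records below are made up, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical Route records: (source_airport, destination_airport, airline)
routes = [
    ("MAD", "FRA", "LH"),
    ("MAD", "FRA", "IB"),
    ("FRA", "ICN", "LH"),
    ("MAD", "MAD", "XX"),  # same source and destination, ignored below
]

def connected_weights(route_records):
    """Count routes per ordered airport pair, skipping routes that start and end at the same airport."""
    return Counter(
        (src, dst) for src, dst, _airline in route_records if src != dst
    )

weights = connected_weights(routes)
```

Each key of the resulting counter corresponds to one CONNECTED relationship, and its count plays the role of the weight property set by the Cypher statement above.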

[...] detection queries.
Cypher code to create the relationship:

MATCH (ap1:Airport)<-[:FROM]-(r:Route)-[:TO]->(ap2:Airport)
WHERE id(ap1) <> id(ap2)
WITH ap1, ap2, r
MATCH (r)-[:OF]->(al:Airline)
CREATE (ap1)-[g:GOINGTO]->(ap2)
[Link]=[Link]
[Link]=id(r)
[Link]=[Link]
In the figure below, the database schema after adding the inferred relationship is displayed:
Figure 37: Neo4j DB schema after adding GOINGTO relationships
Cypher code to delete the relationship:
MATCH (:Airport)-[r:GOINGTO]->(:Airport) DELETE r
The first shortest path query is run again, now with the inferred relationships:
MATCH p = shortestPath((src:Airport {city: 'Madrid'})-[r:GOINGTO]->(dest:Airport {city: 'Seoul'})) RETURN p

Figure 38: Shortest path between Madrid and Seoul

[...] seen, [...]. With the following query it can be verified whether the route matches the requisites:

MATCH (r:Route) WHERE id(r) = 50276 RETURN r
It is verified that the relationship GOINGTO was equivalent to a real outbound [...]:
MATCH (r:Route) WHERE id(r) = 50205 RETURN r
Figure 39: Shortest path return route output

Shortest path in SQL Server: SQL Server has the limitation that it needs to be
[Link]
query, but from our experience, it was not effective.

When executing the query, we obtain the following message: "The statement
terminated. The maximum recursion 100 has been exhausted before statement
completion."
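
The text does not say why the recursion limit was reached. One plausible cause, offered here only as an assumption, is that a level-by-level expansion over a cyclic graph never runs out of rows unless visited nodes are tracked. The Python sketch below imitates such an expansion with a cap of 100 levels; the function name expand_levels and its parameters are hypothetical.

```python
def expand_levels(graph, src, max_levels=100, track_visited=False):
    """Expand the frontier level by level, like a recursive query.

    Returns the number of levels executed, or raises RuntimeError when the
    level cap is reached before the frontier becomes empty.
    """
    frontier = {src}
    visited = {src}
    levels = 0
    while frontier:
        if levels == max_levels:
            raise RuntimeError("maximum recursion %d exhausted" % max_levels)
        nxt = {n for node in frontier for n in graph.get(node, ())}
        if track_visited:
            # Drop nodes seen before so cycles cannot feed the frontier forever.
            nxt -= visited
            visited |= nxt
        frontier = nxt
        levels += 1
    return levels


# Two airports with routes in both directions form a cycle.
cyclic = {"A": ["B"], "B": ["A"]}
levels_with_tracking = expand_levels(cyclic, "A", track_visited=True)
```

Without visited-node tracking the same call hits the cap, since the frontier keeps alternating between the two airports.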
Figure 40: Pipeline of Neo4j query on Antwerp-Patna shortest path
Betweenness centrality:

The betweenness centrality of a node in a network is the number of shortest paths
between two other members in the network on which a given node appears.
Betweenness centrality is an important metric because it can be used to identify
“brokers of information” in the network or nodes that connect disparate clusters. [6]

This query shows the airports that have to be crossed more often by routes to go from
[Link], the airports where more
[Link] in the figure below, the airports
highlighted are like bottlenecks that connect clusters of airports.

MATCH (ap:Airport)
WITH collect(ap) AS airports
CALL [Link](['CONNECTED'], airports, 'OUTGOING')
YIELD node, score
[Link] = score
RETURN node AS Airport, score ORDER BY score DESC LIMIT 25
Figure 41: Betweenness centrality query result
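
To make the definition concrete, the following Python sketch computes betweenness centrality on a small directed graph using Brandes' algorithm. It is an illustration only and is not claimed to match the internals or the normalization of the procedure called above; the graph, the node names and the function name betweenness are assumptions.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for unweighted directed graphs.

    graph maps each node to a list of successor nodes.
    """
    nodes = set(graph) | {n for succ in graph.values() for n in succ}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = {n: [] for n in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical graph: every path from A or B to C or D crosses HUB.
graph = {"A": ["HUB"], "B": ["HUB"], "HUB": ["C", "D"], "C": [], "D": []}
scores = betweenness(graph)
```

In this toy graph the node HUB lies on all four shortest paths between the outer airports, which is the bottleneck role described in the text.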

The query outputs five big airports, which are commonly used to transfer during
[Link]
centrality.

Closeness centrality:

Closeness centrality is the inverse of the average distance to all other nodes in
[Link]
clusters in the graph, but not necessarily highly connected outside of the cluster. [6]

[Link]
other words, it shows the locations that are more geographically isolated to be
reached by other means of transport ([Link]). It can output the airports with
more direct flights from different locations or the airlines that perform more routes.

Figure 42: Concept of closeness centrality
Query example: output the five airports with the highest closeness centrality:
MATCH (ap:Airport)
WITH collect(ap) AS airports
CALL [Link](['CONNECTED'], airports, 'OUTGOING')
YIELD node, score
RETURN node AS Airport, score ORDER BY score DESC LIMIT 5
Figure 43: Closeness centrality query result
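
The definition of closeness centrality can be tried out on a toy graph. The Python sketch below assumes a hypothetical adjacency dictionary and measures distance in hops; it takes the inverse of the average distance to the reachable nodes, which may differ from the normalization used by the procedure in the query above.

```python
from collections import deque


def closeness(graph, node):
    """Inverse of the average hop distance from node to every reachable node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    # len(dist) - 1 is the number of reachable nodes other than the start.
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical chain of three airports with routes in both directions.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
middle = closeness(graph, "B")
end = closeness(graph, "A")
```

The middle airport is one hop from both others and therefore scores higher than the airports at the ends of the chain.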

As predicted, the query outputs airports that are in highly touristic but
geographically isolated locations: Lopez Island near Seattle, the river Araguaia
in the middle of Brazil, the Grand Canyon of Colorado...
Figure 44: Location of the airports with highest closeness centrality
Query performance: Writing PROFILE before a Cypher query outputs the
pipeline of the query execution.

Figure 45: Pipeline of the closeness centrality query
PageRank:
The secret of Google’s success was its search algorithm, [Link]
works by counting the number and quality of links to a page to determine a rough
[Link]
important websites are likely to receive more links from other websites [11]. This
algorithm can output the most connected airport or the most powerful airline (the
node connected to more routes).
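
The idea of counting both the number and the quality of incoming links can be sketched with a short power iteration in Python. This is an illustrative implementation under common assumptions (damping factor 0.85, rank of dangling nodes spread evenly over all nodes); it is not the procedure used in the queries below, and the graph and the function name pagerank are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping node -> list of successors."""
    nodes = sorted(set(graph) | {n for succ in graph.values() for n in succ})
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        new = dict.fromkeys(nodes, (1.0 - damping) / n)
        for v in nodes:
            succ = graph.get(v, [])
            if succ:
                # Split the rank of v evenly among the nodes it links to.
                share = damping * rank[v] / len(succ)
                for w in succ:
                    new[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank


# Hypothetical graph: three airports all fly to HUB, which flies back to A.
graph = {"A": ["HUB"], "B": ["HUB"], "C": ["HUB"], "HUB": ["A"]}
ranks = pagerank(graph)
```

HUB collects rank from three sources and ends up with the highest score, while A ranks second because its single incoming link comes from an important node.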

First query: output the most important airports.
MATCH (ap:Airport) WITH collect(ap) AS airports
[Link](airports) YIELD node, score
Figure 46: Pipeline of the airports pagerank query
The most important airports are from London, Paris, Frankfurt, Patna, Dubai,
[Link].
Second query: output the most popular airlines.
MATCH (node:Airline) WITH collect(node) AS airlines
[Link](airlines) YIELD node, score
Figure 47: Pipeline of the airlines pagerank query
As a result we can see that Ryanair is the leading airline, followed by four
companies from the USA and three from China.
Community Detection:

There are many algorithms for community detection: triangle counting, strongly
connected components, ... These algorithms cluster together the nodes more
[Link] from the library APOC,
and what the code below does, [Link]
classification is determined on the weight of the connected relationships (the
number of routes between each pair of airports).

Seeing as airports are geographical locations, and routes are physical journeys
between them, it is expected that geographically neighbouring airports will be
[Link].

CALL [Link](40, ['Airport'], 'partition',
'CONNECTED', 'OUTGOING', 'weight', 10000)
MATCH (ap:Airport) WHERE exists([Link]) RETURN ap

Figure 48: Community detection graph
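
How a weight on the relationships can drive the partitioning may be illustrated with weighted label propagation, one common community detection technique. The Python sketch below is an assumption-laden illustration and is not claimed to reproduce the APOC procedure called above: edges are treated as undirected, nodes are visited in sorted order, and ties go to the smallest label so that the result is deterministic. The edge data and the function name label_propagation are hypothetical.

```python
from collections import defaultdict


def label_propagation(weighted_edges, rounds=10):
    """Assign each node the label carrying the most edge weight among its neighbours.

    weighted_edges: dict mapping (u, v) to a weight; treated as undirected.
    """
    neighbours = defaultdict(dict)
    for (u, v), w in weighted_edges.items():
        neighbours[u][v] = neighbours[u].get(v, 0) + w
        neighbours[v][u] = neighbours[v].get(u, 0) + w
    label = {node: node for node in neighbours}
    for _ in range(rounds):
        changed = False
        for node in sorted(neighbours):
            votes = defaultdict(float)
            for other, w in neighbours[node].items():
                votes[label[other]] += w
            # Heaviest label wins; ties go to the smallest label.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != label[node]:
                label[node] = best
                changed = True
        if not changed:
            break
    return label


# Hypothetical data: two dense groups joined by a single light edge.
weighted_edges = {
    ("A", "B"): 5, ("B", "C"): 5, ("A", "C"): 5,
    ("X", "Y"): 3, ("Y", "Z"): 3,
    ("C", "X"): 1,
}
labels = label_propagation(weighted_edges)
```

The light edge between C and X is outvoted on both sides, so the two groups keep separate labels, much as regional clusters of airports stay separate when few routes link them.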

The figure above shows the shape of the graph after the nodes have
[Link], the
partition number must be returned as output:

[Link](40, ['Airport'], 'partition',
'CONNECTED', 'OUTGOING', 'weight', 10000)

MATCH (ap:Airport) WHERE exists([Link])
[Link], [Link], COUNT(*) AS num
[Link], num DESC

Figure 49: Community detection table

Going back to the visualization of the community detection for airports, the partitions c
[Link]
nodes disconnected from the rest of the airports is comprised of Papua New Guinea
airports (the country can be seen by hovering over the nodes). They belong to the
first partition in the table, 6394.

The following part of the graph is a bit scattered, but it can be seen that they are all
[Link], we see that they all belong
to Canada, and we can suppose that the more separated nodes are regional airports
[Link]
seven partitions in the table.

Next to Canada, a group of nodes is separated, and those airports are all from
Algeria. They must belong to partition 6624.

[Link]
those are connected with a Greenland airport, which connects with other
Greenland and Iceland airports.

The next subgraph shows airports from different African countries interconnected
[Link], there are airports, and airports from African
countries highly connected to them, and on the right side there are mainly Nigerian
airports, among other African airports too.

Going back to the center of the graph, it is hard to recognize more than one partition,
as it shows the central European airports, which are highly interconnected.
Finally, a partition was detected in the table, [Link]
geographically related, it has been determined that those are islands between
Polynesia, [Link]
(a) Partition table
(b) Geographical location
Figure 50: Australasia partition
Possible queries on SQL
[Link]
will present operations applicable to both:
Finding flights between two airports that have no direct route between them:

Neo4j query:
MATCH p = allShortestPaths((ap1:Airport {city: 'Antwerp'})-[*]->(ap2:Airport {city: 'Patna'}))
WITH extract(node in nodes(p) | [Link]) as cities,
extract(rel in relationships(p) | [Link]) as airlines
RETURN cities, airlines

SQL query:
[Link] [1st Airport],
[Link] [1st Airline],
[Link] [2nd Airport],
[Link] [2nd Airline],
[Link] [3rd Airport],
[Link] [3rd Airline],
[Link] [4th Airport]
FROM routes r
INNER JOIN airports a1 ON r.source_airport_id = [Link]
INNER JOIN airlines airline1 [Link] = r.airline_id
INNER JOIN airports a2 ON r.destination_airport_id = [Link]
INNER JOIN routes r2 [Link] = r2.source_airport_id
INNER JOIN airlines airline2 [Link] = r2.airline_id
INNER JOIN airports a3 ON r2.destination_airport_id = [Link]
INNER JOIN routes r3 [Link] = r3.source_airport_id
INNER JOIN airlines airline3 [Link] = r3.airline_id
INNER JOIN airports a4 on [Link] = r3.destination_airport_id
[Link] = 'Antwerp' and
[Link] = 'Patna'
(a) Neo4j Result
(b) SQL Result
Figure 51: Comparison of queries - first query
As can be seen here, finding all possible routes between two airports is easy in
Neo4j. Besides that, Neo4j gives visualization.

There is one important point here: in SQL we have to specify the level of depth to find
results. For example, in this query we searched 3-level flights between Antwerp and
[Link]
Neo4j we don’t have to specify the level; it finds all routes between two airports and even
[Link]
data that has levels.
Nearest airport to city by distance

Neo4j query:
match (airport1:Airport {city: 'Bologna'})<-[:FROM]-(route:Route)-[:TO]->(airport2:Airport)
RETURN airport1, route, airport2
[Link] asc limit 1

SQL query:
Select top 1
[Link], [Link], [Link],
[Link]([Link], a2.latitude, [Link], [Link]) as distance
from routes r
INNER JOIN airports a [Link] = r.source_airport_id
INNER JOIN airports a2 on [Link] = r.destination_airport_id
[Link] = 'Bologna'
order by distance asc

While we were uploading our data into Neo4j we created a node called route,
and this node has three relationships: TO, FROM, OF and as a descriptive
[Link]
same page we created a function in SQL that calculates distances between
airports given the latitude and longitude attributes of airports, which already exist
in our data. Both approaches give the same result, but Neo4j also provides
visualization.
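
The body of the SQL distance function is not shown in the text. A common way to compute the distance between two points from latitude and longitude is the haversine formula, sketched below in Python as an assumption about what such a function might do. The function names haversine_km and nearest_destination, the Earth radius of 6371 km, and the approximate coordinates are all illustrative.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_destination(origin, destinations, coords):
    """Return the destination code closest to origin by great-circle distance."""
    return min(destinations,
               key=lambda code: haversine_km(*coords[origin], *coords[code]))


# Approximate, illustrative coordinates (latitude, longitude) in degrees.
coords = {"BLQ": (44.53, 11.29), "FLR": (43.81, 11.20), "FCO": (41.80, 12.25)}
nearest = nearest_destination("BLQ", ["FLR", "FCO"], coords)
```

Sorting the destinations of a city by this distance and keeping the first one mirrors the "order by distance asc" with "top 1" in the SQL query.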
Most connected airports

Neo4j query:
MATCH (airport:Airport)<-[:FROM]-(r:Route)
WITH airport, count(r) as departures
MATCH (r2:Route)-[:TO]->(airport)
[Link] airport_name, departures, count(r2) as arrivals
order by departures + arrivals desc

SQL query:
SELECT [Link], [Link], [Link], SUM(A.route_count) AS route_count
FROM (
(SELECT [Link], [Link], [Link], COUNT(*) as route_count
FROM routes R
INNER JOIN airports A ON [Link] = source_airport_id
GROUP BY [Link], [Link], [Link])
UNION ALL
(SELECT [Link], [Link], [Link], COUNT(*) as route_count
FROM routes R
INNER JOIN airports A ON [Link] = destination_airport_id
GROUP BY [Link], [Link], [Link])
) A
GROUP BY [Link], [Link], [Link]
(a) Neo4j query
(b) SQL query
Figure 52: Comparison of queries - third query
With these queries we found the most interconnected airport by counting the number of
incoming and outgoing flights. The Neo4j version is noticeably shorter to write.
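
The counting logic shared by both queries can be sketched in a few lines of Python, assuming routes are available as hypothetical (source, destination) pairs; the function name most_connected and the sample data are illustrative only.

```python
from collections import Counter


def most_connected(routes, top=3):
    """Rank airports by departures plus arrivals."""
    degree = Counter()
    for src, dst in routes:
        degree[src] += 1  # one departure
        degree[dst] += 1  # one arrival
    return degree.most_common(top)


# Hypothetical routes as (source, destination) pairs.
routes = [("A", "B"), ("B", "C"), ("C", "B"), ("A", "B")]
ranking = most_connected(routes, top=1)
```

Each route contributes once to its source and once to its destination, which corresponds to adding the departures and arrivals counts in the Cypher query.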

Bibliography
Tareq Abedrabbo, Dominic Fox, Jonas Partner, Aleksa Vukotic, Nicki Watt. Neo4j in
[Link], 2015.
[Link][Link]
w. [Link]/topic/graph-theory, [Link]-11-30.

DB-[Link].
Available at [Link] Martinez-[Link]-Sal, D. A
discussion on the design of graph database benchmarks. September 2010.

[Link][Link]
Accessed: 2017-11-20.
[Link]-11-30.
[Link]: [Link],
[Link]-11-30.
[Link].
[Link][Link]
tenreasons/.
[Link], [Link]-12-8.
University [Link] management essentials.
Available at [Link]
[Link]-10-21.
[Link], [Link]://
[Link]/[Link]-11-3.

[Link]: [Link][Link]
t. com/graph_theory/graph_theory_introduction.[Link]-11-30.
[Link]
[Link]’s Blog, [Link]-11-29.

[Link]
Serra’s Blog, [Link]-11-29.
[Link]
, [Link]-11-29.
[Link]
Theory (ICDT), volume 1186 of LNCS, pages 1–[Link], Jan 1997.
[Link]: [Link]. of the 3rd
Symposium on Principles of Database Systems (PODS), pages 119–[Link]
Press, 1984.

[Link], [Link], [Link], [Link], [Link]
age
[Link] (JODL), 1(1):68–88, 1997.

[Link]. of the 6th Int.
[Link] (ICDT), volume 1186 of LNCS, pages 262–[Link], Jan 1997.

[Link]
e 14th [Link] (VLDB), pages 407–[Link], Aug Sept 1988.

[Link]
mation. [Link] (ICDE), pages 374–[Link] Society, Feb 1989.

[Link]
Transactions on Knowledge and Data Engineering (TKDE), 6(2):225–238, 1994.
[Link]
Modern Physics, 74:47, Jan 2002.
[Link], [Link], [Link]
rnal of Logic and Computation, 13(6):939–956, 2003.
[Link]: [Link]
n Conference on Hypertext Technology (ECHT), pages 201–[Link], Nov Dec 1992.

[Link]
hip
Model. Technical Report TR9315, Institute of Advanced Computer Science,
Universiteit Leiden, May 1993.

[Link], [Link], [Link], [Link], [Link]
[Link]
ase Technology (EDBT), volume 580 of LNCS, pages 21–[Link], March 1992.

[Link].
In Proc. 2nd European Semantic Web Conference (ESWC), number 3532 in LNCS,
pages 346–360, 2005.

[Link] [Link]épied. A Survey of Query Languages for Geographic
[Link],
pages 431–438, July 1976.

[Link], A Graphical Query Language for Semantic Databases.
In Proc. of the 4th [Link] Scientific and Statistical Database Management
(SSDBM), volume 339 of LNCS, pages 259–[Link], June 1988.

[Link]
Database Theory (ICDT), volume 326 of LNCS, pages 19–[Link], Aug Sept 1988.
[Link]ö, [Link], [Link]
al of Chemical Information and Computer Sciences (JCISD), 43(1):1085–1093, Jan 2003.
[Link], Amsterdam, 1973.
[Link], 2005.

[Link], [Link], [Link] (XML)
1.0, W3C Recommendation 10 February 1998.
[Link]

[Link], [Link], [Link], [Link], [Link], [Link], [Link],
and [Link]
[Link]: the international journal of
computer and telecommunications networking, pages 309–[Link]., 2000.
[Link]. of the 16th Symposium on Principles of
Database Systems (PODS), pages 117–[Link], May 1997.
