Neo4j Graph Database Modeling Guide
Copyrighted Material
Graph Database Modeling with Neo4j
Copyright © 2020-21 by Ajit Singh, All Rights Reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means (electronic, mechanical, photocopying, recording or otherwise) without prior written permission from the author, except for the inclusion of brief quotations in a review.
For information about this title or to order other books and/or electronic media, contact the publisher:
Ajit Singh & Anant Kumar
e: ajit_singh24@[Link]
e: anant@[Link]
w: [Link]
Preface
This book is designed to walk you through graph data modeling. You will be introduced to the basic process of designing a graph data model that can answer a wide range of business questions across a variety of domains.
Graph data modeling is the process in which a user describes an arbitrary domain as a [Link] model is designed to answer questions in the form of Cypher queries and solve business and technical problems by organizing a data structure for the graph database.
This book is simply an introduction to data modeling using a simple, straightforward scenario. There are plenty of opportunities throughout the upcoming guides to practice modeling domains and analyzing changes to the model that might need to be made.
[Link]’s probably useless if you don’[Link] shape, size and functionality of that container depend on your intended use, but in general, a container is necessary.
[Link],
you
[Link]
eling.
Often reserved solely for senior database administrators (DBAs) or principal developers, data [Link] you may worship the expert data modeler from afar.
While some data modeling scenarios really are best left up to the experts, it doesn’t have to be [Link], data modeling is as much a business concern as a technological one. So if you don’t know a single line of code, you’re in luck.
Anyone can do basic data modeling, and with the advent of graph database technology, [Link] ion [Link] (i.e., what you want your application to do).
Then, in the modeling process you map those needs into a structure for storing and organizing your data.
Every data model is unique, depending on the use case and the types of questions that users [Link], there is no “one-size-fits-all” approach to data [Link] result in producing an accurate data model that benefits your processes and use case.
Graph databases are best suited to very specific datasets: huge amounts of data of high complexity, [Link], they efficiently query through the relationships among entities, in contrast to relational databases.
Graph databases support algorithms to perform specific queries that are out of reach of relational databases, [Link], the bigger the volume of data, the slower the queries would be in SQL, because they would require looking up joined tables with [Link] databases allow traversing the graph and reaching a high level of depth, without having to read all the data stored.
[Link] (ACID) that stores data structured as graphs consisting of nodes, connected by [Link], it allows for high query performance on complex data, while remaining intuitive and simple for the developer.
Neo4j is, by far, [Link] [Link] [Link]; even if data size grows exponentially, the performance of Neo4j is not affected by it.
Using this book, you'll learn the theory of graph databases and how to use Neo4j to build recommendations and relationships, and to calculate the shortest route [Link], best practices, use cases, and an application putting everything together, this book will give you everything you need to [Link], this book will show you the advantages of using graph databases along with data [Link]'ll gain practical hands-on experience with commonly used and lesser known features for updating a graph store with Neo4j's [Link], helps you grasp the fundamental concepts behind this radical new way of dealing with connected data, and will give you lots of examples of use cases and environments where a graph database would be of great interest.
[Link] advantage of Neo4j's powerful features and benefits - add Beginning Neo4j to your library today.
Contents
1. Graph Data Model
Graph databases
2. Graph Schemas
Selecting vertex labels
Examples of label selection
Drawing a graph schema
Summary
3. Converting ER models to graph schemas
ER models and diagrams
Example
Procedure to convert an ER model to a graph schema
Rule #1: Entity types become vertex types
Rule #2: Binary relationship types become edge types
Rule #3: N-ary relationship types become vertex types
Conversion example
Vertices are vertices, and edges are edges
Summary
4. Normalizing Graph Schemas
Normalization of relational databases
Transformation rules that produce equivalent schemas
Rule A: Renaming properties and labels
Rule B: Reversing edge directions
Rule C: Property displacement
Rule D: Specialization and generalization
Rule E: Edge promotion
Rule F: Property promotion
Rule G: Property expansion
Summary
[Link]
Schemas and constraints
Graph universes, transformations and equivalence
Derived types
Meta rule: Adding and removing derived types
Proving the meta rule
Proving the 7 rules: Renaming, Reversing, Property Displacement,
Beyond transformation rules
Summary
[Link]
[Link]: First order logic on graph databases
Background
On SQL
On first order logic
On Gremlin
Pixy: First order logic with Gremlin
ER models in Pixy
Query requirements don't usually matter while modeling
[Link]
State of the art of Databases
Types of DBMS
NoSQL DBMS
Comparison of DBMS
Current trends
[Link]
Graph Theory and Its Applications
Concepts of Graph Databases
Query performance
10. Neo4j
Introduction of Neo4j
Advantages of Neo4j
Properties of Neo4j
Performance In Neo4j
How To Increase Performance Of Neo4j?
Cypher Query Language
Structure
Operations In Cypher
Loading Data With Cypher
Use Cases of Neo4j
11. Getting started with Neo4j
Installation or Setup
Installation & Starting a Neo4j server
Start Neo4j from console (headless, without web server)
Start Neo4j web server
Start Neo4j web server
Delete one of the databases
Cypher Query Language
RDBMS Vs Graph Database
Cypher - Implementation
Creation
Create a node
Create a relationship
Query Templates
Create an Edge
Deletion
Delete all nodes
Delete all nodes of a specific label
Match (capture group) and link matched nodes
Update a Node
Delete All Orphan Nodes
Python & Neo4j
12. Neo4j Application
Use Case Selected
Data
Implementing Data
Export data
Query Examples (Neo4j - SQL)
Shortest Path
Betweenness centrality:
Closeness centrality:
PageRank:
Community Detection:
Possible queries on SQL
Bibliography
Part I
Chapter 1
Graph Data Model
[Link], and [Link]. Tables are related by foreign-key constraints, which is how you can connect one table’s information to another, [Link]-level joins are often involved when querying relational databases.
For a graph (in the graph-theory sense, not a chart such as a scatterplot), think of the elements as nodes or, [Link] [Link] pairs and a [Link] are connected by relationships or edges. Relationships have a type and a direction, [Link] [Link] and more powerful when the meaning is in the relationships between the data.
Relational databases can easily handle direct relationships, but indirect relationships are more difficult to deal with in relational databases.
Figure 1a
When building a relational database, [Link] questions will we be wanting to answer? For example, you want to know how many people who bought a toaster, live in Kansas, have a criminal record, and used a [Link], or the person who created the database did not anticipate a question like this, it may be very difficult to retrieve that information from [Link], it is possible to [Link], you can answer any question as long [Link] [Link] [Link] data points, rather, [Link] information.
There are two properties of graph databases we should consider when investigating graph database technologies:
The underlying storage
Some graph databases use native graph storage that is optimized and designed for storing [Link], [Link], an object-oriented database, or some other general-purpose data store.
The processing engine
Some definitions require that a graph database use index-free adjacency, meaning that connected nodes physically “point” [Link] slightly broader view: any database that from the user’s perspective behaves like a graph database (i.e., exposes a graph data model through CRUD operations) qualifies as a graph database. We do acknowledge, however, the significant performance advantages of index-free adjacency, and therefore use the term native graph processing to describe graph databases that leverage index-free adjacency.
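The difference between index-free adjacency and a lookup-based engine can be illustrated with a minimal Python sketch. It is not how any particular product stores data; the `Node` class, the helper names and the sample edges are assumptions made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Direct references to adjacent Node objects: following one is a pointer hop.
    neighbors: list = field(default_factory=list)


def build_native(edges):
    """Build nodes that hold references to their neighbors (native style)."""
    nodes = {}
    for src, dst in edges:
        a = nodes.setdefault(src, Node(src))
        b = nodes.setdefault(dst, Node(dst))
        a.neighbors.append(b)
    return nodes


def neighbors_native(node):
    # No global lookup: the node already "points" to its neighbors.
    return [n.name for n in node.neighbors]


def neighbors_via_table(edge_table, name):
    # Non-native style: every hop consults a global edge table (here, a scan).
    return [dst for src, dst in edge_table if src == name]


# Hypothetical sample data.
EDGES = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
graph = build_native(EDGES)
print(neighbors_native(graph["alice"]))
print(neighbors_via_table(EDGES, "alice"))
```

Both functions return the same neighbors. The cost of `neighbors_native` depends only on the number of neighbors of the node, while `neighbors_via_table` depends on the size of the whole edge table (or of an index over it).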
From a database point of view, the conceptual tools defining a DB-Model should address at least the structuring and description of the data, its maintainability and the form [Link], a DB-Model is defined as a combination of three components: first, a collection of data structure types; second, a collection of operators or inference rules; and third, a collection of [Link]-Models define only the data structures, sometimes omitting operators and/or integrity rules.
Due to the importance of modeling conceptually, philosophically and in practice, DB [Link]-Model are: tool for specifying the kinds of data permissible; general design methodology for databases; coping with evolution of databases; development of families of high level languages for query and data manipulation; focus in DBMS architecture; vehicle for research into the behavioral properties of alternative organizations of data.
Since the emergence of database management systems, there has been an ongoing debate about what the DB-Model for such a system [Link] diversity of existent DB-Models shows that there is no silver bullet for data modeling. The parameters influencing their development are manifold, and among the most important we can mention the characteristics or structure of the domain to be modeled, the type of intellectual tools that appeal to the user, and of course, the [Link], each DB-Model proposal is grounded on certain theoretical tools, and serves as a base for the development of related models.
Figure 1b: [Link], arrows indicate influences, and [Link].
Database Models Evolution – Brief Historical Overview
In the beginnings of the design of DB-Models, physical (hardware) constraints were [Link] relational model, most DB-Models focused essentially on the specification of the [Link] developed a taxonomy of DB-Models prior to 1976, comparing essentially their mathematical structures and foundation, and the levels of abstraction used.
Two representative DB-Models are the hierarchical and network models, which emphasize the physical level, and offer the user the means to navigate the database at the record level, thus providing low level operations to derive more abstract structures.
The relational DB-Model was introduced by Codd and highlights the concept of level of abstraction by introducing the idea of separation [Link] [Link], it gained wide popularity among business applications.
Semantic DB-Models allow database designers to represent objects and their relations in a natural and clear manner to the user (as opposed to previous models). They intended to provide the user with tools that could faithfully capture the semantics of the information to be modeled. A well known example is the entity relationship model.
Object oriented DB-Models appeared in the eighties, when most of the research was concerned with so called “advanced systems for new types of [Link]-Models are based on the object oriented paradigm and their goal is representing data as a collection of objects that are organized in classes and have complex values associated with them.
Semistructured DB-Models are designed to model data with a flexible structure, e.g., [Link] (also called unstructured data) is neither raw nor strictly typed as in [Link], data is mixed with the schema, [Link] appeared in the nineties and are currently in evolution.
The XML (eXtensible Markup Language) model did not originate in the [Link] exchange and model documents, soon it became a general purpose model, [Link] semistructured model, schema and data are mixed. See Section 2.3 for a more in-depth comparison among these models.
[Link]-Models designed for particular applications, as well as modeling frameworks not directly focusing on database issues, which indirectly concern graph database modeling. Among the DB-Models are Spatial databases, Geographical Information Systems (GIS), Temporal DB-Models and Multidimensional DB-Models. Frameworks related to our topic, but not directly focusing on database issues, are Semantic Networks.
Graph Database Models – Brief Historical Overview
The notion of graph DB-Model made its appearance almost in parallel with the object oriented DB-Models, as an alternative to the limitations of traditional DB-Models for capturing the inherent graph structure of data appearing in applications such as hypertext or geographic database systems, where the interconnectivity of data is an important aspect.
Activity around graph databases flourished in the first half of the nineties and then the [Link]: the database community moved toward semistructured data (a research topic which did not have links to the graph database work in the nineties); the emergence of XML captured all the attention of the work on hypertext; people working on graph databases moved to particular applications like spatial data, web, documents; the tree-like structure is enough for most applications. Figure 2 reflects this evolution by means of papers published in main conferences and journals.
Graph DB-Models emerged with the objective of modeling information whose [Link], Roussopoulos and Mylopoulos, facing the failure of current (at the time) systems to take into account the semantics of the database, [Link] implicit structure of graphs for the data itself was presented in the Functional Data Model, whose goal was to provide a “conceptually natural” database interface. A different approach proposed the Logical Data Model, where an explicit graph DB-Model intended to generalize the relational, [Link] later Kunii proposed a graph DB-Model for representing complex structures of knowledge called GBASE.
Graph Data modeling
What is a Graph Data Model?
A graph DB-Model is conceptualized according to the three basic components of a DB-Model, namely data structures, transformation language, and integrity constraints.
A graph DB-Model is characterized by:
The data and/or the schema are represented by graphs, or by data structures generalizing the notion of graph (hypergraphs, hypernodes, hygraphs, etc.). Almost everybody agrees on this point, modulo slight variations. [Link] is to model the database directly and entirely as a graph [58]. A graph DB-Model is one whose single underlying data structure is a labeled directed graph; [Link] schema in this model is a directed graph, where leaves represent data [Link] labeled graphs are used as the formalism to specify and represent database schemes, instances, and [Link] model is basically [Link], a database is described in terms of a labeled directed graph called schema graph. A graph DB-Model formalizes the representation of the data structures [Link] [Link] [Link] instances and database schemes are described by certain types of labeled graphs [68]. The model for data is organized as graphs. Labeled graphs are used to represent schemes and instances.
On top of these descriptions, one could add the fact that sometimes the schema and the data (instances) are difficult to differentiate in these models, a fact that [Link] instances are separated.
The existence of integrity constraints enforcing the consistency of the data, which are directly related to the graph [Link] example, labels with unique names, typing constraints on nodes, functional dependencies, domain and range of properties.
Summarizing, a graph DB-Model is a model where the data structures for the schema and/or instances are modeled as a (labeled) (directed) graph, or generalizations of the graph data structure, where data manipulation is expressed by graph oriented operations and type constructors, and which has integrity constraints appropriate for the graph structure.
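The three ingredients in this summary (a labeled directed graph, graph oriented operations, and integrity constraints tied to the graph structure) can be sketched in a few lines of Python. The class below is a hypothetical illustration rather than any system's actual API; the label names and the genealogy data are assumptions.

```python
class LabeledGraph:
    """Directed graph with labeled nodes and edges plus a simple typing constraint."""

    def __init__(self, allowed_edges):
        # allowed_edges: set of (source label, edge label, target label) triples.
        self.allowed_edges = set(allowed_edges)
        self.nodes = {}   # node id -> (label, properties)
        self.edges = []   # (source id, edge label, target id)

    def add_node(self, node_id, label, **properties):
        # Integrity constraint: node identifiers are unique.
        if node_id in self.nodes:
            raise ValueError(f"duplicate node id: {node_id}")
        self.nodes[node_id] = (label, properties)

    def add_edge(self, src, label, dst):
        # Integrity constraint: the edge must match a declared edge type.
        triple = (self.nodes[src][0], label, self.nodes[dst][0])
        if triple not in self.allowed_edges:
            raise ValueError(f"edge {triple} violates the schema")
        self.edges.append((src, label, dst))

    def out_neighbors(self, src, label):
        # Graph oriented operation: follow edges with a given label.
        return [d for s, l, d in self.edges if s == src and l == label]


# Hypothetical genealogy: a Person may have a 'parent' edge to another Person.
g = LabeledGraph({("Person", "parent", "Person")})
g.add_node("ana", "Person", lastname="Deville")
g.add_node("luis", "Person", lastname="Deville")
g.add_edge("ana", "parent", "luis")
print(g.out_neighbors("ana", "parent"))
```

Calling `g.add_edge("ana", "owns", "luis")` would raise a `ValueError`, because no such edge type is declared in the schema.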
Why a Graph Data Model?
The application areas of graph DB-Models are those where information about the interconnectivity or the topology of the data is more important, or as important as, the [Link] [Link], introducing graphs as a modeling tool has several advantages for this type of data.
First, it leads to a more natural modeling: [Link] allow a natural way of handling data appearing in applications ([Link] geographic databases). Graphs have an important advantage: they can keep all the information about an entity in a single node and show related information by arcs connected to [Link] objects (like paths, neighborhoods) may have first order citizenship; a user
Table 1 (columns: type of model, abstraction level, base data structure, main focus, data complexity, data homogeneity): A coarse granularity comparative view among different general [Link]: abstraction level, base data structure used, what are the types of information objects the DB-Model focuses on, complexity and homogeneity of the data items modeled.
Second, [Link] specific graph operations in the query language algebra, such as finding shortest paths, determining certain subgraphs, [Link] [Link], this is in contrast to graph manipulation in deductive databases, where often fairly complex rule programs need to be written. Last but not least, for purposes of browsing it may be convenient to forget the schema.
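As an illustration of one such graph operation, the following Python sketch finds a shortest path (fewest edges) with breadth-first search. The adjacency dictionary `ROUTES` is made-up sample data, and a graph database would expose this as a built-in query operation rather than application code.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    previous = {start: None}   # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the 'previous' links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical directed graph.
ROUTES = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(shortest_path(ROUTES, "A", "E"))
```

The search touches only the nodes reachable from the start node, which is the behavior the text contrasts with rule programs in deductive databases.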
Third, as far as implementation is concerned, graph databases may provide special storage graph structures for the representation of graphs and the most efficient [Link] have some structure, the structure is not as rigid, regular or complete as traditional [Link] [Link] can use efficient graph algorithms designed to utilize the special graph data structures [58].
Comparison with other Database Models
In this section we compare the most influential DB-Models with graph DB-Models. [Link] we present the details.
[Link] [Link] the hierarchical and network [Link] models lack good [Link] data structuring is not flexible and not apt to model nontraditional [Link].
The relational DB-Model was introduced by Codd to highlight the concept of level of abstraction by introducing a clean separation between physical [Link] [Link] the relational model, at a time when the domains of application were basically simple data (banks, payments, commercial and administrative applications).
The differences between graph DB-Models and the relational DB-Model [Link]: the relational model was directed to simple record type data with a structure known in advance (airline reservations, accounting, inventories, etc.). The schema is fixed and [Link] nor automatizable. The query language does not support paths, neighborhoods and several other graph operations, like connectivity (an exception is transitivity). There are no object identifiers, but values.
Semantic DB-Models have their origin in the necessity to provide more expressiveness and incorporate a richer set of semantics into the database from [Link] database designers to represent objects and their relations in a natural and clear manner (similar to the way the user views an application) by using high level abstraction concepts such as aggregation, classification and instantiation, sub- and superclassing, attribute [Link] [Link], but due to lack of preciseness cannot replace models like relational or Object Oriented. Other examples of semantic DB-Models are IFO [Link], semantic DB-Models are relevant because they are based on a graph-like structure which highlights the relations between the entities to be modeled.
OO DB-Models have been related to graph DB-Models due to the [Link], there remain important differences rooted in the form that each of them [Link]-Models view the world as a set of complex objects having certain state (data) and interacting among them by [Link], graph DB-Models view the world as a network of relations, emphasizing the interconnection of the data, and the [Link]-Models is on the dynamics of the objects, [Link], graph DB-Models emphasize the interconnection while maintaining the structural and semantic complexity of the data.
[Link] (also called unstructured data) was motivated by: the increased existence of unstructured data, data exchange and, [Link] irregular, implicit and partial; the schema does not restrict the data, only describes it, is very large and rapidly evolving; the information associated with a schema is contained within the data (data contains data and its description, so it is self-describing). Among the most representative models are OEM, Lorel, UnQL, [Link], semistructured data is represented by a tree-like [Link], establishing in this [Link] semistructured data as rooted directed connected graphs.
Graph Data Model Motivations and Applications
Graph DB-Models are motivated by real life applications where information about [Link] areas in Classical and Complex networks.
In the same direction, the observation that graphs have been an integral part of the database design process in semantic and object oriented DB-Models brought the idea of introducing a model in which both data manipulation and data representation were graph based.
Limitations (at the time) of knowledge representation systems, and the need for intricate but flexible knowledge representation and derivation techniques.
[Link] this direction the applications in mind were CASE, CAD, image processing, and scientific data analysis.
Graphical and visual interfaces, geographical, pictorial and multimedia systems.
Applications where data complexity exceeded the relational DB-Model [Link], managing transport networks (train, plane, water, telecommunications), spatially embedded networks like highway, [Link] applications are now in the field of Geographical information systems and spatial databases.
[Link] huge networks of data which share some particular mathematical parameters, called complex networks. The need for database management for some classes of these networks has been recently [Link] the point of view of databases one can treat them as a whole, we will describe them together [Link], we will group them in four categories: social networks, information networks, [Link] specific examples for each of them.
In social networks, nodes are people and groups while links show [Link], business relationships, patterns of sexual contacts, research networks (collaboration, co-authorship), communication records (mail, telephone calls, email), computer networks, [Link] activity in the area of Social Network analysis, visualization and data processing in such networks.
In information networks, relations occur such as citations between academic papers, World Wide Web (hypertext, hypermedia), peer to peer networks, relations between word classes in a thesaurus, preference networks.
In technological networks the structure is mainly governed by space and [Link] (as network of computers), electric power grids, airline routes, telephone networks, delivery network (post office). The area of Geographic Information Systems (GIS) is today covering a big part of this area (roads, railways, pedestrian traffic, rivers).
It is important to stress that classical query languages offer little help [Link], data processing in GIS includes geometric operations (area or boundary, intersection, inclusions, etc.), topological operations (connectedness, paths, neighbors, etc.) and metric operations (distance between entities, diameter of the network, etc.). In genetic regulatory networks examples of measures are connected components (interactions between proteins) and degrees of nearest neighbors (strong pair correlations). In social networks, distance, neighborhoods, clustering coefficient of a vertex, clustering coefficient of a network, betweenness, size of giant connected components, size distribution [Link], where querying RDF data increasingly needs graph features.
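Two of the measures named above, the degree of a vertex and the clustering coefficient of a vertex, can be computed directly from an adjacency structure. The sketch below uses the standard definition of the local clustering coefficient (links among a vertex's neighbors divided by the number of possible neighbor pairs); the friendship data is hypothetical.

```python
from itertools import combinations


def build_undirected(edges):
    """Adjacency sets for an undirected graph."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree(adjacency, v):
    return len(adjacency[v])


def clustering_coefficient(adjacency, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adjacency[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return links / (k * (k - 1) / 2)


# Hypothetical friendships: a triangle (ann, bob, cat) plus dan attached to cat.
friends = build_undirected(
    [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]
)
print(degree(friends, "cat"), clustering_coefficient(friends, "cat"))
```

Both measures only look at a vertex and its immediate neighborhood, which is why they are awkward to express in a query language that has no notion of neighbors.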
Representative Graph Database Models
In this section we describe in some detail the most representative graph DB-Models, choosing those that define and use explicitly graph structures or generalizations of [Link], do not fit [Link], graphs are used, for example, for navigation, for defining views, or as language representation.
For each proposal, we present their data structures and, when available, their query [Link], there are few implementations and no standard benchmarks, [Link] modeling in each proposal, we will run the following example about a toy genealogy shown in Figure 3.
Figure 2: A genealogy diagram (right hand side) represented as two tables (left hand side), NAMELASTNAME and PERSONPARENT. (Children inherit the last name of the father just for modeling purposes.)
Figure 3: [Link] (on the left) uses two basic type nodes for representing data values (N and L), and two product type nodes (NL and PP) [Link] (on the right) is a collection of tables, [Link] internal nodes use pointers (names) to make reference to basic and set data values defined by other nodes.
Logical Data Model (LDM)
Motivated by the lack of semantics in the relational DB-Model, Kuper and Vardi proposed a DB-Model that generalizes the relational, [Link] describes mechanisms to restructure data, a logical query language and an algebraic query language.
In LDM a schema is an arbitrary directed graph where each node has one of the following types: the Basic type describes a node that contains the data stored; the Composition type ⊗ describes a node that contains tuples whose components are taken from its children; the Collection type describes a node that contains sets, [Link], internal nodes are of type ⊗ or ⊛ representing structured data, terminal nodes are of the Basic type and represent atomic data, and edges represent connections between data.
A second version of the model, besides renaming the nodes ⊗ and ⊛ as product and power respectively, incorporates a new type, the Union type ∪, intended to represent a collection whose domain is the union of the domains of its children (see example in Figure 4).
An LDM database instance consists of an assignment of values to each node of the [Link], the instance of a node is a set of elements from the underlying domain (for basic type nodes) and tuples or sets taken from the instance of the node’s children (for ⊗, ⊛ and ∪ types).
With the objective of avoiding cyclicity at the instance level, the model proposes to keep [Link], instances consist of a set of l-values (the address space), plus an r-value (the data space) assigned to each of [Link] gies.
[Link], [Link], an algebraic language, equivalent to the logical language, is proposed, providing operations for node and relation creation, transformation and reduction of instances, and other operations like union, difference and projection.
LDM is a complete DB-Model ([Link] constraints). The model supports modeling of complex relations ([Link], recursive relations). The notion of virtual records (pointers to physical records) proves useful to avoid redundancy of data by allowing cyclicity at the schema and instance level. Due to the fact that the model is a generalization of other models (like the relational model), their techniques or properties can be translated into the generalized model. A relevant example is the definition of integrity constraints.
Figure 4: [Link] (left) defines a person as a complex object with the properties name and lastname of type string, and parent of type person (recursively defined). The instance (on the right) shows the relations in the genealogy among different instances of person.
Chapter 2
Graph Schemas
Selecting vertex labels
[Link] of [Link] edges can have properties which are key-value pairs with String keys and pretty much any value that the underlying database supports.
So far, the model looks schemaless since vertices and edges can't be distinguished from other vertices and edges without knowing what the properties mean. However, edges have always had labels. And with Tinkerpop 3, [Link] is true with Neo4j's latest major version.
If every vertex must be labeled, what is the correct method to select a label? What should a label say about a vertex or an edge, from the application's perspective?
We think a vertex label should represent the most granular type of the vertex, where each "vertex type" is associated with a unique combination of:
meaning (semantics),
set of property key names and value types, and
set of outgoing edge labels, where each label type is annotated with the possible directions of the edge (in/out/both) and cardinality.
Why so?
Because labels representing vertex types give the application the most detailed information about the behavior of that vertex, thereby ensuring that the application can [Link], one should not be able to subdivide a vertex type to get two vertex types that behave differently from the application's standpoint.
Examples of label selection
Let's go through the label selection exercise with the classic 6 vertex TinkerGraph shown in the property graph model page. Since this is a Tinkerpop 2 style graph, it doesn't have vertex [Link]'ll now try to come up with the vertex labels by simply looking at the vertex behavior.
Figure 1: TinkerGraph example
If you look closely, there are two types of vertices: ones with 'name' and 'age', and ones with 'name' and 'lang'. Let us label the former vertex type as 'Person' and the latter vertex type as 'Software'. In other words, you have persons named 'marko', 'vadas', 'peter' and 'josh' and softwares named 'lop' and 'ripple'.
After analyzing the edge labels and direction, you could say that the 'Person' vertex type has:
Property keys 'name' and 'age'
Edges labeled 'knows' in the OUT direction
Edges labeled 'created' in the OUT direction
The 'Software' vertex type has:
Property keys 'name' and 'lang'
Edges labeled 'created' in the IN direction
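The grouping step above can be sketched in Python: vertices that share the same set of property keys fall into one candidate vertex type, and the edge labels seen at each end are recorded with their direction. The vertex and edge data below follows the classic TinkerPop toy graph (edge weights omitted); the function name and the output layout are assumptions of this sketch.

```python
from collections import defaultdict

# Classic six-vertex toy graph, without vertex labels.
VERTICES = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
    5: {"name": "ripple", "lang": "java"},
    6: {"name": "peter", "age": 35},
}
EDGES = [  # (out vertex, label, in vertex)
    (1, "knows", 2), (1, "knows", 4), (1, "created", 3),
    (4, "created", 5), (4, "created", 3), (6, "created", 3),
]


def vertex_types(vertices, edges):
    """Group vertices by property-key set and collect (label, direction) pairs."""
    types = defaultdict(lambda: {"names": [], "edges": set()})
    key_of = {}
    for vid, props in vertices.items():
        key = tuple(sorted(props))          # e.g. ('age', 'name')
        key_of[vid] = key
        types[key]["names"].append(props["name"])
    for out_v, label, in_v in edges:
        types[key_of[out_v]]["edges"].add((label, "OUT"))
        types[key_of[in_v]]["edges"].add((label, "IN"))
    return dict(types)


for keys, info in vertex_types(VERTICES, EDGES).items():
    print(keys, info["names"], sorted(info["edges"]))
```

The sketch yields two groups, matching 'Person' and 'Software'. It also reports 'knows' in the IN direction for the person group, because both ends of a 'knows' edge are persons. Deciding what each group means, and whether to subdivide it further, remains a modeling decision that the code cannot make.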
Now, an application looking at this graph automatically knows what to expect when it reads a vertex labeled 'Person' or 'Software'. We can define two different indexes on 'name', one for Person and one for Software, to make sure that software searches don't pick up people, or vice versa.
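A label-scoped index can be pictured as one lookup table per label. The following Python sketch is only an in-memory analogy for what a graph database's index does; the data layout and the function name are assumptions.

```python
from collections import defaultdict


def build_name_indexes(vertices):
    """One 'name' index per vertex label: label -> name -> list of vertex ids."""
    indexes = defaultdict(dict)
    for vid, (label, props) in vertices.items():
        indexes[label].setdefault(props["name"], []).append(vid)
    return indexes


# Hypothetical labeled vertices: id -> (label, properties).
LABELED = {
    1: ("Person", {"name": "marko"}),
    3: ("Software", {"name": "lop"}),
    4: ("Person", {"name": "josh"}),
    5: ("Software", {"name": "ripple"}),
}
indexes = build_name_indexes(LABELED)
print(indexes["Software"].get("lop"))   # found in the Software index
print(indexes["Person"].get("lop"))     # not present in the Person index
```

Because each label has its own table, a search for software named 'lop' never inspects person vertices.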
The label selection process can'[Link], a person with no friends can be thought of as a separate vertex type, because there are no adjacent 'knows' [Link], unless this makes sense in the context of the application or the data model, there is no point in subdividing the 'Person' vertex type as 'Loner' and 'Person with Friends'. The same argument goes for subdividing the person vertex type as 'Developer' and 'NonDeveloper' based on whether that person created a software.
To recap, the right way to select vertex labels for a property graph is to first figure out the [Link] graph schema.
Drawing a graph schema
The best way to represent a graph schema is, of course, [Link] graph schema looks for the classic Tinkerpop graph.
Figure 2: Example graph schema shown as a property graph
[Link] types, [Link] [Link] the name of the most specific superclass representing the corresponding property values [Link] '?' after the class name (not shown here).
Edge properties are like vertex properties, except that there is a special property named '#' that holds the cardinality from [Link] cardinality is M:N, i.e., many-to-many, for both 'knows' and 'created'. One could be misled to think that some of these relationships are 1:[Link] another reason for not fully relying on reverse engineering methods to derive schemas.
[Link], the graph schema is very simple, although the visualization of the graph shown in the link looks complicated.
Figure 3: Grateful Dead graph schema
[Link], the schema is extremely simple (simplistic given recent US Supreme Court rulings).
Figure 4: Family tree graph schema
Note that in the Pixy schema, the property lists are the same for 'Man' and 'Woman', but the direction of the 'wife' edge is functionally dependent on the value of the 'sex' property. This is very interesting because this means that graph schemas could be normalized using rules like [Link]!
Summary
This section introduced the idea of schemas for property graphs and described how the [Link], it described a method to develop the graph schema for an existing property graph by finding the most granular division of its vertices into vertex types.
Graph schemas (or schema graphs) help application developers better understand the graph's structure.
In the next section, we will look at the problem [Link] develop a graph schema from a higher level conceptual model such as an Entity Relationship model? Could this be a systematic method to select vertex and edge labels, and property keys when designing a graph database application?
Chapter 3
Converting ER models to graph schemas
This section will describe a general method to convert an entity relationship model to a [Link], a database designer can develop ER models using standard conceptual modeling practices, but store the data in a graph database instead of a relational database.
ER models and diagrams
The entity relationship model was proposed by Peter Chen in his 1976 paper titled "The Entity-Relationship Model: Toward a Unified View of Data". The ideas in this paper are [Link] model.
Conceptual modeling is a particularly useful exercise when embarking on a project that [Link] [Link] modeling is to look at the natural language description of an application's requirements.
These requirements can be analyzed to identify the entity and relationship types, using Chen's "rules of thumb" (quoted from Wikipedia):
Common noun         Entity type
Proper noun         Entity
Transitive verb     Relationship type
Intransitive verb   Attribute type
Adjective           Attribute for entity
Adverb              Attribute for relationship
Example
Let us consider the following requirements:
Model a system where users create pages, [Link] [Link] which are then used to recommend other sections to the authors and invited readers.
You could analyze this requirement and come up with three entity types, viz. User, Page and Tag. Owns, Invites and TaggedAs capture the relationships. Note that not all verbs become relationships (like create). Similarly, the fact that invitations only apply to pages that a user owns is lost in this model.
Figure 5: Example ER diagram
The square shaped boxes show entity types, [Link] diamond shaped boxes show relationship types, which represent sets of similar relationships. A relationship type relates two or more entity types to each other.
The diagram shows the cardinality of each entity's contribution to a relationship, such as 1:N (one to many) or N:N (many to many). The cardinality is specified using the 'look across' [Link], a User owns N pages, [Link] known limitations of look-across cardinality for ternary relationships like Invites.
The diagram also shows some oval shaped attributes, [Link] [Link] must be underlined.
Now, [Link] from an ER perspective, it makes sense to model tag as an entity, especially if tags are used to establish relationships across users for recommendations.
Procedure to convert an ER model to a graph schema
The procedure to convert an ER model to a relational model is well known and discussed in [Link] r procedure the ER diagram with the above example.
Rule #1: Entity types become vertex types
Entity types such as User, Page and Tag become vertex types.
The name of the entity type becomes the label of the vertex type.
The associated attributes become the properties of the vertex type.
Note that we are drawing a graph schema, [Link] any [Link] term "vertex type" [Link] "entity types" (like User) and entities (like John Doe, the user).
Rule #2: Binary relationship types become edge types
All binary relationship types in the ER diagram can be converted to edge types in the graph schema.
The name of the relationship type becomes the label of the edge type.
The associated attributes become the properties of the edge type.
The end points of the edge type are the vertex types corresponding to the related entity [Link]'t matter.
Here is an example showing the "Owns" relationship type translated to an "owns" edge type:
Note that one to many and many to many binary relationships can be modeled as edges without [Link], you would need an additional table to capture many to many relationships.
Figure 7: Owns relationship converted to an owns edge
A minor point is that the cardinality is written as 1:N because the User (out vertex type) to Page (in vertex type) relationship is a 1:N relationship, using the look-across method. In other words, [Link] reversed, the cardinality would be N:1.
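The effect of edge direction on the written cardinality can be shown with a small Python sketch. The tuple layout `(out type, label, in type, cardinality)` is an assumption of this example, and a real schema would usually rename the label as well when reversing an edge.

```python
def reverse_edge_type(edge_type):
    """Swap the end points of an edge type and flip its look-across cardinality."""
    out_type, label, in_type, cardinality = edge_type
    left, right = cardinality.split(":")
    return (in_type, label, out_type, f"{right}:{left}")


owns = ("User", "owns", "Page", "1:N")   # a User owns N pages
print(reverse_edge_type(owns))
```

Reversing twice returns the original edge type, which reflects that the two forms carry the same information.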
Rule #3: N-ary relationship types become vertex types
Relationship types that relate more than two entity types become vertex types in the
property graph model.
– The name of the relationship type becomes the label of the vertex type.
– The associated attributes become the properties of the vertex type.
– The new vertex type includes edges to the vertex types corresponding to the related
  entity types (see example). These edge types are labeled after the role of the
  participating entity type. The direction doesn't matter for any of these edges.
Here is an example showing the ternary relationship Invites translated to the vertex
type Invitation:
Figure 8: Invites relationship converted to an Invitation vertex type
The cardinality in the graph schema is N:1 because the Invitation to Page
relationship is an N:1 relationship, using the look-across method. In other words, an
invitation is issued for 1 page, and a page (in vertex) can have N invitations. If
the direction of some of the role edge types, like invitee, is reversed, their
cardinality will be 1:N.
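The three rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RelationshipType` class, the `to_graph_schema` function, the role names "owner", "page" and "tag", and the rule of lower-casing the relationship name to get the edge label are all hypothetical choices made for this sketch, not part of any library.

```python
from dataclasses import dataclass, field

@dataclass
class RelationshipType:
    name: str                      # ER relationship type name, e.g. "Owns"
    roles: dict                    # role name -> participating entity type
    attributes: list = field(default_factory=list)
    vertex_label: str = ""         # label used if the type becomes a vertex type

def to_graph_schema(entity_types, relationship_types):
    """Return (vertex_types, edge_types) for an ER model.

    vertex_types maps a label to its property names; edge_types is a list of
    (out vertex type, edge label, in vertex type, property names).
    """
    # Rule #1: each entity type becomes a vertex type with the same properties.
    vertex_types = {name: list(attrs) for name, attrs in entity_types.items()}
    edge_types = []
    for rel in relationship_types:
        participants = list(rel.roles.values())
        if len(participants) == 2:
            # Rule #2: a binary relationship type becomes an edge type.
            out_type, in_type = participants
            edge_types.append((out_type, rel.name.lower(), in_type, list(rel.attributes)))
        else:
            # Rule #3: an n-ary relationship type becomes a vertex type with
            # one edge type per role, labeled after that role.
            label = rel.vertex_label or rel.name
            vertex_types[label] = list(rel.attributes)
            for role, entity_type in rel.roles.items():
                edge_types.append((label, role, entity_type, []))
    return vertex_types, edge_types

entities = {
    "User": ["name", "login"],
    "Page": ["uri", "html", "createTs"],
    "Tag": ["hashtag", "description"],
}
relationships = [
    RelationshipType("Owns", {"owner": "User", "page": "Page"}),
    RelationshipType("TaggedAs", {"page": "Page", "tag": "Tag"}),
    RelationshipType("Invites", {"inviter": "User", "invitee": "User", "page": "Page"},
                     vertex_label="Invitation"),
]
vertex_types, edge_types = to_graph_schema(entities, relationships)
```

Running the sketch on the User/Page/Tag example yields four vertex types (User, Page, Tag, Invitation) and five edge types: owns, taggedas, and the three role edges of Invitation.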
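We haven't shown the process for weak entity types and identifying relationship
types, but they need no special treatment. Graph databases are more forgiving than
relational databases in that they allow two vertices to have the same label and
properties. Weak entity types and identifying relationship types can therefore be
carried over into the property graph model like any other type.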
Conversion example
Applying these rules to the example ER diagram gives the graph schema shown below.
This diagram provides enough information for an application developer to work with
the graph database.
Figure 9: Graph schema for the User-Page-Tag ER diagram
This is the "logical model" for the example conceptual model introduced in the first
figure. We can tweak this model further by renaming the labels, changing the
directions of the edges, and so on.
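Vertices are vertices, and edges are... edges
Relationships in an ER model can relate more than two entities. For example, "Joe
bought a headphone at Target" is an example of a "Bought" relationship that relates a
User to a Product and a Store. Such relationships map to vertices, not edges (unless
you are using hypergraphs). Hence we think it is misleading to think of edges as
relationships and vertices as entities.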
Chapter 4
Normalizing Graph Schemas
This section looks at how graph schemas can be manipulated and transformed into
equivalent schemas. This is similar to the splitting and merging of tables in
relational data models, typically performed to normalize or denormalize a relational
schema.
Normalization of relational databases
The goal of database normalization is to make sure that relational schemas are easy
to modify, easy to extend, informative to users and supportive of various query
patterns. The various normal forms, such as 1NF, 2NF, and so on, define constraints
that a table must satisfy to be considered normalized. The definitions are
mathematical, but the underlying idea is simple. Consider this example from the
Wikipedia page on 3NF:
The previous figure breaks up the tournament winners table into two tables, one with
player dates of birth. Terms like "functional dependencies" and "non-prime
attributes" are hard to remember, but the process of splitting and merging tables is
intuitive. In fact, if there was an existing table which had one row per player, we'd
probably move the "date of birth" to that table.
Transformation rules that produce equivalent schemas
This section lists some transformation rules that produce equivalent graph schemas.
A graph schema is equivalent to another graph schema if the data stored in one
schema, along with the applications that access it, can be ported to the other
schema, and vice versa. These rules are like splitting and merging tables in
relational models.
The transformation rules in this section can be mechanically applied to any schema,
and have no effect on the information stored. By applying them, you could simplify
the semantics and improve the usability of your graph model.
Rule A: Renaming properties and labels
This rule consists of three transformations that result in equivalent schemas:
– Any vertex label can be renamed, so long as the new name doesn't refer to an
  existing vertex label.
– Any edge label can be renamed, so long as the new name doesn't refer to an existing
  edge label between the out and in vertex types.
– Any vertex/edge property can be renamed, so long as the new name doesn't refer to
  an existing property of the vertex/edge type.
The following figure illustrates some example applications of this rule on vertex and
edge labels:
Figure 10: Renaming properties and labels
The schema shown at the top is a simple graph schema showing family relationships.
This schema is transformed to the schema shown at the bottom of the figure using the
following transformations:
– Vertex labels Man and Woman are renamed to Male and Female.
– Edge labels mother (2 instances) and father (2 instances) are renamed to parent.
Even though it seems like some information is lost by renaming mother/father to
parent, this isn't true because the vertex labels at the endpoints (Male/Female) have
that information. The same transformation wouldn't be so obvious while looking at an
instance of this graph, like the Kennedy family tree.
Note that you cannot rename 'wife' to 'parent' because there already exists a parent
edge type from Male to Female.
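The edge-label part of Rule A can be checked mechanically. The following is a minimal sketch, assuming a schema is represented as a set of (out vertex type, edge label, in vertex type) triples; the function name `rename_edge_label` and the exact endpoints of the mother/father edge types are assumptions made for illustration.

```python
def rename_edge_label(edge_types, old, new):
    """Rename an edge label in a schema given as a set of
    (out vertex type, edge label, in vertex type) triples."""
    renamed = set()
    for out_type, label, in_type in edge_types:
        if label == old:
            # Rule A: the new name must not clash with an existing edge label
            # between the same out and in vertex types.
            if (out_type, new, in_type) in edge_types:
                raise ValueError(f"{new!r} already exists from {out_type} to {in_type}")
            label = new
        renamed.add((out_type, label, in_type))
    return renamed

schema = {
    ("Male", "mother", "Female"), ("Female", "mother", "Female"),
    ("Male", "father", "Male"), ("Female", "father", "Male"),
    ("Male", "wife", "Female"),
}
schema = rename_edge_label(schema, "mother", "parent")
schema = rename_edge_label(schema, "father", "parent")
```

After both renames the schema still has five edge types. A further call `rename_edge_label(schema, "wife", "parent")` raises `ValueError`, because a parent edge type from Male to Female already exists.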
Rule B: Reversing edge directions
This rule states that an edge type can be reversed provided it is a self loop, or
there is no edge type with the same label in the opposite direction. The cardinality
must be reversed as well.
Figure 11: Reversing edge directions
The following figure illustrates an example transformation using this rule and the
previous rule. The steps are:
– The 'wife' edge is renamed to 'husband' (rule A) and then reversed.
– Each parent edge is renamed to 'son' or 'daughter' and reversed.
[Link]
, JFKJrparent>[Link]>JFKSr.
You could always rename the four 'son' and 'daughter' edge types to 'child' using
rule A. Again, no information is lost since the vertex labels are still unique.
You would, however, not be able to rename 'husband' to 'son'. You could rename
'husband' to 'daughter' (though absurd). The application will have to interpret "male
daughters" as husbands. Having done so, you would not be able to reverse its
direction.
As you can see already, some applications of these rules may be quite hard to derive
if you are thinking in terms of graph instances, rather than graph schemas.
Rule C: Property displacement
Figure 12: Property displacement
This rule states that a property on an edge type can be moved to either adjacent
vertex type, provided each vertex of that type has at most one such edge. Conversely,
a property on a vertex type can be moved to an adjacent edge type with look-across
cardinality of 1, provided the edge always exists when the property exists.
The adjoining figure clarifies the rule, where the 'dateOfBirth' property is moved to
the 'mother' relationship because there is exactly one mother relationship per
Man/Woman and it always exists. If we rename 'dateOfBirth' to 'deliveryDate', one
could argue that the property belongs in the edge and not the vertex.
Note that a Man's dateOfBirth cannot be displaced to the wife relationship because
that relationship may not exist. Similarly, the dateOfBirth in the edge type labeled
'mother' from Man to Woman in the bottom schema cannot be moved to Woman because of
the cardinality restrictions in the rule.
Using this rule, you can move the properties around the schema to come up with a
better design. The rule also helps you fit the schema to the capabilities of the
graph database. For example, if a graph database only supports indexes on vertex
properties, you can use this rule to move an edge property to a vertex. Similarly, if
a graph database supports vertex-centric indexes based on properties on adjacent
edges/vertices, you can use this rule to bring the indexed property closer to the
vertex type of interest.
Rule D: Specialization and generalization
This rule states that:
– Any vertex type can be divided into two disjoint vertex types based on a Boolean
  test on the properties and adjacent edge labels of a vertex belonging to that type.
– Any edge type can be divided into two disjoint edge types based on a Boolean test
  on the properties and adjacent vertex labels of an edge belonging to that type.
Figure 13: Generalization
In other words, if we provide a Boolean function that gives a T/F result for a given
vertex/edge, we can use that function to divide a vertex/edge type into two different
types.
The reverse rule states that:
– Any vertex/edge type can be merged into another vertex/edge type provided there is
  a Boolean test that can distinguish its vertices/edges from the others in the
  merged type.
The adjoining figure shows an example transformation involving the following steps:
– Male and Female are generalized as Person, because the Boolean test, sex equals
  'M', can distinguish Male from Female.
– After that, son and daughter edge types are generalized as child because the
  Boolean test, sex of in vertex equals 'M', can distinguish son from daughter.
This rule is useful in increasing the specificity, or reducing the complexity, of the
graph schema.
As a general principle, it is better to use this rule for specialization, i.e.,
increasing the specificity, because that allows the different vertex and edge types
to embrace different properties and relationships. However, there are instances where
the differences between the vertex types are so minor that specialization only
results in application complexity. In such cases it is better to generalize, as with
Male and Female to Person.
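As a rough illustration of Rule D on an instance graph, the sketch below generalizes Male and Female vertices into Person and then specializes them again using the Boolean test on the 'sex' property. The representation (a dict from vertex id to a (label, properties) pair) and the function names are assumptions made for this example.

```python
def generalize(vertices, labels, merged_label):
    """Merge every vertex whose label is in `labels` into `merged_label`."""
    return {vid: (merged_label if label in labels else label, props)
            for vid, (label, props) in vertices.items()}

def specialize(vertices, merged_label, test, label_if_true, label_if_false):
    """Split `merged_label` into two labels using a Boolean test on the properties."""
    result = {}
    for vid, (label, props) in vertices.items():
        if label == merged_label:
            label = label_if_true if test(props) else label_if_false
        result[vid] = (label, props)
    return result

family = {"jfk": ("Male", {"sex": "M"}), "jackie": ("Female", {"sex": "F"})}
people = generalize(family, {"Male", "Female"}, "Person")
restored = specialize(people, "Person", lambda p: p["sex"] == "M", "Male", "Female")
```

Because the test can tell the two original types apart, `restored` is identical to `family`, which is what makes the generalization lossless.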
Rule E: Edge promotion
Figure 14: Edge promotion
This rule states that an edge type can be promoted to a vertex type by adding two
"out" edge types that connect the new vertex type to the endpoints of the original
edge type.
The cardinality of the new edge types is N:1 or 1:1, depending on the look-across
cardinality of the original endpoint vertex's type.
Note that the direction of the new edge types can be changed using the rename and
reverse rules. We use the "out" direction to simplify the way in which cardinality
for the new edge types is derived.
The adjoining figure shows the husband edge promoted to a vertex type called
'Marriage'. The edge types 'husband' and 'wife' point to the two endpoints of the
original edge type.
The edge promotion rule is useful in preparing a binary relationship to become an
N-ary relationship.
The reverse rule states that any vertex type with two property-less edge types, with
same-side cardinality of exactly 1, can be demoted to an edge type. If the edge types
have properties, you can first use property displacement (rule C) to move the
properties out of the edges.
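Edge promotion and its reverse can be sketched on an instance graph as follows. The data layout (vertices as id -> (label, properties), edges as (out, label, in, properties) tuples), the function names, the generated ids such as "marriage-1" and the 'since' property are all hypothetical choices made for this illustration.

```python
def promote_edges(vertices, edges, label, vertex_label, out_role, in_role):
    """Replace every edge with `label` by a new vertex plus two "out" edges."""
    vertices = dict(vertices)
    result = []
    count = 0
    for out_v, lbl, in_v, props in edges:
        if lbl != label:
            result.append((out_v, lbl, in_v, props))
            continue
        count += 1
        vid = f"{vertex_label.lower()}-{count}"      # hypothetical id scheme
        vertices[vid] = (vertex_label, dict(props))  # edge properties move to the vertex
        result.append((vid, out_role, out_v, {}))
        result.append((vid, in_role, in_v, {}))
    return vertices, result

def demote_vertices(vertices, edges, vertex_label, out_role, in_role, label):
    """Reverse of promote_edges: turn each promoted vertex back into one edge."""
    promoted = {vid for vid, (lbl, _) in vertices.items() if lbl == vertex_label}
    ends = {vid: {} for vid in promoted}
    result = []
    for out_v, lbl, in_v, props in edges:
        if out_v in promoted:
            ends[out_v][lbl] = in_v
        else:
            result.append((out_v, lbl, in_v, props))
    for vid in sorted(promoted):
        result.append((ends[vid][out_role], label, ends[vid][in_role], dict(vertices[vid][1])))
    remaining = {vid: v for vid, v in vertices.items() if vid not in promoted}
    return remaining, result

vertices = {"jackie": ("Female", {}), "jfk": ("Male", {})}
edges = [("jackie", "husband", "jfk", {"since": 1953})]
pv, pe = promote_edges(vertices, edges, "husband", "Marriage", "wife", "husband")
dv, de = demote_vertices(pv, pe, "Marriage", "wife", "husband", "husband")
```

Promoting and then demoting returns the original vertices and edges, which is the sense in which the two schemas are equivalent.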
Rule F: Property promotion
Figure 15: Property promotion
This rule states that any group of properties can be promoted to a new vertex type
with those properties, provided the new vertex type has edges connecting it to all
existing vertex types that include the property group. The same-side cardinality of
the new edge type is 1.
The adjoining figure shows the 'sex' property promoted to a new vertex type. The new
vertex type has two vertices, one per value of the property. In other words, every
person in the new graph will have an outgoing 'isa' edge to one of the two new
vertices.
This rule is equivalent to the splitting of a relation into two relations, as shown
in the first figure. Any group of properties, typically ones that repeat, can be
promoted to a vertex.
While applying this rule, it is better to include all vertex types that have the same
group of properties. For example, if there is a 'sex' property in a different Animal
type, it is better to connect it to the same new vertex type. If edge types include
the property group, you can first promote those edge types to vertices.
The reverse of this rule is that a vertex type that has property-less edge types with
same-side cardinality of 1 can be demoted to a group of properties on the adjacent
vertex types. This is similar to denormalizing a table in the relational model, which
is useful to reduce the number of joins (or traversals in the case of graph
databases).
Rule G: Property expansion
Figure 16: Property expansion
This rule states that a property of a vertex type that represents a list of values
can be moved to a new vertex type. The new vertex type has an "in" edge type from the
existing vertex type with cardinality 1:N. The adjoining figure shows this rule
applied to the nickname property, which holds a list of Strings.
The reverse rule states that any vertex type with exactly one property-less edge type
with look-across cardinality of exactly 1 can be removed after moving its properties
to a list in the adjacent vertex type.
Many graph databases, however, support list-valued properties. Modeling nicknames as
a List or as a separate vertex type is up to the designer.
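Property expansion can be illustrated with a small sketch that moves a list-valued property into separate vertices. The representation (vertices as id -> (label, properties), edges as (out, label, in) triples), the function name `expand_property`, the generated ids and the 'value' property key are assumptions made for this example.

```python
def expand_property(vertices, edges, key, new_label, edge_label):
    """Move the list-valued property `key` into new vertices of type `new_label`."""
    new_vertices, new_edges = {}, list(edges)
    for vid, (label, props) in vertices.items():
        values = props.get(key, [])
        # The original vertex keeps every property except the expanded one.
        new_vertices[vid] = (label, {k: v for k, v in props.items() if k != key})
        for n, value in enumerate(values):
            value_id = f"{vid}/{key}/{n}"            # hypothetical id scheme
            new_vertices[value_id] = (new_label, {"value": value})
            # One 1:N edge from the existing vertex to each new vertex.
            new_edges.append((vid, edge_label, value_id))
    return new_vertices, new_edges

vertices = {"jfk": ("Person", {"name": "John F. Kennedy", "nicknames": ["Jack", "JFK"]})}
expanded_vertices, expanded_edges = expand_property(
    vertices, [], "nicknames", "Nickname", "nickname")
```

The person vertex loses its 'nicknames' list, and each nickname becomes a Nickname vertex reached through a 'nickname' edge.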
Summary
Rule-based schema transformations are tools that a data model designer can use to
rewrite a graph schema without losing information. Specifically, a data model
designer can use these rules to select the directions of edges, the names of
different labels and keys, the locations of various properties, and so on. These
choices don't matter from a pure information perspective, but could make a big
difference in usability and efficiency.
In that sense, a data model designer can go back to Codd's original goals for
normalization: designing schemas that are easy to modify, easy to extend, informative
to users and supportive of various query patterns.
Chapter 5
The previous section listed seven rule-based schema transformations such as renaming
labels, reversing edges, promoting edges and properties to vertices, and so on. These
rule-based transformations can be mechanically applied to any graph schema, without
losing any information. Using these rules, a graph database designer can start with a
design generated from an entity-relationship model and tweak it to get a final
design.
Figure 17: Example graph schema
The above figure shows an example graph schema describing constraints on the graph
data model such as:
– What are the legal labels for vertices?
– What are the legal edge labels between two vertex types?
– What are the legal property keys and value types at each edge or vertex type?
The reality, however, is that a graph model could have other constraints that aren't
expressed in the schema. For example, the 'inviter' edge in every Invitation must be
to the User who has an 'owns' edge to the 'page' of that Invitation. This constraint
isn't captured in the above schema.
The question is: How can we model complex constraints in a graph model?
Graph universes, transformations and equivalence
A graph universe U is a set of graphs. A graph universe represents a data model in
the sense that it captures every valid graph that belongs to the data model.
Redefining equivalence using transformation functions
Figure 18: An invertible function
A graph transformation T is a function that takes graphs from one universe U to
another universe V. In other words, T: U → V.
A universe U is equivalent to a universe V if there is a transformation function
T: U → V that is invertible. Invertible functions establish a one-to-one
correspondence between two sets, which in this case are graph universes.
In other words, given any graph G ∈ U, we can use T(G) to get a graph G' ∈ V. We can
then use the inverse function T⁻¹(G') to get back G. No information is lost.
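To make the idea of an invertible transformation concrete, here is a minimal sketch in which a graph is a frozenset of (out vertex, edge label, in vertex) triples. It assumes a universe U whose graphs contain no 'parent' edges, so that renaming 'mother' to 'parent' can be undone. The function names are hypothetical.

```python
def rename_label(graph, old, new):
    """Return a copy of the graph with one edge label renamed."""
    return frozenset((out_v, new if label == old else label, in_v)
                     for out_v, label, in_v in graph)

def T(graph):
    # Transformation from U to V.
    return rename_label(graph, "mother", "parent")

def T_inverse(graph):
    # Inverse transformation from V back to U.
    return rename_label(graph, "parent", "mother")

G = frozenset({("jfk", "mother", "rose"), ("jfk", "wife", "jackie")})
round_trip = T_inverse(T(G))

# By contrast, merging two labels is not invertible in a universe without
# vertex labels: two different graphs map to the same result.
def merge(graph):
    return rename_label(rename_label(graph, "mother", "parent"), "father", "parent")

G1 = frozenset({("jfk", "mother", "rose")})
G2 = frozenset({("jfk", "father", "rose")})
collision = merge(G1) == merge(G2)
```

Here `round_trip` equals `G`, so T establishes a one-to-one correspondence. `collision` is true, which shows that `merge` loses information when nothing else in the graph (such as vertex labels) distinguishes mothers from fathers.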
A programming perspective
If we are upgrading from one graph model to another, the transformation function is
the upgrade script. If the upgrade script has a matching downgrade script, then we
have two equivalent models (or universes). In other words, two graph models,
represented as universes or schemas, are equivalent if they are forward and backward
compatible.
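Derived types
A vertex type is called a derived vertex type in U if every graph G ∈ U is such that
its vertices (and adjacent edges) belonging to the vertex type can be calculated from
the rest of the graph.
In other words, given any graph in the universe U, after we remove all vertices
corresponding to the derived vertex type, we can recompute them from what remains.
Derived edge and property types are defined similarly. Note that derived types are
not defined in graph schemas, but are specific to graph universes that are compatible
with that schema.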
Metarule: Adding and removing derived types
Finally, here is the metarule behind all schema transformations:
Given any graph universe U compatible with a schema S, we can add a derived
vertex/edge/property type to produce an equivalent graph universe V compatible with
the schema S ∪ {derived type}.
The reverse rule states that:
Given any graph universe U compatible with a schema S, we can remove a derived
vertex/edge/property type to produce an equivalent graph universe V compatible with
the schema S − {derived type}.
Figure 19: Modified graph schema
The 'inviter' edge type in the graph schema shown in the first figure is a derived
edge type. This is because the 'inviter' edges can be calculated by going from the
Invitation vertices to the Page and back to the User through the 'owns' edge (in the
reverse direction). We can simplify the original schema to the version shown in the
adjoining figure by applying these rules:
– (Metarule) Remove the derived edge type 'inviter'.
– (Edge promotion) Demote the now binary relationship Invitation to an edge called
  'invited'.
As you can see, the updated schema is simpler than the original schema derived from
an ER diagram.
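The constraint that the 'inviter' of an Invitation is the User who owns its page means those edges can be dropped and recomputed. The sketch below shows this on a tiny instance graph stored as a set of (out vertex, edge label, in vertex) triples; the function names and sample data are hypothetical.

```python
def remove_inviter(edges):
    """Transformation that drops the derived 'inviter' edges."""
    return {e for e in edges if e[1] != "inviter"}

def add_inviter(edges):
    """Inverse transformation: recompute 'inviter' edges from 'page' and 'owns'."""
    owner_of = {page: user for user, label, page in edges if label == "owns"}
    derived = {(invitation, "inviter", owner_of[page])
               for invitation, label, page in edges if label == "page"}
    return set(edges) | derived

G = {
    ("alice", "owns", "p1"),
    ("inv1", "page", "p1"),
    ("inv1", "invitee", "bob"),
    ("inv1", "inviter", "alice"),
}
smaller = remove_inviter(G)
restored = add_inviter(smaller)
```

Since `restored` equals `G`, removing the derived edge type loses no information for any graph that satisfies the ownership constraint.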
Proving the metarule
The metarule is easy to prove. The transformation function to remove a derived type
simply removes all elements that belong to that type. The inverse function recomputes
those elements from the remaining graph.
Hence the universe with the derived type is equivalent to the universe without it.
Proving the 7 rules: Renaming, Reversing, Property Displacement, ...
Each of the seven rules can be proved using the metarule. Take the edge renaming
rule, which states:
Any edge label can be renamed, so long as the new name doesn't refer to an existing
edge label between the out and in vertex types.
We can prove this in two steps:
1. Add a derived edge type with the new name as a copy of the old edge type.
2. Remove the old edge type, which is now derivable from the new edge type.
Of course, step 1 requires that the edge type with the new name doesn't already exist
in the schema. Otherwise, not all edges of that edge type could be derived from the
old edge type. Hence the condition "as long as the new name doesn't refer to an
existing edge label."
In this manner, we can prove each rule by performing some steps to first add new
derived types and then remove the existing types, which become derived types
themselves.
Beyond transformation rules
Thinking in terms of graph universes, derived types and transformation functions lets
us do more than apply the seven rules. How we use these rules or transformations
depends on our overall strategy for data modeling.
One strategy is to minimize the number of implicit constraints not captured by the
schema. For instance, the schema shown in the second figure doesn't have the implicit
constraint on the 'inviter' edge. In general, fewer implicit constraints means less
duplication of information. This is the same goal as normalization in relational
databases.
The opposite strategy is to add derived types to the schema. Such strategies have
been popularized by "denormalization" techniques such as dimensional modeling. For
instance, we could add a "shortcut" derived edge type called 'latest' from User to
Page to show the last page for that user. This comes at the cost of additional
implicit constraints. Applications that write to the graph must be designed with
these constraints in mind.
Summary
This section introduced set-theoretic representations of graph models called graph
universes, along with graph transformation functions. Specifically, this section
showed that two graph universes are equivalent if there is an invertible graph
transformation function between them.
Finally, this section showed that all schema transformation rules presented in the
earlier section can be derived from one metarule that deals with adding and removing
derived types.
Validating graph schemas
The last few sections have discussed how property graph schemas can help design graph
databases from ER models. After reading this thread on the Gremlin users group, we
realized that it is easy to validate graphs against schemas with Gremlin and Groovy.
Figure 20: Tinkergraph schema
This gist on Github shows how you can take an instance graph and check to see if it
is compatible with a schema. The schema is itself stored as a graph, whose vertices
and edges describe the vertex and edge types. Here's the code to create a schema
graph inside a Gremlin shell for the classic Tinkerpop schema shown here:
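// Schema graph for the classic TinkerGraph example.
// The receiver names, setProperty calls and Java class names below are
// reconstructed from context and should be checked against the original gist.
sg = new TinkerGraph()
person = sg.addVertex()
person.setProperty('_label', 'person')
person.setProperty('name', 'java.lang.String')
person.setProperty('age', 'java.lang.Integer')
software = sg.addVertex()
software.setProperty('_label', 'software')
software.setProperty('name', 'java.lang.String')
software.setProperty('lang', 'java.lang.String')
knows = person.addEdge('knows', person)
knows.setProperty('weight', 'java.lang.Float')
created = person.addEdge('created', software)
created.setProperty('weight', 'java.lang.Float')
created.setProperty('_minIn', 1) // Someone must create the software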
The properties have values corresponding to the Java Class of the property values in
the instance graph. A class name can end with '?' to indicate that the property is
optional. The edges in the schema graph can have 4 special properties, viz. _minIn,
_maxIn, _minOut and _maxOut, to indicate cardinality restrictions for various edge
types.
Any instance graph, g, can be validated against the schema stored in sg, using the
Gremlin script:
[Link]({checkVertex(it,sg)})
You can look at the full Github gist to see how the validation is done.
The current version of Tinkerpop doesn't support vertex labels. So the function that
maps the vertex to the vertex type is specific to the graph, like this:
vertexType = { v, sg -> v.age ?
    sg.V('_label', 'person').next() : sg.V('_label', 'software').next() }
Most graph schemas typically have a property named 'type' that would make this
mapping easier.
However, with Tinkerpop 3, this method can be standardized to use the label:
vertexType={v,sg>sg.V('label',[Link]).next()}
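The validation idea can also be sketched outside Gremlin. The following Python sketch is not the gist's code: it assumes a plain-dictionary schema, Python type names in place of Java class names, a trailing '?' for optional properties, and a single `min_in` cardinality restriction standing in for `_minIn`. Marking 'age' as optional is an assumption made only for illustration.

```python
PYTHON_TYPES = {"str": str, "int": int, "float": float}

def check_properties(props, spec):
    """Check property keys and value types; a trailing '?' marks an optional key."""
    errors = []
    for key, type_name in spec.items():
        optional = type_name.endswith("?")
        expected = PYTHON_TYPES[type_name.rstrip("?")]
        if key not in props:
            if not optional:
                errors.append(f"missing property {key!r}")
        elif not isinstance(props[key], expected):
            errors.append(f"property {key!r} should be {expected.__name__}")
    errors.extend(f"unexpected property {key!r}" for key in props if key not in spec)
    return errors

def validate(vertices, edges, schema):
    """vertices: id -> (label, props); edges: list of (out id, label, in id, props)."""
    errors = []
    for vid, (label, props) in vertices.items():
        spec = schema["vertex_types"].get(label)
        if spec is None:
            errors.append(f"{vid}: unknown vertex label {label!r}")
        else:
            errors.extend(f"{vid}: {e}" for e in check_properties(props, spec))
    incoming = {}
    for out_id, label, in_id, props in edges:
        key = (vertices[out_id][0], label, vertices[in_id][0])
        spec = schema["edge_types"].get(key)
        if spec is None:
            errors.append(f"illegal edge type {key}")
            continue
        errors.extend(f"{key}: {e}" for e in check_properties(props, spec["properties"]))
        incoming[(in_id, label)] = incoming.get((in_id, label), 0) + 1
    # Cardinality check: minimum number of incoming edges per in-vertex.
    for (_, label, in_type), spec in schema["edge_types"].items():
        for vid, (vlabel, _props) in vertices.items():
            if vlabel == in_type and incoming.get((vid, label), 0) < spec.get("min_in", 0):
                errors.append(f"{vid}: fewer than {spec['min_in']} incoming {label!r} edges")
    return errors

schema = {
    "vertex_types": {
        "person": {"name": "str", "age": "int?"},
        "software": {"name": "str", "lang": "str"},
    },
    "edge_types": {
        ("person", "knows", "person"): {"properties": {"weight": "float"}},
        ("person", "created", "software"): {"properties": {"weight": "float"}, "min_in": 1},
    },
}
vertices = {
    "marko": ("person", {"name": "marko", "age": 29}),
    "lop": ("software", {"name": "lop", "lang": "java"}),
}
edges = [("marko", "created", "lop", {"weight": 0.4})]
errors = validate(vertices, edges, schema)
```

For this sample graph `errors` is empty. Validating the same vertices with no edges reports that the software vertex lacks its required incoming 'created' edge.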
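Pixy: First order logic on graph databases
The previous sections have shown that any ER model can be converted to a property
graph schema, and then transformed into equivalent schemas. However, one key question
remains:
Do graph databases offer the same querying capabilities as relational databases?
In other words, data from an ER model can be stored in a graph database, but can data
stuffed in this fashion be queried effectively?
This is the subject of this section.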
Background
On SQL
SQL is the query standard for relational databases. It first appeared in the 1970s
and was based on relational algebra. Codd's theorem showed that relational algebra is
equivalent to relational calculus, a form of first-order logic. This theorem is the
bedrock of SQL's expressive power.
On first order logic
Using relational algebra, we can write any query of the form "Find all rows from
tables A, B, C, ..., matching some predicate", as long as the predicate can be
expressed in first-order logic.
Specifically, the predicate is formed using:
– various comparisons on rows and columns,
– logical operations "and" (∧), "or" (∨) and "not" (¬), and
– the universal "for every" (∀) and existential "there exists" (∃) quantifiers that
  operate on rows of a given table.
Let's consider tables named person, car and ticket, and the query "find me people who
own only BMW cars, but have at least one speeding ticket". The predicate can be
written as:
my_query(person) =
    (∀car, person owns car ⇒ car.make = 'BMW') ∧ (∃ticket, person has ticket)
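On Gremlin
Gremlin is a graph traversal language that works with many graph databases. To learn
more about it, see:
– Gremlin Wiki on Github
– GremlinDocs
– The Pathological Gremlin (presentation)
Gremlin expresses queries as traversals. E.g., something like "find the friend of a
friend of vertex v" can be written as v.out('friend').out('friend'). This style of
traversal with vertices and edges isn't natural in SQL with tuples.
The declarative querying style of SQL is, however, different from Gremlin. The
SQL2Gremlin project shows how SQL queries map to Gremlin, but the mapping isn't
obvious.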
Pixy: First order logic with Gremlin
Pixy is a bridge from first-order logic to Gremlin. Using Pixy, you can write queries
of the form "Find vertices and edges that match some predicate" where the predicate
is formed by:
– various comparisons on vertex and edge properties,
– logical operations "and" (∧), "or" (∨) and "not" (¬), and
– the universal "for every" (∀) and existential "there exists" (∃) quantifiers that
  operate on vertices and edges.
Pixy queries are expressed using Prolog rules, also known as Horn clauses.
Prolog, like SQL, has the full expressive power of first-order logic.
Let's take the predicate from the earlier discussion,
my_query(person) =
    (∀car, person owns car ⇒ car.make = 'BMW') ∧ (∃ticket, person has ticket)
Let's say that we represent people as vertices with outgoing edge types named 'car'
and 'ticket' to the corresponding vertices. Then, we could express the above
predicate using Horn clauses as follows:
my_query(Person, Ticket) :- out(Person, 'ticket', Ticket),
                            not(not_all_bmw(Person)).
not_all_bmw(Person) :- out(Person, 'car', Car),
                       property(Car, 'make', Make),
                       Make <> 'BMW'.
Horn clauses express the existential quantifier ∃ directly through their variables.
The universal quantifier ∀ is expressed through negation. In other words, saying
"every car is a BMW" is the same as saying "there is no car that isn't a BMW".
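The equivalence between "every car is a BMW" and "there is no car that isn't a BMW" can be checked on a small data set. The sketch below uses plain Python dictionaries as a stand-in for the graph; the sample people, cars and tickets are invented for illustration.

```python
# Hypothetical sample data: person -> makes of owned cars, person -> tickets.
cars = {"ann": ["BMW", "BMW"], "bob": ["BMW", "Audi"], "cy": []}
tickets = {"ann": ["t1"], "bob": ["t2"], "cy": ["t3"]}

def owns_only_bmw(person):
    # Universal form: for every car owned by the person, the make is BMW.
    return all(make == "BMW" for make in cars[person])

def not_all_bmw(person):
    # Existential form: there exists a car owned by the person that isn't a BMW.
    return any(make != "BMW" for make in cars[person])

def my_query(person):
    # Mirrors the Horn clauses: at least one ticket, and no non-BMW car.
    return bool(tickets[person]) and not not_all_bmw(person)

result = [person for person in cars if my_query(person)]
```

For every person, `owns_only_bmw(p)` equals `not not_all_bmw(p)`. Note that a person with no cars satisfies the universal condition vacuously, so "cy" matches along with "ann".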
ER models in Pixy
If you use an ER model as a starting point for your design, you can reconstitute the
ER model from the graph schema using Pixy. The example ER model had entities named
User, Page and Tag and relationships named Owns, Invites and TaggedAs.
Figure 21: ER model for User-Page-Tag application
This was translated to a graph schema with four types of vertices, viz. User, Page,
Tag and Invitation.
Figure 22: Graph schema for User-Page-Tag application
Now, we can reconstitute the ER model from the graph schema using Pixy with the
following clauses:
% Entities
user(User, Name, Login) :- property(User, 'name', Name), property(User, 'login', Login).
page(Page, Uri, Html, CreateTs) :- property(Page, 'uri', Uri), ...
tag(Tag, Hashtag, Description) :- property(Tag, 'hashtag', Hashtag), ...
% Relationships
owns(User, Page) :- out(User, 'owns', Page).
taggedAs(Page, Tag) :- out(Page, 'taggedas', Tag).
invites(Invitation, Inviter, Invitee, Page) :-
    out(Invitation, 'invitee', Invitee),
    out(Invitation, 'inviter', Inviter),
    out(Invitation, 'page', Page).
These clauses define the entities and relationships of the ER model in terms of
vertices, edges and properties. Using them, you get the full power of first-order
logic over the ER model. In other words, any first-order predicate that applies to
entities and relationships can be written as a Pixy query that uses the above
clauses.
Let's take an example predicate that matches all users invited to pages tagged
'tinkerpop' that were created after 1/1/2014. This can be written as:
tinkerpop_invitee(User, Page) :- invites(_, _, User, Page),
                                 page(Page, _, _, CreateTs),
                                 CreateTs > 1388534400L, % Unix timestamp for 1/1/2014
                                 taggedAs(Page, Tag),
                                 tag(Tag, 'tinkerpop').
Note that '_' is used to represent anonymous variables.
Query requirements don't usually matter while modeling
It isn't surprising that queries in first-order logic can be compiled to Gremlin,
since Gremlin is a general-purpose traversal language. What matters more is that a
query on an ER model can be compiled to something that executes "efficiently" on the
corresponding graph database.
By "efficiently", we mean that the Pixy/Gremlin query will always traverse edges to
go from one entity/relationship to another. Edge traversals are typically orders of
magnitude faster than index-based joins in relational databases.
Queries on properties will, of course, need indexes. But as long as your starting ER
model is accurate, your application will not have to simulate joins using these
indexes. In other words, the graph schema design is independent of the query
requirements.
Part II
Chapter 8: Introduction to Database
Database Systems evolution: Databases and database technology are vital to modern
organizations, supporting both daily operations and decision making. Despite Oracle's
dominance of the enterprise DBMS marketplace, the industry remains highly competitive
with a continued high level of innovation [12].
Figure 1: Evolution of database technology
Major periods of database technology evolution [12]:
1st Generation (1960's): File oriented – Supported sequential and random searching of
files, but the user was required to write computer programs to access the data.
2nd Generation (1970's): Navigational – Could manage multiple entity types and
relationships. Standards were developed during this period.
3rd Generation (1980's): Relational with non-procedural access – Foundation based on
mathematical relations and associated operators. Optimization technology was
developed. Pioneering research was performed to enable the commercialization of
relational database technology.
4th Generation (1990's+): Object oriented – Extending the boundaries of database
technology with new kinds of distributed processing and data warehouse processing.
DBMS marketplace: Despite Oracle's dominance of the enterprise DBMS marketplace, with
more than 40% overall market share, the industry remains highly competitive. Its
competitors include Microsoft SQL Server, IBM DB2, Teradata and SAP. Open-source DBMS
products have begun to challenge the commercial DBMS products. The open-source DBMS
market is led by MySQL, followed by MongoDB. In the desktop DBMS market, Microsoft
Access dominates because of the dominance of Microsoft Office.
Figure 2: DBMS marketplace
Innovation in the industry: The advances in DBMS in recent years support business
needs. NoSQL technology has been developed to support the needs of Big Data and of
modern web-scale applications. Since 2009, the most accepted definition of NoSQL has
been next-generation databases that are non-relational, distributed, open-source and
horizontally scalable. Other common characteristics are: schema-free, scalability,
global availability, easy replication support, simple API, eventually
consistent/BASE (not ACID), and large-scale data. [5][19]
Types of DBMS
DB-Engines is an initiative that provides information on the popularity of the DBMS
available in the market. It publishes rankings, which are updated monthly.
Figure 3: DBMS developed by database model pie chart
The pie chart above represents the categories of DBMS that comprise the most systems.
Relational DBMS form the largest category, followed by key-value stores, with 63
systems, document stores, with 43 systems, and graph DBMS, with 27 systems.
In the overall classification of database models, the following DBMS types are
distinguished.
Types of DBMS:
Relational DBMS                    Graph DBMS
Key-value stores                   Time Series DBMS
Document stores                    RDF stores
Object oriented DBMS (Atkinson)    Native XML DBMS
Search engines                     Content stores
Multivalue DBMS                    Event Stores
Wide column stores                 Navigational DBMS
If, instead of counting the systems developed, the database models are ranked by
popularity, the picture changes. The most popular are relational DBMS, with 79.5%,
followed by document stores, 7.3%, search engines, 4.3%, key-value stores, 3.5%, wide
column stores, 3.1%, and graph DBMS, 1.1%.
The pie chart below represents the most recent popularity rank.
Figure 4: DBMS popularity by database model pie chart
In the pie chart above, it is clear that relational DBMS are the most widely used.
However, the state of the art is changing with innovations in the industry. Even
though the popularity percentages of NoSQL databases are minimal compared to
relational DBMS, the fact that they are recent and growing technologies is reason
enough to evaluate them more deeply.
NoSQL DBMS
Document Stores: The records stored are called documents. [18] Examples: Elastic,
MongoDB, Azure DocumentDB
Wide Column Stores: While an RDBMS stores all the data of a particular table's rows
together on disk, which lets it retrieve a particular row fast, column family
databases are able to retrieve a large amount of a specific attribute fast by
serializing all the values of a particular column together on disk. This approach is
useful for aggregate queries. [18] Examples: Hadoop/HBase, Cassandra, Amazon SimpleDB
Graph Databases: The data structure consists of nodes and the connections, or edges,
between them. The downside is that they generally require all data to fit on one
machine, limiting their scalability. [18] Examples: Neo4j, InfiniteGraph, TITAN
Atomicity: All operations in a transaction succeed or every operation is rolled back.
Consistent: On the completion of a transaction, the database is structurally sound.
Isolated: Transactions do not contend with one another. Contentious access to data is
moderated by the database so that transactions appear to run sequentially.
Durable: The results of applying a transaction are permanent, even in the presence of
failures.
However, NoSQL databases break with the tradition of SQL models with ACID properties.
The BASE properties are better suited to most NoSQL databases, and they are as
follows:
Basic Availability: The database appears to work most of the time.
Soft-state: Stores don't have to be write-consistent, nor do different replicas have
to be mutually consistent all the time.
Eventual consistency: Stores exhibit consistency at some later point (e.g., lazily at
read time).
ACID transactions can be considered stricter than needed for many NoSQL cases. BASE
transactions, on the other hand, favor scale and availability. The BASE model is used
by aggregate stores, such as column family, key-value and document stores. In
contrast, graph databases use the ACID model. BASE databases promise availability of
the data at the expense of data consistency (the consistency of the data is only
assured at concrete snapshots). [16] Graph databases differentiate themselves from
other NoSQL databases by focusing more on data consistency.
The comparison made in the lines above is shown in the table below:
(Table: comparison of the ACID and BASE models)
The adoption of a DBMS type is an important factor in choosing it as the main system.
However, current trends show that the four main types of NoSQL databases are growing.
To give a more objective view of the benefits of each model, the use cases for which
they perform better and the ones for which they perform worst are listed below.
Use cases for relational databases
Positive use cases: transaction-oriented databases (banking applications, online
reservations), where the concurrency of many transactions must be supported and the
integrity of the data must be protected.
Negative use cases: data warehouses, which are analytically oriented. The constraints
of the relational database wouldn't support the scalability.
Use cases for key-value stores
Positive use cases:
– Storing user session data
– Maintaining schema-less user profiles
– Storing user preferences
– Storing shopping cart data
Negative use cases:
– Querying the database by specific data value
– Data with relationships between data values
– Operating on multiple unique keys
– Business needs that require frequently updating part of the value
Use cases for document stores
Positive use cases:
– E-commerce platforms
– Content management systems
– Analytics platforms
– Blogging platforms
Negative use cases:
– Running complex search queries
– Applications that require complex multi-operation transactions
Use cases for wide-column stores
Positive use cases:
– Content management systems
– Blogging platforms
– Systems that maintain counters
– Services that have expiring usage
– Systems that require heavy write requests (like log aggregators)
Negative use cases:
– Complex querying
– Query patterns that change frequently
– Situations without an established database requirement
Use cases for graph databases [19]
Positive use cases:
– Fraud detection
– Graph-based search
– Network and IT operations
– Social networks
Negative use cases:
– Data warehouses so big that they require the BASE model
Figure 6: Positions of NoSQL databases (source: Neo4j)
In the figure above, the five types of DBMS that were being compared are displayed
together. It can be concluded that each of those DBMS works for some specific use
cases, depending on the amount and complexity of the data that is going to be stored.
Their use cases do not overlap, which is why all five of them must be considered
before implementing a DBMS in a company.
Chapter 9: Graph Databases
Graph databases are databases whose specific purpose is the storage of
graph-oriented data structures. An introduction to graph theory is therefore useful
for using its terminology consistently.
Concepts of Graph Databases
Positioning: It has previously been explained that NoSQL databases address several
issues that relational databases do not: availability for the processing of large
data sets, partitioning, flexibility of the schema, and modelling and processing
complex structures like trees and graphs. Graph databases specialize in processing
highly connected data, managing complex and flexible data models, and improving the
performance of complex queries by traversing the graph.
The figures below show the difference between modeling the same use case in a
relational database and in a graph database. The graph model is more similar to the
business model, which makes it more accessible to non-technical profiles. [8]
(a) Relational Database Model    (b) Graph Database Model
Figure 7: Model Comparison
A graph is a pictorial representation of objects which are connected by links. It is
composed of two elements: nodes (vertices) and relationships (edges).
What is Graph database
A graph database is a database which is used to model the data in the form of a
graph. It has three main building blocks:
– Nodes
– Relationships
– Properties
Nodes: Nodes are the records/data in a graph database. Data is stored as properties,
and properties are simply name/value pairs. Storing data in Neo4j is similar to
adding records in other databases.
Relationships: Relationships connect nodes and specify how they are related.
Relationships always have a direction. Relationships always have a type.
Relationships form patterns of data.
Properties: Properties are named data values.
Popular Graph Databases
– Neo4j
– Oracle NoSQL Database
– OrientDB
– HyperGraphDB
– GraphBase
– InfiniteGraph
– AllegroGraph etc.
Why GraphDB
Graph databases are very useful nowadays because data increasingly exists in the form
of relationships between objects, and the relationship between the data is often more
valuable than the data itself.
Relational databases store highly structured data, with many records storing the same
type of data, so they are well suited to structured data. However, they do not store
the relationships between the data, while graph databases store relationships and
connections as first-class entities.
The data model for graph databases is simple compared to other databases, and they
can be used with OLTP systems. They provide features such as transactional integrity
and operational availability.
GraphDB vs NoSQL Database
The following points explain why a graph database can be a better fit than other
NoSQL databases for connected data:
Key-value, column-family and document stores keep sets of disconnected values,
columns and documents. This makes it difficult to use them for connected data and
graphs.
One well-known strategy for adding relationships to such stores is to embed an
aggregate's identifier inside the field belonging to another aggregate, effectively
introducing foreign keys.
But this requires joining aggregates at the application level, which quickly becomes
prohibitively expensive.
See the use cases of different types of databases:
Relational database: Data is represented in tabular form, so it is best for tasks
such as calculating income.
Key-Value Store: It is best for building a shopping cart.
Document store: Data is stored as a document, so it is best for storing structured
product information.
GraphDB: Data is represented as a graph, so it is best for describing how to get from
point A to point B.
Neo4j Data Model
Neo4j follows the Property Graph Model for storing and managing its data.
Neo4j is a graph database with the following features of the Property Graph Model:
– The graph model contains Nodes, Relationships and Properties, which specify the
  data and its operation.
– Properties are key-value pairs.
– Nodes are represented using circles and Relationships are represented using arrows.
– A Relationship specifies the relation between two nodes.
– There are two types of relationships between nodes according to their directions:
  Unidirectional and Bidirectional.
– Each Relationship connects two nodes: a "Start Node" or "From Node" and a "To Node"
  or "End Node".
– Both Nodes and Relationships contain properties.
In Neo4j, relationships must be directional. If you try to create a relationship
without a direction, it will throw an error message.
There are three main building blocks of a GraphDB data model:
– Nodes
– Relationships
– Properties
Following is a simple example of a Property Graph.
Figure 8: Simple Graph
Here, Nodes are represented using Circles and Relationships are represented using
Arrows. Each element's data is represented in terms of Properties (key-value pairs).
In this example, we have represented each Node's Id property within the Node's
Circle.
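The building blocks of the property graph model can be mirrored in a few lines of Python. This is an in-memory sketch for illustration only, not Neo4j's API or storage format; the class names, the `relate` and `neighbours` helpers, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    id: int
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)   # key-value pairs
    outgoing: list = field(default_factory=list)     # relationships that start here

@dataclass(eq=False)
class Relationship:
    type: str                                        # every relationship has a type
    start: Node                                      # "Start Node" / "From Node"
    end: Node                                        # "End Node" / "To Node"
    properties: dict = field(default_factory=dict)

def relate(start, rel_type, end, **properties):
    """Create a directed relationship and register it on its start node."""
    rel = Relationship(rel_type, start, end, properties)
    start.outgoing.append(rel)
    return rel

def neighbours(node, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [rel.end for rel in node.outgoing if rel.type == rel_type]

alice = Node(1, {"Person"}, {"name": "Alice"})
bob = Node(2, {"Person"}, {"name": "Bob"})
carol = Node(3, {"Person"}, {"name": "Carol"})
relate(alice, "FRIEND", bob, since=2015)
relate(bob, "FRIEND", carol)

friends_of_friends = [fof.properties["name"]
                      for friend in neighbours(alice, "FRIEND")
                      for fof in neighbours(friend, "FRIEND")]
```

Each node keeps direct references to its outgoing relationships, so the friend-of-a-friend lookup walks from node to node without consulting any global table.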
Queryperformance
GraphdatabasescompetitiveadvantageIthasbeensaidthatgraphdatabaseshavea
reasontobebecausetheyoutperform [Link]
[Link]
casethatisbettersuited forgraph databasesis"find allentitiesofa kind"
([Link]).Theexecutionofsuchaquery,startswithanindexlookuptofind
thestartingnode(s)[Link]-versed
[Link],thebiggerthevolumeof
data,themoreitoutperformsrelationaldatabases.
Figure9:Queryexecutioningraphdatabases
[Link]
queryingthroughdifferenttables,followingforeignkeysandotherindexes,anditwo
uld
[Link]
rmed byfollowingphysicalpointers,whileforeignkeysarelogicalpointers.
[8]Thequeryinthe figure,includesthetimeofeachindex-
[Link],the
largertheexecutiontimewillbecome.
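The difference between following physical pointers and resolving foreign keys can be illustrated with a small Python sketch. It is a deliberate simplification: real relational engines use indexes rather than full scans, and the table, names and functions below are invented for the example.

```python
# Relational style: relationships live in a separate table of key pairs.
friend_rows = [("ann", "bob"), ("bob", "cid"), ("ann", "dee"), ("cid", "eve")]


def friends_by_join(person):
    """Look at every row of the table to find the matching ones."""
    return [dst for src, dst in friend_rows if src == person]


# Graph style: each node keeps direct references to its neighbours.
adjacency = {}
for src, dst in friend_rows:
    adjacency.setdefault(src, []).append(dst)
    adjacency.setdefault(dst, [])


def friends_by_pointer(person):
    """Follow the references stored on the node itself."""
    return adjacency[person]


def friends_of_friends(person):
    """Two-hop traversal: only the neighbourhood of the start node is touched."""
    return sorted({fof for f in adjacency[person] for fof in adjacency[f]})
```

In the join version the work grows with the size of the whole table, while in the pointer version it grows only with the number of neighbours visited.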
Figure10:Queryexecutioninrelationaldatabases
Relational databases' competitive advantage
On the other hand, because of the internal structure of their tables, relational databases would outperform graph databases when the output requires all the attributes of a table (findAll-like queries). Their ideal use case is aggregating over a complete dataset. [8]
Graph databases ranking
The figure below shows the DB-Engines Ranking of graph DBMSs. Neo4j leads the ranking, and its score is triple that of the next DBMS, Microsoft Azure Cosmos DB. Neo4j has been leading the graph database sector for some years. It must be taken into account that the score is displayed on a logarithmic scale, so the difference in popularity is substantial.
ItcanalsobeseeninthetrendscatterplotthatMicrosoftAzureCosmosDBappearedin
thegraphdatabaselandscapein2014,andsincethenitsrisein
[Link]
wellintegratedinthesoftwaremarketplace.
Successfactor:Ithasbeenstated,whencomparingtheNoSQLDBMS,thatgraph
[Link],itisacompetitiveadvantagetowork
[Link]
theyaccomplishedso,Neo4jseemstobetheDBMSthatmoresuccessfullyis
improvinggraphpartitioning.[8]
Figure11:GraphDBMSRanking
Figure12:TrendGraphDBMSpopularityscatterplot
Chapter 10: Neo4j
Necessity of Neo4j
Why Neo4j? A graph database like Neo4j focuses on data relationships. Given today's growing business demands and competitive atmosphere, using the right tool is very important, and for highly connected data Neo4j is a strong choice because it can be orders of magnitude faster than traditional databases on relationship-heavy queries. Neo4j analyzes and traverses all data in real time and returns results very fast. Neo4j is widely used by many large companies such as eBay, Walmart, Cisco, UBS and many more.
WhatisNeo4j?Neo4jisanopen-sourceNoSQLgraphdatabasewritteninJavaand
[Link],Neo4jiscurrentlyworld’slead-inggraph
[Link].FirstofallNeo4jprovidesACID transaction
compliance,clustersupport,runtimefailover,highavailabilityandhighspeedquery
ing [Link]
interfaceanditiseasytolearnbecausetherearelotsoffreeonlineresourcesonthe
[Link]
Neo4jisdesignedforlinkingrelationshipsandithandlesthisrelationshipswithspee
d,
ease,andextremeflexibility.WithNeo4j,modelscaneasilybeconvertedtodatabase
[Link]’sisneeded
forthedatathenNeo4jisthesolution..
Neo4jVersions
Graphdatabasesusesarelationshipfirstapproachtostoringandqueryingyourdata.
Theystoredatainamuchmorelogicalfashion,awaythatrepresentstherealworld
and prioritizes the representation,discoverability and maintainability ofdata
[Link]
relationshipsoACIDpropertywasbroughtbacktoatleastonenosqldatabasecalled
[Link]
businessdata.
Graph databases give developers a more intuitive data model, faster queries and better agility to adapt to changes in the business.
Figure13:Neo4jAsaLeadingGraphDatabase
HowNeo4jisDifferentThanTraditionalDatabases?Graphdatabasesaremuch
[Link]
rowsandcolumns,graphdatabasesuseagraphwithnodesandrelationships.
[Link]
[Link]
relationships in a relational database, it can get very complicated with join tables and join queries. We need all kinds of primary and foreign keys, which can be really hard to deal with and, even worse, really costly on the system. Graph databases are built to fix that problem and to work with data that is much more closely related and more dynamic.
Thus, because of the reasons stated above, we chose Neo4j as our database.
Figure14:Ebay’scommentaboutNeo4j
Neo4j Working
Neo4j stores and displays data in the form of a graph. In Neo4j, data is represented by nodes and relationships between those nodes.
Neo4j databases (as with any graph database) are very different from relational databases such as MS Access, SQL Server and MySQL. Relational databases use tables, rows and columns to store data. They also present data in a tabular fashion.
Neo4j doesn't use tables, rows or columns to store or present data.
Neo4j is best for storing data that has many interconnecting relationships. That is why graph databases like Neo4j are much better at dealing with highly related data than relational databases are.
The graph model doesn't require a predefined schema, so you don't need to create the database structure before you load the data (as you do in a relational database). In Neo4j, the data is the structure. Neo4j is a "schema-optional" DBMS.
In Neo4j, there is no need to set up primary key/foreign key constraints to predetermine which fields can have a relationship, and to which data. You just add a relationship between any two nodes whenever you need one.
Features of Neo4j Graph Database
A simple, SQL-like query language: Neo4j CQL (Cypher Query Language)
It supports indexes by using Apache Lucene
It contains a UI to execute CQL commands, i.e. the Neo4j Data Browser
It supports UNIQUE constraints
It supports full ACID properties
It uses native graph storage with a native GPE (Graph Processing Engine)
It follows the Property Graph Data Model
It provides a REST API that can be called from any programming language or framework, such as Java, Spring, Scala and so forth
It supports exporting query results to JSON and XLS formats
AdvantagesofNeo4j
Properties of Neo4j
Figure 15: General Look at Neo4j
The following are properties of Neo4j:
Data model (flexible schema): Neo4j follows the property graph model. A graph has nodes, these nodes are connected with each other by relationships, and nodes and relationships store data in key-value pairs known as properties. Neo4j also has a flexible schema, which means properties can be added or removed when necessary.
ACID properties: Neo4j supports full ACID (Atomicity, Consistency, Isolation and Durability) rules.
Scalability and reliability: The database can be scaled to handle more reads/writes and a larger volume of data without affecting query processing speed and data integrity. Neo4j also provides support for replication for data safety and reliability.
The traversal of the graph: Traversal is the operation of visiting a set of nodes in the graph by moving between nodes connected by relationships. It is a fundamental operation for data retrieval in a graph. Querying the data using a traversal only takes into account the data that is required, so there is no need to query the entire dataset in an expensive operation, as is the case with join operations on relational data. [1]
Cypher Query Language: Neo4j provides a powerful declarative query language known as Cypher. It is easy to learn and can be used to create and retrieve relations between data without using complex queries like joins. [9]
Built-in web application: Neo4j provides a built-in Neo4j Browser web application. With it, graph data can be created and queried.
Drivers: Neo4j can work with
a REST API, to work with programming languages such as Java, Spring, Scala etc.
JavaScript, to work with UI MVC frameworks such as Node JS.
It supports two kinds of Java API, the Cypher API and the native Java API, to develop Java applications.
Indexing: Neo4j supports indexes by using Apache Lucene.
AdvantagesofNeo4jGraphDatabase
[Link]
[Link]
complexdataconnectionsasaresultoftheincreasedvolumeandstrengthinthe
data,[Link]
aretheadvantagesofNeo4j.
Easy to represent connected data: It makes it both easy and fast to traverse or navigate large amounts of data that has some sort of relationship.
Can represent semi-structured data easily: Data that does not fall into a natural structure can be easily represented in a graph database.
CypherCommands:Cyphercommandsarehumanreadableandveryeasy
tolearnSimpleandPowerfulDataModel:Thepropertygraphdatamodelis
[Link]
ndtheycancontaindataintheform ofkeyvaluepairsor
propertiesunliketherelationalmodel.
JoinAspect:There’snoneedforcomplexandcostlyjoinstoretrieveconnectedorrel
[Link]
[Link]
ortraversingagraphinvolvesfollowingthosepatsandbecauseofthat pathori-
entednatureofthegraphdatamodel,themajorityofpathbased
operationsareextremelyefficient.
Neo4j is the only graph database that combines native graph storage, a scalable architecture optimized for speed, and ACID compliance to ensure predictability of relationship-based queries. [10]
Real-time insights: Neo4j provides results based on real-time data.
High availability: Neo4j is highly available for large enterprise real-time applications with transactional guarantees. [15]
Biggest graph community in the world: Neo4j has the largest and most active graph community.
Easy to learn: Mature UI with intuitive interaction and built-in learning. [10]
Performance In Neo4j
Neo4j provides a fast and efficient graph experience. Its strongest point is that, unlike in relational databases, increasing data size has little effect on the performance of queries that traverse relationships.
Volker Pacher, eBay developer and Neo4j client: "Our Neo4j solution is literally a thousand times faster than the previous MySQL solution, with searches that require between 10 and 100 times less code."
Figure16:QuerytimesforOracleExadatavsNeo4j
Figure17:Tomtom’sComparisonofNeo4jwithMySQL
How To Increase Performance Of Neo4j?
Increase the size of the available heap memory (between 8 GB and 16 GB).
Increase the open file limit from the default 1024 to at least 40000.
To avoid costly disk access, make sure the relevant graph data is cached in memory.
Reserve sufficient memory (at least 16 GB) for the non-Neo4j tasks running on the computer.
Keep algorithms simple; simple algorithms lead to increased performance.
All related nodes and edges should be kept in server memory before giving results.
Traversals should be independent.
Indexes should be used.
What can Neo4j be used for?
The main reason Neo4j is better for highly related data is the way it allows you to create relationships. Neo4j is built around relationships.
There is no need to set up primary key/foreign key constraints to predetermine which fields can have a relationship, and to which data.
With Neo4j, you just add a relationship between any two nodes whenever you need one.
This makes Neo4j extremely well suited to social networking applications like Facebook and Twitter. Neo4j can be used for:
● Social networks
● Real time product recommendations
● Network diagrams
● Fraud detection
● Access management
● Graph based search of digital assets
● Master data management
CypherQueryLanguage
Cypherisadeclarativelanguageforworkingwithgraphsandgraphdataforboth
[Link]
Cypher defines patterns in the given graph data.
Cypher is a declarative language: this means that we specify the data that we are looking for rather than how to retrieve it.
Cypher is a very human-readable language, and it is accessible not just to developers: anyone can easily learn and use it.
Cypher has expressions similar to SQL, such as WHERE and ORDER BY, and simple comparison operators like <, = and >. It differs from SQL in that Cypher is designed to represent graph data patterns. For example, it has the MATCH clause, which is built on finding and specifying patterns in the data.
Structure
Nodes
Nodes represent data entities. They can have labels, and each node can have properties. In Cypher, nodes are shown with parentheses, like (p:Product).
Figure 18: Node Representation
Relationships
In Cypher, the lines between nodes represent relationships. A directed relationship is written with an arrow, -->, between two nodes.
Operations In Cypher
Create: It is used to create nodes and relationships between them.
We created nodes representing us, each with five properties:
Name: 'Ajit Singh'
Country: 'India'
City: 'Patna'
CREATE (n:Person {name: 'Ajit Singh', country: 'India', city: 'Patna',
DateOfBirth: '21.05.1984', School: 'PWC'}) RETURN n
Name: 'Anna Turu Pi'
Country: 'Spain'
City: 'Barcelona'
CREATE (n:Person {name: 'Anna Turu Pi', country: 'Spain', city:
'Barcelona', DateOfBirth: '30.07.1995', School: 'PWC'}) RETURN n
We created a relationship called "FRIENDS_WITH" with the property "SINCE",
with this Cypher code:
MATCH (a:Person), (b:Person) WHERE a.name = 'Ajit Singh' AND b.name = 'Anna Turu Pi'
CREATE (a)-[r:FRIENDS_WITH {SINCE: "17/09/2017"}]->(b)
RETURN r
(a) Result in Console (b) After Creating Relationship
Figure 19: Create Relationship Between Two Nodes
Match: MATCH finds specified patterns in the data.
Figure 20: Relationships
With this Cypher code we showed all the people whom Esteban Zimányi teaches:
MATCH (a:Person)<-[:TEACHES_TO]-(b:Person {name: 'Esteban Zimányi'}) RETURN a.name
Set: This is used to update properties on nodes and relationships.
With this Cypher code we changed Esteban Zimányi's date of birth to '01.01.1966':
MATCH (n {name: 'Esteban Zimányi'}) SET n.DateOfBirth = '01.01.1966'
RETURN n
Delete: This operator deletes nodes or relationships in the data.
With this Cypher code we deleted Ajit Singh (DETACH is needed because the node still has a FRIENDS_WITH relationship, and a plain DELETE fails on a node with relationships):
MATCH (n:Person {name: 'Ajit Singh'}) DETACH DELETE n
Loading Data With Cypher
There are lots of ways to import data into Neo4j, but the most common way is to upload it as a CSV file. The LOAD CSV operator is built into Neo4j and is suited to small and medium-sized datasets. For large datasets, such as those with more than 10 million records, we should add USING PERIODIC COMMIT [n], which commits every n rows rather than reading the whole file in one run and creating everything in one transaction.
Load CSV: This operator is used for importing CSV files into Neo4j.
Figure 21: Load CSV Operator Structure
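The idea behind periodic commits, processing a large file in fixed-size batches instead of one huge transaction, can be sketched in plain Python. This does not call Neo4j; the function name batched_rows and the sample CSV are made up for illustration, and each yielded batch stands for the rows that would be written in one transaction.

```python
import csv
import io


def batched_rows(csv_text, batch_size):
    """Yield lists of at most batch_size rows, one list per would-be transaction."""
    reader = csv.DictReader(io.StringIO(csv_text))
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


# Hypothetical sample file with a header line and three records.
sample = "name,city\nAjit,Patna\nAnna,Barcelona\nSam,Oslo\n"
batches = list(batched_rows(sample, 2))
```

With a batch size of 2 the three records are split into two batches, so memory use is bounded by the batch size rather than by the size of the file.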
UseCasesofNeo4j
Figure22:UseCasesOfNeo4j
Thecommonusecasesare;
RealTimeRecommendations:Recommendationalgorithmsfindsrelationships
betweenpeople,productsandotherservicesrelatedtopurposebasedonuser’s
previousbehaviors.Neo4jisabletostoreinterconnecteddataaboutcustomers
andproductsandsinceNeo4jdoesn’tneedindexingateverysuggestionit
providesveryfastandeffectivealgorithm [Link]
usesNeo4jforthispurpose
Master Data Management: In large organizations, different systems store information about customers, employees and so on. With a graph model it is easy to bring data from different systems together, create views about customers, or keep track of all the information about the organizational system itself. Cisco uses Neo4j for this purpose, and the company also uses Neo4j for its help desk solution.
Figure23:MasterDataManagementGraphDesign
FraudDetection:[Link]-days
inordernottobedetectedbybank’sfraudalgorithmspeopleusedifferent
approacheslikeopenseveralbankaccountswithvalidinformationanddonormal
[Link]
[Link]
detectthatbehaviorbutitisveryeasytoseethatwithgraphbecausethepattern
ofthepeopleopeningbankaccountsusingthesameidentitytokencanbeeasily
detectedasapatterninagraph
GraphBasedSearch:Metadataisavailableforthingslikeproducts,articlesetc.
Andbeingabletomodelmetadataasagraphallowstoenhancesearchmeaning
[Link];When
searchisexecutedwedon’tseerandomoralphabeticalsortedresultswefirstsee
therelevantones.LufthansausesNeo4jforthismatter.
Network & IT Operations: If a data center is modelled as a graph, dependency analysis can easily be applied to the network to answer questions such as how many applications will be affected if one virtual machine goes down. HP uses Neo4j to model its network for some large telecommunication providers.
Figure24:NetworkITOperationsGraphDesign
Identity&AccessManagement:Withinlargeorganizationstherearehundreds
ofusersandcontrollingwhocanaccesstowhichinformationiscrucialfor
[Link]
[Link]
handledbyNeo4j.UPCLondonusesNeo4jforthatanditreceived2014Graphic
awardsfor“Bestİdentityandaccessmanagementapp”
Chapter 11: Getting started with Neo4j
Requirements
[Link],requires:
JDK Version 8 and above.
Neo4j Graph Database 3.1 and above.
Spring Framework {springVersion} and above.
If you plan on altering the version of the Neo4j-OGM, make sure it is a 3.0.0+ release.
DownloadNeo4j
FirstdownloadNeo4jfromitsofficialwebsite:[Link]
YoucanchoosefromeitherafreeEnterpriseTrial,[Link]
e,weareusing theCommunityEdition.
Runthedownloadedfileandfollowtheinstructionsgivenbelow:
StartNeo4j:
StarttheServer
ClickontheinstalledNeo4jCommunityEdition.
Initializationstarted:
[Link]
Openbrowserandgotolocalhost:[Link]
Or[Link]
Start Neo4j web server
Go to the /bin sub-directory of the extracted folder and execute ./neo4j start in a terminal.
Visit [Link]
Only the first time, you will have to sign in with the default account and change the default password. As of community version 3.0.3, the default username and password are neo4j and neo4j.
You can now enter Cypher queries in the console provided in your web browser and visually investigate the results of each query.
StartNeo4jwebserver
Each Neo4j server currently (in the community edition) can host a single Neo4j database, so in order to set up a new database:
Go to the /bin sub-directory and execute ./neo4j stop to stop the server.
Set the parameter dbms.active_database in the configuration file (conf/neo4j.conf in Neo4j 3.x) to the name of the new database.
Go to the /bin sub-directory again and execute ./neo4j start.
Then visit [Link] again.
The created database is located in the sub-directory /data/databases, under a folder with the name specified in the parameter dbms.active_database.
Delete one of the databases
Make sure the Neo4j server is not running: go to the /bin sub-directory and execute ./neo4j status. If the output shows that the server is running, also execute ./neo4j stop.
Then go to the sub-directory /data/databases and delete the folder of the database you want to remove.
Cypher Query Language
This is Cypher, Neo4j's query language. In many ways, Cypher is similar to SQL if you are familiar with it, except that SQL refers to items stored in a table while Cypher refers to items stored in a graph.
First, we should start out by learning how to create a graph and add relationships, since that is essentially what Neo4j is all about.
CREATE (ab:Object {age: 30, destination: "England", weight: 99})
You use CREATE to create data.
To indicate a node, you use parentheses: ()
The ab:Object part can be broken down as follows: a variable 'ab' and a label 'Object' for the new node. The variable name can be chosen freely, but you have to be consistent within a Cypher query.
To add properties to the node, use braces: {}
Next, we will learn about finding MATCHes.
MATCH (abc:Object) WHERE abc.destination = "England" RETURN abc;
MATCH specifies that you want to search for a certain node/relationship pattern. (abc:Object) refers to one node pattern (with label Object), and the matches are stored in the variable abc. You can read the whole line as follows:
abc = find the matches that are an Object WHERE the destination is England.
In this case, WHERE adds a constraint, which is that the destination must be England. You must include a RETURN at the end of a query that only matches (Neo4j will not accept just a MATCH; such a query must always return some value. This also depends on what type of query you are writing; we will talk more about this later as we introduce the other types of queries you can make).
The next query will be explained later, after we go over some more material, but it gives a sense of what you can do with this language! Below, you will find an example which gets the cast of movies whose title starts with 'T':
MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)
WHERE movie.title STARTS WITH "T"
RETURN movie.title AS title, collect(actor.name) AS cast
ORDER BY title ASC LIMIT 10;
A complete list of commands and their syntax can be found in the official Neo4j Cypher Reference Card.
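To make the last query more concrete, here is a rough Python equivalent of what it computes: filter movies by title prefix, group the actors per movie (the role of collect), sort by title and apply a limit. The (actor, title) pairs and the function name cast_by_title are sample assumptions for this sketch and do not come from a Neo4j database.

```python
from collections import defaultdict

# Sample ACTED_IN pairs: (actor name, movie title).
acted_in = [
    ("Keanu Reeves", "The Matrix"),
    ("Carrie-Anne Moss", "The Matrix"),
    ("Tom Hanks", "Cast Away"),
    ("Tom Hanks", "The Green Mile"),
]


def cast_by_title(pairs, prefix, limit=10):
    """Group actors by movie title for titles starting with prefix."""
    cast = defaultdict(list)
    for actor, title in pairs:
        if title.startswith(prefix):    # WHERE movie.title STARTS WITH ...
            cast[title].append(actor)   # collect(actor.name)
    ordered = sorted(cast)              # ORDER BY title ASC
    return [(title, cast[title]) for title in ordered][:limit]  # LIMIT
```

Calling cast_by_title(acted_in, "T") leaves out "Cast Away" and returns one row per remaining movie, each with its list of actors.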
RDBMS vs Graph Database
RDBMS               Graph Database
Table               Graph
Rows                Nodes
Columns and Data    Properties and its values
Constraints         Relationships
Joins               Traversal
Cypher
Introduction
[Link]
andmatchesagainstaNeo4jGraph.
Cypher is "inspired by SQL" and is designed to be intuitive in the way you describe relationships: a drawing of a pattern typically looks similar to the Cypher representation of the pattern.
Examples
Creation
Create a node
CREATE (neo:Company) // create node with label 'Company'
CREATE (neo:Company {name: 'Neo4j', hq: 'San Mateo'}) // create node with properties
Create a relationship
CREATE (beginning_node)-[:edge_name {attribute_one: 1, attribute_two: 'two'}]->(ending_node)
Query Templates
Running Neo4j locally, in the browser GUI (default: [Link]), you can run the following command to get a palette of queries.
:play query template
This helps you get started creating and merging nodes and relationships by typing queries.
Create an Edge
CREATE (beginning_node)-[:edge_name {attribute_one: 1, attribute_two: 'two'}]->(ending_node)
Delete all nodes
MATCH (n)
DETACH DELETE n
DETACH doesn't work in older versions (earlier than 2.3); for previous versions use
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n, r
Delete all nodes of a specific label
MATCH (n:Book)
DELETE n
Match (capture group) and link matched nodes
MATCH (node_name:node_type {}), (node_name_two:node_type_two {})
CREATE (node_name)-[:edge_name {}]->(node_name_two)
Update a Node
MATCH (n)
WHERE n.some_attribute = "some identifier"
SET n.other_attribute = "a new value"
Delete All Orphan Nodes
Orphan nodes/vertices are those lacking all relationships/edges.
MATCH (n)
WHERE NOT (n)--()
DELETE n
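The orphan-node condition, WHERE NOT (n)--(), can be mirrored on a small in-memory graph. The sketch below is only an illustration of the logic; the node names and the function orphan_nodes are hypothetical.

```python
def orphan_nodes(nodes, relationships):
    """Return the nodes that appear in no relationship, in either direction."""
    connected = set()
    for start, end in relationships:
        connected.add(start)
        connected.add(end)
    return [n for n in nodes if n not in connected]


# Hypothetical graph: "a" and "b" are linked, "c" and "d" have no relationships.
nodes = ["a", "b", "c", "d"]
relationships = [("a", "b"), ("b", "a")]
```

As in the Cypher pattern, direction does not matter: a node counts as connected whether it is the start or the end of a relationship.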
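Python & Neo4j
Examples
Install neo4jrestclient
pip install neo4jrestclient
Connect to neo4j
from neo4jrestclient.client import GraphDatabase
# Assumed default local address; pass username=... and password=... if your server requires them.
db = GraphDatabase("http://localhost:7474")
Create some nodes with labels
user = db.labels.create("User")
u1 = db.nodes.create(name="user1")
user.add(u1)
u2 = db.nodes.create(name="user2")
user.add(u2)
You can associate a label with many nodes in one go
Language = db.labels.create("Language")
b1 = db.nodes.create(name="C++")
b2 = db.nodes.create(name="Python")
Language.add(b1, b2)
Create relationships
u1.relationships.create("likes", b1)
u1.relationships.create("likes", b2)
u2.relationships.create("likes", b1)
Bi-directional relationships
u1.relationships.create("friends", u2)
Match using neo4jrestclient
from neo4jrestclient import client
# "db" as defined above; the label and name match the nodes created earlier
q = 'MATCH (u:User)-[r:likes]->(m:Language) WHERE u.name="user1" RETURN u, type(r), m'
results = db.query(q, returns=(client.Node, str, client.Node))
Print results
for r in results:
    print("(%s)-[%s]->(%s)" % (r[0]["name"], r[1], r[2]["name"]))
Output:
(user1)-[likes]->(C++)
(user1)-[likes]->(Python)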
Chapter 12: Neo4j Application
Software
For the graph database, Neo4j Community Edition 3.2.5 has been used, and for the relational database, SQL Server 2017.
Use Case Selected
Asproposedingraphdatabasebenchmarkguidelines[4],thebestteststo
benchmarkagraphdatabaseare:traversal(whichincludesthecalculationofthe
shortestpath),graphanalysis,connectedcomponents,communities,centrality
measures,[Link]
amongthedomainswheregraphdatabasesprovetobemorebeneficialarethe
[Link]
implementation,wearegoingtomodelflightroutes,astheyhavetheideal
[Link]
wheretheinformationliesonthetheirintercommunications.
Data
Because of size concerns, we created synthetic data in addition to our existing data tables. Before creating new data we had 67663 different routes, and now we have 1193413
[Link],theydonothaveany
[Link]
[Link]
have,themoreaccuratebench-
[Link],
addingmoredatatoNeo4jdoesnoteffectitsperformance.
ImplementingData
Figure25:[Link]
Neo4j:[Link]
py2neolibrarytoaccessNeo4jdatabaseanditreadsourdata(externalsource)to
createnodes,relationships,propertiesandindexes
Figure26:Structureofthepythoncode
[Link]
better visualization we created a function that calculates the distance between two connected airports. Route data has source_airport and destination_airport, so we created a Route node and assigned the distance between source_airport and destination_airport as the name attribute of the Route node. In the end, the three types of nodes are Airlines, Airports and Routes, and they have the following relationships:
Route -> TO -> Airport
Route -> FROM -> Airport
Route -> OF -> Airline
Table 2: Graph database schema
We implemented our data in Neo4j with this schema:
Figure 27: Initial Schema
Figure 28: Example of a query in Neo4j
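The text does not say which formula the distance function uses. One common choice for the distance between two airports given their latitude and longitude is the haversine (great-circle) formula, sketched below as an assumption; the function name haversine_km and the Earth radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the valid range.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

A value computed this way could be stored on each Route node, as the name attribute is in the schema above.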
SQL: A relational database was created by importing each flat file as a table, and then we created foreign key references between tables.
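Export data
To export the Neo4j database, the APOC library is used. Export to a file must first be enabled in the Neo4j configuration file:
apoc.export.file.enabled=true.
Export to CSV
Export to cypher script
apoc.export.cypher.all(file, config): exports the whole database incl. indexes as Cypher statements to the provided file
apoc.export.cypher.data(nodes, rels, file, config): exports given nodes and relationships incl. indexes as Cypher statements to the provided file
apoc.export.cypher.graph(graph, file, config): exports given graph object incl. indexes as Cypher statements to the provided file
The database was also exported to a Cypher script:
CALL apoc.export.cypher.all("/temp/neo4j_database_cypher_file.cypher",
{batchSize: 10}) YIELD file, source, format, nodes, relationships, properties, time,
rows
Figure 29: Exporting Neo4j database to cypher script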
Query Examples (Neo4j - SQL)
Figure 30: Algorithms for graph databases
Add libraries: As noted, Neo4j offers graph algorithms that allow us to perform queries that would be impractical to express in SQL. Libraries of algorithms can be downloaded and added to Neo4j as plugins.
Figure 31: Add jar files in plugin folder
For the procedures to be allowed to run, this line has to be added to the Neo4j configuration file: dbms.security.procedures.unrestricted=apoc.* (e.g., for the APOC library).
After that, Neo4j needs to be restarted, and it can be verified that the plugin is working by running the following command in the Neo4j browser:
CALL dbms.procedures() YIELD name, signature, description
WHERE name STARTS WITH "apoc"
RETURN name, signature, description
Shortest Path
This algorithm is the one that best justifies the existence of graph databases.
[Link]
oflayerstheroutehas.
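As a rough illustration of what a shortest path search does, the following Python sketch runs a breadth-first search over a tiny directed graph and returns the path with the fewest hops. It is not Neo4j's implementation; the function name and the airport codes and connections are invented for the example.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search; returns the list of nodes on a fewest-hops path, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:  # walk back to the source
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in graph.get(current, []):
            if neighbour not in previous:  # first visit is the shortest
                previous[neighbour] = current
                queue.append(neighbour)
    return None


# Hypothetical connections between airports (not taken from the dataset).
flights = {"MAD": ["CDG", "FRA"], "CDG": ["ICN"], "FRA": ["PEK"],
           "PEK": ["ICN"], "ICN": []}
```

The search expands the graph one layer at a time, so the work done grows with the number of layers between the two nodes.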
First query example: find the shortest path to go from an airport in Madrid to an airport in Seoul.
MATCH p = shortestPath((src:Airport {city: 'Madrid'})-[r:FROM|TO*..15]-
(dest:Airport {city: 'Seoul'})) RETURN p
Figure32:ShortestpathqueryfromMadridtoSeoul
Figure33:Pipelineoftheshortestpathquery
The nodes can be expanded, and we see the airline to which each route belongs.
Figure 34: Expanded shortest path query
Second query example: find the shortest path between an airport in Seoul and an airport in Antwerp.
MATCH p = shortestPath((src:Airport {city: 'Seoul'})-[r:FROM|TO*..15]-
(dest:Airport {city: 'Antwerp'})) RETURN p
Figure 35: Shortest path query from Seoul to Antwerp
Payingattentiontotherelationships,itcanbeseenthatthequerydoesn’toutputa
physicallypossibletravelingroutefrom [Link]
query,oneofthepathsendsupinSeoul,buttheotherhastwosources,Madridand
Seoul,[Link],
oneinAntwerpandtwoinSeoul,andalltheroutesfinishinGeneve.
The purpose of the algorithm is to find the shortest path connecting two nodes, independently of the physical meaning, but real routes can be created with the following modification:
Persistentinferredrelationships:Foreachroutegoingfrom
anairporttoanother,[Link]
,the [Link]
tofindphys-icallypossiblepathsbetweentwoairports(e.g.,notsteppinginto
anairline)itwillbeassuredlookingforthatinferredrelationshipthatairports
arebeingconnectedtoairports.
[Link],andispropo
[Link]
shortestpathqueriesandcommunitydetectionqueries.
Cypher code to create the relationship:
MATCH (ap1:Airport)<-[:FROM]-(r:Route)-[:TO]->(ap2:Airport)
WHERE id(ap1) <> id(ap2)
WITH ap1, ap2, COUNT(*) AS weight
CREATE (ap1)-[c:CONNECTED]->(ap2)
SET c.weight = weight
In the figure below the database schema after adding the inferred relationship is displayed:
Figure 36: Neo4j DB schema after adding Connected relationships
Cypher code to delete the relationship:
MATCH (ap1:Airport)-[r:CONNECTED]->(ap2:Airport) DELETE r
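The aggregation performed by the CONNECTED query, one inferred relationship per ordered pair of distinct airports with the number of routes as its weight, can be sketched in Python as follows. The route list and the function name connected_weights are hypothetical.

```python
from collections import Counter

# Hypothetical routes as (source airport, destination airport) pairs.
routes = [("MAD", "CDG"), ("MAD", "CDG"), ("CDG", "MAD"),
          ("MAD", "MAD"), ("CDG", "ICN")]


def connected_weights(route_pairs):
    """Count routes per ordered airport pair, skipping routes that loop back."""
    return Counter((src, dst) for src, dst in route_pairs if src != dst)


weights = connected_weights(routes)
```

Each key of the result corresponds to one CONNECTED relationship, and its count corresponds to the weight property.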
[Link]
[Link] detec-
tionqueries.
Cypher code to create the relationship:
MATCH (ap1:Airport)<-[:FROM]-(r:Route)-[:TO]->(ap2:Airport)
WHERE id(ap1) <> id(ap2)
WITH ap1, ap2, r
MATCH (r)-[:OF]->(al:Airline)
CREATE (ap1)-[g:GOINGTO]->(ap2)
[Link]=[Link]
[Link]=id(r)
[Link]=[Link]
In the figure below the database schema after adding the inferred relationship is displayed:
Figure 37: Neo4j DB schema after adding Goingto relationships
Cypher code to delete the relationship (the nodes are matched by label, so the two ends are not forced to be the same node):
MATCH (:Airport)-[r:GOINGTO]->(:Airport) DELETE r
The first shortest path query is run again, now with the inferred relationships:
MATCH p = shortestPath((src:Airport {city: 'Madrid'})-[r:GOINGTO]->
(dest:Airport {city: 'Seoul'})) RETURN p
Figure38:ShortestpathbetweenMadridandSeoul
[Link]
seen,[Link]
followingqueryitcanbeverifiediftheroutematchestherequisites:
MATCH(r:Route)WHEREid(r)=50276RETURNr
ItisverifiedthattherelationshipGOINGTOwasequivalenttoarealoutbound
[Link]:
MATCH(r:Route)WHEREid(r)=50205RETURNr
Figure39:Shortestpathreturnrouteoutput
ShortestpathinSQLServer:SQLServerhasthelimitationthatitneedtobe
[Link]
query,butfromourexperience,itwasnoteffective.
When executing the query, we obtain the following message: "The statement terminated. The maximum recursion 100 has been exhausted before statement completion."
Figure40:PipelineofNeo4jqueryonAntwerp-Patnashortestpath
Betweenness centrality:
The betweenness centrality of a node in a network is the number of shortest paths between two other members in the network on which a given node appears. Betweenness centrality is an important metric because it can be used to identify "brokers of information" in the network, or nodes that connect disparate clusters. [6]
Thisqueryshowstheairportsthathavetobecrossedmoreoftenbyroutesto gofrom
[Link],theairportswheremore
[Link] inthefigurebelow,theairports
highlightedarelikebottlenecksthatconnectclustersofairports.
MATCH (ap:Airport)
WITH collect(ap) AS airports
CALL apoc.algo.betweenness(['CONNECTED'], airports, 'OUTGOING')
YIELD node, score
[Link]=score
RETURN node AS Airport, score ORDER BY score DESC LIMIT 25
Figure41:Betweennesscentralityqueryresult
Thequeryoutputsfivebigairports,whicharecommonlyusedtotransferduring
[Link]
centrality.
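To make the betweenness definition concrete, here is a small, unoptimized Python sketch for unweighted directed graphs. It uses the standard formulation, in which a node is credited with the fraction of shortest paths between each other pair that pass through it. It is not the APOC implementation, and the function names and example graph are invented for illustration.

```python
from collections import deque


def bfs_counts(graph, source):
    """Return (dist, sigma): hop distances and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u lies on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(graph):
    """Betweenness score per node; every node must be a key of graph."""
    info = {s: bfs_counts(graph, s) for s in graph}
    score = {v: 0.0 for v in graph}
    for s in graph:
        dist_s, sigma_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for v in graph:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v is on a shortest s-t path when the distances add up exactly
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Hypothetical hub: every connection from a or b to c or d goes through h.
hub_graph = {"a": ["h"], "b": ["h"], "h": ["c", "d"], "c": [], "d": []}
hub_scores = betweenness(hub_graph)
```

In this example the hub lies on all four shortest paths between the outer nodes, which is the bottleneck behaviour described for the highlighted airports.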
Closeness centrality:
Closeness centrality is the inverse of the average distance to all other nodes in the network. Nodes with high closeness centrality are often highly connected within clusters in the graph, but not necessarily highly connected outside of the cluster. [6]
[Link]
otherwords,itshowsthelocationsthataremoregeographicallyisolatedtobe
reachedbyothermeansoftransport([Link]).Itcanoutputtheairportswith
moredirectflightsfromdifferentlocationsortheairlinesthatperformmoreroutes.
Figure42:Conceptofclosenesscentrality
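The definition of closeness centrality can be written down directly. The Python sketch below measures distance in hops and averages only over the nodes that can be reached from the start node; both choices are assumptions of this example and may differ from what the APOC procedure does. The function name and sample graphs are hypothetical.

```python
from collections import deque


def closeness(graph, node):
    """Inverse of the average hop distance from node to every node it can reach."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for n, d in dist.items() if n != node]
    if not others:
        return 0.0  # nothing reachable
    return len(others) / sum(others)


# Hypothetical graphs: a three-node chain and an isolated pair.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
pair = {"x": ["y"], "y": ["x"]}
```

Under these assumptions a node in a tiny isolated cluster, such as x in the pair, gets the maximum score of 1.0, because its only reachable neighbour is one hop away.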
Query example: output the five airports with the highest closeness centrality:
MATCH (ap:Airport)
WITH collect(ap) AS airports
CALL apoc.algo.closeness(['CONNECTED'], airports, 'OUTGOING')
YIELD node, score
RETURN node AS Airport, score ORDER BY score DESC LIMIT 5
Figure43:Closenesscentralityqueryresult
As predicted, the query outputs airports that are in highly touristic but geographically isolated locations: Lopez Island near Seattle, the river Araguaia in the middle of Brazil, the Grand Canyon of Colorado...
Figure 44: Location of the airports with highest closeness centrality
Query performance: Writing PROFILE before the Cypher query outputs the pipeline of the query execution.
Figure 45: Pipeline of the closeness centrality query
PageRank:
The secret of Google's success was its search algorithm, PageRank. It works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites [11]. This algorithm can output the most connected airport or the most powerful airline (the node connected to more routes).
First query: output the most important airports
MATCH (ap:Airport) WITH collect(ap) AS airports
CALL apoc.algo.pageRank(airports) YIELD node, score
RETURN node, score ORDER BY score DESC
Figure 46: Pipeline of the airports pagerank query
Themostimportantairportsarefrom London,Paris,Frankfurt,Patna,Dubai,
[Link].
Second query: output the most popular airlines.
MATCH (node:Airline) WITH collect(node) AS airlines
CALL apoc.algo.pageRank(airlines) YIELD node, score
RETURN node, score ORDER BY score DESC
Figure 47: Pipeline of the airlines pagerank query
As a result we can see that Ryanair is the leading airline, followed by four companies from the USA and three from China.
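The idea behind PageRank can be shown with a short power-iteration sketch in Python. This is a textbook version under assumed defaults (damping factor 0.85, 50 iterations, nodes without outgoing links spreading their rank evenly), not the APOC procedure; the function name and example graph are made up.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Return a PageRank score per node; every node must be a key of graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            targets = graph[u] or nodes  # dangling node: spread rank to all nodes
            share = damping * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical links: three nodes point at HUB, which points back at A.
links = {"A": ["HUB"], "B": ["HUB"], "C": ["HUB"], "HUB": ["A"]}
scores = pagerank(links)
```

The node that receives the most links ends up with the highest score, and the scores always sum to 1, so they can be read as relative importance.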
CommunityDetection:
Therearemanyalgorithmsforcommunitydetection:trianglecounting,strongly
connectedcomponents,...Thisalgorithmsclustertogetherthenodesmore
[Link] from thelibraryAPOC,
andwhatthecodebelowdoes,[Link]
classificationisdeterminedontheweightoftheconnectedrelationships(the
numberofroutesbetweeneachpairofairports).
Seeingasairportsaregeographicallocation,androutsarephysicaljourneys
betweenthem,itisexpectedthatgeographicallyneighbouringairportswillbe
[Link].
CALL apoc.algo.community(40, ['Airport'], 'partition',
'CONNECTED', 'OUTGOING', 'weight', 10000)
MATCH (ap:Airport) WHERE exists(ap.partition) RETURN ap
Figure48:Communitydetectiongraph
Thefigureovertheselinesshowstheshapeofthegraphafterthenodeshave
[Link],the
partitionnumbermustbereturnedasoutput:
[Link](40,[’Airport’],’partition’,
’CONNECTED’,’OUTGOING’,’weight’,10000)
MATCH(ap:Airport)WHEREexists([Link])
[Link],[Link],COUNT(*)ASnum
[Link],numDESC
Figure49:Communitydetectiontable
Goingbacktothevisualizationofthecommunitydetectionforairports,thepartitionsc
[Link]
nodesdisconnectedfrom therestofairportsiscomprisedofPapuaNewGuinea
airports(thecountrycanbeseenbyhoveringoverthenodes).Theybelongtothe
firstpartitioninthetable,6394.
Thefollowingpartofthegraphisabitscattered,butitcanbeseenthattheyareall
[Link],weseethattheyallbelong
toCanada,andwecansupposethatthemoreseparatednodesareregionalairports
[Link]
sevenpartitionsinthetable.
NexttoCanada,agroupofnodesareseparated,andthoseairportsareallfrom
Algeria.Theymustbelongtopartition6624.
[Link]
thoseareconnected withaGreenland’sairport,whichconnectswithother
GreenlandandIcelandairports.
ThenextsubgraphshowsairportsfromdifferentAfricancountriesinterconnected
[Link],thereareairports,andairportsfrom african
countrieshighlyconnectedtothem,andontherightsidetherearemainlynigerian
airports,amongotherafricanaiportstoo.
Goingbacktothecenterofthegraph,itishardtorecognizemorethanonepartition,
asitshowsthecentraleuropeanairports,whicharehighlyinterconnected.
Atlast,apartitionwasdetectedinthetable,[Link]
geographicallyrelated,ithasbeendeterminedthatthoseareislandsbetween
Polynesia,[Link]
(b)Geographicallocation
(a)Partitiontable
Figure50:Australasiapartition
PossiblequeriesonSQL
[Link]
willpresentoperationsapplicabletoboth;
Finding flights between two airports that have no direct route between them:
Neo4j query:
MATCH p = allShortestPaths((ap1:Airport {city: 'Antwerp'})-[*]->(ap2:Airport {city: 'Patna'}))
WITH extract(node in nodes(p) | [Link]) as cities,
extract(rel in relationships(p) | [Link]) as airlines
RETURN cities, airlines
SQL query:
SELECT
[Link] [1st Airport],
[Link] [1st Airline],
[Link] [2nd Airport],
[Link] [2nd Airline],
[Link] [3rd Airport],
[Link] [3rd Airline],
[Link] [4th Airport]
FROM routes r
INNER JOIN airports a1 ON r.source_airport_id = [Link]
INNER JOIN airlines airline1 ON [Link] = r.airline_id
INNER JOIN airports a2 ON r.destination_airport_id = [Link]
INNER JOIN routes r2 ON [Link] = r2.source_airport_id
INNER JOIN airlines airline2 ON [Link] = r2.airline_id
INNER JOIN airports a3 ON r2.destination_airport_id = [Link]
INNER JOIN routes r3 ON [Link] = r3.source_airport_id
INNER JOIN airlines airline3 ON [Link] = r3.airline_id
INNER JOIN airports a4 ON [Link] = r3.destination_airport_id
WHERE [Link] = 'Antwerp' AND [Link] = 'Patna'
(a) Neo4j Result
(b) SQL Result
Figure 51: Comparison of Queries - first query
As can be seen here, finding all possible routes between two airports is easy in Neo4j. Besides that, Neo4j provides a visualization.
There is one important point here: in SQL we have to specify the level of depth to find results. For example, in this query we searched for 3-level flights between Antwerp and
[Link]
Neo4jwedon’thavetospecifylevel,itfindsallroutesbetweentwoairportsandeven
[Link]
datathathaslevels.
Nearestairporttocitybydistance
Select
match(airport1:Airport{city:’Bologna’} top1
)<-[:FROM]-(route:Route) [Link],[Link],[Link]
-[:TO]->(airport2:Airport) ,[Link]([Link],a2. latitude,
RETURNairport1, [Link],[Link])as
route,airport2 distance
[Link] fromroutesr
asclimit1 INNERJOINairportsa
[Link]=r.source_airport_id
INNERJOINairportsa2
on
[Link]=r.destination_airport_id
[Link]=’Bologna’
orderbydistanceasc
While we were uploading our data into Neo4j we created a node called route,
and this node has three relationships: TO, FROM and OF, and as a descriptive
[Link]
same page we created a function in SQL that calculates distances between
airports given the latitude and longitude attributes of the airports, which already exist
in our data. Both approaches give the same result, but Neo4j also provides
visualization.
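The book does not show the body of the SQL distance function. As a rough illustration of what such a function might compute, here is a Python sketch that uses the haversine great-circle formula. The choice of haversine, the Earth radius constant and the names haversine_km and nearest are all assumptions, not the authors' code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest(origin, candidates):
    """Return the (name, distance_km) of the candidate closest to origin.

    origin is a (lat, lon) pair; candidates maps a name to a (lat, lon) pair.
    """
    distances = {
        name: haversine_km(origin[0], origin[1], lat, lon)
        for name, (lat, lon) in candidates.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical coordinates chosen only to exercise the functions.
print(nearest((0.0, 0.0), {"X": (0.0, 1.0), "Y": (0.0, 2.0)}))
```

Sorting by this value in ascending order and keeping the first row corresponds to the nearest-airport lookup that both queries perform.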
Most connected airports

Neo4j query:
    MATCH (airport:Airport)<-[:FROM]-(r:Route)
    WITH airport, count(r) as departures
    MATCH (r2:Route)-[:TO]->(airport)
    [Link] airport_name, departures, count(r2) as arrivals
    order by departures + arrivals desc

SQL query:
    SELECT [Link], [Link], [Link], SUM(A.route_count) AS route_count
    FROM (
        (SELECT [Link], [Link], [Link], COUNT(*) AS route_count
         FROM routes R
         INNER JOIN airports A ON [Link] = source_airport_id
         GROUP BY [Link], [Link], [Link])
        -- UNION ALL keeps both rows even when an airport has equal
        -- departure and arrival counts; plain UNION would merge them
        UNION ALL
        (SELECT [Link], [Link], [Link], COUNT(*) AS route_count
         FROM routes R
         INNER JOIN airports A ON [Link] = destination_airport_id
         GROUP BY [Link], [Link], [Link])
    ) A
    GROUP BY [Link], [Link], [Link]
(a) Neo4j Query
(b) SQL query
Figure 52: Comparison of queries - third query
With these queries we found the most interconnected airport by counting the number of
incoming and outgoing flights. As can be seen, this is very easy to write in Neo4j.
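The counting idea behind both queries can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function name rank_by_connections and the single-letter airport codes are hypothetical, and each route is represented as a plain (source, destination) pair.

```python
from collections import Counter


def rank_by_connections(routes):
    """Rank airports by departures plus arrivals, most connected first.

    routes is an iterable of (source, destination) pairs.
    """
    routes = list(routes)
    departures = Counter(src for src, _ in routes)
    arrivals = Counter(dst for _, dst in routes)
    totals = departures + arrivals  # per-airport sum of both counts
    return totals.most_common()


# Hypothetical routes (illustrative only).
sample_routes = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "A")]

for airport, total in rank_by_connections(sample_routes):
    print(airport, total)
```

Airport A has two departures and two arrivals, so its total is four. Both halves of the count must be kept and added for the ranking to be right.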