PHP in the graph
FOSDEM 2017, Brussels, Belgium
Agenda
Discover the Graph
Steps with a gremlin
Gremlin and PHP
Damien Seguy
CTO at Exakat
Static analysis for PHP
PHP code as Dataset
Speaker
WordPress call graph
<?php
function _default_wp_die_handler( $message, $title
    $defaults = array( 'response' => 500 );
    $r = wp_parse_args($args, $defaults);
    $have_gettext = function_exists('__');
    if ( function_exists( 'is_wp_error' ) && is_wp_error( $message ) ) {
        if ( empty( $title ) ) {
            $error_data = $message->get_error_data
            if ( is_array( $error_data ) && isset(
What is Gremlin?
A domain-specific language for graphs
A programming language for traversing a graph
Open source and vendor-agnostic
Simple traversal
V: vertices, or nodes, or objects
E: edges, or links, or relations
G: graph, or the dataset
The Vertices
g represents the graph (or the gremlin)
g.V() represents all the vertices
g.V(1) is one of the vertices
Vertices always have an id
g.V(1) => v(1)
g.V(1).values('name') => apply_filter
g.V(1).values('leaf') => true
g.V(1).values('compat') => ['7.0', '7.1']
g.V(1).id() => 1
// non-existing properties
g.V(1).values('isFedAfterMidnight') => null
Properties
Graph is schemaless
Vertex discovery
Use valueMap() to discover the properties of a vertex
id and label are not part of the result
g.V(2).valueMap() => {name=wp_die, leaf=tru
g.V(2).valueMap('name') => {name=wp_die}
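The property lookups above can be mimicked with a plain Python dictionary. This is only a sketch of the semantics, not how a Gremlin server stores its data: the helper names `values` and `value_map` are hypothetical, the `function` label is taken from the schema slide, and the property values are the ones shown for vertex 1. In this sketch a missing property simply yields no value.

```python
# Vertex 1 of the slides, modelled as a dictionary: an id, a label
# and free-form properties (the graph is schemaless).
vertex = {
    "id": 1,
    "label": "function",  # assumed label
    "properties": {"name": "apply_filter", "leaf": True, "compat": ["7.0", "7.1"]},
}


def values(v, key):
    """Mimic values(key): the property value if present, otherwise nothing."""
    props = v["properties"]
    return [props[key]] if key in props else []


def value_map(v, *keys):
    """Mimic valueMap(): the properties only, without id and label."""
    props = v["properties"]
    if not keys:
        return dict(props)
    return {k: props[k] for k in keys if k in props}


print(values(vertex, "name"))       # ['apply_filter']
print(value_map(vertex, "name"))    # {'name': 'apply_filter'}
```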
The Edges
g.E() represents all the edges
g.E(1) is the edge with id 1
Edges have an id and properties
They also have a start vertex, an end vertex and a label
g.E(1) => e(1)
g.E(1).label() => 'CALLS'
g.E(1).id() => 1
g.E(1).values('count') => 3
g.E(1).valueMap() => {count=3}
Edge discovery
[Figure: vertices wp_die and wp_ajax_fetch_list linked by a CALLS edge]
Edges link the vertices
g.E(5).inV() => v(2)
g.E(6).inV() => v(3)
g.E(7).outV() => v(4)
Exiting the Edges
[Figure: directed graph with vertices 1 to 4 linked by edges 5, 6 and 7]
Directed graph
g.V(1).out() => v(2)
v(3)
g.V(1).in() => v(4)
g.V(1).both() => v(2)
v(3)
v(4)
Following Edges
g.V(1).inE() => e(7)
g.V(1).out().id() => 2
3
g.V(2).in().in().id() => 4
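The navigation steps above can be reproduced with a short Python sketch. The edge layout (1 to 2, 1 to 3, 4 to 1) is inferred from the results printed on the slides, and the function names `out`, `in_` and `both` are only stand-ins for the Gremlin steps, not a real API.

```python
# Edges of the small example graph: (edge id, source vertex, target vertex).
# The layout is an assumption inferred from the results on the slides.
EDGES = [(5, 1, 2), (6, 1, 3), (7, 4, 1)]


def out(vertices):
    """Follow edges in their direction, like out()."""
    return [dst for v in vertices for (_, src, dst) in EDGES if src == v]


def in_(vertices):
    """Follow edges against their direction, like in()."""
    return [src for v in vertices for (_, src, dst) in EDGES if dst == v]


def both(vertices):
    """Follow edges in both directions, like both()."""
    return [n for v in vertices for n in out([v]) + in_([v])]


print(out([1]))        # [2, 3]
print(in_([1]))        # [4]
print(both([1]))       # [2, 3, 4]
print(in_(in_([2])))   # [4], chaining two steps
```

Each function takes and returns a list of vertex ids, which is why the calls can be chained in the same way as the Gremlin steps.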
Chaining
WordPress call graph
The graph of all WordPress internal function calls
[Figure: schema with two function vertices, each with a name property, linked by a CALLS edge]
g.V(19).out('CALLS').values('name')
=> wp_hash_password
wp_cache_delete
g.V(19).in('CALLS').values('name')

=> reset_password
wp_check_password
Who's calling?
[Figure: reset_password and wp_check_password call wp_set_password, which calls wp_hash_password and wp_cache_delete]
Is it Recursive?
[Figure: get_permalink, get_category_link and get_category_parents (id: 30), linked by three CALLS edges]
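// TinkerPop 3.x spells retain('myself') as where(eq('myself'))
// and except('myself') as where(neq('myself'))
g.V(30).as('myself')
.out('CALLS')
.retain('myself')
.values('name')
=> get_category_parents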
Is it Recursive?
g.V(30).as('myself')
.in('CALLS')
.retain('myself')
.values('name')
=> get_category_parents
g.V(47).as('myself')
.out('CALLS').except('myself')
.out('CALLS').retain('myself')
.values('name')
=> wp_trash_comment
wp_delete_comment
Ping-Pong Functions
[Figure: wp_trash_comment (id: 47) and wp_delete_comment (id: 148) call each other through CALLS edges]
g.V(47).as('myself')
.out('CALLS').except('myself')
.out('CALLS').except('myself')
.out('CALLS').retain('myself')
.values('name')
Ping-Pong Functions
[Figure: a cycle of three functions linked by CALLS edges]
g.V(47).as('myself')
.out('CALLS').except('myself')
.out('CALLS').except('myself')
.out('CALLS').except('myself')
.out('CALLS').retain('myself')
.values('name')
Ping-Pong Functions
[Figure: a cycle of four functions linked by CALLS edges]
g.V(47).as('myself')
.repeat(
out('CALLS').except('myself')
).emit().times(3)
.out('CALLS').retain('myself')
.values('name')
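The repeat() version of the ping-pong query can be read as a bounded search: walk up to three hops without passing through the starting function, and after each hop check whether one more call leads back to it. Here is a minimal Python sketch of that reading. The function name `comes_back` and the functions a, b, c, x, y and r are hypothetical; only the wp_trash_comment and wp_delete_comment pair comes from the slides.

```python
# Call graph as "caller -> list of callees".
CALLS = {
    "wp_trash_comment": ["wp_delete_comment"],
    "wp_delete_comment": ["wp_trash_comment"],
    "a": ["b"], "b": ["c"], "c": ["a"],   # hypothetical cycle of three
    "x": ["y"], "y": [],                  # hypothetical, no cycle
    "r": ["r"],                           # hypothetical direct recursion
}


def comes_back(start, max_hops=3):
    """True if `start` is called again after 1 to `max_hops` intermediate hops."""
    frontier = [start]
    for _ in range(max_hops):
        # out('CALLS').except('myself'): move on, never through `start`
        frontier = [callee
                    for caller in frontier
                    for callee in CALLS.get(caller, [])
                    if callee != start]
        # emit() then out('CALLS').retain('myself'): does one more call close the loop?
        if any(start in CALLS.get(caller, []) for caller in frontier):
            return True
    return False


print(comes_back("wp_trash_comment"))  # True
print(comes_back("a"))                 # True
print(comes_back("x"))                 # False
```

As in the Gremlin query, direct recursion (a function calling itself) is not reported, because the first hop already excludes the starting function.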
Up to now
Vertices and edges: basic building blocks
in() and out() (and both()): navigation
except(), retain(), in('label'): filtering
Loops with repeat()
Starting at the vertices
Traversing the graph
Filtering on edges
g.V().out('CALLS') => v(25);
g.V().out('CALLS', 'CALLED', 'XXX') => v(25);
Filtering on vertices
g.V().has('name')
g.V().has('name','wp_die')
g.V().has('name', neq('wp_die'))
g.V().has('name', within('wp_die', 'wp_header'))
g.V().has('name', without('wp_die', 'wp_header'))
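The has() variants above behave like small predicates applied to a property. A rough Python equivalent follows, assuming vertices are plain dictionaries. The helper names mirror the Gremlin predicates but are written here for illustration only, and the sample vertices are made up.

```python
def neq(value):
    return lambda x: x != value


def within(*values):
    return lambda x: x in values


def without(*values):
    return lambda x: x not in values


def has(vertices, key, test=None):
    """Keep the vertices that own `key` and, if given, whose value passes `test`.

    A plain value is treated as an equality test, like has('name', 'wp_die').
    """
    kept = []
    for v in vertices:
        if key not in v:
            continue
        if test is None:
            kept.append(v)
        elif callable(test):
            if test(v[key]):
                kept.append(v)
        elif v[key] == test:
            kept.append(v)
    return kept


# Sample vertices (made up); the last one has no name at all.
vertices = [{"name": "wp_die"}, {"name": "wp_header"}, {"name": "esc_html"}, {}]

print(len(has(vertices, "name")))                                 # 3
print(len(has(vertices, "name", "wp_die")))                       # 1
print(len(has(vertices, "name", neq("wp_die"))))                  # 2
print(len(has(vertices, "name", within("wp_die", "wp_header"))))  # 2
print(len(has(vertices, "name", without("wp_die", "wp_header")))) # 1
```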
g.V().out('CALLS')
.has('name','wp_die')
.values('name')
=> wp_die
wp_die
wp_die
wp_die
wp_die
wp_die
wp_die
wp_die
wp_die
Dying Functions
[Figure: unknown functions (????) linked to wp_die by CALLS edges]
g.V().has('name','wp_die')
.in('CALLS')
.values('name')
=> wp_ajax_trash_post
wp_ajax_delete_post
wp_ajax_delete_meta
wp_ajax_delete_link
wp_ajax_delete_tag
wp_ajax_delete_comment
wp_ajax_oembed_cache
wp_ajax_imgedit_preview
wp_ajax_fetch_list
Dying Functions
g.V().out('CALLS')
.has('name','wp_die')
.count() => 84
g.V().has('name','wp_die')
.in('CALLS')
.count() => 84
Dying Functions
g.V().out('CALLS')
.has('name','wp_die')
.dedup()
.count() => 1
g.V().has('name','wp_die')
.in('CALLS')
.dedup()
.count() => 84
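The difference between the two dedup() results comes from where the traversers end up: after out('CALLS') they all sit on the same wp_die vertex, while after in('CALLS') each one sits on a different caller. A small Python sketch with a made-up subset of the call graph (three callers taken from the slides, one hypothetical extra call) shows the same effect.

```python
# (caller, callee) pairs; a tiny made-up subset of the call graph.
CALLS = [
    ("wp_ajax_trash_post", "wp_die"),
    ("wp_ajax_delete_post", "wp_die"),
    ("wp_ajax_delete_meta", "wp_die"),
    ("wp_ajax_delete_meta", "esc_html"),  # hypothetical extra call
]

# g.V().out('CALLS').has('name','wp_die'): one traverser per edge, all on wp_die
arrived_on = [callee for (_, callee) in CALLS if callee == "wp_die"]

# g.V().has('name','wp_die').in('CALLS'): one traverser per caller
arrived_from = [caller for (caller, callee) in CALLS if callee == "wp_die"]

print(len(arrived_on), len(set(arrived_on)))      # 3 1
print(len(arrived_from), len(set(arrived_from)))  # 3 3
```

Counting before dedup() gives the number of paths; counting after dedup() gives the number of distinct vertices the traversers stand on.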
Sampling
g.V().limit(2).count() => 2
g.V().range(2,5).count() => 3
g.V().tail(4).count() => 4
g.V().coin(0.01).count() => 44
g.V().count() => 4373
g.E().count() => 55457
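The sampling steps have close counterparts in Python's standard library, which may help to picture what each one keeps. This sketch assumes the vertices are just the ids 1 to 4373, which is an invention made to match the vertex count above. The coin() line is random, so its count changes from run to run, as it does in Gremlin.

```python
import random
from itertools import islice

# 4373 made-up vertex ids, matching the vertex count on the slide.
vertices = list(range(1, 4374))

limited = list(islice(vertices, 2))       # limit(2): the first two
ranged = list(islice(vertices, 2, 5))     # range(2,5): positions 2, 3 and 4
tailed = vertices[-4:]                    # tail(4): the last four
flipped = [v for v in vertices if random.random() < 0.01]  # coin(0.01)

print(len(limited), len(ranged), len(tailed))  # 2 3 4
print(len(flipped))                            # around 1% of 4373, varies
```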
[Figure: sample vertices wp_parse_args (2), wp_get_object_terms (3) and wp_terms_checklist (4), reached with g.V() and counted with count()]
g.V().as('start')
.out('CALLS')
.has('name','wp_die')
.select('start')
.by('name') => wp_ajax_trash_post
wp_ajax_delete_post
wp_ajax_delete_meta
wp_ajax_delete_link
wp_ajax_delete_tag
wp_ajax_delete_comment
wp_ajax_oembed_cache
wp_ajax_imgedit_preview
wp_ajax_fetch_list
Dying Functions
Naming nodes
as(): gives a name to the current step
select(): goes back to a named node
by(): chooses what to display
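One way to picture as() and select() is that every traverser carries a small notebook of named vertices along its path. The Python sketch below follows that idea, assuming a traverser is a (current vertex, labels) pair. The data is a made-up subset of the call graph, and the get_permalink edge is purely illustrative.

```python
VERTICES = ["wp_ajax_trash_post", "wp_ajax_fetch_list",
            "get_permalink", "get_category_link", "wp_die"]
CALLS = {
    "wp_ajax_trash_post": ["wp_die"],
    "wp_ajax_fetch_list": ["wp_die"],
    "get_permalink": ["get_category_link"],  # illustrative edge
}

# g.V().as('start'): every vertex remembers itself under the label 'start'
traversers = [(name, {"start": name}) for name in VERTICES]

# .out('CALLS'): move to the callees, the labels travel along
traversers = [(callee, labels)
              for (name, labels) in traversers
              for callee in CALLS.get(name, [])]

# .has('name','wp_die'): keep the traversers standing on wp_die
traversers = [(name, labels) for (name, labels) in traversers if name == "wp_die"]

# .select('start').by('name'): display the remembered vertex, not the current one
result = [labels["start"] for (_, labels) in traversers]

print(result)  # ['wp_ajax_trash_post', 'wp_ajax_fetch_list']
```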
Filtering on vertices
//Functions that call wp_die and esc_html
g.V().has('name','wp_die')
.in('CALLS')
.as('results')
.out('CALLS')
.has('name', 'esc_html')
.select('results')
[Figure: unknown functions (????) with CALLS edges to wp_die and esc_html]
Filtering on vertices
//Functions that call wp_die and esc_html
g.V().where(
__.out('CALLS')
.has('name','wp_die')
)
.as('results')
.out('CALLS')
.has('name', 'esc_html')
.select('results')
Filtering on vertices
//Functions that call wp_die and esc_html
g.V().where(
__.out('CALLS')
.has('name','wp_die')
)
.where(
__.out('CALLS')
.has('name', 'esc_html')
)
Filtering on vertices
//Functions that call wp_die and esc_html
g.V().where(
__.out('CALLS')
.has('name','wp_die')
)
.where(
__.out('CALLS')
.has('name', 'esc_html')
.count().is(neq(0))
)
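Because where() tests a condition without moving the traverser, two where() clauses simply act as two filters on the same vertex. A minimal Python sketch of that idea follows. The function names f_both, f_die_only and f_html_only and the helper `calls` are hypothetical, made up for the example.

```python
# Call graph as "caller -> list of callees"; the callers are made up.
CALLS = {
    "f_both": ["wp_die", "esc_html"],
    "f_die_only": ["wp_die"],
    "f_html_only": ["esc_html"],
    "wp_die": [],
    "esc_html": [],
}


def calls(caller, callee):
    """True if `caller` has a CALLS edge to `callee`."""
    return callee in CALLS.get(caller, [])


# g.V().where(out CALLS wp_die).where(out CALLS esc_html)
result = [name for name in CALLS
          if calls(name, "wp_die") and calls(name, "esc_html")]

print(result)  # ['f_both']
```

No as() or select() is needed here, since the filters never leave the vertex being tested.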
Traversal full ahead
g.V().in/out()
.has()
.where()
.as()
.select().by().by()
Advanced
Applied to properties
Non standard functions
g.V().filter{
it.get().value('name') !=
it.get().value('name').toLowerCase()
}
.count() => 73
Closures
Many steps accept a closure
A closure is written in Groovy, between {}, and uses 'it.get()' to reach the current node
A closure should be replaced by a step, unless a special manipulation is needed
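For readers less familiar with Groovy, the closure filter above is the same idea as a filtering expression in most languages: keep the names that are not already all lowercase. A short Python sketch, with made-up function names:

```python
# Made-up function names; two of them contain uppercase letters.
names = ["wp_die", "WP_Error_Handler", "esc_html", "_deprecated_Function"]

# Same test as the Groovy closure: name != name.toLowerCase()
mixed_case = [n for n in names if n != n.lower()]

print(mixed_case)       # ['WP_Error_Handler', '_deprecated_Function']
print(len(mixed_case))  # 2
```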
GroupBy/GroupCount
g.V().groupCount('a').by('name').cap('a')
==>[wp_die:22, wp_header:24…]
g.V().groupCount('a').by('name')
.groupCount('b').by(out().count())
.cap('a','b')
==>[a:[wp_die:22, wp_header:24], b:[22:1, 24:1]]
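groupCount() with by() is close to a counter keyed on whatever by() computes. The following Python sketch uses `collections.Counter` on a handful of made-up vertices, so the numbers differ from those on the slide. The `calls` field is a stand-in for the outgoing CALLS edges.

```python
from collections import Counter

# Made-up vertices: a name and the functions each one calls.
vertices = [
    {"name": "wp_die", "calls": ["esc_html", "_e"]},
    {"name": "wp_header", "calls": ["esc_html"]},
    {"name": "esc_html", "calls": []},
    {"name": "_e", "calls": []},
]

# groupCount('a').by('name'): how many vertices per name
by_name = Counter(v["name"] for v in vertices)

# groupCount('b').by(out().count()): how many vertices per number of calls
by_out = Counter(len(v["calls"]) for v in vertices)

# cap('a','b'): both side-effects in one result
result = {"a": dict(by_name), "b": dict(by_out)}

print(result["b"])  # {2: 1, 1: 1, 0: 2}
```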
Gremlin and PHP
Gremlin For PHP
https://github.com/PommeVerte/gremlin-php
Using with Neo4j : REST API
Older API : neo4jPHP, rexpro-php
No Gremlin implementation in PHP (yet?)
<?php
require_once('vendor/autoload.php');
// depending on your project this may not be necessary
use Brightzone\GremlinDriver\Connection;
 
$db = new Connection([
   'host' => 'localhost',
   'graph' => 'graph'
]);
$db->open();
 
$result = $db->send('g.V(2)'); 
//do something with result
$db->close();
Apache TinkerPop
http://tinkerpop.incubator.apache.org/
Version : 3.2.3
Database
StarDog
sqlg
Gremlin
Server
Console
bin/gremlin-server.sh conf/gremlin-server-modern.yaml
:install org.apache.tinkerpop neo4j-gremlin 3.2.3-incubating
Thanks
dseguy@exakat.io @exakat
http://www.slideshare.net/dseguy/
Leaf and Roots
LEAF
ROOT
g.V().where( out('CALLS') )
.count() => 407
g.V().where( __.in('CALLS').count().is(eq(0)) )
.count() => 1304
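As a rough illustration of the two notions, here is a Python sketch that assumes a leaf is a function that calls no other function and a root is a function that no other function calls. The edges are the password-related calls shown earlier, assuming vertex 19 is wp_set_password.

```python
# (caller, callee) pairs, assuming vertex 19 is wp_set_password.
CALLS = [
    ("reset_password", "wp_set_password"),
    ("wp_check_password", "wp_set_password"),
    ("wp_set_password", "wp_hash_password"),
    ("wp_set_password", "wp_cache_delete"),
]

callers = {caller for (caller, _) in CALLS}
callees = {callee for (_, callee) in CALLS}
functions = callers | callees

leaves = functions - callers   # no outgoing CALLS edge
roots = functions - callees    # no incoming CALLS edge

print(sorted(leaves))  # ['wp_cache_delete', 'wp_hash_password']
print(sorted(roots))   # ['reset_password', 'wp_check_password']
```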

PHP in the graph (Gremlin 3)