
PLSQL ContextSwitching

This document discusses the inefficiencies that can arise from frequent context switching between SQL and PL/SQL execution. PL/SQL functions are often slower than equivalent SQL queries due to overhead from repeatedly switching between SQL and PL/SQL evaluation modes when functions are called within SQL queries. The authors propose compiling PL/SQL functions into equivalent SQL subqueries to avoid this context switching overhead and allow more efficient combined evaluation. Their approach supports arbitrary PL/SQL control flow by compiling iteration into SQL-level recursion. Preliminary results show significant performance improvements by eliminating context switching between SQL and PL/SQL execution.

Compiling PL/SQL Away

Christian Duta Denis Hirn Torsten Grust


Universität Tübingen
Tübingen, Germany
[ christian.duta, denis.hirn, torsten.grust ]@uni-tuebingen.de

ABSTRACT

“PL/SQL functions are slow,” is common developer wisdom that derives from the tension between set-oriented SQL evaluation and statement-by-statement PL/SQL interpretation. We pursue the radical approach of compiling PL/SQL away, turning interpreted functions into regular subqueries that can then be efficiently evaluated together with their embracing SQL query, avoiding any PL/SQL ↔ SQL context switches. Input PL/SQL functions may exhibit arbitrary control flow. Iteration, in particular, is compiled into SQL-level recursion. RDBMSs across the board reward this compilation effort with significant run time savings that render established developer lore questionable.

This article is published under a Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits distribution and reproduction in any medium as well allowing derivative works, provided that you attribute the original work to the author(s) and CIDR 2020.
10th Annual Conference on Innovative Data Systems Research (CIDR ’20)
January 12–15, 2020, Amsterdam, Netherlands.

1. NOW IS NOT A GOOD TIME TO INTERRUPT ME

Frequent changes | The required | of context | context switching effort | can turn | may even outweigh | otherwise tractable tasks | the cost | into real challenges. | of the tasks themselves.

If you have found those two sentences hard to comprehend, you were struggling with the context switches (occurring at every | bar) needed to process a piece of one sentence before immediately turning focus back to the other.

SQL evaluation in relational DBMSs can face such frequent context switches, in particular if bits of the query logic are implemented using PL/SQL,¹ the in-database scripting language. Whenever a SQL query Q invokes a PL/SQL function, say f,
• the DBMS switches from set-oriented plan evaluation to statement-by-statement PL/SQL interpretation mode (referred to as switch Q→f in the sequel).
• Execution of f’s statements then switches query evaluation back to plan mode (possibly multiple times) to evaluate the SQL queries Qi embedded in f (switch f→Qi).

¹ We refer to the language as PL/SQL as coined by Oracle. Our discussion extends to its variants known as PL/pgSQL in PostgreSQL or T-SQL in Microsoft SQL Server.

Context switches will be abundant. If f’s call site is located inside a SELECT-FROM-WHERE block of Q, each row processed by the block will invoke f. Likewise, if f embeds multiple queries or employs iteration, e.g., in terms of FOR or WHILE loops, we observe repeated plan evaluation for the Qi.

Unfortunately, both kinds of context switches are costly. Each switch Q→f incurs overhead for PL/SQL interpreter invocation or resumption. A switch f→Qi leads to overhead due to (1) plan generation and caching on the first evaluation of Qi or (2) plan cache lookup, plan instantiation, and plan teardown for each subsequent evaluation of Qi. Iteration in both Q and f multiplies the toll.

Let us make the conundrum concrete with PL/pgSQL function walk() of Figure 3. The function simulates the walk of a robot on a grid whose cells hold rewards (see Figures 1a and 2a). On cell (x, y) the robot follows a prescribed policy (e.g., move down ↓ if on cell (3, 0), see Figures 1b and 2b). This policy has been precomputed by a Markov decision process which takes into account that the robot may stray from its prescribed path: a planned move right from (3, 2) will reach (4, 2) with probability 80% but may actually end up in (3, 3) or (3, 2), each with probability 10% (see Figures 1c and 2c). A call walk(o,w,l,s) starts the robot in origin cell o and performs a maximum of s steps; walk returns early if the accumulated reward exceeds w or falls below l.

Each execution of PL/SQL function walk leads to the iterated evaluation of the embedded SQL queries Q1...3. The run time profile on the rightmost edge of Figure 3 identifies these embedded queries to use the lion’s share of execution time (e.g., Q2 accounts for 54.02% of walk’s overall run time). While we expect such embedded queries to dominate over the evaluation of simpler expressions and statements, the profile also shows that a significant portion of the evaluation time for the Qi stems from walk→Qi context switch overhead (see the black section of the profile bars). For PostgreSQL, this cost is to be attributed to the engine’s ExecutorStart and ExecutorEnd functions. These prepare the Qi’s plans (i.e., copy the cached plan into a runtime data structure and instantiate the query’s placeholders) and free temporary memory contexts, respectively. The FOR loop iteration in walk multiplies this effort. The bottom line shows that PostgreSQL invests more than 35% of its time in walk→Qi overhead during each invocation of walk. Section 3 shows similar or worse news for more PL/pgSQL functions.

Two worlds of interpreters. We stress that the roots of this tension between SQL and PL/SQL lie deep. We do not merely observe a deficiency of the languages’ implementation in a specific RDBMS, say, PostgreSQL.
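As a rough illustration of the per-row Q→f switch described in Section 1, the following Python sketch registers a Python function as a scalar UDF in an in-memory SQLite database and counts how often the engine calls it. This is not taken from the paper: SQLite and the Python callback merely stand in for the DBMS and the PL/SQL interpreter, and the names count_udf_calls, t, and f are made up for this example.

```python
import sqlite3


def count_udf_calls(num_rows):
    """Run one query that calls a scalar UDF per row.

    Returns (query result, number of times the engine called the UDF).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t(x INTEGER)")
    conn.executemany("INSERT INTO t(x) VALUES (?)",
                     [(i,) for i in range(num_rows)])

    calls = 0

    def f(x):
        # Stand-in for a PL/SQL function: every call leaves the SQL engine
        # and runs interpreted code outside the query plan.
        nonlocal calls
        calls += 1
        return x + 1

    # Register f as a one-argument scalar SQL function named "f".
    conn.create_function("f", 1, f)

    # f's call site sits inside the SELECT block, so each row triggers a call.
    (total,) = conn.execute("SELECT SUM(f(x)) FROM t").fetchone()
    conn.close()
    return total, calls


if __name__ == "__main__":
    total, calls = count_udf_calls(1000)
    print("sum:", total, "UDF calls:", calls)
```

For a table of 1000 rows the single query leaves the engine 1000 times, once per row. The sketch only counts switches; it does not measure their cost.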
Figure 1: Controlling an unreliable robot to collect rewards. Panels: (a) Cell rewards. (b) Markov policy. (c) Random straying (a planned move right from (3,2) reaches (4,2) with probability 0.8, and (3,3) or (3,2) with probability 0.1 each).

    cells              policy             actions
    loc    reward      loc    action      here   action  there  prob
    (2,0)  -2          (2,0)  ↓           ⋮      ⋮       ⋮      ⋮
    (3,0)   0          (3,0)  ↓           (3,2)  →       (4,2)  0.8
    (4,0)  -1          (4,0)  ↓           (3,2)  →       (3,3)  0.1
    (0,1)  -2          (0,1)  ↓           (3,2)  →       (3,2)  0.1
    ⋮      ⋮           ⋮      ⋮           ⋮      ⋮       ⋮      ⋮
    (4,4)  -2          (4,4)  ↑
    (a) Rewards.       (b) Policy.        (c) Actions and straying.

Figure 2: A tabular encoding of the robot control scenario.

    -- move robot following a precomputed Markov policy
    CREATE FUNCTION walk(origin coord, win int, loose int, steps int)
    RETURNS int AS $$
    DECLARE
      reward   int   = 0;
      location coord = origin;
      movement text  = '';
      roll     float;
    BEGIN
      -- move robot repeatedly
      FOR step IN 1..steps LOOP
        -- where does the Markov policy send the robot from here?
        movement = (SELECT p.action                        -- Q1[·]: 28.40% of run time
                    FROM   policy AS p
                    WHERE  location = p.loc);
        -- compute new location of robot,
        -- robot may randomly stray from policy's direction
        roll = random();                                   -- 0.03% of run time
        location =                                         -- Q2[·,·,·]: 54.02% of run time
          (SELECT move.loc
           FROM   (SELECT a.there AS loc,
                          COALESCE(SUM(a.prob) OVER lt, 0.0) AS lo,
                          SUM(a.prob) OVER leq AS hi
                   FROM   actions AS a
                   WHERE  location = a.here AND movement = a.action
                   WINDOW leq AS (ORDER BY a.there),
                          lt  AS (leq ROWS UNBOUNDED PRECEDING
                                      EXCLUDE CURRENT ROW)
                  ) AS move(loc, lo, hi)
           WHERE  roll BETWEEN move.lo AND move.hi);
        -- robot collects reward (or penalty) at new location
        reward = reward + (SELECT c.reward                 -- Q3[·]: 12.44% of run time
                           FROM   cells AS c
                           WHERE  location = c.loc);
        -- bail out if we win or lose early
        IF reward >= win OR reward <= loose THEN           -- 0.03% of run time
          RETURN step * sign(reward);
        END IF;
      END LOOP;
      -- draw: robot performed all steps without winning or losing
      RETURN 0;                                            -- 0.01% of run time
    END;
    $$ LANGUAGE PLPGSQL;

Figure 3: Original PL/pgSQL function walk. Black sections of the profile bars quantify f→Qi context switch overhead.

Once a (pure) SQL query has been translated into an internal tree of algebraic operators, its evaluation is driven by a very specifically tuned plan interpreter: (1) a limited set of operators of known interface and behavior are orchestrated such that operator fusion or transitions between row-by-row and batched evaluation are feasible. Further, (2) the interpreter realizes a rigid evaluation discipline (in Volcano-style, for example) following predetermined paths of control. The deliberate control flow inside a PL/SQL function calls for a different, imperative style of interpreter whose progress is determined by the then current state of updateable variables. The function body is assembled from arbitrary blocks of statements whose behavior (“will it loop, will it exit early?”) is not known a priori.

A merger of the SQL and PL/SQL interpreters thus appears to be elusive. We regard the friction between both and the resulting context switch costs to be fundamental. (The situation may be different in DBMSs that compile SQL and PL/SQL into a common intermediate form that is then evaluated by a single interpreter or even the CPU if the IR is native to the host machine. The present work is concerned with interpreted query evaluation.)

Froid. PL/SQL has long been identified as a culprit for disappointing database application performance and it is common developer wisdom to “avoid PL/SQL functions altogether if possible” [11]. The situation is dire and has led to recent drastic efforts, coined Froid [11], by the Microsoft SQL Server team: if function f is simple enough, compile its statements into a regular SQL subquery Qf that can be inlined into the containing SQL query Q. Queries Q and Qf may then be planned once and executed together in set-oriented fashion, avoiding any Q→f or f→Qi overheads. SQL Server with Froid indeed enjoys noticeable performance improvements and has been recognized as a major step forward by both the developer and the database research communities [9].

In a nutshell, Froid transforms sequences of PL/SQL assignment statements into subqueries that are chained together with SQL Server’s OUTER APPLY [6, 11]. The technique is elegant and simple but comes with severe restrictions: foremost, the just mentioned chaining will only work for functions f that exhibit loop-less control flow. This rules out PL/SQL functions like walk that build on WHILE or FOR iteration, arguably core constructs in any imperative programming language.

Compile PL/SQL away. We, too, subscribe to the drastic approach of Froid. However, we also believe that efforts that aim to host complex computation inside the DBMS, and thus close to the data, need to support expressive programming language dialects. Control flow restrictions will be an immediate show stopper for the majority of interesting computational workloads. The present research thus sets out

1. to completely compile PL/SQL functions f away, transforming them into regular SQL queries Qf. The PL/SQL functions may feature iteration; in fact, any control flow is acceptable. If f indeed contained iteration, Qf will employ a recursive common table expression (CTE, WITH RECURSIVE) to express this in pure SQL. No changes to the underlying DBMS are required (although modest local changes can provide another boost, see Section 3).
2. We study and quantify the run time impact of this compilation approach and the benefit of getting rid of Q→f and f→Qi context switches, in particular.
3. As a by-product, the approach enables in-database programming support for DBMSs like SQLite3 that previously lacked any PL/SQL support at all.

Section 2, the core of this paper, elaborates on the compilation technique that turns iterative PL/SQL into recursive SQL. We hope to show that the transformation is systematic and practical. Along the way, we point out several opportunities to make the approach even more efficient. Section 3 reports on experimental observations we made once we compiled PL/SQL away.

2. COMPILING PL/SQL AWAY

The following structures the compilation into a series of transformation steps. We will use the PL/SQL function walk of Figure 3 as a running example and show interim results after each step. These intermediate forms of walk reveal further optimizations and simplifications we could apply along the way. The four forms are (also see Figure 4):

SSA: Turn PL/SQL function f into static single assignment (SSA) form. This maps the diversity of PL/SQL control flow constructs to the single goto primitive.
ANF: From the SSA form derive a functional administrative normal form (ANF) for f which expresses iteration in terms of (mutually tail-)recursive functions.
UDF: Flatten mutual recursion and map the ANF functions into one tail-recursive SQL user-defined function.
SQL: Identify recursive calls and base cases in the body of this UDF and embed the body into a template query based on WITH RECURSIVE. This yields the SQL query Qf we are after.

Query Qf may then be inlined into Q at the call sites of the original function f.

    Form:   f          →  SSA   →  ANF        →  UDF        →  SQL             →  Qf
    Style:  iterative     goto     recursive     recursive     WITH RECURSIVE
    Kind:   PL/SQL                               SQL UDF       pure SQL

Figure 4: Intermediate forms on the way from f to Qf.

SSA: Explicit Data Flow and Simple goto-Based Control Flow

Lowering the PL/SQL input into its static single assignment (SSA) form [1] preserves the function body’s imperative style but introduces the invariant that any variable is now assigned exactly once (see Figure 5). Variable reassignment in the original function leads to the introduction of a new variable version (e.g., step2 in Line 15) in SSA form. φ functions model that an assignment might be reached via different control flow paths. The SSA invariant facilitates a wide range of code simplifications, among these the tracking of redundant code, constant propagation, or strength reduction. Others have studied these in depth [5]. Let us note that PL/SQL code is subject to the same optimizations as any imperative programming language.

     1  function walk(origin, win, loose, steps)
     2  {
     3  L0: goto L1;
     4  L1: reward1   ← φ(L0: 0,      L2: reward2);
     5      location1 ← φ(L0: origin, L2: location2);
     6      movement1 ← φ(L0: '',     L2: movement2);
     7      step1     ← φ(L0: 0,      L2: step2);
     8      if step1 <= steps then
     9        goto L2
    10      else return 0;
    11  L2: movement2 ← (Q1[location1]);
    12      roll      ← random();
    13      location2 ← (Q2[location1, movement2, roll]);
    14      reward2   ← reward1 + (Q3[location2]);
    15      step2     ← step1 + 1;
    16      if reward2 >= win OR reward2 <= loose then
    17        return step1 * sign(reward2);
    18      goto L1;
    19  }

Figure 5: Iterative SSA form of PL/SQL function walk.

Statements in SSA programs are deliberately simple, featuring assignments, conditionals, gotos, and return only. In the PL/SQL case, expressions in these SSA programs are regular SQL expressions. The SSA program contains the original walk’s embedded queries Q1...3, with their query parameters instantiated by the appropriate SSA variables (see Q1[location1] in Line 11, for example).

Importantly, the zoo of PL/SQL control flow constructs (including LOOP, EXIT (to label), CONTINUE (at label), FOREACH, FOR, WHILE) is now exclusively expressed in terms of goto and jump labels Lx. While verbose (the original FOR loop is now implemented by the conditional goto in Lines 8 to 10, the assignment of Line 15, and the goto L1 of Line 18, for example), this homogeneity aids subsequent steps that trade
control flow for function calls.

ANF: Turning Iteration Into Tail Recursion

Despite its imperative appearance, the single-assignment restriction renders SSA already quite close to the functional administrative normal form (ANF) [2]. To translate from SSA to ANF we adapt an algorithm by Chakravarty and colleagues [3]. The resulting programs are purely expression-based and are composed of let(rec)·in and if·then·else expressions only (besides basic subexpressions which, as for SSA, directly follow SQL syntax and semantics).

Put briefly, we arrive at the ANF of function walk (shown in Figure 6) by
• translating each jump label Lx and the statement block it governs into a separate function Lx(),
• turning goto Lx into calls to function Lx(), while
• supplying the values of φ-bound variables in Lx() as parameters to these function calls (we additionally perform lambda lifting and supply free variables as explicit parameters).

If we follow this strategy, iteration (i.e., looping back to a label) will turn into recursion. Any such recursive call will be in tail position (since control does not return after a goto; see the calls to L1() in Lines 17 and 23 and L2() in Line 20) which will be crucial in the final translation to SQL’s WITH RECURSIVE.

Finally, note how sequences of statements have turned into chains of nested lets which nicely prepares the transcription to a SQL UDF in the upcoming step.

     1  function walk(origin, win, loose, steps) =
     2    letrec L1(reward1, location1, movement1, step1) =
     3      letrec L2(reward1, location1, movement1, step1) =
     4        let movement2 = (Q1[location1])
     5        in
     6        let roll = random()
     7        in
     8        let location2 = (Q2[location1, movement2, roll])
     9        in
    10        let reward2 = reward1 + Q3[location2]
    11        in
    12        let step2 = step1 + 1
    13        in
    14        if reward2 >= win OR reward2 <= loose then
    15          step1 * sign(reward2)
    16        else
    17          L1(reward2,location2,movement2,step2)
    18      in
    19      if step1 <= steps then
    20        L2(reward1,location1,movement1,step1)
    21      else 0
    22    in
    23    L1(0, origin, '', 0)

Figure 6: Tail-recursive ANF variant of function walk.

UDF: Direct Tail Recursion in a SQL UDF

We take a first step towards SQL and transcribe the intermediate ANF into a user-defined SQL function (UDF). See Figure 7 (ignore the annotations for now). The mutual recursion between functions L1() and L2() is flattened using an additional parameter fn whose value discerns between the two call targets. This conversion into direct recursion follows standard defunctionalization tactics [7, 12], but inlining would work as well.

We follow Froid and compile ANF constructs let·in and if·then·else into SQL’s table-less SELECT and CASE·WHEN, respectively. Nested let bindings translate into SELECTs that are chained using LEFT JOIN LATERAL. If ⟦e⟧ denotes the SQL equivalent of ANF expression e, we have

    ⟦let v = e1 in e2⟧ = SELECT ⟦e1⟧ AS _(v) LEFT JOIN LATERAL ⟦e2⟧ ON true.

LATERAL, introduced by the SQL:1999 standard, implements the dependency of e2 on variable v. In a sense, LATERAL thus assumes the role of statement sequencing via ; in PL/SQL. Here, Froid relied on the Microsoft SQL Server-specific OUTER APPLY instead [6, 11]. The resulting LATERAL chains may look intimidating but note that these joins process single-row tables containing bindings of names to (scalar) values.

     1  CREATE FUNCTION walk(origin coord, win int, loose int, steps int)
     2  RETURNS int AS $$
     3    SELECT walk*(L1, 0, origin, '', 0, win, loose, steps);
     4  $$ LANGUAGE SQL;

     5  CREATE FUNCTION walk*(
     6    fn int, reward1 int, location1 coord, movement1 text, step1 int,
     7    win int, loose int, steps int)
     8  RETURNS int AS $$
     9    SELECT
    10      CASE
    11        WHEN fn = L1 THEN
    12          CASE
    13            WHEN step1 <= steps THEN
    14              walk*(L2,reward1,location1,movement1,step1,
    15                    win,loose,steps)
    16            ELSE 0
    17          END
    18        WHEN fn = L2 THEN
    19          (SELECT
    20             CASE
    21               WHEN reward2 >= win OR reward2 <= loose THEN
    22                 step1 * sign(reward2)
    23               ELSE walk*(L1,reward2,location2,movement2,step2,
    24                          win,loose,steps)
    25             END
    26           FROM
    27             (SELECT (Q1[location1])) AS _0(movement2)
    28             LEFT JOIN LATERAL
    29             (SELECT random()) AS _1(roll)
    30             ON true LEFT JOIN LATERAL
    31             (SELECT (Q2[location1, movement2, roll])) AS _2(location2)
    32             ON true LEFT JOIN LATERAL
    33             (SELECT reward1 + (Q3[location2])) AS _3(reward2)
    34             ON true LEFT JOIN LATERAL
    35             (SELECT step1 + 1) AS _4(step2)
    36             ON true)
    37      END
    38  $$ LANGUAGE SQL;

Figure 7: Recursive SQL UDF walk* and its wrapper walk. The overlaid AST (nodes a to o) becomes relevant in step SQL.

This translation step emits a regular SQL UDF which features direct tail recursion: in the case of walk* of Figure 7 we find two recursive call sites at Lines 14 and 23. DBMSs that admit such recursive UDFs could, in principle, evaluate this function to compute the result of the original PL/SQL procedure. We observe, however, that
• some DBMSs (among these MySQL, for example) forbid recursion in user-defined SQL functions, and that
• the direct evaluation of these UDFs has disappointing performance characteristics. This is, again, due to significant Q→f and f→Q overhead: the plan for UDF f*’s body needs to be prepared and instantiated anew on each recursive invocation. Additionally, we quickly hit default stack depth limits, e.g., in PostgreSQL or SQL Server.
SQL: Inlinable SQL CTE (WITH RECURSIVE)

Instead, we bank on a SQL:1999-style recursive CTE [13, § 7.12] as an evaluation strategy for recursion, ultimately compiling any use of PL/SQL or SQL user-defined functions away. The CTE constructs a table run(call?,args,result)² that tracks the evaluation of the recursive UDF f*:
• call?: Does the UDF perform a recursive call (true) or evaluate a base case (false)?
• args: In case of a call, what arguments are passed to f*?
• result: In a base case, what is the function’s result?

² args abbreviates the list of UDF arguments. For walk*, args = fn, reward1, location1, movement1, step1, win, loose, steps.

Recall that we are dealing with tail recursion: once we reach a base case, the UDF’s result is known and no further recursive ascent is required. The obtained result may thus be returned as the original function’s outcome.

The evaluation of call f(in) is expressed by the simple WITH RECURSIVE SQL code template of Figure 8:
• Line 3: Start evaluation with the original invocation of the UDF for argument list in. f*’s result (of type τ) is yet unknown and thus encoded as NULL.
• Line 9: Continue evaluation as long as new recursive calls are to be performed.
• Line 8: Evaluate the body of f* for the current arguments held in r.args. This either leads to a new call or the evaluation of a base case.
• Lines 12 to 14: Once we reach a base case, extract its result. Done.

     1  WITH RECURSIVE run("call?", args, result) AS (
     2    -- original function call
     3    SELECT true AS "call?", in AS args, CAST(NULL AS τ) AS result
     4      UNION ALL
     5    -- subsequent recursive calls and base cases
     6    SELECT iter.*
     7    FROM   run AS r,
     8           LATERAL (body(f*, r)) AS iter("call?", args, result)
     9    WHERE  r."call?"
    10  )
    11  -- extract result of final recursive function invocation
    12  SELECT r.result
    13  FROM   run AS r
    14  WHERE  NOT r."call?"

Figure 8: SQL CTE template that evaluates tail-recursive f*.

The code template of Figure 8 is entirely generic. It is to be completed with a slightly adapted body, body(f*, r), of the UDF f* for f. In this adaptation,
• a recursive call f*(args) is replaced by the construction of row (true,args,NULL) which encodes just that call in simulation table run,
• a base case expression with result v of type τ is replaced by row (false,NULL,v).

Figure 9 depicts the resulting body body(walk*, r) for the recursive UDF of Figure 7. The construction of body(f*, ·) calls for a simple abstract syntax tree (AST) traversal of the body of UDF f*. Selected fragments of the AST for function walk* are shown in an overlay of Figure 7. This traversal identifies the leaves of the computation (i.e., the recursive call sites f, o and base case expressions h, m) and performs the local replacements described above.

    SELECT
      CASE
        WHEN r.fn = L1 THEN
          CASE
            WHEN r.step1 <= r.steps THEN
              ROW(true,
                  ROW(L2,r.reward1,r.location1,r.movement1,r.step1,
                      r.win, r.loose, r.steps),
                  NULL)
            ELSE ROW(false, NULL, 0)
          END
        WHEN r.fn = L2 THEN
          (SELECT
             CASE
               WHEN reward2 >= r.win OR reward2 <= r.loose THEN
                 ROW(false,NULL,r.step1 * sign(reward2))
               ELSE ROW(true,
                        ROW(L1,reward2,location2,movement2,step2,
                            r.win, r.loose, r.steps),
                        NULL)
             END
           FROM
             (...   -- code of Figure 7
              ...)  -- (binds reward2, location2, movement2, step2)
      END

Figure 9: Adapted UDF body body(walk*, r). At f, o, and h, m row construction replaces recursive calls and base cases.

Finalization. A merge of body(f*, r) with the SQL code template yields a pure SQL expression which may be inlined at f’s call sites in the embracing query Q. Any occurrence of PL/SQL has been compiled away. The DBMS will be able to compile the resulting SQL query into a regular plan and jointly optimize the formerly separated code of Q, the transformed body of f, and the embedded queries Qi. Most importantly, the evaluation of Q instantiates this joint plan once and will proceed without the need for Q→f or f→Qi context switches. The upcoming section quantifies the benefits we can now reap.

Beyond tail recursion. Let us close this discussion by noting that the WITH RECURSIVE-based simulation of a recursive function extends beyond tail recursion. Table run can be generalized to hold a true call graph that then does support recursive ascent. While this is not needed in the context of the present work, this paves the way for an intuitive, functional-style notation for SQL UDFs that may employ linear or general n-way recursion. The run time savings can be significant, again due to the absence of plan instantiation effort. We are actively pursuing this idea in a parallel strand of research that aims to leave more complex recursive computation in the hands of the DBMS itself.

3. ONCE PL/SQL IS GONE

Function walk() is not the exception. The described overheads are pervasive [11] and we, too, observed them across a variety of PL/SQL functions.

Context switching overhead. Table 1 contains a sample of iterative functions and reports the run time for repeated plan instantiation and deallocation to evaluate their embedded queries. Columns Exec·Start and Exec·End embody the f→Qi context switch overhead present in PostgreSQL (recall Section 1). Across the functions, we find overall f→Qi overheads of up to 38%. Only the columns Exec·Run and Interp represent productive evaluation effort: the execution of embedded queries and PL/SQL interpretation, respectively. Function fibonacci, an iterative computation of the nth Fibonacci number, evaluates arithmetic expressions only and
Table 1: Run time spent (in %) during PL/SQL evaluation. PostgreSQL 11.3 instance hosted on a Linux-based x86 box
Bold entries indicate context switch overhead of kind f→Qi . with 8 Intel CoreTM i7 CPUs running at 3.66 GHz with 64 GB
of RAM. We report the average as well as the window of
PL/pgSQL Function Exec·Start Exec·Run Exec·End Interp minimal/maximal measurements of ten runs.)
walk 30.89 55.13 4.36 9.63 Scaling the number of context switches. We can quite con-
see Figure 3
parse 13.84 68.52 2.20 15.62 sistently observe these savings of > 40% across a wide range
via finite state automaton of scenarios. Figure 11a varies the number of invocations
traverse 31.80 35.82 6.03 26.35 of walk() as well as the intra-function FOR loop iterations to
directed graph traversal
fibonacci
iteratively compute fib(n)
0 90.45 0 9.55 obtain a heat map of run time improvements. Only very
small numbers of invocations and/or iterations fail to com-
pensate the one-time cost to optimize and evaluate the tem-
plate query of Figure 8 (see the heat map’s lower left). Be-
does not execute embedded queries. PostgreSQL evaluates yond 32 invocations and/or iterations, the transformation
such simple expressions using a fast path that already fore- to recursive SQL always is a clear win.
goes plan instantiation. Compiling PL/SQL away does not
promise much in this case. Still, turning query-less iterative Beyond PostgreSQL. Modulo syntactic details, we were able
functions into pure SQL can uncover opportunities for paral- to apply the function transformation of Section 2 imme-
lel evaluation—this is a direction we have not yet explored. diately to Oracle, MySQL, SQL Server, and HyPer. As an
example, Figure 11b shows how the evaluation of parse() on
Iterative PL/SQL vs. Recursive SQL. For PL/pgSQL func- Oracle can significantly benefit once PL/SQL is traded for re-
tion walk(), Table 1 indicates potential run time savings of cursive SQL (measurements in the lower left appeared to be
about 35% ≈ 30.89%+4.36% should we manage to get rid of close to 100 ; we have omitted them here due to the DBMS’s
context switching overhead. The translation from iterative coarse timer resolution). SQLite3 lacks support for LATERAL,
PL/SQL to pure SQL built on a recursive CTE can indeed but a simple syntactic rewrite brought the functions to run
realize this advantage. Figure 10 shows the wall clock time on a system that formerly lacked any support for PL/SQL at
of one invocation of walk() for a growing number of FOR loop iterations (which is controlled by parameter steps, see Figure 3). Throughout the experiment, the recursive SQL variant consistently shows even greater run time savings of approximately 43%. Beyond saved context switches, this suggests that the evaluation of pure SQL expressions generally undercuts the interpretation of PL/SQL statements.

We have found the underlying RDBMSs to cope well with the resulting SQL queries and their associated plans. The SQL equivalent of function walk() accounts for a translation and optimization time of about 25 ms on PostgreSQL. As expected, the plans feature their share of LATERAL joins. Since these come with a prescribed order of evaluation (from left to right) and process single-row tables, however, the joins do not present a challenge to plan enumeration. We further observed that the sub-plans associated with the Qi before and after compilation essentially did not diverge.

(The measurements of Figure 10 have been taken on a

all. Compiling PL/SQL away could, generally, pave the way to provide scripting support for more database engines.

When WITH RECURSIVE does too much. Exploiting tail recursion. The transformation from SSA to ANF compiles goto into tail recursion, which obviates the need for recursive ascent: any activation of a tail-recursive function already contains its complete evaluation context, typically held in the function’s arguments. Tail recursion, thus, needs no stack (frames). Vanilla WITH RECURSIVE, however, collects a trace of all function invocations and their respective arguments (recall table run of Figure 8). For our purposes, accumulating this trace is wasted effort and the template of Figure 8 indeed uses the predicate NOT r."call?" in Line 14 to dispose of the trace and hold on to the function’s final activation only.

Here, a hypothetical “WITH TAIL RECURSIVE” that keeps the most recent run row only would be a better fit. Interestingly, earlier work on the evaluation of complex analytics in HyPer has described just this construct, coined WITH ITERATE in [10].
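
The trace that vanilla WITH RECURSIVE collects can be made visible with a small experiment. The following is a minimal sketch, not the template of Figure 8: it assumes SQLite (driven through Python's standard sqlite3 module) as a stand-in for PostgreSQL, and it uses a hypothetical tail-recursive loop that merely counts the characters of its input by consuming one character per activation. The names run and "call?" follow the text above; the columns rest and n as well as the helpers final_result and trace_size are illustrative assumptions. The sketch counts the rows that the CTE exposes, which is not the same measure as buffer page writes.

```python
import sqlite3

# Hypothetical tail-recursive loop (an assumption for illustration):
# each row of "run" is one activation, and "call?" tells whether
# another call follows. The argument "rest" shrinks by one character
# per activation; "n" accumulates the result.
RUN_CTE = """
WITH RECURSIVE run("call?", rest, n) AS (
  SELECT 1, :input, 0                      -- seed: the first activation
    UNION ALL
  SELECT r.rest <> '',                     -- is another call needed?
         substr(r.rest, 2),                -- consume one character
         r.n + CASE WHEN r.rest <> '' THEN 1 ELSE 0 END
  FROM   run AS r
  WHERE  r."call?"                         -- only pending activations recurse
)
"""


def final_result(conn, text):
    """Return the result held by the final activation (NOT "call?")."""
    query = RUN_CTE + 'SELECT n FROM run WHERE NOT "call?"'
    return conn.execute(query, {"input": text}).fetchone()[0]


def trace_size(conn, text):
    """Return (number of rows, total argument characters) of the full trace."""
    query = RUN_CTE + "SELECT count(*), sum(length(rest)) FROM run"
    return conn.execute(query, {"input": text}).fetchone()


conn = sqlite3.connect(":memory:")
for length in (10, 20, 40):
    text = "x" * length
    print(length, final_result(conn, text), trace_size(conn, text))
```

Only the single row selected by NOT "call?" is needed for the result, yet the trace holds one row per activation, and the total size of the arguments it carries grows much faster than the input: doubling the input length from 20 to 40 characters roughly quadruples the number of characters held in the trace. A construct that keeps the most recent row only would hold a single row regardless of input length.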
[Figure 10 plot: wall clock time in ms (axis from 0 to 4 000) against # iterations (×1 000; ticks at 10, 50, 100). Legend: PL/SQL, WITH RECURSIVE, min/max envelope. The gap between the curves is annotated with 43%.]

Figure 10: Iterative vs. recursive: wall clock time for walk() on PostgreSQL 11.3 across varying intra-function iterations.

Table 2: Eliminating buffering effort via WITH ITERATE.

# Iterations         # Buffer Page Writes
(= input length)     WITH ITERATE     WITH RECURSIVE
10 000               0                6 132
20 000               0                24 471
30 000               0                55 016
40 000               0                97 769
50 000               0                152 729

[Figure 11 heat maps, two panels: (a) walk on PostgreSQL, with # invocations (Q→walk) on the vertical axis and # iterations (walk→Qi) on the horizontal axis; (b) parse on Oracle, with # invocations (Q→parse) on the vertical axis and # iterations (parse→Qi) on the horizontal axis. Both axes range from 2 to 1024 in powers of two; each cell holds a relative run time in %.]

Figure 11: Relative run time (in %) of recursive SQL vs. iterative PL/SQL. Light colors indicate an advantage for SQL.

To assess the benefit in the context of PL/SQL elimination, we implemented WITH ITERATE in PostgreSQL 11.3. The resulting space savings can indeed be profound, in particular for functions with potentially sizable arguments. One such example is function parse() which receives its input text as an argument. Table 2 lists the number of buffer page writes performed by PostgreSQL when inputs of growing length are parsed. WITH ITERATE realizes the promise of tail recursion and requires no space at all, while WITH RECURSIVE exhibits
quadratic space appetite (both the number of required iterations, each of which consumes one input character, and the lengths of the residual strings left to parse grow with the input length).

In an age of complex in-database computation, we step forward and propose that a construct like WITH ITERATE should find its way into the SQL standard.

4. (TOO EARLY FOR) CONCLUSIONS

This marks the beginning of a thread of research in which we aim to explore fresh ways to support complex in-database computation, preferably without turning existing engines on their head.

Current coverage of PL/SQL. The compilation strategy does not restrict the control flow used to express the imperative f and admits, for example, deep loop nesting (this is not showcased in the present paper). Exceptions and their associated handlers constitute more of a challenge in this respect: raising an exception is readily expressed in terms of SSA’s goto, but the detection of exception conditions from within a SQL query appears difficult. PL/SQL variables of non-atomic types (e.g., row values, arrays, or geometric objects) seamlessly fit with the compilation scheme as long as the underlying RDBMS supports their storage in table cells. A positional array update a[i]= e as permitted by PL/SQL translates into a less efficient replacement of array a. We are currently devising a compilation scheme for cursors that range over the result rows of an embedded query Qi. Dynamic SQL (PL/SQL’s string-based EXECUTE) will probably never compile to SQL.

Here, we have assumed the return type τ of PL/SQL function f to be scalar but this is not an inherent restriction. A generalization to set-returning functions (the RETURN NEXT of PL/SQL) has already been found to integrate elegantly.

Directions waiting to be explored include at least the following:

• With its recent Version 12, PostgreSQL will offer hooks that enable merging of CTEs with their containing queries. Inlining compiled functions into their calling query then opens up additional optimization opportunities.

• Flattening nested iteration into flat recursion facilitates efficient evaluation through partitioning and parallelism.

• If f is to be invoked n > 1 times (since it is embedded in a SQL query Q) and the arguments in of these calls are known beforehand, the proposed compilation scheme is able to perform all of these calls in terms of a single evaluation of the template of Figure 8: instead of the single-row recursive seed set up in Line 3 of the template, supply an n-row table of all arguments in. All else remains as is. The query will return a table of n result rows without ever leaving the context of the WITH RECURSIVE block. Such batching [4, 8] has been identified to provide a substantial boost in the iterative evaluation of function calls and we should be able to benefit, too.

• Since the compilation discloses the formerly opaque internals of a function’s body to the SQL query optimizer, we can expect a significantly better estimation of its evaluation cost (instead of the all too common default or fixed cost). Exactly how this improves plan quality for function-rich workloads remains to be quantified.

• Beyond PL/SQL: With its ability to compile arbitrary SSA programs, this provides the groundwork required for the compilation and evaluation of expressive imperative languages within regular DBMSs and thus close to the data.

5. REFERENCES

[1] B. Alpern, M. Wegman, and F. Zadeck. Detecting
Equality of Values in Programs. In Proc. POPL, San
Diego, CA, USA, 1988.
[2] A. Appel. SSA is Functional Programming. ACM
SIGPLAN Notices, 33(4), 1998.
[3] M. Chakravarty, G. Keller, and P. Zadarnowski. A
Functional Perspective on SSA Optimisation
Algorithms. Electronic Notes in Theoretical Computer
Science, 82(2), 2004.
[4] W. Cook and B. Wiedermann. Remote Batch
Invocation for SQL Databases. In Proc. DBPL,
Seattle, WA, USA, 2011.
[5] R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and
F. Zadeck. Efficiently Computing Static Single
Assignment Form and the Control Dependence Graph.
ACM TOPLAS, 13(4), 1991.
[6] C. Galindo-Legaria and M. Joshi. Orthogonal
Optimization of Subqueries and Aggregation. In Proc.
SIGMOD, Santa Barbara, CA, USA, 2001.
[7] T. Grust, N. Schweinsberg, and A. Ulrich. Functions
are Data Too (Defunctionalization for PL/SQL). In
Proc. VLDB, Riva del Garda, Italy, 2013.
[8] R. Guravannavar and S. Sudarshan. Rewriting
Procedures for Batched Bindings. Proc. VLDB, 1(1),
2008.
[9] B. Ozar. Froid: How SQL Server 2019 Might Fix the
Scalar Functions Problem. www.brentozar.com, 2018.
[10] L. Passing, M. Then, N. Hubig, H. Lang, M. Schreier,
S. Günnemann, A. Kemper, and T. Neumann. SQL-
and Operator-Centric Data Analytics in Relational
Main-Memory Databases. In Proc. EDBT, Venice,
Italy, 2017.
[11] K. Ramachandra, K. Park, K. Emani, A. Halverson,
C. Galindo-Legaria, and C. Cunningham. Froid:
Optimization of Imperative Programs in a Relational
Database. Proc. VLDB, 11(4), 2018.
[12] J. Reynolds. Definitional Interpreters for Higher-Order
Programming Languages. Higher-Order and Symbolic
Computation, 11, 1998.
[13] SQL:1999. Database Languages–SQL–Part 2:
Foundation. ISO/IEC 9075-2:1999.
