Figure 3: Original PL/pgSQL function walk. Black sections of the profile bars quantify f→Qi context switch overhead.

Figure 4: Intermediate forms on the way from f to Qf. Steps SSA, ANF, UDF, and SQL lead from the iterative PL/SQL function f via a goto-based form and a recursive form to a recursive SQL UDF and, finally, to the pure SQL query Qf built on WITH RECURSIVE.

Figure 5: Iterative SSA form of PL/SQL function walk.

(see Q1[location1] in Line 11, for example). Importantly, the zoo of PL/SQL control flow constructs, including LOOP, EXIT (to label), CONTINUE (at label), FOREACH, FOR, and WHILE, are now exclusively expressed in terms of goto and jump labels Lx. While verbose (the original FOR loop is now implemented by the conditional goto in Lines 8 to 10, the assignment of Line 15, and the goto L1 of Line 18, for example), this homogeneity aids subsequent steps that trade
control flow for function calls.

     1  function walk(origin, win, loose, steps) =
     2    letrec L1(reward1, location1, movement1, step1) =
     3      letrec L2(reward1, location1, movement1, step1) =
     4        let movement2 = (Q1[location1])
     5        in
     6        let roll = random()
     7        in
     8        let location2 = (Q2[location1, movement2, roll])
     9        in
    10        let reward2 = reward1 + Q3[location2]
    11        in
    12        let step2 = step1 + 1
    13        in
    14        if reward2 >= win OR reward2 <= loose then
    15          step1 * sign(reward2)
    16        else
    17          L1(reward2,location2,movement2,step2)
    18      in
    19      if step1 <= steps then
    20        L2(reward1,location1,movement1,step1)
    21      else 0
    22    in
    23    L1(0, origin, '', 0)

Figure 6: Tail-recursive ANF variant of function walk.

     1  CREATE FUNCTION walk(origin coord, win int, loose int, steps int)
     2  RETURNS int AS $$
     3    SELECT walk*(L1, 0, origin, '', 0, win, loose, steps);
     4  $$ LANGUAGE SQL;

     5  CREATE FUNCTION walk*(
     6    fn int, reward1 int, location1 coord, movement1 text, step1 int,
     7    win int, loose int, steps int)
     8  RETURNS int AS $$
     9    SELECT
    10      CASE
    11        WHEN fn = L1 THEN
    12          CASE
    13            WHEN step1 <= steps THEN
    14              walk*(L2,reward1,location1,movement1,step1,   -- f
    15                    win,loose,steps)
    16            ELSE 0                                          -- h
    17          END
    18        WHEN fn = L2 THEN
    19          (SELECT
    20             CASE
    21               WHEN reward2 >= win OR reward2 <= loose THEN
    22                 step1 * sign(reward2)                      -- m
    23               ELSE walk*(L1,reward2,location2,movement2,step2,   -- o
    24                          win,loose,steps)
    25             END
    26           FROM
    27             (SELECT (Q1[location1])) AS _0(movement2)
    28             LEFT JOIN LATERAL
    29             (SELECT random()) AS _1(roll)
    30             ON true LEFT JOIN LATERAL
    31             (SELECT (Q2[location1, movement2, roll])) AS _2(location2)
    32             ON true LEFT JOIN LATERAL
    33             (SELECT reward1 + (Q3[location2])) AS _3(reward2)
    34             ON true LEFT JOIN LATERAL
    35             (SELECT step1 + 1) AS _4(step2)
    36             ON true)
    37      END
    38  $$ LANGUAGE SQL;

Figure 7: Recursive SQL UDF walk* and its wrapper walk. The overlaid AST (of which the leaf markers f, h, m, and o are kept as comments) becomes relevant in step SQL.

ANF Turning Iteration Into Tail Recursion
Despite its imperative appearance, the single-assignment restriction renders SSA already quite close to the functional administrative normal form (ANF) [2]. To translate from SSA to ANF we adapt an algorithm by Chakravarty and colleagues [3]. The resulting programs are purely expression-based and are composed of let(rec)·in and if·then·else expressions only (besides basic subexpressions which, as for SSA, directly follow SQL syntax and semantics). Put briefly, we arrive at the ANF of function walk (shown in Figure 6) by
• translating each jump label Lx and the statement block it governs into a separate function Lx(),
• turning goto Lx into calls to function Lx(), while
• supplying the values of φ-bound variables in Lx() as parameters to these function calls (we additionally perform lambda lifting and supply free variables as explicit parameters).

If we follow this strategy, iteration (i.e., looping back to a label) will turn into recursion. Any such recursive call will be in tail position (since control does not return after a goto; see the calls to L1() in Lines 17 and 23 and L2() in Line 20) which will be crucial in the final translation to SQL's WITH RECURSIVE.
Finally, note how sequences of statements have turned into chains of nested lets which nicely prepares the transcription to a SQL UDF in the upcoming step.

UDF Direct Tail Recursion in a SQL UDF
We take a first step towards SQL and transcribe the intermediate ANF into a user-defined SQL function (UDF). See Figure 7 (ignore the annotations for now). The mutual recursion between functions L1() and L2() is flattened using an additional parameter fn whose value discerns between the two call targets. This conversion into direct recursion follows standard defunctionalization tactics [7, 12], but inlining would work as well.
We follow Froid and compile ANF constructs let·in and if·then·else into SQL's table-less SELECT and CASE·WHEN, respectively. Nested let bindings translate into SELECTs that are chained using LEFT JOIN LATERAL. If ⟦e⟧ denotes the SQL equivalent of ANF expression e, we have

    ⟦let v = e1 in e2⟧ =
        SELECT ⟦e1⟧ AS _(v) LEFT JOIN LATERAL ⟦e2⟧ ON true .

LATERAL, introduced by the SQL:1999 standard, implements the dependency of e2 on variable v. In a sense, LATERAL thus assumes the role of statement sequencing via ; in PL/SQL. Here, Froid relied on the Microsoft SQL Server-specific OUTER APPLY instead [6, 11]. The resulting LATERAL chains may look intimidating but note that these joins process single-row tables containing bindings of names to (scalar) values.
This translation step emits a regular SQL UDF which features direct tail recursion: in the case of walk* of Figure 7 we find two recursive call sites at Lines 14 and 23. DBMSs that admit such recursive UDFs could, in principle, evaluate this function to compute the result of the original PL/SQL procedure. We observe, however, that
• some DBMSs (among these MySQL, for example) forbid recursion in user-defined SQL functions, and that
• the direct evaluation of these UDFs has disappointing performance characteristics. This is, again, due to significant Q→f and f→Q overhead: the plan for the body of UDF f* needs to be prepared and instantiated anew on each recursive invocation. Additionally, we quickly hit default stack depth limits, e.g., in PostgreSQL or SQL Server.
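The two rewrites described in steps ANF and UDF (labels become functions and gotos become tail calls, then mutual recursion is flattened with a dispatch parameter fn) can be sketched outside of SQL. The following Python sketch uses a deliberately tiny, hypothetical function that sums the integers 1..n; it is not the paper's walk, and the names total_iterative, total_anf, total_star, and total are assumptions made for illustration only.

```python
# Hypothetical example (not from the paper): sum the integers 1..n.
L1, L2 = 1, 2  # integer tags that stand in for the jump labels


def total_iterative(n):
    """Imperative original: a plain loop."""
    acc, i = 0, 1
    while i <= n:
        acc += i
        i += 1
    return acc


def total_anf(n):
    """ANF-style: one function per label, every goto becomes a tail call."""
    def l1(acc, i):  # loop head: test the loop condition
        return l2(acc, i) if i <= n else acc

    def l2(acc, i):  # loop body: update the variables, then jump back to L1
        return l1(acc + i, i + 1)

    return l1(0, 1)


def total_star(fn, acc, i, n):
    """Direct recursion: fn selects the label at which to continue.

    The free variable n of the ANF variant is passed explicitly
    (lambda lifting).
    """
    if fn == L1:
        return total_star(L2, acc, i, n) if i <= n else acc
    return total_star(L1, acc + i, i + 1, n)


def total(n):
    """Wrapper that starts the computation at label L1."""
    return total_star(L1, 0, 1, n)


print(total_iterative(10), total_anf(10), total(10))
```

All three variants should agree on small inputs. For large n, both recursive variants can be expected to exceed Python's default recursion limit, which loosely mirrors the stack depth limits mentioned above.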
     1  WITH RECURSIVE run("call?", args, result) AS (
     2    -- original function call
     3    SELECT true AS "call?", in AS args, CAST(NULL AS τ) AS result
     4      UNION ALL
     5    -- subsequent recursive calls and base cases
     6    SELECT iter.*
     7    FROM run AS r,
     8         LATERAL (body(f*, r)) AS iter("call?", args, result)
     9    WHERE r."call?"
    10  )
    11  -- extract result of final recursive function invocation
    12  SELECT r.result
    13  FROM run AS r
    14  WHERE NOT r."call?"

Figure 8: SQL CTE template that evaluates tail-recursive f*.

     9  SELECT
    10    CASE
    11      WHEN r.fn = L1 THEN
    12        CASE
    13          WHEN r.step1 <= r.steps THEN
    14            ROW(true,                                        -- f
                      ROW(L2,r.reward1,r.location1,r.movement1,r.step1,
                          r.win, r.loose, r.steps),
                      NULL)
    16          ELSE ROW(false, NULL, 0)                         -- h
    17        END
    18      WHEN r.fn = L2 THEN
    19        (SELECT
    20           CASE
    21             WHEN reward2 >= r.win OR reward2 <= r.loose THEN
    22               ROW(false,NULL,r.step1 * sign(reward2))     -- m
    23             ELSE ROW(true,                                -- o
                            ROW(L1,reward2,location2,movement2,step2,
                                r.win, r.loose, r.steps),
                            NULL)
    25           END
    26         FROM
    27           (..   -- code of Figure 7
                  .)   -- (binds reward2, location2, movement2, step2)
    37    END

Figure 9: Adapted UDF body body(walk*, r). At f, o and h, m, row construction replaces recursive calls and base cases, respectively.

SQL Inlinable SQL CTE (WITH RECURSIVE)
Instead, we bank on a SQL:1999-style recursive CTE [13, § 7.12] as an evaluation strategy for recursion, ultimately compiling any use of PL/SQL or SQL user-defined functions away. The CTE constructs a table run(call?,args,result)² that tracks the evaluation of the recursive UDF f*:
• call? Does the UDF perform a recursive call (true) or evaluate a base case (false)?
• args In case of a call, what arguments are passed to f*?
• result In a base case, what is the function's result?

Recall that we are dealing with tail recursion: once we reach a base case, the UDF's result is known and no further recursive ascent is required. The obtained result may thus be returned as the original function's outcome.

The evaluation of call f(in) is expressed by the simple WITH RECURSIVE SQL code template of Figure 8:
• Line 3: Start evaluation with the original invocation of the UDF for argument list in. The result of f* (of type τ) is yet unknown and thus encoded as NULL.
• Line 9: Continue evaluation as long as new recursive calls are to be performed.
• Line 8: Evaluate the body of f* for the current arguments held in r.args. This either leads to a new call or the evaluation of a base case.
• Lines 12 to 14: Once we reach a base case, extract its result. Done.

The code template of Figure 8 is entirely generic. It is to be completed with a slightly adapted body, body(f*, r), of the UDF f* for f. In this adaptation,
• a recursive call f*(args) is replaced by the construction of row (true,args,NULL) which encodes just that call in simulation table run,
• a base case expression with result v of type τ is replaced by row (false,NULL,v).

Figure 9 depicts the resulting body body(walk*, r) for the recursive UDF of Figure 7. The construction of body(f*, ·) calls for a simple abstract syntax tree (AST) traversal of the body of UDF f*. Selected fragments of the AST for function walk* are shown in an overlay of Figure 7. This traversal identifies the leaves of the computation (i.e., the recursive call sites f, o and base case expressions h, m) and performs the local replacements described above.

Finalization. A merge of body(f*, r) with the SQL code template yields a pure SQL expression which may be inlined at f's call sites in the embracing query Q. Any occurrence of PL/SQL has been compiled away. The DBMS will be able to compile the resulting SQL query into a regular plan and jointly optimize the formerly separated code of Q, the transformed body of f, and the embedded queries Qi. Most importantly, the evaluation of Q instantiates this joint plan once and will proceed without the need for Q→f or f→Qi context switches. The upcoming section quantifies the benefits we can now reap.

Beyond tail recursion. Let us close this discussion by noting that the WITH RECURSIVE-based simulation of a recursive function extends beyond tail recursion. Table run can be generalized to hold a true call graph that then does support recursive ascent. While this is not needed in the context of the present work, this paves the way for an intuitive, functional-style notation for SQL UDFs that may employ linear or general n-way recursion. The run time savings can be significant (again, due to the absence of plan instantiation effort). We are actively pursuing this idea in a parallel strand of research that aims to leave more complex recursive computation in the hands of the DBMS itself.

3. ONCE PL/SQL IS GONE
Function walk() is not the exception. The described overheads are pervasive [11] and we, too, observed them across a variety of PL/SQL functions.

Context switching overhead. Table 1 contains a sample of iterative functions and reports the run time for repeated plan instantiation and deallocation to evaluate their embedded queries. Columns Exec·Start and Exec·End embody the f→Qi context switch overhead present in PostgreSQL (recall Section 1). Across the functions, we find overall f→Qi overheads of up to 38%. Only the columns Exec·Run and Interp represent productive evaluation effort: the execution of embedded queries and PL/SQL interpretation, respectively.
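To make the WITH RECURSIVE template of Figure 8 concrete, the following sketch applies it to a query-less iterative function in the spirit of fibonacci from Table 1 and runs it on SQLite through Python's sqlite3 module. Several parts are assumptions made for illustration, not taken from the paper: the tail-recursive form fib*(i, a, b, n), the renaming of column call? to is_call, and the way the sketch works around SQLite's lack of LATERAL and nested row values (the args are spread over plain columns, and the adapted body is written as one CASE expression per column). The example only illustrates the mechanics of table run; it makes no claim about performance.

```python
import sqlite3

# Assumed tail-recursive form (hypothetical, for illustration only):
#   fib*(i, a, b, n) = if i < n then fib*(i + 1, b, a + b, n) else a
# with the original call fib*(0, 0, 1, n).
FIB_TEMPLATE = """
WITH RECURSIVE run(is_call, i, a, b, n, result) AS (
  -- original function call: arguments (0, 0, 1, n), result not yet known
  SELECT 1, 0, 0, 1, :n, NULL
    UNION ALL
  -- subsequent recursive calls and base case
  SELECT r.i < r.n,                                   -- another call?
         CASE WHEN r.i < r.n THEN r.i + 1   END,      -- args of that call
         CASE WHEN r.i < r.n THEN r.b       END,
         CASE WHEN r.i < r.n THEN r.a + r.b END,
         CASE WHEN r.i < r.n THEN r.n       END,
         CASE WHEN r.i < r.n THEN NULL ELSE r.a END   -- base case result
  FROM run AS r
  WHERE r.is_call
)
-- extract the result of the final invocation
SELECT r.result FROM run AS r WHERE NOT r.is_call
"""

conn = sqlite3.connect(":memory:")


def fib_sql(n):
    """Evaluate the template for argument n and return the single result."""
    (result,) = conn.execute(FIB_TEMPLATE, {"n": n}).fetchone()
    return result


def fib_iterative(n):
    """Iterative reference implementation used for comparison."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print([fib_sql(n) for n in range(8)])
```

Each row of run records one activation of fib*; only the final row carries is_call = 0 together with a non-NULL result, which is what the closing SELECT picks out.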
Function fibonacci, an iterative computation of the nth Fibonacci number, evaluates arithmetic expressions only and does not execute embedded queries. PostgreSQL evaluates such simple expressions using a fast path that already foregoes plan instantiation. Compiling PL/SQL away does not promise much in this case. Still, turning query-less iterative functions into pure SQL can uncover opportunities for parallel evaluation; this is a direction we have not yet explored.

² args abbreviates the list of UDF arguments. For walk*, args = fn, reward1, location1, movement1, step1, win, loose, steps.

Table 1: Run time spent (in %) during PL/SQL evaluation. Bold entries indicate context switch overhead of kind f→Qi.

    PL/pgSQL Function                         Exec·Start  Exec·Run  Exec·End  Interp
    walk (see Figure 3)                          30.89      55.13      4.36     9.63
    parse (via finite state automaton)           13.84      68.52      2.20    15.62
    traverse (directed graph traversal)          31.80      35.82      6.03    26.35
    fibonacci (iteratively compute fib(n))        0         90.45      0        9.55

Iterative PL/SQL vs. Recursive SQL. For PL/pgSQL function walk(), Table 1 indicates potential run time savings of about 35% ≈ 30.89% + 4.36% should we manage to get rid of context switching overhead. The translation from iterative PL/SQL to pure SQL built on a recursive CTE can indeed realize this advantage. Figure 10 shows the wall clock time of one invocation of walk() for a growing number of FOR loop iterations (which is controlled by parameter steps, see Figure 3). Throughout the experiment, the recursive SQL variant consistently shows even greater run time savings of approximately 43%. Beyond saved context switches, this suggests that the evaluation of pure SQL expressions generally undercuts the interpretation of PL/SQL statements.
We have found the underlying RDBMSs to cope well with the resulting SQL queries and their associated plans. The SQL equivalent of function walk() accounts for a translation and optimization time of about 25 ms on PostgreSQL. As expected, the plans feature their share of LATERAL joins. Since these come with a prescribed order of evaluation (from left to right) and process single-row tables, the joins do not present a challenge to plan enumeration, however. We further observed that the sub-plans associated with the Qi before and after compilation did not diverge, essentially.
(The measurements of Figure 10 have been taken on a PostgreSQL 11.3 instance hosted on a Linux-based x86 box with 8 Intel Core™ i7 CPUs running at 3.66 GHz with 64 GB of RAM. We report the average as well as the window of minimal/maximal measurements of ten runs.)

[Figure 10: plot of wall clock time [ms] for PL/SQL and WITH RECURSIVE, each with its min/max envelope; annotated saving: 43%.]

Scaling the number of context switches. We can quite consistently observe these savings of > 40% across a wide range of scenarios. Figure 11a varies the number of invocations of walk() as well as the intra-function FOR loop iterations to obtain a heat map of run time improvements. Only very small numbers of invocations and/or iterations fail to compensate the one-time cost to optimize and evaluate the template query of Figure 8 (see the heat map's lower left). Beyond 32 invocations and/or iterations, the transformation to recursive SQL always is a clear win.

Beyond PostgreSQL. Modulo syntactic details, we were able to apply the function transformation of Section 2 immediately to Oracle, MySQL, SQL Server, and HyPer. As an example, Figure 11b shows how the evaluation of parse() on Oracle can significantly benefit once PL/SQL is traded for recursive SQL (measurements in the lower left appeared to be close to 100%; we have omitted them here due to the DBMS's coarse timer resolution). SQLite3 lacks support for LATERAL, but a simple syntactic rewrite brought the functions to run on a system that formerly lacked any support for PL/SQL at all. Compiling PL/SQL away could, generally, pave the way to provide scripting support for more database engines.

When WITH RECURSIVE does too much. Exploiting tail recursion. The transformation from SSA to ANF compiles goto into tail recursion which obviates the need for recursive ascent: any activation of a tail-recursive function already contains its complete evaluation context, typically held in the function's arguments. Tail recursion, thus, needs no stack (frames). Vanilla WITH RECURSIVE, however, collects a trace of all function invocations and their respective arguments (recall table run of Figure 8). For our purposes, accumulating this trace is wasted effort and the template of Figure 8 indeed uses the predicate NOT r."call?" in Line 14 to dispose of the trace and hold on to the function's final activation only.
Here, a hypothetical "WITH TAIL RECURSIVE" that keeps the most recent run row only would be a better fit. Interestingly, earlier work on the evaluation of complex analytics in HyPer has described just this construct, coined WITH ITERATE in [10]. To assess the benefit in the context of PL/SQL elimination, we implemented WITH ITERATE in PostgreSQL 11.3. The resulting space savings can indeed be profound, in particular for functions with potentially sizable arguments. One such example is function parse() which receives its input text as an argument. Table 2 lists the number of buffer page writes performed by PostgreSQL when inputs of growing length are parsed. WITH ITERATE realizes the promise of tail recursion and requires no space at all, while WITH RECURSIVE exhibits
quadratic space appetite (both, the number of required iterations that consume one input character each as well as the lengths of the residual strings left to parse, do grow).
In an age of complex in-database computation, we step forward and propose that a construct like WITH ITERATE should find its way into the SQL standard.

Figure 11: Relative run time (in %) of recursive SQL vs. iterative PL/SQL. Light colors indicate an advantage for SQL. [Two heat maps: (a) walk(), vertical axis # invocations (Q→walk) from 2 to 512; (b) parse() on Oracle, vertical axis # invocations (Q→parse) from 2 to 512.]

4. (TOO EARLY FOR) CONCLUSIONS
This marks the beginning of a thread of research in which we aim to explore fresh ways to support complex in-database computation, preferably without turning existing engines on their head.

Current coverage of PL/SQL. The compilation strategy does not restrict the control flow used to express the imperative f and admits, for example, deep loop nesting (this is not showcased in the present paper). Exceptions and their associated handlers constitute more of a challenge in this respect: raising an exception is readily expressed in terms of SSA's goto, but the detection of exception conditions from within a SQL query appears difficult. PL/SQL variables of non-atomic types (e.g., row values, arrays, or geometric objects) seamlessly fit with the compilation scheme as long as the underlying RDBMS supports their storage in table cells. A positional array update a[i] = e as permitted by PL/SQL translates into a less efficient replacement of array a. We are currently devising a compilation scheme for cursors that range over the result rows of an embedded query Qi. Dynamic SQL (PL/SQL's string-based EXECUTE) will probably never compile to SQL.
Here, we have assumed the return type τ of PL/SQL function f to be scalar but this is not an inherent restriction. A generalization to set-returning functions (the RETURN NEXT of PL/SQL) has already been found to integrate elegantly.

Directions waiting to be explored include at least the following:
• With its recent Version 12, PostgreSQL will offer hooks that enable merging of CTEs with their containing queries. Inlining compiled functions into their calling query then opens up additional optimization opportunities.
• Flattening nested iteration into flat recursion facilitates efficient evaluation through partitioning and parallelism.
• If f is to be invoked n > 1 times (since it is embedded in a SQL query Q) and the arguments in of these calls are known beforehand, the proposed compilation scheme is able to perform all of these calls in terms of a single evaluation of the template of Figure 8: instead of the single-row recursive seed set up in Line 3 of the template, supply an n-row table of all arguments in. All else remains as is. The query will return a table of n result rows without ever leaving the context of the WITH RECURSIVE block. Such batching [4, 8] has been identified to provide a substantial boost in the iterative evaluation of function calls and we should be able to benefit, too.
• Since the compilation discloses the formerly opaque internals of a function's body to the SQL query optimizer, we can expect a significantly better estimation of its evaluation cost (instead of the all too common default or fixed cost). Exactly how this improves plan quality for function-rich workloads remains to be quantified.
• Beyond PL/SQL: With its ability to compile arbitrary SSA programs, this provides the groundwork required for the compilation and evaluation of expressive imperative languages within regular DBMSs and thus close to the data.

5. REFERENCES
[1] B. Alpern, M. Wegman, and F. Zadeck. Detecting Equality of Values in Programs. In Proc. POPL, San Diego, CA, USA, 1988.
[2] A. Appel. SSA is Functional Programming. ACM SIGPLAN Notices, 33(4), 1998.
[3] M. Chakravarty, G. Keller, and P. Zadarnowski. A Functional Perspective on SSA Optimisation Algorithms. Electronic Notes in Theoretical Computer Science, 82(2), 2004.
[4] W. Cook and B. Wiedermann. Remote Batch Invocation for SQL Databases. In Proc. DBPL, Seattle, WA, USA, 2011.
[5] R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and F. Zadeck. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM TOPLAS, 13(4), 1991.
[6] C. Galindo-Legaria and M. Joshi. Orthogonal Optimization of Subqueries and Aggregation. In Proc. SIGMOD, Santa Barbara, CA, USA, 2001.
[7] T. Grust, N. Schweinsberg, and A. Ulrich. Functions are Data Too (Defunctionalization for PL/SQL). In Proc. VLDB, Riva del Garda, Italy, 2013.
[8] R. Guravannavar and S. Sudarshan. Rewriting
Procedures for Batched Bindings. Proc. VLDB, 1(1),
2008.
[9] B. Ozar. Froid: How SQL Server 2019 Might Fix the
Scalar Functions Problem. www.brentozar.com, 2018.
[10] L. Passing, M. Then, N. Hubig, H. Lang, M. Schreier,
S. Günnemann, A. Kemper, and T. Neumann. SQL-
and Operator-Centric Data Analytics in Relational
Main-Memory Databases. In Proc. EDBT, Venice,
Italy, 2017.
[11] K. Ramachandra, K. Park, K. Emani, A. Halverson,
C. Galindo-Legaria, and C. Cunningham. Froid:
Optimization of Imperative Programs in a Relational
Database. Proc. VLDB, 11(4), 2018.
[12] J. Reynolds. Definitional Interpreters for Higher-Order
Programming Languages. Higher-Order and Symbolic
Computation, 11, 1998.
[13] SQL:1999. Database Languages–SQL–Part 2:
Foundation. ISO/IEC 9075-2:1999.