Extending SQLextending SQL
In the sections that follow, we will discuss how you
can extend the PostgreSQL
SQL query language by adding:
functions (starting in )
aggregates (starting in )
data types (starting in )
operators (starting in )
operator classes for indexes (starting in )
How Extensibility WorksPostgreSQL is extensible because its operation is
catalog-driven. If you are familiar with standard
relational database systems, you know that they store information
about databases, tables, columns, etc., in what are
commonly known as system catalogs. (Some systems call
this the data dictionary.) The catalogs appear to the
user as tables like any other, but the DBMS stores
its internal bookkeeping in them. One key difference
between PostgreSQL and standard relational database systems is
that PostgreSQL stores much more information in its
catalogs: not only information about tables and columns,
but also information about data types, functions, access
methods, and so on. These tables can be modified by
the user, and since PostgreSQL bases its operation
on these tables, this means that PostgreSQL can be
extended by users. By comparison, conventional
database systems can only be extended by changing hardcoded
procedures in the source code or by loading modules
specially written by the DBMS vendor.
The PostgreSQL server can moreover
incorporate user-written code into itself through dynamic loading.
That is, the user can specify an object code file (e.g., a shared
library) that implements a new type or function, and
PostgreSQL will load it as required.
Code written in SQL is even more trivial to add
to the server. This ability to modify its operation on the
fly makes PostgreSQL uniquely
suited for rapid prototyping of new applications and storage
structures.
The PostgreSQL Type Systembase typedata typebasecomposite typedata typecompositePostgreSQL data types are divided into base
types, composite types, domains, and pseudo-types.
Base Types
Base types are those, like int4, that are
implemented below the level of the SQL> language
(typically in a low-level language such as C). They generally
correspond to what are often known as abstract data types.
PostgreSQL can only operate on such
types through functions provided by the user and only understands
the behavior of such types to the extent that the user describes
them. Base types are further subdivided into scalar and array
types. For each scalar type, a corresponding array type is
automatically created that can hold variable-size arrays of that
scalar type.
Composite Types
Composite types, or row types, are created whenever the user
creates a table. It is also possible to use to
define a stand-alone> composite type with no associated
table. A composite type is simply a list of types with
associated field names. A value of a composite type is a row or
record of field values. The user can access the component fields
from SQL> queries. Refer to
for more information on composite types.
Domains
A domain is based on a particular base type and for many purposes
is interchangeable with its base type. However, a domain can
have constraints that restrict its valid values to a subset of
what the underlying base type would allow.
Domains can be created using the SQL> command
.
Their creation and use is not discussed in this chapter.
Pseudo-Types
There are a few pseudo-types> for special purposes.
Pseudo-types cannot appear as columns of tables or attributes of
composite types, but they can be used to declare the argument and
result types of functions. This provides a mechanism within the
type system to identify special classes of functions. lists the existing
pseudo-types.
Polymorphic Typespolymorphic typepolymorphic functiontypepolymorphicfunctionpolymorphic
Four pseudo-types of special interest are anyelement>,
anyarray>, anynonarray>, and anyenum>,
which are collectively called polymorphic types>.
Any function declared using these types is said to be
a polymorphic function>. A polymorphic function can
operate on many different data types, with the specific data type(s)
being determined by the data types actually passed to it in a particular
call.
Polymorphic arguments and results are tied to each other and are resolved
to a specific data type when a query calling a polymorphic function is
parsed. Each position (either argument or return value) declared as
anyelement is allowed to have any specific actual
data type, but in any given call they must all be the
same actual type. Each
position declared as anyarray can have any array data type,
but similarly they must all be the same type. If there are
positions declared anyarray and others declared
anyelement, the actual array type in the
anyarray positions must be an array whose elements are
the same type appearing in the anyelement positions.
anynonarray> is treated exactly the same as anyelement>,
but adds the additional constraint that the actual type must not be
an array type.
anyenum> is treated exactly the same as anyelement>,
but adds the additional constraint that the actual type must
be an enum type.
Thus, when more than one argument position is declared with a polymorphic
type, the net effect is that only certain combinations of actual argument
types are allowed. For example, a function declared as
equal(anyelement, anyelement)> will take any two input values,
so long as they are of the same data type.
When the return value of a function is declared as a polymorphic type,
there must be at least one argument position that is also polymorphic,
and the actual data type supplied as the argument determines the actual
result type for that call. For example, if there were not already
an array subscripting mechanism, one could define a function that
implements subscripting as subscript(anyarray, integer)
returns anyelement>. This declaration constrains the actual first
argument to be an array type, and allows the parser to infer the correct
result type from the actual first argument's type. Another example
is that a function declared as f(anyarray) returns anyenum>
will only accept arrays of enum types.
Note that anynonarray> and anyenum> do not represent
separate type variables; they are the same type as
anyelement, just with an additional constraint. For
example, declaring a function as f(anyelement, anyenum)>
is equivalent to declaring it as f(anyenum, anyenum)>:
both actual arguments have to be the same enum type.
A variadic function (one taking a variable number of arguments, as in
) can be
polymorphic: this is accomplished by declaring its last parameter as
VARIADIC> anyarray>. For purposes of argument
matching and determining the actual result type, such a function behaves
the same as if you had written the appropriate number of
anynonarray> parameters.
&xfunc;
&xaggr;
&xtypes;
&xoper;
&xindex;
Using C++ for ExtensibilityC++
It is possible to use a compiler in C++ mode to build
PostgreSQL extensions by following these
guidelines:
All functions accessed by the backend must present a C interface
to the backend; these C functions can then call C++ functions.
For example, extern C> linkage is required for
backend-accessed functions. This is also necessary for any
functions that are passed as pointers between the backend and
C++ code.
Free memory using the appropriate deallocation method. For example,
most backend memory is allocated using palloc()>, so use
pfree()> to free it, i.e. using C++
delete()> in such cases will fail.
Prevent exceptions from propagating into the C code (use a
catch-all block at the top level of all extern C>
functions). This is necessary even if the C++ code does not
throw any exceptions because events like out-of-memory still
throw exceptions. Any exceptions must be caught and appropriate
errors passed back to the C interface. If possible, compile C++
with
If calling backend functions from C++ code, be sure that the
C++ call stack contains only plain old data structure
(POD>). This is necessary because backend errors
generate a distant longjump()> that does not properly
unroll a C++ call stack with non-POD objects.
In summary, it is best to place C++ code behind a wall of
extern C> functions that interface to the backend,
and avoid exception, memory, and call stack leakage.