ltree
ltree
This module implements a data type ltree> for representing
labels of data stored in a hierarchical tree-like structure.
Extensive facilities for searching through label trees are provided.
Definitions
A label is a sequence of alphanumeric characters
and underscores (for example, in C locale the characters
A-Za-z0-9_> are allowed). Labels must be less than 256 bytes
long.
Examples: 42>, Personal_Services>
A label path is a sequence of zero or more
labels separated by dots, for example L1.L2.L3>, representing
a path from the root of a hierarchical tree to a particular node. The
length of a label path must be less than 65Kb, but keeping it under 2Kb is
preferable. In practice this is not a major limitation; for example,
the longest label path in the DMOZ catalogue () is about 240 bytes.
Example: Top.Countries.Europe.Russia
The ltree> module provides several data types:
ltree stores a label path.
lquery represents a regular-expression-like pattern
for matching ltree> values. A simple word matches that
label within a path. A star symbol (*>) matches zero
or more labels. For example:
foo Match the exact label path foo>
*.foo.* Match any label path containing the label foo>
*.foo Match any label path whose last label is foo>
Star symbols can also be quantified to restrict how many labels
they can match:
*{n>} Match exactly n> labels
*{n>,} Match at least n> labels
*{n>,m>} Match at least n> but not more than m> labels
*{,m>} Match at most m> labels — same as *{0,m>}
There are several modifiers that can be put at the end of a non-star
label in lquery> to make it match more than just the exact match:
@ Match case-insensitively, for example a@> matches A>
* Match any label with this prefix, for example foo*> matches foobar>
% Match initial underscore-separated words
The behavior of %> is a bit complicated. It tries to match
words rather than the entire label. For example
foo_bar%> matches foo_bar_baz> but not
foo_barbaz>. If combined with *>, prefix
matching applies to each word separately, for example
foo_bar%*> matches foo1_bar2_baz> but
not foo1_br2_baz>.
Also, you can write several possibly-modified labels separated with
|> (OR) to match any of those labels, and you can put
!> (NOT) at the start to match any label that doesn't
match any of the alternatives.
Here's an annotated example of lquery:
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
a. b. c. d. e.
This query will match any label path that:
begins with the label Top
and next has zero to two labels before
a label beginning with the case-insensitive prefix sport
then a label not matching football nor
tennis
and then ends with a label beginning with Russ or
exactly matching Spain.
ltxtquery represents a full-text-search-like
pattern for matching ltree> values. An
ltxtquery value contains words, possibly with the
modifiers @>, *>, %> at the end;
the modifiers have the same meanings as in lquery>.
Words can be combined with &> (AND),
|> (OR), !> (NOT), and parentheses.
The key difference from
lquery> is that ltxtquery matches words without
regard to their position in the label path.
Here's an example ltxtquery:
Europe & Russia*@ & !Transportation
This will match paths that contain the label Europe and
any label beginning with Russia (case-insensitive),
but not paths containing the label Transportation.
The location of these words within the path is not important.
Also, when %> is used, the word can be matched to any
underscore-separated word within a label, regardless of position.
Note: ltxtquery> allows whitespace between symbols, but
ltree> and lquery> do not.
Operators and Functions
Type ltree> has the usual comparison operators
=>, <>,
<>, >>, <=>, >=>.
Comparison sorts in the order of a tree traversal, with the children
of a node sorted by label text. In addition, there are the following
specialized operators:
ltree> Operators
Operator
Returns
Description
ltree> @>> ltree>
boolean
is left argument an ancestor of right (or equal)?
ltree> <@> ltree>
boolean
is left argument a descendant of right (or equal)?
ltree> ~> lquery>
boolean
does ltree> match lquery>?
lquery> ~> ltree>
boolean
does ltree> match lquery>?
ltree> ?> lquery[]>
boolean
does ltree> match any lquery> in array?
lquery[]> ?> ltree>
boolean
does ltree> match any lquery> in array?
ltree> @> ltxtquery>
boolean
does ltree> match ltxtquery>?
ltxtquery> @> ltree>
boolean
does ltree> match ltxtquery>?
ltree> ||> ltree>
ltree
concatenate ltree> paths
ltree> ||> text>
ltree
convert text to ltree> and concatenate
text> ||> ltree>
ltree
convert text to ltree> and concatenate
ltree[]> @>> ltree>
boolean
does array contain an ancestor of ltree>?
ltree> <@> ltree[]>
boolean
does array contain an ancestor of ltree>?
ltree[]> <@> ltree>
boolean
does array contain a descendant of ltree>?
ltree> @>> ltree[]>
boolean
does array contain a descendant of ltree>?
ltree[]> ~> lquery>
boolean
does array contain any path matching lquery>?
lquery> ~> ltree[]>
boolean
does array contain any path matching lquery>?
ltree[]> ?> lquery[]>
boolean
does ltree> array contain any path matching any lquery>?
lquery[]> ?> ltree[]>
boolean
does ltree> array contain any path matching any lquery>?
ltree[]> @> ltxtquery>
boolean
does array contain any path matching ltxtquery>?
ltxtquery> @> ltree[]>
boolean
does array contain any path matching ltxtquery>?
ltree[]> ?@>> ltree>
ltree
first array entry that is an ancestor of ltree>; NULL if none
ltree[]> ?<@> ltree>
ltree
first array entry that is a descendant of ltree>; NULL if none
ltree[]> ?~> lquery>
ltree
first array entry that matches lquery>; NULL if none
ltree[]> ?@> ltxtquery>
ltree
first array entry that matches ltxtquery>; NULL if none
The operators <@, @>,
@ and ~ have analogues
^<@>, ^@>>, ^@>,
^~, which are the same except they do not use
indexes. These are useful only for testing purposes.
The following functions are available:
ltree> Functions
Function
Return Type
Description
Example
Result
subltree(ltree, int start, int end)
ltree
subpath of ltree> from position start> to
position end>-1 (counting from 0)
subltree('Top.Child1.Child2',1,2)
Child1
subpath(ltree, int offset, int len)
ltree
subpath of ltree> starting at position
offset>, length len>.
If offset> is negative, subpath starts that far from the
end of the path. If len> is negative, leaves that many
labels off the end of the path.
subpath('Top.Child1.Child2',0,2)
Top.Child1
subpath(ltree, int offset)
ltree
subpath of ltree> starting at position
offset>, extending to end of path.
If offset> is negative, subpath starts that far from the
end of the path.
subpath('Top.Child1.Child2',1)
Child1.Child2
nlevel(ltree)
integer
number of labels in path
nlevel('Top.Child1.Child2')
3
index(ltree a, ltree b)
integer
position of first occurrence of b> in
a>; -1 if not found
index('0.1.2.3.5.4.5.6.8.5.6.8','5.6')
6
index(ltree a, ltree b, int offset)
integer
position of first occurrence of b> in
a>, searching starting at offset>;
negative offset> means start -offset>
labels from the end of the path
index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4)
9
text2ltree(text)
ltree
cast text> to ltree>
ltree2text(ltree)
text
cast ltree> to text>
lca(ltree, ltree, ...)
ltree
lowest common ancestor, i.e., longest common prefix of paths
(up to 8 arguments supported)
lca('1.2.2.3','1.2.3.4.5.6')
1.2
lca(ltree[])
ltree
lowest common ancestor, i.e., longest common prefix of paths
lca(array['1.2.2.3'::ltree,'1.2.3'])
1.2
Indexes
ltree> supports several types of indexes that can speed
up the indicated operators:
B-tree index over ltree>:
<>, <=>, =>,
>=>, >
GiST index over ltree>:
<>, <=>, =>,
>=>, >>,
@>>, <@>,
@>, ~>, ?
Example of creating such an index:
CREATE INDEX path_gist_idx ON test USING GIST (path);
GiST index over ltree[]>:
ltree[] <@ ltree>, ltree @> ltree[]>,
@>, ~>, ?
Example of creating such an index:
CREATE INDEX path_gist_idx ON test USING GIST (array_path);
Note: This index type is lossy.
Example
This example uses the following data (also available in file
contrib/ltree/ltreetest.sql> in the source distribution):
CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING gist(path);
CREATE INDEX path_idx ON test USING btree(path);
Now, we have a table test> populated with data describing
the hierarchy shown below:
Top
/ | \
Science Hobbies Collections
/ | \
Astronomy Amateurs_Astronomy Pictures
/ \ |
Astrophysics Cosmology Astronomy
/ | \
Galaxies Stars Astronauts
We can do inheritance:
ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(4 rows)
Here are some examples of path matching:
ltreetest=> SELECT path FROM test WHERE path ~ '*.Astronomy.*';
path
-----------------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Collections.Pictures.Astronomy
Top.Collections.Pictures.Astronomy.Stars
Top.Collections.Pictures.Astronomy.Galaxies
Top.Collections.Pictures.Astronomy.Astronauts
(7 rows)
ltreetest=> SELECT path FROM test WHERE path ~ '*.!pictures@.*.Astronomy.*';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
Here are some examples of full text search:
ltreetest=> SELECT path FROM test WHERE path @ 'Astro*% & !pictures@';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Hobbies.Amateurs_Astronomy
(4 rows)
ltreetest=> SELECT path FROM test WHERE path @ 'Astro* & !pictures@';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
Path construction using functions:
ltreetest=> SELECT subpath(path,0,2)||'Space'||subpath(path,2) FROM test WHERE path <@ 'Top.Science.Astronomy';
?column?
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
We could simplify this by creating a SQL function that inserts a label
at a specified position in a path:
CREATE FUNCTION ins_label(ltree, int, text) RETURNS ltree
AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
LANGUAGE SQL IMMUTABLE;
ltreetest=> SELECT ins_label(path,2,'Space') FROM test WHERE path <@ 'Top.Science.Astronomy';
ins_label
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
Authors
All work was done by Teodor Sigaev (teodor@stack.net) and
Oleg Bartunov (oleg@sai.msu.su). See
for
additional information. Authors would like to thank Eugeny Rodichev for
helpful discussions. Comments and bug reports are welcome.