Make an editorial pass over the newly SGML-ified contrib documentation.

Fix lots of bad markup, bad English, bad explanations. Second round of commits. pgcrypto and pgstandby still to go...
author: Tom Lane 2007-12-10 05:32:51 +0000
committer: Tom Lane 2007-12-10 05:32:51 +0000
commit: 8828689ae93f5873c263a3ab0a875407fcb95a6c (patch)
tree: eb5f24d8d1e2b49ffa0ed9d15dea9df8f923be16 /doc/src/sgml/intarray.sgml
parent: a3102ce1ef2da26a4d907b983e81da08982638cd (diff)
1 files changed, 216 insertions, 189 deletions
diff --git a/doc/src/sgml/intarray.sgml b/doc/src/sgml/intarray.sgml
index a7f29980e59..95121c1e9ba 100644
--- a/doc/src/sgml/intarray.sgml
+++ b/doc/src/sgml/intarray.sgml
@@ -1,196 +1,207 @@
+<!-- $PostgreSQL: pgsql/doc/src/sgml/intarray.sgml,v 1.5 2007/12/10 05:32:51 tgl Exp $ -->
+
 <sect1 id="intarray">
  <title>intarray</title>
- 
+
  <indexterm zone="intarray">
   <primary>intarray</primary>
  </indexterm>
 
  <para>
-  This is an implementation of RD-tree data structure using GiST interface
-  of PostgreSQL. It has built-in lossy compression.
- </para>
-
- <para>
-  Current implementation provides index support for one-dimensional array of
-  integers: gist__int_ops, suitable for small and medium size of arrays (used by 
-  default), and gist__intbig_ops for indexing large arrays (we use superimposed
-  signature with length of 4096 bits to represent sets). There is also a 
-  non-default gin__int_ops for GIN indexes on integer arrays.
+  The <filename>intarray</> module provides a number of useful functions
+  and operators for manipulating one-dimensional arrays of integers.
+  There is also support for indexed searches using some of the operators.
  </para>
 
  <sect2>
-  <title>Functions</title>
-
-  <itemizedlist>
-
-   <listitem>
-    <para>
-     <literal>int   icount(int[])</literal> - the number of elements in intarray
-    </para>
-    <programlisting>
-test=# select icount('{1,2,3}'::int[]);  
- icount
---------
-      3
-(1 row)
-    </programlisting>
-   </listitem>
-
-   <listitem>
-    <para>
-     <literal>int[] sort(int[], 'asc' | 'desc')</literal> - sort intarray
-    </para>
-    <programlisting>
-test=# select sort('{1,2,3}'::int[],'desc');
-  sort  
----------
- {3,2,1}
-(1 row)
-    </programlisting>
-   </listitem>
-
-   <listitem>
-    <para>
-     <literal>int[] sort(int[])</literal> - sort in ascending order
-    </para>
-   </listitem>
-
-   <listitem>
-    <para>
-     <literal>int[] sort_asc(int[]),sort_desc(int[])</literal> - shortcuts for sort
-    </para>
-   </listitem>
-
-   <listitem>
-    <para>
-     <literal>int[] uniq(int[])</literal> - returns unique elements
-    </para>
-    <programlisting>
-test=# select uniq(sort('{1,2,3,2,1}'::int[]));
-  uniq  
----------
- {1,2,3}
-(1 row)
-    </programlisting>
-   </listitem>
-
-   <listitem>
-    <para>
-     <literal>int   idx(int[], int item)</literal> - returns index of first
-      intarray matching element to item, or '0' if matching failed.
-    </para>
-    <programlisting>
-test=# select idx('{1,2,3,2,1}'::int[],2);
- idx
------
-   2
-(1 row)
-    </programlisting>
-   </listitem>
-
-   <listitem>
-    <para>
-     <literal>int[] subarray(int[],int START [, int LEN])</literal> - returns
-    part of intarray starting from element number START (from 1) and length LEN.
-    </para>
-    <programlisting>
-test=# select subarray('{1,2,3,2,1}'::int[],2,3);
- subarray
-----------
- {2,3,2}
-(1 row)
-    </programlisting>
-   </listitem>
-
-   <listitem>
-    <para>
-     <literal>int[] intset(int4)</literal> - casting int4 to int[]
-    </para>
-    <programlisting>
-test=# select intset(1);
- intset
---------
- {1}
-(1 row)
-    </programlisting>
-   </listitem>
-
-  </itemizedlist>
- </sect2>
+  <title><filename>intarray</> Functions and Operators</title>
 
- <sect2>
-  <title>Operations</title>
-  <table>
-   <title>Operations</title>
-   <tgroup cols="2">
+  <table id="intarray-func-table">
+   <title><filename>intarray</> Functions</title>
+
+   <tgroup cols="5">
+    <thead>
+     <row>
+      <entry>Function</entry>
+      <entry>Return Type</entry>
+      <entry>Description</entry>
+      <entry>Example</entry>
+      <entry>Result</entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry><function>icount(int[])</function></entry>
+      <entry><type>int</type></entry>
+      <entry>number of elements in array</entry>
+      <entry><literal>icount('{1,2,3}'::int[])</literal></entry>
+      <entry><literal>3</literal></entry>
+     </row>
+
+     <row>
+      <entry><function>sort(int[], text dir)</function></entry>
+      <entry><type>int[]</type></entry>
+      <entry>sort array &mdash; <parameter>dir</> must be <literal>asc</> or <literal>desc</></entry>
+      <entry><literal>sort('{1,2,3}'::int[], 'desc')</literal></entry>
+      <entry><literal>{3,2,1}</literal></entry>
+     </row>
+
+     <row>
+      <entry><function>sort(int[])</function></entry>
+      <entry><type>int[]</type></entry>
+      <entry>sort in ascending order</entry>
+      <entry><literal>sort(array[11,77,44])</literal></entry>
+      <entry><literal>{11,44,77}</literal></entry>
+     </row>
+
+     <row>
+      <entry><function>sort_asc(int[])</function></entry>
+      <entry><type>int[]</type></entry>
+      <entry>sort in ascending order</entry>
+      <entry><literal></literal></entry>
+      <entry><literal></literal></entry>
+     </row>
+
+     <row>
+      <entry><function>sort_desc(int[])</function></entry>
+      <entry><type>int[]</type></entry>
+      <entry>sort in descending order</entry>
+      <entry><literal></literal></entry>
+      <entry><literal></literal></entry>
+     </row>
+
+     <row>
+      <entry><function>uniq(int[])</function></entry>
+      <entry><type>int[]</type></entry>
+      <entry>remove adjacent duplicates</entry>
+      <entry><literal>uniq(sort('{1,2,3,2,1}'::int[]))</literal></entry>
+      <entry><literal>{1,2,3}</literal></entry>
+     </row>
+
+     <row>
+      <entry><function>idx(int[], int item)</function></entry>
+      <entry><type>int</type></entry>
+      <entry>index of first element matching <parameter>item</> (0 if none)</entry>
+      <entry><literal>idx(array[11,22,33,22,11], 22)</literal></entry>
+      <entry><literal>2</literal></entry>
+     </row>
+
+     <row>
+      <entry><function>subarray(int[], int start, int len)</function></entry>
+      <entry><type>int[]</type></entry>
+      <entry>portion of array starting at position <parameter>start</>, len <parameter>elements</></entry>
+      <entry><literal>subarray('{1,2,3,2,1}'::int[], 2, 3)</literal></entry>
+      <entry><literal>{2,3,2}</literal></entry>
+     </row>
+
+     <row>
+      <entry><function>subarray(int[], int start)</function></entry>
+      <entry><type>int[]</type></entry>
+      <entry>portion of array starting at position <parameter>start</></entry>
+      <entry><literal>subarray('{1,2,3,2,1}'::int[], 2)</literal></entry>
+      <entry><literal>{2,3,2,1}</literal></entry>
+     </row>
+
+     <row>
+      <entry><function>intset(int)</function></entry>
+      <entry><type>int[]</type></entry>
+      <entry>make single-element array</entry>
+      <entry><literal>intset(42)</literal></entry>
+      <entry><literal>{42}</literal></entry>
+     </row>
+
+    </tbody>
+   </tgroup>
+  </table>
+
+  <table id="intarray-op-table">
+   <title><filename>intarray</> Operators</title>
+
+   <tgroup cols="3">
     <thead>
      <row>
       <entry>Operator</entry>
+      <entry>Returns</entry>
       <entry>Description</entry>
      </row>
     </thead>
+
     <tbody>
      <row>
       <entry><literal>int[] &amp;&amp; int[]</literal></entry>
-      <entry>overlap - returns TRUE if arrays have at least one common element</entry>
+      <entry><type>boolean</type></entry>
+      <entry>overlap &mdash; <literal>true</> if arrays have at least one common element</entry>
      </row>
      <row>
       <entry><literal>int[] @&gt; int[]</literal></entry>
-      <entry>contains - returns TRUE if left array contains right array</entry>
+      <entry><type>boolean</type></entry>
+      <entry>contains &mdash; <literal>true</> if left array contains right array</entry>
      </row>
      <row>
       <entry><literal>int[] &lt;@ int[]</literal></entry>
-      <entry>contained - returns TRUE if left array is contained in right array</entry>
+      <entry><type>boolean</type></entry>
+      <entry>contained &mdash; <literal>true</> if left array is contained in right array</entry>
      </row>
      <row>
       <entry><literal># int[]</literal></entry>
-      <entry>returns the number of elements in array</entry>
+      <entry><type>int</type></entry>
+      <entry>number of elements in array</entry>
+     </row>
+     <row>
+      <entry><literal>int[] # int</literal></entry>
+      <entry><type>int</type></entry>
+      <entry>index (same as <function>idx</> function)</entry>
      </row>
      <row>
       <entry><literal>int[] + int</literal></entry>
-      <entry>push element to array ( add to end of array)</entry>
+      <entry><type>int[]</type></entry>
+      <entry>push element onto array (add it to end of array)</entry>
      </row>
      <row>
       <entry><literal>int[] + int[]  </literal></entry>
-      <entry>merge of arrays (right array added to the end of left one)</entry>
+      <entry><type>int[]</type></entry>
+      <entry>array concatenation (right array added to the end of left one)</entry>
      </row>
      <row>
       <entry><literal>int[] - int</literal></entry>
-      <entry>remove entries matched by right argument from array</entry>
+      <entry><type>int[]</type></entry>
+      <entry>remove entries matching right argument from array</entry>
      </row>
      <row>
       <entry><literal>int[] - int[]</literal></entry>
-      <entry>remove right array from left</entry>
+      <entry><type>int[]</type></entry>
+      <entry>remove elements of right array from left</entry>
      </row>
      <row>
       <entry><literal>int[] | int</literal></entry>
-      <entry>returns intarray - union of arguments</entry>
+      <entry><type>int[]</type></entry>
+      <entry>union of arguments</entry>
      </row>
      <row>
       <entry><literal>int[] | int[]</literal></entry>
-      <entry>returns intarray as a union of two arrays</entry>
+      <entry><type>int[]</type></entry>
+      <entry>union of arrays</entry>
      </row>
-
      <row>
       <entry><literal>int[] &amp; int[]</literal></entry>
-      <entry>returns intersection of arrays</entry>
+      <entry><type>int[]</type></entry>
+      <entry>intersection of arrays</entry>
      </row>
-
      <row>
       <entry><literal>int[] @@ query_int</literal></entry>
-      <entry>
-       returns TRUE if array satisfies query (like
-       <literal>'1&amp;(2|3)'</literal>)
-      </entry>
+      <entry><type>boolean</type></entry>
+      <entry><literal>true</> if array satisfies query (see below)</entry>
      </row>
-
      <row>
       <entry><literal>query_int ~~ int[]</literal></entry>
-      <entry>returns TRUE if array satisfies query (commutator of @@)</entry>
+      <entry><type>boolean</type></entry>
+      <entry><literal>true</> if array satisfies query (commutator of <literal>@@</>)</entry>
      </row>
     </tbody>
    </tgroup>
   </table>
+
   <para>
    (Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
    respectively called @ and ~.  These names are still available, but are
@@ -198,85 +209,102 @@ test=# select intset(1);
    are reversed from the convention formerly followed by the core geometric
    datatypes!)
   </para>
+
+  <para>
+   The <literal>@@</> and <literal>~~</> operators test whether an array
+   satisfies a <firstterm>query</>, which is expressed as a value of a
+   specialized data type <type>query_int</>.  A <firstterm>query</>
+   consists of integer values that are checked against the elements of
+   the array, possibly combined using the operators <literal>&amp;</>
+   (AND), <literal>|</> (OR), and <literal>!</> (NOT).  Parentheses
+   can be used as needed.  For example,
+   the query <literal>1&amp;(2|3)</> matches arrays that contain 1
+   and also contain either 2 or 3.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Index Support</title>
+
+  <para>
+   <filename>intarray</> provides index support for the
+   <literal>&amp;&amp;</>, <literal>@&gt;</>, <literal>&lt;@</>,
+   and <literal>@@</> operators, as well as regular array equality.
+   The implementation uses an RD-tree data structure with
+   built-in lossy compression.
+  </para>
+
+  <para>
+   Two GiST index operator classes are provided:
+   <literal>gist__int_ops</> (used by default) is suitable for
+   small and medium-size arrays, while
+   <literal>gist__intbig_ops</> uses a larger signature and is more
+   suitable for indexing large arrays.
+  </para>
+
+  <para>
+   There is also a non-default GIN operator class
+   <literal>gin__int_ops</>.
+  </para>
+
+  <para>
+   The choice between GiST and GIN indexing depends on the relative
+   performance characteristics of GiST and GIN, which are discussed elsewhere.
+   As a rule of thumb, a GIN index is faster to search than a GiST index, but
+   slower to build or update; so GIN is better suited for static data and GiST
+   for often-updated data.
+  </para>
  </sect2>
 
  <sect2>
   <title>Example</title>
 
   <programlisting>
-CREATE TABLE message (mid INT NOT NULL,sections INT[]);
-CREATE TABLE message_section_map (mid INT NOT NULL,sid INT NOT NULL);
+-- a message can be in one or more <quote>sections</>
+CREATE TABLE message (mid INT PRIMARY KEY, sections INT[], ...);
 
--- create indices
-CREATE unique index message_key ON message ( mid );
-CREATE unique index message_section_map_key2 ON message_section_map (sid, mid );
-CREATE INDEX message_rdtree_idx ON message USING GIST ( sections gist__int_ops);
+-- create specialized index
+CREATE INDEX message_rdtree_idx ON message USING GIST (sections gist__int_ops);
 
--- select some messages with section in 1 OR 2 - OVERLAP operator
-SELECT message.mid FROM message WHERE message.sections &amp;&amp; '{1,2}'; 
+-- select messages in section 1 OR 2 - OVERLAP operator
+SELECT message.mid FROM message WHERE message.sections &amp;&amp; '{1,2}';
 
--- select messages contains in sections 1 AND 2 - CONTAINS operator
+-- select messages in sections 1 AND 2 - CONTAINS operator
 SELECT message.mid FROM message WHERE message.sections @&gt; '{1,2}';
--- the same, CONTAINED operator
-SELECT message.mid FROM message WHERE '{1,2}' &lt;@ message.sections;
+
+-- the same, using QUERY operator
+SELECT message.mid FROM message WHERE message.sections @@ '1&amp;2'::query_int;
   </programlisting>
  </sect2>
 
  <sect2>
   <title>Benchmark</title>
+
   <para>
-  subdirectory bench contains benchmark suite.
+   The source directory <filename>contrib/intarray/bench</> contains a
+   benchmark test suite.  To run:
   </para>
+
   <programlisting>
-  cd ./bench
-  1. createdb TEST
-  2. psql TEST &lt; ../_int.sql
-  3. ./create_test.pl | psql TEST
-  4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
-                  with/without RD-Tree. Run script without arguments to
-                  see availbale options.
-
-     a)test without RD-Tree (OR)
-       ./bench.pl -d TEST -c -s 1,2 -v
-     b)test with RD-Tree
-       ./bench.pl -d TEST -c -s 1,2 -v -r
-
-     BENCHMARKS:
-    
-     Size of table &lt;message>: 200000
-     Size of table &lt;message_section_map>: 269133
-    
-     Distribution of messages by sections:
-    
-     section 0: 74377 messages
-     section 1: 16284 messages
-     section 50: 1229 messages
-     section 99: 683 messages
-    
-     old - without RD-Tree support,
-     new - with RD-Tree
-    
-     +----------+---------------+----------------+
-     |Search set|OR, time in sec|AND, time in sec|
-     |          +-------+-------+--------+-------+
-     |          |  old  |  new  |   old  |  new  |
-     +----------+-------+-------+--------+-------+
-     |         1|  0.625|  0.101|       -|      -|
-     +----------+-------+-------+--------+-------+
-     |        99|  0.018|  0.017|       -|      -|
-     +----------+-------+-------+--------+-------+
-     |       1,2|  0.766|  0.133|   0.628|  0.045|
-     +----------+-------+-------+--------+-------+
-     | 1,2,50,65|  0.794|  0.141|   0.030|  0.006|
-     +----------+-------+-------+--------+-------+
+   cd .../bench
+   createdb TEST
+   psql TEST &lt; ../_int.sql
+   ./create_test.pl | psql TEST
+   ./bench.pl
   </programlisting>
+
+  <para>
+   The <filename>bench.pl</> script has numerous options, which
+   are displayed when it is run without any arguments.
+  </para>
  </sect2>
 
  <sect2>
   <title>Authors</title>
+
   <para>
-   All work was done by Teodor Sigaev (<email>[email protected]</email>) and Oleg
-   Bartunov (<email>[email protected]</email>). See
+   All work was done by Teodor Sigaev (<email>[email protected]</email>) and
+   Oleg Bartunov (<email>[email protected]</email>). See
    <ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
    additional information. Andrey Oktyabrski did a great work on adding new
    functions and operations.
@@ -284,4 +312,3 @@ SELECT message.mid FROM message WHERE '{1,2}' &lt;@ message.sections;
  </sect2>
 
 </sect1>
-
author	Tom Lane	2007-12-10 05:32:51 +0000
committer	Tom Lane	2007-12-10 05:32:51 +0000
commit	8828689ae93f5873c263a3ab0a875407fcb95a6c (patch)
tree	eb5f24d8d1e2b49ffa0ed9d15dea9df8f923be16 /doc/src/sgml/intarray.sgml
parent	a3102ce1ef2da26a4d907b983e81da08982638cd (diff)