Java DataStage
Version 8 Release 1
LC18-9932-01
Note Before using this information and the product that it supports, read the information in Notices on page 77.
Copyright International Business Machines Corporation 1998, 2008. US Government Users Restricted Rights Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Chapter 1. DB2 API stage
  Introduction
  Functionality of the DB2 UDB API stage
  Installing the Stage
  Setting environment variables for IBM DB2 Database
  The IBM DB2 Database Connection
  Defining the IBM DB2 Database Connection
  Connecting to an IBM DB2 Data Source
  Transaction Isolation Levels
  Defining Character Set Mapping
  Defining IBM DB2 Input Data
    General Tab
    Options tab
    Columns Tab
    SQL Tab
  Defining IBM DB2 Output Data
    General Tab
    Columns Tab
    SQL Tab
  Data Type Support
    Mapping Data Types from WebSphere DataStage SQL to IBM DB2 SQL
    Mapping Data Types from IBM DB2 SQL to WebSphere DataStage SQL
  Handling $ and # Characters
  Troubleshooting
Chapter 2. DB2/UDB Enterprise stage
  Partitioning tab
  Output page
  Output Link Properties tab
Product documentation
Contacting IBM
Notices
Trademarks
Index
Chapter 1. DB2 API stage
Introduction
The DB2 UDB API stage processes SQL statements in the native IBM DB2 environment. It also provides native importing of metadata definitions into the WebSphere DataStage Repository as well as live data browsing during job design.

The DB2 UDB API stage enables WebSphere DataStage to write data to and read data from an IBM DB2 database. The stage is passive and can have any number of input, output, and reference output links.
- Input links. Specify the data you are writing, which is a stream of rows to be loaded into an IBM DB2 database. You can specify the data on an input link by using an SQL statement generated by WebSphere DataStage or constructed by the user.
- Output links. Specify the data you are extracting, which is a stream of rows to be read from an IBM DB2 database. You can specify the data on an output link by using an SQL SELECT statement generated by WebSphere DataStage or constructed by the user.
- Reference output links. Represent rows that are read from an IBM DB2 database by using the key columns in a WHERE clause of the SELECT statement. These statements can be constructed by WebSphere DataStage or specified by the user. The key columns are determined by the column definitions specified for the link.

In summary, the purpose of this plug-in is to eliminate the need for the ODBC stage when accessing IBM DB2 data by providing native capabilities for the following:
- Reading and writing data (DML)
- Creating and dropping tables (DDL)
- Importing table and column definitions (metadata)
- Browsing native data with the custom IBM DB2 property editor
Functionality of the DB2 UDB API stage

The DB2 UDB API stage has the following functionality:
- Provides a custom user interface for editing the IBM DB2 plug-in properties.
- Uses stored procedures.
- Supports NLS (National Language Support).
- Allows data browsing through the custom property editor. You can use the custom GUI for the plug-in to view sample native table data residing on the target IBM DB2 database.
- Supports reject row handling.

The following functionality is not supported:
- Bulk loading of IBM DB2 tables from stream input. Although vast amounts of data can be read into an IBM DB2 database by using this plug-in, the stream input links are not designed for performance-critical loading. You should use the DB2 UDB Load stage for this purpose.
- Replacing the ODBC stage. The IBM DB2 API stage does not replace the ODBC stage. The ODBC stage will continue to exist for access to data for which WebSphere DataStage does not provide a native interface. Users who created jobs by using the ODBC stage to access an IBM DB2 database can continue to run these jobs.
- The large object family of IBM DB2 data types (BLOB and DBCLOB).
- Stage. This page displays the name of the stage you are editing. The General tab defines the IBM DB2 database server name, login information, and transaction isolation level information for concurrency control in jobs. You can describe the purpose of the stage in the Description field. The properties on this page define the connection to the data source. For details, see Connecting to an IBM DB2 Data Source. The NLS tab defines a character set map to be used with the stage. This tab appears only if you have installed NLS for WebSphere DataStage. For details, see Defining Character Set Mapping.
- Input. This page is displayed only if you have an input link to this stage. It specifies the SQL table to use and the associated column definitions for each data input link. It also specifies how data is written and contains the SQL statement or call syntax used to write data to a table. It also specifies how to create the target table if desired and how to drop it if necessary.
- Output. This page is displayed only if you have an output or reference output link to this stage. It specifies the SQL tables to use and the associated column definitions for each data output link. It contains the SQL SELECT statement or call syntax used to read data from one or more tables or views.
necessary concurrency control between transactions in the job and other transactions. You cannot edit this field, which is required.
5. Optionally, describe the purpose of the IBM DB2 API stage in the Description field.
General Tab
Use this tab to indicate how the SQL statements are created from an Input link on the DB2 UDB API stage.
This tab is displayed by default. It contains the following fields:
- Query Type. Determines how the SQL statements are created. The options are:
  - Use SQL Builder tool. Causes the SQL Builder button and the Update action property to appear. This is the default value for new jobs.
  - Generate Update action from Options and Columns tabs. Causes the Update action property to appear. Uses values from the Options and Columns tabs and from Update action to generate the SQL.
  - Enter custom SQL statement. Writes the data using a user-defined SQL statement, which overrides the default SQL statement generated by the stage. If you choose this option, you enter the SQL statement on the SQL tab.
- SQL Builder. Causes the SQL Builder to open.
- Update action. Specifies which stage-generated SQL statements are used to update the target table. Some update actions require key columns to update or delete rows. The default is Insert rows without clearing. Choose one of the following options:
  - Insert rows without clearing. Inserts the new rows in the table. When you click the SQL Builder button, the Insert page opens.
  - Clear the table, then insert rows. Deletes the contents of the table before inserting the new rows. When you click the SQL Builder button, the Insert page opens.
  - Delete existing rows only. Deletes existing rows in the target table that have identical keys in the input rows. When you click the SQL Builder button, the Delete page opens.
  - Replace existing rows completely. Deletes the existing rows, then adds the new rows to the table. When you click the SQL Builder button, the Delete page opens. However, you must also complete an Insert page to accomplish the replace.
  - Update existing rows only. Updates the existing data rows. Any rows in the data that do not exist in the table are ignored. When you click the SQL Builder button, the Update page opens.
  - Update existing or insert new rows. Updates the existing data rows before inserting new rows. Performance depends on the contents of the target table and the rows being processed in the job. If most rows exist in the target table, it is faster to update first. When you click the SQL Builder button, the Update page opens. However, you must also complete an Insert page to accomplish the insert.
  - Insert new or update existing rows. Inserts the new rows before updating existing rows. Performance depends on the contents of the target table and the rows being processed in the job. If most rows do not exist in the target table, it is faster to insert first. When you click the SQL Builder button, the Insert page opens. However, you must also complete an Update page to accomplish the update.

Note: If you use Update existing or insert new rows or Insert new or update existing rows as the update action, Array size, located on the Options tab, must be 1. Otherwise, a warning is logged and the stage automatically sets it to 1.
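For illustration only, the following sketch shows the shape of the statements that the stage generates for some of these update actions. The table CUSTOMERS and its columns CUST_ID (key), NAME, and BALANCE are hypothetical names, not taken from this guide:

-- Insert rows without clearing:
INSERT INTO CUSTOMERS (CUST_ID, NAME, BALANCE) VALUES (?, ?, ?)

-- Update existing rows only (key columns appear in the WHERE clause):
UPDATE CUSTOMERS SET NAME = ?, BALANCE = ? WHERE CUST_ID = ?

-- Delete existing rows only:
DELETE FROM CUSTOMERS WHERE CUST_ID = ?

The parameter markers (?) are bound to the corresponding input link columns at run time. Update existing or insert new rows runs the UPDATE first and then the INSERT for rows that did not match an existing key; Insert new or update existing rows does the reverse.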
Options tab
Use the Options tab to create or drop tables and to specify miscellaneous link options.
- Table name. The name of the target table to update. You must specify the target table. There is no default. You can also click the ... button at the right of Table name to browse the Repository to select the table.
- Create table action. Choose one of the following options to create the target table in the specified database:
  - Do not create target table. Specifies that the target table is not created; the Drop table action field and the Table Properties button (at the right of the field) are disabled. If the target table does not exist when you run the job, the job aborts.
  - Generate DDL. Specifies that the stage generates the CREATE TABLE statement using information obtained from the Table name field, the column definitions grid, and the advanced table properties (see the description of the Table Properties button later in this section). If the target table already exists, the job aborts.
  - User-defined DDL. Specifies that you enter the appropriate CREATE TABLE statement on the SQL tab. You can customize the stage-generated DDL that the stage provides as a template. If the target table already exists, the job aborts.
- Drop table action. Lets you control the dropping of the target table before it is created by the stage. If you choose not to create the target table, this field is disabled. Choose one of the following options:
  - Do not drop target. Specifies that the target table is not dropped.
  - Generate DDL. Specifies that the stage generates the DROP TABLE statement based on the value of the Table name field. If the target table does not exist, a warning is logged. The job does not abort.
  - User-defined DDL. Specifies that you define the DDL to drop the target table. You can customize the stage-generated DDL that the stage provides as a template. If the target table does not exist, a warning is logged. The job does not abort.
- Table Properties button. Click the button at the right of the Drop table action list box to display the Create Table Properties dialog box. (This button is enabled when you select Generate DDL or User-defined DDL from the Create table action list box.) You can then specify the following advanced table properties from this dialog box:
  - Tablespace. Specifies an existing tablespace name. The new table is created in this tablespace. If you omit the name, the table is created in the default tablespace as defined by the database.
  - Partitioning Key. Specifies the columns to use for partitioning the data for a table in a multi-partitioned node group. If you omit this field and the table resides in a multi-partitioned node group, the table is partitioned using the default partitioning rules as defined by the database.
- Array size. The input parameter array size, that is, the number of rows that are cached before being written to the database. The default is 50. The array size value should be an integer greater than or equal to 1. If a table that is being updated is also being used for reference lookups, Array size must be 1 so that the updates can be referenced.
- Transaction size. The number of rows that the stage processes before committing a transaction to the database. The transaction size should always be a multiple of the array size. The default is 100. The transaction size should be an integer greater than or equal to 0. A value of 0 means that the transaction is not committed until all rows have been processed.

Related concepts: DB2 UDB API stage
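As an illustration of the DDL that the Generate DDL option can produce, the following sketch assumes a hypothetical table SALES, a tablespace TS_SALES, and a partitioning key column STORE_ID; none of these names come from this guide:

CREATE TABLE SALES
  (STORE_ID INTEGER NOT NULL,
   SALE_DATE DATE,
   AMOUNT DECIMAL(10,2))
  IN TS_SALES
  PARTITIONING KEY (STORE_ID)

DROP TABLE SALES

The column list is derived from the column definitions grid, and the IN and PARTITIONING KEY clauses come from the Tablespace and Partitioning Key entries in the Create Table Properties dialog box. The DROP TABLE statement corresponds to the Generate DDL setting of Drop table action.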
Columns Tab
This tab contains the column definitions for the data written to the table or file. The Columns tab behaves the same way as the Columns tab in the ODBC stage.
SQL Tab
This tab contains the following tabs. Use these tabs to display the stage-generated SQL statement and the SQL statement that you can enter.
- Query. This tab is displayed by default. It is similar to the General tab, but it contains the SQL statements that are used to write data to IBM DB2. It is based on the current values of the stage and link properties. You cannot edit these statements unless Query type is set to Enter custom SQL statement or Load SQL from a file at run time.
- Before. This tab contains the SQL statements executed before the stage processes any job data rows. The elements on this tab correspond to the Before SQL and Continue if Before SQL fails grid properties. The Before and After tabs look alike. The Continue if Before SQL fails property is represented by a check box, and the SQL statement is entered in an edit box that you can resize.
- After. This tab contains the SQL statements executed after the stage processes the job data rows. The elements on this tab correspond to the After SQL and Continue if After SQL fails grid properties. The Continue if After SQL fails property is represented by a check box, and the SQL statement is entered in an edit box that you can resize.
- Generated DDL. Select Generate DDL or User-defined DDL from the Create table action field on the Options tab to enable this tab. The CREATE statement field displays the non-editable CREATE TABLE statement that is generated from the column metadata definitions and the information provided on the Create Table Properties dialog box. If you select an option other than Do not drop target table from the Drop table action list, the DROP statement field displays the generated DROP TABLE statement for dropping the target table.
- User-defined DDL. Select User-defined DDL from the Create table action or Drop table action field on the Options tab to enable this tab. The generated DDL statement is displayed as a starting point from which you can define a CREATE TABLE and a DROP TABLE statement. The DROP statement field is disabled if User-defined DDL is not selected from the Drop table action field. If Do not drop target is selected, the DROP statement field is empty in the Generated DDL and User-defined DDL tabs.

Note: Once you modify the user-defined DDL statement from the original generated DDL statement, changes made to other table-related properties do not affect the user-defined DDL statement. If, for example, you add a new column in the column grid after modifying the user-defined DDL statement, the new column appears in the generated DDL statement but does not appear in the user-defined DDL statement. You must ensure that the user-defined SQL results in the creation or dropping of the correct target table.
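As a sketch of how the Before and After tabs are typically used, the following statements assume a hypothetical staging table SALES_STG and index IX_SALES_STORE that are not part of this guide:

-- Before SQL: empty the staging table before any rows are processed.
DELETE FROM SALES_STG

-- After SQL: build an index once all rows have been written.
CREATE INDEX IX_SALES_STORE ON SALES_STG (STORE_ID)

Whether a failure of these statements stops the job is controlled by the corresponding Continue if Before SQL fails or Continue if After SQL fails check box.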
General Tab
This tab is displayed by default. It provides the type of query and, where appropriate, a button to open an associated dialog box. The General tab contains the following fields:
- Query type. Choose from the following options:
  - Use SQL Builder tool. Specifies that the SQL statement is built using the SQL Builder graphical interface. When this option is selected, the SQL Builder button appears. If you click SQL Builder, the SQL Builder opens. See the Designer Client Guide for a complete description of the SQL Builder. This is the default setting.
  - Generate SELECT clause from column list; enter other clauses. Specifies that WebSphere DataStage generates the SELECT clause based on the columns you select on the Columns tab. When this option is selected, the SQL Clauses button appears. If you click SQL Clauses, the SQL Clauses dialog box appears. Use this dialog box to refine the SQL statement.
  - Enter custom SQL statement. Specifies that a custom SQL statement is built using the SQL tab. See SQL Tab.
  - Load SQL from a file at run time. Specifies that the data is extracted using the SQL query in the designated file that exists on the server. Enter the path name for this file instead of the text for the query. With this choice, you can edit the SQL statements.
- Description. Lets you enter an optional description of the output link.
Columns Tab
This tab contains the column definitions for the data being output on the chosen link. Enter the appropriate table name in the Description field on output links to qualify column references; do this if any ambiguity exists as to which table the indicated columns belong to. The column definitions for reference links require a key field. Key fields join reference inputs to a Transformer stage. The key column is used to read the data by using a WHERE clause in the SQL SELECT statement.
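For example, a reference output link whose key column is ACCOUNT_ID (a hypothetical column name, not taken from this guide) results in a generated SELECT statement of roughly this shape:

SELECT ACCOUNT_TYPE, INTEREST_RATE FROM ACCOUNTS WHERE ACCOUNT_ID = ?

The parameter marker is filled in with the key value supplied by the Transformer stage for each lookup.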
SQL Tab
This tab displays the stage-generated or user-defined SQL statements or stored procedure call syntax used to read data from a table. It contains the Query, Before, and After tabs.
- Query. This tab is read-only if you select Use SQL Builder tool or Generate SELECT clause from column list; enter other clauses for Query Type. If Query Type is Enter custom SQL statement, this tab contains the SQL statements executed to read data from a table. The GUI displays the stage-generated SQL statement on this tab as a starting point. However, you can enter any valid, appropriate SQL statement. If Query Type is Load SQL from a file at run time, enter the path name of the file.
- Before. This tab contains the SQL statements executed before the stage processes any job data rows.
- After. This tab contains the SQL statements executed after the stage processes all job data rows.
Mapping Data Types from WebSphere DataStage SQL to IBM DB2 SQL
When the Create Table property is set to Yes for input links, the target table is created using the column definitions for the input link and the specific input link properties defining the target table's properties. In some cases, there is no exact translation between an IBM DB2 data type and a WebSphere DataStage data type, for example, GRAPHIC. The following table shows the IBM DB2 data types that are generated from the corresponding WebSphere DataStage types:
Table 1. WebSphere DataStage data types and corresponding IBM DB2 data types

WebSphere DataStage SQL Data Type    IBM DB2 SQL Data Type
SQL_BIGINT                           BIGINT
SQL_BINARY                           CHAR FOR BIT DATA
SQL_BIT                              Unsupported
SQL_CHAR                             CHAR
SQL_DATE                             DATE
SQL_DECIMAL                          DECIMAL
SQL_DOUBLE                           DOUBLE PRECISION
SQL_FLOAT                            FLOAT
SQL_INTEGER                          INTEGER
SQL_LONGVARBINARY                    LONG VARCHAR FOR BIT DATA
SQL_LONGVARCHAR                      LONG VARCHAR
SQL_LONGVARCHAR                      CLOB (see note below)
SQL_NUMERIC                          DECIMAL
SQL_REAL                             REAL
SQL_SMALLINT                         SMALLINT
SQL_TIME                             TIME
SQL_TIMESTAMP                        TIMESTAMP
SQL_TINYINT                          SMALLINT
SQL_VARBINARY                        VARCHAR FOR BIT DATA
SQL_VARCHAR                          VARCHAR
Note: The DB2 UDB API stage supports the CLOB data type by mapping the LONGVARCHAR data type with a precision greater than 32 K to IBM DB2's CLOB data type. To work with a CLOB column definition, choose WebSphere DataStage's LONGVARCHAR as the column's data type and provide a Length of more than 32 K in the Columns tab. If the Length is less than or equal to 32 K, WebSphere DataStage's LONGVARCHAR maps to LONG VARCHAR.
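For instance, a column defined in the stage as LongVarChar with a Length of 40000 would appear in the generated DDL as a CLOB, while a LongVarChar with a Length of 30000 would remain LONG VARCHAR. The table and column names in this sketch are hypothetical:

-- LongVarChar, Length 40000 (greater than 32 K):
CREATE TABLE DOCUMENTS (DOC_ID INTEGER NOT NULL, DOC_TEXT CLOB(40000))

-- LongVarChar, Length 30000 (32 K or less):
CREATE TABLE NOTES (NOTE_ID INTEGER NOT NULL, NOTE_TEXT LONG VARCHAR)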
Mapping Data Types from IBM DB2 SQL to WebSphere DataStage SQL
Conversely, when the DB2 UDB API stage imports metadata definitions from a database, it must perform a mapping of the database SQL data types to the SQL data types supported by WebSphere DataStage. The following table describes the mapping between the IBM DB2 SQL data types and the WebSphere DataStage SQL data types:
Table 2. IBM DB2 data types and corresponding WebSphere DataStage data types

IBM DB2 SQL Data Type        WebSphere DataStage SQL Data Type
BIGINT                       SQL_BIGINT
CHAR                         SQL_CHAR
CHAR FOR BIT DATA            SQL_BINARY
DATE                         SQL_DATE
DECIMAL                      SQL_DECIMAL
DOUBLE PRECISION             SQL_DOUBLE
FLOAT                        SQL_FLOAT
GRAPHIC                      SQL_CHAR
INTEGER                      SQL_INTEGER
LONG VARCHAR                 SQL_LONGVARCHAR
LONG VARCHAR FOR BIT DATA    SQL_LONGVARBINARY
LONG VARGRAPHIC              SQL_LONGVARCHAR
NUMERIC                      SQL_NUMERIC
REAL                         SQL_REAL
SMALLINT                     SQL_SMALLINT
TIME                         SQL_TIME
TIMESTAMP                    SQL_TIMESTAMP
VARCHAR                      SQL_VARCHAR
VARCHAR FOR BIT DATA         SQL_VARBINARY
BLOB and LOCATOR             Unsupported
CLOB and LOCATOR             SQL_LONGVARCHAR (see note below)
DBCLOB and LOCATOR           Unsupported
Note: The DB2 UDB API stage supports the CLOB data type by mapping the LONGVARCHAR data type with a precision greater than 32 K to IBM DB2's CLOB data type. To work with a CLOB column definition, choose WebSphere DataStage's LONGVARCHAR as the column's data type and provide a Length of more than 32 K in the Columns tab. If the Length is less than or equal to 32 K, WebSphere DataStage's LONGVARCHAR maps to LONG VARCHAR.
Handling $ and # Characters

Note particularly that the key in this statement ($A#) is specified by using the external name.
Troubleshooting
If your source data is defined correctly, rows are properly inserted in a target table. However, under certain conditions rows might not be inserted in a target table. IBM DB2 rejects the remainder of the row batch following a bad row when the following three conditions occur:
- The Array Size property exceeds 1.
- The defined string lengths of source data exceed the defined length of its target column.
- The source data contains a row with a character string that exceeds the length of the target column.
Example
Suppose the target table defines a column as CHAR(5), the WebSphere DataStage metadata for this column is defined as CHAR(10), and the source data contains the following rows:
ABC
ABCD
ABCDEFG   (longer than 5 characters)
AB
ABCD
The last three rows are not inserted into the target table when the Array Size property is set to 5. IBM DB2 reports that all three rows contained values that were too large (IBM DB2 error SQL0302N). Additionally, using BIGINT for source data that contains out-of-range values causes similar behavior.
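A minimal sketch of the mismatch in this example, using a hypothetical table name TARGET_TBL that is not taken from this guide:

-- Database definition: the column holds at most 5 characters.
CREATE TABLE TARGET_TBL (CODE CHAR(5))

-- The WebSphere DataStage metadata for the same column is Char with a
-- length of 10, so the row containing ABCDEFG passes the stage but fails
-- in IBM DB2 (SQL0302N), and with Array Size = 5 the rest of that array
-- (AB and the final ABCD) is rejected with it.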
Solutions
Define the WebSphere DataStage metadata correctly to match the IBM DB2 target, and ensure that any BIGINT source data is within BIGINT range. Otherwise, it might be safer to run the job with Array Size set to 1. However, this can impact performance. Another solution is to use a Transformer stage to scrub the data before sending it to the DB2 UDB API stage. This method also impacts performance. Note: This behavior does not occur for rows rejected by the database for reasons such as constraint violations or non-null violations. The remaining rows in a batch are not rejected.
Chapter 2. DB2/UDB Enterprise stage
Overview
The DB2/UDB enterprise stage is a database stage. By using the DB2/UDB enterprise stage, you can read data from and write data to an IBM DB2 database. You can use the stage in conjunction with a Lookup stage to access a lookup table hosted by an IBM DB2 database. See the Parallel Job Developer Guide.

IBM DB2 databases distribute data in multiple partitions. The DB2/UDB enterprise stage can match the partitioning when reading data from or writing data to an IBM DB2 database.

Depending upon the properties that you set for the DB2/UDB enterprise stage, the stage can have:
- One input link for the load and write methods, and
- One output link for the write method, or a reference output link. The Lookup stage uses the reference output link when referring to an IBM DB2 lookup table.
Alternatively, the DB2/UDB enterprise stage can have a single output reject link (in conjunction with an input link).

By using the DB2/UDB enterprise stage, you can perform the following operations:
- Writing data to an IBM DB2 table (by using INSERT).
- Updating an IBM DB2 table (by using INSERT and/or UPDATE as appropriate), by using the DB2 command line interface (CLI) to enhance performance.
- Loading data to an IBM DB2 table (by using the DB2 fast loader). (Note that loading is not supported by mainframe DB2 databases.)
- Reading data from an IBM DB2 table.
- Deleting rows from an IBM DB2 table.
- Performing a lookup operation directly on an IBM DB2 table.
- Loading an IBM DB2 table into memory and then performing a lookup on that IBM DB2 table.

When using a DB2/UDB enterprise stage as a source for lookup data, there are special considerations about column naming. If you have columns of the same name in both the source and lookup data sets, the source data set column will go to the output data. If you want this column to be replaced by the column from the lookup data source, you need to drop the source data column before you perform the lookup (you could, for example, use a Modify stage to do this). See the topic on Merge stages in the Parallel Job Advanced Developer Guide for more details about performing lookup operations.
To edit a DB2/UDB enterprise stage, you use the stage editor. To learn about the stage editor in detail, see the Parallel Job Developer Guide.
where db_name is the name of the IBM DB2 database and user_name is your WebSphere DataStage login user name. If you specify the message file property in conjunction with the LOAD method, the database instance must have read or write privilege on that file. The location of the log file for the LOAD operation messages is exactly as defined in the APT_CONFIG_FILE.

Your PATH should include $DB2_HOME/bin, for example, /opt/IBMdb2/V8.1/bin. The LIBPATH should include $DB2_HOME/lib before any other lib statements, for example, /opt/IBMdb2/V8.1/lib.

The following IBM DB2 environment variables set the runtime characteristics of your system:
- DB2INSTANCE specifies the user name of the owner of the IBM DB2 instance. IBM DB2 uses DB2INSTANCE to determine the location of db2nodes.cfg. For example, if you set DB2INSTANCE to Mary, the location of db2nodes.cfg is ~Mary/sqllib/db2nodes.cfg.
- DB2DBDFT specifies the name of the IBM DB2 database that you want to access from your DB2/UDB enterprise stage.

There are two other methods of specifying the IBM DB2 database:
1. The override database property of the DB2/UDB enterprise stage Input or Output link.
2. The APT_DBNAME environment variable (this takes precedence over DB2DBDFT).

You should normally use the input property Row Commit Interval to specify the number of records to insert into a table between commits (see the Row Commit Interval section under the Options category). Previously the environment variable APT_RDBMS_COMMIT_ROWS was used for this, and it is still available for backwards compatibility. You can set this environment variable to any value between 1 and (2^31 - 1) to specify the number of records. The default value is 2000.

If you set APT_RDBMS_COMMIT_ROWS to 0, a negative number, or an invalid value, a warning is issued and each partition commits only once after the last insertion. If you set APT_RDBMS_COMMIT_ROWS to a small value, you force IBM DB2 to perform frequent commits. Therefore, if your program terminates unexpectedly, your data set can still contain partial results that you can use. However, the high frequency of commits might affect the performance of the DB2/UDB enterprise stage. If you set a large value for APT_RDBMS_COMMIT_ROWS, DB2 must log a correspondingly large amount of rollback information. This, too, might slow your application.
Remote connection
You can also connect from a DB2/UDB enterprise stage to a remote IBM DB2 Server. The connection is made via an IBM DB2 client. In order to remotely connect from an IBM DB2 client to an IBM DB2 server, the IBM DB2 client should be located on the same machine as the WebSphere DataStage server. Both IBM DB2 client and IBM DB2 server need to be configured for remote connection communication (see your IBM DB2 Database Administrator). The WebSphere DataStage configuration file needs to contain the node on which WebSphere DataStage and the IBM DB2 client are installed and the nodes of the
remote computer where the IBM DB2 server is installed. See the topic about the parallel engine configuration file in the Parallel Job Developer Guide.

On the DB2/UDB enterprise stage in your parallel job, you need to set the following properties:
- Client Instance Name. Set this to the IBM DB2 client instance name. If you set this property, WebSphere DataStage assumes you require remote connection.
- Server. Optionally set this to the instance name of the IBM DB2 server. Otherwise use the IBM DB2 environment variable, DB2INSTANCE, to identify the instance name of the IBM DB2 server.
- Client Alias DB Name. Set this to the IBM DB2 client's alias database name for the remote IBM DB2 server database. This is required only if the client's alias is different from the actual name of the remote server database.
- Database. Optionally set this to the remote server database name. Otherwise use the environment variables APT_DBNAME or APT_DB2DBDFT to identify the database.
- User. Enter the user name for connecting to IBM DB2. This is required for a remote connection.
- Password. Enter the password for connecting to IBM DB2. This is required for a remote connection.

You can use the remote connection facilities available in WebSphere DataStage to connect to a different IBM DB2 server within the same job. You could, for example, read from an IBM DB2 database on one server, use this data to access a lookup table on another IBM DB2 server, and then write any rejected rows to a third IBM DB2 server. Each database would be accessed by a different stage in the job with the Client Instance Name and Server properties set appropriately.
Generally, in the DB2/UDB enterprise stage, you enter external names everywhere except when referring to stage column names, where you use names in the form ORCHESTRATE.internal_name.

When using the DB2/UDB enterprise stage as a target, you should enter external names as follows:
- For Write and Load options, use external names for select list properties.
- For the Upsert option, for update and insert, use external names when referring to IBM DB2 table column names, and internal names when referring to the stage column names. For example:
INSERT INTO tablename ($A#, ##B$)
VALUES (ORCHESTRATE.__036__A__035__, ORCHESTRATE.__035__035__B__036__)

UPDATE tablename
SET ##B$ = ORCHESTRATE.__035__035__B__036__
WHERE ($A# = ORCHESTRATE.__036__A__035__)
When using the DB2/UDB enterprise stage as a source, you should enter external names as follows:
- For Read using the user-defined SQL method, use external names for IBM DB2 columns for SELECT. For example:
SELECT #M$, #D$ FROM tablename WHERE (#M$ > 5)
- For Read using the Table method, use external names in the select list and where properties.

When using the DB2/UDB enterprise stage in parallel jobs as a lookup, you should enter external or internal names as follows:
- For Lookups using the user-defined SQL method, use external names for IBM DB2 columns for SELECT, and for IBM DB2 columns in any WHERE clause you might add. Use internal names when referring to the stage column names in the WHERE clause. For example:
SELECT #M$, #D$ FROM tablename WHERE (#B$ = ORCHESTRATE.__035__B__036__)
- For Lookups using the Table method, use external names in the select list and where properties.
- Use internal names for the key option on the Input page Properties tab of the Lookup stage to which the DB2/UDB enterprise stage is attached.
use the DB2/UDB enterprise stage to update or query the table, you must use the Pad Character property with the value of a space in order to produce the correct results. When you insert rows and subsequently update or query them by using the DB2/UDB enterprise stage, you do not need to specify the Pad Character property. The stage automatically pads with null terminators, and the default pad character for the stage is the null terminator.
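As a small illustration, suppose a CHAR(10) column named CODE in a hypothetical table ACCOUNTS was populated outside the stage and is therefore blank-padded. The names are illustrative only:

-- The stored value is 'AB' followed by eight spaces.
-- With Pad Character set to a space, the stage builds a matching predicate:
SELECT ACCOUNT_TYPE FROM ACCOUNTS WHERE CODE = 'AB        '
-- With the default null-terminator padding, the comparison value would be
-- 'AB' followed by binary zeros, and the row would not be found.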
Table 3. Data type conversion - writing data to IBM DB2 databases (IBM DB2 data type and the underlying data type written to it)
- CHAR(n): fixed-length string in the form string[n] and ustring[n], where n is the string length; length <= 254 bytes.
- VARCHAR(n) (VarChar, LongVarChar): fixed-length string in the form string[n] and ustring[n], where n is the string length; length < 32672 for VarChar, length < 32700 for LongVarChar.
- VARCHAR(n): variable-length string in the form string[max=n] and ustring[max=n]; maximum length <= 4000 bytes.
- VARCHAR: variable-length string in the form string and ustring (see the note below on the default length).
- Not supported: string and ustring with a length greater than 4000 bytes.
- GRAPHIC(n): ustring[n].
- VARGRAPHIC(n): ustring[max=n].
Table 3. Data type conversion - writing data to IBM DB2 databases (continued)
- LONGVARGRAPHIC (LongNVarChar): ustring[max=n].
Note: The default length of VARCHAR is 32 bytes. That is, 32 bytes are allocated for each variable-length string field in the input data set. If an input variable-length string field is longer than 32 bytes, the stage issues a warning.
Table 4. Data type conversion - reading data from IBM DB2 databases (continued)
- GRAPHIC(n): underlying data type string[n] or ustring[n].
- VARGRAPHIC(n): underlying data type string[max=n] or ustring[max=n].
- VARCHAR(n): underlying data type string[max=n] or ustring[max=n].
The WebSphere DataStage SQL data types listed for these rows are Unknown, Char, LongVarChar, VarChar, NChar, NVarChar, and LongNVarChar.
Examples
These examples show how to perform a lookup operation and how to update data in an IBM DB2 table.
The table below shows the data in the IBM DB2 lookup table:
Table 6. Example of a lookup operation - Table 2

accountType    InterestRate
bronze         1.25
silver         1.50
gold           1.75
plat           2.00
flexi          1.88
fixterm        3.00
The job looks like the jobs illustrated under the Overview section. When you edit a DB2/UDB enterprise stage, the Data_set stage provides the primary input, DB2_lookup_table provides the lookup data, Lookup_1 performs the lookup and outputs the resulting data to Data_Set_3. In the IBM DB2 database stage we specify that we are going to look up the data directly in the IBM DB2 database, and the name of the table we are going to look up. In the Lookup stage we specify the column that we are using as the key for the lookup.
You are going to specify upsert as the write method and choose User-defined Update & Insert as the upsert mode so that the existing name column is not included in the INSERT statement. The properties (showing the INSERT statement) are shown below. The INSERT statement is as generated by WebSphere DataStage, except that the name column is removed. The UPDATE statement is as automatically generated by WebSphere DataStage:
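The generated statements are not reproduced here; as a sketch of their shape, suppose the target table is ACCOUNTS with a key column ACCOUNT_ID and data columns ACCOUNT_TYPE, INTEREST_RATE, and NAME (hypothetical names, not those of the original example):

-- User-defined INSERT, edited so that the NAME column is omitted:
INSERT INTO ACCOUNTS (ACCOUNT_ID, ACCOUNT_TYPE, INTEREST_RATE)
VALUES (ORCHESTRATE.ACCOUNT_ID, ORCHESTRATE.ACCOUNT_TYPE, ORCHESTRATE.INTEREST_RATE)

-- Generated UPDATE, shown here for the same columns:
UPDATE ACCOUNTS
SET ACCOUNT_TYPE = ORCHESTRATE.ACCOUNT_TYPE, INTEREST_RATE = ORCHESTRATE.INTEREST_RATE
WHERE (ACCOUNT_ID = ORCHESTRATE.ACCOUNT_ID)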
Must Dos
WebSphere DataStage has many defaults which means that it can be very easy to include DB2/UDB enterprise stages in a job. This section specifies the minimum steps to take to get a DB2/UDB enterprise stage functioning. WebSphere DataStage provides a versatile user interface, and there are many shortcuts to achieving a particular end. This section describes the basic method. You will learn where the shortcuts are when you get familiar with the product.
If you want to send rejected rows down a rejects link, set Output Rejects to True (it is false by default).
- Connect the DB2/UDB enterprise stage to a Lookup stage by using a reference link.
- In the Output Link Properties tab:
  - Set the Lookup Type to Sparse.
  - Choose a Read Method. This is Table by default, which reads directly from a table. However, you can also choose to read by using auto-generated SQL or user-generated SQL.
  - Specify the table to be read for the lookup.
  - If using a Read Method of user-generated SQL, specify the SELECT SQL statement to use. WebSphere DataStage provides the auto-generated statement as a basis, which you can edit as required. You would use this if, for example, you wanted to perform a non-equality based lookup (see the sketch after this list).
  - If you are not using environment variables to specify the server and database (as described in the Accessing IBM DB2 Databases section), set Use Database Environment Variable and Use Server Environment Variable to False, and supply values for the Database and Server properties.
- Ensure that column metadata has been specified for the lookup.
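For example, a non-equality sparse lookup might use a user-generated SELECT of the following shape. The table and column names are hypothetical, and the ORCHESTRATE prefix refers to the stage column that supplies the lookup value, as in the earlier examples:

SELECT ACCOUNT_TYPE, INTEREST_RATE FROM ACCOUNT_RATES WHERE MIN_BALANCE <= ORCHESTRATE.BALANCE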
Stage page
The General tab allows you to specify an optional description of the DB2/UDB enterprise stage. The Advanced tab allows you to specify how the stage executes. The NLS Map tab appears if you have NLS enabled on your system; it allows you to specify a character set map for the stage.
Advanced tab
This tab allows you to specify the following:
- Execution Mode. The stage can execute in parallel mode or sequential mode. In parallel mode the data is processed by the available nodes as specified in the Configuration file, and by any node constraints specified on the Advanced tab. In sequential mode the entire write is processed by the conductor node.
- Combinability mode. This is Auto by default, which allows WebSphere DataStage to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
- Preserve partitioning. You can select Set or Clear. If you select Set, file read operations will request that the next stage preserves the partitioning as is. The Preserve partitioning option does not appear if your stage has only an input link.
- Node pool and resource constraints. Select this option to constrain parallel execution to the node pool or pools and/or resource pool or pools specified in the grid. The grid allows you to make choices from drop-down lists populated from the Configuration file.
- Node map constraint. Select this option to constrain parallel execution to the nodes in a defined node map. You can define a node map by typing node numbers into the text box or by clicking the browse button to open the Available Nodes dialog box and selecting nodes from there. You are effectively defining a new node pool for this stage (in addition to any node pools defined in the Configuration file).

Note: The Stage page is blank if you are using the stage to perform a lookup directly on an IBM DB2 table, that is, operating in sparse mode.
Input page
The Input page allows you to specify details about how the DB2/UDB enterprise stage writes data to an IBM DB2 database. The DB2/UDB enterprise stage can have only one input link writing to one table. The General tab allows you to specify an optional description of the input link. The Properties tab allows you to specify details of exactly what the link does. The Partitioning tab allows you to specify how incoming data is partitioned before being written to the database. The Columns tab specifies the column definitions of incoming data. The Advanced tab allows you to change the default buffering settings for the input link. Details about DB2/UDB enterprise stage properties, partitioning, and formatting are given in the following sections. See the topic about stage editors in the Parallel Job Developer Guide for a general description of the other tabs.
Table 9. Input link properties and values (continued)

Target category:
- Delete SQL. Value: string. Required if Write Method = Delete Rows.
- Upsert Mode. Values: Auto-generated Update & Insert, Auto-generated Update Only, User-defined Update & Insert, User-defined Update Only. Required if Write Method = Upsert.
- Insert SQL. Value: string. Required if Write Method = Upsert.
- Update SQL. Value: string. Required if Write Method = Upsert.
- Write Method. Values: Delete Rows, Write, Upsert, Load. Default: Load. Required.
- Write Mode. Values: Append, Create, Replace, Truncate. Default: Append. Required.

Connection category:
- Use Default Database. Values: True/False. Default: True. Required.
- Use Default Server. Values: True/False. Default: True. Required.
- Database. Value: string. Required if Use Database environment variable = False.
- Server. Value: string. Required if Use Server environment variable = False.

Options category:
- Array Size. Value: number. Default: 2000. Required if Write Method = Delete; optional for Upsert.
- Output Rejects. Values: True/False. Default: False. Appears if Write Method = Upsert.
- Row Commit Interval. Value: number. Default: 2000.
- Time Commit Interval. Value: number.
Table 9. Input link properties and values (continued)

Options category (continued):
- Silently Drop Columns Not in Table. Values: True/False. Default: False. Required.
- Truncate Column Names. Values: True/False. Default: False. Required.
- Truncation Length. Value: number. Default: 18.
- Close Command. Value: string. Optional.
- Use ASCII Delimited Format. Values: True/False. Default: False.
- Cleanup on Failure. Values: True/False. Default: True. Appears if Write Method = Load.
- Message File. Value: string. Appears if Write Method = Load.
- DB Options. Value: string. Appears if Write Method = Load and Write Mode = Create or Replace.
- Nonrecoverable Transactions. Values: True/False. Default: False. Appears if Write Method = Load.
- Pad Character. Value: string. Default: null. Appears for Write Method = Upsert or Delete Rows.
- Exception Table. Value: string. Appears if Write Method = Load.
- Statistics. Values: stats_none, stats_exttable_only, stats_extindex_only, stats_index, stats_table, stats_extindex_table, stats_all, stats_both. Appears if Write Method = Load.
Target category
Under the Target category, you can set properties for the database table to write data to.
Table
Specify the name of the table to write to. You can specify a job parameter if required.
Delete SQL
Only appears for the Delete Rows write method. This property allows you to view an auto-generated Delete statement, or to specify your own, depending on the setting of the Delete Rows Mode property.
Upsert Mode
This only appears for the Upsert write method. It allows you to specify how the insert and update statements are to be derived. Choose from:
- Auto-generated Update & Insert. WebSphere DataStage generates update and insert statements for you, based on the values you have supplied for the table name and on column details. The statements can be viewed by selecting the Insert SQL or Update SQL properties.
- Auto-generated Update Only. WebSphere DataStage generates an update statement for you, based on the values you have supplied for the table name and on column details. The statement can be viewed by selecting the Update SQL property.
- User-defined Update & Insert. Select this to enter your own update and insert statements. Then select the Insert SQL and Update SQL properties and edit the default statements.
- User-defined Update Only. Select this to enter your own update statement. Then select the Update SQL property and edit the default statement.
Insert SQL
Only appears for the Upsert write method. This property allows you to view an auto-generated Insert statement, or to specify your own (depending on the setting of the Update Mode property).
Update SQL
This property appears only for the Upsert write method. This property allows you to view an auto-generated Update statement, or to specify your own, depending on the setting of the Update Mode property.
Write Method
Choose from Delete Rows, Write, Upsert, or Load. Load is the default Write method. Load takes advantage of fast DB2 loader technology for writing data to the database. Note that loading is not supported by DB2 databases on a mainframe computer. Upsert uses Insert and Update SQL statements to write to the database. Upsert is not available when you are using the DB2 UDB Load stage on a USS system.
Write Mode
Select from the following:
- Append. This is the default. New records are appended to an existing table.
- Create. Select this option to create a new table. If the IBM DB2 table already exists, an error occurs and the job terminates. You must specify this mode if the IBM DB2 table does not exist.
- Replace. The existing table is first dropped and an entirely new table is created in its place. IBM DB2 uses the default partitioning method for the new table. Note that you cannot create or replace a table that has primary keys. You should not specify primary keys in your metadata.
- Truncate. The existing table attributes (including schema) and the IBM DB2 partitioning keys are retained, but any existing records are discarded. New records are then appended to the table.
Connection category
Under the Connection category, you can set properties for the database connection.
Server
Optionally specify the IBM DB2 instance name for the table. This property appears if you set Use Server Environment Variable property to False.
Database
Optionally specify the name of the IBM DB2 database to access. This property appears if you set Use Database Environment Variable property to False.
Note: Connection details are normally specified by environment variables as described in the Accessing IBM DB2 databases section. If you are specifying a remote connection, when you fill in the client instance name, user and password fields appear and allow you to specify these for connection to the remote server.
Options category
Under the Options category, you can set additional properties for the job that you are creating.
Array Size
This is only available for Write Methods of Delete and Upsert, and is optional for Upsert. You specify the size of the insert/delete host array. It defaults to 2000, but you can enter 1 if you want each insert/delete statement to be executed individually.
Output Rejects
This appears for the Upsert Write Method. It specifies how to handle rows that fail to be inserted. Choose True to send them down a reject link, or False to drop them. A state field is added to each rejected row. This field contains a five-letter SQL code that identifies the reason that the record was rejected.
Close Command
This is an optional property. Use it to specify any command to be parsed and executed by the IBM DB2 database on all processing nodes after the stage finishes processing the IBM DB2 table. You can specify a job parameter if required.
Open Command
This is an optional property. Use it to specify any command to be parsed and executed by the IBM DB2 database on all processing nodes before the IBM DB2 table is opened. You can specify a job parameter if required.
Cleanup on Failure
This property appears only if Write Method is set to Load. Specify this option to deal with failures during stage execution that leave the table space being loaded in an inaccessible state. The cleanup procedure neither inserts data into the table nor deletes data from it. You must delete rows that were inserted by the failed execution either through the IBM DB2 command-level interpreter or by subsequently running the stage with the Replace or Truncate write mode.
Message File
This property only appears if Write Method is set to Load. Specify the file where the IBM DB2 loader writes diagnostic messages. The database instance must have read/write privilege to the file.
DB Options
This appears only if Write Method is set to load and Write Mode is set to Create or Replace. Specify an optional table space or partitioning key to be used by IBM DB2 to create the table. By default, WebSphere DataStage creates the table on all processing nodes in the default table space and uses the first column in the table, corresponding to the first field in the input data set, as the partitioning key. You specify arguments as a string enclosed in braces in the form:
{tablespace=t_space,[key=col0,...]}
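For example, to create the table in a hypothetical tablespace TS_SALES with a hypothetical partitioning key column STORE_ID, the DB Options string might look like this (the names are illustrative only):

{tablespace=TS_SALES,key=STORE_ID}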
Nonrecoverable Transactions
This option appears only if Write Method is set to Load. It is False by default. If set to True, it indicates that your load transaction is marked as nonrecoverable. It will not be possible to recover your transaction with a subsequent roll forward action. The roll forward utility will skip the transaction, and will mark the table into which data was being loaded as invalid. The utility will also ignore any subsequent transactions against the table. After a roll forward is completed, the table can only be dropped. Table spaces are not put in a backup pending state following the load operation, and a copy of the loaded data is not made during the load operation.
Pad Character
This appears for a Write Method of Upsert or Delete Rows. It specifies the padding character to be used in the construction of a WHERE clause when it contains string columns that have a length less than the DB2 char column in the database. It defaults to null. (See the section titled Using the Pad Character property)
Exception Table
This property appears only if Write Method is set to Load. It allows you to specify the name of a table where rows that violate load table constraints are inserted. The table needs to have been created in the IBM DB2 database. The exception table cannot be used when the Write Mode is set to create or replace.
Statistics
This property appears only if Write Method is set to Load. It allows you to specify which statistics should be generated upon load completion; as part of the loading process, IBM DB2 collects the requisite statistics for table optimization. This option is only valid for a Write Mode of Truncate; it is ignored otherwise.
Arbitrary Loading Order

This only appears if Number of Processes per Node is set to a value greater than 1. If set to True, it specifies that the loading of every node can be arbitrary, leading to a potential performance gain.
USS options
If you are designing jobs within a USS deployment project (see the topic about parallel jobs on USS in the Parallel Job Developer Guide), the properties available under the Connection and Options categories are different, and there is an extra category: MVS Datasets. The following table describes the properties available for these categories; see the Target category section for the properties available under the Target category.
Table 10. USS options and values

Connection category:
- Use Default Database. Values: True/False. Default: True. Required.
- Server. Value: string.
- Database. Value: string. Required if Use Database environment variable = False.

Options category:
- Enforce Constraints. Values: True/False. Default: False. Appears if Write Method = Load.
- Keep Dictionary. Values: True/False. Appears if Write Method = Load.
- Preformat. Values: True/False. Appears if Write Method = Load.
- Silently Drop Columns Not in Table. Values: True/False. Appears if Write Method = Load or Write.
- Truncate Column Names. Values: True/False. Appears if Write Method = Load or Write.
- Truncation Length. Value: number. Default: 18.
- Verbose. Values: True/False. Appears if Write Method = Load.
- Close Command. Value: string. Optional.
- Default String Length. Value: integer.
- Exception Table. Value: string. Appears if Write Method = Load.
- Number of Processes per Node. Value: integer.
- Arbitrary Loading Order. Values: True/False.
- Open Command. Value: string. Optional.
- Row Estimate. Value: integer. Appears if Write Method = Load.
- Sort Device Type. Value: string.
- Sort Keys. Value: integer. Appears if Write Method = Load.
- When Clause. Value: string. Appears if Write Method = Load.
- Create Statement. Value: string. Required if Write Method = Load and Write Mode = Create.
- DB Options. Value: string. Appears if Write Method = Load and Write Mode = Create or Replace.
- Reuse Datasets. Values: True/False. Default: False. Appears if Write Method = Load and Write Mode = Replace.
- Statistics. Default: stats_none. Appears if Write Method = Load and Write Mode = Truncate.
- Array Size. Value: number. Default: 2000.
- Pad Character. Default: null.
- Row Commit Interval. Value: number.
- Time Commit Interval. Value: number.
- Output Rejects. Values: True/False.
Connection category
Under the Connection category for the Input link, you can set appropriate properties for the database connection.
Database
Optionally specify the name of the IBM DB2 database to access. This property appears if you set Use Database Environment Variable property to False.
Discard DSN
Specify the name of the MVS data set that stores the rejected records. It has the following subproperties:
- Discard Device Type. The device type that is used for the specified discard data set.
- Discard Space. The primary allocation space for the discard data set, specified in cylinders.
- Max Discards Per Node. An integer that specifies the maximum number of discarded rows to keep in a data set per node.
Error DSN
The name of the MVS data set that stores rows that could not be loaded into IBM DB2 because of an error. It has the following subproperties:
- Error Device Type. The device type that is used for the specified Error data set.
- Error Space. The primary allocation space for the error data set, specified in cylinders.
Map DSN
Specify the name of the MVS data set for mapping identifiers back to the input records that caused an error. This property has the following subproperties:
- Map Device Type. The device type that is used for the specified Map data set.
- Map Space. The primary allocation space for the map data set, specified in cylinders.
Work 1 DSN
Specify the name of the MVS data set for sorting input. This property has the following subproperties:
- Work 1 Device Type. The device type that is used for the specified Work 1 data set.
- Work 1 Space. The primary allocation space for the Work 1 data set, specified in cylinders.
Work 2 DSN
Specify the name of the MVS data set for sorting output. This property has the following subproperties:
- Work 2 Device Type. The device type that is used for the specified Work 2 data set.
- Work 2 Space. The primary allocation space for the Work 2 data set, specified in cylinders.
Options category
Under this category, you can specify additional properties for the write operation.
Enforce Constraints
This property is available only when Write Method is set to Load. If this is set to True, load will delete errant rows when encountering them and issue a message identifying each such row. This requires that:
- referential constraints exist.
- the input is sorted.
- a Map DSN data set is specified under the MVS data sets category.
Keep Dictionary
This property is available only when Write Method is set to Load. If this is set to true, Load is prevented from building a new compression dictionary. This property is ignored unless the associated table space has the COMPRESS YES attribute.
Preformat
This property is available only when Write Method is set to Load. If set to True, the remaining pages are pre-formatted in the table space and its index space.
Verbose
This option is available only when Write Method is set to Load. If this is set to True, WebSphere DataStage logs all messages generated by IBM DB2 when a record is rejected because of prime key or other violations.
Close Command
This is an optional property. Use it to specify any command to be parsed and executed by the IBM DB2 database on all processing nodes after the stage finishes processing the IBM DB2 table. You can specify a job parameter if required.
Default String Length

The maximum length you can set is 4000 bytes. Note that the stage always allocates the specified number of bytes for a variable-length string. In this case, setting a value of 4000 allocates 4000 bytes for every string. Therefore, you should set the expected maximum length of your largest string and no larger.
Exception Table
This property only appears if Write Method is set to Load. It allows you to specify the name of a table where rows that violate load table constraints are inserted. The table needs to have been created in the IBM DB2 database. The exception table cannot be used when the Write Mode is set to create or replace.
Open Command
This is an optional property. Use it to specify any command to be parsed and executed by the IBM DB2 database on all processing nodes before the IBM DB2 table is opened. You can specify a job parameter if required.
Row Estimate
This option is available only when Write Method is set to Load. Specify the estimated number of rows (across all nodes) to be loaded into the database. An estimate of the required primary allocation space for storing all rows is made before load is engaged.
Sort Keys
This option is available only when Write Method is set to Load. Set this to have rows presorted according to keys; the value is an estimate of the number of index keys to be sorted. Do not use this property if the table space does not have an index, has only one index, or the data is already sorted according to index keys.
When Clause
This option is available only when Write Method is set to Load. Specify a WHEN clause for the load script.
Create Statement
This option is available only when Write Method is set to Load and Write Mode is set to Create or Replace. Specify the SQL statement to create the table.
DB Options
This option appears only if Write Method is set to load and Write Mode is set to Create or Replace. Specify an optional table space or partitioning key to be used by IBM DB2 to create the table. By default, WebSphere DataStage creates the table on all processing nodes in the default table space and uses the first column in the table, corresponding to the first field in the input data set, as the partitioning key. You specify arguments as a string enclosed in braces in the form:
{tablespace=t_space,[key=col0,...]}
Reuse Datasets
This option appears only if Write Method is set to Load and Write Mode is set to Replace. If True, IBM DB2 reuses IBM DB2 managed data sets without relocating them.
Statistics
This only appears if Write Method is set to Load and Write Mode is set to Truncate. Specify which statistics should be generated upon completion of the load. As a part of the loading process, IBM DB2 collects the statistics required for table access optimization. Alternatively, use the RUNSTATS utility.
Array Size
This option is available only for Write Methods of Delete and Upsert, and is optional for upsert. This specifies the size of the insert/delete host array. It defaults to 2000, but you can enter 1 if you want each insert/delete statement to be executed individually.
Pad Character
This option appears for a Write Method of Upsert or Delete Rows. It specifies the padding character to be used in the construction of a WHERE clause when it contains string columns that have a length less than the DB2 char column in the database. It defaults to null. (See the section titled Using the Pad Character property.)
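As an illustration (the column name and lengths are hypothetical), if a key column is defined as CHAR(8) in DB2 and the incoming key value is AB12, specifying a space as the pad character causes the generated clause to compare against the value padded to the full column length:
WHERE ACCT_CODE = 'AB12    '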
set can still contain partial results that you can use. However, you might pay a performance penalty because of the high frequency of the commits. If you set a large value for Row Commit Interval, IBM DB2 must log a correspondingly large amount of rollback information. This, too, might slow your application.
Output Rejects
This appears for the Upsert Write Method. It specifies how to handle rows that fail to be inserted. Choose True to send them down a reject link, or False to drop them. A state field is added to each rejected row. This field contains a five-letter SQL code that identifies the reason that the record was rejected.
Partitioning tab
The Partitioning tab allows you to specify details about how the incoming data is partitioned or collected before it is written to the IBM DB2 database. It also allows you to specify that the data should be sorted before being written.
By default the stage partitions in DB2 mode. This takes the partitioning method from a selected IBM DB2 database (or the one specified by the environment variables described in the Accessing IBM DB2 Databases section). If the DB2/UDB enterprise stage is operating in sequential mode, it will first collect the data before writing it to the database, using the default Auto collection method.
The Partitioning tab allows you to override this default behavior. The exact operation of this tab depends on:
v Whether the DB2/UDB enterprise stage is set to execute in parallel or sequential mode.
v Whether the preceding stage in the job is set to execute in parallel or sequential mode.
If the DB2/UDB enterprise stage is set to execute in parallel, then you can set a partitioning method by selecting from the Partition type list. This will override any current partitioning. If the DB2/UDB enterprise stage is set to execute in sequential mode, but the preceding stage is executing in parallel mode, then you can set a collection method from the Collector type list. This will override the default Auto collection method.
The following partitioning methods are available:
v Entire. Each file to which data is written receives the entire data set.
v Hash. The records are hashed into partitions based on the value of a key column or columns selected from the Available list.
v Modulus. The records are partitioned using a modulus function on the key column selected from the Available list. This is commonly used to partition on tag columns.
v Random. The records are partitioned randomly, based on the output of a random number generator.
v Round Robin. The records are partitioned on a round robin basis as they enter the stage.
v Same. Preserves the partitioning already in place.
v DB2. Replicates the IBM DB2 partitioning method of the specified IBM DB2 table. This is the default method for the DB2/UDB enterprise stage.
v Range. Divides a data set into approximately equal-size partitions based on one or more partitioning keys. Range partitioning is often a preprocessing step to performing a total sort on a data set. Requires extra properties to be set. Access these properties by clicking the properties button.
The following collection methods are available:
v (Auto). This is the default collection method for the DB2/UDB enterprise stage. Normally, when you are using Auto mode, WebSphere DataStage will read any row from any input partition as it becomes available.
v Ordered. Reads all records from the first partition, then all records from the second partition, and so on.
v Round Robin. Reads a record from the first input partition, then from the second partition, and so on. After reaching the last partition, the operator starts over.
v Sort Merge. Reads records in an order based on one or more columns of the record. This requires you to select a collecting key column from the Available list.
The Partitioning tab also allows you to specify that data arriving on the input link should be sorted before being written to the database. The sort is always carried out within data partitions. If the stage is partitioning incoming data, the sort occurs after the partitioning. If the stage is collecting data, the sort occurs before the collection. The availability of sorting depends on the partitioning or collecting method chosen. It is not available with the default Auto methods.
Select the check boxes as follows:
v Perform Sort. Select this to specify that data coming in on the link should be sorted. Select the column or columns to sort on from the Available list.
v Stable. Select this if you want to preserve previously sorted data sets. This is the default.
v Unique. Select this to specify that, if multiple records have identical sorting key values, only one record is retained. If stable sort is also set, the first record is retained.
If NLS is enabled, an additional button opens a dialog box allowing you to select a locale specifying the collate convention for the sort. You can also specify sort direction, case sensitivity, whether sorted as ASCII or EBCDIC, and whether null columns will appear first or last for each column. Where you are using a keyed partitioning method, you can also specify whether the column is used as a key for sorting, for partitioning, or for both. Select the column in the Selected list and right-click to invoke the shortcut menu.
Output page
The Output page allows you to specify details about how the DB2/UDB enterprise stage reads data from an IBM DB2 database. The stage can have only one output link. Alternatively, it can have a reference output link, which is used by the Lookup stage when referring to an IBM DB2 lookup table. It can also have a reject link where rejected records are routed (used in conjunction with an input link). The General tab allows you to specify an optional description of the output link. The Properties tab allows you to specify details of exactly what the link does. The Columns tab specifies the column definitions of the data. The Advanced tab allows you to change the default buffering settings for the output link. Details about DB2/UDB enterprise stage properties are given in the following sections. See the Parallel Job Developer Guide for a general description of the other tabs.
Table 11. Output link properties and values
Category and Property | Values | Default | Required? | Dependent of
Source/Read Method | Table/Auto-generated SQL/User-defined SQL/SQL Builder Generated SQL | Table | Yes | N/A
Source/Table | string | N/A | Yes (if Read Method = Table) | N/A
Source/Where clause | string | N/A | No | Table
Source/Select List | string | N/A | No | Table
Source/Query | string | N/A | Yes (if Read Method = Query) | N/A
Source/Partition Table | string | N/A | No | Query
Connection/Use Default Database | True/False | True | Yes | N/A
Connection/Use Default Server | True/False | True | Yes | N/A
Connection/Server | string | N/A | Yes (if Use Server environment variable = False) | N/A
Connection/Database | string | N/A | Yes (if Use Database environment variable = False) | N/A
Options/Close Command | string | N/A | No | N/A
Options/Open Command | string | N/A | No | N/A
Source category
Under the Source category, you can specify the properties of the database to read data from.
Lookup Type
Where the DB2/UDB enterprise stage is connected to a Lookup stage via a reference link, this property specifies whether the stage will provide data for an in-memory look up (Lookup Type = Normal) or whether the lookup will access the database directly (Lookup Type = Sparse). If the Lookup Type is Normal, the Lookup stage can have multiple reference links. If the Lookup Type is Sparse, the Lookup stage can only have one reference link.
Read Method
This property specifies whether you are specifying a table or a query when reading the IBM DB2 database, and how you are generating the query:
v Select the Table method in order to use the Table property to specify the read. This will read in parallel mode.
v Select Auto-generated SQL to have WebSphere DataStage automatically generate an SQL query based on the columns you have defined and the table you specify in the Table property.
v Select User-defined SQL to define your own query.
v Select SQL Builder Generated SQL to open the SQL Builder and define the query using its helpful interface. (See the topic about SQL Builder in the Designer Client Guide.)
By default, Read methods of SQL Builder Generated SQL, Auto-generated SQL, and User-defined SQL operate sequentially on a single node. You can have the User-defined SQL read operate in parallel if you specify the Partition Table property.
Query
This property is used to contain the SQL query when you choose a Read Method of User-defined SQL or Auto-generated SQL. If you are using Auto-generated SQL you must select a table and specify some column definitions. An SQL statement can contain joins, views, database links, synonyms, and so on. It has the following dependent option:
v Partition Table. Specifies execution of the query in parallel on the processing nodes containing a partition derived from the named table. If you do not specify this, the stage executes the query sequentially on a single node.
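For example (the table and column names here are purely illustrative), a user-defined query that joins two tables might look like this:
SELECT O.ORDER_ID, O.AMOUNT, C.CUST_NAME
FROM SALES.ORDERS O INNER JOIN SALES.CUSTOMERS C ON O.CUST_ID = C.CUST_ID
WHERE O.ORDER_DATE >= '2008-01-01'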
Table
Specifies the name of the IBM DB2 table. The table must exist and you must have SELECT privileges on the table. If your IBM DB2 user name does not correspond to the owner of the specified table, you can prefix it with a table owner in the form:
table_owner.table_name
If you use a Read Method of Table, then the Table property has two dependent properties:
v Where clause. Allows you to specify a WHERE clause of the SELECT statement to specify the rows of the table to include or exclude from the read operation. If you do not supply a WHERE clause, all rows are read.
v Select List. Allows you to specify an SQL select list of column names.
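As an illustration (the owner, table, and column names are hypothetical), setting Table to SALESGRP.ORDERS, Select List to ORDER_ID, AMOUNT, and Where clause to ORDER_DATE >= '2008-01-01' produces a read equivalent to:
SELECT ORDER_ID, AMOUNT FROM SALESGRP.ORDERS WHERE ORDER_DATE >= '2008-01-01'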
Connection category
Under this category, you can set appropriate properties for the server and database to connect to.
Server
Optionally specify the IBM DB2 instance name for the table. This property appears if you set the Use Server Environment Variable property to False. This does not appear if you are developing a job for deployment on a USS system.
Database
Optionally specify the name of the IBM DB2 database to access. This property appears if you set Use Database Environment Variable property to False.
Options category
Under this category, you can specify the close and open SQL queries, and the Pad character.
Close Command
This is an optional property. Use it to specify a command to be parsed and executed by the IBM DB2 database on all processing nodes after the stage finishes processing the IBM DB2 table. You can specify a job parameter if required.
Open Command
This is an optional property. Use it to specify a command to be parsed and executed by the IBM DB2 database on all processing nodes before the IBM DB2 table is opened. You can specify a job parameter if required.
Pad Character
This appears when you are using an IBM DB2 table as a lookup, that is, when you have set Lookup Type as Sparse. It specifies the padding character to be used in the construction of a WHERE clause when it contains string columns that have a length less than the DB2 char column in the database. It defaults to null. (See the section titled Using the Pad Character property.)
Load Methods
The two methods of loading data into an IBM DB2 table are the Sequential File method and the Named Pipe method. The Load Method property determines which method to use to load the data.
or the load is delayed. Sequential File loading is slower than Named Pipe loading because all the rows must be written to this data file.
v In immediate loading, a data file (INPDATA.DAT) is constructed, which contains the rows of data to be loaded.
v In delayed loading, the following three files are constructed:
INPDATA.DAT. The data file containing the rows of data to be loaded.
CMD.CLP. The command file containing the Connect, Load, and Quit commands. The Load command is constructed from property values.
ULOAD.BAT. The batch file that calls the command file. The data file is loaded by running this custom batch file.
The advantage of using delayed loading is that you can modify the data file and command file or move them to another machine.
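As a sketch only (the database name, user ID, and table name are hypothetical, and the actual Load command is built from the stage property values), a generated command file might contain statements of this general form:
CONNECT TO SALESDB USER dsadm USING password;
LOAD FROM INPDATA.DAT OF DEL INSERT INTO SALES.ORDERS;
QUIT;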
You can use the SQL from various connectivity stages that WebSphere DataStage supports. Different databases have slightly different SQL syntax (particularly when it comes to more complex operations such as joins). The exact form of the SQL statements that the SQL builder produces depends on which stage you invoke it from. You do not have to be an SQL expert to use the SQL builder, but we assume some familiarity with the basic structure of SQL statements in this documentation.
3. Click a database alias. The list of schemas opens as nodes beneath each database alias. 4. In the SQL Type list, select the type of SQL query that you want to construct. 5. Click the SQL builder button. The SQL Builder - DB2/UDB 8.2 window opens. In the Select Tables pane, the database alias appears as a node.
4. For each column in the column selection grid, specify how values are derived. You can type a value or select a derivation method from the drop-down list. v Job Parameters. The Parameter dialog box appears. Select from the job parameters that are defined for this job. v Lookup Columns. The Lookup Columns dialog box appears. Select a column from the input columns to the stage that you are using the SQL builder in. v Expression Editor. The Expression Editor opens. Build an expression that derives the value. 5. Click on the Sql tab to view the finished query.
editor to specify the actual filter (the fields displayed depend on the predicate you choose). For example, use the Comparison predicate to specify that a column should match a particular value, or the Between predicate to specify that a column falls within a particular range. The filter appears as a WHERE clause in the finished statement. 4. Click the Add button in the filter panel. The filter that you specify appears in the filter expression panel and is added to the update statement that you are building. 5. Click on the Sql tab to view the finished query.
Toolbar
The SQL builder toolbar contains the following tools.
v Clear Query removes the field entries for the current SQL query.
v Cut removes items and places them on the Windows clipboard so they can be pasted elsewhere.
v Copy copies items and places them on the Windows clipboard so they can be pasted elsewhere.
v Paste pastes items from the Windows clipboard to certain places in the SQL builder.
v SQL properties opens the Properties dialog box.
v Quoting toggles quotation marks in table and column names in the generated SQL statements.
v Validation toggles the validation feature. Validation automatically occurs when you click OK to exit the SQL builder.
v View Data is available when you invoke the SQL builder from stages that support the viewing of data. It causes the calling stage to run the SQL as currently built and return the results for you to view.
v Refresh refreshes the contents of all the panels on the SQL builder.
v Window View allows you to select which panels are shown in the SQL builder window.
v Help opens the online help.
Tree Panel
This displays the table definitions that currently exist within the WebSphere DataStage repository. The easiest way to get a table definition into the repository is to import it directly from the database you want to query. You can do this via the Designer client, or you can do it directly from the shortcut menu in the tree panel. You can also manually define a table definition from within the SQL builder by selecting New Table... from the tree panel shortcut menu. To select a table to query, select it in the tree panel and drag it to the table selection canvas. A window appears in the canvas representing the table and listing all its individual columns.
A shortcut menu allows you to: v Refresh the repository view v Define a new table definition (the Table Definition dialog box opens) v Import metadata directly from a data source (a sub menu offers a list of source types) v Copy a table definition (you can paste it in the table selection canvas) v View the properties of the table definition (the Table Definition dialog box opens) You can also view the properties of a table definition by double-clicking on it in the repository tree.
With a join selected in the canvas (select queries only), a shortcut menu allows you to: v Open the Alternate Relation dialog box to specify that the join should be based on a different foreign key relationship. v Open the Join Properties dialog box to modify the type of join and associated join expression. From the canvas background, a shortcut menu allows you to: v Refresh the view of the table selection canvas. v Paste a table that you have copied from the tree panel. v View data - this is available when you invoke the SQL builder from stages that support the viewing of data. It causes the calling stage to run the SQL as currently built and return the results for you to view. v Open the Properties dialog box to view details of the SQL syntax that the SQL builder is currently building a query for.
Selection Page
The Selection page appears when you are using the SQL builder to define a Select statement. Use this page to specify details of your select query. It has the following components.
Column expression
Identifies the column to be included in the query. You can specify: v Job parameter. A dialog box appears offering you a choice of available job parameters. This allows you to specify the value to be used in the query at run time (the stage you are using the SQL builder from must allow job parameters for this to appear). v Expression. An expression editor dialog box appears, allowing you to specify an expression that represents the value to be used in the query. v Data flow variable. A dialog box appears offering you a choice of available data flow variables (the stage you are using the SQL builder from must support data flow variables for this to appear) v Lookup Column. You can directly select a column from one of the tables in the table selection canvas.
Table
Identifies the table that the column belongs to. If you populate the column grid by dragging, copying or double-clicking on a column from the table selection canvas, the table name is filled in automatically. You can also choose a table from the drop-down list. To specify the table name at runtime, choose a job parameter from the drop-down list.
Column Alias
This allows you to specify an alias for the column.
Output
This is selected to indicate that the column will be output by the query. This is automatically selected when you add a column to the grid.
Sort
Choose Ascending or Descending to have the query sort the returned rows by the value of this column. Selecting to sort results in an ORDER BY clause being added to the query.
Sort Order
Allows you to specify the order in which rows are sorted if you are ordering by more than one column.
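For example (column names are illustrative), sorting first by REGION ascending and then by AMOUNT descending adds a clause of this general form to the query:
ORDER BY REGION ASC, AMOUNT DESC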
Context Menu
A shortcut menu allows you to: v Paste a column that you've copied from the table selection canvas. v Insert a row in the grid. v Show or hide the filter panel. v Remove a row from the grid.
Filter Panel
The filter panel allows you to specify a WHERE clause for the SELECT statement you are building. It comprises a predicate list and an expression editor panel, the contents of which depends on the chosen predicate. See Expression Editor for details on using the expression editor that the filter panel provides.
Group Page
The Group page appears when you are using the SQL builder to define a select statement. Use the Group page to specify that the results of a select query are grouped by a column, or columns. Also, use it to aggregate the results in some of the columns, for example, you could specify COUNT to count the number of rows that contain a not-null value in a column. The Group tab gives access to the toolbar, tree panel, and the table selection canvas, in exactly the same way as the Selection page.
Grouping Grid
This is where you specify which columns are to be grouped by or aggregated on. The grid is populated with the columns that you selected on the Selection page. You can change the selected columns or select new ones, which will be reflected in the selection your query makes. The grid has the following fields:
v Column expression. Identifies the column to be included in the query. You can modify the selections from the Selection page, or build a column expression. Job parameter. A dialog box appears offering you a choice of available job parameters. This allows you to specify the value to be used in the query at run time (the stage you are using the SQL builder from must allow job parameters for this to appear). Expression Editor. An expression editor dialog box appears, allowing you to specify an expression that represents the value to be used in the query. Data flow variable. A dialog box appears offering you a choice of available data flow variables (the stage you are using the SQL builder from must support data flow variables for this to appear). Lookup Column. You can directly select a column from one of the tables in the table selection canvas.
v Column Alias. This allows you to specify an alias for the column. If you select an aggregation operation for a column, SQL builder will automatically insert an alias for it; you can edit this if required.
v Output. This is selected to indicate that the column will be output by the query. This is automatically selected when you add a column to the grid.
v Distinct. Select this check box if you want to add the DISTINCT qualifier to an aggregation. For example, a COUNT aggregation with the distinct qualifier will count the number of rows with distinct values in a field (as opposed to just the not-null values). For more information about the DISTINCT qualifier, see SQL Properties Dialog Box.
v Aggregation. Allows you to select an aggregation function to apply to the column (note that this is mutually exclusive with the Group By option). See Aggregation Functions for details about the available functions.
v Group By. Select the check box to specify that query results should be grouped by the results in this column.
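For example (the table and column names are illustrative), grouping on REGION and applying the COUNT aggregation to ORDER_ID produces a query of this general form:
SELECT REGION, COUNT(ORDER_ID) FROM SALES.ORDERS GROUP BY REGION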
Aggregation Functions
The aggregation functions available vary according to the stage you have opened the SQL builder from. The following are the basic ones supported by all SQL syntax variants.
The following aggregation functions are supported.
v AVG. Returns the mean average of the values in a column. For example, if you had six rows with a column containing a price, the six rows would be added together and divided by six to yield the mean average. If you specify the DISTINCT qualifier, only distinct values will be averaged; if the six rows only contained four distinct prices, then these four would be added together and divided by four to produce a mean average.
v COUNT. Counts the number of rows that contain a not-null value in a column. If you specify the DISTINCT qualifier, only distinct values will be counted.
v MAX. Returns the maximum value that the rows hold in a particular column. The DISTINCT qualifier can be selected, but has no effect on this function.
v MIN. Returns the minimum value that the rows hold in a particular column. The DISTINCT qualifier can be selected, but has no effect on this function.
v STDDEV. Returns the standard deviation for a set of numbers.
v VARIANCE. Returns the variance for a set of numbers.
Filter Panel
The filter panel allows you to specify a HAVING clause for the SELECT statement you are building. It comprises a predicate list and an expression editor panel, the contents of which depends on the chosen predicate. See Expression Editor for details on using the expression editor that the filter panel provides.
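Continuing the illustrative example above (the names remain hypothetical), a filter on the aggregated value appears as a HAVING clause:
SELECT REGION, COUNT(ORDER_ID) FROM SALES.ORDERS GROUP BY REGION HAVING COUNT(ORDER_ID) > 100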
Insert Page
The Insert page appears when you are using the SQL builder to define an insert statement. Use this page to specify details of your insert statement. The only component the page has is the insert column grid.
Insert Column
Identifies the columns to be included in the statement. You can populate this in a number of ways: v drag columns from the table in the table selection canvas. v choose columns from a drop-down list in the grid. v double-click the column name in the table selection canvas. v copy and paste from the table selection canvas.
Insert Value
Identifies the values that you are setting the corresponding column to. You can specify one of the following in giving a value. You can also type a value directly into this field. v Job parameter. A dialog box appears offering you a choice of available job parameters. This allows you to specify the value to be used in the query at run time (the stage you are using the SQL builder from must allow job parameters for this to appear). v Expression. An expression editor dialog box appears, allowing you to specify an expression that represents the value to be used in the query. v Data flow variable. A dialog box appears offering you a choice of available data flow variables (the stage you are using the SQL builder from must support data flow variables for this to appear) v Lookup Column. You can directly select a column from one of the tables in the table selection canvas.
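For example (the table, column names, and values are illustrative), selecting CUST_ID and CUST_NAME as insert columns and supplying literal values produces a statement of this general form:
INSERT INTO SALES.CUSTOMERS (CUST_ID, CUST_NAME) VALUES (1001, 'A N OTHER')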
Update Page
The Update page appears when you are using the SQL builder to define an update statement. Use this page to specify details of your update statement. It has the following components.
Update Column
Identifies the columns to be included in the statement. You can populate this in a number of ways: v drag columns from the table in the table selection canvas. v choose columns from a drop-down list in the grid. v double-click the column name in the table selection canvas. v copy and paste from the table selection canvas.
Update Value
Identifies the values that you are setting the corresponding column to. You can specify one of the following in giving a value. You can also type a value directly into this field. v Job parameter. A dialog box appears offering you a choice of available job parameters. This allows you to specify the value to be used in the query at run time (the stage you are using the SQL builder from must allow job parameters for this to appear). v Expression. An expression editor dialog box appears, allowing you to specify an expression that represents the value to be used in the query. v Data flow variable. A dialog box appears offering you a choice of available data flow variables (the stage you are using the SQL builder from must support data flow variables for this to appear) v Lookup Column. You can directly select a column from one of the tables in the table selection canvas.
Filter Panel
The filter panel allows you to specify a WHERE clause for the update statement you are building. It comprises a predicate list and an expression editor panel, the contents of which depends on the chosen predicate. See Expression Editor for details on using the expression editor that the filter panel provides.
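For example (the names and values are illustrative), updating the STATUS column of rows selected by a Comparison filter on CUST_ID produces a statement of this general form:
UPDATE SALES.CUSTOMERS SET STATUS = 'ACTIVE' WHERE CUST_ID = 1001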
Delete Page
The Delete page appears when you are using the SQL builder to define a delete statement. Use this page to specify details of your delete statement. It has the following components.
Filter Panel
The filter panel allows you to specify a WHERE clause for the delete statement you are building. It comprises a predicate list and an expression editor panel, the contents of which depends on the chosen predicate. See Expression Editor for details on using the expression editor that the filter panel provides.
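For example (the names and values are illustrative), a Comparison filter on the STATUS column produces a delete statement of this general form:
DELETE FROM SALES.CUSTOMERS WHERE STATUS = 'CLOSED'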
Sql Page
Click the Sql tab to view the generated statement. Using the shortcut menu, you can copy the statement for use in other environments. For select queries, if the columns you have defined as output columns for your stage do not match the columns that the SQL statement is generating, use the Resolve columns grid to reconcile them. In most cases, the columns match.
If there is a mismatch, the grid displays a warning message. Click the Auto Match button to resolve the mismatch. You are offered the choice of matching by name, by order, or by both. When matching, the SQL builder seeks to alter the columns generated by the SQL statement to match the columns loaded onto the stage.
If you choose Name matching, and a column of the same name with a compatible data type is found, the SQL builder:
v Moves the result column to the equivalent position in the grid to the loaded column (this will change the position of the named column in the SQL).
v Modifies all the attributes of the result column to match those of the loaded column.
If you choose Order matching, the builder works through comparing each results column to the loaded column in the equivalent position. If a mismatch is found, and the data type of the two columns is compatible, the SQL builder:
v Changes the alias name of the result column to match the loaded column (provided the results set does not already include a column of that name).
v Modifies all the attributes of the result column to match those of the loaded column.
If you choose Both, the SQL builder applies Name matching and then Order matching.
If auto matching fails to reconcile the columns as described above, any mismatched results column that represents a single column in a table is overwritten with the details of the loaded column in the equivalent position.
When you click OK in the Sql tab, the SQL builder checks to see if the results columns match the loaded columns. If they don't, a warning message is displayed allowing you to proceed or cancel. Proceeding causes the loaded columns to be merged with the results columns:
v Any matched columns are not affected.
v Any extra columns in the results columns are added to the loaded columns.
v Any columns in the loaded set that do not appear in the results set are removed.
v For columns that don't match, if data types are compatible the loaded column is overwritten with the results column. If data types are not compatible, the existing loaded column is removed and replaced with the results column.
You can also edit the columns in the Results part of the grid in order to reconcile mismatches manually.
Expression Editor
The Expression Editor allows you to specify details of a WHERE clause that will be inserted in your select query or update or delete statement. You can also use it to specify a WHERE clause for a Join condition where you are joining multiple tables, or for a HAVING clause. A variant of the expression editor allows you to specify a calculation, function, or a case statement within an expression. The Expression Editor can be opened from various places in the SQL builder.
v Fill in the information required by the Expression Editor fields that appear.
v Click the Add button to add the filter to the query you are building. This clears the expression editor so that you can add another filter if required.
The contents of the expression editor vary according to which predicate you have selected. The following predicates are available:
v Between. Allows you to specify that the value in a column should lie within a certain range.
v Comparison. Allows you to specify that the value in a column should be equal to, or greater than or less than, a certain value.
v In. Allows you to specify that the value in a column should match one of a list of values.
v Like. Allows you to specify that the value in a column should contain, start with, end with, or match a certain value.
v Null. Allows you to specify that a column should, or should not be, null.
If you are building a query for Oracle 8i, you can use the Join predicate. The logic appears in the query as a WHERE statement. Oracle 8i does not support JOIN statements.
Between
The expression editor when you have selected the Between predicate contains: v Column. Choose the column on which you are filtering from the drop-down list. You can also specify: Job parameter. A dialog box appears offering you a choice of available job parameters. This allows you to specify the value to be used in the query at run time (the stage you are using the SQL builder from must allow job parameters for this to appear). Expression. An expression editor dialog box appears, allowing you to specify an expression that represents the value to be used in the query. Data flow variable. A dialog box appears offering you a choice of available data flow variables (the stage you are using the SQL builder from must support data flow variables for this to appear) Column. You can directly select a column from one of the tables in the table selection canvas. v Between/Not Between. Choose Between or Not Between from the drop-down list to specify whether the value you are testing should be inside or outside your specified range. v Start of range. Use this field to specify the start of your range. Click the menu button to the right of the field and specify details about the argument you are using to specify the start of the range, then specify the value itself in the field. v End of range. Use this field to specify the end of your range. Click the menu button to the right of the field and specify details about the argument you are using to specify the end of the range, then specify the value itself in the field.
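For example (the column name and dates are illustrative), choosing Between on an ORDER_DATE column with a start and end of range produces a filter of this general form:
WHERE ORDER_DATE BETWEEN '2008-01-01' AND '2008-03-31'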
Comparison
The expression editor when you have selected the Comparison predicate contains: v Column. Choose the column on which you are filtering from the drop-down list. You can specify one of the following in identifying a column:
Job parameter. A dialog box appears offering you a choice of available job parameters. This allows you to specify the value to be used in the query at run time (the stage you are using the SQL builder from must allow job parameters for this to appear). Expression. An expression editor dialog box appears, allowing you to specify an expression that represents the value to be used in the query. Data flow variable. A dialog box appears offering you a choice of available data flow variables (the stage you are using the SQL builder from must support data flow variables for this to appear). Column. You can directly select a column from one of the tables in the table selection canvas. v Comparison operator. Choose the comparison operator from the drop-down list. The available operators are: = (equals), <> (not equal to), < (less than), <= (less than or equal to), > (greater than), and >= (greater than or equal to). v Comparison value. Use this field to specify the value you are comparing to. Click the menu button to the right of the field and choose the data type for the value from the menu, then specify the value itself in the field.
In
The expression editor when you have selected the In predicate contains: v Column. Choose the column on which you are filtering from the drop-down list. You can specify one of the following in identifying a column: Job parameter. A dialog box appears offering you a choice of available job parameters. This allows you to specify the value to be used in the query at run time (the stage you are using the SQL builder from must allow job parameters for this to appear). Expression. An expression editor dialog box appears, allowing you to specify an expression that represents the value to be used in the query. Data flow variable. A dialog box appears offering you a choice of available data flow variables (the stage you are using the SQL builder from must support data flow variables for this to appear). Column. You can directly select a column from one of the tables in the table selection canvas. v In/Not In. Choose IN or NOT IN from the drop-down list to specify whether the value should be in the specified list or not in it. v Selection. These fields allow you to specify the list used by the query. Use the menu button to the right of the single field to specify details about the argument you are using to specify a list item, then enter a value. Click the double right arrow to add the value to the list. To remove an item from the list, select it then click the double left arrow.
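For example (the column and list values are illustrative), an In filter on a REGION column produces a clause of this general form:
WHERE REGION IN ('NORTH', 'SOUTH', 'EAST')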
Like
The expression editor when you have selected the Like predicate is as follows. The fields it contains are:
v Column. Choose the column on which you are filtering from the drop-down list. You can specify one of the following in identifying a column: Job parameter. A dialog box appears offering you a choice of available job parameters. This allows you to specify the value to be used in the query at run time (the stage you are using the SQL builder from must allow job parameters for this to appear). Expression. An expression editor dialog box appears, allowing you to specify an expression that represents the value to be used in the query. Data flow variable. A dialog box appears offering you a choice of available data flow variables (the stage you are using the SQL builder from must support data flow variables for this to appear) Column. You can directly select a column from one of the tables in the table selection canvas. v Like/Not Like. Choose LIKE or NOT LIKE from the drop-down list to specify whether you are including or excluding a value in your comparison. v Like Operator. Choose the type of Like or Not Like comparison you want to perform from the drop-down list. Available operators are: Match Exactly. Your query will ask for an exact match to the value you specify. Starts With. Your query will match rows that start with the value you specify. Ends With. Your query will match rows that end with the value you specify. Contains. Your query will match rows that contain the value you specify anywhere within them. v Like Value. Specify the value that your LIKE predicate will attempt to match.
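As an illustration (the column name and value are hypothetical), the Like operators map to standard SQL pattern matching; for example, Starts With produces a leading pattern and Contains wraps the value in wildcards:
WHERE CUST_NAME LIKE 'SMITH%'
WHERE CUST_NAME LIKE '%SMITH%'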
Null
The expression editor when you have selected the Null predicate is as follows. The fields it contains are: v Column. Choose the column on which you are filtering from the drop-down list. You can specify one of the following in identifying a column: Job parameter. A dialog box appears offering you a choice of available job parameters. This allows you to specify the value to be used in the query at run time (the stage you are using the SQL builder from must allow job parameters for this to appear). Expression. An expression editor dialog box appears, allowing you to specify an expression that represents the value to be used in the query. Data flow variable. A dialog box appears offering you a choice of available data flow variables (the stage you are using the SQL builder from must support data flow variables for this to appear) Column. You can directly select a column from one of the tables in the table selection canvas. v Is Null/Is Not Null. Choose whether your query will match a NULL or NOT NULL condition in the column.
Join
This predicate is only available when you are building an Oracle 8i query with an old-style join expression. The Expression Editor is as follows.
v Left column. Choose the column to be on the left of your join from the drop-down list.
v Join type. Choose the type of join from the drop-down list.
v Right column. Choose the column to be on the right of your join from the drop-down list.
Calculation
The expression editor when you have selected the Calculation predicate contains these fields: v Left Value. Enter the argument you want on the left of your calculation. You can choose the type of argument by clicking the menu button on the right and choosing a type from the menu. v Calculation Operator. Choose the operator for your calculation from the drop-down list. v Right Value. Enter the argument you want on the right of your calculation. You can choose the type of argument by clicking the menu button on the right and choosing a type from the menu.
Functions
The expression editor when you have selected the Functions predicate contains these fields: v Function. Choose a function from the drop-down list. The list of available functions depends on the database you are building the query for. v Description. Gives a description of the function you have selected. v Parameters. Enter the parameters required by the function you have selected. The parameters that are required vary according to the selected function.
Case
The case option on the expression editor enables you to include case statements in the SQL you are building. You can build case statements with the following syntax.
CASE WHEN condition THEN value WHEN condition THEN value ... ELSE value END
or
CASE subject WHEN match_value THEN value WHEN match_value THEN value ... ELSE value END
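For instance, using the grade example that appears later in this section (the values themselves are illustrative), the two forms might be written as:
CASE WHEN GRADE = 3 THEN 'first class' ELSE 'unclassified' END
CASE GRADE WHEN 3 THEN 'first class' WHEN 2 THEN 'second class' ELSE 'unclassified' END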
The expression editor when you have selected the Case predicate contains these fields: v Case Expression. This is the subject of the case statement. Specify this if you are using the second syntax described above (CASE subject WHEN). By default, the field offers a choice of the columns from the table or tables you have dragged to the table selection canvas. To choose an alternative, click the browse button next to the field. This gives you a choice of data types, or of specifying another expression, a function, or a job parameter.
v When. This allows you to specify a condition or match value for your case statement. By default, the field offers a choice of the columns from the table or tables you have dragged to the table selection canvas. To choose an alternative, click the browse button next to the field. This gives you a choice of data types, or of specifying another expression, a function, or a job parameter. You can access the main expression editor by choosing case expression editor from the menu. This allows you to specify expressions such as comparisons. You would typically use this in the first syntax example. For example, you would specify grade=3 as the condition in the expression WHEN grade=3 THEN first class.
v Then. Use this to specify the value part of the case expression. By default, the field offers a choice of the columns from the table or tables you have dragged to the table selection canvas. To choose an alternative, click the browse button next to the field. This gives you a choice of data types, or of specifying another expression, a function, or a job parameter.
v Add. Click this to add a case expression to the query. This clears the When and Then fields so that you can specify another case expression.
v Else Expression. Use this to specify the value for the optional ELSE part of the case expression.
v Date Time. Specifies that the argument is a date time. The SQL builder inserts the current date and time in the format that the database the query is being built for expects. You can edit the date time as required.
v Plaintext. Allows you to select the default value of an argument (if one is defined).
v Expression Editor. You can specify a function or calculation expression as an argument of an expression. Selecting this causes the Calculation/Function version of the expression editor to open.
v Function. You can specify a function as an argument to an expression. Selecting this causes the Functions Form dialog box to open. The functions available depend on the database that the query you are building is intended for.
v Job Parameter. You can specify that the argument is a job parameter, the value for which is supplied when you actually run the WebSphere DataStage job. Selecting this opens the Parameters dialog box.
v Integer. Choose this to specify that the argument is of integer type.
v String. Select this to specify that the argument is of string type.
v Time. Specifies that the argument is the current local time. You can edit the value.
v Timestamp. Specifies that the argument is a timestamp. You can edit the value. The SQL builder inserts the current date and time in the format that the database that the query is being built for expects.
v Result. Shows the actual function that will be included in the query as specified in this dialog box. v Parameters. Enter the parameters required by the function you have selected. The parameters that are required vary according to the selected function.
Joining Tables
When you use the SQL builder to help you build select queries, you can specify table joins within the query. When you drag multiple tables onto the table selection canvas, the SQL builder attempts to create a join between the table added and the one already on the canvas to its left. It uses captured foreign key metadata where this is available. The join is represented by a line joining the columns the SQL builder has decided to join on. After the SQL builder automatically inserts a join, you can amend it.
When you add a table to the canvas, SQL builder determines how to join the table with tables that are on the canvas. The process depends on whether the added table is positioned to the right or left of the tables on the canvas.
To construct a join between the added table and the tables to its left:
1. SQL builder starts with the added table.
2. Determine if there is a foreign key between the added table and the subject table.
v If a foreign key is present, continue to Step 3.
v If a foreign key is not present, skip to Step 4.
3. Choose between alternatives for joining the tables based on the following precedence:
v Relations that apply to the key fields of the added tables
v Any other foreign key relation
Construct an INNER JOIN between the two tables with the chosen relationship dictating the join criteria.
4. Take the subject as the next table to the left, and try again from step 2 until either a suitable join condition has been found or all tables to the left have been exhausted.
5. If no join condition is found among the tables, construct a default join. If the SQL grammar does not support a CROSS JOIN, an INNER JOIN is used with no join condition. Because this produces an invalid statement, you must set a suitable condition, either through the Join Properties dialog box, or by dragging columns between tables.
To construct a join between the added table and tables to its right:
1. SQL builder starts with the added table.
2. Determine if foreign key information exists between the added table and the subject table.
v If a foreign key is present, continue to Step 3.
v If a foreign key is not present, skip to Step 4.
3. Choose between alternatives based on the following precedence:
v Relations that apply to the key fields of the added tables
v Any other joins
Construct an INNER JOIN between the two tables with the chosen relationship dictating the join criteria.
4. Take the subject as the next table to the right and try again from step 2.
5. If no join condition is found among the tables, construct a default join. If the SQL grammar does not support a CROSS JOIN, an INNER JOIN is used with no join condition. Because this produces an invalid statement, you must set a suitable condition, either through the Join Properties dialog box, or by dragging columns between tables.
Specifying Joins
There are three ways of altering the automatic join that the SQL builder inserts when you add more than one table to the table selection canvas:
v Using the Join Properties dialog box. Open this by selecting the link in the table selection canvas, right-clicking, and choosing Properties from the shortcut menu. This dialog allows you to choose a different type of join, choose alternative conditions for the join, or choose a natural join.
v Using the Alternate Relation dialog box. Open this by selecting the link in the table selection canvas, right-clicking, and choosing Alternate Relation from the shortcut menu. This dialog allows you to change foreign key relationships that have been specified for the joined tables.
v By dragging a column from one table to another column in any table to its right on the canvas. This replaces the existing automatic join and specifies an equijoin between the source and target column. If the join being replaced is currently specified as an inner or outer join, then the type is preserved; otherwise the new join will be an inner join.
Yet another approach is to specify the join using a WHERE clause rather than an explicit join operation (although this is not recommended where your database supports explicit join statements). In this case you would:
1. Specify the join as a Cartesian product. (SQL builder does this automatically if it cannot determine the type of join required.)
2. Specify a filter in the Selection tab filter panel. This specifies a WHERE clause that selects rows from within the Cartesian product.
If you are using the SQL builder to build Oracle 8i, Microsoft SQL Server, IBM Informix, or Sybase queries, you can use the Expression Editor to specify a join condition, which will be implemented as a WHERE statement. Oracle 8i does not support JOIN statements.
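As an illustration (the table and column names are hypothetical), an explicit join and its WHERE-clause equivalent look like this:
SELECT O.ORDER_ID, C.CUST_NAME FROM SALES.ORDERS O INNER JOIN SALES.CUSTOMERS C ON O.CUST_ID = C.CUST_ID
SELECT O.ORDER_ID, C.CUST_NAME FROM SALES.ORDERS O, SALES.CUSTOMERS C WHERE O.CUST_ID = C.CUST_ID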
Properties Dialogs
Depending on where you are in the SQL builder, choosing Properties from the shortcut menu opens a dialog box as follows:
v The Table Properties dialog box opens when you select a table in the table selection canvas and choose Properties from the shortcut menu.
v The SQL Properties dialog box opens when you select the Properties icon in the toolbox or Properties from the table selection canvas background.
v The Join Properties dialog box opens when you select a join in the table selection canvas and choose Properties from the shortcut menu. This dialog is described in Join Properties Dialog Box.
Product documentation
Documentation is provided in a variety of locations and formats, including in help that is opened directly from the product interface, in a suite-wide information center, and in PDF file books.
The information center is installed as a common service with IBM Information Server. The information center contains help for most of the product interfaces, as well as complete documentation for all product modules in the suite. A subset of the product documentation is also available online from the product documentation library at publib.boulder.ibm.com/infocenter/iisinfsv/v8r1/index.jsp.
PDF file books are available through the IBM Information Server software installer and the distribution media. A subset of the information center is also available online and periodically refreshed at www.ibm.com/support/docview.wss?rs=14&uid=swg27008803.
You can also order IBM publications in hardcopy format online or through your local IBM representative. To order publications online, go to the IBM Publications Center at www.ibm.com/shop/publications/order.
You can send your comments about documentation in the following ways:
v Online reader comment form: www.ibm.com/software/data/rcf/
v E-mail: [email protected]
Contacting IBM
You can contact IBM for customer support, software services, product information, and general information. You can also provide feedback on products and documentation.
Customer support
For customer support for IBM products and for product download information, go to the support and downloads site at www.ibm.com/support/us/. You can open a support request by going to the software support service request site at www.ibm.com/software/support/probsub.html.
My IBM
You can manage links to IBM Web sites and information that meet your specific technical support needs by creating an account on the My IBM site at www.ibm.com/account/us/.
Software services
For information about software, IT, and business consulting services, go to the solutions site at www.ibm.com/businesssolutions/us/en.
General information
To find general information about IBM, go to www.ibm.com.
Product feedback
You can provide general product feedback through the Consumability Survey at www.ibm.com/software/data/info/consumability-survey.
Documentation feedback
You can click the feedback link in any topic in the information center to comment on the information center. You can also send your comments about PDF file books, the information center, or any other documentation in the following ways: v Online reader comment form: www.ibm.com/software/data/rcf/ v E-mail: [email protected]
If an optional item appears above the main path, that item has no effect on the execution of the syntax element and is used only for readability.
optional_item required_item
v If you can choose from two or more items, they appear vertically, in a stack. If you must choose one of the items, one item of the stack appears on the main path.
required_item required_choice1 required_choice2
If choosing one of the items is optional, the entire stack appears below the main path.
required_item optional_choice1 optional_choice2
If one of the items is the default, it appears above the main path, and the remaining choices are shown below.
default_choice required_item optional_choice1 optional_choice2
v An arrow returning to the left, above the main line, indicates an item that can be repeated.
required_item
repeatable_item
If the repeat arrow contains a comma, you must separate repeated items with a comma.
, required_item repeatable_item
A repeat arrow above a stack indicates that you can repeat the items in the stack. v Sometimes a diagram must be split into fragments. The syntax fragment is shown separately from the main syntax diagram, but the contents of the fragment should be read as if they are on the main path of the diagram.
required_item fragment-name
Fragment-name:
required_item optional_item
v Keywords, and their minimum abbreviations if applicable, appear in uppercase. They must be spelled exactly as shown. v Variables appear in all lowercase italic letters (for example, column-name). They represent user-supplied names or values. v Separate keywords and parameters by at least one space if no intervening punctuation is shown in the diagram. v Enter punctuation marks, parentheses, arithmetic operators, and other symbols, exactly as shown in the diagram. v Footnotes are shown by a number in parentheses, for example (1).
Product accessibility
You can get information about the accessibility status of IBM products. The IBM Information Server product modules and user interfaces are not fully accessible. The installation program installs the following product modules and components:
v IBM Information Server Business Glossary Anywhere
v IBM Information Server FastTrack
v IBM Metadata Workbench
v IBM WebSphere Business Glossary
v IBM WebSphere DataStage and QualityStage
v IBM WebSphere Information Analyzer
v IBM WebSphere Information Services Director
For more information about a product's accessibility status, go to http://www.ibm.com/able/product_accessibility/index.html.
Accessible documentation
Accessible documentation for IBM Information Server products is provided in an information center. The information center presents the documentation in XHTML 1.0 format, which is viewable in most Web browsers. XHTML allows you to set display preferences in your browser. It also allows you to use screen readers and other assistive technologies to access the documentation.
Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785 U.S.A.

For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

IBM World Trade Asia Corporation Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:

IBM Corporation
J46A/G4
555 Bailey Avenue
San Jose, CA 95141-1003 U.S.A.

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

This information is for planning purposes only. The information herein is subject to change before the products described become available.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Each copy or any portion of these sample programs or any derivative work must include a copyright notice as follows:

© (your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. © Copyright IBM Corp. _enter the year or years_. All rights reserved.

If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM trademarks and certain non-IBM trademarks are marked on their first occurrence in this information with the appropriate symbol.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.

The following terms are trademarks or registered trademarks of other companies:

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.

IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

ITIL is a registered trademark and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. The United States Postal Service owns the following trademarks: CASS, CASS Certified, DPV, LACSLink, ZIP, ZIP + 4, ZIP Code, Post Office, Postal Service, USPS and United States Postal Service. IBM Corporation is a non-exclusive DPV and LACSLink licensee of the United States Postal Service. Other company, product, or service names may be trademarks or service marks of others.
Index

A
accessibility 71

C
Columns tab
   Input page, DB2 UDB API stage 7
   Output page, DB2 UDB API stage 8
customer support 71

D
data types, DB2 UDB API stage 9, 10
DB Options 38
DB2 API stage 1
DB2 API stages
   Stage page 4
      General tab 4
DB2 enterprise stage
   examples 20
DB2 load stage 45
DB2 UDB API stage
   Columns tab
      Input page 7
      Output page 8
   connecting to a data source 3
   data types 9, 10
   description 1
   functionality 1
   General tab
      Input page 5, 6
      Output page 8
      Stage page 3
   handling $ and # 11
   input links 1
   Input page 4
   output links 1
   Output page 7, 8
   overview 1
   Query Type 5
   reference output links 1
   SQL builder 5
   SQL tab
      Input page 7
      Output page 8
   Stage page 3
      NLS tab 4
   troubleshooting 11
DB2 UDB Load stage
   functionality 45
   load methods 45, 46
   named pipe load method 46
   restarting the load 46
   sequential file load method 45
DB2/UDB enterprise stage 13
DB2/UDB Enterprise stage 13
DB2/UDB enterprise stage input properties 25
DB2/UDB enterprise stage output properties 41
description
   DB2 UDB API stage 1
documentation
   accessible 71
dollar sign ($), DB2 UDB API stage 11

F
functionality
   IBM DB2 API stage 1
   IBM DB2 Load stage 45

G
General tab
   Input page, DB2 UDB API stage 5, 6
   Output page, DB2 UDB API stage 8
   Stage page, DB2 UDB API stage 3, 4

I
IBM DB2 API stage
   Input page 3
   NLS tab
      Stage page 3
   Output page 3
   Stage page
      General tab 3
IBM support 71
Input page, DB2 UDB API stage 3, 4

L
legal notices 77
load methods, DB2 UDB Load stage 45, 46

N
named pipe load method, DB2 UDB Load stage 46
NLS tab
   Stage page, DB2 UDB API stage 3, 4

O
Output page, DB2 UDB API stage 3, 7, 8
overview
   DB2 UDB API stage 1

P
pound sign (#), DB2 UDB API stage 11
product accessibility
   accessibility 75
properties
   DB2/UDB enterprise stage input 25
   DB2/UDB enterprise stage output 41

R
restarting the load, DB2 UDB Load stage 46

S
screen readers 71
sequential file load method, DB2 UDB Load stage 45
software services 71
SQL tab
   Input page, DB2 UDB API stage 7
   Output page, DB2 UDB API stage 8
   Stage page, DB2 UDB API stage 3, 4
support, customer 71

T
trademarks 79
troubleshooting, DB2 UDB API stage 11
Printed in USA
LC18-9932-01