WebSphere Message Broker Coding Tips
Tim Dunn
V1.0
Last Updated 22nd August 2008
Introduction
Identifying the areas in which Processing Costs Arise
    Parsing
    Message/Business Processing
    Navigation
    Tree Copying
    Resources
General Message Flow Coding Considerations
    Identify the Critical Path of Processing [CPU, Memory]
    Minimize number of Compute & JavaCompute nodes [CPU, Memory]
    Avoid Consecutive Short Message Flows [CPU, Memory]
    Maximise use of the built-in parsers [CPU, Memory]
    Use Subflows carefully [CPU, Memory]
    Watch the order in which you define message tree elements [CPU]
ESQL
    Array Variables [CPU]
    CARDINALITY function [CPU]
    CREATE Statement [CPU, Memory]
    DECLARE statements [Memory]
    EVAL Statement [CPU]
    FORMAT Clause [CPU]
    IF and CASE statements [CPU]
    PASSTHRU Statement [CPU]
    PROPAGATE [CPU, Memory]
    Reference Variables [CPU]
    Shared Variables [CPU]
    Minimize use of String Manipulation Functions [CPU]
    Volume of ESQL [CPU, Memory]
    Minimize Navigation of the Logical Tree [CPU]
Java
    Storing Intermediate tree references [CPU]
    String Concatenation [CPU]
    Optimise BLOB Processing [CPU]
    Java Code
Examples
    What Not to do
        Array Subscripts
        Memory Use
    What to do
        Large Repeating Structures
Introduction
The purpose of this document is to provide coding tips for Message Broker message flow developers. WebSphere Message Broker provides the message flow developer or analyst with a variety of transformation techniques, ranging from coding to mapping techniques that use drag-and-drop technology. These techniques are:
o ESQL code written in nodes such as the Compute, Filter and Database nodes
o Java written in a JavaCompute node
o eXtensible Stylesheets (XSL) running in the XMLT node
o The drag-and-drop facility in the Mapping node
o WebSphere Transformation Extender running in the WebSphere Transformation Extender plug-in node
Of these facilities, WebSphere Message Broker explicitly provides support for two programming languages: ESQL and Java. As with any programming language, it is possible to unwittingly write ESQL or Java code that is inefficient. This often arises because the developer is not familiar with the implications of using certain features or artefacts of the language in a particular way.
To help developers produce more efficient message flows, and in particular more efficient ESQL and Java code, this article outlines the key performance issues and documents some recommended best practices for code development.
Beside each tip you will see an indication of whether it helps with CPU and/or memory consumption:
o [CPU] indicates that using this tip will help to reduce CPU usage by a message flow.
o [Memory] indicates that using this tip will help to reduce the amount of memory used by a message flow.
o [CPU, Memory] indicates that using this tip will help to reduce both CPU and memory usage by a message flow.
Figure 1. A simple Routing Message Flow.
If the message is an order, the top path of execution, through the Order Analysis and Order Processing nodes, is followed. If it is a payment, the bottom path, through the Process Payment node, is followed.
When this or any message flow executes, processing costs arise in the following areas:
o Parsing. This has two parts: the processing of incoming messages and the creation of output messages. As parsing proceeds, a message tree is populated in which the elements of the incoming message are represented.
o Message/Business Processing. This is the routing and transformation logic that you code in ESQL, Java, the Mapping node, XSL or WebSphere TX mappings.
o Navigation. This is the process of "walking" the message tree to access the elements that are referred to in the ESQL or Java.
o Tree Copying. This occurs in nodes that are able to change the message tree, such as Compute nodes. A copy of the message tree is taken for recovery reasons.
o Resources. This is the cost of invoking resource requests, such as reading or writing WebSphere MQ messages or making database requests.
Parsing
Before an incoming message can be processed by the nodes or ESQL, it must be transformed from the sequence of bytes that is the input message into a structured object, the message tree. Some parsing takes place immediately, such as the parsing of the MQMD (assuming the incoming message is an MQ message); the rest takes place on demand, as fields in the message payload are referred to within the message flow. The amount of data that needs to be parsed therefore depends on the organization of the message and the requirements of the message flow; not all message flows require access to all of the data in a message.
When an output message is created, the message tree needs to be converted into an actual message. This is a function of the parser. The process of creating the output message is referred to as serialization or flattening of the message tree. Creating the output message is a simpler process than reading an incoming message. The whole message is written at once when an output message is created. Figure 2 below shows this processing schematically.
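To make parsing on demand concrete, here is a toy Java model; it is not the broker's parser. The hypothetical LazyMessage class holds a raw string of name=value; pairs and parses only as far as needed to answer each lookup, so asking for the first field parses one field, while asking for the last field parses them all.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of on-demand parsing (NOT the broker's parser): the raw message is a
// string of "name=value;" pairs, and fields are parsed only as far as each lookup needs.
class LazyMessage {
    private final String raw;
    private int cursor = 0;                                   // start of the next unparsed field
    private final Map<String, String> parsed = new LinkedHashMap<>();

    LazyMessage(String raw) { this.raw = raw; }

    /** Returns the value of a field, parsing further into the raw data only if needed. */
    String get(String name) {
        while (!parsed.containsKey(name) && cursor < raw.length()) {
            int end = raw.indexOf(';', cursor);
            if (end < 0) end = raw.length();
            String pair = raw.substring(cursor, end);
            int eq = pair.indexOf('=');
            if (eq < 0) throw new IllegalArgumentException("malformed field: " + pair);
            parsed.put(pair.substring(0, eq), pair.substring(eq + 1));
            cursor = end + 1;
        }
        return parsed.get(name);                              // null if the field does not exist
    }

    /** Number of fields parsed so far. */
    int fieldsParsed() { return parsed.size(); }

    public static void main(String[] args) {
        LazyMessage msg = new LazyMessage("Type=Order;Id=42;Customer=Smith;Amount=10.50");
        System.out.println(msg.get("Type") + " " + msg.fieldsParsed()); // Order 1
        System.out.println(msg.get("Id") + " " + msg.fieldsParsed());   // 42 2
    }
}
```

A routing flow that only inspects Type would, in this model, never pay for parsing Customer or Amount.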
Message/Business Processing
It is possible to code message manipulation or business processing in any one of a number of transformation technologies. These were touched on at the beginning of the article. It is through these technologies that the input message is processed and the tree for the output message is produced. The cost of running this processing is dependent on the amount and complexity of the transformation processing that is coded.
Navigation
The cost of navigation depends on the complexity and size of the message tree, which in turn depends on the size and complexity of the input messages and the complexity of the processing within the message flow. As the message tree changes shape over the course of the message flow's execution, so will the cost of accessing different parts of the tree. The cost is proportional to the depth of the tree. There are steps that can be taken to reduce the cost of navigating the tree, and we touch on these in the sections on ESQL and Java coding.
Tree Copying
This occurs in nodes that are able to change the message tree, such as Compute nodes. A copy of the message tree is taken for recovery reasons, so that if a Compute node makes changes and processing in the node generates an exception, the message tree can be recovered to a point earlier in the message flow. Without this, a failure further downstream could have implications for a different path in the message flow. Tree copying does not happen in the Filter or Database nodes, as these nodes cannot modify the message tree. A tree copy is a copy of a structured object, not of a simple sequence of bytes, and so it is relatively expensive. For this reason it is best to minimize the number of such copies, hence the general recommendation to minimize the number of compute nodes (Compute and JavaCompute) in a message flow.
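The cost argument can be sketched with a toy tree in Java. It is not the broker's MbElement; it simply assumes, as described above, that each Compute or JavaCompute node copies every element of the tree. Under that assumption a 1,000-element tree passing through three such nodes costs 3,000 element copies, against 1,000 if the logic is combined into one node.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a message tree (NOT the broker's MbElement): copying it means
// allocating a new object for every element, unlike copying a flat byte[].
class TreeCopyCost {
    static final class Element {
        final String name;
        final List<Element> children = new ArrayList<>();
        Element(String name) { this.name = name; }
    }

    static int elementsCopied = 0;   // element allocations made by deepCopy

    /** Recursively copies an element and all of its descendants. */
    static Element deepCopy(Element src) {
        Element copy = new Element(src.name);
        elementsCopied++;
        for (Element child : src.children) {
            copy.children.add(deepCopy(child));
        }
        return copy;
    }

    /** Builds a flat tree: one root element with n child elements. */
    static Element buildTree(int n) {
        Element root = new Element("Root");
        for (int i = 0; i < n; i++) root.children.add(new Element("Field" + i));
        return root;
    }

    /** Element copies made when the tree passes through `computeNodes` tree-copying nodes. */
    static int copiesFor(Element tree, int computeNodes) {
        elementsCopied = 0;
        for (int i = 0; i < computeNodes; i++) deepCopy(tree);
        return elementsCopied;
    }

    public static void main(String[] args) {
        Element tree = buildTree(999);            // 1,000 elements including the root
        System.out.println(copiesFor(tree, 3));   // 3000
        System.out.println(copiesFor(tree, 1));   // 1000
    }
}
```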
Resources
The cost of accessing resources such as messages and databases depends on the type of request (read, write or update, for example), the exact type of resource (a non-persistent or persistent message, for example) and the level of activity. Processing non-persistent messages costs less in CPU and I/O than processing persistent messages, which need to be logged to ensure data integrity. Similarly, a database read costs less in CPU and I/O than a database insert, where the data must be added to the DB2 table and logged to ensure data integrity.
The processing costs in each of the areas above can normally be reduced by following a series of coding recommendations, which are given in the sections below. The recommendations are split into three distinct areas: those that are generic in nature and concern the way in which the message flow is constructed, those that apply when coding in ESQL, and those that apply when coding in Java. The primary effect of the recommendations is to reduce CPU and memory usage. I/O reduction normally comes from issuing fewer resource requests, which is normally controlled as part of message flow design and is not covered in this article.
If two Compute nodes are separated by a ResetContentDescriptor node, it is possible to combine all three nodes into a single Compute node using function available from Message Broker V5 Fix Pack 3 onwards. The CREATE statement with the PARSE clause allows the message broker's parsers to be invoked from within a Compute node, removing the need for the ResetContentDescriptor node. There will be situations where you need more than one Compute node in a message flow, and this is fine and to be expected. The key thing is to avoid unnecessary additional Compute nodes; do not take away the impression that you are being recommended to force all of the processing into a single Compute node.
When subflows were introduced in Version 2 they were the only way of achieving code reuse, and as such they were widely used. There are now other facilities available which you should consider, but first let us consider a potential drawback of using subflows. Subflows that contain common routines can be embedded into message flows. The subflows are 'in-lined' into the message flow when the message flow is compiled, so no additional nodes are inserted into the message flow as a result of using subflows. [The input and output terminals of the subflow are not processing nodes in the way that a Compute or Filter node is.] However, be aware of implicitly adding extra nodes into a message flow as a result of using subflows. In some situations Compute nodes are added to subflows to marshal data from one part of the message tree into a known place in the message tree, so that the data can be processed by the common processing in the subflow. The result may then need to be copied back to another part of the message tree before the subflow completes. This approach can easily lead to the addition of two Compute nodes, each of which performs a tree copy. In such cases the subflow facilitates the reuse of logic but unwittingly adds an additional processing overhead each time it is used. In Message Broker Version 5, ESQL schemas, procedures and functions were introduced. It is more efficient to achieve code reuse through ESQL procedures, as they do not result in the additional tree copying that can easily occur with subflows.
Watch the order in which you define message tree elements [CPU]
When constructing an internal OutputRoot message tree structure (for an XML message) you must create the individual elements in the correct sequence as defined in the XSD and message set. The parser will not re-order the elements. This applies equally when coding with ESQL or Java. In order to ensure that the correct order is observed you can create a sequence of statements such as the ESQL shown below:
CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Surname';
CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Inits';
CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Addr1';
CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Addr2';
CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Addr3';
CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Postcode';
CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Account_Number';
CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Account_Bal';
...
This code creates the right elements in the correct sequence. Later on, when the elements are populated (generally using code such as SET OutputRoot.XMLNSC.MsgStruct.<element> = ...), each one already exists, so the sequence is maintained.
Note: It is not necessary to code the DOMAIN clause on every CREATE LASTCHILD statement. When a child node is created under parent P, the child is created by P's parser, so the parser (that is, the domain) automatically propagates down the tree from the root. There is a performance and memory gain to be had from not coding DOMAIN when creating the child elements, although the magnitude of the gain has not been quantified.
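The create-then-populate pattern has a close analogue in plain Java, shown here only as an illustration (it does not use the broker API): a LinkedHashMap keeps keys in insertion order, and putting a new value for an existing key does not move it, much as a SET on an already-created element leaves it in place. The element names come from the ESQL above; the values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Analogy only (NOT the broker API): LinkedHashMap preserves insertion order, and
// re-putting an existing key updates its value without changing its position.
class CreateThenPopulate {
    static Map<String, String> build() {
        Map<String, String> msgStruct = new LinkedHashMap<>();
        // Step 1: create the elements in schema order (cf. CREATE LASTCHILD ... NAME ...).
        for (String name : new String[] {"Surname", "Inits", "Addr1", "Postcode", "Account_Number"}) {
            msgStruct.put(name, null);
        }
        // Step 2: populate them later, in whatever order the logic produces the values.
        msgStruct.put("Account_Number", "12345678");
        msgStruct.put("Surname", "Smith");
        msgStruct.put("Postcode", "AB1 2CD");
        msgStruct.put("Inits", "T");
        msgStruct.put("Addr1", "1 High Street");
        return msgStruct;
    }

    public static void main(String[] args) {
        System.out.println(build().keySet()); // [Surname, Inits, Addr1, Postcode, Account_Number]
    }
}
```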
Coding Suggestions
ESQL
Array Variables [CPU]
Avoid using array subscripts [ ] on the right-hand side of expressions; use reference variables and the LASTMOVE function instead. The reason is the way in which array subscripts are evaluated at runtime: every access to an element of an array starts from the first element. There is no problem when the first element is required, but accessing the 10th element involves walking along the message tree from the first element until the 10th is reached. When the 50th element is then referenced, the message tree has to be walked from the first element again. So the higher the array subscript, the greater the cost of accessing it. Array subscripts are evaluated in this way to support the dynamic insertion of elements into the array. Reference variables overcome this by maintaining a pointer into the message tree to the last element accessed. If the 10th element has been accessed, accessing the 11th involves only walking to the next element, not starting from the first again. Here is an example of how to use reference variables to access the elements of an array:
DECLARE myref REFERENCE TO OutputRoot.XML.Invoice.Purchases.Item[1];
-- Continue processing for each item in the array
WHILE LASTMOVE(myref) = TRUE DO
    -- Add 1 to each item in the array
    SET myref = myref + 1;
    -- Move the dynamic reference to the next item in the array
    MOVE myref NEXTSIBLING;
END WHILE;
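The arithmetic behind this advice can be checked with a toy Java model that treats siblings as a singly linked chain; it is not the broker's tree implementation. Visiting n elements by subscript costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 sibling hops, whereas a moving reference costs about n. For 1,000 elements that is 499,500 hops against 1,000.

```java
// Toy model (NOT the broker's tree implementation): siblings form a singly linked
// chain, so element[i] is reached by walking from the first sibling, while a
// reference simply moves on to its next sibling.
class SubscriptVsReference {
    static final class Node {
        int value;
        Node nextSibling;
        Node(int value) { this.value = value; }
    }

    static long steps = 0;   // sibling hops performed

    static Node buildChain(int n) {
        Node first = null, last = null;
        for (int i = 0; i < n; i++) {
            Node node = new Node(i);
            if (first == null) first = node; else last.nextSibling = node;
            last = node;
        }
        return first;
    }

    /** Like Item[i] (1-based): always starts again from the first sibling. */
    static Node subscript(Node first, int i) {
        Node current = first;
        for (int k = 1; k < i && current != null; k++) {
            current = current.nextSibling;
            steps++;
        }
        return current;
    }

    /** Adds 1 to every item using subscripts; returns the hops taken. */
    static long incrementBySubscript(Node first, int n) {
        steps = 0;
        for (int i = 1; i <= n; i++) subscript(first, i).value += 1;
        return steps;
    }

    /** Adds 1 to every item with a moving reference (cf. MOVE myref NEXTSIBLING). */
    static long incrementByReference(Node first) {
        steps = 0;
        Node ref = first;
        while (ref != null) {        // plays the role of LASTMOVE(myref) = TRUE
            ref.value += 1;
            ref = ref.nextSibling;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(incrementBySubscript(buildChain(1000), 1000)); // 499500
        System.out.println(incrementByReference(buildChain(1000)));       // 1000
    }
}
```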
SET ARRAY_SIZE = CARDINALITY(InputRoot.MRM.A.B.C[]);
WHILE (I < ARRAY_SIZE) DO
    ...
END WHILE;
This way CARDINALITY is evaluated only once.
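A toy Java model shows why hoisting CARDINALITY out of the loop condition matters, under the assumption (made by this model, not confirmed for the broker) that counting the elements means walking every sibling. If the count sits in the loop condition it is recomputed on each of the n + 1 tests, giving n(n + 1) hops; computed once, it costs n.

```java
// Toy model (NOT the broker): counting siblings walks the whole chain, so a count
// placed in the loop condition is recomputed on every test of that condition.
class HoistCardinality {
    static final class Node { Node nextSibling; }

    static long hops = 0;   // siblings visited while counting

    static Node buildChain(int n) {
        Node first = null;
        for (int i = 0; i < n; i++) {
            Node node = new Node();
            node.nextSibling = first;   // prepend; order is irrelevant for counting
            first = node;
        }
        return first;
    }

    /** Like CARDINALITY(...[]) in this model: visits every sibling to count them. */
    static int cardinality(Node first) {
        int count = 0;
        for (Node node = first; node != null; node = node.nextSibling) {
            count++;
            hops++;
        }
        return count;
    }

    /** Count in the condition: WHILE I < CARDINALITY(...) DO ... */
    static long countInCondition(Node first) {
        hops = 0;
        int i = 0;
        while (i < cardinality(first)) {
            i++;   // loop body elided
        }
        return hops;
    }

    /** Count hoisted: SET ARRAY_SIZE = CARDINALITY(...); WHILE I < ARRAY_SIZE DO ... */
    static long countOnce(Node first) {
        hops = 0;
        int arraySize = cardinality(first);
        int i = 0;
        while (i < arraySize) {
            i++;   // loop body elided
        }
        return hops;
    }

    public static void main(String[] args) {
        Node chain = buildChain(100);
        System.out.println(countInCondition(chain)); // 10100
        System.out.println(countOnce(chain));        // 100
    }
}
```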
Java
Storing Intermediate tree references [CPU]
Avoid building and navigating trees without storing intermediate references. An example of where this was not done and how it should be done is given below.
MbMessage newEnv = new MbMessage(env);
newEnv.getRootElement().createElementAsFirstChild(MbElement.TYPE_NAME, "Destination", null);
newEnv.getRootElement().getFirstChild().createElementAsFirstChild(MbElement.TYPE_NAME, "MQDestinationList", null);
newEnv.getRootElement().getFirstChild().getFirstChild().createElementAsFirstChild(MbElement.TYPE_NAME, "DestinationData", null);
This repeatedly navigates from the root to build the tree. It is better to store references as follows:
MbMessage newEnv = new MbMessage(env);
MbElement destination = newEnv.getRootElement().createElementAsFirstChild(MbElement.TYPE_NAME, "Destination", null);
MbElement mqDestinationList = destination.createElementAsFirstChild(MbElement.TYPE_NAME, "MQDestinationList", null);
mqDestinationList.createElementAsFirstChild(MbElement.TYPE_NAME, "DestinationData", null);
Code such as
keyforCache = hostSystem + CommonFunctions.separator + sourceQueueValue + CommonFunctions.separator + smiKey + CommonFunctions.separator + newElement;
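A minimal sketch of building the same kind of key with an explicit StringBuilder follows. It assumes all four inputs are Strings and uses a hypothetical SEPARATOR constant in place of CommonFunctions.separator, whose value is not shown. Note that javac already compiles a single concatenation expression like the one above into one builder (or, from Java 9, an invokedynamic call), so the clearest gain comes when a key is assembled across several statements or inside a loop, as joinParts shows.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: SEPARATOR is a hypothetical stand-in for CommonFunctions.separator.
class CacheKeyBuilder {
    static final String SEPARATOR = "|";

    /** Builds the key with one StringBuilder, sized up front to avoid regrowth. */
    static String buildKey(String hostSystem, String sourceQueueValue, String smiKey, String newElement) {
        int capacity = hostSystem.length() + sourceQueueValue.length()
                + smiKey.length() + newElement.length() + 3 * SEPARATOR.length();
        return new StringBuilder(capacity)
                .append(hostSystem).append(SEPARATOR)
                .append(sourceQueueValue).append(SEPARATOR)
                .append(smiKey).append(SEPARATOR)
                .append(newElement)
                .toString();
    }

    /** When parts arrive one at a time (e.g. in a loop), reuse one builder instead of s = s + part. */
    static String joinParts(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) sb.append(SEPARATOR);
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildKey("HOSTA", "IN.QUEUE", "SMI01", "Order"));                 // HOSTA|IN.QUEUE|SMI01|Order
        System.out.println(joinParts(Arrays.asList("HOSTA", "IN.QUEUE", "SMI01", "Order"))); // HOSTA|IN.QUEUE|SMI01|Order
    }
}
```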
Java Code
Follow the usual Java coding tips.
Examples
What Not to do
Here are some examples of how performance was impacted by the use of the wrong coding technique.
Array Subscripts
Below is an example of some ESQL used to load records from a database table. The aim of the processing was to read the rows in from the database table and then iterate around them to create an output message, which is then propagated to the next node. Loading the records from four database tables involved processing several hundred thousand rows, and it was taking 6-8 hours to run. Here is an extract of the code:
SET Environment.Variables.DBDATA[] = (
    SELECT T.* FROM Database.{'ABC'}.{'XYZ'} as T
);
DECLARE A INTEGER 1;
DECLARE B INTEGER CARDINALITY(Environment.Variables.*[]);
SET JPcntFODS = B;
WHILE A <= B DO
    CALL CopyMessageHeaders();
    CREATE FIELD OutputRoot.XML.FODS;
    DECLARE outRootRef REFERENCE TO OutputRoot.XML.Data;
    SET outRootRef.Field1 = Trim(Environment.Variables.DBDATA[A].Field1);
    SET outRootRef.Field2 = Trim(Environment.Variables.DBDATA[A].Field2);
    SET outRootRef.Field3 = Trim(Environment.Variables.DBDATA[A].Field3);
    SET outRootRef.Field4 = Trim(Environment.Variables.DBDATA[A].Field4);
    SET outRootRef.Field5 = Trim(Environment.Variables.DBDATA[A].Field5);
    . . .
The problem with the ESQL is the repeated use of array subscripts throughout, such as Environment.Variables.DBDATA[A]. See the section Array Variables above for why this is bad for performance. The solution in this case was to use REFERENCE variables and the LASTMOVE function instead, as described in that section. By replacing the array subscripts with reference variables, the run time dropped to minutes.
Memory Use
The message flow was reading records from four database tables into arrays in the Environment tree and processing each record. The user was experiencing problems with memory usage: the flow was abending after 6 to 8 hours because of memory problems. The ESQL was as follows:
SET Environment.Variables.Part1[] = ( SELECT T.* FROM Database.MyDB.TableA as T );
While loop for each row
    Build message
    PROPAGATE;
End While
SET Environment.Variables.Part2[] = ( SELECT T.* FROM Database.MyDB.TableB as T );
While loop for each row
    Build message
    PROPAGATE;
End While
SET Environment.Variables.Part3[] = ( SELECT T.* FROM Database.MyDB.TableC as T );
While loop for each row
    Build message
    PROPAGATE;
End While
SET Environment.Variables.Part4[] = ( SELECT T.* FROM Database.MyDB.TableD as T );
While loop for each row
    Build message
    PROPAGATE;
End While
Each of these loads read in between 50K and 100K records. This made the memory requirements large, as there were hundreds of thousands of rows in total. The while loop after each read built an output message for each row, which was passed to the next node in the flow using the PROPAGATE statement.
At no point was there any attempt to free memory, and in most cases this is not needed within a message flow. BUT when processing large volumes of data you sometimes need to take explicit action to avoid problems. After each part had been processed, they should have issued a DELETE for that portion of the tree, for example DELETE LASTCHILD OF Environment.Variables.Part1. This would have freed the memory associated with that part of the Environment correlation.
Note: setting the field to NULL does not work. For example:
SET Environment.Variables.Part1[] = <some large array>;
SET Environment.Variables.Part1 = NULL;
results in the named portion of the tree being detached. This effectively disconnects that portion of the tree but does not delete it (that is, it does not free the memory). With a detach, the memory is tracked and released when the parser associated with the message is reset, that is, when the node has finished its work or after a PROPAGATE without a DELETE NONE.
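A rough model of the memory profile, in Java rather than ESQL and with hypothetical batch sizes chosen within the 50K-100K range quoted above: without a DELETE the Environment keeps every row ever loaded, so the peak is the sum of all batches; deleting each part after it has been processed caps the peak at the largest single batch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the Environment tree (NOT the broker): each load appends its rows to a
// retained list, and rows are only released if the code explicitly deletes them.
class BatchMemory {
    /** Returns the peak number of rows held at once while processing every batch. */
    static int peakRowsHeld(int[] batchSizes, boolean deleteAfterEachBatch) {
        List<Integer> environment = new ArrayList<>();   // rows currently retained
        int peak = 0;
        for (int size : batchSizes) {
            int start = environment.size();
            for (int r = 0; r < size; r++) environment.add(r);   // cf. SET Environment.Variables.PartN[] = (SELECT ...)
            peak = Math.max(peak, environment.size());
            // ... build and PROPAGATE one message per row here ...
            if (deleteAfterEachBatch) {
                environment.subList(start, environment.size()).clear();   // cf. DELETE of that part of the tree
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        int[] batches = {50_000, 100_000, 80_000, 60_000};   // hypothetical sizes
        System.out.println(peakRowsHeld(batches, false));    // 290000
        System.out.println(peakRowsHeld(batches, true));     // 100000
    }
}
```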
What to do
Large Repeating Structures
Here is an example of how to deal efficiently with a large repeating message structure, which might be many megabytes in size. This code is taken from the Large Messaging sample in the sample gallery of the Message Broker Toolkit; see the sample gallery if you would like to run the sample. The message flow works by reading in the whole message, storing it in a ROW variable, and then processing one element of the repeating structure at a time. Each element is sent along the remainder of the message flow using the PROPAGATE statement. When an element of the repeating structure has been processed, it is deleted using the statement DELETE PREVIOUSSIBLING OF refEnvironmentSaleList. The key factor in the success of this technique is the use of the ROW variable rowCachedInputXML, which gives a mutable tree. InputRoot is immutable, and so portions of it cannot be deleted.
CREATE COMPUTE MODULE XMLwithRepeat_to_singleXML_slicer_Compute
    -- ========================
    -- The INPUT message format
    -- ========================
    -- ---------
    -- SaleEnvelope
    --   Header
    --     SaleListCount
    --   SaleList (n)
    --     Invoice (2)
    --       Initial (2)
    --       Surname
    --       Item (2)
    --         Code (3)
    --         Description
    --         Category
    --         Price
    --         Quantity
    --       Balance
    --       Currency
    --   Trailer
    --     CompletionTime
    -- ---------
    DECLARE ROOT_LEVEL CONSTANT CHARACTER 'SaleEnvelope';
    DECLARE HEADER CONSTANT CHARACTER 'Header';
    DECLARE REPEATING_ELEMENT_COUNT CONSTANT CHARACTER 'SaleListCount';
    DECLARE REPEATING_ELEMENT CONSTANT CHARACTER 'SaleList';
    -- Therefore, the repeating item which will be processed is the 'SaleList' element.
    -- Elements within SaleList will not be referenced *specifically* by this code (but they will
    -- be parsed and hence memory will be claimed to store information about the internal elements).

    -- Declare module level variables ("global" to this module)
    DECLARE intNumberOfSaleListsDeclared INTEGER 0;
    DECLARE intNumberOfSaleListsFound INTEGER 0;

    /* ===================================
       Main function to control processing
       =================================== */
    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        CALL ProcessLargeMessageToProduceIndividualMessages();
        CALL ProduceProcessingCompleteNotification();
    END;
    /* ============================================================================================
       > Declare variables
       > Find first instance of the element to process
       > For each instance found
         1> Release memory used to store information about the previous instance (if appropriate)
         2> Call a procedure to produce a single message for the current instance
         3> Look for a following instance
       ============================================================================================ */
    CREATE PROCEDURE ProcessLargeMessageToProduceIndividualMessages()
    BEGIN
        -- Create a (local to this node) variable to hold a mutable tree...
        DECLARE rowCachedInputXML ROW;
        -- ... and create a suitable parser (DOMAIN) to process the incoming message
        /* As both the incoming message AND the new parser are XMLNSC, no translation is required
           and therefore the XML message is NOT fully parsed */
        CREATE FIRSTCHILD OF rowCachedInputXML DOMAIN ('XMLNSC') NAME 'XMLNSC';
        -- Create a reference variable to be used to traverse the input XML message
        /* which will be processed via the local variable described above */
        DECLARE refEnvironmentSaleList REFERENCE TO rowCachedInputXML.XMLNSC;
        -- Create a mutable tree by copying the INPUT XML to the local parser
        /* This allows data about parsed message elements to be deleted from the message tree
           (which cannot happen on InputRoot, as its message tree is immutable) */
        SET rowCachedInputXML.XMLNSC = InputRoot.XMLNSC;
        -- Determine how many SaleList items are expected...
        IF FIELDNAME(InputBody.{ROOT_LEVEL}.{HEADER}.*[>]) = REPEATING_ELEMENT_COUNT THEN
            SET intNumberOfSaleListsDeclared = InputBody.{ROOT_LEVEL}.{HEADER}.{REPEATING_ELEMENT_COUNT};
        ELSE
            THROW USER EXCEPTION MESSAGE 2999 VALUES ('LMSmessageFailure', 'No count found!');
        END IF;
        -- Acquire the first SaleList element...
        MOVE refEnvironmentSaleList FIRSTCHILD NAME ROOT_LEVEL;
        IF NOT LASTMOVE(refEnvironmentSaleList) THEN
            THROW USER EXCEPTION MESSAGE 2999 VALUES ('LMSmessageFailure', 'No root element found!');
        END IF;
        -- The next line results in the parser attempting to locate the first SaleList structure...
        MOVE refEnvironmentSaleList FIRSTCHILD NAME REPEATING_ELEMENT;
        -- Loop around each SaleList item
        WHILE LASTMOVE(refEnvironmentSaleList) DO
            -- Increment the count of SaleList items found...
            SET intNumberOfSaleListsFound = intNumberOfSaleListsFound + 1;
            -- Are we on the second, or a subsequent, repeating item?
            IF intNumberOfSaleListsFound > 1 THEN
                -- YES, therefore erase the parsed details about the previous item to release memory
                /* The following line is the most significant with respect to memory usage.
                   Its execution results in the last-but-one *repeating* element (SaleList),
                   including subordinate elements, being deleted from the message tree, allowing
                   the memory used to hold information generated during parsing to be reused
                   for further parsing. */
                DELETE PREVIOUSSIBLING OF refEnvironmentSaleList;
            END IF;
            CALL ProduceIndividualSaleListMessage(refEnvironmentSaleList, intNumberOfSaleListsFound);
            -- The next line searches for another repeating element...
            MOVE refEnvironmentSaleList NEXTSIBLING NAME REPEATING_ELEMENT;
        END WHILE;
    END;
    /* ====================================================================
       Produce a message consisting of one "slice" of the compound message.
       ==================================================================== */
    CREATE PROCEDURE ProduceIndividualSaleListMessage(IN refEnvironmentSaleList REFERENCE, IN intSaleListNumber INTEGER)
    BEGIN
        -- ==================================
        -- The relevant OUTPUT message format
        -- ==================================
        -- Parent
        --   Number
        --   SaleList
        CALL CopyMessageHeaders();
        SET OutputRoot.XMLNSC.{ROOT_LEVEL}.Number = intSaleListNumber;
        SET OutputRoot.XMLNSC.{ROOT_LEVEL}.{REPEATING_ELEMENT} = refEnvironmentSaleList;
        -- Generate a new message consisting of one SaleList structure
        PROPAGATE;
    END;
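The slicing loop in the module above can be modelled in plain Java to show why memory stays bounded. This is a sketch, not the broker API: an Iterator stands in for on-demand parsing of each SaleList, an ArrayDeque stands in for the mutable rowCachedInputXML tree, and removeFirst plays the part of DELETE PREVIOUSSIBLING. However many elements arrive, no more than two are held at once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model of the slicing technique (NOT the broker API).
class LargeMessageSlicer {
    static int maxHeld = 0;   // most repeating elements held in the tree at any one time

    static List<String> slice(Iterator<String> unparsedSaleLists) {
        Deque<String> mutableTree = new ArrayDeque<>();       // cf. rowCachedInputXML
        List<String> propagated = new ArrayList<>();          // messages sent downstream
        int found = 0;
        maxHeld = 0;
        while (unparsedSaleLists.hasNext()) {                 // cf. WHILE LASTMOVE(ref)
            mutableTree.addLast(unparsedSaleLists.next());    // cf. MOVE ... NEXTSIBLING parses one more element
            maxHeld = Math.max(maxHeld, mutableTree.size());
            found++;
            if (found > 1) {
                mutableTree.removeFirst();                    // cf. DELETE PREVIOUSSIBLING OF ref
            }
            propagated.add(found + ":" + mutableTree.peekLast()); // cf. PROPAGATE of one slice
        }
        return propagated;
    }

    public static void main(String[] args) {
        List<String> out = slice(Arrays.asList("A", "B", "C", "D").iterator());
        System.out.println(out + " maxHeld=" + maxHeld);      // [1:A, 2:B, 3:C, 4:D] maxHeld=2
    }
}
```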