Overview
There are many techniques to perform an Initial Load.
In the simplest scenario, in which the source and target systems are quiescent with
no updates being applied to either database while the data is instantiated, it is
recommended to use a native tool to copy the contents of one database to the
other. As a rule, any time a native tool is available for synchronization, it is preferred.
This is ideal for homogeneous environments where the source and target database
are of the same type and the schema and tables have the same definition and
structure.
Native tools may include using archival techniques, backup and restore, Oracle
Transportable Tablespaces/RMAN/expdp/impdp, splitting a mirror, PAK and UNPAK
on NSK, and others.
If the source and target databases are not identical in type and structure, or if data
transformation is required, then using only native tools might not be sufficient to do
the task and GoldenGate replication, in some form, must be used for the initial
synchronization.
A GoldenGate Initial Load technique must also be used if the source system is not
quiescent during the initial load, or if the target system has existing data.
After the initial load is complete, databases can be kept in sync using the ongoing
change synchronization processes EXTRACT and REPLICAT. All (selected) changes
occurring against source tables are automatically captured by EXTRACT, formatted,
and transferred in near real-time to temporary (trail) files on the target. Once there,
the data is read from these files and applied to the target database by the REPLICAT
process. Checkpoints enable both EXTRACT and REPLICAT to process data
continuously from run-to-run. Checkpoints enable EXTRACT and REPLICAT to be
restarted seamlessly, ensuring that all records are replicated once and only once.
Terminology
The terms (Online) Change Synchronization Extract/Replicat group are used to
refer to the ongoing Extract/Replicat processes used to keep the source data
synchronized with the target data.
The terms Initial-Load Extract/Replicat refer to the Extract/Replicat used to
perform the initial copy of source data into the target system.
Ongoing change capture (Extract) must be started as the first step of the initial
synchronization process. Change synchronization keeps track of ongoing
transactional changes while the initial data load is being applied. The exact step at
which change synchronization is enabled depends on which method you choose to
perform the initial load.
Zero Effective Lag
Zero effective lag is one of the conditions that indicates the Replicat is current with
the data being written by the Extract. At that point, there should no longer be any
chance of a 'collision' between data loaded by the initial load and data captured
during the initial load, so no error handling should be in effect, as it may mask
true errors. If the data does get out of sync while HANDLECOLLISIONS (HC) is turned
on, there is no way to determine what caused the error, and a new initial load will be
required.
Effective lag can be checked by doing any of the following:
GGSCI> lag <replicat>
GGSCI> send <replicat>, getlag
GGSCI> info <replicat>
The above methods will return information about the replicat including a line that
looks like:
Checkpoint Lag 00:00:02 (updated 00:00:01 ago)
This line has two lag components. The first (00:00:02) shows how many hours,
minutes and seconds the Replicat is lagging. In most cases, this should be a few
seconds. Note this is not the time it will take to catch up.
The second component shows how long it has been since this value was updated.
This should also be no more than a few seconds.
A lag in bytes may also be displayed, e.g. 23K. This is the number of bytes in the trail
file that the Replicat still needs to process. When the number of outstanding bytes
is down to a few thousand, or the number of seconds is fewer than 10, you can
consider the Replicat to have zero effective lag for initial-load purposes.
INFO <replicat> will also show in its header the timestamp of the last record in the
trail that it has processed. When that timestamp exceeds the time at which the initial
load finished, there is zero effective lag.
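For illustration, the INFO REPLICAT display of a caught-up Replicat might look like
the following sketch (the process name, trail path and values are hypothetical):

GGSCI> INFO REPLICAT rep1
REPLICAT   REP1      Last Started 2015-04-10 12:00   Status RUNNING
Checkpoint Lag       00:00:02 (updated 00:00:01 ago)
Log Read Checkpoint  File ./dirdat/aa000042
                     2015-04-10 12:34:56.123456  RBA 78901

Here both the lag and the time since its last update are a couple of seconds, which
satisfies the zero-effective-lag criteria described above.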
Prerequisites for Initial Load
In all cases of Initial Load in which the systems are not quiescent, it is necessary to
establish ongoing extraction prior to the actual synchronization of databases. This is
to be certain that any changes being made to the source while the synchronization
is being done will be captured and applied as part of the synchronization. Once the
sync is done, these newly captured changes are applied to the target before ongoing
change synchronization begins. This ensures the integrity and consistency of the
two databases.
GoldenGate provides multiple methods for an Initial Load. These can combine the
use of native tool methods, extfiles, exttrails, or direct loads. Each of these has
variations and advantages and disadvantages. Each of those methods will be
discussed in detail.
When GoldenGate is used to perform the Initial-Load, the following prerequisites
should be taken into account:
1. Disable DDL processing
Prior to starting the initial-load, make sure DDL processing is disabled from the
Extract and Replicat parameters.
2. Prepare the target tables
The following are suggestions that can make the load go faster and help you to avoid
errors:
● Data: Make certain that the target tables are empty unless otherwise required. If
the target tables have data then there may be duplicate-row errors or conflicts
between existing rows and rows that are being loaded.
● Constraints: On the target site, disable foreign-key constraints and check
constraints. Foreign-key constraints can cause errors, and check constraints can
slow down the loading process. The constraints can be reactivated after the load
concludes successfully.
● Triggers: Disable any triggers on the target tables. Triggers firing while the data
is loading can cause errors. They can be re-enabled once the initial load is
complete.
● Indexes: Remove indexes from the target tables. Indexes are not necessary for
inserts. They will slow down the loading process significantly. For each row that is
inserted into a table, the database will update every index on that table. You can add
back the indexes after the load is finished.
NOTE: A primary index is required for all applications that access DB2 for z/OS
target tables. You can delete all other indexes from the target tables, except for the
primary index.
3. Configure the Manager process at source and target systems
Configure and start the manager on both systems. The same manager can be used
for the initial-load Extract/Replicat and ongoing change synchronization groups.
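As an illustrative sketch, a minimal Manager parameter file (mgr.prm) might contain
the following; the port numbers and trail path are hypothetical:

PORT 7809
DYNAMICPORTLIST 7810-7820
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS

DYNAMICPORTLIST is also relevant for performance; see the TCP Settings/Network
section later in this document.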
4. Create a data-definitions file
Should the source and target databases differ in table structure, a data-definitions
file is required to convert the data into the format required by the target
database.
Refer to the documentation for more information on the DEFGEN utility and how to
create the data-definitions file.
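As a hedged sketch (the user, schema and file names are illustrative), a DEFGEN
parameter file and its invocation on the source system could look like:

DEFSFILE ./dirdef/source.def
USERID ogguser, PASSWORD <password>
TABLE hr.*;

Run from the GoldenGate home:

defgen paramfile dirprm/defgen.prm

The resulting definitions file is then copied to the target system and referenced
with SOURCEDEFS in the Replicat parameter file.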
5. Create change synchronization groups
This step can be omitted if the source database is quiesced during the initial-load.
Online change-synchronization Extract and Replicat groups are created to enable
the capture and replication of ongoing changes while the initial-load is taking place.
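For illustration (the group names and trail path are hypothetical), the
change-synchronization groups might be created with GGSCI commands along these lines:

GGSCI> ADD EXTRACT ext1, TRANLOG, BEGIN NOW
GGSCI> ADD EXTTRAIL ./dirdat/aa, EXTRACT ext1
GGSCI> ADD REPLICAT rep1, EXTTRAIL ./dirdat/aa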
NOTE: It is important that the ongoing change-synchronization Extract groups are
started for the first time at the exact step required. This depends on the
actual technique employed for the Initial-Load; refer to those sections for detail.
It is equally important that the ongoing Replicat NOT be started until the
appropriate time, which is generally AFTER the completion of the Initial-Load
replication.
Also, take into account that the first time that the Extract starts in a new
configuration, any open transactions are skipped; the new Extract will only capture
transactions that begin after the start of the Extract. Refer to Document
1347191.1 How to handle OPEN TRANSACTIONS during INITIAL LOAD when using
instantiation from Oracle source database.
For the ongoing change-synchronization Replicat, the parameter
HANDLECOLLISIONS (HC) is recommended only during the initial load when the
source database is active during the load.
The Replicat parameter HANDLECOLLISIONS will take into account collisions that
occur during the overlap of time between the initial load and the ongoing change
replication. It reconciles insert operations for which the row already exists and
reconciles update and delete operations for rows that do not exist.
Once the initial load is complete, HANDLECOLLISIONS should be removed from the
Replicat as after this time there should be no longer any chances for a 'collision'
between data loaded by the initial load and data that has been captured by the
ongoing change-synchronization Extract during the initial load.
At that time, no error handling should be done via HC, as it may mask true errors.
Note that if the data does get out of sync while HANDLECOLLISIONS is turned on,
there is no way to determine what caused the error, and a new initial load might be
required.
NOTE: To use the HANDLECOLLISIONS (HC) function to reconcile incremental data
changes with the load, each target table must have a primary or unique key. If you
cannot create a key through your application, use the KEYCOLS option of the TABLE
and MAP parameters to specify columns as a substitute key for GoldenGate’s
purposes. A key helps identify which row to process. If you cannot create keys, the
source database must be quiesced for the load. If you use HC and you have no
primary key or unique index on the target table, it is possible to insert duplicate
rows. The use of KEYCOLS will not prevent this.
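Putting these pieces together, a change-synchronization Replicat parameter file
used during the load might look like this sketch (the group, user, table and column
names are illustrative; KEYCOLS is only needed when no key exists on the target
table):

REPLICAT rep1
USERID ogguser, PASSWORD <password>
ASSUMETARGETDEFS
HANDLECOLLISIONS
MAP hr.emp, TARGET hr.emp, KEYCOLS (emp_id);

Remember to remove HANDLECOLLISIONS once zero effective lag is reached.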
Tools to Perform Initial-Loads
As discussed in the Overview, a native database tool is preferred whenever both
systems can be quiescent during instantiation. GoldenGate Initial Load techniques
are required when the source remains active, when the target has existing data, or
when the databases differ in type or structure or data transformation is needed.
The advantage of using GoldenGate techniques to perform the initial-load is the
ability to keep track of ongoing transactional changes whilst the initial load is being
applied.
Change synchronization groups as described above, can be configured to track the
incremental changes that will then be reconciled with the results of the Initial-Load.
Loading with a Database Utility
Using this method, the database utility performs the Initial Load on the target site.
If the source site is active while the target is being instantiated, an Extract group
can be configured to capture ongoing changes while the database utility takes and
applies a static copy of the data from the source. Once the copy completes, the
change-synchronization Replicat is started to resynchronize rows that were changed
while the copy was being applied. Afterwards, both the Extract and Replicat
continue running to maintain data synchronization.
This method does not involve any initial-load Extract or Replicat processes.
Steps to Perform
1 . Ensure the Prerequisites for Initial Load discussed above have been addressed.
2. Start the manager process on both source and target systems:
GGSCI> START MANAGER
3. Start the change-synchronization Extract on the source
GGSCI> START EXTRACT <change-synchronization Extract>
In Oracle databases, if replicating sequences, make sure you issue DBLOGIN at this
point, as a user who has EXECUTE privilege on the GoldenGate sequence-flushing
procedure:
GGSCI> DBLOGIN USERID <DBLOGINuser>, PASSWORD <password> [<encryption-
options>]
Then, issue the following command to update each source sequence and generate
redo. From the redo, Replicat performs initial synchronization of the sequences on
the target. You can use an asterisk wildcard for any or all characters in the name of a
sequence (but not the owner):
GGSCI> FLUSH SEQUENCE <owner.sequence>
4. Make the copy on the source system, note the exact time of completion.
5. Set the HANDLECOLLISIONS parameter in the change-synchronization Replicat
and start the Replicat:
GGSCI> START REPLICAT <change-synchronization Replicat>
6. Check the status of the Replicat, using INFO REPLICAT, until you verify that all the
changes generated during the initial-load have been posted.
7. Once you confirm all changes have been posted, turn off HANDLECOLLISIONS
and remove the parameter from the Replicat parameter file so that the change is
taken into account the next time the process is started:
GGSCI> SEND REPLICAT <change-synchronization Replicat>,
NOHANDLECOLLISIONS
8. From this point onwards, GoldenGate continues to synchronize the data.
Refer to the section Instantiation from a Source Oracle Database, for a better
method to instantiate an Oracle Database without the need to specify
HANDLECOLLISIONS at all, using LOGSCN.
Instantiation from a Source Oracle Database
For a better technique when instantiating from a Source Oracle Database, refer to
the attached Technical Brief, which provides an introduction to Oracle GoldenGate’s
best practices and guidelines for instantiation of a target database from an Oracle
Source Database using LOGSCN.
The document is applicable to Oracle GoldenGate Version 10 and above, Oracle
Database releases 9.2 and above, and all currently supported Oracle GoldenGate
releases.
This method of initial-load allows the user to start Replicat at a specific point in the
trail file created by Extract, by using the CSN number, so that Replicat will not hit
any 'duplicate' records. This is a much better initial-load methodology than the
traditional solution of adding HANDLECOLLISIONS to the Replicat parameter file.
Document 1276058.1 Oracle GoldenGate Best Practices: Instantiation from an
Oracle Source Database
Loading Data from File to Replicat
Using this method of initial load, an initial-load Extract writes the records to an
external file which Replicat uses to apply the changes to the target site.
This is quite a slow initial-load method and is only recommended if the amount of
data to be loaded is not too large and other methods cannot be used due to
limitations such as column size or data-type restrictions.
For detailed information and an example, refer to Document 1441172.1 GoldenGate :
File to Replicat - Initial Load Techniques.
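As a sketch of this method (all user, host, path and schema names are illustrative),
the initial-load Extract writes to a remote file, which a special-run Replicat then
reads:

Initial-load Extract parameter file:

SOURCEISTABLE
USERID ogguser, PASSWORD <password>
RMTHOST targethost, MGRPORT 7809
RMTFILE ./dirdat/initld, PURGE
TABLE hr.*;

Initial-load Replicat parameter file:

SPECIALRUN
END RUNTIME
USERID ogguser, PASSWORD <password>
EXTFILE ./dirdat/initld
ASSUMETARGETDEFS
MAP hr.*, TARGET hr.*;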
Loading Data from Trail to Replicat
Using this method of initial load, an initial-load Extract writes the records to a series
of external files which Replicat uses to apply the changes to the target site.
This is faster than the previous method because Replicat can begin applying data
while Extract is still writing it.
It can also process a nearly unlimited number of tables.
For detailed information and an example, refer to Document 1195705.1 How to initial
load files/tables larger than 2 gig using rmtfile.
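To write a series of files rather than a single file, the initial-load Extract can
roll over using the MAXFILES and MEGABYTES options of RMTFILE, along these
illustrative lines (user, host and path names are hypothetical):

SOURCEISTABLE
USERID ogguser, PASSWORD <password>
RMTHOST targethost, MGRPORT 7809
RMTFILE ./dirdat/il, MAXFILES 100, MEGABYTES 1000
TABLE hr.*;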
Loading Data from File to Database Utility
Using this method, an initial-load Extract writes the records to an external ASCII
format file(s). The file(s) are used as data file(s) for input into target tables by a data
bulk load utility native to the database, such as Oracle's SQLLOADER, Microsoft's
BCP/DTS/SQL Server Integration Services (SSIS) or IBM's LOADUTIL.
Any transformation of data will need to be made by the initial-load Extract. The
initial-load Replicat generates the needed run and control files that will be used by
the native tool.
For more information on required parameters, sample configurations and examples,
refer to Document 1457989.1 Goldengate : File To Database Utility - Initial Load
Techniques
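For example (user, host, path and table names are illustrative), an initial-load
Extract producing SQL*Loader-ready ASCII files might contain:

SOURCEISTABLE
USERID ogguser, PASSWORD <password>
RMTHOST targethost, MGRPORT 7809
FORMATASCII, SQLLOADER
RMTFILE ./dirdat/emp.dat, PURGE
TABLE hr.emp;

FORMATASCII, SQLLOADER produces fixed-width ASCII input files for SQL*Loader;
FORMATASCII, BCP does the equivalent for BCP/DTS/SSIS.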
Loading Data with an Oracle GoldenGate Direct Load
With a GoldenGate Direct Load, the initial-load Extract extracts the records and
sends them directly to a Replicat initial-load Task.
Transformations and mappings can be done during extraction or apply, but this
method of initial-load does NOT support tables with LOBs, LONGs, UDTs (user-
defined data types), or any other data-type larger than 4K.
For more details and examples, refer to Document 1457164.1 Goldengate : Direct
Load - Initial Load Techniques
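A minimal sketch of a Direct Load pair follows (group, user, host and schema names
are hypothetical; the Extract is added with SOURCEISTABLE and the Replicat as a
SPECIALRUN task):

Initial-load Extract parameter file:

EXTRACT extil
USERID ogguser, PASSWORD <password>
RMTHOST targethost, MGRPORT 7809
RMTTASK REPLICAT, GROUP repil
TABLE hr.*;

Initial-load Replicat parameter file:

REPLICAT repil
USERID ogguser, PASSWORD <password>
ASSUMETARGETDEFS
MAP hr.*, TARGET hr.*;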
Loading Data with a Direct Bulk Load to SQL*Loader
This method only works with Oracle's SQLLOADER utility (SQLLDR).
An initial-load Extract extracts the source records and sends them directly to an
initial-load Replicat task, which is dynamically started by the Manager. The
initial-load Replicat task interfaces with the SQL*Loader API to load the data as a
direct bulk load.
Data mapping and transformation can be performed by either the initial-load Extract
or the initial-load Replicat.
This method does not support tables with LOB or LONG data-types; you can use
File to Replicat or File to Database Utility as alternative initial-load methods.
Materialized views with LOBs are not supported either, and neither is data
encryption with this initial-load method.
For more information and examples; refer to Document 1461851.1 Goldengate :
Direct Bulk Load to SQL*Loader - Initial Load Techniques.
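As an illustrative sketch (names are hypothetical), the initial-load Replicat task
for this method adds the BULKLOAD parameter to invoke the SQL*Loader direct-path
interface:

REPLICAT repbl
USERID ogguser, PASSWORD <password>
BULKLOAD
ASSUMETARGETDEFS
MAP hr.*, TARGET hr.*;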
Performance during Initial-Loads
Consider the following suggestions, which can help increase the speed of the
initial-load:
Parallel Processing
Parallel GoldenGate processes can be used for all methods of initial-load except
those performed with a Database Utility.
You can use multiple Replicat processes with multiple Extract processes in parallel
to increase throughput. To differentiate among Replicat processes, you assign each
one a group name.
It is recommended to group tables that are related, e.g. via referential-integrity
constraints, within the same group.
TABLE and MAP parameters can be used to specify a different set of tables for each
Extract/Replicat pair (e.g. ext1/ext2 and the corresponding rep1/rep2).
Another technique is to use the @RANGE function to split the rows into equal
buckets; GoldenGate manages this by using a hash algorithm on the key values.
For instance, to split a table across 3 parallel processes:
EXT1
TABLE <owner>.<table>, FILTER (@RANGE (1, 3));
EXT2
TABLE <owner>.<table>, FILTER (@RANGE (2, 3));
EXT3
TABLE <owner>.<table>, FILTER (@RANGE (3, 3));
Refer to Document 1289341.1 Guidance on the best way to use RANGE function
with EXTRACT.
Another way to implement parallel processing is by using SQLPREDICATE in the
TABLE parameter; this allows you to partition the rows of large tables so that they
are processed among two or more parallel Extract processes.
It can also be used to select data based on other criteria, restricting which rows
are extracted and loaded to the target table, for instance, in cases where not all
the source rows will be loaded. SQLPREDICATE can also be used for ORDER BY
clauses or any other type of selection clause.
For instance, using SQLPREDICATE in the Extract to load only rows for a certain
period:
TABLE <owner>.<table>, SQLPREDICATE "where date_joined > 'APR-2010'";
Columns specified as part of the WHERE clause should be part of a key or index for
best performance. Otherwise, a full table scan will be required, which will reduce
the efficiency of the SELECT statement.
NOTE: SQLPREDICATE should only be used for initial-loads not the ongoing change-
synchronization, and is valid for Oracle, DB2 LUW and z/OS, SQL Server, and
Teradata databases.
To avoid contention, it is recommended that, when processing trail files, each
Replicat process its own trail. In any case, be aware of possible disk contention if
more than three Replicats read the same trail.
Storage and Network Considerations
As a general rule, always try to place the trails in the fastest storage available.
Storage Area Network (SAN) Disk
Although NAS is fine for the trails, SAN storage will in general perform better than
NAS as there is no network overhead.
When the source and target databases share a common disk farm, there is an
additional opportunity to reduce Initial Load time. Rather than write the trail to a
local disk and datapump it to a remote host, if the two nodes share disks then the
source node can write to a local shared disk and the target node can read locally
from that same disk. This reduces transfer time and operational complexity.
Database File System (DBFS)
If the source and target are compatible Oracle databases and are each connected
to the same DBFS file system, then this is the method of choice. It uses minimal
resources, has minimal latency, and offers superior reliability.
DBFS (Database File System) requires Oracle Database 11gR2 or higher.
DBFS creates a standard file system interface on top of files and directories that are
actually stored as SecureFile LOBs in database tables.
DBFS is similar to NFS in the sense that it provides a shared network file system
that appears like a local file system.
DBFS is ideal when the databases are RAC.
The source and target servers must both be connected to the same Oracle DBFS.
Extract and Replicat can then use the same trails; the Extract writes locally and
the Replicat reads locally, increasing performance, optimizing resource utilization
and increasing stability.
For more information on DBFS, refer to the documentation:
Oracle GoldenGate Oracle Installation and Setup Guide, 11g Release 2 Patch Set
(11.2.1.0.1), E29797-01, April 2012, Appendix 4, Preparing DBFS for active-active
propagation with Oracle GoldenGate;
and the Oracle Database documentation:
Oracle Database SecureFiles and Large Objects Developer's Guide, 11g Release 2
(11.2).
TCP Settings/Network
DYNAMICPORTLIST is strongly recommended for best performance.
The Collector process is responsible for finding and binding to an available port, and
having a known list of qualified ports speeds this process. In the absence of
DYNAMICPORTLIST (or if not enough ports are specified with it), Collector tries to
use port 7840 for remote requests. If 7840 is not available, Collector increments by
one until it finds an available port. This can delay the acceptance of the remote
request.
For more information about PORT and DYNAMICPORTLIST, refer to the Oracle
GoldenGate Windows and UNIX Reference Guide, section 'Configuring Manager and
Network Communications'.
The TCPBUFSIZE option controls the size of the TCP socket buffer that Extract will try
to maintain, allowing larger packet sizes to be sent to the target system.
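TCPBUFSIZE (and the related TCPFLUSHBYTES) are set as options of the RMTHOST
parameter in the Extract or data-pump parameter file, for example (the host name
and values are illustrative and should be tuned to the network's bandwidth-delay
product):

RMTHOST targethost, MGRPORT 7809, TCPBUFSIZE 1000000, TCPFLUSHBYTES 1000000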
Refer to the documentation for more details:
Windows and UNIX Troubleshooting and Tuning Guide, section 'Configuring Oracle
GoldenGate to use the network efficiently'.