June 27, 2016

Oracle GoldenGate EventMarker System (EMS) to help facilitate Oracle Real Application Testing (RAT)

In this blog I want to share some neat tricks available to help facilitate Oracle RAT testing on the new target system or new target DR system when using Oracle GoldenGate as the migration replication tool.  We don't need to cover in detail why Oracle GoldenGate is a tool that every IT shop should have in its IT tool box, as I have already beat that topic to death...it is simple, Oracle GoldenGate is a tool to help make many impossible migration projects possible. This tool and its capabilities have helped tens of thousands of migrations possible and successful.

Now, onto the topic of interest. When doing Oracle-to-Oracle migrations, we all know how important it is to validate the new target platform, and validation should be done before cutover, not post cutover.

If using Oracle GoldenGate as the migration replication tool (which you should be!), you have access to some less common/unknown but helpful features of Oracle GoldenGate that can help with the data synchronization requirement for Oracle RAT (Real Application Testing) capture and replay on the new target state environment.  RAT is best run when the data is in sync between source and target at time of capture. We must remember, RAT captures the sources transactions with the data/bind used for replay. TO avoid high replay error percentage (or complete replay failover), the data should be in sync at time of replay (in sync as of the time of the capture start).

With the full tool set from Oracle: Oracle RDBMS, Oracle GoldenGate, Oracle RAT and Oracle Data Guard (not required), you can satisfy the data in sync issue and can validate the new target state platform and tune and tweak as much as needed before cutover (as many times as you like to proof performance and scalability). The end result.....an extremely successful cutover for your customers and application owners (and usually a very short, if any at all, war room presence), and hopefully IT will be looked at a as a hero instead of a villian.

In Oracle GoldenGate is a little known feature called EventMarker System (EMS). In Oracle GoldenGate, this is nothing more than a set of parameters to existing mappings or the mapping configuration that can be used to interact with replication due to events and respond to those events.
Lots and lots of uses for EMS, but lets stick with using it to control our replication to the target to allow us to stop replication at the right time for the replay to occur with data sync at time of capture.

Usually the triggering event for EMS is some type of DML transaction that occurred in the source and that matches a criteria we specify. The criteria is quite flexible with EMS (you code it using parameters and options), but for simplicity, lets say our criteria is a DML transaction against a table in the source that has certain column values in the transaction. We will look for this type of transaction on the table, and evaluate the data in the transaction and respond or not respond when column values match our criteria.

In our case, when the RAT capture on the source starts, we also want to insert a record to the control table that will be extracted and pumped over to the target system. The target system will be applying data to keep the target in sync and once it sees this EMS transaction, it will respond by stopping all apply processes (replicats). This will ensure that our target is at the same point in time data-wise as the source system so our replay will have very few errors due to out of sync data on the target.

We will also use Flashback features of Oracle RDBMS on the primary target system or the target's DR environment which is already running and being synced with Data Guard. We can choose to run the replay on the primary or snapshot standby depending on our needs and lag threshold for the target itself. IF data must be keep in sync regularly due to high volume, we can run the replay on the target system DR standby.


The entire process from a high level looks like this:

1) Build the new target stat environment
    - with a shell DB on the new target platform/version as you like
    - configure the shell DB as you like (physical and storage layout for tablespaces
2) Implement the requirements for Oracle GoldenGate capture on the legacy source
    - database level supplemental logging
    - userid with privs for Oracle GoldenGate
   - correct init.ora parameters needed for replication
3) Install Oracle GoldenGate on the source system, configure and start extraction
    - create subdirs
    - create configuration files
    - schematrandata (or trandata for all schemas/tables in replication scope)
4) Start up capture and validate extraction is:
    -  running and stable
    -  no long running transactions existed before extract started and are still running (they will be
       skipped)
5) Install Oracle GoldenGate on the target system and configure for target apply
     - start the manager process for communications
6) Create the Pump on the source system and start pumping the CDC data to the target system
7) Start a consistent export on the source system (using flashback_scn)
    - for all schemas/tables in replication scope (this step may be in phases depending on issues)
    - also, a full metadata export may be needed
8) Import the consistent exported data into the target system
9) Create replicats to start applying the data and starting after the SCN (CSN in GoldenGate)
     - special command to start the replicats from a particular CSN
10) Apply any RAT patches needed to the source or target database platform/versions
11) Create the table needed for EventMarker system (a simple control table in the source / target)
      - and add table-level supplemental logging (not required since it is not a critical table,
        but a good practice)
12) Implement the Oracle GoldenGate EventMarker system mappings
      -  in the source extract, pump(s), replicat(s)
13) Restart all the Oracle GoldenGate processes to reload the mappings/configuration now
      including EMS parameter/mappings
14) Let replication caught up on the target (if you desire to validate replication is working)
      -  not required at all, but a good idea at first to ensure all replication is working)
15) Test EMS a time or two just to be sure the mappings work as configured
     - mappings can stop extract, pump or replicats or all
     - suggest only stopping replicats, or pumps but leave extracts running at all times
15) Start RAT capture and insert the control table DML on the source
      -  a simple insert with 4-5 column values as part of it
          - the insert uses key words for certain columns to instuct replication to stop
          - the column values will match criteria we setup in the mappings for evaluation and matching
      - can do the insert before the RAT command or after
16) While RAT capture is running, verify the replicats have all seen the DML and acted upon it by           stopping

17) Once RAT capture is complete, do the remaining steps needed to process the capture
18)  Now we are ready for replay or should be:
       - replicats should all be stopped at the same point in time due to our EMS and control DML
19) Decide where to run the replay, on the primary or standby system
      - for primary, stop all OGG processes, create a GRP on the target and replay
      - for standby, stop DG shipping and apply, stop DG, create GRP and open as snapshot
        - perorm replay on standby

Note: Flashback DB is our friend here. We can flashback and replay the capture as many times as needed to tune/tweak the target system to our liking

20) Flashback DB to same state as before replay (where OGG was at in apply state)
21) Restart Oracle GoldenGate processes on the target to catch up
22) Repeat as necessary to make your migration a complete success.


Sample EMS mappings and control table:

....still updating....have to run off to do a project (coming soon, 7-10-2016)

Good luck and let me know if you have any questions or feedback.

Gregory Thomas



No comments:

Post a Comment