April 8, 2016

Migrating SAP Using Oracle GoldenGate


SAP: Making The Impossible Project...Possible!

Recently I got involved in a large and very critical project to lay out and implement both a platform migration and Oracle RDBMS upgrades for a large SAP implementation.  The project goal was to use Oracle GoldenGate (CDC/logical replication) to migrate all the SAP databases to a new hardware platform (complete hardware stack refresh), upgrade the RDBMS versions, and implement newer RDBMS features, all in the same project.

Sounds complex, and trust me, it is, but this is exactly what Oracle GoldenGate can help you achieve. When coupled with a few other essential Oracle products/features (Flashback, DataGuard and RAT), the project risks can be managed and mitigated by thorough testing and validation before any cutover.  In fact, everything can be tested and validated until all the application owners/teams are absolutely comfortable and ready for the cutover, more so than the DBAs (who were already anxious, with some hesitation).

As with most critical enterprise applications, the SAP eco system was absolutely business critical across the enterprise and therefore the project had the following requirements to mitigate project risks and reduce downtime for the cutover event(s):

  1)  a significantly reduced downtime window/duration for the cutover to the new environment
       - less than 24 hours for the cutovers (all in one 24 hour period, or multiple cut over events)

  2) ability to migrate the databases independently to the new environment (split migrations/cutovers)
       - split the database migrations over multiple periods, each several weeks apart
       - validate application functionality with a split environment (some still legacy and some new)

  3) "fall back" or "fail back" replication from the new system back to the legacy environment
       -  any database could fail back independently if needed

  4) performance validation of the complete stack for each system before cutover
      - avoid WAR room as much as possible: bad sql, communication issues, general DB performance

  5) backup/recovery and DR setup and validated before production cutover
      - all aspects of backup/recovery and DR tested and validated before final cutover

Fallback ("fail back") replication is intended mainly to:
  1) reduce migration risks of any data loss in case of a fail back to the legacy environment
       - all transactions in the new environment are captured and replicated back to the legacy system
  2) reduce the fail back window in case of fall back to the legacy system production environment
       - fail back replication for up to 30 days

One last fall back requirement was that any of the databases of the SAP ecosystem could fall back
on its own, independently of the others (cross environment and platform compatibility had to
work and be tested).


The migration of the entire SAP ecosystem consisted of migrating to a new hardware platform/OS (Endian conversion needed) and a new/upgraded Oracle RDBMS version, with the goal
of migrating to a newly sized/designed and supported hardware/OS platform and RDBMS
version, while also utilizing many of the new features available in the new RDBMS/RAC
version for database performance, management, HA and DR.

I am happy to report (very happy and relieved that we did all the extra work/diligence to make it
a huge success) that we have just completed the production migration to the new environment
and the entire project was a complete and overwhelming success.  The post cut over WAR room
on Monday lasted about 5-6 hours to address less than a half dozen issues that popped up, all of
which required tweaks to the SAP application configuration as a result of moving to Oracle
RAC from a big iron and single instance configuration in legacy.  A few other errors popped
up that already existed in the legacy environment and the app team decided to fix at that time.

The cut over itself took 6 hours of technical time (detailed below), which was more than we
wanted for the technical pieces, but the BASIS / other app teams needed lots of time to reload
the configuration and do their validations for the new app servers once the databases were
officially cut over to the new Oracle RAC environment.  The database cut over for the Oracle GoldenGate replicated databases (6 of them totaling 70TB of data) was 3.0 hours for the
databases. For the database cut over, 95% of the cut over time was spent running business validation/compare scripts and table row count scripts against old and new databases. The
remaining time in the cut over was for the application reconfiguration and validation steps.
The application reconfiguration and technical validation was another 4 hours, after which both functional/interface validations and business validations were performed for a total of 13 hours
of down time. During the functional and business validations the SAP applications were live
and running and any interfaces could have connected.

The real value of using Oracle GoldenGate for the database migration is that all work can be completed and validated ahead of time, with no downtime of the existing production systems;
even DR was set up and validated before final cut over.  We used DR for RAT testing before cut
over, which is highly recommended.

Major work stream durations for the production cut over:

3 hours for database cut over and data validation for all databases
   - databases were cutover in 10-15 minutes (thanks to Oracle GoldenGate)
   - remaining time was for static row counts and business queries used
     to validate data for a static period
4 hours of application re-configuration and validation
3 hours of functional/interface validation for critical processes
2 hours of business validation of critical processes
-----------------------------------------------------------------------------
12 hours (SAP applications open and live in the new eco system)

From a high level, the migration process consisted of the following (of course done in lower environments first to work out the processes, tasks and ordering of tasks, and then a few dry
runs before doing the production migration to fine tune the steps and timings):

1) Sizing, design and acquisition of the new hardware (servers, storage and networking)

2) Building of the new database servers (RAC cluster) and app server cloud environments
     - all Oracle RAC systems and new app servers on VMWare

3) Building the new databases (shell DBs) on the new environment
    - patching to the latest SAP patches (which include Oracle RDBMS patches)
    - install Oracle GoldenGate on source and ensure it is running

4) Installation and configuration of Oracle GoldenGate
     - on both source and target systems for CDC (for both DDL and DML)
     - GoldenGate configured with EMS mappings as well as application mappings

5) Online data migration, in phases and with no downtime of legacy databases
     (platform conversion needed - moving from Solaris to Linux)
      - Oracle DataPump of selected sets of tables with same flashback SCN for consistency
        - legacy DR could not be used (across the WAN and was 8-10 times slower - very old)
        - no storage cloning capabilities and no spare servers to restore production to
          in the same data center location
        - data migration was completed with no downtime for the existing production systems
          and was done in pieces (chunks) as to not impact existing production

6) Replicating the source transactions (CDC data) from legacy to the newly staged databases
     - using Oracle GoldenGate (real time CDC) to keep them synchronized with legacy production
       for a significantly reduced migration window
      - using GoldenGate and its EMS capabilities enhanced our ability to use
        Real Application Testing (RAT), highly recommended

7) Validating both data and metadata for all databases as a result of the platform migration
     and the use of logical replication for each database
     - validation of all database objects in the new database
        -  users, profiles, views, synonyms, dblinks, directories, tables, indexes, sequences, etc
          - all applicable to the SAPSR3 schema and data model
     - row counts (static row counts for all application tables both before cut over and at cut over)
     - data validation using Oracle Veridata leading up to cut over day (bit for bit comparison)
     - scripts to validate business data in each SAP module database
       (provided by business analysts and application teams)

8) Re-configuration of the applications for the new app servers and new database servers

9) Technical validation of the new environment for all applications

10) Performance validation of the new environment (RAT captures on legacy production)
     - legacy production captures (multiple time periods and durations of RAT captures)
       - used GoldenGate's EMS capabilities to insert markers to stop replication to the target
         at specific time periods corresponding to the different RAT capture periods
     - database performance testing and validation using Oracle Real Application Testing (RAT)
        replays
        - Oracle GoldenGate, Data Guard (snapshot standby) and Oracle Real Application Testing
           are invaluable for these tasks (make it much easier to do and can be done in parallel)
     - load stress testing and validation using Oracle Real Application Testing on both the new
       production and DR systems (invaluable again)
    - validate load balancing
    - SQL tuning, patching and tweaks for performance at OS, network, RDBMS, etc

11) Functional testing of applications, interfaces, external loads, etc
       - validate major application components are functioning and all integration end points
          are functioning across the new Linux platform for all databases

12) HA/DR testing and validation of the new environment (end to end validation)
      - DataGuard setup for each database, and both role switches and forced fail overs tested
         - backup/restores tested using fail overs and restoring lost primary databases
      - application app servers failed over and tested against DR environment
      - cluster/database HA tests of the new environment, many different HA tests completed
        for the entire stack

13) Fall back or "fail back" logical replication implemented and tested
      (from new production back to legacy production)
      - tested only in Pre-prod (this is the one piece that could only be tested with pre-prod legacy)

14) Fall back of independent components (database and app servers in the SAP ecosystem)
       - validate major application components are functioning with cross platform components
       - different pieces of the eco system failed back to legacy with other remaining future state

15) Business validation
      - ensuring application works and critical processes all work

16) Return databases back to replicated state (Oracle Flashback Database) in staged production
      to resume Oracle GoldenGate catch up CDC replication
      - sync CDC from legacy production to staged production to keep synchronized
      - during testing periods only (at cut over this was the go live starting point)
         - OGG has kept all the databases in sync with legacy production, waiting for final cut over

17) Everything validated and verified before cut over, wait for cut over time!
      - repeat this process as many times as possible to ensure a near flawless migration
        (with a large SAP eco system with many integration/communication points, some issues will
         come up... remember, not for the faint of heart... stay cool and more importantly,
         have the right team in place)
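
The static row count checks from step 7 lend themselves to a small script. The following is a minimal Python sketch, not the scripts we actually used: it assumes the counts for each table have already been collected on both sides (for example with SELECT COUNT(*) during the static period) and loaded into two dictionaries. The function name, table names and counts are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, Optional[int], Optional[int]]


def compare_row_counts(legacy: Dict[str, int], target: Dict[str, int]) -> List[Mismatch]:
    """Return (table, legacy_count, target_count) for every table whose
    counts differ, or that exists on only one side (count shown as None)."""
    mismatches: List[Mismatch] = []
    # Walk the union of table names so a table missing on either side is reported.
    for table in sorted(set(legacy) | set(target)):
        legacy_count = legacy.get(table)
        target_count = target.get(table)
        if legacy_count != target_count:
            mismatches.append((table, legacy_count, target_count))
    return mismatches


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    legacy_counts = {"SAPSR3.MARA": 1200, "SAPSR3.VBAK": 56000}
    target_counts = {"SAPSR3.MARA": 1200, "SAPSR3.VBAK": 55990}
    for table, old, new in compare_row_counts(legacy_counts, target_counts):
        print(f"MISMATCH {table}: legacy={old} target={new}")
```

An empty result is the go signal for this check; anything else gets investigated before the cut over continues.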

That covers the high level process we went through for the SAP migration to the new environment
and new Oracle database environment. Definitely, testing and dry runs in lower environments are invaluable for your success. All tasks get documented and timings established before you talk to
the business about an appropriate outage and the overall risks.

This was an awesome project. I must warn, replicating SAP with any type of logical replication is probably not for the faint of heart. You will need to tweak the configuration to get acceptable apply
performance, and issues will come up that must be dealt with (especially with the SAP BW
system).

Overall, using Oracle RDBMS/RAC, Oracle GoldenGate, Oracle DataGuard and Oracle Real Application Testing (RAT) made this project a huge success with a significantly reduced total
effort and resources required (RAT is a life saver, to be blunt). These products made an
"impossible project possible" within the time frame/budget left in the overall transformation project, with some 30+ GoldenGate migrations completed successfully.

By using Oracle GoldenGate in the project, we were able to consolidate 3-4 different projects into one: moving to a new hardware platform, a new Oracle RDBMS version (upgraded to a supported version), a new Grid version, Oracle RAC/ASM, and enhancements to the physical and logical database design (new storage, partitioning, BIGFile table spaces, Secure Files, some different indexing strategies for specific tables due to RAC, etc).

This new consolidated project was now more complex and a bit longer than any of the 3-4 individual projects by themselves and a bit riskier due to the added scope and complexity of all the moving parts and using logical replication for the data migration. However, do not let this discourage you, as these risks (all of them) can be managed and mitigated. In addition, with the use of the Oracle products mentioned, you can help reduce the risks and effort involved.  Also, by consolidating the projects into one, we reduced the amount of testing time significantly as compared to 3 or 4 projects; basically we were able to test once for all pieces.

For now, thank you Oracle, for the Oracle GoldenGate product (this is an awesome product and a market leading product), and thank you SAP for allowing the use of Oracle GoldenGate to migrate SAP to a new environment (of course with a certified SAP migration specialist on hand, as well as an Oracle GoldenGate expert).

Using Oracle GoldenGate to migrate SAP to a new environment can literally make the "impossible project possible".  Many of the risks mitigated would otherwise still be present and would be nearly impossible to mitigate and would require a large and lengthy WAR room at post cut over. And, GoldenGate allowed for a significantly reduced outage window for the database migration pieces, typically from 16-48 hours to 1-2 hours, which is significant to the business and critical processes supported by the SAP implementation and the potential cost of downtime to any or all parts of the SAP eco system.

Replicating the SAP data model using logical replication is both easy (no foreign keys or triggers and only one sequence, and most tables have a PK or UI (unique index) on them) and a bit complicated at the same time, because SAP uses a single schema for all application data (SAPSR3).  In a nutshell, all the data is in one schema for each database. When using Oracle GoldenGate, having multiple schemas makes instantiating and establishing apply at different time periods a bit easier because you can do it in pieces (by schema in most cases) rather than customizing the mappings for each table or piece with mappings to evaluate CSN (it can be done, but we needed a simple config).

Instantiating an SAP database from production, without any downtime and with little impact on it, makes for some challenges when all the data is in one schema in each database.  In this case, we used a static table listing to pull over all the tables in pieces (groups by SCN), set up apply for each piece, then added an apply that used a wildcard to catch any new tables and other things such as the OTHER DDL category.

If using 11.2.0.4+, definitely use integrated apply if at all possible, and patch SAP to the highest level you can, because the RDBMS patches are included and integrated apply requires RDBMS patches in addition to Oracle GoldenGate 12c. However, please be aware that if doing fall back replication to an older Oracle version, integrated apply likely will not work for you and you will need a different apply configuration for the fall back replication (if using it).  In our case, integrated apply worked really well, but we needed classic replicats (coordinated) for fallback replication. Therefore, instead of having two apply configurations and having to tune each strategy, we stuck with the classic replicat approach to allow us to easily and quickly reverse the replication with no additional configuration needed for the apply on the older 11.2.0.2 databases.
We used integrated capture where we could.
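
To make the "static table listing in pieces" idea more concrete, here is a rough Python sketch of how such a plan could be generated. It is an illustration under assumptions, not our actual framework: it splits a table list into fixed-size groups, writes a Data Pump export parameter file body for each group using one FLASHBACK_SCN per group, and builds the matching GGSCI command to start that group's replicat after the same SCN. The directory object MIG_DIR, the replicat names RGRP01, RGRP02 and so on, the group size, and the SCN values are all hypothetical; only the Data Pump parameters (DIRECTORY, DUMPFILE, LOGFILE, FLASHBACK_SCN, TABLES) and the START REPLICAT ... AFTERCSN syntax are standard.

```python
from typing import List


def chunk_tables(tables: List[str], chunk_size: int) -> List[List[str]]:
    """Split the static table listing into groups of at most chunk_size tables."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [tables[i:i + chunk_size] for i in range(0, len(tables), chunk_size)]


def expdp_parfile(schema: str, tables: List[str], scn: int, group_no: int) -> str:
    """Build the text of a Data Pump export parameter file for one group.
    Every table in the group is exported as of the same SCN for consistency."""
    lines = [
        "DIRECTORY=MIG_DIR",                    # hypothetical directory object
        f"DUMPFILE=grp{group_no:02d}_%U.dmp",   # %U numbers the dump file pieces
        f"LOGFILE=grp{group_no:02d}_exp.log",
        f"FLASHBACK_SCN={scn}",
        "TABLES=" + ",".join(f"{schema}.{t}" for t in tables),
    ]
    return "\n".join(lines)


def replicat_start_command(group_no: int, scn: int) -> str:
    """GGSCI command that starts the group's replicat after the export SCN,
    so only changes newer than the exported data are applied."""
    return f"START REPLICAT RGRP{group_no:02d}, AFTERCSN {scn}"


if __name__ == "__main__":
    # Hypothetical table listing and SCNs, one SCN captured per group at export time.
    groups = chunk_tables(["MARA", "VBAK", "VBAP", "BSEG"], chunk_size=2)
    scns = [7000100, 7000950]
    for number, (group, scn) in enumerate(zip(groups, scns), start=1):
        print(expdp_parfile("SAPSR3", group, scn, number))
        print(replicat_start_command(number, scn))
        print()
```

In practice the groups would more likely be balanced by segment size than by table count, and the wildcard apply for new tables would be configured separately.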

I will continue to update this post to also include some additional details regarding how others do it (Oracle ACS and their O2O, or triple "O", service) and how we deviated from it slightly in the initial load portion. We did not use GoldenGate initial load for the initial data migration and instantiation and used other Oracle approaches for the initial data migration (scripted framework for using DataPump).  The initial load capabilities in Oracle GoldenGate are definitely worth using and work really well and performance is acceptable in most cases.

More to come on this in late May or June.....trying to relax a bit now and take advantage of some free time.

Further reading...white paper from Oracle on SAP Migration:

http://www.oracle.com/technetwork/articles/systems-hardware-architecture/sap-migration-163923.pdf


Take care, thank you for reading my two cents worth and good luck in your migration projects.

To end this blog, I want to throw out a leadership quote by Red Adair: "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur."  Read more at: http://www.brainyquote.com/quotes/quotes/r/redadair195665.html