Data Movement Options for Commerce Databases – Creation of staging database during build

Posted by

So one of the most common questions I get is about moving data between homogenous Commerce databases. Our standard setup includes 4 environments – Dev, QA, Stage, and Prod. Dev/QA are a stagingprop pair, Stage/Prod are a stagingprop pair, and Prod usually has HADR. So with 4 environments, they can get out of sync. For the most part, business users are making changes on staging, and stagingprop is used to move those changes to prod. Therefore the catalog between staging and prod is kept in sync, and we know right away if stagingprop fails and it gets out of sync. Some data we never expect in an environment other than prod – and in fact do not want it to end up in the lower environments where many developers have unrestricted access, and security is not as tight. That data includes orders, users, payment information, etc.

I hope to present a series of postings to address a number of scenarios and approaches for this kind of data movement. I’m not talking in this series about data movement as in loading catalog or user data into the database for the first time, or keeping things in sync with a separate system of record.

So the most common set of goals are:

  1. Keep private data in production only
  2. Synchronize catalog and other data across environments
  3. Staging is the master for catalog and data that is needed in all environments

Data movement Scenarios:

Create a staging environment to match production

This is most commonly encountered during build, but can be needed after build as well to refresh the data in a staging environment to match production. Commerce intends that we create our production environment, develop there, and then move the database data back to staging using the stagincopy utility. Essentially the stagingcopy utility does these tasks:

  1. Exports data from the production database
  2. Deletes any data from the staging database
  3. Imports data into the staging database
  4. Creates the stagingprop triggers (and sequences) in the staging database

When running stagingcopy you can either have stagingcopy do all the work, or you can have it write the script and run the script yourself.

One of the biggest problems with stagingcopy is that it does all of the work in the staging database in ONE UNIT OF WORK! This means that you must have huge transaction logs to support it. We usually run with a LOGFILSIZ of 10000 and 12 primary and 50 secondary logs. For a newer database with a small or medium size, you can get by with bumping LOGSECOND up to 240 in this scenario (an online change). If you’ve got the default log file size of 1000, there is not an online change you can make to make that work. There is a limit of 255 for the combination of LOGPRIMARY and LOGSECOND. What you can do is change LOGFILSIZ and then recycle the database for the change to take effect, making sure that your filesystem has space for the log files. If you don’t have log files on a separate filesystem, you’ll also have to make sure that your filesystem can hold the files data is exported to, the data being loaded into, and the log files. Depending on the size of your catalog, you may not be able to have enough log space to support stagingcopy.

Stagingcopy also tends to end up with cryptic error messages, and I usually expect at least two iterations to get it to work – with an experienced commerce DB2 DBA sitting there waiting for it to succeed or fail. My knee-jerk estimate (when I don’t know database/catalog size) is that a stagingcopy will take 8 hours to complete, during which the target (staging) environment will be unavailable. In most situations, that allows for it to start over at least once. I’ll talk in a future post about some non-supported ways that stagingcopy can be useful.

Stagingcopy must be run from an application server (if you have application servers that are separate from your database servers). You also must use the specify the same id to stagingcopy that you use for the application to connect to your database – stagingcopy cannot handle a different schema name than the user you specify on execution. I’m not going to go through the syntax in detail – it is in the info center:

http://www.ibm.com/support/knowledgecenter/SSZLC2_7.0.0/com.ibm.commerce.admin.doc/refs/rsscopy.htm

The one thing I will say is that in my experience the batchsize parameter does not work.

Stagingcopy is the supported way of doing this. But it’s not the only way we do it. IF the Commerce versions including FixPacks, Feature Packs, and Feature enablement are exactly identical(query the site table), and the Merchant Keys are also identical, then you can do a straight restore at the DB2 level from the production database to the staging database. I only recommend doing this pre-go-live or for environments where the database you’re backing up does not have any real customer or order data. If everything falls into line for this, it is immensely easier, taking usually less than 30 minutes, depending on your hardware and your catalog size. With Staging and Production on different servers, we keep the databases such that we can do a straight restore without a redirect (storage architecture is a separate topic), and do this kind of restore quickly.

Using this restore method, you also have to ensure that your staging triggers are in place after you have completed the restore. Conversely, you’ll need to make sure that the staging triggers DO NOT exist on your production database. You can query syscat.triggers like this to see if the staging triggers are there:

$db2 "select count(*) from syscat.triggers where trigname like 'STAG%' with ur"
1
-----------
        951
  1 record(s) selected.

The number of triggers will vary by version and with enablement of features like CMC, but you should certainly have over 500. If you don’t have them, you can find the syntax for them on the app servers, here(on Linux, anyway):

/opt/IBM/WebSphere/CommerceServer70/schema/db2/wcs.stage.trigger.sql

and for Commerce 7 you’ll also need to have the sequences in place:

/opt/IBM/WebSphere/CommerceServer70/schema/db2/wcs.stage.sequence.sql

These steps are not needed if you use stagingcopy because stagingcopy takes care of them for you.

Note that if you have different Feature Packs, FixPacks, or features enabled, using the restore method will succeed, and the app will come up, BUT you will have inconsistencies that will likely require you to rebuild your whole commerce instance from scratch at a later time.

Next post I’ll discuss keeping a dev or qa database in sync with your stage/prod pair.

Lead Db2 Database Engineer and Service Delivery Manager , XTIVIA
Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 17 years developing a top-level expertise on Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set to the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares both posts about her core skill set and her journey into Data Science. Ember lives in Denver and work from home for XTIVIA, leading a team of Db2 DBAs.

8 comments

  1. Hi Ember,
    In a scenario post go-live we have 4 tiers Dev (Stage/Prod), QA (Stage/Prod), Stage and Prod. Now assuming Stage and Prod are both in-sync.
    Now if we want to sync-up the lower environments top to bottom.

    Would you recommend.

    1. Perform backup and restores from Stage to QA (Stage) to Dev (Stage). Similarly doing a backup and restore from Production to QA (Prod) and Dev (Prod). My concern is doing so we will be migrating all the transactional information from the production environment to a lower environment.

    2. Alternatively I am thinking once the QA (Staging) is refreshed from Staging we can either do a stagingcopy from QA (Staging) to QA (Prod) Or we can do a backup and restore of QA (Staging) to QA (Prod). In either case we will have to recreate the triggers again in QA (Prod).

    What is the best way to tackle this per you recommendation?

    1. Because there is often PII in production, usually the database to copy from is stage and not prod. In most situations, prod data should never be copied to lower environments. Backup and restore does not work with all features in the most recent releases of WCS, so as long at the database is not overly large, stagingcopy may be a good option. Whenever possible, it is best to do this in the IBM-approved direction, from prod to stage. If you do it in the other direction, you have to have the staging sequence created and you will end up with staging triggers and other objects that don’t belong in a production environment. If you’re not using WCS’s content versioning, you may be able to get away with backup/restore, but you’ll want to test it out thoroughly with a robust backout plan in case it doesn’t work.

  2. I came across this discussion, while looking for strategies for creating data in my staging databases that best simulates production data and yet avoids misuse of PII data in customer databases.

    I have a single product deployment in the cloud that is multi tenanted with 10s of customers and an exclusive db per customer. For every subsequent production code update (defect fixes, enhancements, new features), we need to sure that the changes can work with data across our customer databases.

    My plan is to find a way to get data from production to staging that is representative of data across all production dbs. This might involve sampling of data from multiple customer dbs, anonymizing the data to avoid PII issues, etc….

    Ember, what strategies have you successfully deployed in similar situations? What tools have you used to implement the strategies?

    1. The RI in Commerce databases makes a piecemeal approach really difficult. Often when it comes to that kind of situation, using a Commerce Utility (such as massload or similar) is the best way to move chunks of data around. Those sorts of utilities will often take data and put it with an existing store number and relate it properly to other data. It tends to be a slower approach, however, so it may make sense to develop a process for taking one production database as a starting point and sanitizing it, then adding in anything specific you need from other environments via commerce utilities such as massload.

    1. I have not. I would think it would copy more tables than you would need. The staging data is only a small subset of tables.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.