The Problem
I often lean on analogies when I speak to non-technical people about technical topics. For example, I would explain database locking as “two kids fighting for the same candy bar”. If you introduce isolation levels into the conversation, I would expand the analogy to include “one sibling holding on to a candy bar they care nothing about just because they don’t want the other sibling getting to it”.
Recently I ran into that second isolation level analogy. As our site was ramping up, I started to notice a pattern of deadlocks and timeouts (SQL0911N, RC2 and RC68) around our stagingprop utility process. I immediately suspected the usual suspects such as a rogue developer, long running batch process, or oddball query. But that wasn’t the case.
My locking issues seemed to be haphazard, throughout the evening, and only caused total failure around the stagingprop utility process. I did discover one interesting clue: the deadlock event monitor did capture a few more SQL around the faceting search. This process constantly runs and controls what products are displayed to the end user.
I found the root cause and a solution after a ton of research and a PMR with IBM. The application’s “Default Isolation Level” was not set to a recommended value. Why is this default not set if there is a best practice? I’m not exactly sure.
According to IBM, WebSphere Commerce and search application connections should come in as cursor stability (CS). This isolation level causes the least contention and best performance, while still ensuring the data integrity that WCS requires. The default setting is blank, which causes the application server to use a default isolation level based off the installed JDBC driver. In most cases this default setting choses read stability (RS).
With RS, the application server will try to lock all scanned rows in share mode for a unit of work. This can prevent multiple rows from being changed, even if the application is not actively using them. With CS, the application will lock only the single row the cursor is currently positioned.
To expand our analogy, the application wanted to grab the whole bag of Hershey chocolate bars instead of the single Hershey bar with almonds. Protect everything from my siblings, not just what I want.
As I researched, I discovered that the faceting search would be one of the main processes suffering from contention if the default blank setting was in play. To confirm there was SQL coming in as read stability (RS), support asked me to run two SQL:
SELECT pkg.effective_isolation, pkg.num_executions, pkg.stmt_text FROM TABLE(mon_get_pkg_cache_stmt(NULL, NULL, NULL, -1) ) pkg WHERE pkg.effective_isolation = 'RS' AND pkg.stmt_text NOT LIKE '%wcs_db2collect%' AND pkg.stmt_text NOT LIKE '%WITH RS%' ORDER BY pkg.num_executions DESC SELECT pkg.effective_isolation, pkg.stmt_text FROM TABLE(mon_get_pkg_cache_stmt(NULL, NULL, NULL, -1) ) pkg WHERE pkg.stmt_text LIKE '%SELECT_FACET_VALUE_IMAGE_AND_SEQUENCE%'
The Solution
Addressing the problem was simple. The following steps were taken directly from the IBM Support Technote: Transaction Isolation Levels and WebSphere Application Server. In many cases, these steps will be performed by application specialists, and not by the DBA.
To define this custom property for a data source, you should do the following:
- Click Resources > JDBC provider > JDBC_provider.
- Click Data sources in the Additional Properties section.
- Click the name of the data source.
- Click Custom properties.
- Create the webSphereDefaultIsolationLevel custom property
- Click New.
- Enter webSphereDefaultIsolationLevel for the name field.
- Enter one of the “possible values” in the value field from the table above. i.e. 0, 1, 2, 4, or 8. (In this specific case, enter “2”)
Once complete, the application server must be recycled. If you have one server, this would mean a small outage. In a multiple application server environment, you can do a rolling restart, where servers are recycled one at a time while the other application servers keep serving the workload.
After the change is made on the application server side, run the SQL above to confirm no queries are coming in as read stability (RS). In our case, the effect was immediately visible in the search for “%WITH RS%” SQL that IBM had shared. Better yet, during our next stagingprop utility process not one deadlock reared it’s ugly head.
Michael Krafick is an IBM Champion and occasional contributor to db2commerce.com. He has been a DB2 LUW DBA for over 15 years in data warehousing, transactional OLTP environments, and e-commerce databases. He was recently inducted into the IDUG “Speaker Hall of Fame” and given the IBM “DB2 Community Leader Award” in 2015. Michael also has extensive experience in setting up monitoring configurations for DB2 Databases as well as preparing for high availability failover, backup, and recovery. He can be reached at “Michael.Krafick (at) icloud (dot) com”. Linked-in Profile: http://www.linkedin.com/in/michaelkrafick. Twitter: @mkrafick Reddit: https://www.reddit.com/r/DB2/
Hi Michael,
We also had the same problem with our application environment. But after making these changes still deadlocks were occurring .
After struggle it was found out that DB package cache should also be cleared in order for these changes to reflect as DB package cache will always be referred by the application in order to have the shortest access path. After bouncing DB , it started working.
Thanks