I’m a big fan of locking event monitors, and try to create them in every database I work on. I constantly refer to my entry on Analyzing Deadlocks – The New Way for this, even when I’m looking at lock timeouts. Most of this post assumes you have done, or at least can do, the steps in that post.
Locking Event Monitors
A lot of times, the basic information collected with these database configuration settings will give me what I need:
 Lock timeout events                   (MON_LOCKTIMEOUT) = WITHOUT_HIST
 Deadlock events                          (MON_DEADLOCK) = WITHOUT_HIST
This only captures the last statement run by each transaction involved in the deadlock or lock timeout.
Depending on the workload against the database, the above settings can lead to a situation where it makes no sense that the two (or three or four) statements reported should be conflicting with each other. What’s really happening in that case is that the last statement reported has several other statements before it in the same unit of work. The actual conflict is between one or more statements in each unit of work, not necessarily the last-executed statement that is reported with the settings above.
With a fairly simple database environment and/or really talented developers, the above may be enough. But having both of those factors is the exception rather than the rule.
For this kind of complex situation, set those database configuration parameters like this:
 Lock timeout events                   (MON_LOCKTIMEOUT) = HISTORY
 Deadlock events                          (MON_DEADLOCK) = HISTORY
These parameters do not require an outage or a database activation to take effect. However, you must stop and then re-start the locking event monitor before the change is picked up.
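As a sketch of the commands involved (assuming a database named SAMPLE, as in the db2advis example later, and a hypothetical locking event monitor named LOCKEVMON; substitute your own names):

db2 connect to sample
db2 update db cfg for sample using MON_LOCKTIMEOUT HISTORY MON_DEADLOCK HISTORY
db2 "set event monitor lockevmon state 0"
db2 "set event monitor lockevmon state 1"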
The output from the locking event monitor will now have a lot more data. Always place this data in tables for analysis rather than simply formatting it for humans to read. The last statement executed by each participant will have an ACTIVITY_TYPE of ‘current’ in the lock_participant_activities table, and there will likely be dozens or hundreds of statements with an ACTIVITY_TYPE of ‘past’. These are the statements executed earlier in the same unit of work, by the same application, as the ‘current’ statements involved in the deadlock or lock timeout. XMLID is the column that ties a particular locking event together across the various locking event monitor tables.
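As an illustration, a query along these lines (written against the table and column names in my environment) lists the past and current statements for each recent locking event by joining on XMLID:

select le.event_timestamp
  , lpa.activity_type
  , substr(lpa.stmt_text,1,80) as stmt
from ecrooks.lock_event le
  , ecrooks.lock_participant_activities lpa
where le.xmlid = lpa.xmlid
  and le.event_timestamp > current timestamp - 1 day
order by le.event_timestamp, lpa.activity_type
with ur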
I was working with some of this data, and came up with around 11,000 statements across 305 lock timeouts (and a deadlock or two) over the course of 7 days. Obviously, these are problems, and I need to find the slow and problem queries so I can address them and make lock timeouts less likely. The problem is knowing where to start with this many statements. One statement I found also showed up in the results of my query for finding problem SQL in the package cache, so I am certainly focusing on it, though fixing that one involves a variety of complicated changes, including better caching at the application level, reducing the use of parameter markers, and introducing a statistical view.
In this post, I’m going to focus on finding the statements I can easily help with indexing. My read efficiency for this busy e-commerce database is 41, and peak periods are approaching. I would like to get it lower if possible, and reduce the likelihood of lock timeouts.
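For reference, read efficiency here is the ratio of rows read to rows returned. One rough way to calculate it for the database as a whole is a query like this against MON_GET_WORKLOAD (just a sketch; there are other ways to get at the same number):

select decimal(float(sum(rows_read)) / nullif(sum(rows_returned),0), 10, 2) as read_efficiency
from table(mon_get_workload(null, -2)) as t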
Explain Everything
My first goal is to get each of these statements into a table, along with the timeron values for their execution and the amount by which I could easily help each query by adding indexes. This doesn’t mean I’ll add all of the indexes, but just having a list of queries I can focus on is going to be hugely beneficial to me. Luckily, DB2 makes this surprisingly easy to do.
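One prerequisite worth noting: the explain and advise tables (including ADVISE_WORKLOAD) have to exist in the schema I’m working in. If they don’t, they can be created by running the EXPLAIN.DDL script that ships with DB2 while connected as the user whose schema should own them (the path below assumes a default instance home on Linux/UNIX):

db2 connect to sample
db2 -tf ~/sqllib/misc/EXPLAIN.DDL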
Create a Workload
The first thing to do is to create a workload by inserting data into the ADVISE_WORKLOAD table. I’m using explain tables in my own schema (ECROOKS) for this. I used this SQL to do that population, directly from the tables created from the locking event monitor data:
insert into ecrooks.advise_workload (workload_name, statement_no, statement_text, frequency)
  (select '20161011_locking2'
     , row_number() over()
     , rtrim(STMT_TEXT)
     , (select count(*)
          from ecrooks.lock_participant_activities lpa2
          where lpa.STMT_PKGCACHE_ID=lpa2.STMT_PKGCACHE_ID
            and xmlid in (select xmlid
                            from ecrooks.lock_event
                            where event_timestamp > current timestamp - 7 days))
   from ecrooks.lock_participant_activities lpa
   where xmlid in (select xmlid
                     from ecrooks.lock_event
                     where event_timestamp > current timestamp - 7 days)
  )
In the above, the name of my workload is ‘20161011_locking2’. I am also only looking at the output from the last 7 days. These can easily be changed in the SQL above to meet different needs.
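A quick sanity check of what landed in the table before running the advisor might look like this (purely illustrative):

select workload_name
  , count(*) as num_stmts
  , sum(frequency) as total_executions
from ecrooks.advise_workload
where workload_name = '20161011_locking2'
group by workload_name
with ur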
Generating Explain Data
Once I have that data in the ADVISE_WORKLOAD table, I can run db2advis for the whole workload. I also want to tell DB2 to keep the explain information generated, and use db2inst1 as the schema for any statements that do not specify the schema:
db2advis -d sample -w 20161011_locking2 -keep -q db2inst1 |tee trans_3.advis
Now that generates a lot of output, but I’m not as interested in the text output as I am in what it did to my ADVISE_WORKLOAD table. I am also not going to add the 65 indexes it recommended. I am focusing on using this as a way to identify the queries for which I can make the biggest impact by adding indexes.
Analyzing ADVISE_WORKLOAD
Here is the query that I’m using to find the statements for which I’m likely to make the biggest impact with indexing:
select decimal(cost_before,25,2) as before
  , decimal(cost_after,25,2) as after
  , frequency
  , decimal((cost_before-cost_after)*frequency,20,5) as weighted_diff
  , decimal(((cost_before-cost_after)*frequency)/(cost_before*frequency),5,2) as diff_pct
  , substr(statement_text,1,100) as stmt
from ecrooks.advise_workload
where cost_before is not null
  and workload_name='20161011_locking2'
order by weighted_diff desc
fetch first 20 rows only
with ur
The main reason I have all of those columns in there is because I was troubleshooting and trying things as I was writing the query. Here’s a sample of the output:
BEFORE           AFTER            FREQUENCY  WEIGHTED_DIFF     DIFF_PCT  STMT
---------------- ---------------- ---------- ----------------- --------- ----------------------------------------------------------------------------------------------------
273136630000.00  271870330000.00  6          7597800000.00000  0.00      DELETE FROM (SELECT 1 FROM MEMBER WHERE (MEMBER_ID) IN (SELECT MEMBER_ID from member WHERE member_
72467712.00      306432.00        24         1731870720.00000  0.99      select process_state, count(*) from X_XORDERITEMSTATE where CREATE_TIME > '2016-10-06 5:00:00' grou
1317120.00       107520.00        840        1016064000.00000  0.91      SELECT CASE WHEN ((SELECT COUNT(*) from TI_DELTA_CATENTRY WHERE CATENTRY_ID = (:HV00014 :HI00014 )
40763088.00      172368.00        18         730632960.00000   0.99      select process_state, count(*) from X_XORDERITEMSTATE where CREATE_TIME > '2016-10-06 5:00:00' group
287441850000.00  287263060000.00  4          715160000.00000   0.00      DELETE FROM (SELECT 1 FROM MEMBER WHERE (MEMBER_ID) IN (SELECT MEMBER_ID from member WHERE member_
2822400.00       2069760.00       840        632217600.00000   0.26      SELECT CASE WHEN ((SELECT COUNT(*) AS SCOUNT FROM CATENTREL CER, CATENTRY CE, CATENTDESC CED, INVENT
35014896.00      2690928.00       12         387887616.00000   0.92      select count (*) from x_xorderitemstate where x_xorderitemstate_id in ( select x.x_xorderitemstate_i
34266096.00      2688480.00       12         378931392.00000   0.92      select count(*) from x_xorderitemstate where x_xorderitemstate_id in ( select x.x_xorderitemstate_id
1197840.00       781200.00        840        349977600.00000   0.34      SET :HV00023 :HI00023 = (SELECT SUBSTR(PARTNUMBER,1,8) FROM CATENTRY WHERE CATENTRY_ID = :HV00014
18116640.00      623088.00        12         209922624.00000   0.96      select * from x_xorderitemstate where CREATE_TIME > '2016-10-06 5:00:00'
120960.00        13440.00         840        90316800.00000    0.88      SELECT CASE WHEN ( (SELECT COUNT(*) FROM TI_DELTA_CATENTRY WHERE CATENTRY_ID = :HV00035 :HI00035 A
114240.00        13440.00         840        84672000.00000    0.88      SELECT CASE WHEN ((SELECT COUNT(*) from TI_DELTA_CATENTRY WHERE CATENTRY_ID = (:HV00014 :HI00014 )
8753724.00       672732.00        6          48485952.00000    0.92      select count(*) from x_xorderitemstate where x_xorderitemstate_id in ( select x.x_xorderitemstate_id
Several things to notice here:
- Notice the first statement – while the indexes recommended for it would save me about 7 billion timerons, that represents less than a 1% improvement for that SQL. This is a known problem delete, and I have a completely different suggestion for it that will improve it by about 30% by re-writing the delete.
- The weighted difference column only shows the difference in timerons that indexes could make over the number of executions captured in the locking data. It does not include executions at other times, of which there are likely to be many.
- Carriage returns in the queries mess up my output at times.
- The SQL here is truncated to make the output more readable – I’ll have to do another query to get the full SQL; a sketch of that query follows this list.
- Some of the queries are very similar, with minor syntax differences or different static values. This makes it all the more important to address performance problems for them.
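Here is a sketch of that follow-up query for the full statement text, using the same filtering and ordering as the analysis query above:

select statement_no
  , statement_text
from ecrooks.advise_workload
where cost_before is not null
  and workload_name = '20161011_locking2'
order by (cost_before - cost_after) * frequency desc
fetch first 20 rows only
with ur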
Now that I have a shorter list of SQL statements, I approach each statement individually, carefully looking for indexes that could help (using output from db2advis and other methods for each one) or for other alterations that make sense.
Note that this method does not necessarily find the worst SQL in this group, but rather the SQL that is most likely to be helped by one or more indexes. There are other ways to look at this data. I could query it to find the most expensive queries by timerons in this workload, or the most frequently executed, or a weighted value that combines those two factors.
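For example, a variation that ranks the workload by total cost before indexing (cost times frequency), rather than by potential index benefit, might look like this sketch:

select decimal(cost_before,25,2) as before
  , frequency
  , decimal(cost_before*frequency,25,2) as weighted_cost
  , substr(statement_text,1,100) as stmt
from ecrooks.advise_workload
where cost_before is not null
  and workload_name='20161011_locking2'
order by weighted_cost desc
fetch first 20 rows only
with ur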