This post is specific to DB2 9.7. In DB2 10.1 and above, Usage Lists would be a better way to handle this.
In the course of writing my recent developerWorks article on low-cardinality indexes and an associated presentation, I found that I have dozes of indexes with a cardinality of 1 that are also recently used. I cannot come up with a way that using these indexes could help performance, so I want to understand the SQL that is using them. To do that, I need to find the SQL that uses an index. I also would like to do it without event monitors on a production system with minimal impact. Here’s my journey to do that.
I suspect the reason that the low-cardinality indexes are being used is because the SQL using them is using parameter markers, and therefore cannot make use of distribution statistics to know that they’re harmful rather than helpful to performance. I also want to know if the SQL using these indexes is frequently executed or some of the reporting workload that this e-commerce database is cursed with.
I also have absolutely no vendor or IBM tools other than what comes with DB2 ESE. This in itself is both a blessing and a curse. It takes me longer to do a lot of things than it would with tools, but without tools I learn so much more about how things work.
Where to Find Index Usage Information
Let’s start with where I got the list of indexes that have a cardinality of 1 and a lastused within the last 30 days. (this was run on 9/30/2013)
select i.lastused, substr(t.tabname,1,15) as tabname, substr(indname,1,10) as indname, substr(colnames,1,16) as colnames, smallint(fullkeycard)as fullkeycard, integer(card) as table_card, volatile from syscat.indexes i, syscat.tables t where i.tabname=t.tabname and i.tabschema=t.tabschema and t.tabschema not like 'SYS%' and card > 10 and fullkeycard=1 and uniquerule='D' and i.lastused > current date - 30 days with ur; LASTUSED TABNAME INDNAME COLNAMES FULLKEYCARD TABLE_CARD VOLATILE ---------- --------------- ---------- ---------------- ----------- ----------- -------- 09/30/2013 ACACTGRP I0000427 +MEMBER_ID 1 412 09/29/2013 ACRELGRP I0000442 +MEMBER_ID 1 19 09/30/2013 ACRESGRP I0000446 +MEMBER_ID 1 216 09/30/2013 ATCHTGT I0000864 +ATTACHUSG_ID 1 3961 09/30/2013 CALCODE I0001247 +TXCDCLASS_ID 1 1019 09/30/2013 CATENCALCD I0000516 +TRADING_ID 1 208 09/30/2013 CATENTRY I0000375 +RANK 1 279635 09/30/2013 CATGPCALCD I0000522 +TRADING_ID 1 32 09/30/2013 CATGRPTPC I0000531 +CATGROUP_ID 1 22 09/30/2013 CATTOGRP I0000373 +CATALOG_ID_LINK 1 627 09/30/2013 CPENDORDER I0000542 +STOREENT_ID 1 23 09/30/2013 DISPCGPREL I0000556 +MBRGRP_ID 1 13 09/30/2013 FFMCENTDS I0000570 +STADDRESS_ID 1 785 09/30/2013 LANGPAIR I0000603 +STOREENT_ID 1 13 09/30/2013 NOTIFY I0000625 +MSGTYPE_ID 1 11818 09/29/2013 OICOMPLIST I0001263 +OICOMPREL_ID 1 14951 09/30/2013 ORDERS I0001267 +BUYERPO_ID 1 1184754 09/30/2013 PARTICIPNT I0000680 +TERMCOND_ID 1 73 09/30/2013 PATTRVALUE I0001270 +ATTACHMENT_ID 1 44644 09/30/2013 PPCEXTDATA I0001106 +PPCBATCH_ID 1 3633179 09/30/2013 PPCPAYINST I0000919 +RMA_ID+MARKFORD 1 241572 09/30/2013 PPCPAYTRAN I0001103 +PPCBATCH_ID 1 209810 09/30/2013 SCHCMD I0000758 +STOREENT_ID 1 21 09/30/2013 TRADEPOSCN I0000814 +PRODUCTSET_ID 1 43 09/30/2013 PRELETEMPLATE I0001344 +STOREENT_ID 1 13 09/30/2013 PRICERULE I0001342 +TYPE 1 15 09/30/2013 CATENTSUBS I0001358 +SUBSCPTYPE_ID 1 162 09/30/2013 SEOPAGEDEFSUBPA I0001415 +STOREENT_ID 1 14 09/20/2013 CATGROUP I0000376 +RANK 1 1400 09/30/2013 BUSEVENT I0001104 +CHECKED 1 6805318 09/30/2013 STLOC I0001211 +GEONODE_ID 1 4913 09/29/2013 PPCEXTDATA I0000927 +PPCPAYTRAN_ID 1 3633179 32 record(s) selected.
After digging around a bit, I determined there’s no good way to say match statements in mon_get_pkg_cache_stmt to indexes using system tables alone. So I determined I’m going to have to explain basically everything in my package cache to accomplish this. Explain must track the objects used, and indeed, I found the EXPLAIN_OBJECT table, which should list the indexes used and tie them back to which statement they were used for.
My first thought on explaining everything was to use db2advis – since I knew that you can apply db2advis to the entire contents of your package cache, and can specify the option to capture explain information as well. Luckily, I thought that this might be resource-intensive and tried it on a non-production system first. I DO NOT RECOMMEND USING THIS METHOD on a production database. I used this syntax:
db2advis -d wc005s01 -q WSCOMUSR -g -p |tee db2advis.out
This ran for over 30 minutes. It did end up recommending some interesting indexes, but sifting through them would be a bit time consuming, and if I added all 192 it recommended, it would take over 5 GB of space – quite a bit for this ~100GB database. It wasn’t about the indexes anyway – I ran this to get the explain information. The big problem was that it spiked my CPU. CPU on a box that mostly runs below 20% was at 70-80% for that half hour, with spikes all the way to 100%. That would never work for my production server. And anyway, it’s doing more work than I need – I don’t need all that db2advis info, just the explain tables popluated.
Theoretically, I could just query mon_get_pkg_cache_stmt to get each of the statements, and run explains on them. But I wanted to use this to see if there was a more direct way of getting what I wanted. Digging around the IBM DB2 Info Center some more (some days I just love the Info Center), I found EXPLAIN_FROM_SECTION. It allows me to pass in an executable_id (which is available in mon_get_pkg_cache_stmt), and not much else. Armed with that, I came up with this, instead:
db2 -x "select executable_id from table(mon_get_pkg_cache_stmt(NULL, NULL, NULL, -2)) as t" >exe_id.list date; cat exe_id.list |while read e; do db2 connect to wc005q01; db2 "CALL EXPLAIN_FROM_SECTION ($e, 'M', NULL, 0, NULL, ?, ?, ?, ?, ?)"; date; db2 connect reset; done >>explain_all.out
When I ran this on a nearly un-used test box, it used between 5% and 15% of CPU, when the CPU was normally 0% used. There were about 13,000 statements involved, and it’s a 12-cpu box, but obviously results vary based on hardware, package cache size, database purpose and design, etc. This is the results of my case study only.
It also ran for about half an hour. Part of that is my methodology. Running it in a stored procedure would certainly perform better – especially since my bash loop requires me to connect in every loop – significant overhead. But creating a stored procedure on stage and prod databases would require me to go through a change management process. I would be willing to do that once I understand if this process is useful, but not when I’m just trying it out.
After testing to make sure I wasn’t going to tank the system, I moved forward to an actually busy test system (ongoing CPU utilization still under 10%), it spiked CPU utilization up to 35%. Granted, this is a smaller box, not sized for load tests, with only 3 cpus and there were about 18,000 statements to explain. Running explains this way really seemed to peg one cpu, and leave the others alone.
Production has a smaller package cache (I have since altered the test environments to match prod), and has 12 cpus. It normally runs around 20% cpu utilization off hours, so I feel comfortable running the same process there. With only about 3,000 statements, it finished on prod in less than 10 minutes, and while running had CPU running 10% higher or less than what it looked like without it running.
Summary of the last 3 paragraphs: don’t run this on prod without testing the impact on your system first – it could be the straw that breaks the camel’s back.
Analyzing Explain Info for Indexes
After I had collected data as outlined above, I used this query to look at indexes on the same criteria I used to look at indexes above:
select o.explain_time, substr(o.object_schema,1,16) as schema, substr(o.object_name,1,32) as name, i.uniquerule, i.fullkeycard, t.card, substr(s.statement_text,1,50) as stmt from db2inst1.explain_object o, db2inst1.explain_statement s, syscat.tables t, syscat.indexes i where s.explain_time = o.explain_time and s.explain_requester=o.explain_requester and s.source_name=o.source_name and s.source_schema=o.source_schema and s.explain_time > timestamp('2013-09-30-20.00.00.996909') and o.object_type='IX' and o.object_schema = i.indschema and o.object_name = i.indname and i.tabname=t.tabname and i.tabschema=t.tabschema and card > 10 and i.fullkeycard=1 and uniquerule='D' with ur;
Note the conditions here – I’m looking for SQL that used indexes with a cardinality of 1 where the table has a cardinality greater than 10. In one execution of this process, I got output like this:
EXPLAIN_TIME SCHEMA NAME UNIQUERULE FULLKEYCARD CARD STMT -------------------------- ---------------- -------------------------------- ---------- -------------------- -------------------- -------------------------------------------------- 2013-10-01-09.41.44.460410 WSCOMUSR I0001263 D 1 17742 DELETE FROM ORDERITEMS WHERE ORDERITEMS_ID = ? AN 2013-10-01-09.41.44.460410 WSCOMUSR I0001263 D 1 17742 $WITH CONTEXT$($CONSTRAINT$(WSCOMUSR.BKORDALLOC.F_ 2013-10-01-09.41.51.030702 WSCOMUSR I0000927 D 1 3769661 SELECT T1.PPCEXTDATA_ID, T1.ENCRYPTED, T1.MARKFORD 2013-10-01-09.41.51.030702 WSCOMUSR I0000927 D 1 3769661 SELECT Q1.PPCEXTDATA_ID AS "PPCEXTDATA_ID", Q1.ENC 2013-10-01-10.01.28.520154 WSCOMUSR I0000927 D 1 3769661 SELECT T1.PPCEXTDATA_ID, T1.ENCRYPTED, T1.MARKFORD 2013-10-01-10.01.28.520154 WSCOMUSR I0000927 D 1 3769661 SELECT Q1.PPCEXTDATA_ID AS "PPCEXTDATA_ID", Q1.ENC 2013-10-01-10.04.31.072162 WSCOMUSR I0000927 D 1 3769661 select ppcpaytran_id, count(*) from wscomusr.PPCEX 2013-10-01-10.04.31.072162 WSCOMUSR I0000927 D 1 3769661 SELECT Q3.PPCPAYTRAN_ID AS "PPCPAYTRAN_ID", Q3.$C1 2013-10-01-22.214.171.1243638 WSCOMUSR I0001263 D 1 17742 DELETE FROM ORDERITEMS WHERE ORDERITEMS_ID = ? AN 2013-10-01-126.96.36.1993638 WSCOMUSR I0001263 D 1 17742 $WITH CONTEXT$($CONSTRAINT$(WSCOMUSR.BKORDALLOC.F_ 2013-10-01-188.8.131.52352 WSCOMUSR I0000927 D 1 3769661 SELECT T1.PPCEXTDATA_ID, T1.ENCRYPTED, T1.MARKFORD 2013-10-01-184.108.40.206352 WSCOMUSR I0000927 D 1 3769661 SELECT Q1.PPCEXTDATA_ID AS "PPCEXTDATA_ID", Q1.ENC 12 record(s) selected.
One thing I noticed here – many times there are two entries for the same query – one for the original syntax and one for the optimized syntax that you see listed in the explain plan when you do an explain (which is not directly executable).
While this got some interesting results, it still did not quite get me as far as I wanted to go. I find that to understand HOW the sql is using these indexes, I need the values for the parameter markers in the queries. While this was an interesting exercise, in the end, I think I still need an event monitor to fully understand how the SQL is using the indexes, because only an event monitor will get me those values.. I’m hoping DB2 is using the indexes when the value of the column for the low-cardinality indexes is not represented in the index at all – for elimination. Because otherwise I don’t understand how an index with a cardinality of 1 could ever be useful.
But I can run explains and run the design advisor against this SQL to see what I can do as far as these indexes go.