Find the Statements Using an Index Without Event Monitors

Posted by

This post is specific to DB2 9.7. In DB2 10.1 and above, Usage Lists would be a better way to handle this.

In the course of writing my recent developerWorks article on low-cardinality indexes and an associated presentation, I found that I have dozes of indexes with a cardinality of 1 that are also recently used. I cannot come up with a way that using these indexes could help performance, so I want to understand the SQL that is using them. To do that, I need to find the SQL that uses an index. I also would like to do it without event monitors on a production system with minimal impact. Here’s my journey to do that.

I suspect the reason that the low-cardinality indexes are being used is because the SQL using them is using parameter markers, and therefore cannot make use of distribution statistics to know that they’re harmful rather than helpful to performance. I also want to know if the SQL using these indexes is frequently executed or some of the reporting workload that this e-commerce database is cursed with.

I also have absolutely no vendor or IBM tools other than what comes with DB2 ESE. This in itself is both a blessing and a curse. It takes me longer to do a lot of things than it would with tools, but without tools I learn so much more about how things work.

Where to Find Index Usage Information

Let’s start with where I got the list of indexes that have a cardinality of 1 and a lastused within the last 30 days. (this was run on 9/30/2013)

select  i.lastused, 
        substr(t.tabname,1,15) as tabname, 
        substr(indname,1,10) as indname, 
        substr(colnames,1,16) as colnames, 
        smallint(fullkeycard)as fullkeycard, 
        integer(card) as table_card, 
from    syscat.indexes i, 
        syscat.tables t 
where i.tabname=t.tabname and i.tabschema=t.tabschema 
    and t.tabschema not like 'SYS%'
    and card > 10
    and fullkeycard=1 
    and uniquerule='D' 
    and i.lastused > current date - 30 days 
with ur;

---------- --------------- ---------- ---------------- ----------- ----------- --------
09/30/2013 ACACTGRP        I0000427   +MEMBER_ID                 1         412         
09/29/2013 ACRELGRP        I0000442   +MEMBER_ID                 1          19         
09/30/2013 ACRESGRP        I0000446   +MEMBER_ID                 1         216         
09/30/2013 ATCHTGT         I0000864   +ATTACHUSG_ID              1        3961         
09/30/2013 CALCODE         I0001247   +TXCDCLASS_ID              1        1019         
09/30/2013 CATENCALCD      I0000516   +TRADING_ID                1         208         
09/30/2013 CATENTRY        I0000375   +RANK                      1      279635         
09/30/2013 CATGPCALCD      I0000522   +TRADING_ID                1          32         
09/30/2013 CATGRPTPC       I0000531   +CATGROUP_ID               1          22         
09/30/2013 CATTOGRP        I0000373   +CATALOG_ID_LINK           1         627         
09/30/2013 CPENDORDER      I0000542   +STOREENT_ID               1          23         
09/30/2013 DISPCGPREL      I0000556   +MBRGRP_ID                 1          13         
09/30/2013 FFMCENTDS       I0000570   +STADDRESS_ID              1         785         
09/30/2013 LANGPAIR        I0000603   +STOREENT_ID               1          13         
09/30/2013 NOTIFY          I0000625   +MSGTYPE_ID                1       11818         
09/29/2013 OICOMPLIST      I0001263   +OICOMPREL_ID              1       14951         
09/30/2013 ORDERS          I0001267   +BUYERPO_ID                1     1184754         
09/30/2013 PARTICIPNT      I0000680   +TERMCOND_ID               1          73         
09/30/2013 PATTRVALUE      I0001270   +ATTACHMENT_ID             1       44644         
09/30/2013 PPCEXTDATA      I0001106   +PPCBATCH_ID               1     3633179         
09/30/2013 PPCPAYINST      I0000919   +RMA_ID+MARKFORD           1      241572         
09/30/2013 PPCPAYTRAN      I0001103   +PPCBATCH_ID               1      209810         
09/30/2013 SCHCMD          I0000758   +STOREENT_ID               1          21         
09/30/2013 TRADEPOSCN      I0000814   +PRODUCTSET_ID             1          43         
09/30/2013 PRELETEMPLATE   I0001344   +STOREENT_ID               1          13         
09/30/2013 PRICERULE       I0001342   +TYPE                      1          15         
09/30/2013 CATENTSUBS      I0001358   +SUBSCPTYPE_ID             1         162         
09/30/2013 SEOPAGEDEFSUBPA I0001415   +STOREENT_ID               1          14         
09/20/2013 CATGROUP        I0000376   +RANK                      1        1400         
09/30/2013 BUSEVENT        I0001104   +CHECKED                   1     6805318         
09/30/2013 STLOC           I0001211   +GEONODE_ID                1        4913         
09/29/2013 PPCEXTDATA      I0000927   +PPCPAYTRAN_ID             1     3633179         

  32 record(s) selected.

After digging around a bit, I determined there’s no good way to say match statements in mon_get_pkg_cache_stmt to indexes using system tables alone. So I determined I’m going to have to explain basically everything in my package cache to accomplish this. Explain must track the objects used, and indeed, I found the EXPLAIN_OBJECT table, which should list the indexes used and tie them back to which statement they were used for.

EXPLAINing Everything

My first thought on explaining everything was to use db2advis – since I knew that you can apply db2advis to the entire contents of your package cache, and can specify the option to capture explain information as well. Luckily, I thought that this might be resource-intensive and tried it on a non-production system first. I DO NOT RECOMMEND USING THIS METHOD on a production database. I used this syntax:

db2advis -d wc005s01 -q WSCOMUSR -g -p |tee db2advis.out

This ran for over 30 minutes. It did end up recommending some interesting indexes, but sifting through them would be a bit time consuming, and if I added all 192 it recommended, it would take over 5 GB of space – quite a bit for this ~100GB database. It wasn’t about the indexes anyway – I ran this to get the explain information. The big problem was that it spiked my CPU. CPU on a box that mostly runs below 20% was at 70-80% for that half hour, with spikes all the way to 100%. That would never work for my production server. And anyway, it’s doing more work than I need – I don’t need all that db2advis info, just the explain tables popluated.

Theoretically, I could just query mon_get_pkg_cache_stmt to get each of the statements, and run explains on them. But I wanted to use this to see if there was a more direct way of getting what I wanted. Digging around the IBM DB2 Info Center some more (some days I just love the Info Center), I found EXPLAIN_FROM_SECTION. It allows me to pass in an executable_id (which is available in mon_get_pkg_cache_stmt), and not much else. Armed with that, I came up with this, instead:

db2 -x "select executable_id from table(mon_get_pkg_cache_stmt(NULL, NULL, NULL, -2)) as t" >exe_id.list
date; cat exe_id.list |while read e; do db2 connect to wc005q01; db2 "CALL EXPLAIN_FROM_SECTION ($e, 'M', NULL, 0, NULL, ?, ?, ?, ?, ?)"; date; db2 connect reset; done >>explain_all.out

When I ran this on a nearly un-used test box, it used between 5% and 15% of CPU, when the CPU was normally 0% used. There were about 13,000 statements involved, and it’s a 12-cpu box, but obviously results vary based on hardware, package cache size, database purpose and design, etc. This is the results of my case study only.

It also ran for about half an hour. Part of that is my methodology. Running it in a stored procedure would certainly perform better – especially since my bash loop requires me to connect in every loop – significant overhead. But creating a stored procedure on stage and prod databases would require me to go through a change management process. I would be willing to do that once I understand if this process is useful, but not when I’m just trying it out.

After testing to make sure I wasn’t going to tank the system, I moved forward to an actually busy test system (ongoing CPU utilization still under 10%), it spiked CPU utilization up to 35%. Granted, this is a smaller box, not sized for load tests, with only 3 cpus and there were about 18,000 statements to explain. Running explains this way really seemed to peg one cpu, and leave the others alone.

Production has a smaller package cache (I have since altered the test environments to match prod), and has 12 cpus. It normally runs around 20% cpu utilization off hours, so I feel comfortable running the same process there. With only about 3,000 statements, it finished on prod in less than 10 minutes, and while running had CPU running 10% higher or less than what it looked like without it running.

Summary of the last 3 paragraphs: don’t run this on prod without testing the impact on your system first – it could be the straw that breaks the camel’s back.

Analyzing Explain Info for Indexes

After I had collected data as outlined above, I used this query to look at indexes on the same criteria I used to look at indexes above:

    substr(o.object_schema,1,16) as schema, 
    substr(o.object_name,1,32) as name, 
    substr(s.statement_text,1,50) as stmt 
from    db2inst1.explain_object o, 
    db2inst1.explain_statement s, 
    syscat.tables t,
    syscat.indexes i
where   s.explain_time = o.explain_time 
    and s.explain_requester=o.explain_requester 
    and s.source_name=o.source_name 
    and s.source_schema=o.source_schema 
    and s.explain_time > timestamp('2013-09-30-') 
    and o.object_type='IX'
    and o.object_schema = i.indschema and o.object_name = i.indname
        and i.tabname=t.tabname and i.tabschema=t.tabschema 
        and card > 10
        and i.fullkeycard=1 
        and uniquerule='D' 
with ur;

Note the conditions here – I’m looking for SQL that used indexes with a cardinality of 1 where the table has a cardinality greater than 10. In one execution of this process, I got output like this:

EXPLAIN_TIME               SCHEMA           NAME                             UNIQUERULE FULLKEYCARD          CARD                 STMT                                              
-------------------------- ---------------- -------------------------------- ---------- -------------------- -------------------- --------------------------------------------------
2013-10-01- WSCOMUSR         I0001263                         D                             1                17742 DELETE FROM ORDERITEMS  WHERE ORDERITEMS_ID = ? AN
2013-10-01- WSCOMUSR         I0001263                         D                             1                17742 $WITH CONTEXT$($CONSTRAINT$(WSCOMUSR.BKORDALLOC.F_
2013-10-01- WSCOMUSR         I0000927                         D                             1              3769661 SELECT T1.PPCEXTDATA_ID, T1.ENCRYPTED, T1.MARKFORD
2013-10-01- WSCOMUSR         I0000927                         D                             1              3769661 SELECT Q1.PPCEXTDATA_ID AS "PPCEXTDATA_ID", Q1.ENC
2013-10-01- WSCOMUSR         I0000927                         D                             1              3769661 SELECT T1.PPCEXTDATA_ID, T1.ENCRYPTED, T1.MARKFORD
2013-10-01- WSCOMUSR         I0000927                         D                             1              3769661 SELECT Q1.PPCEXTDATA_ID AS "PPCEXTDATA_ID", Q1.ENC
2013-10-01- WSCOMUSR         I0000927                         D                             1              3769661 select ppcpaytran_id, count(*) from wscomusr.PPCEX
2013-10-01- WSCOMUSR         I0000927                         D                             1              3769661 SELECT Q3.PPCPAYTRAN_ID AS "PPCPAYTRAN_ID", Q3.$C1
2013-10-01- WSCOMUSR         I0001263                         D                             1                17742 DELETE FROM ORDERITEMS  WHERE ORDERITEMS_ID = ? AN
2013-10-01- WSCOMUSR         I0001263                         D                             1                17742 $WITH CONTEXT$($CONSTRAINT$(WSCOMUSR.BKORDALLOC.F_
2013-10-01- WSCOMUSR         I0000927                         D                             1              3769661 SELECT T1.PPCEXTDATA_ID, T1.ENCRYPTED, T1.MARKFORD
2013-10-01- WSCOMUSR         I0000927                         D                             1              3769661 SELECT Q1.PPCEXTDATA_ID AS "PPCEXTDATA_ID", Q1.ENC

  12 record(s) selected.

One thing I noticed here – many times there are two entries for the same query – one for the original syntax and one for the optimized syntax that you see listed in the explain plan when you do an explain (which is not directly executable).

While this got some interesting results, it still did not quite get me as far as I wanted to go. I find that to understand HOW the sql is using these indexes, I need the values for the parameter markers in the queries. While this was an interesting exercise, in the end, I think I still need an event monitor to fully understand how the SQL is using the indexes, because only an event monitor will get me those values.. I’m hoping DB2 is using the indexes when the value of the column for the low-cardinality indexes is not represented in the index at all – for elimination. Because otherwise I don’t understand how an index with a cardinality of 1 could ever be useful.

But I can run explains and run the design advisor against this SQL to see what I can do as far as these indexes go.

Lead Db2 Database Engineer and Service Delivery Manager , XTIVIA
Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 17 years developing a top-level expertise on Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set to the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares both posts about her core skill set and her journey into Data Science. Ember lives in Denver and work from home for XTIVIA, leading a team of Db2 DBAs.


  1. Hi Amber, I liked your blog as I tried the same thing before and feels nice somebody else is trying to get the same info DB2 does not make easy to get.
    My experiences are:
    A) I’ve found SQLs where DB2 uses indexes which have nothing to do at all with the WHERE predicate. Like this particular query below uses the PK index (which is SINGLETRANSID column) but you don’t see it in the where predicate, even though there’s an index on ARCHFLAG+ARCHDATE+TMSTAMP columns.

    “SELECT Min(singletransID) FROM isaac.table WHERE ARCHFLAG = ? and Tmstamp < timestamp(current_date) – ? DAY"

    Only if you add 'and ARCHDATE=? and' in the predicate, DB2 uses the right index. So I think if we know indexes with fullkeycard=1 are not useful we should drop them anyway.

  2. Hi ,
    Thanks for sharing the information . It was great to see the way we can pull the metadata statistics and use the same to improve our application and database performance . I was also wondering if there is a way to do the below :

    I need to know which columns of a certain index does a SELECT query use if at all the query uses that index . As for eg let’s say I have an index ix1 based on columns c1 , c2 , c3 and c4 . Now my select query is in the form of select * from table where c1 = …. and c3 = …. . If I see the explain plan I will be able to see that DB2 has indeed used index ix1 but how can I know which columns out of that index has DB2 used viz. c1 only or c1 and c3 both . Do we have any third party tool that pretty much serves the above purpose ?

    1. DB2 has to read all of the columns from the index, so it doesn’t really matter which columns it “uses” – unless we’re talking index-only access, or unless the table has to then be scanned/accessed to get more information that’s not in the index. If the data is in the index and there isn’t a function in the query that prevents its use, then the data in the index will generally be preferred over the table. By reading the explain, you can generally tell which predicates are being applied at which step, but you have to dig down into the details to do so.

      The bigger question to ask with indexes is usually how they are being accessed. There are two ways to access an index – following the b-tree structure to retrieve certain leaf pages, or scanning through all the leaf pages. The latter is obviously more resource-intensive and worse for performance.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.