I’ve been having some fun with explains lately. I’ve been preparing a presentation for some developers on how to analyze queries – I’ll be presenting next week when I’m on-site. So I started digging through some explain plans I had sitting around from various times when I’ve need to do analyses, and I started going through them. One reccomendation on my list of “DB2 SQL Best Practices” is to avoid in-lists whenever possible. Just one of those tips that I’ve always known and never really thought about. So I start digging into this excerpt from explain plan:
0.716508 FETCH ( 6) 133.716 38 /---+--- 1 55804 RIDSCN TABLE: WSCOMUSR ( 7) CATENTRY 77.2279 6 +----------------+----------------+--------+-------+----------------+----------------+ 1 1 1 1 1 1 SORT SORT SORT SORT SORT SORT ( 8) ( 10) ( 12) ( 14) ( 16) ( 18) 12.8719 12.8719 12.8719 12.8719 12.8719 12.8719 1 1 1 1 1 1 | | | | | | 1 1 1 1 1 1 IXSCAN IXSCAN IXSCAN IXSCAN IXSCAN IXSCAN ( 9) ( 11) ( 13) ( 15) ( 17) ( 19) 12.8711 12.8711 12.8711 12.8711 12.8711 12.8711 1 1 1 1 1 1 | | | | | | 55804 55804 55804 55804 55804 55804 INDEX: WSCOMUSR INDEX: WSCOMUSR INDEX: WSCOMUSR INDEX: WSCOMUSR INDEX: WSCOMUSR INDEX: WSCOMUSR I0000518 I0000518 I0000518 I0000518 I0000518 I0000518
Having not done much in-depth analysis recently and most of my recent analysis being of the popcorn and bubble-gum variety using the index advisor, I wonder why it is scanning the same index 6 times (and that is taking up just less than 1/2 of the total timerons). Looking into the query itself, I find an IN list with exactly 6 items – whaddya know. So in this case, DB2 had to scan the SAME index 6 times -once for each item in the list, and then it did a RID scan of that list and fetched the rest of the data from the base table. I now have proof of how IN lists can work and a nifty example for the developers.
So digging into more queries, I find this access plan:
Access Plan: ----------- Total Cost: 102.842 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 4 NLJOIN ( 2) 102.842 8 /----+--- 4 1 TBSCAN IXSCAN ( 3) ( 6) 0.00286127 25.7188 0 2 | | 4 55804 SORT INDEX: DB2INST1 ( 4) IBM_007 0.00214227 0 | 4 TBSCAN ( 5) 5.95154e-05 0 | 4 TABFNC: SYSIBM GENROW
Nothing too expensive or complicated here – single table access. But why the heck is it doing a JOIN when the query only accesses a single table with no subqueries? And furthermore, what’s this GENROW thing that it’s doing a table scan on and getting 4 rows. The documentation is real helpful on GENROW (from the Info Center):
Digging a bit further, I find:
This operator is used by the optimizer to generate rows of data.
Operator name: GENROW
Represents: A built-in function that generates a table of rows, using no input from tables, indexes, or operators.
GENROW can be used by the optimizer to generate rows of data (for example, for an INSERT statement or for some IN-lists that are transformed into joins).
Ok, so burried in there is a reference to IN-lists and commerce tends to be a bit IN-list happy. Looking at the query, I find an IN-list with 4 values. So DB2 is essentially treating that 4 rows as a (non-indexed) table, and then doing an NLJOIN between it and the table itself. So on a single-table query, I get a sort, a table scan, and a join, even though I’m getting index-only access to the table directly.
Much of this I’ve learned before when digging into the details of explains, but I haven’t focused so much on how IN-lists are handled, and thought it was interesting.
=) yes, this is my idea of fun on a Thursday night.
Hello, Great Post !. I’m facing the same situation. I have 2 equal databases in 2 different servers, but when I execute the explain plan.. one is using the same index several times and the other is using genrow. What’s the difference? Could be statistics or anything else? Thanks. Regards.
Do they have the same statistics and exactly the same data? My guess would be that they have different data or different statistics. I would look also at if one is much faster than the other.
Thanks, for the information on genrow. I tried using “or” clause instead of in-list for example “where (pred1=”value1″ or pred1=”value2″)” but it still produces genrow section.
may be by now, you would have found the way to avoid multiple index scan with in-list.
I have tried something like this and was able to avoid multiple index scan.
Original Query :
select pm.id_value,pm.token
from FIN_DA_PAYMENT_MEANS pm
where pm.ENTITY_STATUS_CREATING_DATE between ‘01.03.2017 00:00:00.000000’ and ‘01.03.2017 23:59:59.999999’
and pm.type in (‘virtualCreditCardToken’, ‘creditCardToken’)
for read only with ur
Modified query :
select pm.id_value,pm.token
from HRSDB.FIN_DA_PAYMENT_MEANS pm join
(values(‘virtualCreditCardToken’) union values(‘creditCardToken’) ) as v1(c1)
on v1.c1=pm.type
where pm.ENTITY_STATUS_CREATING_DATE between ‘2017-03-01-00.00.00.000000’ and ‘2017-03-01-23.59.59.999999’
for read only with ur
;
create temp table in the query. So far modified query didn’t give incorrect results :).