Fun with IN-lists and GENROW

Posted by

I’ve been having some fun with explains lately. I’ve been preparing a presentation for some developers on how to analyze queries – I’ll be presenting next week when I’m on-site. So I started digging through some explain plans I had sitting around from various times when I’ve need to do analyses, and I started going through them. One reccomendation on my list of “DB2 SQL Best Practices” is to avoid in-lists whenever possible. Just one of those tips that I’ve always known and never really thought about. So I start digging into this excerpt from explain plan:

                                                    0.716508                                         
                                                     FETCH                                          
                                                     (   6)                                         
                                                     133.716                                        
                                                       38                                     
                                                    /---+---                                 
                                                  1          55804                            
                                               RIDSCN   TABLE: WSCOMUSR                       
                                               (   7)      CATENTRY                                
                                               77.2279                                             
                                                  6                                                
       +----------------+----------------+--------+-------+----------------+----------------+    
        1                1                1                1                1                1   
     SORT             SORT             SORT             SORT             SORT             SORT   
     (   8)           (  10)           (  12)           (  14)           (  16)           (  18)
     12.8719          12.8719          12.8719          12.8719          12.8719          12.8719
        1                1                1                1                1                1
       |                |                |                |                |                |
        1                1                1                1                1                1
     IXSCAN           IXSCAN           IXSCAN           IXSCAN           IXSCAN           IXSCAN
     (   9)           (  11)           (  13)           (  15)           (  17)           (  19)
     12.8711          12.8711          12.8711          12.8711          12.8711          12.8711
        1                1                1                1                1                1
       |                |                |                |                |                |
      55804            55804            55804            55804            55804            55804
 INDEX: WSCOMUSR  INDEX: WSCOMUSR  INDEX: WSCOMUSR  INDEX: WSCOMUSR  INDEX: WSCOMUSR  INDEX: WSCOMUSR
    I0000518         I0000518         I0000518         I0000518         I0000518         I0000518

Having not done much in-depth analysis recently and most of my recent analysis being of the popcorn and bubble-gum variety using the index advisor, I wonder why it is scanning the same index 6 times (and that is taking up just less than 1/2 of the total timerons). Looking into the query itself, I find an IN list with exactly 6 items – whaddya know. So in this case, DB2 had to scan the SAME index 6 times -once for each item in the list, and then it did a RID scan of that list and fetched the rest of the data from the base table. I now have proof of how IN lists can work and a nifty example for the developers.

So digging into more queries, I find this access plan:

Access Plan:
-----------
  Total Cost:   102.842
  Query Degree:  1

              Rows
             RETURN
             (   1)
              Cost
               I/O
               |
                4
             NLJOIN
             (   2)
             102.842
                8
           /----+---
         4             1
      TBSCAN        IXSCAN
      (   3)        (   6)
    0.00286127      25.7188
         0             2
        |             |
         4           55804
      SORT      INDEX: DB2INST1
      (   4)        IBM_007
    0.00214227
         0
        |
         4
      TBSCAN
      (   5)
    5.95154e-05
         0
        |
         4
 TABFNC: SYSIBM  
      GENROW

Nothing too expensive or complicated here – single table access. But why the heck is it doing a JOIN when the query only accesses a single table with no subqueries? And furthermore, what’s this GENROW thing that it’s doing a table scan on and getting 4 rows. The documentation is real helpful on GENROW (from the Info Center):

GENROW
Generates a table of rows.

Digging a bit further, I find:

This operator is used by the optimizer to generate rows of data.

Operator name: GENROW

Represents: A built-in function that generates a table of rows, using no input from tables, indexes, or operators.

GENROW can be used by the optimizer to generate rows of data (for example, for an INSERT statement or for some IN-lists that are transformed into joins).

Ok, so burried in there is a reference to IN-lists and commerce tends to be a bit IN-list happy. Looking at the query, I find an IN-list with 4 values. So DB2 is essentially treating that 4 rows as a (non-indexed) table, and then doing an NLJOIN between it and the table itself. So on a single-table query, I get a sort, a table scan, and a join, even though I’m getting index-only access to the table directly.

Much of this I’ve learned before when digging into the details of explains, but I haven’t focused so much on how IN-lists are handled, and thought it was interesting.

=) yes, this is my idea of fun on a Thursday night.

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, and is now pivoting to focus on learning MySQL. Ember shares both posts about her core skill set and her journey learning MySQL. Ember lives in Denver and work from home

4 comments

  1. Hello, Great Post !. I’m facing the same situation. I have 2 equal databases in 2 different servers, but when I execute the explain plan.. one is using the same index several times and the other is using genrow. What’s the difference? Could be statistics or anything else? Thanks. Regards.

    1. Do they have the same statistics and exactly the same data? My guess would be that they have different data or different statistics. I would look also at if one is much faster than the other.

  2. Thanks, for the information on genrow. I tried using “or” clause instead of in-list for example “where (pred1=”value1″ or pred1=”value2″)” but it still produces genrow section.

  3. may be by now, you would have found the way to avoid multiple index scan with in-list.

    I have tried something like this and was able to avoid multiple index scan.

    Original Query :
    select pm.id_value,pm.token
    from FIN_DA_PAYMENT_MEANS pm
    where pm.ENTITY_STATUS_CREATING_DATE between ‘01.03.2017 00:00:00.000000’ and ‘01.03.2017 23:59:59.999999’
    and pm.type in (‘virtualCreditCardToken’, ‘creditCardToken’)
    for read only with ur

    Modified query :

    select pm.id_value,pm.token
    from HRSDB.FIN_DA_PAYMENT_MEANS pm join
    (values(‘virtualCreditCardToken’) union values(‘creditCardToken’) ) as v1(c1)
    on v1.c1=pm.type
    where pm.ENTITY_STATUS_CREATING_DATE between ‘2017-03-01-00.00.00.000000’ and ‘2017-03-01-23.59.59.999999’
    for read only with ur
    ;

    create temp table in the query. So far modified query didn’t give incorrect results :).

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.