DB2 Administrative SQL Cookbook: Column Selectivity by Table (BLU)

Purpose

This statement reports how selective at the column level queries are that run against a specific table. This does not look at overall selectivity or row selectivity. It will only work in DB2 10.5. BLU performs best when not all columns are referenced by queries.

Source

While I’ve modified this statement, it started with statements from these two sources, both influenced by David Kalmuk:
http://www.ibm.com/developerworks/data/library/techarticle/dm-1407monitor-bluaccel/index.html
http://www.dbisoftware.com/blog/db2nightshow.php?id=619

Statement

with t1 as (
SELECT
        substr(tabschema,1,18) as tabschema
        ,substr(tabname,1,33) as tabname
        ,(select count(*) from syscat.columns c where c.tabname=mgt.tabname and c.tabschema=mgt.tabschema) as num_cols
        ,section_exec_with_col_references as num_queries
        ,(num_columns_referenced /
        nullif(section_exec_with_col_references, 0))
        as avg_cols_ref_per_query
FROM table(mon_get_table('', '', -2)) AS mgt
where tab_organization='C'
)
SELECT 
    t1.*
    , decimal((float(t1.avg_cols_ref_per_query)/float(num_cols)),5,2) * 100 as PCT_COLS
FROM table(mon_get_table('', '', -2)) AS mgt2
    join t1 on t1.tabschema = mgt2.tabschema and t1.tabname=mgt2.tabname
where mgt2.tab_organization='C'
    and mgt2.tabschema not like 'SYS%'
order by PCT_COLS desc, t1.num_queries desc
with ur;

Sample Output

TABSCHEMA          TABNAME                           NUM_COLS    NUM_QUERIES          AVG_COLS_REF_PER_QUERY PCT_COLS
------------------ --------------------------------- ----------- -------------------- ---------------------- --------
SSIRS              COGNOS_LOGIN_RAW_MS_EXCHANGE                7               343483                      7   100.00
SSIRS              COGNOS_LOGIN_RAW_MS_EXCHANGESMTP            6               343482                      6   100.00
SSIRS              ROW_COUNT_CMIPSSFTP                         5                 1447                      5   100.00
SSIRS              COGLOG_USERINFO                             8                  131                      8   100.00
SSIRS_DMART        DIM_CS_LCTN                                15                10342                      2    13.00
SSIRS_DMART        DIM_CS_PGM_REG                             39                 3636                      5    12.00
SSIRS              COLL_RFRL                                  55                  886                      7    12.00
SSIRS_DMART        DIMDATE                                    17                 2205                      2    11.00
SSIRS_DMART        DIM_CS_PGM                                 38                14916                      4    10.00
SSIRS              ES_CSLD_VER2_1DAY                          65                 3373                      4     6.00

Notes

100 percent means that on average, every query on this table references every column in the table. The lower this number is, the more likely the power of BLU is being leveraged.

Ember Crooks
Ember Crooks

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, spent a year working with high-volume MySQL, and is now learning Snowflake. Ember shares both posts about her core skill sets and her journey learning Snowflake.

Ember lives in Denver and work from home

Articles: 545

2 Comments

  1. Good point regarding using the number of columns referenced by a query to understand if using BLU may be a good fit for a workload! There is a small but important detail worth pointing out …

    Take for example the following query that reports sales for a 5-day period where a transaction was over 1000 dollars:
    select * from sales where saleDate > ’10/10/2015′ and saleDate 1000

    If the sales table has 90 columns, avg_cols_ref_per_query would return 90. However query processing in BLU will take advantage of late materialization by first only accessing the saleDate column, applying the date predicate followed by the same for the salePrice column and only for the surviving rows, the remaining 88 columns are accessed (a concept known as late materialization in the industry). Based on the selectivity of the predicates, the sales table may still be a good candidate for being column-organized despite pct_cols reporting 100% of columns are accessed. Another point for readers to consider.

    • Agreed. This is by no means the only measure for BLU being successful. I’ve heard that queries that access all columns can run faster on BLU in a number of scenarios. This is just one factor to consider.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.