Purpose
This statement reports how selective at the column level queries are that run against a specific table. This does not look at overall selectivity or row selectivity. It will only work in DB2 10.5. BLU performs best when not all columns are referenced by queries.
Source
While I’ve modified this statement, it started with statements from these two sources, both influenced by David Kalmuk:
http://www.ibm.com/developerworks/data/library/techarticle/dm-1407monitor-bluaccel/index.html
http://www.dbisoftware.com/blog/db2nightshow.php?id=619
Statement
with t1 as ( SELECT substr(tabschema,1,18) as tabschema ,substr(tabname,1,33) as tabname ,(select count(*) from syscat.columns c where c.tabname=mgt.tabname and c.tabschema=mgt.tabschema) as num_cols ,section_exec_with_col_references as num_queries ,(num_columns_referenced / nullif(section_exec_with_col_references, 0)) as avg_cols_ref_per_query FROM table(mon_get_table('', '', -2)) AS mgt where tab_organization='C' ) SELECT t1.* , decimal((float(t1.avg_cols_ref_per_query)/float(num_cols)),5,2) * 100 as PCT_COLS FROM table(mon_get_table('', '', -2)) AS mgt2 join t1 on t1.tabschema = mgt2.tabschema and t1.tabname=mgt2.tabname where mgt2.tab_organization='C' and mgt2.tabschema not like 'SYS%' order by PCT_COLS desc, t1.num_queries desc with ur;
Sample Output
TABSCHEMA TABNAME NUM_COLS NUM_QUERIES AVG_COLS_REF_PER_QUERY PCT_COLS ------------------ --------------------------------- ----------- -------------------- ---------------------- -------- SSIRS COGNOS_LOGIN_RAW_MS_EXCHANGE 7 343483 7 100.00 SSIRS COGNOS_LOGIN_RAW_MS_EXCHANGESMTP 6 343482 6 100.00 SSIRS ROW_COUNT_CMIPSSFTP 5 1447 5 100.00 SSIRS COGLOG_USERINFO 8 131 8 100.00 SSIRS_DMART DIM_CS_LCTN 15 10342 2 13.00 SSIRS_DMART DIM_CS_PGM_REG 39 3636 5 12.00 SSIRS COLL_RFRL 55 886 7 12.00 SSIRS_DMART DIMDATE 17 2205 2 11.00 SSIRS_DMART DIM_CS_PGM 38 14916 4 10.00 SSIRS ES_CSLD_VER2_1DAY 65 3373 4 6.00
Notes
100 percent means that on average, every query on this table references every column in the table. The lower this number is, the more likely the power of BLU is being leveraged.
Good point regarding using the number of columns referenced by a query to understand if using BLU may be a good fit for a workload! There is a small but important detail worth pointing out …
Take for example the following query that reports sales for a 5-day period where a transaction was over 1000 dollars:
select * from sales where saleDate > ’10/10/2015′ and saleDate 1000
If the sales table has 90 columns, avg_cols_ref_per_query would return 90. However query processing in BLU will take advantage of late materialization by first only accessing the saleDate column, applying the date predicate followed by the same for the salePrice column and only for the surviving rows, the remaining 88 columns are accessed (a concept known as late materialization in the industry). Based on the selectivity of the predicates, the sales table may still be a good candidate for being column-organized despite pct_cols reporting 100% of columns are accessed. Another point for readers to consider.
Agreed. This is by no means the only measure for BLU being successful. I’ve heard that queries that access all columns can run faster on BLU in a number of scenarios. This is just one factor to consider.