Example of A Clustering Low-Cardinality Index Helping Query Performance

Posted by

The request from the developers was something along the lines of “Help, Ember, this query needs to perform better”. Sometimes the query I’m working on is not one that shows up as a problem from the database administrator’s perspective, but one that is especially important in some part of application functioning. In this case, this query is related to the performance of searches done on the website – a particularly problematic area on this client.

The query I was asked to help with is this one:

select distinct 
from    attrdictsrchconf ADSC, 
    attr A, 
    attrval AV, 
    attrvaldesc AVD 
    ADSC.ATTR_ID is not NULL 
    and ADSC.attr_id = A.attr_id 
    and ADSC.mastercatalog_id = ? 
    and ADSC.srchfieldname in (?) 
    and A.storeent_id in (?, ?, ?) 
    and A.facetable = 1 
    and AV.attr_id = A.attr_id 
    and AV.storeent_id in (?, ?, ?) 
    and AV.attrval_id = AVD.attrval_id 
    and AVD.language_id = ?

The data model in question here is a typical WebSphere Commerce database, with no customizations involved in this query.

Doing an explain (using DBI’s Brother-Panther), I get this:

A typical db2advis gives me nothing useful, but I know better than to stop there. The thing I tried first was to do an explain that recommends MQTs and MDC tables. I’ve talked to WebSphere Commerce support on this and they’ve told me that MDC Tables are considered a customization. They don’t see why it wouldn’t work, but they won’t say it’s supported. C’mon IBM – there are several areas I can see MDC’s making big performance gains for me. But even if I can’t use MDC tables, the db2 advisor for them can sometimes give me ideas for clustering indexes, which I can use.

In DBI’s Brother-Panther, I just click the right boxes:

If doing this at the command line, you would use the ‘-m C’ option on db2advis to get the same kinds of recommendations.

In this case, it recommended one clustering index and that one table be converted to an MDC table. But the MDC recommendation includes only one dimension. This makes it particularly likely that a clustering index could help me nearly as much. Both index recommendations are duplications of existing indexes, but just adding clustering. From what I’ve seen, the WebSphere Commerce data model seems to favor a lot of indexes on single columns. This leads to some low-cardinality indexes. If you want to understand why low-cardinality indexes can hurt performance, please see my DeveloperWorks article, Why low cardinality indexes negatively impact performance.

Looking at the index in question, it is clearly a low-cardinality index:

select substr(indschema,1,12) as indschema, substr(indname,1,12) as indname, fullkeycard from syscat.indexes where indname='I0001468' with ur

------------ ------------ --------------------
WSCOMUSR     I0001468                        2

  1 record(s) selected.

Hard to get much more low-cardinality than 2. The cardinality of this full table is 1,255,954. The data is relatively evenly distributed across these two values. DB2 is (correctly) choosing not to use the index because if it’s low cardinality and the table is not clustered on it, it would make performance worse.

I changed the index to be a clustering index like this:

$ db2 "drop index wscomusr.I0001468"
DB20000I  The SQL command completed successfully.
$ db2 "create index wscomusr.i_attrvaldesc_01 on wscomusr.attrvaldesc (language_id) cluster allow reverse scans"                     <
DB20000I  The SQL command completed successfully.
$ db2 "REORG TABLE WSCOMUSR.ATTRVALDESC INDEX wscomusr.i_attrvaldesc_01 "
DB20000I  The REORG command completed successfully.
$ db2 "RUNSTATS ON TABLE WSCOMUSR.ATTRVALDESC with distribution and detailed indexes all"
DB20000I  The RUNSTATS command completed successfully.

After doing that, the explain looks like this:

Note that it's now using my clustering index for that table, even though it can't make it into index-only access. And the cost of that query is reduced by a whopping 41%. I got the cost of the query down a bit more with another clustering index, and the developers were quite happy. They asked if I could reduce it further and my response was "Well, can you stop using DISTINCT and IN?"

Lessons learned:

  1. Information beyond the standard db2advis can be useful even if you cannot make use of all DB2 features based on vendor restrictions.
  2. C'mon IBM - you're the vendor of both the application and the database. Can we get the use of MDC tables for base WebSphere Commerce tables certified?
  3. I am a lifelong GUI-hater, but I'm loving DBI's Brother-Panther for looking at things.

I somewhat doubt that I would have added this index by itself if it wasn't replacing a base WebSphere Commerce index with the exact same columns. Maybe if I happened on this same performance improvement. I might have experimented more with a composite index in that case. But I like sticking as close to the WebSphere Commerce data model as possible unless I have very specific reasons to do otherwise.

Lead Db2 Database Engineer and Service Delivery Manager , XTIVIA
Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 17 years developing a top-level expertise on Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set to the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares both posts about her core skill set and her journey into Data Science. Ember lives in Denver and work from home for XTIVIA, leading a team of Db2 DBAs.

One comment

  1. Fabulous post, Ember, really enjoyed its simplicity and main point: clustered index even on columns with low cardinality can get you a lot. Great lesson, thanks for sharing.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.