DB2 LUW Indexes: B-Tree Details, Physical Storage and More

I had an earlier blog entry on the basics of Indexes, but there are so many details to cover.

Indexes are Physical Objects

An index is an object in an IBM DB2 LUW database that takes up space on disk. Much like a table, it contains actual data, though index data is a partial copy of the data in one table. The reason indexes exist is to speed access to the data in tables. In contrast, views may be in place either to improve performance or to manage security issues, but they are not manifested on disk. Sometimes queries can be fulfilled through only accessing an index without going to the table. This is called index-only access.

By definition, Indexes represent a duplication of data that already exists in the table.

Different relational database management systems (RDBMSes such as Oracle or MySQL) handle indexes differently. So the information in this blog entry is specific to IBM DB2 for Linux, Unix, and Windows.

B-Tree Indexes

There is only one type of index that IBM DB2 for LUW supports – B-Tree indexes. (Ok, block indexes for MDC tables, but that’s beyond the scope of this blog entry).
B-Tree indexes are typically represented conceptually by three levels, as in the diagram below, though there can be additional levels of intermediate nodes with the higher levels pointing to lower levels of intermediate nodes.

B-Tree Index

B-Tree Index

In this diagram, DB2 would start at the root page. “Page” and “Node” are interchangeable terms for IBM DB2 LUW. Let’s work through this as if DB2 were looking for the value of ‘E’ in this index on a single character. IBM DB2 would first look at the root node, and look at each value. Essentially, it would ask:
Is E before or equal to C?
No.
Is E before or equal to J?
Yes – go to that intermediate page.
The process then happens again for values on the intermediate page.
Is E before or equal to D?
No.
Is E before or equal to F?
Yes – go to that leaf page.
Once IBM DB2 has identified the right leaf page, it will find the right value on the page and get the RID or Record IDentifier. The RID tells db2 the exact location (page and slot number) in the table/tablespace where it can find that particular record. Many times once IBM DB2 has that RID, it will go get additional data from the data page in the table.

A RID consists of a page number and a slot number on that page. For tables in SMS tablespaces, the RID is relative to the table. For tables in DMS or AST tablespaces, the RID is relative to the tablespace.

In the scenario we have described above, given a unique value in an index, IBM DB2 will use exactly four page reads (which may be entirely in memory) to get a specific row. This is much more efficient than scanning through hundreds or thousands or millions of pages in the table to find the needed record.

The index can have more than one column. In that case, DB2 would follow this process for each column in order – first identifying the place where the values in the first column exist, and then identifying where the values in the second column exist and so forth. The image below represents a b-tree index with multiple columns.

B-Tree Index with Two Columns

B-Tree Index with Two Columns
For this one there are two columns – first one on a singe character, and then one with a single number that can have the value of either 1, 2, or 3. If we were looking for the values of F and 3, we would start at the root page, and ask:
Is F before or equal to C?
No.
Is F before or equal to J?
Yes – go to that intermediate page.
Is F before or equal to D?
No.
Is F before or equal to F?
Yes.
Is 3 before or equal to 2?
No, move to the next listing on the intermediate page.
Is F before or equal to J?
Yes, go to that leaf page, and get the RID.

The above shows indexes with only one entry for each index key, or at least a very small number. There may also be indexes with a lower cardinality – with many entries for each index key. Index cardinality is the number of unique index key values. The cardinality of an index may be anywhere from 1 to the number of rows in the table. Unique indexes have a cardinality equal to the cardinality of the table.

Accessing Indexes

If you are looking at an explain plan of access to table using an index, it would look something like this, given a table name of WSCOMUSR.MBRATTRVAL and an index name of WSCOMUSR.I0000327:

   Total Cost:         298.134
    Query Degree:       1

            Rows 
           RETURN
           (   1)
            Cost 
             I/O 
             |
           7309.97 
           FETCH 
           (   2)
           298.134 
           152.851 
         /---+----\
     7309.97    6.90035e+06 
     IXSCAN   TABLE: WSCOMUSR
     (   3)     MBRATTRVAL
     54.7399        Q1
     6.86993 
       |
   6.90035e+06 
 INDEX: WSCOMUSR
    I0000327
       Q1

Indexes can also be accessed using an index leaf scan. An index leaf scan is where every leaf page in the index is scanned in order, much like a table scan. These are a bit hard to find because IBM DB2 refers to any index access as an “IXSCAN”, even if it is just direct access as in the example above. Index leaf scans occur when most or all of the needed data is in the index, but one or more of the columns earlier in the index are not specified or used in the query.

Column Order Matters

Above, I talked about how DB2 uses the index, looking for values for the first column first, and the second column second, and so forth. This means that when you are trying to find appropriate indexes, the order of the columns in the index matters very much. If a query specifies a value for the first column in the index and the third column in and index, but not the 2nd column, then that index will not be very useful for that query prior to DB2 10. In DB2 10, the concept of “Jump Scans” was introduced, so it is now possible in some cases for DB2 to still use the index, but only in certain scenarios.

Indexes may Help Select Performance, May Hurt Insert/Update/Delete Performance

Indexes can be great for some Select performance, but every index added negatively impacts Insert/Update/Delete performance. That is because for each record in an Insert/Update/Delete, DB2 must also go out and handle the record in every index. Above we saw that with just three page reads, we could locate an individual row. Those three reads and associated writes must also occur for each Insert/Update/Delete. That means that with standard three-level b-tree indexes and 10 indexes on a table, in addition to locating and altering the proper page in the table for a particular record, DB2 must also touch 30 pages of index. It doesn’t take long for the maintenance of the duplicate data that indexes are to become problematic.

For non-unique indexes, each index key is only stored on each leaf page once. This is saves space. The maximum rid for each leaf page is also stored on the intermediate pages, to make access to a specific RID as efficient as possible. The RID in these cases is treated much like an additional column in the index, or keypart. Below is what our b-tree diagram looks like including that information:

Non-Unique B-Tree Index

Non-Unique B-Tree Index

Index Options

PCTFREE

On index creation, LOAD, and REORG the parameter of PCTFREE is respected when the pages are filled with data. The default if nothing is specified is 10 percent. What PCTFREE does is leave open space on each leaf page (LEVEL2 PCTFREE is also available for intermediate nodes) of an index for additional entries to be added. Performance of inserts is likely to be better if there is enough space on the right page to add an index entry where DB2 needs to. If there is not enough space on the right page, then DB2 will have to perform a page split (defined below), which requires another page read and write.

I like to think about the geometry of my index leaf pages. For example, the maximum index key size for a 4K tablespace is 1024. With an index key size that large, only 3 rows will fit on an index leaf page (since there is a small amount of page overhead as well). In that scenario, a PCTFREE of 10% would be about 410 bytes, which would not be enough room for an additional row to fit on the page. It is therefore a useless setting in that case, and you might want to consider a larger page size for the indexes for that table. There is no requirement for the indexes and the table to have the same page size as the table.

Because many larger indexes may include VARCHAR columns, the actual size of index keys tends to be rather small in many of my databases. Here’s a query you can use to find indexes where the average index key size is more than 10% of the page size:

select substr(i.indname, 1, 25) as indname, substr(i.tabname,1,20) as tabname, avgleafkeysize, pagesize 
from syscat.indexes i, syscat.tablespaces t 
where i.TBSPACEID=t.TBSPACEID and avgleafkeysize > .1 * pagesize 
with ur;

I’d love to post the results of this query, but I don’t have any indexes that meet those parameters. Keep in mind that the average index key size may be very different from the maximum index key size, so the results here could change over time.

MINPCTUSED

MINPCTUSED can help reduce the need for reorgs, however it can also slow down updates and deletes, and is only really used if there is an exclusive table lock along with the update or delete. That makes it something I don’t use in my e-commerce databases, but it may be useful in some situations. When MINPCTUSED is specified, DB2 will attempt to merge a leaf page with a neighboring leaf page when the page is less full than the percentage specified for MINPCTUSED. Logically, you would probably not want to set this to anything over 50%

PAGE SPLIT

When there is no space available on a leaf page for an insert or an update, then DB2 must add in another page. By default, DB2 will split the page into two equal groups of RIDs – this is called a symmetric page split. You can, however specify “HIGH” or “LOW” for this parameter. When you specify “HIGH”, better performance is seen for ever-increasing values of the index key (such as you may have for a generated key like order number or like many of the _ID fields in WebSphere Commerce databases). If you specify “LOW”, then you get better performance for ever-decreasing index keys.

Index Size

Since indexes are actually represented on disk, they take up disk space. I’ve supported databases where the indexes were larger than the table data in total size. If you index everything, then your indexes will be larger than your table – so you have to be cautious and consider carefully the impact any new index will have on disk space.

Runstats

After you add an index, you must perform runstats on the table/index before the index will be used. On the create index command, you can use the keywords “COLLECT DETAILED STATISTICS” to collect statistics as the index is built. You can also simply do runstats on the table after creating the index.

A Word on General Indexing Strategies

This is not a post on how to find appropriate indexes. That is incredibly complicated and an art that generally requires a fair amount of experience to do well. In general, I’m a fan of the strategy of finding appropriate composite (multi-column) indexes. Individual indexes on separate columns don’t tend to perform as well. In my e-commerce databases, I don’t get a lot of index-only access, either – that is only appropriate for specific scenarios.

Index Risk

With any added index, there is a small risk of performance degradation for selects. It is possible that adding an index will cause DB2 to change an access plan for some other query and that change will actually be worse. The risk is pretty low, though, and it is extremely easy to back out an index addition by simply dropping the index and re-doing runstats.

Other Index Topics

There are plenty of other Index details that I’m likely to cover in future posts. This is a very large topic, and there are plenty of things to talk about.

Ember Crooks
Ember Crooks

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, spent a year working with high-volume MySQL, and is now learning Snowflake. Ember shares both posts about her core skill sets and her journey learning Snowflake.

Ember lives in Denver and work from home

Articles: 549

6 Comments

  1. When creating indexes for a star-schema is it beneficial for the natural key index to have the surrogate key as the second column? Since the lookups are done on the natural key usually, would having the surrogate key help with join efficient to the FACT table?

  2. Hi Ember,
    This question might be a little off topic, but I can not seem to find a place where I can rightfully ask it. If you think this is not the right place to ask, maybe you can point me to places where I could ask 🙂

    (I’m studying a bit on the optimizer,so this has nothing to do with actual real-world problems)
    DB2 LUW 11.5.6.0
    I created a table:
    create table ind1 (n1 int not null primary key,
    n2 int,
    n3 int,
    t varchar(100));

    insert into ind1
    WITH engine (rownum) AS (
    SELECT 1 AS rownum
    from sysibm.sysdummy1
    UNION ALL
    SELECT rownum + 1 AS rownum
    FROM engine
    where rownum<100000
    )
    SELECT rownum,
    case when rownum select nleaf,nlevels from syscat.indexes where tabname=’IND1′;

    NLEAF NLEVELS
    ——————– ——-
    417 3

    1 record(s) selected.

    db2 =>

    My assumption is/was (but for now is proven wrong) : Whenever I need data/RID’s for 1 row from the index I have 3 IO’s to get to the right leafnode.
    Hoever ….. db2exfmt tells me, it expects 2 IO’s

    [db2inst1@broccoli indscn]$ db2 “explain plan for select n1 from ind1 where n1=10000”
    DB20000I The SQL command completed successfully.
    [db2inst1@broccoli indscn]$
    [db2inst1@broccoli indscn]$ db2exfmt -1 -d tinus -o exp.txt
    DB2 Universal Database Version 11.5, 5622-044 (c) Copyright IBM Corp. 1991, 2017
    Licensed Material – Program Property of IBM
    IBM DATABASE 2 Explain Table Format Tool

    Connecting to the Database.
    Connect to Database Successful.
    Output is in exp.txt.
    Executing Connect Reset — Connect Reset was Successful.
    [db2inst1@broccoli indscn]$

    When I look at the output (just showing the graph portion):
    Access Plan:
    ———–
    Total Cost: 13.5334
    Query Degree: 1

    Rows
    RETURN
    ( 1)
    Cost
    I/O
    |
    1
    IXSCAN
    ( 2)
    13.5334
    2 <= Here is my problem
    |
    100000
    INDEX: SYSIBM
    SQL220220120518310
    Q1

    What I do not understand is : why does the optimizer thinks it will need 2 IO's. I was thinking 3.

    I probably completly misunderstand something….so don't hold back…tell me where I'm going wrong. Ow … and I have a Oracle background, so maybe (probably) I should think about thing differently regarding DB2.

    Anyway….I hope I do not bother you to much
    (And I did not really know how to format this reply….sorry about that

  3. OW……that’s great.
    In the last section I found what I was looking for:

    Runtime statistics for objects Used in Access Plan:

    Metrics for the index:
    object_index_l_reads:3
    object_index_lbp_pages_found:3
    object_index_gbp_indep_pages_found_in_lbp:3

    Metrics for the table:
    rows_reads:1
    object_data_l_reads:1
    object_data_lbp_pages_found:1
    object_data_gbp_indep_pages_found_in_lbp:1
    object_data_caching_tier_gbp_indep_pages_found:1

    I see 3 logical reads on the index
    And then 1 logical read on the table
    That sums up to the 4 IO’s I was expecting.

    So…a big thank you for your help

Leave a Reply to Ember CrooksCancel Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.