Db2 Basics: Buffer Pools

Posted by

What is a Buffer Pool?

A buffer pool is one of the most critical memory areas for a Db2 database. Buffer pools are used to cache data in a way that the data can be reused by other connections. They do not store the results of Query A and Query B, but they store the data pages used to produce those results.

While databases are thought of as I/O intensive, the whole point of the buffer pools is to make the impact of I/O, particularly in OLTP and smaller databases, less. Most RDBMSes cache data in some way to improve performance. Db2 will take just about as much memory as you can throw at it. I’ve run databases on systems with more memory than the size of the database. As memory has gotten cheaper, this has become more of a realistic thing. Who needs SSD when all of your data fits in memory?

That’s a bit of a joke because it’s not always possible to fit a database in memory. Db2 focuses a fair amount of effort on getting the “hot” data into memory and keeping it there. I’m old enough to have been working as a DBA when 32-bit systems were the norm. I remember the experimental phase of 64-bit Db2 and all of the conversions of systems. Why do I even mention that? On 32-bit systems, we were limited to using 2 GB of memory, which seemed huge in the early days of 32-bit systems, but by the end seemed to always be horribly constraining. Db2 did a fairly good job still of using that limited amount of memory to maximum advantage.

How Data Gets into a Buffer pool

Every single data page, index page, temporary page, or XML page must go into the buffer pool before the data on the page can be returned as a part of any result set. The preferred way of getting data into the buffer pools is by prefetching. Prefetching refers to getting the data from disk into memory before it is needed. Db2 has several algorithms to try to make this happen. The three types of prefetch are sequential, read-ahead, and list. Prefetching of data is called asynchronous I/O.

If the data does not make it in to the buffer pools ahead of time, then the agent (process or edu) that is serving the connection and executing the query will fetch the data into the buffer pool. This is referred to as synchronous I/O, and is significantly slower than asynchronous I/O. The agent is not doing other things while moving data from disk to the buffer pools.

Why Buffer pools are Critical to Performance

It should be clear, then, that buffer pools are critical to Db2 database performance. Back before auto tuning, when the default size for the default buffer pool was 1000 pages it was kind of fun to walk into a database that had basically not been touched before and with a single parameter change just make performance instantly (after a recycle) better. I remember doing that once and taking a problem query from 15 minutes to less than 30 seconds. The application folks thought I was a magician. With STMM(Self-Tuning Memory Manager) tuning most buffer pools these days, there is less opportunity for that.

Today, I see buffer pool hit ratios as a canary in the coal mine. Before you see huge impact on the end-user side, and before other metrics start to tank, when using STMM, buffer pool hit ratios will start to decline. This is my indicator to either tune SQL (which leads to less utilization of buffer pools) or add memory to the system.

How Many Buffer Pools?

You can have a number of buffer pools. When I learned some Oracle, you could only have one of each size (may have since changed). But Db2 will allow more. Why would you want more than one per page size? Well, there are some fancy things like splitting things out. Usually this is only useful at a reasonable scale, but you can use a dedicated buffer pool to do things like isolating a table you would like to remain in memory at all times. The most useful way to split things into multiple buffer pools is by access type – placing tables where random access is expected in different buffer pools from those expecting to have frequent scans. The main reason for this is that the scan of a table can force very frequently used data out of a buffer pool, thereby decreasing performance for the applications that do the more random accesses. In OLTP or mixed workloads, often the random accesses are the more important for performance.

Mapping Buffer Pools to Table Spaces

Each buffer pool can serve more than one table space. By default, there is only one buffer pool created when you create a new database. It is called IBMDEFAULTBP and it has whatever page size was specified for the default for the database when creating the database. If a size was not specified, the default is 4K.

Each table space is assigned to one and only one buffer pool. Thus if you want to isolate a table or group of tables in a buffer pool, you have to isolate it/them in their own table space.

While many databases have more than one buffer pool, it is somewhat uncommon for them to have more than 20 these days. Frequently you do end up with one of each pagesize – so 4 total. There are some things like creating a locking event monitor, that require a 32k page size table space and therefore a 32k page size buffer pool.

For analytics or DW environments, it is more common to have separate buffer pools for temporary data. A larger page size is also more standard for these environments – often 32k. If using the BLU feature, most or all data should be 32K.

Investigating the Buffer Pools in any Db2 Database

It is so common to have a very small number of buffer pools that my default SQL for testing that running a SQL statement works is:

select * from SYSCAT.BUFFERPOOLS

This is the easiest/quickest way to get basic details about the buffer pools. The output will look something like this:

BPNAME                                                                                                                           BUFFERPOOLID DBPGNAME                                                                                                                         NPAGES      PAGESIZE    ESTORE NUMBLOCKPAGES BLOCKSIZE   NGNAME
-------------------------------------------------------------------------------------------------------------------------------- ------------ -------------------------------------------------------------------------------------------------------------------------------- ----------- ----------- ------ ------------- ----------- --------------------------------------------------------------------------------------------------------------------------------
IBMDEFAULTBP                                                                                                                                1 -                                                                                                                                         -2        4096 N                  0           0 -

  1 record(s) selected.

This gives the simplest information. From this, I can see that I have only one buffer pool, and that it has a 4k pagesize. I can also tell by the NPAGES of -2 that the size of this buffer pool is automatically managed by STMM.

If you want to see buffer pool hit ratios, this query works nicely:

db2 "SELECT SUBSTR(DB_NAME,1,8) AS DB_NAME, SUBSTR(BP_NAME,1,14) AS BP_NAME,
   TOTAL_HIT_RATIO_PERCENT, DATA_HIT_RATIO_PERCENT,
   INDEX_HIT_RATIO_PERCENT, XDA_HIT_RATIO_PERCENT, DBPARTITIONNUM
   FROM SYSIBMADM.BP_HITRATIO ORDER BY DBPARTITIONNUM"

DB_NAME  BP_NAME        TOTAL_HIT_RATIO_PERCENT DATA_HIT_RATIO_PERCENT INDEX_HIT_RATIO_PERCENT XDA_HIT_RATIO_PERCENT DBPARTITIONNUM
-------- -------------- ----------------------- ---------------------- ----------------------- --------------------- --------------
TESTDB   IBMDEFAULTBP                         -                      -                       -                     -              0
TESTDB   IBMSYSTEMBP4K                        -                      -                       -                     -              0
TESTDB   IBMSYSTEMBP8K                        -                      -                       -                     -              0
TESTDB   IBMSYSTEMBP16K                       -                      -                       -                     -              0
TESTDB   IBMSYSTEMBP32K                       -                      -                       -                     -              0

  5 record(s) selected.

Notice there are 4 more buffer pools in this output than when I queried syscat.bufferpools. Those are the hidden buffer pools. I think of them as the emergency buffer pools. I distinctly remember working with a colleague to tune a database. This was back in the 8.2 days, and we both had maybe 3 years of experience. He had run the configuration wizard/advisor and applied its recommendations, not realizing its recommendations were higher for buffer pools than the memory available on the server. After recycling Db2 performance was TERRIBLE. We discovered that being unable to allocate the buffer pools at the defined sizes, Db2 was falling back on these default buffer pools. They’re 100 pages in size, which pretty much guarantees you’ll have space for them, but it’s like running without buffer pools – all I/O becomes synchronous, so you’re waiting on disk and back then (around 2004), disks were sloooow.

So just be aware they exist, but are only used if Db2 is unable to allocate the defined buffer pools.

There is also a MON_GET table function that has a lot of information available. Like all of the MON_GET table functions, there are a huge number of columns, mostly focused on metrics at a level of detail you hope to never need. You can query it using syntax like this:

WITH BPMETRICS AS (
    SELECT bp_name,
           pool_data_l_reads + pool_temp_data_l_reads +
           pool_index_l_reads + pool_temp_index_l_reads +
           pool_xda_l_reads + pool_temp_xda_l_reads as logical_reads,
           pool_data_p_reads + pool_temp_data_p_reads +
           pool_index_p_reads + pool_temp_index_p_reads +
           pool_xda_p_reads + pool_temp_xda_p_reads as physical_reads,
           member
    FROM TABLE(MON_GET_BUFFERPOOL('',-2)) AS METRICS)
   SELECT
    VARCHAR(bp_name,20) AS bp_name,
    logical_reads,
    physical_reads,
    CASE WHEN logical_reads > 0
     THEN DEC((1 - (FLOAT(physical_reads) / FLOAT(logical_reads))) * 100,5,2)
     ELSE NULL
    END AS HIT_RATIO,
    member
   FROM BPMETRICS

BP_NAME              LOGICAL_READS        PHYSICAL_READS       HIT_RATIO MEMBER
-------------------- -------------------- -------------------- --------- ------
IBMDEFAULTBP                        21271                  858     95.96      0
IBMSYSTEMBP4K                           0                    0         -      0
IBMSYSTEMBP8K                           0                    0         -      0
IBMSYSTEMBP16K                          0                    0         -      0
IBMSYSTEMBP32K                          0                    0         -      0

This gives you essentially the same information as querying SYSIBMADM.BP_HITRATIO. That’s all I’m doing in this query. Just be aware there are a bunch more metrics available.

Tuning Bufferpools

I still remember, around about Db2 8.2, when tuning the buffer pools dynamically became an option. We were supporting a big website through a peak period, and we literally sat there watching hit ratios in db2top and tweaking which buffer pools had the most memory throughout the day to optimize for the varying workloads throughout the day.

Today, we have the self-tuning memory manager (STMM). For the vast majority of implementations, STMM does just great at tuning buffer pools. Unless you’re doing something like tuning a database for vastly different workloads at night vs. during the day, just let STMM do it’s thing when it comes to buffer pools. This is for single-node Db2. I’m not up on the latest with PureScale and DPF to know if STMM works or works well for them.

Summary

I feel strangely old writing this article. It brings back many memories. I remember the bad old days when we spent a lot of time tuning buffer pools, and that’s mostly just not needed today. It is still important to understand buffer pools, and know there are more advanced topics you can dig into in this area such as page cleaning, I/O servers, scan-sharing and block-based buffer pools.

Lead Database Administrator
Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 18 years developing a top-level expertise on Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set to the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares both posts about her core skill set and her journey into Data Science. Ember lives in Denver and work from home

2 comments

  1. Hi Ember,
    not to forget “db2pd -d dbname -bufferpools”!
    It is of course a bit cryptic at first, but always a good fast way to get an overview of the current situation.
    Cheers
    Roland

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.