Automatic Bufferpools in DB2 HADR

Posted by

Pavan

In this blog post, I will share what I learned about DB2 automatic bufferpools when using HADR.

Problem:

In a DB2 10.5 (fp3a) HADR environment, during heightened database activity on the primary database, there was delayed log replay on principal standby database. The HADR_SYNCMODE was set to SYNC. Primary and Standby database servers were in the same data center.

HADR_SYNCMODE = SYNC

Here is a quick reference to what SYNC mode is:

  • In SYNC mode, the log buffer is sent first to the standby server, written to the log files on the standby, and then written to the log files locally on the primary server and only then does the application get a successful return to their commit statement. In this manner, the transaction is on disk on both the primary and standby server so you are guaranteed never to lose a transaction if one or both servers fail.
  • In other words, in SYNC mode, two on-disk copies of data are required for transaction commit and so there is no chance of data loss. SYNC mode offers the best protection of data.
  • There is an extra cost in terms of extra time for writing on the standby and sending an acknowledgment message back to the primary. Log write and replication events happen sequentially. The total time for a log write event is the sum of (primary_log_write + log_send + standby_log_write + acknowledge_message). The communication overhead of replication in this mode is significantly higher than that of the other three synchronization modes.

If you would like to take a quick look at all the available synchronization modes in DB2 HADR, refer to Ember Crooks’ article.

The Usual Suspect – Network Problem?

My initial guess was that network layer was contributing to delays. However, from quick analysis, I learned that was not the case.
db2pd’s output for HADR showed the following:

$db2pd -db dbname -hadr | grep -i log_time
PRIMARY_LOG_TIME = 08/09/2015 11:06:18.000000 (1439132778)
STANDBY_LOG_TIME = 08/09/2015 11:06:14.000000 (1439132774)
STANDBY_REPLAY_LOG_TIME = 08/09/2015 08:55:10.000000 (1439124910)

What this indicated is that the timestamp of the latest log file (on disk) on primary and standby database servers is the same or acceptably close. This meant that all the log files that were on primary were able to make it to secondary. So, HADR was able to ship the logs without delay, essentially proving that network layer was not a problem.

How was Replay Delay Detected?

From db2pd’s output shown above, the standby replay log time was 08:55 hours while the primary log time was 11:06 hours. This indicated delay of more than 2 hours in log replay. In other words, transactions were ~2 hours behind on the standby database when compared to primary. In addition to this, I also noticed that the replay gap was increasing as the database activity on the primary continued.

Impact of Delayed Replay on Standby Database

As the database activity on the primary continued to chug along, the number of log files in the active log path on the standby database kept piling up due to delayed log replay on the standby. This eventually led to active log path on the standby to be full. At this point, DB2 stopped further database activity on the primary database.

Root Cause of Delayed Replay (Short Version)

We captured output from iostat, vmstat and other commands; we found that the root cause of the delayed replay was the difference in buffer pool sizes between primary and standby databases. This buffer pool size difference was because of STMM tuning the buffer pool on primary.

Detailed Analysis and Solving Delayed Replay (Long Version):

The command that was used to analyze IO data is below:

$  iostat -DRasl -T 10 120

Output from the above iostat command showed there were read and write differences:

Adapter:       xfers                 time                   
-----------  -------------------- ------------              
           bps tps  bread bwrtn                        
fcs0      5.1M 196.6 4.3M 850.7K  23:37:05              
fcs1      5.0M 193.3 4.1M 826.2K  23:37:05              
fcs2      5.0M 195.3 4.2M 783.6K  23:37:05              
fcs3      5.1M 199.2 4.3M 822.1K  23:37:05                                                                                      

As we can see read-speed (bread) is around 4-5M but write speed (bwrtn) is less than 1M. We suspected that this is a writing issue on the standby database.

Further, we used db2pd to analyze buffer pool activity and that helped us to pin-point why there were so few writes on the standby database when compared to the primary database. The command that was used is below.

$  db2pd -db dbname –bufferpools

On Primary:

BPID DatLRds    DatPRds     HitRatio TmpDatLRds TmpDatPRds  HitRatio  IdxLRds    IdxPRds     HitRatio TmpIdxLRds TmpIdxPRds  HitRatio         
1    128286122  8306757     93.52%   81847      0           100.00%   670806391  4499155     99.33%   0          0           00.00%       

On Standby:

                                                                   
BPID DatLRds    DatPRds     HitRatio TmpDatLRds TmpDatPRds  HitRatio   IdxLRds    IdxPRds     HitRatio TmpIdxLRds TmpIdxPRds  HitRatio         
1    23562340   1478679     93.72%          0          0           00.00%     36971419   5982190     83.82%   0          0           00.00%           

Based on the above data, it was evident that there were lots of physical reads on Index pages. Also, buffer pool hit ratio for indexes was 83.82% on the standby while on the primary it was 99.33%

This led us to look into buffer pool sizes and STMM’s role in HADR.

STMM with HADR and Buffer Pool Analysis

In IBM’s Knowledge center, although “Buffer pool operations” are listed among the operations that are replicated from the primary to the standby database, what we learned from the white paper  “The Self-Tuning Memory Manager (STMM)” is that only ‘explicit’ buffer pool operations are replicated to standby. No STMM operations are replicated.

When STMM is activated on a HADR system, memory tuning will only occur on the primary server. During takeover, when the secondary server becomes the primary server, STMM will be started on the new primary node (the former secondary node) and memory tuning will begin. At the same time, tuning will be stopped on the new secondary node (the former primary node). As a result, it may be advantageous to allow each server to run as the primary for a period of time before entering into production. This will ensure that both servers are suitably configured from a memory perspective and will ease the memory tuning process after takeover occurs.

In order to continue to take advantage of STMM on primary database, the following sequence of steps were implemented:

  1. Alter buffer pool to higher value on primary
  2. Validate that the above value (of buffer pool) got propagated to standby
  3. Alter buffer pool to AUTOMATIC on primary

Using the same db2pd command (db2pd –db dbname –bufferpools) that we used earlier, we can compare BP sizes. Initially, the buffer pool sizes are the same on primary and standby. This is because, these commands were run after a DB2 recycle.

## On Primary:
Address            Id   Name               PageSz     PA-NumPgs  BA-NumPgs  BlkSize    NumTbsp    PgsToRemov CurrentSz  PostAlter  SuspndTSCt Automatic
0x0A000102130D9580 1    IBMDEFAULTBP       32768      1003       0          0          7          0          1003       1003       0          True

## On Standby:
Address            Id   Name               PageSz     PA-NumPgs  BA-NumPgs  BlkSize    NumTbsp    PgsToRemov CurrentSz  PostAlter  SuspndTSCt Automatic
0x0A000200111E3A20 1    IBMDEFAULTBP       32768      1003       0          0          7          0          1003       1003       0          True 

Step 1: Alter bufferpool to higher value on Primary:

$db2 "alter bufferpool IBMDEFAULTBP immediate size 13124"
DB20000I  The SQL command completed successfully.

$db2pd -db dbname -bufferpools
Address            Id   Name               PageSz     PA-NumPgs  BA-NumPgs  BlkSize    NumTbsp    PgsToRemov CurrentSz  PostAlter  SuspndTSCt Automatic
0x0A000102130D9580 1    IBMDEFAULTBP       32768      13124      0          0          7          0          13124      13124      0          False

Step 2: Validate on Standby:

 ## On standby,
$db2pd -db dbname -bufferpools
Address            Id   Name               PageSz     PA-NumPgs  BA-NumPgs  BlkSize    NumTbsp    PgsToRemov CurrentSz  PostAlter  SuspndTSCt Automatic
0x0A000200111E3A20 1    IBMDEFAULTBP       32768      13124      0          0          7          0          13124      13124      0          False

From the above, standby database’s bufferpool size matches with the one on primary but not-eligible for automatic tuning (“Automatic” is set to “False” on both primary and standby).

Step 3: Alter buffer pool to AUTOMATIC on Primary

$  db2 "alter bufferpool IBMDEFAULTBP immediate size automatic"
DB20000I  The SQL command completed successfully.

Validate that STMM is turned ON on primary

$  db2pd -db dbname -bufferpools
Address            Id   Name               PageSz     PA-NumPgs  BA-NumPgs  BlkSize    NumTbsp    PgsToRemov CurrentSz  PostAlter  SuspndTSCt Automatic
0x0A000102130D9580 1    IBMDEFAULTBP       32768      13124      0          0          7          0          13124      13124      0          True

Validate on Standby

$  db2pd -db dbname -bufferpools
Address            Id   Name               PageSz     PA-NumPgs  BA-NumPgs  BlkSize    NumTbsp    PgsToRemov CurrentSz  PostAlter  SuspndTSCt Automatic
0x0A000102130D9580 1    IBMDEFAULTBP       32768      13124      0          0          7          0          13124      13124      0          True

Step 4: After some time STMM is tuning Primary bufferpool because of activity on the primary database. (In this case, BP size has been reduced from 13124 to 8400 pages)

$  db2pd -db dbname -bufferpools
Address            Id   Name               PageSz     PA-NumPgs  BA-NumPgs  BlkSize    NumTbsp    PgsToRemov CurrentSz  PostAlter  SuspndTSCt Automatic
0x0A000102130D9580 1    IBMDEFAULTBP      32768         8400        0         0          7            0       8400       8400          0          True

The Standby retains the high value because STMM is not active on Standby database. (Although it is set to TRUE as shown below, STMM operations are not replayed on standby).

$  db2pd -db dbname -bufferpools
Address            Id   Name               PageSz     PA-NumPgs  BA-NumPgs  BlkSize    NumTbsp    PgsToRemov CurrentSz  PostAlter  SuspndTSCt Automatic
0x0A000200111E3A20 1    IBMDEFAULTBP       32768      13124      0          0          7          0          13124      13124      0          True

From the above, we were able to use an ALTER BUFFERPOOL IMMEDIATE SIZE n statement on a primary and the SIZE was propagated to the standby. We then issued an ALTER BUFFERPOOL IMMEDIATE AUTOMATIC on the primary for STMM to begin tuning the buffer pool on the primary. The AUTOMATIC setting was propagated to the standby, but the standby retains the SIZE of n pages. If the standby takes over, STMM should still begin tuning the buffer pool on the former standby. The above activity solved the slow buffer pool replay issue on the standby database.

Learning Points and Conclusion

  1. STMM with HADR will only tune memory on the primary database server.
  2. It may be advantageous to allow each server to run as the primary for a period of time before entering into production. This will ensure that both servers are suitably configured from a memory perspective and will ease the memory tuning process after takeover occurs.
  3. If you would like to use STMM with HADR,
    • Alter buffer pool size to a higher value on primary in order for HADR to propagate this value to standby and then alter the buffer pool on primary to AUTOMATIC to take advantage of STMM. This leaves the bufferpool on standby at a fixed value. What this does is it will help to reduce bufferpool size related problems on the standby database like slow log replay.
    • On primary, alter bufferpool to automatic for STMM to start tuning the bufferpool. This alter gets replayed to standby. However, as STMM operations are not replayed on standby, the standby database bufferpool will continue to be at a fixed size.
    • After steps a and b, primary’s bufferpool is tuned by STMM while the standby’s bufferpool is fixed (at 6 GB in the above example).
    • When a failover happens, as the now-primary database has STMM turned ON (from step b) and so the bufferpool is eligible for tuning. On the now-secondary database, bufferpool is set to a fixed size (where STMM left it) and with STMM ON (but no STMM actions are performed).

I would like to convey my special thanks to my colleague Paul Pearon Jr. who worked along with me in troubleshooting and documenting this learning experience.


PavanPavan’s work experience includes database consulting engagements at Fortune Finance / Health care / Insurance clients. He currently works full time as a Sr. Database Engineer and is an IBM Certified Advanced Database Administrator – DB2 10.1 for Linux, UNIX and Windows.
In his current role, he manages and supports a large multi-terabyte Enterprise Data Warehouse (EDW) on DB2 LUW DPF (Database Partitioning Feature) and Highly Available (24/7) OLTP (DB2 / HADR) databases.
He was selected into “Top 10 User Speakers” at the 2014 IDUG Technical Conference in Phoenix, AZ.
His LinkedIn profile is: www.linkedin.com/in/pavankristipati/. He can be reached at pavan @db2talk.com (no space before @)
He is a regular blogger at www.db2talk.com.

Pavan Kristipati’s work experience includes database consulting engagements at Fortune Finance / Health care / Insurance clients. He currently works full time as a Sr. Database Engineer and is an IBM Certified Advanced Database Administrator – DB2 10.1 for Linux, UNIX and Windows. In his current role, he manages and supports a large multi-terabyte Enterprise Data Warehouse (EDW) on DB2 LUW DPF (Database Partitioning Feature) and Highly Available (24/7) OLTP (DB2 / HADR) databases. He was selected into “Top 10 User Speakers” at the 2014 IDUG Technical Conference in Phoenix, AZ.

2 comments

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.