In this blog post, I will share what I learned about DB2 automatic bufferpools when using HADR.
Problem:
In a DB2 10.5 (fp3a) HADR environment, log replay on the principal standby database was falling behind during periods of heightened database activity on the primary. The HADR_SYNCMODE was set to SYNC, and the primary and standby database servers were in the same data center.
HADR_SYNCMODE = SYNC
Here is a quick reference to what SYNC mode is:
- In SYNC mode, the log buffer is first sent to the standby server and written to the log files on the standby, then written to the log files locally on the primary server; only then does the application get a successful return from its commit statement. In this manner, the transaction is on disk on both the primary and the standby server, so you are guaranteed never to lose a transaction if one or both servers fail.
- In other words, in SYNC mode, two on-disk copies of data are required for transaction commit and so there is no chance of data loss. SYNC mode offers the best protection of data.
- There is an extra cost in terms of extra time for writing on the standby and sending an acknowledgment message back to the primary. Log write and replication events happen sequentially. The total time for a log write event is the sum of (primary_log_write + log_send + standby_log_write + acknowledge_message). The communication overhead of replication in this mode is significantly higher than that of the other three synchronization modes.
If you would like to take a quick look at all the available synchronization modes in DB2 HADR, refer to Ember Crooks’ article.
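Before troubleshooting further, it helps to confirm which synchronization mode is actually in effect. A quick check, assuming a database named dbname (the value lives in the database configuration):

$ db2 get db cfg for dbname | grep -i HADR_SYNCMODE

which should return a line similar to: HADR synchronization mode (HADR_SYNCMODE) = SYNC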
The Usual Suspect – Network Problem?
My initial guess was that the network layer was contributing to the delays. However, a quick analysis showed that this was not the case.
db2pd’s output for HADR showed the following:
$ db2pd -db dbname -hadr | grep -i log_time
PRIMARY_LOG_TIME = 08/09/2015 11:06:18.000000 (1439132778)
STANDBY_LOG_TIME = 08/09/2015 11:06:14.000000 (1439132774)
STANDBY_REPLAY_LOG_TIME = 08/09/2015 08:55:10.000000 (1439124910)
This indicated that the timestamps of the latest log files (on disk) on the primary and standby database servers were the same or acceptably close, meaning that all the log files on the primary were making it to the standby. HADR was shipping logs without delay, which essentially proved that the network layer was not the problem.
How was Replay Delay Detected?
From the db2pd output shown above, the standby replay log time was 08:55 while the primary log time was 11:06, a log replay delay of more than 2 hours. In other words, transactions on the standby database were about 2 hours behind the primary. I also noticed that the replay gap kept widening as database activity on the primary continued.
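On DB2 10.5, the same timestamps are also exposed through SQL, which makes it easy to compute the replay gap directly. A minimal sketch using the MON_GET_HADR table function (run from the primary; TIMESTAMPDIFF with interval code 4 returns an estimate in minutes):

$ db2 "SELECT HADR_STATE, PRIMARY_LOG_TIME, STANDBY_REPLAY_LOG_TIME, TIMESTAMPDIFF(4, CHAR(PRIMARY_LOG_TIME - STANDBY_REPLAY_LOG_TIME)) AS REPLAY_GAP_MINUTES FROM TABLE(MON_GET_HADR(NULL))"

A steadily growing REPLAY_GAP_MINUTES while PRIMARY_LOG_TIME and STANDBY_LOG_TIME stay close together points at replay, not log shipping.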
Impact of Delayed Replay on Standby Database
As database activity on the primary continued to chug along, log files kept piling up in the active log path on the standby because of the delayed replay. Eventually the active log path on the standby filled up, at which point DB2 stopped further database activity on the primary database.
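To catch this building up before the log path actually fills, the number of active log files on the standby can be watched. A rough sketch, with /db2/dbname/logs as a placeholder for the actual active log path (check "Path to log files" in the database configuration):

$ db2 get db cfg for dbname | grep -i "Path to log files"
$ ls /db2/dbname/logs/S*.LOG | wc -l

A steadily climbing file count means replay is falling behind log shipping.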
Root Cause of Delayed Replay (Short Version)
We captured output from iostat, vmstat, and other commands and found that the root cause of the delayed replay was a difference in buffer pool sizes between the primary and standby databases. The difference existed because STMM had been tuning the buffer pool on the primary only.
Detailed Analysis and Solving Delayed Replay (Long Version):
The command that was used to analyze IO data is below:
$ iostat -DRasl -T 10 120
Output from the above iostat command showed a marked difference between reads and writes:
Adapter:                 xfers                             time
-----------  ----------------------------------  ------------
             bps      tps      bread     bwrtn
fcs0         5.1M     196.6    4.3M      850.7K   23:37:05
fcs1         5.0M     193.3    4.1M      826.2K   23:37:05
fcs2         5.0M     195.3    4.2M      783.6K   23:37:05
fcs3         5.1M     199.2    4.3M      822.1K   23:37:05
As we can see, bytes read (bread) are around 4-5M while bytes written (bwrtn) are less than 1M. This pointed to a writing issue on the standby database.
We then used db2pd to analyze buffer pool activity, which helped us pinpoint why there were so few writes on the standby database compared to the primary. The command used is below.
$ db2pd -db dbname -bufferpools
On Primary:
BPID  DatLRds    DatPRds  HitRatio  TmpDatLRds  TmpDatPRds  HitRatio  IdxLRds    IdxPRds  HitRatio  TmpIdxLRds  TmpIdxPRds  HitRatio
1     128286122  8306757  93.52%    81847       0           100.00%   670806391  4499155  99.33%    0           0           00.00%
On Standby:
BPID  DatLRds   DatPRds  HitRatio  TmpDatLRds  TmpDatPRds  HitRatio  IdxLRds   IdxPRds  HitRatio  TmpIdxLRds  TmpIdxPRds  HitRatio
1     23562340  1478679  93.72%    0           0           00.00%    36971419  5982190  83.82%    0           0           00.00%
Based on the above data, it was evident that the standby was doing a large number of physical reads on index pages: the buffer pool hit ratio for indexes was 83.82% on the standby, while on the primary it was 99.33%.
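For reference, db2pd derives these hit ratios as (logical reads - physical reads) / logical reads; on the standby, (36971419 - 5982190) / 36971419 ≈ 83.82%. The same counters can also be pulled with SQL. A minimal sketch using the MON_GET_BUFFERPOOL table function (run on the primary, or on a standby with reads on standby enabled):

$ db2 "SELECT VARCHAR(BP_NAME, 20) AS BP_NAME, POOL_INDEX_L_READS, POOL_INDEX_P_READS, DEC((POOL_INDEX_L_READS - POOL_INDEX_P_READS) * 100.0 / NULLIF(POOL_INDEX_L_READS, 0), 5, 2) AS IDX_HIT_RATIO FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2))"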
This led us to look into buffer pool sizes and STMM’s role in HADR.
STMM with HADR and Buffer Pool Analysis
In IBM’s Knowledge Center, “Buffer pool operations” are listed among the operations that are replicated from the primary to the standby database. However, what we learned from the white paper “The Self-Tuning Memory Manager (STMM)” is that only explicit buffer pool operations are replicated to the standby; no STMM operations are replicated.
When STMM is activated on a HADR system, memory tuning will only occur on the primary server. During takeover, when the secondary server becomes the primary server, STMM will be started on the new primary node (the former secondary node) and memory tuning will begin. At the same time, tuning will be stopped on the new secondary node (the former primary node). As a result, it may be advantageous to allow each server to run as the primary for a period of time before entering into production. This will ensure that both servers are suitably configured from a memory perspective and will ease the memory tuning process after takeover occurs.
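A quick way to see how self-tuning memory is configured on each server is the SELF_TUNING_MEM database configuration parameter (a sketch, assuming a database named dbname; on an HADR pair, tuning is only ever active on the current primary):

$ db2 get db cfg for dbname | grep -i SELF_TUNING_MEM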
In order to continue taking advantage of STMM on the primary database, the following sequence of steps was implemented:
- Alter buffer pool to higher value on primary
- Validate that the above value (of buffer pool) got propagated to standby
- Alter buffer pool to AUTOMATIC on primary
Using the same db2pd command ($ db2pd -db dbname -bufferpools) that we used earlier, we can compare buffer pool sizes. Initially, the buffer pool sizes are the same on the primary and the standby because these commands were run right after a DB2 recycle.
On Primary:
Address            Id Name         PageSz PA-NumPgs BA-NumPgs BlkSize NumTbsp PgsToRemov CurrentSz PostAlter SuspndTSCt Automatic
0x0A000102130D9580 1  IBMDEFAULTBP 32768  1003      0         0       7       0          1003      1003      0          True

On Standby:
Address            Id Name         PageSz PA-NumPgs BA-NumPgs BlkSize NumTbsp PgsToRemov CurrentSz PostAlter SuspndTSCt Automatic
0x0A000200111E3A20 1  IBMDEFAULTBP 32768  1003      0         0       7       0          1003      1003      0          True
Step 1: Alter bufferpool to higher value on Primary:
$db2 "alter bufferpool IBMDEFAULTBP immediate size 13124" DB20000I The SQL command completed successfully. $db2pd -db dbname -bufferpools Address Id Name PageSz PA-NumPgs BA-NumPgs BlkSize NumTbsp PgsToRemov CurrentSz PostAlter SuspndTSCt Automatic 0x0A000102130D9580 1 IBMDEFAULTBP 32768 13124 0 0 7 0 13124 13124 0 False
Step 2: Validate on Standby:
On Standby:
$ db2pd -db dbname -bufferpools
Address            Id Name         PageSz PA-NumPgs BA-NumPgs BlkSize NumTbsp PgsToRemov CurrentSz PostAlter SuspndTSCt Automatic
0x0A000200111E3A20 1  IBMDEFAULTBP 32768  13124     0         0       7       0          13124     13124     0          False
From the above, the standby database’s buffer pool size matches the one on the primary, but neither is eligible for automatic tuning (“Automatic” is set to “False” on both primary and standby).
Step 3: Alter buffer pool to AUTOMATIC on Primary
$ db2 "alter bufferpool IBMDEFAULTBP immediate size automatic" DB20000I The SQL command completed successfully.
Validate that STMM is turned ON on primary
$ db2pd -db dbname -bufferpools
Address            Id Name         PageSz PA-NumPgs BA-NumPgs BlkSize NumTbsp PgsToRemov CurrentSz PostAlter SuspndTSCt Automatic
0x0A000102130D9580 1  IBMDEFAULTBP 32768  13124     0         0       7       0          13124     13124     0          True
Validate on Standby
$ db2pd -db dbname -bufferpools
Address            Id Name         PageSz PA-NumPgs BA-NumPgs BlkSize NumTbsp PgsToRemov CurrentSz PostAlter SuspndTSCt Automatic
0x0A000200111E3A20 1  IBMDEFAULTBP 32768  13124     0         0       7       0          13124     13124     0          True
Step 4: After some time, STMM tunes the primary’s buffer pool in response to activity on the primary database. (In this case, the buffer pool size was reduced from 13124 to 8400 pages.)
$ db2pd -db dbname -bufferpools
Address            Id Name         PageSz PA-NumPgs BA-NumPgs BlkSize NumTbsp PgsToRemov CurrentSz PostAlter SuspndTSCt Automatic
0x0A000102130D9580 1  IBMDEFAULTBP 32768  8400      0         0       7       0          8400      8400      0          True
The standby retains the higher value because STMM is not active on the standby database. (Although “Automatic” is set to “True” as shown below, STMM operations are not replayed on the standby.)
$ db2pd -db dbname -bufferpools
Address            Id Name         PageSz PA-NumPgs BA-NumPgs BlkSize NumTbsp PgsToRemov CurrentSz PostAlter SuspndTSCt Automatic
0x0A000200111E3A20 1  IBMDEFAULTBP 32768  13124     0         0       7       0          13124     13124     0          True
From the above, we were able to use an ALTER BUFFERPOOL ... IMMEDIATE SIZE n statement on the primary, and the SIZE was propagated to the standby. We then issued ALTER BUFFERPOOL ... IMMEDIATE SIZE AUTOMATIC on the primary so that STMM would begin tuning the buffer pool there. The AUTOMATIC setting was propagated to the standby, but the standby retained the SIZE of n pages. If the standby takes over, STMM should still begin tuning the buffer pool on the former standby. This activity solved the slow log replay issue on the standby database.
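As a cross-check, the system catalog can also be queried: in SYSCAT.BUFFERPOOLS, an NPAGES value of -2 indicates a buffer pool defined as AUTOMATIC. A quick sketch:

$ db2 "SELECT VARCHAR(BPNAME, 20) AS BPNAME, NPAGES, PAGESIZE FROM SYSCAT.BUFFERPOOLS"

Note that the catalog reflects the last CREATE/ALTER BUFFERPOOL statement, so it reads -2 on both primary and standby; db2pd is the place to see the actual runtime size on each server.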
Learning Points and Conclusion
- STMM with HADR will only tune memory on the primary database server.
- It may be advantageous to allow each server to run as the primary for a period of time before entering into production. This will ensure that both servers are suitably configured from a memory perspective and will ease the memory tuning process after takeover occurs.
- If you would like to use STMM with HADR:
  a. Alter the buffer pool to a higher (fixed) value on the primary so that HADR propagates this value to the standby. Keeping the standby’s buffer pool at an adequately large fixed size helps avoid buffer pool related problems on the standby, such as slow log replay.
  b. Then alter the buffer pool on the primary to AUTOMATIC so that STMM starts tuning it. This ALTER is replayed on the standby, but because STMM operations are not replayed, the standby’s buffer pool stays at the fixed size.
- After steps a and b, the primary’s buffer pool is tuned by STMM while the standby’s buffer pool remains fixed (at 13124 pages in the example above).
- When a failover happens, the now-primary database has STMM turned ON (from step b), so its buffer pool is eligible for tuning. On the now-standby database, the buffer pool is set to a fixed size (wherever STMM left it) with “Automatic” still showing True, but no STMM actions are performed. A quick way to rehearse and verify this is sketched below.
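If you want to rehearse this behaviour end to end in a test environment, a minimal sketch (assuming a database named dbname; run the takeover on the standby being promoted):

$ db2 takeover hadr on db dbname
$ db2pd -db dbname -hadr | grep -i HADR_ROLE
$ db2pd -db dbname -bufferpools

On the new primary, “Automatic” should show True and CurrentSz should start changing once STMM begins tuning; on the new standby, the buffer pool should stay at whatever size STMM last set on it.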
I would like to convey my special thanks to my colleague Paul Pearon Jr. who worked along with me in troubleshooting and documenting this learning experience.
Pavan’s work experience includes database consulting engagements at Fortune Finance / Health care / Insurance clients. He currently works full time as a Sr. Database Engineer and is an IBM Certified Advanced Database Administrator – DB2 10.1 for Linux, UNIX and Windows.
In his current role, he manages and supports a large multi-terabyte Enterprise Data Warehouse (EDW) on DB2 LUW DPF (Database Partitioning Feature) and Highly Available (24/7) OLTP (DB2 / HADR) databases.
He was selected into “Top 10 User Speakers” at the 2014 IDUG Technical Conference in Phoenix, AZ.
His LinkedIn profile is: www.linkedin.com/in/pavankristipati/. He can be reached at pavan @db2talk.com (no space before @)
He is a regular blogger at www.db2talk.com.