In the past, I have not made extensive use of the HADR tools that IBM offers. Most of my HADR setups to date have either been same-data-center using NEARSYNC or have used ASYNC to copy data between data centers. I haven’t had much cause to tweak my network settings or change my SYNCMODE settings based on hardware/networking.
However, I have a chance to make use of these tools in several scenarios now, so I thought I would share what I’m finding. I do not claim to be the foremost expert on these tools, and there is an incredible amount of detail on them available from IBM. For the full technical specifications and details on using the HADR tools, see:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/
http://www.ibm.com/developerworks/data/library/techarticle/dm-1310db2luwhadr/index.html?ca=dat-
I thought I would share my own journey with these tools to help others. Comments, corrections, and additions are all welcome in the comment form below.
What are the HADR tools?
IBM provides three major HADR tools on a developerWorks wiki site.
The HADR Simulator is used to look both at disk speed and network details around HADR. It can be used in several different ways, including helping you to troubleshoot the way HADR does name resolution.
The DB2 Log Scanner is used to look at log files and report details about your DB2 workload. The output is a bit cryptic, and this tool is best used in conjunction with the HADR Calculator. It requires real log files from a real workload, so if you’re setting up a new system, you will need to have actual work on the system before you can use it. Also, IBM will not provide the tool they use internally to uncompress automatically compressed log files, so if you want to use the Log Scanner, you’ll have to turn automatic log compression off. I tried to get the tool, but they would not give it to me.
The HADR Calculator takes input from the DB2 Log Scanner, and values that you can compute using the HADR Simulator, and tells you which HADR SYNCMODEs make the most sense for you.
These three tools do NOT require that you have DB2 on a server to run – they are fully standalone. There are versions of the first two for each operating system. The third requires that you have perl, but can be run anywhere, including on a laptop or personal computer. This gives you the flexibility to evaluate a network or server you are thinking of using before actually using it, and to analyze log files without adding workload to a server.
Using the HADR Simulator
In this post, I’m going to focus on the HADR simulator.
First of all, the download and details can be found at: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20simulator. Note that there are some child pages and links there with good detail.
The HADR Simulator is a stand-alone tool. This means that you do not need DB2 on the servers in question. It is a binary executable. To use it, you simply download it from the link above to one or more servers. You can simulate primary-standby network interaction by running it on two servers at the same time. You can also run it on one server alone to look at things like disk performance.
Simulating HADR with the HADR Simulator
To use it in the main intended way, you download the right version for your OS, place it on each of the servers in question, make sure you have execute permission on it, and execute it like this:
Primary:
simhadr_aix -lhost host1.domain -lport 18821 -rhost host2.domain -rport 18822 -role primary -syncmode NEARSYNC -t 60
Standby:
simhadr_aix -lhost host2.domain -lport 18822 -rhost host1.domain -rport 18821 -role standby
The ports in the above should be the ports you plan to use for HADR. However, you cannot use the same ports that HADR is currently running on if you happen to already be running HADR on the servers. If you try that, you will get output like this:
+ simhadr -lhost host1 -lport 18819 -rhost host2 -rport 18820 -role primary -syncmode NEARSYNC -t 60
Measured sleep overhead: 0.000004 second, using spin time 0.000004 second.
flushSize = 16 pages
Resolving local host host1 via gethostbyname()
hostname=host1
alias: host1.domain
address_type=2 address_length=4
address: 000.000.000.000
Resolving remote host host2 via gethostbyname()
hostname=host2
alias: host2.domain
address_type=2 address_length=4
address: 000.000.000.000
Socket property upon creation
BlockingIO=true
NAGLE=true
TCP_WINDOW_SCALING=32
SO_SNDBUF=262144
SO_RCVBUF=262144
SO_LINGER: onoff=0, length=0
Binding socket to local address.
bind() failed on local address. errno=67, Address already in use
You should pass in the host names exactly as you would use them with HADR. This allows the tool to show you how the names are resolving. The HADR Simulator can be used for that purpose alone if you’re having name resolution issues. The ports that you pass in must be numbers. If you normally use port names, note that /etc/services or its equivalent is not consulted.
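Before starting the simulator, the same two checks (how a host name resolves, and whether a port is already taken) can be roughed out separately. This is only a sketch under assumptions: the function names `resolve`, `port_is_free`, and `parse_port` are made up for illustration, and Python's resolver call may not behave identically to the `gethostbyname()` call the simulator reports using.

```python
import socket


def resolve(host):
    """Return (canonical name, aliases, IPv4 addresses) from the OS resolver."""
    name, aliases, addresses = socket.gethostbyname_ex(host)
    return name, aliases, addresses


def port_is_free(host, port):
    """Try to bind a TCP socket to (host, port); return False if the bind fails."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "Address already in use", as in the output above.
            return False
    return True


def parse_port(value):
    """Accept numeric ports only, mirroring the numbers-only rule described above."""
    port = int(value)  # raises ValueError for a service name
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


if __name__ == "__main__":
    # "localhost" and 18821 are placeholders; substitute your HADR host and port.
    try:
        print(resolve("localhost"))
        print("port free:", port_is_free("localhost", parse_port("18821")))
    except OSError as exc:
        print("lookup failed:", exc)
```

Run it on the server where the simulator will listen, since a bind test only says something about the local machine.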
The output from the HADR Simulator, invoked using the syntax above looks something like this:
Primary:
+ simhadr -lhost host1.domain -lport 18821 -rhost host2.domain -rport 18822 -role primary -syncmode NEARSYNC -t 60
Measured sleep overhead: 0.000004 second, using spin time 0.000004 second.
flushSize = 16 pages
Resolving local host host1.domain via gethostbyname()
hostname=host1.domain
alias: host1.domain.local
address_type=2 address_length=4
address: 000.000.000.000
Resolving remote host host2.domain via gethostbyname()
hostname=host2.domain
alias: host2.domain.local
address_type=2 address_length=4
address: 000.000.000.000
Socket property upon creation
BlockingIO=true
NAGLE=true
TCP_WINDOW_SCALING=32
SO_SNDBUF=262144
SO_RCVBUF=262144
SO_LINGER: onoff=0, length=0
Binding socket to local address.
Listening on local host TCP port 18821
Connected.
Calling fcntl(O_NONBLOCK)
Calling setsockopt(TCP_NODELAY)
Socket property upon connection
BlockingIO=false
NAGLE=false
TCP_WINDOW_SCALING=32
SO_SNDBUF=262088
SO_RCVBUF=262088
SO_LINGER: onoff=0, length=0
Sending handshake message:
syncMode=NEARSYNC flushSize=16 connTime=2014-06-15_18:24:42_UTC
Sending log flushes. Press Ctrl-C to stop.
NEARSYNC: Total 18163171328 bytes in 60.000131 seconds, 302.718861 MBytes/sec
Total 277148 flushes, 0.000216 sec/flush, 16 pages (65536 bytes)/flush
Total 18163171328 bytes sent in 60.000131 seconds. 302.718861 MBytes/sec
Total 277148 send calls, 65.536 KBytes/send,
Total 0 congestions, 0.000000 seconds, 0.000000 second/congestion
Total 4434368 bytes recv in 60.000131 seconds. 0.073906 MBytes/sec
Total 277148 recv calls, 0.016 KBytes/recv
Distribution of log write size (unit is byte):
Total 277148 numbers, Sum 18163171328, Min 65536, Max 65536, Avg 65536
Exactly 65536 277148 numbers
Distribution of log shipping time (unit is microsecond):
Total 277148 numbers, Sum 59711258, Min 175, Max 3184, Avg 215
From 128 to 255 263774 numbers
From 256 to 511 13335 numbers
From 512 to 1023 23 numbers
From 1024 to 2047 15 numbers
From 2048 to 4095 1 numbers
Distribution of send size (unit is byte):
Total 277148 numbers, Sum 18163171328, Min 65536, Max 65536, Avg 65536
Exactly 65536 277148 numbers
Distribution of recv size (unit is byte):
Total 277148 numbers, Sum 4434368, Min 16, Max 16, Avg 16
Exactly 16 277148 numbers
Standby:
+ simhadr -lhost host2.domain -lport 18822 -rhost host1.domain -rport 18821 -role standby
Measured sleep overhead: 0.000004 second, using spin time 0.000004 second.
Resolving local host host2.domain via gethostbyname()
hostname=host2.domain
alias: host2.domain.local
address_type=2 address_length=4
address: 000.000.000.000
Resolving remote host host1.domain via gethostbyname()
hostname=host1.domain
alias: host1.domain.local
address_type=2 address_length=4
address: 000.000.000.000
Socket property upon creation
BlockingIO=true
NAGLE=true
TCP_WINDOW_SCALING=32
SO_SNDBUF=262144
SO_RCVBUF=262144
SO_LINGER: onoff=0, length=0
Connecting to remote host TCP port 18821
connect() failed. errno=79, Connection refused
Retrying.
Connected.
Calling fcntl(O_NONBLOCK)
Calling setsockopt(TCP_NODELAY)
Socket property upon connection
BlockingIO=false
NAGLE=false
TCP_WINDOW_SCALING=32
SO_SNDBUF=262088
SO_RCVBUF=262088
SO_LINGER: onoff=0, length=0
Received handshake message:
syncMode=NEARSYNC flushSize=16 connTime=2014-06-15_18:24:42_UTC
Standby receive buffer size 64 pages (262144 bytes)
Receiving log flushes. Press Ctrl-C on primary to stop.
Zero byte received. Remote end closed connection.
NEARSYNC: Total 18163171328 bytes in 59.998903 seconds, 302.725057 MBytes/sec
Total 277148 flushes, 0.000216 sec/flush, 16 pages (65536 bytes)/flush
Total 4434368 bytes sent in 59.998903 seconds. 0.073907 MBytes/sec
Total 277148 send calls, 0.016 KBytes/send,
Total 0 congestions, 0.000000 seconds, 0.000000 second/congestion
Total 18163171328 bytes recv in 59.998903 seconds. 302.725057 MBytes/sec
Total 613860 recv calls, 29.588 KBytes/recv
Distribution of log write size (unit is byte):
Total 277148 numbers, Sum 18163171328, Min 65536, Max 65536, Avg 65536
Exactly 65536 277148 numbers
Distribution of send size (unit is byte):
Total 277148 numbers, Sum 4434368, Min 16, Max 16, Avg 16
Exactly 16 277148 numbers
Distribution of recv size (unit is byte):
Total 613860 numbers, Sum 18163171328, Min 376, Max 65536, Avg 29588
From 256 to 511 166 numbers
From 1024 to 2047 55614 numbers
From 2048 to 4095 8845 numbers
From 4096 to 8191 18028 numbers
From 8192 to 16383 34458 numbers
From 16384 to 32767 227758 numbers
From 32768 to 65535 264416 numbers
From 65536 to 131071 4575 numbers
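To see how the summary figures in the primary output fit together, here is a small arithmetic sketch using only numbers copied from that output. The variable names are mine, and the reading that "MBytes" means 10^6 bytes is an inference from the figures rather than something I have seen documented.

```python
# Figures copied from the primary's output above.
total_bytes = 18163171328      # "Total 18163171328 bytes"
elapsed_seconds = 60.000131    # "in 60.000131 seconds"
flushes = 277148               # "Total 277148 flushes"
bytes_per_flush = 65536        # "16 pages (65536 bytes)/flush"
shipping_time_sum_us = 59711258  # Sum of log shipping time, in microseconds

# Every flush was the same size, so flushes * size should equal the byte total.
bytes_accounted_for = flushes * bytes_per_flush

# Throughput, assuming 1 MByte = 1,000,000 bytes.
mbytes_per_sec = total_bytes / elapsed_seconds / 1_000_000

# Average wall-clock time per flush, and average shipping time per flush.
sec_per_flush = elapsed_seconds / flushes
avg_shipping_us = shipping_time_sum_us / flushes

if __name__ == "__main__":
    print("bytes accounted for:", bytes_accounted_for)
    print(f"throughput: {mbytes_per_sec:.6f} MBytes/sec")
    print(f"time per flush: {sec_per_flush:.6f} sec")
    print(f"average shipping time: {avg_shipping_us:.1f} microseconds")
```

The per-flush time and the average log shipping time come out almost the same, which is what you would expect if the primary spends nearly all of each flush waiting on the round trip to the standby.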
Ok, that’s great, right, but what do I do with that?
Well, here’s one thing – you can tune your send and receive buffers using this information. Run this process several times using different values for those, like this:
./simhadr_aix -lhost host1.domain -lport 18821 -rhost host2.domain -rport 18822 -sockSndBuf 65536 -sockRcvBuf 65536 -role primary -syncmode NEARSYNC -t 60
./simhadr_aix -lhost host2.domain -lport 18822 -rhost host1.domain -rport 18821 -sockSndBuf 65536 -sockRcvBuf 65536 -role standby
./simhadr_aix -lhost host1.domain -lport 18821 -rhost host2.domain -rport 18822 -sockSndBuf 131072 -sockRcvBuf 131072 -role primary -syncmode NEARSYNC -t 60
./simhadr_aix -lhost host2.domain -lport 18822 -rhost host1.domain -rport 18821 -sockSndBuf 131072 -sockRcvBuf 131072 -role standby
./simhadr_aix -lhost host1.domain -lport 18821 -rhost host2.domain -rport 18822 -sockSndBuf 262144 -sockRcvBuf 262144 -role primary -syncmode NEARSYNC -t 60
./simhadr_aix -lhost host2.domain -lport 18822 -rhost host1.domain -rport 18821 -sockSndBuf 262144 -sockRcvBuf 262144 -role standby
./simhadr_aix -lhost host1.domain -lport 18821 -rhost host2.domain -rport 18822 -sockSndBuf 524288 -sockRcvBuf 524288 -role primary -syncmode NEARSYNC -t 60
./simhadr_aix -lhost host2.domain -lport 18822 -rhost host1.domain -rport 18821 -sockSndBuf 524288 -sockRcvBuf 524288 -role standby
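If you want to try more buffer sizes than these four, typing the pairs by hand gets tedious. The sketch below only builds the command strings for each size, using the same flags shown above; it does not run anything. The function name `build_commands` and its defaults are my own, chosen to match the example hosts and ports in this post.

```python
# Buffer sizes tried above, in bytes (64 KB to 512 KB).
BUFFER_SIZES = [65536, 131072, 262144, 524288]


def build_commands(buf, binary="./simhadr_aix",
                   primary_host="host1.domain", standby_host="host2.domain",
                   primary_port=18821, standby_port=18822,
                   syncmode="NEARSYNC", seconds=60):
    """Return (primary_cmd, standby_cmd) for one socket buffer size."""
    buffers = f"-sockSndBuf {buf} -sockRcvBuf {buf}"
    primary_cmd = (
        f"{binary} -lhost {primary_host} -lport {primary_port} "
        f"-rhost {standby_host} -rport {standby_port} {buffers} "
        f"-role primary -syncmode {syncmode} -t {seconds}"
    )
    standby_cmd = (
        f"{binary} -lhost {standby_host} -lport {standby_port} "
        f"-rhost {primary_host} -rport {primary_port} {buffers} "
        f"-role standby"
    )
    return primary_cmd, standby_cmd


if __name__ == "__main__":
    for size in BUFFER_SIZES:
        primary_cmd, standby_cmd = build_commands(size)
        print(primary_cmd)
        print(standby_cmd)
```

Each pair still has to be started on its own server, one run at a time, so that the runs do not compete for the same network.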
In the line of output that looks like this:
NEARSYNC: Total 14220328960 bytes in 60.000083 seconds, 237.005155 MBytes/sec
Pull out the MBytes per second from each run, and graph it against the buffer size.
In my test, it was clear that the throughput leveled off at a buffer size of 128 KB. Your results are likely to vary. To leave some headroom, in this example we would choose values of 256 KB, and set them using this syntax:
db2set DB2_HADR_SOSNDBUF=262144
db2set DB2_HADR_SORCVBUF=262144
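Pulling the throughput out of each run and spotting where it levels off can also be scripted. This is a minimal sketch, assuming the summary line always has the shape shown above. The function names, the 2% tolerance, and the throughput values in the example dictionary are placeholders of mine, not measurements.

```python
import re

# Matches e.g. "NEARSYNC: Total 14220328960 bytes in 60.000083 seconds, 237.005155 MBytes/sec"
SUMMARY = re.compile(
    r"^(\w+): Total (\d+) bytes in ([\d.]+) seconds, ([\d.]+) MBytes/sec"
)


def parse_throughput(line):
    """Return (syncmode, bytes, seconds, mbytes_per_sec), or None if no match."""
    m = SUMMARY.match(line.strip())
    if m is None:
        return None
    return m.group(1), int(m.group(2)), float(m.group(3)), float(m.group(4))


def smallest_plateau_size(results, tolerance=0.02):
    """Smallest buffer size whose throughput is within `tolerance` of the best.

    `results` maps buffer size in bytes to MBytes/sec. The tolerance is an
    arbitrary choice for illustration.
    """
    best = max(results.values())
    return min(size for size, rate in results.items()
               if rate >= best * (1 - tolerance))


if __name__ == "__main__":
    line = "NEARSYNC: Total 14220328960 bytes in 60.000083 seconds, 237.005155 MBytes/sec"
    print(parse_throughput(line))

    # Placeholder numbers only, to show the shape of the input.
    example = {65536: 100.0, 131072: 200.0, 262144: 201.0, 524288: 202.0}
    print(smallest_plateau_size(example))
```

Whatever size this reports is only the point where throughput stops improving. As described above, I would still go one size up from there to leave headroom.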
This is the kind of thing I might never have gone into detail on if I didn’t blog. And yet it led to me changing parameters used and improving what I’m doing at work.
I am also interested in what I might do with some of the disk information supplied here. I sometimes have trouble getting disk information from hosting providers and, depending on the situation, there might be numbers here that I could use.
I’m really disappointed that IBM won’t share their internal log uncompression tool for use with the log scanner – I’m not sure I can justify turning off automatic log compression just to run the log scanner. Automatic log compression is one of my favorite recent features. If I get the opportunity, I’ll play with that tool and blog about it too.
Ember;
This looks like a great tool. Thanks for providing this information. Do you know or can you recommend a class that I could take to gain knowledge about HADR, etc? I’ve looked at all the general offerings but they all just cut and paste what IBM has on the subject. As an ex-IBM’r I pride myself in my DB2 knowledge and now it looks like my company will be implementing this.
Thanks,
Len.
I think this class is the official one that would cover it – likely offered by the normal IBM education providers:
https://www-03.ibm.com/services/learning/ites.wss/zz-en?pageType=course_description&cc=&courseCode=CL493G
HADR itself is actually pretty easy, and there’s a lot out there about it on the web and at conferences. When you add in TSAMP to automate failover, that gets more complicated, and I’ve never been able to find a good course on TSAMP.