It took me a while to fully understand the difference between HADR_TIMEOUT and HADR_PEER_WINDOW. There seems to be some confusion about them, so I’d like to explain what each means and offer some considerations for setting them. In general, HADR_TIMEOUT is the only one of the two you need when running HADR by itself; HADR_PEER_WINDOW only matters when you are using TSA (db2haicu) or some other automated failover tool.


HADR_TIMEOUT defines, in seconds, how long after the other HADR server is first noticed to be unavailable the HADR state changes from connected to disconnected. If you are starting HADR on the primary server and the primary cannot connect to the standby within this number of seconds, the start will fail and HADR will not be running. Assuming no failover software and HADR_PEER_WINDOW set to 0, the primary server will continue processing transactions without sending them to the standby. It will periodically retry the connection to the standby, and if the standby becomes available, it will again start processing transactions with commits tied to the requirements of the SYNCMODE being used.

If you attempt a takeover without force, DB2 will spend up to HADR_TIMEOUT seconds trying to communicate with the other server before failing and returning an error message.

The real point of this time period is to let minor network hiccups pass without any other action being taken, while still declaring the connection failed after a reasonable period so that transactions are not held up indefinitely.

The right value depends on your network. I have a client with frequent network issues where I keep this value at 300. For other clients I simply use 120, which seems to work well for most environments. I have seen it set as low as 10 seconds for a very highly available network where even seconds of slowdown are not acceptable, but I would be very cautious about setting it that low.
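As a minimal sketch of how a value like this gets applied, the helper below builds the `db2 update db cfg` command line for a set of HADR parameters. The function name `build_update_cmd` and the database name `SAMPLE` are placeholders of mine, not anything from a real environment, and the script only prints the command rather than running it.

```python
def build_update_cmd(db_name, params):
    """Build a `db2 update db cfg` command line for the given parameters.

    db_name and the parameter values are illustrative placeholders.
    """
    if not params:
        raise ValueError("no parameters given")
    for name, value in params.items():
        if value < 0:
            raise ValueError(f"{name} must not be negative")
    # Each setting is written as "NAME value", separated by spaces.
    settings = " ".join(f"{name} {value}" for name, value in params.items())
    return f"db2 update db cfg for {db_name} using {settings}"


if __name__ == "__main__":
    # 120 seconds is the value the post uses for most environments.
    print(build_update_cmd("SAMPLE", {"HADR_TIMEOUT": 120}))
```

Depending on your Db2 version, a change to an HADR parameter may not take effect until HADR or the database is restarted, so check the documentation for your release, and keep the value the same on the primary and the standby.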


HADR_PEER_WINDOW is not usually used when only HADR is in place with manual failover, but it is critical if you are using automated failover for HADR, such as TSA (db2haicu) or similar tools. It tells DB2 how long AFTER the connection is considered failed to continue behaving as if the connection had not failed. That may sound a bit odd, but the real intention is to let the connection be considered failed and then give the failover automation software time to detect that failure before any transactions are allowed to complete and compromise the data. This means connections can easily be left waiting for as long as HADR_TIMEOUT plus HADR_PEER_WINDOW before a failover completes and your database is available again.
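To make the arithmetic concrete, here is a simplified sketch of that timeline. The function names are hypothetical, and the model deliberately ignores SYNCMODE details and cases where Db2 detects a closed connection right away; it only illustrates the sequence described above, not what Db2 does internally.

```python
def primary_behavior(seconds_since_contact_lost, hadr_timeout, hadr_peer_window):
    """Describe, in simplified terms, what the primary is doing at a given moment."""
    if seconds_since_contact_lost < hadr_timeout:
        return "waiting on standby (connection not yet considered failed)"
    if seconds_since_contact_lost < hadr_timeout + hadr_peer_window:
        return "peer window (connection failed, commits still held)"
    return "disconnected (primary commits without the standby)"


def worst_case_wait(hadr_timeout, hadr_peer_window):
    """Longest time, in seconds, a connection may wait under this simple model."""
    return hadr_timeout + hadr_peer_window


if __name__ == "__main__":
    timeout, peer_window = 120, 300  # example values mentioned in this post
    for t in (30, 200, 500):
        print(t, "->", primary_behavior(t, timeout, peer_window))
    print("worst case:", worst_case_wait(timeout, peer_window), "seconds")
```

With HADR_TIMEOUT at 120 and HADR_PEER_WINDOW at 300, the worst case under this model is 420 seconds, or seven minutes, of waiting connections. With HADR_PEER_WINDOW at 0, the middle phase disappears and the primary moves on as soon as the timeout expires.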

Most frequently I see HADR_PEER_WINDOW set to 300 out of an abundance of caution – actual takeovers do not generally take that long, though in a failure state there may be multiple factors slowing down the failover.

Lead Db2 Database Engineer and Service Delivery Manager, XTIVIA
Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 17 years developing top-level expertise in Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set to the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares posts about both her core skill set and her journey into Data Science. Ember lives in Denver and works from home for XTIVIA, leading a team of Db2 DBAs.


  1. Good to know !
    We have just implemented HADR + TSA and our settings are

    HADR timeout value (HADR_TIMEOUT) = 20
    HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
    HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 60

    Nobody has complained, so far and HADR + TSA behave as expected (very well) -for now-.
    Let’s give hope a chance. 🙂

    Thanks for sharing, Ember.

  2. In our Db2 HADR/db2haicu system, Db2 behaves strangely when there is a network issue: Db2 initiates a takeover, then the new primary crashes, HADR shuts down to avoid split brain, later there is a takeover back to the old primary, and then both end up in a hung state.
    Manual takeover works perfectly, as does any system-down scenario. We only have issues with network failures. We are also using a VIP.
    I found that both HADR_TIMEOUT and HADR_PEER_WINDOW are set to 120 seconds. Could that cause these issues?
