High Availability Options for Db2 LUW


Why Should a Database be Highly Available?

It is significantly easier to implement high availability at other levels than it is at the database level. Often the database server is one of the more powerful servers in an environment, and without some fancy footwork there can be only one live copy of data. Ignoring high availability at the database level due to expense can be a bankruptcy-inducing choice for an organization. Just imagine what would happen if your databases suddenly did not exist. Not only would your company be unable to perform daily business, but the data lost could actually drive a company out of business.

Concepts

There are several concepts to be aware of when you’re considering the high availability options. Before starting a technical discussion, the business first must set goals and budgets. Nearly any uptime goal is possible given enough hardware and software (and therefore money). If you have a true business requirement for five 9’s, it can absolutely be done, but not on a small budget. In the vast majority of cases when a client states they have an absolute uptime requirement, upon questioning, we find areas where outages are allowable.

RTO

The first number to discuss and decide on is the Recovery Time Objective or RTO. The RTO defines how long from the time of failure detection it takes to get a database available again. Often this value is different for high availability events vs. disaster recovery events.

RPO

Next, consider the Recovery Point Objective or RPO. The RPO defines how much data loss is considered acceptable when the database becomes available again. Often this value is different for high availability events vs. disaster recovery events.

Budget

Finally, the budget is an important consideration. For some databases, a simple restore to a spare server could meet the RTO and RPO and would be vastly cheaper than any more complicated high availability strategy.

High Availability vs. Disaster Recovery

When we are talking RTO and RPO, there may be different values decided on for High Availability (HA) events vs. Disaster Recovery (DR) events. Generally HA events are things like disk or server failures or developer errors. These often have a much lower tolerance for time or data loss because they are expected in the normal course of running infrastructure.

DR events are very, very rare. I once heard someone discussing a DR scenario as “What if Dallas wasn’t there any more?”. These are often going to be natural disasters, serious terrorist attacks, or other vanishingly rare events. We still plan for these, but usually they involve not just the database being failed over somewhere else, but also the application, web, and other servers. Speed of fail over is a bit less important here. Some data loss may be tolerated. There is nearly always a geographic distance involved to prevent events that affect a region from affecting both primary and standby sites.

Not all companies do both HA and DR plans – budget or the importance of a database can mean that neither are in place or only one or the other.

The Major HA and DR Options

There are actually a vast array of high availability and disaster recovery options available with IBM Db2. Some are offered by IBM and others may be offered by third parties. This is a review of the major options. It is not intended to be comprehensive – it is likely there are other solutions out there. I’d love to hear about solutions others are using in the comments.

Solution: HADR

HADR is the simplest and often cheapest option from the DBA’s perspective. It is a shared-nothing solution and is resilient to many types of failures. HADR can be thought of much like log shipping, but instead of shipping at the log file level, the shipping is done at a memory level. There are different levels of confirmation available from the principal standby before a transaction is considered complete on the primary. I like to think of it as transaction shipping.

An HADR cluster minimally consists of a primary database server and at least one standby database server, as shown in this diagram:
[Image: HADR2]

HADR has one principal standby that can run in any of four synchronization modes. We can then add in up to two additional servers in the SUPERASYNC mode only. A two-server configuration can really only be for high availability or disaster recovery, not both. To serve both HA and DR needs, you need at least three total servers in at least two data centers.
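As a rough sketch of what a three-server topology looks like in configuration terms – the hostnames (db1, db2a, db3), port, and database name here are hypothetical – the primary lists its principal and auxiliary standbys in HADR_TARGET_LIST; auxiliary standbys always run in SUPERASYNC:

```shell
# Sketch only: hostnames, port 51012, and database name "sample" are
# hypothetical. A matching UPDATE DB CFG (with local/remote values
# swapped appropriately) is needed on each standby.
# On the primary (db1):
db2 "UPDATE DB CFG FOR sample USING
     HADR_LOCAL_HOST  db1   HADR_LOCAL_SVC  51012
     HADR_REMOTE_HOST db2a  HADR_REMOTE_SVC 51012
     HADR_TARGET_LIST db2a:51012|db3:51012
     HADR_SYNCMODE    NEARSYNC"
db2 "START HADR ON DB sample AS PRIMARY"
```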

There are four different synchronization modes (SYNCMODEs) that HADR can use. They control the potential for data loss and the potential for performance impact on your primary database. The modes are:

  • SYNC – this guarantees that data written out to disk on your primary is also on disk on your principal standby before a transaction is considered successful (committed). It has a high probability of impacting performance on your primary database. Data for every transaction must go across the network, be written to disk on both servers, and the confirmation must be received back over the network before the commit succeeds. The network between your database servers must be very fast. It is the only mode that guarantees that even if both database servers crash at the same time, neither database server will lose any data. This mode is not optimal for DR purposes.
  • NEARSYNC – this is the default and most popular mode for high availability. It ensures that before a commit is considered successful on the primary, it is first written to memory on the principal standby. This can still impact performance because you are placing the network in the middle of every database transaction. The situation in which you might lose data with this method is if both servers crash at the same time AND you cannot get the primary back up as the first server recovered. In that case, you may lose data that had been written to memory on the standby, but had not yet made it to disk. This mode is not optimal for DR purposes.
  • ASYNC – this is the first method that does not put the network in the middle of every commit. It ensures only that data has been placed in the send queue in memory on the primary (local) database server, bound for the standby server, before allowing the commit to complete. Performance is impacted only in very high volume or very slow network environments. More data loss is possible. If the primary server is lost, data in the buffer on the primary server or on the network between the two servers can be lost. This mode is for DR purposes, and unlikely to be acceptable for HA purposes.
  • SUPERASYNC – this is the only method that poses no risk of performance impact on the primary database. It does no extra work beyond what a stand-alone server does. This means the risk of data loss is high. This mode is for DR purposes, and unlikely to be acceptable for HA purposes. This is the only mode that can be used for auxiliary standbys.
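Once HADR is running, the active syncmode and state can be checked with db2pd or, on Db2 10.5 and later, the MON_GET_HADR table function (the database name here is hypothetical):

```shell
# Quick health check of HADR role, state, and syncmode
db2pd -db sample -hadr

# Or via SQL (Db2 10.5+), which also reports the log gap to each standby
db2 "SELECT HADR_ROLE, HADR_STATE, HADR_SYNCMODE, HADR_LOG_GAP
     FROM TABLE(MON_GET_HADR(NULL))"
```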

There is a collection of HADR tools that IBM offers to help you determine the performance impact of HADR given your transaction log files and details about your network and disk environment. These tools can be used even before you install Db2 on servers.

Using TSAMP, which is included with editions of Db2 that include HADR, fail over can be automated – failures will be detected and fail over of both the database and a virtual IP address will be initiated automatically. Other software can be manually configured to perform this function, too. I have seen this done with PowerHA (HACMP). I have seen instructions on how to do it with MSCS. I’m sure it could be done with other server clustering software such as Veritas or Linux-HA. To use the standard TSAMP to automate fail over, the Db2 servers must be on the same subnet. Automation for fail over is only provided between the primary and the principal standby. Fail over is manual to any DR database server(s). Usually a DR fail over will include application servers along with the database fail over. Each database should have its own virtual IP when automating fail over with TSAMP. TSAMP can also be a bit touchy and problematic.
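As a sketch of how the TSAMP automation is typically set up: the db2haicu utility configures the cluster domain, either interactively or from an XML definition (the XML filename below is hypothetical), and lssam shows resource state afterward:

```shell
# Run on the standby first, then the primary, once HADR is in peer state
db2haicu                        # interactive cluster setup
db2haicu -f hadr_cluster.xml    # or from a prepared XML definition (hypothetical file)

# Inspect TSAMP resource states once configured
lssam
```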

Any of the HADR standbys can allow reads, but the clients I have seen who enabled Reads on Standby (ROS) and tried to use it for anything other than very occasional purposes have hated it. See some of the restrictions on the standby when using ROS.

Because the standby is only a warm standby and not actively taking load, it has different licensing requirements than your primary database server. Under most licensing agreements, a server that only hosts a standby database without ROS enabled only has to be licensed at 100 PVUs – far lower than what is required for most database servers. Please verify this with IBM to ensure it is still true and that it is true in your geography.

During HADR failover, any current connections are rolled back, and their connections are re-established to the standby database (using Automatic Client Reroute – ACR). Some applications do a better job of supporting this than others do. I’ve seen multiple failovers during full load (not peak load) where there was no perceived outage of the database. I even had a client who let me apply fix packs during the day, it was so seamless for them. HADR requires an outage for major version upgrades, but for fix packs, rolling updates are possible.
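The takeover itself is a single command issued on the standby; a sketch with a hypothetical database name:

```shell
# Planned role switch (both servers up, no data loss):
db2 "TAKEOVER HADR ON DB sample"

# Unplanned failover when the primary is unreachable - may lose
# transactions that had not yet reached the standby:
db2 "TAKEOVER HADR ON DB sample BY FORCE"
```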

To protect against human errors, one of the standbys can be on a time delay. It can be an hour or more behind the primary. If an error is discovered within that window, that standby can be broken out of the HADR cluster and queried for the correct data. Only HADR and replication have the possibility of offering protections for this kind of error – the other methods outlined here do not. Most DBAs have been through the nightmare of dealing with the data issues caused when someone ran an update or a delete against a database, but forgot the where clause. This can be one of the most crippling errors and one of the hardest to recover from. Not all HA solutions cover this kind of issue.
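The delay is set with the HADR_REPLAY_DELAY database configuration parameter (in seconds) on the delayed standby, which must be running in SUPERASYNC mode; a sketch with a hypothetical database name:

```shell
# Keep this auxiliary standby one hour behind the primary
db2 "UPDATE DB CFG FOR sample USING HADR_REPLAY_DELAY 3600"
```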

HADR with TSAMP meets the needs of most of the mid-size clients I work with. It is amazingly resilient, easy, and cheap. Often when I start discussing the expenses and technical details of PureScale with a client, we find that they are not as averse to planned outages as they thought they were and that HADR meets their needs just fine.

Here’s a summary of HADR:
[Image: HADR_Summary]
If we’re translating between RDBMSes, then this is most similar to Oracle’s Data Guard, though from what I’ve heard it is easier to implement, cheaper, and has more features than that product. I have not worked with Oracle Data Guard, that’s just the impression that I get from having discussed HADR with Oracle DBAs.

Solution: OS-level Shared Disk

The concept here is that the disks the database is on can be mounted on one of two servers. Db2 is installed on each server independently on local disk. The disk should have multiple protections against failure, as this method will not help with a disk, SAN, or RAID failure (all of which I’ve seen in my career). Here is a diagram of what this solution looks like:
[Image: SharedDisk]
The software used will also manage a virtual IP address and will automatically detect failovers and fail everything over. There are a number of different software packages you can use – ones I have seen for this in the real world include Veritas, Linux-HA, and PowerHA (HACMP). TSAMP, MSCS, and RHCS also have this functionality.

This method is really for HA and not DR. It might be possible with DR if you have super-fast disk replication of some sort between sites, but I have never seen it done that way. I have seen this used in one data center, with the other data center also having a HACMP two-server cluster, and HADR shipping the transactions between the data centers.

Here’s a summary of the OS-level shared disk solution:
[Image: SharedDisk_Summary]

Db2 is only ever up on one of the servers at a time, so you only have to license it for one server.

Solution: IBM Db2 PureScale

PureScale is a highly scalable, shared-disk, active-active solution. With PureScale, the disks where the database resides are shared using GPFS among two or more servers. Multiple database servers can access the disks with locking and other shared aspects being managed by one or more cluster caching facilities (CFs). Really, the minimum number of servers you probably want for this solution is 5. This is the only active-active clustering solution that IBM offers for Db2 that is appropriate for OLTP or other transaction processing databases. Here’s the layout of a solution:
[Image: PureScale]
I believe the current limit on the number of members in a PureScale cluster is 128. The scalability when using a high-speed interconnect is really good. When PureScale was first introduced, it required specialized hardware both for the servers and for the interconnect. Those restrictions have mostly been lifted over time, and you can even use TCP/IP for the interconnect now, though for production environments you’d still want to use RoCE (RDMA over Converged Ethernet). The GPFS requirement also eliminates many hosting providers (such as Amazon EC2) – so this is a hard solution to do without dedicated hardware.

With all the components, this is a complicated solution. Talented DBAs can pick up the DBA side of it, but things like GPFS may require experts in other areas to be involved.

PureScale is an HA solution. You can add DR into the mix by adding an identical standby PureScale cluster in another location and keep it in sync with either HADR or with replication. Technically, if you have two data centers within 100 km of each other, you can do a Geographically Dispersed PureScale Cluster (GDPC), but this is so complicated that you should hire IBM Lab services to help you with it.

PureScale requires an Advanced edition of Db2. PureScale is not currently available for Db2 on Windows.

Here’s a summary of the PureScale solution:
[Image: PureScale_Summary]
PureScale is IBM’s answer to Oracle RAC. PureScale is much younger, but each server may get you more bang for your buck.

Solution: Virtual Machine full-server Fail Over

If you have a properly architected virtual environment that is not over-subscribed, then Db2 runs just great on virtual, and licensing is relatively painless as well. Poorly architected Virtual environments are the bane of a database administrator. DBAs have to learn some concepts to best work with a virtual environment.

Using vMotion to move a database server in case of a hardware failure may be a viable option, but vMotion may cause a freeze on the server that tends to be more noticeable on database servers. I have seen impacts of up to 30 seconds in practice. Use settings at the VM level that encourage VMware (or your virtualization software) not to move a Db2 VM unless the other choice is an outage. Consider potential outages of things like the SAN as well. I’ve seen a full SAN failure in a virtual environment, and the only option was to restore from backup.

This is really only a high availability solution. Some DR solutions for virtual servers can also cause freezes, so it is important to discuss that with your vendors. You might consider using HADR for DR if you’re relying on VMs for high availability. Often in a DR scenario, HADR can bring things up faster and with less data loss.

If you are combining solutions make sure that you use affinity rules to keep two or more database servers that are part of the other HA or DR solution from residing on the same host machine.

Solution: Replication (SQL, Q, CDC)

All of the other solutions covered here function at the database or server level. Replication is a different option, and is defined at the table or schema level. Because of that, it may require more effort to administer. SQL replication between Db2 servers is free for some editions of Db2. Q-replication requires additional licenses, but is faster, so if you have high transactional volume, Q-replication may be required. Here’s a diagram of what this solution looks like:
[Image: Replication]
There are several advantages to this approach. First, since you are replicating at the table level, your “standby” database can be fully queried and can even have different indexes. This makes it possible to use your standby database as a reporting server.

While bi-directional replication could let you update either side, the lag is enough that this is problematic in practice. Usually tables are only queried on the other databases, not updated.

The flexibility also means that you can copy tables to multiple databases or copy different tables to different databases.

The huge advantage with this approach is that the databases can be on different versions of Db2. This is the only solution anywhere that can keep a database up through a version-level upgrade. If you have a crazy 100% uptime requirement, this is the only solution with which you have a chance of meeting that.

The counterpoint to that RTO win is an RPO loss. It is difficult to define the possibility for data loss with this solution. There is no guarantee that if you commit a transaction on one server that data makes it to another server. You can, however, monitor the lag.
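For SQL Replication, one hedged way to watch that lag is via the replication control tables in the ASN schema – for example, comparing SYNCHTIME in IBMSNAP_SUBS_SET against the current time. Schema and column names below follow the SQL Replication defaults; adjust for your own setup:

```shell
# Approximate apply lag per subscription set (SQL Replication defaults assumed)
db2 "SELECT APPLY_QUAL, SET_NAME,
            CURRENT TIMESTAMP - SYNCHTIME AS APPROX_LAG
     FROM ASN.IBMSNAP_SUBS_SET"
```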

Fail over is entirely manual – you’ll have to have it coded into your application in some way.

This solution is more appropriate for DR than for HA, but can be used for either if the RPO meets your needs. You can also combine it with any of the options above.

The technical complexity here is in setting up replication on a table-by-table basis and managing the addition of new tables and changes to existing tables. There is also additional complexity if you use Q-replication because IBM MQ is also needed, and you may need the help of a specialist with it. A summary of Replication as a HA/DR solution:
[Image: Replication_Summary]

Conclusion

The options outlined here are by no means comprehensive. There are hundreds of combinations of solutions to attain high availability and disaster recovery, and new ones offered by vendors over time. Even if you’re using one of these solutions, you still need to regularly back up your databases. Backups can be used for data movement, worst-case recovery scenarios, and recovering from human error.

Lead Db2 Database Engineer and Service Delivery Manager , XTIVIA
Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 17 years developing a top-level expertise on Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set to the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares both posts about her core skill set and her journey into Data Science. Ember lives in Denver and works from home for XTIVIA, leading a team of Db2 DBAs.

11 comments

  1. Hi Ember, thanks for the article, really nice summarization of available HA/DR solutions!

    Got a couple of questions wondering if you could share some insight:
    1. for HADR, is vIP really necessary, besides reducing client side re-routing complexity, and split-brain? in multi-subnet deployment (eg. multi-Availability Zone in AWS), it seems TSA may not be able to manage the vip through IBM.ServiceIP (https://developer.ibm.com/answers/questions/358520/in-a-db2-hadr-environment-does-tsamp-allow-for-nod/ and https://developer.ibm.com/answers/questions/258835/can-tsamp-move-or-failover-an-ibmservice-ip-resour.html?childToView=423565#answer-423565), nor does it really check if the vip is really attached to host. IBM.Application may still be a potential option, which requires customized scripting…
    2. assuming performant shared storage is not available (eg. in AWS excluding third party software), do you have recommendations for log file management, especially on hardening archive logs out of Primary? We could use cronjob and do customized log-shipping, but there’s still the risk of losing archive logs that are not shipped yet, and also available in archive log dir (LOGARCHMETH1). (Well, we still haven’t totally figured out how active log purging works in detail on Standby, but according to IBM site Standby log files can be purged once Standby gets the confirmation from Primary that the log file has been archived…. what exactly does “has been archived” mean? but that’s tangent to my main questions here….) We could turn on both LOGARCHMETH1 and LOGARCHMETH2, but that only provides protection within a single AWS AZ, and won’t survive AWS AZ-wise EBS failure. One idea we are considering is to use EFS or block storage replication (eg. drbd) to make LOGARCHMETH2 contents synchronously available across AZ. We did the calculation, and overall the EFS and drbd transfer rate should be able to catch up with our logging rate.
    3. In general, could you point us to some best practice reference on multi-AZ HADR deployment on AWS?

    Thanks again for your help, and this awesome site!

    1. 1) The VIP is not required for HADR. You can also use ACR to automatically reroute connections to the other database server in the event of a failover. Using ACR requires either the use of type 2 jdbc drivers or any kind of architecture that goes through a full Db2 client on the application server. Some applications that do not meet that criteria (such as those using type 4 jdbc drivers with WebSphere) offer the option within the application for an alternate database that the application will use if it cannot contact the primary. Overall, the value of automated failover comes into question. You can absolutely use HADR without automating failover, but that means an actual DBA must be involved if there is a need to bring up the standby. For some applications, that meets the RTO, but for many it does not.

      2) If you’re trying to externalize logs as fast as possible, you can consider using MIRRORLOGPATH to write to some external storage. However, this is likely to have a severe performance impact, as any commit will not be considered successful until the confirmation is received that the record has been written to both places. Often the write performance on any external storage is going to make that bad. I would do everything in my power to talk you out of roll-your-own log shipping. The RPO is just not there in most solutions that I have seen.

      An archive is a specific concept related to a log file. Assuming you don’t use infinite logging (with LOGSECOND set to -1 – never use this, please), a log file is archived when all transactions in it and all previous log files have been committed or rolled back, or on database backup, or when the ARCHIVE LOG command is issued. It represents Db2 being done with the file. In a slow database, it could be days since that log file was first opened, so using archiving as a log shipping method does not provide a reasonable RPO.
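      For example, the on-demand archive mentioned above is just (database name hypothetical):

```shell
# Force Db2 to truncate and archive the current active log file
db2 "ARCHIVE LOG FOR DATABASE sample"
```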

      Db2 deletes any log files it is done with on the standby – you cannot archive logs from the standby to anywhere.

      3) First of all, if you want both HA and DR, then have one standby in the same AZ for HA, and then have one or two standbys in the other AZ. This requires manual failover when one AZ must pick up for another, but those are presumably rare and often scheduled events, right? If you are only trying to achieve DR with the multi-zone HADR, then two servers will work fine, but you’ll want to be careful in selecting the SYNCMODE – it’s likely that NEARSYNC will not work and you’ll want either ASYNC or SUPERASYNC. That increases your risk for a bad RPO. I don’t have a reference for AWS. I’ll reach out to a contact and see if IBM has anything.

      1. Hi Ember,
        Really appreciate your reply, and many helpful tips in there! Just to provide a little bit more context for you and future readers. btw, we currently use luw V10.5FP8 on Linux.

        1)
        ACR or ClientAffinity may not work for us, since we are considering using ROS (https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/Client%20Reroute section Reads on Standby)…..

        For client re-route, our current thought is to make HADR nodes publish failover/role-switch event to some config-hub, which would then push new Primary/Standby connection info to application. Application then updates the connectionPools with new connection info.

        Another less favored option is still using vIP exposed to application, but vIP just won’t be managed by IBM.ServiceIP. The second approach is less favored since a lot of customized toolings are required for the orchestration, and RTO would be at least the time to failover the vIP, plus the vIP failover operation would be SPOF by itself.

        2)
        We are not trying to externalize active logs, which should be synchronously available on Standby anyway (assuming PEER state, and of course there’s always the risk of data loss when falling out of PEER….. Actually, MIRRORLOGPATH, using locally mounted EBS, may not be a totally bad idea, which provides an extra copy of active logs/protection on Primary within the same AZ when HADR pair falls out of PEER state). The goal here is to make archived logs survive AWS AZ-wise EBS failure. We don’t rely on log-shipping to meet the RPO. Log-shipping uses cheaper storage to reduce cost, and serves mostly for log file management purposes. It should be relatively rare when we really need those archived logs, except when QRepl needs them, or when we need to reinit extra Standby, but just in case….

        Agree with not using infinite logging, which is not allowed by HADR anyway.(https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.messages.sql.doc/doc/msql01768n.html reason code 2, and https://www.ibm.com/developerworks/community/blogs/IMSupport/entry/weekly_tip_from_experts_why_infinite_logging_is_not_supported_in_hadr_configuration?lang=en)

        3) Ideally we’d love to have multiple Standby, and to have NEARSYNC (HA) across AZ, but if the perf is not satisfying, then we can only have HA within the same AZ….

        wooo, that’s a long one…. Big thanks again for your detailed explanation!

        1. Be careful with ROS – everyone I’ve ever seen use it extensively hated it. It is simply not the reporting solution that some want to make it (see High Level Overview of Reads on the HADR Standby).

          Db2 writes to the transaction log files before a commit is considered successful, so I suspect MIRRORLOGPATH there would cause serious performance issues. If it’s a last-line-of-defense approach, using LOGARCHMETH2 might be a decent option, assuming you spin through logs on a fairly regular basis. You’d likely want to keep your logfilsiz low.

          If you’re considering NEARSYNC for cross-AZ stuff, I’d try the HADR tools first to see the performance impact. Generally, the primary and the standby for SYNC or NEARSYNC need to be within 100KM of each other on a high speed network to avoid performance impact. Perhaps the AZ model works that way within the same region, but I don’t know if it is guaranteed to.

          1. 1)
            Yep, we got warned about ROS many times, and that’s why we still keep QRepl…. but it’s so tempting to use ROS, since we already pay for it……

            2)
            found this link suggesting the loggw double flush introduced by MIRRORLOGPATH is serial, which sounds less ideal (doubled commit time for each TX….) https://developer.ibm.com/answers/questions/365855/impact-of-log-mirroring-on-performance.html but otherwise MIRRORLOGPATH sounds something nice to have.

            couple of follow up question:
            2.1) is “keep your logfilsiz low” a general best practice, or specific to LOGARCHMETH2? Our logfilsiz=16384 (namely, 64MB log file size). How do we estimate if the logfilsiz is appropriate?

            2.2) Could you please elaborate a little bit more on the meaning of “spin through logs on a fairly regular basis”? not sure i understand (#newbie)…. is that something tie to LOGARCHMETH2?

            3)
            love the HADR tooling. We use db2logscan daily and model the network perf using bandwidth 100MB/sec (according to the simulator and other references, eg. https://stackoverflow.com/questions/18507405/ec2-instance-typess-exact-network-performance ) and ping time at 0.3ms (we found 2 AZs seem close enough, but haven’t found any official document from AWS on average ping-time history across AZs. btw, distance of 100KM is equivalent to about at least 0.67ms ping time), and rarely see ??, though ? still pops up from time to time, but we assume that’s acceptable.

          2. 2) Right, mirroring logs can add slowness at the most critical point in an OLTP environment. If you’re only using it for a worst-case scenario, and you have a regular HA solution in place, I’d recommend not using MIRRORLOGPATH. However, I know talented DBAs who ALWAYS mirror logs because they are afraid of corruption.
            2.1 & 2.2) I say that about log files because if you wanted to use LOGARCHMETH2 to externalize log files, you wouldn’t want a log file open for days. Generally you want to look at how often you’re archiving and aim for about 4 archives per hour on average if you can. That’s not a hard and fast number, so it’s not bad if you’re not hitting it, and every database has peaks and valleys. There are other things that impact the best size – sometimes log files are larger to accommodate ridiculously large transactions. You also keep LOGPRIMARY on the smaller side, while LOGSECOND is often larger.
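            A back-of-the-envelope way to check that target, assuming you have measured your hourly log generation rate (the numbers below are made up for illustration):

```shell
log_mb_per_hour=1024     # assumed/measured log generation rate in MB/hour
logfilsiz_pages=16384    # LOGFILSIZ in 4 KB pages (here: 64 MB files)

file_mb=$(( logfilsiz_pages * 4 / 1024 ))
archives_per_hour=$(( log_mb_per_hour / file_mb ))
echo "log file size: ${file_mb} MB, ~${archives_per_hour} archives/hour (target ~4)"
```

            With these hypothetical numbers, 64 MB files would archive roughly 16 times an hour – well above the ~4/hour rule of thumb, suggesting larger files would be reasonable.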

  2. What are your thoughts on setting up a VIP on a load balancer, instead of doing this with db2haicu? I would prefer to avoid TSAMP if possible because of what I have heard about it. The requirements I have is no automated failover, just manual. The main concern I have is the connections are so varied that I don’t think ACR will work well. I think my best option is to setup a VIP so the clients have one place to connect to, and then if I do a manual failover I can redirect the VIP to the new Primary database. I am not sure the best way to configure this.

    1. As long as you’re OK with manual failover, having a VIP via some other method is perfectly fine. You might configure ACR anyway for the clients that support it, but point to the VIP – this means supporting apps can retry against the VIP, which could be useful. You can even use a tool like PowerHA, RHCS, or MSCS to automate IP failover along with Db2 failover, though that is vastly more complicated than using TSAMP. From a usability perspective one of the awesome things about using TSAMP is that I can just issue the normal HADR takeover commands and TSAMP takes care of the IP and such.

      You obviously run the risk of the VIP failing over without the DB or vice-versa.

      In the real world, there are a kaleidoscope of technologies that are often combined in different ways to make HA work.

  3. Hi Ember,

    My DR DB is in remote catchup state. It has a huge log gap of around 780 files

    PRIMARY_LOG_FILE,PAGE,POS = S0021359.LOG, 116449, 34233627172131
    STANDBY_LOG_FILE,PAGE,POS = S0020570.LOG, 8190, 32968617411284

    Without dropping and restoring the DB on DR, is there any possible way to bring the DR DB back to peer state?

    1. I suspect you need to reinitialize (restore the DB on DR). Is the log gap dropping at all? Have you tried stopping and starting HADR? Have you looked in detail at the diagnostic logs from both the primary and the standby? Is your network speed high to your DR site?
