Why Should a Database be Highly Available?
It is significantly easier to implement high availability at other levels than it is at the database level. Often the database server is one of the more powerful servers in an environment, and without some fancy footwork there can be only one live copy of data. Ignoring high availability at the database level due to expense can be a bankruptcy-inducing choice for an organization. Just imagine what would happen if your databases suddenly did not exist. Not only would your company be unable to perform daily business, but the data lost could actually drive a company out of business.
There are several concepts to be aware of when you’re considering the high availability options. Before starting a technical discussion, the business first must set goals and budgets. Nearly any uptime goal is possible given enough hardware and software (and therefore money). If you have a true business requirement for five 9’s, it can absolutely be done, but not on a small budget. In the vast majority of cases when a client states they have an absolute uptime requirement, upon questioning, we find areas where outages are allowable.
The first number to discuss and decide on is the Recovery Time Objective or RTO. The RTO defines how long from the time of failure detection it takes to get a database available again. Often this value is different for high availability events vs. disaster recovery events.
Next, consider the Recovery Point Objective or RPO. The RPO defines when that database becomes available again, how much data loss is considered acceptable. Often this value is different for high availability events vs. disaster recovery events.
Finally, the budget is an important consideration. For some databases, a simple restore to a spare server could meet the RTO and RPO and would be vastly cheaper than any more complicated high availability strategy.
High Availability vs. Disaster Recovery
When we are talking RTO and RPO, there may be different values decided on for High Availability(HA) events vs. Disaster Recovery(DR) events. Generally HA events are things like disk or server failures or developer errors. These often have a much lower tolerance for time or data loss because they are expected in the normal course of running infrastructure.
DR events are very, very rare. I once heard someone discussing a DR scenario as “What if Dallas wasn’t there any more?”. These are often going to be natural disasters, serious terrorist attacks, or other vanishingly rare events. We still plan for these, but usually they involve not just the database being failed over somewhere else, but also the application, web, and other servers. Speed of fail over is a bit less important here. Some data loss may be tolerated. There is nearly always a geographic distance involved to prevent events that affect a region from affecting both primary and standby sites.
Not all companies do both HA and DR plans – budget or the importance of a database can mean that neither are in place or only one or the other.
The Major HA and DR Options
There are actually a vast array of high availability and disaster recovery options available with IBM Db2. Some are offered by IBM and others may be offered by third parties. This is a review of the major options. It is not intended to be comprehensive – it is likely there are other solutions out there. I’d love to hear about solutions others are using in the comments.
HADR is the simplest and often cheapest option from the DBA’s perspective. It is a shared-nothing solution. It is resilient to a large number of types of failures. HADR can be thought of a lot like log shipping, but instead of shipping at the log file level, the shipping is done at a memory level. There are different levels of confirmation available from the principal standby before a transaction is considered complete on the primary. I like to think of it as transaction shipping.
HADR has one principal standby that can run in any of four synchronization modes. We can then add in up to two additional servers in the SUPERASYNC mode only. A two-server configuration can really only be for high availability or disaster recovery, not both. To serve both HA and DR needs, you need at least three total servers in at least two data centers.
There are four different synchronization modes (SYNCMODEs) that HADR can use. They control the potential for data loss and the potential for performance impact on your primary database. The modes are:
- SYNC – this guarantees that data written out to disk on your primary is also on disk on your principal standby before a transaction is considered successful (committed). It has a high probability of impacting performance on your primary database. Data for every transaction must go across the network, be written to disk on both servers, and the confirmation must be received back over the network before the commit succeeds. The network between your database servers must be very fast. It is the only mode that guarantees that even if both database servers crash at the same time, either database server will not loose any data. This mode is not optimal for DR purposes.
- NEARSYNC – this is the default and most popular mode for high availability. It ensures that before a commit is considered successful on the primary, it is first written to memory on the principal standby. This can still impact performance because you are placing the network in the middle of every database transaction. The situation in which you might loose data with this method is if both servers crash at the same time AND you cannot get the primary back up as the first server recovered. In that case, you may lose data that had been written to memory on the standby, but had not yet made it to disk. This mode is not optimal for DR purposes
- ASYNC – this is the first method that does not put the network in the middle of every commit. It ensures only that data is written to memory on the primary (local) database server to the queue to go to the standby server before allowing the commit to complete. Performance is impacted only in very high volume or very slow network environments. More data loss is possible. If the primary server is lost, data in the buffer on the primary server or on the network between the two servers can be lost. This mode is for DR purposes, and unlikely to be acceptable for HA purposes.
- SUPERASYNC – this is the only method that poses no risk of performance impact on the primary database. It does no extra work beyond what a stand-alone server does. This means the risk of data loss is high. This mode is for DR purposes, and unlikely to be acceptable for HA purposes. This is the only mode that can be used for auxiliary standbys.
There is a collection of HADR tools that IBM offers to help you determine the performance impact of HADR given your transaction log files and details about your network and disk environment. These tools can be used even before you install DB2 on servers.
Using TSAMP, which is included with editions of Db2 that include HADR, fail over can be automated – failures will be detected and fail over of both the database and a virtual IP address will be initiated automatically. Other software can be manually configured to perform this function, too. I have seen this done with PowerHA (HACMP). I have seen instructions on how to do it with MSCS. I’m sure it could be done with other server clustering software such as Veritas or Linux-HA. To use the standard TSAMP to automate fail over, the Db2 servers must be on the same subnet. Automation for fail over is only provided between the primary and the principal standby. Fail over is manual to any DR database server(s). Usually a DR fail over will include application servers along with the database fail over. Each database should have it’s own virtual IP when automating fail over with TSAMP. TSAMP can also be a bit touchy and problematic.
Any of the HADR standbys can allow reads, but the clients I have seen enable Reads on Standby (ROS) and tried to use it for anything other than very occasional purposes have hated it. See some of the restrictions on the standby when using ROS.
Because the standby is only a warm standby and not actively taking load, it has different licensing requirements than your primary database server. Under most licensing agreements, a server that only hosts a standby database without ROS enabled only has to be licensed at 100 PVUs – far lower that what is required for most database servers. Please verify this with IBM to ensure it is still true and that it is true in your geography.
During HADR failover, any current connections are rolled back, and their connections are re-established to the standby database (using Automatic Client Reroute – ACR). Some applications do a better job of supporting this than others do. I’ve seen multiple failovers during full load (not peak load) where there was no perceived outage of the database. I even had a client who let me apply fix packs during the day, it was so seamless for them. HADR requires an outage for major version upgrades, but for fix packs, rolling updates are possible.
To protect against human errors, one of the standbys can be on a time delay. It can be an hour or more behind the primary. If an error is discovered within that window, that standby can be broken out of the HADR cluster and queried for the correct data. Only HADR and replication have the possibility of offering protections for this kind of error – the other methods outlined here do not. Most DBAs have been through the nightmare of dealing with the data issues caused when someone ran an update or a delete against a database, but forgot the where clause. This can be one of the most crippling errors and one of the hardest to recover from. Not all HA solutions cover this kind of issue.
HADR with TSAMP meets the needs of most of the mid-size clients I work with. It is amazingly resilient, easy, and cheap. Often when I start discussing the expenses and technical details of PureScale with a client, we find that they are not as adverse to planned outages as they thought they were and that HADR meets their needs just fine.
Here’s a summary of HADR:
If we’re translating between RDBMSes, then this is most similar to Oracle’s Data Guard, though from what I’ve heard it is easier to implement, cheaper, and has more features than that product. I have not worked with Oracle Data Guard, that’s just the impression that I get from having discussed HADR with Oracle DBAs.
Solution: OS-level Shared Disk
The concept here is that the disks the database is on can be mounted on one of two servers. Db2 is installed on each server independently on local disk. The disk should have multiple protections against failure, as this method will not help with a disk, SAN, or RAID failure (all of which I’ve seen in my career). Here is a diagram of what this solution looks like:
The software used will also manage a virtual IP address and will automatically detect failovers and fail everything over. There are a number of different software packages you can use – ones I have seen for this in the real world include Veritas, Linux-HA, and PowerHA (HACMP). TSAMP, MSCS, and RHCS also have this functionality.
This method is really for HA and not DR. It might be possible with DR if you have super-fast disk replication of some sort between sites, but I have never seen it done that way. I have seen this used in one data center, with the other data center also having a HACMP two-server cluster, and HADR shipping the transactions between the data centers.
Db2 is only ever up on one of the servers at a time, so you only have to license it for one server.
Solution: IBM Db2 PureScale
PureScale is a highly scalable, shared-disk, active-active solution. With PureScale, the disks where the database resides are shared using GPFS among two or more servers. Multiple database servers can access the disks with locking and other shared aspects being managed by one or more Clustering Facilities (CF). Really, the minimum number of servers you probably want for this solution is 5. This is the only active-active clustering solution that IBM offers for Db2 that is appropriate for OLTP or other transaction processing databases. Here’s the layout of a solution:
I believe the current limit on the number of members in a PureScale cluster is 128. The scalability when using a high-speed interconnect is really good. When PureScale was first introduced, it required specialized hardware both for the servers and for the interconnect. Those restrictions have mostly been lifted over time, and you can even use TCP/IP for the interconnect now, though for production environments you’d still want to use ROCE(RDMA Over Converged Ethernet). The GPFS requirement also eliminates many hosting providers (such as Amazon EC2) – so this is a hard solution to do without dedicated hardware.
With all the components, this is a complicated solution. Talented DBAs can pick up the DBA side of it, but things like GPFS may require experts in other areas to be involved.
PureScale is an HA solution. You can add DR into the mix by adding an identical standby PureScale cluster in another location and keep it in sync with either HADR or with replication. Technically, if you have two data centers within 100 km of each other, you can do a Geographically Dispersed PureScale Cluster (GDPC), but this is so complicated that you should hire IBM Lab services to help you with it.
PureScale requires an Advanced edition of Db2. PureScale is not currently available for Db2 on Windows.
Solution: Virtual Machine full-server Fail Over
If you have a properly architected virtual environment that is not over-subscribed, then Db2 runs just great on virtual, and licensing is relatively painless as well. Poorly architected Virtual environments are the bane of a database administrator. DBAs have to learn some concepts to best work with a virtual environment.
Using a vMotion to move a database server in case of a hardware failure may be a viable option, but often the vMotion may cause a freeze on the server that tends to be a bit more noticeable on database servers. I have seen impacts of up to 30 seconds in practice. Use settings at the VM level that encourage VMWare (or your virtualization software) not to move a Db2 VM unless the other choice is an outage. Consider potential outages of things like SAN as well. I’ve seen a full SAN failure in a virtual environment, and the only option was to restore from backup.
This is really only a high availability solution. Some DR solutions for virtual servers can also cause freezes, so it is important to discuss that with your vendors. You might consider using HADR for DR if you’re relying on VMs for high availability. Often in a DR scenario, HADR can bring things up faster and with less data loss.
If you are combining solutions make sure that you use affinity rules to keep two or more database servers that are part of the other HA or DR solution from residing on the same host machine.
Solution: Replication (SQL, Q, CDC)
All of the other solutions covered here function at the database or server level. Replication is a different option, and is defined at the table or schema level. Because of that, it may require more effort to administer. SQL replication between Db2 servers is free for some editions of Db2. Q-replication requires additional licenses, but is faster, so if you have high transactional volumne, Q-replication may be required. Here’s a diagram of what this solution looks like:
There are several advantages to this approach. First, since you are replicating at the table level, your “standby” database can be fully queried and can even have different indexes. This makes it possible to use your standby database as a reporting server.
While Bi-directional replication could let you update either side, the lag is enough that this is problematic in practice. Usually tables are only queried on other databases, and not updated.
The flexibility also means that you can copy tables to multiple databases or copy different tables to different databases.
The huge advantage with this approach is that the databases can be on different versions of Db2. This is the only solution anywhere that can keep a database up through a version-level upgrade. If you have a crazy 100% uptime requirement, this is the only solution with which you have a chance of meeting that.
The counterpoint to that RTO win is an RPO loss. It is difficult to define the possibility for data loss with this solution. There is no guarantee that if you commit a transaction on one server that data makes it to another server. You can, however, monitor the lag.
Fail over is entirely manual – you’ll have to have it coded into your application in some way.
This solution is more appropriate for DR than for HA, but can be used for either if the RPO meets your needs. You can also combine it with any of the options above.
The technical complexity here is in setting up replication on a table-by-table basis, managing the addition of new tables and changes to existing tables. There is also additional complexity if you use Q-replication because MQ series is also needed, and you may need the help of a specialist with it. A summary of Replication as a HA/DR solution:
The options outlined here are by no means comprehensive. There are hundreds of combinations of solutions to attain high availability and disaster recovery, and new ones offered by vendors over time. Even if you’re using one of these solutions, you still need to regularly back up your databases. Backups can be used for data movement, worst-case recovery scenarios, and recovering from human error.