Speeding up DB2 Backups

It’s a question I hear frequently – How can I speed up backups? I thought I’d share some details on doing so.

Database Layout

A backup can never be faster than the time it takes to back up its largest tablespace. Parallelism is essentially done at the tablespace level – meaning one thread per tablespace. That means that if the majority of your data is in a single tablespace, your backups will be slower than if the data were evenly spread across a number of smaller tablespaces.

Obviously there are a lot of factors that go into tablespace design. But one of the big ones I now take into consideration is backup speed.

WebSphere Commerce is a vended database that by default places all data into one or two tablespaces. This makes things “easy”, but it also means that many unaltered WCS databases have backups that run slower than they could. Some vendors will not let you change tablespace layout, but WCS does, so if you’re looking at a significantly sized WCS implementation, you will likely want to customize your tablespace layout – for this and other reasons. For smaller WCS implementations it won’t matter as much.

Other vendors have different rules on whether you can customize your tablespace layout at all. If it’s a custom database, you should certainly not just plop everything in USERSPACE1, especially for databases of 50 GB or greater.
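To get a rough picture of how evenly data is spread across your tablespaces, a query against MON_GET_TABLESPACE (available in 9.7 and later) works well. This is just a sketch – adjust the columns and sorting to taste:

db2 "SELECT SUBSTR(tbsp_name,1,30) AS tbsp_name, (tbsp_used_pages * tbsp_page_size) / 1024 / 1024 AS used_mb FROM TABLE(MON_GET_TABLESPACE(NULL, -2)) ORDER BY used_mb DESC"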

If you have a database where data is currently concentrated in a single tablespace, you can move tables online from one tablespace to another using ADMIN_MOVE_TABLE. Due to RI restrictions, you’ll want to be on 10.1, fixpack 2 or higher to make this reasonable. Particularly high-activity tables may be more problematic when it comes to getting the final lock for the move.
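As a sketch, the simplest form of the call looks something like this – the schema, table, and target tablespace names here are made up, and you’ll want to review the ADMIN_MOVE_TABLE options for your own situation:

db2 "CALL SYSPROC.ADMIN_MOVE_TABLE('DB2INST1', 'ORDERS', 'TBSP_DAT_ORDERS', 'TBSP_IX_ORDERS', 'TBSP_LOB_ORDERS', '', '', '', '', '', 'MOVE')"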

UTIL_HEAP_SZ

When a backup starts, DB2 allocates half of the remaining utility heap for the backup operation (this is where the buffers are allocated). This means that you not only must have at least twice the amount of memory that a backup needs allocated to the utility heap, you must also be aware of the usage of the utility heap by other utilities – LOAD, REDISTRIBUTE, etc. See my entry on the utility heap for more information.
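Checking and bumping the utility heap is straightforward. A sketch, assuming a database named SAMPLE and a target of 256 MB (65536 4K pages) – pick a value that makes sense for your own backup and LOAD activity:

db2 get db cfg for SAMPLE | grep -i util_heap
db2 update db cfg for SAMPLE using UTIL_HEAP_SZ 65536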

But how do you determine the memory backup needs? See the section on tuning the number and size of backup buffers below for details both on seeing what values DB2 chooses and some thoughts on calculating these for yourself.
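The rough math is: backup memory ≈ number of buffers × buffer size × 4 KB, since the buffer size on the backup command is expressed in 4 KB pages. Using the numbers I chose later in this post as an example:

82 buffers * 4096 pages per buffer * 4 KB per page = 1,312 MB needed for the backup
util_heap_sz of at least 2 * 1,312 MB = 2,624 MB, or roughly 672,000 4 KB pages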

Tuning Backup Parameters

The traditional wisdom says to let DB2 choose the values for parallelism and the number and size of backup buffers. While that’s good advice 95% of the time, it’s still good to have some ideas and tools to identify and deal with the edge cases where you may want to tune them manually.

Parallelism

Seeing What DB2 Chooses

You can parse the DB2 diagnostic log to see what DB2 has chosen for past backups. The value of this data depends on how much data you have in your diag log and how many backups have been taken with your current memory/disk/cpu configuration. A command like this will show you what DB2 has chosen in the past:
10.1 and earlier:

grep "Autonomic BAR - using parallelism .*"  ~/sqllib/db2dump/db2diag.log

10.5:

grep "Autonomic backup/restore - using parallelism ="  ./db2diag.log

The first command assumes the default location for the DB2 diagnostic log on a Linux/UNIX system; adjust the path to wherever your diag log lives. Here’s an example of the output you might see:

$ grep "Autonomic backup/restore - using parallelism ="  ./db2diag.log
Autonomic backup/restore - using parallelism = 2.
Autonomic backup/restore - using parallelism = 2.
Autonomic backup/restore - using parallelism = 5.
Autonomic backup/restore - using parallelism = 5.
Autonomic backup/restore - using parallelism = 10.
Autonomic backup/restore - using parallelism = 10.
Autonomic backup/restore - using parallelism = 10.
Autonomic backup/restore - using parallelism = 10.

In this case, DB2 is choosing different levels of parallelism at different times, but it settles down to 5 or 10 most of the time, which is reasonable given the size and layout of this database.
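If you want to watch a backup while it’s running rather than digging through the diag log after the fact, LIST UTILITIES shows the backup’s progress and throttling state:

db2 list utilities show detail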

Thoughts on Manual Tuning

If you’re going to try manual tuning here, the two main things to consider are the number of tablespaces of a reasonable size (since each tablespace can only be addressed by a single thread), and the number of CPUs on the server. As an example, I was recently dealing with a database that had about 50 tablespaces, with about 40 or so of them having significant amounts of data. The server the backup was being taken on had 64 CPUs, and this database was the only real load on the server. For purposes of this backup, I didn’t really care about leaving much overhead for other things (backup was offline). For that environment, I would choose a parallelism of 40. If it were an online backup, I would likely have chosen a lower number based on the other load I saw on the server.
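As a sketch of what that looks like on the backup command – the database name and path here are placeholders, and the parallelism of 40 is specific to the environment described above, not a value to copy blindly:

db2 backup db SAMPLE to /db2backups parallelism 40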

Number and Size of Buffers

Again, DB2 often makes the best choice for you, and it’s rare you’ll have to try to do anything manually. You’re more likely to slow the backup down than speed it up by giving it manual values instead of letting it choose the optimum.

Seeing What DB2 Chooses

Again, you can parse the DB2 diagnostic log to see what DB2 is choosing. This grep command works well:

grep "Using buffer size = .*, number = .*"  ./db2diag.log

It produces output like this:

$ grep "Using buffer size = .*, number = .*"  ./db2diag.log
Using buffer size = 4097, number = 2.
Using buffer size = 4097, number = 2.
Using buffer size = 3297, number = 10.
Using buffer size = 3297, number = 10.
Using buffer size = 4097, number = 10.
Using buffer size = 4097, number = 10.
Using buffer size = 4097, number = 10.
Using buffer size = 4097, number = 10.

Thoughts on Manual Tuning

If you are manually selecting values, the buffer size should be a multiple of the extent size of the largest tablespace. I like 4096 – a nice round number if I don’t have anything else to start with. You generally want two buffers per thread (from the parallelism, above), and maybe an extra two for good measure. So using the system details from above – that database had a really small extent size of 2 for nearly every tablespace, so 4096 works as a multiple – I chose 82 buffers (two per thread for 40 threads, plus two extra) of size 4096.
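Putting the parallelism and buffer choices from this example together on one command – again, SAMPLE and the path are placeholders, and the 40/82/4096 values are specific to that environment:

db2 backup db SAMPLE to /db2backups with 82 buffers buffer 4096 parallelism 40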

Backup Location

One way to speed up a backup is to throw hardware at it, of course. If backup speed is important, SSD or an SSD cache may be useful. Though if you’re going for overall performance impact, limited SSD resources may better be spent on active transaction log space. Still, when you can get it, pure SSD sure is fun. I have a 220 GB Windows database that is on pure SSD, with separate SSD drive arrays for data and for backup. It has ample memory and CPU too, and I can back that sucker up in 20 minutes.

For recoverability reasons, your backups should be on separate disks from your data and transaction logs, and it’s even better if they’re externalized immediately using a tool like TSM or some other storage library with a high-speed cache. You have to be careful to ensure that your network to such a location is fast enough that it doesn’t become a bottleneck. I’ve seen ridiculously slow backups caused by the fact that the TSM server was in a data center two (western) states away. If you’re backing up to a vendor device like that, talk to the admin for it to find out how many sessions you can create against the device at a time – you can specify that in the db2 backup command. The more, the faster, but some implementations may not be able to support many.
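On the backup command, the number of sessions against TSM looks like this – the session count of 4 is arbitrary here; use whatever the TSM admin tells you the device can handle:

db2 backup db SAMPLE online use tsm open 4 sessions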

Many larger implementations have the luxury of this sort of thing, but my small and mid-range clients simply back up to disk and then have a process that comes along later and externalizes anything on that disk to slower off-site storage.

Backup Memory Usage

For larger databases, and especially those with memory pressure, you can test the use of the DB2 registry parameter DB2_BACKUP_USE_DIO. I’ve heard of some good experiences with it, but the little testing I’ve done with it on a smaller database hasn’t shown much difference. Test it thoroughly before relying on it.

For the write portion of the backup, it disables OS-level filesystem caching. On one hand this makes sense – DB2 is never going to read back the data written to the backup image, so why cache it? On the other hand, writes to cache may be much faster than writes directly to disk. If your SAN has a cache, your disk write speed might support the use of this.
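Turning it on is just a registry setting. A sketch – and check whether your fixpack picks the variable up dynamically; many registry variables require an instance restart:

db2set DB2_BACKUP_USE_DIO=ON
db2stop
db2start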

Another thought on memory usage and backups – a backup will read every page in your database into the bufferpool before writing it out to the backup image. If your bufferpools are large and already primed, DB2 has less to read in from disk. If you have an over-abundance of memory, give DB2 larger bufferpools so there is less to read in.
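If you do have memory to spare, growing a bufferpool is a one-liner – IBMDEFAULTBP and the size of 200,000 pages (in the bufferpool’s page size) are just placeholders here:

db2 "alter bufferpool IBMDEFAULTBP immediate size 200000"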

OS-level Parameters

The one OS-level parameter I’ve heard of making a difference is maxreqs on AIX. Check out this article on maxreqs and backups on AIX. Essentially, you want to make sure that maxreqs is at least 256 times the number of threads (as determined by the value of parallelism for the backup).
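To check the current value, the asynchronous I/O tunables are managed with ioo on AIX 6.1 and later; on older releases you look at the aio0 device. A sketch of both checks:

ioo -a | grep aio_maxreqs
lsattr -El aio0 -a maxreqs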

Summary

Well, there’s my brain-dump on backup tuning. It mostly boils down to:

  • Split data into multiple tablespaces
  • Make sure util_heap_sz is large enough
  • Use fast backup locations
  • Trust the values DB2 picks for backup parameters


7 comments

  1. Please elaborate on “If you have a database that is currently focused on a single tablespace, you can move tables online from tablespace using ADMIN_MOVE_TABLE. Due to RI restrictions, you’ll want to be on 10.1, fixpack 2 or higher to make this reasonable.” in your blog

    1. Prior to 10.1 fixpack 2, ADMIN_MOVE_TABLE cannot handle moving tables with foreign keys defined. With fixpack 2 or higher of 10.1, or any fixpack of 10.5, ADMIN_MOVE_TABLE can handle tables with foreign keys. Before that, you have to have some offline time for tables with foreign keys – drop the foreign keys, use ADMIN_MOVE_TABLE, and then recreate the foreign keys. See my blog post on Changing the Data Type of a DB2 Column (Method 2) for an example of how to use ADMIN_MOVE_TABLE on a table with RI prior to 10.1 fixpack 2.

  2. Hi Ember,

    I was analyzing this case of backups where two different backups are having huge difference in duration of backups

    Backup 1:

    Database Version: DB2 V10.1 FP 5
    Operating System: AIX
    Hostname: Host1
    Database Name: Sample (34 GB)
    Backup Type: Online, NetBackup
    Backup Compression: No
    Parallelism: 8 (value obtained from db2diag.log)
    Buffer size and number of buffers: 4097 / 8 = 128 MB (values obtained from db2diag.log)
    Util_Heap_Size: 155 MB of 320 MB of heap space assigned (value obtained from db2diag.log)
    Backup duration: 10 mins

    Backup 2:

    Database Version: DB2 V10.1 FP 5
    Operating System: AIX
    Hostname: Host1
    Database Name: Sample (34 GB)
    Backup Type: Online, NetBackup
    Backup Compression: Yes
    Parallelism: 5 (value obtained from db2diag.log)
    Buffer size and number of buffers: 2561 / 10 = 100 MB (values obtained from db2diag.log)
    Util_Heap_Size: 155 MB of 320 MB of heap space assigned (value obtained from db2diag.log)
    Backup duration: 1 hour 15 mins

    So if we look at the differences above, the major differences are parallelism, buffer size, and compression. Is there any way to speed up backup compression?

    Thanks
    Abdul Qader

    1. You have a factor of network speed and activity on the netbackup target as well. The number of sessions used would make a difference as well. I do see a slowdown with compression, but the time for using the compress option on a backup command is less than the time it takes me to take the backup and then compress it. You may also want to be aware that when backing up to a vendor device, using compression on the backup command may negate any deduplication on the target device, so with vendor devices like netbackup, it may be better to choose not to compress.

      1. Hi Ms. Ember,
        Good day,

        We’re using Commvault for DB2 backups over a 10G network. Our DB size is 3 TB, buffers and parallelism are automatic (the db2diag.log shows 8 buffers), our configuration uses compression and dedup, and our disks are not SSD. The backup takes about 4 hours, sometimes 6. Would it be the compression, or do we need to change some config?
        Thank you very much.

        1. You would really have to look at BAR stats to see what might help it. Are you using the dedup flag on the backup command? Often dedup and compression don’t play well together.
