A few days ago, I was involved in a conversation on Twitter with Ember and Erwin Hattingh, a DB2 DBA from the Netherlands whom I met at IDUG in Dublin last November.
Ember expressed her surprise at how well DB2 handled a nasty situation:
It's not every day #DB2 blows your mind with the ability to stay up in a situation you would have sworn would crash it. #DB2 ROCKS!
— ember_crooks (@ember_crooks) January 21, 2016
However, Erwin complained about DB2 not dealing with log file saturation particularly well:
Except this nasty one : the log hogg,… one tiny leftover rununit…. "Log file has reached its saturation point" https://t.co/duviHyDJVO
— Erwin Hattingh (@erwin_hattingh) January 21, 2016
If you’re not familiar with this situation, log file saturation occurs when there is an active transaction (one that has not issued a commit
or rollback
), keeping an old transaction log active. DB2 can not archive active transaction logs (because they are required for crash recovery), so an old, uncommitted transaction can lead to a transaction log full error (SQL0964C
) as other transactions use transaction log space.
In this illustration, you can see that there is a very small amount of log space this is used by active transactions (shown in orange); however, because the current LSN has nearly reached the end of the last transaction log file, log saturation is about to occur.
It is fairly easy to demonstrate log saturation, with only 2 applications connecting to a database. The first application will execute a single small statement, but will not commit, thereby keeping the transaction active:
idbjorh@dpx-ubuntu14:~$ db2 +c "delete from reports" DB20000I The SQL command completed successfully.
Next, we will start another “application” that will perform many individual transactions. For this example, I have started an infinite loop that inserts rows into a table (the goal of this is simply to fill the transaction logs):
db2inst1@dpx-ubuntu14:~$ while [ 1 ] ; do > db2 "insert into ch2 select * from ch" > done DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. ... DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command. During SQL processing it returned: SQL0964C The transaction log for the database is full. SQLSTATE=57011
As you can see, even though auto-commit was enabled (each insert statement was a separate, small transaction), the insert statement eventually failed when DB2 returned the SQL0964C
error. There is no indication that this was actually caused by log file saturation, so you must look at the db2diag.log
for evidence. Here is an example of the 2 records you’ll see:
2016-01-20-21.09.15.944061-420 E1078331E629 LEVEL: Error PID : 12600 TID : 140106224625408 PROC : db2sysc INSTANCE: db2inst1 NODE : 000 DB : SAMPLE APPHDL : 0-149 APPID: *LOCAL.db2inst1.160121040537 AUTHID : DB2INST1 HOSTNAME: dpx-ubuntu14 EDUID : 19 EDUNAME: db2agent (SAMPLE) FUNCTION: DB2 UDB, data protection services, sqlpgResSpace, probe:2860 MESSAGE : ADM1823E The active log is full and is held by application handle "0-156". Terminate this application by COMMIT, ROLLBACK or FORCE APPLICATION. 2016-01-20-21.09.15.944336-420 E1078961E593 LEVEL: Error PID : 12600 TID : 140106224625408 PROC : db2sysc INSTANCE: db2inst1 NODE : 000 DB : SAMPLE APPHDL : 0-149 APPID: *LOCAL.db2inst1.160121040537 AUTHID : DB2INST1 HOSTNAME: dpx-ubuntu14 EDUID : 19 EDUNAME: db2agent (SAMPLE) FUNCTION: DB2 UDB, data protection services, sqlpgResSpace, probe:6666 MESSAGE : ZRC=0x85100009=-2062548983=SQLP_NOSPACE "Log File has reached its saturation point" DIA8309C Log file was full.
The first message identifies the application that has the long-running transaction, and the second message highlights the fact that log saturation has occurred.
Tools, Please
DB2 provides tools to deal with (or prevent) this situation from resulting in a call in the middle of the night. Ember mentioned that she will use the DB2 governor, by setting up a rule that forces any connections that have been idle for 4 hours. Although this works, you need to be careful when setting the time threshold to ensure that you account for transaction volume – how quickly DB2 is moving through transaction logs. In my example above, the transaction rate was high enough that I was able to cause log file saturation to occur in about 15 seconds!
It’s also fairly easy to imagine a script that would monitor transaction log utilization and look for old, idle transactions and kill them. This could probably resolve most cases, but I am a lazy DBA and do not want to write and test another script.
Fortunately, there is another option: IBM added a database configuration parameter called NUM_LOG_SPAN
in DB2 9.7. This parameter sets a limit on how many transaction log files any single transaction can span; it does this by comparing the first LSN for a transaction with the current LSN of the database. If the difference between these exceeds the number of log files specified in NUM_LOG_SPAN
, then the offending connection is automatically forced off of the database. By setting this value to a number smaller than the sum of LOGPRIMARY
and LOGSECOND
, you can be sure that DB2 will not allow log file saturation to occur.
Let’s set NUM_LOG_SPAN
(it can be updated online) and then re-run the same test from earlier.
db2inst1@dpx-ubuntu14:~$ db2 connect to sample Database Connection Information Database server = DB2/LINUXX8664 10.5.5 SQL authorization ID = DB2INST1 Local database alias = SAMPLE db2inst1@dpx-ubuntu14:~$ db2 get db cfg | egrep -i 'logprimary|logsecond' Number of primary log files (LOGPRIMARY) = 3 Number of secondary log files (LOGSECOND) = 8
For this sample database, there are a maximum of 11 log files (3 primary + 8 secondary), so let’s set NUM_LOG_SPAN
to 10:
db2inst1@dpx-ubuntu14:~$ db2 "update db cfg using num_log_span 10" DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
Now, retry the same 2 applications and see what happens. First, the long running transaction:
idbjorh@dpx-ubuntu14:~$ db2 +c "delete from reports" DB20000I The SQL command completed successfully.
Next, I’ll start the other “application” that inserts records endlessly:
db2inst1@dpx-ubuntu14:~$ while [ 1 ] ; do > db2 "insert into ch2 select * from ch" > done DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. ... wait for some time ... DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully. DB20000I The SQL command completed successfully.
Perfect! The application never fails. (Well, at least not until the tablespace runs out of space!)
However, if we go back to the first application and try to perform an action:
idbjorh@dpx-ubuntu14:~$ db2 commit SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032
The ROLLBACK
statement fails, because DB2 has forced this application off of the database.
If we look at the db2diag.log
, we can see that DB2 has logged this event:
idbjorh@dpx-ubuntu14:~$ db2diag -H 5m -g function:=sqlpScanTran 2016-01-20-18.54.30.621096-420 E506785E733 LEVEL: Error PID : 5470 TID : 139829459281664 PROC : db2sysc INSTANCE: db2inst1 NODE : 000 DB : SAMPLE HOSTNAME: dpx-ubuntu14 EDUID : 22 EDUNAME: db2loggr (SAMPLE) FUNCTION: DB2 UDB, data protection services, sqlpScanTranTableForLowTran, probe:550 MESSAGE : ADM1541W Application "db2bp" with application handle "0-24" and application id "*LOCAL.db2inst1.160121013207" executing under authentication id "IDBJORH " has been forced off of the database for violating database configuration parameter NUM_LOG_SPAN (current value "10"). The unit of work will be rolled back.
Conclusion
Log file saturation is often caused by poorly-behaved applications – applications that will allow a transaction to sit open for long periods of time. It’s easy to want to blame a user who starts to work on something and gets distracted and walks away from their computer, but I tend to believe that application developers should work to prevent this.
However, if this is not feasible, NUM_LOG_SPAN
provides a powerful way to prevent log file saturation. However, please take care before enabling it. There is no indication to the application why it was forced off of the database, so you will want to work with your application’s owners to make sure that this is desired way to handle these kinds of problems.
Ian Bjorhovde is the principal consultant for DataProxy LLC in Phoenix, Arizona. He has worked in the IT industry for over 18 years and has spent the last 15 years as a consultant developing and supporting large DB2 LUW databases with clients in many different industries. He is an IBM Gold Consultant and IBM Champion, produces a DB2-focused podcast called The Whole Package Cache and has presented at many RUG, IDUG and IBM conferences.
Wow !! Excellent article !! I will put this in my to-do list immediately !!! Thanks for sharing !!
Ian, Excellent blog.. I love that parameter that would be of great help. But i have a doubt in using db2gov as it sometimes consider backup as idle application and forces it aswell.. So i would use this parameter making things simple. Thanks again
Great idea! We had the similar situations of “log full” caused by some uncommitted transaction. This is an excellent way to handle the situation. Thanks!
This is a great tip!
Great Tip…
But, a little doubt.
Suppose that exist two apps holding the logs, following your example, the DB2 will understands that each transaction can hold 10 logs or will share this number for both?
Thanks…
@Robelis, It does not matter how many applications are connected and active, it’s always the application with the oldest transaction (i.e. the application with the current LowTranLSN) that will be killed if the difference between Current LSN and LowTranLSN exceeds NUM_LOG_SPAN.
Only 1 application can have LowTranLSN at a time, so the application with the oldest transaction will be forced when NUM_LOG_SPAN is exceeded for its transaction. At this point, a different transaction will have LowTranLSN. If the Current LSN moves forward enough to cause this transaction to exceed NUM_LOG_SPAN, then it will be forced, too.
Make sense?
@soko15576 Hi Ian, I just ran into this problem using ATS running RUNSTATS. Great article!
Hi Ian
Great article, and I am also using NUM_LOG_SPAN. I have a question with similar topic, and
I am struggling to find answer. Very often I need to run large scripts e.g.:
update table where pid=1;
update table where pid=2;
etc..
where I could have thousands of lines. I would like to be able to stop on error and rollback, so I am using +c -stvf options. How many transactions I can have open this way? Is that the same as capacity of logs, to avoid log full ? I want to be able to run all without committing, and rollback on any error. Thank you in advance.
You can have as many transactions open as will fit into NUM_LOG_SPAN. Once you exceed that, it will get an error and be forced to roll back.