Some time ago, I wrote a post that suggested using the NUM_LOG_SPAN
database configuration parameter to prevent long-running transactions from filling your transaction log.
Using NUM_LOG_SPAN
will indeed prevent the problem of long running transcations causing log file saturation, but what if you just want to find applications could become a problem, before the problem happens?
Understanding LSNs
Most DBAs will be familiar with the term “LSN”, which is an acronym for Log Sequence Number. An LSN is a 64-bit number, and is the byte offset from the start of the first transaction log associated with a database. DB2 reports LSNs in hexadecimal instead of decimal.
The first transaction log, S0000000.LOG
starts with the LSN of 0x0000000000000000
. If LOGFILSIZ
is 16384 (= 67,108,864 bytes), we can calculate that the first LSN in the next log file, S0000001.LOG
will be 0x0000000004000000
: This is the representation of 67,108,864 in hexadecimal.
When a transaction is started, it is assigned the next available LSN in the database. The next available LSN is called the current LSN. When this LSN has been assigned to a transaction, it becomes known as that transaction’s first LSN. In an active database, the current LSN is constantly changing (increasing) as transactions are written to the log, but a specific transaction’s first LSN will never change.
Incidentally, you may realize that (eventually) the LSNs for a database will be exhausted: The maximum possible LSN is
0xFFFFFFFFFFFFFFFF
. When this eventually occurs, DB2 will put the database into read-only mode, and the only way to resolve this is to create a new database and move all data to it. (DB2 will start warning you when this is approaching by placing ADM1849C messages into the db2diag.log).Running out of LSNs in the transaction log used to be a concern: in DB2 9.5 and earlier, LSNs were only 6 bytes, which provided “only” about 140 terabytes of transaction log space. However, do not fret: increasing the LSN to 8 bytes increased the the current LSN limit to approximately 18.4 exabytes. To put this into perspective, if your database were writing to the transaction log at an absolutely unimaginable constant rate of 500 MB/s – 128,000 log pages/second – it would take over 1,115 years to reach the LSN limit. (When LSNs were only 6 bytes, it would only take 6.2 days of logging at 500 MB/s to hit the LSN limit!)
How Many Log Files does a Transaction Span?
Knowing that an LSN is just a number means that, if you know the database’s current LSN and a transaction’s first LSN, you can use a simple formula to calculate the number of log files that a transaction spans:
(currentLSN - firstLSN) Logs Files Spanned = ------------------------- LOGFILSIZ * 4096
Applying the Formula
The monitoring interfaces offer limited resources to help with this. DB2 provides a monitor element, appl_id_oldest_xact
that contains the application ID that has the oldest transaction in the database, which provides a useful starting point for finding the oldest transaction in a database. However, this element only returns a single application ID. If is more than one connection to the database with an old, uncommitted transaction, you won’t be able to determine this using any of the monitor elements in the database.
The MON_GET_TRANSACTION_LOG
table function will provide the database’s current LSN and can help you determine first LSN for one application (the application with the oldest active transaction). For example:
select APPLID_HOLDING_OLDEST_XACT, OLDEST_TX_LSN, CURRENT_LSN from table (MON_GET_TRANSACTION_LOG(-2)) as t APPLID_HOLDING_OLDEST_XACT OLDEST_TX_LSN CURRENT_LSN -------------------------- -------------------- -------------------- 762 135917018 135917048
However, if you want to see the first LSN for all active transactions, you’ll need to use db2pd -transactions
. The 9th field is firstLSN
, and is reported in hexadecimal:
You’ll notice that some applications will have a first LSN value of 0x0000000000000000; this is normal and means that the application does not have an active transaction – it last issued a COMMIT or ROLLBACK.
Conveniently, you can also use db2pd to return the database’s current LSN, using db2pd -logs
:
Putting all of this together, you can write a fairly simple shell script that finds all applications with open transactions in your database, and calculates how many transaction log files each application’s active transaction is spanning:
#!/bin/ksh # if [[ ${1:-null} = "null" ]] ; then print "Usage: $0 <database>" else db=${1} fi # Find LOGFILSIZ logfilsiz=$(db2pd -db ${db} -dbcfg | grep LOGFILSIZ | awk '{print $3}') # Find Current LSN. Using printf with awk converts the value from hex to decimal. currentlsn=$(db2pd -db ${db} -logs | grep "Current LSN" | awk '{printf "%d\n", $3}') # Step through applications with open transactions and calculate log files spanned: db2pd -db ${db} -transactions | \ grep 0x | \ awk '{ if ($9 != "0x0000000000000000") { printf "%d %d\n", $2, $9 } }' | \ while read apphdl firstlsn ; do # Calculate log span logspan=$(echo "scale=2; ( ${currentlsn} - ${firstlsn} ) / ( ${logfilsiz} * 4096 )" | bc) printf " Application ${apphdl} spans %5.2f log files\n" ${logspan} done
Here’s the output of this shell script:
idbjorh@dpx-ubuntu14:~$ ./logspan.sh sample Application 15046 spans 44.26 log files Application 14997 spans 0.08 log files Application 14722 spans 93.70 log files Application 14904 spans 3.13 log files
Please note that db2pd may change its output format, so while this script was tested with DB2 9.7, 10.5 and 11.1, future changes to db2pd may break this script and require some modifications.
Conclusion
The script as written above is of limited value – it provides some basic information. However, it could be enhanced to alert you if, for example, there are multiple transactions (not just the “oldest” transaction) are getting close to causing a problem.
However, if you are using NUM_LOG_SPAN
to prevent log file saturation as described in my earlier post, then this script may be more useful as a starting point to help keep track of the users or applications that are getting forced off the database. You could then follow up with the user(s) to make sure they understand what is causing their problem and can change their behavior to become better database citizens.
Ian Bjorhovde is the principal consultant for DataProxy LLC in Phoenix, Arizona. He has worked in the IT industry for over 18 years and has spent the last 15 years as a consultant developing and supporting large DB2 LUW databases with clients in many different industries. He is an IBM Gold Consultant and IBM Champion, produces two DB2-focused podcasts called The Whole Package Cache and Create Database and has presented at many RUG, IDUG and IBM conferences.
G8 article Ian, But seems its not working for AIX. Checking on that.
Murali, can you provide some details? I tested this on AIX and it was working for me.
Seems some problem with my shell I changed that and it worked, I confirm that its working.
thank you for the article, It did clear all my doubts.
hello, I tried this in our linux environment but when I did, the output was
Usage: ./longquery.sh then that’s it. 🙁
Shenzu, you need to provide the database name to the script. So you’d execute the script as `./longquery.sh SAMPLE` (replace `SAMPLE` with the real name of your database).
Hi Ian, it worked now, hehehe!! thanks a bunch. As we say it in Japanese, Domo Arigattou Gusaimasu!! always take care!!
Hi Ian,
Not sure why script is not working correctly in our AIX environment . We deployed the script in cron with schedule of every 10 mins. In logs file, num log span never goes beyong 0.06 but in diag.log I can see voilation of num_log_span parameter. Can you please help to troubleshoot the script. We are using DB2 10.5.
Thanks,Arun
Arun,
It might be a problem with the version of awk that you’re using. If you run the command:
echo “0x000000000533B7AA” | awk ‘{printf “%d\n”, $1}’
If this returns 0, then there’s a problem with your version of awk. What version of awk (or AIX) are you using?
Thanks Ian for your response.
Here is the output of command.
echo “0x000000000533B7AA” | awk ‘{printf “%d\n”, $1}’
87275434
Thanks, Arun
This script returns the logspan of each transaction, but this is unrelated to num_log_span parameter.
One transaction can hold 1 log while other transactions write a lot of logs. As long as the first transaction does not commit the system-wide logspan keeps growing.
That is why Advanced Log Space Management is being developed.