Transaction log files are one of the core strengths of a DBMS, if you ask me. The ‘D’ in ACID stands for ‘Durability’. Durability means that once a transaction is committed, it is not lost, even if the database crashes or the server it runs on has to be restarted.
Transaction Logging in DB2
DB2 writes transactions to the transaction log files while they are still in progress. Every logged action is written to the log buffer. The log buffer is then written out to disk whenever it becomes full or a transaction is committed (or in a couple of other special situations). With many transactions in progress, this means that the active transaction logs on disk contain both committed and uncommitted changes. Part of the information in the transaction logs is which transaction each change belongs to and whether or not that transaction has committed.
Active Logs
Any uncommitted transactions can only be in the active log files. The active log files are defined in number by the database configuration parameters LOGPRIMARY and LOGSECOND, and in size by the database configuration parameter LOGFILSIZ. On database activation (or first connect), the number of log files defined by LOGPRIMARY is created. If DB2 runs out of room in these log files, it will allocate additional log files, up to the number specified by LOGSECOND. This is true for either circular or archive logging.
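For example, you can check and change these parameters with something like the following – SAMPLE is just a placeholder database name, the values are only illustrative, and LOGPRIMARY and LOGFILSIZ changes take effect on the next database activation:

db2 get db cfg for sample | grep -iE "LOGPRIMARY|LOGSECOND|LOGFILSIZ"
db2 update db cfg for sample using LOGPRIMARY 14 LOGSECOND 100 LOGFILSIZ 10000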
It is generally a good idea to keep LOGPRIMARY on the smaller side to speed up database activation time, but to also have room there for your normal day-in and day-out transaction logging purposes. Usually LOGPRIMARY on my databases is 14 or fewer, with LOGFILSIZ for a midsize IBM WebSphere Commerce database working well somewhere between 10000 and 60000. Obviously larger, more active databases need more. I like LOGSECOND somewhere between 50 and 240 to handle those unexpected spikes. Of course all of this has to take into account the size of your active log filesystem, too.
A single transaction cannot use more log space than LOGFILSIZ * (LOGPRIMARY + LOGSECOND). Large deletes fail when they fill up the available log files; commit frequently to avoid this. You can also end up with a long-idle connection causing log file saturation, which can make the database unavailable. See “Managing db2 transaction log files” for more information on log file saturation.
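As a rough worked example using numbers like those above: with LOGFILSIZ of 10000 (4 KB pages), LOGPRIMARY of 14, and LOGSECOND of 100, the most log space a single transaction could consume is about 10000 × 4 KB × (14 + 100), or roughly 4.3 GB, before it hits a log-full error.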
Archive Logs
All OLTP and many other databases should use archive logging.
When archive logging is enabled, you should always choose a location for LOGARCHMETH1, even if it is just a different directory. A different filesystem, or even a vendor archive system such as TSM, is vastly preferred, but even if you are only using a single filesystem, set it to a different directory. This gives you a location where you know you can compress, archive, or delete archived transaction log files without crashing the database. If you (or anyone) mess with an active log file, you are pretty likely to crash the database, probably in a pretty nasty way.
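For example (SAMPLE and the paths here are just placeholders), setting LOGARCHMETH1 to a separate directory or to TSM might look like one of these – and note that enabling archive logging on a database that was using circular logging puts it into backup pending state, so plan for a backup afterward:

db2 update db cfg for sample using LOGARCHMETH1 DISK:/db2/archive_logs/
db2 update db cfg for sample using LOGARCHMETH1 TSM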
Side note: LOGARCHMETH1 was a great parameter when it was added. Before that we had to compile and manage userexit programs, and that was a pain.
An active log is archived whenever it is full, even if it contains transactions that are still in flight. However, DB2 does not ‘remove’ it from the active path until it contains no active transactions. I say ‘remove’ because it looks to us like it is removed, but actually the file is renamed and reused, because that performs better than actually removing a file and creating a new one. Because of this, you can see the same log file in the active and archive paths at the same time.
There are some scenarios, such as data replication or HADR, that can cause an archived log file to be needed again. Scenarios like this make it generally a bad idea to compress or move/remove every log file as soon as it is archived to disk. They can also require you to make retrieval of logs from a tape library or other location possible from more than one database server.
Log files, whether full or not, may be archived with an ARCHIVE LOG command, a DEACTIVATE DATABASE command or a BACKUP.
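For example, to truncate and archive the current active log on demand (SAMPLE is a placeholder database name):

db2 archive log for database sample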
LOGBUFSZ
The log buffer size frequently needs to be tuned. The larger it is, the more partial transactions you may lose in a failure – though in most failures partial transactions are rolled back anyway. In a database snapshot, two numbers are reported like this:
$ db2 get snapshot for db on sample | grep "Log pages"
Log pages read                                      = 23108
Log pages written                                   = 20513325
If you want to get that through an administrative snapshot view, you can use:
db2 "select log_reads, log_writes from sysibmadm.snapdb with ur" LOG_READS LOG_WRITES -------------------- -------------------- 23108 20513325 1 record(s) selected.
Interestingly enough, I’m not finding this in the mon_get table functions.
Generally, you want Log pages read to be at or near 0, and Log pages written to be a large number. If Log pages read is not 0, you may want to consider increasing your log buffer, but generally you don’t want it much over 1096 or so.
Basically during normal operations we want to mostly avoid log reads.
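If you do decide to change it, LOGBUFSZ is a db cfg parameter, and the new value only takes effect after the database is deactivated and reactivated (SAMPLE and the value below are just examples):

db2 update db cfg for sample using LOGBUFSZ 1024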
So, readers, please share with me other tuning that you do around log file size or log buffer size. Is there anything else on transaction log files that you want to learn about?
Another important parameter is NUM_LOG_SPAN, the number of logs a transaction can span. It is important to tune it according to the type of database.
Yes, NUM_LOG_SPAN and MAX_LOG can be important parameters too. I think I (and most DBAs) could write a small book on transaction logging.
DB2 v10 can now compress log files, so a lot of space is saved at all levels [active and archive log paths, Virtual Tape Manager (disk and tape levels)].
Yes, and from what I understand, you can use that compression without purchasing the compression feature.
For LOGBUFSZ, is your recommendation to not go over 1024 or 4096? 1096 looks like an odd number 🙂
Good point. I try to stay under 1024, but have found myself going over that once or twice.
Super! Now I know when an active log gets archived. One quick question:
For a DB2 instance we have a dbm cfg parameter called ‘svcename’ that gives the port number for TCP/IP client applications.
Let’s say we have 2 active databases under a single instance.
That means we have one ‘svcename’, with applications connecting to both active databases. So how is it possible for applications to connect to both databases using a single port or ‘svcename’? Is there some port-sharing mechanism, i.e. a single port utilized by both active DBs?
DB2 uses one port number for each instance. You cannot have separate port numbers per database without separating the databases into separate instances. DB2 knows which database you want on that instance/port because you must specify the database name when you connect.
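As a quick illustration (the node, host, port, and database names here are made up), both databases are reached through the same node entry and therefore the same port – only the database name in the catalog entry and in the CONNECT statement differs:

db2 get dbm cfg | grep SVCENAME
db2 catalog tcpip node mynode remote dbhost.example.com server 50000
db2 catalog database proddb1 at node mynode
db2 catalog database proddb2 at node mynode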
OK, reasonable. Thank you, Ember.
Hi, I have to delete a bulk of records from a table; currently I am using the default configuration for the number of logs written to disk.
I need some suggestions on setting the parameters so that I do not run into any issues when I delete the records.
I am using db2 10.1 on Linux.
This blog article and its comments might be useful for you: https://datageek.blog/2012/07/12/how-to-delete-data-from-a-db2-table-and-release-disk-space/
Large deletes are always problematic, and committing often is usually the better solution. Increasing LOGSECOND is also a trick I frequently use, assuming you’re using archive logging.
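A rough sketch of the commit-frequently approach (the table name and predicate are hypothetical, and this relies on DB2 accepting a fetch-first clause in the target fullselect, which it does on recent versions) is to delete in limited chunks, let the CLP auto-commit each chunk, and repeat the statement until it returns SQL0100W:

db2 "delete from (select 1 from my_schema.big_table where created_date < '2011-01-01' fetch first 10000 rows only)"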
Hi, I am Raju.
I am new to DB2.
I have a doubt: are archive logs reused when an application is in rollback state? If yes, please explain clearly how and when that happens.
A transaction log will not be archived until all transactions in it are committed. It is not possible to roll back a committed transaction. So assuming you’re not using data capture/dprop or HADR, archived transaction log files will not be used by DB2 during normal operation.
Yesterday, on one of our Windows Cognos servers, one application was generating a huge number of logs. I checked and found that the application was in rollback state and consuming a lot of log space, so I moved all the logs that were in the archive log directory to get some space in the directory. A few hours later the app team called me and said DB2 was in stopping mode. My colleagues and TL said that because I moved all the archive logs, the rolling-back application hung and finally the instance stopped. Is that correct?
Do you keep archived and active transaction logs in separate directories using the LOGARCHMETH1 parameter? What you described sounds like what happens when you move a log that is still active.
Yes, the archive and active log paths are separated using LOGARCHMETH1. In the current setup, LOGPRIMARY was set to 200, LOGSECOND was set to infinite, and BLK_LOG_DSK_FUL was set to NO. I was moving a bunch of logs to another location to free some space in the archive log path, since one application in rollback state was generating a huge number of logs per minute. While I was moving the archive logs to another location, I was also monitoring the rollback status via the completed work shown in the application snapshot output. A few hours later I got a ticket that the instance was down, and I was shocked.
My queries…
The current setup is LOGPRIMARY = 200, LOGSECOND = infinite, and BLK_LOG_DSK_FUL = NO.
1. If one very big transaction is running, and we assume that transaction will generate 500 logs and is still not completed, how many logs will there be in the active log directory?
2. If a transaction is in rollback state, will that transaction look for the required log in the archive log directory to complete the rollback?
3. While an application is rolling back, if it cannot find the required log in either the active or the archive log directory, will the DB2 services go into stopped mode?
Ah, infinite logging, that’s a whole other beast. I haven’t done infinite logging – have stayed away from it. 200 is HUGE for LOGPRIMARY, in my experience – many of my clients run with far fewer. How big are your log files?
Thanks Ember
Dear Ember,
When I set the LOGARCHMETH1 parameter in the db cfg to USEREXIT, it shows the following error in the diag log.
From db2diag.log I can see the following error messages:
=====================================================================
2012-01-16-17.16.54.175099+330 E109828489E435 LEVEL: Error
PID : 28315 TID : 47130618095936 PROC : db2sysc 0
INSTANCE: db2inp NODE : 000
EDUID : 25 EDUNAME: db2logmgr (INP) 0
FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogFile, probe:3155
DATA #1 :
Failed to archive log file S0000000.LOG to USEREXIT from /db2/INP/log_dir/NODE0000/ return code 16.
We are using
Db2 10.5 (Work group edition)
Redhat Linux (6.5) OS cluster
Data Protector (Version 9)
We have a two-node (active-passive) configuration for this.
When we use DB1 or DB2 (the physical IP),
Data Protector can take an online backup as well as an archive log backup using the USEREXIT parameter.
When we use the virtual (floating) IP,
Data Protector can take an online backup, but it is not able to take the archive log backup to tape.
Could you please help with this?
Thanks in advance.
Regards,
Pratik Shah
pratik.titu@gmail.com
9428694150
I haven’t used USEREXIT in at least 10 years – it used to be the only option and we hated it – so glad when they came up with LOGARCHMETH parameters instead. If you’re using userexit, you generally have to have a compiled C program to do the actual user exit process. Do you have that executable in place? It must be named properly and be in the right location. Check out this link on some of the details of USEREXIT: http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/c0024701.html
Hi Ember,
I wanted to know whether we can monitor which agent ID is consuming the most log space. We often face a log-full condition. When that happens, we see in db2diag.log which agent ID we need to kill to resolve the issue.
But we want to know beforehand, so that we can avoid the log-full situation.
We used to monitor log utilization through sysibmadm.log_utilization and check the percentage of log usage, but we don’t get any information about which application is using the most.
I think that what you’re looking for is the application id holding the oldest transaction log (and log file saturation seems to be what you’re describing). This query should give you what you need (works on 10.5, let me know if it doesn’t on the version you’re on, and there are other ways to get it): db2 “select APPLID_HOLDING_OLDEST_XACT from table(mon_get_transaction_log(-2)) with ur”
Thanks for the reply. Unfortunately it is not working in our environment, which uses DB2 V9.7. I searched for this in the Info Center but could not find anything, so I think there must be some indirect way of finding it.
I got the query for V9.7.. It is db2 “select appl_id_oldest_xact from sysibmadm.snapdb”. Thanks for your help.
Hi Ember,
In your prior reply, which is dated a year after the creation of this blog post, mon_get_transaction_log() is referenced – this table function is available in DB2 10.x and contains information for the snapshot elements log_reads and log_writes.
Best Regards,
Nadir
Hi Ember,
Thanks for sharing your knowledge and experience with us. I have a question: how can a select query cause transaction-log-full issues?
I really appreciate you sharing your knowledge with others.
Selects certainly can acquire locks, but a pure select alone should not cause transaction logs to fill.
Hi Ember
One of the database have been set as below:
Catalog cache size (4KB) (CATALOGCACHE_SZ) = 300
Log buffer size (4KB) (LOGBUFSZ) = 256
Log file size (4KB) (LOGFILSIZ) = 185000
Number of primary log files (LOGPRIMARY) = 20
Number of secondary log files (LOGSECOND) = 200
Changed path to log files (NEWLOGPATH) =
Path to log files = D:\DB2\NODE0000\SQL00001\LOGSTREAM0000\
Overflow log path (OVERFLOWLOGPATH) =
Mirror log path (MIRRORLOGPATH) =
First active log file = S0000272.LOG
Block log on disk full (BLK_LOG_DSK_FUL) = NO
Block non logged operations (BLOCKNONLOGGED) = NO
Percent max primary log space by transaction (MAX_LOG) = 0
Num. of active log files for 1 active UOW(NUM_LOG_SPAN) = 0
First log archive method (LOGARCHMETH1) = DISK:D:\tr_logs\
I have noticed that the first active log file number stays the same while the number of log files in the archive location (D:\tr_logs) keeps growing, and I cannot run the prune command to free up space. What could be the reason?
DB2 version is 10.5.8 on windows 2012
Thanks
You may have a connection holding on to pages in that first log file. You cannot prune the log until transactions are committed or rolled back. Is your application doing appropriate commits?
Hi Ember ..
As you suggested we should keep LOGBUFSZ under 1024,
we have Log buffer size (4KB) (LOGBUFSZ) = 4096
but we are still seeing a lot of log reads:
Log pages read = 31041539
Log read time (sec.ns) = 63191.960901451
Log pages written = 287942923
Log write time (sec.ns) = 91488.767335623
Can we increase the size of LOGBUFSZ to 8192?
Yes, some environments simply require larger sizes, due to transactional volume.
Hi Ember,
We have the current log settings shown below. Would increasing the log file size or log buffer size be helpful?
How do we arrive at an optimal number? For example, setting LOGBUFSZ to 4096 or 8192?
Log buffer size (4KB) (LOGBUFSZ) = 2048
Log file size (4KB) (LOGFILSIZ) = 5120
Number of primary log files (LOGPRIMARY) = 256
Log pages read = 1260489
Log read time (sec.ns) = 129.511013232
Log pages written = 673672762
Log write time (sec.ns) = 552544.818212560
Number write log IOs = 461438463
Number read log IOs = 160663
Number partial page log IOs = 214285295
Number log buffer full = 166000
Generally you increase it slowly until the number of log pages read is where you want it. Given your numbers, I’d likely increase it. I’d probably try 3,072 first.
A note on your log files – I like to have LOGPRIMARY+LOGSECOND add up to something a bit less than 256. This means I can increase LOGSECOND (the only online configurable log file size parameter) on the fly to temporarily and quickly get out of a log full situation. It is also unusual to have all 256 as primary files, as more primary files leads to a longer database activation time. You may want to consider increasing LOGFILSIZ a bit, reducing the size of LOGPRIMARY, and increasing the size of LOGSECOND. Not critical to performance overall – the only area this would help performance for is for database activation time.
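For example, if you find yourself near log-full, LOGSECOND can be raised on the fly while connected (the database name and value are only examples):

db2 update db cfg for sample using LOGSECOND 120 immediate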
Hi Ember, thanks for your reply. We identified that the performance bottleneck is LOG_DISK_WAIT_TIME being very high (using monreport). I was planning to increase LOGBUFSZ. Do you think we need to increase LOGFILSIZ as well? Which parameter will help to reduce LOG_DISK_WAIT_TIME? We are on a fast SSD disk which holds only the transaction logs; there are no disk-level issues.
Log buffer size (4KB) (LOGBUFSZ) = 2048
Log file size (4KB) (LOGFILSIZ) = 5120
Is your total wait percentage high (across all wait consumers)? What percentage of total time does wait represent, and of that, what percentages does log disk wait represent?
That log file size does seem a bit low. I like to see how often archives are occurring, and aim for 1 log file every 15 minutes, though that’s not a hard and fast rule.
Particularly for OLTP databases, applying the fastest disk available to active transaction logs is critical.
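One rough way to see how often archives are happening is to count history-file entries per hour – assuming I have the operation code right, ‘X’ marks log archives in SYSIBMADM.DB_HISTORY, and START_TIME is a YYYYMMDDHHMMSS string, so its first 10 characters give you the hour:

db2 "select substr(start_time,1,10) as archive_hour, count(*) as logs_archived from sysibmadm.db_history where operation = 'X' group by substr(start_time,1,10) order by 1 desc"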
Ember,
Thanks for your valuable inputs. As mentioned earlier, I increased LOGBUFSZ from 2048 to 4096. It does seem to bring down the overall wait time, but there is still performance degradation whenever a bulk DML operation runs.
Yes, from the dbsummary report, the total wait time % is high (close to 80%), and within that, LOG_DISK_WAIT_TIME is again about 80%.
Archiving happens very fast. It’s a high-load OLTP database and we archive close to 50-60 files every hour during business time. During some batch jobs with a high number of DML operations, it reaches close to 200-250 per hour, causing overall slowness.
Current LOGFILSIZ is 5120(20 MB) with 256 Primary log files.
Should we increase it gradually, or make it a big number (like 200 MB with 50 files, or 100 primary files), considering the amount of archiving happening every hour?
We do have enough space in the filesystem, and the OS person states that it’s on the fastest disk.
During a small batch run:
— Wait time as a percentage of elapsed time —
% Wait time/Total time
— ———————————-
For requests 81 856215/1047404
For activities 63 315651/499811
— Detailed breakdown of TOTAL_WAIT_TIME —
% Total
— ———————————————
TOTAL_WAIT_TIME 100 856215
I/O wait time
POOL_READ_TIME 8 69967
POOL_WRITE_TIME 0 57
DIRECT_READ_TIME 0 36
DIRECT_WRITE_TIME 0 848
LOG_DISK_WAIT_TIME 79 680386
LOCK_WAIT_TIME 6 51522
AGENT_WAIT_TIME 0 0
Network and FCM
TCPIP_SEND_WAIT_TIME 0 769
TCPIP_RECV_WAIT_TIME 0 206
IPC_SEND_WAIT_TIME 0 1
IPC_RECV_WAIT_TIME 0 0
FCM_SEND_WAIT_TIME 0 0
FCM_RECV_WAIT_TIME 0 0
WLM_QUEUE_TIME_TOTAL 0 0
Application performance database-wide
——————————————————————————–
TOTAL_CPU_TIME TOTAL_ TOTAL_APP_ ROWS_READ +
per request WAIT_TIME % COMMITS ROWS_MODIFIED
———————- ———– ————- —————————-
306 81 5479 11178992
I’m not sure how big of a difference it will make, but usually for OLTP, I’ll start with a log file size of 12,000. I start worrying if my log files might be a bit too large above about 50,000 or so. Given the frequency of log archives, I would certainly be increasing the log file size, and monitoring carefully to see if it makes a difference.
Dear Ember, I am new to DB2. Can you help with a step-by-step method of maintaining tables and indexes, and also some tips on daily and weekly DB2 maintenance routines? My DB2 version is 10, Advanced Workgroup edition.
Thanks
You might want to start here: https://datageek.blog/2019/01/22/what-you-need-to-do-to-support-a-db2-environment/
There are some good topics here, as well: https://datageek.blog/2018/03/22/index-of-introductory-db2-topics/
Nice to see such an old blog post still kept alive with questions and answers, I will definitely keep this in my “toolbox.”
So I dare to put in a question of my own:
We use TSM for log file archiving, which works fine. But lately the DB often comes to a halt with these messages:
MESSAGE : ADM1823E The active log is full and is held by application
handle “”. Terminate this application by COMMIT, ROLLBACK or
FORCE APPLICATION.
If I kill the troublesome process then everything starts working again. But what exactly does “The active log is full” mean?
The file system is not full and there are a lot of log files but it feels like one simple/bloody log file puts a stop to the whole chain from UPDATE to the destination at TSM.
Kind regards from an Oracle-tainted admin.
These should help: https://datageek.blog/2016/02/02/preventing-log-saturation/ and https://datageek.blog/2017/03/14/determining-log-span/
Hello, a slightly different question. We have a DB2 database used by core business transactions, replicated using QRep to another database used for showing data in a UI. We found that there are a few UoWs containing a huge number of rows (like 220,000), causing significant delays in QRep, but we do not know which part/component of the application is causing them. We need to find the rogue application component that is committing such huge UoWs. How can we identify the source process, or the tables in which that many changes were committed (which in turn would let us identify the component)? Is there a way to log the tables involved in, and/or the process committing, a monster transaction in the DB2 transaction logs? Any help/pointers?
How easy this is depends on other activity in your database. If it is a fairly slow one, you can just look for jumps in rows written by table in mon_get_table or statements with a large number of rows written in mon_get_pkg_cache_stmt.
If the database is busy, you’ll have to do something more difficult and invasive like an activity event monitor to catch it.
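Here is a rough sketch of the first approach (the -2 asks for all members; adjust the ordering and row limits to taste). Compare the numbers between two points in time to see which tables and statements are doing the heaviest writing:

db2 "select varchar(tabschema,20) as schema_name, varchar(tabname,30) as table_name, rows_inserted + rows_updated + rows_deleted as rows_changed from table(mon_get_table(null,null,-2)) order by rows_changed desc fetch first 10 rows only"
db2 "select num_executions, rows_modified, varchar(stmt_text,100) as stmt from table(mon_get_pkg_cache_stmt(null,null,null,-2)) where rows_modified > 0 order by rows_modified desc fetch first 10 rows only"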
Apologies, ignore my previous 2 replies/ comments pending moderation. Figured the query I needed.
But unfortunately, it seems to provide AGGREGATE values for the number of rows updated, etc. – not values at the individual statement execution level. While there is a way to calculate an average per execution, it may not help identify outliers that updated a large number of rows in only some of their executions.
Is there no monitoring function that can get me the number of rows updated in a single execution (and allow filtering if it’s > 1000 or some such value)? Alternatively, is there a log entry at the UoW level that can tell me how many rows were changed and the associated statement(s)?
You would need an activity event monitor to get statement-by-statement details, and that has a potential of significant performance impact as well as generating a lot of data.
You could calculate the average for each statement (rows read or returned or updated divided by number of executions) or average for each transaction for currently connected transactions in mon_get_connection (rows read or returned or updated divided by commits + rollbacks), but that’s still aggregates.
A statistics event monitor might help, but I don’t have enough experience with them to know for sure.
Hi,
Archiving log files to the archive vendor is taking 15 to 16 minutes per file. Can you please suggest whether this is slow or normal, and let me know what things I need to check?
2020-07-07-17.59.27.978147+060 E192228525E438 LEVEL: Info
PID : 9506 TID : 139992789673728 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000 DB : WAREHOUS
HOSTNAME: ukbrt-mctvpsl02
EDUID : 27 EDUNAME: db2logmgr (WAREHOUS) 0
FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogFile, probe:3108
DATA #1 :
Started archive for log file S0043262.LOG.
2020-07-07-18.14.56.353881+060 I192228964E502 LEVEL: Info
PID : 9506 TID : 139992789673728 PROC : db2sysc 0
INSTANCE: db2inst2 NODE : 000 DB : WAREHOUS
HOSTNAME: ukbrt-mctvpsl02
EDUID : 27 EDUNAME: db2logmgr (WAREHOUS) 0
FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogFile, probe:3180
DATA #1 :
Completed archive for log file S0043262.LOG to VENDOR chain 2 from /data/db2inst2/NODE0000/LOGSTREAM0000/.
Thank you.
How large are the log files?
Hello, how can we find the details of archived log files, such as when they were created? I know it is possible with db2fmtlog, but that is for version 11 and we are only using version 10.
Are you trying to determine this without access to the database? If you have access to the database, you could just query the history file to find out when it was archived at least.
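For example, something like this (SAMPLE is a placeholder database name) lists the log archive entries from the history file, including when each archive happened:

db2 list history archive log all for database sample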
Hi Ember,
Hope you are doing well. I am new to the DB2 environment and saw your article.
If a DB2 customer has, by mistake, deleted the transaction logs, is there any possibility of recovering the database, considering no backup was performed?
If there is any recovery reference, please share.
Thanks in advance !!!!!
DB2 9.7 running on AIX 6.1 TL06 on IBM Power 7
Kindly advise
Unfortunately, there mostly isn’t. This is one of the big reasons to take backups.
IBM support MAY be able to help you by creating dummy files for the logs. You may be able to dump some of the data out using db2dart, but it is highly unlikely that you will get back a fully intact and functioning database. Generally, even if support is able to help, you’ll have to trash the DB and create a wholly new one, as it’s not really in a stable state after forcing something like that.
I say this from having encountered similar situations. I’ve used restores to get out of this in the real world, and attempted to help clients (who were not clients before they came to me with that problem) recover from this scenario, but it is often very difficult if possible at all.