What a Commit is
Database management systems have the concept of a commit: grouping sets of actions together so that they all succeed or fail as a group. This is the A in the famous ACID properties of a transaction system – Atomicity. A transaction may consist of multiple small parts, but if any one of those parts fails, the transaction as a whole must leave the data unchanged. This concept is especially critical in relational databases, which may be normalized into many smaller tables, so a single transaction often consists of more than one update in more than one location.
Assuming that an application does not have autocommit turned on, the following is an example of a transaction and the use of a commit.
BEGIN TRANSACTION
select columns from table_a where id < NNNN;
update table_a set col1=1234;
update table_b set col2=5678;
insert into table_c .... ;
commit work;
END TRANSACTION
In the above example, if the update statement for table_b fails, the application will detect that, and the commit work statement will never be executed, or a rollback statement may be executed instead. DB2 will roll back the update to table_a and the insert into table_c.
A commit also relates to the D in ACID – Durability. The durability property means that once a commit happens, the data persists (traditionally on disk), even if power is lost or other likely failures occur. This is a major reason that databases have transaction logs.
When Commits Occur
The DB2 command line usually has autocommit turned on by default. This means that if you are simply issuing db2 commands from a command line, then you do not generally have to commit. Many applications manage commits in other ways, including some GUI database administration or access applications.
When designing an application, designers and developers must be certain that commits are happening when they are supposed to.
DBAs also advocate for frequent commits during long-running actions like data loads or large deletes. Largely, this is so the transaction log files do not fill up.
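As a sketch of that advice, a large delete can be broken into smaller units of work with a commit after each batch. This is a hedged example – the table name, predicate, and batch size are all placeholders, and the surrounding loop (a shell script or stored procedure that repeats until no rows remain) is left out:

```sql
-- Hypothetical example: delete old rows in batches of 10,000,
-- committing after each batch so the active log space is not exhausted.
-- Repeat this pair of statements until the DELETE affects 0 rows.
DELETE FROM (
    SELECT 1 FROM sales_history
    WHERE sale_date < '2010-01-01'
    FETCH FIRST 10000 ROWS ONLY
);
COMMIT WORK;
```

The DELETE-from-fullselect form limits how many rows each transaction touches; each COMMIT then releases log space and locks.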
Frequent commits also support the highest level of concurrent data access, because locks are released at commit time.
What Commits Do
DB2 uses a method referred to as write-ahead logging. This means that when a commit occurs, the data is written directly to the transaction logs. Data gets to the tables and such on disk asynchronously, through the buffer pool. A commit does not write data out to the tables itself. This saves time - the end user is not sitting and waiting on as much I/O.
A side note - DB2 writes both redo and undo data to the transaction logs - it is not like Oracle where redo and undo logs are separate things.
DB2 has a memory buffer called the log buffer. As data is changed, it is written to the log buffer, committed or not. The log buffer is then written out to disk either when it becomes full or when any connection does a commit. The commit is not successful until it has been externalized from the log buffer to the log files on disk.
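Since a commit waits for the log buffer to be externalized, the size of that buffer matters. A hedged sketch of checking and adjusting it from the command line – SAMPLE is a placeholder database name, and the value shown is only illustrative (LOGBUFSZ is specified in 4 KB pages):

```shell
# View the current log buffer size (in 4 KB pages)
db2 get db cfg for SAMPLE | grep -i logbufsz

# Increase it; the database may need to be deactivated and
# reactivated for the new value to take effect.
db2 update db cfg for SAMPLE using LOGBUFSZ 2048
```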
Uncommitted data can thus be written from the log buffer into the log files when this occurs. But DB2 tracks and knows which transactions are committed, and which are not.
In the image above, an agent writes a commit record to the log buffer (1). The logger process then writes the log buffer to disk (2) – either immediately or shortly thereafter, depending on MINCOMMIT and other factors. The agent waits for acknowledgement from the logger process that the log records for the agent's commit, up to and including the commit record itself, have been externalized to disk (3). More than one agent may be waiting at a time. Since this is where end users are waiting on physical I/O, it makes sense to apply your fastest disk to the transaction log files.
Note that the commit does not touch bufferpools or tablespace data. As statements have come through, they have been writing all the information for both undo and redo to the log buffer, so it's already there or in the transaction log files waiting for the commit.
If a database crashes, then when it comes back up, it goes through a process called crash recovery, which reads the transaction log files and rolls forward through any transactions needed. After it completes that "forward phase" of crash recovery, it rolls back any transactions that were in the log files but were not committed.
I recently had a customer who wanted to explicitly turn off autocommit for his sessions, as he prefers the explicit commit when needed. 😉
There are three ways to configure this:
– Start a db2 command shell with “db2 +c” or append “+c” to every db2 invocation.
– In a db2 command script, you can use “update command options using c off” to switch it off on demand.
– Or you can define an environment variable DB2OPTIONS=”+c” in the user’s environment.
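As a quick illustration of the first option – everything here is a sketch, and SAMPLE and table_a are placeholder names:

```shell
# Connect, run an update with autocommit off for that invocation,
# then commit explicitly in the same CLP back-end session.
db2 connect to SAMPLE
db2 +c "update table_a set col1 = 1234"
db2 commit work
```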
Very clear, succinct job reviewing the main interactions of a commit. I just thought I would add a note for anyone using broad “currently committed” data access, which a lot of folks seem to be liking these days. If you do, and you’re in a highly volatile OLTP environment where you likely have infinite logging enabled, then you’ll run into problems if you don’t use an extremely large (read: unreasonably large) log buffer given the following restriction on the currently committed access level:
“A transaction that has made an uncommitted modification to a row forces the currently committed reader to access appropriate log records to determine the currently committed version of the row. Although log records that are no longer in the log buffer can be physically read, currently committed semantics do not support the retrieval of log files from the log archive. This only affects databases that are configured to use infinite logging.”
Basically, just note that w/currently committed, you need to be as sure as you can be that the rows you need to affect/undo/redo/whatever are found in the buffer and not in the log archive on disk. More explication in supposedly well-understood areas like this is needed, so keep it up. Great post, Ember.
Awesome detail, thanks!
Great article, Ember, thanks very much!
I’d like to ask if anybody has run into applications staying in “Commit Active” state for seconds, and how they solved or root-caused it. I’ve run into 2 different customers having this strange issue; eventually the issue disappeared, but during the time it lasted (days) we never identified the root cause.
Without having had this problem myself, I would suggest you look at log buffer sizing, MINCOMMIT, and disk contention on your log device.
Also, quite small transaction log sizes can lead to a lot of archiving activity, sometimes slowing down commit processing.
But beware: changing mincommit requires a lot of knowledge about your environment. It is a kind of running gag among the lab/PMR guys to change mincommit to a value higher than 1 and then have problems with commit processing.
If I set MINCOMMIT to 20 and the database server crashes just after the application has performed 19 COMMIT operations, then the committed data has yet to be written from the log buffer to disk. Will crash recovery be able to recover the 19 commits performed prior to the crash, or are they all lost?
Yes, they would likely all be lost. MINCOMMIT controls when data is externalized to disk, so the commits would not be on disk, and everything in memory is lost in a crash. I’ve heard the rule of thumb is to increase MINCOMMIT very slowly, and it generally shouldn’t be set to more than three.
I always prefer your articles when I don’t know a concept, because they start from the basics and build up. I had a small doubt: if log files are deleted and I didn’t update the log mirroring parameter, how can I get those deleted logs back? Also, can transaction log space fill up with circular logging?
Log space can absolutely fill up with circular logging. One transaction must fit into the active log space, and all active transactions together must fit into the active log space. If they do not, you will get a log-full error message.
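For context, the active log space referred to here is governed by a few database configuration parameters; roughly, it equals LOGFILSIZ (in 4 KB pages) multiplied by the number of log files (LOGPRIMARY plus LOGSECOND). A hedged sketch of checking them – SAMPLE and the values shown are placeholders, and appropriate sizing depends entirely on your workload:

```shell
# Check the parameters that determine active log space
db2 get db cfg for SAMPLE | grep -iE 'LOGFILSIZ|LOGPRIMARY|LOGSECOND'

# Example values only - size these for your own workload
db2 update db cfg for SAMPLE using LOGFILSIZ 10240 LOGPRIMARY 13 LOGSECOND 12
```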
If transaction log files are deleted, then your only option is generally to get them from an OS-level backup, if you happen to have one. If active log files are deleted, you will likely corrupt your database. In a few rare circumstances, DB2 may be able to recreate the log files, but you shouldn’t count on it.
Does that answer your questions?
Yep, got the info clearly. Thanks for that.
Thank you for your post, but perhaps you can answer a question for me. If I have two separate modules running and updating the same table at more or less the same time, would a commit by one module commit changes by the other? There is no way they can be updating the same record at the same time.
Module 1 will update certain fields on Table A and based on these updated values Module 2 will process the record and update whatever values it needs to.
In my example:
Module 1 retrieves and updates Row 1 on the table and commits the change.
Module 2 retrieves Row 1 and, based on the values, will process the row but will not commit immediately.
Module 1 retrieves and updates Row 2 on the table and commits the change.
Will this last commit from Module 1 commit the changes from Module 2?
This is why we have row locking in DB2. While it can be a pain sometimes, Module 2 would not be able to update a row until Module 1 had committed or rolled back, releasing its exclusive row lock. Depending on the isolation levels, Module 2 may not even be able to read the row while Module 1 is working on it. A commit also applies only to the work done on its own connection, so one module's commit cannot commit the other module's uncommitted changes. What you describe is part of ACID processing – Isolation. Every database platform that ensures ACID must separate the transactions; how that occurs depends on the platform.
Let me know if that does not answer your question.
Thanks for the detailed explanation. I just have one question. We are trying to load millions of records, but I only see a throughput of 5K records/sec running insert statements. I know the load utility can run much faster, but 5K seems very low to me, and I only see log disk write wait. We tried all the log-related parameters, but nothing changed. Also, the Avg Write (ms) is just 1 ms. Is 5K the optimal throughput? Is there anything I’m missing?
Did you increase LOGBUFSZ? The default is far too small for most situations.
Insert is also affected by things like triggers and indexes on the table – consider whether those might be impacting insert speed.
You’ll get better throughput with import over insert, because you can use the commitcount parameter to commit a group of records at a time.
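A hedged sketch of that approach – the file name, table name, and commitcount value are all placeholders:

```shell
# Import a delimited file, committing every 1000 rows instead of
# in one giant transaction (or one commit per row with plain inserts).
db2 import from mydata.del of del commitcount 1000 insert into myschema.table_a
```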
If insert performance is a top priority, placing your active transaction logs on the fastest possible disk can also be really useful.
This is a very proper description of commit – thanks for this. I have one query, though. Suppose we run some update/insert statements on a table and immediately afterward run a commit. Can we ever get a -904 SQLCODE for this commit if the statements before the commit completed successfully?
Anything is possible. It depends on the resource that is unavailable. Commit must be able to write the log buffer out to log files, and that may involve interaction with the HADR standbys. So there are certainly resources that must be accessed for commit.