Edited 11/13/2014 to be more specific about coordinator agents.
This is the first in a series of blog entries talking about DB2 internals. I have no intimate knowledge from IBM of how DB2 works internally, just years of watching presentations about the internals and doing my best to understand what DB2 is doing to inform my actions and decisions as a DBA. Much of this information is in the depths of the Knowledge Center or is generally unpublished outside of an IBM class or IBM conference presentation.
How a Simple Query is Processed
The goal here is to show how processing works for a simple query, using threads, memory areas, and disk access. Details at many levels are left out, but this is basically how it happens. This covers DB2 9.5 or later with row-organized tables, in a single-partition database with INTRA_PARALLEL turned off.
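If you want to confirm the instance-level setting this walkthrough assumes, the database manager configuration can be queried. This is a minimal sketch, assuming the SYSIBMADM administrative views are enabled and the parameter appears under the lowercase name 'intra_parallel':

-- Confirm that intra-partition parallelism is turned off at the instance level
SELECT NAME, VALUE
FROM SYSIBMADM.DBMCFG
WHERE NAME = 'intra_parallel';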
In the images used here, circles represent threads or processes, while grey rectangles represent memory areas. Disks are represented by cylinders. Red arrows indicate the action described by each step. Click on any of the images for a larger version.
Here is how DB2 would handle a simple query, such as SELECT * FROM T1.
Query is Sent to DB2 Server
First, the client sends the query to the server. A coordinator agent is either allocated from the agent pool or created to serve the application when it connects. Each connection has exactly one coordinator agent, and that agent handles every query on that connection; queries on other connections are served by their own coordinator agents.
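To see the current connections, each of which is served by its own coordinator agent, you can use the monitoring table functions. This is a minimal sketch, assuming DB2 9.7 or later, where MON_GET_CONNECTION is available:

-- One row per connection; each connection has one coordinator agent serving it
SELECT APPLICATION_HANDLE,
       APPLICATION_NAME,
       CLIENT_HOSTNAME
FROM TABLE(MON_GET_CONNECTION(NULL, -1)) AS T;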
Access Plan is Compiled
The coordinator agent checks to see if there is an access plan for this query in the package cache. If there is, the coordinator agent reuses it; if not, an access plan is compiled and placed into the package cache.
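You can see which statements currently have a cached access plan using the package cache monitoring table function. This is a minimal sketch, assuming DB2 9.7 or later, where MON_GET_PKG_CACHE_STMT is available:

-- List cached statements and how many times each cached plan has been executed
SELECT NUM_EXECUTIONS,
       SUBSTR(STMT_TEXT, 1, 80) AS STMT_TEXT
FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS T
ORDER BY NUM_EXECUTIONS DESC;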
Data is Retrieved
There are three main ways that DB2 can get the extents containing the data needed to fulfill the query. All data other than LOB data moves through the bufferpools.
The first and fastest way DB2 can get the data is if the extents it needs are already in the bufferpools. If the extents needed are there, the coordinator agent can read the data directly from memory.
If the extents needed are not in the bufferpools, then the coordinator agent must go directly to disk, and read the extents it needs into the bufferpool from disk. This is much slower than simply reading data from the bufferpools, as the coordinator agent must wait for the data to be read from disk. Disk caching may make this access faster.
To avoid the slower synchronous disk access, the prefetchers may be triggered to read data that is predicted to be needed into the bufferpools before the coordinator agent needs it. This is called asynchronous I/O, or prefetching.
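All three of these access patterns show up in the buffer pool monitoring counters: logical reads that are not also physical reads were satisfied from memory, physical reads came synchronously from disk, and asynchronous reads were done by the prefetchers. This is a minimal sketch, assuming DB2 9.7 or later, where MON_GET_BUFFERPOOL is available:

-- Compare logical reads (found in memory), physical reads (synchronous disk
-- reads by the agent), and asynchronous reads (done by the prefetchers)
SELECT BP_NAME,
       POOL_DATA_L_READS,
       POOL_DATA_P_READS,
       POOL_ASYNC_DATA_READS
FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS T
WHERE BP_NAME NOT LIKE 'IBMSYSTEMBP%';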
Locking
Depending on the isolation level, locks may be acquired during the processing of any query, including selects. If locks are required, then data about those locks is written to the lock list. When locks are released depends on the isolation level. All locks are released on a COMMIT, which is outside of the scope of this article.
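One easy way to see the effect of isolation level on locking is to override it at the statement level with an isolation clause. This is a minimal sketch using the standard DB2 isolation clauses, with T1 simply being the example table from above:

-- Uncommitted Read: acquires essentially no row locks for this read
SELECT * FROM T1 WITH UR;

-- Repeatable Read: holds locks on all rows read until the unit of work ends
SELECT * FROM T1 WITH RR;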
Returning Data
Once all the data has been gathered, the coordinator agent returns the data to the client.
Summary
Trying to cover every single detail would be overwhelming, but this provides a good idea of what is going on. I would love to hear from readers in the comments below about this type of blog entry, as I have many more planned.
Thank you very much for sharing, Ember. Best regards from Spain.
Like it! It might also not be a bad idea to list which registry, DBM, and DB CFG parms may affect differing processes or points. Again, not necessarily going into detail, but just pointing it out may help others to make the connection between them.
Best blog I have seen so far on how DB2 gets data. I like the visuals as well.
Once again, great to see a very lucid and crisp write-up from you. We usually don’t find this kind of material on IBM websites. Your article is as good as Matt Huras’s DB2 internals presentations at IDUG.
However, I have always had a doubt regarding the prefetching part. How does DB2 predict which page is needed? Isn’t it more difficult in an OLTP environment where page access is very random? Can you please shed some light on it?
It’s true that prefetching cannot help us with the singleton row query. However, there’s much more going on in most e-commerce/OLTP databases that either sequential or list prefetching can help with. I’ll put a blog entry about prefetching high on my to-write list.
It’s very simple and very nice. I am looking forward to more blog entries of this kind.
Nice blog. Thank you.
Hi Ember, will you please kindly share your blogs in https://www.facebook.com/groups/kidug/ ? This will help many people who love DB2, for sure. Thanks in advance.
Regards, KIDUG
I use Facebook only for personal purposes at this time. Anyone else is welcome to post links there.
Your blog is really fantastic! The best I have ever seen! You use a powerful teaching method to make us understand how DB2 works and how we can extract the best from it.
Thank you so much.
And about this type of blog entry, let’s go! 🙂
Really enjoyed reading your blog, you explain everything very clearly. Loved the graphs too. Can’t wait for your next blog!
Thanks for this blog. Do you have something similar for DML statements, showing the statement workflow and also what happens in the case of a database crash?
I am planning more entries in this series, which should include those.
Great work, Ember.
The way you have written about each topic, particularly the one above, with more than enough detail, a good flow, and explanatory pictures, takes us into the DB2 world.
Enjoyed learning and looking forward to more articles.
Thanks & Regards.
Looking forward to this series!