DB2 Internals: Simple Query Processing

Edited 11/13/2014 to be more specific about coordinator agents.

This is the first in a series of blog entries talking about DB2 internals. I have no intimate knowledge from IBM of how DB2 works internally, just years of watching presentations about the internals and doing my best to understand what DB2 is doing to inform my actions and decisions as a DBA. Much of this information is in the depths of the Knowledge Center or is generally unpublished outside of an IBM class or IBM conference presentation.

How a Simple Query is Processed

The goal here is to show how processing works for a simple query using threads, memory area, and disk access. There are things at many levels that are not shown here, and lesser details left out, but this is basically how it happens. This covers a DB2 version of 9.5 or later, using row organized tables – a single partition database with INTRA_PARALLEL turned off.

In the images used here, circles represent threads or processes, while grey rectangles represent memory areas. Disks are represented by cylinders. Red arrows indicate the action described by each step. Click on any of the images for a larger version.

Here is how DB2 would handle a simple query, such as SELECT * FROM T1

Query is Sent to DB2 Server

Internals_SimpleQuery_1
First, the client sends the query to the server. A coordinating agent is either allocated from a pool or created to serve each application when it connects. There is only one coordinator agent for each query. Different coordinator agents exist to serve other queries on other connections. Each connection has only one coordinator agent.

Access Plan is Compiled

Internals_SimpleQuery_2
The coordinator agent checks to see if there is an access plan for this query in the package cache. If there is a plan, the coordinator agent can use it. If there is not a plan, an access plan is compiled and placed into the package cache.

Data is Retrieved

There are three main ways that DB2 can get the extents it needs with the data to fulfill the query. All data that is not LOB data moves through the bufferpools.
Internals_SimpleQuery_3a
The first and fastest way DB2 can get the data is if the extents it needs are already in the bufferpools. If the extents needed are there, the coordinator agent can read the data directly from memory.

Internals_SimpleQuery_3b
If the extents needed are not in the bufferpools, then the coordinator agent must go directly to disk, and read the extents it needs into the bufferpool from disk. This is much slower than simply reading data from the bufferpools, as the coordinator agent must wait for the data to be read from disk. Disk caching may make this access faster.

Internals_SimpleQuery_3c
To avoid the slower synchronous disk access method, the prefetchers may be triggered to read data that is predicted to be needed into the bufferpools before the coordinator agent needs it. This is called Asynchronous I/O.

Locking

Internals_SimpleQuery_4
Depending on the isolation level, locks may be acquired during the processing of any query, including selects. If locks are required, then data about those locks is written to the lock list. When locks are released depends on the isolation level. All locks are released on a COMMIT, which is outside of the scope of this article.

Returning Data

Internals_SimpleQuery_5
Once all the data has been gathered, the coordinator agent returns the data to the client.

Summary

Trying to cover every single detail would be overwhelming, but this provides a good idea of what is going on. I would love to hear from readers in the comments below about this type of blog entry, as I have many more planned.

Ember Crooks
Ember Crooks

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, spent a year working with high-volume MySQL, and is now learning Snowflake. Ember shares both posts about her core skill sets and her journey learning Snowflake.

Ember lives in Denver and work from home

Articles: 549

15 Comments

  1. Like it! It might also not be a bad idea to list which registry, DBM, and DB CFG parms may affect differing processes or points. Again, not necessarily going into detail, but just pointing it out may help others to make the connection between them.

  2. Once again great to see a very lucid and crisp write up from you. We usually don’t find these kind of materials in IBM websites.Your article is equally good as of Matt Huras’s DB2 internals presented in IDUG.
    However, i always have a doubt regarding the prefetching part. How DB2 predicts which page is needed? Isnt it more difficult in case of an OLTP environment where the page access is very random? Can you please shed some light on it?

    • It’s true that prefetching cannot help us with the singleton row query. However, there’s much more going on in most e-commerce/OLTP databases that either sequential or list prefetching can help with. I’ll put a blog entry about prefetching high on my to-write list.

  3. Your blog is really fantastic! The best I ever saw! You use a powerfull teaching method to make us understand how db2 is working and how we can extract the best from it.
    Thank you so much.
    And about this type of blog entry, let’s go! 🙂

  4. Thanks for this blog. Do you have something similar for DML statements, which also tells the statement workflow, and also what happens in case of database crash.

  5. Great work, Ember.

    The way you have written about each topic, particularly the one above has more than enough details, with a good flow and explanatory pictures, takes us into a db2 world.
    Enjoyed learning and looking forward to more articles.

    Thanks & Regards.

Leave a Reply to WW_DB2Cancel Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.