Comparing Db2 and MySQL: Explain

The goal of this article is to to take one query and use the explain facilities of Db2 to walk through it, and then do the same with MySQL, and then show some of the optimizations I might make for the query and why.

A few disclaimers. First, I’m using MySQL 5.7. I understand there may be some more interesting options in MySQL 8, but I continue to use 5.7 because that’s what is in production at work. Second, I have 20 years of experience with Db2 Explains, and only 6 months with MySQL. I welcome comments and suggestions below.

Optimizer

Moving from Db2 to MySQL, I expected the optimizer to be less sophisticated. And that is, indeed the case. For me, that’s one of the known differences between an open-source and a proprietary system. IBM has put decades of paid development into the Db2 optimizer. The way Db2 collects and uses statistics is also much more in-depth than MySQL 5.7.

Query and Data Model

I’ve chosen a data model and a query that is pretty MySQL-centric. I’m using the Sakila database and a the “find overdue DVDs” query from this page in the MySQL documentation.

Db2

For Db2, I pulled the Sakila database from this repo. I do have to change the date syntax, as every database platform somehow has to come up with its own date syntax. <sigh> I’m translating it as literally as possible. Note that the data wasn’t all there from the source, I list, either, and I had to tweak it. This included some kind of confusion between NULL and ”, and missing data in the phone field of the address table. I need to see if they’ll accept PRs to make it better.

Query and Result Set

Here’s the query and result set to level set before digging in to the explain:


SELECT customer.last_name || ', ' || customer.first_name AS customer, address.phone, film.title 
FROM rental 
INNER JOIN customer ON rental.customer_id = customer.customer_id 
INNER JOIN address ON customer.address_id = address.address_id 
INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id 
INNER JOIN film ON inventory.film_id = film.film_id 
WHERE rental.return_date IS NULL 
AND rental_date + film.rental_duration DAYS < CURRENT DATE 
ORDER BY title 
LIMIT 5

CUSTOMER                                                                                     PHONE                TITLE
-------------------------------------------------------------------------------------------- -------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
OLVERA, DWAYNE                                                                               62127829280          ACADEMY DINOSAUR
HUEY, BRANDON                                                                                99883471275          ACE GOLDFINGER
OWENS, CARMEN                                                                                272234298332         AFFAIR PREJUDICE
HANNON, SETH                                                                                 864392582257         AFRICAN EGG
COLE, TRACY                                                                                  371490777743         ALI FOREVER

  5 record(s) selected.

The output is messy here because that's how Db2 formats it by default using my favorite CLI, and I'm not changing the query to make it prettier. Note that the data here is the same as using MySQL below.

Basic Explain

This is my favorite format for doing explain using Db2. I'll share each step here, and then attach the full output, as it's often several pages.

$ db2 set current explain mode explain
DB20000I  The SQL command completed successfully.
$ db2 -tvf query4.sql
SELECT customer.last_name || ', ' || customer.first_name AS customer, address.phone, film.title FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id INNER JOIN address ON customer.address_id = address.address_id INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id INNER JOIN film ON inventory.film_id = film.film_id WHERE rental.return_date IS NULL AND rental_date + film.rental_duration DAYS < CURRENT DATE ORDER BY title LIMIT 5
SQL0217W  The statement was not executed as only Explain information requests
are being processed.  SQLSTATE=01604

$ db2exfmt -d sakila -1 -o query4.exfmt
DB2 Universal Database Version 11.5, 5622-044 (c) Copyright IBM Corp. 1991, 2019
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

Connecting to the Database.
Connect to Database Successful.
Using SYSTOOLS schema for Explain tables.
Output is in query4.exfmt.
Executing Connect Reset -- Connect Reset was Successful.

Note there is no information given there - it's all in the output file query4.exfmt.

Explain Highlights

I could probably teach an hour-long course on everything in this explain output, so let's stick to a few highlights and what they tell us. First of all, I nearly always start with the query graph:

Access Plan:
-----------
    Total Cost:         450.146
    Query Degree:       1


                                               Rows 
                                              RETURN
                                              (   1)
                                               Cost 
                                                I/O 
                                                |
                                                 5 
                                              ^NLJOIN
                                              (   2)
                                              450.146 
                                              262.976 
                              /-----------------+-----------------\
                             5                                       1 
                          ^NLJOIN                                 FETCH 
                          (   3)                                  (  14)
                          389.413                                 13.5336 
                            254                                      2 
                     /------+-------\                           /---+----\
                    5                  1                       1           603 
                 TBSCAN             FETCH                   IXSCAN   TABLE: DB2INST1
                 (   4)             (  12)                  (  15)       ADDRESS
                 328.517            13.5336                 6.76789        Q3
                   245                 2                       1 
                   |              /---+----\                  |
                    5            1           599              603 
                 SORT         IXSCAN   TABLE: DB2INST1  INDEX: DB2INST1
                 (   5)       (  13)      CUSTOMER        PK_ADDRESS
                 328.517      6.76789        Q1               Q3
                   245           1 
                   |            |
                   61           599 
                 HSJOIN^  INDEX: DB2INST1
                 (   6)     PK_CUSTOMER
                 328.515        Q1
                   245 
         /---------+----------\
      1000                      183 
     TBSCAN                   HSJOIN^
     (   7)                   (   8)
     49.3022                  279.192 
       53                       192 
       |              /---------+---------\
      1000         4581                     183 
 TABLE: DB2INST1  FETCH                   TBSCAN
      FILM        (   9)                  (  11)
       Q5         143.584                 135.564 
                    43                      149 
                /---+----\                  |
             4581         4581             16044 
            IXSCAN   TABLE: DB2INST1  TABLE: DB2INST1
            (  10)      INVENTORY         RENTAL
            81.5935        Q4               Q2
              12 
              |
             4581 
        INDEX: DB2INST1
        IDX_FK_FILM_ID
              Q4

This graph is read bottom to top (the things at the bottom happen first, and the things at the top happen last). It's also read right to left (things at the right generally happen before things on the left).

From this graph, I can see that this query takes 450 timerons. Timerons are an abstract measurement of time that cannot be converted to actual units of time, but can be used to compare access plans to one another. It also expects to return 5 rows, which is absolutely great, because it does return 5 rows. This graph shows me every object (table or index) that is accessed and what join methodology is used to join it to the rest of the query. Looking at this, I can immediately see that there are two table scans, and two index accesses with fetching data from the tables themselves. There are several levels of opportunities to improve performance there.

Each node of the graph is called an operator and has an identifier in parentheses. In this graph, the operators are numbered 1 through 15. Each operator also tells us the cost in timerons, which is cumulative of all operators below that one in the graph. This is the number directly below the operator identifier. The top number in each operator tells us the number of rows Db2 expects to be dealing with at this point. These numbers can be off, particularly if statistics are not correct or are not detailed. There are tools for determining that like db2caem.

To get more information on any of the operators, I can scroll down to their section. For example, the details on the table scan of RENTAL (11) look like this:

 11) TBSCAN: (Table Scan)
        Cumulative Total Cost:      135.564
        Cumulative CPU Cost:        3.57339e+07
        Cumulative I/O Cost:        149
        Cumulative Re-Total Cost:   1.77916
        Cumulative Re-CPU Cost:     3.47691e+07
        Cumulative Re-I/O Cost:     0
        Cumulative First Row Cost:  7.46347
        Estimated Bufferpool Buffers:   149

        Arguments:
        ---------
        CUR_COMM: (Currently Committed)
            TRUE
        JN INPUT: (Join input leg)
            INNER
        LCKAVOID: (Lock Avoidance)
            TRUE
        MAXPAGES: (Maximum pages for prefetch)
            ALL
        PREFETCH: (Type of Prefetch)
            SEQUENTIAL          
        ROWLOCK : (Row Lock intent)
            SHARE (CS/RS)
        SCANDIR : (Scan Direction)
            FORWARD
        SKIP_INS: (Skip Inserted Rows)
            TRUE
        SPEED   : (Assumed speed of scan, in sharing structures)
            FAST
        TABLOCK : (Table Lock intent)
            INTENT SHARE
        TBISOLVL: (Table access Isolation Level)
            CURSOR STABILITY
        THROTTLE: (Scan may be throttled, for scan sharing)
            TRUE
        VISIBLE : (May be included in scan sharing structures)
            TRUE
        WRAPPING: (Scan may start anywhere and wrap)
            TRUE

        Predicates:
        ----------
        6) Sargable Predicate, 
            Comparison Operator:        Is Null
            Subquery Input Required:    No
            Filter Factor:          0.0114061

            Predicate Text:
            --------------
            Q2.RETURN_DATE IS NULL



        Input Streams:
        -------------
            7) From Object DB2INST1.RENTAL

                Estimated number of rows:   16044
                Number of columns:      5
                Subquery predicate ID:      Not Applicable

                Column Names:
                ------------
                +Q2.$RID$+Q2.RENTAL_DATE+Q2.RETURN_DATE
                +Q2.INVENTORY_ID+Q2.CUSTOMER_ID


        Output Streams:
        --------------
            8) To Operator #8

                Estimated number of rows:   183
                Number of columns:      3
                Subquery predicate ID:      Not Applicable

                Column Names:
                ------------
                +Q2.RENTAL_DATE+Q2.INVENTORY_ID+Q2.CUSTOMER_ID

Each operator has similar sections, indicating where it is getting input from and where it is sending output to, along with any predicates being applied, and other details about the cost at this node.

Using Explain

The first reason to use explain is always to understand what choices the optimizer is making. Often we will use those choices and the costs to determine where to focus to improve query performance. While there are some red flags, like table scans, to look out for that we may want to address, we also use those cost estimates to figure out where to spend time optimizing. In the case of this query, the costs tell us to focus on the operators below the Hash Scan Join - operator 6. Spending time on the Customer or address branches would bring us limited improvements as the costs there are already quite low. If we could transform the table scan on RENTAL (operator 11) to something else, we might make a significant improvement.

Explain Summary for Db2

I've shared only a very small amount of information here, but the information is there to really dig into different parts of the query. In addition, I have advanced techniques. I can use db2caem to get not just an explain plan, but the actual counts of things like rows if I suspect that Db2 is estimating incorrectly. This could allow me to improve my statistics gathering in a way that gives Db2 the information needed to select a better plan.

I could also do an explain from section of an actual executed instance of this query from the package cache to validate that what explain expects to happen is actually what is happening.

MySQL

I keep feeling like I'm missing something with explain in MySQL, and maybe I am, but the information seems far less dense and useful.

Query and Result Set

This part is nice. Using the MySQL command line, I get nicer output.

mysql> SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer,
    ->            address.phone, film.title
    ->            FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id
    ->            INNER JOIN address ON customer.address_id = address.address_id
    ->            INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id
    ->            INNER JOIN film ON inventory.film_id = film.film_id
    ->            WHERE rental.return_date IS NULL
    ->            AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE()
    ->            ORDER BY title
    ->            LIMIT 5;
+----------------+--------------+------------------+
| customer       | phone        | title            |
+----------------+--------------+------------------+
| OLVERA, DWAYNE | 62127829280  | ACADEMY DINOSAUR |
| HUEY, BRANDON  | 99883471275  | ACE GOLDFINGER   |
| OWENS, CARMEN  | 272234298332 | AFFAIR PREJUDICE |
| HANNON, SETH   | 864392582257 | AFRICAN EGG      |
| COLE, TRACY    | 371490777743 | ALI FOREVER      |
+----------------+--------------+------------------+
5 rows in set (0.11 sec)

This is the same output we got from Db2. I do like how it tells me how long the query took to execute as well. In Db2, I'd have to wrap it in another command or use db2batch to get that. db2batch isn't a bad idea anyway, as one execution of a query isn't the definitive answer on how long it takes.

Explain

There are actually two forms of explain I would run here to look at things. First the basic explain, then a json format that gives us a few more details.

Basic Explain

The basic output of explain looks like this:

mysql> EXPLAIN SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer,
    ->            address.phone, film.title
    ->            FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id
    ->            INNER JOIN address ON customer.address_id = address.address_id
    ->            INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id
    ->            INNER JOIN film ON inventory.film_id = film.film_id
    ->            WHERE rental.return_date IS NULL
    ->            AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE()
    ->            ORDER BY title
    ->            LIMIT 5;
+----+-------------+-----------+------------+--------+----------------------------------------+---------+---------+----------------------------+------+----------+----------------------------------------------+
| id | select_type | table     | partitions | type   | possible_keys                          | key     | key_len | ref                        | rows | filtered | Extra                                        |
+----+-------------+-----------+------------+--------+----------------------------------------+---------+---------+----------------------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | rental    | NULL       | ALL    | idx_fk_inventory_id,idx_fk_customer_id | NULL    | NULL    | NULL                       |    1 |   100.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | customer  | NULL       | eq_ref | PRIMARY,idx_fk_address_id              | PRIMARY | 2       | sakila.rental.customer_id  |    1 |   100.00 | NULL                                         |
|  1 | SIMPLE      | address   | NULL       | eq_ref | PRIMARY                                | PRIMARY | 2       | sakila.customer.address_id |    1 |   100.00 | NULL                                         |
|  1 | SIMPLE      | inventory | NULL       | eq_ref | PRIMARY,idx_fk_film_id                 | PRIMARY | 3       | sakila.rental.inventory_id |    1 |   100.00 | NULL                                         |
|  1 | SIMPLE      | film      | NULL       | eq_ref | PRIMARY                                | PRIMARY | 2       | sakila.inventory.film_id   |    1 |   100.00 | Using where                                  |
+----+-------------+-----------+------------+--------+----------------------------------------+---------+---------+----------------------------+------+----------+----------------------------------------------+
5 rows in set, 1 warning (0.02 sec)

Just to confuse me, this output is read top to bottom, and not bottom to top like the Db2 explain output.

The nice part about this output is that it gives me one and only one line for each table being accessed, and describes which index is being used to access it. It tells me for each table how many rows it expects to be using, but these numbers are based on much more simple statistics than I'm used to in Db2, so may be less accurate. "filtered" does not mean what we might think it to mean in Db2. It indicates what percentage of the rows it thinks it is working with will meet the other conditions for that table. Notice here that it tells us here that it expects to only be working with one row all the way along, when we end up with 5 rows.

The "type" column is sometimes called join type, but is really access type. The access types are well-documented in the MySQL documentation - go to the section titled "EXPLAIN Join Types". One of the big things we're looking for in the explain output here is things we know are not great like table scans (join type: ALL). Here the table scan on the rental table once again stands out.

While the MySQL optimizer is at least partially cost-based, this format of explain has no mention of costs.

JSON Explain

mysql> EXPLAIN SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer,            address.phone, film.title            FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id            INNER JOIN address ON customer.address_id = address.address_id            INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id            INNER JOIN film ON inventory.film_id = film.film_id            WHERE rental.return_date IS NULL            AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE()            ORDER BY title            LIMIT 5;^C
mysql> EXPLAIN FORMAT=JSON SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer,
    ->            address.phone, film.title
    ->            FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id
    ->            INNER JOIN address ON customer.address_id = address.address_id
    ->            INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id
    ->            INNER JOIN film ON inventory.film_id = film.film_id
    ->            WHERE rental.return_date IS NULL
    ->            AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE()
    ->            ORDER BY title
    ->            LIMIT 5;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2.75"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "1.00"
      },
      "nested_loop": [
        {
          "table": {
            "table_name": "rental",
            "access_type": "ALL",
            "possible_keys": [
              "idx_fk_inventory_id",
              "idx_fk_customer_id"
            ],
            "rows_examined_per_scan": 1,
            "rows_produced_per_join": 1,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "0.25",
              "eval_cost": "0.10",
              "prefix_cost": "0.35",
              "data_read_per_join": "32"
            },
            "used_columns": [
              "rental_id",
              "rental_date",
              "inventory_id",
              "customer_id",
              "return_date"
            ],
            "attached_condition": "(`sakila`.`rental`.`return_date` is null)"
          }
        },
        {
          "table": {
            "table_name": "customer",
            "access_type": "eq_ref",
            "possible_keys": [
              "PRIMARY",
              "idx_fk_address_id"
            ],
            "key": "PRIMARY",
            "used_key_parts": [
              "customer_id"
            ],
            "key_length": "2",
            "ref": [
              "sakila.rental.customer_id"
            ],
            "rows_examined_per_scan": 1,
            "rows_produced_per_join": 1,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "0.25",
              "eval_cost": "0.10",
              "prefix_cost": "0.70",
              "data_read_per_join": "584"
            },
            "used_columns": [
              "customer_id",
              "first_name",
              "last_name",
              "address_id"
            ]
          }
        },
        {
          "table": {
            "table_name": "address",
            "access_type": "eq_ref",
            "possible_keys": [
              "PRIMARY"
            ],
            "key": "PRIMARY",
            "used_key_parts": [
              "address_id"
            ],
            "key_length": "2",
            "ref": [
              "sakila.customer.address_id"
            ],
            "rows_examined_per_scan": 1,
            "rows_produced_per_join": 1,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "0.25",
              "eval_cost": "0.10",
              "prefix_cost": "1.05",
              "data_read_per_join": "632"
            },
            "used_columns": [
              "address_id",
              "phone"
            ]
          }
        },
        {
          "table": {
            "table_name": "inventory",
            "access_type": "eq_ref",
            "possible_keys": [
              "PRIMARY",
              "idx_fk_film_id"
            ],
            "key": "PRIMARY",
            "used_key_parts": [
              "inventory_id"
            ],
            "key_length": "3",
            "ref": [
              "sakila.rental.inventory_id"
            ],
            "rows_examined_per_scan": 1,
            "rows_produced_per_join": 1,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "0.25",
              "eval_cost": "0.10",
              "prefix_cost": "1.40",
              "data_read_per_join": "16"
            },
            "used_columns": [
              "inventory_id",
              "film_id"
            ]
          }
        },
        {
          "table": {
            "table_name": "film",
            "access_type": "eq_ref",
            "possible_keys": [
              "PRIMARY"
            ],
            "key": "PRIMARY",
            "used_key_parts": [
              "film_id"
            ],
            "key_length": "2",
            "ref": [
              "sakila.inventory.film_id"
            ],
            "rows_examined_per_scan": 1,
            "rows_produced_per_join": 1,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "0.25",
              "eval_cost": "0.10",
              "prefix_cost": "1.75",
              "data_read_per_join": "552"
            },
            "used_columns": [
              "film_id",
              "title",
              "rental_duration"
            ],
            "attached_condition": "((`sakila`.`rental`.`rental_date` + interval `sakila`.`film`.`rental_duration` day) < (curdate()))"
          }
        }
      ]
    }
  }
} |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.02 sec)

This has a bit more information, including some cost metrics. I'm still not sure how useful they are. I've seen the 'data_read_per_join' be wildly inaccurate - to the tune of TB, when there's no way it was dealing with that much data. But it can still be used to find parts of the access plan that are more expensive than others, probably.

I haven't yet seen a solid number like Db2's timerons to use to compare two access plans. I've found that particularly frustrating, as with Db2, I can say that an access plan with some tweaks will be more or less expensive and by approximately how much.

Tweaking the query and adding an index

Ok, so this query happens to display one of my favorite performance problems. This query is performing date math on columnar data, pretty much guaranteeing a table scan (or maybe an index leaf page scan). I'm curious to see what happens to the access plan on both platforms if I correct that math to apply to the comparison value and add an index on the columns in question.

Db2

For Db2, I add the index, and change the syntax, and redo the explain:

 db2 "create index idx_rental_query4 on rental (return_date, rental_date, inventory_id, customer_id) allow reverse scans collect detailed statistics"
DB20000I  The SQL command completed successfully.
[db2inst1@d8ab9ae65554 ~]$ db2 set current explain mode explain
DB20000I  The SQL command completed successfully.
[db2inst1@d8ab9ae65554 ~]$ db2 -tvf query4a.sql
SELECT customer.last_name || ', ' || customer.first_name AS customer, address.phone, film.title 
FROM rental 
INNER JOIN customer ON rental.customer_id = customer.customer_id 
INNER JOIN address ON customer.address_id = address.address_id 
INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id I
NNER JOIN film ON inventory.film_id = film.film_id 
WHERE rental.return_date IS NULL 
AND rental_date < CURRENT DATE - film.rental_duration DAYS 
ORDER BY title 
LIMIT 5
SQL0217W  The statement was not executed as only Explain information requests
are being processed.  SQLSTATE=01604

The access plan I see is clearly about 25% less in cost than before:


Access Plan:
-----------
        Total Cost:             331.09
        Query Degree:           1


                                               Rows
                                              RETURN
                                              (   1)
                                               Cost
                                                I/O
                                                |
                                                 5
                                              ^NLJOIN
                                              (   2)
                                              331.09
                                              116.413
                              /-----------------+-----------------\
                             5                                       1
                          ^NLJOIN                                 FETCH
                          (   3)                                  (  14)
                          270.357                                 13.5336
                          107.437                                    2
                     /------+-------\                           /---+----\
                    5                  1                       1           603
                 TBSCAN             FETCH                   IXSCAN   TABLE: DB2INST1
                 (   4)             (  12)                  (  15)       ADDRESS
                 209.461            13.5336                 6.76789        Q3
                 98.4372               2                       1
                   |              /---+----\                  |
                    5            1           599              603
                 SORT         IXSCAN   TABLE: DB2INST1  INDEX: DB2INST1
                 (   5)       (  13)      CUSTOMER        PK_ADDRESS
                 209.461      6.76789        Q1               Q3
                 98.4372         1
                   |            |
                 50.8046        599
                 HSJOIN^  INDEX: DB2INST1
                 (   6)     PK_CUSTOMER
                 209.459        Q1
                 98.4372
         /---------+----------\
      1000                      183
     TBSCAN                   HSJOIN^
     (   7)                   (   8)
     49.3022                  160.136
       53                     45.4372
       |              /---------+----------\
      1000         4581                      183
 TABLE: DB2INST1  FETCH                    IXSCAN
      FILM        (   9)                   (  11)
       Q5         143.584                  16.5075
                    43                     2.43717
                /---+----\                   |
             4581         4581              16044
            IXSCAN   TABLE: DB2INST1   INDEX: DB2INST1
            (  10)      INVENTORY     IDX_RENTAL_QUERY4
            81.5935        Q4                Q2
              12
              |
             4581
        INDEX: DB2INST1
        IDX_FK_FILM_ID
              Q4 

Looking at this explain, I can see I might get further optimizations by getting a better index on inventory, if index-only access is appropriate for this query, and I might also spend some further time seeing if I can optimize access to film.

Note also, that using db2advis would have given me some of this, but might have also sent me on a goose chase adding indexes on customer and address that would only be appropriate if performance of this particular query really was critical enough to aim for index-only access across the board.

MySQL

Performing the same actions in MySQL - changing the syntax around the date, creating the index, and rerunning the explain looks like this:

mysql> EXPLAIN  SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer,
    ->            address.phone, film.title
    ->            FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id
    ->            INNER JOIN address ON customer.address_id = address.address_id
    ->            INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id
    ->            INNER JOIN film ON inventory.film_id = film.film_id
    ->            WHERE rental.return_date IS NULL
    ->            AND rental_date < CURRENT_DATE() - INTERVAL film.rental_duration DAY
    ->            ORDER BY title
    ->            LIMIT 5;
+----+-------------+-----------+------------+--------+----------------------------------------------------------------------+-------------------+---------+----------------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table     | partitions | type   | possible_keys                                                        | key               | key_len | ref                        | rows | filtered | Extra                                                     |
+----+-------------+-----------+------------+--------+----------------------------------------------------------------------+-------------------+---------+----------------------------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | rental    | NULL       | ref    | rental_date,idx_fk_inventory_id,idx_fk_customer_id,idx_rental_query4 | idx_rental_query4 | 6       | const                      |  183 |   100.00 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | customer  | NULL       | eq_ref | PRIMARY,idx_fk_address_id                                            | PRIMARY           | 2       | sakila.rental.customer_id  |    1 |   100.00 | NULL                                                      |
|  1 | SIMPLE      | address   | NULL       | eq_ref | PRIMARY                                                              | PRIMARY           | 2       | sakila.customer.address_id |    1 |   100.00 | NULL                                                      |
|  1 | SIMPLE      | inventory | NULL       | eq_ref | PRIMARY,idx_fk_film_id                                               | PRIMARY           | 3       | sakila.rental.inventory_id |    1 |   100.00 | NULL                                                      |
|  1 | SIMPLE      | film      | NULL       | eq_ref | PRIMARY                                                              | PRIMARY           | 2       | sakila.inventory.film_id   |    1 |   100.00 | Using where                                               |
+----+-------------+-----------+------------+--------+----------------------------------------------------------------------+-------------------+---------+----------------------------+------+----------+-----------------------------------------------------------+
5 rows in set, 1 warning (0.00 sec)

We can still see that things are clearly better, but other than eliminating a table scan, it is hard to say by how much. Additionally, there's not much to point me into which direction further optimization makes sense. Perhaps further optimization isn't really needed, since the PRIMARY index includes all table data to begin with.

Summary

Writing this blog article is actually a bit cathartic for me, and helps me name why I'm finding using explain in MySQL frustrating. It just doesn't have some of the details I'm used to. I'm sure in time, I'll see the patterns in MySQL explain more quickly, without the greater detail that I'm used to.

Ember Crooks
Ember Crooks

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, spent a year working with high-volume MySQL, and is now learning Snowflake. Ember shares both posts about her core skill sets and her journey learning Snowflake.

Ember lives in Denver and work from home

Articles: 554

3 Comments

  1. MySQL has a really simple executor, at least for most cases — it always does nested loop joins. That also makes it pretty bad at DWH queries.

    With that and no subqueries in the statement, the cost of the query in the tabular simple explain is the product of the values shown in the rows column, the “badness” of the query. Badness is not comparable between different results, but with the same result set produced, the query with the lower badness is usually better. That seems to be matching the concept of timerons for DB2?

    • I hadn’t heard the product of rows thing before. I’m still a bit mystified how the first explain in MySQL only shows one row for the table scan, while the second with the added index and altered date syntax shows 183. 183 is the correct number of rows where return_date is NULL.

      • Definitively weird, or even wrong. The explains says “ALL” in type, so it is a full table scan. It also says “NULL” in ref, so no index can be used. This is confirmed with key and key_len also being NULL. The rows should show the estimated table size for “rental” from “INFORMATION_SCHEMA.TABLES.TABLE_ROWS”.

        It further says “where used”, so additional conditions are being applied to the table scan while emitting found rows into the inner loop of the loop join. The result is materialied in a temporary table, and is sorted (that is applied at the end).

        All other operations are “eq_ref” in type, and to the Unique Not-Null Index named PRIMARY (the primary key), so they are bound to be 1 in cost.

        All of that makes sense for the (limited) way MySQL works, except for the wrong number in rows. If you can reproduce that, it would be interesting to see if that goes away after an ANALYZE TABLE on the table “rental”.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.