Top 10 tips for SQL performance on DB2

Ok, so everyone has their top 10, and some of them are significantly different – but here’s my top 10 on SQL performance for DB2. I mostly share this information with developers, but it’s important for a DBA to be aware of these things as well. Since it’s directed more at developers, it ignores some of the basics like “do runstats” and “do reorgs”. This top 10 is directed purely at writing SQL, not at full database support.

Where is the cost?

Some actions can occur either in the SQL or in the application. Some of these recommendations assume that the cost is reduced at the database level, but do not take into account the extra cost at the application level. I don’t expect that every query will meet all of these suggestions – sometimes you simply have to use an inefficient method to get the data the way you need it.

 

These recommendations are also focused on OLTP databases, which favor small, fast queries.

What we aim for

We want:

• Index-only access when possible – where all data can be retrieved from indexes without looking data up in the table itself
• Accurate estimates of the number of rows we’re dealing with

We don’t want:

• Table scans – where DB2 has to examine each and every row in a table to determine which rows meet the query criteria
• Sorts – some are inevitable, but they can be extremely time consuming depending on where they appear in the query. Sorts are done not only when explicitly called for, but also when DB2 decides they are part of the most efficient access plan
• Nested-loop joins – again, many queries will require these, but we’d like other join methods where possible

Predicate types from most to least efficient:

1. Range-delimiting predicates – a start/stop condition narrows down the pages of the index
2. Index-sargable predicates – columns from a predicate are part of an index key in the right order
3. Data-sargable predicates – predicates can be evaluated on data pages in the buffer pool
4. Residual predicates – require I/O beyond the buffer pool – include subqueries or long/LOB data
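As a rough illustration of these predicate types, consider a hypothetical ORDERS table with an index on (CUSTOMER_ID, ORDER_DATE) – the table, index, and column names here are invented for the example:

```sql
SELECT ORDER_ID
FROM   ORDERS
WHERE  CUSTOMER_ID = 1042            -- range-delimiting: leading index column supplies start/stop keys
  AND  ORDER_DATE >= '2023-01-01'    -- index-sargable: evaluated against the index key
  AND  TOTAL_AMOUNT > 100.00         -- data-sargable: evaluated on the data page in the buffer pool
  AND  NOTES LIKE '%rush%'           -- residual if NOTES is long/LOB data
```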

The 10 tips:

  1. Select only that which you need. Avoid SELECT *, opting instead for specifying the smallest number of columns that will meet your needs.
    • Each returned column has an overhead that is multiplied by the number of rows
    • If there are too many columns then index-only access will not be an option
  2. Lock not, lest ye be locked. Specify FOR READ ONLY and WITH UR if these restrictions meet your needs – they will reduce locking
    • Even if you don’t see errors due to locking, you may still impact performance time not just of this query, but of other queries
  3. Avoid functions and conversions when possible – CHAR, UPPER, etc. These will limit the use of indexes.
    • The most important place to do this is on the join predicates – and remember that join predicates include those not using the “join” keyword. Using functions or even expressions in join predicates also limits the join method to Nested Loop and may produce inaccurate results
    • The second most important place to do this is in the where clause
    • This may also be useful in the column list, depending on circumstances
  4. Avoid sorts. Some sorts are unavoidable, but they can be expensive. Large sorts may spill from memory to disk, drastically impacting their performance.
  5. Avoid DISTINCT and ORDER BY when possible
    • These often require sorts, and sorts are frequently time eaters
    • A GROUP BY may be a better choice than DISTINCT
  6. In-lists are out. Avoid the use of “IN” lists where possible
    • Multiple smaller queries may be more efficient
  7. Ensure you are specifying the appropriate join predicates to avoid Cartesian joins. This may seem obvious, but a really bad query can actually cause a database to be unresponsive to other users – something you want to avoid, especially in production.
  8. Play fetch with as few sticks as possible. If using the FETCH FIRST N ROWS ONLY clause, include the OPTIMIZE FOR N ROWS clause.
    • Strangely enough, DB2 does not handle this for you.
  9. To parameter or not to parameter. Consider where to pass in specific values
    • If Parameter Markers are used, DB2 does not make use of distribution statistics
    • If Parameter Markers are not used, then DB2 re-calculates the access plan every time you run the query
    • Balance the reduced preparation time against the performance advantage gained by the use of statistics – this is especially important when your data is unevenly distributed.
  10. Consider using temporary tables for repeatedly accessed data. I’ve actually seen the same stored procedure called three times with the same parameters in a single query – and since it calculated the distance to nearby stores, it was an expensive one. Storing such data in a temporary table before executing the query might be a good idea.
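A sketch of tip 1, using an invented CUSTOMER table: if an index exists on (LAST_NAME, FIRST_NAME, CUST_ID), the narrow column list can be satisfied by index-only access, while SELECT * forces a trip to the table for every row:

```sql
-- Index-only access is possible (all columns are in the index):
SELECT LAST_NAME, FIRST_NAME, CUST_ID
FROM   CUSTOMER
WHERE  LAST_NAME = 'SMITH';

-- Index-only access is impossible – every column comes back:
SELECT *
FROM   CUSTOMER
WHERE  LAST_NAME = 'SMITH';
```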
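Tip 2 in SQL form, against an invented ACCOUNT table. FOR READ ONLY declares the cursor read-only, and WITH UR requests the uncommitted-read isolation level, so the query takes no row locks – at the price of possibly seeing uncommitted data:

```sql
SELECT CUST_ID, BALANCE
FROM   ACCOUNT
WHERE  BRANCH_ID = 12
FOR READ ONLY
WITH UR;
```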
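For tip 3, a common offender is wrapping an indexed column in a function. Assuming EMAIL is stored in a known, consistent case (an assumption made for this sketch), comparing the column directly keeps the index usable:

```sql
-- UPPER() on the column defeats an index on EMAIL:
SELECT CUST_ID FROM CUSTOMER WHERE UPPER(EMAIL) = 'A@EXAMPLE.COM';

-- Comparing the stored value directly keeps the index usable:
SELECT CUST_ID FROM CUSTOMER WHERE EMAIL = 'a@example.com';
```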
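For tip 5, these two statements return the same set of cities; depending on the access plan, the GROUP BY form may let DB2 avoid a full sort (CITY and CUSTOMER are invented names):

```sql
SELECT DISTINCT CITY FROM CUSTOMER;

SELECT CITY FROM CUSTOMER GROUP BY CITY;
```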
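Tip 6 sketched out – rather than one statement with a long IN-list, it may be cheaper to prepare a single-value query once and execute it repeatedly from the application:

```sql
-- One statement with a long IN-list:
SELECT ORDER_ID, STATUS
FROM   ORDERS
WHERE  CUST_ID IN (1001, 1002, 1003 /* ... hundreds more ... */);

-- Alternative: a parameterized query executed once per value:
SELECT ORDER_ID, STATUS
FROM   ORDERS
WHERE  CUST_ID = ?;
```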
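Tip 7’s failure mode, with invented tables – omitting the join predicate pairs every row of one table with every row of the other:

```sql
-- Cartesian product: no join predicate at all
SELECT C.NAME, O.ORDER_ID
FROM   CUSTOMER C, ORDERS O;

-- Corrected, with the join predicate in place:
SELECT C.NAME, O.ORDER_ID
FROM   CUSTOMER C
JOIN   ORDERS O ON O.CUST_ID = C.CUST_ID;
```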
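Tip 8 as a single statement – FETCH FIRST limits the rows returned, while OPTIMIZE FOR tells the optimizer to prefer a plan that delivers the first rows quickly:

```sql
SELECT CUST_ID, LAST_NAME
FROM   CUSTOMER
ORDER BY LAST_NAME
FETCH FIRST 5 ROWS ONLY
OPTIMIZE FOR 5 ROWS;
```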
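The trade-off in tip 9, side by side (ORDERS and STATUS are invented names):

```sql
-- Parameter marker: one access plan reused for every value,
-- but no distribution statistics can be applied to the value:
SELECT ORDER_ID FROM ORDERS WHERE STATUS = ?;

-- Literal: the plan can exploit distribution statistics for 'SHIPPED',
-- but the statement is re-prepared for each distinct value:
SELECT ORDER_ID FROM ORDERS WHERE STATUS = 'SHIPPED';
```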
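One way to apply tip 10 is a declared global temporary table. Everything here is hypothetical – DIST_FROM stands in for whatever expensive routine computes the distance:

```sql
DECLARE GLOBAL TEMPORARY TABLE SESSION.NEARBY_STORES
  (STORE_ID INTEGER, DISTANCE DECIMAL(9,2))
  ON COMMIT PRESERVE ROWS
  NOT LOGGED;

-- Run the expensive calculation exactly once:
INSERT INTO SESSION.NEARBY_STORES
  SELECT STORE_ID, DIST_FROM(STORE_ID, ?) AS DISTANCE
  FROM   STORE;

-- The main query can now reference SESSION.NEARBY_STORES as many
-- times as needed without repeating the calculation.
```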

 

 

Ember Crooks

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, spent a year working with high-volume MySQL, and is now learning Snowflake. Ember shares both posts about her core skill sets and her journey learning Snowflake.

Ember lives in Denver and works from home.


4 Comments

  1. Play fetch with as few sticks as possible – I actually didn’t get the meaning of that. Does it mean “do not fetch first n rows only, with n being a big number”?

    • That item is meant to say two things: if you only need 5 rows, specify `FETCH FIRST 5 ROWS ONLY`, and if you specify a `FETCH FIRST X ROWS ONLY` clause, you should also specify `OPTIMIZE FOR X ROWS`.
