SQL Analysis Overview

So I thought I’d do a short series on analyzing SQL. I want to focus on some of the techniques I use – not everything you could possibly know about the topic, but the basics of one way to do it.

I’m going to focus on what you can do manually at the command line – no pay-for-use tools, and nothing that requires additional installation. There are multiple ways to do this, so don’t be afraid you’re doing it the “wrong” way. This stuff can also be done through the GUI, and for 9.7, maybe through Optim too.
You’ll either start with new SQL that you want to make sure isn’t too bad, or you’ll want to get problem SQL – possibly as defined in my post on that.
You’ll also want to make sure runstats are current before running any of this, as everything here is highly dependent on current statistics.
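If you’re not sure how current your statistics are, a quick check of STATS_TIME in the catalog will tell you. This is just a sketch – it assumes your Commerce tables live in the WSCOMUSR schema, so substitute your own schema name:

db2 "select substr(tabname,1,30) tabname, stats_time from syscat.tables where tabschema='WSCOMUSR' order by stats_time"

A NULL or ancient STATS_TIME means you need to do runstats before you trust anything the optimizer tells you.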

Explain the SQL

I’m not going to cover reading explain plans here as that is a whole other topic that deserves a post of its own. Or more likely a series of posts. Here I’ll just focus on how to generate the information at the command line.

Dynamic SQL

Most of what you’re going to need is probably dynamic SQL. Dynamic SQL is all SQL that is not statically bound to the database through a stored procedure or package. The dynamic SQL method I’ll show you here can handle parameter markers, or the ? (question marks) that you may see in the output of some tools, such as the dynamic SQL snapshot.
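For example, a statement you might put in your query file (see step 1 below) could look something like this – the table and predicates are just a made-up Commerce-flavored illustration; the point is that the explain process handles the ? parameter markers without any substitution:

SELECT CATENTRY_ID, PARTNUMBER
  FROM WSCOMUSR.CATENTRY
 WHERE MEMBER_ID = ? AND MARKFORDELETE = 0;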

If you have never done an explain on this database before, you will need to execute the EXPLAIN.DDL file to build the explain tables first, like this:

db2 -tf $INSTHOME/sqllib/misc/EXPLAIN.DDL

After you have the explain tables in place, the general process is this:

  1. Place your SQL in a file. We’ll call that file query1.sql for the purposes of this post. Your SQL can come directly from the dynamic SQL snapshot, but you must make sure all of your tables are fully qualified with the schema name in the from clause. You should also place a semicolon at the end of the SQL statement. Do not put multiple SQL statements in one file.
  2. At the command line, connect to your database
    • db2 connect to db_name
  3. At the command line, set your explain mode
    • db2 set current explain mode explain
  4. “Execute” the query – since your explain mode is set to “explain”, the query will NOT actually be executed; only the explain information will be captured. You will receive the warning shown below, and you can ignore it – this is what you want.
    • db2 -tvf query1.sql
      SQL0217W  The statement was not executed as only Explain information requests
      are being processed.  SQLSTATE=01604
  5. Format the explain output from the explain tables into a file. In the statement below, replace db_name with your database name. The -w -1 means “the last statement for which data was inserted into the explain tables”, and that is why you’re not specifying an input file or query here:
    • db2exfmt -d db_name -w -1 -o query1.exfmt
  6. Set the explain mode back so that you can run SQL again:
    • db2 set current explain mode no
You now have the full explain plan, including the access plan graph, in the query1.exfmt file. You can read it with any text editor. If it’s a really complex query, it might be hard to scroll across a very wide explain plan – you may want to copy the data into Notepad and turn word wrap off to make it easier to read.
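If you do this often, the whole sequence can be wrapped up in a small shell script. This is just a rough sketch – the script name and arguments are made up, so adapt it to your own environment:

#!/bin/sh
# explain_query.sh - capture and format the explain plan for one SQL file
# usage: ./explain_query.sh query1.sql db_name
SQLFILE=$1
DBNAME=$2

db2 connect to $DBNAME
db2 set current explain mode explain
# SQL0217W is expected here - the statement is explained, not executed
db2 -tvf $SQLFILE
db2 set current explain mode no
# -w -1 formats the most recent entry in the explain tables
db2exfmt -d $DBNAME -w -1 -o ${SQLFILE%.sql}.exfmt
db2 connect reset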

Stored Procedures

So the easiest way here is to create your stored procedures with explain built in from the start:

  1. On stored procedure creation, include this statement before the create procedure statement is executed (see the sketch after this list for a full example):
     CALL SYSPROC.SET_ROUTINE_OPTS('EXPLAIN ALL')@
  2. Connect to the database
    db2 connect to db_name
  3. If needed (explains have never been done on this database), create the explain tables
    cd $INSTHOME/sqllib/misc
    db2 -tf EXPLAIN.DDL
  4. Determine the package name associated with the stored procedure
    select  substr(deps.bschema,1,18) SCHEMA,
            substr(procs.routinename,1,30) PROCEDURE,
            substr(deps.bname,1,18) PACKAGE,
            procs.valid VALID
      from  sysibm.sysdependencies deps,
            sysibm.sysroutines procs
     where  deps.dtype = 'F'
       and  deps.btype = 'K'
       and  procs.specificname  = deps.dname
       and  procs.routineschema = deps.dschema
     order  by 1,2;
  5. Run db2exfmt for the package name – the output file will include explain plans for all SQL statements in the procedure
    db2exfmt -d db_name -s % -n package_name -w -1 -# 0 -o sp_explain.exfmt

    If you did not create the stored procedure with the proper options, the above will return nothing. In that case, you will have to get the package name from the query in #4 above, and then use db2expln:

    db2expln -d db_name -schema wscomusr -package package_name -graph -output sp_explain.db2expln
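As referenced in step 1 above, here’s a sketch of what a procedure DDL file with explain built in might look like. The procedure name, body, and file name are all hypothetical – the important parts are that the SET_ROUTINE_OPTS call comes before the create procedure statement in the same connection, and that the file uses @ as the statement terminator (so run it with db2 -td@ -f my_proc.db2):

-- my_proc.db2 (hypothetical file name)
CALL SYSPROC.SET_ROUTINE_OPTS('EXPLAIN ALL')@

CREATE PROCEDURE WSCOMUSR.TEST_EXPLAIN_PROC ()
LANGUAGE SQL
BEGIN
  -- hypothetical statement; every statement in the procedure gets explain info captured
  UPDATE WSCOMUSR.ORDERS
     SET LASTUPDATE = CURRENT TIMESTAMP
   WHERE STATUS = 'P';
END@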

Run Index Advisor

(or Design Advisor or whatever they’re calling it these days)
This one is a bit simpler. Using the file you created for the explain above:
db2advis -d db_name -i query1.sql | tee query1.advis
The trick here is NEVER DROP THE INDEXES IT RECOMMENDS YOU DROP. Those recommendations are made with the idea that this query is the only load against the tables in question, so when you’re running index advisor for a single or small set of queries like this, you must always ignore the recommendations for dropping indexes. Please read the next section before creating any indexes – there’s more analysis to be done.

Decide what action to take on your results

So the place to start is with the index advisor suggestions.

The index advisor gives you an estimated savings for each set of indexes, and that is your first clue. Is it worth it to create indexes that will save you 1%? Maybe, if the query executes a thousand times a second.

Each index may help select performance, but it hurts insert performance. I’m a lot less critical of indexes on catalog-related tables like CATGROUP or CATENTRY because they are usually only inserted to during stagingprop, when insert performance is less critical. I’m much more selective about indexes I put on ORDERS, ORDERITEMS, or MEMBER because their insert performance is directly visible to the end user during the registration or checkout process.

You’ll also want to look at what other indexes already exist on the table. Though you should never drop a base Commerce index, sometimes you can replace other custom indexes. If disk space is an issue, you’ll have to consider whether the additional space is worth the performance improvement. Indexing always involves some data duplication.
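An easy way to see what’s already out there is describe indexes – ORDERITEMS here is just an example, so substitute whichever table you’re looking at:

db2 "describe indexes for table WSCOMUSR.ORDERITEMS show detail"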

WebSphere Commerce does not allow you to create a unique index on the MEMBER table, but all else is fair game, and a Commerce database without additional custom indexes is almost certainly under-indexed.

I have seen just four non-clustering indexes take a commerce database from unusably slow (site-down, literally) to humming along during a peak period.

One thing you should always do is choose a meaningful index name. Put the index in the same schema as your Commerce tables. At the very least, include the table name in the index name. My favorite tends to be

tablename_IX01
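So a new index might look something like this – the table and columns are purely hypothetical, and are only here to show the naming convention:

db2 "create index WSCOMUSR.ORDERITEMS_IX01 on WSCOMUSR.ORDERITEMS (ORDERS_ID, CATENTRY_ID)"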

After creating an index on a non-volatile table, always do runstats – the index will not be used until you do so. My favorite syntax is:

db2 "runstats on table qualified.table_name with distribution and detailed indexes all"

Finally, the index advisor doesn’t know everything – after creating an index always use an explain to ensure that the new index is being used. Once you get good with explains you can also find and try indexes which the index advisor misses. Clearly, indexing is a very complex topic that deserves future posts.

Essentially each index should be added with thought and analysis, but you should always be looking for indexes to add.

Actions other than indexing

So if you can’t index for a query, or already have and it’s still a problem, you need to discuss it with your developers. This is especially true for SQL involved in deadlocks – often there’s not much the DBA can do to alleviate deadlocking, but the application people, on the other hand, can do a lot about it. Some SQL indicates possible SQL injection, and other SQL exposes a flaw in the way the site is running. Some can also be avoided through caching at the application level.
