So I thought I’d do a short series on analyzing SQL. I want to focus on some of the techniques I use, so not necessarily everything you could know about it, but some basics on one way to do it.
Explain the SQL
I’m not going to cover reading explain plans here as that is a whole other topic that deserves a post of its own. Or more likely a series of posts. Here I’ll just focus on how to generate the information at the command line.
Dynamic SQL
Most of what you’re going to need is probably dynamic SQL. Dynamic SQL is all SQL that is not statically bound to the database through a stored procedure or package. The dynamic SQL method I’ll show you here can handle parameter markers, or the ? (question marks) that you may see in the output of some tools, such as the dynamic SQL snapshot.
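For example, a statement pulled from the dynamic SQL snapshot might look something like this (a made-up Commerce query for illustration – the ? is a parameter marker whose value was supplied at run time):
select catentry_id, partnumber from catentry where partnumber = ?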
If you have never done an explain on this database before, you will need to run the EXPLAIN.DDL file to build the explain tables first, like this:
db2 -tf $INSTHOME/sqllib/misc/EXPLAIN.DDL
After you have the explain tables in place, the general process is this:
- Place your SQL in a file. We’ll call that file query1.sql for the purposes of this post. Your SQL can come directly from the dynamic SQL snapshot, but you must make sure all of your tables are fully qualified with the schema name in the from clause. You should also place a semicolon at the end of the SQL statement. Do not put multiple SQL statements in one file. (There is an example file after this list.)
- At the command line, connect to your database
db2 connect to db_name
- At the command line, set your explain mode
db2 set current explain mode explain
- “Execute” the query – since your explain mode is set to “explain”, the query will NOT be executed; only the explain information will be captured. You will receive the warning shown below, and you can ignore it – this is what you want.
db2 -tvf query1.sql
SQL0217W The statement was not executed as only Explain information requests are being processed. SQLSTATE=01604
- Format the explain output from the explain tables into a file. In the statement below, replace db_name with your database name. The -w -1 means “the last statement for which data was inserted into the explain tables”, which is why you’re not specifying an input file or query here:
db2exfmt -d db_name -w -1 -o query1.exfmt
- Set the explain mode back so that you can run SQL again:
db2 set current explain mode no
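As an example, the contents of query1.sql might look like this – the same made-up query as above, but with the table fully qualified with the schema name and a semicolon at the end:
select catentry_id, partnumber
from wscomusr.catentry
where partnumber = ?;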
Stored Procedures
So the easiest way here is to create your stored procedures with the explain information built in at creation time:
- On stored procedure creation, include this statement before the create procedure statement is executed (see the example after this list):
CALL SYSPROC.SET_ROUTINE_OPTS('EXPLAIN ALL')@
- Connect to the database
db2 connect to db_name
- If needed (explains have never been done on this database), create the explain tables
cd $INSTHOME/sqllib/misc
db2 -tf EXPLAIN.DDL
- Determine the package name associated with the stored procedure
select substr(deps.bschema,1,18) SCHEMA,
       substr(procs.routinename,1,30) PROCEDURE,
       substr(deps.bname,1,18) PACKAGE,
       procs.valid VALID
from sysibm.sysdependencies deps,
     sysibm.sysroutines procs
where deps.dtype = 'F'
  and deps.btype = 'K'
  and procs.specificname = deps.dname
  and procs.routineschema = deps.dschema
order by 1,2;
- Run db2exfmt for the package name – the output file will include explain plans for all SQL statements in the procedure
db2exfmt -d db_name -s % -n package_name -w -1 -# 0 -o sp_explain.exfmt
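As a minimal sketch of that first step (the procedure name, parameters, body, and file name are made up for illustration), a creation script executed with db2 -td@ -vf create_proc.db2 might look like this:
CALL SYSPROC.SET_ROUTINE_OPTS('EXPLAIN ALL')@

CREATE PROCEDURE wscomusr.get_order_count (IN p_member_id BIGINT, OUT p_count INTEGER)
LANGUAGE SQL
BEGIN
  SELECT COUNT(*) INTO p_count
  FROM wscomusr.orders
  WHERE member_id = p_member_id;
END@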
If you did not create the stored procedure with the proper options, db2exfmt will return nothing. In that case, you will have to get the package name from the package query above, and then use db2expln:
db2expln -d db_name -schema wscomusr -package package_name -graph -output sp_explain.db2expln
Run Index Advisor
db2advis -d db_name -i query1.sql |tee query1.advis
Decide what action to take on your results
So the place to start is with the index advisor suggestions.
The index advisor gives you an estimated savings for each set of indexes, and that is your first clue. Is it worth it to create indexes that will only save you 1%? Maybe, if the query is executed a thousand times a second.
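To put some hypothetical numbers on that: a query averaging 10 ms that runs 1,000 times per second represents 10 seconds of database processing for every second of wall-clock time, so even a 1% improvement frees up roughly 100 ms of processing time each second.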
Each index may help select performance, but it hurts insert performance. I’m a lot less critical of indexes on catalog-related tables like CATGROUP or CATENTRY because they are usually only inserted into during stagingprop, where insert performance is less critical. I’m much more selective about indexes I put on ORDERS, ORDERITEMS, or MEMBER because insert performance on those tables is directly visible to the end user during the registration or checkout process.
You’ll also want to look at what other indexes already exist on the table. Though you should never drop a base Commerce index, sometimes you can replace other custom indexes. If disk space is an issue, you’ll have to consider whether the additional space is worth the performance improvement in this case. Indexing always involves some data duplication.
WebSphere Commerce does not allow you to create a unique index on the MEMBER table, but all else is fair game, and a Commerce database without additional custom indexes is almost certainly under-indexed.
I have seen just four non-clustering indexes take a commerce database from unusably slow (site-down, literally) to humming along during a peak period.
One thing you should always do is choose a meaningful index name. Put the index in the same schema as your Commerce tables. At the very least, include the table name in the index name. My favorite tends to be
tablename_IX01
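For example, an index on ORDERITEMS following that naming convention might be created like this (the columns here are made up for illustration – choose them based on your explain and index advisor output):
db2 "create index wscomusr.orderitems_ix01 on wscomusr.orderitems (orders_id, status)"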
After creating an index on a non-volatile table, always do runstats – the index will not be used until you do so. My favorite syntax is:
db2 "runstats on table qualified.table_name with distribution and detailed indexes all"
Finally, the index advisor doesn’t know everything – after creating an index always use an explain to ensure that the new index is being used. Once you get good with explains you can also find and try indexes which the index advisor misses. Clearly, indexing is a very complex topic that deserves future posts.
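To verify, you can simply repeat the explain steps from earlier in this post against the same query and look for the new index in the resulting access plan (the output file name here is just an example):
db2 connect to db_name
db2 set current explain mode explain
db2 -tvf query1.sql
db2exfmt -d db_name -w -1 -o query1_after.exfmt
db2 set current explain mode no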
Essentially each index should be added with thought and analysis, but you should always be looking for indexes to add.
Actions other than indexing
So if you can’t index for a query, or already have and it’s still a problem, you need to discuss it with your developers. This is especially true for SQL involved in deadlocks – often there’s not much the DBA can do to alleviate deadlocking, but the application people can do a lot about it. Some SQL indicates possible SQL injection, and other SQL exposes a flaw in the way the site is running. Some can also be avoided through caching at the application level.