Many times, the SQL analysis I do is extremely focused – mostly on SQL that is a proven resource hog or a suspected problem. Analyzing all SQL used in a particular process can be a bit different, and may not be something that is frequently done in WebSphere Commerce databases.
Recently, I was given the output of statement event monitors captured while specific tasks were run against a development database, and was asked to do that more general tuning instead of the focused tuning I normally do.
This post covers some tips for using db2advis or the design advisor at the command line with multiple queries as input.
In my case the input came from some processing that IBM did over the formatted output files from statement event monitors. You don’t really have to have that data – all that is really required is the SQL statements and the frequency at which they are executed within the workload. Using that input, I generated a file that looked in part like this:
--#SET FREQUENCY 4
SELECT ATTRIBUTE.ATTRIBUTE_ID, ATTRIBUTE.LANGUAGE_ID, ATTRIBUTE.ATTRTYPE_ID, ATTRIBUTE.NAME, ATTRIBUTE.SEQUENCE, ATTRIBUTE.DESCRIPTION, ATTRIBUTE.CATENTRY_ID, ATTRIBUTE.DESCRIPTION2, ATTRIBUTE.FIELD1, ATTRIBUTE.OID, ATTRIBUTE.USAGE, ATTRIBUTE.QTYUNIT_ID, ATTRIBUTE.GROUPNAME, ATTRIBUTE.NOTEINFO, ATTRIBUTE.MULTITYPE, ATTRIBUTE.OPTCOUNTER FROM ATTRIBUTE WHERE ATTRIBUTE.CATENTRY_ID IN (?) AND ATTRIBUTE.LANGUAGE_ID IN (?) ORDER BY ATTRIBUTE.SEQUENCE;
--#SET FREQUENCY 4
SELECT DISTINCT STOREDEFCAT.STOREDEFCAT_ID, STOREDEFCAT.STOREENT_ID, STOREDEFCAT.CATALOG_ID, STOREDEFCAT.FIELD1, STOREDEFCAT.FIELD2, STOREDEFCAT.FIELD3, STOREDEFCAT.OPTCOUNTER FROM STOREDEFCAT WHERE STOREDEFCAT.STOREENT_ID IN (?, ?);
--#SET FREQUENCY 3
SELECT T1.TOTALTAX, T1.LOCKED, T1.TOTALTAXSHIPPING, T1.STATUS, T1.FIELD2, T1.TIMEPLACED, T1.FIELD3, T1.CURRENCY, T1.SEQUENCE, T1.TOTALADJUSTMENT, T1.ORMORDER, T1.SHIPASCOMPLETE, T1.PROVIDERORDERNUM, T1.TOTALPRODUCT, T1.DESCRIPTION, T1.MEMBER_ID, T1.ORGENTITY_ID, T1.FIELD1, T1.STOREENT_ID, T1.ORDCHNLTYP_ID, T1.ADDRESS_ID, T1.LASTUPDATE, T1.ORDERS_ID, T1.COMMENTS, T1.NOTIFICATIONID, T1.TYPE, T1.EDITOR_ID, T1.OPTCOUNTER, T1.SOURCEID, T1.EXPIREDATE, T1.BUSCHN_ID, T1.BLOCKED, T1.TOTALSHIPPING, T1.TRANSFERSTATUS, T1.OPSYSTEM_ID, T1.BUYERPO_ID FROM ORDERS T1 WHERE (T1.STATUS = ?) AND (T1.MEMBER_ID = ?) AND (T1.STOREENT_ID = ?) ORDER BY T1.ORDERS_ID;
...
In the example above, the SET FREQUENCY statements are in the format required by db2advis. The SQL is properly formed, includes parameter markers (represented by question marks), and does not usually qualify tables with the schema name.
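If the statements and their execution counts come from another source, a workload file in this format can be generated with a few lines of script. This is a minimal sketch; the statements and frequencies in the `workload` list are illustrative placeholders, not the real workload:

```python
# Build a db2advis workload file from (frequency, statement) pairs.
# The statements below are illustrative placeholders.
workload = [
    (4, "SELECT NAME FROM ATTRIBUTE WHERE CATENTRY_ID IN (?)"),
    (3, "SELECT STATUS FROM ORDERS WHERE ORDERS_ID = ?"),
]

with open("checkout_logon_all.sql", "w") as f:
    for freq, stmt in workload:
        # Each statement is preceded by its frequency directive and
        # terminated with a semicolon, as db2advis expects.
        f.write(f"--#SET FREQUENCY {freq}\n")
        f.write(stmt.rstrip().rstrip(";") + ";\n")
```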
Once I had this, I placed it in a file called checkout_logon_all.sql in a working directory I use for explains and db2advis. My file had 97 SQL statements in it. Once that file existed, I could use it as my input for db2advis.
The next step is to execute db2advis, providing the file as input. That looks like this:
> db2advis -d SAMPLE -q WSCOMUSR -i checkout_logon_all.sql |tee checkout_logon_all.sql.advis
Using user id as default schema name. Use -n option to specify schema
execution started at timestamp 2013-06-14-17.33.03.220139
found  SQL statements from the input file
Recommending indexes...
total disk space needed for initial set [ 0.712] MB
total disk space constrained to [ 156.057] MB
Trying variations of the solution set.
Optimization finished.
11 indexes in current solution
[4382.5830] timerons (without recommendations)
[3945.0443] timerons (with current solution)
[9.98%] improvement
...
The ‘WSCOMUSR’ above is the schema name that should be assumed for any unqualified tables in the SQL. The output tells us that 11 indexes were found that could help, with an overall expected improvement for the workload of 9.98%. I’m pretty sure that expected improvement does not account for any slowdown to insert/update statements caused by the additional indexes.
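That improvement figure is just the relative drop in total workload cost in timerons, which can be verified from the numbers in the output above:

```python
# Recompute the improvement percentage from the db2advis output.
without_reco = 4382.5830  # timerons without recommendations
with_reco = 3945.0443     # timerons with current solution
improvement = (without_reco - with_reco) / without_reco * 100
print(f"{improvement:.2f}%")  # prints "9.98%"
```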
Being an enlightened DBA, you should know that blindly following all suggestions of db2advis is never a good idea. I have other workloads that run against these tables beyond this one process, and have to consider the costs of additional and duplicate indexes. So how do you determine which of the 11 indexes might make sense?
At the bottom of db2advis output is a section of XML, which I generally ignore when using db2advis to look at a single statement. But it is pure gold when you’re using db2advis to look at a full workload like this.
After doing a bit of formatting (removing the leading -- on every line, and replacing < and > within the embedded SQL with &lt; and &gt;), I import the XML into Excel, and get something that looks like this (click the image to see it full-size and clearly):
This is a list of every index used by the queries in the workload – both the newly recommended ones and existing ones. Pulling the data into a spreadsheet lets you sort on various columns. The “/index/benefit” column is in timerons, and shows you how much benefit you can expect to see by adding a particular index. I find looking at them by table (“/index/table/identifier/name”) particularly interesting, as there are some tables where I will more freely add indexes, depending on how important insert/update performance is for that table. The “/index/statementlist” column lists which statements in the workload a particular index applies to. There is also additional data in that XML – you’ll find it down and to the right when you’ve imported the data into Excel. It breaks things out by statement, so you can see the impact that each index has on each statement. I’ll be looking at that in more detail in some of my single-SQL analysis to see if it looks correct and useful.
Obviously this kind of analysis has a very specific usage. You’re not going to be running this every day. As always, indexing decisions take careful analysis. There’s nothing here that will do that analysis for you, but it may provide some additional tools.