Go-lives bring all kinds of opportunities to find problem SQL. A new site or a redesigned site going live can put a completely different SQL workload on a database. Depending on the level of load testing done, there may still be problems to find. This post is an example of just one problem SQL statement (in this case, a delete) found and mitigated through database-level analysis.
Finding the Problem
After a recent go-live for a client that did not do much load testing, I found the following delete statement showing up as a potential problem:
STATEMENT                        ROWS_READ            PCT_TOT_RR ROWS_RETURNED        READ_EFFICIENCY NUM_EXECUTIONS
-------------------------------- -------------------- ---------- -------------------- --------------- --------------------
delete from EDPPAYINST where ... 47771786             32.00      0                    -1.00           11
That’s in an hour. It’s also a good example of why I don’t look only for statements with poor read efficiency – since this query returns no rows, read efficiency is not applicable here. And sure, it’s only executed 11 times an hour, but it does that every hour, and it manages to be one of the top two queries in my database in terms of rows read. There is no need to read those 47 million-plus rows if we don’t have to.
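That listing comes from my own set of custom history tables built on MON_GET_PKG_CACHE_STMT. A minimal sketch of the same idea run directly against the package cache – the formatting and read-efficiency math here are illustrative, not my exact query – looks like this:

-- Top statements by rows read in the package cache (DB2 9.7+).
-- 'D' = dynamic SQL only; -2 = all members.
SELECT SUBSTR(STMT_TEXT, 1, 32)   AS STATEMENT,
       ROWS_READ,
       ROWS_RETURNED,
       CASE WHEN ROWS_RETURNED > 0
            THEN DECIMAL(ROWS_READ, 20, 2) / ROWS_RETURNED
            ELSE -1.00
       END                        AS READ_EFFICIENCY,
       NUM_EXECUTIONS
  FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, -2)) AS T
 ORDER BY ROWS_READ DESC
 FETCH FIRST 10 ROWS ONLY
  WITH UR;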
Note that there were other problem queries I found here – I’m just pulling this one out as an example.
Analyzing the Statement
First, I put the statement in a file – go-live_query1.sql – and made sure the tables were fully qualified with the schema name, something WebSphere Commerce does not do.
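For reference, the file contains just the schema-qualified statement, terminated with a semicolon so db2 -tvf can process it:

$ cat go-live_query1.sql
delete from wscomusr.EDPPAYINST
 where EDPORDER_ID in
   (select EDPORDER_ID from wscomusr.EDPORDER
     where ORDER_ID in
       (select ORDERS_ID from wscomusr.ORDERS
         where STATUS = 'P' and MEMBER_ID = ?));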
Then I captured explain information and generated the explain plan:
$ db2 connect to SAMPLE

   Database Connection Information

 Database server        = DB2/LINUXX8664 9.7.6
 SQL authorization ID   = DB2INST1
 Local database alias   = SAMPLE

$ db2 set current explain mode explain
DB20000I  The SQL command completed successfully.
$ db2 -tvf go-live_query1.sql
delete from wscomusr.EDPPAYINST where EDPORDER_ID in (select EDPORDER_ID from wscomusr.EDPORDER where ORDER_ID in (select ORDERS_ID from wscomusr.ORDERS where STATUS = 'P' and MEMBER_ID = ? ))
SQL0217W  The statement was not executed as only Explain information requests are being processed.  SQLSTATE=01604

$ db2exfmt -d SAMPLE -1 -o go-live_query1.sql.exfmt
DB2 Universal Database Version 9.7, 5622-044 (c) Copyright IBM Corp. 1991, 2009
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

Connecting to the Database.
Connect to Database Successful.
Output is in go-live_query1.sql.exfmt.
Executing Connect Reset -- Connect Reset was Successful.
Now I have the explain information in a file that looks like this (the top portion, anyway):
DB2 Universal Database Version 9.7, 5622-044 (c) Copyright IBM Corp. 1991, 2009
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

******************** EXPLAIN INSTANCE ********************

DB2_VERSION:       09.07.6
SOURCE_NAME:       SQLC2H23
SOURCE_SCHEMA:     NULLID
SOURCE_VERSION:
EXPLAIN_TIME:      2013-04-05-08.29.39.064276
EXPLAIN_REQUESTER: DB2INST1

Database Context:
----------------
        Parallelism:          None
        CPU Speed:            2.401083e-07
        Comm Speed:           100
        Buffer Pool size:     5985820
        Sort Heap size:       2500
        Database Heap size:   2467
        Lock List size:       778829
        Maximum Lock List:    98
        Average Applications: 1
        Locks Available:      24424078

Package Context:
---------------
        SQL Type:           Dynamic
        Optimization Level: 5
        Blocking:           Block All Cursors
        Isolation Level:    Cursor Stability

---------------- STATEMENT 1  SECTION 203 ----------------
        QUERYNO:        2
        QUERYTAG:       CLP
        Statement Type: Searched Delete
        Updatable:      Not Applicable
        Deletable:      Not Applicable
        Query Degree:   1

Original Statement:
------------------
delete from wscomusr.EDPPAYINST
 where EDPORDER_ID in
   (select EDPORDER_ID from wscomusr.EDPORDER
     where ORDER_ID in
       (select ORDERS_ID from wscomusr.ORDERS
         where STATUS = 'P' and MEMBER_ID = ?))

Optimized Statement:
-------------------
$WITH CONTEXT$($CONSTRAINT$(WSCOMUSR.EDPATMPAY.F_1097),
               $CONSTRAINT$(WSCOMUSR.EDPRELHIST.F_1101),
               $CONSTRAINT$(WSCOMUSR.EDPPAYHIST.F_1159))
DELETE FROM WSCOMUSR.EDPPAYINST AS Q5
 WHERE $RID$ IN
   (SELECT Q8.EDPPAYINST_ID
      FROM WSCOMUSR.ORDERS AS Q6, WSCOMUSR.EDPORDER AS Q7,
           WSCOMUSR.EDPPAYINST AS Q8
     WHERE (Q6.STATUS = 'P') AND (Q6.MEMBER_ID = ?) AND
           (Q7.ORDER_ID = Q6.ORDERS_ID) AND
           (Q8.EDPORDER_ID = Q7.EDPORDER_ID))

Access Plan:
-----------
        Total Cost:     424147
        Query Degree:   1

(Operator tree as an indented outline; each operator shows estimated
rows / total cost / I/O, indentation shows its inputs, and base tables
and indexes show their cardinalities.)

RETURN (1)
  FILTER (2)                        0.333333 / 424147 / 415578
    TBSCAN (3)                      1 / 2.8813e-05 / 0
      TABFNC: SYSIBM.GENROW
    DELETE (4)                      0 / 212078 / 207789
      NLJOIN (5)                    0 / 212078 / 207789
        DELETE (6)                  0.145504 / 212078 / 207789
          NLJOIN (7)                0.145504 / 212077 / 207789
            TBSCAN (8)              0.0724482 / 212061 / 207787
              TEMP (9)              0.0724482 / 212061 / 207787
                DELETE (10)         0.0724482 / 212061 / 207787
                  NLJOIN (11)       0.0724482 / 212061 / 207787
                    ^HSJOIN (12)    1.00049 / 212045 / 207785
                      TBSCAN (13)   9.55329e+06 / 211522 / 207782
                        TABLE: WSCOMUSR.EDPORDER (Q7)   9.55329e+06 rows
                      FETCH (14)    1.00049 / 22.7026 / 3.00017
                        IXSCAN (15) 1.00049 / 15.1377 / 2
                          INDEX: WSCOMUSR.I0000176 (Q6)   83849 rows
                        TABLE: WSCOMUSR.ORDERS (Q6)   83849 rows
                    FETCH (16)      0.0724127 / 15.6861 / 2.07241
                      IXSCAN (17)   0.0724127 / 15.1381 / 2
                        INDEX: WSCOMUSR.I0000844 (Q8)   691780 rows
                      TABLE: WSCOMUSR.EDPPAYINST (Q8)   691780 rows
                  TABLE: WSCOMUSR.EDPPAYINST (Q5)   691780 rows
            FETCH (18)              2.00838 / 22.7027 / 3
              IXSCAN (19)           2.00838 / 15.139 / 2
                INDEX: WSCOMUSR.I0000910 (Q11)   1.38936e+06 rows
              TABLE: WSCOMUSR.EDPPAYHIST (Q11)   1.38936e+06 rows
          TABLE: WSCOMUSR.EDPPAYHIST (Q4)   1.38936e+06 rows
        IXSCAN (20)                 0 / 0.00967348 / 0
          INDEX: SYSIBM.SQL121004092950890 (Q14)
      TABLE: WSCOMUSR.EDPRELHIST (Q3)   0 rows
    DELETE (21)                     0.0717332 / 212070 / 207788
      NLJOIN (22)                   0.0717332 / 212069 / 207788
        TBSCAN (23)                 0.0724482 / 212061 / 207787
          TEMP (9)                  0.0724482 / 212061 / 207787
        IXSCAN (24)                 0.990131 / 15.1398 / 2
          INDEX: WSCOMUSR.I0000846 (Q19)   684953 rows
      TABLE: WSCOMUSR.EDPATMPAY (Q17)   684953 rows
For a split second, my first thought was “why is it involving EDPPAYHIST, EDPRELHIST, and other tables not directly referenced in the statement?”, but then it hit me – cascading deletes. WebSphere Commerce databases depend on cascading deletes, so most of the time when you explain a delete, you’ll see the tables it cascades to.
If you have a WebSphere Commerce database, try explaining a delete on MEMBER or USERS – those hit the most tables of all. The explain will take a while and will be hard to read, but it is interesting.
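The cascades come from foreign keys declared with ON DELETE CASCADE. A minimal sketch of the pattern – the constraint name is hypothetical, not the actual WebSphere Commerce DDL, though the column relationship matches the delete statement above:

-- With this in place, deleting a row from EDPORDER automatically
-- deletes the matching EDPPAYINST rows, whose deletes can cascade
-- onward to EDPPAYINST's own children (EDPPAYHIST, EDPRELHIST,
-- EDPATMPAY, and so on).
ALTER TABLE WSCOMUSR.EDPPAYINST
  ADD CONSTRAINT F_EXAMPLE
  FOREIGN KEY (EDPORDER_ID)
  REFERENCES WSCOMUSR.EDPORDER (EDPORDER_ID)
  ON DELETE CASCADE;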
Now that I have that basic information, I also like to get the DB2 design advisor’s recommendations. That looks like this:
$ db2advis -d SAMPLE -i go-live_query1.sql | tee go-live_query1.sql.advis
execution started at timestamp 2013-04-05-08.32.03.476088
found [1] SQL statements from the input file
Recommending indexes...
total disk space needed for initial set [ 337.919] MB
total disk space constrained to [6260.996] MB
Trying variations of the solution set.
6  indexes in current solution
 [424147.2812] timerons  (without recommendations)
 [123.9237] timerons  (with current solution)
 [99.97%] improvement
--
--
-- LIST OF RECOMMENDED INDEXES
-- ===========================
-- index[1],    2.993MB
   CREATE INDEX "DB2INST1"."IDX1304051332060" ON "WSCOMUSR"."ORDERS"
   ("MEMBER_ID" ASC, "STATUS" ASC, "ORDERS_ID" ASC) ALLOW REVERSE SCANS
   COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[2],  275.005MB
   CREATE INDEX "DB2INST1"."IDX1304051332230" ON "WSCOMUSR"."EDPORDER"
   ("ORDER_ID" ASC, "EDPORDER_ID" ASC) ALLOW REVERSE SCANS
   COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[3],   19.915MB
   CREATE INDEX "DB2INST1"."IDX1304051332270" ON "WSCOMUSR"."EDPPAYINST"
   ("EDPORDER_ID" ASC, "EDPPAYINST_ID" ASC) ALLOW REVERSE SCANS
   COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[4],   39.997MB
   CREATE INDEX "DB2INST1"."IDX1304051332320" ON "WSCOMUSR"."EDPPAYHIST"
   ("EDPPAYINST_ID" ASC, "EDPPAYHIST_ID" ASC) ALLOW REVERSE SCANS
   COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[5],    0.009MB
   CREATE INDEX "DB2INST1"."IDX1304051332340" ON "WSCOMUSR"."EDPRELHIST"
   ("EDPPAYHIST_ID" ASC) ALLOW REVERSE SCANS
   COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
--
--
-- RECOMMENDED EXISTING INDEXES
-- ============================
-- RUNSTATS ON TABLE "WSCOMUSR"."ORDERS" FOR SAMPLED DETAILED INDEX "WSCOMUSR"."I0000176" ;
-- COMMIT WORK ;
-- RUNSTATS ON TABLE "WSCOMUSR"."EDPPAYINST" FOR SAMPLED DETAILED INDEX "WSCOMUSR"."I0000844" ;
-- COMMIT WORK ;
-- RUNSTATS ON TABLE "WSCOMUSR"."EDPATMPAY" FOR SAMPLED DETAILED INDEX "WSCOMUSR"."I0000846" ;
-- COMMIT WORK ;
-- RUNSTATS ON TABLE "WSCOMUSR"."EDPPAYHIST" FOR SAMPLED DETAILED INDEX "WSCOMUSR"."I0000910" ;
-- COMMIT WORK ;
-- RUNSTATS ON TABLE "WSCOMUSR"."EDPRELHIST" FOR SAMPLED DETAILED INDEX "SYSIBM  "."SQL121004092950890" ;
-- COMMIT WORK ;
--
--
-- UNUSED EXISTING INDEXES
-- ============================
-- DROP INDEX "WSCOMUSR"."I0000843";
-- DROP INDEX "WSCOMUSR"."I0000850";
-- DROP INDEX "WSCOMUSR"."I0000845";
-- DROP INDEX "WSCOMUSR"."I0000852";
-- DROP INDEX "WSCOMUSR"."I0000652";
-- DROP INDEX "WSCOMUSR"."I0000653";
-- DROP INDEX "WSCOMUSR"."I0000654";
-- DROP INDEX "WSCOMUSR"."I0000892";
-- DROP INDEX "WSCOMUSR"."I0000933";
-- DROP INDEX "WSCOMUSR"."I173124";
-- DROP INDEX "WSCOMUSR"."I0001267";
-- ===========================
--
-- ====ADVISOR DETAILED XML OUTPUT=============
-- ==(Benefits do not include clustering recommendations)==
...
Making Decisions
Never blindly take the recommendations of the design advisor – especially the DROP INDEX recommendations. The design advisor acts as if the workload you’ve fed it (in this case, a single delete statement) is the only workload on the database. It also aims for index-only access wherever possible, which can be appropriate for a query that is a large problem with a measurable impact on an end-user process like checkout. But in this case, we have no evidence of end-user impact.
The tables we’re looking at here are very directly involved in user operations such as checkout, where insert performance is critical. I hate adding an index to ORDERS or any order- or payment-related table like these, because it can slow those processes down. So from the moment I saw the recommendations, I knew I wasn’t about to create all of these indexes. I needed to figure out whether there was a single index, without too much duplication of existing ones, that could help this delete – or whether I would have to go back to the developers first and ask how critical this activity was.
If you look at the explain plan, note these two sections:
TBSCAN (13)   9.55329e+06 / 211522 / 207782
  TABLE: WSCOMUSR.EDPORDER (Q7)   9.55329e+06 rows
AND
TBSCAN (23)   0.0724482 / 212061 / 207787
  TEMP (9)    0.0724482 / 212061 / 207787
IXSCAN (24)   0.990131 / 15.1398 / 2
  INDEX: WSCOMUSR.I0000846 (Q19)   684953 rows
If you dig through the detailed explain plan, you’ll see that the first is a table scan of EDPORDER, whose output feeds through several other transformations and filters, most of it ending up in the temp table that is scanned in operator #23. So if I can eliminate that table scan, I can eliminate the two most expensive parts of my explain plan and get the vast majority of the reduction in timerons, and in rows read as well. Looking at the design advisor output, I see this recommended index for that table:
-- index[2],  275.005MB
   CREATE INDEX "DB2INST1"."IDX1304051332230" ON "WSCOMUSR"."EDPORDER"
   ("ORDER_ID" ASC, "EDPORDER_ID" ASC) ALLOW REVERSE SCANS
   COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
To fully analyze that, I need to look at the other indexes on the table:
INDNAME              UNIQUERULE COLNAMES                  LASTUSED
-------------------- ---------- ------------------------- ----------
SQL121004092949970   P          +EDPORDER_ID              04/13/2013
I0000843             D          +STORE_ID+ORDER_ID        04/13/2013
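That listing is straight out of the catalog; a minimal sketch of the kind of query that produces it (the exact formatting is mine):

-- LASTUSED is available in SYSCAT.INDEXES starting in DB2 9.7.
SELECT SUBSTR(INDNAME, 1, 20)   AS INDNAME,
       UNIQUERULE,
       SUBSTR(COLNAMES, 1, 25)  AS COLNAMES,
       LASTUSED
  FROM SYSCAT.INDEXES
 WHERE TABSCHEMA = 'WSCOMUSR'
   AND TABNAME   = 'EDPORDER'
  WITH UR;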
There, we see that the new index would duplicate columns already present in other indexes. However, I don’t yet have an index with ORDER_ID as the first column, and given the way this table is used, that seems like a decent idea anyway. The index to be created is also a bit on the large side – 275 MB – but I still think it is worth it for the kind of improvement we’re talking about. I also know that the cardinality of the new index will be very high, since it happens to include the primary key. Maybe in version 10, a combination of jump scans and index ANDing would make this index obsolete, but I’m on 9.7, and since WebSphere Commerce does not yet support DB2 10, I think it unlikely that I’ll get there any time this year.
Adding my own naming standards and clearing up the syntax, this is what I end up adding:
db2 "CREATE INDEX WSCOMUSR.I_EDPORDER01 ON WSCOMUSR.EDPORDER (ORDER_ID ASC, EDPORDER_ID ASC) ALLOW REVERSE SCANS COLLECT DETAILED STATISTICS"
When creating a new object, you should create it in test, QA, or other lower environments first, before moving it to staging and production. After creating a new index, it is also important to do another explain to ensure that the new index is actually used. Here’s what that explain looked like for me:
Access Plan:
-----------
        Total Cost:     147.709
        Query Degree:   1

(Same outline format as above: estimated rows / total cost / I/O.)

RETURN (1)
  FILTER (2)                        0.333333 / 147.709 / 19.5123
    TBSCAN (3)                      1 / 2.8813e-05 / 0
      TABFNC: SYSIBM.GENROW
    DELETE (4)                      0 / 77.92 / 10.293
      NLJOIN (5)                    0 / 77.92 / 10.293
        DELETE (6)                  0.145504 / 77.9095 / 10.293
          NLJOIN (7)                0.145504 / 76.8091 / 10.1475
            TBSCAN (8)              0.0724482 / 61.6671 / 8.14751
              TEMP (9)              0.0724482 / 61.6595 / 8.14751
                DELETE (10)         0.0724482 / 61.6567 / 8.14751
                  NLJOIN (11)       0.0724482 / 61.1088 / 8.07506
                    NLJOIN (12)     1.00049 / 45.4151 / 6.00163
                      FETCH (13)    1.00049 / 22.7026 / 3.00017
                        IXSCAN (14) 1.00049 / 15.1377 / 2
                          INDEX: WSCOMUSR.I0000176 (Q6)   83849 rows
                        TABLE: WSCOMUSR.ORDERS (Q6)   83849 rows
                      IXSCAN (15)   1 / 22.7014 / 3
                        INDEX: WSCOMUSR.I_EDPORDER01 (Q7)   9.55329e+06 rows
                    FETCH (16)      0.0724127 / 15.6861 / 2.07241
                      IXSCAN (17)   0.0724127 / 15.1381 / 2
                        INDEX: WSCOMUSR.I0000844 (Q8)   691780 rows
                      TABLE: WSCOMUSR.EDPPAYINST (Q8)   691780 rows
                  TABLE: WSCOMUSR.EDPPAYINST (Q5)   691780 rows
            FETCH (18)              2.00838 / 22.7027 / 3
              IXSCAN (19)           2.00838 / 15.139 / 2
                INDEX: WSCOMUSR.I0000910 (Q11)   1.38936e+06 rows
              TABLE: WSCOMUSR.EDPPAYHIST (Q11)   1.38936e+06 rows
          TABLE: WSCOMUSR.EDPPAYHIST (Q4)   1.38936e+06 rows
        IXSCAN (20)                 0 / 0.00967348 / 0
          INDEX: SYSIBM.SQL121004092950890 (Q14)
      TABLE: WSCOMUSR.EDPRELHIST (Q3)   0 rows
    DELETE (21)                     0.0717332 / 69.7886 / 9.21924
      NLJOIN (22)                   0.0717332 / 69.2461 / 9.14751
        TBSCAN (23)                 0.0724482 / 61.6671 / 8.14751
          TEMP (9)                  0.0724482 / 61.6595 / 8.14751
        IXSCAN (24)                 0.990131 / 15.1398 / 2
          INDEX: WSCOMUSR.I0000846 (Q19)   684953 rows
      TABLE: WSCOMUSR.EDPATMPAY (Q17)   684953 rows
Whee – we went from over 420,000 timerons in the first explain to just 148 timerons with a single index addition. Not only is the new index being used, but where DB2 used to estimate over 200,000 I/Os to table scan EDPORDER for this delete, it now estimates just three against the new index.
I also like to check several days later to make sure the index is still used. In this case, the LASTUSED column in syscat.indexes shows that it is, indeed, being used.
The design advisor said that we could get this query down to about 124 timerons by adding all of the suggested indexes. But given the importance of inserts on these tables, I’m not willing to try the other indexes unless someone notices that this query is a problem, or that a process this query is involved in is notably slow for end users. They would be high-risk as far as indexes go, even before I analyze them further.
A possible follow-up here is to run the query and index by the development team to see if they have any thoughts or objections. In some organizations, you’ll have to do that first anyway to get approval to add the index in production.
> cascading deletes
Thanks for the detailed blog entry. Note that simple foreign keys can cause exactly the same problems. Problems that you’ll never notice when just inserting.
Can you post the query you used to find the delete statement at the beginning of this post?
It is actually querying my own set of custom history tables based on MON_GET_PKG_CACHE_STMT, so it wouldn’t necessarily work against other databases.
It’s from MON_GET_PKG_CACHE_STMT in 9.7 – is there a way we can get this in 9.1, through admin views or some other method?
I believe you can get it using SNAPDYN_SQL in DB2 9.1 – most of it is in there. There’s a chart of methods for getting at that kind of information, by version, in my developerWorks article: http://www.ibm.com/developerworks/data/library/techarticle/dm-1211packagecache/
The interesting column I found is ROWS_RETURNED, which I cannot find in any of the 9.1 views, so I thought I might have missed something and wanted to confirm with you. Anyway, how was your trip to IDUG Orlando? Hope you enjoyed it a lot – I can sense it in your tweets.
Thanks for the detailed post. Very helpful !
Hi Ember,
Thanks a lot for all your posts, which are very helpful for learning and for resolving problems. In this example, we created an index suggested by the DB2 advisor. Sometimes we face issues where the indexes recommended by DB2 do not help with complex queries. As you said in the post, don’t rely on the DB2 advisor’s output alone. Is it possible to explain with an example what to do – how to choose and decide which index to create, and what other points we should consider while looking at an explain plan?
The other question – is it always advisable to include the primary key columns when creating an index, to increase cardinality?
Only include primary key columns if they will also be queried along with the columns you are indexing on. Artificially inflating cardinality does not improve things – in fact, it may make performance worse.
If you have a low-cardinality index that you wish to create, and there are no high-cardinality columns that also make sense to include based on the queries run against the table, then consider making it a clustering index or a dimension for MDC. Or, if higher cardinality is introduced by a join with another table, consider a statistical view.
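As a sketch of the statistical view idea (the view name here is illustrative, and the join is just borrowed from the delete in this post):

-- A view capturing the join, so the optimizer can collect and use
-- statistics on the combined distribution rather than guessing from
-- single-table stats:
CREATE VIEW WSCOMUSR.SV_ORD_EDPORDER AS
  SELECT O.STATUS, O.MEMBER_ID
    FROM WSCOMUSR.ORDERS O,
         WSCOMUSR.EDPORDER E
   WHERE E.ORDER_ID = O.ORDERS_ID;

-- Mark it as a statistical view and gather distribution statistics:
ALTER VIEW WSCOMUSR.SV_ORD_EDPORDER ENABLE QUERY OPTIMIZATION;
RUNSTATS ON TABLE WSCOMUSR.SV_ORD_EDPORDER WITH DISTRIBUTION;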
Beware of any index whose cardinality is 3-5% or less of the table’s cardinality, especially if the data is unevenly distributed across the values. I do my best not to add such indexes, as they may be detrimental to performance.