I actually had a blog entry started on this topic before IDUG. I knew this was possible, but not exactly how to do it. Then I sat in Scott Hayes’ session – D04 – More Sage Advice: Invaluable DB2 LUW Performance Insights from Around the Globe, and he had all the details I needed to really get rolling on trying this out – saving me valuable time in research.
Indexing is relatively easy with a simple data model, or even a complicated one that doesn’t join a lot of tables, once you get used to it. But get to a point where you’ve got views and tables and dozens of objects involved, where the time is not primarily spent on reading data from the tables and indexes, but in the joining and filtering, and it starts to get difficult.
I’m calling this multiple index regression analysis because we’re looking for which recommended indexes most impact our query performance. The DB2 design advisor can recommend many indexes to improve performance, and some of them are more impactful than others. In some analysis I was doing last week, one of the indexes turned out to only reduce query execution time by 0.0001 timerons – not worth the overhead and space for that particular index. This does not use actual statistical formulas like those used in Multiple Linear Regression, but I could see a whole-workload approach where that would be interesting.
This is a highly advanced DB2 topic. It is assumed that the reader understands terms like Timeron and gets the concepts and can read an explain and db2advis output.
Scenario
In this example, I’m working with this query:
select * from order_trace where c_state = ? and dl_ap_date2 >= ? and reference not in (select reference from edi_204 where edi_204.reference = reference and edi_204.order_id = order_id and CUST_s_location_id = ? ) and (status in (? ,? ) OR ORDER_TYPE = ? ) order by dl_ap_date2, dl_ap_time2, act_state, act_city, actual, order_id;
Doesn’t look so bad, right? except order_trace is a view that is quite complicated. So my explain plan looks like this:
Access Plan: ----------- Total Cost: 35208.6 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 2.92646 >^NLJOIN ( 2) 35208.6 11324.4 /-------------+--------------\ 2.92646 1 >^NLJOIN IXSCAN ( 3) ( 66) 35062.3 50.0091 11318.5 2 /-----------------+-----------------\ | 2.92646 1 595580 >^NLJOIN IXSCAN INDEX: DB2 ( 4) ( 65) ORDER_ID 34989.1 25.0083 Q1 11315.6 1 /---------------------+----------------------\ | 2.92646 1 13398 >^NLJOIN IXSCAN INDEX: SYSIBM ( 5) ( 64) SQL991011111520810 34915.9 25.0074 Q2 11312.7 1 /------------------------------+-------------------------------\ | 2.92646 1 2976 NLJOIN IXSCAN INDEX: SYSIBM ( 6) ( 63) SQL991011113441210 34842.7 25.0112 Q3 11309.8 1 /--------------------------------------------------+---------------------------------------------------\ | 2.92646 1 13398 TBSCAN NLJOIN INDEX: DB2 ( 7) ( 32) DRIVER_ID 33816.6 268.586 Q4 11237.2 19 | /------------+-------------\ 2.92646 1 1 SORT NLJOIN TBSCAN ( 8) ( 33) ( 59) 33816.6 243.01 12.7956 11237.2 17 1 | /----------+----------\ | 2.92646 1 1 1 HSJOIN NLJOIN TBSCAN TEMP ( 9) ( 34) ( 56) ( 60) 33816.6 191.871 38.3582 12.7911 11237.2 13 3 1 /-------------------+--------------------\ /--------+--------\ | | 139659 2.92646 1 1 1 1 ^HSJOIN HSJOIN ^NLJOIN TBSCAN TEMP GRPBY ( 10) ( 13) ( 35) ( 38) ( 57) ( 61) 9380.69 24432.7 37.7969 140.072 38.3536 12.7892 3516.17 7721 2 10 3 1 /------+-------\ /-----------+-----------\ /------+------\ | | | 139659 97515 2685.75 152.176 1 1 1 1 0.364844 IXSCAN IXSCAN NLJOIN ^NLJOIN IXSCAN IXSCAN TEMP IXSCAN IXSCAN ( 11) ( 12) ( 14) ( 19) ( 36) ( 37) ( 39) ( 58) ( 62) 5805.14 3567.73 6632.09 17800.5 25.0075 25.5699 140.067 38.3518 12.7892 2657.04 859.125 2970.8 4750.2 1 2 10 3 1 | | /----+-----\ /---+----\ | | | | | 139659 97515 1875.29 1.43218 152.176 1 2976 139659 1 440024 19946 INDEX: DB2 INDEX: DB2 TBSCAN IXSCAN TBSCAN IXSCAN INDEX: BFAIRCHILD INDEX: DB2 NLJOIN INDEX: DB2 INDEX: DB2 WIZ29 IDX201171957580000 ( 15) ( 18) ( 20) ( 31) IDX201162148270000 WIZ1171 ( 40) IDX201161942390000 IDX_STOP_1 Q23 Q25 209.649 38.3519 15258.7 25.5699 Q10 Q11 140.066 Q8 Q5 17.3389 3 3605.98 2 10 | | | | /------------+-------------\ 1875.29 139659 152.176 139659 1 1 SORT INDEX: DB2 SORT INDEX: DB2 TBSCAN TBSCAN ( 16) IDX201171959320000 ( 21) WIZ1171 ( 41) ( 50) 209.649 Q24 15258.7 Q22 76.1435 51.1416 17.3389 3605.98 5.00001 4 | | | | 1875.29 152.176 1.3729e-005 1 IXSCAN NLJOIN SORT TEMP ( 17) ( 22) ( 42) ( 51) 209.323 15258.6 76.1433 51.137 17.3389 3605.98 5.00001 4 | /------+-------\ | | 97515 56 2.71743 1.3729e-005 1 INDEX: DB2 TBSCAN IXSCAN NLJOIN TBSCAN IDX201171957580000 ( 23) ( 26) ( 43) ( 52) Q26 25.0247 332.82 76.1429 51.1352 1 35.2989 5.00001 4 | /---+----\ /-------+-------\ | 56 803.566 54670 1 1.3729e-005 1 SORT TBSCAN INDEX: DB2 TBSCAN FETCH SORT ( 24) ( 27) IDX_ORDER_01 ( 44) ( 48) ( 53) 25.0246 78.3923 Q21 51.1352 25.0077 51.135 1 6.08317 4 1.00001 4 | | | /----+-----\ | 56 803.566 1 1.3729e-005 519 1.00058 IXSCAN TEMP SORT IXSCAN TABLE: DB2 FETCH ( 25) ( 28) ( 45) ( 49) CODE ( 54) 25.0201 78.1369 51.135 25.0072 Q15 51.1347 1 6.08317 4 1 4 | | | | /------+------\ 519 803.566 1.00058 519 1.00058 1.9199e+007 INDEX: DB2 UNIQUE FETCH INDEX: DB2 IXSCAN TABLE: EDI IDX_CODE_01 ( 29) ( 46) IDX_CODE_01 ( 55) EDI_204_ADDITIONAL_FIELDS Q27 78.0063 51.1347 Q15 38.3527 Q18 6.08317 4 3 | /------+------\ | 803.566 1.00058 1.9199e+007 1.9199e+007 IXSCAN IXSCAN TABLE: EDI INDEX: EDI ( 30) ( 47) EDI_204_ADDITIONAL_FIELDS EDI_204_A_F_NDX 77.9929 38.3527 Q13 Q18 6.08317 3 | | 2.23029e+007 1.9199e+007 INDEX: DB2 INDEX: EDI IDX201171956050000 EDI_204_A_F_NDX Q28 Q13
A bit more horrendous, that, and looking at it in detail, there’s no obvious gigantic table scan or even fetch from a table after reading an index that provides a significant amount of the time.
Running db2advis on that, we get:
19 indexes in current solution [34706.0000] timerons (without recommendations) [19035.0000] timerons (with current solution) [45.15%] improvement -- -- -- LIST OF RECOMMENDED INDEXES -- =========================== -- index[1], 0.548MB CREATE UNIQUE INDEX "ECROOKS "."IDX1505210120080" ON "DB2 "."ORDER" ("TRAILER_SIZE" ASC, "DL_AP_DATE2" ASC, "CUSTOMER_RATE" ASC, "NOTIFIED_DATE" ASC, "DATE_ENTERED" ASC, "COMMODITY" ASC, "NEEDS_PAPERWORK" ASC, "LINE_OF_BUSINESS" ASC, "TOTAL_CHARGE" ASC, "MILES" ASC, "TERMINAL" ASC, "SCAC" ASC, "SEAL" ASC, "PO_NUM" ASC, "PU_NUM" ASC, "BL_NUM" ASC, "REFERENCE" ASC, "WHO_ENTERED" ASC, "ORDER_TYPE" ASC, "TRACTOR_ID" ASC, "DRIVER_ID" ASC, "DL_AT_TIME2" ASC, "DL_AT_DATE2" ASC, "DL_AT_TIME1" ASC, "DL_AT_DATE1" ASC, "DL_AP_TIME2" ASC, "DL_AP_TIME1" ASC, "DL_AP_DATE1" ASC, "PU_AT_TIME2" ASC, "PU_AT_DATE2" ASC, "PU_AT_TIME1" ASC, "PU_AT_DATE1" ASC, "PU_AP_TIME2" ASC, "PU_AP_DATE2" ASC, "PU_AP_TIME1" ASC, "PU_AP_DATE1" ASC, "S_LOCATION_ID" ASC, "CHASSIS" ASC, "CHECK_DIGIT" ASC, "TRAILER" ASC, "STATUS" ASC, "ORDER_ID" ASC) INCLUDE ("C_LOCATION_ID", "ACT_LOCATION_ID", "BTO_LOCATION_ID") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; -- index[2], 20.087MB CREATE INDEX "ECROOKS "."IDX1505210116380" ON "DB2 "."EDI_204" ("CUST_S_LOCATION_ID" ASC, "ORDER_ID" ASC, "REFERENCE" ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; -- index[3], 0.048MB CREATE UNIQUE INDEX "ECROOKS "."IDX1505210115300" ON "DB2 "."VENDOR_UNIT" ("VENDOR_UNIT_ID" ASC) INCLUDE ("VENDOR_ID") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; -- index[4], 5.778MB CREATE UNIQUE INDEX "ECROOKS "."IDX1505210115350" ON "DB2 "."LOCATION" ("LOCATION_ID" ASC) INCLUDE ("NAME") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; -- index[5], 1673.419MB CREATE INDEX "ECROOKS "."IDX1505210115360" ON "EDI "."EDI_204_ADDITIONAL_FIELDS" ("REFERENCE" ASC, "FIELD_NAME" ASC, "INSERT_DT" ASC, "FIELD_VALUE" ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; -- index[6], 0.013MB CREATE INDEX "ECROOKS "."IDX1505210116350" ON "DB2 "."CODE" ("CODE_TEXT" ASC, "TYPE" ASC, "CODE_ID" ASC, "CODE" ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; -- index[7], 9.739MB CREATE UNIQUE INDEX "ECROOKS "."IDX1505210121040" ON "DB2 "."RAIL_ETA" ("ORDER_ID" ASC) INCLUDE ("STATE") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ;
Some of those indexes I’m not likely to try. index[1] for example is awfully wide. index[2] is nearly identical to an existing index, with the difference being in the order of the last two columns. And 19 total indexes that DB2 wants to use for this one query? Wow, that’s a lot.
Also, I don’t have a full-size test or QA environment – the data size in my lower environments is significantly smaller, so it is hard for me to scale down to see what happens with different indexes in those environments. One option here is to mimic statistics in my lower environments. I’ve done that before, but I want to try mimicking the new indexes instead.
Setup steps
First, my explain tables are all currently shared tables. With multiple DBAs potentially working in this environment, I have two real choices to isolate my data – #1 is to create my own set of explain tables and make use of them. #2 is to note the time before I collect data and then the time after I collect explain and db2advis data and look only within that period to verify that only the data I want is included. I’m chosing the second option. With only 3 potential people running explains and none of us dedicated to that full time, this seems like a good option – we’re not likely to conflict.
Running the Explain and db2advis
I need to capture the explain and db2advis data before I work on anything. Note that I happen to be working in a Windows environment from a powershell prompt for this work – everything here would be the same on Linux/UNIX.
To capture my initial data, I’ll use:
PS D:\xtivia\queries> db2 "select current timestamp from sysibm.sysdummy1" 1 -------------------------- 2015-05-21-10.00.07.793000 1 record(s) selected. PS D:\xtivia\queries> db2 set current explain mode explain DB20000I The SQL command completed successfully. PS D:\xtivia\queries> db2 -tvf query13.sql select * from db2.order_trace where c_state = ? and dl_ap_date2 >= ? and reference not in (select reference from db2.edi_204 edi_204 where edi_204.reference = reference and edi_204.order_id = order _id and CUST_s_location_id = ? ) and (status in (? ,? ) OR ORDER_TYPE = ? ) order by dl_ap_date2, dl_ap_time2, act_state, act_city, actual, order_id SQL0217W The statement was not executed as only Explain information requests are being processed. SQLSTATE=01604 PS D:\xtivia\queries> db2exfmt -d comtrak2 -e db2admin -1 -o query13_exfmt.txt DB2 Universal Database Version 9.7, 5622-044 (c) Copyright IBM Corp. 1991, 2009 Licensed Material - Program Property of IBM IBM DATABASE 2 Explain Table Format Tool Connecting to the Database. Connect to Database Successful. Output is in query13_exfmt.txt. Executing Connect Reset -- Connect Reset was Successful. PS D:\xtivia\queries> db2advis -d comtrak2 -i query13.sql -p |tee query13_advis.txt Using user id as default schema name. Use -n option to specify schema execution started at timestamp 2015-05-21-10.01.12.315000 found [1] SQL statements from the input file Recommending indexes... total disk space needed for initial set [1715.827] MB total disk space constrained to [35280.399] MB Trying variations of the solution set. Optimization finished. 19 indexes in current solution [38772.0000] timerons (without recommendations) [18742.0000] timerons (with current solution) [51.66%] improvement -- -- -- LIST OF RECOMMENDED INDEXES -- =========================== -- index[1], 0.532MB CREATE UNIQUE INDEX "DB2ADMIN"."IDX1505231651490" ON "DB2 "."ORDER" ("TRAILER_SIZE" ASC, "DL_AP_DATE2" ASC, "CUSTOMER_RATE" ASC, "NOTIFIED_DATE" ASC, "DATE_ENTERED" ASC, "COMMODITY" ASC, "NEEDS_PAPERWORK" ASC, "LINE_OF_BUSINESS" ASC, "TOTAL_CHARGE" ASC, "MILES" ASC, "TERMINAL" ASC, "SCAC" ASC, "SEAL" ASC, "PO_NUM" ASC, "PU_NUM" ASC, "BL_NUM" ASC, "REFERENCE" ASC, "WHO_ENTERED" ASC, "ORDER_TYPE" ASC, "TRACTOR_ID" ASC, "DRIVER_ID" ASC, "DL_AT_TIME2" ASC, "DL_AT_DATE2" ASC, "DL_AT_TIME1" ASC, "DL_AT_DATE1" ASC, "DL_AP_TIME2" ASC, "DL_AP_TIME1" ASC, "DL_AP_DATE1" ASC, "PU_AT_TIME2" ASC, "PU_AT_DATE2" ASC, "PU_AT_TIME1" ASC, "PU_AT_DATE1" ASC, "PU_AP_TIME2" ASC, "PU_AP_DATE2" ASC, "PU_AP_TIME1" ASC, "PU_AP_DATE1" ASC, "S_LOCATION_ID" ASC, "CHASSIS" ASC, "CHECK_DIGIT" ASC, "TRAILER" ASC, "STATUS" ASC, "ORDER_ID" ASC) INCLUDE ("C_LOCATION_ID", "ACT_LOCATION_ID", "BTO_LOCATION_ID") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; -- index[2], 21.126MB CREATE INDEX "DB2ADMIN"."IDX1505231648190" ON "DB2 "."EDI_204" ("CUST_S_LOCATION_ID" ASC, "ORDER_ID" ASC, "REFERENCE" ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; -- index[3], 0.048MB CREATE UNIQUE INDEX "DB2ADMIN"."IDX1505231647110" ON "DB2 "."VENDOR_UNIT" ("VENDOR_UNIT_ID" ASC) INCLUDE ("VENDOR_ID") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; -- index[4], 5.782MB CREATE UNIQUE INDEX "DB2ADMIN"."IDX1505231647160" ON "DB2 "."LOCATION" ("LOCATION_ID" ASC) INCLUDE ("NAME") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; -- index[5], 1678.521MB CREATE INDEX "DB2ADMIN"."IDX1505231647170" ON "EDI "."EDI_204_ADDITIONAL_FIELDS" ("REFERENCE" ASC, "FIELD_NAME" ASC, "INSERT_DT" ASC, "FIELD_VALUE" ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; -- index[6], 0.013MB CREATE INDEX "DB2ADMIN"."IDX1505231648160" ON "DB2 "."CODE" ("CODE_TEXT" ASC, "TYPE" ASC, "CODE_ID" ASC, "CODE" ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; -- index[7], 9.806MB CREATE UNIQUE INDEX "DB2ADMIN"."IDX1505231652450" ON "DB2 "."RAIL_ETA" ("ORDER_ID" ASC) INCLUDE ("STATE") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS; COMMIT WORK ; snip 1449 solutions were evaluated by the advisor DB2 Workload Performance Advisor tool is finished. PS D:\xtivia\queries> db2 set current explain mode no DB20000I The SQL command completed successfully. PS D:\xtivia\queries> db2 "select current timestamp from sysibm.sysdummy1" 1 -------------------------- 2015-05-21-10.03.06.369000 1 record(s) selected.
What this gets me is all of the details about this query without any changes, and the recommended indexes – in this case 7 new indexes. I don’t really like all of those indexes, but I need to figure out which ones will give me the most impact and analyze other characteristics about them. Note the use of the -p
option on the db2advis. This causes DB2 to retain information about the explain plan and recommended indexes.
We also need to note from this process the starting time and the ending time for the analysis – in this case:
Start: 2015-05-21-10.00.07
End: 2015-05-21-10.03.06
Note Costs
The next step is to gather the costs of this query before and after the indexes recommended by the db2 design advisor. Note that we have to fill in the timestamps from above to limit the data returned.
select dec(total_cost,20,4) as before_total_cost , dec(io_cost,20,4) as before_io_cost , dec(CPU_cost,20,4) as before_cpu_cost , dec(Comm_cost,20,4) as before_comm_cost from Explain_Operator ,(select min(explain_time) as mintime from Explain_Operator where operator_type = 'RETURN' and explain_time between timestamp('2015-05-21-10.00.07') and timestamp('2015-05-21-10.03.06')) as b where explain_time = b.mintime and operator_type = 'RETURN' with UR ; BEFORE_TOTAL_COST BEFORE_IO_COST BEFORE_CPU_COST BEFORE_COMM_COST ---------------------- ---------------------- ---------------------- ---------------------- 38772.4492 11552.6152 705355200.0000 0.0000 1 record(s) selected.
And:
select dec(total_cost,20,4) as after_total_cost , dec(io_cost,20,4) as after_io_cost , dec(CPU_cost,20,4) as after_cpu_cost , dec(Comm_cost,20,4) as after_comm_cost from Explain_Operator ,(select max(explain_time) as maxtime from Explain_Operator where operator_type = 'RETURN' and explain_time between timestamp('2015-05-21-10.00.07') and timestamp('2015-05-21-10.03.06')) as b where explain_time = b.maxtime and operator_type = 'RETURN' with UR ; AFTER_TOTAL_COST AFTER_IO_COST AFTER_CPU_COST AFTER_COMM_COST ---------------------- ---------------------- ---------------------- ---------------------- 18741.7597 5339.1113 131031752.0000 0.0000 1 record(s) selected.
What we have in the explain tables at this point is really two sets of data – the data from the original explain, and the data from an explain as if all of the recommended indexes existed.
Now here’s a query I prefer to work with of the existing and recommended indexes for this query. It uses both data on existing indexes from syscat.indexes and syscat.tables, along with data on the proposed new indexes from advise_index. It’s also one I run over and over again to help me keep track of what indexes I’m trying things with:
with ts as ( select max(explain_time) as maxtime from Explain_Operator where operator_type = 'RETURN' and explain_time between timestamp('2015-05-21-10.00.07') and timestamp('2015-05-21-10.03.06')) select substr(name,1,25) as indname , substr(tbcreator,1,13) as tabschema , substr(tbname,1,18) as tabname , coalesce(si.fullkeycard, ai.fullkeycard) as fullkeycard , st.card , coalesce(si.uniquerule, ai.uniquerule) as uniquerule , use_index , exists , substr(coalesce(si.colnames, ai.colnames),1,60) as colnames from advise_index ai left outer join syscat.indexes si on ai.tbcreator=si.tabschema and ai.tbname=si.tabname and ai.name=si.indname left outer join syscat.tables st on st.tabschema=ai.tbcreator and st.tabname=ai.tbname join ts on explain_time = ts.maxtime order by exists, use_index, uniquerule desc, indname with ur;
I’m actually masaging that a bit with clpplus to get the output I want because substr isn’t doing so well at the powershell prompt for me at the moment:
Index Table Index Name Table Schema Table Name Cardinality Cardinality UNIQUERULE USE_INDEX EXISTS Column Names -------------------- --------------- ------------------------- -------------------- -------------------- ---------- --------- ------ ------------------------------------------------------------ IDX1505231647110 DB2 VENDOR_UNIT 2976 2976 U Y N +VENDOR_UNIT_ID-VENDOR_ID IDX1505231647160 DB2 LOCATION 140445 140445 U Y N +LOCATION_ID-NAME IDX1505231651490 DB2 ORDER 642 52497 U Y N +TRAILER_SIZE+DL_AP_DATE2+CUSTOMER_RATE+NOTIFIED_DATE+DATE_E NTERED+COMMODITY+NEEDS_PAPERWORK+LINE_OF_BUSINESS+TOTAL_CHAR GE+MILES+TERMINAL+SCAC+SEAL+PO_NUM+PU_NUM+BL_NUM+REFERE... IDX1505231652450 DB2 RAIL_ETA 649793 649793 U Y N +ORDER_ID-STATE IDX1505231647170 EDI EDI_204_ADDITIONAL_FIELDS 19897638 19897638 D Y N +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE IDX1505231647170 EDI EDI_204_ADDITIONAL_FIELDS 19897638 19897638 D Y N +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE IDX1505231648160 DB2 CODE 5 520 D Y N +CODE_TEXT+TYPE+CODE_ID+CODE IDX1505231648190 DB2 EDI_204 295885 1014841 D Y N +CUST_S_LOCATION_ID+ORDER_ID+REFERENCE DRIVER_ID DB2 DRIVER 13519 13519 U Y Y +DRIVER_ID+FIRST_NAME+LAST_NAME+PREVIOUS_NUM+DISPATCHER IDX201162151160000 DB2 CITY 97516 97516 U Y Y +CITY_ID+STATE+CITY+TIME_ZONE WIZ1166 DB2 LOCATION 140445 140445 U Y Y +LOCATION_ID+ZIP+LONGITUDE+LATITUDE+NAME+CITY_ID WIZ1171 DB2 LOCATION 140445 140445 U Y Y +LOCATION_ID+NAME+NUMBER SQL991011111520810 DB2 DRIVER 13519 13519 P Y Y +DRIVER_ID SQL991011113441210 DB2 VENDOR_UNIT 2976 2976 P Y Y +VENDOR_UNIT_ID IDX_CODE_01 DB2 CODE 518 520 D Y Y +TYPE+CODE_TEXT+CODE IDX_STOP_1 DB2 STOP 15883 21568 D Y Y +ORDER_ID IDX201161942390000 EDI EDI_204_HUB_CUSTOMER 451173 451173 D Y Y +REFERENCE+DESCRIPTION IDX201171957580000 DB2 CITY 97516 97516 D Y Y +STATE+CITY+CITY_ID IDX201171959320000 DB2 LOCATION 140445 140445 D Y Y +CITY_ID+DROP+ZIP+LONGITUDE+LATITUDE+NAME+LOCATION_ID
I pull that output into Excel so I can note details as I work through each index and easily play with the numbers. It looks something like this:
I hide and add some columns, but I’m running this same query between every change I make so I keep track of what I’ve done and not done.
Analysis Techniques
At this point, there are two different methods that Scott laid out in his presentation on this topic – Index Addition and Index Subtraction.
Index Addition
The point of index addition is to run an explain on the query with each proposed index individually, to see the impact of that index alone. This works particuarly well when a great impact is seen from a few indexes and the combination of indexes does not make much difference. To use this method, we update ADVISE_INDEX to have USE_INDEX set to N for all indexes that do not already exist:
PS > db2 "update advise_index set use_index = 'N' where exists= 'N'" DB20000I The SQL command completed successfully. Index Table Index Name Table Schema Table Name Cardinality Cardinality UNIQUERULE USE_INDEX EXISTS Column Names -------------------- --------------- ------------------------- -------------------- -------------------- ---------- --------- ------ ------------------------------------------------------------ IDX1505231647110 DB2 VENDOR_UNIT 2976 2976 U N N +VENDOR_UNIT_ID-VENDOR_ID IDX1505231647160 DB2 LOCATION 140445 140445 U N N +LOCATION_ID-NAME IDX1505231651490 DB2 ORDER 642 52497 U N N +TRAILER_SIZE+DL_AP_DATE2+CUSTOMER_RATE+NOTIFIED_DATE+DATE_E NTERED+COMMODITY+NEEDS_PAPERWORK+LINE_OF_BUSINESS+TOTAL_CHAR GE+MILES+TERMINAL+SCAC+SEAL+PO_NUM+PU_NUM+BL_NUM+REFERE... IDX1505231652450 DB2 RAIL_ETA 649793 649793 U N N +ORDER_ID-STATE IDX1505231647170 EDI EDI_204_ADDITIONAL_FIELDS 19897638 19897638 D N N +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE IDX1505231647170 EDI EDI_204_ADDITIONAL_FIELDS 19897638 19897638 D N N +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE IDX1505231648160 DB2 CODE 5 520 D N N +CODE_TEXT+TYPE+CODE_ID+CODE IDX1505231648190 DB2 EDI_204 295885 1014841 D N N +CUST_S_LOCATION_ID+ORDER_ID+REFERENCE DRIVER_ID DB2 DRIVER 13519 13519 U Y Y +DRIVER_ID+FIRST_NAME+LAST_NAME+PREVIOUS_NUM+DISPATCHER IDX201162151160000 DB2 CITY 97516 97516 U Y Y +CITY_ID+STATE+CITY+TIME_ZONE WIZ1166 DB2 LOCATION 140445 140445 U Y Y +LOCATION_ID+ZIP+LONGITUDE+LATITUDE+NAME+CITY_ID WIZ1171 DB2 LOCATION 140445 140445 U Y Y +LOCATION_ID+NAME+NUMBER SQL991011111520810 DB2 DRIVER 13519 13519 P Y Y +DRIVER_ID SQL991011113441210 DB2 VENDOR_UNIT 2976 2976 P Y Y +VENDOR_UNIT_ID IDX_CODE_01 DB2 CODE 518 520 D Y Y +TYPE+CODE_TEXT+CODE IDX_STOP_1 DB2 STOP 15883 21568 D Y Y +ORDER_ID IDX201161942390000 EDI EDI_204_HUB_CUSTOMER 451173 451173 D Y Y +REFERENCE+DESCRIPTION IDX201171957580000 DB2 CITY 97516 97516 D Y Y +STATE+CITY+CITY_ID IDX201171959320000 DB2 LOCATION 140445 140445 D Y Y +CITY_ID+DROP+ZIP+LONGITUDE+LATITUDE+NAME+LOCATION_ID
Now one by one we set use_index equal to ‘Y’ for each index and run an explain using the ‘evaluate indexes’ option.
PS D:\xtivia\queries> db2 "update advise_index set use_index = 'Y' where name = 'IDX1505231647110'"
DB20000I The SQL command completed successfully.
Index Table
Index Name Table Schema Table Name Cardinality Cardinality UNIQUERULE USE_INDEX EXISTS Column Names
-------------------- --------------- ------------------------- -------------------- -------------------- ---------- --------- ------ ------------------------------------------------------------
IDX1505231647160 DB2 LOCATION 140445 140445 U N N +LOCATION_ID-NAME
IDX1505231651490 DB2 ORDER 642 52497 U N N +TRAILER_SIZE+DL_AP_DATE2+CUSTOMER_RATE+NOTIFIED_DATE+DATE_E
NTERED+COMMODITY+NEEDS_PAPERWORK+LINE_OF_BUSINESS+TOTAL_CHAR
GE+MILES+TERMINAL+SCAC+SEAL+PO_NUM+PU_NUM+BL_NUM+REFERE...
IDX1505231652450 DB2 RAIL_ETA 649793 649793 U N N +ORDER_ID-STATE
IDX1505231647170 EDI EDI_204_ADDITIONAL_FIELDS 19897638 19897638 D N N +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE
IDX1505231647170 EDI EDI_204_ADDITIONAL_FIELDS 19897638 19897638 D N N +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE
IDX1505231648160 DB2 CODE 5 520 D N N +CODE_TEXT+TYPE+CODE_ID+CODE
IDX1505231648190 DB2 EDI_204 295885 1014841 D N N +CUST_S_LOCATION_ID+ORDER_ID+REFERENCE
IDX1505231647110 DB2 VENDOR_UNIT 2976 2976 U Y N +VENDOR_UNIT_ID-VENDOR_ID
DRIVER_ID DB2 DRIVER 13519 13519 U Y Y +DRIVER_ID+FIRST_NAME+LAST_NAME+PREVIOUS_NUM+DISPATCHER
IDX201162151160000 DB2 CITY 97516 97516 U Y Y +CITY_ID+STATE+CITY+TIME_ZONE
WIZ1166 DB2 LOCATION 140445 140445 U Y Y +LOCATION_ID+ZIP+LONGITUDE+LATITUDE+NAME+CITY_ID
WIZ1171 DB2 LOCATION 140445 140445 U Y Y +LOCATION_ID+NAME+NUMBER
SQL991011111520810 DB2 DRIVER 13519 13519 P Y Y +DRIVER_ID
SQL991011113441210 DB2 VENDOR_UNIT 2976 2976 P Y Y +VENDOR_UNIT_ID
IDX_CODE_01 DB2 CODE 518 520 D Y Y +TYPE+CODE_TEXT+CODE
IDX_STOP_1 DB2 STOP 15883 21568 D Y Y +ORDER_ID
IDX201161942390000 EDI EDI_204_HUB_CUSTOMER 451173 451173 D Y Y +REFERENCE+DESCRIPTION
IDX201171957580000 DB2 CITY 97516 97516 D Y Y +STATE+CITY+CITY_ID
IDX201171959320000 DB2 LOCATION 140445 140445 D Y Y +CITY_ID+DROP+ZIP+LONGITUDE+LATITUDE+NAME+LOCATION_ID
With that set, we set the current explain mode, and explain the query:
PS D:\xtivia\queries> db2 set current explain mode evaluate indexes DB20000I The SQL command completed successfully. PS D:\xtivia\queries> db2 -tvf query13.sql select * from db2.order_trace where c_state = ? and dl_ap_date2 >= ? and reference not in (select reference from db2.edi_204 edi_204 where edi_204.reference = reference and edi_204.order_id = order _id and CUST_s_location_id = ? ) and (status in (? ,? ) OR ORDER_TYPE = ? ) order by dl_ap_date2, dl_ap_time2, act_state, act_city, actual, order_id SQL0217W The statement was not executed as only Explain information requests are being processed. SQLSTATE=01604 PS D:\xtivia\queries> db2 set current explain mode no DB20000I The SQL command completed successfully.
Note that the explain mode used is not yes
or explain
, but evaluate indexes
.
Next, we must query the cost to record in our spreadsheet:
select dec(total_cost,20,4) as this_index_total_cost , dec(io_cost,20,4) as this_index_io_cost , dec(CPU_cost,20,4) as this_index_cpu_cost , dec(Comm_cost,20,4) as this_index_comm_cost from Explain_Operator ,(select max(explain_time) as maxtime from Explain_Operator where operator_type = 'RETURN') as b where explain_time = b.maxtime and operator_type = 'RETURN' with UR ; select THIS_INDEX_TOTAL_COST THIS_INDEX_IO_COST THIS_INDEX_CPU_COST THIS_INDEX_COMM_COST ---------------------- ---------------------- ---------------------- ---------------------- 38772.4492 11552.6152 705353856.0000 0.0000 1 record(s) selected.
If you’re keeping track, that’s ZERO impact on timerons for that particular index. Now we have to move through each index in turn. I’m only showing the second one so you understand that each time we mark the other indexes as not being used, and mark only one index as being used:
PS D:\xtivia\queries> db2 "update advise_index set use_index = 'N' where exists= 'N'" DB20000I The SQL command completed successfully. PS D:\xtivia\queries> db2 "update advise_index set use_index = 'Y' where name = 'IDX1505231647160'" DB20000I The SQL command completed successfully. Index Table Index Name Table Schema Table Name Cardinality Cardinality UNIQUERULE USE_INDEX EXISTS Column Names -------------------- --------------- ------------------------- -------------------- -------------------- ---------- --------- ------ ------------------------------------------------------------ IDX1505231647110 DB2 VENDOR_UNIT 2976 2976 U N N +VENDOR_UNIT_ID-VENDOR_ID IDX1505231651490 DB2 ORDER 642 52497 U N N +TRAILER_SIZE+DL_AP_DATE2+CUSTOMER_RATE+NOTIFIED_DATE+DATE_E NTERED+COMMODITY+NEEDS_PAPERWORK+LINE_OF_BUSINESS+TOTAL_CHAR GE+MILES+TERMINAL+SCAC+SEAL+PO_NUM+PU_NUM+BL_NUM+REFERE... IDX1505231652450 DB2 RAIL_ETA 649793 649793 U N N +ORDER_ID-STATE IDX1505231647170 EDI EDI_204_ADDITIONAL_FIELDS 19897638 19897638 D N N +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE IDX1505231647170 EDI EDI_204_ADDITIONAL_FIELDS 19897638 19897638 D N N +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE IDX1505231648160 DB2 CODE 5 520 D N N +CODE_TEXT+TYPE+CODE_ID+CODE IDX1505231648190 DB2 EDI_204 295885 1014841 D N N +CUST_S_LOCATION_ID+ORDER_ID+REFERENCE IDX1505231647160 DB2 LOCATION 140445 140445 U Y N +LOCATION_ID-NAME DRIVER_ID DB2 DRIVER 13519 13519 U Y Y +DRIVER_ID+FIRST_NAME+LAST_NAME+PREVIOUS_NUM+DISPATCHER IDX201162151160000 DB2 CITY 97516 97516 U Y Y +CITY_ID+STATE+CITY+TIME_ZONE WIZ1166 DB2 LOCATION 140445 140445 U Y Y +LOCATION_ID+ZIP+LONGITUDE+LATITUDE+NAME+CITY_ID WIZ1171 DB2 LOCATION 140445 140445 U Y Y +LOCATION_ID+NAME+NUMBER SQL991011111520810 DB2 DRIVER 13519 13519 P Y Y +DRIVER_ID SQL991011113441210 DB2 VENDOR_UNIT 2976 2976 P Y Y +VENDOR_UNIT_ID IDX_CODE_01 DB2 CODE 518 520 D Y Y +TYPE+CODE_TEXT+CODE IDX_STOP_1 DB2 STOP 15883 21568 D Y Y +ORDER_ID IDX201161942390000 EDI EDI_204_HUB_CUSTOMER 451173 451173 D Y Y +REFERENCE+DESCRIPTION IDX201171957580000 DB2 CITY 97516 97516 D Y Y +STATE+CITY+CITY_ID IDX201171959320000 DB2 LOCATION 140445 140445 D Y Y +CITY_ID+DROP+ZIP+LONGITUDE+LATITUDE+NAME+LOCATION_ID
PS D:\xtivia\queries> db2 set current explain mode evaluate indexes DB20000I The SQL command completed successfully. PS D:\xtivia\queries> db2 -tvf query13.sql select * from db2.order_trace where c_state = ? and dl_ap_date2 >= ? and reference not in (select reference from db2.edi_204 edi_204 where edi_204.reference = reference and edi_204.order_id = order _id and CUST_s_location_id = ? ) and (status in (? ,? ) OR ORDER_TYPE = ? ) order by dl_ap_date2, dl_ap_time2, act_state, act_city, actual, order_id SQL0217W The statement was not executed as only Explain information requests are being processed. SQLSTATE=01604 PS D:\xtivia\queries> db2exfmt -d comtrak2 -1 -o query13_ind2_exfmt.txt DB2 Universal Database Version 9.7, 5622-044 (c) Copyright IBM Corp. 1991, 2009 Licensed Material - Program Property of IBM IBM DATABASE 2 Explain Table Format Tool Connecting to the Database. Connect to Database Successful. Base table information incomplete Output is in query13_ind2_exfmt.txt. Executing Connect Reset -- Connect Reset was Successful. PS D:\xtivia\queries> db2 set current explain mode no DB20000I The SQL command completed successfully. PS D:\xtivia\queries> db2 -tvf this_index_cost_query.sql select dec(total_cost,20,4) as this_index_total_cost , dec(io_cost,20,4) as this_index_io_cost , dec(CPU_cost,20,4) as this_index_cpu_cost , dec(Comm_cost,20,4) as this_index_comm_cost from Explain_Op erator ,(select max(explain_time) as maxtime from Explain_Operator where operator_type = 'RETURN') as b where explain_time = b.maxtime and operator_type = 'RETURN' with UR THIS_INDEX_TOTAL_COST THIS_INDEX_IO_COST THIS_INDEX_CPU_COST THIS_INDEX_COMM_COST ---------------------- ---------------------- ---------------------- ---------------------- 38772.4492 11552.6152 705354496.0000 0.0000 1 record(s) selected.
Note that I do tend to run a db2exfmt each time. I’m not technically required to, but I find them useful to have to go back and look at. Again, this index appears to have ZERO impact by itself.
When I’ve gone through each one, this is what my spreadsheet looks like:
Note that by far, the biggest single impact is that wide index that I really don’t like the idea of. Only one other of the 7 recommended new indexes shows any impact by itself. This leads me to wonder why DB2 is even recommending some of the others. I’m hoping that it is because a combination of them makes a real difference. Let’s try Index Subtraction to see if that is true.
Index Subtraction
For Index Subtraction, we tell DB2 to give us the numbers as if all of the proposed indexes exist, and we take away only one at a time to see the impact of not having that particular index. The steps are very similar, just with slightly different updates to the USE_INDEX column.
To start with, we want all of the recommended indexes for this query to be marked with USE_INDEX of Y.
PS D:\xtivia\queries> db2 "update advise_index set use_index='Y' where name in ('IDX1505231647110','IDX1505231647160','IDX1505231652450','IDX1505231647170','IDX1505231648160','IDX1505231648190','IDX15 05231651490')" DB20000I The SQL command completed successfully. Index Table Index Name Table Schema Table Name Cardinality Cardinality UNIQUERULE USE_INDEX EXISTS Column Names -------------------- --------------- ------------------------- -------------------- -------------------- ---------- --------- ------ ------------------------------------------------------------ IDX1505231647110 DB2 VENDOR_UNIT 2976 2976 U Y N +VENDOR_UNIT_ID-VENDOR_ID IDX1505231647160 DB2 LOCATION 140445 140445 U Y N +LOCATION_ID-NAME IDX1505231651490 DB2 ORDER 642 52497 U Y N +TRAILER_SIZE+DL_AP_DATE2+CUSTOMER_RATE+NOTIFIED_DATE+DATE_E NTERED+COMMODITY+NEEDS_PAPERWORK+LINE_OF_BUSINESS+TOTAL_CHAR GE+MILES+TERMINAL+SCAC+SEAL+PO_NUM+PU_NUM+BL_NUM+REFERE... IDX1505231652450 DB2 RAIL_ETA 649793 649793 U Y N +ORDER_ID-STATE IDX1505231647170 EDI EDI_204_ADDITIONAL_FIELDS 19897638 19897638 D Y N +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE IDX1505231647170 EDI EDI_204_ADDITIONAL_FIELDS 19897638 19897638 D Y N +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE IDX1505231648160 DB2 CODE 5 520 D Y N +CODE_TEXT+TYPE+CODE_ID+CODE IDX1505231648190 DB2 EDI_204 295885 1014841 D Y N +CUST_S_LOCATION_ID+ORDER_ID+REFERENCE DRIVER_ID DB2 DRIVER 13519 13519 U Y Y +DRIVER_ID+FIRST_NAME+LAST_NAME+PREVIOUS_NUM+DISPATCHER IDX201162151160000 DB2 CITY 97516 97516 U Y Y +CITY_ID+STATE+CITY+TIME_ZONE WIZ1166 DB2 LOCATION 140445 140445 U Y Y +LOCATION_ID+ZIP+LONGITUDE+LATITUDE+NAME+CITY_ID WIZ1171 DB2 LOCATION 140445 140445 U Y Y +LOCATION_ID+NAME+NUMBER SQL991011111520810 DB2 DRIVER 13519 13519 P Y Y +DRIVER_ID SQL991011113441210 DB2 VENDOR_UNIT 2976 2976 P Y Y +VENDOR_UNIT_ID IDX_CODE_01 DB2 CODE 518 520 D Y Y +TYPE+CODE_TEXT+CODE IDX_STOP_1 DB2 STOP 15883 21568 D Y Y +ORDER_ID IDX201161942390000 EDI EDI_204_HUB_CUSTOMER 451173 451173 D Y Y +REFERENCE+DESCRIPTION IDX201171957580000 DB2 CITY 97516 97516 D Y Y +STATE+CITY+CITY_ID IDX201171959320000 DB2 LOCATION 140445 140445 D Y Y +CITY_ID+DROP+ZIP+LONGITUDE+LATITUDE+NAME+LOCATION_ID
Now one at a time, we will mark the indexes to not be used, and run the explain using evaluate indexes.
PS D:\xtivia\queries> db2 "update advise_index set use_index='N' where name = 'IDX1505231647110'" DB20000I The SQL command completed successfully. PS D:\xtivia\queries> db2 set current explain mode evaluate indexes DB20000I The SQL command completed successfully. PS D:\xtivia\queries> db2 -tvf query13.sql select * from db2.order_trace where c_state = ? and dl_ap_date2 >= ? and reference not in (select reference from db2.edi_204 edi_204 where edi_204.reference = reference and edi_204.order_id = order _id and CUST_s_location_id = ? ) and (status in (? ,? ) OR ORDER_TYPE = ? ) order by dl_ap_date2, dl_ap_time2, act_state, act_city, actual, order_id SQL0217W The statement was not executed as only Explain information requests are being processed. SQLSTATE=01604 PS D:\xtivia\queries> db2exfmt -d comtrak2 -1 -o query13_excl_ind1_exfmt.txt DB2 Universal Database Version 9.7, 5622-044 (c) Copyright IBM Corp. 1991, 2009 Licensed Material - Program Property of IBM IBM DATABASE 2 Explain Table Format Tool Connecting to the Database. Connect to Database Successful. Base table information incomplete Base table information incomplete Base table information incomplete Base table information incomplete Base table information incomplete Base table information incomplete Output is in query13_excl_ind1_exfmt.txt. Executing Connect Reset -- Connect Reset was Successful. PS D:\xtivia\queries> db2 set current explain mode no DB20000I The SQL command completed successfully. PS D:\xtivia\queries> db2 -tvf this_index_cost_query.sql select dec(total_cost,20,4) as this_index_total_cost , dec(io_cost,20,4) as this_index_io_cost , dec(CPU_cost,20,4) as this_index_cpu_cost , dec(Comm_cost,20,4) as this_index_comm_cost from Explain_Op erator ,(select max(explain_time) as maxtime from Explain_Operator where operator_type = 'RETURN') as b where explain_time = b.maxtime and operator_type = 'RETURN' with UR THIS_INDEX_TOTAL_COST THIS_INDEX_IO_COST THIS_INDEX_CPU_COST THIS_INDEX_COMM_COST ---------------------- ---------------------- ---------------------- ---------------------- 18741.7597 5339.1113 131033120.0000 0.0000 1 record(s) selected.
When I cycle through each index that way, ensuring that in each round I eliminate only one, this is what my spreadsheet looks like:
Results
With this information, I can immediately eliminate indexes: IDX1505231647110, IDX1505231647160, IDX1505231652450, IDX1505231648160, and IDX1505231647170. This leaves me with two indexes to consider. Given the fact that EDI_204 is one of the largest and most active tables in my database, I won’t be adding that index for the small additional impact that it gives for this query. I’m left with just one major index:
CREATE UNIQUE INDEX "DB2ADMIN"."IDX1505231651490" ON "DB2 "."ORDER" ("TRAILER_SIZE" ASC, "DL_AP_DATE2" ASC, "CUSTOMER_RATE" ASC, "NOTIFIED_DATE" ASC, "DATE_ENTERED" ASC, "COMMODITY" ASC, "NEEDS_PAPERWORK" ASC, "LINE_OF_BUSINESS" ASC, "TOTAL_CHARGE" ASC, "MILES" ASC, "TERMINAL" ASC, "SCAC" ASC, "SEAL" ASC, "PO_NUM" ASC, "PU_NUM" ASC, "BL_NUM" ASC, "REFERENCE" ASC, "WHO_ENTERED" ASC, "ORDER_TYPE" ASC, "TRACTOR_ID" ASC, "DRIVER_ID" ASC, "DL_AT_TIME2" ASC, "DL_AT_DATE2" ASC, "DL_AT_TIME1" ASC, "DL_AT_DATE1" ASC, "DL_AP_TIME2" ASC, "DL_AP_TIME1" ASC, "DL_AP_DATE1" ASC, "PU_AT_TIME2" ASC, "PU_AT_DATE2" ASC, "PU_AT_TIME1" ASC, "PU_AT_DATE1" ASC, "PU_AP_TIME2" ASC, "PU_AP_DATE2" ASC, "PU_AP_TIME1" ASC, "PU_AP_DATE1" ASC, "S_LOCATION_ID" ASC, "CHASSIS" ASC, "CHECK_DIGIT" ASC, "TRAILER" ASC, "STATUS" ASC, "ORDER_ID" ASC) INCLUDE ("C_LOCATION_ID", "ACT_LOCATION_ID", "BTO_LOCATION_ID") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
Given how wide this index is, I’m really concerned about adding it to a table that is already a bit over-indexed, so I haven’t decided yet if I’m going to add it. But at least I now understand that it’s the only index really worth considering for this query.
I find that for now, I like to use both the index addition and subtraction methods to really see what difference there might be in the results.
A big thanks to Scott Hayes for covering this topic in depth at his IDUG North America presentation this year. This process has become a regular tool in my arsenal.
[…] Multiple Index Regression Analysis […]
Hi Ember this is a very nice article, thanks for sharing it !!.
After you run db2advis you can also get the explain plan of the query with the recommended/new indexes. In this case you didn’t copy/paste it but I wonder; most likely the big new index (IDX1505210120080) is used in a join with another table. Don’t you think maybe by creating an index using only the column(s) joined with that other table could provide a good result (timerons) without including the rest of the columns?. Cause normally DB2 will only suggest such huge indexes to do an index-scan…. but I guess the most important thing is to optimize the join.
Regards
Absolutely, there is additional analysis I could do as to which columns in that index are truly necessary. Would probably make a good separate blog entry.
You can do this from Data Studio and the Query Tuner. After the Index Advisor is run, you Test Candidate indexes by running explain with different combinations of indexes added or removed. You can then do a graph compares to see which indexes helped the most.
Is the Query Tuner a pay-to-use tool? Not a fan of GUIs, so don’t use them much.
Even Im not but i recently tried my hands on the Query Tuner in IBM DS, believe me its generating an awesome report. Yeah i agree its basically drawing the data from explain information but html report its generating on a consolidated basis can be used for further discussion with the developers and RCA.
[…] Use a modified version of Multiple Index Regression Analysis […]
[…] Use a modified version of Multiple Index Regression Analysis […]
[…] I’ve ever written, they are shocked. Was it my one of my excellent articles on BLU? Was it my article describing how to determine the relative impact of recommended indexes without adding the in…? Was it the description of how to use a locking event monitor to analyze locktimeouts and […]
[…] I’ve ever written, they are shocked. Was it my one of my excellent articles on BLU? Was it my article describing how to determine the relative impact of recommended indexes without adding the in…? Was it the description of how to use a locking event monitor to analyze locktimeouts and […]
Bit of a late response:-) I’ve been thinking about investigating this since I saw Scotts presentation at IDUG a couple of years ago, but I somehow forgot about it. The other day I stumbled on your article whilst looking for something else, so I thought I give it a shot. The article is great and easy to follow, so it was easy to get things going.
I soon discovered that even with a moderate number of indexes, the process of evaluating the cost for different index combinations was a bit time-consuming. I also managed to screw up which indexes to enable/disable, so I had to start over to get things right. Here’s a sketch on how to automate this process, the idea is to generate a SQL script that saves the cost for each combination to a table. In the example, I only use total cost, but that easy is easy to extend. I also generate code for all possible index combinations. The SQL-generator is written in python2, so I guess it should work in a Jupyter notebook (haven’t tried though):
{code}
#!/usr/bin/python
import sys
from itertools import chain, combinations
# table to keep track of cost for combination of indexes
costtbl = “create table costtbl (nr int not null primary key, total_cost decimal(20,4) not null);”
# indexes proposed by db2advis
indexes = [ “IDX1904090821410”, “IDX1904090821340” ]
# the query under investigation
query = “select count(1) from nya.application a join nya.admission_round ar on a.admissionround_id = ar.admissionround_id where ar.process_state_id ./myscript.py | tee myscript.sql
#> db2 connect to mydb
#> db2 +c -s -tf myscript.sql
#> db2 commit
#> db2 “select * from costtbl”
NR TOTAL_COST
———– ———————-
0 40635.9140
1 248.7855
2 40635.9140
3 248.7855
NR is a bit-mask for the array of indexes (3 means first and second index). It’s just a sketch, but once the data is in a table we can do all sort of analyses on it. It should also be easy to add more dimensions to the cost-table.
Thanks for a great article.
I actually have a working Jupyter Notebook for this now, it is just not quite ready to publish yet.