One of the top SQL tips on many lists for developers of SQL is to avoid applying a function to the table data. I think of this as avoiding funcitons on the left of my comparison operator (=, !=, >, <, etc) – though that’s a simplification that may not always be correct.
The main reason to avoid this is because DB2 cannot make use of an index for a function like that. In DB2 10.5, you may be able to use an expression-based index to alleviate this when it is unavoidable, but you cannot reasonably create an index for every possible function. Especially in e-commerce databases, we need to keep indexing to a minimum. I would reserve that for critical queries where you cannot avoid a function or expression on a table column.
Real World Example
This is a precept in writing high-performing queries, and one I have readily repeated when asked about SQL performance. But when I was recently working on a reporting query that was intended to run against a WebSphere Commerce database, I tried the query both ways to see what difference it really made. The results were astounding – with the recommended syntax, the otherwise identical query ran nearly 45% faster.
The only change between two explains of a query was altering this clause:
date(o.timeplaced) = current date - 1 day
to this:
o.timeplaced between timestamp(current date - 1 day,'00.00.00') and timestamp(current date - 1 day,'23.59.59')
Warning – the explain plans were done before further optimization, so are a bit ugly – also this is a complicated query – the developers rarely engage a DBA for the simple ones. The explain plan for the first, less efficient method was this:
Access Plan: ----------- Total Cost: 216.313 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 7.07267e-08 >^NLJOIN ( 2) 216.313 110.842 /----------------------+-----------------------\ 7.07267e-08 1 FILTER FETCH ( 3) ( 28) 201.17 22.7039 108.841 3.00042 | /---+----\ 0.00824711 1 276276 >^NLJOIN IXSCAN TABLE: WSCOMUSR ( 4) ( 29) CATENTRY 201.17 15.1372 Q1 108.841 2 /---------------------------+---------------------------\ | 0.00824711 1 276276 >NLJOIN FETCH INDEX: SYSIBM ( 5) ( 26) SQL120207194339220 186.027 22.7039 Q1 106.841 3.00042 /--------------------------+--------------------------\ /---+----\ 0.00824711 1 1 276276 >NLJOIN FETCH IXSCAN TABLE: WSCOMUSR ( 6) ( 24) ( 27) CATENTRY 170.855 15.1716 15.1372 Q16 104.838 2.0025 2 /-------------------------+-------------------------\ /---+---\ | 0.00824711 0.406433 1 276274 276276 >NLJOIN FETCH IXSCAN TABLE: WSCOMUSR INDEX: SYSIBM ( 7) ( 22) ( 25) CATENTDESC SQL120207194339220 160.207 10.648 7.5892 Q14 Q16 103.432 1.40643 1 /----------------------+-----------------------\ /---+----\ | 0.00800992 1.02961 0.406433 2148 276274 >NLJOIN FETCH IXSCAN TABLE: WSCOMUSR INDEX: WSCOMUSR ( 8) ( 20) ( 23) OICOMPLIST I0001248 144.966 15.2419 7.57382 Q12 Q14 101.418 2.01364 1 /-----------------+------------------\ /---+----\ | 0.00800992 0.134035 1.02961 5285 2148 >^NLJOIN FETCH IXSCAN TABLE: WSCOMUSR INDEX: WSCOMUSR ( 9) ( 18) ( 21) ORDERITEMS I0000301 136.378 8.58738 7.57522 Q10 Q12 100.284 1.13403 1 /--------+--------\ /---+----\ | 0.00800992 1 0.134035 688 5285 TBSCAN FETCH IXSCAN TABLE: WSCOMUSR INDEX: WSCOMUSR ( 10) ( 16) ( 19) ORDPROMOCD I172138 121.045 15.3337 7.57342 Q8 Q10 98.2582 2.02592 1 | /---+----\ | 0.00800992 1 4051 688 SORT IXSCAN TABLE: WSCOMUSR INDEX: WSCOMUSR ( 11) ( 17) ADDRESS I0000418 121.044 7.57419 Q6 Q8 98.2582 1 | | 0.00800992 4051 FETCH INDEX: SYSIBM ( 12) SQL120207194329140 121.044 Q6 98.2582 /---+----\ 934 5133 RIDSCN TABLE: WSCOMUSR ( 13) ORDERS 17.1415 Q4 2.18352 | 934 SORT ( 14) 17.1412 2.18352 | 934 IXSCAN ( 15) 16.9044 2.18352 | 5133 INDEX: WSCOMUSR I0000654 Q4
The detail on the operator where the predicate is applied looks like this:
12) FETCH : (Fetch) Cumulative Total Cost: 121.044 Cumulative CPU Cost: 5.38439e+06 Cumulative I/O Cost: 98.2582 Cumulative Re-Total Cost: 0.931069 Cumulative Re-CPU Cost: 3.94233e+06 Cumulative Re-I/O Cost: 0 Cumulative First Row Cost: 120.94 Estimated Bufferpool Buffers: 85.5416 Arguments: --------- MAX RIDS: (Maximum RIDs per list prefetch request) 623 PREFETCH: (Type of Prefetch) LIST ROWLOCK : (Row Lock intent) NONE SPEED : (Assumed speed of scan, in sharing structures) SLOW TABLOCK : (Table Lock intent) INTENT NONE TBISOLVL: (Table access Isolation Level) UNCOMMITTED READ THROTTLE: (Scan may be throttled, for scan sharing) FALSE VISIBLE : (May be included in scan sharing structures) FALSE WRAPPING: (Scan may start anywhere and wrap) FALSE Predicates: ---------- 14) Sargable Predicate, Comparison Operator: Equal (=) Subquery Input Required: No Filter Factor: 0.18196 Predicate Text: -------------- (Q4.STOREENT_ID = 12001) 15) Sargable Predicate, Comparison Operator: In List (IN), evaluated by binary search (list sorted at compile-time) Subquery Input Required: No Filter Factor: 0.0440203 Predicate Text: -------------- Q4.STATUS IN ('C', 'D', 'M', 'R', 'S') 16) Sargable Predicate, Comparison Operator: Equal (=) Subquery Input Required: No Filter Factor: 0.000194818 Predicate Text: -------------- (DATE(Q4.TIMEPLACED) = $C0) Input Streams: ------------- 4) From Operator #13 Estimated number of rows: 934 Number of columns: 1 Subquery predicate ID: Not Applicable Column Names: ------------ +Q4.$RID$(A) 5) From Object WSCOMUSR.ORDERS Estimated number of rows: 5133 Number of columns: 9 Subquery predicate ID: Not Applicable Column Names: ------------ +Q4.TOTALADJUSTMENT+Q4.ADDRESS_ID+Q4.STATUS +Q4.TIMEPLACED+Q4.STOREENT_ID+Q4.TOTALSHIPPING +Q4.TOTALTAX+Q4.TOTALPRODUCT+Q4.ORDERS_ID Output Streams: -------------- 6) To Operator #11 Estimated number of rows: 0.00800992 Number of columns: 8 Subquery predicate ID: Not Applicable Column Names: ------------ +Q5.TOTALSHIPPING+Q5.TOTALTAX +Q5.TOTALADJUSTMENT+Q5.TOTALPRODUCT+Q5.STATUS +Q5.TIMEPLACED+Q5.ORDERS_ID+Q5.ADDRESS_ID
The explain plan for the second, more efficient method was this:
Access Plan: ----------- Total Cost: 120.626 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 7.5497e-05 >^NLJOIN ( 2) 120.626 15.9328 /--------+--------\ 7.5497e-05 1 TBSCAN FETCH ( 3) ( 26) 105.483 22.7039 13.9323 3.00042 | /---+----\ 7.5497e-05 1 276276 SORT IXSCAN TABLE: WSCOMUSR ( 4) ( 27) CATENTRY 105.483 15.1372 Q1 13.9323 2 | | 7.5497e-05 276276 FILTER INDEX: SYSIBM ( 5) SQL120207194339220 105.482 Q1 13.9323 | 0.147583 >^NLJOIN ( 6) 105.482 13.9323 /---------------------------+---------------------------\ 0.147583 1 >NLJOIN FETCH ( 7) ( 24) 90.3387 22.7039 11.9319 3.00042 /--------------------------+--------------------------\ /---+----\ 0.147583 1 1 276276 >NLJOIN FETCH IXSCAN TABLE: WSCOMUSR ( 8) ( 22) ( 25) CATENTRY 75.1671 15.1716 15.1372 Q16 9.92941 2.0025 2 /-------------------------+-------------------------\ /---+---\ | 0.147583 0.406433 1 276274 276276 >NLJOIN FETCH IXSCAN TABLE: WSCOMUSR INDEX: SYSIBM ( 9) ( 20) ( 23) CATENTDESC SQL120207194339220 64.5191 10.648 7.5892 Q14 Q16 8.52297 1.40643 1 /-----------------------+-----------------------\ /---+----\ | 0.143338 1.02961 0.406433 2148 276274 >NLJOIN FETCH IXSCAN TABLE: WSCOMUSR INDEX: WSCOMUSR ( 10) ( 18) ( 21) OICOMPLIST I0001248 49.2773 15.2419 7.57382 Q12 Q14 6.50934 2.01364 1 /------------------+-------------------\ /---+----\ | 0.143338 0.134035 1.02961 5285 2148 >^NLJOIN FETCH IXSCAN TABLE: WSCOMUSR INDEX: WSCOMUSR ( 11) ( 16) ( 19) ORDERITEMS I0000301 40.6899 8.58738 7.57522 Q10 Q12 5.3753 1.13403 1 /----------+----------\ /---+----\ | 0.143338 1 0.134035 688 5285 FETCH FETCH IXSCAN TABLE: WSCOMUSR INDEX: WSCOMUSR ( 12) ( 14) ( 17) ORDPROMOCD I172138 25.3562 15.3337 7.57342 Q8 Q10 3.34938 2.02592 1 /---+----\ /---+----\ | 17.8951 5133 1 4051 688 IXSCAN TABLE: WSCOMUSR IXSCAN TABLE: WSCOMUSR INDEX: WSCOMUSR ( 13) ORDERS ( 15) ADDRESS I0000418 7.58131 Q4 7.57419 Q6 Q8 1 1 | | 5133 4051 INDEX: WSCOMUSR INDEX: SYSIBM I173124 SQL120207194329140 Q4 Q6
Details on the operator where this particular predicate is applied look like this:
13) IXSCAN: (Index Scan) Cumulative Total Cost: 7.58131 Cumulative CPU Cost: 90214.1 Cumulative I/O Cost: 1 Cumulative Re-Total Cost: 0.00986093 Cumulative Re-CPU Cost: 41753.1 Cumulative Re-I/O Cost: 0 Cumulative First Row Cost: 7.57403 Estimated Bufferpool Buffers: 2 Arguments: --------- MAXPAGES: (Maximum pages for prefetch) 1 PREFETCH: (Type of Prefetch) NONE ROWLOCK : (Row Lock intent) NONE SCANDIR : (Scan Direction) FORWARD TABLOCK : (Table Lock intent) INTENT NONE TBISOLVL: (Table access Isolation Level) UNCOMMITTED READ Predicates: ---------- 16) Stop Key Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.0116209 Predicate Text: -------------- (Q4.TIMEPLACED <= TIMESTAMP((CURRENT DATE - 1 DAYS), '23.59.59.999999')) 17) Start Key Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.0116209 Predicate Text: -------------- (TIMESTAMP((CURRENT DATE - 1 DAYS), '00.00.00.000000') <= Q4.TIMEPLACED) Input Streams: ------------- 1) From Object WSCOMUSR.I173124 Estimated number of rows: 5133 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q4.TIMEPLACED(A)+Q4.$RID$ Output Streams: -------------- 2) To Operator #12 Estimated number of rows: 17.8951 Number of columns: 2 Subquery predicate ID: Not Applicable Column Names: ------------ +Q4.TIMEPLACED(A)+Q4.$RID$
Basically, in the less efficient method, the date predicate is applied during the FETCH from the orders table, accounting for the vast majority of timerons in that step (nearly 100). In the more efficient method, we can see that an index is insted used before the fetch from the orders table, vastly reducing the cost of that fetch.
If we look at that FETCH operator for the ORDERS table at the far lower left in each plan, we can also see that DB2 has to do less I/O - estimating ~3.3 I/Os instead of ~98.
This is an example of the drastic difference that it makes to apply functions in the optimal place. This particular query is only slated to run daily, so I'm not terribly worried about 100 timerons, but it was fun to experiment and prove one of the precepts of good SQL right.
Love it when I read “left of my comparison operator”.
I also code: [code]
select lastname from employee a where a.firstnme = ‘JOHN’ [/code]
But sometimes I see SQL like this: [code]
select lastname from employee a where ‘JOHN’ = a.firstnme[/code]
It looks very odd to me, but it seems to work as well. I never found arguments why the db-column should be on the left…. Are there?
Hi Dick,
the ‘other way round’ is mostly a coding pattern to prevent unintentional assignment, when there is an assignment operator ‘=’ and a comparison operator ‘==’. Some people do it even here… 😉
Cheers
Roland
Cool, I learn something new every day!
Nice article. Well done Ember!