The Danger of FETCH FIRST ROW ONLY

There are a a vast array of things you can do in SQL. But just because you can doesn’t always mean that you should.

I recently ran into an interesting situation with FETCH FIRST ROW ONLY.

How the Problem Manifested

The day after new code was deployed, a production database system that usually runs with 40% CPU utilization or less was suddenly pegged at 100% CPU utilization. After a bit of analysis, we determined that a single query was responsible for 64% of the CPU time being used by queries. This happens sometimes when a base WebSphere Commerce feature that we haven’t used on a particular site before or when a custom SQL is added without proper review by a DBA. In those cases, most of the time, we just have to index for the SQL.

The Roots of the Issue

In this case, when we looked into the SQL, it turned out to be one we had spent quite a bit of time writing and re-writing and tuning three months ago. Upon close inspection, however, it was slightly different than the SQL that had come out of our tuning exercises. The reviewed and approved SQL had been:

SELECT 
CATENTREL.CATENTRY_ID_CHILD 
FROM CATENTREL     INNER JOIN CATENTDESC ON (CATENTREL.CATENTRY_ID_CHILD = CATENTDESC.CATENTRY_ID) 
                   INNER JOIN CATENTRY ON (CATENTRY.CATENTRY_ID = CATENTDESC.CATENTRY_ID)
                   , CATGPENREL
WHERE CATENTREL.CATENTRY_ID_PARENT = ? 
       AND CATENTRY.MARKFORDELETE = 0 
       AND CATENTDESC.PUBLISHED = 1 
       AND CATGPENREL.CATENTRY_ID=CATENTREL.CATENTRY_ID_CHILD 
       AND CATGPENREL.CATALOG_ID=? 
       AND not exists(   SELECT 1
                               FROM CATFILTER CF, 
                                           CFPRODUCTSET CFPS, 
                                           PRODUCTSET PS, 
                                           PRSETCEREL PR, 
                                           TERMCOND TC, 
                                           CONTRACT C, 
                                           STOREENT S
                               WHERE       CF.CATFILTER_ID = CFPS.CATFILTER_ID 
                                           AND CFPS.PRODUCTSET_ID = PS.PRODUCTSET_ID 
                                           AND CFPS.TYPE = 0 
                                           AND PS.MARKFORDELETE = 0 
                                           AND PS.PRODUCTSET_ID = PR.PRODUCTSET_ID
                                           AND CF.CATFILTER_ID = TC.BIGINTFIELD1 
                                           AND TC.TRADING_ID = C.CONTRACT_ID 
                                           AND TC.TCSUBTYPE_ID = 'CatalogFilterTC'
                                           AND C.MEMBER_ID = S.MEMBER_ID 
                                           AND S.STOREENT_ID IN (?) 
                                           AND PR.CATENTRY_ID= CATENTREL.CATENTRY_ID_CHILD )
GROUP BY     CATENTREL.CATENTRY_ID_CHILD, 
             CATENTREL.SEQUENCE 
ORDER BY     COUNT(*) DESC, 
             CATENTREL.SEQUENCE ASC 
FETCH FIRST ROW ONLY;

While the actual executing query we found was:

SELECT 
CATENTREL.CATENTRY_ID_CHILD 
FROM CATENTREL     INNER JOIN CATENTDESC ON (CATENTREL.CATENTRY_ID_CHILD = CATENTDESC.CATENTRY_ID) 
                   INNER JOIN CATENTRY ON (CATENTRY.CATENTRY_ID = CATENTDESC.CATENTRY_ID)
                   , CATGPENREL 
WHERE CATENTREL.CATENTRY_ID_PARENT = ? 
       AND CATENTDESC.PUBLISHED = 1 
       AND CATENTRY.MARKFORDELETE = 0 
       AND not exists(   SELECT 1 
                               FROM CATFILTER CF, 
                                           CFPRODUCTSET CFPS, 
                                           PRODUCTSET PS, 
                                           PRSETCEREL PR,
                                           TERMCOND TC, 
                                           CONTRACT C, 
                                           STOREENT S 
                               WHERE       CF.CATFILTER_ID = CFPS.CATFILTER_ID 
                                           AND CFPS.PRODUCTSET_ID = PS.PRODUCTSET_ID 
                                           AND CFPS.TYPE = 0 
                                           AND PS.MARKFORDELETE = 0 
                                           AND PS.PRODUCTSET_ID = PR.PRODUCTSET_ID 
                                           AND CF.CATFILTER_ID = TC.BIGINTFIELD1 
                                           AND TC.TRADING_ID = C.CONTRACT_ID 
                                           AND TC.TCSUBTYPE_ID = 'CatalogFilterTC' 
                                           AND C.MEMBER_ID = S.MEMBER_ID 
                                            AND S.STOREENT_ID = ?  
                                           AND PR.CATENTRY_ID= CATENTREL.CATENTRY_ID_CHILD ) 
GROUP BY     CATENTREL.CATENTRY_ID_CHILD, 
             CATENTREL.SEQUENCE 
ORDER BY     COUNT(*) DESC, 
             CATENTREL.SEQUENCE ASC 
FETCH FIRST ROW ONLY

In the two queries above I’ve highlighted the differences in red. The difference in timerons when I ran an explain on each was drastic – 8,149 timerons for the one actually running as compared to just 656 for the one we had planned. This particular query executes 18 times for every page view, so the difference is drastic when you have end users hitting the website.

Resolving the Issue

When we look at the actual differences, we see that one of the where conditions eliminated, “AND CATGPENREL.CATENTRY_ID=CATENTREL.CATENTRY_ID_CHILD” is actually the join condition for the final table in the FROM clause! This means that table is generating a cartesian product. 18 times for every page load. Sheesh, no wonder there’s a performance problem. The reason it shows up as a CPU bottleneck is because the data is all in the bufferpools. It’s not actually a very huge table. It turns out that a developer meant to remove that table – CATGPENREL – from the query entirely, and simply mistakenly did not remove it from the FROM clause. The developer then did not run the corrected query by a DBA. Additionally, this particular client does not involve DBAs in load testing prior to go-live, so there was no opportunity for a DBA to see it and raise a red flag.

Use of FETCH FIRST ROW ONLY

Part of the reason this issue was not obvious is because of the “FETCH FIRST ROW ONLY”. In this case, it was needed, as some values would cause multiple rows to be returned and only the first row was needed. But at the same time, it masked the problem with the query. If the query was supposed to return one row, and it instead returned thousands, it would have been obvious that there was a problem with the query. The moral of the story here is that when queries are using “FETCH FIRST ROW ONLY”, you may want to take an extra careful look to see if they are processing more rows than they have to.

Ember Crooks
Ember Crooks

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, spent a year working with high-volume MySQL, and is now learning Snowflake. Ember shares both posts about her core skill sets and her journey learning Snowflake.

Ember lives in Denver and work from home

Articles: 544

3 Comments

  1. And a big warning for 9.7 FP 9!! If you have a FIRST n ROWS clause and the table was runstats while it was empty (but now has rows) then the resultset is EMPTY! Described in APAR IC98604: INCORRECT RESULTSET MIGHT BE RETURNED WHEN STATISTICS WERE PREVIOUSLY GATHERED ON AN EMPTY TABLE. A good friend found out just in time before it hit production! IBM seems pretty calm about it… I think the Fix pack should be pulled for such a major INCORROUT. Only applies to FP9. Greetz, Klaas Brant

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.