SQL Differences Between Db2 and Informix

Posted by

I have been working with Informix from a user perspective recently. I learned about some of the SQL differences and developed a greater empathy for developers in the process.

CTE Support

After repeatedly receiving errors on a statement I was trying to execute – both from Jupyter Notebook and from the command line using dbaccess, I was absolutely flabbergasted to learn that INFORMIX DOES NOT SUPPORT CTEs. Now that’s true as of the writing of this blog entry and some insider sources tell me that support may be added in a newer version of Informix in the very near future. Before you take my word for it, especially if you’re reading this more than two weeks after I published it, please verify this is true for your version.

CTEs are one of my pet SQL features. They’ve been in the SQL standard since ANSI SQL 99, and in most RDBMSes for at least 10 years. Even to me, that seems like forever. You’ll find them in some of the SQL in my DB2 Administrative SQL Cookbook Queries. A CTE is a key component of my favorite query for finding problem SQL in the package cache. This is a query I use a minimum of once a week, and sometimes it seems more like ten times a day.

To me, CTEs offer a great way of calculating numbers across a range of data and then comparing the numbers for individual objects to the aggregate values. They are a key part of queries involving recursion, such as the one Ian Bjorhovde wrote on reporting transaction log files archived by hour. They’re just a dang handy way of calculating something you’ll refer to in a query more than once.

In the query I was trying to use in this case, I was using the LAG function to calculate differences in values from one row to the next, so the CTE was critical. With a bit of research, though, I found I could achieve the same effect by using a subselect in my where clause. That means instead of a query like this:

WITH SUM_TAB (SUM_RR, SUM_CPU, SUM_EXEC, SUM_SORT, SUM_NUM_EXEC) AS (
        SELECT  FLOAT(SUM(ROWS_READ)),
                FLOAT(SUM(TOTAL_CPU_TIME)),
                FLOAT(SUM(STMT_EXEC_TIME)),
                FLOAT(SUM(TOTAL_SECTION_SORT_TIME)),
                FLOAT(SUM(NUM_EXECUTIONS))
            FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T
        )
SELECT
        SUBSTR(STMT_TEXT,1,10) as STATEMENT,
        ROWS_READ,
        DECIMAL(100*(FLOAT(ROWS_READ)/SUM_TAB.SUM_RR),5,2) AS PCT_TOT_RR,
        TOTAL_CPU_TIME,
        DECIMAL(100*(FLOAT(TOTAL_CPU_TIME)/SUM_TAB.SUM_CPU),5,2) AS PCT_TOT_CPU,
        STMT_EXEC_TIME,
        DECIMAL(100*(FLOAT(STMT_EXEC_TIME)/SUM_TAB.SUM_EXEC),5,2) AS PCT_TOT_EXEC,
        TOTAL_SECTION_SORT_TIME,
        DECIMAL(100*(FLOAT(TOTAL_SECTION_SORT_TIME)/SUM_TAB.SUM_SORT),5,2) AS PCT_TOT_SRT,
        NUM_EXECUTIONS,
        DECIMAL(100*(FLOAT(NUM_EXECUTIONS)/SUM_TAB.SUM_NUM_EXEC),5,2) AS PCT_TOT_EXEC,
        DECIMAL(FLOAT(STMT_EXEC_TIME)/FLOAT(NUM_EXECUTIONS),10,2) AS AVG_EXEC_TIME
    FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T, SUM_TAB
    WHERE DECIMAL(100*(FLOAT(ROWS_READ)/SUM_TAB.SUM_RR),5,2) > 10
        OR DECIMAL(100*(FLOAT(TOTAL_CPU_TIME)/SUM_TAB.SUM_CPU),5,2) >10
        OR DECIMAL(100*(FLOAT(STMT_EXEC_TIME)/SUM_TAB.SUM_EXEC),5,2) >10
        OR DECIMAL(100*(FLOAT(TOTAL_SECTION_SORT_TIME)/SUM_TAB.SUM_SORT),5,2) >10
        OR DECIMAL(100*(FLOAT(NUM_EXECUTIONS)/SUM_TAB.SUM_NUM_EXEC),5,2) >10
    ORDER BY ROWS_READ DESC FETCH FIRST 20 ROWS ONLY WITH UR;

I could do this:

SELECT
        SUBSTR(STMT_TEXT,1,10) as STATEMENT,
        ROWS_READ,
        DECIMAL(100*(FLOAT(ROWS_READ)/SUM_TAB.SUM_RR),5,2) AS PCT_TOT_RR,
        TOTAL_CPU_TIME,
        DECIMAL(100*(FLOAT(TOTAL_CPU_TIME)/SUM_TAB.SUM_CPU),5,2) AS PCT_TOT_CPU,
        STMT_EXEC_TIME,
        DECIMAL(100*(FLOAT(STMT_EXEC_TIME)/SUM_TAB.SUM_EXEC),5,2) AS PCT_TOT_EXEC,
        TOTAL_SECTION_SORT_TIME,
        DECIMAL(100*(FLOAT(TOTAL_SECTION_SORT_TIME)/SUM_TAB.SUM_SORT),5,2) AS PCT_TOT_SRT,
        NUM_EXECUTIONS,
        DECIMAL(100*(FLOAT(NUM_EXECUTIONS)/SUM_TAB.SUM_NUM_EXEC),5,2) AS PCT_TOT_EXEC,
        DECIMAL(FLOAT(STMT_EXEC_TIME)/FLOAT(NUM_EXECUTIONS),10,2) AS AVG_EXEC_TIME
    FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T
        , ( SELECT  FLOAT(SUM(ROWS_READ)),
                FLOAT(SUM(TOTAL_CPU_TIME)),
                FLOAT(SUM(STMT_EXEC_TIME)),
                FLOAT(SUM(TOTAL_SECTION_SORT_TIME)),
                FLOAT(SUM(NUM_EXECUTIONS))
            FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T) AS SUM_TAB
    WHERE DECIMAL(100*(FLOAT(ROWS_READ)/SUM_TAB.SUM_RR),5,2) > 10
        OR DECIMAL(100*(FLOAT(TOTAL_CPU_TIME)/SUM_TAB.SUM_CPU),5,2) >10
        OR DECIMAL(100*(FLOAT(STMT_EXEC_TIME)/SUM_TAB.SUM_EXEC),5,2) >10
        OR DECIMAL(100*(FLOAT(TOTAL_SECTION_SORT_TIME)/SUM_TAB.SUM_SORT),5,2) >10
        OR DECIMAL(100*(FLOAT(NUM_EXECUTIONS)/SUM_TAB.SUM_NUM_EXEC),5,2) >10
    ORDER BY ROWS_READ DESC FETCH FIRST 20 ROWS ONLY WITH UR;

The logic of the query really doesn’t have to change here, just where you define the “temporary table” your query is referencing. It turned out this wasn’t a big deal, but just meant a bit of extra work.

Part of the learning experience for me here was that just the week before I had helped analyze a query a developer wrote that used the latter logic, and I wondered at the time why they didn’t just use CTEs. It wasn’t a silly way for the developer to do things, just a different way, and maybe even one that’s more likely to work across RDBMSes.

Time format

This is something I was already aware was different between the various RDBMS dialects of SQL. This one was not a surprise to me. I noted the difference in date handling 8 years ago when I worked my way through the lowest level of Oracle certifications (OCA). However, the Informix way of working with dates doesn’t really resemble either one. My first lesson here was how to control the portion of the date that was returned, since I wanted to average about 6 values per hour into one value. This is the syntax I ended up with to do that:

select timestamp::DATETIME YEAR TO HOUR AS hour
    , 100 - avg(cpu_id) as cpu_busy
from os_vmstat
where mail_id='cli_dm_db1_prod1'
group by 1
order by 1

I have to admit, I kind of like the Informix syntax for casting dates. I find it more intuitive than the Db2 way of doing the same things after only a couple of days working with it. I wish Db2 would enable this syntax as an option at the very least. It took me a couple of tries to get used to it, but specifying the full range of what you want – YEAR TO HOUR makes a lot of sense to me. The double colon for casting also just seems intuitive to me after very little time working with it.

I thought I’d offer the same examples I used in my post comparing how Oracle and Db2 handle dates to show the Informix differences. I also have a Jupyter Notebook with the code I used here, much of which would execute in any database, in my GitHub repository.

Example #1:

The first example is syntax to find rows from two days in the past or 7 days into the future. The syntax for that in the blog entry about Oracle was strange because I was copying the same way that the example Oracle syntax did it. This time I’ll do it a bit different and the way I really would do it in Db2:

select *
from sales
where ts_sales_date between date('2006-03-31') - 2 days and date('2006-03-31') + 7 days
order by ts_sales_date
with ur

To do something similar in Informix, I would use:

select *
from sales
and ts_sales_date::DATETIME year to day between DATE('03/31/2006') - 2 UNITS DAY and DATE('03/31/2006') + 7 UNITS DAY
order by ts_sales_date

Not too difficult or too different, though the date format it was expecting threw me for a minute or two. I’m just so used to yyyy-mm-dd. As mentioned above, I have learned and actually like the syntax for casting a date to whatever level of precision needed which is of the format:

date_expression::DATETIME large_end to small_end

Where the date_expression is a column or other representation of a timestamp. This can also include math on the timestamp, as is seen in example 2, below.
The large_end is the largest part of the date – in the example above, that’s YEAR.
The small_end is the smallest part of the timestamp – in the example above, the day. The example above returns just the date, but this is supremely useful when you want to group things by hour, as in the example given earlier.

Example #2

The next example is to simply find what the date was 36 hours ago. Here’s how I would do it in Db2:

select date(current timestamp - 36 hours) 
from sysibm.sysdummy1

And in Informix, this works:

select (current - 36 units hour)::DATETIME YEAR to DAY 
from sysmaster:sysdual

This makes just as much sense as the Db2 way of doing it to me.

Example #3

For this example, the goal is to find the day of the week one month ago. Now if you look at the Jupyter Notebook, I ran these in March, making for the most boring possible result – the day of the week is exactly the same one month ago as today. But both methods here do correctly handle a 28-day month. Here’s the Db2 method:

select dayname(current timestamp - 1 month) weekday 
from sysibm.sysdummy1

And here it is in Informix:

select case WEEKDAY(current - 1 units month) 
  when 0 then 'Sunday'
  when 1 then 'Monday'
  when 2 then 'Tuesday'
  when 3 then 'Wednesday'
  when 4 then 'Thursday'
  when 5 then 'Friday'
  when 6 then 'Saturday' end as weekday
from sysmaster:sysdual 

Notice that I had to resort to a case statement here. I could not find a way to return the day name like I can in Db2. I could only find a way to get the number that corresponds to the day of the week. I suspect Informix’s way may actually be closer to the SQL standard. I’m also pretty sure I could also do it this way in Db2.

This doesn’t entirely mean a way to return the day name doesn’t exist in Informix. I probably only spent 30 minutes looking. I would love a comment below if anyone knows a way to do it in Informix.

Example #4

The final example here is finding the exact time 450 seconds from now. Here’s how to do it in Db2:

select current timestamp + 450 seconds 
from sysibm.sysdummy1

And in Informix, this works:

select current + 450 units second 
from sysmaster:sysdual

The Db2 one reads a bit more like natural language to me, but I’m not sure if that is a good or a bad thing.

Summary

Overall, there are clearly differences in the SQL you must write when working with Informix. This whole experience has lead me to have a much deeper and genuine empathy for developers. As a database administrator, I often wonder at how some developers get so far in their career knowing so little SQL. But it is difficult when fairly basic things like handling dates just don’t have a common syntax among the various platforms – even RDBMSes from the same vendor!

Lead Db2 Database Engineer and Service Delivery Manager , XTIVIA
Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 17 years developing a top-level expertise on Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set to the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares both posts about her core skill set and her journey into Data Science. Ember lives in Denver and work from home for XTIVIA, leading a team of Db2 DBAs.

One comment

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.