I have been working with Informix from a user perspective recently. I learned about some of the SQL differences and developed a greater empathy for developers in the process.
CTE Support
After repeatedly receiving errors on a statement I was trying to execute – both from Jupyter Notebook and from the command line using dbaccess, I was absolutely flabbergasted to learn that INFORMIX DOES NOT SUPPORT CTEs. Now that’s true as of the writing of this blog entry and some insider sources tell me that support may be added in a newer version of Informix in the very near future. Before you take my word for it, especially if you’re reading this more than two weeks after I published it, please verify this is true for your version.
CTEs are one of my pet SQL features. They’ve been in the SQL standard since ANSI SQL 99, and in most RDBMSes for at least 10 years. Even to me, that seems like forever. You’ll find them in some of the SQL in my DB2 Administrative SQL Cookbook Queries. A CTE is a key component of my favorite query for finding problem SQL in the package cache. This is a query I use a minimum of once a week, and sometimes it seems more like ten times a day.
To me, CTEs offer a great way of calculating numbers across a range of data and then comparing the numbers for individual objects to the aggregate values. They are a key part of queries involving recursion, such as the one Ian Bjorhovde wrote on reporting transaction log files archived by hour. They’re just a dang handy way of calculating something you’ll refer to in a query more than once.
In the query I was trying to use in this case, I was using the LAG function to calculate differences in values from one row to the next, so the CTE was critical. With a bit of research, though, I found I could achieve the same effect by using a subselect in my where clause. That means instead of a query like this:
WITH SUM_TAB (SUM_RR, SUM_CPU, SUM_EXEC, SUM_SORT, SUM_NUM_EXEC) AS ( SELECT FLOAT(SUM(ROWS_READ)), FLOAT(SUM(TOTAL_CPU_TIME)), FLOAT(SUM(STMT_EXEC_TIME)), FLOAT(SUM(TOTAL_SECTION_SORT_TIME)), FLOAT(SUM(NUM_EXECUTIONS)) FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T ) SELECT SUBSTR(STMT_TEXT,1,10) as STATEMENT, ROWS_READ, DECIMAL(100*(FLOAT(ROWS_READ)/SUM_TAB.SUM_RR),5,2) AS PCT_TOT_RR, TOTAL_CPU_TIME, DECIMAL(100*(FLOAT(TOTAL_CPU_TIME)/SUM_TAB.SUM_CPU),5,2) AS PCT_TOT_CPU, STMT_EXEC_TIME, DECIMAL(100*(FLOAT(STMT_EXEC_TIME)/SUM_TAB.SUM_EXEC),5,2) AS PCT_TOT_EXEC, TOTAL_SECTION_SORT_TIME, DECIMAL(100*(FLOAT(TOTAL_SECTION_SORT_TIME)/SUM_TAB.SUM_SORT),5,2) AS PCT_TOT_SRT, NUM_EXECUTIONS, DECIMAL(100*(FLOAT(NUM_EXECUTIONS)/SUM_TAB.SUM_NUM_EXEC),5,2) AS PCT_TOT_EXEC, DECIMAL(FLOAT(STMT_EXEC_TIME)/FLOAT(NUM_EXECUTIONS),10,2) AS AVG_EXEC_TIME FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T, SUM_TAB WHERE DECIMAL(100*(FLOAT(ROWS_READ)/SUM_TAB.SUM_RR),5,2) > 10 OR DECIMAL(100*(FLOAT(TOTAL_CPU_TIME)/SUM_TAB.SUM_CPU),5,2) >10 OR DECIMAL(100*(FLOAT(STMT_EXEC_TIME)/SUM_TAB.SUM_EXEC),5,2) >10 OR DECIMAL(100*(FLOAT(TOTAL_SECTION_SORT_TIME)/SUM_TAB.SUM_SORT),5,2) >10 OR DECIMAL(100*(FLOAT(NUM_EXECUTIONS)/SUM_TAB.SUM_NUM_EXEC),5,2) >10 ORDER BY ROWS_READ DESC FETCH FIRST 20 ROWS ONLY WITH UR;
I could do this:
SELECT SUBSTR(STMT_TEXT,1,10) as STATEMENT, ROWS_READ, DECIMAL(100*(FLOAT(ROWS_READ)/SUM_TAB.SUM_RR),5,2) AS PCT_TOT_RR, TOTAL_CPU_TIME, DECIMAL(100*(FLOAT(TOTAL_CPU_TIME)/SUM_TAB.SUM_CPU),5,2) AS PCT_TOT_CPU, STMT_EXEC_TIME, DECIMAL(100*(FLOAT(STMT_EXEC_TIME)/SUM_TAB.SUM_EXEC),5,2) AS PCT_TOT_EXEC, TOTAL_SECTION_SORT_TIME, DECIMAL(100*(FLOAT(TOTAL_SECTION_SORT_TIME)/SUM_TAB.SUM_SORT),5,2) AS PCT_TOT_SRT, NUM_EXECUTIONS, DECIMAL(100*(FLOAT(NUM_EXECUTIONS)/SUM_TAB.SUM_NUM_EXEC),5,2) AS PCT_TOT_EXEC, DECIMAL(FLOAT(STMT_EXEC_TIME)/FLOAT(NUM_EXECUTIONS),10,2) AS AVG_EXEC_TIME FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T , ( SELECT FLOAT(SUM(ROWS_READ)), FLOAT(SUM(TOTAL_CPU_TIME)), FLOAT(SUM(STMT_EXEC_TIME)), FLOAT(SUM(TOTAL_SECTION_SORT_TIME)), FLOAT(SUM(NUM_EXECUTIONS)) FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) AS T) AS SUM_TAB WHERE DECIMAL(100*(FLOAT(ROWS_READ)/SUM_TAB.SUM_RR),5,2) > 10 OR DECIMAL(100*(FLOAT(TOTAL_CPU_TIME)/SUM_TAB.SUM_CPU),5,2) >10 OR DECIMAL(100*(FLOAT(STMT_EXEC_TIME)/SUM_TAB.SUM_EXEC),5,2) >10 OR DECIMAL(100*(FLOAT(TOTAL_SECTION_SORT_TIME)/SUM_TAB.SUM_SORT),5,2) >10 OR DECIMAL(100*(FLOAT(NUM_EXECUTIONS)/SUM_TAB.SUM_NUM_EXEC),5,2) >10 ORDER BY ROWS_READ DESC FETCH FIRST 20 ROWS ONLY WITH UR;
The logic of the query really doesn’t have to change here, just where you define the “temporary table” your query is referencing. It turned out this wasn’t a big deal, but just meant a bit of extra work.
Part of the learning experience for me here was that just the week before I had helped analyze a query a developer wrote that used the latter logic, and I wondered at the time why they didn’t just use CTEs. It wasn’t a silly way for the developer to do things, just a different way, and maybe even one that’s more likely to work across RDBMSes.
Time format
This is something I was already aware was different between the various RDBMS dialects of SQL. This one was not a surprise to me. I noted the difference in date handling 8 years ago when I worked my way through the lowest level of Oracle certifications (OCA). However, the Informix way of working with dates doesn’t really resemble either one. My first lesson here was how to control the portion of the date that was returned, since I wanted to average about 6 values per hour into one value. This is the syntax I ended up with to do that:
select timestamp::DATETIME YEAR TO HOUR AS hour , 100 - avg(cpu_id) as cpu_busy from os_vmstat where mail_id='cli_dm_db1_prod1' group by 1 order by 1
I have to admit, I kind of like the Informix syntax for casting dates. I find it more intuitive than the Db2 way of doing the same things after only a couple of days working with it. I wish Db2 would enable this syntax as an option at the very least. It took me a couple of tries to get used to it, but specifying the full range of what you want – YEAR TO HOUR
makes a lot of sense to me. The double colon for casting also just seems intuitive to me after very little time working with it.
I thought I’d offer the same examples I used in my post comparing how Oracle and Db2 handle dates to show the Informix differences. I also have a Jupyter Notebook with the code I used here, much of which would execute in any database, in my GitHub repository.
Example #1:
The first example is syntax to find rows from two days in the past or 7 days into the future. The syntax for that in the blog entry about Oracle was strange because I was copying the same way that the example Oracle syntax did it. This time I’ll do it a bit different and the way I really would do it in Db2:
select * from sales where ts_sales_date between date('2006-03-31') - 2 days and date('2006-03-31') + 7 days order by ts_sales_date with ur
To do something similar in Informix, I would use:
select * from sales and ts_sales_date::DATETIME year to day between DATE('03/31/2006') - 2 UNITS DAY and DATE('03/31/2006') + 7 UNITS DAY order by ts_sales_date
Not too difficult or too different, though the date format it was expecting threw me for a minute or two. I’m just so used to yyyy-mm-dd. As mentioned above, I have learned and actually like the syntax for casting a date to whatever level of precision needed which is of the format:
date_expression::DATETIME large_end to small_end
Where the date_expression
is a column or other representation of a timestamp. This can also include math on the timestamp, as is seen in example 2, below.
The large_end
is the largest part of the date – in the example above, that’s YEAR.
The small_end
is the smallest part of the timestamp – in the example above, the day. The example above returns just the date, but this is supremely useful when you want to group things by hour, as in the example given earlier.
Example #2
The next example is to simply find what the date was 36 hours ago. Here’s how I would do it in Db2:
select date(current timestamp - 36 hours) from sysibm.sysdummy1
And in Informix, this works:
select (current - 36 units hour)::DATETIME YEAR to DAY from sysmaster:sysdual
This makes just as much sense as the Db2 way of doing it to me.
Example #3
For this example, the goal is to find the day of the week one month ago. Now if you look at the Jupyter Notebook, I ran these in March, making for the most boring possible result – the day of the week is exactly the same one month ago as today. But both methods here do correctly handle a 28-day month. Here’s the Db2 method:
select dayname(current timestamp - 1 month) weekday from sysibm.sysdummy1
And here it is in Informix:
select case WEEKDAY(current - 1 units month) when 0 then 'Sunday' when 1 then 'Monday' when 2 then 'Tuesday' when 3 then 'Wednesday' when 4 then 'Thursday' when 5 then 'Friday' when 6 then 'Saturday' end as weekday from sysmaster:sysdual
Notice that I had to resort to a case statement here. I could not find a way to return the day name like I can in Db2. I could only find a way to get the number that corresponds to the day of the week. I suspect Informix’s way may actually be closer to the SQL standard. I’m also pretty sure I could also do it this way in Db2.
This doesn’t entirely mean a way to return the day name doesn’t exist in Informix. I probably only spent 30 minutes looking. I would love a comment below if anyone knows a way to do it in Informix.
Example #4
The final example here is finding the exact time 450 seconds from now. Here’s how to do it in Db2:
select current timestamp + 450 seconds from sysibm.sysdummy1
And in Informix, this works:
select current + 450 units second from sysmaster:sysdual
The Db2 one reads a bit more like natural language to me, but I’m not sure if that is a good or a bad thing.
Summary
Overall, there are clearly differences in the SQL you must write when working with Informix. This whole experience has lead me to have a much deeper and genuine empathy for developers. As a database administrator, I often wonder at how some developers get so far in their career knowing so little SQL. But it is difficult when fairly basic things like handling dates just don’t have a common syntax among the various platforms – even RDBMSes from the same vendor!
Informix 14.10 was released roughly a week ago and it supports CTEs.
Regards.