This is a topic I have to look up from time to time, and I always go straight to Serge Rielau’s blog. He has a number of good articles on LISTAGG, the similar XML specific functions, and much more.
While he’s already covered the topic in great detail, I thought I’d add my own way of looking at it, and a use case or two for DB2 System views.
Why use LISTAGG
When writing SQL, there are often times we need to aggregate values from a columnar format into a single comma-separated list. This used to require recursion, which can be difficult and is certainly intimidating to many DBAs. Normalization often leads to this need to essentially condense things back down. Table A lists a bunch of orders, while table B lists the line items on each order, with a row for each order item – multiple order items per order. We may need a list of item numbers associated with an order.
LISTAGG is an aggregate function – meaning it is most often used with a group by to indicate within which groups a particular column should have the aggregation done. If not used with group by, it is used on the only column returned in the select or subselect statement.
LISTAGG was introduced in DB2 9.7 FixPack 4, so has been around for a while.
System Table Examples
My core focus is on database administration, so the vast majority of SQL that I write is related to querying system tables or monitoring tables to get information. In that context, I use LISTAGG – see “Using an Event Monitor to Capture Statements with Values for Parameter Markers for one example.
One of the places I like to play is coming up with queries that give me a little bit better information than what DB2 provides out of the box. Here is one such query, for the column names associated with the indexes for a table. List indexes show detail wraps in a way that I don’t like now, and the colnames column in syscat.indexes has absolutely no indication that a column in an index is an include column – this query prefixes include columns with a |
instead of the +
or -
used for ASC vs. DESC columns, to make it very clear which columns are include columns.
select substr(i.indschema,1,12) as indschema, substr(i.indname,1,25) as indname, cast(listagg(( case when ic.colorder = 'A' then '+' || ic.colname when ic.colorder = 'D' then '-' || ic.colname when ic.colorder = 'I' then '|' || ic.colname end ), '') within group (order by ic.colseq) as varchar(100)) as colnames from syscat.indexes as i join syscat.indexcoluse as ic on i.indschema = ic.indschema and i.indname = ic.indname where tabschema = 'DB2INST1' and tabname = 'PROJACT' group by i.tabschema, i.tabname, i.indschema, i.indname order by i.tabschema, i.tabname, i.indname; INDSCHEMA INDNAME COLNAMES ------------ ------------------------- ---------------------------------------------------------------------------------------------------- DB2INST1 IX_PROJACT_INCLUDE +PROJNO+ACTNO+ACSTDATE|ACSTAFF DB2INST1 PK_PROJACT +PROJNO+ACTNO+ACSTDATE 2 record(s) selected.
That’s a way to code it using a group by. I can also instead use a correlated subquery and add in some other information. Remember that any join can be rewritten as a correlated subquery and as a non-correlated subquery.
select substr(i.indschema,1,12) as indschema, substr(i.indname,1,25) as indname, uniquerule, (select cast(listagg(( case when ic.colorder = 'A' then '+' || ic.colname when ic.colorder = 'D' then '-' || ic.colname when ic.colorder = 'I' then '|' || ic.colname end ), '') within group (order by ic.colseq) as varchar(100)) from syscat.indexcoluse ic where ic.indschema = i.indschema and ic.indname = i.indname ) as colnames from syscat.indexes as i where tabschema = 'DB2INST1' and tabname = 'PROJACT' order by i.tabschema, i.tabname, i.indname; INDSCHEMA INDNAME UNIQUERULE COLNAMES ------------ ------------------------- ---------- ---------------------------------------------------------------------------------------------------- DB2INST1 IX_PROJACT_INCLUDE U +PROJNO+ACTNO+ACSTDATE|ACSTAFF DB2INST1 PK_PROJACT P +PROJNO+ACTNO+ACSTDATE 2 record(s) selected.
Note that in both of the above, I used a cast function to make my output look a bit prettier – for VARCHAR, listagg defaults to VARCHAR 4000, which gives a lot of blank space for smaller fields. This can result in warnings (SQL0445W
) and truncated data in some cases.
Another simple use case may come in when you need to get a list of all of the columns in a table. Maybe you want to select everything except one, and typing all 80 columns is just too much. For example:
select cast(listagg(colname, ', ') as varchar(100)) as columns from syscat.columns where tabschema='DB2INST1' and tabname='STAFF' with ur; COLUMNS ---------------------------------------------------------------------------------------------------- COMM, DEPT, ID, JOB, NAME, SALARY, YEARS 1 record(s) selected.
Now obviously on a simple sample database table like this, that’s probably not worth the effort, but if there are 80 columns and I want to select 78 of them, this can be a nice shortcut.
Performance
There are group bys and order bys here, so performance may not be ideal. I wonder if it would be the worse or same writing the same SQL with recursion. Explaining the second query above doesn’t look terrible, though in these examples, we’re not too worried about performance.
Rows RETURN ( 1) Cost I/O | 142.667 FILTER ( 2) 166.635 38 | 428 GRPBY ( 3) 166.581 38 | 982 TBSCAN ( 4) 165.658 38 | 982 SORT ( 5) 165.623 38 | 982 ^HSJOIN ( 6) 165.016 38 /-------+-------\ 982 428 TBSCAN FETCH ( 7) ( 8) 61.5068 103.399 9 29 | /---+----\ 982 428 428 TABLE: SYSIBM IXSCAN TABLE: SYSIBM SYSINDEXCOLUSE ( 9) SYSINDEXES Q1 0.110039 Q2 0 | 428 INDEX: SYSIBM INDINDEXES03 Q2
References
Serge Rielau’s blog
IBM DB2 Knowledge Center page on LISTAGG: http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0058709.html?cp=SSEPGG_10.5.0/2-12-4-0-7&lang=en
Thank you, will be very useful.