Using LISTAGG to Turn Rows of Data into a Comma Separated List

Posted by

This is a topic I have to look up from time to time, and I always go straight to Serge Rielau’s blog. He has a number of good articles on LISTAGG, the similar XML specific functions, and much more.

While he’s already covered the topic in great detail, I thought I’d add my own way of looking at it, and a use case or two for DB2 System views.

Why use LISTAGG

When writing SQL, there are often times we need to aggregate values from a columnar format into a single comma-separated list. This used to require recursion, which can be difficult and is certainly intimidating to many DBAs. Normalization often leads to this need to essentially condense things back down. Table A lists a bunch of orders, while table B lists the line items on each order, with a row for each order item – multiple order items per order. We may need a list of item numbers associated with an order.

LISTAGG is an aggregate function – meaning it is most often used with a group by to indicate within which groups a particular column should have the aggregation done. If not used with group by, it is used on the only column returned in the select or subselect statement.

LISTAGG was introduced in DB2 9.7 FixPack 4, so has been around for a while.

System Table Examples

My core focus is on database administration, so the vast majority of SQL that I write is related to querying system tables or monitoring tables to get information. In that context, I use LISTAGG – see “Using an Event Monitor to Capture Statements with Values for Parameter Markers for one example.

One of the places I like to play is coming up with queries that give me a little bit better information than what DB2 provides out of the box. Here is one such query, for the column names associated with the indexes for a table. List indexes show detail wraps in a way that I don’t like now, and the colnames column in syscat.indexes has absolutely no indication that a column in an index is an include column – this query prefixes include columns with a | instead of the + or - used for ASC vs. DESC columns, to make it very clear which columns are include columns.

select
    substr(i.indschema,1,12) as indschema, 
    substr(i.indname,1,25) as indname, 
    cast(listagg((
        case 
        when ic.colorder = 'A' then '+' || ic.colname 
        when ic.colorder = 'D' then '-' || ic.colname 
        when ic.colorder = 'I' then '|' || ic.colname 
        end
    ), '') within group (order by ic.colseq) as varchar(100)) as colnames 
from    syscat.indexes as i 
    join syscat.indexcoluse as ic 
        on i.indschema = ic.indschema 
        and i.indname = ic.indname 
where   tabschema = 'DB2INST1'
    and tabname = 'PROJACT'
group by    i.tabschema, 
        i.tabname, 
        i.indschema, 
        i.indname 
order by    i.tabschema, 
        i.tabname, 
        i.indname;

INDSCHEMA    INDNAME                   COLNAMES                                                                                            
------------ ------------------------- ----------------------------------------------------------------------------------------------------
DB2INST1     IX_PROJACT_INCLUDE        +PROJNO+ACTNO+ACSTDATE|ACSTAFF                                                                      
DB2INST1     PK_PROJACT                +PROJNO+ACTNO+ACSTDATE                                                                              

  2 record(s) selected.

That’s a way to code it using a group by. I can also instead use a correlated subquery and add in some other information. Remember that any join can be rewritten as a correlated subquery and as a non-correlated subquery.

select
    substr(i.indschema,1,12) as indschema, 
    substr(i.indname,1,25) as indname, 
    uniquerule,
    (select cast(listagg((
        case 
        when ic.colorder = 'A' then '+' || ic.colname 
        when ic.colorder = 'D' then '-' || ic.colname 
        when ic.colorder = 'I' then '|' || ic.colname 
        end
        ), '') within group (order by ic.colseq) as varchar(100)) 
        from syscat.indexcoluse ic 
        where ic.indschema = i.indschema and ic.indname = i.indname
    ) as colnames 
from    syscat.indexes as i 
where   tabschema = 'DB2INST1'
    and tabname = 'PROJACT'
order by    i.tabschema, 
        i.tabname, 
        i.indname;

INDSCHEMA    INDNAME                   UNIQUERULE COLNAMES                                                                                            
------------ ------------------------- ---------- ----------------------------------------------------------------------------------------------------
DB2INST1     IX_PROJACT_INCLUDE        U          +PROJNO+ACTNO+ACSTDATE|ACSTAFF                                                                      
DB2INST1     PK_PROJACT                P          +PROJNO+ACTNO+ACSTDATE                                                                              

  2 record(s) selected.

Note that in both of the above, I used a cast function to make my output look a bit prettier – for VARCHAR, listagg defaults to VARCHAR 4000, which gives a lot of blank space for smaller fields. This can result in warnings (SQL0445W) and truncated data in some cases.

Another simple use case may come in when you need to get a list of all of the columns in a table. Maybe you want to select everything except one, and typing all 80 columns is just too much. For example:

select cast(listagg(colname, ', ') as varchar(100)) as columns 
from syscat.columns 
where   tabschema='DB2INST1' 
        and tabname='STAFF' 
with ur;

COLUMNS                                                                                             
----------------------------------------------------------------------------------------------------
COMM, DEPT, ID, JOB, NAME, SALARY, YEARS                                                            

  1 record(s) selected.

Now obviously on a simple sample database table like this, that’s probably not worth the effort, but if there are 80 columns and I want to select 78 of them, this can be a nice shortcut.

Performance

There are group bys and order bys here, so performance may not be ideal. I wonder if it would be the worse or same writing the same SQL with recursion. Explaining the second query above doesn’t look terrible, though in these examples, we’re not too worried about performance.

                Rows
               RETURN
               (   1)
                Cost
                 I/O
                 |
               142.667
               FILTER
               (   2)
               166.635
                 38
                 |
                 428
               GRPBY
               (   3)
               166.581
                 38
                 |
                 982
               TBSCAN
               (   4)
               165.658
                 38
                 |
                 982
               SORT
               (   5)
               165.623
                 38
                 |
                 982
               ^HSJOIN
               (   6)
               165.016
                 38
         /-------+-------\
       982                 428
     TBSCAN              FETCH
     (   7)              (   8)
     61.5068             103.399
        9                  29
       |               /---+----\
       982           428          428
 TABLE: SYSIBM     IXSCAN   TABLE: SYSIBM
 SYSINDEXCOLUSE    (   9)     SYSINDEXES
       Q1         0.110039        Q2
                      0
                     |
                     428
               INDEX: SYSIBM
                INDINDEXES03
                     Q2

References

Serge Rielau’s blog
IBM DB2 Knowledge Center page on LISTAGG: http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0058709.html?cp=SSEPGG_10.5.0/2-12-4-0-7&lang=en

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, and is now pivoting to focus on learning MySQL. Ember shares both posts about her core skill set and her journey learning MySQL. Ember lives in Denver and work from home

3 comments

Leave a Reply to Rafael Schimitt Bottega Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.