Calculating Row Size for a Table

Posted by

There are several scenarios with DB2 where we need to know what the “worst-case” row length is. That is, if all varying fields are completely full and any LOBs are maximally inlined, how much space could a single row actually take up? SYSCAT.TABLES gives us the average row size, which is useful for some use cases, but not all of them.

Using a Stored Procedure

My top resouce for this is Serge Rileau’s Blog article with a great stored procedure definition. This stored procedure works great. The problem comes when I’m working in a client’s environment where they don’t want me to create any objects of any kind, much less stored procedures. To deal with this frequent consultant’s conundrum, I’ve had a goal for a while to work this into an SQL I can run without creating any objects. I sat down to do it several months ago, but didn’t have a solid couple of hours to dedicate to it, so came away feeling frustraed. However, I recently had time to sit down and work it out, and it wasn’t as complicated as I thought it would be.

My SQL to Calculate Row Sizes

My goal is to get a query that will give me the table name and row size for every table in the database. Using a couple of common table expressions, I’ve developed the following SQL. It is very heavily based on Serge Rileau’s stored procedure, and I’ve successfully compared results on a couple of systems:

with compr as (
  select 
    TABSCHEMA
    , TABNAME
    , CASE WHEN compression in ('B', 'V') THEN 2 ELSE 0 END as compression_mod
    FROM SYSCAT.TABLES
),
col as (
  SELECT TABSCHEMA
         , TABNAME
         , COLNAME
     ,COALESCE(D.SOURCENAME, C.TYPENAME) AS TYPENAME 
         ,COALESCE(D.LENGTH, C.LENGTH) AS LENGTH
         ,C.SCALE, C.NULLS, C.INLINE_LENGTH, D.METATYPE
         ,D.INLINE_LENGTH AS STRUCT_INLINE_LENGTH
     , CASE WHEN C.inline_length <> 0 
                    THEN C.inline_length
                WHEN metatype = 'R' THEN D.inline_length 
                WHEN COALESCE(D.SOURCENAME, C.TYPENAME) IN ('CLOB', 'BLOB', 'DBCLOB') 
                    THEN CASE WHEN COALESCE(D.LENGTH, C.LENGTH) <=       1024 THEN 68
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <=       8192 THEN 92
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <=      65536 THEN 116                                       
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <=     524000 THEN 140
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <=    4190000 THEN 164
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <=  134000000 THEN 196
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <=  536000000 THEN 220
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <= 1070000000 THEN 252
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <= 1470000000 THEN 276
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <= 2147483647 THEN 312
                         ELSE -2 END
                WHEN COALESCE(D.SOURCENAME, C.TYPENAME) IN ('LONG VARCHAR', 'LONG VARGRAPHIC')
                           THEN 20
                WHEN COALESCE(D.SOURCENAME, C.TYPENAME) = 'XML' THEN 80
           ELSE 0 END as LOBLENGTH
    FROM SYSCAT.COLUMNS AS C
         LEFT OUTER JOIN SYSCAT.DATATYPES AS D
              ON D.typeschema = C.typeschema 
              AND D.typename = C.typename
              AND D.typemodulename IS NULL
              AND C.typeschema <> 'SYSIBM  '
),
tot as (select  
    col.TABSCHEMA as tabschema
    , col.TABNAME as tabname
        ,SUM(CASE TYPENAME 
            WHEN 'SMALLINT'        THEN length + compression_mod
            WHEN 'INTEGER'         THEN length + compression_mod
            WHEN 'BIGINT'          THEN length + compression_mod
            WHEN 'REAL'            THEN length + compression_mod
            WHEN 'DOUBLE'          THEN length + compression_mod
            WHEN 'DECFLOAT'        THEN length + compression_mod
            WHEN 'DECIMAL'         THEN TRUNC(length / 2) + 1 + compression_mod
            WHEN 'CHARACTER'       THEN length + compression_mod
            WHEN 'VARCHAR'         THEN length + 4 - compression_mod
            WHEN 'GRAPHIC'         THEN length * 2 + compression_mod
            WHEN 'VARGRAPHIC'      THEN length * 2 + 4 - compression_mod
            WHEN 'LONG VARCHAR'    THEN 24 - compression_mod
            WHEN 'LONG VARGRAPHIC' THEN 24 - compression_mod
            WHEN 'CLOB'            THEN loblength + 4 - compression_mod
            WHEN 'BLOB'            THEN loblength + 4 - compression_mod
            WHEN 'DBCLOB'          THEN loblength + 4 - compression_mod
            WHEN 'XML'             THEN loblength + 3 - compression_mod
            WHEN 'DATE'            THEN length + compression_mod
            WHEN 'TIME'            THEN length + compression_mod
            WHEN 'TIMESTAMP'       THEN length + compression_mod
            ELSE CASE WHEN metatype = 'R' THEN loblength + 4 - compression_mod
                     ELSE -3 END
            END 
          + CASE WHEN compression_mod = 0 AND NULLS = 'Y' THEN 1 ELSE 0 END) as row_size

  FROM compr join col on compr.tabname=col.tabname
        and compr.tabschema=col.tabschema
  GROUP BY col.tabschema, col.tabname)
select 
       substr(tot.tabschema,1,24) as tabschema
       , substr(tot.tabname,1,40) as tabname
-- comment the above two rows and uncomment the below two rows for full, non-truncated table names
--     tot.tabschema
--     , tot.tabname
       , row_size + CASE WHEN compr.compression_mod <> 0 THEN 2 ELSE 0 END 
-- uncomment the following two rows if you have created the stored procedure and want to compare results
--       , GetRowSize(tot.tabschema, tot.tabname) as sp_row_size
--       , GetRowSize(tot.tabschema, tot.tabname) - row_size + CASE WHEN compr.compression_mod <> 0 THEN 2 ELSE 0 END as diff
  FROM tot join compr on tot.tabschema=compr.tabschema and tot.tabname=compr.tabname
;

Is this long and complicated? Yes, it is. The quick cheat I’ve run into others using is simply summing up length in syscat.columns for a table. That works as an approximation in some situations, but blows up when you hit XML, LOBS, and some other issues. I just can’t rely on a cheat like that on an ongoing basis.

The above is also less efficient if you were to run it for a single table than the stored procedure. Since my goal is a list of tables with their row sizes, this is not an issue for me. I would work more on optimization if I were running it for only a single table at a time and optimization mattered. Optimization for this will also matter more for environments with an excessively large number of tables, such as some SAP or PeopleSoft databases.

I’m also not sure how to work with this in an extended row size environment – I avoid extended row size for its performance implications.

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, and is now pivoting to focus on learning MySQL. Ember shares both posts about her core skill set and her journey learning MySQL. Ember lives in Denver and work from home

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.