There are several scenarios with DB2 where we need to know what the “worst-case” row length is. That is, if all varying fields are completely full and any LOBs are maximally inlined, how much space could a single row actually take up? SYSCAT.TABLES gives us the average row size, which is useful for some use cases, but not all of them.
Using a Stored Procedure
My top resouce for this is Serge Rileau’s Blog article with a great stored procedure definition. This stored procedure works great. The problem comes when I’m working in a client’s environment where they don’t want me to create any objects of any kind, much less stored procedures. To deal with this frequent consultant’s conundrum, I’ve had a goal for a while to work this into an SQL I can run without creating any objects. I sat down to do it several months ago, but didn’t have a solid couple of hours to dedicate to it, so came away feeling frustraed. However, I recently had time to sit down and work it out, and it wasn’t as complicated as I thought it would be.
My SQL to Calculate Row Sizes
My goal is to get a query that will give me the table name and row size for every table in the database. Using a couple of common table expressions, I’ve developed the following SQL. It is very heavily based on Serge Rileau’s stored procedure, and I’ve successfully compared results on a couple of systems:
with compr as ( select TABSCHEMA , TABNAME , CASE WHEN compression in ('B', 'V') THEN 2 ELSE 0 END as compression_mod FROM SYSCAT.TABLES ), col as ( SELECT TABSCHEMA , TABNAME , COLNAME ,COALESCE(D.SOURCENAME, C.TYPENAME) AS TYPENAME ,COALESCE(D.LENGTH, C.LENGTH) AS LENGTH ,C.SCALE, C.NULLS, C.INLINE_LENGTH, D.METATYPE ,D.INLINE_LENGTH AS STRUCT_INLINE_LENGTH , CASE WHEN C.inline_length <> 0 THEN C.inline_length WHEN metatype = 'R' THEN D.inline_length WHEN COALESCE(D.SOURCENAME, C.TYPENAME) IN ('CLOB', 'BLOB', 'DBCLOB') THEN CASE WHEN COALESCE(D.LENGTH, C.LENGTH) <= 1024 THEN 68 WHEN COALESCE(D.LENGTH, C.LENGTH) <= 8192 THEN 92 WHEN COALESCE(D.LENGTH, C.LENGTH) <= 65536 THEN 116 WHEN COALESCE(D.LENGTH, C.LENGTH) <= 524000 THEN 140 WHEN COALESCE(D.LENGTH, C.LENGTH) <= 4190000 THEN 164 WHEN COALESCE(D.LENGTH, C.LENGTH) <= 134000000 THEN 196 WHEN COALESCE(D.LENGTH, C.LENGTH) <= 536000000 THEN 220 WHEN COALESCE(D.LENGTH, C.LENGTH) <= 1070000000 THEN 252 WHEN COALESCE(D.LENGTH, C.LENGTH) <= 1470000000 THEN 276 WHEN COALESCE(D.LENGTH, C.LENGTH) <= 2147483647 THEN 312 ELSE -2 END WHEN COALESCE(D.SOURCENAME, C.TYPENAME) IN ('LONG VARCHAR', 'LONG VARGRAPHIC') THEN 20 WHEN COALESCE(D.SOURCENAME, C.TYPENAME) = 'XML' THEN 80 ELSE 0 END as LOBLENGTH FROM SYSCAT.COLUMNS AS C LEFT OUTER JOIN SYSCAT.DATATYPES AS D ON D.typeschema = C.typeschema AND D.typename = C.typename AND D.typemodulename IS NULL AND C.typeschema <> 'SYSIBM ' ), tot as (select col.TABSCHEMA as tabschema , col.TABNAME as tabname ,SUM(CASE TYPENAME WHEN 'SMALLINT' THEN length + compression_mod WHEN 'INTEGER' THEN length + compression_mod WHEN 'BIGINT' THEN length + compression_mod WHEN 'REAL' THEN length + compression_mod WHEN 'DOUBLE' THEN length + compression_mod WHEN 'DECFLOAT' THEN length + compression_mod WHEN 'DECIMAL' THEN TRUNC(length / 2) + 1 + compression_mod WHEN 'CHARACTER' THEN length + compression_mod WHEN 'VARCHAR' THEN length + 4 - compression_mod WHEN 'GRAPHIC' THEN length * 2 + compression_mod WHEN 'VARGRAPHIC' THEN length * 2 + 4 - compression_mod WHEN 'LONG VARCHAR' THEN 24 - compression_mod WHEN 'LONG VARGRAPHIC' THEN 24 - compression_mod WHEN 'CLOB' THEN loblength + 4 - compression_mod WHEN 'BLOB' THEN loblength + 4 - compression_mod WHEN 'DBCLOB' THEN loblength + 4 - compression_mod WHEN 'XML' THEN loblength + 3 - compression_mod WHEN 'DATE' THEN length + compression_mod WHEN 'TIME' THEN length + compression_mod WHEN 'TIMESTAMP' THEN length + compression_mod ELSE CASE WHEN metatype = 'R' THEN loblength + 4 - compression_mod ELSE -3 END END + CASE WHEN compression_mod = 0 AND NULLS = 'Y' THEN 1 ELSE 0 END) as row_size FROM compr join col on compr.tabname=col.tabname and compr.tabschema=col.tabschema GROUP BY col.tabschema, col.tabname) select substr(tot.tabschema,1,24) as tabschema , substr(tot.tabname,1,40) as tabname -- comment the above two rows and uncomment the below two rows for full, non-truncated table names -- tot.tabschema -- , tot.tabname , row_size + CASE WHEN compr.compression_mod <> 0 THEN 2 ELSE 0 END -- uncomment the following two rows if you have created the stored procedure and want to compare results -- , GetRowSize(tot.tabschema, tot.tabname) as sp_row_size -- , GetRowSize(tot.tabschema, tot.tabname) - row_size + CASE WHEN compr.compression_mod <> 0 THEN 2 ELSE 0 END as diff FROM tot join compr on tot.tabschema=compr.tabschema and tot.tabname=compr.tabname ;
Is this long and complicated? Yes, it is. The quick cheat I’ve run into others using is simply summing up length in syscat.columns for a table. That works as an approximation in some situations, but blows up when you hit XML, LOBS, and some other issues. I just can’t rely on a cheat like that on an ongoing basis.
The above is also less efficient if you were to run it for a single table than the stored procedure. Since my goal is a list of tables with their row sizes, this is not an issue for me. I would work more on optimization if I were running it for only a single table at a time and optimization mattered. Optimization for this will also matter more for environments with an excessively large number of tables, such as some SAP or PeopleSoft databases.
I’m also not sure how to work with this in an extended row size environment – I avoid extended row size for its performance implications.