My earlier post on table reorgs covered Reorg’s place in a database maintenance cycle and running reorgchk. If you haven’t already read it, check it out: How to Tell When a Table Reorg is Needed
INDEXES
After the section on tables, REORGCHK output has a section with details on each index for each table that REORGCHK looked at.
The index section of reorgchk looks something like this:
Index statistics: F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80 F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE)) F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100 F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20 F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20 SCHEMA.NAME INDCARD LEAF ELEAF LVLS NDEL KEYS LEAF_RECSIZE NLEAF_RECSIZE LEAF_PAGE_OVERHEAD NLEAF_PAGE_OVERHEAD PCT_PAGES_SAVED F4 F5 F6 F7 F8 REORG ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Table: WSCOMUSR.ATTR Index: SYSIBM.SQL120212212515070 170 1 0 1 0 170 8 8 566 566 0 98 - - 0 0 ----- Index: WSCOMUSR.I0001203 170 2 0 2 0 170 22 22 516 516 0 62 147 - 0 0 *---- Index: WSCOMUSR.I0001204 170 1 0 1 0 6 4 4 710 710 0 98 - - 0 0 ----- Index: WSCOMUSR.I0001464 170 1 0 1 0 2 2 2 822 822 0 96 - - 0 0 ----- Index: WSCOMUSR.X_ATTR_IX1 170 4 0 2 0 170 66 66 236 236 0 97 110 - 0 0 ----- Index: WSCOMUSR.X_ATTR_IX2 170 2 0 2 0 170 30 30 392 392 0 95 178 - 0 0 ----- Index: WSCOMUSR.X_ATTR_IX3 170 2 0 2 0 170 25 25 442 442 0 62 158 - 0 0 *---- Table: WSCOMUSR.ATTRIBUTE Index: SYSIBM.SQL120212212515260 0 16 16 2 0 0 12 12 476 476 0 100 105 - 0 100 ----* Index: WSCOMUSR.I0000019 0 26 26 2 0 0 16 16 442 442 0 100 105 - 0 100 ----* Index: WSCOMUSR.I0000298 0 5 5 2 0 0 11 11 516 516 0 100 105 - 0 100 ----* Table: WSCOMUSR.BUSEVENT Index: SYSIBM.SQL120212212516210 1.1e+07 48067 1414 4 8e+05 1.1e+07 12 12 948 948 0 68 62 266 7 2 *-*-- Index: WSCOMUSR.BUSEVENT_IX1 1.1e+07 25184 0 3 0 9048618 10 10 1048 1048 0 89 91 1 0 0 ----- Index: WSCOMUSR.I0001104 1.1e+07 10123 0 3 39 1 3 3 1894 1894 0 100 86 4 0 0 ----- Table: WSCOMUSR.DMEXPLOG Index: SYSIBM.SQL120212212521010 34214 4717 4439 4 1406 34214 8 8 566 566 0 93 59 +++ 3 94 ----* Index: WSCOMUSR.DMEXPLOG_IX1 34358 3252 3036 4 25 33629 10 10 516 516 0 100 83 +++ 0 93 ----* Index: WSCOMUSR.I0001198 34292 9708 8957 5 4285 34267 36 36 416 416 0 76 55 +++ 11 92 *---* Index: WSCOMUSR.I0001199 34332 8933 8377 4 163 33960 22 22 354 354 0 90 50 +++ 0 93 -*--*
It's nice to use a wide screen when viewing this output on a server - a 72-character wide window may drive you insane.
Note that the format is very similar to tables. There are numerical details about each index. There are the results of several forumulas - this time F4 through F8, and there is a final column on the right with flags when a reorg MAY be needed.
The output is a bit different in that there are sections for each table that the indexes are grouped in. This always made the output harder for parsing for scripting, though there are better options in more recent versions for scripting. It is not just the line formatting, but in some ways, the decisions to be made are a bit more complicated.
Let's work through the formulas one by one.
F4
Formula F4 is described in the index section header as:
F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80
This formula or flag is looking at how well clustered the table is over a particular index. The values of CLUSTERRATIO or CLUSTERFACTOR are pulled directly from the SYSCAT.INDEXES system catalog view. IF we want the table clustered over this particular index, the way to get rid of this flag is to reorganize the table ON the index. Think about this one. In many cases it is impossible to have a table clustered over all of the indexes at once. We cannot blindly reorg based on this flag. If we go that direction, we keep reorganizing the table over and over again on one index after another.
The important thing with F4 is to decide, on a table by table basis, whether we want the data clustered over an index. If we do, and if there is some reason you cannot or do not want to define that index as a clustering index, then and only then you should reorg the table on that index if the F4 formula is flagged. Reorging a table on an index can increase your reorg time.
F4 is the only flag in the index section that has the possibility to prompt a table reorg. The other formulas will indicate various kinds of index reorgs.
Index Reorgs
Unless you are using table partitioning, you may only reorg all indexes for a table. You may not reorg a specific index for a table without reorging the remaining indexes. There are multiple types of index reorgs that can be done. Index reorgs are kind of online (see the end of this article for how online), and do use resources. If you are lucky enough to do offline, classic table reorgs, indexes are rebuilt as a part of them. They are not rebuilt as a part of online table reorgs. If you are only doing online table reorgs, you will want to make sure you're also doing the appropriate index reorgs. I will talk about each of the remaining formulas and then revisit what kinds of reorgs are suggested by each flag.
F5
Formula F5 is described in the index section header as:
F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE))
This formula is trying to calculate how much of the space allocated to non-empty leaf pages is used, and checking to see if that is greater than either 50% or the space we expect to be used on pages given the value of PCTFREE for this index. If you need a reminder on what leaf pages are, see DB2 Basics: What is an index?
PCTFREE is a value that is set for each index and is respected in certain situations like reorg and load. It tells DB2 to leave a certain portion of each index leaf page open so that index keys can be inserted in proper place in the index without having to allocate new pages. It is set to 10% by default.
If we have a high percentage of space open on our non-empty leaf pages, it may indicate that our indexes can benefit from some consolidation and cleanup.
The following should work to calculate F5, if nleaf is greater than one and ((nleaf-num_empty_leafs-1) * (PAGESIZE-96))
does not return zero. Remember that any calculations provided are for reference and understanding only, and they are not guaranteed to be correct. The IBM Knowledge Center page on REORGCHK describes the calculations in detail, but does not provide SQL for performing them.
select (100 * ((fullkeycard * ((select SUM(CASE WHEN A.typename = 'VARCHAR' THEN A.AVGCOLLEN - 2 ELSE A.AVGCOLLEN END + CASE WHEN NULLS = 'Y' THEN 0 ELSE 0 END ) + 0 FROM SYSCAT.COLUMNS A, syscat.indexcoluse B WHERE A.colname=B.colname and B.indschema=i.indschema and B.indname=i.indname and A.tabschema=i.tabschema and A.tabname=i.tabname)+9)) + (card-fullkeycard) * 5) / ((nleaf-num_empty_leafs-1) * (PAGESIZE-96))) from syscat.indexes i join syscat.tables t on i.tabschema=t.tabschema and i.tabname=t.tabname join syscat.tablespaces ts on i.tbspaceid = ts.TBSPACEID where i.indschema = 'WSCOMUSR' and i.indname='IPF00008' with ur
F6
Formula F6 is described in the index section header as:
F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100
What this formula does is checks to see if an index with one less level would have enough space for all data, respecting PCTFREE. Usually that's a significant change, so this formula I don't find flagged as often as some of the others. It's another that is hard to get a query working to calculate, so I don't have that available. Check out the IBM Knowledge Center page on REORGCHK, and if you have SQL that works for it, post in the comments below so that others can enjoy your SQL prowess!
F7
Formula F7 is described in the index section header as:
100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20
With type 2 indexes, index keys are essentially marked as deleted rather than actually deleted. These are called pseudo-deleted RIDs. F7 checks to make sure that less than 20% of the RIDS in the table are marked for deletion.
Here is the query for calculating this one - it's pretty easy:
select substr(indname,1,30) as indname, decimal(100*(FLOAT(NUMRIDS_DELETED)/FLOAT(NUMRIDS_DELETED+INDCARD)),10,5) as F7 from syscat.indexes where tabschema='WSCOMUSR' and tabname='ATTRVALDESC' with ur INDNAME F7 ------------------------------ ------------ SQL120212212515340 0.06914 I_ATTRVALDESC01 0.06255 X_ATTRVALDESC_IX1 0.18416 I0001468 0.03293 I0001470 0.03293 5 record(s) selected.
This one is likely to be flagged if a lot of data is deleted from the table.
F8
Formula F8 is described in the index section header as:
F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20
This one is looking for entire pages where all of the RIDs have been marked as deleted. It checks to make sure that no more than 20% of the pages in the index are fully marked as deleted. Again, writing the SQL to calculate this yourself is not hard:
select substr(indname,1,30) as indname, decimal(100*(FLOAT(NUM_EMPTY_LEAFS)/FLOAT(NLEAF)),10,5) from syscat.indexes where tabschema='WSCOMUSR' and tabname='CATENTDESC' with ur INDNAME F8 ------------------------------ ------------ SQL120212212517170 0.00000 I0000304 0.00000 I0001196 7.88177 I0001248 0.00000 IPF00002 9.30232 5 record(s) selected.
Index Reorg Decisions
The cheat of just looking for lines in reorgchk with a *
simply does not work for indexes. There are too many complicated decisions to be made. Luckily for scripting, we have similar constructs as we did for table reorgs.
Start with REORGCHK_IX_STATS like this:
$ db2 "call REORGCHK_IX_STATS('T','SYSIBM.SYSCOLUMNS')" Result set 1 -------------- TABLE_SCHEMA TABLE_NAME INDEX_SCHEMA INDEX_NAME DATAPARTITIONNAME INDCARD NLEAF NUM_EMPTY_LEAFS NLEVELS NUMRIDS_DELETED FULLKEYCARD LEAF_RECSIZE NONLEAF_RECSIZE LEAF_PAGE_OVERHEAD NONLEAF_PAGE_OVERHEAD PCT_PAGES_SAVED F4 F5 F6 F7 F8 REORG -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------- -------------------- -------------------- ----------- -------------------- -------------------- -------------------- -------------------- -------------------- --------------------- --------------- ----------- ----------- ----------- ----------- ----------- ----- SYSIBM SYSCOLUMNS SYSIBM INDCOLUMNS01 18841 289 0 3 0 18841 40 40 710 710 0 95 90 19 0 0 ----- SYSIBM SYSCOLUMNS SYSIBM INDCOLUMNS02 18841 32 0 2 0 14 18 18 822 822 0 79 93 -1 0 0 *---- SYSIBM SYSCOLUMNS SYSIBM INDCOLUMNS03 18841 32 0 2 0 1 4 4 822 822 0 100 92 -1 0 0 ----- 3 record(s) selected. Return Status = 0
Again this first part returns good data, but not in an easily scriptable format. But the real power is in the population of the SESSION.IX_STATS table, which can be queried for each element like this:
select substr(table_schema,1,18) as tabschema, substr(table_name,1,30) as tabname, substr(index_schema,1,18) as indschema, substr(index_name,1,30) as indname, f4, f5, f6, f7, f8, reorg from SESSION.IX_STATS; TABSCHEMA TABNAME INDSCHEMA INDNAME F4 F5 F6 F7 F8 REORG ------------------ ------------------------------ ------------------ ------------------------------ ----------- ----------- ----------- ----------- ----------- ----- SYSIBM SYSCOLUMNS SYSIBM INDCOLUMNS01 95 90 19 0 0 ----- SYSIBM SYSCOLUMNS SYSIBM INDCOLUMNS02 79 93 -1 0 0 *---- SYSIBM SYSCOLUMNS SYSIBM INDCOLUMNS03 100 92 -1 0 0 ----- 3 record(s) selected.
Now you should have the data you need in the format you need it to make decisions. These recommendations come straight from the IBM Knowledge Center page on REORGCHK - a very useful page.
- If the results of the calculations for Formula 1, 2 and 3 do not exceed the bounds set by the formula and the results of the calculations for Formula 4, 5 or 6 do exceed the bounds set, then index reorganization is recommended.
- If only the results of the calculations Formula 7 exceed the bounds set, but the results of Formula 1, 2, 3, 4, 5 and 6 are within the set bounds, then cleanup of the indexes using the CLEANUP option of index reorganization is recommended.
- If the only calculation result to exceed the set bounds is the that of Formula 8, then a cleanup of the pseudo empty pages of the indexes using the CLEANUP PAGES option of index reorganization is recommended.
Your reorg command for indexes will always start with REORG INDEXES ALL FOR TABLE
and include the table name and one of the syntax options outlined above. There aren't online or offline choices or other options for indexes. Unless you're using the CLEANUP ONLY
or CLEANUP ONLY PAGES
options, what DB2 is doing is building new indexes side-by-side with the old it indexes. It will then acquire a Z lock on the whole table to make the switch. The switch may be fairly quick. But that's also a super-exclusive lock, which blocks all table access, including shared and read-only locks. I believe this even includes application using the UR or Uncommitted Read isolation level.
Share your own thoughts and experiences with index reorgs!
[…] How to Tell When an Index Reorg is Needed […]
This is another wonderful write up by you. Thanks for sharing. Is there any way to find directly the size of an index ?
In my mind, the easiest, quickest way is still to use reorgchk or one of the related functions. Using DMS or AST tablespaces, you cannot directly look at a file on disk to see the size if that’s what you’re asking.
How can I calculate the nleaf (number of active leaf pages in the index)?
I really need this information to create virtual indexes.
Thank you.
SYSCAT.INDEXES shows nleaf. It is populated when you do runstats on indexes. You can query it using something like this:
db2 “select substr(indschema,1,10) as indschema, substr(indname,1,30) as indname, nleaf from syscat.indexes”
Here is some sample sql which will provide a list of top 10 indexes based on size which are greater than 100 mb –
SELECT Substr(indschema,1,6) AS indschema,
Substr(indname,1,18) AS indname,
indextype,
Substr(tabschema,1,6) AS tabschema,
Substr(tabname,1,18) AS tabname,
avgleafkeysize,
avgnleafkeysize,
indcard,
and int((avgleafkeysize + 11)*indcard*2/(1024*1024)) AS space_mb,
compression,
lastused
FROM syscat.indexes
WHERE tabschema NOT LIKE ‘SYS%’
AND int((avgleafkeysize + 11)*indcard*2/(1024*1024)) > 102400
ORDER BY 9 desc
FETCH first 10 ROWS only WITH UR;
Explanation for formula used is available on webpage below and estimate is good for db2 9.7 as well:
http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0004929.html?lang=en
Hi Nadir,
SQL is not working,please check the syntax.
Is it possible to find which specific indexes are active in partitioned index of partition table?
I don’t know the answer to this one.
Fixed SQL
SELECT Substr(indschema,1,6) AS indschema,
Substr(indname,1,18) AS indname,
indextype,
Substr(tabschema,1,6) AS tabschema,
Substr(tabname,1,18) AS tabname,
avgleafkeysize,
avgnleafkeysize,
indcard,
int((avgleafkeysize + 11)*indcard*2/(1024*1024)) AS space_mb,
compression,
lastused
FROM syscat.indexes
WHERE tabschema NOT LIKE ‘SYS%’
AND int((avgleafkeysize + 11)*indcard*2/(1024*1024)) > 102400
ORDER BY 9 desc;