Like many applications, WebSphere Commerce puts all tables in USERSPACE1
unless they need larger page sizes. This actually works just fine for smaller and midrange implementations, but we have about one build a year that requires something else – either because of standards that client DBAs adhere to or because they actually are busy enough for I/O and separate bufferpools to matter. I recently had the opportunity to set up a framework and plan for this for a larger client, and thought I'd share a few of my methods and thoughts.
Guiding Principles
Embarking on a re-design takes a considerable amount of effort. Because of this, it is important to have a clear idea of why you're making the changes you are. Don't separate things into separate tablespaces just for the fun of it or just because – make informed, data-supported decisions.
The main reasons you may want to separate tables into different tablespaces include:
- A group of tables that can be restored together through tablespace backup and restore: This really isn't realistic in an IBM WebSphere Commerce database, because all of the tables are so interconnected that you'd have a hard time finding any 5 out of the 1,000-plus tables that could be restored without a host of other tables.
- Tables related to specific applications: If you have different applications accessing a database and some of those tables are dedicated to a single application, it may make sense to separate them out.
- Separate I/O paths: If you have the luxury of fully separate I/O paths, especially those with different speed characteristics (such as SSD vs. SATA), it may make sense to separate I/O either for individual tables or individual partitions of tables. You will need dedicated tablespaces, storage groups, and possibly table partitions to make this work.
- Different bufferpool requirements: This is the one that came to my mind as my primary focus, because the storage characteristics are so varied between the various clients that I support. Memory is always at a premium as databases grow, and managing my bufferpools more exactly is something I would love to do.
- Managing LOB data and filesystem requirements: LOBs can be particularly problematic for performance. Tables with LOBs perform better if their tablespaces are enabled for filesystem caching, but tables without LOBs behave better without filesystem caching enabled for their tablespaces.
If splitting tables into multiple tablespaces does not meet one of these goals, then it's important to ask yourself why you're doing it – doing so adds complexity to administration. Also, changes are getting easier and easier to make without an outage (ADMIN_MOVE_TABLE now supports RI), so consider that you may alter where tables live later in the life of your database.
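Since I mentioned ADMIN_MOVE_TABLE, here is a minimal sketch of what such a later move might look like. The table and tablespace names here are hypothetical, and you'll want to verify the RI behavior on your own DB2 version before relying on it:

CALL SYSPROC.ADMIN_MOVE_TABLE(
    'WSCOMUSR',      -- schema of the table to move
    'ORDERS',        -- table to move (example choice)
    'TS_ORD_DAT',    -- target data tablespace (hypothetical)
    'TS_ORD_IX',     -- target index tablespace (hypothetical)
    'TS_ORD_LOB',    -- target long/LOB tablespace (hypothetical)
    '',              -- MDC columns: unchanged
    '',              -- partitioning key: unchanged
    '',              -- data partitions: unchanged
    '',              -- column definitions: unchanged
    '',              -- options: defaults
    'MOVE');         -- run all phases (INIT, COPY, REPLAY, SWAP) in one call

The table remains available for reads and writes during everything but the brief SWAP phase at the end.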
Analyzing Existing Activity
Some of the decisions about what to split out come from a DBA's knowledge of the database, but for others you can get hard numbers, based on querying an existing database. Obviously if you're working with a new database you won't have this advantage, but even for existing databases, this level of design needs to be revisited periodically.
There are a number of different angles from which to look at the data here. The queries below also make use of the method from my blog entry on Detailed Analysis of Individual Performance Metrics.
Note that the queries below are against a running production B2C WebSphere Commerce database, but one that has an unusual pattern for orders, or lack thereof, so may not prove a good model to follow for other WebSphere Commerce databases.
Tables with High Read Activity
There are two ways to look at high read activity – the overall numbers for the most-read tables in your database, and the number of rows read compared to the cardinality of the table.
Overall Highest Rows Read
Here is one of the areas where I use standard deviation to analyze a particular metric. The query I use is this:
with rr_stats as (
    select decimal(avg(rows_read),20,2) as avg_rr,
        decimal(stddev(rows_read),20,2) as stddev_rr,
        decimal(avg(rows_read) - 2 * stddev(rows_read),20,2) as avg_less_2_stddev,
        decimal(avg(rows_read) + 2 * stddev(rows_read),20,2) as avg_plus_2_stddev
    from table (mon_get_table(NULL,NULL,-1)) as t
    where t.tabschema='WSCOMUSR'
)
select substr(tabname,1,30) as tabname,
    rows_read,
    rr_stats.avg_rr,
    case when rows_read > rr_stats.avg_plus_2_stddev then 'High'
         when rows_read < rr_stats.avg_less_2_stddev then 'Low'
         else NULL
    end as flag,
    rr_stats.stddev_rr,
    rr_stats.avg_less_2_stddev,
    rr_stats.avg_plus_2_stddev
from table (mon_get_table(NULL,NULL,-1)) as t1, rr_stats
where t1.tabschema='WSCOMUSR'
order by rows_read desc
with ur;

TABNAME                        ROWS_READ            AVG_RR                 FLAG STDDEV_RR              AVG_LESS_2_STDDEV      AVG_PLUS_2_STDDEV
------------------------------ -------------------- ---------------------- ---- ---------------------- ---------------------- ----------------------
ATTRVALDESC                            324723707514          1027698557.00 High         15841362471.73        -30655026386.46         32710423500.46
CATGPENREL                             319425104912          1027698557.00 High         15841362471.73        -30655026386.46         32710423500.46
CATGROUP                                67354582284          1027698557.00 High         15841362471.73        -30655026386.46         32710423500.46
ORDADJUST                               45602221591          1027698557.00 High         15841362471.73        -30655026386.46         32710423500.46
DMELEMENT                               43291418589          1027698557.00 High         15841362471.73        -30655026386.46         32710423500.46
SRCHTERMASSOC                           28945112773          1027698557.00 -            15841362471.73        -30655026386.46         32710423500.46
ORDERS                                   9847728695          1027698557.00 -            15841362471.73        -30655026386.46         32710423500.46
ATTRDESC                                 3882342048          1027698557.00 -            15841362471.73        -30655026386.46         32710423500.46
XTAHITEM                                 2769959859          1027698557.00 -            15841362471.73        -30655026386.46         32710423500.46
SCHCONFIG                                2404139505          1027698557.00 -            15841362471.73        -30655026386.46         32710423500.46
PRODUCTSET                               2376734932          1027698557.00 -            15841362471.73        -30655026386.46         32710423500.46
STATEPROV                                2365127252          1027698557.00 -            15841362471.73        -30655026386.46         32710423500.46
PX_PROMOTION                             2149025098          1027698557.00 -            15841362471.73        -30655026386.46         32710423500.46
...
This query identifies tables with truly exceptional numbers of rows read, so you can consider whether there are indexing issues or whether the tables might need to be separated due to very high use. The FLAG column marks the statistical outliers. The average, standard deviation, and upper and lower boundary columns are the same for all rows – they are provided to help if you are not yet familiar with this method of identifying outliers. You'll also want to consider whether these tables have a high number of rows read in relation to their cardinality, which is what the next query addresses.
Rows Read as Compared to Cardinality
Finding tables where the rows read is many times the cardinality can help find the code tables – the small tables that are read through many times and that we want to keep in memory as much as possible. This query can help identify them:
select substr(t.tabname,1,30) as tabname,
    rows_read,
    card,
    case when card > 0 then rows_read/card else NULL end as rr_per_card
from table (mon_get_table(NULL, NULL, -2)) as mgt,
    syscat.tables t
where t.tabschema=mgt.tabschema
    and t.tabname=mgt.tabname
    and rows_read > 0
    and rows_read > 10000*card
    and card > 0
    and volatile !='C'
    and t.tabschema='REDACTED'
order by card, rows_read
with ur

TABNAME                        ROWS_READ            CARD                 RR_PER_CARD
------------------------------ -------------------- -------------------- --------------------
GRRGSTRNT                                    743823                    1               743823
TAXCGRY                                       37312                    2                18656
PLCYACCLCK                                    49759                    2                24879
PLCYPASSWD                                    83208                    2                41604
PLCYACCT                                      94085                    2                47042
STORECONF                                  27373893                    2             13686946
ATCHOBJTYP                                  9178023                    4              2294505
DMACTTYPE                                  20402432                    4              5100608
DMELEMENTTYPE                             213803013                    6             35633835
CFCATGROUP                                 10420263                    7              1488609
CALUSAGE                                     340442                    8                42555
DISPCGPREL                                 25019058                   13              1924542
LANGUAGEDS                                 11830018                   14               845001
CFPRODUCTSET                              168646681                   14             12046191
PRODUCTSET                               2375323449                   15            158354896
SHPARRANGE                                   529786                   17                31163
ATCHRLUS                                    5958892                   18               331049
JURST                                      12021918                   18               667884
...
What this query finds is tables where, for every single row in the table, DB2 has read that row at least 10,000 times. It also ignores volatile tables. Obviously, using the cardinality relies on having current statistics. The far right column tells us, on average, how many times the entire table was scanned (though the activity could just be reading one row over and over again). Logically, you'll want to consider what's happening with these tables – it could just be that they need indexes added, or it could be that DB2 is reading them over and over again, and they should be in a tablespace with a dedicated bufferpool so they can remain in memory. A sketch of what that might look like follows.
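If you do decide some of these code tables deserve to stay pinned in memory, the supporting objects might look something like this. This is only a sketch – the bufferpool name, size, and tablespace name are all made up, and you'd size the bufferpool to comfortably hold the full set of code tables:

-- Hypothetical bufferpool sized to hold the hot code tables entirely in memory
-- (SIZE is in pages; 25600 4K pages is 100 MB)
CREATE BUFFERPOOL BP_CODE_4K IMMEDIATE SIZE 25600 PAGESIZE 4K;

-- Hypothetical tablespace for code tables, pinned to that bufferpool
CREATE LARGE TABLESPACE TS_CODE_4K
    PAGESIZE 4K
    MANAGED BY AUTOMATIC STORAGE
    BUFFERPOOL BP_CODE_4K
    NO FILE SYSTEM CACHING;

Tables can then be relocated into TS_CODE_4K with ADMIN_MOVE_TABLE as sketched earlier.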
Tables with High Update/Insert/Delete Activity
Update
For updates, this query works similarly to the first rows_read one:
with ru_stats as (
    select decimal(avg(rows_updated),20,2) as avg_ru,
        decimal(stddev(rows_updated),20,2) as stddev_ru,
        decimal(avg(rows_updated) - 2 * stddev(rows_updated),20,2) as avg_less_2_stddev,
        decimal(avg(rows_updated) + 2 * stddev(rows_updated),20,2) as avg_plus_2_stddev
    from table (mon_get_table(NULL,NULL,-1)) as t
    where t.tabschema='REDACTED'
)
select substr(tabname,1,30) as tabname,
    rows_updated,
    ru_stats.avg_ru,
    case when rows_updated > ru_stats.avg_plus_2_stddev then 'High'
         when rows_updated < ru_stats.avg_less_2_stddev then 'Low'
         else NULL
    end as flag,
    ru_stats.stddev_ru,
    ru_stats.avg_less_2_stddev,
    ru_stats.avg_plus_2_stddev
from table (mon_get_table(NULL,NULL,-1)) as t1, ru_stats
where t1.tabschema='REDACTED'
    and rows_updated > 0
order by rows_updated desc
with ur;

TABNAME                        ROWS_UPDATED         AVG_RU                 FLAG STDDEV_RU              AVG_LESS_2_STDDEV      AVG_PLUS_2_STDDEV
------------------------------ -------------------- ---------------------- ---- ---------------------- ---------------------- ----------------------
DMELESTATS                                 31812700               72611.00 High             1156856.82            -2241102.64             2386324.64
DMEMSPOTSTATS                              10227837               72611.00 High             1156856.82            -2241102.64             2386324.64
CATENTRYATTR                                3981154               72611.00 High             1156856.82            -2241102.64             2386324.64
ORDERS                                      3578831               72611.00 High             1156856.82            -2241102.64             2386324.64
CATGPENREL                                  2140700               72611.00 -                1156856.82            -2241102.64             2386324.64
ORDERITEMS                                  1069409               72611.00 -                1156856.82            -2241102.64             2386324.64
MBRATTRVAL                                  1059392               72611.00 -                1156856.82            -2241102.64             2386324.64
USERS                                        911379               72611.00 -                1156856.82            -2241102.64             2386324.64
MEMBER                                       911365               72611.00 -                1156856.82            -2241102.64             2386324.64
...
As with the previous query, the "FLAG" column identifies tables that are statistically significantly out of the norm.
Insert
with ri_stats as (
    select decimal(avg(rows_inserted),20,2) as avg_ri,
        decimal(stddev(rows_inserted),20,2) as stddev_ri,
        decimal(avg(rows_inserted) - 2 * stddev(rows_inserted),20,2) as avg_less_2_stddev,
        decimal(avg(rows_inserted) + 2 * stddev(rows_inserted),20,2) as avg_plus_2_stddev
    from table (mon_get_table(NULL,NULL,-1)) as t
    where t.tabschema='REDACTED'
)
select substr(tabname,1,30) as tabname,
    rows_inserted,
    ri_stats.avg_ri,
    case when rows_inserted > ri_stats.avg_plus_2_stddev then 'High'
         when rows_inserted < ri_stats.avg_less_2_stddev then 'Low'
         else NULL
    end as flag,
    ri_stats.stddev_ri,
    ri_stats.avg_less_2_stddev,
    ri_stats.avg_plus_2_stddev
from table (mon_get_table(NULL,NULL,-1)) as t1, ri_stats
where t1.tabschema='REDACTED'
order by rows_inserted desc
with ur;

TABNAME                        ROWS_INSERTED        AVG_RI                 FLAG STDDEV_RI              AVG_LESS_2_STDDEV      AVG_PLUS_2_STDDEV
------------------------------ -------------------- ---------------------- ---- ---------------------- ---------------------- ----------------------
CACHEIVL                                   72260808              115640.00 High             2482812.07            -4849984.15             5081264.15
CTXDATA                                     6582356              115640.00 High             2482812.07            -4849984.15             5081264.15
DMEXPLOG                                    4797431              115640.00 -                2482812.07            -4849984.15             5081264.15
PX_CDPROMO                                  3776761              115640.00 -                2482812.07            -4849984.15             5081264.15
PX_CDPOOL                                   3226659              115640.00 -                2482812.07            -4849984.15             5081264.15
DMUSERBHVR                                  1567480              115640.00 -                2482812.07            -4849984.15             5081264.15
MBRATTRVAL                                   920179              115640.00 -                2482812.07            -4849984.15             5081264.15
CTXMGMT                                      822950              115640.00 -                2482812.07            -4849984.15             5081264.15
MEMBER                                       795710              115640.00 -                2482812.07            -4849984.15             5081264.15
...
Delete
with rd_stats as (
    select decimal(avg(rows_deleted),20,2) as avg_rd,
        decimal(stddev(rows_deleted),20,2) as stddev_rd,
        decimal(avg(rows_deleted) - 2 * stddev(rows_deleted),20,2) as avg_less_2_stddev,
        decimal(avg(rows_deleted) + 2 * stddev(rows_deleted),20,2) as avg_plus_2_stddev
    from table (mon_get_table(NULL,NULL,-1)) as t
    where t.tabschema='REDACTED'
)
select substr(tabname,1,30) as tabname,
    rows_deleted,
    rd_stats.avg_rd,
    case when rows_deleted > rd_stats.avg_plus_2_stddev then 'High'
         when rows_deleted < rd_stats.avg_less_2_stddev then 'Low'
         else NULL
    end as flag,
    rd_stats.stddev_rd,
    rd_stats.avg_less_2_stddev,
    rd_stats.avg_plus_2_stddev
from table (mon_get_table(NULL,NULL,-1)) as t1, rd_stats
where t1.tabschema='REDACTED'
    and rows_deleted > 0
order by rows_deleted desc
with ur;

TABNAME                        ROWS_DELETED         AVG_RD                 FLAG STDDEV_RD              AVG_LESS_2_STDDEV      AVG_PLUS_2_STDDEV
------------------------------ -------------------- ---------------------- ---- ---------------------- ---------------------- ----------------------
CACHEIVL                                   62205357               90131.00 High             2139222.54            -4188314.08             4368576.08
CTXDATA                                     7152013               90131.00 High             2139222.54            -4188314.08             4368576.08
DMEXPLOG                                    4786455               90131.00 High             2139222.54            -4188314.08             4368576.08
DMUSERBHVR                                  1608087               90131.00 -                2139222.54            -4188314.08             4368576.08
CTXMGMT                                      894125               90131.00 -                2139222.54            -4188314.08             4368576.08
XSTORESKU                                    415115               90131.00 -                2139222.54            -4188314.08             4368576.08
TMPCMPLIST                                   168692               90131.00 -                2139222.54            -4188314.08             4368576.08
ORDERITEMS                                    59884               90131.00 -                2139222.54            -4188314.08             4368576.08
...
By now the format of the query should look familiar. I also found it interesting that this process confirmed some of the tables I had flagged ahead of time – I knew both CACHEIVL and CTXDATA would need special attention.
Tables that are Large in Size
If you're simply looking for the largest tables in your database, the query is simple:
select substr(tabname,1,30) as tabname,
    card
from syscat.tables
where type='T'
    and tabschema='WSCOMUSR'
order by card desc
fetch first 20 rows only
with ur

TABNAME                        CARD
------------------------------ --------------------
PX_CDPROMO                                 50934290
PX_CDPOOL                                  50933284
CACHEIVL                                   30270713
MBRATTRVAL                                 11243103
USERS                                      11033793
MEMBER                                     11033724
BUSEVENT                                    6805318
PPCEXTDATA                                  5368244
MBRGRPMBR                                   5326807
SRCHSTAT                                    5251287
SCHBRDCST                                   4341523
XSTORESKU                                   3908962
MBRREL                                      3529638
ORDPAYINFO                                  2776001
ADDRESS                                     2246643
CATENTRYATTR                                2218289
ORDERITEMS                                  2033306
ADDRBOOK                                    1887871
MBRROLE                                     1768298
USERREG                                     1767892

20 record(s) selected.
One of the tricky things here is ensuring you are properly pruning your e-commerce database. If you haven't been doing that (using dbclean for WebSphere Commerce), then these numbers may be misleading. The database I'm looking at is well pruned. The next thing is to figure out what you need to do with these tables. Are they ones that are likely to be sequentially read, and so need to go in a tablespace/bufferpool appropriate for that? Or are they fairly randomly accessed? Do they warrant a single tablespace just for themselves?
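If a couple of these biggest tables do warrant a tablespace of their own on dedicated storage, a sketch of the supporting objects (DB2 10.1 and later; the storage group path, names, and sizes are all hypothetical) might look like this:

-- Hypothetical storage group on a fully separate I/O path
CREATE STOGROUP SG_BIGTABLES ON '/db2/fs_bigtables';

-- Hypothetical bufferpool and tablespace for the largest tables
CREATE BUFFERPOOL BP_BIG_4K IMMEDIATE SIZE 102400 PAGESIZE 4K;
CREATE LARGE TABLESPACE TS_BIG_4K
    PAGESIZE 4K
    MANAGED BY AUTOMATIC STORAGE
    USING STOGROUP SG_BIGTABLES
    BUFFERPOOL BP_BIG_4K;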
Tables with Questionable Page Sizes
There's a really great blog entry from Serge Rielau and Rick Swagerman (they have many great entries) that outlines how to look at the "worst-case" row size, which is what you have to consider when deciding which page size a table can and should go on. Go check it out for the stored procedure (DBA.GetRowSize) used in the next query.
This query can help you find tables where few rows fit on a page, and therefore a larger page size could be appropriate to save you space:
with T1 as (
    SELECT varchar(tabschema, 10) as tabschema,
        substr(tabname,1,30) as tabname,
        card,
        DBA.GetRowSize(tabschema, tabname) as rowsize,
        avgrowsize,
        pagesize,
        case when avgrowsize = 0 then NULL
             else pagesize/avgrowsize
        end as avg_rows_per_page,
        case when DBA.GetRowSize(tabschema, tabname) = 0 then NULL
             else pagesize/DBA.GetRowSize(tabschema, tabname)
        end as min_rows_per_page
    FROM SYSCAT.TABLES tab,
        SYSCAT.TABLESPACES tbsp
    WHERE TABNAME not like 'SYS%'
        AND type = 'T'
        AND tab.tbspaceid = tbsp.tbspaceid
    ORDER BY tabname
)
SELECT tabschema,
    tabname,
    card,
    rowsize,
    avgrowsize,
    pagesize,
    case when rowsize > 16293 then '32K'
         when rowsize > 8101 then '16K'
         when rowsize > 4005 then '8K'
         when rowsize <= 4005 then '4K'
    end as min_page_size,
    avg_rows_per_page,
    min_rows_per_page
FROM T1
ORDER BY avg_rows_per_page, min_rows_per_page
FETCH FIRST 20 ROWS ONLY

TABSCHEMA  TABNAME                        CARD                 ROWSIZE     AVGROWSIZE PAGESIZE    MIN_PAGE_SIZE AVG_ROWS_PER_PAGE MIN_ROWS_PER_PAGE
---------- ------------------------------ -------------------- ----------- ---------- ----------- ------------- ----------------- -----------------
REDACTED   BUSEVENT                                    6805318        4097       3442        8192 8K                            2                 1
DBAMON097  WORKLOAD_HIST_SUMMARY                             2        1325       1088        4096 4K                            3                 3
REDACTED   ACRELGRP                                         19        3052        975        4096 4K                            4                 1
DBA        LOCK_PARTICIPANTS                             13680        3575        535        4096 4K                            7                 1
REDACTED   ORDERITEMS                                  2033306        1957        454        4096 4K                            9                 2
REDACTED   BUSAUDIT                                       1846        1959        400        4096 4K                           10                 2
DB2INST1   EXPLAIN_OBJECT                                48298        1116        351        4096 4K                           11                 3
REDACTED   STADDRESS                                       236        3481        338        4096 4K                           12                 1
SYSTOOLS   HMON_ATM_INFO                                   954        1421        319        4096 4K                           12                 2
DBA        LOCK_PARTICIPANT_ACTIVITIES                    6574        1672        323        4096 4K                           12                 2
REDACTED   ADDRESS                                     2246643        3885        312        4096 4K                           13                 1
DB2INST1   ADVISE_INDEX                                   1122        3130        313        4096 4K                           13                 1
REDACTED   ORDERS                                      1678809        1059        305        4096 4K                           13                 3
REDACTED   PX_PROMOTION                                   1439         848        293        4096 4K                           13                 4
REDACTED   ICDATAREG                                         9         296        306        4096 4K                           13                13
REDACTED   TERMCOND                                         52        3978        282        4096 4K                           14                 1
REDACTED   CATENTRY                                     283320        1441        282        4096 4K                           14                 2
REDACTED   DMCAMPAIGN                                       78        1847        274        4096 4K                           14                 2
REDACTED   CALCODE                                        1466         748        280        4096 4K                           14                 5
REDACTED   TRADEPOSCN                                       43         559        278        4096 4K                           14                 7

20 record(s) selected.
The first table in that list just drives me nuts. When I analyze it, I see that 40% or more of the space the table takes up is simply wasted, because it has a large varchar column that, when full, allows only one row to fit on a page. Certain ways that the app uses the table mean that there may be 300 or more associated rows that all use that maximum length and would always be queried as a block. Clearly that table needs a larger page size.
There are others there that may warrant investigation. Clearly we won't be changing explain or system tools tables, but it's still interesting. This query would be most useful if you're tight on space and looking to reduce wasted space in the database.
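If you do decide a table like that first one belongs on a larger page, the supporting objects are simple to sketch (the names and the bufferpool size here are hypothetical; the table itself would then be moved with ADMIN_MOVE_TABLE as shown earlier):

-- Hypothetical 32K bufferpool and tablespace for wide-row tables
CREATE BUFFERPOOL BP_WIDE_32K IMMEDIATE SIZE 8192 PAGESIZE 32K;
CREATE LARGE TABLESPACE TS_WIDE_32K
    PAGESIZE 32K
    MANAGED BY AUTOMATIC STORAGE
    BUFFERPOOL BP_WIDE_32K;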
Tables with LOBs in Them
You may want to consider putting tables with LOBs in their own tablespace with filesystem caching turned on, or you may want to direct the LOB data itself to a separate tablespace. Separated LOB data can only reside in a DMS tablespace. You may also want to read Mike Krafick's blog entry on inlining and dealing with LOBs.
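As a sketch of that approach (all names here are hypothetical), the LOB tablespace gets filesystem caching while the data tablespace does not, and a table's LOB data is directed to it with the LONG IN clause:

-- Hypothetical LOB tablespace with filesystem caching enabled
CREATE LARGE TABLESPACE TS_LOB_4K
    PAGESIZE 4K
    MANAGED BY AUTOMATIC STORAGE
    FILE SYSTEM CACHING;

-- Hypothetical table directing its LOB data to that tablespace;
-- assumes TS_DATA_4K is an existing DMS/automatic storage data tablespace
CREATE TABLE WSCOMUSR.MY_ATTACHMENT (
    ATTACHMENT_ID BIGINT NOT NULL PRIMARY KEY,
    CONTENT       BLOB(10M)
) IN TS_DATA_4K LONG IN TS_LOB_4K;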
This query not only lists the tables with LOBs in them, but is also useful for analyzing the LOB columns in your tables if you're thinking about inlining LOBs or separating them out into their own tablespace:
select substr(c.tabname,1,18) as tabname,
    substr(colname,1,30) as colname,
    length,
    avgcollen,
    c.logged,
    t.card,
    ts.pagesize
from syscat.columns c,
    syscat.tables t,
    syscat.tablespaces ts
where t.tabname=c.tabname
    and t.tabschema=c.tabschema
    and t.tbspaceid=ts.tbspaceid
    and typename in ('BLOB', 'CLOB')
    and t.tabschema='REDACTED'
order by c.tabname, c.colname
with ur

TABNAME            COLNAME                        LENGTH      AVGCOLLEN   LOGGED CARD                 PAGESIZE
------------------ ------------------------------ ----------- ----------- ------ -------------------- -----------
ATTACHMENT         CONTENT                           10000000           5 Y                         0        4096
BUSAUDIT           PARAMETERS                        10000000         134 Y                      1846        4096
CATCONFINF         CONFIGURATION                      1000000           5 Y                         0        8192
CATENTDESC         LONGDESCRIPTION                    1000000          55 Y                    283318       16384
CATENTDESC         XMLDETAIL                          1000000           5 Y                    283318       16384
CMLARGEFILE        CMFILE                           300000000           5 Y                         0        4096
CMSMALLFILE        CMFILE                               32000           5 Y                         0        4096
CNTRSTORE          STOREXML                           1000000         112 Y                        15        4096
COLLDESC           LONGMKTGTEXT                       1000000          54 Y                      8961        8192
DKPDCDESC          LONGDESCRIPTION                    1000000           5 Y                         0        4096
DMACTIVITY         FLOWXML                            1000000           5 Y                      3512       16384
DMELETEMPLATE      BEHAVIORXML                        1000000          77 Y                        75       16384
DMELETEMPLATE      DISPLAYXML                         1000000           5 Y                        75       16384
DMELETEMPLATE      IMPLXML                            1000000         112 Y                        75       16384
DMELETEMPLATE      RELATEDXML                         1000000          57 Y                        75       16384
...
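Several of the tables above have an AVGCOLLEN far below the defined LOB length, which makes them candidates for inlining instead of (or in addition to) a separate LOB tablespace. A minimal sketch – the column comes from the output above, but the inline length is for illustration only and must fit within the table's page size:

-- Inline small LOB values into the data pages (length chosen for illustration)
ALTER TABLE WSCOMUSR.CATENTDESC
    ALTER COLUMN LONGDESCRIPTION SET INLINE LENGTH 1004;

-- Existing rows are only inlined by a reorg that processes LOB data
CALL SYSPROC.ADMIN_CMD('REORG TABLE WSCOMUSR.CATENTDESC LONGLOBDATA');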
Summary
I am intentionally not sharing the extent of my final decisions here. I share a lot, but some of my work is the intellectual property of my employer. I did end up with a few crucial tables broken out into individual tablespaces, one or two tablespaces for a logical grouping of tables, one tablespace for those highly-read code tables, and one for LOBs.
I hope the methodology and thought process here helps someone.
Something you left out of consideration: as much as some people don't like to believe it, size matters (and this isn't always the table with the most rows, either). I have had to create additional tablespaces due to not having any more space in a tablespace. Then you have to deal with lowering your high water mark as well. Not a fun process. It is much easier to think of these items prior to implementation than to change an existing database.
True – this was especially a problem back before the days of “LARGE” tablespaces – when the limit on a single table in a DMS tablespace on a single partition for 4K page size was 64GB. Luckily there are a number of things we can use to deal with that – the use of LARGE tablespaces, table partitioning, database partitioning, etc. It certainly is a good idea to break out particularly large tables.
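For anyone still sitting on a REGULAR DMS tablespace that is bumping into those limits, the conversion itself is a one-liner, though this is only a sketch (the tablespace and table names are made up), and indexes have to be rebuilt to use large RIDs before a table can actually grow past the old boundary:

-- Convert a hypothetical REGULAR DMS tablespace to LARGE
ALTER TABLESPACE TS_OLD_DATA CONVERT TO LARGE;

-- Rebuild indexes so they can address rows beyond the old RID limits
CALL SYSPROC.ADMIN_CMD('REORG INDEXES ALL FOR TABLE WSCOMUSR.SOME_BIG_TABLE');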
Great article with even better SQL.
There is a lot more to be said about this subject; I agree with Dave. The fact that you are currently running on V9.7 or even on V10 does not say a lot about the current state of your tablespaces. When you make it a habit to play it safe during conversion (redirected restores only), your database could still contain the heritage of the older versions (SMS, non-large non-reclaimable DMS, etc.) or (do not forget to mention this) tablespaces residing on ext3 filesystems.
But again, thank you for another great article.
Thanks Ember for sharing your ideas. I agree with Dave: "size matters". In our case we split every table into its own Data, Index, and LOB (if applicable) tablespaces. In the end, this has given us more opportunity to do either offline/online maintenance or DDL changes, and it addresses many of the issues you mention above: tables with high DML activity, the need for different page sizes (Data vs. Index vs. LOB), and FS caching.
Regards
Hi, what is the best practice for defining DMS containers? Should we define one container or multiple, especially if all the containers are on the same filesystem?
Well, really, you should be using automatic storage on DB2 10.1 and later – no need to define the storage at the tablespace level. Prior to 10.1, my opinion is one container per fully separate I/O path – which usually means just one container. Containers can easily be grown and are not too hard to shrink.
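On 10.1 and later, an existing DMS tablespace can also be converted in place – just a sketch, with a made-up tablespace name:

-- Convert a hypothetical DMS tablespace to automatic storage,
-- then rebalance to move extents onto the storage group's paths
ALTER TABLESPACE TS_DMS_DATA MANAGED BY AUTOMATIC STORAGE;
ALTER TABLESPACE TS_DMS_DATA REBALANCE;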
Hi Ember,
A general question regarding the number of tablespaces: how many tablespaces does a production DB contain? (What is the largest number you have observed in your experience?)
Back in the bad old days, I saw some databases (maybe in support of PeopleSoft?) where each table was in its own tablespace. Actually, I talked to a large national retailer within the last year who still did it that way. The reasoning originally was that in certain conditions the entire tablespace would be locked. Today, that level of locking just doesn’t happen outside of things like offline backups, so in my mind the reasoning for that method is gone.
The number of tablespaces can quickly multiply when you look at the categories I’ve talked about and the fact that you may need a tablespace of each page size for each of those categories.
It’s not about hitting a hard number, but really about not shooting yourself in the foot on maintenance, and keeping your reasoning for each tablespace solid.