Db2 Basics: Table Spaces

Table spaces are one of the more powerful tools in Db2 that are largely invisible. We used to have to spend a lot more time on table space management.

Basics and Terminology

Some of the terms here are rarely needed any more, and may not be something you run into every day, but they may be critical for understanding the historical context of how we got where we are today.

What is the Purpose of a Table Space?

A table space is a layer of abstraction. It allows a DBA to decide what the underlying storage looks like for a table or group of tables, while the developer at most just needs to know the name of the table space, and sometimes not even that. Table spaces can be used for striping data across multiple disks, isolating related tables to back up in a smaller unit than the whole database, enabling multi-temperature storage to reduce storage costs, isolation of problematic workloads in dedicated buffer pools, range partitioning one table across multiple different disks, separating IO for data, index, and long data, and for enabling more parallelism during backup, and probably some other purposes as well.

SMS vs. DMS

In the bad old days, SMS(System Managed Spaces) offered an advantage in space management at the cost of performance. At one point, we said it was about a 10% hit in performance. But it meant the only level at which we needed to allocate space was at the disk. Db2 would use space and increase what it was using. This methodology also put each table and index in their own files within the path specified. Today, SMS is only used for temporary table spaces, where it makes a lot of sense to make it particularly easy to release the very volatile space used by temporary tables. Db2 has been working on the transition away from SMS for a while. For a while, the system catalog was still on SMS.

What’s the alternative to SMS? It is DMS (Database Managed Space). DMS table spaces mingle all of the tables and indexes within a single set of files. There used to be an option to specify raw containers to allow Db2 to manage everything about the storage, but that has since gone away. It used to be that we had to very manually set and extend table space/table space container sizes, and manage their extension manually. Now, nearly all DMS table spaces use auto extend, mostly within the context of AST and storage groups.

Because databases generally tend to grow in size over time, making a table space smaller is still slightly manual, though relatively painless if you have reclaimable table spaces (created on 9.7 or later).

Today, the explicit use of SMS table spaces for user data is deprecated. I also wouldn’t design for explicit DMS, but would instead use AST, which will use SMS for temporary data and DMS for all other data under the covers.

User Data vs Temporary

In addition to storing user data, we also need space for temporary tables. Temporary tables are necessary to satisfy some queries, particularly on larger data sets. They are often used in sort and join operations. Temporary data is extremely volatile in size. There can be many GB used one moment and nothing the next moment. In fact, if you’ve not properly isolated temporary table spaces, they can fill up your disk to the point where other database activity is not possible. Because of this, it is great to isolate your temporary table spaces or allocate a reasonably large amount of space. In heavy analytics environments, it is common to put your temporary table spaces on the fastest disk you have available – up to and including local NVMe. Using the fastest possible storage here can speed up joins and sorts.

There are also two types of temporary table spaces. System temporary table spaces are for Db2 to use as it needs for sorts and joins and such. User temporary table spaces are to be used by user-defined temporary tables (whether declared or created). Though user created temporary tables need the user type of temporary table space available, the characacteristics of these tables are more similar to system temporary table spaces than regular table spaces, as the data is highly volatile, and apt to change from moment to moment.

Regular vs. Large

Using a 4K page size, the maximum possible size of a regular table space is 64 GB. This was more than enough back on 32-bit systems 20 years ago, but today is not really that much. To make this easier, IBM introduced the Large table space to accommodate larger sizes. The main difference is in the format of the RIDs (Row IDentifiers) that are used to uniquely locate each row. In large table spaces, the maximum table space size is 8 TB. All table spaces today should be created as large table spaces – there is really no advantage with going with regular table spaces.

Automatic Storage Table Spaces (AST)

Automatic Storage Table spaces (AST) is now the default. It introduces another layer of abstraction – the storage group – to further customize how storage is automatically laid out for a database. It is not required to use AST just to have DMS table spaces use autoresize to increase in size as needed – these are separate concepts.

Configuration Parameters Related to Table Spaces

The most basic configuration for a table space that can and should be specified on table space creation is the page size. If it is not specified, then the default page size for the database is used. There must be a buffer pool of the same page size as the table space being created, and if it’s not the same page size as IBMDEFAULTBP, the buffer pool must be explicitly specified. The type of table space from the categories explored above must be properly specified.

The extent size and prefetch size can be specified, or the defaults can be used. Specific parameters about disk overhead and transferrate can be inherited from the storage group, explicitly specified, or the defaults can be used.

It is important to properly specify on table space creation whether or not to use file system caching. Generally you want table spaces that contain or focus on LOB data to use filesystem caching, and otherwise you likely want filesystem caching off. The defaults here are a complicated mix of the operating system, filesystem types, and other factors. These complications mean it is best to explicitly specify what you want.

Commands and Exploring Table Spaces

My fingers still go first to LIST TABLESPACES, LIST TABLESPACES SHOW DETAIL and LIST TABLESPACE CONTAINERS FOR <tbsp_id>, but these commands are deprecated. Snapsnots are not something I commonly use here. Below are the methods I would recommend using.

db2pd

The information returned from db2pd is good, but a tad hard to read. Here’s what it looks like for a recently created SAMPLE database on 11.5:

$ db2pd -db sample -tablespaces

Database Member 0 -- Database SAMPLE -- Active -- Up 0 days 01:06:00 -- Date 2023-06-13-02.03.25.504125

Tablespace Configuration:
Address            Id    Type Content PageSz ExtentSz Auto Prefetch BufID BufIDDisk FSC NumCntrs MaxStripe  LastConsecPg RSE  Name
0x00007FB71733A540 0     DMS  Regular 8192   4        Yes  4        1     1         Def 1        0          3            Yes  SYSCATSPACE
0x00007FB717347900 1     SMS  SysTmp  8192   32       Yes  32       1     1         On  1        0          31           No   TEMPSPACE1
0x00007FB717354CC0 2     DMS  Large   8192   32       Yes  32       1     1         Def 1        0          31           Yes  USERSPACE1
0x00007FB717362080 3     DMS  Large   8192   32       Yes  32       1     1         Def 1        0          31           Yes  IBMDB2SAMPLEREL
0x00007FB71736F440 4     DMS  Large   8192   32       Yes  32       1     1         Def 1        0          31           Yes  IBMDB2SAMPLEXML
0x00007FB72110BAC0 5     DMS  Large   8192   4        Yes  4        1     1         Def 1        0          3            Yes  SYSTOOLSPACE

Tablespace Statistics:
Address            Id    TotalPgs   UsablePgs  UsedPgs    PndFreePgs FreePgs    HWM        Max HWM    State      MinRecTime NQuiescers PathsDropped TrackmodState
0x00007FB71733A540 0     20480      20476      16440      0          4036       16440      16440      0x00000000 0          0          No           n/a
0x00007FB717347900 1     1          1          1          0          0          -          -          0x00000000 0          0          No           n/a
0x00007FB717354CC0 2     4096       4064       1824       0          2240       1824       1824       0x00000000 0          0          No           n/a
0x00007FB717362080 3     4096       4064       608        0          3456       608        608        0x00000000 0          0          No           n/a
0x00007FB71736F440 4     4096       4064       1440       0          2624       1440       1440       0x00000000 0          0          No           n/a
0x00007FB72110BAC0 5     4096       4092       104        0          3988       104        104        0x00000000 0          0          No           n/a

Tablespace Autoresize Statistics:
Address            Id    AS  AR  InitSize             IncSize              IIP MaxSize              LastResize                 LRF
0x00007FB71733A540 0     Yes Yes 33554432             -1                   No  None                 None                       No
0x00007FB717347900 1     Yes No  0                    0                    No  0                    None                       No
0x00007FB717354CC0 2     Yes Yes 33554432             -1                   No  None                 None                       No
0x00007FB717362080 3     Yes Yes 33554432             -1                   No  None                 None                       No
0x00007FB71736F440 4     Yes Yes 33554432             -1                   No  None                 None                       No
0x00007FB72110BAC0 5     Yes Yes 33554432             -1                   No  None                 None                       No

Tablespace Storage Statistics:
Address            Id    DataTag  Rebalance SGID  SourceSGID
0x00007FB71733A540 0     0        No        0     -
0x00007FB717347900 1     0        No        0     -
0x00007FB717354CC0 2     -1       No        0     -
0x00007FB717362080 3     -1       No        0     -
0x00007FB71736F440 4     -1       No        0     -
0x00007FB72110BAC0 5     -1       No        0     -

Containers:
Address            TspId ContainNum Type    TotalPgs   UseablePgs PathID     StripeSet  Container
0x00007FB71732A320 0     0          File    20480      20476      0          0          /database/data/db2inst1/NODE0000/SAMPLE/T0000000/C0000000.CAT
0x00007FB71737D000 1     0          Path    1          1          0          0          /database/data/db2inst1/NODE0000/SAMPLE/T0000001/C0000000.TMP
0x00007FB717333380 2     0          File    4096       4064       0          0          /database/data/db2inst1/NODE0000/SAMPLE/T0000002/C0000000.LRG
0x00007FB717333BA0 3     0          File    4096       4064       0          0          /database/data/db2inst1/NODE0000/SAMPLE/T0000003/C0000000.LRG
0x00007FB71730DA40 4     0          File    4096       4064       0          0          /database/data/db2inst1/NODE0000/SAMPLE/T0000004/C0000000.LRG
0x00007FB72111B000 5     0          File    4096       4092       0          0          /database/data/db2inst1/NODE0000/SAMPLE/T0000005/C0000000.LRG

db2top/dmctop

Both db2top and dmctop have a tablespaces screen that provides most of the details you might need. Here’s an example of using dmctop for this on the same sample database:

In this interface, you can scroll right/left for more columns of information. If you have a lot of table spaces, you can also scroll up and down.

SQL

If you’re trying to do anything programatic with this data, SQL is absolutely the best way to get to it. The tables you’re likely to need are SYSCAT.TABLESPACES and the MON_GET_TABLESPACE table function. For an example of querying these tables, see the blog entry DB2 Administrative SQL Cookbook: Listing the Number of Pages in Tablespaces, by Bufferpool

Creating and Dropping

Table spaces can be created and dropped just like any any other database object. When you drop a table space, generally all of the objects in the table space are also dropped. This can be a lot, so be very careful when dropping table spaces. If there is a table that spans multiple table spaces, then you must drop that table (or otherwise remove any parts of that table from the table space before the DROP TABLESPACE can succeed. There is a property you can add to a table called RESTRICT on DROP. If this is defined for a table it must be removed or the table dropped before the table space containing it can be dropped.

The Bad Old Days

I remember when I first learned about table spaces in any detail, when I was about 6 months into being a DBA. This was about 2001, and probably on Db2 7.2. I had my own databases that I was a physical DBA for, but many of the people who had been working on these databases were also DBAs of less than 2 years, so there were … gaps. I went to use some commands I had learned to look at how full my table spaces were. Table spaces back then would not auto extend – you could script it, but there was no way to have Db2 do it for you. This meant that if you were using DMS, you needed to carefully monitor when your table spaces were getting full, make sure you had disk space, and then extend your containers (or more rarely, add more containers). In this case, I learned some commands for seeing how full tablespaces were, looked at a database that I regularly supported, and discovered that my table spaces were 98% full. I quickly got them extended, luckily before there was any impact. Had they filled up completely, the database would have been unwritable until they were extended. This scenario is fairly unlikely today.

Table Space Design Today

For small databases, just picking a page size based on your use cases and going with defaults across the board works just fine. As databases get larger, it may make sense to split tables out into table spaces. The two reasons I’ve most frequently seen this done is to speed up backups and to isolate a problematic table in it’s own buffer pool. Check out Redesigning Tablespaces in an Existing Database for more information on table space design.

I do occasionally run across databases where each table is in it’s own table space. This most often comes from one of two directions. First, in the bad old days, doing a LOAD operation into a table took a lock on the entire table space. This meant that separate table spaces were often needed for tables where LOADs needed to happen concurrently, or at the same time as activity on other tables. The other direction this design choice seems to come from is Db2 for z/OS, where the concepts of table and table space seem to be co-mingled. This used to be a terribly heavy design choice to make, as managing free space in all of those table spaces was rough. It is less so today with AST and with autoresize, but still is not my first choice by a long shot.

Contrast with MySQL

In MySQL if you use a file-per-table strategy, it’s fairly similar to Db2’s SMS approach. There are a couple of things that are very different. The reclamation of space if you don’t use file-per-table in MySQL is difficult or impossible, while in Db2, this is possible using DMS/AST table spaces.

Perhaps more importantly, Db2 assumes you will be using different logical disks. This has the disadvantage that if you want to take a disk-level snapshot, you must first suspend writes, then take your snapshot (hopefully seconds or less), and then remove the write suspend. In many cases, if you don’t do this, your disk snapshot will still work, but there are random cases when it won’t, which is not what you want when relying on a backup. MySQL on the other hand, you aim for having everything on the same logical disk, which avoids the problems around synchronization between different disks, but also rules out some interesting use cases.

Cool Uses

One of the more interesting configurations I’ve seen recently was on AWS. The client was using fairly slow EBS disks, and the query speed was not matching what they were looking for. However, the cost of faster disks was a major concern. Using table space configuration, we were able to stripe their data across a number of EBS volumes to get a faster overall throughput. It used to be a pretty major effort to try to match our table space design to our disk subsystem. As disks have become faster, and as storage admins have become less willing to commit to a specific set of storage characteristics, this has generally become less important.

Further Reading

I have a some other posts about tablespaces in a number of different aspects. Check out:

Ember Crooks
Ember Crooks

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, spent a year working with high-volume MySQL, and is now learning Snowflake. Ember shares both posts about her core skill sets and her journey learning Snowflake.

Ember lives in Denver and work from home

Articles: 548

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.