Table spaces are one of the more powerful tools in Db2 that are largely invisible. We used to have to spend a lot more time on table space management.
Basics and Terminology
Some of the terms here are rarely needed any more, and may not be something you run into every day, but they may be critical for understanding the historical context of how we got where we are today.
What is the Purpose of a Table Space?
A table space is a layer of abstraction. It allows a DBA to decide what the underlying storage looks like for a table or group of tables, while the developer at most just needs to know the name of the table space, and sometimes not even that. Table spaces can be used for striping data across multiple disks, isolating related tables to back up in a smaller unit than the whole database, enabling multi-temperature storage to reduce storage costs, isolation of problematic workloads in dedicated buffer pools, range partitioning one table across multiple different disks, separating IO for data, index, and long data, and for enabling more parallelism during backup, and probably some other purposes as well.
SMS vs. DMS
In the bad old days, SMS(System Managed Spaces) offered an advantage in space management at the cost of performance. At one point, we said it was about a 10% hit in performance. But it meant the only level at which we needed to allocate space was at the disk. Db2 would use space and increase what it was using. This methodology also put each table and index in their own files within the path specified. Today, SMS is only used for temporary table spaces, where it makes a lot of sense to make it particularly easy to release the very volatile space used by temporary tables. Db2 has been working on the transition away from SMS for a while. For a while, the system catalog was still on SMS.
What’s the alternative to SMS? It is DMS (Database Managed Space). DMS table spaces mingle all of the tables and indexes within a single set of files. There used to be an option to specify raw containers to allow Db2 to manage everything about the storage, but that has since gone away. It used to be that we had to very manually set and extend table space/table space container sizes, and manage their extension manually. Now, nearly all DMS table spaces use auto extend, mostly within the context of AST and storage groups.
Because databases generally tend to grow in size over time, making a table space smaller is still slightly manual, though relatively painless if you have reclaimable table spaces (created on 9.7 or later).
Today, the explicit use of SMS table spaces for user data is deprecated. I also wouldn’t design for explicit DMS, but would instead use AST, which will use SMS for temporary data and DMS for all other data under the covers.
User Data vs Temporary
In addition to storing user data, we also need space for temporary tables. Temporary tables are necessary to satisfy some queries, particularly on larger data sets. They are often used in sort and join operations. Temporary data is extremely volatile in size. There can be many GB used one moment and nothing the next moment. In fact, if you’ve not properly isolated temporary table spaces, they can fill up your disk to the point where other database activity is not possible. Because of this, it is great to isolate your temporary table spaces or allocate a reasonably large amount of space. In heavy analytics environments, it is common to put your temporary table spaces on the fastest disk you have available – up to and including local NVMe. Using the fastest possible storage here can speed up joins and sorts.
There are also two types of temporary table spaces. System temporary table spaces are for Db2 to use as it needs for sorts and joins and such. User temporary table spaces are to be used by user-defined temporary tables (whether declared or created). Though user created temporary tables need the user type of temporary table space available, the characacteristics of these tables are more similar to system temporary table spaces than regular table spaces, as the data is highly volatile, and apt to change from moment to moment.
Regular vs. Large
Using a 4K page size, the maximum possible size of a regular table space is 64 GB. This was more than enough back on 32-bit systems 20 years ago, but today is not really that much. To make this easier, IBM introduced the Large table space to accommodate larger sizes. The main difference is in the format of the RIDs (Row IDentifiers) that are used to uniquely locate each row. In large table spaces, the maximum table space size is 8 TB. All table spaces today should be created as large table spaces – there is really no advantage with going with regular table spaces.
Automatic Storage Table Spaces (AST)
Automatic Storage Table spaces (AST) is now the default. It introduces another layer of abstraction – the storage group – to further customize how storage is automatically laid out for a database. It is not required to use AST just to have DMS table spaces use autoresize to increase in size as needed – these are separate concepts.
Configuration Parameters Related to Table Spaces
The most basic configuration for a table space that can and should be specified on table space creation is the page size. If it is not specified, then the default page size for the database is used. There must be a buffer pool of the same page size as the table space being created, and if it’s not the same page size as IBMDEFAULTBP, the buffer pool must be explicitly specified. The type of table space from the categories explored above must be properly specified.
The extent size and prefetch size can be specified, or the defaults can be used. Specific parameters about disk overhead and transferrate can be inherited from the storage group, explicitly specified, or the defaults can be used.
It is important to properly specify on table space creation whether or not to use file system caching. Generally you want table spaces that contain or focus on LOB data to use filesystem caching, and otherwise you likely want filesystem caching off. The defaults here are a complicated mix of the operating system, filesystem types, and other factors. These complications mean it is best to explicitly specify what you want.
Commands and Exploring Table Spaces
My fingers still go first to
LIST TABLESPACES SHOW DETAIL and
LIST TABLESPACE CONTAINERS FOR <tbsp_id>, but these commands are deprecated. Snapsnots are not something I commonly use here. Below are the methods I would recommend using.
The information returned from db2pd is good, but a tad hard to read. Here’s what it looks like for a recently created SAMPLE database on 11.5:
$ db2pd -db sample -tablespaces Database Member 0 -- Database SAMPLE -- Active -- Up 0 days 01:06:00 -- Date 2023-06-13-02.03.25.504125 Tablespace Configuration: Address Id Type Content PageSz ExtentSz Auto Prefetch BufID BufIDDisk FSC NumCntrs MaxStripe LastConsecPg RSE Name 0x00007FB71733A540 0 DMS Regular 8192 4 Yes 4 1 1 Def 1 0 3 Yes SYSCATSPACE 0x00007FB717347900 1 SMS SysTmp 8192 32 Yes 32 1 1 On 1 0 31 No TEMPSPACE1 0x00007FB717354CC0 2 DMS Large 8192 32 Yes 32 1 1 Def 1 0 31 Yes USERSPACE1 0x00007FB717362080 3 DMS Large 8192 32 Yes 32 1 1 Def 1 0 31 Yes IBMDB2SAMPLEREL 0x00007FB71736F440 4 DMS Large 8192 32 Yes 32 1 1 Def 1 0 31 Yes IBMDB2SAMPLEXML 0x00007FB72110BAC0 5 DMS Large 8192 4 Yes 4 1 1 Def 1 0 3 Yes SYSTOOLSPACE Tablespace Statistics: Address Id TotalPgs UsablePgs UsedPgs PndFreePgs FreePgs HWM Max HWM State MinRecTime NQuiescers PathsDropped TrackmodState 0x00007FB71733A540 0 20480 20476 16440 0 4036 16440 16440 0x00000000 0 0 No n/a 0x00007FB717347900 1 1 1 1 0 0 - - 0x00000000 0 0 No n/a 0x00007FB717354CC0 2 4096 4064 1824 0 2240 1824 1824 0x00000000 0 0 No n/a 0x00007FB717362080 3 4096 4064 608 0 3456 608 608 0x00000000 0 0 No n/a 0x00007FB71736F440 4 4096 4064 1440 0 2624 1440 1440 0x00000000 0 0 No n/a 0x00007FB72110BAC0 5 4096 4092 104 0 3988 104 104 0x00000000 0 0 No n/a Tablespace Autoresize Statistics: Address Id AS AR InitSize IncSize IIP MaxSize LastResize LRF 0x00007FB71733A540 0 Yes Yes 33554432 -1 No None None No 0x00007FB717347900 1 Yes No 0 0 No 0 None No 0x00007FB717354CC0 2 Yes Yes 33554432 -1 No None None No 0x00007FB717362080 3 Yes Yes 33554432 -1 No None None No 0x00007FB71736F440 4 Yes Yes 33554432 -1 No None None No 0x00007FB72110BAC0 5 Yes Yes 33554432 -1 No None None No Tablespace Storage Statistics: Address Id DataTag Rebalance SGID SourceSGID 0x00007FB71733A540 0 0 No 0 - 0x00007FB717347900 1 0 No 0 - 0x00007FB717354CC0 2 -1 No 0 - 0x00007FB717362080 3 -1 No 0 - 0x00007FB71736F440 4 -1 No 0 - 0x00007FB72110BAC0 5 -1 No 0 - Containers: Address TspId ContainNum Type TotalPgs UseablePgs PathID StripeSet Container 0x00007FB71732A320 0 0 File 20480 20476 0 0 /database/data/db2inst1/NODE0000/SAMPLE/T0000000/C0000000.CAT 0x00007FB71737D000 1 0 Path 1 1 0 0 /database/data/db2inst1/NODE0000/SAMPLE/T0000001/C0000000.TMP 0x00007FB717333380 2 0 File 4096 4064 0 0 /database/data/db2inst1/NODE0000/SAMPLE/T0000002/C0000000.LRG 0x00007FB717333BA0 3 0 File 4096 4064 0 0 /database/data/db2inst1/NODE0000/SAMPLE/T0000003/C0000000.LRG 0x00007FB71730DA40 4 0 File 4096 4064 0 0 /database/data/db2inst1/NODE0000/SAMPLE/T0000004/C0000000.LRG 0x00007FB72111B000 5 0 File 4096 4092 0 0 /database/data/db2inst1/NODE0000/SAMPLE/T0000005/C0000000.LRG
Both db2top and dmctop have a tablespaces screen that provides most of the details you might need. Here’s an example of using dmctop for this on the same sample database:
In this interface, you can scroll right/left for more columns of information. If you have a lot of table spaces, you can also scroll up and down.
If you’re trying to do anything programatic with this data, SQL is absolutely the best way to get to it. The tables you’re likely to need are
SYSCAT.TABLESPACES and the
MON_GET_TABLESPACE table function. For an example of querying these tables, see the blog entry DB2 Administrative SQL Cookbook: Listing the Number of Pages in Tablespaces, by Bufferpool
Creating and Dropping
Table spaces can be created and dropped just like any any other database object. When you drop a table space, generally all of the objects in the table space are also dropped. This can be a lot, so be very careful when dropping table spaces. If there is a table that spans multiple table spaces, then you must drop that table (or otherwise remove any parts of that table from the table space before the
DROP TABLESPACE can succeed. There is a property you can add to a table called
RESTRICT on DROP. If this is defined for a table it must be removed or the table dropped before the table space containing it can be dropped.
The Bad Old Days
I remember when I first learned about table spaces in any detail, when I was about 6 months into being a DBA. This was about 2001, and probably on Db2 7.2. I had my own databases that I was a physical DBA for, but many of the people who had been working on these databases were also DBAs of less than 2 years, so there were … gaps. I went to use some commands I had learned to look at how full my table spaces were. Table spaces back then would not auto extend – you could script it, but there was no way to have Db2 do it for you. This meant that if you were using DMS, you needed to carefully monitor when your table spaces were getting full, make sure you had disk space, and then extend your containers (or more rarely, add more containers). In this case, I learned some commands for seeing how full tablespaces were, looked at a database that I regularly supported, and discovered that my table spaces were 98% full. I quickly got them extended, luckily before there was any impact. Had they filled up completely, the database would have been unwritable until they were extended. This scenario is fairly unlikely today.
Table Space Design Today
For small databases, just picking a page size based on your use cases and going with defaults across the board works just fine. As databases get larger, it may make sense to split tables out into table spaces. The two reasons I’ve most frequently seen this done is to speed up backups and to isolate a problematic table in it’s own buffer pool. Check out Redesigning Tablespaces in an Existing Database for more information on table space design.
I do occasionally run across databases where each table is in it’s own table space. This most often comes from one of two directions. First, in the bad old days, doing a
LOAD operation into a table took a lock on the entire table space. This meant that separate table spaces were often needed for tables where LOADs needed to happen concurrently, or at the same time as activity on other tables. The other direction this design choice seems to come from is Db2 for z/OS, where the concepts of table and table space seem to be co-mingled. This used to be a terribly heavy design choice to make, as managing free space in all of those table spaces was rough. It is less so today with AST and with autoresize, but still is not my first choice by a long shot.
Contrast with MySQL
In MySQL if you use a file-per-table strategy, it’s fairly similar to Db2’s SMS approach. There are a couple of things that are very different. The reclamation of space if you don’t use file-per-table in MySQL is difficult or impossible, while in Db2, this is possible using DMS/AST table spaces.
Perhaps more importantly, Db2 assumes you will be using different logical disks. This has the disadvantage that if you want to take a disk-level snapshot, you must first suspend writes, then take your snapshot (hopefully seconds or less), and then remove the write suspend. In many cases, if you don’t do this, your disk snapshot will still work, but there are random cases when it won’t, which is not what you want when relying on a backup. MySQL on the other hand, you aim for having everything on the same logical disk, which avoids the problems around synchronization between different disks, but also rules out some interesting use cases.
One of the more interesting configurations I’ve seen recently was on AWS. The client was using fairly slow EBS disks, and the query speed was not matching what they were looking for. However, the cost of faster disks was a major concern. Using table space configuration, we were able to stripe their data across a number of EBS volumes to get a faster overall throughput. It used to be a pretty major effort to try to match our table space design to our disk subsystem. As disks have become faster, and as storage admins have become less willing to commit to a specific set of storage characteristics, this has generally become less important.
I have a some other posts about tablespaces in a number of different aspects. Check out:
- Using DB2’s ADMIN_MOVE_TABLE to Move Tables from One Tablespace to Another
- DB2 Administrative SQL Cookbook: Finding Tables in a Specific Tablespace, Without RI
- Redesigning Tablespaces in an Existing Database
- Automatic Storage Tablespaces (AST): Compare and Contrast to DMS
- Find and translate a tablespace state
- DB2 Temporary Table Spaces
- Reclaimable Table Spaces