One of the things that learning a new platform (MySQL) has taught me is just how critical the most basic topics are, so I thought I’d do an overview of how to get data out of a Db2 (on Linux, Unix, or Windows) database. I’m going to cover a few basic access methods, and then go through extraction methods by target location for the data being extracted.
I’m not covering basic access by programs via drivers and APIs, as that is pretty specific to the language being used.
Types of Access
There are many different ways of accessing a Db2 database – some of which you may have and some of which you may not. Here are a few that I can think of and that may be relevant in the methods discussed in greater detail below. All assume you have some ID with permissions to query the data you need.
SSH Access to the Database Server
In the old days, this was the most basic and one of the more likely kinds of access to have. With databases in containers, in the cloud, and on Kubernetes, this is far less likely. Since Db2 and the systems surrounding it are a bit old-school, it is still worth covering in detail. Most DBAs I know do NOT use the interactive prompt you get by typing ‘db2’ and pressing Enter, which seems to be the preference of DBAs coming from Oracle (SQL*Plus) or MySQL, but I’ll cover a couple of different ways to handle this in the right contexts below.
Just Work from BASH or KSH and Type db2 Before Every Command
This is my first choice for interacting with Db2 when I have ssh access to the database server. There are a couple of important details here. First, you don’t have to ssh to the database server using a privileged id. You can ssh using any id, then source the db2profile, and then specify the id with database privileges in the database connection statement. That looks something like this:
$ ssh ecrooks@server.name.example.com
...
$ db2 list db directory
-bash: db2: command not found
$ . /database/config/db2inst1/sqllib/db2profile
$ db2 list db directory
System Database Directory
Number of entries in the directory = 1
Database 1 entry:
Database alias = SAKILA
Database name = SAKILA
Local database directory = /database/data
Database release level = 15.00
Comment =
Directory entry type = Indirect
Catalog database partition number = 0
Alternate server hostname =
Alternate server port number =
$ db2 connect to SAKILA user db2inst1
Enter current password for db2inst1:
Database Connection Information
Database server = DB2/LINUXX8664 11.5.8.0
SQL authorization ID = DB2INST1
Local database alias = SAKILA
Once I have that connection, I can issue queries and further db2 commands by preceding each one with db2. That looks something like this:
$ db2 "select * from t1 with ur"
C1 U1 TEXT
-------------------- -------------------- --------------------------------------------------
1 101 test1
4 104 test4
3 103 test7
3 record(s) selected.
Other Command Line Interfaces
There are two command line interfaces that are more similar to the interfaces used with Oracle and MySQL. You can get to a db2 command line just by typing db2 and hitting enter. In that case, there is no need to precede queries with db2, but I’ve never found the command recall options intuitive with it. You can also use clpplus, which does have some nice formatting options, but can be a bit difficult to get working for interactive commands in a containerized environment.
Command Line Client on App Server or Workstation/Laptop on Same Network as DB Server
This has most of the same options as being local on the server when we’re talking about access to data, but requires an extra step of cataloging the remote database you want to access, if it is not already cataloged.
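As a sketch, cataloging a remote database from a client command line looks something like this. The node name, host, and port here are hypothetical placeholders; substitute your own (50000 is a common default Db2 port, but yours may differ):

```shell
# Catalog the server (node) first, then the database that lives on it.
db2 catalog tcpip node mynode remote server.name.example.com server 50000
db2 catalog database SAKILA at node mynode
db2 terminate    # refresh the directory cache

# Then connect as usual:
db2 connect to SAKILA user db2inst1
```

After that, db2 list db directory on the client shows the entry with a directory entry type of Remote rather than Indirect.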
GUI Client on App Server or Workstation/Laptop on Same Network as DB Server
I’ve never been much of a fan of GUI interfaces, but they can be nice if you’re working with many database platforms or if you’re more used to a GUI. Perhaps this is a great option for people coming over from MS SQL Server. There are a number of free and paid cross-RDBMS GUI database interfaces for this. As far as ease of use goes, I found DBeaver to make a lot of sense. I’ve known people who used the SQuirreL SQL Client and many others.
IBM offers a couple in this space as well. IBM Data Studio is a popular one with more developer features, and DMC (Database Management Console) is one with more features for administrators. DMC even comes in container form, which can be useful for those of us used to working with containers. Both are free.
Just like the command line tools, these GUIs will require you to enter the catalog information in some way.
Data Target: Human Eyeballs
The quickest way to get the data in front of human eyeballs is to establish a connection using any of the above and simply run a query. From a command line, that looks something like:
$ db2 "select * from syscat.bufferpools"
BPNAME BUFFERPOOLID DBPGNAME NPAGES PAGESIZE ESTORE NUMBLOCKPAGES BLOCKSIZE NGNAME
-------------------------------------------------------------------------------------------------------------------------------- ------------ -------------------------------------------------------------------------------------------------------------------------------- ----------- ----------- ------ ------------- ----------- --------------------------------------------------------------------------------------------------------------------------------
IBMDEFAULTBP 1 - -2 4096 N 0 0 -
1 record(s) selected.
Note that this isn’t a great format for human eyeballs – the fields end up right-padded with spaces to the maximum length of the text fields. Db2 doesn’t try to make this better for you in this interface. You can make it better by selecting substrings from longer fields or by using a different interface. It does let you just look at the data, and many of the queries you may read in my blogs include those substr functions for formatting.
A GUI may be a better choice for eyeballing it. GUIs will often let you enter any query, may have query building tools or an IDE, and often display output in a format similar to a spreadsheet. You just want to make sure you know what autocommit is doing and what your isolation level is, as Db2 can obtain and hold locks for simple read-only queries if you’re not careful. I’ve even seen carelessness in this area lead to outages.
Data Target: A Structured File (such as CSV)
This is one area where I’ve found myself disappointed in some other platforms. Getting a CSV out of Db2 is really a breeze. But let’s first look at simple output into a file.
Redirecting Query Output to a File
If all you really want is exactly what you see at the command line, you can simply dump it to a file like this:
$ db2 "select * from syscat.bufferpools" >temp.out
$ cat temp.out
BPNAME BUFFERPOOLID DBPGNAME NPAGES PAGESIZE ESTORE NUMBLOCKPAGES BLOCKSIZE NGNAME
-------------------------------------------------------------------------------------------------------------------------------- ------------ -------------------------------------------------------------------------------------------------------------------------------- ----------- ----------- ------ ------------- ----------- --------------------------------------------------------------------------------------------------------------------------------
IBMDEFAULTBP 1 - -2 4096 N 0 0 -
1 record(s) selected.
If you want to get rid of the headers (useful if you’re reading the file in to a simple shell loop or something), use the -x command line option:
$ db2 -x "select * from syscat.bufferpools" >temp2.out
$ cat temp2.out
IBMDEFAULTBP 1 - -2 4096 N 0 0 -
These files don’t need to be in any specific location – just wherever the user you’ve logged in as has write permissions. This works both on the server and from a command line client on another machine with a connection to the database.
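As a quick sketch of that shell-loop use case – the file here is written by hand to simulate db2 -x output, since no live connection is assumed:

```shell
# Simulate headerless output, as produced by something like:
#   db2 -x "select bpname, bufferpoolid from syscat.bufferpools" > temp2.out
printf 'IBMDEFAULTBP 1\nMYBP 2\n' > temp2.out

# Whitespace-separated columns read cleanly into shell variables
while read -r bpname bpid; do
  echo "bufferpool ${bpname} has id ${bpid}"
done < temp2.out
# prints: bufferpool IBMDEFAULTBP has id 1
#         bufferpool MYBP has id 2
```

This is exactly why the -x option is handy: with no header or record-count footer, every line of the file is data.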
These output files aren’t very “structured”, so let’s move on to more structure.
CSV or Other Formats
CSV is really easy to generate with db2, using the EXPORT command. You can export to any location on the server if you’re running the command on the server, and any location on the client if you’re running the command from the client. In its simplest form, the command to export the data from a table looks like this:
$ db2 "export to temp.csv of del select * from syscat.bufferpools"
SQL3104N The Export utility is beginning to export data to file "temp.csv".
SQL3105N The Export utility has finished exporting "1" rows.
Number of rows exported: 1
$ cat temp.csv
"IBMDEFAULTBP",1,,-2,4096,"N",0,0,
The of del in the command above tells db2 to create a delimited file, and the default delimiter is a comma. There are a ton of options on EXPORT – to change the row, column, and string delimiters, to use a fixed-length format or a custom Db2 format called IXF that includes the table definition, and many others. One option that mind-blowingly does NOT exist is including headers in the first row of the output. There’s no built-in way to make that happen. You can get the full list of columns for an entire table using something like this:
db2 -x "select '\"' || listagg(colname, '\",\"') || '\"' from syscat.columns where tabschema='SYSCAT' and tabname='BUFFERPOOLS'"
Then you can manually add those results as the first line of the file you exported. If your query isn’t just a full select of every column, you’ll have to tailor the column list manually. Using describe on a query is a decent way of getting the column names if you don’t otherwise know them:
$ db2 "describe select * from syscat.bufferpools"
Column Information
Number of columns: 9
SQL type Type length Column name Name length
-------------------- ----------- ------------------------------ -----------
448 VARCHAR 128 BPNAME 6
496 INTEGER 4 BUFFERPOOLID 12
449 VARCHAR 128 DBPGNAME 8
496 INTEGER 4 NPAGES 6
496 INTEGER 4 PAGESIZE 8
452 CHARACTER 1 ESTORE 6
496 INTEGER 4 NUMBLOCKPAGES 13
496 INTEGER 4 BLOCKSIZE 9
449 VARCHAR 128 NGNAME 6
This isn’t exactly an easy format to deal with programmatically.
Minus the column header issue, this is pretty easy.
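Putting the pieces together, prepending the header can look like the sketch below. The header string is hard-coded here to stand in for the listagg output, and temp.csv is recreated to match the EXPORT example above:

```shell
# Recreate the exported file from the EXPORT example
printf '"IBMDEFAULTBP",1,,-2,4096,"N",0,0,\n' > temp.csv

# In practice, capture this from the listagg query shown earlier;
# it is hard-coded here for illustration
header='"BPNAME","BUFFERPOOLID","DBPGNAME","NPAGES","PAGESIZE","ESTORE","NUMBLOCKPAGES","BLOCKSIZE","NGNAME"'

# Write the header, then the exported data, to a new file
{ echo "$header"; cat temp.csv; } > temp_with_header.csv
```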
GUIs
Many of the database GUIs have a way to ask for output as CSV or other formats, so if you’re using one of those, look at the documentation to determine if this is possible.
Data Target: Another Db2 LUW database
Whole Database
Backup/Restore
By far the fastest option for moving an entire Db2 database from one server to another with compatible operating systems is by using Db2’s backup command and restore command. Assuming the Db2 version is the same, you can use an online backup, taking no outage on the source. This is often many orders of magnitude faster than any other option. If the servers are roughly the same power, we expect a restore to take about 0.5-1.5 times the duration of the backup.
Notice I said between compatible operating systems and the same version. Generally big-endian Unix/Linux, little-endian Unix/Linux, and Windows are each compatible within themselves, but not with each other. See the IBM documentation on cross-platform restores for details. Also, when restoring between Db2 versions, you can never restore a backup from a newer version into an older version, and when restoring from an older version to a newer one, you cannot roll forward. That means you cannot restore from an online backup, so an outage of the source (or a transitional restore elsewhere) is needed.
db2move
There is a tool called db2move to make exporting and then import/load easier. Most of the time, that’s used with db2look to rebuild the database structure. This is complicated and also error-prone, particularly in real systems with hundreds or thousands of tables and other objects to re-create. You also have to move the data, once exported, between systems, which can be the longest part of the process, depending on what the network looks like.
Specific Tables
Individual Table Export and Import/LOAD
Tables can be exported to files and then either IMPORTed (durable) or LOADed (fast) into the target database. This can seem pretty manual if you’re doing a lot of tables. The transfer time can also be significant for the files, depending on the data size and your network characteristics.
Load from Cursor
One of the fastest non-restore data movement methods is load from cursor. The IBM documentation on it is what I go to whenever I use it, and it can seem like magic sometimes: by transferring data incrementally while loading it, it can reduce the overall time. It requires a network connection between the source and target, which isn’t always possible.
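As a sketch of what that looks like – the database names, credentials, and table here are hypothetical, both databases must be reachable from where the commands run, and the commands belong in a single CLP session:

```shell
# Connect to the target database, then declare a cursor against the
# source database and load directly from it -- no intermediate file.
db2 connect to TARGETDB
db2 "declare mycurs cursor database SRCDB user db2inst1 using mypassword for select c1, u1, text from t1"
db2 "load from mycurs of cursor insert into t1"
```

Remember that LOAD is not logged the way INSERT is, so understand the table-state and recovery implications before using it on anything important.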
Ongoing Data Synchronization
None of these methods address anything beyond the initial data movement. If ongoing synchronization is required, look into HADR, one of the many replication methods, Db2 Click to Containerize, or IBM Lift.
Data Target: Some other RDBMS
Perhaps the easiest way to move data to another RDBMS is to enable federation, create nicknames for the target tables, and insert the data. This allows you to work in the source Db2 database with tables from the target RDBMS much as if they were local tables. I’m fairly sure this isn’t the fastest option, as many RDBMSes have faster ways of getting data in, but it depends on your target. It’s worth a try, anyway.
Assuming the target RDBMS has a way of ingesting CSVs, exporting to files and then ingesting them on the target is probably a good option, as long as you watch out for things like your column and string delimiters appearing within the data you are moving.
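One rough sanity check for that problem is comparing the raw field count per row. This naive awk does not understand CSV quoting, so rows where a quoted string legitimately contains the delimiter will also show up, but it’s a quick first screen. The sample file is hypothetical:

```shell
# A sample export where one field contains an embedded, unquoted comma
printf 'a,b,c\nd,e with, comma,f\n' > export.csv

# Well-formed rows should all report the same field count
awk -F',' '{print NF}' export.csv | sort -u
# prints: 3
#         4   (the second row has one field too many)
```

If more than one count comes back, either switch the EXPORT to a delimiter that does not appear in your data (the modified by coldel option) or inspect the flagged rows by hand.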
Summary
This is such a huge topic. Data movement can get very involved, and most DBAs have war stories. Hopefully there’s something here to point you in the right direction, along with terms to search for more information.