Learning SQL can feel intimidating, even to some experienced DB2 DBAs. For years, you could administer Db2 without using SQL. There are a lot of developers, too who want to learn SQL. The set-thinking of SQL can be confusing. The complete detachment from the physical layer can confound others.
SQL, like many other things is something you can learn the basics of in a few hours, but to be a master, you have to spend hundreds or thousands of hours in real-life scenarios, and include an understanding of performance to really master. I spent a couple of years focusing my learning time on SQL several years ago, when I realized it was a weakness. I’ve come a long way, but there are still masters who make me look like a grasshopper.
The last entry in this series, Db2 SQL Basics: The Most Basic Select Statement, started with an SQL statement that is as simple as it gets – selecting all rows and all columns from a table. In this post we take just one step forward to select only specific columns. The methodology shown in this week’s article covers selecting specific columns.
What is a Column, and Why Limit the Columns Returned?
In a relational table, each row of the table represents an entity or occurrence, while each column represents one and only one attribute of the entities or occurrences. When writing SQL, the vast majority of the time, we need only a small subset of the columns or attributes. It should be rare that we’re returning everything. It is actually a bad coding practice to use “select *” for two reasons:
- Even if we need every column, SQL written for applications should generally specify the column names to future-proof the statement. Specifying the column names means that the result set cannot suddenly change shape when a column is added later. While some programming languages can handle unexpected columns, others cannot.
- The less data returned, the better performance. While this statement is not always true, it is true in the vast majority of situations. In some applications, particularly when using BLU (column-organized tables), the performance can be exponentially better.
Even if the tables are not fully normalized as we would expect in a relational database, the above still holds true. Many relational databases are not fully relational in their implementation.
Returning only Certain Columns/Attributes
The syntax used for returning only certain columns is quite simple. Insted of *
, we simply specify the names of the columns that we want to return. In the last blog entry in this series we used this example:
$ db2 "SELECT * FROM SYSCAT.BUFFERPOOLS" BPNAME BUFFERPOOLID DBPGNAME NPAGES PAGESIZE ESTORE NUMBLOCKPAGES BLOCKSIZE NGNAME -------------------------------------------------------------------------------------------------------------------------------- ------------ -------------------------------------------------------------------------------------------------------------------------------- ----------- ----------- ------ ------------- ----------- -------------------------------------------------------------------------------------------------------------------------------- IBMDEFAULTBP 1 - -2 8192 N 0 0 - 1 record(s) selected.
More useful than that is to return only select column names. Try this instead:
$ db2 "select bufferpoolid, bpname, pagesize from syscat.bufferpools" BUFFERPOOLID BPNAME PAGESIZE ------------ -------------------------------------------------------------------------------------------------------------------------------- ----------- 1 IBMDEFAULTBP 8192 1 record(s) selected.
The above information may be far more useful if all I need to know is the name of each buffer pool and its page size.
What are the Column Names for a Table?
Sometimes you don’t know the data structure for a table you’re querying. In this case, you can find that information by using the DESCRIBE
command. It is used at the command line like this:
$ db2 describe table syscat.bufferpools Data type Column Column name schema Data type name Length Scale Nulls ------------------------------- --------- ------------------- ---------- ----- ------ BPNAME SYSIBM VARCHAR 128 0 No BUFFERPOOLID SYSIBM INTEGER 4 0 No DBPGNAME SYSIBM VARCHAR 128 0 Yes NPAGES SYSIBM INTEGER 4 0 No PAGESIZE SYSIBM INTEGER 4 0 No ESTORE SYSIBM CHARACTER 1 0 No NUMBLOCKPAGES SYSIBM INTEGER 4 0 No BLOCKSIZE SYSIBM INTEGER 4 0 No NGNAME SYSIBM VARCHAR 128 0 Yes 9 record(s) selected.
This gives me an easy formatted list of every column in the table or view, and what data type each column uses. Many GUIs also provide easy access to this information.
More complex entries in this series are coming, so stay tuned!