Scripting for DB2 LUW – Best Practices for Perl and Other Languages – Part 1

This is my own set of basic scripting standards and practices. I’d love to hear criticisms, suggestions, and comments from readers on these. Some of these will be basic “duh” moments for a developer, but for a DBA from a different background they may be more enlightening. This ended up being so long that I’ve split it into three parts.

Script Header

Each of my scripts starts with a header with basic information about the script. The header looks like this:

#!/usr/bin/perl -w
# Script Name   :
# Author        : Ember Crooks
# Company       : Rosetta
# Date Written  : July 2008
# Script Version: -v
# DB2 Version   : 8.2
# Purpose       : To reset database monitors, sleep, then take snapshots
# Usage         : -h
# Cfg file      :
# Change Log    :
# Date     Chg by       Ver.    Description
# -------- ------------ ------- ---------------------------------------
# 07/17/09 Ember Crooks 1.0.1   Fixed file archiving and retention
# 07/2010  Ember Crooks 1.1.0   ****CFG FILE CHANGE****
#                                Added multiple database support
# 08/2010  Ember Crooks 1.1.1   Bug Fix: One db failing should not stop
#                                snapshots for other databases being
#                                written.
# 11/2010  Ember Crooks 1.1.2   Bug Fix: Archiving failing

The main point here is to have the basic information right at hand. Who wrote the script (so I can contact them if I need help later)? How long ago was the script written? What DB2 version was it written for? What’s the basic overall function of the script? How do I get usage and version information? What is the config file (more about config files later)? And finally, what major changes have been made, so that if I encounter a problem I know whether a more recent version is likely to have fixed it, and whether a config file change means I have to redo my configuration. I probably refer to this section of one script or another once a week when I’m tracking down an issue or reviewing servers to make sure I’ve got the most recent versions out there.

Track Versions

It becomes obvious to anyone who has to maintain and troubleshoot a suite of scripts over time that versioning is critical. Is this the version of my backup script from before I fixed that bug or after it? You don’t know unless you do at least primitive versioning. To me, the most critical part here is documentation of what makes this version different from another – and I track that in the script header, which is reasonable for the small stuff I write. I also store my scripts in a version control system. I happen to use Mercurial because it is what the other technical minds at my company chose. You could use Subversion or Git or whatever. The features of the tool are not as important as they are for full-fledged code development, because the number and size of scripts are comparatively small, and the number of people working on them is also small. Even if you don’t have a repository like this, just a central location that makes sense for your company and teams will work – a SharePoint site, a Lotus Notes workspace, a wiki (all of which I’ve used before) – whatever.

Naming Standards

You should have your own naming standards for several different items. This will help you identify items and make sure your code is understandable and clean. Here are the naming standards I generally use:

  • Script names: – where script_purpose is replaced with a one- to three-word, underscore-delimited, all-lowercase description of the general purpose of the script
  • Config file names: – where script_purpose is the same as the script the configuration file belongs to
  • Script output: db2.script_purpose.time.log – where script_purpose is the same as for the script the output belongs to, and time is some sort of time identifier
  • variable names: variable_name – preferring long descriptive multi-word variable names, all lower case and underscore delimited. Usually plural for a hash or array
  • subroutine names: subroutine_name – preferring long descriptive multi-word subroutine names that usually include an action word, all lower case and underscore delimited
The important thing is not which naming standard you pick so much as having a naming standard at all. Like camel case? Awesome, go for it. I chose my underscore-heavy naming standard because it reads more naturally to me. I also have a special hatred of short variable names that don’t tell me what the variable is – the code is so much harder to read that way, and when we are talking about scripts that are almost always (drastically) less than 2,000 lines of code, the savings just isn’t worth it.

Use Modules/Subroutines/Functions

There are many pieces of code that you can re-use within a script, or sometimes even from script to script. It can be annoying as heck to try to update and improve a script that is just one long sprawl of code. Common subroutines that almost all of my scripts have include:

  • init_script – anything that has to be done before anything else – opening output files, setting variables that I’ll use throughout, that sort of thing.
  • version – returns version number
  • usage – returns a syntax diagram and description – which I love using a here document for
  • get_user_input – parses command line input, and frequently ends up calling the other subroutines based on that input
  • db2Error* – a collection of subroutines I use for processing the output of a db2 command for errors
  • log_cleanup – deletes older output files of this script
  • arch_file – archives (compresses and moves) older output files of this script
And then of course, I have other subroutines to perform the actual work of the script, which vary by script.
One big thing I try to do along these lines is to notice when I’m doing something two or more times and think about putting it in a subroutine. Clearly you can take it to extremes – this is, after all, just a script, and not a large compiled and commercially sold program – finding the right balance is key.
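As a minimal sketch, here is roughly what the version, usage, and get_user_input subroutines might look like – the script name, version number, and option letters below are just illustrative, not from any particular script:

```perl
#!/usr/bin/perl -w
use strict;

# Illustrative values - a real script would set these in init_script
my $script_name    = 'db2_snapshot.pl';
my $script_version = '1.1.2';

sub version {
    print "$script_name version $script_version\n";
}

sub usage {
    # A here document keeps the syntax diagram readable in the source
    print <<"END_USAGE";
Usage: $script_name [-h] [-v] [-c config_file]
  -h   print this usage information and exit
  -v   print the script version and exit
  -c   specify an alternate configuration file
END_USAGE
}

sub get_user_input {
    for my $arg (@ARGV) {
        if    ($arg eq '-v') { version(); exit 0; }
        elsif ($arg eq '-h') { usage();   exit 0; }
    }
}

get_user_input();
```

The here document in usage is what makes this pleasant – the syntax diagram in the source looks exactly like what the user sees.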

Handling script output

There are two main approaches here:

  1. Your script sends everything to stdout, and you then put it somewhere via the command line or cron every time you execute the script
  2. Your script writes output to a pre-determined file

The disadvantages of the first approach include the possibility of forgetting to redirect the output, or of someone not realizing that the only output is what is on their screen and losing it. Also, if you use cron to redirect it to a file, you either have to build a date component into the file name in your cron entry or overwrite your output every time. You would also need a separate process to clean up old output files.

I’ll address the second approach, as it’s my preferred one. With this approach, you specify an output directory and output file name in the script itself. My naming standard for script output is script_name.yyyymmddhh.log, where yyyymmddhh represents the time at which the script was executed. I also make the output directory something that is easily changed in my script’s config file so that I can easily accommodate different filesystem or directory structures. Yes, I have a default directory structure, but I don’t count on having it, because varying clients have an amazing array of varying requirements for such things.
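Building the timestamped file name is easy with strftime from the POSIX core module. In this sketch the directory and script name are placeholder values – in a real script the directory would come from the config file:

```perl
use strict;
use POSIX qw(strftime);
use File::Spec;

# $output_dir would normally be read from the config file;
# the value here is only an illustrative default
my $output_dir  = '/db2home/db2inst1/scripts/logs';
my $script_name = 'db2_snapshot';              # hypothetical script name

# Build db2_snapshot.yyyymmddhh.log from the execution time
my $timestamp = strftime('%Y%m%d%H', localtime);
my $log_file  = File::Spec->catfile($output_dir, "$script_name.$timestamp.log");

open my $log_fh, '>', $log_file
    or die "Cannot open $log_file: $!\n";
print $log_fh "Script started\n";
close $log_fh;
```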

Deleting older log files

And because of this approach, I also have to go out and delete the output files when they’re old. The majority of my scripts have archiving and retention parameters in the config file, and on every execution they go out and archive and/or delete files based on those parameters. It’s a piece of code that is easy to re-use from script to script.
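A minimal sketch of that cleanup logic might look like this – the directory and retention value are made-up examples standing in for what the config file would supply:

```perl
use strict;
use File::Spec;

# These would come from the config file; the values are examples only
my $output_dir     = '/db2home/db2inst1/scripts/logs';
my $retention_days = 30;

opendir my $dh, $output_dir or die "Cannot open $output_dir: $!\n";
for my $file_name (grep { /\.log$/ } readdir $dh) {
    my $full_path = File::Spec->catfile($output_dir, $file_name);
    # -M gives the file's age in days relative to script start time
    unlink $full_path if -M $full_path > $retention_days;
}
closedir $dh;
```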

There’s also a trick that some people use that’s pretty slick. If you name your files like this: script_name.dd.log where dd represents the day of the month, you will automatically keep just one month’s worth of files without having to go out and delete anything.
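In Perl, that trick is just a different strftime format in the file name (the script name here is again a made-up example):

```perl
use strict;
use POSIX qw(strftime);

# Day 17 of this month silently overwrites day 17 of last month,
# so a month of history is kept with no cleanup code at all
my $day_of_month = strftime('%d', localtime);
my $log_file     = "db2_snapshot.$day_of_month.log";
```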

I’m generally not a fan of scripting such that all output is written to one big long growing file. I find it easier to manage individual files.

Platform consideration and output files

This is also one of the main areas where you have to consider platform. Windows slashes fall the wrong way! So I code using a few Perl core modules – File::Path, File::Copy, and File::Spec. Using these will avoid things like this:

my $db_arch_dir = "$arch_dir/$db_name";

In favor of this:

my $db_arch_dir=File::Spec->catdir($arch_dir, $db_name);

It’s really not hard to do this up front to ensure you can support both flavors of file specification. I’ve been able to port scripts mostly tested and used on Unix/Linux systems to Windows with a very small amount of effort due to thinking about such things up front.
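For example, the same core modules handle creating a per-database archive directory and moving a log file into it, with no hard-coded slashes anywhere – the paths and file name below are made up for illustration:

```perl
use strict;
use File::Spec;
use File::Path qw(mkpath);
use File::Copy qw(move);

# Hypothetical values - a real script would pull these from its config
my $arch_dir = '/db2home/db2inst1/scripts/archive';
my $db_name  = 'SAMPLE';

# catdir/catfile insert the right separator for the platform
my $db_arch_dir = File::Spec->catdir($arch_dir, $db_name);
mkpath($db_arch_dir) unless -d $db_arch_dir;

my $old_log = 'db2_snapshot.2010111709.log';
move($old_log, File::Spec->catfile($db_arch_dir, $old_log))
    or warn "Could not archive $old_log: $!\n";
```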

Keep an eye out for Part 2 later in the week.

Lead Db2 Database Engineer and Service Delivery Manager, XTIVIA
Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 17 years developing top-level expertise on Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set into the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares both posts about her core skill set and her journey into Data Science. Ember lives in Denver and works from home for XTIVIA, leading a team of Db2 DBAs.
