Informational Constraints – Benefits and Drawbacks

One of the most frustrating things a DBA can experience is troubleshooting due to bad data. The client is upset because rows are missing or incorrect data is returned. The client-facing web front end could be displaying gobbledygook because the data retrieved makes no sense. Resources and energy are burned on an issue that is easily solved with the proper use of constraints.

So why would I ever want a check constraint to be created on a table but NOT ENFORCED?

Constraints are a double-edged sword. They protect the database from poor data quality at the cost of the overhead associated with enforcing each constraint. If you load a large amount of data that must be verified against those constraints, the overhead can be significant.

If the application is already configured to verify the data being fed into the database, do we really need the additional overhead of database constraints? Data verification in both the application and the database is a duplication of effort, and it leads to wasted resources and longer processing time.

A constraint that is NOT ENFORCED informs the database manager what the data would normally look like and how it would behave under the constraint, but data that violates the constraint is not prevented from entering the target table. This is called an informational constraint.

What is the benefit of an informational constraint? The optimizer can base its query plan on what it believes the data should look like, given the constraint's definition. This can lead to improved performance.

The catch? What happens with bad data? Let’s take a look at an example.

In this specific scenario, assume we have a simple table that is the back end of a project management tool. On the web front end, the application takes in a project name and its current status. Responses can be “Green” for a project in good health, “Yellow” for a project facing challenges, and “Red” for a project at a standstill.

The table definition could look like this (notice the NOT ENFORCED clause):

 

CREATE TABLE command with an informational constraint.
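A minimal sketch of such a definition, using hypothetical table, column, and constraint names (PROJECT_STATUS, PROJECT_NAME, STATUS, CHK_STATUS) rather than anything taken from the original example:

    -- Hypothetical names, for illustration only.
    CREATE TABLE PROJECT_STATUS (
        PROJECT_NAME VARCHAR(50) NOT NULL,
        STATUS       VARCHAR(10),
        CONSTRAINT CHK_STATUS CHECK (STATUS IN ('Green', 'Yellow', 'Red'))
            NOT ENFORCED
            ENABLE QUERY OPTIMIZATION
    );

The check itself is defined normally; only the NOT ENFORCED and ENABLE QUERY OPTIMIZATION attributes at the end make it informational.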

 

Once the data is inserted, we have the following project data:

Project Management Data
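For illustration, inserts like these (project names invented for the sketch) would produce that kind of data:

    -- Sample rows that all satisfy the check constraint.
    INSERT INTO PROJECT_STATUS (PROJECT_NAME, STATUS)
    VALUES ('Website Redesign', 'Green'),
           ('Data Center Move', 'Yellow'),
           ('ERP Upgrade',      'Red');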

 

This is great! If the application is vetting data before it comes in, the constraint is NOT ENFORCED, so there is no verification overhead, and the ENABLE QUERY OPTIMIZATION clause tells DB2 that it can lean on this constraint to help generate a good access plan.

But what happens if the data is not vetted by the application? For example:

What happens when bad data is inserted?
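Something along these lines, again using the hypothetical table, sails straight through because the constraint is NOT ENFORCED:

    -- 'Purple' violates the check, but DB2 does not stop the insert.
    INSERT INTO PROJECT_STATUS (PROJECT_NAME, STATUS)
    VALUES ('Mobile App Rollout', 'Purple');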

The table would be loaded with incorrect data that violates the constraint. Our table is now in the following state:

 

Bad data bypasses a NOT ENFORCED constraint and is inserted.
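Note that a plain scan with no predicate gives the optimizer nothing to prune, so against the hypothetical table it would still show the bad row:

    -- Returns every row, including the invalid 'Purple' one.
    SELECT PROJECT_NAME, STATUS
    FROM PROJECT_STATUS;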

 

Here is the weakness of an informational constraint. The optimizer assumes that rows violating the constraint don't exist in the table. As a result, a query may not return the invalid rows even when they should be returned. Although they exist, DB2 is convinced otherwise because of the informational constraint.

So the following query could return the following output:
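A sketch of that kind of query, reusing the hypothetical table and the invalid 'Purple' row:

    -- The optimizer trusts CHK_STATUS and assumes STATUS can only be
    -- Green, Yellow, or Red, so it can treat this predicate as
    -- unsatisfiable and return nothing, even though a 'Purple' row
    -- is physically in the table.
    SELECT PROJECT_NAME, STATUS
    FROM PROJECT_STATUS
    WHERE STATUS = 'Purple';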

Returned rows are missing data that is actually there.

 

Notice that the optimizer believes those rows can't be there, so the invalid rows are not displayed. IBM's Knowledge Center (as well as other training material) states that the rows may not display, which leads me to believe that two separate queries could return two different results. Bad data now leads to inconsistent query results.

This weakness of an informational constraint could be too large a point of failure for some administrators. However, the benefits of an informational constraint seem to outweigh the risks if your application is configured properly. Imagine the overhead saved and the speed increase on a massive data warehouse with an overnight load cycle of hundreds of gigabytes.
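For administrators who decide the trade-off is worth it, an existing enforced check constraint can also be flipped to informational (and back) with ALTER TABLE. A minimal sketch, reusing the hypothetical names from above:

    -- Stop enforcing the check; with ENABLE QUERY OPTIMIZATION still in
    -- effect, the optimizer keeps using it for access plans.
    ALTER TABLE PROJECT_STATUS
        ALTER CHECK CHK_STATUS NOT ENFORCED;

    -- Turn enforcement back on once the data has been cleaned up.
    ALTER TABLE PROJECT_STATUS
        ALTER CHECK CHK_STATUS ENFORCED;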

 


Michael Krafick is an occasional contributor to db2commerce.com. He has been a production support DBA for over 12 years in data warehousing and highly transactional OLTP environments. He was acknowledged as a top ten session speaker for “10 Minute Triage” at the 2012 IDUG Technical Conference. Michael also has extensive experience in setting up monitoring configurations for DB2 Databases as well as preparing for high availability failover, backup, and recovery. He can be reached at “Michael.Krafick (at) icloud (dot) com”. Linked-in Profile: http://www.linkedin.com/in/michaelkrafick. Twitter: mkrafick

Mike’s blog posts include:
10 Minute Triage: Assessing Problems Quickly (Part I)
10 Minute Triage: Assessing Problems Quickly (Part II)
Now, now you two play nice … DB2 and HACMP failover
Technical Conference – It’s a skill builder, not a trip to Vegas.
Why won’t you just die?! (Cleaning DB2 Process in Memory)
Attack of the Blob: Blobs in a Transaction Processing Environment
Automatic Storage Tablespaces (AST): Compare and Contrast to DMS
DB2 v10.1 Column Masking
Automatic Storage (AST) and DMS
Relocating the Instance Home Directory

Michael Krafick is an aspiring Zamboni driver and well known twig-eater. During the day he is a Sr. Database Engineer for a Fortune 500 company. He is a frequent contributor to datageek.blog, an IBM Champion, a member of the DB2 Technical Advisory Board, and an IDUG Speaker Hall of Fame inductee. With over 18 years of experience in DB2 for LUW, Mike works hard to educate and mentor others.
