The keyword in the title of this post is “basics”. I spend some time training SQL Server or Oracle DBAs on DB2, and so this is the answer on the most basic level to how to identify indexes to add. There are certainly greater experts on this topic than me, so I don’t claim to have a comprehensive view on Indexing. But at the same time, there is a basic place where you can start and build your own strategy from there. This applies to all DB2 databases, though I’ve thrown in WebSphere Commerce database specifics as well.
Why is indexing important?
Indexing is important for database performance. Especially in an e-commerce database, we see a lot of what we call singleton-row queries. These are queries that return just one row. Ideally, DB2 will look up that one row in an index, and return it without reading much data. If an appropriate index is not available, the query will have to do a Table Scan, where it evaluates each row of the table one by one to see if it meets the query’s requirements. Even for the queries that return multiple rows, it is usually much more efficient to use an index to access the data than to scan through every row in what may be a million-row or more table.
I have personally seen just 4 added non-clustering indexes take a WebSphere Commerce database from un-usably busy (website down) to just normally active. And that was on one of the larger database servers I’ve seen lately. Indexes make a big difference.
How to identify where indexes might be helpful
The most basic way to identify where indexes may be useful is to identify problem SQL, particularly problem SQL that shows up as a problem in the area of Rows Read. The main reason for this is that frequently a high number of rows read indicates table scans are occurring. Once we have the SQL identified we can run the Design Advisor(which I describe how to do in SQL Analysis Overview). The Design Advisor is a starting point. It will come up with index recommendations for indexes based on the SQL you provide as input, and will also provide you with an estimate of how much that index will help the query you provided as input.
So once you have the Design Advisor recommendations, you’re not done yet. The first thing you need to do as look at the tables for which indexes are recommended and the existing indexes on those tables. Whenever you add an index, it improves select performance, but it also invariably hurts insert/update/delete performance. Each index is also a physical object that takes up space on disk. Here are some of the questions I ask myself:
What kind of table is this index on?
I have three different categories I place the WebSphere Commerce tables in for this purpose:
- Product catalog tables/Stagingprop tables – These tables are usually only updated in your production database during a stagingprop – the end-user activity on these tables is strictly select activity and can be either singleton-row or multiple row. Because of this activity pattern, I’m likely to add indexes because they’re unlikely to hurt where we care most about performance. There are limitations to that, of course. Examples include: CATENTRY, ATTRIBUTE, ATTR
- Order/User related tables – These tables are ones where an end-user will be sitting and waiting for an insert or update to complete. The access to the tables is nearly always singleton-row. Due this activity pattern, I am extremely critical of any index additions – I’ll still do them, but they have to be extreme improvements for qureies that are frequently executed. Examples include: USERS, MEMBER, ORDERS, ORDERITEMS
- Configuration/Other tables – These tables are rarely changed, though they may be frequently accessed. They are not directly associated with a user, order or product. I’m quite likely to add indexes because of the use patterns, though it’s rare the SQL shows up as the worst since the tables are small to medium sized and the frequency of queries is generally low. Examples include: SITE, SRCHCONF, STOREENT
What other indexes exist on the table?
There are a lot of judgement calls in this process, and one of them is the level of data duplication you’re willing to do. By it’s very definition any index is a duplication of data on disk. Considering each index is a balancing act. There are situations in which having indexes on say (CATENTRY_ID, PARTNUMBER) and another on (PARTNUMBER, CATENTRY_ID) is a clear duplication of data, but if you have a couple of different queries that are very frequently executed, it may be worth it (there would be some convincing to do). When looking at the output from the Design Advisor, consider both the % improvement you will see and the frequency of execution of the query. A one-time query rarely deserves its own index.
One place I tolerate nearly identical indexes is when the index I’d want to drop is one of the base WebSphere Commerce indexes – they shouldn’t be dropped.
Syntax and other considerations
Assuming you’re running the Design Advisor on just one problem query, it’ll probably include some drop index statements in the recommendations. DO NOT DROP THE INDEXES THE DESIGN ADVISOR RECOMMENDS YOU DROP. Those recommendations are based on the the idea that the query you provided as input is the only query every executed against the table(s) in question. That’s basically never true, so don’t follow the recommendations for dropping indexes.
Please change the index name. You can pick whatever naming standard you like. I tend to prefer TABLENAME_IX01, but there are many thoughts on the issue – just don’t use the cryptic ones that db2 comes up with.
Also, I generally remove the sytax on DB2 9.7 for sampled statistics, and just use my favorite runstats statement after the index creation:
db2 "runstats on table schema.tablename with distribution and detailed indexes all"
After creation verify the index(es) are being used
Once you’ve created the indexes and done runstats, run an explain on the query you were indexing for to verify that query is now using the index. This is important to verify. If it’s not using it, look at things like additional Design Advisor suggestions and so forth.
Other ways of identifying indexes
So the Design Advisor is one way of identifying indexes. It doesn’t find everything, though. Once you get good at reading an explain plan, you can play around and see what you can find yourself. I look first for places where table scans are happening and see if I can put the information from that part of the query into an index. I then look at the timerons for each part of the query, and locate the more expensive actions and see if perhaps having the data in order in an index might help me avoid a sort or use a different kind of join. This is even more of an art and a matter of trial and error than using the Design Advisor, but I have been able to find modest improvements in this way that the Design Advisor missed.
Other than starting with the SQL, another interesting approach is to start at the table level. You could start by calculating the Index Read Efficiency for each table, and find your tables with problems. From there you can specifically look for SQL related to that table and then step into the above process from there.
Indexing restrictions
An index in DB2 9.7 is limited to 64 columns, but the more limiting factor is that an index key can only be up to 1022 bytes – this means that there may be some indexes you want to create that you cannot. You also cannot include CLOBs in indexes (would you really use a clob for data that was only 1022 bytes or less?).
There are also some uses of functions that will preclude the use of an index by a query. So watch the use of functions in SQL. A simple UPPER in the wrong place can cause db2 to choose a table scan instead. Here’s hoping DB2 adds support for functions within indexes – there are cases where I would love to be able to create an index not just on say NAME but on UPPER(NAME). To my knowledge, this is not possible in DB2, though I believe Oracle now supports it.
References/Other Ideas
So these are only the very most basic ideas on indexing. There are many more considerations like improving performance through removing indexes, adding indexes on tables with as little as one row, and at least a dozen other ideas out there. Here are a few links to interesting thoughts on indexing:
http://edn.embarcadero.com/article/39887 (great presentation on more in-depth concerns with indexes)
http://www.dbisoftware.com/blog/db2_performance.php?id=131
http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2010_Issue4/DataArchitect/index.html
http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2011_Issue1/DataArchitect/index.html
http://www.ibm.com/developerworks/data/library/techarticle/0303kuznetsov/0303kuznetsov.html
http://catterallconsulting.blogspot.com/2009/05/much-ado-about-db2-indexes-part-2.html
Overall, indexes can be extremely helpful, but you should consider each one very carefully – don’t just blindly add them, even if the Design Advisor recommends them.