BLU has been all the hype for DB2 for the past year or more. What exactly is DB2 BLU? How does it work? What do you need to consider before jumping on the bandwagon?
What is DB2 BLU
BLU leverages the DB2 engine but is a significantly different animal in many ways from DB2 without BLU. BLU combines a number of things to get blazing fast performance for analytics workloads. IBM is working on it, but BLU is NOT for OLTP or e-commerce database workloads at this time – it will actually slow them down.
While traditional relational database tables are organized by and worked with in terms of rows, BLU works with things on a columnar basis. It still maintains the row relationships, but when it stores data on pages, it does so by column and not by row. This adds some compression efficiency and speed when summarizing data
One of the pillars of BLU is “actionable compression”. This is the compression that DB2 has used to date, with some additional enhancements and the page-level compression efficiency of having many values for the same column on a page. The actionable part is something DB2 is proud of, and should be – many comparisons and such can be done on the data without uncompressing it, making not just the amount of data on disk smaller, but also the amount of data in memory smaller as well.
BLU is not an exclusively in-memory database – it couldn’t be for analytics. But it uses more in-memory techniques than DB2 DBAs may be used to dealing with. It is memory and CPU intensive as a result. IBM recommends that you have 10% of the compressed size of your database in memory to handle the memory requirements – and make sure you get the latest compression of your data before panicking at that suggestion – compression can be significant if you’re coming from uncompressed to compression or even one of the previous version’s compression to 10.5 compression. IBM recommends that you use no fewer than 8 cores for BLU databases.
The data still goes through DB2’s tried and true bufferpools. Which is an advantage, really, since it’s not some new start-up that hasn’t worked out the best algorithms for what data is in memory. If you want to tell DB2 which tables have priority in memory, you can put them in separate tablespaces/bufferpools to reserve the space for them. You don’t have to wait to query tables until they’re loaded in memory, and you still get the benefit of prefetching when you query something you haven’t hit in a while.
Obsolescence of Indexes
Since it is such a different animal, IBM does not even allow the creation of indexes in BLU. As a DBA, this feels wrong in some ways, and I wonder if IBM will allow the use of indexes in future versions at all. This does seem to make the syntax a lot easier in some ways. IBM does still allow primary key and unique key constraints, but the “CREATE INDEX” statement will just fail with SQL1667N. I’m fascinated to see how well this works.
Limited support for MQTs
With 10.5 FixPack 4, MQTs became supported with BLU, but your MQTs must have the same organization (row or column) as the tables they are based on, and must maintained either manually or by replication. SYSTEM maintained tables are not supported, which means that if you must use MQTs, then you have to develop a plan/script for maintaining them. I wonder if they’re also thought to be obsolete like indexes.
IBM is pushing automatic maintenance with BLU. I’ll share my opinion on that once I’ve worked with it some.
What BLU is Not
BLU is not the magic bullet for all performance problems. We hear great edge cases from IBM, but do you think they’d trumpet any mediocre results? I’m just getting my hands on it in the real world, and am excited to see how it does in real use.
BLU is not for OLTP, e-commerce, or transactional workloads – singleton row update/insert/delete can be expected to be 2-3 times slower on BLU. They’re working on that with shadow tables and more behind the scenes stuff, but it’s just not there yet. You can mix row-organized and column-organized tables in a single database, but the tuning is complex to make that work.
Many of the tenets of good database administration still apply. Consider your recovery, high availability, and disaster recovery plans. Think about table placement in tablespaces and the mapping of bufferpools. Keep up with new fixpacks, and so forth.
How to get BLU
BLU is included with the advanced editions of DB2 – so you can’t get it in express-c, but you have it in Advanced Workgroup Server Edition (AWSE) and Advanced Enterprise Server Edition(AESE). It became available in DB2 10.5, and don’t go for anything lower than FixPack 4 if you plan to use it. Always verify my statements with IBM for licensing issues, of course.
I’m working with BLU some myself now – expect more blog articles as I face and tame some of the real world challenges.