DB2 Basics: What is BLU?

Posted by

BLU has been all the hype for DB2 for the past year or more. What exactly is DB2 BLU? How does it work? What do you need to consider before jumping on the bandwagon?

What is DB2 BLU

BLU leverages the DB2 engine but is a significantly different animal in many ways from DB2 without BLU. BLU combines a number of things to get blazing fast performance for analytics workloads. IBM is working on it, but BLU is NOT for OLTP or e-commerce database workloads at this time – it will actually slow them down.

Columnar

While traditional relational database tables are organized by and worked with in terms of rows, BLU works with things on a columnar basis. It still maintains the row relationships, but when it stores data on pages, it does so by column and not by row. This adds some compression efficiency and speed when summarizing data

Compression

One of the pillars of BLU is “actionable compression”. This is the compression that DB2 has used to date, with some additional enhancements and the page-level compression efficiency of having many values for the same column on a page. The actionable part is something DB2 is proud of, and should be – many comparisons and such can be done on the data without uncompressing it, making not just the amount of data on disk smaller, but also the amount of data in memory smaller as well.

In-Memory

BLU is not an exclusively in-memory database – it couldn’t be for analytics. But it uses more in-memory techniques than DB2 DBAs may be used to dealing with. It is memory and CPU intensive as a result. IBM recommends that you have 10% of the compressed size of your database in memory to handle the memory requirements – and make sure you get the latest compression of your data before panicking at that suggestion – compression can be significant if you’re coming from uncompressed to compression or even one of the previous version’s compression to 10.5 compression. IBM recommends that you use no fewer than 8 cores for BLU databases.

The data still goes through DB2’s tried and true bufferpools. Which is an advantage, really, since it’s not some new start-up that hasn’t worked out the best algorithms for what data is in memory. If you want to tell DB2 which tables have priority in memory, you can put them in separate tablespaces/bufferpools to reserve the space for them. You don’t have to wait to query tables until they’re loaded in memory, and you still get the benefit of prefetching when you query something you haven’t hit in a while.

Obsolescence of Indexes

Since it is such a different animal, IBM does not even allow the creation of indexes in BLU. As a DBA, this feels wrong in some ways, and I wonder if IBM will allow the use of indexes in future versions at all. This does seem to make the syntax a lot easier in some ways. IBM does still allow primary key and unique key constraints, but the “CREATE INDEX” statement will just fail with SQL1667N. I’m fascinated to see how well this works.

Limited support for MQTs

With 10.5 FixPack 4, MQTs became supported with BLU, but your MQTs must have the same organization (row or column) as the tables they are based on, and must maintained either manually or by replication. SYSTEM maintained tables are not supported, which means that if you must use MQTs, then you have to develop a plan/script for maintaining them. I wonder if they’re also thought to be obsolete like indexes.

Maintenance

IBM is pushing automatic maintenance with BLU. I’ll share my opinion on that once I’ve worked with it some.

What BLU is Not

BLU is not the magic bullet for all performance problems. We hear great edge cases from IBM, but do you think they’d trumpet any mediocre results? I’m just getting my hands on it in the real world, and am excited to see how it does in real use.

BLU is not for OLTP, e-commerce, or transactional workloads – singleton row update/insert/delete can be expected to be 2-3 times slower on BLU. They’re working on that with shadow tables and more behind the scenes stuff, but it’s just not there yet. You can mix row-organized and column-organized tables in a single database, but the tuning is complex to make that work.

Many of the tenets of good database administration still apply. Consider your recovery, high availability, and disaster recovery plans. Think about table placement in tablespaces and the mapping of bufferpools. Keep up with new fixpacks, and so forth.

How to get BLU

BLU is included with the advanced editions of DB2 – so you can’t get it in express-c, but you have it in Advanced Workgroup Server Edition (AWSE) and Advanced Enterprise Server Edition(AESE). It became available in DB2 10.5, and don’t go for anything lower than FixPack 4 if you plan to use it. Always verify my statements with IBM for licensing issues, of course.

I’m working with BLU some myself now – expect more blog articles as I face and tame some of the real world challenges.

Lead Db2 Database Engineer and Service Delivery Manager , XTIVIA
Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 17 years developing a top-level expertise on Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set to the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares both posts about her core skill set and her journey into Data Science. Ember lives in Denver and work from home for XTIVIA, leading a team of Db2 DBAs.

4 comments

  1. Awesome! Been looking for down and dirty, in the trenches articles like this. Please keep writing along this vein!

    Also, I believe if I’m not mistaken I’ve heard that HADR did not work with BLU until Cancun? And I believe BLU does not work with pureScale???? Does it work with DPF then? Or is it basically still single node?

    Any other features to be aware of that you know of?

    1. Right, Cancun (DB2 10.5 FixPack 4) brought us HADR with BLU. BLU does not currently work with pureScale or DPF, though I hear that getting it working with DPF is in the works. I’m not sure if pureScale is on the roadmap or not – I think of pureScale as more for OLTP/transactional databases, and BLU is more for Analytics workloads.

      There is limited MQT support in BLU – system maintained is not available, and I’d like to see more work around unique indexes rather than unique constraints – the not allowing nulls of unique constraints bugs me. DPF is obviously a big one. I’m sure I’ll learn more as I go along. I think there was some SQL syntax that was not supported early on that is in later versions.

  2. Great article as always!
    We are running BLU v10.5.4 in Production on some smaller BI databases and the performance improvement is astounding, e.g. some queries that took 5 minutes to run in BLU v10.5.4 row based run in less than a second in BLU. We have found that mixing a row table and a BLU table in a query does not perform well. We try and make sure all the tables involved in a query are BLU even if a reference type table is small we still convert it. The compression is fantastic on our data. Our production database is now a third of its original size, even with the additional synopsis tables in the mix, and not all tables are BLU. Of course, some tables have negative compression rates if they are small as BLU uses 32K pages but it does not matter in the greater overall gain – its worth the hit. We have done no tuning and not needing to tune indexes has freed up a lot of time. We have kept all our primary indexes in place.
    Another nice thing about BLU is the time it saves when developing new projects. We have just put our second BLU project into Production since going BLU and the time to Production has reduced as the development and iterative test runs are much faster. Development databases benefit from having more sortheap (within best practice ratios) allocated as it allows the developers to do more investigative type queries that would not be ever be used in Production and saves them time. We are able to deliver more user stories as a result. When you are in scrum meetings you can feel that the pace of development has increased and it shows in the deliverables. Our data warehouse is a very good ft for BLU and our issues using BLU have been few and far between. Mostly, other teams are wondering what they can offload onto the BLU database as the performance is so much better. This is an issue that needs to managed as I personally don’t feel a “rainbow” database is a good thing even if the performance BLU offers is superb.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.