I started out with a DB2 Basics entry on BLU, which covers the most basic concepts. In this post, I'd like to delve into some of the details of BLU and talk about when it is beneficial, when it may be detrimental, and why.
BLU Workloads
If you're considering BLU, consider first that it is not for all workloads. There are still a number of limitations – though I'm quite happy to see that it's now supported with HADR. BLU is designed for analytics workloads. For appropriate analytics workloads, it may be very fast, and I've heard from DBAs who are very positive about it. However, for OLTP or workloads that are focused more on singleton-row operations, BLU is more likely to be detrimental to performance. I have it from good sources that IBM is working on that, but they're not there yet, and OLTP workloads can expect to be 2-3 times SLOWER on BLU than without it.
BLU is an interesting mix of existing parts of the DB2 technology and some wholly new things. Under the covers, you can think of DB2 as essentially having two engines. The BLU engine is used only when dealing exclusively with column-organized tables (see more on that below). When traditional row-organized tables are part of the picture, all of the work is done in the traditional DB2 engine, without engaging the efficiencies of the BLU engine.
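If BLU does look like a fit, the usual starting point is telling DB2 up front that the workload is analytics, before the database is created. A minimal sketch (SAMPLEDB is just a placeholder name; review what the setting actually changes rather than assuming the defaults fit your hardware):

db2set DB2_WORKLOAD=ANALYTICS
# With the registry variable set, a newly created database defaults to column-organized tables
db2 "CREATE DATABASE SAMPLEDB"
# Confirm the default table organization ended up as COLUMN
db2 "GET DB CFG FOR SAMPLEDB" | grep -i dft_table_org

Setting DB2_WORKLOAD=ANALYTICS also nudges memory, sort, and parallelism defaults toward what BLU wants, so it's worth checking those too.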
The 7 Big Ideas of BLU
I want to talk about BLU by going through the 7 big ideas behind it. For some of these, like parts of the compression, you can get some of the benefits in a more traditional table format. IBM has plenty of marketing material on all of this, but I want to add my own commentary. I don't claim to hold a candle to the incredible minds behind BLU, but I want to share my perspective.
Idea 1: Column Organized Storage
This is one way in which BLU tables are significantly different from traditional DB2 row-organized tables. In the way DB2 does things without BLU, data is grouped by rows in a table, and those rows are placed onto pages. A page can only have data from one table, and belongs to one tablespace container in one tablespace. A DB2 row must fit on a single page (assuming you're not using the extended row size feature, which I would recommend strongly against unless you have no other choice). DB2 will place as many rows as it can fit on a page, with the maximum number of rows per page limited to somewhere between 251 and 2,335 depending on tablespace type and page size. DB2 will move a row to another page if the result of an update no longer fits on the existing page.
Since the minimum unit of I/O is a page (or an extent, depending on how you look at it), all of a row has to be processed in order to get any results about that row (setting aside for the moment that indexes can be used to access smaller subsets of the data).
With column-organized tables in BLU, the data is sliced and diced differently. Data is grouped by column instead when being placed on pages, so a row absolutely will span multiple pages. This is one of the big reasons OLTP and singleton-row workloads perform so badly on BLU. When I need just one row, instead of it being almost guaranteed to be on a single page, it is now guaranteed to be spread across multiple pages, and I have to read more pages to get the same data. Whether that data is prefetched or not, it takes CPU cycles to access it.
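To make that concrete, here's a quick sketch of explicitly choosing an organization per table and then checking what DB2 recorded in the catalog. The table and column names are made up for illustration:

-- A column-organized (BLU) table
CREATE TABLE sales_fact (
    sale_id    BIGINT        NOT NULL,
    sale_date  DATE,
    store_id   INTEGER,
    amount     DECIMAL(12,2)
) ORGANIZE BY COLUMN;

-- A traditional row-organized table
CREATE TABLE order_queue (
    order_id   INTEGER NOT NULL,
    status     CHAR(1)
) ORGANIZE BY ROW;

-- TABLEORG shows 'C' for column-organized and 'R' for row-organized
SELECT tabname, tableorg
  FROM syscat.tables
 WHERE tabschema = CURRENT SCHEMA
   AND tabname IN ('SALES_FACT', 'ORDER_QUEUE');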
Idea 2: Simplicity
IBM touts this, and I see it on several different levels. BLU is presented as LOAD and go. In some ways this is true, but it can also introduce frustration. First, there is a phase added to the LOAD operation that can take quite a while. When data is first loaded into a BLU table, the data being loaded is scanned twice – once for an ANALYZE phase, which builds the column compression dictionaries, and once for the LOAD phase itself. IBM says that the lack of the slowdowns that indexes cause during LOAD makes up for this time-wise.
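As a sketch, a LOAD into the column-organized table from above (the file path is a placeholder) looks just like a LOAD into any other table; the difference shows up in the utility output, where an ANALYZE phase runs before the LOAD phase whenever the compression dictionaries need to be built:

-- Load a delimited file into a column-organized table
LOAD FROM /staging/sales_fact.del OF DEL
  REPLACE INTO sales_fact
  NONRECOVERABLE;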
That's right, indexes are not just discouraged, but not allowed on BLU tables unless they are part of a Primary Key or Unique constraint. I spend a lot of time with indexing, so this is hard to wrap my head around. But think about it – indexing is, in a way, an attempt to emulate column organization, allowing us to directly access only certain columns of the table without accessing others.
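In practice, the only index-like structures you get are the ones DB2 builds behind the scenes to enforce uniqueness. Roughly, using the hypothetical table from earlier:

-- Allowed: a primary key or unique constraint on a column-organized table
ALTER TABLE sales_fact ADD CONSTRAINT sales_fact_pk PRIMARY KEY (sale_id);

-- Not allowed: an explicit CREATE INDEX against a column-organized table fails
-- CREATE INDEX ix_sales_store ON sales_fact (store_id);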
The simplicity may be part of the speed. BLU eliminates triggers, system-maintained MQTs, foreign keys, and even identity columns – all of which can introduce additional processing in various scenarios.
All this simplicity might cause a few more complications when you're first converting an existing real-world database to BLU. A number of those eliminated things are pretty core to a database, and coming up with plans to replace them through your application or scripting can be quite complicated depending on your scenario. In some ways, IBM went back to the drawing board and really pared down what functionality a database offers. I'm not saying this is bad – in my recent conversion of a small DPF database to BLU, my client hardly balked at any of these hurdles. Dozens of tables with identity columns? No problem, we'll populate those through the app! Foreign keys? Don't need 'em! This is one area that really drives home the fact that BLU is not for all workloads.
Idea 3: Actionable Compression
This is actually a feature of DB2 that can largely be used without BLU. DB2 has some incredible compression technologies that all work together at different levels: a table-level compression dictionary using approximate Huffman encoding (the most frequent values take the least space), prefix compression, offset compression, and additional page-level compression. With each recent version, IBM has offered greater and greater compression.
Think about the combination of column organization and page-level compression. If a column has a lot of similar values that happen to fall on the same page, that can really help compression. For this reason, it's recommended that, if possible, data loaded into a BLU table be sorted on the columns that might benefit from this – particularly the ones frequently queried or joined on.
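One way to do that is to load from a cursor with an ORDER BY, so the data arrives at the table already sorted. A sketch, with a hypothetical staging table as the source:

-- Declare a cursor over the staging data, ordered on frequently queried/joined columns
DECLARE src_cur CURSOR FOR
  SELECT sale_id, sale_date, store_id, amount
    FROM staging.sales_fact_raw
   ORDER BY store_id, sale_date;

-- Feed the sorted rows to LOAD
LOAD FROM src_cur OF CURSOR
  REPLACE INTO sales_fact
  NONRECOVERABLE;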
Another nifty feature of DB2's compression (with or without BLU) is that DB2 can do comparisons, aggregations, and even joins without decompressing the data. This doesn't just save the CPU that would be used for decompression; it also uses less space in memory and means DB2 is working with smaller pieces of data. From my understanding, this works in the traditional DB2 engine as well as in the BLU engine.
Getting the best possible compression is entirely critical to BLU performance. I've heard this from multiple DB2 performance experts. I've even come to believe that it is worth the longer ANALYZE phase to let it run against the whole of the data I'm loading into a BLU table rather than trying to come up with a representative sample. I've found the ANALYZE phase to be longer than the rest of the LOAD, but that's where a lot of the magic happens.
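One rough way to see how compression turned out is to look at the statistics DB2 keeps (run RUNSTATS first so the numbers are current):

-- PCTPAGESSAVED gives a rough view of compression savings per table
SELECT tabname, card, npages, pctpagessaved
  FROM syscat.tables
 WHERE tabschema = CURRENT SCHEMA
   AND tableorg = 'C'
 ORDER BY pctpagessaved;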
Idea 4: Core-Friendly Parallelism
I have heard repeatedly from IBM that memory is too slow. DB2 wants to be doing as much as possible in the L2 caches that are right on the CPU. There are some really smart people working with Intel and the other chip makers to get the most out of the chip and really use that octo-core you might have in parallel. They're even doing some experimental work with video cards – which have many, many tiny CPUs – interesting stuff, that.
Idea 5: Single Instruction, Multiple Data (SIMD) or Vector Processing
This is another area where IBM is really integrating with the chip makers. SIMD is a feature that modern chips offer, where the same instruction can be applied to multiple pieces of data at once.
Obviously I'm a bit less opinionated on the chip details – I know much less in that area. But these features do mean that BLU is more likely to use whatever CPU you throw at it, and DBAs may have to get used to seeing higher CPU utilization without panicking.
Idea 6: Scan Friendly Caching
DB2 has had efficient caching algorithms to decide what should stay in the bufferpool and what should not for years – that algorithm is well developed and continually advancing. BLU leverages in-memory processing, but it does so while using decades of experience in managing what stays in memory and what doesn't. I don't understand this idea as well as the others, but BLU does require and use more memory, and does more in memory, than a non-BLU DB2 database.
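If you want to see how much of your column-organized data is being found in the bufferpool versus read from disk, the monitoring table functions in 10.5 expose separate counters for column-organized pages. A sketch (check that your fix pack level includes these columns):

-- Logical vs. physical reads of column-organized data, per bufferpool
SELECT bp_name, pool_col_l_reads, pool_col_p_reads
  FROM TABLE(MON_GET_BUFFERPOOL('', -2)) AS t;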
Idea 7: Data Skipping
BLU creates synopsis tables that give it more metadata about your BLU tables. These can be thought of a bit like a negative index. For each chunk of 1,024 values in a column, the synopsis table stores the minimum and maximum value, allowing chunks to be skipped when the value being looked for is not between the minimum and the maximum for that chunk. This data isn't just used to avoid I/O – it's also used to avoid working with data in memory. Remember, memory is the new disk: too slow to use unless you have to.
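You can see these synopsis tables for yourself in the catalog: a synopsis table shows up as a dependency of its base table in SYSCAT.TABDEP (the dependency type code for synopsis tables is '7', as I understand it; verify that against the documentation for your level). SALES_FACT is the hypothetical table from earlier:

-- Find the synopsis table DB2 maintains for a given column-organized table
SELECT d.tabschema, d.tabname AS synopsis_table
  FROM syscat.tabdep d
 WHERE d.bschema = CURRENT SCHEMA
   AND d.bname   = 'SALES_FACT'
   AND d.dtype   = '7';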
Mixing Row-Organized and Column-Organized Tables
The moment you join a row-organized and a column-organized table, most of the benefits of BLU go out the window. DB2 will use the traditional engine rather than the BLU engine for at least part of the work. For this reason, you should mix the two only in well-defined scenarios for well-defined reasons. When you aren't using the BLU engine, you still don't have the indexes you would have in a fully non-BLU implementation, so you may end up with even worse performance than a properly indexed non-BLU implementation.
IBM is offering up "Shadow Tables" in FixPack 5 of 10.5, which are column-organized copies of row-organized tables that are maintained with replication technology – somewhat like an MQT of the whole table, but maintained by replication. I think IBM can do better on the implementation of these – to me they have the kludgy feel of how something is done before IBM comes up with a better way in a later release. I hope to see improvement here; an actual system-maintained MQT sounds better. I'm sure it's not easy, but I'm confident that a smoother solution must be coming. The concept of shadow tables is critical to letting us mix OLTP and analytics workloads – which, for better or worse, most of my clients insist on doing.
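For reference, the DDL for a shadow table looks roughly like this: a column-organized copy defined over the row-organized base table and flagged as MAINTAINED BY REPLICATION. The schema and table names here are hypothetical, and the actual data movement still has to be set up separately in the replication (InfoSphere CDC) layer:

-- A column-organized shadow of a row-organized base table
CREATE TABLE shadow.orders_shadow AS
  (SELECT order_id, customer_id, order_date, total_amount FROM app.orders)
  DATA INITIALLY DEFERRED
  REFRESH DEFERRED
  ENABLE QUERY OPTIMIZATION
  MAINTAINED BY REPLICATION
  ORGANIZE BY COLUMN;

The optimizer will only route queries to the shadow copy when the session's latency-related settings (CURRENT REFRESH AGE and the maintained-table-types register) allow it, which is part of what makes the setup feel more involved than a plain MQT.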
Summary
Overall, BLU offers some interesting concepts, some incredible performance stories, and some new challenges for DBAs. Having worked with it some, I’m even more excited to pick the experts’ brains at the next conference!
Hi Ember,
One minor clarification about the last part, when row and column-organized tables are used in the same query, we do try to use the column “engine” as much as possible (e.g. for scans, local column joins, etc) before changing over to run the rest in the traditional row “engine”. How much is used depends on the query itself and the plan chosen by the optimizer.
Hi Ember,
I had a question about the use of the extended row size feature in a columnar table. I understand why it's bad and should be avoided for a row-based table, but in a columnar table, since every column is on its own page, is there a downside to using this feature? We have a large table that's been denormalized so much that it won't fit on a 32K page. I was thinking of enabling this feature vs. splitting the data into two tables and having to join them together. Just curious what your thoughts are. Thanks for the input.
The problem with extended row size is that the data that doesn't fit on the page is stored in a CLOB. CLOBs are handled differently than other data – they do not go through the bufferpool, and larger CLOBs cannot be handled by HADR. There can be significant negative performance impacts from CLOBs. I have avoided this feature – I see it only as a patch to make DB2 behave more like Oracle, at the cost of performance.
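If you want to make sure nothing is quietly relying on it, the database configuration parameter can be checked and turned off; a quick sketch, with MYDB as a placeholder database name:

# See whether extended row size is enabled
db2 "GET DB CFG FOR MYDB" | grep -i extended_row_sz
# Turn it off so new tables can't silently depend on it
db2 "UPDATE DB CFG FOR MYDB USING EXTENDED_ROW_SZ DISABLE"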
You pose an interesting question though about row size and columnar tables, since the same row size concepts do not apply. Let me reach out to a couple of contacts and see if I can get a reasonable answer.
Thanks! Any information would be helpful.
What I'm hearing is that extended row size is not supported with BLU, nor are there any plans to add it. I'm not sure why traditional row-size limitations still apply – it's not the kind of question that is easy to get an answer to.
Thanks for following up. I was curious to see what would happen in a BLU implementation, so I decided to test it out and see if DB2 would even let me create a columnar table larger than the page size. With the EXTENDED_ROW_SZ parm ENABLED, it let me create the table and load it with no errors or issues. I'll do some performance tests next to see if there's a performance hit when a table is configured this way.
As a side note, I think it's a little scary that EXTENDED_ROW_SZ is ENABLED by default in 10.5. Someone could be using that feature without even knowing it, along with the performance headaches that come with it. I've made a note to turn it off in our new implementations. I'm happy that an upgrade leaves it disabled.
On the DB2 Night Show, I asked why row sizes over 32K are not supported on columnar tables, since we don't have the page limits for columnar tables the way we do for row-organized tables today, and Matt Huras implied that it might be addressed in some way in a DB2 release in 2016.
Hi Ember, this is a really nice and precise article, I should say…great! We have a recommendation to use CDC tables to load our final layer of BLU tables, and with that I assume the target tables should be row-organized (since with CDC as input it will involve inserts/updates on the target tables). Later on I wanted to use replication technology and create shadow tables which could be used by the BI application…how does that sound, and can it get any better?
On second thought, I was also wondering why use CDC as the source at all – why not do a load replace of the entire source data every day into column-organized BLU tables? Do you think that would be more appropriate and efficient?
I think you’d have to test that thoroughly. Generally that would logically lead to the best compression, but you’d have to try it out. As far as I’m aware, you still cannot have a columnar table as a source for replication. I also find the shadow tables a bit clunky in the current implementation and keep hoping they’ll improve that.