One of the things that learning a new RDBMS is teaching me is that there are some concepts that are common to RDBMSes. It is helping me to identify some of those concepts, and I learn a lot from the similarities and differences.
What is a Buffer Pool or Buffer Cache?
A buffer pool is an area of memory used to buffer data. Accessing disk, even with SSD and modern disk technologies is one of the slowest things a database system does.
Storage is at the heart of a relational database management system. In order to fully provide ACID transactions, each transaction must be externalized to disk before it is reported as complete. Most RDBMSes use what is called Write-Ahead Logging(WAL) to accomplish this. WAL means that any transaction is written to transaction logs on disk to qualify for the Durability part of ACID. This is the smallest piece of information possible. If a crash happens, the RDBMS goes through crash recovery and can re-apply changes made by transactions that are not yet in the table data on disk from the logs to data within the tables.
Writing and reading to/from tables uses the buffer pool as an intermediary. Mature database systems will have prefetchers that try to pull read from disk before it is needed into the buffer pool. They also have some type of page cleaning, which takes data that has changed in the buffer pool, and writes it asynchronously to data within tables on disk.
The buffer pool hit ratio is defined as the percentage of time when a page needed to be read that it could be pulled from the buffer pool and avoid synchronous I/O. Modern RDBMSes often achieve a hit ratio of 99% or higher once the cache is warmed up, even when the cache is only a portion of the database size. This will vary by the data size, type, and access patterns, but it is really quite an achievement. What this means practically is that many relational database management systems perform much like an in-memory database, though they are using resources to keep the right data in memory.
Core Functionality of Buffer Pools
Why do RDBMSes not just use the filesystem cache provided by most operating systems for this work? There are a number of functionalities that they add above and beyond what that would provide. Perhaps the most powerful is data is handled in database pages that the RDBMS can use again and again in memory and not have to be going back and forth to disk. The RDBMS does not have to read an entire datafile, table space file, or table in to memory at a time. The minimum I/O is often either a single database page (usually 4k, 8k, 16k, or 32k, depending on RDBMS and configuration), or an extent or chunk of such pages.
This is very efficient in a database context because there is often “hot” data that is being very frequently queried and updated, and also because the size of the data actually being worked with is often quite small – changing data far smaller than the page level – a row or a single value within the row. It is not just that the I/O is asynchronous, but that it is managed by the RDBMS itself in ways that make sense for the logical structures involved.
There is also no restriction on what user is accessing or editing data (beyond the basic permissions on who has access to the data). Data in the buffer pool can be changed by one user and milliseconds later be used/changed by a different database user.
In summary, core functionality of buffer pools includes:
- Caching data in memory in a RDBMS-native, highly reusable fashion
- Using prefetching to get data from disk to memory asynchronously, often before it is needed
- Using page cleaners to write data from memory to disk asynchronously
- Handling data in very small chunks
RDBMS Platform Differences
So far, we’ve gone through what’s similar across platforms. I wanted to also talk about some things that may be different between platforms. Since I’m an expert with IBM Db2, have some recent experience with MySQL(InnoDB), I’ll talk about some of the differences I’ve seen for those platforms.
Number, Sizes, and Page-Size Considerations
This may be one of the areas where platforms vary quite a bit. Some platforms may have buffer pools that are defined by instance, by database, or that can be associated with individual tables or table spaces. For example, in Db2, every table space (every table lives in a table space – there can be one table space or thousands) uses only one buffer pool, but each buffer pool can be associated with many tablespaces. Db2 allows multiple page sizes in each database(separated at the table space level), and needs at least one buffer pool for each page size being used. DB2 can automatically configure buffer pool sizes, within limits set at the database or instance layer. Db2 buffer pool sizes can also be manually set, or a mixture of the two, allowing a lot of room for customizations like isolating a buffer pool to only serve a single table – to either pin the table in memory or keep it from overwhelming buffer pool space that other tables need.
A MySQL instance lets you configure the total size of buffer pools and the total number, along with some configurations for preventing table scans from overwhelming the buffer pools. There isn’t an ability to isolate tables or to automatically configure the sizes. The buffer pools are all the same size – a fraction of the total buffer pool space based on the number configured. MySQL does not allow the mixture of page sizes within the same instance.
Prefetching and Page-Cleaning Methodology and Algorithms
Each RDBMS vendor develops their own algorithms for page cleaning and prefetching. A straight Least-Recently-Used(LRU) algorithm for deciding what should stay in the buffer pool would be too simplistic.
Db2 doesn’t publish a lot on their algorithm, but I’ve heard descriptions of it several times. Db2 essentially maintains whether it loves or hates each page in the buffer pool, and decides what to keep and what to move back to disk based on these lists or tags. We don’t know what determines these tags.
This is one of several areas lately where I have absolutely fallen in love with the MySQL documentation. Check out this page in the MySQL documentation on the Old/Young LRU algorithms. Pages like this make me wonder if blogging about it is even as useful as my blogs about Db2 – no extra help is needed to find or interpret this document. I very much enjoy how the two sections (old and young) are described and how the documentation describes how MySQL decides where a page goes.
Buffer Pool Sizing
Db2 has this nifty thing called the Self Tuning Memory Manager (STMM). Within the limits you give it (or within the detected limits of the system Db2 runs on), it decides how much memory should be allocated to a handful of memory areas, including the buffer pools (note, not all AUTOMATIC memory parameters are handled by STMM). This means you can throw Db2 on a system and not only will Db2 select appropriate memory sizes for the various memory areas, but it will also change those memory sizes to match varying workloads. This can get at least 80% of the performance out of a system that a skilled DBA can, and it doesn’t need to sleep at night! It’s not perfect, but for the vast majority of systems, it is really good.
I have been unable to find a similar function in MySQL – please comment if you know of something I’m missing! For MySQL(InnoDB), you configure the buffer pool sizes manually using two parameters – innodb-buffer-pool-size and innodb-buffer-pool-instances. innodb-buffer-pool-size represents the total size of all bufferpools for an instance of MySQL, while that size is split into innodb-buffer-pool-instances buffer pools.
What Thread Does the Work
Both Db2 and MySQL use a primarily thread based architecture, though Db2 does have some things that get different processes like the audit facility, the watchdog, and command line connections.
In both Db2 and MySQL, if work is being done asynchronously, the prefetcher threads pull data into the buffer pool while the page cleaner threads write data from the buffer pool to disk. There is some configuration around the number of these threads. In Db2, you can set the thread counts to AUTOMATIC to have Db2 choose for you. If work is being done synchronously (i.e., the pre fetchers failed to get the data in time or the page cleaners failed to write dirty pages to disk fast enough), then the agent thread for the connection does the work of puling data into the buffer pool or writing the data out to disk and cleaning pages – only what is needed to fulfill the current SUID statement.
Buffer Pools are one of the powerhouses of database performance – an undersized bufferpool will quickly choke performance. In broad strokes, they work similarly in both Db2 and MySQL, though are significantly different in some details.
Please let me know if you find technical errors or lacking nuances in the comments below. I’m new to MySQL, so could be missing some details.