I’m a couple of months into working with MySQL, and thought I’d share some resources and thoughts on what I’m learning and what is working. At least half of that time was dedicated to a very in-depth onboarding process that barely mentioned a database at all in about a month of content, so it’s closer to a month in. I’m also working at a company which is running MySQL at an unbelievable scale, so I’m not sitting here spinning up servers or poking around databases on a daily basis. Manual interaction is a bug when you’re in a department of 40+ people supporting petabytes of data running on many hundreds of databases running on thousands of containers and VMS.
Keep in mind as you’re reading this that I’m still new to MySQL and my statements about it may be incorrect or use incorrect terminology. Please share anything you find there in the comments so I can correct it.
The absolutely most excellent source I’ve found to study MySQL as an expert in another RDBMS at the beginning has been the book High Performance MySQL. Yes, the book is focused on performance, but that ends up working out really well because it doesn’t start at “This is a table”. It makes assumptions about basic database knowledge, which is great if you understand some RDBMS. I find that I can read it and easily go off on side quests when I find a term I don’t understand or an area I need to understand better. I’m not yet done reading it, and I mostly expect to read it again later this year or next year, as I suspect I’ll get different things out of it without the side quests.
The official MySQL documentation is excellent for digging into topics deeper, and its organization makes sense to me coming from another platform.
On the topics where I want to go really deep or have an oddly specific question that often comes from a knowledge of Db2 internals, I go to the MySQL Internals Manual. This is way too deep for the basics, but really good at getting much more detailed than the Db2 documentation ever does.
Internet searching is good, and there is a lot of random content out there – more so than with Db2. The key, I’ve found is to use several sources and if there are significant conflicts to either ask someone who knows it or dig into the official documentation. There are a lot of people who are wrong on the internet and just because you CAN do something in MySQL (or any other software) doesn’t mean you SHOULD.
Open Source vs. Proprietary
One thing I’m going to be getting used to is the approach from an Open Source perspective and how that differs from the approach from a proprietary software approach. With IBM Db2, if we have a problem and work through everything we can think of and google on our own, we go to IBM and we open a case, and as painful as that process can be, often get an answer. Sometimes the answer is that you’ve done something stupid and you can just do it differently. Sometimes the answer is “working as designed”, meaning IBM has no intention of changing it. Sometimes the answer is to point you to an existing patch to fix the problem. And rarely the answer is a patch coded just for you to fix the problem. In any of those cases, there is some answer provided to you.
From my mentor, I’m learning that the approach in the MySQL world, even at the expert levels, is different. If we run into a sticky problem, we can dig into the code base and find the problem or suggest a code fix ourselves. What’s not different is that you’re still generally going to write up the issue really well including how to reproduce it and open an issue. In this case you’re hoping that your issue will be patched either with code you provide or some other way by Oracle. It’s not terribly likely that you’re going to fully fork the MySQL code and make your own modifications, as that’s tough to maintain over time.
I have obviously never seen the code that makes up Db2, so this is certainly a new way to look at things. In any case, I’m nowhere near deep enough just yet to do this. I’m still working through reconciling the basics.
Conceptually, there are a lot of similarities. I don’t have to understand why we need storage abstraction and what tablespaces are. The basic syntax for DML and DDL is relatively similar. Buffer Pools still make sense in general terms. I don’t need instruction on the concepts of data types and some of the data type choices – I only need to look at the edge cases. ENUM, I’m looking at you.
From what I’ve seen, MySQL, or perhaps how my organization uses MySQL, seems to have an extreme emphasis on high availability. There is an expectation that we never take a full database outage, even on upgrade. The structure of MySQL tables (clustered on Primary Key, with data stored on the leaf pages of the PK index) leads to not needing reorganization, but instead needing rebuilding when a certain narrower set of types of changes are made. There are tools like LHM out there to help make fully online schema changes more possible. While that’s specific to Ruby on Rails, I’m not aware of any similar tools for Db2 beyond the very manual use of things like admin_move_table.
Reads on read-only replicas are much more stable in MySQL than they are on a Db2 HADR standby, and MySQL replication, if done correctly (for everything, not for specific databases/schemas), can handle structural changes and work on everything under MySQL’s control. There are great tools like ProxySQL to spread reads across large numbers of read replicas. I’ve never heard of someone directing read traffic and write traffic from the same application to different Db2 servers. Specific read-only traffic, yes, but usually that’s reserved for something like reporting, and HADR’s limitations have historically made it fairly universally hated for that purpose, even with a VIP that gets redirected if the standby is down. We’ve pushed IBM on continuous availability in Db2 for years, and while it’s better than it used to be, it’s just not where it could be.
There are things they’re doing with MySQL here that I can picture doing with Db2, but that I’d have to wrap in a LOT of custom code to make work. I’m not sure I can picture a petabyte scale Db2 database with hundreds of shards and at least 3 replicas per shard actually working and scaling the way we’re doing it with MySQL. I could probably shard the databases the same way we do here, at the application layer, but the code I’d have to wrap around replicas and schema changes using any flavor of Db2 replication – I shudder to think of it.
One area that stood out to me with differences was the optimizer. Not surprisingly, the Db2 optimizer seems much more advanced than the MySQL optimizer, along with some of the options on indexing. I’m working with MySQL 5.7, so there are some advancements in 8 that I haven’t seen yet, particularly related to the optimizer. In 5.7, when MySQL says it’s a cost-based optimizer, the main thing it looks at is cardinality. Distribution statistics and frequent values are not a part of the picture. I read that for statistics gathering, MySQL samples 8 pages by default and nearly had a heart attack. Then I the realized if it’s only cardinality you’re looking at, that might not be so bad.
In MySQL 5.7 the only join type is nested loop, so it makes looking at explains entirely different in an interesting way. There’s nothing I can do to change the join type, so I’m largely focusing on each table access and how it is working or not working. Which when you come down to it isn’t all that different than Db2. Reading explains isn’t that hard to pick up. Selectivity is still a thing. Table Scans are still bad. Column order in indexes still matters. Too many indexes still slows down inserts, updates, and deletes. You can still tell the optimizer what to do in comments (and this seems a more common and accepted technique in MySQL). But odd things like the primary key being in every index instead of a RID still take thought.
Some of the advanced query performance techniques aren’t there. Column group statistics aren’t a thing in MySQL. I can’t use statistical views. Even little things like I can’t have column A of the index in ascending order while having column B of the index in descending order.
I am appalled by some of the ways MySQL will let you shoot yourself in the foot. Some of the storage engines are just insane (looking at you, BLACKHOLE), and not using strict mode – why are you even using an RDBMS?
My Journey So Far
It somehow manages to be both easier and harder than I thought it would be. A lot is similar, and I love the concepts and problems that still apply. A lot of the details are different. I have a hard time determining how deep to go on side quests and when to go on them. I’ve started verbally, out loud, alone in my home office, firmly telling myself “NO” when I discover yet another thing I know I don’t understand, but also that I don’t have to understand to address the current issue or understand the current concept. I have a list of links to read a mile long to help with this, and haven’t done much yet of going back and actually reading the list. I have an awesome mentor who has endless enthusiasm to answer my endless questions, and a willingness to very slowly work through real world problems while I ask the most basic questions.
As someone who thrives on change and loves learning, I am loving drinking from the firehose and quite happy with the change I’ve made.
I plan to start writing on some MySQL specific topics soon, and am also toying with the idea of a separate site/blog that can help translate concepts between different RDBMS platforms. I have so much to write about!