DBA Perspective: Open Source vs. Proprietary RDBMS

Posted by

Moving from being a DBA on a proprietary RDBMS for 20 years to working on an open source RDBMS is teaching me a lot of lessons. Many of them are lessons I expected – remembering how and when to ask for help, getting used to not being expected to be the expert on everything, and technical details around how MySQL does things differently. I’d like to talk about some of the lessons I didn’t really expect.

Is X a MySQL Thing or a “My Employer” Thing

When I come across a new term, a practice, or even a technical detail, I often have to ask (myself or others) a question like: “Is this a MySQL thing or a MySQL at My Employer thing?”

As an expert on IBM Db2 with experience consulting in a number of companies, it was easy for me to see the lines between the RDBMS and how a particular employer used the RDBMS. I have seen and discussed a lot of conventional and not so conventional uses of Db2. The distinction is less obvious when I’m not as familiar with the RDBMS.

A simple example of this is foreign keys. At work, we don’t use them, and the main reason is that we use a schema migration tool and also use Ruby on Rails. The checks that foreign keys provide are handled at the application layer. Having foreign keys would break the processes for schema migration that we use for true CI/CD. Coming in from a company whose idea of CI/CD was deploys once every six weeks, it took a bit to understand that this wasn’t due to any lacking in MySQL, and may not be a common practice in the MySQL community, but just how my employer does things.

Deep Troubleshooting

Every piece of software, including proprietary and very established RDBMSes, encounter errors. At some point they’re going to have some problem to work through or around. Last week was my first experience with a MySQL failure that took a database down, and it was a doozy.

When encountering an error with Db2 that causes a crash or data issue, you call IBM support. You call IBM Support even before you’ve verified it wasn’t something related to known causes like hardware failures or obvious problems with OS or applications. They’ll ask a ton of questions and likely ask for logs of various sorts from the issue. If the issue stopped, they may ask you to try to recreate the issue with additional logging enabled. With the information that you provide, they’ll help with both immediate and long-term root cause of the issue and suggest a course of action – be it changing configuration in some way, or even writing a patch just for you. Sometimes the answer is you just made a mistake, and they’ll make that clear, but still help you identify the mistake and correct it. Now this is a bit of a simplification because IBM support is not particularly easy to deal with. There are many pitfalls along the way including them telling you to upgrade, recommending patently wrong courses of action, or the dreaded “Working as Designed” designator.

The approach for a similar issue on MySQL (without any paid support) is strikingly different. With a fairly large MySQL implementation, my employer decides not to pay for support from MySQL or Percona, but to instead employ people who can handle deep problems. This means that the course of action falls entirely on our teams. First, fixing the problem in production without data loss and low down time because we are prepared for even unexpected failures, and secondly digging into the MySQL code ourselves to see if we can find some clue about where the problem stems from so we can either suggest a fix to the code ourselves or at least point to where exactly things went wrong. Generally opening an issue with MySQL with “I had this problem” won’t help anyone.

General RDBMS Concepts

Another separation that is fascinating for me is discovering which concepts are general RDBMS concepts and which are specific to a platform or vary between platforms. While concepts like the buffer pool hit ratio and page cleaning are very similar at a high level, the details of how many buffer pools there are, how work is mapped to them, and the algorithm for deciding what to retain in the buffer pool are either slightly or very different. I was recently talking with another Db2 expert who has moved to MySQL, and his comment was that it felt like MySQL stole a lot of the terminology from Db2. I think that isn’t the case, but more that the areas where that feels true are the more general RDBMS concepts that apply to many platforms.

Summary

None of this should direct whether a company chooses to use a proprietary or open-source RDBMS, but I do feel like the RDBMS category choice should at least not clash too badly with an organization and a corporate culture. I’ve seen some astounding cultures of blame in my career, and have hated them with every fiber of my being. I wonder if a proprietary RDBMS better fits a culture of blame, because ultimately the finger can be pointed at IBM or whoever the vendor is. I’ve also seen some corporate cultures that took a wonderful, collaborative, blameless approach to problem solving, and still used a proprietary RDBMS. The expertise of your database team looks different for each, but both require deep expertise, particularly at larger scales.

Overall, I feel like I’ve been particularly lucky in this transition. I have at least three people at work I can ask the dumb questions and know they won’t make fun of me for it, and will take the time to explain in detail down to whatever level I need. I still have moments where I learn some technical detail and my immediate reaction is “What kind of podunk RDBMS is this?!?!”, but they’re now followed by marveling at the people I work with and the fascinating large scale we’re working at.

Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, and is now pivoting to focus on learning MySQL. Ember shares both posts about her core skill set and her journey learning MySQL. Ember lives in Denver and work from home

7 comments

  1. I love learning from your experience, Ember. Indeed, how to approach resolution is strikingly different. In a way, it’s actually similar to how Db2 development works. We stare at the code till we solve it. Investigating a memory corruption is like investigating a crime scene, looking for clues and guessing the sequence of events as well as motivations. 🙂

    I wonder what the cost effectiveness is like, as well as the speed of resolution. Organizations will choose cheaper and faster resolutions. Db2 needs to stay ahead of that game to stay competitive.

  2. Thanks for your interesting perspective. I’ve used MySQL for most of my development and consulting over the past 20 years. It’s true most projects don’t use foreign keys. Part of the history is that the default storage engine in MySQL before 2010 did not support foreign keys anyway (you could try to define one but it would be ignored). But even since 2010, foreign keys are usually skipped because they come with some locking behavior and operational complications that are surprising to some developers. For example, the schema change issue you mentioned. That said, we can safely assume that projects that doesn’t use referential integrity constraints are guaranteed to have tons of orphaned rows that go undetected.

  3. I can only think that OLTP databases are poorly designed having 3 issues, that FKs are being taken out of databases.

    1) Never stage historical data / immutable outside the main tables (the last years its getting worse).
    2) A database design purely conceptual in the 3-4NF or Boyce-Codd, without taking consideration that there is a serious effort that databases perform to make the data consistent.
    3) FKs are a pain in a partitioned environment. Which is pretty bad in most of Free Software databases (i can guess that its not the best world in the paid databases too)

    No wonder that MongoDB is making strides… It swallows anything, logically consistent or not….
    And the devs can hapzardly decide whatever data is stuffed each row

    I dare to ask if any company at all implement the rules/checks that you can specify in the table creation on MongoDB ….

    And a cream to the top, MongoDB offers a very robust row compression. So , The less IO it does, aligned with LSM mode, it can swallow so much data that makes other DBs sweat.

    There’s a nice trick also in the Object identifier/ PKs coding for MongoDB.
    It contrasts very much when seeing standard RDBMS modelling accepting happily PKs using plain monotonically crescent identifiers , which lacks some nice tricks in clustering/partitioning.

    Its so sad and painful to see that DB2 charges for row compression. It could turn some heads if making it a standard feature even on workgroup editions.

    Enough of rant today….

      1. Same is true for the rest of world. There are three editions. Two are paid. One is free. All three editions come with all features of Db2. The difference is the size of the deployment in # of cpu and memory. We used to limit the amount of data but that restriction was also removed some time ago.

        1. Thanks, gr8 to know. I wasnt aware that DB2 currently packs data compression as a standard feature. I could guess this is a key feature that should be more promoted.

          I have been away from DB2 for quite some time, and this is very nice to know. Next chats i’ll have i can use it to add more option on the table to argue/decide.

          Really thanks
          Alexandre

  4. Foreign Key Constraints: Booking also does not use them. I dumped our internal SQL Engineering Guidelines into my blog, under that title. It has a section on the use of FKC, too.

    Basically, if you use FKC, you cannot use Group Replication, and you cannot use online schema change tooling. That alone breaks too many things.

    Also, FKC come at a terrible cost in multiple way, and even more so when Replication is being used, because they are checked, as rechecked, every time things are being replicated, at each instance. In MySQL, specifically, they are also always IMMEDIATE and never DEFERRED, which makes them kind of exhausting to use: They force an order on the statements inside a transaction.

    We check these things in application side code, and that seems to have worked ok the last 20 years.

    “his comment was that it felt like MySQL stole a lot of the terminology from Db2” I don’t know. MySQL InnoDB, where these terms are being used, is suposedly a rather straightforward implementation of “Transactional Information Systems” by Weikum and Vossen. I had no opportunity to check, the book is still on my pile of shame.

Leave a Reply to Bill Karwin Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.