When I started working as a DBA in 2001, the relational database management system choices were more limited and the choice itself was often left up to a system architect. Rarely did developers have a say, and even DBAs only had a say in whether they could support it or would have to hire people to support it. How databases are chosen has changed some since them, but there are a few things that I’ve learned that I thought might be worth sharing.
Though I am biased towards the systems I have worked with, this will not be a “choose Db2” article. I might use Db2 and MySQL as examples, as they’re the platforms I have worked with the most.
Let’s start with the assumption that you’ve decided that a relational database is what you need. They’re a great general-purpose data storage choice and, in my thoroughly biased opinion, should be the default starting place until requirements or even testing determine they just can’t do the job very well. Mature RDBMSes have absorbed and become better from challenges from everything from object databases to xml databases to key-value stores. Most of them can give decent performance in a number of scenarios and astonishing performance when well architected running with applications that fully take advantage of their strengths.
I’ll mostly take this from the standpoint of an established small or medium sized technical organization choosing an RDBMS for some new project. If you’re coming from a startup perspective, there should still be some interesting things to think about here.
It may be surprising to hear, but the absolute #1 consideration when choosing an RDBMS should be to find one that fits your technical organizational structure and culture. Conway’s Law has the trump card here. Fighting this will just be painful and probably not worth it.
This can be at a very basic level. If you have nearly entirely Microsoft Windows based technical organization, choosing an RDBMS which runs only on Linux will not make you or any of your technical staff happy.
Database and Development Staff
One of the most critical factors in supporting an RDBMS is the technical staff available to you. This include employees, contractors you normally work with, AND the pool of potential applicants for either of these roles. If you choose FoxPro DBaseII today, you’ll have a hard time finding people to support it. This is one of the key things to keep in mind if you choose something unusual – where will the admins and experts to run and code for it come from? This is also a question to ask any vendor trying to sell you any RDBMS – what do they do to develop talent around their product? Can you actually attend a random session of a class or conference to see how many people are there and how engaged they are?
It is not incredibly difficult for developers to work with different RDBMSes, though it does take conscious effort to be aware of a number of the different choices the RDBMSes make and what they mean. It may cause context switching if a developer has to flip back and forth, particularly when doing fairly deep work.
At the infrastructure level, each relational database platform really requires a DBA or database engineer with a different skill set. At entry and mid levels, there’s a fair amount of similarity and many DBAs handle multiple platforms, but as you scale and a platform gets more demanding, you need deeper levels of expertise either on-call or on staff.
DBAs are one of the last technical specialties to jump on the devops bandwagon, and with some platforms you may have a harder time finding database professionals who are used to (or even willing to) work in a devops environment.
What Do You Want to do When the Stuff Hits the Fan?
If I had to pick one element as the most critical, this is it. When there is a worst-case scenario and you have an outage which was not caused by human error or normal hardware problems, but by the RDBMS itself, which of these two adventures do you choose?
- You have someone on staff or a pre-arranged support contract with a consultant or consultant organization that can dig through all details, determine a fix (be it custom code changes to the RDBMS or updating to a specific patch level), and implement the fix. You’re willing and able to run on a branch of the code with these fixes or to advocate at the right levels with the support organization or open-source maintainers to get the fix merged back into a main and maintained branch.
- You can navigate the bureaucracy of the RDBMS vendor to get real support and a real fix delivered in a reasonable time frame. This involves much more communication and figuring out what to do when told something is “working as designed”. You’ll often do better on this the more revenue you represent to the vendor with better access to deeper technical folks the more you pay.
These are not the commonplace problems that are caused by a poorly architected implementation. These are the deeper problems requiring code-level fixes.
The first adventure above is what happens with an open-source RDBMS and the second is for a vended RDBMS. In either case, community involvement is fairly critical for anything beyond a mid-size implmentation.
A similar calculus goes for how you want to approach it when you have a feature that you absolutely must have, and have to try to get it into the code via either getting open-source contributions actually noticed or through communicating with a vendor. Either path is likely to take years.
Location of Computing Resources
Databases generally need to sit physically near the applications they serve. When we care about whether a query takes 10 microseconds or 10 milliseconds, network time matters a lot. This means that compute for the application in Google Cloud and using AWS Aurora probably isn’t the best match up.
Most clouds offer a few databases fairly natively. But most of them you can also spin up your own virtual machines and run any RDBMS for which you have or are willing to pay for licenses. You get the most flexibility in the long run with an RDBMS that you COULD run anywhere, but moving data center methodologies or clouds is so complicated that it may not be worth considering long term portability all that much. It feels like it gives you better negotiating power, but I’m not really sure how much of that you have if you’re not one of the larger organizations anyway. I’m not a cloud economist.
Many applications these days support only one or a small selection of relational databases. This tends to lead to a proliferation of different RDBMSes as different ones are required, and very few see the RDBMS as a deciding factor when choosing an application.
It is entirely worth pushing a vendor to support the RDBMS you want, with the caveat that it’s not all that fun to be a pioneer in this area. It is not generally worth trying to make a vended product work on an RDBMS they don’t support. You’ll simply get a lot of finger pointing for complicated problems.
Everyone thinks they’re the next Facebook or Twitter, and think that the RDBMS they start with will need to scale to millions of queries per second. You’re not going to be the next Facebook. And if you do become so, you’ll have the resources to change platforms to something that can better handle your scale. I can name both vended and open-source options in RDBMSes that can indeed scale from puny to millions of queries per second, but going from RDBMS A on a single VM to RDBMS A that can handle millions of queries per second is often just as complicated as switching RDBMSes. Focus on something that will handle realistic estimates for the next 5 years and call it good.
Changing Relational Database Platforms
Changing RDBMSes sucks. The easy parts are the structures and the data, oddly, though depending on data size it can take a while, and trying to keep things up during a cutover is expensive mostly in the time of the experts to do it. The hardest part is in-database logic. Isolation levels do not work identically across RDBMSes. Stored procedure languages are totally different, and finding the consultant fluent in the two that you need is difficult and may be expensive. You will probably have totally different performance bottlenecks, and they may be better or worse than the previous RDBMS. I have seen clients working on a transition from one RDBMS to another take decades to actually accomplish what they thought was a two-year task. It’s the Gilligan’s Island of relational databases. You thought it would be a 3-hour trip from here to there, but you get stuck in the middle and basically lose yourself and all sense of reality.
It’s surely not impossible, though. And if you’re moving to a vended RDBMS the vendor likely has all kinds of free resources available to help you, particularly if you’re willing to be part of their marketing. They may also offer you truly astounding discounts on the software for the first year or more.
The licensing cost of the RDBMS is not at the top of my list as deciding factors. You can find low-cost or free versions of several commercial RDBMSes, and depending on your organizational structure and the experts you are able to hire, you may want to pay for support even for a “free” RDBMS. For example, Db2 has a free version without support that can be used on up to 4 cores and with up to 16 GB of memory. If an RDBMS that costs money better fits your organizational structure, and therefore your needs in an RDBMS, it is likely well worth the cost. Navigating the costs may also be an area where it is worth finding a consultant who does not work for a company that sells support contracts to get ideas on what your options are cost-wise, given your use case. With Db2, working with an IBM Gold Consultant is a good way to go on this. IBM Gold Consultants do not work for IBM, but are independent consultants that IBM has recognized as having a deep level of expertise and involvement in the community. There are absolutely other experts outside this group who can help you as well. This is an example using the platform I’m most familiar with. I would expect similar things to exist in other RDBMS communities.
Which RDBMS is Best?
Every RDBMS makes little technical choices that are not better or worse, but just different. Would I try to put an organization who is entirely dedicated to Azure and .Net on Db2? Probably not as the first choice. They would hate it. Conversely, would I try to put a strongly hierarchical organization which likes monoliths and is used to communicating with large vendors on MySQL? Nope, they also might hate it without the right support structure around it. Have I seen organizations successfully cross this divide? Absolutely, but it involves hiring and retention of just the right people. I can think of advice I would have for each situation to make it as successful as possible and to help communicate the challenges to upper management. But given the choice, it is better to match the RDBMS to your organizational structure.
The best RDBMS for you fits the shape of your organization and how you deliver technical solutions.If you choose an RDBMS and find yourself frustrated with thing after thing it can’t do, it’s likely you are asking the wrong questions. It just doesn’t line up with the way your organization does business, and another RDBMS looks better not because it does things the previous one doesn’t, but because it asks and answers questions in a way that aligns with your way of doing business.