I have seen the question several times on various groups or forums – “How do I install SQL?”. From a DBA standpoint, this isn’t even a properly formed question. From a developer learning standpoint the question makes sense. I needed Python, I went and downloaded Anaconda. I needed Perl, I put Strawberry Perl on Windows and used the package manager to install or update it on Linux.
SQL is different, though. The procedural extensions for SQL (SQL PL, PL/SQL, T-SQL) can have most of the features of a full programming language. However, plain SQL is a language centered on data retrieval and manipulation. Originally this data retrieval was from relational databases, but today SQL can be used against a variety of data including interfaces for working with XML, JSON, and even flat files in interesting ways.
Behind SQL there is always some sort of database engine. What does a database engine do? It does a ton of data-centric things. At its most basic, it provides a way of interacting with data in a logical way without the end user having to know just where on disk data is stored. I don’t have to know the file name or location within the file.
There are a ton of additional common problems that the database engines have solved or nearly solved to help you with dealing with data. Many of them will do data type conversions for you. They have optimizers that take the logical requests for data that they are given (queries) and figure out where that data is on disk and how to put it all together the way you asked for it and return it. They can handle users concurrently reading and even updating the same piece of data at a much more granular level than a file.
The original database engines were hierarchical. For decades, relational database engines have ruled the roost, and are the default thing many think of when they hear “database”.
Some database engines are targeted on a specific purpose and do that function well while not doing something else as well. No database engine is the fastest for every possible use case, and especially today, some are really purpose-built to do one thing really well. Sometimes they focus on really fast ingest. Sometimes they handle specific data structures or non-structures really well. Sometimes they’re awesome at crunching through a lot of data in one specific way or another.
One of the best representations I’ve seen out there is Research451’s data platforms map. Unfortunately you need to provide information to actually download it, and likely have to pay for the most recent versions. But it does a great job of letting you see what data platforms are related and similar and which are not. Let’s talk about a few general classes and examples.
I list this one by itself because of its ubiquity in the learning community and ease of use for a single user.
SQLite is a small, simple C-library implementation of an SQL engine. If you must have something local for a single user, particularly if you want something easy to use, it is a good choice. Perfect for students who want to learn SQL, or if you need an SQL engine on a mobile device. It will not scale for many users or for large amounts of data. It is probably the closest you will get to the “How do I install SQL?” type of simplicity.
SQL also has many dialects. While there are international standards for SQL, just about every database engine adds on and has their own way of doing certain things. Every dialect supports different parts of the standard. Many of the newer database engines will state which SQL they’re compatible with. Amazon Aurora, for example claims to be “MySQL and PostgreSQL-compatible”.
This means that if you’re learning SQL so you can work with a particular database engine, you likely don’t want SQLite, and instead want the specific database engine you’ll be working with.
Enterprise Relational Database Management Systems
The three big options in this space are Db2 (IBM), Oracle, and Microsoft SQL Server.
These are the big relational database engines that have been around a long time. So long, in fact that they’ve incorporated features from a number of data management fads over the years – Object-Oriented databases, XML, NoSQL – all the big engines have features in these spaces.
There are several major advantages with an Enterprise RDBMS.
- Years have been invested in solving data problems for paying customers using huge and highly active databases. This means they have things like concurrency controls largely solved in well-documented ways. It may take time to understand and use their solutions, but they are there.
- They have invested millions of dollars in their optimizers. They generally use cost-based optimizers instead of rule-based ones, making high performance easier to achieve when you know how to use their tools in this area.
- They have professional support either included in their licensing fees or for an additional cost. You can also sometimes buy support for the free ones in the next section.
- They’re scalable. They often have a cheaper entry-level option, but can scale out to high volume (12,000 transactions per second) without blinking, and out to unimaginable scale with a bit more effort.
- Documentation – it’s pretty good once you know how to navigate, though often impenetrable when you don’t. Since they’re widely used by talented professionals there is a lot of content around how to use them, too.
There are several disadvantages in this space. The first is the obvious one – cost. Free offerings of the enterprise RDBMSes:
|RDBMS||Product||Feature Limits||Data Size Limit||Core Limit||Memory Limit|
|IBM Db2||Db2 Community Edition||NONE – all fixes and features included||100 GB||4||16 GB|
|Oracle||Oracle XE||No more than 3 databases
Restricted feature set
|12 GB||2||2 GB|
|Microsoft SQL Server||SQL Server 2017 Express Edition||Restricted feature set||10 GB||1 Socket/4 Cores||~2GB|
These limits may work for some environments, but be sure to read the small print on the feature limits. I also rarely see a database that’s smaller than 12 GB these days. Please note I have taken the limits from each vendor’s site as of the day this article was written – please verify they’re still true as of the time you read this.
Enterprise databases aren’t easily extensible – you can’t get the code and compile your own fix. Though how many actually do that, I don’t know. If you’re a small organization, it can be very hard to get the big vendors’ attention for something they consider “working as designed” – unless many other people have the same issue or a large implementation has the same issue. Finally, they make some assumptions that may not fit your needs – such as the focus on ACID and being very reluctant to impact that in any way.
Open-Source Relational Database Management Systems
Offerings in this space include PostgreSQL, MySQL, and MariaDB. There are many others.
These are popular open-source databases with a lot of contributions. They all have strengths and weaknesses. I would likely list their weaknesses as the same things that are strengths for the enterprise database engines, with one important addition. It seems exceptionally easy with the ones I have touched to shoot yourself in the foot. Choose the wrong storage engine for MySQL and you just lost recovery to any point in time. You probably gained something else, but you have to be very aware of the trade-offs.
On the strengths side, well “Free“. Also you can often code your own extensions and get a lot of support from some very robust communities.
You can buy support from various vendors for these products. MySQL support is available from Oracle. PostgreSQL support is available from IBM. Support is likely available from other vendors as well.
No-SQL and Unstructured Data
Why am I talking about No-SQL in an article about SQL? Well, when the No-SQL movement started, it meant that here was no SQL in the product. But then it morphed to Not Only SQL. Many of the No-SQL products started to solve one problem really well. Maybe it was ingest data really fast. Maybe it was sharding data across many servers with redundant copies of the data. Maybe it was not requiring a schema up front. But each of the specific use cases had other consequences. Ingesting data really fast might mean there are no relationships to be defined and retrieving the data is complicated. Sharding data tends to lead to consistency issues in the data and the loss of a significant portion of ACID. Not requiring a schema up front seems great at first, but it leads to a lot of scanning through data later to aggregate or pull out specific data.
When choosing one of these options, it is particularly important to be aware of what the limitations are and make sure they match your use case. If you don’t have extremely well defined requirements, it is often better to go with one of the more established and traditional database engines.
For example, MongoDB is great for some unstructured data use cases, but trying to make it ACID will probably either not work or lead to poorer performance than a Relational Database would have. If eventual consistency works for your use case, great. If you don’t know what eventual consistency is, you don’t have your requirements for a database management system defined well enough.
Why is SQL Needed?
So many languages and frameworks these days have database abstraction layers. Rest APIs are all the rage to retrieve data. Why do you even need SQL? I am obviously biased. I’ve been a relational DBA for 18 years, and you can probably see that in the tone of this post. Time after time I see issues with the abstraction layers. Truly performant SQL is simply easier to achieve in pure SQL. Those abstraction layers add syntax and calls that you just don’t need, sometimes to the tune of 60% or more of your CPU. When the speed of a transaction a user is waiting for boils down to “How fast can I write the data to the database?” or “How fast can I read the data from the database?”, milliseconds matter. I have spent hours of my life just trying to shave a few milliseconds off a particularly critical query.
We can argue about which method is best, but learn SQL and it won’t matter what language or framework you’re using – data retrieval and change will just work.
Nice post… One comment: In the Microsoft SQL Server community I have noticed that some people use “SQL” and a short version of “SQL Server” so people may be asking “How do I install SQL Server?” when they are asking “How do I install SQL?”