The Basics Of Containers for DBAs

I’ve spent much of the last year or two on a containerization and DevOps crash course. I’m just recently starting to come up for air and fill in some of the gaps that being neck-deep in the how leaves in the what and why. I’m starting to share what I’ve learned, so be on the lookout for more posts about containerization and DevOps.

This article covers what a container is, from the perspective of someone with a typical Db2 DBA skill set.

What is a Container?

A container in its simplest definition is an isolated area of an OS with resource usage limits. The thing that always comes to mind when I’m describing containers is this hypothetical conversation:
Developer: It works on my machine!
Manager: Ok, let’s ship your machine, then!

Many of the problems that containers solve are in the software development space. Containers make a lot of sense for applications that do not include much stateful data and need to scale out quickly and easily. It turns out they have a lot of applications beyond that. Check out my article – An Overview of Running Db2 in Containers – for some of the reasons around using containers for databases.

Containers vs. VMs

Being old enough to have seen first-hand the revolution that virtual machines represented for information technology, I find that comparing containers to virtual machines really helps me. I remember that early on, DBAs were very much against running databases on virtual machines. I’ve experienced and heard horror stories about running databases in poorly architected virtual environments. However, these days, unless you’re talking about a database with really demanding performance or throughput requirements, I have no trouble running Db2 databases on virtual machines.

Here’s a comparison of bare metal servers, virtual machines, and containers:

This is not a unique take; you can find versions of this diagram on a hundred different sites.

The thing to really note here is that with VMs, both the host and each guest run a full operating system. Containers instead share the host’s operating system kernel rather than booting a full guest OS of their own. In a lot of ways, the overhead is lower. Much like virtual machines, containers come with a ton of surrounding infrastructure for managing network, storage, and all the other required components.

While logging into a container feels much the same as logging into a VM, there are definitely some differences, particularly if you’re really doing things the container way. For example, in a VM I wouldn’t think twice about adding a user I needed using typical Linux useradd commands. In a container, I know that the right way to add a user is to change the container definition to include the user and then build/deploy the container. In general, you are not making changes to the container, you’re changing the container definition and then (re-)deploying it.

This should bring up pointed questions from any DBA, like “What about my data?” When running a database in a container, we still have stateful data and still use SQL, DDL, and RDBMS-specific commands to work with the data and structures. In Kubernetes, a StatefulSet and persistent storage are used to accomplish this.

Container Software

Notice that so far, I haven’t once said “Docker”. Docker is the container software most people use to work with containers, and it is what I use. Recently, when Kubernetes announced that it would stop supporting Docker as a container runtime, I was shocked that the talented technicians I work with were not the least bit bothered and didn’t think it would be a problem for our environments. Other container runtimes are perfectly valid to use. I don’t pretend to have the expertise to discuss other options.

Images

Images are what you start with for containers. You get an image from a public or private repository, or you build an image yourself. When you pull an image from a repository, it is placed in a local repository of images, and that image may be used by many containers. An image consists of layers that are defined by a Dockerfile. These layers are read-only. Basically every file that you need to have a running “server” is included in these read-only layers. There is also a manifest that defines the layers.

Containers

How, then, do you get a running server from a bunch of read-only files? The answer is that each container, when run, adds a thin writable layer on top of the image’s read-only layers. Changes made in a running container go to that writable layer and are never written to the image itself.
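As a rough mental model only (this is not how Docker implements its storage drivers), the relationship between read-only image layers and a container’s writable layer can be sketched in Python with `collections.ChainMap`. The file paths, contents, and the `start_container` helper are made up for illustration.

```python
from collections import ChainMap
from types import MappingProxyType

# Read-only image layers, lowest first. Paths and contents are illustrative only.
base_os = MappingProxyType({"/etc/os-release": "linux", "/bin/sh": "shell"})
db_layer = MappingProxyType({"/opt/db/bin/engine": "binary", "/etc/db.conf": "port=50000"})


def start_container(*image_layers):
    """Return a container's view of its files: an empty writable layer over the image layers."""
    writable = {}
    # ChainMap searches left to right, so later image layers shadow earlier ones,
    # and every write lands in the first mapping (the writable layer).
    return ChainMap(writable, *reversed(image_layers))


# Two containers started from the same image share the same read-only layers.
c1 = start_container(base_os, db_layer)
c2 = start_container(base_os, db_layer)

c1["/etc/db.conf"] = "port=60000"  # change a file inside container 1 only

print(c1["/etc/db.conf"])        # port=60000 (read from c1's writable layer)
print(c2["/etc/db.conf"])        # port=50000 (still the image's copy)
print(db_layer["/etc/db.conf"])  # port=50000 (the image itself is untouched)
```

When a container is removed, its writable layer goes with it, which is why database files belong on persistent storage rather than in that layer.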

Docker Repositories

Images are pulled from Docker repositories. The default that Docker assumes is Docker Hub. You can also push your own images to Docker Hub, but unless you pay Docker Hub, they are then publicly available. Docker has recently implemented rate limiting, meaning that if you’re frequently pulling images in an enterprise context, you probably want to have an enterprise repository to avoid errors when you exceed your rate limit.
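To make “the default is Docker Hub” concrete, here is a simplified Python sketch of how a short image name is conventionally expanded into a fully qualified one. The function name `normalize_image_ref` is hypothetical, the registry host `registry.example.com:5000` is a placeholder for an enterprise repository, and the sketch ignores digest references (`name@sha256:...`).

```python
def normalize_image_ref(ref: str) -> str:
    """Expand a short image name into registry/path:tag form.

    Simplified sketch: handles name[:tag] only, not digest references.
    """
    default_registry = "docker.io"
    default_namespace = "library"
    default_tag = "latest"

    first, sep, rest = ref.partition("/")
    # The first path component is treated as a registry host only if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = default_registry, ref

    # A tag is a ':' that appears after the last '/'.
    last_component = path.rsplit("/", 1)[-1]
    if ":" not in last_component:
        path = f"{path}:{default_tag}"

    # Official images on Docker Hub live under the "library" namespace.
    if registry == default_registry and "/" not in path:
        path = f"{default_namespace}/{path}"

    return f"{registry}/{path}"


print(normalize_image_ref("centos:8"))
print(normalize_image_ref("registry.example.com:5000/dba/db2-sandbox:1.0"))
```

The first call resolves to Docker Hub, so it counts against the rate limit; the second names its registry explicitly, so the pull goes to that repository instead.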

Dockerfiles

To build a new image, you use a Dockerfile to define what you want to happen. You generally start from another image, whether that’s a base OS image like CentOS 8 or an image that includes more. Go with something that has the bare minimum of what you need. One of the precepts of containers is to have only what you need in a container: optimally a single app and what it needs to run. Size matters for Docker images and containers. The smaller, the better!

Once you’ve defined your starting point, your Dockerfile defines the additional steps needed to make the container what you want it to be. As you’re doing this, you want to balance the opportunity to reuse layers (do you have many containers that need the same set of actions?) against reducing the number of layers by grouping commands on the same line and issuing as few commands as possible.
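The effect of grouping commands can be illustrated with a small Python sketch that counts the layer-creating instructions in two versions of a Dockerfile. It assumes that `RUN`, `COPY`, and `ADD` are the instructions that add filesystem layers, as Docker’s documentation describes. The package (`ksh`), the user name (`dbuser`), and the helper names are illustrative, and the Dockerfile text is embedded as a string rather than built. It also shows a user being added through the container definition instead of with useradd in a running container.

```python
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}


def logical_lines(dockerfile: str):
    """Yield one instruction at a time, joining lines that end in a backslash."""
    buffer = ""
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("\\"):
            buffer += line[:-1].rstrip() + " "
            continue
        yield buffer + line
        buffer = ""


def count_layers(dockerfile: str) -> int:
    """Count the instructions that add a filesystem layer on top of the base image."""
    return sum(
        1
        for instruction in logical_lines(dockerfile)
        if instruction.split(None, 1)[0].upper() in LAYER_INSTRUCTIONS
    )


# One RUN per command: three layers on top of the base image.
SEPARATE = r"""
FROM centos:8
RUN yum -y install ksh
RUN useradd dbuser
RUN yum clean all
"""

# The same commands chained in a single RUN: one layer.
GROUPED = r"""
FROM centos:8
RUN yum -y install ksh && \
    useradd dbuser && \
    yum clean all
"""

print(count_layers(SEPARATE))  # 3
print(count_layers(GROUPED))   # 1
```

The grouped form yields fewer layers, and cleanup done in the same `RUN` actually shrinks the layer. The separate form may be preferable when several images share the first steps and can reuse those layers.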

Summary

Understanding some of these basics of containers is useful in a DBA’s career. We have more and more demands to containerize databases and to assist developers with their containerized environments. Even if the only way you use containers is for your own sandbox environments, they’re still ridiculously easy to use.

Lead Database Administrator
Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 18 years developing a top-level expertise on Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set to the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares both posts about her core skill set and her journey into Data Science. Ember lives in Denver and works from home.
