I recently discovered an interesting (and frustrating) issue while running PostgreSQL in Kubernetes. While the issue technically isn’t specific to Kubernetes, it is certainly easy to encounter there for users of the official PostgreSQL Docker image (and many other images) who mount volumes somewhere other than the default location. Luckily, it didn’t affect anything in production, but it turned out that even with a volume set up for persisting data outside the container, restarting the container caused my data to disappear!
Yes, I know all about containers needing volumes to store data outside of their writable layer. In fact, the issue is related to how the volume needs to be mounted with the PostgreSQL image. After digging through the image’s Dockerfile, I discovered that the problem is related to its use of the
VOLUME directive. Many, many people complain about the use of the
VOLUME instruction in upstream images and for good reason.
The default location for data in the PostgreSQL image is
/var/lib/postgresql/data, though the README for the image has some slightly confusing advice about choosing a precise location. In my Kubernetes StatefulSet, I defined a persistent volume and mounted it at
/var/lib/postgresql, which seemed harmless enough. My assumption was that this would allow the data directory to be safely created, persisted, and carried along across restarts of the container/Pod. After running the image for some time, listing the contents of this directory showed nothing unusual:
root@db-0:/var/lib/postgresql# ls
data
There’s nothing else in there, and PostgreSQL was clearly creating and storing things in there as expected. It turns out, though, that mounting the volume at this parent directory matters a lot. After restarting this Pod during some routine maintenance, all the data was gone! Not just that, but the database had been reinitialized as if it never had any data. Even stranger, checking the timestamp of the volume/directory
/var/lib/postgresql yielded the date from when I first launched the container (as expected). Digging deeper, I saw the underlying volume created by Kubernetes was the same volume, still intact. So the same volume is still there and mounted but it doesn’t have my PostgreSQL data anymore. What gives?
The use of the
VOLUME instruction within the Dockerfile, it turns out, has a peculiar consequence for scenarios like these. If no volume is attached at the precise location specified by the
VOLUME instruction (by Kubernetes or otherwise), the Docker engine creates what’s called an anonymous volume at that location. The problem is worse on Kubernetes because of the distinction between Pods and containers: Kubernetes volumes follow the lifecycle of the Pod, which commonly includes replacing containers outright, while volumes created because of the
VOLUME directive are specific to the container. So while I did create a volume for the PostgreSQL data and it was properly mounted at
/var/lib/postgresql, the underlying Docker engine was creating a separate, container-specific volume one level deeper at
What is the lesson here? Well, the general idea is not to use
VOLUME in your Dockerfiles; it is far less flexible and has no real upside. Instead, provide environment variables for specifying the data location and let the users of your image decide how and where to create volumes. When you are working with a vendor-provided image, as in this case, be sure to thoroughly read and understand the Dockerfile used to create it, and try to stick to the locations and examples provided in its README. For the PostgreSQL image, I ended up mounting a volume at
/var/lib/postgresql/data and even set
PGDATA to a subdirectory of this volume, as recommended in the README.
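As a rough sketch of what that looks like in a StatefulSet (this is not my exact manifest): the names db and pgdata, the postgres:16 tag, the db-credentials Secret, and the 10Gi size are all placeholders, and the pgdata subdirectory follows the example in the image README. It also assumes an image version whose Dockerfile declares VOLUME at /var/lib/postgresql/data, so check the Dockerfile for the tag you actually run.

```yaml
# Illustrative StatefulSet excerpt. All names, the image tag, and the
# storage size are assumptions, not values from a real deployment.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 1
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: postgres
          image: postgres:16   # assumed tag; verify the VOLUME path in its Dockerfile
          env:
            # Keep the actual data in a subdirectory of the mounted volume.
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials   # hypothetical Secret
                  key: password
          volumeMounts:
            - name: pgdata
              # Mount at the exact path named by the image's VOLUME instruction,
              # not at the parent /var/lib/postgresql, so that no anonymous
              # volume gets created underneath the persistent one.
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: pgdata
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

The important part is only the mountPath and PGDATA pair; everything else is scaffolding to make the excerpt complete.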
This certainly isn’t an argument against using the official PostgreSQL image or running other databases in Docker/Kubernetes; it is an argument for really digging in and understanding how these things work. Running databases this way offers a ton of flexibility and speed, and it can be very robust and reliable. As with all things, though, be sure to test various failure scenarios to make sure everything is actually working as expected. This issue was obviously very frustrating, but once I understood it, it was easy enough to take into account.