It’s extremely important that you have backups. Without backups, you are at serious risk of losing data. If you don’t have backups, all it takes is one machine failure or fat fingered command to lose data.
There are several different ways to backup a Postgres database. In this post, we are going to look at backups done through an approach called “continuous archiving”. We’ll take a look at how continuous archiving works. We’ll also cover how to setup continuous archiving, and how to restore a backup through continuous archiving.
One neat aspect of backups created through continuous archiving is they allow point in time recovery. Point in time recovery means you can use the backup to restore your database to any point in time you want. You are able to specify the exact moment to restore the backup to. Let’s say you accidentally ran DROP TABLE on the wrong table yesterday at 4:05pm. With point in time recovery, you can recover your database to 4:04pm, right before you dropped the table.
Let’s now take a look at how continuous archiving works. There are two pieces to continuous archiving backups: a base backup and a WAL archive. A base backup is a copy of the entire database at a specific moment in time. On the other hand, the WAL archive is a list of all the diffs made to the database after the base backup was created. The WAL archive is made up of series of WAL files. Each WAL file contains the diffs made during a small period time. Altogether the WAL files give you the diffs since the base backup was created. Not only is WAL used here as part of continuous archiving, but it is also used to recovery in case the database crashes. I’ve talked a bit about how the WAL helps when recovering from crashes here.
With a base backup and WAL archive, it’s straightforward to restore the database to any point in time after the base backup was created. You first restore the base backup. This gives you the database at the time the base backup was created. From there, you apply all of the WAL before the time you want to restore the database to. Since the WAL is a list of all the diffs made to the database, this will bring the database up to the state it was in at that specific time.
Now that we have an idea of how continuous archiving works, let’s take a look at how to set it up. We setup continuous archiving through the core functionality Postgres provides for setting up continuous archiving. If you are setting up backups for you own database, you should instead look at a tool like WAL-E or WAL-G instead. WAL-E and WAL-G wrap the core Postgres functionality and take care of a lot of the work for you. We’ll take a look at WAL-E in my next post.
Creating a base backup is straightforward. Postgres provides a command line tool, pg_basebackup. The pg_basebackup command creates a base backup of the specified Postgres instance and stores it in the location you specify.
Setting up the WAL archive is a bit more involved. Postgres has a configuration parameter archive_command. Every time Postgres generates a certain amount of WAL, it will create a WAL file. Postgres will then run the command in archive_command passing the location of the WAL file as an argument. The archive_command needs to copy the WAL file and store it somewhere where it can be retrieved later. In other words, in the WAL archive. The Postgres docs give the following example archive_command:
test ! -f /mnt/server/archivedir/%f & cp %p /mnt/server/archivedir/%f
When Postgres runs the command, it will replace the %f with the name of the WAL file and it will replace %p with the full path to the existing WAL file. The command above first uses the test command to make sure it hasn’t already copied the specified WAL file. It then copies the WAL file from it’s current location to /mnt/server/archivedir, the location of the WAL archive. Every time Postgres generates a new WAL file, it will run archive_command and copy the file to /mnt/server/archivedir.
Restoring a Postgres backup is pretty much the opposite of how you create it. First you need to restore the base backup. To do so, you clear out the Postgres data directory to wipe the existing Postgres database. Then you copy the files generated by pg_basebackup to the data directory. This will restore the base backup.
Then, to restore the WAL, you write a recovery.conf file. A recovery.conf file specifies a few parameters. The two most important ones are recovery_target_time and restore_command. recovery_target_time is the time at which you want to restore the backup to. The other parameter, restore_command is the opposite of archive_command. Instead of copying the WAL file from Postgres to the WAL archive, restore_command copies the WAL file from the archive to the specified Postgres directory. Whenever Postgres needs a specific WAL file, it will run restore_command with the name of the file it needs. Here’s an example recovery_command file the Postgres docs give that is the inverse of the above archive_command:
restore_command = 'cp /mnt/server/archivedir/%f %p'
This command copies the WAL file from /mnt/server/archivedir to the location where Postgres keeps WAL files.
Once you have copied the base backup and setup the recovery.conf file, all you need to do is start Postgres. Postgres will detect that you want to restore a backup and automatically run restore_command until it has restored up to the point you specified! This will give you a copy of your database at exactly the time you requested. If you want a more detailed walk through on how to setup a Postgres backup, the docs have a pretty good one.
I honestly find it incredible that you can restore backups in Postgres to an arbitrary point in time. The way I’ve discussed above is only one way to setup continuous archiving. Because it only uses the bare Postgres functionality, you still need to do a lot of work on your end to configure it how you want. In my next post, we’ll take a look at WAL-E. WAL-E wraps the Postgres functionality and makes it much easier to setup continous archiving. It even has functionality for storing backups in the cloud storage system of your choice!