Daily Drupal Backups with Jenkins in Five Lines

Some time ago, we posted a how-to about backing up Drupal databases to Amazon S3 using Drush, Jenkins, and the command-line tool s3cmd. Recently, we needed to revisit the database backup script, and we took the opportunity to update the S3 connection portion of it to use Amazon's official command-line tool, aws-cli, which they describe as "...a unified tool to manage your AWS services." At the same time, as part of our ongoing effort to automate all routine systems administration tasks, we created a small Ansible role to install and configure aws-cli on servers where we need to use it.

The backup script

Like the simple backup script we explained in the original post, the new variant still has the same three main tasks to perform:

  1. Create a gzipped dump of the database in a specified dump directory.
  2. Upload the new dump to an S3 bucket.
  3. Delete all database dumps in the dump directory older than ten days.

Creating a database dump hasn't changed; we still use drush to create a gzipped SQL file whose name includes a human-readable timestamp, so the files automatically sort with the oldest at the top of the list:

drush sql-dump --gzip --result-file=/home/yourJenkinsUser/db-backups/yourProject-`date +%F-%T`.sql

Likewise, deleting the older files has not changed; we use find to locate everything in the dump directory that is a regular file (not a directory or link), named something.sql.gz, and more than ten days old, and delete it:

find /home/yourJenkinsUser/db-backups/ -type f -name "*.sql.gz" -mtime +10 -delete

What has changed is that:

  1. We are now using the s3 subcommand of aws-cli.
  2. We decided that we only needed to store ten days of database backups on S3, too (in our original script, we didn't prune the offsite backups).

With s3cmd, we used the put command to upload the latest database dump to an S3 bucket, then deleted out-of-date dump files. With aws s3, we could use the aws s3 cp command to copy the most recent dump file to our S3 bucket, and then use the aws s3 rm command to remove out-of-date backup files from the S3 bucket, much like we use the find command above to remove out-of-date files on the server.
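
As a rough sketch, that approach would have looked something like the following; the file names are hypothetical, following the naming convention from the dump command above:

# Upload the newest dump, then remove one specific out-of-date dump from the bucket.
aws s3 cp /home/yourJenkinsUser/db-backups/yourProject-2016-02-22-03:00:01.sql.gz s3://yourBucketName/
aws s3 rm s3://yourBucketName/yourProject-2016-02-11-03:00:01.sql.gz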

However, doing this would have increased the complexity of what we intended to be a simple tool. That is, we'd have needed to list the contents of the S3 bucket and then intelligently decide which files to remove.
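
Concretely, pruning the bucket directly would have required something along these lines (a sketch only; it assumes GNU head and object keys without spaces):

# List the bucket, sort the keys (the timestamped names sort oldest-first),
# keep the ten newest, and remove the rest.
aws s3 ls s3://yourBucketName/ | awk '{print $4}' | sort | head -n -10 | \
  while read key; do aws s3 rm "s3://yourBucketName/${key}"; done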

But the aws s3 command has more subcommands than cp and rm, including one called sync. This command allows us to simply synchronize the contents of the local database backup directory with the S3 bucket. To make this work in the context of our script, we had to change the order of operations so that we first delete the local out-of-date database dumps, and then synchronize the directory with our S3 bucket.
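
If you want to preview what a sync would copy or delete before trusting it with real backups, aws s3 sync accepts a --dryrun flag (shown here with the same paths used in the script below):

# Show what would be transferred or deleted, without actually doing it.
aws s3 sync /home/yourJenkinsUser/db-backups s3://yourBucketName --delete --dryrun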

Since the script deletes at least one database dump file each time it runs (e.g. if it runs once per day, it will remove one file on each run), and since this happens before we copy the dumps offsite, it's important to make sure that a) the job stops as soon as any error occurs, and b) somebody is notified when or if this happens.

In the script below, we accomplish this with set -e, which causes the script to fail if any of the commands that run inside it returns an error. For longer or more complex scripts, it would be worthwhile to include more sophisticated error handling or checking.
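
For instance (one possible approach, not something the script below relies on), you could tighten the shell's failure modes and trap errors so that failures stand out in the cron or Jenkins output:

# Stricter failure modes, plus a simple report naming the failing line.
set -euo pipefail
trap 'echo "Database backup failed at line ${LINENO}." >&2' ERR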

The resulting script, which can be run daily using cron or a tool like Jenkins, looks something like this:

#!/bin/bash

set -e

# Variables.
MAXDAYS=10

# Switch to the docroot.
cd /var/www/yourProject/docroot/

# Back up the database.
drush sql-dump --gzip --result-file=/home/yourJenkinsUser/db-backups/yourProject-`date +%F-%T`.sql

# Delete local database backups older than $MAXDAYS days.
find /home/yourJenkinsUser/db-backups/ -type f -name "*.sql.gz" -mtime +${MAXDAYS} -delete

# Sync the backup directory to the bucket, deleting remote files or objects
# that are not present here.
aws s3 sync /home/yourJenkinsUser/db-backups s3://yourBucketName --delete
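
If you schedule it with cron rather than Jenkins, a daily crontab entry might look like this (the script path is hypothetical):

# Run the backup script every day at 03:00.
0 3 * * * /home/yourJenkinsUser/bin/drupal-db-backup.sh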

The aws-cli role

Installing and configuring aws-cli is not difficult, but since we use Ansible for other similar tasks, we created an Ansible role to do this, too. It installs the package and creates fully customizable versions of the required config and credentials files, making setting up the tool as easy as using the backup script. Automation FTW!
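
For reference, aws-cli reads its settings from two INI-style files in the ~/.aws/ directory; with placeholder values (the region and output settings here are only examples), they look like this:

# ~/.aws/credentials
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

# ~/.aws/config
[default]
region = us-east-1
output = text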