Improving Tugboat QA Environment Build Times with Parallelization

We make extensive use of Tugboat in our development process to generate on-demand QA environments for every branch. However, "on demand" isn't instant. The site still has to be built (e.g., building the theme, running deployment commands, running database updates, etc.) before it is ready for testing.

This isn't much of an issue when building a single site, but when a codebase powers a platform of ~20 sites, the build takes too long to be considered "on demand", which isn't what you want in a CI/CD process. As we added sites to the platform, build times kept rising and recently topped the 15-minute mark, which felt like a bridge too far. So we went to work on a solution.

Running each site build sequentially left us few ways to speed things up: remove build operations, or make the existing operations faster. We even explored limiting which commands we run based on the directories touched by the diff, but decided that the few cases where it would actually help weren’t worth the risk of deviating from the production deployment process.

It became clear that a real solution would require thinking outside the sequential box.

Task Parallelization with Robo

What if we could run multiple commands at one time and deploy the sites in parallel? We had previously used the parallel command in instances such as these with some success, but sorting out all of the flags and arguments isn't an intuitive process, which left us unsatisfied with the end result.

Enter Robo, our tool of choice for simplifying the authoring of CI/CD tasks for the past several years. While researching alternative solutions for parallelizing our builds, we found the ParallelExec class and set about refactoring our code.

Thankfully, the refactoring proved to be fairly straightforward, but it can never be too easy. Running in parallel exposed some unexpected speed bumps caused by idiosyncrasies in the way child themes are built, but we were able to overcome these issues and soon had a working prototype.

Below are snippets of the original process and the new process.

Sequential Robo Deployment Script

// Each site is built in turn; nothing starts until the previous step finishes.
foreach ($sites as $siteName => $siteInfo) {
  $this->io()->section("$siteName: tugboat build.");
  // Build the theme.
  $this->say("$siteName: theme build.");
  $results->accumulate(
    "$siteName theme build",
    $this->taskExec("composer robo theme:build $siteName")->run()
  );
  // Deploy Drupal.
  $results->accumulate(
    "$siteName deploy drupal",
    $this->taskExec("composer robo deploy:drupal $this->tugboatRoot $siteName web")->run()
  );
}
...

Parallel Robo Deployment Script

$parallelTasks = $this->taskParallelExec()->printOutput(TRUE);
foreach ($sites as $siteName => $siteInfo) {
  // Build the theme.
  $parallelTasks->process(
    $this->taskExec("composer robo theme:build $siteName")
  );
  // Deploy Drupal.
  $parallelTasks->process(
    $this->taskExec("composer robo deploy:drupal $this->tugboatRoot $siteName web")
  );
}
// Nothing runs until here: all queued commands are started together.
$parallelTasks->run();
...

Leveraging the online Build Command

While the parallelization of the build steps was the largest factor in reducing our build times, it wasn't the only optimization we made. The online command allows "commands to run once, after a Preview has built, is online, and is ready to accept incoming requests," which allowed Tugboat to report success to the PR and then complete additional checks.

With this in mind, we moved the configuration validation process from the build command to the online command to further reduce our build times. This improvement also allowed us to report configuration check failures independently of the PR preview’s status. Now a failing config check does not fail the entire build. This empowers PR authors and reviewers to work more nimbly: client stakeholders can still preview a new feature or bugfix while developers sort out configuration issues.
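As a rough illustration of that split, a Tugboat config might look something like the fragment below. Only the build and online command keys come from Tugboat itself. The service name, the image tag, and the two Robo command names (tugboat:build and config:validate) are placeholders for this sketch, not our actual configuration.

```yaml
# .tugboat/config.yml (illustrative sketch; names below are placeholders)
services:
  php:
    image: tugboatqa/php:8.1-apache
    default: true
    commands:
      # Runs while the preview is being built; this is what the PR waits on.
      build:
        - composer robo tugboat:build
      # Runs once the preview has built and is online.
      online:
        - composer robo config:validate
```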

Tugboat Build Time Trial Results

With these changes we took a ~14 minute build process down to ~3.5 minutes, which is a substantial win. That is about four times as fast according to my math (a 75% reduction), and it should scale nicely into the future. This should also be a big win for reducing developer context switching while waiting for a QA environment.

The only downside is that we now only have 3 minutes to refill our coffee after submitting a PR. ☕️😏️

Caveat: Parallelization Within Concurrent Builds

We should note that Tugboat can build multiple previews concurrently, configurable in its infrastructure settings on some plans. However, when we enabled parallelization within the build process on a server that was already building several previews concurrently, we quickly began maxing out our CPU and memory usage.

Increasing our server size is one solution to this problem, but we opted for reducing the concurrency of Tugboat preview builds. New previews might have to wait in line, but the line moves much more quickly.

Limiting the number of cores used by Robo's parallel task execution was also proposed as a way to limit the resources one preview would consume. This should allow multiple previews to build at once, each using parallelization within its own build process. However, Robo does not currently support this, so we can't yet report whether it would help.
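To make the idea concrete, here is a minimal sketch of what a capped process pool could look like in plain PHP, outside of Robo. The runWithLimit function is a hypothetical helper written for this post, not a Robo API, and we have not tried it in our builds. It caps the number of simultaneous child processes rather than CPU cores, which is only a rough proxy for limiting resource usage.

```php
<?php
// Hypothetical helper (not part of Robo): run shell commands with at most
// $maxConcurrent child processes alive at any one time.
// $commands maps a label to a shell command; returns label => exit code.
function runWithLimit(array $commands, int $maxConcurrent): array
{
    $pending = $commands;  // Commands that have not been started yet.
    $running = [];         // label => process resource.
    $exitCodes = [];       // label => exit code (-1 if it could not start).

    while ($pending || $running) {
        // Start new processes while there is spare capacity.
        while ($pending && count($running) < $maxConcurrent) {
            $label = array_key_first($pending);
            $command = $pending[$label];
            unset($pending[$label]);
            // No pipes are requested, so nothing is captured by this script.
            $process = proc_open($command, [], $pipes);
            if ($process === false) {
                $exitCodes[$label] = -1;
                continue;
            }
            $running[$label] = $process;
        }

        // Collect any processes that have finished since the last pass.
        foreach ($running as $label => $process) {
            $status = proc_get_status($process);
            if (!$status['running']) {
                // The exit code is read on the first status call after exit.
                $exitCodes[$label] = $status['exitcode'];
                proc_close($process);
                unset($running[$label]);
            }
        }

        if ($running) {
            usleep(50000); // Poll every 50 ms instead of busy-waiting.
        }
    }

    return $exitCodes;
}

// Example usage (site names are placeholders):
// $exitCodes = runWithLimit([
//     'site-a theme' => 'composer robo theme:build site-a',
//     'site-b theme' => 'composer robo theme:build site-b',
// ], 2);
```

With a cap like this, several previews could in principle build side by side, each limited to a couple of child processes, instead of every preview launching all of its site builds at once.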