Migration Memory Management with Batching and Limits

Migrations are fraught with unexpected discoveries and issues. Memory problems on particularly long or processing-heavy migrations should not be yet another obstacle to overcome, but they often are.

For instance, I recently ran into the following error no matter how high I raised the PHP memory_limit or how low I set the --limit flag value:

$ drush migrate:import example_migration --limit=1000

Fatal error: Allowed memory size of 4244635648 bytes exhausted (tried to allocate 45056 bytes) in web/core/lib/Drupal/Core/Database/Statement.php on line 59

Fatal error: Allowed memory size of 4244635648 bytes exhausted (tried to allocate 65536 bytes) in vendor/symfony/http-kernel/Controller/ArgumentResolver/DefaultValueResolver.php on line 23

Note that the --limit flag, while extremely useful, does not reduce the number of rows loaded into memory. It only limits the number of destination records created. The source data query has no LIMIT clause, and the processRow(Row $row) method in the source plugin class is still called for every row.

Batch Size

This is where query batching comes in. The functionality lives in \Drupal\migrate\Plugin\migrate\source\SqlBase and allows the source data query to be performed in batches, effectively by adding SQL LIMIT clauses to the query.

The batch size can be controlled in the source plugin class via the batchSize property.

/**
 * {@inheritdoc}
 */
protected $batchSize = 1000;

Alternatively, it can be set in the migration YAML file with the batch_size key under the source definition.

source:
  batch_size: 10
  plugin: example_migration_source
  key: example_source_db

I could find very few references to this option in existing online documentation. I eventually discovered it via a passing reference in a Drupal.org issue queue discussion.

Once I knew what I was looking for, I went searching for how this worked and discovered several other valuable options in the migration SqlBase class.

\Drupal\migrate\Plugin\migrate\source\SqlBase

/**
 * Sources whose data may be fetched via a database connection.
 *
 * Available configuration keys:
 * - database_state_key: (optional) Name of the state key which contains an
 *   array with database connection information.
 * - key: (optional) The database key name. Defaults to 'migrate'.
 * - target: (optional) The database target name. Defaults to 'default'.
 * - batch_size: (optional) Number of records to fetch from the database during
 *   each batch. If omitted, all records are fetched in a single query.
 * - ignore_map: (optional) Source data is joined to the map table by default to
 *   improve migration performance. If set to TRUE, the map table will not be
 *   joined. Using expressions in the query may result in column aliases in the
 *   JOIN clause which would be invalid SQL. If you run into this, set
 *   ignore_map to TRUE.
 *
 * For other optional configuration keys inherited from the parent class, refer
 * to \Drupal\migrate\Plugin\migrate\source\SourcePluginBase.
 * …
 */
abstract class SqlBase extends SourcePluginBase implements ContainerFactoryPluginInterface, RequirementsInterface {

Migration Limit

Despite the “flaws” of the --limit flag, it is still a valuable tool for mitigating migration memory issues and increasing migration speed. My anecdotal evidence, from timing the output of the --feedback flag, shows much higher throughput for the initial items and a gradual slowdown as a migration progresses.

I also encountered an issue where the migration memory reclamation process eventually failed and the migration ground to a halt. I was not alone: MediaCurrent found and documented the same issue in their post Memory Management with Migrations in Drupal 8.

Memory usage is 2.57 GB (85% of limit 3.02 GB), reclaiming memory. [warning]
Memory usage is now 2.57 GB (85% of limit 3.02 GB), not enough reclaimed, starting new batch [warning]
Processed 1007 items (1007 created, 0 updated, 0 failed, 0 ignored) - done with 'nodes_articles'

The migration would then cease to continue importing items as if it had finished, while there were still several hundred thousand nodes left to import. Running the import again would produce the same result.

I adapted the approach MediaCurrent showed in their post to work with Drush 9. It solved the memory issue, improved migration throughput, and provided a standardized way to trigger migrations upon deployment or during testing.

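The crux of the solution is to call drush migrate:import repeatedly in a loop with a low --limit value, which keeps the average item processing time low. Each call runs in a fresh PHP process, so memory is released between runs, and items that are already imported are skipped on the next run.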

Our updated version of the script is available in a gist.
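The gist is not reproduced here. As a minimal sketch of the looping idea only, assuming Drush 9 with the migrate:import command available on the PATH, the loop might look something like the following. The function name import_in_chunks and its three arguments are hypothetical, and the caller supplies the total row count up front; the actual script may well determine progress differently.

```shell
# Sketch: import a migration in fixed-size chunks, one drush process per chunk.
# import_in_chunks is a hypothetical helper, not part of Drush.
# Usage: import_in_chunks MIGRATION_ID TOTAL_ROWS CHUNK_SIZE
import_in_chunks() {
  migration_id="$1"
  total_rows="$2"
  chunk_size="$3"

  # Round up so a final partial chunk still gets its own run.
  runs=$(( (total_rows + chunk_size - 1) / chunk_size ))

  run=1
  while [ "$run" -le "$runs" ]; do
    echo "Run ${run}/${runs}: importing up to ${chunk_size} items"
    # Each call is a separate PHP process, so memory is freed between chunks.
    # Stop at the first failing run instead of retrying blindly.
    drush migrate:import "$migration_id" --limit="$chunk_size" || return 1
    run=$(( run + 1 ))
  done
}
```

For example, import_in_chunks example_migration 250000 1000 would issue up to 250 separate imports of 1000 items each. Returning on the first non-zero exit status is a design choice so that a deployment script can detect the failure and react to it.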

Disabling Hooks

Another potential pitfall is the execution of entity hooks during a migration. These hooks often perform important logic, but during a migration the additional actions may be unwanted. I have yet to find a way to determine whether an entity hook is being triggered from a migration without using custom code. Leveraging a faux field and checking for its existence, as shown by this article, is the best approach I have seen so far.

So the next time you are tasked with an overwhelming migration challenge, you no longer need to worry about memory issues. Now you can stick to focusing on tracking down the source data, processing and mapping it, and all of the other challenges migrations tend to surface.