Migrations are fraught with unexpected discoveries and issues. Fighting memory exhaustion during particularly long or processing-heavy migrations should not be yet another obstacle to overcome, but it often is.
For instance, I recently ran into this error no matter how high I raised the PHP memory_limit or how low I set the --limit flag value.
$ drush migrate:import example_migration --limit=1000
Fatal error: Allowed memory size of 4244635648 bytes exhausted (tried to allocate 45056 bytes) in web/core/lib/Drupal/Core/Database/Statement.php on line 59
Fatal error: Allowed memory size of 4244635648 bytes exhausted (tried to allocate 65536 bytes) in vendor/symfony/http-kernel/Controller/ArgumentResolver/DefaultValueResolver.php on line 23
It should be noted that the --limit flag, while extremely useful, does not reduce the number of rows loaded into memory. It simply limits the number of destination records created. The source data query has no LIMIT statement, and the prepareRow(Row $row) method in the source plugin class is still called for every row.
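To make that concrete, here is a minimal sketch of a source plugin; the plugin, table, and field names are hypothetical, not from a real project. No matter what --limit is set to, query() runs unrestricted and prepareRow() fires once per source row.

namespace Drupal\example_migrate\Plugin\migrate\source;

use Drupal\migrate\Plugin\migrate\source\SqlBase;
use Drupal\migrate\Row;

/**
 * Hypothetical source plugin, for illustration only.
 *
 * @MigrateSource(
 *   id = "example_migration_source"
 * )
 */
class ExampleMigrationSource extends SqlBase {

  /**
   * {@inheritdoc}
   */
  public function query() {
    // The --limit flag never adds a LIMIT clause to this query.
    return $this->select('example_table', 'e')->fields('e');
  }

  /**
   * {@inheritdoc}
   */
  public function fields() {
    return [
      'id' => $this->t('Primary key'),
      'title' => $this->t('Title'),
    ];
  }

  /**
   * {@inheritdoc}
   */
  public function getIds() {
    return ['id' => ['type' => 'integer']];
  }

  /**
   * {@inheritdoc}
   */
  public function prepareRow(Row $row) {
    // Called for every source row, even rows --limit will never import,
    // so expensive work here still runs against the full dataset.
    return parent::prepareRow($row);
  }

}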
Batch Size
This is where query batch size functionality comes in. This functionality is located in \Drupal\migrate\Plugin\migrate\source\SqlBase and allows the source data query to be performed in batches, effectively using SQL LIMIT statements.
This can be controlled in the source plugin class via the batchSize property.
/**
* {@inheritdoc}
*/
protected $batchSize = 1000;
Alternatively, it can be set in the migration yml file with the batch_size property under the source definition.
source:
batch_size: 10
plugin: example_migration_source
key: example_source_db
I could find very few references to this option in the existing online documentation; I eventually discovered it via a passing reference in a Drupal.org issue queue discussion.
Once I knew what I was looking for, I went searching for how this worked and discovered several other valuable options in the migration SqlBase class.
\Drupal\migrate\Plugin\migrate\source\SqlBase
/**
* Sources whose data may be fetched via a database connection.
*
* Available configuration keys:
* - database_state_key: (optional) Name of the state key which contains an
* array with database connection information.
* - key: (optional) The database key name. Defaults to 'migrate'.
* - target: (optional) The database target name. Defaults to 'default'.
* - batch_size: (optional) Number of records to fetch from the database during
* each batch. If omitted, all records are fetched in a single query.
* - ignore_map: (optional) Source data is joined to the map table by default to
* improve migration performance. If set to TRUE, the map table will not be
* joined. Using expressions in the query may result in column aliases in the
* JOIN clause which would be invalid SQL. If you run into this, set
* ignore_map to TRUE.
*
* For other optional configuration keys inherited from the parent class, refer
* to \Drupal\migrate\Plugin\migrate\source\SourcePluginBase.
* …
*/
abstract class SqlBase extends SourcePluginBase implements ContainerFactoryPluginInterface, RequirementsInterface {
Migration Limit
Despite the “flaws” of the --limit flag, it still offers us a valuable tool in our effort to mitigate migration memory issues and increase migration speed. My anecdotal evidence, gathered by timing the output of the --feedback flag, shows a much higher migration throughput for the initial items and a gradually tapering speed as a migration progresses.
I also encountered an issue where the migration memory reclamation process eventually failed and the migration ground to a halt. I was not alone: MediaCurrent found and documented the same problem in their post Memory Management with Migrations in Drupal 8.
Memory usage is 2.57 GB (85% of limit 3.02 GB), reclaiming memory. [warning]
Memory usage is now 2.57 GB (85% of limit 3.02 GB), not enough reclaimed, starting new batch [warning]
Processed 1007 items (1007 created, 0 updated, 0 failed, 0 ignored) - done with 'nodes_articles'
The migration would then cease importing items as if it had finished, even though several hundred thousand nodes were left to import. Running the import again would produce the same result.
I adapted the approach MediaCurrent showed in their post to work with Drush 9. It solved the memory issue, improved migration throughput, and provided a standardized way to trigger migrations upon deployment or during testing.
The crux of the solution is to repeatedly call drush migrate:import in a loop with a low --limit value to keep the average item processing time low.
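The shape of the loop is roughly the following. This is a simplified sketch rather than our production script; it assumes a migration ID of example_migration and that your Drush version's migrate:status output exposes an unprocessed count as a field (field names vary between Drush and migrate_tools versions, so check with --help and adjust the parsing).

#!/usr/bin/env bash
# Sketch of a batched import loop; not the full script from the gist.
MIGRATION="example_migration"
LIMIT=1000

while true; do
  # Each drush invocation is a fresh PHP process, so memory starts clean.
  drush migrate:import "$MIGRATION" --limit="$LIMIT" --feedback="$LIMIT"

  # Assumes migrate:status can report the unprocessed count as a single
  # numeric field; adjust the field name for your Drush/migrate_tools version.
  REMAINING=$(drush migrate:status "$MIGRATION" --field=unprocessed)

  if [ -z "$REMAINING" ] || [ "$REMAINING" -le 0 ]; then
    echo "$MIGRATION complete."
    break
  fi
  echo "$REMAINING items remaining; starting another batch."
done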
Our updated version of the script is available in a gist.
Disabling Hooks
Another potential pitfall is the execution of entity hooks during a migration. Oftentimes these hooks perform important logic, but during a migration those additional actions may be unwanted. I have yet to find a way to determine whether an entity hook is being triggered from a migration without custom code. Leveraging a faux field and checking for its existence, as shown by this article, is the best approach I have seen so far. A sketch of that pattern follows.
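This is a minimal, hedged sketch of the idea, not the exact code from the linked article: the module name my_module and the property name _faux_migration are hypothetical and must match whatever your migration's process section actually sets.

/**
 * Implements hook_entity_presave().
 */
function my_module_entity_presave(\Drupal\Core\Entity\EntityInterface $entity) {
  // The migration's process pipeline maps a constant to this made-up
  // destination property, for example:
  //   process:
  //     _faux_migration:
  //       plugin: default_value
  //       default_value: true
  // Its presence signals that the save originated from a migration.
  if (isset($entity->_faux_migration)) {
    return;
  }
  // Normal (non-migration) presave logic goes here.
}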
So the next time you are tasked with an overwhelming migration challenge, you need not let memory issues stand in your way. You can focus on tracking down the source data, processing and mapping it, and all of the other challenges migrations tend to surface.