Introduction to Drupal 8 Migrations

Migrations can be intimidating but with Migrate modules now in core, it’s easier than ever to upgrade or migrate legacy applications to Drupal 8. Let's demystify the process by taking a closer look at how to get started from the ground level.

In this session we’ll cover:

  • A brief overview of the Migrate APIs for importing content to Drupal 8, including Migrate Drupal's capabilities to move content from a Drupal source.
  • Understanding your migration pathway.
  • Getting your site ready for migrated content.
  • Sample migration scripts and configuration files for migrating nodes and field mapping.
  • Consideration of media entities, file attachments, and other dependencies.
  • Using Migrate Drupal UI and Migrate Tools for managing migrations.

For those new to Drupal, this session will introduce the basic concepts and framework for setting up your Drupal 8 migration path successfully.

The slide deck for this presentation is now available.

Code samples for this presentation are on GitHub.

Transcript

Clare Ming: Thanks.

[applause]

Clare: That was a close call. Welcome, everyone. Thank you for coming. This is the session Introduction to the Drupal 8 Migrations. Hopefully, that's where you're wanting to be. I'll start by--

Speaker 2: [unintelligible 00:00:16] that mic I can't hear a word.

Clare: You can't hear me at all?

Speaker 2: Not at all.

Clare: They told me that they would be really loud. Is this okay?

Speaker 2: [unintelligible 00:00:22]

Clare: Can you hear me?

Speaker 2: Yes.

Clare: This sounds really loud to me.

Speaker 2: It's good.

Clare: It's not too loud?

Speaker 2: No. [unintelligible 00:00:30]

Clare: [chuckles] Okay. I'll introduce myself. This is the session Introduction to Drupal 8 Migrations. My name is Clare Ming. I am a developer with Chromatic. Here are a few places where you can find me. We are a fully distributed digital agency. Some of our notable clients. We have a booth here at DrupalCon, 516. Please come by and see us. We love hearing about hard problems and seeing if we can help solve them. Also, if you are a developer, frontend or backend, and you are interested in working remotely for an agency, come see us. We are also hiring.

Before diving in, I just want to be upfront about some assumptions I'm going to be making during the course of my presentation. It's my opinion that tackling a Drupal 8 migration successfully requires some basic skills: familiarity with Drupal 8 and hence PHP and object-oriented programming. Hopefully, you're familiar with Drush, the command-line utility for managing Drupal applications. A basic understanding of custom module development and configuration management is also extremely helpful.

I have to confess that when I originally pitched this presentation, it wasn't until I started preparing for it that I realized how daunting a task it is to try to share anything meaningful or of value about migrations in such a short, succinct amount of time. Now that I have a number of migrations under my belt, I thought maybe what I could best offer is what I wish I had known prior to going into some of these migration projects fresh without ever having done them before. If nothing else, and at the very least, I hope that this session provides you with the jumping-off point to level up your migration chops and provide some broad-stroke knowledge of what a move to Drupal 8 involves.

Here's a short list of topics that I hope to cover in the next 20-some minutes. We'll talk about preparing for your migration path, what the Migrate API is and what modules you might need, what are migration plugins, what's YAML all about? We'll also talk about the migration process in the context of ETL, extract, transform, load, as they correspond to source, process, and destination plugins. We'll also touch on how to run migrations using Drush, as well as a little bit about debugging and troubleshooting. It's an ambitious list, and we'll try our best to get through it.

It would be irresponsible of me to not, at least, spend one minute, hopefully less, talking about the importance of preparation and planning. Like most efforts in life, preparation and planning usually increase the probability of your efforts coming to fruition and success, and I believe that is true for Drupal 8. Taking on a data migration is a big deal, it's a big effort. Even for a small legacy site, it can be time-consuming and challenging. I think it's really helpful to think of a move to Drupal 8 as being a rebuild as much as it is a migration.

The migration component, obviously, is very important, but you need to have a standing Drupal 8 site ready to go in order to house your migrated data. Out of the planning stage, hopefully, you've made some critical decisions. Not to overstate the obvious, but a thorough content inventory and site audit is absolutely necessary. There's no point burning through development cycles, writing or testing migrations for content that you might not want or even need. You need to know what functionality's required, what content or features you want to keep, what can be discarded because it's no longer used by your legacy system.

How will the information architecture change? This is an opportunity to update the underlying content model or the data structure of the legacy system. Additionally, when you migrate from a previous version of Drupal, you are going to need to create a new theme. With all the changes that came with Drupal 8 and the Twig templating engine, this is going to be the case. Even if you don't have a migration component, every Drupal install will require a custom theme implementation. It's a great chance to update your look and feel, do a redesign, or even a rebranding of your site.

With that said, hopefully, I've pounded on the critical importance of preparing a migration plan. At a very high level, migrations involve really understanding your source data and how you're going to be mapping your fields from the source to the target. You'll also, as I mentioned, need to have a Drupal 8 site built and ready to go, and 90% of your time as a developer is going to be spent testing and iterating on your migrations. The Migrate API is a set of services or a framework for migrating data from an external source into Drupal 8. These are the modules that ship with core: Migrate, obviously.

There's also a handful of Migrate Drupal related modules that handle migrating specifically from a prior version, Drupal 6 or 7, into Drupal 8. Those are still evolving very much, but it's really exciting to see the work that's being done there. If you're doing a Drupal-to-Drupal migration, definitely check out the upgrade documentation on drupal.org. Also, I highly encourage you to get familiar with the Migrate API on drupal.org. Really, the documentation is evolving and getting much better by the day. Here's a short list of contributed modules that I feel are invaluable for the task of running a successful migration, namely Migrate Tools and Migrate Plus.

Migrate Plus is a project that provides features that extend the core migration framework's functionality, as well as really great example code. I'll be drawing heavily on that in the next few slides. Migrate Tools is the tooling for running your migrations and executing them using Drush. Migrate Upgrade is also very useful. It handles the Drush integration for the Drupal-to-Drupal migrations. That'll eventually become obsolete as those commands get into core or into Drush. I threw in Migrate Source CSV as an example of a contrib project that is also really helpful for adding additional features that you need. We'll take a look at that here soon as well.

On my GitHub handle, I uploaded a repo called D8migrations. Please go check that out. All the code samples and snippets for this deck will be up there. There are three migrate modules that are actually production code that was run, of course, scrubbed of client data and sensitive information. Feel free to go up there to take a more in-depth look at complete migration modules. There's a source CSV, as well as a D7 to D8 and a WordPress to D8 example module out there. Migrate Plus has a migrate example module that I'll also provide code samples for, and definitely take a look at that for how to do a complete migration.

Migration Plugins are basically definitions or templates for the instructions on how to get your data from the source, your legacy application to its destination, your shiny new Drupal 8 site. Migration Plugins are defined in what's called YAML format, which is a recursive acronym for YAML Ain't Markup Language, which I didn't know prior to preparing for this. This is a human-friendly data-serialization standard for all programming languages, which basically means that user expectations of data are universally met. Migration Plugins specify individual ETL migrations.

That stands for Extract, Transform, Load, which is a standard pattern in computing, where data is extracted from a source, and then the source data is manipulated or transformed, and then sent to the destination, loaded to the destination. The extract phase of a migration is handled by source plugins. Source plugins extract data from the source and return it as rows that represent singular items or objects to be imported along with extra information about the properties of each row. As far as sources go, you're either migrating from an external data store, a SQL backend maybe or a CSV, JSON, XML, or a previous version of Drupal.

The transform phase is defined by what are called process plugins, and that's taking the source data and massaging it until it's in the right format to be ingested. The load phase is handled by destination plugins, which are responsible for creating the entities, in the context of content entities being migrated, on the Drupal 8 side after the source data's been extracted and transformed. Here are just a few screenshots from the Migrate Plus, D7 to D8 and WordPress examples. I'll use these terms interchangeably: migration plugins, YML files, migration configuration. They all refer to individual migrations of whatever object type is getting imported, again, in the context of content entities. In the config/install file structure, all three of these examples leverage the Migrate Plus module, so the naming convention there is migrate_plus.migration. followed by whatever semantic name, hopefully it's semantic, of the object that you're importing. That will become your migration ID that you reference as you execute migrations using Drush.
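As a rough illustration of how the three ETL phases and the file naming convention fit together, here is a minimal sketch of a migration definition. It is not taken from the slides: the file name, the migration ID example_article, the group name, the source plugin ID, and the source property names are all hypothetical.

```yaml
# Hypothetical file: config/install/migrate_plus.migration.example_article.yml
# The id is what you would reference when running the migration with Drush.
id: example_article
label: 'Example articles'
# Assumes a Migrate Plus migration group named "example" exists.
migration_group: example
source:
  # Extract: the ID of a source plugin (a hypothetical custom one here).
  plugin: example_article
process:
  # Transform: destination property on the left, source property on the right.
  title: post_title
  'body/value': post_content
  uid:
    # Core process plugin that assigns a fixed value.
    plugin: default_value
    default_value: 1
destination:
  # Load: save each processed row as a node of type "article".
  plugin: 'entity:node'
  default_bundle: article
```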

A little bit more about source plugins. As we mentioned, they extract your source data and return it as row objects with properties, and each source plugin is determined by the migration definition, and each migration only has a single source plugin. With all the plugins, there are many provided by core, contrib and you can roll your own. You can write custom plugins as well. Because of time constraints, I'll focus on the SQL version of source plugins, all the examples are mostly SQL, so we'll go straight into that.

You need to define your source database connection in your settings or settings.local.php file along with your default Drupal 8 creds. The Drupal database API allows for the definition of multiple database connections, so when you set your source legacy database, keep note of the key. In this case, it's example_D7. Then when you go and look at your migration files, hopefully, you'll have the ID of that source database key identified and matched according to what you have in settings.

Here in the D7 to D8 example, we're leveraging Migrate Plus, which has a cool feature called migration groups, and in the migration group declaration, we can actually define a shared configuration source key. Here, we can see that example_D7 matches what we have in our settings file. More often than not, you'll see the source key in individual migration YML files. In the WordPress example, the source key for the database is simply WordPress. If we were to go to the WordPress post migration YML file, we can see, under the source key, the WordPress key there.
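A migration group with a shared source key might look roughly like the sketch below. The group ID, label, and description are hypothetical; the one requirement described above is that the key matches the connection key defined in the settings file.

```yaml
# Hypothetical file: config/install/migrate_plus.migration_group.example.yml
id: example
label: 'Example D7 imports'
description: 'Migrations from the legacy Drupal 7 site.'
source_type: 'Drupal 7'
shared_configuration:
  source:
    # Must match the legacy connection key defined in settings.php
    # or settings.local.php (example_D7 in the talk).
    key: example_D7
```

Without a group, the same key entry would sit directly under source in each individual migration file.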

Source plugins live in the Plugin\migrate\source namespace. Again, another screenshot of what that file structure looks like. In src/Plugin/migrate/source, there's a list of source plugins that are basically extensions of the SqlBase class in all of these examples. In your source plugin, again, in the context of SQL, you need to override-- you're required to implement three methods, query(), fields(), and getIds(). query(), obviously, defines the data that you want to extract from your source database, fields() returns the fields that you need to define your source properties, and getIds() returns the unique identifier for each source row.

In the Migrate Plus migrate_example beer node module, we can see, in the beer node migration plugin, that the source plugin is defined as beer_node and that also happens to be the migration ID of this. Hopefully, that's not confusing. If we were to go and take a look at the beer node class, the ID is in the annotation, and this will be the case for all your migration plugins. beer_node is right there and you can see the query(), fields() and getIds() methods in that source plugin.

Source plugins also have a very handy method called prepareRow(), which is where you can run additional queries on the source data to add additional information for the object that you're trying to import. In the beer node example, we want to not only get the beer objects but also get associated terms, taxonomy terms, and the row object has getSourceProperty() and setSourceProperty() methods that you can use. They're very handy for setting your source properties as well. Another example of prepareRow() is in the WordPress post migration. Here, we're looking for associated tags and attachments.

I also wanted to provide an example of an alternative to the SQL source plugin. In this case, it's Migrate Source CSV, which is a contrib project that needs to be installed and enabled in order to use it. In the Alpha files migration, we can see the source plugin key is just simply csv, and the keys subsequent to that, hopefully, are pretty intuitive. You can define a path to your CSV, whether it's relative to your project root or in the public file system. Then you can also map the header columns of your CSV to the fields that you use for setting your source properties.
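A CSV source section along these lines might look like the sketch below. It assumes the 8.x-2.x release of Migrate Source CSV; later releases use different key names, so check the version you have installed. The path and column names are hypothetical.

```yaml
source:
  plugin: csv
  # Hypothetical path, relative to the Drupal root.
  path: modules/custom/example_migrate/data/files.csv
  # The first row of the file holds column headers, not data.
  header_row_count: 1
  # Column(s) that uniquely identify each row.
  keys:
    - id
  # Map CSV column positions to source property names and labels.
  column_names:
    0:
      id: 'Identifier'
    1:
      filename: 'File name'
    2:
      file_path: 'File path'
```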

Process plugins are the transform phase of a migration and like source plugins, there are many provided by core and contrib, and you can roll your own and write custom ones. Remember that that's the process through which you're manipulating the source data and getting it ready and prepared for loading into the target application. Here's an example in the beer node module of a process plugin. We talked about the source plugin key and below that, we have a process key, which is basically an associative array of destination properties.

Process plugins are basically run on each destination property to arrive at a derived value that you'll assign to the destination property of the destination entity that you're importing. Another example is a custom process plugin in the WordPress example, FileImport. If we go look at that class, it extends ProcessPluginBase. Note the ID is file_import. In the transform() method, in this particular context, we're getting a source property of a file path and then seeing if we can retrieve it. It will return a file ID, meaning that it's been ingested into the managed files system of Drupal.

One really cool thing about process plugins is that they can be chained. Here's an example of a combination of core and custom process plugins being used in what's called a process pipeline. If we look at the WordPress users migration, the process key has a destination property called field_user_photo and the dashes underneath it represent an array of process plugins that are run sequentially. The first one, skip_on_empty. Pretty obvious, it'll skip the row if the value is empty. The source property of user_photo_url. You only need to pass the source parameter to your first process plugin in the pipeline.

It then knows to pass it to the next one sequentially. Then if it's not empty, it'll check if it exists and if it exists, it'll finally import the file. Very handy thing to know that you can chain process plugins together that way. You can also use variables and set variables for your destination property. In this example, in the Custom to D8 Alpha files migration, we have a destination property called destination folder, which accepts a parameter, a file path, and runs through the Alpha destination process plugin. In this particular context, it's going to return either a public or a private stream wrapper.

Then you can access the value of destination folder by using this notation, the @ sign enclosed in quotes. The URI destination property is using a core process plugin called concat, which concatenates the source parameters, the derived variables of destination folder and file name to create the value for the URI destination property. You can also define constants in your plugins as well, which is super handy. Here is a list of the process plugins provided by core, some of the ones that you'll probably end up using a lot. format_date is a great one. You can just format dates directly in your configuration files.
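The chaining, variable, constant, and date-formatting ideas described above can be sketched as the fragments below. They are not copied from the slides. The plugins skip_on_empty, concat, and format_date are core; file_import is the custom plugin named in the talk, with its configuration assumed; example_file_exists and alpha_destination are hypothetical plugin IDs, and all property names are illustrative.

```yaml
# Fragment 1: a process pipeline. Plugins run top to bottom, and only the
# first one needs an explicit source.
process:
  field_user_photo:
    -
      plugin: skip_on_empty
      # "row" skips the whole row; "process" would skip only this property.
      method: row
      source: user_photo_url
    -
      # Hypothetical custom plugin that checks the file exists.
      plugin: example_file_exists
    -
      # Custom plugin from the talk; returns a file ID.
      plugin: file_import
---
# Fragment 2: a derived variable, a constant, and concat.
source:
  constants:
    # Hypothetical constant, referenced below as constants/subdirectory.
    subdirectory: 'legacy/'
process:
  destination_folder:
    # Hypothetical custom plugin returning e.g. "public://" or "private://".
    plugin: alpha_destination
    source: file_path
  uri:
    plugin: concat
    source:
      # The quoted @ prefix refers to a value computed earlier in process.
      - '@destination_folder'
      - constants/subdirectory
      - filename
---
# Fragment 3: format_date converting a datetime string to a Unix timestamp.
process:
  created:
    plugin: format_date
    from_format: 'Y-m-d H:i:s'
    to_format: 'U'
    source: post_date
```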

migration_lookup is handy. You can look up, for dependent migrations, IDs that you might need. sub_process is another one that I end up using a lot. It used to be known as the iterator process plugin but that's been deprecated. Here's an example of migration_lookup and sub_process being used in the D7 to D8 example. Here, the destination property field_related_persons is a multi-value entity reference field and uses the sub_process and migration_lookup plugins to populate those destination property values. You can also pass additional values that you need.
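A multi-value entity reference field populated with sub_process and migration_lookup might be configured roughly as follows. This is a sketch, not the slide content: the migration ID example_person is hypothetical, and it assumes each source item carries its referenced ID in a target_id property.

```yaml
process:
  field_related_persons:
    # Iterate over each item of the multi-value source field.
    plugin: sub_process
    source: field_related_persons
    process:
      target_id:
        # Translate the legacy ID into the ID created by an earlier migration.
        plugin: migration_lookup
        migration: example_person
        source: target_id
```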

In this example, the module key is set with the value of a module name. If we go take a look at that parse_xml process plugin class that extends ProcessPluginBase, the transform() method can access the value you set using the configuration property with the key name that you define. Here are some examples of the process plugins provided by Migrate Plus. entity_lookup is one of them that I use a lot for defining IDs for entity reference fields. entity_generate is another great one for creating entities on the fly in your configuration if they don't already exist.
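The sketch below shows what passing an extra configuration value to a custom plugin, and using the two Migrate Plus plugins, could look like. The parse_xml plugin is the custom one named in the talk, but its source property and module value here are assumptions, as are the field names, vocabularies, and source properties.

```yaml
process:
  field_summary:
    # Custom plugin from the talk; any extra key such as "module" becomes
    # available to the plugin class through its configuration.
    plugin: parse_xml
    source: xml_payload
    module: example_migrate
  field_category:
    # Migrate Plus: find an existing term by name and return its ID.
    plugin: entity_lookup
    source: category_name
    entity_type: taxonomy_term
    bundle_key: vid
    bundle: categories
    value_key: name
  field_tags:
    # Migrate Plus: like entity_lookup, but creates the term if it is missing.
    plugin: entity_generate
    source: tag_name
    entity_type: taxonomy_term
    bundle_key: vid
    bundle: tags
    value_key: name
```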

I really recommend getting to know the process plugins provided by core and contrib. I definitely don't recommend going down the path of writing a custom plugin, because you didn't know that it already existed in core or contrib. Destination plugins are the load phase of a migration and that's what's handling, actually saving your imported data into Drupal 8. Destination plugins dictate the fields that the data can be saved to, they also provide a unique ID of your record that's created for mapping purposes. On the back end, there are mapping tables that are in your database as you run migrations with unique IDs for execution and rollback.

Ultimately, destination plugins create new records from your imported data and persist them to Drupal storage. Full disclosure, I've never written a custom destination plugin. That's I think more in the context of if you're a maintainer of a module, and you're responsible for data that needs to move from one system to another, that's when you would write your own custom destination plugins. In all the migrations that I've done, I've only had to use the ones provided by core and contrib, so the next few screenshots in the deck will just give examples of that in the migration plugin files.

In the WordPress post example, the destination plugin declaration is simply entity:node, that is, entity and then the type of entity, which in this case is node, and the bundle is specified as post, so a node of type post. In a users migration, you can just specify the destination plugin as entity:user. Then I wanted to also include an example of using file and media entities. In this case, the alpha files migration ingests a bunch of files from the source application and uses the destination plugin entity:file to create the file entities in Drupal.

If you're taking advantage of media, then you need to also create media entities out of them, and so the files migration is a dependency for generating the media entities. Here you can just define the destination plugin as entity:media. Just a word about configuration import and export. If you are familiar with configuration management, you know that if you are making changes to YAML files, you have to re-import that config to refresh and see your changes. Usually, you can get away with doing a partial config import, and that looks like a drush cim or a drush config-import with the --partial flag as well as a --source flag specifying the config/install directory of your custom migrate module.
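To make the destination declarations described above concrete, here is a sketch of four fragments. They are not copied from the slides: the migration IDs alpha_files and alpha_media, the media bundle, the media field name, and the source property file_id are hypothetical, and the bundle could equally be set in the process section rather than with default_bundle.

```yaml
# Fragment 1: nodes of type "post".
destination:
  plugin: 'entity:node'
  default_bundle: post
---
# Fragment 2: users.
destination:
  plugin: 'entity:user'
---
# Fragment 3: file entities.
id: alpha_files
destination:
  plugin: 'entity:file'
---
# Fragment 4: media entities built from the files migrated in fragment 3.
id: alpha_media
process:
  'field_media_document/target_id':
    # Look up the file ID created by the files migration.
    plugin: migration_lookup
    migration: alpha_files
    source: file_id
destination:
  plugin: 'entity:media'
  default_bundle: document
migration_dependencies:
  required:
    # Files must exist before media entities can reference them.
    - alpha_files
```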

You also need to do a cache rebuild if you create new plugins, and of course, you want to export that config when you're ready to ship. Here are a few resources for running migrations. Again, if you're doing a Drupal to Drupal migration, the tools there are fast-evolving and very impressive. You can do things in the UI or, if you have Migrate Upgrade, run a Drush command where you can import entities from Drupal 6 or Drupal 7, literally with a click of a button. They're not quite as flexible as running or executing migrations individually through Drush, so that is my preferred way of executing and running migrations. We'll talk about that for a second.

Running migrations from command line requires the migrate tools module to be installed and enabled, and these are the Drush commands that you will be running over and over and over again, as you test and iterate, migrate-status, migrate-import, migrate-rollback being the top three. When you run migrations, sometimes things get stuck, so migrate-reset-status is also a good one to reset the migration to idle. Here are the Drush commands as you run them, drush migrate-import with the MIGRATION-ID is how you run the import operation.

You can pass really useful flags as you're in active development. If you are running files migration, say and there's thousands of nodes, rather than just looking at a blank command prompt, you can request feedback for whatever you specify in that flag and it'll spit back information about what's been processed, every 500 items that are imported. You can also set limits. That's really handy too if you're starting out with migration and just want to see what happens to a handful of rows. You can also pass a specific ID. If you want to look at a particular row object and see what's happening in there, you can also pass an ID to it.

A lot of handy flags to explore and look up, and of course, drush migrate-rollback to roll back your migrations. This is just a quick glance at what you can expect to see if you run drush migrate-status, or ms, on the command line. It'll just give you back some information about when things were imported, and what's been processed or unprocessed. Just a note about debugging and troubleshooting. I am an evangelist of a debugger that allows you to set breakpoints and inspect values of various variables as you're running migrations.

Xdebug is my debugger of choice. You want to configure that to work with your PHP CLI so that you can set breakpoints during Drush-triggered migrations. Migrate Devel also has some really cool utilities that you can run with the drush migrate-import command. It gives you flags that allow you to debug each row; it basically spits back the debug info in your command prompt. --migrate-debug and --migrate-debug-pre print out the rows as they run.

I wanted to include a quote from Lucas Hedding, who's one of the maintainers for Migrate Plus and Migrate Tools. He wrote a post not that long ago, just saying where he sets his breakpoints when he's troubleshooting migrations. In core's MigrateExecutable, he sets them on the import() and processRow() methods, but if you know which process plugin or whichever plugin is giving you grief, you can set them right in there and take a look at what's happening to try to debug and troubleshoot your migration.

Just a few resources that I drew heavily on in preparation for this talk. Again, I really highly recommend the Migrate API documentation. Drupalize.Me also has a great series of tutorials on migrations and, as I already mentioned, Migrate Plus has migrate_example and migrate_example_advanced, which are fully fleshed out migration examples that are designed to walk you through the basic concepts of a migration. I also wanted to not only acknowledge the Migrate API and Drupalize.Me, but also, whenever I seem to have a question about migrations, Mike Ryan seems to have the answer when I google, so shout out to him.

I think he's also the chief architect and developer of Migrate-- one of them. Also, a shout out to my illustrious colleague Adam Zimmermann, who wrote the initial WordPress scripts that are up on my GitHub handle, and also Les Cordell, who took the original CSV scripts that I wrote for a recent client project and took them next level. I apologize that I didn't really leave room for questions, as it's a big challenge to try to crush everything together, but please come see us at our booth, 516. I'll be there if you have any questions or just want to follow up. Thank you for coming.

[applause]

[00:29:22] [END OF AUDIO]